Assignment 3 Based On Unit 3

This document discusses various concepts related to time series analysis and predictive modeling. It covers decision trees and how they can be used to predict customer purchasing behavior. It also discusses Naive Bayes classification and how it uses Bayes' theorem for probabilistic classification. Finally, it discusses time series analysis, including its components of trend, seasonality, and cyclical patterns. It also outlines the Box-Jenkins methodology for time series analysis and modeling.

Uploaded by

Anu stephie Nadar


Assignment 3 based on Unit 3

Unit III
1. How can we predict whether customers will buy a product or not? Explain with respect to
decision trees.
i) A Decision Tree is a tree-like graph with nodes representing the place where we pick an
attribute and ask a question; edges represent the answers to the question, and the leaves
represent the actual output or class label.
ii) Figure 7-1 shows an example of using a decision tree to predict whether customers will buy a
product.

iii) The term branch refers to the outcome of a decision and is visualized as a line connecting
two nodes.
iv) If a decision is numerical, the "greater than" branch is usually placed on the right, and the
"less than" branch is placed on the left.
v) Depending on the nature of the variable, one of the branches may need to include an "equal
to" component.
vi) Internal nodes are the decision or test points. Each internal node refers to an input variable
or an attribute.
vii) The top internal node is called the root. The decision tree in Figure 7-1 is a binary tree in
that each internal node has no more than two branches.
viii) The depth of a node is the minimum number of steps required to reach the node from the
root. In Figure 7-1 for example, nodes Income and Age have a depth of one, and the four
nodes on the bottom of the tree have a depth of two.
ix) Leaf nodes are at the end of the last branches on the tree. They represent class labels—the
outcome of all the prior decisions.
x) The path from the root to a leaf node contains a series of decisions made at various internal
nodes.
xi) The decision tree in Figure 7-1 shows that females with income less than or equal
to $45,000 and males 40 years old or younger are classified as people who would purchase the
product.
xii) In traversing this tree, age does not matter for females, and income does not matter for
males.
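
The traversal described above can be sketched as nested conditionals. This is only a minimal illustration in Python: the function and argument names are hypothetical, the root split on gender is inferred from points viii), xi) and xii), and the assumption that every remaining leaf is labelled "would not purchase" is not stated in the notes.

```python
def will_purchase(gender: str, income: float, age: int) -> bool:
    """Traverse a two-level tree: root = gender, depth one = income or age."""
    if gender == "female":
        # Female branch: only income is tested, age is ignored.
        return income <= 45_000
    # Male branch: only age is tested, income is ignored.
    return age <= 40


print(will_purchase("female", 45_000, 70))  # income exactly at the threshold
print(will_purchase("male", 200_000, 41))   # one year over the age threshold
```

Each call follows exactly one path from the root to a leaf, so the number of tests performed equals the depth of the leaf that is reached.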

2. Explain a probabilistic classification method based on Naive Bayes' theorem.

Hiren Parkar 22306A1031

i) Naive Bayes is a probabilistic classification method based on Bayes' theorem. Bayes'
theorem gives the relationship between the probabilities of two events and their conditional
probabilities.
ii) A naive Bayes classifier assumes that the presence or absence of a particular feature of a
class is unrelated to the presence or absence of other features. For example, an object can
be classified based on its attributes such as shape, colour, and weight.
iii) The input variables are generally categorical, but variations of the algorithm can accept
continuous variables. There are also ways to convert continuous variables into categorical
ones. This process is often referred to as the discretization of continuous variables.
iv) For an attribute such as income, the attribute can be converted into categorical values as
shown below.
• Low Income: income < $10,000
• Working Class: $10,000 ≤ income < $50,000
• Middle Class: $50,000 ≤ income < $1,000,000
• Upper Class: income ≥ $1,000,000
v) The output typically includes a class label and its corresponding probability score. The
probability score is not the true probability of the class label, but it's proportional to the true
probability.
vi) Applications
a) Spam filtering is a classic use case of naive Bayes text classification. Bayesian spam
filtering has become a popular mechanism to distinguish spam e-mail from legitimate
e-mail.
b) Naive Bayes classifiers can also be used for fraud detection. In the domain of auto
insurance, for example, based on a training set with attributes such as driver's rating,
vehicle age, vehicle price, historical claims by the policy holder, police report status, and
claim genuineness, naive Bayes can provide probability-based classification of whether
a new claim is genuine.

vii) The conditional probability of event C occurring, given that event A has already occurred,
is denoted as P(C|A), which can be found using the formula in Equation 5-6.

P(C|A) = P(A ∩ C) / P(A)    (Equation 5-6)

Equation 5-7 can be obtained with some minor algebra and substitution of the conditional
probability P(A|C) = P(A ∩ C) / P(C).

P(C|A) = P(A|C) · P(C) / P(A)    (Equation 5-7)

where C is the class label and A is the set of observed attributes.

Equation 5-7 is the most common form of Bayes' theorem.
viii) Mathematically, Bayes' theorem gives the relationship between the probabilities of C
and A, P(C) and P(A), and the conditional probabilities of C given A and A given C, namely
P(C|A) and P(A|C).
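
To make the steps above concrete, the following is a minimal sketch of a naive Bayes classifier in Python. It discretizes income into the four brackets listed earlier and then scores each class with P(C) multiplied by P(a_i|C) for every attribute, which is Bayes' theorem, P(C|A) = P(A|C) · P(C) / P(A), with the common denominator P(A) dropped. The training rows, the function names, and the choice to put boundary incomes in the higher bracket are all assumptions made for illustration, and no smoothing is applied.

```python
from collections import Counter, defaultdict


def income_bracket(income):
    """Discretize a continuous income into four categories."""
    if income < 10_000:
        return "low"
    if income < 50_000:
        return "working"      # assumption: boundary values go to the higher bracket
    if income < 1_000_000:
        return "middle"
    return "upper"


# Hypothetical training data: (income, gender, class label "bought?").
TRAIN = [
    (8_000, "F", "no"),
    (30_000, "F", "yes"),
    (42_000, "M", "yes"),
    (60_000, "M", "no"),
    (75_000, "F", "no"),
    (20_000, "M", "yes"),
]


def fit(rows):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter()
    feature_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    for income, gender, label in rows:
        class_counts[label] += 1
        for i, value in enumerate((income_bracket(income), gender)):
            feature_counts[(i, label)][value] += 1
    return class_counts, feature_counts


def scores(model, income, gender):
    """Return a score per class that is proportional to P(C|A)."""
    class_counts, feature_counts = model
    total = sum(class_counts.values())
    result = {}
    for label, n in class_counts.items():
        score = n / total  # prior P(C)
        for i, value in enumerate((income_bracket(income), gender)):
            # Naive assumption: attributes are independent given the class.
            score *= feature_counts[(i, label)][value] / n  # P(a_i | C)
        result[label] = score
    return result


model = fit(TRAIN)
result = scores(model, 35_000, "F")
print(max(result, key=result.get), result)
```

The scores do not sum to one because P(A) is omitted, which matches point v): the output is proportional to the probability of each class label rather than equal to it.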

3. How to model a structure of observations taken over time? Explain with respect to Time
series analysis. Also explain any two of its applications.

i) Time series analysis attempts to model the underlying structure of observations taken
over time. A time series, denoted y_t for t = 1, 2, ..., n, is an ordered sequence of equally
spaced values over time.
ii) For example, Figure 6-1 provides a plot of the monthly number of international airline
passengers over a 12-year period. In this example, the time series consists of an ordered
sequence of 144 values.

iii) Following are the goals of time series analysis:

• Identify and model the structure of the time series.
• Forecast future values in the time series.
iv) Time series analysis has many applications in finance, economics, biology, engineering,
retail, and manufacturing.
1) Retail sales: For various product lines, a clothing retailer is looking to forecast future
monthly sales. These forecasts need to account for the seasonal aspects of the
customers' purchasing decisions.
2) Stock trading: Some high-frequency stock traders utilize a technique called pairs trading.
In pairs trading, an identified strong positive correlation between the prices of two
stocks is used to detect a market opportunity. Suppose the stock prices of Company A
and Company B consistently move together. Time series analysis can be applied to the
difference of these companies' stock prices over time. A statistically larger than expected
price difference indicates that it is a good time to buy the stock of Company A and sell
the stock of Company B, or vice versa.
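
The pairs trading idea can be sketched in a few lines of Python. This is an illustrative sketch only: the function name, the z-score rule, the threshold of 2 standard deviations, and the direction of each trade are assumptions, not part of the notes.

```python
from statistics import mean, stdev


def spread_signal(prices_a, prices_b, threshold=2.0):
    """Flag the latest price difference if it is unusually far from its history."""
    spread = [a - b for a, b in zip(prices_a, prices_b)]
    history, latest = spread[:-1], spread[-1]
    sd = stdev(history)
    if sd == 0:
        return "no trade"  # a constant history gives no scale to compare against
    z = (latest - mean(history)) / sd
    if z > threshold:
        return "sell A, buy B"  # A looks expensive relative to B
    if z < -threshold:
        return "buy A, sell B"  # A looks cheap relative to B
    return "no trade"


# Hypothetical prices: the two stocks move together until the last observation.
a = [10.0, 11.2, 10.1, 11.0, 10.3, 15.0]
b = [9.0, 10.0, 9.0, 10.0, 9.0, 9.0]
print(spread_signal(a, b))
```

The signal depends only on the difference series, which is why the two stocks must first be shown to move together consistently.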

4. What are the components of time series? Explain each of them. Also write the main
steps of Box-Jenkins methodology for time series analysis.
A time series can consist of the following components:

• Trend (a long period of time)
• Seasonality (within a year)
• Cyclic (a span of more than one year)
• Random
1) Trend –
i) The trend refers to the long-term, relatively smooth pattern that persists over a number of
years in a time series.
ii) It indicates whether the observation values are increasing or decreasing over time.
iii) Examples of trends are a steady increase in sales month over month, the number of airline
passengers, the population, agricultural production, items manufactured, the number of births
and deaths, the number of industries or factories, and the number of schools or colleges.
2) Seasonality –
i) The seasonality component describes a pattern that recurs at regular intervals, with a
period of one year or shorter.
ii) This variation can be present in a time series if the data are recorded hourly, daily,
weekly, monthly, or quarterly.
iii) For example, monthly retail sales can fluctuate over the year due to the weather and
holidays.
3) Cyclic –
i) A cyclic component also refers to a periodic fluctuation, but one with a period longer than
one year.
ii) For example, retail sales are influenced by the general state of the economy. Thus, a
retail sales time series can often follow the lengthy boom-bust cycles of the economy.
4) Random-
Although noise is certainly part of this random component, there is often some underlying
structure to this random component that needs to be modelled to forecast future values of
a given time series.
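
One simple way to see the trend separately from the seasonality is to average over one full seasonal period. The Python sketch below builds a hypothetical series from a linear trend plus a repeating seasonal effect and smooths it with a moving average; the series, the period of 4, and the names are assumptions chosen so the arithmetic is easy to follow, and the cyclic and random components are left out.

```python
PERIOD = 4
SEASONAL = [5, -2, 3, -6]  # hypothetical seasonal effects that sum to zero

# Synthetic series: linear trend (rising 2 per step) plus the seasonal effect.
series = [2 * t + SEASONAL[t % PERIOD] for t in range(12)]


def moving_average(values, window):
    """Average each run of `window` consecutive values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


trend_estimate = moving_average(series, PERIOD)
print(series)
print(trend_estimate)
```

Because every window covers exactly one seasonal period, the seasonal effects cancel and the smoothed values rise by 2 per step, which is the slope of the underlying trend.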

The Box-Jenkins methodology for time series analysis involves the following three main
steps:
1) Condition data and select a model.
a. Identify and account for any trends or seasonality in the time series.
b. Examine the remaining time series and determine a suitable model.
2) Estimate the model parameters.
3) Assess the model and return to Step 1, if necessary.
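
Differencing is one common way to carry out step 1a. The sketch below is a minimal Python illustration with a hypothetical series: a seasonal difference (lag equal to the period) removes the repeating pattern, and a further lag-1 difference removes what is left of the trend. The series and the function name are assumptions made for the example.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t where both values exist."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


PERIOD = 4
SEASONAL = [5, -2, 3, -6]  # hypothetical seasonal effects

# Linear trend (rising 2 per step) plus a repeating seasonal effect.
series = [2 * t + SEASONAL[t % PERIOD] for t in range(12)]

deseasonalized = difference(series, lag=PERIOD)  # seasonal effects cancel
detrended = difference(deseasonalized, lag=1)    # constant level is removed

print(deseasonalized)
print(detrended)
```

After both differences nothing systematic remains in this toy series; with real data, the remainder is what step 1b examines when choosing a model.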

5. Explain Autoregressive Integrated Moving Average Model in detail.

6. What are the major challenges with text analysis? Explain with examples.
i) Text analysis suffers from the curse of high dimensionality.
e.g. If a book uses only 50 distinct words, its text can be represented with 50 dimensions,
one per distinct word.
Real corpora are far larger. Even a comparatively small corpus, the complete works of
Shakespeare, contains about 0.88 million words.

ii) In contrast, the Google n-gram corpus (a collection of written texts) contains one trillion
words from publicly accessible web pages.
Out of the one trillion words in the Google n-gram corpus, there might be one million distinct
words, which would correspond to one million dimensions.
iii) The high dimensionality of text is an important issue, and it has a direct impact on the
complexities of many text analysis tasks.
iv) Another major challenge with text analysis is that most of the time the text is not
structured.
v) Data may be semi-structured (for example, XML), quasi-structured (data with irregular
formats that can be formatted with effort, tools, and time), or unstructured.
vi) Table 9-2 shows some example data sources and data formats that text analysis may have
to deal with.
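
The link between distinct words and dimensions can be illustrated with a small bag-of-words sketch in Python. The two sample sentences, the regular expression used to split words, and the function names are assumptions made for this example.

```python
import re
from collections import Counter


def words(text):
    """Split text into lowercase words (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def vocabulary(documents):
    """Collect the distinct words across all documents, one dimension each."""
    distinct = set()
    for doc in documents:
        distinct.update(words(doc))
    return sorted(distinct)


def to_vector(doc, vocab):
    """Represent a document as word counts along each vocabulary dimension."""
    counts = Counter(words(doc))
    return [counts[w] for w in vocab]


docs = ["the cat sat on the mat", "the dog sat"]
vocab = vocabulary(docs)
print(len(vocab), vocab)
print(to_vector(docs[0], vocab))
```

Every new distinct word adds one more position to each vector, so a corpus with a million distinct words leads to million-dimensional vectors that are mostly zeros.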

7. What are various text analysis steps? Explain in detail.
A text analysis problem usually consists of three important steps:
• Parsing –
i) Parsing is the process that takes unstructured text and imposes a structure for further
analysis.
ii) The unstructured text could be a plain text file, a weblog, an Extensible Markup Language
(XML) file, a HyperText Markup Language (HTML) file, or a Word document.
iii) Parsing deconstructs the provided text and renders it in a more structured way for the
subsequent steps.

• Search and Retrieval –

i) Search and retrieval is the identification of the documents in a corpus that contain search
items such as specific words, phrases, topics, or entities like people or organizations.
ii) These search items are generally called key terms. Search and retrieval originated from
the field of library science and is now used extensively by web search engines.

• Text Mining -
i) Text mining discovers meaningful insights pertaining to domains or problems of
interest.
ii) With the proper representation of the text, many techniques, such as clustering and
classification, can be adapted to text mining.
iii) For example, k-means can be modified to cluster text documents into groups, where
each group represents a collection of documents with a similar topic. The distance of a
document to a centroid represents how closely the document talks about that topic.
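
The search and retrieval step can be sketched with an inverted index, a structure that maps each key term to the documents containing it. The Python code below is a minimal illustration; the sample corpus, the tokenizer, and the rule that a document must contain every query term are assumptions for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Parsing step: reduce raw text to lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(corpus):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical three-document corpus.
corpus = {
    "d1": "Time series forecast for retail sales",
    "d2": "Retail fraud detection with naive Bayes",
    "d3": "Decision tree for sales",
}
index = build_index(corpus)
print(search(index, "retail sales"))
```

Building the index once makes each later query a handful of set lookups instead of a scan over the whole corpus.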

8. How is information retrieved when applying text analysis? Explain with respect to Term
Frequency.
