Big Data Part-I

The document covers various concepts related to big data, statistics, and machine learning, including definitions of population, data types in R, and types of analytics. It explains key topics such as probability, correlation, decision trees, and support vector machines, along with their applications and methodologies. Additionally, it discusses data manipulation functions and visualization techniques, emphasizing the importance of data analysis in extracting insights.

SET NUMBER:-1 [Big Data]

Q1]
a) Population: In statistics, a population refers to the entire set of individuals or items that are
of interest in a particular study. This can be a complete collection of data from which a sample
may be drawn.
b) Operators in R: Operators in R are symbols or functions that perform operations on data.
They can be classified into arithmetic operators (e.g., +, -, *, /), relational operators (e.g., ==, >,
<), logical operators (e.g., &, |, !), and assignment operators (e.g., <-).
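A short illustration of each operator class in R (expected results shown in comments):
x <- 10                 # assignment
y <- 3
x + y                   # 13    (arithmetic)
x %% y                  # 1     (modulus)
x > y                   # TRUE  (relational)
(x > 5) & (y < 2)       # FALSE (logical AND)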
c) Array in R: An array in R is a data structure that can hold data in more than two dimensions.
It is a multi-dimensional generalization of vectors and matrices, where data is organized in
rows, columns, and potentially more dimensions.
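For example, a three-dimensional array can be built with array():
a <- array(1:12, dim = c(2, 3, 2))   # 2 rows, 3 columns, 2 layers
a[1, 2, 2]                           # element in row 1, column 2 of layer 2: 9
dim(a)                               # 2 3 2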
d) Sample: A sample is a subset of a population, selected to represent the larger group. It is
used in statistics to make inferences about the population based on the sample data.
e) Machine Learning: Machine learning is a subset of artificial intelligence that focuses on the
development of algorithms and statistical models that allow computers to learn from and
make predictions or decisions based on data.
f) Data Frame: A data frame in R is a two-dimensional, tabular data structure that can hold
different types of variables (e.g., numeric, character) in columns, similar to a spreadsheet or
SQL table.
g) Market Basket Analysis: Market basket analysis is a data mining technique used to
understand the purchase behavior of customers by analyzing co-occurrences of items in
transactions. It helps identify patterns and associations between products.
h) Data Analytics: Data analytics is the process of examining datasets to draw conclusions
about the information they contain, often using specialized systems and software. It involves
various techniques to analyze data, derive insights, and support decision-making.
i) head() and tail(): In R, head() is a function that returns the first few rows of a data frame or
vector, while tail() returns the last few rows. By default, both functions return six rows, but this
can be adjusted.
j) Data Types in R: Common data types in R include:
• Numeric: For numbers (e.g., integers, floats).
• Character: For text strings.
• Logical: For TRUE/FALSE values.
• Factor: For categorical data with levels.
• List: A collection of objects that can be of different types.
• Data frame: A table-like structure for storing data.
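A brief R snippet showing how each type is created and reported by class():
num  <- 42.5                      # numeric
txt  <- "big data"                # character
flag <- TRUE                      # logical
fac  <- factor(c("low", "high"))  # factor with two levels
lst  <- list(num, txt, flag)      # list of mixed types
df   <- data.frame(x = 1:3, y = c("a", "b", "c"))  # data frame

sapply(list(num, txt, flag, fac, lst, df), class)
# "numeric" "character" "logical" "factor" "list" "data.frame"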

Q2]
a) Explain Probability in detail
Probability is a branch of mathematics that deals with the likelihood of different outcomes. It
quantifies uncertainty and is fundamental to statistics, helping to predict future events based
on known information.
Key Concepts:
• Experiment: An action or process that leads to one or more outcomes (e.g., rolling a
die).
• Sample Space (S): The set of all possible outcomes of an experiment (e.g., for a die, S
= {1, 2, 3, 4, 5, 6}).
• Event (E): A specific outcome or a set of outcomes (e.g., rolling an even number: E =
{2, 4, 6}).
• Probability of an Event (P(E)): The measure of the likelihood that an event will occur,
calculated as:
P(E) = \frac{\text{Number of favorable outcomes}}{\text{Total number of possible outcomes}}
Types of Probability:
1. Theoretical Probability: Based on reasoning about equally likely outcomes (e.g., the
probability of heads when flipping a fair coin is 1/2).
2. Empirical Probability: Based on observations or experiments (e.g., how often an event
occurs in real life).
3. Subjective Probability: Based on personal judgment or experience rather than exact
calculations.
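A small worked example in R, comparing the theoretical probability of rolling an even number on a fair die with an empirical estimate from simulation:
favorable <- c(2, 4, 6)
p_theoretical <- length(favorable) / 6            # 3/6 = 0.5

set.seed(1)
rolls <- sample(1:6, size = 10000, replace = TRUE)
p_empirical <- mean(rolls %% 2 == 0)              # close to 0.5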

b) Explain The Types of Analytics


Analytics can be broadly categorized into four main types:
1. Descriptive Analytics: Focuses on summarizing historical data to identify patterns
and trends. It answers the question, "What happened?" Techniques include data
aggregation and mining.
2. Diagnostic Analytics: Explores data to understand why something happened. It
answers questions like, "Why did it happen?" This often involves statistical analysis
and data visualization.
3. Predictive Analytics: Uses historical data and statistical algorithms to forecast future
outcomes. It answers "What is likely to happen?" Common techniques include
regression analysis and machine learning models.
4. Prescriptive Analytics: Recommends actions based on data analysis. It answers
"What should be done?" This often uses optimization and simulation techniques to
suggest the best course of action.

c) Explain Correlation with its types


Correlation measures the relationship between two or more variables, indicating how one
variable may change in relation to another.
Types of Correlation:
1. Positive Correlation: As one variable increases, the other variable also tends to
increase (e.g., height and weight).
2. Negative Correlation: As one variable increases, the other variable tends to decrease
(e.g., the number of hours studied and the number of mistakes made).
3. No Correlation: No apparent relationship between the variables (e.g., shoe size and
intelligence).
Correlation Coefficient: This is a numerical value (ranging from -1 to 1) that quantifies the
strength and direction of a relationship:
• 1: Perfect positive correlation
• -1: Perfect negative correlation
• 0: No correlation
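In R, the correlation coefficient is computed with cor(); a minimal sketch using hypothetical height and weight values:
height <- c(150, 160, 165, 170, 180)
weight <- c(50, 58, 63, 66, 75)
cor(height, weight)                        # close to +1: strong positive correlation
cor(height, weight, method = "spearman")   # rank-based alternative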

d) Explain the Application of Big Data


Big data refers to extremely large datasets that can be analyzed to reveal patterns, trends, and
associations, especially relating to human behavior and interactions. Its applications include:
1. Healthcare: Analyzing patient data for better diagnosis, treatment plans, and
predicting outbreaks.
2. Finance: Fraud detection, risk management, and algorithmic trading.
3. Retail: Personalized marketing, inventory management, and customer experience
enhancement.
4. Transportation: Optimizing routes, traffic management, and predictive maintenance
for vehicles.
5. Manufacturing: Improving supply chain efficiency, predictive maintenance, and quality
control.
e) Explain Machine Learning
Machine learning (ML) is a subset of artificial intelligence that enables systems to learn from
data, identify patterns, and make decisions with minimal human intervention.
Key Concepts:
1. Supervised Learning: The model is trained on labeled data (input-output pairs). The
algorithm learns to predict outcomes based on the input data (e.g., regression,
classification).
2. Unsupervised Learning: The model is trained on unlabeled data and tries to find
patterns or groupings (e.g., clustering, dimensionality reduction).
3. Reinforcement Learning: The model learns by interacting with its environment and
receiving feedback in the form of rewards or penalties (e.g., training robots or playing
games).
Applications:
• Natural Language Processing: Chatbots, translation services.
• Image Recognition: Facial recognition, object detection.
• Recommendation Systems: Product recommendations on e-commerce sites.
• Predictive Analytics: Forecasting sales or customer behavior.

Q3]
a) How Naive Bayes Algorithm works
Naive Bayes is a probabilistic classifier based on Bayes' theorem, which assumes
independence between features. It is particularly effective for large datasets and text
classification.
How it works:
1. Bayes’ Theorem: It calculates the probability of a class based on prior knowledge of
conditions related to the data.

P(C|X) = \frac{P(X|C) \cdot P(C)}{P(X)}

where:
o P(C|X) is the posterior probability of class C given feature X.
o P(X|C) is the likelihood of feature X given class C.
o P(C) is the prior probability of class C.
o P(X) is the evidence, i.e., the marginal probability of feature X.
2. Independence Assumption: It assumes that the presence of a particular feature in a
class is unrelated to the presence of any other feature.
3. Classification: The algorithm computes the posterior probabilities for all classes and
assigns the class with the highest probability.
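A minimal sketch of Naive Bayes classification in R, assuming the e1071 package is installed and using the built-in iris dataset as example data:
library(e1071)

model <- naiveBayes(Species ~ ., data = iris)         # estimates priors P(C) and likelihoods P(X|C)
predict(model, iris[c(1, 51, 101), ])                 # class with the highest posterior
predict(model, iris[c(1, 51, 101), ], type = "raw")   # posterior probabilities per class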
b) Explain Decision Tree with example
A decision tree is a flowchart-like structure where each internal node represents a test on a
feature, each branch represents the outcome of the test, and each leaf node represents a
class label.
Example: Consider a dataset for predicting whether someone will play outside based on
weather conditions. The features could be "Outlook" (Sunny, Overcast, Rain), "Temperature"
(Hot, Mild, Cool), and "Humidity" (High, Normal).
• Root Node: "Outlook"
o Sunny → Check "Humidity"
▪ High → No
▪ Normal → Yes
o Overcast → Yes
o Rain → Check "Temperature"
▪ Hot → No
▪ Mild → Yes
▪ Cool → Yes
The tree is used to make predictions by traversing from the root to a leaf node based on feature
values.
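A minimal sketch of fitting such a tree in R, assuming the rpart package is installed; the tiny weather table below is a hypothetical sample mirroring the example above (Temperature is omitted for brevity):
library(rpart)

weather <- data.frame(
  Outlook  = factor(c("Sunny", "Sunny", "Overcast", "Rain", "Rain", "Overcast", "Sunny", "Rain")),
  Humidity = factor(c("High", "Normal", "High", "High", "Normal", "Normal", "High", "Normal")),
  Play     = factor(c("No", "Yes", "Yes", "No", "Yes", "Yes", "No", "Yes"))
)

# Loosened control settings so splits are made on this very small dataset
fit <- rpart(Play ~ Outlook + Humidity, data = weather, method = "class",
             control = rpart.control(minsplit = 2, minbucket = 1, cp = 0))
predict(fit, weather, type = "class")   # predicted Play label for each row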

c) Explain Support Vector Machine (SVM) with example


SVM is a supervised learning algorithm used for classification and regression. It works by
finding the optimal hyperplane that separates data points of different classes in a high-
dimensional space.
Example: Imagine a dataset with two features, and two classes: Class A (blue) and Class B
(red). The SVM algorithm will find a hyperplane that maximizes the margin between the
closest points (support vectors) of each class.
• If the data is linearly separable, SVM will create a linear hyperplane.
• If not, SVM can use kernel tricks (like polynomial or radial basis function kernels) to
transform the data into a higher dimension where it becomes separable.
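A minimal sketch in R, assuming the e1071 package is installed, using two iris species as the two classes:
library(e1071)

two_class <- subset(iris, Species != "versicolor")
two_class$Species <- droplevels(two_class$Species)

# Linear kernel: the classes are separable on these two features
model <- svm(Species ~ Petal.Length + Petal.Width, data = two_class, kernel = "linear")
table(Predicted = predict(model, two_class), Actual = two_class$Species)

# Radial (RBF) kernel for data that is not linearly separable
model_rbf <- svm(Species ~ ., data = iris, kernel = "radial")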

d) Explain Digital Data with its types


Digital data refers to information represented in binary format (0s and 1s) that can be
processed by computers.
Types of Digital Data:
1. Structured Data: Highly organized, easily searchable data (e.g., databases,
spreadsheets).
2. Unstructured Data: Data that lacks a predefined structure (e.g., text files, images,
videos).
3. Semi-structured Data: Contains elements of both structured and unstructured data
(e.g., XML, JSON).
e) Explain Association Rule Mining
Association rule mining is a technique used to discover interesting relationships between
variables in large datasets. It's commonly used in market basket analysis to find sets of
products that frequently co-occur in transactions.
Key Concepts:
• Support: The proportion of transactions in which the itemset appears.
• Confidence: The conditional probability that the consequent item is purchased given that
the antecedent item is purchased.
• Lift: How much more often the items occur together than would be expected if they were
independent.
Example: If 100 transactions include bread and 80 transactions include butter, and 60
transactions include both, an association rule could be:
• If a customer buys bread, they are likely to buy butter.
o Support = 60/100 = 0.6
o Confidence = 60/80 = 0.75
This means there is a 75% chance that customers who buy bread will also buy butter.
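A minimal sketch of mining such rules in R, assuming the arules package is installed; the transactions below are hypothetical:
library(arules)

baskets <- list(
  c("bread", "butter"),
  c("bread", "butter", "milk"),
  c("bread", "jam"),
  c("butter", "milk"),
  c("bread", "butter")
)
trans <- as(baskets, "transactions")

rules <- apriori(trans, parameter = list(supp = 0.4, conf = 0.7))
inspect(rules)   # prints each rule with its support, confidence, and lift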

Q5]
a) Data Manipulation Functions
Data manipulation functions are essential for cleaning, transforming, and analyzing datasets in
programming environments like R. Common functions include:
• filter(): Used to subset rows based on certain conditions.
• select(): Allows you to choose specific columns from a dataset.
• mutate(): Creates or modifies existing columns with new calculations or
transformations.
• arrange(): Sorts the data by specified columns.
• summarize(): Computes summary statistics for specified groups of data.
These functions are often part of the dplyr package, which is widely used for data
manipulation in R.
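A short pipeline combining these functions on the built-in mtcars dataset (dplyr assumed installed):
library(dplyr)

mtcars %>%
  filter(mpg > 20) %>%               # keep fuel-efficient cars
  select(mpg, cyl, hp) %>%           # keep three columns
  mutate(hp_per_cyl = hp / cyl) %>%  # add a derived column
  arrange(desc(mpg)) %>%             # sort by mileage, highest first
  group_by(cyl) %>%
  summarize(avg_hp = mean(hp))       # one summary row per cylinder count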

b) Any 5 Types of Data Visualization


1. Bar Chart: Displays categorical data with rectangular bars, showing the frequency or
value of each category.
2. Line Graph: Used to visualize data trends over time, with points connected by lines.
3. Scatter Plot: Shows the relationship between two continuous variables using dots
plotted on a Cartesian plane.
4. Histogram: Represents the distribution of numerical data by dividing it into bins and
showing the frequency of data points in each bin.
5. Box Plot: Summarizes data distributions through their quartiles, highlighting the
median, range, and potential outliers.
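Base R one-liners for each of the five chart types, using built-in datasets:
barplot(table(mtcars$cyl), main = "Bar Chart")          # counts per category
plot(AirPassengers, main = "Line Graph")                # trend over time
plot(mtcars$wt, mtcars$mpg, main = "Scatter Plot")      # two continuous variables
hist(mtcars$mpg, main = "Histogram")                    # distribution in bins
boxplot(mpg ~ cyl, data = mtcars, main = "Box Plot")    # quartiles and outliers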
c) Loops in R
Loops are control structures that allow repetitive execution of code blocks. Common types
include:
for Loop: Iterates over a sequence or vector. Example:
for (i in 1:5) {
  print(i)
}

while Loop: Continues to execute as long as a specified condition is true. Example:


count <- 1
while (count <= 5) {
  print(count)
  count <- count + 1
}
repeat Loop: Repeats indefinitely until a break statement is encountered. Example:
repeat {
  print("Hello")
  break
}
SET NUMBER:-2 [Big Data]
Q1]
a) Big Data: Big data refers to extremely large datasets that cannot be easily managed,
processed, or analyzed using traditional data processing tools. It typically involves high
volume, velocity, and variety, often requiring specialized technologies for storage and analysis.
b) Data Manipulation: Data manipulation involves transforming, cleaning, or organizing data
to prepare it for analysis. This can include operations like filtering, sorting, aggregating, or
merging datasets.
c) Data Science: Data science is an interdisciplinary field that combines statistics,
mathematics, computer science, and domain expertise to extract insights and knowledge
from data. It encompasses various processes including data collection, analysis, and
visualization.
d) Statistical Inference: Statistical inference is the process of drawing conclusions about a
population based on a sample of data from that population. It involves using statistical
methods to estimate parameters, test hypotheses, and make predictions.
e) Stages of Data Science:
1. Data Collection
2. Data Cleaning and Preparation
3. Data Exploration and Analysis
4. Data Modeling
5. Model Evaluation
6. Deployment and Monitoring
7. Communication of Results
f) Machine Learning: Machine learning is a subset of artificial intelligence that enables
systems to learn from data, identify patterns, and make decisions with minimal human
intervention. It uses algorithms to analyze data, learn from it, and improve over time.
g) Support Vector Machine (SVM): SVM is a supervised machine learning algorithm used for
classification and regression tasks. It works by finding the hyperplane that best separates the
data points of different classes in a high-dimensional space.
h) Use of Histogram: A histogram is a graphical representation of the distribution of numerical
data. It displays the frequency of data points within specified ranges (bins), helping to visualize
the shape, central tendency, and variability of the data.
i) Data Analysis: Data analysis is the systematic examination of data to extract meaningful
insights, identify patterns, and support decision-making. It can involve various techniques,
including statistical analysis, data visualization, and exploratory data analysis.
j) Use of Themes: In data visualization (e.g., ggplot2 in R), themes control the non-data
appearance of a plot, such as fonts, gridlines, legends, and backgrounds. More broadly in data
analysis, themes are the recurring patterns or topics that emerge from the data, which help
organize findings and communicate insights to stakeholders.

Q2]
a) Explain different Types of Data Analytics
1. Descriptive Analytics:
o Definition: Focuses on summarizing historical data to identify trends and
patterns.
o Examples: Dashboards, reports, and data visualization tools.
2. Diagnostic Analytics:
o Definition: Investigates past data to understand why something happened.
o Examples: Root cause analysis, correlation analysis.
3. Predictive Analytics:
o Definition: Uses statistical models and machine learning techniques to
forecast future outcomes based on historical data.
o Examples: Sales forecasting, risk assessment.
4. Prescriptive Analytics:
o Definition: Recommends actions based on predictive insights to achieve
desired outcomes.
o Examples: Optimization models, simulation.
b) Advantages and Disadvantages of Machine Learning
Advantages:
1. Automation: Reduces manual intervention in data analysis.
2. Scalability: Can handle vast amounts of data efficiently.
3. Predictive Power: Provides accurate forecasts and insights.
4. Adaptability: Learns and improves from new data over time.
Disadvantages:
1. Data Dependence: Requires large, high-quality datasets for effective training.
2. Complexity: Algorithms can be difficult to interpret (black-box issue).
3. Overfitting: Models may perform well on training data but poorly on unseen data.
4. Resource Intensive: May require significant computational power and time.
c) Explain the Process of Data Analysis
1. Define Objectives: Clearly outline what you want to achieve.
2. Data Collection: Gather relevant data from various sources.
3. Data Cleaning: Preprocess the data to remove inaccuracies and inconsistencies.
4. Exploratory Data Analysis (EDA): Analyze data to uncover patterns and insights.
5. Data Modeling: Apply statistical or machine learning models to the data.
6. Interpret Results: Analyze the output of models to derive insights.
7. Communicate Findings: Present results in a clear and actionable manner.
8. Implement Decisions: Use insights to inform business or operational strategies.
d) Explain Probability Distribution Modeling
Definition: Probability distribution modeling is a statistical approach used to describe how
values of a random variable are distributed. It provides insights into the likelihood of different
outcomes.
Types of Probability Distributions:
1. Normal Distribution: Symmetrical distribution characterized by its mean and
standard deviation; used in many natural phenomena.
2. Binomial Distribution: Models the number of successes in a fixed number of trials;
used in scenarios with two possible outcomes.
3. Poisson Distribution: Models the number of events occurring in a fixed interval; useful
for counting occurrences.
4. Exponential Distribution: Describes the time between events in a Poisson process;
applicable in reliability analysis.
Applications: Used in risk assessment, quality control, and decision-making processes.
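In R, each of these distributions has built-in density/mass (d*), cumulative probability (p*), quantile (q*), and random-sampling (r*) functions; a few examples:
dnorm(0, mean = 0, sd = 1)        # standard normal density at 0
dbinom(3, size = 10, prob = 0.5)  # P(exactly 3 successes in 10 trials)
dpois(2, lambda = 4)              # P(exactly 2 events when the mean rate is 4)
dexp(1, rate = 0.5)               # exponential density at t = 1
rnorm(5, mean = 0, sd = 1)        # five random draws from a standard normal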

e) Explain Applications of Big Data


1. Healthcare: Analyzing patient data for personalized treatment plans and predicting
disease outbreaks.
2. Finance: Fraud detection, risk management, and customer segmentation.
3. Retail: Optimizing inventory, personalized marketing, and enhancing customer
experiences.
4. Manufacturing: Predictive maintenance, supply chain optimization, and quality
control.
5. Telecommunications: Network optimization, customer churn prediction, and service
personalization.
6. Social Media: Sentiment analysis, trend tracking, and user engagement optimization.
These applications demonstrate how big data can drive innovation and efficiency across
various sectors.

Q3]
a) Advantages and Disadvantages of SVM (Support Vector Machine)
Advantages:
1. Effective in High Dimensions: SVM performs well in high-dimensional spaces, even when
the number of dimensions exceeds the number of samples.
2. Versatility: Can be used for both classification and regression tasks.
3. Robust to Overfitting: Particularly in high-dimensional space, SVM can be robust to
overfitting due to its use of margins.
4. Clear Margin of Separation: Works well when there is a clear margin of separation
between classes.
Disadvantages:
1. Computationally Intensive: SVM can be slow to train, especially with large datasets.
2. Memory Consumption: Requires significant memory, making it less suitable for very
large datasets.
3. Choice of Kernel: Performance depends heavily on the choice of the kernel and its
parameters.
4. Difficult to Interpret: The model can be difficult to interpret, especially with non-linear
kernels.

b) Explain Data Frame with Example


A data frame is a two-dimensional, table-like structure used in data analysis, primarily in R
and Python's Pandas. It can contain different types of variables (numeric, character, etc.) and
is similar to a spreadsheet.
# Create a data frame
data <- data.frame(
  Name  = c("Alice", "Bob", "Charlie"),
  Age   = c(25, 30, 35),
  Score = c(90.5, 85.0, 88.0)
)

# Display the data frame
print(data)
This creates a data frame with three columns: Name, Age, and Score, with three rows of data.

c) Explain Types of Regression Models


1. Linear Regression:
o Definition: Models the relationship between a dependent variable and one or
more independent variables using a linear equation.
o Use Case: Predicting sales based on advertising spend.
2. Multiple Regression:
o Definition: Extends linear regression to include multiple independent
variables.
o Use Case: Predicting house prices based on various features like size, location,
and age.
3. Polynomial Regression:
o Definition: Models the relationship using a polynomial equation, allowing for
non-linear relationships.
o Use Case: Modeling growth trends that are not linear.
4. Logistic Regression:
o Definition: Used for binary classification tasks; models the probability of a
binary outcome.
o Use Case: Predicting whether a customer will buy a product (yes/no).
5. Ridge and Lasso Regression:
o Definition: Regularization techniques to prevent overfitting in linear models by
adding penalties to the loss function.
o Use Case: Used in scenarios with many predictors.
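A minimal sketch of the first four model types in base R, using the built-in mtcars dataset (ridge and lasso typically require an additional package such as glmnet):
fit_lin  <- lm(mpg ~ wt, data = mtcars)              # simple linear regression
fit_mult <- lm(mpg ~ wt + hp + disp, data = mtcars)  # multiple regression
fit_poly <- lm(mpg ~ poly(wt, 2), data = mtcars)     # polynomial regression (degree 2)
fit_log  <- glm(am ~ wt + hp, data = mtcars, family = binomial)  # logistic regression

summary(fit_lin)$r.squared                 # fit quality of the simple model
head(predict(fit_log, type = "response"))  # predicted probabilities of am = 1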

d) What is Histogram with Example in R


A histogram is a graphical representation of the distribution of numerical data, showing the
frequency of data points within specified ranges (bins).
# Create a vector of data
data <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 5)

# Create a histogram
hist(data, main = "Histogram of Data", xlab = "Values",
     ylab = "Frequency", col = "blue", border = "black")
This code creates a histogram showing how frequently each value appears in the data set.

e) Explain Functions Included in the "dplyr" Package


The dplyr package in R provides a set of functions designed to manipulate data frames
efficiently. Here are some key functions:
1. filter(): Selects rows based on specific conditions.
o Example: filter(data, Age > 30)
2. select(): Chooses specific columns from a data frame.
o Example: select(data, Name, Score)
3. mutate(): Adds new variables or modifies existing ones.
o Example: mutate(data, Score = Score + 5)
4. summarise(): Reduces data to summary statistics.
o Example: summarise(data, AverageScore = mean(Score))
5. arrange(): Reorders rows based on specified columns.
o Example: arrange(data, desc(Age))
6. group_by(): Groups data by one or more variables for further analysis.
o Example: group_by(data, Age)
7. join() functions: Merge two data frames based on a common column.
o Examples: inner_join(), left_join(), right_join(), full_join()
These functions make data manipulation in R more intuitive and efficient.

Q5]
a) Tools Used in Big Data
1. Apache Hadoop: An open-source framework that enables distributed storage and
processing of large data sets across clusters of computers.
2. Apache Spark: A fast, in-memory data processing engine with elegant and expressive
development APIs for big data applications.
3. NoSQL Databases: Tools like MongoDB, Cassandra, and HBase that handle
unstructured data and provide high scalability and performance.
4. Apache Kafka: A distributed event streaming platform for building real-time data
pipelines and streaming applications.
5. Tableau: A data visualization tool that helps in creating interactive and shareable
dashboards.
6. Apache Flink: A stream processing framework that allows for stateful computations
over data streams.
b) Advantages of Big Data
1. Informed Decision-Making: Analyzing large data sets helps organizations make data-
driven decisions.
2. Customer Insights: Businesses can gain a deeper understanding of customer
behaviors and preferences.
3. Operational Efficiency: Improved data analytics can lead to more efficient operations
and cost reductions.
4. Predictive Analytics: Organizations can forecast trends and behaviors, enhancing
strategic planning.
5. Competitive Advantage: Leveraging big data can provide a significant edge over
competitors who do not utilize such insights.
c) Advantages and Disadvantages of EM Algorithms
Advantages:
1. Flexibility: EM algorithms can handle incomplete data and are applicable to various
statistical models.
2. Efficiency: They are computationally efficient for parameter estimation in large
datasets.
3. Robustness: EM never decreases the data likelihood from one iteration to the next, making
it stable in many practical applications.
Disadvantages:
1. Local Optima: The algorithm may converge to a local maximum rather than the global
maximum, affecting the quality of results.
2. Sensitivity to Initialization: Results can vary significantly based on the initial
parameter estimates.
3. Convergence Issues: In some cases, the algorithm may take a long time to converge
or may not converge at all.
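As a concrete illustration, a Gaussian mixture model can be fitted by EM in R; a minimal sketch assuming the mclust package is installed:
library(mclust)

set.seed(42)
x <- c(rnorm(100, mean = 0), rnorm(100, mean = 5))  # data drawn from two overlapping groups

fit <- Mclust(x, G = 2)   # EM alternates E-steps (responsibilities) and M-steps (parameter updates)
fit$parameters$mean       # estimated component means, near 0 and 5
summary(fit)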
This overview highlights the critical aspects of tools and advantages associated with big data,
along with a balanced view of EM algorithms.
