FUNDAMENTALS OF AI

UNIT-5
DATA ANALYSIS AND PROCESSING
5.1. Introduction to Data Analysis and Visualization
 Data analysis and visualization are essential components of the field of data science. They involve the
exploration, interpretation, and presentation of data to uncover meaningful insights and facilitate better
decision-making. In this introduction, we'll cover the basic concepts and techniques used in data
analysis and visualization.
 Data Analysis: Data analysis refers to the process of inspecting, transforming, and modeling data to
discover useful information, draw conclusions, and support decision-making. It involves several steps,
including data collection, data cleaning, data transformation, data exploration, and data modeling.
o Data Collection: Gathering relevant data from various sources, such as databases, surveys,
APIs, or web scraping.
o Data Cleaning: Preparing the data for analysis by addressing missing values, handling outliers,
resolving inconsistencies, and standardizing the format.
o Data Transformation: Converting the data into a suitable format for analysis, which may
involve reshaping the data, aggregating it, or creating new variables.
o Data Exploration: Exploring the data to gain an understanding of its characteristics,
relationships, and patterns. This can involve descriptive statistics, data visualization, and
exploratory data analysis (EDA) techniques.
o Data Modelling: Applying statistical and machine learning techniques to build models that
can make predictions, classifications, or identify patterns in the data.
Here are some key aspects of data analysis:
o Descriptive Statistics: Descriptive statistics provide a summary of the main characteristics of
a dataset. This includes measures such as mean, median, mode, standard deviation, range, and
percentiles. Descriptive statistics help understand the central tendency, dispersion, and shape
of the data distribution.
o Inferential Statistics: Inferential statistics allows us to make inferences and draw conclusions
about a larger population based on a sample. Techniques like hypothesis testing, confidence
intervals, and regression analysis are commonly used in inferential statistics.
o Exploratory Data Analysis (EDA): EDA involves visually and quantitatively exploring the
data to gain insights and identify patterns. Techniques such as data visualization, summary
statistics, and correlation analysis are used to understand relationships, detect outliers, and
uncover hidden patterns in the data.
o Data Mining: Data mining is the process of discovering patterns and relationships in large
datasets. It involves applying statistical algorithms, machine learning techniques, and data
visualization to extract valuable information from the data.
o Predictive Analytics: Predictive analytics uses historical data to make predictions or forecasts
about future events or outcomes. Techniques such as regression analysis, time series analysis,
and machine learning algorithms are employed to build predictive models.
o Text and Sentiment Analysis: Text analysis involves extracting information and insights from
textual data. It includes techniques such as text mining, natural language processing (NLP),
and sentiment analysis to analyze and interpret text-based data.
o Machine Learning: Machine learning algorithms are used to build models that can
automatically learn from data and make predictions or take actions without explicit
programming. Supervised learning, unsupervised learning, and reinforcement learning are
common types of machine learning techniques.
o Data Wrangling: Data wrangling, also known as data munging or data pre-processing,
involves cleaning, transforming, and reshaping the data to make it suitable for analysis. This
step ensures that the data is accurate, complete, and formatted correctly.
o Data Integration: Data integration involves combining data from multiple sources into a
single unified dataset. It may require merging, joining, or blending data to create a
comprehensive view for analysis.
o Data Interpretation: Data interpretation is the process of making sense of the analyzed data
and drawing meaningful insights and conclusions. It involves critically analyzing the results,
considering the context, and making data-driven decisions.
o When conducting data analysis, it is important to follow a systematic approach, considering
the specific goals and questions at hand. It often involves an iterative process of refining the
analysis based on the insights gained from each step.
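As a brief illustration of the descriptive statistics and exploratory data analysis steps above, here is a minimal pandas sketch; the DataFrame and its column names are hypothetical.
import pandas as pd
# Hypothetical dataset for illustration
df = pd.DataFrame({'Age': [23, 31, 35, 29, 41, 38],
                   'Income': [32000, 45000, 50000, 41000, 62000, 58000]})
# Descriptive statistics: count, mean, std, min, quartiles, max
print(df.describe())
# Individual measures of central tendency and dispersion
print(df['Age'].mean(), df['Age'].median(), df['Age'].std())
# Simple correlation analysis as part of EDA
print(df.corr())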
 Data Visualization: Data visualization is the graphical representation of data to facilitate
understanding and communicate insights effectively. It involves creating visual representations, such
as charts, graphs, maps, or dashboards, to present data in a visually appealing and informative way.
 Benefits of Data Visualization:
o Easy comprehension: Visual representations make it easier to understand complex data
patterns and relationships.
o Insight discovery: Visualizations can help identify trends, outliers, correlations, and other
patterns that might not be apparent in raw data.
o Effective communication: Visualizations enable the clear and concise communication of
findings and insights to a wider audience.
 Common Data Visualization Techniques (a short Matplotlib sketch follows this list):
o Bar Charts and Histograms: Represent categorical or numerical data by displaying the
frequency or distribution of values.
o Line Charts: Ideal for showing trends and changes over time, typically used for time series or
sequential data.
o Scatter Plots: Depict the relationship between two numerical variables, showing how they
correlate or cluster.
o Pie Charts: Display parts of a whole, useful for illustrating proportions or percentages.
o Heatmaps: Present data in a grid-like format using colors to represent values, commonly used
for matrices or geographic data.
o Geographic Maps: Visualize data on a geographical map, showing regional or spatial patterns.
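The following Matplotlib sketch illustrates a few of the chart types above on small, made-up data; the values are purely illustrative.
import matplotlib.pyplot as plt
# Made-up data for illustration
categories = ['A', 'B', 'C']
counts = [10, 24, 17]
years = [2019, 2020, 2021, 2022]
sales = [120, 135, 160, 180]
fig, axes = plt.subplots(1, 3, figsize=(12, 3))
# Bar chart: frequency of categorical values
axes[0].bar(categories, counts)
axes[0].set_title('Bar chart')
# Line chart: trend over time
axes[1].plot(years, sales, marker='o')
axes[1].set_title('Line chart')
# Scatter plot: relationship between two numerical variables
axes[2].scatter(sales, [s * 0.6 + 5 for s in sales])
axes[2].set_title('Scatter plot')
plt.tight_layout()
plt.show()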
 Tools for Data Analysis and Visualization: Several tools and programming languages are commonly
used for data analysis and visualization, including:
o Python: Popular programming language with libraries such as pandas, NumPy, and Matplotlib
for data manipulation and visualization.
o R: Statistical programming language with packages like dplyr, ggplot2, and shiny for data
analysis and visualization.
o Tableau: A powerful data visualization tool that allows for interactive and dynamic dashboards
and reports.
o Power BI: Microsoft's business analytics tool that provides data visualization and interactive
reporting capabilities.
o Excel: Widely used spreadsheet software with built-in data analysis and visualization features.
Remember, data analysis and visualization are iterative processes, where insights gained from
visualization can lead to further analysis and refinement of the data. The ultimate goal is to transform
raw data into actionable insights that drive informed decision-making and problem-solving.
5.2. Types of data
Data can be classified into different types based on its nature and characteristics. The most common types
of data are:
 Numerical Data: Numerical data represents quantitative values and can be further categorized into
two subtypes:
o Continuous Data: Continuous data can take any value within a specific range. It is typically
measured on a continuous scale and can have decimal or fractional values. Examples include
height, weight, temperature, and time.
o Discrete Data: Discrete data consists of whole numbers or distinct values that cannot be
subdivided further. It represents countable or categorical data. Examples include the number of
students in a class, the number of cars in a parking lot, or the number of items sold.
 Categorical Data: Categorical data represents qualitative or categorical variables. It includes
distinct categories or groups without any inherent numerical meaning. Categorical data can be
further divided into two subtypes:
o Nominal Data: Nominal data represents categories with no inherent order or
hierarchy. Examples include gender (male/female), marital status
(single/married/divorced), or eye color (blue/brown/green).
o Ordinal Data: Ordinal data represents categories with a specific order or ranking.
The categories have a relative position or rank but may not have a fixed numerical
difference between them. Examples include education levels (high school,
college, graduate), rating scales (1-star, 2-star, 3-star), or satisfaction levels (low,
medium, high).
 Time Series Data: Time series data consists of observations collected over a
sequence of time intervals. It represents data points recorded at regular intervals, such
as daily, monthly, or yearly. Time series data is commonly used to analyze trends,
patterns, and seasonality over time. Examples include stock prices, weather data, or
website traffic over time.
 Textual Data: Textual data represents unstructured or semi-structured textual
information. It includes documents, articles, social media posts, emails, or any other
form of textual content. Analyzing textual data involves techniques such as text
mining, natural language processing (NLP), and sentiment analysis.
 Spatial Data: Spatial data represents information about geographic locations or
features on the Earth's surface. It includes coordinates, polygons, maps, or any data
associated with a specific location. Spatial data is used in various domains such as
geography, urban planning, environmental science, and GPS navigation systems.
 Binary Data: Binary data consists of only two possible values, typically represented as 0 and 1.
It is often used in computer science and digital systems, representing on/off states, true/false
conditions, or presence/absence of certain characteristics.
Understanding the type of data is crucial for selecting appropriate analysis techniques, visualization
methods, and statistical models. It helps determine the appropriate summary statistics, data
transformations, and inferential methods to apply when analyzing and interpreting the data.
5.3. Introduction to Data Pre-processing
Data pre-processing is a crucial step in the data analysis pipeline. It involves preparing raw data to
ensure it is in a suitable format for analysis. Data pre-processing aims to address common issues such as
missing values, outliers, inconsistent formats, and noise, among others. By performing data pre-
processing, you can enhance the quality of the data and improve the accuracy and effectiveness of
subsequent analysis and modeling.
Here are the key steps involved in data pre-processing:


 Data Cleaning:
o Handling Missing Values: Missing values can occur due to various reasons, such as data
collection errors or incomplete records. Common strategies for handling missing values
include:
 Deleting rows or columns with missing values: This approach is suitable when the
amount of missing data is small and will not significantly impact the analysis.
 Imputing missing values: Missing values can be replaced with estimated or calculated
values. Techniques like mean imputation, median imputation, mode imputation, or
advanced imputation methods (e.g., regression imputation, K-nearest neighbors
imputation) can be used.
o Dealing with Outliers: Outliers are data points that significantly deviate from the normal data
distribution. Outliers can be addressed by:
 Removing outliers: Outliers can be identified using statistical techniques (e.g., z-score,
box plots) and then removed from the dataset if they are deemed irrelevant or erroneous.
 Transforming outliers: For certain situations, transforming the data (e.g., applying
logarithmic transformation) can reduce the impact of outliers without removing them
entirely.
o Handling Noise: Noise refers to irrelevant or erroneous data that may arise due to
measurement errors or data collection issues. Techniques like smoothing, filtering, or using
algorithms (e.g., moving averages, median filtering) can help reduce noise in the data.
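As a small illustration of noise reduction, the sketch below smooths a hypothetical series with a moving average and a median filter using pandas rolling windows.
import pandas as pd
# Hypothetical noisy measurements (the spike at 30 is noise)
noisy = pd.Series([10, 12, 9, 14, 30, 11, 13, 12, 10, 11])
# Moving average over a window of 3 values
smoothed_mean = noisy.rolling(window=3, center=True).mean()
# Median filtering is more robust to isolated spikes
smoothed_median = noisy.rolling(window=3, center=True).median()
print(smoothed_mean)
print(smoothed_median)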
 Data Integration:
o Data integration involves combining data from multiple sources to create a unified dataset. This
step is essential when dealing with data collected from different databases, files, or formats.
Techniques such as merging, joining, or concatenating can be employed to integrate data
effectively.
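A minimal pandas sketch of data integration, merging and concatenating two hypothetical tables:
import pandas as pd
customers = pd.DataFrame({'CustomerID': [1, 2, 3],
                          'Name': ['Asha', 'Ravi', 'Meera']})
orders = pd.DataFrame({'CustomerID': [1, 2, 2],
                       'Amount': [250, 400, 150]})
# Join the two sources on the common key to build a unified dataset
merged = pd.merge(customers, orders, on='CustomerID', how='left')
print(merged)
# Concatenate rows collected from different files or sources
more_customers = pd.DataFrame({'CustomerID': [4], 'Name': ['Omar']})
combined = pd.concat([customers, more_customers], ignore_index=True)
print(combined)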
 Data Transformation:
o Normalization: Normalization ensures that numerical features are on a similar scale,
preventing one feature from dominating the analysis due to its larger values. Common
normalization techniques include:
o Z-score normalization: Transforming the values to have a mean of 0 and a standard deviation
of 1.
o Min-max scaling: Rescaling the values to a specified range, often between 0 and 1.
o Feature Scaling: Feature scaling is particularly useful when working with machine learning
algorithms that are sensitive to the scale of input features. Techniques such as standardization
or robust scaling can be applied to normalize the range of numerical features.
 Standardization (z-score normalization):
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
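# df_filled is assumed to be a DataFrame (e.g., after missing-value handling) with numeric 'Age' and 'Income' columns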
df_filled[['Age', 'Income']] = scaler.fit_transform(df_filled[['Age', 'Income']])
print("\nDataFrame after standardization:")
print(df_filled)
 Normalization (min-max scaling):
from sklearn.preprocessing import MinMaxScaler
min_max_scaler = MinMaxScaler()
df_filled[['Age', 'Income']] = min_max_scaler.fit_transform(df_filled[['Age', 'Income']])
print("\nDataFrame after min-max scaling:")
print(df_filled)
o Encoding Categorical Variables: Machine learning algorithms generally require numerical
input, so categorical variables need to be encoded. Common encoding techniques include:
o One-hot encoding: Representing each category as a binary feature column.
o Label encoding: Assigning a numerical label to each category.
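A short sketch of both encoding techniques on a hypothetical categorical column:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
df_cat = pd.DataFrame({'Gender': ['Male', 'Female', 'Female', 'Male']})
# One-hot encoding: each category becomes a binary feature column
one_hot = pd.get_dummies(df_cat, columns=['Gender'])
print(one_hot)
# Label encoding: each category is assigned an integer label
le = LabelEncoder()
df_cat['Gender_encoded'] = le.fit_transform(df_cat['Gender'])
print(df_cat)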
o Dimensionality Reduction: Dimensionality reduction techniques are employed to reduce the
number of features while retaining the most important information. This helps overcome the
curse of dimensionality and can improve the efficiency and interpretability of analysis. Popular
dimensionality reduction methods include (see the short PCA sketch after these items):
o Principal Component Analysis (PCA): Transforming the original features into a lower-
dimensional space using linear combinations of the original variables.
o Feature Selection: Selecting a subset of relevant features based on statistical techniques,
domain knowledge, or machine learning algorithms.
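A minimal PCA sketch with scikit-learn; the feature matrix here is randomly generated, and in practice features are usually standardized before applying PCA.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
# Hypothetical feature matrix with 100 samples and 4 features
X = np.random.rand(100, 4)
# Standardize, then project onto 2 principal components
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)
print(X_reduced.shape)                # (100, 2)
print(pca.explained_variance_ratio_)  # variance captured by each component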
 Data Discretization:
o Data discretization involves converting continuous data into discrete intervals or bins.
Discretization can be useful when working with algorithms that require categorical or ordinal
data. Techniques like equal-width binning, equal-frequency binning, or entropy-based binning
can be used.
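A brief sketch of equal-width and equal-frequency binning with pandas; the ages are illustrative.
import pandas as pd
ages = pd.Series([18, 22, 25, 31, 35, 42, 47, 55, 63, 70])
# Equal-width binning: four intervals of equal size
equal_width = pd.cut(ages, bins=4)
# Equal-frequency binning: roughly the same number of values per bin
equal_freq = pd.qcut(ages, q=4)
print(equal_width.value_counts().sort_index())
print(equal_freq.value_counts().sort_index())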
 Handling Imbalanced Data:
Imbalanced data occurs when the distribution of classes or categories in the dataset is skewed, with
one class being significantly more prevalent than others. Techniques for handling imbalanced data
include:
o Under-sampling: Randomly removing samples from the majority class to achieve a balanced
distribution.
o Oversampling: Creating synthetic samples in the minority class to balance the distribution.
o Class-weighting: Assigning higher weights to the minority class during model training to give
it more importance.
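A small sketch of two of these options with scikit-learn: random oversampling of the minority class using resample, and class weighting during model training. The dataset is synthetic.
import numpy as np
from sklearn.utils import resample
from sklearn.linear_model import LogisticRegression
# Synthetic imbalanced dataset: 90 samples of class 0, 10 samples of class 1
X = np.random.rand(100, 3)
y = np.array([0] * 90 + [1] * 10)
# Oversampling: resample the minority class up to the majority class size
X_min, y_min = X[y == 1], y[y == 1]
X_min_up, y_min_up = resample(X_min, y_min, replace=True, n_samples=90, random_state=42)
X_bal = np.vstack([X[y == 0], X_min_up])
y_bal = np.concatenate([y[y == 0], y_min_up])
# Class weighting: give the minority class more importance during training
clf = LogisticRegression(class_weight='balanced').fit(X, y)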
5.4. Handling Missing Values
Handling missing values is an essential part of data pre-processing. Missing values can occur due to
various reasons such as data collection errors, equipment failures, or survey non-responses. Dealing
with missing values appropriately is crucial to ensure the accuracy and reliability of data analysis. Here
are some common strategies for handling missing values:
 Deleting Rows or Columns: If the amount of missing data is small and doesn't significantly impact
the analysis, you can choose to delete the rows or columns with missing values. However, this approach
should be used cautiously, as it can lead to a loss of valuable information.
 Delete Columns
# Drop the 'Name' column
df = df.drop(columns=['Name'])
print("\nDataFrame after dropping 'Name' column:")
print(df)
 Delete Rows
# Drop rows with any missing values
df_dropped_rows = df.dropna()
print("\nDataFrame after dropping rows with any missing values:")
print(df_dropped_rows)
 Mean/Median/Mode Imputation: Replace missing values with the mean, median, or mode of the
available data for the respective feature. This method assumes that the missing values have a similar
distribution to the observed values.
 Mean Imputation
df_mean_imputed = df.copy()
df_mean_imputed['Age'] = df_mean_imputed['Age'].fillna(df_mean_imputed['Age'].mean())
print("\nDataFrame after mean imputation:")
print(df_mean_imputed)
 Median Imputation
df_median_imputed = df.copy()
df_median_imputed['Age'] = df_median_imputed['Age'].fillna(df_median_imputed['Age'].median())
print("\nDataFrame after median imputation:")
print(df_median_imputed)
 Mode Imputation
df_mode_imputed = df.copy()
df_mode_imputed['Gender'] = df_mode_imputed['Gender'].fillna(df_mode_imputed['Gender'].mode()[0])
print("\nDataFrame after mode imputation:")
print(df_mode_imputed)
 Forward/Backward Filling: Propagate the last known value forward or the next known value
backward to fill in missing values. This approach is suitable when missing values occur in sequences
or time series data.
 Forward Filling
df_ffill = df.copy()
df_ffill = df_ffill.ffill()
print("\nDataFrame after forward filling:")
print(df_ffill)
 Backward Filling
df_bfill = df.copy()
df_bfill = df_bfill.bfill()
print("\nDataFrame after backward filling:")
print(df_bfill)
 K-Nearest Neighbors (KNN) Imputation: Identify the K nearest neighbors based on available
features and use their values to impute missing values. KNN imputation works well when the missing
values are related to the values of their neighboring data points.
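 KNN Imputation (illustrative sketch): the snippet below is a minimal sketch using scikit-learn's KNNImputer on a small hypothetical DataFrame with numeric columns.
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer
df_knn = pd.DataFrame({'Age': [25, np.nan, 35, 40, np.nan],
                       'Income': [30000, 42000, np.nan, 55000, 61000]})
# Each missing value is filled using the mean of its 2 nearest neighbours
imputer = KNNImputer(n_neighbors=2)
df_knn_imputed = pd.DataFrame(imputer.fit_transform(df_knn), columns=df_knn.columns)
print("\nDataFrame after KNN imputation:")
print(df_knn_imputed)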
5.5. Handling Outliers and Inconsistencies
 Handling outliers and inconsistencies is an important step in data pre-processing to ensure the accuracy
and reliability of data analysis. Outliers are extreme values that deviate significantly from the normal
data pattern, while inconsistencies refer to data values that are illogical or contradictory. Here's a
detailed explanation of how to handle outliers and inconsistencies:
 Handling Outliers:
o Identify Outliers: Identifying outliers is an important step in data pre-processing to understand
the distribution of data and identify any extreme values that may significantly deviate from the
normal pattern. Here are several approaches to identifying outliers:
 Visual Inspection: Plot the data using techniques like box plots, histograms, or scatter plots to visually
identify potential outliers. Outliers may appear as points far away from the main distribution or as
values outside a certain range.
 Statistical Methods: Utilize statistical techniques such as z-scores or interquartile range (IQR) to
quantitatively identify outliers. Observations that fall beyond a specified threshold (e.g., z-score > 3
or outside 1.5 times the IQR) can be considered as outliers.
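The sketch below applies both approaches to a hypothetical numeric column, flagging values with a z-score above 3 or lying outside 1.5 times the IQR.
import numpy as np
import pandas as pd
# Mostly values near 12, with one extreme value (95)
values = pd.Series([12, 11, 13, 12, 14, 11, 12, 13, 11, 12,
                    13, 12, 11, 14, 12, 13, 11, 12, 13, 12, 95])
# Z-score method
z_scores = (values - values.mean()) / values.std()
z_outliers = values[np.abs(z_scores) > 3]
# IQR method
q1, q3 = values.quantile(0.25), values.quantile(0.75)
iqr = q3 - q1
iqr_outliers = values[(values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)]
print(z_outliers)
print(iqr_outliers)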
 Decide on the Treatment Approach:


o Remove Outliers: If the outliers are deemed irrelevant or caused by measurement errors, you
may choose to remove them from the dataset. However, be cautious about removing too many
outliers, as it can affect the representativeness of the data.
o Transform Outliers: Instead of removing outliers, you can transform their values to reduce
their impact. Common transformation techniques include logarithmic transformation, square
root transformation, or Winsorization (replacing extreme values with the nearest values within
a certain range).
o Apply the Chosen Approach: If you decide to remove outliers, you can delete the
corresponding data points. However, ensure that the removal does not introduce bias or
significantly alter the overall data distribution.
 If you choose to transform outliers, apply the appropriate transformation method to adjust the values
while maintaining their relative order and pattern.
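A brief sketch of the transformations mentioned above: a logarithmic transformation and Winsorization by clipping to the 5th and 95th percentiles. The values are illustrative.
import numpy as np
import pandas as pd
values = pd.Series([5, 7, 6, 8, 9, 7, 6, 120])  # 120 is an extreme value
# Log transformation compresses large values and reduces their impact
log_values = np.log1p(values)
# Winsorization: clip extreme values to the 5th/95th percentile values
lower, upper = values.quantile(0.05), values.quantile(0.95)
winsorized = values.clip(lower=lower, upper=upper)
print(log_values)
print(winsorized)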
 Handling Inconsistencies:
o Identify Inconsistencies:
 Perform Data Validation: Check for logical inconsistencies within the data. For
example, verify that age values are within a reasonable range, dates are in a valid
format, or categorical variables contain expected categories.
 Cross-Referencing: Cross-reference data across different sources or variables to
identify inconsistencies. For instance, compare customer addresses with postal code
databases to identify address discrepancies.
o Resolve Inconsistencies:
 Manual Inspection and Correction: Inspect the inconsistent data points and manually
correct them based on available information or expert knowledge.
 Imputation: If inconsistencies cannot be manually resolved, impute missing or
inconsistent values using appropriate imputation techniques (as discussed in the
previous section).
o Document Changes:
 Keep a record of the changes made during the inconsistency handling process. This
documentation ensures transparency and allows others to understand the data pre-
processing steps undertaken.
 When handling outliers and inconsistencies, it is important to consider the context and
domain knowledge. Understanding the data generation process and consulting subject
matter experts can aid in making informed decisions regarding outlier treatment and
resolving inconsistencies.
5.6. Introduction to machine learning


 Machine learning is a subfield of artificial intelligence (AI) that focuses on the development of
algorithms and models that enable computers to learn and make predictions or decisions
without being explicitly programmed. It involves training a computer system to automatically
learn patterns, relationships, and insights from data, and then use that knowledge to perform
tasks or make predictions.
 In traditional programming, developers write explicit instructions for a computer to follow.
However, in machine learning, the computer learns from examples and data, iteratively
improving its performance over time. This learning process can be categorized into three main
types of machine learning:
 Supervised Learning: In supervised learning, the algorithm is trained on labelled data, where
each data point is associated with a known target or output variable. The goal is to learn a
mapping function that can predict the output variable given new, unseen inputs. Examples of
supervised learning algorithms include linear regression, decision trees, random forests,
support vector machines (SVM), and neural networks.
 Unsupervised Learning: In unsupervised learning, the algorithm is trained on unlabelled data,
where there is no predefined target variable. The objective is to find patterns, structures, or
relationships within the data. Unsupervised learning can be used for tasks such as clustering
similar data points, dimensionality reduction, or anomaly detection. Common unsupervised
learning algorithms include k-means clustering, hierarchical clustering, principal component
analysis (PCA), and association rule mining.
 Reinforcement Learning: Reinforcement learning involves an agent that learns to make
decisions in an environment to maximize a cumulative reward. The agent interacts with the
environment, receives feedback in the form of rewards or penalties, and adjusts its actions
based on the received feedback. Reinforcement learning is commonly used in applications such
as game playing, robotics, and autonomous systems.
Machine learning projects typically go through the following steps:
 Data Collection: Gathering relevant data that represents the problem or task at hand. The
quality and quantity of data play a crucial role in the performance of machine learning models.
 Data Pre-processing: Cleaning, transforming, and preparing the data for analysis. This
includes handling missing values, dealing with outliers, encoding categorical variables, and
scaling or normalizing the data.
 Model Selection and Training: Choosing an appropriate machine learning algorithm and
training it on the labelled or unlabelled data. This involves splitting the data into training and
validation sets, feeding the data into the algorithm, and optimizing its parameters.
 Model Evaluation: Assessing the performance of the trained model using evaluation metrics
such as accuracy, precision, recall, or mean squared error, depending on the specific problem
and the type of algorithm used.
 Model Deployment: Once the model is trained and evaluated, it can be deployed to make
predictions or decisions on new, unseen data. This could involve integrating the model into a
larger system or application.
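A compact scikit-learn sketch of the training and evaluation steps described above, using a synthetic dataset; in a real project the data collection and pre-processing steps would come first.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Synthetic labelled data standing in for a collected, pre-processed dataset
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
# Split into training and validation sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Model selection and training (supervised learning)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
# Model evaluation on unseen data
predictions = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, predictions))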
 Machine learning has a wide range of applications across various industries, including finance,
healthcare, marketing, image and speech recognition, natural language processing, and
recommendation systems, to name just a few. It continues to advance and evolve, with new
algorithms and techniques being developed to tackle more complex problems and improve
performance.
 It's worth noting that machine learning requires careful consideration of data quality, feature
selection, model evaluation, and ethical considerations to ensure reliable and unbiased results.
The field of machine learning is constantly evolving, with ongoing research and development
focused on enhancing algorithms, handling large-scale datasets, and addressing interpretability
and fairness challenges.
