Introduction To Data Science-Compressed

The document provides an overview of data science, its evolution, and the stages involved in data processing. It highlights the importance of data pre-processing, applications across various industries, and the roles of professionals in the field. Additionally, it discusses future trends such as AutoML and the significance of clean data for accurate analytics and model performance.

Uploaded by

anshulsharma7162

Data Science and Data Pre-Processing
Showcased by Anshul Sharma, Anmol Sharma, Aditya Sikarwar
Please Slide
CONTENTS
Introduction to Data Science
Stages in a Data Science Project
Applications of Data Science
Data Pre-Processing Techniques
Challenges in Data Pre-Processing
Conclusion

Data Science is an interdisciplinary field that uses
scientific methods, processes, algorithms, and systems to
extract knowledge and insights from structured and
unstructured data.

1 Decision making: Enables data-driven decision making
2 AI applications: Powers artificial intelligence and machine learning applications
3 Pattern discovery: Helps uncover hidden patterns and trends
4 Efficiency: Improves operational efficiency across industries

EVOLUTION OF DATA SCIENCE
1 1960s: Early statistical analysis
2 1980s-1990s: Data mining concepts
3 2000s: Emergence of big data technologies
4 Current era: AI-driven analytics

ROLES IN DATA SCIENCE
•Data Scientist
•Data Analyst
•Data Engineer
•Machine Learning (ML) Engineer
•Business Analyst

Data collection: Gathering raw data from various sources
Sources: APIs, databases, web scraping, sensors
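
As a rough illustration of gathering raw data from more than one kind of source, the sketch below reads rows from a database and from CSV text using only the Python standard library. The `readings` table, the sensor names, and the helper functions are hypothetical stand-ins, not something taken from the slides.

```python
import csv
import io
import sqlite3


def collect_from_csv(text):
    """Parse CSV text into a list of dict rows (all values arrive as strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def collect_from_database(conn, query):
    """Run a query and return the rows as dicts keyed by column name."""
    cursor = conn.execute(query)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Hypothetical in-memory database standing in for a real data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)", [("s1", 20.5), ("s2", 21.0)]
)

# Hypothetical CSV export from a second source.
csv_text = "sensor,value\ns3,19.8\n"

raw_rows = collect_from_database(
    conn, "SELECT sensor, value FROM readings ORDER BY sensor"
) + collect_from_csv(csv_text)
```

Note that the CSV row carries its value as the string "19.8" while the database rows carry floats. Mixed types like this are one reason raw data usually needs pre-processing before analysis.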

PRE-PROCESSING
1 Data preparation: Cleaning and preparing data for analysis
2 Time consumption: Most time-consuming phase (≈60-80% of project time)

MODELING
1 Statistical techniques: Applying statistical and machine learning techniques
2 Algorithm training: Algorithm selection and training
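
One way to picture algorithm selection and training is to fit two candidate models on a training split and keep the one with the lower error on held-out data. The sketch below is a minimal example under that assumption. The data is made up, and the two candidates (a constant mean baseline and a least-squares line) are chosen only for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(model, xs, ys):
    """Average squared difference between predictions and true values."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data that follows y = 2x + 1 exactly.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [2 * x + 1 for x in xs]

# Hold out the last two points to compare the candidates.
train_x, test_x = xs[:6], xs[6:]
train_y, test_y = ys[:6], ys[6:]

# Training: each candidate is a (slope, intercept) pair fitted on the training split.
candidates = {
    "mean baseline": (0.0, sum(train_y) / len(train_y)),
    "linear fit": fit_line(train_x, train_y),
}

# Selection: keep the candidate with the lowest held-out error.
errors = {
    name: mean_squared_error(model, test_x, test_y)
    for name, model in candidates.items()
}
best = min(errors, key=errors.get)
```

In practice a library such as scikit-learn would typically handle both the fitting and the evaluation. The plain-Python version is used here only to keep the example self-contained.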

DEPLOYMENT
1 Model implementation: Implementing the model in production environments
2 Dashboard creation: Creating dashboards or APIs for end-users

APPLICATIONS OF DATA SCIENCE
1 Healthcare
•Disease prediction models
•Medical image analysis
•Drug discovery
•Personalized treatment plans

2 Finance
•Fraud detection
•Algorithmic trading
•Credit scoring
•Risk management

3 Marketing
•Customer segmentation
•Churn prediction
•Sentiment analysis
•Recommendation systems

4 Cybersecurity
•Anomaly detection
•Threat intelligence
•Network security monitoring
•Malware analysis

Cleaning
1 Handling missing values
2 Removing duplicates
3 Correcting inconsistencies
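
The three cleaning steps above can be sketched on a small list of records. The field names, the sample records, and the choice of median imputation are assumptions made for illustration. Other strategies, such as dropping incomplete rows, are equally valid.

```python
from statistics import median


def clean(records):
    """Return cleaned copies of the records; assumes at least one age is known."""
    # Correct inconsistencies: trim whitespace and unify the casing of city names.
    normalized = [
        {"name": r["name"].strip(), "city": r["city"].strip().title(), "age": r["age"]}
        for r in records
    ]

    # Remove duplicates, keeping the first occurrence of each record.
    seen = set()
    unique = []
    for r in normalized:
        key = (r["name"], r["city"], r["age"])
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # Handle missing values: fill unknown ages with the median of the known ones.
    fill = median(r["age"] for r in unique if r["age"] is not None)
    for r in unique:
        if r["age"] is None:
            r["age"] = fill
    return unique


# Hypothetical raw records with a duplicate, inconsistent casing, and a missing age.
records = [
    {"name": "Asha", "city": "delhi ", "age": 31},
    {"name": "Asha", "city": "Delhi", "age": 31},
    {"name": "Ravi", "city": "MUMBAI", "age": None},
    {"name": "Meera", "city": "Pune", "age": 25},
]
cleaned = clean(records)
```

Inconsistencies are corrected first so that "delhi " and "Delhi" are recognized as the same value before duplicates are removed.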

Integration
1 Data Combination: Combining data from multiple sources
2 Schema Resolution: Resolving schema conflicts
3 Entity Matching: Entity resolution
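
A minimal sketch of these three integration steps, assuming two hypothetical sources (`crm` and `billing`) that describe the same customers under different column names. The matching rule here is deliberately crude (lowercase, alphanumeric characters only). Real entity resolution usually needs fuzzier matching.

```python
def normalize_key(name):
    """Crude entity-matching key: lowercase, keeping only letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


# Two hypothetical sources with conflicting schemas.
crm = [{"customer_name": "Acme Corp.", "email": "sales@acme.example"}]
billing = [
    {"client": "ACME corp", "balance_usd": 1200.0},
    {"client": "Globex", "balance_usd": 50.0},
]

# Schema resolution: map source-specific column names onto one shared schema.
CRM_MAP = {"customer_name": "name", "email": "email"}
BILLING_MAP = {"client": "name", "balance_usd": "balance"}


def rename(row, mapping):
    """Rewrite a row's keys using the given column mapping."""
    return {mapping[key]: value for key, value in row.items()}


def integrate(sources):
    """Combine rows from all sources into one record per matched entity."""
    merged = {}
    for rows, mapping in sources:
        for row in rows:
            unified = rename(row, mapping)
            key = normalize_key(unified["name"])
            entity = merged.setdefault(key, {})
            for field, value in unified.items():
                # The first value seen for a field wins.
                entity.setdefault(field, value)
    return merged


customers = integrate([(crm, CRM_MAP), (billing, BILLING_MAP)])
```

Here "Acme Corp." and "ACME corp" collapse into a single customer that carries both the email from one source and the balance from the other.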

Transformation
1 Normalization
2 Aggregation
3 Feature engineering
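
The sketch below shows one simple form of each transformation: min-max normalization, a grouped sum as the aggregation, and a derived `revenue` column as the engineered feature. The `orders` data and the column names are made up for the example.

```python
from collections import defaultdict


def min_max_normalize(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def aggregate_total(rows, group_key, value_key):
    """Sum value_key over rows that share the same group_key."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_key]] += row[value_key]
    return dict(totals)


# Hypothetical order data.
orders = [
    {"customer": "a", "price": 10.0, "quantity": 2},
    {"customer": "b", "price": 30.0, "quantity": 1},
    {"customer": "a", "price": 50.0, "quantity": 1},
]

# Feature engineering: derive a new column from existing ones.
for order in orders:
    order["revenue"] = order["price"] * order["quantity"]

# Normalization: put prices on a common 0-to-1 scale.
scaled_prices = min_max_normalize([order["price"] for order in orders])

# Aggregation: total revenue per customer.
revenue_by_customer = aggregate_total(orders, "customer", "revenue")
```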

Reduction
Reduction: Dimensionality reduction
Sampling: Sampling techniques
Selection: Feature selection
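
Two of these ideas are easy to sketch with the standard library: random sampling of rows, and feature selection with a variance threshold (dropping columns that never change). Dimensionality reduction methods such as PCA are not shown here. The column names and the threshold are assumptions for the example.

```python
import random
from statistics import pvariance


def sample_rows(rows, k, seed=0):
    """Simple random sample of k rows without replacement (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(rows, k)


def select_features(rows, min_variance):
    """Keep only the columns whose population variance exceeds min_variance."""
    names = list(rows[0].keys())
    return [n for n in names if pvariance([r[n] for r in rows]) > min_variance]


# Hypothetical dataset in which one column carries no information.
rows = [
    {"height": 150 + i, "constant_flag": 1, "weight": 50 + 2 * i}
    for i in range(10)
]

kept = select_features(rows, min_variance=0.0)
subset = sample_rows(rows, k=4)
reduced = [{name: row[name] for name in kept} for row in subset]
```

The result has fewer rows (4 instead of 10) and fewer columns (the constant column is dropped), which is the point of the reduction step.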

Discretization
1 Data processing: Converting continuous attributes to discrete intervals
2 Algorithm boost: Improving algorithm efficiency
3 Model clarity: Enhancing interpretability
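
Converting a continuous attribute to discrete intervals can be sketched with equal-width binning, which is only one of several possible schemes (equal-frequency binning is a common alternative). The ages and the labels below are made up for illustration.

```python
def equal_width_bins(values, n_bins):
    """Assign each value a bin index from 0 to n_bins - 1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values are identical, so everything falls into the first bin.
        return [0 for _ in values]
    # The maximum value is clamped into the last bin instead of opening a new one.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical continuous attribute and interval labels.
ages = [18, 22, 35, 47, 51, 64, 78]
labels = ["young", "middle", "senior"]

bins = equal_width_bins(ages, n_bins=3)
age_groups = [labels[b] for b in bins]
```

With a range of 18 to 78 and three bins, each interval is 20 years wide, so the labels are easier to read than the raw numbers.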

Future Trends
AutoML: Automated machine learning (AutoML)
AI cleaning: AI-powered data cleaning
Real-time: Real-time data processing
Data quality: Increased focus on data quality

Importance of Clean Data
1 Accurate analytics: Foundation for accurate analytics
2 Model performance: Critical for model performance
3 Reliable insights: Ensures reliable business insights
4 Error reduction: Reduces downstream errors and costs

Over
