PARUL INSTITUTE OF ENGINEERING & TECHNOLOGY
FACULTY OF ENGINEERING & TECHNOLOGY
PARUL UNIVERSITY
Subject: Foundations of Data Science
Unit 1: Introduction to Data Science and Analytics
Artificial Intelligence & Data Science
Outline
• Data Science Basics: Definition and Scope
• Data Science Process and Workflow
• Business Intelligence vs Data Science
• Essential Skills and Tools for Data Scientists
• Prerequisites for a Data Science Career
Introduction to Data Science and Analytics
Definition:
Data Science is an interdisciplinary field that employs scientific methods,
processes, algorithms, and systems to extract knowledge and insights from
data. It integrates principles from mathematics, statistics, computer science,
and domain-specific expertise to uncover meaningful patterns and make
data-driven decisions.
Scope
• Predictive Analytics: Anticipating future outcomes using historical data.
• Descriptive Analytics: Summarizing past data to identify trends and insights.
• Prescriptive Analytics: Recommending actions based on data-driven
predictions.
• Real-world Applications: Examples include improving patient outcomes in
healthcare, optimizing supply chains in logistics, enabling personalized
recommendations in e-commerce, and enhancing user experiences in
entertainment platforms.
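The three kinds of analytics above can be sketched on a toy example. This is a minimal illustration using only the Python standard library; the sales figures and the 10% safety margin are made-up assumptions, not data from the course.

```python
from statistics import mean

# Hypothetical monthly unit sales (illustrative data only).
sales = [100, 110, 120, 130, 140, 150]
months = list(range(1, len(sales) + 1))

# Descriptive: summarize what has already happened.
average_sales = mean(sales)

# Predictive: fit a least-squares line y = intercept + slope * x
# and extrapolate it one month ahead.
x_bar, y_bar = mean(months), mean(sales)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(months, sales)) / sum(
    (x - x_bar) ** 2 for x in months
)
intercept = y_bar - slope * x_bar
forecast_next = intercept + slope * (len(sales) + 1)

# Prescriptive: turn the prediction into a recommended action.
# The 10% safety margin is an assumed business rule.
recommended_stock = round(forecast_next * 1.10)

print("Average so far:", average_sales)
print("Forecast for next month:", forecast_next)
print("Recommended stock level:", recommended_stock)
```

The same data answers three different questions: what happened, what is likely to happen, and what to do about it.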
Data Science Process and Workflow
Phases:
1. Problem Definition (Business Understanding)
Objective: Understand the business problem or research question that data
science will address.
Key Activities:
• Stakeholder consultations to identify goals.
• Translating business objectives into analytical questions.
Methods:
• Stakeholder interviews, brainstorming, and requirement gathering.
2. Data Understanding or Collection
Objective: Gather the data required for analysis, either from existing
sources or by creating new datasets.
Key Activities:
• Identifying data sources: Databases, APIs, or scraping.
• Collecting structured, semi-structured, and unstructured data.
Tools:
• SQL, Python (requests, BeautifulSoup for web scraping), API clients.
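A minimal sketch of collecting data from two kinds of sources is shown below. To keep it self-contained, an in-memory SQLite database stands in for a production database and a literal JSON string stands in for an API response; in practice the JSON would typically come from a call such as `requests.get(url).json()`. All table names, fields, and values are hypothetical.

```python
import json
import sqlite3

# Structured source: an in-memory SQLite database (stand-in for a real one).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "asha", 250.0), (2, "ravi", 120.5), (3, "asha", 75.0)],
)
db_rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
conn.close()

# Semi-structured source: a JSON string (stand-in for an API response body).
api_response = (
    '{"customers": [{"name": "asha", "city": "Vadodara"}, '
    '{"name": "ravi", "city": "Surat"}]}'
)
customers = json.loads(api_response)["customers"]
city_by_customer = {c["name"]: c["city"] for c in customers}

# Combine both sources into one list of records for the preparation phase.
records = [
    {"customer": name, "total_amount": total, "city": city_by_customer.get(name)}
    for name, total in db_rows
]
print(records)
```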
3. Data Preparation (Data Wrangling/Munging)
Objective: Clean, transform, and preprocess raw data to make it usable for
analysis.
Key Activities:
• Handling missing values, outliers, and inconsistent formats.
• Transforming data into suitable formats: Encoding categorical variables,
scaling numerical data.
• Data integration: Merging and concatenating datasets.
Tools:
• Pandas, NumPy, SQL, and ETL pipelines.
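The preparation activities above can be sketched with Pandas and NumPy. The dataset, the column names, and the choice of median imputation, IQR clipping, and min-max scaling are illustrative assumptions; other strategies are equally valid depending on the data.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with a missing value, an outlier, and inconsistent text.
raw = pd.DataFrame(
    {
        "customer_id": [1, 2, 3, 4, 5],
        "age": [25.0, 32.0, np.nan, 41.0, 29.0],
        "income": [30000.0, 42000.0, 38000.0, 1000000.0, 35000.0],
        "city": ["Surat", "surat ", "Vadodara", "Vadodara", "SURAT"],
    }
)
segments = pd.DataFrame(
    {"customer_id": [1, 2, 3, 4, 5], "segment": ["A", "B", "A", "B", "A"]}
)

df = raw.copy()

# Missing values: fill the missing age with the column median.
df["age"] = df["age"].fillna(df["age"].median())

# Inconsistent formats: normalize whitespace and letter case.
df["city"] = df["city"].str.strip().str.title()

# Outliers: clip income to the 1.5 * IQR fences.
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
df["income"] = df["income"].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

# Scaling: min-max scale income to the range [0, 1].
income_range = df["income"].max() - df["income"].min()
df["income_scaled"] = (df["income"] - df["income"].min()) / income_range

# Integration: merge a second dataset on the shared key.
df = df.merge(segments, on="customer_id", how="left")

# Encoding: one-hot encode the categorical columns.
df = pd.get_dummies(df, columns=["city", "segment"])
print(df)
```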
4. Modeling
Objective: Build predictive or descriptive models to analyze data.
Key Activities:
• Choosing appropriate algorithms: Regression, classification, clustering, etc.
• Training and validating models.
• Optimizing model parameters for performance.
Tools:
• Python (scikit-learn, TensorFlow, PyTorch), R.
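A minimal modeling sketch with scikit-learn follows. It assumes a classification problem, uses synthetic data in place of a prepared dataset, and picks logistic regression and a small grid of `C` values purely for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data stands in for a cleaned, prepared dataset.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, random_state=0
)

# Hold out a test set that the model never sees during training or tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Choose an algorithm and tune its regularization strength C
# with 5-fold cross-validation on the training set.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# The best estimator is refit on the full training set by default.
best_model = search.best_estimator_
test_accuracy = best_model.score(X_test, y_test)
print("Best parameters:", search.best_params_)
print("Held-out accuracy:", round(test_accuracy, 3))
```

Keeping the test set out of the grid search gives a less optimistic estimate of how the model would perform on new data.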
5. Exploratory Data Analysis (EDA)
Objective: Uncover patterns, trends, and anomalies in the data to inform
subsequent steps.
Key Activities:
• Summary statistics: Mean, median, variance.
• Visualizing data distributions, correlations, and relationships.
• Hypothesis testing and preliminary insights.
Tools:
• Python (Matplotlib, Seaborn), R (ggplot2), Tableau.
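The numeric side of EDA can be sketched with Pandas. The dataset below is hypothetical, and the two-standard-deviation rule for flagging anomalies is an assumed threshold, not a fixed standard.

```python
import pandas as pd

# Hypothetical student data (illustrative values only).
df = pd.DataFrame(
    {
        "hours_studied": [1, 2, 3, 4, 5, 6, 7, 8],
        "exam_score": [52, 55, 61, 64, 70, 74, 79, 85],
        "absences": [9, 8, 8, 6, 5, 4, 2, 1],
    }
)

# Summary statistics: mean, median, and (sample) variance per column.
summary = df.agg(["mean", "median", "var"])

# Relationships: pairwise Pearson correlations between columns.
correlations = df.corr()

# Anomalies: flag scores more than 2 standard deviations from the mean.
z_scores = (df["exam_score"] - df["exam_score"].mean()) / df["exam_score"].std()
anomalies = df[z_scores.abs() > 2]

print(summary)
print(correlations)
print("Rows flagged as anomalies:", len(anomalies))
```

For the visual side, the same DataFrame can be passed to Matplotlib or Seaborn, for example as a histogram per column or a scatter plot of two correlated columns.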
6. Model Evaluation
Objective: Assess the model's accuracy and reliability using evaluation
metrics.
Key Activities:
• Selecting metrics: Accuracy, precision, recall, F1-score, RMSE.
• Cross-validation to test model robustness.
• Iterative improvement based on feedback.
Tools:
• scikit-learn, R (caret), PyCaret.
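To make the metrics above concrete, the sketch below computes them from their definitions in plain Python for a binary classifier and a small regression example. The labels and predictions are made up; in practice, scikit-learn functions such as `precision_score`, `recall_score`, `f1_score`, and `cross_val_score` would normally be used instead of hand-written code.

```python
import math


def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def rmse(y_true, y_pred):
    """Root mean squared error for numeric predictions."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical labels and predictions.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(labels, predictions)
regression_error = rmse([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])

print(metrics)
print("RMSE:", regression_error)
```

Accuracy alone can mislead on imbalanced data, which is why precision, recall, and F1 are usually reported alongside it.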
7. Communication of Results
Objective: Present findings to stakeholders clearly and persuasively.
Key Activities:
• Data visualization to highlight insights and trends.
• Preparing reports and dashboards for non-technical audiences.
• Explaining the implications of the findings.
Tools:
• Tableau, Power BI, Matplotlib, Seaborn.
8. Deployment and Maintenance
Objective: Integrate the model into operational workflows or products.
Key Activities:
• Building APIs for model integration.
• Monitoring model performance over time.
• Updating the model as new data becomes available.
Tools:
• Flask, FastAPI, Docker, Kubernetes.
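A minimal sketch of exposing a model through an API with Flask is shown below. The `predict_price` function is a hypothetical stand-in for a trained model, and the `/predict` route and the `area_sqft` field are assumed names chosen for illustration.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_price(area_sqft: float) -> float:
    """Stand-in for a trained model: a fixed linear rule with made-up coefficients."""
    return 50000.0 + 120.0 * area_sqft


@app.route("/predict", methods=["POST"])
def predict():
    # Parse the JSON body; fall back to an empty dict if it is missing or invalid.
    payload = request.get_json(silent=True) or {}
    if "area_sqft" not in payload:
        return jsonify(error="area_sqft is required"), 400
    try:
        area = float(payload["area_sqft"])
    except (TypeError, ValueError):
        return jsonify(error="area_sqft must be a number"), 400
    return jsonify(prediction=predict_price(area))
```

A client would send a POST request with a JSON body such as `{"area_sqft": 1000}` and receive the prediction as JSON. In a real deployment, the stand-in function would load a saved model, and the service would typically be packaged in a Docker image and monitored after release.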
Business Intelligence vs Data Science
Definition and Scope
BI
• BI involves collecting, organizing, and analyzing historical business data
to provide actionable insights.
• Focuses primarily on descriptive and diagnostic analytics to answer “What
happened?” and “Why did it happen?”.
• Aims to support decision-making through reports, dashboards, and
visualizations.
DS
• Data Science is a broader field that uses statistical methods, machine
learning, and advanced analytics to uncover patterns, predict future trends,
and optimize processes.
• Answers “What will happen?” and “How can we make it happen?” through
predictive and prescriptive analytics.
• Encompasses a full workflow, from data extraction and wrangling to modeling
and deployment.
Techniques and Tools
BI
Techniques:
• Descriptive statistics and summary reports.
• Dashboard creation and visualization.
• OLAP (Online Analytical Processing) for multidimensional analysis.
Tools:
• Tableau, Power BI, QlikView, Looker.
• SQL-based reporting tools.
DS
Techniques:
• Advanced statistical modeling.
• Machine Learning and Artificial Intelligence.
• Data Mining and Pattern Recognition.
Tools:
• Python (Pandas, scikit-learn, TensorFlow).
• R (caret, ggplot2).
• Jupyter Notebooks, Apache Spark.
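The OLAP-style multidimensional analysis listed under BI techniques can be approximated in Pandas with a pivot table. This is only a small-scale sketch on made-up sales data; dedicated OLAP engines and BI tools perform the same kind of roll-up and slicing on much larger datasets.

```python
import pandas as pd

# Hypothetical sales facts with three dimensions: region, product, quarter.
sales = pd.DataFrame(
    {
        "region": ["West", "West", "East", "East", "West", "East"],
        "product": ["Laptop", "Phone", "Laptop", "Phone", "Laptop", "Phone"],
        "quarter": ["Q1", "Q1", "Q1", "Q1", "Q2", "Q2"],
        "revenue": [1200, 800, 1000, 600, 1500, 700],
    }
)

# Roll-up: total revenue by region and quarter, summed over products.
cube = sales.pivot_table(
    index="region", columns="quarter", values="revenue", aggfunc="sum", fill_value=0
)

# Slice: fix one dimension (quarter = Q1) and summarize by product.
q1_by_product = sales[sales["quarter"] == "Q1"].groupby("product")["revenue"].sum()

print(cube)
print(q1_by_product)
```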
Workflow
BI
1. Data Collection from internal systems (ERP, CRM) or external sources.
2. Data Transformation and Cleaning.
3. Dashboard and Report Development for decision-makers.
4. Monitoring KPIs (Key Performance Indicators).
DS
1. Define the Problem or Research Question.
2. Data Acquisition from structured or unstructured sources.
3. Data Wrangling and EDA.
4. Model Building using algorithms and statistical methods.
5. Deployment and Continuous Improvement.
Applications
BI
• Performance tracking (e.g., sales, revenue growth).
• Customer segmentation for marketing.
• Operational efficiency dashboards.
DS
• Predictive modeling (e.g., sales forecasting, demand prediction).
• Natural Language Processing (e.g., chatbots, sentiment analysis).
• Recommendation Systems (e.g., Netflix, Amazon).
Aspect   | Business Intelligence (BI)                       | Data Science
Purpose  | Focused on reporting and descriptive analytics.  | Emphasis on predictive and prescriptive analytics.
Approach | Retrospective, analyzing historical data.        | Forward-looking, focusing on future trends.
Methods  | Querying, reporting, and dashboards.             | Advanced modeling, simulations, and algorithms.
Tools    | Dashboards (Power BI, Tableau), SQL.             | Machine learning frameworks (TensorFlow, scikit-learn).
Outcome  | Provides insights for informed decision-making.  | Generates solutions, predictions, and optimizations.
Essential Skills and Tools for Data Scientists
Technical Skills:
• Programming Languages: Proficiency in Python, R, and SQL.
• Statistics: Deep understanding of descriptive and inferential statistics.
• Data Manipulation: Expertise in using libraries like Pandas and NumPy.
• Machine Learning: Familiarity with supervised, unsupervised, and
reinforcement learning techniques.
• Data Visualization: Ability to create compelling visualizations with
tools like Matplotlib, Seaborn, Tableau, and Power BI.
• Big Data Technologies: Knowledge of Hadoop, Spark, and other
distributed computing systems.
• Cloud Computing: Understanding of platforms like AWS, Azure, and
Google Cloud for deploying scalable solutions.
Soft Skills:
• Communication: Translating technical findings into actionable
business insights.
• Problem-Solving: Analytical thinking to address complex
challenges.
• Collaboration: Working effectively within cross-functional teams.
Prerequisites for a Data Science Career
Educational Background:
• A degree in computer science, statistics, mathematics, engineering,
or related fields is often required.
• Advanced degrees (Master's or Ph.D.) can be beneficial for roles
involving research or specialized expertise.
Technical Knowledge:
• Programming Skills: Proficiency in Python, R, or Java is fundamental.
• Mathematical Foundations: A solid grasp of linear algebra, calculus,
and probability.
• Data Handling: Familiarity with SQL and NoSQL databases.
Certifications and Courses:
• Online platforms like Coursera, edX, and Udemy offer relevant
courses.
• Certifications from organizations such as Google, Microsoft, or IBM
can add value.
Practical Experience:
• Participate in internships, hackathons, or projects to gain hands-on
experience.
• Contribute to open-source data science projects or build a portfolio
showcasing your work.
Interpersonal Skills:
• Adaptability to new tools and techniques.
• Strong presentation skills to convey findings effectively.
• Curiosity and a continuous learning mindset to keep up with rapid
advancements in the field.
QUESTIONS
1. Define data science and explain how it differs from traditional data analysis
techniques.
2. List and describe the key steps in the data science workflow. How does each step
contribute to solving a real-world problem?
3. Compare Business Intelligence (BI) and Data Science in terms of objectives,
methods, and typical outcomes. Provide one example where both are used together.
4. What are the essential technical and analytical skills required for a career in data
science? Why are these skills important?
5. Given a dataset with missing values and outliers, describe the steps you would take
to preprocess and prepare it for analysis.
Thank You!!!