Q1. Explain Data Science Process Along With Detailed Diagram

The Data Science Process consists of sequential steps including setting research goals, retrieving data, preparing data, exploring data, modeling, and presenting results. Data preparation involves sub-phases such as data cleansing, integration, and transformation to ensure data quality and suitability for analysis. Various data exploration techniques are employed to understand data patterns, while data science has applications across commercial, governmental, non-profit, and academic sectors.

Uploaded by

Raj
Q1. Explain data science process along with detailed diagram.

Data Science Process

The Data Science Process involves several important steps to solve a problem using data. These
steps are followed in a sequence to ensure useful and reliable results for a business or
organization.

1. Setting the Research Goal

In this step, we define what the project is about. A project charter is prepared, which includes:

• The problem to be solved,

• The benefits to the business,

• What data and resources are needed,

• Timetable and expected outcomes.

2. Retrieving Data

Once the goal is set, the required data is collected. The data may come from:

• Company databases,

• Excel files,

• External sources like third-party services.

We also check if the data is accessible and of good quality.

3. Data Preparation

This step includes:

• Data Cleaning: Removing errors or wrong values,

• Data Integration: Combining data from multiple sources,

• Data Transformation: Converting data into a suitable format for analysis.

4. Data Exploration (EDA – Exploratory Data Analysis)

In this step, we try to understand the data better using:

• Descriptive statistics,
• Charts and graphs,

• Identifying trends and outliers.

This helps in making better decisions in the next step.
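
As a small illustration of spotting outliers, here is a minimal sketch that uses only Python's standard library. The order counts are made-up sample data, and the 1.5 × IQR rule is one common convention assumed here, not a method prescribed by these notes.

```python
import statistics

def iqr_outliers(values):
    """Return values lying more than 1.5 * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical daily order counts; one value looks suspicious.
orders = [12, 14, 13, 15, 11, 14, 13, 95, 12, 16]

print("mean:", statistics.mean(orders))      # pulled upward by the extreme value
print("median:", statistics.median(orders))  # robust to the extreme value
print("outliers:", iqr_outliers(orders))
```

Comparing the mean with the median is a quick check: a large gap between them often hints at outliers or a skewed distribution.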

5. Data Modelling (Model Building)

Here, we build models using machine learning, statistics, or other techniques. This includes:

• Choosing the right method,

• Training and testing the model,

• Improving it based on performance.
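
The train-and-test idea above can be sketched in plain Python. This is only an illustration under stated assumptions: the data is synthetic (a noisy line y = 2x + 1), the model is a simple least-squares line, and the helper names `fit_line` and `mean_abs_error` are hypothetical.

```python
import random

def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_abs_error(model, points):
    """Average absolute difference between predictions and actual values."""
    slope, intercept = model
    return sum(abs(y - (slope * x + intercept)) for x, y in points) / len(points)

# Synthetic data: y = 2x + 1 plus a little random noise (assumed for the demo).
rng = random.Random(42)
data = [(x, 2 * x + 1 + rng.uniform(-1, 1)) for x in range(20)]

# Hold out 25% of the data so the model is tested on points it has not seen.
rng.shuffle(data)
train, test = data[:15], data[15:]

model = fit_line(train)
print("slope, intercept:", model)
print("test error:", mean_abs_error(model, test))
```

If the test error is much larger than the training error, the model is probably overfitting and needs to be improved.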

6. Presentation and Automation

The final step is to present the results to the business through:

• Reports,

• Presentations, or

• Dashboards.

Sometimes, the whole process is automated so that it can be used again in future projects.

Q2) Write a short note on the following sub-phases of Data Preparation

1. Data Cleansing

2. Data Integration

3. Data Transformation

Sub-phases of Data Preparation

1. Data Cleansing

• This step involves correcting or removing incorrect, missing, or duplicate values in the dataset.

• It ensures that the data is accurate, consistent, and reliable for analysis.

• Example: Fixing spelling errors, filling missing values, or removing outliers.
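
A minimal sketch of these cleansing tasks in plain Python is given below. The records, the `SPELLING_FIXES` table, and the choice to fill a missing age with the median are all assumptions made for illustration.

```python
import statistics

# Hypothetical raw records: a misspelled city, a missing age, a duplicate row.
raw = [
    {"id": 1, "city": "Mumbai", "age": 34},
    {"id": 2, "city": "Mumbay", "age": None},
    {"id": 3, "city": "Pune", "age": 29},
    {"id": 3, "city": "Pune", "age": 29},
]
SPELLING_FIXES = {"Mumbay": "Mumbai"}  # hypothetical correction table

def cleanse(records):
    # 1. Remove duplicates: keep the first record seen for each id.
    unique = {}
    for record in records:
        unique.setdefault(record["id"], record)
    rows = list(unique.values())

    # 2. Fill missing ages with the median of the known ages.
    known_ages = [r["age"] for r in rows if r["age"] is not None]
    fill_age = statistics.median(known_ages)

    # 3. Fix known spelling errors while building the cleaned rows.
    return [
        {
            "id": r["id"],
            "city": SPELLING_FIXES.get(r["city"], r["city"]),
            "age": fill_age if r["age"] is None else r["age"],
        }
        for r in rows
    ]

cleaned = cleanse(raw)
for row in cleaned:
    print(row)
```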

2. Data Integration

• In this step, data from multiple sources is combined into a single dataset.

• It helps in creating a complete and unified view of the data.

• Example: Merging customer details from different departments like sales and support.
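
The sales and support example can be sketched as follows. The field names (`customer_id`, `total_spent`, `open_tickets`) and the `integrate` helper are hypothetical; real projects usually use a database join or a data-frame library for this.

```python
# Hypothetical records held by two different departments.
sales = [
    {"customer_id": 1, "total_spent": 1200},
    {"customer_id": 2, "total_spent": 450},
]
support = [
    {"customer_id": 1, "open_tickets": 2},
    {"customer_id": 3, "open_tickets": 1},
]

def integrate(*sources):
    """Combine record lists into one record per customer_id (an outer join)."""
    merged = {}
    for source in sources:
        for row in source:
            merged.setdefault(row["customer_id"], {}).update(row)
    return [merged[key] for key in sorted(merged)]

customers = integrate(sales, support)
for customer in customers:
    print(customer)
```

Customers that appear in only one source keep only the fields that source provides, so the missing fields still have to be handled during cleansing.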

3. Data Transformation

• This process involves converting data into a suitable format for analysis or modeling.

• It includes tasks like normalization, scaling, encoding categorical data, etc.

• Example: Changing date formats, converting text to numbers, or scaling values between
0 and 1.
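
The three examples above can be sketched in plain Python. The function names are hypothetical, and the input date format (day/month/year) is an assumption.

```python
from datetime import datetime

def min_max_scale(values):
    """Scale numbers into the range 0 to 1 (assumes not all values are equal)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(values):
    """Encode categories as 0/1 columns, one column per category (sorted)."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

def to_iso_date(text, fmt="%d/%m/%Y"):
    """Convert a date string from the given format to YYYY-MM-DD."""
    return datetime.strptime(text, fmt).date().isoformat()

print(min_max_scale([10, 20, 40]))
print(one_hot(["red", "green", "red"]))  # columns are: green, red
print(to_iso_date("25/12/2024"))
```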

Q3) Write a note on Data exploration techniques.


Data Exploration, also known as Exploratory Data Analysis (EDA), is the process of
understanding the data before building models. It helps identify patterns, trends, and errors in
the data.

1. Simple Graphs

• These are basic visualizations used to understand the distribution and patterns in data.

• Examples: Bar charts, Histograms, Pie charts, Line graphs.

2. Combined Graphs

• These graphs combine multiple data variables in one visual.

• Useful to study relationships or comparisons.

• Example: Scatter plot with trend lines, or box plots with groupings.

3. Link and Brush

• An interactive technique used in data visualization.

• When you select data in one graph, the corresponding data is highlighted (brushed) in
another graph.

• Helps to explore relationships across multiple plots.

4. Nongraphical Techniques

• These are non-visual methods of exploring data.

• Includes summary statistics, like:

o Mean, Median, Mode

o Standard Deviation
o Skewness and Kurtosis

• Helps in understanding data behavior numerically.
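
These summary statistics can be computed with Python's standard library. The sketch below assumes population formulas for standard deviation, skewness, and excess kurtosis (other definitions exist), and the `summarize` helper is hypothetical.

```python
import statistics

def summarize(values):
    """Return common summary statistics for a list of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    std_dev = statistics.pstdev(values)  # population standard deviation
    z = [(v - mean) / std_dev for v in values]  # standardized values
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "std_dev": std_dev,
        "skewness": sum(t ** 3 for t in z) / n,      # > 0 means a longer right tail
        "kurtosis": sum(t ** 4 for t in z) / n - 3,  # excess kurtosis (normal = 0)
    }

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample data
for name, value in summarize(scores).items():
    print(name, "=", value)
```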

Q4) List the benefits and uses of Data Science and Big Data.

1. Commercial Applications

• Customer Insights: Understands user behavior to improve services (e.g., Google AdSense).

• Advertising: Delivers personalized ads in real-time (e.g., MaxPoint).

• Human Resources: Helps in candidate screening and employee mood analysis.

• Finance: Predicts markets, evaluates risks, and automates trades.

2. Governmental Applications

• Fraud Detection: Detects fraud and criminal activities.

• Public Data Sharing: Platforms like Data.gov offer open access to data.

• Surveillance: Monitors individuals and gathers intelligence (e.g., NSA).

3. Non-profit Applications

• Fundraising: Boosts campaigns using data (e.g., WWF).

• Social Impact: Aids NGOs in using data for social good (e.g., DataKind).

4. Academic Applications

• Research: Supports research and improves student experience.

• Online Learning: MOOCs use data to enhance e-learning (e.g., Coursera).

Q5) List and explain the Facets/Types of Data in Data Science.


1. Structured Data

• Follows a defined data model and fits into rows and columns (e.g., Excel, SQL
databases).

• Easy to store, manage, and query using SQL.

• Some structured data, such as hierarchical data, is harder to store and query in a traditional relational database.
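
As a small illustration of structured data, the sketch below stores rows and columns in an in-memory SQLite database and queries them with SQL. The `customers` table and its contents are made up for the example.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ravi", "Mumbai"), (3, "Meera", "Pune")],
)

# Because every row follows the same model, a simple SQL query can filter it.
rows = conn.execute(
    "SELECT name FROM customers WHERE city = ? ORDER BY name", ("Pune",)
).fetchall()
print(rows)
conn.close()
```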

2. Unstructured Data

• Does not follow a fixed format or model.

• Difficult to analyze due to varying content and context (e.g., emails, social media posts).

3. Natural Language

• A type of unstructured data written in human language.

• Hard to process due to ambiguity and context-specific meanings.

• Techniques used: sentiment analysis, entity recognition, summarization, etc.

4. Machine-Generated Data

• Created automatically by machines without human input.

• Examples: server logs, IoT data, call records, network logs.

• Grows rapidly with IoT development.
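
Machine-generated data usually follows a fixed pattern, so it can be parsed automatically. The sketch below assumes web server log lines in the common access-log layout; the log lines themselves are invented sample data.

```python
import re
from collections import Counter

# Assumed layout: ip, two placeholder fields, [time], "request", status, size.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+)'
)

# Hypothetical log lines.
log_lines = [
    '192.0.2.10 - - [10/Oct/2024:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '192.0.2.11 - - [10/Oct/2024:13:55:40 +0000] "GET /missing HTTP/1.1" 404 153',
    '192.0.2.10 - - [10/Oct/2024:13:56:02 +0000] "POST /login HTTP/1.1" 200 512',
]

# Keep only the lines that match the pattern, then count the status codes.
matches = [m for m in map(LOG_PATTERN.match, log_lines) if m]
status_counts = Counter(m["status"] for m in matches)
print(status_counts)
```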

5. Graph-Based Data

• Represents relationships between entities (nodes and edges).

• Useful in social networks, recommendation systems, and fraud detection.
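
A minimal sketch of graph-based data is shown below: people are nodes and friendships are edges. The names and the `suggest_friends` helper are hypothetical, and "friends of friends" is just one simple recommendation idea.

```python
# Hypothetical friendships (undirected edges between people).
edges = [("Asha", "Ravi"), ("Ravi", "Meera"), ("Meera", "Kiran"), ("Asha", "Dev")]

# Build an adjacency structure: each node maps to the set of its neighbours.
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

def suggest_friends(graph, person):
    """Suggest friends of friends who are not already direct friends."""
    direct = graph.get(person, set())
    candidates = set()
    for friend in direct:
        candidates |= graph.get(friend, set())
    return sorted(candidates - direct - {person})

print(suggest_friends(graph, "Asha"))
```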

6. Audio, Video, and Images

• Rich media formats that require advanced tools to store and analyze.

• Used in speech recognition, facial recognition, video analytics, etc.

7. Streaming Data

• Real-time, continuous flow of data.

• Comes from sensors, live feeds, and online activities.


• Requires tools for real-time processing (e.g., Apache Kafka, Spark Streaming).
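
The idea of processing data as it arrives can be sketched with a Python generator. This is plain Python, not Apache Kafka or Spark Streaming; the `sensor_readings` function is a hypothetical stand-in for a live feed.

```python
from collections import deque

def sensor_readings():
    """Hypothetical stand-in for a live sensor feed."""
    yield from [20.0, 22.0, 21.0, 30.0]

def moving_average(stream, window=3):
    """Yield the average of the last `window` readings as each one arrives."""
    recent = deque(maxlen=window)  # old readings drop out automatically
    for reading in stream:
        recent.append(reading)
        yield sum(recent) / len(recent)

averages = list(moving_average(sensor_readings()))
print(averages)
```

Only the last few readings are kept in memory, which is why this style works even when the stream never ends.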

Q6) Explain different types of data in data science.

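Same as the answer to Q5: the types of data in data science are structured, unstructured, natural language, machine-generated, graph-based, audio/video/images, and streaming data.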

Q7) Differentiate between lists and tuples in Python.
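
The notes do not include an answer here. As a minimal sketch of the main difference: a list is mutable (it can be changed after creation), while a tuple is immutable. The variable names below are only for illustration.

```python
# A list is mutable: items can be added or changed in place.
point_list = [1, 2, 3]
point_list.append(4)
point_list[0] = 10

# A tuple is immutable: trying to change an item raises TypeError.
point_tuple = (1, 2, 3)
try:
    point_tuple[0] = 10
except TypeError as exc:
    print("tuple cannot be changed:", exc)

# A tuple of hashable items can be a dictionary key; a list cannot.
locations = {(18.52, 73.86): "Pune"}
list_is_hashable = True
try:
    {[18.52, 73.86]: "Pune"}
except TypeError:
    list_is_hashable = False

print(point_list, point_tuple, list_is_hashable)
```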

Bhavika patil
