Data Science and Analytics Theory Complete

The document provides comprehensive notes on Data Science and Analytics, covering topics such as data types, analytics classifications, and applications in business. It also discusses data preparation, visualization techniques, and the use of R for statistical analysis and modeling. Additionally, it addresses challenges in data analytics and the significance of predictive and textual analytics.

Uploaded by

zeenu9547

Data Science and Analytics - Theory Notes

Unit 1: Introduction to Data, Data Science and Analytics

1. Data and Data Science:

- Data refers to raw facts and figures that are collected and processed for analysis.

- Data Science is an interdisciplinary field that uses scientific methods, algorithms, and systems to extract knowledge and insights from structured and unstructured data.

2. Data Analytics and Data Analysis:

- Data Analytics is the broader process of examining data sets to draw conclusions and support decision-making.

- Data Analysis is a component of data analytics and refers specifically to the process of inspecting, cleaning, transforming, and modeling data.

3. Classification of Analytics:

- Descriptive Analytics: Summarizes past data to understand what happened.

- Diagnostic Analytics: Investigates why something happened.

- Predictive Analytics: Forecasts future outcomes using historical data.

- Prescriptive Analytics: Recommends actions based on data analysis.

4. Application of Analytics in Business:

- Enhances decision-making

- Improves operational efficiency

- Supports customer behavior analysis

- Assists in market trend identification

- Optimizes resource allocation

5. Types of Data:

- Nominal Data: Categorical data without any order (e.g., gender, colors).

- Ordinal Data: Categorical data with a meaningful order (e.g., rankings).

- Scale Data: Quantitative data measured on an interval scale (e.g., temperature in Celsius) or a ratio scale (e.g., income).
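
The three data types above can be sketched in R as follows. The variable names and values are made up for illustration; a plain factor stands in for nominal data, an ordered factor for ordinal data, and a numeric vector for scale data.

```r
# Nominal: unordered categories stored as a plain factor
colour <- factor(c("red", "blue", "red", "green"))

# Ordinal: an ordered factor, so comparing levels is meaningful
rating <- factor(c("low", "high", "medium", "low"),
                 levels = c("low", "medium", "high"),
                 ordered = TRUE)

# Scale: numeric values that support arithmetic
income <- c(32000, 45500, 51000, 38750)

is.ordered(colour)      # FALSE
is.ordered(rating)      # TRUE
rating[2] > rating[1]   # TRUE, because "high" ranks above "low"
mean(income)            # 41812.5
```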

6. Big Data and its Characteristics:

- Big Data refers to extremely large datasets that traditional data processing software cannot handle efficiently.

- Characteristics (5 Vs): Volume, Velocity, Variety, Veracity, Value

7. Applications of Big Data:

- Customer insights and behavior prediction

- Fraud detection in finance

- Personalized recommendations in e-commerce

- Predictive maintenance in manufacturing

- Trend analysis in social media and marketing

8. Challenges in Data Analytics:

- Data privacy and security

- Integration of data from multiple sources

- Managing data quality and consistency

- Shortage of skilled professionals

- High cost of data tools and infrastructure

Unit 2: Data Preparation, Summarisation and Visualisation Using Spreadsheet

1. Data Preparation and Cleaning:

- Identifying and correcting errors or inconsistencies to improve data quality before analysis.

2. Sort and Filter:

- Sorting arranges data in a specific order; filtering displays only the data that meets certain criteria.

3. Conditional Formatting:

- Applies specific formatting to cells that meet certain conditions to visually highlight important information.

4. Text to Column:

- Splits the content of one cell into multiple cells based on a delimiter (e.g., comma, space).

5. Removing Duplicates:

- Identifies and deletes repeated entries in datasets to maintain data integrity.
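
The notes describe this step for a spreadsheet. As a rough equivalent in R, and only as a sketch on made-up data, duplicated() flags rows that exactly repeat an earlier row.

```r
# Hypothetical order table; row 3 repeats row 2 exactly
orders <- data.frame(
  id     = c(1, 2, 2, 3, 3),
  amount = c(10, 25, 25, 40, 45)
)

dup_rows <- duplicated(orders)   # TRUE only for exact repeats of an earlier row
clean    <- orders[!dup_rows, ]  # keep the first occurrence of each row

dup_rows      # FALSE FALSE TRUE FALSE FALSE
nrow(clean)   # 4
```

Rows 4 and 5 share an id but differ in amount, so they are not treated as duplicates.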

6. Data Validation:

- Restricts the type of data that can be entered into a cell, ensuring accuracy and consistency.

7. Identifying Outliers:

- Detects data points that differ significantly from other observations; important for accurate analysis.
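
One common way to flag outliers is the 1.5 x IQR rule. The notes do not name a specific method, so the rule and the sample values below are assumptions chosen for illustration.

```r
# Made-up observations with one obviously extreme value
x <- c(12, 14, 15, 15, 16, 17, 18, 19, 95)

q     <- unname(quantile(x, c(0.25, 0.75)))  # first and third quartiles
iqr   <- q[2] - q[1]
lower <- q[1] - 1.5 * iqr                    # lower fence
upper <- q[2] + 1.5 * iqr                    # upper fence

outliers <- x[x < lower | x > upper]
outliers   # 95
```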

8. Covariance and Correlation Matrix:

- Covariance: Measures how two variables change together.

- Correlation Matrix: Shows the strength and direction of linear relationships between variables.

9. Moving Averages:

- A technique used to smooth out short-term fluctuations and highlight trends in data over time.
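
A trailing moving average can be sketched in R with stats::filter(). The window length of 3 and the sales figures are illustrative assumptions, not values from the notes.

```r
# Made-up sales series
sales <- c(10, 12, 14, 13, 15, 18, 20)
k <- 3  # window length (assumed)

# sides = 1 averages the current value with the k - 1 values before it,
# so the first k - 1 positions have no complete window and come back as NA
ma <- as.numeric(stats::filter(sales, rep(1 / k, k), sides = 1))

round(ma, 2)   # NA NA 12.00 13.00 14.00 15.33 17.67
```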

10. Finding Missing Values:

- Identifying and handling gaps in data, using methods like imputation or deletion.
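
Both handling strategies mentioned above, deletion and imputation, can be sketched in R on a small made-up vector. Mean imputation is used here only as the simplest example of imputation.

```r
# Made-up scores with two gaps
scores <- c(70, NA, 85, 90, NA, 65)

which(is.na(scores))   # positions of the gaps: 2 5
sum(is.na(scores))     # number of gaps: 2

# Deletion: drop the missing entries
complete <- scores[!is.na(scores)]

# Imputation: replace each gap with the mean of the observed values
imputed <- ifelse(is.na(scores), mean(scores, na.rm = TRUE), scores)
imputed   # 70.0 77.5 85.0 90.0 77.5 65.0
```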

11. Summarisation:

- Summarizing data using statistical measures such as mean, median, mode, totals, etc.

12. Visualisation Tools:

- Scatter Plots: Show relationships between two variables.

- Line Charts: Display data trends over time.

- Histograms: Show the frequency distribution of a dataset.

- Pivot Tables: Summarize large datasets by grouping and aggregating data.

- Pivot Charts: Visual representations of pivot tables.

- Interactive Dashboards: Combine visualizations to provide an overview for decision-making.


Unit 3: Getting Started with R

1. Introduction to R:

- R is a programming language and environment specifically designed for statistical computing and graphics.

2. Advantages of R:

- Open-source and free to use

- Extensive libraries for data analysis and visualization

- Strong community support

- Excellent for statistical modeling

3. Installation of R Packages:

- Packages can be installed using install.packages("package_name")

- Required packages must be loaded using library("package_name")

4. Importing Data from Spreadsheet Files:

- Data can be imported using read.csv(), read.table(), or functions from packages like readxl.
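
A minimal sketch of the import functions named above. The file is written to a temporary path first so the example is self-contained; the column names and rows are made up.

```r
# Create a small CSV file to import (hypothetical contents)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,age,city",
             "Asha,29,Pune",
             "Ravi,34,Delhi"), csv_path)

# read.csv() assumes a header row and comma separators
people <- read.csv(csv_path, stringsAsFactors = FALSE)

# read.table() reads the same file when told about the header and separator
people2 <- read.table(csv_path, header = TRUE, sep = ",",
                      stringsAsFactors = FALSE)

str(people)   # 2 observations of 3 variables

# For .xlsx files, readxl::read_excel("file.xlsx") is the usual choice,
# assuming the readxl package is installed.
```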

5. Commands and Syntax:

- R is case-sensitive and uses functions for most operations.

- Syntax is generally function-based, e.g., mean(data), summary(data)
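
A short illustration of both points, using made-up values: names that differ only in case are separate objects, and most work is done by calling functions on data.

```r
# Case sensitivity: value and Value are two different objects
value <- 10
Value <- 99
identical(value, Value)   # FALSE

# Function-based syntax
data <- c(4, 8, 15, 16, 23, 42)
mean(data)      # 18
summary(data)   # minimum, quartiles, mean, and maximum
```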

6. Packages and Libraries:

- R has thousands of packages available via CRAN and other repositories for various types of analysis.

7. Data Structures in R:

- Vectors: One-dimensional data structure with elements of the same type

- Matrices: Two-dimensional data with elements of the same type

- Arrays: Multi-dimensional generalization of matrices

- Lists: Collection of different types of elements


- Factors: Categorical variables

- Data Frames: Tabular data whose columns can hold different data types
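
Each of the six structures listed above can be created with a base R constructor. The object names and contents below are illustrative only.

```r
v  <- c(2, 4, 6)                                      # vector
m  <- matrix(1:6, nrow = 2, ncol = 3)                 # matrix: 2 rows, 3 columns
a  <- array(1:24, dim = c(2, 3, 4))                   # array: three dimensions
l  <- list(name = "Asha", scores = v, passed = TRUE)  # list: mixed element types
f  <- factor(c("yes", "no", "yes"))                   # factor: categorical values
df <- data.frame(id    = 1:3,                         # data frame: columns of
                 grade = c("A", "B", "A"),            # different types
                 marks = c(88.5, 72, 91))

dim(m)       # 2 3
dim(a)       # 2 3 4
levels(f)    # "no" "yes"
str(df)      # 3 observations of 3 variables
```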

8. Conditionals and Control Flows:

- if, else if, else statements for conditional execution
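
A small sketch of an if / else if / else chain. The function name grade_for and the cut-off marks are hypothetical.

```r
# Map a numeric mark to a label; thresholds are assumed for illustration
grade_for <- function(marks) {
  if (marks >= 75) {
    "Distinction"
  } else if (marks >= 40) {
    "Pass"
  } else {
    "Fail"
  }
}

grade_for(82)   # "Distinction"
grade_for(55)   # "Pass"
grade_for(20)   # "Fail"
```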

9. Loops:

- for, while, and repeat loops for repetitive tasks
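
The three loop forms can be compared side by side. The values are arbitrary and chosen so the results are easy to check by hand.

```r
# for: iterate over a known sequence
total <- 0
for (i in 1:5) {
  total <- total + i
}
total   # 15

# while: keep going as long as a condition holds
n <- 1
while (n < 100) {
  n <- n * 2
}
n       # 128

# repeat: loop until an explicit break
count <- 0
repeat {
  count <- count + 1
  if (count == 3) break
}
count   # 3
```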

10. Functions and Apply Family:

- User-defined and built-in functions for modular programming

- Apply family (apply, lapply, sapply, etc.) applies a function to each element, row, or column without writing an explicit loop
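
A sketch combining a user-defined function with three members of the apply family. The helper range_width and the sample data are made up for illustration.

```r
# User-defined function: spread between the largest and smallest value
range_width <- function(x) max(x) - min(x)

m <- matrix(c(1, 5,  9,
              2, 6, 12), nrow = 2, byrow = TRUE)

# apply(): run a function over rows (1) or columns (2) of a matrix
row_widths <- apply(m, 1, range_width)
row_widths            # 8 10

groups <- list(a = 1:3, b = c(10, 20))

# lapply(): always returns a list
lapply(groups, mean)

# sapply(): simplifies the result to a vector when it can
group_means <- sapply(groups, mean)
group_means           # a = 2, b = 15
```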

Unit 4: Descriptive Statistics Using R

1. Importing Data File:

- Use functions like read.csv(), read_excel() to load data for analysis

2. Data Visualisation Using Charts:

- Histograms: For frequency distribution

- Bar Charts: For categorical comparisons

- Box Plots: For distribution and outlier detection

- Line Graphs: For trends over time

- Scatter Plots: For relationships between variables
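
The five chart types above map onto base R graphics functions. This sketch uses the built-in mtcars dataset plus a made-up monthly series, and draws into a temporary PDF so it also runs non-interactively; in an interactive session the pdf() and dev.off() lines can be dropped.

```r
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)   # open a graphics device that writes to a file

# Histogram: frequency distribution of a numeric variable
h <- hist(mtcars$mpg, main = "Histogram of mpg", xlab = "Miles per gallon")

# Bar chart: counts per category
barplot(table(mtcars$cyl), main = "Cars by cylinder count", xlab = "Cylinders")

# Box plot: distribution and potential outliers per group
boxplot(mpg ~ cyl, data = mtcars, main = "mpg by cylinders")

# Line graph: trend over time (hypothetical monthly visits)
month  <- 1:6
visits <- c(120, 135, 150, 145, 170, 190)
plot(month, visits, type = "l", main = "Visits per month")

# Scatter plot: relationship between two numeric variables
plot(mtcars$wt, mtcars$mpg, main = "mpg vs weight",
     xlab = "Weight", ylab = "Miles per gallon")

invisible(dev.off())   # close the device and finish writing the file
```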

3. Data Description:

- Measure of Central Tendency: Mean, Median, Mode

- Measure of Dispersion: Range, Variance, Standard Deviation
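
These measures are available as base R functions, except the statistical mode, for which R has no built-in. The helper stat_mode below is a hypothetical workaround, and the data are made up.

```r
x <- c(4, 7, 7, 9, 10, 12, 14)

# Central tendency
mean(x)     # 9
median(x)   # 9

# Base R's mode() reports the storage type, not the most frequent value,
# so a small helper is needed
stat_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[counts == max(counts)])
}
stat_mode(x)   # 7

# Dispersion
diff(range(x))   # range: 14 - 4 = 10
var(x)           # sample variance (divides by n - 1)
sd(x)            # sample standard deviation, sqrt(var(x))
```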

4. Relationship Between Variables:


- Covariance: Measures how two variables change together

- Correlation: Measures strength and direction of linear relationship

- Coefficient of Determination (R²): Indicates the proportion of variance in the dependent variable that is explained by the model
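
The three measures can be computed in base R. The study-hours data are made up; for a regression with a single predictor, R² equals the squared correlation, which the last lines check.

```r
hours <- c(1, 2, 3, 4, 5)
marks <- c(52, 55, 61, 64, 70)

cov(hours, marks)          # 11.25: positive, so the two move together
r <- cor(hours, marks)     # Pearson correlation, between -1 and 1
r

# R squared from a fitted simple regression
fit <- lm(marks ~ hours)
r_squared <- summary(fit)$r.squared

all.equal(r^2, r_squared)  # TRUE for one predictor
```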

Unit 5: Predictive and Textual Analytics

1. Simple Linear Regression Models:

- Models a continuous dependent variable as a linear function of a single independent variable
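
A minimal sketch of a simple linear regression with lm(). The advertising and sales figures are invented for illustration.

```r
ad_spend <- c(1, 2, 3, 4, 5)
sales    <- c(12, 15, 19, 20, 24)

# Fit sales = intercept + slope * ad_spend by least squares
fit <- lm(sales ~ ad_spend)
coef(fit)   # intercept 9.3, slope 2.9

# Predicted sales at a new spend level
predict(fit, data.frame(ad_spend = 6))   # 26.7
```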

2. Confidence and Prediction Intervals:

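- Confidence interval: Gives a range for a population parameter, such as the mean response at a given value of the predictor

- Prediction interval: Gives a range for an individual new observation; it is wider than the confidence interval at the same point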
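
Both intervals can be requested from predict() on a fitted model. The data and the 95% level are assumptions made for this sketch.

```r
ad_spend <- c(1, 2, 3, 4, 5)
sales    <- c(12, 15, 19, 20, 24)
fit      <- lm(sales ~ ad_spend)

new_point <- data.frame(ad_spend = 6)

# Interval for the mean response at ad_spend = 6
conf_int <- predict(fit, new_point, interval = "confidence", level = 0.95)

# Interval for a single new observation at ad_spend = 6
pred_int <- predict(fit, new_point, interval = "prediction", level = 0.95)

conf_int
pred_int

# Confidence intervals for the coefficients themselves
confint(fit, level = 0.95)
```

Both intervals share the same centre (the fitted value); the prediction interval is wider because it also accounts for the scatter of individual observations around the line.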

3. Multiple Linear Regression:

- Models the relationship between one dependent and multiple independent variables

4. Interpretation of Regression Coefficients:

- Each coefficient shows the expected change in the dependent variable for a one-unit increase in that independent variable, holding the other variables constant
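
A sketch of a multiple regression and of how its coefficients combine into a prediction. The housing data are invented, and the column names are hypothetical.

```r
houses <- data.frame(
  area  = c(50, 60, 80, 100, 120, 150),   # floor area
  age   = c(30,  5, 20,  10,  25,   2),   # age of the building in years
  price = c(30, 38, 47,  57,  66,  84)    # price in arbitrary units
)

# One dependent variable (price), two independent variables (area, age)
fit   <- lm(price ~ area + age, data = houses)
coefs <- coef(fit)
coefs   # intercept, then the effect of area and of age

# Each slope is the expected change in price per unit of that variable
# with the other held fixed, so a prediction is a weighted sum
manual  <- coefs["(Intercept)"] + coefs["area"] * 90 + coefs["age"] * 15
by_func <- predict(fit, data.frame(area = 90, age = 15))
all.equal(unname(manual), unname(by_func))   # TRUE
```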

5. Heteroscedasticity:

- Occurs when the variance of the error terms is not constant across observations
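
One way to check for non-constant error variance is the studentized (Koenker) form of the Breusch-Pagan test, computed by hand here in base R. The notes do not name a test, so this choice is an assumption, and the data are simulated so that the error spread grows with x.

```r
set.seed(42)
x <- 1:50
y <- 3 + 2 * x + rnorm(50, sd = 0.5 * x)   # noise standard deviation rises with x

fit <- lm(y ~ x)

# Auxiliary regression: do the squared residuals depend on x?
aux <- lm(resid(fit)^2 ~ x)

# Test statistic: n * R squared of the auxiliary regression,
# compared with a chi-squared distribution (1 degree of freedom for 1 predictor)
bp_stat <- length(x) * summary(aux)$r.squared
p_value <- pchisq(bp_stat, df = 1, lower.tail = FALSE)

bp_stat
p_value   # a small p-value suggests heteroscedasticity
```

A plot of resid(fit) against fitted(fit) gives a visual version of the same check: a fan shape points to non-constant variance.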

6. Multi-collinearity:

- Happens when two or more independent variables are highly correlated with each other, which makes individual coefficient estimates unstable
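
Multicollinearity is often screened with a correlation matrix and the variance inflation factor (VIF), where VIF = 1 / (1 - R²) and R² comes from regressing one predictor on the others. The notes do not mention VIF, so it is introduced here as an assumption; the data are simulated with x2 built as a near copy of x1.

```r
set.seed(1)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 0.1)   # almost the same information as x1
x3 <- rnorm(100)                  # unrelated to the other two
predictors <- data.frame(x1, x2, x3)

round(cor(predictors), 2)         # x1 and x2 are strongly correlated

# VIF for a predictor: regress it on the remaining predictors
vif_x1 <- 1 / (1 - summary(lm(x1 ~ x2 + x3, data = predictors))$r.squared)
vif_x3 <- 1 / (1 - summary(lm(x3 ~ x1 + x2, data = predictors))$r.squared)

vif_x1   # large: x1 is mostly explained by x2
vif_x3   # close to 1: x3 is not explained by the others
```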

7. Basics of Textual Data Analysis:

- Analyzing unstructured text data for insights

- Includes understanding context, frequency, and sentiment
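
Word frequency and a very rough sentiment score can be sketched in base R. The reviews and the tiny positive and negative word lists are made up; real sentiment analysis relies on much larger lexicons or trained models.

```r
reviews <- c("Great product, great price",
             "Poor quality and poor support",
             "Good value, great service")

# Normalise: drop punctuation, lower-case, split on spaces
cleaned <- tolower(gsub("[^A-Za-z ]", "", reviews))
tokens  <- strsplit(cleaned, " +")

# Frequency: how often each word appears across all reviews
freq <- sort(table(unlist(tokens)), decreasing = TRUE)
freq[["great"]]   # 3

# Sentiment: positive word count minus negative word count per review
positive <- c("great", "good")   # hypothetical lexicon
negative <- c("poor", "bad")     # hypothetical lexicon
score <- sapply(tokens, function(w) sum(w %in% positive) - sum(w %in% negative))
score   # 2 -2 2
```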

8. Significance, Application, and Challenges:
