Detailed Point Summary of Data Analytics
Data Science vs. Data Analytics
Data Science involves the entire process of collecting, cleaning, structuring, analyzing,
and extracting meaning from datasets.
Data Analytics focuses on analyzing data to answer specific questions, identify trends,
and gain insights. Data Science is the broader field and encompasses Data Analytics.
What is Data Analytics?
It uses data to answer questions, identify trends, and extract insights.
There are four main types, each addressing a different question:
o Descriptive Analytics: Analyzes "what happened" using historical data.
o Predictive Analytics: Uses past data to predict "what might happen" in the
future.
o Prescriptive Analytics: Builds on descriptive and predictive analysis to
recommend actions for optimal outcomes.
o Diagnostic Analytics: Delves deeper to understand "why" something happened.
Value Proposition of Each Type of Data Analytics
Descriptive Analytics
Summarizes historical data (sales, inventory, etc.)
Generates reports on past events
Identifies key characteristics of a dataset
Doesn't draw inferences or make predictions
Descriptive Analytics Process
1. Ask a historical question (e.g., product sales last year)
2. Identify required data
3. Collect and prepare data
4. Analyze data
5. Present results
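The five steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the sales records, product names, and the helper summarize_by_product are hypothetical and not taken from any real dataset. It only summarizes what happened and makes no predictions.

```python
from collections import defaultdict
from statistics import mean

# Steps 1-3: the question is "how did each product sell last year?".
# Hypothetical prepared records: (month, product, units_sold).
sales = [
    ("2023-01", "widget", 120),
    ("2023-01", "gadget", 80),
    ("2023-02", "widget", 150),
    ("2023-02", "gadget", 60),
    ("2023-03", "widget", 90),
    ("2023-03", "gadget", 100),
]


def summarize_by_product(records):
    """Step 4: return total, mean, min, and max monthly units per product."""
    units = defaultdict(list)
    for _month, product, sold in records:
        units[product].append(sold)
    return {
        product: {
            "total": sum(values),
            "mean": mean(values),
            "min": min(values),
            "max": max(values),
        }
        for product, values in units.items()
    }


# Step 5: present the results as a simple report.
summary = summarize_by_product(sales)
for product, stats in sorted(summary.items()):
    print(product, stats)
```

For larger datasets the same grouping and aggregation would typically be done with a library such as Pandas or with a SQL GROUP BY query.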
Examples of Descriptive Analytics Use Cases
Traffic and engagement reports (social media likes, etc.)
Financial statement analysis (ratios, trends)
Summarizing key performance indicators (KPIs)
Predictive Analytics
Uses real-time and/or historical data to make probability-based predictions.
Can infer missing data or predict future trends.
Helps with goal setting, planning, and risk management.
Predictive Analytics Process
1. Ask a forward-looking question (e.g., product sales next year)
2. Collect and prepare data
3. Develop predictive models
4. Apply models to prepared data
5. Review models and present results
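As a rough sketch of steps 2 to 5, the example below fits a straight-line trend to past yearly sales by ordinary least squares, checks it against a held-out year, and then forecasts the next year. The yearly figures and the helpers fit_line and predict are hypothetical, and a linear trend is only one of many possible models; in practice a library such as Scikit-learn would usually be used.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new x value."""
    slope, intercept = model
    return slope * x + intercept


# Step 2: hypothetical prepared data (yearly unit sales).
years = [2019, 2020, 2021, 2022, 2023]
units = [100, 118, 143, 159, 181]

# Steps 3-5 (review): fit on all but the last year, then measure the
# error on the held-out year before trusting the model.
holdout_model = fit_line(years[:-1], units[:-1])
holdout_error = abs(predict(holdout_model, years[-1]) - units[-1])

# Steps 3-4: refit on all available data and forecast next year.
final_model = fit_line(years, units)
forecast_2024 = predict(final_model, 2024)

print(f"Held-out error for {years[-1]}: {holdout_error:.1f} units")
print(f"Forecast for 2024: {forecast_2024:.1f} units")
```

The held-out check is a small stand-in for the "review models" step: a model that cannot reproduce a year it has not seen should not be used to forecast the next one.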
Examples of Predictive Analytics
Forecasting customer behavior, purchasing patterns, and sales trends
Predicting customer preferences and product recommendations
Identifying potential security breaches
Prescriptive Analytics
Recommends courses of action based on descriptive and predictive analysis.
Helps decision-makers choose the best options based on data.
Example of Prescriptive Analytics
GPS navigation apps suggesting routes based on traffic and user-defined goals (shortest
distance, quickest time).
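The navigation example can be reduced to a small sketch: given candidate routes (the descriptive input) and predicted travel times under current traffic (the predictive input), recommend the route that best meets the user's goal. The routes, their numbers, the goal names, and the helper recommend_route are all hypothetical; real navigation apps use far richer models.

```python
# Hypothetical candidate routes with travel times predicted from traffic.
routes = [
    {"name": "highway", "distance_km": 18.0, "predicted_minutes": 25.0},
    {"name": "city streets", "distance_km": 12.0, "predicted_minutes": 32.0},
    {"name": "ring road", "distance_km": 22.0, "predicted_minutes": 21.0},
]

# Map each user-defined goal to the route attribute that should be minimized.
GOAL_TO_ATTRIBUTE = {
    "shortest_distance": "distance_km",
    "quickest_time": "predicted_minutes",
}


def recommend_route(candidates, goal):
    """Return the candidate route that minimizes the attribute for this goal."""
    if goal not in GOAL_TO_ATTRIBUTE:
        raise ValueError(f"Unknown goal: {goal!r}")
    attribute = GOAL_TO_ATTRIBUTE[goal]
    return min(candidates, key=lambda route: route[attribute])


for goal in GOAL_TO_ATTRIBUTE:
    print(goal, "->", recommend_route(routes, goal)["name"])
```

The recommendation changes with the goal even though the underlying data stays the same, which is what distinguishes a prescriptive step from a purely predictive one.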
Diagnostic Analytics
Digs deeper into descriptive analytics to discover causes of issues.
Diagnostic Analytics Process
1. Identify anomalies (unexpected deviations from the usual pattern) in datasets
2. Collect data related to the anomalies
3. Use statistical techniques to uncover relationships and trends that explain anomalies
4. Present possible causes
Example of Diagnostic Analytics
Analyzing subscription cancellations alongside customer comments to determine reasons
for leaving.
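A minimal sketch of this process, applied to the cancellation example: flag months whose cancellation count deviates strongly from the mean (a simple z-score test), then tally the reasons given in customer comments for those months. The monthly counts, the comments, the threshold of 2.0, and the helpers find_anomalies and top_reasons are hypothetical, and the sketch assumes each comment has already been tagged with a reason category.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical monthly subscription cancellations.
monthly_cancellations = {
    "Jan": 20, "Feb": 22, "Mar": 19, "Apr": 21,
    "May": 58, "Jun": 20, "Jul": 23, "Aug": 21,
}

# Hypothetical customer comments, already tagged as (month, reason).
comments = [
    ("Apr", "price"),
    ("May", "price"),
    ("May", "price"),
    ("May", "missing feature"),
    ("May", "price"),
    ("Jun", "missing feature"),
]


def find_anomalies(series, threshold=2.0):
    """Step 1: return keys whose value lies more than `threshold`
    population standard deviations away from the mean."""
    values = list(series.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []
    return [key for key, value in series.items()
            if abs(value - mu) / sigma > threshold]


def top_reasons(tagged_comments, months):
    """Steps 2-3: count comment reasons within the given months."""
    counts = Counter(reason for month, reason in tagged_comments
                     if month in months)
    return counts.most_common()


# Step 4: present possible causes for the anomalous months.
anomalous_months = find_anomalies(monthly_cancellations)
print("Anomalous months:", anomalous_months)
print("Most cited reasons:", top_reasons(comments, anomalous_months))
```

The output points to possible causes only; confirming that a reason actually drove the spike would need further analysis, for example comparing against months without the anomaly.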
Tools for Data Analysis
Excel: Suitable for small datasets and quick analysis.
SQL: Query language for retrieving, filtering, aggregating, and updating data stored
in relational databases.
Tableau: Popular for data visualization, helping present data in an understandable
format.
Power BI: Business analytics service offering data aggregation, analysis, visualization,
and sharing.
Scikit-learn: Open-source machine learning library for Python. Offers simple and
efficient tools for data mining and analysis.
TensorFlow: Open-source machine learning framework for building and deploying
machine learning models, particularly deep learning models.
PyTorch: Open-source machine learning library for deep learning applications, offering
flexibility and speed for building and experimenting with neural network models.
Pandas: Open-source library for data analysis and manipulation in Python.
NumPy: Python library providing a multidimensional array object for numerical
computations.
Matplotlib: Python library for creating static, animated, and interactive visualizations.
Seaborn: Library for statistical graphics in Python, built on top of matplotlib.
Beautiful Soup: Python library for pulling data out of HTML and XML files.
Jupyter Notebook: Open-source web application for creating and sharing documents
with live code, equations, visualizations, and text.
Anaconda Distribution: Popular Python and R distribution for data science and
machine learning projects, including Conda (package and environment manager) and
many pre-installed packages.
Google Colab: Cloud-based platform allowing users to write and execute Python code in
a Jupyter Notebook environment, with free but usage-limited access to GPUs and TPUs.
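To illustrate how two of the tools above fit together, the sketch below runs a SQL aggregation query from Python using the sqlite3 module from the standard library and an in-memory database. The sales table, its columns, and its rows are hypothetical.

```python
import sqlite3

# Create a throwaway in-memory database with a hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, region TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("widget", "north", 120),
        ("widget", "south", 90),
        ("gadget", "north", 80),
        ("gadget", "south", 60),
    ],
)

# Total units per product, largest first.
rows = conn.execute(
    """
    SELECT product, SUM(units) AS total_units
    FROM sales
    GROUP BY product
    ORDER BY total_units DESC
    """
).fetchall()
conn.close()

for product, total_units in rows:
    print(product, total_units)
```

The same query text would work largely unchanged against other relational databases; only the connection code differs.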