Notes-Introduction To Data Analytics

Uploaded by

Vishnutha Reddy

Introduction to Data Analytics

What is Data:
 Data refers to raw facts, observations, measurements, or records.
 It is collected or generated through various processes, such as transactions,
interactions, observations, or experiments.
 In its raw form, data may lack context or meaning, but it serves as the foundation
for information and knowledge when processed and analyzed.

Categories of Data:
 Structured Data:
1. Data that is organized into a predefined format, such as tables in a relational
database, spreadsheets, or CSV files.
2. Structured data is typically easy to search, query, and analyze because it
follows a consistent schema.
 Unstructured Data:
1. Data that does not have a predefined structure or format, such as text
documents, images, videos, social media posts, and sensor data.
2. Unstructured data may require more advanced techniques, such as natural
language processing or computer vision, to extract meaningful insights.
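The difference can be made concrete with a minimal Python sketch that uses only the standard library. The order records and the review text below are invented for illustration and do not come from any real dataset. The structured rows can be queried by column name, while the free text first needs some structure extracted from it (here, a simple word count).

```python
import csv
import io
import re
from collections import Counter

# Structured data: every row follows the same schema (hypothetical orders).
ORDERS_CSV = """order_id,region,amount
1,North,120.50
2,South,80.00
3,North,45.25
"""

rows = list(csv.DictReader(io.StringIO(ORDERS_CSV)))
# Because the schema is known, a query is a simple filter on a named column.
north_total = sum(float(r["amount"]) for r in rows if r["region"] == "North")

# Unstructured data: free text has no columns, so structure must be extracted.
REVIEW = "Great taste, great price. Delivery was slow, but the taste was worth it."

words = re.findall(r"[a-z]+", REVIEW.lower())
word_counts = Counter(words)

print(north_total)           # total amount of the 'North' rows
print(word_counts["taste"])  # how often 'taste' appears in the review
```

Real unstructured text usually needs far more than word counts (for example, natural language processing), but the extra extraction step is the point being illustrated.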
Each piece of data can further be classified as below:

Data Growth:
 The volume of data created globally is estimated at around 300-400 million
terabytes per day.
 This volume continues to grow rapidly year over year.
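To put the daily estimate in perspective, the short sketch below converts it to zettabytes per year. It assumes decimal (SI) units and a 365-day year; the function name is only illustrative, and the result is no more precise than the estimate it starts from.

```python
TB_PER_EB = 1_000_000  # 1 exabyte = 10^6 terabytes (decimal units)
EB_PER_ZB = 1_000      # 1 zettabyte = 10^3 exabytes


def daily_tb_to_yearly_zb(million_tb_per_day: float, days: int = 365) -> float:
    """Convert a daily volume in millions of terabytes to zettabytes per year."""
    eb_per_day = million_tb_per_day * 1_000_000 / TB_PER_EB
    return eb_per_day * days / EB_PER_ZB


low = daily_tb_to_yearly_zb(300)
high = daily_tb_to_yearly_zb(400)
print(f"{low:.1f} to {high:.1f} ZB per year")
```

Under these assumptions, 300-400 million terabytes per day corresponds to roughly 110-146 zettabytes per year.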
Several primary contributors drive this high volume of data creation:
1. Internet Usage: The proliferation of internet-connected devices, including
smartphones, tablets, computers, IoT devices, and more, leads to a constant stream
of data generation through online activities such as social media interactions, web
browsing, emails, and online transactions.
2. Social Media Platforms: Social media platforms like Facebook, Twitter, Instagram,
and TikTok generate vast amounts of data through user-generated content,
interactions, shares, likes, comments, and more.
3. IoT Devices: The Internet of Things (IoT) devices, such as smart home devices,
wearable technology, sensors, and industrial machinery, continuously generate data
through various sensors and monitoring systems.
4. Business Transactions: Enterprises generate enormous volumes of data through
their business operations, including sales transactions, customer interactions,
financial transactions, supply chain activities, and more.
5. Streaming Services: The popularity of streaming services for music, movies, TV
shows, and gaming contributes significantly to data creation as users consume and
interact with digital content.
6. Research and Scientific Data: Scientific research, experiments, simulations, and
observations generate vast amounts of data, especially in fields such as genomics,
astronomy, climate science, and particle physics.
7. Government and Public Sector: Governments and public sector organizations
collect and generate data through various sources, including census surveys,
administrative records, satellite imagery, and public service delivery.
8. Digital Communication: Email communications, instant messaging, video
conferencing, and other digital communication channels contribute to the continuous
generation of data.

What is Data Analytics:


 Data analytics is the process of analyzing raw data to extract valuable insights
and inform decision-making.
 It involves various techniques and methodologies to uncover patterns, trends,
correlations, and other meaningful information from data sets.

How Data Analytics is Used:
 Data analytics is widely used across industries for various purposes, including
improving business operations, optimizing marketing strategies, understanding
customer behaviour, predicting trends, and driving innovation.
 It enables data-driven decision-making and helps organizations gain a
competitive advantage.
Example:
Coca-Cola, as one of the world's largest beverage companies, utilizes data analytics
across various aspects of its business operations. Here's how Coca-Cola leverages data
analytics:
1. Demand Forecasting: Coca-Cola analyzes historical sales data, market trends, and
consumer behaviour to forecast demand for its products. This helps in optimizing
production schedules, managing inventory levels, and ensuring product availability in
the market.
2. Supply Chain Optimization: Data analytics is used to streamline Coca-Cola's
supply chain operations, including procurement, manufacturing, distribution, and
logistics. By analyzing data related to supplier performance, transportation routes,
warehouse management, and inventory levels, Coca-Cola can improve efficiency,
reduce costs, and minimize disruptions.
3. Marketing and Consumer Insights: Coca-Cola collects and analyzes data from
various sources, including social media, customer feedback, and sales transactions,
to gain insights into consumer preferences, behaviour, and sentiment. This
information informs marketing campaigns, product development strategies, and
brand positioning efforts, allowing Coca-Cola to tailor its offerings to meet evolving
consumer needs.
4. Personalized Marketing: Coca-Cola employs data analytics to personalize
marketing messages and offers for different customer segments. By segmenting
consumers based on demographics, purchase history, and engagement patterns,
Coca-Cola can deliver targeted advertisements, promotions, and product
recommendations, increasing the effectiveness of its marketing efforts.
5. Product Innovation: Data analytics plays a crucial role in Coca-Cola's product
innovation process. By analyzing market data, consumer trends, and competitive
intelligence, Coca-Cola identifies opportunities for new product development, flavor
variations, and packaging innovations that resonate with target audiences and drive
sales growth.
6. Retail Execution: Coca-Cola uses data analytics to monitor and optimize its retail
execution strategies, ensuring that its products are displayed prominently, priced
competitively, and available in stores where demand is high. By analyzing retail sales
data, shelf space allocations, and promotional effectiveness, Coca-Cola can
maximize its presence and sales in retail outlets.
Overall, data analytics enables Coca-Cola to make data-driven decisions, improve
operational efficiency, enhance customer engagement, and drive innovation, ultimately
contributing to its continued success in the global beverage market.
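As a rough illustration of the demand forecasting idea in item 1, the sketch below forecasts the next period as the average of the most recent observations (a simple moving average). This is a generic baseline technique and not a description of Coca-Cola's actual method; the sales figures and the function name are hypothetical.

```python
from statistics import mean


def moving_average_forecast(history, window=3):
    """Forecast the next period as the mean of the last `window` observations."""
    if window <= 0 or len(history) < window:
        raise ValueError("need at least `window` observations")
    return mean(history[-window:])


# Hypothetical weekly unit sales for one product in one region.
weekly_sales = [100, 120, 110, 130, 125, 135]

forecast = moving_average_forecast(weekly_sales, window=3)
print(forecast)  # mean of the last three weeks: 130, 125, 135
```

A larger window smooths out short-term noise but reacts more slowly to genuine changes in demand; production forecasting would typically also account for seasonality and promotions.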

Typical roles in Data Analytics:

Data Analyst
Financial Analyst
Logistics Analyst
Business Analyst
Statistician
Product Analyst
Operations Analyst
Marketing Analyst
Risk Analyst
Data Analytics Consultant etc.

Key Components of Data Analytics:


1. Data Collection: The first step in data analytics involves gathering data from various
sources, including databases, spreadsheets, sensors, social media, web logs, and
IoT devices. Data can be structured (e.g., databases, spreadsheets) or unstructured
(e.g., text, images, videos), and it may come from internal or external sources.
2. Data Preprocessing: Once collected, raw data often needs to be preprocessed to
ensure its quality and usability for analysis. This involves tasks such as cleaning the
data to remove errors and inconsistencies, handling missing values, standardizing
formats, and transforming the data into a suitable structure for analysis.
3. Data Analysis: The core component of data analytics involves analyzing the data to
uncover patterns, trends, correlations, and insights. This can be done using various
techniques and methodologies, including statistical analysis, machine learning
algorithms, data mining, text analytics, and predictive modeling.
4. Data Visualization: Data visualization is the process of representing data visually
using charts, graphs, maps, and other graphical elements. Visualization helps in
exploring and understanding the data, communicating insights effectively, and
making data-driven decisions. Common tools for data visualization include Tableau,
Power BI, matplotlib, and ggplot2.
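The four components can be sketched end to end in a few lines of Python. This is a minimal illustration using only the standard library; the temperature readings, the function names, and the choice to drop rows with missing values (rather than fill them in) are all assumptions made for the example. A text bar chart stands in for a real plotting tool.

```python
import csv
import io
from statistics import mean

# 1. Collection: hypothetical raw export with typical quality problems
#    (inconsistent casing, stray whitespace, a missing value, a duplicate row).
RAW = """city,temperature_c
 Hyderabad ,31.5
hyderabad,32.5
Chennai,
Mumbai,29.0
Mumbai,29.0
"""


def preprocess(raw_text):
    """2. Preprocessing: standardize names, drop missing values and duplicates."""
    seen, clean = set(), []
    for row in csv.DictReader(io.StringIO(raw_text)):
        city = row["city"].strip().title()
        value = row["temperature_c"].strip()
        if not value:          # missing value: this sketch simply drops the row
            continue
        record = (city, float(value))
        if record in seen:     # exact duplicate of an earlier row
            continue
        seen.add(record)
        clean.append(record)
    return clean


def analyze(records):
    """3. Analysis: average temperature per city."""
    by_city = {}
    for city, temp in records:
        by_city.setdefault(city, []).append(temp)
    return {city: mean(temps) for city, temps in by_city.items()}


def visualize(summary):
    """4. Visualization: a crude text bar chart, one '#' per degree."""
    return "\n".join(
        f"{city:<10} {'#' * round(avg)} {avg:.1f}" for city, avg in summary.items()
    )


records = preprocess(RAW)
summary = analyze(records)
print(visualize(summary))
```

Each stage takes the output of the previous one, which is why problems left in the data during preprocessing (such as the two spellings of the same city) would otherwise distort every later step.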

Tools for Data Analysis:


1. Excel: Excel is widely used for data analysis because it is accessible,
versatile, and familiar to many users. It becomes limiting with very large
datasets (a worksheet holds at most 1,048,576 rows) or complex analytical
tasks.
2. Database Management Systems (DBMS):
 MySQL: MySQL is an open-source relational database management
system (RDBMS) commonly used for storing and managing structured
data.
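 Oracle SQL: Oracle Database is a commercial RDBMS provided by Oracle
Corporation; it is queried using SQL and Oracle's procedural extension,
PL/SQL.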
 PostgreSQL: PostgreSQL is another open-source RDBMS known for its
advanced features, extensibility, and support for SQL queries.
 MongoDB: MongoDB is a NoSQL database that stores data in flexible,
JSON-like documents, making it suitable for handling unstructured or
semi-structured data.
3. Programming Languages:
 Python: Python is widely used for data analysis and machine learning
tasks. Libraries like Pandas, NumPy, and SciPy provide powerful tools
for data manipulation, analysis, and statistical modeling.
 R: R is another popular programming language for statistical computing
and graphics. It offers a wide range of packages for data visualization,
statistical analysis, and predictive modeling.
4. Data Visualization Tools:
 Tableau: Tableau is a powerful data visualization tool that allows users
to create interactive dashboards and visualizations from various data
sources.
 Power BI: Power BI is a business analytics tool by Microsoft that enables
users to visualize and share insights from their data through interactive
reports and dashboards.
5. Big Data Tools:
 Hadoop: Apache Hadoop is a framework for distributed storage and
processing of large datasets across clusters of commodity hardware. It
includes components like Hadoop Distributed File System (HDFS) and
MapReduce.
 Spark: Apache Spark is a fast and general-purpose cluster computing
framework that provides in-memory data processing capabilities for
large-scale data analytics and machine learning tasks.
 Hive: Apache Hive is a data warehouse infrastructure built on top of
Hadoop that provides SQL-like query language (HiveQL) for querying
and analyzing large datasets stored in Hadoop.
6. Machine Learning Platforms:
 TensorFlow: TensorFlow is an open-source machine learning platform
developed by Google for building and deploying machine learning
models.
 scikit-learn: scikit-learn is a Python library for machine learning that
provides simple and efficient tools for data mining and data analysis.
 Keras: Keras is a high-level neural networks API written in Python that
allows for easy and fast prototyping of deep learning models.
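To give a feel for how structured data is queried in a relational database, here is a minimal sketch using SQLite through Python's built-in sqlite3 module. SQLite is used only as a convenient stand-in: MySQL, Oracle, and PostgreSQL each need their own driver and connection settings, although a basic SELECT with GROUP BY like this one is standard SQL. The `sales` table and its rows are hypothetical.

```python
import sqlite3

# In-memory database; the `sales` table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("North", "Cola", 120),
        ("North", "Juice", 80),
        ("South", "Cola", 200),
        ("South", "Juice", 50),
    ],
)

# Aggregate units per region, largest total first.
query = """
    SELECT region, SUM(units) AS total_units
    FROM sales
    GROUP BY region
    ORDER BY total_units DESC
"""
totals = conn.execute(query).fetchall()
conn.close()

for region, total_units in totals:
    print(region, total_units)
```

The same aggregation could be done in Excel with a pivot table or in Python with a Pandas group-by; the database approach tends to pay off as the data grows beyond what fits comfortably in a spreadsheet.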

Types of Analysis:
1. Descriptive Analysis: Descriptive analysis involves summarizing and describing the
main features of a dataset, such as its central tendency, dispersion, and distribution.
Descriptive statistics, charts, and graphs are commonly used to present key
characteristics of the data.
2. Diagnostic Analysis: Diagnostic analysis focuses on identifying the causes of observed
patterns or outcomes in the data. It involves analyzing relationships between variables,
identifying correlations, and conducting root cause analysis to understand why certain
events occur.
3. Predictive Analysis: Predictive analysis involves using historical data to make
predictions about future events or outcomes. It includes techniques such as regression
analysis, time series forecasting, and machine learning algorithms to build predictive
models and estimate the likelihood of future events.
4. Prescriptive Analysis: Prescriptive analysis involves recommending actions or
decisions based on the insights gained from data analysis. It aims to optimize outcomes
by identifying the best course of action given the available data and constraints.
Optimization techniques, decision trees, and simulation models are commonly used for
prescriptive analysis.
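The four types can be compared side by side on one tiny dataset. The sketch below is illustrative only: the ad-spend and sales numbers are invented (and deliberately perfectly linear, which real data never is), the helper functions are hand-written for clarity, and the prescriptive step is reduced to picking the best option within a budget.

```python
import math
from statistics import mean, pstdev

# Hypothetical data: monthly ad spend (in thousands) and units sold.
ad_spend = [10, 20, 30, 40, 50]
units = [25, 45, 65, 85, 105]

# 1. Descriptive: summarize what happened.
avg_units = mean(units)
spread_units = pstdev(units)  # population standard deviation


# 2. Diagnostic: how strongly do the two variables move together?
#    Correlation is a first clue about causes, not proof of causation.
def pearson(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


r = pearson(ad_spend, units)


# 3. Predictive: fit units = slope * spend + intercept by least squares.
def fit_line(xs, ys):
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


slope, intercept = fit_line(ad_spend, units)
predicted_units = slope * 60 + intercept  # forecast for a spend of 60

# 4. Prescriptive: recommend the option with the highest predicted sales
#    among the candidate spends that fit within the budget constraint.
candidate_spends = [30, 60, 90]
budget = 70
affordable = [s for s in candidate_spends if s <= budget]
best_spend = max(affordable, key=lambda s: slope * s + intercept)

print(avg_units, round(spread_units, 2), r, predicted_units, best_spend)
```

Each step builds on the previous one: the description shows what the data looks like, the correlation suggests a relationship worth investigating, the fitted line turns it into a forecast, and the final step turns the forecast into a recommended action.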
