Data Analysis
The importance of using data analysis
Phases
⁃ Identifying the problem and gathering data
⁃ Cleaning and processing data
⁃ Analyzing data
⁃ Drawing insights and making recommendations
⁃ Implementing changes and measuring impact
Skills
Communication: transmitting complex information clearly and
concisely, for example through storytelling.
Diplomacy: the art of navigating delicate situations and
maintaining positive relationships, even when disagreements
arise.
Understand end-users
Technical knowledge
Steps in data-driven decision-making
Prepare
Data preparation is the crucial first step in the data analysis
process. In this stage, data analysts gather, clean, and pre-
process the raw data to make it suitable for analysis. This often
involves removing any inaccuracies, inconsistencies, or
duplicate records, as well as filling in missing values.
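A minimal sketch of this kind of cleanup, using only the Python
standard library. The record fields, the sample values, and the
helper name clean_records are hypothetical and chosen only for
illustration; real projects often use a dedicated tool instead.

```python
def clean_records(records, fill_defaults):
    """Fill missing values with per-column defaults, then drop exact duplicates."""
    seen = set()
    cleaned = []
    for row in records:
        # Treat None and empty strings as missing and replace them
        # with the default configured for that column.
        filled = {
            key: (fill_defaults.get(key) if value in (None, "") else value)
            for key, value in row.items()
        }
        # A sorted tuple of (column, value) pairs is hashable, so it can
        # serve as a fingerprint for spotting duplicate records.
        fingerprint = tuple(sorted(filled.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            cleaned.append(filled)
    return cleaned


# Hypothetical raw data with one duplicate and one incomplete record.
raw = [
    {"customer": "Ana", "city": "Lima", "amount": 120},
    {"customer": "Ana", "city": "Lima", "amount": 120},
    {"customer": "Ben", "city": "", "amount": None},
]
cleaned = clean_records(raw, fill_defaults={"city": "Unknown", "amount": 0})
```

Filling before deduplicating means two rows that differ only in
how a missing value was written are still treated as duplicates.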
Model
In the modeling stage, data analysts create a data model that
represents the structure, relationships, and constraints of the
data. This involves designing a schema, which is a blueprint of
how the data is organized and stored.
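As an illustration, a schema with structure, a relationship, and
constraints could be sketched with Python's built-in sqlite3
module. The table and column names below are assumptions made up
for the example, not part of any particular data model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Structure: two tables. Relationship: orders.customer_id points at
# customers. Constraints: NOT NULL and a non-negative amount.
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    amount      REAL NOT NULL CHECK (amount >= 0)
);
""")

conn.execute("INSERT INTO customers VALUES (1, 'Ana')")
conn.execute("INSERT INTO orders VALUES (10, 1, 99.5)")

# An order for a customer that does not exist breaks the relationship,
# so the database rejects it.
try:
    conn.execute("INSERT INTO orders VALUES (11, 999, 5.0)")
    violation_caught = False
except sqlite3.IntegrityError:
    violation_caught = True
```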
Analyze
This step is the core of the data analysis process, where data
analysts dig deep into the data to uncover insights and answer
specific questions. Analysis can take many forms, including:
• Descriptive analysis: Describe what the data looks like in its
basic form.
• Exploratory analysis: Dig deeper to try and find interesting
patterns or relationships between different parts of the data.
• Inferential analysis: Use available data to make guesses or
predictions about things outside the data.
• Predictive analysis: Use statistics to predict what might happen
in the future based on what's happened in the past.
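To make two of these forms concrete, the sketch below runs a
descriptive summary and a very simple predictive step on a small
series. The monthly_sales values are invented, and the
straight-line trend is only one possible predictive technique,
picked here because it is easy to follow.

```python
from statistics import mean, median

# Hypothetical monthly sales figures, oldest first.
monthly_sales = [100, 120, 140, 160, 180]

# Descriptive analysis: summarize what the data looks like.
summary = {
    "mean": mean(monthly_sales),
    "median": median(monthly_sales),
    "min": min(monthly_sales),
    "max": max(monthly_sales),
}


def fit_line(ys):
    """Least-squares straight line through the points (0, ys[0]), (1, ys[1]), ..."""
    xs = list(range(len(ys)))
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Predictive analysis: extend the past trend one month into the future.
slope, intercept = fit_line(monthly_sales)
next_month = slope * len(monthly_sales) + intercept
```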
Visualize
By creating charts, graphs, and other visual representations of
data, analysts can more easily spot trends, outliers, and
relationships between variables. This helps them gain a deeper
understanding of the data and communicate their findings to
stakeholders in a way that's easy to understand.
Manage
Data management is a critical aspect of the data analysis
process that ensures the integrity, consistency, and security of
the data being used. This involves implementing best practices
for data storage, backup, and access control, as well as
maintaining data documentation and metadata.
Stages
Data processing: prepare raw data for analysis
Data analysis: transform data into insights
Stakeholder experience
The specific needs, preferences, and expectations of stakeholders
engaging with the visualizations and insights provided in a data
analysis report. It affects how relevant, useful, and
understandable the visualizations and analysis are.
Identify and analyze data
- A good place to start analysis is to streamline the business
requirement from complex to simple, and then establish
relationships between the topics involved.
- The first step is to determine the data to be measured. This
includes:
• Internal company data
• Data from social media
• Sensor generated data
- A critical source of this information can be a company's
Enterprise Resource Planning (ERP) system. ERP systems
are designed to collect, store, manage, and interpret structured
data from various business activities.
• Structured data: organized into a formatted repository, typically
a database, so it's easily searchable
• Semi-structured data: it contains tags or other markers to
separate data elements and enforce hierarchies of records and
fields within the data
• Unstructured data: has no predefined structure or format, for
example text files, audio, and video
ETL process
Extract: involves retrieving and extracting raw data from
different sources, such as databases, files, or other data
storage systems
Transform: involves cleaning, structuring, and enriching the
data to make it more suitable for analysis
Load: involves loading the transformed data into the final
storage system
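The three ETL steps can be sketched end to end with the Python
standard library. Everything specific here is an assumption for
illustration: the inline CSV text stands in for a source file,
the sales table is a made-up target, and skipping rows with no
amount is just one possible cleaning rule.

```python
import csv
import io
import sqlite3

# Hypothetical raw source data with untidy names and a missing amount.
RAW_CSV = """name,amount
 ana ,10.5
Ben,
carla,7
"""


def extract(text):
    """Extract: read the raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without an amount, tidy names, convert types."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # no amount recorded, so the row is skipped
        cleaned.append((row["name"].strip().title(), float(row["amount"])))
    return cleaned


def load(rows, conn):
    """Load: write the transformed rows into the final storage system."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
```

Keeping each step in its own function makes it possible to swap
the source or the target without touching the cleaning logic.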
Data sources
• SQL Server databases
• Cloud-based data sources
• Microsoft Excel spreadsheets
• On-premises data sources
• Web-based data sources
• NoSQL databases
Flat file: file type that contains a single data table, with
a uniform structure for every row of data, and does not
have hierarchies.
Types of data
• Structured: quantitative data that can be searched, sorted,
and analyzed
• Unstructured: does not have a predefined structure or
format. It is best used for qualitative analysis and
usually resides in non-relational databases or
unprocessed file formats. Examples: text files, audio, video
Data serialization
Data serialization is the process of converting
semi-structured data into a specific format that can be
easily transmitted, stored, or processed. The format is
one that both the sender and the receiver understand,
so neither needs to know all the specific details of how
the other represents the data internally.
One format that allows the storage of unstructured or
semi-structured data is a blob, or binary large object,
in which the data is stored in binary form (ones and
zeros).
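The serialization and blob storage described above could look
roughly like this in Python. JSON is used as the serialization
format and a SQLite BLOB column as the binary store. Both are
assumptions for the example, as are the order record and the
documents table.

```python
import json
import sqlite3

# Hypothetical semi-structured record with nested fields.
order = {
    "order_id": 42,
    "customer": {"name": "Ana", "tags": ["vip", "newsletter"]},
    "items": [{"sku": "A-1", "qty": 2}],
}

# Serialize: turn the in-memory object into a text format that any
# JSON-aware receiver can read.
payload = json.dumps(order, sort_keys=True)

# Encode the text as bytes so it can be stored as a binary large object.
blob = payload.encode("utf-8")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (doc_id INTEGER PRIMARY KEY, body BLOB)")
conn.execute("INSERT INTO documents VALUES (?, ?)", (1, blob))

# Read the blob back and deserialize it into an equivalent object.
stored, stored_type = conn.execute(
    "SELECT body, typeof(body) FROM documents WHERE doc_id = 1"
).fetchone()
restored = json.loads(stored.decode("utf-8"))
```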
Data transformation
Data from different sources can be untidy, incomplete,
and inconsistent, making it difficult to draw meaningful
insights. That's why data transformation is a crucial
step.
Data combination
Consolidating information means bringing information
from various sources or tables together into a single
table, providing a unified view of the data.
Instead of working with multiple separate
tables, having a single consolidated table
reduces complexity and makes it easier to handle data
updates, refreshes, and maintenance tasks.
Append: adding rows of one table or query to another
table or query. By adding multiple lists one below the
other, you will see an increase in the number of rows.
Merge: consolidating data from multiple tables into a
single entity by leveraging a shared column between
the tables.
Join: merging or combining data from different
places to create a bigger, more complete dataset.
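A small sketch of append and merge on plain Python lists of
dictionaries. The tables, the column names, and the helper
merge_tables are hypothetical, and the merge shown keeps only
rows with a match in both tables, which is one of several
possible join behaviors.

```python
# Hypothetical tables represented as lists of row dictionaries.
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
]
orders_q1 = [{"order_id": 10, "customer_id": 1, "amount": 50}]
orders_q2 = [
    {"order_id": 11, "customer_id": 2, "amount": 75},
    {"order_id": 12, "customer_id": 1, "amount": 20},
]

# Append: stack the rows of one table below the other, so the
# number of rows grows while the columns stay the same.
all_orders = orders_q1 + orders_q2


def merge_tables(left, right, key):
    """Merge two tables on a shared column, keeping only matching rows."""
    # Index the right table by the shared column for quick lookups.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    merged = []
    for left_row in left:
        for right_row in index.get(left_row[key], []):
            # Combine the columns of both rows into a single wider row.
            merged.append({**left_row, **right_row})
    return merged


# Merge: add each customer's name to their orders via customer_id.
orders_with_names = merge_tables(all_orders, customers, "customer_id")
```

Append increases the row count, while merge widens each row with
columns from the other table.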