Marketing Analysis Project - PowerBI

Detailed description of implementing a Marketing Analysis strategy using PowerBI

Introduction to Business Problem

An online retail business is facing reduced customer engagement and conversion rates despite launching several new online marketing campaigns. They have reached out for help in conducting a detailed analysis and identifying areas for improvement in their marketing strategies.

Key Points
Reduced Customer Engagement: The number of customer interactions and engagement with the site and marketing content has declined.
Decreased Conversion Rates: Fewer site visitors are converting into paying customers.
High Marketing Expenses: Significant investments in marketing campaigns are not yielding the expected returns.
Need for Customer Feedback Analysis: Understanding customer opinions about products and services is crucial for improving engagement and conversions.

Requests for Data Analysis to Improve Marketing Strategy

Help in identifying areas for improvement for Marketing Campaigns

Despite their increased investment in marketing, they've observed a decline in customer engagement and conversion rates. Marketing expenses have gone up, but the return on investment isn't meeting expectations. They need a comprehensive analysis to understand the effectiveness of the current strategies and to find opportunities to optimize their efforts.
They have data from various sources, including customer reviews, social media comments, and campaign performance metrics. We will try to uncover insights that will be invaluable in helping them turn the situation around.
Help in identifying Customer Engagement and Satisfaction

Over the past few months, they've noticed a drop in customer engagement and satisfaction, which is impacting overall conversion rates. They've gathered a significant amount of customer reviews and social media comments that highlight various issues and sentiments. By thoroughly analyzing this feedback, they believe they can gain a better understanding of customers' needs and pain points.
We'll try to decode this feedback and provide actionable insights to guide them in improving the customer experience and, ultimately, boosting engagement and conversion rates.
Calculate and present some important KPIs

Goals

Increase Conversion Rates:


• Goal: Identify factors impacting the conversion rate and provide
recommendations to improve it.
• Insight: Highlight key stages where visitors drop off and suggest
improvements to optimize the conversion funnel.
Enhance Customer Engagement:
• Goal: Determine which types of content drive the highest
engagement.
• Insight: Analyze interaction levels with different types of
marketing content to inform better content strategies.
Improve Customer Feedback Scores:
• Goal: Understand common themes in customer reviews and
provide actionable insights.
• Insight: Identify recurring positive and negative feedback to
guide product and service improvements.

Tools
SQL
Python
 pandas
 pyodbc
 nltk
 SentimentIntensityAnalyzer
 vader_lexicon
PowerBI Desktop
PowerBI Service
Extract and Clean data with SQL
Extracting and cleaning data with SQL: this step involves walking through the various SQL transformations applied to each table in the dataset, preparing our data for visualization in Power BI.
The following table shows the tables we'll be working with, their fields and data types.

TableName           ColumnName            DataType
customer_journey    Action                nvarchar
customer_journey    CustomerID            tinyint
customer_journey    Duration              float
customer_journey    JourneyID             smallint
customer_journey    ProductID             tinyint
customer_journey    Stage                 nvarchar
customer_journey    VisitDate             date
customer_reviews    CustomerID            tinyint
customer_reviews    ProductID             tinyint
customer_reviews    Rating                tinyint
customer_reviews    ReviewDate            date
customer_reviews    ReviewID              smallint
customer_reviews    ReviewText            nvarchar
customers           Age                   tinyint
customers           CustomerID            tinyint
customers           CustomerName          nvarchar
customers           Email                 nvarchar
customers           Gender                nvarchar
customers           GeographyID           tinyint
engagement_data     CampaignID            tinyint
engagement_data     ContentID             tinyint
engagement_data     ContentType           nvarchar
engagement_data     EngagementDate        date
engagement_data     EngagementID          smallint
engagement_data     Likes                 smallint
engagement_data     ProductID             tinyint
engagement_data     ViewsClicksCombined   nvarchar
geography           City                  nvarchar
geography           Country               nvarchar
geography           GeographyID           tinyint
products            Category              nvarchar
products            Price                 float
products            ProductID             tinyint
products            ProductName           nvarchar
Dimension table - Products

 We'll be dropping the product Category column as it has only one category – Sports.
 We'll be labelling the products by their price using CASE. The thresholds used are the fixed values seen in the code below; it would also be interesting to derive them from a statistical metric, such as the average price plus or minus one standard deviation (a sketch of that idea follows the query below). This new column may prove useful, so we simply add it.
 There are other transformations and columns we could add, but we'll leave the table like this for now. If we later feel another column makes sense, we can simply create it.

-- SQL Query to categorize products based on their price

SELECT
    ProductID,    -- Selects the unique identifier for each product
    ProductName,  -- Selects the name of each product
    Price,        -- Selects the price of each product
    -- Category,  -- The product category is left out, as it holds a single value
    CASE          -- Categorizes the products into price categories: Low, Medium or High
        WHEN Price < 50 THEN 'Low'                   -- If the price is less than 50, categorize as 'Low'
        WHEN Price BETWEEN 50 AND 200 THEN 'Medium'  -- If the price is between 50 and 200 (inclusive), categorize as 'Medium'
        ELSE 'High'                                  -- If the price is greater than 200, categorize as 'High'
    END AS PriceCategory  -- Names the new column PriceCategory
FROM
    products;  -- Specifies the source table from which to select the data
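On the statistical-labelling idea mentioned above: the sketch below (not used in the project) shows how the same CASE logic could derive its thresholds from the average price plus or minus one standard deviation, using window aggregates over the whole products table.

-- Sketch only: label products relative to AVG(Price) +/- 1 standard deviation
SELECT
    ProductID,
    ProductName,
    Price,
    CASE
        WHEN Price < AVG(Price) OVER () - STDEV(Price) OVER () THEN 'Low'   -- More than 1 std dev below the mean
        WHEN Price > AVG(Price) OVER () + STDEV(Price) OVER () THEN 'High'  -- More than 1 std dev above the mean
        ELSE 'Medium'                                                       -- Within 1 std dev of the mean
    END AS PriceCategory
FROM
    products;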
Dimension table – Customers

The Customers table has a GeographyID that relates to the geography table, so we can enrich the Customers table with this information. Power BI has great features to visualize geographical data, so we'll be joining customers to geography to get the location fields (Country and City) for the customers.
As we'll be bringing the geographical information into the table, we'll drop GeographyID from customers. Also, we won't be importing the geography table into Power BI later on.
-- SQL statement to join customers with geography to enrich customer data with geographic information

SELECT
    c.CustomerID,    -- Selects the unique identifier for each customer
    c.CustomerName,  -- Selects the name of each customer
    c.Email,         -- Selects the email of each customer
    c.Gender,        -- Selects the gender of each customer
    c.Age,           -- Selects the age of each customer
    g.Country,       -- Selects the country from the geography table to enrich customer data
    g.City           -- Selects the city from the geography table to enrich customer data
FROM
    customers AS c   -- Specifies the alias 'c' for the customers table
LEFT JOIN
    geography AS g   -- Specifies the alias 'g' for the geography table
ON
    c.GeographyID = g.GeographyID;  -- Joins the two tables on the GeographyID field to match customers with their geographic information

Again, there are other transformations and new columns we could consider creating here that might make sense, such as binning ages, but we won't be doing that; a sketch is shown below for illustration. We could also create such a column directly inside Power BI, but best practice says that, for performance reasons, new columns that can be created in the source system should be created there.
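For illustration only, here is a minimal sketch of what such an age-binning column could look like if we did create it at the source; the bin boundaries are arbitrary and this column is not part of the project.

-- Sketch only: binning customer ages at the source with a CASE expression
SELECT
    CustomerID,
    Age,
    CASE
        WHEN Age < 25 THEN '18-24'
        WHEN Age < 35 THEN '25-34'
        WHEN Age < 50 THEN '35-49'
        ELSE '50+'
    END AS AgeGroup  -- Hypothetical column; we keep Age as-is in the project
FROM
    customers;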
Facts table – Customer_Reviews

Below, a view of 10 rows from the original table

Here we identified that there are double spaces between words, and we're removing the extra space using REPLACE.
NOTE: This is the table on which we'll focus our sentiment analysis using Python. For the moment we'll simply collapse the double spaces into single ones.
This is because one of our objectives is to try and understand why customer engagement and satisfaction have been decreasing, so we'll be enriching this table with data that provides more information to help explain those issues.
SELECT
    ReviewID,    -- Selects the unique identifier for each review
    CustomerID,  -- Selects the unique identifier for each customer
    ProductID,   -- Selects the unique identifier for each product
    ReviewDate,  -- Selects the date when the review was written
    Rating,      -- Selects the numerical rating given by the customer (e.g., 1 to 5 stars)
    -- Cleans up the ReviewText by replacing double spaces with single spaces to ensure the text is more readable and standardized
    REPLACE(ReviewText, '  ', ' ') AS ReviewText
FROM
    dbo.customer_reviews;  -- Specifies the source table from which to select the data
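As an aside: a single REPLACE of two spaces with one only fixes double spaces. If the reviews ever contained longer runs of spaces, a common T-SQL pattern (sketched below, assuming the marker string '<>' never appears in the review text) collapses any run of spaces into a single space.

-- Sketch only: collapse any run of spaces into a single space
SELECT
    ReviewID,
    REPLACE(REPLACE(REPLACE(ReviewText, ' ', '<>'), '><', ''), '<>', ' ') AS ReviewText
FROM
    dbo.customer_reviews;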
Facts table – Engagement_Data

Below, a view of 5 rows from the original table

We've found some issues that need to be addressed:

 Social media appears under two different string values, "Social media" and "Socialmedia", so we'll standardize it to a single value
 We'll be uppercasing the ContentType strings
 We'll split the ViewsClicksCombined field into two new columns, Views and Clicks
 We'll filter out the Newsletter content type, as it is not relevant for our analysis
 We'll reorganize the order of the columns of the table
SELECT
    EngagementID,  -- Selects the unique identifier for each engagement record
    ContentID,     -- Selects the unique identifier for each piece of content
    CampaignID,    -- Selects the unique identifier for each marketing campaign
    ProductID,     -- Selects the unique identifier for each product
    -- Replaces "Socialmedia" with "Social Media" and then converts all ContentType values to uppercase
    UPPER(REPLACE(ContentType, 'Socialmedia', 'Social Media')) AS ContentType,
    -- Extracts the Views part from the ViewsClicksCombined column by taking the substring before the '-' character
    LEFT(ViewsClicksCombined, CHARINDEX('-', ViewsClicksCombined) - 1) AS Views,
    -- Extracts the Clicks part from the ViewsClicksCombined column by taking the substring after the '-' character
    RIGHT(ViewsClicksCombined, LEN(ViewsClicksCombined) - CHARINDEX('-', ViewsClicksCombined)) AS Clicks,
    Likes,         -- Selects the number of likes the content received
    -- Converts the EngagementDate to the dd.mm.yyyy format
    FORMAT(CONVERT(DATE, EngagementDate), 'dd.MM.yyyy') AS EngagementDate
FROM
    dbo.engagement_data  -- Specifies the source table from which to select the data
WHERE
    ContentType != 'Newsletter';  -- Filters out rows where ContentType is 'Newsletter', as these are not relevant for our analysis

Below, the result of our actions on the engagement table


Facts table – Customer_Journey

Below, a view of 10 rows from the original table (we’ll include examples of
some issues we found on the records)

This is an important table in our project, as it gives us a lot of useful information for our analysis. For example: did they view a product? Did they click on it? Did they buy something?
Before that, though, we noticed some important issues that need to be addressed, and for that we'll use somewhat more complex SQL queries.
We noticed that there are some duplicates in the table, and we'll need to remove them. To identify them we can use the following query:
SELECT
    JourneyID,   -- Select the unique identifier for each journey (and any other columns you want to include in the final result set)
    CustomerID,  -- Select the unique identifier for each customer
    ProductID,   -- Select the unique identifier for each product
    VisitDate,   -- Select the date of the visit, which helps in determining the timeline of customer interactions
    Stage,       -- Select the stage of the customer journey (e.g., Awareness, Consideration, etc.)
    Action,      -- Select the action taken by the customer (e.g., View, Click, Purchase)
    Duration,    -- Select the duration of the action or interaction
    -- Use ROW_NUMBER() to assign a unique row number to each record within the partition defined below
    ROW_NUMBER() OVER (
        -- PARTITION BY groups the rows based on the specified columns that should be unique
        PARTITION BY CustomerID, ProductID, VisitDate, Stage, Action
        -- ORDER BY defines how to order the rows within each partition (usually by a unique identifier like JourneyID)
        ORDER BY JourneyID
    ) AS row_num  -- This creates a new column 'row_num' that numbers each row within its partition
FROM
    customer_journey  -- Specifies the source table from which to select the data
ORDER BY JourneyID;

This query retrieves all rows, and if there are duplicates, it assigns a
sequential number to the duplicate rows.
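To list only the duplicate rows themselves, we can wrap that query and keep the rows whose row_num is greater than 1, for example:

-- Shows every occurrence after the first within each duplicate group
SELECT *
FROM (
    SELECT
        JourneyID, CustomerID, ProductID, VisitDate, Stage, Action, Duration,
        ROW_NUMBER() OVER (
            PARTITION BY CustomerID, ProductID, VisitDate, Stage, Action
            ORDER BY JourneyID
        ) AS row_num
    FROM customer_journey
) AS numbered
WHERE row_num > 1
ORDER BY CustomerID, ProductID, VisitDate;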
A different way to find duplicates is to use the following query.
SELECT
    CustomerID,
    ProductID,
    VisitDate,
    Stage,
    Action,
    COUNT(*) AS DuplicateCount
FROM customer_journey
-- JourneyID and Duration are left out of the grouping: duplicates share the columns below but have different JourneyIDs
GROUP BY CustomerID, ProductID, VisitDate, Stage, Action
HAVING COUNT(*) > 1            -- Keeps only the groups that occur more than once
ORDER BY DuplicateCount DESC;

Discovering the duplicates and addressing them are two distinct actions. As we'll be importing the data into Power BI later, we have to find the duplicates and then make sure we only import non-duplicate records. We'll get there.
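If we instead wanted to fix the source table itself rather than filter at import time, one option (a sketch, not what we do in this project) is to delete the repeats through an updatable CTE:

-- Sketch only: permanently remove duplicate rows from the source table
WITH numbered AS (
    SELECT
        ROW_NUMBER() OVER (
            PARTITION BY CustomerID, ProductID, VisitDate, Stage, Action
            ORDER BY JourneyID
        ) AS row_num
    FROM customer_journey
)
DELETE FROM numbered
WHERE row_num > 1;  -- Keeps only the first occurrence of each group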
We also noticed that there are NULL values in the Duration column. For this exercise we'll be replacing those values with a calculated value.
There are other approaches to addressing NULLs, both different techniques and different replacement metrics that, depending on the analysis, could make more sense. In this case we'll proceed by replacing them with an average duration, calculated from the values available in our table.
We opted to calculate the average duration of the visits per visit date. This is the syntax used to calculate that average duration:
AVG(Duration) OVER (PARTITION BY VisitDate) AS avg_duration

Then, we’ll replace the NULL values with this value using the following
syntax
COALESCE(Duration, avg_duration) AS Duration

These were the issues that really needed to be addressed in this table.
Putting it all together, we have the final SQL query that addresses them:
SELECT
    JourneyID,   -- Selects the unique identifier for each journey to ensure data traceability
    CustomerID,  -- Selects the unique identifier for each customer to link journeys to specific customers
    ProductID,   -- Selects the unique identifier for each product to analyze customer interactions with different products
    VisitDate,   -- Selects the date of the visit to understand the timeline of customer interactions
    Stage,       -- Uses the uppercased stage value from the subquery for consistency in analysis
    Action,      -- Selects the action taken by the customer (e.g., View, Click, Purchase)
    COALESCE(Duration, avg_duration) AS Duration  -- Replaces missing durations with the average duration for the corresponding date
FROM
    (
        -- Subquery to process and clean the data
        SELECT
            JourneyID,
            CustomerID,
            ProductID,
            VisitDate,
            UPPER(Stage) AS Stage,  -- Converts Stage values to uppercase for consistency in data analysis
            Action,
            Duration,               -- Uses Duration directly, as it's already a numeric type
            AVG(Duration) OVER (PARTITION BY VisitDate) AS avg_duration,  -- Calculates the average duration for each date, using only numeric values
            ROW_NUMBER() OVER (
                PARTITION BY CustomerID, ProductID, VisitDate, UPPER(Stage), Action  -- Groups by these columns to identify duplicate records
                ORDER BY JourneyID  -- Orders by JourneyID to keep the first occurrence of each duplicate
            ) AS row_num            -- Assigns a row number to each row within the partition to identify duplicates
        FROM
            customer_journey        -- Specifies the source table from which to select the data
    ) AS subquery  -- Names the subquery for reference in the outer query
WHERE
    row_num = 1;   -- Keeps only the first occurrence of each duplicate group identified in the subquery

Final table without duplicates and NULLs


Data Enrichment with Sentiment Analysis using
Python

We'll be enhancing our marketing data by incorporating sentiment analysis, taking some steps to start joining all our stages together and create a cohesive project.
We're going to take the review text and use a Python library to perform sentiment analysis on it: basically analyzing what the text says and then retrieving a positive or negative sentiment from it.
As we can see, there's already a Rating field that gives us a way of more or less "knowing" the sentiment of the review. For this exercise we'll be focusing only on the review text and trying to determine whether what was written was positive or negative. This adds information that we can then combine with the rating to enrich or deepen the analysis.
In order to work with Python, you'll have to install it and then use a program or online notebook to work with it. Personally I use Visual Studio Code and sometimes an online Jupyter Notebook.
Import python libraries

Python is a versatile and powerful programming language widely used for


tasks ranging from data analysis to web development and machine
learning. It operates by executing code written in a readable, high-level
syntax, allowing developers to leverage its extensive libraries and
frameworks for specialized tasks. For sentiment analysis, Python offers
libraries like nltk, which we used here to simplify the process of analyzing
text data and extracting insights, such as identifying whether the sentiment
is positive or negative. Its adaptability and simplicity make Python a key
tool for integrating data enrichment steps, such as the one described in this
project.

# pip install pandas nltk pyodbc

import pandas as pd
import pyodbc
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
# Before importing these libraries we need to install them; the pip command above does that.

# Download the VADER lexicon for sentiment analysis if not already present.
nltk.download('vader_lexicon')
# This is another component we'll need during the exercise. It's part of the nltk library, so it's just a matter of bringing it into play.

For this project, we’ll utilize several Python libraries to perform sentiment
analysis and connect to our data.

 pandas is a powerful library for data manipulation and analysis,


allowing us to structure, clean, and process our data in an efficient and
readable way
 pyodbc provides the tools needed to establish a connection between
Python and our SQL database, enabling us to query and retrieve data
directly into Python.
For sentiment analysis, we'll use,
 nltk (Natural Language Toolkit), a robust library designed for text processing. Specifically, we'll leverage its,
 VADER (Valence Aware Dictionary and sEntiment Reasoner) tool, exposed through the SentimentIntensityAnalyzer class, which is well-suited for analyzing sentiment in text data. Before using these components, you'll need to ensure they are installed, and for VADER, you'll need to download its lexicon using,
 nltk.download('vader_lexicon').
Together, these tools allow us to seamlessly integrate data retrieval,
manipulation, and sentiment analysis into our workflow.
Create dataframe
Let's start the enrichment of our data with sentiment analysis.
# We're creating a dataframe from a SQL statement
# We're defining the attributes needed to connect to the database instance
# We're just selecting these columns; no transformation is being done

# Define a function to fetch data from a SQL database using a SQL query
def fetch_data_from_sql():
    # Define the connection string with parameters for the database connection
    conn_str = (
        "Driver={ODBC Driver 17 for SQL Server};"        # Specify the driver for SQL Server
        "Server=DESKTOP-PC06D3B;"                        # Specify your SQL Server instance
        "Database=PortfolioProject_MarketingAnalytics;"  # Specify the database name
        "Trusted_Connection=yes;"                        # Use Windows Authentication for the connection
    )
    # Establish the connection to the database
    conn = pyodbc.connect(conn_str)

    # Define the SQL query to fetch customer reviews data
    query = "SELECT ReviewID, CustomerID, ProductID, ReviewDate, Rating, ReviewText FROM customer_reviews"

    # Execute the query and fetch the data into a DataFrame
    df = pd.read_sql(query, conn)

    # Close the connection to free up resources
    conn.close()

    # Return the fetched data as a DataFrame
    return df

# Fetch the customer reviews data from the SQL database
customer_reviews_df = fetch_data_from_sql()

We've created our dataframe customer_reviews_df in pandas. It may seem a bit abstract that you don't immediately see what you've created, but you can always take a peek at the dataframe:
# Display the first few rows of the DataFrame
display(customer_reviews_df.head())

As we can see, our dataframe is populated with the data we queried from
the database.
Create new sentiment analysis columns
Next we'll create 3 new columns in our dataframe (table, if you wish). These 3 new columns are the ones that will enrich our data so we can perform further analysis. If you're like me and quite new to Python, the final Python code may confuse you at first glance, so we'll decompose it and join the related parts so you can understand the process.
In Python, a common workflow is to first define the functions that encapsulate the logic or calculations needed for the data transformations. After defining the functions, we apply them to the dataframe columns to create the new fields.

SentimentScore

Let's look at the example of SentimentScore, a new column we'll be creating. The function relies on sia, an instance of VADER's SentimentIntensityAnalyzer, so we instantiate it first and then define the function calculate_sentiment:

# Initialize the VADER sentiment analyzer used by the function below
sia = SentimentIntensityAnalyzer()

def calculate_sentiment(review):
    # Get the sentiment scores for the review text
    sentiment = sia.polarity_scores(review)
    # Return the compound score, which is a normalized score between -1 (most negative) and 1 (most positive)
    return sentiment['compound']

and only when creating the new column (field) SentimentScore do we use the function to populate its values:
customer_reviews_df['SentimentScore'] = customer_reviews_df['ReviewText'].apply(calculate_sentiment)

So, at the moment we create the column, we also pass the ReviewText field from our dataframe as the parameter (review) that the function calculate_sentiment uses to return the polarity score.
The same method and sequence of actions are applied for the other two
functions and columns that we’ll be creating. We’ll go through both next.
SentimentCategory

Again, first we define the function that will compute the values to populate the new SentimentCategory column.
This function uses the SentimentScore calculated before and the Rating coming from our customer_reviews table to attribute a category to the sentiment, using if statements to create the categories.
def categorize_sentiment(score, rating):
    # Use both the text sentiment score and the numerical rating to determine the sentiment category
    if score > 0.05:  # Positive sentiment score
        if rating >= 4:
            return 'Positive'        # High rating and positive sentiment
        elif rating == 3:
            return 'Mixed Positive'  # Neutral rating but positive sentiment
        else:
            return 'Mixed Negative'  # Low rating but positive sentiment
    elif score < -0.05:  # Negative sentiment score
        if rating <= 2:
            return 'Negative'        # Low rating and negative sentiment
        elif rating == 3:
            return 'Mixed Negative'  # Neutral rating but negative sentiment
        else:
            return 'Mixed Positive'  # High rating but negative sentiment
    else:  # Neutral sentiment score
        if rating >= 4:
            return 'Positive'        # High rating with neutral sentiment
        elif rating <= 2:
            return 'Negative'        # Low rating with neutral sentiment
        else:
            return 'Neutral'         # Neutral rating and neutral sentiment

Again, when creating the column in our dataframe, we also define how it will be calculated by applying the function:
customer_reviews_df['SentimentCategory'] = customer_reviews_df.apply(
    lambda row: categorize_sentiment(row['SentimentScore'], row['Rating']), axis=1)

Without going too deep into the code: the lambda function operates per row, and axis=1 confirms that the function is applied row-wise rather than column-wise.
SentimentBucket

This column merely further demonstrates how we can enrich our data with fields that we think can make sense. It does not mean that it will be used or be super useful; the idea is that we can enrich our data with information that may be either required or potentially useful.
This column divides the sentiment into 4 buckets, similar to the previous column but as another take on categorizing the sentiment. The function only uses SentimentScore and if statements to categorize the sentiments.
First we define how the function will operate
def sentiment_bucket(score):
    if score >= 0.5:
        return '0.5 to 1.0'    # Strongly positive sentiment
    elif 0.0 <= score < 0.5:
        return '0.0 to 0.49'   # Mildly positive sentiment
    elif -0.5 <= score < 0.0:
        return '-0.49 to 0.0'  # Mildly negative sentiment
    else:
        return '-1.0 to -0.5'  # Strongly negative sentiment

and then, when creating the column, we do what we did for the other 2 columns:
customer_reviews_df['SentimentBucket'] = customer_reviews_df['SentimentScore'].apply(sentiment_bucket)

This ends the Python part of our project. As we can see, Python is a powerful tool to transform and enrich data, both with methods native to Python and with methods that produce the same result as they would if done with other, more conventional tools.
Here is our dataframe with the data we created, ready to export as a .csv file and then to be imported into Power BI.
Full python code

# Initialize the VADER sentiment analyzer; this 'sia' instance is used by calculate_sentiment below
sia = SentimentIntensityAnalyzer()

# Define a function to calculate sentiment scores using VADER
# Here we're defining a function to grade the sentiment of the reviews
def calculate_sentiment(review):
    # Get the sentiment scores for the review text
    sentiment = sia.polarity_scores(review)
    # Return the compound score, which is a normalized score between -1 (most negative) and 1 (most positive)
    return sentiment['compound']

# Define a function to categorize sentiment using both the sentiment score and the review rating
# Score is a field created by VADER; Rating comes from our customer_reviews table.
def categorize_sentiment(score, rating):
    # Use both the text sentiment score and the numerical rating to determine the sentiment category
    if score > 0.05:  # Positive sentiment score
        if rating >= 4:
            return 'Positive'        # High rating and positive sentiment
        elif rating == 3:
            return 'Mixed Positive'  # Neutral rating but positive sentiment
        else:
            return 'Mixed Negative'  # Low rating but positive sentiment
    elif score < -0.05:  # Negative sentiment score
        if rating <= 2:
            return 'Negative'        # Low rating and negative sentiment
        elif rating == 3:
            return 'Mixed Negative'  # Neutral rating but negative sentiment
        else:
            return 'Mixed Positive'  # High rating but negative sentiment
    else:  # Neutral sentiment score
        if rating >= 4:
            return 'Positive'        # High rating with neutral sentiment
        elif rating <= 2:
            return 'Negative'        # Low rating with neutral sentiment
        else:
            return 'Neutral'         # Neutral rating and neutral sentiment

# Define a function to create buckets based only on the score given to the text
def sentiment_bucket(score):
    if score >= 0.5:
        return '0.5 to 1.0'    # Strongly positive sentiment
    elif 0.0 <= score < 0.5:
        return '0.0 to 0.49'   # Mildly positive sentiment
    elif -0.5 <= score < 0.0:
        return '-0.49 to 0.0'  # Mildly negative sentiment
    else:
        return '-1.0 to -0.5'  # Strongly negative sentiment

# Apply sentiment analysis to calculate sentiment scores for each review
# Here we are creating the column 'SentimentScore', calculated as per the definition.
customer_reviews_df['SentimentScore'] = customer_reviews_df['ReviewText'].apply(calculate_sentiment)

# Apply sentiment categorization using both text and rating
# Here we are creating the column 'SentimentCategory', calculated as per the definition.
customer_reviews_df['SentimentCategory'] = customer_reviews_df.apply(
    lambda row: categorize_sentiment(row['SentimentScore'], row['Rating']), axis=1)

# Apply sentiment bucketing to categorize scores into defined ranges
# Here we are creating the column 'SentimentBucket', calculated as per the definition.
customer_reviews_df['SentimentBucket'] = customer_reviews_df['SentimentScore'].apply(sentiment_bucket)
PowerBI
Importing tables to PowerBI
To do so, let's connect Power BI to SQL Server and select the tables we want to import. We'll first be importing the tables with the data "as is". Then, from inside Power Query, we'll apply the SQL queries we used before, transforming the data at its source and bringing in the transformed data we'll actually work with. For the moment, let's import the data "as is".

The tables were imported, and Power BI already created some relationships between them (which we will be deleting and replacing with new ones, so the model serves our needs and helps us achieve our goals).
Querying SQL database from Power Query
Opening Power Query and stepping through the tables, we can see that the data came in exactly as it was on the SQL Server, meaning that the queries we used to transform the data weren't persisted. We used them to learn how to query the database from Power BI and bring the data in the shape we want it to arrive.
For example, customer_journey still has NULLs and duplicates, and the same happens with the other tables. This first import was meant to show how we can query the SQL database from inside Power Query. Let's start:
In the right pane, "Query settings", we can see the applied steps. Those steps resulted in the import of the data as shown above.
If we remove the "Navigation" step we'll end up with no data, just the names of the tables in the database. Then, clicking on the settings wheel, we get the import window again.
Clicking on "Advanced options", the area where we can insert the SQL query appears. We then paste our query in that area and click "OK".

And there we have it: the data is imported with the transformations we did in SQL before.

Now we'll replicate this step for every table and bring our transformed and enriched data into Power Query. After importing all tables into Power Query, it is very important to check the data types with which Power Query imported the fields.
From here we can see that:
 VisitDate has been imported as Date/Time even though no time stamp exists
 ProductID has been imported as decimal, when whole number is probably better
NOTE: Our enriched table with sentiment analysis will also be imported, but via the .csv file we generated.
We'll now do these verifications and prepare our data, so we can continue in Power BI Desktop and start building our model.
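Alternatively, following the earlier "transform at the source" guideline, the desired types could be forced in the source query itself. A minimal sketch for customer_journey (not what we do in the project, where we fix the types in Power Query):

-- Sketch only: forcing the desired data types in the customer_journey source query
SELECT
    JourneyID,
    CustomerID,
    CAST(ProductID AS INT) AS ProductID,   -- whole number instead of decimal
    CAST(VisitDate AS DATE) AS VisitDate,  -- date only, no time stamp
    Stage,
    Action,
    Duration
FROM
    customer_journey;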

Building the model


Creating a Date Table
Before moving on and creating the relationships, we need to create a date table and take advantage of Power BI's excellent handling of date and time.
A date table is a critical component in any analytical data model because it
provides a centralized structure for handling time-based data. It allows for
consistent time-based aggregations, comparisons, and calculations, such as
year-over-year growth, month-to-date performance, or weekly trends. By
including fields for various date granularities (day, month, quarter, year), as
well as fiscal periods and custom time groupings, a date table enables
flexible and efficient time-based analysis across all connected datasets. It
also simplifies querying by removing the need for repetitive and complex
date calculations, ensuring accuracy and consistency in time-related
insights.
Power BI excels in working with time and date data, offering built-in
functionality and features specifically designed for time intelligence. With
its robust support for date tables and automatic creation of date
hierarchies, Power BI’s seamless integration with custom date tables
ensures precise and flexible time-based reporting, empowering users to
uncover actionable insights with minimal effort. Whether visualizing data at
a granular daily level or summarizing it by quarters and years, Power BI
makes time-based analysis intuitive and highly effective.
Date tables can be created from scratch or we can use pre-built scripts with
date tables already created. For our project we’ll create a new table with
the following script
Calendar =
ADDCOLUMNS (
CALENDAR ( DATE ( 2023, 1, 1 ), DATE ( 2025, 12, 31 ) ),
"DateAsInteger", FORMAT ( [Date], "YYYYMMDD" ),
"Year", YEAR ( [Date] ),
"Monthnumber", FORMAT ( [Date], "MM" ),
"YearMonthnumber", FORMAT ( [Date], "YYYY/MM" ),
"YearMonthShort", FORMAT ( [Date], "YYYY/mmm" ),
"MonthNameShort", FORMAT ( [Date], "mmm" ),
"MonthNameLong", FORMAT ( [Date], "mmmm" ),
"DayOfWeekNumber", WEEKDAY ( [Date] ),
"DayOfWeek", FORMAT ( [Date], "dddd" ),
"DayOfWeekShort", FORMAT ( [Date], "ddd" ),
"Quarter", "Q" & FORMAT ( [Date], "Q" ),
"YearQuarter",
FORMAT ( [Date], "YYYY" ) & "/Q"
& FORMAT ( [Date], "Q" )
)

This is the resulting table

Creating the relationships and data model

We won't go into a detailed explanation of our model relationships; we will merely present the relationships we created and the final data model.
Our data model is specifically designed to address the key issues of reduced
customer engagement, declining conversion rates, and the need to optimize
marketing strategies. By integrating data from customer interactions,
reviews, marketing campaigns, and temporal dimensions, the model
provides a holistic view of the business. The inclusion of a well-defined
Calendar table allows for time-based analysis, enabling us to track trends in
customer behavior, sentiment, and engagement over specific periods. The
relationships between key tables such as
customer_reviews_with_sentiment, engagement_data, and
customer_journey ensure that we can connect customer feedback, online
interactions, and the stages of their journey seamlessly.
Our approach to enriching customer reviews with sentiment analysis further
deepens our ability to understand customer satisfaction. The
customer_reviews_with_sentiment table combines raw reviews with
calculated sentiment scores and categories, allowing us to identify recurring
themes in positive and negative feedback. Coupled with its connection to
the products and customers tables, we can pinpoint which products or
customer segments are driving or hindering engagement. This sentiment
data, combined with engagement metrics from the engagement_data table,
provides actionable insights into the types of content and campaigns that
are most effective, directly addressing the need to improve marketing
strategies and customer satisfaction.
Lastly, the model’s relationships allow for granular and segmented analysis
of the customer journey. The customer_journey table, linked to Calendar,
products, and customers, enables us to uncover the stages where
customers drop off and to identify patterns in behavior across
demographics and time periods. This supports the goal of optimizing the
conversion funnel by highlighting areas where improvements can be made.
Additionally, the integration of these datasets with temporal and campaign
data allows us to evaluate the ROI of marketing efforts, ensuring that
resources are allocated effectively. Overall, the structure of our model
ensures we can comprehensively address the challenges posed by declining
engagement and conversions while laying the groundwork for data-driven
decision-making.
Building the report on PowerBI Desktop

From the get-go, we should state that, although our data resembles real marketing data, it is still demo data. We will build a comprehensive report to try to achieve our goals of finding out why some of the issues pointed out at the beginning may be occurring, while keeping in mind that the underlying data is for demo purposes.
Performing an initial time analysis on several metrics, we can see that the demo data doesn't fully answer the issues presented to us; still, we'll create the report.
Having pointed this out, and with our tables imported and our data model defined, let's start creating our report.

Creating Measures (and why)

Using measures in Power BI instead of directly dragging fields into visuals is


a cornerstone of building dynamic, accurate, and scalable reports. Measures
act as reusable calculations that ensure consistency across all visuals,
enabling you to apply complex business logic without repeating or
reconfiguring it every time. For example, a Clicks measure can sum
engagement data with precision while dynamically adapting to filters,
slicers, and user interactions. This flexibility allows your metrics to reflect
real-time context, such as campaign performance over specific time frames
or segmented by customer demographics. Moreover, measures are
efficient, leveraging Power BI’s DAX engine to handle calculations on the fly,
ensuring optimal performance even with large datasets. They also simplify
the report-building process by ensuring a clean, intuitive field list, where
end users can easily find pre-defined business metrics without worrying
about underlying calculations.
From the user’s perspective, measures make reports in Power BI Service
more interactive and accessible. End users benefit from dynamic metrics
that update instantly when filters or slicers are applied, providing
meaningful insights without requiring technical knowledge of the data
structure. Additionally, measures integrate seamlessly with time
intelligence, supporting advanced analytics like year-to-date trends or
monthly comparisons with minimal effort. They also play a vital role in
maintaining governance, ensuring that all stakeholders are aligned on the
same metric definitions, and simplifying updates when business logic
changes. Whether creating KPI dashboards, applying row-level security, or
building custom tooltips, measures are essential for delivering polished,
powerful reports that empower users to make informed decisions.
Business Measures

Needless to say, with the number of facts we have in our model, we could build many more measures, either directly from the data or derived from it. In this link you can see a table with possible measures and visuals we could create using the available data, but for the purposes of this project we'll create the following measures.
Clicks = SUM(engagement_data[Clicks])

Likes = SUM(engagement_data[Likes])

Views = SUM(engagement_data[Views])

Rating (Average) = AVERAGE(customer_reviews_with_sentiment_VH[Rating])

Number of Cust Reviews = DISTINCTCOUNT(customer_reviews_with_sentiment_VH[ReviewID])

Number of Cust Journeys = DISTINCTCOUNT(customer_journey[JourneyID])

Number of Campaigns = DISTINCTCOUNT(engagement_data[CampaignID])

Conversion Rate =
VAR TotalVisitors = CALCULATE(COUNT(customer_journey[JourneyID]), customer_journey[Action] = "View")
VAR TotalPurchases = CALCULATE(COUNT(customer_journey[JourneyID]), customer_journey[Action] = "Purchase")
RETURN
    IF(TotalVisitors = 0, 0, DIVIDE(TotalPurchases, TotalVisitors))

As models can quickly scale, and possibly more than one person has access to the report with build permissions, it's good practice to create a standalone table in the model to hold the measures. It acts as a central repository for these objects, so others can quickly build more visuals using the existing measures without having to hunt for them.
Here they are, organized in the separate table created to accommodate them and accessible to other report contributors.
Creating the Report
