REPORT
Ibrahim Ghauri
23i-2653
Insights from Data Cleaning and Preprocessing
The data cleaning process involved:
Selecting Important Columns: Only relevant columns were retained, reducing the
dataset size and focusing analysis on key attributes like trackId, trackName, price,
averageUserRating, and userRatingCount.
Outlier Removal: The IQR method was used to remove outliers from the price column,
so that a few extreme prices do not distort price-related statistics.
New Columns:
o Popularity Indicator: Apps were categorized as "High," "Medium," or "Low"
based on userRatingCount.
o High Rating Flag: Apps with an averageUserRating >= 3 were marked as "high-rated."
These steps ensured that the dataset was cleaner and more focused for analysis.
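The report does not show the code behind these steps, so the following is only a minimal sketch of how they could be carried out with the Python standard library. The sample rows, the extra "currency" column, the 1.5 x IQR fence, the popularity thresholds (100 and 10,000 ratings), and the names "popularity" and "highRated" are all assumptions made for illustration; only the column names and the >= 3 rating cutoff come from the report.

```python
from statistics import quantiles

# Columns named in the report as the ones that were kept.
KEEP = ["trackId", "trackName", "price", "averageUserRating", "userRatingCount"]

def select_columns(rows, keep=KEEP):
    """Keep only the listed keys from each row (a dict)."""
    return [{key: row[key] for key in keep} for row in rows]

def remove_price_outliers(rows, k=1.5):
    """Drop rows whose price falls outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles([row["price"] for row in rows], n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [row for row in rows if low <= row["price"] <= high]

def add_flags(rows, medium_at=100, high_at=10_000, rating_cutoff=3):
    """Add a popularity label and a high-rated flag to each row.

    medium_at and high_at are assumed thresholds; the report does not state them.
    """
    flagged = []
    for row in rows:
        count = row["userRatingCount"]
        if count >= high_at:
            popularity = "High"
        elif count >= medium_at:
            popularity = "Medium"
        else:
            popularity = "Low"
        flagged.append({
            **row,
            "popularity": popularity,
            "highRated": row["averageUserRating"] >= rating_cutoff,
        })
    return flagged

# Hypothetical sample data: trackId, trackName, price, rating, rating count.
raw = [
    (1, "Alpha", 0.00, 4.5, 25_000),
    (2, "Bravo", 0.00, 2.5, 40),
    (3, "Charlie", 0.99, 3.0, 500),
    (4, "Delta", 0.99, 4.0, 12),
    (5, "Echo", 1.99, 3.5, 150),
    (6, "Foxtrot", 2.99, 4.8, 90),
    (7, "Golf", 99.99, 5.0, 3),  # extreme price, expected to be dropped
]
# Each row also carries an extra "currency" column that should be discarded.
rows = [dict(zip(KEEP, values), currency="USD") for values in raw]

cleaned = add_flags(remove_price_outliers(select_columns(rows)))
for row in cleaned:
    print(row["trackName"], row["price"], row["popularity"], row["highRated"])
```

Each step returns a new list, so the original rows stay unchanged and the steps can be reordered or tested separately.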
Key Observations Based on Analysis
Key insights from the analysis include:
1. Genre Distribution: Certain genres dominate the dataset, showing which types of apps are most common in the market.
2. Price Distribution: Most apps are free or priced under $1, indicating a trend toward
low-cost or free apps to attract users.
3. Rating Correlation: High-rated apps tend to have more user ratings, indicating a positive
relationship between app quality and user base.
4. Paid vs Free Apps: Free apps dominate the market, but paid apps tend to have slightly
higher user ratings.
Significance of Each Visualization
The following visualizations were used to provide meaningful insights into the dataset:
1. Pie Chart – Proportion of Free vs Paid Apps
Visualization: A pie chart showing the distribution of free and paid apps.
Significance: This visualization highlights the market dominance of free apps compared
to paid apps, providing context for understanding how price affects app popularity and
user acquisition strategies. The larger proportion of free apps suggests a market trend
towards offering apps for free to increase user engagement.
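The shares behind such a pie chart can be computed as in the sketch below. It assumes that an app counts as free when its price is exactly 0; the price list is hypothetical sample data, not the report's dataset.

```python
from collections import Counter

def free_paid_share(prices):
    """Return the fraction of free (price == 0) and paid apps."""
    counts = Counter("Free" if price == 0 else "Paid" for price in prices)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no prices given")
    return {label: counts[label] / total for label in ("Free", "Paid")}

prices = [0.0, 0.0, 0.0, 0.99, 2.99]  # hypothetical sample
shares = free_paid_share(prices)
print(shares)
```

The resulting dictionary maps directly onto the labels and slice sizes that a plotting library would need.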
2. Bar Chart – Average User Rating for Free vs Paid Apps
Visualization: A bar chart comparing the average user ratings of free and paid apps.
Significance: This visualization helps to interpret the relationship between app pricing
and user satisfaction. It reveals that paid apps have slightly higher average ratings,
which may mean that users rate paid apps more positively, possibly because of better
quality or the higher expectations that come with payment. Because the difference is
small, it points to a possible association between pricing and user satisfaction rather
than a proven effect.
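The two bar heights in this chart are group means. A minimal sketch of that calculation follows, assuming each app is given as a (price, averageUserRating) pair; the sample values are hypothetical and chosen only to illustrate the mechanics.

```python
from statistics import mean

def mean_rating_by_pricing(apps):
    """Average rating for free (price == 0) and paid apps.

    apps is an iterable of (price, rating) pairs. Groups with no apps are omitted.
    """
    groups = {"Free": [], "Paid": []}
    for price, rating in apps:
        groups["Free" if price == 0 else "Paid"].append(rating)
    return {label: mean(values) for label, values in groups.items() if values}

apps = [(0.0, 4.0), (0.0, 3.0), (0.99, 4.5), (2.99, 3.5)]  # hypothetical sample
averages = mean_rating_by_pricing(apps)
print(averages)
```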
3. Histogram – Distribution of User Ratings
Visualization: A histogram displaying the distribution of user ratings across apps.
Significance: The histogram provides insights into the spread of user ratings. Most apps
have ratings concentrated around the higher end of the scale, showing that users tend
to give positive feedback. This could indicate that apps in the dataset are generally
well-received or that people are more likely to rate when they are satisfied.
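A histogram of this kind amounts to counting ratings per bin. The sketch below assumes a bin width of 0.5 (the report does not state the binning) and uses hypothetical ratings; it prints a rough text version of the chart.

```python
import math
from collections import Counter

def rating_histogram(ratings, bin_width=0.5):
    """Count ratings per bin; each bin covers [start, start + bin_width)."""
    counts = Counter(math.floor(r / bin_width) * bin_width for r in ratings)
    return dict(sorted(counts.items()))

ratings = [4.5, 4.5, 4.0, 3.5, 5.0, 4.5, 2.0]  # hypothetical sample
hist = rating_histogram(ratings)
for start, count in hist.items():
    print(f"{start:.1f} {'#' * count}")
```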
4. Scatter Plot – Price vs User Rating (Paid Apps)
Visualization: A scatter plot showing the relationship between price and user rating for
paid apps.
Significance: This scatter plot is important for identifying trends between app price and
user rating. It shows a slight positive correlation between the price of an app and its
average user rating. This suggests that higher-priced apps may offer better quality or
user experience, leading to higher ratings, which is valuable for app developers to
consider when pricing their apps.
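The strength of the trend in such a scatter plot can be summarized with the Pearson correlation coefficient. The sketch below assumes that this is the measure meant by "correlation" in the report, keeps only apps with a price above 0, and uses hypothetical sample values.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    # Raises ZeroDivisionError if either sequence is constant.
    return cov / (spread_x * spread_y)

# Hypothetical (price, averageUserRating) pairs, including one free app.
apps = [(0.0, 4.0), (0.99, 3.5), (1.99, 4.0), (2.99, 4.0), (4.99, 4.5)]
paid = [(price, rating) for price, rating in apps if price > 0]
r = pearson([p for p, _ in paid], [rating for _, rating in paid])
print(round(r, 3))
```

A value near 0 means little linear relationship, and a value near 1 or -1 means a strong one. A positive coefficient by itself does not show that a higher price causes better ratings.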
5. Correlation Heatmap
Visualization: A correlation matrix heatmap to visualize relationships between
numerical variables.
Significance: The heatmap helps in understanding the strength of the relationships
between key variables. In this case, it highlights how price correlates with
averageUserRating, and how the genre (after one-hot encoding) relates to these
metrics. By visualizing the correlation matrix, we can identify significant relationships
that could guide further analysis or business decisions.
6. Genre Distribution Bar Chart
Visualization: A bar chart representing the distribution of app genres.
Significance: This chart provides a clear view of the most and least common genres in
the dataset, offering insights into which types of apps dominate the market. Developers
can use this information to tailor their app offerings towards more popular genres, or
identify niche markets with less competition.
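The bar heights in this chart are simple frequency counts. A short sketch with hypothetical genre labels:

```python
from collections import Counter

genres = ["Games", "Education", "Games", "Utilities", "Games", "Education"]  # hypothetical
genre_counts = Counter(genres).most_common()  # sorted from most to least common
for name, count in genre_counts:
    print(f"{name:<10} {count}")
```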
7. Price Distribution Analysis
Visualization: A categorical analysis using bins to group apps based on their price.
Significance: This analysis shows how apps are priced within different ranges, offering
insights into the overall pricing strategy in the app market. Most apps are priced
between $0 and $1, a concentration that is essential context for understanding price
sensitivity in this market.
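Grouping prices into bins can be sketched as follows. The bin edges and labels are assumptions, since the report does not list the ranges it used; each paid bin includes its upper edge, and free apps get a bin of their own.

```python
from bisect import bisect_left
from collections import Counter

# Assumed upper edges (inclusive) and labels; not taken from the report.
UPPER_EDGES = [1.0, 5.0, 10.0]
LABELS = ["$0.01-$1", "$1.01-$5", "$5.01-$10", "Over $10"]

def price_bin(price):
    """Return the label of the price range that contains price."""
    if price == 0:
        return "Free"
    return LABELS[bisect_left(UPPER_EDGES, price)]

prices = [0.0, 0.0, 0.99, 0.99, 1.99, 4.99, 19.99]  # hypothetical sample
bin_counts = Counter(price_bin(price) for price in prices)
print(bin_counts)
```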
Conclusion
This analysis highlights key trends in app pricing, user ratings, and genre preferences. The
visualizations provide clear insights into how app price relates to user ratings, and show that
free apps dominate the market. These findings can guide app developers in making informed
decisions regarding pricing and development strategies.