
Introduction To Data Science Notes and Report

The report details the data cleaning and preprocessing steps taken to analyze app data, including selecting important columns, removing outliers, and creating new indicators for popularity and ratings. Key insights cover genre distribution, pricing, and the relationship between user ratings and the size of an app's user base, with visualizations supporting each finding. The findings suggest that free apps dominate the market and that pricing is linked to user satisfaction and engagement.

Uploaded by

ibrahim ghauri
Copyright
© All Rights Reserved

REPORT

Ibrahim Ghauri

23i-2653

Insights from Data Cleaning and Preprocessing

The data cleaning process involved:

• Selecting Important Columns: Only relevant columns were retained, reducing the dataset size and focusing the analysis on key attributes such as trackId, trackName, price, averageUserRating, and userRatingCount.
• Outlier Removal: We used the interquartile range (IQR) method to remove outliers from the price column, so that a few extreme prices do not distort price-related analysis.
• New Columns:
  ◦ Popularity Indicator: Apps were categorized as "High," "Medium," or "Low" popularity based on userRatingCount.
  ◦ High Rating Flag: Apps with an average user rating of 3 or higher were marked as "high-rated."
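
The report does not include its code, so the following is only a minimal pandas sketch of the first two steps, assuming the data sits in a pandas DataFrame. The sample rows and the extra sellerUrl column are hypothetical, and the 1.5 × IQR fence is the conventional choice rather than a value stated in the report.

```python
import pandas as pd

# Hypothetical rows standing in for the real app dataset; "sellerUrl" is an
# illustrative extra column that the selection step drops.
raw = pd.DataFrame({
    "trackId": [1, 2, 3, 4, 5, 6, 7, 8],
    "trackName": ["A", "B", "C", "D", "E", "F", "G", "H"],
    "price": [0.0, 0.0, 0.99, 1.99, 2.99, 0.0, 4.99, 99.99],
    "averageUserRating": [4.5, 3.0, 4.0, 2.5, 4.8, 3.9, 4.2, 5.0],
    "userRatingCount": [1200, 15, 340, 8, 5000, 60, 90, 3],
    "sellerUrl": ["https://example.com"] * 8,
})

# Columns named in the report.
KEEP = ["trackId", "trackName", "price", "averageUserRating", "userRatingCount"]


def remove_price_outliers(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Keep rows whose price lies within [Q1 - k*IQR, Q3 + k*IQR] (bounds inclusive)."""
    q1 = df["price"].quantile(0.25)
    q3 = df["price"].quantile(0.75)
    iqr = q3 - q1
    return df[df["price"].between(q1 - k * iqr, q3 + k * iqr)]


clean = remove_price_outliers(raw[KEEP].copy())
print(len(clean), clean["price"].max())  # 7 4.99
```

One caveat with this rule: with pandas' default linear interpolation, if more than three quarters of the apps are free, both Q1 and Q3 of price are 0, the IQR is 0, and every paid app would be treated as an outlier. Checking the quartiles before filtering, or applying the rule to paid apps only, avoids this.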

These steps ensured that the dataset was cleaner and more focused for analysis.
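
The two derived columns can be sketched in pandas as follows, again on hypothetical sample rows. The rating rule (3 or higher) comes from the report; the userRatingCount cut-offs of 100 and 1,000 are placeholders, because the report does not state the thresholds it used.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows, not the report's data.
apps = pd.DataFrame({
    "trackName": ["A", "B", "C", "D"],
    "averageUserRating": [4.5, 2.5, 3.0, 1.0],
    "userRatingCount": [5000, 40, 500, 0],
})

# Placeholder cut-offs (assumed): <=100 Low, 101-1000 Medium, >1000 High.
bins = [-np.inf, 100, 1000, np.inf]
apps["popularity"] = pd.cut(apps["userRatingCount"], bins=bins,
                            labels=["Low", "Medium", "High"])

# Rule from the report: an average rating of 3 or more counts as high-rated.
apps["highRated"] = apps["averageUserRating"] >= 3

print(apps[["trackName", "popularity", "highRated"]])
```

If three groups of roughly equal size are preferred over fixed thresholds, pd.qcut(apps["userRatingCount"], 3, labels=["Low", "Medium", "High"]) is the quantile-based alternative; with many tied counts it may need duplicates="drop".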

Key Observations Based on Analysis

Key insights from the analysis include:

1. Genre Distribution: Certain genres dominate the market, revealing which types of apps are most common.
2. Price Distribution: Most apps are free or priced under $1, indicating a trend toward low-cost or free apps to attract users.
3. Rating Correlation: High-rated apps tend to have more user ratings, suggesting a positive relationship between app quality (as reflected in ratings) and the size of the user base.
4. Paid vs Free Apps: Free apps dominate the market, but paid apps tend to have slightly higher user ratings.
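
Observation 3 can be checked numerically. The sketch below, on hypothetical sample rows, compares the mean userRatingCount of high-rated and other apps and computes a Spearman rank correlation (Pearson correlation on ranks). Choosing Spearman over Pearson is our assumption, made because rating counts are usually heavily skewed.

```python
import pandas as pd

# Hypothetical sample rows, not the report's data.
apps = pd.DataFrame({
    "averageUserRating": [4.5, 4.0, 3.5, 2.0, 1.5],
    "userRatingCount": [1500, 900, 300, 40, 10],
})

# Same rule as the report's high-rating flag.
apps["highRated"] = apps["averageUserRating"] >= 3

# Mean number of ratings for high-rated (True) vs other (False) apps.
mean_count = apps.groupby("highRated")["userRatingCount"].mean()

# Spearman correlation: Pearson correlation of the two columns' ranks.
rho = apps["averageUserRating"].rank().corr(apps["userRatingCount"].rank())
print(mean_count)
print(round(rho, 3))
```

Ranking first keeps a handful of apps with very large rating counts from dominating the coefficient.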

Significance of Each Visualization

The following visualizations were used to provide meaningful insights into the dataset:

1. Pie Chart – Proportion of Free vs Paid Apps


Visualization: A pie chart showing the distribution of free and paid apps.
Significance: This visualization highlights how strongly free apps outnumber paid apps, providing context for understanding how price affects app popularity and user acquisition strategies. The larger proportion of free apps suggests a market trend toward offering apps for free to increase user engagement.
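
The proportions behind this pie chart take only a few lines to compute. This is a minimal pandas sketch on hypothetical prices, assuming an app counts as free when its price is exactly 0; the commented plotting call requires matplotlib.

```python
import pandas as pd

# Hypothetical prices, not the report's data.
prices = pd.Series([0.0, 0.0, 0.0, 0.99, 0.0, 2.99, 0.0, 0.0])

# Label each app Free or Paid, then take each label's share of all apps.
kind = prices.eq(0).map({True: "Free", False: "Paid"})
share = kind.value_counts(normalize=True)
print(share)  # share is 0.75 for Free and 0.25 for Paid

# With matplotlib installed, share.plot.pie(autopct="%1.1f%%") draws the chart.
```
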
2. Bar Chart – Average User Rating for Free vs Paid Apps
Visualization: A bar chart comparing the average user ratings of free and paid apps.
Significance: This visualization helps interpret the relationship between app pricing and user satisfaction. It reveals that paid apps have slightly higher average ratings, which may mean that users rate paid apps more positively, possibly because of better quality or the expectations that come with paying. Because it compares only two group averages, it points to a possible link between pricing and satisfaction rather than proving one.
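
The numbers behind this bar chart are per-group means. The sketch below uses pandas and hypothetical rows; it also reports how many apps fall in each group, an addition of ours, since a mean over a small paid group is less reliable.

```python
import pandas as pd

# Hypothetical sample rows, not the report's data.
apps = pd.DataFrame({
    "price": [0.0, 0.0, 0.0, 0.0, 1.99, 4.99],
    "averageUserRating": [4.0, 3.5, 4.5, 3.0, 4.5, 4.0],
})
apps["type"] = apps["price"].eq(0).map({True: "Free", False: "Paid"})

# Mean rating and group size for free vs paid apps.
summary = apps.groupby("type")["averageUserRating"].agg(["mean", "count"])
print(summary)

# With matplotlib installed, summary["mean"].plot.bar() draws the bar chart.
```
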
3. Histogram – Distribution of User Ratings
Visualization: A histogram displaying the distribution of user ratings across apps.
Significance: The histogram shows the spread of user ratings. Most apps have ratings concentrated at the higher end of the scale, suggesting that users tend to give positive feedback. This could indicate that apps in the dataset are generally well-received, or that people are more likely to leave a rating when they are satisfied.
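
The bin counts behind such a histogram can be computed with NumPy. This sketch assumes ratings on a 1 to 5 scale and half-point bins; both the bin width and the sample ratings are hypothetical.

```python
import numpy as np

# Hypothetical average user ratings, not the report's data.
ratings = np.array([4.5, 4.7, 4.8, 4.0, 3.9, 4.6, 2.1, 5.0])

# Half-point bins from 1 to 5; NumPy's last bin also includes 5.0.
edges = np.arange(1.0, 5.01, 0.5)
counts, _ = np.histogram(ratings, bins=edges)

# Share of apps rated 4 or higher: one number summarizing the skew toward the top.
share_high = (ratings >= 4).mean()
print(counts, share_high)
```
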
4. Scatter Plot – Price vs User Rating (Paid Apps)
Visualization: A scatter plot showing the relationship between price and user rating for
paid apps.
Significance: This scatter plot helps identify trends between app price and user rating. It shows a slight positive correlation between the price of an app and its average user rating. This suggests that higher-priced apps may offer better quality or user experience, leading to higher ratings. Because the correlation is weak, it is a signal for app developers to consider when pricing their apps rather than proof that a higher price causes higher ratings.
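
A slight positive correlation like this one is usually summarized by Pearson's r computed on the paid apps only. A minimal pandas sketch on hypothetical rows follows; the commented plotting call requires matplotlib.

```python
import pandas as pd

# Hypothetical sample rows, not the report's data.
apps = pd.DataFrame({
    "price": [0.0, 0.99, 1.99, 2.99, 4.99, 9.99],
    "averageUserRating": [4.1, 3.8, 4.0, 4.3, 4.2, 4.6],
})

# Restrict to paid apps, as in the scatter plot.
paid = apps[apps["price"] > 0]

# Series.corr computes Pearson's r by default.
r = paid["price"].corr(paid["averageUserRating"])
print(round(r, 2))

# With matplotlib installed:
# paid.plot.scatter(x="price", y="averageUserRating")
```

Pearson's r is sensitive to a few very expensive apps, so the value is best read together with the scatter plot itself.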
5. Correlation Heatmap
Visualization: A correlation matrix heatmap to visualize relationships between
numerical variables.
Significance: The heatmap shows the strength of the relationships between key variables. Here it highlights how price correlates with averageUserRating, and how the genre (after one-hot encoding) relates to these metrics. Seeing the whole correlation matrix at once makes it easy to spot strong relationships that could guide further analysis or business decisions.
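
A minimal pandas sketch of the matrix behind such a heatmap: genre is one-hot encoded with pd.get_dummies, then every pair of columns is correlated. The column name primaryGenre and the sample rows are hypothetical; the report does not name its genre column.

```python
import pandas as pd

# Hypothetical sample rows; "primaryGenre" is an assumed column name.
apps = pd.DataFrame({
    "price": [0.0, 0.99, 0.0, 2.99, 0.0, 4.99],
    "averageUserRating": [4.0, 4.2, 3.5, 4.6, 3.9, 4.4],
    "userRatingCount": [900, 120, 40, 300, 2000, 75],
    "primaryGenre": ["Games", "Games", "Social", "Education", "Social", "Education"],
})

# One-hot encode genre into 0/1 columns such as primaryGenre_Games.
encoded = pd.get_dummies(apps, columns=["primaryGenre"], dtype=int)

# Pearson correlation between every pair of columns.
corr = encoded.corr()
print(corr.round(2))

# A heatmap can then be drawn from `corr`, e.g. with matplotlib's imshow.
```
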
6. Genre Distribution Bar Chart
Visualization: A bar chart representing the distribution of app genres.
Significance: This chart provides a clear view of the most and least common genres in
the dataset, offering insights into which types of apps dominate the market. Developers
can use this information to tailor their app offerings towards more popular genres, or
identify niche markets with less competition.
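
Counting apps per genre is a single value_counts call in pandas. The genre labels below are hypothetical sample values.

```python
import pandas as pd

# Hypothetical genre labels, one per app.
genres = pd.Series(["Games", "Games", "Education", "Games", "Utilities",
                    "Education", "Health & Fitness"])

counts = genres.value_counts()   # sorted from most to least common
shares = counts / counts.sum()   # fraction of all apps in each genre
print(counts.head(3))            # most common genres
print(counts.tail(3))            # least common genres (possible niches)
```
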
7. Price Distribution Analysis
Visualization: A categorical analysis using bins to group apps based on their price.
Significance: This analysis shows how apps are spread across price ranges, offering insight into the overall pricing strategy in the app market. Most apps are priced between $0 and $1, which is important for understanding price sensitivity in this market.
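
Binning prices can be sketched with pd.cut. The bin edges and labels below are assumptions, since the report does not list the ranges it used, and the prices are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical prices, not the report's data.
prices = pd.Series([0.0, 0.0, 0.99, 0.0, 0.99, 1.99, 4.99, 0.0, 12.99])

# Assumed right-closed bins: (-inf, 0], (0, 1], (1, 5], (5, inf).
bins = [-np.inf, 0, 1, 5, np.inf]
labels = ["Free", "$0.01-$1", "$1.01-$5", "Over $5"]
price_band = pd.cut(prices, bins=bins, labels=labels)

# Number of apps per band, kept in bin order rather than sorted by count.
counts = price_band.value_counts(sort=False)
print(counts)
```
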
Conclusion

This analysis highlights key trends in app pricing, user ratings, and genre preferences. The visualizations show how app price relates to user ratings and how strongly free apps dominate the market. These findings can guide app developers in making informed decisions about pricing and development strategies.
