Introduction To Data Science Notes and Report 2

The report outlines the data cleaning and preprocessing steps taken to prepare a dataset for analysis, including column selection, outlier removal, and the creation of new columns for enhanced insights. Key findings reveal trends in app genres, pricing, and user ratings, indicating a market dominated by free apps and a correlation between app quality and user base. Visualizations such as pie charts and scatter plots were utilized to illustrate these insights, aiding developers in making informed decisions about app development and marketing strategies.

Uploaded by

ibrahim ghauri

Report for assignment 2

Ibrahim ghauri
23i-2653

Insights Gained from Data Cleaning and Preprocessing

During the data cleaning and preprocessing stage, several key tasks were performed to prepare
the dataset for analysis. These steps helped in refining the dataset and removing any noise that
could affect the results:

1. Column Selection and Removal of Unnecessary Data:
We identified and extracted only the essential columns that are crucial for the analysis,
such as trackId, trackName, price, averageUserRating, genres, and userRatingCount. This
helped reduce the size of the dataset and focused the analysis on the most relevant
attributes.
2. Outlier Removal:
Using the Interquartile Range (IQR) method, we removed outliers in the price column.
This was important as extreme price values could skew the analysis of app pricing
trends. After filtering, the dataset became more representative of typical app prices.
3. New Column Creation for Enhanced Analysis:
o A popularity indicator was added, categorizing apps into "High," "Medium," and
"Low" popularity based on userRatingCount.
o A new binary column, isHighRating, was introduced, marking apps with an
average user rating greater than or equal to 3 as "high-rated" (1) and the rest as
"low-rated" (0).
These additions allowed us to categorize and segment the data for deeper
insights.
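
The three cleaning steps above can be sketched as follows. This is a minimal illustration, not the code used for the assignment: the report does not say which tools were used, so the sketch sticks to the Python standard library. The sample rows, the extra sellerName column, the 1.5 x IQR fence, and the popularity cutoffs (100 and 10,000 ratings) are all assumptions made for the example; only the column names and the rating threshold of 3 come from the report.

```python
from statistics import quantiles

# Hypothetical sample rows. Field names follow the report; values are invented.
FIELDS = ["trackId", "trackName", "price", "averageUserRating",
          "genres", "userRatingCount", "sellerName"]
raw_apps = [dict(zip(FIELDS, row)) for row in [
    (1, "Alpha", 0.00, 4.5, "Games", 12000, "A Co"),
    (2, "Bravo", 0.00, 2.5, "Games", 800, "B Co"),
    (3, "Charlie", 0.99, 3.0, "Utilities", 50, "C Co"),
    (4, "Delta", 0.99, 4.0, "Education", 300, "D Co"),
    (5, "Echo", 1.99, 4.5, "Games", 15000, "E Co"),
    (6, "Foxtrot", 2.99, 3.5, "Utilities", 90, "F Co"),
    (7, "Golf", 4.99, 5.0, "Education", 2000, "G Co"),
    (8, "Hotel", 99.99, 4.0, "Business", 10, "H Co"),
]]

# Step 1: keep only the columns named in the report.
KEEP = ["trackId", "trackName", "price", "averageUserRating",
        "genres", "userRatingCount"]

def select_columns(rows, keep=KEEP):
    return [{key: row[key] for key in keep} for row in rows]

# Step 2: drop rows whose price lies outside the IQR fences.
# The multiplier k=1.5 is the conventional choice, assumed here.
def remove_price_outliers(rows, k=1.5):
    prices = [row["price"] for row in rows]
    q1, _, q3 = quantiles(prices, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [row for row in rows if lower <= row["price"] <= upper]

# Step 3: derive the two new columns.
def popularity(rating_count, low=100, high=10_000):
    # Cutoffs are hypothetical; the report does not state them.
    if rating_count >= high:
        return "High"
    if rating_count >= low:
        return "Medium"
    return "Low"

def add_features(rows):
    return [
        {**row,
         "popularity": popularity(row["userRatingCount"]),
         # 1 if the average rating is at least 3, as stated in the report.
         "isHighRating": int(row["averageUserRating"] >= 3)}
        for row in rows
    ]

cleaned = add_features(remove_price_outliers(select_columns(raw_apps)))
for row in cleaned:
    print(row["trackName"], row["price"], row["popularity"], row["isHighRating"])
```

With this toy data the single extreme price is filtered out and the remaining rows carry both derived columns. On a real dataset where most apps are free, the quartiles of price can sit very close to zero, so it is worth checking how many paid apps the fence removes.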

Key Observations and Interpretations Based on the Analysis and Visualizations

From the analysis of the cleaned data, several key insights emerged:

1. Genre Distribution:
The most common genres in the dataset were identified, showing which genres account
for the largest share of apps. This allows us to infer trends in app preferences and
potentially suggest areas of improvement or focus for app developers.
2. Price Distribution:
The analysis showed that most apps in the dataset fall into the lower price brackets
(e.g., between $0 and $1), with a significant proportion of apps being free. This suggests
that many developers prefer a free or low-cost model, likely driven by the desire to
attract more users.
3. Rating Analysis:
We found that high-rated apps tend to have higher user rating counts, indicating a possible
correlation between app quality (as measured by user rating) and user base (as measured
by userRatingCount). However, the low-rated apps still tend to have a significant number
of ratings, which may suggest that poor-quality apps can still attract a substantial number of users.
4. Paid vs Free Apps:
The analysis revealed that free apps dominate the market compared to paid apps.
However, paid apps that do exist tend to show a positive correlation between price and
user rating, suggesting that higher-quality apps can command higher prices.
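
The summaries behind these observations can be computed with a few counters. The sketch below is illustrative only: the rows are invented, the price bracket boundaries are assumptions (the report only mentions a $0 to $1 range), and the numbers it prints do not reproduce the report's findings.

```python
from collections import Counter
from statistics import mean

# Hypothetical cleaned rows: (genre, price, averageUserRating, userRatingCount)
apps = [
    ("Games", 0.00, 4.5, 12000),
    ("Games", 0.00, 2.5, 800),
    ("Utilities", 0.99, 3.0, 50),
    ("Education", 0.99, 4.0, 300),
    ("Games", 1.99, 4.5, 15000),
    ("Utilities", 2.99, 3.5, 90),
    ("Education", 4.99, 5.0, 2000),
]

# Genre distribution: how many apps fall in each genre.
genre_counts = Counter(genre for genre, _, _, _ in apps)

# Price distribution: bracket boundaries are assumptions for this example.
def price_bracket(price):
    if price == 0:
        return "Free"
    if price <= 1:
        return "$0.01-$1"
    if price <= 3:
        return "$1.01-$3"
    return "Over $3"

bracket_counts = Counter(price_bracket(price) for _, price, _, _ in apps)
free_share = bracket_counts["Free"] / len(apps)

# Rating analysis: mean rating count for high-rated (>= 3) vs low-rated apps.
def mean_rating_count(rows, high_rated):
    counts = [count for _, _, rating, count in rows
              if (rating >= 3) == high_rated]
    return mean(counts)

print("Most common genres:", genre_counts.most_common())
print("Apps per price bracket:", dict(bracket_counts))
print("Share of free apps:", round(free_share, 2))
print("Mean rating count, high-rated:", mean_rating_count(apps, True))
print("Mean rating count, low-rated:", mean_rating_count(apps, False))
```

Comparing means is only a first look. Rating counts are usually heavily skewed, so a median or a log scale may describe the two groups more fairly.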

Significance of Each Visualization

The following visualizations were used to provide meaningful insights into the dataset:

1. Pie Chart – Proportion of Free vs Paid Apps
Visualization: A pie chart showing the distribution of free and paid apps.
Significance: This visualization highlights the market dominance of free apps compared
to paid apps. It shows the prevalence of free apps, providing context for understanding
how price affects app popularity and user acquisition strategies. The larger proportion of
free apps suggests a market trend towards offering apps for free to increase user
engagement.
2. Bar Chart – Average User Rating for Free vs Paid Apps
Visualization: A bar chart comparing the average user ratings of free and paid apps.
Significance: This visualization helps to interpret the relationship between app pricing
and user satisfaction. It reveals that paid apps have slightly higher average ratings,
which may imply that users rate paid apps more positively, potentially due
to better quality or to the expectations that come with payment. Because the difference
is slight, it points to a possible rather than a firm link between pricing and user satisfaction.
3. Histogram – Distribution of User Ratings
Visualization: A histogram displaying the distribution of user ratings across apps.
Significance: The histogram provides insights into the spread of user ratings. Most apps
have ratings concentrated around the higher end of the scale, showing that users tend
to give positive feedback. This could indicate that apps in the dataset are generally well-
received or that people are more likely to rate when they are satisfied.
4. Scatter Plot – Price vs User Rating (Paid Apps)
Visualization: A scatter plot showing the relationship between price and user rating for
paid apps.
Significance: This scatter plot is important for identifying trends between app price and
user rating. It shows a slight positive correlation between the price of an app and its
average user rating. This suggests that higher-priced apps may offer better quality or
user experience, leading to higher ratings, which is valuable for app developers to
consider when pricing their apps.
5. Correlation Heatmap
Visualization: A correlation matrix heatmap to visualize relationships between
numerical variables.
Significance: The heatmap helps in understanding the strength of the relationships
between key variables. In this case, it highlights how price correlates with
averageUserRating, and how the genre (after one-hot encoding) relates to these
metrics. By visualizing the correlation matrix, we can identify significant relationships
that could guide further analysis or business decisions.
6. Genre Distribution Bar Chart
Visualization: A bar chart representing the distribution of app genres.
Significance: This chart provides a clear view of the most and least common genres in
the dataset, offering insights into which types of apps dominate the market. Developers
can use this information to tailor their app offerings towards more popular genres, or
identify niche markets with less competition.
7. Price Distribution Analysis
Visualization: A categorical analysis using bins to group apps based on their price.
Significance: This analysis shows how apps are priced within different ranges, offering
insights into the overall pricing strategy in the app market. Most apps are priced
between $0 and $1, which is essential for understanding the price sensitivity in this
market.
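
The quantities plotted in the bar chart, scatter plot, and heatmap can be computed before any plotting library is involved. The sketch below is a hedged illustration using only the standard library: the rows are invented, pearson is a helper written for this example, and the resulting values say nothing about the actual dataset. A plotting library of your choice can then draw these numbers.

```python
from math import sqrt
from statistics import mean

# Hypothetical cleaned rows: (genre, price, averageUserRating, userRatingCount)
apps = [
    ("Games", 0.00, 4.5, 12000),
    ("Games", 0.00, 2.5, 800),
    ("Utilities", 0.99, 3.0, 50),
    ("Education", 0.99, 4.0, 300),
    ("Games", 1.99, 4.5, 15000),
    ("Utilities", 2.99, 3.5, 90),
    ("Education", 4.99, 5.0, 2000),
]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Bar chart data: average rating for free vs paid apps.
free_ratings = [rating for _, price, rating, _ in apps if price == 0]
paid = [(price, rating) for _, price, rating, _ in apps if price > 0]
mean_rating = {
    "Free": mean(free_ratings),
    "Paid": mean([rating for _, rating in paid]),
}

# Scatter plot summary: price vs rating correlation among paid apps only.
r_paid = pearson([price for price, _ in paid], [rating for _, rating in paid])

# Heatmap input: one-hot encode genre, then correlate each 0/1 column
# with the average user rating.
ratings = [rating for _, _, rating, _ in apps]
genres = sorted({genre for genre, _, _, _ in apps})
one_hot = {g: [1 if row[0] == g else 0 for row in apps] for g in genres}
genre_vs_rating = {g: pearson(column, ratings) for g, column in one_hot.items()}

print("Mean rating by group:", mean_rating)
print("Price vs rating (paid apps):", round(r_paid, 3))
for genre, r in genre_vs_rating.items():
    print(f"{genre} vs rating: {r:.3f}")
```

A correlation coefficient describes only a linear association. A slight positive value, as the report describes for price and rating, does not on its own show that raising a price would raise ratings.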

Conclusion

The analysis and visualizations helped us uncover key insights regarding app pricing, user
ratings, and the impact of genre on app success. The visualizations have been particularly useful
for illustrating relationships between user ratings and price, as well as understanding market
trends in terms of app popularity, pricing, and genres. These findings can assist developers in
making informed decisions regarding app development, pricing strategies, and marketing
efforts.
