M04 Lecture Notes

The document provides an overview of Python libraries, including standard and third-party libraries, and explains the concepts of modules, packages, and classes. It details the data analytics process, covering data collection, preprocessing, analysis, and sharing insights, while also introducing recommendation systems and their filtering techniques. Additionally, it discusses the use of the Pandas library for data manipulation and analysis in Python.

Uploaded by

Berly Brigith

Libraries in Python

© Faculty of Management
Libraries in Python
A library is a collection of pre-written code that is aimed at providing a set of
capabilities and functionalities for use in other programs.

Types of Python Libraries

• Standard Libraries:
  • Included with Python; these libraries help you perform tasks such as file I/O, simple persistence, and data serialization.
• Third-Party Libraries:
  • Additional libraries that can be installed as needed.
  • Offer more specialized functionality, such as data analysis, machine learning, or image processing.
Standard Libraries

The Python Standard Library. (n.d.). Python. https://docs.python.org/3/library/index.html
String Methods. (n.d.). Python. https://docs.python.org/3/library/index.html
Third-Party Libraries

TextBlob: Simplified Text Processing (n.d.). https://textblob.readthedocs.io/en/dev/

Textblob. (n.d). PyPI. https://pypi.org/project/textblob/0.9.0/

Documentation

Source: scikit-learn developers (BSD License) (2024). CountVectorizer. https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html#sklearn.feature_extraction.text.CountVectorizer
Library
A library consists of modules and packages designed to offer a range of capabilities
and functionalities for incorporation into other programs.

Library Package Module

Module
A module is a single Python file that contains definitions and statements.
It can include functions, classes, variables, and even runnable code that is meant to be imported and used in other Python scripts.

sentiment_analyzer.py

lexicon = {"good": 1, "excellent": 1,
           "bad": -1, "terrible": -1, "not": -0.5}

def analyze_sentiment(lexicon, text):
    ………..
    return sentiment_score
Example 1: Module with functions
Module: sentiment_analyzer.py

lexicon = {"good": 1, "excellent": 1, "bad": -1, "terrible": -1, "not": -0.5}

def analyze_sentiment(lexicon, text):
    ………..
    return sentiment_score
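Import Functions from a Module
import sentiment_analyzer

# Example usage: analyze_sentiment(lexicon, text) takes two arguments,
# so pass the module's lexicon along with the text
text = input("Enter a text to analyze its sentiment: ")
result = sentiment_analyzer.analyze_sentiment(sentiment_analyzer.lexicon, text)
print(f"The sentiment of the text is: {result}")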
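The body of analyze_sentiment is elided in the notes. As a minimal sketch of what such a module-level function could look like, the version below assumes the same word-by-word lexicon sum that the notes use later for the SentimentAnalyzer class. The lowercase conversion and the sample sentence are assumptions for illustration, not the author's implementation.

```python
# Hypothetical contents of sentiment_analyzer.py (the function body is an assumption)
lexicon = {"good": 1, "excellent": 1, "bad": -1, "terrible": -1, "not": -0.5}


def analyze_sentiment(lexicon, text):
    """Sum the lexicon scores of the words in text; unknown words count as 0."""
    sentiment_score = 0
    for word in text.lower().split():
        sentiment_score += lexicon.get(word, 0)
    return sentiment_score


score = analyze_sentiment(lexicon, "The food was not bad")
print(score)  # -0.5 (not) + -1 (bad) = -1.5
```

A simple sum like this treats "not bad" as negative; handling negation properly would need more than a word lookup.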
Example 2: Module with class
Module: sentiment_analyzer2.py

class SentimentAnalyzer:
    def __init__(self):
        self.lexicon = {"good": 1, "excellent": 1, "bad": -1, "terrible": -1, "not": -0.5}

    def analyze_sentiment(self, text):
        …………….
        # use self.lexicon for analyzing sentiment
        return score
Class
• Blueprint for creating objects
• Defines a set of attributes and methods that are common to all objects of that type

Attributes
• Variable associated with a class or object
• Example: lexicon in the class SentimentAnalyzer

Method
• Function that is defined inside a class
• Example: analyze_sentiment() in the class SentimentAnalyzer

Object
• An instance of a class
• When a class is defined, no memory is allocated for object data until an object is created from the class
• Each object can hold different data, but they share the functionalities defined in their respective class
• Example: the class SentimentAnalyzer (attribute lexicon, method analyze_sentiment()) can produce Object 1 MovieReview, Object 2 RestaurantReview, and Object 3 ProductReview
Defining a Class
sentiment_analyzer2.py

class SentimentAnalyzer:
    def __init__(self, lexicon):
        self.lexicon = lexicon  # This is an attribute

    def analyze_sentiment(self, text):
        sentiment_score = 0
        words = text.split()
        for word in words:
            sentiment_score += self.lexicon.get(word, 0)
        return sentiment_score
Using the Class
# Import the SentimentAnalyzer class from the sentiment_analyzer2 module
from sentiment_analyzer2 import SentimentAnalyzer

# Create an instance of SentimentAnalyzer for restaurant reviews

restaurant_lexicon = { "good": 1, "bad": -1, "delicious": 5, "tasty": 4, "bland": -4 }
restaurant_review_analyzer = SentimentAnalyzer(restaurant_lexicon)

# Example usage
restaurant_review = "The appetizers were bland but the main course was delicious"
restaurant_result = restaurant_review_analyzer.analyze_sentiment(restaurant_review)
print(f"The sentiment score of the restaurant review is: {restaurant_result}")
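To illustrate the point that objects of one class hold different data but share the same method, the sketch below creates two analyzers from the class defined above. The restaurant lexicon and review come from the example; the movie lexicon and movie review are made-up values for illustration only.

```python
class SentimentAnalyzer:
    def __init__(self, lexicon):
        self.lexicon = lexicon  # each object keeps its own lexicon

    def analyze_sentiment(self, text):
        sentiment_score = 0
        for word in text.split():
            sentiment_score += self.lexicon.get(word, 0)
        return sentiment_score


# Object 1: restaurant lexicon and review taken from the example above
restaurant_analyzer = SentimentAnalyzer(
    {"good": 1, "bad": -1, "delicious": 5, "tasty": 4, "bland": -4}
)
# Object 2: hypothetical lexicon for movie reviews (illustrative values)
movie_analyzer = SentimentAnalyzer(
    {"good": 1, "bad": -1, "thrilling": 3, "boring": -3}
)

restaurant_score = restaurant_analyzer.analyze_sentiment(
    "The appetizers were bland but the main course was delicious"
)
movie_score = movie_analyzer.analyze_sentiment(
    "The plot was thrilling but the ending was boring"
)

print(restaurant_score)  # -4 (bland) + 5 (delicious) = 1
print(movie_score)       # 3 (thrilling) + -3 (boring) = 0
```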
Package
A package is a collection of Python modules under a common namespace.

Definition
• A package is created by placing a special file named __init__.py in a directory
• __init__.py can be empty, but it signifies that the directory is a package
• A package can be imported the same way a module can be

Example layout (Package → Module → Class)
text_analysis/
    __init__.py
    tokenizer.py
    sentiment_analyzer.py    (contains the class SentimentAnalyzer)
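The package definition can be tried out without touching a real project. The sketch below builds a throwaway text_analysis directory in a temporary folder, adds an empty __init__.py and a tokenizer.py, and imports from it. The tokenize function is a hypothetical stand-in; the notes do not show what tokenizer.py contains.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    # Build the package directory: text_analysis/__init__.py + tokenizer.py
    package_dir = Path(tmp_dir) / "text_analysis"
    package_dir.mkdir()
    (package_dir / "__init__.py").write_text("")  # empty file marks the package
    (package_dir / "tokenizer.py").write_text(
        "def tokenize(text):\n"
        "    return text.lower().split()\n"
    )

    sys.path.insert(0, tmp_dir)  # make the parent directory importable
    try:
        importlib.invalidate_caches()
        from text_analysis import tokenizer  # import a module from the package
        tokens = tokenizer.tokenize("Good food")
    finally:
        sys.path.remove(tmp_dir)

print(tokens)  # ['good', 'food']
```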
Module vs Package vs Library
• Module: A module is the simplest form of code organization, consisting of a single file.
• Package: A package is a collection of modules organized into directories, possibly with sub-packages.
• Library: A library can include multiple packages or standalone modules that provide a wide range of functionalities.
How to Import
Using Python libraries is straightforward:
• Import the relevant libraries, or
• Import specific functions or classes based on the tasks at hand.

• Importing the Entire Library or Package

# Import the submodule explicitly; with an empty __init__.py,
# "import text_analysis" alone does not load sentiment_analyzer
import text_analysis.sentiment_analyzer
text_analyzer = text_analysis.sentiment_analyzer.SentimentAnalyzer(lexicon_1)

• Importing Specific Functions or Classes

from text_analysis.sentiment_analyzer import SentimentAnalyzer
text_analyzer = SentimentAnalyzer(lexicon_1)
Python Libraries for Analytics
Data Analytics Process

• Define your data needs
  • Clearly outline the goals and objectives of your analysis.
• Identify sources
  • Determine where to gather data from.
  • Databases, surveys, APIs, or external providers.
• Collect relevant data
  • Consider both structured and unstructured data.
Data
Data is the raw material used in various fields to develop analysis, interpretation,
and decision-making processes.

Structured Data
• Sales transactions
• Stock price trends
• Employee data

Unstructured Data (textual data, image files, video, audio)
• Social media posts
• Customer reviews
• Product images
Structured Data
Organized in clear tables with rows and columns.

• Well-suited for mathematical and statistical analysis.

• Follow consistent formats and data types.
• Easily integrated into databases and systems.
Unstructured Data
Lacks predefined structure or format.

Textual Data Image File Video Audio

• Diverse formats like text, image, audio, and video.

• Requires specialized techniques like NLP and image recognition.
• Reveals qualitative insights, sentiment, and context.
Structured vs Unstructured Data
Structured Data
• Can be analyzed using standard statistical methods and business intelligence tools

Unstructured Data
• Often requires specialized analytics techniques tailored to the specific data types

Combining structured and unstructured data analytics allows organizations to understand their data comprehensively and make more informed decisions.
Data Preprocessing

• Data Cleaning
  • Handle missing values, outliers, and inconsistencies in the collected data.
• Data Transformation
  • Standardize formats, normalize values, and create derived features if needed.
• Data Integration
  • Combine data from different sources while maintaining data quality.
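The three preprocessing steps above can be sketched with pandas. The example is a minimal illustration under stated assumptions: the missing rating for movie B and the text-formatted rating counts are invented to give each step something to do, and filling with the column mean is only one of several possible cleaning choices.

```python
import pandas as pd

# Hypothetical raw data: one missing rating, counts stored as text with commas
raw = pd.DataFrame({
    "movieId": ["A", "B", "C", "D"],
    "avgRating": [7.4, None, 6.1, 8.8],
    "numRatings": ["224,000", "664,000", "70,000", "25,000,000"],
})

clean = raw.copy()

# Data cleaning: fill the missing rating with the column mean
clean["avgRating"] = clean["avgRating"].fillna(clean["avgRating"].mean())

# Data transformation: standardize the count format and convert to integers
clean["numRatings"] = (
    clean["numRatings"].str.replace(",", "", regex=False).astype(int)
)

# Derived feature: min-max normalized rating in the range 0 to 1
rating = clean["avgRating"]
clean["ratingScaled"] = (rating - rating.min()) / (rating.max() - rating.min())

# Data integration: combine with a second source on the shared key
years = pd.DataFrame({"movieId": ["A", "B", "C", "D"],
                      "year": [2023, 2013, 2023, 2010]})
merged = clean.merge(years, on="movieId", how="left")

print(merged)
```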
Python libraries

Structured Data

Unstructured Data
-Text

Unstructured Data
- Image
Data Analysis

• Exploratory Data Analysis

• Descriptive statistics, visualizations, and summaries of data.
• Build models and apply analytics techniques
• Construct proper models based on analysis goals.
• Evaluation and Validation
• Assess model performance, validate results, etc.
Python libraries

Descriptive Analysis

Predictive Analysis
Insight Sharing

• Interpret Findings
• Derive meaningful insights and patterns from the analysis results.
• Visualization
• Create charts, graphs, and visual representations to communicate insights.
• Sharing and Reporting
• Present findings to stakeholders through reports, presentations, or interactive tools.
Demo Intro:
Recommendation Systems

Recommendation System
An algorithmic tool that suggests items or
content aligned with user preferences,
aiding in their discovery process.

Recommend User

Items
Cold Start Problem
The "cold start problem" in recommendation systems refers to the challenge of
providing accurate suggestions for new users or items with limited data.

• Popularity-based Filtering
• Content-based Filtering
Popularity-based Filtering
Popularity-based filtering is a simple and intuitive method that suggests items to users
based on the items' overall popularity among all users.

Item Characteristics
• Ratings (count, average)
• Year of Release
• Genres

Items → Recommend → Users
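As a rough sketch of popularity-based filtering, the function below ranks items by how many ratings they have received and recommends the top ones to every user. The data is the four-movie table used in the demo; the function name recommend_popular and the tie-break on average rating are assumptions made for this illustration.

```python
# Item data from the lecture's four-movie table
movies = [
    {"movieId": "A", "avgRating": 7.4, "numRatings": 224_000},
    {"movieId": "B", "avgRating": 8.0, "numRatings": 664_000},
    {"movieId": "C", "avgRating": 6.1, "numRatings": 70_000},
    {"movieId": "D", "avgRating": 8.8, "numRatings": 25_000_000},
]


def recommend_popular(items, top_n=2):
    """Return the IDs of the top_n most-rated items (ties broken by rating)."""
    ranked = sorted(
        items,
        key=lambda m: (m["numRatings"], m["avgRating"]),
        reverse=True,
    )
    return [m["movieId"] for m in ranked[:top_n]]


top_movies = recommend_popular(movies)
print(top_movies)  # ['D', 'B']
```

Because the ranking ignores who is asking, it works even for a brand-new user, which is why it is listed as one answer to the cold start problem.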
Content-based Filtering
Content-based filtering is a technique that suggests items to users based on the
attributes or features of the items themselves and the user's past interactions or
preferences.

Item Characteristics + User Preferences

• Genre
• Director
• Actor
Recommend
Goal of Activities
In these demo activities:

• Understand the data structures of Pandas (DataFrame & Series) for structured data handling and descriptive analysis
• Handle unstructured data (i.e., text) and convert it to a structured format
• Apply machine learning algorithms using the recommendation system framework
Popularity-based
Filtering
Recommendation Systems

Prepare Data : Item-related Data

movieID year genres avgRating numRatings

A 2023 Adventure|Comedy 7.4 224,000
B 2013 Romance|Sci-Fi 8.0 664,000
C 2023 Adventure 6.1 70,000
D 2010 Adventure|Sci-Fi 8.8 25,000,000
Pandas
An open-source data handling and manipulation library for Python.

Before using the library, you need to import it into your code

import pandas as pd

Note: pd is the conventional alias for pandas and is used throughout this program, so there is no need to write 'pandas' in full every time.
Demo - Documentation

Lustig, I. (n.d.). Cheatsheet for pandas, Princeton Consultants, inspired by RStudio Data Wrangling Cheatsheet. https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf
Data Collection
& Data Pre-Processing

Data Structures in Pandas
The data structures of pandas
1. DataFrame
   Two-dimensional table-like structure that can store and manipulate data with rows and columns
2. Series
   One-dimensional labelled array that can hold data of any type

Series             Series              DataFrame
   apples             oranges             apples  oranges
0  3               0  0                0  3       0
1  2               1  3                1  2       3
2  0               2  7                2  0       7
3  1               3  2                3  1       2
Creating Series
Let’s create a Series which contains average ratings of movies.

import pandas as pd

avgRating_list = [7.4, 8.0, 6.1, 8.8]
avgRating = pd.Series(avgRating_list)  # pd.Series() creates a pandas Series

Index  avgRating
0      7.4
1      8.0
2      6.1
3      8.8

While a list uses only a zero-based integer index to access its values, a
pandas Series can have values associated with an explicitly defined index.
Manipulating Series
Use movieId as the index for the Series instead of the default integer index.

import pandas as pd

avgRating_list = [7.4, 8.0, 6.1, 8.8]
movieId = ['A', 'B', 'C', 'D']

avgRating_r = pd.Series(avgRating_list, index=movieId)

Before (default index)    After (movieId index)
Index  avgRating          Index  avgRating
0      7.4                A      7.4
1      8.0                B      8.0
2      6.1                C      6.1
3      8.8                D      8.8
Accessing Elements of a Series
Access an element

Index  avgRating
A      7.4
B      8.0
C      6.1
D      8.8

# Access and print the value in the 'avgRating_r' Series using the index label 'A'
print(avgRating_r['A'])

# Access and print the first value by position; on a label-indexed Series, use .iloc for positional access
print(avgRating_r.iloc[0])
Creating DataFrames
Create a DataFrame which consists of movieId, year, and genres.
The dictionary keys become the column names and the values become the column data.

movies_dict = {
    'movieId': ['A', 'B', 'C', 'D'],
    'year': [2023, 2013, 2023, 2010],
    'genres': ['Adventure|Comedy', 'Romance|Sci-Fi',
               'Adventure', 'Adventure|Sci-Fi']}

movies_df = pd.DataFrame(movies_dict)

movies_df
movieId  year  genres
A        2023  Adventure|Comedy
B        2013  Romance|Sci-Fi
C        2023  Adventure
D        2010  Adventure|Sci-Fi
Example: Create a DataFrame
Create a DataFrame with ‘avgRating’ and ‘numRatings’ columns and set the
movieId as the index.

ratings_df
Index avgRating numRatings
A 7.4 224,000
B 8.0 664,000
C 6.1 70,000
D 8.8 25,000,000
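One possible way to build this DataFrame is sketched below. It passes the movie IDs through the index argument of pd.DataFrame; other routes, such as creating a movieId column first and calling set_index, would give the same table.

```python
import pandas as pd

ratings_df = pd.DataFrame(
    {
        "avgRating": [7.4, 8.0, 6.1, 8.8],
        "numRatings": [224_000, 664_000, 70_000, 25_000_000],
    },
    index=["A", "B", "C", "D"],  # movieId values used as row labels
)

print(ratings_df)
print(ratings_df.loc["D", "numRatings"])  # look up one cell by row label and column
```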
Add a Column
Using the index, create a movieId column and add it to the DataFrame.
Then reset the index to be numeric.

ratings_df (before)
Index  avgRating  numRatings
A      7.4        224,000
B      8.0        664,000
C      6.1        70,000
D      8.8        25,000,000

ratings_df (after)
Index  movieId  avgRating  numRatings
0      A        7.4        224,000
1      B        8.0        664,000
2      C        6.1        70,000
3      D        8.8        25,000,000
Process: Add a Column
We add the new column by assigning the index to it.
ratings_df['movieId'] = ratings_df.index

• ratings_df['movieId'] specifies the new column named 'movieId' within the DataFrame 'ratings_df'
• = is the assignment operator that assigns a value to the 'movieId' column
• ratings_df.index retrieves the index of the DataFrame; these index values will be assigned to the 'movieId' column

If needed, we can then reset the index to default integers.

ratings_df.reset_index(drop=True, inplace=True)
Combine Data Sets (Join)
• What is the Join operation?
• Why do we need to join data?
• How do we join data?
Example: Join
Combine datasets
movies_df
movieId  year  genres
A        2023  Adventure|Comedy
B        2013  Romance|Sci-Fi
C        2023  Adventure
D        2010  Adventure|Sci-Fi

ratings_df
movieId  avgRating  numRatings
A        7.4        224,000
B        8.0        664,000
C        6.1        70,000
D        8.8        25,000,000

# join() matches the 'movieId' column of movies_df against the index of the other DataFrame
combined_df = movies_df.join(ratings_df.set_index('movieId'), on='movieId')

# merge() matches on the shared 'movieId' column directly
combined_df = pd.merge(movies_df, ratings_df, on='movieId')

Join Methods (movies_df is Table 1, ratings_df is Table 2)

Inner Join: keeps only the rows whose movieId appears in both tables
pd.merge(movies_df, ratings_df, on='movieId', how='inner')

Left Join: keeps every row of Table 1
pd.merge(movies_df, ratings_df, on='movieId', how='left')

Right Join: keeps every row of Table 2
pd.merge(movies_df, ratings_df, on='movieId', how='right')

Full Outer Join: keeps every row of both tables
pd.merge(movies_df, ratings_df, on='movieId', how='outer')
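The four join methods only differ when the two tables do not contain exactly the same keys. The sketch below makes that visible with a hypothetical ratings table that lacks movie A and includes an extra movie E; these mismatched rows are invented for illustration and are not part of the lecture data.

```python
import pandas as pd

movies_df = pd.DataFrame({
    "movieId": ["A", "B", "C", "D"],
    "year": [2023, 2013, 2023, 2010],
})
# Hypothetical ratings: no entry for 'A', and an entry for 'E' with no movie row
ratings_df = pd.DataFrame({
    "movieId": ["B", "C", "D", "E"],
    "avgRating": [8.0, 6.1, 8.8, 7.0],
})

# Count the rows produced by each join method
row_counts = {
    how: len(pd.merge(movies_df, ratings_df, on="movieId", how=how))
    for how in ["inner", "left", "right", "outer"]
}
print(row_counts)  # {'inner': 3, 'left': 4, 'right': 4, 'outer': 5}

# In a left join, movie 'A' is kept and its missing rating becomes NaN
left = pd.merge(movies_df, ratings_df, on="movieId", how="left")
print(left)
```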
Export DataFrame -Demo
• Export DataFrame to CSV file

combined_df.to_csv('movie_data.csv', index=False)

• Export DataFrame to Excel file

combined_df.to_excel('movie_data.xlsx', index=False)
Data Analysis

• Sort in Ascending Order

sorted_df = df.sort_values(by='avgRating')

• Sort in Descending Order

sorted_df = df.sort_values(by='avgRating', ascending=False)

Sorting: Descending order
• Sort by multiple columns in descending order

sorted_df = df[['movieId', 'year', 'avgRating']].sort_values(by=['year', 'avgRating'], ascending=False)
Sorting: Specify order
• Sort by multiple columns, explicitly specifying the sorting order of each column.

sorted_df = df[['movieId', 'year', 'avgRating']].sort_values(by=['year', 'avgRating'], ascending=[True, False])
Filtering DataFrames
query() allows Boolean expressions for filtering rows

df.query('numRatings > 1000000')

df.query('year > 2020 & avgRating >= 8.5')

df.query('genres.str.contains("Sci-Fi") & numRatings >= 200000 & avgRating > 7.5')

Practice : Demo
Blockbusters: "List the movies that are extremely popular, with more than a
million ratings."

blockbuster_movies = df.query('numRatings > 1000000')

Filtering DataFrames
Trending High-Quality Movies: "Find all movies released after 2018 with an
average rating of at least 8.5."

# "after 2018" excludes 2018 itself, hence the strict comparison
top_recent_movies = df.query('year > 2018 & avgRating >= 8.5')

String Methods in Pandas
Pandas provides a variety of string methods that are accessible via the
'.str' accessor

df.genres.str.contains("Sci-Fi")

df.genres.str.contains("Adventure")
Demo
Popular Sci-Fi Adventures: "I love sci-fi! Can you show me sci-fi movies with
at least 200,000 ratings and an average rating higher than 7.5?"

sci_fi_hits = df.query('genres.str.contains("Sci-Fi") & numRatings >= 200000 & avgRating > 7.5')
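The three demo filters can also be written as boolean masks, which some readers find easier to debug than query() strings. The sketch below rebuilds the four-movie table and applies each filter; it is an equivalent formulation offered for comparison, not the form used in the demo.

```python
import pandas as pd

df = pd.DataFrame({
    "movieId": ["A", "B", "C", "D"],
    "year": [2023, 2013, 2023, 2010],
    "genres": ["Adventure|Comedy", "Romance|Sci-Fi",
               "Adventure", "Adventure|Sci-Fi"],
    "avgRating": [7.4, 8.0, 6.1, 8.8],
    "numRatings": [224_000, 664_000, 70_000, 25_000_000],
})

# Blockbusters: more than a million ratings
blockbuster_movies = df[df["numRatings"] > 1_000_000]

# Released after 2018 with an average rating of at least 8.5
top_recent_movies = df[(df["year"] > 2018) & (df["avgRating"] >= 8.5)]

# Sci-fi movies with at least 200,000 ratings and a rating above 7.5
sci_fi_hits = df[
    df["genres"].str.contains("Sci-Fi", regex=False)
    & (df["numRatings"] >= 200_000)
    & (df["avgRating"] > 7.5)
]

print(blockbuster_movies["movieId"].tolist())  # ['D']
print(top_recent_movies["movieId"].tolist())   # []
print(sci_fi_hits["movieId"].tolist())         # ['B', 'D']
```

Each comparison is wrapped in parentheses because & binds more tightly than > or >= in plain Python expressions.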
Insight Sharing

Source: Jamian (2021, July 31). Recommendation Engines— Netflix and Amazon product recommendations techniques. Medium.
https://jamian.medium.com/recommendation-engines-netflix-and-amazon-product-recommendations-techniques-3f93896d85b0
Content-Based Filtering:
Textual Data
Recommendation Systems

Content-based Filtering
Develop a content-based filtering system to integrate personalized recommendations.

Prepare Data:
• Item-related Data
• Users’ preferences for the items

Items + User’s Preferences → Recommend → User
Content-based Filtering
Item
movieId  year  genres
A        2023  Adventure|Comedy
B        2013  Romance|Sci-Fi
C        2023  Adventure
D        2010  Adventure|Sci-Fi

User
UserId  year  genres
1       2014  Adventure|Sci-Fi

Sources: (A) Barbie movie poster, (B) Her movie poster, (C) Transformers movie poster, (D) Inception movie poster, Interstellar movie poster; IMDB
Content-Based Filtering Process
Content-based filtering relies on analyzing the content or characteristics of items and
building user profiles based on their preferences.

1. Item profile creation
2. User profile creation
3. Similarity calculation
4. Rank & Recommend

Recommendations are made by comparing the similarity between the user profile
and the profiles of the items.
Demo: Import related libraries
• Handle and manipulate the dataset to calculate the similarity score between users’ preferred items and the list of items

• Convert data into a numerical format

• Calculate similarity score
Demo : Import related libraries
Library: sklearn → Package: feature_extraction → Module: text → Class: CountVectorizer

from sklearn.feature_extraction.text import CountVectorizer
Text Vectorization
The process of converting textual data into numerical vectors that
machine learning algorithms can understand and process.

genres              Adventure  Comedy  Romance  Sci-Fi
Adventure|Comedy    1          1       0        0
Romance|Sci-Fi      0          0       1        1
Adventure           1          0       0        0
Adventure|Sci-Fi    1          0       0        1
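The table above can be reproduced in a few lines of plain Python, which may help show what a vectorizer does before a library is involved. This is a hand-rolled sketch for illustration only; the variable names are made up, and unlike scikit-learn's default behavior it does not lowercase the tokens.

```python
genres = ["Adventure|Comedy", "Romance|Sci-Fi", "Adventure", "Adventure|Sci-Fi"]

# "Fit": collect the vocabulary of unique genre tokens, in alphabetical order
vocabulary = sorted({token for entry in genres for token in entry.split("|")})

# "Transform": count how often each vocabulary token occurs in each entry
vectors = [
    [entry.split("|").count(token) for token in vocabulary]
    for entry in genres
]

print(vocabulary)  # ['Adventure', 'Comedy', 'Romance', 'Sci-Fi']
for entry, vector in zip(genres, vectors):
    print(entry, vector)
```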
Demo: Import related libraries
from sklearn.metrics.pairwise import cosine_similarity

movieId  Adventure  Comedy  Romance  Sci-Fi
A        1          1       0        0
B        0          0       1        1
C        1          0       0        0
D        1          0       0        1

UserId   Adventure  Comedy  Romance  Sci-Fi
1        1          0       0        1

Cosine similarity is calculated between the user row and each movie row.
Data Collection
& Data Pre-Processing

A B C D
movieId Adventure Comedy Romance Sci-Fi
A 1 1 0 0
B 0 0 1 1
C 1 0 0 0
D 1 0 0 1
Sources: (A) Barbie movie poster, (B) Her movie poster, (C) Transformers movie poster, (D) Inception movie poster; IMDB
User Profile Creation

UserId Adventure Comedy Romance Sci-Fi

1 1 0 0 1

User 1
CountVectorizer
1. Initialize the CountVectorizer

# Create a CountVectorizer instance with a custom tokenizer

vectorizer = CountVectorizer(tokenizer=lambda x: x.split('|'))

Lambda Function
• A small anonymous function defined using the lambda keyword in Python
• It can take any number of arguments but has only one expression
• Here, lambda x: x.split('|') takes an input x (which will be the genre string like
"Adventure|Comedy") and returns a list by splitting x on the '|' character
Apply CountVectorizer
2. Apply Vectorizer to the Genre Data:

# Apply the vectorizer to the 'genres' column of our DataFrame

genre_matrix = vectorizer.fit_transform(combined_data['genres'])

• vectorizer is an object of the class CountVectorizer

• The method fit_transform is called on this object. This method
does two main things:
• Fit: It learns or creates a vocabulary from the data provided, which
in this case, is the unique genres.
• Transform: It converts the data into a numerical format based
on the vocabulary it learned during the fit stage.
Demo
Show the results at the end.
# Display the genre matrix in array format.
print(genre_matrix.toarray())

# Retrieve and print the unique genres

unique_genres = vectorizer.get_feature_names_out()
print("Unique genres:", unique_genres)

Unique genres: ['adventure' 'comedy' 'romance' 'sci-fi']

Data Analysis

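Similarity Calculation
genre_matrix[-1] (user profile)
UserId   Adventure  Comedy  Romance  Sci-Fi
1        1          0       0        1

genre_matrix[:-1] (item profiles)
movieId  Adventure  Comedy  Romance  Sci-Fi
A        1          1       0        0
B        0          0       1        1
C        1          0       0        0
D        1          0       0        1

Similarity is calculated between genre_matrix[-1] and each row of genre_matrix[:-1].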
Cosine Similarity in Content-Based Filtering
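Cosine similarity is a metric used to measure the similarity between
two vectors in a multi-dimensional space.

The cosine similarity value ranges from -1 to 1.

• 1 indicates exactly the same direction.
• 0 indicates orthogonal vectors (nothing in common).
• -1 indicates completely opposite directions.

For non-negative count vectors such as the genre vectors used here, the value always falls between 0 and 1.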

Source: pyimagesearch.com
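For reference, the standard definition of cosine similarity is given below, followed by the arithmetic for User 1 and movie C from the genre table. The symbols u and v are introduced here for the two vectors and do not appear in the notes.

```latex
\[
\cos(\mathbf{u}, \mathbf{v})
  = \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert}
  = \frac{\sum_i u_i v_i}{\sqrt{\sum_i u_i^2} \, \sqrt{\sum_i v_i^2}}
\]

% User 1: u = (1, 0, 0, 1); movie C: v = (1, 0, 0, 0)
\[
\cos(\mathbf{u}, \mathbf{v}_C)
  = \frac{1 \cdot 1 + 0 \cdot 0 + 0 \cdot 0 + 1 \cdot 0}{\sqrt{2} \cdot \sqrt{1}}
  = \frac{1}{\sqrt{2}} \approx 0.7071
\]
```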
Cosine Similarity
Cosine similarity

UserId Adventure Comedy Romance Sci-Fi

1 1 0 0 1

Similarity
movieId Adventure Comedy Romance Sci-Fi
A 1 1 0 0
Item Ranking and Recommendation
Cosine similarity

User_movieId CS_Genre
1_A 0.5
1_B 0.5
1_C 0.7071
1_D 1

User Movie D
Sources: Inception movie poster and Interstellar movie poster; IMDB
Insight Sharing

User

Recommend

Sources: Inception movie poster and Interstellar movie poster; IMDB

More Data?

To enhance the generation of a content-based recommendation system,
what additional factors can be considered?
