NATURAL LANGUAGE PROCESSING
NAME : Bavya C
CLASS : AI-DS ‘A’
ROLLNO : 22AD010
1. Word Frequency Analysis
Explanation:
Word frequency analysis identifies how often each word appears in a text. It helps reveal the text's dominant themes and recurring patterns.
Steps to Solve:
Preprocessing: Clean the text by removing punctuation, converting it to lowercase, and splitting it into words.
Count Total Words: Count all the words in the cleaned list.
Calculate Word Frequencies: Use a dictionary or collections.Counter to calculate the frequency of each word.
Find the Most Common Word: Identify the word with the highest count.
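The steps above can be sketched in Python as follows. This is a minimal sketch, not necessarily the code that produced the output below: the function name `word_frequency` and the sample sentence are hypothetical, and punctuation is stripped with a regular expression that deletes every character that is not a word character (letter, digit, underscore) or whitespace, so hyphenated words are joined (e.g. 'data-driven' becomes 'datadriven').

```python
import re
from collections import Counter

def word_frequency(text):
    """Return (total word count, Counter of word frequencies) for text."""
    # Preprocessing: lowercase, drop punctuation, split on whitespace.
    cleaned = re.sub(r"[^\w\s]", "", text.lower())
    words = cleaned.split()
    # Count total words and the frequency of each word.
    return len(words), Counter(words)

# Hypothetical sample text, used only to illustrate the steps.
sample = "Data is useful, and data-driven tools are useful."
total, freq = word_frequency(sample)

# most_common(1) returns a one-element list holding the top (word, count) pair.
word, count = freq.most_common(1)[0]
print(f"Total words: {total}")
print(f"Most common word: '{word}' appears {count} times.")
```

For this sample the cleaned text has 8 words, and 'useful' is the most common word with 2 occurrences.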
Python Implementation:
OUTPUT:
Total words: 31
Word frequencies: Counter({'data': 3, 'is': 2, 'and': 2, 'that': 1, 'science': 1, 'an': 1, 'interdisciplinary': 1, 'field': 1, 'uses': 1,
'various': 1, 'techniques': 1, 'algorithms': 1, 'tools': 1, 'to': 1, 'extract': 1, 'insights': 1, 'knowledge': 1, 'from': 1, 'structured': 1,
'unstructured': 1, 'driven': 1, 'decisionmaking': 1, 'transforming': 1, 'industries': 1, 'worldwide': 1})
Most common word: 'data' appears 3 times.
2. Measures of Central Tendency
Explanation:
Word lengths in the text are analyzed using three statistical measures:
Mean: The average length of words.
Median: The middle value of the sorted word lengths (the average of the two middle values when the number of words is even).
Mode: The most frequently occurring word length.
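The three definitions can be written out directly. This is a minimal sketch on a hypothetical list of word lengths; the function names are illustrative. Ties for the mode are resolved by returning the value seen first, which is an assumption (other conventions exist).

```python
from collections import Counter

def mean(values):
    # Sum of all values divided by how many there are.
    return sum(values) / len(values)

def median(values):
    # Middle value of the sorted list; average the two middle values if the count is even.
    s = sorted(values)
    mid = len(s) // 2
    if len(s) % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

def mode(values):
    # Most frequent value; Counter.most_common keeps first-seen order for ties.
    return Counter(values).most_common(1)[0][0]

lengths = [4, 2, 7, 4, 10, 3]  # hypothetical word lengths
print(mean(lengths), median(lengths), mode(lengths))  # 5.0 4.0 4
```

Here the sum is 30 over 6 values (mean 5.0), the sorted list is [2, 3, 4, 4, 7, 10] so the median averages the two middle 4s, and 4 is the only length that occurs twice.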
Steps to Solve:
Preprocess the text and calculate word lengths.
Use statistical formulas or libraries to compute the mean, median, and mode.
Decide which measure best represents a typical word length.
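A minimal sketch of these steps using Python's standard `statistics` module. The function name `length_stats` and the sample sentence are hypothetical, and the preprocessing assumes the same kind of cleaning described in Section 1 (lowercase, strip punctuation, split on whitespace).

```python
import re
import statistics

def length_stats(text):
    """Return the mean, median and mode of the word lengths in text."""
    # Preprocess: lowercase, strip punctuation, split into words.
    words = re.sub(r"[^\w\s]", "", text.lower()).split()
    lengths = [len(w) for w in words]
    return (statistics.mean(lengths),
            statistics.median(lengths),
            statistics.mode(lengths))

# Hypothetical sample text, used only to illustrate the steps.
sample = "Text mining finds text patterns in large text collections."
mean_len, median_len, mode_len = length_stats(sample)
print(f"Mean word length: {mean_len:.2f}")
print(f"Median word length: {median_len}")
print(f"Mode word length: {mode_len}")
```

The sample has 9 words with lengths summing to 49, so the mean is about 5.44, the median is 5, and the mode is 4 (the three occurrences of 'text').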
Python Implementation:
OUTPUT:
Mean word length: 6.06
Median word length: 6.0
Mode word length: 4
Typical word length: The median, as it is less affected than the mean by unusually short or long words.
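To see why the median is less sensitive to extreme values, compare both measures on a hypothetical list of word lengths before and after adding one unusually long word:

```python
import statistics

lengths = [2, 3, 4, 4, 5]      # hypothetical word lengths
with_outlier = lengths + [20]  # add one very long word

# The mean is pulled toward the outlier; the median barely moves.
print(statistics.mean(lengths), statistics.median(lengths))
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```

Adding the single long word raises the mean from 3.6 to about 6.33, while the median stays at 4.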
3. Visualization
Explanation:
Visualizing the word frequencies offers insights into the text's structure and focus:
Top 5 Words: Identifies the most frequently occurring words.
Bar Chart: Compares the frequencies of these top words.
Insights: Shows whether the most frequent words are theme words (such as 'data') or filler words (such as 'is' and 'and').
Steps to Solve:
1. Extract the top 5 most common words.
2. Plot their frequencies using a bar chart.
3. Analyze the chart to draw conclusions.
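A minimal sketch of these steps, assuming matplotlib is installed. The function names `top_words` and `plot_top_words` are hypothetical. `Counter.most_common(n)` returns the n most frequent words, and words with equal counts are listed in the order they were first seen.

```python
from collections import Counter

def top_words(freq, n=5):
    """Return the n most common (word, count) pairs from a Counter."""
    return freq.most_common(n)

def plot_top_words(pairs):
    """Draw a bar chart of (word, count) pairs with matplotlib."""
    import matplotlib.pyplot as plt  # imported here so top_words works without matplotlib

    words = [w for w, _ in pairs]
    counts = [c for _, c in pairs]
    plt.bar(words, counts)
    plt.xlabel("Word")
    plt.ylabel("Frequency")
    plt.title(f"Top {len(pairs)} most frequent words")
    plt.show()

# A subset of the word frequencies reported in Section 1's output.
freq = Counter({"data": 3, "is": 2, "and": 2, "that": 1,
                "science": 1, "an": 1, "interdisciplinary": 1})
top5 = top_words(freq)
print(top5)
```

Calling `plot_top_words(top5)` then displays the chart. Keeping the matplotlib import inside the plotting function means the counting logic can be reused or tested without a plotting library.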
Python Implementation:
OUTPUT (Bar Chart):
A bar chart with the following:
Words: data, is, and, that, science
Frequencies: 3, 2, 2, 1, 1