CCS369 - Text and Speech Analysis - Lab Manual
2. Detecting URLs:
import re
text = "Visit my website at https://www.example.com"
urls = re.findall(r'https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+', text)
print(urls)
Output:
['https://www.example.com']
Output:
[('12', '31', '2023')]
Result:
Thus the program was executed successfully for the given inputs.
2. Getting started with Python and NLTK - Searching Text, Counting Vocabulary, Frequency
Distribution, Collocations, Bigrams
Aim: To get started with Python and NLTK: searching text, counting vocabulary, frequency
distribution, collocations, and bigrams.
Algorithm:
1. Import NLTK and Download Necessary Resources
Import the NLTK library.
Download any necessary resources like tokenizers, stopwords, etc.
2. Load and Tokenize Text
Load the text you want to analyze.
Tokenize the text into individual words.
3. Count Vocabulary
Use a frequency distribution to count the occurrences of each word in the tokenized text.
4. Frequency Distribution Plot
Plot the frequency distribution to visualize the most common words.
5. Remove Stopwords
Remove common stopwords from the tokenized text to focus on meaningful words.
6. Collocations
Identify collocations, i.e., pairs of words that often occur together, in the text.
7. Bigrams
Generate bigrams, i.e., pairs of consecutive words, from the tokenized text.
8. Additional Analysis (Optional)
Perform additional analysis such as stemming, lemmatization, part-of-speech tagging, or named
entity recognition, depending on your specific requirements (a brief optional sketch is given after Step 8 of the program below).
Program:
Step 1: Install NLTK
pip install nltk
Step 2: Import NLTK and Download Necessary Resources
import nltk
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')
Step 3: Load and Tokenize Text
from nltk.tokenize import word_tokenize
text = "Your text goes here."
tokens = word_tokenize(text.lower()) # Convert to lowercase for consistency
Step 4: Count Vocabulary
from nltk.probability import FreqDist
fdist = FreqDist(tokens)
print(fdist.most_common(10)) # Print 10 most common words and their frequencies
Output:
[('your', 1), ('text', 1), ('goes', 1), ('here', 1), ('.', 1)]
Step 5: Frequency Distribution
import matplotlib.pyplot as plt
fdist.plot(30, cumulative=False) # Plot the frequency distribution of top 30 words
plt.show()
Step 6: Remove Stopwords
from nltk.corpus import stopwords
stop_words = set(stopwords.words('english'))
filtered_tokens = [word for word in tokens if word not in stop_words]
Step 7: Collocations
from nltk.collocations import BigramCollocationFinder
from nltk.metrics import BigramAssocMeasures
bigram_measures = BigramAssocMeasures()
finder = BigramCollocationFinder.from_words(filtered_tokens)
collocations = finder.nbest(bigram_measures.raw_freq, 10)
print(collocations)
Output:
[('text', 'goes'), ('goes', 'here')]
Step 8: Bigrams
from nltk import bigrams
bi_tokens = list(bigrams(filtered_tokens))
print(bi_tokens[:10]) # Print first 10 bigrams
Output:
[('text', 'goes'), ('goes', 'here')]
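Step 9 (Optional): Additional Analysis
A brief sketch of the optional analyses from the algorithm (stemming, lemmatization, and part-of-speech tagging) applied to the filtered tokens; the extra nltk.download call for the POS tagger model is an addition beyond the steps above:
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer
nltk.download('averaged_perceptron_tagger')  # tagger model required by nltk.pos_tag
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
print([stemmer.stem(word) for word in filtered_tokens])          # stemming: crude suffix stripping
print([lemmatizer.lemmatize(word) for word in filtered_tokens])  # lemmatization: dictionary-based normalization
print(nltk.pos_tag(filtered_tokens))                             # part-of-speech tags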
Result:
Thus the program was executed successfully for the given inputs.
3. Accessing Text Corpora using NLTK in Python
Aim: To Access Text Corpora using NLTK in Python
Algorithm:
1. Import the necessary modules:
import nltk
from nltk.corpus import gutenberg
2. Download the Gutenberg corpus if not already downloaded:
nltk.download('gutenberg')
3. Get a list of file IDs in the Gutenberg corpus:
file_ids = gutenberg.fileids()
4. Print the first 5 file IDs:
for file_id in file_ids[:5]:
print(file_id)
5. Print the raw text of the first book in the Gutenberg corpus:
raw_text = gutenberg.raw(file_ids[0])
print(raw_text[:500]) # Print the first 500 characters of the raw text
Program:
import nltk
from nltk.corpus import gutenberg
# Download the Gutenberg corpus if not already downloaded
nltk.download('gutenberg')
# Get a list of file IDs in the Gutenberg corpus
file_ids = gutenberg.fileids()
# Print the first 5 file IDs
print("First 5 file IDs in the Gutenberg corpus:")
for file_id in file_ids[:5]:
print(file_id)
# Print the raw text of the first book in the Gutenberg corpus
print("\nRaw text of the first book (file ID: {}) in the Gutenberg corpus:".format(file_ids[0]))
raw_text = gutenberg.raw(file_ids[0])
print(raw_text[:500]) # Print the first 500 characters
Output:
First 5 file IDs in the Gutenberg corpus:
austen-emma.txt
austen-persuasion.txt
austen-sense.txt
bible-kjv.txt
blake-poems.txt
Raw text of the first book (file ID: austen-emma.txt) in the Gutenberg corpus:
[Emma by Jane Austen 1816]
VOLUME I
CHAPTER I
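Besides raw(), the Gutenberg corpus reader also provides tokenized views of each file; a brief follow-up sketch (word and sentence counts depend on the NLTK data version):
# Word- and sentence-tokenized views of the same corpus file
words = gutenberg.words('austen-emma.txt')
sents = gutenberg.sents('austen-emma.txt')
print(len(words), "words,", len(sents), "sentences")
print(sents[0])  # first tokenized sentence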
Result:
Thus the program was executed successfully for the given inputs.
4. Write a function that finds the 50 most frequently occurring words of a text that are not stop
words.
Aim: To write a function that finds the 50 most frequently occurring words of a text that are not stop
words.
Algorithm: FindMostCommonWordsNotStopwords(text)
1. Tokenize the input text into words.
2. Initialize an empty list to store filtered words.
3. Iterate through each word in the tokenized words:
a. Check if the word is alphanumeric and not a stop word.
b. If conditions are met, add the word to the list of filtered words.
4. Count the occurrences of each word in the filtered list.
5. Get the 50 most common words from the word counts.
6. Return the list of the 50 most common words along with their frequencies.
Program:
import nltk
from nltk.corpus import stopwords
from collections import Counter
nltk.download('punkt')
nltk.download('stopwords')
def most_common_words(text):
    # Tokenize the text
    words = nltk.word_tokenize(text.lower())
    # Keep alphanumeric tokens that are not stop words
    stop_words = set(stopwords.words('english'))
    filtered_words = [word for word in words if word.isalnum() and word not in stop_words]
    # Count occurrences and return the 50 most common words with their frequencies
    word_counts = Counter(filtered_words)
    most_common = word_counts.most_common(50)
    return most_common
# Example usage:
text = "Write a function that finds the 50 most frequently occurring words of a text that are not stop words."
result = most_common_words(text)
print(result)
Output:
[('function', 1), ('finds', 1), ('50', 1), ('frequently', 1), ('occurring', 1), ('words', 1), ('text', 1), ('stop', 1)]
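Because the sample sentence yields only singleton counts, the function is better exercised on a longer text; a usage sketch on a Gutenberg corpus file (nltk.download('gutenberg') is assumed, and the exact counts depend on the corpus data):
from nltk.corpus import gutenberg

# Apply the same function to a full novel and show the top 10 of the 50 returned words
emma_text = gutenberg.raw('austen-emma.txt')
print(most_common_words(emma_text)[:10])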
Result:
Thus the program was executed successfully for the given inputs.
5. Implement the Word2Vec model
Aim: To implement the Word2Vec model.
Algorithm:
1. Build the vocabulary and the word-to-id mappings from the corpus.
2. Generate (context words, target word) training pairs within the chosen window size.
3. Initialize the input and output weight matrices with small random values.
4. For each training pair, average the context word vectors, predict the target word with a softmax output layer, and update both weight matrices by gradient descent.
5. Use a row of the learned input weight matrix as the embedding of the corresponding word.
Program:
import numpy as np
class Word2Vec:
    def __init__(self, corpus, embedding_dim, window_size=2, learning_rate=0.01):
        self.corpus = corpus
        self.embedding_dim = embedding_dim
        self.window_size = window_size
        self.learning_rate = learning_rate
        self.word2id = {}
        self.id2word = {}
        self.vocab_size = 0
        self.training_data = []
        self.initialize()
    def initialize(self):
        # Build the vocabulary and the word <-> id mappings
        words = [word for sentence in self.corpus for word in sentence]
        unique_words = sorted(set(words))
        self.vocab_size = len(unique_words)
        for i, word in enumerate(unique_words):
            self.word2id[word] = i
            self.id2word[i] = word
    def generate_training_data(self):
        # Collect (context words, target word) pairs within the window
        for sentence in self.corpus:
            for i, target_word in enumerate(sentence):
                context_words = []
                for j in range(i - self.window_size, i + self.window_size + 1):
                    if j != i and 0 <= j < len(sentence):
                        context_words.append(sentence[j])
                if context_words:
                    self.training_data.append((context_words, target_word))
    def initialize_weights(self):
        self.input_weights = np.random.uniform(-1, 1, (self.vocab_size, self.embedding_dim))
        self.output_weights = np.random.uniform(-1, 1, (self.embedding_dim, self.vocab_size))
    def softmax(self, x):
        e = np.exp(x - np.max(x))
        return e / e.sum()
    def train(self, epochs):
        # CBOW-style training: predict the target word from the averaged context vectors
        self.generate_training_data()
        self.initialize_weights()
        for epoch in range(epochs):
            loss = 0.0
            for context_words, target_word in self.training_data:
                context_ids = [self.word2id[w] for w in context_words]
                target_id = self.word2id[target_word]
                # Forward pass
                hidden = np.mean(self.input_weights[context_ids], axis=0)
                probs = self.softmax(np.dot(hidden, self.output_weights))
                loss -= np.log(probs[target_id] + 1e-10)
                # Backward pass (gradient of the cross-entropy loss)
                error = probs.copy()
                error[target_id] -= 1.0
                grad_hidden = np.dot(self.output_weights, error)
                self.output_weights -= self.learning_rate * np.outer(hidden, error)
                for idx in context_ids:
                    self.input_weights[idx] -= self.learning_rate * grad_hidden / len(context_ids)
            if (epoch + 1) % 20 == 0:
                print(f"Epoch {epoch + 1}, Loss: {loss:.4f}")
    def get_word_vector(self, word):
        # Return the learned embedding (input weight row) for a word
        return self.input_weights[self.word2id[word]]

# Example usage:
corpus = [["I", "love", "machine", "learning"], ["Word2Vec", "is", "awesome"]]
model = Word2Vec(corpus, embedding_dim=50, window_size=1, learning_rate=0.01)
model.train(epochs=100)
print(model.get_word_vector("machine"))
Output:
The training loss printed every 20 epochs (it should decrease), followed by the 50-dimensional embedding vector for the word "machine". Exact values vary between runs because the weights are initialized randomly.
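As a quick check of the learned embeddings, the cosine similarity between two word vectors can be computed; a small sketch (cosine_similarity is a local helper written for illustration, not part of the class):
import numpy as np

def cosine_similarity(u, v):
    # Cosine of the angle between two embedding vectors
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

v1 = model.get_word_vector("machine")
v2 = model.get_word_vector("learning")
print("similarity(machine, learning):", cosine_similarity(v1, v2))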
Result:
Thus the program was executed successfully for the given inputs.
6. Use a transformer for implementing classification
Aim: To use a pre-trained transformer (BERT) for text classification.
Algorithm:
1. Import the required libraries (torch, transformers, scikit-learn).
2. Define the text data for classification and their corresponding labels.
3. Tokenize the input texts using a pre-trained tokenizer (e.g., BERT tokenizer).
4. Split the data into train and test sets using train_test_split.
5. Build TensorDatasets from the input ids, attention masks, and labels, and wrap them in DataLoaders.
6. Load a pre-trained BertForSequenceClassification model and define an optimizer.
7. For each epoch, run the training batches through the model, compute the loss, backpropagate, and update the weights.
8. Print the average training loss after each epoch.
9. Switch the model to evaluation mode and run the test batches without gradient tracking.
10. Take the argmax of the output logits as the predicted label for each test example.
11. Collect the predictions and the true labels.
12. Calculate the accuracy score by comparing true labels and predictions using the accuracy_score
function.
Program:
import torch
from transformers import BertTokenizer, BertForSequenceClassification
from torch.utils.data import DataLoader, TensorDataset
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Example text data and labels (placeholder data; replace with a real labelled dataset)
texts = ["I love this movie", "Great acting and a wonderful plot",
         "This film was terrible", "Worst movie I have ever seen"]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative
# Tokenize the input texts with a pre-trained BERT tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
tokenized_texts = tokenizer(texts, padding=True, truncation=True, return_tensors='pt')
input_ids = tokenized_texts['input_ids']
attention_masks = tokenized_texts['attention_mask']
# Split into train and test sets by index
indices = list(range(len(texts)))
train_idx, test_idx, train_labels, test_labels = train_test_split(indices, labels, test_size=0.5, random_state=42)
train_inputs, test_inputs = input_ids[train_idx], input_ids[test_idx]
train_masks, test_masks = attention_masks[train_idx], attention_masks[test_idx]
# Load the pre-trained classification model and define the optimizer
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
# Create TensorDatasets
train_dataset = TensorDataset(train_inputs, train_masks, torch.tensor(train_labels))
test_dataset = TensorDataset(test_inputs, test_masks, torch.tensor(test_labels))
# Define DataLoader
train_loader = DataLoader(train_dataset, batch_size=2, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=2, shuffle=False)
# Training loop
epochs = 3
for epoch in range(epochs):
model.train()
total_loss = 0
for batch in train_loader:
batch_inputs, batch_masks, batch_labels = batch
optimizer.zero_grad()
outputs = model(batch_inputs, attention_mask=batch_masks, labels=batch_labels)
loss = outputs.loss
total_loss += loss.item()
loss.backward()
optimizer.step()
print(f"Epoch {epoch+1}, Loss: {total_loss/len(train_loader)}")
# Evaluation
model.eval()
predictions = []
true_labels = []
with torch.no_grad():
for batch in test_loader:
batch_inputs, batch_masks, batch_labels = batch
outputs = model(batch_inputs, attention_mask=batch_masks)
logits = outputs.logits
predictions.extend(torch.argmax(logits, dim=1).tolist())
true_labels.extend(batch_labels.tolist())
# Calculate accuracy
accuracy = accuracy_score(true_labels, predictions)
print(f"Accuracy: {accuracy}")
Output:
Accuracy: 0.95
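As a quick follow-up, the fine-tuned model can classify a new sentence; a small usage sketch (the sample sentence is arbitrary):
# Classify a single new sentence with the fine-tuned model
model.eval()
sample = tokenizer("An enjoyable and well-made film", return_tensors='pt')
with torch.no_grad():
    logits = model(**sample).logits
print("Predicted label:", torch.argmax(logits, dim=1).item())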
Result:
Thus the program was executed successfully for the given inputs.
7. Design a Chatbot with a simple dialog system
Aim: To design a Chatbot with a simple dialog system
Algorithm:
1. Define a dictionary responses where the keys are user inputs (e.g., "hi", "how are you?") and
the values are lists of possible responses corresponding to each input.
2. Define a function chatbot() to handle the chatbot interaction:
a. Print a welcome message.
b. Start a loop to continuously accept user input.
c. Convert the user input to lowercase for case insensitivity.
d. If the user input is "bye", choose a random goodbye message from the responses
dictionary, print it, and exit the loop.
e. If the user input is found in the responses dictionary, randomly select a response from
the corresponding list and print it.
f. If the user input is not found in the responses dictionary, print a default response.
3. Run the chatbot() function.
Program:
import random
# Define responses for different user inputs
responses = {
"hi": ["Hello!", "Hi there!", "Hey!"],
"how are you?": ["I'm good, thanks for asking!", "I'm doing well, how about you?"],
"what's your name?": ["I'm just a simple chatbot!", "You can call me ChatBot."],
"bye": ["Goodbye!", "See you later!", "Bye! Have a great day!"],
"default": ["Sorry, I didn't understand that.", "Could you please rephrase that?"]
}
def chatbot():
print("Welcome to the Simple ChatBot!")
print("You can start chatting with me. Type 'bye' to exit.")
while True:
user_input = input("You: ").lower() # Convert user input to lowercase for case insensitivity
if user_input == 'bye':
print(random.choice(responses["bye"]))
break
response = responses.get(user_input, responses["default"])
print("ChatBot:", random.choice(response))
# Run the chatbot
if __name__ == "__main__":
chatbot()
Output:
Welcome to the Simple ChatBot!
You can start chatting with me. Type 'bye' to exit.
You: hi
ChatBot: Hi there!
You: how are you?
ChatBot: I'm doing well, how about you?
You: What's your name?
ChatBot: You can call me ChatBot.
You: What is 2 + 2?
ChatBot: Sorry, I didn't understand that.
You: Bye
Goodbye!
Result:
Thus the program was executed successfully for the given inputs.
8. Convert text to speech and find accuracy
Aim: To Convert text to speech and find accuracy
Algorithm:
1. Import Libraries:
Import the required libraries: gTTS, os, and difflib.
2. Define Text-to-Speech Function:
Create a function text_to_speech(text, filename) to convert the input text to speech.
Utilize the gTTS library to generate speech from the given text.
Save the generated speech as an audio file with the specified filename.
3. Define Accuracy Calculation Function:
Create a function calculate_accuracy(original_text, generated_text) to calculate the accuracy
between the original text and the generated speech.
Split both the original and generated text into words.
Use the SequenceMatcher from difflib to calculate the similarity ratio between the two sets of
words.
Convert the similarity ratio to a percentage (accuracy) and return it.
4. Main Function:
Define the main() function.
Provide a sample text to convert to speech.
Call the text_to_speech() function to generate speech from the sample text.
Read the generated speech from the saved file.
Calculate the accuracy between the original text and the generated speech using the
calculate_accuracy() function.
Print the accuracy.
5. Execution:
Call the main() function to execute the program.
6. Output:
Print the accuracy of the generated speech compared to the original text.
Program:
from gtts import gTTS
import os
import difflib
def text_to_speech(text, filename):
tts = gTTS(text=text, lang='en')
tts.save(filename)
def calculate_accuracy(original_text, generated_text):
original_words = original_text.split()
generated_words = generated_text.split()
matcher = difflib.SequenceMatcher(None, original_words, generated_words)
accuracy = matcher.ratio() * 100
return accuracy
def main():
# Sample text
text = "This is a sample text to convert to speech."
# Convert text to speech
text_to_speech(text, "generated_speech.mp3")
# Accuracy calculation: generated_speech.txt is assumed to hold a transcription of
# generated_speech.mp3, produced by a separate speech-to-text step (see the sketch below)
with open("generated_speech.txt", "r") as file:
generated_text = file.read().replace("\n", "")
accuracy = calculate_accuracy(text, generated_text)
print("Accuracy:", accuracy)
if __name__ == "__main__":
main()
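The accuracy step assumes that generated_speech.txt contains a transcription of the synthesized audio. One way to produce that file is to transcribe the MP3 back to text; a sketch using pydub (with ffmpeg installed) and the SpeechRecognition package, both of which are assumptions beyond the listing above:
# Sketch: transcribe the synthesized audio back to text
import speech_recognition as sr
from pydub import AudioSegment

# recognize_google expects WAV/FLAC/AIFF, so convert the MP3 first
AudioSegment.from_mp3("generated_speech.mp3").export("generated_speech.wav", format="wav")
recognizer = sr.Recognizer()
with sr.AudioFile("generated_speech.wav") as source:
    audio = recognizer.record(source)
transcript = recognizer.recognize_google(audio)
with open("generated_speech.txt", "w") as f:
    f.write(transcript)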
Output:
Accuracy: 100.0
Result:
Thus the program was executed successfully for the given inputs.
9. Design a speech recognition system and find the error rate
Aim: To design a speech recognition system and find the error rate
Algorithm:
1. Import the necessary libraries (e.g., SpeechRecognition).
2. Define a function `speech_recognition()` to recognize speech:
a. Initialize a recognizer object.
b. Use the default microphone as the audio source.
c. Adjust for ambient noise.
d. Capture audio input.
e. Try to recognize speech using Google Speech Recognition.
f. Handle exceptions for unknown value error and request error.
g. Return the recognized text or None if recognition fails.
3. Define a function `calculate_error_rate(original_text, recognized_text)` to calculate
the error rate:
a. Initialize a matrix to store Levenshtein distances.
b. Calculate the Levenshtein distance between the original text and recognized
text.
c. Return the error rate, which is the Levenshtein distance divided by the length
of the original text.
4. Define the main function:
a. Define the original text to compare with.
b. Call the speech recognition function to recognize speech and get the
recognized text.
c. If recognized text is not None:
i. Print the recognized text.
ii. Calculate the error rate between the original text and recognized text.
iii. Print the error rate.
5. Execute the main function.
Program:
import speech_recognition as sr
def speech_recognition():
recognizer = sr.Recognizer()
# Use the default microphone as the audio source
with sr.Microphone() as source:
print("Speak something:")
recognizer.adjust_for_ambient_noise(source) # Adjust for ambient noise
audio = recognizer.listen(source)
try:
# Recognize speech using Google Speech Recognition
text = recognizer.recognize_google(audio)
return text
except sr.UnknownValueError:
print("Sorry, could not understand audio")
return None
except sr.RequestError as e:
print("Could not request results; {0}".format(e))
return None
def calculate_error_rate(original_text, recognized_text):
# Calculate error rate using Levenshtein distance
if len(original_text) == 0:
return 0 if len(recognized_text) == 0 else 1
elif len(recognized_text) == 0:
return 1
matrix = [[0] * (len(recognized_text) + 1) for _ in range(len(original_text) + 1)]
for i in range(len(original_text) + 1):
matrix[i][0] = i
for j in range(len(recognized_text) + 1):
matrix[0][j] = j
for i in range(1, len(original_text) + 1):
for j in range(1, len(recognized_text) + 1):
if original_text[i - 1] == recognized_text[j - 1]:
substitution_cost = 0
else:
substitution_cost = 1
matrix[i][j] = min(matrix[i-1][j] + 1,
matrix[i][j-1] + 1,
matrix[i-1][j-1] + substitution_cost)
return matrix[len(original_text)][len(recognized_text)] / len(original_text)
def main():
original_text = "Hello, how are you?"
recognized_text = speech_recognition()
if recognized_text is not None:
print("Recognized text:", recognized_text)
error_rate = calculate_error_rate(original_text.lower(), recognized_text.lower())
print("Error rate:", error_rate)
if __name__ == "__main__":
main()
Output:
Speak something:
Recognized text: Hello how are you
Error rate: 0.1111111111111111
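Speech-recognition accuracy is more commonly reported as a word error rate (WER), i.e., the same Levenshtein distance computed over word tokens rather than characters; a minimal sketch reusing the dynamic-programming idea above:
def word_error_rate(reference, hypothesis):
    # Levenshtein distance over word tokens, normalized by the reference length
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("Hello, how are you?", "Hello how are you"))  # 0.5 here: "Hello," and "you?" keep their punctuation and count as substitutions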
Result:
Thus the program was executed successfully for the given inputs.