Program 4:
Use word embeddings to improve prompts for a Generative AI model. Retrieve similar words
using word embeddings and use them to enrich a GenAI prompt. Use the AI model to
generate responses for the original and enriched prompts, then compare the outputs in terms of
detail and relevance.
1. Importing Pre-trained Word Embeddings
import gensim.downloader as api
• gensim.downloader allows us to download pre-trained word embeddings from
gensim's model repository.
• These models provide pre-trained word vectors, so we don't have to train our own
embedding model (such as Word2Vec or GloVe) from scratch.
2. Loading the GloVe Model
model = api.load("glove-wiki-gigaword-50")
• This downloads (on first use) and loads the GloVe (Global Vectors for Word Representation)
model glove-wiki-gigaword-50, trained on Wikipedia 2014 and the Gigaword 5 news corpus.
• The model contains word vectors of size 50 (50-dimensional vector representations
of words). Its vocabulary is lowercase.
• Each word is mapped to a dense numerical vector; words used in similar contexts get
nearby vectors, which is what makes it possible to find semantically similar words.
3. Function Definition: enrich_prompt
def enrich_prompt(prompt, num_similar=3):
• This function takes a prompt (text input) and enriches it by adding similar words.
• num_similar=3: Specifies the number of similar words to add for each word in the
prompt.
4. Splitting the Prompt into Words
words = prompt.split()
• Splits the prompt (input sentence) into individual words.
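Because split() separates on whitespace only, tokens keep their capitalization and any attached punctuation ("Write", "cat."). The sketch below shows one possible way to derive a clean lookup key while keeping the original token for display. The helpers split_token and lookup_key are hypothetical names introduced for illustration; they are not part of the original program.

```python
import string

def split_token(token):
    """Split a token into (leading punctuation, core word, trailing punctuation)."""
    core = token.strip(string.punctuation)
    if not core:
        # Token is punctuation only: no core word to look up.
        return token, "", ""
    start = token.index(core)
    return token[:start], core, token[start + len(core):]

def lookup_key(token):
    """Lowercased core word, suitable as a key for a lowercase vocabulary."""
    return split_token(token)[1].lower()

prompt = "Write a story about a cat."
print(prompt.split())                           # raw tokens, punctuation attached
print([lookup_key(t) for t in prompt.split()])  # cleaned lookup keys
```

Keeping the punctuation separate means it can be re-attached after enrichment, so the enriched prompt still reads as a sentence.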
5. Initialize an Empty List
enriched_words = []
• Creates an empty list enriched_words to store words along with their similar words.
6. Loop Through Each Word in the Prompt
for word in words:
• Iterates through each word in the prompt.
7. Finding Similar Words Using GloVe
try:
    similar_words = [w for w, _ in model.most_similar(word, topn=num_similar)]
• model.most_similar(word, topn=num_similar): Finds the num_similar (default
3) most similar words for the given word.
• It returns a list of tuples: (similar_word, similarity_score), but we only extract
similar_word.
• Example:
model.most_similar("cat", topn=3)
May return (illustrative values; the actual neighbours and scores depend on the model):
[('dog', 0.91), ('kitten', 0.85), ('feline', 0.83)]
Meaning "dog", "kitten", and "feline" are the words most similar to "cat".
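To make the ranking concrete, here is a minimal sketch of a nearest-neighbour lookup by cosine similarity, the measure most_similar uses. The three-dimensional vectors are invented for illustration and are not real GloVe values; toy_most_similar is a hypothetical stand-in that only mimics the shape of the result.

```python
import math

# Toy 3-dimensional vectors, invented for illustration (not real GloVe values).
TOY_VECTORS = {
    "cat":    [0.9, 0.8, 0.1],
    "dog":    [0.8, 0.9, 0.2],
    "kitten": [0.9, 0.6, 0.0],
    "car":    [0.1, 0.2, 0.9],
}

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def toy_most_similar(word, topn=3, vectors=TOY_VECTORS):
    """Return up to topn (word, score) tuples, highest cosine similarity first."""
    if word not in vectors:
        raise KeyError(f"Key '{word}' not present in vocabulary")
    target = vectors[word]
    scored = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]

neighbours = toy_most_similar("cat", topn=2)
print(neighbours)
print([w for w, _ in neighbours])  # keep only the words, as the program does
```

With these toy vectors "dog" and "kitten" rank above "car", because their vectors point in nearly the same direction as the vector for "cat".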
8. Appending the Enriched Word
enriched_words.append(word + " (" + ", ".join(similar_words) + ")")
• Formats the word by appending its similar words in parentheses.
• Example:
"cat" → "cat (dog, kitten, feline)"
• Appends this to the enriched_words list.
9. Handling Words Not Found in GloVe
except KeyError:
enriched_words.append(word)
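• If the word is not in the GloVe vocabulary, most_similar raises KeyError and the word
is kept unchanged.
• This avoids errors for uncommon words and typos. It also applies to tokens such as "Write"
or "cat.", whose capitalization or attached punctuation does not match the lowercase vocabulary.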
10. Join Words Back Into a Sentence
return " ".join(enriched_words)
• Converts the list of enriched words back into a sentence.
11. Define an Original Prompt
original_prompt = "Write a story about a cat."
• This is the original input sentence.
12. Generate an Enriched Prompt
enriched_prompt = enrich_prompt(original_prompt)
• Calls the function enrich_prompt() with the input "Write a story about a cat."
• Returns an enriched version of the prompt with similar words added.
13. Print the Results
print("Original Prompt:", original_prompt)
print("Enriched Prompt:", enriched_prompt)
• Prints both the original and enriched prompts.
Example Output
Original Prompt: Write a story about a cat.
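Enriched Prompt: Write a (another, an, one) story (stories, book, tale) about (than, there, more) a (another, an, one) cat.
• The function annotates each in-vocabulary word with its nearest neighbours in the
embedding space. Here only "story" gains useful context: the neighbours of function words
such as "a" and "about" add noise, and "Write" and "cat." are left unchanged because those
exact tokens are not in the vocabulary.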
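One possible refinement, sketched here under stated assumptions, is to skip function words and to look up a cleaned, lowercase form of each token. The function enrich_prompt_filtered, the small STOP_WORDS set, and the stub fake_most_similar are all hypothetical. The lookup is passed in as a parameter so the sketch runs without downloading a model; with the real model you would pass model.most_similar instead.

```python
import string

# Small illustrative stop-word list (an assumption; extend as needed).
STOP_WORDS = {"a", "an", "the", "about", "of", "to", "in", "on", "and", "or"}

def enrich_prompt_filtered(prompt, most_similar, num_similar=3, stop_words=STOP_WORDS):
    """Enrich only content words; look up a lowercase, punctuation-free key."""
    enriched = []
    for token in prompt.split():
        key = token.strip(string.punctuation).lower()
        if not key or key in stop_words:
            enriched.append(token)  # punctuation or function word: leave as is
            continue
        try:
            similar = [w for w, _ in most_similar(key, topn=num_similar)]
        except KeyError:
            enriched.append(token)  # not in the vocabulary: leave as is
            continue
        enriched.append(f"{token} ({', '.join(similar)})")
    return " ".join(enriched)

def fake_most_similar(word, topn=3):
    """Stub lookup with made-up neighbours; raises KeyError for unknown words."""
    table = {
        "story": [("tale", 0.9), ("stories", 0.8)],
        "cat": [("dog", 0.9), ("kitten", 0.8)],
    }
    return table[word][:topn]

print(enrich_prompt_filtered("Write a story about a cat.", fake_most_similar, num_similar=2))
```

With the stub, "cat." is now enriched because the lookup key is "cat", while "a" and "about" are passed through untouched.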
Summary
• Loads pre-trained GloVe embeddings from gensim.
• Finds similar words for each word in the input prompt.
• Formats the enriched prompt by adding similar words in parentheses.
• Handles missing words gracefully.
• This technique can be used for prompt expansion, NLP applications, and creative
writing.
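The program statement asks for responses to both prompts from the same AI model. A minimal sketch of that step is shown below, assuming the model is wrapped in a function that takes a prompt string and returns text. The names compare_prompts and echo_generator are hypothetical, and echo_generator is only a placeholder for a real model call.

```python
def compare_prompts(generate, original_prompt, enriched_prompt):
    """Run the same text generator on both prompts and return the paired outputs."""
    return {
        "original": generate(original_prompt),
        "enriched": generate(enriched_prompt),
    }

def echo_generator(prompt):
    # Placeholder for a real model call; it only echoes the prompt back.
    return f"[generated from: {prompt}]"

results = compare_prompts(
    echo_generator,
    "Write a story about a cat.",
    "Write a story (tale, stories) about a cat.",
)
for name, text in results.items():
    print(name, "->", text)
```

Passing the generator in as an argument keeps the comparison independent of any particular model or library.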
4a:
# Install gensim first (run in a shell, not inside Python): pip install gensim
import gensim.downloader as api

# Load 50-dimensional GloVe vectors (downloaded on first use, then cached)
model = api.load("glove-wiki-gigaword-50")

def enrich_prompt(prompt, num_similar=3):
    words = prompt.split()
    enriched_words = []
    for word in words:
        try:
            # Keep only the neighbour words and drop the similarity scores
            similar_words = [w for w, _ in model.most_similar(word, topn=num_similar)]
            enriched_words.append(word + " (" + ", ".join(similar_words) + ")")
        except KeyError:
            # Word not in the GloVe vocabulary: keep it unchanged
            enriched_words.append(word)
    return " ".join(enriched_words)

original_prompt = "Write a story about a Dog."
enriched_prompt = enrich_prompt(original_prompt)
print("Original Prompt:", original_prompt)
print("Enriched Prompt:", enriched_prompt)
4b:
import random
import gensim.downloader as api

# Load the GloVe model. api.load() downloads it on first use and reads it
# from the local cache afterwards, so no retry logic is needed.
model = api.load("glove-wiki-gigaword-50")
def enrich_prompt(prompt, num_similar=3):
    """
    Enriches a prompt by adding similar words to each word in the prompt.

    Args:
        prompt (str): The original prompt.
        num_similar (int): The number of similar words to add.

    Returns:
        str: The enriched prompt.
    """
    words = prompt.split()
    enriched_words = []
    for word in words:
        try:
            similar_words = [w for w, _ in model.most_similar(word, topn=num_similar)]
            enriched_words.append(word + " (" + ", ".join(similar_words) + ")")
        except KeyError:
            enriched_words.append(word)
    return " ".join(enriched_words)
def generate_story(prompt, length=100):
    """
    Generates a simple story based on a prompt. This is a VERY basic
    example and does not use any advanced language models. It's just
    to illustrate the difference in story content.

    Args:
        prompt (str): The prompt to base the story on.
        length (int): The approximate length of the story in words.

    Returns:
        str: A generated story.
    """
    story = ""
    words = prompt.split()
    possible_next_words = words[:]  # start with the words from the prompt
    current_word = random.choice(words)
    story += current_word + " "
    for _ in range(length - 1):
        next_word = random.choice(possible_next_words)
        story += next_word + " "
        possible_next_words.append(next_word)  # make the chosen word more likely to recur
        # Add some simple logic to make the story slightly more coherent.
        if next_word in ["a", "an", "the"]:
            possible_next_words.extend(words)  # boost words from the original prompt
        # Tokens keep their punctuation (e.g. "cat."), so test the last character.
        if next_word.endswith((".", "?", "!")):
            possible_next_words.extend(words)  # restart with key words from the prompt
        # Add a random filler word to the pool
        meaningful_words = ["happily", "suddenly", "quietly", "jumped", "ran", "slept", "ate",
                            "thought", "dreamed"]
        possible_next_words.append(random.choice(meaningful_words))
    return story.strip() + "."
# Example Usage:
original_prompt = "Write a story about a cat."
enriched_prompt = enrich_prompt(original_prompt)
print("Original Prompt:", original_prompt)
print("Enriched Prompt:", enriched_prompt)

# Stories are random, so the output differs from run to run
original_story = generate_story(original_prompt)
enriched_story = generate_story(enriched_prompt)
print("\nOriginal Story:\n", original_story)
print("\nEnriched Story:\n", enriched_story)

# Compare the results
print("\nStory Lengths (words):")
print("Original Story:", len(original_story.split()))
print("Enriched Story:", len(enriched_story.split()))
# These measure the prompts themselves (in characters), not the responses
print("\nOriginal Prompt Length (characters):", len(original_prompt))
print("Enriched Prompt Length (characters):", len(enriched_prompt))
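Length alone says little about detail or relevance, which is what the program statement asks to compare. The sketch below uses three rough proxies: word count and distinct-word ratio for detail, and the fraction of topic keywords mentioned for relevance. These metrics and the name response_metrics are assumptions chosen for illustration, and the two sample responses are made-up strings rather than model output.

```python
import string

def response_metrics(response, keywords):
    """Rough proxies: word count, distinct-word ratio, keyword coverage."""
    words = [w.strip(string.punctuation).lower() for w in response.split()]
    words = [w for w in words if w]  # drop tokens that were punctuation only
    unique = set(words)
    wanted = {k.lower() for k in keywords}
    return {
        "word_count": len(words),
        "distinct_ratio": len(unique) / len(words) if words else 0.0,
        "keyword_coverage": len(wanted & unique) / len(wanted) if wanted else 0.0,
    }

# Made-up sample responses, used only to exercise the function.
original_response = "The cat sat. The cat slept."
enriched_response = "A curious kitten chased a dog, then the cat slept in a story book."
keywords = ["cat", "kitten", "story"]

for label, text in [("original", original_response), ("enriched", enriched_response)]:
    print(label, response_metrics(text, keywords))
```

A natural choice for the keyword list is the topic word plus the similar words retrieved from the embeddings, so that coverage reflects how much of the enrichment the response actually used.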