0% found this document useful (0 votes)
31 views18 pages

Section 1 NVIDIA Prompt Course

This document serves as an introduction to prompt engineering with large language models, detailing various techniques and tools provided by LangChain. It covers topics such as using NVIDIA Inference Microservices, interacting with the OpenAI API, streaming and batching model responses, and the importance of iterative prompt development. Additionally, it discusses creating reusable prompt templates to enhance functionality and efficiency when working with language models.

Uploaded by

Parth Kokil
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
31 views18 pages

Section 1 NVIDIA Prompt Course

This document serves as an introduction to prompt engineering with large language models, detailing various techniques and tools provided by LangChain. It covers topics such as using NVIDIA Inference Microservices, interacting with the OpenAI API, streaming and batching model responses, and the importance of iterative prompt development. Additionally, it discusses creating reusable prompt templates to enhance functionality and efficiency when working with language models.

Uploaded by

Parth Kokil
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 18

Section 1: Introduction to Prompting¶

This section is the beginning of grand endeavor into working programmatically


with large language models by way of prompt engineering. In this section you'll
start with the most basic of considerations, like what kinds of models are we
going to work with today, and how can we send prompts to them and view their
responses. You'll gradually build towards more complex prompts, and begin
learning about the vast surface area of helpful tooling for working with large
language models provided to us by LangChain. But you may be surprised, even
with the introductory techniques covered in this section, by the end you'll be
performing exercises that demonstrate the capacity to perform a significant
amount of working using large language models.

Section Table of Contents


1. NVIDIA NIM for Prompt Engineering: In this notebook we introduce
NVIDIA NIMs and how we are going to be using them in today's course
environment.
2. Hello World with the OpenAI Library: In this notebook, you will learn
how to interact with the OpenAI API to generate text completions using the
Llama 3.1 8b model.
3. Hello World with LangChain: In this notebook, we will learn how to
interact with LangChain to generate chat completions using the Llama 3.1
8b model.
4. Streaming and Batching: In this notebook you'll learn how to stream
model responses and handle multiple chat completion requests in batches.
5. Iterative Prompt Development: In this notebook, you will learn the
importance of iterating on prompts to achieve the desired responses from
an LLM and explore how to write prompts that are specific.
6. Prompt Templates: In this notebook you'll learn how to capture reusable
LLM functionality in prompt templates, and begin working with the
powerful prompt template tools provided by LangChain.

Objective
By the time you complete this notebook you will be able to:
 Know how we are going to utilize NVIDIA Inference Microservices to
conduct prompt engineering
 Introduce benefits of running a locally-hosted large language model as
opposed to API-hosted LLM
Hello World with OpenAI Library
Objectives
By the time you complete this notebook you will:
 Understand how to set up and use the OpenAI library.
 Generate text completions using the Llama 3.1 8b instruct model.
 Learn to interpret and utilize the API response.
 Understand the importance of using chat completion endpoints with chat
models like Llama 3.1 8b instruct.

CODE:
from openai import OpenAI
base_url = 'http://llama:8000/v1'
api_key = 'an_arbitrary_string'
client = OpenAI(base_url= base_url, api_key=api_key)
available_models = client.models.list()
available_models
available_models.data[0].id
model = 'meta/llama-3.1-8b-instruct'
prompt = 'Tell me a short fun fact about space.'
prompt = 'What is the OpenAI API?'
response = client.chat.completions.create(
model=model,
messages=[{'role': 'user', 'content': prompt}]
)
model_response = response.choices[0].message.content
print(model_response)

Hello World With LangChain


By the time you complete this notebook, you will:
 Have an introductory understanding of LangChain.
 Generate simple chat completions using LangChain.
 Compare the differences between using LangChain and
the OpenAI library for chat completion.

from langchain_nvidia_ai_endpoints import ChatNVIDIA


base_url = 'http://llama:8000/v1'
model = 'meta/llama-3.1-8b-instruct'
llm = ChatNVIDIA(base_url=base_url, model=model,
temperature=0)
# You may have noticed we set a value called temperature to 0. temperature. which is a floating
point value between 0 and 1 is a way to control the randomness of a model's responses. When set
to 0, the LLM will always generate the text that it considers as having the highest probability of
coming next. When set to higher values, it can generate text that is not necessarily what it
considers to be the highest probability of coming next therefore introducing randomness and a
sense of creativity in its generations.

We won't discuss modifying temperature to higher values in great detail, but remember,
set it to 0 if you want deterministic responses, and set it higher if you want less
deterministic (i.e. more creative) responses.

prompt = 'Who are you?'


result = llm.invoke(prompt)
print(result)
output: content='I\'m an artificial intelligence model known as Llama.
Llama stands for "Large Language Model Meta AI."'
response_metadata={'role': 'assistant', 'content': 'I\'m an artificial
intelligence model known as Llama. Llama stands for "Large Language
Model Meta AI."', 'token_usage': {'prompt_tokens': 16, 'total_tokens': 39,
'completion_tokens': 23}, 'finish_reason': 'stop', 'model_name':
'meta/llama-3.1-8b-instruct'} id='run-947ead1d-8933-4120-97d3-
32def5809c46-0' role='assistant'
The result is similar to what we obtained using the OpenAI client, but it
also includes metadata about the conversation and token usage. This will
be useful for maintaining conversation context in more advanced
applications.
print(result.content)
output: I'm an artificial intelligence model known as Llama. Llama stands
for "Large Language Model Meta AI."
#By completing this notebook, you should now have a basic
understanding of how to use LangChain to generate chat completions and
parse out the model response, which we hope you'll agree, is quite
straight forward.
In the next notebook, you'll go a little further into using chat completions
with LangChain by learning how to stream model responses and handle
multiple chat completion requests in batches.

Note14: Streaming and Batching

Objectives
By the time you complete this notebook, you will:
 Learn to stream model responses.
 Learn to batch model responses.
 Compare the performance of batch processing to single prompt chat
completion.

From langchain_nvidia_ai_endpoints import ChatNVIDIA


#Create a module instance
base_url = 'http://llama:8000/v1'
model = 'meta/llama-3.1-8b-instruct'
llm = ChatNVIDIA(base_url=base_url, model=model, temperature=0)
prompt = 'Where and when was NVIDIA founded?'
result = llm.invoke(prompt)
print(result.content)
Streaming Responses
As an alternative to the invoke method, you can use the stream method to
receive the model response in chunks. This way, you don't have to wait for
the entire response to be generated, and you can see the output as it is
being produced. Especially for long responses, or in user-facing
applications, streaming output can result in a much better user
experience.
prompt = 'Explain who you are in roughly 500 words.'
#Given this prompt, let's see how the stream function works.
for chunk in llm.stream(prompt):
# The stream method in LangChain serves as a foundational tool and
shows the response as it is being generated. This can make the
interaction with the LLMs feel more responsive and improve the user
experience.
print(chunk.content, end='')

Batching Responses
You can also use batch to call the prompts on a list of inputs. Calling batch
will return a list of responses in the same order as they were passed in .

Batch method is designed to process multiple prompts concurrently,


effectively running the responses in parallel as much as possible. This
allows for more efficient handling of multiple requests, reducing the
overall time needed to generate responses for a list of prompts. By
batching requests, you can leverage the computational power of the
language model to handle multiple inputs simultaneously, improving
performance and throughput.

state_capital_questions = [
'What is the capital of California?',
'What is the capital of Texas?',
'What is the capital of New York?',
'What is the capital of Florida?',
'What is the capital of Illinois?',
'What is the capital of Ohio?'
]
capitals = llm.batch(state_capital_questions)
len(capitals)
for capital in capitals:
print(capital.content)

OUTPUT: The capital of California is Sacramento.


The capital of Texas is Austin.
The capital of New York is Albany.
The capital of Florida is Tallahassee.
The capital of Illinois is Springfield.
The capital of Ohio is Columbus.
Exercise: Batch Process to Create an FAQ Document
For this exercise you'll use batch processing to respond to a variety of
LLM-related questions in service of creating an FAQ document (in this
notebook setting the document will just be something we print to screen).
Here is a list of LLM-related questions.
faq_questions = [
'What is a Large Language Model (LLM)?',
'How do LLMs work?',
'What are some common applications of LLMs?',
'What is fine-tuning in the context of LLMs?',
'How do LLMs handle context?',
'What are some limitations of LLMs?',
'How do LLMs generate text?',
'What is the importance of prompt engineering in LLMs?',
'How can LLMs be used in chatbots?',
'What are some ethical considerations when using LLMs?'
]
# your job is to populate faq_answers below with a list of responses to each of
the questions. Use the batch method to make this very easy.

Upon successful completion, you should be able to print the return value
of calling the following create_faq_document with faq_questions and
faq_answers and get an FAQ document for all of the LLM-related questions
above

faq_answers = llm.batch(faq_questions)
def create_faq_document(faq_questions, faq_answers):
faq_document = ''
for question, response in zip(faq_questions, faq_answers):
faq_document += f'{question.upper()}\n\n'
faq_document += f'{response.content}\n\n'
faq_document += '-'*30 + '\n\n'

return faq_document
print(create_faq_document(faq_questions, faq_answers))

Summary
In this notebook you learned how to stream and batch model responses,
and used batched LLM calls to generate a helpful FAQ document.
In the next notebook you'll begin focusing more heavily on the creation of
prompts themselves with an emphasis on iterative prompt development
and engineering prompts that are very specific.

Note15: Iterative Prompt Development

Introduction to Prompt Iteration


Prompt iteration involves refining and modifying prompts to
achieve more accurate and relevant responses from the language
model. The goal is to make prompts that are as specific and clear
as possible to guide the model towards the desired outcome.

Objectives
By the time you complete this notebook you will:
 Get comfortable with a process of iterative prompt development.
 Understand the importance of prompt specificity.
 Learn how to work properly with multi-line string prompts.

from langchain_nvidia_ai_endpoints import ChatNVIDIA


#we will use the following helper function to print streaming responses from the LLM.

def sprint(stream):
for chunk in stream:

print(chunk.content, end='')

base_url = 'http://llama:8000/v1'

model = 'meta/llama-3.1-8b-instruct'

llm = ChatNVIDIA(base_url=base_url, model=model, temperature=0)

## Prompt Iteration Example: Learning to Bake a Cake

For the sake of exploring the iterative process, however, we'll start with a very general
prompt.

prompt = 'Tell me about cakes.'

sprint(llm.stream(prompt))

we could have given them this simple statement and gotten a reply that we needed, but
when prompting LLMs we should always aim to be specific.

prompt = 'Tell me about baking cakes.'

sprint(llm.stream(prompt))

Let's try again, this time being even more specific:


prompt = 'How do I bake a cake?'
sprint(llm.stream(prompt))
This is a major improvement, perhaps even sufficient, but given the details of
what we really want, we can be even more specific.
prompt = '''\
I want to bake a cake but have never done it. \
I need step by step instructions for what to buy, how to bake the cake, how to
decorate it, and how to serve and store it. \
I need estimated times for every step. I just want a list I can follow from
beginning to end.'''
sprint(llm.stream(prompt))

Lengthy Prompts
Of note for the last prompt is that it was significantly longer than our previous prompts. In
general, you should not be shy about writing lengthy prompts, which ultimately result in
better opportunities to be highly specific.
Exercise: Practice Writing Specific Prompts
For this exercise, you'll attempt a toy problem that will force you to work on your prompt
specificity, and likely manage multi-line prompts.

Your goal is to write a prompt that will get the model to respond with the following exact
text.

target_address = """\
Some Company
12345 NW Green Meadow Drive
Portland, OR 97203"""
You should store the content of the model's response in a variable called
llm_address, which we'll define for now as an empty string.

llm_address = ''
When you've successfully completed the exercise, the following comparison
should return True.

llm_address == target_address

### Your Work Here


prompt = """\
Write the following target address, exactly like I pass it to you. \
Don't add any additional text or comment or helpful dialogue, just the address:

Some Company
12345 NW Green Meadow Drive
Portland, OR 97203
"""
llm_address = llm.invoke(prompt).content
print(llm_address)
llm_address == target_address
True
Prompt Injection
While we're discussing prompt specificity, we'd like to take a moment to discuss
an exploit you will need to look out for that leverages prompt specificity to ill
effect. The exploit is called prompt injection.
Summary
By completing this notebook, you've begun to internalize the process of iterative
prompt development and recognize the importance of specificity in prompt
engineering.
When writing prompts to be used in applications (and not just as one-off prompts
in a converstaion with an LLM), we typically, after crafting a prompt that appears
to work well for a given use case, want to generalize the prompt into a template
that can be reused with a variety distinct inputs. In the next notebook we'll
discuss capturing LLM functionality in prompt templates, and introduce working
with prompt templates in LangChain.

NOTE B. 16: # Prompt Templates


Objectives
By the time you complete this notebook you will:
 Appreciate the need and ability to capture LLM-related tasks in
prompt templates.
 Be able to create reusable prompt templates with LangChain.
 Use prompt templates to perform a variety of LLM-powered tasks on
a collection of provided text samples.

from langchain_nvidia_ai_endpoints import ChatNVIDIA


from langchain_core.prompts import ChatPromptTemplate

base_url = 'http://llama:8000/v1'
model = 'meta/llama-3.1-8b-instruct'
llm = ChatNVIDIA(base_url=base_url, model=model, temperature=0)

Prompt Templates As Reusable Functionality


Prompting is not so different. If you have a one off task, you just write a
prompt for it:
one_off_prompt = "Translate the following from English to Spanish: 'Today
is a good day.'"
print(llm.invoke(one_off_prompt).content)

If, however, you'd like to create reusable functionality, you might abstract
part of the prompt away into arguments so that you're left with something
you could reuse with arbitrary inputs, like the following:

def translate_from_english_to_spanish(english_statement):
return f"Translate the following from English to Spanish. Provide just the
translated text: {english_statement}"
english_statements = [
'Today is a good day.',
'Tomorrow will be even better.',
'Next week, who can say.'
]
prompts = [translate_from_english_to_spanish(english_statement) for
english_statement in english_statements]
prompts
translations = llm.batch(prompts)
for translation in translations:
print(translation.content)

Our translate_from_english_to_spanish function therefore creates a


prompt template that capture the functionality of translating an English
statement to Spanish.
Of course we could abstract even more out of our prompt and create an
even more general template, if we wish:
def translate(from_language, to_language, statement):
return f"Translate the following from {from_language} to
{to_language}. Provide only the translated text: {statement}"
print(llm.invoke(translate('English', 'French', 'Computers have many
languages of their own')).content)
LangChain's ChatPromptTemplate.from_template

Section 2: LangChain Expression


Language (LCEL), Runnables, and
Chains
 LangChain Expression Language and Chains: In this notebook you
will learn about LangChain runnables, and the ability to compose them
into chains using LangChain Expression Language (LCEL).
 Runnable Functions: In this notebook you will learn how to convert
custom functions into runnables that can be included in LangChain chains.
 Combining Chains: In this notebook you'll learn how to compose multiple
LLM-related chains.
 Parallel Chains: In this notebook you'll learn how to create and use
parallel chains.

1. LangChain Expression Language and Chains

Objectives
By the time you complete this notebook you will:
 Understand LangChain runnables as units of work in LangChain.
 Intentionally use LLM instances and prompt templates as runnables.
 Create and use runnable output parsers.
 Compose runnables into LangChain chains using LCEL pipe syntax.

LangChain Expression Language (LCEL)


LCEL is a declaritive way to compose runnables into chains: reusable
compositions of functionality. We chain runnables together through LCEL's pipe |
operator, which at a high level, will pipe the output of one runnable to the next.
For those of you who have worked with the Unix command line, you'll be familar
with the | operator as a way to chain together the functionality of various
programs in service of a larger goal.
If you don't know any Bash, don't worry too much about the following cell, but for
those of you who do, you'll see we create a chain via the pipe operator to print
"hello pipes" with echo, reverse the string with rev and then uppercase the
reversed string with tr.

%%bash
echo hello pipes | rev | tr 'a-z' 'A-Z'

Output Parsers
In LangChain, Output Parsers are used to convert the raw text output from
a language model (like ChatGPT) into a structured format that your app or
code can easily work with — like Python dictionaries, JSON, or numbers.

Let's begin with perhaps the most straightforward output parser,


StrOutputParser, which is going to save us all the repetitive boilerplate of fishing
the content field out of our model responses.

from langchain_core.output_parsers import StrOutputParser


parser = StrOutputParser()
parser.batch(['parse this string', 'and this string too'])

Additionally, and most importantly, we would also expect to be able to use


parser in a chain. Let's recreate the chain from earlier but extend it by piping the
model ouput into the output parser.

chain = template | llm | parser


print(chain.get_graph().draw_ascii())

And now let's invoke the chain, passing in the expected arguments.

chain.invoke({"question": "Who invented the use of the pipe symbol in Unix


systems?"})

Exercise: Translation Revisited


Create a chain that is able to translate a given statement, source language, and
target languages you specify.

translate_template = ChatPromptTemplate.from_template("""Translate the


following statement from {from_language} to {to_language}. \
Provide only the translated text: {statement}""")
translation_chain = translate_template | llm | parser
print(translation_chain.get_graph().draw_ascii())
translation_chain.input_schema.schema()
translation_chain.invoke({
"from_language": "English",
"to_language": "German",
"statement": "No matter who you are it's fun to learn new things."
})

In this notebook you learned how to work with runnables, and in particular, 3 of
the core LangChain runnables: LLM instances, prompt templates, and output
parsers.
In the next notebook we will continue our focus on creating and composing
runnables by introducing the ability to create custom runnables.

Runnable Functions
Objectives
By the time you complete this notebook you will:
 Understand how to create custom runnable functions and include them in
your LangChain chains.
 Use custom runnable functions to preprocess data before sending it to an
LLM.
 Use custom functions to batch translate raw text into prompt templates.
 Create a LangChain sentiment analysis chain utilizing multiple custom
runnable functions.

Using RunnableLambda to Create Custom Runnable Functions


What is RunnableLambda in LangChain?
RunnableLambda lets you turn any normal Python function into a
LangChain runnable.

💡 Why is this useful?


LangChain's pipelines are made of runnables — like LLMs, prompts, parsers. But
what if you want to add your own logic, like a calculation or a filter?
👉 You can wrap your own function with RunnableLambda and plug it into the
chain.
from langchain_core.runnables import RunnableLambd
def double(x):
return 2*x
#It should come as no surprise that this simple Python function does not have a
LangChain runnable's invoke (or batch or stream) method.
try:
double.invoke(2)
except AttributeError:
print('`double` is a Python function and does not have an `invoke` method.')
#However, we can easily convert it into a LangChain runnable by passing it into
RunnableLambda.
runnable_double = RunnableLambda(double)
runnable_double.invoke(6)
runnable_double.batch([2,4,6,8])
#[4, 8, 12, 16]
Like other runnables, custom function runnables like runnable_double can be
composed into chains.
multiply_by_eight = runnable_double | runnable_double | runnable_double
multiply_by_eight.invoke(11)

Data Management
Whether for formatting, correction, or validation, you may wish to perform some
work on data passing through your chains either before or after interacting with
an LLM.
As an example, suppose you are building a sentiment analysis application where
user reviews are analyzed for their sentiment. User reviews can contain various
inconsistencies like mixed capitalization, extra whitespace, and contractions.
Normalizing this text before sending it to the LLM can improve the accuracy of
the sentiment analysis.
The following normalize_text function will normalize text by converting it to
lowercase, expanding contractions, and removing extra whitespace.
import re
import contractions # pip install contractions

def normalize_text(text):
# Convert text to lowercase
text = text.lower()

# Expand contractions
text = contractions.fix(text)

# Remove extra whitespace


text = re.sub(r'\s+', ' ', text).strip()
Exercise: Create Runnable Function to Normalize Text
Use what you've learned so far about creating runnable functions to create one
out of the normalize_text function provided above.
Upon successful implementation, you should be able to use it to batch process
the following toy list of reviews.
Feel free to check out the Solution below if you get stuck.
reviews = [
"I LOVE this product! It's absolutely amazing. ",
"Not bad, but could be better. I've seen worse.",
"Terrible experience... I'm never buying again!!",
"Pretty good, isn't it? Will buy again!",
"Excellent value for the money!!! Highly recommend."
]

RunnableLambda(normalize_text).batch(reviews)

Formatting Text for Prompt Templates


normalized_reviews = [
'i love this product! it is absolutely amazing.',
'not bad, but could be better. i have seen worse.',
'terrible experience... i am never buying again!!',
'pretty good, is not it? will buy again!',
'excellent value for the money!!! highly recommend.'
]
Let us assume now that we would like to pipe these normalized reviews in to a
prompt template for sentiment analysis like the following sentiment_template.
sentiment_template = ChatPromptTemplate.from_template("""In a single word,
either 'positive' or 'negative', \
provide the overall sentiment of the following piece of text: {text}""")

We know from the previous notebook that to invoke the above template, we
need to pass in a dictionary that contains keys for its placeholders ({text} in the
above template), for example:
sentiment_template.invoke({"text": 'i love this product! it is absolutely
amazing.'})
#Therefore, in order to prepare the items in normalized_review for being piped
into sentiment_template, we need to convert each line of text into a dictionary
with the key "text" and the value the actual line of text.
Let's create a runnable lambda to accomplish this. For this function we'll use an
actual lambda function since the work we need to do is so minimal and define
the runnable lambda straightaway.
prep_for_sentiment_template = RunnableLambda(lambda text: {"text": text})
#We can now use prep_for_sentiment_template to prep normalized_reviews for
sentiment_template.
prep_for_sentiment_template.batch(normalized_reviews)
[{'text': 'i love this product! it is absolutely amazing.'},
{'text': 'not bad, but could be better. i have seen worse.'},
{'text': 'terrible experience... i am never buying again!!'},
{'text': 'pretty good, is not it? will buy again!'},
{'text': 'excellent value for the money!!! highly recommend.'}]

Exercise: Create a Sentiment Analysis Chain


For this exercise, create a sentiment analysis chain that you can pass the original
reviews list above into as a batch.
Your chain should:
 Normalize the raw reviews.
 Prepare the normalized reviews for use in sentiment_template (defined
above).
 Pipe the prepared normalized reviews through the sentiment_template.
 Pipe the prompt templates to llm (already defined above).
 Conclude by parsing the LLM outputs with an instance of StrOutputParser,
which you will need to instantiate.

parser = StrOutputParser()
sentiment_chain = RunnableLambda(normalize_text) |
prep_for_sentiment_template | sentiment_template | llm | parser
sentiment_chain.batch(reviews)

output: ['Positive', 'Neutral', 'Negative', 'Positive', 'Positive']

Combining Chains¶

You might also like