Generative AI Notes
So, in this section, we are going to learn about generative AI and Amazon Bedrock. So, actually
nowadays, when you hear about AI, most people mean generative AI. That's because when
ChatGPT came out, everyone was thinking about AI and how to interact with it with prompts in the
chatbot. But AI is so much more. Still, in this section, we're gonna talk about generative AI, and
Amazon Bedrock is the main service on AWS that does generative AI. This is actually one of the main topics of the exam and also one of the fastest-growing AWS services. So I hope you're excited, and I will see you in this section.
So let's talk about Generative AI, just before we go into Amazon Bedrock, which is a service for
Gen AI on AWS. But I first wanna take a step back, and just understand what is Gen AI? So
Generative AI or Gen AI is going to be a subset of deep learning, which is a subset of machine
learning, which is a subset of AI. So Gen AI, as the name indicates, is used to generate new data
that is going to be similar to the data it was trained on. So what type of data can we train a Gen AI
with? Well, we can train it on text, on images, on audio, on code and video, and a lot more,
whatever you can think of, really. So the generative AI model is going to take a lot of training data. Here's an example where we are going to give it a lot of dog images. So we have our training data set, which is going to be dogs, but we are also going to feed it cartoons. So we're gonna have a bunch of different cartoons that are going to be hand drawn. And the generative model is going to see so much of this training data that it is going to understand what is a dog, what is a cartoon, and
then if we ask it, "Can you generate a cartoon dog?" It's going to be smart enough to combine the
two together and create a dog that looks like a cartoon. And that is the whole power of Generative
AI, is that it's able to combine its knowledge into new and unique ways. So, we're going to start with
a lot of unlabeled data, and we'll see what that means in a later section. And we're going to train what's called a foundation model. Foundation models are very broad, very big, very wide, and they can adapt to different kinds of general tasks. For example, a good foundation model can
generate some text, can summarize some texts, can extract information, can generate images, can
become a chatbot, and can answer many types of questions you have. So as a whole, we feed a lot of data into a foundation model, which then has the ability to perform a lot of different tasks. So now let's
talk about foundation models. So in order to generate data, as we said, we need to have a
foundation model, and they're trained on a variety of inputs. But to give you a sense of how big these models are: to train a good foundation model, it may cost tens of millions of dollars, because it is very computationally heavy. It takes a lot of time to train, and a lot of data. So only a
few big companies, usually, are creating their own foundation model. So here is an example. So we
have GPT-4o, that is the name of the foundation model behind ChatGPT, which is the application
where you can chat with an AI. But there is a wide selection of foundation models from different
companies. So we have OpenAI, this is the company behind ChatGPT, and GPT-4o for example.
We have Meta, the company behind Facebook. We have Amazon, Google, and Anthropic, and of course a lot more. But these companies are pretty big, or they have a lot of money to invest into building these foundation models. So some of these models are going to be open source, meaning they're going to be free to use. For example, Meta is working a lot on open source
models. We have Google BERT as well, which was one of the first models in the Gen AI space,
which also is open source. But some are under a commercial license: for example, with OpenAI you have to pay to use GPT at a certain level, and the same goes for Anthropic, and so on. And so we're going to see how we
can access these models on AWS as well. So next, after the foundation model, we have the Large
Language Models, or LLMs. So LLMs are a type of AI that relies, again, on a foundation model, but is designed to generate coherent, human-like text. You've probably been exposed to LLMs a lot before. One of them is ChatGPT, which is built on the GPT-4 family of foundation models, LLMs from the company OpenAI. So I
went to ChatGPT and I asked it, "Are you an LLM?" And it replied, "Yes, I am a Large Language
Model developed by OpenAI, and I can understand and generate human-like text based on the
input I receive." So this was text that looks like a human wrote it, that was answered by ChatGPT
when I asked it, "Are you an LLM?" So the way LLM works is that they're trained on very large
amount of text data. So they're usually very, very, very big models, very heavy, very computational
heavy to use. We're talking about billions of parameters. They're trained on a lot of books, articles,
websites, data, or any other type of text data that is deemed good enough for the LLM training. So it
can perform any wide range of language related tasks such as translation, summarization, question
answering, content creation, and so on. So how does it work to use an LLM? So for this, we give it a
prompt. So the prompt is a question, a bunch of texts that you're going to send to the Gen AI model,
the LLM, for example, "What is AWS?" That's the prompt. And we'll have a whole section in this
course dedicated to understanding how to create a good prompt. Then the model is going to leverage all the
existing content it has internally learned from, and then it's going to look at the prompt and answer it. And so when I asked ChatGPT, "What is AWS?" I got the answer, "AWS is a comprehensive cloud computing platform provided by Amazon." And you can read the rest, it's a long answer. But
you have to know something, and it's a term you have to learn, which is that the output, the
generated text, is non-deterministic. That means that for every user that is using the same prompt,
you will not necessarily, and usually will not, get the same answer. So I went a second time, opened a new chat window, and asked, "What is AWS?" And if you take some time to read this answer, again from ChatGPT, you will see that while the answers are similar, and they explain pretty much the same thing, they are not the exact same answers. And so this is why it's non-deterministic. So let's understand why it is non-deterministic. Let's take a sentence that is going to be completed by an LLM, and the sentence is, "After the rain, the streets were." And what's going to
happen is that the LLM is going to generate a list of potential words with probabilities. So what is the
next word that is most likely to appear in this sentence? So the generative model is going to think, and it's gonna say, okay, maybe it's wet, and there's a 0.4 out of 1 chance that it's going to be wet. Or flooded, 0.25. Slippery, 0.15. Empty, muddy, clean, blocked, and so on. So all these words make sense, but they have probabilities, which means that some of these words are more likely to be the next word in that sentence. And an algorithm is, of course, going to compute these
probabilities, and another one is going to select a word from that list based on the probabilities. And
for example, we're going to choose the word, flooded. So it's going to be, "After the rain, the streets
were flooded." And this is all done by the Gen AI model. So now we have, "After the rain, the streets
were flooded," and the same process happens over and over again. So what is the next word?
Well, it could be and, with, but, from, until, because, and even a dot. So "After the rain, the streets
were flooded dot," and that's the end of the sentence. So all of these things, again, have associated
probabilities. And then the next word is going to be selected based on these probabilities. So this is
why, when you ask the AI the same prompt twice, you may not get the same answer: it is because the sentence is built using statistical sampling, not deterministic rules.
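To make that idea concrete, here is a minimal sketch in Python (with made-up probabilities, not from any real model) of how a next word could be sampled from such a probability list:

```python
import random

# Hypothetical probabilities for the next word after "After the rain, the streets were"
next_word_probs = {
    "wet": 0.40,
    "flooded": 0.25,
    "slippery": 0.15,
    "empty": 0.10,
    "muddy": 0.10,
}

# Sample one word according to its probability (this is why the output is non-deterministic)
words = list(next_word_probs.keys())
weights = list(next_word_probs.values())
next_word = random.choices(words, weights=weights, k=1)[0]

print("After the rain, the streets were", next_word)
```

Run it several times and you will get different words, which mirrors why two identical prompts can produce different answers.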
So that's how it works for LLMs, but let's talk about images as well. Gen AI for images works in a way that you can give it a prompt, for example, generate a blue sky with white clouds, and the
word "Hello" written in the sky. And the Gen AI model is going to actually give you that image that of
course we generated for this course. We can also have images generated from images. So here we
give an image of someone playing piano, and we're saying, "Transform this image in a Japanese
anime style." And the outcome image is going to be something similar, but now it looks like it comes
out of a manga. And then we can also generate text from images. So we give it a prompt and say,
"How many apples do you see in the picture?" And we give it a picture with one orange and an
apple. And then the Gen AI is going to look at the image and say, "Well, the picture shows one
apple and the other fruit is an orange." So just to tell you how that works for Gen AI for images, so
you get an idea of how something can generate an image, there's different methods of course, but
one of the popular ones nowadays is the diffusion model, used for example by Stability AI with its Stable Diffusion model. So let's take a picture, and this is a picture of a
cat. And we're going to do what's called a forward diffusion process. That means that we're going to
add some noise to the image over time. So this is with a little bit of noise, but it's the same image
with a bit of noise. And then we add more noise, we can barely recognize the cat, and then we add
more noise, and it looks like the cat is entirely gone, and all we get is noise. And so we do this for a
lot of pictures, and this is called the forward diffusion process. And once the algorithm is trained to take images and create noise out of them, we do the opposite. So when we want to generate an image,
we're going to start with noise, and we're also going to give it a prompt and say, "We want a cat with
a computer," and it's called reverse diffusion. So now that the algorithm has seen how to go from an
image to noise, it will go from noise to image. So we give it some noise, randomly, and then it says,
"Okay, I'm going to de-noise it," and it's going to start to look like a cat. And then de-noise it again,
and then de-noise it again, and then we have the cat with a computer. So imagine this is a new image generated by the AI, and not the exact same one it was trained on, of course.
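Just to make the forward diffusion idea a bit more tangible, here is a tiny sketch (a simplification with NumPy, not how Stable Diffusion is actually implemented) of mixing a little noise into an image at each step:

```python
import numpy as np

def forward_diffusion_step(image, beta=0.05, rng=np.random.default_rng()):
    """Mix a bit of random noise into the image (one forward diffusion step)."""
    noise = rng.normal(0.0, 1.0, size=image.shape)
    return np.sqrt(1.0 - beta) * image + np.sqrt(beta) * noise

# Start from a fake 8x8 grayscale "picture" and add noise over many steps
image = np.ones((8, 8))
for step in range(50):
    image = forward_diffusion_step(image)

# After enough steps, the original picture is essentially pure noise
print(image.round(2))
```

The reverse diffusion process is the learned part: a neural network is trained to undo these noising steps, guided by the text prompt.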
So this is how Gen AI works for text and for images. Just remember the concept of an LLM, and remember that
it's non-deterministic. But now you have a general overview of Gen AI, and I hope you liked it, and I
will see you in the next lecture.
So now let's talk about Amazon Bedrock. So Amazon Bedrock is the one service on AWS that we're
going to use to build generative AI applications. So it's very powerful, and it allows you through an
interface to play with many models and configure them in order to get the outcome you want. So it's
called a fully managed service. That means that for you, there are no servers to manage. You just use the service on AWS, and Amazon Web Services will make sure that the service is working for you. You're going to keep control of all the data you're going to use to customize the models, because it all happens within your account; it never leaves your account. You're going to have a pay-per-use
pricing model, but we'll go into the pricing model later on as well. There's a unified API, so that
means that to access Amazon Bedrock and the many models behind Amazon Bedrock, you only
have one way of doing it, which is standardized. You can leverage a wide array of foundation
models. We'll have a look at them in a second. And on top of it, you have advanced features such
as RAG or LLM agents, and you get security, privacy, governance, and responsible AI features as
well within Amazon Bedrock. So we're going to spend a bit of time learning about it and practicing
together. So what type of foundation models do we have access to on Amazon Bedrock? Well,
many companies have agreements with AWS to publish their models on Bedrock. So we have AI21
Labs, we have Cohere, we have Stability.ai, we have of course Amazon themselves. We have
Anthropic, Meta, Mistral AI. And of course, over time, more foundation models and more companies behind these models are going to be added into Amazon Bedrock. So the way it works is that whenever you use one of these models, Amazon Bedrock is going to make a copy of the foundation model, the FM, and that copy is going to be available only to you. And then in some cases,
you are going to be able to use your own data to fine tune the model to your needs. Again, you
need to know that none of your data is going to be sent back to one of these providers to train their foundation model. Everything happens within your account and only within your
account. So it's always helpful to get a diagram to understand a service. So I made one for Amazon
Bedrock. So we have at the center of it, foundation models, and those are the ones we've seen.
And we'll have a look in the hands-on to see how we can select them. And we're going to have an
interactive playground, which we're going to have a look at as well in the next lecture, in which we, as a user, select a model we want to use. And then we're going to start asking questions
to the model. For example, "What is the most popular dish in Italy?" And then maybe the
playground is going to respond, "Pizza and pasta." On top of it, this is just for the basic interactions,
but we can have also knowledge bases or RAG, and we'll have a full lecture dedicated to this. So
don't worry if you don't understand it right now. But the idea is that we're going to be able to provide
more relevant and more accurate responses by fetching data from external data sources that may
have the answer for us. Don't worry, we'll have a deep dive into knowledge bases as well. We can
fine tune our foundation model. That means that we're going to bring our own data and we're going
to update the foundation model in our account, again, with our data to make sure that it's more
adapted to our use case and our data. And finally, to access all of these things, there is going to be a unified API. So it's going to be the same for all models. That means that all your applications just talk in one way to Amazon Bedrock, and then Amazon Bedrock does the magic for us.
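To give you an idea of what that unified API looks like in practice, here is a minimal sketch using the AWS SDK for Python (boto3) and the Bedrock Converse API; the model ID shown is just an example, and you would swap in whichever model you have enabled:

```python
import boto3

# The Bedrock runtime client is the single entry point for invoking any foundation model
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock_runtime.converse(
    modelId="amazon.titan-text-express-v1",  # example model ID; swap for any enabled model
    messages=[
        {"role": "user", "content": [{"text": "What is the most popular dish in Italy?"}]}
    ],
)

print(response["output"]["message"]["content"][0]["text"])
```

Because the call shape stays the same, switching to a different provider's model is mostly a matter of changing the modelId.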
So that's it for this lecture. I hope you liked it, and I will see you in the next lecture for some practice on
Amazon Bedrock.
004 Amazon Bedrock - Hands On
So, now we're going to get some practice on Amazon Bedrock. And the first thing I want you to do,
if you want to follow along, is to switch the region and to go to US East, Northern Virginia, which is
us-east-1. The reason why I wanna be there for the whole course is that the availability of some AI services is restricted to certain regions for now, and so being in us-east-1 will guarantee that you will
be able to access all the services you need. Okay, so now that we're good, let's go into the search
bar and let's type Amazon Bedrock. Next, we're going to click on Get Started and we are into the
Amazon Bedrock UI. So let's close this, and we're going to explore together. So here we go. So we
are in the Overview, and the first thing I want to show you is the providers list on the left-hand side.
So we have to choose a foundation model to get started with generative AI, and we can choose the
foundation model from many different types of providers. Here, you have a list of all the providers
available to you right now, and this list may grow over time. So have a look in your own time, and if you see
more providers, don't worry, the ones that are the most important are covered in this course. So we
have AI21 Labs, we have Amazon, we have Anthropic, and so on. And so we're going to be using
some of these models. But the good thing about it is that when you click on one of the providers, you get a lot of information about what the provider is, the models they have, as well as some examples that you can use with some of these models, and we'll explore them very, very soon. And then, the types of models we have access to directly from this provider. As you can see for Amazon Titan, we have access to all the models right here. And for each model, there's what's called a model card, which explains to us what the model is about: the supported formats, some attributes, the languages that are supported, and so on. If you scroll down, you even get some information about API requests. So when you use Amazon Bedrock, right now we're going to
do everything in our browser, but when you want to implement a real application, you will have to
write some code and so therefore they give you some sample code to use these specific models.
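As a rough idea of what that sample code looks like, here is a hedged sketch with boto3 and the InvokeModel API for an Amazon Titan text model; the request body format is model-specific, so always check the model card for the exact fields:

```python
import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Titan text models expect an "inputText" field plus an optional generation config
body = json.dumps({
    "inputText": "What is Amazon Web Services?",
    "textGenerationConfig": {"maxTokenCount": 512, "temperature": 0.7},
})

response = bedrock_runtime.invoke_model(
    modelId="amazon.titan-text-express-v1",  # example model ID
    contentType="application/json",
    accept="application/json",
    body=body,
)

result = json.loads(response["body"].read())
print(result["results"][0]["outputText"])
```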
So bottom line is Amazon Bedrock makes it super easy for you to understand how to use and select
the right models. So let's get access to some of these models. And to do so, we're going to
scroll down all the way here and click on Model Access. So here on this page, you will conveniently
see all the types of models you have access to on AWS. And to be able to use them, you must first
enable the models. So as you can see, right now all these models are available to request. Some of them can sometimes be unavailable for some reason, and you may have to contact support, and so on, but we don't need access to all models. Still, if you want to make this
quick, just click on Enable All Models. So here we request access to all these models right here.
We're good to go. Let's click on Next. And then you need to enter some use case details around
your company name, your URL, and this is just some survey information, it's not very, very
important. So enter this. So here I put Stephane Maarek, and I'll put www.stephanemaarek.com.
And then the industry I operate in is going to be education. And then this is for my internal
employees. And then it's just testing Amazon Bedrock. Let's click on Next. And now let's click on
Submit. So this, enabling the models, does not cost any money, but using the models will cost
some money. So in this course, I try to keep cost at a minimum, but when you're using generative
AI and AI overall, it's going to be rarely free. And so therefore, if you want to follow along with me
and perform the hands-on, you will have to spend some money. At the beginning of the course, I gave you an idea of the overall budget you will need for this course and what I've spent, but it's good for you to make your own decision. Again, you can also just watch me do it and it's going to be fine; you don't have to do everything as I do. So sometimes access will fail, for example, these ones we cannot access because we would have to talk to Amazon support. So don't worry too much
about these error messages. As long as you have access to some of these models, you're good to
go. And to get access to the models, it can take a few minutes. So you can see some of them, for
example, from Amazon, are already granted, but others are in progress, for example, from
Anthropic. And if you wanted to read the terms and conditions of the model, you can click here and
access it. So now let me just pause this video until I get access to all my models. Okay, so it took
about a minute, but now as you can see, everything is access granted except the few models from
Meta and from Anthropic. Next, let's go into the Examples tab on the left-hand side. So now we
have access to a lot of examples that Amazon Bedrock is giving us, and they're good to explore
from a learning perspective. So you can explore a lot of them on your own if you wanted to. And you
can filter them by providers, so the ones that are for Amazon specifically. And then the modality, is
it a text type of prompt or is it an image type of prompt? So you can do some text. And for every
prompt right here, you get access to some information. So here, this is a prompt to summarize a
meeting transcript into action items. And you see what the prompt is, you see what the configuration
was, and then on the bottom, you see what the response was. So you don't have to run the
examples on your own if you don't want to, you can just look at them to understand what was the
prompt and what was the response. On top of it, it provides you code samples to implement this
into your own code if you need it to. So it's very handy. And for example, if we look at images, we
can see here, blue backpack on a table, and it generates these kind of images. So this is cool
because we can learn about AI without spending any money, but we want to actually try it out. So
let's go ahead and actually explore what's called the Text Playground. So here, we first need to select a model. I'm going to close Configuration, then I'll click on Select Model, I'll choose Amazon, and then we can choose Titan Text G1 - Express. Perfect. And Apply. So as you can see here, the throughput is on-demand. That means that we're going to pay on-demand when we use a model. And so as soon as we start using the model, we are
going to spend some money. But let's try it out. Let's actually say what is Amazon Web Services?
And we click on Run. And now we get the response directly from the model. So here we learn about
Amazon Web Services, and it's quite a detailed answer we're getting, generated progressively over time as you can see; we have a lot of information given to us by the model. So we learn about
overall AWS here, then we learn about the compute aspect right here. Then the databases right
here, the analytics, networking, the mobile, and it goes on and on and on and on. So as you can
see, this type of model, for example, gives us a very lengthy type of answer. You can also have a
look at the Chat. So the Chat here is again for you to select a model. So let's take another one.
Let's take Anthropic, and then we're gonna get Claude 3 Sonnet. And again, I'm going to say, "What
is AWS?" And we'll click on Run. And here we get the answer directly from the model. The cool
thing about this page is that we're going to get some information around the model metrics as well.
So as you can see here, the latency was seven seconds. This is the number of input tokens. So this
is how much text went into the model, so 12 input tokens exactly. And we see the output token
count, so about 300. This is the length of the answer. A token is not exactly a word, it's a bit more complicated than that, but 300 tokens is a pretty lengthy answer. And we also have some information
around the configuration. Now, this configuration, we're gonna have to look at it in greater detail
later on, so no need to look at it. But as you can see here, I was able to ask a prompt, and again, I
get an answer for AWS, which stands for Amazon Web Services. And then we get some key points
about AWS: the fact that it is Infrastructure as a Service, that it has a wide range of services (we're looking at the AI ones, but there are so many), and so on. So as you can see, different models will give you different kinds of answers, and choosing the model is going to be a big part of your work.
On top of it, we can put text, but we can also add files if we wanted to, if the model supports it. And finally, there are images. Here, it's going to cost you more money to generate an image than to generate text, so don't do it if you don't want to. We're talking about cents, maybe 4 or 8 cents, but it is still some money. So here's an example. Generate images from a prompt, and we
say, "Blue backpack on the table," and we click on Run. And we get the output of three images right
here of blue backpacks, and they were fully generated by AI, which is pretty cool. So again, the
configurations can be pretty vast. What type of orientation do we want? What size of image do we want? How many images do we want, and so on? And so this is how to tune this Image
Playground. But here we've seen the basics of Amazon Bedrock and Gen AI. So we were able to
generate some text directly using the text feature or the chat feature. We were able to generate
some images as well. We were able to have a look at all the providers and some examples associated
with them, and also we saw how to enable access to the models. So that's it for the Overview of
Amazon Bedrock. I hope you liked it, and I will see you in the next lecture.
So now let's talk about the different options we have for base foundation models. So we have to
choose based on a lot of different factors. In the end, it's going to come down to the model types,
the performance requirements you have, the capabilities of the model, the constraints you have, the
compliance you need, and so on. Also, some models may provide you different levels of
customization. Some models may be smaller, others bigger. There could be different options for inference, which is basically how you get an output from the model. You may have different licensing agreements. You may have different requirements on context windows, so how much data you can send to a foundation model, as well as latency, so how fast a model will come back to you with an
answer. You may also have different factors, such as is the model multimodal? Which means that it
can take a wide combination of types of inputs, for example, audio, text, and video together, and
also give you various types of outputs, for example, again, images, audio, video, text, but all
together at the same time. So there's no clear answer. Of course, it's up to you to test, but there is
something called Amazon Titan, and because this is an AWS certification we're studying for, I
believe Amazon Titan is going to appear at the exam. So what is Amazon Titan? Well, it is the
high-performing foundation model directly from AWS. So you have different levels of Amazon Titan,
but it can do images, text, and you also have multimodal choices, all via the same API you have on
Amazon Bedrock. As well, it can be customized with your own data, so you can fine-tune Amazon
Titan, and that can be very handy. Also, for example, to make a decision on a model, usually, the
smaller models are going to be more cost-effective, but they usually know fewer things. So it's a balancing act, really, based on what your business needs are. So let's have a look at four models to see if we can understand a little bit how our selection process would go. So we have
Amazon Titan, and we're going to compare Amazon Titan Text Express. We have Llama 2 (you may hear the name pronounced differently), which is a model from Meta. We have Claude from Anthropic, and then we have Stability AI, which created something called Stable Diffusion. So first of all, the last model is for image generation only. So if you just want images, maybe the last model is the one for you. But for the other ones, you have different kinds of capabilities. So if we look at
features, for example, well, Amazon Titan can do text, and it can do it in 100-plus languages.
Llama 2 can do large-scale tasks and dialogue, in English. And Claude can also do text generation, in multiple languages. So it comes down to testing how the model reacts to your inputs. And one thing that may be very important is the number of tokens you can have as an input to the model. So for Amazon Titan, we have 8K tokens. For Llama 2, we have 4K tokens, and for Claude, we have 200K
tokens. So that means that on Claude, you can send a lot more words into your context windows.
And so Claude will have more memory and will be able to intake a bigger input, which may be very
important. For example, you may want to send a big context window when you're dealing with a big
code base or when you have to read an entire book and ask questions about that book. Then
maybe Claude is going to be a better fit. So the use cases really depend on the models, but to be
fair, all these models start to look the same and have the same kind of capabilities. They're all
converging towards the same thing. So it's down to mainly testing. But so Amazon Titan is going to
be around content creation, classification, and education. Llama 2 is going to be around text generation and customer service. Claude could be for analysis, forecasting, and document comparison, mainly due to the fact that it accepts a higher number of tokens as input, and Stability AI
is going to be for image creation, for advertising, media, and so on. And so pricing can be a big
factor again. So here, the pricing is given for 1,000 tokens given to the model. So as you can see,
Amazon Titan Text Express is very cheap. It's much cheaper than Llama-2, and Llama-2 is much
cheaper than Claude. So again, this could be a big one because, obviously, the more expensive
models may give you better answers, but sometimes the less expensive models can still give you
good answers, but they're going to be a lot more cost-effective. And as well for Stability AI based on
the images you generate, it may cost you some money as well. So be conscious, because you can very, very quickly accumulate a lot of costs with AI.
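To make the per-token pricing concrete, here is a tiny sketch of how you could estimate the cost of a single request; the prices in it are made-up placeholders, not real Bedrock prices, so always check the current pricing page:

```python
# Hypothetical prices per 1,000 tokens (placeholders only, not real Bedrock pricing)
PRICE_PER_1K_INPUT = 0.0002   # dollars per 1,000 input tokens
PRICE_PER_1K_OUTPUT = 0.0006  # dollars per 1,000 output tokens

def estimate_request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one model invocation from its token counts."""
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT + \
           (output_tokens / 1000) * PRICE_PER_1K_OUTPUT

# Example: a prompt of 12 tokens that produced a 300-token answer
print(f"${estimate_request_cost(12, 300):.6f}")
```

Multiply that by thousands of users and requests, and you can see how the costs add up quickly.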
So I hope you liked it, and I will see you in the next lecture.
So, now let's explore foundation models in greater detail. So, let's have a look at providers in here
and we can read a lot. So, Amazon is going to be one of the cheapest foundation model providers on Amazon Bedrock, because the models are provided by Amazon themselves, but for now you will maybe get slightly lower output quality, I would say, than from other companies like Anthropic, which is a leading company in AI with their Claude 3 family of models. You get some open source models as well from
Meta and you can get more information. For example, on images, you can get stuff from Stability AI,
which is really good at creating images with something called Stable Diffusion. So, each foundation model provider has their own capabilities, and it is good for you to learn about them in your own time.
But from an exam perspective, you are not expected to know the difference between Amazon and
Anthropic and Meta and so on, but you're supposed to know about the differences in capabilities of a model. For example, whether a model can take text, whether it can take input such as files, whether it can generate images, and so on. This is more important. But again, no need to know which models can do what; that would be going overboard. Okay, so we have these providers right here and their base
models, but say we want to compare them from an AI perspective. So, let's go back into the chat
and we can select a model. We're going to have a look at Amazon Titan Text G1 Premier. We're
going to apply it. And here we're going to go into compare mode. I'm going to close this right here.
Compare mode, and then select a second model, for example, Anthropic and then Claude 3 Haiku.
Okay, and here we can see the first difference: this model does not support image upload. So, if we upload an image, only the right-hand side model will take it into account, and this one won't. So,
this is pretty important, because if you do have a use case with images, then the model on the left is
obviously not enough. But let's ask, what are the most popular AWS services? And then we click on Run. So, we give the same prompt here to both models and we get some answers. So, this
one is saying, I cannot proceed with this request, whereas here we get an answer. This is
interesting as a use case actually, but bottom line is on the model metrics, we have a look and we
can see here the latency. So, we can compare models based on latency. We can compare models
based on pricing. So, this one counts eight input tokens, while this one took 16 input tokens, and then the output token count was zero here, because we're getting no answer right now, and 340 there, because we are getting a long answer right here. So, let's do something a bit simpler. For example, what is the
capital of France? And click on Run, and we get, Paris is the capital of France and the capital of
France is Paris. So again, you can have a look also at the quality of the output. Do you prefer this
kind of wording or that kind of wording? And you can go as complicated as you want, but the idea is that you can have a look, compare the models, and get a feel for them. And again, look at the
metrics, if you needed to. Okay, so now let's go back into Amazon Bedrock, and I want to show you
how to customize a model. So, here custom model is for fine tuning. And so, with fine tuning, we
want to take one of the base models and we want to enrich it with our own data to create a custom
model. Why? Well, because maybe we want to give access to the model to data that it doesn't have
yet. For example, internal data. And so, therefore we can customize a model, and we have two
options. We can do a fine-tuning job or a continued pre-training job. So, the difference here is that continued pre-training is going to be continuous and retrain the model over time, while a fine-tuning job is going to be just a one-time thing. So, let's create a fine-tuning job just to see the options and to understand it.
We're not going to actually run it. We need to select a model. So, not all models can be fine-tuned. We have Amazon, Cohere, and Meta, and some of them are not accessible, so that's why there's a warning sign. But let's say we want to customize Titan Text G1 - Express and apply it, and then we
enter demo fine tuning. So, this is the name of the model. Let's scroll down. So, this is my fine
tuning job. You can set some settings, but they're not important for us. And most important is the
input data. So, to fine tune a model, we need to have input data, and the data has to be in what's
called Amazon S3. So, if you go in the search bar right here and type Amazon S3, and open the S3
service, here we go. The S3 console, Amazon S3, we can see right now, if you click on Buckets that
we don't have any buckets in our accounts. So, buckets are cloud directories in which we can start
to put data and the data lives on the AWS cloud. And so, what we're saying here is that the data
that is needed to customize Amazon Bedrock must be located in an Amazon S3 bucket. So, if you
wanted to, you would need to create a bucket and then put data inside of the bucket. And then here
you would reference the S3 location, being S3, and then the bucket name. And then finally, the path
to your training data. Optionally, you can have validation datasets. And then most importantly, once
we have specified the data, because this is a machine learning training job, we have what's called
hyperparameters. So, hyperparameters are configurations for your machine learning training job,
and this term will come back a few times in this course. And basically, it tells you how the algorithm
should behave. So, we have a lot of different settings here that are too difficult to understand for this level of AI exam, but basically, you can change them, such as the number of epochs, the batch size (how many samples must be processed at once before updating the model), and the learning rate (how fast the model is going to learn). And these settings imply that your model is going to behave differently based on the type of training it gets. And so, this is just some advanced
settings to customize it, but hyperparameters are very important when you are a data scientist,
because changing them can really change the quality of your outputs. We'll leave everything as a
default. And then finally, where to store the validation output. So, if you provided a validation
dataset, we can also have the output data in Amazon S3 to be able to analyze it and make sure that
it fits our use case. Next, to do this kind of customization, Amazon Bedrock will need access to
Amazon S3 to write to it. And therefore we need what's called a service role in AWS. It's what gives
the permissions to Amazon Bedrock to write to Amazon S3. But because we don't have one, we're not going to do this right now. And as you can see here, to create this fine-tuning job, we need to purchase provisioned throughput, and this is going to be very expensive. So, we don't do this, because it is billed by the hour. And so therefore, just remember that for fine-tuned models, to actually create them and also to run them, you will need to purchase provisioned throughput.
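For reference, the same kind of fine-tuning job could also be started programmatically; here is a minimal sketch with boto3, where the bucket names, role ARN, and model identifier are placeholders you would replace with your own, and the exact parameters are in the Bedrock documentation:

```python
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

response = bedrock.create_model_customization_job(
    jobName="demo-fine-tuning-job",                      # placeholder names
    customModelName="demo-fine-tuning",
    roleArn="arn:aws:iam::123456789012:role/BedrockCustomizationRole",  # placeholder service role
    baseModelIdentifier="amazon.titan-text-express-v1",  # example base model
    customizationType="FINE_TUNING",                     # or "CONTINUED_PRE_TRAINING"
    trainingDataConfig={"s3Uri": "s3://my-training-bucket/train.jsonl"},
    outputDataConfig={"s3Uri": "s3://my-training-bucket/output/"},
    hyperParameters={"epochCount": "2", "batchSize": "1", "learningRate": "0.00001"},
)
print(response["jobArn"])
```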
But here we've seen all the options to create a fine-tuned model, and that was the most important
thing. Okay, so that's it for models, for base models as well as how to customize them. I hope you
liked it, and I will see you in the next lecture.
So now let's talk about fine-tuning on Amazon Bedrock. So fine-tuning is going to be a big part of
your exam. So the idea is that you're going to adapt a copy of a foundation model and you're going
to add your own data. So when you fine-tune a model, it's actually going to change the underlying weights of the base foundation model. So you need to provide training data, and it needs to adhere to
a specific format and needs to be stored into Amazon S3. So the idea is that, for example, you have
the LLAMA 2 model and you're going to add data from Amazon S3, such as this data or that data,
and we'll have a look at this data very specifically in the next slides. And then Bedrock is going to do
its own thing and you're going to get a fine-tuned version of LLAMA 2, which has your own data as
well. So in order to use a fine-tuned custom model, you must use what's called provisioned
throughput, which is a different pricing model than on-demand. And note that not all models can be
fine-tuned, but a few can, and they're usually open source. So how can we fine-tune a model? Well, we have instruction-based fine-tuning. And here, this is to improve the performance of the pre-trained foundation model on domain-specific tasks. So what does that mean, domain-specific
tasks? This is something you'll see at the exam. That means it's going to be further trained on a
particular field or area of knowledge. And here, the trick you need to go and look for at the exam is
that instruction-based fine-tuning is going to use what's called labeled examples and they're going
to be prompt-responses pairs. So it's for labeled data and the prompt-response pairs, look at this,
the prompt is "Who is Stephane Maarek" for example, and the completion. So the response is
"Stephane Maarek is an AWS instructor who dedicates his time to make the best AWS courses so
that his students can pass all certifications with flying colors!" So here, on top of giving information
to the model, we are also showing the model how we want it to answer certain questions, such as, "Who is Stephane Maarek?" Maybe the answer the model already had would be similar, but with a
different tone. So this is where instruction-based fine-tuning is helpful. Next, we have continued
pre-training. So here the idea is that we continue the training of the foundation model. So here,
because we know foundation models have been trained using unlabeled data, we also need to provide unlabeled data for continued pre-training. And this is something to look for in the exam: if you have unlabeled data, this is the kind of fine-tuning you need. And so it's also called
domain-adaptation fine-tuning, to make a model an expert in a specific domain. For example, I'm
going to feed the entire AWS documentation to a model, and then the model is going to be an
expert on AWS. So here we're just giving all documentation, it is unlabeled data so this is continued
pre-training. And now the model has become a domain expert. So here's the kind of input you have here. As you see, there's no prompt-response pair, there's just input. And the input is just a
lot of information. So here this information is around financial data. And as you can see, if you
wanna read through it, it has a lot of acronyms. And so this is very good to teach acronyms or feed
industry-specific terminology into a model. And then you can keep on training the model, it's called
continued pre-training, as more data becomes available. Okay, next you may encounter as well
single-turn messaging and multi-turn messaging. So this is a subset of instruction-based
fine-tuning. But the idea is that, here, we're going to show, through a user and an assistant, what the user is asking and what the assistant, so the bot, should reply. So here we have system, which is optional context for the conversation; messages, which is going to contain the various messages; each will have a role, which is either the user or the assistant; and the content, which is the text content of the message. So here we're fine-tuning how a chatbot should be replying. And for multi-turn
messaging, this is the same idea, but this time we have a conversation so we have multiple turns.
And so here we alternate between user and assistant roles, and we have a conversation. And this helps the model understand how to handle conversations with a bigger context.
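To make those data formats a bit more concrete, here is a hedged sketch of what the training records could look like, written as Python dictionaries saved to a JSONL file for S3; the exact field names can vary per model, so check the Bedrock fine-tuning documentation for the format your base model expects:

```python
import json

# Instruction-based fine-tuning: labeled prompt-response pairs
instruction_records = [
    {"prompt": "Who is Stephane Maarek?",
     "completion": "Stephane Maarek is an AWS instructor who dedicates his time to making AWS courses."},
]

# Continued pre-training: unlabeled data, just raw text inputs
pretraining_records = [
    {"input": "AWS stands for Amazon Web Services. EC2 is the Elastic Compute Cloud service..."},
]

# Single-turn messaging: teach the chatbot how to reply to one user message
messaging_record = {
    "system": "You are a helpful assistant for an AWS training company.",
    "messages": [
        {"role": "user", "content": "Who is Stephane Maarek?"},
        {"role": "assistant", "content": "He is an AWS instructor."},
    ],
}

# Training data is uploaded to S3 as JSON Lines: one JSON object per line
with open("train.jsonl", "w") as f:
    for record in instruction_records:
        f.write(json.dumps(record) + "\n")
```

For multi-turn messaging, the messages list simply contains several alternating user and assistant entries instead of a single pair.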
So, a few good things to know about fine-tuning. First of all, re-training a foundation model requires a higher budget, because you need
to spend some computations on it. So instruction-based fine-tuning is usually going to be cheaper
because the computations are less intense and, usually, you have less data required. You're just
trying to fine-tune how the model is replying based on specific instructions. If you use continued pre-training, it's usually more expensive because you need to have a lot more data. Also, it requires
you to have an experienced machine learning engineer to perform the task, even though Bedrock
makes it easy for you. And you must prepare the data, you must do the fine-tuning, and also
evaluate the model. And finally, because you have a fine-tuned model, it's also more expensive
because you have to use provisioned throughput. So now let's talk about transfer learning. So
transfer learning is a bit broader than fine-tuning. It is the concept of using a pre-trained model to
adapt it to a new related task. For example, we have Claude 3 and then we're going to do transfer
learning to adapt it to a new task. So you say maybe it's very similar to fine-tuning, and it is, but for
example, for image classification, we may want to use a pre-trained model that knows how to
recognize edges in images, but we may want to do transfer learning to apply it to recognize
specifically a kind of image. Or for language processing type of models, for example BERT or GPT,
again, they know how to process the language. So now that we have the language figured out, let's
just fine-tune them or use transfer learning to adapt it to newer tasks. So transfer learning is in this
lecture because it can appear in the exam as a general machine learning concept that will be used,
for example, to, as the definition says, adapt a model to a new task. So if you don't see fine-tuning,
just know that the general answer is to use transfer learning because fine-tuning is a specific kind of
transfer learning. So the use cases of fine-tuning are, for example, to have a chatbot designed with a particular persona or tone, or geared towards a specific purpose, such as serving existing customers or crafting advertisements. It's also to train on data more up to date than what the model previously had access to, or to train it with exclusive data that only you have. For example, historical emails or
messages or records for customer service interaction. Of course, base foundation models do not
have access to this because this is your data. And for targeted use cases such as categorization or
assessing accuracy. So when it comes to fine-tuning, the exam will ask you when fine-tuning is a good idea, and the kind of fine-tuning you will need based on the type of data you have, for example
labeled or unlabeled data, as well as maybe some pricing questions. All right, that's it. I hope you
liked it and I will see you in the next lecture.
So in order to choose a model, sometimes you may want to evaluate that model and you may want
to bring some level of rigor when you evaluate that model. So you can do on Amazon Bedrock
what's called Automatic Evaluation. So this is to evaluate a model for quality control and then you're
going to give it some tasks. So you have some built-in task types such as, for example, exercises
on text summarization, question and answer, text classification, or open-ended text generation. And
so you're going to choose one of these task types, and then you need to add a prompt dataset, or
you can use one of the built-in, curated prompt datasets from AWS on Amazon Bedrock. And then
thanks to all this, scores are going to be calculated automatically. So let me show you what I mean
in a diagram so you really understand what happens. So we have benchmark questions and again,
you can bring your own benchmark questions or you can use the ones from AWS. And of course, because you've created a benchmark, you have benchmark questions as well as benchmark answers, and the benchmark answers are what would be, for you,
an ideal answer to your benchmark question. Then you have the model to evaluate and you're
going to submit all the benchmark questions into the model that must be evaluated which is going to
of course, generate some answers and these answers are generated by a GenAI model. And then
of course, we need to compare the benchmark answers to your generated answers. So we
compare these two and because we are in an automatic evaluation, then it's going to be another
model, another GenAI model, called a judge model, which is going to look at the benchmark answer and the generated answer and is going to be asked something along the lines of: can you tell if these answers are similar or not? And then it is going to give a grading score, and there are different ways
to calculate this grading score. For example, the BERTScore or the F1 or so on, but no need to
linger on that specific jargon for now. So a quick note on benchmark datasets. So they're very
helpful and a benchmark dataset is a curated collection of data designed specifically to evaluate the
performance of a language model and it can cover many different topics, or complexities, or even
linguistic phenomena. So why do you use benchmark datasets? Well, they're very helpful because
you can measure the accuracy of your model, the speed and efficiency, and the scalability of your
model because you may throw a lot of requests at it at the same time. So some benchmark
datasets are designed to allow you to quickly detect any kind of bias and potential discrimination
against a group of people that your model may make, and this is something the exam can ask you.
And so therefore, using a benchmark dataset gives you a very quick, low-administrative-effort way to
evaluate your models for potential bias. Of course, it is possible for you to also create your own
benchmark datasets that are going to be specific to your business if you need to have specific
business criteria. Of course, we can do also human evaluations. So this is the exact same idea. We
have benchmark questions and benchmark answers, but then some humans from a work team, which could be employees of your company or subject matter experts (SMEs), are going to look at the benchmark answers and the generated answers, and they're going to say, okay, this looks correct or not correct. So how can they evaluate? Well, there are different types of metrics: thumbs up or thumbs down, ranking, and so
on, and then it's going to give a grading score again. So this time there's a human part in it and you
may prefer it. You can again choose from the built-in task types or you can create a custom task
because now humans are evaluating it so you are a little more free. So there are a few metrics you
can use to evaluate the output of an FM from a generic perspective. We have the ROUGE, the
BLEU, the BERTScore, and perplexity, and I'm going to give you a high-level overview so you can understand them; that should be more than enough for the exam. So ROUGE stands for Recall-Oriented Understudy for Gisting Evaluation. So here the purpose of it, and I think that's what you need to understand from an exam perspective, is to evaluate automatic summarization and machine translation systems. So it is very dedicated to these two things, and we have different kinds of
metrics. We have ROUGE-N, and N can change between one, two, three, four usually, used to
measure the number of matching n-grams between reference and generated text. So what does
that mean? That means you have a reference text, this is what you would like the output to be of
your foundation model, and then whatever text has been generated by the foundation model. And
ROUGE is going to look at how many n-grams are matching. So if you take a one-gram, that means
how many words are matching because a one-gram is just a word. But if you take two-grams, that
means that it's a combination of two words. So if you have "the apple fell from the tree," you're going to look at "the apple," "apple fell," "fell from," "from the," and "the tree," and again, you look at how many matches there are between your reference text and your generated text. If you take a very high N, for example 10-grams, it means you have 10 words matching in exactly the same order from the reference to the generated text. But it's a very easy metric to compute and a very easy one to make
sense of. And you have a ROUGE-L which is going to compute the longest common subsequence
between the reference and the generated text. What is the longest sequence of words that is shared between the two texts? That makes a lot of sense, for example, for machine translation systems.
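Here is a minimal sketch of the ROUGE-N idea, counting overlapping n-grams between a reference and a generated text; real ROUGE implementations add stemming and precision/recall/F1 details, so treat this only as an illustration:

```python
def ngrams(text, n):
    """Split a text into lowercase n-grams (tuples of n consecutive words)."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def rouge_n_recall(reference, generated, n=1):
    """Fraction of the reference's n-grams that also appear in the generated text."""
    ref = ngrams(reference, n)
    gen = set(ngrams(generated, n))
    if not ref:
        return 0.0
    matches = sum(1 for gram in ref if gram in gen)
    return matches / len(ref)

reference = "the apple fell from the tree"
generated = "an apple fell from a tall tree"
print(rouge_n_recall(reference, generated, n=1))  # unigram overlap
print(rouge_n_recall(reference, generated, n=2))  # bigram overlap
```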
Then you have BLEU. ROUGE, by the way, is red in French and BLEU is blue in French, so these are just colors. BLEU stands for Bilingual Evaluation Understudy. So here this is to
evaluate the quality of generated text, especially for translation. So this is for translations and it
considers precision and is also going to penalize the output for being too brief. So it's going to look
at a combination of n-grams. The formula is a little bit different, but if the translation is too short, for
example, it's going to give a bad score. So it's a slightly more advanced metric and I'm not going to
show the mechanism underneath because you don't need to know it, but it's very helpful for
translations and you need to remember it. But these two things, ROUGE and BLEU, they just look
at words, combination of words, and they look at the comparison. But we have something a bit
more advanced. Now, because of AI, we have the BERTScore. So here we look for the semantic similarity between the generated text and the reference text. What does that mean? That means that you're going to compare
the actual meaning of the text and see if the meanings are very similar. So how do we do meaning?
Well, you're going to have a model, and it's going to compare the embeddings of both texts, and it can compute the cosine similarity between them.
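Cosine similarity itself is simple to compute once you have the two embedding vectors; here is a small sketch with made-up toy vectors (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1.0 means very similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional "embeddings" of two sentences
embedding_reference = [0.8, 0.1, 0.3, 0.5]
embedding_generated = [0.7, 0.2, 0.4, 0.5]
print(cosine_similarity(embedding_reference, embedding_generated))  # close to 1.0 = similar meaning
```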
So embeddings are something we'll see very, very soon; they're a way to look at a bunch of numbers that represent the text. And if these
numbers are very close between two embeddings, then that means the texts are going to be
semantically similar. And so here with the BERTScore, we're not looking at individual words. We're
looking at the context and the nuance between the text. So it's a very good one now because we
have access to AI. And perplexity is how well the model will predict the next token, so lower is
better, and that means that if a model is very confident about the next token, that means that it will
be less perplexed and therefore more accurate. So just to give you a diagram. Here we have a
generative AI model that we trained on clickstream data, cart data, purchased items, and customer
feedback and we're going to generate dynamic product descriptions. And so from this, we can use
the reference one versus the one generated to compute the ROUGE or the BLEU metric, as well as
also look at some similarity in terms of nuance with a BERTScore. And all these things can be
incorporated back into a feedback loop to make sure we can retrain the model and get better
outputs based on the quality of the scores of these metrics. On top of just having these type of
grading of a foundation model, you may have business metrics to evaluate a model on and these
are a little bit more difficult to evaluate, of course, but it could be user satisfaction. So you gather
user feedback and you assess the satisfaction within the model response, so for example, the user
satisfaction of an e-commerce platform, or you can compute what is the average revenue per user,
and of course, well, if the GenAI app is successful, you hope that this metric will go up. Or
cross-domain performance, so is the model able to perform varied tasks across different
domains? Conversion rates, so what is the outcome I want? Do I want to have higher conversion
rates? Again, I would monitor this and evaluate my model on that. Or efficiency, what is the
efficiency of the model? How much does it cost me? Is it efficient in computation, in resource utilization, and so on? So that's it for evaluating a foundation model. I hope you liked it, and I will see
you in the next lecture.
So now let's see how we can evaluate a model automatically. So this is a deeper type of evaluation,
but on the left-hand side, you can go under assessment and deployment and find model evaluation.
So as you can see, we have three kinds of evaluation. We have the automatic evaluation to
evaluate a model against recommended metrics, and it's going to be fully automated as we've seen
in the diagram in the previous lecture. But we also have humans. You can bring your own work
team and then humans are going to give you feedback and evaluate up to two models at a time.
And then you can also get humans, but this time, they can be an AWS-managed work team. So
different kind of evaluations, but I just want to show you the flow on how to create one just so you
can get a better idea. So let's go for automatic, and I'll call this one DemoEvaluation. We're not
going to run one, we just want to look at the options, but we need to select a model. For example,
let's evaluate Amazon Titan Text G1 - Express. And next, we need to choose the type of model evaluation task that we're going to evaluate against. So do we want to understand if it's good at general text generation, or if it's good at text summarization, question and answer, or text classification? As you can see, you don't have any custom task type, just the ones that are built in. So let's say we want to evaluate for general text generation, and then what do we want to
evaluate for? So some metrics. So for example, we can have a look at toxicity to see if the model
will generate harmful, offensive or inappropriate content. So here we need to choose a prompt
dataset to evaluate this metric. So you can either use an available built-in dataset, like the Real Toxicity prompts or the BOLD one, which will give some challenges to the AI, and we can have a look at the result of this evaluation, or we can use our own prompt dataset. And again, it needs to be in Amazon S3, and then we have more options because we can test against specific use cases of our business. So either option is fine based on what you're trying to do. And also, you
can have other metrics, such as accuracy (is it good with factual knowledge about the real world?) and robustness. And again, for each of these tests, you have some available datasets or you can
build your own. Okay, and then where do you want to store the evaluation results? Again, in
Amazon S3. To perform this, you need an IAM role again to write to your Amazon S3 bucket. This
can be automated right here. And then you click on create and it would actually do a model
evaluation. Now, we don't do it because, well, this is going to cost us a lot of money because we're
going to run gen AI prompts and then look at their outcomes. So obviously, there's a certain cost
associated with it, but here we get some intuition of how the model evaluation works, and you can explore on your own text summarization, question and answer, classification, and so on, where again, built-in datasets are provided for you, which is super nice. And if we want to have humans, we can create a human-based evaluation, done by humans. And here I name the evaluation, all lowercase, and here we select a model again, whichever you want. And then what is the task type? So you have
the same as before, but also, now you can create a custom task type, and this is for your team to
use and to perform a specific task. Here, again, you would have some metrics that can be
suggested, or you can create your own metric if you want to, so you have a bit more control
because now you have humans and humans can do whatever you want. So again, you select this,
you select a rating method. Do you want humans to evaluate based on a thumbs up or down, or to rate the answers on a five-star scale? And you can have multiple metrics if you wanted to. And on top
of it, you can add another model to compare between two models if you wanted to. For example,
these two. So a lot more customization is available for human-based evaluation, but the idea
remains the same. So that's it for FM evaluation. I hope you liked it and I will see you in the next
lecture.
So now let's talk about RAG and knowledge bases. So what is RAG? RAG is Retrieval Augmented
Generation. Behind this very fancy name, there is a very simple concept. This allows your foundation model to reference a data source outside of its training data without being fine-tuned. So how does that work? So we have a knowledge base and it's being built and managed by
Amazon Bedrock. For this, it must rely on a data source, for example, Amazon S3. So your data is going to be in Amazon S3, and then automatically Bedrock is going to do some magic. We'll see how
that works exactly, but it's going to build a knowledge base. And then a user is going to ask a
question to your foundation model saying, who is the product manager for John? So that's probably
something related to his company. And of course, the foundation model does not know anything
about John, because John is a very specific query for my company and my own data. So there's
going to be something called a search, and this information is going to be searched in the
knowledge base automatically. Of course, this is all happening behind the scenes. But this
knowledge base is backed by something called a Vector database. And to create the data in the
Vector database, again, Bedrock takes care of everything. It's called creating Vector embeddings,
and I will show you how that works later on. But basically, thanks to this Vector database, then
we're able to retrieve the relevant information out of this knowledge base. And for example, we get
some text back from this saying that we have some information about John, such as we have some
support contacts. We have a product manager, which is Jesse Smith. We have an engineer, which
is Sarah Ronald. And so all this is going to go as an augmented prompt. And so it's going to be the
original query, as well as the text that has been retrieved, that are going to be passed together into
the actual foundation model. And the foundation model is going to look at this augmented prompt
and generate a response saying, hey, it looks like Jesse Smith is the product manager for John,
and it's what is called Retrieval Augmented Generation. Retrieval, because we retrieve the data
outside of the foundation model, and it's augmented generation, because we augment the prompt
with that external data that has been retrieved. So hence the name RAG. And so RAG in AWS
Amazon Bedrock is going to be a knowledge base. So, this is very helpful when you need up-to-date, real-time data to be fed into the foundation model. So how does that work? Well, here we have an example: give me talking points for benefits of air travel. As you can see, one of these talking points has a little "one" on it and a link to Air travel.pdf, and that PDF may be in Amazon S3. So this is an example where RAG and the knowledge base have been used to give an answer to a specific prompt. Now, everything is going to go into a vector database. And so we have several options. So, vector databases on AWS
and Amazon Bedrock can be of several kinds. So we have two services on AWS that are possible
to use as your vector database. The first one is called OpenSearch Service, and the second one is called Amazon Aurora. So remember the names Amazon OpenSearch Service and Amazon Aurora.
And we have three other options, and maybe there'll be more in the future, of course. But we have
three other options right now to use for a Vector database. We have MongoDB, Redis, or Pinecone.
And so you can configure a knowledge base backed by any of these vector databases, but if you don't specify anything, AWS is going to create an Amazon OpenSearch Serverless database for you. So next we need to have what's called an embeddings model. And this is how to
convert the data into these vectors. And so it could be Amazon Titan or Cohere. And it doesn't
matter if you don't use these models as your foundation model. The embeddings model and the
foundation model can be different. So your S3 documents are going to be chunked, and that means
they're going to be split into different parts. And these parts are going to be fed into the embeddings
model, which is going to generate vectors, and place this vector in the vector database. And the
outcome of that is that now these vectors are easily searchable, and so when RAG will look into the
vector database, it will find the right passage in the right document to be able to augment the query.
So the exam may ask you to choose the right RAG vector database, at least from a high level. We're not going to become database experts; I'm just going to give you the information you need to figure out which database to pick. The first category I would say would be Amazon OpenSearch Service and Amazon DocumentDB. So, OpenSearch Service is a search and analytics database, and it's very
good. It's definitely going to be the preferred choice, and you're going to have real-time similarity
queries and you can store millions of vector embeddings. The reason it's so good, and that could be
an exam question, is that you have a very good scalable index management, and also you have
very, very fast nearest neighbor search capability, also called KNN. The other one that's really high
performance is Amazon DocumentDB with MongoDB compatibility, which is a NoSQL database.
And again, you get real-time similarity queries and can store millions of vector embeddings. So I
would keep these ones together. Then we have Amazon Aurora and Amazon RDS for PostgreSQL.
So both of them are going to be relational databases. Aurora is proprietary on AWS, but it's Cloud
friendly. And RDS for PostgreSQL is a relational database as well, but it's open source. Both are
fine. Usually if you see relational database, think one of these two. And finally, if you see graph
database, choose Neptune. So I don't think there's gonna be any question asking you to choose between the five databases together, but definitely, you'll see a question about choosing a relational database as a vector database, and you may have to choose Aurora or RDS for PostgreSQL. Or if you see something around high performance, real-time similarity queries, and millions of vector embeddings, it's going to be OpenSearch Service or DocumentDB, or, for graph, Amazon Neptune. That's about it. I don't wanna go too deep into these databases, but this
should be enough for the exam for you to get going. So what kind of data sources can we use in
Amazon Bedrock? Well, we have Amazon S3, which is a place where you can put a lot of files in the cloud, Confluence, Microsoft SharePoint, Salesforce, as well as webpages. So it
could be your website, it could be your social media feed, et cetera, et cetera. So anything in the
web. And I'm pretty sure that Amazon Bedrock will add more sources over time. But from an exam
perspective, I think remembering Amazon S3 and maybe these ones should be enough. And of course, if, from an exam perspective, you need to know about more data sources, I will include them in the slides. So what are the use cases now for Amazon Bedrock? Well, you can build a
customer service ChatBot where the knowledge base is going to be your products, your features,
your specifications, troubleshooting guides, and frequently asked questions. And so therefore, your
RAG application can be the ChatBot that will answer customer queries, and look up in this
knowledge base. It could be for legal research and analysis where the knowledge base is going to
be laws and regulations, case precedents, legal opinions, and expert analysis. And this time we can
have a ChatBot that is going to have relevant information anytime we have a specific legal query.
And we can have also healthcare question answering. So again, the knowledge base can be
diseases, treatments, clinical guidelines, research papers, and previous patient data. And the
application could be a ChatBot that will answer complex medical queries. So RAG opens up a lot of
possibilities for doing gen AI on AWS. So I hope you liked it and I will see you in the next lecture.
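By the way, if you're curious what querying a knowledge base looks like from code rather than the console, here is a minimal sketch using boto3 and the RetrieveAndGenerate API. The knowledge base ID, model ARN, and region below are placeholders that you would replace with your own values:

```python
import boto3

# Minimal sketch of querying a Bedrock knowledge base with RAG from code.
# The knowledge base ID, model ARN and region are placeholders -- use your own values.
client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = client.retrieve_and_generate(
    input={"text": "Who is the product manager for John?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB1234567890",  # placeholder knowledge base ID
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
        },
    },
)

# The generated answer, plus citations pointing back to the source chunks (e.g. in Amazon S3)
print(response["output"]["text"])
for citation in response.get("citations", []):
    for ref in citation.get("retrievedReferences", []):
        print("Source:", ref.get("location"))
```

This is exactly the augmented-prompt flow from the diagram: the service retrieves the relevant chunks from the vector database, augments the prompt, and the foundation model generates the final answer with citations.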
011 Amazon Bedrock - RAG & Knowledge Base - Hands On
So let's go ahead and practice knowledge bases. So on the left-hand side, you have knowledge
base and then here's how it works. So we can first do a little sandbox environment where we can
upload and chat, but then we can create a knowledge base and test it and then use it. So first, I'm
going to take it very easy and we're going to chat with a document so you understand the idea
behind knowledge bases. So you first need to select a model. So we can use Anthropic Claude 3 Sonnet right now, and we're going to look at these parameters and the chat
prompt template. So this demonstrates how RAG is working. So we're saying hey, you are a
question answering agent and I will provide you with a set of search results. And then your job is to
answer the user's question thanks to this search result. And here are the search results. So they're
going to be right here provided in this prompt. That's why it's called an augmented prompt. It's
because this prompt, it's going to be augmented by the search results performed on our
documents. And this is the whole idea behind RAG. And this whole thing is going to be sent into
your foundation model. So we don't touch this because this is great, and we're going to just provide
some data. So the data can come from your computer or Amazon S3. We'll use our computer. And then
I'm going to select in the code that you have downloaded in the beginning of the course. And if you
don't have the code, please go to the beginning of the course in section two and you will find the
code and slides download. You download it, and then you go into Bedrock and
Evolution_of_the_internet_detailed.pdf. And this is a document that I've actually generated using
AI. So it's a PDF document generated by AI, which contains some information around the internet. So we're
going to chat with this document right now, like if it was a RAG use case and we say when and who
invented the World Wide Web. We run this and it's going to retrieve the relevant information in our
document, pass it into this prompt to augment it, and then answer the question. So here we have
the information that the World Wide Web was invented by Tim Berners-Lee in 1989, and then we
get some information. So we have one and two. These are the sources from our document that
gives us some information to answer our query and we can click on show source details. And it's
going to show you the source chunk one and the source chunk two that were used to answer our
query. This is very nice. And then we can just try a little bit more. So what is the future of the
internet with AI? And we run it. And again, it's going to look in my PDF document that you can have
a look at by the way on your own. And you say hey here, AI is expected to play a more important
role on the internet. For example, we'll have AI-driven chatbots, recommendation systems and
virtual assistants. And again, we can look at the quotes. And so here we have one, two, and three
quotes from our documents. So this is the idea behind RAG and I'm going to stop this lecture
because you've seen how it works from a document perspective. I'm going to create a next lecture
to show you how you can set up a knowledge base. It's a bit more complicated and involved so you
can look at it if you wanted to. But from an exam perspective, you know enough on RAG. So I will
see you in the next lecture to set up a RAG knowledge base entirely if you wanted to, but otherwise,
you can skip it and move on. All right, that's it. I will see you in the next lecture.
So here we go. We're going to set up RAG and a knowledge base on Amazon Bedrock. Now this
hands-on is a bit more complicated than the other one. So you've learned what you need to know
from a knowledge base perspective. This is just technical stuff if you are very curious into how
things work. So to create a knowledge base, we need to move away from what's called a root user. We're connected as the root user right now (we just see the account ID), and we need to switch to an IAM user. So you type IAM in your services and then you go to the IAM console and you're going to
go under users and you're going to create a user. The username is whatever you want. I'm going to
name it Stephane, and I'm going to provide user access to the AWS management console. Now I
know that the recommended user type is here in identity center. This is something we'll see later on.
Right now I wanna make it super simple. So I want to just create an IAM user and then you can use
a custom password that you must know, and I'm going to just enter a password that I know. So here
I go, and I'm good to go. Okay, and then we can untick this. I don't need to create a new password
at login. Then let's click on next. And we're going to attach policies directly. And the one we're
looking for is administrator access. So we're going to give full admin power to this user. Then we
click on next and then create user. And our user is successfully created. Now we have an access to
a console sign-in URL with a username and our password. So let's go into the sign-in URL right
now and press enter. And now we have this new sign-in page. So the account ID we'll leave as is,
the username is whatever you had before, and then the password is your actual password. And
then click on sign in. So here we go, now I am into the console. And I know that I am logged in
using the IAM user because, well, it says IAM user Stephane, and there is my account ID. So now we're
ready to go back into Amazon Bedrock and then get started. Go on the left hand side onto
knowledge bases and create a knowledge base. So the knowledge base name we'll just leave it as
is. The IAM permissions, we're going to select create and use a new service role. So we leave the
defaults, then we choose Amazon S3, and then we can see actually the options we have. So we
have Amazon S3, but we can also have other data sources such as a web crawler to extract data
from the webpage. Or it could be third party data sources such as Confluence, where you can store
your information or Salesforce for your own CRM or SharePoint to have a look at your documents
hosted on SharePoint. But we'll use Amazon S3 right now in our little PDF file. Let's click on next.
And next we need to create a data source. So it's going to be Amazon S3 on this account. And we
need to select an S3 bucket. So now let's go into Amazon S3 and actually create an S3 bucket. So
we're going to create a bucket and we need to make sure we are in US East one still. And it's a
general purpose type of bucket, and you just have to give it a name. So my demo bucket
knowledge base Stephane, and you see this name is very long and complicated because if you just
enter a name that is already taken, for example test, and then you click on create buckets, you're
going to get an error saying, hey, this bucket name already exists. So the bucket name doesn't exist
in my account, but it exists in someone else's account. And so this is a problem on Amazon S3, the
bucket name needs to be unique and therefore you need to enter a unique name. So if you copy
this entire bucket name, it's not going to work because I have created this bucket myself. So choose
a name that is going to be unique to you. Then you scroll all the way down, we don't need any of
these options and we create this bucket. So now my bucket is created and I'm going to just click on
it and within it, this is where we can upload objects. So let's upload a first object, which is going to be our Evolution_of_the_internet_detailed.pdf, and click on upload. So it's a very simple interface to upload files
onto the cloud. And now if we look at it in our bucket, we have one object, which is our PDF. So this
is great, and now I just go into, back into Amazon Bedrock and we need to select an S3 URI. So we
browse Amazon S3, we refresh this, doesn't work, okay, so let's cancel this, refresh and recreate
the knowledge base. We scroll down, click on next, and then click on browse S3. And here we go,
now we can select this Amazon S3 bucket. So we choose it and now this is filled. So next we go
click on next, we need to select an embeddings model. So this is how to convert your data into a
vector. So different options here, but I'm just going to take the one from Amazon, this Titan Text
Embeddings V2, and we don't touch anything on the vector dimension. Now for the vector
database. So we have several options. If you wanted to go all the way free, then you would select
Pinecone because Pinecone, if you go on the website of Pinecone, so pinecone.io, and you go on
the pricing, you see that there is a free tier. So you can start free and you can have up to two
gigabytes of storage, plus some free write units and read units, so you're good. So this is a setup
I'm not going to do because I'm trying to demonstrate the AWS services of course, but if you
wanted to stay free, I would recommend Pinecone. Now I'm going to pay some money for what I'm
going to do right now, but I'm going to do a quick create with a new vector store because this is
going to use Amazon OpenSearch Serverless, which is an AWS service, and something they most
likely want you to use. Another option on AWS, for example, is Amazon Aurora. But so we're going to
do a quick create of a new vector store and actually you can see that even though they say it's cost
efficient, I really don't believe it. So for Amazon OpenSearch Serverless pricing, it's billed by compute in OCUs, and you have a minimum billable of two OCUs, and OCUs are quite expensive: you pay $0.24 per OCU per hour, which is around $172 per OCU per month. So it's quite expensive, and then you also pay for the storage. And so what I wanna say is that we are going to be cost-conscious in this course and make sure that we delete the OpenSearch Serverless collection right after using it. So we'll pay a
minimal amount if you do the hands-on with me. Again, good for you to know. And if you follow
along and you create a vector store with me on Amazon OpenSearch Serverless, you are going to
pay some money. So please don't forget to delete things. Otherwise this is going to be a very sad
moment. Okay, so let's click on create knowledge base and now we're good to go. Again, you will
get an error if you're not using an IAM user. So this was very important to do. So this will take a little
bit of time and I'm going to pause the video until we are done. Okay, so my knowledge base is now
created and I can click on go to data sources to have a look at it. So it says that yes, my data
source is available and then what I'm going to do is click on it and then sync it to perform a
synchronization of the data that I have in my S3 bucket to my database in OpenSearch. So as you
can see here, I can click on OpenSearch service and have a look at it and confirm that on the left
hand side under dashboard I have access to my collection, which is my Bedrock Knowledge Base.
So it's pretty cool. So I wanna show you a little bit how this works. This is obviously way advanced
knowledge, but I'm always curious. So if you click on this collection, you can see right here that we
have access to our endpoint and we have an OpenSearch dashboard URL that we can look at to
look at what's within our collection. And we can have a look as well at indexes to see that an index
was created. And there is one vector index with six documents. So if we go into OpenSearch
Dashboards, here we are, and we can visualize our data. So to do it, we go on the left hand side under discover, and then you have index pattern. So we click on create index pattern, and then we paste the name of the index in here, click on next step, and then create index pattern. This is how we tell OpenSearch Dashboards where to look for our data, and here we can see all the fields that were created automatically by Amazon Bedrock. So this is very handy. Now we go into discover, and then we can have a look here into our
knowledge base. And here we have access to the actual vectors. So this is some information
around what was the text chunk of my PDF that is in here, what is the ID? And then what is the
actual Bedrock knowledge base vector? And so all these numbers were created by the embeddings
model. So this is advanced, I know, but I wanna show you the deep down things of how it works.
Basically my document was chunked, and then for each chunk a vector was created. And so you
have about one, two, three, four, five, six chunks right now for my documents. But by the way, now I
can go into Amazon Bedrock and I can start testing my knowledge base. So let's configure a model.
We're going to take, for example, again, Anthropic Haiku or Sonnet, let's apply it, perfect. And here we're gonna say, who invented the World Wide Web, click on run. And then it is going to retrieve and generate a response. As you can see, we have pretty much the same answer as before. And in terms of sources, we can see that now we have a link to Evolution_of_the_internet_detailed.pdf. And if I click on it, this takes me directly into Amazon S3 into my PDF file that I have
uploaded. So it's quite cool because now we have set up a full RAG and if you wanted to set up
properly, you would add more documents of course in your Amazon S3 bucket, and then you would
click on the sync button. So here we've practiced how to use RAG, so I'm going to clean it up. And
to do so, I delete the knowledge base. So click on delete, and it's going to delete the knowledge
base, but this doesn't delete your OpenSearch database. So let's wait for this to be done. And next,
so this was very quick, and next, let's go into OpenSearch Service. And under here for this
collection, you're going to delete it. Otherwise this will incur some cost in the long run. So that's it,
we've practiced creating an OpenSearch serverless database. We have created a knowledge base,
we've uploaded some files onto Amazon S3. You can keep your S3 bucket running, this is fine. This
is not going to cost you anything. And then we were able to demonstrate how knowledge bases and
RAG works. So I'm very excited, this was a long hands-on and very advanced I know, but good to
see. I hope you liked it, and I will see you in the next lecture.
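One more thing before we move on: if you'd rather script the Amazon S3 part of this hands-on and trigger the data source sync from code instead of clicking through the console, here is a minimal sketch with boto3. The bucket name, knowledge base ID, and data source ID are placeholders, and the sync call is based on the bedrock-agent ingestion job API:

```python
import boto3

# Minimal sketch of the S3 side of this hands-on, plus triggering a knowledge base sync.
# The bucket name, knowledge base ID and data source ID are placeholders -- use your own values.
s3 = boto3.client("s3", region_name="us-east-1")

bucket = "my-demo-bucket-knowledge-base-unique-name"  # bucket names must be globally unique
s3.create_bucket(Bucket=bucket)  # in us-east-1, no LocationConstraint is needed

# Upload the PDF that the knowledge base will index
s3.upload_file("Evolution_of_the_internet_detailed.pdf", bucket,
               "Evolution_of_the_internet_detailed.pdf")

# After the knowledge base and data source exist, the "sync" button corresponds to an ingestion job
bedrock_agent = boto3.client("bedrock-agent", region_name="us-east-1")
bedrock_agent.start_ingestion_job(
    knowledgeBaseId="KB1234567890",   # placeholder
    dataSourceId="DS1234567890",      # placeholder
)
```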
So now that we've seen Gen AI and how to use it, let's look at bigger concepts around Gen AI.
More theoretical, but very important to understand, and the exam can ask you a few things about it.
So first is the process of tokenization. It's the idea of converting raw text into a sequence of tokens.
What does that mean? Well, here is a sentence. "Wow, learning AWS with Stephane Maarek is
immensely fun," and here, we have different ways of converting these words into tokens. So we can
do word-based tokenization, and the text is going to be split into individual words, or we can have
subword tokenization, where some words can be split, too, which is very helpful for long words and for the model to use fewer tokens. For example, unacceptable is just acceptable with un in the beginning, and so the model just needs to understand that un is a negation and acceptable is the token acceptable. Hopefully, that makes sense. You can experiment on OpenAI's website with a tool called Tokenizer, and I put in the sentence, "Wow,
learning AWS with Stephane Maarek is immensely fun!" As you can see, the "wow" was a token. The comma
itself is a token as well. "Learning AWS with Steph," and so Stephane was split in two, probably because Steph and Stephane are very close: Steph is just my diminutive, and ane is probably just the French way of writing my name, so it was split. For Maarek, aare is being split as well, probably an error, but the model will figure this out as it goes. And then "is immensely fun," all of these are tokens, and the exclamation point is also a token. So tokenization is converting these words into tokens, because each token has an ID, and it's much easier to deal with IDs than to deal with the raw text itself.
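If you want to play with tokenization yourself from code, here is a minimal sketch using the open-source tiktoken library (the tokenizer used by OpenAI models). The exact splits you get depend on the encoding you pick, so treat the output as illustrative:

```python
# pip install tiktoken
import tiktoken

# cl100k_base is the encoding used by GPT-4-era models; other models use other encodings.
enc = tiktoken.get_encoding("cl100k_base")

text = "Wow, learning AWS with Stephane Maarek is immensely fun!"
token_ids = enc.encode(text)

print(token_ids)                              # the numeric token IDs the model actually sees
print([enc.decode([t]) for t in token_ids])   # each ID decoded back to its text fragment
print(f"{len(text)} characters -> {len(token_ids)} tokens")
```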
Next, the context window, which we know is super important. This is the number of tokens that an LLM can consider when generating text. Different models have different context windows, and the larger the context window, the more information and coherence you get. And so it's kind of a race now to have the largest context window, because the more context window you have, the more information you can feed to your gen AI model. So if you look at GPT-4 Turbo,
it's 128,000 tokens. Claude 2.1, 200,000 tokens, but for example, Google Gemini 1.5 Pro has 1 million tokens, and up to 10 million tokens in the context window in research. And that means that with 1 million tokens, you can have a one-hour video fed to your model, or 11 hours of audio, or over 30,000 lines of code, or 700,000 words. So the context window really is an important factor. Now, when you have a large context window, obviously, you're going to get
more benefit out of it, but it will require more memory and more processing power, of course, and
therefore, it may cost a little more. So when you consider a model, the context window is going to
be probably the first factor to consider, making sure that it fits your use case. Next, we have the
concept of embeddings. So we've seen that a little bit with RAG, but now we're gonna go deep into
how that works. So the idea is that you wanna create a vector, and a vector is an array of numerical
values, so many numerical values, out of text, images, or audio. So for example, let's put some text,
and we have, "the cat sat on the mat." So first, we're going to do tokenization, so each word in this
example is going to be extracted, "the cat sat on the mat," and then because we have tokenization,
every word is going to be converted into a token ID. It's just a dictionary that says that the word
"the" is 865 and so on. Next we're going to have an embeddings model, so this is where we're going
to create a vector for each token, so as you can see here, the word "cat," or the token "cat," if I may
say, is going to be converted to a vector of many values here, 0.025, and so on, and the word "the"
is going to have its own vector, and the vectors can be very big. It could be 100 values if we wanted
to, and all these vectors are going to be stored in a vector database. So why do we convert these
tokens into vectors? Well, when we have vectors with a very high dimensionality, we can actually
encode many features for one input token, so we can have the meaning of the word, we can have
the syntactic role, the sentiment, if it's a positive or negative word, and so much more, and so the
model is able to capture a lot of information about the word just by storing it into a
high-dimensionality vector, and this is what's used for vector databases and RAG. Finally, because
embeddings can be easily searched, thanks to nearest-neighbor capabilities in vector
databases, it is a very good way to use an embeddings model to power a search application, and
that is something that can come up in the exam. So I will do my best to show you this, so words that
have a semantic relationship, that means they're similar, will have similar embeddings. So if we
take the tokens dog, puppy, cat, and house, and we make a vector, say, with 100 dimensions, so we have 100 numerical values for each and every word or token. And so of course, for us,
it's very difficult as humans to visualize 100 dimensions. We're very good at two dimensions, it's a
sheet of paper. Three dimensions, we're very good at because we can visualize things with our
eyes in three dimensions, but 100 dimensions is very difficult, and so to visualize these things,
sometimes we do what's called dimensionality reduction, so we reduce these 100 dimensions, for
example, to two or three dimensions. So if we did it, for example, we would see something like this.
And in this two-dimension diagram, we see that it looks like a puppy and a dog are related, yes,
because a puppy is a small dog, and it looks like the cat is not too far away from a dog. Well, that's
because it's an animal, but house is very different, so it's going to be far away on that diagram. So
of course, with two dimensions, we don't capture any kind of subtleness, but when we have 100
dimensions, we can really say which words relate to each other and why. Another way to visualize a
high-dimension vector is to use colors, so for example, we use color embedding and say each
combination of numbers is gonna make a color, and visually, we can see, for example, in this very
simplified one, that the puppy and the dog, they're very similar because they're very similar colors,
but house is very different. And so intuitively, we can say that, yes, there is a semantic relationship
between tokens with similar embeddings, and that's why we use them, and that's why, once we
have them in a vector database, we can then do a similarity search on the vector database, so we
give a dog and automatically, we'll be able to pull out all the tokens that have a similar embedding
as the dog, and that's it. So that's it for more concepts on Gen AI. They can appear in the exam, so hopefully now you understand them and you'll be all good. I hope you liked it, and I will see you in
the next lecture.
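To make the embeddings idea a bit more concrete, here is a minimal sketch that generates embeddings with Amazon Titan Text Embeddings V2 on Bedrock and compares a few words with cosine similarity. The region and model ID are assumptions (use whatever embeddings model you have access to), and the similarity values are just illustrative:

```python
import boto3, json, math

# Minimal sketch: generate embeddings with Amazon Titan Text Embeddings V2 and compare them.
# The region and model ID are assumptions -- check model access in your own account.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def embed(text):
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(response["body"].read())["embedding"]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

dog, puppy, house = embed("dog"), embed("puppy"), embed("house")
print("dog vs puppy:", cosine_similarity(dog, puppy))   # expected to be relatively high
print("dog vs house:", cosine_similarity(dog, house))   # expected to be lower
```

This is the same idea a vector database uses for similarity search: tokens with a semantic relationship end up with embeddings that are close to each other.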
Now let's talk about Guardrails in Amazon Bedrock. Guardrails allow you to control the interaction
between your users and your Foundation Models. You can set up Guardrails to filter undesirable
and harmful content. For example, say we have Amazon Bedrock and we set up a Guardrail to
block any kind of food recipes, and the user is using your model and saying, "Hey, suggest something for me to cook tonight." Then Amazon Bedrock will respond, "Sorry, this is a restricted topic." This is because we have set up a Guardrail to block this topic. Of course, maybe you don't wanna block food recipes, but something a bit more relevant to your business. You can also use Guardrails
to remove any personally identifiable information or PII, to make sure that your users are safe. You
can also enhance privacy, and you can reduce hallucinations. We'll see what hallucinations are later on in this course. But the idea is that you wanna make sure that the answers are safe and sound and that they're not just invented out of nowhere. Guardrails can help you with that. You can also create multiple Guardrails and multiple levels of Guardrails. And you can also monitor and analyze
all the user inputs that will violate the Guardrails to make sure that you have set the Guardrails up
properly. That's it, just a short intro to the Guardrail. I hope you liked it and I will see you in the next
lecture.
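And if you're wondering what applying a Guardrail looks like from code, here is a minimal sketch using the Converse API with boto3. The model ID, guardrail identifier, and guardrail version are placeholders for whatever you create in your own account:

```python
import boto3

# Minimal sketch: invoke a model through the Converse API with a Guardrail attached.
# The model ID, guardrail identifier and version are placeholders -- use your own values.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    messages=[{"role": "user", "content": [{"text": "Suggest something for me to cook tonight."}]}],
    guardrailConfig={
        "guardrailIdentifier": "gr-1234567890",  # placeholder guardrail ID
        "guardrailVersion": "1",
    },
)

# If the topic is blocked, the text is the blocked-prompt message configured in the guardrail
print(response["output"]["message"]["content"][0]["text"])
print("Stop reason:", response.get("stopReason"))
```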
So now let's have a look at guardrails. So guardrails are a way for you to filter things based on your
requirements to have a more responsible AI. So let's create a guardrail and see the options we
have. So this is my demo guardrail, and we have here a message for blocked prompt. So in case
the prompt is blocked, what do you want to return to your user? Here, it's going to be, "sorry, the
model cannot answer this question." And this could apply as well, the same message for
responses, or you can customize it. Click on next and now we have options to configure a lot of
things. So we could configure content filters, denied topics, word filters, sensitive information filters,
and add contextual ground checking, so a lot of different options. So let's configure one of them just
so we understand how it works and look at the other categories. So here, for example, let's filter
harmful categories. So here, for example, we have a filter strength to increase the likelihood of
filtering harmful content in a given category. So for example, no hate, no insults, no sexual, no
violence, and no misconduct. Let's click on next. What topics do we want to deny? So here I'm
going to call it recipes. And the definition of a topic is what you want the foundation model to
understand what the topic is. So let's say, for example, the definition is: food recipes are instructions on how to cook specific dishes. And we can also add sample phrases if we really wanted to help the AI understand the type of prompts we're trying to block, but there's no need for it, so let's confirm this. So we have recipes, and then click on next. So do we want a profanity filter, for example, if someone is using profane words? You can also add custom words and phrases to be blocked, or upload them directly. And then do you want to remove any type of personally identifiable
information, PII? So we can say, yes, I wanna add a new PII type, for example email, and we'll choose mask as the behavior. And then Regex patterns: any type of information that follows a specific pattern should be removed as well. And then we have contextual grounding. So this is to make sure that you reduce hallucinations, which is when the model says something it thinks is true, but it's actually not true. So I won't go into the settings here,
but it's called grounding and relevance. So let's create it and create this guardrail. So now the
guardrail is created and we can test it. So we can select a model. For example, let's choose
Anthropic and Sonnet. And then we say here, "please suggest me something to cook tonight. I love
Indian food." And let's click on run. And here, as you can see, it's a blocked topic because we said
no food recipes. And the answer we get is, "sorry, the model cannot answer this question." Now let's try another prompt: "Please draft an email for me, include my email [email protected] and also include the other person's email [email protected]. Make sure we discuss important topics for our next business meeting." So let's click on run. And now we are prompting the model to draft us an actual email. This
is great, but what I expect is for my emails to be masked because this is actual information. So as
you can see here, the model response included a to [email protected] and cc
[email protected] with the relevant email. So here it discusses a lot of things. This is great, but
the final response has gone through the guardrail. And as you can see, the email has been
masked because this was personally identifiable information, but the rest is here. So this is great.
We've seen how this guardrail works and this was a good demo. And I just wanna show you
another way to test the guardrail. So if you go into text and choose, for example, again, we're going
to choose Anthropic, Sonnet, apply it. On the bottom here, we can choose a guardrail and apply the
demo guardrail. And actually you can apply many guardrails at a time if you wanted to stack them
up. So that's it for this lecture, I hope you liked it. You can leave this guardrail on, it's not going to
cost you any money, and I will see you in the next lecture.
So now let's talk about Amazon Bedrock Agents. So the agent is going to be a very smart thing that
is going to act a little bit like a human. The idea is that instead of just asking questions to a model,
now the model is going to be able to start thinking a little bit and to perform various multi-step tasks.
And these tasks may have an impact on our own databases or our own infrastructure. So the agent
can actually create infrastructure, it can deploy applications, and can do operations on our systems.
So here now, the agent doesn't just provide us information. It also starts to think and act. So for
example, it's going to look at the tasks, and then it's going to perform the tasks in the correct order and ensure that the correct information is passed between the tasks even if we haven't programmed the
agent to do so. So what we do is that we are going to create what's called action groups, and the
agents are going to be configured to understand what these action groups do and what they mean.
And then automatically the agent will be able to integrate with other systems, services, databases,
and API to exchange data or to initiate action. And also if you need to get some information out of
your systems in terms of unlabeled data, it can use RAG to retrieve the information when
necessary. So that sounds a little bit magical, but I will show you exactly how that works. So in
Amazon Bedrock, you would go and create an agent and you are defining what the agent is
responsible for. So for example, you are an agent responsible for accessing purchase history for
our customers as well as recommendations into what they can purchase next. And you are
responsible for placing new orders. So the agent knows that it can do all these things. So if the user
is asking something for the agent or the model to do one of these things, Bedrock is smart. It's
gonna say, well, this agent probably is going to be responsible for these actions. Then the agent
knows about a few action groups. So for example, we have defined an API, it's a way to interface
with our system, and we have, for example, defined get recent purchases or get recommended
purchases or get purchase details and then a specific purchase ID. So all these things are known to
the agent in terms of what is the expected input for these APIs, and what do these APIs do, what is
the documentation around it? And all this is provided thanks to an OpenAPI schema. And so, when done well, the agent can invoke these APIs and, behind the scenes, of course, interact with our backend
systems, for example, make changes to our database. The other way to set up an action group is to
use Lambda functions. So Lambda functions are a way to run a little bit of code in AWS without
provisioning infrastructure. So an action group can also be backed by a Lambda function, for example to place an order through a Lambda function. And so it could use the same database or a new database. But
the idea is that I wanted to show here that the agent can interact either with an external API or with
Lambda functions on your AWS accounts. And finally it has access to knowledge bases that we
define, of course. And so for example, say we have a knowledge base around our company
shipping policy and return policy, et cetera, et cetera. So we could use those. And so if the user is asking something about the return policy for an order they're about to place, the agent is smart enough to
also provide that to the user. So the agents are very smart, and they know what to access and then
automatically will know how to do it. So how does that work behind the scenes? Well, say we have
a task, and we give this task to a Bedrock agent. Now the agent is going to look at the prompt. It's going to look at the conversation history, look at all the actions available, as well as the knowledge bases, what their structures are, and what the task is. And it's going to take all this
information together and send it to a Generative AI model backed by Amazon Bedrock and say,
please tell me how you would proceed to perform these actions given all this information. So it's
using the chain of thought. Chain of thought means that the output of the Bedrock model is going to
be a list of steps. So step one, you need to do this. Step two, do this, step three, do this, and step
N, last step, do that. And so the steps are going to be executed by the agent: say, step one, call an API, call this action group and get the results. Step two, do it again. Step three, call another API, et cetera, et cetera. Maybe it could be a search into a knowledge base to get the results, and so on. But so the agent is going to work and do all these things for us thanks to the steps that were generated by the Bedrock model, which is amazing. And then the final result is returned to the
Bedrock agent. The Bedrock agent then sends the tasks and the results to another Bedrock model.
And the Bedrock model is going to synthesize everything and give a final response to our user and
we will get the final response. So this is all happening behind the scenes. Of course, as users, we just use the
agent, and the agent does stuff and automatically we see the final response. But Bedrock is really
nice because you actually have something called tracing on your agent, and this allows you to see
the list of steps that were done by the agent. So you can debug in case you don't like the way the
agent performed something. So that's it for Amazon Bedrock Agents. I hope you liked it, and I will
see you in the next lecture.
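If you're curious what invoking an agent looks like from code, here is a minimal sketch with the bedrock-agent-runtime client. The agent ID and alias ID are placeholders, and the response comes back as a stream of events that includes text chunks and, optionally, the trace of steps we just discussed:

```python
import boto3
import uuid

# Minimal sketch: invoke a Bedrock agent and read its streamed response.
# The agent ID and alias ID are placeholders -- use the ones from your own agent.
client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = client.invoke_agent(
    agentId="AGENT1234",           # placeholder agent ID
    agentAliasId="ALIAS1234",      # placeholder alias ID
    sessionId=str(uuid.uuid4()),   # reuse the same ID to keep conversation history
    inputText="What did I purchase recently, and what do you recommend next?",
    enableTrace=True,              # include trace events so we can see the steps taken
)

# The completion is an event stream: text chunks plus optional trace events
for event in response["completion"]:
    if "chunk" in event:
        print(event["chunk"]["bytes"].decode("utf-8"), end="")
    elif "trace" in event:
        pass  # inspect event["trace"] to debug the steps the agent decided to take
```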
So now let's talk about the integration of Amazon Bedrock and a service called CloudWatch. So
CloudWatch is a way for you to do cloud monitoring. So CloudWatch has many features: you can have metrics, you can have alarms, you can have logs and so on in CloudWatch, and view them all. And many AWS services have an integration with CloudWatch. So for Amazon Bedrock,
what you can do is you can do model invocation logging, and that's something that can come up at
the exam. So the idea is that you want to send all the invocations, so all the inputs and the outputs
of model invocations into either CloudWatch Logs or Amazon S3. And this can include the text, the
images, as well as the embeddings. And this is very helpful because you get a history of everything
that happened within Bedrock. On top of it, you can analyze the data further and build alerting on
top of it, thanks to CloudWatch Logs Insights, which is a service, which allows you to analyze the
logs in real time from CloudWatch Logs. So the idea here is that we get full tracing and monitoring
of Bedrock, thanks to CloudWatch Logs. The other one is CloudWatch Metrics. So the idea is that
Amazon Bedrock is going to publish a lot of different metrics to CloudWatch, and then they can
appear in CloudWatch Metrics. And some of them may be for general usage of Bedrock, but some of
them may also be related to guardrails. So there is one called content filtered count, which helps
you understand if some content was filtered from a guardrail. And so what we can do with it is that
once you have these kinds of metrics in CloudWatch Metrics, you can build CloudWatch alarms on top of
them to get alerted, for example, when something is caught by a guardrail or when Amazon
Bedrock is exceeding a specific threshold for a specific metric. So model invocation logging and
CloudWatch metrics are very important in Amazon Bedrock and they are topics that can appear in
the exam. So I hope you liked it and I will see you in the next lecture.
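For reference, here is a minimal sketch of enabling model invocation logging from code with the Bedrock control-plane client. The log group name and IAM role ARN are placeholders, and the shape of the logging configuration mirrors the console options we just discussed (text, images, embeddings), so treat it as a sketch rather than a definitive setup:

```python
import boto3

# Minimal sketch: enable Bedrock model invocation logging to CloudWatch Logs.
# The log group name and role ARN are placeholders; the role must allow Bedrock to write logs.
bedrock = boto3.client("bedrock", region_name="us-east-1")

bedrock.put_model_invocation_logging_configuration(
    loggingConfig={
        "cloudWatchConfig": {
            "logGroupName": "bedrock-invocation-logging",                     # placeholder
            "roleArn": "arn:aws:iam::123456789012:role/BedrockLoggingRole",   # placeholder
        },
        "textDataDeliveryEnabled": True,
        "imageDataDeliveryEnabled": True,
        "embeddingDataDeliveryEnabled": True,
    }
)

# Read the current configuration back to confirm it was applied
print(bedrock.get_model_invocation_logging_configuration())
```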
So let's have a look at the integration between Bedrock and CloudWatch logs. So we're gonna go
under settings on the bottom left, and you have model invocation logging. So here we can definitely
enable it. And then this is going to collect all metadata, requests, and responses for all model
invocations in your accounts. So you can select the type of data you want to include with logs. So it
could be text, images and embeddings. And then the destination could be Amazon S3 buckets, or it
could be CloudWatch only, or it could be both. So I'm just going to use CloudWatch only. And then
you need to specify a log group name. So I call this one Bedrock Invocation Logging. And then I will create a new role. This is the role that Amazon Bedrock will need to send data to CloudWatch Logs. So I call it Bedrock Invocation Logging Role. Okay. And next we have an external location for large data delivery. So in case a payload is over 100 kilobytes, then it can be published to Amazon
S3, but we don't need this right now, so we're going to just save these settings. So we get an error
saying the specified log group doesn't exist, so we have to create it manually in CloudWatch in this
instance. Maybe this will be fixed by the time you use this. But let's go into CloudWatch logs, log
groups, and then you're going to create a new log group. And the name is going to be this one that
I'm going to copy and paste. So we can set up some settings, for example, do you want the logs to expire or not?
But we're just going to click on create and get going. Okay, so now my log group is created. It is
here. And let's go ahead in Amazon Bedrock. And we're going to save these settings one more
time. And we now need to say that we want to use an existing service role, that is right here, so let
me refresh this. This is sometimes a bit annoying when you have issues on the console, but AWS
may fix this at some point. So here we go. Now we select the existing role that has been created
and save the settings and we should be good to go. Okay, so the settings have been saved
successfully. And what I can do now is I can go in chat, I will select a model, and I will just click on
run. And then we're going to get, so we send an input, and then we get an output, and we're good to
go. Now let's wait a little bit and then go into CloudWatch logs to see if this appears. So I'm going
into CloudWatch logs and we refresh this page, and we have one log stream here. It is Bedrock
Model Invocations, and here we have the information that the permissions are set correctly for
Amazon Bedrock logs. And then we get some information about a model invocation. So we get a lot
of information around it, but we know that, for example, the model ID is
Amazon.Titan-Text-Express-V1. This is a way for us to identify the models that we're using. We get
information about the region, and then we get the messages. So we have a user, that's us, and we
sent this input, and then we have some information around the configuration for this invocation. And
how many tokens was it? 271. And the output is this message. So the assistant role means that it's
the model itself, and the content is this. And so again, we get the information that the latency was
4,038 milliseconds. We get information around the output tokens, the total tokens, and so on. So this is
very helpful because as you can see, a lot of information is included here, and we can use this
information later on to debug everything. For example, we could set up an alarm to check whether the latency always stays beneath a specific number. And if one day the latency reaches a high number, then we may want to send an alert saying, "Hey, your latency is a little too high now, and the user experience may be degraded." So that's the way of doing it. But hopefully
you get the idea of integrating Amazon Bedrock with CloudWatch Logs. So the other thing we can
do is go into CloudWatch, and go into all metrics, and then click on Bedrock. And you may have
more metrics than me, but we have here metrics by model ID, which you can look at per model ID or across all model IDs. And for example, we can have a look at the number of invocations or, for example, the
invocation latency. And we can see here that the latency is being plotted on this graph. So now of
course, if you have a sustained usage of Amazon Bedrock, then you will see a curve here with
multiple data points. But a lot of metrics are being sent by Bedrock into CloudWatch metrics. And
you can then build metrics, graphs, dashboards, and alarms on top of it as well in case, for
example, the invocation latency gets too high. So that's it for this lecture, I hope you liked it, and I
will see you in the next lecture.
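And as a small illustration of the alarm idea, here is a minimal sketch of a CloudWatch alarm on Bedrock invocation latency with boto3. The namespace, metric, and dimension names follow what we just saw in the console (AWS/Bedrock, InvocationLatency, ModelId), and the model ID, threshold, and SNS topic ARN are placeholders:

```python
import boto3

# Minimal sketch: a CloudWatch alarm on Bedrock invocation latency.
# The model ID, threshold and SNS topic ARN are placeholders -- use your own values.
cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="bedrock-high-invocation-latency",
    Namespace="AWS/Bedrock",
    MetricName="InvocationLatency",
    Dimensions=[{"Name": "ModelId", "Value": "amazon.titan-text-express-v1"}],
    Statistic="Average",
    Period=300,                      # evaluate over 5-minute windows
    EvaluationPeriods=2,
    Threshold=5000,                  # milliseconds; placeholder threshold
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:alerts"],  # placeholder SNS topic
)
```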
019 Amazon Bedrock - Pricing & Other Features
So let's talk about other features of Amazon Bedrock, and of course, if they start appearing at the
exam a lot more, I will do deep dives on them, but we have Amazon Bedrock Studio. This is a UI
that you give to your team and that allows your team to use Amazon Bedrock more easily and also
to create applications faster. There's also watermark detection, a feature of Amazon Bedrock where you send it an image, and it will tell you whether this image was generated with Amazon Titan or not. So now for pricing on Amazon Bedrock: you have the On-Demand mode, and this is where
you pay as you go. There is no commitment, and you're going to get charged for text models based
on every input and output token processed, for embeddings models, again, for every input token
processed, and for image models, you're going to be charged for every image generated. This
works with the base models only that are provided as part of Amazon Bedrock. If you want to have
some cost savings, you can use the batch mode. So in the Batch mode, you can make multiple
predictions at a time, and the output is going to be a single file in Amazon S3. And by using Batch
mode, you're going to get responses a bit later than in real time, but at least you're going to get
discounts of up to 50%. For Provisioned Throughput, this is when you want to purchase model units
for a certain time range, for example, for one month or six months, and you're going to get a guaranteed throughput, which means that you're going to get a maximum number of input and output tokens processed per minute as a guarantee. And the idea is that you're going to maintain capacity
and performance, which is very important, but it does not necessarily provide you with cost savings.
So Provisioned Throughput works with base models, but is necessary if you have a fine-tuned
model or custom models or imported models. In this case, you cannot use On-Demand, you have to
use Provisioned Throughput. So you need to also understand the pricing behind improving a model.
So if you use prompt engineering, these are the techniques that we'll see in the next section, techniques to improve the prompt and the output of a model. Well, this requires no further model training, so there's no additional computation or fine-tuning, so this is very, very cheap to do. If you use RAG, Retrieval Augmented Generation, it uses an external knowledge base, because the foundation model doesn't know everything. It's less complex, there's no foundation model change, you don't need to retrain your model or do fine-tuning, but there is a cost of
course, because now you need to have a vector database and you need to have a system that
allows you to access that vector database. Then we have instruction-based fine-tuning. So this is when the foundation model is fine-tuned with specific instructions, and that requires additional computation, but this is really done to steer how the model is going to answer a few important questions, maybe to set the tone of the model. And finally, domain adaptation fine-tuning is very expensive, because now you're going to adapt the model to domain-specific datasets, and that includes gathering a lot of data and then retraining the model with all that data. Remember, it's unlabeled, whereas instruction-based was labeled, and this requires intensive computation. And so, intuitively, this will cost more than instruction-based fine-tuning. So how can you do cost
savings on Amazon Bedrock? Well, if you use the On-Demand pricing model, it's going to be great
for unpredictable workload and you have no long-term commitments. If using the Batch mode, you
get up to 50% discounts, but of course you need to wait a little bit for your results. For Provisioned
Throughputs, usually it's not a cost saving measure. The goal of it is to really reserve capacity from
AWS and their providers, and so therefore, you should not use this as a cost savings strategy. If
you modify the temperature, the Top K or the Top P parameter, you modify how the model is
working, but this has no impact on the pricing. And if you have the model size in mind, usually a
smaller model is going to be cheaper, but again, this varies depending on who you get the model from. So one
of the main driver of cost savings in Amazon Bedrock is to modify the number of input and output
tokens. This is the main driver of cost, so try to get your prompt as efficiently written as possible and
try to get your output as concise and short as possible as well if you are worried about cost savings.
So that's it for this lecture, I hope you liked it, and I will see you in the next lecture.
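To illustrate why the number of input and output tokens is the main cost driver, here is a tiny back-of-the-envelope calculation. The per-1,000-token prices are made-up placeholders, not real Bedrock prices, so plug in the current numbers from the pricing page for your model:

```python
# Illustrative On-Demand cost estimate driven by token counts.
# The prices per 1,000 tokens are made-up placeholders, NOT real Bedrock prices.
PRICE_PER_1K_INPUT_TOKENS = 0.0005   # placeholder, in USD
PRICE_PER_1K_OUTPUT_TOKENS = 0.0015  # placeholder, in USD

def estimate_cost(input_tokens: int, output_tokens: int, requests: int) -> float:
    per_request = (input_tokens / 1000) * PRICE_PER_1K_INPUT_TOKENS \
                + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT_TOKENS
    return per_request * requests

# A verbose prompt/response vs. a concise one, over 100,000 requests per month
print(estimate_cost(input_tokens=2000, output_tokens=800, requests=100_000))  # ~220 USD
print(estimate_cost(input_tokens=500, output_tokens=200, requests=100_000))   # ~55 USD
```

With the same placeholder prices, trimming the prompt and keeping the output concise cuts the monthly bill by roughly a factor of four, which is exactly the point made above.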
020 Amazon Bedrock - AI Stylist - Hands On
So we have explored a lot of options in Amazon Bedrock, but I want to show you a full end-to-end
use case because the idea is that right now, we've done everything in the console, as a playground,
but actually to use Bedrock, you need to implement your own code and do what's called API calls
into Amazon Bedrock to invoke the features we just used, and to build your application on top of
Amazon Bedrock. And it's pretty cool because, well, there is an example that is provided by AWS,
which is an interactive demo called the AI Stylist. So let's launch this demo because I wanna show
you what a final product would look like when it's backed by Amazon Bedrock. So here, this is an
application that AWS has created, that's going to generate looks for you based on your use case. So let's click on try free demo, and here we're going to find an outfit in less than five minutes, and we'll
see all the Amazon Bedrock capabilities being used as part of the scenarios and they should make
sense to you. So let's start exploring. So here we have an AI Stylist and it's going to put together an outfit for us. So we start now and we get the first message saying, "Hey, I'm your AI stylist. Let's find
you an outfit that makes you feel comfortable and confident." So here we have the prompt saying,
"I'm a consultant and I'm traveling to New York next week. What kind of outfit should I wear on my
first day at the office?" So you can't edit this at the moment, but you click on generate my look. And
here, this explains how things are working behind the scenes. So we see that there is a customer
prompt, and we see here that we have knowledge bases, and so a few knowledge bases have
been created. We have one for the product catalog. This is all the private data we have within our
company. We have fashion trends, this is a public data set, order history, which is private data and
customer review, which is private data. But for this use case, only two knowledge bases are being
used, and there's one thing that's using them, it's called an AI agent. Now, AI agents are a little bit more advanced, which is why we haven't seen them in detail, but the idea is that AI agents are smart enough to query these knowledge bases and put things together. And so we have an agent for our product catalog and we have an agent for image generation. And these agents, based on the prompts you have, are smart enough to go into the knowledge bases, look at them, and then create
the final content. So click on view your looks, and the AI Stylist is saying, "Hey, I've selected two
looks for you. There is a business formal and there is a business casual." And so the images are
generated by AI and the text generation is also generated, thanks to what we have found in our
knowledge base. So next we have a suggested prompt saying, "Hey, what do people like about the
business formal jackets?" And now the agent is smart enough to look again into our knowledge
base, which is our customer review, and to say, "Well, people like the quality, color, and fabric." So
it tells you there are 325 customer reviews and it summarizes them for you. So again, from the AI standpoint, from the application standpoint, all these things happen behind the scenes, but you as
a user, you're just interacting with this AI. So now say, "Show me some specific reviews and talk
about the jacket itself." And so again, the agent is going to go into your knowledge bases, find the
product, and find the review and create this kind of outcome. Then, "What size should I wear?" So
we keep on chatting with it and it'll say, "Well, based on your previous orders," because it has
access to our previous orders, "I suggest ordering size M." And okay, "Please add it to my cart." So
now the agent is able to also modify the cart and add it. So this is very nice, and we can add more
stuff to the cart. So again, it's going to do some back and forth with our knowledge base and also
with our APIs to add some data into our carts. And then say, "Okay, this is the cart we have for you.
Are you ready to finalize the order?" Yes, click on finalize the order, and there you go, the order is
being generated. And so this is pretty cool because well, all of this is now a new way to interact with
websites and with applications just thanks to AI. So we click on view cart, and here we go, we have
the cart, it's going to be delivered to our address, and we know about the weather, so it's also
recommending, for example, to add more stuff into our order. So this is quite cool, I think, because
it really shows you how Bedrock is powering this demo. And now that we've seen all the stuff about
Amazon Bedrock, this should make sense to you. So I hope you liked it, and I will see you in the
next lecture.