Encoders and Decoders

Multiple architectures focused on encoding and decoding (embedding/generation). Each type of model has different capabilities.

[Slide shows the title page of the Transformer paper. Abstract excerpt: "The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely."]
Source: Vaswani, Ashish, et al. "Attention Is All You Need." Advances in Neural Information Processing Systems 30 (2017).
Transformers: Model Ontology

[Figure: model families arranged by approximate number of parameters.]
~1T: GPT-4 (?)
~100B: PaLM, BLOOM, GPT-3, Llama 2, FLAN-UL2
~10B: Command, T5/FLAN-T5, MPT
~1B: BART, Command-light
~100M: BERT/RoBERTa, DistilBERT
Decoders

Decoder models take a sequence of words and output the next word.
Examples: GPT-4, Llama, BLOOM, Falcon, ...

Encoder-decoder models encode a sequence of words and use the encoding to output a next word.
Examples: T5, UL2, BART, ...

Certain tasks are typically (historically) performed with models of each architecture style.
Prompt Engineering
Prompt engineering - the process of iteratively refining a prompt for the purpose of eliciting a particular
style of response
In-context Learning and Few-shot Prompting

In-context learning - conditioning (prompting) an LLM with instructions and/or demonstrations of the task it is meant to complete.
Source: Brown, Tom B., et al. "Language models are few-shot learners." Advances in Neural Information Processing Systems 33 (2020): 1877-1901.
Example Prompts
Add 3+4: 7
Add 6+5: 11
Add 1+8: (2-shot addition)
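A minimal sketch of how a k-shot prompt like the 2-shot addition example above can be assembled as a plain string; the helper name and example pairs are illustrative, not part of any particular SDK:

def build_few_shot_prompt(examples, query):
    """examples: list of (question, answer) pairs; query: the unanswered question."""
    lines = [f"{q} {a}" for q, a in examples]
    lines.append(query)  # the model is expected to complete the answer
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    examples=[("Add 3+4:", "7"), ("Add 6+5:", "11")],
    query="Add 1+8:",
)
# prompt is the three-line text shown above, ready to send to an LLM completion API.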
Prompt injection (jailbreaking) - deliberately providing an LLM with input that attempts to cause it to ignore instructions, cause harm, or behave contrary to deployment expectations.
Source: Liu, Yi, et al. "Prompt Injection Attack Against LLM-integrated Applications." arXiv preprint arXiv:2306.05499 (2023).
Training

Prompting alone may be inappropriate when: training data exists, or domain adaptation is required.
Domain adaptation - adapting a model (typically via training) to enhance its performance outside of the domain/subject area it was trained on.
[Figure: approximate compute required at each stage of working with LLMs, drawn from Touvron et al. (LLaMA), Le Scao et al. (BLOOM), and Geiping & Goldstein (single-GPU training).]

Pre-train: weeks to months on hundreds to thousands of GPUs (e.g., ~21 days on 2048 GPUs*, ~100 days on 384 GPUs**), down to one day on a single GPU for small models.
Fine-tune: hours to days on roughly 1-48 GPUs.
Prompt-tune / LoRA: hours on one to a few GPUs (N/A in some settings).
Inference: 1-16 GPUs, or a single GPU/CPU for smaller models.

[*Touvron, Hugo, et al. "LLaMA: Open and Efficient Foundation Language Models." arXiv preprint arXiv:2302.13971 (2023).]
[**Le Scao et al., 2022] [Geiping & Goldstein, 2023]
Decoding
Ari Kobren
RESEARCH SCIENTIST
ORACLE
Decoding - the process of generating text with an LLM.
Example: "I wrote to the zoo to send me a pet. They sent me ..."
When decoding, temperature is a (hyper)parameter that modulates the distribution over the vocabulary.
When temperature is decreased, the distribution is more peaked around the most likely word.
When temperature is increased, the distribution is flattened over all words.
With sampling on, increasing temperature makes the model deviate more from greedy decoding.
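A small numpy sketch, using a toy vocabulary and made-up model scores (logits), of how temperature reshapes the next-word distribution: low temperature peaks it around the most likely word (close to greedy decoding), high temperature flattens it.

import numpy as np

def next_word_distribution(logits, temperature=1.0):
    """Softmax over temperature-scaled logits (toy illustration)."""
    scaled = np.array(logits, dtype=float) / temperature
    scaled -= scaled.max()               # for numerical stability
    probs = np.exp(scaled)
    return probs / probs.sum()

vocab = ["dog", "cat", "lion", "elephant"]   # hypothetical vocabulary
logits = [2.0, 1.0, 0.2, -1.0]               # hypothetical model scores

for t in (0.2, 1.0, 2.0):
    print(t, dict(zip(vocab, next_word_distribution(logits, t).round(3))))
# t=0.2 is sharply peaked on "dog"; t=2.0 is much flatter, so sampling deviates more from greedy.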
Hallucination
Ari Kobren
RESEARCH SCIENTIST
ORACLE
Hallucination - generated text that is non-factual and/or ungrounded. Example output:
The current driving convention in the United States is to drive on the right
side of the road, in the same direction as traffic flows on streets and
highways. This is based on the system used in the United Kingdom and most of
Europe, which has been in use since the 19th century. During the first half of
the 20th century, the United States gradually adopted the system of driving on
the left side of the road. In the 1950s, most states had converted to this
Convention.
FLAN-T5
There are some methods that are claimed to reduce hallucination (e.g., retrieval-augmentation)
There is no known methodology to reliably keep LLMs from hallucinating.
[Shuster et al, 2021]
Groundedness and Attributability
Grounded - generated text is grounded in a document if the document supports the text.
Attributed QA: the system must output a document that grounds its answer [Bohnet et al, 2022].
Source: Bohnet, Bernd, et al. "Attributed question answering: Evaluation and modeling for attributed large language models." arXiv preprint arXiv:2212.08037 (2022).
Source: Honovich, Or, et al. "TRUE: Re-evaluating Factual Consistency Evaluation." Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022.
LLM Applications
Ari Kobren
RESEARCH SCIENTIST
ORACLE
Retrieval Augmented Generation
Primarily used in QA, where the model has access to (retrieved) support documents for a query.
[Figure: (1) the input query retrieves documents from a corpus; (2) the LLM generates an answer conditioned on the retrieved documents.]
Source: Shuster, Kurt, et al. "Retrieval Augmentation Reduces Hallucination in Conversation." Findings of the Association for Computational Linguistics: EMNLP 2021. 2021.
Source: Lewis, Patrick, et al. "Retrieval-augmented generation for knowledge-intensive nlp tasks." Advances in Neural Information Processing Systems 33 (2020): 9459-9474.
Source: Izacard, Gautier, et al. "Few-shot learning with retrieval augmented language models." arXiv preprint arXiv:2208.03299 (2022).
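A compact sketch of the retrieval-augmented pattern described above; embed() and generate() are hypothetical placeholders for an embedding model and an LLM call, not a real API.

import numpy as np

def embed(text):
    # placeholder embedding model: deterministic pseudo-random vector per text
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.normal(size=16)

def generate(prompt):
    # placeholder LLM call
    return f"<answer conditioned on: {prompt[:60]}...>"

def rag_answer(query, corpus, k=2):
    q = embed(query)
    # 1) retrieve: rank corpus documents by cosine similarity to the query
    def score(doc):
        d = embed(doc)
        return np.dot(q, d) / (np.linalg.norm(q) * np.linalg.norm(d))
    support = sorted(corpus, key=score, reverse=True)[:k]
    # 2) generate: prompt the LLM with the retrieved support documents
    prompt = "Answer using only these documents:\n" + "\n".join(support) + f"\nQuestion: {query}"
    return generate(prompt)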
Code Models
Great fit between training data (code + comments) and test-time tasks (write code + comments) [Github, 2023]. Also, code is structured, which makes it easier to learn.
This is unlike general-purpose LLMs, which are trained on a wide variety of internet text and used for many purposes (other than generating internet text); code models have (arguably) narrower scope.
Source: Chen, Mark, et al. "Evaluating large language models trained on code." arXiv preprint arXiv:2107.03374 (2021).
Multi-Modal
These are models trained on multiple modalities, e.g., language and images.
Models can be autoregressive (e.g., DALL-E) or diffusion-based (e.g., Stable Diffusion) [Ramesh et al, 2022; Rombach et al, 2022].
Diffusion models can produce a complex output simultaneously, rather than token-by-token.
They are difficult to apply to text because text is categorical.
Some attempts have been made; still not very popular [Li et al, 2022; Dieleman et al, 2022].
Source: Ramesh, Aditya, et al. "Zero-shot text-to-image generation." International Conference on Machine Learning. PMLR, 2021.
Source: Rombach, Robin, et al. "High-resolution image synthesis with latent diffusion models." 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2022.
Source: Li, Xiang Lisa, et al. "Diffusion-LM Improves Controllable Text Generation." Advances in Neural Information Processing Systems. 2022.
Source: Yasunaga, Michihiro, et al. "Retrieval-Augmented Multimodal Language Modeling." arXiv preprint arXiv:2211.12561 (2022).
Language Agents
A budding area of research where LLM-based agents
Create plans and "reason"
Take actions in response to plans and the environment
Are capable of using tools
Some notable work in this space:
ReAct [Yao et al, 2022]
Iterative framework where LLM emits thoughts, then acts, and observes result
Toolformer [Schick et al, 2023]
Pre-training technique where strings are replaced with calls to tools that yield result
Bootstrapped reasoning (Zelikman et al,2022]
Prompt the LLM to emit rationalization of intermediate steps; use as fine-tuning data
Source: Yao, Shunyu, et al. "ReAct: Synergizing Reasoning and Acting in Language Models." The Eleventh International Conference on Learning Representations. 2022.
Source: Schick, Timo, et al. "Toolformer: Language models can teach themselves to use tools." arXiv preprint arXiv:2302.04761 (2023).
Source: Zelikman, Eric, et al. "STaR: Bootstrapping Reasoning with Reasoning." Advances in Neural Information Processing Systems 35 (2022).
OCI Generative AI Introduction
Rohit Rahi
VP, CSS OU CLOUD DELIVERY
ORACLE UNIVERSITY

OCI Generative AI Service

Fully managed service that provides a set of customizable Large Language Models (LLMs) available via a single API to build generative AI applications.
[Screenshot: OCI console "Generative AI Overview - Power your apps with large language models and generative AI," with a playground to try out the models out-of-the-box or to create and host your own fine-tuned custom models based on your own data on dedicated AI clusters.]
Choice of Models: high-performing pretrained foundational models from Meta and Cohere.
Flexible Fine-tuning: create custom models by fine-tuning foundational models with your own dataset.
Dedicated AI Clusters: GPU-based compute resources that host your fine-tuning and inference workloads.
[Screenshot: OCI console Generative AI landing page showing Playground, Dedicated AI Clusters, Custom Models, and Endpoints.]
How does OCI Generative AI service work?

Pretrained Foundational Models:
Text Generation (instruction-following models): cohere.command, cohere.command-light, llama-2-70b-chat - generate text.
Text Summarization: cohere.command - summarize text with your instructed format, length, and tone.
Embedding: embed-english-v3.0, embed-multilingual-v3.0, embed-english-light-v3.0, embed-multilingual-light-v3.0, embed-english-light-v2.0 - convert text to vector embeddings for semantic search, including multilingual models.
Fine-tuning

Optimizing a pretrained foundational model on a smaller domain-specific dataset.
Improve model performance on specific tasks.
Improve model efficiency.
Use when a pretrained model doesn't perform your task well or you want to teach it something new.
OCI Generative AI uses T-Few fine-tuning to enable fast and efficient customization.
[Figure: pretrained model + custom data -> fine-tuning -> custom model.]
Dedicated AI Clusters

Dedicated AI clusters are GPU-based compute resources that host the customer's fine-tuning and inference workloads.
The Generative AI service establishes a dedicated AI cluster, which includes dedicated GPUs and an exclusive RDMA cluster network for connecting the GPUs.
The GPUs allocated for a customer's generative AI tasks are isolated from other GPUs.
[Figure: infrastructure view (GPU pool connected by an RDMA network) and logical view (a dedicated AI cluster allocated from the pool, running within dedicated GPUs).]
Demo: Generative AI Service Walkthrough
Rohit Rahi
VP, CSS OU CLOUD DELIVERY
ORACLE UNIVERSITY
Generation Models
Rohit Rahi
VP, CSS OU CLOUD DELIVERY
ORACLE UNIVERSITY
Tokens
Language models understand "tokens" rather than characters.
One token can be a part of a word, an entire word, or punctuation.
A common word such as "apple" is a token.
A word such as "friendship" is made up of two tokens, "friend" and "ship."
The number of tokens per word depends on the complexity of the text.
Simple text: 1 token/word (avg.)
Complex text (less common words): 2-3 tokens/word (avg.)
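OCI's Cohere and Llama models use their own tokenizers, but the idea can be illustrated with any open tokenizer; a sketch using the tiktoken library (an assumption for illustration, not the service's tokenizer):

import tiktoken  # pip install tiktoken; stand-in BPE tokenizer for illustration

enc = tiktoken.get_encoding("cl100k_base")

for word in ["apple", "friendship", "hippopotomonstrosesquippedaliophobia"]:
    token_ids = enc.encode(word)
    pieces = [enc.decode([t]) for t in token_ids]
    print(f"{word!r}: {len(token_ids)} token(s) -> {pieces}")
# Common words tend to map to a single token; rarer or longer words split into several.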
Generative AI Playground

[Screenshot: OCI console playground - choose a model (e.g., cohere.command) and a preset prompt example, then refine the prompts and parameters to fit your use cases. The example asks the model, as a corporate vice president, to generate an email congratulating a team that has just shipped a new cloud service.]

Temperature: determines how creative the model should be; second only to prompt engineering in controlling the output of generation models.
Top p, Top k: two additional ways to pick the output tokens besides temperature.
Stop sequences.
Presence/Frequency penalty: assigns a penalty when a token appears frequently, producing less repetitive text.
Show likelihoods.
Temperature is a (hyper) parameter that controls the randomness of the LLM output.
Example: "The sky is ..."
Temperature of 0 makes the model deterministic (limits the model to using the word with the highest probability).
When temperature is increased, the distribution is flattened over all words.
Top k tells the model to pick the next token from the top k tokens in its list, sorted by probability.

Example: "The name of that country is the ..."
1. United      12%
2. Netherlands  2.7%
3. Czech        1.9%
If Top k is set to 3, the model will only pick from the top 3 options and ignore all others: mostly "United", but "Netherlands" and "Czech" at times.
Top p

Top p is similar to Top k but picks from the top tokens based on the sum of their probabilities.
1. United      12%
2. Netherlands  2.7%
3. Czech        1.9%
If p is set to 0.15, the model will only pick from "United" and "Netherlands", as their probabilities add up to 14.7%.
If p is set to 0.75, the bottom 25% of probable outputs are excluded.
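A sketch of how Top k and Top p filtering are typically implemented over a next-token distribution; the probabilities are the slide's illustrative values, and Top p here follows the slide's convention of keeping tokens while the cumulative probability stays within p.

import numpy as np

tokens = ["United", "Netherlands", "Czech"]
probs = np.array([0.12, 0.027, 0.019])   # illustrative next-token probabilities

def top_k_filter(k):
    """Keep only the k most probable tokens."""
    order = np.argsort(probs)[::-1][:k]
    return [tokens[i] for i in order]

def top_p_filter(p):
    """Keep tokens while their cumulative probability stays within p."""
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    return [tokens[i] for i in order[cum <= p]]

print(top_k_filter(3))     # ['United', 'Netherlands', 'Czech']
print(top_p_filter(0.15))  # ['United', 'Netherlands']  (0.12 + 0.027 = 0.147 <= 0.15)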
Stop Sequences

A stop sequence is a string that tells the model when to stop generating output.
If a period (.) is used as a stop sequence, the model stops generating text once it reaches the end of the first sentence, even if the output token limit is much higher.

Example output (stop sequence "."): "Earth is the third planet from the Sun and the fifth largest planet in the solar system in terms of size and mass."
Frequency and Presence Penalties
These are useful if you want to get rid of repetition in your outputs.
Show Likelihoods

Every time a new token is to be generated, a number between -15 and 0 is assigned to all tokens.
Tokens with higher numbers are more likely to follow the current token.
Example: Book (-4.5), Food (-5.0) - "Book" has the higher likelihood.
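The displayed numbers behave like log-probabilities; a small sketch with made-up logits showing how such per-token likelihood scores (values at or below 0, higher meaning more likely) can be produced:

import numpy as np

candidates = ["Book", "Food", "Zebra"]   # hypothetical next-token candidates
logits = np.array([3.0, 2.5, -4.0])      # hypothetical raw model scores

log_probs = logits - np.log(np.exp(logits).sum())   # log-softmax: all values <= 0
for token, lp in zip(candidates, log_probs):
    print(f"{token}: {lp:.2f}")
# Higher (less negative) numbers mean the token is more likely to follow the current token.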
Summarization Models

Summarization Model (cohere.command)
Same as one of the pretrained text generation models, but with parameters that you can specify for text summarization.
Use cases include, but are not limited to:
News articles, blogs, chat transcripts, scientific articles, meeting notes, and any text that you would like to see a summary of.
Summarization Model Parameters

Temperature: determines how creative the model should be. The default temperature is 1 and the maximum temperature is 5.
Length: approximate length of the summary. Choose from Short, Medium, and Long.
Format: how the summary is displayed, e.g., bullets or auto (free form).
Extractiveness: how much the summary reuses the input text verbatim.
[Screenshot: playground summarizing a passage about Oracle's generative AI strategy (infrastructure, models and services, and applications).]
Embedding Models
Rohit Rahi
VP, CSS OU CLOUD DELIVERY
ORACLE UNIVERSITY
Embeddings

Embeddings are numerical representations of a piece of text converted to number sequences.
A piece of text could be a word, phrase, sentence, paragraph, or one or more paragraphs.
Embeddings make it easy for computers to understand the relationships between pieces of text.
[Figure: tokens "They", "sent", "me" mapped to embedding vectors such as <-0.44, ..., -1.1>, <1.54, ..., -2.92>, <0.91, ..., -1.78>.]
Word Embeddings

Word embeddings capture properties of the word.
[Figure: words plotted along property axes such as Age and Size, plus other properties.]
Cosine and Dot Product Similarity

Cosine and dot product similarity can be used to compute the numerical similarity of embeddings.
Embeddings that are numerically similar are also semantically similar.
E.g., the embedding vector of "Puppy" will be more similar to that of "Dog" than to that of "Lion."
[Figure: word relatedness in two dimensions - puppy, dog, kitten, lion, tiger, elephant.]
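A minimal numpy sketch of cosine similarity over made-up, low-dimensional embedding vectors (real embedding models produce vectors with hundreds or thousands of dimensions):

import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

puppy = [0.9, 1.1, 0.1]    # hypothetical 3-dimensional embeddings
dog   = [1.0, 1.0, 0.0]
lion  = [-0.8, 0.2, 1.5]

print(cosine_similarity(puppy, dog))   # close to 1: semantically similar
print(cosine_similarity(puppy, lion))  # much lower: less similar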
Sentence Embeddings

A sentence embedding associates every sentence with a vector of numbers.
Similar sentences are assigned to similar vectors; different sentences are assigned to different vectors.
The embedding vector of "canine companions say" will be more similar to the embedding vector of "woof" than to that of "meow."
[Figure: example sentence embeddings for "Feline friends say", "meow", "Canine companion says", "woof", "Bovine buddies say", "moo", and "A quarterback throws a football".]
Embeddings use case
Embedding Models in Generative AI

English and multilingual models: embed-english-v3.0, embed-multilingual-v3.0 (Cohere).
The model creates a 1024-dimensional vector for each embedding.
Max 512 tokens per embedding.
Prompt Engineering
Rohit Rahi
Prompt & Prompt Engineering

Prompt: the input or initial text provided to the model.
Prompt Engineering: the process of iteratively refining a prompt for the purpose of eliciting a particular style of response.
[Figure: Prompt (input) -> Large Language Model -> Generated Text (output).]
LLMs as next word predictors

Prompt: "Four score and seven years ago our"
Completion: "forefathers brought forth a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal. ... that this nation, under God, shall have a new birth of freedom -- and that government of the people, by the people, for the people, shall not perish from the earth."
Aligning LLMs to follow instructions

A completion LLM is trained to predict the next word on a large dataset of Internet text, rather than to safely perform the language task that the user wants.
Aligned (instruction-tuned) chat models, such as Llama 2-Chat, are fine-tuned on top of the base completion LLM so that they follow user instructions.
[Screenshot: "Llama 2: Open Foundation and Fine-Tuned Chat Models," Touvron et al., GenAI, Meta, 2023.]
k-shot prompting - explicitly providing k examples of the intended task in the prompt [Brown et al, 2020].
(e.g., the GPT-3 paper's translation prompt ending in "cheese =>")
LLMs are trained on a specific prompt format. If you format prompts differently, you may get odd/inferior results.

Example - the Llama 2 chat format:
<s>[INST] <<SYS>>
{{system_prompt}}
<</SYS>>

{{user_message}} [/INST]

<s> marks the beginning of the sequence, <<SYS>> ... <</SYS>> wraps the system instructions, and [INST] ... [/INST] wraps the user message specifying instructions to the model.
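A small helper that fills the Llama 2 template above for a single turn; this is a sketch, and chat APIs usually assemble this string for you:

def build_llama2_prompt(system_prompt: str, user_message: str) -> str:
    """Assemble a single-turn Llama 2 chat prompt in the format shown above."""
    return (
        "<s>[INST] <<SYS>>\n"
        f"{system_prompt}\n"
        "<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )

prompt = build_llama2_prompt(
    system_prompt="You are a helpful, concise assistant.",
    user_message="Summarize the benefits of prompt formatting.",
)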
Advanced Prompting Strategies
Chain-of-Thought - provide examples in a prompt that show responses that include a reasoning step.
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can
has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis
balls.5 + 6 = 11. The answer is 11.
Oracle Cloud Infrastructure
Rohit Rahi
VP, CSS OU CLOUD DELIVERY
ORACLE UNIVERSITY
Training LLMs from scratch with my data?

In-context Learning / Few-shot Prompting
Fine-tuning a pretrained model

Optimize a model on a smaller domain-specific dataset.
Recommended when a pretrained model doesn't perform your task well or when you want to teach it something new.
Adapt to a specific style and tone, and learn human preferences.
[Figure: customer data (unique new skills, unique writing style, domain) + base model (6B/52B) -> fine-tune -> customer-specific model (base model weights + fine-tuned layer weights).]
Fine-tuning Benefits

Improve model performance on specific tasks: a more effective mechanism of improving model performance than prompt engineering.
By customizing the model to domain-specific data, it can better understand and generate contextually relevant responses.
[Example: a support assistant checks an uploaded receipt ("F/W 23 LONG DRESS $380") and replies that the dress was purchased at full price and can be returned through the online portal.]
Customize LLMs with your data

Few-shot Prompting
  Description: provide examples in the prompt to steer the model to better performance.
  When to use: the LLM already understands the topics necessary for the text generation.
  Pros: very simple; no training cost.
  Cons: adds latency to each model request.

Fine-tuning
  Description: adapt a pretrained LLM to perform a specific task on private data.
  When to use: the LLM does not perform well on a particular task; the data required to adapt the LLM is too large for prompt engineering; latency with the current LLM is too high.
  Pros: increase in performance on a specific task; no impact on model latency.
  Cons: requires a labeled dataset, which can be expensive and time-consuming to acquire.

Prompt Engineering is the easiest to start with; test and learn quickly.
If you need more context, then use Retrieval Augmented Generation (RAG).
If you need more instruction following, then use Fine-tuning.
[Figure: quadrant of Prompt Engineering, RAG, Fine-tuning, and all of them, along two axes - Context Optimization (what the model needs to know) and LLM Optimization (how the model needs to act).]
Model Endpoint: a designated point on a dedicated AI cluster where a large language model can accept user requests and send back responses, such as the model's generated text.

Workflow: Create a Dedicated AI Cluster (Hosting) -> Create Endpoint -> Serve Model
Dedicated AI Clusters

Effectively a single-tenant deployment where the GPUs in the cluster only host your custom models.
The minimum cluster size is easier to estimate based on the expected throughput.

Cluster Types:
Fine-tuning: used for training a pretrained foundational model.
Hosting: used for hosting a custom model endpoint for inference.

[Screenshot: "Create dedicated AI cluster" console dialog - compartment, cluster type (Hosting / Fine-tuning), base model (cohere.command), instance count, and a commitment of 744 unit hours for a hosting dedicated AI cluster that can host models with the same base model by creating endpoints on the cluster.]
T-Few Fine-tuning
Traditionally, vanilla fine-tuning involves updating the weights of all (or most of) the layers in the model, requiring longer training time and higher serving (inference) costs.
T-Few fine-tuning is an additive Few-Shot Parameter-Efficient Fine-Tuning (PEFT) technique that inserts additional layers comprising ~0.01% of the baseline model's size.
The weight updates are localized to the T-Few layers during the fine-tuning process.
Isolating the weight updates to these T-Few layerssignificantly reduces the overall
training time and cost compared to updating all layers.
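The key idea can be illustrated with a PyTorch-style sketch in which the base layer's weights are frozen and only a tiny set of added parameters is trained. This is an (IA)3-style illustration of additive parameter-efficient fine-tuning, not the service's actual T-Few implementation:

import torch
import torch.nn as nn

class ScaledLinear(nn.Module):
    """Wraps a frozen linear layer and rescales its output with a small learned vector."""
    def __init__(self, base_linear: nn.Linear):
        super().__init__()
        self.base = base_linear
        for p in self.base.parameters():
            p.requires_grad = False                      # base weights stay frozen
        # one learnable scale per output feature: a tiny fraction of the base parameters
        self.scale = nn.Parameter(torch.ones(base_linear.out_features))

    def forward(self, x):
        return self.base(x) * self.scale                 # only self.scale is trained

layer = ScaledLinear(nn.Linear(4096, 4096))
trainable = [p for p in layer.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-3)        # updates are localized to the added params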
T-Few fine-tuning process

The T-Few fine-tuning process begins by utilizing the initial weights of the base model and an annotated training dataset.
Annotated data comprises input/output pairs employed in supervised training.
[Figure: OCI Generative AI Service - base model weights + annotated customer training data -> T-Few fine-tuning.]
Dedicated AI Cluster Unit Types

Large Cohere: cohere.command - dedicated AI cluster unit for hosting or fine-tuning the cohere.command model (dedicated-unit-large-cohere-count).
Small Cohere: cohere.command-light - dedicated AI cluster unit for hosting or fine-tuning the cohere.command-light model (dedicated-unit-small-cohere-count).
Embed Cohere: cohere.embed - dedicated AI cluster unit for hosting the cohere.embed models (dedicated-unit-embed-cohere-count).
Llama2-70: llama2_70b-chat - dedicated AI cluster unit for hosting the Llama 2 models (dedicated-unit-llama2-70-count).
Dedicated AI Cluster Units Sizing

Text Generation (cohere.command): fine-tuning cluster - unit size Large Cohere, 2 units required; hosting cluster - unit size Large Cohere, 1 unit required.
Text Generation (cohere.command-light): fine-tuning cluster - unit size Small Cohere, 2 units required; hosting cluster - unit size Small Cohere, 1 unit required.
Text Generation (llama2_70b-chat): fine-tuning not available; hosting cluster - unit size Llama2-70, 1 unit required.
Summarization (cohere.command): fine-tuning not available; hosting cluster - unit size Large Cohere, 1 unit required.
Embedding (cohere.embed): fine-tuning not available; hosting cluster - unit size Embed Cohere, 1 unit required.
Example:
To create a dedicated AI cluster to fine-tune a cohere.command model, you need two Large Cohere units.
To host this fine-tuned model, you need a minimum of one Large Cohere unit.
In total, you need three Large Cohere units (dedicated-unit-large-cohere-count = 3).
Dedicated AI Clusters Sizing

Fine-tuning Dedicated AI Cluster: requires two units for the base model chosen.
Hosting Dedicated AI Cluster: requires one unit for the base model chosen. You can create up to 50 endpoints that point to the different models hosted on the same hosting cluster.
[Screenshot: console listing of "cluster-finetune" and "cluster-host" dedicated AI clusters - unit size Small Cohere, number of units 1, state Active.]
Rohit Rahi
VP, CSS OU CLOUD DELIVERY
ORACLE UNIVERSITY
Fine-tuning Configuration

Training Methods:
Vanilla: traditional fine-tuning method.
T-Few: parameter-efficient fine-tuning method.

Hyperparameters:
Total Training Epochs, Learning Rate, Training Batch Size, Early Stopping Patience (plus advanced options).
[Screenshot: fine-tuning configuration dialog - model type, compatible dedicated AI cluster, and hyperparameters (e.g., total training epochs 3, learning rate 0.01, training batch size 16).]
Loss

Accuracy may tell you how many predictions the model got wrong, but it will not describe how incorrect the wrong predictions are.
To evaluate generative models for loss, we ask the model to predict certain words in the user-provided data and evaluate how wrong the incorrect predictions are.
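A small sketch of how such a loss behaves: cross-entropy over the model's predicted distribution for a held-out word penalizes confidently wrong predictions more than near-misses (toy probabilities and a hypothetical three-word vocabulary):

import numpy as np

def cross_entropy(predicted_probs, true_index):
    """Negative log-probability assigned to the correct word; lower is better."""
    return -float(np.log(predicted_probs[true_index]))

# vocabulary: ["woof", "meow", "moo"]; the held-out word is "woof" (index 0)
confident_right = np.array([0.90, 0.05, 0.05])
near_miss       = np.array([0.40, 0.35, 0.25])
confident_wrong = np.array([0.05, 0.90, 0.05])

for name, p in [("confident_right", confident_right),
                ("near_miss", near_miss),
                ("confident_wrong", confident_wrong)]:
    print(name, round(cross_entropy(p, 0), 3))
# 0.105, 0.916, 2.996: loss grows as predictions become more wrong, unlike plain accuracy.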
Rohit Rahi
VP, CSS OU CLOUD DELIVERY
ORACLE UNIVERSITY
OCI Generative AI Security
Rohit Rahi
VP, CSS OU CLOUD DELIVERY
ORACLE UNIVERSITY
Dedicated GPU and RDMA Network

[Figure: infrastructure view (GPU pool connected by an RDMA network) and logical view (a dedicated AI cluster of GPUs allocated from the pool, running within dedicated GPUs).]
Model Endpoints
handles t
For strong data privacy and security, a dedicated GPUcluster only
models of a single customer.
Base model + fine-tuned model endpoints share the same cluster resOurces
efficient utilization of underlying GPUs in the dedicated Al cluster.
[Figure: applications App X and App Y access model endpoints through IAM; a custom model (customer fine-tuned weights X) and the base model (base model weights) run on Dedicated AI Cluster 1 inside the OCI Generative AI Service, with weights stored in Object Storage buckets.]