LLaVA - Large Multimodal Model
LLaVA - Large Open Source Multimodal Model | Chat with Images like GPT-4
Large Language Models (LLMs) allow us to generate text, but they only take text as
input. Large Multimodal Models (LMMs) can take both text and images as input and
generate text based on both. So, you can chat with your model about an image.
OpenAI has released their GPT-4V(ision) [1] model, which integrates nicely with the ChatGPT
interface. However, open-source models are on the way. LLaVA is one of them.
In this part, we will be using a Jupyter Notebook to run the code. If you prefer to
follow along, you can find the notebook on GitHub: GitHub Repository
What is LLaVA?
LLaVA [2], a Large Multimodal Model (LMM), allows you to have image-based
conversations. Similar to GPT-4V but without the price tag, LLaVA is free and open
source.
LLaVA represents a novel end-to-end trained large multimodal model that combines a
vision encoder and Vicuna for general-purpose visual and language understanding,
achieving impressive chat capabilities mimicking spirits of the multimodal GPT-4 and
setting a new state-of-the-art accuracy on Science QA.
So, LLaVA combines a vision encoder and an open-source LLM (Vicuna, in this case).
LLaVA 1.5
The LLaVA-1.5 [3] model offers a solid improvement on all benchmarks compared to the
original model. It is trained on 1.2M data points, adds an academic-task-oriented VQA
dataset, and trains in ~1 day on an 8-A100 node.
We're going to use the 13B model checkpoint and load it with the llava-torch library
in 4-bit format. How good is it? Let's find out.
Setup
Setting up the LLaVA library requires installing the following dependencies:
pip install -Uqqq pip --progress-bar off
pip install -qqq torch==2.1 --progress-bar off
pip install -qqq transformers==4.34.1 --progress-bar off
pip install -qqq accelerate==0.23.0 --progress-bar off
pip install -qqq bitsandbytes==0.41.1 --progress-bar off
pip install -qqq llava-torch==1.1.1 --progress-bar off
The last package, llava-torch, is the LLaVA library. Let's add the necessary imports:
import textwrap
from io import BytesIO

import requests
import torch
from llava.constants import DEFAULT_IMAGE_TOKEN, IMAGE_TOKEN_INDEX
from llava.conversation import SeparatorStyle, conv_templates
from llava.mm_utils import (
    KeywordsStoppingCriteria,
    get_model_name_from_path,
    process_images,
    tokenizer_image_token,
)
from llava.model.builder import load_pretrained_model
from llava.utils import disable_torch_init
from PIL import Image
disable_torch_init()
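Before we move on, it's worth confirming that a CUDA GPU is visible, since we'll load the 13B checkpoint in 4-bit via bitsandbytes. This quick check is my own addition, not part of the original notebook:

# Optional sanity check (not in the original notebook): 4-bit loading requires a GPU.
assert torch.cuda.is_available(), "A CUDA-capable GPU is required for 4-bit loading"
print(torch.cuda.get_device_name(0))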
Data
To reproduce the results, we need to download the following images:
!gdown AnpSrAod-apd1@D305XXQhjMa2ja7FEH
!gdown 1Qnutc8S7F6jMNERKIZBgiAePynDC}3Ti
!gdown 1XH7QgiuN}7Kjapaetjy#x"VWSdQaqsaH
!gdown 1n9v8EVZ16sYcULCGUHBPFULxFxam190U
!gdown 1x7XtPRG-IbSxyCO-ZT0_P7JirwRFY-3N
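If you prefer to stay inside Python, gdown also exposes a download function. Here is a minimal sketch; the file ID below is a placeholder, so substitute the real IDs from the commands above:

import gdown

# Download a single file from Google Drive by ID (placeholder ID, for illustration only).
file_id = "YOUR_FILE_ID"
gdown.download(f"https://drive.google.com/uc?id={file_id}", "bike-girl.jpeg", quiet=False)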
Download Model
We'll use the 13B model checkpoint and load it with the llava-torch library in 4-bit
format. Let's start by getting its name:
MODEL = "4bit/llava-v1.5-13b-3GB"
model_name = get_model_name_from_path(MODEL)
model_name

'llava-v1.5-13b-3GB'
To load the model, tokenizer, and image processor we can use the
load_pretrained_model helper function:
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=MODEL, model_base=None, model_name=model_name, load_4bit=True
)
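The helper returns the tokenizer, the quantized model, the image processor, and the model's context length. A quick, optional sanity check (my own addition; attribute names are standard HuggingFace ones, so treat this as a sketch):

# Inspect what the loader returned (optional check, not in the original notebook).
print(type(model).__name__)  # the LLaVA model class
print(model.device)          # should be a CUDA device, since load_4bit=True
print(context_len)           # maximum context length reported by the loader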
Image Preprocessing and Prompt
We need a way to load the image and process it for the model. Let's create a helper
function for loading the image using PIL:
def load_image(image_file):
    if image_file.startswith("http://") or image_file.startswith("https://"):
        response = requests.get(image_file)
        image = Image.open(BytesIO(response.content)).convert("RGB")
    else:
        image = Image.open(image_file).convert("RGB")
    return image
The function will load a local file or download it from a URL (via the requests library).
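For example (the URL below is just a placeholder to illustrate that remote images work too):

# Load a local file downloaded earlier; a remote URL would work the same way.
local_image = load_image("bike-girl.jpeg")
# remote_image = load_image("https://example.com/some-image.jpg")  # placeholder URL
print(local_image.size)  # PIL reports (width, height)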
Next, we'll create a function that will process the image for the model:
def process_image(image):
    args = {"image_aspect_ratio": "pad"}
    image_tensor = process_images([image], image_processor, args)
    return image_tensor.to(model.device, dtype=torch.float16)
Let's try it out:
image = load_image("bike-girl.jpeg")
processed_image = process_image(image)
type(processed_image), processed_image.shape
(torch.Tensor, torch.Size([1, 3, 336, 336]))
The functions load the image and process it for the model by converting it into a Tensor.
Next, we'll create a function that will create a prompt using the correct template:
CONV_MODE = "llava_v0"
def create_prompt(prompt: str):
    conv = conv_templates[CONV_MODE].copy()
    roles = conv.roles
    prompt = DEFAULT_IMAGE_TOKEN + "\n" + prompt
    conv.append_message(roles[0], prompt)
    conv.append_message(roles[1], None)
    return conv.get_prompt(), conv
prompt, _ = create_prompt("Describe the image")
print(prompt)
The function takes care of adding the special image token and the roles to the prompt. Here's the
final template:
A chat between a curious human and an artificial intelligence assistant. The
assistant gives helpful, detailed, and polite answers to the human's questions.
###Human: <image>
Describe the image###Assistant:
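The template comes from LLaVA's conversation registry. If you're curious what got copied, you can peek at the template object (attribute names follow llava.conversation in llava-torch 1.1.x, so treat this as a sketch):

# Inspect the conversation template behind the prompt (optional).
conv = conv_templates[CONV_MODE]
print(conv.roles)      # e.g. ('Human', 'Assistant')
print(conv.sep)        # turn separator used when assembling the prompt, e.g. '###'
print(conv.sep_style)  # SeparatorStyle value used by get_prompt()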
We have a prompt and a way to process the image. Let's create a function that will ask
the model a question about the image:
def ask_image(image: Image, prompt: str):
    # Preprocess the image and build the conversation prompt
    image_tensor = process_image(image)
    prompt, conv = create_prompt(prompt)

    # Tokenize the prompt, mapping the image placeholder to the image token index
    input_ids = (
        tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt")
        .unsqueeze(0)
        .to(model.device)
    )

    # Stop generation when the conversation separator is produced
    stop_str = conv.sep if conv.sep_style != SeparatorStyle.TWO else conv.sep2
    stopping_criteria = KeywordsStoppingCriteria(
        keywords=[stop_str], tokenizer=tokenizer, input_ids=input_ids
    )

    with torch.inference_mode():
        output_ids = model.generate(
            input_ids,
            images=image_tensor,
            do_sample=True,
            temperature=0.01,
            max_new_tokens=512,
            use_cache=True,
            stopping_criteria=[stopping_criteria],
        )

    # Decode only the newly generated tokens
    return tokenizer.decode(
        output_ids[0, input_ids.shape[1] :], skip_special_tokens=True
    ).strip()
The function takes care of the following: creating the prompt, tokenizing it, generating
the output, and decoding it. The interface is very similar to other generative models from
the HuggingFace ecosystem.
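Since we'll be asking several questions about the same image, a tiny convenience wrapper (my own addition, not part of the LLaVA library) can save some typing:

def ask_many(image: Image, prompts: list) -> dict:
    # Ask several questions about the same image and collect the answers.
    return {prompt: ask_image(image, prompt) for prompt in prompts}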
Q&A Over Image
Let's load our first image:
Girl on a bike
We can start with a simple question:
result = ask_image(image, "Describe the image")
print (textwrap.fill(result, width=110))
The image features a woman sitting on a motorcycle, which is parked on a brick
driveway in front of a house. She is wearing a black leather outfit, which
includes a leather jacket and leggings. The motorcycle is positioned prominently
in the scene, with the woman sitting comfortably on it. The house in the
background adds a sense of context to the scene, suggesting that the woman may
be preparing to ride the motorcycle or has just arrived at her destination.
The description is quite detailed and good overall. Let's ask something more specific:
result = ask_image(image, "Does the woman wear a helmet?”)
print (textwrap.fill(result, width=110))
Yes, the woman is wearing a helmet while sitting on the motorcycle.
The model has failed to answer the question correctly. Let's ask something similar, trying
to make the model reason about the image:
result = ask_image(
    image,
    "Take a look at the woman's head. What is the color of her skin? Does she wear a helmet?",
)
print(textwrap.fill(result, width=110))
The woman's skin color is white, and she is not wearing a helmet.
This time around, the model has answered correctly. Asking it to focus on the woman's
head and the color of her skin helped us get a correct response.
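If you find yourself using this "focus first" trick often, you can fold it into a small helper (again, my own addition built on top of ask_image):

def ask_with_focus(image: Image, region: str, question: str) -> str:
    # Nudge the model to look at a specific region before answering.
    return ask_image(image, f"Take a look at {region}. {question}")

# e.g. ask_with_focus(image, "the woman's head", "Does she wear a helmet?")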
OCR & Document Understanding
Let's try something more challenging. Can the model read and understand documents?
We'll use the following image from the Bitcoin whitepaper:
First page of Bitcoin paper
%%time
result = ask_image(image, "What is the title of the paper?")
print(textwrap.fill(result, width=110))
Bitcoin: A Peer-to-Peer Electronic Cash System
Great, the model has correctly extracted the title of the paper. Let's see if it can extract
the abstract:
%%time
result = ask_image(image, "Extract the text from the abstract")
print(textwrap.fill(result, width=110))
Bitcoin: A Peer-to-Peer Electronic Cash System
It got that wrong. It extracted the title again, but nothing from the abstract. Again, we
can try to make the model reason about the image by asking for a summary of the
abstract:
%%time
result = ask_image(image, "Summarize the abstract of the paper in 2 sentences.")
print(textwrap.fill(result, width=110))
The paper discusses the concept of a peer-to-peer electronic cash system,
focusing on the Bitcoin system. It highlights the advantages of this system,
such as its decentralized nature, security, and potential for financial
inclusion. The paper also addresses some of the challenges and limitations of
the Bitcoin system, such as scalability and regulatory issues.
Much better! LLaVA has correctly extracted the abstract and summarized it in 2
sentences.
Price Chart
We can also ask the model to reason about charts. Let's try it with the following Bitcoin
price chart:
Bitcoin price chart
result = ask_image(
    image,
    "This is a chart of Bitcoin price. What is the current price according to the chart?",
)
print(textwrap.fill(result, width=110))
The current price of Bitcoin according to the chart is $23,000.
It got that wrong. It wasn't able to read the correct value from the chart ($28.9k).
Captcha
Another interesting use case is to ask the model to solve a captcha. Let's try with
something simple:
Captcha
%%time
result = ask_image(image, "Extract the text from the image")
print(textwrap.fill(result, width=110))
Total failure; it didn't even get the number of characters right.
Meme
Our final experiment will be to ask the model to reason about a meme. Let's try with the
following one:
meme
%%time
result = ask_image(image, "Is this funny and why?")
print(textwrap.fill(result, width=110))
Yes, this image is funny because it humorously represents the process of
learning by showing a person's brain going through different stages of learning.
The image features a series of four pictures of a brain, each representing a
different stage of learning, such as from university, online courses, YouTube,
and articles. The visual representation of the brain's journey through these
stages is exaggerated and comical, making it a light-hearted and entertaining
image.
The model has correctly identified the meme as funny but has provided a very generic
answer. It didn't note the different sources of education and the funny side of their
ranking. Let's specifically ask for the ranking:
%%time
result = ask_image(
    image,
    "Order all learning resources sorted by usefulness in a list, according to the image.",
)
print(textwrap.fill(result, width=110))
Online Courses
YouTube
University
Articles
Memes
This one is interesting. The model has correctly identified the different sources of
education (the OCR did work), but I would say it didn't get the ranking right: it has put
memes at the bottom, while according to the image, they are the best. Keep in mind that
this particular meme might've been included in the training set.
Conclusion
While the LLaVA model can be used to understand images, it is not perfect. It can
extract text from images, summarize and describe them, but it struggles with more
complex reasoning. However, it is a great start, and I'm looking forward to seeing more
open-source LMMs, possibly beating GPT-4V and other commercial models.
References
1. GPT-4V(ision) system card
2. Visual Instruction Tuning
3. Improved Baselines with Visual Instruction Tuning