Model Usage

Uploaded by

rahul wankhade

https://www.nitinkapse.com/ https://nichethyself.com/

Here’s a table listing various popular machine learning models and frameworks, along with their
primary usage in fields such as audio, vision, language processing, and more:

| Model/Framework | Primary Usage | Domain |
| --- | --- | --- |
| Whisper | Speech recognition, transcription | Audio (Speech-to-Text) |
| CLIP | Image and text alignment, zero-shot learning | Vision & Language |
| GPT (Generative Pre-trained Transformer) | Text generation, language understanding | Language Processing |
| BERT | Text classification, question answering | Language Processing |
| DALL·E | Image generation from text descriptions | Vision & Text |
| ViT (Vision Transformer) | Image classification, object detection | Vision |
| YOLO (You Only Look Once) | Real-time object detection | Vision (Computer Vision) |
| VQ-VAE-2 | Image generation, compression | Vision |
| StyleGAN | High-quality image generation | Vision (Image Synthesis) |
| Stable Diffusion | Text-to-image generation, artistic creation | Vision & Text |
| Wav2Vec 2.0 | Speech recognition, audio processing | Audio (Speech-to-Text) |
| DeepSpeech | Automatic speech recognition | Audio |
| T5 (Text-to-Text Transfer Transformer) | Text generation, summarization, translation | Language Processing |
| PaLM | Text generation, understanding, multilingual tasks | Language Processing |
| OpenAI Codex | Code generation, code completion | Programming/Code |
| Tacotron | Speech synthesis (Text-to-Speech) | Audio (Speech Synthesis) |
| WavLM | Speech enhancement, speech recognition | Audio |
| LLaMA | Language generation and comprehension | Language Processing |
| OPT (Open Pretrained Transformer) | Language tasks, text generation | Language Processing |
| DeepLab | Image segmentation | Vision (Computer Vision) |
| ResNet | Image classification, object detection | Vision |
| VGG | Image classification | Vision |
| CycleGAN | Image-to-image translation (e.g., style transfer) | Vision |
| BART | Text summarization, machine translation | Language Processing |
| Swin Transformer | Image classification, object detection | Vision |
| TransUNet | Medical image segmentation | Vision (Medical Imaging) |
| BigGAN | High-resolution image synthesis | Vision |
| OpenAI CLIP | Multi-modal learning (image and text) | Vision & Text |
| FastSpeech | Text-to-Speech synthesis | Audio (Speech Synthesis) |
| Reformer | Efficient Transformer for long text generation | Language Processing |
| SAM (Segment Anything Model) | Object segmentation in images | Vision (Object Segmentation) |
| SEER | Self-supervised image learning, classification | Vision |

Key Insights:

● Audio Models: Whisper, DeepSpeech, Wav2Vec 2.0, and Tacotron are widely used for
tasks involving speech recognition, transcription, and synthesis.
● Vision Models: YOLO, ResNet, ViT, and StyleGAN dominate in object detection,
classification, and image generation tasks.
● Language Models: GPT, BERT, and T5 focus on text generation, understanding, and
summarization.
● Multi-modal Models: CLIP, DALL·E, and Stable Diffusion work across both text and
vision domains, handling tasks such as image generation from text or aligning images
and text.

These models are designed for specialized tasks, but some of them, like GPT or CLIP, have a
broader range of applications across multiple domains.
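The grouping used in the Key Insights can be reproduced with a small lookup. The sketch below is illustrative only: `CATALOG` holds a handful of rows copied from the table, and `top_level_domain` and `group_by_domain` are hypothetical helper names, not part of any library. It assumes that any domain label containing "&" counts as multi-modal.

```python
from collections import defaultdict

# A few rows copied from the table: (model, primary usage, domain).
CATALOG = [
    ("Whisper", "Speech recognition, transcription", "Audio (Speech-to-Text)"),
    ("Tacotron", "Speech synthesis (Text-to-Speech)", "Audio (Speech Synthesis)"),
    ("YOLO", "Real-time object detection", "Vision (Computer Vision)"),
    ("ResNet", "Image classification, object detection", "Vision"),
    ("BERT", "Text classification, question answering", "Language Processing"),
    ("CLIP", "Image and text alignment, zero-shot learning", "Vision & Language"),
    ("Stable Diffusion", "Text-to-image generation, artistic creation", "Vision & Text"),
]


def top_level_domain(domain: str) -> str:
    """Collapse a domain label into one of the four Key Insights groups."""
    # Assumption: labels such as "Vision & Text" mark multi-modal models.
    if "&" in domain:
        return "Multi-modal"
    # Drop a parenthesised qualifier such as "(Speech-to-Text)".
    base = domain.split("(")[0].strip()
    return "Language" if base == "Language Processing" else base


def group_by_domain(catalog):
    """Map each top-level group to the models listed under it, in table order."""
    groups = defaultdict(list)
    for model, _usage, domain in catalog:
        groups[top_level_domain(domain)].append(model)
    return dict(groups)


if __name__ == "__main__":
    for group, models in group_by_domain(CATALOG).items():
        print(f"{group}: {', '.join(models)}")
```

Extending `CATALOG` with the remaining rows of the table gives the full grouping; the helper functions need no changes for that.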

| Model/Framework | Primary Usage | Domain |
| --- | --- | --- |
| Whisper | Speech recognition, transcription | Audio (Speech-to-Text) |
| CLIP | Image and text alignment, zero-shot learning | Vision & Language |
| GPT (Generative Pre-trained Transformer) | Text generation, language understanding | Language Processing |
| Claude 1 | Conversational AI, safe language generation | Language Processing |
| Claude 2 | Advanced conversational AI, text understanding | Language Processing |
| Databricks Dolly | Fine-tuned language model for enterprise applications | Language Processing |
| BERT | Text classification, question answering | Language Processing |
| DALL·E | Image generation from text descriptions | Vision & Text |
| ViT (Vision Transformer) | Image classification, object detection | Vision |
| YOLO (You Only Look Once) | Real-time object detection | Vision (Computer Vision) |
| VQ-VAE-2 | Image generation, compression | Vision |
| StyleGAN | High-quality image generation | Vision (Image Synthesis) |
| Stable Diffusion | Text-to-image generation, artistic creation | Vision & Text |
| Wav2Vec 2.0 | Speech recognition, audio processing | Audio (Speech-to-Text) |
| DeepSpeech | Automatic speech recognition | Audio |
| T5 (Text-to-Text Transfer Transformer) | Text generation, summarization, translation | Language Processing |
| PaLM | Text generation, understanding, multilingual tasks | Language Processing |
| OpenAI Codex | Code generation, code completion | Programming/Code |
| Tacotron | Speech synthesis (Text-to-Speech) | Audio (Speech Synthesis) |
| WavLM | Speech enhancement, speech recognition | Audio |
| LLaMA | Language generation and comprehension | Language Processing |
| OPT (Open Pretrained Transformer) | Language tasks, text generation | Language Processing |
| DeepLab | Image segmentation | Vision (Computer Vision) |
| ResNet | Image classification, object detection | Vision |
| VGG | Image classification | Vision |
| CycleGAN | Image-to-image translation (e.g., style transfer) | Vision |
| BART | Text summarization, machine translation | Language Processing |
| Swin Transformer | Image classification, object detection | Vision |
| TransUNet | Medical image segmentation | Vision (Medical Imaging) |
| BigGAN | High-resolution image synthesis | Vision |
| OpenAI CLIP | Multi-modal learning (image and text) | Vision & Text |
| FastSpeech | Text-to-Speech synthesis | Audio (Speech Synthesis) |
| Reformer | Efficient Transformer for long text generation | Language Processing |
| SAM (Segment Anything Model) | Object segmentation in images | Vision (Object Segmentation) |
| SEER | Self-supervised image learning, classification | Vision |
| Databricks Lakehouse AI | AI and machine learning for enterprise data lakehouse | Enterprise AI |

Key Additions:

● Claude models, developed by Anthropic, focus on conversational AI with an emphasis
on safety and steerable language generation.
● Databricks Dolly is an instruction-tuned language model from Databricks, aimed at
enterprise applications and business use cases on the Databricks platform.
● Databricks Lakehouse AI is a platform offering rather than a single model: it provides
tooling for enterprise-level AI and machine learning, integrated with the Lakehouse
architecture for handling large-scale data.

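Here’s a list of models and frameworks for reading and extracting tabular data from PDFs,
images, or scanned documents. PDF parsers such as Camelot and pdfplumber read the text
layer of the file directly, while the image-based tools combine OCR (Optical Character
Recognition) with deep learning techniques to parse structured data like tables. TabNet is
the exception: it models data that is already tabular rather than extracting tables from
documents.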

| Model/Framework | Primary Usage | Domain |
| --- | --- | --- |
| TabNet | Interpretable deep learning model for tabular data | Tabular Data |
| Camelot | Extracting tables from PDFs | PDF/Table Extraction |
| pdfplumber | Parsing and extracting tables and text from PDFs | PDF/Table Extraction |
| Tesseract OCR | OCR for extracting text and simple tables from images/PDFs | OCR for Images & PDFs |
| PaddleOCR | OCR for table and text extraction, supports multi-language | OCR for Images & PDFs |
| TableNet | Extracting tabular data from document images | Table Detection in Images |
| DeepDeSRT | Detecting and recognizing table structures in scanned documents | Table Detection in PDFs/Images |
| DocTR (Document Text Recognition) | OCR for detecting and recognizing structured text like tables in documents | OCR & Document Analysis |
| Adobe PDF Extract API | Extracting structured data including tables from PDFs | PDF/Table Extraction |
| PyMuPDF (Fitz) | Extracting content (text, tables) from PDF documents | PDF Parsing |
| Tabula | Extracting tables from PDFs into CSV/Excel | PDF/Table Extraction |
| Keras-OCR | OCR for detecting and extracting text and tables from images | OCR for Images |
| LayoutLM | Pre-trained model for reading and extracting structured data from scanned documents | Document Understanding/OCR |
| TrOCR (Transformer OCR) | OCR model based on Transformer architecture for extracting text and tables | OCR for Documents |
| Amazon Textract | Automated text and table extraction from documents | OCR for PDFs & Images |
| Google Cloud Vision API | OCR with table detection capabilities for scanned images | OCR for Images & PDFs |

Overview of Popular Models:

1. Camelot, Tabula, pdfplumber: Focus on extracting tables from PDFs and converting
them into structured formats like CSV or Excel.
2. Tesseract OCR, PaddleOCR: Used for general OCR tasks like reading text and simple
tables from images or scanned documents.
3. TableNet, DeepDeSRT: Specifically designed to detect and extract tabular structures
in scanned documents or images.
4. LayoutLM: Pre-trained language model focused on document understanding, useful
for recognizing structured data like tables in scanned documents.
5. Amazon Textract, Google Cloud Vision API: Cloud-based APIs for extracting text,
tables, and forms from documents.

These tools and models provide capabilities for converting unstructured data (like tables in
PDFs or images) into structured formats, making it easier to analyze and process the data
programmatically.
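As a rough illustration of the PDF-to-CSV path described above, here is a minimal sketch using pdfplumber, one of the listed tools. It assumes a PDF with a real text layer (not a scan) and that pdfplumber is installed. The helper names `clean_cell`, `rows_to_csv`, and `extract_tables` are hypothetical, chosen for this example. Only `pdfplumber.open`, `pdf.pages`, and `page.extract_tables()` come from the library itself.

```python
import csv
import io


def clean_cell(cell):
    """Normalise one extracted cell: None becomes '', runs of whitespace become one space."""
    if cell is None:
        return ""
    return " ".join(str(cell).split())


def rows_to_csv(rows):
    """Serialise one table (a list of rows, each a list of cells) to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow([clean_cell(cell) for cell in row])
    return buffer.getvalue()


def extract_tables(pdf_path):
    """Return every table pdfplumber finds in the PDF, page by page."""
    # Third-party dependency, imported lazily so the helpers above work without it.
    import pdfplumber

    tables = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # Each table is a list of rows; empty cells come back as None.
            tables.extend(page.extract_tables())
    return tables


if __name__ == "__main__":
    # Hand-written sample in the same shape that extract_tables() returns per table.
    sample = [
        ["Model", "Domain"],
        ["Whisper", "Audio\n(Speech-to-Text)"],
        ["VGG", None],
    ]
    print(rows_to_csv(sample))
```

For a real file, something like `rows_to_csv(extract_tables("report.pdf")[0])` would serialise the first detected table, where `report.pdf` is a placeholder path. Scanned documents would need one of the OCR-based tools instead.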

Please click on the link below to register for the Generative AI workshop:

https://forms.gle/PrzkmvYh5yvEWUKZ6
