https://www.nitinkapse.com/ https://nichethyself.com/
Here’s a table listing various popular machine learning models and frameworks, along with their
primary usage in fields such as audio, vision, language processing, and more:
Model/Framework | Primary Usage | Domain
Whisper | Speech recognition, transcription | Audio (Speech-to-Text)
CLIP | Image and text alignment, zero-shot learning | Vision & Language
GPT (Generative Pre-trained Transformer) | Text generation, language understanding | Language Processing
BERT | Text classification, question answering | Language Processing
DALL·E | Image generation from text descriptions | Vision & Text
ViT (Vision Transformer) | Image classification, object detection | Vision
YOLO (You Only Look Once) | Real-time object detection | Vision (Computer Vision)
VQ-VAE-2 | Image generation, compression | Vision
StyleGAN | High-quality image generation | Vision (Image Synthesis)
Stable Diffusion | Text-to-image generation, artistic creation | Vision & Text
Wav2Vec 2.0 | Speech recognition, audio processing | Audio (Speech-to-Text)
DeepSpeech | Automatic speech recognition | Audio
T5 (Text-to-Text Transfer Transformer) | Text generation, summarization, translation | Language Processing
PaLM | Text generation, understanding, multilingual tasks | Language Processing
OpenAI Codex | Code generation, code completion | Programming/Code
Tacotron | Speech synthesis (Text-to-Speech) | Audio (Speech Synthesis)
WavLM | Speech enhancement, speech recognition | Audio
LLaMA | Language generation and comprehension | Language Processing
OPT (Open Pretrained Transformer) | Language tasks, text generation | Language Processing
DeepLab | Image segmentation | Vision (Computer Vision)
ResNet | Image classification, object detection | Vision
VGG | Image classification | Vision
CycleGAN | Image-to-image translation (e.g., style transfer) | Vision
BART | Text summarization, machine translation | Language Processing
Swin Transformer | Image classification, object detection | Vision
TransUNet | Medical image segmentation | Vision (Medical Imaging)
BigGAN | High-resolution image synthesis | Vision
OpenAI CLIP | Multi-modal learning (image and text) | Vision & Text
FastSpeech | Text-to-Speech synthesis | Audio (Speech Synthesis)
Reformer | Efficient Transformer for long text generation | Language Processing
SAM (Segment Anything Model) | Object segmentation in images | Vision (Object Segmentation)
SEER | Self-supervised image learning, classification | Vision
Key Insights:
● Audio Models: Whisper, DeepSpeech, Wav2Vec 2.0, and Tacotron are widely used for
tasks involving speech recognition, transcription, and synthesis.
● Vision Models: YOLO, ResNet, ViT, and StyleGAN are widely used for object detection,
classification, and image generation tasks.
● Language Models: GPT, BERT, and T5 focus on text generation, understanding, and
summarization.
● Multi-modal Models: CLIP, DALL·E, and Stable Diffusion work across both text and
vision domains, handling tasks such as image generation from text or aligning images
and text.
These models are designed for specialized tasks, but some of them, like GPT or CLIP, have a
broader range of applications across multiple domains.
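The grouping above can be treated as a small lookup structure. The following is a minimal sketch, assuming a hand-copied subset of the table; the names CATALOG, group_by_domain, and find_models are hypothetical, and the domain labels are simplified here so that related models fall into the same group.

```python
from collections import defaultdict

# Hand-copied subset of the table: (model, primary usage, domain).
# Domain labels are simplified for grouping (an assumption, not the table's exact wording).
CATALOG = [
    ("Whisper", "Speech recognition, transcription", "Audio"),
    ("Tacotron", "Speech synthesis (Text-to-Speech)", "Audio"),
    ("YOLO", "Real-time object detection", "Vision"),
    ("ResNet", "Image classification, object detection", "Vision"),
    ("GPT", "Text generation, language understanding", "Language"),
    ("BERT", "Text classification, question answering", "Language"),
    ("CLIP", "Image and text alignment, zero-shot learning", "Multi-modal"),
    ("Stable Diffusion", "Text-to-image generation", "Multi-modal"),
]


def group_by_domain(catalog):
    """Return a dict mapping each domain to the list of model names in it."""
    groups = defaultdict(list)
    for model, _usage, domain in catalog:
        groups[domain].append(model)
    return dict(groups)


def find_models(catalog, keyword):
    """Return models whose primary usage mentions the keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [model for model, usage, _domain in catalog if keyword in usage.lower()]


if __name__ == "__main__":
    for domain, models in group_by_domain(CATALOG).items():
        print(f"{domain}: {', '.join(models)}")
    print(find_models(CATALOG, "object detection"))
```

A lookup like this is only a starting point for shortlisting candidates; the choice of an actual model still depends on data, latency, and licensing constraints that the table does not capture.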
Model/Framework | Primary Usage | Domain
Whisper | Speech recognition, transcription | Audio (Speech-to-Text)
CLIP | Image and text alignment, zero-shot learning | Vision & Language
GPT (Generative Pre-trained Transformer) | Text generation, language understanding | Language Processing
Claude 1 | Conversational AI, safe language generation | Language Processing
Claude 2 | Advanced conversational AI, text understanding | Language Processing
Databricks Dolly | Fine-tuned language model for enterprise applications | Language Processing
BERT | Text classification, question answering | Language Processing
DALL·E | Image generation from text descriptions | Vision & Text
ViT (Vision Transformer) | Image classification, object detection | Vision
YOLO (You Only Look Once) | Real-time object detection | Vision (Computer Vision)
VQ-VAE-2 | Image generation, compression | Vision
StyleGAN | High-quality image generation | Vision (Image Synthesis)
Stable Diffusion | Text-to-image generation, artistic creation | Vision & Text
Wav2Vec 2.0 | Speech recognition, audio processing | Audio (Speech-to-Text)
DeepSpeech | Automatic speech recognition | Audio
T5 (Text-to-Text Transfer Transformer) | Text generation, summarization, translation | Language Processing
PaLM | Text generation, understanding, multilingual tasks | Language Processing
OpenAI Codex | Code generation, code completion | Programming/Code
Tacotron | Speech synthesis (Text-to-Speech) | Audio (Speech Synthesis)
WavLM | Speech enhancement, speech recognition | Audio
LLaMA | Language generation and comprehension | Language Processing
OPT (Open Pretrained Transformer) | Language tasks, text generation | Language Processing
DeepLab | Image segmentation | Vision (Computer Vision)
ResNet | Image classification, object detection | Vision
VGG | Image classification | Vision
CycleGAN | Image-to-image translation (e.g., style transfer) | Vision
BART | Text summarization, machine translation | Language Processing
Swin Transformer | Image classification, object detection | Vision
TransUNet | Medical image segmentation | Vision (Medical Imaging)
BigGAN | High-resolution image synthesis | Vision
OpenAI CLIP | Multi-modal learning (image and text) | Vision & Text
FastSpeech | Text-to-Speech synthesis | Audio (Speech Synthesis)
Reformer | Efficient Transformer for long text generation | Language Processing
SAM (Segment Anything Model) | Object segmentation in images | Vision (Object Segmentation)
SEER | Self-supervised image learning, classification | Vision
Databricks Lakehouse AI | AI and machine learning for enterprise data lakehouse | Enterprise AI
Key Additions:
● Claude models, developed by Anthropic, focus on conversational AI with an emphasis
on safety and steerable language generation.
● Databricks Dolly is an open-source, instruction-tuned language model released by
Databricks and aimed at enterprise use cases.
● Databricks Lakehouse AI is a set of enterprise-level AI and machine learning
capabilities, integrated with the Lakehouse architecture for handling large-scale data.
Here’s a list of models and tools for reading and extracting tabular data from PDFs, images,
or scanned documents. Depending on the tool, they rely on PDF parsing, OCR (Optical
Character Recognition), deep learning, or a combination of these to parse structured data
like tables.
Model/Framework | Primary Usage | Domain
TabNet | Interpretable deep learning model for tabular data | Tabular Data
Camelot | Extracting tables from PDFs | PDF/Table Extraction
pdfplumber | Parsing and extracting tables and text from PDFs | PDF/Table Extraction
Tesseract OCR | OCR for extracting text and simple tables from images/PDFs | OCR for Images & PDFs
PaddleOCR | OCR for table and text extraction, supports multi-language | OCR for Images & PDFs
TableNet | Extracting tabular data from document images | Table Detection in Images
DeepDeSRT | Detecting and recognizing table structures in scanned documents | Table Detection in PDFs/Images
DocTR (Document Text Recognition) | OCR for detecting and recognizing structured text like tables in documents | OCR & Document Analysis
Adobe PDF Extract API | Extracting structured data including tables from PDFs | PDF/Table Extraction
PyMuPDF (Fitz) | Extracting content (text, tables) from PDF documents | PDF Parsing
Tabula | Extracting tables from PDFs into CSV/Excel | PDF/Table Extraction
Keras-OCR | OCR for detecting and extracting text and tables from images | OCR for Images
LayoutLM | Pre-trained model for reading and extracting structured data from scanned documents | Document Understanding/OCR
TrOCR (Transformer OCR) | OCR model based on Transformer architecture for extracting text and tables | OCR for Documents
Amazon Textract | Automated text and table extraction from documents | OCR for PDFs & Images
Google Cloud Vision API | OCR with table detection capabilities for scanned images | OCR for Images & PDFs
Overview of Popular Models:
1. Camelot, Tabula, pdfplumber: Focus on extracting tables from PDFs and converting
them into structured formats like CSV or Excel.
2. Tesseract OCR, PaddleOCR: Used for general OCR tasks like reading text and simple
tables from images or scanned documents.
3. TableNet, DeepDeSRT: Specifically designed to detect and extract tabular structures
in scanned documents or images.
4. LayoutLM: Pre-trained model that combines text and layout information for document
understanding, useful for recognizing structured data like tables in scanned documents.
5. Amazon Textract, Google Cloud Vision API: Cloud-based APIs for extracting text,
tables, and forms from documents.
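As a rough illustration of the first group (PDF table extractors), here is a minimal sketch using pdfplumber. It assumes the PDF has a text layer (not a scanned image) and that the first table on the chosen page is the one of interest. The helper names clean_rows, rows_to_csv, and extract_first_table are hypothetical, and "report.pdf" in the usage note is a placeholder file name.

```python
import csv
import io


def clean_rows(rows):
    """Normalise extracted cells: None becomes "", whitespace is stripped, empty rows are dropped."""
    cleaned = []
    for row in rows:
        cells = [(cell or "").strip() for cell in row]
        if any(cells):
            cleaned.append(cells)
    return cleaned


def rows_to_csv(rows):
    """Serialise a list of rows to CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def extract_first_table(pdf_path, page_index=0):
    """Return the first table found on one page as cleaned rows, or [] if none is found."""
    # Imported lazily so the helpers above work without pdfplumber installed.
    import pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        tables = pdf.pages[page_index].extract_tables()
    return clean_rows(tables[0]) if tables else []
```

Usage would look like print(rows_to_csv(extract_first_table("report.pdf"))). Scanned PDFs generally need an OCR step first, since there is no text layer for a parser to read.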
These tools and models provide capabilities for converting unstructured data (like tables in
PDFs or images) into structured formats, making it easier to analyze and process the data
programmatically.
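For images and scans, the unstructured-to-structured step can be sketched with Tesseract through the pytesseract wrapper. This is a simplified sketch under stated assumptions: the table is simple, columns are separated by at least two spaces in the OCR output, and Tesseract is asked to keep inter-word spacing. The function names ocr_text_to_rows and ocr_image_to_rows are hypothetical; real documents with merged cells or skewed scans usually need a dedicated table-detection model instead.

```python
import re


def ocr_text_to_rows(text, min_gap=2):
    """Split OCR text into rows of cells, treating runs of >= min_gap whitespace characters as column breaks."""
    splitter = re.compile(r"\s{%d,}" % min_gap)
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line:  # skip blank lines between rows
            rows.append(splitter.split(line))
    return rows


def ocr_image_to_rows(image_path):
    """Run Tesseract on an image and split the recognised text into rows of cells."""
    # Lazy imports: this needs Pillow, pytesseract, and the Tesseract binary installed.
    from PIL import Image
    import pytesseract

    # --psm 6 treats the image as a single uniform block of text;
    # preserve_interword_spaces=1 keeps the wide gaps that mark column boundaries.
    text = pytesseract.image_to_string(
        Image.open(image_path),
        config="--psm 6 -c preserve_interword_spaces=1",
    )
    return ocr_text_to_rows(text)
```

The rows returned here have the same list-of-lists shape that Python's csv module accepts, so they can be written out as CSV for further analysis.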
Please click on the link below to register for the Generative AI workshop:
https://forms.gle/PrzkmvYh5yvEWUKZ6