Sunday, 7 July 2024

Top large language models to watch

The LLM landscape is exploding! With the immense potential of large language models, competition is fierce as companies race to develop the most powerful and innovative models. Training these models presents a lucrative business opportunity, attracting major players and startups alike.

Keeping track of the leaders is challenging. The LLM space is highly competitive, making it difficult to identify a single frontrunner. New versions are released constantly, pushing the boundaries of what's possible. While some might see this as a race to the bottom, it's more accurate to view it as rapid innovation that will ultimately benefit everyone.

Top companies as of July 2024

The diagram above splits the companies into two groups: one for commercial providers and one for hybrid providers (commercial and open weights).

Commercial

OpenAI

OpenAI is the poster child of LLMs, with its series of GPT models. It was the first to offer LLMs to consumers at large scale.

GPT-4o is the flagship model, and all the models are available via API. OpenAI is very well funded, with Microsoft as a major backer.

More details about the models can be found at OpenAI Models

Research papers on GPT-4 and its predecessors:

GPT-4 Technical Report

GPT 1.0

GPT 2.0

Language Models are Few-Shot Learners

Evaluating Large Language Models Trained on Code

Amazon

Amazon has a family of models called "Titan". The Amazon Titan family incorporates Amazon’s 25 years of experience innovating with AI and machine learning across its business. Amazon Titan foundation models (FMs) provide customers with a breadth of high-performing image, multimodal, and text model choices, via a fully managed API.

More details about the models can be found at Amazon Models

No research papers are available on the details of Amazon's LLMs. They are kept proprietary to preserve a competitive edge.

Anthropic

Anthropic was co-founded by former OpenAI employees.

Anthropic's latest offering, Claude 3.5 Sonnet, has generated significant buzz. This powerful language model builds upon their previous success with Claude 3 Opus and is claimed to outperform OpenAI's GPT-4o, particularly in coding tasks.

Anthropic is also very well funded; Amazon and Google are major investors.

More details about the models can be found at Anthropic Models

Anthropic's models are based on an OpenAI-style architecture, but the company focuses on a few research principles such as AI as a systematic science, safety, and scaling.

One of the popular research publications from Anthropic is mapping-mind-language-model

MosaicML

MosaicML, co-founded by an MIT alumnus and a professor, made deep-learning models faster and more efficient. It was acquired by Databricks.

Mosaic Pretrained Transformers (MPT) are GPT-style models with some special features -- Flash Attention for efficiency, ALiBi for context length extrapolation, and stability improvements to mitigate loss spikes.
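To make the ALiBi idea more concrete, here is a minimal sketch in plain Python. It assumes the common formulation where each attention head adds a linear penalty proportional to the distance between query and key positions, with per-head slopes forming a geometric sequence. The function names are illustrative and are not taken from the MPT codebase.

```python
def alibi_slopes(num_heads):
    """Per-head slopes: 2^(-8/n), 2^(-16/n), ... (assumes num_heads is a power of two)."""
    return [2 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]


def alibi_bias(seq_len, slope):
    """Bias added to attention scores for one head.

    bias[i][j] = -slope * (i - j) for keys at or before the query position,
    and -inf for future positions (causal mask).
    """
    return [
        [-slope * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
        for i in range(seq_len)
    ]


slopes = alibi_slopes(8)
bias = alibi_bias(3, slopes[0])
```

Because the penalty depends only on relative distance, the same rule applies to sequences longer than those seen in training, which is the intuition behind context length extrapolation.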

More details about the models can be found at MosaicML

Some popular research papers are Train Short, Test Long and FlashAttention

InflectionAI

Inflection AI focuses on developing a large language model (LLM) for personal use called Inflection.

Not many details are available about how the model was trained, but the company claims it is the world's top empathetic large language model (LLM).

More details about the model can be found at inflection-2-5

Hybrid/Open Source

Google

Google researchers wrote the famous paper Attention Is All You Need, which became the kernel of all the LLMs we see today.

Google was releasing language models to the community before ChatGPT arrived. BERT, an encoder-only Transformer, was one of the first such models and became the foundation for many of the models that followed.

Google offers large language models (LLMs) across a spectrum of availability. Some models are fully commercial: the weights and underlying code are proprietary, and the models are accessible only through Google's products and APIs.

The Gemini family exemplifies this, with variants like Ultra, Pro, Flash (introduced in v1.5), and Nano catering to different needs in terms of size and processing power.

In contrast, Gemma is Google's open-weights LLM family. It's designed for developers and researchers and comes in various sizes (e.g., Gemma 2B and 7B) for flexibility.

Plenty of reading material is available from Google on LLMs and the Gemma models. Some of the popular papers are:

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

PaLM: Scaling Language Modeling with Pathways

Scaling Instruction-Finetuned Language Models

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Gemma: Open Models Based on Gemini Research and Technology

gemma-2-report

Mistral

Mistral is a France-based company that releases many of its model weights under Apache 2.0.

Mistral strives to create efficient models that require less computational power compared to some competitors. This makes them more accessible to a wider range of users.

Mistral's innovation centers on Grouped Query Attention (GQA). Some of its recent models are based on Mixture of Experts (MoE).
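As a rough illustration of Grouped Query Attention, the sketch below shows how query heads can be mapped onto a smaller set of shared key/value heads. The head counts (32 query heads, 8 key/value heads) and the function names are illustrative assumptions, not an excerpt from Mistral's code.

```python
def kv_head_for_query_head(q_head, num_q_heads, num_kv_heads):
    """Return the key/value head shared by a given query head."""
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("num_q_heads must be a multiple of num_kv_heads")
    group_size = num_q_heads // num_kv_heads  # query heads per shared KV head
    return q_head // group_size


def kv_cache_ratio(num_q_heads, num_kv_heads):
    """KV-cache size relative to standard multi-head attention."""
    return num_kv_heads / num_q_heads


# Illustrative configuration: 32 query heads sharing 8 key/value heads.
mapping = [kv_head_for_query_head(h, 32, 8) for h in range(32)]
ratio = kv_cache_ratio(32, 8)
```

With these numbers, every four query heads read the same keys and values, so the KV cache is a quarter of the size it would be with one key/value head per query head. That saving is where most of the inference-time efficiency comes from.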

More details about the models are available at Mistral models

Some popular research papers are Mistral 7B and A Closer Look into Mixture-of-Experts in Large Language Models

Databricks

Databricks is building open-source models based on Mixture of Experts (MoE). Its most recent, state-of-the-art model is DBRX.
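For readers new to Mixture of Experts, here is a minimal sketch of top-k routing: a router scores every expert for a token, only the k highest-scoring experts run, and their outputs are combined using the renormalized router weights. This is a generic illustration. Real routers differ in detail (for example, where the softmax is applied), and nothing here is taken from the DBRX implementation.

```python
import math


def top_k_route(router_logits, k=2):
    """Pick the k best experts for one token and return (expert_index, weight) pairs."""
    # Numerically stable softmax over all experts.
    peak = max(router_logits)
    exps = [math.exp(x - peak) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep only the k most probable experts and renormalize their weights.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


route = top_k_route([0.1, 2.0, -1.0, 1.0], k=2)
```

Only the selected experts do any work for that token, which is why an MoE model can have many parameters while keeping the compute per token low.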

Details about the model are available at introducing-dbrx-new-state-art-open-llm

Some popular research papers are

MegaBlocks: Efficient Sparse Training with Mixture-of-Experts

PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel

Cohere

Cohere is a Canada-based company. It builds a model called Command R, a state-of-the-art RAG-optimized model designed to tackle enterprise-grade workloads.

More details about the model can be found at Command-R

One popular research paper is RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs

Microsoft

While Microsoft leverages OpenAI's powerful GPT-4 language models for some functionalities, they've also made significant contributions to open-source AI with the Phi-3 family of models.

Phi-3 models are a type of small language model (SLM), specifically designed for efficiency and performance on mobile devices and other resource-constrained environments.

More details about the models can be found at phi-3

Some popular research papers related to the Phi series are Textbooks Are All You Need, Textbooks Are All You Need II, and Phi-3 Technical Report

Conclusion

We are witnessing an interesting time where many large language models (LLMs) are available for building apps, accessible to both consumers and developers. Predicting the dominant player is difficult due to the rapidly changing landscape.

One key concept to grasp is that the GenAI stack is multifaceted. Foundation models are just one layer, and they can be quite expensive due to hardware requirements. Training a foundation model can easily cost millions of dollars, making it difficult for companies to maintain a competitive edge.

As software engineers, we need to leverage this technology by selecting the best model for each specific use case. Defining "best" can be subjective, and the answer often depends on various factors.

Here's a crucial consideration: while using the top-performing LLM might be tempting, it's vital to maintain a flexible architecture. This allows you to easily switch to newer LLMs, similar to how we switch between databases or other vendor-specific technologies.
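One way to keep that flexibility is to have application code depend on a small interface rather than on a vendor SDK. The sketch below is only an illustration: `ChatModel`, `StubModel`, `ModelRegistry`, and `summarize` are hypothetical names, and a real adapter would wrap the vendor's client inside `complete`.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


class ChatModel(Protocol):
    """Minimal interface the application depends on (hypothetical)."""

    def complete(self, prompt: str) -> str:
        ...


@dataclass
class StubModel:
    """Stand-in for a vendor client; a real adapter would call the vendor SDK here."""

    vendor: str

    def complete(self, prompt: str) -> str:
        return f"[{self.vendor}] {prompt}"


class ModelRegistry:
    """Maps a use case to whichever model is currently chosen for it."""

    def __init__(self) -> None:
        self._models: Dict[str, ChatModel] = {}

    def register(self, use_case: str, model: ChatModel) -> None:
        self._models[use_case] = model

    def get(self, use_case: str) -> ChatModel:
        return self._models[use_case]


def summarize(model: ChatModel, text: str) -> str:
    """Application code: knows only the interface, not the vendor."""
    return model.complete(f"Summarize: {text}")


registry = ModelRegistry()
registry.register("summaries", StubModel("vendor-a"))
first = summarize(registry.get("summaries"), "quarterly report")

# Switching vendors is a one-line registration change; summarize() is untouched.
registry.register("summaries", StubModel("vendor-b"))
second = summarize(registry.get("summaries"), "quarterly report")
```

The design choice is the same one we make with database drivers: the vendor-specific part lives in one adapter, so choosing a different model per use case becomes configuration rather than a rewrite.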

In the next part of this blog, I'll explore the inference side of LLMs, a fascinating area that will ultimately determine the return on investment (ROI) for companies making significant investments in this technology.

Friday, 7 May 2021

What is Artificial Intelligence?

This is a post from a series on artificial intelligence and machine learning.

In this post, we will try to understand what AI is and where machine learning fits in it.

As per Wikipedia, Artificial Intelligence is

Simulating any intellectual task.

It can also be seen as an industrial revolution in simulating the brain.

AI is a very broad field with many subfields. It is important to understand what the full landscape looks like and then focus on the core part that overlaps with almost every subfield of AI.

Let's try to understand each subfield.

Knowledge representation

This is core to many AI applications. It is based on expert systems that collect explicit knowledge available in some database or possessed by experts.

This can also be seen as knowledge about knowledge. We interact with this type of system every day, be it Amazon Alexa, Apple Siri, or Google Assistant.

Perception

Machine perception is about using sensor input to understand the context and decide what action to take. Nowadays we are surrounded by cameras, microphones, IoT devices, etc.

Some real-world applications include facial recognition, computer vision, speech, etc.

Motion and manipulation

This is one of the heaviest uses of AI, and it includes robotics. The industrial revolution has already helped the world economy grow, and robotics will take it to the next level. Applications include industrial and domestic robots. In times of pandemics like Covid, robotics helps even more, as everyone is concerned about safety. Autonomous vehicles are one of the important applications of this subfield.

Natural language processing

NLP allows machines to read and understand human language. It involves processing huge amounts of unstructured data and deriving meaning from it. Some of the applications we interact with every day are search autocomplete, auto-correction, language translation, chatbots, targeted advertising, etc.

Search and planning

This area covers machines that are given a goal and work out how to achieve it. The machine builds a model of the state of the world and can predict how its actions will change it.
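A toy sketch can make this concrete. Here the "world state" is just a number, the two available actions are hypothetical (`add1` and `double`), and the planner uses breadth-first search to predict the result of each action and find the shortest sequence that reaches the goal. Real planners work with far richer state and smarter search, so treat this only as an illustration of the idea.

```python
from collections import deque

# Hypothetical actions: each one predicts the next state from the current one.
ACTIONS = {
    "add1": lambda state: state + 1,
    "double": lambda state: state * 2,
}


def plan(start, goal):
    """Breadth-first search for the shortest list of actions from start to goal."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        state, steps = queue.popleft()
        if state == goal:
            return steps
        for name, action in ACTIONS.items():
            nxt = action(state)
            # Both actions only increase the state, so overshooting is a dead end.
            if nxt <= goal and nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, steps + [name]))
    return None  # goal cannot be reached


steps = plan(1, 10)

# Replay the plan to confirm the predicted outcome.
state = 1
for name in steps:
    state = ACTIONS[name](state)
```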

Learning

This is also called machine learning. It is the study of computer algorithms that automatically improve through experience.

It sounds like how humans learn something!

It is a subfield of AI, but the most important one, as it is applied to all the other subfields. Knowing it is a must before starting on any other subfield of AI.

Let's explore more on the Learning part now.

What is machine learning?

One quick definition of machine learning is pattern recognition. It can also be seen as how computers discover how to solve problems without being explicitly programmed.

Machine learning is made up of 3 steps.

The step of updating the model via learning is where real machine learning happens.
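As a minimal sketch of what "updating the model via learning" can look like, the example below fits a single parameter `w` in the model `y = w * x` using gradient descent. The data, learning rate, and number of epochs are made-up values chosen for illustration.

```python
def fit_slope(xs, ys, lr=0.01, epochs=200):
    """Learn w in y = w * x by repeatedly nudging w to reduce the squared error."""
    w = 0.0  # initial guess
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        # The update step: this is where the learning happens.
        w -= lr * grad
    return w


# Made-up training data that follows y = 2x.
w = fit_slope([1, 2, 3, 4], [2, 4, 6, 8])
```

Nobody wrote the rule "multiply by 2" into the program. The value of `w` is discovered from the examples, which is the sense in which the model learns.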

Data science is related to machine learning, but the two are often treated as the same thing. AI and data science both overlap considerably with machine learning, as shown below.

Where does data science fit with machine learning? It is a unifying concept that brings together statistics, maths, data mining, data analysis, etc.

Data Science

Now, with a high-level understanding of AI, ML, and data science, we are ready to do a deep dive into ML.

Artificial Intelligence and machine learning

This post contains a catalog of high-level concepts in AI & ML.

I will keep updating it as I write more.

What is Artificial Intelligence?

Are you ready