The pace of change in the field of LLMs is simply unfathomable. Every few weeks there’s another announcement of a record-breaking model, transformative paper or groundbreaking open source framework. We are so spoilt for choice that one of the hardest challenges we face right now is exactly that: choosing. So I’m excited to be leading an O’Reilly Pearson Live Training Event called “Choosing the right LLM – how to select, train, and apply state-of-the-art LLMs to real-world business use cases”.
Quick links
- The repo for the class is here, including links to Google Colab in the 3rd project
- I also have an intensive 20+ week program of online courses that cover this material (and tons more) in great detail! Here is the curriculum, with course links that explain the program
- My Proficient AI Engineer program
- All my Live Events with O’Reilly and Pearson
Keep in touch
I’ll only ever contact you occasionally, and I’ll always aim to add value with every email.
LLM Explainer
I have a series of YouTube videos that explain what ‘parameters’ are and how they give an LLM its astonishing power. The first video and playlist are below, with the most popular video from the series (on Gradient Descent) below it.
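To make the idea concrete, here is a minimal sketch of gradient descent on a single parameter: fitting a weight w so that w·x matches y. Real LLMs run exactly this loop, just across billions of parameters at once; the data and learning rate below are made up for illustration.

```python
# Minimal gradient descent: fit w so that w*x ≈ y for toy data.

def train(xs, ys, lr=0.01, steps=200):
    w = 0.0  # start with an arbitrary parameter value
    for _ in range(steps):
        # Mean squared error loss: average of (w*x - y)^2.
        # Its gradient with respect to w is the average of 2*x*(w*x - y).
        grad = sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad  # step downhill, scaled by the learning rate
    return w

w = train([1, 2, 3, 4], [2, 4, 6, 8])  # true relationship is y = 2x
print(round(w, 3))  # converges to 2.0
```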
Videos of my Arena
Testing GPT-4.5 and DeepSeek in a Connect Four Arena
Resources for Segment 1: The LLM Frontier
The seminal 2017 paper ‘Attention Is All You Need’ from Google scientists that brought about the Transformer is here. This sentence from the Abstract says it all:
We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.
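The attention mechanism the abstract refers to can be sketched in a few lines of NumPy. This is the paper’s scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V, shown on tiny random matrices rather than real model weights:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention from 'Attention Is All You Need'."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ V  # weighted mix of the value vectors

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))
out = attention(Q, K, V)
print(out.shape)  # (3, 4): one blended value vector per query
```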
The famous paper ‘On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?’ that discussed bias and deception is here.
The prompt generator from Anthropic is described and linked here.
The Tools of the Trade used in the class:
- Hugging Face – the go-to hub for models, datasets, leaderboards and even applications, and the authors of many essential open source frameworks including the pioneering transformers library
- LangChain – open source framework that provides abstractions connecting multiple LLM operations under a simple API
- Gradio – a ridiculously simple UI framework that lets you create prototype UIs in one line of code, no frontend experience needed
- Weights & Biases – tooling to analyze and visualize during training
- Google Colab – write, evaluate and share notebooks remotely on a box in the Google Cloud
- Amazon SageMaker is a broader alternative that includes Notebooks
- Ollama – a platform for running open-source models in inference mode on your computer, using the optimized C++ library llama.cpp
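As a taste of the last tool on that list: once Ollama is running locally, it serves a simple REST API on port 11434. The sketch below builds the JSON body for its /api/generate endpoint; the actual POST is left commented out since it assumes you have already pulled a model (e.g. with `ollama run llama3`).

```python
import json

def build_request(model, prompt):
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

body = build_request("llama3", "In one sentence, what is a token?")
print(json.dumps(body))

# With Ollama running locally, you could send it like this (needs `requests`):
# import requests
# reply = requests.post("http://localhost:11434/api/generate", json=body)
# print(reply.json()["response"])
```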
Using LLMs Approach #1: The Web Chat Interfaces for 8 Frontier LLMs
- ChatGPT (latest model GPT-4o) from OpenAI
- Claude (latest model Claude 3.5 Sonnet) from Anthropic
- Gemini Advanced (latest model Gemini 1.5 Pro) from Google
- Chat with Command R+ from Cohere
- Meta.ai (model is Llama 3) from Meta
- DeepSeek from DeepSeek AI
- Le Chat from Mistral
- Perplexity (latest model is Perplexity Pro) from Perplexity.ai
Approach #2: The Cloud APIs
- GPT API from OpenAI
- Claude API from Anthropic
- Gemini API from Google
- DeepSeek platform from DeepSeek AI
- Groq for high performance inference
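All of these cloud APIs share the same basic chat-completion shape: you send a list of role-tagged messages and get a response back. Here it is sketched with OpenAI’s Python SDK; the actual call is commented out since it assumes the `openai` package is installed and OPENAI_API_KEY is set in your environment.

```python
def build_messages(system, user):
    """The major chat APIs all take a list of role-tagged messages like this."""
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

messages = build_messages("You are a concise assistant.", "Define 'context window'.")
print(messages)

# With an API key configured, the call looks like:
# from openai import OpenAI
# client = OpenAI()
# response = client.chat.completions.create(model="gpt-4o", messages=messages)
# print(response.choices[0].message.content)
```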
Approach #3: Direct Inference
- Ollama – also see the README in GitHub
Not covered in this class: Approach #4: using a Managed Service
- Amazon Bedrock is the managed service from AWS: “The easiest way to build and scale generative AI applications with foundation models”
- Vertex AI is the managed service from Google Cloud: “Innovate faster with enterprise-ready AI, enhanced by Gemini models”
- Azure Machine Learning is the managed service from Microsoft: “Build business-critical ML models at scale”
Also not covered in this class: Auto-encoding LLMs
There are 2 broad categories of LLMs:
- Auto-regressive LLMs: predict a future token given past context; used for Generative AI and all the Use Cases we’re covering today.
- Auto-encoding LLMs: learn sequences by predicting tokens given both past and future context. These are commonly used for embeddings and classification. An example is BERT.
Aside from briefly touching on Vector Encoding and OpenAI’s embedding model during the RAG exercise, we’ll be focusing only on auto-regressive LLMs in this session.
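The distinction comes down to what each token is allowed to see. A toy illustration of the attention masks, with a made-up sequence length of 4: an auto-regressive model uses a causal (lower-triangular) mask so each token attends only to the past, while an auto-encoding model like BERT attends in both directions.

```python
import numpy as np

n = 4  # toy sequence length

# Auto-regressive (GPT-style): token i may only attend to tokens 0..i
causal_mask = np.tril(np.ones((n, n), dtype=bool))

# Auto-encoding (BERT-style): every token may attend to every token
bidirectional_mask = np.ones((n, n), dtype=bool)

print(causal_mask.astype(int))  # 1s on and below the diagonal only
```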
Multi-modal links – text-to-video and audio
- New: Dream Machine from Luma Labs
- Sora from OpenAI – still the frontrunner
- Veo from Google
- Kling from KuaiShou
- VASA-1 from Microsoft
- Udio from Udio
- Suno from Suno
Resources for Segment 2: The Model Match-up
As a follow-up to the Frontier LLM face-off, here is an unbelievably amazing post by a game dev: a “Reverse Turing Test” in which a group of frontier models and one human interact, and everyone tries to identify the human.
And here is the fun arena I wrote called “Outsmart” that has LLMs competing to outwit each other.
14 original benchmarks

The BBHard (BIG-Bench Hard) benchmark, with 23 tasks considered harder for LLMs, is here.
The paper for IFEval, a challenging instruction-following benchmark, is here.
The paper for MuSR, a multi-step reasoning benchmark, is here.
The Google DeepMind paper that evaluates frontier models for dangerous capabilities is here; the conclusion is “We do not find evidence of strong dangerous capabilities in the models we evaluated, but we flag early warning signs.”
The Leaderboards and Arenas
- Hugging Face Open LLM
- Hugging Face Big Code
- Hugging Face LLM-Perf
- All Hugging Face leaderboards – medical, Portuguese and more
- Vellum.ai Leaderboard – includes BBHard, also Cost & Context Window comparison
- SEAL specialist leaderboards from Scale.ai
- AlpacaEval
- LMSYS Chatbot Arena and contribute your votes here
- OpenAI’s benchmark report from their GPT-4o announcement
- Anthropic’s benchmark report from their Claude 3.5 Sonnet announcement
Real-world examples of LLMs making commercial impact
- Harvey.ai – Law
- Nebula.io – Talent
- Bloop.ai – Tech (porting legacy code)
- Einstein Copilot: Health – Healthcare
- Khanmigo – Education
The Air Canada Legal Case
The case is here.
Resources for Segment 3: Time to Code
Customization / Optimization Strategies
Pros and Cons of Multi-Shot Prompting versus RAG versus Fine-Tuning:

For background information on RAG, there’s excellent O’Reilly Live Event training, and there are many good online resources including this from Databricks.
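The core of RAG is the retrieval step: embed your documents and the query, then pick the most similar chunk to stuff into the prompt. A minimal sketch, using tiny hand-made vectors in place of a real embedding model such as OpenAI’s text-embedding-3-small (the documents and numbers below are purely illustrative):

```python
import numpy as np

docs = ["Returns are accepted within 30 days.",
        "Shipping takes 3-5 business days.",
        "Our support line is open 9am-5pm."]
doc_vecs = np.array([[1.0, 0.1, 0.0],
                     [0.1, 1.0, 0.0],
                     [0.0, 0.1, 1.0]])  # pretend embeddings, one row per doc

def retrieve(query_vec, doc_vecs, docs):
    # Cosine similarity between the query and each document vector
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec))
    return docs[int(np.argmax(sims))]  # best-matching chunk for the prompt

query_vec = np.array([0.9, 0.2, 0.1])  # pretend embedding of "Can I return this?"
print(retrieve(query_vec, doc_vecs, docs))
```

In a real pipeline the retrieved chunk would then be inserted into the LLM prompt as context, which is exactly the exercise we do in class.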
For QLoRA, this is the original paper from May 2023 which follows an earlier paper on LoRA from June 2021. This blog post has good explanations of the methodology and hyper-parameters.
Here are some details on the 4 main LoRA hyper-parameters at the top of the Jupyter notebook:
- TARGET_MODULES: which components of the Transformer’s architecture should have low-rank matrices added.
- LORA_R: the rank of these low-rank matrices that get added to each of the target modules. I’ve seen 8, 16 and 32 as the most common R values.
- LORA_ALPHA: the scaling factor applied during low-rank adaptation; the rule of thumb ‘from the internet’ is that Alpha should be double R! I’ve experimented and that seemed to work for me.
- LORA_DROPOUT: the dropout rate that’s applied to the low-rank matrices, randomly setting a fraction of the inputs to zero during training to help prevent overfitting. I’ve seen 0.1 and 0.05 commonly used.
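A quick back-of-envelope calculation shows why these low-rank matrices make fine-tuning so cheap: a target module with a d_out × d_in weight matrix has d_out·d_in trainable parameters under full fine-tuning, but LoRA only trains A (r × d_in) and B (d_out × r). The dimensions below are illustrative, not taken from any specific model:

```python
def lora_params(d_in, d_out, r):
    """Trainable parameters LoRA adds for one target module: A plus B."""
    return r * d_in + d_out * r

d_in = d_out = 4096              # a typical projection size in a large model
full = d_in * d_out              # parameters trained by full fine-tuning
lora = lora_params(d_in, d_out, r=32)

scaling = 64 / 32  # LORA_ALPHA / LORA_R; the "alpha = 2R" rule of thumb gives 2.0

print(full, lora, round(100 * lora / full, 2))  # LoRA trains ~1.6% of the weights here
```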
Near Future Links
The Stargate Supercomputer that Microsoft and OpenAI are partnering to build.
Ilya Sutskever, former OpenAI Chief Scientist, has launched a new company Safe Superintelligence (SSI) with Daniel Levy (former OpenAI) and Daniel Gross (former Y Combinator partner).
Robotics Links
Humanoid Robotics:
Robotics Models and Frameworks:
- GROOT from Nvidia
- RFM1 – 8B parameter LLM for Robotics from Covariant
- LeRobot framework from Hugging Face
Recreating the Robotics Dataset Visualization:
See the LeRobot GitHub repo here and follow their setup instructions:
git clone https://github.com/huggingface/lerobot.git && cd lerobot
conda create -y -n lerobot python=3.10 && conda activate lerobot
pip install .
pip install ".[aloha, pusht]"
And then to visualize the dataset of the Aloha-Mobile robot cooking a shrimp, run this:
python lerobot/scripts/visualize_dataset.py --repo-id lerobot/aloha_mobile_shrimp --episode-index 0
The Extra Project for Fun
I mentioned my experiment to train an LLM on my 240,000 text message history. My write-up of the journey is here, and the subsequent blog posts take you through the adventure of replicating this yourself!
Finally
If you make it this far through the resources, THANK YOU for hanging in there until the end! Please connect with me on LinkedIn and stay in touch.