The pace of change in the field of LLMs is simply unfathomable. Every few weeks there’s another announcement of a record-breaking model, transformative paper or groundbreaking open source framework. We are so spoilt for choice that one of the hardest challenges we face right now is exactly that: choosing. So I’m excited to be leading an O’Reilly Pearson Live Training Event called “Choosing the right LLM – how to select, train, and apply state-of-the-art LLMs to real-world business use cases”.
Quick links
- The repo for the class is here, including links to Google Colab in the 3rd project
- I also have an intensive 20+ week program of online courses that cover this material (and tons more) in great detail! Here is the curriculum, with course links that explain the program
- My Proficient AI Engineer program
- All my Live Events with O’Reilly and Pearson
Keep in touch
I’ll only ever contact you occasionally, and I’ll always aim to add value with every email.
LLM Explainer
I have a series of YouTube videos that explain what ‘parameters’ are and how they give an LLM its astonishing power. The first video and playlist are below, with the most popular video from the series (on Gradient Descent) below it.
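To make the idea concrete, here is a minimal sketch of gradient descent on a single parameter: fitting a weight w so that w·x matches y. Real LLMs run exactly this loop, just across billions of parameters at once; the data and learning rate below are made up for illustration.

```python
# Minimal gradient descent: fit w so that w*x ≈ y for toy data.

def train(xs, ys, lr=0.01, steps=200):
    w = 0.0  # start with an arbitrary parameter value
    for _ in range(steps):
        # Mean squared error loss: average of (w*x - y)^2.
        # Its gradient with respect to w is the average of 2*x*(w*x - y).
        grad = sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad  # step downhill, scaled by the learning rate
    return w

w = train([1, 2, 3, 4], [2, 4, 6, 8])  # true relationship is y = 2x
print(round(w, 3))  # converges to 2.0
```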
Videos of my Arena
Testing GPT-4.5 and DeepSeek in a Connect Four Arena
Resources for Segment 1: The LLM Frontier
The seminal 2017 paper ‘Attention Is All You Need’ from Google scientists that brought about the Transformer is here. This sentence from the Abstract says it all:
We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.
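The attention mechanism the abstract refers to can be sketched in a few lines of NumPy. This is the paper’s scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V, shown on tiny random matrices rather than real model weights:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention from 'Attention Is All You Need'."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ V  # weighted mix of the value vectors

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))
out = attention(Q, K, V)
print(out.shape)  # (3, 4): one blended value vector per query
```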
The famous paper ‘On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?’ that discussed bias and deception is here.
The prompt generator from Anthropic is described and linked here.
The Tools of the Trade used in the class:
- Hugging Face – the go-to hub for models, datasets, leaderboards and even applications, and the authors of many essential open source frameworks including the pioneering transformers library
- LangChain – open source framework that provides abstractions connecting multiple LLM operations under a simple API
- Gradio – a ridiculously simple UI framework that lets you create prototype UIs in one line of code, no frontend experience needed
- Weights & Biases – tooling to analyze and visualize during training
- Google Colab – write, evaluate and share notebooks remotely on a box in the Google Cloud
- Amazon SageMaker is a broader alternative that includes Notebooks
- Ollama – a platform for running open-source models in inference mode on your computer, using the optimized C++ library llama.cpp
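As a taste of the last tool on that list: once Ollama is running locally, it serves a simple REST API on port 11434. The sketch below builds the JSON body for its /api/generate endpoint; the actual POST is left commented out since it assumes you have already pulled a model (e.g. with `ollama run llama3`).

```python
import json

def build_request(model, prompt):
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

body = build_request("llama3", "In one sentence, what is a token?")
print(json.dumps(body))

# With Ollama running locally, you could send it like this (needs `requests`):
# import requests
# reply = requests.post("http://localhost:11434/api/generate", json=body)
# print(reply.json()["response"])
```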
Using LLMs Approach #1: The Web Chat Interfaces for 8 Frontier LLMs
- ChatGPT (latest model GPT-4o) from OpenAI
- Claude (latest model Claude 3.5 Sonnet) from Anthropic
- Gemini Advanced (latest model Gemini 1.5 Pro) from Google
- Chat with Command R+ from Cohere
- Meta.ai (model is Llama 3) from Meta
- DeepSeek from DeepSeek AI
- Le Chat from Mistral
- Perplexity (latest model is Perplexity Pro) from Perplexity.ai
Approach #2: The Cloud APIs
- GPT API from OpenAI
- Claude API from Anthropic
- Gemini API from Google
- DeepSeek platform from DeepSeek AI
- Groq for high performance inference
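All of these cloud APIs share the same basic chat-completion shape: you send a list of role-tagged messages and get a response back. Here it is sketched with OpenAI’s Python SDK; the actual call is commented out since it assumes the `openai` package is installed and OPENAI_API_KEY is set in your environment.

```python
def build_messages(system, user):
    """The major chat APIs all take a list of role-tagged messages like this."""
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

messages = build_messages("You are a concise assistant.", "Define 'context window'.")
print(messages)

# With an API key configured, the call looks like:
# from openai import OpenAI
# client = OpenAI()
# response = client.chat.completions.create(model="gpt-4o", messages=messages)
# print(response.choices[0].message.content)
```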
Approach #3: Direct Inference
- Ollama – also see the README in GitHub
Not covered in this class: Approach #4: using a Managed Service
- Amazon Bedrock is the managed service from AWS: “The easiest way to build and scale generative AI applications with foundation models”
- Vertex AI is the managed service from Google Cloud: “Innovate faster with enterprise-ready AI, enhanced by Gemini models”
- Azure Machine Learning is the managed service from Microsoft: “Build business-critical ML models at scale”
Also not covered in this class: Auto-encoding LLMs
There are 2 broad categories of LLMs:
- Auto-regressive LLMs: predict a future token given past context; used for Generative AI and all the Use Cases we’re covering today.
- Auto-encoding LLMs: learn sequences by predicting tokens given both past and future context. These are commonly used for embeddings and classification. An example is BERT.
Aside from briefly touching on Vector Encoding and OpenAI’s embedding model during the RAG exercise, we’ll be focusing only on auto-regressive LLMs in this session.
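The distinction comes down to what each token is allowed to see. A toy illustration of the attention masks, with a made-up sequence length of 4: an auto-regressive model uses a causal (lower-triangular) mask so each token attends only to the past, while an auto-encoding model like BERT attends in both directions.

```python
import numpy as np

n = 4  # toy sequence length

# Auto-regressive (GPT-style): token i may only attend to tokens 0..i
causal_mask = np.tril(np.ones((n, n), dtype=bool))

# Auto-encoding (BERT-style): every token may attend to every token
bidirectional_mask = np.ones((n, n), dtype=bool)

print(causal_mask.astype(int))  # 1s on and below the diagonal only
```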
Multi-modal links – text-to-video and audio
- New: Dream Machine from Luma Labs
- Sora from OpenAI – still the frontrunner
- Veo from Google
- Kling from KuaiShou
- VASA-1 from Microsoft
- Udio from Udio
- Suno from Suno
Resources for Segment 2: The Model Match-up
As a follow-up to the Frontier LLM face-off, here is an unbelievably amazing post by a game dev: a “Reverse Turing Test” in which a group of frontier models and one human interact, and everyone tries to identify the human.
And here is the fun arena I wrote called “Outsmart” that has LLMs competing to outwit each other.
14 original benchmarks

The BBHard (BIG-Bench Hard) benchmark, with 23 tasks considered harder for LLMs, is here.
The paper for IFEval, a challenging instruction-following benchmark, is here.
The paper for MuSR, a multi-step reasoning benchmark, is here.
The Google DeepMind paper that evaluates frontier models for dangerous capabilities is here; the conclusion is “We do not find evidence of strong dangerous capabilities in the models we evaluated, but we flag early warning signs.”
The Leaderboards and Arenas
- Hugging Face Open LLM
- Hugging Face Big Code
- Hugging Face LLM-Perf
- All Hugging Face leaderboards – medical, Portuguese and more
- Vellum.ai Leaderboard – includes BBHard, also Cost & Context Window comparison
- SEAL specialist leaderboards from Scale.ai
- AlpacaEval
- LMSYS Chatbot Arena and contribute your votes here
- OpenAI’s benchmark report from their GPT-4o announcement
- Anthropic’s benchmark report from their Claude 3.5 Sonnet announcement
Real-world examples of LLMs making commercial impact
- Harvey.ai – Law
- Nebula.io – Talent
- Bloop.ai – Tech (porting legacy code)
- Einstein Copilot: Health – Healthcare
- Khanmigo – Education
The Air Canada Legal Case
The case is here.
Resources for Segment 3: Time to Code
Customization / Optimization Strategies
Pros and Cons of Multi-Shot Prompting versus RAG versus Fine-Tuning:

For background information on RAG, there’s excellent O’Reilly Live Event training, and there are many good online resources including this from Databricks.
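The core of RAG is the retrieval step: embed your documents and the query, then pick the most similar chunk to stuff into the prompt. A minimal sketch, using tiny hand-made vectors in place of a real embedding model such as OpenAI’s text-embedding-3-small (the documents and numbers below are purely illustrative):

```python
import numpy as np

docs = ["Returns are accepted within 30 days.",
        "Shipping takes 3-5 business days.",
        "Our support line is open 9am-5pm."]
doc_vecs = np.array([[1.0, 0.1, 0.0],
                     [0.1, 1.0, 0.0],
                     [0.0, 0.1, 1.0]])  # pretend embeddings, one row per doc

def retrieve(query_vec, doc_vecs, docs):
    # Cosine similarity between the query and each document vector
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec))
    return docs[int(np.argmax(sims))]  # best-matching chunk for the prompt

query_vec = np.array([0.9, 0.2, 0.1])  # pretend embedding of "Can I return this?"
print(retrieve(query_vec, doc_vecs, docs))
```

In a real pipeline the retrieved chunk would then be inserted into the LLM prompt as context, which is exactly the exercise we do in class.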
For QLoRA, this is the original paper from May 2023 which follows an earlier paper on LoRA from June 2021. This blog post has good explanations of the methodology and hyper-parameters.
Here are some details on the 4 main LoRA hyper-parameters at the top of the Jupyter notebook:
- TARGET_MODULES: which components of the Transformer’s architecture should have low-rank matrices added.
- LORA_R: the rank of these low-rank matrices that get added to each of the target modules. I’ve seen 8, 16 and 32 as the most common R values.
- LORA_ALPHA: the scaling factor applied during low-rank adaptation; the rule of thumb ‘from the internet’ is that Alpha should be double R! I’ve experimented and that seemed to work for me.
- LORA_DROPOUT: the dropout rate that’s applied to the low-rank matrices, randomly setting a fraction of the inputs to zero during training to help prevent overfitting. I’ve seen 0.1 and 0.05 commonly used.
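A quick back-of-envelope calculation shows why these low-rank matrices make fine-tuning so cheap: a target module with a d_out × d_in weight matrix has d_out·d_in trainable parameters under full fine-tuning, but LoRA only trains A (r × d_in) and B (d_out × r). The dimensions below are illustrative, not taken from any specific model:

```python
def lora_params(d_in, d_out, r):
    """Trainable parameters LoRA adds for one target module: A plus B."""
    return r * d_in + d_out * r

d_in = d_out = 4096              # a typical projection size in a large model
full = d_in * d_out              # parameters trained by full fine-tuning
lora = lora_params(d_in, d_out, r=32)

scaling = 64 / 32  # LORA_ALPHA / LORA_R; the "alpha = 2R" rule of thumb gives 2.0

print(full, lora, round(100 * lora / full, 2))  # LoRA trains ~1.6% of the weights here
```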
Near Future Links
The Stargate Supercomputer that Microsoft and OpenAI are partnering to build.
Ilya Sutskever, former OpenAI Chief Scientist, has launched a new company Safe Superintelligence (SSI) with Daniel Levy (former OpenAI) and Daniel Gross (former Y Combinator partner).
Robotics Links
Humanoid Robotics:
Robotics Models and Frameworks:
- GROOT from Nvidia
- RFM1 – 8B parameter LLM for Robotics from Covariant
- LeRobot framework from Hugging Face
Recreating the Robotics Dataset Visualization:
See the LeRobot GitHub repo here and follow their setup instructions:
git clone https://github.com/huggingface/lerobot.git && cd lerobot
conda create -y -n lerobot python=3.10 && conda activate lerobot
pip install .
pip install ".[aloha, pusht]"
And then to visualize the dataset of the Aloha-Mobile robot cooking a shrimp, run this:
python lerobot/scripts/visualize_dataset.py --repo-id lerobot/aloha_mobile_shrimp --episode-index 0
The Extra Project for Fun
I mentioned my experiment to train an LLM on my 240,000 text message history. My write-up of the journey is here, and the subsequent blog posts take you through the adventure of replicating this yourself!
Finally
If you make it this far through the resources, THANK YOU for hanging in there until the end! Please connect with me on LinkedIn and stay in touch.