How LaMDA Works

Thomas Packer, Ph.D.
Feb 7, 2023

Last updated: 2023-02-09

I started one story on How ChatGPT Works. Since then, Google has responded to the popularity of ChatGPT by announcing plans for its own LaMDA model in the form of Bard. For now, it seems there will be an ongoing battle between the two. Both are impressive and instructive, so I will add this story about LaMDA to parallel the one about ChatGPT. I expect to learn a lot about how to make an awesome dialog system along the way.

Pieces of LaMDA


LaMDA uses Google’s SentencePiece sub-word tokenizer.
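To give a feel for what a sub-word tokenizer does, here is a toy greedy longest-match sketch in Python. This is not SentencePiece itself (which learns its vocabulary with BPE or a unigram language model and treats whitespace as an ordinary symbol); the vocabulary here is invented purely for illustration.

```python
# Toy sub-word tokenizer: greedy longest-match against a fixed vocabulary.
# Real SentencePiece LEARNS its vocabulary from data; this only illustrates
# the core idea of splitting an unknown word into known sub-word pieces.

def subword_tokenize(text, vocab):
    """Split text into the longest vocabulary pieces, left to right."""
    pieces = []
    i = 0
    while i < len(text):
        # Try the longest possible match first.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                pieces.append(text[i:j])
                i = j
                break
        else:
            # No piece matched: fall back to a single character.
            pieces.append(text[i])
            i += 1
    return pieces

vocab = {"token", "ize", "izer", "r", "s"}
print(subword_tokenize("tokenizers", vocab))  # ['token', 'izer', 's']
```

The payoff of sub-word tokenization is that a fixed, modest vocabulary can still represent any input string, since rare words decompose into common pieces.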

The Large Language Model

LaMDA is a family of Transformer-based neural language models with up to 137B model parameters.


For pre-training, they collected a dataset of 1.56T words from public dialog data and other public web documents.

For basic fine-tuning, they collected a dataset of dialogs in which the system responses are annotated for safety and quality. I’d like to know how big this dataset is.

For groundedness fine-tuning, they collected a dataset of dialogs between people and LaMDA, annotated with the search queries issued and the results retrieved. I’d also like to know how big this dataset is.

Objectives and Metrics

Three types of objectives guide the training of LaMDA: quality, safety, and groundedness. Each is defined and driven in practice by one or more metrics.

Quality (a.k.a. SSI) is split into three facets. Sensibleness ensures that a response makes sense in the dialog context. Specificity ensures the response is tailored to the current dialog context rather than being generic enough to fit many others. Interestingness ensures the response is not too predictable or bland. All three are rated by humans.

Safety attempts to decrease the chance that what the agent says will cause unintended harm or unfair bias against people.

Groundedness is a proxy for truthfulness. It is defined as the percentage of the responses containing claims about the external world that can be supported by authoritative external sources.

Informativeness, related to groundedness, measures how information-rich the system is. It is defined as the percentage of all responses that contain information about the external world and can be supported by known sources. Casual responses that carry no real-world information lower informativeness but do not affect groundedness.
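The distinction between the two percentages is easiest to see with a tiny worked example. The responses and their labels below are invented; in practice the labels come from human raters.

```python
# Toy illustration of groundedness vs. informativeness.
# Each response is hand-labeled with whether it makes a claim about the
# external world and whether that claim is supported by a known source.

responses = [
    {"text": "Everest is 8,849 m tall.",    "has_claim": True,  "supported": True},
    {"text": "The moon is made of cheese.", "has_claim": True,  "supported": False},
    {"text": "Ha ha, good one!",            "has_claim": False, "supported": False},
]

with_claims = [r for r in responses if r["has_claim"]]

# Groundedness: of the responses that make external-world claims,
# what fraction are supported? Casual responses are excluded here.
groundedness = sum(r["supported"] for r in with_claims) / len(with_claims)

# Informativeness: of ALL responses, what fraction carry a supported
# external-world claim? Casual responses count against this one.
informativeness = sum(r["supported"] for r in responses) / len(responses)

print(groundedness)    # 1 supported out of 2 claim-bearing responses
print(informativeness) # 1 supported out of 3 responses overall
```

So a chatty agent that rarely asserts facts can score well on groundedness while scoring poorly on informativeness.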

How LaMDA Was Made

Step 1: Pre-Training

The pre-training dataset was tokenized into 2.81T SentencePiece tokens. The self-supervised task was to predict each next token in a sentence, given the previous tokens. This pre-trained model was used in NLP research across Google, including program synthesis, zero-shot learning, and style transfer.
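The next-token objective can be sketched in a few lines. The "model" below is a stand-in that returns made-up probabilities; the point is only the shape of the loss, averaged negative log-likelihood of each actual next token.

```python
# Sketch of the self-supervised pre-training objective: at every position,
# predict the next token given all previous tokens, and minimize the
# negative log-likelihood of the token that actually appears.

import math

def next_token_loss(tokens, model_prob):
    """Average negative log-likelihood of each next token in a sequence."""
    losses = []
    for t in range(1, len(tokens)):
        context, target = tokens[:t], tokens[t]
        p = model_prob(context, target)  # stand-in for P(target | context)
        losses.append(-math.log(p))
    return sum(losses) / len(losses)

# Stand-in model: uniform over a pretend 4-token vocabulary.
uniform = lambda context, target: 0.25

print(next_token_loss(["the", "cat", "sat"], uniform))  # -log(0.25) ≈ 1.386
```

A real Transformer computes all of these conditional probabilities in one forward pass; the training signal is the same.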

Step 2: Fine-Tuning

LaMDA is fine-tuned as a multi-task model to perform two tasks: text generation and classification. The generator (self-supervised) predicts the next token on a dialog-only dataset. The classifiers (supervised) predict the safety and quality ratings for a whole response in context.

When applied in practice, the LaMDA generator first generates several candidate responses given the current multi-turn dialog context, and the LaMDA classifiers predict the quality and safety scores for each response candidate. Candidate responses with low safety scores are removed. The remaining candidates are re-ranked by quality. The top result is selected as the response. They also filter the generation task training data using the classifiers to increase the density of high-quality response candidates.
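The serving-time pipeline above can be sketched as a filter-then-rerank step. The candidate texts, scores, and safety threshold below are all invented for illustration; in LaMDA the generator and the classifiers are the same fine-tuned model, not separate components.

```python
# Sketch of the serving pipeline: generate several candidates, drop the
# unsafe ones, re-rank the rest by quality, return the top candidate.

SAFETY_THRESHOLD = 0.8  # assumed cutoff, not a value from the paper

def pick_response(candidates):
    """candidates: list of (text, safety_score, quality_score) tuples."""
    safe = [c for c in candidates if c[1] >= SAFETY_THRESHOLD]
    if not safe:
        return None  # nothing passed the safety filter
    # Highest quality among the safe candidates wins.
    return max(safe, key=lambda c: c[2])[0]

candidates = [
    ("Here's a rude joke...",     0.30, 0.90),  # unsafe: filtered out
    ("I'd recommend the museum.", 0.95, 0.70),
    ("Okay.",                     0.99, 0.20),  # safe but bland
]
print(pick_response(candidates))  # I'd recommend the museum.
```

Note the ordering of concerns: safety acts as a hard filter, while quality only decides among the survivors.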

Step 3: Grounded Fine-Tuning

I’m not sure if this is a separate fine-tuning step or the same as Step 2, but …

They fine-tune LaMDA’s generator and classifier on the groundedness dataset so the model learns to call an external search engine while interacting with a user, improving the groundedness of its responses. This also enables zero-shot domain adaptation: hard-code a domain-specific starting system utterance to open a new conversation, and rely on search results to ground subsequent responses in that domain.
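The search-grounding loop might be sketched as follows. This is a heavily simplified stand-in: in LaMDA the model itself learns to emit search queries mid-dialog, while here both the query trigger and the search backend are mocked for illustration.

```python
# Toy sketch of grounded generation: a draft response contains a factual
# claim, so we issue a search query and replace the claim with what the
# (mocked) search tool returns.

def mock_search(query):
    # Stand-in for the external search engine LaMDA learns to call.
    facts = {"height of mount everest": "8,849 m"}
    return facts.get(query.lower())

def grounded_reply(draft, claim=None, query=None):
    """Return the draft, grounding its claim with a search result if given."""
    if claim and query:
        result = mock_search(query)
        if result:
            return draft.replace(claim, result)
    return draft  # no claim to ground, or search found nothing

print(grounded_reply(
    "Everest is about 8,800 m tall.",
    claim="about 8,800 m",
    query="height of Mount Everest",
))  # Everest is 8,849 m tall.
```

The design point is that facts live in the external tool, not in the model's weights, which is what makes the groundedness metric improvable without retraining.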
