
DeepLearning.AI

How Transformer LLMs Work

  • up to 1 hour
  • Beginner

This course offers a deep dive into the main components of the transformer architecture that powers large language models (LLMs). Gain a strong technical foundation in transformers, understand recent improvements, and explore implementations in the Hugging Face Transformers library.

  • Transformer architecture
  • Tokenization strategies
  • Attention mechanism
  • Language model processing
  • Transformer block evolution

Overview

In this course, you'll learn how the transformer architecture that powers LLMs works. You'll build intuition for how LLMs process text and work with code examples that illustrate the key components of the transformer architecture. By the end of the course, you'll have a deep understanding of how LLMs process language, and you'll be able to read papers that introduce new models and follow the architectural details they describe.
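
To give a flavor of the code you'll work with, here is a minimal, hedged sketch of tokenizing a prompt and generating text with the Hugging Face Transformers library. The checkpoint name ("gpt2") and the generation settings are illustrative assumptions, not necessarily what the course uses.

```python
# Minimal sketch: tokenize a prompt and generate text with Hugging Face Transformers.
# "gpt2" and the decoding settings are illustrative choices, not the course's exact setup.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Transformers process text by"
inputs = tokenizer(prompt, return_tensors="pt")   # text -> token ids

# Greedy decoding: the model repeatedly predicts the most likely next token.
output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

The course unpacks what happens inside the model during each of those next-token predictions.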

  • Online (course location)
  • English (course language)
  • Self-paced (course format)
  • Live classes delivered online

Who is this course for?

AI Enthusiasts

Individuals interested in understanding the inner workings of transformer architectures that power today's LLMs.

Data Scientists

Professionals looking to deepen their knowledge of transformer models and their applications in AI.

Developers

Developers aiming to build applications using large language models and understand their underlying architecture.

Gain a deep understanding of transformer architectures that power today's LLMs. Learn key components like tokenization, embeddings, and self-attention, and explore recent improvements. Ideal for AI enthusiasts, data scientists, and developers looking to advance their careers.

Prerequisites

  • Basic understanding of machine learning concepts

  • Familiarity with programming languages such as Python

  • Interest in AI and language models

What will you learn?

Introduction
An overview of the course and its objectives.
Understanding Language Models: Language as a Bag-of-Words
Exploration of how language has been represented numerically, starting from the Bag-of-Words model.
Understanding Language Models: (Word) Embeddings
Introduction to word embeddings and their role in language models.
Understanding Language Models: Encoding and Decoding Context with Attention
Explanation of how attention mechanisms encode and decode context in language models.
Understanding Language Models: Transformers
Detailed look at the transformer architecture and its components.
Tokenizers
Discussion of tokenization strategies and their importance in language models (a short code sketch follows this syllabus).
Architectural Overview
Overview of the transformer architecture and its evolution.
The Transformer Block
In-depth analysis of the transformer block and its components.
Self-Attention
Detailed explanation of the self-attention mechanism and its role in transformers (see the attention sketch after this syllabus).
Model Example
Practical example of a model using the transformer architecture.
Recent Improvements
Overview of recent improvements to the transformer architecture.
Mixture of Experts (MoE)
Introduction to the Mixture of Experts model and its applications.
Conclusion
Summary of the course and its key takeaways.
Quiz
Assessment to test understanding of the course material.
Appendix – Tips, Help, and Download
Additional resources and tips for further learning.
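
As a small preview of the Tokenizers module, the sketch below shows how a subword tokenizer splits text into tokens and ids, again using the Hugging Face Transformers library. The checkpoint name ("bert-base-uncased") is an illustrative assumption rather than the one used in the lessons.

```python
# Sketch: how a subword tokenizer splits text into tokens and maps them to ids.
# "bert-base-uncased" is an illustrative checkpoint, not necessarily the course's choice.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

text = "Tokenization splits text into subword units."
tokens = tokenizer.tokenize(text)                 # subword pieces, e.g. ['token', '##ization', ...]
ids = tokenizer.convert_tokens_to_ids(tokens)     # the integer ids the model actually sees

print(tokens)
print(ids)
```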
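
For the Self-Attention module, here is a rough single-head scaled dot-product attention sketch in plain NumPy, showing the core computation softmax(QKᵀ/√d)·V. The shapes, random projection weights, single head, and lack of masking are simplifying assumptions; real transformer blocks use learned multi-head attention.

```python
# Minimal single-head scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
# Shapes and random weights are illustrative; real models add multiple heads,
# causal masks, and learned projections trained end to end.
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    q = x @ w_q                                       # queries, (seq_len, d)
    k = x @ w_k                                       # keys,    (seq_len, d)
    v = x @ w_v                                       # values,  (seq_len, d)
    scores = q @ k.T / np.sqrt(k.shape[-1])           # similarity of every position to every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ v                                # each position mixes information from all positions

rng = np.random.default_rng(0)
seq_len, d = 4, 8
x = rng.normal(size=(seq_len, d))                     # toy token embeddings
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)         # (4, 8)
```

Each output row is a weighted mix of all value vectors, which is how every position gathers context from the whole sequence.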

Meet your instructors

  • Jay Alammar

    Director and Engineering Fellow, Cohere

    Co-author of Hands-On Large Language Models

  • Maarten Grootendorst

    Senior Clinical Data Scientist, Netherlands Comprehensive Cancer Organization

    Co-author of Hands-On Large Language Models

Upcoming cohorts

  • Dates: start now
  • Free