
DeepLearning.AI

How Transformer LLMs Work

  • up to 1 hour
  • Beginner

This course offers a deep dive into the main components of the transformer architecture that powers large language models (LLMs). Gain a strong technical foundation in transformers, understand recent improvements, and explore implementations in the Hugging Face Transformers library.

  • Transformer architecture
  • Tokenization strategies
  • Attention mechanism
  • Language model processing
  • Transformer block evolution

Overview

In this course, you'll learn how the transformer architecture that powers LLMs works. You'll build intuition for how LLMs process text and work with code examples that illustrate the key components of the transformer architecture. By the end of the course, you'll have a deep understanding of how LLMs process language, and you'll be able to read papers that introduce new models and follow the architectural details they describe.
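
To give a flavor of the code you'll work with, here is a minimal, hedged sketch of tokenizing a prompt and generating text with the Hugging Face Transformers library. The checkpoint name ("gpt2") and the generation settings are illustrative assumptions, not necessarily what the course uses.

```python
# Minimal sketch: tokenize a prompt and generate text with Hugging Face Transformers.
# "gpt2" and the decoding settings are illustrative choices, not the course's exact setup.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Transformers process text by"
inputs = tokenizer(prompt, return_tensors="pt")   # text -> token ids

# Greedy decoding: the model repeatedly predicts the most likely next token.
output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

The course unpacks what happens inside the model during each of those next-token predictions.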

  • Online (course location)
  • English (course language)
  • Self-paced (course format)
  • Live classes delivered online

Who is this course for?

AI Enthusiasts

Individuals interested in understanding the inner workings of transformer architectures that power today's LLMs.

Data Scientists

Professionals looking to deepen their knowledge of transformer models and their applications in AI.

Developers

Developers aiming to build applications using large language models and understand their underlying architecture.

Gain a deep understanding of transformer architectures that power today's LLMs. Learn key components like tokenization, embeddings, and self-attention, and explore recent improvements. Ideal for AI enthusiasts, data scientists, and developers looking to advance their careers.

Prerequisites

  • Basic understanding of machine learning concepts

  • Familiarity with programming languages such as Python

  • Interest in AI and language models

What will you learn?

Introduction
An overview of the course and its objectives.
Understanding Language Models: Language as a Bag-of-Words
Exploration of how language has been represented numerically, starting from the Bag-of-Words model.
Understanding Language Models: (Word) Embeddings
Introduction to word embeddings and their role in language models.
Understanding Language Models: Encoding and Decoding Context with Attention
Explanation of how attention mechanisms encode and decode context in language models.
Understanding Language Models: Transformers
Detailed look at the transformer architecture and its components.
Tokenizers
Discussion of tokenization strategies and their importance in language models (a short code sketch follows this syllabus).
Architectural Overview
Overview of the transformer architecture and its evolution.
The Transformer Block
In-depth analysis of the transformer block and its components.
Self-Attention
Detailed explanation of the self-attention mechanism and its role in transformers (see the attention sketch after this syllabus).
Model Example
Practical example of a model using the transformer architecture.
Recent Improvements
Overview of recent improvements to the transformer architecture.
Mixture of Experts (MoE)
Introduction to the Mixture of Experts model and its applications.
Conclusion
Summary of the course and its key takeaways.
Quiz
Assessment to test understanding of the course material.
Appendix – Tips, Help, and Download
Additional resources and tips for further learning.
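
As a small preview of the Tokenizers module, the sketch below shows how a subword tokenizer splits text into tokens and ids, again using the Hugging Face Transformers library. The checkpoint name ("bert-base-uncased") is an illustrative assumption rather than the one used in the lessons.

```python
# Sketch: how a subword tokenizer splits text into tokens and maps them to ids.
# "bert-base-uncased" is an illustrative checkpoint, not necessarily the course's choice.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

text = "Tokenization splits text into subword units."
tokens = tokenizer.tokenize(text)                 # subword pieces, e.g. ['token', '##ization', ...]
ids = tokenizer.convert_tokens_to_ids(tokens)     # the integer ids the model actually sees

print(tokens)
print(ids)
```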
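
For the Self-Attention module, here is a rough single-head scaled dot-product attention sketch in plain NumPy, showing the core computation softmax(QKᵀ/√d)·V. The shapes, random projection weights, single head, and lack of masking are simplifying assumptions; real transformer blocks use learned multi-head attention.

```python
# Minimal single-head scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
# Shapes and random weights are illustrative; real models add multiple heads,
# causal masks, and learned projections trained end to end.
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    q = x @ w_q                                       # queries, (seq_len, d)
    k = x @ w_k                                       # keys,    (seq_len, d)
    v = x @ w_v                                       # values,  (seq_len, d)
    scores = q @ k.T / np.sqrt(k.shape[-1])           # similarity of every position to every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ v                                # each position mixes information from all positions

rng = np.random.default_rng(0)
seq_len, d = 4, 8
x = rng.normal(size=(seq_len, d))                     # toy token embeddings
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)         # (4, 8)
```

Each output row is a weighted mix of all value vectors, which is how every position gathers context from the whole sequence.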

Meet your instructors

  • Jay Alammar

    Director and Engineering Fellow, Cohere

    Co-author of Hands-On Large Language Models

  • Maarten Grootendorst

    Senior Clinical Data Scientist, Netherlands Comprehensive Cancer Organization

    Co-author of Hands-On Large Language Models

Upcoming cohorts

  • Dates: start now
  • Free