Practice Labs: SLP (Jurafsky/Martin)

Source PDF: ed3book_jan26.pdf
Book: Speech and Language Processing (3rd Edition draft) by Daniel Jurafsky & James H. Martin

This book is a comprehensive NLP textbook covering tokenization, language models, embeddings, neural networks, transformers, and LLMs. The labs broadly follow the book’s chapter order.

This folder serves as a compact bridge to NLP foundations within the math section. It is especially useful if you want the conceptual path from tokens and n-grams to transformers and LLMs in one place.

Labs

Lab | Topic | Book Chapter(s) | Key Concepts
Lab 01 | Words, Tokens & Text Processing | Ch 2: Words and Tokens | BPE tokenization, regex, edit distance
Lab 02 | N-gram Language Models | Ch 3: N-gram Language Models | N-grams, perplexity, smoothing, text generation
Lab 03 | Word Embeddings | Ch 5: Embeddings | Co-occurrence, TF-IDF, Word2Vec, cosine similarity
Lab 04 | Neural Networks from Scratch | Ch 6: Neural Networks | XOR, feedforward nets, backprop, optimizers
Lab 05 | Transformers & Attention | Ch 8: Transformers | Self-attention, multi-head, positional encoding
Lab 06 | Large Language Models | Ch 7, 9, 10: LLMs, MLMs, Post-training | Sampling, prompting, BERT masking, RLHF

How to Use

  1. Each lab is a Jupyter notebook with theory (markdown) and fully implemented code cells
  2. Read the theory cells, study the implementations, and run each cell
  3. To open a lab in Jupyter, run for example: jupyter notebook lab_01_words_tokens.ipynb

Prerequisites

  • Python 3.8+
  • NumPy
  • Matplotlib
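
Before opening the first notebook, it can help to confirm that the environment meets the prerequisites above. The following is a minimal sketch, not part of the labs themselves: the function name check_prerequisites is hypothetical, and the import names numpy and matplotlib are assumed to be the packages the labs use.

```python
import importlib
import sys


def check_prerequisites(min_python=(3, 8), packages=("numpy", "matplotlib")):
    """Return a dict mapping each requirement to an (ok, detail) tuple.

    Hypothetical helper: the package list mirrors the prerequisites above.
    """
    report = {}
    # Compare the running interpreter against the minimum version.
    python_ok = sys.version_info >= min_python
    report["python"] = (python_ok, ".".join(map(str, sys.version_info[:3])))
    for name in packages:
        try:
            module = importlib.import_module(name)
            # Most packages expose __version__; fall back if one does not.
            report[name] = (True, getattr(module, "__version__", "unknown"))
        except ImportError:
            report[name] = (False, "not installed")
    return report


if __name__ == "__main__":
    for name, (ok, detail) in check_prerequisites().items():
        status = "OK" if ok else "MISSING"
        print(f"{status:8} {name:12} {detail}")
```

Any line reported as MISSING points to a package to install, or a Python version to upgrade, before running the labs.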

Suggested Order

Follow the labs in order (1 through 6) as they build upon each other:

  1. Lab 01 - Text processing fundamentals
  2. Lab 02 - Statistical language models
  3. Lab 03 - Word representations
  4. Lab 04 - Neural network foundations
  5. Lab 05 - Transformer architecture
  6. Lab 06 - Modern LLMs and applications
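
If you want to list the notebooks in the suggested order from a script, a small sketch like the one below may help. It assumes every notebook follows the lab_NN_ prefix seen in lab_01_words_tokens.ipynb; only that one filename appears in this document, so the naming pattern for the other labs is an assumption, and ordered_labs is a hypothetical helper name.

```python
import re
from pathlib import Path


def ordered_labs(folder="."):
    """Return lab notebook filenames in `folder`, sorted by lab number.

    Assumes filenames look like lab_<number>_<name>.ipynb (an assumption
    based on the single filename shown in this document).
    """
    pattern = re.compile(r"lab_(\d+)_.*\.ipynb$")
    labs = []
    for path in Path(folder).glob("lab_*.ipynb"):
        match = pattern.match(path.name)
        if match:
            # Sort numerically so that lab 10 would come after lab 2.
            labs.append((int(match.group(1)), path.name))
    return [name for _, name in sorted(labs)]


if __name__ == "__main__":
    for name in ordered_labs():
        print(name)
```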

How To Use This Folder Well

  • Follow the order because each lab builds on the previous one.
  • Use these labs to connect NLP concepts to implementation rather than treating them as only theory notes.
  • Pair this folder with the token, embeddings, and prompt-engineering phases when you want the broader curriculum context.
