Practice Labs: SLP (Jurafsky/Martin)
Source PDF: ed3book_jan26.pdf
Book: Speech and Language Processing (3rd Edition draft) by Daniel Jurafsky & James H. Martin
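This book is a comprehensive NLP textbook covering tokenization, language models, embeddings, neural networks, transformers, and LLMs. The labs largely follow the book’s chapter order; the one exception is that Lab 06 groups Chapters 7, 9, and 10 after Lab 05 covers Chapter 8.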
Use this folder as a compact NLP foundations bridge inside the math section. It is especially useful if you want the conceptual path from tokens and n-grams to transformers and LLMs in one place.
Labs
| Lab | Topic | Book Chapter(s) | Key Concepts |
|---|---|---|---|
| Lab 01 | Words, Tokens & Text Processing | Ch 2: Words and Tokens | BPE tokenization, regex, edit distance |
| Lab 02 | N-gram Language Models | Ch 3: N-gram Language Models | N-grams, perplexity, smoothing, text generation |
| Lab 03 | Word Embeddings | Ch 5: Embeddings | Co-occurrence, TF-IDF, Word2Vec, cosine similarity |
| Lab 04 | Neural Networks from Scratch | Ch 6: Neural Networks | XOR, feedforward nets, backprop, optimizers |
| Lab 05 | Transformers & Attention | Ch 8: Transformers | Self-attention, multi-head, positional encoding |
| Lab 06 | Large Language Models | Ch 7, 9, 10: LLMs, MLMs, Post-training | Sampling, prompting, BERT masking, RLHF |
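To give a feel for the kind of implementation the table refers to, here is a minimal sketch of two Lab 02 concepts: a bigram model with add-one (Laplace) smoothing, and perplexity. It is an illustration only and is not taken from the notebooks. The function names (`train_bigram`, `bigram_prob`, `perplexity`) and the toy corpus are assumptions made for this example.

```python
import math
from collections import Counter


def train_bigram(tokens):
    """Count contexts and bigrams from a token list (hypothetical helper)."""
    context_counts = Counter(tokens[:-1])            # how often each word appears as a context
    bigram_counts = Counter(zip(tokens, tokens[1:]))  # how often each (prev, word) pair appears
    vocab = set(tokens)
    return context_counts, bigram_counts, vocab


def bigram_prob(prev, word, context_counts, bigram_counts, vocab):
    """P(word | prev) with add-one smoothing over the training vocabulary."""
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + len(vocab))


def perplexity(tokens, context_counts, bigram_counts, vocab):
    """Exponentiated average negative log-probability of the bigrams in tokens."""
    log_prob = 0.0
    n = 0
    for prev, word in zip(tokens, tokens[1:]):
        log_prob += math.log(bigram_prob(prev, word, context_counts, bigram_counts, vocab))
        n += 1
    return math.exp(-log_prob / n)


# Toy corpus, assumed for illustration only.
train_tokens = "the cat sat on the mat".split()
model = train_bigram(train_tokens)

seen = perplexity(train_tokens, *model)
unseen = perplexity("mat on cat the sat".split(), *model)
print(f"perplexity on training text: {seen:.2f}")
print(f"perplexity on reordered text: {unseen:.2f}")
```

The reordered text contains only bigrams the model never saw, so its perplexity comes out higher than that of the training text. Smoothing is what keeps those unseen bigrams from receiving zero probability.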
How to Use
- Each lab is a Jupyter notebook with theory (markdown) and fully implemented code cells
- Read the theory cells, study the implementations, and run each cell
- Open a lab in Jupyter, for example: `jupyter notebook lab_01_words_tokens.ipynb`
Prerequisites
- Python 3.8+
- NumPy
- Matplotlib
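Before opening the notebooks, it can help to confirm that the environment meets the prerequisites above. The following is a minimal sketch; the helper name `check_prerequisites` is an assumption made for this example and is not part of the labs.

```python
import importlib
import sys


def check_prerequisites(min_python=(3, 8), packages=("numpy", "matplotlib")):
    """Return a dict mapping each requirement to True (satisfied) or False."""
    results = {"python": sys.version_info >= min_python}
    for name in packages:
        try:
            importlib.import_module(name)  # succeeds only if the package is installed
            results[name] = True
        except ImportError:
            results[name] = False
    return results


if __name__ == "__main__":
    for requirement, ok in check_prerequisites().items():
        print(f"{requirement}: {'OK' if ok else 'MISSING'}")
```

If a package is reported as missing, installing it with `pip install numpy matplotlib` is usually enough. Note that Jupyter itself is also needed to open the notebooks.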
Suggested Order
Follow the labs in order (1 through 6) as they build upon each other:
- Lab 01 - Text processing fundamentals
- Lab 02 - Statistical language models
- Lab 03 - Word representations
- Lab 04 - Neural network foundations
- Lab 05 - Transformer architecture
- Lab 06 - Modern LLMs and applications
How To Use This Folder Well
- Follow the order because each lab builds on the previous one.
- Use these labs to connect NLP concepts to implementation rather than treating them as only theory notes.
- Pair this folder with the token, embeddings, and prompt-engineering phases when you want the broader curriculum context.
What Comes Next
- Continue to ../../04-token/README.md and ../../05-embeddings/README.md for the main repo path.
- Continue to ../../08-rag/README.md if you want to apply language models in retrieval-based systems.
- Continue to ../../12-llm-finetuning/README.md if your interest shifts toward adapting models.