Practice Labs: SLP (Jurafsky/Martin)

Source PDF: ed3book_jan26.pdf
Book: Speech and Language Processing (3rd Edition draft) by Daniel Jurafsky & James H. Martin

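This book is a comprehensive NLP textbook covering tokenization, language models, embeddings, neural networks, transformers, and LLMs. The labs largely follow the book’s chapter order; the one exception is Lab 06, which groups Ch 7, 9, and 10 after the transformer lab (Ch 8).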

Use this folder as a compact bridge to NLP foundations within the math section. It is especially useful if you want the conceptual path from tokens and n-grams to transformers and LLMs in one place.

Labs

Lab	Topic	Book Chapter(s)	Key Concepts
Lab 01	Words, Tokens & Text Processing	Ch 2: Words and Tokens	BPE tokenization, regex, edit distance
Lab 02	N-gram Language Models	Ch 3: N-gram Language Models	N-grams, perplexity, smoothing, text generation
Lab 03	Word Embeddings	Ch 5: Embeddings	Co-occurrence, TF-IDF, Word2Vec, cosine similarity
Lab 04	Neural Networks from Scratch	Ch 6: Neural Networks	XOR, feedforward nets, backprop, optimizers
Lab 05	Transformers & Attention	Ch 8: Transformers	Self-attention, multi-head, positional encoding
Lab 06	Large Language Models	Ch 7, 9, 10: LLMs, MLMs, Post-training	Sampling, prompting, BERT masking, RLHF
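
To give a feel for the from-scratch style these topics lend themselves to, here is a minimal sketch of two Lab 02 concepts: a bigram model with add-one (Laplace) smoothing and its perplexity. It is not taken from the notebooks. The function names (train_bigram, bigram_prob, perplexity), the toy corpus, and the whitespace tokenization are all assumptions made for illustration.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams over sentences padded with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]  # naive whitespace tokenization
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams):
    """Add-one smoothed estimate of P(word | prev)."""
    # <s> is never predicted, so it is excluded from the vocabulary size.
    vocab_size = sum(1 for w in unigrams if w != "<s>")
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


def perplexity(sentence, unigrams, bigrams):
    """Per-token perplexity of one sentence, counting </s> but not <s>."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    log_prob = sum(
        math.log(bigram_prob(prev, word, unigrams, bigrams))
        for prev, word in zip(tokens, tokens[1:])
    )
    return math.exp(-log_prob / (len(tokens) - 1))


# Hypothetical toy corpus, for illustration only.
corpus = ["i like nlp", "i like tokens"]
unigrams, bigrams = train_bigram(corpus)
print(perplexity("i like nlp", unigrams, bigrams))     # word order seen in training
print(perplexity("tokens like i", unigrams, bigrams))  # unseen word order scores higher
```

Summing the log probabilities instead of multiplying raw probabilities avoids numerical underflow on longer sentences.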

How to Use

Each lab is a Jupyter notebook with theory (markdown) and fully implemented code cells
Read the theory cells, study the implementations, and run each cell
Open in Jupyter: jupyter notebook lab_01_words_tokens.ipynb

Prerequisites

Python 3.8+
NumPy
Matplotlib
Jupyter Notebook (to open and run the .ipynb files)
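
Before opening the first notebook, it can help to confirm that the Python version and the NumPy and Matplotlib packages listed above are available. The following is a minimal sketch of such a check using only the standard library. The function name check_environment and its parameters are hypothetical and not part of this folder.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 8), packages=("numpy", "matplotlib")):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if sys.version_info < min_python:
        found = ".".join(str(part) for part in sys.version_info[:3])
        needed = ".".join(str(part) for part in min_python)
        problems.append(f"Python {needed}+ required, found {found}")
    for name in packages:
        # find_spec locates a top-level package without importing it.
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing package: {name}")
    return problems


issues = check_environment()
if issues:
    print("Environment problems:")
    for issue in issues:
        print(" -", issue)
else:
    print("Environment looks ready for the labs.")
```

Pass a different packages tuple to check for anything else you rely on.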

Suggested Order

Follow the labs in order (1 through 6) as they build upon each other:

Lab 01 - Text processing fundamentals
Lab 02 - Statistical language models
Lab 03 - Word representations
Lab 04 - Neural network foundations
Lab 05 - Transformer architecture
Lab 06 - Modern LLMs and applications

How To Use This Folder Well

Follow the order because each lab builds on the previous one.
Use these labs to connect NLP concepts to implementation rather than treating them as only theory notes.
Pair this folder with the token, embeddings, and prompt-engineering phases when you want the broader curriculum context.

What Comes Next

Continue to ../../04-token/README.md and ../../05-embeddings/README.md for the main repo path.
Continue to ../../08-rag/README.md if you want to apply language models in retrieval-augmented generation (RAG) systems.
Continue to ../../12-llm-finetuning/README.md if your interest shifts toward adapting models.