LLM Fine-Tuning

Approach this module as decision-making, not just training mechanics. The real questions are when fine-tuning is the right tool, how to prepare data well enough for it to matter, and how to evaluate whether the resulting model is actually better than prompting or RAG.

Treat fine-tuning as a selective tool, not a default answer. Most learners get more value from understanding prompting, retrieval, evaluation, and deployment tradeoffs first, and only then deciding whether to tune a model.

Actual Module Contents

  1. 01_START_HERE.ipynb
  2. 02_dataset_preparation.ipynb
  3. 03_supervised_finetuning.ipynb
  4. 04_lora_basics.ipynb
  5. 05_qlora_efficient.ipynb
  6. 06_dpo_alignment.ipynb
  7. 07_evaluation.ipynb
  8. 08_deployment.ipynb
  9. 09_grpo_reasoning_training.ipynb
  10. 10_unsloth_fast_finetuning.ipynb
  11. 11_quantization_gptq_awq.ipynb
  12. 12_rlhf_constitutional_ai.ipynb
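
  • First pass: 01 -> 02 -> 03 -> 04 -> 05 -> 07 -> 08 (orientation, data, SFT, LoRA, QLoRA, evaluation, deployment)
  • Second pass for alignment: 06 -> 09 -> 12 (DPO, GRPO, RLHF and constitutional AI)
  • Deployment and efficiency depth: 10 -> 11 (Unsloth, GPTQ/AWQ quantization)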

What To Learn Here

  • When fine-tuning beats prompting
  • Why dataset quality dominates training quality
  • How LoRA and QLoRA reduce hardware needs
  • Why evaluation must be task-specific
  • The distinction between SFT, preference optimization, and RL-style alignment
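
The hardware savings from LoRA and QLoRA can be estimated with simple arithmetic. The sketch below is a rough illustration, not a figure from the notebooks: the layer size (4096 x 4096), rank (8), and model size (7 billion parameters) are assumed values, and the memory estimate covers weight storage only, ignoring activations, optimizer state, and quantization overhead.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two low-rank factors: A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory needed to store the weights alone, in decimal gigabytes."""
    return n_params * bits_per_param / 8 / 1e9


# Illustrative numbers (assumptions, not measurements).
d_in = d_out = 4096
rank = 8

full = d_in * d_out                              # 16,777,216 weights in the frozen matrix
lora = lora_trainable_params(d_in, d_out, rank)  # 65,536 trainable adapter weights
print(f"trainable fraction for this layer: {lora / full:.4%}")  # 0.3906%

# Storing a 7B-parameter base model in 16-bit vs. 4-bit precision.
print(weight_memory_gb(7e9, 16))  # 14.0
print(weight_memory_gb(7e9, 4))   # 3.5
```

LoRA shrinks what has to be trained; QLoRA additionally shrinks how much memory the frozen base weights occupy. Real memory use is higher than these weight-only numbers suggest.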

Study Advice

  • Do not start with RLHF terminology if SFT data formatting is still fuzzy.
  • Treat 07_evaluation.ipynb as a required notebook, not an optional one.
  • Compare every fine-tuning idea against a prompting baseline and a RAG baseline.
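
One way to make the baseline comparison concrete is to score every candidate system on the same examples with the same metric. The sketch below assumes each system can be wrapped as a function from an input string to an output string; the three stub systems and the exact-match metric are hypothetical placeholders, and a real task will usually need a task-specific metric.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]  # (input text, expected answer)


def normalize(text: str) -> str:
    """Light normalization so trivial formatting differences do not count as errors."""
    return " ".join(text.strip().lower().split())


def exact_match_accuracy(system: Callable[[str], str], examples: List[Example]) -> float:
    """Fraction of examples where the system output matches the expected answer."""
    if not examples:
        raise ValueError("need at least one example")
    hits = sum(normalize(system(x)) == normalize(y) for x, y in examples)
    return hits / len(examples)


def compare_systems(
    systems: Dict[str, Callable[[str], str]], examples: List[Example]
) -> Dict[str, float]:
    """Score every system on the same examples so results are directly comparable."""
    return {name: exact_match_accuracy(fn, examples) for name, fn in systems.items()}


# Hypothetical stand-ins: replace with real calls to each pipeline.
examples = [("2+2", "4"), ("capital of France", "Paris"), ("3*3", "9")]
systems = {
    "prompting": lambda x: {"2+2": "4"}.get(x, "unknown"),
    "rag": lambda x: {"2+2": "4", "capital of France": "paris"}.get(x, "unknown"),
    "fine_tuned": lambda x: {"2+2": "4", "capital of France": "Paris", "3*3": "9"}.get(x, "unknown"),
}

for name, score in compare_systems(systems, examples).items():
    print(f"{name}: {score:.2f}")
```

If the fine-tuned model does not clearly beat both baselines on the metric that matters for the task, the tuning cost is hard to justify.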

How To Use This Phase Well

  • Start with SFT, LoRA, and evaluation before moving into alignment-heavy notebooks.
  • Keep a baseline model and task benchmark so you can measure whether tuning actually helped.
  • Focus on data quality and task framing before spending time on training tricks.
  • Pair deployment and monitoring work here with ../09-mlops/README.md once you have a model worth serving.
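
To judge whether tuning "actually helped" rather than benefiting from noise, one option is a paired bootstrap over per-example scores from the baseline and the tuned model on the same benchmark. This is a minimal sketch of that idea, not the method used in the notebooks; the function name, resample count, and seed are assumptions chosen for illustration.

```python
import random
from typing import List


def paired_bootstrap_win_rate(
    baseline: List[float],
    tuned: List[float],
    n_resamples: int = 2000,
    seed: int = 0,
) -> float:
    """Fraction of bootstrap resamples in which the tuned model's total score
    is strictly higher than the baseline's on the same resampled examples."""
    if len(baseline) != len(tuned) or not baseline:
        raise ValueError("score lists must be non-empty and the same length")

    rng = random.Random(seed)  # fixed seed so the result is reproducible
    diffs = [t - b for b, t in zip(baseline, tuned)]
    n = len(diffs)

    wins = 0
    for _ in range(n_resamples):
        # Resample example indices with replacement, keeping pairs together.
        resampled_total = sum(diffs[rng.randrange(n)] for _ in range(n))
        if resampled_total > 0:
            wins += 1
    return wins / n_resamples


# Hypothetical per-example scores (1.0 = correct, 0.0 = wrong).
baseline_scores = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
tuned_scores = [1, 1, 0, 1, 1, 1, 0, 1, 1, 0]
print(paired_bootstrap_win_rate(baseline_scores, tuned_scores))
```

A win rate close to 1.0 suggests the improvement is stable across resamples; a value near 0.5 suggests the benchmark is too small or the difference too slight to draw a conclusion.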

Practical Outcomes

After this module, you should be able to:

  • Prepare instruction-format data
  • Run an adapter-based fine-tune
  • Evaluate whether the tuned model improved on a concrete task
  • Package or deploy the result without confusing training success for production readiness
  • Use coding agents (OpenHands, OpenCode, Aider) to accelerate script writing, debugging, and evaluation
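
As an illustration of the first outcome, the sketch below cleans raw records and serializes them as JSONL. The `instruction` / `input` / `output` field names follow a common Alpaca-style convention and are an assumption here; the notebooks may use a different schema, such as chat-style message lists.

```python
import json
from typing import Dict, Iterable, List

REQUIRED_FIELDS = ("instruction", "output")  # assumed schema, see note above


def clean_records(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop records with empty required fields, trim whitespace, and remove
    exact duplicates of the (instruction, input) pair."""
    seen = set()
    cleaned = []
    for rec in records:
        if any(not str(rec.get(field, "")).strip() for field in REQUIRED_FIELDS):
            continue  # skip incomplete examples instead of training on them
        normalized = {
            "instruction": str(rec["instruction"]).strip(),
            "input": str(rec.get("input", "")).strip(),
            "output": str(rec["output"]).strip(),
        }
        key = (normalized["instruction"], normalized["input"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(normalized)
    return cleaned


def to_jsonl(records: Iterable[Dict[str, str]]) -> str:
    """Serialize one JSON object per line."""
    return "\n".join(json.dumps(rec, ensure_ascii=False) for rec in records)


# Hypothetical raw data with one duplicate and one incomplete record.
raw = [
    {"instruction": "Summarize the text.", "input": "LoRA adds low-rank adapters.", "output": "LoRA trains small adapters."},
    {"instruction": " Summarize the text. ", "input": "LoRA adds low-rank adapters.", "output": "Duplicate prompt."},
    {"instruction": "Translate to French.", "input": "Hello", "output": ""},
]

print(to_jsonl(clean_records(raw)))
```

Checks like these are cheap to run and catch problems that no amount of training will fix later.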

Agent-Assisted Fine-tuning

Notebook 01_START_HERE.ipynb (Section 10) covers how coding agents like OpenHands, OpenCode, and mini-swe-agent can automate the engineering side of the fine-tuning pipeline: scaffolding training scripts, debugging out-of-memory (OOM) errors, generating hyperparameter sweep configs, and writing evaluation harnesses.

Dataset curation and alignment decisions remain human work. The agents accelerate everything around those decisions.
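
Hyperparameter sweep configs are a typical example of the mechanical work an agent can scaffold. The sketch below expands a small grid into one config per run; the parameter names and values are hypothetical and not tied to any particular training library.

```python
import itertools
import json
from typing import Any, Dict, List


def expand_grid(grid: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Return one config dict per combination of the listed values."""
    keys = list(grid)
    combos = itertools.product(*(grid[key] for key in keys))
    return [dict(zip(keys, values)) for values in combos]


# Hypothetical search space for an adapter-based fine-tune.
grid = {
    "learning_rate": [1e-4, 2e-4],
    "lora_rank": [8, 16],
    "num_epochs": [1, 3],
}

configs = expand_grid(grid)
for index, config in enumerate(configs):
    config["run_name"] = f"sweep-{index:02d}"  # stable name for logs and checkpoints

print(len(configs))  # 8
print(json.dumps(configs[0]))
```

Which values belong in the grid, and which metric decides the winner, are still decisions for a human to make.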
