Data Science Foundations

This folder is the practical base layer for the rest of the repo. If you are not yet comfortable with arrays, DataFrames, plots, train/test splits, and the fit/predict/transform workflow, later LLM and deep learning phases will feel harder than they need to.

This is one of the highest-leverage phases in the repo. Without a solid grounding here, later notebook work turns into imitating library calls rather than understanding what they do.

What This Phase Covers

  • NumPy for numerical thinking
  • pandas for messy real-world data
  • matplotlib for basic visualization
  • scikit-learn for classical ML workflows
  • broader data science examples for exploratory and applied practice

Folder Map

  • 1-numpy-examples/: array operations, broadcasting, indexing, exercises
  • 2-pandas-examples/: cleaning, joins, grouping, time-series handling, projects
  • 3-data-science-examples/: broader learning material and reference notebooks
  • 4-matplotlib/: plotting fundamentals
  • 5-scikit-learn/: a very large example library across major model families
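
As a taste of what the NumPy folder practices, here is a minimal sketch of broadcasting and boolean indexing. The array values and variable names are made up for illustration and are not taken from the notebooks.

```python
import numpy as np

# Hypothetical data: 3 samples (rows) x 2 features (columns).
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

# Column means have shape (2,). Broadcasting stretches them across
# all 3 rows, so no explicit loop is needed to center each column.
col_means = X.mean(axis=0)
centered = X - col_means

# Boolean indexing: keep only rows whose first feature is above its mean.
above_mean = X[X[:, 0] > col_means[0]]

print(centered)
print(above_mean)
```

The same shape-aware thinking carries over to pandas and scikit-learn, where most inputs are 2-D arrays of samples by features.
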

Recommended Order

  1. Work through 1-numpy-examples/
  2. Move to 2-pandas-examples/
  3. Use 4-matplotlib/ for core plotting habits
  4. In 5-scikit-learn/, focus first on:
    • linear_model/
    • model_selection/
    • preprocessing/
    • ensemble/
    • cluster/
  5. Use 3-data-science-examples/ as breadth and reinforcement, not as a strict sequential course
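
The first three scikit-learn subfolders listed above map onto modules that are typically used together. The sketch below is one possible way to combine `preprocessing`, `linear_model`, and `model_selection`. The choice of the Iris dataset, `LogisticRegression`, and 5-fold cross-validation are assumptions made for illustration, not something the notebooks prescribe.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: Iris is used only because it ships with scikit-learn.
X, y = load_iris(return_X_y=True)

# preprocessing + linear_model chained into a single estimator.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# model_selection: 5-fold cross-validation. The scaler is re-fit inside
# each training fold, so validation folds never influence the scaling.
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean(), scores.std())
```

Wrapping the scaler and the model in one pipeline is a design choice that keeps preprocessing inside the validation loop instead of applying it to the full dataset up front.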

Study Advice

  • Do not try to complete every scikit-learn notebook on the first pass.
  • Prefer one full workflow over broad shallow browsing: data loading -> cleaning -> feature work -> split -> train -> evaluate -> explain results.
  • Keep notes on leakage, validation mistakes, and metric choice. Those habits matter more than memorizing APIs.
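
The workflow and the leakage warning above can be sketched in a few lines. This is a minimal example under stated assumptions: the wine dataset, the 25% test split, logistic regression, and the two metrics are all illustrative choices, and feature work is reduced to scaling for brevity.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load: a pandas DataFrame with feature columns plus a "target" column.
df = load_wine(as_frame=True).frame

# Clean: this dataset has no missing values, so dropna() is a stand-in
# for the real cleaning a messier dataset would need.
df = df.dropna()
X = df.drop(columns="target")
y = df["target"]

# Split before fitting anything that learns from the data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Avoid leakage: fit the scaler on the training rows only, then apply
# the same fitted transform to the test rows.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Train and evaluate with more than one metric.
clf = LogisticRegression(max_iter=1000).fit(X_train_s, y_train)
pred = clf.predict(X_test_s)
acc = accuracy_score(y_test, pred)
macro_f1 = f1_score(y_test, pred, average="macro")
print(acc, macro_f1)
```

Calling `fit` on the full dataset before splitting would let test-set statistics shape the scaling, which is the kind of validation mistake worth writing down in your notes.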

How To Use This Phase Well

  • Treat this as skill-building, not just exposure to libraries.
  • Finish at least one end-to-end tabular workflow before moving on.
  • Use the large scikit-learn folder selectively instead of trying to clear it like a checklist.
  • Revisit this phase later when evaluation, leakage, or debugging issues show up in advanced projects.

Suggested Projects

  • Iris or wine classification with proper validation
  • Housing-price regression with feature engineering
  • Customer segmentation with clustering and a short business write-up
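
For the segmentation project, a starting point might look like the sketch below. The data is synthetic: `make_blobs` stands in for a real customer table, and the two columns could be imagined as hypothetical features such as spend and visit frequency. The choice of three clusters and the silhouette score as a quality check are assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for customer data: 300 rows, 2 features,
# drawn around three well-separated, hand-picked centers.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [8, 8], [-8, 8]],
    cluster_std=1.0,
    random_state=0,
)

# Scale first so no single feature dominates the distance computation.
X_scaled = StandardScaler().fit_transform(X)

# Cluster into 3 segments and check how cleanly they separate.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)
score = silhouette_score(X_scaled, labels)

# Segment sizes are a natural first number for the business write-up.
sizes = {int(k): int((labels == k).sum()) for k in set(labels.tolist())}
print(sizes, score)
```

On real data the number of clusters is rarely known in advance, so comparing silhouette scores across several values of `n_clusters` is a reasonable next step.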

What Comes Next

After this phase, move to:

  1. 03-maths/README.md
  2. 04-token/README.md
  3. 05-embeddings/README.md

If you are still shaky on Python while doing this phase, loop back to 01-python/README.md alongside the exercises instead of waiting until you are stuck.
