Data Science Foundations
This folder is the practical base layer for the rest of the repo. If you are not yet comfortable with arrays, DataFrames, plots, train/test splits, and the fit/predict/transform workflow, later LLM and deep learning phases will feel harder than they need to.
This is one of the highest-leverage phases in the repo. If your grounding here is weak, later notebook work turns into copying library calls rather than understanding what they do.
What This Phase Covers
- NumPy for numerical thinking
- pandas for messy real-world data
- matplotlib for basic visualization
- scikit-learn for classical ML workflows
- broader data science examples for exploratory and applied practice
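As a rough illustration of the first two items, the sketch below centers an array with NumPy broadcasting and cleans a small table with pandas. The array values, the `city` and `sales` columns, and the median fill are made up for the example and are not taken from the notebooks in this folder.

```python
import numpy as np
import pandas as pd

# NumPy broadcasting: subtract per-column means from every row without a loop.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])
col_means = X.mean(axis=0)      # shape (2,)
X_centered = X - col_means      # (3, 2) minus (2,) broadcasts across rows

# pandas: a tiny hypothetical table with a missing value and messy labels.
df = pd.DataFrame({
    "city": ["Oslo", "oslo ", "Bergen", "Bergen"],
    "sales": [100.0, np.nan, 80.0, 120.0],
})
df["city"] = df["city"].str.strip().str.title()         # normalize labels
df["sales"] = df["sales"].fillna(df["sales"].median())  # simple imputation
sales_by_city = df.groupby("city")["sales"].sum()

print(X_centered)
print(sales_by_city)
```

Filling with the median is only one possible choice here; the point is the habit of inspecting and normalizing data before aggregating it.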
Folder Map
- 1-numpy-examples/: array operations, broadcasting, indexing, exercises
- 2-pandas-examples/: cleaning, joins, grouping, time-series handling, projects
- 3-data-science-examples/: broader learning material and reference notebooks
- 4-matplotlib/: plotting fundamentals
- 5-scikit-learn/: a very large example library across major model families
Recommended First Pass
- Work through 1-numpy-examples/
- Move to 2-pandas-examples/
- Use 4-matplotlib/ for core plotting habits
- In 5-scikit-learn/, focus first on:
  - linear_model/
  - model_selection/
  - preprocessing/
  - ensemble/
  - cluster/
- Use 3-data-science-examples/ as breadth and reinforcement, not as a strict sequential course
Study Advice
- Do not try to complete every scikit-learn notebook on the first pass.
- Prefer one full workflow over broad shallow browsing: data loading -> cleaning -> feature work -> split -> train -> evaluate -> explain results.
- Keep notes on leakage, validation mistakes, and metric choice. Those habits matter more than memorizing APIs.
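One way the split -> train -> evaluate part of that workflow might look is sketched below. It is a minimal example, not the approach used by any particular notebook here: the choice of the built-in wine dataset, `LogisticRegression`, the 25% test split, and 5-fold cross-validation are all assumptions made for illustration. The scaler sits inside the pipeline so it is re-fit on each training fold, which is one common way to avoid leakage.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a small built-in dataset (no download needed).
X, y = load_wine(return_X_y=True)

# Split first, so the test set never influences any fitted step.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The scaler is part of the pipeline, so cross-validation re-fits it
# on each training fold instead of seeing the validation fold.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Estimate performance on the training data only.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)

# Fit once on all training data, then score the held-out test set a single time.
model.fit(X_train, y_train)
test_accuracy = accuracy_score(y_test, model.predict(X_test))

print(f"CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
print(f"Held-out accuracy: {test_accuracy:.3f}")
```

Accuracy is used only because the classes are fairly balanced; on an imbalanced problem a different metric would likely be a better fit, which is the kind of decision worth recording in your notes.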
How To Use This Phase Well
- Treat this as skill-building, not just exposure to libraries.
- Finish at least one end-to-end tabular workflow before moving on.
- Use the large scikit-learn folder selectively instead of trying to clear it like a checklist.
- Revisit this phase later when evaluation, leakage, or debugging issues show up in advanced projects.
Suggested Projects
- Iris or wine classification with proper validation
- Housing-price regression with feature engineering
- Customer segmentation with clustering and a short business write-up
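For the clustering project, a possible starting point is sketched below. The "customer" data is synthetic (two made-up groups described by annual spend and visits per month), and `KMeans` with two clusters plus a silhouette score is just one reasonable setup, not a prescribed solution.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic customers: columns are annual spend and visits per month.
rng = np.random.default_rng(0)
low = rng.normal(loc=[200.0, 2.0], scale=[30.0, 0.5], size=(50, 2))
high = rng.normal(loc=[900.0, 8.0], scale=[60.0, 1.0], size=(50, 2))
customers = np.vstack([low, high])

# Scale first so spend (hundreds) does not dominate visits (single digits).
X_scaled = StandardScaler().fit_transform(customers)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

# Silhouette score ranges from -1 to 1; higher means better-separated clusters.
score = silhouette_score(X_scaled, labels)
print(f"Silhouette score: {score:.2f}")
```

On real customer data the number of clusters is rarely known in advance, so the write-up should explain how it was chosen and what each segment means in business terms.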
What Comes Next
After this phase, move to:
If you are still shaky on Python while doing this phase, loop back to 01-python/README.md alongside the exercises instead of waiting until you are stuck.