Pandas Examples - Consolidated & Organized
This directory contains a comprehensive collection of pandas tutorials, exercises, and real-world projects, consolidated from multiple sources and organized for progressive learning.
Use this folder to become comfortable with messy real-world tables. The goal is not just learning pandas syntax. The goal is being able to inspect, clean, join, summarize, and explain tabular data without getting lost.
📁 Directory Structure
01-basics/
Beginner-friendly tutorials covering fundamental pandas concepts
Pandas 101 Series (YouTube Course):
- Pandas 101 - Pandas Series and Dataframes.ipynb - Core data structures
- Pandas 101 - Reading in Files.ipynb - Loading data from various sources
- Pandas 101 - Data Cleaning in Pandas.ipynb - Handling missing data, duplicates
- Pandas 101 - Filtering and Ordering in Pandas.ipynb - Boolean indexing, sorting
- Pandas 101 - Group by and Aggregating in Pandas.ipynb - Grouping and aggregations
- Pandas 101 - Indexing in Pandas.ipynb - Row/column selection techniques
- Pandas 101 - Merge, Join, and Concatenate in Pandas.ipynb - Combining DataFrames
- Pandas 101 - Visualizing Data in Pandas.ipynb - Creating plots
- Pandas 101 - Exploratory Data Analysis in Pandas.ipynb - EDA techniques
DataFrame Fundamentals:
- DataFrames I.ipynb - Introduction to DataFrames
- DataFrames II.ipynb - Intermediate DataFrame operations
- DataFrames III.ipynb - Advanced DataFrame techniques
Pandas & NumPy Integration:
- pandas-numpy-lessons/ - Three lessons on pandas with NumPy
  - lesson1/ - Basic integration
  - lesson2/ - Intermediate techniques
  - lesson3/ - Advanced operations
Topics Covered:
- Series and DataFrame creation
- Reading CSV, Excel, JSON files
- Data cleaning (missing values, duplicates, formatting)
- Indexing and selection (.loc, .iloc, boolean indexing)
- Filtering, sorting, and ordering
- Basic aggregations and statistics
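The selection and sorting topics above can be sketched in a few lines. This is a minimal illustration on made-up data (the frame, its index labels, and the variable names are assumptions, not taken from any notebook in this folder):

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {
        "name": ["Alice", "Bob", "Charlie"],
        "age": [25, 30, 35],
        "city": ["NYC", "SF", "LA"],
    },
    index=["a", "b", "c"],
)

# Label-based selection: the row labelled "b", column "age".
age_of_b = df.loc["b", "age"]

# Position-based selection: first two rows, first two columns.
first_two = df.iloc[:2, :2]

# Boolean indexing: keep only rows where the condition is True.
over_28 = df[df["age"] > 28]

# Sorting: oldest first.
by_age = df.sort_values("age", ascending=False)

print(age_of_b)
print(first_two)
print(over_28)
print(by_age)
```

Note that .loc slices by label while .iloc slices by integer position, which is why the two calls above take different kinds of arguments.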
Recommended Order:
- Start with Pandas 101 series (in order listed above)
- Work through DataFrames I-III
- Explore pandas-numpy-lessons
02-intermediate/
Intermediate topics for deepening pandas knowledge
Comprehensive course from “Data Analysis with Pandas and Python”:
Core Operations:
- DataFrames 1.ipynb - DataFrame fundamentals (review)
- DataFrames 2.ipynb - Intermediate operations
- DataFrames 3.ipynb - Advanced operations
- GroupBy.ipynb - Advanced grouping and aggregation
- Input and Output.ipynb - Reading/writing various formats
- Merge, Join and Concat.ipynb - Combining datasets
- Multiindex.ipynb - Hierarchical indexing
- Options and Settings.ipynb - Customizing pandas behavior
Data Operations:
- Filtering Methods.ipynb - Advanced filtering techniques
- Missing Data.ipynb - Handling NaN and None values
- Text Methods and Filtering.ipynb - String operations
- Working with Dates and Times.ipynb - DateTime operations
Visualization & Analysis:
- Visualizations.ipynb - Plotting with pandas
- Working with Duplicates.ipynb - Finding and removing duplicates
Topics Covered:
- Advanced groupby operations (multiple aggregations, transformations)
- Multi-level indexing (hierarchical data)
- DateTime manipulation and time series
- String methods and text processing
- Complex filtering and boolean logic
- Data type conversions and optimization
- Handling large datasets efficiently
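As a rough illustration of "multiple aggregations, transformations", the sketch below contrasts agg (one row per group) with transform (one value per original row). The sales table and column names are hypothetical:

```python
import pandas as pd

# Hypothetical sales data, used only for illustration.
sales = pd.DataFrame(
    {
        "region": ["east", "east", "west", "west", "west"],
        "amount": [100.0, 300.0, 50.0, 150.0, 100.0],
    }
)

# Named aggregation: several statistics per group in one call.
summary = sales.groupby("region").agg(
    total=("amount", "sum"),
    average=("amount", "mean"),
    orders=("amount", "count"),
)

# transform returns a result aligned to the original rows, so it can be
# used directly in column arithmetic.
region_total = sales.groupby("region")["amount"].transform("sum")
sales["share_of_region"] = sales["amount"] / region_total

print(summary)
print(sales)
```

Reaching for transform instead of merging the aggregated table back is usually the shorter route when a per-row value depends on a group-level statistic.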
03-exercises/
Practice exercises to test and improve your pandas skills
100 Pandas Puzzles:
- 100-pandas-puzzles.ipynb - 100 curated pandas challenges
- 100-pandas-puzzles-with-solutions.ipynb - Same with solutions
Description: Inspired by 100 NumPy exercises, these puzzles focus on core DataFrame and Series manipulation, covering indexing, grouping, aggregating, and data cleaning.
Difficulty Levels:
- ★☆☆ Easy - Basic operations
- ★★☆ Medium - Combining multiple techniques
- ★★★ Hard - Complex multi-step solutions
Topic-Based Exercises:
Located in topic-based-exercises/:
01 - Getting & Knowing Your Data:
- Chipotle, Occupation, World Food Facts datasets
- Basic exploration, info, describe, shape, columns
02 - Filtering & Sorting:
- Chipotle, Euro12, Fictional Army datasets
- Boolean indexing, sorting, conditional selection
03 - Grouping:
- Alcohol Consumption, Occupation, Regiment datasets
- GroupBy operations, aggregations, transformations
04 - Apply:
- Students Alcohol Consumption, US Crime Rates
- apply, map, and applymap functions (DataFrame.applymap is deprecated since pandas 2.1 in favor of DataFrame.map)
05 - Merge:
- Auto MPG, Fictitious Names, Housing Market
- Merge, join, concat operations
06 - Stats:
- US Baby Names, Wind Stats
- Statistical operations, rolling windows
07 - Visualization:
- Chipotle, Online Retail, Scores, Tips, Titanic
- Matplotlib integration, plotting techniques
08 - Creating Series and DataFrames:
- Pokemon dataset
- Programmatic DataFrame creation
09 - Time Series:
- Apple Stock, Financial Data, Investor Flows
- DateTime indexing, resampling, time-based operations
10 - Deleting:
- Iris, Wine datasets
- Dropping rows, columns, duplicates
11 - Indexing:
- Advanced indexing exercises
- Setting, resetting, multi-level indices
Each topic includes:
- Exercises.ipynb (practice problems)
- Solutions.ipynb (detailed solutions with explanations)
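For a feel of what the Time Series and Stats topics involve before opening the notebooks, here is a small sketch of resampling and a rolling window. The price series is invented for illustration and is not one of the exercise datasets:

```python
import pandas as pd

# Hypothetical daily closing prices; 2024-01-01 is a Monday.
prices = pd.Series(
    [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0],
    index=pd.date_range("2024-01-01", periods=10, freq="D"),
    name="close",
)

# Resample daily data to weekly means. With freq "W" each bin ends on a
# Sunday, so the first bin covers Jan 1-7 and the second Jan 8-14.
weekly_mean = prices.resample("W").mean()

# 3-day rolling mean; the first two positions have no full window yet
# and are therefore NaN.
rolling_3d = prices.rolling(window=3).mean()

print(weekly_mean)
print(rolling_3d)
```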
04-advanced/
Advanced techniques and specialized topics
pandas-cookbook/
A comprehensive cookbook with advanced recipes.
Located in: pandas-cookbook/
Contents:
- Advanced data manipulation techniques
- Performance optimization strategies
- Memory-efficient operations
- Complex transformations and aggregations
- Integration with other libraries
Topics:
- Custom aggregation functions
- Window functions and rolling operations
- Categorical data optimization
- Working with large datasets
- Advanced indexing patterns
- Data pipeline design
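To make "categorical data optimization" concrete, the sketch below compares the memory footprint of a repetitive string column before and after converting it to the category dtype. The data is synthetic, and the exact byte counts depend on your pandas version and string dtype, so only the relative comparison is meaningful:

```python
import pandas as pd

# Synthetic column: 30,000 values drawn from only three distinct strings.
cities = pd.Series(["NYC", "SF", "LA"] * 10_000)

# deep=True counts the memory of the string objects themselves.
bytes_as_strings = cities.memory_usage(deep=True)

# A categorical stores each distinct value once plus small integer codes.
cities_cat = cities.astype("category")
bytes_as_category = cities_cat.memory_usage(deep=True)

print(f"strings:  {bytes_as_strings:,} bytes")
print(f"category: {bytes_as_category:,} bytes")
```

The saving shrinks as the number of distinct values approaches the number of rows, so this is worth checking per column rather than applying blindly.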
05-real-world-projects/
Real-world data analysis projects
Projects:
- Apple Health Data.ipynb
  - Analyzing personal health data exports
  - Time series analysis of activity, heart rate, sleep
  - Visualization of health trends
- Electronic Production India.ipynb
  - Economic data analysis
  - Industry production trends
  - Regional comparisons
Datasets:
- bigmac.csv - Big Mac Index data (purchasing power parity)
- chicago.csv - Chicago city data
- crime_india.csv - Crime statistics
- Additional real-world datasets
Skills Applied:
- Data cleaning and preprocessing
- Exploratory data analysis (EDA)
- Time series analysis
- Statistical analysis
- Data visualization
- Insight generation and reporting
🎯 Learning Path
Beginner (0-3 weeks)
01-basics/
├── Complete Pandas 101 series (9 notebooks)
├── DataFrames I-III
└── Practice: First 30 puzzles from 100-pandas-puzzles
Time: 2-3 hours daily
Goal: Understand Series, DataFrames, basic operations
Intermediate (3-8 weeks)
02-intermediate/
├── All notebooks (focus on GroupBy, MultiIndex, DateTime)
03-exercises/
├── Complete 100 pandas puzzles
└── Topics 01-05 from topic-based-exercises
Time: 1-2 hours daily
Goal: Master grouping, merging, advanced indexing
Advanced (8-12 weeks)
03-exercises/
├── Topics 06-11 (Stats, Viz, Time Series)
04-advanced/
└── pandas-cookbook (select relevant recipes)
05-real-world-projects/
└── Complete both projects
Time: 1-2 hours daily
Goal: Apply techniques to real-world scenarios
How To Use This Folder Well
- Finish one complete cleaning-and-analysis workflow before trying to browse every notebook.
- Prioritize missing data, grouping, joins, time handling, and validation mistakes because those show up constantly in real projects.
- Use the exercises for repetition and the real-world projects for synthesis.
- Return here whenever later phases expose weak data-cleaning or feature-preparation habits.
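Join and validation mistakes are easier to catch when the merge itself checks your assumptions. The sketch below is one possible pattern, using hypothetical orders and customers tables: validate raises an error if the key relationship is not what you expect, and indicator adds a _merge column showing which rows found a match.

```python
import pandas as pd

# Hypothetical tables, used only for illustration.
orders = pd.DataFrame({"order_id": [1, 2, 3], "customer_id": [10, 20, 99]})
customers = pd.DataFrame(
    {"customer_id": [10, 20, 30], "name": ["Alice", "Bob", "Charlie"]}
)

merged = orders.merge(
    customers,
    on="customer_id",
    how="left",
    # Raises pandas.errors.MergeError if customer_id is duplicated in
    # customers, which would silently multiply order rows otherwise.
    validate="many_to_one",
    # Adds a "_merge" column: "both", "left_only", or "right_only".
    indicator=True,
)

# Orders whose customer_id has no match in the customers table.
unmatched = merged[merged["_merge"] == "left_only"]

print(merged)
print(unmatched)
```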
📊 Content Statistics
- Total Notebooks: 151 notebooks
- Total Size: 173 MB
- Exercise Sets: 100+ puzzles + 11 topic-based sets
- Real-World Projects: 2 complete projects
- Datasets: 20+ CSV/Excel files included
🗂️ Source Repositories
This consolidated collection combines content from:
- 100-pandas-puzzles - https://github.com/ajcr/100-pandas-puzzles
- Data-Analysis-with-Pandas-and-Python - Udemy course materials
- data-analysis-with-python-and-pandas - Another pandas course
- pandas_exercises - https://github.com/guipsamora/pandas_exercises
- pandas-and-numpy - Integration tutorial
- pandas-cookbook - Advanced recipes and techniques
- PandasYouTubeSeries - YouTube Pandas 101 course
🚀 Getting Started
Installation
# Install pandas
pip install pandas
# Optional: Install visualization libraries
pip install matplotlib seaborn
# Optional: Install additional data libraries
pip install openpyxl xlrd
Quick Start
import pandas as pd
import numpy as np
# Verify installation
print(f"pandas version: {pd.__version__}")
# Create a simple DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['NYC', 'SF', 'LA']
})
print(df)
💡 Study Tips
- Hands-On Practice
  - Type all examples (don’t copy-paste)
  - Modify examples to test understanding
  - Complete exercises before looking at solutions
- Use Documentation
  - df.method? in Jupyter for quick help
  - Official pandas docs: https://pandas.pydata.org/docs/
- Practice Daily
  - 30-60 minutes daily beats weekend cramming
  - Complete 3-5 exercises per session
- Learn Shortcuts
  - Method chaining for cleaner code
  - Vectorized operations over loops
  - Use .pipe() for custom operations
- Benchmark Performance
  - Use %timeit to compare approaches
  - Learn memory-efficient techniques
  - Understand when to use .loc vs .iloc
- Real Data Practice
  - Use Kaggle datasets for practice
  - Analyze your own data (fitness, finance, etc.)
  - Contribute to open-source projects
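The "Learn Shortcuts" tips (method chaining, vectorized operations, .pipe()) fit together naturally. The following is a small sketch under assumed data; add_bmi is a hypothetical helper written for this example, not a pandas function:

```python
import pandas as pd


def add_bmi(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: vectorized column arithmetic, no Python loop."""
    return df.assign(bmi=df["weight_kg"] / df["height_m"] ** 2)


# Hypothetical raw data with one missing name.
raw = pd.DataFrame(
    {
        "name": ["Alice", "Bob", None],
        "weight_kg": [60.0, 80.0, 70.0],
        "height_m": [1.60, 2.00, 1.75],
    }
)

# Each step returns a new DataFrame, so the chain reads top to bottom
# and the original frame is left untouched.
result = (
    raw
    .dropna(subset=["name"])   # drop rows without a name
    .pipe(add_bmi)             # plug a custom function into the chain
    .sort_values("bmi", ascending=False)
    .reset_index(drop=True)
)

print(result)
```

In Jupyter, timing a chain like this against a loop-based version with %timeit is a quick way to see why vectorized operations are preferred.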
What Comes Next
- Continue to ../4-matplotlib/README.md to turn cleaned data into clear plots.
- Continue to ../5-scikit-learn/README.md to build models on top of the workflows you practice here.
- Continue to ../../28-practical-data-science/README.md later when you want more project-oriented applied work.
🔍 Key Pandas Concepts
Must-Know Operations
- Selection: .loc[], .iloc[], boolean indexing
- Filtering: Boolean conditions, .query()
- Grouping: .groupby(), aggregations
- Merging: .merge(), .join(), pd.concat() (a top-level function, not a DataFrame method)
- Reshaping: .pivot(), .melt(), .stack(), .unstack()
- DateTime: .dt accessor, resampling
- Strings: .str accessor methods
- Missing Data: .isna(), .fillna(), .dropna()
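Reshaping is often the least intuitive entry in this list, so here is a minimal round trip between wide and long layouts. The table and its column names are invented for illustration:

```python
import pandas as pd

# Hypothetical wide table: one column per year.
wide = pd.DataFrame(
    {
        "city": ["NYC", "SF"],
        "2023": [100, 80],
        "2024": [110, 90],
    }
)

# melt: wide -> long, one row per (city, year) pair.
long_df = wide.melt(id_vars="city", var_name="year", value_name="sales")

# pivot: long -> wide again, with city as the index and years as columns.
back = long_df.pivot(index="city", columns="year", values="sales")

print(long_df)
print(back)
```

Long format is usually what groupby and plotting libraries expect, while wide format tends to be easier to read as a report.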
Performance Tips
- Use categorical dtype for strings with few unique values
- Use .query() for complex boolean operations
- Prefer vectorized operations over .apply()
- Use .pipe() for readable method chains
- Consider chunking for very large datasets
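Chunking can be sketched as follows. The example reads a small in-memory CSV in pieces so that it runs anywhere; with a real file you would pass a path instead of the StringIO object. The data and the chunk size of 4 are arbitrary assumptions for illustration:

```python
import io

import pandas as pd

# Build a tiny CSV in memory (10 rows) as a stand-in for a large file.
csv_text = "region,amount\n" + "\n".join(
    f"{'east' if i % 2 == 0 else 'west'},{i}" for i in range(10)
)

# With chunksize set, read_csv yields DataFrames of at most that many
# rows instead of loading everything at once.
partials = [
    chunk.groupby("region")["amount"].sum()
    for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4)
]

# Combine the per-chunk sums into overall totals per region.
totals = pd.concat(partials).groupby(level=0).sum()

print(totals)
```

This works because a sum can be computed from partial sums; statistics such as the median need a different strategy when the data does not fit in memory.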
📖 Additional Resources
Official Documentation
Books (Free Online)
- Python for Data Analysis (3rd Edition) by Wes McKinney (pandas creator)
- Pandas Cookbook
Practice Datasets
🎓 Certification Readiness
This collection prepares you for:
- Data Analyst roles
- Data Science positions (pandas foundation)
- Python for Data Analysis certifications
- Kaggle competitions
Skills You’ll Master:
- Data cleaning and preprocessing
- Exploratory data analysis (EDA)
- Statistical analysis
- Data visualization
- Time series analysis
- Data transformation and aggregation
📝 Next Steps After Completion
-
Apply to Real Projects
- Analyze public datasets
- Contribute to data science blogs
- Build a portfolio on GitHub
-
Advanced Topics
- Dask for big data (out-of-memory datasets)
- Polars (faster alternative to pandas)
- PySpark for distributed computing
-
Domain Applications
- Finance: stock analysis, portfolio optimization
- Healthcare: patient data analysis
- Marketing: customer segmentation
- Sports: performance analytics
📄 License
Individual directories may have their own licenses. Please refer to original repository licenses for attribution and usage rights.
Last Updated: December 2024
Consolidated By: Automated organization process
Original Size: 188 MB (166 files across 7 directories)
Consolidated Size: 173 MB (151 notebooks - 8% reduction)
Organization: Beginner → Intermediate → Exercises → Advanced → Projects