Assignment: Build a Neural Network from Scratch
🎯 Objective
Build a complete neural network from scratch (without PyTorch/TensorFlow) to classify the MNIST handwritten digits dataset. This assignment will solidify your understanding of how neural networks actually work under the hood.
Estimated Time: 6-8 hours
Difficulty: ⭐⭐⭐ Intermediate
Suggested Pace: 1-2 weeks if you are moving through the course sequentially
📋 Requirements
Part 1: Network Architecture
Implement a 3-layer neural network (two hidden layers plus the output layer; the input layer is not counted) with:
- Input layer: 784 neurons (28x28 flattened images)
- Hidden layer 1: 128 neurons with ReLU activation
- Hidden layer 2: 64 neurons with ReLU activation
- Output layer: 10 neurons with Softmax activation (digits 0-9)
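As a quick sanity check on the architecture above, the sketch below lists the parameter shapes implied by the layer sizes and counts the trainable parameters. The `(n_in, n_out)` weight layout is an assumption chosen to match the initialization hint later in this assignment; `param_shapes` and `total_params` are illustrative names, not part of the required interface.

```python
import numpy as np

layer_sizes = [784, 128, 64, 10]

# One (weight shape, bias shape) pair per connection between consecutive layers.
# Assumed layout: W has shape (n_in, n_out), so a batch X of shape (N, n_in)
# is mapped with X @ W + b.
param_shapes = [
    ((n_in, n_out), (n_out,))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

total_params = sum(int(np.prod(w)) + int(np.prod(b)) for w, b in param_shapes)

for i, (w_shape, b_shape) in enumerate(param_shapes, start=1):
    print(f"layer {i}: W{w_shape} b{b_shape}")
print("trainable parameters:", total_params)
```

Printing the shapes before writing any training code makes shape mismatches in the forward and backward passes easier to spot.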
Required Implementation:
class NeuralNetwork:
    def __init__(self, layer_sizes):
        """
        Initialize network with given layer sizes.

        Args:
            layer_sizes: List of layer sizes, e.g., [784, 128, 64, 10]
        """
        # TODO: Initialize weights and biases
        pass

    def forward(self, X):
        """Forward pass through the network."""
        # TODO: Implement forward propagation
        pass

    def backward(self, X, y):
        """Backward pass - compute gradients."""
        # TODO: Implement backpropagation
        pass

    def update_weights(self, learning_rate):
        """Update weights using computed gradients."""
        # TODO: Implement gradient descent update
        pass

Part 2: Training Loop
Implement the training process:
- Load and preprocess MNIST dataset
- Implement mini-batch gradient descent
- Use categorical cross-entropy loss
- Train for at least 10 epochs
- Track training and validation loss per epoch
- Plot learning curves
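The loss named above is usually paired with a softmax output. The following is a minimal sketch, not a reference solution, of a numerically stable softmax, the categorical cross-entropy on one-hot labels, and the well-known gradient of the combined softmax plus cross-entropy with respect to the logits. The helper names (`softmax`, `cross_entropy`, `one_hot`) and the small `eps` guard are assumptions for illustration.

```python
import numpy as np

def softmax(logits):
    # Subtracting the row-wise max leaves the result unchanged
    # but keeps np.exp from overflowing on large logits.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def one_hot(labels, num_classes=10):
    out = np.zeros((labels.size, num_classes))
    out[np.arange(labels.size), labels] = 1.0
    return out

def cross_entropy(probs, y_onehot, eps=1e-12):
    # Mean over the batch of -sum_k y_k * log(p_k); eps guards against log(0).
    return -np.mean(np.sum(y_onehot * np.log(probs + eps), axis=1))

logits = np.array([[2.0, 1.0, 0.1],
                   [1000.0, 1000.0, 1000.0]])  # second row would overflow a naive softmax
labels = np.array([0, 2])

probs = softmax(logits)
y = one_hot(labels, num_classes=3)
loss = cross_entropy(probs, y)

# Gradient of the mean loss with respect to the logits.
grad_logits = (probs - y) / len(labels)
print(loss, grad_logits)
```

Computing the gradient on the logits directly avoids differentiating softmax and the loss separately, which is a common source of backpropagation bugs.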
Training Requirements:
- Batch size: 32-128 (your choice)
- Learning rate: 0.001-0.01 (experiment)
- Validation split: 20% of training data
- Save best model based on validation accuracy
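One possible way to set up the 20% validation split and the mini-batches is sketched below on random stand-in data rather than MNIST. The function names `train_val_split` and `iterate_minibatches`, the fixed seeds, and the batch size of 64 are all assumptions for illustration.

```python
import numpy as np

def train_val_split(X, y, val_fraction=0.2, seed=0):
    # Shuffle once, then hold out the first val_fraction of the indices.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_val = int(len(X) * val_fraction)
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]

def iterate_minibatches(X, y, batch_size=64, rng=None):
    # Reshuffle on every call so each epoch sees batches in a new order.
    if rng is None:
        rng = np.random.default_rng()
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]

# Stand-in data with MNIST-like shapes (not the real dataset).
X = np.random.default_rng(1).random((1000, 784))
y = np.random.default_rng(2).integers(0, 10, size=1000)

X_train, y_train, X_val, y_val = train_val_split(X, y)
batch_sizes = [len(xb) for xb, _ in iterate_minibatches(X_train, y_train, batch_size=64)]
print(len(X_train), len(X_val), batch_sizes[-1])
```

The last batch of an epoch can be smaller than `batch_size`, so any loss averaging should divide by the actual batch length.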
Part 3: Evaluation & Analysis
Evaluate your trained model:
- Calculate final test accuracy (target: >90%)
- Create confusion matrix
- Show examples of misclassified digits
- Analyze which digits are most confused
- Visualize learned weights from first layer
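The confusion matrix also gives every per-class metric directly. The sketch below builds one from integer labels with plain NumPy and derives precision, recall, and F1 from it. It assumes rows are true classes and columns are predicted classes; the function names and the toy three-class labels are illustrative only.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes=10):
    # cm[i, j] counts samples whose true class is i and predicted class is j.
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

def per_class_metrics(cm):
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)  # column sums: how often each class was predicted
    actual = cm.sum(axis=1)     # row sums: how often each class truly occurs
    # Classes with a zero denominator get a metric of 0 instead of NaN.
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return precision, recall, f1

# Toy example with three classes.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

cm = confusion_matrix(y_true, y_pred, num_classes=3)
precision, recall, f1 = per_class_metrics(cm)
accuracy = np.trace(cm) / cm.sum()
print(cm)
print(precision, recall, f1, accuracy)
```

The largest off-diagonal entries of `cm` point to the digit pairs that are confused most often.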
Required Metrics:
# Calculate and report:
- Test Accuracy
- Precision per class
- Recall per class
- F1-Score per class
- Overall confusion matrix
Part 4: Experimentation & Documentation
Experiment and document your findings:
- Experiment 1: Try 3 different learning rates - which works best?
- Experiment 2: Compare 2-layer vs 3-layer networks
- Experiment 3: Try different activation functions (sigmoid, tanh, ReLU)
- Experiment 4: Test different weight initialization strategies
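For the activation-function experiment, it may help to keep each activation next to its derivative so the backward pass can look both up by name. The sketch below is one way to do that; the `ACTIVATIONS` dictionary and the convention that every derivative takes the pre-activation `z` are assumptions, not part of the required interface.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def tanh_grad(z):
    return 1.0 - np.tanh(z) ** 2

def relu(z):
    return np.maximum(0.0, z)

def relu_grad(z):
    # Subgradient convention: the derivative at exactly z == 0 is taken as 0.
    return (z > 0).astype(z.dtype)

# name -> (activation, derivative with respect to the pre-activation z)
ACTIVATIONS = {
    "sigmoid": (sigmoid, sigmoid_grad),
    "tanh": (np.tanh, tanh_grad),
    "relu": (relu, relu_grad),
}

z = np.array([-2.0, -0.5, 0.5, 2.0])
for name, (fn, grad_fn) in ACTIVATIONS.items():
    print(name, fn(z).round(3), grad_fn(z).round(3))
```

With a lookup like this, switching the hidden-layer activation between experiments becomes a one-string change.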
Documentation Requirements:
- Create a markdown report with:
- Introduction and approach
- Architecture decisions and rationale
- Training process description
- Results table comparing experiments
- Conclusions and lessons learned
📊 Self-Review Guide
| Criteria | Strong | Working | Emerging | Needs revision |
|---|---|---|---|---|
| Implementation | Clean, efficient, understandable code; all functions work correctly | Mostly correct with minor bugs | Basic implementation with several bugs | Broken or incomplete |
| Architecture | Proper layer sizes, activations, and initialization with clear rationale | Correct structure with minor inefficiencies | Basic structure but suboptimal choices | Incorrect architecture |
| Training | Smooth convergence, validation tracking, clear learning curves | Good training process with minor issues | Training works but is inefficient | Poor training or does not converge |
| Evaluation | Comprehensive analysis, strong visualizations, and target accuracy reached | Good analysis with clear results | Basic evaluation with partial analysis | Incomplete evaluation or very weak results |
| Experiments | Four or more experiments with clear insights | Three experiments with good documentation | Two experiments with basic documentation | Fewer than two experiments or poor analysis |
| Documentation | Clear, professional, and reflective | Well organized and readable | Adequate but could be clearer | Poor or missing documentation |
Suggested Interpretation
- If all core requirements work and your analysis is clear, the project is in strong shape.
- If the model trains but some experiments or analysis are thin, strengthen the documentation before moving on.
- If the implementation is unstable, focus first on correctness, then on accuracy improvements.
🌟 Optional Extensions
Optional Extension 1: Advanced Optimizers
- Implement momentum optimization
- Implement Adam optimizer
- Compare SGD vs Momentum vs Adam with plots
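The three update rules can be compared on a toy problem before wiring them into the network. The sketch below minimizes f(w) = sum(w**2) with plain SGD, classical momentum, and Adam. The function names, the step count, and the hyperparameters (lr=0.05, beta=0.9, and the commonly used Adam defaults beta1=0.9, beta2=0.999) are assumptions for illustration, not tuned values for MNIST.

```python
import numpy as np

def sgd_step(w, grad, lr):
    return w - lr * grad

def momentum_step(w, grad, velocity, lr, beta=0.9):
    # The velocity is an exponentially decaying sum of past gradient steps.
    velocity = beta * velocity - lr * grad
    return w + velocity, velocity

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Running averages of the gradient (m) and squared gradient (v),
    # with bias correction for the zero initialization; t starts at 1.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

def loss_grad(w):
    # Gradient of the toy loss f(w) = sum(w**2).
    return 2.0 * w

w0 = np.array([3.0, -2.0])
w_sgd, w_mom, w_adam = w0.copy(), w0.copy(), w0.copy()
velocity = np.zeros_like(w0)
m, v = np.zeros_like(w0), np.zeros_like(w0)

for t in range(1, 201):
    w_sgd = sgd_step(w_sgd, loss_grad(w_sgd), lr=0.05)
    w_mom, velocity = momentum_step(w_mom, loss_grad(w_mom), velocity, lr=0.05)
    w_adam, m, v = adam_step(w_adam, loss_grad(w_adam), m, v, t, lr=0.05)

for name, w in [("sgd", w_sgd), ("momentum", w_mom), ("adam", w_adam)]:
    print(name, np.sum(w ** 2))
```

In the real network each weight matrix and bias vector needs its own `velocity`, `m`, and `v` arrays of the same shape.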
Optional Extension 2: Regularization
- Add L2 regularization
- Implement dropout
- Show impact on overfitting with plots
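Both regularizers above are small additions to the forward and backward passes. The sketch below shows an L2 penalty with its gradient term and inverted dropout, which rescales the kept activations during training so that nothing changes at test time. The function names, the 0.5 factor in the penalty, and `keep_prob=0.8` are assumptions for illustration.

```python
import numpy as np

def l2_penalty(weights, lam):
    # Added to the data loss; biases are conventionally left unregularized.
    return 0.5 * lam * sum(np.sum(W ** 2) for W in weights)

def l2_grad(W, lam):
    # Added to the data-loss gradient of W (derivative of 0.5 * lam * W**2).
    return lam * W

def dropout_forward(a, keep_prob, rng, training=True):
    if not training:
        return a, None
    # Inverted dropout: zero units with probability 1 - keep_prob and
    # scale the survivors by 1 / keep_prob to keep the expected value.
    mask = (rng.random(a.shape) < keep_prob) / keep_prob
    return a * mask, mask

def dropout_backward(grad_out, mask):
    # Gradients flow only through the units that were kept.
    return grad_out * mask

rng = np.random.default_rng(0)
W = np.array([[1.0, 2.0], [3.0, 4.0]])
penalty = l2_penalty([W], lam=0.1)

a = np.ones((1000, 50))
a_train, mask = dropout_forward(a, keep_prob=0.8, rng=rng, training=True)
a_eval, _ = dropout_forward(a, keep_prob=0.8, rng=rng, training=False)
print(penalty, a_train.mean(), a_eval.mean())
```

To show the effect on overfitting, plot training and validation loss with and without each regularizer on the same axes.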
Optional Extension 3: Advanced Analysis
- Visualize activation patterns in hidden layers
- Implement and visualize attention/saliency maps
- Create interactive demo with matplotlib widgets
Optional Extension 4: Performance Optimization
- Vectorize all operations (no Python loops)
- Compare training time: original vs optimized
- Profile code and show bottleneck analysis
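A simple way to compare the original and optimized versions is to time both on the same inputs and confirm that they agree. The sketch below does this for a single dense layer with `time.perf_counter`. The array sizes, the function names, and the best-of-five timing scheme are assumptions; no particular speedup is implied, since timings depend on the machine.

```python
import time
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 784))  # one batch of flattened images
W = rng.normal(size=(784, 128))  # assumed (n_in, n_out) layout
b = np.zeros(128)

def forward_loop(X, W, b):
    # Per-sample Python loop.
    out = np.empty((X.shape[0], W.shape[1]))
    for i in range(X.shape[0]):
        out[i] = X[i] @ W + b
    return out

def forward_vectorized(X, W, b):
    # One matrix product for the whole batch.
    return X @ W + b

def time_it(fn, repeats=5):
    # Best of several runs reduces noise from other processes.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(X, W, b)
        best = min(best, time.perf_counter() - start)
    return best

out_loop = forward_loop(X, W, b)
out_vec = forward_vectorized(X, W, b)
print("results match:", np.allclose(out_loop, out_vec))
print(f"loop: {time_it(forward_loop) * 1e3:.2f} ms")
print(f"vectorized: {time_it(forward_vectorized) * 1e3:.2f} ms")
```

For the bottleneck analysis, the standard-library `cProfile` module can be run over a full training epoch in the same spirit.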
📦 Deliverables
What to Prepare
- Code Files:
  - neural_network.py - Your NN class implementation
  - train.py - Training script
  - evaluate.py - Evaluation script
  - requirements.txt - Dependencies
- Jupyter Notebook: analysis.ipynb - Complete analysis with:
  - Training process
  - Visualizations
  - Experiments
  - Results discussion
- Report: REPORT.md - Markdown report with:
  - Methodology
  - Results tables
  - Conclusions
  - Lessons learned
- Assets:
  - models/ - Saved model weights
  - plots/ - All generated visualizations
  - results/ - Experiment results (CSV/JSON)
Suggested Project Structure
GitHub Repository:
your-name-mnist-nn/
├── README.md              # Setup and run instructions
├── requirements.txt       # Dependencies
├── neural_network.py      # Core implementation
├── train.py               # Training script
├── evaluate.py            # Evaluation script
├── analysis.ipynb         # Analysis notebook
├── REPORT.md              # Written report
├── models/
│   └── best_model.npz     # Saved weights
├── plots/
│   ├── learning_curves.png
│   ├── confusion_matrix.png
│   └── ...
└── results/
    └── experiments.csv

Recommended packaging:
- Keep the project in a public or private GitHub repository
- Include all files listed above
- Ensure the code runs with:
pip install -r requirements.txt && python train.py
💡 Hints & Tips
Hint 1: Weight Initialization
Use Xavier or He initialization to keep gradients from vanishing or exploding as they pass through the layers:
import numpy as np

n_in, n_out = 784, 128  # example sizes for the first layer

# Xavier initialization for layers with sigmoid/tanh
W = np.random.randn(n_in, n_out) * np.sqrt(2.0 / (n_in + n_out))
# He initialization for layers with ReLU
W = np.random.randn(n_in, n_out) * np.sqrt(2.0 / n_in)

Hint 2: Debugging Gradients
Implement gradient checking to verify backpropagation:
import numpy as np

def numerical_gradient(f, x, eps=1e-5):
    """Compute gradient numerically for verification."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        old_val = x.flat[i]
        x.flat[i] = old_val + eps
        pos = f(x)
        x.flat[i] = old_val - eps
        neg = f(x)
        x.flat[i] = old_val  # restore the original value
        grad.flat[i] = (pos - neg) / (2 * eps)  # central difference
    return grad

Hint 3: Vectorization
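Avoid Python loops over individual samples; process entire batches at once. The shapes below use the same (n_in, n_out) weight layout as Hint 1:
# Bad: Loop through samples
for i in range(batch_size):
    output[i] = np.dot(X[i], W) + b
# Good: Vectorized
output = np.dot(X, W) + b  # Entire batch at once: (N, n_in) @ (n_in, n_out)

Hint 4: Debugging Low Accuracy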
If accuracy is low, check:
- Data normalization (scale to 0-1)
- Learning rate (try 0.001, 0.01, 0.1)
- Weight initialization
- Gradient flow (print gradient magnitudes)
- Loss decreasing? (plot loss curve)
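When the checks above point at the gradients, comparing the analytic gradient with a numerical one on a tiny problem usually settles the question. The sketch below does this for a single softmax layer with cross-entropy loss, using the same central-difference idea as Hint 2. The toy shapes, the seed, and the 1e-6 threshold are assumptions for illustration.

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-5):
    # Central difference on each entry of x, restoring the entry afterwards.
    grad = np.zeros_like(x)
    for i in range(x.size):
        old_val = x.flat[i]
        x.flat[i] = old_val + eps
        pos = f(x)
        x.flat[i] = old_val - eps
        neg = f(x)
        x.flat[i] = old_val
        grad.flat[i] = (pos - neg) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))                  # 5 samples, 4 features
y = np.eye(3)[rng.integers(0, 3, size=5)]    # one-hot labels, 3 classes
W = rng.normal(size=(4, 3)) * 0.1

def softmax_probs(W):
    logits = X @ W
    logits = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

def loss_fn(W):
    return -np.mean(np.sum(y * np.log(softmax_probs(W)), axis=1))

def analytic_grad(W):
    # Softmax plus cross-entropy: dL/dlogits = (probs - y) / N.
    return X.T @ (softmax_probs(W) - y) / len(X)

num = numerical_gradient(loss_fn, W)
ana = analytic_grad(W)
rel_error = np.linalg.norm(num - ana) / (np.linalg.norm(num) + np.linalg.norm(ana))
print("relative error:", rel_error)
print("gradient check passed:", rel_error < 1e-6)
```

Run the check on a small network and a handful of samples only; the numerical gradient needs two loss evaluations per parameter and is far too slow for the full model.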
📚 Resources
Essential Reading
- Backpropagation Calculus (3Blue1Brown)
- CS231n: Backpropagation Notes
- Neural Networks from Scratch in Python
❓ FAQ
Q: Can I use PyTorch/TensorFlow for parts of it?
A: No - the point is to implement from scratch. You can use NumPy, but not ML frameworks.
Q: What if I can’t reach 90% accuracy?
A: 85-89% is still a solid result for a first from-scratch implementation. Document what you tried and why it likely plateaued.
Q: Can I work with a partner?
A: Discuss concepts together if helpful, but build and understand your own implementation.
Q: How long should the report be?
A: Quality over quantity. 2-4 pages of clear analysis is better than 10 pages of fluff.
Q: Can I use a different dataset?
A: Stick with MNIST for the first pass so your results stay comparable with the rest of the assignment.
🎓 Learning Objectives
After completing this assignment, you will be able to:
- ✅ Implement forward and backward propagation from scratch
- ✅ Understand the mathematical foundations of neural networks
- ✅ Debug gradient computation issues
- ✅ Choose appropriate hyperparameters
- ✅ Evaluate model performance comprehensively
- ✅ Communicate technical results clearly
🚀 Getting Started
- Fork the starter repository: github.com/zero-to-ai/nn-assignment-starter
- Set up your environment:
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    pip install numpy matplotlib scikit-learn jupyter
- Download MNIST:
    from sklearn.datasets import fetch_openml
    mnist = fetch_openml('mnist_784', version=1)
- Start coding! Begin with the neural_network.py skeleton
💬 Questions & Support
- Discussion Forum: GitHub Discussions
- Stuck? Post the bug, the exact error, and what you already tried.
- Best workflow: Finish the core phase notebooks first, then come back to this assignment if your fundamentals still feel shaky.
Good luck! You’ve got this! 🚀
Remember: This assignment is designed to be challenging but doable. Start early, test often, and don’t hesitate to ask for help.