Computer Vision Specialization
🖼️ Overview
Master computer vision from classification to generative models!
Time: 2-3 months | 150-200 hours
Prerequisites: Phases 1-8 complete
Outcome: Build production CV applications
Use this track when you want depth in image understanding and generation rather than broad multimodal coverage. It fits best after you already understand embeddings, retrieval, and basic model evaluation.
📚 What You’ll Learn
- Image classification (ResNet, Vision Transformers)
- Object detection (YOLO, DETR)
- Image embeddings (CLIP, DINO)
- Semantic segmentation
- Generative models (Stable Diffusion, DALL-E)
- Multimodal AI (text + vision)
- Video understanding
- OCR and document AI
🗂️ Module Structure
computer-vision/
├── 00_START_HERE.ipynb
├── 01_image_classification.ipynb
├── 02_object_detection.ipynb
├── 03_clip_embeddings.ipynb
├── 04_stable_diffusion.ipynb
├── 05_multimodal_rag.ipynb
├── projects/
│ ├── visual_search/
│ ├── image_qa/
│ └── content_moderation/
└── README.md
🎯 Key Projects
- Visual Search Engine - Find similar images using CLIP
- Image Q&A System - Chat with images
- Content Moderation - Classify safe/unsafe images
- AI Art Generator - Creative tool with Stable Diffusion
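The Visual Search Engine project comes down to nearest-neighbor lookup over image embeddings. The sketch below shows only that ranking step and assumes the embeddings have already been computed elsewhere, for example by a CLIP image encoder. The toy 3-dimensional vectors, the file names, and the helper names `cosine_similarity` and `top_k_similar` are hypothetical and not taken from the course notebooks.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Treat a zero vector as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query, index, k=3):
    """Return the k (name, score) pairs most similar to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real image embeddings have hundreds of dimensions.
index = {
    "cat_1.jpg": [0.9, 0.1, 0.0],
    "cat_2.jpg": [0.8, 0.2, 0.1],
    "car_1.jpg": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]  # stand-in for the embedding of a query image

results = top_k_similar(query, index, k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

A brute-force scan like this is usually fine for a few thousand images. Larger collections typically call for a vector index, which is a design choice left to the project itself.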
How To Use This Track Well
- Start with image classification and embeddings before jumping to generation.
- Build at least one retrieval or detection project before tackling multimodal or generative systems.
- Pair this track with evaluation and deployment work from the main curriculum instead of treating it as isolated notebook study.
What Comes Next
- Continue to ../../13-multimodal/README.md if you want broader cross-modal systems.
- Continue to ../../28-practical-data-science/README.md if you want portfolio-style project work.
- Continue to ../../24-advanced-deep-learning/README.md if your interest shifts toward deeper modeling theory.
Start here: 00_START_HERE.ipynb