Computer Vision Specialization

🖼️ Overview

Learn computer vision end to end, from image classification to generative models.

Time: 2-3 months | 150-200 hours
Prerequisites: Phases 1-8 complete
Outcome: Build production-ready computer vision (CV) applications

Use this track when you want depth in image understanding and generation rather than broad multimodal coverage. It fits best after you already understand embeddings, retrieval, and basic model evaluation.

📚 What You’ll Learn

Image classification (ResNet, Vision Transformers)
Object detection (YOLO, DETR)
Image embeddings (CLIP, DINO)
Semantic segmentation
Generative models (Stable Diffusion, DALL-E)
Multimodal AI (text + vision)
Video understanding
OCR and document AI

🗂️ Module Structure


computer-vision/
├── 00_START_HERE.ipynb
├── 01_image_classification.ipynb
├── 02_object_detection.ipynb
├── 03_clip_embeddings.ipynb
├── 04_stable_diffusion.ipynb
├── 05_multimodal_rag.ipynb
├── projects/
│   ├── visual_search/
│   ├── image_qa/
│   └── content_moderation/
└── README.md

🎯 Key Projects

Visual Search Engine - Find similar images by comparing CLIP embeddings
Image Q&A System - Answer natural-language questions about images
Content Moderation - Classify images as safe or unsafe
AI Art Generator - Creative tool with Stable Diffusion
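
To make the Visual Search Engine idea concrete, here is a minimal sketch of the retrieval step: rank stored images by cosine similarity to a query embedding. This is an illustration under stated assumptions, not the project's actual code. The function names (`cosine_similarity`, `top_k_similar`), the file names, and the 3-dimensional vectors are all hypothetical; in the real project the vectors would come from a CLIP image encoder and would have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # A zero vector has no direction, so treat it as dissimilar to everything.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query, index, k=3):
    """Return the k (name, score) pairs most similar to the query, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for real image embeddings (assumption: made-up values).
toy_index = {
    "cat_1.jpg": [0.9, 0.1, 0.0],
    "cat_2.jpg": [0.8, 0.2, 0.1],
    "car_1.jpg": [0.0, 0.1, 0.9],
}

# Hypothetical embedding of the query image.
query_embedding = [1.0, 0.0, 0.0]

results = top_k_similar(query_embedding, toy_index, k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

A brute-force scan like this is fine for a few thousand images. For larger collections, the usual next step is to normalize the embeddings once and use a vector index instead of comparing against every stored image.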

How To Use This Track Well

Start with image classification and embeddings before jumping to generation.
Build at least one retrieval or detection project before tackling multimodal or generative systems.
Pair this track with evaluation and deployment work from the main curriculum instead of treating it as isolated notebook study.

What Comes Next

Continue to ../../13-multimodal/README.md if you want broader cross-modal systems.
Continue to ../../28-practical-data-science/README.md if you want portfolio-style project work.
Continue to ../../24-advanced-deep-learning/README.md if your interest shifts toward deeper modeling theory.

Start here: 00_START_HERE.ipynb