
Computer Vision Specialization

🖼️ Overview

Master computer vision from classification to generative models!

Time: 2-3 months | 150-200 hours
Prerequisites: Phases 1-8 complete
Outcome: Build production CV applications

Use this track when you want depth in image understanding and generation rather than broad multimodal coverage. It fits best after you already understand embeddings, retrieval, and basic model evaluation.


📚 What You’ll Learn

  • Image classification (ResNet, Vision Transformers)
  • Object detection (YOLO, DETR)
  • Image embeddings (CLIP, DINO)
  • Semantic segmentation
  • Generative models (Stable Diffusion, DALL-E)
  • Multimodal AI (text + vision)
  • Video understanding
  • OCR and document AI

🗂️ Module Structure

computer-vision/
├── 00_START_HERE.ipynb
├── 01_image_classification.ipynb
├── 02_object_detection.ipynb
├── 03_clip_embeddings.ipynb
├── 04_stable_diffusion.ipynb
├── 05_multimodal_rag.ipynb
├── projects/
│   ├── visual_search/
│   ├── image_qa/
│   └── content_moderation/
└── README.md

🎯 Key Projects

  1. Visual Search Engine - Find similar images using CLIP
  2. Image Q&A System - Chat with images
  3. Content Moderation - Classify safe/unsafe images
  4. AI Art Generator - Creative tool with Stable Diffusion
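
To give a feel for the Visual Search Engine project, here is a minimal sketch of the retrieval step: ranking stored image embeddings by cosine similarity to a query embedding. It is an illustration under stated assumptions, not the track's actual implementation. The function names (`cosine_similarity`, `top_k_similar`), the file names, and the three-dimensional toy vectors are all hypothetical; in the real project the vectors would come from an image encoder such as CLIP and would have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query, index, k=3):
    """Return the k (name, score) pairs from `index` most similar to `query`."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings standing in for real encoder output.
toy_index = {
    "cat_1.jpg": [0.9, 0.1, 0.0],
    "cat_2.jpg": [0.8, 0.2, 0.1],
    "car_1.jpg": [0.0, 0.1, 0.9],
}
query_embedding = [1.0, 0.0, 0.0]

results = top_k_similar(query_embedding, toy_index, k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

A linear scan like this is fine for a few thousand images. Larger collections usually call for an approximate nearest-neighbor index, which is a separate design choice this sketch does not cover.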

How To Use This Track Well

  • Start with image classification and embeddings before jumping to generation.
  • Build at least one retrieval or detection project before tackling multimodal or generative systems.
  • Pair this track with evaluation and deployment work from the main curriculum instead of treating it as isolated notebook study.

What Comes Next

Start here: 00_START_HERE.ipynb
