
Audio & Speech

Speech recognition with Whisper and text-to-speech generation.

Use this subtrack when you want to build voice interfaces, transcription pipelines, or speech-driven assistants. Treat it as a practical systems branch rather than a deep speech-research path.

How To Use This Subtrack Well

  • Start with speech-to-text and text-to-speech before tackling full duplex voice systems.
  • Measure end-to-end latency and transcription accuracy (for example, word error rate) alongside model quality.
  • Pair this work with ../../20-real-time-streaming/README.md if you want to build conversational audio products.
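
To make the measurement advice above concrete, here is a minimal sketch of scoring one transcription for accuracy and latency. It is an illustration under stated assumptions, not part of this subtrack's materials: `word_error_rate`, `timed_transcribe`, and `fake_transcribe` are hypothetical names, word error rate is one common choice of accuracy metric, and the stand-in transcriber returns a fixed string so the example runs without any speech model installed.

```python
import time
from typing import Callable, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] is the edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def timed_transcribe(
    transcribe: Callable[[str], str], audio_path: str
) -> Tuple[str, float]:
    """Call any speech-to-text function and return (text, elapsed seconds)."""
    start = time.perf_counter()
    text = transcribe(audio_path)
    return text, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical stand-in for a real speech-to-text call (assumption).
    def fake_transcribe(audio_path: str) -> str:
        return "turn of the kitchen lights"

    text, seconds = timed_transcribe(fake_transcribe, "sample.wav")
    wer = word_error_rate("turn off the kitchen lights", text)
    print(f"WER={wer:.2f} latency={seconds * 1000:.1f} ms")
```

In practice, `fake_transcribe` would be replaced by a wrapper around whichever speech-to-text model you are evaluating, and both numbers would be averaged over a set of recordings with known reference transcripts rather than a single clip.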

