Whisper Transcription Guide: Turn Audio Into Text in Minutes

Why Choose Whisper for Transcription?

OpenAI's Whisper model handles multilingual audio, from podcasts to family interviews, with impressive accuracy, even on consumer laptops. Running it locally also keeps sensitive recordings off third-party servers.

Install Whisper in Three Steps

1. Verify that Python 3.8 or higher is installed (python --version).
2. Install Whisper via pip:

pip install git+https://github.com/openai/whisper.git

3. Add FFmpeg so Whisper can read most audio formats:

# macOS (Homebrew)
brew install ffmpeg
# Windows (Chocolatey)
choco install ffmpeg
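
Before moving on, it can help to confirm that all three pieces are in place. The following is a minimal sketch, not part of Whisper itself: the function name check_whisper_prerequisites is made up for this guide, and it only checks that Python is new enough, that a module named whisper can be found, and that an ffmpeg executable is on your PATH.

```python
import importlib.util
import shutil
import sys


def check_whisper_prerequisites(min_python=(3, 8)):
    """Return a dict describing which prerequisites are present.

    Hypothetical helper for this guide; it is not a Whisper API.
    """
    return {
        # The running interpreter meets the minimum version from step 1.
        "python_ok": sys.version_info >= min_python,
        # The pip install in step 2 provides a module named "whisper".
        "whisper_installed": importlib.util.find_spec("whisper") is not None,
        # FFmpeg from step 3 must be discoverable on PATH.
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
    }


status = check_whisper_prerequisites()
for name, ok in status.items():
    print(f"{name}: {'OK' if ok else 'MISSING'}")
```

Any line reported as MISSING points back to the step that needs repeating.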

Convert Audio to Text

Use the command below, adjusting file paths and options to match your setup:

whisper "sample.m4a" --language Japanese --model medium --output_format txt --output_dir "C:\Users\owner\Desktop"

"sample.m4a": the audio file you recorded.
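--language: the language spoken in the recording; if you omit it, Whisper detects the language automatically.
--model: tiny, base, small, medium, or large (larger models are slower and need more memory, but are generally more accurate).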
--output_format: choose txt, srt, or vtt depending on whether you need subtitles.
--output_dir: folder where Whisper saves the transcript.
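
If you have a whole folder of recordings, the same options can be assembled from a script. The sketch below is one possible approach and rests on some assumptions: the helper names build_whisper_command and transcribe_folder are invented for this guide, the defaults simply mirror the example command, and the whisper executable is assumed to be on your PATH.

```python
import subprocess
from pathlib import Path


def build_whisper_command(audio_path, language="Japanese", model="medium",
                          output_format="txt", output_dir="."):
    """Assemble the whisper CLI call as an argument list.

    Hypothetical helper; passing a list avoids shell quoting problems
    with spaces or backslashes in paths.
    """
    return [
        "whisper", str(audio_path),
        "--language", language,
        "--model", model,
        "--output_format", output_format,
        "--output_dir", str(output_dir),
    ]


def transcribe_folder(folder, pattern="*.m4a", **options):
    """Run whisper once per matching file; raises if any run fails."""
    for audio_path in sorted(Path(folder).glob(pattern)):
        subprocess.run(build_whisper_command(audio_path, **options), check=True)


# Show the command that would run, without actually starting Whisper.
print(build_whisper_command("sample.m4a", output_dir="transcripts"))
```

Calling transcribe_folder("recordings", output_dir="transcripts") would then process every .m4a file in a folder named recordings (both folder names are placeholders).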

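Whisper names the output after the input file, so sample.m4a produces sample.txt in the output directory. If you choose srt or vtt instead, you get a timestamped subtitle file (sample.srt or sample.vtt) rather than plain text; --output_format all writes every format at once.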
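
To make the difference between plain text and a subtitle file concrete, here is a small illustration of the SRT layout: a cue number, a start and end time written as HH:MM:SS,mmm, and the spoken text. This is a sketch for understanding the format, not Whisper's own writer; the function names and the two sample segments are invented for illustration.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Render (start, end, text) tuples as numbered SRT cues."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)


# Hypothetical segments, made up for this example only.
example_segments = [
    (0.0, 2.5, "Hello and welcome."),
    (2.5, 6.04, "Let's get started."),
]
print(segments_to_srt(example_segments))
```

A txt transcript of the same audio would contain only the two sentences, without cue numbers or times.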

Quick Answers

Is Whisper free? Yes. It's open-source with no API fees when running locally.
Can I transcribe on mobile? Record on your phone, then transfer the audio to a computer to transcribe.
Need translations? Add --task translate to produce English text directly from Japanese (or other non-English) speech in one step. English is the only translation target.
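
The translate task is also reachable from Python. The sketch below assumes the openai-whisper package is installed and uses its load_model and transcribe calls; the wrapper function translate_to_english and the default language code "ja" are choices made for this example, not something the library requires.

```python
def translate_to_english(model, audio_path, source_language="ja"):
    """Return English text for non-English speech via Whisper's translate task.

    `model` is expected to be an object returned by whisper.load_model();
    this wrapper itself is a hypothetical convenience for this guide.
    """
    result = model.transcribe(audio_path, task="translate", language=source_language)
    # The result is a dict; its "text" entry holds the full output string.
    return result["text"].strip()


# Typical use, assuming Whisper and FFmpeg are installed:
#   import whisper
#   model = whisper.load_model("medium")
#   print(translate_to_english(model, "sample.m4a"))
```

Loading the model once and reusing it for several files avoids paying the startup cost on every recording.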

Keep a Transcription Toolkit Ready

Set up Whisper once and transcribing becomes a single command. Whether you're capturing lecture notes, archiving interviews, or producing subtitles, you'll have clean text in minutes.