A Complete Guide to Using Whisper|How to Transcribe Recorded Audio with AI [Japanese Support]
Imagine being able to transcribe recorded audio automatically and accurately.
From interviews and lectures to meeting notes and family memories, the uses are endless.
This article explains how to use OpenAI's speech recognition model "Whisper", written for complete beginners.
What is Whisper?
Whisper is a highly accurate speech recognition AI model developed by OpenAI.
It supports multiple languages, including Japanese, and can automatically convert audio files into text on your own computer.
- Supported formats: MP3, WAV, M4A, etc.
- Supported languages: Japanese, English, and more than 50 other languages
- Operating environment: Windows / Mac / Linux (anywhere Python runs)
How to Use Whisper (Command Line)
Here is how to use the official Whisper package (via Python).
1. Installing Whisper
In the Python environment, run:
pip install git+https://github.com/openai/whisper.git
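Whisper uses FFmpeg to read audio files, so install it as well if it is not already on your system:
brew install ffmpeg # Mac
choco install ffmpeg # Windows (requires Chocolatey)
sudo apt update && sudo apt install ffmpeg # Ubuntu / Debian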
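Before moving on, it can help to confirm that both pieces are in place. The following is a minimal sketch, not part of Whisper itself; the function name check_environment is made up for this example. It only checks that an ffmpeg executable is on the PATH and that a Python package named whisper can be found, without running either one.

```python
import importlib.util
import shutil


def check_environment() -> dict:
    """Report whether FFmpeg and the whisper package can be found.

    Hypothetical helper for illustration; it does not run either tool.
    """
    return {
        # shutil.which returns the executable's path, or None if it is not on PATH
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # find_spec returns None when no importable package has this name
        "whisper": importlib.util.find_spec("whisper") is not None,
    }


for name, found in check_environment().items():
    print(f"{name}: {'found' if found else 'NOT found'}")
```

If either line reports "NOT found", repeat the matching install step above in the same terminal and Python environment you plan to use for transcription.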
2. Transcribing with a command
Just run the following command in the terminal:
whisper "sample.m4a" --language Japanese --model medium --output_format txt --output_dir "C:\Users\owner\Desktop"
Explanation of the options (for those who want to know more)
"sample.m4a" → the audio file you want to transcribe
--language Japanese → the language spoken in the audio (here, Japanese)
--model medium → the model to use (others include tiny, base, small, and large)
--output_format txt → the output format (txt, srt, vtt, etc.)
--output_dir → the path to the destination folder
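If you have many recordings, typing the command for each file gets tedious. As a rough sketch, the options explained above can be assembled in Python and passed to the whisper command once per file. The helper names build_whisper_command and transcribe_folder, the "*.m4a" pattern, and the folder names in the usage note are all assumptions made for this example; only the command-line options themselves come from the command shown earlier.

```python
import subprocess
from pathlib import Path


def build_whisper_command(audio_file, output_dir, language="Japanese",
                          model="medium", output_format="txt"):
    """Assemble the whisper command line as a list of arguments.

    Hypothetical helper; it uses only the options explained above.
    """
    return [
        "whisper", str(audio_file),
        "--language", language,
        "--model", model,
        "--output_format", output_format,
        "--output_dir", str(output_dir),
    ]


def transcribe_folder(audio_dir, output_dir, pattern="*.m4a"):
    """Run whisper once for every file in audio_dir that matches pattern."""
    for audio_file in sorted(Path(audio_dir).glob(pattern)):
        command = build_whisper_command(audio_file, output_dir)
        # check=True raises CalledProcessError if whisper exits with an error
        subprocess.run(command, check=True)
```

Calling transcribe_folder("recordings", "transcripts") would then process every .m4a file in a folder named recordings. Passing the arguments as a list rather than one long string means file names containing spaces need no extra quoting.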
3. Checking the output files
Whisper writes files like these to the specified folder:
- 'sample.txt' (plain text)
- 'sample.srt' (subtitle file) *Only when srt output is requested with --output_format
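The .txt file can be opened in any editor, but the .srt file also carries timestamps that are handy for jumping to a spot in the recording. Below is a minimal sketch of reading one in Python. It assumes the usual SRT layout (an index line, a timing line such as "00:00:04,000 --> 00:00:07,500", the caption text, then a blank line) and UTF-8 encoding; the names parse_srt and load_srt are invented for this example.

```python
import re
from pathlib import Path

# Matches an SRT timing line such as "00:00:04,000 --> 00:00:07,500"
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3}) --> (\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _seconds(hours, minutes, seconds, millis):
    """Convert the four captured timestamp fields to seconds."""
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_srt(text):
    """Return a list of (start_seconds, end_seconds, caption) tuples."""
    entries = []
    # Subtitle blocks are separated by blank lines
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                fields = match.groups()
                # Everything after the timing line is the caption text
                caption = " ".join(part.strip() for part in lines[i + 1:])
                entries.append((_seconds(*fields[:4]), _seconds(*fields[4:]), caption))
                break
    return entries


def load_srt(path):
    """Read an SRT file from disk and parse it."""
    return parse_srt(Path(path).read_text(encoding="utf-8"))
```

For example, load_srt("sample.srt") would return one tuple per subtitle, which you could filter by time or search for a keyword.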
Frequently Asked Questions
Q. Is Whisper free to use?
Yes, Whisper is a completely free and open-source tool. Anyone can install and use it freely.
Q. Can I use it on my smartphone?
Whisper itself is designed to run on a PC, so the recommended approach is to use your smartphone only for recording and transcribe the file on your PC afterward.
Conclusion
Whisper is a high-performance and easy-to-use audio transcription tool.
Even if you are not familiar with Python, once the environment is set up, a transcription takes just a one-line command.
Try making it part of a new habit of putting recorded lectures, conversations, and memories into writing.