How to Use Whisper to Extract Video Text for Free
- 14 Aug, 2024
When working with video files, there are times when you need to transcribe the audio portion into text. If the video does not have embedded subtitles, you can use OpenAI’s Whisper model to achieve this. This article will detail how to use Python and the Whisper model to extract audio from a video and transcribe it into text. We will first cover how to transcribe using a CPU, followed by how to install GPU dependencies, detect a GPU, and use a GPU for acceleration.
1. Speech Recognition Using CPU
1.1 Installing Whisper and Related Dependencies
First, ensure that Python and ffmpeg are installed. Then, install Whisper and ffmpeg-python:
    pip install openai-whisper
    pip install ffmpeg-python
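Before moving on, it can help to confirm that the prerequisites are actually visible to Python. The following is a minimal sketch using only the standard library; `check_environment` is a hypothetical helper name, and the module names `whisper`, `ffmpeg`, and `torch` are simply the ones imported in the examples later in this article.

```python
import importlib.util
import shutil


def check_environment(modules=("whisper", "ffmpeg", "torch")):
    """Report which prerequisites are present, without importing them."""
    # shutil.which looks for the ffmpeg executable on the PATH
    report = {"ffmpeg binary": shutil.which("ffmpeg") is not None}
    for name in modules:
        # find_spec returns None when a top-level module is not installed
        found = importlib.util.find_spec(name) is not None
        report[f"python module '{name}'"] = found
    return report


for item, ok in check_environment().items():
    print(f"{item}: {'found' if ok else 'MISSING'}")
```

Anything reported as MISSING should be installed before running the later examples.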
1.2 Extracting Audio from Video
Use ffmpeg to extract the audio and save it in WAV format:
    import ffmpeg

    def extract_audio(video_path, output_audio_path):
        ffmpeg.input(video_path).output(output_audio_path).run()

    video_path = 'path/to/your/video.mp4'
    audio_path = 'output.wav'
    extract_audio(video_path, audio_path)
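As an alternative to the ffmpeg-python wrapper, the same extraction can be done by calling the ffmpeg executable directly. The sketch below assumes ffmpeg is on the PATH; `build_ffmpeg_command` and `extract_audio_cli` are hypothetical helper names. It writes 16 kHz mono audio, which is the format Whisper converts its input to anyway, so the WAV file stays small.

```python
import subprocess


def build_ffmpeg_command(video_path, audio_path, sample_rate=16000):
    """Build an ffmpeg command line that writes mono 16-bit PCM WAV audio."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file without prompting
        "-i", video_path,         # input video file
        "-vn",                    # ignore the video stream
        "-ac", "1",               # downmix to a single channel
        "-ar", str(sample_rate),  # resample to the requested rate
        "-c:a", "pcm_s16le",      # 16-bit PCM, the usual WAV encoding
        audio_path,
    ]


def extract_audio_cli(video_path, audio_path):
    # check=True raises CalledProcessError if ffmpeg exits with an error
    subprocess.run(build_ffmpeg_command(video_path, audio_path), check=True)


print(" ".join(build_ffmpeg_command("path/to/your/video.mp4", "output.wav")))
```

Calling `extract_audio_cli(video_path, audio_path)` would then take the place of `extract_audio` in the example above.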
1.3 Transcribing Using CPU
Without a GPU, the Whisper model will process the transcription using the CPU. Here’s an example of how to use Whisper for speech recognition:
    import whisper

    def transcribe_audio(audio_path):
        model = whisper.load_model("base")
        result = model.transcribe(audio_path)
        return result["text"]

    transcription = transcribe_audio(audio_path)
    print(transcription)
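Besides `result["text"]`, the dictionary returned by `model.transcribe` also carries a `segments` list whose entries include `start` and `end` times in seconds and the segment `text`. The sketch below shows one way to turn those segments into timestamped lines. The helper names and the output layout are illustrative choices, and the sample data is hand-written rather than real model output.

```python
def format_timestamp(seconds):
    """Convert a time in seconds to HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(segments):
    """Render one '[start --> end] text' line per segment."""
    lines = []
    for seg in segments:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} --> {end}] {seg['text'].strip()}")
    return "\n".join(lines)


# Hand-written sample shaped like result["segments"]; not real model output
sample = [
    {"start": 0.0, "end": 2.5, "text": " Hello and welcome."},
    {"start": 2.5, "end": 65.25, "text": " Let's get started."},
]
print(format_segments(sample))
```

With a real transcription, `format_segments(result["segments"])` would be called on the dictionary returned by `model.transcribe`.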
2. Accelerating with GPU
2.1 Installing GPU Dependencies
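If you want to use a GPU for acceleration, you need to install a CUDA-enabled build of PyTorch. The command below installs the build for CUDA 11.8 (the cu118 index); choose the index URL that matches the CUDA version supported by your NVIDIA driver: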
    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
2.2 Detecting Available GPU
Before using a GPU, you need to check if there is an available GPU in your system. The following code can be used to detect a GPU:
    import torch

    cuda_available = torch.cuda.is_available()
    print("CUDA Available: ", cuda_available)
    print("Number of GPUs: ", torch.cuda.device_count())
    if cuda_available:
        # current_device() raises an error when CUDA is unavailable
        print("Current GPU: ", torch.cuda.current_device())
        print("GPU Name: ", torch.cuda.get_device_name(0))
    else:
        print("GPU Name: ", "No GPU available")
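Because larger Whisper models need more GPU memory, it can also be useful to see how much memory each detected GPU has. The following is a small sketch, with `describe_gpus` as a hypothetical helper name. It returns an empty list when PyTorch is not installed or no CUDA device is visible, so it is safe to run on a CPU-only machine.

```python
import importlib.util


def describe_gpus():
    """Return (index, name, memory in GiB) for each visible CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return []  # PyTorch is not installed
    import torch

    if not torch.cuda.is_available():
        return []  # CPU-only build, or no usable GPU/driver
    gpus = []
    for index in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(index)
        gpus.append((index, props.name, props.total_memory / 1024**3))
    return gpus


gpus = describe_gpus()
if not gpus:
    print("No CUDA GPU detected; transcription will run on the CPU.")
for index, name, mem_gib in gpus:
    print(f"GPU {index}: {name} ({mem_gib:.1f} GiB)")
```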
2.3 Transcribing Using GPU
If your system has an available GPU, you can load the Whisper model onto the GPU for acceleration. Ensure that you have installed the GPU version of PyTorch as per the earlier steps. Here is an example of how to use a GPU for speech recognition:
    import whisper
    import torch

    # Check if a GPU is available
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = whisper.load_model("base").to(device)

    def transcribe_audio(audio_path):
        result = model.transcribe(audio_path)
        return result["text"]

    transcription = transcribe_audio(audio_path)
    print(transcription)
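One detail worth knowing when the same script runs on both kinds of hardware: Whisper decodes in half precision (`fp16`) by default, and on a CPU it warns that FP16 is not supported and falls back to FP32. A minimal sketch of choosing the option from the device string follows; `transcribe_options` is a hypothetical helper, while `fp16` and `language` are keyword options accepted by `model.transcribe`.

```python
def transcribe_options(device, language=None):
    """Build keyword arguments for model.transcribe based on the device."""
    # Half precision only makes sense on a GPU
    options = {"fp16": device == "cuda"}
    if language is not None:
        # Skips automatic language detection, e.g. language="en"
        options["language"] = language
    return options


print(transcribe_options("cpu"))
print(transcribe_options("cuda", language="en"))
```

The result would be passed along as `model.transcribe(audio_path, **transcribe_options(device))`.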
3. Complete Code Example
Combining all the steps, here is a complete code example that extracts the audio and transcribes it, using the GPU when one is available and falling back to the CPU otherwise:
    import ffmpeg
    import whisper
    import torch

    def extract_audio(video_path, output_audio_path):
        ffmpeg.input(video_path).output(output_audio_path).run()

    def transcribe_audio(audio_path, device):
        model = whisper.load_model("base").to(device)
        result = model.transcribe(audio_path)
        return result["text"]

    # File path configuration
    video_path = 'path/to/your/video.mp4'
    audio_path = 'output.wav'

    # Extract audio
    extract_audio(video_path, audio_path)

    # Check if GPU is available
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Using device: {device}")

    # Perform speech recognition
    transcription = transcribe_audio(audio_path, device)
    print(transcription)
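The example above only prints the transcription. To keep it, the text can be written to a file named after the video. This is a minimal sketch using the standard library; `save_transcript` and its naming scheme (video file stem plus `.txt`) are illustrative assumptions rather than part of Whisper.

```python
from pathlib import Path


def save_transcript(text, video_path, output_dir="."):
    """Write the transcript to <video name>.txt and return the file path."""
    out_path = Path(output_dir) / (Path(video_path).stem + ".txt")
    # Whisper's text often starts with a space, so trim it before saving
    out_path.write_text(text.strip() + "\n", encoding="utf-8")
    return out_path
```

In the complete example, `save_transcript(transcription, video_path)` would create `video.txt` in the current directory.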
4. Conclusion
With the above steps, you can extract the audio from a video with ffmpeg and transcribe it into text with the Whisper model. If a GPU is available on your system, loading the model onto the GPU can significantly reduce transcription time.
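To see how much the GPU helps on your own hardware, the transcription call can be timed on each device. The sketch below uses a hypothetical `timed` helper built on `time.perf_counter`, with a stand-in workload so that it runs without Whisper installed.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in workload; replace with a transcription call to measure it
result, seconds = timed(sum, range(1_000_000))
print(f"Finished in {seconds:.3f} s")
```

For example, `timed(transcribe_audio, audio_path, "cpu")` and `timed(transcribe_audio, audio_path, "cuda")` could be compared, keeping in mind that the first call also includes model loading and any one-time model download.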