How to Use Whisper to Extract Video Text for Free

  • 466 Words
  • 2 Minutes
  • 14 Aug, 2024

When working with video files, there are times when you need to transcribe the audio portion into text. If the video does not have embedded subtitles, you can use OpenAI’s Whisper model to achieve this. This article will detail how to use Python and the Whisper model to extract audio from a video and transcribe it into text. We will first cover how to transcribe using a CPU, followed by how to install GPU dependencies, detect a GPU, and use a GPU for acceleration.

1. Speech Recognition Using CPU

1.1 Installing Dependencies

First, make sure Python and the ffmpeg command-line tool are installed on your system. Then install Whisper and ffmpeg-python:

Terminal window

pip install openai-whisper
pip install ffmpeg-python
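
Whisper shells out to the ffmpeg executable when it loads audio, so ffmpeg must be on your PATH in addition to the Python packages above. A quick sanity check (a minimal sketch, not part of the original workflow):

import shutil

# Whisper invokes the ffmpeg executable when loading audio files
print("ffmpeg found:", shutil.which("ffmpeg") is not None)

import whisper  # fails here if openai-whisper did not install correctly
print("Whisper import OK")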

1.2 Extracting Audio from Video

Use ffmpeg to extract the audio and save it in WAV format:

import ffmpeg

def extract_audio(video_path, output_audio_path):
    # ffmpeg infers the WAV output format from the .wav file extension
    ffmpeg.input(video_path).output(output_audio_path).run()

video_path = 'path/to/your/video.mp4'
audio_path = 'output.wav'
extract_audio(video_path, audio_path)
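
Whisper resamples everything to 16 kHz mono internally, so the plain WAV produced above works as-is. If you prefer to downmix and resample during extraction (which also keeps the intermediate file small), ffmpeg-python passes output options straight through to ffmpeg. A variant of the function above, with extract_audio_16k and output_16k.wav used purely as illustrative names:

import ffmpeg

def extract_audio_16k(video_path, output_audio_path):
    # ac=1 downmixes to mono, ar=16000 resamples to 16 kHz (Whisper's native rate);
    # overwrite_output() allows re-running without an "overwrite?" prompt
    (
        ffmpeg
        .input(video_path)
        .output(output_audio_path, ac=1, ar=16000)
        .overwrite_output()
        .run()
    )

extract_audio_16k('path/to/your/video.mp4', 'output_16k.wav')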

1.3 Transcribing Using CPU

Without a GPU, the Whisper model will process the transcription using the CPU. Here’s an example of how to use Whisper for speech recognition:

import whisper

def transcribe_audio(audio_path):
    # "base" is small and fast; larger checkpoints (small, medium, large) are more accurate
    model = whisper.load_model("base")
    result = model.transcribe(audio_path)
    return result["text"]

transcription = transcribe_audio(audio_path)
print(transcription)
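
transcribe() also accepts decoding options and returns timestamped segments alongside the full text. A small sketch, assuming the recording is in English (drop the language argument to let Whisper auto-detect it):

import whisper

model = whisper.load_model("base")

# Specifying the language skips automatic language detection
result = model.transcribe("output.wav", language="en")

# The result contains timestamped segments in addition to result["text"]
for segment in result["segments"]:
    print(f"[{segment['start']:7.2f}s -> {segment['end']:7.2f}s] {segment['text']}")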

2. Accelerating with GPU

2.1 Installing GPU Dependencies

If you want to use a GPU for acceleration, install a CUDA-enabled build of PyTorch. The index URL below targets CUDA 11.8 (cu118); pick the URL that matches the CUDA version your driver supports:

Terminal window

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

2.2 Detecting Available GPU

Before using a GPU, you need to check if there is an available GPU in your system. The following code can be used to detect a GPU:

import torch

print("CUDA Available: ", torch.cuda.is_available())
print("Number of GPUs: ", torch.cuda.device_count())

# Only query device details if a GPU is actually present;
# torch.cuda.current_device() raises an error on CPU-only systems
if torch.cuda.is_available():
    print("Current GPU: ", torch.cuda.current_device())
    print("GPU Name: ", torch.cuda.get_device_name(0))
else:
    print("No GPU available")

2.3 Transcribing Using GPU

If your system has an available GPU, you can load the Whisper model onto the GPU for acceleration. Ensure that you have installed the GPU version of PyTorch as per the earlier steps. Here is an example of how to use a GPU for speech recognition:

import whisper
import torch

# Check if a GPU is available and load the model onto it
device = "cuda" if torch.cuda.is_available() else "cpu"
model = whisper.load_model("base").to(device)

def transcribe_audio(audio_path):
    result = model.transcribe(audio_path)
    return result["text"]

transcription = transcribe_audio(audio_path)
print(transcription)
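
One detail worth knowing: transcribe() decodes in fp16 by default, which produces a "FP16 is not supported on CPU; using FP32 instead" warning when the model runs on the CPU. Tying the flag to the chosen device keeps the same code path quiet on both; a minimal sketch:

import torch
import whisper

device = "cuda" if torch.cuda.is_available() else "cpu"
model = whisper.load_model("base", device=device)  # load_model also accepts the device directly

# Use half precision only on the GPU; fall back to fp32 on the CPU
result = model.transcribe("output.wav", fp16=(device == "cuda"))
print(result["text"])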

3. Complete Code Example

Combining all the steps, here is a complete code example that includes audio extraction and transcription using both CPU and GPU:

import ffmpeg
import whisper
import torch

def extract_audio(video_path, output_audio_path):
    ffmpeg.input(video_path).output(output_audio_path).run()

def transcribe_audio(audio_path, device):
    model = whisper.load_model("base").to(device)
    result = model.transcribe(audio_path)
    return result["text"]

# File path configuration
video_path = 'path/to/your/video.mp4'
audio_path = 'output.wav'

# Extract audio
extract_audio(video_path, audio_path)

# Check if GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

# Perform speech recognition
transcription = transcribe_audio(audio_path, device)
print(transcription)
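
If you want the transcript saved as a text file rather than only printed, append a few lines to the script above (transcript.txt is just an example filename):

# Write the transcript to disk alongside the extracted audio
with open('transcript.txt', 'w', encoding='utf-8') as f:
    f.write(transcription)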

4. Conclusion

With the steps above, you can use the Whisper model to extract the audio from a video and turn it into a text transcript. If your system has a GPU, loading the model onto it can significantly speed up transcription.