Whisper

Whisper()

Bases: FoundationalModel

Whisper is a state-of-the-art transcription model from OpenAI. It is trained on 680,000 hours of multilingual and multitask supervised data.

Examples:

Invocation with a URL

from baseten.models import Whisper
model = Whisper()
transcript = model("https://baseten.s3.amazonaws.com/whisper/whisper_test.wav")

Invocation with a local file

from baseten.models import Whisper
model = Whisper()
transcript = model("whisper_test.wav")

__call__

__call__(path: str, **kwargs) -> dict

Generate text from an audio file. Supports local file paths or URLs.

Parameters:

Name | Type | Description | Default
path | str | Path to audio file. Can be a local file path or a URL. | required
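
Because `path` accepts either form, a caller may want to check the argument before invoking the model. The sketch below shows one caller-side way to do that. `check_audio_path` is a hypothetical helper, not part of the library, and how the library itself tells URLs from local paths is not documented here.

```python
import os
from urllib.parse import urlparse


def check_audio_path(path: str) -> str:
    """Classify `path` as "url" or "file" (hypothetical caller-side helper).

    Raises FileNotFoundError if `path` is not an http(s) URL and does not
    point to an existing local file.
    """
    scheme = urlparse(path).scheme.lower()
    if scheme in ("http", "https"):
        return "url"
    if not os.path.isfile(path):
        raise FileNotFoundError(f"No such audio file: {path}")
    return "file"
```

A check like this surfaces a mistyped local path before any request is made. It does not verify that a URL is reachable or that the file contains audio.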

Returns:

Name | Type | Description
transcript | dict | Dictionary containing three keys: "language", "segments", and "text".

  • "language" is a str representing the language of the audio file.
  • "segments" is a list of dicts, one per segment of audio. Each has "start" and "end" keys giving the segment's start and end times in seconds, and a "text" key holding the text generated for that segment.
  • "text" is a string containing the complete text generated for the audio file.
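
As a minimal sketch of consuming this return value, the snippet below formats each segment as a timestamped line. The `sample_transcript` dict is hand-written with illustrative values to match the shape described above, so the example runs without calling the model. `format_timestamp` and `format_segments` are hypothetical helpers, not part of the library.

```python
from typing import List


def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    minutes, ms = divmod(total_ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(transcript: dict) -> List[str]:
    """Return one "[start - end] text" line per segment."""
    return [
        f"[{format_timestamp(seg['start'])} - {format_timestamp(seg['end'])}] "
        f"{seg['text'].strip()}"
        for seg in transcript["segments"]
    ]


# Hand-written stand-in for the dict returned by model(path); values are illustrative.
sample_transcript = {
    "language": "en",
    "segments": [
        {"start": 0.0, "end": 2.5, "text": " Hello there."},
        {"start": 2.5, "end": 61.25, "text": " This is a test."},
    ],
    "text": " Hello there. This is a test.",
}

for line in format_segments(sample_transcript):
    print(line)
```

With a real result, `format_segments(transcript)` would be called on the dict returned by `model(path)`, assuming it has the keys listed above.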

url staticmethod

url()

Use this static method to get the URL of the Whisper model page in your Blueprint project, which contains information about the model.

Example:

from baseten.models import Whisper

Whisper.url()