Whisper is a state-of-the-art model for automatic speech recognition (ASR) and speech translation, proposed in the paper "Robust Speech Recognition via Large-Scale Weak Supervision" by Alec Radford et al. from OpenAI. Trained on >5M hours of labeled data, Whisper demonstrates a strong ability to generalise to many datasets and domains in a zero-shot setting.
Whisper V3 Large is available via Fireworks' serverless API, where you pay per token. There are several ways to call the Fireworks API, including Fireworks' Python client, the REST API, or OpenAI's Python client.
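As a rough illustration of the OpenAI Python client route, the sketch below wraps a single transcription call. The model identifier `whisper-v3` and the environment variable names `FIREWORKS_API_KEY` and `FIREWORKS_AUDIO_BASE_URL` are assumptions made for this example, not values confirmed by this page; take the actual base URL and model name from Fireworks' documentation.

```python
import os


def make_client():
    """Build an OpenAI-compatible client pointed at Fireworks.

    Both environment variable names are hypothetical placeholders.
    """
    # Imported lazily so transcribe() below has no hard dependency on openai.
    from openai import OpenAI

    return OpenAI(
        api_key=os.environ["FIREWORKS_API_KEY"],
        base_url=os.environ["FIREWORKS_AUDIO_BASE_URL"],
    )


def transcribe(client, audio_path: str, model: str = "whisper-v3") -> str:
    """Send one audio file for transcription and return the text.

    `model` defaults to an assumed identifier; adjust it to the one
    listed in your Fireworks account.
    """
    with open(audio_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(model=model, file=audio_file)
    return result.text
```

With the two environment variables set, `transcribe(make_client(), "meeting.wav")` would return the transcript as a string; `meeting.wav` is a placeholder file name.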
Whisper V3 Large is a multilingual, Transformer-based automatic-speech-recognition (ASR) and speech-translation model created by OpenAI and hosted on Fireworks AI.
Whisper V3 Large is best suited for:
The model's receptive field is 30 seconds of audio per inference window.
Fireworks recommends chunking longer audio into 30-second segments (with optional overlap) for stable performance.
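The chunking recommendation can be sketched as follows. This is a minimal example, not Fireworks' own preprocessing: the function names, the 16 kHz default sample rate, and the choice to keep a shorter final chunk are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def chunk_spans(
    total_s: float, window_s: float = 30.0, overlap_s: float = 0.0
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds that cover [0, total_s]."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    if not 0 <= overlap_s < window_s:
        raise ValueError("overlap_s must be in [0, window_s)")

    step = window_s - overlap_s  # how far each new window advances
    spans = []
    start = 0.0
    while start < total_s:
        end = min(start + window_s, total_s)
        spans.append((start, end))
        if end >= total_s:
            break  # the last window already reaches the end of the audio
        start += step
    return spans


def split_samples(
    samples: Sequence,
    sample_rate: int = 16000,
    window_s: float = 30.0,
    overlap_s: float = 0.0,
) -> List[Sequence]:
    """Slice a sequence of audio samples into windowed chunks."""
    total_s = len(samples) / sample_rate
    return [
        samples[int(start * sample_rate):int(end * sample_rate)]
        for start, end in chunk_spans(total_s, window_s, overlap_s)
    ]
```

For a 75-second recording, `chunk_spans(75.0)` yields three spans (0 to 30, 30 to 60, 60 to 75), and adding `overlap_s=5.0` shifts the later windows back so that adjacent chunks share five seconds of audio.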
Quantized variants are supported for Whisper V3 Large: 16 in total, including 4-bit and 8-bit.
Known limitations of Whisper V3 Large include:
Whisper V3 Large has approximately 1.54 billion parameters.
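To give a sense of scale, the parameter count can be turned into a back-of-the-envelope estimate of weight storage at different precisions. This is only a sketch: it counts raw weights, ignores quantization metadata, activations, and runtime overhead, and the helper name is made up for the example. The results are not figures published by Fireworks.

```python
PARAMS = 1.54e9  # approximate parameter count stated above


def weight_gigabytes(params: float, bits_per_weight: int) -> float:
    """Raw weight storage in GB (10^9 bytes) at a given precision."""
    return params * bits_per_weight / 8 / 1e9


for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weight_gigabytes(PARAMS, bits):.2f} GB")
```

Under these assumptions the weights take roughly 3.08 GB at 16-bit, 1.54 GB at 8-bit, and 0.77 GB at 4-bit.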