Fireworks Blog
Genspark’s Deep Research Agent Outperforms a Frontier Closed Model in Quality and Tool Calls using Fireworks RFT, Achieving a 50% Cost Reduction
We raised $250M To Help Enterprises Own Their AI
10/28/2025
Accelerate your Vision Pipelines with the new NVIDIA Nemotron Nano 2 VL Model on Fireworks AI
10/27/2025
Deployment Shapes: One-Click Deployment Configured For You
10/23/2025
Fireworks and AMD partner to power the next gen of AI infrastructure on AMD Instinct™ GPUs
10/20/2025
LLM on the edge: Model picking with Fireworks Eval Protocol + Ollama
10/15/2025
Announcing Embeddings and Reranking On Fireworks AI
10/9/2025
Deep-Dive into LLM Fine-Tuning
10/6/2025
Production-Ready AI Agents with Optimized Inference on AWS AgentCore
10/2/2025
Launching Fireworks for Startups Program!
10/1/2025
Audio September Release - Streaming Transcription V2 and Streaming Speaker Diarization
9/25/2025
Traces Are All You Need (to rank LLMs)
9/22/2025
Understanding Embeddings and Reranking at Scale
9/12/2025
DeepSeek V3.1 now on Fireworks AI!
8/26/2025
LLM Eval Driven Development with Claude Code
8/25/2025
Your AI Benchmark is Lying to You. Here's How We Caught It
8/15/2025
Test-Driven Agent Development with Eval Protocol
8/14/2025
Quality first: how Fireworks.ai is the go-to place for gpt-oss
8/12/2025
Introducing OpenAI gpt-oss (20b & 120b)
8/5/2025
Announcing Eval Protocol
8/4/2025
Qwen3 Decoded: Choosing the Right Model For Your Task
8/1/2025
Kimi K2: Deep Dive into model performance and use-cases
8/1/2025
Run bulk async workloads with Fireworks Batch API
7/31/2025
Fireworks Real-World Benchmarks: Find the Best OSS Model for the Job
7/30/2025
Introducing Vision-Language Model Fine-tuning: Tailor VLMs to Your Domain
7/29/2025
How Notion Cuts Latency 4x and Scales Enterprise AI Workflows with Fireworks AI
7/25/2025
VibeRL: When AI Trains AI
7/22/2025
Fireworks AI Now Supports Amazon SageMaker
7/15/2025
Deep-dive into MuonClip: Fixing Attention Score Explosions in Transformer Training
7/15/2025
Understanding Function Calling: The Bridge to Agentic AI
7/11/2025
Sentient & Fireworks Power Decentralized AI At Viral Scale
7/11/2025
Using Model-as-a-Judge for Reward in Reinforcement Fine Tuning
7/10/2025
Introducing FLUX.1 Kontext on Fireworks
7/9/2025
Unlock Your Tools: Fireworks Adds OpenAI-Response API with MCP Support (Beta)
6/22/2025
Global Fast Food Group Transforms Drive-Thru with Real-Time Voice Intelligence Powered by Fireworks
6/17/2025
Build for Scale with Fireworks Virtual Cloud (GA)
6/16/2025
3D FireOptimizer: Automating the Multi-Dimensional Tradeoffs in LLM Serving
6/14/2025
Introducing Supervised Fine Tuning V2
6/13/2025
Vision Model Platform Updates: Enhanced Capabilities and New Features
6/12/2025
Building AI agents with the Fireworks Experimentation Platform (GA) and Build SDK (Beta)
6/11/2025
Build customizable, real-time voice agents with Fireworks Voice Agent Platform (Beta)
6/10/2025
Reinforcement Fine Tuning (Beta): Train expert open models to surpass closed frontier models
6/9/2025
Building a High‑Quality Synthetic Data Pipeline for Supervised Fine‑Tuning
6/4/2025
Fireworks DevDay 2025 Wrapped
5/29/2025
FireAttention V4: Industry-Leading Latency and Cost Efficiency with FP4
5/28/2025
Building an open-source Browser Agent on Fireworks AI
5/21/2025
Fireworks Summer Audio Updates: Fastest Transcription now with Diarization and Batch API
5/20/2025
Agentic AI Systems
5/19/2025
Supervised Fine-Tuning (SFT) with LoRA on Fireworks AI: Tutorial
5/12/2025
Qwen 3 on Fireworks AI: Controllable Chain-of-Thought and Tool Calling at Frontier Scale
5/6/2025
Optimizing Llama 4 Maverick on Fireworks AI
4/28/2025
Building Enterprise-Scale RAG Systems with Fireworks AI and MongoDB Atlas
4/9/2025
Fireworks AI Now Supports NVIDIA NIM Deployments for Blazing AI Inference
3/18/2025
Faster, more efficient DeepSeek on the Fireworks AI Developer Cloud
3/18/2025
Fine-Tuning DeepSeek v3 & R1 to optimize quality, latency, & cost
3/12/2025
Enabling Function Calling in DeepSeek v3: Bridging the Gap Between Text and Action
2/14/2025
DeepSeek v3 and R1 Model Architecture: Why it's powerful and economical
2/7/2025
DeepSeek R1 Just Got Eyes with Fireworks AI Document Inlining
2/5/2025
From text to task: Constrained generation for structured extraction in R1
2/1/2025
Distillation with Reasoning: Can DeepSeek R1 Teach Better Than Humans?
1/31/2025
Mistral Small 3 Now Available on Fireworks: Faster, Lighter, and More Efficient
1/30/2025
Beyond Supervised Fine Tuning: How Reinforcement Learning Empowers AI with Minimal Labels
1/27/2025
DeepSeek R1: All you need to know 🐳
1/24/2025
Fireworks Streaming Transcription: 300ms with Whisper-v3-large-quality
1/23/2025
Real-time, performant code assistance: How Sourcegraph scaled with Fireworks AI
1/22/2025
Document inlining: Crossing the modality gap with Compound AI
12/23/2024
DeepSeek V3 just got vision capabilities!
12/18/2024
20x faster Whisper than OpenAI - Fireworks audio transcribes 1 hour in 4 seconds
12/9/2024
How Cresta drives millions of real-time, AI-powered contact center interactions with Fireworks
12/8/2024
Fireworks f1: A breakthrough in complex reasoning with Compound AI
11/15/2024
How Upwork and Fireworks deliver faster, smarter proposals for freelancers
11/11/2024
FLUX.1 on Fireworks: Fast, frugal, and flexible
10/22/2024
FireAttention V3: Enabling AMD as a viable alternative for GPU inference
10/15/2024
Three projects, one platform: A developer's winning streak with Fireworks AI
10/14/2024
Partnering with Meta: Bringing Llama 3.2 to Fireworks for Fine-Tuning and Inference
9/25/2024
How Enterprises are using Multimodal Models in production with Fireworks
9/25/2024
Multi-LoRA: Personalize AI at scale and deliver the best experience for each customer and use case, with 100x cost-efficiency
9/18/2024
FireOptimizer: Customizing latency and quality for your production inference workload
8/30/2024
Build Your Own Flight Recommendation System using FastAPI, SerpAPI, and Firefunction
8/29/2024
Building a RAG with Astro, FastAPI, SurrealDB and Llama 3.1
8/14/2024
How Fireworks evaluates quantization precisely and interpretably
8/1/2024
Introducing Llama 3.1 inference endpoints in partnership with Meta
7/23/2024
Fireworks AI Raises $52M Series B to Lead Industry Shift to Compound AI Systems
7/11/2024
How Cursor built Fast Apply using the Speculative Decoding API
6/23/2024
FireAttention V2: 12x faster to make Long Contexts practical for Online Inference
6/20/2024
Firefunction-v2: Function calling capability on par with GPT4o at 2.5x the speed and 10% of the cost
6/17/2024
Announcing custom models and on-demand H100s with 50%+ lower costs and latency than vLLM
6/3/2024
GPUs on-demand: Not serverless, not reserved, but some third thing
6/3/2024
Code Generation with Large Language Models - Fireworks AI Take
5/8/2024
Doomed to Code: How we Teamed Up with Fireworks AI at MistralAI Hackathon to Conquer the Shores of Hell
5/6/2024
Partnering with Meta to bring Llama 3 to Fireworks' inference and fine-tuning
4/18/2024
Getting Started with Stability’s API Powered by Fireworks
4/17/2024
Optimizing Retrieval Augmented Generation (RAG) with MongoDB Atlas and Fireworks AI
3/21/2024
Fireworks launches fine-tuning service - Rapidly iterate on quality and scale to production through Fireworks inference
3/8/2024
Fireworks Platform Spring 2024 Updates
3/1/2024
FireFunction V1 - Fireworks’ GPT-4-level function calling model - 4x faster than GPT-4 and open weights
2/20/2024
Why do all LLMs need structured output modes?
2/20/2024
FireLLaVA: the first commercially permissive OSS LLaVA model
1/18/2024
FireAttention — Serving Open Source Models 4x faster than vLLM by quantizing with ~no tradeoffs
1/8/2024
Fireworks Raises the Quality Bar with Function Calling Model and API Release
12/20/2023
Mixtral 8x7B on Fireworks: faster, cheaper, even before the official release
12/14/2023
LLM Inference Performance Benchmarking (Part 1)
11/3/2023
New in Fireworks: Image-to-Image and ControlNet support for SSD-1B and SDXL!
11/2/2023
Fireworks.ai Achieves SOC 2 Type II and HIPAA Compliance
10/27/2023
Accelerating Code Completion with Fireworks Fast LLM Inference
10/11/2023
Fireworks.ai Now Available on LangChain Prompt Playground
10/2/2023
Simplifying Code Infilling with Code Llama and Fireworks.ai
9/12/2023
Speed, Python: Pick Two. How CUDA Graphs Enable Fast Python Code for Deep Learning
8/29/2023
Fireworks.ai: Fast, Affordable, Customizable Gen AI Platform
8/17/2023
Multi-Query Attention is All You Need
7/12/2023