
Fireworks Blog

DPO, your simplest RL pipeline with two rollouts

DPO (Direct Preference Optimization) and GRPO (Group Relative Policy Optimization) are both powerful fine-tuning techniques for steering LLMs toward better responses. GRPO is a reinforcement learning algorithm that demands substantially more setup, while DPO is usually much easier to experiment with. So how feasible is it to build an RL-like training pipeline with just DPO?
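The idea in the teaser can be sketched in a few lines: sample two rollouts per prompt, score both with a reward function, and keep the better one as "chosen" and the worse as "rejected" to form DPO preference pairs. This is a minimal, self-contained sketch with hypothetical stand-ins (`sample_rollout`, `reward`), not Fireworks APIs or the post's actual pipeline.

```python
# Sketch: turn two rollouts per prompt into DPO preference pairs.
# `sample_rollout` and `reward` are toy stand-ins for a real model
# and a real reward function (e.g. a verifier or LLM judge).
import random

def sample_rollout(prompt: str, rng: random.Random) -> str:
    """Stand-in for model generation; returns a toy completion."""
    return f"{prompt} -> answer#{rng.randint(0, 9)}"

def reward(prompt: str, completion: str) -> float:
    """Stand-in reward: here, just the digit at the end of the completion."""
    return int(completion.rsplit("#", 1)[-1])

def build_dpo_pairs(prompts, seed=0):
    """For each prompt: sample two rollouts, score both, keep the higher-
    scoring one as 'chosen' and the other as 'rejected'; skip ties, which
    carry no preference signal."""
    rng = random.Random(seed)
    pairs = []
    for p in prompts:
        a, b = sample_rollout(p, rng), sample_rollout(p, rng)
        ra, rb = reward(p, a), reward(p, b)
        if ra == rb:
            continue  # no winner; drop this prompt for this round
        chosen, rejected = (a, b) if ra > rb else (b, a)
        pairs.append({"prompt": p, "chosen": chosen, "rejected": rejected})
    return pairs

pairs = build_dpo_pairs(["2+2=?", "capital of France?"])
```

The resulting list of `{"prompt", "chosen", "rejected"}` records is exactly the format standard DPO trainers consume; iterating generate-score-train with the updated model is what makes the loop RL-like.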

Self-Improving Agents, Powered by Your Evals
12/17/2025

NVIDIA Nemotron 3 Nano on Fireworks: The Engine for Next-Generation AI Agents
12/15/2025

Best Practices for Multi-Turn RL
12/10/2025

Turn Your LLM into a Calibrated Classifier for $2
12/4/2025

Unlock Advanced Reasoning with NVIDIA Nemotron Nano 2 Models on Fireworks AI
12/2/2025

Fireworks Expands AWS Alliance: Strategic Collaboration Agreement + GenAI Competency
11/24/2025

Eval Protocol: RL on your agents, in any environment
11/20/2025

Fireworks Achieves Triple ISO Certification, giving Enterprises Full Control and Trust in AI at Scale
11/19/2025

50 Trillion Tokens Per Day: The State of Agent Environments
11/19/2025

Fireworks RFT: Build AI agents with fine-tuned open models that outperform frontier closed models
11/10/2025

Modernizing Healthcare with AI: How RADPAIR and Fireworks Unlock Smarter Radiology Workflows
11/9/2025

40X Faster and Smarter Outputs: How Vercel Turbocharged their Code Fixing Model with Open Models, Speculative Decoding and Reinforcement Fine Tuning on Fireworks
11/3/2025

Genspark’s Deep Research Agent Outperforms a Frontier Closed Model in Quality and Tool Calls using Fireworks RFT, Achieving a 50% Cost Reduction
10/31/2025

Series C: We raised $250M To Help Enterprises Own Their AI
10/28/2025

Accelerate your Vision Pipelines with the new NVIDIA Nemotron Nano 2 VL Model on Fireworks AI
10/27/2025

Deployment Shapes: One-Click Deployment Configured For You
10/23/2025

Fireworks and AMD partner to power the next gen of AI infrastructure on AMD Instinct™ GPUs
10/20/2025

LLM on the edge: Model picking with Fireworks Eval Protocol + Ollama
10/15/2025

Announcing Embeddings and Reranking on Fireworks AI
10/9/2025

Deep-Dive into LLM Fine-Tuning
10/6/2025

Production-Ready AI Agents with Optimized Inference with AWS AgentCore
10/2/2025

Launching Fireworks for Startups Program!
10/1/2025

Traces Are All You Need (to rank LLMs)
9/22/2025

Understanding Embeddings and Reranking at Scale
9/12/2025

DeepSeek V3.1 now on Fireworks AI!
8/26/2025

LLM Eval Driven Development with Claude Code
8/25/2025

Your AI Benchmark is Lying to You. Here's How We Caught It
8/15/2025

Test-Driven Agent Development with Eval Protocol
8/14/2025

Quality first: how Fireworks.ai is the go-to place for gpt-oss
8/12/2025

Introducing OpenAI gpt-oss (20b & 120b)
8/5/2025

Announcing Eval Protocol
8/4/2025

Qwen3 Decoded: Choosing the Right Model For Your Task
8/1/2025

Kimi K2: Deep Dive into model performance and use-cases
8/1/2025

Run bulk async workloads with Fireworks Batch API
7/31/2025

Fireworks Real-World Benchmarks: Find the Best OSS Model for the Job
7/30/2025

Introducing Vision-Language Model Fine-tuning: Tailor VLMs to Your Domain
7/29/2025

How Notion Cuts Latency 4x and Scales Enterprise AI Workflows with Fireworks AI
7/25/2025

VibeRL: When AI Trains AI
7/22/2025

Fireworks AI Now Supports Amazon SageMaker
7/15/2025

Deep-dive into MuonClip: Fixing Attention Score Explosions in Transformer Training
7/15/2025

Understanding Function Calling: The Bridge to Agentic AI
7/11/2025

Sentient & Fireworks Power Decentralized AI At Viral Scale
7/11/2025

Using Model-as-a-Judge for Reward in Reinforcement Fine Tuning
7/10/2025

Introducing FLUX.1 Kontext on Fireworks
7/9/2025

Unlock Your Tools: Fireworks Adds OpenAI Response API with MCP Support (Beta)
6/22/2025

Build for Scale with Fireworks Virtual Cloud (GA)
6/16/2025

3D FireOptimizer: Automating the Multi-Dimensional Tradeoffs in LLM Serving
6/14/2025

Introducing Supervised Fine Tuning V2
6/13/2025

Vision Model Platform Updates: Enhanced Capabilities and New Features
6/12/2025

Building AI agents with the Fireworks Experimentation Platform (GA) and Build SDK (Beta)
6/11/2025

Reinforcement Fine Tuning (Beta): Train expert open models to surpass closed frontier models
6/9/2025

Building a High-Quality Synthetic Data Pipeline for Supervised Fine-Tuning
6/4/2025

Fireworks DevDay 2025 Wrapped
5/29/2025

FireAttention V4: Industry-Leading Latency and Cost Efficiency with FP4 (independent benchmarking shows >250 tokens/second on DeepSeek V3)
5/28/2025

Building an open-source Browser Agent on Fireworks AI
5/21/2025

Agentic AI Systems
5/19/2025

Supervised Fine-Tuning (SFT) with LoRA on Fireworks AI: Tutorial
5/12/2025

Qwen 3 on Fireworks AI: Controllable Chain-of-Thought and Tool Calling at Frontier Scale
5/6/2025

Optimizing Llama 4 Maverick on Fireworks AI
4/28/2025

Building Enterprise-Scale RAG Systems with Fireworks AI and MongoDB Atlas
4/9/2025

Fireworks AI Now Supports NVIDIA NIM Deployments for Blazing AI Inference
3/18/2025

Faster, more efficient DeepSeek on the Fireworks AI Developer Cloud
3/18/2025

Fine-Tuning DeepSeek v3 & R1 to optimize quality, latency, & cost
3/12/2025

Enabling Function Calling in DeepSeek v3: Bridging the Gap Between Text and Action
2/14/2025

DeepSeek v3 and R1 Model Architecture: Why it's powerful and economical
2/7/2025

DeepSeek R1 Just Got Eyes with Fireworks AI Document Inlining
2/5/2025

From text to task: Constrained generation for structured extraction in R1
2/1/2025

Distillation with Reasoning: Can DeepSeek R1 Teach Better Than Humans?
1/31/2025

Mistral Small 3 Now Available on Fireworks: Faster, Lighter, and More Efficient
1/30/2025

Beyond Supervised Fine Tuning: How Reinforcement Learning Empowers AI with Minimal Labels
1/27/2025

DeepSeek R1: All you need to know 🐳
1/24/2025

Real-time, performant code assistance: How Sourcegraph scaled with Fireworks AI
1/22/2025

DeepSeek V3 just got vision capabilities!
12/18/2024

20x faster Whisper than OpenAI - Fireworks audio transcribes 1 hour in 4 seconds
12/9/2024

How Cresta drives millions of real-time, AI-powered contact center interactions with Fireworks
12/8/2024

Fireworks f1: A breakthrough in complex reasoning with Compound AI
11/15/2024

How Upwork and Fireworks deliver faster, smarter proposals for freelancers
11/11/2024

FLUX.1 on Fireworks: Fast, frugal, and flexible
10/22/2024

FireAttention V3: Enabling AMD as a viable alternative for GPU inference
10/15/2024

Three projects, one platform: A developer's winning streak with Fireworks AI
10/14/2024

Partnering with Meta: Bringing Llama 3.2 to Fireworks for Fine-Tuning and Inference
9/25/2024

How Enterprises are using Multimodal Models in production with Fireworks
9/25/2024

Multi-LoRA: Personalize AI at scale and deliver the best experience for each customer and use case, with 100x cost-efficiency
9/18/2024

FireOptimizer: Customizing latency and quality for your production inference workload
8/30/2024

Build Your Own Flight Recommendation System using FastAPI, SerpAPI, and Firefunction
8/29/2024

Building a RAG with Astro, FastAPI, SurrealDB and Llama 3.1
8/14/2024

How Fireworks evaluates quantization precisely and interpretably
8/1/2024

Introducing Llama 3.1 inference endpoints in partnership with Meta
7/23/2024

Fireworks AI Raises $52M Series B to Lead Industry Shift to Compound AI Systems
7/11/2024

How Cursor built Fast Apply using the Speculative Decoding API
6/23/2024

FireAttention V2: 12x faster to make Long Contexts practical for Online Inference
6/20/2024

Firefunction-v2: Function calling capability on par with GPT4o at 2.5x the speed and 10% of the cost
6/17/2024

Announcing custom models and on-demand H100s with 50%+ lower costs and latency than vLLM
6/3/2024

GPUs on-demand: Not serverless, not reserved, but some third thing
6/3/2024

Code Generation with Large Language Models - Fireworks AI Take
5/8/2024

Doomed to Code: How we Teamed Up with Fireworks AI at MistralAI Hackathon to Conquer the Shores of Hell
5/6/2024

Partnering with Meta to bring Llama 3 to Fireworks’ inference and fine-tuning
4/18/2024

Getting Started with Stability’s API Powered by Fireworks
4/17/2024

Optimizing Retrieval Augmented Generation (RAG) with MongoDB Atlas and Fireworks AI
3/21/2024

Fireworks launches fine-tuning service - Rapidly iterate on quality and scale to production through Fireworks inference
3/8/2024

Fireworks Platform Spring 2024 Updates
3/1/2024

FireFunction V1 - Fireworks’ GPT-4-level function calling model - 4x faster than GPT-4 and open weights
2/20/2024

Why do all LLMs need structured output modes?
2/20/2024

FireLLaVA: the first commercially permissive OSS LLaVA model
1/18/2024

FireAttention — Serving Open Source Models 4x faster than vLLM by quantizing with ~no tradeoffs
1/8/2024

Fireworks Raises the Quality Bar with Function Calling Model and API Release
12/20/2023

Mixtral 8x7B on Fireworks: faster, cheaper, even before the official release
12/14/2023

LLM Inference Performance Benchmarking (Part 1)
11/3/2023

New in Fireworks: Image-to-Image and ControlNet support for SSD-1B and SDXL!
11/2/2023

Fireworks.ai Achieves SOC 2 Type II and HIPAA Compliance
10/27/2023

Accelerating Code Completion with Fireworks Fast LLM Inference
10/11/2023

Fireworks.ai Now Available on LangChain Prompt Playground
10/2/2023

Simplifying Code Infilling with Code Llama and Fireworks.ai
9/12/2023

Speed, Python: Pick Two. How CUDA Graphs Enable Fast Python Code for Deep Learning
8/29/2023

Fireworks.ai: Fast, Affordable, Customizable Gen AI Platform
8/17/2023

Multi-Query Attention is All You Need
7/12/2023