<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <id>https://docs.litellm.ai/release_notes</id>
    <title>LiteLLM Blog</title>
    <updated>2026-02-28T00:00:00.000Z</updated>
    <generator>https://github.com/jpmonette/feed</generator>
    <link rel="alternate" href="https://docs.litellm.ai/release_notes"/>
    <subtitle>LiteLLM Blog</subtitle>
    <icon>https://docs.litellm.ai/img/favicon.ico</icon>
    <entry>
        <title type="html"><![CDATA[v1.82.0 - Realtime Guardrails, Projects Management, and 10+ Performance Optimizations]]></title>
        <id>https://docs.litellm.ai/release_notes/v1-82-0</id>
        <link href="https://docs.litellm.ai/release_notes/v1-82-0"/>
        <updated>2026-02-28T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Deploy this version]]></summary>
        <content type="html"><![CDATA[<h2 class="anchor anchorWithStickyNavbar_LWe7" id="deploy-this-version">Deploy this version<a href="https://docs.litellm.ai/release_notes/v1-82-0#deploy-this-version" class="hash-link" aria-label="Direct link to Deploy this version" title="Direct link to Deploy this version">​</a></h2>
<!-- -->
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">Docker</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">Pip</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">docker run litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker run \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-e STORE_MODEL_IN_DB=True \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-p 4000:4000 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">ghcr.io/berriai/litellm:main-1.82.0-stable</span><br></span></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">pip install litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">pip install litellm==1.82.0</span><br></span></code></pre></div></div></div></div></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="key-highlights">Key Highlights<a href="https://docs.litellm.ai/release_notes/v1-82-0#key-highlights" class="hash-link" aria-label="Direct link to Key Highlights" title="Direct link to Key Highlights">​</a></h2>
<ul>
<li><strong>Realtime API guardrails</strong> — <a href="https://docs.litellm.ai/docs/proxy/guardrails">Full guardrails support for <code>/v1/realtime</code> WebSocket sessions with pre/post-call enforcement, voice transcription hooks, session termination policies, and Vertex AI Gemini Live support</a> - <a href="https://github.com/BerriAI/litellm/pull/22152" target="_blank" rel="noopener noreferrer">PR #22152</a>, <a href="https://github.com/BerriAI/litellm/pull/22153" target="_blank" rel="noopener noreferrer">PR #22153</a>, <a href="https://github.com/BerriAI/litellm/pull/22161" target="_blank" rel="noopener noreferrer">PR #22161</a>, <a href="https://github.com/BerriAI/litellm/pull/22165" target="_blank" rel="noopener noreferrer">PR #22165</a></li>
<li><strong>Projects Management</strong> — <a href="https://docs.litellm.ai/docs/proxy/ui_store_model_db_setting">New Projects UI with full CRUD, project-scoped virtual keys, and admin opt-in toggle — organize teams and keys by project</a> - <a href="https://github.com/BerriAI/litellm/pull/22315" target="_blank" rel="noopener noreferrer">PR #22315</a>, <a href="https://github.com/BerriAI/litellm/pull/22360" target="_blank" rel="noopener noreferrer">PR #22360</a>, <a href="https://github.com/BerriAI/litellm/pull/22373" target="_blank" rel="noopener noreferrer">PR #22373</a>, <a href="https://github.com/BerriAI/litellm/pull/22412" target="_blank" rel="noopener noreferrer">PR #22412</a></li>
<li><strong>Guardrail ecosystem expansion</strong> — <a href="https://docs.litellm.ai/docs/proxy/guardrails">Noma v2, Lakera v2 post-call, Singapore regulatory policies (PDPA + MAS), employment discrimination blockers, code execution blocker, guardrail policy versioning, and production monitoring</a> - <a href="https://github.com/BerriAI/litellm/pull/21400" target="_blank" rel="noopener noreferrer">PR #21400</a>, <a href="https://github.com/BerriAI/litellm/pull/21783" target="_blank" rel="noopener noreferrer">PR #21783</a>, <a href="https://github.com/BerriAI/litellm/pull/21948" target="_blank" rel="noopener noreferrer">PR #21948</a></li>
<li><strong>OpenAI Codex 5.3 — day 0</strong> — <a href="https://docs.litellm.ai/docs/providers/openai">Full support for <code>gpt-5.3-codex</code> on OpenAI and Azure, plus <code>gpt-audio-1.5</code> and <code>gpt-realtime-1.5</code> model coverage</a> - <a href="https://github.com/BerriAI/litellm/pull/22035" target="_blank" rel="noopener noreferrer">PR #22035</a></li>
<li><strong>10+ performance optimizations</strong> — Streaming hot-path fixes, Redis pipeline batching, database task batching, ModelResponse init skip, and router cache improvements — lowering latency and CPU usage on every request</li>
<li><strong><code>/v1/messages</code> → <code>/responses</code> routing</strong> — <code>/v1/messages</code> requests are now routed to the <a href="https://docs.litellm.ai/docs/response_api">Responses API</a> by default for OpenAI/Azure models</li>
</ul>
<div class="theme-admonition theme-admonition-danger admonition_xJq3 alert alert--danger"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M5.05.31c.81 2.17.41 3.38-.52 4.31C3.55 5.67 1.98 6.45.9 7.98c-1.45 2.05-1.7 6.53 3.53 7.7-2.2-1.16-2.67-4.52-.3-6.61-.61 2.03.53 3.33 1.94 2.86 1.39-.47 2.3.53 2.27 1.67-.02.78-.31 1.44-1.13 1.81 3.42-.59 4.78-3.42 4.78-5.56 0-2.84-2.53-3.22-1.25-5.61-1.52.13-2.03 1.13-1.89 2.75.09 1.08-1.02 1.8-1.86 1.33-.67-.41-.66-1.19-.06-1.78C8.18 5.31 8.68 2.45 5.05.32L5.03.3l.02.01z"></path></svg></span>v1/messages routing change</div><div class="admonitionContent_BuS1"><p>This version starts routing <code>/v1/messages</code> requests to the <code>/responses</code> API by default. To opt out and continue using chat/completions, set <code>LITELLM_USE_CHAT_COMPLETIONS_URL_FOR_ANTHROPIC_MESSAGES=true</code> or <code>litellm_settings.use_chat_completions_url_for_anthropic_messages: true</code> in your config.</p></div></div>
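<p>To put the opt-out in config form, a minimal <code>config.yaml</code> sketch (the setting name comes from the admonition above; the file layout shown is the standard proxy config):</p>

```yaml
litellm_settings:
  # keep routing /v1/messages to /chat/completions instead of /responses
  use_chat_completions_url_for_anthropic_messages: true
```

<p>Equivalently, set the environment variable <code>LITELLM_USE_CHAT_COMPLETIONS_URL_FOR_ANTHROPIC_MESSAGES=true</code> on the proxy container.</p>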
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-models--updated-models">New Models / Updated Models<a href="https://docs.litellm.ai/release_notes/v1-82-0#new-models--updated-models" class="hash-link" aria-label="Direct link to New Models / Updated Models" title="Direct link to New Models / Updated Models">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="new-model-support-20-new-models">New Model Support (20 new models)<a href="https://docs.litellm.ai/release_notes/v1-82-0#new-model-support-20-new-models" class="hash-link" aria-label="Direct link to New Model Support (20 new models)" title="Direct link to New Model Support (20 new models)">​</a></h4>
<table><thead><tr><th>Provider</th><th>Model</th><th>Context Window</th><th>Input ($/1M tokens)</th><th>Output ($/1M tokens)</th><th>Features</th></tr></thead><tbody><tr><td>OpenAI</td><td><code>gpt-5.3-codex</code></td><td>272K</td><td>$1.75</td><td>$14.00</td><td>Reasoning, coding</td></tr><tr><td>Azure OpenAI</td><td><code>azure/gpt-5.3-codex</code></td><td>272K</td><td>$1.75</td><td>$14.00</td><td>Azure deployment</td></tr><tr><td>OpenAI</td><td><code>gpt-audio-1.5</code></td><td>128K</td><td>$2.50</td><td>$10.00</td><td>Audio model</td></tr><tr><td>Azure OpenAI</td><td><code>azure/gpt-audio-1.5-2026-02-23</code></td><td>128K</td><td>$2.50</td><td>$10.00</td><td>Audio model</td></tr><tr><td>OpenAI</td><td><code>gpt-realtime-1.5</code></td><td>32K</td><td>$4.00</td><td>$16.00</td><td>Realtime model</td></tr><tr><td>Azure OpenAI</td><td><code>azure/gpt-realtime-1.5-2026-02-23</code></td><td>32K</td><td>$4.00</td><td>$16.00</td><td>Realtime model</td></tr><tr><td>Groq</td><td><code>groq/openai/gpt-oss-safeguard-20b</code></td><td>131K</td><td>$0.075</td><td>$0.30</td><td>Guardrail inference</td></tr><tr><td>Google Vertex AI</td><td><code>vertex_ai/gemini-3.1-flash-image-preview</code></td><td>-</td><td>-</td><td>-</td><td>Image generation</td></tr><tr><td>Perplexity</td><td><code>perplexity/perplexity/sonar</code></td><td>-</td><td>-</td><td>-</td><td>Sonar search</td></tr><tr><td>Perplexity</td><td><code>perplexity/openai/gpt-5.1</code></td><td>-</td><td>-</td><td>-</td><td>Hosted routing</td></tr><tr><td>Perplexity</td><td><code>perplexity/openai/gpt-5-mini</code></td><td>-</td><td>-</td><td>-</td><td>Hosted routing</td></tr><tr><td>Perplexity</td><td><code>perplexity/google/gemini-2.5-flash</code></td><td>-</td><td>-</td><td>-</td><td>Hosted routing</td></tr><tr><td>Perplexity</td><td><code>perplexity/google/gemini-2.5-pro</code></td><td>-</td><td>-</td><td>-</td><td>Hosted 
routing</td></tr><tr><td>Perplexity</td><td><code>perplexity/google/gemini-3-flash-preview</code></td><td>-</td><td>-</td><td>-</td><td>Hosted routing</td></tr><tr><td>Perplexity</td><td><code>perplexity/google/gemini-3-pro-preview</code></td><td>-</td><td>-</td><td>-</td><td>Hosted routing</td></tr><tr><td>Perplexity</td><td><code>perplexity/anthropic/claude-haiku-4-5</code></td><td>-</td><td>-</td><td>-</td><td>Hosted routing</td></tr><tr><td>Perplexity</td><td><code>perplexity/anthropic/claude-sonnet-4-5</code></td><td>-</td><td>-</td><td>-</td><td>Hosted routing</td></tr><tr><td>Perplexity</td><td><code>perplexity/anthropic/claude-opus-4-5</code></td><td>-</td><td>-</td><td>-</td><td>Hosted routing</td></tr><tr><td>Perplexity</td><td><code>perplexity/anthropic/claude-opus-4-6</code></td><td>-</td><td>-</td><td>-</td><td>Hosted routing</td></tr><tr><td>Perplexity</td><td><code>perplexity/xai/grok-4-1-fast-non-reasoning</code></td><td>-</td><td>-</td><td>-</td><td>Hosted routing</td></tr></tbody></table>
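<p>As a sanity check on the rates above, the cost of a request is just token counts scaled by the per-1M-token rates. A tiny standalone helper (hypothetical, not a LiteLLM API — LiteLLM computes this internally from its cost map):</p>

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_million: float, output_per_million: float) -> float:
    """USD cost of one request, given per-1M-token rates from the pricing table."""
    return (input_tokens * input_per_million
            + output_tokens * output_per_million) / 1_000_000

# gpt-5.3-codex: $1.75 input / $14.00 output per 1M tokens
print(request_cost(10_000, 2_000, 1.75, 14.00))  # 0.0455
```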
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features">Features<a href="https://docs.litellm.ai/release_notes/v1-82-0#features" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/openai">OpenAI</a></strong></p>
<ul>
<li>Day 0 support for <code>gpt-5.3-codex</code> on OpenAI and Azure - <a href="https://github.com/BerriAI/litellm/pull/22035" target="_blank" rel="noopener noreferrer">PR #22035</a></li>
<li>Add <code>gpt-audio-1.5</code> model cost map - <a href="https://github.com/BerriAI/litellm/pull/22303" target="_blank" rel="noopener noreferrer">PR #22303</a></li>
<li>Add <code>gpt-realtime-1.5</code> model cost map - <a href="https://github.com/BerriAI/litellm/pull/22304" target="_blank" rel="noopener noreferrer">PR #22304</a></li>
<li>Add <code>audio</code> as supported OpenAI param - <a href="https://github.com/BerriAI/litellm/pull/22092" target="_blank" rel="noopener noreferrer">PR #22092</a></li>
<li>Add <code>prompt_cache_key</code> and <code>prompt_cache_retention</code> support - <a href="https://github.com/BerriAI/litellm/pull/20397" target="_blank" rel="noopener noreferrer">PR #20397</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/azure">Azure OpenAI</a></strong></p>
<ul>
<li>New Azure OpenAI models 2026-02-25 - <a href="https://github.com/BerriAI/litellm/pull/22114" target="_blank" rel="noopener noreferrer">PR #22114</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/anthropic">Anthropic</a></strong></p>
<ul>
<li>Add v1 Anthropic Responses API transformation - <a href="https://github.com/BerriAI/litellm/pull/22087" target="_blank" rel="noopener noreferrer">PR #22087</a></li>
<li>Sanitize <code>tool_use</code> IDs in <code>convert_to_anthropic_tool_invoke</code> - <a href="https://github.com/BerriAI/litellm/pull/21964" target="_blank" rel="noopener noreferrer">PR #21964</a></li>
<li>Fix model wildcard access issue - <a href="https://github.com/BerriAI/litellm/pull/21917" target="_blank" rel="noopener noreferrer">PR #21917</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/bedrock">AWS Bedrock</a></strong></p>
<ul>
<li>Encode model ARNs for OpenAI-compatible Bedrock imported models - <a href="https://github.com/BerriAI/litellm/pull/21701" target="_blank" rel="noopener noreferrer">PR #21701</a></li>
<li>Support optional regional STS endpoint in role assumption - <a href="https://github.com/BerriAI/litellm/pull/21640" target="_blank" rel="noopener noreferrer">PR #21640</a></li>
<li>Native structured outputs API support - <a href="https://github.com/BerriAI/litellm/pull/21222" target="_blank" rel="noopener noreferrer">PR #21222</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/vertex">Google Vertex AI</a></strong></p>
<ul>
<li>Add <code>gemini-3.1-flash-image-preview</code> to model cost map - <a href="https://github.com/BerriAI/litellm/pull/22223" target="_blank" rel="noopener noreferrer">PR #22223</a></li>
<li>Enable <code>context-1m-2025-08-07</code> beta header for Vertex AI provider - <a href="https://github.com/BerriAI/litellm/pull/21867" target="_blank" rel="noopener noreferrer">PR #21867</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/openrouter">OpenRouter</a></strong></p>
<ul>
<li>Add OpenRouter native models to model cost map - <a href="https://github.com/BerriAI/litellm/pull/20520" target="_blank" rel="noopener noreferrer">PR #20520</a></li>
<li>Add OpenRouter Opus 4.6 to model map - <a href="https://github.com/BerriAI/litellm/pull/20525" target="_blank" rel="noopener noreferrer">PR #20525</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/mistral">Mistral</a></strong></p>
<ul>
<li>Adjust <code>mistral-small-2503</code> input/output cost per token - <a href="https://github.com/BerriAI/litellm/pull/22097" target="_blank" rel="noopener noreferrer">PR #22097</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/groq">Groq</a></strong></p>
<ul>
<li>Add <code>groq/openai/gpt-oss-safeguard-20b</code> model pricing - <a href="https://github.com/BerriAI/litellm/pull/21951" target="_blank" rel="noopener noreferrer">PR #21951</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/aiml">AI/ML</a></strong></p>
<ul>
<li>Update AIML model pricing - <a href="https://github.com/BerriAI/litellm/pull/22139" target="_blank" rel="noopener noreferrer">PR #22139</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/ollama">Ollama</a></strong></p>
<ul>
<li>Thread <code>api_base</code> to <code>get_model_info</code> + graceful fallback - <a href="https://github.com/BerriAI/litellm/pull/21970" target="_blank" rel="noopener noreferrer">PR #21970</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/openai">PublicAI</a></strong></p>
<ul>
<li>Fix function calling for PublicAI Apertus models - <a href="https://github.com/BerriAI/litellm/pull/21582" target="_blank" rel="noopener noreferrer">PR #21582</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/xai">xAI</a></strong></p>
<ul>
<li>Add deprecation dates for <code>grok-2-vision-1212</code> and <code>grok-3-mini</code> models - <a href="https://github.com/BerriAI/litellm/pull/20102" target="_blank" rel="noopener noreferrer">PR #20102</a></li>
</ul>
</li>
<li>
<p><strong>General</strong></p>
<ul>
<li>Forward provider auth headers - <a href="https://github.com/BerriAI/litellm/pull/22070" target="_blank" rel="noopener noreferrer">PR #22070</a></li>
<li>Normalize camelCase <code>thinking</code> param keys to snake_case - <a href="https://github.com/BerriAI/litellm/pull/21762" target="_blank" rel="noopener noreferrer">PR #21762</a></li>
<li>Allow <code>dimensions</code> param passthrough for non-text-embedding-3 OpenAI models - <a href="https://github.com/BerriAI/litellm/pull/22144" target="_blank" rel="noopener noreferrer">PR #22144</a></li>
</ul>
</li>
</ul>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="bug-fixes">Bug Fixes<a href="https://docs.litellm.ai/release_notes/v1-82-0#bug-fixes" class="hash-link" aria-label="Direct link to Bug Fixes" title="Direct link to Bug Fixes">​</a></h3>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/bedrock">AWS Bedrock</a></strong></p>
<ul>
<li>Fix converse handling for <code>parallel_tool_calls</code> - <a href="https://github.com/BerriAI/litellm/pull/22267" target="_blank" rel="noopener noreferrer">PR #22267</a></li>
<li>Restore <code>parallel_tool_calls</code> mapping in <code>map_openai_params</code> - <a href="https://github.com/BerriAI/litellm/pull/22333" target="_blank" rel="noopener noreferrer">PR #22333</a></li>
<li>Correct <code>modelInput</code> format for Converse API batch models - <a href="https://github.com/BerriAI/litellm/pull/21656" target="_blank" rel="noopener noreferrer">PR #21656</a></li>
<li>Prevent double UUID in <code>create_file</code> S3 key - <a href="https://github.com/BerriAI/litellm/pull/21650" target="_blank" rel="noopener noreferrer">PR #21650</a></li>
<li>Filter internal <code>json_tool_call</code> when mixed with real tools - <a href="https://github.com/BerriAI/litellm/pull/21107" target="_blank" rel="noopener noreferrer">PR #21107</a></li>
<li>Pass timeout param to Bedrock rerank HTTP client - <a href="https://github.com/BerriAI/litellm/pull/22021" target="_blank" rel="noopener noreferrer">PR #22021</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/anthropic">Anthropic</a></strong></p>
<ul>
<li>Fix model cost map for Anthropic fast and <code>inference_geo</code> - <a href="https://github.com/BerriAI/litellm/pull/21904" target="_blank" rel="noopener noreferrer">PR #21904</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/image_generation">Image Generation</a></strong></p>
<ul>
<li>Propagate <code>extra_headers</code> to upstream image generation - <a href="https://github.com/BerriAI/litellm/pull/22026" target="_blank" rel="noopener noreferrer">PR #22026</a></li>
<li>Add <code>ChatCompletionImageObject</code> in <code>OpenAIChatCompletionAssistantMessage</code> - <a href="https://github.com/BerriAI/litellm/pull/22155" target="_blank" rel="noopener noreferrer">PR #22155</a></li>
</ul>
</li>
<li>
<p><strong>General</strong></p>
<ul>
<li>Preserve forwarding of server-side called tools - <a href="https://github.com/BerriAI/litellm/pull/22260" target="_blank" rel="noopener noreferrer">PR #22260</a></li>
<li>Fix free model handling from UI paths - <a href="https://github.com/BerriAI/litellm/pull/22258" target="_blank" rel="noopener noreferrer">PR #22258</a></li>
<li>Fix <code>None</code> TypeError in mapping - <a href="https://github.com/BerriAI/litellm/pull/22080" target="_blank" rel="noopener noreferrer">PR #22080</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="llm-api-endpoints">LLM API Endpoints<a href="https://docs.litellm.ai/release_notes/v1-82-0#llm-api-endpoints" class="hash-link" aria-label="Direct link to LLM API Endpoints" title="Direct link to LLM API Endpoints">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-1">Features<a href="https://docs.litellm.ai/release_notes/v1-82-0#features-1" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/response_api">Realtime API</a></strong></p>
<ul>
<li>Guardrails support for <code>/v1/realtime</code> WebSocket endpoint - <a href="https://github.com/BerriAI/litellm/pull/22152" target="_blank" rel="noopener noreferrer">PR #22152</a></li>
<li>Vertex AI Gemini Live via unified <code>/realtime</code> endpoint - <a href="https://github.com/BerriAI/litellm/pull/22153" target="_blank" rel="noopener noreferrer">PR #22153</a></li>
<li>Guardrails with <code>pre_call</code>/<code>post_call</code> mode on realtime WebSocket - <a href="https://github.com/BerriAI/litellm/pull/22161" target="_blank" rel="noopener noreferrer">PR #22161</a></li>
<li><code>end_session_after_n_fails</code> + Endpoint Settings wizard step - <a href="https://github.com/BerriAI/litellm/pull/22165" target="_blank" rel="noopener noreferrer">PR #22165</a></li>
<li>Guardrail hook for voice transcription - <a href="https://github.com/BerriAI/litellm/pull/21976" target="_blank" rel="noopener noreferrer">PR #21976</a></li>
<li>Fix guardrails not firing for Gemini/Vertex AI and <code>provider_config</code> realtime sessions - <a href="https://github.com/BerriAI/litellm/pull/22168" target="_blank" rel="noopener noreferrer">PR #22168</a></li>
<li>Add logging, spend tracking support + tool tracing - <a href="https://github.com/BerriAI/litellm/pull/22105" target="_blank" rel="noopener noreferrer">PR #22105</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/video_generation">Video Generation</a></strong></p>
<ul>
<li>Add <code>variant</code> parameter to video content download - <a href="https://github.com/BerriAI/litellm/pull/21955" target="_blank" rel="noopener noreferrer">PR #21955</a></li>
<li>Pass <code>api_key</code> from <code>litellm_params</code> to video remix handlers - <a href="https://github.com/BerriAI/litellm/pull/21965" target="_blank" rel="noopener noreferrer">PR #21965</a></li>
<li>Apply custom video pricing from deployment <code>model_info</code> - <a href="https://github.com/BerriAI/litellm/pull/21923" target="_blank" rel="noopener noreferrer">PR #21923</a></li>
<li>Fix passing of image and parameters in videos API - <a href="https://github.com/BerriAI/litellm/pull/22170" target="_blank" rel="noopener noreferrer">PR #22170</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/openai#ocr--document-understanding">OCR</a></strong></p>
<ul>
<li>Enable local file support for OCR - <a href="https://github.com/BerriAI/litellm/pull/22133" target="_blank" rel="noopener noreferrer">PR #22133</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/completion/input">Websearch / Tool Calling</a></strong></p>
<ul>
<li>Preserve thinking blocks in agentic loop follow-up messages - <a href="https://github.com/BerriAI/litellm/pull/21604" target="_blank" rel="noopener noreferrer">PR #21604</a></li>
</ul>
</li>
<li>
<p><strong>General</strong></p>
<ul>
<li>Add configurable upper bound for chunk processing time - <a href="https://github.com/BerriAI/litellm/pull/22209" target="_blank" rel="noopener noreferrer">PR #22209</a></li>
<li>Emit <code>x-litellm-overhead-duration-ms</code> header for streaming requests - <a href="https://github.com/BerriAI/litellm/pull/22027" target="_blank" rel="noopener noreferrer">PR #22027</a></li>
</ul>
</li>
</ul>
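<p>The realtime guardrail options above are wired up in the proxy config. A hedged sketch (<code>end_session_after_n_fails</code> and the <code>pre_call</code>/<code>post_call</code> modes come from the entries above; the guardrail provider and exact nesting are illustrative — check the guardrails docs for your provider):</p>

```yaml
guardrails:
  - guardrail_name: realtime-safety
    litellm_params:
      guardrail: presidio            # illustrative provider choice
      mode: [pre_call, post_call]    # enforce before and after model output
      end_session_after_n_fails: 3   # terminate the WebSocket session after repeated violations
```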
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="bugs">Bugs<a href="https://docs.litellm.ai/release_notes/v1-82-0#bugs" class="hash-link" aria-label="Direct link to Bugs" title="Direct link to Bugs">​</a></h4>
<ul>
<li><strong>General</strong>
<ul>
<li>Fix mypy attr-defined errors on realtime websocket calls - <a href="https://github.com/BerriAI/litellm/pull/22202" target="_blank" rel="noopener noreferrer">PR #22202</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="management-endpoints--ui">Management Endpoints / UI<a href="https://docs.litellm.ai/release_notes/v1-82-0#management-endpoints--ui" class="hash-link" aria-label="Direct link to Management Endpoints / UI" title="Direct link to Management Endpoints / UI">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-2">Features<a href="https://docs.litellm.ai/release_notes/v1-82-0#features-2" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong>Projects</strong></p>
<ul>
<li>Add Projects page with list and create flows - <a href="https://github.com/BerriAI/litellm/pull/22315" target="_blank" rel="noopener noreferrer">PR #22315</a></li>
<li>Add Project Details page with edit modal - <a href="https://github.com/BerriAI/litellm/pull/22360" target="_blank" rel="noopener noreferrer">PR #22360</a></li>
<li>Add project keys table and project dropdown on key create/edit - <a href="https://github.com/BerriAI/litellm/pull/22373" target="_blank" rel="noopener noreferrer">PR #22373</a></li>
<li>Add delete project action to Projects table - <a href="https://github.com/BerriAI/litellm/pull/22412" target="_blank" rel="noopener noreferrer">PR #22412</a></li>
<li>Add Projects Opt-In Toggle in Admin Settings - <a href="https://github.com/BerriAI/litellm/pull/22416" target="_blank" rel="noopener noreferrer">PR #22416</a></li>
<li>Include <code>created_at</code> and <code>updated_at</code> in <code>/project/list</code> response - <a href="https://github.com/BerriAI/litellm/pull/22323" target="_blank" rel="noopener noreferrer">PR #22323</a></li>
<li>Add tags in project - <a href="https://github.com/BerriAI/litellm/pull/22216" target="_blank" rel="noopener noreferrer">PR #22216</a></li>
</ul>
</li>
<li>
<p><strong>Virtual Keys + Access Groups</strong></p>
<ul>
<li>Add bidirectional team/key sync for Access Group CRUD flows - <a href="https://github.com/BerriAI/litellm/pull/22253" target="_blank" rel="noopener noreferrer">PR #22253</a></li>
<li>Add pagination and search to <code>/key/aliases</code> to prevent OOMs - <a href="https://github.com/BerriAI/litellm/pull/22137" target="_blank" rel="noopener noreferrer">PR #22137</a></li>
<li>Add paginated key alias selector in UI - <a href="https://github.com/BerriAI/litellm/pull/22157" target="_blank" rel="noopener noreferrer">PR #22157</a></li>
<li>Add <code>project_id</code> and <code>access_group_id</code> filters for key list endpoint - <a href="https://github.com/BerriAI/litellm/pull/22356" target="_blank" rel="noopener noreferrer">PR #22356</a></li>
<li>Add KeyInfoHeader component - <a href="https://github.com/BerriAI/litellm/pull/22047" target="_blank" rel="noopener noreferrer">PR #22047</a></li>
<li>Restrict Edit Settings to key owners - <a href="https://github.com/BerriAI/litellm/pull/21985" target="_blank" rel="noopener noreferrer">PR #21985</a></li>
<li>Fix virtual key grace period from env/UI - <a href="https://github.com/BerriAI/litellm/pull/20321" target="_blank" rel="noopener noreferrer">PR #20321</a></li>
</ul>
</li>
<li>
<p><strong>Agents</strong></p>
<ul>
<li>Assign virtual keys to agents - <a href="https://github.com/BerriAI/litellm/pull/22045" target="_blank" rel="noopener noreferrer">PR #22045</a></li>
<li>Assign tools to agents - <a href="https://github.com/BerriAI/litellm/pull/22064" target="_blank" rel="noopener noreferrer">PR #22064</a></li>
<li>Ensure internal users cannot create agents (RBAC enforcement) - <a href="https://github.com/BerriAI/litellm/pull/22329" target="_blank" rel="noopener noreferrer">PR #22329</a></li>
</ul>
</li>
<li>
<p><strong>Proxy Auth / SSO</strong></p>
<ul>
<li>OIDC discovery URLs, roles array handling, and dot-notation error hints - <a href="https://github.com/BerriAI/litellm/pull/22336" target="_blank" rel="noopener noreferrer">PR #22336</a></li>
<li>Add PROXY_ADMIN role to system user for key rotation - <a href="https://github.com/BerriAI/litellm/pull/21896" target="_blank" rel="noopener noreferrer">PR #21896</a></li>
</ul>
</li>
<li>
<p><strong>Usage / Spend Logs</strong></p>
<ul>
<li>Add user filtering to usage page - <a href="https://github.com/BerriAI/litellm/pull/22059" target="_blank" rel="noopener noreferrer">PR #22059</a></li>
<li>Allow using AI to understand usage patterns - <a href="https://github.com/BerriAI/litellm/pull/22042" target="_blank" rel="noopener noreferrer">PR #22042</a></li>
<li>Use backend <code>request_duration_ms</code> and make Duration sortable in Logs - <a href="https://github.com/BerriAI/litellm/pull/22122" target="_blank" rel="noopener noreferrer">PR #22122</a></li>
<li>Add <code>request_duration_ms</code> to SpendLogs - <a href="https://github.com/BerriAI/litellm/pull/22066" target="_blank" rel="noopener noreferrer">PR #22066</a></li>
<li>Enrich failure spend logs with key/team metadata - <a href="https://github.com/BerriAI/litellm/pull/22049" target="_blank" rel="noopener noreferrer">PR #22049</a></li>
<li>Show real tool names in logs for Anthropic-format tools - <a href="https://github.com/BerriAI/litellm/pull/22048" target="_blank" rel="noopener noreferrer">PR #22048</a></li>
</ul>
</li>
<li>
<p><strong>Models + Endpoints</strong></p>
<ul>
<li>Show proxy URL in ModelHub - <a href="https://github.com/BerriAI/litellm/pull/21660" target="_blank" rel="noopener noreferrer">PR #21660</a></li>
<li>Add <code>/public/endpoints</code> for provider endpoint support - <a href="https://github.com/BerriAI/litellm/pull/22248" target="_blank" rel="noopener noreferrer">PR #22248</a></li>
</ul>
</li>
<li>
<p><strong>UI Improvements</strong></p>
<ul>
<li>Add custom favicon support - <a href="https://github.com/BerriAI/litellm/pull/21653" target="_blank" rel="noopener noreferrer">PR #21653</a></li>
<li>Add Blog Dropdown in Navbar - <a href="https://github.com/BerriAI/litellm/pull/21859" target="_blank" rel="noopener noreferrer">PR #21859</a></li>
<li>Add UI banner warning for detailed debug mode - <a href="https://github.com/BerriAI/litellm/pull/21527" target="_blank" rel="noopener noreferrer">PR #21527</a></li>
<li>Make auth value optional for MCP Server create flow - <a href="https://github.com/BerriAI/litellm/pull/22119" target="_blank" rel="noopener noreferrer">PR #22119</a></li>
<li>Tool policies: auto-discover tools + policy enforcement guardrail - <a href="https://github.com/BerriAI/litellm/pull/22041" target="_blank" rel="noopener noreferrer">PR #22041</a></li>
</ul>
</li>
<li>
<p><strong>Health Checks</strong></p>
<ul>
<li>Add health check max tokens configuration - <a href="https://github.com/BerriAI/litellm/pull/22299" target="_blank" rel="noopener noreferrer">PR #22299</a></li>
<li>Limit concurrent health checks with <code>health_check_concurrency</code> - <a href="https://github.com/BerriAI/litellm/pull/20584" target="_blank" rel="noopener noreferrer">PR #20584</a></li>
<li>Fix health check <code>model_id</code> filtering - <a href="https://github.com/BerriAI/litellm/pull/21071" target="_blank" rel="noopener noreferrer">PR #21071</a></li>
</ul>
</li>
</ul>
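<p>The health check tuning above lands in the proxy config; a sketch (placement of <code>health_check_concurrency</code> under <code>general_settings</code> alongside the existing health check keys is an assumption — verify against the health check docs):</p>

```yaml
general_settings:
  background_health_checks: true
  health_check_interval: 300     # seconds between check rounds
  health_check_concurrency: 10   # cap concurrent model health probes
```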
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="bugs-1">Bugs<a href="https://docs.litellm.ai/release_notes/v1-82-0#bugs-1" class="hash-link" aria-label="Direct link to Bugs" title="Direct link to Bugs">​</a></h4>
<ul>
<li>Populate <code>user_id</code> and <code>user_info</code> for admin users in <code>/user/info</code> - <a href="https://github.com/BerriAI/litellm/pull/22239" target="_blank" rel="noopener noreferrer">PR #22239</a></li>
<li>Fix virtual keys pagination stale totals when filtering - <a href="https://github.com/BerriAI/litellm/pull/22222" target="_blank" rel="noopener noreferrer">PR #22222</a></li>
<li>Fix Spend Update Queue aggregation never triggering with default presets - <a href="https://github.com/BerriAI/litellm/pull/21963" target="_blank" rel="noopener noreferrer">PR #21963</a></li>
<li>Fix timezone config lookup and replace hardcoded timezone map with <code>ZoneInfo</code> - <a href="https://github.com/BerriAI/litellm/pull/21754" target="_blank" rel="noopener noreferrer">PR #21754</a></li>
<li>Fix custom auth budget issue - <a href="https://github.com/BerriAI/litellm/pull/22164" target="_blank" rel="noopener noreferrer">PR #22164</a></li>
<li>Fix missing OAuth session state - <a href="https://github.com/BerriAI/litellm/pull/21992" target="_blank" rel="noopener noreferrer">PR #21992</a></li>
<li>Fix Transport Type for OpenAPI Spec on UI - <a href="https://github.com/BerriAI/litellm/pull/22005" target="_blank" rel="noopener noreferrer">PR #22005</a></li>
<li>Fix Claude Code plugin schema - <a href="https://github.com/BerriAI/litellm/pull/22271" target="_blank" rel="noopener noreferrer">PR #22271</a></li>
<li>Add missing migration for <code>LiteLLM_ClaudeCodePluginTable</code> - <a href="https://github.com/BerriAI/litellm/pull/22335" target="_blank" rel="noopener noreferrer">PR #22335</a></li>
<li>Only tag selected deployment in access group creation - <a href="https://github.com/BerriAI/litellm/pull/21655" target="_blank" rel="noopener noreferrer">PR #21655</a></li>
<li>State management fixes for CheckBatchCost - <a href="https://github.com/BerriAI/litellm/pull/21921" target="_blank" rel="noopener noreferrer">PR #21921</a></li>
<li>Remove duplicate antd import in ToolPolicies - <a href="https://github.com/BerriAI/litellm/pull/22107" target="_blank" rel="noopener noreferrer">PR #22107</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="ai-integrations">AI Integrations<a href="https://docs.litellm.ai/release_notes/v1-82-0#ai-integrations" class="hash-link" aria-label="Direct link to AI Integrations" title="Direct link to AI Integrations">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="logging">Logging<a href="https://docs.litellm.ai/release_notes/v1-82-0#logging" class="hash-link" aria-label="Direct link to Logging" title="Direct link to Logging">​</a></h3>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/logging#datadog">DataDog</a></strong></p>
<ul>
<li>Add ability to trace metrics in DataDog - <a href="https://github.com/BerriAI/litellm/pull/22103" target="_blank" rel="noopener noreferrer">PR #22103</a></li>
<li>Correlate LiteLLM call IDs with DataDog APM spans - <a href="https://github.com/BerriAI/litellm/pull/22219" target="_blank" rel="noopener noreferrer">PR #22219</a></li>
<li>Fix TTS metric emission issues - <a href="https://github.com/BerriAI/litellm/pull/20632" target="_blank" rel="noopener noreferrer">PR #20632</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/logging#prometheus">Prometheus</a></strong></p>
<ul>
<li>Add opt-in <code>stream</code> label on <code>litellm_proxy_total_requests_metric</code> - <a href="https://github.com/BerriAI/litellm/pull/22023" target="_blank" rel="noopener noreferrer">PR #22023</a></li>
<li>Fix team <code>+Inf</code> budgets in Prometheus metrics - <a href="https://github.com/BerriAI/litellm/pull/22243" target="_blank" rel="noopener noreferrer">PR #22243</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/logging#langfuse">Langfuse</a></strong></p>
<ul>
<li>Fix Langfuse OTEL trace issues - <a href="https://github.com/BerriAI/litellm/pull/21309" target="_blank" rel="noopener noreferrer">PR #21309</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/observability/arize_phoenix">Arize Phoenix</a></strong></p>
<ul>
<li>Fix nested traces coexistence with OTEL callback - <a href="https://github.com/BerriAI/litellm/pull/22169" target="_blank" rel="noopener noreferrer">PR #22169</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/alerting">Slack</a></strong></p>
<ul>
<li>Add optional digest mode for Slack alert types - <a href="https://github.com/BerriAI/litellm/pull/21683" target="_blank" rel="noopener noreferrer">PR #21683</a></li>
</ul>
</li>
<li>
<p><strong>General</strong></p>
<ul>
<li>Fix Gemini trace ID missing in logging - <a href="https://github.com/BerriAI/litellm/pull/22077" target="_blank" rel="noopener noreferrer">PR #22077</a></li>
<li>Populate <code>cache_read_input_tokens</code> from <code>prompt_tokens_details</code> for OpenAI/Azure - <a href="https://github.com/BerriAI/litellm/pull/22090" target="_blank" rel="noopener noreferrer">PR #22090</a></li>
</ul>
</li>
</ul>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="guardrails">Guardrails<a href="https://docs.litellm.ai/release_notes/v1-82-0#guardrails" class="hash-link" aria-label="Direct link to Guardrails" title="Direct link to Guardrails">​</a></h3>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails">Noma</a></strong></p>
<ul>
<li>Noma guardrails v2 based on custom guardrails framework - <a href="https://github.com/BerriAI/litellm/pull/21400" target="_blank" rel="noopener noreferrer">PR #21400</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails">LakeraAI</a></strong></p>
<ul>
<li>Add Lakera v2 post-call hook with fixed PII masking - <a href="https://github.com/BerriAI/litellm/pull/21783" target="_blank" rel="noopener noreferrer">PR #21783</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails">Presidio</a></strong></p>
<ul>
<li>Fix Presidio streaming and false positives - <a href="https://github.com/BerriAI/litellm/pull/21949" target="_blank" rel="noopener noreferrer">PR #21949</a></li>
<li>Fix Presidio streaming v3 reliability improvements - <a href="https://github.com/BerriAI/litellm/pull/22283" target="_blank" rel="noopener noreferrer">PR #22283</a></li>
<li>Prevent Presidio crash on non-JSON responses - <a href="https://github.com/BerriAI/litellm/pull/22084" target="_blank" rel="noopener noreferrer">PR #22084</a></li>
</ul>
</li>
<li>
<p><strong>Built-in Guardrails</strong></p>
<ul>
<li>Block code execution guardrail to prevent agents from executing code - <a href="https://github.com/BerriAI/litellm/pull/22154" target="_blank" rel="noopener noreferrer">PR #22154</a></li>
<li>Employment discrimination topic blockers for 5 protected classes - <a href="https://github.com/BerriAI/litellm/pull/21962" target="_blank" rel="noopener noreferrer">PR #21962</a></li>
<li>Claims agent guardrails (5 categories + policy template) - <a href="https://github.com/BerriAI/litellm/pull/22113" target="_blank" rel="noopener noreferrer">PR #22113</a></li>
<li>New code execution evaluation dataset - <a href="https://github.com/BerriAI/litellm/pull/22065" target="_blank" rel="noopener noreferrer">PR #22065</a></li>
<li>Tool policies: auto-discover tools + policy enforcement - <a href="https://github.com/BerriAI/litellm/pull/22041" target="_blank" rel="noopener noreferrer">PR #22041</a></li>
</ul>
</li>
<li>
<p><strong>Policy Templates</strong></p>
<ul>
<li>Singapore guardrail policies (PDPA + MAS AI Risk Management) - <a href="https://github.com/BerriAI/litellm/pull/21948" target="_blank" rel="noopener noreferrer">PR #21948</a></li>
<li>Prefix SG guardrail policy IDs with country code - <a href="https://github.com/BerriAI/litellm/pull/21974" target="_blank" rel="noopener noreferrer">PR #21974</a></li>
<li>Guardrail policy versioning - <a href="https://github.com/BerriAI/litellm/pull/21862" target="_blank" rel="noopener noreferrer">PR #21862</a></li>
</ul>
</li>
<li>
<p><strong>Guardrail Monitoring</strong></p>
<ul>
<li>Guardrail Monitor — measure guardrail reliability in production - <a href="https://github.com/BerriAI/litellm/pull/21944" target="_blank" rel="noopener noreferrer">PR #21944</a></li>
</ul>
</li>
<li>
<p><strong>Security</strong></p>
<ul>
<li>Fix unauthenticated RCE and sandbox escape in custom code guardrail - <a href="https://github.com/BerriAI/litellm/pull/22095" target="_blank" rel="noopener noreferrer">PR #22095</a></li>
</ul>
</li>
</ul>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="prompt-management">Prompt Management<a href="https://docs.litellm.ai/release_notes/v1-82-0#prompt-management" class="hash-link" aria-label="Direct link to Prompt Management" title="Direct link to Prompt Management">​</a></h3>
<p>No major prompt management changes in this release.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="secret-managers">Secret Managers<a href="https://docs.litellm.ai/release_notes/v1-82-0#secret-managers" class="hash-link" aria-label="Direct link to Secret Managers" title="Direct link to Secret Managers">​</a></h3>
<p>No major secret manager changes in this release.</p>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="spend-tracking-budgets-and-rate-limiting">Spend Tracking, Budgets and Rate Limiting<a href="https://docs.litellm.ai/release_notes/v1-82-0#spend-tracking-budgets-and-rate-limiting" class="hash-link" aria-label="Direct link to Spend Tracking, Budgets and Rate Limiting" title="Direct link to Spend Tracking, Budgets and Rate Limiting">​</a></h2>
<ul>
<li><strong>Priority PayGo cost tracking</strong> for Gemini/Vertex AI - <a href="https://github.com/BerriAI/litellm/pull/21909" target="_blank" rel="noopener noreferrer">PR #21909</a></li>
<li><strong>Add <code>request_duration_ms</code> to SpendLogs</strong> for latency tracking per request - <a href="https://github.com/BerriAI/litellm/pull/22066" target="_blank" rel="noopener noreferrer">PR #22066</a></li>
<li><strong>Add <code>in_flight_requests</code> metric</strong> to <code>/health/backlog</code> + Prometheus - <a href="https://github.com/BerriAI/litellm/pull/22319" target="_blank" rel="noopener noreferrer">PR #22319</a></li>
<li><strong>Enrich failure spend logs</strong> with key/team metadata - <a href="https://github.com/BerriAI/litellm/pull/22049" target="_blank" rel="noopener noreferrer">PR #22049</a></li>
<li><strong>Add spend tracking lifecycle logging</strong> for debugging spend flows - <a href="https://github.com/BerriAI/litellm/pull/22029" target="_blank" rel="noopener noreferrer">PR #22029</a></li>
<li><strong>Fix budget timezone config lookup</strong> and replace hardcoded timezone map with <code>ZoneInfo</code> - <a href="https://github.com/BerriAI/litellm/pull/21754" target="_blank" rel="noopener noreferrer">PR #21754</a></li>
<li><strong>Fix Spend Update Queue aggregation</strong> never triggering with default presets - <a href="https://github.com/BerriAI/litellm/pull/21963" target="_blank" rel="noopener noreferrer">PR #21963</a></li>
<li><strong>Avoid mutating caller-owned dicts</strong> in <code>SpendUpdateQueue</code> aggregation - <a href="https://github.com/BerriAI/litellm/pull/21742" target="_blank" rel="noopener noreferrer">PR #21742</a></li>
<li><strong>Optimize old spendlog deletion</strong> cron job - <a href="https://github.com/BerriAI/litellm/pull/21930" target="_blank" rel="noopener noreferrer">PR #21930</a></li>
<li><strong>Health check max tokens</strong> configuration - <a href="https://github.com/BerriAI/litellm/pull/22299" target="_blank" rel="noopener noreferrer">PR #22299</a></li>
</ul>
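<p>To see what the new <code>request_duration_ms</code> field looks like in practice, here is a minimal sketch of a SpendLogs entry. Only <code>request_duration_ms</code> is the addition from PR #22066; the other keys are a simplified illustration, not the full SpendLogs schema.</p>

```shell
# Illustrative SpendLogs entry; request_duration_ms is the new latency field
# (PR #22066). The other keys are a simplified sketch, not the full schema.
cat > spend_log_sample.json <<'EOF'
{"request_id": "chatcmpl-abc123", "model": "gpt-4o", "spend": 0.0021, "request_duration_ms": 842}
EOF

# Pull the per-request latency out of the log entry
grep -o '"request_duration_ms": [0-9]*' spend_log_sample.json
```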
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="mcp-gateway">MCP Gateway<a href="https://docs.litellm.ai/release_notes/v1-82-0#mcp-gateway" class="hash-link" aria-label="Direct link to MCP Gateway" title="Direct link to MCP Gateway">​</a></h2>
<ul>
<li><strong>Pass MCP auth headers</strong> from request context to tool fetch for <code>/v1/responses</code> and <code>/chat/completions</code> - <a href="https://github.com/BerriAI/litellm/pull/22291" target="_blank" rel="noopener noreferrer">PR #22291</a></li>
<li><strong>Default <code>available_on_public_internet</code> to true</strong> for MCP server behavior consistency - <a href="https://github.com/BerriAI/litellm/pull/22331" target="_blank" rel="noopener noreferrer">PR #22331</a></li>
<li><strong>Clear error messages</strong> for IP filtering / no available tools - <a href="https://github.com/BerriAI/litellm/pull/22142" target="_blank" rel="noopener noreferrer">PR #22142</a></li>
<li><strong>Strip stale <code>mcp-session-id</code> header</strong> to prevent 400 errors across proxy workers - <a href="https://github.com/BerriAI/litellm/pull/21417" target="_blank" rel="noopener noreferrer">PR #21417</a></li>
<li><strong>Skip health check for MCP</strong> with passthrough token auth - <a href="https://github.com/BerriAI/litellm/pull/21982" target="_blank" rel="noopener noreferrer">PR #21982</a></li>
<li><strong>Fix missing OAuth session state</strong> - <a href="https://github.com/BerriAI/litellm/pull/21992" target="_blank" rel="noopener noreferrer">PR #21992</a></li>
<li><strong>Fix Transport Type</strong> for OpenAPI Spec on UI - <a href="https://github.com/BerriAI/litellm/pull/22005" target="_blank" rel="noopener noreferrer">PR #22005</a></li>
<li><strong>Add e2e test</strong> for stateless StreamableHTTP behavior - <a href="https://github.com/BerriAI/litellm/pull/22033" target="_blank" rel="noopener noreferrer">PR #22033</a></li>
</ul>
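<p>As a sketch of the <code>available_on_public_internet</code> change: the <code>mcp_servers</code> block below follows the shape used in the LiteLLM MCP docs, but placing the flag at this level is an assumption based on PR #22331; check the MCP gateway docs for the final shape.</p>

```shell
# Sketch of an MCP server block in proxy_config.yaml. The mcp_servers layout
# follows the LiteLLM MCP docs; placing available_on_public_internet here is
# an assumption based on PR #22331.
cat > proxy_config.yaml <<'EOF'
mcp_servers:
  github_mcp:
    url: https://example.com/mcp/
    available_on_public_internet: true   # now defaults to true (PR #22331)
EOF

grep "available_on_public_internet" proxy_config.yaml
```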
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="performance--loadbalancing--reliability-improvements">Performance / Loadbalancing / Reliability improvements<a href="https://docs.litellm.ai/release_notes/v1-82-0#performance--loadbalancing--reliability-improvements" class="hash-link" aria-label="Direct link to Performance / Loadbalancing / Reliability improvements" title="Direct link to Performance / Loadbalancing / Reliability improvements">​</a></h2>
<p><strong>Streaming &amp; hot-path</strong></p>
<ul>
<li>Streaming latency improvements — 4 targeted hot-path fixes - <a href="https://github.com/BerriAI/litellm/pull/22346" target="_blank" rel="noopener noreferrer">PR #22346</a></li>
<li>Skip throwaway <code>Usage()</code> construction in <code>ModelResponse.__init__</code> - <a href="https://github.com/BerriAI/litellm/pull/21611" target="_blank" rel="noopener noreferrer">PR #21611</a></li>
<li>Optimize <code>is_model_o_series_model</code> with <code>startswith</code> - <a href="https://github.com/BerriAI/litellm/pull/21690" target="_blank" rel="noopener noreferrer">PR #21690</a></li>
<li>Use cached <code>_safe_get_request_headers</code> instead of per-request construction - <a href="https://github.com/BerriAI/litellm/pull/21430" target="_blank" rel="noopener noreferrer">PR #21430</a></li>
<li>Emit <code>x-litellm-overhead-duration-ms</code> header for streaming requests - <a href="https://github.com/BerriAI/litellm/pull/22027" target="_blank" rel="noopener noreferrer">PR #22027</a></li>
</ul>
<p><strong>Database &amp; Redis</strong></p>
<ul>
<li>Batch 11 <code>create_task()</code> calls into 1 in <code>update_database()</code> - <a href="https://github.com/BerriAI/litellm/pull/22028" target="_blank" rel="noopener noreferrer">PR #22028</a></li>
<li>Redis pipeline spend updates for batched writes - <a href="https://github.com/BerriAI/litellm/pull/22044" target="_blank" rel="noopener noreferrer">PR #22044</a></li>
<li>Recover from prisma-query-engine zombie process - <a href="https://github.com/BerriAI/litellm/pull/21899" target="_blank" rel="noopener noreferrer">PR #21899</a></li>
<li>Optimize old spendlog deletion cron job - <a href="https://github.com/BerriAI/litellm/pull/21930" target="_blank" rel="noopener noreferrer">PR #21930</a></li>
</ul>
<p><strong>Router &amp; caching</strong></p>
<ul>
<li>Add cache invalidation for <code>_cached_get_model_group_info</code> - <a href="https://github.com/BerriAI/litellm/pull/20376" target="_blank" rel="noopener noreferrer">PR #20376</a></li>
<li>Remove cache eviction close that kills in-use httpx clients - <a href="https://github.com/BerriAI/litellm/pull/22247" target="_blank" rel="noopener noreferrer">PR #22247</a></li>
<li>Store background task references in <code>LLMClientCache._remove_key</code> to prevent unawaited coroutine warnings - <a href="https://github.com/BerriAI/litellm/pull/22143" target="_blank" rel="noopener noreferrer">PR #22143</a></li>
<li>Fix <code>ensure_arrival_time</code> set before calculating queue time - <a href="https://github.com/BerriAI/litellm/pull/21918" target="_blank" rel="noopener noreferrer">PR #21918</a></li>
</ul>
<p><strong>Connection management</strong></p>
<ul>
<li>Only set <code>enable_cleanup_closed</code> on aiohttp when required - <a href="https://github.com/BerriAI/litellm/pull/21897" target="_blank" rel="noopener noreferrer">PR #21897</a></li>
<li>Prometheus child_exit cleanup for gunicorn workers - <a href="https://github.com/BerriAI/litellm/pull/22324" target="_blank" rel="noopener noreferrer">PR #22324</a></li>
<li>Prometheus multiprocess cleanup - <a href="https://github.com/BerriAI/litellm/pull/22221" target="_blank" rel="noopener noreferrer">PR #22221</a></li>
<li>Limit concurrent health checks with <code>health_check_concurrency</code> - <a href="https://github.com/BerriAI/litellm/pull/20584" target="_blank" rel="noopener noreferrer">PR #20584</a></li>
<li>Isolate <code>get_config</code> failures from model sync loop - <a href="https://github.com/BerriAI/litellm/pull/22224" target="_blank" rel="noopener noreferrer">PR #22224</a></li>
</ul>
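<p>For the health-check concurrency cap, a minimal config sketch: <code>background_health_checks</code> and <code>health_check_interval</code> are existing documented settings, while placing <code>health_check_concurrency</code> under <code>general_settings</code> is an assumption; verify the key's location in the health check docs before relying on it.</p>

```shell
# Sketch: capping concurrent health checks (PR #20584). The first two keys are
# documented settings; health_check_concurrency placement is an assumption.
cat > proxy_config.yaml <<'EOF'
general_settings:
  background_health_checks: true
  health_check_interval: 300        # seconds between check rounds
  health_check_concurrency: 5       # probe at most 5 models at once
EOF

grep "health_check" proxy_config.yaml
```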
<p><strong>Other</strong></p>
<ul>
<li>Semantic cache: support configurable vector dimensions - <a href="https://github.com/BerriAI/litellm/pull/21649" target="_blank" rel="noopener noreferrer">PR #21649</a></li>
<li>Honor <code>MAX_STRING_LENGTH_PROMPT_IN_DB</code> from config env vars - <a href="https://github.com/BerriAI/litellm/pull/22106" target="_blank" rel="noopener noreferrer">PR #22106</a></li>
<li>Enhance <code>MidStreamFallbackError</code> to preserve original status code and attributes - <a href="https://github.com/BerriAI/litellm/pull/22225" target="_blank" rel="noopener noreferrer">PR #22225</a></li>
<li>Network mock utility for testing - <a href="https://github.com/BerriAI/litellm/pull/21942" target="_blank" rel="noopener noreferrer">PR #21942</a></li>
<li>Add missing return type annotations to iterator protocol methods in streaming_handler - <a href="https://github.com/BerriAI/litellm/pull/21750" target="_blank" rel="noopener noreferrer">PR #21750</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="security">Security<a href="https://docs.litellm.ai/release_notes/v1-82-0#security" class="hash-link" aria-label="Direct link to Security" title="Direct link to Security">​</a></h2>
<ul>
<li>Fix critical/high CVEs in OS-level libs and NPM transitive dependencies - <a href="https://github.com/BerriAI/litellm/pull/22008" target="_blank" rel="noopener noreferrer">PR #22008</a></li>
<li>Fix unauthenticated RCE and sandbox escape in custom code guardrail - <a href="https://github.com/BerriAI/litellm/pull/22095" target="_blank" rel="noopener noreferrer">PR #22095</a></li>
<li>Remove hardcoded base64 string flagged by secret scanner - <a href="https://github.com/BerriAI/litellm/pull/22125" target="_blank" rel="noopener noreferrer">PR #22125</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="documentation-updates">Documentation Updates<a href="https://docs.litellm.ai/release_notes/v1-82-0#documentation-updates" class="hash-link" aria-label="Direct link to Documentation Updates" title="Direct link to Documentation Updates">​</a></h2>
<ul>
<li>Add OpenAI Agents SDK tutorial with LiteLLM Proxy - <a href="https://github.com/BerriAI/litellm/pull/21221" target="_blank" rel="noopener noreferrer">PR #21221</a></li>
<li>Add OpenClaw integration tutorial - <a href="https://github.com/BerriAI/litellm/pull/21605" target="_blank" rel="noopener noreferrer">PR #21605</a></li>
<li>Add Google GenAI SDK tutorial (JS &amp; Python) - <a href="https://github.com/BerriAI/litellm/pull/21885" target="_blank" rel="noopener noreferrer">PR #21885</a></li>
<li>Add Gollem Go agent framework cookbook example - <a href="https://github.com/BerriAI/litellm/pull/21747" target="_blank" rel="noopener noreferrer">PR #21747</a></li>
<li>Update AssemblyAI docs with Universal-3 Pro, Speech Understanding, and LLM Gateway - <a href="https://github.com/BerriAI/litellm/pull/21130" target="_blank" rel="noopener noreferrer">PR #21130</a></li>
<li>Add <code>store_model_in_db</code> release docs - <a href="https://github.com/BerriAI/litellm/pull/21863" target="_blank" rel="noopener noreferrer">PR #21863</a></li>
<li>Add Credential Usage Tracking docs - <a href="https://github.com/BerriAI/litellm/pull/22112" target="_blank" rel="noopener noreferrer">PR #22112</a></li>
<li>Add proxy request tags docs - <a href="https://github.com/BerriAI/litellm/pull/22129" target="_blank" rel="noopener noreferrer">PR #22129</a></li>
<li>Add trailing slash to <code>/mcp</code> endpoint URLs - <a href="https://github.com/BerriAI/litellm/pull/20509" target="_blank" rel="noopener noreferrer">PR #20509</a></li>
<li>Add pre-PR checklist to UI contributing guide - <a href="https://github.com/BerriAI/litellm/pull/21886" target="_blank" rel="noopener noreferrer">PR #21886</a></li>
<li>Replace Azure OpenAI key with mock key in docs - <a href="https://github.com/BerriAI/litellm/pull/21997" target="_blank" rel="noopener noreferrer">PR #21997</a></li>
<li>Add performance &amp; reliability section to v1.81.14 release notes - <a href="https://github.com/BerriAI/litellm/pull/21950" target="_blank" rel="noopener noreferrer">PR #21950</a></li>
<li>Update v1.81.12-stable release notes to point to stable.1 - <a href="https://github.com/BerriAI/litellm/pull/22036" target="_blank" rel="noopener noreferrer">PR #22036</a></li>
<li>Add security vulnerability scan report to v1.81.14 release notes - <a href="https://github.com/BerriAI/litellm/pull/22385" target="_blank" rel="noopener noreferrer">PR #22385</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-contributors">New Contributors<a href="https://docs.litellm.ai/release_notes/v1-82-0#new-contributors" class="hash-link" aria-label="Direct link to New Contributors" title="Direct link to New Contributors">​</a></h2>
<ul>
<li>@janfrederickk made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/21660" target="_blank" rel="noopener noreferrer">PR #21660</a></li>
<li>@hztBUAA made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/21656" target="_blank" rel="noopener noreferrer">PR #21656</a></li>
<li>@LeeJuOh made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/21754" target="_blank" rel="noopener noreferrer">PR #21754</a></li>
<li>@WhoisMonesh made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/21750" target="_blank" rel="noopener noreferrer">PR #21750</a></li>
<li>@trevorprater made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/21747" target="_blank" rel="noopener noreferrer">PR #21747</a></li>
<li>@edwiniac made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/21870" target="_blank" rel="noopener noreferrer">PR #21870</a></li>
<li>@stakeswky made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/21867" target="_blank" rel="noopener noreferrer">PR #21867</a></li>
<li>@ta-stripe made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/21701" target="_blank" rel="noopener noreferrer">PR #21701</a></li>
<li>@ron-zhong made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/21948" target="_blank" rel="noopener noreferrer">PR #21948</a></li>
<li>@Arindam200 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/21221" target="_blank" rel="noopener noreferrer">PR #21221</a></li>
<li>@Canvinus made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/21964" target="_blank" rel="noopener noreferrer">PR #21964</a></li>
<li>@nicolopignatelli made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/21951" target="_blank" rel="noopener noreferrer">PR #21951</a></li>
<li>@MarshHawk made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/20584" target="_blank" rel="noopener noreferrer">PR #20584</a></li>
<li>@gavksingh made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/22106" target="_blank" rel="noopener noreferrer">PR #22106</a></li>
<li>@roni-frantchi made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/22090" target="_blank" rel="noopener noreferrer">PR #22090</a></li>
<li>@noahnistler made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/22133" target="_blank" rel="noopener noreferrer">PR #22133</a></li>
<li>@dylan-duan-aai made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/21130" target="_blank" rel="noopener noreferrer">PR #21130</a></li>
<li>@rasmi made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/22322" target="_blank" rel="noopener noreferrer">PR #22322</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="diff-summary">Diff Summary<a href="https://docs.litellm.ai/release_notes/v1-82-0#diff-summary" class="hash-link" aria-label="Direct link to Diff Summary" title="Direct link to Diff Summary">​</a></h2>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="02282026">02/28/2026<a href="https://docs.litellm.ai/release_notes/v1-82-0#02282026" class="hash-link" aria-label="Direct link to 02/28/2026" title="Direct link to 02/28/2026">​</a></h2>
<ul>
<li>New Models / Updated Models: 26</li>
<li>LLM API Endpoints: 14</li>
<li>Management Endpoints / UI: 38</li>
<li>AI Integrations: 25</li>
<li>Spend Tracking, Budgets and Rate Limiting: 10</li>
<li>MCP Gateway: 8</li>
<li>Performance / Loadbalancing / Reliability improvements: 22</li>
<li>Security: 3</li>
<li>Documentation Updates: 14</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="full-changelog">Full Changelog<a href="https://docs.litellm.ai/release_notes/v1-82-0#full-changelog" class="hash-link" aria-label="Direct link to Full Changelog" title="Direct link to Full Changelog">​</a></h2>
<p><a href="https://github.com/BerriAI/litellm/compare/v1.81.14.rc.1...v1.82.0" target="_blank" rel="noopener noreferrer">v1.81.14.rc.1...v1.82.0</a></p>]]></content>
        <author>
            <name>Krrish Dholakia</name>
            <uri>https://www.linkedin.com/in/krish-d/</uri>
        </author>
        <author>
            <name>Ishaan Jaff</name>
            <uri>https://www.linkedin.com/in/reffajnaahsi/</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[v1.81.14 - New Gateway Level Guardrails & Compliance Playground]]></title>
        <id>https://docs.litellm.ai/release_notes/v1-81-14</id>
        <link href="https://docs.litellm.ai/release_notes/v1-81-14"/>
        <updated>2026-02-21T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Deploy this version]]></summary>
        <content type="html"><![CDATA[<h2 class="anchor anchorWithStickyNavbar_LWe7" id="deploy-this-version">Deploy this version<a href="https://docs.litellm.ai/release_notes/v1-81-14#deploy-this-version" class="hash-link" aria-label="Direct link to Deploy this version" title="Direct link to Deploy this version">​</a></h2>
<!-- -->
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">Docker</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">Pip</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">docker run litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker run \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-e STORE_MODEL_IN_DB=True \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-p 4000:4000 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">ghcr.io/berriai/litellm:main-v1.81.14-stable</span><br></span></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">pip install litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">pip install litellm==1.81.14</span><br></span></code></pre></div></div></div></div></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="key-highlights">Key Highlights<a href="https://docs.litellm.ai/release_notes/v1-81-14#key-highlights" class="hash-link" aria-label="Direct link to Key Highlights" title="Direct link to Key Highlights">​</a></h2>
<ul>
<li><strong>Guardrail Garden</strong> — <a href="https://docs.litellm.ai/docs/proxy/guardrails/policy_templates">Browse built-in and partner guardrails by use case — competitor blocking, topic filtering, GDPR, prompt injection, and more. Pick a template, customize it, attach it to a team or key.</a></li>
<li><strong>Compliance Playground</strong> — <a href="https://docs.litellm.ai/docs/proxy/guardrails/policy_templates">Test any guardrail policy against your own traffic before it goes live. See precision, recall, and false positive rate — so you know how it'll behave in production.</a></li>
<li><strong>3 new zero-cost built-in guardrails</strong> — <a href="https://docs.litellm.ai/docs/proxy/guardrails">Competitor name blocker, topic blocker, and insults filter — all gateway-level, &lt;0.1ms latency, no external API, configurable per-team or key</a></li>
<li><strong>Store Model in DB Settings via UI</strong> - <a href="https://docs.litellm.ai/docs/proxy/ui_store_model_db_setting">Configure model storage directly in the Admin UI without editing config files or restarting the proxy—perfect for cloud deployments</a></li>
<li><strong>Claude Sonnet 4.6 — day 0</strong> — <a href="https://docs.litellm.ai/docs/providers/anthropic">Full support across Anthropic and Vertex AI: reasoning, computer use, prompt caching, 200K context</a></li>
<li><strong>20+ performance optimizations</strong> — Faster routing, lower logging overhead, reduced cost-calculator latency, and connection pool fixes — meaningfully less CPU and latency on every request</li>
</ul>
<hr>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="guardrail-garden">Guardrail Garden<a href="https://docs.litellm.ai/release_notes/v1-81-14#guardrail-garden" class="hash-link" aria-label="Direct link to Guardrail Garden" title="Direct link to Guardrail Garden">​</a></h3>
<p>AI Platform Admins can now browse built-in and partner guardrails from the Guardrail Garden. Guardrails are organized by use case — blocking financial advice, filtering insults, detecting competitor mentions, and more — so you can find the right one and deploy it in a few clicks.</p>
<p><img decoding="async" loading="lazy" alt="Guardrail Garden" src="https://docs.litellm.ai/assets/images/guardrail_garden-17e0f53e4413c8ef4409721b0a177df0.png" width="4260" height="2508" class="img_ev3q"></p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="3-new-built-in-guardrails">3 New Built-in Guardrails<a href="https://docs.litellm.ai/release_notes/v1-81-14#3-new-built-in-guardrails" class="hash-link" aria-label="Direct link to 3 New Built-in Guardrails" title="Direct link to 3 New Built-in Guardrails">​</a></h3>
<p>This release brings 3 new built-in guardrails that run directly on the gateway, ideal for AI Gateway Admins who need low-latency, zero-cost guardrails.</p>
<ul>
<li><strong>Denied Financial Advice</strong> — detects requests for personalized financial advice, investment recommendations, or financial planning</li>
<li><strong>Denied Insults</strong> — detects insults, name-calling, and personal attacks directed at the chatbot, staff, or other people</li>
<li><strong>Competitor Name Blocker</strong> — detects mentions of competitor brands in responses</li>
</ul>
<p>These guardrails are built for production and achieved 100% recall and precision on our benchmarks.</p>
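<p>Built-in guardrails are enabled through the proxy's standard <code>guardrails</code> config block. A minimal sketch (the <code>guardrail</code> identifier below is illustrative; check the guardrails docs for the exact names of the new built-ins):</p>

```yaml
guardrails:
  - guardrail_name: "block-competitor-mentions"   # your label for this guardrail
    litellm_params:
      guardrail: competitor_name_blocker          # illustrative id; see the guardrails docs
      mode: "post_call"                           # run against model responses
      default_on: true
```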
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="store-model-in-db-settings-via-ui">Store Model in DB Settings via UI<a href="https://docs.litellm.ai/release_notes/v1-81-14#store-model-in-db-settings-via-ui" class="hash-link" aria-label="Direct link to Store Model in DB Settings via UI" title="Direct link to Store Model in DB Settings via UI">​</a></h3>
<p>Previously, the <code>store_model_in_db</code> setting could only be configured in <code>proxy_config.yaml</code> under <code>general_settings</code>, requiring a proxy restart to take effect. Now you can enable or disable this setting directly from the Admin UI without any restarts. This is especially useful for cloud deployments where you don't have direct access to config files or want to avoid downtime. Enable <code>store_model_in_db</code> to move model definitions from your YAML into the database—reducing config complexity, improving scalability, and enabling dynamic model management across multiple proxy instances.</p>
<p><img decoding="async" loading="lazy" alt="Store model in DB Setting" src="https://docs.litellm.ai/assets/images/ui_store_model_in_db-4a33798f081424f47f4eec65f3c4684c.png" width="4800" height="2508" class="img_ev3q"></p>
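<p>For reference, this is the YAML equivalent that previously required a restart to change:</p>

```yaml
general_settings:
  store_model_in_db: true  # can now also be toggled from the Admin UI
```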
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="eval-results">Eval results<a href="https://docs.litellm.ai/release_notes/v1-81-14#eval-results" class="hash-link" aria-label="Direct link to Eval results" title="Direct link to Eval results">​</a></h4>
<p>We benchmarked our new built-in guardrails against labeled datasets before shipping. You can see the results for Denied Financial Advice (207 cases) and Denied Insults (299 cases):</p>
<table><thead><tr><th>Guardrail</th><th>Precision</th><th>Recall</th><th>F1</th><th>Latency p50</th><th>Cost/req</th></tr></thead><tbody><tr><td>Denied Financial Advice</td><td>100%</td><td>100%</td><td>100%</td><td>&lt;0.1ms</td><td>$0</td></tr><tr><td>Denied Insults</td><td>100%</td><td>100%</td><td>100%</td><td>&lt;0.1ms</td><td>$0</td></tr></tbody></table>
<p>100% precision means zero false positives — no legitimate messages were incorrectly blocked. 100% recall means zero false negatives — every message that should have been blocked was caught.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="compliance-playground">Compliance Playground<a href="https://docs.litellm.ai/release_notes/v1-81-14#compliance-playground" class="hash-link" aria-label="Direct link to Compliance Playground" title="Direct link to Compliance Playground">​</a></h3>
<p>The Compliance Playground lets you test any guardrail against our pre-built eval datasets or your own custom datasets, so you can see precision, recall, and false positive rate before rolling it out to production.</p>
<p><img decoding="async" loading="lazy" alt="Compliance Playground" src="https://docs.litellm.ai/assets/images/compliance_playground-11b8b8246548c5a015746fad79f20cff.png" width="4260" height="2508" class="img_ev3q"></p>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="performance--reliability--up-to-13-lower-latency">Performance &amp; Reliability — Up to 13% Lower Latency<a href="https://docs.litellm.ai/release_notes/v1-81-14#performance--reliability--up-to-13-lower-latency" class="hash-link" aria-label="Direct link to Performance &amp; Reliability — Up to 13% Lower Latency" title="Direct link to Performance &amp; Reliability — Up to 13% Lower Latency">​</a></h2>
<div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAGCAYAAAD68A/GAAAACXBIWXMAABYlAAAWJQFJUiTwAAAAuklEQVR4nD3P3UrDQBDF8X0KTXYmH9qGZJM2Sd22aWuJWBC9UCuC7/8of9kVvPhdDOdwYIy1lqIouStLmrrGNe6Pc6gqIRcRTHqTsDpuOPy88PAx4z9n/PWJqnekSYpkiuYZJhxN39KfPe000B3GaFlXqChWLKKCsaG4XePfZ8bXE5u3x2jhKkL2v5jcJtTrluG8pd0PdNNINw3cLxeoSFzTTDFFXrI6enbfF3bXZ/Zfl6jqwjMZmufRL/RkWKJEQRmMAAAAAElFTkSuQmCC&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="640" height="369"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/assets/ideal-img/v1_81_14_perf.f9397ca.640.png" srcset="/assets/ideal-img/v1_81_14_perf.f9397ca.640.png 640w,/assets/ideal-img/v1_81_14_perf.7c3ad1c.1920.png 1920w" width="640" height="369"></noscript></div>
<p>This release cuts latency across all percentiles through 20+ micro-optimizations across logging, cost calculation, routing, and connection management. See the <a href="https://docs.litellm.ai/docs/benchmarks">benchmarking docs</a> for how to run these benchmarks yourself.</p>
<ul>
<li><strong>Mean latency:</strong> 78.4 ms → <strong>70.3 ms</strong> (−10.3%)</li>
<li><strong>p50 latency:</strong> 64.8 ms → <strong>57.3 ms</strong> (−11.7%)</li>
<li><strong>p99 latency:</strong> 288.9 ms → <strong>250.0 ms</strong> (−13.4%)</li>
</ul>
<p><strong>Streaming Connection Pool Fix</strong></p>
<p>Fixed a connection leak with three compounding causes that led to TCP connection starvation under streaming workloads: the aiohttp transport wasn't closing connections, no <code>finally</code> blocks released connections on client disconnect, and a Uvicorn bug prevented disconnect signaling. <a href="https://github.com/BerriAI/litellm/pull/21213" target="_blank" rel="noopener noreferrer">PR #21213</a></p>
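<p>The <code>finally</code> part of this fix follows a general pattern worth knowing. A minimal stdlib sketch (not LiteLLM's actual transport code; <code>FakeConnection</code> stands in for a pooled HTTP connection):</p>

```python
import asyncio

class FakeConnection:
    """Stand-in for a pooled HTTP connection (illustrative only)."""
    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True

async def stream_chunks(conn, chunks):
    # The fix pattern: release the connection in a `finally` block so it
    # runs even when the client disconnects mid-stream (the consumer's
    # aclose() raises GeneratorExit inside this generator).
    try:
        for chunk in chunks:
            yield chunk
    finally:
        await conn.close()

async def simulate_client_disconnect():
    conn = FakeConnection()
    stream = stream_chunks(conn, ["data: 1", "data: 2", "data: 3"])
    first = await stream.__anext__()   # client reads one chunk...
    await stream.aclose()              # ...then disconnects
    return first, conn.closed
```

Without the <code>finally</code>, a mid-stream disconnect would leave the connection checked out of the pool forever, which is exactly how starvation builds up under load.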
<!-- -->
<p><strong>Redis Connection Pool Reliability</strong></p>
<p>Fixed 4 separate connection pool bugs to make our Redis usage more reliable. The most important fix stops pools from being leaked on cache expiry; the remaining fixes are detailed in <a href="https://github.com/BerriAI/litellm/pull/21717" target="_blank" rel="noopener noreferrer">PR #21717</a>.</p>
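<p>A minimal sketch of the bug class (not LiteLLM's actual code): a cache that evicts connection pools on TTL expiry must disconnect them before dropping the last reference, or their sockets leak:</p>

```python
import time

class FakePool:
    """Stand-in for a Redis connection pool (illustrative only)."""
    def __init__(self):
        self.disconnected = False

    def disconnect(self):
        self.disconnected = True

class PoolCache:
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._entries = {}  # key -> (pool, created_at)

    def get(self, key, factory, now=None):
        now = time.monotonic() if now is None else now
        entry = self._entries.get(key)
        if entry is not None:
            pool, created_at = entry
            if now - created_at < self.ttl:
                return pool
            pool.disconnect()        # close sockets BEFORE evicting the entry
            del self._entries[key]
        pool = factory()
        self._entries[key] = (pool, now)
        return pool
```

The leak happens when the <code>disconnect()</code> call is missing: the expired pool is garbage-collected eventually, but its TCP connections to Redis linger far longer than the cache entry did.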
<!-- -->
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-providers-and-endpoints">New Providers and Endpoints<a href="https://docs.litellm.ai/release_notes/v1-81-14#new-providers-and-endpoints" class="hash-link" aria-label="Direct link to New Providers and Endpoints" title="Direct link to New Providers and Endpoints">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="new-providers-1-new-provider">New Providers (1 new provider)<a href="https://docs.litellm.ai/release_notes/v1-81-14#new-providers-1-new-provider" class="hash-link" aria-label="Direct link to New Providers (1 new provider)" title="Direct link to New Providers (1 new provider)">​</a></h3>
<table><thead><tr><th>Provider</th><th>Supported LiteLLM Endpoints</th><th>Description</th></tr></thead><tbody><tr><td><a href="https://docs.litellm.ai/docs/providers/watsonx">IBM watsonx.ai</a></td><td><code>/rerank</code></td><td>Rerank support for IBM watsonx.ai models</td></tr></tbody></table>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="new-llm-api-endpoints-1-new-endpoint">New LLM API Endpoints (1 new endpoint)<a href="https://docs.litellm.ai/release_notes/v1-81-14#new-llm-api-endpoints-1-new-endpoint" class="hash-link" aria-label="Direct link to New LLM API Endpoints (1 new endpoint)" title="Direct link to New LLM API Endpoints (1 new endpoint)">​</a></h3>
<table><thead><tr><th>Endpoint</th><th>Method</th><th>Description</th><th>Documentation</th></tr></thead><tbody><tr><td><code>/v1/evals</code></td><td>POST/GET</td><td>OpenAI-compatible Evals API for model evaluation</td><td><a href="https://docs.litellm.ai/docs/evals_api">Docs</a></td></tr></tbody></table>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-models--updated-models">New Models / Updated Models<a href="https://docs.litellm.ai/release_notes/v1-81-14#new-models--updated-models" class="hash-link" aria-label="Direct link to New Models / Updated Models" title="Direct link to New Models / Updated Models">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="new-model-support-13-new-models">New Model Support (13 new models)<a href="https://docs.litellm.ai/release_notes/v1-81-14#new-model-support-13-new-models" class="hash-link" aria-label="Direct link to New Model Support (13 new models)" title="Direct link to New Model Support (13 new models)">​</a></h4>
<table><thead><tr><th>Provider</th><th>Model</th><th>Context Window</th><th>Input ($/1M tokens)</th><th>Output ($/1M tokens)</th><th>Features</th></tr></thead><tbody><tr><td>Anthropic</td><td><code>claude-sonnet-4-6</code></td><td>200K</td><td>$3.00</td><td>$15.00</td><td>Reasoning, computer use, prompt caching, vision, PDF</td></tr><tr><td>Vertex AI</td><td><code>vertex_ai/claude-opus-4-6@default</code></td><td>1M</td><td>$5.00</td><td>$25.00</td><td>Reasoning, computer use, prompt caching</td></tr><tr><td>Google Gemini</td><td><code>gemini/gemini-3.1-pro-preview</code></td><td>1M</td><td>$2.00</td><td>$12.00</td><td>Audio, video, images, PDF</td></tr><tr><td>Google Gemini</td><td><code>gemini/gemini-3.1-pro-preview-customtools</code></td><td>1M</td><td>$2.00</td><td>$12.00</td><td>Custom tools</td></tr><tr><td>GitHub Copilot</td><td><code>github_copilot/gpt-5.3-codex</code></td><td>128K</td><td>-</td><td>-</td><td>Responses API, function calling, vision</td></tr><tr><td>GitHub Copilot</td><td><code>github_copilot/claude-opus-4.6-fast</code></td><td>128K</td><td>-</td><td>-</td><td>Chat completions, function calling, vision</td></tr><tr><td>Mistral</td><td><code>mistral/devstral-small-latest</code></td><td>256K</td><td>$0.10</td><td>$0.30</td><td>Function calling, response schema</td></tr><tr><td>Mistral</td><td><code>mistral/devstral-latest</code></td><td>256K</td><td>$0.40</td><td>$2.00</td><td>Function calling, response schema</td></tr><tr><td>Mistral</td><td><code>mistral/devstral-medium-latest</code></td><td>256K</td><td>$0.40</td><td>$2.00</td><td>Function calling, response schema</td></tr><tr><td>OpenRouter</td><td><code>openrouter/minimax/minimax-m2.5</code></td><td>196K</td><td>$0.30</td><td>$1.10</td><td>Function calling, reasoning, prompt caching</td></tr><tr><td>Fireworks AI</td><td><code>fireworks_ai/accounts/fireworks/models/glm-4p7</code></td><td>-</td><td>-</td><td>-</td><td>Chat completions</td></tr><tr><td>Fireworks AI</td><td><code>fireworks_ai/accounts/fireworks/models/minimax-m2p1</code></td><td>-</td><td>-</td><td>-</td><td>Chat completions</td></tr><tr><td>Fireworks AI</td><td><code>fireworks_ai/accounts/fireworks/models/kimi-k2p5</code></td><td>-</td><td>-</td><td>-</td><td>Chat completions</td></tr></tbody></table>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features">Features<a href="https://docs.litellm.ai/release_notes/v1-81-14#features" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/anthropic">Anthropic</a></strong></p>
<ul>
<li>Day 0 support for Claude Sonnet 4.6 with reasoning, computer use, and 200K context - <a href="https://github.com/BerriAI/litellm/pull/21401" target="_blank" rel="noopener noreferrer">PR #21401</a></li>
<li>Add Claude Sonnet 4.6 pricing - <a href="https://github.com/BerriAI/litellm/pull/21395" target="_blank" rel="noopener noreferrer">PR #21395</a></li>
<li>Add day 0 feature support for Claude Sonnet 4.6 (streaming, function calling, vision) - <a href="https://github.com/BerriAI/litellm/pull/21448" target="_blank" rel="noopener noreferrer">PR #21448</a></li>
<li>Add <code>reasoning</code> effort and extended thinking support for Sonnet 4.6 - <a href="https://github.com/BerriAI/litellm/pull/21598" target="_blank" rel="noopener noreferrer">PR #21598</a></li>
<li>Fix empty system messages in <code>translate_system_message</code> - <a href="https://github.com/BerriAI/litellm/pull/21630" target="_blank" rel="noopener noreferrer">PR #21630</a></li>
<li>Sanitize Anthropic messages for multi-turn compatibility - <a href="https://github.com/BerriAI/litellm/pull/21464" target="_blank" rel="noopener noreferrer">PR #21464</a></li>
<li>Map <code>websearch</code> tool from <code>/v1/messages</code> to <code>/chat/completions</code> - <a href="https://github.com/BerriAI/litellm/pull/21465" target="_blank" rel="noopener noreferrer">PR #21465</a></li>
<li>Forward <code>reasoning</code> field as <code>reasoning_content</code> in delta streaming - <a href="https://github.com/BerriAI/litellm/pull/21468" target="_blank" rel="noopener noreferrer">PR #21468</a></li>
<li>Add server-side compaction translation from OpenAI to Anthropic format - <a href="https://github.com/BerriAI/litellm/pull/21555" target="_blank" rel="noopener noreferrer">PR #21555</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/bedrock">AWS Bedrock</a></strong></p>
<ul>
<li>Native structured outputs API support (<code>outputConfig.textFormat</code>) - <a href="https://github.com/BerriAI/litellm/pull/21222" target="_blank" rel="noopener noreferrer">PR #21222</a></li>
<li>Support <code>nova/</code> and <code>nova-2/</code> spec prefixes for custom imported models - <a href="https://github.com/BerriAI/litellm/pull/21359" target="_blank" rel="noopener noreferrer">PR #21359</a></li>
<li>Broaden Nova 2 model detection to support all <code>nova-2-*</code> variants - <a href="https://github.com/BerriAI/litellm/pull/21358" target="_blank" rel="noopener noreferrer">PR #21358</a></li>
<li>Clamp <code>thinking.budget_tokens</code> to minimum 1024 - <a href="https://github.com/BerriAI/litellm/pull/21306" target="_blank" rel="noopener noreferrer">PR #21306</a></li>
<li>Fix <code>parallel_tool_calls</code> mapping for Bedrock Converse - <a href="https://github.com/BerriAI/litellm/pull/21659" target="_blank" rel="noopener noreferrer">PR #21659</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/gemini">Google Gemini / Vertex AI</a></strong></p>
<ul>
<li>Day 0 support for <code>gemini-3.1-pro-preview</code> - <a href="https://github.com/BerriAI/litellm/pull/21568" target="_blank" rel="noopener noreferrer">PR #21568</a></li>
<li>Fix <code>_map_reasoning_effort_to_thinking_level</code> for all Gemini 3 family models - <a href="https://github.com/BerriAI/litellm/pull/21654" target="_blank" rel="noopener noreferrer">PR #21654</a></li>
<li>Add reasoning support via config for Gemini models - <a href="https://github.com/BerriAI/litellm/pull/21663" target="_blank" rel="noopener noreferrer">PR #21663</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/databricks">Databricks</a></strong></p>
<ul>
<li>Add Databricks to supported providers for response schema - <a href="https://github.com/BerriAI/litellm/pull/21368" target="_blank" rel="noopener noreferrer">PR #21368</a></li>
<li>Native Responses API support for Databricks GPT models - <a href="https://github.com/BerriAI/litellm/pull/21460" target="_blank" rel="noopener noreferrer">PR #21460</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/github_copilot">GitHub Copilot</a></strong></p>
<ul>
<li>Add <code>github_copilot/gpt-5.3-codex</code> and <code>github_copilot/claude-opus-4.6-fast</code> models - <a href="https://github.com/BerriAI/litellm/pull/21316" target="_blank" rel="noopener noreferrer">PR #21316</a></li>
<li>Fix unsupported params for ChatGPT Codex - <a href="https://github.com/BerriAI/litellm/pull/21209" target="_blank" rel="noopener noreferrer">PR #21209</a></li>
<li>Allow GitHub model aliases to reuse upstream model metadata - <a href="https://github.com/BerriAI/litellm/pull/21497" target="_blank" rel="noopener noreferrer">PR #21497</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/mistral">Mistral</a></strong></p>
<ul>
<li>Add <code>devstral-2512</code> model aliases (<code>devstral-small-latest</code>, <code>devstral-latest</code>, <code>devstral-medium-latest</code>) - <a href="https://github.com/BerriAI/litellm/pull/21372" target="_blank" rel="noopener noreferrer">PR #21372</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/watsonx">IBM watsonx.ai</a></strong></p>
<ul>
<li>Add native rerank support - <a href="https://github.com/BerriAI/litellm/pull/21303" target="_blank" rel="noopener noreferrer">PR #21303</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/xai">xAI</a></strong></p>
<ul>
<li>Fix usage object in xAI responses - <a href="https://github.com/BerriAI/litellm/pull/21559" target="_blank" rel="noopener noreferrer">PR #21559</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/dashscope">Dashscope</a></strong></p>
<ul>
<li>Remove list-to-str transformation that caused incorrect request formatting - <a href="https://github.com/BerriAI/litellm/pull/21547" target="_blank" rel="noopener noreferrer">PR #21547</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/vllm">hosted_vllm</a></strong></p>
<ul>
<li>Convert thinking blocks to content blocks for multi-turn conversations - <a href="https://github.com/BerriAI/litellm/pull/21557" target="_blank" rel="noopener noreferrer">PR #21557</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/oci_cohere">OCI / Oracle</a></strong></p>
<ul>
<li>Fix Grok output pricing - <a href="https://github.com/BerriAI/litellm/pull/21329" target="_blank" rel="noopener noreferrer">PR #21329</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/anthropic">AU Anthropic</a></strong></p>
<ul>
<li>Fix <code>au.anthropic.claude-opus-4-6-v1</code> model ID - <a href="https://github.com/BerriAI/litellm/pull/20731" target="_blank" rel="noopener noreferrer">PR #20731</a></li>
</ul>
</li>
<li>
<p><strong>General</strong></p>
<ul>
<li>Add routing based on reasoning support — skip deployments that don't support reasoning when <code>thinking</code> params are present - <a href="https://github.com/BerriAI/litellm/pull/21302" target="_blank" rel="noopener noreferrer">PR #21302</a></li>
<li>Add <code>stop</code> as supported param for OpenAI and Azure - <a href="https://github.com/BerriAI/litellm/pull/21539" target="_blank" rel="noopener noreferrer">PR #21539</a></li>
<li>Add <code>store</code> and other missing params to <code>OPENAI_CHAT_COMPLETION_PARAMS</code> - <a href="https://github.com/BerriAI/litellm/pull/21195" target="_blank" rel="noopener noreferrer">PR #21195</a>, <a href="https://github.com/BerriAI/litellm/pull/21360" target="_blank" rel="noopener noreferrer">PR #21360</a></li>
<li>Preserve <code>provider_specific_fields</code> from proxy responses - <a href="https://github.com/BerriAI/litellm/pull/21220" target="_blank" rel="noopener noreferrer">PR #21220</a></li>
<li>Add default usage data configuration - <a href="https://github.com/BerriAI/litellm/pull/21550" target="_blank" rel="noopener noreferrer">PR #21550</a></li>
</ul>
</li>
</ul>
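<p>To route to any of the new models above through the proxy, add an entry to your <code>model_list</code>. For example, for Claude Sonnet 4.6 (the alias is your choice):</p>

```yaml
model_list:
  - model_name: claude-sonnet-4-6          # alias callers will use
    litellm_params:
      model: anthropic/claude-sonnet-4-6
      api_key: os.environ/ANTHROPIC_API_KEY
```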
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="bug-fixes">Bug Fixes<a href="https://docs.litellm.ai/release_notes/v1-81-14#bug-fixes" class="hash-link" aria-label="Direct link to Bug Fixes" title="Direct link to Bug Fixes">​</a></h3>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/bedrock">AWS Bedrock</a></strong></p>
<ul>
<li>Fix service_tier cost propagation - <a href="https://github.com/BerriAI/litellm/pull/21172" target="_blank" rel="noopener noreferrer">PR #21172</a></li>
<li>Fix per-image pricing for multimodal embeddings - <a href="https://github.com/BerriAI/litellm/pull/21646" target="_blank" rel="noopener noreferrer">PR #21646</a></li>
<li>Use <code>batch_</code> prefix for Vertex AI batch IDs in <code>encode_file_id_with_model</code> - <a href="https://github.com/BerriAI/litellm/pull/21624" target="_blank" rel="noopener noreferrer">PR #21624</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/bedrock">Bedrock Converse</a></strong></p>
<ul>
<li>Fix Anthropic usage object to match v1/messages spec - <a href="https://github.com/BerriAI/litellm/pull/21295" target="_blank" rel="noopener noreferrer">PR #21295</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/fireworks_ai">Fireworks AI</a></strong></p>
<ul>
<li>Add missing model pricing for <code>glm-4p7</code>, <code>minimax-m2p1</code>, <code>kimi-k2p5</code> - <a href="https://github.com/BerriAI/litellm/pull/21642" target="_blank" rel="noopener noreferrer">PR #21642</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/response_api">Responses API</a></strong></p>
<ul>
<li>Use <code>None</code> instead of <code>Reasoning()</code> as the default for the reasoning parameter - <a href="https://github.com/BerriAI/litellm/pull/21103" target="_blank" rel="noopener noreferrer">PR #21103</a></li>
<li>Preserve metadata for custom callbacks on codex/responses path - <a href="https://github.com/BerriAI/litellm/pull/21243" target="_blank" rel="noopener noreferrer">PR #21243</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="llm-api-endpoints">LLM API Endpoints<a href="https://docs.litellm.ai/release_notes/v1-81-14#llm-api-endpoints" class="hash-link" aria-label="Direct link to LLM API Endpoints" title="Direct link to LLM API Endpoints">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-1">Features<a href="https://docs.litellm.ai/release_notes/v1-81-14#features-1" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/response_api">Responses API</a></strong></p>
<ul>
<li>Return <code>finish_reason='tool_calls'</code> when response contains function_call items - <a href="https://github.com/BerriAI/litellm/pull/19745" target="_blank" rel="noopener noreferrer">PR #19745</a></li>
<li>Eliminate per-chunk thread spawning in async streaming path for significantly better throughput - <a href="https://github.com/BerriAI/litellm/pull/21709" target="_blank" rel="noopener noreferrer">PR #21709</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/evals_api">Evals API</a></strong></p>
<ul>
<li>Add support for OpenAI Evals API - <a href="https://github.com/BerriAI/litellm/pull/21375" target="_blank" rel="noopener noreferrer">PR #21375</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/batches">Batch API</a></strong></p>
<ul>
<li>Add file deletion criteria with batch references - <a href="https://github.com/BerriAI/litellm/pull/21456" target="_blank" rel="noopener noreferrer">PR #21456</a></li>
<li>Misc bug fixes for managed batches - <a href="https://github.com/BerriAI/litellm/pull/21157" target="_blank" rel="noopener noreferrer">PR #21157</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/pass_through/bedrock">Pass-Through Endpoints</a></strong></p>
<ul>
<li>Add method-based routing for passthrough endpoints - <a href="https://github.com/BerriAI/litellm/pull/21543" target="_blank" rel="noopener noreferrer">PR #21543</a></li>
<li>Preserve and forward OAuth Authorization headers through proxy layer - <a href="https://github.com/BerriAI/litellm/pull/19912" target="_blank" rel="noopener noreferrer">PR #19912</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/completion/input">Websearch / Tool Calling</a></strong></p>
<ul>
<li>Add DuckDuckGo as a search tool - <a href="https://github.com/BerriAI/litellm/pull/21467" target="_blank" rel="noopener noreferrer">PR #21467</a></li>
<li>Fix <code>pre_call_deployment_hook</code> not triggering via proxy router for websearch - <a href="https://github.com/BerriAI/litellm/pull/21433" target="_blank" rel="noopener noreferrer">PR #21433</a></li>
</ul>
</li>
<li>
<p><strong>General</strong></p>
<ul>
<li>Exclude tool params for models without function calling support - <a href="https://github.com/BerriAI/litellm/pull/21244" target="_blank" rel="noopener noreferrer">PR #21244</a></li>
<li>Add <code>store</code> param to OpenAI chat completion params - <a href="https://github.com/BerriAI/litellm/pull/21195" target="_blank" rel="noopener noreferrer">PR #21195</a></li>
<li>Add reasoning support via config for per-model reasoning configuration - <a href="https://github.com/BerriAI/litellm/pull/21663" target="_blank" rel="noopener noreferrer">PR #21663</a></li>
</ul>
</li>
</ul>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="bugs">Bugs<a href="https://docs.litellm.ai/release_notes/v1-81-14#bugs" class="hash-link" aria-label="Direct link to Bugs" title="Direct link to Bugs">​</a></h4>
<ul>
<li><strong>General</strong>
<ul>
<li>Fix <code>api_base</code> resolution error for models with multiple potential endpoints - <a href="https://github.com/BerriAI/litellm/pull/21658" target="_blank" rel="noopener noreferrer">PR #21658</a></li>
<li>Fix session grouping broken for dict rows from <code>query_raw</code> - <a href="https://github.com/BerriAI/litellm/pull/21435" target="_blank" rel="noopener noreferrer">PR #21435</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="management-endpoints--ui">Management Endpoints / UI<a href="https://docs.litellm.ai/release_notes/v1-81-14#management-endpoints--ui" class="hash-link" aria-label="Direct link to Management Endpoints / UI" title="Direct link to Management Endpoints / UI">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-2">Features<a href="https://docs.litellm.ai/release_notes/v1-81-14#features-2" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong>Access Groups</strong></p>
<ul>
<li>Add Access Group Selector to Create and Edit flow for Keys/Teams - <a href="https://github.com/BerriAI/litellm/pull/21234" target="_blank" rel="noopener noreferrer">PR #21234</a></li>
</ul>
</li>
<li>
<p><strong>Virtual Keys</strong></p>
<ul>
<li>Fix virtual key grace period from env/UI - <a href="https://github.com/BerriAI/litellm/pull/20321" target="_blank" rel="noopener noreferrer">PR #20321</a></li>
<li>Fix key expiry default duration - <a href="https://github.com/BerriAI/litellm/pull/21362" target="_blank" rel="noopener noreferrer">PR #21362</a></li>
<li>Key Last Active Tracking — see when a key was last used - <a href="https://github.com/BerriAI/litellm/pull/21545" target="_blank" rel="noopener noreferrer">PR #21545</a></li>
<li>Fix <code>/v1/models</code> returning wildcard instead of expanded models for BYOK team keys - <a href="https://github.com/BerriAI/litellm/pull/21408" target="_blank" rel="noopener noreferrer">PR #21408</a></li>
<li>Return <code>failed_tokens</code> in delete_verification_tokens response - <a href="https://github.com/BerriAI/litellm/pull/21609" target="_blank" rel="noopener noreferrer">PR #21609</a></li>
</ul>
</li>
<li>
<p><strong>Models + Endpoints</strong></p>
<ul>
<li>Add Model Settings Modal to Models &amp; Endpoints page - <a href="https://github.com/BerriAI/litellm/pull/21516" target="_blank" rel="noopener noreferrer">PR #21516</a></li>
<li>Allow <code>store_model_in_db</code> to be set via database (not just config) - <a href="https://github.com/BerriAI/litellm/pull/21511" target="_blank" rel="noopener noreferrer">PR #21511</a></li>
<li>Fix <code>input_cost_per_token</code> masked/hidden in Model Info UI - <a href="https://github.com/BerriAI/litellm/pull/21723" target="_blank" rel="noopener noreferrer">PR #21723</a></li>
<li>Resolve credentials for UI-created models, including batch file uploads - <a href="https://github.com/BerriAI/litellm/pull/21502" target="_blank" rel="noopener noreferrer">PR #21502</a></li>
</ul>
</li>
<li>
<p><strong>Teams</strong></p>
<ul>
<li>Allow team members to view entire team usage - <a href="https://github.com/BerriAI/litellm/pull/21537" target="_blank" rel="noopener noreferrer">PR #21537</a></li>
<li>Fix service account visibility for team members - <a href="https://github.com/BerriAI/litellm/pull/21627" target="_blank" rel="noopener noreferrer">PR #21627</a></li>
<li>Organization Info page: show member email, AntD tabs, reusable MemberTable - <a href="https://github.com/BerriAI/litellm/pull/21745" target="_blank" rel="noopener noreferrer">PR #21745</a></li>
</ul>
</li>
<li>
<p><strong>Usage / Spend Logs</strong></p>
<ul>
<li>Allow filtering Usage by User - <a href="https://github.com/BerriAI/litellm/pull/21351" target="_blank" rel="noopener noreferrer">PR #21351</a></li>
<li>Inject Credential Name as Tag for Usage Page filtering - <a href="https://github.com/BerriAI/litellm/pull/21715" target="_blank" rel="noopener noreferrer">PR #21715</a></li>
<li>Prefix credential tags and update Tag usage banner - <a href="https://github.com/BerriAI/litellm/pull/21739" target="_blank" rel="noopener noreferrer">PR #21739</a></li>
<li>Show retry count for requests in Logs view - <a href="https://github.com/BerriAI/litellm/pull/21704" target="_blank" rel="noopener noreferrer">PR #21704</a></li>
<li>Fix Aggregated Daily Activity Endpoint performance - <a href="https://github.com/BerriAI/litellm/pull/21613" target="_blank" rel="noopener noreferrer">PR #21613</a></li>
</ul>
</li>
<li>
<p><strong>SSO / Auth</strong></p>
<ul>
<li>Fix SSO PKCE support in multi-pod Kubernetes deployments - <a href="https://github.com/BerriAI/litellm/pull/20314" target="_blank" rel="noopener noreferrer">PR #20314</a></li>
<li>Preserve SSO role regardless of <code>role_mappings</code> config - <a href="https://github.com/BerriAI/litellm/pull/21503" target="_blank" rel="noopener noreferrer">PR #21503</a></li>
</ul>
</li>
<li>
<p><strong>Proxy CLI / Master Key</strong></p>
<ul>
<li>Fix master key rotation Prisma validation errors - <a href="https://github.com/BerriAI/litellm/pull/21330" target="_blank" rel="noopener noreferrer">PR #21330</a></li>
<li>Handle missing <code>DATABASE_URL</code> in <code>append_query_params</code> - <a href="https://github.com/BerriAI/litellm/pull/21239" target="_blank" rel="noopener noreferrer">PR #21239</a></li>
</ul>
</li>
<li>
<p><strong>Project Management</strong></p>
<ul>
<li>Add Project Management APIs for organizing resources - <a href="https://github.com/BerriAI/litellm/pull/21078" target="_blank" rel="noopener noreferrer">PR #21078</a></li>
</ul>
</li>
<li>
<p><strong>UI Improvements</strong></p>
<ul>
<li>Content Filters: view and edit categories, one-click add, and pagination - <a href="https://github.com/BerriAI/litellm/pull/21223" target="_blank" rel="noopener noreferrer">PR #21223</a></li>
<li>Playground: test fallbacks with UI - <a href="https://github.com/BerriAI/litellm/pull/21007" target="_blank" rel="noopener noreferrer">PR #21007</a></li>
<li>Add <code>forward_client_headers_to_llm_api</code> toggle to general settings - <a href="https://github.com/BerriAI/litellm/pull/21776" target="_blank" rel="noopener noreferrer">PR #21776</a></li>
<li>Fix <code>is_premium()</code> debug log spam on every request - <a href="https://github.com/BerriAI/litellm/pull/20841" target="_blank" rel="noopener noreferrer">PR #20841</a></li>
</ul>
</li>
</ul>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="bugs-1">Bugs<a href="https://docs.litellm.ai/release_notes/v1-81-14#bugs-1" class="hash-link" aria-label="Direct link to Bugs" title="Direct link to Bugs">​</a></h4>
<ul>
<li>Spend Logs: Fix cost calculation - <a href="https://github.com/BerriAI/litellm/pull/21152" target="_blank" rel="noopener noreferrer">PR #21152</a></li>
<li>Logs: Fix table not updating and pagination issues - <a href="https://github.com/BerriAI/litellm/pull/21708" target="_blank" rel="noopener noreferrer">PR #21708</a></li>
<li>Fix <code>/get_image</code> ignoring <code>UI_LOGO_PATH</code> when <code>cached_logo.jpg</code> exists - <a href="https://github.com/BerriAI/litellm/pull/21637" target="_blank" rel="noopener noreferrer">PR #21637</a></li>
<li>Fix duplicate URL in <code>tagsSpendLogsCall</code> query string - <a href="https://github.com/BerriAI/litellm/pull/20909" target="_blank" rel="noopener noreferrer">PR #20909</a></li>
<li>Preserve <code>key_alias</code> and <code>team_id</code> metadata in <code>/user/daily/activity/aggregated</code> after key deletion or regeneration - <a href="https://github.com/BerriAI/litellm/pull/20684" target="_blank" rel="noopener noreferrer">PR #20684</a></li>
<li>Uncomment <code>response_model</code> in <code>user_info</code> endpoint - <a href="https://github.com/BerriAI/litellm/pull/17430" target="_blank" rel="noopener noreferrer">PR #17430</a></li>
<li>Allow <code>internal_user_viewer</code> to access RAG endpoints; restrict ingest to existing vector stores - <a href="https://github.com/BerriAI/litellm/pull/21508" target="_blank" rel="noopener noreferrer">PR #21508</a></li>
<li>Suppress warning for <code>litellm-dashboard</code> team in agent permission handler - <a href="https://github.com/BerriAI/litellm/pull/21721" target="_blank" rel="noopener noreferrer">PR #21721</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="ai-integrations">AI Integrations<a href="https://docs.litellm.ai/release_notes/v1-81-14#ai-integrations" class="hash-link" aria-label="Direct link to AI Integrations" title="Direct link to AI Integrations">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="logging">Logging<a href="https://docs.litellm.ai/release_notes/v1-81-14#logging" class="hash-link" aria-label="Direct link to Logging" title="Direct link to Logging">​</a></h3>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/logging#datadog">DataDog</a></strong></p>
<ul>
<li>Add <code>team</code> tag to logs, metrics, and cost management - <a href="https://github.com/BerriAI/litellm/pull/21449" target="_blank" rel="noopener noreferrer">PR #21449</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/logging#prometheus">Prometheus</a></strong></p>
<ul>
<li>Fix double-counting of <code>litellm_proxy_total_requests_metric</code> - <a href="https://github.com/BerriAI/litellm/pull/21159" target="_blank" rel="noopener noreferrer">PR #21159</a></li>
<li>Guard against None metadata in Prometheus metrics - <a href="https://github.com/BerriAI/litellm/pull/21489" target="_blank" rel="noopener noreferrer">PR #21489</a></li>
<li>Add ASGI middleware for improved Prometheus metrics collection - <a href="https://github.com/BerriAI/litellm/pull/20434" target="_blank" rel="noopener noreferrer">PR #20434</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/logging#langfuse">Langfuse</a></strong></p>
<ul>
<li>Improve Langfuse test isolation (multiple stability fixes) - <a href="https://github.com/BerriAI/litellm/pull/21214" target="_blank" rel="noopener noreferrer">PR #21214</a></li>
</ul>
</li>
<li>
<p><strong>General</strong></p>
<ul>
<li>Log $0 cost for cached responses instead of re-billing - <a href="https://github.com/BerriAI/litellm/pull/21816" target="_blank" rel="noopener noreferrer">PR #21816</a></li>
<li>Improve streaming proxy throughput by fixing middleware and logging bottlenecks - <a href="https://github.com/BerriAI/litellm/pull/21501" target="_blank" rel="noopener noreferrer">PR #21501</a></li>
<li>Reduce proxy overhead for large base64 payloads - <a href="https://github.com/BerriAI/litellm/pull/21594" target="_blank" rel="noopener noreferrer">PR #21594</a></li>
<li>Close streaming connections to prevent connection pool exhaustion - <a href="https://github.com/BerriAI/litellm/pull/21213" target="_blank" rel="noopener noreferrer">PR #21213</a></li>
</ul>
</li>
</ul>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="guardrails">Guardrails<a href="https://docs.litellm.ai/release_notes/v1-81-14#guardrails" class="hash-link" aria-label="Direct link to Guardrails" title="Direct link to Guardrails">​</a></h3>
<ul>
<li>
<p><strong>Guardrail Garden</strong></p>
<ul>
<li>Launch Guardrail Garden — a marketplace for pre-built guardrails deployable in one click - <a href="https://github.com/BerriAI/litellm/pull/21732" target="_blank" rel="noopener noreferrer">PR #21732</a></li>
<li>Redesign guardrail creation form with vertical stepper UI - <a href="https://github.com/BerriAI/litellm/pull/21727" target="_blank" rel="noopener noreferrer">PR #21727</a></li>
<li>Add guardrail jump link in log detail view - <a href="https://github.com/BerriAI/litellm/pull/21437" target="_blank" rel="noopener noreferrer">PR #21437</a></li>
<li>Guardrail tracing UI: show policy, detection method, and match details - <a href="https://github.com/BerriAI/litellm/pull/21349" target="_blank" rel="noopener noreferrer">PR #21349</a></li>
</ul>
</li>
<li>
<p><strong>AI Policy Templates</strong></p>
<ul>
<li>Seven new ready-to-deploy policy templates ship in this release:<!-- -->
<ul>
<li>GDPR Art. 32 EU PII Protection - <a href="https://github.com/BerriAI/litellm/pull/21340" target="_blank" rel="noopener noreferrer">PR #21340</a></li>
<li>EU AI Act Article 5 (5 sub-guardrails, with French language support) - <a href="https://github.com/BerriAI/litellm/pull/21342" target="_blank" rel="noopener noreferrer">PR #21342</a>, <a href="https://github.com/BerriAI/litellm/pull/21453" target="_blank" rel="noopener noreferrer">PR #21453</a>, <a href="https://github.com/BerriAI/litellm/pull/21427" target="_blank" rel="noopener noreferrer">PR #21427</a></li>
<li>Prompt injection detection - <a href="https://github.com/BerriAI/litellm/pull/21520" target="_blank" rel="noopener noreferrer">PR #21520</a></li>
<li>Aviation and UAE topic filters with tag-based routing - <a href="https://github.com/BerriAI/litellm/pull/21518" target="_blank" rel="noopener noreferrer">PR #21518</a></li>
<li>Airline off-topic restriction - <a href="https://github.com/BerriAI/litellm/pull/21607" target="_blank" rel="noopener noreferrer">PR #21607</a></li>
<li>SQL injection detection - <a href="https://github.com/BerriAI/litellm/pull/21806" target="_blank" rel="noopener noreferrer">PR #21806</a></li>
</ul>
</li>
<li>AI-powered policy template suggestions with latency overhead estimates - <a href="https://github.com/BerriAI/litellm/pull/21589" target="_blank" rel="noopener noreferrer">PR #21589</a>, <a href="https://github.com/BerriAI/litellm/pull/21608" target="_blank" rel="noopener noreferrer">PR #21608</a>, <a href="https://github.com/BerriAI/litellm/pull/21620" target="_blank" rel="noopener noreferrer">PR #21620</a></li>
</ul>
</li>
<li>
<p><strong>Compliance Checker</strong></p>
<ul>
<li>Add compliance checker endpoints + UI panel - <a href="https://github.com/BerriAI/litellm/pull/21432" target="_blank" rel="noopener noreferrer">PR #21432</a></li>
<li>CSV dataset upload to compliance playground for batch testing - <a href="https://github.com/BerriAI/litellm/pull/21526" target="_blank" rel="noopener noreferrer">PR #21526</a></li>
</ul>
</li>
<li>
<p><strong>Built-in Guardrails</strong></p>
<ul>
<li>Competitor name blocker: blocks by name, handles streaming, supports name variations, and splits pre/post call - <a href="https://github.com/BerriAI/litellm/pull/21719" target="_blank" rel="noopener noreferrer">PR #21719</a>, <a href="https://github.com/BerriAI/litellm/pull/21533" target="_blank" rel="noopener noreferrer">PR #21533</a></li>
<li>Topic blocker with both keyword and embedding-based implementations - <a href="https://github.com/BerriAI/litellm/pull/21713" target="_blank" rel="noopener noreferrer">PR #21713</a></li>
<li>Insults content filter - <a href="https://github.com/BerriAI/litellm/pull/21729" target="_blank" rel="noopener noreferrer">PR #21729</a></li>
<li>MCP Security guardrail to block unregistered MCP servers - <a href="https://github.com/BerriAI/litellm/pull/21429" target="_blank" rel="noopener noreferrer">PR #21429</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails">Generic Guardrails</a></strong></p>
<ul>
<li>Add configurable fallback to handle generic guardrail endpoint connection failures - <a href="https://github.com/BerriAI/litellm/pull/21245" target="_blank" rel="noopener noreferrer">PR #21245</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails">Presidio</a></strong></p>
<ul>
<li>Fix Presidio controls configuration - <a href="https://github.com/BerriAI/litellm/pull/21798" target="_blank" rel="noopener noreferrer">PR #21798</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails">LakeraAI</a></strong></p>
<ul>
<li>Avoid <code>KeyError</code> on missing <code>LAKERA_API_KEY</code> during initialization - <a href="https://github.com/BerriAI/litellm/pull/21422" target="_blank" rel="noopener noreferrer">PR #21422</a></li>
</ul>
</li>
</ul>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="auto-routing">Auto Routing<a href="https://docs.litellm.ai/release_notes/v1-81-14#auto-routing" class="hash-link" aria-label="Direct link to Auto Routing" title="Direct link to Auto Routing">​</a></h3>
<ul>
<li><strong>Complexity-based auto routing</strong> — new router strategy that scores requests across 7 dimensions (token count, code presence, reasoning markers, technical terms, etc.) and routes to the appropriate model tier — no embeddings or API calls required - <a href="https://github.com/BerriAI/litellm/pull/21789" target="_blank" rel="noopener noreferrer">PR #21789</a>, <a href="https://docs.litellm.ai/docs/proxy/auto_routing">Docs</a></li>
</ul>
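<p>The idea behind the complexity scorer above can be sketched with cheap lexical heuristics. This is an illustration of the approach described in the bullet, not LiteLLM's implementation — the marker lists, weights, threshold, and tier names here are all hypothetical:</p>

```python
import re

# Hypothetical heuristic scorer illustrating complexity-based routing:
# cheap lexical signals only, no embeddings or extra API calls.
REASONING_MARKERS = ("step by step", "prove", "derive", "why")
TECH_TERMS = ("kubernetes", "mutex", "big-o", "sql", "regression")

def complexity_score(prompt: str) -> float:
    score = 0.0
    score += min(len(prompt.split()) / 500, 1.0)        # token-count proxy
    if re.search(r"```|def |class |#include", prompt):  # code presence
        score += 1.0
    text = prompt.lower()
    score += sum(m in text for m in REASONING_MARKERS) * 0.5
    score += sum(t in text for t in TECH_TERMS) * 0.25
    return score

def pick_tier(prompt: str, threshold: float = 1.0) -> str:
    # Route high-complexity prompts to a larger model tier.
    return "large-model" if complexity_score(prompt) >= threshold else "small-model"
```

<p>Because every signal is a string scan, the score is computed in microseconds on the request path — which is the point of this strategy versus embedding-based routers.</p>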
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="prompt-management">Prompt Management<a href="https://docs.litellm.ai/release_notes/v1-81-14#prompt-management" class="hash-link" aria-label="Direct link to Prompt Management" title="Direct link to Prompt Management">​</a></h3>
<ul>
<li><strong>Prompt Management API</strong>
<ul>
<li>New API to interact with prompt management integrations without requiring a PR - <a href="https://github.com/BerriAI/litellm/pull/17800" target="_blank" rel="noopener noreferrer">PR #17800</a>, <a href="https://github.com/BerriAI/litellm/pull/17946" target="_blank" rel="noopener noreferrer">PR #17946</a></li>
<li>Fix prompt registry configuration issues - <a href="https://github.com/BerriAI/litellm/pull/21402" target="_blank" rel="noopener noreferrer">PR #21402</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="spend-tracking-budgets-and-rate-limiting">Spend Tracking, Budgets and Rate Limiting<a href="https://docs.litellm.ai/release_notes/v1-81-14#spend-tracking-budgets-and-rate-limiting" class="hash-link" aria-label="Direct link to Spend Tracking, Budgets and Rate Limiting" title="Direct link to Spend Tracking, Budgets and Rate Limiting">​</a></h2>
<ul>
<li><strong>Fix Bedrock service_tier cost propagation</strong> — costs from service-tier responses now correctly flow through to spend tracking - <a href="https://github.com/BerriAI/litellm/pull/21172" target="_blank" rel="noopener noreferrer">PR #21172</a></li>
<li><strong>Fix cost for cached responses</strong> — cached responses now correctly log $0 cost instead of re-billing - <a href="https://github.com/BerriAI/litellm/pull/21816" target="_blank" rel="noopener noreferrer">PR #21816</a></li>
<li><strong>Aggregated daily activity endpoint performance</strong> — faster queries for <code>/user/daily/activity/aggregated</code> - <a href="https://github.com/BerriAI/litellm/pull/21613" target="_blank" rel="noopener noreferrer">PR #21613</a></li>
<li><strong>Preserve key_alias and team_id metadata</strong> in <code>/user/daily/activity/aggregated</code> after key deletion or regeneration - <a href="https://github.com/BerriAI/litellm/pull/20684" target="_blank" rel="noopener noreferrer">PR #20684</a></li>
<li><strong>Inject Credential Name as Tag</strong> for granular usage page filtering by credential - <a href="https://github.com/BerriAI/litellm/pull/21715" target="_blank" rel="noopener noreferrer">PR #21715</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="mcp-gateway">MCP Gateway<a href="https://docs.litellm.ai/release_notes/v1-81-14#mcp-gateway" class="hash-link" aria-label="Direct link to MCP Gateway" title="Direct link to MCP Gateway">​</a></h2>
<ul>
<li><strong>OpenAPI-to-MCP</strong> — Convert any OpenAPI spec to an MCP server via API or UI - <a href="https://github.com/BerriAI/litellm/pull/21575" target="_blank" rel="noopener noreferrer">PR #21575</a>, <a href="https://github.com/BerriAI/litellm/pull/21662" target="_blank" rel="noopener noreferrer">PR #21662</a></li>
<li><strong>MCP User Permissions</strong> — Fine-grained permissions for end users on MCP servers - <a href="https://github.com/BerriAI/litellm/pull/21462" target="_blank" rel="noopener noreferrer">PR #21462</a></li>
<li><strong>MCP Security Guardrail</strong> — Block calls to unregistered MCP servers - <a href="https://github.com/BerriAI/litellm/pull/21429" target="_blank" rel="noopener noreferrer">PR #21429</a></li>
<li><strong>Fix StreamableHTTPSessionManager</strong> — Revert to stateless mode to prevent session state issues - <a href="https://github.com/BerriAI/litellm/pull/21323" target="_blank" rel="noopener noreferrer">PR #21323</a></li>
<li><strong>Fix Bedrock AgentCore Accept header</strong> — Add required Accept header for AgentCore MCP server requests - <a href="https://github.com/BerriAI/litellm/pull/21551" target="_blank" rel="noopener noreferrer">PR #21551</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="performance--loadbalancing--reliability-improvements">Performance / Loadbalancing / Reliability improvements<a href="https://docs.litellm.ai/release_notes/v1-81-14#performance--loadbalancing--reliability-improvements" class="hash-link" aria-label="Direct link to Performance / Loadbalancing / Reliability improvements" title="Direct link to Performance / Loadbalancing / Reliability improvements">​</a></h2>
<p><strong>Logging &amp; callback overhead</strong></p>
<ul>
<li>Move async/sync callback separation from per-request to callback registration time — ~30% speedup for callback-heavy deployments - <a href="https://github.com/BerriAI/litellm/pull/20354" target="_blank" rel="noopener noreferrer">PR #20354</a></li>
<li>Skip Pydantic Usage round-trip in logging payload — reduces serialization overhead per request - <a href="https://github.com/BerriAI/litellm/pull/21003" target="_blank" rel="noopener noreferrer">PR #21003</a></li>
<li>Skip duplicate <code>get_standard_logging_object_payload</code> calls for non-streaming requests - <a href="https://github.com/BerriAI/litellm/pull/20440" target="_blank" rel="noopener noreferrer">PR #20440</a></li>
<li>Reuse <code>LiteLLM_Params</code> object across the request lifecycle - <a href="https://github.com/BerriAI/litellm/pull/20593" target="_blank" rel="noopener noreferrer">PR #20593</a></li>
<li>Optimize <code>add_litellm_data_to_request</code> hot path - <a href="https://github.com/BerriAI/litellm/pull/20526" target="_blank" rel="noopener noreferrer">PR #20526</a></li>
<li>Optimize <code>model_dump_with_preserved_fields</code> - <a href="https://github.com/BerriAI/litellm/pull/20882" target="_blank" rel="noopener noreferrer">PR #20882</a></li>
<li>Pre-compute OpenAI client init params at module load instead of per-request - <a href="https://github.com/BerriAI/litellm/pull/20789" target="_blank" rel="noopener noreferrer">PR #20789</a></li>
<li>Reduce proxy overhead for large base64 payloads - <a href="https://github.com/BerriAI/litellm/pull/21594" target="_blank" rel="noopener noreferrer">PR #21594</a></li>
<li>Improve streaming proxy throughput by fixing middleware and logging bottlenecks - <a href="https://github.com/BerriAI/litellm/pull/21501" target="_blank" rel="noopener noreferrer">PR #21501</a></li>
<li>Eliminate per-chunk thread spawning in Responses API async streaming - <a href="https://github.com/BerriAI/litellm/pull/21709" target="_blank" rel="noopener noreferrer">PR #21709</a></li>
</ul>
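<p>The first optimization above moves work out of the hot path: instead of inspecting each callback's type on every request, callbacks are partitioned into sync/async buckets once at registration. A minimal sketch of the pattern (class and method names are illustrative, not LiteLLM's internals):</p>

```python
import asyncio
import inspect

class CallbackManager:
    """Partition callbacks at registration time so the per-request
    hot path never re-inspects callback types."""

    def __init__(self):
        self.sync_callbacks = []
        self.async_callbacks = []

    def register(self, cb):
        # One-time type inspection here, instead of once per request.
        if inspect.iscoroutinefunction(cb):
            self.async_callbacks.append(cb)
        else:
            self.sync_callbacks.append(cb)

    async def fire(self, payload):
        for cb in self.sync_callbacks:       # no type checks in the hot path
            cb(payload)
        if self.async_callbacks:
            await asyncio.gather(*(cb(payload) for cb in self.async_callbacks))
```

<p>With many registered callbacks, removing the per-request <code>iscoroutinefunction</code> checks is exactly the kind of change that yields the ~30% speedup cited for callback-heavy deployments.</p>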
<p><strong>Cost calculation</strong></p>
<ul>
<li>Optimize <code>completion_cost()</code> with early-exit and caching - <a href="https://github.com/BerriAI/litellm/pull/20448" target="_blank" rel="noopener noreferrer">PR #20448</a></li>
<li>Cost calculator: reduce repeated lookups and dict copies - <a href="https://github.com/BerriAI/litellm/pull/20541" target="_blank" rel="noopener noreferrer">PR #20541</a></li>
</ul>
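<p>The two cost-calculation optimizations — an early exit and a cached lookup — can be sketched as follows. The pricing table, rates, and function signature here are made up for illustration; LiteLLM's real <code>completion_cost()</code> takes a response object and consults its model-cost map:</p>

```python
from functools import lru_cache

# Hypothetical pricing table, in dollars per 1M tokens (input, output).
_PRICES = {"small-model": (0.5, 1.5), "large-model": (3.0, 15.0)}

@lru_cache(maxsize=256)
def _model_rates(model: str):
    # Cached lookup: repeated calls for the same model skip the
    # dict traversal and any per-call copying.
    return _PRICES.get(model, (0.0, 0.0))

def completion_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    if prompt_tokens == 0 and completion_tokens == 0:
        return 0.0                        # early exit: nothing to bill
    in_rate, out_rate = _model_rates(model)
    return (prompt_tokens * in_rate + completion_tokens * out_rate) / 1_000_000
```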
<p><strong>Router &amp; load balancing</strong></p>
<ul>
<li>Remove quadratic deployment scan in usage-based routing v2 - <a href="https://github.com/BerriAI/litellm/pull/21211" target="_blank" rel="noopener noreferrer">PR #21211</a></li>
<li>Avoid O(n²) membership scans in team deployment filter - <a href="https://github.com/BerriAI/litellm/pull/21210" target="_blank" rel="noopener noreferrer">PR #21210</a></li>
<li>Avoid O(n) alias scan for non-alias <code>get_model_list</code> lookups - <a href="https://github.com/BerriAI/litellm/pull/21136" target="_blank" rel="noopener noreferrer">PR #21136</a></li>
<li>Increase default LRU cache size to reduce multi-model cache thrash - <a href="https://github.com/BerriAI/litellm/pull/21139" target="_blank" rel="noopener noreferrer">PR #21139</a></li>
<li>Cache <code>get_model_access_groups()</code> no-args result on Router - <a href="https://github.com/BerriAI/litellm/pull/20374" target="_blank" rel="noopener noreferrer">PR #20374</a></li>
<li>Deployment affinity routing callback — route to the same deployment for a session - <a href="https://github.com/BerriAI/litellm/pull/19143" target="_blank" rel="noopener noreferrer">PR #19143</a></li>
<li>Session-ID-based routing — use <code>session_id</code> for consistent routing within a session - <a href="https://github.com/BerriAI/litellm/pull/21763" target="_blank" rel="noopener noreferrer">PR #21763</a></li>
</ul>
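<p>The O(n²) fixes above share one shape: a membership test against a list inside a loop becomes quadratic as deployments grow, while building a set once makes each lookup O(1). A generic before/after sketch (function names are illustrative):</p>

```python
def filter_team_deployments_quadratic(deployments, team_models):
    # Before: `d in team_models` scans the whole list for every
    # deployment, so n deployments x m team models = O(n*m) work.
    return [d for d in deployments if d in team_models]

def filter_team_deployments_linear(deployments, team_models):
    # After: build the set once, then each membership check is O(1).
    allowed = set(team_models)
    return [d for d in deployments if d in allowed]
```

<p>Both return identical results; only the scan cost changes, which is why these fixes show up under performance rather than behavior changes.</p>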
<p><strong>Connection management &amp; reliability</strong></p>
<ul>
<li>Fix Redis connection pool reliability — prevent connection exhaustion under load - <a href="https://github.com/BerriAI/litellm/pull/21717" target="_blank" rel="noopener noreferrer">PR #21717</a></li>
<li>Fix Prisma connection self-heal for auth and runtime reconnection (reverted, will be re-introduced with fixes) - <a href="https://github.com/BerriAI/litellm/pull/21706" target="_blank" rel="noopener noreferrer">PR #21706</a></li>
<li>Close streaming connections to prevent connection pool exhaustion - <a href="https://github.com/BerriAI/litellm/pull/21213" target="_blank" rel="noopener noreferrer">PR #21213</a></li>
<li>Make <code>PodLockManager.release_lock</code> atomic compare-and-delete - <a href="https://github.com/BerriAI/litellm/pull/21226" target="_blank" rel="noopener noreferrer">PR #21226</a></li>
</ul>
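<p>The last item fixes a classic race: if a pod checks that it owns a lock and then deletes it as two separate steps, another pod can acquire the lock in between and have its lock deleted out from under it. Compare-and-delete must be one atomic step. LiteLLM's <code>PodLockManager</code> does this against Redis; the sketch below shows the same invariant with an in-process store for illustration:</p>

```python
import threading

class LockStore:
    """Illustrative lock store: ownership check and delete happen
    under one mutex, so release is an atomic compare-and-delete."""

    def __init__(self):
        self._locks = {}
        self._mutex = threading.Lock()

    def acquire(self, name: str, owner: str) -> bool:
        with self._mutex:
            # setdefault only inserts if absent, so an existing
            # holder keeps the lock.
            return self._locks.setdefault(name, owner) == owner

    def release(self, name: str, owner: str) -> bool:
        with self._mutex:                  # compare and delete as one step
            if self._locks.get(name) == owner:
                del self._locks[name]
                return True
            return False                   # not the owner: leave it alone
```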
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="database-changes">Database Changes<a href="https://docs.litellm.ai/release_notes/v1-81-14#database-changes" class="hash-link" aria-label="Direct link to Database Changes" title="Direct link to Database Changes">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="schema-updates">Schema Updates<a href="https://docs.litellm.ai/release_notes/v1-81-14#schema-updates" class="hash-link" aria-label="Direct link to Schema Updates" title="Direct link to Schema Updates">​</a></h3>
<table><thead><tr><th>Table</th><th>Change Type</th><th>Description</th><th>PR</th></tr></thead><tbody><tr><td><code>LiteLLM_DeletedVerificationToken</code></td><td>New Column</td><td>Added <code>project_id</code> column</td><td><a href="https://github.com/BerriAI/litellm/pull/21587" target="_blank" rel="noopener noreferrer">PR #21587</a></td></tr><tr><td><code>LiteLLM_ProjectTable</code></td><td>New Table</td><td>Project management for organizing resources</td><td><a href="https://github.com/BerriAI/litellm/pull/21078" target="_blank" rel="noopener noreferrer">PR #21078</a></td></tr><tr><td><code>LiteLLM_VerificationToken</code></td><td>New Column</td><td>Added <code>last_active</code> timestamp for key activity tracking</td><td><a href="https://github.com/BerriAI/litellm/pull/21545" target="_blank" rel="noopener noreferrer">PR #21545</a></td></tr><tr><td><code>LiteLLM_ManagedVectorStoreTable</code></td><td>Migration</td><td>Make vector store migration idempotent</td><td><a href="https://github.com/BerriAI/litellm/pull/21325" target="_blank" rel="noopener noreferrer">PR #21325</a></td></tr></tbody></table>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="security">Security<a href="https://docs.litellm.ai/release_notes/v1-81-14#security" class="hash-link" aria-label="Direct link to Security" title="Direct link to Security">​</a></h2>
<p>We run <a href="https://github.com/anchore/grype" target="_blank" rel="noopener noreferrer">Grype</a> and <a href="https://github.com/aquasecurity/trivy" target="_blank" rel="noopener noreferrer">Trivy</a> security scans on every LiteLLM Docker image. Here's the vulnerability report for this release across all published images:</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="docker-image-scan-summary">Docker Image Scan Summary<a href="https://docs.litellm.ai/release_notes/v1-81-14#docker-image-scan-summary" class="hash-link" aria-label="Direct link to Docker Image Scan Summary" title="Direct link to Docker Image Scan Summary">​</a></h3>
<table><thead><tr><th>Image</th><th>Critical</th><th>High</th><th>Medium</th><th>Low</th></tr></thead><tbody><tr><td><code>ghcr.io/berriai/litellm:main-latest</code></td><td><strong>0</strong> ✅</td><td>4 unique CVEs</td><td>4</td><td>1</td></tr><tr><td><code>ghcr.io/berriai/litellm-ee:main-latest</code></td><td><strong>0</strong> ✅</td><td>4 unique CVEs</td><td>4</td><td>1</td></tr><tr><td><code>ghcr.io/berriai/litellm-non_root:main-latest</code></td><td><strong>1</strong></td><td>11 unique CVEs</td><td>6</td><td>2</td></tr><tr><td><code>ghcr.io/berriai/litellm-database:main-latest</code></td><td><strong>1</strong></td><td>7 unique CVEs</td><td>5</td><td>1</td></tr><tr><td><code>ghcr.io/berriai/litellm-spend_logs:main-latest</code></td><td><strong>4</strong></td><td>35 matches</td><td>40</td><td>10</td></tr></tbody></table>
<div class="theme-admonition theme-admonition-note admonition_xJq3 alert alert--secondary"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M6.3 5.69a.942.942 0 0 1-.28-.7c0-.28.09-.52.28-.7.19-.18.42-.28.7-.28.28 0 .52.09.7.28.18.19.28.42.28.7 0 .28-.09.52-.28.7a1 1 0 0 1-.7.3c-.28 0-.52-.11-.7-.3zM8 7.99c-.02-.25-.11-.48-.31-.69-.2-.19-.42-.3-.69-.31H6c-.27.02-.48.13-.69.31-.2.2-.3.44-.31.69h1v3c.02.27.11.5.31.69.2.2.42.31.69.31h1c.27 0 .48-.11.69-.31.2-.19.3-.42.31-.69H8V7.98v.01zM7 2.3c-3.14 0-5.7 2.54-5.7 5.68 0 3.14 2.56 5.7 5.7 5.7s5.7-2.55 5.7-5.7c0-3.15-2.56-5.69-5.7-5.69v.01zM7 .98c3.86 0 7 3.14 7 7s-3.14 7-7 7-7-3.12-7-7 3.14-7 7-7z"></path></svg></span>note</div><div class="admonitionContent_BuS1"><p>Vulnerability counts are based on full image scans including build-time tooling. High match counts are often inflated by packages like <code>minimatch</code> appearing at multiple versions; the unique CVE counts above reflect the actual distinct vulnerabilities.</p></div></div>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="critical-severity">Critical Severity<a href="https://docs.litellm.ai/release_notes/v1-81-14#critical-severity" class="hash-link" aria-label="Direct link to Critical Severity" title="Direct link to Critical Severity">​</a></h3>
<p><strong>1. Node.js Critical (non-root, database, spend_logs images):</strong>
Node.js 24.12.0 is used <strong>only</strong> for the Admin UI build and Prisma client generation — it is <strong>not</strong> part of the LiteLLM Python application runtime.</p>
<table><thead><tr><th>Package</th><th>Vulnerability</th><th>Description</th><th>Fix Version</th></tr></thead><tbody><tr><td><code>node</code></td><td>CVE-2025-55130</td><td>Node.js critical vulnerability</td><td>20.20.0</td></tr></tbody></table>
<p><strong>2. OpenSSL &amp; Go Critical (spend_logs image only):</strong>
The <code>spend_logs</code> image contains additional vulnerabilities in the underlying Go modules and system libraries.</p>
<table><thead><tr><th>Package</th><th>Vulnerability</th><th>Description</th><th>Fix Version</th></tr></thead><tbody><tr><td><code>libcrypto3</code>, <code>libssl3</code></td><td>CVE-2025-15467</td><td>OpenSSL critical vulnerability</td><td>3.3.6-r0</td></tr><tr><td><code>stdlib</code> (Go)</td><td>CVE-2025-68121</td><td>Go standard library critical vulnerability</td><td>1.24.13+</td></tr></tbody></table>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="high-severity">High Severity<a href="https://docs.litellm.ai/release_notes/v1-81-14#high-severity" class="hash-link" aria-label="Direct link to High Severity" title="Direct link to High Severity">​</a></h3>
<p>All high-severity vulnerabilities are in <strong>npm/Node.js build-time dependencies</strong> or system-level libraries — they are <strong>not</strong> in the LiteLLM Python application code.</p>
<p><strong>Present in all images:</strong></p>
<table><thead><tr><th>Package</th><th>Vulnerability</th><th>Description</th><th>Fix Version</th></tr></thead><tbody><tr><td><code>minimatch</code></td><td>CVE-2026-26996</td><td>DoS via specially crafted glob patterns</td><td>10.2.1+ / 9.0.6+</td></tr><tr><td><code>minimatch</code></td><td>CVE-2026-27903</td><td>DoS due to unbounded recursive backtracking</td><td>10.2.3+ / 9.0.7+</td></tr><tr><td><code>minimatch</code></td><td>CVE-2026-27904</td><td>DoS via catastrophic backtracking in glob expressions</td><td>10.2.3+ / 9.0.7+</td></tr><tr><td><code>tar</code></td><td>CVE-2026-26960 / GHSA-83g3-92jg-28cx</td><td>Arbitrary file read/write via malicious archive hardlinks</td><td>7.5.8</td></tr></tbody></table>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="medium-severity-all-images">Medium Severity (all images)<a href="https://docs.litellm.ai/release_notes/v1-81-14#medium-severity-all-images" class="hash-link" aria-label="Direct link to Medium Severity (all images)" title="Direct link to Medium Severity (all images)">​</a></h3>
<table><thead><tr><th>Package</th><th>Vulnerability</th><th>Status</th></tr></thead><tbody><tr><td><code>pypdf</code> 6.7.2</td><td>GHSA-x7hp-r3qg-r3cj</td><td>Fix available in 6.7.3</td></tr><tr><td>Python 3.13</td><td>CVE-2025-15366, CVE-2025-15367, CVE-2025-12781</td><td>No upstream fix available</td></tr></tbody></table>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="recommendations">Recommendations<a href="https://docs.litellm.ai/release_notes/v1-81-14#recommendations" class="hash-link" aria-label="Direct link to Recommendations" title="Direct link to Recommendations">​</a></h3>
<ul>
<li><strong>LiteLLM Main &amp; EE images</strong> (<code>litellm:main-latest</code>, <code>litellm-ee:main-latest</code>) have the best security posture with <strong>0 critical vulnerabilities</strong>.</li>
<li>All HIGH/CRITICAL findings in the main images relate to build-time Node.js/npm tooling, not the Python runtime.</li>
<li>We are actively monitoring upstream Python and system library fixes for remaining medium-severity vulnerabilities.</li>
</ul>
<p>To report a security vulnerability, email <a href="mailto:support@berri.ai" target="_blank" rel="noopener noreferrer">support@berri.ai</a> with details and steps to reproduce.</p>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="documentation-updates">Documentation Updates<a href="https://docs.litellm.ai/release_notes/v1-81-14#documentation-updates" class="hash-link" aria-label="Direct link to Documentation Updates" title="Direct link to Documentation Updates">​</a></h2>
<ul>
<li>Add OpenAI Agents SDK with LiteLLM guide - <a href="https://github.com/BerriAI/litellm/pull/21311" target="_blank" rel="noopener noreferrer">PR #21311</a></li>
<li>Access Groups documentation - <a href="https://github.com/BerriAI/litellm/pull/21236" target="_blank" rel="noopener noreferrer">PR #21236</a></li>
<li>Anthropic beta headers documentation - <a href="https://github.com/BerriAI/litellm/pull/21320" target="_blank" rel="noopener noreferrer">PR #21320</a></li>
<li>Latency overhead troubleshooting guide - <a href="https://github.com/BerriAI/litellm/pull/21600" target="_blank" rel="noopener noreferrer">PR #21600</a>, <a href="https://github.com/BerriAI/litellm/pull/21603" target="_blank" rel="noopener noreferrer">PR #21603</a></li>
<li>Add rollback safety check guide - <a href="https://github.com/BerriAI/litellm/pull/21743" target="_blank" rel="noopener noreferrer">PR #21743</a></li>
<li>Incident report: vLLM Embeddings broken by encoding_format parameter - <a href="https://github.com/BerriAI/litellm/pull/21474" target="_blank" rel="noopener noreferrer">PR #21474</a></li>
<li>Incident report: Claude Code beta headers - <a href="https://github.com/BerriAI/litellm/pull/21485" target="_blank" rel="noopener noreferrer">PR #21485</a></li>
<li>Mark v1.81.12 as stable - <a href="https://github.com/BerriAI/litellm/pull/21809" target="_blank" rel="noopener noreferrer">PR #21809</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-contributors">New Contributors<a href="https://docs.litellm.ai/release_notes/v1-81-14#new-contributors" class="hash-link" aria-label="Direct link to New Contributors" title="Direct link to New Contributors">​</a></h2>
<ul>
<li>@mjkam made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/21306" target="_blank" rel="noopener noreferrer">PR #21306</a></li>
<li>@saneroen made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/21243" target="_blank" rel="noopener noreferrer">PR #21243</a></li>
<li>@vincentkoc made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/21239" target="_blank" rel="noopener noreferrer">PR #21239</a></li>
<li>@felixti made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19745" target="_blank" rel="noopener noreferrer">PR #19745</a></li>
<li>@anttttti made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/20731" target="_blank" rel="noopener noreferrer">PR #20731</a></li>
<li>@ndgigliotti made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/21222" target="_blank" rel="noopener noreferrer">PR #21222</a></li>
<li>@iamadamreed made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19912" target="_blank" rel="noopener noreferrer">PR #19912</a></li>
<li>@sahukanishka made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/21220" target="_blank" rel="noopener noreferrer">PR #21220</a></li>
<li>@namabile made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/21195" target="_blank" rel="noopener noreferrer">PR #21195</a></li>
<li>@stronk7 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/21372" target="_blank" rel="noopener noreferrer">PR #21372</a></li>
<li>@ZeroAurora made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/21547" target="_blank" rel="noopener noreferrer">PR #21547</a></li>
<li>@SolitudePy made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/21497" target="_blank" rel="noopener noreferrer">PR #21497</a></li>
<li>@SherifWaly made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/21557" target="_blank" rel="noopener noreferrer">PR #21557</a></li>
<li>@dkindlund made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/21633" target="_blank" rel="noopener noreferrer">PR #21633</a></li>
<li>@cagojeiger made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/21664" target="_blank" rel="noopener noreferrer">PR #21664</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="full-changelog">Full Changelog<a href="https://docs.litellm.ai/release_notes/v1-81-14#full-changelog" class="hash-link" aria-label="Direct link to Full Changelog" title="Direct link to Full Changelog">​</a></h2>
<p><a href="https://github.com/BerriAI/litellm/compare/v1.81.12.rc.1...v1.81.14.rc.1" target="_blank" rel="noopener noreferrer">v1.81.12.rc.1...v1.81.14.rc.1</a></p>]]></content>
        <author>
            <name>Krrish Dholakia</name>
            <uri>https://www.linkedin.com/in/krish-d/</uri>
        </author>
        <author>
            <name>Ishaan Jaff</name>
            <uri>https://www.linkedin.com/in/reffajnaahsi/</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[v1.81.12-stable.1 - Guardrail Policy Templates & Action Builder]]></title>
        <id>https://docs.litellm.ai/release_notes/v1-81-12</id>
        <link href="https://docs.litellm.ai/release_notes/v1-81-12"/>
        <updated>2026-02-14T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Deploy this version]]></summary>
        <content type="html"><![CDATA[<h2 class="anchor anchorWithStickyNavbar_LWe7" id="deploy-this-version">Deploy this version<a href="https://docs.litellm.ai/release_notes/v1-81-12#deploy-this-version" class="hash-link" aria-label="Direct link to Deploy this version" title="Direct link to Deploy this version">​</a></h2>
<!-- -->
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">Docker</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">Pip</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">docker run litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker run \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-e STORE_MODEL_IN_DB=True \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-p 4000:4000 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">ghcr.io/berriai/litellm:main-v1.81.12-stable.1</span><br></span></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">pip install litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">pip install litellm==1.81.12</span><br></span></code></pre></div></div></div></div></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="key-highlights">Key Highlights<a href="https://docs.litellm.ai/release_notes/v1-81-12#key-highlights" class="hash-link" aria-label="Direct link to Key Highlights" title="Direct link to Key Highlights">​</a></h2>
<ul>
<li><strong>Policy Templates</strong> - <a href="https://docs.litellm.ai/docs/proxy/guardrails/policy_templates">Pre-configured guardrail policy templates for common safety and compliance use-cases (including NSFW, toxic content, and child safety)</a></li>
<li><strong>Guardrail Action Builder</strong> - <a href="https://docs.litellm.ai/docs/proxy/guardrails/policy_templates">Build and customize guardrail policy flows with the new action-builder UI and conditional execution support</a></li>
<li><strong>MCP OAuth2 M2M + Tracing</strong> - <a href="https://docs.litellm.ai/docs/mcp">Add machine-to-machine OAuth2 support for MCP servers and OpenTelemetry tracing for MCP calls through AI Gateway</a></li>
<li><strong>Responses API <code>shell</code> Tool &amp; <code>context_management</code> support</strong> - <a href="https://docs.litellm.ai/docs/response_api">Server-side context management (compaction) and Shell tool support for the OpenAI Responses API</a></li>
<li><strong>Access Groups</strong> - <a href="https://docs.litellm.ai/docs/proxy/access_groups">Create access groups to manage model, MCP server, and agent access across teams and keys</a></li>
<li><strong>50+ New Bedrock Regional Model Entries</strong> - DeepSeek V3.2, MiniMax M2.1, Kimi K2.5, Qwen3 Coder Next, and NVIDIA Nemotron Nano across multiple regions</li>
<li><strong>Add Semgrep &amp; fix OOMs</strong> - <a href="https://docs.litellm.ai/release_notes/v1-81-12#add-semgrep--fix-ooms">Static analysis rules and out-of-memory fixes</a> - <a href="https://github.com/BerriAI/litellm/pull/20912" target="_blank" rel="noopener noreferrer">PR #20912</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="add-semgrep--fix-ooms">Add Semgrep &amp; fix OOMs<a href="https://docs.litellm.ai/release_notes/v1-81-12#add-semgrep--fix-ooms" class="hash-link" aria-label="Direct link to Add Semgrep &amp; fix OOMs" title="Direct link to Add Semgrep &amp; fix OOMs">​</a></h2>
<p>This release fixes out-of-memory (OOM) risks from unbounded <code>asyncio.Queue()</code> usage. Log queues (e.g. GCS bucket) and DB spend-update queues were previously unbounded and could grow without limit under load. They now use a configurable max size (<code>LITELLM_ASYNCIO_QUEUE_MAXSIZE</code>, default 1000); when full, queues flush immediately to make room instead of growing memory. A Semgrep rule (<code>.semgrep/rules/python/unbounded-memory.yml</code>) was added to flag similar unbounded-memory patterns in future code. <a href="https://github.com/BerriAI/litellm/pull/20912" target="_blank" rel="noopener noreferrer">PR #20912</a></p>
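<p>The bounded-queue pattern described above can be sketched as follows. This is an illustrative sketch only; the <code>BoundedLogQueue</code> class and callback names are hypothetical, not LiteLLM's actual internals — only the <code>LITELLM_ASYNCIO_QUEUE_MAXSIZE</code> variable comes from the release notes:</p>

```python
import asyncio
import os

# Read the configurable cap (default 1000, per the release notes).
QUEUE_MAXSIZE = int(os.getenv("LITELLM_ASYNCIO_QUEUE_MAXSIZE", "1000"))


class BoundedLogQueue:
    """Hypothetical sketch: a log queue whose size is capped.

    When the queue is full, it flushes immediately to make room
    instead of growing memory without bound.
    """

    def __init__(self, flush_callback, maxsize: int = QUEUE_MAXSIZE):
        # maxsize > 0 makes asyncio.Queue reject puts beyond the cap
        self.queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)
        self.flush_callback = flush_callback

    async def put(self, item) -> None:
        try:
            self.queue.put_nowait(item)
        except asyncio.QueueFull:
            # Queue is at capacity: drain it to the sink first,
            # then enqueue the new item.
            await self.flush()
            self.queue.put_nowait(item)

    async def flush(self) -> None:
        batch = []
        while not self.queue.empty():
            batch.append(self.queue.get_nowait())
        if batch:
            await self.flush_callback(batch)
```

<p>The key difference from the previous behavior is the <code>maxsize</code> argument: an <code>asyncio.Queue()</code> created with no arguments accepts puts indefinitely, so a slow log sink lets the queue absorb unbounded memory under load.</p>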
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="guardrail-action-builder">Guardrail Action Builder<a href="https://docs.litellm.ai/release_notes/v1-81-12#guardrail-action-builder" class="hash-link" aria-label="Direct link to Guardrail Action Builder" title="Direct link to Guardrail Action Builder">​</a></h2>
<p>This release adds a visual action builder for guardrail policies with conditional execution support. You can now chain guardrails into multi-step pipelines — if a simple guardrail fails, route to an advanced one instead of immediately blocking. Each step has configurable ON PASS and ON FAIL actions (Next Step, Block, or Allow), and you can test the full pipeline with a sample message before saving.</p>
<p><img decoding="async" loading="lazy" alt="Guardrail Action Builder" src="https://docs.litellm.ai/assets/images/guard_actions-b7afe0df4cd718760329ede57bfe392f.png" width="2926" height="2000" class="img_ev3q"></p>
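<p>The conditional execution semantics described above (each step choosing Next Step, Block, or Allow based on its ON PASS / ON FAIL configuration) can be sketched as follows. The guardrail checks and pipeline here are hypothetical stand-ins, not the actual builder's output format:</p>

```python
from typing import Callable, List, Tuple

NEXT, BLOCK, ALLOW = "next", "block", "allow"

# Each step is (check, on_pass_action, on_fail_action).
Step = Tuple[Callable[[str], bool], str, str]


def run_pipeline(message: str, steps: List[Step]) -> str:
    """Run steps in order; return "allow" or "block"."""
    for check, on_pass, on_fail in steps:
        action = on_pass if check(message) else on_fail
        if action == NEXT:
            continue  # fall through to the next guardrail
        return action
    return ALLOW  # no step made a terminal decision


# Hypothetical example: a cheap keyword filter runs first; if it fails,
# route to a stricter check instead of immediately blocking.
simple_check = lambda m: "badword" not in m
advanced_check = lambda m: len(m) < 100  # stand-in for an advanced guardrail

pipeline: List[Step] = [
    (simple_check, ALLOW, NEXT),     # pass -> allow; fail -> escalate
    (advanced_check, ALLOW, BLOCK),  # final decision
]
```

<p>This mirrors the flow in the screenshot: a failed simple guardrail routes to an advanced one rather than blocking outright, and each step's outcome maps to one of the three configurable actions.</p>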
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="access-groups">Access Groups<a href="https://docs.litellm.ai/release_notes/v1-81-12#access-groups" class="hash-link" aria-label="Direct link to Access Groups" title="Direct link to Access Groups">​</a></h3>
<p>Access Groups simplify defining resource access across your organization. One group can grant access to models, MCP servers, and agents—simply attach it to a key or team. Create groups in the Admin UI, define which resources each group includes, then assign the group when creating keys or teams. Updates to a group apply automatically to all attached keys and teams.</p>
<div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAFCAYAAAB8ZH1oAAAACXBIWXMAACxLAAAsSwGlPZapAAAAfklEQVR4nF2Nuw6CQBQF9///EEJJpUEFZHfvc4xgZTHNyWROmaaJeZ4xM/Z9p9aKueN/lHEcGYaB2iqPZWFbX7gpmfkjgKSIyFkRUZoGRw+6+rm11liPhopRVPW8/eafR3Lbgiqg0hE17s0RiUu8ZKdp8u6JWOJuRCRvdcyCD3N0woPsiF3OAAAAAElFTkSuQmCC&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="640" height="334"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/assets/ideal-img/ui_access_groups.8cba803.640.png" srcset="/assets/ideal-img/ui_access_groups.8cba803.640.png 640w,/assets/ideal-img/ui_access_groups.98d4f56.1920.png 1920w" width="640" height="334"></noscript></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-providers-and-endpoints">New Providers and Endpoints<a href="https://docs.litellm.ai/release_notes/v1-81-12#new-providers-and-endpoints" class="hash-link" aria-label="Direct link to New Providers and Endpoints" title="Direct link to New Providers and Endpoints">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="new-providers-2-new-providers">New Providers (2 new providers)<a href="https://docs.litellm.ai/release_notes/v1-81-12#new-providers-2-new-providers" class="hash-link" aria-label="Direct link to New Providers (2 new providers)" title="Direct link to New Providers (2 new providers)">​</a></h3>
<table><thead><tr><th>Provider</th><th>Supported LiteLLM Endpoints</th><th>Description</th></tr></thead><tbody><tr><td><a href="https://docs.litellm.ai/docs/providers/scaleway">Scaleway</a></td><td><code>/chat/completions</code></td><td>Scaleway Generative APIs for chat completions</td></tr><tr><td><a href="https://docs.litellm.ai/docs/providers/sarvam">Sarvam AI</a></td><td><code>/chat/completions</code>, <code>/audio/transcriptions</code>, <code>/audio/speech</code></td><td>Sarvam AI STT and TTS support for Indian languages</td></tr></tbody></table>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-models--updated-models">New Models / Updated Models<a href="https://docs.litellm.ai/release_notes/v1-81-12#new-models--updated-models" class="hash-link" aria-label="Direct link to New Models / Updated Models" title="Direct link to New Models / Updated Models">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="new-model-support-19-highlighted-models">New Model Support (19 highlighted models)<a href="https://docs.litellm.ai/release_notes/v1-81-12#new-model-support-19-highlighted-models" class="hash-link" aria-label="Direct link to New Model Support (19 highlighted models)" title="Direct link to New Model Support (19 highlighted models)">​</a></h4>
<table><thead><tr><th>Provider</th><th>Model</th><th>Context Window</th><th>Input ($/1M tokens)</th><th>Output ($/1M tokens)</th></tr></thead><tbody><tr><td>AWS Bedrock</td><td><code>deepseek.v3.2</code></td><td>164K</td><td>$0.62</td><td>$1.85</td></tr><tr><td>AWS Bedrock</td><td><code>minimax.minimax-m2.1</code></td><td>196K</td><td>$0.30</td><td>$1.20</td></tr><tr><td>AWS Bedrock</td><td><code>moonshotai.kimi-k2.5</code></td><td>262K</td><td>$0.60</td><td>$3.00</td></tr><tr><td>AWS Bedrock</td><td><code>moonshotai.kimi-k2-thinking</code></td><td>262K</td><td>$0.73</td><td>$3.03</td></tr><tr><td>AWS Bedrock</td><td><code>qwen.qwen3-coder-next</code></td><td>262K</td><td>$0.50</td><td>$1.20</td></tr><tr><td>AWS Bedrock</td><td><code>nvidia.nemotron-nano-3-30b</code></td><td>262K</td><td>$0.06</td><td>$0.24</td></tr><tr><td>Azure AI</td><td><code>azure_ai/kimi-k2.5</code></td><td>262K</td><td>$0.60</td><td>$3.00</td></tr><tr><td>Vertex AI</td><td><code>vertex_ai/zai-org/glm-5-maas</code></td><td>200K</td><td>$1.00</td><td>$3.20</td></tr><tr><td>MiniMax</td><td><code>minimax/MiniMax-M2.5</code></td><td>1M</td><td>$0.30</td><td>$1.20</td></tr><tr><td>MiniMax</td><td><code>minimax/MiniMax-M2.5-lightning</code></td><td>1M</td><td>$0.30</td><td>$2.40</td></tr><tr><td>Dashscope</td><td><code>dashscope/qwen3-max</code></td><td>258K</td><td>Tiered pricing</td><td>Tiered pricing</td></tr><tr><td>Perplexity</td><td><code>perplexity/preset/pro-search</code></td><td>-</td><td>Per-request</td><td>Per-request</td></tr><tr><td>Perplexity</td><td><code>perplexity/openai/gpt-4o</code></td><td>-</td><td>Per-request</td><td>Per-request</td></tr><tr><td>Perplexity</td><td><code>perplexity/openai/gpt-5.2</code></td><td>-</td><td>Per-request</td><td>Per-request</td></tr><tr><td>Vercel AI Gateway</td><td><code>vercel_ai_gateway/anthropic/claude-opus-4.6</code></td><td>200K</td><td>$5.00</td><td>$25.00</td></tr><tr><td>Vercel AI 
Gateway</td><td><code>vercel_ai_gateway/anthropic/claude-sonnet-4</code></td><td>200K</td><td>$3.00</td><td>$15.00</td></tr><tr><td>Vercel AI Gateway</td><td><code>vercel_ai_gateway/anthropic/claude-haiku-4.5</code></td><td>200K</td><td>$1.00</td><td>$5.00</td></tr><tr><td>Sarvam AI</td><td><code>sarvam/sarvam-m</code></td><td>8K</td><td>Free tier</td><td>Free tier</td></tr><tr><td>Anthropic</td><td><code>fast/claude-opus-4-6</code></td><td>1M</td><td>$30.00</td><td>$150.00</td></tr></tbody></table>
<p><em>Note: AWS Bedrock models are available across multiple regions (us-east-1, us-east-2, us-west-2, eu-central-1, eu-north-1, ap-northeast-1, ap-south-1, ap-southeast-3, sa-east-1). 54 regional model entries were added in total.</em></p>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features">Features<a href="https://docs.litellm.ai/release_notes/v1-81-12#features" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/anthropic">Anthropic</a></strong></p>
<ul>
<li>Enable non-tool structured outputs on Claude Opus 4.5 and 4.6 using <code>output_format</code> param - <a href="https://github.com/BerriAI/litellm/pull/20548" target="_blank" rel="noopener noreferrer">PR #20548</a></li>
<li>Add support for <code>anthropic_messages</code> call type in prompt caching - <a href="https://github.com/BerriAI/litellm/pull/19233" target="_blank" rel="noopener noreferrer">PR #19233</a></li>
<li>Manage Anthropic Beta headers with remote URL fetching - <a href="https://github.com/BerriAI/litellm/pull/20935" target="_blank" rel="noopener noreferrer">PR #20935</a>, <a href="https://github.com/BerriAI/litellm/pull/21110" target="_blank" rel="noopener noreferrer">PR #21110</a></li>
<li>Remove <code>x-anthropic-billing</code> block - <a href="https://github.com/BerriAI/litellm/pull/20951" target="_blank" rel="noopener noreferrer">PR #20951</a></li>
<li>Use Authorization Bearer for OAuth tokens instead of x-api-key - <a href="https://github.com/BerriAI/litellm/pull/21039" target="_blank" rel="noopener noreferrer">PR #21039</a></li>
<li>Filter unsupported JSON schema constraints for structured outputs - <a href="https://github.com/BerriAI/litellm/pull/20813" target="_blank" rel="noopener noreferrer">PR #20813</a></li>
<li>New Claude Opus 4.6 features for <code>/v1/messages</code> - <a href="https://github.com/BerriAI/litellm/pull/20733" target="_blank" rel="noopener noreferrer">PR #20733</a></li>
<li>Fix <code>reasoning_effort=None</code> and <code>"none"</code> to return None for Opus 4.6 - <a href="https://github.com/BerriAI/litellm/pull/20800" target="_blank" rel="noopener noreferrer">PR #20800</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/bedrock">AWS Bedrock</a></strong></p>
<ul>
<li>Extend model support with 4 new beta models - <a href="https://github.com/BerriAI/litellm/pull/21035" target="_blank" rel="noopener noreferrer">PR #21035</a></li>
<li>Add Claude Opus 4.6 to <code>_supports_tool_search_on_bedrock</code> - <a href="https://github.com/BerriAI/litellm/pull/21017" target="_blank" rel="noopener noreferrer">PR #21017</a></li>
<li>Correct Bedrock Claude Opus 4.6 model IDs (remove <code>:0</code> suffix) - <a href="https://github.com/BerriAI/litellm/pull/20564" target="_blank" rel="noopener noreferrer">PR #20564</a>, <a href="https://github.com/BerriAI/litellm/pull/20671" target="_blank" rel="noopener noreferrer">PR #20671</a></li>
<li>Add <code>output_config</code> as supported param - <a href="https://github.com/BerriAI/litellm/pull/20748" target="_blank" rel="noopener noreferrer">PR #20748</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/vertex">Vertex AI</a></strong></p>
<ul>
<li>Add Vertex GLM-5 model support - <a href="https://github.com/BerriAI/litellm/pull/21053" target="_blank" rel="noopener noreferrer">PR #21053</a></li>
<li>Propagate <code>extra_headers</code> anthropic-beta to request body - <a href="https://github.com/BerriAI/litellm/pull/20666" target="_blank" rel="noopener noreferrer">PR #20666</a></li>
<li>Preserve <code>usageMetadata</code> in <code>_hidden_params</code> - <a href="https://github.com/BerriAI/litellm/pull/20559" target="_blank" rel="noopener noreferrer">PR #20559</a></li>
<li>Map <code>IMAGE_PROHIBITED_CONTENT</code> to <code>content_filter</code> - <a href="https://github.com/BerriAI/litellm/pull/20524" target="_blank" rel="noopener noreferrer">PR #20524</a></li>
<li>Add RAG ingest for Vertex AI - <a href="https://github.com/BerriAI/litellm/pull/21120" target="_blank" rel="noopener noreferrer">PR #21120</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/cohere">OCI / Cohere</a></strong></p>
<ul>
<li>OCI Cohere responseFormat/Pydantic support - <a href="https://github.com/BerriAI/litellm/pull/20663" target="_blank" rel="noopener noreferrer">PR #20663</a></li>
<li>Fix OCI Cohere system messages by populating <code>preambleOverride</code> - <a href="https://github.com/BerriAI/litellm/pull/20958" target="_blank" rel="noopener noreferrer">PR #20958</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/perplexity">Perplexity</a></strong></p>
<ul>
<li>Perplexity Research API support with preset search - <a href="https://github.com/BerriAI/litellm/pull/20860" target="_blank" rel="noopener noreferrer">PR #20860</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/minimax">MiniMax</a></strong></p>
<ul>
<li>Add MiniMax-M2.5 and MiniMax-M2.5-lightning models - <a href="https://github.com/BerriAI/litellm/pull/21054" target="_blank" rel="noopener noreferrer">PR #21054</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/moonshot">Kimi / Moonshot</a></strong></p>
<ul>
<li>Add Kimi model pricing by region - <a href="https://github.com/BerriAI/litellm/pull/20855" target="_blank" rel="noopener noreferrer">PR #20855</a></li>
<li>Add <code>moonshotai.kimi-k2.5</code> - <a href="https://github.com/BerriAI/litellm/pull/20863" target="_blank" rel="noopener noreferrer">PR #20863</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/dashscope">Dashscope</a></strong></p>
<ul>
<li>Add <code>dashscope/qwen3-max</code> model with tiered pricing - <a href="https://github.com/BerriAI/litellm/pull/20919" target="_blank" rel="noopener noreferrer">PR #20919</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/vercel_ai_gateway">Vercel AI Gateway</a></strong></p>
<ul>
<li>Add new Vercel AI Anthropic models - <a href="https://github.com/BerriAI/litellm/pull/20745" target="_blank" rel="noopener noreferrer">PR #20745</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/azure_ai">Azure AI</a></strong></p>
<ul>
<li>Add <code>azure_ai/kimi-k2.5</code> to Azure model DB - <a href="https://github.com/BerriAI/litellm/pull/20896" target="_blank" rel="noopener noreferrer">PR #20896</a></li>
<li>Support Azure AD token auth for non-Claude azure_ai models - <a href="https://github.com/BerriAI/litellm/pull/20981" target="_blank" rel="noopener noreferrer">PR #20981</a></li>
<li>Fix Azure batches issues - <a href="https://github.com/BerriAI/litellm/pull/21092" target="_blank" rel="noopener noreferrer">PR #21092</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/deepseek">DeepSeek</a></strong></p>
<ul>
<li>Sync DeepSeek model metadata and add bare-name fallback - <a href="https://github.com/BerriAI/litellm/pull/20938" target="_blank" rel="noopener noreferrer">PR #20938</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/gemini">Gemini</a></strong></p>
<ul>
<li>Handle image in assistant message for Gemini - <a href="https://github.com/BerriAI/litellm/pull/20845" target="_blank" rel="noopener noreferrer">PR #20845</a></li>
<li>Add missing tpm/rpm for Gemini models - <a href="https://github.com/BerriAI/litellm/pull/21175" target="_blank" rel="noopener noreferrer">PR #21175</a></li>
</ul>
</li>
<li>
<p><strong>General</strong></p>
<ul>
<li>Add 30 missing models to pricing JSON - <a href="https://github.com/BerriAI/litellm/pull/20797" target="_blank" rel="noopener noreferrer">PR #20797</a></li>
<li>Cleanup 39 deprecated OpenRouter models - <a href="https://github.com/BerriAI/litellm/pull/20786" target="_blank" rel="noopener noreferrer">PR #20786</a></li>
<li>Standardize endpoint <code>display_name</code> naming convention - <a href="https://github.com/BerriAI/litellm/pull/20791" target="_blank" rel="noopener noreferrer">PR #20791</a></li>
<li>Fix and stabilize model cost map formatting - <a href="https://github.com/BerriAI/litellm/pull/20895" target="_blank" rel="noopener noreferrer">PR #20895</a></li>
<li>Export <code>PermissionDeniedError</code> from <code>litellm.__init__</code> - <a href="https://github.com/BerriAI/litellm/pull/20960" target="_blank" rel="noopener noreferrer">PR #20960</a></li>
</ul>
</li>
</ul>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="bug-fixes">Bug Fixes<a href="https://docs.litellm.ai/release_notes/v1-81-12#bug-fixes" class="hash-link" aria-label="Direct link to Bug Fixes" title="Direct link to Bug Fixes">​</a></h3>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/anthropic">Anthropic</a></strong></p>
<ul>
<li>Fix <code>get_supported_anthropic_messages_params</code> - <a href="https://github.com/BerriAI/litellm/pull/20752" target="_blank" rel="noopener noreferrer">PR #20752</a></li>
<li>Fix <code>base_model</code> name for body and deployment name in URL - <a href="https://github.com/BerriAI/litellm/pull/20747" target="_blank" rel="noopener noreferrer">PR #20747</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/azure/azure">Azure</a></strong></p>
<ul>
<li>Preserve <code>content_policy_violation</code> error details from Azure OpenAI - <a href="https://github.com/BerriAI/litellm/pull/20883" target="_blank" rel="noopener noreferrer">PR #20883</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/vertex">Vertex AI</a></strong></p>
<ul>
<li>Fix Gemini multi-turn tool calling message formatting (added and reverted) - <a href="https://github.com/BerriAI/litellm/pull/20569" target="_blank" rel="noopener noreferrer">PR #20569</a>, <a href="https://github.com/BerriAI/litellm/pull/21051" target="_blank" rel="noopener noreferrer">PR #21051</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="llm-api-endpoints">LLM API Endpoints<a href="https://docs.litellm.ai/release_notes/v1-81-12#llm-api-endpoints" class="hash-link" aria-label="Direct link to LLM API Endpoints" title="Direct link to LLM API Endpoints">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-1">Features<a href="https://docs.litellm.ai/release_notes/v1-81-12#features-1" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/response_api">Responses API</a></strong></p>
<ul>
<li>Add server-side context management (compaction) support - <a href="https://github.com/BerriAI/litellm/pull/21058" target="_blank" rel="noopener noreferrer">PR #21058</a></li>
<li>Add Shell tool support for OpenAI Responses API - <a href="https://github.com/BerriAI/litellm/pull/21063" target="_blank" rel="noopener noreferrer">PR #21063</a></li>
<li>Preserve tool call argument deltas when streaming id is omitted - <a href="https://github.com/BerriAI/litellm/pull/20712" target="_blank" rel="noopener noreferrer">PR #20712</a></li>
<li>Preserve interleaved thinking/redacted_thinking blocks during streaming - <a href="https://github.com/BerriAI/litellm/pull/20702" target="_blank" rel="noopener noreferrer">PR #20702</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/completion/input">Chat Completions</a></strong></p>
<ul>
<li>Add Web Search support using LiteLLM <code>/search</code> (web search interception hook) - <a href="https://github.com/BerriAI/litellm/pull/20483" target="_blank" rel="noopener noreferrer">PR #20483</a></li>
<li>Preserve nullable object fields by carrying schema properties - <a href="https://github.com/BerriAI/litellm/pull/19132" target="_blank" rel="noopener noreferrer">PR #19132</a></li>
<li>Support <code>prompt_cache_key</code> for OpenAI and Azure chat completions - <a href="https://github.com/BerriAI/litellm/pull/20989" target="_blank" rel="noopener noreferrer">PR #20989</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/pass_through/bedrock">Pass-Through Endpoints</a></strong></p>
<ul>
<li>Add support for <code>langchain_aws</code> via LiteLLM passthrough - <a href="https://github.com/BerriAI/litellm/pull/20843" target="_blank" rel="noopener noreferrer">PR #20843</a></li>
<li>Add <code>custom_body</code> parameter to <code>endpoint_func</code> in <code>create_pass_through_route</code> - <a href="https://github.com/BerriAI/litellm/pull/20849" target="_blank" rel="noopener noreferrer">PR #20849</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/openai">Vector Stores</a></strong></p>
<ul>
<li>Add <code>target_model_names</code> for vector store endpoints - <a href="https://github.com/BerriAI/litellm/pull/21089" target="_blank" rel="noopener noreferrer">PR #21089</a></li>
</ul>
</li>
<li>
<p><strong>General</strong></p>
<ul>
<li>Add <code>output_config</code> as supported param - <a href="https://github.com/BerriAI/litellm/pull/20748" target="_blank" rel="noopener noreferrer">PR #20748</a></li>
<li>Add managed error file support - <a href="https://github.com/BerriAI/litellm/pull/20838" target="_blank" rel="noopener noreferrer">PR #20838</a></li>
</ul>
</li>
</ul>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="bugs">Bugs<a href="https://docs.litellm.ai/release_notes/v1-81-12#bugs" class="hash-link" aria-label="Direct link to Bugs" title="Direct link to Bugs">​</a></h4>
<ul>
<li><strong>General</strong>
<ul>
<li>Stop leaking Python tracebacks in streaming SSE error responses - <a href="https://github.com/BerriAI/litellm/pull/20850" target="_blank" rel="noopener noreferrer">PR #20850</a></li>
<li>Fix video list pagination cursors not being encoded with provider metadata - <a href="https://github.com/BerriAI/litellm/pull/20710" target="_blank" rel="noopener noreferrer">PR #20710</a></li>
<li>Handle <code>metadata=None</code> in SDK path retry/error logic - <a href="https://github.com/BerriAI/litellm/pull/20873" target="_blank" rel="noopener noreferrer">PR #20873</a></li>
<li>Fix Spend logs pickle error with Pydantic models and redaction - <a href="https://github.com/BerriAI/litellm/pull/20685" target="_blank" rel="noopener noreferrer">PR #20685</a></li>
<li>Remove duplicate <code>PerplexityResponsesConfig</code> from <code>LLM_CONFIG_NAMES</code> - <a href="https://github.com/BerriAI/litellm/pull/21105" target="_blank" rel="noopener noreferrer">PR #21105</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="management-endpoints--ui">Management Endpoints / UI<a href="https://docs.litellm.ai/release_notes/v1-81-12#management-endpoints--ui" class="hash-link" aria-label="Direct link to Management Endpoints / UI" title="Direct link to Management Endpoints / UI">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-2">Features<a href="https://docs.litellm.ai/release_notes/v1-81-12#features-2" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong>Access Groups</strong></p>
<ul>
<li>New Access Groups feature for managing model, MCP server, and agent access - <a href="https://github.com/BerriAI/litellm/pull/21022" target="_blank" rel="noopener noreferrer">PR #21022</a></li>
<li>Access Groups table and details page UI - <a href="https://github.com/BerriAI/litellm/pull/21165" target="_blank" rel="noopener noreferrer">PR #21165</a></li>
<li>Refactor <code>model_ids</code> to <code>model_names</code> for backwards compatibility - <a href="https://github.com/BerriAI/litellm/pull/21166" target="_blank" rel="noopener noreferrer">PR #21166</a></li>
</ul>
</li>
<li>
<p><strong>Policies</strong></p>
<ul>
<li>Allow connecting Policies to Tags, simulating Policies, and viewing key/team counts - <a href="https://github.com/BerriAI/litellm/pull/20904" target="_blank" rel="noopener noreferrer">PR #20904</a></li>
<li>Guardrail pipeline support for conditional sequential execution - <a href="https://github.com/BerriAI/litellm/pull/21177" target="_blank" rel="noopener noreferrer">PR #21177</a></li>
<li>Pipeline flow builder UI for guardrail policies - <a href="https://github.com/BerriAI/litellm/pull/21188" target="_blank" rel="noopener noreferrer">PR #21188</a></li>
</ul>
</li>
<li>
<p><strong>SSO / Auth</strong></p>
<ul>
<li>New Login With SSO Button - <a href="https://github.com/BerriAI/litellm/pull/20908" target="_blank" rel="noopener noreferrer">PR #20908</a></li>
<li>M2M OAuth2 UI Flow - <a href="https://github.com/BerriAI/litellm/pull/20794" target="_blank" rel="noopener noreferrer">PR #20794</a></li>
<li>Allow Organization and Team Admins to call <code>/invitation/new</code> - <a href="https://github.com/BerriAI/litellm/pull/20987" target="_blank" rel="noopener noreferrer">PR #20987</a></li>
<li>Invite User: Email Integration Alert - <a href="https://github.com/BerriAI/litellm/pull/20790" target="_blank" rel="noopener noreferrer">PR #20790</a></li>
<li>Populate identity fields in proxy admin JWT early-return path - <a href="https://github.com/BerriAI/litellm/pull/21169" target="_blank" rel="noopener noreferrer">PR #21169</a></li>
</ul>
</li>
<li>
<p><strong>Spend Logs</strong></p>
<ul>
<li>Show predefined error codes in filter with user definable fallback - <a href="https://github.com/BerriAI/litellm/pull/20773" target="_blank" rel="noopener noreferrer">PR #20773</a></li>
<li>Paginated searchable model select - <a href="https://github.com/BerriAI/litellm/pull/20892" target="_blank" rel="noopener noreferrer">PR #20892</a></li>
<li>Sorting columns support - <a href="https://github.com/BerriAI/litellm/pull/21143" target="_blank" rel="noopener noreferrer">PR #21143</a></li>
<li>Allow sorting on <code>/spend/logs/ui</code> - <a href="https://github.com/BerriAI/litellm/pull/20991" target="_blank" rel="noopener noreferrer">PR #20991</a></li>
</ul>
</li>
<li>
<p><strong>UI Improvements</strong></p>
<ul>
<li>Navbar: Option to hide Usage Popup - <a href="https://github.com/BerriAI/litellm/pull/20910" target="_blank" rel="noopener noreferrer">PR #20910</a></li>
<li>Model Page: Improve Credentials Messaging - <a href="https://github.com/BerriAI/litellm/pull/21076" target="_blank" rel="noopener noreferrer">PR #21076</a></li>
<li>Fallbacks: Default configurable to 10 models - <a href="https://github.com/BerriAI/litellm/pull/21144" target="_blank" rel="noopener noreferrer">PR #21144</a></li>
<li>Fallback display with arrows and card structure - <a href="https://github.com/BerriAI/litellm/pull/20922" target="_blank" rel="noopener noreferrer">PR #20922</a></li>
<li>Team Info: Migrate to AntD Tabs + Table - <a href="https://github.com/BerriAI/litellm/pull/20785" target="_blank" rel="noopener noreferrer">PR #20785</a></li>
<li>AntD refactoring and 0 cost models fix - <a href="https://github.com/BerriAI/litellm/pull/20687" target="_blank" rel="noopener noreferrer">PR #20687</a></li>
<li>Zscaler AI Guard UI - <a href="https://github.com/BerriAI/litellm/pull/21077" target="_blank" rel="noopener noreferrer">PR #21077</a></li>
<li>Include Config Defined Pass Through Endpoints - <a href="https://github.com/BerriAI/litellm/pull/20898" target="_blank" rel="noopener noreferrer">PR #20898</a></li>
<li>Rename "HTTP" to "Streamable HTTP (Recommended)" in MCP server page - <a href="https://github.com/BerriAI/litellm/pull/21000" target="_blank" rel="noopener noreferrer">PR #21000</a></li>
<li>MCP server discovery UI - <a href="https://github.com/BerriAI/litellm/pull/21079" target="_blank" rel="noopener noreferrer">PR #21079</a></li>
</ul>
</li>
<li>
<p><strong>Virtual Keys</strong></p>
<ul>
<li>Allow Management keys to access <code>user/daily/activity</code> and team - <a href="https://github.com/BerriAI/litellm/pull/20124" target="_blank" rel="noopener noreferrer">PR #20124</a></li>
<li>Skip premium check for empty metadata fields on team/key update - <a href="https://github.com/BerriAI/litellm/pull/20598" target="_blank" rel="noopener noreferrer">PR #20598</a></li>
</ul>
</li>
</ul>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="bugs-1">Bugs<a href="https://docs.litellm.ai/release_notes/v1-82-0#bugs-1" class="hash-link" aria-label="Direct link to Bugs" title="Direct link to Bugs">​</a></h4>
<ul>
<li>Logs: Fix Input and Output Copying - <a href="https://github.com/BerriAI/litellm/pull/20657" target="_blank" rel="noopener noreferrer">PR #20657</a></li>
<li>Teams: Fix Available Teams - <a href="https://github.com/BerriAI/litellm/pull/20682" target="_blank" rel="noopener noreferrer">PR #20682</a></li>
<li>Spend Logs: Reset Filters Resets Custom Date Range - <a href="https://github.com/BerriAI/litellm/pull/21149" target="_blank" rel="noopener noreferrer">PR #21149</a></li>
<li>Usage: Request Chart stack variant fix - <a href="https://github.com/BerriAI/litellm/pull/20894" target="_blank" rel="noopener noreferrer">PR #20894</a></li>
<li>Add Auto Router: Description Text Input Focus - <a href="https://github.com/BerriAI/litellm/pull/21004" target="_blank" rel="noopener noreferrer">PR #21004</a></li>
<li>Guardrail Edit: LiteLLM Content Filter Categories - <a href="https://github.com/BerriAI/litellm/pull/21002" target="_blank" rel="noopener noreferrer">PR #21002</a></li>
<li>Add null guard for models in API keys table - <a href="https://github.com/BerriAI/litellm/pull/20655" target="_blank" rel="noopener noreferrer">PR #20655</a></li>
<li>Show error details instead of 'Data Not Available' for failed requests - <a href="https://github.com/BerriAI/litellm/pull/20656" target="_blank" rel="noopener noreferrer">PR #20656</a></li>
<li>Fix Spend Management Tests - <a href="https://github.com/BerriAI/litellm/pull/21088" target="_blank" rel="noopener noreferrer">PR #21088</a></li>
<li>Fix JWT email domain validation error message - <a href="https://github.com/BerriAI/litellm/pull/21212" target="_blank" rel="noopener noreferrer">PR #21212</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="ai-integrations">AI Integrations<a href="https://docs.litellm.ai/release_notes/v1-82-0#ai-integrations" class="hash-link" aria-label="Direct link to AI Integrations" title="Direct link to AI Integrations">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="logging">Logging<a href="https://docs.litellm.ai/release_notes/v1-82-0#logging" class="hash-link" aria-label="Direct link to Logging" title="Direct link to Logging">​</a></h3>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/observability/posthog_integration">PostHog</a></strong></p>
<ul>
<li>Fix JSON serialization error for non-serializable objects - <a href="https://github.com/BerriAI/litellm/pull/20668" target="_blank" rel="noopener noreferrer">PR #20668</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/logging#prometheus">Prometheus</a></strong></p>
<ul>
<li>Sanitize label values to prevent metric scrape failures - <a href="https://github.com/BerriAI/litellm/pull/20600" target="_blank" rel="noopener noreferrer">PR #20600</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/logging#langfuse">Langfuse</a></strong></p>
<ul>
<li>Prevent empty proxy request spans from being sent to Langfuse - <a href="https://github.com/BerriAI/litellm/pull/19935" target="_blank" rel="noopener noreferrer">PR #19935</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/logging#otel">OpenTelemetry</a></strong></p>
<ul>
<li>Auto-infer <code>otlp_http</code> exporter when endpoint is configured - <a href="https://github.com/BerriAI/litellm/pull/20438" target="_blank" rel="noopener noreferrer">PR #20438</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/logging">CloudZero</a></strong></p>
<ul>
<li>Update CBF field mappings per LIT-1907 - <a href="https://github.com/BerriAI/litellm/pull/20906" target="_blank" rel="noopener noreferrer">PR #20906</a></li>
</ul>
</li>
<li>
<p><strong>General</strong></p>
<ul>
<li>Allow <code>MAX_CALLBACKS</code> override via env var - <a href="https://github.com/BerriAI/litellm/pull/20781" target="_blank" rel="noopener noreferrer">PR #20781</a></li>
<li>Add <code>standard_logging_payload_excluded_fields</code> config option - <a href="https://github.com/BerriAI/litellm/pull/20831" target="_blank" rel="noopener noreferrer">PR #20831</a></li>
<li>Enable <code>verbose_logger</code> when <code>LITELLM_LOG=DEBUG</code> - <a href="https://github.com/BerriAI/litellm/pull/20496" target="_blank" rel="noopener noreferrer">PR #20496</a></li>
<li>Guard against None <code>litellm_metadata</code> in batch logging path - <a href="https://github.com/BerriAI/litellm/pull/20832" target="_blank" rel="noopener noreferrer">PR #20832</a></li>
<li>Propagate model-level tags from config to SpendLogs - <a href="https://github.com/BerriAI/litellm/pull/20769" target="_blank" rel="noopener noreferrer">PR #20769</a></li>
</ul>
</li>
</ul>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="guardrails">Guardrails<a href="https://docs.litellm.ai/release_notes/v1-82-0#guardrails" class="hash-link" aria-label="Direct link to Guardrails" title="Direct link to Guardrails">​</a></h3>
<ul>
<li>
<p><strong>Policy Templates</strong></p>
<ul>
<li>New Policy Templates: pre-configured guardrail combinations for specific use cases - <a href="https://github.com/BerriAI/litellm/pull/21025" target="_blank" rel="noopener noreferrer">PR #21025</a></li>
<li>Add NSFW policy template, toxic keywords in multiple languages, child safety content filter, JSON content viewer - <a href="https://github.com/BerriAI/litellm/pull/21205" target="_blank" rel="noopener noreferrer">PR #21205</a></li>
<li>Add toxic/abusive content filter guardrails - <a href="https://github.com/BerriAI/litellm/pull/20934" target="_blank" rel="noopener noreferrer">PR #20934</a></li>
</ul>
</li>
<li>
<p><strong>Pipeline Execution</strong></p>
<ul>
<li>Add guardrail pipeline support for conditional sequential execution - <a href="https://github.com/BerriAI/litellm/pull/21177" target="_blank" rel="noopener noreferrer">PR #21177</a></li>
<li>Agent Guardrails on streaming output - <a href="https://github.com/BerriAI/litellm/pull/21206" target="_blank" rel="noopener noreferrer">PR #21206</a></li>
<li>Pipeline flow builder UI - <a href="https://github.com/BerriAI/litellm/pull/21188" target="_blank" rel="noopener noreferrer">PR #21188</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/apply_guardrail">Zscaler AI Guard</a></strong></p>
<ul>
<li>Zscaler AI Guard bug fixes and support during post-call - <a href="https://github.com/BerriAI/litellm/pull/20801" target="_blank" rel="noopener noreferrer">PR #20801</a></li>
<li>Zscaler AI Guard UI - <a href="https://github.com/BerriAI/litellm/pull/21077" target="_blank" rel="noopener noreferrer">PR #21077</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/apply_guardrail">ZGuard</a></strong></p>
<ul>
<li>Add team policy mapping for ZGuard - <a href="https://github.com/BerriAI/litellm/pull/20608" target="_blank" rel="noopener noreferrer">PR #20608</a></li>
</ul>
</li>
<li>
<p><strong>General</strong></p>
<ul>
<li>Add logging to all unified guardrails + link to custom code guardrail templates - <a href="https://github.com/BerriAI/litellm/pull/20900" target="_blank" rel="noopener noreferrer">PR #20900</a></li>
<li>Forward request headers + <code>litellm_version</code> to generic guardrails - <a href="https://github.com/BerriAI/litellm/pull/20729" target="_blank" rel="noopener noreferrer">PR #20729</a></li>
<li>Empty <code>guardrails</code>/<code>policies</code> arrays should not trigger enterprise license check - <a href="https://github.com/BerriAI/litellm/pull/20567" target="_blank" rel="noopener noreferrer">PR #20567</a></li>
<li>Fix OpenAI moderation guardrails - <a href="https://github.com/BerriAI/litellm/pull/20718" target="_blank" rel="noopener noreferrer">PR #20718</a></li>
<li>Fix <code>/v2/guardrails/list</code> returning sensitive values - <a href="https://github.com/BerriAI/litellm/pull/20796" target="_blank" rel="noopener noreferrer">PR #20796</a></li>
<li>Fix guardrail status error - <a href="https://github.com/BerriAI/litellm/pull/20972" target="_blank" rel="noopener noreferrer">PR #20972</a></li>
<li>Reuse <code>get_instance_fn</code> in <code>initialize_custom_guardrail</code> - <a href="https://github.com/BerriAI/litellm/pull/20917" target="_blank" rel="noopener noreferrer">PR #20917</a></li>
</ul>
</li>
</ul>
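The pipeline entries above add conditional sequential execution for guardrails. A minimal sketch of that idea follows; all names and data shapes here are illustrative assumptions, not LiteLLM's actual guardrail API:

```python
# Hypothetical sketch of a conditional sequential guardrail pipeline.
# Each step runs in order; a follow-up guardrail executes only when
# the previous check flagged the content.

def run_pipeline(text, steps):
    """steps: list of dicts with a 'name', a 'check' callable returning
    'pass' | 'block' | 'flag', and an optional 'on_flag' callable that
    runs conditionally after a 'flag' verdict."""
    for step in steps:
        verdict = step["check"](text)
        if verdict == "block":
            # Hard stop: later guardrails never run.
            return {"allowed": False, "blocked_by": step["name"]}
        if verdict == "flag" and "on_flag" in step:
            # Conditional execution: this guardrail only runs because
            # the preceding check flagged the content.
            if step["on_flag"](text) == "block":
                return {"allowed": False, "blocked_by": step["name"]}
    return {"allowed": True, "blocked_by": None}
```

For example, a cheap keyword filter could flag suspicious text and hand it to a more expensive classifier only when needed, which is the main appeal of sequencing guardrails conditionally rather than running every check on every request.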
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="spend-tracking-budgets-and-rate-limiting">Spend Tracking, Budgets and Rate Limiting<a href="https://docs.litellm.ai/release_notes/v1-82-0#spend-tracking-budgets-and-rate-limiting" class="hash-link" aria-label="Direct link to Spend Tracking, Budgets and Rate Limiting" title="Direct link to Spend Tracking, Budgets and Rate Limiting">​</a></h2>
<ul>
<li><strong>Prevent shared backend model key from being polluted</strong> by per-deployment custom pricing - <a href="https://github.com/BerriAI/litellm/pull/20679" target="_blank" rel="noopener noreferrer">PR #20679</a></li>
<li><strong>Avoid in-place mutation</strong> in SpendUpdateQueue aggregation - <a href="https://github.com/BerriAI/litellm/pull/20876" target="_blank" rel="noopener noreferrer">PR #20876</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="mcp-gateway-12-updates">MCP Gateway (12 updates)<a href="https://docs.litellm.ai/release_notes/v1-82-0#mcp-gateway-12-updates" class="hash-link" aria-label="Direct link to MCP Gateway (12 updates)" title="Direct link to MCP Gateway (12 updates)">​</a></h2>
<ul>
<li><strong>MCP M2M OAuth2 Support</strong> - Add support for machine-to-machine OAuth2 for MCP servers - <a href="https://github.com/BerriAI/litellm/pull/20788" target="_blank" rel="noopener noreferrer">PR #20788</a></li>
<li><strong>MCP Server Discovery UI</strong> - Browse and discover available MCP servers from the UI - <a href="https://github.com/BerriAI/litellm/pull/21079" target="_blank" rel="noopener noreferrer">PR #21079</a></li>
<li><strong>MCP Tracing</strong> - Add OpenTelemetry tracing for MCP calls running through AI Gateway - <a href="https://github.com/BerriAI/litellm/pull/21018" target="_blank" rel="noopener noreferrer">PR #21018</a></li>
<li><strong>MCP OAuth2 Debug Headers</strong> - Client-side debug headers for OAuth2 troubleshooting - <a href="https://github.com/BerriAI/litellm/pull/21151" target="_blank" rel="noopener noreferrer">PR #21151</a></li>
<li><strong>Fix MCP "Session not found" errors</strong> - Resolve session persistence issues - <a href="https://github.com/BerriAI/litellm/pull/21040" target="_blank" rel="noopener noreferrer">PR #21040</a></li>
<li><strong>Fix MCP OAuth2 root endpoints</strong> returning "MCP server not found" - <a href="https://github.com/BerriAI/litellm/pull/20784" target="_blank" rel="noopener noreferrer">PR #20784</a></li>
<li><strong>Fix MCP OAuth2 query param merging</strong> when <code>authorization_url</code> already contains params - <a href="https://github.com/BerriAI/litellm/pull/20968" target="_blank" rel="noopener noreferrer">PR #20968</a></li>
<li><strong>Fix MCP scopes on Atlassian</strong> - <a href="https://github.com/BerriAI/litellm/pull/21150" target="_blank" rel="noopener noreferrer">PR #21150</a></li>
<li><strong>Fix MCP StreamableHTTP backend</strong> - Use <code>anyio.fail_after</code> instead of <code>asyncio.wait_for</code> - <a href="https://github.com/BerriAI/litellm/pull/20891" target="_blank" rel="noopener noreferrer">PR #20891</a></li>
<li><strong>Inject <code>NPM_CONFIG_CACHE</code></strong> into STDIO MCP subprocess env - <a href="https://github.com/BerriAI/litellm/pull/21069" target="_blank" rel="noopener noreferrer">PR #21069</a></li>
<li><strong>Block spaces and hyphens</strong> in MCP server names and aliases - <a href="https://github.com/BerriAI/litellm/pull/21074" target="_blank" rel="noopener noreferrer">PR #21074</a></li>
</ul>
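The M2M OAuth2 support above corresponds to the standard client-credentials grant, where a machine client exchanges its credentials for a bearer token before calling MCP endpoints. As a hedged illustration, this sketch only builds the token request such a client would POST to its identity provider; the endpoint and credential names are placeholders, not LiteLLM configuration keys:

```python
# Illustrative builder for an OAuth2 client-credentials token request
# (RFC 6749 section 4.4). The returned form would be POSTed to the
# authorization server; the response's access_token is then sent as an
# Authorization: Bearer header on MCP calls.

def build_client_credentials_request(token_url, client_id, client_secret, scopes=None):
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }
    if scopes:
        # Scopes are space-delimited per the OAuth2 spec.
        form["scope"] = " ".join(scopes)
    return token_url, form
```

Keeping the request builder pure like this makes it easy to test token acquisition logic without touching the network.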
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="performance--loadbalancing--reliability-improvements-8-improvements">Performance / Loadbalancing / Reliability improvements (8 improvements)<a href="https://docs.litellm.ai/release_notes/v1-82-0#performance--loadbalancing--reliability-improvements-8-improvements" class="hash-link" aria-label="Direct link to Performance / Loadbalancing / Reliability improvements (8 improvements)" title="Direct link to Performance / Loadbalancing / Reliability improvements (8 improvements)">​</a></h2>
<ul>
<li><strong>Remove orphan entries from queue</strong> - Fix memory leak in scheduler queue - <a href="https://github.com/BerriAI/litellm/pull/20866" target="_blank" rel="noopener noreferrer">PR #20866</a></li>
<li><strong>Remove repeated provider parsing</strong> in budget limiter hot path - <a href="https://github.com/BerriAI/litellm/pull/21043" target="_blank" rel="noopener noreferrer">PR #21043</a></li>
<li><strong>Use current retry exception</strong> for retry backoff instead of stale exception - <a href="https://github.com/BerriAI/litellm/pull/20725" target="_blank" rel="noopener noreferrer">PR #20725</a></li>
<li><strong>Add Semgrep &amp; fix OOMs</strong> - Static analysis rules and out-of-memory fixes - <a href="https://github.com/BerriAI/litellm/pull/20912" target="_blank" rel="noopener noreferrer">PR #20912</a></li>
<li><strong>Add Pyroscope</strong> for continuous profiling and observability - <a href="https://github.com/BerriAI/litellm/pull/21167" target="_blank" rel="noopener noreferrer">PR #21167</a></li>
<li><strong>Respect <code>ssl_verify</code></strong> with shared aiohttp sessions - <a href="https://github.com/BerriAI/litellm/pull/20349" target="_blank" rel="noopener noreferrer">PR #20349</a></li>
<li><strong>Fix shared health check serialization</strong> - <a href="https://github.com/BerriAI/litellm/pull/21119" target="_blank" rel="noopener noreferrer">PR #21119</a></li>
<li><strong>Change model mismatch logs</strong> from WARNING to DEBUG - <a href="https://github.com/BerriAI/litellm/pull/20994" target="_blank" rel="noopener noreferrer">PR #20994</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="database-changes">Database Changes<a href="https://docs.litellm.ai/release_notes/v1-82-0#database-changes" class="hash-link" aria-label="Direct link to Database Changes" title="Direct link to Database Changes">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="schema-updates">Schema Updates<a href="https://docs.litellm.ai/release_notes/v1-82-0#schema-updates" class="hash-link" aria-label="Direct link to Schema Updates" title="Direct link to Schema Updates">​</a></h3>
<table><thead><tr><th>Table</th><th>Change Type</th><th>Description</th><th>PR</th><th>Migration</th></tr></thead><tbody><tr><td><code>LiteLLM_VerificationToken</code></td><td>New Indexes</td><td>Added indexes on <code>user_id</code>+<code>team_id</code>, <code>team_id</code>, and <code>budget_reset_at</code>+<code>expires</code></td><td><a href="https://github.com/BerriAI/litellm/pull/20736" target="_blank" rel="noopener noreferrer">PR #20736</a></td><td><a href="https://github.com/BerriAI/litellm/blob/main/litellm-proxy-extras/litellm_proxy_extras/migrations/20260209085821_add_verificationtoken_indexes/migration.sql" target="_blank" rel="noopener noreferrer">Migration</a></td></tr><tr><td><code>LiteLLM_PolicyAttachmentTable</code></td><td>New Column</td><td>Added <code>tags</code> text array for policy-to-tag connections</td><td><a href="https://github.com/BerriAI/litellm/pull/21061" target="_blank" rel="noopener noreferrer">PR #21061</a></td><td><a href="https://github.com/BerriAI/litellm/blob/main/litellm-proxy-extras/litellm_proxy_extras/migrations/20260212103349_adjust_tags_policy_table/migration.sql" target="_blank" rel="noopener noreferrer">Migration</a></td></tr><tr><td><code>LiteLLM_AccessGroupTable</code></td><td>New Table</td><td>Access groups for managing model, MCP server, and agent access</td><td><a href="https://github.com/BerriAI/litellm/pull/21022" target="_blank" rel="noopener noreferrer">PR #21022</a></td><td><a href="https://github.com/BerriAI/litellm/blob/main/litellm-proxy-extras/litellm_proxy_extras/migrations/20260212143306_add_access_group_table/migration.sql" target="_blank" rel="noopener noreferrer">Migration</a></td></tr><tr><td><code>LiteLLM_AccessGroupTable</code></td><td>Column Change</td><td>Renamed <code>access_model_ids</code> to <code>access_model_names</code></td><td><a href="https://github.com/BerriAI/litellm/pull/21166" target="_blank" rel="noopener noreferrer">PR #21166</a></td><td><a 
href="https://github.com/BerriAI/litellm/blob/main/litellm-proxy-extras/litellm_proxy_extras/migrations/20260213170952_access_group_change_to_model_name/migration.sql" target="_blank" rel="noopener noreferrer">Migration</a></td></tr><tr><td><code>LiteLLM_ManagedVectorStoreTable</code></td><td>New Table</td><td>Managed vector store tracking with model mappings</td><td>-</td><td><a href="https://github.com/BerriAI/litellm/blob/main/litellm-proxy-extras/litellm_proxy_extras/migrations/20260213105436_add_managed_vector_store_table/migration.sql" target="_blank" rel="noopener noreferrer">Migration</a></td></tr><tr><td><code>LiteLLM_TeamTable</code>, <code>LiteLLM_VerificationToken</code></td><td>New Column</td><td>Added <code>access_group_ids</code> text array</td><td><a href="https://github.com/BerriAI/litellm/pull/21022" target="_blank" rel="noopener noreferrer">PR #21022</a></td><td><a href="https://github.com/BerriAI/litellm/blob/main/litellm-proxy-extras/litellm_proxy_extras/migrations/20260212143306_add_access_group_table/migration.sql" target="_blank" rel="noopener noreferrer">Migration</a></td></tr><tr><td><code>LiteLLM_GuardrailsTable</code></td><td>New Column</td><td>Added <code>team_id</code> text column</td><td>-</td><td><a href="https://github.com/BerriAI/litellm/blob/main/litellm-proxy-extras/litellm_proxy_extras/migrations/20260214094754_schema_sync/migration.sql" target="_blank" rel="noopener noreferrer">Migration</a></td></tr></tbody></table>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="documentation-updates-14-updates">Documentation Updates (14 updates)<a href="https://docs.litellm.ai/release_notes/v1-82-0#documentation-updates-14-updates" class="hash-link" aria-label="Direct link to Documentation Updates (14 updates)" title="Direct link to Documentation Updates (14 updates)">​</a></h2>
<ul>
<li>LiteLLM Observatory section added to v1.81.9 release notes - <a href="https://github.com/BerriAI/litellm/pull/20675" target="_blank" rel="noopener noreferrer">PR #20675</a></li>
<li>Callback registration optimization added to release notes - <a href="https://github.com/BerriAI/litellm/pull/20681" target="_blank" rel="noopener noreferrer">PR #20681</a></li>
<li>Middleware performance blog post - <a href="https://github.com/BerriAI/litellm/pull/20677" target="_blank" rel="noopener noreferrer">PR #20677</a></li>
<li>UI Team Soft Budget documentation - <a href="https://github.com/BerriAI/litellm/pull/20669" target="_blank" rel="noopener noreferrer">PR #20669</a></li>
<li>UI Contributing and Troubleshooting guide - <a href="https://github.com/BerriAI/litellm/pull/20674" target="_blank" rel="noopener noreferrer">PR #20674</a></li>
<li>Reorganize Admin UI subsection - <a href="https://github.com/BerriAI/litellm/pull/20676" target="_blank" rel="noopener noreferrer">PR #20676</a></li>
<li>SDK proxy authentication (OAuth2/JWT auto-refresh) - <a href="https://github.com/BerriAI/litellm/pull/20680" target="_blank" rel="noopener noreferrer">PR #20680</a></li>
<li>Forward client headers to LLM API documentation fix - <a href="https://github.com/BerriAI/litellm/pull/20768" target="_blank" rel="noopener noreferrer">PR #20768</a></li>
<li>Add docs guide for using policies - <a href="https://github.com/BerriAI/litellm/pull/20914" target="_blank" rel="noopener noreferrer">PR #20914</a></li>
<li>Add native thinking param examples for Claude Opus 4.6 - <a href="https://github.com/BerriAI/litellm/pull/20799" target="_blank" rel="noopener noreferrer">PR #20799</a></li>
<li>Fix Claude Code MCP tutorial - <a href="https://github.com/BerriAI/litellm/pull/21145" target="_blank" rel="noopener noreferrer">PR #21145</a></li>
<li>Add API base URLs for Dashscope (International and China/Beijing) - <a href="https://github.com/BerriAI/litellm/pull/21083" target="_blank" rel="noopener noreferrer">PR #21083</a></li>
<li>Fix <code>DEFAULT_NUM_WORKERS_LITELLM_PROXY</code> default (1, not 4) - <a href="https://github.com/BerriAI/litellm/pull/21127" target="_blank" rel="noopener noreferrer">PR #21127</a></li>
<li>Correct ElevenLabs support status in README - <a href="https://github.com/BerriAI/litellm/pull/20643" target="_blank" rel="noopener noreferrer">PR #20643</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-contributors">New Contributors<a href="https://docs.litellm.ai/release_notes/v1-82-0#new-contributors" class="hash-link" aria-label="Direct link to New Contributors" title="Direct link to New Contributors">​</a></h2>
<ul>
<li>@iver56 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/20643" target="_blank" rel="noopener noreferrer">PR #20643</a></li>
<li>@eliasaronson made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/20666" target="_blank" rel="noopener noreferrer">PR #20666</a></li>
<li>@NirantK made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19656" target="_blank" rel="noopener noreferrer">PR #19656</a></li>
<li>@looksgood made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/20919" target="_blank" rel="noopener noreferrer">PR #20919</a></li>
<li>@kelvin-tran made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/20548" target="_blank" rel="noopener noreferrer">PR #20548</a></li>
<li>@bluet made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/20873" target="_blank" rel="noopener noreferrer">PR #20873</a></li>
<li>@itayov made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/20729" target="_blank" rel="noopener noreferrer">PR #20729</a></li>
<li>@CSteigstra made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/20960" target="_blank" rel="noopener noreferrer">PR #20960</a></li>
<li>@rahulrd25 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/20569" target="_blank" rel="noopener noreferrer">PR #20569</a></li>
<li>@muraliavarma made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/20598" target="_blank" rel="noopener noreferrer">PR #20598</a></li>
<li>@joaokopernico made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/21039" target="_blank" rel="noopener noreferrer">PR #21039</a></li>
<li>@datzscaler made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/21077" target="_blank" rel="noopener noreferrer">PR #21077</a></li>
<li>@atapia27 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/20922" target="_blank" rel="noopener noreferrer">PR #20922</a></li>
<li>@fpagny made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/21121" target="_blank" rel="noopener noreferrer">PR #21121</a></li>
<li>@aidankovacic-8451 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/21119" target="_blank" rel="noopener noreferrer">PR #21119</a></li>
<li>@luisgallego-aily made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19935" target="_blank" rel="noopener noreferrer">PR #19935</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="full-changelog">Full Changelog<a href="https://docs.litellm.ai/release_notes/v1-82-0#full-changelog" class="hash-link" aria-label="Direct link to Full Changelog" title="Direct link to Full Changelog">​</a></h2>
<p><a href="https://github.com/BerriAI/litellm/compare/v1.81.9.rc.1...v1.81.12.rc.1" target="_blank" rel="noopener noreferrer">v1.81.9.rc.1...v1.81.12.rc.1</a></p>]]></content>
        <author>
            <name>Krrish Dholakia</name>
            <uri>https://www.linkedin.com/in/krish-d/</uri>
        </author>
        <author>
            <name>Ishaan Jaff</name>
            <uri>https://www.linkedin.com/in/reffajnaahsi/</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[v1.81.9 - Control which MCP Servers are exposed on the Internet]]></title>
        <id>https://docs.litellm.ai/release_notes/v1-81-9</id>
        <link href="https://docs.litellm.ai/release_notes/v1-81-9"/>
        <updated>2026-02-07T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[For each stable release, we now maintain a dedicated branch with the format litellm_stable_release_branch_x_xx_xx for the version.]]></summary>
        <content type="html"><![CDATA[<div class="theme-admonition theme-admonition-info admonition_xJq3 alert alert--info"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"></path></svg></span>Stable Release Branch</div><div class="admonitionContent_BuS1"><p>For each stable release, we now maintain a dedicated branch with the format <code>litellm_stable_release_branch_x_xx_xx</code> for the version.</p><p>This allows easier patching for day 0 model launches.</p><p><strong>Branch for v1.81.9:</strong> <a href="https://github.com/BerriAI/litellm/tree/litellm_stable_release_branch_1_81_9" target="_blank" rel="noopener noreferrer">litellm_stable_release_branch_1_81_9</a></p></div></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="deploy-this-version">Deploy this version<a href="https://docs.litellm.ai/release_notes/v1-81-9#deploy-this-version" class="hash-link" aria-label="Direct link to Deploy this version" title="Direct link to Deploy this version">​</a></h2>
<!-- -->
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">Docker</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">Pip</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">docker run litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker run \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-e STORE_MODEL_IN_DB=True \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-p 4000:4000 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">ghcr.io/berriai/litellm:main-v1.81.9-stable</span><br></span></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">pip install litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">pip install litellm==1.81.9</span><br></span></code></pre></div></div></div></div></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="key-highlights">Key Highlights<a href="https://docs.litellm.ai/release_notes/v1-81-9#key-highlights" class="hash-link" aria-label="Direct link to Key Highlights" title="Direct link to Key Highlights">​</a></h2>
<ul>
<li><strong>Claude Opus 4.6</strong> - <a href="https://docs.litellm.ai/blog/claude_opus_4_6">Full support across Anthropic, AWS Bedrock, Azure AI, and Vertex AI with adaptive thinking and 1M context window</a></li>
<li><strong>A2A Agent Gateway</strong> - <a href="https://docs.litellm.ai/docs/a2a_invoking_agents">Call A2A (Agent-to-Agent) registered agents through the standard <code>/chat/completions</code> API</a></li>
<li><strong>Expose MCP servers on the public internet</strong> - <a href="https://docs.litellm.ai/docs/mcp_public_internet">Launch MCP servers with public/private visibility and IP-based access control for internet-facing deployments</a></li>
<li><strong>UI Team Soft Budget Alerts</strong> - <a href="https://docs.litellm.ai/docs/proxy/ui_team_soft_budget_alerts">Set soft budgets on teams and receive email alerts when spending crosses the threshold — without blocking requests</a></li>
<li><strong>Performance Optimizations</strong> - Multiple performance improvements including ~40% Prometheus CPU reduction, LRU caching, and optimized logging paths</li>
<li><strong>LiteLLM Observatory</strong> - <a href="https://docs.litellm.ai/blog/litellm-observatory">Automated 24-hour load tests</a></li>
<li><strong>30% Faster Request Processing for Callback-Heavy Deployments</strong> - <a href="https://github.com/BerriAI/litellm/pull/20354" target="_blank" rel="noopener noreferrer">PR #20354</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="30-faster-request-processing-for-callback-heavy-deployments">30% Faster Request Processing for Callback-Heavy Deployments<a href="https://docs.litellm.ai/release_notes/v1-81-9#30-faster-request-processing-for-callback-heavy-deployments" class="hash-link" aria-label="Direct link to 30% Faster Request Processing for Callback-Heavy Deployments" title="Direct link to 30% Faster Request Processing for Callback-Heavy Deployments">​</a></h2>
<p>If you use logging callbacks like Langfuse, Datadog, or Prometheus, every request paid an unnecessary cost: three loops re-sorted your callbacks on each request, even though the callback list hadn't changed. The more callbacks you had configured, the more time was wasted. We now do this work once at startup instead of on every request. For deployments with the default callback set, this is a ~30% speedup in request setup; for deployments with many callbacks configured, the improvement is even larger.</p>
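<p>The idea can be sketched in plain Python. This is an illustrative sketch only, not LiteLLM's actual internals: instead of partitioning the callback list on every request, partition it once and reuse the result.</p>

```python
# Illustrative sketch (not LiteLLM's actual code): partition callbacks
# once instead of on every request.

class CallbackRegistry:
    def __init__(self, callbacks):
        self._callbacks = list(callbacks)
        self._partitioned = None  # computed once, reused across requests

    def _partition(self):
        # The formerly per-request work: three passes over the callback list.
        sync_cbs = [c for c in self._callbacks if not c.get("is_async")]
        async_cbs = [c for c in self._callbacks if c.get("is_async")]
        failure_cbs = [c for c in self._callbacks if c.get("on_failure")]
        return sync_cbs, async_cbs, failure_cbs

    def partitioned(self):
        # After the first call, every request is just an attribute lookup.
        if self._partitioned is None:
            self._partitioned = self._partition()
        return self._partitioned

registry = CallbackRegistry([
    {"name": "langfuse", "is_async": True},
    {"name": "prometheus", "is_async": False, "on_failure": True},
])
sync_cbs, async_cbs, failure_cbs = registry.partitioned()
```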
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="litellm-observatory">LiteLLM Observatory<a href="https://docs.litellm.ai/release_notes/v1-81-9#litellm-observatory" class="hash-link" aria-label="Direct link to LiteLLM Observatory" title="Direct link to LiteLLM Observatory">​</a></h2>
<p>LiteLLM Observatory is a long-running release-validation system we built to catch regressions before they reach users. Before every release, we run 24-hour load tests against our production deployments, surfacing issues like resource lifecycle bugs, OOMs, and CPU regressions that only appear under sustained load. The system is extensible: you can add new tests, configure models and failure thresholds, and queue runs against any deployment. Our goal is 100% coverage of LiteLLM functionality through these tests.</p>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="mcp-servers-on-the-public-internet">MCP Servers on the Public Internet<a href="https://docs.litellm.ai/release_notes/v1-81-9#mcp-servers-on-the-public-internet" class="hash-link" aria-label="Direct link to MCP Servers on the Public Internet" title="Direct link to MCP Servers on the Public Internet">​</a></h2>
<p>This release makes it safe to expose MCP servers on the public internet by adding public/private visibility and IP-based access control. You can now run internet-facing MCP services while restricting access to trusted networks and keeping internal tools private.</p>
<p><a href="https://docs.litellm.ai/docs/mcp_public_internet">Get started</a></p>
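<p>Conceptually, IP-based access control means checking the client's address against a set of trusted networks before serving the endpoint. A minimal sketch with Python's stdlib <code>ipaddress</code> module (illustrative only; the proxy's actual configuration is described in the linked docs, and the network ranges below are placeholders):</p>

```python
import ipaddress

# Illustrative sketch of IP allowlisting for an internet-facing endpoint.
# The networks below are example placeholders, not a recommended config.
TRUSTED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),      # internal VPC range
    ipaddress.ip_network("203.0.113.0/24"),  # example office egress range
]

def is_allowed(client_ip: str) -> bool:
    # Allow the request only if the client IP falls in a trusted network.
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in TRUSTED_NETWORKS)

internal_ok = is_allowed("10.1.2.3")      # inside the internal VPC range
public_blocked = is_allowed("198.51.100.7")  # outside every trusted network
```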
<div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAFCAYAAAB8ZH1oAAAACXBIWXMAACxLAAAsSwGlPZapAAAAe0lEQVR4nGXLywqDQBBE0fn/H4xCXEkCwczo+Ojpbh83mFWCBQeqFhXatqWqbjTNnbquObe7X4ScRx7PF12XyOOMFMN8w3z/sRHUVmI/k4aFWVaWsl2I7gT3lXdMxNQzThMigpmhql9nN1OCmRPTwJAnSlHO43HwlywHH6XNwQ6DbFoaAAAAAElFTkSuQmCC&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="640" height="334"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/assets/ideal-img/mcp_internet.18c425e.640.png" srcset="/assets/ideal-img/mcp_internet.18c425e.640.png 640w,/assets/ideal-img/mcp_internet.93f669b.1920.png 1920w" width="640" height="334"></noscript></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="ui-team-soft-budget-alerts">UI Team Soft Budget Alerts<a href="https://docs.litellm.ai/release_notes/v1-81-9#ui-team-soft-budget-alerts" class="hash-link" aria-label="Direct link to UI Team Soft Budget Alerts" title="Direct link to UI Team Soft Budget Alerts">​</a></h2>
<p>Set a soft budget on any team to receive email alerts when spending crosses the threshold — without blocking any requests. Configure the threshold and alerting emails directly from the Admin UI, with no proxy restart needed.</p>
<p><a href="https://docs.litellm.ai/docs/proxy/ui_team_soft_budget_alerts">Get started</a></p>
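<p>The semantics are simple: a soft budget alerts but never blocks. A minimal sketch of that check (illustrative only; the real feature is configured from the Admin UI as described in the linked docs):</p>

```python
# Illustrative sketch of soft-budget semantics: crossing the threshold
# fires an alert, but the request is always allowed through.

def check_soft_budget(spend: float, soft_budget, alert) -> bool:
    """Fire an alert when spend crosses the soft budget; never block."""
    if soft_budget is not None and spend >= soft_budget:
        alert(f"Team spend ${spend:.2f} crossed soft budget ${soft_budget:.2f}")
    return True  # a soft budget never blocks the request

sent_alerts = []
check_soft_budget(spend=120.0, soft_budget=100.0, alert=sent_alerts.append)
check_soft_budget(spend=40.0, soft_budget=100.0, alert=sent_alerts.append)
```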
<div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAFCAYAAAB8ZH1oAAAACXBIWXMAACxLAAAsSwGlPZapAAAAi0lEQVR4nE2OuwrDMBRD/f+/FsjYDvXQEKgJFBqbOPa9fpwSD6UCLeJIyDyXBWstIQRUC61Vtk15OaXW+rOZ55lpmogxDhg6D5u43SP/Ms451nXlOA689yMUScQYSDmTpVALmFLKmM5XmNIArwvxPKmt0Vqnto4REVQVyZm3z7QORZXPviOqo9h75wumUcEi5RnllAAAAABJRU5ErkJggg==&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="640" height="334"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/assets/ideal-img/ui_team_soft_budget_alerts.c3f20fa.640.png" srcset="/assets/ideal-img/ui_team_soft_budget_alerts.c3f20fa.640.png 640w,/assets/ideal-img/ui_team_soft_budget_alerts.89bac19.1920.png 1920w" width="640" height="334"></noscript></div>
<p>Let's dive in.</p>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-models--updated-models">New Models / Updated Models<a href="https://docs.litellm.ai/release_notes/v1-81-9#new-models--updated-models" class="hash-link" aria-label="Direct link to New Models / Updated Models" title="Direct link to New Models / Updated Models">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="new-model-support-13-new-models">New Model Support (13 new models)<a href="https://docs.litellm.ai/release_notes/v1-81-9#new-model-support-13-new-models" class="hash-link" aria-label="Direct link to New Model Support (13 new models)" title="Direct link to New Model Support (13 new models)">​</a></h4>
<table><thead><tr><th>Provider</th><th>Model</th><th>Context Window</th><th>Input ($/1M tokens)</th><th>Output ($/1M tokens)</th></tr></thead><tbody><tr><td>Anthropic</td><td><code>claude-opus-4-6</code></td><td>1M</td><td>$5.00</td><td>$25.00</td></tr><tr><td>AWS Bedrock</td><td><code>anthropic.claude-opus-4-6-v1</code></td><td>1M</td><td>$5.00</td><td>$25.00</td></tr><tr><td>Azure AI</td><td><code>azure_ai/claude-opus-4-6</code></td><td>200K</td><td>$5.00</td><td>$25.00</td></tr><tr><td>Vertex AI</td><td><code>vertex_ai/claude-opus-4-6</code></td><td>1M</td><td>$5.00</td><td>$25.00</td></tr><tr><td>Google Gemini</td><td><code>gemini/deep-research-pro-preview-12-2025</code></td><td>65K</td><td>$2.00</td><td>$12.00</td></tr><tr><td>Vertex AI</td><td><code>vertex_ai/deep-research-pro-preview-12-2025</code></td><td>65K</td><td>$2.00</td><td>$12.00</td></tr><tr><td>Moonshot</td><td><code>moonshot/kimi-k2.5</code></td><td>262K</td><td>$0.60</td><td>$3.00</td></tr><tr><td>OpenRouter</td><td><code>openrouter/qwen/qwen3-235b-a22b-2507</code></td><td>262K</td><td>$0.07</td><td>$0.10</td></tr><tr><td>OpenRouter</td><td><code>openrouter/qwen/qwen3-235b-a22b-thinking-2507</code></td><td>262K</td><td>$0.11</td><td>$0.60</td></tr><tr><td>Together AI</td><td><code>together_ai/zai-org/GLM-4.7</code></td><td>200K</td><td>$0.45</td><td>$2.00</td></tr><tr><td>Together AI</td><td><code>together_ai/moonshotai/Kimi-K2.5</code></td><td>256K</td><td>$0.50</td><td>$2.80</td></tr><tr><td>ElevenLabs</td><td><code>elevenlabs/eleven_v3</code></td><td>-</td><td>$0.18/1K chars</td><td>-</td></tr><tr><td>ElevenLabs</td><td><code>elevenlabs/eleven_multilingual_v2</code></td><td>-</td><td>$0.18/1K chars</td><td>-</td></tr></tbody></table>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features">Features<a href="https://docs.litellm.ai/release_notes/v1-81-9#features" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/anthropic">Anthropic</a></strong></p>
<ul>
<li>Full Claude Opus 4.6 support with adaptive thinking across all regions (us, eu, apac, au) - <a href="https://github.com/BerriAI/litellm/pull/20506" target="_blank" rel="noopener noreferrer">PR #20506</a>, <a href="https://github.com/BerriAI/litellm/pull/20508" target="_blank" rel="noopener noreferrer">PR #20508</a>, <a href="https://github.com/BerriAI/litellm/pull/20514" target="_blank" rel="noopener noreferrer">PR #20514</a>, <a href="https://github.com/BerriAI/litellm/pull/20551" target="_blank" rel="noopener noreferrer">PR #20551</a></li>
<li>Map reasoning content to anthropic thinking block (streaming + non-streaming) - <a href="https://github.com/BerriAI/litellm/pull/20254" target="_blank" rel="noopener noreferrer">PR #20254</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/bedrock">AWS Bedrock</a></strong></p>
<ul>
<li>Add 1hr tiered caching costs for long-context models - <a href="https://github.com/BerriAI/litellm/pull/20214" target="_blank" rel="noopener noreferrer">PR #20214</a></li>
<li>Support TTL (1h) field in prompt caching for Bedrock Claude 4.5 models - <a href="https://github.com/BerriAI/litellm/pull/20338" target="_blank" rel="noopener noreferrer">PR #20338</a></li>
<li>Add Nova Sonic speech-to-speech model support - <a href="https://github.com/BerriAI/litellm/pull/20244" target="_blank" rel="noopener noreferrer">PR #20244</a></li>
<li>Fix empty assistant message for Converse API - <a href="https://github.com/BerriAI/litellm/pull/20390" target="_blank" rel="noopener noreferrer">PR #20390</a></li>
<li>Fix content blocked handling - <a href="https://github.com/BerriAI/litellm/pull/20606" target="_blank" rel="noopener noreferrer">PR #20606</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/gemini">Google Gemini / Vertex AI</a></strong></p>
<ul>
<li>Add Gemini Deep Research model support - <a href="https://github.com/BerriAI/litellm/pull/20406" target="_blank" rel="noopener noreferrer">PR #20406</a></li>
<li>Fix Vertex AI Gemini streaming content_filter handling - <a href="https://github.com/BerriAI/litellm/pull/20105" target="_blank" rel="noopener noreferrer">PR #20105</a></li>
<li>Allow using OpenAI-style tools for <code>web_search</code> with Vertex AI/Gemini models - <a href="https://github.com/BerriAI/litellm/pull/20280" target="_blank" rel="noopener noreferrer">PR #20280</a></li>
<li>Fix <code>supports_native_streaming</code> for Gemini and Vertex AI models - <a href="https://github.com/BerriAI/litellm/pull/20408" target="_blank" rel="noopener noreferrer">PR #20408</a></li>
<li>Add mapping for responses tools in file IDs - <a href="https://github.com/BerriAI/litellm/pull/20402" target="_blank" rel="noopener noreferrer">PR #20402</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/cohere">Cohere</a></strong></p>
<ul>
<li>Support <code>dimensions</code> param for Cohere embed v4 - <a href="https://github.com/BerriAI/litellm/pull/20235" target="_blank" rel="noopener noreferrer">PR #20235</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/cerebras">Cerebras</a></strong></p>
<ul>
<li>Add reasoning param support for GPT OSS Cerebras - <a href="https://github.com/BerriAI/litellm/pull/20258" target="_blank" rel="noopener noreferrer">PR #20258</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/moonshot">Moonshot</a></strong></p>
<ul>
<li>Add Kimi K2.5 model entries - <a href="https://github.com/BerriAI/litellm/pull/20273" target="_blank" rel="noopener noreferrer">PR #20273</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/openrouter">OpenRouter</a></strong></p>
<ul>
<li>Add Qwen3-235B models - <a href="https://github.com/BerriAI/litellm/pull/20455" target="_blank" rel="noopener noreferrer">PR #20455</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/togetherai">Together AI</a></strong></p>
<ul>
<li>Add GLM-4.7 and Kimi-K2.5 models - <a href="https://github.com/BerriAI/litellm/pull/20319" target="_blank" rel="noopener noreferrer">PR #20319</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/elevenlabs">ElevenLabs</a></strong></p>
<ul>
<li>Add <code>eleven_v3</code> and <code>eleven_multilingual_v2</code> TTS models - <a href="https://github.com/BerriAI/litellm/pull/20522" target="_blank" rel="noopener noreferrer">PR #20522</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/vercel_ai_gateway">Vercel AI Gateway</a></strong></p>
<ul>
<li>Add missing capability flags to models - <a href="https://github.com/BerriAI/litellm/pull/20276" target="_blank" rel="noopener noreferrer">PR #20276</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/github_copilot">GitHub Copilot</a></strong></p>
<ul>
<li>Fix system prompts being dropped and auto-add required Copilot headers - <a href="https://github.com/BerriAI/litellm/pull/20113" target="_blank" rel="noopener noreferrer">PR #20113</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/gigachat">GigaChat</a></strong></p>
<ul>
<li>Fix incorrect merging of consecutive user messages for GigaChat provider - <a href="https://github.com/BerriAI/litellm/pull/20341" target="_blank" rel="noopener noreferrer">PR #20341</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/xai_realtime">xAI</a></strong></p>
<ul>
<li>Add xAI <code>/realtime</code> API support - works with LiveKit SDK - <a href="https://github.com/BerriAI/litellm/pull/20381" target="_blank" rel="noopener noreferrer">PR #20381</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/openai">OpenAI</a></strong></p>
<ul>
<li>Add <code>gpt-5-search-api</code> model and docs clarifications - <a href="https://github.com/BerriAI/litellm/pull/20512" target="_blank" rel="noopener noreferrer">PR #20512</a></li>
</ul>
</li>
</ul>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="bug-fixes">Bug Fixes<a href="https://docs.litellm.ai/release_notes/v1-81-9#bug-fixes" class="hash-link" aria-label="Direct link to Bug Fixes" title="Direct link to Bug Fixes">​</a></h3>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/anthropic">Anthropic</a></strong></p>
<ul>
<li>Fix extra inputs not permitted error for <code>provider_specific_fields</code> - <a href="https://github.com/BerriAI/litellm/pull/20334" target="_blank" rel="noopener noreferrer">PR #20334</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/bedrock">AWS Bedrock</a></strong></p>
<ul>
<li>Fix: Managed Batches inconsistent state management for list and cancel batches - <a href="https://github.com/BerriAI/litellm/pull/20331" target="_blank" rel="noopener noreferrer">PR #20331</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/openai">OpenAI Embeddings</a></strong></p>
<ul>
<li>Fix <code>open_ai_embedding_models</code> to have <code>custom_llm_provider</code> None - <a href="https://github.com/BerriAI/litellm/pull/20253" target="_blank" rel="noopener noreferrer">PR #20253</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="llm-api-endpoints">LLM API Endpoints<a href="https://docs.litellm.ai/release_notes/v1-81-9#llm-api-endpoints" class="hash-link" aria-label="Direct link to LLM API Endpoints" title="Direct link to LLM API Endpoints">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-1">Features<a href="https://docs.litellm.ai/release_notes/v1-81-9#features-1" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/anthropic">Messages API</a></strong></p>
<ul>
<li>Filter unsupported Claude Code beta headers for non-Anthropic providers - <a href="https://github.com/BerriAI/litellm/pull/20578" target="_blank" rel="noopener noreferrer">PR #20578</a></li>
<li>Fix inconsistent response format in <code>anthropic.messages.acreate()</code> when using non-Anthropic providers - <a href="https://github.com/BerriAI/litellm/pull/20442" target="_blank" rel="noopener noreferrer">PR #20442</a></li>
<li>Fix 404 on <code>/api/event_logging/batch</code> endpoint that caused Claude Code "route not found" errors - <a href="https://github.com/BerriAI/litellm/pull/20504" target="_blank" rel="noopener noreferrer">PR #20504</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/a2a">A2A Agent Gateway</a></strong></p>
<ul>
<li>Allow calling A2A agents through LiteLLM <code>/chat/completions</code> API - <a href="https://github.com/BerriAI/litellm/pull/20358" target="_blank" rel="noopener noreferrer">PR #20358</a></li>
<li>Use A2A registered agents with <code>/chat/completions</code> - <a href="https://github.com/BerriAI/litellm/pull/20362" target="_blank" rel="noopener noreferrer">PR #20362</a></li>
<li>Fix A2A agents deployed with localhost/internal URLs in their agent cards - <a href="https://github.com/BerriAI/litellm/pull/20604" target="_blank" rel="noopener noreferrer">PR #20604</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/gemini">Files API</a></strong></p>
<ul>
<li>Add support for delete and GET via file_id for Gemini - <a href="https://github.com/BerriAI/litellm/pull/20329" target="_blank" rel="noopener noreferrer">PR #20329</a></li>
</ul>
</li>
<li>
<p><strong>General</strong></p>
<ul>
<li>Add User-Agent customization support - <a href="https://github.com/BerriAI/litellm/pull/19881" target="_blank" rel="noopener noreferrer">PR #19881</a></li>
<li>Fix search tools not found when using per-request routers - <a href="https://github.com/BerriAI/litellm/pull/19818" target="_blank" rel="noopener noreferrer">PR #19818</a></li>
<li>Forward extra headers in chat - <a href="https://github.com/BerriAI/litellm/pull/20386" target="_blank" rel="noopener noreferrer">PR #20386</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="management-endpoints--ui">Management Endpoints / UI<a href="https://docs.litellm.ai/release_notes/v1-81-9#management-endpoints--ui" class="hash-link" aria-label="Direct link to Management Endpoints / UI" title="Direct link to Management Endpoints / UI">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-2">Features<a href="https://docs.litellm.ai/release_notes/v1-81-9#features-2" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong>SSO Configuration</strong></p>
<ul>
<li>SSO Config Team Mappings - <a href="https://github.com/BerriAI/litellm/pull/20111" target="_blank" rel="noopener noreferrer">PR #20111</a></li>
<li>UI - SSO: Add Team Mappings - <a href="https://github.com/BerriAI/litellm/pull/20299" target="_blank" rel="noopener noreferrer">PR #20299</a></li>
<li>Extract user roles from JWT access token for Keycloak compatibility - <a href="https://github.com/BerriAI/litellm/pull/20591" target="_blank" rel="noopener noreferrer">PR #20591</a></li>
</ul>
</li>
<li>
<p><strong>Auth / SDK</strong></p>
<ul>
<li>Add <code>proxy_auth</code> for auto OAuth2/JWT token management in SDK - <a href="https://github.com/BerriAI/litellm/pull/20238" target="_blank" rel="noopener noreferrer">PR #20238</a></li>
</ul>
</li>
<li>
<p><strong>Virtual Keys</strong></p>
<ul>
<li>Key <code>reset_spend</code> endpoint - <a href="https://github.com/BerriAI/litellm/pull/20305" target="_blank" rel="noopener noreferrer">PR #20305</a></li>
<li>UI - Keys: Allowed Routes to Key Info and Edit Pages - <a href="https://github.com/BerriAI/litellm/pull/20369" target="_blank" rel="noopener noreferrer">PR #20369</a></li>
<li>Add Key info endpoint object permission data - <a href="https://github.com/BerriAI/litellm/pull/20407" target="_blank" rel="noopener noreferrer">PR #20407</a></li>
<li>Keys and Teams Router Setting + Allow Override of Router Settings - <a href="https://github.com/BerriAI/litellm/pull/20205" target="_blank" rel="noopener noreferrer">PR #20205</a></li>
</ul>
</li>
<li>
<p><strong>Teams &amp; Budgets</strong></p>
<ul>
<li>Add <code>soft_budget</code> to Team Table + Create/Update Endpoints - <a href="https://github.com/BerriAI/litellm/pull/20530" target="_blank" rel="noopener noreferrer">PR #20530</a></li>
<li>Team Soft Budget Email Alerts - <a href="https://github.com/BerriAI/litellm/pull/20553" target="_blank" rel="noopener noreferrer">PR #20553</a></li>
<li>UI - Team Settings: Soft Budget + Alerting Emails - <a href="https://github.com/BerriAI/litellm/pull/20634" target="_blank" rel="noopener noreferrer">PR #20634</a></li>
<li>UI - User Budget Page: Unlimited Budget Checkbox - <a href="https://github.com/BerriAI/litellm/pull/20380" target="_blank" rel="noopener noreferrer">PR #20380</a></li>
<li><code>/user/update</code> allow for <code>max_budget</code> resets - <a href="https://github.com/BerriAI/litellm/pull/20375" target="_blank" rel="noopener noreferrer">PR #20375</a></li>
</ul>
</li>
<li>
<p><strong>UI Improvements</strong></p>
<ul>
<li>Default Team Settings: Migrate to use Reusable Model Select - <a href="https://github.com/BerriAI/litellm/pull/20310" target="_blank" rel="noopener noreferrer">PR #20310</a></li>
<li>Navbar: Option to Hide Community Engagement Buttons - <a href="https://github.com/BerriAI/litellm/pull/20308" target="_blank" rel="noopener noreferrer">PR #20308</a></li>
<li>Show team alias on Models health page - <a href="https://github.com/BerriAI/litellm/pull/20359" target="_blank" rel="noopener noreferrer">PR #20359</a></li>
<li>Admin Settings: Add option for Authentication for public AI Hub - <a href="https://github.com/BerriAI/litellm/pull/20444" target="_blank" rel="noopener noreferrer">PR #20444</a></li>
<li>Adjust daily spend date filtering for user timezone - <a href="https://github.com/BerriAI/litellm/pull/20472" target="_blank" rel="noopener noreferrer">PR #20472</a></li>
</ul>
</li>
<li>
<p><strong>SCIM</strong></p>
<ul>
<li>Add base <code>/scim/v2</code> endpoint for SCIM resource discovery - <a href="https://github.com/BerriAI/litellm/pull/20301" target="_blank" rel="noopener noreferrer">PR #20301</a></li>
</ul>
</li>
<li>
<p><strong>Proxy CLI</strong></p>
<ul>
<li>CLI arguments for RDS IAM auth - <a href="https://github.com/BerriAI/litellm/pull/20437" target="_blank" rel="noopener noreferrer">PR #20437</a></li>
</ul>
</li>
</ul>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="bugs">Bugs<a href="https://docs.litellm.ai/release_notes/v1-81-9#bugs" class="hash-link" aria-label="Direct link to Bugs" title="Direct link to Bugs">​</a></h4>
<ul>
<li>Fix: Remove unnecessary key blocking on UI login that prevented access - <a href="https://github.com/BerriAI/litellm/pull/20210" target="_blank" rel="noopener noreferrer">PR #20210</a></li>
<li>UI - Team Settings: Disable Global Guardrail Persistence - <a href="https://github.com/BerriAI/litellm/pull/20307" target="_blank" rel="noopener noreferrer">PR #20307</a></li>
<li>UI - Model Info Page: Fix Input and Output Labels - <a href="https://github.com/BerriAI/litellm/pull/20462" target="_blank" rel="noopener noreferrer">PR #20462</a></li>
<li>UI - Model Page: Column Resizing on Smaller Screens - <a href="https://github.com/BerriAI/litellm/pull/20599" target="_blank" rel="noopener noreferrer">PR #20599</a></li>
<li>Fix <code>/key/list</code> <code>user_id</code> Empty String Edge Case - <a href="https://github.com/BerriAI/litellm/pull/20623" target="_blank" rel="noopener noreferrer">PR #20623</a></li>
<li>Add array type checks for model, agent, and MCP hub data to prevent UI crashes - <a href="https://github.com/BerriAI/litellm/pull/20469" target="_blank" rel="noopener noreferrer">PR #20469</a></li>
<li>Fix unique constraint on daily tables + logging when updates fail - <a href="https://github.com/BerriAI/litellm/pull/20394" target="_blank" rel="noopener noreferrer">PR #20394</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="logging--guardrail--prompt-management-integrations">Logging / Guardrail / Prompt Management Integrations<a href="https://docs.litellm.ai/release_notes/v1-81-9#logging--guardrail--prompt-management-integrations" class="hash-link" aria-label="Direct link to Logging / Guardrail / Prompt Management Integrations" title="Direct link to Logging / Guardrail / Prompt Management Integrations">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="bug-fixes-3-fixes">Bug Fixes (3 fixes)<a href="https://docs.litellm.ai/release_notes/v1-81-9#bug-fixes-3-fixes" class="hash-link" aria-label="Direct link to Bug Fixes (3 fixes)" title="Direct link to Bug Fixes (3 fixes)">​</a></h4>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/logging#langfuse">Langfuse</a></strong></p>
<ul>
<li>Fix Langfuse OTEL trace export failing when spans contain null attributes - <a href="https://github.com/BerriAI/litellm/pull/20382" target="_blank" rel="noopener noreferrer">PR #20382</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/logging#prometheus">Prometheus</a></strong></p>
<ul>
<li>Fix incorrect failure metrics labels causing miscounted error rates - <a href="https://github.com/BerriAI/litellm/pull/20152" target="_blank" rel="noopener noreferrer">PR #20152</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/alerting">Slack Alerts</a></strong></p>
<ul>
<li>Fix Slack alert delivery failing for certain budget threshold configurations - <a href="https://github.com/BerriAI/litellm/pull/20257" target="_blank" rel="noopener noreferrer">PR #20257</a></li>
</ul>
</li>
</ul>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="guardrails-7-updates">Guardrails (7 updates)<a href="https://docs.litellm.ai/release_notes/v1-81-9#guardrails-7-updates" class="hash-link" aria-label="Direct link to Guardrails (7 updates)" title="Direct link to Guardrails (7 updates)">​</a></h4>
<ul>
<li>
<p><strong>Custom Code Guardrails</strong></p>
<ul>
<li>Add HTTP support to custom code guardrails + Unified guardrails for MCP + Agent guardrail support - <a href="https://github.com/BerriAI/litellm/pull/20619" target="_blank" rel="noopener noreferrer">PR #20619</a></li>
<li>Custom Code Guardrails UI Playground - <a href="https://github.com/BerriAI/litellm/pull/20377" target="_blank" rel="noopener noreferrer">PR #20377</a></li>
</ul>
</li>
<li>
<p><strong>Team Bring-Your-Own Guardrails</strong></p>
<ul>
<li>Implement team-based isolation guardrails management - <a href="https://github.com/BerriAI/litellm/pull/20318" target="_blank" rel="noopener noreferrer">PR #20318</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/apply_guardrail">OpenAI Moderations</a></strong></p>
<ul>
<li>Ensure OpenAI Moderations Guard works with OpenAI Embeddings - <a href="https://github.com/BerriAI/litellm/pull/20523" target="_blank" rel="noopener noreferrer">PR #20523</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/apply_guardrail">GraySwan / Cygnal</a></strong></p>
<ul>
<li>Fix fail-open for GraySwan and pass metadata to Cygnal API endpoint - <a href="https://github.com/BerriAI/litellm/pull/19837" target="_blank" rel="noopener noreferrer">PR #19837</a></li>
</ul>
</li>
<li>
<p><strong>General</strong></p>
<ul>
<li>Check for <code>model_response_choices</code> before guardrail input - <a href="https://github.com/BerriAI/litellm/pull/19784" target="_blank" rel="noopener noreferrer">PR #19784</a></li>
<li>Preserve streaming content on guardrail-sampled chunks - <a href="https://github.com/BerriAI/litellm/pull/20027" target="_blank" rel="noopener noreferrer">PR #20027</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="spend-tracking-budgets-and-rate-limiting">Spend Tracking, Budgets and Rate Limiting<a href="https://docs.litellm.ai/release_notes/v1-81-9#spend-tracking-budgets-and-rate-limiting" class="hash-link" aria-label="Direct link to Spend Tracking, Budgets and Rate Limiting" title="Direct link to Spend Tracking, Budgets and Rate Limiting">​</a></h2>
<ul>
<li><strong>Support 0 cost models</strong> - Allow zero-cost model entries for internal/free-tier models - <a href="https://github.com/BerriAI/litellm/pull/20249" target="_blank" rel="noopener noreferrer">PR #20249</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="mcp-gateway-9-updates">MCP Gateway (9 updates)<a href="https://docs.litellm.ai/release_notes/v1-81-9#mcp-gateway-9-updates" class="hash-link" aria-label="Direct link to MCP Gateway (9 updates)" title="Direct link to MCP Gateway (9 updates)">​</a></h2>
<ul>
<li><strong>MCP Semantic Filtering</strong> - Filter MCP tools using semantic similarity to reduce tool sprawl for LLM calls - <a href="https://github.com/BerriAI/litellm/pull/20296" target="_blank" rel="noopener noreferrer">PR #20296</a>, <a href="https://github.com/BerriAI/litellm/pull/20316" target="_blank" rel="noopener noreferrer">PR #20316</a></li>
<li><strong>UI - MCP Semantic Filtering</strong> - Add support for MCP Semantic Filtering configuration on UI - <a href="https://github.com/BerriAI/litellm/pull/20454" target="_blank" rel="noopener noreferrer">PR #20454</a></li>
<li><strong>MCP IP-Based Access Control</strong> - Set MCP servers as private/public available on internet with IP-based restrictions - <a href="https://github.com/BerriAI/litellm/pull/20607" target="_blank" rel="noopener noreferrer">PR #20607</a>, <a href="https://github.com/BerriAI/litellm/pull/20620" target="_blank" rel="noopener noreferrer">PR #20620</a></li>
<li><strong>Fix MCP "Session not found" error</strong> on VSCode reconnect - <a href="https://github.com/BerriAI/litellm/pull/20298" target="_blank" rel="noopener noreferrer">PR #20298</a></li>
<li><strong>Fix OAuth2 'Capabilities: none' bug</strong> for upstream MCP servers - <a href="https://github.com/BerriAI/litellm/pull/20602" target="_blank" rel="noopener noreferrer">PR #20602</a></li>
<li><strong>Include Config Defined Search Tools</strong> in <code>/search_tools/list</code> - <a href="https://github.com/BerriAI/litellm/pull/20371" target="_blank" rel="noopener noreferrer">PR #20371</a></li>
<li><strong>UI - Search Tools</strong>: Show Config Defined Search Tools - <a href="https://github.com/BerriAI/litellm/pull/20436" target="_blank" rel="noopener noreferrer">PR #20436</a></li>
<li><strong>Ensure MCP permissions are enforced</strong> when using JWT Auth - <a href="https://github.com/BerriAI/litellm/pull/20383" target="_blank" rel="noopener noreferrer">PR #20383</a></li>
<li><strong>Fix <code>gcs_bucket_name</code> not being passed</strong> correctly for MCP server storage configuration - <a href="https://github.com/BerriAI/litellm/pull/20491" target="_blank" rel="noopener noreferrer">PR #20491</a></li>
</ul>
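<p>Semantic tool filtering ranks the available MCP tools by similarity to the request and forwards only the top matches, so the model isn't overwhelmed by a long tool list. A toy sketch using word overlap as a stand-in for real embedding similarity (illustrative only; the tool names below are made up):</p>

```python
# Toy sketch of semantic tool filtering: keep only the tools whose
# descriptions best match the request. Real implementations use embedding
# similarity; Jaccard word overlap stands in for it here.

def jaccard(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def filter_tools(query: str, tools: dict, top_k: int = 2) -> list:
    # Rank tool names by description similarity to the query, keep top_k.
    ranked = sorted(tools, key=lambda name: jaccard(query, tools[name]), reverse=True)
    return ranked[:top_k]

tools = {
    "create_jira_ticket": "create a new jira issue ticket",
    "search_docs": "search the internal documentation",
    "send_email": "send an email to a user",
}
top = filter_tools("search our documentation for the deploy guide", tools, top_k=1)
```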
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="performance--loadbalancing--reliability-improvements-14-improvements">Performance / Loadbalancing / Reliability improvements (14 improvements)<a href="https://docs.litellm.ai/release_notes/v1-81-9#performance--loadbalancing--reliability-improvements-14-improvements" class="hash-link" aria-label="Direct link to Performance / Loadbalancing / Reliability improvements (14 improvements)" title="Direct link to Performance / Loadbalancing / Reliability improvements (14 improvements)">​</a></h2>
<ul>
<li><strong>Prometheus ~40% CPU reduction</strong> - Parallelize budget metrics, fix caching bug, reduce CPU usage - <a href="https://github.com/BerriAI/litellm/pull/20544" target="_blank" rel="noopener noreferrer">PR #20544</a></li>
<li><strong>Prevent closed client errors</strong> by reverting httpx client caching - <a href="https://github.com/BerriAI/litellm/pull/20025" target="_blank" rel="noopener noreferrer">PR #20025</a></li>
<li><strong>Avoid unnecessary Router creation</strong> when no models or search tools are configured - <a href="https://github.com/BerriAI/litellm/pull/20661" target="_blank" rel="noopener noreferrer">PR #20661</a></li>
<li><strong>Optimize <code>wrapper_async</code></strong> with <code>CallTypes</code> caching and reduced lookups - <a href="https://github.com/BerriAI/litellm/pull/20204" target="_blank" rel="noopener noreferrer">PR #20204</a></li>
<li><strong>Cache <code>_get_relevant_args_to_use_for_logging()</code></strong> at module level - <a href="https://github.com/BerriAI/litellm/pull/20077" target="_blank" rel="noopener noreferrer">PR #20077</a></li>
<li><strong>LRU cache for <code>normalize_request_route</code></strong> - <a href="https://github.com/BerriAI/litellm/pull/19812" target="_blank" rel="noopener noreferrer">PR #19812</a></li>
<li><strong>Optimize <code>get_standard_logging_metadata</code></strong> with set intersection - <a href="https://github.com/BerriAI/litellm/pull/19685" target="_blank" rel="noopener noreferrer">PR #19685</a></li>
<li><strong>Early-exit guards in <code>completion_cost</code></strong> for unused features - <a href="https://github.com/BerriAI/litellm/pull/20020" target="_blank" rel="noopener noreferrer">PR #20020</a></li>
<li><strong>Optimize <code>get_litellm_params</code></strong> with sparse kwargs extraction - <a href="https://github.com/BerriAI/litellm/pull/19884" target="_blank" rel="noopener noreferrer">PR #19884</a></li>
<li><strong>Guard debug log f-strings</strong> and remove redundant dict copies - <a href="https://github.com/BerriAI/litellm/pull/19961" target="_blank" rel="noopener noreferrer">PR #19961</a></li>
<li><strong>Replace enum construction with frozenset lookup</strong> - <a href="https://github.com/BerriAI/litellm/pull/20302" target="_blank" rel="noopener noreferrer">PR #20302</a></li>
<li><strong>Guard debug f-string in <code>update_environment_variables</code></strong> - <a href="https://github.com/BerriAI/litellm/pull/20360" target="_blank" rel="noopener noreferrer">PR #20360</a></li>
<li><strong>Warn when budget lookup fails</strong> to surface silent caching misses - <a href="https://github.com/BerriAI/litellm/pull/20545" target="_blank" rel="noopener noreferrer">PR #20545</a></li>
<li><strong>Add INFO-level session reuse logging</strong> per request for better observability - <a href="https://github.com/BerriAI/litellm/pull/20597" target="_blank" rel="noopener noreferrer">PR #20597</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="database-changes">Database Changes<a href="https://docs.litellm.ai/release_notes/v1-81-9#database-changes" class="hash-link" aria-label="Direct link to Database Changes" title="Direct link to Database Changes">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="schema-updates">Schema Updates<a href="https://docs.litellm.ai/release_notes/v1-81-9#schema-updates" class="hash-link" aria-label="Direct link to Schema Updates" title="Direct link to Schema Updates">​</a></h3>
<table><thead><tr><th>Table</th><th>Change Type</th><th>Description</th><th>PR</th><th>Migration</th></tr></thead><tbody><tr><td><code>LiteLLM_TeamTable</code></td><td>New Column</td><td>Added <code>allow_team_guardrail_config</code> boolean field for team-based guardrail isolation</td><td><a href="https://github.com/BerriAI/litellm/pull/20318" target="_blank" rel="noopener noreferrer">PR #20318</a></td><td><a href="https://github.com/BerriAI/litellm/blob/main/litellm-proxy-extras/litellm_proxy_extras/migrations/20260205091235_allow_team_guardrail_config/migration.sql" target="_blank" rel="noopener noreferrer">Migration</a></td></tr><tr><td><code>LiteLLM_DeletedTeamTable</code></td><td>New Column</td><td>Added <code>allow_team_guardrail_config</code> boolean field</td><td><a href="https://github.com/BerriAI/litellm/pull/20318" target="_blank" rel="noopener noreferrer">PR #20318</a></td><td><a href="https://github.com/BerriAI/litellm/blob/main/litellm-proxy-extras/litellm_proxy_extras/migrations/20260205091235_allow_team_guardrail_config/migration.sql" target="_blank" rel="noopener noreferrer">Migration</a></td></tr><tr><td><code>LiteLLM_TeamTable</code></td><td>New Column</td><td>Added <code>soft_budget</code> (double precision) for soft budget alerting</td><td><a href="https://github.com/BerriAI/litellm/pull/20530" target="_blank" rel="noopener noreferrer">PR #20530</a></td><td><a href="https://github.com/BerriAI/litellm/blob/main/litellm-proxy-extras/litellm_proxy_extras/migrations/20260205144610_add_soft_budget_to_team_table/migration.sql" target="_blank" rel="noopener noreferrer">Migration</a></td></tr><tr><td><code>LiteLLM_DeletedTeamTable</code></td><td>New Column</td><td>Added <code>soft_budget</code> (double precision)</td><td><a href="https://github.com/BerriAI/litellm/pull/20653" target="_blank" rel="noopener noreferrer">PR #20653</a></td><td><a 
href="https://github.com/BerriAI/litellm/blob/main/litellm-proxy-extras/litellm_proxy_extras/migrations/20260207110613_add_soft_budget_to_deleted_teams_table/migration.sql" target="_blank" rel="noopener noreferrer">Migration</a></td></tr><tr><td><code>LiteLLM_MCPServerTable</code></td><td>New Column</td><td>Added <code>available_on_public_internet</code> boolean for MCP IP-based access control</td><td><a href="https://github.com/BerriAI/litellm/pull/20607" target="_blank" rel="noopener noreferrer">PR #20607</a></td><td><a href="https://github.com/BerriAI/litellm/blob/main/litellm-proxy-extras/litellm_proxy_extras/migrations/20260207093506_add_available_on_public_internet_to_mcp_servers/migration.sql" target="_blank" rel="noopener noreferrer">Migration</a></td></tr></tbody></table>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="documentation-updates-14-updates">Documentation Updates (14 updates)<a href="https://docs.litellm.ai/release_notes/v1-81-9#documentation-updates-14-updates" class="hash-link" aria-label="Direct link to Documentation Updates (14 updates)" title="Direct link to Documentation Updates (14 updates)">​</a></h2>
<ul>
<li>Add FAQ for setting up and verifying LITELLM_LICENSE - <a href="https://github.com/BerriAI/litellm/pull/20284" target="_blank" rel="noopener noreferrer">PR #20284</a></li>
<li>Model request tags documentation - <a href="https://github.com/BerriAI/litellm/pull/20290" target="_blank" rel="noopener noreferrer">PR #20290</a></li>
<li>Add Prisma migration troubleshooting guide - <a href="https://github.com/BerriAI/litellm/pull/20300" target="_blank" rel="noopener noreferrer">PR #20300</a></li>
<li>MCP Semantic Filtering documentation - <a href="https://github.com/BerriAI/litellm/pull/20316" target="_blank" rel="noopener noreferrer">PR #20316</a></li>
<li>Add CopilotKit SDK doc as supported agents SDK - <a href="https://github.com/BerriAI/litellm/pull/20396" target="_blank" rel="noopener noreferrer">PR #20396</a></li>
<li>Add documentation for Nova Sonic - <a href="https://github.com/BerriAI/litellm/pull/20320" target="_blank" rel="noopener noreferrer">PR #20320</a></li>
<li>Update Vertex AI Text to Speech doc to show use of audio - <a href="https://github.com/BerriAI/litellm/pull/20255" target="_blank" rel="noopener noreferrer">PR #20255</a></li>
<li>Improve Okta SSO setup guide with step-by-step instructions - <a href="https://github.com/BerriAI/litellm/pull/20353" target="_blank" rel="noopener noreferrer">PR #20353</a></li>
<li>Langfuse doc update - <a href="https://github.com/BerriAI/litellm/pull/20443" target="_blank" rel="noopener noreferrer">PR #20443</a></li>
<li>Expose MCPs on public internet documentation - <a href="https://github.com/BerriAI/litellm/pull/20626" target="_blank" rel="noopener noreferrer">PR #20626</a></li>
<li>Add blog post: Achieving Sub-Millisecond Proxy Overhead - <a href="https://github.com/BerriAI/litellm/pull/20309" target="_blank" rel="noopener noreferrer">PR #20309</a></li>
<li>Add blog post about litellm-observatory - <a href="https://github.com/BerriAI/litellm/pull/20622" target="_blank" rel="noopener noreferrer">PR #20622</a></li>
<li>Update Opus 4.6 blog with adaptive thinking - <a href="https://github.com/BerriAI/litellm/pull/20637" target="_blank" rel="noopener noreferrer">PR #20637</a></li>
<li><code>gpt-5-search-api</code> docs clarifications - <a href="https://github.com/BerriAI/litellm/pull/20512" target="_blank" rel="noopener noreferrer">PR #20512</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-contributors">New Contributors<a href="https://docs.litellm.ai/release_notes/v1-81-9#new-contributors" class="hash-link" aria-label="Direct link to New Contributors" title="Direct link to New Contributors">​</a></h2>
<ul>
<li>@Quentin-M made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19818" target="_blank" rel="noopener noreferrer">PR #19818</a></li>
<li>@amirzaushnizer made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/20235" target="_blank" rel="noopener noreferrer">PR #20235</a></li>
<li>@cscguochang made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/20214" target="_blank" rel="noopener noreferrer">PR #20214</a></li>
<li>@krauckbot made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/20273" target="_blank" rel="noopener noreferrer">PR #20273</a></li>
<li>@agrattan0820 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19784" target="_blank" rel="noopener noreferrer">PR #19784</a></li>
<li>@nina-hu made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/20472" target="_blank" rel="noopener noreferrer">PR #20472</a></li>
<li>@swayambhu94 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/20469" target="_blank" rel="noopener noreferrer">PR #20469</a></li>
<li>@ssadedin made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/20566" target="_blank" rel="noopener noreferrer">PR #20566</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="full-changelog">Full Changelog<a href="https://docs.litellm.ai/release_notes/v1-81-9#full-changelog" class="hash-link" aria-label="Direct link to Full Changelog" title="Direct link to Full Changelog">​</a></h2>
<p><a href="https://github.com/BerriAI/litellm/compare/v1.81.6-nightly...v1.81.9" target="_blank" rel="noopener noreferrer">v1.81.6-nightly...v1.81.9</a></p>]]></content>
        <author>
            <name>Krrish Dholakia</name>
            <uri>https://www.linkedin.com/in/krish-d/</uri>
        </author>
        <author>
            <name>Ishaan Jaff</name>
            <uri>https://www.linkedin.com/in/reffajnaahsi/</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[[Preview] v1.81.6 - Logs v2 with Tool Call Tracing]]></title>
        <id>https://docs.litellm.ai/release_notes/v1-81-6</id>
        <link href="https://docs.litellm.ai/release_notes/v1-81-6"/>
        <updated>2026-01-31T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[This release had known issues with CPU usage. This has been fixed in v1.81.9-stable.]]></summary>
        <content type="html"><![CDATA[<div class="theme-admonition theme-admonition-danger admonition_xJq3 alert alert--danger"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M5.05.31c.81 2.17.41 3.38-.52 4.31C3.55 5.67 1.98 6.45.9 7.98c-1.45 2.05-1.7 6.53 3.53 7.7-2.2-1.16-2.67-4.52-.3-6.61-.61 2.03.53 3.33 1.94 2.86 1.39-.47 2.3.53 2.27 1.67-.02.78-.31 1.44-1.13 1.81 3.42-.59 4.78-3.42 4.78-5.56 0-2.84-2.53-3.22-1.25-5.61-1.52.13-2.03 1.13-1.89 2.75.09 1.08-1.02 1.8-1.86 1.33-.67-.41-.66-1.19-.06-1.78C8.18 5.31 8.68 2.45 5.05.32L5.03.3l.02.01z"></path></svg></span>Known Issue - CPU Usage</div><div class="admonitionContent_BuS1"><p>This release had known issues with CPU usage. This has been fixed in <a href="https://docs.litellm.ai/release_notes/v1-81-9">v1.81.9-stable</a>.</p><p><strong>We recommend using v1.81.9-stable instead.</strong></p></div></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="deploy-this-version">Deploy this version<a href="https://docs.litellm.ai/release_notes/v1-81-6#deploy-this-version" class="hash-link" aria-label="Direct link to Deploy this version" title="Direct link to Deploy this version">​</a></h2>
<!-- -->
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">Docker</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">Pip</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker run \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-e STORE_MODEL_IN_DB=True \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-p 4000:4000 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">docker.litellm.ai/berriai/litellm:main-v1.81.6</span><br></span></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">pip install litellm==1.81.6</span><br></span></code></pre></div></div></div></div></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="key-highlights">Key Highlights<a href="https://docs.litellm.ai/release_notes/v1-81-6#key-highlights" class="hash-link" aria-label="Direct link to Key Highlights" title="Direct link to Key Highlights">​</a></h2>
<p>Logs View v2 with Tool Call Tracing - Redesigned logs interface with side panel, structured tool visualization, and error message search for faster debugging.</p>
<p>Let's dive in.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="logs-view-v2-with-tool-call-tracing">Logs View v2 with Tool Call Tracing<a href="https://docs.litellm.ai/release_notes/v1-81-6#logs-view-v2-with-tool-call-tracing" class="hash-link" aria-label="Direct link to Logs View v2 with Tool Call Tracing" title="Direct link to Logs View v2 with Tool Call Tracing">​</a></h3>
<p>This release introduces comprehensive tool call tracing through LiteLLM's redesigned Logs View v2, making it easier to debug and monitor AI agent workflows in production.</p>
<p>You can now trace complex multi-step agent interactions, debug tool execution failures, and monitor MCP server calls, with full visibility into request/response payloads and syntax highlighting.</p>
<p>Access the new Logs View through LiteLLM's UI to inspect tool calls in a structured format, search logs by error message or request pattern, and correlate agent activity across sessions in a collapsible side panel.</p>
<!-- -->
<!-- -->
<p><a href="https://docs.litellm.ai/docs/proxy/ui_logs">Get Started</a></p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-models--updated-models">New Models / Updated Models<a href="https://docs.litellm.ai/release_notes/v1-81-6#new-models--updated-models" class="hash-link" aria-label="Direct link to New Models / Updated Models" title="Direct link to New Models / Updated Models">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="new-model-support">New Model Support<a href="https://docs.litellm.ai/release_notes/v1-81-6#new-model-support" class="hash-link" aria-label="Direct link to New Model Support" title="Direct link to New Model Support">​</a></h4>
<table><thead><tr><th>Provider</th><th>Model</th><th>Context Window</th><th>Input ($/1M tokens)</th><th>Output ($/1M tokens)</th><th>Features</th></tr></thead><tbody><tr><td>AWS Bedrock</td><td><code>amazon.nova-2-pro-preview-20251202-v1:0</code></td><td>1M</td><td>$2.19</td><td>$17.50</td><td>Chat completions, vision, video, PDF, function calling, prompt caching, reasoning</td></tr><tr><td>Google Vertex AI</td><td><code>gemini-robotics-er-1.5-preview</code></td><td>1M</td><td>$0.30</td><td>$2.50</td><td>Chat completions, multimodal (text, image, video, audio), function calling, reasoning</td></tr><tr><td>OpenRouter</td><td><code>openrouter/xiaomi/mimo-v2-flash</code></td><td>262K</td><td>$0.09</td><td>$0.29</td><td>Chat completions, function calling, reasoning</td></tr><tr><td>OpenRouter</td><td><code>openrouter/moonshotai/kimi-k2.5</code></td><td>-</td><td>-</td><td>-</td><td>Chat completions</td></tr><tr><td>OpenRouter</td><td><code>openrouter/z-ai/glm-4.7</code></td><td>202K</td><td>$0.40</td><td>$1.50</td><td>Chat completions, vision, function calling, reasoning</td></tr></tbody></table>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features">Features<a href="https://docs.litellm.ai/release_notes/v1-81-6#features" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/bedrock">AWS Bedrock</a></strong></p>
<ul>
<li>Messages API Bedrock Converse caching and PDF support - <a href="https://github.com/BerriAI/litellm/pull/19785" target="_blank" rel="noopener noreferrer">PR #19785</a></li>
<li>Translate advanced-tool-use to Bedrock-specific headers for Claude Opus 4.5 - <a href="https://github.com/BerriAI/litellm/pull/19841" target="_blank" rel="noopener noreferrer">PR #19841</a></li>
<li>Support tool search header translation for Sonnet 4.5 - <a href="https://github.com/BerriAI/litellm/pull/19871" target="_blank" rel="noopener noreferrer">PR #19871</a></li>
<li>Filter unsupported beta headers for AWS Bedrock Invoke API - <a href="https://github.com/BerriAI/litellm/pull/19877" target="_blank" rel="noopener noreferrer">PR #19877</a></li>
<li>Nova grounding improvements - <a href="https://github.com/BerriAI/litellm/pull/19598" target="_blank" rel="noopener noreferrer">PR #19598</a>, <a href="https://github.com/BerriAI/litellm/pull/20159" target="_blank" rel="noopener noreferrer">PR #20159</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/anthropic">Anthropic</a></strong></p>
<ul>
<li>Remove explicit cache_control null in tool_result content - <a href="https://github.com/BerriAI/litellm/pull/19919" target="_blank" rel="noopener noreferrer">PR #19919</a></li>
<li>Fix tool handling - <a href="https://github.com/BerriAI/litellm/pull/19805" target="_blank" rel="noopener noreferrer">PR #19805</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/gemini">Google Gemini / Vertex AI</a></strong></p>
<ul>
<li>Add Gemini Robotics-ER 1.5 preview support - <a href="https://github.com/BerriAI/litellm/pull/19845" target="_blank" rel="noopener noreferrer">PR #19845</a></li>
<li>Support file retrieval in GoogleAIStudioFilesHandle - <a href="https://github.com/BerriAI/litellm/pull/20018" target="_blank" rel="noopener noreferrer">PR #20018</a></li>
<li>Add /delete endpoint support - <a href="https://github.com/BerriAI/litellm/pull/20055" target="_blank" rel="noopener noreferrer">PR #20055</a></li>
<li>Add custom_llm_provider as gemini translation - <a href="https://github.com/BerriAI/litellm/pull/19988" target="_blank" rel="noopener noreferrer">PR #19988</a></li>
<li>Subtract implicit cached tokens from text_tokens for correct cost calculation - <a href="https://github.com/BerriAI/litellm/pull/19775" target="_blank" rel="noopener noreferrer">PR #19775</a></li>
<li>Remove unsupported prompt-caching-scope-2026-01-05 header for Vertex AI - <a href="https://github.com/BerriAI/litellm/pull/20058" target="_blank" rel="noopener noreferrer">PR #20058</a></li>
<li>Add disable flag for Anthropic-to-Gemini cache translation - <a href="https://github.com/BerriAI/litellm/pull/20052" target="_blank" rel="noopener noreferrer">PR #20052</a></li>
<li>Convert image URLs to base64 in tool messages for Anthropic on Vertex AI - <a href="https://github.com/BerriAI/litellm/pull/19896" target="_blank" rel="noopener noreferrer">PR #19896</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/xai">xAI</a></strong></p>
<ul>
<li>Add grok reasoning content support - <a href="https://github.com/BerriAI/litellm/pull/19850" target="_blank" rel="noopener noreferrer">PR #19850</a></li>
<li>Add websearch params support for Responses API - <a href="https://github.com/BerriAI/litellm/pull/19915" target="_blank" rel="noopener noreferrer">PR #19915</a></li>
<li>Route xAI chat completions to the Responses API when web search options are present - <a href="https://github.com/BerriAI/litellm/pull/20051" target="_blank" rel="noopener noreferrer">PR #20051</a></li>
<li>Correct cached token cost calculation - <a href="https://github.com/BerriAI/litellm/pull/19772" target="_blank" rel="noopener noreferrer">PR #19772</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/azure">Azure OpenAI</a></strong></p>
<ul>
<li>Use generic cost calculator for audio token pricing - <a href="https://github.com/BerriAI/litellm/pull/19771" target="_blank" rel="noopener noreferrer">PR #19771</a></li>
<li>Allow tool_choice for Azure GPT-5 chat models - <a href="https://github.com/BerriAI/litellm/pull/19813" target="_blank" rel="noopener noreferrer">PR #19813</a></li>
<li>Set gpt-5.2-codex mode to responses for Azure and OpenRouter - <a href="https://github.com/BerriAI/litellm/pull/19770" target="_blank" rel="noopener noreferrer">PR #19770</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/openai">OpenAI</a></strong></p>
<ul>
<li>Fix max_input_tokens for gpt-5.2-codex - <a href="https://github.com/BerriAI/litellm/pull/20009" target="_blank" rel="noopener noreferrer">PR #20009</a></li>
<li>Fix gpt-image-1.5 cost calculation not including output image tokens - <a href="https://github.com/BerriAI/litellm/pull/19515" target="_blank" rel="noopener noreferrer">PR #19515</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/vllm">Hosted VLLM</a></strong></p>
<ul>
<li>Support thinking parameter in anthropic_messages() and .completion() - <a href="https://github.com/BerriAI/litellm/pull/19787" target="_blank" rel="noopener noreferrer">PR #19787</a></li>
<li>Route through base_llm_http_handler to support ssl_verify - <a href="https://github.com/BerriAI/litellm/pull/19893" target="_blank" rel="noopener noreferrer">PR #19893</a></li>
<li>Fix vllm embedding format - <a href="https://github.com/BerriAI/litellm/pull/20056" target="_blank" rel="noopener noreferrer">PR #20056</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/oci">OCI GenAI</a></strong></p>
<ul>
<li>Serialize imageUrl as object for OCI GenAI API - <a href="https://github.com/BerriAI/litellm/pull/19661" target="_blank" rel="noopener noreferrer">PR #19661</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/volcano">Volcengine</a></strong></p>
<ul>
<li>Add context for volcengine models (deepseek-v3-2, glm-4-7, kimi-k2-thinking) - <a href="https://github.com/BerriAI/litellm/pull/19335" target="_blank" rel="noopener noreferrer">PR #19335</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/">Chinese Providers</a></strong></p>
<ul>
<li>Add prompt caching and reasoning support for MiniMax, GLM, Xiaomi - <a href="https://github.com/BerriAI/litellm/pull/19924" target="_blank" rel="noopener noreferrer">PR #19924</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/vercel_ai_gateway">Vercel AI Gateway</a></strong></p>
<ul>
<li>Add embeddings support - <a href="https://github.com/BerriAI/litellm/pull/19660" target="_blank" rel="noopener noreferrer">PR #19660</a></li>
</ul>
</li>
</ul>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="bug-fixes">Bug Fixes<a href="https://docs.litellm.ai/release_notes/v1-81-6#bug-fixes" class="hash-link" aria-label="Direct link to Bug Fixes" title="Direct link to Bug Fixes">​</a></h3>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/gemini">Google</a></strong></p>
<ul>
<li>Fix gemini-robotics-er-1.5-preview entry - <a href="https://github.com/BerriAI/litellm/pull/19974" target="_blank" rel="noopener noreferrer">PR #19974</a></li>
</ul>
</li>
<li>
<p><strong>General</strong></p>
<ul>
<li>Fix output_tokens_details.reasoning_tokens None - <a href="https://github.com/BerriAI/litellm/pull/19914" target="_blank" rel="noopener noreferrer">PR #19914</a></li>
<li>Fix stream_chunk_builder to preserve images from streaming chunks - <a href="https://github.com/BerriAI/litellm/pull/19654" target="_blank" rel="noopener noreferrer">PR #19654</a></li>
<li>Fix aspectRatio mapping in image edit - <a href="https://github.com/BerriAI/litellm/pull/20053" target="_blank" rel="noopener noreferrer">PR #20053</a></li>
<li>Handle unknown models in Azure AI cost calculator - <a href="https://github.com/BerriAI/litellm/pull/20150" target="_blank" rel="noopener noreferrer">PR #20150</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/gigachat">GigaChat</a></strong></p>
<ul>
<li>Ensure function content is valid JSON - <a href="https://github.com/BerriAI/litellm/pull/19232" target="_blank" rel="noopener noreferrer">PR #19232</a></li>
</ul>
</li>
</ul>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="llm-api-endpoints">LLM API Endpoints<a href="https://docs.litellm.ai/release_notes/v1-81-6#llm-api-endpoints" class="hash-link" aria-label="Direct link to LLM API Endpoints" title="Direct link to LLM API Endpoints">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-1">Features<a href="https://docs.litellm.ai/release_notes/v1-81-6#features-1" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/mcp">Messages API (/messages)</a></strong></p>
<ul>
<li>Add LiteLLM x Claude Agent SDK Integration - <a href="https://github.com/BerriAI/litellm/pull/20035" target="_blank" rel="noopener noreferrer">PR #20035</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/mcp">A2A / MCP Gateway API (/a2a, /mcp)</a></strong></p>
<ul>
<li>Add A2A agent header-based context propagation support - <a href="https://github.com/BerriAI/litellm/pull/19504" target="_blank" rel="noopener noreferrer">PR #19504</a></li>
<li>Enable progress notifications for MCP tool calls - <a href="https://github.com/BerriAI/litellm/pull/19809" target="_blank" rel="noopener noreferrer">PR #19809</a></li>
<li>Fix support for non-standard MCP URL patterns - <a href="https://github.com/BerriAI/litellm/pull/19738" target="_blank" rel="noopener noreferrer">PR #19738</a></li>
<li>Add backward compatibility for legacy A2A card formats (/.well-known/agent.json) - <a href="https://github.com/BerriAI/litellm/pull/19949" target="_blank" rel="noopener noreferrer">PR #19949</a></li>
<li>Add support for agent parameter in /interactions endpoint - <a href="https://github.com/BerriAI/litellm/pull/19866" target="_blank" rel="noopener noreferrer">PR #19866</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/response_api">Responses API (/responses)</a></strong></p>
<ul>
<li>Fix custom_llm_provider for provider-specific params - <a href="https://github.com/BerriAI/litellm/pull/19798" target="_blank" rel="noopener noreferrer">PR #19798</a></li>
<li>Extract input tokens details as dict in ResponseAPILoggingUtils - <a href="https://github.com/BerriAI/litellm/pull/20046" target="_blank" rel="noopener noreferrer">PR #20046</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/batches">Batch API (/batches)</a></strong></p>
<ul>
<li>Fix /batches to return encoded ids (from managed objects table) - <a href="https://github.com/BerriAI/litellm/pull/19040" target="_blank" rel="noopener noreferrer">PR #19040</a></li>
<li>Fix Batch and File user level permissions - <a href="https://github.com/BerriAI/litellm/pull/19981" target="_blank" rel="noopener noreferrer">PR #19981</a></li>
<li>Add cost tracking and usage object in retrieve_batch call type - <a href="https://github.com/BerriAI/litellm/pull/19986" target="_blank" rel="noopener noreferrer">PR #19986</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/embedding/supported_embedding">Embeddings API (/embeddings)</a></strong></p>
<ul>
<li>Add supported input formats documentation - <a href="https://github.com/BerriAI/litellm/pull/20073" target="_blank" rel="noopener noreferrer">PR #20073</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/rag_ingest">RAG API (/rag/ingest, /vector_store)</a></strong></p>
<ul>
<li>Add UI for the /rag/ingest API - upload docs, PDFs, etc. to create vector stores - <a href="https://github.com/BerriAI/litellm/pull/19822" target="_blank" rel="noopener noreferrer">PR #19822</a></li>
<li>Add support for using S3 Vectors as Vector Store Provider - <a href="https://github.com/BerriAI/litellm/pull/19888" target="_blank" rel="noopener noreferrer">PR #19888</a></li>
<li>Add s3_vectors as a provider on the /vector_store/search API, with UI for creating vector stores and PDF support - <a href="https://github.com/BerriAI/litellm/pull/19895" target="_blank" rel="noopener noreferrer">PR #19895</a></li>
<li>Add permission management for users and teams on Vector Stores - <a href="https://github.com/BerriAI/litellm/pull/19972" target="_blank" rel="noopener noreferrer">PR #19972</a></li>
<li>Enable router support for completions in RAG query pipeline - <a href="https://github.com/BerriAI/litellm/pull/19550" target="_blank" rel="noopener noreferrer">PR #19550</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/search">Search API (/search)</a></strong></p>
<ul>
<li>Add /list endpoint to list the search tools available in the router - <a href="https://github.com/BerriAI/litellm/pull/19969" target="_blank" rel="noopener noreferrer">PR #19969</a></li>
<li>Fix router search tools v2 integration - <a href="https://github.com/BerriAI/litellm/pull/19840" target="_blank" rel="noopener noreferrer">PR #19840</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/pass_through/intro">Passthrough Endpoints (/{provider}_passthrough)</a></strong></p>
<ul>
<li>Add /openai_passthrough route for OpenAI passthrough requests - <a href="https://github.com/BerriAI/litellm/pull/19989" target="_blank" rel="noopener noreferrer">PR #19989</a></li>
<li>Add support for configuring role_mappings via environment variables - <a href="https://github.com/BerriAI/litellm/pull/19498" target="_blank" rel="noopener noreferrer">PR #19498</a></li>
<li>Add Vertex AI LLM credentials sensitive keyword "vertex_credentials" for masking - <a href="https://github.com/BerriAI/litellm/pull/19551" target="_blank" rel="noopener noreferrer">PR #19551</a></li>
<li>Prevent provider-prefixed model name leaks in responses - <a href="https://github.com/BerriAI/litellm/pull/19943" target="_blank" rel="noopener noreferrer">PR #19943</a></li>
<li>Fix proxy support for slashes in Google Vertex generateContent model names - <a href="https://github.com/BerriAI/litellm/pull/19737" target="_blank" rel="noopener noreferrer">PR #19737</a>, <a href="https://github.com/BerriAI/litellm/pull/19753" target="_blank" rel="noopener noreferrer">PR #19753</a></li>
<li>Support model names with slashes in Vertex AI passthrough URLs - <a href="https://github.com/BerriAI/litellm/pull/19944" target="_blank" rel="noopener noreferrer">PR #19944</a></li>
<li>Fix regression in Vertex AI passthroughs for router models - <a href="https://github.com/BerriAI/litellm/pull/19967" target="_blank" rel="noopener noreferrer">PR #19967</a></li>
<li>Add regression tests for Vertex AI passthrough model names - <a href="https://github.com/BerriAI/litellm/pull/19855" target="_blank" rel="noopener noreferrer">PR #19855</a></li>
</ul>
</li>
</ul>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="bugs">Bugs<a href="https://docs.litellm.ai/release_notes/v1-81-6#bugs" class="hash-link" aria-label="Direct link to Bugs" title="Direct link to Bugs">​</a></h4>
<ul>
<li><strong>General</strong>
<ul>
<li>Fix token calculations and refactor - <a href="https://github.com/BerriAI/litellm/pull/19696" target="_blank" rel="noopener noreferrer">PR #19696</a></li>
</ul>
</li>
</ul>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="management-endpoints--ui">Management Endpoints / UI<a href="https://docs.litellm.ai/release_notes/v1-81-6#management-endpoints--ui" class="hash-link" aria-label="Direct link to Management Endpoints / UI" title="Direct link to Management Endpoints / UI">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-2">Features<a href="https://docs.litellm.ai/release_notes/v1-81-6#features-2" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong>Proxy CLI Auth</strong></p>
<ul>
<li>Add configurable CLI JWT expiration via environment variable - <a href="https://github.com/BerriAI/litellm/pull/19780" target="_blank" rel="noopener noreferrer">PR #19780</a></li>
<li>Fix team CLI auth flow - <a href="https://github.com/BerriAI/litellm/pull/19666" target="_blank" rel="noopener noreferrer">PR #19666</a></li>
</ul>
</li>
<li>
<p><strong>Virtual Keys</strong></p>
<ul>
<li>UI: Auto Truncation of Table Values - <a href="https://github.com/BerriAI/litellm/pull/19718" target="_blank" rel="noopener noreferrer">PR #19718</a></li>
<li>Fix Create Key: Expire Key Input Duration - <a href="https://github.com/BerriAI/litellm/pull/19807" target="_blank" rel="noopener noreferrer">PR #19807</a></li>
<li>Bulk Update Keys Endpoint - <a href="https://github.com/BerriAI/litellm/pull/19886" target="_blank" rel="noopener noreferrer">PR #19886</a></li>
</ul>
</li>
<li>
<p><strong>Logs View</strong></p>
<ul>
<li><strong>v2 Logs view with side panel and improved UX</strong> - <a href="https://github.com/BerriAI/litellm/pull/20091" target="_blank" rel="noopener noreferrer">PR #20091</a></li>
<li>New View to render "Tools" on Logs View - <a href="https://github.com/BerriAI/litellm/pull/20093" target="_blank" rel="noopener noreferrer">PR #20093</a></li>
<li>Add Pretty print view of request/response - <a href="https://github.com/BerriAI/litellm/pull/20096" target="_blank" rel="noopener noreferrer">PR #20096</a></li>
<li>Add error_message search in Spend Logs Endpoint - <a href="https://github.com/BerriAI/litellm/pull/19960" target="_blank" rel="noopener noreferrer">PR #19960</a></li>
<li>UI: Add error message search to UI spend logs - <a href="https://github.com/BerriAI/litellm/pull/19963" target="_blank" rel="noopener noreferrer">PR #19963</a></li>
<li>Spend Logs: Settings Modal - <a href="https://github.com/BerriAI/litellm/pull/19918" target="_blank" rel="noopener noreferrer">PR #19918</a></li>
<li>Fix error_code in Spend Logs metadata - <a href="https://github.com/BerriAI/litellm/pull/20015" target="_blank" rel="noopener noreferrer">PR #20015</a></li>
<li>Spend Logs: Show Current Store and Retention Status - <a href="https://github.com/BerriAI/litellm/pull/20017" target="_blank" rel="noopener noreferrer">PR #20017</a></li>
<li>Allow Dynamic Setting of store_prompts_in_spend_logs - <a href="https://github.com/BerriAI/litellm/pull/19913" target="_blank" rel="noopener noreferrer">PR #19913</a></li>
<li><a href="https://docs.litellm.ai/docs/proxy/ui_spend_log_settings">Docs: UI Spend Logs Settings</a> - <a href="https://github.com/BerriAI/litellm/pull/20197" target="_blank" rel="noopener noreferrer">PR #20197</a></li>
</ul>
</li>
<li>
<p><strong>Models + Endpoints</strong></p>
<ul>
<li>Add sortBy and sortOrder params for /v2/model/info - <a href="https://github.com/BerriAI/litellm/pull/19903" target="_blank" rel="noopener noreferrer">PR #19903</a></li>
<li>Fix Sorting for /v2/model/info - <a href="https://github.com/BerriAI/litellm/pull/19971" target="_blank" rel="noopener noreferrer">PR #19971</a></li>
<li>UI: Model Page Server Sort - <a href="https://github.com/BerriAI/litellm/pull/19908" target="_blank" rel="noopener noreferrer">PR #19908</a></li>
</ul>
</li>
<li>
<p><strong>Usage &amp; Analytics</strong></p>
<ul>
<li>UI: Usage Export: Breakdown by Teams and Keys - <a href="https://github.com/BerriAI/litellm/pull/19953" target="_blank" rel="noopener noreferrer">PR #19953</a></li>
<li>UI: Usage: Model Breakdown Per Key - <a href="https://github.com/BerriAI/litellm/pull/20039" target="_blank" rel="noopener noreferrer">PR #20039</a></li>
</ul>
</li>
<li>
<p><strong>UI Improvements</strong></p>
<ul>
<li>UI: Allow Admins to control what pages are visible on LeftNav - <a href="https://github.com/BerriAI/litellm/pull/19907" target="_blank" rel="noopener noreferrer">PR #19907</a></li>
<li>UI: Add Light/Dark Mode Switch for Development - <a href="https://github.com/BerriAI/litellm/pull/19804" target="_blank" rel="noopener noreferrer">PR #19804</a></li>
<li>UI: Dark Mode: Delete Resource Modal - <a href="https://github.com/BerriAI/litellm/pull/20098" target="_blank" rel="noopener noreferrer">PR #20098</a></li>
<li>UI: Tables: Reusable Table Sort Component - <a href="https://github.com/BerriAI/litellm/pull/19970" target="_blank" rel="noopener noreferrer">PR #19970</a></li>
<li>UI: New Badge Dot Render - <a href="https://github.com/BerriAI/litellm/pull/20024" target="_blank" rel="noopener noreferrer">PR #20024</a></li>
<li>UI: Feedback Prompts: Option To Hide Prompts - <a href="https://github.com/BerriAI/litellm/pull/19831" target="_blank" rel="noopener noreferrer">PR #19831</a></li>
<li>UI: Navbar: Fixed Default Logo + Bound Logo Box - <a href="https://github.com/BerriAI/litellm/pull/20092" target="_blank" rel="noopener noreferrer">PR #20092</a></li>
<li>UI: Navbar: User Dropdown - <a href="https://github.com/BerriAI/litellm/pull/20095" target="_blank" rel="noopener noreferrer">PR #20095</a></li>
<li>Change default key type from 'Default' to 'LLM API' - <a href="https://github.com/BerriAI/litellm/pull/19516" target="_blank" rel="noopener noreferrer">PR #19516</a></li>
</ul>
</li>
<li>
<p><strong>Team &amp; User Management</strong></p>
<ul>
<li>Fix /team/member_add User Email and ID Verifications - <a href="https://github.com/BerriAI/litellm/pull/19814" target="_blank" rel="noopener noreferrer">PR #19814</a></li>
<li>Fix SSO Email Case Sensitivity - <a href="https://github.com/BerriAI/litellm/pull/19799" target="_blank" rel="noopener noreferrer">PR #19799</a></li>
<li>UI: Internal User: Bulk Add - <a href="https://github.com/BerriAI/litellm/pull/19721" target="_blank" rel="noopener noreferrer">PR #19721</a></li>
</ul>
</li>
<li>
<p><strong>AI Gateway Features</strong></p>
<ul>
<li>Add support for making silent LLM calls without logging - <a href="https://github.com/BerriAI/litellm/pull/19544" target="_blank" rel="noopener noreferrer">PR #19544</a></li>
<li>UI: Fix MCP tools instructions to display comma-separated strings - <a href="https://github.com/BerriAI/litellm/pull/20101" target="_blank" rel="noopener noreferrer">PR #20101</a></li>
</ul>
</li>
</ul>
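<p>The dynamic <code>store_prompts_in_spend_logs</code> setting above maps to a proxy config key; a minimal sketch (key name per the LiteLLM proxy docs — verify against your version):</p>

```yaml
# Minimal sketch: store prompts/responses in spend logs.
general_settings:
  store_prompts_in_spend_logs: true
```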
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="bugs-1">Bugs<a href="https://docs.litellm.ai/release_notes/v1-81-6#bugs-1" class="hash-link" aria-label="Direct link to Bugs" title="Direct link to Bugs">​</a></h4>
<ul>
<li>Fix Model Name During Fallback - <a href="https://github.com/BerriAI/litellm/pull/20177" target="_blank" rel="noopener noreferrer">PR #20177</a></li>
<li>Fix Health Endpoints when Callback Objects Defined - <a href="https://github.com/BerriAI/litellm/pull/20182" target="_blank" rel="noopener noreferrer">PR #20182</a></li>
<li>Fix inability to reset user max budget to unlimited - <a href="https://github.com/BerriAI/litellm/pull/19796" target="_blank" rel="noopener noreferrer">PR #19796</a></li>
<li>Fix Password comparison with non-ASCII characters - <a href="https://github.com/BerriAI/litellm/pull/19568" target="_blank" rel="noopener noreferrer">PR #19568</a></li>
<li>Correct error message for DISABLE_ADMIN_ENDPOINTS - <a href="https://github.com/BerriAI/litellm/pull/19861" target="_blank" rel="noopener noreferrer">PR #19861</a></li>
<li>Prevent clearing content filter patterns when editing guardrail - <a href="https://github.com/BerriAI/litellm/pull/19671" target="_blank" rel="noopener noreferrer">PR #19671</a></li>
<li>Fix Prompt Studio history to load tools and system messages - <a href="https://github.com/BerriAI/litellm/pull/19920" target="_blank" rel="noopener noreferrer">PR #19920</a></li>
<li>Add WATSONX_ZENAPIKEY to WatsonX credentials - <a href="https://github.com/BerriAI/litellm/pull/20086" target="_blank" rel="noopener noreferrer">PR #20086</a></li>
<li>UI: Vector Store: Allow Config Defined Models to Be Selected - <a href="https://github.com/BerriAI/litellm/pull/20031" target="_blank" rel="noopener noreferrer">PR #20031</a></li>
</ul>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="logging--guardrail--prompt-management-integrations">Logging / Guardrail / Prompt Management Integrations<a href="https://docs.litellm.ai/release_notes/v1-81-6#logging--guardrail--prompt-management-integrations" class="hash-link" aria-label="Direct link to Logging / Guardrail / Prompt Management Integrations" title="Direct link to Logging / Guardrail / Prompt Management Integrations">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-3">Features<a href="https://docs.litellm.ai/release_notes/v1-81-6#features-3" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/logging#datadog">DataDog</a></strong></p>
<ul>
<li>Add agent support for LLM Observability - <a href="https://github.com/BerriAI/litellm/pull/19574" target="_blank" rel="noopener noreferrer">PR #19574</a></li>
<li>Add datadog cost management support and fix startup callback issue - <a href="https://github.com/BerriAI/litellm/pull/19584" target="_blank" rel="noopener noreferrer">PR #19584</a></li>
<li>Add datadog_llm_observability to /health/services allowed list - <a href="https://github.com/BerriAI/litellm/pull/19952" target="_blank" rel="noopener noreferrer">PR #19952</a></li>
<li>Check for agent mode before requiring DD_API_KEY/DD_SITE - <a href="https://github.com/BerriAI/litellm/pull/20156" target="_blank" rel="noopener noreferrer">PR #20156</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/observability/opentelemetry_integration">OpenTelemetry</a></strong></p>
<ul>
<li>Propagate JWT auth metadata to OTEL spans - <a href="https://github.com/BerriAI/litellm/pull/19627" target="_blank" rel="noopener noreferrer">PR #19627</a></li>
<li>Fix thread leak in dynamic header path - <a href="https://github.com/BerriAI/litellm/pull/19946" target="_blank" rel="noopener noreferrer">PR #19946</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/logging#prometheus">Prometheus</a></strong></p>
<ul>
<li>Add callbacks and labels - <a href="https://github.com/BerriAI/litellm/pull/19708" target="_blank" rel="noopener noreferrer">PR #19708</a></li>
<li>Add client IP and user agent labels to metrics - <a href="https://github.com/BerriAI/litellm/pull/19717" target="_blank" rel="noopener noreferrer">PR #19717</a></li>
<li>Add TPM/RPM limit metrics - <a href="https://github.com/BerriAI/litellm/pull/19725" target="_blank" rel="noopener noreferrer">PR #19725</a></li>
<li>Add model_id label to metrics - <a href="https://github.com/BerriAI/litellm/pull/19678" target="_blank" rel="noopener noreferrer">PR #19678</a></li>
<li>Safely handle None metadata in logging - <a href="https://github.com/BerriAI/litellm/pull/19691" target="_blank" rel="noopener noreferrer">PR #19691</a></li>
<li>Resolve high CPU when router_settings in DB by avoiding REGISTRY.collect() - <a href="https://github.com/BerriAI/litellm/pull/20087" target="_blank" rel="noopener noreferrer">PR #20087</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/logging#langfuse">Langfuse</a></strong></p>
<ul>
<li>Add litellm_callback_logging_failures_metric for Langfuse, Langfuse Otel and other Otel providers - <a href="https://github.com/BerriAI/litellm/pull/19636" target="_blank" rel="noopener noreferrer">PR #19636</a></li>
</ul>
</li>
<li>
<p><strong>General Logging</strong></p>
<ul>
<li>Use return value from CustomLogger.async_post_call_success_hook - <a href="https://github.com/BerriAI/litellm/pull/19670" target="_blank" rel="noopener noreferrer">PR #19670</a></li>
<li>Add async_post_call_response_headers_hook to CustomLogger - <a href="https://github.com/BerriAI/litellm/pull/20083" target="_blank" rel="noopener noreferrer">PR #20083</a></li>
<li>Add mock client factory pattern and mock support for PostHog, Helicone, and Braintrust integrations - <a href="https://github.com/BerriAI/litellm/pull/19707" target="_blank" rel="noopener noreferrer">PR #19707</a></li>
</ul>
</li>
</ul>
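<p>For context on the Prometheus items above: the metrics are emitted by the proxy's Prometheus callback and exposed on the proxy's <code>/metrics</code> endpoint for scraping. A minimal config sketch (callback name per the LiteLLM Prometheus docs — verify against your version):</p>

```yaml
# Sketch: enable the Prometheus callback on the proxy.
litellm_settings:
  callbacks: ["prometheus"]
```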
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="guardrails">Guardrails<a href="https://docs.litellm.ai/release_notes/v1-81-6#guardrails" class="hash-link" aria-label="Direct link to Guardrails" title="Direct link to Guardrails">​</a></h4>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails/pii_masking_v2">Presidio</a></strong></p>
<ul>
<li>Reuse HTTP connections to prevent performance degradation - <a href="https://github.com/BerriAI/litellm/pull/19964" target="_blank" rel="noopener noreferrer">PR #19964</a></li>
</ul>
</li>
<li>
<p><strong>Onyx</strong></p>
<ul>
<li>Add timeout to onyx guardrail - <a href="https://github.com/BerriAI/litellm/pull/19731" target="_blank" rel="noopener noreferrer">PR #19731</a></li>
</ul>
</li>
<li>
<p><strong>General</strong></p>
<ul>
<li>Add guardrail model argument feature - <a href="https://github.com/BerriAI/litellm/pull/19619" target="_blank" rel="noopener noreferrer">PR #19619</a></li>
<li>Fix guardrails issues with streaming-response regex - <a href="https://github.com/BerriAI/litellm/pull/19901" target="_blank" rel="noopener noreferrer">PR #19901</a></li>
<li>Remove enterprise requirement for guardrail monitoring (docs) - <a href="https://github.com/BerriAI/litellm/pull/19833" target="_blank" rel="noopener noreferrer">PR #19833</a></li>
</ul>
</li>
</ul>
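<p>Guardrails such as Presidio are attached via the proxy config's <code>guardrails</code> block; a minimal sketch (field names per the LiteLLM guardrails docs — verify against your version):</p>

```yaml
# Sketch: run Presidio PII masking before the LLM call.
guardrails:
  - guardrail_name: "presidio-pii"
    litellm_params:
      guardrail: presidio
      mode: "pre_call"
```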
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="spend-tracking-budgets-and-rate-limiting">Spend Tracking, Budgets and Rate Limiting<a href="https://docs.litellm.ai/release_notes/v1-81-6#spend-tracking-budgets-and-rate-limiting" class="hash-link" aria-label="Direct link to Spend Tracking, Budgets and Rate Limiting" title="Direct link to Spend Tracking, Budgets and Rate Limiting">​</a></h2>
<ul>
<li>Add event-driven coordination for global spend query to prevent cache stampede - <a href="https://github.com/BerriAI/litellm/pull/20030" target="_blank" rel="noopener noreferrer">PR #20030</a></li>
</ul>
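<p>A cache stampede happens when many concurrent requests find the cache empty and all run the same expensive query at once. The fix coordinates them so one request executes the global spend query while the rest wait for its result. A generic single-flight sketch of that pattern (illustrative only — not LiteLLM's implementation):</p>

```python
import asyncio

# Single-flight sketch: the first caller runs fetch(); concurrent callers
# wait on an event and reuse its result instead of hitting the database.
class SingleFlight:
    def __init__(self):
        self._event = None
        self._result = None

    async def run(self, fetch):
        if self._event is not None:       # a fetch is already in flight
            await self._event.wait()
            return self._result
        self._event = asyncio.Event()
        try:
            self._result = await fetch()
            return self._result
        finally:
            self._event.set()             # wake the waiters
            self._event = None
```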
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="performance--loadbalancing--reliability-improvements">Performance / Loadbalancing / Reliability improvements<a href="https://docs.litellm.ai/release_notes/v1-81-6#performance--loadbalancing--reliability-improvements" class="hash-link" aria-label="Direct link to Performance / Loadbalancing / Reliability improvements" title="Direct link to Performance / Loadbalancing / Reliability improvements">​</a></h2>
<ul>
<li><strong>Resolve high CPU when router_settings in DB</strong> - by avoiding REGISTRY.collect() in PrometheusServicesLogger - <a href="https://github.com/BerriAI/litellm/pull/20087" target="_blank" rel="noopener noreferrer">PR #20087</a></li>
<li><strong>Reuse HTTP connections in Presidio</strong> - to prevent performance degradation - <a href="https://github.com/BerriAI/litellm/pull/19964" target="_blank" rel="noopener noreferrer">PR #19964</a></li>
<li><strong>Event-driven coordination for global spend query</strong> - prevent cache stampede - <a href="https://github.com/BerriAI/litellm/pull/20030" target="_blank" rel="noopener noreferrer">PR #20030</a></li>
<li>Fix recursive Pydantic validation issue - <a href="https://github.com/BerriAI/litellm/pull/19531" target="_blank" rel="noopener noreferrer">PR #19531</a></li>
<li>Refactor argument handling into helper function to reduce code bloat - <a href="https://github.com/BerriAI/litellm/pull/19720" target="_blank" rel="noopener noreferrer">PR #19720</a></li>
<li>Optimize logo fetching and resolve MCP import blockers - <a href="https://github.com/BerriAI/litellm/pull/19719" target="_blank" rel="noopener noreferrer">PR #19719</a></li>
<li>Improve logo download performance using async HTTP client - <a href="https://github.com/BerriAI/litellm/pull/20155" target="_blank" rel="noopener noreferrer">PR #20155</a></li>
<li>Fix server root path configuration - <a href="https://github.com/BerriAI/litellm/pull/19790" target="_blank" rel="noopener noreferrer">PR #19790</a></li>
<li>Refactor: Extract transport context creation into separate method - <a href="https://github.com/BerriAI/litellm/pull/19794" target="_blank" rel="noopener noreferrer">PR #19794</a></li>
<li>Add native_background_mode configuration to override polling_via_cache for specific models - <a href="https://github.com/BerriAI/litellm/pull/19899" target="_blank" rel="noopener noreferrer">PR #19899</a></li>
<li>Initialize tiktoken environment at import time to enable offline usage - <a href="https://github.com/BerriAI/litellm/pull/19882" target="_blank" rel="noopener noreferrer">PR #19882</a></li>
<li>Improve tiktoken performance using local cache in lazy loading - <a href="https://github.com/BerriAI/litellm/pull/19774" target="_blank" rel="noopener noreferrer">PR #19774</a></li>
<li>Fix timeout errors in chat completion calls to be correctly reported in failure callbacks - <a href="https://github.com/BerriAI/litellm/pull/19842" target="_blank" rel="noopener noreferrer">PR #19842</a></li>
<li>Fix environment variable type handling for NUM_RETRIES - <a href="https://github.com/BerriAI/litellm/pull/19507" target="_blank" rel="noopener noreferrer">PR #19507</a></li>
<li>Use safe_deep_copy in silent experiment kwargs to prevent mutation - <a href="https://github.com/BerriAI/litellm/pull/20170" target="_blank" rel="noopener noreferrer">PR #20170</a></li>
<li>Improve error handling by inspecting BadRequestError after all other policy types - <a href="https://github.com/BerriAI/litellm/pull/19878" target="_blank" rel="noopener noreferrer">PR #19878</a></li>
</ul>
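<p>On the <code>NUM_RETRIES</code> fix: environment variables always arrive as strings, so numeric settings need coercion before use. A defensive sketch of the idea (the helper below is hypothetical, not LiteLLM's code):</p>

```python
import os

# Hypothetical helper: coerce a string env var (e.g. NUM_RETRIES) to int,
# falling back to a default on missing or malformed values.
def get_int_env(name: str, default: int) -> int:
    raw = os.environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        return int(raw)
    except ValueError:
        return default
```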
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="database-changes">Database Changes<a href="https://docs.litellm.ai/release_notes/v1-81-6#database-changes" class="hash-link" aria-label="Direct link to Database Changes" title="Direct link to Database Changes">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="schema-updates">Schema Updates<a href="https://docs.litellm.ai/release_notes/v1-81-6#schema-updates" class="hash-link" aria-label="Direct link to Schema Updates" title="Direct link to Schema Updates">​</a></h3>
<table><thead><tr><th>Table</th><th>Change Type</th><th>Description</th><th>PR</th><th>Migration</th></tr></thead><tbody><tr><td><code>LiteLLM_ManagedVectorStoresTable</code></td><td>New Columns</td><td>Added <code>team_id</code> and <code>user_id</code> fields for permission management</td><td><a href="https://github.com/BerriAI/litellm/pull/19972" target="_blank" rel="noopener noreferrer">PR #19972</a></td><td><a href="https://github.com/BerriAI/litellm/blob/main/litellm-proxy-extras/litellm_proxy_extras/migrations/20260131150814_add_team_user_to_vector_stores/migration.sql" target="_blank" rel="noopener noreferrer">Migration</a></td></tr></tbody></table>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="migration-improvements">Migration Improvements<a href="https://docs.litellm.ai/release_notes/v1-81-6#migration-improvements" class="hash-link" aria-label="Direct link to Migration Improvements" title="Direct link to Migration Improvements">​</a></h3>
<ul>
<li>Fix Docker: Use correct schema path for Prisma generation - <a href="https://github.com/BerriAI/litellm/pull/19631" target="_blank" rel="noopener noreferrer">PR #19631</a></li>
<li>Resolve 'relation does not exist' migration errors in setup_database - <a href="https://github.com/BerriAI/litellm/pull/19281" target="_blank" rel="noopener noreferrer">PR #19281</a></li>
<li>Fix migration issue and improve Docker image stability - <a href="https://github.com/BerriAI/litellm/pull/19843" target="_blank" rel="noopener noreferrer">PR #19843</a></li>
<li>Run Prisma generate as nobody user in non-root Docker container for security - <a href="https://github.com/BerriAI/litellm/pull/20000" target="_blank" rel="noopener noreferrer">PR #20000</a></li>
<li>Bump litellm-proxy-extras version to 0.4.28 - <a href="https://github.com/BerriAI/litellm/pull/20166" target="_blank" rel="noopener noreferrer">PR #20166</a></li>
</ul>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="documentation-updates">Documentation Updates<a href="https://docs.litellm.ai/release_notes/v1-81-6#documentation-updates" class="hash-link" aria-label="Direct link to Documentation Updates" title="Direct link to Documentation Updates">​</a></h2>
<ul>
<li><strong><a href="https://docs.litellm.ai/docs/mcp">Add Claude Agents SDK x LiteLLM Guide</a></strong> - <a href="https://github.com/BerriAI/litellm/pull/20036" target="_blank" rel="noopener noreferrer">PR #20036</a></li>
<li><strong><a href="https://github.com/BerriAI/litellm/tree/main/cookbook" target="_blank" rel="noopener noreferrer">Add Cookbook: Using Claude Agent SDK + MCPs with LiteLLM</a></strong> - <a href="https://github.com/BerriAI/litellm/pull/20081" target="_blank" rel="noopener noreferrer">PR #20081</a></li>
<li>Fix A2A Python SDK URL in documentation - <a href="https://github.com/BerriAI/litellm/pull/19832" target="_blank" rel="noopener noreferrer">PR #19832</a></li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/sarvam">Add Sarvam usage documentation</a></strong> - <a href="https://github.com/BerriAI/litellm/pull/19844" target="_blank" rel="noopener noreferrer">PR #19844</a></li>
<li><strong><a href="https://docs.litellm.ai/docs/embedding/supported_embedding">Add supported input formats for embeddings</a></strong> - <a href="https://github.com/BerriAI/litellm/pull/20073" target="_blank" rel="noopener noreferrer">PR #20073</a></li>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/ui_spend_log_settings">UI Spend Logs Settings Docs</a></strong> - <a href="https://github.com/BerriAI/litellm/pull/20197" target="_blank" rel="noopener noreferrer">PR #20197</a></li>
<li>Add OpenAI Agents SDK to OSS Adopters list in README - <a href="https://github.com/BerriAI/litellm/pull/19820" target="_blank" rel="noopener noreferrer">PR #19820</a></li>
<li>Update docs: Remove enterprise requirement for guardrail monitoring - <a href="https://github.com/BerriAI/litellm/pull/19833" target="_blank" rel="noopener noreferrer">PR #19833</a></li>
<li>Add missing environment variable documentation - <a href="https://github.com/BerriAI/litellm/pull/20138" target="_blank" rel="noopener noreferrer">PR #20138</a></li>
<li>Improve documentation blog index page - <a href="https://github.com/BerriAI/litellm/pull/20188" target="_blank" rel="noopener noreferrer">PR #20188</a></li>
</ul>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="infrastructure--testing-improvements">Infrastructure / Testing Improvements<a href="https://docs.litellm.ai/release_notes/v1-81-6#infrastructure--testing-improvements" class="hash-link" aria-label="Direct link to Infrastructure / Testing Improvements" title="Direct link to Infrastructure / Testing Improvements">​</a></h2>
<ul>
<li>Add test coverage for Router.get_valid_args and improve code coverage reporting - <a href="https://github.com/BerriAI/litellm/pull/19797" target="_blank" rel="noopener noreferrer">PR #19797</a></li>
<li>Add validation of model cost map as CI job - <a href="https://github.com/BerriAI/litellm/pull/19993" target="_blank" rel="noopener noreferrer">PR #19993</a></li>
<li>Add Realtime API benchmarks - <a href="https://github.com/BerriAI/litellm/pull/20074" target="_blank" rel="noopener noreferrer">PR #20074</a></li>
<li>Add Init Containers support in community helm chart - <a href="https://github.com/BerriAI/litellm/pull/19816" target="_blank" rel="noopener noreferrer">PR #19816</a></li>
<li>Add libsndfile to main Dockerfile for ARM64 audio processing support - <a href="https://github.com/BerriAI/litellm/pull/19776" target="_blank" rel="noopener noreferrer">PR #19776</a></li>
</ul>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-contributors">New Contributors<a href="https://docs.litellm.ai/release_notes/v1-81-6#new-contributors" class="hash-link" aria-label="Direct link to New Contributors" title="Direct link to New Contributors">​</a></h2>
<ul>
<li>@ruanjf made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19551" target="_blank" rel="noopener noreferrer">https://github.com/BerriAI/litellm/pull/19551</a></li>
<li>@moh-dev-stack made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19507" target="_blank" rel="noopener noreferrer">https://github.com/BerriAI/litellm/pull/19507</a></li>
<li>@formorter made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19498" target="_blank" rel="noopener noreferrer">https://github.com/BerriAI/litellm/pull/19498</a></li>
<li>@priyam-that made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19516" target="_blank" rel="noopener noreferrer">https://github.com/BerriAI/litellm/pull/19516</a></li>
<li>@marcosgriselli made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19550" target="_blank" rel="noopener noreferrer">https://github.com/BerriAI/litellm/pull/19550</a></li>
<li>@natimofeev made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19232" target="_blank" rel="noopener noreferrer">https://github.com/BerriAI/litellm/pull/19232</a></li>
<li>@zifeo made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19805" target="_blank" rel="noopener noreferrer">https://github.com/BerriAI/litellm/pull/19805</a></li>
<li>@pragyasardana made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19816" target="_blank" rel="noopener noreferrer">https://github.com/BerriAI/litellm/pull/19816</a></li>
<li>@ryewilson made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19833" target="_blank" rel="noopener noreferrer">https://github.com/BerriAI/litellm/pull/19833</a></li>
<li>@lizhen921 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19919" target="_blank" rel="noopener noreferrer">https://github.com/BerriAI/litellm/pull/19919</a></li>
<li>@boarder7395 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19666" target="_blank" rel="noopener noreferrer">https://github.com/BerriAI/litellm/pull/19666</a></li>
<li>@rushilchugh01 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19938" target="_blank" rel="noopener noreferrer">https://github.com/BerriAI/litellm/pull/19938</a></li>
<li>@cfchase made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19893" target="_blank" rel="noopener noreferrer">https://github.com/BerriAI/litellm/pull/19893</a></li>
<li>@ayim made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19872" target="_blank" rel="noopener noreferrer">https://github.com/BerriAI/litellm/pull/19872</a></li>
<li>@varunsripad123 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/20018" target="_blank" rel="noopener noreferrer">https://github.com/BerriAI/litellm/pull/20018</a></li>
<li>@nht1206 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/20046" target="_blank" rel="noopener noreferrer">https://github.com/BerriAI/litellm/pull/20046</a></li>
<li>@genga6 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/20009" target="_blank" rel="noopener noreferrer">https://github.com/BerriAI/litellm/pull/20009</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/BerriAI/litellm/compare/v1.81.3.rc...v1.81.6" target="_blank" rel="noopener noreferrer">https://github.com/BerriAI/litellm/compare/v1.81.3.rc...v1.81.6</a></p>]]></content>
        <author>
            <name>Krrish Dholakia</name>
            <uri>https://www.linkedin.com/in/krish-d/</uri>
        </author>
        <author>
            <name>Ishaan Jaff</name>
            <uri>https://www.linkedin.com/in/reffajnaahsi/</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[v1.81.3-stable - Performance - 25% CPU Usage Reduction]]></title>
        <id>https://docs.litellm.ai/release_notes/v1-81-3</id>
        <link href="https://docs.litellm.ai/release_notes/v1-81-3"/>
        <updated>2026-01-26T10:00:00.000Z</updated>
        <summary type="html"><![CDATA[Deploy this version]]></summary>
        <content type="html"><![CDATA[<h2 class="anchor anchorWithStickyNavbar_LWe7" id="deploy-this-version">Deploy this version<a href="https://docs.litellm.ai/release_notes/v1-81-3#deploy-this-version" class="hash-link" aria-label="Direct link to Deploy this version" title="Direct link to Deploy this version">​</a></h2>
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">Docker</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">Pip</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">docker run litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker run \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-e STORE_MODEL_IN_DB=True \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-p 4000:4000 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">docker.litellm.ai/berriai/litellm:v1.81.3-stable</span><br></span></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">pip install litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">pip install litellm==1.81.3.rc.2</span><br></span></code></pre></div></div></div></div></div>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-models--updated-models">New Models / Updated Models<a href="https://docs.litellm.ai/release_notes/v1-81-3#new-models--updated-models" class="hash-link" aria-label="Direct link to New Models / Updated Models" title="Direct link to New Models / Updated Models">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="new-model-support">New Model Support<a href="https://docs.litellm.ai/release_notes/v1-81-3#new-model-support" class="hash-link" aria-label="Direct link to New Model Support" title="Direct link to New Model Support">​</a></h3>
<table><thead><tr><th>Provider</th><th>Model</th><th>Context Window</th><th>Input ($/1M tokens)</th><th>Output ($/1M tokens)</th><th>Deprecation Date</th></tr></thead><tbody><tr><td>OpenAI</td><td><code>gpt-audio</code>, <code>gpt-audio-2025-08-28</code></td><td>128K</td><td>$32/1M audio tokens, $2.5/1M text tokens</td><td>$64/1M audio tokens, $10/1M text tokens</td><td>-</td></tr><tr><td>OpenAI</td><td><code>gpt-audio-mini</code>, <code>gpt-audio-mini-2025-08-28</code></td><td>128K</td><td>$10/1M audio tokens, $0.6/1M text tokens</td><td>$20/1M audio tokens, $2.4/1M text tokens</td><td>-</td></tr><tr><td>Deepinfra, Vertex AI, Google AI Studio, OpenRouter, Vercel AI Gateway</td><td><code>gemini-2.0-flash-001</code>, <code>gemini-2.0-flash</code></td><td>-</td><td>-</td><td>-</td><td>2026-03-31</td></tr><tr><td>Groq</td><td><code>groq/openai/gpt-oss-120b</code></td><td>131K</td><td>$0.075/1M cache read</td><td>$0.6/1M output tokens</td><td>-</td></tr><tr><td>Groq</td><td><code>groq/openai/gpt-oss-20b</code></td><td>131K</td><td>$0.0375/1M cache read, $0.075/1M text tokens</td><td>$0.3/1M output tokens</td><td>-</td></tr><tr><td>Vertex AI</td><td><code>gemini-2.5-computer-use-preview-10-2025</code></td><td>128K</td><td>$1.25</td><td>$10</td><td>-</td></tr><tr><td>Azure AI</td><td><code>claude-haiku-4-5</code></td><td>-</td><td>$1.25/1M cache read, $2/1M cache read above 1 hr, $0.1/1M text tokens</td><td>$5/1M output tokens</td><td>-</td></tr><tr><td>Azure AI</td><td><code>claude-sonnet-4-5</code></td><td>-</td><td>$3.75/1M cache read, $6/1M cache read above 1 hr, $3/1M text tokens</td><td>$15/1M output tokens</td><td>-</td></tr><tr><td>Azure AI</td><td><code>claude-opus-4-5</code></td><td>-</td><td>$6.25/1M cache read, $10/1M cache read above 1 hr, $0.5/1M text tokens</td><td>$25/1M output tokens</td><td>-</td></tr><tr><td>Azure AI</td><td><code>claude-opus-4-1</code></td><td>-</td><td>$18.75/1M cache read, $30/1M cache read above 1 hr, $1.5/1M text tokens</td><td>$75/1M output tokens</td><td>-</td></tr></tbody></table>
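As a quick sanity check on the prices above, each line item contributes <code>tokens / 1M × price-per-1M</code> to the total cost. A minimal sketch using the <code>gpt-audio</code> rates (the token counts here are illustrative, not from a real call):

```python
# gpt-audio prices from the table above ($ per 1M tokens)
PRICES_PER_1M = {
    "audio_in": 32.0,   # $32/1M audio input tokens
    "text_in": 2.5,     # $2.5/1M text input tokens
    "audio_out": 64.0,  # $64/1M audio output tokens
    "text_out": 10.0,   # $10/1M text output tokens
}

def cost_usd(tokens: dict) -> float:
    """Cost = sum over token types of (tokens / 1M) * price-per-1M."""
    return sum(tokens[k] / 1_000_000 * PRICES_PER_1M[k] for k in tokens)

# e.g. 10k audio-in, 2k text-in, 500 audio-out tokens
print(round(cost_usd({"audio_in": 10_000, "text_in": 2_000, "audio_out": 500}), 6))
```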
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="features">Features<a href="https://docs.litellm.ai/release_notes/v1-81-3#features" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h3>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/openai">OpenAI</a></strong></p>
<ul>
<li>Add gpt-audio and gpt-audio-mini models to pricing - <a href="https://github.com/BerriAI/litellm/pull/19509" target="_blank" rel="noopener noreferrer">PR #19509</a></li>
<li>correct audio token costs for gpt-4o-audio-preview models - <a href="https://github.com/BerriAI/litellm/pull/19500" target="_blank" rel="noopener noreferrer">PR #19500</a></li>
<li>Limit stop sequence as per openai spec (ensures JetBrains IDE compatibility) - <a href="https://github.com/BerriAI/litellm/pull/19562" target="_blank" rel="noopener noreferrer">PR #19562</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/vertex">VertexAI</a></strong></p>
<ul>
<li>Docs - Google Workload Identity Federation (WIF) support - <a href="https://github.com/BerriAI/litellm/pull/19320" target="_blank" rel="noopener noreferrer">PR #19320</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/bedrock_agentcore">Agentcore</a></strong></p>
<ul>
<li>Fixes streaming issues with AWS Bedrock AgentCore where responses would stop after the first chunk, particularly affecting OAuth-enabled agents - <a href="https://github.com/BerriAI/litellm/pull/17141" target="_blank" rel="noopener noreferrer">PR #17141</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/chatgpt">ChatGPT</a></strong></p>
<ul>
<li>Add support for calling a ChatGPT subscription via LiteLLM - <a href="https://github.com/BerriAI/litellm/pull/19030" target="_blank" rel="noopener noreferrer">PR #19030</a></li>
<li>Add Responses API bridge support for the ChatGPT subscription provider - <a href="https://github.com/BerriAI/litellm/pull/19030" target="_blank" rel="noopener noreferrer">PR #19030</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/bedrock">Bedrock</a></strong></p>
<ul>
<li>support for output format for bedrock invoke via v1/messages - <a href="https://github.com/BerriAI/litellm/pull/19560" target="_blank" rel="noopener noreferrer">PR #19560</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/azure/azure">Azure</a></strong></p>
<ul>
<li>Add support for Azure OpenAI v1 API - <a href="https://github.com/BerriAI/litellm/pull/19313" target="_blank" rel="noopener noreferrer">PR #19313</a></li>
<li>preserve content_policy_violation details for images (#19328) - <a href="https://github.com/BerriAI/litellm/pull/19372" target="_blank" rel="noopener noreferrer">PR #19372</a></li>
<li>Support OpenAI-format nested tool definitions for Responses API - <a href="https://github.com/BerriAI/litellm/pull/19526" target="_blank" rel="noopener noreferrer">PR #19526</a></li>
</ul>
</li>
<li>
<p><strong>Gemini(<a href="https://docs.litellm.ai/docs/providers/vertex">Vertex AI</a>, <a href="https://docs.litellm.ai/docs/providers/gemini">Google AI Studio</a>)</strong></p>
<ul>
<li>use responseJsonSchema for Gemini 2.0+ models - <a href="https://github.com/BerriAI/litellm/pull/19314" target="_blank" rel="noopener noreferrer">PR #19314</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/volcano">Volcengine</a></strong></p>
<ul>
<li>Support Volcengine responses api - <a href="https://github.com/BerriAI/litellm/pull/18508" target="_blank" rel="noopener noreferrer">PR #18508</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/anthropic">Anthropic</a></strong></p>
<ul>
<li>Add support for calling Claude Code Max subscriptions via LiteLLM - <a href="https://github.com/BerriAI/litellm/pull/19453" target="_blank" rel="noopener noreferrer">PR #19453</a></li>
<li>Add Structured output for /v1/messages with Anthropic API, Azure Anthropic API, Bedrock Converse - <a href="https://github.com/BerriAI/litellm/pull/19545" target="_blank" rel="noopener noreferrer">PR #19545</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/search/brave">Brave Search</a></strong></p>
<ul>
<li>New Search provider - <a href="https://github.com/BerriAI/litellm/pull/19433" target="_blank" rel="noopener noreferrer">PR #19433</a></li>
</ul>
</li>
<li>
<p><strong>Sarvam AI</strong></p>
<ul>
<li>Add support for new Sarvam models - <a href="https://github.com/BerriAI/litellm/pull/19479" target="_blank" rel="noopener noreferrer">PR #19479</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/gmi">GMI</a></strong></p>
<ul>
<li>add GMI Cloud provider support - <a href="https://github.com/BerriAI/litellm/pull/19376" target="_blank" rel="noopener noreferrer">PR #19376</a></li>
</ul>
</li>
</ul>
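One item above worth illustrating: the OpenAI chat spec caps the <code>stop</code> parameter at four sequences, which is what the JetBrains compatibility fix (PR #19562) enforces. A minimal sketch of spec-compliant clamping (illustrative behavior, not LiteLLM's actual implementation):

```python
# The OpenAI chat spec allows `stop` to be a string or a list of up to 4
# sequences; a gateway can clamp oversized lists instead of erroring.
MAX_STOP_SEQUENCES = 4

def clamp_stop(stop):
    """Return a spec-compliant `stop` value: None/str pass through, lists are truncated to 4."""
    if stop is None or isinstance(stop, str):
        return stop
    return list(stop)[:MAX_STOP_SEQUENCES]

print(clamp_stop(["a", "b", "c", "d", "e"]))  # ['a', 'b', 'c', 'd']
```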
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="bug-fixes">Bug Fixes<a href="https://docs.litellm.ai/release_notes/v1-81-3#bug-fixes" class="hash-link" aria-label="Direct link to Bug Fixes" title="Direct link to Bug Fixes">​</a></h3>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/anthropic">Anthropic</a></strong></p>
<ul>
<li>Fix anthropic-beta sent client side being overridden instead of appended to - <a href="https://github.com/BerriAI/litellm/pull/19343" target="_blank" rel="noopener noreferrer">PR #19343</a></li>
<li>Filter out unsupported fields from JSON schema for Anthropic's output_format API - <a href="https://github.com/BerriAI/litellm/pull/19482" target="_blank" rel="noopener noreferrer">PR #19482</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/bedrock">Bedrock</a></strong></p>
<ul>
<li>Expose stability models via /image_edits endpoint and ensure proper request transformation - <a href="https://github.com/BerriAI/litellm/pull/19323" target="_blank" rel="noopener noreferrer">PR #19323</a></li>
<li>Claude Code x Bedrock Invoke fails with advanced-tool-use-2025-11-20 - <a href="https://github.com/BerriAI/litellm/pull/19373" target="_blank" rel="noopener noreferrer">PR #19373</a></li>
<li>deduplicate tool calls in assistant history - <a href="https://github.com/BerriAI/litellm/pull/19324" target="_blank" rel="noopener noreferrer">PR #19324</a></li>
<li>fix: correct us.anthropic.claude-opus-4-5 In-region pricing - <a href="https://github.com/BerriAI/litellm/pull/19310" target="_blank" rel="noopener noreferrer">PR #19310</a></li>
<li>Fix request validation errors when using Claude 4 via bedrock invoke - <a href="https://github.com/BerriAI/litellm/pull/19381" target="_blank" rel="noopener noreferrer">PR #19381</a></li>
<li>Handle thinking with tool calls for Claude 4 models - <a href="https://github.com/BerriAI/litellm/pull/19506" target="_blank" rel="noopener noreferrer">PR #19506</a></li>
<li>correct streaming choice index for tool calls - <a href="https://github.com/BerriAI/litellm/pull/19506" target="_blank" rel="noopener noreferrer">PR #19506</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/ollama">Ollama</a></strong></p>
<ul>
<li>Fix tool call errors with improved message extraction - <a href="https://github.com/BerriAI/litellm/pull/19369" target="_blank" rel="noopener noreferrer">PR #19369</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/vertex">VertexAI</a></strong></p>
<ul>
<li>Remove the optional <code>vertex_count_tokens_location</code> param before the request is sent to Vertex - <a href="https://github.com/BerriAI/litellm/pull/19359" target="_blank" rel="noopener noreferrer">PR #19359</a></li>
</ul>
</li>
<li>
<p><strong>Gemini(<a href="https://docs.litellm.ai/docs/providers/vertex">Vertex AI</a>, <a href="https://docs.litellm.ai/docs/providers/gemini">Google AI Studio</a>)</strong></p>
<ul>
<li>Support setting media_resolution and fps parameters on each video file when using Gemini video understanding - <a href="https://github.com/BerriAI/litellm/pull/19273" target="_blank" rel="noopener noreferrer">PR #19273</a></li>
<li>handle reasoning_effort as dict from OpenAI Agents SDK - <a href="https://github.com/BerriAI/litellm/pull/19419" target="_blank" rel="noopener noreferrer">PR #19419</a></li>
<li>add file content support in tool results - <a href="https://github.com/BerriAI/litellm/pull/19416" target="_blank" rel="noopener noreferrer">PR #19416</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/azure_ai">Azure AI</a></strong></p>
<ul>
<li>Fix Azure AI costs for Anthropic models - <a href="https://github.com/BerriAI/litellm/pull/19530" target="_blank" rel="noopener noreferrer">PR #19530</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/gigachat">GigaChat</a></strong></p>
<ul>
<li>Add tool choice mapping - <a href="https://github.com/BerriAI/litellm/pull/19645" target="_blank" rel="noopener noreferrer">PR #19645</a></li>
</ul>
</li>
</ul>
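The <code>anthropic-beta</code> fix above (PR #19343) changes override behavior to append: client-sent betas are merged into the defaults rather than replacing them. Since <code>anthropic-beta</code> is a comma-separated header, the intended merge can be sketched as follows (illustrative, not the actual LiteLLM code):

```python
def merge_anthropic_beta(default_betas: str, client_betas) -> str:
    """Append client-sent anthropic-beta values to the defaults, de-duplicated."""
    merged = [b.strip() for b in default_betas.split(",") if b.strip()]
    for beta in (client_betas or "").split(","):
        beta = beta.strip()
        if beta and beta not in merged:
            merged.append(beta)  # append, never override
    return ",".join(merged)

print(merge_anthropic_beta("prompt-caching-2024-07-31", "context-1m-2025-08-07"))
```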
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="ai-api-endpoints-llms-mcp-agents">AI API Endpoints (LLMs, MCP, Agents)<a href="https://docs.litellm.ai/release_notes/v1-81-3#ai-api-endpoints-llms-mcp-agents" class="hash-link" aria-label="Direct link to AI API Endpoints (LLMs, MCP, Agents)" title="Direct link to AI API Endpoints (LLMs, MCP, Agents)">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="features-1">Features<a href="https://docs.litellm.ai/release_notes/v1-81-3#features-1" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h3>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/files_endpoints">Files API</a></strong></p>
<ul>
<li>Add managed files support when load_balancing is True - <a href="https://github.com/BerriAI/litellm/pull/19338" target="_blank" rel="noopener noreferrer">PR #19338</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/tutorials/claude_code_plugin_marketplace">Claude Plugin Marketplace</a></strong></p>
<ul>
<li>Add self hosted Claude Code Plugin Marketplace - <a href="https://github.com/BerriAI/litellm/pull/19378" target="_blank" rel="noopener noreferrer">PR #19378</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/mcp">MCP</a></strong></p>
<ul>
<li>Add MCP Protocol version&nbsp;2025-11-25&nbsp;support - <a href="https://github.com/BerriAI/litellm/pull/19379" target="_blank" rel="noopener noreferrer">PR #19379</a></li>
<li>Log MCP tool calls and list tools in the LiteLLM Spend Logs table for easier debugging - <a href="https://github.com/BerriAI/litellm/pull/19469" target="_blank" rel="noopener noreferrer">PR #19469</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/vertex">Vertex AI</a></strong></p>
<ul>
<li>Ensure only anthropic betas are forwarded down to LLM API (by default) - <a href="https://github.com/BerriAI/litellm/pull/19542" target="_blank" rel="noopener noreferrer">PR #19542</a></li>
<li>Allow overriding so that incoming headers are forwarded down to the target - <a href="https://github.com/BerriAI/litellm/pull/19524" target="_blank" rel="noopener noreferrer">PR #19524</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/completion/input">Chat/Completions</a></strong></p>
<ul>
<li>Add MCP tools response to chat completions - <a href="https://github.com/BerriAI/litellm/pull/19552" target="_blank" rel="noopener noreferrer">PR #19552</a></li>
<li>Add custom vertex ai finish reasons to the output - <a href="https://github.com/BerriAI/litellm/pull/19558" target="_blank" rel="noopener noreferrer">PR #19558</a></li>
<li>Return MCP execution in /chat/completions before model output during streaming - <a href="https://github.com/BerriAI/litellm/pull/19623" target="_blank" rel="noopener noreferrer">PR #19623</a></li>
</ul>
</li>
</ul>
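The streaming change above (MCP execution returned in <code>/chat/completions</code> before model output, PR #19623) is observed by consuming the response's server-sent events. A minimal stdlib SSE <code>data:</code>-line parser (generic SSE handling; the exact chunk shapes LiteLLM emits for MCP execution are not shown here):

```python
import json

def iter_sse_json(lines):
    """Yield parsed JSON payloads from `data: {...}` SSE lines, stopping at [DONE]."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip comments, event names, and blank keep-alives
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)

# Sample chat-completions-style chunks
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print([c["choices"][0]["delta"]["content"] for c in iter_sse_json(sample)])  # ['Hel', 'lo']
```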
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="bugs">Bugs<a href="https://docs.litellm.ai/release_notes/v1-81-3#bugs" class="hash-link" aria-label="Direct link to Bugs" title="Direct link to Bugs">​</a></h3>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/response_api">Responses API</a></strong></p>
<ul>
<li>Fix duplicate messages during MCP streaming tool execution - <a href="https://github.com/BerriAI/litellm/pull/19317" target="_blank" rel="noopener noreferrer">PR #19317</a></li>
<li>Fix pickle error when using OpenAI's Responses API with <code>stream=True</code> and <code>tool_choice</code> of type <code>allowed_tools</code> (an OpenAI-native parameter) - <a href="https://github.com/BerriAI/litellm/pull/17205" target="_blank" rel="noopener noreferrer">PR #17205</a></li>
<li>stream tool call events for non-openai models - <a href="https://github.com/BerriAI/litellm/pull/19368" target="_blank" rel="noopener noreferrer">PR #19368</a></li>
<li>preserve tool output ordering for gemini in responses bridge - <a href="https://github.com/BerriAI/litellm/pull/19360" target="_blank" rel="noopener noreferrer">PR #19360</a></li>
<li>Add ID caching to prevent ID mismatch text-start and text-delta - <a href="https://github.com/BerriAI/litellm/pull/19390" target="_blank" rel="noopener noreferrer">PR #19390</a></li>
<li>Include output_item, reasoning_summary_text_done and reasoning_summary_part_done events for non-openai models - <a href="https://github.com/BerriAI/litellm/pull/19472" target="_blank" rel="noopener noreferrer">PR #19472</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/completion/input">Chat/Completions</a></strong></p>
<ul>
<li>fix: drop_params not dropping prompt_cache_key for non-OpenAI providers - <a href="https://github.com/BerriAI/litellm/pull/19346" target="_blank" rel="noopener noreferrer">PR #19346</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/realtime">Realtime API</a></strong></p>
<ul>
<li>disable SSL for ws:// WebSocket connections - <a href="https://github.com/BerriAI/litellm/pull/19345" target="_blank" rel="noopener noreferrer">PR #19345</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/generateContent">Generate Content</a></strong></p>
<ul>
<li>Log actual user input when google genai/vertex endpoints are called client-side - <a href="https://github.com/BerriAI/litellm/pull/19156" target="_blank" rel="noopener noreferrer">PR #19156</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/anthropic_count_tokens">/messages/count_tokens Anthropic Token Counting</a></strong></p>
<ul>
<li>Ensure it works for Anthropic and Azure AI Anthropic on the AI Gateway - <a href="https://github.com/BerriAI/litellm/pull/19432" target="_blank" rel="noopener noreferrer">PR #19432</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/mcp">MCP</a></strong></p>
<ul>
<li>forward static_headers to MCP servers - <a href="https://github.com/BerriAI/litellm/pull/19366" target="_blank" rel="noopener noreferrer">PR #19366</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/batches">Batch API</a></strong></p>
<ul>
<li>Fix: generation config empty for batch - <a href="https://github.com/BerriAI/litellm/pull/19556" target="_blank" rel="noopener noreferrer">PR #19556</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/pass_through">Pass Through Endpoints</a></strong></p>
<ul>
<li>Always reupdate registry - <a href="https://github.com/BerriAI/litellm/pull/19420" target="_blank" rel="noopener noreferrer">PR #19420</a></li>
</ul>
</li>
</ul>
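The token-counting fix above (PR #19432) concerns Anthropic's <code>/v1/messages/count_tokens</code> route, which takes the same model/messages shape as <code>/v1/messages</code>. A minimal payload sketch (the model name is illustrative):

```python
import json

def count_tokens_payload(model: str, user_text: str) -> str:
    """Build the JSON body for a /v1/messages/count_tokens request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return json.dumps(body)

payload = count_tokens_payload("claude-sonnet-4-5", "Hello!")
print(json.loads(payload)["messages"][0]["role"])  # user
```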
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="management-endpoints--ui">Management Endpoints / UI<a href="https://docs.litellm.ai/release_notes/v1-81-3#management-endpoints--ui" class="hash-link" aria-label="Direct link to Management Endpoints / UI" title="Direct link to Management Endpoints / UI">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="features-2">Features<a href="https://docs.litellm.ai/release_notes/v1-81-3#features-2" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h3>
<ul>
<li>
<p><strong>Cost Estimator</strong></p>
<ul>
<li>Fix model dropdown - <a href="https://github.com/BerriAI/litellm/pull/19529" target="_blank" rel="noopener noreferrer">PR #19529</a></li>
</ul>
</li>
<li>
<p><strong>Claude Code Plugins</strong></p>
<ul>
<li>Allow Adding Claude Code Plugins via UI - <a href="https://github.com/BerriAI/litellm/pull/19387" target="_blank" rel="noopener noreferrer">PR #19387</a></li>
</ul>
</li>
<li>
<p><strong>Guardrails</strong></p>
<ul>
<li>New Policy management UI - <a href="https://github.com/BerriAI/litellm/pull/19668" target="_blank" rel="noopener noreferrer">PR #19668</a></li>
<li>Allow adding policies on Keys/Teams + Viewing on Info panels - <a href="https://github.com/BerriAI/litellm/pull/19688" target="_blank" rel="noopener noreferrer">PR #19688</a></li>
</ul>
</li>
<li>
<p><strong>General</strong></p>
<ul>
<li>Respect custom authentication header override - <a href="https://github.com/BerriAI/litellm/pull/19276" target="_blank" rel="noopener noreferrer">PR #19276</a></li>
</ul>
</li>
<li>
<p><strong>Playground</strong></p>
<ul>
<li>Button to Fill Custom API Base - <a href="https://github.com/BerriAI/litellm/pull/19440" target="_blank" rel="noopener noreferrer">PR #19440</a></li>
<li>Display MCP output on the Playground - <a href="https://github.com/BerriAI/litellm/pull/19553" target="_blank" rel="noopener noreferrer">PR #19553</a></li>
</ul>
</li>
<li>
<p><strong>Models</strong></p>
<ul>
<li>Paginate /v2/models/info - <a href="https://github.com/BerriAI/litellm/pull/19521" target="_blank" rel="noopener noreferrer">PR #19521</a></li>
<li>All Model Tab Pagination - <a href="https://github.com/BerriAI/litellm/pull/19525" target="_blank" rel="noopener noreferrer">PR #19525</a></li>
<li>Adding Optional scope Param to /models - <a href="https://github.com/BerriAI/litellm/pull/19539" target="_blank" rel="noopener noreferrer">PR #19539</a></li>
<li>Model Search - <a href="https://github.com/BerriAI/litellm/pull/19622" target="_blank" rel="noopener noreferrer">PR #19622</a></li>
<li>Filter by Model ID and Team ID - <a href="https://github.com/BerriAI/litellm/pull/19713" target="_blank" rel="noopener noreferrer">PR #19713</a></li>
</ul>
</li>
<li>
<p><strong>MCP Servers</strong></p>
<ul>
<li>MCP Tools Tab Resetting to Overview - <a href="https://github.com/BerriAI/litellm/pull/19468" target="_blank" rel="noopener noreferrer">PR #19468</a></li>
</ul>
</li>
<li>
<p><strong>Organizations</strong></p>
<ul>
<li>Prevent org admin from creating a new user with proxy_admin permissions - <a href="https://github.com/BerriAI/litellm/pull/19296" target="_blank" rel="noopener noreferrer">PR #19296</a></li>
<li>Edit Page: Reusable Model Select - <a href="https://github.com/BerriAI/litellm/pull/19601" target="_blank" rel="noopener noreferrer">PR #19601</a></li>
</ul>
</li>
<li>
<p><strong>Teams</strong></p>
<ul>
<li>Reusable Model Select - <a href="https://github.com/BerriAI/litellm/pull/19543" target="_blank" rel="noopener noreferrer">PR #19543</a></li>
<li>[Fix] Team Update with Organization having All Proxy Models - <a href="https://github.com/BerriAI/litellm/pull/19604" target="_blank" rel="noopener noreferrer">PR #19604</a></li>
</ul>
</li>
<li>
<p><strong>Logs</strong></p>
<ul>
<li>Include tool arguments in spend logs table - <a href="https://github.com/BerriAI/litellm/pull/19640" target="_blank" rel="noopener noreferrer">PR #19640</a></li>
</ul>
</li>
<li>
<p><strong>Fallbacks / Loadbalancing</strong></p>
<ul>
<li>New fallbacks modal - <a href="https://github.com/BerriAI/litellm/pull/19673" target="_blank" rel="noopener noreferrer">PR #19673</a></li>
<li>Set fallbacks/loadbalancing by team/key - <a href="https://github.com/BerriAI/litellm/pull/19686" target="_blank" rel="noopener noreferrer">PR #19686</a></li>
</ul>
</li>
</ul>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="bugs-1">Bugs<a href="https://docs.litellm.ai/release_notes/v1-81-3#bugs-1" class="hash-link" aria-label="Direct link to Bugs" title="Direct link to Bugs">​</a></h3>
<ul>
<li>
<p><strong>Playground</strong></p>
<ul>
<li>increase model selector width in playground Compare view - <a href="https://github.com/BerriAI/litellm/pull/19423" target="_blank" rel="noopener noreferrer">PR #19423</a></li>
</ul>
</li>
<li>
<p><strong>Virtual Keys</strong></p>
<ul>
<li>Sorting Shows Incorrect Entries - <a href="https://github.com/BerriAI/litellm/pull/19534" target="_blank" rel="noopener noreferrer">PR #19534</a></li>
</ul>
</li>
<li>
<p><strong>General</strong></p>
<ul>
<li>UI 404 error when SERVER_ROOT_PATH is set - <a href="https://github.com/BerriAI/litellm/pull/19467" target="_blank" rel="noopener noreferrer">PR #19467</a></li>
<li>Redirect to ui/login on expired JWT - <a href="https://github.com/BerriAI/litellm/pull/19687" target="_blank" rel="noopener noreferrer">PR #19687</a></li>
</ul>
</li>
<li>
<p><strong>SSO</strong></p>
<ul>
<li>Fix SSO user roles not updating for existing users - <a href="https://github.com/BerriAI/litellm/pull/19621" target="_blank" rel="noopener noreferrer">PR #19621</a></li>
</ul>
</li>
<li>
<p><strong>Guardrails</strong></p>
<ul>
<li>ensure guardrail patterns persist on edit and mode toggle - <a href="https://github.com/BerriAI/litellm/pull/19265" target="_blank" rel="noopener noreferrer">PR #19265</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="ai-integrations">AI Integrations<a href="https://docs.litellm.ai/release_notes/v1-81-3#ai-integrations" class="hash-link" aria-label="Direct link to AI Integrations" title="Direct link to AI Integrations">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="logging">Logging<a href="https://docs.litellm.ai/release_notes/v1-81-3#logging" class="hash-link" aria-label="Direct link to Logging" title="Direct link to Logging">​</a></h3>
<ul>
<li><strong>General Logging</strong>
<ul>
<li>prevent printing duplicate StandardLoggingPayload logs - <a href="https://github.com/BerriAI/litellm/pull/19325" target="_blank" rel="noopener noreferrer">PR #19325</a></li>
<li>Fix: log duplication when json_logs is enabled - <a href="https://github.com/BerriAI/litellm/pull/19705" target="_blank" rel="noopener noreferrer">PR #19705</a></li>
</ul>
</li>
<li><strong>Langfuse OTEL</strong>
<ul>
<li>ignore service logs and fix callback shadowing - <a href="https://github.com/BerriAI/litellm/pull/19298" target="_blank" rel="noopener noreferrer">PR #19298</a></li>
</ul>
</li>
<li><strong>Langfuse</strong>
<ul>
<li>Send litellm_trace_id - <a href="https://github.com/BerriAI/litellm/pull/19528" target="_blank" rel="noopener noreferrer">PR #19528</a></li>
<li>Add Langfuse mock mode for testing without API calls - <a href="https://github.com/BerriAI/litellm/pull/19676" target="_blank" rel="noopener noreferrer">PR #19676</a></li>
</ul>
</li>
<li><strong>GCS Bucket</strong>
<ul>
<li>prevent unbounded queue growth due to slow API calls - <a href="https://github.com/BerriAI/litellm/pull/19297" target="_blank" rel="noopener noreferrer">PR #19297</a></li>
<li>Add GCS mock mode for testing without API calls - <a href="https://github.com/BerriAI/litellm/pull/19683" target="_blank" rel="noopener noreferrer">PR #19683</a></li>
</ul>
</li>
<li><strong>Responses API Logging</strong>
<ul>
<li>Fix pydantic serialization error - <a href="https://github.com/BerriAI/litellm/pull/19486" target="_blank" rel="noopener noreferrer">PR #19486</a></li>
</ul>
</li>
<li><strong>Arize Phoenix</strong>
<ul>
<li>Add OpenInference span kinds to Arize Phoenix - <a href="https://github.com/BerriAI/litellm/pull/19267" target="_blank" rel="noopener noreferrer">PR #19267</a></li>
</ul>
</li>
<li><strong>Prometheus</strong>
<ul>
<li>Added new prometheus metrics for user count and team count - <a href="https://github.com/BerriAI/litellm/pull/19520" target="_blank" rel="noopener noreferrer">PR #19520</a></li>
</ul>
</li>
</ul>
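The Langfuse and GCS "mock mode" entries above add ways to exercise logging without real API calls. The general pattern is a flag that short-circuits the exporter and records events locally (a generic sketch of the pattern, not LiteLLM's implementation):

```python
class MockableExporter:
    """Log exporter with a mock mode that captures events instead of sending them."""

    def __init__(self, mock_mode: bool = False):
        self.mock_mode = mock_mode
        self.sent = []      # events delivered over the network (simulated here)
        self.mocked = []    # events captured locally in mock mode

    def export(self, event: dict) -> None:
        if self.mock_mode:
            self.mocked.append(event)  # record only; no API call
        else:
            self.sent.append(event)    # a real exporter would POST here

exporter = MockableExporter(mock_mode=True)
exporter.export({"trace_id": "abc", "cost": 0.01})
print(len(exporter.mocked), len(exporter.sent))  # 1 0
```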
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="guardrails">Guardrails<a href="https://docs.litellm.ai/release_notes/v1-81-3#guardrails" class="hash-link" aria-label="Direct link to Guardrails" title="Direct link to Guardrails">​</a></h3>
<ul>
<li><strong>Bedrock Guardrails</strong>
<ul>
<li>Ensure post_call guardrail checks input+output - <a href="https://github.com/BerriAI/litellm/pull/19151" target="_blank" rel="noopener noreferrer">PR #19151</a></li>
</ul>
</li>
<li><strong>Prompt Security</strong>
<ul>
<li>Fix Prompt Security's guardrail implementation - <a href="https://github.com/BerriAI/litellm/pull/19374" target="_blank" rel="noopener noreferrer">PR #19374</a></li>
</ul>
</li>
<li><strong>Presidio</strong>
<ul>
<li>Fixes crash in Presidio Guardrail when running in background threads (logging_hook) - <a href="https://github.com/BerriAI/litellm/pull/19714" target="_blank" rel="noopener noreferrer">PR #19714</a></li>
</ul>
</li>
<li><strong>Pillar Security</strong>
<ul>
<li>Migrate Pillar Security to Generic Guardrail API - <a href="https://github.com/BerriAI/litellm/pull/19364" target="_blank" rel="noopener noreferrer">PR #19364</a></li>
</ul>
</li>
<li><strong>Policy Engine</strong>
<ul>
<li>New LiteLLM Policy Engine - create policies to manage guardrails, conditions, and permissions per Key and Team - <a href="https://github.com/BerriAI/litellm/pull/19612" target="_blank" rel="noopener noreferrer">PR #19612</a></li>
</ul>
</li>
<li><strong>General</strong>
<ul>
<li>add case-insensitive support for guardrail mode and actions - <a href="https://github.com/BerriAI/litellm/pull/19480" target="_blank" rel="noopener noreferrer">PR #19480</a></li>
</ul>
</li>
</ul>
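The case-insensitivity change above (PR #19480) boils down to normalizing the configured mode string before matching. A sketch of that normalization step (the mode names are taken from LiteLLM's documented guardrail modes, but this is illustrative code, not the actual implementation):

```python
VALID_MODES = {"pre_call", "post_call", "during_call", "logging_only"}

def normalize_mode(mode: str) -> str:
    """Accept guardrail modes in any case, e.g. 'Pre_Call' -> 'pre_call'."""
    normalized = mode.strip().lower()
    if normalized not in VALID_MODES:
        raise ValueError(f"unknown guardrail mode: {mode!r}")
    return normalized

print(normalize_mode("Pre_Call"))  # pre_call
```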
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="prompt-management">Prompt Management<a href="https://docs.litellm.ai/release_notes/v1-81-3#prompt-management" class="hash-link" aria-label="Direct link to Prompt Management" title="Direct link to Prompt Management">​</a></h3>
<ul>
<li><strong>General</strong>
<ul>
<li>fix prompt info lookup and delete using correct IDs - <a href="https://github.com/BerriAI/litellm/pull/19358" target="_blank" rel="noopener noreferrer">PR #19358</a></li>
</ul>
</li>
</ul>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="secret-manager">Secret Manager<a href="https://docs.litellm.ai/release_notes/v1-81-3#secret-manager" class="hash-link" aria-label="Direct link to Secret Manager" title="Direct link to Secret Manager">​</a></h3>
<ul>
<li><strong>AWS Secret Manager</strong>
<ul>
<li>Ensure auto-rotation updates the existing AWS secret instead of creating a new one - <a href="https://github.com/BerriAI/litellm/pull/19455" target="_blank" rel="noopener noreferrer">PR #19455</a></li>
</ul>
</li>
<li><strong>Hashicorp Vault</strong>
<ul>
<li>Ensure key rotations work with Vault - <a href="https://github.com/BerriAI/litellm/pull/19634" target="_blank" rel="noopener noreferrer">PR #19634</a></li>
</ul>
</li>
</ul>
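The AWS Secrets Manager fix above makes rotation update the existing secret in place instead of creating a duplicate. Against a dict-backed fake store, the intended update-vs-create behavior looks like this (a sketch of the behavior, not the real AWS client):

```python
class FakeSecretStore:
    """Dict-backed stand-in for a secret manager, to illustrate rotation semantics."""

    def __init__(self):
        self.secrets = {}

    def rotate(self, name: str, new_value: str) -> str:
        # Update in place when the secret exists; create only when it doesn't.
        action = "updated" if name in self.secrets else "created"
        self.secrets[name] = new_value
        return action

store = FakeSecretStore()
print(store.rotate("litellm/master-key", "v1"))  # created
print(store.rotate("litellm/master-key", "v2"))  # updated
print(len(store.secrets))  # 1 -- rotation did not add a second secret
```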
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="spend-tracking-budgets-and-rate-limiting">Spend Tracking, Budgets and Rate Limiting<a href="https://docs.litellm.ai/release_notes/v1-81-3#spend-tracking-budgets-and-rate-limiting" class="hash-link" aria-label="Direct link to Spend Tracking, Budgets and Rate Limiting" title="Direct link to Spend Tracking, Budgets and Rate Limiting">​</a></h2>
<ul>
<li><strong>Pricing Updates</strong>
<ul>
<li>Add openai/dall-e base pricing entries - <a href="https://github.com/BerriAI/litellm/pull/19133" target="_blank" rel="noopener noreferrer">PR #19133</a></li>
<li>Add <code>input_cost_per_video_per_second</code> in ModelInfoBase - <a href="https://github.com/BerriAI/litellm/pull/19398" target="_blank" rel="noopener noreferrer">PR #19398</a></li>
</ul>
</li>
</ul>
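The new <code>input_cost_per_video_per_second</code> field above prices video input by duration, so its cost contribution is simply seconds multiplied by the per-second rate (the rate value below is illustrative):

```python
def video_input_cost(duration_seconds: float, cost_per_second: float) -> float:
    """Cost contribution of video input priced via input_cost_per_video_per_second."""
    return duration_seconds * cost_per_second

# e.g. a 90-second clip at a hypothetical $0.0001/second
print(round(video_input_cost(90, 0.0001), 6))  # 0.009
```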
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="performance--loadbalancing--reliability-improvements">Performance / Loadbalancing / Reliability improvements<a href="https://docs.litellm.ai/release_notes/v1-81-3#performance--loadbalancing--reliability-improvements" class="hash-link" aria-label="Direct link to Performance / Loadbalancing / Reliability improvements" title="Direct link to Performance / Loadbalancing / Reliability improvements">​</a></h2>
<ul>
<li>
<p><strong>General</strong></p>
<ul>
<li>Fix date overflow/division by zero in proxy utils - <a href="https://github.com/BerriAI/litellm/pull/19527" target="_blank" rel="noopener noreferrer">PR #19527</a></li>
<li>Fix in-flight request termination on SIGTERM when health-check runs in a separate process - <a href="https://github.com/BerriAI/litellm/pull/19427" target="_blank" rel="noopener noreferrer">PR #19427</a></li>
<li>Fix Pass through routes to work with server root path - <a href="https://github.com/BerriAI/litellm/pull/19383" target="_blank" rel="noopener noreferrer">PR #19383</a></li>
<li>Fix logging error for stop iteration - <a href="https://github.com/BerriAI/litellm/pull/19649" target="_blank" rel="noopener noreferrer">PR #19649</a></li>
<li>prevent retrying 4xx client errors - <a href="https://github.com/BerriAI/litellm/pull/19275" target="_blank" rel="noopener noreferrer">PR #19275</a></li>
<li>add better error handling for misconfig on health check - <a href="https://github.com/BerriAI/litellm/pull/19441" target="_blank" rel="noopener noreferrer">PR #19441</a></li>
</ul>
</li>
<li>
<p><strong>Router</strong></p>
<ul>
<li>Fix Azure RPM calculation formula - <a href="https://github.com/BerriAI/litellm/pull/19513" target="_blank" rel="noopener noreferrer">PR #19513</a></li>
<li>Persist scheduler request queue to redis - <a href="https://github.com/BerriAI/litellm/pull/19304" target="_blank" rel="noopener noreferrer">PR #19304</a></li>
<li>Pass search_tools to Router during DB-triggered initialization - <a href="https://github.com/BerriAI/litellm/pull/19388" target="_blank" rel="noopener noreferrer">PR #19388</a></li>
<li>Fix PromptCachingCache to correctly handle messages where cache_control is a sibling key of string content - <a href="https://github.com/BerriAI/litellm/pull/19266" target="_blank" rel="noopener noreferrer">PR #19266</a></li>
</ul>
</li>
<li>
<p><strong>Memory Leaks/OOM</strong></p>
<ul>
<li>Prevent OOM with nested $defs in tool schemas - <a href="https://github.com/BerriAI/litellm/pull/19112" target="_blank" rel="noopener noreferrer">PR #19112</a></li>
<li>Fix HTTP client memory leaks in Presidio, OpenAI, and Gemini - <a href="https://github.com/BerriAI/litellm/pull/19190" target="_blank" rel="noopener noreferrer">PR #19190</a></li>
</ul>
</li>
<li>
<p><strong>Non-Root</strong></p>
<ul>
<li>Fix supervisor logfile and pidfile paths for non-root environments - <a href="https://github.com/BerriAI/litellm/pull/17267" target="_blank" rel="noopener noreferrer">PR #17267</a></li>
<li>Resolve "Read-only file system" error in non-root images - <a href="https://github.com/BerriAI/litellm/pull/19449" target="_blank" rel="noopener noreferrer">PR #19449</a></li>
</ul>
</li>
<li>
<p><strong>Dockerfile</strong></p>
<ul>
<li>Redis Semantic Caching - add missing redisvl dependency to requirements.txt - <a href="https://github.com/BerriAI/litellm/pull/19417" target="_blank" rel="noopener noreferrer">PR #19417</a></li>
<li>Bump OTEL versions to support the a2a dependency - resolves <code>ModuleNotFoundError</code> for Microsoft Agents, by @Harshit28j - <a href="https://github.com/BerriAI/litellm/pull/18991" target="_blank" rel="noopener noreferrer">PR #18991</a></li>
</ul>
</li>
<li>
<p><strong>DB</strong></p>
<ul>
<li>Handle PostgreSQL cached plan errors during rolling deployments - <a href="https://github.com/BerriAI/litellm/pull/19424" target="_blank" rel="noopener noreferrer">PR #19424</a></li>
</ul>
</li>
<li>
<p><strong>Timeouts</strong></p>
<ul>
<li>Fix: total timeout is not respected - <a href="https://github.com/BerriAI/litellm/pull/19389" target="_blank" rel="noopener noreferrer">PR #19389</a></li>
</ul>
</li>
<li>
<p><strong>SDK</strong></p>
<ul>
<li>Add field-existence checks to type classes to prevent attribute errors - <a href="https://github.com/BerriAI/litellm/pull/18321" target="_blank" rel="noopener noreferrer">PR #18321</a></li>
<li>Add google-cloud-aiplatform as an optional dependency with a clear error message - <a href="https://github.com/BerriAI/litellm/pull/19437" target="_blank" rel="noopener noreferrer">PR #19437</a></li>
<li>Make <code>grpc</code> dependency optional - <a href="https://github.com/BerriAI/litellm/pull/19447" target="_blank" rel="noopener noreferrer">PR #19447</a></li>
<li>Add support for retry policies - <a href="https://github.com/BerriAI/litellm/pull/19645" target="_blank" rel="noopener noreferrer">PR #19645</a></li>
</ul>
</li>
<li>
<p><strong>Performance</strong></p>
<ul>
<li>Cut chat_completion latency by ~21% by reducing pre-call processing time - <a href="https://github.com/BerriAI/litellm/pull/19535" target="_blank" rel="noopener noreferrer">PR #19535</a></li>
<li>Optimize strip_trailing_slash with O(1) index check - <a href="https://github.com/BerriAI/litellm/pull/19679" target="_blank" rel="noopener noreferrer">PR #19679</a></li>
<li>Optimize use_custom_pricing_for_model with set intersection - <a href="https://github.com/BerriAI/litellm/pull/19677" target="_blank" rel="noopener noreferrer">PR #19677</a></li>
<li>Skip pattern_router.route() for non-wildcard models - <a href="https://github.com/BerriAI/litellm/pull/19664" target="_blank" rel="noopener noreferrer">PR #19664</a></li>
<li>Add LRU caching to get_model_info for faster cost lookups - <a href="https://github.com/BerriAI/litellm/pull/19606" target="_blank" rel="noopener noreferrer">PR #19606</a></li>
</ul>
</li>
</ul>
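<p>As an illustration of the caching optimization above, here is a minimal sketch (assumed pattern, not LiteLLM's actual implementation) of memoizing model-info lookups with an LRU cache, so repeated cost lookups for the same model avoid recomputation. The cost map and values below are hypothetical.</p>

```python
from functools import lru_cache

# Hypothetical static cost map for illustration -- not LiteLLM's real table.
MODEL_COST_MAP = {
    "gpt-4o": {"input_cost_per_token": 2.5e-06, "output_cost_per_token": 1e-05},
}

@lru_cache(maxsize=1024)
def get_model_info(model: str) -> tuple:
    # Return a hashable, immutable tuple so cached values cannot be mutated.
    info = MODEL_COST_MAP.get(model)
    if info is None:
        raise ValueError(f"model not found: {model}")
    return tuple(sorted(info.items()))

first = dict(get_model_info("gpt-4o"))   # cache miss: performs the lookup
second = dict(get_model_info("gpt-4o"))  # cache hit: returned from the cache
```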
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="general-proxy-improvements">General Proxy Improvements<a href="https://docs.litellm.ai/release_notes/v1-81-3#general-proxy-improvements" class="hash-link" aria-label="Direct link to General Proxy Improvements" title="Direct link to General Proxy Improvements">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="doc-improvements">Doc Improvements<a href="https://docs.litellm.ai/release_notes/v1-81-3#doc-improvements" class="hash-link" aria-label="Direct link to Doc Improvements" title="Direct link to Doc Improvements">​</a></h3>
<ul>
<li>New tutorial: adding MCPs to Cursor via LiteLLM - <a href="https://github.com/BerriAI/litellm/pull/19317" target="_blank" rel="noopener noreferrer">PR #19317</a></li>
<li>Fix vertex_region to vertex_location in Vertex AI pass-through docs - <a href="https://github.com/BerriAI/litellm/pull/19380" target="_blank" rel="noopener noreferrer">PR #19380</a></li>
<li>Clarify Gemini and Vertex AI model prefixes in the json file - <a href="https://github.com/BerriAI/litellm/pull/19443" target="_blank" rel="noopener noreferrer">PR #19443</a></li>
<li>Update Claude Code integration guides - <a href="https://github.com/BerriAI/litellm/pull/19415" target="_blank" rel="noopener noreferrer">PR #19415</a></li>
<li>Adjust opencode tutorial - <a href="https://github.com/BerriAI/litellm/pull/19605" target="_blank" rel="noopener noreferrer">PR #19605</a></li>
<li>Add spend-queue troubleshooting docs - <a href="https://github.com/BerriAI/litellm/pull/19659" target="_blank" rel="noopener noreferrer">PR #19659</a></li>
<li>Add litellm-enterprise requirement for managed files - <a href="https://github.com/BerriAI/litellm/pull/19689" target="_blank" rel="noopener noreferrer">PR #19689</a></li>
</ul>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="helm">Helm<a href="https://docs.litellm.ai/release_notes/v1-81-3#helm" class="hash-link" aria-label="Direct link to Helm" title="Direct link to Helm">​</a></h3>
<ul>
<li>Add KEDA support to the Helm chart - <a href="https://github.com/BerriAI/litellm/pull/19337" target="_blank" rel="noopener noreferrer">PR #19337</a></li>
<li>Sync Helm chart version with the LiteLLM release version - <a href="https://github.com/BerriAI/litellm/pull/19438" target="_blank" rel="noopener noreferrer">PR #19438</a></li>
<li>Enable PreStop hook configuration in values.yaml - <a href="https://github.com/BerriAI/litellm/pull/19613" target="_blank" rel="noopener noreferrer">PR #19613</a></li>
</ul>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="general">General<a href="https://docs.litellm.ai/release_notes/v1-81-3#general" class="hash-link" aria-label="Direct link to General" title="Direct link to General">​</a></h3>
<ul>
<li>Add health check scripts and parallel execution support - <a href="https://github.com/BerriAI/litellm/pull/19295" target="_blank" rel="noopener noreferrer">PR #19295</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-contributors">New Contributors<a href="https://docs.litellm.ai/release_notes/v1-81-3#new-contributors" class="hash-link" aria-label="Direct link to New Contributors" title="Direct link to New Contributors">​</a></h2>
<ul>
<li>@dushyantzz made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19158" target="_blank" rel="noopener noreferrer">PR #19158</a></li>
<li>@obod-mpw made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19133" target="_blank" rel="noopener noreferrer">PR #19133</a></li>
<li>@msexxeta made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19030" target="_blank" rel="noopener noreferrer">PR #19030</a></li>
<li>@rsicart made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19337" target="_blank" rel="noopener noreferrer">PR #19337</a></li>
<li>@cluebbehusen made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19311" target="_blank" rel="noopener noreferrer">PR #19311</a></li>
<li>@Lucky-Lodhi2004 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19315" target="_blank" rel="noopener noreferrer">PR #19315</a></li>
<li>@binbandit made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19324" target="_blank" rel="noopener noreferrer">PR #19324</a></li>
<li>@flex-myeonghyeon made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19381" target="_blank" rel="noopener noreferrer">PR #19381</a></li>
<li>@Lrakotoson made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18321" target="_blank" rel="noopener noreferrer">PR #18321</a></li>
<li>@bensi94 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18787" target="_blank" rel="noopener noreferrer">PR #18787</a></li>
<li>@victorigualada made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19368" target="_blank" rel="noopener noreferrer">PR #19368</a></li>
<li>@VedantMadane made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19266" target="_blank" rel="noopener noreferrer">PR #19266</a></li>
<li>@stiyyagura0901 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19276" target="_blank" rel="noopener noreferrer">PR #19276</a></li>
<li>@kamilio made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19447" target="_blank" rel="noopener noreferrer">PR #19447</a></li>
<li>@jonathansampson made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19433" target="_blank" rel="noopener noreferrer">PR #19433</a></li>
<li>@rynecarbone made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19416" target="_blank" rel="noopener noreferrer">PR #19416</a></li>
<li>@jayy-77 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19366" target="_blank" rel="noopener noreferrer">PR #19366</a></li>
<li>@davida-ps made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19374" target="_blank" rel="noopener noreferrer">PR #19374</a></li>
<li>@joaodinissf made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19506" target="_blank" rel="noopener noreferrer">PR #19506</a></li>
<li>@ecao310 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19520" target="_blank" rel="noopener noreferrer">PR #19520</a></li>
<li>@mpcusack-altos made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19577" target="_blank" rel="noopener noreferrer">PR #19577</a></li>
<li>@milan-berri made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19602" target="_blank" rel="noopener noreferrer">PR #19602</a></li>
<li>@xqe2011 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19621" target="_blank" rel="noopener noreferrer">PR #19621</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="full-changelog">Full Changelog<a href="https://docs.litellm.ai/release_notes/v1-81-3#full-changelog" class="hash-link" aria-label="Direct link to Full Changelog" title="Direct link to Full Changelog">​</a></h2>
<p><strong><a href="https://github.com/BerriAI/litellm/releases/tag/v1.81.3.rc" target="_blank" rel="noopener noreferrer">View complete changelog on GitHub</a></strong></p>]]></content>
        <author>
            <name>Krrish Dholakia</name>
            <uri>https://www.linkedin.com/in/krish-d/</uri>
        </author>
        <author>
            <name>Ishaan Jaff</name>
            <uri>https://www.linkedin.com/in/reffajnaahsi/</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[v1.81.0-stable - Claude Code - Web Search Across All Providers]]></title>
        <id>https://docs.litellm.ai/release_notes/v1-81-0</id>
        <link href="https://docs.litellm.ai/release_notes/v1-81-0"/>
        <updated>2026-01-18T10:00:00.000Z</updated>
        <summary type="html"><![CDATA[Deploy this version]]></summary>
        <content type="html"><![CDATA[<h2 class="anchor anchorWithStickyNavbar_LWe7" id="deploy-this-version">Deploy this version<a href="https://docs.litellm.ai/release_notes/v1-81-0#deploy-this-version" class="hash-link" aria-label="Direct link to Deploy this version" title="Direct link to Deploy this version">​</a></h2>
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">Docker</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">Pip</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">docker run litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker run \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-e STORE_MODEL_IN_DB=True \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-p 4000:4000 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">docker.litellm.ai/berriai/litellm:v1.81.0-stable</span><br></span></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">pip install litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">pip install litellm==1.81.0</span><br></span></code></pre></div></div></div></div></div>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="key-highlights">Key Highlights<a href="https://docs.litellm.ai/release_notes/v1-81-0#key-highlights" class="hash-link" aria-label="Direct link to Key Highlights" title="Direct link to Key Highlights">​</a></h2>
<ul>
<li><strong>Claude Code</strong> - Use Claude Code's web search across Bedrock, Vertex AI, and all LiteLLM providers</li>
<li><strong>Major Change</strong> - <a href="https://docs.litellm.ai/release_notes/v1-81-0#major-change---chatcompletions-image-url-download-size-limit">50MB limit on image URL downloads</a> to improve reliability</li>
<li><strong>Performance</strong> - <a href="https://docs.litellm.ai/release_notes/v1-81-0#performance---25-cpu-usage-reduction">25% CPU Usage Reduction</a> by removing premature model.dump() calls from the hot path</li>
<li><strong>Deleted Keys Audit Table on UI</strong> - <a href="https://docs.litellm.ai/docs/proxy/deleted_keys_teams.md">View deleted keys and teams for audit purposes</a> with spend and budget information at the time of deletion</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="claude-code---web-search-across-all-providers">Claude Code - Web Search Across All Providers<a href="https://docs.litellm.ai/release_notes/v1-81-0#claude-code---web-search-across-all-providers" class="hash-link" aria-label="Direct link to Claude Code - Web Search Across All Providers" title="Direct link to Claude Code - Web Search Across All Providers">​</a></h2>
<div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAFCAYAAAB8ZH1oAAAACXBIWXMAACxLAAAsSwGlPZapAAAAnElEQVR4nDWKO07DQBQAn/f3doNXG8dfbCJEQ0VcJQUo10jKHITLD5IjitEUM/K4/bIMP8zjlZLPqPsihRX1p83/yGW9c+y/Cb6hyQt1HIi+palndnEgaU/UHon+RK2fOGfp9jVteUHVE4PFGsFa2Sw7XQn+g0PbMc1vdP3A9LpQSkGDwzuDNdVz9PZ9C9M4knMmpYQxhqqqEJGNP+RsO/JzMjigAAAAAElFTkSuQmCC&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="640" height="334"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/assets/ideal-img/claude_code_websearch.9a9c1d7.640.png" srcset="/assets/ideal-img/claude_code_websearch.9a9c1d7.640.png 640w,/assets/ideal-img/claude_code_websearch.e852a7e.1920.png 1920w" width="640" height="334"></noscript></div>
<p>This release brings web search support to Claude Code across all LiteLLM providers (Bedrock, Azure, Vertex AI, and more), enabling AI coding assistants to search the web for real-time information.</p>
<p>This means you can now use Claude Code's web search tool with any provider, not just Anthropic's native API. LiteLLM automatically intercepts web search requests and executes them server-side using your configured search provider (Perplexity, Tavily, Exa AI, and more).</p>
<p>Proxy Admins can configure web search interception in their LiteLLM proxy config to enable this capability for their teams using Claude Code with Bedrock, Azure, or any other supported provider.</p>
<p><a href="https://docs.litellm.ai/docs/tutorials/claude_code_websearch" target="_blank" rel="noopener noreferrer"><strong>Learn more →</strong></a></p>
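<p>Conceptually, the interception works like this. The sketch below is illustrative only: the function name <code>intercept_web_search</code>, the tool-call shape, and the <code>run_search</code> callback are simplified stand-ins, not LiteLLM's real internals. The idea is that when the model emits a web-search tool call, the proxy executes it server-side with the configured search provider and feeds the result back as a tool message.</p>

```python
# Illustrative sketch -- simplified stand-in for LiteLLM's internals.
def intercept_web_search(tool_calls, run_search):
    """Execute any web_search tool calls server-side and return tool messages."""
    tool_messages = []
    for call in tool_calls:
        if call["name"] == "web_search":
            # Run the search with the configured provider (Perplexity, Tavily, ...)
            results = run_search(call["arguments"]["query"])
            tool_messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": results,
            })
    return tool_messages

# Usage with a stubbed-out search provider:
msgs = intercept_web_search(
    [{"id": "1", "name": "web_search", "arguments": {"query": "litellm"}}],
    run_search=lambda q: f"results for {q}",
)
```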
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="major-change---chatcompletions-image-url-download-size-limit">Major Change - /chat/completions Image URL Download Size Limit<a href="https://docs.litellm.ai/release_notes/v1-81-0#major-change---chatcompletions-image-url-download-size-limit" class="hash-link" aria-label="Direct link to Major Change - /chat/completions Image URL Download Size Limit" title="Direct link to Major Change - /chat/completions Image URL Download Size Limit">​</a></h2>
<p>To improve reliability and prevent memory issues, LiteLLM now includes a configurable <strong>50MB limit</strong> on image URL downloads by default. Previously, there was no limit on image downloads, which could occasionally cause memory issues with very large images.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="how-it-works">How It Works<a href="https://docs.litellm.ai/release_notes/v1-81-0#how-it-works" class="hash-link" aria-label="Direct link to How It Works" title="Direct link to How It Works">​</a></h3>
<p>Requests with image URLs exceeding 50MB will receive a helpful error message:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">curl -X POST 'https://your-litellm-proxy.com/chat/completions' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -H 'Content-Type: application/json' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -H 'Authorization: Bearer sk-1234' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -d '{</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "model": "gpt-4o",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "messages": [</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        "role": "user",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        "content": [</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            "type": "text",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            "text": "What is in this image?"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          },</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          {</span><br></span><span class="token-line" style="color:#393A34"><span 
class="token plain">            "type": "image_url",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            "image_url": {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">              "url": "https://example.com/very-large-image.jpg"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        ]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    ]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  }'</span><br></span></code></pre></div></div>
<p><strong>Error Response:</strong></p>
<div class="language-json codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-json codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token property" style="color:#36acaa">"error"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"message"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Error: Image size (75.50MB) exceeds maximum allowed size (50.0MB). 
url=https://example.com/very-large-image.jpg"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"type"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"ImageFetchError"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">}</span><br></span></code></pre></div></div>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="configuring-the-limit">Configuring the Limit<a href="https://docs.litellm.ai/release_notes/v1-81-0#configuring-the-limit" class="hash-link" aria-label="Direct link to Configuring the Limit" title="Direct link to Configuring the Limit">​</a></h3>
<p>The default 50MB limit works well for most use cases, but you can easily adjust it if needed:</p>
<p><strong>Increase the limit (e.g., to 100MB):</strong></p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">export MAX_IMAGE_URL_DOWNLOAD_SIZE_MB=100</span><br></span></code></pre></div></div>
<p><strong>Disable image URL downloads (for security):</strong></p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">export MAX_IMAGE_URL_DOWNLOAD_SIZE_MB=0</span><br></span></code></pre></div></div>
<p><strong>Docker Configuration:</strong></p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker run \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -e MAX_IMAGE_URL_DOWNLOAD_SIZE_MB=100 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -p 4000:4000 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  docker.litellm.ai/berriai/litellm:v1.81.0</span><br></span></code></pre></div></div>
<p><strong>Proxy Config (config.yaml):</strong></p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">general_settings</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">master_key</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> sk</span><span class="token punctuation" style="color:#393A34">-</span><span class="token number" style="color:#36acaa">1234</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Set via environment variable</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">environment_variables</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">MAX_IMAGE_URL_DOWNLOAD_SIZE_MB</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"100"</span><br></span></code></pre></div></div>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="why-add-this">Why Add This?<a href="https://docs.litellm.ai/release_notes/v1-81-0#why-add-this" class="hash-link" aria-label="Direct link to Why Add This?" title="Direct link to Why Add This?">​</a></h3>
<p>This feature improves reliability by:</p>
<ul>
<li>Preventing memory issues from very large images</li>
<li>Aligning with OpenAI's 50MB payload limit</li>
<li>Validating image sizes early (when Content-Length header is available)</li>
</ul>
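<p>The early validation can be sketched as follows. This is assumed behavior based on the error message above, not LiteLLM's exact code: when a <code>Content-Length</code> header is present, the size is checked before any bytes of the image body are downloaded.</p>

```python
# Simplified sketch of the early size check (assumed, not LiteLLM's code).
MAX_IMAGE_URL_DOWNLOAD_SIZE_MB = 50.0

class ImageFetchError(Exception):
    pass

def check_image_size(url: str, headers: dict) -> None:
    """Reject an image URL from its Content-Length header, before download."""
    content_length = headers.get("Content-Length")
    if content_length is None:
        return  # no header; the limit would be enforced while streaming instead
    size_mb = int(content_length) / (1024 * 1024)
    if size_mb > MAX_IMAGE_URL_DOWNLOAD_SIZE_MB:
        raise ImageFetchError(
            f"Error: Image size ({size_mb:.2f}MB) exceeds maximum allowed "
            f"size ({MAX_IMAGE_URL_DOWNLOAD_SIZE_MB}MB). url={url}"
        )
```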
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="performance---25-cpu-usage-reduction">Performance - 25% CPU Usage Reduction<a href="https://docs.litellm.ai/release_notes/v1-81-0#performance---25-cpu-usage-reduction" class="hash-link" aria-label="Direct link to Performance - 25% CPU Usage Reduction" title="Direct link to Performance - 25% CPU Usage Reduction">​</a></h2>
<p>LiteLLM now reduces CPU usage by removing premature <code>model.dump()</code> calls from the hot path in request processing. Previously, Pydantic model serialization was performed earlier and more frequently than necessary, causing unnecessary CPU overhead on every request. By deferring serialization until it is actually needed, LiteLLM reduces CPU usage and improves request throughput under high load.</p>
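<p>The pattern behind this optimization can be sketched as follows. The class below is a conceptual illustration (not the actual diff): rather than eagerly serializing every response model on the hot path, keep the object as-is and serialize lazily, at most once, only when a consumer actually needs a dict.</p>

```python
# Conceptual sketch of deferred serialization -- not LiteLLM's actual code.
from types import SimpleNamespace

class LazyResponse:
    def __init__(self, model_obj):
        self._model_obj = model_obj
        self._dumped = None  # cache the serialized form

    def as_dict(self):
        # Serialize at most once, and only if a dict is actually requested.
        if self._dumped is None:
            self._dumped = dict(vars(self._model_obj))
        return self._dumped

# No serialization happens at construction time:
resp = LazyResponse(SimpleNamespace(id="chatcmpl-1", usage=10))
d = resp.as_dict()  # first access triggers the one-time dump
```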
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="deleted-keys-audit-table-on-ui">Deleted Keys Audit Table on UI<a href="https://docs.litellm.ai/release_notes/v1-81-0#deleted-keys-audit-table-on-ui" class="hash-link" aria-label="Direct link to Deleted Keys Audit Table on UI" title="Direct link to Deleted Keys Audit Table on UI">​</a></h2>
<div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAFCAYAAAB8ZH1oAAAACXBIWXMAACxLAAAsSwGlPZapAAAAiUlEQVR4nC3NSw6DMBAEUe5/Q4QE2YZPhI1hbM8MFTmh10/V3TAM9H3PNE2M48hrGpHrxExRLZRSqKXQxRhY1415WYghMG+ZeVfSVfgkB27cbzozQ0QQyagaOSu5KGbOeT3FWulUK0mUd3C2A9IpHMeBu3N7w/ILPNB4784SbtJVifEP29pTq38BrvHBNSH/Hh8AAAAASUVORK5CYII=&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="640" height="334"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/assets/ideal-img/ui_deleted_keys_table.98dcbba.640.png" srcset="/assets/ideal-img/ui_deleted_keys_table.98dcbba.640.png 640w,/assets/ideal-img/ui_deleted_keys_table.463df6b.1920.png 1920w" width="640" height="334"></noscript></div>
<p>LiteLLM now provides a comprehensive audit table for deleted API keys and teams directly in the UI. This feature allows you to easily track the spend of deleted keys, view their associated team information, and maintain accurate financial records for auditing and compliance purposes. The table displays key details including key aliases, team associations, and spend information captured at the time of deletion. For more information on how to use this feature, see the <a href="https://docs.litellm.ai/docs/proxy/deleted_keys_teams.md">Deleted Keys &amp; Teams documentation</a>.</p>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-models--updated-models">New Models / Updated Models<a href="https://docs.litellm.ai/release_notes/v1-81-0#new-models--updated-models" class="hash-link" aria-label="Direct link to New Models / Updated Models" title="Direct link to New Models / Updated Models">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="new-model-support">New Model Support<a href="https://docs.litellm.ai/release_notes/v1-81-0#new-model-support" class="hash-link" aria-label="Direct link to New Model Support" title="Direct link to New Model Support">​</a></h4>
<table><thead><tr><th>Provider</th><th>Model</th><th>Features</th></tr></thead><tbody><tr><td>OpenAI</td><td><code>gpt-5.2-codex</code></td><td>Code generation</td></tr><tr><td>Azure</td><td><code>azure/gpt-5.2-codex</code></td><td>Code generation</td></tr><tr><td>Cerebras</td><td><code>cerebras/zai-glm-4.7</code></td><td>Reasoning, function calling</td></tr><tr><td>Replicate</td><td>All chat models</td><td>Full support for all Replicate chat models</td></tr></tbody></table>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features">Features<a href="https://docs.litellm.ai/release_notes/v1-81-0#features" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/anthropic">Anthropic</a></strong></p>
<ul>
<li>Add missing Anthropic tool results in responses - <a href="https://github.com/BerriAI/litellm/pull/18945" target="_blank" rel="noopener noreferrer">PR #18945</a></li>
<li>Preserve web_fetch_tool_result in multi-turn conversations - <a href="https://github.com/BerriAI/litellm/pull/18142" target="_blank" rel="noopener noreferrer">PR #18142</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/gemini">Gemini</a></strong></p>
<ul>
<li>Add presence_penalty support for Google AI Studio - <a href="https://github.com/BerriAI/litellm/pull/18154" target="_blank" rel="noopener noreferrer">PR #18154</a></li>
<li>Forward extra_headers in generateContent adapter - <a href="https://github.com/BerriAI/litellm/pull/18935" target="_blank" rel="noopener noreferrer">PR #18935</a></li>
<li>Add medium value support for detail param - <a href="https://github.com/BerriAI/litellm/pull/19187" target="_blank" rel="noopener noreferrer">PR #19187</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/vertex">Vertex AI</a></strong></p>
<ul>
<li>Improve passthrough endpoint URL parsing and construction - <a href="https://github.com/BerriAI/litellm/pull/17526" target="_blank" rel="noopener noreferrer">PR #17526</a></li>
<li>Add <code>type: object</code> to tool schemas missing a <code>type</code> field - <a href="https://github.com/BerriAI/litellm/pull/19103" target="_blank" rel="noopener noreferrer">PR #19103</a></li>
<li>Keep the <code>type</code> field in Gemini schemas when <code>properties</code> is empty - <a href="https://github.com/BerriAI/litellm/pull/18979" target="_blank" rel="noopener noreferrer">PR #18979</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/bedrock">Bedrock</a></strong></p>
<ul>
<li>Add OpenAI-compatible service_tier parameter translation - <a href="https://github.com/BerriAI/litellm/pull/18091" target="_blank" rel="noopener noreferrer">PR #18091</a></li>
<li>Add user auth in standard logging object for Bedrock passthrough - <a href="https://github.com/BerriAI/litellm/pull/19140" target="_blank" rel="noopener noreferrer">PR #19140</a></li>
<li>Strip throughput tier suffixes from model names - <a href="https://github.com/BerriAI/litellm/pull/19147" target="_blank" rel="noopener noreferrer">PR #19147</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/oci">OCI</a></strong></p>
<ul>
<li>Handle OpenAI-style image_url object in multimodal messages - <a href="https://github.com/BerriAI/litellm/pull/18272" target="_blank" rel="noopener noreferrer">PR #18272</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/ollama">Ollama</a></strong></p>
<ul>
<li>Set finish_reason to tool_calls and remove broken capability check - <a href="https://github.com/BerriAI/litellm/pull/18924" target="_blank" rel="noopener noreferrer">PR #18924</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/watsonx/index">Watsonx</a></strong></p>
<ul>
<li>Allow passing a scope ID for Watsonx inference - <a href="https://github.com/BerriAI/litellm/pull/18959" target="_blank" rel="noopener noreferrer">PR #18959</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/replicate">Replicate</a></strong></p>
<ul>
<li>Add support for all Replicate chat models - <a href="https://github.com/BerriAI/litellm/pull/18954" target="_blank" rel="noopener noreferrer">PR #18954</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/openrouter">OpenRouter</a></strong></p>
<ul>
<li>Add OpenRouter support for image/generation endpoints - <a href="https://github.com/BerriAI/litellm/pull/19059" target="_blank" rel="noopener noreferrer">PR #19059</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/volcano">Volcengine</a></strong></p>
<ul>
<li>Add max_tokens settings for Volcengine models (deepseek-v3-2, glm-4-7, kimi-k2-thinking) - <a href="https://github.com/BerriAI/litellm/pull/19076" target="_blank" rel="noopener noreferrer">PR #19076</a></li>
</ul>
</li>
<li>
<p><strong>Azure Model Router</strong></p>
<ul>
<li>New Model - Azure Model Router on LiteLLM AI Gateway - <a href="https://github.com/BerriAI/litellm/pull/19054" target="_blank" rel="noopener noreferrer">PR #19054</a></li>
</ul>
</li>
<li>
<p><strong>GPT-5 Models</strong></p>
<ul>
<li>Correct context window sizes for GPT-5 model variants - <a href="https://github.com/BerriAI/litellm/pull/18928" target="_blank" rel="noopener noreferrer">PR #18928</a></li>
<li>Correct max_input_tokens for GPT-5 models - <a href="https://github.com/BerriAI/litellm/pull/19056" target="_blank" rel="noopener noreferrer">PR #19056</a></li>
</ul>
</li>
<li>
<p><strong>Text Completion</strong></p>
<ul>
<li>Support token IDs (list of integers) as prompt - <a href="https://github.com/BerriAI/litellm/pull/18011" target="_blank" rel="noopener noreferrer">PR #18011</a></li>
</ul>
</li>
</ul>
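<p>The token-ID prompt support above means a completions-style request can carry pre-tokenized input. A minimal sketch of such a payload, assuming an OpenAI-compatible <code>/completions</code> shape (the model name and token values are illustrative):</p>

```python
# Sketch: a /completions payload where the prompt is a list of token IDs
# rather than a string. Model name and token IDs below are illustrative.
payload = {
    "model": "gpt-3.5-turbo-instruct",
    "prompt": [9906, 11, 1917, 0],  # pre-tokenized input as integer token IDs
    "max_tokens": 16,
}

# The prompt field is a flat list of ints, accepted in place of raw text
# by servers that support token-ID prompts.
assert all(isinstance(tok, int) for tok in payload["prompt"])
```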
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="bug-fixes">Bug Fixes<a href="https://docs.litellm.ai/release_notes/v1-81-0#bug-fixes" class="hash-link" aria-label="Direct link to Bug Fixes" title="Direct link to Bug Fixes">​</a></h3>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/anthropic">Anthropic</a></strong></p>
<ul>
<li>Prevent dropping thinking content when any message has <code>thinking_blocks</code> - <a href="https://github.com/BerriAI/litellm/pull/18929" target="_blank" rel="noopener noreferrer">PR #18929</a></li>
<li>Fix the Anthropic token counter when thinking is enabled - <a href="https://github.com/BerriAI/litellm/pull/19067" target="_blank" rel="noopener noreferrer">PR #19067</a></li>
<li>Improve error handling for Anthropic - <a href="https://github.com/BerriAI/litellm/pull/18955" target="_blank" rel="noopener noreferrer">PR #18955</a></li>
<li>Fix Anthropic during-call error handling - <a href="https://github.com/BerriAI/litellm/pull/19060" target="_blank" rel="noopener noreferrer">PR #19060</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/gemini">Gemini</a></strong></p>
<ul>
<li>Fix missing <code>completion_tokens_details</code> in Gemini 3 Flash when reasoning_effort is not used - <a href="https://github.com/BerriAI/litellm/pull/18898" target="_blank" rel="noopener noreferrer">PR #18898</a></li>
<li>Fix Gemini Image Generation imageConfig parameters - <a href="https://github.com/BerriAI/litellm/pull/18948" target="_blank" rel="noopener noreferrer">PR #18948</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/vertex">Vertex AI</a></strong></p>
<ul>
<li>Fix Vertex AI 400 Error with CachedContent model mismatch - <a href="https://github.com/BerriAI/litellm/pull/19193" target="_blank" rel="noopener noreferrer">PR #19193</a></li>
<li>Fix structured output support for Vertex AI - <a href="https://github.com/BerriAI/litellm/pull/19201" target="_blank" rel="noopener noreferrer">PR #19201</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/bedrock">Bedrock</a></strong></p>
<ul>
<li>Fix Claude Code (<code>/messages</code>) Bedrock Invoke usage and request signing - <a href="https://github.com/BerriAI/litellm/pull/19111" target="_blank" rel="noopener noreferrer">PR #19111</a></li>
<li>Fix model ID encoding for Bedrock passthrough - <a href="https://github.com/BerriAI/litellm/pull/18944" target="_blank" rel="noopener noreferrer">PR #18944</a></li>
<li>Respect max_completion_tokens in thinking feature - <a href="https://github.com/BerriAI/litellm/pull/18946" target="_blank" rel="noopener noreferrer">PR #18946</a></li>
<li>Fix header forwarding in Bedrock passthrough - <a href="https://github.com/BerriAI/litellm/pull/19007" target="_blank" rel="noopener noreferrer">PR #19007</a></li>
<li>Fix Bedrock stability model usage issues - <a href="https://github.com/BerriAI/litellm/pull/19199" target="_blank" rel="noopener noreferrer">PR #19199</a></li>
</ul>
</li>
</ul>
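<p>The Bedrock thinking fixes above concern payloads like the following sketch, where the thinking budget must fit inside the overall completion limit. This is a hedged illustration, not LiteLLM's internal translation logic; the model name and token values are illustrative:</p>

```python
# Sketch: a chat payload enabling Anthropic-style extended thinking.
# The fix ensures max_completion_tokens is respected alongside the
# thinking budget when the request is translated for Bedrock.
payload = {
    "model": "bedrock/anthropic.claude-sonnet-4-20250514-v1:0",
    "messages": [{"role": "user", "content": "Plan a migration."}],
    "max_completion_tokens": 4096,
    "thinking": {"type": "enabled", "budget_tokens": 1024},
}

# The thinking budget is a subset of the total completion budget.
assert payload["thinking"]["budget_tokens"] < payload["max_completion_tokens"]
```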
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="llm-api-endpoints">LLM API Endpoints<a href="https://docs.litellm.ai/release_notes/v1-81-0#llm-api-endpoints" class="hash-link" aria-label="Direct link to LLM API Endpoints" title="Direct link to LLM API Endpoints">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-1">Features<a href="https://docs.litellm.ai/release_notes/v1-81-0#features-1" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/anthropic">/messages (Claude Code)</a></strong></p>
<ul>
<li>Add support for Tool Search on <code>/messages</code> API across Azure, Bedrock, and Anthropic API - <a href="https://github.com/BerriAI/litellm/pull/19165" target="_blank" rel="noopener noreferrer">PR #19165</a></li>
<li>Track end-users with Claude Code (<code>/messages</code>) for better analytics and monitoring - <a href="https://github.com/BerriAI/litellm/pull/19171" target="_blank" rel="noopener noreferrer">PR #19171</a></li>
<li>Add web search support using LiteLLM <code>/search</code> endpoint with Claude Code (<code>/messages</code>) - <a href="https://github.com/BerriAI/litellm/pull/19263" target="_blank" rel="noopener noreferrer">PR #19263</a>, <a href="https://github.com/BerriAI/litellm/pull/19294" target="_blank" rel="noopener noreferrer">PR #19294</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/bedrock">/messages (Claude Code) - Bedrock</a></strong></p>
<ul>
<li>Add support for Prompt Caching with Bedrock Converse on <code>/messages</code> - <a href="https://github.com/BerriAI/litellm/pull/19123" target="_blank" rel="noopener noreferrer">PR #19123</a></li>
<li>Ensure budget tokens are passed to Bedrock Converse API correctly on <code>/messages</code> - <a href="https://github.com/BerriAI/litellm/pull/19107" target="_blank" rel="noopener noreferrer">PR #19107</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/response_api">Responses API</a></strong></p>
<ul>
<li>Add support for caching for responses API - <a href="https://github.com/BerriAI/litellm/pull/19068" target="_blank" rel="noopener noreferrer">PR #19068</a></li>
<li>Add retry policy support to responses API - <a href="https://github.com/BerriAI/litellm/pull/19074" target="_blank" rel="noopener noreferrer">PR #19074</a></li>
</ul>
</li>
<li>
<p><strong>Realtime API</strong></p>
<ul>
<li>Use non-streaming method for the <code>v1/a2a/message/send</code> endpoint - <a href="https://github.com/BerriAI/litellm/pull/19025" target="_blank" rel="noopener noreferrer">PR #19025</a></li>
</ul>
</li>
<li>
<p><strong>Batch API</strong></p>
<ul>
<li>Fix batch deletion and retrieval - <a href="https://github.com/BerriAI/litellm/pull/18340" target="_blank" rel="noopener noreferrer">PR #18340</a></li>
</ul>
</li>
</ul>
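<p>The prompt-caching support above uses the Anthropic-style <code>cache_control</code> breakpoint in the <code>/messages</code> payload. A minimal sketch of the request shape (model name and content are illustrative; whether the cache is actually used depends on the backend, here Bedrock Converse):</p>

```python
# Sketch: an Anthropic-style /v1/messages payload marking a system block
# as cacheable. Model name and text are illustrative only.
payload = {
    "model": "bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0",
    "max_tokens": 256,
    "system": [
        {
            "type": "text",
            "text": "You are a code reviewer. <long style guide here>",
            "cache_control": {"type": "ephemeral"},  # marks a cache breakpoint
        }
    ],
    "messages": [{"role": "user", "content": "Review this diff."}],
}

# Exactly one block carries a cache_control breakpoint in this sketch.
cached_blocks = [b for b in payload["system"] if "cache_control" in b]
assert len(cached_blocks) == 1
```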
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="bugs">Bugs<a href="https://docs.litellm.ai/release_notes/v1-81-0#bugs" class="hash-link" aria-label="Direct link to Bugs" title="Direct link to Bugs">​</a></h4>
<ul>
<li><strong>General</strong>
<ul>
<li>Fix Responses API error when content is <code>None</code> - <a href="https://github.com/BerriAI/litellm/pull/19064" target="_blank" rel="noopener noreferrer">PR #19064</a></li>
<li>Fix model name parsing from the query param in realtime requests - <a href="https://github.com/BerriAI/litellm/pull/19135" target="_blank" rel="noopener noreferrer">PR #19135</a></li>
<li>Fix video status/content credential injection for wildcard models - <a href="https://github.com/BerriAI/litellm/pull/18854" target="_blank" rel="noopener noreferrer">PR #18854</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="management-endpoints--ui">Management Endpoints / UI<a href="https://docs.litellm.ai/release_notes/v1-81-0#management-endpoints--ui" class="hash-link" aria-label="Direct link to Management Endpoints / UI" title="Direct link to Management Endpoints / UI">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-2">Features<a href="https://docs.litellm.ai/release_notes/v1-81-0#features-2" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<p><strong>Virtual Keys</strong></p>
<ul>
<li>View deleted keys for audit purposes - <a href="https://github.com/BerriAI/litellm/pull/18228" target="_blank" rel="noopener noreferrer">PR #18228</a>, <a href="https://github.com/BerriAI/litellm/pull/19268" target="_blank" rel="noopener noreferrer">PR #19268</a></li>
<li>Add status query parameter for keys list - <a href="https://github.com/BerriAI/litellm/pull/19260" target="_blank" rel="noopener noreferrer">PR #19260</a></li>
<li>Refetch keys after key creation - <a href="https://github.com/BerriAI/litellm/pull/18994" target="_blank" rel="noopener noreferrer">PR #18994</a></li>
<li>Refresh keys list on delete - <a href="https://github.com/BerriAI/litellm/pull/19262" target="_blank" rel="noopener noreferrer">PR #19262</a></li>
<li>Simplify key generate permission error - <a href="https://github.com/BerriAI/litellm/pull/18997" target="_blank" rel="noopener noreferrer">PR #18997</a></li>
<li>Add search to key edit team dropdown - <a href="https://github.com/BerriAI/litellm/pull/19119" target="_blank" rel="noopener noreferrer">PR #19119</a></li>
</ul>
<p><strong>Teams &amp; Organizations</strong></p>
<ul>
<li>View deleted teams for audit purposes - <a href="https://github.com/BerriAI/litellm/pull/18228" target="_blank" rel="noopener noreferrer">PR #18228</a>, <a href="https://github.com/BerriAI/litellm/pull/19268" target="_blank" rel="noopener noreferrer">PR #19268</a></li>
<li>Add filters to organization table - <a href="https://github.com/BerriAI/litellm/pull/18916" target="_blank" rel="noopener noreferrer">PR #18916</a></li>
<li>Add query parameters to <code>/organization/list</code> - <a href="https://github.com/BerriAI/litellm/pull/18910" target="_blank" rel="noopener noreferrer">PR #18910</a></li>
<li>Add status query parameter for teams list - <a href="https://github.com/BerriAI/litellm/pull/19260" target="_blank" rel="noopener noreferrer">PR #19260</a></li>
<li>Show internal users only their own spend - <a href="https://github.com/BerriAI/litellm/pull/19227" target="_blank" rel="noopener noreferrer">PR #19227</a></li>
<li>Allow preventing team admins from deleting members from teams - <a href="https://github.com/BerriAI/litellm/pull/19128" target="_blank" rel="noopener noreferrer">PR #19128</a></li>
<li>Refactor team member icon buttons - <a href="https://github.com/BerriAI/litellm/pull/19192" target="_blank" rel="noopener noreferrer">PR #19192</a></li>
</ul>
<p><strong>Models + Endpoints</strong></p>
<ul>
<li>Display health information in public model hub - <a href="https://github.com/BerriAI/litellm/pull/19256" target="_blank" rel="noopener noreferrer">PR #19256</a>, <a href="https://github.com/BerriAI/litellm/pull/19258" target="_blank" rel="noopener noreferrer">PR #19258</a></li>
<li>Quality of life improvements for Anthropic models - <a href="https://github.com/BerriAI/litellm/pull/19058" target="_blank" rel="noopener noreferrer">PR #19058</a></li>
<li>Create reusable model select component - <a href="https://github.com/BerriAI/litellm/pull/19164" target="_blank" rel="noopener noreferrer">PR #19164</a></li>
<li>Improve the model dropdown in edit settings - <a href="https://github.com/BerriAI/litellm/pull/19186" target="_blank" rel="noopener noreferrer">PR #19186</a></li>
<li>Fix model hub client side exception - <a href="https://github.com/BerriAI/litellm/pull/19045" target="_blank" rel="noopener noreferrer">PR #19045</a></li>
</ul>
<p><strong>Usage &amp; Analytics</strong></p>
<ul>
<li>Allow top virtual keys and models to show more entries - <a href="https://github.com/BerriAI/litellm/pull/19050" target="_blank" rel="noopener noreferrer">PR #19050</a></li>
<li>Fix the Y-axis on the model activity chart - <a href="https://github.com/BerriAI/litellm/pull/19055" target="_blank" rel="noopener noreferrer">PR #19055</a></li>
<li>Add Team ID and Team Name in export report - <a href="https://github.com/BerriAI/litellm/pull/19047" target="_blank" rel="noopener noreferrer">PR #19047</a></li>
<li>Add user metrics for Prometheus - <a href="https://github.com/BerriAI/litellm/pull/18785" target="_blank" rel="noopener noreferrer">PR #18785</a></li>
</ul>
<p><strong>SSO &amp; Auth</strong></p>
<ul>
<li>Allow setting custom Microsoft SSO base URLs - <a href="https://github.com/BerriAI/litellm/pull/18977" target="_blank" rel="noopener noreferrer">PR #18977</a></li>
<li>Allow overriding env var attribute names - <a href="https://github.com/BerriAI/litellm/pull/18998" target="_blank" rel="noopener noreferrer">PR #18998</a></li>
<li>Fix SCIM GET /Users error and enforce SCIM 2.0 compliance - <a href="https://github.com/BerriAI/litellm/pull/17420" target="_blank" rel="noopener noreferrer">PR #17420</a></li>
<li>Feature flag for SCIM compliance fix - <a href="https://github.com/BerriAI/litellm/pull/18878" target="_blank" rel="noopener noreferrer">PR #18878</a></li>
</ul>
<p><strong>General UI</strong></p>
<ul>
<li>Add allowClear to dropdown components for better UX - <a href="https://github.com/BerriAI/litellm/pull/18778" target="_blank" rel="noopener noreferrer">PR #18778</a></li>
<li>Add community engagement buttons - <a href="https://github.com/BerriAI/litellm/pull/19114" target="_blank" rel="noopener noreferrer">PR #19114</a></li>
<li>Add a UI feedback form ("Why LiteLLM?") - <a href="https://github.com/BerriAI/litellm/pull/18999" target="_blank" rel="noopener noreferrer">PR #18999</a></li>
<li>Refactor user and team table filters to reusable component - <a href="https://github.com/BerriAI/litellm/pull/19010" target="_blank" rel="noopener noreferrer">PR #19010</a></li>
<li>Adjust new badges - <a href="https://github.com/BerriAI/litellm/pull/19278" target="_blank" rel="noopener noreferrer">PR #19278</a></li>
</ul>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="bugs-1">Bugs<a href="https://docs.litellm.ai/release_notes/v1-81-0#bugs-1" class="hash-link" aria-label="Direct link to Bugs" title="Direct link to Bugs">​</a></h4>
<ul>
<li>Fix Container API routes returning 401 for non-admin users (routes were missing from <code>openai_routes</code>) - <a href="https://github.com/BerriAI/litellm/pull/19115" target="_blank" rel="noopener noreferrer">PR #19115</a></li>
<li>Allow routing to regional endpoints for Containers API - <a href="https://github.com/BerriAI/litellm/pull/19118" target="_blank" rel="noopener noreferrer">PR #19118</a></li>
<li>Fix Azure Storage circular reference error - <a href="https://github.com/BerriAI/litellm/pull/19120" target="_blank" rel="noopener noreferrer">PR #19120</a></li>
<li>Fix prompt deletion failing with a Prisma <code>FieldNotFoundError</code> - <a href="https://github.com/BerriAI/litellm/pull/18966" target="_blank" rel="noopener noreferrer">PR #18966</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="ai-integrations">AI Integrations<a href="https://docs.litellm.ai/release_notes/v1-81-0#ai-integrations" class="hash-link" aria-label="Direct link to AI Integrations" title="Direct link to AI Integrations">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="logging">Logging<a href="https://docs.litellm.ai/release_notes/v1-81-0#logging" class="hash-link" aria-label="Direct link to Logging" title="Direct link to Logging">​</a></h3>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/logging#opentelemetry">OpenTelemetry</a></strong></p>
<ul>
<li>Update semantic conventions to 1.38 (gen_ai attributes) - <a href="https://github.com/BerriAI/litellm/pull/18793" target="_blank" rel="noopener noreferrer">PR #18793</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/logging#langsmith">LangSmith</a></strong></p>
<ul>
<li>Hoist thread grouping metadata (session_id, thread) - <a href="https://github.com/BerriAI/litellm/pull/18982" target="_blank" rel="noopener noreferrer">PR #18982</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/logging#langfuse">Langfuse</a></strong></p>
<ul>
<li>Include Langfuse logger in JSON logging when Langfuse callback is used - <a href="https://github.com/BerriAI/litellm/pull/19162" target="_blank" rel="noopener noreferrer">PR #19162</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/observability/logfire">Logfire</a></strong></p>
<ul>
<li>Add ability to customize Logfire base URL through env var - <a href="https://github.com/BerriAI/litellm/pull/19148" target="_blank" rel="noopener noreferrer">PR #19148</a></li>
</ul>
</li>
<li>
<p><strong>General Logging</strong></p>
<ul>
<li>Enable JSON logging via configuration and add regression test - <a href="https://github.com/BerriAI/litellm/pull/19037" target="_blank" rel="noopener noreferrer">PR #19037</a></li>
<li>Fix header forwarding for embeddings endpoint - <a href="https://github.com/BerriAI/litellm/pull/18960" target="_blank" rel="noopener noreferrer">PR #18960</a></li>
<li>Preserve <code>llm_provider-*</code> headers in error responses - <a href="https://github.com/BerriAI/litellm/pull/19020" target="_blank" rel="noopener noreferrer">PR #19020</a></li>
<li>Fix <code>turn_off_message_logging</code> not redacting request messages in the <code>proxy_server_request</code> field - <a href="https://github.com/BerriAI/litellm/pull/18897" target="_blank" rel="noopener noreferrer">PR #18897</a></li>
</ul>
</li>
</ul>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="guardrails">Guardrails<a href="https://docs.litellm.ai/release_notes/v1-81-0#guardrails" class="hash-link" aria-label="Direct link to Guardrails" title="Direct link to Guardrails">​</a></h3>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails/grayswan">Grayswan</a></strong></p>
<ul>
<li>Implement fail-open option (default: True) - <a href="https://github.com/BerriAI/litellm/pull/18266" target="_blank" rel="noopener noreferrer">PR #18266</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails/pangea">Pangea</a></strong></p>
<ul>
<li>Respect <code>default_on</code> during initialization - <a href="https://github.com/BerriAI/litellm/pull/18912" target="_blank" rel="noopener noreferrer">PR #18912</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails/panw_prisma_airs">PANW Prisma AIRS</a></strong></p>
<ul>
<li>Add custom violation message support - <a href="https://github.com/BerriAI/litellm/pull/19272" target="_blank" rel="noopener noreferrer">PR #19272</a></li>
</ul>
</li>
<li>
<p><strong>General Guardrails</strong></p>
<ul>
<li>Fix SerializationIterator error and pass tools to guardrail - <a href="https://github.com/BerriAI/litellm/pull/18932" target="_blank" rel="noopener noreferrer">PR #18932</a></li>
<li>Properly handle custom guardrails parameters - <a href="https://github.com/BerriAI/litellm/pull/18978" target="_blank" rel="noopener noreferrer">PR #18978</a></li>
<li>Use clean error messages for blocked requests - <a href="https://github.com/BerriAI/litellm/pull/19023" target="_blank" rel="noopener noreferrer">PR #19023</a></li>
<li>Add guardrail moderation support for the Responses API - <a href="https://github.com/BerriAI/litellm/pull/18957" target="_blank" rel="noopener noreferrer">PR #18957</a></li>
<li>Fix model-level guardrails not taking effect - <a href="https://github.com/BerriAI/litellm/pull/18895" target="_blank" rel="noopener noreferrer">PR #18895</a></li>
</ul>
</li>
</ul>
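<p>Several of the options above are set in the proxy's guardrails config. A hedged sketch of how the Grayswan and Pangea entries might be declared — the guardrail names are placeholders, and this only shows settings mentioned in this release (e.g. <code>default_on</code> now being respected at initialization); it is not a complete or authoritative config:</p>

```yaml
# Illustrative proxy config fragment; guardrail_name values are placeholders.
guardrails:
  - guardrail_name: "grayswan-guard"
    litellm_params:
      guardrail: grayswan
      mode: "pre_call"
  - guardrail_name: "pangea-guard"
    litellm_params:
      guardrail: pangea
      mode: "during_call"
      default_on: true   # now respected during initialization (PR #18912)
```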
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="spend-tracking-budgets-and-rate-limiting">Spend Tracking, Budgets and Rate Limiting<a href="https://docs.litellm.ai/release_notes/v1-81-0#spend-tracking-budgets-and-rate-limiting" class="hash-link" aria-label="Direct link to Spend Tracking, Budgets and Rate Limiting" title="Direct link to Spend Tracking, Budgets and Rate Limiting">​</a></h2>
<ul>
<li>
<p><strong>Cost Calculation Fixes</strong></p>
<ul>
<li>Include IMAGE token count in cost calculation for Gemini models - <a href="https://github.com/BerriAI/litellm/pull/18876" target="_blank" rel="noopener noreferrer">PR #18876</a></li>
<li>Fix negative text_tokens when using cache with images - <a href="https://github.com/BerriAI/litellm/pull/18768" target="_blank" rel="noopener noreferrer">PR #18768</a></li>
<li>Fix image tokens spend logging for <code>/images/generations</code> - <a href="https://github.com/BerriAI/litellm/pull/19009" target="_blank" rel="noopener noreferrer">PR #19009</a></li>
<li>Fix incorrect <code>prompt_tokens_details</code> in Gemini Image Generation - <a href="https://github.com/BerriAI/litellm/pull/19070" target="_blank" rel="noopener noreferrer">PR #19070</a></li>
<li>Fix case-insensitive model cost map lookup - <a href="https://github.com/BerriAI/litellm/pull/18208" target="_blank" rel="noopener noreferrer">PR #18208</a></li>
</ul>
</li>
<li>
<p><strong>Pricing Updates</strong></p>
<ul>
<li>Correct pricing for <code>openrouter/openai/gpt-oss-20b</code> - <a href="https://github.com/BerriAI/litellm/pull/18899" target="_blank" rel="noopener noreferrer">PR #18899</a></li>
<li>Add pricing for <code>azure_ai/claude-opus-4-5</code> - <a href="https://github.com/BerriAI/litellm/pull/19003" target="_blank" rel="noopener noreferrer">PR #19003</a></li>
<li>Update Novita models prices - <a href="https://github.com/BerriAI/litellm/pull/19005" target="_blank" rel="noopener noreferrer">PR #19005</a></li>
<li>Fix Azure Grok prices - <a href="https://github.com/BerriAI/litellm/pull/19102" target="_blank" rel="noopener noreferrer">PR #19102</a></li>
<li>Fix GCP GLM-4.7 pricing - <a href="https://github.com/BerriAI/litellm/pull/19172" target="_blank" rel="noopener noreferrer">PR #19172</a></li>
<li>Sync DeepSeek chat/reasoner to V3.2 pricing - <a href="https://github.com/BerriAI/litellm/pull/18884" target="_blank" rel="noopener noreferrer">PR #18884</a></li>
<li>Correct cache_read pricing for gemini-2.5-pro models - <a href="https://github.com/BerriAI/litellm/pull/18157" target="_blank" rel="noopener noreferrer">PR #18157</a></li>
</ul>
</li>
<li>
<p><strong>Budget &amp; Rate Limiting</strong></p>
<ul>
<li>Correct budget limit validation operator (&gt;=) for team members - <a href="https://github.com/BerriAI/litellm/pull/19207" target="_blank" rel="noopener noreferrer">PR #19207</a></li>
<li>Fix TPM 25% limiting by ensuring priority queue logic - <a href="https://github.com/BerriAI/litellm/pull/19092" target="_blank" rel="noopener noreferrer">PR #19092</a></li>
<li>Spend logs cleanup cron: verification, fixes, and docs - <a href="https://github.com/BerriAI/litellm/pull/19085" target="_blank" rel="noopener noreferrer">PR #19085</a></li>
</ul>
</li>
</ul>
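<p>The budget operator correction above is easiest to see at the boundary. A minimal sketch of the corrected comparison (this is an illustration, not LiteLLM's actual implementation):</p>

```python
def is_over_budget(spend: float, max_budget: float) -> bool:
    # Corrected comparison: a team member who has spent exactly their
    # budget is treated as over budget (>=); the previous strict > let
    # requests through at the boundary.
    return spend >= max_budget

# Boundary case: spend equal to the limit is now blocked.
assert is_over_budget(100.0, 100.0) is True
assert is_over_budget(99.99, 100.0) is False
```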
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="mcp-gateway">MCP Gateway<a href="https://docs.litellm.ai/release_notes/v1-81-0#mcp-gateway" class="hash-link" aria-label="Direct link to MCP Gateway" title="Direct link to MCP Gateway">​</a></h2>
<ul>
<li>Prevent duplicate MCP reload scheduler registration - <a href="https://github.com/BerriAI/litellm/pull/18934" target="_blank" rel="noopener noreferrer">PR #18934</a></li>
<li>Forward MCP extra headers case-insensitively - <a href="https://github.com/BerriAI/litellm/pull/18940" target="_blank" rel="noopener noreferrer">PR #18940</a></li>
<li>Fix MCP REST auth checks - <a href="https://github.com/BerriAI/litellm/pull/19051" target="_blank" rel="noopener noreferrer">PR #19051</a></li>
<li>Fix duplicate telemetry events being generated in responses - <a href="https://github.com/BerriAI/litellm/pull/18938" target="_blank" rel="noopener noreferrer">PR #18938</a></li>
<li>Fix MCP chat completions - <a href="https://github.com/BerriAI/litellm/pull/19129" target="_blank" rel="noopener noreferrer">PR #19129</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="performance--loadbalancing--reliability-improvements">Performance / Loadbalancing / Reliability improvements<a href="https://docs.litellm.ai/release_notes/v1-81-0#performance--loadbalancing--reliability-improvements" class="hash-link" aria-label="Direct link to Performance / Loadbalancing / Reliability improvements" title="Direct link to Performance / Loadbalancing / Reliability improvements">​</a></h2>
<ul>
<li>
<p><strong>Performance Improvements</strong></p>
<ul>
<li>Remove bottleneck causing high CPU usage &amp; overhead under heavy load - <a href="https://github.com/BerriAI/litellm/pull/19049" target="_blank" rel="noopener noreferrer">PR #19049</a></li>
<li>Add CI enforcement for O(1) operations in <code>_get_model_cost_key</code> to prevent performance regressions - <a href="https://github.com/BerriAI/litellm/pull/19052" target="_blank" rel="noopener noreferrer">PR #19052</a></li>
<li>Fix Azure embeddings JSON parsing to prevent connection leaks and ensure proper router cooldown - <a href="https://github.com/BerriAI/litellm/pull/19167" target="_blank" rel="noopener noreferrer">PR #19167</a></li>
<li>Do not fallback to token counter if <code>disable_token_counter</code> is enabled - <a href="https://github.com/BerriAI/litellm/pull/19041" target="_blank" rel="noopener noreferrer">PR #19041</a></li>
</ul>
</li>
<li>
<p><strong>Reliability</strong></p>
<ul>
<li>Add fallback endpoints support - <a href="https://github.com/BerriAI/litellm/pull/19185" target="_blank" rel="noopener noreferrer">PR #19185</a></li>
<li>Fix stream_timeout parameter functionality - <a href="https://github.com/BerriAI/litellm/pull/19191" target="_blank" rel="noopener noreferrer">PR #19191</a></li>
<li>Fix model matching priority in configuration - <a href="https://github.com/BerriAI/litellm/pull/19012" target="_blank" rel="noopener noreferrer">PR #19012</a></li>
<li>Respect <code>num_retries</code> set in <code>litellm_params</code> per the config - <a href="https://github.com/BerriAI/litellm/pull/18975" target="_blank" rel="noopener noreferrer">PR #18975</a></li>
<li>Handle exceptions that lack a <code>response</code> parameter - <a href="https://github.com/BerriAI/litellm/pull/18919" target="_blank" rel="noopener noreferrer">PR #18919</a></li>
</ul>
</li>
<li>
<p><strong>Infrastructure</strong></p>
<ul>
<li>Add Custom CA certificates to boto3 clients - <a href="https://github.com/BerriAI/litellm/pull/18942" target="_blank" rel="noopener noreferrer">PR #18942</a></li>
<li>Update boto3 to 1.40.15 and aioboto3 to 15.5.0 - <a href="https://github.com/BerriAI/litellm/pull/19090" target="_blank" rel="noopener noreferrer">PR #19090</a></li>
<li>Make the <code>keepalive_timeout</code> parameter work for Gunicorn - <a href="https://github.com/BerriAI/litellm/pull/19087" target="_blank" rel="noopener noreferrer">PR #19087</a></li>
</ul>
</li>
<li>
<p><strong>Helm Chart</strong></p>
<ul>
<li>Mount <code>config.yaml</code> as a single file in the Helm chart - <a href="https://github.com/BerriAI/litellm/pull/19146" target="_blank" rel="noopener noreferrer">PR #19146</a></li>
<li>Sync Helm chart versioning with production standards and Docker versions - <a href="https://github.com/BerriAI/litellm/pull/18868" target="_blank" rel="noopener noreferrer">PR #18868</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="database-changes">Database Changes<a href="https://docs.litellm.ai/release_notes/v1-81-0#database-changes" class="hash-link" aria-label="Direct link to Database Changes" title="Direct link to Database Changes">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="schema-updates">Schema Updates<a href="https://docs.litellm.ai/release_notes/v1-81-0#schema-updates" class="hash-link" aria-label="Direct link to Schema Updates" title="Direct link to Schema Updates">​</a></h3>
<table><thead><tr><th>Table</th><th>Change Type</th><th>Description</th><th>PR</th></tr></thead><tbody><tr><td><code>LiteLLM_ProxyModelTable</code></td><td>New Columns</td><td>Added <code>created_at</code> and <code>updated_at</code> timestamp fields</td><td><a href="https://github.com/BerriAI/litellm/pull/18937" target="_blank" rel="noopener noreferrer">PR #18937</a></td></tr></tbody></table>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="documentation-updates">Documentation Updates<a href="https://docs.litellm.ai/release_notes/v1-81-0#documentation-updates" class="hash-link" aria-label="Direct link to Documentation Updates" title="Direct link to Documentation Updates">​</a></h2>
<ul>
<li>Add LiteLLM architecture md doc - <a href="https://github.com/BerriAI/litellm/pull/19057" target="_blank" rel="noopener noreferrer">PR #19057</a>, <a href="https://github.com/BerriAI/litellm/pull/19252" target="_blank" rel="noopener noreferrer">PR #19252</a></li>
<li>Add troubleshooting guide - <a href="https://github.com/BerriAI/litellm/pull/19096" target="_blank" rel="noopener noreferrer">PR #19096</a>, <a href="https://github.com/BerriAI/litellm/pull/19097" target="_blank" rel="noopener noreferrer">PR #19097</a>, <a href="https://github.com/BerriAI/litellm/pull/19099" target="_blank" rel="noopener noreferrer">PR #19099</a></li>
<li>Add structured issue reporting guides for CPU and memory issues - <a href="https://github.com/BerriAI/litellm/pull/19117" target="_blank" rel="noopener noreferrer">PR #19117</a></li>
<li>Add Redis requirement warning for high-traffic deployments - <a href="https://github.com/BerriAI/litellm/pull/18892" target="_blank" rel="noopener noreferrer">PR #18892</a></li>
<li>Update load balancing and routing with enable_pre_call_checks - <a href="https://github.com/BerriAI/litellm/pull/18888" target="_blank" rel="noopener noreferrer">PR #18888</a></li>
<li>Updated pass_through with guided param - <a href="https://github.com/BerriAI/litellm/pull/18886" target="_blank" rel="noopener noreferrer">PR #18886</a></li>
<li>Update message content types link and add content types table - <a href="https://github.com/BerriAI/litellm/pull/18209" target="_blank" rel="noopener noreferrer">PR #18209</a></li>
<li>Add Redis initialization with kwargs - <a href="https://github.com/BerriAI/litellm/pull/19183" target="_blank" rel="noopener noreferrer">PR #19183</a></li>
<li>Improve documentation for routing LLM calls via SAP Gen AI Hub - <a href="https://github.com/BerriAI/litellm/pull/19166" target="_blank" rel="noopener noreferrer">PR #19166</a></li>
<li>Deleted Keys and Teams docs - <a href="https://github.com/BerriAI/litellm/pull/19291" target="_blank" rel="noopener noreferrer">PR #19291</a></li>
<li>Claude Code end user tracking guide - <a href="https://github.com/BerriAI/litellm/pull/19176" target="_blank" rel="noopener noreferrer">PR #19176</a></li>
<li>Add MCP troubleshooting guide - <a href="https://github.com/BerriAI/litellm/pull/19122" target="_blank" rel="noopener noreferrer">PR #19122</a></li>
<li>Add auth message UI documentation - <a href="https://github.com/BerriAI/litellm/pull/19063" target="_blank" rel="noopener noreferrer">PR #19063</a></li>
<li>Add guide for mounting custom callbacks in Helm/K8s - <a href="https://github.com/BerriAI/litellm/pull/19136" target="_blank" rel="noopener noreferrer">PR #19136</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="bug-fixes-1">Bug Fixes<a href="https://docs.litellm.ai/release_notes/v1-81-0#bug-fixes-1" class="hash-link" aria-label="Direct link to Bug Fixes" title="Direct link to Bug Fixes">​</a></h2>
<ul>
<li>Fix Swagger UI path execute error with server_root_path in OpenAPI schema - <a href="https://github.com/BerriAI/litellm/pull/18947" target="_blank" rel="noopener noreferrer">PR #18947</a></li>
<li>Normalize OpenAI SDK BaseModel choices/messages to avoid Pydantic serializer warnings - <a href="https://github.com/BerriAI/litellm/pull/18972" target="_blank" rel="noopener noreferrer">PR #18972</a></li>
<li>Add contextual gap checks and word-form digits - <a href="https://github.com/BerriAI/litellm/pull/18301" target="_blank" rel="noopener noreferrer">PR #18301</a></li>
<li>Clean up orphaned files from repository root - <a href="https://github.com/BerriAI/litellm/pull/19150" target="_blank" rel="noopener noreferrer">PR #19150</a></li>
<li>Include proxy/prisma_migration.py in non-root - <a href="https://github.com/BerriAI/litellm/pull/18971" target="_blank" rel="noopener noreferrer">PR #18971</a></li>
<li>Update prisma_migration.py - <a href="https://github.com/BerriAI/litellm/pull/19083" target="_blank" rel="noopener noreferrer">PR #19083</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-contributors">New Contributors<a href="https://docs.litellm.ai/release_notes/v1-81-0#new-contributors" class="hash-link" aria-label="Direct link to New Contributors" title="Direct link to New Contributors">​</a></h2>
<ul>
<li>@yogeshwaran10 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18898" target="_blank" rel="noopener noreferrer">PR #18898</a></li>
<li>@theonlypal made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18937" target="_blank" rel="noopener noreferrer">PR #18937</a></li>
<li>@jonmagic made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18935" target="_blank" rel="noopener noreferrer">PR #18935</a></li>
<li>@houdataali made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19025" target="_blank" rel="noopener noreferrer">PR #19025</a></li>
<li>@hummat made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18972" target="_blank" rel="noopener noreferrer">PR #18972</a></li>
<li>@berkeyalciin made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18966" target="_blank" rel="noopener noreferrer">PR #18966</a></li>
<li>@MateuszOssGit made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18959" target="_blank" rel="noopener noreferrer">PR #18959</a></li>
<li>@xfan001 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18947" target="_blank" rel="noopener noreferrer">PR #18947</a></li>
<li>@nulone made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18884" target="_blank" rel="noopener noreferrer">PR #18884</a></li>
<li>@debnil-mercor made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18919" target="_blank" rel="noopener noreferrer">PR #18919</a></li>
<li>@hakhundov made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17420" target="_blank" rel="noopener noreferrer">PR #17420</a></li>
<li>@rohanwinsor made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19078" target="_blank" rel="noopener noreferrer">PR #19078</a></li>
<li>@pgolm made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19020" target="_blank" rel="noopener noreferrer">PR #19020</a></li>
<li>@vikigenius made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19148" target="_blank" rel="noopener noreferrer">PR #19148</a></li>
<li>@burnerburnerburnerman made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19090" target="_blank" rel="noopener noreferrer">PR #19090</a></li>
<li>@yfge made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19076" target="_blank" rel="noopener noreferrer">PR #19076</a></li>
<li>@danielnyari-seon made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19083" target="_blank" rel="noopener noreferrer">PR #19083</a></li>
<li>@guilherme-segantini made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19166" target="_blank" rel="noopener noreferrer">PR #19166</a></li>
<li>@jgreek made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19147" target="_blank" rel="noopener noreferrer">PR #19147</a></li>
<li>@anand-kamble made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19193" target="_blank" rel="noopener noreferrer">PR #19193</a></li>
<li>@neubig made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/19162" target="_blank" rel="noopener noreferrer">PR #19162</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="full-changelog">Full Changelog<a href="https://docs.litellm.ai/release_notes/v1-81-0#full-changelog" class="hash-link" aria-label="Direct link to Full Changelog" title="Direct link to Full Changelog">​</a></h2>
<p><strong><a href="https://github.com/BerriAI/litellm/compare/v1.80.15.rc.1...v1.81.0.rc.1" target="_blank" rel="noopener noreferrer">View complete changelog on GitHub</a></strong></p>]]></content>
        <author>
            <name>Krrish Dholakia</name>
            <uri>https://www.linkedin.com/in/krish-d/</uri>
        </author>
        <author>
            <name>Ishaan Jaff</name>
            <uri>https://www.linkedin.com/in/reffajnaahsi/</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[v1.80.15-stable - Manus API Support]]></title>
        <id>https://docs.litellm.ai/release_notes/v1-80-15</id>
        <link href="https://docs.litellm.ai/release_notes/v1-80-15"/>
        <updated>2026-01-10T10:00:00.000Z</updated>
        <summary type="html"><![CDATA[Deploy this version]]></summary>
        <content type="html"><![CDATA[<h2 class="anchor anchorWithStickyNavbar_LWe7" id="deploy-this-version">Deploy this version<a href="https://docs.litellm.ai/release_notes/v1-80-15#deploy-this-version" class="hash-link" aria-label="Direct link to Deploy this version" title="Direct link to Deploy this version">​</a></h2>
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">Docker</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">Pip</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">docker run litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker run \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-e STORE_MODEL_IN_DB=True \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-p 4000:4000 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">docker.litellm.ai/berriai/litellm:v1.80.15-stable.1</span><br></span></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">pip install litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">pip install litellm==1.80.15</span><br></span></code></pre></div></div></div></div></div>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="key-highlights">Key Highlights<a href="https://docs.litellm.ai/release_notes/v1-80-15#key-highlights" class="hash-link" aria-label="Direct link to Key Highlights" title="Direct link to Key Highlights">​</a></h2>
<ul>
<li><strong>Manus API Support</strong> - <a href="https://docs.litellm.ai/docs/providers/manus">New provider support for Manus API on /responses and GET /responses endpoints</a></li>
<li><strong>MiniMax Provider</strong> - <a href="https://docs.litellm.ai/docs/providers/minimax">Full support for MiniMax chat completions, TTS, and Anthropic native endpoint</a></li>
<li><strong>AWS Polly TTS</strong> - <a href="https://docs.litellm.ai/docs/providers/aws_polly">New TTS provider using AWS Polly API</a></li>
<li><strong>SSO Role Mapping</strong> - Configure role mappings for SSO providers directly in the UI</li>
<li><strong>Cost Estimator</strong> - New UI tool for estimating costs across multiple models and requests</li>
<li><strong>MCP Global Mode</strong> - <a href="https://docs.litellm.ai/docs/mcp">Configure MCP servers globally with visibility controls</a></li>
<li><strong>Interactions API Bridge</strong> - <a href="https://docs.litellm.ai/docs/interactions">Use all LiteLLM providers with the Interactions API</a></li>
<li><strong>RAG Query Endpoint</strong> - <a href="https://docs.litellm.ai/docs/search/index">New RAG Search/Query endpoint for retrieval-augmented generation</a></li>
<li><strong>UI Usage - Endpoint Activity</strong> - <a href="https://docs.litellm.ai/docs/proxy/endpoint_activity">Users can now see Endpoint Activity Metrics in the UI</a></li>
<li><strong>50% Overhead Reduction</strong> - LiteLLM now sends 2.5× more requests to LLM providers</li>
</ul>
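<p>As a concrete starting point, one of the new providers from this release can be routed through the proxy with a standard <code>config.yaml</code> entry. This is a minimal sketch: <code>minimax/abab7-chat-preview</code> is one of the MiniMax models added in this release, the alias <code>minimax-chat</code> is arbitrary, and the API key is read from your environment.</p>

```yaml
model_list:
  - model_name: minimax-chat          # arbitrary alias used by clients
    litellm_params:
      # MiniMax chat model added in this release
      model: minimax/abab7-chat-preview
      api_key: os.environ/MINIMAX_API_KEY
```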
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="performance---50-overhead-reduction">Performance - 50% Overhead Reduction<a href="https://docs.litellm.ai/release_notes/v1-80-15#performance---50-overhead-reduction" class="hash-link" aria-label="Direct link to Performance - 50% Overhead Reduction" title="Direct link to Performance - 50% Overhead Reduction">​</a></h2>
<p>LiteLLM now sends 2.5× more requests to LLM providers by replacing sequential if/elif chains with O(1) dictionary lookups for provider configuration resolution (92.7% faster). This optimization has a high impact because it runs inside the client decorator, which is invoked on every HTTP request made to the proxy server.</p>
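<p>The shape of the change can be sketched as follows. This is a simplified illustration, not LiteLLM's actual internals: the provider names and config values are placeholders, and the real resolution logic handles far more providers and fields. The point is the access pattern: an if/elif chain scans providers sequentially on every request (O(n)), while a prebuilt dict resolves in a single hash lookup (O(1)).</p>

```python
# Illustrative sketch only -- not LiteLLM's actual provider registry.
PROVIDER_CONFIGS = {
    "openai": {"api_base": "https://api.openai.com/v1"},
    "anthropic": {"api_base": "https://api.anthropic.com"},
    "bedrock": {"api_base": None},  # resolved from AWS region at call time
}

def get_config_chain(provider: str) -> dict:
    # Old pattern: every request walks the chain until it hits a match,
    # so cost grows with the number of supported providers.
    if provider == "openai":
        return {"api_base": "https://api.openai.com/v1"}
    elif provider == "anthropic":
        return {"api_base": "https://api.anthropic.com"}
    elif provider == "bedrock":
        return {"api_base": None}
    raise ValueError(f"unknown provider: {provider}")

def get_config_dict(provider: str) -> dict:
    # New pattern: a single hash lookup, independent of provider count.
    try:
        return PROVIDER_CONFIGS[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider}")

# Both resolve the same config; only the lookup cost differs.
assert get_config_chain("bedrock") == get_config_dict("bedrock")
```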
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="before">Before<a href="https://docs.litellm.ai/release_notes/v1-80-15#before" class="hash-link" aria-label="Direct link to Before" title="Direct link to Before">​</a></h3>
<blockquote>
<p><strong>Note:</strong> Worse-looking provider metrics are a good sign here: they indicate requests spend less time inside LiteLLM.</p>
</blockquote>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">============================================================</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Fake LLM Provider Stats (When called by LiteLLM)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">============================================================</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Total Time:            0.56s</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Requests/Second:       10746.68</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Latency Statistics (seconds):</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   Mean:               0.2039s</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   Median (p50):       0.2310s</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   Min:                0.0323s</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   Max:                0.3928s</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   Std Dev:            0.1166s</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   p95:                0.3574s</span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain">   p99:                0.3748s</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Status Codes:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   200: 6000</span><br></span></code></pre></div></div>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="after">After<a href="https://docs.litellm.ai/release_notes/v1-80-15#after" class="hash-link" aria-label="Direct link to After" title="Direct link to After">​</a></h3>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">============================================================</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Fake LLM Provider Stats (When called by LiteLLM)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">============================================================</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Total Time:            1.42s</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Requests/Second:       4224.49</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Latency Statistics (seconds):</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   Mean:               0.5300s</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   Median (p50):       0.5871s</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   Min:                0.0885s</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   Max:                1.0482s</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   Std Dev:            0.3065s</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   p95:                0.9750s</span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain">   p99:                1.0444s</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Status Codes:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   200: 6000</span><br></span></code></pre></div></div>
<blockquote>
<p>The benchmarks run LiteLLM locally against a lightweight fake LLM provider, eliminating network latency so that the measurement isolates pure LiteLLM overhead on a single instance.</p>
</blockquote>
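<p>A fake provider of the kind described above can be sketched as a minimal in-process HTTP server that returns a canned OpenAI-style completion with near-zero latency. This is illustrative only and is not LiteLLM's actual benchmark harness; the endpoint path and response fields simply mimic the chat completions format.</p>

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

class FakeLLMHandler(BaseHTTPRequestHandler):
    """Returns a canned chat completion immediately, so any measured
    overhead comes from the proxy under test, not the upstream model."""

    def do_POST(self):
        body = json.dumps({
            "id": "chatcmpl-fake",
            "object": "chat.completion",
            "choices": [{
                "index": 0,
                "message": {"role": "assistant", "content": "ok"},
                "finish_reason": "stop",
            }],
        }).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep benchmark output clean

# Bind an ephemeral port and serve in the background.
server = HTTPServer(("127.0.0.1", 0), FakeLLMHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/v1/chat/completions"
req = Request(url, data=b"{}", headers={"Content-Type": "application/json"})
resp = json.load(urlopen(req))
print(resp["choices"][0]["message"]["content"])
server.shutdown()
```

Pointing a locally running proxy at a server like this keeps every request on loopback, which is what lets the before/after numbers above reflect LiteLLM's own overhead rather than provider or network variance.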
<hr>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="ui-usage---endpoint-activity">UI Usage - Endpoint Activity<a href="https://docs.litellm.ai/release_notes/v1-80-15#ui-usage---endpoint-activity" class="hash-link" aria-label="Direct link to UI Usage - Endpoint Activity" title="Direct link to UI Usage - Endpoint Activity">​</a></h3>
<div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAFCAYAAAB8ZH1oAAAACXBIWXMAACxLAAAsSwGlPZapAAAAnElEQVR4nCWLOQ7CQBAE/XVSXkEEGR+AhJzMRoIEbAQJl8HsNTvrLWTTUkstVXXRNGfqusZah/eeKBFVRaMiEgnBY0ygKMuSqqr+ogRa9x3FoEJUJSWlbYXCGIO1Fu88mhMv6UaYc6bPPSmlscXwTqoYZ3l8n0gfGbK8bNh9TuPWQQwSIMG+a5ge5izOa+7+zWQ/Y3XbcnUPqs+RH8/uvaSS/Yj7AAAAAElFTkSuQmCC&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="640" height="334"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/assets/ideal-img/ui_endpoint_activity.412e94a.640.png" srcset="/assets/ideal-img/ui_endpoint_activity.412e94a.640.png 640w,/assets/ideal-img/ui_endpoint_activity.1d30413.1920.png 1920w" width="640" height="334"></noscript></div>
<p>Users can now see Endpoint Activity Metrics in the UI.</p>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-providers-and-endpoints">New Providers and Endpoints<a href="https://docs.litellm.ai/release_notes/v1-80-15#new-providers-and-endpoints" class="hash-link" aria-label="Direct link to New Providers and Endpoints" title="Direct link to New Providers and Endpoints">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="new-providers-11-new-providers">New Providers (11 new providers)<a href="https://docs.litellm.ai/release_notes/v1-80-15#new-providers-11-new-providers" class="hash-link" aria-label="Direct link to New Providers (11 new providers)" title="Direct link to New Providers (11 new providers)">​</a></h3>
<table><thead><tr><th>Provider</th><th>Supported LiteLLM Endpoints</th><th>Description</th></tr></thead><tbody><tr><td><a href="https://docs.litellm.ai/docs/providers/manus">Manus</a></td><td><code>/responses</code></td><td>Manus API for agentic workflows</td></tr><tr><td><a href="https://docs.litellm.ai/docs/providers/manus">Manus</a></td><td><code>GET /responses</code></td><td>Manus API for retrieving responses</td></tr><tr><td><a href="https://docs.litellm.ai/docs/providers/manus">Manus</a></td><td><code>/files</code></td><td>Manus API for file management</td></tr><tr><td><a href="https://docs.litellm.ai/docs/providers/minimax">MiniMax</a></td><td><code>/chat/completions</code></td><td>MiniMax chat completions</td></tr><tr><td><a href="https://docs.litellm.ai/docs/providers/minimax">MiniMax</a></td><td><code>/audio/speech</code></td><td>MiniMax text-to-speech</td></tr><tr><td><a href="https://docs.litellm.ai/docs/providers/aws_polly">AWS Polly</a></td><td><code>/audio/speech</code></td><td>AWS Polly text-to-speech API</td></tr><tr><td><a href="https://docs.litellm.ai/docs/providers/gigachat">GigaChat</a></td><td><code>/chat/completions</code></td><td>GigaChat provider for Russian language AI</td></tr><tr><td><a href="https://docs.litellm.ai/docs/providers/llamagate">LlamaGate</a></td><td><code>/chat/completions</code></td><td>LlamaGate chat completions</td></tr><tr><td><a href="https://docs.litellm.ai/docs/providers/llamagate">LlamaGate</a></td><td><code>/embeddings</code></td><td>LlamaGate embeddings</td></tr><tr><td><a href="https://docs.litellm.ai/docs/providers/abliteration">Abliteration AI</a></td><td><code>/chat/completions</code></td><td>Abliteration.ai provider support</td></tr><tr><td><a href="https://docs.litellm.ai/docs/providers/bedrock">Bedrock</a></td><td><code>/v1/messages/count_tokens</code></td><td>Bedrock as new provider for token counting</td></tr></tbody></table>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="new-llm-api-endpoints-3-new-endpoints">New LLM API Endpoints (3 new endpoints)<a href="https://docs.litellm.ai/release_notes/v1-80-15#new-llm-api-endpoints-3-new-endpoints" class="hash-link" aria-label="Direct link to New LLM API Endpoints (3 new endpoints)" title="Direct link to New LLM API Endpoints (3 new endpoints)">​</a></h3>
<table><thead><tr><th>Endpoint</th><th>Method</th><th>Description</th><th>Documentation</th></tr></thead><tbody><tr><td><code>/responses/compact</code></td><td>POST</td><td>Compact responses API endpoint</td><td><a href="https://docs.litellm.ai/docs/response_api">Docs</a></td></tr><tr><td><code>/rag/query</code></td><td>POST</td><td>RAG Search/Query endpoint</td><td><a href="https://docs.litellm.ai/docs/search/index">Docs</a></td></tr><tr><td><code>/containers/{id}/files</code></td><td>POST</td><td>Upload files to containers</td><td><a href="https://docs.litellm.ai/docs/container_files">Docs</a></td></tr></tbody></table>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-models--updated-models">New Models / Updated Models<a href="https://docs.litellm.ai/release_notes/v1-80-15#new-models--updated-models" class="hash-link" aria-label="Direct link to New Models / Updated Models" title="Direct link to New Models / Updated Models">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="new-model-support-100-new-models">New Model Support (100+ new models)<a href="https://docs.litellm.ai/release_notes/v1-80-15#new-model-support-100-new-models" class="hash-link" aria-label="Direct link to New Model Support (100+ new models)" title="Direct link to New Model Support (100+ new models)">​</a></h4>
<table><thead><tr><th>Provider</th><th>Model</th><th>Context Window</th><th>Input ($/1M tokens)</th><th>Output ($/1M tokens)</th><th>Features</th></tr></thead><tbody><tr><td>Azure</td><td><code>azure/gpt-5.2</code></td><td>400K</td><td>$1.75</td><td>$14.00</td><td>Reasoning, vision, caching</td></tr><tr><td>Azure</td><td><code>azure/gpt-5.2-chat</code></td><td>128K</td><td>$1.75</td><td>$14.00</td><td>Reasoning, vision</td></tr><tr><td>Azure</td><td><code>azure/gpt-5.2-pro</code></td><td>400K</td><td>$21.00</td><td>$168.00</td><td>Reasoning, vision, web search</td></tr><tr><td>Azure</td><td><code>azure/gpt-image-1.5</code></td><td>-</td><td>Token-based</td><td>Token-based</td><td>Image generation/editing</td></tr><tr><td>Azure AI</td><td><code>azure_ai/gpt-oss-120b</code></td><td>131K</td><td>$0.15</td><td>$0.60</td><td>Function calling</td></tr><tr><td>Azure AI</td><td><code>azure_ai/flux.2-pro</code></td><td>-</td><td>-</td><td>$0.04/image</td><td>Image generation</td></tr><tr><td>Azure AI</td><td><code>azure_ai/deepseek-v3.2</code></td><td>164K</td><td>$0.58</td><td>$1.68</td><td>Reasoning, function calling</td></tr><tr><td>Bedrock</td><td><code>amazon.nova-2-multimodal-embeddings-v1:0</code></td><td>8K</td><td>$0.135</td><td>-</td><td>Multimodal embeddings</td></tr><tr><td>Bedrock</td><td><code>writer.palmyra-x4-v1:0</code></td><td>128K</td><td>$2.50</td><td>$10.00</td><td>Function calling, PDF</td></tr><tr><td>Bedrock</td><td><code>writer.palmyra-x5-v1:0</code></td><td>1M</td><td>$0.60</td><td>$6.00</td><td>Function calling, PDF</td></tr><tr><td>Bedrock</td><td><code>moonshot.kimi-k2-v1:0</code></td><td>-</td><td>-</td><td>-</td><td>Kimi K2 model</td></tr><tr><td>Cerebras</td><td><code>cerebras/zai-glm-4.6</code></td><td>128K</td><td>$2.25</td><td>$2.75</td><td>Reasoning, function calling</td></tr><tr><td>GigaChat</td><td><code>gigachat/GigaChat-2-Lite</code></td><td>-</td><td>-</td><td>-</td><td>Chat 
completions</td></tr><tr><td>GigaChat</td><td><code>gigachat/GigaChat-2-Max</code></td><td>-</td><td>-</td><td>-</td><td>Chat completions</td></tr><tr><td>GigaChat</td><td><code>gigachat/GigaChat-2-Pro</code></td><td>-</td><td>-</td><td>-</td><td>Chat completions</td></tr><tr><td>Gemini</td><td><code>gemini/veo-3.1-generate-001</code></td><td>-</td><td>-</td><td>-</td><td>Video generation</td></tr><tr><td>Gemini</td><td><code>gemini/veo-3.1-fast-generate-001</code></td><td>-</td><td>-</td><td>-</td><td>Video generation</td></tr><tr><td>GitHub Copilot</td><td>25+ models</td><td>Various</td><td>-</td><td>-</td><td>Chat completions</td></tr><tr><td>LlamaGate</td><td>15+ models</td><td>Various</td><td>-</td><td>-</td><td>Chat, vision, embeddings</td></tr><tr><td>MiniMax</td><td><code>minimax/abab7-chat-preview</code></td><td>-</td><td>-</td><td>-</td><td>Chat completions</td></tr><tr><td>Novita</td><td>80+ models</td><td>Various</td><td>Various</td><td>Various</td><td>Chat, vision, embeddings</td></tr><tr><td>OpenRouter</td><td><code>openrouter/google/gemini-3-flash-preview</code></td><td>-</td><td>-</td><td>-</td><td>Chat completions</td></tr><tr><td>Together AI</td><td>Multiple models</td><td>Various</td><td>Various</td><td>Various</td><td>Response schema support</td></tr><tr><td>Vertex AI</td><td><code>vertex_ai/zai-glm-4.7</code></td><td>-</td><td>-</td><td>-</td><td>GLM 4.7 support</td></tr></tbody></table>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features">Features<a href="https://docs.litellm.ai/release_notes/v1-80-15#features" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li><strong><a href="https://docs.litellm.ai/docs/providers/gemini">Gemini</a></strong>
<ul>
<li>Add image tokens in chat completion - <a href="https://github.com/BerriAI/litellm/pull/18327" target="_blank" rel="noopener noreferrer">PR #18327</a></li>
<li>Add usage object in image generation - <a href="https://github.com/BerriAI/litellm/pull/18328" target="_blank" rel="noopener noreferrer">PR #18328</a></li>
<li>Add thought signature support via tool call id - <a href="https://github.com/BerriAI/litellm/pull/18374" target="_blank" rel="noopener noreferrer">PR #18374</a></li>
<li>Add thought signature for non tool call requests - <a href="https://github.com/BerriAI/litellm/pull/18581" target="_blank" rel="noopener noreferrer">PR #18581</a></li>
<li>Preserve system instructions - <a href="https://github.com/BerriAI/litellm/pull/18585" target="_blank" rel="noopener noreferrer">PR #18585</a></li>
<li>Fix Gemini 3 images in tool response - <a href="https://github.com/BerriAI/litellm/pull/18190" target="_blank" rel="noopener noreferrer">PR #18190</a></li>
<li>Support snake_case for google_search tool parameters - <a href="https://github.com/BerriAI/litellm/pull/18451" target="_blank" rel="noopener noreferrer">PR #18451</a></li>
<li>Google GenAI adapter inline data support - <a href="https://github.com/BerriAI/litellm/pull/18477" target="_blank" rel="noopener noreferrer">PR #18477</a></li>
<li>Add deprecation_date for discontinued Google models - <a href="https://github.com/BerriAI/litellm/pull/18550" target="_blank" rel="noopener noreferrer">PR #18550</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/vertex">Vertex AI</a></strong>
<ul>
<li>Add centralized get_vertex_base_url() helper for global location support - <a href="https://github.com/BerriAI/litellm/pull/18410" target="_blank" rel="noopener noreferrer">PR #18410</a></li>
<li>Convert image URLs to base64 for Vertex AI Anthropic - <a href="https://github.com/BerriAI/litellm/pull/18497" target="_blank" rel="noopener noreferrer">PR #18497</a></li>
<li>Separate Tool objects for each tool type per API spec - <a href="https://github.com/BerriAI/litellm/pull/18514" target="_blank" rel="noopener noreferrer">PR #18514</a></li>
<li>Add thought_signatures to VertexGeminiConfig - <a href="https://github.com/BerriAI/litellm/pull/18853" target="_blank" rel="noopener noreferrer">PR #18853</a></li>
<li>Add support for Vertex AI API keys - <a href="https://github.com/BerriAI/litellm/pull/18806" target="_blank" rel="noopener noreferrer">PR #18806</a></li>
<li>Add zai glm-4.7 model support - <a href="https://github.com/BerriAI/litellm/pull/18782" target="_blank" rel="noopener noreferrer">PR #18782</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/azure/azure">Azure</a></strong>
<ul>
<li>Add Azure gpt-image-1.5 pricing to cost map - <a href="https://github.com/BerriAI/litellm/pull/18347" target="_blank" rel="noopener noreferrer">PR #18347</a></li>
<li>Add azure/gpt-5.2-chat model - <a href="https://github.com/BerriAI/litellm/pull/18361" target="_blank" rel="noopener noreferrer">PR #18361</a></li>
<li>Add support for image generation via Azure AD token - <a href="https://github.com/BerriAI/litellm/pull/18413" target="_blank" rel="noopener noreferrer">PR #18413</a></li>
<li>Add logprobs support for Azure OpenAI GPT-5.2 model - <a href="https://github.com/BerriAI/litellm/pull/18856" target="_blank" rel="noopener noreferrer">PR #18856</a></li>
<li>Add Azure BFL Flux 2 models for image generation and editing - <a href="https://github.com/BerriAI/litellm/pull/18764" target="_blank" rel="noopener noreferrer">PR #18764</a>, <a href="https://github.com/BerriAI/litellm/pull/18766" target="_blank" rel="noopener noreferrer">PR #18766</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/bedrock">Bedrock</a></strong>
<ul>
<li>Add Bedrock Kimi K2 model support - <a href="https://github.com/BerriAI/litellm/pull/18797" target="_blank" rel="noopener noreferrer">PR #18797</a></li>
<li>Add support for model ID in Bedrock passthrough - <a href="https://github.com/BerriAI/litellm/pull/18800" target="_blank" rel="noopener noreferrer">PR #18800</a></li>
<li>Fix Nova model detection for Bedrock provider - <a href="https://github.com/BerriAI/litellm/pull/18250" target="_blank" rel="noopener noreferrer">PR #18250</a></li>
<li>Ensure toolUse.input is always a dict when converting from OpenAI format - <a href="https://github.com/BerriAI/litellm/pull/18414" target="_blank" rel="noopener noreferrer">PR #18414</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/databricks">Databricks</a></strong>
<ul>
<li>Add enhanced authentication, security features, and custom user-agent support - <a href="https://github.com/BerriAI/litellm/pull/18349" target="_blank" rel="noopener noreferrer">PR #18349</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/minimax">MiniMax</a></strong>
<ul>
<li>Add MiniMax chat completion support - <a href="https://github.com/BerriAI/litellm/pull/18380" target="_blank" rel="noopener noreferrer">PR #18380</a></li>
<li>Add Anthropic native endpoint support for MiniMax - <a href="https://github.com/BerriAI/litellm/pull/18377" target="_blank" rel="noopener noreferrer">PR #18377</a></li>
<li>Add support for MiniMax TTS - <a href="https://github.com/BerriAI/litellm/pull/18334" target="_blank" rel="noopener noreferrer">PR #18334</a></li>
<li>Add MiniMax provider support to UI dashboard - <a href="https://github.com/BerriAI/litellm/pull/18496" target="_blank" rel="noopener noreferrer">PR #18496</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/togetherai">Together AI</a></strong>
<ul>
<li>Add supports_response_schema to all supported Together AI models - <a href="https://github.com/BerriAI/litellm/pull/18368" target="_blank" rel="noopener noreferrer">PR #18368</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/openrouter">OpenRouter</a></strong>
<ul>
<li>Add OpenRouter embeddings API support - <a href="https://github.com/BerriAI/litellm/pull/18391" target="_blank" rel="noopener noreferrer">PR #18391</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/anthropic">Anthropic</a></strong>
<ul>
<li>Pass server_tool_use and tool_search_tool_result blocks - <a href="https://github.com/BerriAI/litellm/pull/18770" target="_blank" rel="noopener noreferrer">PR #18770</a></li>
<li>Add Anthropic cache control option to image tool call results - <a href="https://github.com/BerriAI/litellm/pull/18674" target="_blank" rel="noopener noreferrer">PR #18674</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/ollama">Ollama</a></strong>
<ul>
<li>Add dimensions parameter for Ollama embeddings - <a href="https://github.com/BerriAI/litellm/pull/18536" target="_blank" rel="noopener noreferrer">PR #18536</a></li>
<li>Extract pure base64 data from data URLs for Ollama - <a href="https://github.com/BerriAI/litellm/pull/18465" target="_blank" rel="noopener noreferrer">PR #18465</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/watsonx/index">Watsonx</a></strong>
<ul>
<li>Add Watsonx fields support - <a href="https://github.com/BerriAI/litellm/pull/18569" target="_blank" rel="noopener noreferrer">PR #18569</a></li>
<li>Fix Watsonx Audio Transcription - filter model field - <a href="https://github.com/BerriAI/litellm/pull/18810" target="_blank" rel="noopener noreferrer">PR #18810</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/sap">SAP</a></strong>
<ul>
<li>Add SAP creds for list in proxy UI - <a href="https://github.com/BerriAI/litellm/pull/18375" target="_blank" rel="noopener noreferrer">PR #18375</a></li>
<li>Pass through extra params from allowed_openai_params - <a href="https://github.com/BerriAI/litellm/pull/18432" target="_blank" rel="noopener noreferrer">PR #18432</a></li>
<li>Add client header for SAP AI Core Tracking - <a href="https://github.com/BerriAI/litellm/pull/18714" target="_blank" rel="noopener noreferrer">PR #18714</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/fireworks_ai">Fireworks AI</a></strong>
<ul>
<li>Correct deepseek-v3p2 pricing - <a href="https://github.com/BerriAI/litellm/pull/18483" target="_blank" rel="noopener noreferrer">PR #18483</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/zai">ZAI</a></strong>
<ul>
<li>Add GLM-4.7 model with reasoning support - <a href="https://github.com/BerriAI/litellm/pull/18476" target="_blank" rel="noopener noreferrer">PR #18476</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/codestral">Codestral</a></strong>
<ul>
<li>Correctly route codestral chat and FIM endpoints - <a href="https://github.com/BerriAI/litellm/pull/18467" target="_blank" rel="noopener noreferrer">PR #18467</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/azure_ai">Azure AI</a></strong>
<ul>
<li>Fix authentication errors at messages API via azure_ai - <a href="https://github.com/BerriAI/litellm/pull/18500" target="_blank" rel="noopener noreferrer">PR #18500</a></li>
</ul>
</li>
</ul>
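<p>The Ollama embeddings change above adds an OpenAI-compatible <code>dimensions</code> option. As an illustrative sketch (not LiteLLM's actual implementation), the helper below builds an embeddings request body and includes <code>dimensions</code> only when set; the model name and dimension count are placeholder assumptions.</p>

```python
# Illustrative sketch of the `dimensions` option for Ollama embeddings
# (PR #18536). This only builds the OpenAI-compatible request body;
# model name and dimension count below are placeholders.
def build_ollama_embedding_payload(model, inputs, dimensions=None):
    """Build an embeddings request body; include dimensions only when set."""
    payload = {"model": model, "input": list(inputs)}
    if dimensions is not None:
        payload["dimensions"] = dimensions
    return payload

payload = build_ollama_embedding_payload(
    "ollama/nomic-embed-text", ["release notes"], dimensions=256
)
print(payload["dimensions"])  # → 256
```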
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="new-provider-support">New Provider Support<a href="https://docs.litellm.ai/release_notes/v1-80-15#new-provider-support" class="hash-link" aria-label="Direct link to New Provider Support" title="Direct link to New Provider Support">​</a></h4>
<ul>
<li><strong><a href="https://docs.litellm.ai/docs/providers/aws_polly">AWS Polly</a></strong> - Add AWS Polly API for TTS - <a href="https://github.com/BerriAI/litellm/pull/18326" target="_blank" rel="noopener noreferrer">PR #18326</a></li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/gigachat">GigaChat</a></strong> - Add GigaChat provider support - <a href="https://github.com/BerriAI/litellm/pull/18564" target="_blank" rel="noopener noreferrer">PR #18564</a></li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/llamagate">LlamaGate</a></strong> - Add LlamaGate as a new provider - <a href="https://github.com/BerriAI/litellm/pull/18673" target="_blank" rel="noopener noreferrer">PR #18673</a></li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/abliteration">Abliteration AI</a></strong> - Add abliteration.ai provider - <a href="https://github.com/BerriAI/litellm/pull/18678" target="_blank" rel="noopener noreferrer">PR #18678</a></li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/manus">Manus</a></strong> - Add Manus API support on /responses, GET /responses - <a href="https://github.com/BerriAI/litellm/pull/18804" target="_blank" rel="noopener noreferrer">PR #18804</a></li>
<li><strong>5 AI Providers via openai_like</strong> - Add 5 AI providers using openai_like - <a href="https://github.com/BerriAI/litellm/pull/18362" target="_blank" rel="noopener noreferrer">PR #18362</a></li>
</ul>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="bug-fixes">Bug Fixes<a href="https://docs.litellm.ai/release_notes/v1-80-15#bug-fixes" class="hash-link" aria-label="Direct link to Bug Fixes" title="Direct link to Bug Fixes">​</a></h3>
<ul>
<li><strong><a href="https://docs.litellm.ai/docs/providers/gemini">Gemini</a></strong>
<ul>
<li>Properly catch context window exceeded errors - <a href="https://github.com/BerriAI/litellm/pull/18283" target="_blank" rel="noopener noreferrer">PR #18283</a></li>
<li>Remove prompt caching headers as support has been removed - <a href="https://github.com/BerriAI/litellm/pull/18579" target="_blank" rel="noopener noreferrer">PR #18579</a></li>
<li>Fix generate content request with audio file id - <a href="https://github.com/BerriAI/litellm/pull/18745" target="_blank" rel="noopener noreferrer">PR #18745</a></li>
<li>Fix google_genai streaming adapter provider handling - <a href="https://github.com/BerriAI/litellm/pull/18845" target="_blank" rel="noopener noreferrer">PR #18845</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/groq">Groq</a></strong>
<ul>
<li>Remove deprecated Groq models and update model registry - <a href="https://github.com/BerriAI/litellm/pull/18062" target="_blank" rel="noopener noreferrer">PR #18062</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/vertex">Vertex AI</a></strong>
<ul>
<li>Handle unsupported region for Vertex AI count tokens endpoint - <a href="https://github.com/BerriAI/litellm/pull/18665" target="_blank" rel="noopener noreferrer">PR #18665</a></li>
</ul>
</li>
<li><strong>General</strong>
<ul>
<li>Fix request body for image embedding request - <a href="https://github.com/BerriAI/litellm/pull/18336" target="_blank" rel="noopener noreferrer">PR #18336</a></li>
<li>Fix lost tool_calls when streaming has both text and tool_calls - <a href="https://github.com/BerriAI/litellm/pull/18316" target="_blank" rel="noopener noreferrer">PR #18316</a></li>
<li>Add all resolutions for gpt-image-1.5 - <a href="https://github.com/BerriAI/litellm/pull/18586" target="_blank" rel="noopener noreferrer">PR #18586</a></li>
<li>Fix gpt-image-1 cost calculation using token-based pricing - <a href="https://github.com/BerriAI/litellm/pull/17906" target="_blank" rel="noopener noreferrer">PR #17906</a></li>
<li>Fix response_format leaking into extra_body - <a href="https://github.com/BerriAI/litellm/pull/18859" target="_blank" rel="noopener noreferrer">PR #18859</a></li>
<li>Align max_tokens with max_output_tokens for consistency - <a href="https://github.com/BerriAI/litellm/pull/18820" target="_blank" rel="noopener noreferrer">PR #18820</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="llm-api-endpoints">LLM API Endpoints<a href="https://docs.litellm.ai/release_notes/v1-80-15#llm-api-endpoints" class="hash-link" aria-label="Direct link to LLM API Endpoints" title="Direct link to LLM API Endpoints">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-1">Features<a href="https://docs.litellm.ai/release_notes/v1-80-15#features-1" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li><strong><a href="https://docs.litellm.ai/docs/response_api">Responses API</a></strong>
<ul>
<li>Add new compact endpoint (v1/responses/compact) - <a href="https://github.com/BerriAI/litellm/pull/18697" target="_blank" rel="noopener noreferrer">PR #18697</a></li>
<li>Support more streaming callback hooks - <a href="https://github.com/BerriAI/litellm/pull/18513" target="_blank" rel="noopener noreferrer">PR #18513</a></li>
<li>Add mapping for reasoning effort to summary param - <a href="https://github.com/BerriAI/litellm/pull/18635" target="_blank" rel="noopener noreferrer">PR #18635</a></li>
<li>Add output_text property to ResponsesAPIResponse - <a href="https://github.com/BerriAI/litellm/pull/18491" target="_blank" rel="noopener noreferrer">PR #18491</a></li>
<li>Add annotations to completions responses API bridge - <a href="https://github.com/BerriAI/litellm/pull/18754" target="_blank" rel="noopener noreferrer">PR #18754</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/interactions">Interactions API</a></strong>
<ul>
<li>Allow using all LiteLLM providers (interactions -&gt; responses API bridge) - <a href="https://github.com/BerriAI/litellm/pull/18373" target="_blank" rel="noopener noreferrer">PR #18373</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/search/index">RAG Search API</a></strong>
<ul>
<li>Add RAG Search/Query endpoint - <a href="https://github.com/BerriAI/litellm/pull/18376" target="_blank" rel="noopener noreferrer">PR #18376</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/anthropic_count_tokens">CountTokens API</a></strong>
<ul>
<li>Add Bedrock as a new provider for <code>/v1/messages/count_tokens</code> - <a href="https://github.com/BerriAI/litellm/pull/18858" target="_blank" rel="noopener noreferrer">PR #18858</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/gemini">Generate Content</a></strong>
<ul>
<li>Add generate content in LLM route - <a href="https://github.com/BerriAI/litellm/pull/18405" target="_blank" rel="noopener noreferrer">PR #18405</a></li>
</ul>
</li>
<li><strong>General</strong>
<ul>
<li>Enable async_post_call_failure_hook to transform error responses - <a href="https://github.com/BerriAI/litellm/pull/18348" target="_blank" rel="noopener noreferrer">PR #18348</a></li>
<li>Calculate total_tokens manually when missing but derivable - <a href="https://github.com/BerriAI/litellm/pull/18445" target="_blank" rel="noopener noreferrer">PR #18445</a></li>
<li>Add custom llm provider to get_llm_provider when sent via UI - <a href="https://github.com/BerriAI/litellm/pull/18638" target="_blank" rel="noopener noreferrer">PR #18638</a></li>
</ul>
</li>
</ul>
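<p>To illustrate the new <code>output_text</code> convenience on ResponsesAPIResponse (PR #18491): it yields the concatenated text parts of a Responses API result. The standalone function below is an illustrative reimplementation over a plain result dict, not LiteLLM's actual property.</p>

```python
# Sketch of what the `output_text` convenience (PR #18491) computes:
# the joined text parts of a Responses API result. Illustrative only,
# not LiteLLM's actual implementation.
def output_text(response):
    """Join all output_text content parts from a Responses API result dict."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue
        for content in item.get("content", []):
            if content.get("type") == "output_text":
                parts.append(content.get("text", ""))
    return "".join(parts)

sample = {
    "output": [
        {"type": "message",
         "content": [{"type": "output_text", "text": "Hello"}]},
    ]
}
print(output_text(sample))  # → Hello
```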
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="bugs">Bugs<a href="https://docs.litellm.ai/release_notes/v1-80-15#bugs" class="hash-link" aria-label="Direct link to Bugs" title="Direct link to Bugs">​</a></h4>
<ul>
<li><strong>General</strong>
<ul>
<li>Handle empty error objects in response conversion - <a href="https://github.com/BerriAI/litellm/pull/18493" target="_blank" rel="noopener noreferrer">PR #18493</a></li>
<li>Preserve client error status codes in streaming mode - <a href="https://github.com/BerriAI/litellm/pull/18698" target="_blank" rel="noopener noreferrer">PR #18698</a></li>
<li>Return json error response instead of SSE format for initial streaming errors - <a href="https://github.com/BerriAI/litellm/pull/18757" target="_blank" rel="noopener noreferrer">PR #18757</a></li>
<li>Fix auth header for custom api base in generateContent request - <a href="https://github.com/BerriAI/litellm/pull/18637" target="_blank" rel="noopener noreferrer">PR #18637</a></li>
<li>Ensure tool content is a string for Deepinfra - <a href="https://github.com/BerriAI/litellm/pull/18739" target="_blank" rel="noopener noreferrer">PR #18739</a></li>
<li>Fix incomplete usage in the passed response object - <a href="https://github.com/BerriAI/litellm/pull/18799" target="_blank" rel="noopener noreferrer">PR #18799</a></li>
<li>Unify model names to provider-defined names - <a href="https://github.com/BerriAI/litellm/pull/18573" target="_blank" rel="noopener noreferrer">PR #18573</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="management-endpoints--ui">Management Endpoints / UI<a href="https://docs.litellm.ai/release_notes/v1-80-15#management-endpoints--ui" class="hash-link" aria-label="Direct link to Management Endpoints / UI" title="Direct link to Management Endpoints / UI">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-2">Features<a href="https://docs.litellm.ai/release_notes/v1-80-15#features-2" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li><strong>SSO Configuration</strong>
<ul>
<li>Add SSO Role Mapping feature - <a href="https://github.com/BerriAI/litellm/pull/18090" target="_blank" rel="noopener noreferrer">PR #18090</a></li>
<li>Add SSO Settings Page - <a href="https://github.com/BerriAI/litellm/pull/18600" target="_blank" rel="noopener noreferrer">PR #18600</a></li>
<li>Allow adding role mappings for SSO - <a href="https://github.com/BerriAI/litellm/pull/18593" target="_blank" rel="noopener noreferrer">PR #18593</a></li>
<li>SSO Settings Page Add Role Mappings - <a href="https://github.com/BerriAI/litellm/pull/18677" target="_blank" rel="noopener noreferrer">PR #18677</a></li>
<li>SSO Settings Loading State + Deprecate Previous SSO Flow - <a href="https://github.com/BerriAI/litellm/pull/18617" target="_blank" rel="noopener noreferrer">PR #18617</a></li>
</ul>
</li>
<li><strong>Virtual Keys</strong>
<ul>
<li>Allow deleting key expiry - <a href="https://github.com/BerriAI/litellm/pull/18278" target="_blank" rel="noopener noreferrer">PR #18278</a></li>
<li>Add optional query param "expand" to /key/list - <a href="https://github.com/BerriAI/litellm/pull/18502" target="_blank" rel="noopener noreferrer">PR #18502</a></li>
<li>Key Table Loading Skeleton - <a href="https://github.com/BerriAI/litellm/pull/18527" target="_blank" rel="noopener noreferrer">PR #18527</a></li>
<li>Allow column resizing on Keys Table - <a href="https://github.com/BerriAI/litellm/pull/18424" target="_blank" rel="noopener noreferrer">PR #18424</a></li>
<li>Virtual Keys Table Loading State Between Pages - <a href="https://github.com/BerriAI/litellm/pull/18619" target="_blank" rel="noopener noreferrer">PR #18619</a></li>
<li>Key and Team Router Setting - <a href="https://github.com/BerriAI/litellm/pull/18790" target="_blank" rel="noopener noreferrer">PR #18790</a></li>
<li>Allow router_settings on Keys and Teams - <a href="https://github.com/BerriAI/litellm/pull/18675" target="_blank" rel="noopener noreferrer">PR #18675</a></li>
<li>Use timedelta to calculate key expiry on generate - <a href="https://github.com/BerriAI/litellm/pull/18666" target="_blank" rel="noopener noreferrer">PR #18666</a></li>
</ul>
</li>
<li><strong>Models + Endpoints</strong>
<ul>
<li>Add Model Clearer Flow For Team Admins - <a href="https://github.com/BerriAI/litellm/pull/18532" target="_blank" rel="noopener noreferrer">PR #18532</a></li>
<li>Model Page Loading State - <a href="https://github.com/BerriAI/litellm/pull/18574" target="_blank" rel="noopener noreferrer">PR #18574</a></li>
<li>Model Page Model Provider Select Performance - <a href="https://github.com/BerriAI/litellm/pull/18425" target="_blank" rel="noopener noreferrer">PR #18425</a></li>
<li>Model Page Sorting Sorts Entire Set - <a href="https://github.com/BerriAI/litellm/pull/18420" target="_blank" rel="noopener noreferrer">PR #18420</a></li>
<li>Refactor Model Hub Page - <a href="https://github.com/BerriAI/litellm/pull/18568" target="_blank" rel="noopener noreferrer">PR #18568</a></li>
<li>Add request provider form on UI - <a href="https://github.com/BerriAI/litellm/pull/18704" target="_blank" rel="noopener noreferrer">PR #18704</a></li>
</ul>
</li>
<li><strong>Organizations &amp; Teams</strong>
<ul>
<li>Allow Organization Admins to See Organization Tab - <a href="https://github.com/BerriAI/litellm/pull/18400" target="_blank" rel="noopener noreferrer">PR #18400</a></li>
<li>Resolve Organization Alias on Team Table - <a href="https://github.com/BerriAI/litellm/pull/18401" target="_blank" rel="noopener noreferrer">PR #18401</a></li>
<li>Resolve Team Alias in Organization Info View - <a href="https://github.com/BerriAI/litellm/pull/18404" target="_blank" rel="noopener noreferrer">PR #18404</a></li>
<li>Allow Organization Admins to View Their Organization Info - <a href="https://github.com/BerriAI/litellm/pull/18417" target="_blank" rel="noopener noreferrer">PR #18417</a></li>
<li>Allow editing team_member_budget_duration in /team/update - <a href="https://github.com/BerriAI/litellm/pull/18735" target="_blank" rel="noopener noreferrer">PR #18735</a></li>
<li>Reusable Duration Select + Team Update Member Budget Duration - <a href="https://github.com/BerriAI/litellm/pull/18736" target="_blank" rel="noopener noreferrer">PR #18736</a></li>
</ul>
</li>
<li><strong>Usage &amp; Spend</strong>
<ul>
<li>Add Error Code Filtering on Spend Logs - <a href="https://github.com/BerriAI/litellm/pull/18359" target="_blank" rel="noopener noreferrer">PR #18359</a></li>
<li>Add Error Code Filtering on UI - <a href="https://github.com/BerriAI/litellm/pull/18366" target="_blank" rel="noopener noreferrer">PR #18366</a></li>
<li>Fix User Max Budget on Usage Page - <a href="https://github.com/BerriAI/litellm/pull/18555" target="_blank" rel="noopener noreferrer">PR #18555</a></li>
<li>Add endpoint to Daily Activity Tables - <a href="https://github.com/BerriAI/litellm/pull/18729" target="_blank" rel="noopener noreferrer">PR #18729</a></li>
<li>Endpoint Activity in Usage - <a href="https://github.com/BerriAI/litellm/pull/18798" target="_blank" rel="noopener noreferrer">PR #18798</a></li>
</ul>
</li>
<li><strong>Cost Estimator</strong>
<ul>
<li>Add Cost Estimator for AI Gateway - <a href="https://github.com/BerriAI/litellm/pull/18643" target="_blank" rel="noopener noreferrer">PR #18643</a></li>
<li>Add view for estimating costs across requests - <a href="https://github.com/BerriAI/litellm/pull/18645" target="_blank" rel="noopener noreferrer">PR #18645</a></li>
<li>Allow selecting many models for cost estimator - <a href="https://github.com/BerriAI/litellm/pull/18653" target="_blank" rel="noopener noreferrer">PR #18653</a></li>
</ul>
</li>
<li><strong>CloudZero</strong>
<ul>
<li>Improve Create and Delete Path for CloudZero - <a href="https://github.com/BerriAI/litellm/pull/18263" target="_blank" rel="noopener noreferrer">PR #18263</a></li>
<li>Add CloudZero UI Docs - <a href="https://github.com/BerriAI/litellm/pull/18350" target="_blank" rel="noopener noreferrer">PR #18350</a></li>
</ul>
</li>
<li><strong>Playground</strong>
<ul>
<li>Add MCP test support to completions on Playground - <a href="https://github.com/BerriAI/litellm/pull/18440" target="_blank" rel="noopener noreferrer">PR #18440</a></li>
<li>Add selectable MCP servers to the playground - <a href="https://github.com/BerriAI/litellm/pull/18578" target="_blank" rel="noopener noreferrer">PR #18578</a></li>
<li>Add custom proxy base URL support to Playground - <a href="https://github.com/BerriAI/litellm/pull/18661" target="_blank" rel="noopener noreferrer">PR #18661</a></li>
</ul>
</li>
<li><strong>General UI</strong>
<ul>
<li>UI styling improvements and fixes - <a href="https://github.com/BerriAI/litellm/pull/18310" target="_blank" rel="noopener noreferrer">PR #18310</a></li>
<li>Add reusable "New" badge component for feature highlights - <a href="https://github.com/BerriAI/litellm/pull/18537" target="_blank" rel="noopener noreferrer">PR #18537</a></li>
<li>Hide New Badges - <a href="https://github.com/BerriAI/litellm/pull/18547" target="_blank" rel="noopener noreferrer">PR #18547</a></li>
<li>Change Budget page to Have Tabs - <a href="https://github.com/BerriAI/litellm/pull/18576" target="_blank" rel="noopener noreferrer">PR #18576</a></li>
<li>Clicking on Logo Directs to Correct URL - <a href="https://github.com/BerriAI/litellm/pull/18575" target="_blank" rel="noopener noreferrer">PR #18575</a></li>
<li>Add UI support for configuring meta URLs - <a href="https://github.com/BerriAI/litellm/pull/18580" target="_blank" rel="noopener noreferrer">PR #18580</a></li>
<li>Expire Previous UI Session Tokens on Login - <a href="https://github.com/BerriAI/litellm/pull/18557" target="_blank" rel="noopener noreferrer">PR #18557</a></li>
<li>Add license endpoint - <a href="https://github.com/BerriAI/litellm/pull/18311" target="_blank" rel="noopener noreferrer">PR #18311</a></li>
<li>Router Fields Endpoint + React Query for Router Fields - <a href="https://github.com/BerriAI/litellm/pull/18880" target="_blank" rel="noopener noreferrer">PR #18880</a></li>
</ul>
</li>
</ul>
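<p>For the optional <code>expand</code> query parameter added to <code>/key/list</code> (PR #18502), a minimal sketch of building the request URL is below. The base URL and the <code>team_info</code> expand value are placeholder assumptions; consult the proxy docs for the supported values.</p>

```python
# Hypothetical sketch of calling /key/list with the optional "expand"
# query parameter (PR #18502). Base URL and expand value are placeholders.
from urllib.parse import urlencode

def key_list_url(base_url, expand=None):
    """Build the /key/list URL, appending expand=... only when requested."""
    url = f"{base_url.rstrip('/')}/key/list"
    if expand:
        url += "?" + urlencode({"expand": expand})
    return url

print(key_list_url("http://localhost:4000", expand="team_info"))
```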
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="bugs-1">Bugs<a href="https://docs.litellm.ai/release_notes/v1-80-15#bugs-1" class="hash-link" aria-label="Direct link to Bugs" title="Direct link to Bugs">​</a></h4>
<ul>
<li><strong>UI Fixes</strong>
<ul>
<li>Fix Key Creation MCP Settings Submitting the Form Unintentionally - <a href="https://github.com/BerriAI/litellm/pull/18355" target="_blank" rel="noopener noreferrer">PR #18355</a></li>
<li>Fix UI Disappearing in Development Environments - <a href="https://github.com/BerriAI/litellm/pull/18399" target="_blank" rel="noopener noreferrer">PR #18399</a></li>
<li>Fix Disable Admin UI Flag - <a href="https://github.com/BerriAI/litellm/pull/18397" target="_blank" rel="noopener noreferrer">PR #18397</a></li>
<li>Remove Model Analytics From Model Page - <a href="https://github.com/BerriAI/litellm/pull/18552" target="_blank" rel="noopener noreferrer">PR #18552</a></li>
<li>Close Useful Links Modal on Adding Links - <a href="https://github.com/BerriAI/litellm/pull/18602" target="_blank" rel="noopener noreferrer">PR #18602</a></li>
<li>SSO Edit Modal Clear Role Mapping Values on Provider Change - <a href="https://github.com/BerriAI/litellm/pull/18680" target="_blank" rel="noopener noreferrer">PR #18680</a></li>
<li>Fix UI Login Case Sensitivity - <a href="https://github.com/BerriAI/litellm/pull/18877" target="_blank" rel="noopener noreferrer">PR #18877</a></li>
</ul>
</li>
<li><strong>API Fixes</strong>
<ul>
<li>Fix User Invite &amp; Key Generation Email Notification Logic - <a href="https://github.com/BerriAI/litellm/pull/18524" target="_blank" rel="noopener noreferrer">PR #18524</a></li>
<li>Normalize Proxy Config Callback - <a href="https://github.com/BerriAI/litellm/pull/18775" target="_blank" rel="noopener noreferrer">PR #18775</a></li>
<li>Return empty data array instead of 500 when no models configured - <a href="https://github.com/BerriAI/litellm/pull/18556" target="_blank" rel="noopener noreferrer">PR #18556</a></li>
<li>Enforce org level max budget - <a href="https://github.com/BerriAI/litellm/pull/18813" target="_blank" rel="noopener noreferrer">PR #18813</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="ai-integrations">AI Integrations<a href="https://docs.litellm.ai/release_notes/v1-80-15#ai-integrations" class="hash-link" aria-label="Direct link to AI Integrations" title="Direct link to AI Integrations">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="new-integrations-4-new-integrations">New Integrations (4 new integrations)<a href="https://docs.litellm.ai/release_notes/v1-80-15#new-integrations-4-new-integrations" class="hash-link" aria-label="Direct link to New Integrations (4 new integrations)" title="Direct link to New Integrations (4 new integrations)">​</a></h3>
<table><thead><tr><th>Integration</th><th>Type</th><th>Description</th></tr></thead><tbody><tr><td><a href="https://docs.litellm.ai/docs/observability/focus">Focus</a></td><td>Logging</td><td>Focus export support for observability - <a href="https://github.com/BerriAI/litellm/pull/18802" target="_blank" rel="noopener noreferrer">PR #18802</a></td></tr><tr><td><a href="https://docs.litellm.ai/docs/observability/signoz">SigNoz</a></td><td>Logging</td><td>SigNoz integration for observability - <a href="https://github.com/BerriAI/litellm/pull/18726" target="_blank" rel="noopener noreferrer">PR #18726</a></td></tr><tr><td><a href="https://docs.litellm.ai/docs/proxy/guardrails/qualifire">Qualifire</a></td><td>Guardrails</td><td>Qualifire guardrails and eval webhook - <a href="https://github.com/BerriAI/litellm/pull/18594" target="_blank" rel="noopener noreferrer">PR #18594</a></td></tr><tr><td><a href="https://docs.litellm.ai/docs/observability/levo_integration">Levo AI</a></td><td>Guardrails</td><td>Levo AI integration for security - <a href="https://github.com/BerriAI/litellm/pull/18529" target="_blank" rel="noopener noreferrer">PR #18529</a></td></tr></tbody></table>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="logging">Logging<a href="https://docs.litellm.ai/release_notes/v1-80-15#logging" class="hash-link" aria-label="Direct link to Logging" title="Direct link to Logging">​</a></h3>
<ul>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/logging#datadog">DataDog</a></strong>
<ul>
<li>Fix span kind fallback when parent_id missing - <a href="https://github.com/BerriAI/litellm/pull/18418" target="_blank" rel="noopener noreferrer">PR #18418</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/logging#langfuse">Langfuse</a></strong>
<ul>
<li>Map Gemini cached_tokens to Langfuse cache_read_input_tokens - <a href="https://github.com/BerriAI/litellm/pull/18614" target="_blank" rel="noopener noreferrer">PR #18614</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/logging#prometheus">Prometheus</a></strong>
<ul>
<li>Align prometheus metric names with DEFINED_PROMETHEUS_METRICS - <a href="https://github.com/BerriAI/litellm/pull/18463" target="_blank" rel="noopener noreferrer">PR #18463</a></li>
<li>Add Prometheus metrics for request queue time and guardrails - <a href="https://github.com/BerriAI/litellm/pull/17973" target="_blank" rel="noopener noreferrer">PR #17973</a></li>
<li>Add caching metrics for cache hits, misses, and tokens - <a href="https://github.com/BerriAI/litellm/pull/18755" target="_blank" rel="noopener noreferrer">PR #18755</a></li>
<li>Skip metrics for invalid API key requests - <a href="https://github.com/BerriAI/litellm/pull/18788" target="_blank" rel="noopener noreferrer">PR #18788</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/logging#braintrust">Braintrust</a></strong>
<ul>
<li>Pass span_attributes in async logging and skip tags on non-root spans - <a href="https://github.com/BerriAI/litellm/pull/18409" target="_blank" rel="noopener noreferrer">PR #18409</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/logging#cloudzero">CloudZero</a></strong>
<ul>
<li>Add user email to CloudZero - <a href="https://github.com/BerriAI/litellm/pull/18584" target="_blank" rel="noopener noreferrer">PR #18584</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/logging#opentelemetry">OpenTelemetry</a></strong>
<ul>
<li>Use already configured opentelemetry providers - <a href="https://github.com/BerriAI/litellm/pull/18279" target="_blank" rel="noopener noreferrer">PR #18279</a></li>
<li>Prevent LiteLLM from closing external OTEL spans - <a href="https://github.com/BerriAI/litellm/pull/18553" target="_blank" rel="noopener noreferrer">PR #18553</a></li>
<li>Allow configuring arize project name for OpenTelemetry service name - <a href="https://github.com/BerriAI/litellm/pull/18738" target="_blank" rel="noopener noreferrer">PR #18738</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/logging#langsmith">LangSmith</a></strong>
<ul>
<li>Add support for LangSmith organization-scoped API keys with tenant ID - <a href="https://github.com/BerriAI/litellm/pull/18623" target="_blank" rel="noopener noreferrer">PR #18623</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/logging#generic-api-logger">Generic API Logger</a></strong>
<ul>
<li>Add log_format option to GenericAPILogger - <a href="https://github.com/BerriAI/litellm/pull/18587" target="_blank" rel="noopener noreferrer">PR #18587</a></li>
</ul>
</li>
</ul>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="guardrails">Guardrails<a href="https://docs.litellm.ai/release_notes/v1-80-15#guardrails" class="hash-link" aria-label="Direct link to Guardrails" title="Direct link to Guardrails">​</a></h3>
<ul>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails/litellm_content_filter">Content Filter</a></strong>
<ul>
<li>Add content filter logs page - <a href="https://github.com/BerriAI/litellm/pull/18335" target="_blank" rel="noopener noreferrer">PR #18335</a></li>
<li>Log actual event type for guardrails - <a href="https://github.com/BerriAI/litellm/pull/18489" target="_blank" rel="noopener noreferrer">PR #18489</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails/qualifire">Qualifire</a></strong>
<ul>
<li>Add Qualifire eval webhook - <a href="https://github.com/BerriAI/litellm/pull/18836" target="_blank" rel="noopener noreferrer">PR #18836</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails/lasso_security">Lasso Security</a></strong>
<ul>
<li>Add Lasso guardrail API docs - <a href="https://github.com/BerriAI/litellm/pull/18652" target="_blank" rel="noopener noreferrer">PR #18652</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails/noma_security">Noma Security</a></strong>
<ul>
<li>Add MCP guardrail support for Noma - <a href="https://github.com/BerriAI/litellm/pull/18668" target="_blank" rel="noopener noreferrer">PR #18668</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails/bedrock">Bedrock Guardrails</a></strong>
<ul>
<li>Remove redundant Bedrock guardrail block handling - <a href="https://github.com/BerriAI/litellm/pull/18634" target="_blank" rel="noopener noreferrer">PR #18634</a></li>
</ul>
</li>
<li><strong>General</strong>
<ul>
<li>Generic guardrail API update - <a href="https://github.com/BerriAI/litellm/pull/18647" target="_blank" rel="noopener noreferrer">PR #18647</a></li>
<li>Prevent proxy startup failures from case-sensitive tool permission guardrail validation - <a href="https://github.com/BerriAI/litellm/pull/18662" target="_blank" rel="noopener noreferrer">PR #18662</a></li>
<li>Extend case normalization to ALL guardrail types - <a href="https://github.com/BerriAI/litellm/pull/18664" target="_blank" rel="noopener noreferrer">PR #18664</a></li>
<li>Fix MCP handling in unified guardrail - <a href="https://github.com/BerriAI/litellm/pull/18630" target="_blank" rel="noopener noreferrer">PR #18630</a></li>
<li>Fix embeddings call type for guardrail pre-call hook - <a href="https://github.com/BerriAI/litellm/pull/18740" target="_blank" rel="noopener noreferrer">PR #18740</a></li>
</ul>
</li>
</ul>
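<p>Several of the fixes above concern how guardrails attach to individual requests. As a rough illustration of the request-side shape (a minimal sketch, not LiteLLM source; the guardrail name <code>content-filter-guard</code> is a placeholder for whatever is actually configured on your proxy):</p>

```python
# Illustrative sketch: building an OpenAI-style /chat/completions payload
# that asks the LiteLLM proxy to run named guardrails on this request.
# "content-filter-guard" is a placeholder name, not a real configured guardrail.
def build_chat_request(model: str, user_msg: str, guardrails: list) -> dict:
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
        # Proxy-level field; the proxy consumes it rather than forwarding
        # it to the upstream provider.
        "guardrails": guardrails,
    }

payload = build_chat_request("gpt-4o", "hello", ["content-filter-guard"])
```

<p>With an OpenAI SDK client pointed at the proxy, the same field is typically passed via <code>extra_body</code>.</p>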
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="spend-tracking-budgets-and-rate-limiting">Spend Tracking, Budgets and Rate Limiting<a href="https://docs.litellm.ai/release_notes/v1-80-15#spend-tracking-budgets-and-rate-limiting" class="hash-link" aria-label="Direct link to Spend Tracking, Budgets and Rate Limiting" title="Direct link to Spend Tracking, Budgets and Rate Limiting">​</a></h2>
<ul>
<li><strong>Platform Fee / Margins</strong> - Add support for Platform Fee / Margins - <a href="https://github.com/BerriAI/litellm/pull/18427" target="_blank" rel="noopener noreferrer">PR #18427</a></li>
<li><strong>Negative Budget Validation</strong> - Add validation for negative budget - <a href="https://github.com/BerriAI/litellm/pull/18583" target="_blank" rel="noopener noreferrer">PR #18583</a></li>
<li><strong>Cost Calculation Fixes</strong>
<ul>
<li>Correct cost calculation when reasoning_tokens are without text_tokens - <a href="https://github.com/BerriAI/litellm/pull/18607" target="_blank" rel="noopener noreferrer">PR #18607</a></li>
<li>Fix background cost tracking tests - <a href="https://github.com/BerriAI/litellm/pull/18588" target="_blank" rel="noopener noreferrer">PR #18588</a></li>
</ul>
</li>
<li><strong>Tag Routing</strong> - Support toggling tag matching between ANY and ALL - <a href="https://github.com/BerriAI/litellm/pull/18776" target="_blank" rel="noopener noreferrer">PR #18776</a></li>
</ul>
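<p>Conceptually, a platform fee/margin scales the computed provider cost, and the negative-budget validation above rejects nonsensical inputs up front. The sketch below is illustrative only, not LiteLLM's actual billing code:</p>

```python
def apply_platform_fee(base_cost_usd: float, fee_pct: float) -> float:
    """Add a percentage platform fee on top of a computed provider cost.

    Illustrative sketch only -- not LiteLLM's implementation.
    """
    if base_cost_usd < 0 or fee_pct < 0:
        # Mirrors the idea of rejecting negative budget/fee values early.
        raise ValueError("cost and fee must be non-negative")
    return base_cost_usd * (1 + fee_pct / 100)

# A 10% margin on a $0.002 request bills $0.0022.
billed = apply_platform_fee(0.002, 10)
```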
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="mcp-gateway">MCP Gateway<a href="https://docs.litellm.ai/release_notes/v1-80-15#mcp-gateway" class="hash-link" aria-label="Direct link to MCP Gateway" title="Direct link to MCP Gateway">​</a></h2>
<ul>
<li><strong>MCP Global Mode</strong> - Add MCP global mode - <a href="https://github.com/BerriAI/litellm/pull/18639" target="_blank" rel="noopener noreferrer">PR #18639</a></li>
<li><strong>MCP Server Visibility</strong> - Add configurable MCP server visibility - <a href="https://github.com/BerriAI/litellm/pull/18681" target="_blank" rel="noopener noreferrer">PR #18681</a></li>
<li><strong>MCP Registry</strong> - Add MCP registry - <a href="https://github.com/BerriAI/litellm/pull/18850" target="_blank" rel="noopener noreferrer">PR #18850</a></li>
<li><strong>MCP Stdio Header</strong> - Support MCP stdio header env overrides - <a href="https://github.com/BerriAI/litellm/pull/18324" target="_blank" rel="noopener noreferrer">PR #18324</a></li>
<li><strong>Parallel Tool Fetching</strong> - Parallelize tool fetching from multiple MCP servers - <a href="https://github.com/BerriAI/litellm/pull/18627" target="_blank" rel="noopener noreferrer">PR #18627</a></li>
<li><strong>Optimize MCP Server Listing</strong> - Separate health checks for optimized listing - <a href="https://github.com/BerriAI/litellm/pull/18530" target="_blank" rel="noopener noreferrer">PR #18530</a></li>
<li><strong>Auth Improvements</strong>
<ul>
<li>Require auth for MCP connection test endpoint - <a href="https://github.com/BerriAI/litellm/pull/18290" target="_blank" rel="noopener noreferrer">PR #18290</a></li>
<li>Fix MCP gateway OAuth2 auth issues and ClosedResourceError - <a href="https://github.com/BerriAI/litellm/pull/18281" target="_blank" rel="noopener noreferrer">PR #18281</a></li>
</ul>
</li>
<li><strong>Bug Fixes</strong>
<ul>
<li>Fix MCP server health status reporting - <a href="https://github.com/BerriAI/litellm/pull/18443" target="_blank" rel="noopener noreferrer">PR #18443</a></li>
<li>Fix OpenAPI to MCP tool conversion - <a href="https://github.com/BerriAI/litellm/pull/18597" target="_blank" rel="noopener noreferrer">PR #18597</a></li>
<li>Remove exec() usage and handle invalid OpenAPI parameter names for security - <a href="https://github.com/BerriAI/litellm/pull/18480" target="_blank" rel="noopener noreferrer">PR #18480</a></li>
<li>Fix MCP error when using multiple servers simultaneously - <a href="https://github.com/BerriAI/litellm/pull/18855" target="_blank" rel="noopener noreferrer">PR #18855</a></li>
</ul>
</li>
<li><strong>Migrate MCP Fetching Logic to React Query</strong> - <a href="https://github.com/BerriAI/litellm/pull/18352" target="_blank" rel="noopener noreferrer">PR #18352</a></li>
</ul>
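<p>The parallel tool fetching change above follows a standard pattern: query every MCP server concurrently instead of one after another, so total latency tracks the slowest server rather than the sum. A minimal sketch of that pattern (the server names and <code>fetch_tools</code> body are stand-ins, not real MCP calls):</p>

```python
import asyncio

# Stand-in for a real MCP list_tools call over the network.
async def fetch_tools(server: str) -> list:
    await asyncio.sleep(0)  # placeholder for network I/O
    return [f"{server}/search", f"{server}/read"]

async def fetch_all_tools(servers: list) -> list:
    # gather() runs all fetches concurrently and preserves input order.
    results = await asyncio.gather(*(fetch_tools(s) for s in servers))
    return [tool for server_tools in results for tool in server_tools]

tools = asyncio.run(fetch_all_tools(["github_mcp", "jira_mcp"]))
```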
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="performance--loadbalancing--reliability-improvements">Performance / Loadbalancing / Reliability improvements<a href="https://docs.litellm.ai/release_notes/v1-80-15#performance--loadbalancing--reliability-improvements" class="hash-link" aria-label="Direct link to Performance / Loadbalancing / Reliability improvements" title="Direct link to Performance / Loadbalancing / Reliability improvements">​</a></h2>
<ul>
<li><strong>92.7% Faster Provider Config Lookup</strong> - Provider config lookups are now 92.7% faster, letting LiteLLM drive roughly 2.5x more load against LLM providers - <a href="https://github.com/BerriAI/litellm/pull/18867" target="_blank" rel="noopener noreferrer">PR #18867</a></li>
<li><strong>Lazy Loading Improvements</strong>
<ul>
<li>Consolidate lazy import handlers with registry pattern - <a href="https://github.com/BerriAI/litellm/pull/18389" target="_blank" rel="noopener noreferrer">PR #18389</a></li>
<li>Complete lazy loading migration for all 180+ LLM config classes - <a href="https://github.com/BerriAI/litellm/pull/18392" target="_blank" rel="noopener noreferrer">PR #18392</a></li>
<li>Lazy load additional components (types, callbacks, utilities) - <a href="https://github.com/BerriAI/litellm/pull/18396" target="_blank" rel="noopener noreferrer">PR #18396</a></li>
<li>Add lazy loading for get_llm_provider - <a href="https://github.com/BerriAI/litellm/pull/18591" target="_blank" rel="noopener noreferrer">PR #18591</a></li>
<li>Lazy-load heavy audio library and loggers - <a href="https://github.com/BerriAI/litellm/pull/18592" target="_blank" rel="noopener noreferrer">PR #18592</a></li>
<li>Lazy load 9 heavy imports in litellm/utils.py - <a href="https://github.com/BerriAI/litellm/pull/18595" target="_blank" rel="noopener noreferrer">PR #18595</a></li>
<li>Lazy load heavy imports to improve import time and memory usage - <a href="https://github.com/BerriAI/litellm/pull/18610" target="_blank" rel="noopener noreferrer">PR #18610</a></li>
<li>Implement lazy loading for provider configs, model info classes, streaming handlers - <a href="https://github.com/BerriAI/litellm/pull/18611" target="_blank" rel="noopener noreferrer">PR #18611</a></li>
<li>Lazy load 15 additional imports - <a href="https://github.com/BerriAI/litellm/pull/18613" target="_blank" rel="noopener noreferrer">PR #18613</a></li>
<li>Lazy load 15+ unused imports - <a href="https://github.com/BerriAI/litellm/pull/18616" target="_blank" rel="noopener noreferrer">PR #18616</a></li>
<li>Lazy load DatadogLLMObsInitParams - <a href="https://github.com/BerriAI/litellm/pull/18658" target="_blank" rel="noopener noreferrer">PR #18658</a></li>
<li>Migrate utils.py lazy imports to registry pattern - <a href="https://github.com/BerriAI/litellm/pull/18657" target="_blank" rel="noopener noreferrer">PR #18657</a></li>
<li>Lazy load get_llm_provider and remove_index_from_tool_calls - <a href="https://github.com/BerriAI/litellm/pull/18608" target="_blank" rel="noopener noreferrer">PR #18608</a></li>
</ul>
</li>
<li><strong>Router Improvements</strong>
<ul>
<li>Validate routing_strategy at startup to fail fast with helpful error - <a href="https://github.com/BerriAI/litellm/pull/18624" target="_blank" rel="noopener noreferrer">PR #18624</a></li>
<li>Correct num_retries tracking in retry logic - <a href="https://github.com/BerriAI/litellm/pull/18712" target="_blank" rel="noopener noreferrer">PR #18712</a></li>
<li>Improve error messages and validation for wildcard routing with multiple credentials - <a href="https://github.com/BerriAI/litellm/pull/18629" target="_blank" rel="noopener noreferrer">PR #18629</a></li>
</ul>
</li>
<li><strong>Memory Improvements</strong>
<ul>
<li>Add memory pattern detection test and fix bad memory patterns - <a href="https://github.com/BerriAI/litellm/pull/18589" target="_blank" rel="noopener noreferrer">PR #18589</a></li>
<li>Add unbounded data structure detection to memory test - <a href="https://github.com/BerriAI/litellm/pull/18590" target="_blank" rel="noopener noreferrer">PR #18590</a></li>
<li>Add memory leak detection tests with CI integration - <a href="https://github.com/BerriAI/litellm/pull/18881" target="_blank" rel="noopener noreferrer">PR #18881</a></li>
</ul>
</li>
<li><strong>Database</strong>
<ul>
<li>Add idx on LOWER(user_email) for faster duplicate email checks - <a href="https://github.com/BerriAI/litellm/pull/18828" target="_blank" rel="noopener noreferrer">PR #18828</a></li>
<li>Proactive RDS IAM token refresh to prevent connection failures after the 15-minute token expiry - <a href="https://github.com/BerriAI/litellm/pull/18795" target="_blank" rel="noopener noreferrer">PR #18795</a></li>
<li>Clarify database_connection_pool_limit applies per worker - <a href="https://github.com/BerriAI/litellm/pull/18780" target="_blank" rel="noopener noreferrer">PR #18780</a></li>
<li>Align base_connection_pool_limit default values - <a href="https://github.com/BerriAI/litellm/pull/18721" target="_blank" rel="noopener noreferrer">PR #18721</a></li>
</ul>
</li>
<li><strong>Docker</strong>
<ul>
<li>Add libsndfile to database Docker image for audio processing - <a href="https://github.com/BerriAI/litellm/pull/18612" target="_blank" rel="noopener noreferrer">PR #18612</a></li>
<li>Add line_profiler support for performance analysis and fix Windows CRLF issues - <a href="https://github.com/BerriAI/litellm/pull/18773" target="_blank" rel="noopener noreferrer">PR #18773</a></li>
</ul>
</li>
<li><strong>Helm</strong>
<ul>
<li>Add lifecycle support to Helm charts - <a href="https://github.com/BerriAI/litellm/pull/18517" target="_blank" rel="noopener noreferrer">PR #18517</a></li>
</ul>
</li>
<li><strong>Authentication</strong>
<ul>
<li>Add Kubernetes ServiceAccount JWT authentication support - <a href="https://github.com/BerriAI/litellm/pull/18055" target="_blank" rel="noopener noreferrer">PR #18055</a></li>
<li>Use async anthropic client to prevent event loop blocking - <a href="https://github.com/BerriAI/litellm/pull/18435" target="_blank" rel="noopener noreferrer">PR #18435</a></li>
</ul>
</li>
<li><strong>Logging Worker</strong>
<ul>
<li>Handle event loop changes in multiprocessing - <a href="https://github.com/BerriAI/litellm/pull/18423" target="_blank" rel="noopener noreferrer">PR #18423</a></li>
</ul>
</li>
<li><strong>Security</strong>
<ul>
<li>Prevent expired key plaintext leak in error response - <a href="https://github.com/BerriAI/litellm/pull/18860" target="_blank" rel="noopener noreferrer">PR #18860</a></li>
<li>Mask extra header secrets in model info - <a href="https://github.com/BerriAI/litellm/pull/18822" target="_blank" rel="noopener noreferrer">PR #18822</a></li>
<li>Prevent duplicate User-Agent tags in request_tags - <a href="https://github.com/BerriAI/litellm/pull/18723" target="_blank" rel="noopener noreferrer">PR #18723</a></li>
<li>Use LiteLLM API keys properly - <a href="https://github.com/BerriAI/litellm/pull/18832" target="_blank" rel="noopener noreferrer">PR #18832</a></li>
</ul>
</li>
<li><strong>Misc</strong>
<ul>
<li>Remove double imports in main.py - <a href="https://github.com/BerriAI/litellm/pull/18406" target="_blank" rel="noopener noreferrer">PR #18406</a></li>
<li>Add LITELLM_DISABLE_LAZY_LOADING env var to fix VCR cassette creation issue - <a href="https://github.com/BerriAI/litellm/pull/18725" target="_blank" rel="noopener noreferrer">PR #18725</a></li>
<li>Add xiaomi_mimo to LlmProviders enum to fix router support - <a href="https://github.com/BerriAI/litellm/pull/18819" target="_blank" rel="noopener noreferrer">PR #18819</a></li>
<li>Allow installation with current grpcio on old Python - <a href="https://github.com/BerriAI/litellm/pull/18473" target="_blank" rel="noopener noreferrer">PR #18473</a></li>
<li>Add Custom CA certificates to boto3 clients - <a href="https://github.com/BerriAI/litellm/pull/18852" target="_blank" rel="noopener noreferrer">PR #18852</a></li>
<li>Fix bedrock_cache, metadata and max_model_budget - <a href="https://github.com/BerriAI/litellm/pull/18872" target="_blank" rel="noopener noreferrer">PR #18872</a></li>
<li>Fix LiteLLM SDK embedding headers missing field - <a href="https://github.com/BerriAI/litellm/pull/18844" target="_blank" rel="noopener noreferrer">PR #18844</a></li>
<li>Put automatic reasoning summary inclusion behind a feature flag - <a href="https://github.com/BerriAI/litellm/pull/18688" target="_blank" rel="noopener noreferrer">PR #18688</a></li>
<li>Fix turn_off_message_logging not redacting request messages in the proxy_server_request field - <a href="https://github.com/BerriAI/litellm/pull/18897" target="_blank" rel="noopener noreferrer">PR #18897</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="documentation-updates">Documentation Updates<a href="https://docs.litellm.ai/release_notes/v1-80-15#documentation-updates" class="hash-link" aria-label="Direct link to Documentation Updates" title="Direct link to Documentation Updates">​</a></h2>
<ul>
<li><strong>Provider Documentation</strong>
<ul>
<li>Update MiniMax docs to be in proper format - <a href="https://github.com/BerriAI/litellm/pull/18403" target="_blank" rel="noopener noreferrer">PR #18403</a></li>
<li>Add docs for 5 AI providers - <a href="https://github.com/BerriAI/litellm/pull/18388" target="_blank" rel="noopener noreferrer">PR #18388</a></li>
<li>Fix gpt-5-mini reasoning_effort supported values - <a href="https://github.com/BerriAI/litellm/pull/18346" target="_blank" rel="noopener noreferrer">PR #18346</a></li>
<li>Fix PDF documentation inconsistency in Anthropic page - <a href="https://github.com/BerriAI/litellm/pull/18816" target="_blank" rel="noopener noreferrer">PR #18816</a></li>
<li>Update OpenRouter docs to include embedding support - <a href="https://github.com/BerriAI/litellm/pull/18874" target="_blank" rel="noopener noreferrer">PR #18874</a></li>
<li>Add LITELLM_REASONING_AUTO_SUMMARY in doc - <a href="https://github.com/BerriAI/litellm/pull/18705" target="_blank" rel="noopener noreferrer">PR #18705</a></li>
</ul>
</li>
<li><strong>MCP Documentation</strong>
<ul>
<li>Agentcore MCP server docs - <a href="https://github.com/BerriAI/litellm/pull/18603" target="_blank" rel="noopener noreferrer">PR #18603</a></li>
<li>Mention MCP prompt/resources types in overview - <a href="https://github.com/BerriAI/litellm/pull/18669" target="_blank" rel="noopener noreferrer">PR #18669</a></li>
<li>Add Focus docs - <a href="https://github.com/BerriAI/litellm/pull/18837" target="_blank" rel="noopener noreferrer">PR #18837</a></li>
</ul>
</li>
<li><strong>Guardrails Documentation</strong>
<ul>
<li>Qualifire docs hotfix - <a href="https://github.com/BerriAI/litellm/pull/18724" target="_blank" rel="noopener noreferrer">PR #18724</a></li>
</ul>
</li>
<li><strong>Infrastructure Documentation</strong>
<ul>
<li>IAM Roles Anywhere docs - <a href="https://github.com/BerriAI/litellm/pull/18559" target="_blank" rel="noopener noreferrer">PR #18559</a></li>
<li>Fix formatting in proxy configs documentation - <a href="https://github.com/BerriAI/litellm/pull/18498" target="_blank" rel="noopener noreferrer">PR #18498</a></li>
<li>Fix GCS cache docs missing for proxy mode - <a href="https://github.com/BerriAI/litellm/pull/13328" target="_blank" rel="noopener noreferrer">PR #13328</a></li>
<li>Fix instructions for executing CloudZero SQL - <a href="https://github.com/BerriAI/litellm/pull/18841" target="_blank" rel="noopener noreferrer">PR #18841</a></li>
</ul>
</li>
<li><strong>General</strong>
<ul>
<li>LiteLLM adopters section - <a href="https://github.com/BerriAI/litellm/pull/18605" target="_blank" rel="noopener noreferrer">PR #18605</a></li>
<li>Remove redundant comments about setting litellm.callbacks - <a href="https://github.com/BerriAI/litellm/pull/18711" target="_blank" rel="noopener noreferrer">PR #18711</a></li>
<li>Update header to be markdown bold by removing space - <a href="https://github.com/BerriAI/litellm/pull/18846" target="_blank" rel="noopener noreferrer">PR #18846</a></li>
<li>Manus docs - new provider - <a href="https://github.com/BerriAI/litellm/pull/18817" target="_blank" rel="noopener noreferrer">PR #18817</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-contributors">New Contributors<a href="https://docs.litellm.ai/release_notes/v1-80-15#new-contributors" class="hash-link" aria-label="Direct link to New Contributors" title="Direct link to New Contributors">​</a></h2>
<ul>
<li>@prasadkona made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18349" target="_blank" rel="noopener noreferrer">PR #18349</a></li>
<li>@lucasrothman made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18283" target="_blank" rel="noopener noreferrer">PR #18283</a></li>
<li>@aggeentik made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18317" target="_blank" rel="noopener noreferrer">PR #18317</a></li>
<li>@mihidumh made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18361" target="_blank" rel="noopener noreferrer">PR #18361</a></li>
<li>@Prazeina made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18498" target="_blank" rel="noopener noreferrer">PR #18498</a></li>
<li>@systec-dk made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18500" target="_blank" rel="noopener noreferrer">PR #18500</a></li>
<li>@xuan07t2 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18514" target="_blank" rel="noopener noreferrer">PR #18514</a></li>
<li>@RensDimmendaal made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18190" target="_blank" rel="noopener noreferrer">PR #18190</a></li>
<li>@yurekami made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18483" target="_blank" rel="noopener noreferrer">PR #18483</a></li>
<li>@agertz7 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18556" target="_blank" rel="noopener noreferrer">PR #18556</a></li>
<li>@yudelevi made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18550" target="_blank" rel="noopener noreferrer">PR #18550</a></li>
<li>@smallp made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18536" target="_blank" rel="noopener noreferrer">PR #18536</a></li>
<li>@kevinpauer made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18569" target="_blank" rel="noopener noreferrer">PR #18569</a></li>
<li>@cansakiroglu made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18517" target="_blank" rel="noopener noreferrer">PR #18517</a></li>
<li>@dee-walia20 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18432" target="_blank" rel="noopener noreferrer">PR #18432</a></li>
<li>@luxinfeng made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18477" target="_blank" rel="noopener noreferrer">PR #18477</a></li>
<li>@cantalupo555 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18476" target="_blank" rel="noopener noreferrer">PR #18476</a></li>
<li>@andersk made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18473" target="_blank" rel="noopener noreferrer">PR #18473</a></li>
<li>@majiayu000 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18467" target="_blank" rel="noopener noreferrer">PR #18467</a></li>
<li>@amangupta-20 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18529" target="_blank" rel="noopener noreferrer">PR #18529</a></li>
<li>@hamzaq453 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18480" target="_blank" rel="noopener noreferrer">PR #18480</a></li>
<li>@ktsaou made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18627" target="_blank" rel="noopener noreferrer">PR #18627</a></li>
<li>@FlibbertyGibbitz made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18624" target="_blank" rel="noopener noreferrer">PR #18624</a></li>
<li>@drorIvry made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18594" target="_blank" rel="noopener noreferrer">PR #18594</a></li>
<li>@urainshah made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18524" target="_blank" rel="noopener noreferrer">PR #18524</a></li>
<li>@mangabits made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18279" target="_blank" rel="noopener noreferrer">PR #18279</a></li>
<li>@0717376 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18564" target="_blank" rel="noopener noreferrer">PR #18564</a></li>
<li>@nmgarza5 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17330" target="_blank" rel="noopener noreferrer">PR #17330</a></li>
<li>@wileykestner made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18445" target="_blank" rel="noopener noreferrer">PR #18445</a></li>
<li>@minijeong-log made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/14440" target="_blank" rel="noopener noreferrer">PR #14440</a></li>
<li>@Isaac4real made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18710" target="_blank" rel="noopener noreferrer">PR #18710</a></li>
<li>@marukaz made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18711" target="_blank" rel="noopener noreferrer">PR #18711</a></li>
<li>@rohitravirane made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18712" target="_blank" rel="noopener noreferrer">PR #18712</a></li>
<li>@lizzzcai made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18714" target="_blank" rel="noopener noreferrer">PR #18714</a></li>
<li>@hkd987 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18673" target="_blank" rel="noopener noreferrer">PR #18673</a></li>
<li>@Mr-Pepe made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18674" target="_blank" rel="noopener noreferrer">PR #18674</a></li>
<li>@gkarthi-signoz made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18726" target="_blank" rel="noopener noreferrer">PR #18726</a></li>
<li>@Tianduo16 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18723" target="_blank" rel="noopener noreferrer">PR #18723</a></li>
<li>@wilsonjr made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18721" target="_blank" rel="noopener noreferrer">PR #18721</a></li>
<li>@abliteration-ai made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18678" target="_blank" rel="noopener noreferrer">PR #18678</a></li>
<li>@danialkhan02 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18770" target="_blank" rel="noopener noreferrer">PR #18770</a></li>
<li>@ihower made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18409" target="_blank" rel="noopener noreferrer">PR #18409</a></li>
<li>@elkkhan made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18391" target="_blank" rel="noopener noreferrer">PR #18391</a></li>
<li>@runixer made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18435" target="_blank" rel="noopener noreferrer">PR #18435</a></li>
<li>@choby-shun made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18776" target="_blank" rel="noopener noreferrer">PR #18776</a></li>
<li>@jutaz made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18853" target="_blank" rel="noopener noreferrer">PR #18853</a></li>
<li>@sjmatta made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18250" target="_blank" rel="noopener noreferrer">PR #18250</a></li>
<li>@andres-ortizl made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18856" target="_blank" rel="noopener noreferrer">PR #18856</a></li>
<li>@gauthiermartin made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18844" target="_blank" rel="noopener noreferrer">PR #18844</a></li>
<li>@mel2oo made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18845" target="_blank" rel="noopener noreferrer">PR #18845</a></li>
<li>@DominikHallab made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18846" target="_blank" rel="noopener noreferrer">PR #18846</a></li>
<li>@ji-chuan-che made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18540" target="_blank" rel="noopener noreferrer">PR #18540</a></li>
<li>@raghav-stripe made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18858" target="_blank" rel="noopener noreferrer">PR #18858</a></li>
<li>@akraines made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18629" target="_blank" rel="noopener noreferrer">PR #18629</a></li>
<li>@otaviofbrito made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18665" target="_blank" rel="noopener noreferrer">PR #18665</a></li>
<li>@chetanchoudhary-sumo made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18587" target="_blank" rel="noopener noreferrer">PR #18587</a></li>
<li>@pascalwhoop made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/13328" target="_blank" rel="noopener noreferrer">PR #13328</a></li>
<li>@orgersh92 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18652" target="_blank" rel="noopener noreferrer">PR #18652</a></li>
<li>@DevajMody made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18497" target="_blank" rel="noopener noreferrer">PR #18497</a></li>
<li>@matt-greathouse made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18247" target="_blank" rel="noopener noreferrer">PR #18247</a></li>
<li>@emerzon made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18290" target="_blank" rel="noopener noreferrer">PR #18290</a></li>
<li>@Eric84626 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18281" target="_blank" rel="noopener noreferrer">PR #18281</a></li>
<li>@LukasdeBoer made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18055" target="_blank" rel="noopener noreferrer">PR #18055</a></li>
<li>@LingXuanYin made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18513" target="_blank" rel="noopener noreferrer">PR #18513</a></li>
<li>@krisxia0506 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18698" target="_blank" rel="noopener noreferrer">PR #18698</a></li>
<li>@LouisShark made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18414" target="_blank" rel="noopener noreferrer">PR #18414</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="full-changelog">Full Changelog<a href="https://docs.litellm.ai/release_notes/v1-80-15#full-changelog" class="hash-link" aria-label="Direct link to Full Changelog" title="Direct link to Full Changelog">​</a></h2>
<p><strong><a href="https://github.com/BerriAI/litellm/compare/v1.80.11.rc.1...v1.80.15-stable.1" target="_blank" rel="noopener noreferrer">View complete changelog on GitHub</a></strong></p>]]></content>
        <author>
            <name>Krrish Dholakia</name>
            <uri>https://www.linkedin.com/in/krish-d/</uri>
        </author>
        <author>
            <name>Ishaan Jaff</name>
            <uri>https://www.linkedin.com/in/reffajnaahsi/</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[v1.80.11-stable - Google Interactions API]]></title>
        <id>https://docs.litellm.ai/release_notes/v1-80-11</id>
        <link href="https://docs.litellm.ai/release_notes/v1-80-11"/>
        <updated>2025-12-20T10:00:00.000Z</updated>
        <summary type="html"><![CDATA[Deploy this version]]></summary>
        <content type="html"><![CDATA[<h2 class="anchor anchorWithStickyNavbar_LWe7" id="deploy-this-version">Deploy this version<a href="https://docs.litellm.ai/release_notes/v1-80-11#deploy-this-version" class="hash-link" aria-label="Direct link to Deploy this version" title="Direct link to Deploy this version">​</a></h2>
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">Docker</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">Pip</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">docker run litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker run \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-e STORE_MODEL_IN_DB=True \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-p 4000:4000 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">docker.litellm.ai/berriai/litellm:v1.80.11-stable</span><br></span></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">pip install litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">pip install litellm==1.80.11</span><br></span></code></pre></div></div></div></div></div>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="key-highlights">Key Highlights<a href="https://docs.litellm.ai/release_notes/v1-80-11#key-highlights" class="hash-link" aria-label="Direct link to Key Highlights" title="Direct link to Key Highlights">​</a></h2>
<ul>
<li><strong>Gemini 3 Flash Preview</strong> - <a href="https://docs.litellm.ai/docs/providers/gemini">Day 0 support for Google's Gemini 3 Flash Preview with reasoning capabilities</a></li>
<li><strong>Stability AI Image Generation</strong> - <a href="https://docs.litellm.ai/docs/providers/stability">New provider for Stability AI image generation and editing</a></li>
<li><strong>LiteLLM Content Filter</strong> - <a href="https://docs.litellm.ai/docs/proxy/guardrails/litellm_content_filter">Built-in guardrails for harmful content, bias, and PII detection with image support</a></li>
<li><strong>New Provider: Venice.ai</strong> - Support for Venice.ai API via providers.json</li>
<li><strong>Unified Skills API</strong> - <a href="https://docs.litellm.ai/docs/skills">Skills API works across Anthropic, Vertex, Azure, and Bedrock</a></li>
<li><strong>Azure Sentinel Logging</strong> - <a href="https://docs.litellm.ai/docs/observability/azure_sentinel">New logging integration for Azure Sentinel</a></li>
<li><strong>Guardrails Load Balancing</strong> - <a href="https://docs.litellm.ai/docs/proxy/guardrails">Load balance between multiple guardrail providers</a></li>
<li><strong>Email Budget Alerts</strong> - <a href="https://docs.litellm.ai/docs/proxy/email">Send email notifications when budgets are reached</a></li>
<li><strong>CloudZero Integration on UI</strong> - Set up your CloudZero integration directly on the UI</li>
</ul>
<hr>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="cloudzero-integration-on-ui">CloudZero Integration on UI<a href="https://docs.litellm.ai/release_notes/v1-80-11#cloudzero-integration-on-ui" class="hash-link" aria-label="Direct link to CloudZero Integration on UI" title="Direct link to CloudZero Integration on UI">&#8203;</a></h3>
<div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAFCAYAAAB8ZH1oAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAcElEQVR4nG3MOw7CQBRD0dn/5ujpkGhAgsAAk7yfLwoFIhJu7OLIzd0Zc/B8JuZGROARVAlJKBOVaJnJNF3p/cYcry9cOzKpZYFMmlSMMVifVUGGfXZJ/Kat94/eMXMOl+J034INdDcWc3ZH2J//Ot6hi8Rx8IBO1AAAAABJRU5ErkJggg==&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="640" height="309"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/assets/ideal-img/ui_cloudzero.8773b85.640.png" srcset="/assets/ideal-img/ui_cloudzero.8773b85.640.png 640w,/assets/ideal-img/ui_cloudzero.7c5cc2f.1005.png 1005w" width="640" height="309"></noscript></div>
<p>Users can now configure their CloudZero integration directly in the UI.</p>
<hr>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="performance-50-reduction-in-memory-usage-and-import-latency-for-the-litellm-sdk">Performance: 50% Reduction in Memory Usage and Import Latency for the LiteLLM SDK<a href="https://docs.litellm.ai/release_notes/v1-80-11#performance-50-reduction-in-memory-usage-and-import-latency-for-the-litellm-sdk" class="hash-link" aria-label="Direct link to Performance: 50% Reduction in Memory Usage and Import Latency for the LiteLLM SDK" title="Direct link to Performance: 50% Reduction in Memory Usage and Import Latency for the LiteLLM SDK">​</a></h3>
<p>We've completely restructured <code>litellm/__init__.py</code> to defer heavy imports until they're actually needed, implementing lazy loading for <strong>109 components</strong>.</p>
<p>This refactoring includes <strong>41 provider config classes</strong>, <strong>40 utility functions</strong>, cache implementations (Redis, DualCache, InMemoryCache), HTTP handlers, logging, types, and other heavy dependencies. Heavy libraries like tiktoken and boto3 are now loaded on-demand rather than eagerly at import time.</p>
<p>This makes LiteLLM especially beneficial for serverless functions, Lambda deployments, and containerized environments where cold start times and memory footprint matter.</p>
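<p>The mechanism behind this kind of deferral is Python's module-level <code>__getattr__</code> hook (PEP 562): an attribute is imported only on first access and then cached in the module's dict. A minimal sketch of the pattern — the names below are illustrative placeholders, not LiteLLM's actual lazy-loaded symbols:</p>

```python
# Sketch of lazy loading via a PEP 562 module-level __getattr__:
# attributes are imported on first access and cached in the module's
# dict so later lookups never hit the hook again.
import importlib
import types

# name exposed on the package -> (module that provides it, attribute name)
# (placeholder entries; a real package would map heavy deps here)
_LAZY_ATTRS = {
    "sha256": ("hashlib", "sha256"),
    "dumps": ("json", "dumps"),
}

def _lazy_getattr(name):
    """Invoked only when normal module attribute lookup fails."""
    try:
        module_name, attr = _LAZY_ATTRS[name]
    except KeyError:
        raise AttributeError(f"no attribute {name!r}") from None
    value = getattr(importlib.import_module(module_name), attr)
    demo.__dict__[name] = value  # cache: later lookups bypass the hook
    return value

# Build a demo module; in a real package this hook lives in __init__.py
# as a top-level `def __getattr__(name): ...`.
demo = types.ModuleType("lazy_demo")
demo.__getattr__ = _lazy_getattr

# The first access triggers the real import of hashlib; the second is a
# plain dict lookup.
digest = demo.sha256(b"hello").hexdigest()
```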
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-providers-and-endpoints">New Providers and Endpoints<a href="https://docs.litellm.ai/release_notes/v1-80-11#new-providers-and-endpoints" class="hash-link" aria-label="Direct link to New Providers and Endpoints" title="Direct link to New Providers and Endpoints">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="new-providers-5-new-providers">New Providers (5 new providers)<a href="https://docs.litellm.ai/release_notes/v1-80-11#new-providers-5-new-providers" class="hash-link" aria-label="Direct link to New Providers (5 new providers)" title="Direct link to New Providers (5 new providers)">​</a></h3>
<table><thead><tr><th>Provider</th><th>Supported LiteLLM Endpoints</th><th>Description</th></tr></thead><tbody><tr><td><a href="https://docs.litellm.ai/docs/providers/stability">Stability AI</a></td><td><code>/images/generations</code>, <code>/images/edits</code></td><td>Stable Diffusion 3, SD3.5, image editing and generation</td></tr><tr><td>Venice.ai</td><td><code>/chat/completions</code>, <code>/messages</code>, <code>/responses</code></td><td>Venice.ai API integration via providers.json</td></tr><tr><td><a href="https://docs.litellm.ai/docs/providers/pydantic_ai_agent">Pydantic AI Agents</a></td><td><code>/a2a</code></td><td>Pydantic AI agents for A2A protocol workflows</td></tr><tr><td><a href="https://docs.litellm.ai/docs/providers/vertex_ai_agent_engine">VertexAI Agent Engine</a></td><td><code>/a2a</code></td><td>Google Vertex AI Agent Engine for agentic workflows</td></tr><tr><td><a href="https://docs.litellm.ai/docs/search/linkup">LinkUp Search</a></td><td><code>/search</code></td><td>LinkUp web search API integration</td></tr></tbody></table>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="new-llm-api-endpoints-2-new-endpoints">New LLM API Endpoints (2 new endpoints)<a href="https://docs.litellm.ai/release_notes/v1-80-11#new-llm-api-endpoints-2-new-endpoints" class="hash-link" aria-label="Direct link to New LLM API Endpoints (2 new endpoints)" title="Direct link to New LLM API Endpoints (2 new endpoints)">​</a></h3>
<table><thead><tr><th>Endpoint</th><th>Method</th><th>Description</th><th>Documentation</th></tr></thead><tbody><tr><td><code>/interactions</code></td><td>POST</td><td>Google Interactions API for conversational AI</td><td><a href="https://docs.litellm.ai/docs/interactions">Docs</a></td></tr><tr><td><code>/search</code></td><td>POST</td><td>RAG Search API with rerankers</td><td><a href="https://docs.litellm.ai/docs/search/index">Docs</a></td></tr></tbody></table>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-models--updated-models">New Models / Updated Models<a href="https://docs.litellm.ai/release_notes/v1-80-11#new-models--updated-models" class="hash-link" aria-label="Direct link to New Models / Updated Models" title="Direct link to New Models / Updated Models">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="new-model-support-55-new-models">New Model Support (55+ new models)<a href="https://docs.litellm.ai/release_notes/v1-80-11#new-model-support-55-new-models" class="hash-link" aria-label="Direct link to New Model Support (55+ new models)" title="Direct link to New Model Support (55+ new models)">​</a></h4>
<table><thead><tr><th>Provider</th><th>Model</th><th>Context Window</th><th>Input ($/1M tokens)</th><th>Output ($/1M tokens)</th><th>Features</th></tr></thead><tbody><tr><td>Gemini</td><td><code>gemini/gemini-3-flash-preview</code></td><td>1M</td><td>$0.50</td><td>$3.00</td><td>Reasoning, vision, audio, video, PDF</td></tr><tr><td>Vertex AI</td><td><code>vertex_ai/gemini-3-flash-preview</code></td><td>1M</td><td>$0.50</td><td>$3.00</td><td>Reasoning, vision, audio, video, PDF</td></tr><tr><td>Azure AI</td><td><code>azure_ai/deepseek-v3.2</code></td><td>164K</td><td>$0.58</td><td>$1.68</td><td>Reasoning, function calling, caching</td></tr><tr><td>Azure AI</td><td><code>azure_ai/cohere-rerank-v4.0-pro</code></td><td>32K</td><td>$0.0025/query</td><td>-</td><td>Rerank</td></tr><tr><td>Azure AI</td><td><code>azure_ai/cohere-rerank-v4.0-fast</code></td><td>32K</td><td>$0.002/query</td><td>-</td><td>Rerank</td></tr><tr><td>OpenRouter</td><td><code>openrouter/openai/gpt-5.2</code></td><td>400K</td><td>$1.75</td><td>$14.00</td><td>Reasoning, vision, caching</td></tr><tr><td>OpenRouter</td><td><code>openrouter/openai/gpt-5.2-pro</code></td><td>400K</td><td>$21.00</td><td>$168.00</td><td>Reasoning, vision</td></tr><tr><td>OpenRouter</td><td><code>openrouter/mistralai/devstral-2512</code></td><td>262K</td><td>$0.15</td><td>$0.60</td><td>Function calling</td></tr><tr><td>OpenRouter</td><td><code>openrouter/mistralai/ministral-3b-2512</code></td><td>131K</td><td>$0.10</td><td>$0.10</td><td>Function calling, vision</td></tr><tr><td>OpenRouter</td><td><code>openrouter/mistralai/ministral-8b-2512</code></td><td>262K</td><td>$0.15</td><td>$0.15</td><td>Function calling, vision</td></tr><tr><td>OpenRouter</td><td><code>openrouter/mistralai/ministral-14b-2512</code></td><td>262K</td><td>$0.20</td><td>$0.20</td><td>Function calling, vision</td></tr><tr><td>OpenRouter</td><td><code>openrouter/mistralai/mistral-large-2512</code></td><td>262K</td><td>$0.50</td><td>$1.50</td><td>Function 
calling, vision</td></tr><tr><td>OpenAI</td><td><code>gpt-4o-transcribe-diarize</code></td><td>16K</td><td>$6.00/audio</td><td>-</td><td>Audio transcription with diarization</td></tr><tr><td>OpenAI</td><td><code>gpt-image-1.5-2025-12-16</code></td><td>-</td><td>Various</td><td>Various</td><td>Image generation</td></tr><tr><td>Stability</td><td><code>stability/sd3-large</code></td><td>-</td><td>-</td><td>$0.065/image</td><td>Image generation</td></tr><tr><td>Stability</td><td><code>stability/sd3.5-large</code></td><td>-</td><td>-</td><td>$0.065/image</td><td>Image generation</td></tr><tr><td>Stability</td><td><code>stability/stable-image-ultra</code></td><td>-</td><td>-</td><td>$0.08/image</td><td>Image generation</td></tr><tr><td>Stability</td><td><code>stability/inpaint</code></td><td>-</td><td>-</td><td>$0.005/image</td><td>Image editing</td></tr><tr><td>Stability</td><td><code>stability/outpaint</code></td><td>-</td><td>-</td><td>$0.004/image</td><td>Image editing</td></tr><tr><td>Bedrock</td><td><code>stability.stable-conservative-upscale-v1:0</code></td><td>-</td><td>-</td><td>$0.40/image</td><td>Image upscaling</td></tr><tr><td>Bedrock</td><td><code>stability.stable-creative-upscale-v1:0</code></td><td>-</td><td>-</td><td>$0.60/image</td><td>Image upscaling</td></tr><tr><td>Vertex AI</td><td><code>vertex_ai/deepseek-ai/deepseek-ocr-maas</code></td><td>-</td><td>$0.30</td><td>$1.20</td><td>OCR</td></tr><tr><td>LinkUp</td><td><code>linkup/search</code></td><td>-</td><td>$5.87/1K queries</td><td>-</td><td>Web search</td></tr><tr><td>LinkUp</td><td><code>linkup/search-deep</code></td><td>-</td><td>$58.67/1K queries</td><td>-</td><td>Deep web search</td></tr><tr><td>GitHub Copilot</td><td>20+ models</td><td>Various</td><td>-</td><td>-</td><td>Chat completions</td></tr></tbody></table>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features">Features<a href="https://docs.litellm.ai/release_notes/v1-80-11#features" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li><strong><a href="https://docs.litellm.ai/docs/providers/gemini">Gemini</a></strong>
<ul>
<li>Add Gemini 3 Flash Preview day 0 support with reasoning - <a href="https://github.com/BerriAI/litellm/pull/18135" target="_blank" rel="noopener noreferrer">PR #18135</a></li>
<li>Support extra_headers in batch embeddings - <a href="https://github.com/BerriAI/litellm/pull/18004" target="_blank" rel="noopener noreferrer">PR #18004</a></li>
<li>Propagate token usage when generating images - <a href="https://github.com/BerriAI/litellm/pull/17987" target="_blank" rel="noopener noreferrer">PR #17987</a></li>
<li>Use JSON instead of form-data for image edit requests - <a href="https://github.com/BerriAI/litellm/pull/18012" target="_blank" rel="noopener noreferrer">PR #18012</a></li>
<li>Fix web search requests count - <a href="https://github.com/BerriAI/litellm/pull/17921" target="_blank" rel="noopener noreferrer">PR #17921</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/anthropic">Anthropic</a></strong>
<ul>
<li>Use dynamic max_tokens based on model - <a href="https://github.com/BerriAI/litellm/pull/17900" target="_blank" rel="noopener noreferrer">PR #17900</a></li>
<li>Fix claude-3-7-sonnet max_tokens to 64K default - <a href="https://github.com/BerriAI/litellm/pull/17979" target="_blank" rel="noopener noreferrer">PR #17979</a></li>
<li>Add OpenAI-compatible API with modify_params=True - <a href="https://github.com/BerriAI/litellm/pull/17106" target="_blank" rel="noopener noreferrer">PR #17106</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/vertex">Vertex AI</a></strong>
<ul>
<li>Add Gemini 3 Flash Preview support - <a href="https://github.com/BerriAI/litellm/pull/18164" target="_blank" rel="noopener noreferrer">PR #18164</a></li>
<li>Add reasoning support for gemini-3-flash-preview - <a href="https://github.com/BerriAI/litellm/pull/18175" target="_blank" rel="noopener noreferrer">PR #18175</a></li>
<li>Fix image edit credential source - <a href="https://github.com/BerriAI/litellm/pull/18121" target="_blank" rel="noopener noreferrer">PR #18121</a></li>
<li>Pass credentials to PredictionServiceClient for custom endpoints - <a href="https://github.com/BerriAI/litellm/pull/17757" target="_blank" rel="noopener noreferrer">PR #17757</a></li>
<li>Fix multimodal embeddings for text + base64 image combinations - <a href="https://github.com/BerriAI/litellm/pull/18172" target="_blank" rel="noopener noreferrer">PR #18172</a></li>
<li>Add OCR support for DeepSeek model - <a href="https://github.com/BerriAI/litellm/pull/17971" target="_blank" rel="noopener noreferrer">PR #17971</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/azure_ai">Azure AI</a></strong>
<ul>
<li>Add Azure Cohere 4 reranking models - <a href="https://github.com/BerriAI/litellm/pull/17961" target="_blank" rel="noopener noreferrer">PR #17961</a></li>
<li>Add Azure DeepSeek V3.2 versions - <a href="https://github.com/BerriAI/litellm/pull/18019" target="_blank" rel="noopener noreferrer">PR #18019</a></li>
<li>Return AzureAnthropicConfig for Claude models in get_provider_chat_config - <a href="https://github.com/BerriAI/litellm/pull/18086" target="_blank" rel="noopener noreferrer">PR #18086</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/fireworks_ai">Fireworks AI</a></strong>
<ul>
<li>Add reasoning param support for Fireworks AI models - <a href="https://github.com/BerriAI/litellm/pull/17967" target="_blank" rel="noopener noreferrer">PR #17967</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/bedrock">Bedrock</a></strong>
<ul>
<li>Add Qwen 2 and Qwen 3 to get_bedrock_model_id - <a href="https://github.com/BerriAI/litellm/pull/18100" target="_blank" rel="noopener noreferrer">PR #18100</a></li>
<li>Remove ttl field when routing to bedrock - <a href="https://github.com/BerriAI/litellm/pull/18049" target="_blank" rel="noopener noreferrer">PR #18049</a></li>
<li>Add Bedrock Stability image edit models - <a href="https://github.com/BerriAI/litellm/pull/18254" target="_blank" rel="noopener noreferrer">PR #18254</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/perplexity">Perplexity</a></strong>
<ul>
<li>Use API-provided cost instead of manual calculation - <a href="https://github.com/BerriAI/litellm/pull/17887" target="_blank" rel="noopener noreferrer">PR #17887</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/openai">OpenAI</a></strong>
<ul>
<li>Add diarize model for audio transcription - <a href="https://github.com/BerriAI/litellm/pull/18117" target="_blank" rel="noopener noreferrer">PR #18117</a></li>
<li>Add gpt-image-1.5-2025-12-16 in model cost map - <a href="https://github.com/BerriAI/litellm/pull/18107" target="_blank" rel="noopener noreferrer">PR #18107</a></li>
<li>Fix cost calculation of gpt-image-1 model - <a href="https://github.com/BerriAI/litellm/pull/17966" target="_blank" rel="noopener noreferrer">PR #17966</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/github_copilot">GitHub Copilot</a></strong>
<ul>
<li>Add github_copilot model info - <a href="https://github.com/BerriAI/litellm/pull/17858" target="_blank" rel="noopener noreferrer">PR #17858</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/custom_llm_server">Custom LLM</a></strong>
<ul>
<li>Add image_edit and aimage_edit support - <a href="https://github.com/BerriAI/litellm/pull/17999" target="_blank" rel="noopener noreferrer">PR #17999</a></li>
</ul>
</li>
</ul>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="bug-fixes">Bug Fixes<a href="https://docs.litellm.ai/release_notes/v1-80-11#bug-fixes" class="hash-link" aria-label="Direct link to Bug Fixes" title="Direct link to Bug Fixes">​</a></h3>
<ul>
<li><strong><a href="https://docs.litellm.ai/docs/providers/gemini">Gemini</a></strong>
<ul>
<li>Fix pricing for Gemini 3 Flash on Vertex AI - <a href="https://github.com/BerriAI/litellm/pull/18202" target="_blank" rel="noopener noreferrer">PR #18202</a></li>
<li>Add output_cost_per_image_token for gemini-2.5-flash-image models - <a href="https://github.com/BerriAI/litellm/pull/18156" target="_blank" rel="noopener noreferrer">PR #18156</a></li>
<li>Fix "properties should be non-empty for OBJECT type" error - <a href="https://github.com/BerriAI/litellm/pull/18237" target="_blank" rel="noopener noreferrer">PR #18237</a></li>
</ul>
</li>
<li><strong>Qwen</strong>
<ul>
<li>Add qwen3-embedding-8b input per token price - <a href="https://github.com/BerriAI/litellm/pull/18018" target="_blank" rel="noopener noreferrer">PR #18018</a></li>
</ul>
</li>
<li><strong>General</strong>
<ul>
<li>Fix image URL handling - <a href="https://github.com/BerriAI/litellm/pull/18139" target="_blank" rel="noopener noreferrer">PR #18139</a></li>
<li>Support Signed URLs with Query Parameters in Image Processing - <a href="https://github.com/BerriAI/litellm/pull/17976" target="_blank" rel="noopener noreferrer">PR #17976</a></li>
<li>Add none to encoding_format instead of omitting it - <a href="https://github.com/BerriAI/litellm/pull/18042" target="_blank" rel="noopener noreferrer">PR #18042</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="llm-api-endpoints">LLM API Endpoints<a href="https://docs.litellm.ai/release_notes/v1-80-11#llm-api-endpoints" class="hash-link" aria-label="Direct link to LLM API Endpoints" title="Direct link to LLM API Endpoints">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-1">Features<a href="https://docs.litellm.ai/release_notes/v1-80-11#features-1" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li><strong><a href="https://docs.litellm.ai/docs/response_api">Responses API</a></strong>
<ul>
<li>Add provider specific tools support - <a href="https://github.com/BerriAI/litellm/pull/17980" target="_blank" rel="noopener noreferrer">PR #17980</a></li>
<li>Add custom headers support - <a href="https://github.com/BerriAI/litellm/pull/18036" target="_blank" rel="noopener noreferrer">PR #18036</a></li>
<li>Fix tool calls transformation in completion bridge - <a href="https://github.com/BerriAI/litellm/pull/18226" target="_blank" rel="noopener noreferrer">PR #18226</a></li>
<li>Use list format with input_text for tool results - <a href="https://github.com/BerriAI/litellm/pull/18257" target="_blank" rel="noopener noreferrer">PR #18257</a></li>
<li>Add cost tracking in background mode - <a href="https://github.com/BerriAI/litellm/pull/18236" target="_blank" rel="noopener noreferrer">PR #18236</a></li>
<li>Fix Claude code responses API bridge errors - <a href="https://github.com/BerriAI/litellm/pull/18194" target="_blank" rel="noopener noreferrer">PR #18194</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/completion/input">Chat Completions API</a></strong>
<ul>
<li>Add support for agent skills - <a href="https://github.com/BerriAI/litellm/pull/18031" target="_blank" rel="noopener noreferrer">PR #18031</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/skills">Skills API</a></strong>
<ul>
<li>Unified Skills API works across Anthropic, Vertex, Azure, Bedrock - <a href="https://github.com/BerriAI/litellm/pull/18232" target="_blank" rel="noopener noreferrer">PR #18232</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/search/index">Search API</a></strong>
<ul>
<li>Add new RAG Search API with rerankers - <a href="https://github.com/BerriAI/litellm/pull/18217" target="_blank" rel="noopener noreferrer">PR #18217</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/interactions">Interactions API</a></strong>
<ul>
<li>Add Google Interactions API on SDK and AI Gateway - <a href="https://github.com/BerriAI/litellm/pull/18079" target="_blank" rel="noopener noreferrer">PR #18079</a>, <a href="https://github.com/BerriAI/litellm/pull/18081" target="_blank" rel="noopener noreferrer">PR #18081</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/image_edits">Image Edit API</a></strong>
<ul>
<li>Add drop_params support and fix Vertex AI config - <a href="https://github.com/BerriAI/litellm/pull/18077" target="_blank" rel="noopener noreferrer">PR #18077</a></li>
</ul>
</li>
<li><strong>General</strong>
<ul>
<li>Skip adding beta headers for Vertex AI, as they are not supported - <a href="https://github.com/BerriAI/litellm/pull/18037" target="_blank" rel="noopener noreferrer">PR #18037</a></li>
<li>Fix managed files endpoint - <a href="https://github.com/BerriAI/litellm/pull/18046" target="_blank" rel="noopener noreferrer">PR #18046</a></li>
<li>Allow base_model for non-Azure providers in proxy - <a href="https://github.com/BerriAI/litellm/pull/18038" target="_blank" rel="noopener noreferrer">PR #18038</a></li>
</ul>
</li>
</ul>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="bugs">Bugs<a href="https://docs.litellm.ai/release_notes/v1-80-11#bugs" class="hash-link" aria-label="Direct link to Bugs" title="Direct link to Bugs">​</a></h4>
<ul>
<li><strong>General</strong>
<ul>
<li>Fix basemodel import in guardrail translation - <a href="https://github.com/BerriAI/litellm/pull/17977" target="_blank" rel="noopener noreferrer">PR #17977</a></li>
<li>Fix <code>No module named 'fastapi'</code> error - <a href="https://github.com/BerriAI/litellm/pull/18239" target="_blank" rel="noopener noreferrer">PR #18239</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="management-endpoints--ui">Management Endpoints / UI<a href="https://docs.litellm.ai/release_notes/v1-80-11#management-endpoints--ui" class="hash-link" aria-label="Direct link to Management Endpoints / UI" title="Direct link to Management Endpoints / UI">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-2">Features<a href="https://docs.litellm.ai/release_notes/v1-80-11#features-2" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li><strong>Virtual Keys</strong>
<ul>
<li>Add master key rotation for credentials table - <a href="https://github.com/BerriAI/litellm/pull/17952" target="_blank" rel="noopener noreferrer">PR #17952</a></li>
<li>Fix tag management to preserve encrypted fields in litellm_params - <a href="https://github.com/BerriAI/litellm/pull/17484" target="_blank" rel="noopener noreferrer">PR #17484</a></li>
<li>Fix key delete and regenerate permissions - <a href="https://github.com/BerriAI/litellm/pull/18214" target="_blank" rel="noopener noreferrer">PR #18214</a></li>
</ul>
</li>
<li><strong>Models + Endpoints</strong>
<ul>
<li>Add Models Conditional Rendering in UI - <a href="https://github.com/BerriAI/litellm/pull/18071" target="_blank" rel="noopener noreferrer">PR #18071</a></li>
<li>Add Health Check Model for Wildcard Model in UI - <a href="https://github.com/BerriAI/litellm/pull/18269" target="_blank" rel="noopener noreferrer">PR #18269</a></li>
<li>Auto Resolve Vector Store Embedding Model Config - <a href="https://github.com/BerriAI/litellm/pull/18167" target="_blank" rel="noopener noreferrer">PR #18167</a></li>
</ul>
</li>
<li><strong>Vector Stores</strong>
<ul>
<li>Add Milvus Vector Store UI support - <a href="https://github.com/BerriAI/litellm/pull/18030" target="_blank" rel="noopener noreferrer">PR #18030</a></li>
<li>Persist Vector Store Settings in Team Update - <a href="https://github.com/BerriAI/litellm/pull/18274" target="_blank" rel="noopener noreferrer">PR #18274</a></li>
</ul>
</li>
<li><strong>Logs &amp; Spend</strong>
<ul>
<li>Add LiteLLM Overhead to Logs - <a href="https://github.com/BerriAI/litellm/pull/18033" target="_blank" rel="noopener noreferrer">PR #18033</a></li>
<li>Show LiteLLM Overhead in Logs UI - <a href="https://github.com/BerriAI/litellm/pull/18034" target="_blank" rel="noopener noreferrer">PR #18034</a></li>
<li>Resolve Team ID to Team Alias in Usage Page - <a href="https://github.com/BerriAI/litellm/pull/18275" target="_blank" rel="noopener noreferrer">PR #18275</a></li>
<li>Fix Usage Page Top Key View Button Visibility - <a href="https://github.com/BerriAI/litellm/pull/18203" target="_blank" rel="noopener noreferrer">PR #18203</a></li>
</ul>
</li>
<li><strong>SSO &amp; Health</strong>
<ul>
<li>Add SSO Readiness Health Check - <a href="https://github.com/BerriAI/litellm/pull/18078" target="_blank" rel="noopener noreferrer">PR #18078</a></li>
<li>Fix <code>/health/test_connection</code> to resolve env variables the same way <code>/chat/completions</code> does - <a href="https://github.com/BerriAI/litellm/pull/17752" target="_blank" rel="noopener noreferrer">PR #17752</a></li>
</ul>
</li>
<li><strong>CloudZero</strong>
<ul>
<li>Add CloudZero Cost Tracking UI - <a href="https://github.com/BerriAI/litellm/pull/18163" target="_blank" rel="noopener noreferrer">PR #18163</a></li>
<li>Add Delete CloudZero Settings Route and UI - <a href="https://github.com/BerriAI/litellm/pull/18168" target="_blank" rel="noopener noreferrer">PR #18168</a>, <a href="https://github.com/BerriAI/litellm/pull/18170" target="_blank" rel="noopener noreferrer">PR #18170</a></li>
</ul>
</li>
<li><strong>General</strong>
<ul>
<li>Update UI path handling for non-root Docker - <a href="https://github.com/BerriAI/litellm/pull/17989" target="_blank" rel="noopener noreferrer">PR #17989</a></li>
</ul>
</li>
</ul>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="bugs-1">Bugs<a href="https://docs.litellm.ai/release_notes/v1-80-11#bugs-1" class="hash-link" aria-label="Direct link to Bugs" title="Direct link to Bugs">​</a></h4>
<ul>
<li><strong>UI Fixes</strong>
<ul>
<li>Fix Login Page Failed To Parse JSON Error - <a href="https://github.com/BerriAI/litellm/pull/18159" target="_blank" rel="noopener noreferrer">PR #18159</a></li>
<li>Fix new user route user_id collision handling - <a href="https://github.com/BerriAI/litellm/pull/17559" target="_blank" rel="noopener noreferrer">PR #17559</a></li>
<li>Fix Callback Environment Variables Casing - <a href="https://github.com/BerriAI/litellm/pull/17912" target="_blank" rel="noopener noreferrer">PR #17912</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="ai-integrations">AI Integrations<a href="https://docs.litellm.ai/release_notes/v1-80-11#ai-integrations" class="hash-link" aria-label="Direct link to AI Integrations" title="Direct link to AI Integrations">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="logging">Logging<a href="https://docs.litellm.ai/release_notes/v1-80-11#logging" class="hash-link" aria-label="Direct link to Logging" title="Direct link to Logging">​</a></h3>
<ul>
<li><strong><a href="https://docs.litellm.ai/docs/observability/azure_sentinel">Azure Sentinel</a></strong>
<ul>
<li>Add new Azure Sentinel Logger integration - <a href="https://github.com/BerriAI/litellm/pull/18146" target="_blank" rel="noopener noreferrer">PR #18146</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/logging#prometheus">Prometheus</a></strong>
<ul>
<li>Add extraction of top level metadata for custom labels - <a href="https://github.com/BerriAI/litellm/pull/18087" target="_blank" rel="noopener noreferrer">PR #18087</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/logging#langfuse">Langfuse</a></strong>
<ul>
<li>Fix non-functional <code>log_failure_event</code> - <a href="https://github.com/BerriAI/litellm/pull/18234" target="_blank" rel="noopener noreferrer">PR #18234</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/observability/phoenix_integration">Arize Phoenix</a></strong>
<ul>
<li>Fix nested spans - <a href="https://github.com/BerriAI/litellm/pull/18102" target="_blank" rel="noopener noreferrer">PR #18102</a></li>
</ul>
</li>
<li><strong>General</strong>
<ul>
<li>Change extra_headers to additional_headers - <a href="https://github.com/BerriAI/litellm/pull/17950" target="_blank" rel="noopener noreferrer">PR #17950</a></li>
</ul>
</li>
</ul>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="guardrails">Guardrails<a href="https://docs.litellm.ai/release_notes/v1-80-11#guardrails" class="hash-link" aria-label="Direct link to Guardrails" title="Direct link to Guardrails">​</a></h3>
<ul>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails/litellm_content_filter">LiteLLM Content Filter</a></strong>
<ul>
<li>Add built-in guardrails for harmful content, bias, etc. - <a href="https://github.com/BerriAI/litellm/pull/18029" target="_blank" rel="noopener noreferrer">PR #18029</a></li>
<li>Add support for running content filters on images - <a href="https://github.com/BerriAI/litellm/pull/18044" target="_blank" rel="noopener noreferrer">PR #18044</a></li>
<li>Add support for Brazil PII field - <a href="https://github.com/BerriAI/litellm/pull/18076" target="_blank" rel="noopener noreferrer">PR #18076</a></li>
<li>Add configurable guardrail options for content filtering - <a href="https://github.com/BerriAI/litellm/pull/18007" target="_blank" rel="noopener noreferrer">PR #18007</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/adding_provider/generic_guardrail_api">Guardrails API</a></strong>
<ul>
<li>Support LLM tool call response checks on <code>/chat/completions</code>, <code>/v1/responses</code>, <code>/v1/messages</code> - <a href="https://github.com/BerriAI/litellm/pull/17619" target="_blank" rel="noopener noreferrer">PR #17619</a></li>
<li>Add guardrails load balancing - <a href="https://github.com/BerriAI/litellm/pull/18181" target="_blank" rel="noopener noreferrer">PR #18181</a></li>
<li>Fix guardrails for passthrough endpoint - <a href="https://github.com/BerriAI/litellm/pull/18109" target="_blank" rel="noopener noreferrer">PR #18109</a></li>
<li>Add headers to metadata for guardrails on pass-through endpoints - <a href="https://github.com/BerriAI/litellm/pull/17992" target="_blank" rel="noopener noreferrer">PR #17992</a></li>
<li>Various fixes for guardrail on OpenRouter models - <a href="https://github.com/BerriAI/litellm/pull/18085" target="_blank" rel="noopener noreferrer">PR #18085</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails/lakera_ai">Lakera</a></strong>
<ul>
<li>Add monitor mode for Lakera - <a href="https://github.com/BerriAI/litellm/pull/18084" target="_blank" rel="noopener noreferrer">PR #18084</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails/pillar_security">Pillar Security</a></strong>
<ul>
<li>Add masking support and MCP call support - <a href="https://github.com/BerriAI/litellm/pull/17959" target="_blank" rel="noopener noreferrer">PR #17959</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails/bedrock">Bedrock Guardrails</a></strong>
<ul>
<li>Add support for Bedrock image guardrails - <a href="https://github.com/BerriAI/litellm/pull/18115" target="_blank" rel="noopener noreferrer">PR #18115</a></li>
<li>Guardrails block action takes precedence over masking - <a href="https://github.com/BerriAI/litellm/pull/17968" target="_blank" rel="noopener noreferrer">PR #17968</a></li>
</ul>
</li>
</ul>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="secret-managers">Secret Managers<a href="https://docs.litellm.ai/release_notes/v1-80-11#secret-managers" class="hash-link" aria-label="Direct link to Secret Managers" title="Direct link to Secret Managers">​</a></h3>
<ul>
<li><strong><a href="https://docs.litellm.ai/docs/secret_managers/hashicorp_vault">HashiCorp Vault</a></strong>
<ul>
<li>Add documentation for configurable Vault mount - <a href="https://github.com/BerriAI/litellm/pull/18082" target="_blank" rel="noopener noreferrer">PR #18082</a></li>
<li>Add per-team Vault configuration - <a href="https://github.com/BerriAI/litellm/pull/18150" target="_blank" rel="noopener noreferrer">PR #18150</a></li>
</ul>
</li>
<li><strong>UI</strong>
<ul>
<li>Add secret manager settings controls to team management UI - <a href="https://github.com/BerriAI/litellm/pull/18149" target="_blank" rel="noopener noreferrer">PR #18149</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="spend-tracking-budgets-and-rate-limiting">Spend Tracking, Budgets and Rate Limiting<a href="https://docs.litellm.ai/release_notes/v1-80-11#spend-tracking-budgets-and-rate-limiting" class="hash-link" aria-label="Direct link to Spend Tracking, Budgets and Rate Limiting" title="Direct link to Spend Tracking, Budgets and Rate Limiting">​</a></h2>
<ul>
<li><strong>Email Budget Alerts</strong> - Send email notifications when budgets are reached - <a href="https://github.com/BerriAI/litellm/pull/17995" target="_blank" rel="noopener noreferrer">PR #17995</a></li>
</ul>
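<p>The alert in PR #17995 amounts to a threshold check on tracked spend. The sketch below is illustrative only (it is not LiteLLM's actual implementation); it shows the general shape of firing a single notification when a budget is reached.</p>

```python
# Illustrative sketch only -- the general shape of a budget email alert
# check, not LiteLLM's actual logic from PR #17995.
def should_send_budget_email(spend: float, max_budget: float, already_alerted: bool) -> bool:
    """Fire a single email the first time spend reaches the configured budget."""
    if already_alerted or max_budget <= 0:
        return False
    return spend >= max_budget

alerts = [
    should_send_budget_email(100.0, 100.0, False),  # at budget -> alert
    should_send_budget_email(50.0, 100.0, False),   # under budget -> no alert
    should_send_budget_email(120.0, 100.0, True),   # already notified -> no repeat
]
```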
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="mcp-gateway">MCP Gateway<a href="https://docs.litellm.ai/release_notes/v1-80-11#mcp-gateway" class="hash-link" aria-label="Direct link to MCP Gateway" title="Direct link to MCP Gateway">​</a></h2>
<ul>
<li><strong>Auth Header Propagation</strong> - Add MCP auth header propagation - <a href="https://github.com/BerriAI/litellm/pull/17963" target="_blank" rel="noopener noreferrer">PR #17963</a></li>
<li><strong>Fix deepcopy error</strong> - Fix MCP tool call deepcopy error when processing requests - <a href="https://github.com/BerriAI/litellm/pull/18010" target="_blank" rel="noopener noreferrer">PR #18010</a></li>
<li><strong>Fix list tool</strong> - Fix MCP list_tools not working without a database connection - <a href="https://github.com/BerriAI/litellm/pull/18161" target="_blank" rel="noopener noreferrer">PR #18161</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="agent-gateway-a2a">Agent Gateway (A2A)<a href="https://docs.litellm.ai/release_notes/v1-80-11#agent-gateway-a2a" class="hash-link" aria-label="Direct link to Agent Gateway (A2A)" title="Direct link to Agent Gateway (A2A)">​</a></h2>
<ul>
<li><strong>New Provider: Agent Gateway</strong> - Add Pydantic AI agents support - <a href="https://github.com/BerriAI/litellm/pull/18013" target="_blank" rel="noopener noreferrer">PR #18013</a></li>
<li><strong>VertexAI Agent Engine</strong> - Add Vertex AI Agent Engine provider - <a href="https://github.com/BerriAI/litellm/pull/18014" target="_blank" rel="noopener noreferrer">PR #18014</a></li>
<li><strong>Fix model extraction</strong> - Fix get_model_from_request() to extract model ID from Vertex AI passthrough URLs - <a href="https://github.com/BerriAI/litellm/pull/18097" target="_blank" rel="noopener noreferrer">PR #18097</a></li>
</ul>
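<p>The model-extraction fix in PR #18097 pulls the model ID out of a Vertex AI passthrough URL. The snippet below is a simplified stand-in for that idea, not the actual <code>get_model_from_request()</code> code; the URL is a typical Vertex AI <code>generateContent</code> path.</p>

```python
# Illustrative only: extracting the model segment from a Vertex AI
# passthrough-style URL. This regex is a simplified stand-in for the
# real fix in get_model_from_request() (PR #18097).
import re
from typing import Optional

def model_from_vertex_url(url: str) -> Optional[str]:
    # Capture everything after "/models/" up to the next "/" or ":".
    match = re.search(r"/models/([^/:]+)", url)
    return match.group(1) if match else None

example = ("https://us-central1-aiplatform.googleapis.com/v1/projects/p/"
           "locations/us-central1/publishers/google/models/gemini-1.5-pro:generateContent")
model = model_from_vertex_url(example)
```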
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="performance--loadbalancing--reliability-improvements">Performance / Loadbalancing / Reliability improvements<a href="https://docs.litellm.ai/release_notes/v1-80-11#performance--loadbalancing--reliability-improvements" class="hash-link" aria-label="Direct link to Performance / Loadbalancing / Reliability improvements" title="Direct link to Performance / Loadbalancing / Reliability improvements">​</a></h2>
<ul>
<li><strong>Lazy Imports</strong> - Use per-attribute lazy imports and extract shared constants - <a href="https://github.com/BerriAI/litellm/pull/17994" target="_blank" rel="noopener noreferrer">PR #17994</a></li>
<li><strong>Lazy Load HTTP Handlers</strong> - Lazy load http handlers - <a href="https://github.com/BerriAI/litellm/pull/17997" target="_blank" rel="noopener noreferrer">PR #17997</a></li>
<li><strong>Lazy Load Caches</strong> - Lazy load caches - <a href="https://github.com/BerriAI/litellm/pull/18001" target="_blank" rel="noopener noreferrer">PR #18001</a></li>
<li><strong>Lazy Load Types</strong> - Lazy load bedrock types, .types.utils, GuardrailItem - <a href="https://github.com/BerriAI/litellm/pull/18053" target="_blank" rel="noopener noreferrer">PR #18053</a>, <a href="https://github.com/BerriAI/litellm/pull/18054" target="_blank" rel="noopener noreferrer">PR #18054</a>, <a href="https://github.com/BerriAI/litellm/pull/18072" target="_blank" rel="noopener noreferrer">PR #18072</a></li>
<li><strong>Lazy Load Configs</strong> - Lazy load 41 configuration classes - <a href="https://github.com/BerriAI/litellm/pull/18267" target="_blank" rel="noopener noreferrer">PR #18267</a></li>
<li><strong>Lazy Load Client Decorators</strong> - Lazy load heavy client decorator imports - <a href="https://github.com/BerriAI/litellm/pull/18064" target="_blank" rel="noopener noreferrer">PR #18064</a></li>
<li><strong>Prisma Build Time</strong> - Download Prisma binaries at build time instead of runtime for security-restricted environments - <a href="https://github.com/BerriAI/litellm/pull/17695" target="_blank" rel="noopener noreferrer">PR #17695</a></li>
<li><strong>Docker Alpine</strong> - Add libsndfile to Alpine image for ARM64 audio processing - <a href="https://github.com/BerriAI/litellm/pull/18092" target="_blank" rel="noopener noreferrer">PR #18092</a></li>
<li><strong>Security</strong> - Prevent LiteLLM API key leakage on /health endpoint failures - <a href="https://github.com/BerriAI/litellm/pull/18133" target="_blank" rel="noopener noreferrer">PR #18133</a></li>
</ul>
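<p>The lazy-import items above all rely on the same idea: defer a heavy import until the attribute is first accessed. Below is a minimal, self-contained sketch of the underlying pattern (PEP 562 module-level <code>__getattr__</code>). The package and module names are made up for illustration; they are not LiteLLM's actual layout.</p>

```python
import importlib
import pathlib
import sys
import tempfile
import textwrap

# Build a throwaway package on disk so the example is fully self-contained.
pkg_root = pathlib.Path(tempfile.mkdtemp())
(pkg_root / "mypkg").mkdir()
(pkg_root / "mypkg" / "__init__.py").write_text(textwrap.dedent("""
    # PEP 562: a module-level __getattr__ defers heavy submodule imports.
    _LAZY = {"heavy": "mypkg.heavy"}

    def __getattr__(name):
        if name in _LAZY:
            import importlib
            module = importlib.import_module(_LAZY[name])
            globals()[name] = module  # cache so __getattr__ runs only once
            return module
        raise AttributeError(name)
"""))
(pkg_root / "mypkg" / "heavy.py").write_text("ANSWER = 42\n")

sys.path.insert(0, str(pkg_root))
importlib.invalidate_caches()

import mypkg                                   # does NOT import mypkg.heavy
loaded_before = "mypkg.heavy" in sys.modules   # False: import deferred
value = mypkg.heavy.ANSWER                     # first access triggers import
loaded_after = "mypkg.heavy" in sys.modules    # True: now loaded and cached
```

Applied across many modules, this keeps process startup cost proportional to what a request actually touches rather than to the full dependency tree.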
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="documentation-updates">Documentation Updates<a href="https://docs.litellm.ai/release_notes/v1-80-11#documentation-updates" class="hash-link" aria-label="Direct link to Documentation Updates" title="Direct link to Documentation Updates">​</a></h2>
<ul>
<li><strong>SAP Docs</strong> - Update SAP documentation - <a href="https://github.com/BerriAI/litellm/pull/17974" target="_blank" rel="noopener noreferrer">PR #17974</a></li>
<li><strong>Pydantic AI Agents</strong> - Add docs on using Pydantic AI agents with the LiteLLM A2A gateway - <a href="https://github.com/BerriAI/litellm/pull/18026" target="_blank" rel="noopener noreferrer">PR #18026</a></li>
<li><strong>Vertex AI Agent Engine</strong> - Add Vertex AI Agent Engine documentation - <a href="https://github.com/BerriAI/litellm/pull/18027" target="_blank" rel="noopener noreferrer">PR #18027</a></li>
<li><strong>Router Order</strong> - Add router order parameter documentation - <a href="https://github.com/BerriAI/litellm/pull/18045" target="_blank" rel="noopener noreferrer">PR #18045</a></li>
<li><strong>Secret Manager Settings</strong> - Improve secret manager settings documentation - <a href="https://github.com/BerriAI/litellm/pull/18235" target="_blank" rel="noopener noreferrer">PR #18235</a></li>
<li><strong>Gemini 3 Flash</strong> - Add version requirement to the Gemini 3 Flash blog - <a href="https://github.com/BerriAI/litellm/pull/18227" target="_blank" rel="noopener noreferrer">PR #18227</a></li>
<li><strong>README</strong> - Expand Responses API section and update endpoints - <a href="https://github.com/BerriAI/litellm/pull/17354" target="_blank" rel="noopener noreferrer">PR #17354</a></li>
<li><strong>Amazon Nova</strong> - Add Amazon Nova to sidebar and supported models - <a href="https://github.com/BerriAI/litellm/pull/18220" target="_blank" rel="noopener noreferrer">PR #18220</a></li>
<li><strong>Benchmarks</strong> - Add infrastructure recommendations to benchmarks documentation - <a href="https://github.com/BerriAI/litellm/pull/18264" target="_blank" rel="noopener noreferrer">PR #18264</a></li>
<li><strong>Broken Links</strong> - Fix broken documentation links - <a href="https://github.com/BerriAI/litellm/pull/18104" target="_blank" rel="noopener noreferrer">PR #18104</a></li>
<li><strong>README Fixes</strong> - Various README improvements - <a href="https://github.com/BerriAI/litellm/pull/18206" target="_blank" rel="noopener noreferrer">PR #18206</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="infrastructure--cicd">Infrastructure / CI/CD<a href="https://docs.litellm.ai/release_notes/v1-80-11#infrastructure--cicd" class="hash-link" aria-label="Direct link to Infrastructure / CI/CD" title="Direct link to Infrastructure / CI/CD">​</a></h2>
<ul>
<li><strong>PR Templates</strong> - Add LiteLLM team PR template and CI/CD rules - <a href="https://github.com/BerriAI/litellm/pull/17983" target="_blank" rel="noopener noreferrer">PR #17983</a>, <a href="https://github.com/BerriAI/litellm/pull/17985" target="_blank" rel="noopener noreferrer">PR #17985</a></li>
<li><strong>Issue Labeling</strong> - Improve issue labeling with component dropdown and more provider keywords - <a href="https://github.com/BerriAI/litellm/pull/17957" target="_blank" rel="noopener noreferrer">PR #17957</a></li>
<li><strong>PR Template Cleanup</strong> - Remove redundant fields from PR template - <a href="https://github.com/BerriAI/litellm/pull/17956" target="_blank" rel="noopener noreferrer">PR #17956</a></li>
<li><strong>Dependencies</strong> - Bump altcha-lib from 1.3.0 to 1.4.1 - <a href="https://github.com/BerriAI/litellm/pull/18017" target="_blank" rel="noopener noreferrer">PR #18017</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-contributors">New Contributors<a href="https://docs.litellm.ai/release_notes/v1-80-11#new-contributors" class="hash-link" aria-label="Direct link to New Contributors" title="Direct link to New Contributors">​</a></h2>
<ul>
<li>@dongbin-lunark made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17757" target="_blank" rel="noopener noreferrer">PR #17757</a></li>
<li>@qdrddr made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18004" target="_blank" rel="noopener noreferrer">PR #18004</a></li>
<li>@donicrosby made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17962" target="_blank" rel="noopener noreferrer">PR #17962</a></li>
<li>@NicolaivdSmagt made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17992" target="_blank" rel="noopener noreferrer">PR #17992</a></li>
<li>@Reapor-Yurnero made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18085" target="_blank" rel="noopener noreferrer">PR #18085</a></li>
<li>@jk-f5 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18086" target="_blank" rel="noopener noreferrer">PR #18086</a></li>
<li>@castrapel made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18077" target="_blank" rel="noopener noreferrer">PR #18077</a></li>
<li>@dtikhonov made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17484" target="_blank" rel="noopener noreferrer">PR #17484</a></li>
<li>@opleonnn made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18175" target="_blank" rel="noopener noreferrer">PR #18175</a></li>
<li>@eurogig made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/18084" target="_blank" rel="noopener noreferrer">PR #18084</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="full-changelog">Full Changelog<a href="https://docs.litellm.ai/release_notes/v1-80-11#full-changelog" class="hash-link" aria-label="Direct link to Full Changelog" title="Direct link to Full Changelog">​</a></h2>
<p><strong><a href="https://github.com/BerriAI/litellm/compare/v1.80.10-nightly...v1.80.11" target="_blank" rel="noopener noreferrer">View complete changelog on GitHub</a></strong></p>]]></content>
        <author>
            <name>Krrish Dholakia</name>
            <uri>https://www.linkedin.com/in/krish-d/</uri>
        </author>
        <author>
            <name>Ishaan Jaff</name>
            <uri>https://www.linkedin.com/in/reffajnaahsi/</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[[Preview] v1.80.10.rc.1 - Agent Gateway: Azure Foundry & Bedrock AgentCore]]></title>
        <id>https://docs.litellm.ai/release_notes/v1-80-10</id>
        <link href="https://docs.litellm.ai/release_notes/v1-80-10"/>
        <updated>2025-12-13T10:00:00.000Z</updated>
        <summary type="html"><![CDATA[Deploy this version]]></summary>
        <content type="html"><![CDATA[<h2 class="anchor anchorWithStickyNavbar_LWe7" id="deploy-this-version">Deploy this version<a href="https://docs.litellm.ai/release_notes/v1-80-10#deploy-this-version" class="hash-link" aria-label="Direct link to Deploy this version" title="Direct link to Deploy this version">​</a></h2>
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">Docker</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">Pip</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">docker run litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker run \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-e STORE_MODEL_IN_DB=True \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-p 4000:4000 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">docker.litellm.ai/berriai/litellm:v1.80.10.rc.1</span><br></span></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">pip install litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">pip install litellm==1.80.10</span><br></span></code></pre></div></div></div></div></div>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="key-highlights">Key Highlights<a href="https://docs.litellm.ai/release_notes/v1-80-10#key-highlights" class="hash-link" aria-label="Direct link to Key Highlights" title="Direct link to Key Highlights">​</a></h2>
<ul>
<li><strong>Agent (A2A) Gateway with Cost Tracking</strong> - <a href="https://docs.litellm.ai/docs/a2a_cost_tracking">Track per-query agent costs, per-token pricing, and agent usage in the dashboard</a></li>
<li><strong>2 New Agent Providers</strong> - <a href="https://docs.litellm.ai/docs/providers/langgraph">LangGraph Agents</a> and <a href="https://docs.litellm.ai/docs/providers/azure_ai_agents">Azure AI Foundry Agents</a> for agentic workflows</li>
<li><strong>New Provider: SAP Gen AI Hub</strong> - <a href="https://docs.litellm.ai/docs/providers/sap">Full support for SAP Generative AI Hub with chat completions</a></li>
<li><strong>New Bedrock Writer Models</strong> - Add Palmyra-X4 and Palmyra-X5 models on Bedrock</li>
<li><strong>OpenAI GPT-5.2 Models</strong> - Full support for GPT-5.2, GPT-5.2-pro, and Azure GPT-5.2 models with reasoning support</li>
<li><strong>227 New Fireworks AI Models</strong> - Comprehensive model coverage for Fireworks AI platform</li>
<li><strong>MCP Support on /chat/completions</strong> - <a href="https://docs.litellm.ai/docs/mcp">Use MCP servers directly via the /chat/completions endpoint</a></li>
<li><strong>Performance Improvements</strong> - Reduced memory leaks by 50%</li>
</ul>
<hr>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="agent-gateway---4-new-agent-providers">Agent Gateway - 4 New Agent Providers<a href="https://docs.litellm.ai/release_notes/v1-80-10#agent-gateway---4-new-agent-providers" class="hash-link" aria-label="Direct link to Agent Gateway - 4 New Agent Providers" title="Direct link to Agent Gateway - 4 New Agent Providers">​</a></h3>
<div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAFCAYAAAB8ZH1oAAAACXBIWXMAACxLAAAsSwGlPZapAAAAtklEQVR4nAXB207CMACA4d4qiLJDt8ZWYK47wDo5uGhDwMVENDEeQoje+f5v8ft9ItYtkW45DyxZ5amXPU33RtV9EpYfyOaHxJ0Qg8ByFuTEN0sKt2dae8z8icn6G/PwR+J+kc0JMVItw/QOlXtmrie1nsRuUYsDenMkqt4J7StiPN1xNeu5rp/JVi8k5R5Z7IjtFpl7Ls0jI90hInsgrL9QRU9W3mNuV6iJQ+oF0jgu0oahnPMPeMlJkr5fegQAAAAASUVORK5CYII=&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="640" height="331"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/assets/ideal-img/a2a_gateway2.f1b1005.640.png" srcset="/assets/ideal-img/a2a_gateway2.f1b1005.640.png 640w,/assets/ideal-img/a2a_gateway2.c658f49.1920.png 1920w" width="640" height="331"></noscript></div>
<br>
<p>This release adds support for agents from the following providers:</p>
<ul>
<li><strong>LangGraph Agents</strong> - Deploy and manage LangGraph-based agents</li>
<li><strong>Azure AI Foundry Agents</strong> - Enterprise agent deployments on Azure</li>
<li><strong>Bedrock AgentCore</strong> - AWS Bedrock agent integration</li>
<li><strong>A2A Agents</strong> - Agent-to-Agent protocol support</li>
</ul>
<p>AI Gateway admins can now add agents from any of these providers, and developers can invoke them through a unified interface using the A2A protocol.</p>
<p>For all agent requests running through the AI Gateway, LiteLLM automatically tracks request/response logs, cost, and token usage.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="agent-a2a-usage-ui">Agent (A2A) Usage UI<a href="https://docs.litellm.ai/release_notes/v1-80-10#agent-a2a-usage-ui" class="hash-link" aria-label="Direct link to Agent (A2A) Usage UI" title="Direct link to Agent (A2A) Usage UI">​</a></h3>
<div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAFCAYAAAB8ZH1oAAAACXBIWXMAABYlAAAWJQFJUiTwAAAAhklEQVR4nE2NWwrCQBRDu3/wx624AJeh9m+qqPgxKrWt9zH3yLQKBg4kEJIm50yl73tE5IuiulCzqtCklGjblq47MgwjUgseEEEpgbtzfxiNmfFjHCfCHXeb16pXNUSc5v+CKKTXm/1jAAKxQpRlfS7Wu3DjNinrw4XV7szmlNlen1RNVvgAjSnA6BmdHE4AAAAASUVORK5CYII=&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="640" height="334"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/assets/ideal-img/agent_usage.7ed2cbf.640.png" srcset="/assets/ideal-img/agent_usage.7ed2cbf.640.png 640w,/assets/ideal-img/agent_usage.a83b108.1920.png 1920w" width="640" height="334"></noscript></div>
<p>Users can now filter usage statistics by agents, providing the same granular filtering capabilities available for teams, organizations, and customers.</p>
<p><strong>Details:</strong></p>
<ul>
<li>Filter usage analytics, spend logs, and activity metrics by agent ID</li>
<li>View breakdowns on a per-agent basis</li>
<li>Consistent filtering experience across all usage and analytics views</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-providers-and-endpoints">New Providers and Endpoints<a href="https://docs.litellm.ai/release_notes/v1-80-10#new-providers-and-endpoints" class="hash-link" aria-label="Direct link to New Providers and Endpoints" title="Direct link to New Providers and Endpoints">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="new-providers-5-new-providers">New Providers (5 new providers)<a href="https://docs.litellm.ai/release_notes/v1-80-10#new-providers-5-new-providers" class="hash-link" aria-label="Direct link to New Providers (5 new providers)" title="Direct link to New Providers (5 new providers)">​</a></h3>
<table><thead><tr><th>Provider</th><th>Supported LiteLLM Endpoints</th><th>Description</th></tr></thead><tbody><tr><td><a href="https://docs.litellm.ai/docs/providers/sap">SAP Gen AI Hub</a></td><td><code>/chat/completions</code>, <code>/messages</code>, <code>/responses</code></td><td>SAP Generative AI Hub integration for enterprise AI</td></tr><tr><td><a href="https://docs.litellm.ai/docs/providers/langgraph">LangGraph</a></td><td><code>/chat/completions</code>, <code>/messages</code>, <code>/responses</code>, <code>/a2a</code></td><td>LangGraph agents for agentic workflows</td></tr><tr><td><a href="https://docs.litellm.ai/docs/providers/azure_ai_agents">Azure AI Foundry Agents</a></td><td><code>/chat/completions</code>, <code>/messages</code>, <code>/responses</code>, <code>/a2a</code></td><td>Azure AI Foundry Agents for enterprise agent deployments</td></tr><tr><td><a href="https://docs.litellm.ai/docs/providers/voyage">Voyage AI Rerank</a></td><td><code>/rerank</code></td><td>Voyage AI rerank models support</td></tr><tr><td><a href="https://docs.litellm.ai/docs/providers/fireworks_ai">Fireworks AI Rerank</a></td><td><code>/rerank</code></td><td>Fireworks AI rerank endpoint support</td></tr></tbody></table>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="new-llm-api-endpoints-4-new-endpoints">New LLM API Endpoints (4 new endpoints)<a href="https://docs.litellm.ai/release_notes/v1-80-10#new-llm-api-endpoints-4-new-endpoints" class="hash-link" aria-label="Direct link to New LLM API Endpoints (4 new endpoints)" title="Direct link to New LLM API Endpoints (4 new endpoints)">​</a></h3>
<table><thead><tr><th>Endpoint</th><th>Method</th><th>Description</th><th>Documentation</th></tr></thead><tbody><tr><td><code>/containers/{id}/files</code></td><td>GET</td><td>List files in a container</td><td><a href="https://docs.litellm.ai/docs/container_files">Docs</a></td></tr><tr><td><code>/containers/{id}/files/{file_id}</code></td><td>GET</td><td>Retrieve container file metadata</td><td><a href="https://docs.litellm.ai/docs/container_files">Docs</a></td></tr><tr><td><code>/containers/{id}/files/{file_id}</code></td><td>DELETE</td><td>Delete a file from a container</td><td><a href="https://docs.litellm.ai/docs/container_files">Docs</a></td></tr><tr><td><code>/containers/{id}/files/{file_id}/content</code></td><td>GET</td><td>Retrieve container file content</td><td><a href="https://docs.litellm.ai/docs/container_files">Docs</a></td></tr></tbody></table>
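<p>The four endpoints compose from a single base path. The helpers below sketch that structure; the base URL and IDs are placeholders for your own proxy deployment, and only the paths come from the table above.</p>

```python
# Placeholder base URL for a LiteLLM proxy deployment; only the endpoint
# paths below are taken from the table of new container-file routes.
BASE = "http://localhost:4000"

def files_url(container_id: str) -> str:
    # GET: list files in a container
    return f"{BASE}/containers/{container_id}/files"

def file_url(container_id: str, file_id: str) -> str:
    # GET: retrieve file metadata / DELETE: remove the file
    return f"{files_url(container_id)}/{file_id}"

def file_content_url(container_id: str, file_id: str) -> str:
    # GET: retrieve the raw file content
    return f"{file_url(container_id, file_id)}/content"
```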
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-models--updated-models">New Models / Updated Models<a href="https://docs.litellm.ai/release_notes/v1-80-10#new-models--updated-models" class="hash-link" aria-label="Direct link to New Models / Updated Models" title="Direct link to New Models / Updated Models">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="new-model-support-270-new-models">New Model Support (270+ new models)<a href="https://docs.litellm.ai/release_notes/v1-80-10#new-model-support-270-new-models" class="hash-link" aria-label="Direct link to New Model Support (270+ new models)" title="Direct link to New Model Support (270+ new models)">​</a></h4>
<table><thead><tr><th>Provider</th><th>Model</th><th>Context Window</th><th>Input ($/1M tokens)</th><th>Output ($/1M tokens)</th><th>Features</th></tr></thead><tbody><tr><td>OpenAI</td><td><code>gpt-5.2</code></td><td>400K</td><td>$1.75</td><td>$14.00</td><td>Reasoning, vision, PDF, caching</td></tr><tr><td>OpenAI</td><td><code>gpt-5.2-pro</code></td><td>400K</td><td>$21.00</td><td>$168.00</td><td>Reasoning, web search, vision</td></tr><tr><td>Azure</td><td><code>azure/gpt-5.2</code></td><td>400K</td><td>$1.75</td><td>$14.00</td><td>Reasoning, vision, PDF, caching</td></tr><tr><td>Azure</td><td><code>azure/gpt-5.2-pro</code></td><td>400K</td><td>$21.00</td><td>$168.00</td><td>Reasoning, web search</td></tr><tr><td>Bedrock</td><td><code>us.writer.palmyra-x4-v1:0</code></td><td>128K</td><td>$2.50</td><td>$10.00</td><td>Function calling, PDF input</td></tr><tr><td>Bedrock</td><td><code>us.writer.palmyra-x5-v1:0</code></td><td>1M</td><td>$0.60</td><td>$6.00</td><td>Function calling, PDF input</td></tr><tr><td>Bedrock</td><td><code>eu.anthropic.claude-opus-4-5-20251101-v1:0</code></td><td>200K</td><td>$5.00</td><td>$25.00</td><td>Reasoning, computer use, vision</td></tr><tr><td>Bedrock</td><td><code>google.gemma-3-12b-it</code></td><td>128K</td><td>$0.10</td><td>$0.30</td><td>Audio input</td></tr><tr><td>Bedrock</td><td><code>moonshot.kimi-k2-thinking</code></td><td>128K</td><td>$0.60</td><td>$2.50</td><td>Reasoning</td></tr><tr><td>Bedrock</td><td><code>nvidia.nemotron-nano-12b-v2</code></td><td>128K</td><td>$0.20</td><td>$0.60</td><td>Vision</td></tr><tr><td>Bedrock</td><td><code>qwen.qwen3-next-80b-a3b</code></td><td>128K</td><td>$0.15</td><td>$1.20</td><td>Function calling</td></tr><tr><td>Vertex AI</td><td><code>vertex_ai/deepseek-ai/deepseek-v3.2-maas</code></td><td>164K</td><td>$0.56</td><td>$1.68</td><td>Reasoning, caching</td></tr><tr><td>Mistral</td><td><code>mistral/codestral-2508</code></td><td>256K</td><td>$0.30</td><td>$0.90</td><td>Function 
calling</td></tr><tr><td>Mistral</td><td><code>mistral/devstral-2512</code></td><td>256K</td><td>$0.40</td><td>$2.00</td><td>Function calling</td></tr><tr><td>Mistral</td><td><code>mistral/labs-devstral-small-2512</code></td><td>256K</td><td>$0.10</td><td>$0.30</td><td>Function calling</td></tr><tr><td>Cerebras</td><td><code>cerebras/zai-glm-4.6</code></td><td>128K</td><td>-</td><td>-</td><td>Chat completions</td></tr><tr><td>NVIDIA NIM</td><td><code>nvidia_nim/ranking/nvidia/llama-3.2-nv-rerankqa-1b-v2</code></td><td>-</td><td>Free</td><td>Free</td><td>Rerank</td></tr><tr><td>Voyage</td><td><code>voyage/rerank-2.5</code></td><td>32K</td><td>$0.05/1K tokens</td><td>-</td><td>Rerank</td></tr><tr><td>Fireworks AI</td><td>227 new models</td><td>Various</td><td>Various</td><td>Various</td><td>Full model catalog</td></tr></tbody></table>
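<p>As a worked example of the per-token pricing above, here is the cost of one hypothetical <code>gpt-5.2</code> request at the table's rates ($1.75 per 1M input tokens, $14.00 per 1M output tokens); the token counts are made up.</p>

```python
# Cost of one hypothetical gpt-5.2 request, using the table's rates:
# $1.75 per 1M input tokens, $14.00 per 1M output tokens.
input_tokens, output_tokens = 12_000, 3_000
cost_usd = (input_tokens / 1_000_000) * 1.75 + (output_tokens / 1_000_000) * 14.00
# 0.021 (input) + 0.042 (output) = 0.063 USD
```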
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features">Features<a href="https://docs.litellm.ai/release_notes/v1-80-10#features" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li><strong><a href="https://docs.litellm.ai/docs/providers/openai">OpenAI</a></strong>
<ul>
<li>Add support for OpenAI GPT-5.2 models with reasoning_effort='xhigh' - <a href="https://github.com/BerriAI/litellm/pull/17836" target="_blank" rel="noopener noreferrer">PR #17836</a>, <a href="https://github.com/BerriAI/litellm/pull/17875" target="_blank" rel="noopener noreferrer">PR #17875</a></li>
<li>Include 'user' param for responses API models - <a href="https://github.com/BerriAI/litellm/pull/17648" target="_blank" rel="noopener noreferrer">PR #17648</a></li>
<li>Use optimized async http client for text completions - <a href="https://github.com/BerriAI/litellm/pull/17831" target="_blank" rel="noopener noreferrer">PR #17831</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/azure">Azure</a></strong>
<ul>
<li>Add Azure GPT-5.2 models support - <a href="https://github.com/BerriAI/litellm/pull/17866" target="_blank" rel="noopener noreferrer">PR #17866</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/azure_ai">Azure AI</a></strong>
<ul>
<li>Fix Azure AI Anthropic api-key header and passthrough cost calculation - <a href="https://github.com/BerriAI/litellm/pull/17656" target="_blank" rel="noopener noreferrer">PR #17656</a></li>
<li>Remove unsupported params from Azure AI Anthropic requests - <a href="https://github.com/BerriAI/litellm/pull/17822" target="_blank" rel="noopener noreferrer">PR #17822</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/anthropic">Anthropic</a></strong>
<ul>
<li>Prevent duplicate tool_result blocks with same tool - <a href="https://github.com/BerriAI/litellm/pull/17632" target="_blank" rel="noopener noreferrer">PR #17632</a></li>
<li>Handle partial JSON chunks in streaming responses - <a href="https://github.com/BerriAI/litellm/pull/17493" target="_blank" rel="noopener noreferrer">PR #17493</a></li>
<li>Preserve server_tool_use and web_search_tool_result in multi-turn conversations - <a href="https://github.com/BerriAI/litellm/pull/17746" target="_blank" rel="noopener noreferrer">PR #17746</a></li>
<li>Capture web_search_tool_result in streaming for multi-turn conversations - <a href="https://github.com/BerriAI/litellm/pull/17798" target="_blank" rel="noopener noreferrer">PR #17798</a></li>
<li>Add retrieve batches and retrieve file content support - <a href="https://github.com/BerriAI/litellm/pull/17700" target="_blank" rel="noopener noreferrer">PR #17700</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/bedrock">Bedrock</a></strong>
<ul>
<li>Add new Bedrock OSS models to model list - <a href="https://github.com/BerriAI/litellm/pull/17638" target="_blank" rel="noopener noreferrer">PR #17638</a></li>
<li>Add Bedrock Writer models (Palmyra-X4, Palmyra-X5) - <a href="https://github.com/BerriAI/litellm/pull/17685" target="_blank" rel="noopener noreferrer">PR #17685</a></li>
<li>Add EU Claude Opus 4.5 model - <a href="https://github.com/BerriAI/litellm/pull/17897" target="_blank" rel="noopener noreferrer">PR #17897</a></li>
<li>Add serviceTier support for Converse API - <a href="https://github.com/BerriAI/litellm/pull/17810" target="_blank" rel="noopener noreferrer">PR #17810</a></li>
<li>Fix header forwarding with custom API for Bedrock embeddings - <a href="https://github.com/BerriAI/litellm/pull/17872" target="_blank" rel="noopener noreferrer">PR #17872</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/gemini">Gemini</a></strong>
<ul>
<li>Add support for computer use for Gemini - <a href="https://github.com/BerriAI/litellm/pull/17756" target="_blank" rel="noopener noreferrer">PR #17756</a></li>
<li>Handle context window errors - <a href="https://github.com/BerriAI/litellm/pull/17751" target="_blank" rel="noopener noreferrer">PR #17751</a></li>
<li>Add speechConfig to GenerationConfig for Gemini TTS - <a href="https://github.com/BerriAI/litellm/pull/17851" target="_blank" rel="noopener noreferrer">PR #17851</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/vertex">Vertex AI</a></strong>
<ul>
<li>Add DeepSeek-V3.2 model support - <a href="https://github.com/BerriAI/litellm/pull/17770" target="_blank" rel="noopener noreferrer">PR #17770</a></li>
<li>Preserve systemInstructions in generateContent requests - <a href="https://github.com/BerriAI/litellm/pull/17803" target="_blank" rel="noopener noreferrer">PR #17803</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/mistral">Mistral</a></strong>
<ul>
<li>Add Codestral 2508, Devstral 2512 models - <a href="https://github.com/BerriAI/litellm/pull/17801" target="_blank" rel="noopener noreferrer">PR #17801</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/cerebras">Cerebras</a></strong>
<ul>
<li>Add zai-glm-4.6 model support - <a href="https://github.com/BerriAI/litellm/pull/17683" target="_blank" rel="noopener noreferrer">PR #17683</a></li>
<li>Fix context window errors not being recognized - <a href="https://github.com/BerriAI/litellm/pull/17587" target="_blank" rel="noopener noreferrer">PR #17587</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/deepseek">DeepSeek</a></strong>
<ul>
<li>Add native support for thinking and reasoning_effort params - <a href="https://github.com/BerriAI/litellm/pull/17712" target="_blank" rel="noopener noreferrer">PR #17712</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/nvidia_nim_rerank">NVIDIA NIM Rerank</a></strong>
<ul>
<li>Add llama-3.2-nv-rerankqa-1b-v2 rerank model - <a href="https://github.com/BerriAI/litellm/pull/17670" target="_blank" rel="noopener noreferrer">PR #17670</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/fireworks_ai">Fireworks AI</a></strong>
<ul>
<li>Add 227 new Fireworks AI models - <a href="https://github.com/BerriAI/litellm/pull/17692" target="_blank" rel="noopener noreferrer">PR #17692</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/dashscope">Dashscope</a></strong>
<ul>
<li>Fix default base_url error - <a href="https://github.com/BerriAI/litellm/pull/17584" target="_blank" rel="noopener noreferrer">PR #17584</a></li>
</ul>
</li>
</ul>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="bug-fixes">Bug Fixes<a href="https://docs.litellm.ai/release_notes/v1-80-10#bug-fixes" class="hash-link" aria-label="Direct link to Bug Fixes" title="Direct link to Bug Fixes">​</a></h3>
<ul>
<li><strong><a href="https://docs.litellm.ai/docs/providers/anthropic">Anthropic</a></strong>
<ul>
<li>Fix missing content in Anthropic to OpenAI conversion - <a href="https://github.com/BerriAI/litellm/pull/17693" target="_blank" rel="noopener noreferrer">PR #17693</a></li>
<li>Avoid errors when the input contains only tool_calls - <a href="https://github.com/BerriAI/litellm/pull/17753" target="_blank" rel="noopener noreferrer">PR #17753</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/azure">Azure</a></strong>
<ul>
<li>Fix error about encoding video id for Azure - <a href="https://github.com/BerriAI/litellm/pull/17708" target="_blank" rel="noopener noreferrer">PR #17708</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/azure_ai">Azure AI</a></strong>
<ul>
<li>Fix LLM provider for azure_ai in model map - <a href="https://github.com/BerriAI/litellm/pull/17805" target="_blank" rel="noopener noreferrer">PR #17805</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/watsonx">Watsonx</a></strong>
<ul>
<li>Fix Watsonx Audio Transcription to only send supported params to API - <a href="https://github.com/BerriAI/litellm/pull/17840" target="_blank" rel="noopener noreferrer">PR #17840</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/routing">Router</a></strong>
<ul>
<li>Handle tools=None in completion requests - <a href="https://github.com/BerriAI/litellm/pull/17684" target="_blank" rel="noopener noreferrer">PR #17684</a></li>
<li>Add minimum request threshold for error rate cooldown - <a href="https://github.com/BerriAI/litellm/pull/17464" target="_blank" rel="noopener noreferrer">PR #17464</a></li>
</ul>
</li>
</ul>
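<p>The Router cooldown change above adds a minimum request threshold, so a deployment is not cooled down on the basis of one or two early failures. A minimal sketch of that idea (illustrative only, with hypothetical parameter names — not LiteLLM's actual cooldown implementation):</p>

```python
def should_cooldown(requests: int, errors: int,
                    min_requests: int = 10,
                    error_rate_limit: float = 0.5) -> bool:
    """Cool down a deployment only when the error rate is high AND
    there are enough samples for that rate to be meaningful."""
    if requests < min_requests:
        # Too few requests: a single error would dominate the rate.
        return False
    return (errors / requests) >= error_rate_limit

# One error out of two requests is below the sample threshold -> no cooldown.
print(should_cooldown(requests=2, errors=1))    # False
# Sustained failures over enough traffic trigger the cooldown.
print(should_cooldown(requests=20, errors=12))  # True
```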
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="llm-api-endpoints">LLM API Endpoints<a href="https://docs.litellm.ai/release_notes/v1-80-10#llm-api-endpoints" class="hash-link" aria-label="Direct link to LLM API Endpoints" title="Direct link to LLM API Endpoints">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-1">Features<a href="https://docs.litellm.ai/release_notes/v1-80-10#features-1" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li><strong><a href="https://docs.litellm.ai/docs/response_api">Responses API</a></strong>
<ul>
<li>Add usage details in responses usage object - <a href="https://github.com/BerriAI/litellm/pull/17641" target="_blank" rel="noopener noreferrer">PR #17641</a></li>
<li>Fix error in Responses API polling - <a href="https://github.com/BerriAI/litellm/pull/17654" target="_blank" rel="noopener noreferrer">PR #17654</a></li>
<li>Fix streaming tool_calls being dropped when a response contains both text and tool_calls - <a href="https://github.com/BerriAI/litellm/pull/17652" target="_blank" rel="noopener noreferrer">PR #17652</a></li>
<li>Transform image content in tool results for Responses API - <a href="https://github.com/BerriAI/litellm/pull/17799" target="_blank" rel="noopener noreferrer">PR #17799</a></li>
<li>Fix Responses API not applying TPM rate limits to API keys - <a href="https://github.com/BerriAI/litellm/pull/17707" target="_blank" rel="noopener noreferrer">PR #17707</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/containers">Containers API</a></strong>
<ul>
<li>Allow listing and creating containers using a custom-llm-provider - <a href="https://github.com/BerriAI/litellm/pull/17740" target="_blank" rel="noopener noreferrer">PR #17740</a></li>
<li>Add container API file management and a UI interface - <a href="https://github.com/BerriAI/litellm/pull/17745" target="_blank" rel="noopener noreferrer">PR #17745</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/rerank">Rerank API</a></strong>
<ul>
<li>Add support for forwarding client headers in /rerank endpoint - <a href="https://github.com/BerriAI/litellm/pull/17873" target="_blank" rel="noopener noreferrer">PR #17873</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/files_endpoints">Files API</a></strong>
<ul>
<li>Add support for expires_after param in Files endpoint - <a href="https://github.com/BerriAI/litellm/pull/17860" target="_blank" rel="noopener noreferrer">PR #17860</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/videos">Video API</a></strong>
<ul>
<li>Use litellm params for all video APIs - <a href="https://github.com/BerriAI/litellm/pull/17732" target="_blank" rel="noopener noreferrer">PR #17732</a></li>
<li>Respect video content DB credentials - <a href="https://github.com/BerriAI/litellm/pull/17771" target="_blank" rel="noopener noreferrer">PR #17771</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/embedding">Embeddings API</a></strong>
<ul>
<li>Fix decoding of token-array inputs for embeddings - <a href="https://github.com/BerriAI/litellm/pull/17468" target="_blank" rel="noopener noreferrer">PR #17468</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/completion/input">Chat Completions API</a></strong>
<ul>
<li>Add v0 target storage support - store files in Azure AI storage and use with chat completions API - <a href="https://github.com/BerriAI/litellm/pull/17758" target="_blank" rel="noopener noreferrer">PR #17758</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/gemini">generateContent API</a></strong>
<ul>
<li>Support model names with slashes on Gemini generateContent endpoints - <a href="https://github.com/BerriAI/litellm/pull/17743" target="_blank" rel="noopener noreferrer">PR #17743</a></li>
</ul>
</li>
<li><strong>General</strong>
<ul>
<li>Use audio content for caching - <a href="https://github.com/BerriAI/litellm/pull/17651" target="_blank" rel="noopener noreferrer">PR #17651</a></li>
<li>Return a 403 exception when calling the GET Responses API - <a href="https://github.com/BerriAI/litellm/pull/17629" target="_blank" rel="noopener noreferrer">PR #17629</a></li>
<li>Add nested field removal support to additional_drop_params - <a href="https://github.com/BerriAI/litellm/pull/17711" target="_blank" rel="noopener noreferrer">PR #17711</a></li>
<li>Async post_call_streaming_iterator_hook now properly iterates async generators - <a href="https://github.com/BerriAI/litellm/pull/17626" target="_blank" rel="noopener noreferrer">PR #17626</a></li>
</ul>
</li>
</ul>
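<p>The additional_drop_params change above extends dropping to nested fields, not just top-level keys. A rough, self-contained sketch of dotted-path removal (a hypothetical helper to illustrate the idea, not LiteLLM's implementation):</p>

```python
def drop_nested(params: dict, path: str) -> dict:
    """Remove a dotted path like 'metadata.user_id' from a params dict in place."""
    keys = path.split(".")
    node = params
    for key in keys[:-1]:
        node = node.get(key)
        if not isinstance(node, dict):
            return params  # path not present; nothing to drop
    node.pop(keys[-1], None)
    return params

payload = {"model": "gpt-4o", "metadata": {"user_id": "u1", "trace": "t1"}}
drop_nested(payload, "metadata.user_id")
print(payload)  # {'model': 'gpt-4o', 'metadata': {'trace': 't1'}}
```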
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="bugs">Bugs<a href="https://docs.litellm.ai/release_notes/v1-80-10#bugs" class="hash-link" aria-label="Direct link to Bugs" title="Direct link to Bugs">​</a></h4>
<ul>
<li><strong>General</strong>
<ul>
<li>Fix handling of string content in is_cached_message - <a href="https://github.com/BerriAI/litellm/pull/17853" target="_blank" rel="noopener noreferrer">PR #17853</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="management-endpoints--ui">Management Endpoints / UI<a href="https://docs.litellm.ai/release_notes/v1-80-10#management-endpoints--ui" class="hash-link" aria-label="Direct link to Management Endpoints / UI" title="Direct link to Management Endpoints / UI">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-2">Features<a href="https://docs.litellm.ai/release_notes/v1-80-10#features-2" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li><strong>UI Settings</strong>
<ul>
<li>Add Get and Update Backend Routes for UI Settings - <a href="https://github.com/BerriAI/litellm/pull/17689" target="_blank" rel="noopener noreferrer">PR #17689</a></li>
<li>UI Settings page implementation - <a href="https://github.com/BerriAI/litellm/pull/17697" target="_blank" rel="noopener noreferrer">PR #17697</a></li>
<li>Ensure Model Page honors UI Settings - <a href="https://github.com/BerriAI/litellm/pull/17804" target="_blank" rel="noopener noreferrer">PR #17804</a></li>
<li>Add All Proxy Models to Default User Settings - <a href="https://github.com/BerriAI/litellm/pull/17902" target="_blank" rel="noopener noreferrer">PR #17902</a></li>
</ul>
</li>
<li><strong>Agent &amp; Usage UI</strong>
<ul>
<li>Daily Agent Usage Backend - <a href="https://github.com/BerriAI/litellm/pull/17781" target="_blank" rel="noopener noreferrer">PR #17781</a></li>
<li>Agent Usage UI - <a href="https://github.com/BerriAI/litellm/pull/17797" target="_blank" rel="noopener noreferrer">PR #17797</a></li>
<li>Add agent cost tracking on UI - <a href="https://github.com/BerriAI/litellm/pull/17899" target="_blank" rel="noopener noreferrer">PR #17899</a></li>
<li>New Badge for Agent Usage - <a href="https://github.com/BerriAI/litellm/pull/17883" target="_blank" rel="noopener noreferrer">PR #17883</a></li>
<li>Usage Entity labels for filtering - <a href="https://github.com/BerriAI/litellm/pull/17896" target="_blank" rel="noopener noreferrer">PR #17896</a></li>
<li>Agent Usage Page minor fixes - <a href="https://github.com/BerriAI/litellm/pull/17901" target="_blank" rel="noopener noreferrer">PR #17901</a></li>
<li>Usage Page View Select component - <a href="https://github.com/BerriAI/litellm/pull/17854" target="_blank" rel="noopener noreferrer">PR #17854</a></li>
<li>Usage Page Components refactor - <a href="https://github.com/BerriAI/litellm/pull/17848" target="_blank" rel="noopener noreferrer">PR #17848</a></li>
</ul>
</li>
<li><strong>Logs &amp; Spend</strong>
<ul>
<li>Enhanced spend analytics in logs view - <a href="https://github.com/BerriAI/litellm/pull/17623" target="_blank" rel="noopener noreferrer">PR #17623</a></li>
<li>Add user info delete modal for user management - <a href="https://github.com/BerriAI/litellm/pull/17625" target="_blank" rel="noopener noreferrer">PR #17625</a></li>
<li>Show request and response details in logs view - <a href="https://github.com/BerriAI/litellm/pull/17928" target="_blank" rel="noopener noreferrer">PR #17928</a></li>
</ul>
</li>
<li><strong>Virtual Keys</strong>
<ul>
<li>Fix x-litellm-key-spend header update - <a href="https://github.com/BerriAI/litellm/pull/17864" target="_blank" rel="noopener noreferrer">PR #17864</a></li>
</ul>
</li>
<li><strong>Models &amp; Endpoints</strong>
<ul>
<li>Model Hub Useful Links Rearrange - <a href="https://github.com/BerriAI/litellm/pull/17859" target="_blank" rel="noopener noreferrer">PR #17859</a></li>
<li>Create Team Model Dropdown honors Organization's Models - <a href="https://github.com/BerriAI/litellm/pull/17834" target="_blank" rel="noopener noreferrer">PR #17834</a></li>
</ul>
</li>
<li><strong>SSO &amp; Auth</strong>
<ul>
<li>Allow upserting user role when SSO provider role changes - <a href="https://github.com/BerriAI/litellm/pull/17754" target="_blank" rel="noopener noreferrer">PR #17754</a></li>
<li>Allow fetching role from generic SSO provider (Keycloak) - <a href="https://github.com/BerriAI/litellm/pull/17787" target="_blank" rel="noopener noreferrer">PR #17787</a></li>
<li>JWT Auth - allow selecting team_id from request header - <a href="https://github.com/BerriAI/litellm/pull/17884" target="_blank" rel="noopener noreferrer">PR #17884</a></li>
<li>Remove SSO Config Values from Config Table on SSO Update - <a href="https://github.com/BerriAI/litellm/pull/17668" target="_blank" rel="noopener noreferrer">PR #17668</a></li>
</ul>
</li>
<li><strong>Teams</strong>
<ul>
<li>Attach team to org table - <a href="https://github.com/BerriAI/litellm/pull/17832" target="_blank" rel="noopener noreferrer">PR #17832</a></li>
<li>Expose the team alias when authenticating - <a href="https://github.com/BerriAI/litellm/pull/17725" target="_blank" rel="noopener noreferrer">PR #17725</a></li>
</ul>
</li>
<li><strong>MCP Server Management</strong>
<ul>
<li>Add extra_headers and allowed_tools to UpdateMCPServerRequest - <a href="https://github.com/BerriAI/litellm/pull/17940" target="_blank" rel="noopener noreferrer">PR #17940</a></li>
</ul>
</li>
<li><strong>Notifications</strong>
<ul>
<li>Show progress and pause on hover for Notifications - <a href="https://github.com/BerriAI/litellm/pull/17942" target="_blank" rel="noopener noreferrer">PR #17942</a></li>
</ul>
</li>
<li><strong>General</strong>
<ul>
<li>Allow Root Path to Redirect when Docs not on Root Path - <a href="https://github.com/BerriAI/litellm/pull/16843" target="_blank" rel="noopener noreferrer">PR #16843</a></li>
<li>Show UI version number on top left near logo - <a href="https://github.com/BerriAI/litellm/pull/17891" target="_blank" rel="noopener noreferrer">PR #17891</a></li>
<li>Re-organize left navigation with correct categories and agents on root - <a href="https://github.com/BerriAI/litellm/pull/17890" target="_blank" rel="noopener noreferrer">PR #17890</a></li>
<li>UI Playground - allow custom model names in model selector dropdown - <a href="https://github.com/BerriAI/litellm/pull/17892" target="_blank" rel="noopener noreferrer">PR #17892</a></li>
</ul>
</li>
</ul>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="bugs-1">Bugs<a href="https://docs.litellm.ai/release_notes/v1-80-10#bugs-1" class="hash-link" aria-label="Direct link to Bugs" title="Direct link to Bugs">​</a></h4>
<ul>
<li><strong>UI Fixes</strong>
<ul>
<li>Fix links + old login page deprecation message - <a href="https://github.com/BerriAI/litellm/pull/17624" target="_blank" rel="noopener noreferrer">PR #17624</a></li>
<li>Filtering for Chat UI Endpoint Selector - <a href="https://github.com/BerriAI/litellm/pull/17567" target="_blank" rel="noopener noreferrer">PR #17567</a></li>
<li>Race Condition Handling in SCIM v2 - <a href="https://github.com/BerriAI/litellm/pull/17513" target="_blank" rel="noopener noreferrer">PR #17513</a></li>
<li>Make /litellm_model_cost_map public - <a href="https://github.com/BerriAI/litellm/pull/16795" target="_blank" rel="noopener noreferrer">PR #16795</a></li>
<li>Custom Callback on UI - <a href="https://github.com/BerriAI/litellm/pull/17522" target="_blank" rel="noopener noreferrer">PR #17522</a></li>
<li>Add User Writable Directory to Non Root Docker for Logo - <a href="https://github.com/BerriAI/litellm/pull/17180" target="_blank" rel="noopener noreferrer">PR #17180</a></li>
<li>Swap URL Input and Display Name inputs - <a href="https://github.com/BerriAI/litellm/pull/17682" target="_blank" rel="noopener noreferrer">PR #17682</a></li>
<li>Change deprecation banner to only show on /sso/key/generate - <a href="https://github.com/BerriAI/litellm/pull/17681" target="_blank" rel="noopener noreferrer">PR #17681</a></li>
<li>Change credential encryption to only affect db credentials - <a href="https://github.com/BerriAI/litellm/pull/17741" target="_blank" rel="noopener noreferrer">PR #17741</a></li>
</ul>
</li>
<li><strong>Auth &amp; Routes</strong>
<ul>
<li>Return 403 instead of 503 for unauthorized routes - <a href="https://github.com/BerriAI/litellm/pull/17723" target="_blank" rel="noopener noreferrer">PR #17723</a></li>
<li>AI Gateway Auth - allow using wildcard patterns for public routes - <a href="https://github.com/BerriAI/litellm/pull/17686" target="_blank" rel="noopener noreferrer">PR #17686</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="ai-integrations">AI Integrations<a href="https://docs.litellm.ai/release_notes/v1-80-10#ai-integrations" class="hash-link" aria-label="Direct link to AI Integrations" title="Direct link to AI Integrations">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="new-integrations-4-new-integrations">New Integrations (4 new integrations)<a href="https://docs.litellm.ai/release_notes/v1-80-10#new-integrations-4-new-integrations" class="hash-link" aria-label="Direct link to New Integrations (4 new integrations)" title="Direct link to New Integrations (4 new integrations)">​</a></h3>
<table><thead><tr><th>Integration</th><th>Type</th><th>Description</th></tr></thead><tbody><tr><td><a href="https://docs.litellm.ai/docs/proxy/logging#sumologic">SumoLogic</a></td><td>Logging</td><td>Native webhook integration for SumoLogic - <a href="https://github.com/BerriAI/litellm/pull/17630" target="_blank" rel="noopener noreferrer">PR #17630</a></td></tr><tr><td><a href="https://docs.litellm.ai/docs/proxy/arize_phoenix_prompts">Arize Phoenix</a></td><td>Prompt Management</td><td>Arize Phoenix OSS prompt management integration - <a href="https://github.com/BerriAI/litellm/pull/17750" target="_blank" rel="noopener noreferrer">PR #17750</a></td></tr><tr><td><a href="https://docs.litellm.ai/docs/proxy/email">Sendgrid</a></td><td>Email</td><td>Sendgrid email notifications integration - <a href="https://github.com/BerriAI/litellm/pull/17775" target="_blank" rel="noopener noreferrer">PR #17775</a></td></tr><tr><td><a href="https://docs.litellm.ai/docs/proxy/guardrails/onyx_security">Onyx</a></td><td>Guardrails</td><td>Onyx guardrail hooks integration - <a href="https://github.com/BerriAI/litellm/pull/16591" target="_blank" rel="noopener noreferrer">PR #16591</a></td></tr></tbody></table>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="logging">Logging<a href="https://docs.litellm.ai/release_notes/v1-80-10#logging" class="hash-link" aria-label="Direct link to Logging" title="Direct link to Logging">​</a></h3>
<ul>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/logging#langfuse">Langfuse</a></strong>
<ul>
<li>Propagate Langfuse trace_id - <a href="https://github.com/BerriAI/litellm/pull/17669" target="_blank" rel="noopener noreferrer">PR #17669</a></li>
<li>Prefer standard trace id for Langfuse logging - <a href="https://github.com/BerriAI/litellm/pull/17791" target="_blank" rel="noopener noreferrer">PR #17791</a></li>
<li>Move query params to create_pass_through_route call in Langfuse passthrough - <a href="https://github.com/BerriAI/litellm/pull/17660" target="_blank" rel="noopener noreferrer">PR #17660</a></li>
<li>Add support for custom masking function - <a href="https://github.com/BerriAI/litellm/pull/17826" target="_blank" rel="noopener noreferrer">PR #17826</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/logging#prometheus">Prometheus</a></strong>
<ul>
<li>Add 'exception_status' to prometheus logger - <a href="https://github.com/BerriAI/litellm/pull/17847" target="_blank" rel="noopener noreferrer">PR #17847</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/logging#otel">OpenTelemetry</a></strong>
<ul>
<li>Add latency metrics (TTFT, TPOT, Total Generation Time) to OTEL payload - <a href="https://github.com/BerriAI/litellm/pull/17888" target="_blank" rel="noopener noreferrer">PR #17888</a></li>
</ul>
</li>
<li><strong>General</strong>
<ul>
<li>Add polling via cache feature for async logging - <a href="https://github.com/BerriAI/litellm/pull/16862" target="_blank" rel="noopener noreferrer">PR #16862</a></li>
</ul>
</li>
</ul>
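<p>The Langfuse custom masking hook above runs over payloads before they are logged. A self-contained sketch of what such a masking function might look like (the field names and redaction pattern here are assumptions for illustration):</p>

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def mask_payload(value):
    """Recursively redact email addresses before a payload is sent to a logger."""
    if isinstance(value, str):
        return EMAIL.sub("<redacted-email>", value)
    if isinstance(value, dict):
        return {k: mask_payload(v) for k, v in value.items()}
    if isinstance(value, list):
        return [mask_payload(v) for v in value]
    return value

print(mask_payload({"input": "contact alice@example.com"}))
# {'input': 'contact <redacted-email>'}
```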
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="guardrails">Guardrails<a href="https://docs.litellm.ai/release_notes/v1-80-10#guardrails" class="hash-link" aria-label="Direct link to Guardrails" title="Direct link to Guardrails">​</a></h3>
<ul>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails/hiddenlayer">HiddenLayer</a></strong>
<ul>
<li>Add HiddenLayer Guardrail Hooks - <a href="https://github.com/BerriAI/litellm/pull/17728" target="_blank" rel="noopener noreferrer">PR #17728</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails/pillar_security">Pillar Security</a></strong>
<ul>
<li>Add opt-in evidence results for Pillar Security guardrail during monitoring - <a href="https://github.com/BerriAI/litellm/pull/17812" target="_blank" rel="noopener noreferrer">PR #17812</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails/panw_prisma_airs">PANW Prisma AIRS</a></strong>
<ul>
<li>Add configurable fail-open, timeout, and app_user tracking - <a href="https://github.com/BerriAI/litellm/pull/17785" target="_blank" rel="noopener noreferrer">PR #17785</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails/pii_masking_v2">Presidio</a></strong>
<ul>
<li>Add support for configurable confidence score thresholds and scope in Presidio PII masking - <a href="https://github.com/BerriAI/litellm/pull/17817" target="_blank" rel="noopener noreferrer">PR #17817</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails/litellm_content_filter">LiteLLM Content Filter</a></strong>
<ul>
<li>Mask all regex pattern matches, not just first - <a href="https://github.com/BerriAI/litellm/pull/17727" target="_blank" rel="noopener noreferrer">PR #17727</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails/secret_detection">Regex Guardrails</a></strong>
<ul>
<li>Add enhanced regex pattern matching for guardrails - <a href="https://github.com/BerriAI/litellm/pull/17915" target="_blank" rel="noopener noreferrer">PR #17915</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails/grayswan">Gray Swan Guardrail</a></strong>
<ul>
<li>Add passthrough mode for model response - <a href="https://github.com/BerriAI/litellm/pull/17102" target="_blank" rel="noopener noreferrer">PR #17102</a></li>
</ul>
</li>
</ul>
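<p>The content-filter fix above masks every regex match rather than only the first. In Python terms this is the difference between <code>re.sub</code> with and without <code>count=1</code>:</p>

```python
import re

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
text = "ids 111-22-3333 and 444-55-6666"

print(SSN.sub("[MASKED]", text, count=1))  # only the first match masked (old behavior)
print(SSN.sub("[MASKED]", text))           # every match masked (the fixed behavior)
```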
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="prompt-management">Prompt Management<a href="https://docs.litellm.ai/release_notes/v1-80-10#prompt-management" class="hash-link" aria-label="Direct link to Prompt Management" title="Direct link to Prompt Management">​</a></h3>
<ul>
<li><strong>General</strong>
<ul>
<li>New API for integrating prompt management providers - <a href="https://github.com/BerriAI/litellm/pull/17829" target="_blank" rel="noopener noreferrer">PR #17829</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="spend-tracking-budgets-and-rate-limiting">Spend Tracking, Budgets and Rate Limiting<a href="https://docs.litellm.ai/release_notes/v1-80-10#spend-tracking-budgets-and-rate-limiting" class="hash-link" aria-label="Direct link to Spend Tracking, Budgets and Rate Limiting" title="Direct link to Spend Tracking, Budgets and Rate Limiting">​</a></h2>
<ul>
<li><strong>Service Tier Pricing</strong> - Extract service_tier from response/usage for OpenAI flex pricing - <a href="https://github.com/BerriAI/litellm/pull/17748" target="_blank" rel="noopener noreferrer">PR #17748</a></li>
<li><strong>Agent Cost Tracking</strong> - Track agent_id in SpendLogs - <a href="https://github.com/BerriAI/litellm/pull/17795" target="_blank" rel="noopener noreferrer">PR #17795</a></li>
<li><strong>Tag Activity</strong> - Deduplicate /tag/daily/activity metadata - <a href="https://github.com/BerriAI/litellm/pull/16764" target="_blank" rel="noopener noreferrer">PR #16764</a></li>
<li><strong>Rate Limiting</strong> - Dynamic Rate Limiter - allow specifying ttl for in memory cache - <a href="https://github.com/BerriAI/litellm/pull/17679" target="_blank" rel="noopener noreferrer">PR #17679</a></li>
</ul>
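<p>The dynamic rate limiter change above exposes a TTL for its in-memory cache. The core mechanism is an expiry timestamp stored alongside each entry, with lazy eviction on read — a generic sketch of that pattern, not the proxy's actual cache class:</p>

```python
import time

class TTLCache:
    """Minimal in-memory cache where each entry expires after a TTL."""

    def __init__(self, default_ttl: float):
        self.default_ttl = default_ttl
        self._store = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl=None):
        self._store[key] = (value, time.monotonic() + (ttl or self.default_ttl))

    def get(self, key):
        item = self._store.get(key)
        if item is None:
            return None
        value, expires_at = item
        if time.monotonic() >= expires_at:
            del self._store[key]  # lazily evict expired entries on read
            return None
        return value

cache = TTLCache(default_ttl=0.05)
cache.set("tpm:key-1", 42)
print(cache.get("tpm:key-1"))  # 42
time.sleep(0.06)
print(cache.get("tpm:key-1"))  # None (expired)
```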
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="mcp-gateway">MCP Gateway<a href="https://docs.litellm.ai/release_notes/v1-80-10#mcp-gateway" class="hash-link" aria-label="Direct link to MCP Gateway" title="Direct link to MCP Gateway">​</a></h2>
<ul>
<li><strong>Chat Completions Integration</strong> - Add support for using MCPs on /chat/completions - <a href="https://github.com/BerriAI/litellm/pull/17747" target="_blank" rel="noopener noreferrer">PR #17747</a></li>
<li><strong>UI Session Permissions</strong> - Fix UI session MCP permissions across real teams - <a href="https://github.com/BerriAI/litellm/pull/17620" target="_blank" rel="noopener noreferrer">PR #17620</a></li>
<li><strong>OAuth Callback</strong> - Fix MCP OAuth callback routing and URL handling - <a href="https://github.com/BerriAI/litellm/pull/17789" target="_blank" rel="noopener noreferrer">PR #17789</a></li>
<li><strong>Tool Name Prefix</strong> - Fix MCP tool name prefix - <a href="https://github.com/BerriAI/litellm/pull/17908" target="_blank" rel="noopener noreferrer">PR #17908</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="agent-gateway-a2a">Agent Gateway (A2A)<a href="https://docs.litellm.ai/release_notes/v1-80-10#agent-gateway-a2a" class="hash-link" aria-label="Direct link to Agent Gateway (A2A)" title="Direct link to Agent Gateway (A2A)">​</a></h2>
<ul>
<li><strong>Cost Per Query</strong> - Add cost per query for agent invocations - <a href="https://github.com/BerriAI/litellm/pull/17774" target="_blank" rel="noopener noreferrer">PR #17774</a></li>
<li><strong>Token Counting</strong> - Add token counting for streaming and non-streaming requests - <a href="https://github.com/BerriAI/litellm/pull/17779" target="_blank" rel="noopener noreferrer">PR #17779</a></li>
<li><strong>Cost Per Token</strong> - Add cost per token pricing for A2A - <a href="https://github.com/BerriAI/litellm/pull/17780" target="_blank" rel="noopener noreferrer">PR #17780</a></li>
<li><strong>LangGraph Provider</strong> - Add LangGraph provider for Agent Gateway - <a href="https://github.com/BerriAI/litellm/pull/17783" target="_blank" rel="noopener noreferrer">PR #17783</a></li>
<li><strong>Bedrock &amp; LangGraph Agents</strong> - Allow using Bedrock AgentCore, LangGraph agents with A2A Gateway - <a href="https://github.com/BerriAI/litellm/pull/17786" target="_blank" rel="noopener noreferrer">PR #17786</a></li>
<li><strong>Agent Management</strong> - Allow adding LangGraph and Bedrock AgentCore agents - <a href="https://github.com/BerriAI/litellm/pull/17802" target="_blank" rel="noopener noreferrer">PR #17802</a></li>
<li><strong>Azure Foundry Agents</strong> - Add Azure AI Foundry Agents support - <a href="https://github.com/BerriAI/litellm/pull/17845" target="_blank" rel="noopener noreferrer">PR #17845</a></li>
<li><strong>Azure Foundry UI</strong> - Allow adding Azure Foundry Agents on UI - <a href="https://github.com/BerriAI/litellm/pull/17909" target="_blank" rel="noopener noreferrer">PR #17909</a></li>
<li><strong>Azure Foundry Fixes</strong> - Ensure Azure Foundry agents work correctly - <a href="https://github.com/BerriAI/litellm/pull/17943" target="_blank" rel="noopener noreferrer">PR #17943</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="performance--loadbalancing--reliability-improvements">Performance / Loadbalancing / Reliability improvements<a href="https://docs.litellm.ai/release_notes/v1-80-10#performance--loadbalancing--reliability-improvements" class="hash-link" aria-label="Direct link to Performance / Loadbalancing / Reliability improvements" title="Direct link to Performance / Loadbalancing / Reliability improvements">​</a></h2>
<ul>
<li><strong>Memory Leak Fix</strong> - Cut memory leak in half - <a href="https://github.com/BerriAI/litellm/pull/17784" target="_blank" rel="noopener noreferrer">PR #17784</a></li>
<li><strong>Spend Logs Memory</strong> - Reduce memory accumulation of spend_logs - <a href="https://github.com/BerriAI/litellm/pull/17742" target="_blank" rel="noopener noreferrer">PR #17742</a></li>
<li><strong>Router Optimization</strong> - Replace time.perf_counter() with time.time() - <a href="https://github.com/BerriAI/litellm/pull/17881" target="_blank" rel="noopener noreferrer">PR #17881</a></li>
<li><strong>Filter Internal Params</strong> - Filter internal params in fallback code - <a href="https://github.com/BerriAI/litellm/pull/17941" target="_blank" rel="noopener noreferrer">PR #17941</a></li>
<li><strong>Gunicorn Suggestion</strong> - Suggest Gunicorn instead of uvicorn when using max_requests_before_restart - <a href="https://github.com/BerriAI/litellm/pull/17788" target="_blank" rel="noopener noreferrer">PR #17788</a></li>
<li><strong>Pydantic Warnings</strong> - Mitigate PydanticDeprecatedSince20 warnings - <a href="https://github.com/BerriAI/litellm/pull/17657" target="_blank" rel="noopener noreferrer">PR #17657</a></li>
<li><strong>Python 3.14 Support</strong> - Add Python 3.14 support via grpcio version constraints - <a href="https://github.com/BerriAI/litellm/pull/17666" target="_blank" rel="noopener noreferrer">PR #17666</a></li>
<li><strong>OpenAI Package</strong> - Bump openai package to 2.9.0 - <a href="https://github.com/BerriAI/litellm/pull/17818" target="_blank" rel="noopener noreferrer">PR #17818</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="documentation-updates">Documentation Updates<a href="https://docs.litellm.ai/release_notes/v1-80-10#documentation-updates" class="hash-link" aria-label="Direct link to Documentation Updates" title="Direct link to Documentation Updates">​</a></h2>
<ul>
<li><strong>Contributing</strong> - Update clone instructions to recommend forking first - <a href="https://github.com/BerriAI/litellm/pull/17637" target="_blank" rel="noopener noreferrer">PR #17637</a></li>
<li><strong>Getting Started</strong> - Improve Getting Started page and SDK documentation structure - <a href="https://github.com/BerriAI/litellm/pull/17614" target="_blank" rel="noopener noreferrer">PR #17614</a></li>
<li><strong>JSON Mode</strong> - Make it clearer how to get Pydantic model output - <a href="https://github.com/BerriAI/litellm/pull/17671" target="_blank" rel="noopener noreferrer">PR #17671</a></li>
<li><strong>drop_params</strong> - Update litellm docs for drop_params - <a href="https://github.com/BerriAI/litellm/pull/17658" target="_blank" rel="noopener noreferrer">PR #17658</a></li>
<li><strong>Environment Variables</strong> - Document missing environment variables and fix incorrect types - <a href="https://github.com/BerriAI/litellm/pull/17649" target="_blank" rel="noopener noreferrer">PR #17649</a></li>
<li><strong>SumoLogic</strong> - Add SumoLogic integration documentation - <a href="https://github.com/BerriAI/litellm/pull/17647" target="_blank" rel="noopener noreferrer">PR #17647</a></li>
<li><strong>SAP Gen AI</strong> - Add SAP Gen AI provider documentation - <a href="https://github.com/BerriAI/litellm/pull/17667" target="_blank" rel="noopener noreferrer">PR #17667</a></li>
<li><strong>Authentication</strong> - Add Note for Authentication - <a href="https://github.com/BerriAI/litellm/pull/17733" target="_blank" rel="noopener noreferrer">PR #17733</a></li>
<li><strong>Known Issues</strong> - Add known issues to the 1.80.5-stable docs - <a href="https://github.com/BerriAI/litellm/pull/17738" target="_blank" rel="noopener noreferrer">PR #17738</a></li>
<li><strong>Supported Endpoints</strong> - Fix Supported Endpoints page - <a href="https://github.com/BerriAI/litellm/pull/17710" target="_blank" rel="noopener noreferrer">PR #17710</a></li>
<li><strong>Token Count</strong> - Document token count endpoint - <a href="https://github.com/BerriAI/litellm/pull/17772" target="_blank" rel="noopener noreferrer">PR #17772</a></li>
<li><strong>Overview</strong> - Clarify the difference between the LiteLLM proxy and SDK in the overview with a comparison table - <a href="https://github.com/BerriAI/litellm/pull/17790" target="_blank" rel="noopener noreferrer">PR #17790</a></li>
<li><strong>Containers API</strong> - Add docs for containers files API + code interpreter on LiteLLM - <a href="https://github.com/BerriAI/litellm/pull/17749" target="_blank" rel="noopener noreferrer">PR #17749</a></li>
<li><strong>Target Storage</strong> - Add documentation for target storage - <a href="https://github.com/BerriAI/litellm/pull/17882" target="_blank" rel="noopener noreferrer">PR #17882</a></li>
<li><strong>Agent Usage</strong> - Agent Usage documentation - <a href="https://github.com/BerriAI/litellm/pull/17931" target="_blank" rel="noopener noreferrer">PR #17931</a>, <a href="https://github.com/BerriAI/litellm/pull/17932" target="_blank" rel="noopener noreferrer">PR #17932</a>, <a href="https://github.com/BerriAI/litellm/pull/17934" target="_blank" rel="noopener noreferrer">PR #17934</a></li>
<li><strong>Cursor Integration</strong> - Cursor Integration documentation - <a href="https://github.com/BerriAI/litellm/pull/17855" target="_blank" rel="noopener noreferrer">PR #17855</a>, <a href="https://github.com/BerriAI/litellm/pull/17939" target="_blank" rel="noopener noreferrer">PR #17939</a></li>
<li><strong>A2A Cost Tracking</strong> - A2A cost tracking docs - <a href="https://github.com/BerriAI/litellm/pull/17913" target="_blank" rel="noopener noreferrer">PR #17913</a></li>
<li><strong>Azure Search</strong> - Update azure search docs - <a href="https://github.com/BerriAI/litellm/pull/17726" target="_blank" rel="noopener noreferrer">PR #17726</a></li>
<li><strong>Milvus Client</strong> - Fix milvus client docs - <a href="https://github.com/BerriAI/litellm/pull/17736" target="_blank" rel="noopener noreferrer">PR #17736</a></li>
<li><strong>Streaming Logging</strong> - Remove streaming logging doc - <a href="https://github.com/BerriAI/litellm/pull/17739" target="_blank" rel="noopener noreferrer">PR #17739</a></li>
<li><strong>Integration Docs</strong> - Update integration docs location - <a href="https://github.com/BerriAI/litellm/pull/17644" target="_blank" rel="noopener noreferrer">PR #17644</a></li>
<li><strong>Links</strong> - Update docs links for Mistral and Anthropic - <a href="https://github.com/BerriAI/litellm/pull/17852" target="_blank" rel="noopener noreferrer">PR #17852</a></li>
<li><strong>Community</strong> - Add community doc link - <a href="https://github.com/BerriAI/litellm/pull/17734" target="_blank" rel="noopener noreferrer">PR #17734</a></li>
<li><strong>Pricing</strong> - Update pricing for global.anthropic.claude-haiku-4-5-20251001-v1:0 - <a href="https://github.com/BerriAI/litellm/pull/17703" target="_blank" rel="noopener noreferrer">PR #17703</a></li>
<li><strong>gpt-image-1-mini</strong> - Correct model type for gpt-image-1-mini - <a href="https://github.com/BerriAI/litellm/pull/17635" target="_blank" rel="noopener noreferrer">PR #17635</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="infrastructure--deployment">Infrastructure / Deployment<a href="https://docs.litellm.ai/release_notes/v1-80-10#infrastructure--deployment" class="hash-link" aria-label="Direct link to Infrastructure / Deployment" title="Direct link to Infrastructure / Deployment">​</a></h2>
<ul>
<li><strong>Docker</strong> - Use python instead of wget for healthcheck in docker-compose.yml - <a href="https://github.com/BerriAI/litellm/pull/17646" target="_blank" rel="noopener noreferrer">PR #17646</a></li>
<li><strong>Helm Chart</strong> - Add extraResources support for Helm chart deployments - <a href="https://github.com/BerriAI/litellm/pull/17627" target="_blank" rel="noopener noreferrer">PR #17627</a></li>
<li><strong>Helm Versioning</strong> - Add semver prerelease suffix to helm chart versions - <a href="https://github.com/BerriAI/litellm/pull/17678" target="_blank" rel="noopener noreferrer">PR #17678</a></li>
<li><strong>Database Schema</strong> - Add storage_backend and storage_url columns to schema.prisma for target storage feature - <a href="https://github.com/BerriAI/litellm/pull/17936" target="_blank" rel="noopener noreferrer">PR #17936</a></li>
</ul>
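The healthcheck change above replaces `wget` with a call that needs only the Python already present in the image. A minimal sketch of that approach, using just the standard library (the endpoint path and port here are assumptions, not taken from the PR):

```python
import urllib.request

def is_healthy(url: str = "http://localhost:4000/health/liveliness") -> bool:
    """Return True if the proxy answers the liveliness probe with HTTP 200.

    Any network error (connection refused, timeout, DNS failure) is
    treated as unhealthy rather than raised, which is what a container
    healthcheck command wants.
    """
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except OSError:
        return False
```

In a docker-compose healthcheck this would run as a `python -c` one-liner that exits non-zero when the function returns False.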
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-contributors">New Contributors<a href="https://docs.litellm.ai/release_notes/v1-80-10#new-contributors" class="hash-link" aria-label="Direct link to New Contributors" title="Direct link to New Contributors">​</a></h2>
<ul>
<li>@xianzongxie-stripe made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16862" target="_blank" rel="noopener noreferrer">PR #16862</a></li>
<li>@krisxia0506 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17637" target="_blank" rel="noopener noreferrer">PR #17637</a></li>
<li>@chetanchoudhary-sumo made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17630" target="_blank" rel="noopener noreferrer">PR #17630</a></li>
<li>@kevinmarx made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17632" target="_blank" rel="noopener noreferrer">PR #17632</a></li>
<li>@expruc made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17627" target="_blank" rel="noopener noreferrer">PR #17627</a></li>
<li>@rcII made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17626" target="_blank" rel="noopener noreferrer">PR #17626</a></li>
<li>@tamirkiviti13 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16591" target="_blank" rel="noopener noreferrer">PR #16591</a></li>
<li>@Eric84626 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17629" target="_blank" rel="noopener noreferrer">PR #17629</a></li>
<li>@vasilisazayka made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16053" target="_blank" rel="noopener noreferrer">PR #16053</a></li>
<li>@juliettech13 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17663" target="_blank" rel="noopener noreferrer">PR #17663</a></li>
<li>@jason-nance made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17660" target="_blank" rel="noopener noreferrer">PR #17660</a></li>
<li>@yisding made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17671" target="_blank" rel="noopener noreferrer">PR #17671</a></li>
<li>@emilsvennesson made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17656" target="_blank" rel="noopener noreferrer">PR #17656</a></li>
<li>@kumekay made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17646" target="_blank" rel="noopener noreferrer">PR #17646</a></li>
<li>@chenzhaofei01 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17584" target="_blank" rel="noopener noreferrer">PR #17584</a></li>
<li>@shivamrawat1 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17733" target="_blank" rel="noopener noreferrer">PR #17733</a></li>
<li>@ephrimstanley made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17723" target="_blank" rel="noopener noreferrer">PR #17723</a></li>
<li>@hwittenborn made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17743" target="_blank" rel="noopener noreferrer">PR #17743</a></li>
<li>@peterkc made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17727" target="_blank" rel="noopener noreferrer">PR #17727</a></li>
<li>@saisurya237 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17725" target="_blank" rel="noopener noreferrer">PR #17725</a></li>
<li>@Ashton-Sidhu made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17728" target="_blank" rel="noopener noreferrer">PR #17728</a></li>
<li>@CyrusTC made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17810" target="_blank" rel="noopener noreferrer">PR #17810</a></li>
<li>@jichmi made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17703" target="_blank" rel="noopener noreferrer">PR #17703</a></li>
<li>@ryan-crabbe made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17852" target="_blank" rel="noopener noreferrer">PR #17852</a></li>
<li>@nlineback made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17851" target="_blank" rel="noopener noreferrer">PR #17851</a></li>
<li>@butnarurazvan made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17468" target="_blank" rel="noopener noreferrer">PR #17468</a></li>
<li>@yoshi-p27 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17915" target="_blank" rel="noopener noreferrer">PR #17915</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="full-changelog">Full Changelog<a href="https://docs.litellm.ai/release_notes/v1-80-10#full-changelog" class="hash-link" aria-label="Direct link to Full Changelog" title="Direct link to Full Changelog">​</a></h2>
<p><strong><a href="https://github.com/BerriAI/litellm/compare/v1.80.8.rc.1...v1.80.10" target="_blank" rel="noopener noreferrer">View complete changelog on GitHub</a></strong></p>]]></content>
        <author>
            <name>Krrish Dholakia</name>
            <uri>https://www.linkedin.com/in/krish-d/</uri>
        </author>
        <author>
            <name>Ishaan Jaff</name>
            <uri>https://www.linkedin.com/in/reffajnaahsi/</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[v1.80.8-stable - Introducing A2A Agent Gateway]]></title>
        <id>https://docs.litellm.ai/release_notes/v1-80-8</id>
        <link href="https://docs.litellm.ai/release_notes/v1-80-8"/>
        <updated>2025-12-06T10:00:00.000Z</updated>
        <summary type="html"><![CDATA[Deploy this version]]></summary>
        <content type="html"><![CDATA[<h2 class="anchor anchorWithStickyNavbar_LWe7" id="deploy-this-version">Deploy this version<a href="https://docs.litellm.ai/release_notes/v1-80-8#deploy-this-version" class="hash-link" aria-label="Direct link to Deploy this version" title="Direct link to Deploy this version">​</a></h2>
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">Docker</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">Pip</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">docker run litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker run \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-e STORE_MODEL_IN_DB=True \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-p 4000:4000 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">docker.litellm.ai/berriai/litellm:v1.80.8-stable</span><br></span></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">pip install litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">pip install litellm==1.80.8</span><br></span></code></pre></div></div></div></div></div>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="key-highlights">Key Highlights<a href="https://docs.litellm.ai/release_notes/v1-80-8#key-highlights" class="hash-link" aria-label="Direct link to Key Highlights" title="Direct link to Key Highlights">​</a></h2>
<ul>
<li><strong>Agent Gateway (A2A)</strong> - <a href="https://docs.litellm.ai/docs/a2a">Invoke agents through the AI Gateway with request/response logging and access controls</a></li>
<li><strong>Guardrails API v2</strong> - <a href="https://docs.litellm.ai/docs/adding_provider/generic_guardrail_api">Generic Guardrail API with streaming support, structured messages, and tool call checks</a></li>
<li><strong>Customer (End User) Usage UI</strong> - <a href="https://docs.litellm.ai/docs/proxy/customer_usage">Track and visualize end-user spend directly in the dashboard</a></li>
<li><strong>vLLM Batch + Files API</strong> - <a href="https://docs.litellm.ai/docs/batches">Support for batch and files API with vLLM deployments</a></li>
<li><strong>Dynamic Rate Limiting on Teams</strong> - <a href="https://docs.litellm.ai/docs/proxy/team_budgets">Enable dynamic rate limits and priority reservation on team-level</a></li>
<li><strong>Google Cloud Chirp3 HD</strong> - <a href="https://docs.litellm.ai/docs/text_to_speech">New text-to-speech provider with Chirp3 HD voices</a></li>
</ul>
<hr>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="agent-gateway-a2a">Agent Gateway (A2A)<a href="https://docs.litellm.ai/release_notes/v1-80-8#agent-gateway-a2a" class="hash-link" aria-label="Direct link to Agent Gateway (A2A)" title="Direct link to Agent Gateway (A2A)">​</a></h3>
<div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAGCAIAAAB1kpiRAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAt0lEQVR4nGPg5JDi5JBiZ5OQEFeXl9aUl9IQF1NnZxVnYxLmYBVnYGMRZ2UWY2eT4OeSktRzlbQI4uOS5uJTENGw5xXRYBCV0pJRMeHmleeUMrKYusd71QUB00AeUXUduxAJRQsGIREVMSltkBVShtZ9G+xm7Re0DOPmlZMVUxQWUGZgY5FgZRbnYJNiZxQR13WRtw3jYBXn4pHnF1Ti4ZZl4GCXgiMuFlFuZmEOdpBL2Vgl2NkkAShQFaYvCJPVAAAAAElFTkSuQmCC&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="640" height="381"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/assets/ideal-img/a2a_gateway.6f8c840.640.png" srcset="/assets/ideal-img/a2a_gateway.6f8c840.640.png 640w,/assets/ideal-img/a2a_gateway.2ce6ee2.1344.png 1344w" width="640" height="381"></noscript></div>
<br>
<p>This release introduces <strong>A2A Agent Gateway</strong> for LiteLLM, allowing you to invoke and manage A2A agents with the same controls you have for LLM APIs.</p>
<p>As a <strong>LiteLLM Gateway Admin</strong>, you can now do the following:</p>
<ul>
<li><strong>Request/Response Logging</strong> - Every agent invocation is logged to the Logs page with full request and response tracking.</li>
<li><strong>Access Control</strong> - Control which Team/Key can access which agents.</li>
</ul>
<p>As a developer, you can keep using the A2A SDK; just point your <code>A2AClient</code> at the LiteLLM proxy URL and authenticate with your LiteLLM API key.</p>
<p><strong>Works with the A2A SDK:</strong></p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> a2a</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">client </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> A2AClient</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">client </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> A2AClient</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    base_url</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"http://localhost:4000"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># Your LiteLLM proxy</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    api_key</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"sk-1234"</span><span class="token plain">                   </span><span class="token comment" style="color:#999988;font-style:italic"># LiteLLM API key</span><span class="token plain"></span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> client</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">send_message</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    agent_id</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"my-agent"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    message</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"What's the status of my order?"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<p>Get started with Agent Gateway here: <a href="https://docs.litellm.ai/docs/a2a">Agent Gateway Documentation</a></p>
<hr>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="customer-end-user-usage-ui">Customer (End User) Usage UI<a href="https://docs.litellm.ai/release_notes/v1-80-8#customer-end-user-usage-ui" class="hash-link" aria-label="Direct link to Customer (End User) Usage UI" title="Direct link to Customer (End User) Usage UI">​</a></h3>
<div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAFCAYAAAB8ZH1oAAAACXBIWXMAACxLAAAsSwGlPZapAAAAj0lEQVR4nD2OQQrCMBREc0zP4B28j5cRoe2q4Eq6cFFNq02a/3/ySip08WAYhplx8zzjvSfGSBIhpUo6EBHMFNe1LU3TsCyBEFdUDbWMqO4658x7FNw0TUdjCAFTw6wGFFPdV2IQXDUr/wnjsSTO3cB1+DAmhVIopeDqj7gmKMbNBy79i9P9udP/Vgrw1cwGJMe+tKpOUrcAAAAASUVORK5CYII=&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="640" height="334"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/assets/ideal-img/customer_usage.f3faf2f.640.png" srcset="/assets/ideal-img/customer_usage.f3faf2f.640.png 640w,/assets/ideal-img/customer_usage.b1ecaef.1920.png 1920w" width="640" height="334"></noscript></div>
<p>Users can now filter usage statistics by customers, providing the same granular filtering capabilities available for teams and organizations.</p>
<p><strong>Details:</strong></p>
<ul>
<li>Filter usage analytics, spend logs, and activity metrics by customer ID</li>
<li>View customer-level breakdowns alongside existing team and user-level filters</li>
<li>Consistent filtering experience across all usage and analytics views</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-providers-and-endpoints">New Providers and Endpoints<a href="https://docs.litellm.ai/release_notes/v1-80-8#new-providers-and-endpoints" class="hash-link" aria-label="Direct link to New Providers and Endpoints" title="Direct link to New Providers and Endpoints">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="new-providers-5-new-providers">New Providers (5 new providers)<a href="https://docs.litellm.ai/release_notes/v1-80-8#new-providers-5-new-providers" class="hash-link" aria-label="Direct link to New Providers (5 new providers)" title="Direct link to New Providers (5 new providers)">​</a></h3>
<table><thead><tr><th>Provider</th><th>Supported LiteLLM Endpoints</th><th>Description</th></tr></thead><tbody><tr><td><strong><a href="https://docs.litellm.ai/docs/providers/zai">Z.AI (Zhipu AI)</a></strong></td><td><code>/v1/chat/completions</code>, <code>/v1/responses</code>, <code>/v1/messages</code></td><td>Built-in support for Zhipu AI GLM models</td></tr><tr><td><strong><a href="https://docs.litellm.ai/docs/providers/ragflow">RAGFlow</a></strong></td><td><code>/v1/chat/completions</code>, <code>/v1/responses</code>, <code>/v1/messages</code>, <code>/v1/vector_stores</code></td><td>RAG-based chat completions with vector store support</td></tr><tr><td><strong><a href="https://docs.litellm.ai/docs/providers/publicai">PublicAI</a></strong></td><td><code>/v1/chat/completions</code>, <code>/v1/responses</code>, <code>/v1/messages</code></td><td>OpenAI-compatible provider via JSON config</td></tr><tr><td><strong><a href="https://docs.litellm.ai/docs/text_to_speech">Google Cloud Chirp3 HD</a></strong></td><td><code>/v1/audio/speech</code>, <code>/v1/audio/speech/stream</code></td><td>Text-to-speech with Google Cloud Chirp3 HD voices</td></tr></tbody></table>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="new-llm-api-endpoints-2-new-endpoints">New LLM API Endpoints (2 new endpoints)<a href="https://docs.litellm.ai/release_notes/v1-80-8#new-llm-api-endpoints-2-new-endpoints" class="hash-link" aria-label="Direct link to New LLM API Endpoints (2 new endpoints)" title="Direct link to New LLM API Endpoints (2 new endpoints)">​</a></h3>
<table><thead><tr><th>Endpoint</th><th>Method</th><th>Description</th><th>Documentation</th></tr></thead><tbody><tr><td><code>/v1/agents/invoke</code></td><td>POST</td><td>Invoke A2A agents through the AI Gateway</td><td><a href="https://docs.litellm.ai/docs/a2a">Agent Gateway</a></td></tr><tr><td><code>/cursor/chat/completions</code></td><td>POST</td><td>Cursor BYOK endpoint - accepts Responses API input, returns Chat Completions output</td><td><a href="https://docs.litellm.ai/docs/tutorials/cursor_integration">Cursor Integration</a></td></tr></tbody></table>
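For callers not using the A2A SDK, the new agent endpoint can be hit directly over HTTP. A rough sketch of such a request; the field names <code>agent_id</code> and <code>message</code> mirror the SDK example but are assumptions, so check the Agent Gateway docs for the authoritative schema:

```python
import json

def build_invoke_request(agent_id: str, message: str) -> dict:
    """Hypothetical body for POST /v1/agents/invoke; field names are
    assumed, not taken from the documented schema."""
    return {"agent_id": agent_id, "message": message}

body = build_invoke_request("my-agent", "What's the status of my order?")
headers = {
    "Authorization": "Bearer sk-1234",  # placeholder LiteLLM virtual key
    "Content-Type": "application/json",
}
# e.g. requests.post("http://localhost:4000/v1/agents/invoke",
#                    headers=headers, data=json.dumps(body))
```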
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-models--updated-models">New Models / Updated Models<a href="https://docs.litellm.ai/release_notes/v1-80-8#new-models--updated-models" class="hash-link" aria-label="Direct link to New Models / Updated Models" title="Direct link to New Models / Updated Models">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="new-model-support-33-new-models">New Model Support (33 new models)<a href="https://docs.litellm.ai/release_notes/v1-80-8#new-model-support-33-new-models" class="hash-link" aria-label="Direct link to New Model Support (33 new models)" title="Direct link to New Model Support (33 new models)">​</a></h4>
<table><thead><tr><th>Provider</th><th>Model</th><th>Context Window</th><th>Input ($/1M tokens)</th><th>Output ($/1M tokens)</th><th>Features</th></tr></thead><tbody><tr><td>OpenAI</td><td><code>gpt-5.1-codex-max</code></td><td>400K</td><td>$1.25</td><td>$10.00</td><td>Reasoning, vision, PDF input, responses API</td></tr><tr><td>Azure</td><td><code>azure/gpt-5.1-codex-max</code></td><td>400K</td><td>$1.25</td><td>$10.00</td><td>Reasoning, vision, PDF input, responses API</td></tr><tr><td>Anthropic</td><td><code>claude-opus-4-5</code></td><td>200K</td><td>$5.00</td><td>$25.00</td><td>Computer use, reasoning, vision</td></tr><tr><td>Bedrock</td><td><code>global.anthropic.claude-opus-4-5-20251101-v1:0</code></td><td>200K</td><td>$5.00</td><td>$25.00</td><td>Computer use, reasoning, vision</td></tr><tr><td>Bedrock</td><td><code>amazon.nova-2-lite-v1:0</code></td><td>1M</td><td>$0.30</td><td>$2.50</td><td>Reasoning, vision, video, PDF input</td></tr><tr><td>Bedrock</td><td><code>amazon.titan-image-generator-v2:0</code></td><td>-</td><td>-</td><td>$0.008/image</td><td>Image generation</td></tr><tr><td>Fireworks</td><td><code>fireworks_ai/deepseek-v3p2</code></td><td>164K</td><td>$1.20</td><td>$1.20</td><td>Function calling, response schema</td></tr><tr><td>Fireworks</td><td><code>fireworks_ai/kimi-k2-instruct-0905</code></td><td>262K</td><td>$0.60</td><td>$2.50</td><td>Function calling, response schema</td></tr><tr><td>DeepSeek</td><td><code>deepseek/deepseek-v3.2</code></td><td>164K</td><td>$0.28</td><td>$0.40</td><td>Reasoning, function calling</td></tr><tr><td>Mistral</td><td><code>mistral/mistral-large-3</code></td><td>256K</td><td>$0.50</td><td>$1.50</td><td>Function calling, vision</td></tr><tr><td>Azure AI</td><td><code>azure_ai/mistral-large-3</code></td><td>256K</td><td>$0.50</td><td>$1.50</td><td>Function calling, vision</td></tr><tr><td>Moonshot</td><td><code>moonshot/kimi-k2-0905-preview</code></td><td>262K</td><td>$0.60</td><td>$2.50</td><td>Function 
calling, web search</td></tr><tr><td>Moonshot</td><td><code>moonshot/kimi-k2-turbo-preview</code></td><td>262K</td><td>$1.15</td><td>$8.00</td><td>Function calling, web search</td></tr><tr><td>Moonshot</td><td><code>moonshot/kimi-k2-thinking-turbo</code></td><td>262K</td><td>$1.15</td><td>$8.00</td><td>Function calling, web search</td></tr><tr><td>OpenRouter</td><td><code>openrouter/deepseek/deepseek-v3.2</code></td><td>164K</td><td>$0.28</td><td>$0.40</td><td>Reasoning, function calling</td></tr><tr><td>Databricks</td><td><code>databricks/databricks-claude-haiku-4-5</code></td><td>200K</td><td>$1.00</td><td>$5.00</td><td>Reasoning, function calling</td></tr><tr><td>Databricks</td><td><code>databricks/databricks-claude-opus-4</code></td><td>200K</td><td>$15.00</td><td>$75.00</td><td>Reasoning, function calling</td></tr><tr><td>Databricks</td><td><code>databricks/databricks-claude-opus-4-1</code></td><td>200K</td><td>$15.00</td><td>$75.00</td><td>Reasoning, function calling</td></tr><tr><td>Databricks</td><td><code>databricks/databricks-claude-opus-4-5</code></td><td>200K</td><td>$5.00</td><td>$25.00</td><td>Reasoning, function calling</td></tr><tr><td>Databricks</td><td><code>databricks/databricks-claude-sonnet-4</code></td><td>200K</td><td>$3.00</td><td>$15.00</td><td>Reasoning, function calling</td></tr><tr><td>Databricks</td><td><code>databricks/databricks-claude-sonnet-4-1</code></td><td>200K</td><td>$3.00</td><td>$15.00</td><td>Reasoning, function calling</td></tr><tr><td>Databricks</td><td><code>databricks/databricks-gemini-2-5-flash</code></td><td>1M</td><td>$0.30</td><td>$2.50</td><td>Function calling</td></tr><tr><td>Databricks</td><td><code>databricks/databricks-gemini-2-5-pro</code></td><td>1M</td><td>$1.25</td><td>$10.00</td><td>Function calling</td></tr><tr><td>Databricks</td><td><code>databricks/databricks-gpt-5</code></td><td>400K</td><td>$1.25</td><td>$10.00</td><td>Function 
calling</td></tr><tr><td>Databricks</td><td><code>databricks/databricks-gpt-5-1</code></td><td>400K</td><td>$1.25</td><td>$10.00</td><td>Function calling</td></tr><tr><td>Databricks</td><td><code>databricks/databricks-gpt-5-mini</code></td><td>400K</td><td>$0.25</td><td>$2.00</td><td>Function calling</td></tr><tr><td>Databricks</td><td><code>databricks/databricks-gpt-5-nano</code></td><td>400K</td><td>$0.05</td><td>$0.40</td><td>Function calling</td></tr><tr><td>Vertex AI</td><td><code>vertex_ai/chirp</code></td><td>-</td><td>$30.00/1M chars</td><td>-</td><td>Text-to-speech (Chirp3 HD)</td></tr><tr><td>Z.AI</td><td><code>zai/glm-4.6</code></td><td>200K</td><td>$0.60</td><td>$2.20</td><td>Function calling</td></tr><tr><td>Z.AI</td><td><code>zai/glm-4.5</code></td><td>128K</td><td>$0.60</td><td>$2.20</td><td>Function calling</td></tr><tr><td>Z.AI</td><td><code>zai/glm-4.5v</code></td><td>128K</td><td>$0.60</td><td>$1.80</td><td>Function calling, vision</td></tr><tr><td>Z.AI</td><td><code>zai/glm-4.5-flash</code></td><td>128K</td><td>Free</td><td>Free</td><td>Function calling</td></tr><tr><td>Vertex AI</td><td><code>vertex_ai/bge-large-en-v1.5</code></td><td>-</td><td>-</td><td>-</td><td>BGE Embeddings</td></tr></tbody></table>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features">Features<a href="https://docs.litellm.ai/release_notes/v1-80-8#features" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/openai">OpenAI</a></strong></p>
<ul>
<li>Add <code>gpt-5.1-codex-max</code> model pricing and configuration - <a href="https://github.com/BerriAI/litellm/pull/17541" target="_blank" rel="noopener noreferrer">PR #17541</a></li>
<li>Add xhigh reasoning effort for gpt-5.1-codex-max - <a href="https://github.com/BerriAI/litellm/pull/17585" target="_blank" rel="noopener noreferrer">PR #17585</a></li>
<li>Add clear error message for empty LLM endpoint responses - <a href="https://github.com/BerriAI/litellm/pull/17445" target="_blank" rel="noopener noreferrer">PR #17445</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/azure/azure">Azure OpenAI</a></strong></p>
<ul>
<li>Allow reasoning_effort='none' for Azure gpt-5.1 models - <a href="https://github.com/BerriAI/litellm/pull/17311" target="_blank" rel="noopener noreferrer">PR #17311</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/anthropic">Anthropic</a></strong></p>
<ul>
<li>Add <code>claude-opus-4-5</code> alias to pricing data - <a href="https://github.com/BerriAI/litellm/pull/17313" target="_blank" rel="noopener noreferrer">PR #17313</a></li>
<li>Parse <code>&lt;budget:thinking&gt;</code> blocks for opus 4.5 - <a href="https://github.com/BerriAI/litellm/pull/17534" target="_blank" rel="noopener noreferrer">PR #17534</a></li>
<li>Update new Anthropic features as reviewed - <a href="https://github.com/BerriAI/litellm/pull/17142" target="_blank" rel="noopener noreferrer">PR #17142</a></li>
<li>Skip empty text blocks in Anthropic system messages - <a href="https://github.com/BerriAI/litellm/pull/17442" target="_blank" rel="noopener noreferrer">PR #17442</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/bedrock">Bedrock</a></strong></p>
<ul>
<li>Add Nova embedding support - <a href="https://github.com/BerriAI/litellm/pull/17253" target="_blank" rel="noopener noreferrer">PR #17253</a></li>
<li>Add support for Bedrock Qwen 2 imported model - <a href="https://github.com/BerriAI/litellm/pull/17461" target="_blank" rel="noopener noreferrer">PR #17461</a></li>
<li>Bedrock OpenAI model support - <a href="https://github.com/BerriAI/litellm/pull/17368" target="_blank" rel="noopener noreferrer">PR #17368</a></li>
<li>Add support for file content download for Bedrock batches - <a href="https://github.com/BerriAI/litellm/pull/17470" target="_blank" rel="noopener noreferrer">PR #17470</a></li>
<li>Make streaming chunk size configurable in Bedrock API - <a href="https://github.com/BerriAI/litellm/pull/17357" target="_blank" rel="noopener noreferrer">PR #17357</a></li>
<li>Add experimental latest-user filtering for Bedrock - <a href="https://github.com/BerriAI/litellm/pull/17282" target="_blank" rel="noopener noreferrer">PR #17282</a></li>
<li>Handle Cohere v4 embed response dictionary format - <a href="https://github.com/BerriAI/litellm/pull/17220" target="_blank" rel="noopener noreferrer">PR #17220</a></li>
<li>Remove incompatible beta header from Bedrock - <a href="https://github.com/BerriAI/litellm/pull/17301" target="_blank" rel="noopener noreferrer">PR #17301</a></li>
<li>Add model price and details for Global Opus 4.5 Bedrock endpoint - <a href="https://github.com/BerriAI/litellm/pull/17380" target="_blank" rel="noopener noreferrer">PR #17380</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/gemini">Gemini (Google AI Studio + Vertex AI)</a></strong></p>
<ul>
<li>Add better handling in image generation for Gemini models - <a href="https://github.com/BerriAI/litellm/pull/17292" target="_blank" rel="noopener noreferrer">PR #17292</a></li>
<li>Fix reasoning_content showing duplicate content in streaming responses - <a href="https://github.com/BerriAI/litellm/pull/17266" target="_blank" rel="noopener noreferrer">PR #17266</a></li>
<li>Handle partial JSON chunks after first valid chunk - <a href="https://github.com/BerriAI/litellm/pull/17496" target="_blank" rel="noopener noreferrer">PR #17496</a></li>
<li>Fix Gemini 3 last chunk thinking block - <a href="https://github.com/BerriAI/litellm/pull/17403" target="_blank" rel="noopener noreferrer">PR #17403</a></li>
<li>Fix Gemini image_tokens treated as text tokens in cost calculation - <a href="https://github.com/BerriAI/litellm/pull/17554" target="_blank" rel="noopener noreferrer">PR #17554</a></li>
<li>Ensure media resolution is applied only to Gemini 3 models - <a href="https://github.com/BerriAI/litellm/pull/17137" target="_blank" rel="noopener noreferrer">PR #17137</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/vertex">Vertex AI</a></strong></p>
<ul>
<li>Add Google Cloud Chirp3 HD support on /speech - <a href="https://github.com/BerriAI/litellm/pull/17391" target="_blank" rel="noopener noreferrer">PR #17391</a></li>
<li>Add BGE Embeddings support - <a href="https://github.com/BerriAI/litellm/pull/17362" target="_blank" rel="noopener noreferrer">PR #17362</a></li>
<li>Handle global location for Vertex AI image generation endpoint - <a href="https://github.com/BerriAI/litellm/pull/17255" target="_blank" rel="noopener noreferrer">PR #17255</a></li>
<li>Add Google Private API Endpoint to Vertex AI fields - <a href="https://github.com/BerriAI/litellm/pull/17382" target="_blank" rel="noopener noreferrer">PR #17382</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/zai">Z.AI (Zhipu AI)</a></strong></p>
<ul>
<li>Add Z.AI as built-in provider - <a href="https://github.com/BerriAI/litellm/pull/17307" target="_blank" rel="noopener noreferrer">PR #17307</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/github_copilot">GitHub Copilot</a></strong></p>
<ul>
<li>Add Embedding API support - <a href="https://github.com/BerriAI/litellm/pull/17278" target="_blank" rel="noopener noreferrer">PR #17278</a></li>
<li>Preserve encrypted_content in reasoning items for multi-turn conversations - <a href="https://github.com/BerriAI/litellm/pull/17130" target="_blank" rel="noopener noreferrer">PR #17130</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/databricks">Databricks</a></strong></p>
<ul>
<li>Update Databricks model pricing and add new models - <a href="https://github.com/BerriAI/litellm/pull/17277" target="_blank" rel="noopener noreferrer">PR #17277</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/ovhcloud">OVHcloud</a></strong></p>
<ul>
<li>Add audio transcription support for OVHcloud - <a href="https://github.com/BerriAI/litellm/pull/17305" target="_blank" rel="noopener noreferrer">PR #17305</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/mistral">Mistral</a></strong></p>
<ul>
<li>Add Mistral Large 3 model support - <a href="https://github.com/BerriAI/litellm/pull/17547" target="_blank" rel="noopener noreferrer">PR #17547</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/moonshot">Moonshot</a></strong></p>
<ul>
<li>Add missing Moonshot turbo models and fix incorrect pricing - <a href="https://github.com/BerriAI/litellm/pull/17432" target="_blank" rel="noopener noreferrer">PR #17432</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/togetherai">Together AI</a></strong></p>
<ul>
<li>Add context window exception mapping for Together AI - <a href="https://github.com/BerriAI/litellm/pull/17284" target="_blank" rel="noopener noreferrer">PR #17284</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/watsonx/index">WatsonX</a></strong></p>
<ul>
<li>Allow passing zen_api_key dynamically - <a href="https://github.com/BerriAI/litellm/pull/16655" target="_blank" rel="noopener noreferrer">PR #16655</a></li>
<li>Fix Watsonx Audio Transcription API - <a href="https://github.com/BerriAI/litellm/pull/17326" target="_blank" rel="noopener noreferrer">PR #17326</a></li>
<li>Fix audio transcriptions: don't force Content-Type in request headers - <a href="https://github.com/BerriAI/litellm/pull/17546" target="_blank" rel="noopener noreferrer">PR #17546</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/fireworks_ai">Fireworks AI</a></strong></p>
<ul>
<li>Add new model <code>fireworks_ai/kimi-k2-instruct-0905</code> - <a href="https://github.com/BerriAI/litellm/pull/17328" target="_blank" rel="noopener noreferrer">PR #17328</a></li>
<li>Add <code>fireworks/deepseek-v3p2</code> - <a href="https://github.com/BerriAI/litellm/pull/17395" target="_blank" rel="noopener noreferrer">PR #17395</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/deepseek">DeepSeek</a></strong></p>
<ul>
<li>Support DeepSeek 3.2 with reasoning - <a href="https://github.com/BerriAI/litellm/pull/17384" target="_blank" rel="noopener noreferrer">PR #17384</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/bedrock">Nova Lite 2</a></strong></p>
<ul>
<li>Add Nova Lite 2 reasoning support with reasoningConfig - <a href="https://github.com/BerriAI/litellm/pull/17371" target="_blank" rel="noopener noreferrer">PR #17371</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/ollama">Ollama</a></strong></p>
<ul>
<li>Fix auth not working with ollama.com - <a href="https://github.com/BerriAI/litellm/pull/17191" target="_blank" rel="noopener noreferrer">PR #17191</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/groq">Groq</a></strong></p>
<ul>
<li>Check <code>supports_response_schema</code> before using the <code>json_tool_call</code> workaround - <a href="https://github.com/BerriAI/litellm/pull/17438" target="_blank" rel="noopener noreferrer">PR #17438</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/vllm">vLLM</a></strong></p>
<ul>
<li>Fix empty response + vLLM streaming - <a href="https://github.com/BerriAI/litellm/pull/17516" target="_blank" rel="noopener noreferrer">PR #17516</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/azure_ai">Azure AI</a></strong></p>
<ul>
<li>Migrate Anthropic provider to Azure AI - <a href="https://github.com/BerriAI/litellm/pull/17202" target="_blank" rel="noopener noreferrer">PR #17202</a></li>
<li>Fix GA path for Azure OpenAI realtime models - <a href="https://github.com/BerriAI/litellm/pull/17260" target="_blank" rel="noopener noreferrer">PR #17260</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/bedrock#twelvelabs-pegasus---video-understanding">Bedrock TwelveLabs</a></strong></p>
<ul>
<li>Add support for TwelveLabs Pegasus video understanding - <a href="https://github.com/BerriAI/litellm/pull/17193" target="_blank" rel="noopener noreferrer">PR #17193</a></li>
</ul>
</li>
</ul>
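<p>As a minimal sketch (not an official snippet), the newly added <code>zai/*</code> models above can be routed through the proxy with a standard <code>model_list</code> entry; the <code>ZAI_API_KEY</code> environment variable name is an assumption:</p>

```yaml
model_list:
  - model_name: glm-4.6
    litellm_params:
      model: zai/glm-4.6
      # Key is read from the environment; the variable name here is an assumption
      api_key: os.environ/ZAI_API_KEY
```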
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="bug-fixes">Bug Fixes<a href="https://docs.litellm.ai/release_notes/v1-80-8#bug-fixes" class="hash-link" aria-label="Direct link to Bug Fixes" title="Direct link to Bug Fixes">​</a></h3>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/bedrock">Bedrock</a></strong></p>
<ul>
<li>Fix extra_headers in messages API bedrock invoke - <a href="https://github.com/BerriAI/litellm/pull/17271" target="_blank" rel="noopener noreferrer">PR #17271</a></li>
<li>Fix Bedrock models in model map - <a href="https://github.com/BerriAI/litellm/pull/17419" target="_blank" rel="noopener noreferrer">PR #17419</a></li>
<li>Make Bedrock converse messages respect modify_params as expected - <a href="https://github.com/BerriAI/litellm/pull/17427" target="_blank" rel="noopener noreferrer">PR #17427</a></li>
<li>Fix Anthropic beta headers for Bedrock imported Qwen models - <a href="https://github.com/BerriAI/litellm/pull/17467" target="_blank" rel="noopener noreferrer">PR #17467</a></li>
<li>Preserve usage from JSON response for OpenAI provider in Bedrock - <a href="https://github.com/BerriAI/litellm/pull/17589" target="_blank" rel="noopener noreferrer">PR #17589</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/sambanova">SambaNova</a></strong></p>
<ul>
<li>Fix <code>acompletion</code> throwing an error with SambaNova models - <a href="https://github.com/BerriAI/litellm/pull/17217" target="_blank" rel="noopener noreferrer">PR #17217</a></li>
</ul>
</li>
<li>
<p><strong>General</strong></p>
<ul>
<li>Fix AttributeError when metadata is null in request body - <a href="https://github.com/BerriAI/litellm/pull/17306" target="_blank" rel="noopener noreferrer">PR #17306</a></li>
<li>Fix 500 error for malformed request - <a href="https://github.com/BerriAI/litellm/pull/17291" target="_blank" rel="noopener noreferrer">PR #17291</a></li>
<li>Respect custom LLM provider in header - <a href="https://github.com/BerriAI/litellm/pull/17290" target="_blank" rel="noopener noreferrer">PR #17290</a></li>
<li>Replace deprecated <code>.dict()</code> with <code>.model_dump()</code> in <code>streaming_handler</code> - <a href="https://github.com/BerriAI/litellm/pull/17359" target="_blank" rel="noopener noreferrer">PR #17359</a></li>
</ul>
</li>
</ul>
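<p>The last fix above reflects Pydantic v2's deprecation of <code>.dict()</code> in favor of <code>.model_dump()</code>. A minimal illustration (the <code>Delta</code> model here is a stand-in, not LiteLLM's actual class):</p>

```python
from pydantic import BaseModel

class Delta(BaseModel):
    role: str
    content: str

d = Delta(role="assistant", content="hi")
# Pydantic v2 deprecates .dict(); .model_dump() is the drop-in replacement
print(d.model_dump())  # {'role': 'assistant', 'content': 'hi'}
```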
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="llm-api-endpoints">LLM API Endpoints<a href="https://docs.litellm.ai/release_notes/v1-80-8#llm-api-endpoints" class="hash-link" aria-label="Direct link to LLM API Endpoints" title="Direct link to LLM API Endpoints">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-1">Features<a href="https://docs.litellm.ai/release_notes/v1-80-8#features-1" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/response_api">Responses API</a></strong></p>
<ul>
<li>Add cost tracking for responses API - <a href="https://github.com/BerriAI/litellm/pull/17258" target="_blank" rel="noopener noreferrer">PR #17258</a></li>
<li>Map output_tokens_details of responses API to completion_tokens_details - <a href="https://github.com/BerriAI/litellm/pull/17458" target="_blank" rel="noopener noreferrer">PR #17458</a></li>
<li>Add image generation support for Responses API - <a href="https://github.com/BerriAI/litellm/pull/16586" target="_blank" rel="noopener noreferrer">PR #16586</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/batches">Batch API</a></strong></p>
<ul>
<li>Add vLLM batch+files API support - <a href="https://github.com/BerriAI/litellm/pull/15823" target="_blank" rel="noopener noreferrer">PR #15823</a></li>
<li>Fix optional parameter default value - <a href="https://github.com/BerriAI/litellm/pull/17434" target="_blank" rel="noopener noreferrer">PR #17434</a></li>
<li>Add status parameter as optional for FileObject - <a href="https://github.com/BerriAI/litellm/pull/17431" target="_blank" rel="noopener noreferrer">PR #17431</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/videos">Video Generation API</a></strong></p>
<ul>
<li>Add passthrough cost tracking for Veo - <a href="https://github.com/BerriAI/litellm/pull/17296" target="_blank" rel="noopener noreferrer">PR #17296</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/ocr">OCR API</a></strong></p>
<ul>
<li>Add missing OCR and aOCR to CallTypes enum - <a href="https://github.com/BerriAI/litellm/pull/17435" target="_blank" rel="noopener noreferrer">PR #17435</a></li>
</ul>
</li>
<li>
<p><strong>General</strong></p>
<ul>
<li>Support routing only to deployments that support web search - <a href="https://github.com/BerriAI/litellm/pull/17500" target="_blank" rel="noopener noreferrer">PR #17500</a></li>
</ul>
</li>
</ul>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="bugs">Bugs<a href="https://docs.litellm.ai/release_notes/v1-80-8#bugs" class="hash-link" aria-label="Direct link to Bugs" title="Direct link to Bugs">​</a></h4>
<ul>
<li><strong>General</strong>
<ul>
<li>Fix streaming error validation - <a href="https://github.com/BerriAI/litellm/pull/17242" target="_blank" rel="noopener noreferrer">PR #17242</a></li>
<li>Add length validation for empty tool_calls in delta - <a href="https://github.com/BerriAI/litellm/pull/17523" target="_blank" rel="noopener noreferrer">PR #17523</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="management-endpoints--ui">Management Endpoints / UI<a href="https://docs.litellm.ai/release_notes/v1-80-8#management-endpoints--ui" class="hash-link" aria-label="Direct link to Management Endpoints / UI" title="Direct link to Management Endpoints / UI">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-2">Features<a href="https://docs.litellm.ai/release_notes/v1-80-8#features-2" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong>New Login Page</strong></p>
<ul>
<li>New Login Page UI - <a href="https://github.com/BerriAI/litellm/pull/17443" target="_blank" rel="noopener noreferrer">PR #17443</a></li>
<li>Refactor /login route - <a href="https://github.com/BerriAI/litellm/pull/17379" target="_blank" rel="noopener noreferrer">PR #17379</a></li>
<li>Add auto_redirect_to_sso to UI Config - <a href="https://github.com/BerriAI/litellm/pull/17399" target="_blank" rel="noopener noreferrer">PR #17399</a></li>
<li>Add Auto Redirect to SSO to New Login Page - <a href="https://github.com/BerriAI/litellm/pull/17451" target="_blank" rel="noopener noreferrer">PR #17451</a></li>
</ul>
</li>
<li>
<p><strong>Customer (End User) Usage</strong></p>
<ul>
<li>Customer (end user) Usage feature - <a href="https://github.com/BerriAI/litellm/pull/17498" target="_blank" rel="noopener noreferrer">PR #17498</a></li>
<li>Customer Usage UI - <a href="https://github.com/BerriAI/litellm/pull/17506" target="_blank" rel="noopener noreferrer">PR #17506</a></li>
<li>Add Info Banner for Customer Usage - <a href="https://github.com/BerriAI/litellm/pull/17598" target="_blank" rel="noopener noreferrer">PR #17598</a></li>
</ul>
</li>
<li>
<p><strong>Virtual Keys</strong></p>
<ul>
<li>Standardize API Key vs Virtual Key in UI - <a href="https://github.com/BerriAI/litellm/pull/17325" target="_blank" rel="noopener noreferrer">PR #17325</a></li>
<li>Add User Alias Column to Internal User Table - <a href="https://github.com/BerriAI/litellm/pull/17321" target="_blank" rel="noopener noreferrer">PR #17321</a></li>
<li>Delete Credential Enhancements - <a href="https://github.com/BerriAI/litellm/pull/17317" target="_blank" rel="noopener noreferrer">PR #17317</a></li>
</ul>
</li>
<li>
<p><strong>Models + Endpoints</strong></p>
<ul>
<li>Show all credential values on Edit Credential Modal - <a href="https://github.com/BerriAI/litellm/pull/17397" target="_blank" rel="noopener noreferrer">PR #17397</a></li>
<li>Change Edit Team Models Shown to Match Create Team - <a href="https://github.com/BerriAI/litellm/pull/17394" target="_blank" rel="noopener noreferrer">PR #17394</a></li>
<li>Support Images in Compare UI - <a href="https://github.com/BerriAI/litellm/pull/17562" target="_blank" rel="noopener noreferrer">PR #17562</a></li>
</ul>
</li>
<li>
<p><strong>Callbacks</strong></p>
<ul>
<li>Show all callbacks on UI - <a href="https://github.com/BerriAI/litellm/pull/16335" target="_blank" rel="noopener noreferrer">PR #16335</a></li>
<li>Migrate Credentials to React Query - <a href="https://github.com/BerriAI/litellm/pull/17465" target="_blank" rel="noopener noreferrer">PR #17465</a></li>
</ul>
</li>
<li>
<p><strong>Management Routes</strong></p>
<ul>
<li>Allow admin viewer to access global tag usage - <a href="https://github.com/BerriAI/litellm/pull/17501" target="_blank" rel="noopener noreferrer">PR #17501</a></li>
<li>Allow wildcard routes for non-proxy-admin users (SCIM) - <a href="https://github.com/BerriAI/litellm/pull/17178" target="_blank" rel="noopener noreferrer">PR #17178</a></li>
<li>Return 404 when a user is not found on /user/info - <a href="https://github.com/BerriAI/litellm/pull/16850" target="_blank" rel="noopener noreferrer">PR #16850</a></li>
</ul>
</li>
<li>
<p><strong>OCI Configuration</strong></p>
<ul>
<li>Enable Oracle Cloud Infrastructure configuration via UI - <a href="https://github.com/BerriAI/litellm/pull/17159" target="_blank" rel="noopener noreferrer">PR #17159</a></li>
</ul>
</li>
</ul>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="bugs-1">Bugs<a href="https://docs.litellm.ai/release_notes/v1-80-8#bugs-1" class="hash-link" aria-label="Direct link to Bugs" title="Direct link to Bugs">​</a></h4>
<ul>
<li>
<p><strong>UI Fixes</strong></p>
<ul>
<li>Fix Request and Response Panel JSONViewer - <a href="https://github.com/BerriAI/litellm/pull/17233" target="_blank" rel="noopener noreferrer">PR #17233</a></li>
<li>Add button loading states to Edit Settings - <a href="https://github.com/BerriAI/litellm/pull/17236" target="_blank" rel="noopener noreferrer">PR #17236</a></li>
<li>Various text, button state, and test fixes - <a href="https://github.com/BerriAI/litellm/pull/17237" target="_blank" rel="noopener noreferrer">PR #17237</a></li>
<li>Fix fallbacks being deleted immediately, before the API call resolves - <a href="https://github.com/BerriAI/litellm/pull/17238" target="_blank" rel="noopener noreferrer">PR #17238</a></li>
<li>Remove Feature Flags - <a href="https://github.com/BerriAI/litellm/pull/17240" target="_blank" rel="noopener noreferrer">PR #17240</a></li>
<li>Fix metadata tags and model name display in UI for Azure passthrough - <a href="https://github.com/BerriAI/litellm/pull/17258" target="_blank" rel="noopener noreferrer">PR #17258</a></li>
<li>Change labeling around Vertex Fields - <a href="https://github.com/BerriAI/litellm/pull/17383" target="_blank" rel="noopener noreferrer">PR #17383</a></li>
<li>Remove second scrollbar when sidebar is expanded + tooltip z index - <a href="https://github.com/BerriAI/litellm/pull/17436" target="_blank" rel="noopener noreferrer">PR #17436</a></li>
<li>Fix Select in Edit Membership Modal - <a href="https://github.com/BerriAI/litellm/pull/17524" target="_blank" rel="noopener noreferrer">PR #17524</a></li>
<li>Change useAuthorized Hook to redirect to new Login Page - <a href="https://github.com/BerriAI/litellm/pull/17553" target="_blank" rel="noopener noreferrer">PR #17553</a></li>
</ul>
</li>
<li>
<p><strong>SSO</strong></p>
<ul>
<li>Fix the generic SSO provider - <a href="https://github.com/BerriAI/litellm/pull/17227" target="_blank" rel="noopener noreferrer">PR #17227</a></li>
<li>Clear SSO integration for all users - <a href="https://github.com/BerriAI/litellm/pull/17287" target="_blank" rel="noopener noreferrer">PR #17287</a></li>
<li>Fix SSO users not added to Entra synced team - <a href="https://github.com/BerriAI/litellm/pull/17331" target="_blank" rel="noopener noreferrer">PR #17331</a></li>
</ul>
</li>
<li>
<p><strong>Auth / JWT</strong></p>
<ul>
<li>JWT Auth - Allow using regular OIDC flow with user info endpoints - <a href="https://github.com/BerriAI/litellm/pull/17324" target="_blank" rel="noopener noreferrer">PR #17324</a></li>
<li>Fix LiteLLM user auth not being passed - <a href="https://github.com/BerriAI/litellm/pull/17342" target="_blank" rel="noopener noreferrer">PR #17342</a></li>
<li>Add other routes in JWT auth - <a href="https://github.com/BerriAI/litellm/pull/17345" target="_blank" rel="noopener noreferrer">PR #17345</a></li>
<li>Fix new org team validate against org - <a href="https://github.com/BerriAI/litellm/pull/17333" target="_blank" rel="noopener noreferrer">PR #17333</a></li>
<li>Fix <code>litellm_enterprise</code>: ensure imported routes exist - <a href="https://github.com/BerriAI/litellm/pull/17337" target="_blank" rel="noopener noreferrer">PR #17337</a></li>
<li>Use organization.members instead of deprecated organization field - <a href="https://github.com/BerriAI/litellm/pull/17557" target="_blank" rel="noopener noreferrer">PR #17557</a></li>
</ul>
</li>
<li>
<p><strong>Organizations/Teams</strong></p>
<ul>
<li>Fix organization max budget not enforced - <a href="https://github.com/BerriAI/litellm/pull/17334" target="_blank" rel="noopener noreferrer">PR #17334</a></li>
<li>Fix budget update to allow null max_budget - <a href="https://github.com/BerriAI/litellm/pull/17545" target="_blank" rel="noopener noreferrer">PR #17545</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="ai-integrations-2-new-integrations">AI Integrations (2 new integrations)<a href="https://docs.litellm.ai/release_notes/v1-80-8#ai-integrations-2-new-integrations" class="hash-link" aria-label="Direct link to AI Integrations (2 new integrations)" title="Direct link to AI Integrations (2 new integrations)">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="logging-1-new-integration">Logging (1 new integration)<a href="https://docs.litellm.ai/release_notes/v1-80-8#logging-1-new-integration" class="hash-link" aria-label="Direct link to Logging (1 new integration)" title="Direct link to Logging (1 new integration)">​</a></h3>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="new-integration">New Integration<a href="https://docs.litellm.ai/release_notes/v1-80-8#new-integration" class="hash-link" aria-label="Direct link to New Integration" title="Direct link to New Integration">​</a></h4>
<ul>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/logging">Weave</a></strong>
<ul>
<li>Basic Weave OTEL integration - <a href="https://github.com/BerriAI/litellm/pull/17439" target="_blank" rel="noopener noreferrer">PR #17439</a></li>
</ul>
</li>
</ul>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="improvements--fixes">Improvements &amp; Fixes<a href="https://docs.litellm.ai/release_notes/v1-80-8#improvements--fixes" class="hash-link" aria-label="Direct link to Improvements &amp; Fixes" title="Direct link to Improvements &amp; Fixes">​</a></h4>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/logging#datadog">DataDog</a></strong></p>
<ul>
<li>Fix Datadog callback regression when ddtrace is installed - <a href="https://github.com/BerriAI/litellm/pull/17393" target="_blank" rel="noopener noreferrer">PR #17393</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/observability/arize_integration">Arize Phoenix</a></strong></p>
<ul>
<li>Clean up Arize Phoenix traces - <a href="https://github.com/BerriAI/litellm/pull/16611" target="_blank" rel="noopener noreferrer">PR #16611</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/logging#mlflow">MLflow</a></strong></p>
<ul>
<li>Fix MLflow streaming spans for Anthropic passthrough - <a href="https://github.com/BerriAI/litellm/pull/17288" target="_blank" rel="noopener noreferrer">PR #17288</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/logging#langfuse">Langfuse</a></strong></p>
<ul>
<li>Fix Langfuse logger test mock setup - <a href="https://github.com/BerriAI/litellm/pull/17591" target="_blank" rel="noopener noreferrer">PR #17591</a></li>
</ul>
</li>
<li>
<p><strong>General</strong></p>
<ul>
<li>Improve PII anonymization handling in logging callbacks - <a href="https://github.com/BerriAI/litellm/pull/17207" target="_blank" rel="noopener noreferrer">PR #17207</a></li>
</ul>
</li>
</ul>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="guardrails-1-new-integration">Guardrails (1 new integration)<a href="https://docs.litellm.ai/release_notes/v1-80-8#guardrails-1-new-integration" class="hash-link" aria-label="Direct link to Guardrails (1 new integration)" title="Direct link to Guardrails (1 new integration)">​</a></h3>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="new-integration-1">New Integration<a href="https://docs.litellm.ai/release_notes/v1-80-8#new-integration-1" class="hash-link" aria-label="Direct link to New Integration" title="Direct link to New Integration">​</a></h4>
<ul>
<li><strong><a href="https://docs.litellm.ai/docs/adding_provider/generic_guardrail_api">Generic Guardrail API</a></strong>
<ul>
<li>Generic Guardrail API - lets guardrail providers add instant LiteLLM support without a PR to the repo - <a href="https://github.com/BerriAI/litellm/pull/17175" target="_blank" rel="noopener noreferrer">PR #17175</a></li>
<li>Guardrails API V2 - user api key metadata, session id, specify input type (request/response), image support - <a href="https://github.com/BerriAI/litellm/pull/17338" target="_blank" rel="noopener noreferrer">PR #17338</a></li>
<li>Guardrails API - add streaming support - <a href="https://github.com/BerriAI/litellm/pull/17400" target="_blank" rel="noopener noreferrer">PR #17400</a></li>
<li>Guardrails API - support tool call checks on OpenAI <code>/chat/completions</code>, OpenAI <code>/responses</code>, Anthropic <code>/v1/messages</code> - <a href="https://github.com/BerriAI/litellm/pull/17459" target="_blank" rel="noopener noreferrer">PR #17459</a></li>
<li>Guardrails API - new <code>structured_messages</code> param - <a href="https://github.com/BerriAI/litellm/pull/17518" target="_blank" rel="noopener noreferrer">PR #17518</a></li>
<li>Correctly map a v1/messages call to the anthropic unified guardrail - <a href="https://github.com/BerriAI/litellm/pull/17424" target="_blank" rel="noopener noreferrer">PR #17424</a></li>
<li>Support during_call event type for unified guardrails - <a href="https://github.com/BerriAI/litellm/pull/17514" target="_blank" rel="noopener noreferrer">PR #17514</a></li>
</ul>
</li>
</ul>
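<p>As a hedged sketch of how a guardrail like this is wired up in the proxy config (the <code>guardrail</code> identifier and <code>api_base</code> below are placeholders, not confirmed values; <code>during_call</code> support is per PR #17514):</p>

```yaml
guardrails:
  - guardrail_name: "my-generic-guardrail"
    litellm_params:
      guardrail: generic_guardrail_api   # hypothetical identifier for illustration
      mode: during_call                  # pre_call and post_call are the other event types
      api_base: https://guardrail.example.com
```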
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="improvements--fixes-1">Improvements &amp; Fixes<a href="https://docs.litellm.ai/release_notes/v1-80-8#improvements--fixes-1" class="hash-link" aria-label="Direct link to Improvements &amp; Fixes" title="Direct link to Improvements &amp; Fixes">​</a></h4>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails/noma_security">Noma Guardrail</a></strong></p>
<ul>
<li>Refactor Noma guardrail to use shared Responses transformation and include system instructions - <a href="https://github.com/BerriAI/litellm/pull/17315" target="_blank" rel="noopener noreferrer">PR #17315</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails/pii_masking_v2">Presidio</a></strong></p>
<ul>
<li>Handle empty content and error dict responses in guardrails - <a href="https://github.com/BerriAI/litellm/pull/17489" target="_blank" rel="noopener noreferrer">PR #17489</a></li>
<li>Fix Presidio guardrail test TypeError and license base64 decoding error - <a href="https://github.com/BerriAI/litellm/pull/17538" target="_blank" rel="noopener noreferrer">PR #17538</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails/tool_permission">Tool Permissions</a></strong></p>
<ul>
<li>Add regex-based tool_name/tool_type matching for tool-permission - <a href="https://github.com/BerriAI/litellm/pull/17164" target="_blank" rel="noopener noreferrer">PR #17164</a></li>
<li>Add images for tool permission guardrail documentation - <a href="https://github.com/BerriAI/litellm/pull/17322" target="_blank" rel="noopener noreferrer">PR #17322</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails/aim_security">AIM Guardrails</a></strong></p>
<ul>
<li>Fix AIM guardrail tests - <a href="https://github.com/BerriAI/litellm/pull/17499" target="_blank" rel="noopener noreferrer">PR #17499</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails/bedrock">Bedrock Guardrails</a></strong></p>
<ul>
<li>Fix Bedrock Guardrail indent and import - <a href="https://github.com/BerriAI/litellm/pull/17378" target="_blank" rel="noopener noreferrer">PR #17378</a></li>
</ul>
</li>
<li>
<p><strong>General Guardrails</strong></p>
<ul>
<li>Mask all matching keywords in content filter - <a href="https://github.com/BerriAI/litellm/pull/17521" target="_blank" rel="noopener noreferrer">PR #17521</a></li>
<li>Ensure guardrail metadata is preserved in request_data - <a href="https://github.com/BerriAI/litellm/pull/17593" target="_blank" rel="noopener noreferrer">PR #17593</a></li>
<li>Fix apply_guardrail method and improve test isolation - <a href="https://github.com/BerriAI/litellm/pull/17555" target="_blank" rel="noopener noreferrer">PR #17555</a></li>
</ul>
</li>
</ul>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="secret-managers">Secret Managers<a href="https://docs.litellm.ai/release_notes/v1-80-8#secret-managers" class="hash-link" aria-label="Direct link to Secret Managers" title="Direct link to Secret Managers">​</a></h3>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/secret_managers/cyberark">CyberArk</a></strong></p>
<ul>
<li>Allow setting SSL verify to false - <a href="https://github.com/BerriAI/litellm/pull/17433" target="_blank" rel="noopener noreferrer">PR #17433</a></li>
</ul>
</li>
<li>
<p><strong>General</strong></p>
<ul>
<li>Make email and secret manager operations independent in key management hooks - <a href="https://github.com/BerriAI/litellm/pull/17551" target="_blank" rel="noopener noreferrer">PR #17551</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="spend-tracking-budgets-and-rate-limiting">Spend Tracking, Budgets and Rate Limiting<a href="https://docs.litellm.ai/release_notes/v1-80-8#spend-tracking-budgets-and-rate-limiting" class="hash-link" aria-label="Direct link to Spend Tracking, Budgets and Rate Limiting" title="Direct link to Spend Tracking, Budgets and Rate Limiting">​</a></h2>
<ul>
<li>
<p><strong>Rate Limiting</strong></p>
<ul>
<li>Parallel Request Limiter with /messages - <a href="https://github.com/BerriAI/litellm/pull/17426" target="_blank" rel="noopener noreferrer">PR #17426</a></li>
<li>Allow using dynamic rate limit/priority reservation on teams - <a href="https://github.com/BerriAI/litellm/pull/17061" target="_blank" rel="noopener noreferrer">PR #17061</a></li>
<li>Dynamic Rate Limiter - Fix token count incrementing/decrementing by 1 instead of the actual count, and add Redis TTL - <a href="https://github.com/BerriAI/litellm/pull/17558" target="_blank" rel="noopener noreferrer">PR #17558</a></li>
</ul>
</li>
<li>
<p><strong>Spend Logs</strong></p>
<ul>
<li>Deprecate <code>spend/logs</code> &amp; add <code>spend/logs/v2</code> - <a href="https://github.com/BerriAI/litellm/pull/17167" target="_blank" rel="noopener noreferrer">PR #17167</a></li>
<li>Optimize SpendLogs queries to use timestamp filtering for index usage - <a href="https://github.com/BerriAI/litellm/pull/17504" target="_blank" rel="noopener noreferrer">PR #17504</a></li>
</ul>
</li>
<li>
<p><strong>Enforce User Param</strong></p>
<ul>
<li>Extend <code>enforce_user_param</code> support to OpenAI POST endpoints - <a href="https://github.com/BerriAI/litellm/pull/17407" target="_blank" rel="noopener noreferrer">PR #17407</a></li>
</ul>
</li>
</ul>
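<p>Team-level priority reservation (PR #17061) builds on the proxy's <code>priority_reservation</code> setting. A config sketch in the documented shape — the priority names and fractions here are illustrative, so verify the exact keys against the dynamic rate limiter docs for your LiteLLM version:</p>

```yaml
litellm_settings:
  # fraction of a model's rate limit reserved for each priority label
  priority_reservation: {"prod": 0.75, "dev": 0.25}
```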
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="mcp-gateway">MCP Gateway<a href="https://docs.litellm.ai/release_notes/v1-80-8#mcp-gateway" class="hash-link" aria-label="Direct link to MCP Gateway" title="Direct link to MCP Gateway">​</a></h2>
<ul>
<li>
<p><strong>MCP Configuration</strong></p>
<ul>
<li>Remove URL format validation for MCP server endpoints - <a href="https://github.com/BerriAI/litellm/pull/17270" target="_blank" rel="noopener noreferrer">PR #17270</a></li>
<li>Add stack trace to MCP error message - <a href="https://github.com/BerriAI/litellm/pull/17269" target="_blank" rel="noopener noreferrer">PR #17269</a></li>
</ul>
</li>
<li>
<p><strong>MCP Tool Results</strong></p>
<ul>
<li>Preserve tool metadata in CallToolResult - <a href="https://github.com/BerriAI/litellm/pull/17561" target="_blank" rel="noopener noreferrer">PR #17561</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="agent-gateway-a2a-1">Agent Gateway (A2A)<a href="https://docs.litellm.ai/release_notes/v1-80-8#agent-gateway-a2a-1" class="hash-link" aria-label="Direct link to Agent Gateway (A2A)" title="Direct link to Agent Gateway (A2A)">​</a></h2>
<ul>
<li>
<p><strong>Agent Invocation</strong></p>
<ul>
<li>Allow invoking agents through AI Gateway - <a href="https://github.com/BerriAI/litellm/pull/17440" target="_blank" rel="noopener noreferrer">PR #17440</a></li>
<li>Allow tracking request/response in the "Logs" page - <a href="https://github.com/BerriAI/litellm/pull/17449" target="_blank" rel="noopener noreferrer">PR #17449</a></li>
</ul>
</li>
<li>
<p><strong>Agent Access Control</strong></p>
<ul>
<li>Enforce allowed agents by key and team, and add agent access groups on the backend - <a href="https://github.com/BerriAI/litellm/pull/17502" target="_blank" rel="noopener noreferrer">PR #17502</a></li>
</ul>
</li>
<li>
<p><strong>Agent Gateway UI</strong></p>
<ul>
<li>Allow testing agents on UI - <a href="https://github.com/BerriAI/litellm/pull/17455" target="_blank" rel="noopener noreferrer">PR #17455</a></li>
<li>Set allowed agents by key, team - <a href="https://github.com/BerriAI/litellm/pull/17511" target="_blank" rel="noopener noreferrer">PR #17511</a></li>
</ul>
</li>
</ul>
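<p>The key/team access-control PRs above restrict which agents a caller may invoke. One way such a check can work, sketched here with assumed names and an assumed precedence rule (key-level list wins over team-level; no list means unrestricted) — not LiteLLM's actual logic:</p>

```python
def agent_allowed(agent_id, key_agents=None, team_agents=None):
    """Return True if the agent may be invoked under this key/team.

    Assumed precedence: an explicit key-level allow-list wins; otherwise
    fall back to the team-level list; no list configured = no restriction.
    """
    if key_agents is not None:
        return agent_id in key_agents
    if team_agents is not None:
        return agent_id in team_agents
    return True

print(agent_allowed("support-agent", key_agents=["support-agent"]))  # True
print(agent_allowed("support-agent", key_agents=[], team_agents=["support-agent"]))  # False
```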
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="performance--loadbalancing--reliability-improvements">Performance / Load Balancing / Reliability Improvements<a href="https://docs.litellm.ai/release_notes/v1-80-8#performance--loadbalancing--reliability-improvements" class="hash-link" aria-label="Direct link to Performance / Load Balancing / Reliability Improvements" title="Direct link to Performance / Load Balancing / Reliability Improvements">​</a></h2>
<ul>
<li>
<p><strong>Audio/Speech Performance</strong></p>
<ul>
<li>Fix <code>/audio/speech</code> performance by using <code>shared_sessions</code> - <a href="https://github.com/BerriAI/litellm/pull/16739" target="_blank" rel="noopener noreferrer">PR #16739</a></li>
</ul>
</li>
<li>
<p><strong>Memory Optimization</strong></p>
<ul>
<li>Prevent memory leak in aiohttp connection pooling - <a href="https://github.com/BerriAI/litellm/pull/17388" target="_blank" rel="noopener noreferrer">PR #17388</a></li>
<li>Lazy-load utils to reduce memory + import time - <a href="https://github.com/BerriAI/litellm/pull/17171" target="_blank" rel="noopener noreferrer">PR #17171</a></li>
</ul>
</li>
<li>
<p><strong>Database</strong></p>
<ul>
<li>Update default database connection number - <a href="https://github.com/BerriAI/litellm/pull/17353" target="_blank" rel="noopener noreferrer">PR #17353</a></li>
<li>Update default proxy_batch_write_at number - <a href="https://github.com/BerriAI/litellm/pull/17355" target="_blank" rel="noopener noreferrer">PR #17355</a></li>
<li>Add background health checks to db - <a href="https://github.com/BerriAI/litellm/pull/17528" target="_blank" rel="noopener noreferrer">PR #17528</a></li>
</ul>
</li>
<li>
<p><strong>Proxy Caching</strong></p>
<ul>
<li>Fix proxy caching between requests in aiohttp transport - <a href="https://github.com/BerriAI/litellm/pull/17122" target="_blank" rel="noopener noreferrer">PR #17122</a></li>
</ul>
</li>
<li>
<p><strong>Session Management</strong></p>
<ul>
<li>Fix session consistency and move the Lasso API version out of source code - <a href="https://github.com/BerriAI/litellm/pull/17316" target="_blank" rel="noopener noreferrer">PR #17316</a></li>
<li>Conditionally pass enable_cleanup_closed to aiohttp TCPConnector - <a href="https://github.com/BerriAI/litellm/pull/17367" target="_blank" rel="noopener noreferrer">PR #17367</a></li>
</ul>
</li>
<li>
<p><strong>Vector Store</strong></p>
<ul>
<li>Fix vector store configuration synchronization failure - <a href="https://github.com/BerriAI/litellm/pull/17525" target="_blank" rel="noopener noreferrer">PR #17525</a></li>
</ul>
</li>
</ul>
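<p>The conditional <code>enable_cleanup_closed</code> change (PR #17367) is an instance of a general pattern: only pass a keyword argument if the callee's signature accepts it. A self-contained sketch — the two connector classes are stand-ins, since whether a given aiohttp version accepts, ignores, or rejects the flag varies:</p>

```python
import inspect

def supported_kwargs(func, **candidate):
    """Keep only the keyword arguments that func's signature actually accepts."""
    params = inspect.signature(func).parameters
    return {k: v for k, v in candidate.items() if k in params}

# Stand-ins for two library versions: the older constructor accepted
# enable_cleanup_closed, the newer one dropped it.
class OldConnector:
    def __init__(self, limit=100, enable_cleanup_closed=False):
        self.enable_cleanup_closed = enable_cleanup_closed

class NewConnector:
    def __init__(self, limit=100):
        pass

print(supported_kwargs(OldConnector.__init__, enable_cleanup_closed=True))  # {'enable_cleanup_closed': True}
print(supported_kwargs(NewConnector.__init__, enable_cleanup_closed=True))  # {}
```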
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="documentation-updates">Documentation Updates<a href="https://docs.litellm.ai/release_notes/v1-80-8#documentation-updates" class="hash-link" aria-label="Direct link to Documentation Updates" title="Direct link to Documentation Updates">​</a></h2>
<ul>
<li>
<p><strong>Provider Documentation</strong></p>
<ul>
<li>Add Azure AI Foundry documentation for Claude models - <a href="https://github.com/BerriAI/litellm/pull/17104" target="_blank" rel="noopener noreferrer">PR #17104</a></li>
<li>Document responses and embedding API for GitHub Copilot - <a href="https://github.com/BerriAI/litellm/pull/17456" target="_blank" rel="noopener noreferrer">PR #17456</a></li>
<li>Add gpt-5.1-codex-max to OpenAI provider documentation - <a href="https://github.com/BerriAI/litellm/pull/17602" target="_blank" rel="noopener noreferrer">PR #17602</a></li>
<li>Update Instructions For Phoenix Integration - <a href="https://github.com/BerriAI/litellm/pull/17373" target="_blank" rel="noopener noreferrer">PR #17373</a></li>
</ul>
</li>
<li>
<p><strong>Guides</strong></p>
<ul>
<li>Add guide on how to debug gateway error vs provider error - <a href="https://github.com/BerriAI/litellm/pull/17387" target="_blank" rel="noopener noreferrer">PR #17387</a></li>
<li>Agent Gateway documentation - <a href="https://github.com/BerriAI/litellm/pull/17454" target="_blank" rel="noopener noreferrer">PR #17454</a></li>
<li>A2A Permission management documentation - <a href="https://github.com/BerriAI/litellm/pull/17515" target="_blank" rel="noopener noreferrer">PR #17515</a></li>
<li>Update docs to link agent hub - <a href="https://github.com/BerriAI/litellm/pull/17462" target="_blank" rel="noopener noreferrer">PR #17462</a></li>
</ul>
</li>
<li>
<p><strong>Projects</strong></p>
<ul>
<li>Add Google ADK and Harbor to projects - <a href="https://github.com/BerriAI/litellm/pull/17352" target="_blank" rel="noopener noreferrer">PR #17352</a></li>
<li>Add Microsoft Agent Lightning to projects - <a href="https://github.com/BerriAI/litellm/pull/17422" target="_blank" rel="noopener noreferrer">PR #17422</a></li>
</ul>
</li>
<li>
<p><strong>Cleanup</strong></p>
<ul>
<li>Cleanup: Remove orphan docs pages and Docusaurus template files - <a href="https://github.com/BerriAI/litellm/pull/17356" target="_blank" rel="noopener noreferrer">PR #17356</a></li>
<li>Remove <code>source .env</code> from docs - <a href="https://github.com/BerriAI/litellm/pull/17466" target="_blank" rel="noopener noreferrer">PR #17466</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="infrastructure--cicd">Infrastructure / CI/CD<a href="https://docs.litellm.ai/release_notes/v1-80-8#infrastructure--cicd" class="hash-link" aria-label="Direct link to Infrastructure / CI/CD" title="Direct link to Infrastructure / CI/CD">​</a></h2>
<ul>
<li>
<p><strong>Helm Chart</strong></p>
<ul>
<li>Add ingress-only labels - <a href="https://github.com/BerriAI/litellm/pull/17348" target="_blank" rel="noopener noreferrer">PR #17348</a></li>
</ul>
</li>
<li>
<p><strong>Docker</strong></p>
<ul>
<li>Add retry logic to apk package installation in Dockerfile.non_root - <a href="https://github.com/BerriAI/litellm/pull/17596" target="_blank" rel="noopener noreferrer">PR #17596</a></li>
<li>Chainguard fixes - <a href="https://github.com/BerriAI/litellm/pull/17406" target="_blank" rel="noopener noreferrer">PR #17406</a></li>
</ul>
</li>
<li>
<p><strong>OpenAPI Schema</strong></p>
<ul>
<li>Refactor add_schema_to_components to move definitions to components/schemas - <a href="https://github.com/BerriAI/litellm/pull/17389" target="_blank" rel="noopener noreferrer">PR #17389</a></li>
</ul>
</li>
<li>
<p><strong>Security</strong></p>
<ul>
<li>Fix security vulnerability: update mdast-util-to-hast to 13.2.1 - <a href="https://github.com/BerriAI/litellm/pull/17601" target="_blank" rel="noopener noreferrer">PR #17601</a></li>
<li>Bump jws from 3.2.2 to 3.2.3 - <a href="https://github.com/BerriAI/litellm/pull/17494" target="_blank" rel="noopener noreferrer">PR #17494</a></li>
</ul>
</li>
</ul>
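<p>The apk retry logic added in PR #17596 follows a standard shell retry loop. A generic sketch (retry count, delay, and the wrapped command are arbitrary, not the Dockerfile's exact code):</p>

```shell
#!/bin/sh
# Retry a command up to 3 times with a short pause between attempts.
retry() {
    n=1
    max=3
    until "$@"; do
        if [ "$n" -ge "$max" ]; then
            echo "command failed after $max attempts" >&2
            return 1
        fi
        n=$((n + 1))
        sleep 1
    done
}

# In a Dockerfile this would wrap something like: retry apk add --no-cache curl
retry echo "install step ran"
```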
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-contributors">New Contributors<a href="https://docs.litellm.ai/release_notes/v1-80-8#new-contributors" class="hash-link" aria-label="Direct link to New Contributors" title="Direct link to New Contributors">​</a></h2>
<ul>
<li>@weichiet made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17242" target="_blank" rel="noopener noreferrer">PR #17242</a></li>
<li>@AndyForest made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17220" target="_blank" rel="noopener noreferrer">PR #17220</a></li>
<li>@omkar806 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17217" target="_blank" rel="noopener noreferrer">PR #17217</a></li>
<li>@v0rtex20k made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17178" target="_blank" rel="noopener noreferrer">PR #17178</a></li>
<li>@hxomer made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17207" target="_blank" rel="noopener noreferrer">PR #17207</a></li>
<li>@orgersh92 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17316" target="_blank" rel="noopener noreferrer">PR #17316</a></li>
<li>@dannykopping made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17313" target="_blank" rel="noopener noreferrer">PR #17313</a></li>
<li>@rioiart made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17333" target="_blank" rel="noopener noreferrer">PR #17333</a></li>
<li>@codgician made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17278" target="_blank" rel="noopener noreferrer">PR #17278</a></li>
<li>@epistoteles made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17277" target="_blank" rel="noopener noreferrer">PR #17277</a></li>
<li>@kothamah made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17368" target="_blank" rel="noopener noreferrer">PR #17368</a></li>
<li>@flozonn made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17371" target="_blank" rel="noopener noreferrer">PR #17371</a></li>
<li>@richardmcsong made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17389" target="_blank" rel="noopener noreferrer">PR #17389</a></li>
<li>@matt-greathouse made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17384" target="_blank" rel="noopener noreferrer">PR #17384</a></li>
<li>@mossbanay made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17380" target="_blank" rel="noopener noreferrer">PR #17380</a></li>
<li>@mhielpos-asapp made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17376" target="_blank" rel="noopener noreferrer">PR #17376</a></li>
<li>@Joilence made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17367" target="_blank" rel="noopener noreferrer">PR #17367</a></li>
<li>@deepaktammali made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17357" target="_blank" rel="noopener noreferrer">PR #17357</a></li>
<li>@axiomofjoy made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16611" target="_blank" rel="noopener noreferrer">PR #16611</a></li>
<li>@DevajMody made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17445" target="_blank" rel="noopener noreferrer">PR #17445</a></li>
<li>@andrewtruong made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17439" target="_blank" rel="noopener noreferrer">PR #17439</a></li>
<li>@AnasAbdelR made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17490" target="_blank" rel="noopener noreferrer">PR #17490</a></li>
<li>@dominicfeliton made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17516" target="_blank" rel="noopener noreferrer">PR #17516</a></li>
<li>@kristianmitk made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17504" target="_blank" rel="noopener noreferrer">PR #17504</a></li>
<li>@rgshr made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17130" target="_blank" rel="noopener noreferrer">PR #17130</a></li>
<li>@dominicfallows made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17489" target="_blank" rel="noopener noreferrer">PR #17489</a></li>
<li>@irfansofyana made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17467" target="_blank" rel="noopener noreferrer">PR #17467</a></li>
<li>@GusBricker made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17191" target="_blank" rel="noopener noreferrer">PR #17191</a></li>
<li>@OlivverX made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17255" target="_blank" rel="noopener noreferrer">PR #17255</a></li>
<li>@withsmilo made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/17585" target="_blank" rel="noopener noreferrer">PR #17585</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="full-changelog">Full Changelog<a href="https://docs.litellm.ai/release_notes/v1-80-8#full-changelog" class="hash-link" aria-label="Direct link to Full Changelog" title="Direct link to Full Changelog">​</a></h2>
<p><strong><a href="https://github.com/BerriAI/litellm/compare/v1.80.7-nightly...v1.80.8" target="_blank" rel="noopener noreferrer">View complete changelog on GitHub</a></strong></p>]]></content>
        <author>
            <name>Krrish Dholakia</name>
            <uri>https://www.linkedin.com/in/krish-d/</uri>
        </author>
        <author>
            <name>Ishaan Jaff</name>
            <uri>https://www.linkedin.com/in/reffajnaahsi/</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[v1.80.5-stable - Gemini 3.0 Support]]></title>
        <id>https://docs.litellm.ai/release_notes/v1-80-5</id>
        <link href="https://docs.litellm.ai/release_notes/v1-80-5"/>
        <updated>2025-11-22T10:00:00.000Z</updated>
        <summary type="html"><![CDATA[Deploy this version]]></summary>
        <content type="html"><![CDATA[<h2 class="anchor anchorWithStickyNavbar_LWe7" id="deploy-this-version">Deploy this version<a href="https://docs.litellm.ai/release_notes/v1-80-5#deploy-this-version" class="hash-link" aria-label="Direct link to Deploy this version" title="Direct link to Deploy this version">​</a></h2>
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">Docker</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">Pip</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">docker run litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker run \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-e STORE_MODEL_IN_DB=True \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-p 4000:4000 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">docker.litellm.ai/berriai/litellm:v1.80.5-stable</span><br></span></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">pip install litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">pip install litellm==1.80.5</span><br></span></code></pre></div></div></div></div></div>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="key-highlights">Key Highlights<a href="https://docs.litellm.ai/release_notes/v1-80-5#key-highlights" class="hash-link" aria-label="Direct link to Key Highlights" title="Direct link to Key Highlights">​</a></h2>
<ul>
<li><strong>Gemini 3</strong> - <a href="https://docs.litellm.ai/blog/gemini_3">Day-0 support for Gemini 3 models with thought signatures</a></li>
<li><strong>Prompt Management</strong> - <a href="https://docs.litellm.ai/docs/proxy/litellm_prompt_management">Full prompt versioning support with UI for editing, testing, and version history</a></li>
<li><strong>MCP Hub</strong> - <a href="https://docs.litellm.ai/docs/proxy/ai_hub#mcp-servers">Publish and discover MCP servers within your organization</a></li>
<li><strong>Model Compare UI</strong> - <a href="https://docs.litellm.ai/docs/proxy/model_compare_ui">Side-by-side model comparison interface for testing</a></li>
<li><strong>Batch API Spend Tracking</strong> - <a href="https://docs.litellm.ai/docs/proxy/cost_tracking#-custom-spend-log-metadata">Granular spend tracking with custom metadata for batch and file creation requests</a></li>
<li><strong>AWS IAM Secret Manager</strong> - <a href="https://docs.litellm.ai/docs/secret_managers/aws_secret_manager#iam-role-assumption">IAM role authentication support for AWS Secret Manager</a></li>
<li><strong>Logging Callback Controls</strong> - <a href="https://docs.litellm.ai/docs/proxy/dynamic_logging#disabling-dynamic-callback-management-enterprise">Admin-level controls to prevent callers from disabling logging callbacks in compliance environments</a></li>
<li><strong>Proxy CLI JWT Authentication</strong> - <a href="https://docs.litellm.ai/docs/proxy/cli_sso">Enable developers to authenticate to LiteLLM AI Gateway using the Proxy CLI</a></li>
<li><strong>Batch API Routing</strong> - <a href="https://docs.litellm.ai/docs/batches#multi-account--model-based-routing">Route batch operations to different provider accounts using model-specific credentials from your config.yaml</a></li>
</ul>
<hr>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="prompt-management">Prompt Management<a href="https://docs.litellm.ai/release_notes/v1-80-5#prompt-management" class="hash-link" aria-label="Direct link to Prompt Management" title="Direct link to Prompt Management">​</a></h3>
<div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAFCAYAAAB8ZH1oAAAACXBIWXMAACxLAAAsSwGlPZapAAAAfElEQVR4nEWNywqEMBRD/f9PdONK0S66kLG9zzlDFZlAIIQ8pqMU1nVl33dKOei94+6YGWaKqt562raNeZ5ZloVaKyKCe+ARqBqiSmYyjcbguyKiRCSizvnpXF2I/DI9Fw/H7Si9MA/qKVyW/+BYbK1ztYa5ExG3H+5kJD9A2cJGDm7IFwAAAABJRU5ErkJggg==&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="640" height="334"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/assets/ideal-img/prompt_history.954be8c.640.png" srcset="/assets/ideal-img/prompt_history.954be8c.640.png 640w,/assets/ideal-img/prompt_history.27a77e7.1920.png 1920w" width="640" height="334"></noscript></div>
<br>
<br>
<p>This release introduces <strong>LiteLLM Prompt Studio</strong> - a comprehensive prompt management solution built directly into the LiteLLM UI. Create, test, and version your prompts without leaving your browser.</p>
<p>You can now do the following on LiteLLM Prompt Studio:</p>
<ul>
<li><strong>Create &amp; Test Prompts</strong>: Build prompts with developer messages (system instructions) and test them in real-time with an interactive chat interface</li>
<li><strong>Dynamic Variables</strong>: Use <code>{{variable_name}}</code> syntax to create reusable prompt templates with automatic variable detection</li>
<li><strong>Version Control</strong>: Automatic versioning for every prompt update with complete version history tracking and rollback capabilities</li>
<li><strong>Prompt Studio</strong>: Edit prompts in a dedicated studio environment with live testing and preview</li>
</ul>
<p><strong>API Integration:</strong></p>
<p>Use your prompts in any application with simple API calls:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> client</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">chat</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">completions</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">create</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"gpt-4"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    extra_body</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"prompt_id"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"your-prompt-id"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span 
class="token plain">        </span><span class="token string" style="color:#e3116c">"prompt_version"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">2</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># Optional: specify version</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"prompt_variables"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"name"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"value"</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># Optional: pass variables</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<p>Get started here: <a href="https://docs.litellm.ai/docs/proxy/litellm_prompt_management">LiteLLM Prompt Management Documentation</a></p>
<hr>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="performance--realtime-182-lower-p99-latency">Performance – <code>/realtime</code> 182× Lower p99 Latency<a href="https://docs.litellm.ai/release_notes/v1-80-5#performance--realtime-182-lower-p99-latency" class="hash-link" aria-label="Direct link to performance--realtime-182-lower-p99-latency" title="Direct link to performance--realtime-182-lower-p99-latency">​</a></h3>
<p>This update reduces <code>/realtime</code> latency by removing redundant encodings on the hot path, reusing shared SSL contexts, and caching formatting strings that were being regenerated twice per request despite rarely changing.</p>
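<p>The string-caching part of this change can be illustrated with a memoized formatter: compute a formatted string once per distinct input instead of rebuilding it on every request. A minimal sketch (the function name and route shape are illustrative, not the actual hot-path code):</p>

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def route_prefix(model: str, endpoint: str) -> str:
    """Build the formatted string once per (model, endpoint) pair."""
    return f"/{endpoint}/{model}"

print(route_prefix("gpt-4o-realtime", "realtime"))  # -> /realtime/gpt-4o-realtime
print(route_prefix.cache_info().hits)  # 0: first call was a miss
route_prefix("gpt-4o-realtime", "realtime")
print(route_prefix.cache_info().hits)  # 1: repeat call was served from cache
```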
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="results">Results<a href="https://docs.litellm.ai/release_notes/v1-80-5#results" class="hash-link" aria-label="Direct link to Results" title="Direct link to Results">​</a></h4>
<table><thead><tr><th>Metric</th><th>Before</th><th>After</th><th>Improvement</th></tr></thead><tbody><tr><td>Median latency</td><td>2,200 ms</td><td><strong>59 ms</strong></td><td><strong>−97% (~37× faster)</strong></td></tr><tr><td>p95 latency</td><td>8,500 ms</td><td><strong>67 ms</strong></td><td><strong>−99% (~127× faster)</strong></td></tr><tr><td>p99 latency</td><td>18,000 ms</td><td><strong>99 ms</strong></td><td><strong>−99% (~182× faster)</strong></td></tr><tr><td>Average latency</td><td>3,214 ms</td><td><strong>63 ms</strong></td><td><strong>−98% (~51× faster)</strong></td></tr><tr><td>RPS</td><td>165</td><td><strong>1,207</strong></td><td><strong>+631% (~7.3× increase)</strong></td></tr></tbody></table>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="test-setup">Test Setup<a href="https://docs.litellm.ai/release_notes/v1-80-5#test-setup" class="hash-link" aria-label="Direct link to Test Setup" title="Direct link to Test Setup">​</a></h4>
<table><thead><tr><th>Category</th><th>Specification</th></tr></thead><tbody><tr><td><strong>Load Testing</strong></td><td>Locust: 1,000 concurrent users, 500 ramp-up</td></tr><tr><td><strong>System</strong></td><td>4 vCPUs, 8 GB RAM, 4 workers, 4 instances</td></tr><tr><td><strong>Database</strong></td><td>PostgreSQL (Redis unused)</td></tr><tr><td><strong>Configuration</strong></td><td><a href="https://gist.github.com/AlexsanderHamir/420fb44c31c00b4f17a99588637f01ec" target="_blank" rel="noopener noreferrer">config.yaml</a></td></tr><tr><td><strong>Load Script</strong></td><td><a href="https://gist.github.com/AlexsanderHamir/73b83ada21d9b84d4fe09665cf1745f5" target="_blank" rel="noopener noreferrer">no_cache_hits.py</a></td></tr></tbody></table>
<hr>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="model-compare-ui">Model Compare UI<a href="https://docs.litellm.ai/release_notes/v1-80-5#model-compare-ui" class="hash-link" aria-label="Direct link to Model Compare UI" title="Direct link to Model Compare UI">​</a></h3>
<p>A new interactive playground UI enables side-by-side comparison of multiple LLM models, making it easy to evaluate and compare model responses.</p>
<p><strong>Features:</strong></p>
<ul>
<li>Compare responses from multiple models in real-time</li>
<li>Side-by-side view with synchronized scrolling</li>
<li>Support for all LiteLLM-supported models</li>
<li>Cost tracking per model</li>
<li>Response time comparison</li>
<li>Pre-configured prompts for quick and easy testing</li>
</ul>
<p><strong>Details:</strong></p>
<ul>
<li>
<p><strong>Parameterization</strong>: Configure API keys, endpoints, models, and model parameters, as well as interaction types (chat completions, embeddings, etc.)</p>
</li>
<li>
<p><strong>Model Comparison</strong>: Compare up to 3 different models simultaneously with side-by-side response views</p>
</li>
<li>
<p><strong>Comparison Metrics</strong>: View detailed comparison information including:</p>
<ul>
<li>Time To First Token</li>
<li>Input / Output / Reasoning Tokens</li>
<li>Total Latency</li>
<li>Cost (if enabled in config)</li>
</ul>
</li>
<li>
<p><strong>Safety Filters</strong>: Configure and test guardrails (safety filters) directly in the playground interface</p>
</li>
</ul>
<p><a href="https://docs.litellm.ai/docs/proxy/model_compare_ui">Get Started with Model Compare</a></p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-providers-and-endpoints">New Providers and Endpoints<a href="https://docs.litellm.ai/release_notes/v1-80-5#new-providers-and-endpoints" class="hash-link" aria-label="Direct link to New Providers and Endpoints" title="Direct link to New Providers and Endpoints">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="new-providers">New Providers<a href="https://docs.litellm.ai/release_notes/v1-80-5#new-providers" class="hash-link" aria-label="Direct link to New Providers" title="Direct link to New Providers">​</a></h3>
<table><thead><tr><th>Provider</th><th>Supported Endpoints</th><th>Description</th></tr></thead><tbody><tr><td><strong><a href="https://docs.litellm.ai/docs/providers/docker_model_runner">Docker Model Runner</a></strong></td><td><code>/v1/chat/completions</code></td><td>Run LLM models in Docker containers</td></tr></tbody></table>
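<p>Since Docker Model Runner is exposed through the standard <code>/v1/chat/completions</code> endpoint, a request to it looks like any other proxied chat completion. A minimal sketch of the request body (the provider prefix and model name below are illustrative assumptions, not confirmed values; check the linked provider docs for the exact format):</p>

```python
# Hypothetical sketch: a chat-completions payload for a Docker Model Runner
# model routed through the LiteLLM proxy. The "docker_model_runner/..." prefix
# and the model name are assumptions for illustration only.
payload = {
    "model": "docker_model_runner/ai/smollm2",  # assumed "<provider>/<model>" routing format
    "messages": [{"role": "user", "content": "Hello from a container-hosted model"}],
}

# This body would be POSTed to "<proxy_base_url>/v1/chat/completions" with a
# LiteLLM virtual key in the Authorization header.
assert payload["model"].split("/", 1)[0] == "docker_model_runner"
```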
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-models--updated-models">New Models / Updated Models<a href="https://docs.litellm.ai/release_notes/v1-80-5#new-models--updated-models" class="hash-link" aria-label="Direct link to New Models / Updated Models" title="Direct link to New Models / Updated Models">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="new-model-support">New Model Support<a href="https://docs.litellm.ai/release_notes/v1-80-5#new-model-support" class="hash-link" aria-label="Direct link to New Model Support" title="Direct link to New Model Support">​</a></h4>
<table><thead><tr><th>Provider</th><th>Model</th><th>Context Window</th><th>Input ($/1M tokens)</th><th>Output ($/1M tokens)</th><th>Features</th></tr></thead><tbody><tr><td>Azure</td><td><code>azure/gpt-5.1</code></td><td>272K</td><td>$1.38</td><td>$11.00</td><td>Reasoning, vision, PDF input, responses API</td></tr><tr><td>Azure</td><td><code>azure/gpt-5.1-2025-11-13</code></td><td>272K</td><td>$1.38</td><td>$11.00</td><td>Reasoning, vision, PDF input, responses API</td></tr><tr><td>Azure</td><td><code>azure/gpt-5.1-codex</code></td><td>272K</td><td>$1.38</td><td>$11.00</td><td>Responses API, reasoning, vision</td></tr><tr><td>Azure</td><td><code>azure/gpt-5.1-codex-2025-11-13</code></td><td>272K</td><td>$1.38</td><td>$11.00</td><td>Responses API, reasoning, vision</td></tr><tr><td>Azure</td><td><code>azure/gpt-5.1-codex-mini</code></td><td>272K</td><td>$0.275</td><td>$2.20</td><td>Responses API, reasoning, vision</td></tr><tr><td>Azure</td><td><code>azure/gpt-5.1-codex-mini-2025-11-13</code></td><td>272K</td><td>$0.275</td><td>$2.20</td><td>Responses API, reasoning, vision</td></tr><tr><td>Azure EU</td><td><code>azure/eu/gpt-5-2025-08-07</code></td><td>272K</td><td>$1.375</td><td>$11.00</td><td>Reasoning, vision, PDF input</td></tr><tr><td>Azure EU</td><td><code>azure/eu/gpt-5-mini-2025-08-07</code></td><td>272K</td><td>$0.275</td><td>$2.20</td><td>Reasoning, vision, PDF input</td></tr><tr><td>Azure EU</td><td><code>azure/eu/gpt-5-nano-2025-08-07</code></td><td>272K</td><td>$0.055</td><td>$0.44</td><td>Reasoning, vision, PDF input</td></tr><tr><td>Azure EU</td><td><code>azure/eu/gpt-5.1</code></td><td>272K</td><td>$1.38</td><td>$11.00</td><td>Reasoning, vision, PDF input, responses API</td></tr><tr><td>Azure EU</td><td><code>azure/eu/gpt-5.1-codex</code></td><td>272K</td><td>$1.38</td><td>$11.00</td><td>Responses API, reasoning, vision</td></tr><tr><td>Azure 
EU</td><td><code>azure/eu/gpt-5.1-codex-mini</code></td><td>272K</td><td>$0.275</td><td>$2.20</td><td>Responses API, reasoning, vision</td></tr><tr><td>Gemini</td><td><code>gemini-3-pro-preview</code></td><td>2M</td><td>$1.25</td><td>$5.00</td><td>Reasoning, vision, function calling</td></tr><tr><td>Gemini</td><td><code>gemini-3-pro-image</code></td><td>2M</td><td>$1.25</td><td>$5.00</td><td>Image generation, reasoning</td></tr><tr><td>OpenRouter</td><td><code>openrouter/deepseek/deepseek-v3p1-terminus</code></td><td>164K</td><td>$0.20</td><td>$0.40</td><td>Function calling, reasoning</td></tr><tr><td>OpenRouter</td><td><code>openrouter/moonshot/kimi-k2-instruct</code></td><td>262K</td><td>$0.60</td><td>$2.50</td><td>Function calling, web search</td></tr><tr><td>OpenRouter</td><td><code>openrouter/gemini/gemini-3-pro-preview</code></td><td>2M</td><td>$1.25</td><td>$5.00</td><td>Reasoning, vision, function calling</td></tr><tr><td>XAI</td><td><code>xai/grok-4.1-fast</code></td><td>2M</td><td>$0.20</td><td>$0.50</td><td>Reasoning, function calling</td></tr><tr><td>Together AI</td><td><code>together_ai/z-ai/glm-4.6</code></td><td>203K</td><td>$0.40</td><td>$1.75</td><td>Function calling, reasoning</td></tr><tr><td>Cerebras</td><td><code>cerebras/gpt-oss-120b</code></td><td>131K</td><td>$0.60</td><td>$0.60</td><td>Function calling</td></tr><tr><td>Bedrock</td><td><code>anthropic.claude-sonnet-4-5-20250929-v1:0</code></td><td>200K</td><td>$3.00</td><td>$15.00</td><td>Computer use, reasoning, vision</td></tr></tbody></table>
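<p>Prices in the table are quoted per 1M tokens, so a request's cost is the token counts scaled by those rates. A minimal sketch using the <code>azure/gpt-5.1</code> rates above (prices are hard-coded here for illustration; in practice LiteLLM reads them from its model cost map):</p>

```python
# Estimate request cost from per-1M-token prices (azure/gpt-5.1 rates above).
INPUT_PER_M = 1.38    # $ per 1M input tokens
OUTPUT_PER_M = 11.00  # $ per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Scale the per-1M-token prices down to this request's token counts."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000

# e.g. a 10k-input / 2k-output call:
print(round(estimate_cost(10_000, 2_000), 4))  # 0.0358
```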
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features">Features<a href="https://docs.litellm.ai/release_notes/v1-80-5#features" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/gemini">Gemini (Google AI Studio + Vertex AI)</a></strong></p>
<ul>
<li>Add Day 0 gemini-3-pro-preview support - <a href="https://github.com/BerriAI/litellm/pull/16719" target="_blank" rel="noopener noreferrer">PR #16719</a></li>
<li>Add support for Gemini 3 Pro Image model - <a href="https://github.com/BerriAI/litellm/pull/16938" target="_blank" rel="noopener noreferrer">PR #16938</a></li>
<li>Add reasoning_content to streaming responses with tools enabled - <a href="https://github.com/BerriAI/litellm/pull/16854" target="_blank" rel="noopener noreferrer">PR #16854</a></li>
<li>Add includeThoughts=True for Gemini 3 reasoning_effort - <a href="https://github.com/BerriAI/litellm/pull/16838" target="_blank" rel="noopener noreferrer">PR #16838</a></li>
<li>Support thought signatures for Gemini 3 in responses API - <a href="https://github.com/BerriAI/litellm/pull/16872" target="_blank" rel="noopener noreferrer">PR #16872</a></li>
<li>Fix incorrect system message handling for Gemma - <a href="https://github.com/BerriAI/litellm/pull/16767" target="_blank" rel="noopener noreferrer">PR #16767</a></li>
<li>Gemini 3 Pro Image: capture image_tokens and support cost_per_output_image - <a href="https://github.com/BerriAI/litellm/pull/16912" target="_blank" rel="noopener noreferrer">PR #16912</a></li>
<li>Fix missing costs for gemini-2.5-flash-image - <a href="https://github.com/BerriAI/litellm/pull/16882" target="_blank" rel="noopener noreferrer">PR #16882</a></li>
<li>Gemini 3 thought signatures in tool call id - <a href="https://github.com/BerriAI/litellm/pull/16895" target="_blank" rel="noopener noreferrer">PR #16895</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/azure">Azure</a></strong></p>
<ul>
<li>Add Azure gpt-5.1 models - <a href="https://github.com/BerriAI/litellm/pull/16817" target="_blank" rel="noopener noreferrer">PR #16817</a></li>
<li>Add Azure 2025-11 models to cost maps - <a href="https://github.com/BerriAI/litellm/pull/16762" target="_blank" rel="noopener noreferrer">PR #16762</a></li>
<li>Update Azure Pricing - <a href="https://github.com/BerriAI/litellm/pull/16371" target="_blank" rel="noopener noreferrer">PR #16371</a></li>
<li>Add SSML Support for Azure Text-to-Speech (AVA) - <a href="https://github.com/BerriAI/litellm/pull/16747" target="_blank" rel="noopener noreferrer">PR #16747</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/openai">OpenAI</a></strong></p>
<ul>
<li>Support GPT-5.1 reasoning.effort='none' in proxy - <a href="https://github.com/BerriAI/litellm/pull/16745" target="_blank" rel="noopener noreferrer">PR #16745</a></li>
<li>Add gpt-5.1-codex and gpt-5.1-codex-mini models to documentation - <a href="https://github.com/BerriAI/litellm/pull/16735" target="_blank" rel="noopener noreferrer">PR #16735</a></li>
<li>Inherit BaseVideoConfig to enable async content response for OpenAI video - <a href="https://github.com/BerriAI/litellm/pull/16708" target="_blank" rel="noopener noreferrer">PR #16708</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/anthropic">Anthropic</a></strong></p>
<ul>
<li>Add support for <code>strict</code> parameter in Anthropic tool schemas - <a href="https://github.com/BerriAI/litellm/pull/16725" target="_blank" rel="noopener noreferrer">PR #16725</a></li>
<li>Add image-as-URL support to Anthropic - <a href="https://github.com/BerriAI/litellm/pull/16868" target="_blank" rel="noopener noreferrer">PR #16868</a></li>
<li>Add thought signature support to the /v1/messages API - <a href="https://github.com/BerriAI/litellm/pull/16812" target="_blank" rel="noopener noreferrer">PR #16812</a></li>
<li>Anthropic - support Structured Outputs <code>output_format</code> for Claude 4.5 sonnet and Opus 4.1 - <a href="https://github.com/BerriAI/litellm/pull/16949" target="_blank" rel="noopener noreferrer">PR #16949</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/bedrock">Bedrock</a></strong></p>
<ul>
<li>Correct Bedrock configs for Haiku 4.5 - <a href="https://github.com/BerriAI/litellm/pull/16732" target="_blank" rel="noopener noreferrer">PR #16732</a></li>
<li>Ensure consistent chunk IDs in Bedrock streaming responses - <a href="https://github.com/BerriAI/litellm/pull/16596" target="_blank" rel="noopener noreferrer">PR #16596</a></li>
<li>Add Claude 4.5 to US Gov Cloud - <a href="https://github.com/BerriAI/litellm/pull/16957" target="_blank" rel="noopener noreferrer">PR #16957</a></li>
<li>Fix images being dropped from tool results for bedrock - <a href="https://github.com/BerriAI/litellm/pull/16492" target="_blank" rel="noopener noreferrer">PR #16492</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/vertex">Vertex AI</a></strong></p>
<ul>
<li>Add Vertex AI Image Edit Support - <a href="https://github.com/BerriAI/litellm/pull/16828" target="_blank" rel="noopener noreferrer">PR #16828</a></li>
<li>Update veo 3 pricing and add prod models - <a href="https://github.com/BerriAI/litellm/pull/16781" target="_blank" rel="noopener noreferrer">PR #16781</a></li>
<li>Fix Video download for veo3 - <a href="https://github.com/BerriAI/litellm/pull/16875" target="_blank" rel="noopener noreferrer">PR #16875</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/snowflake">Snowflake</a></strong></p>
<ul>
<li>Snowflake provider support: add embeddings, PAT, and account_id - <a href="https://github.com/BerriAI/litellm/pull/15727" target="_blank" rel="noopener noreferrer">PR #15727</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/oci">OCI</a></strong></p>
<ul>
<li>Add oci_endpoint_id Parameter for OCI Dedicated Endpoints - <a href="https://github.com/BerriAI/litellm/pull/16723" target="_blank" rel="noopener noreferrer">PR #16723</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/xai">XAI</a></strong></p>
<ul>
<li>Add support for Grok 4.1 Fast models - <a href="https://github.com/BerriAI/litellm/pull/16936" target="_blank" rel="noopener noreferrer">PR #16936</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/togetherai">Together AI</a></strong></p>
<ul>
<li>Add GLM 4.6 from together.ai - <a href="https://github.com/BerriAI/litellm/pull/16942" target="_blank" rel="noopener noreferrer">PR #16942</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/cerebras">Cerebras</a></strong></p>
<ul>
<li>Fix Cerebras GPT-OSS-120B model name - <a href="https://github.com/BerriAI/litellm/pull/16939" target="_blank" rel="noopener noreferrer">PR #16939</a></li>
</ul>
</li>
</ul>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="bug-fixes">Bug Fixes<a href="https://docs.litellm.ai/release_notes/v1-80-5#bug-fixes" class="hash-link" aria-label="Direct link to Bug Fixes" title="Direct link to Bug Fixes">​</a></h3>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/openai">OpenAI</a></strong></p>
<ul>
<li>Fix OpenAI conversion from responses to completions (issue #16863) - <a href="https://github.com/BerriAI/litellm/pull/16864" target="_blank" rel="noopener noreferrer">PR #16864</a></li>
<li>Revert "Make all gpt-5 and reasoning models to responses by default" - <a href="https://github.com/BerriAI/litellm/pull/16849" target="_blank" rel="noopener noreferrer">PR #16849</a></li>
</ul>
</li>
<li>
<p><strong>General</strong></p>
<ul>
<li>Get custom_llm_provider from query param - <a href="https://github.com/BerriAI/litellm/pull/16731" target="_blank" rel="noopener noreferrer">PR #16731</a></li>
<li>Fix optional param mapping - <a href="https://github.com/BerriAI/litellm/pull/16852" target="_blank" rel="noopener noreferrer">PR #16852</a></li>
<li>Add None check for litellm_params - <a href="https://github.com/BerriAI/litellm/pull/16754" target="_blank" rel="noopener noreferrer">PR #16754</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="llm-api-endpoints">LLM API Endpoints<a href="https://docs.litellm.ai/release_notes/v1-80-5#llm-api-endpoints" class="hash-link" aria-label="Direct link to LLM API Endpoints" title="Direct link to LLM API Endpoints">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-1">Features<a href="https://docs.litellm.ai/release_notes/v1-80-5#features-1" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/response_api">Responses API</a></strong></p>
<ul>
<li>Add Responses API support for gpt-5.1-codex model - <a href="https://github.com/BerriAI/litellm/pull/16845" target="_blank" rel="noopener noreferrer">PR #16845</a></li>
<li>Add managed files support for responses API - <a href="https://github.com/BerriAI/litellm/pull/16733" target="_blank" rel="noopener noreferrer">PR #16733</a></li>
<li>Support Responses API params via <code>extra_body</code> in chat completions - <a href="https://github.com/BerriAI/litellm/pull/16765" target="_blank" rel="noopener noreferrer">PR #16765</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/batches">Batch API</a></strong></p>
<ul>
<li>Support /delete for files + support /cancel for batches - <a href="https://github.com/BerriAI/litellm/pull/16387" target="_blank" rel="noopener noreferrer">PR #16387</a></li>
<li>Add config-based routing support for batches and files - <a href="https://github.com/BerriAI/litellm/pull/16872" target="_blank" rel="noopener noreferrer">PR #16872</a></li>
<li>Populate spend_logs_metadata in batch and files endpoints - <a href="https://github.com/BerriAI/litellm/pull/16921" target="_blank" rel="noopener noreferrer">PR #16921</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/search">Search APIs</a></strong></p>
<ul>
<li>Fix "Invalid request body" error in firecrawl-search - <a href="https://github.com/BerriAI/litellm/pull/16943" target="_blank" rel="noopener noreferrer">PR #16943</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/vector_stores">Vector Stores</a></strong></p>
<ul>
<li>Fix vector store create issue - <a href="https://github.com/BerriAI/litellm/pull/16804" target="_blank" rel="noopener noreferrer">PR #16804</a></li>
<li>Team vector-store permissions now respected for key access - <a href="https://github.com/BerriAI/litellm/pull/16639" target="_blank" rel="noopener noreferrer">PR #16639</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/audio_transcription">Audio Transcription</a></strong></p>
<ul>
<li>Fix audio transcription cost tracking - <a href="https://github.com/BerriAI/litellm/pull/16478" target="_blank" rel="noopener noreferrer">PR #16478</a></li>
<li>Add missing shared_sessions to audio/transcriptions - <a href="https://github.com/BerriAI/litellm/pull/16858" target="_blank" rel="noopener noreferrer">PR #16858</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/video_generation">Video Generation API</a></strong></p>
<ul>
<li>Fix videos tagging - <a href="https://github.com/BerriAI/litellm/pull/16770" target="_blank" rel="noopener noreferrer">PR #16770</a></li>
</ul>
</li>
</ul>
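<p>For the <code>extra_body</code> change above, the idea is that Responses-API-style parameters can ride along on a chat-completions request. A hedged sketch of what such a request body might look like (the <code>reasoning</code> key shown is an illustrative assumption; which params are actually forwarded is defined by the linked PR, not by this snippet):</p>

```python
# Sketch: merging extra_body params into a chat-completions request body.
base_request = {
    "model": "gpt-5.1",
    "messages": [{"role": "user", "content": "hi"}],
}
extra_body = {"reasoning": {"effort": "low"}}  # assumed Responses-API-style param

# Client libraries typically merge extra_body keys into the JSON body as-is:
request = {**base_request, **extra_body}
print(sorted(request))  # ['messages', 'model', 'reasoning']
```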
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="bugs">Bugs<a href="https://docs.litellm.ai/release_notes/v1-80-5#bugs" class="hash-link" aria-label="Direct link to Bugs" title="Direct link to Bugs">​</a></h4>
<ul>
<li><strong>General</strong>
<ul>
<li>Responses API cost tracking with custom deployment names - <a href="https://github.com/BerriAI/litellm/pull/16778" target="_blank" rel="noopener noreferrer">PR #16778</a></li>
<li>Trim logged response strings in spend-logs - <a href="https://github.com/BerriAI/litellm/pull/16654" target="_blank" rel="noopener noreferrer">PR #16654</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="management-endpoints--ui">Management Endpoints / UI<a href="https://docs.litellm.ai/release_notes/v1-80-5#management-endpoints--ui" class="hash-link" aria-label="Direct link to Management Endpoints / UI" title="Direct link to Management Endpoints / UI">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-2">Features<a href="https://docs.litellm.ai/release_notes/v1-80-5#features-2" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong>Proxy CLI Auth</strong></p>
<ul>
<li>Allow using JWTs for signing in with Proxy CLI - <a href="https://github.com/BerriAI/litellm/pull/16756" target="_blank" rel="noopener noreferrer">PR #16756</a></li>
</ul>
</li>
<li>
<p><strong>Virtual Keys</strong></p>
<ul>
<li>Fix Key Model Alias Not Working - <a href="https://github.com/BerriAI/litellm/pull/16896" target="_blank" rel="noopener noreferrer">PR #16896</a></li>
</ul>
</li>
<li>
<p><strong>Models + Endpoints</strong></p>
<ul>
<li>Add additional model settings to chat models in test key - <a href="https://github.com/BerriAI/litellm/pull/16793" target="_blank" rel="noopener noreferrer">PR #16793</a></li>
<li>Deactivate delete button on model table for config models - <a href="https://github.com/BerriAI/litellm/pull/16787" target="_blank" rel="noopener noreferrer">PR #16787</a></li>
<li>Change Public Model Hub to use proxyBaseUrl - <a href="https://github.com/BerriAI/litellm/pull/16892" target="_blank" rel="noopener noreferrer">PR #16892</a></li>
<li>Add JSON Viewer to request/response panel - <a href="https://github.com/BerriAI/litellm/pull/16687" target="_blank" rel="noopener noreferrer">PR #16687</a></li>
<li>Standardize icon images - <a href="https://github.com/BerriAI/litellm/pull/16837" target="_blank" rel="noopener noreferrer">PR #16837</a></li>
</ul>
</li>
<li>
<p><strong>Teams</strong></p>
<ul>
<li>Teams table empty state - <a href="https://github.com/BerriAI/litellm/pull/16738" target="_blank" rel="noopener noreferrer">PR #16738</a></li>
</ul>
</li>
<li>
<p><strong>Fallbacks</strong></p>
<ul>
<li>Fallbacks: icon button tooltips and delete-with-friction confirmation - <a href="https://github.com/BerriAI/litellm/pull/16737" target="_blank" rel="noopener noreferrer">PR #16737</a></li>
</ul>
</li>
<li>
<p><strong>MCP Servers</strong></p>
<ul>
<li>Delete-user and delete-MCP-server modals; MCP table tooltips - <a href="https://github.com/BerriAI/litellm/pull/16751" target="_blank" rel="noopener noreferrer">PR #16751</a></li>
</ul>
</li>
<li>
<p><strong>Callbacks</strong></p>
<ul>
<li>Expose backend endpoint for callbacks settings - <a href="https://github.com/BerriAI/litellm/pull/16698" target="_blank" rel="noopener noreferrer">PR #16698</a></li>
<li>Update the add-callbacks route to use data from the backend - <a href="https://github.com/BerriAI/litellm/pull/16699" target="_blank" rel="noopener noreferrer">PR #16699</a></li>
</ul>
</li>
<li>
<p><strong>Usage &amp; Analytics</strong></p>
<ul>
<li>Allow partial matches for user ID in User Table - <a href="https://github.com/BerriAI/litellm/pull/16952" target="_blank" rel="noopener noreferrer">PR #16952</a></li>
</ul>
</li>
<li>
<p><strong>General UI</strong></p>
<ul>
<li>Allow setting base_url in API reference docs - <a href="https://github.com/BerriAI/litellm/pull/16674" target="_blank" rel="noopener noreferrer">PR #16674</a></li>
<li>Change /public fields to honor server root path - <a href="https://github.com/BerriAI/litellm/pull/16930" target="_blank" rel="noopener noreferrer">PR #16930</a></li>
<li>Fix UI build - <a href="https://github.com/BerriAI/litellm/pull/16702" target="_blank" rel="noopener noreferrer">PR #16702</a></li>
<li>Enable automatic dark/light mode based on system preference - <a href="https://github.com/BerriAI/litellm/pull/16748" target="_blank" rel="noopener noreferrer">PR #16748</a></li>
</ul>
</li>
</ul>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="bugs-1">Bugs<a href="https://docs.litellm.ai/release_notes/v1-80-5#bugs-1" class="hash-link" aria-label="Direct link to Bugs" title="Direct link to Bugs">​</a></h4>
<ul>
<li>
<p><strong>UI Fixes</strong></p>
<ul>
<li>Fix flaky tests due to antd Notification Manager - <a href="https://github.com/BerriAI/litellm/pull/16740" target="_blank" rel="noopener noreferrer">PR #16740</a></li>
<li>Fix UI MCP Tool Test Regression - <a href="https://github.com/BerriAI/litellm/pull/16695" target="_blank" rel="noopener noreferrer">PR #16695</a></li>
<li>Fix edit logging settings not appearing - <a href="https://github.com/BerriAI/litellm/pull/16798" target="_blank" rel="noopener noreferrer">PR #16798</a></li>
<li>Add css to truncate long request ids in request viewer - <a href="https://github.com/BerriAI/litellm/pull/16665" target="_blank" rel="noopener noreferrer">PR #16665</a></li>
<li>Remove the azure/ prefix from the Azure placeholder in Add Model - <a href="https://github.com/BerriAI/litellm/pull/16597" target="_blank" rel="noopener noreferrer">PR #16597</a></li>
<li>Remove UI Session Token from user/info return - <a href="https://github.com/BerriAI/litellm/pull/16851" target="_blank" rel="noopener noreferrer">PR #16851</a></li>
<li>Remove console logs and errors from model tab - <a href="https://github.com/BerriAI/litellm/pull/16455" target="_blank" rel="noopener noreferrer">PR #16455</a></li>
<li>Align bulk-invite user roles with the backend - <a href="https://github.com/BerriAI/litellm/pull/16906" target="_blank" rel="noopener noreferrer">PR #16906</a></li>
<li>Mock Tremor's Tooltip to Fix Flaky UI Tests - <a href="https://github.com/BerriAI/litellm/pull/16786" target="_blank" rel="noopener noreferrer">PR #16786</a></li>
<li>Fix e2e ui playwright test - <a href="https://github.com/BerriAI/litellm/pull/16799" target="_blank" rel="noopener noreferrer">PR #16799</a></li>
<li>Fix Tests in CI/CD - <a href="https://github.com/BerriAI/litellm/pull/16972" target="_blank" rel="noopener noreferrer">PR #16972</a></li>
</ul>
</li>
<li>
<p><strong>SSO</strong></p>
<ul>
<li>Ensure <code>role</code> from the SSO provider is used when a user is inserted into LiteLLM - <a href="https://github.com/BerriAI/litellm/pull/16794" target="_blank" rel="noopener noreferrer">PR #16794</a></li>
<li>Docs - SSO - Manage User Roles via Azure App Roles - <a href="https://github.com/BerriAI/litellm/pull/16796" target="_blank" rel="noopener noreferrer">PR #16796</a></li>
</ul>
</li>
<li>
<p><strong>Auth</strong></p>
<ul>
<li>Ensure Team Tags works when using JWT Auth - <a href="https://github.com/BerriAI/litellm/pull/16797" target="_blank" rel="noopener noreferrer">PR #16797</a></li>
<li>Fix keys that never expire - <a href="https://github.com/BerriAI/litellm/pull/16692" target="_blank" rel="noopener noreferrer">PR #16692</a></li>
</ul>
</li>
<li>
<p><strong>Swagger UI</strong></p>
<ul>
<li>Fixes Swagger UI resolver errors for chat completion endpoints caused by Pydantic v2 <code>$defs</code> not being properly exposed in the OpenAPI schema - <a href="https://github.com/BerriAI/litellm/pull/16784" target="_blank" rel="noopener noreferrer">PR #16784</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="ai-integrations">AI Integrations<a href="https://docs.litellm.ai/release_notes/v1-80-5#ai-integrations" class="hash-link" aria-label="Direct link to AI Integrations" title="Direct link to AI Integrations">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="logging">Logging<a href="https://docs.litellm.ai/release_notes/v1-80-5#logging" class="hash-link" aria-label="Direct link to Logging" title="Direct link to Logging">​</a></h3>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/observability/arize_phoenix">Arize Phoenix</a></strong></p>
<ul>
<li>Fix arize phoenix logging - <a href="https://github.com/BerriAI/litellm/pull/16301" target="_blank" rel="noopener noreferrer">PR #16301</a></li>
<li>Arize Phoenix - root span logging - <a href="https://github.com/BerriAI/litellm/pull/16949" target="_blank" rel="noopener noreferrer">PR #16949</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/logging#langfuse">Langfuse</a></strong></p>
<ul>
<li>Filter secret fields from Langfuse - <a href="https://github.com/BerriAI/litellm/pull/16842" target="_blank" rel="noopener noreferrer">PR #16842</a></li>
</ul>
</li>
<li>
<p><strong>General</strong></p>
<ul>
<li>Exclude litellm_credential_name from Sensitive Data Masker (Updated) - <a href="https://github.com/BerriAI/litellm/pull/16958" target="_blank" rel="noopener noreferrer">PR #16958</a></li>
<li>Allow admins to disable dynamic callback controls - <a href="https://github.com/BerriAI/litellm/pull/16750" target="_blank" rel="noopener noreferrer">PR #16750</a></li>
</ul>
</li>
</ul>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="guardrails">Guardrails<a href="https://docs.litellm.ai/release_notes/v1-80-5#guardrails" class="hash-link" aria-label="Direct link to Guardrails" title="Direct link to Guardrails">​</a></h3>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails">IBM Guardrails</a></strong></p>
<ul>
<li>Fix IBM Guardrails optional params, add extra_headers field - <a href="https://github.com/BerriAI/litellm/pull/16771" target="_blank" rel="noopener noreferrer">PR #16771</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails">Noma Guardrail</a></strong></p>
<ul>
<li>Use LiteLLM key alias as fallback Noma applicationId in NomaGuardrail - <a href="https://github.com/BerriAI/litellm/pull/16832" target="_blank" rel="noopener noreferrer">PR #16832</a></li>
<li>Allow custom violation message for tool-permission guardrail - <a href="https://github.com/BerriAI/litellm/pull/16916" target="_blank" rel="noopener noreferrer">PR #16916</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails">Grayswan Guardrail</a></strong></p>
<ul>
<li>Allow passthrough on flagged requests in the Grayswan guardrail - <a href="https://github.com/BerriAI/litellm/pull/16891" target="_blank" rel="noopener noreferrer">PR #16891</a></li>
</ul>
</li>
<li>
<p><strong>General Guardrails</strong></p>
<ul>
<li>Fix prompt injection not working - <a href="https://github.com/BerriAI/litellm/pull/16701" target="_blank" rel="noopener noreferrer">PR #16701</a></li>
</ul>
</li>
</ul>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="prompt-management-1">Prompt Management<a href="https://docs.litellm.ai/release_notes/v1-80-5#prompt-management-1" class="hash-link" aria-label="Direct link to Prompt Management" title="Direct link to Prompt Management">​</a></h3>
<ul>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/prompt_management">Prompt Management</a></strong>
<ul>
<li>Allow specifying just prompt_id in a request to a model - <a href="https://github.com/BerriAI/litellm/pull/16834" target="_blank" rel="noopener noreferrer">PR #16834</a></li>
<li>Add support for versioning prompts - <a href="https://github.com/BerriAI/litellm/pull/16836" target="_blank" rel="noopener noreferrer">PR #16836</a></li>
<li>Allow storing prompt version in DB - <a href="https://github.com/BerriAI/litellm/pull/16848" target="_blank" rel="noopener noreferrer">PR #16848</a></li>
<li>Add UI for editing the prompts - <a href="https://github.com/BerriAI/litellm/pull/16853" target="_blank" rel="noopener noreferrer">PR #16853</a></li>
<li>Allow testing prompts with Chat UI - <a href="https://github.com/BerriAI/litellm/pull/16898" target="_blank" rel="noopener noreferrer">PR #16898</a></li>
<li>Allow viewing version history - <a href="https://github.com/BerriAI/litellm/pull/16901" target="_blank" rel="noopener noreferrer">PR #16901</a></li>
<li>Allow specifying prompt version in code - <a href="https://github.com/BerriAI/litellm/pull/16929" target="_blank" rel="noopener noreferrer">PR #16929</a></li>
<li>UI: show model and prompt ID for each prompt - <a href="https://github.com/BerriAI/litellm/pull/16932" target="_blank" rel="noopener noreferrer">PR #16932</a></li>
<li>Show "get code" section for prompt management + minor polish of showing version history - <a href="https://github.com/BerriAI/litellm/pull/16941" target="_blank" rel="noopener noreferrer">PR #16941</a></li>
</ul>
</li>
</ul>
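<p>The "specify just a prompt_id" and versioning changes above mean a client can reference a server-managed prompt instead of sending a messages array. A minimal sketch of such a request body (field names follow LiteLLM's prompt-management docs where known; the <code>prompt_version</code> placement and all values are illustrative assumptions):</p>

```python
# Sketch: referencing a server-managed prompt by id and pinning a version.
# "welcome-email", version 2, and "user_name" are hypothetical examples.
payload = {
    "model": "gpt-5.1",
    "prompt_id": "welcome-email",             # id of a prompt stored on the proxy
    "prompt_version": 2,                      # assumed: pin a specific stored version
    "prompt_variables": {"user_name": "Ada"}, # substituted into the prompt template
}

# Sent to the proxy's /v1/chat/completions; the proxy resolves prompt_id to the
# stored prompt, so no client-side "messages" array is needed.
assert "messages" not in payload
```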
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="secret-managers">Secret Managers<a href="https://docs.litellm.ai/release_notes/v1-80-5#secret-managers" class="hash-link" aria-label="Direct link to Secret Managers" title="Direct link to Secret Managers">​</a></h3>
<ul>
<li><strong><a href="https://docs.litellm.ai/docs/secret_managers">AWS Secrets Manager</a></strong>
<ul>
<li>Adds IAM role assumption support for AWS Secret Manager - <a href="https://github.com/BerriAI/litellm/pull/16887" target="_blank" rel="noopener noreferrer">PR #16887</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="mcp-gateway">MCP Gateway<a href="https://docs.litellm.ai/release_notes/v1-80-5#mcp-gateway" class="hash-link" aria-label="Direct link to MCP Gateway" title="Direct link to MCP Gateway">​</a></h2>
<ul>
<li><strong>MCP Hub</strong> - Publish/discover MCP Servers within a company - <a href="https://github.com/BerriAI/litellm/pull/16857" target="_blank" rel="noopener noreferrer">PR #16857</a></li>
<li><strong>MCP Resources</strong> - MCP resources support - <a href="https://github.com/BerriAI/litellm/pull/16800" target="_blank" rel="noopener noreferrer">PR #16800</a></li>
<li><strong>MCP OAuth</strong> - Docs: MCP OAuth flow details - <a href="https://github.com/BerriAI/litellm/pull/16742" target="_blank" rel="noopener noreferrer">PR #16742</a></li>
<li><strong>MCP Lifecycle</strong> - Drop MCPClient.connect and use run_with_session lifecycle - <a href="https://github.com/BerriAI/litellm/pull/16696" target="_blank" rel="noopener noreferrer">PR #16696</a></li>
<li><strong>MCP Server IDs</strong> - Add mcp server ids - <a href="https://github.com/BerriAI/litellm/pull/16904" target="_blank" rel="noopener noreferrer">PR #16904</a></li>
<li><strong>MCP URL Format</strong> - Fix MCP URL format - <a href="https://github.com/BerriAI/litellm/pull/16940" target="_blank" rel="noopener noreferrer">PR #16940</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="performance--loadbalancing--reliability-improvements">Performance / Loadbalancing / Reliability improvements<a href="https://docs.litellm.ai/release_notes/v1-80-5#performance--loadbalancing--reliability-improvements" class="hash-link" aria-label="Direct link to Performance / Loadbalancing / Reliability improvements" title="Direct link to Performance / Loadbalancing / Reliability improvements">​</a></h2>
<ul>
<li><strong>Realtime Endpoint Performance</strong> - Fix bottlenecks degrading realtime endpoint performance - <a href="https://github.com/BerriAI/litellm/pull/16670" target="_blank" rel="noopener noreferrer">PR #16670</a></li>
<li><strong>SSL Context Caching</strong> - Cache SSL contexts to prevent excessive memory allocation - <a href="https://github.com/BerriAI/litellm/pull/16955" target="_blank" rel="noopener noreferrer">PR #16955</a></li>
<li><strong>Cache Optimization</strong> - Fix cache cooldown key generation - <a href="https://github.com/BerriAI/litellm/pull/16954" target="_blank" rel="noopener noreferrer">PR #16954</a></li>
<li><strong>Router Cache</strong> - Fix routing for requests with same cacheable prefix but different user messages - <a href="https://github.com/BerriAI/litellm/pull/16951" target="_blank" rel="noopener noreferrer">PR #16951</a></li>
<li><strong>Redis Event Loop</strong> - Fix Redis event loop being closed on first call - <a href="https://github.com/BerriAI/litellm/pull/16913" target="_blank" rel="noopener noreferrer">PR #16913</a></li>
<li><strong>Dependency Management</strong> - Upgrade pydantic to version 2.11.0 - <a href="https://github.com/BerriAI/litellm/pull/16909" target="_blank" rel="noopener noreferrer">PR #16909</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="documentation-updates">Documentation Updates<a href="https://docs.litellm.ai/release_notes/v1-80-5#documentation-updates" class="hash-link" aria-label="Direct link to Documentation Updates" title="Direct link to Documentation Updates">​</a></h2>
<ul>
<li>
<p><strong>Provider Documentation</strong></p>
<ul>
<li>Add missing details to benchmark comparison - <a href="https://github.com/BerriAI/litellm/pull/16690" target="_blank" rel="noopener noreferrer">PR #16690</a></li>
<li>Fix anthropic pass-through endpoint - <a href="https://github.com/BerriAI/litellm/pull/16883" target="_blank" rel="noopener noreferrer">PR #16883</a></li>
<li>Clean up repo and improve AI docs - <a href="https://github.com/BerriAI/litellm/pull/16775" target="_blank" rel="noopener noreferrer">PR #16775</a></li>
</ul>
</li>
<li>
<p><strong>API Documentation</strong></p>
<ul>
<li>Add docs for OpenAI metadata - <a href="https://github.com/BerriAI/litellm/pull/16872" target="_blank" rel="noopener noreferrer">PR #16872</a></li>
<li>Update docs with all supported endpoints and cost tracking - <a href="https://github.com/BerriAI/litellm/pull/16872" target="_blank" rel="noopener noreferrer">PR #16872</a></li>
</ul>
</li>
<li>
<p><strong>General Documentation</strong></p>
<ul>
<li>Add mini-swe-agent to Projects built on LiteLLM - <a href="https://github.com/BerriAI/litellm/pull/16971" target="_blank" rel="noopener noreferrer">PR #16971</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="infrastructure--cicd">Infrastructure / CI/CD<a href="https://docs.litellm.ai/release_notes/v1-80-5#infrastructure--cicd" class="hash-link" aria-label="Direct link to Infrastructure / CI/CD" title="Direct link to Infrastructure / CI/CD">​</a></h2>
<ul>
<li>
<p><strong>UI Testing</strong></p>
<ul>
<li>Break e2e_ui_testing into build, unit, and e2e steps - <a href="https://github.com/BerriAI/litellm/pull/16783" target="_blank" rel="noopener noreferrer">PR #16783</a></li>
<li>Building UI for Testing - <a href="https://github.com/BerriAI/litellm/pull/16968" target="_blank" rel="noopener noreferrer">PR #16968</a></li>
<li>CI/CD Fixes - <a href="https://github.com/BerriAI/litellm/pull/16937" target="_blank" rel="noopener noreferrer">PR #16937</a></li>
</ul>
</li>
<li>
<p><strong>Dependency Management</strong></p>
<ul>
<li>Bump js-yaml from 3.14.1 to 3.14.2 in /tests/proxy_admin_ui_tests/ui_unit_tests - <a href="https://github.com/BerriAI/litellm/pull/16755" target="_blank" rel="noopener noreferrer">PR #16755</a></li>
<li>Bump js-yaml from 3.14.1 to 3.14.2 - <a href="https://github.com/BerriAI/litellm/pull/16802" target="_blank" rel="noopener noreferrer">PR #16802</a></li>
</ul>
</li>
<li>
<p><strong>Migration</strong></p>
<ul>
<li>Migration job labels - <a href="https://github.com/BerriAI/litellm/pull/16831" target="_blank" rel="noopener noreferrer">PR #16831</a></li>
</ul>
</li>
<li>
<p><strong>Config</strong></p>
<ul>
<li>This yaml actually works - <a href="https://github.com/BerriAI/litellm/pull/16757" target="_blank" rel="noopener noreferrer">PR #16757</a></li>
</ul>
</li>
<li>
<p><strong>Release Notes</strong></p>
<ul>
<li>Add perf improvements on embeddings to release notes - <a href="https://github.com/BerriAI/litellm/pull/16697" target="_blank" rel="noopener noreferrer">PR #16697</a></li>
<li>Docs - v1.80.0 - <a href="https://github.com/BerriAI/litellm/pull/16694" target="_blank" rel="noopener noreferrer">PR #16694</a></li>
</ul>
</li>
<li>
<p><strong>Investigation</strong></p>
<ul>
<li>Investigate issue root cause - <a href="https://github.com/BerriAI/litellm/pull/16859" target="_blank" rel="noopener noreferrer">PR #16859</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-contributors">New Contributors<a href="https://docs.litellm.ai/release_notes/v1-80-5#new-contributors" class="hash-link" aria-label="Direct link to New Contributors" title="Direct link to New Contributors">​</a></h2>
<ul>
<li>@mattmorgis made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16371" target="_blank" rel="noopener noreferrer">PR #16371</a></li>
<li>@mmandic-coatue made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16732" target="_blank" rel="noopener noreferrer">PR #16732</a></li>
<li>@Bradley-Butcher made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16725" target="_blank" rel="noopener noreferrer">PR #16725</a></li>
<li>@BenjaminLevy made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16757" target="_blank" rel="noopener noreferrer">PR #16757</a></li>
<li>@CatBraaain made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16767" target="_blank" rel="noopener noreferrer">PR #16767</a></li>
<li>@tushar8408 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16831" target="_blank" rel="noopener noreferrer">PR #16831</a></li>
<li>@nbsp1221 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16845" target="_blank" rel="noopener noreferrer">PR #16845</a></li>
<li>@idola9 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16832" target="_blank" rel="noopener noreferrer">PR #16832</a></li>
<li>@nkukard made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16864" target="_blank" rel="noopener noreferrer">PR #16864</a></li>
<li>@alhuang10 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16852" target="_blank" rel="noopener noreferrer">PR #16852</a></li>
<li>@sebslight made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16838" target="_blank" rel="noopener noreferrer">PR #16838</a></li>
<li>@TsurumaruTsuyoshi made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16905" target="_blank" rel="noopener noreferrer">PR #16905</a></li>
<li>@cyberjunk made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16492" target="_blank" rel="noopener noreferrer">PR #16492</a></li>
<li>@colinlin-stripe made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16895" target="_blank" rel="noopener noreferrer">PR #16895</a></li>
<li>@sureshdsk made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16883" target="_blank" rel="noopener noreferrer">PR #16883</a></li>
<li>@eiliyaabedini made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16875" target="_blank" rel="noopener noreferrer">PR #16875</a></li>
<li>@justin-tahara made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16957" target="_blank" rel="noopener noreferrer">PR #16957</a></li>
<li>@wangsoft made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16913" target="_blank" rel="noopener noreferrer">PR #16913</a></li>
<li>@dsduenas made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16891" target="_blank" rel="noopener noreferrer">PR #16891</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="known-issues">Known Issues<a href="https://docs.litellm.ai/release_notes/v1-80-5#known-issues" class="hash-link" aria-label="Direct link to Known Issues" title="Direct link to Known Issues">​</a></h2>
<ul>
<li><code>/audit</code> and <code>/user/available_users</code> routes return 404. Fixed in <a href="https://github.com/BerriAI/litellm/pull/17337" target="_blank" rel="noopener noreferrer">PR #17337</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="full-changelog">Full Changelog<a href="https://docs.litellm.ai/release_notes/v1-80-5#full-changelog" class="hash-link" aria-label="Direct link to Full Changelog" title="Direct link to Full Changelog">​</a></h2>
<p><strong><a href="https://github.com/BerriAI/litellm/compare/v1.80.0-nightly...v1.80.5.rc.2" target="_blank" rel="noopener noreferrer">View complete changelog on GitHub</a></strong></p>]]></content>
        <author>
            <name>Krrish Dholakia</name>
            <uri>https://www.linkedin.com/in/krish-d/</uri>
        </author>
        <author>
            <name>Ishaan Jaff</name>
            <uri>https://www.linkedin.com/in/reffajnaahsi/</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[v1.80.0-stable - Introducing Agent Hub: Register, Publish, and Share Agents]]></title>
        <id>https://docs.litellm.ai/release_notes/v1-80-0</id>
        <link href="https://docs.litellm.ai/release_notes/v1-80-0"/>
        <updated>2025-11-15T10:00:00.000Z</updated>
        <summary type="html"><![CDATA[Deploy this version]]></summary>
        <content type="html"><![CDATA[<h2 class="anchor anchorWithStickyNavbar_LWe7" id="deploy-this-version">Deploy this version<a href="https://docs.litellm.ai/release_notes/v1-80-0#deploy-this-version" class="hash-link" aria-label="Direct link to Deploy this version" title="Direct link to Deploy this version">​</a></h2>
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">Docker</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">Pip</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">docker run litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker run \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-e STORE_MODEL_IN_DB=True \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-p 4000:4000 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">docker.litellm.ai/berriai/litellm:v1.80.0-stable</span><br></span></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">pip install litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">pip install litellm==1.80.0</span><br></span></code></pre></div></div></div></div></div>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="key-highlights">Key Highlights<a href="https://docs.litellm.ai/release_notes/v1-80-0#key-highlights" class="hash-link" aria-label="Direct link to Key Highlights" title="Direct link to Key Highlights">​</a></h2>
<ul>
<li><strong>🆕 Agent Hub Support</strong> - Register and make agents public for your organization</li>
<li><strong>RunwayML Provider</strong> - Complete video generation, image generation, and text-to-speech support</li>
<li><strong>GPT-5.1 Family Support</strong> - Day-0 support for OpenAI's latest GPT-5.1 and GPT-5.1-Codex models</li>
<li><strong>Prometheus OSS</strong> - Prometheus metrics now available in open-source version</li>
<li><strong>Vector Store Files API</strong> - Complete OpenAI-compatible Vector Store Files API with full CRUD operations</li>
<li><strong>Embeddings Performance</strong> - O(1) lookup optimization for router embeddings with shared sessions</li>
</ul>
<hr>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="agent-hub">Agent Hub<a href="https://docs.litellm.ai/release_notes/v1-80-0#agent-hub" class="hash-link" aria-label="Direct link to Agent Hub" title="Direct link to Agent Hub">​</a></h3>
<div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAFCAYAAAB8ZH1oAAAACXBIWXMAABYlAAAWJQFJUiTwAAAAb0lEQVR4nE2N0QqFIQiDe//XDIKgiyBK03QHO/zQ4GPgZEu1VuScMecEEWERQVRh5+A8pNYaSinovWOMgbUWmDfMDJ/cHSm+4xiuqreVmG/4kvbe+BCRO81b4P5vCo4ZUoQvixiTBMccIgrVWHT8AADXw6viLsfXAAAAAElFTkSuQmCC&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="640" height="334"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/assets/ideal-img/agent_hub_clean.5715779.640.png" srcset="/assets/ideal-img/agent_hub_clean.5715779.640.png 640w,/assets/ideal-img/agent_hub_clean.59a4bbe.1920.png 1920w" width="640" height="334"></noscript></div>
<p>This release adds support for registering agents and making them public within your organization. This is great for <strong>Proxy Admins</strong> who want a central place to make agents built in their organization discoverable to their users.</p>
<p>Here's the flow:</p>
<ol>
<li>Add agent to litellm.</li>
<li>Make it public.</li>
<li>Allow anyone to discover it on the public AI Hub page.</li>
</ol>
<p><a href="https://docs.litellm.ai/docs/proxy/ai_hub"><strong>Get Started with Agent Hub</strong></a></p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="performance--embeddings-13-lower-p95-latency">Performance – <code>/embeddings</code> 13× Lower p95 Latency<a href="https://docs.litellm.ai/release_notes/v1-80-0#performance--embeddings-13-lower-p95-latency" class="hash-link" aria-label="Direct link to performance--embeddings-13-lower-p95-latency" title="Direct link to performance--embeddings-13-lower-p95-latency">​</a></h3>
<p>This update significantly improves <code>/embeddings</code> latency by routing it through the same optimized pipeline as <code>/chat/completions</code>, benefiting from all previously applied networking optimizations.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="results">Results<a href="https://docs.litellm.ai/release_notes/v1-80-0#results" class="hash-link" aria-label="Direct link to Results" title="Direct link to Results">​</a></h3>
<table><thead><tr><th>Metric</th><th>Before</th><th>After</th><th>Improvement</th></tr></thead><tbody><tr><td>p95 latency</td><td>5,700 ms</td><td><strong>430 ms</strong></td><td>−92% (~13× faster)</td></tr><tr><td>p99 latency</td><td>7,200 ms</td><td><strong>780 ms</strong></td><td>−89%</td></tr><tr><td>Average latency</td><td>844 ms</td><td><strong>262 ms</strong></td><td>−69%</td></tr><tr><td>Median latency</td><td>290 ms</td><td><strong>230 ms</strong></td><td>−21%</td></tr><tr><td>RPS</td><td>1,216.7</td><td><strong>1,219.7</strong></td><td><strong>+0.25%</strong></td></tr></tbody></table>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="test-setup">Test Setup<a href="https://docs.litellm.ai/release_notes/v1-80-0#test-setup" class="hash-link" aria-label="Direct link to Test Setup" title="Direct link to Test Setup">​</a></h3>
<table><thead><tr><th>Category</th><th>Specification</th></tr></thead><tbody><tr><td><strong>Load Testing</strong></td><td>Locust: 1,000 concurrent users, 500 ramp-up</td></tr><tr><td><strong>System</strong></td><td>4 vCPUs, 8 GB RAM, 4 workers, 4 instances</td></tr><tr><td><strong>Database</strong></td><td>PostgreSQL (Redis unused)</td></tr><tr><td><strong>Configuration</strong></td><td><a href="https://gist.github.com/AlexsanderHamir/550791675fd752befcac6a9e44024652" target="_blank" rel="noopener noreferrer">config.yaml</a></td></tr><tr><td><strong>Load Script</strong></td><td><a href="https://gist.github.com/AlexsanderHamir/99d673bf74cdd81fd39f59fa9048f2e8" target="_blank" rel="noopener noreferrer">no_cache_hits.py</a></td></tr></tbody></table>
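<p>The faster pipeline applies to any embedding deployment served through the proxy. As a minimal sketch (the model alias, upstream model, and environment variable below are illustrative placeholders, not the exact benchmark configuration), a proxy <code>config.yaml</code> entry for an embedding model looks like:</p>

```yaml
model_list:
  - model_name: embedding-model              # alias clients pass in the "model" field
    litellm_params:
      model: openai/text-embedding-3-small   # example upstream model; swap in your provider
      api_key: os.environ/OPENAI_API_KEY     # key read from the environment at startup
```

<p>Requests to <code>/embeddings</code> with <code>"model": "embedding-model"</code> are then routed through the same optimized networking path as chat completions.</p>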
<hr>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="-runwayml">🆕 RunwayML<a href="https://docs.litellm.ai/release_notes/v1-80-0#-runwayml" class="hash-link" aria-label="Direct link to 🆕 RunwayML" title="Direct link to 🆕 RunwayML">​</a></h3>
<p>Complete integration for RunwayML's Gen-4 family of models, supporting video generation, image generation, and text-to-speech.</p>
<p><strong>Supported Endpoints:</strong></p>
<ul>
<li><code>/v1/videos</code> - Video generation (Gen-4 Turbo, Gen-4 Aleph, Gen-3A Turbo)</li>
<li><code>/v1/images/generations</code> - Image generation (Gen-4 Image, Gen-4 Image Turbo)</li>
<li><code>/v1/audio/speech</code> - Text-to-speech (ElevenLabs Multilingual v2)</li>
</ul>
<p><strong>Quick Start:</strong></p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">Generate Video with RunwayML</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv codeBlockLinesWithNumbering_o6Pm" style="counter-reset:line-count 0"><span class="token-line codeLine_lJS_" style="color:#393A34"><span class="codeLineNumber_Tfdd"></span><span class="codeLineContent_feaV"><span class="token plain">curl --location 'http://localhost:4000/v1/videos' \</span></span><br></span><span class="token-line codeLine_lJS_" style="color:#393A34"><span class="codeLineNumber_Tfdd"></span><span class="codeLineContent_feaV"><span class="token plain">--header 'Content-Type: application/json' \</span></span><br></span><span class="token-line codeLine_lJS_" style="color:#393A34"><span class="codeLineNumber_Tfdd"></span><span class="codeLineContent_feaV"><span class="token plain">--header 'Authorization: Bearer sk-1234' \</span></span><br></span><span class="token-line codeLine_lJS_" style="color:#393A34"><span class="codeLineNumber_Tfdd"></span><span class="codeLineContent_feaV"><span class="token plain">--data '{</span></span><br></span><span class="token-line codeLine_lJS_" style="color:#393A34"><span class="codeLineNumber_Tfdd"></span><span class="codeLineContent_feaV"><span class="token plain">    "model": "runwayml/gen4_turbo",</span></span><br></span><span class="token-line codeLine_lJS_" style="color:#393A34"><span class="codeLineNumber_Tfdd"></span><span class="codeLineContent_feaV"><span class="token plain">    "prompt": "A high quality demo video of litellm ai gateway",</span></span><br></span><span class="token-line codeLine_lJS_" style="color:#393A34"><span class="codeLineNumber_Tfdd"></span><span class="codeLineContent_feaV"><span class="token plain">    "input_reference": "https://example.com/image.jpg",</span></span><br></span><span class="token-line codeLine_lJS_" style="color:#393A34"><span class="codeLineNumber_Tfdd"></span><span class="codeLineContent_feaV"><span class="token plain">    "seconds": 5,</span></span><br></span><span class="token-line codeLine_lJS_" style="color:#393A34"><span class="codeLineNumber_Tfdd"></span><span class="codeLineContent_feaV"><span class="token plain">    "size": "1280x720"</span></span><br></span><span class="token-line codeLine_lJS_" style="color:#393A34"><span class="codeLineNumber_Tfdd"></span><span class="codeLineContent_feaV"><span class="token plain">}'</span></span><br></span></code></pre></div></div>
<p><a href="https://docs.litellm.ai/docs/providers/runwayml/videos">Get Started with RunwayML</a></p>
<hr>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="prometheus-metrics---open-source">Prometheus Metrics - Open Source<a href="https://docs.litellm.ai/release_notes/v1-80-0#prometheus-metrics---open-source" class="hash-link" aria-label="Direct link to Prometheus Metrics - Open Source" title="Direct link to Prometheus Metrics - Open Source">​</a></h3>
<p>Prometheus metrics are now available in the open-source version of LiteLLM, providing comprehensive observability for your AI Gateway without requiring an enterprise license.</p>
<p><strong>Quick Start:</strong></p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">litellm_settings</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">success_callback</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"prometheus"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">failure_callback</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"prometheus"</span><span class="token punctuation" style="color:#393A34">]</span><br></span></code></pre></div></div>
<p><a href="https://docs.litellm.ai/docs/proxy/logging#prometheus">Get Started with Prometheus</a></p>
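<p>With the callbacks above enabled, a Prometheus server can scrape the proxy's metrics endpoint. A minimal scrape-config sketch, assuming the proxy is reachable at <code>localhost:4000</code> (adjust the target and path for your deployment):</p>

```yaml
scrape_configs:
  - job_name: litellm
    metrics_path: /metrics            # endpoint the proxy exposes metrics on
    static_configs:
      - targets: ["localhost:4000"]   # proxy host:port
```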
<hr>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="vector-store-files-api">Vector Store Files API<a href="https://docs.litellm.ai/release_notes/v1-80-0#vector-store-files-api" class="hash-link" aria-label="Direct link to Vector Store Files API" title="Direct link to Vector Store Files API">​</a></h3>
<p>The complete OpenAI-compatible Vector Store Files API is now stable, enabling full file lifecycle management within vector stores.</p>
<p><strong>Supported Endpoints:</strong></p>
<ul>
<li><code>POST /v1/vector_stores/{vector_store_id}/files</code> - Create vector store file</li>
<li><code>GET /v1/vector_stores/{vector_store_id}/files</code> - List vector store files</li>
<li><code>GET /v1/vector_stores/{vector_store_id}/files/{file_id}</code> - Retrieve vector store file</li>
<li><code>GET /v1/vector_stores/{vector_store_id}/files/{file_id}/content</code> - Retrieve file content</li>
<li><code>DELETE /v1/vector_stores/{vector_store_id}/files/{file_id}</code> - Delete vector store file</li>
<li><code>DELETE /v1/vector_stores/{vector_store_id}</code> - Delete vector store</li>
</ul>
<p><strong>Quick Start:</strong></p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">Create Vector Store File</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv codeBlockLinesWithNumbering_o6Pm" style="counter-reset:line-count 0"><span class="token-line codeLine_lJS_" style="color:#393A34"><span class="codeLineNumber_Tfdd"></span><span class="codeLineContent_feaV"><span class="token plain">curl --location 'http://localhost:4000/v1/vector_stores/vs_123/files' \</span></span><br></span><span class="token-line codeLine_lJS_" style="color:#393A34"><span class="codeLineNumber_Tfdd"></span><span class="codeLineContent_feaV"><span class="token plain">--header 'Content-Type: application/json' \</span></span><br></span><span class="token-line codeLine_lJS_" style="color:#393A34"><span class="codeLineNumber_Tfdd"></span><span class="codeLineContent_feaV"><span class="token plain">--header 'Authorization: Bearer sk-1234' \</span></span><br></span><span class="token-line codeLine_lJS_" style="color:#393A34"><span class="codeLineNumber_Tfdd"></span><span class="codeLineContent_feaV"><span class="token plain">--data '{</span></span><br></span><span class="token-line codeLine_lJS_" style="color:#393A34"><span class="codeLineNumber_Tfdd"></span><span class="codeLineContent_feaV"><span class="token plain">    "file_id": "file_abc"</span></span><br></span><span class="token-line codeLine_lJS_" style="color:#393A34"><span class="codeLineNumber_Tfdd"></span><span class="codeLineContent_feaV"><span class="token plain">}'</span></span><br></span></code></pre></div></div>
<p><a href="https://docs.litellm.ai/docs/vector_store_files">Get Started with Vector Stores</a></p>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-providers-and-endpoints">New Providers and Endpoints<a href="https://docs.litellm.ai/release_notes/v1-80-0#new-providers-and-endpoints" class="hash-link" aria-label="Direct link to New Providers and Endpoints" title="Direct link to New Providers and Endpoints">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="new-providers">New Providers<a href="https://docs.litellm.ai/release_notes/v1-80-0#new-providers" class="hash-link" aria-label="Direct link to New Providers" title="Direct link to New Providers">​</a></h3>
<table><thead><tr><th>Provider</th><th>Supported Endpoints</th><th>Description</th></tr></thead><tbody><tr><td><strong><a href="https://docs.litellm.ai/docs/providers/runwayml/videos">RunwayML</a></strong></td><td><code>/v1/videos</code>, <code>/v1/images/generations</code>, <code>/v1/audio/speech</code></td><td>Gen-4 video generation, image generation, and text-to-speech</td></tr></tbody></table>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="new-llm-api-endpoints">New LLM API Endpoints<a href="https://docs.litellm.ai/release_notes/v1-80-0#new-llm-api-endpoints" class="hash-link" aria-label="Direct link to New LLM API Endpoints" title="Direct link to New LLM API Endpoints">​</a></h3>
<table><thead><tr><th>Endpoint</th><th>Method</th><th>Description</th><th>Documentation</th></tr></thead><tbody><tr><td><code>/v1/vector_stores/{vector_store_id}/files</code></td><td>POST</td><td>Create vector store file</td><td><a href="https://docs.litellm.ai/docs/vector_store_files">Docs</a></td></tr><tr><td><code>/v1/vector_stores/{vector_store_id}/files</code></td><td>GET</td><td>List vector store files</td><td><a href="https://docs.litellm.ai/docs/vector_store_files">Docs</a></td></tr><tr><td><code>/v1/vector_stores/{vector_store_id}/files/{file_id}</code></td><td>GET</td><td>Retrieve vector store file</td><td><a href="https://docs.litellm.ai/docs/vector_store_files">Docs</a></td></tr><tr><td><code>/v1/vector_stores/{vector_store_id}/files/{file_id}/content</code></td><td>GET</td><td>Retrieve file content</td><td><a href="https://docs.litellm.ai/docs/vector_store_files">Docs</a></td></tr><tr><td><code>/v1/vector_stores/{vector_store_id}/files/{file_id}</code></td><td>DELETE</td><td>Delete vector store file</td><td><a href="https://docs.litellm.ai/docs/vector_store_files">Docs</a></td></tr><tr><td><code>/v1/vector_stores/{vector_store_id}</code></td><td>DELETE</td><td>Delete vector store</td><td><a href="https://docs.litellm.ai/docs/vector_store_files">Docs</a></td></tr></tbody></table>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-models--updated-models">New Models / Updated Models<a href="https://docs.litellm.ai/release_notes/v1-80-0#new-models--updated-models" class="hash-link" aria-label="Direct link to New Models / Updated Models" title="Direct link to New Models / Updated Models">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="new-model-support">New Model Support<a href="https://docs.litellm.ai/release_notes/v1-80-0#new-model-support" class="hash-link" aria-label="Direct link to New Model Support" title="Direct link to New Model Support">​</a></h4>
<table><thead><tr><th>Provider</th><th>Model</th><th>Context Window</th><th>Input ($/1M tokens)</th><th>Output ($/1M tokens)</th><th>Features</th></tr></thead><tbody><tr><td>OpenAI</td><td><code>gpt-5.1</code></td><td>272K</td><td>$1.25</td><td>$10.00</td><td>Reasoning, vision, PDF input, responses API</td></tr><tr><td>OpenAI</td><td><code>gpt-5.1-2025-11-13</code></td><td>272K</td><td>$1.25</td><td>$10.00</td><td>Reasoning, vision, PDF input, responses API</td></tr><tr><td>OpenAI</td><td><code>gpt-5.1-chat-latest</code></td><td>128K</td><td>$1.25</td><td>$10.00</td><td>Reasoning, vision, PDF input</td></tr><tr><td>OpenAI</td><td><code>gpt-5.1-codex</code></td><td>272K</td><td>$1.25</td><td>$10.00</td><td>Responses API, reasoning, vision</td></tr><tr><td>OpenAI</td><td><code>gpt-5.1-codex-mini</code></td><td>272K</td><td>$0.25</td><td>$2.00</td><td>Responses API, reasoning, vision</td></tr><tr><td>Moonshot</td><td><code>moonshot/kimi-k2-thinking</code></td><td>262K</td><td>$0.60</td><td>$2.50</td><td>Function calling, web search, reasoning</td></tr><tr><td>Mistral</td><td><code>mistral/magistral-medium-2509</code></td><td>40K</td><td>$2.00</td><td>$5.00</td><td>Reasoning, function calling</td></tr><tr><td>Vertex AI</td><td><code>vertex_ai/moonshotai/kimi-k2-thinking-maas</code></td><td>256K</td><td>$0.60</td><td>$2.50</td><td>Function calling, web search</td></tr><tr><td>OpenRouter</td><td><code>openrouter/deepseek/deepseek-v3.2-exp</code></td><td>164K</td><td>$0.20</td><td>$0.40</td><td>Function calling, prompt caching</td></tr><tr><td>OpenRouter</td><td><code>openrouter/minimax/minimax-m2</code></td><td>205K</td><td>$0.26</td><td>$1.02</td><td>Function calling, reasoning</td></tr><tr><td>OpenRouter</td><td><code>openrouter/z-ai/glm-4.6</code></td><td>203K</td><td>$0.40</td><td>$1.75</td><td>Function calling, reasoning</td></tr><tr><td>OpenRouter</td><td><code>openrouter/z-ai/glm-4.6:exacto</code></td><td>203K</td><td>$0.45</td><td>$1.90</td><td>Function calling, 
reasoning</td></tr><tr><td>Voyage</td><td><code>voyage/voyage-3.5</code></td><td>32K</td><td>$0.06</td><td>-</td><td>Embeddings</td></tr><tr><td>Voyage</td><td><code>voyage/voyage-3.5-lite</code></td><td>32K</td><td>$0.02</td><td>-</td><td>Embeddings</td></tr></tbody></table>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="video-generation-models">Video Generation Models<a href="https://docs.litellm.ai/release_notes/v1-80-0#video-generation-models" class="hash-link" aria-label="Direct link to Video Generation Models" title="Direct link to Video Generation Models">​</a></h4>
<table><thead><tr><th>Provider</th><th>Model</th><th>Cost Per Second</th><th>Resolutions</th><th>Features</th></tr></thead><tbody><tr><td>RunwayML</td><td><code>runwayml/gen4_turbo</code></td><td>$0.05</td><td>1280x720, 720x1280</td><td>Text + image to video</td></tr><tr><td>RunwayML</td><td><code>runwayml/gen4_aleph</code></td><td>$0.15</td><td>1280x720, 720x1280</td><td>Text + image to video</td></tr><tr><td>RunwayML</td><td><code>runwayml/gen3a_turbo</code></td><td>$0.05</td><td>1280x720, 720x1280</td><td>Text + image to video</td></tr></tbody></table>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="image-generation-models">Image Generation Models<a href="https://docs.litellm.ai/release_notes/v1-80-0#image-generation-models" class="hash-link" aria-label="Direct link to Image Generation Models" title="Direct link to Image Generation Models">​</a></h4>
<table><thead><tr><th>Provider</th><th>Model</th><th>Cost Per Image</th><th>Resolutions</th><th>Features</th></tr></thead><tbody><tr><td>RunwayML</td><td><code>runwayml/gen4_image</code></td><td>$0.05</td><td>1280x720, 1920x1080</td><td>Text + image to image</td></tr><tr><td>RunwayML</td><td><code>runwayml/gen4_image_turbo</code></td><td>$0.02</td><td>1280x720, 1920x1080</td><td>Text + image to image</td></tr><tr><td>Fal.ai</td><td><code>fal_ai/fal-ai/flux-pro/v1.1</code></td><td>$0.04/image</td><td>-</td><td>Image generation</td></tr><tr><td>Fal.ai</td><td><code>fal_ai/fal-ai/flux/schnell</code></td><td>$0.003/image</td><td>-</td><td>Fast image generation</td></tr><tr><td>Fal.ai</td><td><code>fal_ai/fal-ai/bytedance/seedream/v3/text-to-image</code></td><td>$0.03/image</td><td>-</td><td>Image generation</td></tr><tr><td>Fal.ai</td><td><code>fal_ai/fal-ai/bytedance/dreamina/v3.1/text-to-image</code></td><td>$0.03/image</td><td>-</td><td>Image generation</td></tr><tr><td>Fal.ai</td><td><code>fal_ai/fal-ai/ideogram/v3</code></td><td>$0.06/image</td><td>-</td><td>Image generation</td></tr><tr><td>Fal.ai</td><td><code>fal_ai/fal-ai/imagen4/preview/fast</code></td><td>$0.02/image</td><td>-</td><td>Fast image generation</td></tr><tr><td>Fal.ai</td><td><code>fal_ai/fal-ai/imagen4/preview/ultra</code></td><td>$0.06/image</td><td>-</td><td>High-quality image generation</td></tr></tbody></table>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="audio-models">Audio Models<a href="https://docs.litellm.ai/release_notes/v1-80-0#audio-models" class="hash-link" aria-label="Direct link to Audio Models" title="Direct link to Audio Models">​</a></h4>
<table><thead><tr><th>Provider</th><th>Model</th><th>Cost</th><th>Features</th></tr></thead><tbody><tr><td>RunwayML</td><td><code>runwayml/eleven_multilingual_v2</code></td><td>$0.0003/char</td><td>Text-to-speech</td></tr></tbody></table>
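The model names in the tables above are LiteLLM's provider-prefixed identifiers. As a hedged sketch (not a live call), this builds the request dict that `litellm.image_generation(**request)` would send for one of the newly priced Fal.ai models; a real call additionally needs litellm installed and Fal.ai credentials configured:

```python
# Local sketch only: no network call is made here. The prompt string and
# the n parameter are illustrative; the model id comes from the table above.
request = {
    "model": "fal_ai/fal-ai/flux/schnell",  # fast image generation, $0.003/image
    "prompt": "a lighthouse at dusk, watercolor",
    "n": 1,  # number of images to generate
}
```

The `fal_ai/` prefix routes the request to the Fal.ai provider; the remainder of the string is the upstream model path exactly as listed in the table.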
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features">Features<a href="https://docs.litellm.ai/release_notes/v1-80-0#features" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/openai">OpenAI</a></strong></p>
<ul>
<li>Add GPT-5.1 family support with reasoning capabilities - <a href="https://github.com/BerriAI/litellm/pull/16598" target="_blank" rel="noopener noreferrer">PR #16598</a></li>
<li>Add support for <code>reasoning_effort='none'</code> for GPT-5.1 - <a href="https://github.com/BerriAI/litellm/pull/16658" target="_blank" rel="noopener noreferrer">PR #16658</a></li>
<li>Add <code>verbosity</code> parameter support for GPT-5 family models - <a href="https://github.com/BerriAI/litellm/pull/16660" target="_blank" rel="noopener noreferrer">PR #16660</a></li>
<li>Fix: forward the OpenAI organization for image generation requests - <a href="https://github.com/BerriAI/litellm/pull/16607" target="_blank" rel="noopener noreferrer">PR #16607</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/gemini">Gemini (Google AI Studio + Vertex AI)</a></strong></p>
<ul>
<li>Add support for <code>reasoning_effort='none'</code> for Gemini models - <a href="https://github.com/BerriAI/litellm/pull/16548" target="_blank" rel="noopener noreferrer">PR #16548</a></li>
<li>Add support for all Gemini image models in image generation - <a href="https://github.com/BerriAI/litellm/pull/16526" target="_blank" rel="noopener noreferrer">PR #16526</a></li>
<li>Add Gemini image edit support - <a href="https://github.com/BerriAI/litellm/pull/16430" target="_blank" rel="noopener noreferrer">PR #16430</a></li>
<li>Fix: preserve non-ASCII characters in function call arguments - <a href="https://github.com/BerriAI/litellm/pull/16550" target="_blank" rel="noopener noreferrer">PR #16550</a></li>
<li>Fix Gemini conversation format issue with MCP auto-execution - <a href="https://github.com/BerriAI/litellm/pull/16592" target="_blank" rel="noopener noreferrer">PR #16592</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/bedrock">Bedrock</a></strong></p>
<ul>
<li>Add support for filtering knowledge base queries - <a href="https://github.com/BerriAI/litellm/pull/16543" target="_blank" rel="noopener noreferrer">PR #16543</a></li>
<li>Ensure correct <code>aws_region</code> is used when provided dynamically for embeddings - <a href="https://github.com/BerriAI/litellm/pull/16547" target="_blank" rel="noopener noreferrer">PR #16547</a></li>
<li>Add support for custom KMS encryption keys in Bedrock Batch operations - <a href="https://github.com/BerriAI/litellm/pull/16662" target="_blank" rel="noopener noreferrer">PR #16662</a></li>
<li>Add bearer token authentication support for AgentCore - <a href="https://github.com/BerriAI/litellm/pull/16556" target="_blank" rel="noopener noreferrer">PR #16556</a></li>
<li>Fix: make the AgentCore SSE stream iterator async for proper streaming support - <a href="https://github.com/BerriAI/litellm/pull/16293" target="_blank" rel="noopener noreferrer">PR #16293</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/anthropic">Anthropic</a></strong></p>
<ul>
<li>Add context management param support - <a href="https://github.com/BerriAI/litellm/pull/16528" target="_blank" rel="noopener noreferrer">PR #16528</a></li>
<li>Fix: preserve <code>$defs</code> in Anthropic tool input schemas - <a href="https://github.com/BerriAI/litellm/pull/16648" target="_blank" rel="noopener noreferrer">PR #16648</a></li>
<li>Fix: support Anthropic <code>tool_use</code> and <code>tool_result</code> blocks in the token counter - <a href="https://github.com/BerriAI/litellm/pull/16351" target="_blank" rel="noopener noreferrer">PR #16351</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/vertex_ai">Vertex AI</a></strong></p>
<ul>
<li>Add Vertex Kimi-K2-Thinking support - <a href="https://github.com/BerriAI/litellm/pull/16671" target="_blank" rel="noopener noreferrer">PR #16671</a></li>
<li>Add <code>vertex_credentials</code> support to <code>litellm.rerank()</code> - <a href="https://github.com/BerriAI/litellm/pull/16479" target="_blank" rel="noopener noreferrer">PR #16479</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/mistral">Mistral</a></strong></p>
<ul>
<li>Fix Magistral streaming to emit reasoning chunks - <a href="https://github.com/BerriAI/litellm/pull/16434" target="_blank" rel="noopener noreferrer">PR #16434</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/moonshot">Moonshot (Kimi)</a></strong></p>
<ul>
<li>Add Kimi K2 thinking model support - <a href="https://github.com/BerriAI/litellm/pull/16445" target="_blank" rel="noopener noreferrer">PR #16445</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/sambanova">SambaNova</a></strong></p>
<ul>
<li>Fix: SambaNova API rejecting requests when message content is passed as a list - <a href="https://github.com/BerriAI/litellm/pull/16612" target="_blank" rel="noopener noreferrer">PR #16612</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/vllm">vLLM</a></strong></p>
<ul>
<li>Fix: use the vLLM passthrough config for the hosted vLLM provider instead of raising an error - <a href="https://github.com/BerriAI/litellm/pull/16537" target="_blank" rel="noopener noreferrer">PR #16537</a></li>
<li>Add headers to vLLM passthrough requests with success event logging - <a href="https://github.com/BerriAI/litellm/pull/16532" target="_blank" rel="noopener noreferrer">PR #16532</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/azure">Azure</a></strong></p>
<ul>
<li>Fix: improve Azure auth parameter handling for <code>None</code> values - <a href="https://github.com/BerriAI/litellm/pull/14436" target="_blank" rel="noopener noreferrer">PR #14436</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/groq">Groq</a></strong></p>
<ul>
<li>Fix: parsing of failed chunks for Groq - <a href="https://github.com/BerriAI/litellm/pull/16595" target="_blank" rel="noopener noreferrer">PR #16595</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/voyage">Voyage</a></strong></p>
<ul>
<li>Add Voyage 3.5 and 3.5-lite embedding pricing and update the docs - <a href="https://github.com/BerriAI/litellm/pull/16641" target="_blank" rel="noopener noreferrer">PR #16641</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/image_generation">Fal.ai</a></strong></p>
<ul>
<li>Add fal-ai/flux/schnell support - <a href="https://github.com/BerriAI/litellm/pull/16580" target="_blank" rel="noopener noreferrer">PR #16580</a></li>
<li>Add all Fal.ai Imagen4 variants to the model map - <a href="https://github.com/BerriAI/litellm/pull/16579" target="_blank" rel="noopener noreferrer">PR #16579</a></li>
</ul>
</li>
</ul>
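Several of the items above add plain request parameters rather than new APIs, e.g. `reasoning_effort='none'` (GPT-5.1 and Gemini) and `verbosity` (GPT-5 family). As a hedged sketch, this shows how they would be passed through the SDK; the request is only constructed locally here, since actually sending it needs litellm installed and an `OPENAI_API_KEY` set:

```python
# Local sketch only: litellm.completion(**request) would forward both new
# parameters to the provider; no network call is made in this snippet.
request = {
    "model": "gpt-5.1",
    "messages": [{"role": "user", "content": "One-line summary of this release."}],
    "reasoning_effort": "none",  # PR #16658: disable extended reasoning
    "verbosity": "low",          # PR #16660: GPT-5 family verbosity control
}
```

Because LiteLLM normalizes provider parameters, the same `reasoning_effort="none"` keyword applies to the Gemini models covered by PR #16548.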
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="bug-fixes">Bug Fixes<a href="https://docs.litellm.ai/release_notes/v1-80-0#bug-fixes" class="hash-link" aria-label="Direct link to Bug Fixes" title="Direct link to Bug Fixes">​</a></h3>
<ul>
<li><strong>General</strong>
<ul>
<li>Fix: sanitize null token usage in OpenAI-compatible responses - <a href="https://github.com/BerriAI/litellm/pull/16493" target="_blank" rel="noopener noreferrer">PR #16493</a></li>
<li>Fix: apply the provided timeout value to <code>ClientTimeout.total</code> - <a href="https://github.com/BerriAI/litellm/pull/16395" target="_blank" rel="noopener noreferrer">PR #16395</a></li>
<li>Fix: no longer raise an incorrect 429 error for the wrong exception type - <a href="https://github.com/BerriAI/litellm/pull/16482" target="_blank" rel="noopener noreferrer">PR #16482</a></li>
<li>Add new models, remove duplicate models, and update pricing - <a href="https://github.com/BerriAI/litellm/pull/16491" target="_blank" rel="noopener noreferrer">PR #16491</a></li>
<li>Update model logging format for custom LLM provider - <a href="https://github.com/BerriAI/litellm/pull/16485" target="_blank" rel="noopener noreferrer">PR #16485</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="llm-api-endpoints">LLM API Endpoints<a href="https://docs.litellm.ai/release_notes/v1-80-0#llm-api-endpoints" class="hash-link" aria-label="Direct link to LLM API Endpoints" title="Direct link to LLM API Endpoints">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="new-endpoints">New Endpoints<a href="https://docs.litellm.ai/release_notes/v1-80-0#new-endpoints" class="hash-link" aria-label="Direct link to New Endpoints" title="Direct link to New Endpoints">​</a></h4>
<ul>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/management_endpoints">GET /providers</a></strong>
<ul>
<li>Add GET list of providers endpoint - <a href="https://github.com/BerriAI/litellm/pull/16432" target="_blank" rel="noopener noreferrer">PR #16432</a></li>
</ul>
</li>
</ul>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-1">Features<a href="https://docs.litellm.ai/release_notes/v1-80-0#features-1" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/video_generation">Video Generation API</a></strong></p>
<ul>
<li>Allow internal users to access video generation routes - <a href="https://github.com/BerriAI/litellm/pull/16472" target="_blank" rel="noopener noreferrer">PR #16472</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/vector_stores">Vector Stores API</a></strong></p>
<ul>
<li>Vector store files stable release with complete CRUD operations - <a href="https://github.com/BerriAI/litellm/pull/16643" target="_blank" rel="noopener noreferrer">PR #16643</a>
<ul>
<li><code>POST /v1/vector_stores/{vector_store_id}/files</code> - Create vector store file</li>
<li><code>GET /v1/vector_stores/{vector_store_id}/files</code> - List vector store files</li>
<li><code>GET /v1/vector_stores/{vector_store_id}/files/{file_id}</code> - Retrieve vector store file</li>
<li><code>GET /v1/vector_stores/{vector_store_id}/files/{file_id}/content</code> - Retrieve file content</li>
<li><code>DELETE /v1/vector_stores/{vector_store_id}/files/{file_id}</code> - Delete vector store file</li>
<li><code>DELETE /v1/vector_stores/{vector_store_id}</code> - Delete vector store</li>
</ul>
</li>
<li>Ensure users can access <code>search_results</code> for both streaming and non-streaming responses - <a href="https://github.com/BerriAI/litellm/pull/16459" target="_blank" rel="noopener noreferrer">PR #16459</a></li>
</ul>
</li>
</ul>
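The vector store file routes listed above all follow one URL template. This small sketch spells the template out against a locally running proxy (the base URL and IDs are placeholders):

```python
BASE = "http://localhost:4000"  # assumed: a locally running LiteLLM proxy

def vector_store_file_routes(vector_store_id: str, file_id: str) -> dict:
    """Return (HTTP method, URL) pairs for the vector store file routes above."""
    root = f"{BASE}/v1/vector_stores/{vector_store_id}"
    return {
        "create_file":   ("POST",   f"{root}/files"),
        "list_files":    ("GET",    f"{root}/files"),
        "retrieve_file": ("GET",    f"{root}/files/{file_id}"),
        "file_content":  ("GET",    f"{root}/files/{file_id}/content"),
        "delete_file":   ("DELETE", f"{root}/files/{file_id}"),
        "delete_store":  ("DELETE", root),
    }

routes = vector_store_file_routes("vs_123", "file_456")
```

A real client would send these with any HTTP library, adding an <code>Authorization: Bearer &lt;key&gt;</code> header the same way as for other proxy routes.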
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="bugs">Bugs<a href="https://docs.litellm.ai/release_notes/v1-80-0#bugs" class="hash-link" aria-label="Direct link to Bugs" title="Direct link to Bugs">​</a></h4>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/video_generation">Video Generation API</a></strong></p>
<ul>
<li>Fix: use GET for <code>/v1/videos/{video_id}/content</code> - <a href="https://github.com/BerriAI/litellm/pull/16672" target="_blank" rel="noopener noreferrer">PR #16672</a></li>
</ul>
</li>
<li>
<p><strong>General</strong></p>
<ul>
<li>Fix: remove generic exception handling - <a href="https://github.com/BerriAI/litellm/pull/16599" target="_blank" rel="noopener noreferrer">PR #16599</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="management-endpoints--ui">Management Endpoints / UI<a href="https://docs.litellm.ai/release_notes/v1-80-0#management-endpoints--ui" class="hash-link" aria-label="Direct link to Management Endpoints / UI" title="Direct link to Management Endpoints / UI">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-2">Features<a href="https://docs.litellm.ai/release_notes/v1-80-0#features-2" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong>Proxy CLI Auth</strong></p>
<ul>
<li>Fix: remove the strict <code>master_key</code> check in <code>add_deployment</code> - <a href="https://github.com/BerriAI/litellm/pull/16453" target="_blank" rel="noopener noreferrer">PR #16453</a></li>
</ul>
</li>
<li>
<p><strong>Virtual Keys</strong></p>
<ul>
<li>UI - Add Tags To Edit Key Flow - <a href="https://github.com/BerriAI/litellm/pull/16500" target="_blank" rel="noopener noreferrer">PR #16500</a></li>
<li>UI - Test Key page shows models based on the selected endpoint - <a href="https://github.com/BerriAI/litellm/pull/16452" target="_blank" rel="noopener noreferrer">PR #16452</a></li>
<li>UI - Expose <code>user_alias</code> in the view and update paths - <a href="https://github.com/BerriAI/litellm/pull/16669" target="_blank" rel="noopener noreferrer">PR #16669</a></li>
</ul>
</li>
<li>
<p><strong>Models + Endpoints</strong></p>
<ul>
<li>UI - Add LiteLLM Params to Edit Model - <a href="https://github.com/BerriAI/litellm/pull/16496" target="_blank" rel="noopener noreferrer">PR #16496</a></li>
<li>UI - Add Model flow uses backend data - <a href="https://github.com/BerriAI/litellm/pull/16664" target="_blank" rel="noopener noreferrer">PR #16664</a></li>
<li>UI - Remove Description Field from LLM Credentials - <a href="https://github.com/BerriAI/litellm/pull/16608" target="_blank" rel="noopener noreferrer">PR #16608</a></li>
<li>UI - Add RunwayML on Admin UI supported models/providers - <a href="https://github.com/BerriAI/litellm/pull/16606" target="_blank" rel="noopener noreferrer">PR #16606</a></li>
<li>Infra - Migrate Add Model Fields to Backend - <a href="https://github.com/BerriAI/litellm/pull/16620" target="_blank" rel="noopener noreferrer">PR #16620</a></li>
<li>Add API Endpoint for creating model access group - <a href="https://github.com/BerriAI/litellm/pull/16663" target="_blank" rel="noopener noreferrer">PR #16663</a></li>
</ul>
</li>
<li>
<p><strong>Teams</strong></p>
<ul>
<li>UI - Invite User Searchable Team Select - <a href="https://github.com/BerriAI/litellm/pull/16454" target="_blank" rel="noopener noreferrer">PR #16454</a></li>
<li>Fix: use the user budget instead of the key budget when creating a new team - <a href="https://github.com/BerriAI/litellm/pull/16074" target="_blank" rel="noopener noreferrer">PR #16074</a></li>
</ul>
</li>
<li>
<p><strong>Budgets</strong></p>
<ul>
<li>UI - Move Budgets out of Experimental - <a href="https://github.com/BerriAI/litellm/pull/16544" target="_blank" rel="noopener noreferrer">PR #16544</a></li>
</ul>
</li>
<li>
<p><strong>Guardrails</strong></p>
<ul>
<li>UI - Config-defined guardrails are no longer deletable from the table - <a href="https://github.com/BerriAI/litellm/pull/16540" target="_blank" rel="noopener noreferrer">PR #16540</a></li>
<li>Fix: remove the enterprise restriction from the guardrails list endpoint - <a href="https://github.com/BerriAI/litellm/pull/15333" target="_blank" rel="noopener noreferrer">PR #15333</a></li>
</ul>
</li>
<li>
<p><strong>Callbacks</strong></p>
<ul>
<li>UI - New Callbacks table - <a href="https://github.com/BerriAI/litellm/pull/16512" target="_blank" rel="noopener noreferrer">PR #16512</a></li>
<li>Fix: callback deletion failing - <a href="https://github.com/BerriAI/litellm/pull/16473" target="_blank" rel="noopener noreferrer">PR #16473</a></li>
</ul>
</li>
<li>
<p><strong>Usage &amp; Analytics</strong></p>
<ul>
<li>UI - Improve Usage Indicator - <a href="https://github.com/BerriAI/litellm/pull/16504" target="_blank" rel="noopener noreferrer">PR #16504</a></li>
<li>UI - Model Info Page Health Check - <a href="https://github.com/BerriAI/litellm/pull/16416" target="_blank" rel="noopener noreferrer">PR #16416</a></li>
<li>Infra - Show Deprecation Warning for Model Analytics Tab - <a href="https://github.com/BerriAI/litellm/pull/16417" target="_blank" rel="noopener noreferrer">PR #16417</a></li>
<li>Fix: add <code>request_id</code> to LiteLLM tags usage - <a href="https://github.com/BerriAI/litellm/pull/16111" target="_blank" rel="noopener noreferrer">PR #16111</a></li>
</ul>
</li>
<li>
<p><strong>Health Check</strong></p>
<ul>
<li>Add Langfuse OTEL and SQS to Health Check - <a href="https://github.com/BerriAI/litellm/pull/16514" target="_blank" rel="noopener noreferrer">PR #16514</a></li>
</ul>
</li>
<li>
<p><strong>General UI</strong></p>
<ul>
<li>UI - Normalize table action columns appearance - <a href="https://github.com/BerriAI/litellm/pull/16657" target="_blank" rel="noopener noreferrer">PR #16657</a></li>
<li>UI - Button Styles and Sizing in Settings Pages - <a href="https://github.com/BerriAI/litellm/pull/16600" target="_blank" rel="noopener noreferrer">PR #16600</a></li>
<li>UI - SSO Modal Cosmetic Changes - <a href="https://github.com/BerriAI/litellm/pull/16554" target="_blank" rel="noopener noreferrer">PR #16554</a></li>
<li>Fix UI logos loading with SERVER_ROOT_PATH - <a href="https://github.com/BerriAI/litellm/pull/16618" target="_blank" rel="noopener noreferrer">PR #16618</a></li>
<li>Fix: remove misleading 'Custom' option mention from OpenAI endpoint tooltips - <a href="https://github.com/BerriAI/litellm/pull/16622" target="_blank" rel="noopener noreferrer">PR #16622</a></li>
</ul>
</li>
<li>
<p><strong>SSO</strong></p>
<ul>
<li>Ensure <code>role</code> from SSO provider is used when a user is inserted onto LiteLLM - <a href="https://github.com/BerriAI/litellm/pull/16794" target="_blank" rel="noopener noreferrer">PR #16794</a></li>
</ul>
</li>
</ul>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="bugs-1">Bugs<a href="https://docs.litellm.ai/release_notes/v1-80-0#bugs-1" class="hash-link" aria-label="Direct link to Bugs" title="Direct link to Bugs">​</a></h4>
<ul>
<li><strong>Management Endpoints</strong>
<ul>
<li>Fix inconsistent error responses in customer management endpoints - <a href="https://github.com/BerriAI/litellm/pull/16450" target="_blank" rel="noopener noreferrer">PR #16450</a></li>
<li>Fix: correct date range filtering in the <code>/spend/logs</code> endpoint - <a href="https://github.com/BerriAI/litellm/pull/16443" target="_blank" rel="noopener noreferrer">PR #16443</a></li>
<li>Fix <code>/spend/logs/ui</code> access control - <a href="https://github.com/BerriAI/litellm/pull/16446" target="_blank" rel="noopener noreferrer">PR #16446</a></li>
<li>Add pagination for the <code>/spend/logs/session/ui</code> endpoint - <a href="https://github.com/BerriAI/litellm/pull/16603" target="_blank" rel="noopener noreferrer">PR #16603</a></li>
<li>Fix: LiteLLM Usage showing <code>key_hash</code> - <a href="https://github.com/BerriAI/litellm/pull/16471" target="_blank" rel="noopener noreferrer">PR #16471</a></li>
<li>Fix: <code>app_roles</code> missing from the JWT payload - <a href="https://github.com/BerriAI/litellm/pull/16448" target="_blank" rel="noopener noreferrer">PR #16448</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="logging--guardrail--prompt-management-integrations">Logging / Guardrail / Prompt Management Integrations<a href="https://docs.litellm.ai/release_notes/v1-80-0#logging--guardrail--prompt-management-integrations" class="hash-link" aria-label="Direct link to Logging / Guardrail / Prompt Management Integrations" title="Direct link to Logging / Guardrail / Prompt Management Integrations">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="new-integration">New Integration<a href="https://docs.litellm.ai/release_notes/v1-80-0#new-integration" class="hash-link" aria-label="Direct link to New Integration" title="Direct link to New Integration">​</a></h4>
<ul>
<li><strong>🆕 <a href="https://docs.litellm.ai/docs/proxy/guardrails/zscaler_ai_guard">Zscaler AI Guard</a></strong>
<ul>
<li>Add Zscaler AI Guard hook for security policy enforcement - <a href="https://github.com/BerriAI/litellm/pull/15691" target="_blank" rel="noopener noreferrer">PR #15691</a></li>
</ul>
</li>
</ul>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="logging">Logging<a href="https://docs.litellm.ai/release_notes/v1-80-0#logging" class="hash-link" aria-label="Direct link to Logging" title="Direct link to Logging">​</a></h4>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/logging#langfuse">Langfuse</a></strong></p>
<ul>
<li>Fix: handle null usage values to prevent validation errors - <a href="https://github.com/BerriAI/litellm/pull/16396" target="_blank" rel="noopener noreferrer">PR #16396</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/logging">CloudZero</a></strong></p>
<ul>
<li>Fix: updated spend was not being sent to CloudZero - <a href="https://github.com/BerriAI/litellm/pull/16201" target="_blank" rel="noopener noreferrer">PR #16201</a></li>
</ul>
</li>
</ul>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="guardrails">Guardrails<a href="https://docs.litellm.ai/release_notes/v1-80-0#guardrails" class="hash-link" aria-label="Direct link to Guardrails" title="Direct link to Guardrails">​</a></h4>
<ul>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails">IBM Detector</a></strong>
<ul>
<li>Ensure detector-id is passed as header to IBM detector server - <a href="https://github.com/BerriAI/litellm/pull/16649" target="_blank" rel="noopener noreferrer">PR #16649</a></li>
</ul>
</li>
</ul>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="prompt-management">Prompt Management<a href="https://docs.litellm.ai/release_notes/v1-80-0#prompt-management" class="hash-link" aria-label="Direct link to Prompt Management" title="Direct link to Prompt Management">​</a></h4>
<ul>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/prompt_management">Custom Prompt Management</a></strong>
<ul>
<li>Add SDK focused examples for custom prompt management - <a href="https://github.com/BerriAI/litellm/pull/16441" target="_blank" rel="noopener noreferrer">PR #16441</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="spend-tracking-budgets-and-rate-limiting">Spend Tracking, Budgets and Rate Limiting<a href="https://docs.litellm.ai/release_notes/v1-80-0#spend-tracking-budgets-and-rate-limiting" class="hash-link" aria-label="Direct link to Spend Tracking, Budgets and Rate Limiting" title="Direct link to Spend Tracking, Budgets and Rate Limiting">​</a></h2>
<ul>
<li><strong>End User Budgets</strong>
<ul>
<li>Allow pointing the max end-user budget to a budget ID, so the default budget applies to all end users - <a href="https://github.com/BerriAI/litellm/pull/16456" target="_blank" rel="noopener noreferrer">PR #16456</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="mcp-gateway">MCP Gateway<a href="https://docs.litellm.ai/release_notes/v1-80-0#mcp-gateway" class="hash-link" aria-label="Direct link to MCP Gateway" title="Direct link to MCP Gateway">​</a></h2>
<ul>
<li><strong>Configuration</strong>
<ul>
<li>Add dynamic OAuth2 metadata discovery for MCP servers - <a href="https://github.com/BerriAI/litellm/pull/16676" target="_blank" rel="noopener noreferrer">PR #16676</a></li>
<li>Fix: allow tool calls even when the server name prefix is missing - <a href="https://github.com/BerriAI/litellm/pull/16425" target="_blank" rel="noopener noreferrer">PR #16425</a></li>
<li>Fix: exclude unauthorized MCP servers from the allowed server list - <a href="https://github.com/BerriAI/litellm/pull/16551" target="_blank" rel="noopener noreferrer">PR #16551</a></li>
<li>Fix: MCP servers could not be deleted from permission settings - <a href="https://github.com/BerriAI/litellm/pull/16407" target="_blank" rel="noopener noreferrer">PR #16407</a></li>
<li>Fix: avoid crashing when an MCP server record lacks credentials - <a href="https://github.com/BerriAI/litellm/pull/16601" target="_blank" rel="noopener noreferrer">PR #16601</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="agents">Agents<a href="https://docs.litellm.ai/release_notes/v1-80-0#agents" class="hash-link" aria-label="Direct link to Agents" title="Direct link to Agents">​</a></h2>
<ul>
<li><strong><a href="https://docs.litellm.ai/docs/agents">Agent Registration (A2A Spec)</a></strong>
<ul>
<li>Support agent registration + discovery following Agent-to-Agent specification - <a href="https://github.com/BerriAI/litellm/pull/16615" target="_blank" rel="noopener noreferrer">PR #16615</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="performance--loadbalancing--reliability-improvements">Performance / Loadbalancing / Reliability improvements<a href="https://docs.litellm.ai/release_notes/v1-80-0#performance--loadbalancing--reliability-improvements" class="hash-link" aria-label="Direct link to Performance / Loadbalancing / Reliability improvements" title="Direct link to Performance / Loadbalancing / Reliability improvements">​</a></h2>
<ul>
<li>
<p><strong>Embeddings Performance</strong></p>
<ul>
<li>Use router's O(1) lookup and shared sessions for embeddings - <a href="https://github.com/BerriAI/litellm/pull/16344" target="_blank" rel="noopener noreferrer">PR #16344</a></li>
</ul>
</li>
<li>
<p><strong>Router Reliability</strong></p>
<ul>
<li>Support default fallbacks for unknown models - <a href="https://github.com/BerriAI/litellm/pull/16419" target="_blank" rel="noopener noreferrer">PR #16419</a></li>
</ul>
</li>
<li>
<p><strong>Callback Management</strong></p>
<ul>
<li>Add atexit handlers to flush callbacks for async completions - <a href="https://github.com/BerriAI/litellm/pull/16487" target="_blank" rel="noopener noreferrer">PR #16487</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="general-proxy-improvements">General Proxy Improvements<a href="https://docs.litellm.ai/release_notes/v1-80-0#general-proxy-improvements" class="hash-link" aria-label="Direct link to General Proxy Improvements" title="Direct link to General Proxy Improvements">​</a></h2>
<ul>
<li><strong>Configuration Management</strong>
<ul>
<li>Fix: update <code>model_cost_map_url</code> to use an environment variable - <a href="https://github.com/BerriAI/litellm/pull/16429" target="_blank" rel="noopener noreferrer">PR #16429</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="documentation-updates">Documentation Updates<a href="https://docs.litellm.ai/release_notes/v1-80-0#documentation-updates" class="hash-link" aria-label="Direct link to Documentation Updates" title="Direct link to Documentation Updates">​</a></h2>
<ul>
<li>
<p><strong>Provider Documentation</strong></p>
<ul>
<li>Fix streaming example in README - <a href="https://github.com/BerriAI/litellm/pull/16461" target="_blank" rel="noopener noreferrer">PR #16461</a></li>
<li>Update broken Slack invite links to support page - <a href="https://github.com/BerriAI/litellm/pull/16546" target="_blank" rel="noopener noreferrer">PR #16546</a></li>
<li>Fix code block indentation for fallbacks page - <a href="https://github.com/BerriAI/litellm/pull/16542" target="_blank" rel="noopener noreferrer">PR #16542</a></li>
<li>Documentation code example corrections - <a href="https://github.com/BerriAI/litellm/pull/16502" target="_blank" rel="noopener noreferrer">PR #16502</a></li>
<li>Document <code>reasoning_effort</code> summary field options - <a href="https://github.com/BerriAI/litellm/pull/16549" target="_blank" rel="noopener noreferrer">PR #16549</a></li>
</ul>
</li>
<li>
<p><strong>API Documentation</strong></p>
<ul>
<li>Add docs on APIs for model access management - <a href="https://github.com/BerriAI/litellm/pull/16673" target="_blank" rel="noopener noreferrer">PR #16673</a></li>
<li>Add docs showing how to auto-reload new pricing data - <a href="https://github.com/BerriAI/litellm/pull/16675" target="_blank" rel="noopener noreferrer">PR #16675</a></li>
<li>LiteLLM Quick Start - show how model resolution works - <a href="https://github.com/BerriAI/litellm/pull/16602" target="_blank" rel="noopener noreferrer">PR #16602</a></li>
<li>Add docs for tracking callback failures - <a href="https://github.com/BerriAI/litellm/pull/16474" target="_blank" rel="noopener noreferrer">PR #16474</a></li>
</ul>
</li>
<li>
<p><strong>General Documentation</strong></p>
<ul>
<li>Fix container api link in release page - <a href="https://github.com/BerriAI/litellm/pull/16440" target="_blank" rel="noopener noreferrer">PR #16440</a></li>
<li>Add Softgen to the list of projects using LiteLLM - <a href="https://github.com/BerriAI/litellm/pull/16423" target="_blank" rel="noopener noreferrer">PR #16423</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-contributors">New Contributors<a href="https://docs.litellm.ai/release_notes/v1-80-0#new-contributors" class="hash-link" aria-label="Direct link to New Contributors" title="Direct link to New Contributors">​</a></h2>
<ul>
<li>@artplan1 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16423" target="_blank" rel="noopener noreferrer">PR #16423</a></li>
<li>@JehandadK made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16472" target="_blank" rel="noopener noreferrer">PR #16472</a></li>
<li>@vmiscenko made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16453" target="_blank" rel="noopener noreferrer">PR #16453</a></li>
<li>@mcowger made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16429" target="_blank" rel="noopener noreferrer">PR #16429</a></li>
<li>@yellowsubmarine372 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16395" target="_blank" rel="noopener noreferrer">PR #16395</a></li>
<li>@Hebruwu made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16201" target="_blank" rel="noopener noreferrer">PR #16201</a></li>
<li>@jwang-gif made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15691" target="_blank" rel="noopener noreferrer">PR #15691</a></li>
<li>@AnthonyMonaco made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16502" target="_blank" rel="noopener noreferrer">PR #16502</a></li>
<li>@andrewm4894 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16487" target="_blank" rel="noopener noreferrer">PR #16487</a></li>
<li>@f14-bertolotti made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16485" target="_blank" rel="noopener noreferrer">PR #16485</a></li>
<li>@busla made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16293" target="_blank" rel="noopener noreferrer">PR #16293</a></li>
<li>@MightyGoldenOctopus made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16537" target="_blank" rel="noopener noreferrer">PR #16537</a></li>
<li>@ultmaster made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/14436" target="_blank" rel="noopener noreferrer">PR #14436</a></li>
<li>@bchrobot made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16542" target="_blank" rel="noopener noreferrer">PR #16542</a></li>
<li>@sep-grindr made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16622" target="_blank" rel="noopener noreferrer">PR #16622</a></li>
<li>@pnookala-godaddy made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16607" target="_blank" rel="noopener noreferrer">PR #16607</a></li>
<li>@dtunikov made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16592" target="_blank" rel="noopener noreferrer">PR #16592</a></li>
<li>@lukapecnik made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16648" target="_blank" rel="noopener noreferrer">PR #16648</a></li>
<li>@jyeros made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16618" target="_blank" rel="noopener noreferrer">PR #16618</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="full-changelog">Full Changelog<a href="https://docs.litellm.ai/release_notes/v1-80-0#full-changelog" class="hash-link" aria-label="Direct link to Full Changelog" title="Direct link to Full Changelog">​</a></h2>
<p><strong><a href="https://github.com/BerriAI/litellm/compare/v1.79.3.rc.1...v1.80.0.rc.1" target="_blank" rel="noopener noreferrer">View complete changelog on GitHub</a></strong></p>
<hr>]]></content>
        <author>
            <name>Krrish Dholakia</name>
            <uri>https://www.linkedin.com/in/krish-d/</uri>
        </author>
        <author>
            <name>Ishaan Jaff</name>
            <uri>https://www.linkedin.com/in/reffajnaahsi/</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[v1.79.3-stable - Built-in Guardrails on AI Gateway]]></title>
        <id>https://docs.litellm.ai/release_notes/v1-79-3</id>
        <link href="https://docs.litellm.ai/release_notes/v1-79-3"/>
        <updated>2025-11-08T10:00:00.000Z</updated>
        <summary type="html"><![CDATA[Deploy this version]]></summary>
        <content type="html"><![CDATA[<h2 class="anchor anchorWithStickyNavbar_LWe7" id="deploy-this-version">Deploy this version<a href="https://docs.litellm.ai/release_notes/v1-79-3#deploy-this-version" class="hash-link" aria-label="Direct link to Deploy this version" title="Direct link to Deploy this version">​</a></h2>
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">Docker</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">Pip</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">docker run litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker run \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-e STORE_MODEL_IN_DB=True \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-p 4000:4000 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">docker.litellm.ai/berriai/litellm:v1.79.3-stable</span><br></span></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">pip install litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">pip install litellm==1.79.3.rc.1</span><br></span></code></pre></div></div></div></div></div>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="key-highlights">Key Highlights<a href="https://docs.litellm.ai/release_notes/v1-79-3#key-highlights" class="hash-link" aria-label="Direct link to Key Highlights" title="Direct link to Key Highlights">​</a></h2>
<ul>
<li><strong>LiteLLM Custom Guardrail</strong> - Built-in guardrail with UI configuration support</li>
<li><strong>Performance Improvements</strong> - <code>/responses</code> API 19× Lower Median Latency</li>
<li><strong>Veo3 Video Generation (Vertex AI + Google AI Studio)</strong> - Use OpenAI Video API to generate videos with Vertex AI and Google AI Studio Veo3 models</li>
</ul>
<hr>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="built-in-guardrails-on-ai-gateway">Built-in Guardrails on AI Gateway<a href="https://docs.litellm.ai/release_notes/v1-79-3#built-in-guardrails-on-ai-gateway" class="hash-link" aria-label="Direct link to Built-in Guardrails on AI Gateway" title="Direct link to Built-in Guardrails on AI Gateway">​</a></h3>
<div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAFCAYAAAB8ZH1oAAAACXBIWXMAACxLAAAsSwGlPZapAAAAhklEQVR4nDWOQRKDIBAE+f+/kmd4CUGtkoMRwu4CdgpTHvo20zNuWRZCCMzzTEqJIoVWhaKNIkqtFTPDxRgZYe89OWfW9UN472yfzn58MVNEFdda40/nPBvTJDyeB3HL5CyXrfeOU1VuqhnJKlnyZRqzpRTOarjRuBlBa+NTvRaGSaThX8YP/7bBZ2wCQWwAAAAASUVORK5CYII=&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="640" height="334"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/assets/ideal-img/built_in_guard.2572eb5.640.png" srcset="/assets/ideal-img/built_in_guard.2572eb5.640.png 640w,/assets/ideal-img/built_in_guard.ba03d25.1920.png 1920w" width="640" height="334"></noscript></div>
<br>
<p>This release introduces built-in guardrails for LiteLLM AI Gateway, allowing you to enforce protections without depending on an external guardrail API.</p>
<ul>
<li><strong>Blocking Keywords</strong> - Block known sensitive keywords like "litellm", "python", etc.</li>
<li><strong>Pattern Detection</strong> - Block known sensitive patterns like emails, Social Security Numbers, API keys, etc.</li>
<li><strong>Custom Regex Patterns</strong> - Define custom regex patterns for your specific use case.</li>
</ul>
<p>Get started with the built-in guardrails on AI Gateway <a href="https://docs.litellm.ai/docs/proxy/guardrails/litellm_content_filter" target="_blank" rel="noopener noreferrer">here</a>.</p>
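<p>The keyword and pattern checks described above can be sketched as a simple pre-call filter. This is an illustrative sketch only, not LiteLLM's implementation: the keyword list and regexes below are hypothetical stand-ins for the built-in ones.</p>

```python
import re

# Hypothetical examples of blocked keywords and sensitive patterns;
# LiteLLM's actual built-in lists may differ.
BLOCKED_KEYWORDS = ("litellm", "python")
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def violates(text: str) -> list[str]:
    """Return the names of all keyword/pattern rules the text trips."""
    hits = [kw for kw in BLOCKED_KEYWORDS if kw in text.lower()]
    hits += [name for name, pat in PATTERNS.items() if pat.search(text)]
    return hits
```

<p>A guardrail built on this shape would reject (or mask) the request whenever <code>violates()</code> returns a non-empty list; custom regex patterns slot into the same dictionary.</p>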
<hr>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="performance--responses-19-lower-median-latency">Performance – <code>/responses</code> 19× Lower Median Latency<a href="https://docs.litellm.ai/release_notes/v1-79-3#performance--responses-19-lower-median-latency" class="hash-link" aria-label="Direct link to performance--responses-19-lower-median-latency" title="Direct link to performance--responses-19-lower-median-latency">​</a></h3>
<p>This update significantly improves <code>/responses</code> latency by integrating connection handling with our internal network management layer, eliminating per-request connection setup overhead.</p>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="results">Results<a href="https://docs.litellm.ai/release_notes/v1-79-3#results" class="hash-link" aria-label="Direct link to Results" title="Direct link to Results">​</a></h4>
<table><thead><tr><th>Metric</th><th>Before</th><th>After</th><th>Improvement</th></tr></thead><tbody><tr><td>Median latency</td><td>3,600 ms</td><td><strong>190 ms</strong></td><td><strong>−95% (~19× faster)</strong></td></tr><tr><td>p95 latency</td><td>4,300 ms</td><td><strong>280 ms</strong></td><td>−93%</td></tr><tr><td>p99 latency</td><td>4,600 ms</td><td><strong>590 ms</strong></td><td>−87%</td></tr><tr><td>Average latency</td><td>3,571 ms</td><td><strong>208 ms</strong></td><td>−94%</td></tr><tr><td>RPS</td><td>231</td><td><strong>1,059</strong></td><td>+358%</td></tr></tbody></table>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="test-setup">Test Setup<a href="https://docs.litellm.ai/release_notes/v1-79-3#test-setup" class="hash-link" aria-label="Direct link to Test Setup" title="Direct link to Test Setup">​</a></h4>
<table><thead><tr><th>Category</th><th>Specification</th></tr></thead><tbody><tr><td><strong>Load Testing</strong></td><td>Locust: 1,000 concurrent users, 500 ramp-up</td></tr><tr><td><strong>System</strong></td><td>4 vCPUs, 8 GB RAM, 4 workers, 4 instances</td></tr><tr><td><strong>Database</strong></td><td>PostgreSQL (Redis unused)</td></tr><tr><td><strong>Configuration</strong></td><td><a href="https://gist.github.com/AlexsanderHamir/550791675fd752befcac6a9e44024652" target="_blank" rel="noopener noreferrer">config.yaml</a></td></tr><tr><td><strong>Load Script</strong></td><td><a href="https://gist.github.com/AlexsanderHamir/99d673bf74cdd81fd39f59fa9048f2e8" target="_blank" rel="noopener noreferrer">no_cache_hits.py</a></td></tr></tbody></table>
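<p>The gain comes from paying connection setup once instead of once per request. A minimal, self-contained sketch of that pattern, with a fake client standing in for TCP/TLS handshake cost (all names here are hypothetical, not LiteLLM internals):</p>

```python
class FakeClient:
    """Stand-in for an HTTP client; construction models connection setup cost."""
    constructions = 0

    def __init__(self):
        FakeClient.constructions += 1  # each construction = one expensive handshake

    def send(self, payload: str) -> str:
        return f"ok:{payload}"

def serve_per_request(requests: list[str]) -> list[str]:
    # Before: a fresh client per request pays setup cost every time.
    return [FakeClient().send(r) for r in requests]

def serve_pooled(requests: list[str]) -> list[str]:
    # After: one shared client, constructed once, reused for every request.
    client = FakeClient()
    return [client.send(r) for r in requests]
```

<p>Both functions return identical responses, but the pooled version constructs the client exactly once regardless of request volume, which is the shape of the latency win reported above.</p>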
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-models--updated-models">New Models / Updated Models<a href="https://docs.litellm.ai/release_notes/v1-79-3#new-models--updated-models" class="hash-link" aria-label="Direct link to New Models / Updated Models" title="Direct link to New Models / Updated Models">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="new-model-support">New Model Support<a href="https://docs.litellm.ai/release_notes/v1-79-3#new-model-support" class="hash-link" aria-label="Direct link to New Model Support" title="Direct link to New Model Support">​</a></h4>
<table><thead><tr><th>Provider</th><th>Model</th><th>Context Window</th><th>Input ($/1M tokens)</th><th>Output ($/1M tokens)</th><th>Features</th></tr></thead><tbody><tr><td>Azure</td><td><code>azure/gpt-5-pro</code></td><td>272K</td><td>$15.00</td><td>$120.00</td><td>Responses API, reasoning, vision, PDF input</td></tr><tr><td>Azure</td><td><code>azure/gpt-image-1-mini</code></td><td>-</td><td>-</td><td>-</td><td>Image generation - per pixel pricing</td></tr><tr><td>Azure</td><td><code>azure/container</code></td><td>-</td><td>-</td><td>-</td><td>Container API - $0.03/session</td></tr><tr><td>OpenAI</td><td><code>openai/container</code></td><td>-</td><td>-</td><td>-</td><td>Container API - $0.03/session</td></tr><tr><td>Cohere</td><td><code>cohere/embed-v4.0</code></td><td>128K</td><td>$0.12</td><td>-</td><td>Embeddings with image input support</td></tr><tr><td>Gemini</td><td><code>gemini/gemini-live-2.5-flash-preview-native-audio-09-2025</code></td><td>1M</td><td>$0.30</td><td>$2.00</td><td>Native audio, vision, web search</td></tr><tr><td>Vertex AI</td><td><code>vertex_ai/minimaxai/minimax-m2-maas</code></td><td>196K</td><td>$0.30</td><td>$1.20</td><td>Function calling, tool choice</td></tr><tr><td>NVIDIA</td><td><code>nvidia/nemotron-nano-9b-v2</code></td><td>-</td><td>-</td><td>-</td><td>Chat completions</td></tr></tbody></table>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="ocr-models">OCR Models<a href="https://docs.litellm.ai/release_notes/v1-79-3#ocr-models" class="hash-link" aria-label="Direct link to OCR Models" title="Direct link to OCR Models">​</a></h4>
<table><thead><tr><th>Provider</th><th>Model</th><th>Cost Per Page</th><th>Features</th></tr></thead><tbody><tr><td>Azure AI</td><td><code>azure_ai/doc-intelligence/prebuilt-read</code></td><td>$0.0015</td><td>Document reading</td></tr><tr><td>Azure AI</td><td><code>azure_ai/doc-intelligence/prebuilt-layout</code></td><td>$0.01</td><td>Layout analysis</td></tr><tr><td>Azure AI</td><td><code>azure_ai/doc-intelligence/prebuilt-document</code></td><td>$0.01</td><td>Document processing</td></tr><tr><td>Vertex AI</td><td><code>vertex_ai/mistral-ocr-2505</code></td><td>$0.0005</td><td>OCR processing</td></tr></tbody></table>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="search-models">Search Models<a href="https://docs.litellm.ai/release_notes/v1-79-3#search-models" class="hash-link" aria-label="Direct link to Search Models" title="Direct link to Search Models">​</a></h4>
<table><thead><tr><th>Provider</th><th>Model</th><th>Pricing</th><th>Features</th></tr></thead><tbody><tr><td>Firecrawl</td><td><code>firecrawl/search</code></td><td>Tiered: $0.00166-$0.0166/query</td><td>10-100 results per query</td></tr><tr><td>SearXNG</td><td><code>searxng/search</code></td><td>Free</td><td>Open-source metasearch</td></tr></tbody></table>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features">Features<a href="https://docs.litellm.ai/release_notes/v1-79-3#features" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/azure">Azure</a></strong></p>
<ul>
<li>Add Azure GPT-5-Pro Responses API support with reasoning capabilities - <a href="https://github.com/BerriAI/litellm/pull/16235" target="_blank" rel="noopener noreferrer">PR #16235</a></li>
<li>Add gpt-image-1-mini pricing for Azure with quality tiers (low/medium/high) - <a href="https://github.com/BerriAI/litellm/pull/16182" target="_blank" rel="noopener noreferrer">PR #16182</a></li>
<li>Add support for returning Azure Content Policy error information when exceptions from Azure OpenAI occur - <a href="https://github.com/BerriAI/litellm/pull/16231" target="_blank" rel="noopener noreferrer">PR #16231</a></li>
<li>Fix Azure GPT-5 incorrectly routed to O-series config (temperature parameter unsupported) - <a href="https://github.com/BerriAI/litellm/pull/16246" target="_blank" rel="noopener noreferrer">PR #16246</a></li>
<li>Fix Azure not accepting the extra body param - <a href="https://github.com/BerriAI/litellm/pull/16116" target="_blank" rel="noopener noreferrer">PR #16116</a></li>
<li>Fix Azure DALL-E-3 health check content policy violation by using safe default prompt - <a href="https://github.com/BerriAI/litellm/pull/16329" target="_blank" rel="noopener noreferrer">PR #16329</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/bedrock">Bedrock</a></strong></p>
<ul>
<li>Fix empty assistant message handling in AWS Bedrock Converse API to prevent 400 Bad Request errors - <a href="https://github.com/BerriAI/litellm/pull/15850" target="_blank" rel="noopener noreferrer">PR #15850</a></li>
<li>Fix: Filter AWS authentication params from Bedrock InvokeModel request body - <a href="https://github.com/BerriAI/litellm/pull/16315" target="_blank" rel="noopener noreferrer">PR #16315</a></li>
<li>Fix Bedrock proxy adding <code>name</code> to file content, which breaks when <code>cache_control</code> is in use - <a href="https://github.com/BerriAI/litellm/pull/16275" target="_blank" rel="noopener noreferrer">PR #16275</a></li>
<li>Fix global.anthropic.claude-haiku-4-5-20251001-v1:0 supports_reasoning flag and update pricing - <a href="https://github.com/BerriAI/litellm/pull/16263" target="_blank" rel="noopener noreferrer">PR #16263</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/gemini">Gemini (Google AI Studio + Vertex AI)</a></strong></p>
<ul>
<li>Add gemini live audio model cost in model map - <a href="https://github.com/BerriAI/litellm/pull/16183" target="_blank" rel="noopener noreferrer">PR #16183</a></li>
<li>Fix translation problem with Gemini parallel tool calls - <a href="https://github.com/BerriAI/litellm/pull/16194" target="_blank" rel="noopener noreferrer">PR #16194</a></li>
<li>Fix: Send Gemini API key via x-goog-api-key header with custom api_base - <a href="https://github.com/BerriAI/litellm/pull/16085" target="_blank" rel="noopener noreferrer">PR #16085</a></li>
<li>Fix image_config.aspect_ratio not working for gemini-2.5-flash-image - <a href="https://github.com/BerriAI/litellm/pull/15999" target="_blank" rel="noopener noreferrer">PR #15999</a></li>
<li>Fix Gemini minimal reasoning env overrides disabling thoughts - <a href="https://github.com/BerriAI/litellm/pull/16347" target="_blank" rel="noopener noreferrer">PR #16347</a></li>
<li>Fix cache_read_input_token_cost for gemini-2.5-flash - <a href="https://github.com/BerriAI/litellm/pull/16354" target="_blank" rel="noopener noreferrer">PR #16354</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/anthropic">Anthropic</a></strong></p>
<ul>
<li>Fix Anthropic token counting for VertexAI - <a href="https://github.com/BerriAI/litellm/pull/16171" target="_blank" rel="noopener noreferrer">PR #16171</a></li>
<li>Fix anthropic-adapter: properly translate Anthropic image format to OpenAI - <a href="https://github.com/BerriAI/litellm/pull/16202" target="_blank" rel="noopener noreferrer">PR #16202</a></li>
<li>Enable automated prompt caching message format for Claude on Databricks - <a href="https://github.com/BerriAI/litellm/pull/16200" target="_blank" rel="noopener noreferrer">PR #16200</a></li>
<li>Add support for Anthropic Memory Tool - <a href="https://github.com/BerriAI/litellm/pull/16115" target="_blank" rel="noopener noreferrer">PR #16115</a></li>
<li>Propagate cache creation/read token costs for model info to fix Anthropic long context cost calculations - <a href="https://github.com/BerriAI/litellm/pull/16376" target="_blank" rel="noopener noreferrer">PR #16376</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/vertex_ai">Vertex AI</a></strong></p>
<ul>
<li>Add Vertex MiniMAX m2 model support - <a href="https://github.com/BerriAI/litellm/pull/16373" target="_blank" rel="noopener noreferrer">PR #16373</a></li>
<li>Correctly map 429 Resource Exhausted to RateLimitError - <a href="https://github.com/BerriAI/litellm/pull/16363" target="_blank" rel="noopener noreferrer">PR #16363</a></li>
<li>Add <code>vertex_credentials</code> support to <code>litellm.rerank()</code> for Vertex AI - <a href="https://github.com/BerriAI/litellm/pull/16266" target="_blank" rel="noopener noreferrer">PR #16266</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/databricks">Databricks</a></strong></p>
<ul>
<li>Fix databricks streaming - <a href="https://github.com/BerriAI/litellm/pull/16368" target="_blank" rel="noopener noreferrer">PR #16368</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/deepgram">Deepgram</a></strong></p>
<ul>
<li>Return the diarized transcript when it is requested - <a href="https://github.com/BerriAI/litellm/pull/16133" target="_blank" rel="noopener noreferrer">PR #16133</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/fireworks_ai">Fireworks</a></strong></p>
<ul>
<li>Update Fireworks audio endpoints to new <code>api.fireworks.ai</code> domains - <a href="https://github.com/BerriAI/litellm/pull/16346" target="_blank" rel="noopener noreferrer">PR #16346</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/cohere">Cohere</a></strong></p>
<ul>
<li>Add cohere embed-v4.0 model support - <a href="https://github.com/BerriAI/litellm/pull/16358" target="_blank" rel="noopener noreferrer">PR #16358</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/watsonx">Watsonx</a></strong></p>
<ul>
<li>Support <code>reasoning_effort</code> for watsonx chat models - <a href="https://github.com/BerriAI/litellm/pull/16261" target="_blank" rel="noopener noreferrer">PR #16261</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/openai">OpenAI</a></strong></p>
<ul>
<li>Remove automatic summary from reasoning_effort transformation - <a href="https://github.com/BerriAI/litellm/pull/16210" target="_blank" rel="noopener noreferrer">PR #16210</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/xai">XAI</a></strong></p>
<ul>
<li>Remove Grok 4 Models Reasoning Effort Parameter - <a href="https://github.com/BerriAI/litellm/pull/16265" target="_blank" rel="noopener noreferrer">PR #16265</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/vllm">Hosted VLLM</a></strong></p>
<ul>
<li>Fix <code>HostedVLLMRerankConfig</code> not being used - <a href="https://github.com/BerriAI/litellm/pull/16352" target="_blank" rel="noopener noreferrer">PR #16352</a></li>
</ul>
</li>
</ul>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="new-provider-support">New Provider Support<a href="https://docs.litellm.ai/release_notes/v1-79-3#new-provider-support" class="hash-link" aria-label="Direct link to New Provider Support" title="Direct link to New Provider Support">​</a></h4>
<ul>
<li><strong><a href="https://docs.litellm.ai/docs/providers/bedrock">Bedrock Agentcore</a></strong>
<ul>
<li>Add Bedrock Agentcore as a provider on LiteLLM Python SDK and LiteLLM AI Gateway - <a href="https://github.com/BerriAI/litellm/pull/16252" target="_blank" rel="noopener noreferrer">PR #16252</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="llm-api-endpoints">LLM API Endpoints<a href="https://docs.litellm.ai/release_notes/v1-79-3#llm-api-endpoints" class="hash-link" aria-label="Direct link to LLM API Endpoints" title="Direct link to LLM API Endpoints">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-1">Features<a href="https://docs.litellm.ai/release_notes/v1-79-3#features-1" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/ocr">OCR API</a></strong></p>
<ul>
<li>Add VertexAI OCR provider support + cost tracking - <a href="https://github.com/BerriAI/litellm/pull/16216" target="_blank" rel="noopener noreferrer">PR #16216</a></li>
<li>Add Azure AI Doc Intelligence OCR support - <a href="https://github.com/BerriAI/litellm/pull/16219" target="_blank" rel="noopener noreferrer">PR #16219</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/search">Search API</a></strong></p>
<ul>
<li>Add firecrawl search API support with tiered pricing - <a href="https://github.com/BerriAI/litellm/pull/16257" target="_blank" rel="noopener noreferrer">PR #16257</a></li>
<li>Add searxng search API provider - <a href="https://github.com/BerriAI/litellm/pull/16259" target="_blank" rel="noopener noreferrer">PR #16259</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/response_api">Responses API</a></strong></p>
<ul>
<li>Support responses API streaming in langfuse otel - <a href="https://github.com/BerriAI/litellm/pull/16153" target="_blank" rel="noopener noreferrer">PR #16153</a></li>
<li>Pass extra_body parameters to provider in Responses API requests - <a href="https://github.com/BerriAI/litellm/pull/16320" target="_blank" rel="noopener noreferrer">PR #16320</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/container_api">Container API</a></strong></p>
<ul>
<li>Add E2E Container API Support - <a href="https://github.com/BerriAI/litellm/pull/16136" target="_blank" rel="noopener noreferrer">PR #16136</a></li>
<li>Update container documentation to be similar to others - <a href="https://github.com/BerriAI/litellm/pull/16327" target="_blank" rel="noopener noreferrer">PR #16327</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/video_generation">Video Generation API</a></strong></p>
<ul>
<li>Add Vertex and Gemini Videos API with Cost Tracking + UI support - <a href="https://github.com/BerriAI/litellm/pull/16323" target="_blank" rel="noopener noreferrer">PR #16323</a></li>
<li>Add <code>custom_llm_provider</code> support for video endpoints (non-generation) - <a href="https://github.com/BerriAI/litellm/pull/16121" target="_blank" rel="noopener noreferrer">PR #16121</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/audio">Audio API</a></strong></p>
<ul>
<li>Add gpt-4o-transcribe cost tracking - <a href="https://github.com/BerriAI/litellm/pull/16412" target="_blank" rel="noopener noreferrer">PR #16412</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/vector_stores">Vector Stores</a></strong></p>
<ul>
<li>Milvus - search vector store support + support multi-part form data on passthrough - <a href="https://github.com/BerriAI/litellm/pull/16035" target="_blank" rel="noopener noreferrer">PR #16035</a></li>
<li>Azure AI Vector Stores - support "virtual" indexes + create vector store on passthrough API - <a href="https://github.com/BerriAI/litellm/pull/16160" target="_blank" rel="noopener noreferrer">PR #16160</a></li>
<li>Milvus - Passthrough API support - adds create + read vector store support via passthrough APIs - <a href="https://github.com/BerriAI/litellm/pull/16170" target="_blank" rel="noopener noreferrer">PR #16170</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/embedding/supported_embedding">Embeddings API</a></strong></p>
<ul>
<li>Use valid CallTypes enum value in embeddings endpoint - <a href="https://github.com/BerriAI/litellm/pull/16328" target="_blank" rel="noopener noreferrer">PR #16328</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/rerank">Rerank API</a></strong></p>
<ul>
<li>Generalize tiered pricing in generic cost calculator - <a href="https://github.com/BerriAI/litellm/pull/16150" target="_blank" rel="noopener noreferrer">PR #16150</a></li>
</ul>
</li>
</ul>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="bugs">Bugs<a href="https://docs.litellm.ai/release_notes/v1-79-3#bugs" class="hash-link" aria-label="Direct link to Bugs" title="Direct link to Bugs">​</a></h4>
<ul>
<li><strong>General</strong>
<ul>
<li>Fix index field not populated in streaming mode with n&gt;1 and tool calls - <a href="https://github.com/BerriAI/litellm/pull/15962" target="_blank" rel="noopener noreferrer">PR #15962</a></li>
<li>Pass aws_region_name in litellm_params - <a href="https://github.com/BerriAI/litellm/pull/16321" target="_blank" rel="noopener noreferrer">PR #16321</a></li>
<li>Add <code>retry-after</code> header support for errors <code>502</code>, <code>503</code>, <code>504</code> - <a href="https://github.com/BerriAI/litellm/pull/16288" target="_blank" rel="noopener noreferrer">PR #16288</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="management-endpoints--ui">Management Endpoints / UI<a href="https://docs.litellm.ai/release_notes/v1-79-3#management-endpoints--ui" class="hash-link" aria-label="Direct link to Management Endpoints / UI" title="Direct link to Management Endpoints / UI">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-2">Features<a href="https://docs.litellm.ai/release_notes/v1-79-3#features-2" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong>Virtual Keys</strong></p>
<ul>
<li>UI - Delete Team Member with friction - <a href="https://github.com/BerriAI/litellm/pull/16167" target="_blank" rel="noopener noreferrer">PR #16167</a></li>
<li>UI - Litellm test key audio support - <a href="https://github.com/BerriAI/litellm/pull/16251" target="_blank" rel="noopener noreferrer">PR #16251</a></li>
<li>UI - Test Key Page Revert Model To Single Select - <a href="https://github.com/BerriAI/litellm/pull/16390" target="_blank" rel="noopener noreferrer">PR #16390</a></li>
</ul>
</li>
<li>
<p><strong>Models + Endpoints</strong></p>
<ul>
<li>UI - Add Model Existing Credentials Improvement - <a href="https://github.com/BerriAI/litellm/pull/16166" target="_blank" rel="noopener noreferrer">PR #16166</a></li>
<li>UI - Add Azure AD Token field and make Azure API Key optional - <a href="https://github.com/BerriAI/litellm/pull/16331" target="_blank" rel="noopener noreferrer">PR #16331</a></li>
<li>UI - Fixed Label for vLLM in Model Create Flow - <a href="https://github.com/BerriAI/litellm/pull/16285" target="_blank" rel="noopener noreferrer">PR #16285</a></li>
<li>UI - Include Model Access Group Models on Team Models Table - <a href="https://github.com/BerriAI/litellm/pull/16298" target="_blank" rel="noopener noreferrer">PR #16298</a></li>
<li>Fix /model_group/info Returning Entire Model List for SSO Users - <a href="https://github.com/BerriAI/litellm/pull/16296" target="_blank" rel="noopener noreferrer">PR #16296</a></li>
<li>Fix Model Hub table in the non-root LiteLLM Docker image - <a href="https://github.com/BerriAI/litellm/pull/16282" target="_blank" rel="noopener noreferrer">PR #16282</a></li>
</ul>
</li>
<li>
<p><strong>Guardrails</strong></p>
<ul>
<li>UI - Fix regression where the guardrail entity could not be selected and was not displayed - <a href="https://github.com/BerriAI/litellm/pull/16165" target="_blank" rel="noopener noreferrer">PR #16165</a></li>
<li>UI - Guardrail Info Page Show PII Config - <a href="https://github.com/BerriAI/litellm/pull/16164" target="_blank" rel="noopener noreferrer">PR #16164</a></li>
<li>Change guardrail_information to list type - <a href="https://github.com/BerriAI/litellm/pull/16127" target="_blank" rel="noopener noreferrer">PR #16127</a></li>
<li>UI - LiteLLM Guardrail - ensure you can see UI Friendly name for PII Patterns - <a href="https://github.com/BerriAI/litellm/pull/16382" target="_blank" rel="noopener noreferrer">PR #16382</a></li>
<li>UI - Guardrails - LiteLLM Content Filter, Allow Viewing/Editing Content Filter Settings - <a href="https://github.com/BerriAI/litellm/pull/16383" target="_blank" rel="noopener noreferrer">PR #16383</a></li>
<li>UI - Guardrails - allow updating guardrails through the UI and ensure <code>litellm_params</code> actually get updated in memory - <a href="https://github.com/BerriAI/litellm/pull/16384" target="_blank" rel="noopener noreferrer">PR #16384</a></li>
</ul>
</li>
<li>
<p><strong>SSO Settings</strong></p>
<ul>
<li>Support dot notation in UI SSO - <a href="https://github.com/BerriAI/litellm/pull/16135" target="_blank" rel="noopener noreferrer">PR #16135</a></li>
<li>UI - Prevent trailing slash in sso proxy base url input - <a href="https://github.com/BerriAI/litellm/pull/16244" target="_blank" rel="noopener noreferrer">PR #16244</a></li>
<li>UI - SSO Proxy Base URL input validation and remove normalizing / - <a href="https://github.com/BerriAI/litellm/pull/16332" target="_blank" rel="noopener noreferrer">PR #16332</a></li>
<li>UI - Surface SSO Create errors on create flow - <a href="https://github.com/BerriAI/litellm/pull/16369" target="_blank" rel="noopener noreferrer">PR #16369</a></li>
</ul>
</li>
<li>
<p><strong>Usage &amp; Analytics</strong></p>
<ul>
<li>UI - Tag Usage Top Model Table View and Label Fix - <a href="https://github.com/BerriAI/litellm/pull/16249" target="_blank" rel="noopener noreferrer">PR #16249</a></li>
<li>UI - Litellm usage date picker - <a href="https://github.com/BerriAI/litellm/pull/16264" target="_blank" rel="noopener noreferrer">PR #16264</a></li>
</ul>
</li>
<li>
<p><strong>Cache Settings</strong></p>
<ul>
<li>UI - Cache Settings Redis Add Semantic Cache Settings - <a href="https://github.com/BerriAI/litellm/pull/16398" target="_blank" rel="noopener noreferrer">PR #16398</a></li>
</ul>
</li>
</ul>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="bugs-1">Bugs<a href="https://docs.litellm.ai/release_notes/v1-79-3#bugs-1" class="hash-link" aria-label="Direct link to Bugs" title="Direct link to Bugs">​</a></h4>
<ul>
<li><strong>General</strong>
<ul>
<li>UI - Remove encoding_format in request for embedding models - <a href="https://github.com/BerriAI/litellm/pull/16367" target="_blank" rel="noopener noreferrer">PR #16367</a></li>
<li>UI - Revert Changes for Test Key Multiple Model Select - <a href="https://github.com/BerriAI/litellm/pull/16372" target="_blank" rel="noopener noreferrer">PR #16372</a></li>
<li>UI - Various Small Issues - <a href="https://github.com/BerriAI/litellm/pull/16406" target="_blank" rel="noopener noreferrer">PR #16406</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="ai-integrations">AI Integrations<a href="https://docs.litellm.ai/release_notes/v1-79-3#ai-integrations" class="hash-link" aria-label="Direct link to AI Integrations" title="Direct link to AI Integrations">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="logging">Logging<a href="https://docs.litellm.ai/release_notes/v1-79-3#logging" class="hash-link" aria-label="Direct link to Logging" title="Direct link to Logging">​</a></h3>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/logging#langfuse">Langfuse</a></strong></p>
<ul>
<li>Fix langfuse input tokens logic for cached tokens - <a href="https://github.com/BerriAI/litellm/pull/16203" target="_blank" rel="noopener noreferrer">PR #16203</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/logging#opik">Opik</a></strong></p>
<ul>
<li>Fix incorrect attachment to existing trace &amp; refactor - <a href="https://github.com/BerriAI/litellm/pull/15529" target="_blank" rel="noopener noreferrer">PR #15529</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/logging#s3">S3</a></strong></p>
<ul>
<li>S3 logger: add <code>ssl_verify</code> support when using MinIO - <a href="https://github.com/BerriAI/litellm/pull/16211" target="_blank" rel="noopener noreferrer">PR #16211</a></li>
<li>Strip base64 in s3 - <a href="https://github.com/BerriAI/litellm/pull/16157" target="_blank" rel="noopener noreferrer">PR #16157</a></li>
<li>Allow key-based prefix in S3 path - <a href="https://github.com/BerriAI/litellm/pull/16237" target="_blank" rel="noopener noreferrer">PR #16237</a></li>
<li>Add Prometheus metric to track callback logging failures in S3 - <a href="https://github.com/BerriAI/litellm/pull/16209" target="_blank" rel="noopener noreferrer">PR #16209</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/logging#opentelemetry">OpenTelemetry</a></strong></p>
<ul>
<li>OTEL - Log Cost Breakdown on OTEL Logger - <a href="https://github.com/BerriAI/litellm/pull/16334" target="_blank" rel="noopener noreferrer">PR #16334</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/logging#datadog">DataDog</a></strong></p>
<ul>
<li>Add DD Agent Host support for <code>datadog</code> callback - <a href="https://github.com/BerriAI/litellm/pull/16379" target="_blank" rel="noopener noreferrer">PR #16379</a></li>
</ul>
</li>
</ul>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="guardrails">Guardrails<a href="https://docs.litellm.ai/release_notes/v1-79-3#guardrails" class="hash-link" aria-label="Direct link to Guardrails" title="Direct link to Guardrails">​</a></h3>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails">Noma</a></strong></p>
<ul>
<li>Revert Noma Apply Guardrail implementation - <a href="https://github.com/BerriAI/litellm/pull/16214" target="_blank" rel="noopener noreferrer">PR #16214</a></li>
<li>Noma guardrail: add image support - <a href="https://github.com/BerriAI/litellm/pull/16199" target="_blank" rel="noopener noreferrer">PR #16199</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails">PANW Prisma AIRS</a></strong></p>
<ul>
<li>PANW prisma airs guardrail deduplication and enhanced session tracking - <a href="https://github.com/BerriAI/litellm/pull/16273" target="_blank" rel="noopener noreferrer">PR #16273</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails">LiteLLM Custom Guardrail</a></strong></p>
<ul>
<li>Add LiteLLM Gateway built in guardrail - <a href="https://github.com/BerriAI/litellm/pull/16338" target="_blank" rel="noopener noreferrer">PR #16338</a></li>
<li>UI - Allow configuring LiteLLM Custom Guardrail - <a href="https://github.com/BerriAI/litellm/pull/16339" target="_blank" rel="noopener noreferrer">PR #16339</a></li>
<li>Bug Fix: Content Filter Guard - <a href="https://github.com/BerriAI/litellm/pull/16414" target="_blank" rel="noopener noreferrer">PR #16414</a></li>
</ul>
</li>
</ul>
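<p>Guardrails like the ones above are attached via the proxy config; a minimal hedged sketch (the <code>guardrail</code> identifier and <code>mode</code> value are assumptions, check the guardrails docs for your version):</p>

```yaml
# Hypothetical guardrail config sketch (names/values are illustrative)
guardrails:
  - guardrail_name: "content-filter-guard"
    litellm_params:
      guardrail: litellm_content_filter   # assumption: built-in guardrail identifier
      mode: "pre_call"                    # run before the request reaches the LLM
```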
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="secret-managers">Secret Managers<a href="https://docs.litellm.ai/release_notes/v1-79-3#secret-managers" class="hash-link" aria-label="Direct link to Secret Managers" title="Direct link to Secret Managers">​</a></h3>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/secret_managers">CyberArk</a></strong></p>
<ul>
<li>Add CyberArk Secrets Manager Integration - <a href="https://github.com/BerriAI/litellm/pull/16278" target="_blank" rel="noopener noreferrer">PR #16278</a></li>
<li>CyberArk - Add key rotation support - <a href="https://github.com/BerriAI/litellm/pull/16289" target="_blank" rel="noopener noreferrer">PR #16289</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/secret_managers">HashiCorp Vault</a></strong></p>
<ul>
<li>Add configurable mount name and path prefix for HashiCorp Vault - <a href="https://github.com/BerriAI/litellm/pull/16253" target="_blank" rel="noopener noreferrer">PR #16253</a></li>
<li>Secret Manager - HashiCorp: add auth via AppRole - <a href="https://github.com/BerriAI/litellm/pull/16374" target="_blank" rel="noopener noreferrer">PR #16374</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/secret_managers">AWS Secrets Manager</a></strong></p>
<ul>
<li>Add tags and descriptions support to aws secrets manager - <a href="https://github.com/BerriAI/litellm/pull/16224" target="_blank" rel="noopener noreferrer">PR #16224</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/secret_managers">Custom Secret Manager</a></strong></p>
<ul>
<li>Add Custom Secret Manager - Allow users to define and write a custom secret manager - <a href="https://github.com/BerriAI/litellm/pull/16297" target="_blank" rel="noopener noreferrer">PR #16297</a></li>
</ul>
</li>
<li>
<p><strong>General</strong></p>
<ul>
<li>Email Notifications - Ensure Users get Key Rotated Email - <a href="https://github.com/BerriAI/litellm/pull/16292" target="_blank" rel="noopener noreferrer">PR #16292</a></li>
<li>Fix SSL verification on boto3 STS calls - <a href="https://github.com/BerriAI/litellm/pull/16313" target="_blank" rel="noopener noreferrer">PR #16313</a></li>
</ul>
</li>
</ul>
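<p>Secret managers are enabled through <code>general_settings</code>; a hedged sketch for the HashiCorp Vault changes above (the environment variable names and mount/prefix keys are assumptions, verify against the secret manager docs):</p>

```yaml
# Hypothetical sketch: HashiCorp Vault as the key management system (names illustrative)
general_settings:
  key_management_system: "hashicorp_vault"
environment_variables:
  HCP_VAULT_ADDR: "https://vault.example.com:8200"  # assumption: env var name
  HCP_VAULT_TOKEN: "hvs...."                        # assumption: token-based auth
```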
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="spend-tracking-budgets-and-rate-limiting">Spend Tracking, Budgets and Rate Limiting<a href="https://docs.litellm.ai/release_notes/v1-79-3#spend-tracking-budgets-and-rate-limiting" class="hash-link" aria-label="Direct link to Spend Tracking, Budgets and Rate Limiting" title="Direct link to Spend Tracking, Budgets and Rate Limiting">​</a></h2>
<ul>
<li><strong>Cost Tracking</strong>
<ul>
<li>Fix OpenAI Responses API streaming tests usage field names and cost calculation - <a href="https://github.com/BerriAI/litellm/pull/16236" target="_blank" rel="noopener noreferrer">PR #16236</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="mcp-gateway">MCP Gateway<a href="https://docs.litellm.ai/release_notes/v1-79-3#mcp-gateway" class="hash-link" aria-label="Direct link to MCP Gateway" title="Direct link to MCP Gateway">​</a></h2>
<ul>
<li><strong>Configuration</strong>
<ul>
<li>Configure static MCP header - <a href="https://github.com/BerriAI/litellm/pull/16179" target="_blank" rel="noopener noreferrer">PR #16179</a></li>
<li>Persist MCP credentials in DB - <a href="https://github.com/BerriAI/litellm/pull/16308" target="_blank" rel="noopener noreferrer">PR #16308</a></li>
</ul>
</li>
</ul>
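<p>A static MCP header (PR #16179) would be set per server in the proxy config; a hedged sketch, assuming LiteLLM's <code>mcp_servers</code> schema (the headers field name is an assumption):</p>

```yaml
# Hypothetical sketch: MCP server with a static header (field names illustrative)
mcp_servers:
  my_server:
    url: "https://mcp.example.com/mcp"
    transport: "http"
    extra_headers:                 # assumption: field name for static headers
      X-Custom-Header: "value"
```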
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="performance--loadbalancing--reliability-improvements">Performance / Loadbalancing / Reliability improvements<a href="https://docs.litellm.ai/release_notes/v1-79-3#performance--loadbalancing--reliability-improvements" class="hash-link" aria-label="Direct link to Performance / Loadbalancing / Reliability improvements" title="Direct link to Performance / Loadbalancing / Reliability improvements">​</a></h2>
<ul>
<li>
<p><strong>Memory Leak Fixes</strong></p>
<ul>
<li>Resolve memory accumulation caused by Pydantic 2.11+ deprecation warnings - <a href="https://github.com/BerriAI/litellm/pull/16110" target="_blank" rel="noopener noreferrer">PR #16110</a></li>
</ul>
</li>
<li>
<p><strong>Session Management</strong></p>
<ul>
<li>Add shared_session support to responses API - <a href="https://github.com/BerriAI/litellm/pull/16260" target="_blank" rel="noopener noreferrer">PR #16260</a></li>
</ul>
</li>
<li>
<p><strong>Error Handling</strong></p>
<ul>
<li>Gracefully handle connection closed errors during streaming - <a href="https://github.com/BerriAI/litellm/pull/16294" target="_blank" rel="noopener noreferrer">PR #16294</a></li>
<li>Handle None values in daily spend sort key - <a href="https://github.com/BerriAI/litellm/pull/16245" target="_blank" rel="noopener noreferrer">PR #16245</a></li>
</ul>
</li>
<li>
<p><strong>Configuration</strong></p>
<ul>
<li>Remove minimum validation for cache control injection index - <a href="https://github.com/BerriAI/litellm/pull/16149" target="_blank" rel="noopener noreferrer">PR #16149</a></li>
<li>Improve clearing logic - only remove unvisited endpoints - <a href="https://github.com/BerriAI/litellm/pull/16400" target="_blank" rel="noopener noreferrer">PR #16400</a></li>
</ul>
</li>
<li>
<p><strong>Redis</strong></p>
<ul>
<li>Handle float redis_version from AWS ElastiCache Valkey - <a href="https://github.com/BerriAI/litellm/pull/16207" target="_blank" rel="noopener noreferrer">PR #16207</a></li>
</ul>
</li>
<li>
<p><strong>Hooks</strong></p>
<ul>
<li>Add parallel execution handling in during_call_hook - <a href="https://github.com/BerriAI/litellm/pull/16279" target="_blank" rel="noopener noreferrer">PR #16279</a></li>
</ul>
</li>
<li>
<p><strong>Infrastructure</strong></p>
<ul>
<li>Install runtime node for prisma - <a href="https://github.com/BerriAI/litellm/pull/16410" target="_blank" rel="noopener noreferrer">PR #16410</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="documentation-updates">Documentation Updates<a href="https://docs.litellm.ai/release_notes/v1-79-3#documentation-updates" class="hash-link" aria-label="Direct link to Documentation Updates" title="Direct link to Documentation Updates">​</a></h2>
<ul>
<li>
<p><strong>Provider Documentation</strong></p>
<ul>
<li>Docs - v1.79.1 - <a href="https://github.com/BerriAI/litellm/pull/16163" target="_blank" rel="noopener noreferrer">PR #16163</a></li>
<li>Fix broken link on model_management.md - <a href="https://github.com/BerriAI/litellm/pull/16217" target="_blank" rel="noopener noreferrer">PR #16217</a></li>
<li>Fix image generation response format - use 'images' array instead of 'image' object - <a href="https://github.com/BerriAI/litellm/pull/16378" target="_blank" rel="noopener noreferrer">PR #16378</a></li>
</ul>
</li>
<li>
<p><strong>General Documentation</strong></p>
<ul>
<li>Add minimum resource requirement for production - <a href="https://github.com/BerriAI/litellm/pull/16146" target="_blank" rel="noopener noreferrer">PR #16146</a></li>
<li>Add benchmark comparison with other AI gateways - <a href="https://github.com/BerriAI/litellm/pull/16248" target="_blank" rel="noopener noreferrer">PR #16248</a></li>
<li>LiteLLM content filter guard documentation - <a href="https://github.com/BerriAI/litellm/pull/16413" target="_blank" rel="noopener noreferrer">PR #16413</a></li>
<li>Fix typo of the word orginal - <a href="https://github.com/BerriAI/litellm/pull/16255" target="_blank" rel="noopener noreferrer">PR #16255</a></li>
</ul>
</li>
<li>
<p><strong>Security</strong></p>
<ul>
<li>Remove tornado test files (including test.key), fixes Python 3.13 security issues - <a href="https://github.com/BerriAI/litellm/pull/16342" target="_blank" rel="noopener noreferrer">PR #16342</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-contributors">New Contributors<a href="https://docs.litellm.ai/release_notes/v1-79-3#new-contributors" class="hash-link" aria-label="Direct link to New Contributors" title="Direct link to New Contributors">​</a></h2>
<ul>
<li>@steve-gore-snapdocs made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16149" target="_blank" rel="noopener noreferrer">PR #16149</a></li>
<li>@timbmg made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16120" target="_blank" rel="noopener noreferrer">PR #16120</a></li>
<li>@Nivg made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16202" target="_blank" rel="noopener noreferrer">PR #16202</a></li>
<li>@pablobgar made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16194" target="_blank" rel="noopener noreferrer">PR #16194</a></li>
<li>@AlanPonnachan made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16150" target="_blank" rel="noopener noreferrer">PR #16150</a></li>
<li>@Chesars made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16236" target="_blank" rel="noopener noreferrer">PR #16236</a></li>
<li>@bowenliang123 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16255" target="_blank" rel="noopener noreferrer">PR #16255</a></li>
<li>@dean-zavad made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16199" target="_blank" rel="noopener noreferrer">PR #16199</a></li>
<li>@alexkuzmik made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15529" target="_blank" rel="noopener noreferrer">PR #15529</a></li>
<li>@Granine made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16281" target="_blank" rel="noopener noreferrer">PR #16281</a></li>
<li>@Oodapow made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16279" target="_blank" rel="noopener noreferrer">PR #16279</a></li>
<li>@jgoodyear made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16275" target="_blank" rel="noopener noreferrer">PR #16275</a></li>
<li>@Qanpi made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16321" target="_blank" rel="noopener noreferrer">PR #16321</a></li>
<li>@ShimonMimoun made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16313" target="_blank" rel="noopener noreferrer">PR #16313</a></li>
<li>@andriykislitsyn made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16288" target="_blank" rel="noopener noreferrer">PR #16288</a></li>
<li>@reckless-huang made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16263" target="_blank" rel="noopener noreferrer">PR #16263</a></li>
<li>@chenmoneygithub made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16368" target="_blank" rel="noopener noreferrer">PR #16368</a></li>
<li>@stembe-digitalex made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16354" target="_blank" rel="noopener noreferrer">PR #16354</a></li>
<li>@jfcherng made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16352" target="_blank" rel="noopener noreferrer">PR #16352</a></li>
<li>@xingyaoww made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16246" target="_blank" rel="noopener noreferrer">PR #16246</a></li>
<li>@emerzon made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16373" target="_blank" rel="noopener noreferrer">PR #16373</a></li>
<li>@wwwillchen made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16376" target="_blank" rel="noopener noreferrer">PR #16376</a></li>
<li>@fabriciojoc made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16203" target="_blank" rel="noopener noreferrer">PR #16203</a></li>
<li>@jroberts2600 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16273" target="_blank" rel="noopener noreferrer">PR #16273</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="full-changelog">Full Changelog<a href="https://docs.litellm.ai/release_notes/v1-79-3#full-changelog" class="hash-link" aria-label="Direct link to Full Changelog" title="Direct link to Full Changelog">​</a></h2>
<p><strong><a href="https://github.com/BerriAI/litellm/compare/v1.79.1-nightly...v1.79.2.rc.1" target="_blank" rel="noopener noreferrer">View complete changelog on GitHub</a></strong></p>]]></content>
        <author>
            <name>Krrish Dholakia</name>
            <uri>https://www.linkedin.com/in/krish-d/</uri>
        </author>
        <author>
            <name>Ishaan Jaff</name>
            <uri>https://www.linkedin.com/in/reffajnaahsi/</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[v1.79.1-stable - Guardrail Playground]]></title>
        <id>https://docs.litellm.ai/release_notes/v1-79-1</id>
        <link href="https://docs.litellm.ai/release_notes/v1-79-1"/>
        <updated>2025-11-01T10:00:00.000Z</updated>
        <summary type="html"><![CDATA[Deploy this version]]></summary>
        <content type="html"><![CDATA[<h2 class="anchor anchorWithStickyNavbar_LWe7" id="deploy-this-version">Deploy this version<a href="https://docs.litellm.ai/release_notes/v1-79-1#deploy-this-version" class="hash-link" aria-label="Direct link to Deploy this version" title="Direct link to Deploy this version">​</a></h2>
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">Docker</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">Pip</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">docker run litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker run \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-e STORE_MODEL_IN_DB=True \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-p 4000:4000 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">docker.litellm.ai/berriai/litellm:v1.79.1-stable</span><br></span></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">pip install litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">pip install litellm==1.80.0</span><br></span></code></pre></div></div></div></div></div>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="key-highlights">Key Highlights<a href="https://docs.litellm.ai/release_notes/v1-79-1#key-highlights" class="hash-link" aria-label="Direct link to Key Highlights" title="Direct link to Key Highlights">​</a></h2>
<ul>
<li><strong>Container API Support</strong> - End-to-end OpenAI Container API support with proxy integration, logging, and cost tracking</li>
<li><strong>FAL AI Image Generation</strong> - Native support for FAL AI image generation models with cost tracking</li>
<li><strong>UI Enhancements</strong> - Guardrail Playground, Cache Settings, Tag Routing, SSO Settings</li>
<li><strong>Batch API Rate Limiting</strong> - Input-based rate limits support for Batch API requests</li>
<li><strong>Vector Store Expansion</strong> - Milvus vector store support and Azure AI virtual indexes</li>
<li><strong>Memory Leak Fixes</strong> - Resolved issues accounting for 90% of memory leaks in the Python SDK &amp; AI Gateway</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="dependency-upgrades">Dependency Upgrades<a href="https://docs.litellm.ai/release_notes/v1-79-1#dependency-upgrades" class="hash-link" aria-label="Direct link to Dependency Upgrades" title="Direct link to Dependency Upgrades">​</a></h2>
<ul>
<li><strong>Dependencies</strong>
<ul>
<li>Build(deps): bump starlette from 0.47.2 to 0.49.1 - <a href="https://github.com/BerriAI/litellm/pull/16027" target="_blank" rel="noopener noreferrer">PR #16027</a></li>
<li>Build(deps): bump fastapi from 0.116.1 to 0.120.1 - <a href="https://github.com/BerriAI/litellm/pull/16054" target="_blank" rel="noopener noreferrer">PR #16054</a></li>
<li>Build(deps): bump hono from 4.9.7 to 4.10.3 in /litellm-js/spend-logs - <a href="https://github.com/BerriAI/litellm/pull/15915" target="_blank" rel="noopener noreferrer">PR #15915</a></li>
</ul>
</li>
</ul>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-models--updated-models">New Models / Updated Models<a href="https://docs.litellm.ai/release_notes/v1-79-1#new-models--updated-models" class="hash-link" aria-label="Direct link to New Models / Updated Models" title="Direct link to New Models / Updated Models">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="new-model-support">New Model Support<a href="https://docs.litellm.ai/release_notes/v1-79-1#new-model-support" class="hash-link" aria-label="Direct link to New Model Support" title="Direct link to New Model Support">​</a></h4>
<table><thead><tr><th>Provider</th><th>Model</th><th>Context Window</th><th>Input ($/1M tokens)</th><th>Output ($/1M tokens)</th><th>Features</th></tr></thead><tbody><tr><td>Mistral</td><td><code>mistral/codestral-embed</code></td><td>8K</td><td>$0.15</td><td>-</td><td>Embeddings</td></tr><tr><td>Mistral</td><td><code>mistral/codestral-embed-2505</code></td><td>8K</td><td>$0.15</td><td>-</td><td>Embeddings</td></tr><tr><td>Gemini</td><td><code>gemini/gemini-embedding-001</code></td><td>2K</td><td>$0.15</td><td>-</td><td>Embeddings</td></tr><tr><td>FAL AI</td><td><code>fal_ai/fal-ai/flux-pro/v1.1-ultra</code></td><td>-</td><td>-</td><td>-</td><td>Image generation - $0.0398/image</td></tr><tr><td>FAL AI</td><td><code>fal_ai/fal-ai/imagen4/preview</code></td><td>-</td><td>-</td><td>-</td><td>Image generation - $0.0398/image</td></tr><tr><td>FAL AI</td><td><code>fal_ai/fal-ai/recraft/v3/text-to-image</code></td><td>-</td><td>-</td><td>-</td><td>Image generation - $0.0398/image</td></tr><tr><td>FAL AI</td><td><code>fal_ai/fal-ai/stable-diffusion-v35-medium</code></td><td>-</td><td>-</td><td>-</td><td>Image generation - $0.0398/image</td></tr><tr><td>FAL AI</td><td><code>fal_ai/bria/text-to-image/3.2</code></td><td>-</td><td>-</td><td>-</td><td>Image generation - $0.0398/image</td></tr><tr><td>OpenAI</td><td><code>openai/sora-2-pro</code></td><td>-</td><td>-</td><td>-</td><td>Video generation - $0.30/video/second</td></tr></tbody></table>
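<p>The per-image and per-second prices in the table imply costs scale linearly with usage; a quick sanity check (prices copied from the table, the helper functions are illustrative, not part of the LiteLLM API):</p>

```python
# Sanity check on the flat per-unit pricing listed above (helpers are illustrative)
FAL_COST_PER_IMAGE = 0.0398      # FAL AI image generation, $/image
SORA2_PRO_PER_SECOND = 0.30      # openai/sora-2-pro video generation, $/second

def fal_image_cost(n_images: int) -> float:
    """Total cost for a batch of FAL AI image generations."""
    return round(n_images * FAL_COST_PER_IMAGE, 4)

def sora_video_cost(seconds: int) -> float:
    """Total cost for a sora-2-pro video of the given length."""
    return round(seconds * SORA2_PRO_PER_SECOND, 2)

print(fal_image_cost(5))    # 0.199
print(sora_video_cost(8))   # 2.4
```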
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features">Features<a href="https://docs.litellm.ai/release_notes/v1-79-1#features" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/anthropic">Anthropic</a></strong></p>
<ul>
<li>Extended Claude 3-7 Sonnet deprecation date from 2026-02-01 to 2026-02-19 - <a href="https://github.com/BerriAI/litellm/pull/15976" target="_blank" rel="noopener noreferrer">PR #15976</a></li>
<li>Extended Claude Opus 4-0 deprecation date from 2025-03-01 to 2026-05-01 - <a href="https://github.com/BerriAI/litellm/pull/15976" target="_blank" rel="noopener noreferrer">PR #15976</a></li>
<li>Removed Claude Haiku 3-5 deprecation date (previously 2025-03-01) - <a href="https://github.com/BerriAI/litellm/pull/15976" target="_blank" rel="noopener noreferrer">PR #15976</a></li>
<li>Added Claude Opus 4-1, Claude Opus 4-0 20250513, Claude Sonnet 4 20250514 deprecation dates - <a href="https://github.com/BerriAI/litellm/pull/15976" target="_blank" rel="noopener noreferrer">PR #15976</a></li>
<li>Added web search support for Claude Opus 4-1 - <a href="https://github.com/BerriAI/litellm/pull/15976" target="_blank" rel="noopener noreferrer">PR #15976</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/bedrock">Bedrock</a></strong></p>
<ul>
<li>Fix empty assistant message handling in AWS Bedrock Converse API to prevent 400 Bad Request errors - <a href="https://github.com/BerriAI/litellm/pull/15850" target="_blank" rel="noopener noreferrer">PR #15850</a></li>
<li>Allow using ARNs when generating images via Bedrock - <a href="https://github.com/BerriAI/litellm/pull/15789" target="_blank" rel="noopener noreferrer">PR #15789</a></li>
<li>Add per model group header forwarding for Bedrock Invoke API - <a href="https://github.com/BerriAI/litellm/pull/16042" target="_blank" rel="noopener noreferrer">PR #16042</a></li>
<li>Preserve Bedrock inference profile IDs in health checks - <a href="https://github.com/BerriAI/litellm/pull/15947" target="_blank" rel="noopener noreferrer">PR #15947</a></li>
<li>Added fallback logic for detecting file content-type when S3 returns generic type - When using Bedrock with S3-hosted files, if the S3 object's Content-Type is not correctly set (e.g., binary/octet-stream instead of image/png), Bedrock can now handle it correctly - <a href="https://github.com/BerriAI/litellm/pull/15635" target="_blank" rel="noopener noreferrer">PR #15635</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/azure">Azure</a></strong></p>
<ul>
<li>Add deprecation dates for Azure OpenAI models (gpt-4o-2024-08-06, gpt-4o-2024-11-20, gpt-4.1 series, o3-2025-04-16, text-embedding-3-small) - <a href="https://github.com/BerriAI/litellm/pull/15976" target="_blank" rel="noopener noreferrer">PR #15976</a></li>
<li>Fix Azure OpenAI ContextWindowExceededError mapping from Azure errors - <a href="https://github.com/BerriAI/litellm/pull/15981" target="_blank" rel="noopener noreferrer">PR #15981</a></li>
<li>Add handling for <code>v1</code> under Azure API versions - <a href="https://github.com/BerriAI/litellm/pull/15984" target="_blank" rel="noopener noreferrer">PR #15984</a></li>
<li>Fix Azure not accepting extra body params - <a href="https://github.com/BerriAI/litellm/pull/16116" target="_blank" rel="noopener noreferrer">PR #16116</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/openai">OpenAI</a></strong></p>
<ul>
<li>Add deprecation dates for gpt-3.5-turbo-1106, gpt-4-0125-preview, gpt-4-1106-preview, o1-mini-2024-09-12 - <a href="https://github.com/BerriAI/litellm/pull/15976" target="_blank" rel="noopener noreferrer">PR #15976</a></li>
<li>Add extended Sora-2 modality support (text + image inputs) - <a href="https://github.com/BerriAI/litellm/pull/15976" target="_blank" rel="noopener noreferrer">PR #15976</a></li>
<li>Updated OpenAI Sora-2-Pro pricing to $0.30/video/second - <a href="https://github.com/BerriAI/litellm/pull/15976" target="_blank" rel="noopener noreferrer">PR #15976</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/openrouter">OpenRouter</a></strong></p>
<ul>
<li>Add Claude Haiku 4.5 pricing for OpenRouter - <a href="https://github.com/BerriAI/litellm/pull/15909" target="_blank" rel="noopener noreferrer">PR #15909</a></li>
<li>Add base_url config with environment variables documentation - <a href="https://github.com/BerriAI/litellm/pull/15946" target="_blank" rel="noopener noreferrer">PR #15946</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/mistral">Mistral</a></strong></p>
<ul>
<li>Add codestral-embed-2505 embedding model - <a href="https://github.com/BerriAI/litellm/pull/16071" target="_blank" rel="noopener noreferrer">PR #16071</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/gemini">Gemini (Google AI Studio + Vertex AI)</a></strong></p>
<ul>
<li>Fix gemini request mutation for tool use - <a href="https://github.com/BerriAI/litellm/pull/16002" target="_blank" rel="noopener noreferrer">PR #16002</a></li>
<li>Add gemini-embedding-001 pricing entry for Google GenAI API - <a href="https://github.com/BerriAI/litellm/pull/16078" target="_blank" rel="noopener noreferrer">PR #16078</a></li>
<li>Fix frequency_penalty and presence_penalty handling for gemini-2.5-pro - <a href="https://github.com/BerriAI/litellm/pull/16041" target="_blank" rel="noopener noreferrer">PR #16041</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/deepinfra">DeepInfra</a></strong></p>
<ul>
<li>Add vision support for Qwen/Qwen3-chat-32b model - <a href="https://github.com/BerriAI/litellm/pull/15976" target="_blank" rel="noopener noreferrer">PR #15976</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/vercel_ai_gateway">Vercel AI Gateway</a></strong></p>
<ul>
<li>Fix vercel_ai_gateway entry for glm-4.6 (moved from vercel_ai_gateway/glm-4.6 to vercel_ai_gateway/zai/glm-4.6) - <a href="https://github.com/BerriAI/litellm/pull/16084" target="_blank" rel="noopener noreferrer">PR #16084</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/fireworks_ai">Fireworks</a></strong></p>
<ul>
<li>Don't add "accounts/fireworks/models" prefix for Fireworks Provider - <a href="https://github.com/BerriAI/litellm/pull/15938" target="_blank" rel="noopener noreferrer">PR #15938</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/cohere">Cohere</a></strong></p>
<ul>
<li>Add OpenAI-compatible annotations support for Cohere v2 citations - <a href="https://github.com/BerriAI/litellm/pull/16038" target="_blank" rel="noopener noreferrer">PR #16038</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/deepgram">Deepgram</a></strong></p>
<ul>
<li>Handle Deepgram detected language when available - <a href="https://github.com/BerriAI/litellm/pull/16093" target="_blank" rel="noopener noreferrer">PR #16093</a></li>
</ul>
</li>
</ul>
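<p>Several of the provider changes above (e.g. the OpenRouter <code>base_url</code> environment-variable support in PR #15946) surface through the standard <code>model_list</code> config; a hedged sketch (model name and env var names are illustrative):</p>

```yaml
# Hypothetical sketch: OpenRouter model with base_url/api_key from env vars
model_list:
  - model_name: openrouter-claude
    litellm_params:
      model: openrouter/anthropic/claude-haiku-4.5  # illustrative model id
      api_base: os.environ/OPENROUTER_API_BASE      # assumption: env var name
      api_key: os.environ/OPENROUTER_API_KEY
```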
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="bug-fixes">Bug Fixes<a href="https://docs.litellm.ai/release_notes/v1-79-1#bug-fixes" class="hash-link" aria-label="Direct link to Bug Fixes" title="Direct link to Bug Fixes">​</a></h3>
<ul>
<li><strong><a href="https://docs.litellm.ai/docs/providers/xai">Xai</a></strong>
<ul>
<li>Add Xai websearch cost tracking - <a href="https://github.com/BerriAI/litellm/pull/16001" target="_blank" rel="noopener noreferrer">PR #16001</a></li>
</ul>
</li>
</ul>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="new-provider-support">New Provider Support<a href="https://docs.litellm.ai/release_notes/v1-79-1#new-provider-support" class="hash-link" aria-label="Direct link to New Provider Support" title="Direct link to New Provider Support">​</a></h4>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/image_generation">FAL AI</a></strong></p>
<ul>
<li>Add FAL AI Image Generation support - <a href="https://github.com/BerriAI/litellm/pull/16067" target="_blank" rel="noopener noreferrer">PR #16067</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/oci">OCI (Oracle Cloud Infrastructure)</a></strong></p>
<ul>
<li>Add OCI Signer Authentication support - <a href="https://github.com/BerriAI/litellm/pull/16064" target="_blank" rel="noopener noreferrer">PR #16064</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="llm-api-endpoints">LLM API Endpoints<a href="https://docs.litellm.ai/release_notes/v1-79-1#llm-api-endpoints" class="hash-link" aria-label="Direct link to LLM API Endpoints" title="Direct link to LLM API Endpoints">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-1">Features<a href="https://docs.litellm.ai/release_notes/v1-79-1#features-1" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/containers">Container API</a></strong></p>
<ul>
<li>Add end-to-end OpenAI Container API support to LiteLLM SDK - <a href="https://github.com/BerriAI/litellm/pull/16136" target="_blank" rel="noopener noreferrer">PR #16136</a></li>
<li>Add proxy support for container APIs - <a href="https://github.com/BerriAI/litellm/pull/16049" target="_blank" rel="noopener noreferrer">PR #16049</a></li>
<li>Add logging support for Container API - <a href="https://github.com/BerriAI/litellm/pull/16049" target="_blank" rel="noopener noreferrer">PR #16049</a></li>
<li>Add cost tracking support for containers with documentation - <a href="https://github.com/BerriAI/litellm/pull/16117" target="_blank" rel="noopener noreferrer">PR #16117</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/response_api">Responses API</a></strong></p>
<ul>
<li>Respect <code>LiteLLM-Disable-Message-Redaction</code> header for Responses API - <a href="https://github.com/BerriAI/litellm/pull/15966" target="_blank" rel="noopener noreferrer">PR #15966</a></li>
<li>Add /openai routes for responses API (Azure OpenAI SDK Compatibility) - <a href="https://github.com/BerriAI/litellm/pull/15988" target="_blank" rel="noopener noreferrer">PR #15988</a></li>
<li>Redact reasoning summaries in ResponsesAPI output when message logging is disabled - <a href="https://github.com/BerriAI/litellm/pull/15965" target="_blank" rel="noopener noreferrer">PR #15965</a></li>
<li>Support text.format parameter in Responses API for providers without native ResponsesAPIConfig - <a href="https://github.com/BerriAI/litellm/pull/16023" target="_blank" rel="noopener noreferrer">PR #16023</a></li>
<li>Add LLM provider response headers to Responses API - <a href="https://github.com/BerriAI/litellm/pull/16091" target="_blank" rel="noopener noreferrer">PR #16091</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/video_generation">Video Generation API</a></strong></p>
<ul>
<li>Add <code>custom_llm_provider</code> support for video endpoints (non-generation) - <a href="https://github.com/BerriAI/litellm/pull/16121" target="_blank" rel="noopener noreferrer">PR #16121</a></li>
<li>Fix documentation for videos - <a href="https://github.com/BerriAI/litellm/pull/15937" target="_blank" rel="noopener noreferrer">PR #15937</a></li>
<li>Add OpenAI client usage documentation for videos and fix navigation visibility - <a href="https://github.com/BerriAI/litellm/pull/15996" target="_blank" rel="noopener noreferrer">PR #15996</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/moderations">Moderations API</a></strong></p>
<ul>
<li>Moderations endpoint now respects <code>api_base</code> configuration parameter - <a href="https://github.com/BerriAI/litellm/pull/16087" target="_blank" rel="noopener noreferrer">PR #16087</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/vector_stores">Vector Stores</a></strong></p>
<ul>
<li>Milvus: add vector store search support - <a href="https://github.com/BerriAI/litellm/pull/16035" target="_blank" rel="noopener noreferrer">PR #16035</a></li>
<li>Azure AI Vector Stores - support "virtual" indexes + create vector store on passthrough API - <a href="https://github.com/BerriAI/litellm/pull/16160" target="_blank" rel="noopener noreferrer">PR #16160</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/pass_through/vertex_ai">Passthrough Endpoints</a></strong></p>
<ul>
<li>Support multi-part form data on passthrough - <a href="https://github.com/BerriAI/litellm/pull/16035" target="_blank" rel="noopener noreferrer">PR #16035</a></li>
</ul>
</li>
</ul>
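<p>As an illustration of the per-request redaction opt-out mentioned above: when message redaction is enabled proxy-wide, a single Responses API call can opt out via the <code>LiteLLM-Disable-Message-Redaction</code> header. The sketch below only assembles the request; the proxy URL, API key, and model name are placeholders, not values from this release.</p>

```python
# Sketch: opting one Responses API request out of message redaction via the
# LiteLLM-Disable-Message-Redaction header (see PR #15966). The URL, key,
# and model below are placeholders.

def build_responses_request(prompt: str, disable_redaction: bool = False) -> dict:
    """Assemble headers and body for a POST to the proxy's /v1/responses route."""
    headers = {
        "Authorization": "Bearer sk-placeholder",
        "Content-Type": "application/json",
    }
    if disable_redaction:
        # Honored per-request when the proxy permits it.
        headers["LiteLLM-Disable-Message-Redaction"] = "true"
    body = {"model": "gpt-4o-mini", "input": prompt}
    return {"url": "http://localhost:4000/v1/responses", "headers": headers, "json": body}

req = build_responses_request("hello", disable_redaction=True)
```

<p>The returned dict can be passed straight to an HTTP client, e.g. <code>requests.post(**req)</code>.</p>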
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="management-endpoints--ui">Management Endpoints / UI<a href="https://docs.litellm.ai/release_notes/v1-79-1#management-endpoints--ui" class="hash-link" aria-label="Direct link to Management Endpoints / UI" title="Direct link to Management Endpoints / UI">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-2">Features<a href="https://docs.litellm.ai/release_notes/v1-79-1#features-2" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong>Virtual Keys</strong></p>
<ul>
<li>Validation for Proxy Base URL in SSO Settings - <a href="https://github.com/BerriAI/litellm/pull/16082" target="_blank" rel="noopener noreferrer">PR #16082</a></li>
<li>Test Key UI Embeddings support - <a href="https://github.com/BerriAI/litellm/pull/16065" target="_blank" rel="noopener noreferrer">PR #16065</a></li>
<li>Add Key Type Select in Key Settings - <a href="https://github.com/BerriAI/litellm/pull/16034" target="_blank" rel="noopener noreferrer">PR #16034</a></li>
<li>"Key already exists" error notification - <a href="https://github.com/BerriAI/litellm/pull/15993" target="_blank" rel="noopener noreferrer">PR #15993</a></li>
</ul>
</li>
<li>
<p><strong>Models + Endpoints</strong></p>
<ul>
<li>Changed API Base from Select to Input in New LLM Credentials - <a href="https://github.com/BerriAI/litellm/pull/15987" target="_blank" rel="noopener noreferrer">PR #15987</a></li>
<li>Remove limit from admin UI numerical input - <a href="https://github.com/BerriAI/litellm/pull/15991" target="_blank" rel="noopener noreferrer">PR #15991</a></li>
<li>Config Models should not be editable - <a href="https://github.com/BerriAI/litellm/pull/16020" target="_blank" rel="noopener noreferrer">PR #16020</a></li>
<li>Add tags in model creation - <a href="https://github.com/BerriAI/litellm/pull/16138" target="_blank" rel="noopener noreferrer">PR #16138</a></li>
<li>Add Tags to update model - <a href="https://github.com/BerriAI/litellm/pull/16140" target="_blank" rel="noopener noreferrer">PR #16140</a></li>
</ul>
</li>
<li>
<p><strong>Guardrails</strong></p>
<ul>
<li>Add Apply Guardrail Testing Playground - <a href="https://github.com/BerriAI/litellm/pull/16030" target="_blank" rel="noopener noreferrer">PR #16030</a></li>
<li>Make config guardrails non-editable and fix guardrail info display - <a href="https://github.com/BerriAI/litellm/pull/16142" target="_blank" rel="noopener noreferrer">PR #16142</a></li>
</ul>
</li>
<li>
<p><strong>Cache Settings</strong></p>
<ul>
<li>Allow setting cache settings on UI - <a href="https://github.com/BerriAI/litellm/pull/16143" target="_blank" rel="noopener noreferrer">PR #16143</a></li>
</ul>
</li>
<li>
<p><strong>Routing</strong></p>
<ul>
<li>Allow setting all routing strategies, tag filtering on UI - <a href="https://github.com/BerriAI/litellm/pull/16139" target="_blank" rel="noopener noreferrer">PR #16139</a></li>
</ul>
</li>
<li>
<p><strong>Admin Settings</strong></p>
<ul>
<li>Add license metadata to health/readiness endpoint - <a href="https://github.com/BerriAI/litellm/pull/15997" target="_blank" rel="noopener noreferrer">PR #15997</a></li>
<li>LiteLLM backend SSO changes - <a href="https://github.com/BerriAI/litellm/pull/16029" target="_blank" rel="noopener noreferrer">PR #16029</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="logging--guardrail--prompt-management-integrations">Logging / Guardrail / Prompt Management Integrations<a href="https://docs.litellm.ai/release_notes/v1-79-1#logging--guardrail--prompt-management-integrations" class="hash-link" aria-label="Direct link to Logging / Guardrail / Prompt Management Integrations" title="Direct link to Logging / Guardrail / Prompt Management Integrations">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-3">Features<a href="https://docs.litellm.ai/release_notes/v1-79-1#features-3" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/logging#opentelemetry">OpenTelemetry</a></strong></p>
<ul>
<li>Enable OpenTelemetry context propagation by external tracers - <a href="https://github.com/BerriAI/litellm/pull/15940" target="_blank" rel="noopener noreferrer">PR #15940</a></li>
<li>Ensure error information is logged on OTEL - <a href="https://github.com/BerriAI/litellm/pull/15978" target="_blank" rel="noopener noreferrer">PR #15978</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/logging#langfuse">Langfuse</a></strong></p>
<ul>
<li>Fix duplicate trace in langfuse_otel - <a href="https://github.com/BerriAI/litellm/pull/15931" target="_blank" rel="noopener noreferrer">PR #15931</a></li>
<li>Support tool usage messages with Langfuse OTEL integration - <a href="https://github.com/BerriAI/litellm/pull/15932" target="_blank" rel="noopener noreferrer">PR #15932</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/logging#datadog">DataDog</a></strong></p>
<ul>
<li>Ensure key's metadata + guardrail is logged on DD - <a href="https://github.com/BerriAI/litellm/pull/15980" target="_blank" rel="noopener noreferrer">PR #15980</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/logging#opik">Opik</a></strong></p>
<ul>
<li>Enhance requester metadata retrieval from API key auth - <a href="https://github.com/BerriAI/litellm/pull/15897" target="_blank" rel="noopener noreferrer">PR #15897</a></li>
<li>Document user auth key metadata - <a href="https://github.com/BerriAI/litellm/pull/16004" target="_blank" rel="noopener noreferrer">PR #16004</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/logging#sqs">SQS</a></strong></p>
<ul>
<li>Add Base64 handling for SQS Logger - <a href="https://github.com/BerriAI/litellm/pull/16028" target="_blank" rel="noopener noreferrer">PR #16028</a></li>
</ul>
</li>
<li>
<p><strong>General</strong></p>
<ul>
<li>Fix: user API key, team ID, and user ID no longer missing from custom callbacks - <a href="https://github.com/BerriAI/litellm/pull/15982" target="_blank" rel="noopener noreferrer">PR #15982</a></li>
</ul>
</li>
</ul>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="guardrails">Guardrails<a href="https://docs.litellm.ai/release_notes/v1-79-1#guardrails" class="hash-link" aria-label="Direct link to Guardrails" title="Direct link to Guardrails">​</a></h4>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails">IBM Guardrails</a></strong></p>
<ul>
<li>Update IBM Guardrails to correctly use SSL Verify argument - <a href="https://github.com/BerriAI/litellm/pull/15975" target="_blank" rel="noopener noreferrer">PR #15975</a></li>
<li>Add additional detail to ibm_guardrails.md documentation - <a href="https://github.com/BerriAI/litellm/pull/15971" target="_blank" rel="noopener noreferrer">PR #15971</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails">Model Armor</a></strong></p>
<ul>
<li>Support <code>during_call</code> mode for Model Armor guardrails - <a href="https://github.com/BerriAI/litellm/pull/15970" target="_blank" rel="noopener noreferrer">PR #15970</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails">Lasso Security</a></strong></p>
<ul>
<li>Upgrade to Lasso API v3 and fix ULID generation - <a href="https://github.com/BerriAI/litellm/pull/15941" target="_blank" rel="noopener noreferrer">PR #15941</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails">PANW Prisma AIRS</a></strong></p>
<ul>
<li>Add per-request profile overrides to PANW Prisma AIRS - <a href="https://github.com/BerriAI/litellm/pull/16069" target="_blank" rel="noopener noreferrer">PR #16069</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails">Grayswan</a></strong></p>
<ul>
<li>Improve Grayswan guardrail documentation - <a href="https://github.com/BerriAI/litellm/pull/15875" target="_blank" rel="noopener noreferrer">PR #15875</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails">Pillar AI</a></strong></p>
<ul>
<li>Graceful degradation when the Pillar service is unavailable from LiteLLM - <a href="https://github.com/BerriAI/litellm/pull/15857" target="_blank" rel="noopener noreferrer">PR #15857</a></li>
</ul>
</li>
<li>
<p><strong>General</strong></p>
<ul>
<li>Ensure Key Guardrails are applied - <a href="https://github.com/BerriAI/litellm/pull/16025" target="_blank" rel="noopener noreferrer">PR #16025</a></li>
</ul>
</li>
</ul>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="prompt-management">Prompt Management<a href="https://docs.litellm.ai/release_notes/v1-79-1#prompt-management" class="hash-link" aria-label="Direct link to Prompt Management" title="Direct link to Prompt Management">​</a></h4>
<ul>
<li><strong><a href="https://docs.litellm.ai/docs/prompt_management">GitLab</a></strong>
<ul>
<li>Add GitlabPromptCache and enable subfolder access - <a href="https://github.com/BerriAI/litellm/pull/15712" target="_blank" rel="noopener noreferrer">PR #15712</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="spend-tracking-budgets-and-rate-limiting">Spend Tracking, Budgets and Rate Limiting<a href="https://docs.litellm.ai/release_notes/v1-79-1#spend-tracking-budgets-and-rate-limiting" class="hash-link" aria-label="Direct link to Spend Tracking, Budgets and Rate Limiting" title="Direct link to Spend Tracking, Budgets and Rate Limiting">​</a></h2>
<ul>
<li>
<p><strong>Cost Tracking</strong></p>
<ul>
<li>Fix spend tracking for OCR/aOCR requests (log <code>pages_processed</code> + recognize <code>OCRResponse</code>) - <a href="https://github.com/BerriAI/litellm/pull/16070" target="_blank" rel="noopener noreferrer">PR #16070</a></li>
</ul>
</li>
<li>
<p><strong>Rate Limiting</strong></p>
<ul>
<li>Add support for Batch API rate limiting - PR 1 adds input-based rate limits - <a href="https://github.com/BerriAI/litellm/pull/16075" target="_blank" rel="noopener noreferrer">PR #16075</a></li>
<li>Handle multiple rate limit types per descriptor and prevent IndexError - <a href="https://github.com/BerriAI/litellm/pull/16039" target="_blank" rel="noopener noreferrer">PR #16039</a></li>
</ul>
</li>
</ul>
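<p>The IndexError fix above comes from descriptors that can carry more than one rate-limit type. The defensive pattern is to iterate whatever limits are present rather than index a fixed position. This is a hypothetical sketch of that pattern, not LiteLLM's actual code; the field names (<code>limits</code>, <code>type</code>, <code>max</code>) are illustrative.</p>

```python
# Sketch: handle descriptors carrying zero, one, or several rate-limit entries
# without assuming a fixed index (illustrates the class of fix in PR #16039).

def over_limit(descriptor: dict, usage: dict) -> list:
    """Return the names of all limit types that current usage exceeds."""
    exceeded = []
    for limit in descriptor.get("limits", []):  # safe when "limits" is absent
        kind, ceiling = limit["type"], limit["max"]
        if usage.get(kind, 0) > ceiling:
            exceeded.append(kind)
    return exceeded

hits = over_limit(
    {"limits": [{"type": "rpm", "max": 10}, {"type": "tpm", "max": 1000}]},
    {"rpm": 12, "tpm": 500},
)
```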
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="mcp-gateway">MCP Gateway<a href="https://docs.litellm.ai/release_notes/v1-79-1#mcp-gateway" class="hash-link" aria-label="Direct link to MCP Gateway" title="Direct link to MCP Gateway">​</a></h2>
<ul>
<li><strong>OAuth</strong>
<ul>
<li>Add support for dynamic client registration - <a href="https://github.com/BerriAI/litellm/pull/15921" target="_blank" rel="noopener noreferrer">PR #15921</a></li>
<li>Respect <code>X-Forwarded-*</code> headers in OAuth endpoints - <a href="https://github.com/BerriAI/litellm/pull/16036" target="_blank" rel="noopener noreferrer">PR #16036</a></li>
</ul>
</li>
</ul>
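<p>When the proxy sits behind a load balancer, OAuth redirect URLs need to be built from the forwarded headers rather than the internal host. A hypothetical helper showing the idea (the header names are the conventional reverse-proxy ones; LiteLLM's actual logic may differ):</p>

```python
# Sketch: reconstruct the externally visible base URL from conventional
# X-Forwarded-* headers, as a reverse proxy sets them. Illustration only.

def external_base_url(request_headers: dict, fallback: str = "http://localhost:4000") -> str:
    proto = request_headers.get("x-forwarded-proto")
    host = request_headers.get("x-forwarded-host")
    if proto and host:
        return f"{proto}://{host}"
    return fallback  # no proxy in front: use the server's own address

url = external_base_url({"x-forwarded-proto": "https", "x-forwarded-host": "llm.example.com"})
```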
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="performance--loadbalancing--reliability-improvements">Performance / Loadbalancing / Reliability improvements<a href="https://docs.litellm.ai/release_notes/v1-79-1#performance--loadbalancing--reliability-improvements" class="hash-link" aria-label="Direct link to Performance / Loadbalancing / Reliability improvements" title="Direct link to Performance / Loadbalancing / Reliability improvements">​</a></h2>
<ul>
<li>
<p><strong>Memory Leak Fixes</strong></p>
<ul>
<li>Fix: prevent httpx DeprecationWarning memory leak in AsyncHTTPHandler - <a href="https://github.com/BerriAI/litellm/pull/16024" target="_blank" rel="noopener noreferrer">PR #16024</a></li>
<li>Fix: resolve memory accumulation caused by Pydantic 2.11+ deprecation warnings - <a href="https://github.com/BerriAI/litellm/pull/16110" target="_blank" rel="noopener noreferrer">PR #16110</a></li>
<li>Fix(apscheduler): prevent memory leaks from jitter and frequent job intervals - <a href="https://github.com/BerriAI/litellm/pull/15846" target="_blank" rel="noopener noreferrer">PR #15846</a></li>
</ul>
</li>
<li>
<p><strong>Configuration</strong></p>
<ul>
<li>Remove minimum validation for cache control injection index - <a href="https://github.com/BerriAI/litellm/pull/16149" target="_blank" rel="noopener noreferrer">PR #16149</a></li>
<li>Fix prompt_caching.md: wrong prompt_tokens definition - <a href="https://github.com/BerriAI/litellm/pull/16044" target="_blank" rel="noopener noreferrer">PR #16044</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="documentation-updates">Documentation Updates<a href="https://docs.litellm.ai/release_notes/v1-79-1#documentation-updates" class="hash-link" aria-label="Direct link to Documentation Updates" title="Direct link to Documentation Updates">​</a></h2>
<ul>
<li>
<p><strong>Provider Documentation</strong></p>
<ul>
<li>Use custom-llm-provider header in examples - <a href="https://github.com/BerriAI/litellm/pull/16055" target="_blank" rel="noopener noreferrer">PR #16055</a></li>
<li>LiteLLM docs README fixes - <a href="https://github.com/BerriAI/litellm/pull/16107" target="_blank" rel="noopener noreferrer">PR #16107</a></li>
<li>README fixes: add supported providers - <a href="https://github.com/BerriAI/litellm/pull/16109" target="_blank" rel="noopener noreferrer">PR #16109</a></li>
</ul>
</li>
<li>
<p><strong>Model References</strong></p>
<ul>
<li>Add <code>supports_vision</code> field to qwen-vl models in model_prices_and_context_window.json - <a href="https://github.com/BerriAI/litellm/pull/16106" target="_blank" rel="noopener noreferrer">PR #16106</a></li>
</ul>
</li>
<li>
<p><strong>General Documentation</strong></p>
<ul>
<li>Add v1.79.0 release notes docs - <a href="https://github.com/BerriAI/litellm/pull/15936" target="_blank" rel="noopener noreferrer">PR #15936</a></li>
<li>Add minimum resource requirement for production - <a href="https://github.com/BerriAI/litellm/pull/16146" target="_blank" rel="noopener noreferrer">PR #16146</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-contributors">New Contributors<a href="https://docs.litellm.ai/release_notes/v1-79-1#new-contributors" class="hash-link" aria-label="Direct link to New Contributors" title="Direct link to New Contributors">​</a></h2>
<ul>
<li>@RobGeada made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15975" target="_blank" rel="noopener noreferrer">PR #15975</a></li>
<li>@shanto12 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15946" target="_blank" rel="noopener noreferrer">PR #15946</a></li>
<li>@dima-hx430 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15976" target="_blank" rel="noopener noreferrer">PR #15976</a></li>
<li>@m-misiura made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15971" target="_blank" rel="noopener noreferrer">PR #15971</a></li>
<li>@ylgibby made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15947" target="_blank" rel="noopener noreferrer">PR #15947</a></li>
<li>@Somtom made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15909" target="_blank" rel="noopener noreferrer">PR #15909</a></li>
<li>@rodolfo-nobrega made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16023" target="_blank" rel="noopener noreferrer">PR #16023</a></li>
<li>@bernata made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15997" target="_blank" rel="noopener noreferrer">PR #15997</a></li>
<li>@AlbertDeFusco made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15881" target="_blank" rel="noopener noreferrer">PR #15881</a></li>
<li>@komarovd95 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15789" target="_blank" rel="noopener noreferrer">PR #15789</a></li>
<li>@langpingxue made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15635" target="_blank" rel="noopener noreferrer">PR #15635</a></li>
<li>@OrionCodeDev made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16070" target="_blank" rel="noopener noreferrer">PR #16070</a></li>
<li>@sbinnee made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16078" target="_blank" rel="noopener noreferrer">PR #16078</a></li>
<li>@JetoPistola made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16106" target="_blank" rel="noopener noreferrer">PR #16106</a></li>
<li>@gvioss made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16093" target="_blank" rel="noopener noreferrer">PR #16093</a></li>
<li>@pale-aura made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16084" target="_blank" rel="noopener noreferrer">PR #16084</a></li>
<li>@tanvithakur94 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16041" target="_blank" rel="noopener noreferrer">PR #16041</a></li>
<li>@li-boxuan made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16044" target="_blank" rel="noopener noreferrer">PR #16044</a></li>
<li>@1stprinciple made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15938" target="_blank" rel="noopener noreferrer">PR #15938</a></li>
<li>@raghav-stripe made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16137" target="_blank" rel="noopener noreferrer">PR #16137</a></li>
<li>@steve-gore-snapdocs made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/16149" target="_blank" rel="noopener noreferrer">PR #16149</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="full-changelog">Full Changelog<a href="https://docs.litellm.ai/release_notes/v1-79-1#full-changelog" class="hash-link" aria-label="Direct link to Full Changelog" title="Direct link to Full Changelog">​</a></h2>
<p><strong><a href="https://github.com/BerriAI/litellm/compare/v1.79.0-stable...v1.80.0-stable" target="_blank" rel="noopener noreferrer">View complete changelog on GitHub</a></strong></p>]]></content>
        <author>
            <name>Krrish Dholakia</name>
            <uri>https://www.linkedin.com/in/krish-d/</uri>
        </author>
        <author>
            <name>Ishaan Jaff</name>
            <uri>https://www.linkedin.com/in/reffajnaahsi/</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[v1.79.0-stable - Search APIs]]></title>
        <id>https://docs.litellm.ai/release_notes/v1-79-0</id>
        <link href="https://docs.litellm.ai/release_notes/v1-79-0"/>
        <updated>2025-10-26T10:00:00.000Z</updated>
        <summary type="html"><![CDATA[Deploy this version]]></summary>
        <content type="html"><![CDATA[<h2 class="anchor anchorWithStickyNavbar_LWe7" id="deploy-this-version">Deploy this version<a href="https://docs.litellm.ai/release_notes/v1-79-0#deploy-this-version" class="hash-link" aria-label="Direct link to Deploy this version" title="Direct link to Deploy this version">​</a></h2>
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">Docker</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">Pip</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">docker run litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker run \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-e STORE_MODEL_IN_DB=True \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-p 4000:4000 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">docker.litellm.ai/berriai/litellm:v1.79.0-stable</span><br></span></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">pip install litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">pip install litellm==1.79.0</span><br></span></code></pre></div></div></div></div></div>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="major-changes">Major Changes<a href="https://docs.litellm.ai/release_notes/v1-79-0#major-changes" class="hash-link" aria-label="Direct link to Major Changes" title="Direct link to Major Changes">​</a></h2>
<ul>
<li><strong>Cohere models will now be routed to Cohere v2 API by default</strong> - <a href="https://github.com/BerriAI/litellm/pull/15722" target="_blank" rel="noopener noreferrer">PR #15722</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="key-highlights">Key Highlights<a href="https://docs.litellm.ai/release_notes/v1-79-0#key-highlights" class="hash-link" aria-label="Direct link to Key Highlights" title="Direct link to Key Highlights">​</a></h2>
<ul>
<li><strong>Search APIs</strong> - Native <code>/v1/search</code> endpoint with support for Perplexity, Tavily, Parallel AI, Exa AI, DataforSEO, and Google PSE with cost tracking</li>
<li><strong>Vector Stores</strong> - Vertex AI Search API integration as vector store through LiteLLM with passthrough endpoint support</li>
<li><strong>Guardrails Expansion</strong> - Apply guardrails across Responses API, Image Gen, Text completions, Audio transcriptions, Audio Speech, Rerank, and Anthropic Messages API via unified <code>apply_guardrails</code> function</li>
<li><strong>New Guardrail Providers</strong> - Gray Swan, Dynamo AI, IBM Guardrails, Lasso Security v3, and Bedrock Guardrail apply_guardrail endpoint support</li>
<li><strong>Video Generation API</strong> - Native support for OpenAI Sora-2 and Azure Sora-2 (Pro, Pro-High-Res) with cost tracking and logging support</li>
<li><strong>Azure AI Speech (TTS)</strong> - Native Azure AI Speech integration with cost tracking for standard and HD voices</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-models--updated-models">New Models / Updated Models<a href="https://docs.litellm.ai/release_notes/v1-79-0#new-models--updated-models" class="hash-link" aria-label="Direct link to New Models / Updated Models" title="Direct link to New Models / Updated Models">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="new-model-support">New Model Support<a href="https://docs.litellm.ai/release_notes/v1-79-0#new-model-support" class="hash-link" aria-label="Direct link to New Model Support" title="Direct link to New Model Support">​</a></h4>
<table><thead><tr><th>Provider</th><th>Model</th><th>Context Window</th><th>Input ($/1M tokens)</th><th>Output ($/1M tokens)</th><th>Features</th></tr></thead><tbody><tr><td>Bedrock</td><td><code>anthropic.claude-3-7-sonnet-20240620-v1:0</code></td><td>200K</td><td>$3.60</td><td>$18.00</td><td>Chat, reasoning, vision, function calling, prompt caching, computer use</td></tr><tr><td>Bedrock GovCloud</td><td><code>us-gov-west-1/anthropic.claude-3-7-sonnet-20250219-v1:0</code></td><td>200K</td><td>$3.60</td><td>$18.00</td><td>Chat, reasoning, vision, function calling, prompt caching, computer use</td></tr><tr><td>Vertex AI</td><td><code>mistral-medium-3</code></td><td>128K</td><td>$0.40</td><td>$2.00</td><td>Chat, function calling, tool choice</td></tr><tr><td>Vertex AI</td><td><code>codestral-2</code></td><td>128K</td><td>$0.30</td><td>$0.90</td><td>Chat, function calling, tool choice</td></tr><tr><td>Bedrock</td><td><code>amazon.titan-image-generator-v1</code></td><td>-</td><td>-</td><td>-</td><td>Image generation - $0.008/image, $0.01/premium image</td></tr><tr><td>Bedrock</td><td><code>amazon.titan-image-generator-v2</code></td><td>-</td><td>-</td><td>-</td><td>Image generation - $0.008/image, $0.01/premium image</td></tr><tr><td>OpenAI</td><td><code>sora-2</code></td><td>-</td><td>-</td><td>-</td><td>Video generation - $0.10/video/second</td></tr><tr><td>Azure</td><td><code>sora-2</code></td><td>-</td><td>-</td><td>-</td><td>Video generation - $0.10/video/second</td></tr><tr><td>Azure</td><td><code>sora-2-pro</code></td><td>-</td><td>-</td><td>-</td><td>Video generation - $0.30/video/second</td></tr><tr><td>Azure</td><td><code>sora-2-pro-high-res</code></td><td>-</td><td>-</td><td>-</td><td>Video generation - $0.50/video/second</td></tr></tbody></table>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features">Features<a href="https://docs.litellm.ai/release_notes/v1-79-0#features" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/anthropic">Anthropic</a></strong></p>
<ul>
<li>Fix cache_control incorrectly applied to all content items instead of last item only - <a href="https://github.com/BerriAI/litellm/pull/15699" target="_blank" rel="noopener noreferrer">PR #15699</a></li>
<li>Forward anthropic-beta headers to Bedrock, VertexAI - <a href="https://github.com/BerriAI/litellm/pull/15700" target="_blank" rel="noopener noreferrer">PR #15700</a></li>
<li>Change <code>max_tokens</code> value to match <code>max_output_tokens</code> for Claude Sonnet - <a href="https://github.com/BerriAI/litellm/pull/15715" target="_blank" rel="noopener noreferrer">PR #15715</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/bedrock">Bedrock</a></strong></p>
<ul>
<li>Add AWS us-gov-west-1 Claude 3.7 Sonnet costs - <a href="https://github.com/BerriAI/litellm/pull/15775" target="_blank" rel="noopener noreferrer">PR #15775</a></li>
<li>Fix the date for Sonnet 3.7 in GovCloud - <a href="https://github.com/BerriAI/litellm/pull/15800" target="_blank" rel="noopener noreferrer">PR #15800</a></li>
<li>Use proper bedrock model name in health check - <a href="https://github.com/BerriAI/litellm/pull/15808" target="_blank" rel="noopener noreferrer">PR #15808</a></li>
<li>Support <code>embeddings_by_type</code> response format in Bedrock Cohere Embed v1 - <a href="https://github.com/BerriAI/litellm/pull/15707" target="_blank" rel="noopener noreferrer">PR #15707</a></li>
<li>Add Titan image generation with cost tracking - <a href="https://github.com/BerriAI/litellm/pull/15916" target="_blank" rel="noopener noreferrer">PR #15916</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/gemini">Gemini</a></strong></p>
<ul>
<li>Add imageConfig parameter for gemini-2.5-flash-image - <a href="https://github.com/BerriAI/litellm/pull/15530" target="_blank" rel="noopener noreferrer">PR #15530</a></li>
<li>Replace deprecated gemini-1.5-pro-preview-0514 - <a href="https://github.com/BerriAI/litellm/pull/15852" target="_blank" rel="noopener noreferrer">PR #15852</a></li>
<li>Update vertex ai gemini costs - <a href="https://github.com/BerriAI/litellm/pull/15911" target="_blank" rel="noopener noreferrer">PR #15911</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/ollama">Ollama</a></strong></p>
<ul>
<li>Set 'think' to False when reasoning effort is minimal/none/disable - <a href="https://github.com/BerriAI/litellm/pull/15763" target="_blank" rel="noopener noreferrer">PR #15763</a></li>
<li>Handle Ollama chunk parsing error - <a href="https://github.com/BerriAI/litellm/pull/15717" target="_blank" rel="noopener noreferrer">PR #15717</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/vertex">Vertex AI</a></strong></p>
<ul>
<li>Add Mistral Medium 3 and Codestral 2 on Vertex AI - <a href="https://github.com/BerriAI/litellm/pull/15887" target="_blank" rel="noopener noreferrer">PR #15887</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/databricks">Databricks</a></strong></p>
<ul>
<li>Allow prompt caching to be used for Anthropic Claude on Databricks - <a href="https://github.com/BerriAI/litellm/pull/15801" target="_blank" rel="noopener noreferrer">PR #15801</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/azure">Azure</a></strong></p>
<ul>
<li>Add Azure AVA TTS integration - <a href="https://github.com/BerriAI/litellm/pull/15749" target="_blank" rel="noopener noreferrer">PR #15749</a></li>
<li>Add Azure AVA (Speech AI) Cost Tracking - <a href="https://github.com/BerriAI/litellm/pull/15754" target="_blank" rel="noopener noreferrer">PR #15754</a></li>
<li>Azure AI Speech - Ensure <code>voice</code> is mapped from request body to SSML body, allow sending <code>role</code> and <code>style</code> - <a href="https://github.com/BerriAI/litellm/pull/15810" target="_blank" rel="noopener noreferrer">PR #15810</a></li>
<li>Add Azure support for video generation functionality (Sora-2) - <a href="https://github.com/BerriAI/litellm/pull/15901" target="_blank" rel="noopener noreferrer">PR #15901</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/openai">OpenAI</a></strong></p>
<ul>
<li>OpenAI videos refactoring - <a href="https://github.com/BerriAI/litellm/pull/15900" target="_blank" rel="noopener noreferrer">PR #15900</a></li>
</ul>
</li>
<li>
<p><strong>General</strong></p>
<ul>
<li>Read from custom-llm-provider header - <a href="https://github.com/BerriAI/litellm/pull/15528" target="_blank" rel="noopener noreferrer">PR #15528</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="llm-api-endpoints">LLM API Endpoints<a href="https://docs.litellm.ai/release_notes/v1-79-0#llm-api-endpoints" class="hash-link" aria-label="Direct link to LLM API Endpoints" title="Direct link to LLM API Endpoints">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-1">Features<a href="https://docs.litellm.ai/release_notes/v1-79-0#features-1" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/response_api">Responses API</a></strong></p>
<ul>
<li>Add gpt 4.1 pricing for response endpoint - <a href="https://github.com/BerriAI/litellm/pull/15593" target="_blank" rel="noopener noreferrer">PR #15593</a></li>
<li>Fix incorrect status value in Responses API with Gemini - <a href="https://github.com/BerriAI/litellm/pull/15753" target="_blank" rel="noopener noreferrer">PR #15753</a></li>
<li>Simplify reasoning item handling for gpt-5-codex - <a href="https://github.com/BerriAI/litellm/pull/15815" target="_blank" rel="noopener noreferrer">PR #15815</a></li>
<li>Fix ErrorEvent ValidationError when the OpenAI Responses API returns a nested error structure - <a href="https://github.com/BerriAI/litellm/pull/15804" target="_blank" rel="noopener noreferrer">PR #15804</a></li>
<li>Fix reasoning item ID auto-generation causing encrypted content verification errors - <a href="https://github.com/BerriAI/litellm/pull/15782" target="_blank" rel="noopener noreferrer">PR #15782</a></li>
<li>Support tags in metadata - <a href="https://github.com/BerriAI/litellm/pull/15867" target="_blank" rel="noopener noreferrer">PR #15867</a></li>
<li>Security: prevent User A from retrieving User B's response, if response.id is leaked - <a href="https://github.com/BerriAI/litellm/pull/15757" target="_blank" rel="noopener noreferrer">PR #15757</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/batch_api">Batch API</a></strong></p>
<ul>
<li>Add pre and post call for list batches - <a href="https://github.com/BerriAI/litellm/pull/15673" target="_blank" rel="noopener noreferrer">PR #15673</a></li>
<li>Add function responsible for calling precall - <a href="https://github.com/BerriAI/litellm/pull/15636" target="_blank" rel="noopener noreferrer">PR #15636</a></li>
<li>Fix "User default_user_id does not have access to the object" when object not in db - <a href="https://github.com/BerriAI/litellm/pull/15873" target="_blank" rel="noopener noreferrer">PR #15873</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/ocr">OCR API</a></strong></p>
<ul>
<li>Add Azure AI - OCR to docs - <a href="https://github.com/BerriAI/litellm/pull/15768" target="_blank" rel="noopener noreferrer">PR #15768</a></li>
<li>Add mode + Health check support for OCR models - <a href="https://github.com/BerriAI/litellm/pull/15767" target="_blank" rel="noopener noreferrer">PR #15767</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/search_api">Search API</a></strong></p>
<ul>
<li>Add def search() APIs for Web Search - Perplexity API - <a href="https://github.com/BerriAI/litellm/pull/15769" target="_blank" rel="noopener noreferrer">PR #15769</a></li>
<li>Add Tavily Search API - <a href="https://github.com/BerriAI/litellm/pull/15770" target="_blank" rel="noopener noreferrer">PR #15770</a></li>
<li>Add Parallel AI - Search API - <a href="https://github.com/BerriAI/litellm/pull/15772" target="_blank" rel="noopener noreferrer">PR #15772</a></li>
<li>Add EXA AI Search API to LiteLLM - <a href="https://github.com/BerriAI/litellm/pull/15774" target="_blank" rel="noopener noreferrer">PR #15774</a></li>
<li>Add /search endpoint on LiteLLM Gateway - <a href="https://github.com/BerriAI/litellm/pull/15780" target="_blank" rel="noopener noreferrer">PR #15780</a></li>
<li>Add DataforSEO Search API - <a href="https://github.com/BerriAI/litellm/pull/15817" target="_blank" rel="noopener noreferrer">PR #15817</a></li>
<li>Add Google PSE Search Provider - <a href="https://github.com/BerriAI/litellm/pull/15816" target="_blank" rel="noopener noreferrer">PR #15816</a></li>
<li>Add cost tracking for Search API requests - Google PSE, Tavily, Parallel AI, Exa AI - <a href="https://github.com/BerriAI/litellm/pull/15821" target="_blank" rel="noopener noreferrer">PR #15821</a></li>
<li>Backend: Allow storing configured Search APIs in DB - <a href="https://github.com/BerriAI/litellm/pull/15862" target="_blank" rel="noopener noreferrer">PR #15862</a></li>
<li>Exa Search API - ensure request params are sent to Exa AI - <a href="https://github.com/BerriAI/litellm/pull/15855" target="_blank" rel="noopener noreferrer">PR #15855</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/vector_stores">Vector Stores</a></strong></p>
<ul>
<li>Support Vertex AI Search API as vector store through LiteLLM - <a href="https://github.com/BerriAI/litellm/pull/15781" target="_blank" rel="noopener noreferrer">PR #15781</a></li>
<li>Azure AI - Search Vector Stores - <a href="https://github.com/BerriAI/litellm/pull/15873" target="_blank" rel="noopener noreferrer">PR #15873</a></li>
<li>VertexAI Search Vector Store - Passthrough endpoint support + Vector store search Cost tracking support - <a href="https://github.com/BerriAI/litellm/pull/15824" target="_blank" rel="noopener noreferrer">PR #15824</a></li>
<li>Don't raise error if managed object is not found - <a href="https://github.com/BerriAI/litellm/pull/15873" target="_blank" rel="noopener noreferrer">PR #15873</a></li>
<li>Show config.yaml vector stores on UI - <a href="https://github.com/BerriAI/litellm/pull/15873" target="_blank" rel="noopener noreferrer">PR #15873</a></li>
<li>Cost tracking for search spend - <a href="https://github.com/BerriAI/litellm/pull/15859" target="_blank" rel="noopener noreferrer">PR #15859</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/image_generation">Images API</a></strong></p>
<ul>
<li>Pass user-defined headers and extra_headers to image-edit calls - <a href="https://github.com/BerriAI/litellm/pull/15811" target="_blank" rel="noopener noreferrer">PR #15811</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/video_generation">Video Generation API</a></strong></p>
<ul>
<li>Add Azure support for video generation functionality (Sora-2, Sora-2-Pro, Sora-2-Pro-High-Res) - <a href="https://github.com/BerriAI/litellm/pull/15901" target="_blank" rel="noopener noreferrer">PR #15901</a></li>
<li>OpenAI video generation refactoring (Sora-2) - <a href="https://github.com/BerriAI/litellm/pull/15900" target="_blank" rel="noopener noreferrer">PR #15900</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/bedrock_invoke">Bedrock /invoke</a></strong></p>
<ul>
<li>Fix: Hooks broken on /bedrock passthrough due to missing metadata - <a href="https://github.com/BerriAI/litellm/pull/15849" target="_blank" rel="noopener noreferrer">PR #15849</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/realtime_api">Realtime API</a></strong></p>
<ul>
<li>Fix: OpenAI Realtime API integration fails due to websockets.exceptions.PayloadTooBig error - <a href="https://github.com/BerriAI/litellm/pull/15751" target="_blank" rel="noopener noreferrer">PR #15751</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="management-endpoints--ui">Management Endpoints / UI<a href="https://docs.litellm.ai/release_notes/v1-79-0#management-endpoints--ui" class="hash-link" aria-label="Direct link to Management Endpoints / UI" title="Direct link to Management Endpoints / UI">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-2">Features<a href="https://docs.litellm.ai/release_notes/v1-79-0#features-2" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong>Passthrough</strong></p>
<ul>
<li>Set auth on passthrough endpoints on the UI - <a href="https://github.com/BerriAI/litellm/pull/15778" target="_blank" rel="noopener noreferrer">PR #15778</a></li>
<li>Fix pass-through endpoint budget enforcement bug - <a href="https://github.com/BerriAI/litellm/pull/15805" target="_blank" rel="noopener noreferrer">PR #15805</a></li>
</ul>
</li>
<li>
<p><strong>Organizations</strong></p>
<ul>
<li>Allow org admins to create teams on UI - <a href="https://github.com/BerriAI/litellm/pull/15924" target="_blank" rel="noopener noreferrer">PR #15924</a></li>
</ul>
</li>
<li>
<p><strong>Search Tools</strong></p>
<ul>
<li>UI - Search Tools: allow adding search tools on the UI + testing search - <a href="https://github.com/BerriAI/litellm/pull/15871" target="_blank" rel="noopener noreferrer">PR #15871</a></li>
<li>UI - Add logos for search providers - <a href="https://github.com/BerriAI/litellm/pull/15872" target="_blank" rel="noopener noreferrer">PR #15872</a></li>
</ul>
</li>
<li>
<p><strong>General</strong></p>
<ul>
<li>Fix routing for custom server root path - <a href="https://github.com/BerriAI/litellm/pull/15701" target="_blank" rel="noopener noreferrer">PR #15701</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="logging--guardrail--prompt-management-integrations">Logging / Guardrail / Prompt Management Integrations<a href="https://docs.litellm.ai/release_notes/v1-79-0#logging--guardrail--prompt-management-integrations" class="hash-link" aria-label="Direct link to Logging / Guardrail / Prompt Management Integrations" title="Direct link to Logging / Guardrail / Prompt Management Integrations">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-3">Features<a href="https://docs.litellm.ai/release_notes/v1-79-0#features-3" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/logging#opentelemetry">OpenTelemetry</a></strong></p>
<ul>
<li>Fix OpenTelemetry Logging functionality - <a href="https://github.com/BerriAI/litellm/pull/15645" target="_blank" rel="noopener noreferrer">PR #15645</a></li>
<li>Fix issue where headers were not being split correctly - <a href="https://github.com/BerriAI/litellm/pull/15916" target="_blank" rel="noopener noreferrer">PR #15916</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/logging#sentry">Sentry</a></strong></p>
<ul>
<li>Add SENTRY_ENVIRONMENT configuration for Sentry integration - <a href="https://github.com/BerriAI/litellm/pull/15760" target="_blank" rel="noopener noreferrer">PR #15760</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/logging#helicone">Helicone</a></strong></p>
<ul>
<li>Fix JSON serialization error in Helicone logging by removing OpenTelemetry span from metadata - <a href="https://github.com/BerriAI/litellm/pull/15728" target="_blank" rel="noopener noreferrer">PR #15728</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/logging#mlflow">MLFlow</a></strong></p>
<ul>
<li>Fix MLFlow tags - split request_tags into (key, val) if request_tag has colon - <a href="https://github.com/BerriAI/litellm/pull/15914" target="_blank" rel="noopener noreferrer">PR #15914</a></li>
</ul>
</li>
<li>
<p><strong>General</strong></p>
<ul>
<li>Rename configured_cold_storage_logger to cold_storage_custom_logger - <a href="https://github.com/BerriAI/litellm/pull/15798" target="_blank" rel="noopener noreferrer">PR #15798</a></li>
</ul>
</li>
</ul>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="guardrails">Guardrails<a href="https://docs.litellm.ai/release_notes/v1-79-0#guardrails" class="hash-link" aria-label="Direct link to Guardrails" title="Direct link to Guardrails">​</a></h4>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails">Gray Swan</a></strong></p>
<ul>
<li>Add GraySwan Guardrails support - <a href="https://github.com/BerriAI/litellm/pull/15756" target="_blank" rel="noopener noreferrer">PR #15756</a></li>
<li>Rename GraySwan to Gray Swan - <a href="https://github.com/BerriAI/litellm/pull/15771" target="_blank" rel="noopener noreferrer">PR #15771</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails">Dynamo AI</a></strong></p>
<ul>
<li>New Guardrail - Dynamo AI Guardrail - <a href="https://github.com/BerriAI/litellm/pull/15920" target="_blank" rel="noopener noreferrer">PR #15920</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails">IBM Guardrails</a></strong></p>
<ul>
<li>IBM Guardrails integration - <a href="https://github.com/BerriAI/litellm/pull/15924" target="_blank" rel="noopener noreferrer">PR #15924</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails">Lasso Security</a></strong></p>
<ul>
<li>Add v3 API Support - <a href="https://github.com/BerriAI/litellm/pull/12452" target="_blank" rel="noopener noreferrer">PR #12452</a></li>
<li>Fix Lasso import config and Redis cluster hash tags for test keys - <a href="https://github.com/BerriAI/litellm/pull/15917" target="_blank" rel="noopener noreferrer">PR #15917</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails">Bedrock Guardrails</a></strong></p>
<ul>
<li>Implement Bedrock Guardrail apply_guardrail endpoint support - <a href="https://github.com/BerriAI/litellm/pull/15892" target="_blank" rel="noopener noreferrer">PR #15892</a></li>
</ul>
</li>
<li>
<p><strong>General</strong></p>
<ul>
<li>Guardrails - Responses API, Image Gen, Text completions, Audio transcriptions, Audio Speech, Rerank, Anthropic Messages API support via the unified <code>apply_guardrails</code> function - <a href="https://github.com/BerriAI/litellm/pull/15706" target="_blank" rel="noopener noreferrer">PR #15706</a></li>
</ul>
</li>
</ul>
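<p>The "General" item above routes many endpoint types (chat, image generation, rerank, audio, etc.) through one guardrail pass. The sketch below illustrates that pattern only; the function and guardrail names are hypothetical and are not LiteLLM's actual API.</p>

```python
from typing import Callable

# A guardrail takes the extracted text and returns True if the content
# is allowed. Names here are illustrative, not LiteLLM internals.
GuardrailFn = Callable[[str], bool]

def mask_pii(text: str) -> bool:
    # Toy check: block anything that looks like it carries an SSN label.
    return "ssn:" not in text.lower()

def apply_guardrails(texts: list[str], guardrails: list[GuardrailFn]) -> None:
    # Each endpoint type extracts its request strings, then every
    # configured guardrail runs over every string; a block raises.
    for text in texts:
        for guard in guardrails:
            if not guard(text):
                raise ValueError(f"guardrail {guard.__name__} blocked request")

# Chat, image-gen, rerank, etc. would all funnel their strings here:
apply_guardrails(["draw a cat"], [mask_pii])  # passes silently
```

The value of the unified shape is that adding a new endpoint type only requires extracting its text, not re-implementing each guardrail integration.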
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="spend-tracking-budgets-and-rate-limiting">Spend Tracking, Budgets and Rate Limiting<a href="https://docs.litellm.ai/release_notes/v1-79-0#spend-tracking-budgets-and-rate-limiting" class="hash-link" aria-label="Direct link to Spend Tracking, Budgets and Rate Limiting" title="Direct link to Spend Tracking, Budgets and Rate Limiting">​</a></h2>
<ul>
<li><strong>Rate Limiting</strong>
<ul>
<li>Support absolute RPM/TPM in priority_reservation - <a href="https://github.com/BerriAI/litellm/pull/15813" target="_blank" rel="noopener noreferrer">PR #15813</a></li>
<li>Org level tpm/rpm limits + Team tpm/rpm validation when assigned to org - <a href="https://github.com/BerriAI/litellm/pull/15549" target="_blank" rel="noopener noreferrer">PR #15549</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="mcp-gateway">MCP Gateway<a href="https://docs.litellm.ai/release_notes/v1-79-0#mcp-gateway" class="hash-link" aria-label="Direct link to MCP Gateway" title="Direct link to MCP Gateway">​</a></h2>
<ul>
<li><strong>OAuth</strong>
<ul>
<li>Auth Header Fix for MCP Tool Call - <a href="https://github.com/BerriAI/litellm/pull/15736" target="_blank" rel="noopener noreferrer">PR #15736</a></li>
<li>Add response_type + PKCE parameters to OAuth authorization endpoint - <a href="https://github.com/BerriAI/litellm/pull/15720" target="_blank" rel="noopener noreferrer">PR #15720</a></li>
</ul>
</li>
</ul>
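<p>The PKCE parameters added to the OAuth authorization endpoint follow RFC 7636: the client sends a <code>code_challenge</code> derived from a random <code>code_verifier</code>, and later proves possession of the verifier at the token endpoint. A self-contained sketch of generating the S256 pair (parameter handling around it is illustrative, not LiteLLM's code):</p>

```python
import base64
import hashlib
import secrets

def make_pkce_pair() -> tuple[str, str]:
    """Generate an RFC 7636 code_verifier / code_challenge pair (S256)."""
    # 32 random bytes -> 43-char base64url verifier (padding stripped).
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    digest = hashlib.sha256(verifier.encode()).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return verifier, challenge

verifier, challenge = make_pkce_pair()
# The authorization request carries response_type plus the challenge;
# the later token request sends the raw verifier for the server to check.
auth_params = {
    "response_type": "code",
    "code_challenge": challenge,
    "code_challenge_method": "S256",
}
```

The server recomputes SHA-256 over the submitted verifier and compares it to the stored challenge, which prevents an intercepted authorization code from being redeemed by another client.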
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="performance--loadbalancing--reliability-improvements">Performance / Loadbalancing / Reliability improvements<a href="https://docs.litellm.ai/release_notes/v1-79-0#performance--loadbalancing--reliability-improvements" class="hash-link" aria-label="Direct link to Performance / Loadbalancing / Reliability improvements" title="Direct link to Performance / Loadbalancing / Reliability improvements">​</a></h2>
<ul>
<li>
<p><strong>Database</strong></p>
<ul>
<li>Minimize the occurrence of deadlocks - <a href="https://github.com/BerriAI/litellm/pull/15281" target="_blank" rel="noopener noreferrer">PR #15281</a></li>
</ul>
</li>
<li>
<p><strong>Redis</strong></p>
<ul>
<li>Apply max_connections configuration to Redis async client - <a href="https://github.com/BerriAI/litellm/pull/15797" target="_blank" rel="noopener noreferrer">PR #15797</a></li>
</ul>
</li>
<li>
<p><strong>Caching</strong></p>
<ul>
<li>Add documentation for <code>enable_caching_on_provider_specific_optional_params</code> setting - <a href="https://github.com/BerriAI/litellm/pull/15885" target="_blank" rel="noopener noreferrer">PR #15885</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="documentation-updates">Documentation Updates<a href="https://docs.litellm.ai/release_notes/v1-79-0#documentation-updates" class="hash-link" aria-label="Direct link to Documentation Updates" title="Direct link to Documentation Updates">​</a></h2>
<ul>
<li><strong>Provider Documentation</strong>
<ul>
<li>Update worker recommendation - <a href="https://github.com/BerriAI/litellm/pull/15702" target="_blank" rel="noopener noreferrer">PR #15702</a></li>
<li>Fix incorrect request body in JSON mode doc - <a href="https://github.com/BerriAI/litellm/pull/15729" target="_blank" rel="noopener noreferrer">PR #15729</a></li>
<li>Add details in docs - <a href="https://github.com/BerriAI/litellm/pull/15721" target="_blank" rel="noopener noreferrer">PR #15721</a></li>
<li>Add Responses API to OpenAI docs - <a href="https://github.com/BerriAI/litellm/pull/15866" target="_blank" rel="noopener noreferrer">PR #15866</a></li>
<li>Add OpenAI Responses API - <a href="https://github.com/BerriAI/litellm/pull/15868" target="_blank" rel="noopener noreferrer">PR #15868</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-contributors">New Contributors<a href="https://docs.litellm.ai/release_notes/v1-79-0#new-contributors" class="hash-link" aria-label="Direct link to New Contributors" title="Direct link to New Contributors">​</a></h2>
<ul>
<li>@tlecomte made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15528" target="_blank" rel="noopener noreferrer">PR #15528</a></li>
<li>@tomhaynes made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15645" target="_blank" rel="noopener noreferrer">PR #15645</a></li>
<li>@talalryz made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15720" target="_blank" rel="noopener noreferrer">PR #15720</a></li>
<li>@1vinodsingh1 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15736" target="_blank" rel="noopener noreferrer">PR #15736</a></li>
<li>@nuernber made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15775" target="_blank" rel="noopener noreferrer">PR #15775</a></li>
<li>@Thomas-Mildner made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15760" target="_blank" rel="noopener noreferrer">PR #15760</a></li>
<li>@javiergarciapleo made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15721" target="_blank" rel="noopener noreferrer">PR #15721</a></li>
<li>@lshgdut made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15717" target="_blank" rel="noopener noreferrer">PR #15717</a></li>
<li>@kk-wangjifeng made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15530" target="_blank" rel="noopener noreferrer">PR #15530</a></li>
<li>@anthonyivn2 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15801" target="_blank" rel="noopener noreferrer">PR #15801</a></li>
<li>@romanglo made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15707" target="_blank" rel="noopener noreferrer">PR #15707</a></li>
<li>@mythral made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15859" target="_blank" rel="noopener noreferrer">PR #15859</a></li>
<li>@mubashirosmani made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15866" target="_blank" rel="noopener noreferrer">PR #15866</a></li>
<li>@CAFxX made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15281" target="_blank" rel="noopener noreferrer">PR #15281</a></li>
<li>@reflection made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15914" target="_blank" rel="noopener noreferrer">PR #15914</a></li>
<li>@shadielfares made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15917" target="_blank" rel="noopener noreferrer">PR #15917</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="pr-count-summary">PR Count Summary<a href="https://docs.litellm.ai/release_notes/v1-79-0#pr-count-summary" class="hash-link" aria-label="Direct link to PR Count Summary" title="Direct link to PR Count Summary">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="10262025">10/26/2025<a href="https://docs.litellm.ai/release_notes/v1-79-0#10262025" class="hash-link" aria-label="Direct link to 10/26/2025" title="Direct link to 10/26/2025">​</a></h3>
<ul>
<li>New Models / Updated Models: 20</li>
<li>LLM API Endpoints: 29</li>
<li>Management Endpoints / UI: 5</li>
<li>Logging / Guardrail / Prompt Management Integrations: 10</li>
<li>Spend Tracking, Budgets and Rate Limiting: 2</li>
<li>MCP Gateway: 2</li>
<li>Performance / Loadbalancing / Reliability improvements: 3</li>
<li>Documentation Updates: 5</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="full-changelog">Full Changelog<a href="https://docs.litellm.ai/release_notes/v1-79-0#full-changelog" class="hash-link" aria-label="Direct link to Full Changelog" title="Direct link to Full Changelog">​</a></h2>
<p><strong><a href="https://github.com/BerriAI/litellm/compare/v1.78.5-stable...v1.79.0-stable" target="_blank" rel="noopener noreferrer">View complete changelog on GitHub</a></strong></p>]]></content>
        <author>
            <name>Krrish Dholakia</name>
            <uri>https://www.linkedin.com/in/krish-d/</uri>
        </author>
        <author>
            <name>Ishaan Jaff</name>
            <uri>https://www.linkedin.com/in/reffajnaahsi/</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[v1.78.5-stable - Native OCR Support]]></title>
        <id>https://docs.litellm.ai/release_notes/v1-78-5</id>
        <link href="https://docs.litellm.ai/release_notes/v1-78-5"/>
        <updated>2025-10-18T10:00:00.000Z</updated>
        <summary type="html"><![CDATA[Deploy this version]]></summary>
        <content type="html"><![CDATA[<h2 class="anchor anchorWithStickyNavbar_LWe7" id="deploy-this-version">Deploy this version<a href="https://docs.litellm.ai/release_notes/v1-78-5#deploy-this-version" class="hash-link" aria-label="Direct link to Deploy this version" title="Direct link to Deploy this version">​</a></h2>
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">Docker</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">Pip</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">docker run litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker run \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-e STORE_MODEL_IN_DB=True \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-p 4000:4000 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">docker.litellm.ai/berriai/litellm:v1.78.5-stable</span><br></span></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">pip install litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">pip install litellm==1.78.5</span><br></span></code></pre></div></div></div></div></div>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="key-highlights">Key Highlights<a href="https://docs.litellm.ai/release_notes/v1-78-5#key-highlights" class="hash-link" aria-label="Direct link to Key Highlights" title="Direct link to Key Highlights">​</a></h2>
<ul>
<li><strong>Native OCR Endpoints</strong> - Native <code>/v1/ocr</code> endpoint support with cost tracking for Mistral OCR and Azure AI OCR</li>
<li><strong>Global Vendor Discounts</strong> - Specify global vendor discount percentages for accurate cost tracking and reporting</li>
<li><strong>Team Spending Reports</strong> - Team admins can now export detailed spending reports for their teams</li>
<li><strong>Claude Haiku 4.5</strong> - Day 0 support for Claude Haiku 4.5 across Bedrock, Vertex AI, and OpenRouter with 200K context window</li>
<li><strong>GPT-5-Codex</strong> - Support for GPT-5-Codex via Responses API on OpenAI and Azure</li>
<li><strong>Performance Improvements</strong> - Major router optimizations: O(1) model lookups, 10-100x faster shallow copy, 30-40% faster timing calls, and O(n) to O(1) hash generation</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-models--updated-models">New Models / Updated Models<a href="https://docs.litellm.ai/release_notes/v1-78-5#new-models--updated-models" class="hash-link" aria-label="Direct link to New Models / Updated Models" title="Direct link to New Models / Updated Models">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="new-model-support">New Model Support<a href="https://docs.litellm.ai/release_notes/v1-78-5#new-model-support" class="hash-link" aria-label="Direct link to New Model Support" title="Direct link to New Model Support">​</a></h4>
<table><thead><tr><th>Provider</th><th>Model</th><th>Context Window</th><th>Input ($/1M tokens)</th><th>Output ($/1M tokens)</th><th>Features</th></tr></thead><tbody><tr><td>Anthropic</td><td><code>claude-haiku-4-5</code></td><td>200K</td><td>$1.00</td><td>$5.00</td><td>Chat, reasoning, vision, function calling, prompt caching, computer use</td></tr><tr><td>Anthropic</td><td><code>claude-haiku-4-5-20251001</code></td><td>200K</td><td>$1.00</td><td>$5.00</td><td>Chat, reasoning, vision, function calling, prompt caching, computer use</td></tr><tr><td>Bedrock</td><td><code>anthropic.claude-haiku-4-5-20251001-v1:0</code></td><td>200K</td><td>$1.00</td><td>$5.00</td><td>Chat, reasoning, vision, function calling, prompt caching</td></tr><tr><td>Bedrock</td><td><code>global.anthropic.claude-haiku-4-5-20251001-v1:0</code></td><td>200K</td><td>$1.00</td><td>$5.00</td><td>Chat, reasoning, vision, function calling, prompt caching</td></tr><tr><td>Bedrock</td><td><code>jp.anthropic.claude-haiku-4-5-20251001-v1:0</code></td><td>200K</td><td>$1.10</td><td>$5.50</td><td>Chat, reasoning, vision, function calling, prompt caching (JP Cross-Region)</td></tr><tr><td>Bedrock</td><td><code>us.anthropic.claude-haiku-4-5-20251001-v1:0</code></td><td>200K</td><td>$1.10</td><td>$5.50</td><td>Chat, reasoning, vision, function calling, prompt caching (US region)</td></tr><tr><td>Bedrock</td><td><code>eu.anthropic.claude-haiku-4-5-20251001-v1:0</code></td><td>200K</td><td>$1.10</td><td>$5.50</td><td>Chat, reasoning, vision, function calling, prompt caching (EU region)</td></tr><tr><td>Bedrock</td><td><code>apac.anthropic.claude-haiku-4-5-20251001-v1:0</code></td><td>200K</td><td>$1.10</td><td>$5.50</td><td>Chat, reasoning, vision, function calling, prompt caching (APAC region)</td></tr><tr><td>Bedrock</td><td><code>au.anthropic.claude-haiku-4-5-20251001-v1:0</code></td><td>200K</td><td>$1.10</td><td>$5.50</td><td>Chat, reasoning, vision, function calling, prompt caching (AU 
region)</td></tr><tr><td>Vertex AI</td><td><code>vertex_ai/claude-haiku-4-5@20251001</code></td><td>200K</td><td>$1.00</td><td>$5.00</td><td>Chat, reasoning, vision, function calling, prompt caching</td></tr><tr><td>OpenAI</td><td><code>gpt-5</code></td><td>272K</td><td>$1.25</td><td>$10.00</td><td>Chat, responses API, reasoning, vision, function calling, prompt caching</td></tr><tr><td>OpenAI</td><td><code>gpt-5-codex</code></td><td>272K</td><td>$1.25</td><td>$10.00</td><td>Responses API mode</td></tr><tr><td>Azure</td><td><code>azure/gpt-5-codex</code></td><td>272K</td><td>$1.25</td><td>$10.00</td><td>Responses API mode</td></tr><tr><td>Gemini</td><td><code>gemini-2.5-flash-image</code></td><td>32K</td><td>$0.30</td><td>$2.50</td><td>Image generation (GA - Nano Banana) - $0.039/image</td></tr><tr><td>ZhipuAI</td><td><code>glm-4.6</code></td><td>-</td><td>-</td><td>-</td><td>Chat completions</td></tr></tbody></table>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features">Features<a href="https://docs.litellm.ai/release_notes/v1-78-5#features" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/openai">OpenAI</a></strong></p>
<ul>
<li>GPT-5 returns reasoning content via <code>/chat/completions</code>; GPT-5-Codex now works with Claude Code - <a href="https://github.com/BerriAI/litellm/pull/15441" target="_blank" rel="noopener noreferrer">PR #15441</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/anthropic">Anthropic</a></strong></p>
<ul>
<li>Reduce claude-4-sonnet max_output_tokens to 64k - <a href="https://github.com/BerriAI/litellm/pull/15409" target="_blank" rel="noopener noreferrer">PR #15409</a></li>
<li>Added claude-haiku-4.5 - <a href="https://github.com/BerriAI/litellm/pull/15579" target="_blank" rel="noopener noreferrer">PR #15579</a></li>
<li>Add support for thinking blocks and redacted thinking blocks in Anthropic v1/messages API - <a href="https://github.com/BerriAI/litellm/pull/15501" target="_blank" rel="noopener noreferrer">PR #15501</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/bedrock">Bedrock</a></strong></p>
<ul>
<li>Add anthropic.claude-haiku-4-5-20251001-v1:0 on Bedrock, VertexAI - <a href="https://github.com/BerriAI/litellm/pull/15581" target="_blank" rel="noopener noreferrer">PR #15581</a></li>
<li>Add Claude Haiku 4.5 support for Bedrock global and US regions - <a href="https://github.com/BerriAI/litellm/pull/15650" target="_blank" rel="noopener noreferrer">PR #15650</a></li>
<li>Add Claude Haiku 4.5 support for Bedrock Other regions - <a href="https://github.com/BerriAI/litellm/pull/15653" target="_blank" rel="noopener noreferrer">PR #15653</a></li>
<li>Add JP Cross-Region Inference jp.anthropic.claude-haiku-4-5-20251001 - <a href="https://github.com/BerriAI/litellm/pull/15598" target="_blank" rel="noopener noreferrer">PR #15598</a></li>
<li>Fix: bedrock-pricing-geo-inregion-cross-region / add Global Cross-Region Inference - <a href="https://github.com/BerriAI/litellm/pull/15685" target="_blank" rel="noopener noreferrer">PR #15685</a></li>
<li>Fix: Support us-gov prefix for AWS GovCloud Bedrock models - <a href="https://github.com/BerriAI/litellm/pull/15626" target="_blank" rel="noopener noreferrer">PR #15626</a></li>
<li>Fix: GPT-OSS on Bedrock now supports native streaming; fake streaming reverted - <a href="https://github.com/BerriAI/litellm/pull/15668" target="_blank" rel="noopener noreferrer">PR #15668</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/gemini">Gemini</a></strong></p>
<ul>
<li>Feat(pricing): Add Gemini 2.5 Flash Image (Nano Banana) in GA - <a href="https://github.com/BerriAI/litellm/pull/15557" target="_blank" rel="noopener noreferrer">PR #15557</a></li>
<li>Fix: Gemini 2.5 Flash Image should not have supports_web_search=true - <a href="https://github.com/BerriAI/litellm/pull/15642" target="_blank" rel="noopener noreferrer">PR #15642</a></li>
<li>Remove penalty params as supported params for gemini preview model - <a href="https://github.com/BerriAI/litellm/pull/15503" target="_blank" rel="noopener noreferrer">PR #15503</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/ollama">Ollama</a></strong></p>
<ul>
<li>Fix(ollama/chat): correctly map reasoning_effort to think in requests - <a href="https://github.com/BerriAI/litellm/pull/15465" target="_blank" rel="noopener noreferrer">PR #15465</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/openrouter">OpenRouter</a></strong></p>
<ul>
<li>Add anthropic/claude-sonnet-4.5 to OpenRouter cost map - <a href="https://github.com/BerriAI/litellm/pull/15472" target="_blank" rel="noopener noreferrer">PR #15472</a></li>
<li>Prompt caching for anthropic models with OpenRouter - <a href="https://github.com/BerriAI/litellm/pull/15535" target="_blank" rel="noopener noreferrer">PR #15535</a></li>
<li>Get completion cost directly from OpenRouter - <a href="https://github.com/BerriAI/litellm/pull/15448" target="_blank" rel="noopener noreferrer">PR #15448</a></li>
<li>Fix OpenRouter Claude Opus 4 model naming - <a href="https://github.com/BerriAI/litellm/pull/15495" target="_blank" rel="noopener noreferrer">PR #15495</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/comet">CometAPI</a></strong></p>
<ul>
<li>Fix(cometapi): improve CometAPI provider support (embeddings, image generation, docs) - <a href="https://github.com/BerriAI/litellm/pull/15591" target="_blank" rel="noopener noreferrer">PR #15591</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/lemonade">Lemonade</a></strong></p>
<ul>
<li>Adding new models to the lemonade provider - <a href="https://github.com/BerriAI/litellm/pull/15554" target="_blank" rel="noopener noreferrer">PR #15554</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/watsonx">Watson X</a></strong></p>
<ul>
<li>Fix (pricing): Fix pricing for watsonx model family for various models - <a href="https://github.com/BerriAI/litellm/pull/15670" target="_blank" rel="noopener noreferrer">PR #15670</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/vercel_ai_gateway">Vercel AI Gateway</a></strong></p>
<ul>
<li>Add glm-4.6 model to pricing configuration - <a href="https://github.com/BerriAI/litellm/pull/15679" target="_blank" rel="noopener noreferrer">PR #15679</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/vertex">Vertex AI</a></strong></p>
<ul>
<li>Add Vertex AI Discovery Engine Rerank Support - <a href="https://github.com/BerriAI/litellm/pull/15532" target="_blank" rel="noopener noreferrer">PR #15532</a></li>
</ul>
</li>
</ul>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="bug-fixes">Bug Fixes<a href="https://docs.litellm.ai/release_notes/v1-78-5#bug-fixes" class="hash-link" aria-label="Direct link to Bug Fixes" title="Direct link to Bug Fixes">​</a></h3>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/anthropic">Anthropic</a></strong></p>
<ul>
<li>Fix: Pricing for Claude Sonnet 4.5 in US regions is 10x too high - <a href="https://github.com/BerriAI/litellm/pull/15374" target="_blank" rel="noopener noreferrer">PR #15374</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/openrouter">OpenRouter</a></strong></p>
<ul>
<li>Update gpt-5-codex entry in the model_price JSON - <a href="https://github.com/BerriAI/litellm/pull/15540" target="_blank" rel="noopener noreferrer">PR #15540</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/bedrock">Bedrock</a></strong></p>
<ul>
<li>Fix filtering headers for signature calcs - <a href="https://github.com/BerriAI/litellm/pull/15590" target="_blank" rel="noopener noreferrer">PR #15590</a></li>
</ul>
</li>
<li>
<p><strong>General</strong></p>
<ul>
<li>Add native reasoning and streaming support flag for gpt-5-codex - <a href="https://github.com/BerriAI/litellm/pull/15569" target="_blank" rel="noopener noreferrer">PR #15569</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="llm-api-endpoints">LLM API Endpoints<a href="https://docs.litellm.ai/release_notes/v1-78-5#llm-api-endpoints" class="hash-link" aria-label="Direct link to LLM API Endpoints" title="Direct link to LLM API Endpoints">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-1">Features<a href="https://docs.litellm.ai/release_notes/v1-78-5#features-1" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/response_api">Responses API</a></strong></p>
<ul>
<li>Responses API - enable calling Anthropic/Gemini models with Responses API streaming in the OpenAI Ruby SDK; DB - sanity-check pending migrations before startup - <a href="https://github.com/BerriAI/litellm/pull/15432" target="_blank" rel="noopener noreferrer">PR #15432</a></li>
<li>Add support for responses mode in health check - <a href="https://github.com/BerriAI/litellm/pull/15658" target="_blank" rel="noopener noreferrer">PR #15658</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/ocr">OCR API</a></strong></p>
<ul>
<li>Feat: Add native litellm.ocr() functions - <a href="https://github.com/BerriAI/litellm/pull/15567" target="_blank" rel="noopener noreferrer">PR #15567</a></li>
<li>Feat: Add /ocr route on LiteLLM AI Gateway - Adds support for native Mistral OCR calling - <a href="https://github.com/BerriAI/litellm/pull/15571" target="_blank" rel="noopener noreferrer">PR #15571</a></li>
<li>Feat: Add Azure AI Mistral OCR Integration - <a href="https://github.com/BerriAI/litellm/pull/15572" target="_blank" rel="noopener noreferrer">PR #15572</a></li>
<li>Feat: Native /ocr endpoint support - <a href="https://github.com/BerriAI/litellm/pull/15573" target="_blank" rel="noopener noreferrer">PR #15573</a></li>
<li>Feat: Add Cost Tracking for /ocr endpoints - <a href="https://github.com/BerriAI/litellm/pull/15678" target="_blank" rel="noopener noreferrer">PR #15678</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/gemini">/generateContent</a></strong></p>
<ul>
<li>Fix: Gemini CLI - add <code>google_routes</code> to <code>llm_api_routes</code> - <a href="https://github.com/BerriAI/litellm/pull/15500" target="_blank" rel="noopener noreferrer">PR #15500</a></li>
<li>Fix Pydantic validation error for citationMetadata.citationSources in Google GenAI responses - <a href="https://github.com/BerriAI/litellm/pull/15592" target="_blank" rel="noopener noreferrer">PR #15592</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/image_generation">Images API</a></strong></p>
<ul>
<li>Fix: Dall-e-2 for Image Edits API - <a href="https://github.com/BerriAI/litellm/pull/15604" target="_blank" rel="noopener noreferrer">PR #15604</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/pass_through/bedrock">Bedrock Passthrough</a></strong></p>
<ul>
<li>Feat: Allow calling /invoke, /converse routes through AI Gateway + models on config.yaml - <a href="https://github.com/BerriAI/litellm/pull/15618" target="_blank" rel="noopener noreferrer">PR #15618</a></li>
</ul>
</li>
</ul>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="bugs">Bugs<a href="https://docs.litellm.ai/release_notes/v1-78-5#bugs" class="hash-link" aria-label="Direct link to Bugs" title="Direct link to Bugs">​</a></h4>
<ul>
<li><strong>General</strong>
<ul>
<li>Fix: Convert object to a correct type - <a href="https://github.com/BerriAI/litellm/pull/15634" target="_blank" rel="noopener noreferrer">PR #15634</a></li>
<li>Bug Fix: Tags as metadata dicts were raising exceptions - <a href="https://github.com/BerriAI/litellm/pull/15625" target="_blank" rel="noopener noreferrer">PR #15625</a></li>
<li>Add type hint to function_to_dict and fix typo - <a href="https://github.com/BerriAI/litellm/pull/15580" target="_blank" rel="noopener noreferrer">PR #15580</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="management-endpoints--ui">Management Endpoints / UI<a href="https://docs.litellm.ai/release_notes/v1-78-5#management-endpoints--ui" class="hash-link" aria-label="Direct link to Management Endpoints / UI" title="Direct link to Management Endpoints / UI">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-2">Features<a href="https://docs.litellm.ai/release_notes/v1-78-5#features-2" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong>Virtual Keys</strong></p>
<ul>
<li>Docs: Key Rotations - <a href="https://github.com/BerriAI/litellm/pull/15455" target="_blank" rel="noopener noreferrer">PR #15455</a></li>
<li>Fix: UI - Key Max Budget Removal Error Fix - <a href="https://github.com/BerriAI/litellm/pull/15672" target="_blank" rel="noopener noreferrer">PR #15672</a></li>
<li>Fix: Key Settings Max Budget Removal Error - <a href="https://github.com/BerriAI/litellm/pull/15669" target="_blank" rel="noopener noreferrer">PR #15669</a></li>
</ul>
</li>
<li>
<p><strong>Teams</strong></p>
<ul>
<li>Feat: Allow Team Admins to export a report of the team spending - <a href="https://github.com/BerriAI/litellm/pull/15542" target="_blank" rel="noopener noreferrer">PR #15542</a></li>
</ul>
</li>
<li>
<p><strong>Passthrough</strong></p>
<ul>
<li>Feat: Passthrough - allow admin to give access to specific passthrough endpoints - <a href="https://github.com/BerriAI/litellm/pull/15401" target="_blank" rel="noopener noreferrer">PR #15401</a></li>
</ul>
</li>
<li>
<p><strong>SCIM v2</strong></p>
<ul>
<li>Feat(scim_v2.py): use the external ID when <code>group.id</code> doesn't exist; Passthrough - ensure updates and deletions persist across instances - <a href="https://github.com/BerriAI/litellm/pull/15276" target="_blank" rel="noopener noreferrer">PR #15276</a></li>
</ul>
</li>
<li>
<p><strong>SSO</strong></p>
<ul>
<li>Feat: UI SSO - Add PKCE for OKTA SSO - <a href="https://github.com/BerriAI/litellm/pull/15608" target="_blank" rel="noopener noreferrer">PR #15608</a></li>
<li>Fix: Separate OAuth M2M authentication from UI SSO + Handle Introspection endpoint for Oauth2 - <a href="https://github.com/BerriAI/litellm/pull/15667" target="_blank" rel="noopener noreferrer">PR #15667</a></li>
<li>Fix: clean EntraID app roles JWT claim - <a href="https://github.com/BerriAI/litellm/pull/15583" target="_blank" rel="noopener noreferrer">PR #15583</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="logging--guardrail--prompt-management-integrations">Logging / Guardrail / Prompt Management Integrations<a href="https://docs.litellm.ai/release_notes/v1-78-5#logging--guardrail--prompt-management-integrations" class="hash-link" aria-label="Direct link to Logging / Guardrail / Prompt Management Integrations" title="Direct link to Logging / Guardrail / Prompt Management Integrations">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="guardrails">Guardrails<a href="https://docs.litellm.ai/release_notes/v1-78-5#guardrails" class="hash-link" aria-label="Direct link to Guardrails" title="Direct link to Guardrails">​</a></h4>
<ul>
<li>
<p><strong>General</strong></p>
<ul>
<li>Fix apply_guardrail endpoint returning raw string instead of ApplyGuardrailResponse - <a href="https://github.com/BerriAI/litellm/pull/15436" target="_blank" rel="noopener noreferrer">PR #15436</a></li>
<li>Fix: Ensure guardrail memory sync after database updates - <a href="https://github.com/BerriAI/litellm/pull/15633" target="_blank" rel="noopener noreferrer">PR #15633</a></li>
<li>Feat: add guardrail for image generation - <a href="https://github.com/BerriAI/litellm/pull/15619" target="_blank" rel="noopener noreferrer">PR #15619</a></li>
<li>Feat: Add Guardrails for /v1/messages and /v1/responses API - <a href="https://github.com/BerriAI/litellm/pull/15686" target="_blank" rel="noopener noreferrer">PR #15686</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails">Pillar Security</a></strong></p>
<ul>
<li>Feature: update Pillar Security integration to support no-persistence mode in the LiteLLM proxy - <a href="https://github.com/BerriAI/litellm/pull/15599" target="_blank" rel="noopener noreferrer">PR #15599</a></li>
</ul>
</li>
</ul>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="prompt-management">Prompt Management<a href="https://docs.litellm.ai/release_notes/v1-78-5#prompt-management" class="hash-link" aria-label="Direct link to Prompt Management" title="Direct link to Prompt Management">​</a></h4>
<ul>
<li><strong>General</strong>
<ul>
<li>Fix code snippet in custom_prompt_management.md - <a href="https://github.com/BerriAI/litellm/pull/15544" target="_blank" rel="noopener noreferrer">PR #15544</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="spend-tracking-budgets-and-rate-limiting">Spend Tracking, Budgets and Rate Limiting<a href="https://docs.litellm.ai/release_notes/v1-78-5#spend-tracking-budgets-and-rate-limiting" class="hash-link" aria-label="Direct link to Spend Tracking, Budgets and Rate Limiting" title="Direct link to Spend Tracking, Budgets and Rate Limiting">​</a></h2>
<ul>
<li>
<p><strong>Cost Tracking</strong></p>
<ul>
<li>Feat: Cost Tracking - specify a global vendor discount for costs - <a href="https://github.com/BerriAI/litellm/pull/15546" target="_blank" rel="noopener noreferrer">PR #15546</a></li>
<li>Feat: UI - Allow setting Provider Discounts on UI - <a href="https://github.com/BerriAI/litellm/pull/15550" target="_blank" rel="noopener noreferrer">PR #15550</a></li>
</ul>
</li>
<li>
<p><strong>Budgets</strong></p>
<ul>
<li>Fix: improve budget clarity - <a href="https://github.com/BerriAI/litellm/pull/15682" target="_blank" rel="noopener noreferrer">PR #15682</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="performance--loadbalancing--reliability-improvements">Performance / Loadbalancing / Reliability improvements<a href="https://docs.litellm.ai/release_notes/v1-78-5#performance--loadbalancing--reliability-improvements" class="hash-link" aria-label="Direct link to Performance / Loadbalancing / Reliability improvements" title="Direct link to Performance / Loadbalancing / Reliability improvements">​</a></h2>
<ul>
<li>
<p><strong>Router Optimizations</strong></p>
<ul>
<li>Perf(router): use shallow copy instead of deepcopy for model aliases - 10-100x faster than deepcopy on nested dict structures - <a href="https://github.com/BerriAI/litellm/pull/15576" target="_blank" rel="noopener noreferrer">PR #15576</a></li>
<li>Perf(router): optimize string concatenation in hash generation - Improves time complexity from O(n²) to O(n) - <a href="https://github.com/BerriAI/litellm/pull/15575" target="_blank" rel="noopener noreferrer">PR #15575</a></li>
<li>Perf(router): optimize model lookups with O(1) data structures - Replace O(n) scans with index map lookups - <a href="https://github.com/BerriAI/litellm/pull/15578" target="_blank" rel="noopener noreferrer">PR #15578</a></li>
<li>Perf(router): optimize model lookups with O(1) index maps - Use model_id_to_deployment_index_map and model_name_to_deployment_indices for instant lookups - <a href="https://github.com/BerriAI/litellm/pull/15574" target="_blank" rel="noopener noreferrer">PR #15574</a></li>
<li>Perf(router): optimize timing functions in completion hot path - Use time.perf_counter() for duration measurements and time.monotonic() for timeout calculations, providing 30-40% faster timing calls - <a href="https://github.com/BerriAI/litellm/pull/15617" target="_blank" rel="noopener noreferrer">PR #15617</a></li>
</ul>
</li>
<li>
<p><strong>SSL/TLS Performance</strong></p>
<ul>
<li>Feat(ssl): add configurable ECDH curve for TLS performance - Configure via ssl_ecdh_curve setting to disable PQC on OpenSSL 3.x for better performance - <a href="https://github.com/BerriAI/litellm/pull/15617" target="_blank" rel="noopener noreferrer">PR #15617</a></li>
</ul>
</li>
<li>
<p><strong>Token Counter</strong></p>
<ul>
<li>Fix(token-counter): extract model_info from deployment for custom_tokenizer - <a href="https://github.com/BerriAI/litellm/pull/15680" target="_blank" rel="noopener noreferrer">PR #15680</a></li>
</ul>
</li>
<li>
<p><strong>Performance Metrics</strong></p>
<ul>
<li>Add: perf summary - <a href="https://github.com/BerriAI/litellm/pull/15458" target="_blank" rel="noopener noreferrer">PR #15458</a></li>
</ul>
</li>
<li>
<p><strong>CI/CD</strong></p>
<ul>
<li>Fix: CI/CD - Missing env key &amp; Linter type error - <a href="https://github.com/BerriAI/litellm/pull/15606" target="_blank" rel="noopener noreferrer">PR #15606</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="documentation-updates">Documentation Updates<a href="https://docs.litellm.ai/release_notes/v1-78-5#documentation-updates" class="hash-link" aria-label="Direct link to Documentation Updates" title="Direct link to Documentation Updates">​</a></h2>
<ul>
<li>
<p><strong>Provider Documentation</strong></p>
<ul>
<li>LiteLLM docs update (2025-10-11) - <a href="https://github.com/BerriAI/litellm/pull/15457" target="_blank" rel="noopener noreferrer">PR #15457</a></li>
<li>Docs: add ecs deployment guide - <a href="https://github.com/BerriAI/litellm/pull/15468" target="_blank" rel="noopener noreferrer">PR #15468</a></li>
<li>Docs: Update benchmark results - <a href="https://github.com/BerriAI/litellm/pull/15461" target="_blank" rel="noopener noreferrer">PR #15461</a></li>
<li>Fix: add missing context to benchmark docs - <a href="https://github.com/BerriAI/litellm/pull/15688" target="_blank" rel="noopener noreferrer">PR #15688</a></li>
</ul>
</li>
<li>
<p><strong>General</strong></p>
<ul>
<li>Fixed a few typos - <a href="https://github.com/BerriAI/litellm/pull/15267" target="_blank" rel="noopener noreferrer">PR #15267</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-contributors">New Contributors<a href="https://docs.litellm.ai/release_notes/v1-78-5#new-contributors" class="hash-link" aria-label="Direct link to New Contributors" title="Direct link to New Contributors">​</a></h2>
<ul>
<li>@jlan-nl made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15374" target="_blank" rel="noopener noreferrer">PR #15374</a></li>
<li>@ImadSaddik made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15267" target="_blank" rel="noopener noreferrer">PR #15267</a></li>
<li>@huangyafei made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15472" target="_blank" rel="noopener noreferrer">PR #15472</a></li>
<li>@mubashir1osmani made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15468" target="_blank" rel="noopener noreferrer">PR #15468</a></li>
<li>@kowyo made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15465" target="_blank" rel="noopener noreferrer">PR #15465</a></li>
<li>@dhruvyad made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15448" target="_blank" rel="noopener noreferrer">PR #15448</a></li>
<li>@davizucon made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15544" target="_blank" rel="noopener noreferrer">PR #15544</a></li>
<li>@FelipeRodriguesGare made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15540" target="_blank" rel="noopener noreferrer">PR #15540</a></li>
<li>@ndrsfel made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15557" target="_blank" rel="noopener noreferrer">PR #15557</a></li>
<li>@shinharaguchi made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15598" target="_blank" rel="noopener noreferrer">PR #15598</a></li>
<li>@TensorNull made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15591" target="_blank" rel="noopener noreferrer">PR #15591</a></li>
<li>@TeddyAmkie made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15583" target="_blank" rel="noopener noreferrer">PR #15583</a></li>
<li>@aniketmaurya made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15580" target="_blank" rel="noopener noreferrer">PR #15580</a></li>
<li>@eddierichter-amd made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15554" target="_blank" rel="noopener noreferrer">PR #15554</a></li>
<li>@konekohana made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15535" target="_blank" rel="noopener noreferrer">PR #15535</a></li>
<li>@Classic298 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15495" target="_blank" rel="noopener noreferrer">PR #15495</a></li>
<li>@afogel made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15599" target="_blank" rel="noopener noreferrer">PR #15599</a></li>
<li>@orolega made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15633" target="_blank" rel="noopener noreferrer">PR #15633</a></li>
<li>@LucasSugi made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15634" target="_blank" rel="noopener noreferrer">PR #15634</a></li>
<li>@uc4w6c made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15619" target="_blank" rel="noopener noreferrer">PR #15619</a></li>
<li>@Sameerlite made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15658" target="_blank" rel="noopener noreferrer">PR #15658</a></li>
<li>@yuneng-jiang made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15672" target="_blank" rel="noopener noreferrer">PR #15672</a></li>
<li>@Nikro made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15680" target="_blank" rel="noopener noreferrer">PR #15680</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="full-changelog">Full Changelog<a href="https://docs.litellm.ai/release_notes/v1-78-5#full-changelog" class="hash-link" aria-label="Direct link to Full Changelog" title="Direct link to Full Changelog">​</a></h2>
<p><strong><a href="https://github.com/BerriAI/litellm/compare/v1.78.0-stable...v1.78.4-stable" target="_blank" rel="noopener noreferrer">View complete changelog on GitHub</a></strong></p>]]></content>
        <author>
            <name>Krrish Dholakia</name>
            <uri>https://www.linkedin.com/in/krish-d/</uri>
        </author>
        <author>
            <name>Ishaan Jaff</name>
            <uri>https://www.linkedin.com/in/reffajnaahsi/</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[v1.78.0-stable - MCP Gateway: Control Tool Access by Team, Key]]></title>
        <id>https://docs.litellm.ai/release_notes/v1-78-0</id>
        <link href="https://docs.litellm.ai/release_notes/v1-78-0"/>
        <updated>2025-10-11T10:00:00.000Z</updated>
        <summary type="html"><![CDATA[Deploy this version]]></summary>
        <content type="html"><![CDATA[<h2 class="anchor anchorWithStickyNavbar_LWe7" id="deploy-this-version">Deploy this version<a href="https://docs.litellm.ai/release_notes/v1-78-0#deploy-this-version" class="hash-link" aria-label="Direct link to Deploy this version" title="Direct link to Deploy this version">​</a></h2>
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">Docker</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">Pip</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">docker run litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker run \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-e STORE_MODEL_IN_DB=True \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-p 4000:4000 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">docker.litellm.ai/berriai/litellm:v1.78.0-stable</span><br></span></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">pip install litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">pip install litellm==1.78.0.post1</span><br></span></code></pre></div></div></div></div></div>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="key-highlights">Key Highlights<a href="https://docs.litellm.ai/release_notes/v1-78-0#key-highlights" class="hash-link" aria-label="Direct link to Key Highlights" title="Direct link to Key Highlights">​</a></h2>
<ul>
<li><strong>MCP Gateway - Control Tool Access by Team, Key</strong> - Proxy admins can now control which MCP tools each team or key can access.</li>
<li><strong>Performance Improvements</strong> - 70% Lower p99 Latency</li>
<li><strong>GPT-5 Pro &amp; GPT-Image-1-Mini</strong> - Day 0 support for OpenAI's GPT-5 Pro (400K context) and gpt-image-1-mini image generation</li>
<li><strong>EnkryptAI Guardrails</strong> - New guardrail integration for content moderation</li>
<li><strong>Tag-Based Budgets</strong> - Support for setting budgets based on request tags</li>
</ul>
<hr>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="mcp-gateway---control-tool-access-by-team-key">MCP Gateway - Control Tool Access by Team, Key<a href="https://docs.litellm.ai/release_notes/v1-78-0#mcp-gateway---control-tool-access-by-team-key" class="hash-link" aria-label="Direct link to MCP Gateway - Control Tool Access by Team, Key" title="Direct link to MCP Gateway - Control Tool Access by Team, Key">​</a></h3>
<div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAFCAYAAAB8ZH1oAAAACXBIWXMAACxLAAAsSwGlPZapAAAAdUlEQVR4nD2NWwoEIQwEvf81RXA+xHnoqkm0l3aZDRQkdJF28TjgvUfOGSKC3jvGGHtX1T+OQggBMcYtXVdFa5QFNic4ay042ma2YVhqQ/00iCjUDAvA/RQ4VrDqraNIntqRzrI/pnT+xBfKrOGITuS775vZF6pTwroE5VMUAAAAAElFTkSuQmCC&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="640" height="334"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/assets/ideal-img/tool_control.c192a32.640.png" srcset="/assets/ideal-img/tool_control.c192a32.640.png 640w,/assets/ideal-img/tool_control.ab164aa.1920.png 1920w" width="640" height="334"></noscript></div>
<br>
<p>Proxy admins can now control MCP tool access by team or key, making it easy to grant different teams selective access to tools from the same MCP server.</p>
<p>For example, you can give your Engineering team access to the <code>list_repositories</code>, <code>create_issue</code>, and <code>search_code</code> tools, while Sales gets only <code>search_code</code> and <code>close_issue</code>.</p>
<p><a href="https://docs.litellm.ai/docs/mcp_control#set-allowed-tools-for-a-key-team-or-organization">Get Started</a></p>
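<p>The per-team setup described above can be sketched in proxy-config form. This is a hypothetical sketch only: the key names (<code>mcp_servers</code>, <code>allowed_tools</code>, <code>team_alias</code>) are assumptions, not LiteLLM's confirmed schema; see the Get Started link above for the actual fields.</p>

```yaml
# Hypothetical sketch — field names are assumed for illustration;
# consult the linked MCP tool-control docs for LiteLLM's real schema.
mcp_servers:
  github_mcp:
    url: https://example.com/github-mcp   # placeholder URL

teams:
  - team_alias: engineering
    allowed_tools:            # assumed field name
      - list_repositories
      - create_issue
      - search_code
  - team_alias: sales
    allowed_tools:
      - search_code
      - close_issue
```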
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="performance---70-lower-p99-latency">Performance - 70% Lower p99 Latency<a href="https://docs.litellm.ai/release_notes/v1-78-0#performance---70-lower-p99-latency" class="hash-link" aria-label="Direct link to Performance - 70% Lower p99 Latency" title="Direct link to Performance - 70% Lower p99 Latency">​</a></h2>
<div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAECAYAAAC3OK7NAAAACXBIWXMAABYlAAAWJQFJUiTwAAAAg0lEQVR4nG2MQQrCMAAE83dP/qCGYIsFfUBFvbUJpU/IWUS89dQmJW3KSOPVhYWF2V3R9z3Oe5a48E/rujLPM2KaJh7VnUtxpjyVSCnJDhnHPEcphdY6lYVzDlUX7Ko91/pG13aY1qBbQ9M0WGt/jyEE3sOHp3vhF5/WMcbkLW98HEe+nciO2qktIf0AAAAASUVORK5CYII=&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="640" height="251"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/assets/ideal-img/1_78_0_perf.80a9a6d.640.png" srcset="/assets/ideal-img/1_78_0_perf.80a9a6d.640.png 640w,/assets/ideal-img/1_78_0_perf.c75037e.1920.png 1920w" width="640" height="251"></noscript></div>
<br>
<p>This release cuts p99 latency on the LiteLLM AI Gateway by 70%, making it even better for low-latency use cases.</p>
<p>These gains come from two key enhancements:</p>
<p><strong>Reliable Sessions</strong></p>
<p>Added support for shared sessions with aiohttp. The <code>shared_session</code> parameter is now used consistently across all calls, enabling connection pooling.</p>
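The pattern behind this change can be sketched as follows. A stand-in <code>Session</code> class is used so the example stays self-contained (in LiteLLM the pooled object is an <code>aiohttp.ClientSession</code>, and the function names here are hypothetical): one session is created lazily and threaded through every call, instead of opening a fresh connection per request.

```python
# Sketch of the shared-session pattern: one lazily created session is
# reused across all calls, so connections are pooled rather than
# re-established per request. `Session` stands in for
# aiohttp.ClientSession to keep the example self-contained.
from typing import Optional

class Session:
    instances_created = 0  # count how many sessions are ever built

    def __init__(self) -> None:
        Session.instances_created += 1

_shared_session: Optional[Session] = None

def get_shared_session() -> Session:
    """Return the process-wide session, creating it on first use."""
    global _shared_session
    if _shared_session is None:
        _shared_session = Session()
    return _shared_session

def make_request(session: Optional[Session] = None) -> Session:
    # Every call path reuses the same session instead of constructing
    # a fresh one (the pre-fix behavior).
    return session or get_shared_session()

a = make_request()
b = make_request()
print(a is b, Session.instances_created)  # True 1
```

Because every request goes through <code>get_shared_session()</code>, only one session is ever constructed, which is what enables connection reuse.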
<p><strong>Faster Routing</strong></p>
<p>A new <code>model_name_to_deployment_indices</code> hash map replaces O(n) list scans in <code>_get_all_deployments()</code> with O(1) hash lookups, boosting routing performance and scalability.</p>
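The idea can be sketched as follows. The deployment shape and function names here are illustrative, not LiteLLM's actual router internals: the index is built once when the deployment list changes, so fetching all deployments for a model group becomes a dict lookup rather than a scan of the full list.

```python
# Illustrative sketch of an O(1) model-name index over deployments.
# Field and function names are hypothetical, not LiteLLM internals.
deployments = [
    {"model_name": "gpt-4", "litellm_params": {"model": "azure/gpt-4-eu"}},
    {"model_name": "gpt-4", "litellm_params": {"model": "azure/gpt-4-us"}},
    {"model_name": "claude", "litellm_params": {"model": "bedrock/claude"}},
]

# Built once, whenever the deployment list changes:
model_name_to_deployment_indices: dict = {}
for i, dep in enumerate(deployments):
    model_name_to_deployment_indices.setdefault(dep["model_name"], []).append(i)

def get_all_deployments(model_name: str) -> list:
    """O(1) hash lookup instead of scanning every deployment."""
    indices = model_name_to_deployment_indices.get(model_name, [])
    return [deployments[i] for i in indices]

print(len(get_all_deployments("gpt-4")))  # 2
```

With many model groups and deployments, per-request routing cost no longer grows with the total number of deployments.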
<p>As a result, performance improved across all latency percentiles:</p>
<ul>
<li><strong>Median latency:</strong> 110 ms → <strong>100 ms</strong> (−9.1%)</li>
<li><strong>p95 latency:</strong> 440 ms → <strong>150 ms</strong> (−65.9%)</li>
<li><strong>p99 latency:</strong> 810 ms → <strong>240 ms</strong> (−70.4%)</li>
<li><strong>Average latency:</strong> 310 ms → <strong>111.73 ms</strong> (−64.0%)</li>
</ul>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="test-setup"><strong>Test Setup</strong><a href="https://docs.litellm.ai/release_notes/v1-78-0#test-setup" class="hash-link" aria-label="Direct link to test-setup" title="Direct link to test-setup">​</a></h3>
<p><strong>Locust</strong></p>
<ul>
<li><strong>Concurrent users:</strong>&nbsp;1,000</li>
<li><strong>Ramp-up:</strong>&nbsp;500</li>
</ul>
<p><strong>System Specs</strong></p>
<ul>
<li><strong>Database:</strong>&nbsp;enabled</li>
<li><strong>CPU:</strong>&nbsp;4 vCPUs</li>
<li><strong>Memory:</strong>&nbsp;8 GB RAM</li>
<li><strong>LiteLLM Workers:</strong>&nbsp;4</li>
<li><strong>Instances:</strong>&nbsp;4</li>
</ul>
<p><strong>Configuration (config.yaml)</strong></p>
<p>View the complete configuration:&nbsp;<a href="https://gist.github.com/AlexsanderHamir/53f7d554a5d2afcf2c4edb5b6be68ff4" target="_blank" rel="noopener noreferrer">gist.github.com/AlexsanderHamir/config.yaml</a></p>
<p><strong>Load Script (no_cache_hits.py)</strong></p>
<p>View the complete load testing script:&nbsp;<a href="https://gist.github.com/AlexsanderHamir/42c33d7a4dc7a57f56a78b560dee3a42" target="_blank" rel="noopener noreferrer">gist.github.com/AlexsanderHamir/no_cache_hits.py</a></p>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-models--updated-models">New Models / Updated Models<a href="https://docs.litellm.ai/release_notes/v1-78-0#new-models--updated-models" class="hash-link" aria-label="Direct link to New Models / Updated Models" title="Direct link to New Models / Updated Models">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="new-model-support">New Model Support<a href="https://docs.litellm.ai/release_notes/v1-78-0#new-model-support" class="hash-link" aria-label="Direct link to New Model Support" title="Direct link to New Model Support">​</a></h4>
<table><thead><tr><th>Provider</th><th>Model</th><th>Context Window</th><th>Input ($/1M tokens)</th><th>Output ($/1M tokens)</th><th>Features</th></tr></thead><tbody><tr><td>OpenAI</td><td><code>gpt-5-pro</code></td><td>400K</td><td>$15.00</td><td>$120.00</td><td>Responses API, reasoning, vision, function calling, prompt caching, web search</td></tr><tr><td>OpenAI</td><td><code>gpt-5-pro-2025-10-06</code></td><td>400K</td><td>$15.00</td><td>$120.00</td><td>Responses API, reasoning, vision, function calling, prompt caching, web search</td></tr><tr><td>OpenAI</td><td><code>gpt-image-1-mini</code></td><td>-</td><td>$2.00/img</td><td>-</td><td>Image generation and editing</td></tr><tr><td>OpenAI</td><td><code>gpt-realtime-mini</code></td><td>128K</td><td>$0.60</td><td>$2.40</td><td>Realtime audio, function calling</td></tr><tr><td>Azure AI</td><td><code>azure_ai/Phi-4-mini-reasoning</code></td><td>131K</td><td>$0.08</td><td>$0.32</td><td>Function calling</td></tr><tr><td>Azure AI</td><td><code>azure_ai/Phi-4-reasoning</code></td><td>32K</td><td>$0.125</td><td>$0.50</td><td>Function calling, reasoning</td></tr><tr><td>Azure AI</td><td><code>azure_ai/MAI-DS-R1</code></td><td>128K</td><td>$1.35</td><td>$5.40</td><td>Reasoning, function calling</td></tr><tr><td>Bedrock</td><td><code>au.anthropic.claude-sonnet-4-5-20250929-v1:0</code></td><td>200K</td><td>$3.30</td><td>$16.50</td><td>Chat, reasoning, vision, function calling, prompt caching</td></tr><tr><td>Bedrock</td><td><code>global.anthropic.claude-sonnet-4-5-20250929-v1:0</code></td><td>200K</td><td>$3.00</td><td>$15.00</td><td>Chat, reasoning, vision, function calling, prompt caching</td></tr><tr><td>Bedrock</td><td><code>global.anthropic.claude-sonnet-4-20250514-v1:0</code></td><td>1M</td><td>$3.00</td><td>$15.00</td><td>Chat, reasoning, vision, function calling, prompt caching</td></tr><tr><td>Bedrock</td><td><code>cohere.embed-v4:0</code></td><td>128K</td><td>$0.12</td><td>-</td><td>Embeddings, image input 
support</td></tr><tr><td>OCI</td><td><code>oci/cohere.command-latest</code></td><td>128K</td><td>$1.56</td><td>$1.56</td><td>Function calling</td></tr><tr><td>OCI</td><td><code>oci/cohere.command-a-03-2025</code></td><td>256K</td><td>$1.56</td><td>$1.56</td><td>Function calling</td></tr><tr><td>OCI</td><td><code>oci/cohere.command-plus-latest</code></td><td>128K</td><td>$1.56</td><td>$1.56</td><td>Function calling</td></tr><tr><td>Together AI</td><td><code>together_ai/moonshotai/Kimi-K2-Instruct-0905</code></td><td>262K</td><td>$1.00</td><td>$3.00</td><td>Function calling</td></tr><tr><td>Together AI</td><td><code>together_ai/Qwen/Qwen3-Next-80B-A3B-Instruct</code></td><td>262K</td><td>$0.15</td><td>$1.50</td><td>Function calling</td></tr><tr><td>Together AI</td><td><code>together_ai/Qwen/Qwen3-Next-80B-A3B-Thinking</code></td><td>262K</td><td>$0.15</td><td>$1.50</td><td>Function calling</td></tr><tr><td>Vertex AI</td><td>MedGemma models</td><td>Varies</td><td>Varies</td><td>Varies</td><td>Medical-focused Gemma models on custom endpoints</td></tr><tr><td>Watson X</td><td>27 new foundation models</td><td>Varies</td><td>Varies</td><td>Varies</td><td>Granite, Llama, Mistral families</td></tr></tbody></table>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features">Features<a href="https://docs.litellm.ai/release_notes/v1-78-0#features" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/openai">OpenAI</a></strong></p>
<ul>
<li>Add GPT-5 Pro model configuration and documentation - <a href="https://github.com/BerriAI/litellm/pull/15258" target="_blank" rel="noopener noreferrer">PR #15258</a></li>
<li>Add stop parameter to non-supported params for GPT-5 - <a href="https://github.com/BerriAI/litellm/pull/15244" target="_blank" rel="noopener noreferrer">PR #15244</a></li>
<li>Day 0 Support, Add gpt-image-1-mini - <a href="https://github.com/BerriAI/litellm/pull/15259" target="_blank" rel="noopener noreferrer">PR #15259</a></li>
<li>Add gpt-realtime-mini support - <a href="https://github.com/BerriAI/litellm/pull/15283" target="_blank" rel="noopener noreferrer">PR #15283</a></li>
<li>Add gpt-5-pro-2025-10-06 to model costs - <a href="https://github.com/BerriAI/litellm/pull/15344" target="_blank" rel="noopener noreferrer">PR #15344</a></li>
<li>Minimal fix: gpt5 models should not go on cooldown when called with temperature!=1 - <a href="https://github.com/BerriAI/litellm/pull/15330" target="_blank" rel="noopener noreferrer">PR #15330</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/snowflake">Snowflake Cortex</a></strong></p>
<ul>
<li>Add function calling support for Snowflake Cortex REST API - <a href="https://github.com/BerriAI/litellm/pull/15221" target="_blank" rel="noopener noreferrer">PR #15221</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/gemini">Gemini</a></strong></p>
<ul>
<li>Fix header forwarding for Gemini/Vertex AI providers in proxy mode - <a href="https://github.com/BerriAI/litellm/pull/15231" target="_blank" rel="noopener noreferrer">PR #15231</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/azure">Azure</a></strong></p>
<ul>
<li>Removed stop param from unsupported azure models - <a href="https://github.com/BerriAI/litellm/pull/15229" target="_blank" rel="noopener noreferrer">PR #15229</a></li>
<li>Fix(azure/responses): remove invalid status param from azure call - <a href="https://github.com/BerriAI/litellm/pull/15253" target="_blank" rel="noopener noreferrer">PR #15253</a></li>
<li>Add new Azure AI models with pricing details - <a href="https://github.com/BerriAI/litellm/pull/15387" target="_blank" rel="noopener noreferrer">PR #15387</a></li>
<li>AzureAD Default credentials - select credential type based on environment - <a href="https://github.com/BerriAI/litellm/pull/14470" target="_blank" rel="noopener noreferrer">PR #14470</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/bedrock">Bedrock</a></strong></p>
<ul>
<li>Add Global Cross-Region Inference - <a href="https://github.com/BerriAI/litellm/pull/15210" target="_blank" rel="noopener noreferrer">PR #15210</a></li>
<li>Add Cohere Embed v4 support for AWS Bedrock - <a href="https://github.com/BerriAI/litellm/pull/15298" target="_blank" rel="noopener noreferrer">PR #15298</a></li>
<li>Fix(bedrock): include cacheWriteInputTokens in prompt_tokens calculation - <a href="https://github.com/BerriAI/litellm/pull/15292" target="_blank" rel="noopener noreferrer">PR #15292</a></li>
<li>Add Bedrock AU Cross-Region Inference for Claude Sonnet 4.5 - <a href="https://github.com/BerriAI/litellm/pull/15402" target="_blank" rel="noopener noreferrer">PR #15402</a></li>
<li>Converse → /v1/messages streaming doesn't handle parallel tool calls with Claude models - <a href="https://github.com/BerriAI/litellm/pull/15315" target="_blank" rel="noopener noreferrer">PR #15315</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/vertex">Vertex AI</a></strong></p>
<ul>
<li>Implement Context Caching for Vertex AI provider - <a href="https://github.com/BerriAI/litellm/pull/15226" target="_blank" rel="noopener noreferrer">PR #15226</a></li>
<li>Support for Vertex AI Gemma Models on Custom Endpoints - <a href="https://github.com/BerriAI/litellm/pull/15397" target="_blank" rel="noopener noreferrer">PR #15397</a></li>
<li>VertexAI - gemma model family support (custom endpoints) - <a href="https://github.com/BerriAI/litellm/pull/15419" target="_blank" rel="noopener noreferrer">PR #15419</a></li>
<li>VertexAI Gemma model family streaming support + Added MedGemma - <a href="https://github.com/BerriAI/litellm/pull/15427" target="_blank" rel="noopener noreferrer">PR #15427</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/oci">OCI</a></strong></p>
<ul>
<li>Add OCI Cohere support with tool calling and streaming capabilities - <a href="https://github.com/BerriAI/litellm/pull/15365" target="_blank" rel="noopener noreferrer">PR #15365</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/watsonx">Watson X</a></strong></p>
<ul>
<li>Add Watson X foundation model definitions to model_prices_and_context_window.json - <a href="https://github.com/BerriAI/litellm/pull/15219" target="_blank" rel="noopener noreferrer">PR #15219</a></li>
<li>Watsonx - Apply correct prompt templates for openai/gpt-oss model family - <a href="https://github.com/BerriAI/litellm/pull/15341" target="_blank" rel="noopener noreferrer">PR #15341</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/openrouter">OpenRouter</a></strong></p>
<ul>
<li>Fix - (openrouter): move cache_control to content blocks for claude/gemini - <a href="https://github.com/BerriAI/litellm/pull/15345" target="_blank" rel="noopener noreferrer">PR #15345</a></li>
<li>Fix - OpenRouter cache_control to only apply to last content block - <a href="https://github.com/BerriAI/litellm/pull/15395" target="_blank" rel="noopener noreferrer">PR #15395</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/togetherai">Together AI</a></strong></p>
<ul>
<li>Add new together models - <a href="https://github.com/BerriAI/litellm/pull/15383" target="_blank" rel="noopener noreferrer">PR #15383</a></li>
</ul>
</li>
</ul>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="bug-fixes">Bug Fixes<a href="https://docs.litellm.ai/release_notes/v1-78-0#bug-fixes" class="hash-link" aria-label="Direct link to Bug Fixes" title="Direct link to Bug Fixes">​</a></h3>
<ul>
<li><strong>General</strong>
<ul>
<li>Bug fix: gpt-5-chat-latest has incorrect max_input_tokens value - <a href="https://github.com/BerriAI/litellm/pull/15116" target="_blank" rel="noopener noreferrer">PR #15116</a></li>
<li>Fix reasoning response ID - <a href="https://github.com/BerriAI/litellm/pull/15265" target="_blank" rel="noopener noreferrer">PR #15265</a></li>
<li>Fix issue with parsing assistant messages - <a href="https://github.com/BerriAI/litellm/pull/15320" target="_blank" rel="noopener noreferrer">PR #15320</a></li>
<li>Fix litellm_param based costing - <a href="https://github.com/BerriAI/litellm/pull/15336" target="_blank" rel="noopener noreferrer">PR #15336</a></li>
<li>Fix lint errors - <a href="https://github.com/BerriAI/litellm/pull/15406" target="_blank" rel="noopener noreferrer">PR #15406</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="llm-api-endpoints">LLM API Endpoints<a href="https://docs.litellm.ai/release_notes/v1-78-0#llm-api-endpoints" class="hash-link" aria-label="Direct link to LLM API Endpoints" title="Direct link to LLM API Endpoints">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-1">Features<a href="https://docs.litellm.ai/release_notes/v1-78-0#features-1" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/response_api">Responses API</a></strong></p>
<ul>
<li>Added streaming support for response api streaming image generation - <a href="https://github.com/BerriAI/litellm/pull/15269" target="_blank" rel="noopener noreferrer">PR #15269</a></li>
<li>Add native Responses API support for litellm_proxy provider - <a href="https://github.com/BerriAI/litellm/pull/15347" target="_blank" rel="noopener noreferrer">PR #15347</a></li>
<li>Temporarily relax ResponsesAPIResponse parsing to support custom backends (e.g., vLLM) - <a href="https://github.com/BerriAI/litellm/pull/15362" target="_blank" rel="noopener noreferrer">PR #15362</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/files_api">Files API</a></strong></p>
<ul>
<li>Feat(files): add @client decorator to file operations - <a href="https://github.com/BerriAI/litellm/pull/15339" target="_blank" rel="noopener noreferrer">PR #15339</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/gemini">/generateContent</a></strong></p>
<ul>
<li>Fix gemini cli by actually streaming the response - <a href="https://github.com/BerriAI/litellm/pull/15264" target="_blank" rel="noopener noreferrer">PR #15264</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/pass_through/azure">Azure Passthrough</a></strong></p>
<ul>
<li>Azure - passthrough support with router models - <a href="https://github.com/BerriAI/litellm/pull/15240" target="_blank" rel="noopener noreferrer">PR #15240</a></li>
</ul>
</li>
</ul>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="bugs">Bugs<a href="https://docs.litellm.ai/release_notes/v1-78-0#bugs" class="hash-link" aria-label="Direct link to Bugs" title="Direct link to Bugs">​</a></h4>
<ul>
<li><strong>General</strong>
<ul>
<li>Fix x-litellm-cache-key header not being returned on cache hit - <a href="https://github.com/BerriAI/litellm/pull/15348" target="_blank" rel="noopener noreferrer">PR #15348</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="management-endpoints--ui">Management Endpoints / UI<a href="https://docs.litellm.ai/release_notes/v1-78-0#management-endpoints--ui" class="hash-link" aria-label="Direct link to Management Endpoints / UI" title="Direct link to Management Endpoints / UI">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-2">Features<a href="https://docs.litellm.ai/release_notes/v1-78-0#features-2" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong>Proxy CLI Auth</strong></p>
<ul>
<li>Proxy CLI - dont store existing key in the URL, store it in the state param - <a href="https://github.com/BerriAI/litellm/pull/15290" target="_blank" rel="noopener noreferrer">PR #15290</a></li>
</ul>
</li>
<li>
<p><strong>Models + Endpoints</strong></p>
<ul>
<li>Make PATCH <code>/model/{model_id}/update</code> handle <code>team_id</code> consistently with POST <code>/model/new</code> - <a href="https://github.com/BerriAI/litellm/pull/15297" target="_blank" rel="noopener noreferrer">PR #15297</a></li>
<li>Feature: adds Infinity as a provider in the UI - <a href="https://github.com/BerriAI/litellm/pull/15285" target="_blank" rel="noopener noreferrer">PR #15285</a></li>
<li>Fix: model + endpoints page crash when config file contains router_settings.model_group_alias - <a href="https://github.com/BerriAI/litellm/pull/15308" target="_blank" rel="noopener noreferrer">PR #15308</a></li>
<li>Models &amp; Endpoints Initial Refactor - <a href="https://github.com/BerriAI/litellm/pull/15435" target="_blank" rel="noopener noreferrer">PR #15435</a></li>
<li>Litellm UI API Reference page updates - <a href="https://github.com/BerriAI/litellm/pull/15438" target="_blank" rel="noopener noreferrer">PR #15438</a></li>
</ul>
</li>
<li>
<p><strong>Teams</strong></p>
<ul>
<li>Teams page: new column "Your Role" on the teams table - <a href="https://github.com/BerriAI/litellm/pull/15384" target="_blank" rel="noopener noreferrer">PR #15384</a></li>
<li>LiteLLM Dashboard Teams UI refactor - <a href="https://github.com/BerriAI/litellm/pull/15418" target="_blank" rel="noopener noreferrer">PR #15418</a></li>
</ul>
</li>
<li>
<p><strong>UI Infrastructure</strong></p>
<ul>
<li>Added prettier to autoformat frontend - <a href="https://github.com/BerriAI/litellm/pull/15215" target="_blank" rel="noopener noreferrer">PR #15215</a></li>
<li>Adds turbopack to the npm run dev command in UI to build faster during development - <a href="https://github.com/BerriAI/litellm/pull/15250" target="_blank" rel="noopener noreferrer">PR #15250</a></li>
<li>(perf) fix: Replaces bloated key list calls with lean key aliases endpoint - <a href="https://github.com/BerriAI/litellm/pull/15252" target="_blank" rel="noopener noreferrer">PR #15252</a></li>
<li>Potentially fixes a UI spasm issue with an expired cookie - <a href="https://github.com/BerriAI/litellm/pull/15309" target="_blank" rel="noopener noreferrer">PR #15309</a></li>
<li>LiteLLM UI Refactor Infrastructure - <a href="https://github.com/BerriAI/litellm/pull/15236" target="_blank" rel="noopener noreferrer">PR #15236</a></li>
<li>Enforces removal of unused imports from UI - <a href="https://github.com/BerriAI/litellm/pull/15416" target="_blank" rel="noopener noreferrer">PR #15416</a></li>
<li>Fix: usage page &gt;&gt; Model Activity &gt;&gt; spend per day graph: y-axis clipping on large spend values - <a href="https://github.com/BerriAI/litellm/pull/15389" target="_blank" rel="noopener noreferrer">PR #15389</a></li>
<li>Updates guardrail provider logos - <a href="https://github.com/BerriAI/litellm/pull/15421" target="_blank" rel="noopener noreferrer">PR #15421</a></li>
</ul>
</li>
<li>
<p><strong>Admin Settings</strong></p>
<ul>
<li>Fix: Router settings do not update despite success message - <a href="https://github.com/BerriAI/litellm/pull/15249" target="_blank" rel="noopener noreferrer">PR #15249</a></li>
<li>Fix: Prevents DB from accidentally overriding config file values if they are empty in DB - <a href="https://github.com/BerriAI/litellm/pull/15340" target="_blank" rel="noopener noreferrer">PR #15340</a></li>
</ul>
</li>
<li>
<p><strong>SSO</strong></p>
<ul>
<li>SSO - support EntraID app roles - <a href="https://github.com/BerriAI/litellm/pull/15351" target="_blank" rel="noopener noreferrer">PR #15351</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="logging--guardrail--prompt-management-integrations">Logging / Guardrail / Prompt Management Integrations<a href="https://docs.litellm.ai/release_notes/v1-78-0#logging--guardrail--prompt-management-integrations" class="hash-link" aria-label="Direct link to Logging / Guardrail / Prompt Management Integrations" title="Direct link to Logging / Guardrail / Prompt Management Integrations">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-3">Features<a href="https://docs.litellm.ai/release_notes/v1-78-0#features-3" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li><strong><a href="https://docs.litellm.ai/docs/observability/posthog">PostHog</a></strong>
<ul>
<li>Feat: posthog per request api key - <a href="https://github.com/BerriAI/litellm/pull/15379" target="_blank" rel="noopener noreferrer">PR #15379</a></li>
</ul>
</li>
</ul>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="guardrails">Guardrails<a href="https://docs.litellm.ai/release_notes/v1-78-0#guardrails" class="hash-link" aria-label="Direct link to Guardrails" title="Direct link to Guardrails">​</a></h4>
<ul>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails">EnkryptAI</a></strong>
<ul>
<li>Add EnkryptAI Guardrails on LiteLLM - <a href="https://github.com/BerriAI/litellm/pull/15390" target="_blank" rel="noopener noreferrer">PR #15390</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="spend-tracking-budgets-and-rate-limiting">Spend Tracking, Budgets and Rate Limiting<a href="https://docs.litellm.ai/release_notes/v1-78-0#spend-tracking-budgets-and-rate-limiting" class="hash-link" aria-label="Direct link to Spend Tracking, Budgets and Rate Limiting" title="Direct link to Spend Tracking, Budgets and Rate Limiting">​</a></h2>
<ul>
<li>
<p><strong>Tag Management</strong></p>
<ul>
<li>Tag Management - Add support for setting tag based budgets - <a href="https://github.com/BerriAI/litellm/pull/15433" target="_blank" rel="noopener noreferrer">PR #15433</a></li>
</ul>
</li>
<li>
<p><strong>Dynamic Rate Limiter v3</strong></p>
<ul>
<li>QA/Fixes - Dynamic Rate Limiter v3 - final QA - <a href="https://github.com/BerriAI/litellm/pull/15311" target="_blank" rel="noopener noreferrer">PR #15311</a></li>
<li>Fix dynamic Rate limiter v3 - inserting litellm_model_saturation - <a href="https://github.com/BerriAI/litellm/pull/15394" target="_blank" rel="noopener noreferrer">PR #15394</a></li>
</ul>
</li>
<li>
<p><strong>Shared Health Check</strong></p>
<ul>
<li>Implement Shared Health Check State Across Pods - <a href="https://github.com/BerriAI/litellm/pull/15380" target="_blank" rel="noopener noreferrer">PR #15380</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="mcp-gateway">MCP Gateway<a href="https://docs.litellm.ai/release_notes/v1-78-0#mcp-gateway" class="hash-link" aria-label="Direct link to MCP Gateway" title="Direct link to MCP Gateway">​</a></h2>
<ul>
<li>
<p><strong>Tool Control</strong></p>
<ul>
<li>MCP Gateway - UI - Select allowed tools for Key, Teams - <a href="https://github.com/BerriAI/litellm/pull/15241" target="_blank" rel="noopener noreferrer">PR #15241</a></li>
<li>MCP Gateway - Backend - Allow storing allowed tools by team/key - <a href="https://github.com/BerriAI/litellm/pull/15243" target="_blank" rel="noopener noreferrer">PR #15243</a></li>
<li>MCP Gateway - Fine-grained Database Object Storage Control - <a href="https://github.com/BerriAI/litellm/pull/15255" target="_blank" rel="noopener noreferrer">PR #15255</a></li>
<li>MCP Gateway - Litellm mcp fixes team control - <a href="https://github.com/BerriAI/litellm/pull/15304" target="_blank" rel="noopener noreferrer">PR #15304</a></li>
<li>MCP Gateway - QA/Fixes - Ensure Team/Key level enforcement works for MCPs - <a href="https://github.com/BerriAI/litellm/pull/15305" target="_blank" rel="noopener noreferrer">PR #15305</a></li>
<li>Feature: Include server_name in /v1/mcp/server/health endpoint response - <a href="https://github.com/BerriAI/litellm/pull/15431" target="_blank" rel="noopener noreferrer">PR #15431</a></li>
</ul>
</li>
<li>
<p><strong>OpenAPI Integration</strong></p>
<ul>
<li>MCP - support converting OpenAPI specs to MCP servers - <a href="https://github.com/BerriAI/litellm/pull/15343" target="_blank" rel="noopener noreferrer">PR #15343</a></li>
<li>MCP - specify allowed params per tool - <a href="https://github.com/BerriAI/litellm/pull/15346" target="_blank" rel="noopener noreferrer">PR #15346</a></li>
</ul>
</li>
<li>
<p><strong>Configuration</strong></p>
<ul>
<li>MCP - support setting CA_BUNDLE_PATH - <a href="https://github.com/BerriAI/litellm/pull/15253" target="_blank" rel="noopener noreferrer">PR #15253</a></li>
<li>Fix: Ensure MCP client stays open during tool call - <a href="https://github.com/BerriAI/litellm/pull/15391" target="_blank" rel="noopener noreferrer">PR #15391</a></li>
<li>Remove hardcoded "public" schema in migration.sql - <a href="https://github.com/BerriAI/litellm/pull/15363" target="_blank" rel="noopener noreferrer">PR #15363</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="performance--loadbalancing--reliability-improvements">Performance / Loadbalancing / Reliability improvements<a href="https://docs.litellm.ai/release_notes/v1-78-0#performance--loadbalancing--reliability-improvements" class="hash-link" aria-label="Direct link to Performance / Loadbalancing / Reliability improvements" title="Direct link to Performance / Loadbalancing / Reliability improvements">​</a></h2>
<ul>
<li>
<p><strong>Router Optimizations</strong></p>
<ul>
<li>Fix - Router: add model_name index for O(1) deployment lookups - <a href="https://github.com/BerriAI/litellm/pull/15113" target="_blank" rel="noopener noreferrer">PR #15113</a></li>
<li>Refactor Utils: extract inner function from client - <a href="https://github.com/BerriAI/litellm/pull/15234" target="_blank" rel="noopener noreferrer">PR #15234</a></li>
<li>Fix Networking: remove limitations - <a href="https://github.com/BerriAI/litellm/pull/15302" target="_blank" rel="noopener noreferrer">PR #15302</a></li>
</ul>
</li>
<li>
<p><strong>Session Management</strong></p>
<ul>
<li>Fix - Sessions not being shared - <a href="https://github.com/BerriAI/litellm/pull/15388" target="_blank" rel="noopener noreferrer">PR #15388</a></li>
<li>Fix: remove panic from hot path - <a href="https://github.com/BerriAI/litellm/pull/15396" target="_blank" rel="noopener noreferrer">PR #15396</a></li>
<li>Fix - shared session parsing and usage issue - <a href="https://github.com/BerriAI/litellm/pull/15440" target="_blank" rel="noopener noreferrer">PR #15440</a></li>
<li>Fix: handle closed aiohttp sessions - <a href="https://github.com/BerriAI/litellm/pull/15442" target="_blank" rel="noopener noreferrer">PR #15442</a></li>
<li>Fix: prevent session leaks when recreating aiohttp sessions - <a href="https://github.com/BerriAI/litellm/pull/15443" target="_blank" rel="noopener noreferrer">PR #15443</a></li>
</ul>
</li>
<li>
<p><strong>SSL/TLS Performance</strong></p>
<ul>
<li>Perf: optimize SSL/TLS handshake performance with prioritized cipher - <a href="https://github.com/BerriAI/litellm/pull/15398" target="_blank" rel="noopener noreferrer">PR #15398</a></li>
</ul>
</li>
<li>
<p><strong>Dependencies</strong></p>
<ul>
<li>Upgrades tenacity version to 8.5.0 - <a href="https://github.com/BerriAI/litellm/pull/15303" target="_blank" rel="noopener noreferrer">PR #15303</a></li>
</ul>
</li>
<li>
<p><strong>Data Masking</strong></p>
<ul>
<li>Fix - SensitiveDataMasker converts lists to string - <a href="https://github.com/BerriAI/litellm/pull/15420" target="_blank" rel="noopener noreferrer">PR #15420</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="general-ai-gateway-improvements">General AI Gateway Improvements<a href="https://docs.litellm.ai/release_notes/v1-78-0#general-ai-gateway-improvements" class="hash-link" aria-label="Direct link to General AI Gateway Improvements" title="Direct link to General AI Gateway Improvements">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="security">Security<a href="https://docs.litellm.ai/release_notes/v1-78-0#security" class="hash-link" aria-label="Direct link to Security" title="Direct link to Security">​</a></h4>
<ul>
<li><strong>General</strong>
<ul>
<li>Fix: redact AWS credentials when redact_user_api_key_info enabled - <a href="https://github.com/BerriAI/litellm/pull/15321" target="_blank" rel="noopener noreferrer">PR #15321</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="documentation-updates">Documentation Updates<a href="https://docs.litellm.ai/release_notes/v1-78-0#documentation-updates" class="hash-link" aria-label="Direct link to Documentation Updates" title="Direct link to Documentation Updates">​</a></h2>
<ul>
<li>
<p><strong>Provider Documentation</strong></p>
<ul>
<li>Update doc: perf update - <a href="https://github.com/BerriAI/litellm/pull/15211" target="_blank" rel="noopener noreferrer">PR #15211</a></li>
<li>Add W&amp;B Inference documentation - <a href="https://github.com/BerriAI/litellm/pull/15278" target="_blank" rel="noopener noreferrer">PR #15278</a></li>
</ul>
</li>
<li>
<p><strong>Deployment</strong></p>
<ul>
<li>Deletion of docker-compose buggy comment that cause <code>config.yaml</code> based startup fail - <a href="https://github.com/BerriAI/litellm/pull/15425" target="_blank" rel="noopener noreferrer">PR #15425</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-contributors">New Contributors<a href="https://docs.litellm.ai/release_notes/v1-78-0#new-contributors" class="hash-link" aria-label="Direct link to New Contributors" title="Direct link to New Contributors">​</a></h2>
<ul>
<li>@Gal-bloch made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15219" target="_blank" rel="noopener noreferrer">PR #15219</a></li>
<li>@lcfyi made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15315" target="_blank" rel="noopener noreferrer">PR #15315</a></li>
<li>@ashengstd made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15362" target="_blank" rel="noopener noreferrer">PR #15362</a></li>
<li>@vkolehmainen made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15363" target="_blank" rel="noopener noreferrer">PR #15363</a></li>
<li>@jlan-nl made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15330" target="_blank" rel="noopener noreferrer">PR #15330</a></li>
<li>@BCook98 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15402" target="_blank" rel="noopener noreferrer">PR #15402</a></li>
<li>@PabloGmz96 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15425" target="_blank" rel="noopener noreferrer">PR #15425</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="full-changelog"><strong><a href="https://github.com/BerriAI/litellm/compare/v1.77.7.rc.1...v1.78.0.rc.1" target="_blank" rel="noopener noreferrer">Full Changelog</a></strong><a href="https://docs.litellm.ai/release_notes/v1-78-0#full-changelog" class="hash-link" aria-label="Direct link to full-changelog" title="Direct link to full-changelog">​</a></h2>]]></content>
        <author>
            <name>Krrish Dholakia</name>
            <uri>https://www.linkedin.com/in/krish-d/</uri>
        </author>
        <author>
            <name>Ishaan Jaff</name>
            <uri>https://www.linkedin.com/in/reffajnaahsi/</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[v1.77.7-stable - 2.9x Lower Median Latency]]></title>
        <id>https://docs.litellm.ai/release_notes/v1-77-7</id>
        <link href="https://docs.litellm.ai/release_notes/v1-77-7"/>
        <updated>2025-10-04T10:00:00.000Z</updated>
        <summary type="html"><![CDATA[Deploy this version]]></summary>
        <content type="html"><![CDATA[<h2 class="anchor anchorWithStickyNavbar_LWe7" id="deploy-this-version">Deploy this version<a href="https://docs.litellm.ai/release_notes/v1-77-7#deploy-this-version" class="hash-link" aria-label="Direct link to Deploy this version" title="Direct link to Deploy this version">​</a></h2>
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">Docker</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">Pip</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">docker run litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker run \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-e STORE_MODEL_IN_DB=True \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-p 4000:4000 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">docker.litellm.ai/berriai/litellm:v1.77.7.rc.1</span><br></span></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">pip install litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">pip install litellm==1.77.7.rc.1</span><br></span></code></pre></div></div></div></div></div>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="key-highlights">Key Highlights<a href="https://docs.litellm.ai/release_notes/v1-77-7#key-highlights" class="hash-link" aria-label="Direct link to Key Highlights" title="Direct link to Key Highlights">​</a></h2>
<ul>
<li><strong>Dynamic Rate Limiter v3</strong> - Automatically maximizes throughput when capacity is available (&lt; 80% saturation) by allowing lower-priority requests to use unused capacity, then switches to fair priority-based allocation under high load (≥ 80%) to prevent blocking</li>
<li><strong>Major Performance Improvements</strong> - 2.9x lower median latency at 1,000 concurrent users.</li>
<li><strong>Claude Sonnet 4.5</strong> - Support for Anthropic's new Claude Sonnet 4.5 model family with 200K+ context and tiered pricing</li>
<li><strong>MCP Gateway Enhancements</strong> - Fine-grained tool control, server permissions, and forwardable headers</li>
<li><strong>AMD Lemonade &amp; Nvidia NIM</strong> - New provider support for AMD Lemonade and Nvidia NIM Rerank</li>
<li><strong>GitLab Prompt Management</strong> - GitLab-based prompt management integration</li>
</ul>
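<p>The Dynamic Rate Limiter v3 behavior described above can be sketched as follows. This is a minimal illustration of the two-mode logic only; the function and parameter names are assumptions for this sketch, not LiteLLM's actual internals.</p>

```python
# Sketch: priority-aware rate limiting with a saturation threshold.
# Below 80% saturation, any request (including low priority) may use
# spare capacity; at or above 80%, each priority class is held to its
# weighted share so high-priority traffic is not blocked.

SATURATION_THRESHOLD = 0.80  # switch point between the two modes

def allow_request(priority_weight: float, current_usage: int, capacity: int) -> bool:
    """Decide whether a request may proceed given current load."""
    saturation = current_usage / capacity
    if saturation < SATURATION_THRESHOLD:
        # Spare capacity available: maximize throughput.
        return current_usage < capacity
    # High load: fall back to fair, priority-weighted allocation.
    return current_usage < capacity * priority_weight

# Low load: a low-priority request is allowed through.
assert allow_request(priority_weight=0.5, current_usage=10, capacity=100)
```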
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="performance---29x-lower-median-latency">Performance - 2.9x Lower Median Latency<a href="https://docs.litellm.ai/release_notes/v1-77-7#performance---29x-lower-median-latency" class="hash-link" aria-label="Direct link to Performance - 2.9x Lower Median Latency" title="Direct link to Performance - 2.9x Lower Median Latency">​</a></h3>
<div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAICAYAAADA+m62AAAACXBIWXMAACxLAAAsSwGlPZapAAAA1klEQVR4nEWQzY6CMBDHea9NNFyU1+B5Nln36EN40n2A5SKuJzxzwWCLhO+Q0ALCfzNjqkl/7UznPx+tpbWGEAJJkjBxHDNkSylR1zXGcYRFwjzPURQFoihCGIZMWZaoqgpt2z6FtA3D8Mokm+j7njF3Vtd17Hx/bbD8WMBxHKzWK9i2Ddd1YeIWZWmlcbr8Ye/9wD/7+D17OJ58BEHAVbmi6hV0o7C7HvCZbPGYH8BMa8Y0Te/WZp4szyDvEmmaQtwEnwQ9iOIsNNAPKKXQNA3PRr5p/Q+kxCQByQJIdgAAAABJRU5ErkJggg==&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="640" height="488"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/assets/ideal-img/perf_77_7.ea967fe.640.png" srcset="/assets/ideal-img/perf_77_7.ea967fe.640.png 640w,/assets/ideal-img/perf_77_7.4744f3d.1920.png 1920w" width="640" height="488"></noscript></div>
<br>
<p>This update removes inefficiencies in the LiteLLM router, reducing routing complexity from O(M×N) to O(1). Previously, each lookup built a new array and ran repeated membership checks such as <code>data["model"] in llm_router.get_model_ids()</code>. Now a direct ID-to-deployment map eliminates the redundant allocations and scans.</p>
<p>As a result, performance improved across all latency percentiles:</p>
<ul>
<li><strong>Median latency:</strong> 320 ms → <strong>110 ms</strong> (−65.6%)</li>
<li><strong>p95 latency:</strong> 850 ms → <strong>440 ms</strong> (−48.2%)</li>
<li><strong>p99 latency:</strong> 1,400 ms → <strong>810 ms</strong> (−42.1%)</li>
<li><strong>Average latency:</strong> 864 ms → <strong>310 ms</strong> (−64%)</li>
</ul>
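<p>The O(M×N) → O(1) change can be illustrated with a small sketch. The deployment shape and field names here are illustrative only, not LiteLLM's actual router internals.</p>

```python
# Before: every lookup scanned the full deployment list (O(N));
# doing this for M models made routing O(M*N) overall.
def get_deployment_scan(deployments: list, model_id: str):
    for d in deployments:  # repeated linear scan per lookup
        if d["model_info"]["id"] == model_id:
            return d
    return None

# After: build the ID-to-deployment map once, then every lookup
# is a single dict hit (O(1)), with no per-request allocations.
def build_id_map(deployments: list) -> dict:
    return {d["model_info"]["id"]: d for d in deployments}

deployments = [
    {"model_info": {"id": "gpt-4o-1"}, "litellm_params": {}},
    {"model_info": {"id": "claude-1"}, "litellm_params": {}},
]
id_map = build_id_map(deployments)
assert id_map["gpt-4o-1"] is deployments[0]  # constant-time lookup
```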
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="test-setup">Test Setup<a href="https://docs.litellm.ai/release_notes/v1-77-7#test-setup" class="hash-link" aria-label="Direct link to Test Setup" title="Direct link to Test Setup">​</a></h4>
<p><strong>Locust</strong></p>
<ul>
<li><strong>Concurrent users:</strong> 1,000</li>
<li><strong>Ramp-up:</strong> 500</li>
</ul>
<p><strong>System Specs</strong></p>
<ul>
<li><strong>CPU:</strong> 4 vCPUs</li>
<li><strong>Memory:</strong> 8 GB RAM</li>
<li><strong>LiteLLM Workers:</strong> 4</li>
<li><strong>Instances</strong>: 4</li>
</ul>
<p><strong>Configuration (config.yaml)</strong></p>
<p>View the complete configuration: <a href="https://gist.github.com/AlexsanderHamir/53f7d554a5d2afcf2c4edb5b6be68ff4" target="_blank" rel="noopener noreferrer">gist.github.com/AlexsanderHamir/config.yaml</a></p>
<p><strong>Load Script (no_cache_hits.py)</strong></p>
<p>View the complete load testing script: <a href="https://gist.github.com/AlexsanderHamir/42c33d7a4dc7a57f56a78b560dee3a42" target="_blank" rel="noopener noreferrer">gist.github.com/AlexsanderHamir/no_cache_hits.py</a></p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="mcp-oauth-20-support">MCP OAuth 2.0 Support<a href="https://docs.litellm.ai/release_notes/v1-77-7#mcp-oauth-20-support" class="hash-link" aria-label="Direct link to MCP OAuth 2.0 Support" title="Direct link to MCP OAuth 2.0 Support">​</a></h3>
<div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/jpeg;base64,/9j/2wBDAAYEBQYFBAYGBQYHBwYIChAKCgkJChQODwwQFxQYGBcUFhYaHSUfGhsjHBYWICwgIyYnKSopGR8tMC0oMCUoKSj/2wBDAQcHBwoIChMKChMoGhYaKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCj/wAARCAAFAAoDASIAAhEBAxEB/8QAFgABAQEAAAAAAAAAAAAAAAAAAAQI/8QAHhAAAQQCAwEAAAAAAAAAAAAAAQACAwQRMSEjQWH/xAAVAQEBAAAAAAAAAAAAAAAAAAACBf/EABQRAQAAAAAAAAAAAAAAAAAAAAD/2gAMAwEAAhEDEQA/AM4i50ztlr15ZZCXGZ7SXgkec4+62oiMEhEVUX//2Q==&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="640" height="334"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/assets/ideal-img/mcp_updates.33d55db.640.jpg" srcset="/assets/ideal-img/mcp_updates.33d55db.640.jpg 640w,/assets/ideal-img/mcp_updates.b70bfe0.1920.jpg 1920w" width="640" height="334"></noscript></div>
<br>
<p>This release adds OAuth 2.0 Client Credentials support for MCP servers. This is great for <strong>Internal Dev Tools</strong> use cases, as it lets your users call MCP servers with their own credentials, e.g. letting your developers call the GitHub MCP as themselves.</p>
<p><a href="https://docs.litellm.ai/docs/tutorials/claude_responses_api#connecting-mcp-servers">Set it up today on Claude Code</a></p>
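<p>For reference, the OAuth 2.0 Client Credentials grant (RFC 6749 §4.4) that such a server's token endpoint expects looks roughly like the sketch below. The endpoint URL and credentials are placeholders, not a real LiteLLM configuration.</p>

```python
# Sketch: building a standard client-credentials token request.
# The client_id/client_secret values are placeholders.
from urllib.parse import urlencode

def build_token_request(client_id: str, client_secret: str):
    """Return the (body, headers) for a client_credentials token POST."""
    body = urlencode({
        "grant_type": "client_credentials",  # the grant used by this flow
        "client_id": client_id,
        "client_secret": client_secret,
    })
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return body, headers

body, headers = build_token_request("my-client-id", "my-client-secret")
assert "grant_type=client_credentials" in body
```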
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="scheduled-key-rotations">Scheduled Key Rotations<a href="https://docs.litellm.ai/release_notes/v1-77-7#scheduled-key-rotations" class="hash-link" aria-label="Direct link to Scheduled Key Rotations" title="Direct link to Scheduled Key Rotations">​</a></h3>
<div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAFCAYAAAB8ZH1oAAAACXBIWXMAACxLAAAsSwGlPZapAAAAeUlEQVR4nE3NywoDIQwFUP//E2cxW1Eq46B2zENuuS7aBkIgOUlCSgk5Z8w5ISLoY0BUoKqYUxYA9o9QSkGMEbXWjft4CGDmULUNzfwMZgZ3Byuv9M7LyjljQ3c/A98xidQMfbzxui+0Z3D5Bzf4S1FFa0SOtdYXfgC32cKzrWPOlQAAAABJRU5ErkJggg==&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="640" height="331"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/assets/ideal-img/schedule_key_rotations.2680e85.640.png" srcset="/assets/ideal-img/schedule_key_rotations.2680e85.640.png 640w,/assets/ideal-img/schedule_key_rotations.7b2dfb8.1920.png 1920w" width="640" height="331"></noscript></div>
<br>
<p>This release adds support for scheduled virtual key rotations on the LiteLLM AI Gateway.</p>
<p>You can now enforce that Virtual Keys rotate on a schedule of your choice, e.g. every 15, 30, or 60 days.</p>
<p>This is great for Proxy Admins who need to enforce security policies for production workloads.</p>
<p><a href="https://docs.litellm.ai/docs/proxy/virtual_keys#scheduled-key-rotations">Get Started</a></p>
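<p>The scheduling logic amounts to a simple interval check, sketched below for the 15/30/60-day schedules mentioned above. This is an illustration only, not LiteLLM's actual implementation.</p>

```python
# Sketch: deciding when a key rotated every N days is next due.
from datetime import datetime, timedelta

def next_rotation(last_rotated: datetime, every_days: int) -> datetime:
    """Return the next scheduled rotation time for the key."""
    return last_rotated + timedelta(days=every_days)

def is_due(last_rotated: datetime, every_days: int, now: datetime) -> bool:
    """True once the rotation interval has elapsed."""
    return now >= next_rotation(last_rotated, every_days)

# A key on a 30-day schedule, last rotated Jan 1, is due by Feb 1.
assert is_due(datetime(2025, 1, 1), 30, datetime(2025, 2, 1))
```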
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-models--updated-models">New Models / Updated Models<a href="https://docs.litellm.ai/release_notes/v1-77-7#new-models--updated-models" class="hash-link" aria-label="Direct link to New Models / Updated Models" title="Direct link to New Models / Updated Models">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="new-model-support">New Model Support<a href="https://docs.litellm.ai/release_notes/v1-77-7#new-model-support" class="hash-link" aria-label="Direct link to New Model Support" title="Direct link to New Model Support">​</a></h4>
<table><thead><tr><th>Provider</th><th>Model</th><th>Context Window</th><th>Input ($/1M tokens)</th><th>Output ($/1M tokens)</th><th>Features</th></tr></thead><tbody><tr><td>Anthropic</td><td><code>claude-sonnet-4-5</code></td><td>200K</td><td>$3.00</td><td>$15.00</td><td>Chat, reasoning, vision, function calling, prompt caching</td></tr><tr><td>Anthropic</td><td><code>claude-sonnet-4-5-20250929</code></td><td>200K</td><td>$3.00</td><td>$15.00</td><td>Chat, reasoning, vision, function calling, prompt caching</td></tr><tr><td>Bedrock</td><td><code>eu.anthropic.claude-sonnet-4-5-20250929-v1:0</code></td><td>200K</td><td>$3.00</td><td>$15.00</td><td>Chat, reasoning, vision, function calling, prompt caching</td></tr><tr><td>Azure AI</td><td><code>azure_ai/grok-4</code></td><td>131K</td><td>$5.50</td><td>$27.50</td><td>Chat, reasoning, function calling, web search</td></tr><tr><td>Azure AI</td><td><code>azure_ai/grok-4-fast-reasoning</code></td><td>131K</td><td>$0.43</td><td>$1.73</td><td>Chat, reasoning, function calling, web search</td></tr><tr><td>Azure AI</td><td><code>azure_ai/grok-4-fast-non-reasoning</code></td><td>131K</td><td>$0.43</td><td>$1.73</td><td>Chat, function calling, web search</td></tr><tr><td>Azure AI</td><td><code>azure_ai/grok-code-fast-1</code></td><td>131K</td><td>$3.50</td><td>$17.50</td><td>Chat, function calling, web search</td></tr><tr><td>Groq</td><td><code>groq/moonshotai/kimi-k2-instruct-0905</code></td><td>Context varies</td><td>Pricing varies</td><td>Pricing varies</td><td>Chat, function calling</td></tr><tr><td>Ollama</td><td>Ollama Cloud models</td><td>Varies</td><td>Free</td><td>Free</td><td>Self-hosted models via Ollama Cloud</td></tr></tbody></table>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features">Features<a href="https://docs.litellm.ai/release_notes/v1-77-7#features" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li><strong><a href="https://docs.litellm.ai/docs/providers/anthropic">Anthropic</a></strong>
<ul>
<li>Add new claude-sonnet-4-5 model family with tiered pricing above 200K tokens - <a href="https://github.com/BerriAI/litellm/pull/15041" target="_blank" rel="noopener noreferrer">PR #15041</a></li>
<li>Add anthropic/claude-sonnet-4-5 to model price json with prompt caching support - <a href="https://github.com/BerriAI/litellm/pull/15049" target="_blank" rel="noopener noreferrer">PR #15049</a></li>
<li>Add 200K prices for Sonnet 4.5 - <a href="https://github.com/BerriAI/litellm/pull/15140" target="_blank" rel="noopener noreferrer">PR #15140</a></li>
<li>Add cost tracking for /v1/messages in streaming response - <a href="https://github.com/BerriAI/litellm/pull/15102" target="_blank" rel="noopener noreferrer">PR #15102</a></li>
<li>Add /v1/messages/count_tokens to Anthropic routes for non-admin user access - <a href="https://github.com/BerriAI/litellm/pull/15034" target="_blank" rel="noopener noreferrer">PR #15034</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/gemini">Gemini</a></strong>
<ul>
<li>Ignore type param for gemini tools - <a href="https://github.com/BerriAI/litellm/pull/15022" target="_blank" rel="noopener noreferrer">PR #15022</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/vertex">Vertex AI</a></strong>
<ul>
<li>Add LiteLLM Overhead metric for VertexAI - <a href="https://github.com/BerriAI/litellm/pull/15040" target="_blank" rel="noopener noreferrer">PR #15040</a></li>
<li>Support googlemap grounding in vertex ai - <a href="https://github.com/BerriAI/litellm/pull/15179" target="_blank" rel="noopener noreferrer">PR #15179</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/azure">Azure</a></strong>
<ul>
<li>Add azure_ai grok-4 model family - <a href="https://github.com/BerriAI/litellm/pull/15137" target="_blank" rel="noopener noreferrer">PR #15137</a></li>
<li>Use the <code>extra_query</code> parameter for GET requests in Azure Batch - <a href="https://github.com/BerriAI/litellm/pull/14997" target="_blank" rel="noopener noreferrer">PR #14997</a></li>
<li>Use extra_query for download results (Batch API) - <a href="https://github.com/BerriAI/litellm/pull/15025" target="_blank" rel="noopener noreferrer">PR #15025</a></li>
<li>Add support for Azure AD token-based authorization - <a href="https://github.com/BerriAI/litellm/pull/14813" target="_blank" rel="noopener noreferrer">PR #14813</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/ollama">Ollama</a></strong>
<ul>
<li>Add ollama cloud models - <a href="https://github.com/BerriAI/litellm/pull/15008" target="_blank" rel="noopener noreferrer">PR #15008</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/groq">Groq</a></strong>
<ul>
<li>Add groq/moonshotai/kimi-k2-instruct-0905 - <a href="https://github.com/BerriAI/litellm/pull/15079" target="_blank" rel="noopener noreferrer">PR #15079</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/openai">OpenAI</a></strong>
<ul>
<li>Add support for GPT 5 codex models - <a href="https://github.com/BerriAI/litellm/pull/14841" target="_blank" rel="noopener noreferrer">PR #14841</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/deepinfra">DeepInfra</a></strong>
<ul>
<li>Update DeepInfra model data refresh with latest pricing - <a href="https://github.com/BerriAI/litellm/pull/14939" target="_blank" rel="noopener noreferrer">PR #14939</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/bedrock">Bedrock</a></strong>
<ul>
<li>Add JP Cross-Region Inference - <a href="https://github.com/BerriAI/litellm/pull/15188" target="_blank" rel="noopener noreferrer">PR #15188</a></li>
<li>Add "eu.anthropic.claude-sonnet-4-5-20250929-v1:0" - <a href="https://github.com/BerriAI/litellm/pull/15181" target="_blank" rel="noopener noreferrer">PR #15181</a></li>
<li>Add twelvelabs bedrock Async Invoke Support - <a href="https://github.com/BerriAI/litellm/pull/14871" target="_blank" rel="noopener noreferrer">PR #14871</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/nvidia_nim">Nvidia NIM</a></strong>
<ul>
<li>Add Nvidia NIM Rerank Support - <a href="https://github.com/BerriAI/litellm/pull/15152" target="_blank" rel="noopener noreferrer">PR #15152</a></li>
</ul>
</li>
</ul>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="bug-fixes">Bug Fixes<a href="https://docs.litellm.ai/release_notes/v1-77-7#bug-fixes" class="hash-link" aria-label="Direct link to Bug Fixes" title="Direct link to Bug Fixes">​</a></h3>
<ul>
<li><strong><a href="https://docs.litellm.ai/docs/providers/vllm">VLLM</a></strong>
<ul>
<li>Fix response_format bug in hosted vllm audio_transcription - <a href="https://github.com/BerriAI/litellm/pull/15010" target="_blank" rel="noopener noreferrer">PR #15010</a></li>
<li>Fix passthrough of atranscription into kwargs going to upstream provider - <a href="https://github.com/BerriAI/litellm/pull/15005" target="_blank" rel="noopener noreferrer">PR #15005</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/oci">OCI</a></strong>
<ul>
<li>Fix OCI Generative AI Integration when using Proxy - <a href="https://github.com/BerriAI/litellm/pull/15072" target="_blank" rel="noopener noreferrer">PR #15072</a></li>
</ul>
</li>
<li><strong>General</strong>
<ul>
<li>Fix: Authorization header to use correct "Bearer" capitalization - <a href="https://github.com/BerriAI/litellm/pull/14764" target="_blank" rel="noopener noreferrer">PR #14764</a></li>
<li>Bug fix: gpt-5-chat-latest has incorrect max_input_tokens value - <a href="https://github.com/BerriAI/litellm/pull/15116" target="_blank" rel="noopener noreferrer">PR #15116</a></li>
<li>Update request handling for original exceptions - <a href="https://github.com/BerriAI/litellm/pull/15013" target="_blank" rel="noopener noreferrer">PR #15013</a></li>
</ul>
</li>
</ul>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="new-provider-support">New Provider Support<a href="https://docs.litellm.ai/release_notes/v1-77-7#new-provider-support" class="hash-link" aria-label="Direct link to New Provider Support" title="Direct link to New Provider Support">​</a></h4>
<ul>
<li><strong><a href="https://docs.litellm.ai/docs/providers/lemonade">AMD Lemonade</a></strong>
<ul>
<li>Add AMD Lemonade provider support - <a href="https://github.com/BerriAI/litellm/pull/14840" target="_blank" rel="noopener noreferrer">PR #14840</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="llm-api-endpoints">LLM API Endpoints<a href="https://docs.litellm.ai/release_notes/v1-77-7#llm-api-endpoints" class="hash-link" aria-label="Direct link to LLM API Endpoints" title="Direct link to LLM API Endpoints">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-1">Features<a href="https://docs.litellm.ai/release_notes/v1-77-7#features-1" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/response_api">Responses API</a></strong></p>
<ul>
<li>Return Cost for Responses API Streaming requests - <a href="https://github.com/BerriAI/litellm/pull/15053" target="_blank" rel="noopener noreferrer">PR #15053</a></li>
</ul>
</li>
<li>
<p><strong><a href="https://docs.litellm.ai/docs/providers/gemini">/generateContent</a></strong></p>
<ul>
<li>Add full support for native Gemini API translation - <a href="https://github.com/BerriAI/litellm/pull/15029" target="_blank" rel="noopener noreferrer">PR #15029</a></li>
</ul>
</li>
<li>
<p><strong>Passthrough Gemini Routes</strong></p>
<ul>
<li>Add Gemini generateContent passthrough cost tracking - <a href="https://github.com/BerriAI/litellm/pull/15014" target="_blank" rel="noopener noreferrer">PR #15014</a></li>
<li>Add streamGenerateContent cost tracking in passthrough - <a href="https://github.com/BerriAI/litellm/pull/15199" target="_blank" rel="noopener noreferrer">PR #15199</a></li>
</ul>
</li>
<li>
<p><strong>Passthrough Vertex AI Routes</strong></p>
<ul>
<li>Add cost tracking for Vertex AI Passthrough <code>/predict</code> endpoint - <a href="https://github.com/BerriAI/litellm/pull/15019" target="_blank" rel="noopener noreferrer">PR #15019</a></li>
<li>Add cost tracking for Vertex AI Live API WebSocket Passthrough - <a href="https://github.com/BerriAI/litellm/pull/14956" target="_blank" rel="noopener noreferrer">PR #14956</a></li>
</ul>
</li>
<li>
<p><strong>General</strong></p>
<ul>
<li>Preserve Whitespace Characters in Model Response Streams - <a href="https://github.com/BerriAI/litellm/pull/15160" target="_blank" rel="noopener noreferrer">PR #15160</a></li>
<li>Add provider name to payload specification - <a href="https://github.com/BerriAI/litellm/pull/15130" target="_blank" rel="noopener noreferrer">PR #15130</a></li>
<li>Ensure query params are forwarded from origin url to downstream request - <a href="https://github.com/BerriAI/litellm/pull/15087" target="_blank" rel="noopener noreferrer">PR #15087</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="management-endpoints--ui">Management Endpoints / UI<a href="https://docs.litellm.ai/release_notes/v1-77-7#management-endpoints--ui" class="hash-link" aria-label="Direct link to Management Endpoints / UI" title="Direct link to Management Endpoints / UI">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-2">Features<a href="https://docs.litellm.ai/release_notes/v1-77-7#features-2" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li>
<p><strong>Virtual Keys</strong></p>
<ul>
<li>Ensure LLM_API_KEYs can access pass through routes - <a href="https://github.com/BerriAI/litellm/pull/15115" target="_blank" rel="noopener noreferrer">PR #15115</a></li>
<li>Support 'guaranteed_throughput' when setting limits on keys belonging to a team - <a href="https://github.com/BerriAI/litellm/pull/15120" target="_blank" rel="noopener noreferrer">PR #15120</a></li>
</ul>
</li>
<li>
<p><strong>Models + Endpoints</strong></p>
<ul>
<li>Ensure OCI secret fields not shared on /models and /v1/models endpoints - <a href="https://github.com/BerriAI/litellm/pull/15085" target="_blank" rel="noopener noreferrer">PR #15085</a></li>
<li>Add snowflake on UI - <a href="https://github.com/BerriAI/litellm/pull/15083" target="_blank" rel="noopener noreferrer">PR #15083</a></li>
<li>Make UI theme settings publicly accessible for custom branding - <a href="https://github.com/BerriAI/litellm/pull/15074" target="_blank" rel="noopener noreferrer">PR #15074</a></li>
</ul>
</li>
<li>
<p><strong>Admin Settings</strong></p>
<ul>
<li>Ensure OTEL settings are saved in DB after set on UI - <a href="https://github.com/BerriAI/litellm/pull/15118" target="_blank" rel="noopener noreferrer">PR #15118</a></li>
<li>Top api key tags - <a href="https://github.com/BerriAI/litellm/pull/15151" target="_blank" rel="noopener noreferrer">PR #15151</a>, <a href="https://github.com/BerriAI/litellm/pull/15156" target="_blank" rel="noopener noreferrer">PR #15156</a></li>
</ul>
</li>
<li>
<p><strong>MCP</strong></p>
<ul>
<li>Show health status of MCP servers - <a href="https://github.com/BerriAI/litellm/pull/15185" target="_blank" rel="noopener noreferrer">PR #15185</a></li>
<li>Allow setting extra headers on the UI - <a href="https://github.com/BerriAI/litellm/pull/15185" target="_blank" rel="noopener noreferrer">PR #15185</a></li>
<li>Allow editing allowed tools on the UI - <a href="https://github.com/BerriAI/litellm/pull/15185" target="_blank" rel="noopener noreferrer">PR #15185</a></li>
</ul>
</li>
</ul>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="bug-fixes-1">Bug Fixes<a href="https://docs.litellm.ai/release_notes/v1-77-7#bug-fixes-1" class="hash-link" aria-label="Direct link to Bug Fixes" title="Direct link to Bug Fixes">​</a></h3>
<ul>
<li>
<p><strong>Virtual Keys</strong></p>
<ul>
<li>(security) prevent user key from updating other user keys - <a href="https://github.com/BerriAI/litellm/pull/15201" target="_blank" rel="noopener noreferrer">PR #15201</a></li>
<li>(security) don't return all keys with blank key alias on /v2/key/info - <a href="https://github.com/BerriAI/litellm/pull/15201" target="_blank" rel="noopener noreferrer">PR #15201</a></li>
<li>Fix Session Token Cookie Infinite Logout Loop - <a href="https://github.com/BerriAI/litellm/pull/15146" target="_blank" rel="noopener noreferrer">PR #15146</a></li>
</ul>
</li>
<li>
<p><strong>Models + Endpoints</strong></p>
<ul>
<li>Make UI theme settings publicly accessible for custom branding - <a href="https://github.com/BerriAI/litellm/pull/15074" target="_blank" rel="noopener noreferrer">PR #15074</a></li>
</ul>
</li>
<li>
<p><strong>Teams</strong></p>
<ul>
<li>Fix failed copy-to-clipboard for HTTP UI - <a href="https://github.com/BerriAI/litellm/pull/15195" target="_blank" rel="noopener noreferrer">PR #15195</a></li>
</ul>
</li>
<li>
<p><strong>Logs</strong></p>
<ul>
<li>Fix logs page rendering logs on filter lookup - <a href="https://github.com/BerriAI/litellm/pull/15195" target="_blank" rel="noopener noreferrer">PR #15195</a></li>
<li>Fix lookup of the end-user list (migrated to the more efficient /customers/list lookup) - <a href="https://github.com/BerriAI/litellm/pull/15195" target="_blank" rel="noopener noreferrer">PR #15195</a></li>
</ul>
</li>
<li>
<p><strong>Test key</strong></p>
<ul>
<li>Update selected model on key change - <a href="https://github.com/BerriAI/litellm/pull/15197" target="_blank" rel="noopener noreferrer">PR #15197</a></li>
</ul>
</li>
<li>
<p><strong>Dashboard</strong></p>
<ul>
<li>Fix LiteLLM model name fallback in dashboard overview - <a href="https://github.com/BerriAI/litellm/pull/14998" target="_blank" rel="noopener noreferrer">PR #14998</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="logging--guardrail--prompt-management-integrations">Logging / Guardrail / Prompt Management Integrations<a href="https://docs.litellm.ai/release_notes/v1-77-7#logging--guardrail--prompt-management-integrations" class="hash-link" aria-label="Direct link to Logging / Guardrail / Prompt Management Integrations" title="Direct link to Logging / Guardrail / Prompt Management Integrations">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-3">Features<a href="https://docs.litellm.ai/release_notes/v1-77-7#features-3" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li><strong><a href="https://docs.litellm.ai/docs/observability/otel">OpenTelemetry</a></strong>
<ul>
<li>Use generation_name for span naming in logging method - <a href="https://github.com/BerriAI/litellm/pull/14799" target="_blank" rel="noopener noreferrer">PR #14799</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/logging#langfuse">Langfuse</a></strong>
<ul>
<li>Handle non-serializable objects in Langfuse logging - <a href="https://github.com/BerriAI/litellm/pull/15148" target="_blank" rel="noopener noreferrer">PR #15148</a></li>
<li>Set usage_details.total in langfuse integration - <a href="https://github.com/BerriAI/litellm/pull/15015" target="_blank" rel="noopener noreferrer">PR #15015</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/prometheus">Prometheus</a></strong>
<ul>
<li>Support custom metadata labels on key/team - <a href="https://github.com/BerriAI/litellm/pull/15094" target="_blank" rel="noopener noreferrer">PR #15094</a></li>
</ul>
</li>
</ul>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="guardrails">Guardrails<a href="https://docs.litellm.ai/release_notes/v1-77-7#guardrails" class="hash-link" aria-label="Direct link to Guardrails" title="Direct link to Guardrails">​</a></h4>
<ul>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/guardrails">Javelin</a></strong>
<ul>
<li>Add Javelin standalone guardrails integration for LiteLLM Proxy - <a href="https://github.com/BerriAI/litellm/pull/14983" target="_blank" rel="noopener noreferrer">PR #14983</a></li>
<li>Add logging for important status fields in guardrails - <a href="https://github.com/BerriAI/litellm/pull/15090" target="_blank" rel="noopener noreferrer">PR #15090</a></li>
<li>Don't run post_call guardrail if no text returned from Bedrock - <a href="https://github.com/BerriAI/litellm/pull/15106" target="_blank" rel="noopener noreferrer">PR #15106</a></li>
</ul>
</li>
</ul>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="prompt-management">Prompt Management<a href="https://docs.litellm.ai/release_notes/v1-77-7#prompt-management" class="hash-link" aria-label="Direct link to Prompt Management" title="Direct link to Prompt Management">​</a></h4>
<ul>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/prompt_management">GitLab</a></strong>
<ul>
<li>GitLab based Prompt manager - <a href="https://github.com/BerriAI/litellm/pull/14988" target="_blank" rel="noopener noreferrer">PR #14988</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="spend-tracking-budgets-and-rate-limiting">Spend Tracking, Budgets and Rate Limiting<a href="https://docs.litellm.ai/release_notes/v1-77-7#spend-tracking-budgets-and-rate-limiting" class="hash-link" aria-label="Direct link to Spend Tracking, Budgets and Rate Limiting" title="Direct link to Spend Tracking, Budgets and Rate Limiting">​</a></h2>
<ul>
<li><strong>Cost Tracking</strong>
<ul>
<li>Proxy: end user cost tracking in the responses API - <a href="https://github.com/BerriAI/litellm/pull/15124" target="_blank" rel="noopener noreferrer">PR #15124</a></li>
</ul>
</li>
<li><strong>Parallel Request Limiter v3</strong>
<ul>
<li>Use well known redis cluster hashing algorithm - <a href="https://github.com/BerriAI/litellm/pull/15052" target="_blank" rel="noopener noreferrer">PR #15052</a></li>
<li>Fixes to dynamic rate limiter v3 - add saturation detection - <a href="https://github.com/BerriAI/litellm/pull/15119" target="_blank" rel="noopener noreferrer">PR #15119</a></li>
<li>Dynamic Rate Limiter v3 - fixes for detecting saturation + fixes for post saturation behavior - <a href="https://github.com/BerriAI/litellm/pull/15192" target="_blank" rel="noopener noreferrer">PR #15192</a></li>
</ul>
</li>
<li><strong>Teams</strong>
<ul>
<li>Add model specific tpm/rpm limits to teams on LiteLLM - <a href="https://github.com/BerriAI/litellm/pull/15044" target="_blank" rel="noopener noreferrer">PR #15044</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="mcp-gateway">MCP Gateway<a href="https://docs.litellm.ai/release_notes/v1-77-7#mcp-gateway" class="hash-link" aria-label="Direct link to MCP Gateway" title="Direct link to MCP Gateway">​</a></h2>
<ul>
<li><strong>Server Configuration</strong>
<ul>
<li>Specify forwardable headers, specify allowed/disallowed tools for MCP servers - <a href="https://github.com/BerriAI/litellm/pull/15002" target="_blank" rel="noopener noreferrer">PR #15002</a></li>
<li>Enforce server permissions on tool calls - <a href="https://github.com/BerriAI/litellm/pull/15044" target="_blank" rel="noopener noreferrer">PR #15044</a></li>
<li>MCP Gateway Fine-grained Tools Addition - <a href="https://github.com/BerriAI/litellm/pull/15153" target="_blank" rel="noopener noreferrer">PR #15153</a></li>
</ul>
</li>
<li><strong>Bug Fixes</strong>
<ul>
<li>Remove server-name prefix from MCP tools tests - <a href="https://github.com/BerriAI/litellm/pull/14986" target="_blank" rel="noopener noreferrer">PR #14986</a></li>
<li>Resolve regression with duplicate Mcp-Protocol-Version header - <a href="https://github.com/BerriAI/litellm/pull/15050" target="_blank" rel="noopener noreferrer">PR #15050</a></li>
<li>Fix test_mcp_server.py - <a href="https://github.com/BerriAI/litellm/pull/15183" target="_blank" rel="noopener noreferrer">PR #15183</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="performance--loadbalancing--reliability-improvements">Performance / Loadbalancing / Reliability improvements<a href="https://docs.litellm.ai/release_notes/v1-77-7#performance--loadbalancing--reliability-improvements" class="hash-link" aria-label="Direct link to Performance / Loadbalancing / Reliability improvements" title="Direct link to Performance / Loadbalancing / Reliability improvements">​</a></h2>
<ul>
<li><strong>Router Optimizations</strong>
<ul>
<li><strong>+62.5% P99 Latency Improvement</strong> - Remove router inefficiencies (from O(M*N) to O(1)) - <a href="https://github.com/BerriAI/litellm/pull/15046" target="_blank" rel="noopener noreferrer">PR #15046</a></li>
<li>Remove hasattr checks in Router - <a href="https://github.com/BerriAI/litellm/pull/15082" target="_blank" rel="noopener noreferrer">PR #15082</a></li>
<li>Remove Double Lookups - <a href="https://github.com/BerriAI/litellm/pull/15084" target="_blank" rel="noopener noreferrer">PR #15084</a></li>
<li>Optimize _filter_cooldown_deployments from O(n×m + k×n) to O(n) - <a href="https://github.com/BerriAI/litellm/pull/15091" target="_blank" rel="noopener noreferrer">PR #15091</a></li>
<li>Optimize unhealthy deployment filtering in retry path (O(n*m) → O(n+m)) - <a href="https://github.com/BerriAI/litellm/pull/15110" target="_blank" rel="noopener noreferrer">PR #15110</a></li>
</ul>
</li>
<li><strong>Cache Optimizations</strong>
<ul>
<li>Reduce complexity of InMemoryCache.evict_cache from O(n*log(n)) to O(log(n)) - <a href="https://github.com/BerriAI/litellm/pull/15000" target="_blank" rel="noopener noreferrer">PR #15000</a></li>
<li>Avoid expensive operations when the cache isn't available - <a href="https://github.com/BerriAI/litellm/pull/15182" target="_blank" rel="noopener noreferrer">PR #15182</a></li>
</ul>
</li>
<li><strong>Worker Management</strong>
<ul>
<li>Add proxy CLI option to recycle workers after N requests - <a href="https://github.com/BerriAI/litellm/pull/15007" target="_blank" rel="noopener noreferrer">PR #15007</a></li>
</ul>
</li>
<li><strong>Metrics &amp; Monitoring</strong>
<ul>
<li>LiteLLM Overhead metric tracking - Add support for tracking litellm overhead on cache hits - <a href="https://github.com/BerriAI/litellm/pull/15045" target="_blank" rel="noopener noreferrer">PR #15045</a></li>
</ul>
</li>
</ul>
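<p>The cache-eviction change above (O(n*log(n)) to O(log(n))) follows a standard pattern; a minimal, hypothetical sketch (not LiteLLM's actual <code>InMemoryCache</code>) is keeping <code>(expiry, key)</code> pairs in a min-heap so each eviction is a single O(log n) pop instead of re-sorting all entries:</p>

```python
import heapq
import time

# Hypothetical TTL cache: the min-heap is ordered by expiry time, so the
# soonest-to-expire entry is always at heap[0] and evicting it is O(log n).
class TTLCache:
    def __init__(self):
        self.store = {}  # key -> (value, expiry)
        self.heap = []   # min-heap of (expiry, key)

    def set(self, key, value, ttl):
        expiry = time.monotonic() + ttl
        self.store[key] = (value, expiry)
        heapq.heappush(self.heap, (expiry, key))

    def evict(self):
        now = time.monotonic()
        while self.heap and self.heap[0][0] <= now:
            _, key = heapq.heappop(self.heap)
            # Skip stale heap entries for keys refreshed with a later expiry.
            if key in self.store and self.store[key][1] <= now:
                del self.store[key]

cache = TTLCache()
cache.set("a", 1, ttl=0.0)   # already expired
cache.set("b", 2, ttl=60.0)
cache.evict()
assert "a" not in cache.store and "b" in cache.store
```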
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="documentation-updates">Documentation Updates<a href="https://docs.litellm.ai/release_notes/v1-77-7#documentation-updates" class="hash-link" aria-label="Direct link to Documentation Updates" title="Direct link to Documentation Updates">​</a></h2>
<ul>
<li><strong>Provider Documentation</strong>
<ul>
<li>Update litellm docs from latest release - <a href="https://github.com/BerriAI/litellm/pull/15004" target="_blank" rel="noopener noreferrer">PR #15004</a></li>
<li>Add missing api_key parameter - <a href="https://github.com/BerriAI/litellm/pull/15058" target="_blank" rel="noopener noreferrer">PR #15058</a></li>
</ul>
</li>
<li><strong>General Documentation</strong>
<ul>
<li>Use docker compose instead of docker-compose - <a href="https://github.com/BerriAI/litellm/pull/15024" target="_blank" rel="noopener noreferrer">PR #15024</a></li>
<li>Add railtracks to projects that are using litellm - <a href="https://github.com/BerriAI/litellm/pull/15144" target="_blank" rel="noopener noreferrer">PR #15144</a></li>
<li>Perf: last week's improvements - <a href="https://github.com/BerriAI/litellm/pull/15193" target="_blank" rel="noopener noreferrer">PR #15193</a></li>
<li>Sync models GitHub documentation with Loom video and cross-reference - <a href="https://github.com/BerriAI/litellm/pull/15191" target="_blank" rel="noopener noreferrer">PR #15191</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="security-fixes">Security Fixes<a href="https://docs.litellm.ai/release_notes/v1-77-7#security-fixes" class="hash-link" aria-label="Direct link to Security Fixes" title="Direct link to Security Fixes">​</a></h2>
<ul>
<li><strong>JWT Token Security</strong> - Don't log JWT SSO token on .info() log - <a href="https://github.com/BerriAI/litellm/pull/15145" target="_blank" rel="noopener noreferrer">PR #15145</a></li>
</ul>
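<p>The pitfall behind this fix is generic: any logger that receives a raw SSO callback can leak the token at <code>.info()</code> level. A minimal, hypothetical redaction sketch (regex and function names are illustrative, not LiteLLM's actual code):</p>

```python
import re

# JWTs are three base64url segments separated by dots; the header segment
# almost always starts with "eyJ" (base64 of '{"'), which this pattern keys on.
JWT_RE = re.compile(r"eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+")

def redact(message: str) -> str:
    """Replace any embedded JWT with a placeholder before logging."""
    return JWT_RE.sub("<redacted-jwt>", message)

msg = "SSO callback received token eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiIxIn0.abc123"
assert "eyJ" not in redact(msg)
assert "<redacted-jwt>" in redact(msg)
```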
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-contributors">New Contributors<a href="https://docs.litellm.ai/release_notes/v1-77-7#new-contributors" class="hash-link" aria-label="Direct link to New Contributors" title="Direct link to New Contributors">​</a></h2>
<ul>
<li>@herve-ves made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/14998" target="_blank" rel="noopener noreferrer">PR #14998</a></li>
<li>@wenxi-onyx made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15008" target="_blank" rel="noopener noreferrer">PR #15008</a></li>
<li>@jpetrucciani made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15005" target="_blank" rel="noopener noreferrer">PR #15005</a></li>
<li>@abhijitjavelin made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/14983" target="_blank" rel="noopener noreferrer">PR #14983</a></li>
<li>@ZeroClover made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15039" target="_blank" rel="noopener noreferrer">PR #15039</a></li>
<li>@cedarm made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15043" target="_blank" rel="noopener noreferrer">PR #15043</a></li>
<li>@Isydmr made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15025" target="_blank" rel="noopener noreferrer">PR #15025</a></li>
<li>@serializer made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15013" target="_blank" rel="noopener noreferrer">PR #15013</a></li>
<li>@eddierichter-amd made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/14840" target="_blank" rel="noopener noreferrer">PR #14840</a></li>
<li>@malags made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15000" target="_blank" rel="noopener noreferrer">PR #15000</a></li>
<li>@henryhwang made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15029" target="_blank" rel="noopener noreferrer">PR #15029</a></li>
<li>@plafleur made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15111" target="_blank" rel="noopener noreferrer">PR #15111</a></li>
<li>@tyler-liner made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/14799" target="_blank" rel="noopener noreferrer">PR #14799</a></li>
<li>@Amir-R25 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15144" target="_blank" rel="noopener noreferrer">PR #15144</a></li>
<li>@georg-wolflein made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15124" target="_blank" rel="noopener noreferrer">PR #15124</a></li>
<li>@niharm made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15140" target="_blank" rel="noopener noreferrer">PR #15140</a></li>
<li>@anthony-liner made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15015" target="_blank" rel="noopener noreferrer">PR #15015</a></li>
<li>@rishiganesh2002 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15153" target="_blank" rel="noopener noreferrer">PR #15153</a></li>
<li>@danielaskdd made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15160" target="_blank" rel="noopener noreferrer">PR #15160</a></li>
<li>@JVenberg made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15146" target="_blank" rel="noopener noreferrer">PR #15146</a></li>
<li>@speglich made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/15072" target="_blank" rel="noopener noreferrer">PR #15072</a></li>
<li>@daily-kim made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/14764" target="_blank" rel="noopener noreferrer">PR #14764</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="full-changelog"><strong><a href="https://github.com/BerriAI/litellm/compare/v1.77.5.rc.4...v1.77.7.rc.1" target="_blank" rel="noopener noreferrer">Full Changelog</a></strong><a href="https://docs.litellm.ai/release_notes/v1-77-7#full-changelog" class="hash-link" aria-label="Direct link to full-changelog" title="Direct link to full-changelog">​</a></h2>]]></content>
        <author>
            <name>Krrish Dholakia</name>
            <uri>https://www.linkedin.com/in/krish-d/</uri>
        </author>
        <author>
            <name>Ishaan Jaff</name>
            <uri>https://www.linkedin.com/in/reffajnaahsi/</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[v1.77.5-stable - MCP OAuth 2.0 Support]]></title>
        <id>https://docs.litellm.ai/release_notes/v1-77-5</id>
        <link href="https://docs.litellm.ai/release_notes/v1-77-5"/>
        <updated>2025-09-29T10:00:00.000Z</updated>
        <summary type="html"><![CDATA[Deploy this version]]></summary>
        <content type="html"><![CDATA[<h2 class="anchor anchorWithStickyNavbar_LWe7" id="deploy-this-version">Deploy this version<a href="https://docs.litellm.ai/release_notes/v1-77-5#deploy-this-version" class="hash-link" aria-label="Direct link to Deploy this version" title="Direct link to Deploy this version">​</a></h2>
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">Docker</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">Pip</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">docker run litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker run \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-e STORE_MODEL_IN_DB=True \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-p 4000:4000 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">docker.litellm.ai/berriai/litellm:v1.77.5-stable</span><br></span></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">pip install litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">pip install litellm==1.77.5</span><br></span></code></pre></div></div></div></div></div>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="key-highlights">Key Highlights<a href="https://docs.litellm.ai/release_notes/v1-77-5#key-highlights" class="hash-link" aria-label="Direct link to Key Highlights" title="Direct link to Key Highlights">​</a></h2>
<ul>
<li><strong>MCP OAuth 2.0 Support</strong> - Enhanced authentication for Model Context Protocol integrations</li>
<li><strong>Scheduled Key Rotations</strong> - Automated key rotation capabilities for enhanced security</li>
<li><strong>New Gemini 2.5 Flash &amp; Flash-lite Models</strong> - Latest September 2025 preview models with improved pricing and features</li>
<li><strong>Performance Improvements</strong> - 54% RPS improvement</li>
</ul>
<hr>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="performance-improvements---54-rps-improvement">Performance Improvements - 54% RPS Improvement<a href="https://docs.litellm.ai/release_notes/v1-77-5#performance-improvements---54-rps-improvement" class="hash-link" aria-label="Direct link to Performance Improvements - 54% RPS Improvement" title="Direct link to Performance Improvements - 54% RPS Improvement">​</a></h3>
<div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAFCAYAAAB8ZH1oAAAACXBIWXMAACxLAAAsSwGlPZapAAAAsUlEQVR4nB3GzWrCQBRA4Xl7u6gu3FsK4iKLPkChgnTtUop0UWOqmSZkUvM7zr2Jp1QOHxxzOlVibS6udNK1naBId+3lcq3u/58PXkxR1JzTlDJ3WGeJyhcWbslzueKpWPLVH2AAoyrchpFeex5/5qx/N7jGUXUXds0Hk3TKvv3E+OBhhFUc8Zq9QYDvOCHNLHVTc2wTHvYzTAhBJYjOtnPN6lwHGVRE7nzwiqKbw7v+AcUMt4bRbIoWAAAAAElFTkSuQmCC&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="640" height="332"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/assets/ideal-img/perf_77_5.90cb032.640.png" srcset="/assets/ideal-img/perf_77_5.90cb032.640.png 640w,/assets/ideal-img/perf_77_5.2a1a598.1920.png 1920w" width="640" height="332"></noscript></div>
<br>
<p>This release brings a 54% RPS improvement (1,040 → 1,602 RPS, aggregated) per instance.</p>
<p>The improvement comes from fixing O(n²) inefficiencies in the LiteLLM Router, primarily caused by repeated <code>in</code> membership checks against large lists inside loops.</p>
<p>Tests were run with a database-only setup (no cache hits).</p>
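<p>As a minimal illustration of the pattern behind the fix (function and field names are hypothetical, not LiteLLM's actual router code): each <code>in</code> check against a list scans the whole list, making the loop O(n²); hoisting the list into a set makes each membership test O(1):</p>

```python
# Before: `unhealthy_ids` is a list, so each `not in` check is O(m)
# and the whole filter is O(n*m).
def filter_slow(deployments, unhealthy_ids):
    return [d for d in deployments if d["id"] not in unhealthy_ids]

# After: one O(m) pass builds a set, then each membership test is O(1),
# so the filter is O(n + m).
def filter_fast(deployments, unhealthy_ids):
    unhealthy = set(unhealthy_ids)
    return [d for d in deployments if d["id"] not in unhealthy]

deployments = [{"id": f"dep-{i}"} for i in range(1000)]
unhealthy_ids = [f"dep-{i}" for i in range(0, 1000, 2)]
assert filter_slow(deployments, unhealthy_ids) == filter_fast(deployments, unhealthy_ids)
```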
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="test-setup">Test Setup<a href="https://docs.litellm.ai/release_notes/v1-77-5#test-setup" class="hash-link" aria-label="Direct link to Test Setup" title="Direct link to Test Setup">​</a></h4>
<p>All benchmarks were executed using Locust with 1,000 concurrent users and a ramp-up of 500. The environment was configured to stress the routing layer and eliminate caching as a variable.</p>
<p><strong>System Specs</strong></p>
<ul>
<li><strong>CPU:</strong> 8 vCPUs</li>
<li><strong>Memory:</strong> 32 GB RAM</li>
</ul>
<p><strong>Configuration (config.yaml)</strong></p>
<p>View the complete configuration: <a href="https://gist.github.com/AlexsanderHamir/53f7d554a5d2afcf2c4edb5b6be68ff4" target="_blank" rel="noopener noreferrer">gist.github.com/AlexsanderHamir/config.yaml</a></p>
<p><strong>Load Script (no_cache_hits.py)</strong></p>
<p>View the complete load testing script: <a href="https://gist.github.com/AlexsanderHamir/42c33d7a4dc7a57f56a78b560dee3a42" target="_blank" rel="noopener noreferrer">gist.github.com/AlexsanderHamir/no_cache_hits.py</a></p>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-models--updated-models">New Models / Updated Models<a href="https://docs.litellm.ai/release_notes/v1-77-5#new-models--updated-models" class="hash-link" aria-label="Direct link to New Models / Updated Models" title="Direct link to New Models / Updated Models">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="new-model-support">New Model Support<a href="https://docs.litellm.ai/release_notes/v1-77-5#new-model-support" class="hash-link" aria-label="Direct link to New Model Support" title="Direct link to New Model Support">​</a></h4>
<table><thead><tr><th>Provider</th><th>Model</th><th>Context Window</th><th>Input ($/1M tokens)</th><th>Output ($/1M tokens)</th><th>Features</th></tr></thead><tbody><tr><td>Gemini</td><td><code>gemini-2.5-flash-preview-09-2025</code></td><td>1M</td><td>$0.30</td><td>$2.50</td><td>Chat, reasoning, vision, audio</td></tr><tr><td>Gemini</td><td><code>gemini-2.5-flash-lite-preview-09-2025</code></td><td>1M</td><td>$0.10</td><td>$0.40</td><td>Chat, reasoning, vision, audio</td></tr><tr><td>Gemini</td><td><code>gemini-flash-latest</code></td><td>1M</td><td>$0.30</td><td>$2.50</td><td>Chat, reasoning, vision, audio</td></tr><tr><td>Gemini</td><td><code>gemini-flash-lite-latest</code></td><td>1M</td><td>$0.10</td><td>$0.40</td><td>Chat, reasoning, vision, audio</td></tr><tr><td>DeepSeek</td><td><code>deepseek-chat</code></td><td>131K</td><td>$0.60</td><td>$1.70</td><td>Chat, function calling, caching</td></tr><tr><td>DeepSeek</td><td><code>deepseek-reasoner</code></td><td>131K</td><td>$0.60</td><td>$1.70</td><td>Chat, reasoning</td></tr><tr><td>Bedrock</td><td><code>deepseek.v3-v1:0</code></td><td>164K</td><td>$0.58</td><td>$1.68</td><td>Chat, reasoning, function calling</td></tr><tr><td>Azure</td><td><code>azure/gpt-5-codex</code></td><td>272K</td><td>$1.25</td><td>$10.00</td><td>Responses API, reasoning, vision</td></tr><tr><td>OpenAI</td><td><code>gpt-5-codex</code></td><td>272K</td><td>$1.25</td><td>$10.00</td><td>Responses API, reasoning, vision</td></tr><tr><td>SambaNova</td><td><code>sambanova/DeepSeek-V3.1</code></td><td>33K</td><td>$3.00</td><td>$4.50</td><td>Chat, reasoning, function calling</td></tr><tr><td>SambaNova</td><td><code>sambanova/gpt-oss-120b</code></td><td>131K</td><td>$3.00</td><td>$4.50</td><td>Chat, reasoning, function calling</td></tr><tr><td>Bedrock</td><td><code>qwen.qwen3-coder-480b-a35b-v1:0</code></td><td>262K</td><td>$0.22</td><td>$1.80</td><td>Chat, reasoning, function 
calling</td></tr><tr><td>Bedrock</td><td><code>qwen.qwen3-235b-a22b-2507-v1:0</code></td><td>262K</td><td>$0.22</td><td>$0.88</td><td>Chat, reasoning, function calling</td></tr><tr><td>Bedrock</td><td><code>qwen.qwen3-coder-30b-a3b-v1:0</code></td><td>262K</td><td>$0.15</td><td>$0.60</td><td>Chat, reasoning, function calling</td></tr><tr><td>Bedrock</td><td><code>qwen.qwen3-32b-v1:0</code></td><td>131K</td><td>$0.15</td><td>$0.60</td><td>Chat, reasoning, function calling</td></tr><tr><td>Vertex AI</td><td><code>vertex_ai/qwen/qwen3-next-80b-a3b-instruct-maas</code></td><td>262K</td><td>$0.15</td><td>$1.20</td><td>Chat, function calling</td></tr><tr><td>Vertex AI</td><td><code>vertex_ai/qwen/qwen3-next-80b-a3b-thinking-maas</code></td><td>262K</td><td>$0.15</td><td>$1.20</td><td>Chat, function calling</td></tr><tr><td>Vertex AI</td><td><code>vertex_ai/deepseek-ai/deepseek-v3.1-maas</code></td><td>164K</td><td>$1.35</td><td>$5.40</td><td>Chat, reasoning, function calling</td></tr><tr><td>OpenRouter</td><td><code>openrouter/x-ai/grok-4-fast:free</code></td><td>2M</td><td>$0.00</td><td>$0.00</td><td>Chat, reasoning, function calling</td></tr><tr><td>XAI</td><td><code>xai/grok-4-fast-reasoning</code></td><td>2M</td><td>$0.20</td><td>$0.50</td><td>Chat, reasoning, function calling</td></tr><tr><td>XAI</td><td><code>xai/grok-4-fast-non-reasoning</code></td><td>2M</td><td>$0.20</td><td>$0.50</td><td>Chat, function calling</td></tr></tbody></table>
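<p>As a quick worked example of reading the pricing table, per-request cost is <code>tokens / 1,000,000 × rate</code> for each direction, using the rates listed above (here, <code>gemini-2.5-flash-preview-09-2025</code>):</p>

```python
# Rates from the pricing table above, in $ per 1M tokens.
INPUT_RATE = 0.30
OUTPUT_RATE = 2.50

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in dollars of a single request at the table rates."""
    return (input_tokens / 1_000_000) * INPUT_RATE + (output_tokens / 1_000_000) * OUTPUT_RATE

# 10K input tokens -> $0.003; 2K output tokens -> $0.005; total $0.008.
cost = request_cost(input_tokens=10_000, output_tokens=2_000)
assert abs(cost - 0.008) < 1e-9
```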
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features">Features<a href="https://docs.litellm.ai/release_notes/v1-77-5#features" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li><strong><a href="https://docs.litellm.ai/docs/providers/gemini">Gemini</a></strong>
<ul>
<li>Added Gemini 2.5 Flash and Flash-lite preview models (September 2025 release) with improved pricing - <a href="https://github.com/BerriAI/litellm/pull/14948" target="_blank" rel="noopener noreferrer">PR #14948</a></li>
<li>Added new Anthropic web fetch tool support - <a href="https://github.com/BerriAI/litellm/pull/14951" target="_blank" rel="noopener noreferrer">PR #14951</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/xai">XAI</a></strong>
<ul>
<li>Add xai/grok-4-fast models - <a href="https://github.com/BerriAI/litellm/pull/14833" target="_blank" rel="noopener noreferrer">PR #14833</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/anthropic">Anthropic</a></strong>
<ul>
<li>Updated Claude Sonnet 4 configs to reflect million-token context window pricing - <a href="https://github.com/BerriAI/litellm/pull/14639" target="_blank" rel="noopener noreferrer">PR #14639</a></li>
<li>Added supported text field to anthropic citation response - <a href="https://github.com/BerriAI/litellm/pull/14164" target="_blank" rel="noopener noreferrer">PR #14164</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/bedrock">Bedrock</a></strong>
<ul>
<li>Added support for the Qwen model family &amp; DeepSeek 3.1 on Amazon Bedrock - <a href="https://github.com/BerriAI/litellm/pull/14845" target="_blank" rel="noopener noreferrer">PR #14845</a></li>
<li>Support requestMetadata in Bedrock Converse API - <a href="https://github.com/BerriAI/litellm/pull/14570" target="_blank" rel="noopener noreferrer">PR #14570</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/vertex">Vertex AI</a></strong>
<ul>
<li>Added vertex_ai/qwen models and azure/gpt-5-codex - <a href="https://github.com/BerriAI/litellm/pull/14844" target="_blank" rel="noopener noreferrer">PR #14844</a></li>
<li>Update vertex ai qwen model pricing - <a href="https://github.com/BerriAI/litellm/pull/14828" target="_blank" rel="noopener noreferrer">PR #14828</a></li>
<li>Vertex AI Context Caching: use the Vertex AI API v1 instead of v1beta1 and accept the 'cachedContent' param - <a href="https://github.com/BerriAI/litellm/pull/14831" target="_blank" rel="noopener noreferrer">PR #14831</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/sambanova">SambaNova</a></strong>
<ul>
<li>Add SambaNova DeepSeek V3.1 and gpt-oss-120b - <a href="https://github.com/BerriAI/litellm/pull/14866" target="_blank" rel="noopener noreferrer">PR #14866</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/openai">OpenAI</a></strong>
<ul>
<li>Fix inconsistent token configs for gpt-5 models - <a href="https://github.com/BerriAI/litellm/pull/14942" target="_blank" rel="noopener noreferrer">PR #14942</a></li>
<li>GPT-3.5-Turbo price updated - <a href="https://github.com/BerriAI/litellm/pull/14858" target="_blank" rel="noopener noreferrer">PR #14858</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/openrouter">OpenRouter</a></strong>
<ul>
<li>Add gpt-5 and gpt-5-codex to OpenRouter cost map - <a href="https://github.com/BerriAI/litellm/pull/14879" target="_blank" rel="noopener noreferrer">PR #14879</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/vllm">VLLM</a></strong>
<ul>
<li>Fix vllm passthrough - <a href="https://github.com/BerriAI/litellm/pull/14778" target="_blank" rel="noopener noreferrer">PR #14778</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/image_generation">Flux</a></strong>
<ul>
<li>Support flux image edit - <a href="https://github.com/BerriAI/litellm/pull/14790" target="_blank" rel="noopener noreferrer">PR #14790</a></li>
</ul>
</li>
</ul>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="bug-fixes">Bug Fixes<a href="https://docs.litellm.ai/release_notes/v1-77-5#bug-fixes" class="hash-link" aria-label="Direct link to Bug Fixes" title="Direct link to Bug Fixes">​</a></h3>
<ul>
<li><strong><a href="https://docs.litellm.ai/docs/providers/anthropic">Anthropic</a></strong>
<ul>
<li>Fix: Support Claude Code auth via subscription (Anthropic) - <a href="https://github.com/BerriAI/litellm/pull/14821" target="_blank" rel="noopener noreferrer">PR #14821</a></li>
<li>Fix Anthropic streaming IDs - <a href="https://github.com/BerriAI/litellm/pull/14965" target="_blank" rel="noopener noreferrer">PR #14965</a></li>
<li>Revert incorrect changes to sonnet-4 max output tokens - <a href="https://github.com/BerriAI/litellm/pull/14933" target="_blank" rel="noopener noreferrer">PR #14933</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/openai">OpenAI</a></strong>
<ul>
<li>Fix a bug where openai image edit silently ignores multiple images - <a href="https://github.com/BerriAI/litellm/pull/14893" target="_blank" rel="noopener noreferrer">PR #14893</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/providers/vllm">VLLM</a></strong>
<ul>
<li>Fix: vLLM provider's rerank endpoint from /v1/rerank to /rerank - <a href="https://github.com/BerriAI/litellm/pull/14938" target="_blank" rel="noopener noreferrer">PR #14938</a></li>
</ul>
</li>
</ul>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="new-provider-support">New Provider Support<a href="https://docs.litellm.ai/release_notes/v1-77-5#new-provider-support" class="hash-link" aria-label="Direct link to New Provider Support" title="Direct link to New Provider Support">​</a></h4>
<ul>
<li><strong><a href="https://docs.litellm.ai/docs/providers/wandb">W&amp;B Inference</a></strong>
<ul>
<li>Add W&amp;B Inference to LiteLLM - <a href="https://github.com/BerriAI/litellm/pull/14416" target="_blank" rel="noopener noreferrer">PR #14416</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="llm-api-endpoints">LLM API Endpoints<a href="https://docs.litellm.ai/release_notes/v1-77-5#llm-api-endpoints" class="hash-link" aria-label="Direct link to LLM API Endpoints" title="Direct link to LLM API Endpoints">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-1">Features<a href="https://docs.litellm.ai/release_notes/v1-77-5#features-1" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li><strong>General</strong>
<ul>
<li>Add SDK support for additional headers - <a href="https://github.com/BerriAI/litellm/pull/14761" target="_blank" rel="noopener noreferrer">PR #14761</a></li>
<li>Add shared_session parameter for aiohttp ClientSession reuse - <a href="https://github.com/BerriAI/litellm/pull/14721" target="_blank" rel="noopener noreferrer">PR #14721</a></li>
</ul>
</li>
</ul>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="bugs">Bugs<a href="https://docs.litellm.ai/release_notes/v1-77-5#bugs" class="hash-link" aria-label="Direct link to Bugs" title="Direct link to Bugs">​</a></h4>
<ul>
<li><strong>General</strong>
<ul>
<li>Fix: Streaming tool call index assignment for multiple tool calls - <a href="https://github.com/BerriAI/litellm/pull/14587" target="_blank" rel="noopener noreferrer">PR #14587</a></li>
<li>Fix credential loading in the token counter proxy - <a href="https://github.com/BerriAI/litellm/pull/14808" target="_blank" rel="noopener noreferrer">PR #14808</a></li>
</ul>
</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="management-endpoints--ui">Management Endpoints / UI<a href="https://docs.litellm.ai/release_notes/v1-77-5#management-endpoints--ui" class="hash-link" aria-label="Direct link to Management Endpoints / UI" title="Direct link to Management Endpoints / UI">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-2">Features<a href="https://docs.litellm.ai/release_notes/v1-77-5#features-2" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li><strong>Proxy CLI Auth</strong>
<ul>
<li>Allow re-using the CLI auth token - <a href="https://github.com/BerriAI/litellm/pull/14780" target="_blank" rel="noopener noreferrer">PR #14780</a></li>
<li>Add a Python method to log in via the LiteLLM Proxy - <a href="https://github.com/BerriAI/litellm/pull/14782" target="_blank" rel="noopener noreferrer">PR #14782</a></li>
<li>Fixes for LiteLLM Proxy CLI to Auth to Gateway - <a href="https://github.com/BerriAI/litellm/pull/14836" target="_blank" rel="noopener noreferrer">PR #14836</a></li>
</ul>
</li>
</ul>
<p><strong>Virtual Keys</strong></p>
<ul>
<li>Initial support for scheduled key rotations - <a href="https://github.com/BerriAI/litellm/pull/14877" target="_blank" rel="noopener noreferrer">PR #14877</a></li>
<li>Allow scheduling key rotations when creating virtual keys - <a href="https://github.com/BerriAI/litellm/pull/14960" target="_blank" rel="noopener noreferrer">PR #14960</a></li>
</ul>
<p><strong>Models + Endpoints</strong></p>
<ul>
<li>Fix: add Oracle to the providers list - <a href="https://github.com/BerriAI/litellm/pull/14835" target="_blank" rel="noopener noreferrer">PR #14835</a></li>
</ul>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="bugs-1">Bugs<a href="https://docs.litellm.ai/release_notes/v1-77-5#bugs-1" class="hash-link" aria-label="Direct link to Bugs" title="Direct link to Bugs">​</a></h4>
<ul>
<li><strong>SSO</strong> - Fix: SSO "Clear" button writes empty values instead of removing SSO config - <a href="https://github.com/BerriAI/litellm/pull/14826" target="_blank" rel="noopener noreferrer">PR #14826</a></li>
<li><strong>Admin Settings</strong> - Remove useful links from admin settings - <a href="https://github.com/BerriAI/litellm/pull/14918" target="_blank" rel="noopener noreferrer">PR #14918</a></li>
<li><strong>Management Routes</strong> - Add /user/list to management routes - <a href="https://github.com/BerriAI/litellm/pull/14868" target="_blank" rel="noopener noreferrer">PR #14868</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="logging--guardrail--prompt-management-integrations">Logging / Guardrail / Prompt Management Integrations<a href="https://docs.litellm.ai/release_notes/v1-77-5#logging--guardrail--prompt-management-integrations" class="hash-link" aria-label="Direct link to Logging / Guardrail / Prompt Management Integrations" title="Direct link to Logging / Guardrail / Prompt Management Integrations">​</a></h2>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="features-3">Features<a href="https://docs.litellm.ai/release_notes/v1-77-5#features-3" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features">​</a></h4>
<ul>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/logging#datadog">DataDog</a></strong>
<ul>
<li>Logging - <code>datadog</code> callback: log message content without sending it to DataDog - <a href="https://github.com/BerriAI/litellm/pull/14909" target="_blank" rel="noopener noreferrer">PR #14909</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/logging#langfuse">Langfuse</a></strong>
<ul>
<li>Add Langfuse usage details for cached tokens - <a href="https://github.com/BerriAI/litellm/pull/10955" target="_blank" rel="noopener noreferrer">PR #10955</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/logging#opik">Opik</a></strong>
<ul>
<li>Improve opik integration code - <a href="https://github.com/BerriAI/litellm/pull/14888" target="_blank" rel="noopener noreferrer">PR #14888</a></li>
</ul>
</li>
<li><strong><a href="https://docs.litellm.ai/docs/proxy/logging#sqs">SQS</a></strong>
<ul>
<li>Error logging support for SQS Logger - <a href="https://github.com/BerriAI/litellm/pull/14974" target="_blank" rel="noopener noreferrer">PR #14974</a></li>
</ul>
</li>
</ul>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="guardrails">Guardrails<a href="https://docs.litellm.ai/release_notes/v1-77-5#guardrails" class="hash-link" aria-label="Direct link to Guardrails" title="Direct link to Guardrails">​</a></h4>
<ul>
<li><strong>LakeraAI v2 Guardrail</strong> - Ensure exception is raised correctly - <a href="https://github.com/BerriAI/litellm/pull/14867" target="_blank" rel="noopener noreferrer">PR #14867</a></li>
<li><strong>Presidio Guardrail</strong> - Support custom entity types in Presidio guardrail with <code>Union[PiiEntityType, str]</code> - <a href="https://github.com/BerriAI/litellm/pull/14899" target="_blank" rel="noopener noreferrer">PR #14899</a></li>
<li><strong>Noma Guardrail</strong> - Add Noma guardrail provider to the UI - <a href="https://github.com/BerriAI/litellm/pull/14415" target="_blank" rel="noopener noreferrer">PR #14415</a></li>
</ul>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="prompt-management">Prompt Management<a href="https://docs.litellm.ai/release_notes/v1-77-5#prompt-management" class="hash-link" aria-label="Direct link to Prompt Management" title="Direct link to Prompt Management">​</a></h4>
<ul>
<li><strong>BitBucket Integration</strong> - Add BitBucket Integration for Prompt Management - <a href="https://github.com/BerriAI/litellm/pull/14882" target="_blank" rel="noopener noreferrer">PR #14882</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="spend-tracking-budgets-and-rate-limiting">Spend Tracking, Budgets and Rate Limiting<a href="https://docs.litellm.ai/release_notes/v1-77-5#spend-tracking-budgets-and-rate-limiting" class="hash-link" aria-label="Direct link to Spend Tracking, Budgets and Rate Limiting" title="Direct link to Spend Tracking, Budgets and Rate Limiting">​</a></h2>
<ul>
<li><strong>Service Tier Pricing</strong> - Add <code>service_tier</code>-based pricing support for OpenAI (both Service and Priority tiers) - <a href="https://github.com/BerriAI/litellm/pull/14796" target="_blank" rel="noopener noreferrer">PR #14796</a></li>
<li><strong>Cost Tracking</strong> - Show input, output, and tool call cost breakdown in <code>StandardLoggingPayload</code> - <a href="https://github.com/BerriAI/litellm/pull/14921" target="_blank" rel="noopener noreferrer">PR #14921</a></li>
<li><strong>Parallel Request Limiter v3</strong>
<ul>
<li>Ensure Lua scripts can execute on Redis Cluster - <a href="https://github.com/BerriAI/litellm/pull/14968" target="_blank" rel="noopener noreferrer">PR #14968</a></li>
<li>Fix: read metadata from both the <code>metadata</code> and <code>litellm_metadata</code> fields - <a href="https://github.com/BerriAI/litellm/pull/14783" target="_blank" rel="noopener noreferrer">PR #14783</a></li>
</ul>
</li>
<li><strong>Priority Reservation</strong> - Fix: keys without priority metadata received higher priority than keys with explicit priority configurations - <a href="https://github.com/BerriAI/litellm/pull/14832" target="_blank" rel="noopener noreferrer">PR #14832</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="mcp-gateway">MCP Gateway<a href="https://docs.litellm.ai/release_notes/v1-77-5#mcp-gateway" class="hash-link" aria-label="Direct link to MCP Gateway" title="Direct link to MCP Gateway">​</a></h2>
<ul>
<li><strong>MCP Configuration</strong> - Enable custom fields in <code>mcp_info</code> configuration - <a href="https://github.com/BerriAI/litellm/pull/14794" target="_blank" rel="noopener noreferrer">PR #14794</a></li>
<li><strong>MCP Tools</strong> - Remove <code>server_name</code> prefix from <code>list_tools</code> - <a href="https://github.com/BerriAI/litellm/pull/14720" target="_blank" rel="noopener noreferrer">PR #14720</a></li>
<li><strong>OAuth Flow</strong> - Initial support for the v2 OAuth flow - <a href="https://github.com/BerriAI/litellm/pull/14964" target="_blank" rel="noopener noreferrer">PR #14964</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="performance--loadbalancing--reliability-improvements">Performance / Loadbalancing / Reliability improvements<a href="https://docs.litellm.ai/release_notes/v1-77-5#performance--loadbalancing--reliability-improvements" class="hash-link" aria-label="Direct link to Performance / Loadbalancing / Reliability improvements" title="Direct link to Performance / Loadbalancing / Reliability improvements">​</a></h2>
<ul>
<li><strong>Memory Leak Fix</strong> - Fix <code>InMemoryCache</code> unbounded growth when TTLs are set - <a href="https://github.com/BerriAI/litellm/pull/14869" target="_blank" rel="noopener noreferrer">PR #14869</a></li>
<li><strong>Cache Performance</strong> - Fix: cache root cause - <a href="https://github.com/BerriAI/litellm/pull/14827" target="_blank" rel="noopener noreferrer">PR #14827</a></li>
<li><strong>Concurrency Fix</strong> - Fix concurrency/scaling when many Python threads do streaming using <em>sync</em> completions - <a href="https://github.com/BerriAI/litellm/pull/14816" target="_blank" rel="noopener noreferrer">PR #14816</a></li>
<li><strong>Performance Optimization</strong> - Fix: reduce <code>get_deployment</code> cost to O(1) - <a href="https://github.com/BerriAI/litellm/pull/14967" target="_blank" rel="noopener noreferrer">PR #14967</a></li>
<li><strong>Performance Optimization</strong> - Fix: remove slow string operation - <a href="https://github.com/BerriAI/litellm/pull/14955" target="_blank" rel="noopener noreferrer">PR #14955</a></li>
<li><strong>DB Connection Management</strong> - Fix: DB connection state retries - <a href="https://github.com/BerriAI/litellm/pull/14925" target="_blank" rel="noopener noreferrer">PR #14925</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="documentation-updates">Documentation Updates<a href="https://docs.litellm.ai/release_notes/v1-77-5#documentation-updates" class="hash-link" aria-label="Direct link to Documentation Updates" title="Direct link to Documentation Updates">​</a></h2>
<ul>
<li><strong>Provider Documentation</strong> - Fix docs for <code>provider_specific_params.md</code> - <a href="https://github.com/BerriAI/litellm/pull/14787" target="_blank" rel="noopener noreferrer">PR #14787</a></li>
<li><strong>Model References</strong> - Update model references from <code>gemini-pro</code> to <code>gemini-2.5-pro</code> - <a href="https://github.com/BerriAI/litellm/pull/14775" target="_blank" rel="noopener noreferrer">PR #14775</a></li>
<li><strong>Letta Guide</strong> - Add Letta Guide documentation - <a href="https://github.com/BerriAI/litellm/pull/14798" target="_blank" rel="noopener noreferrer">PR #14798</a></li>
<li><strong>README</strong> - Make the README document clearer - <a href="https://github.com/BerriAI/litellm/pull/14860" target="_blank" rel="noopener noreferrer">PR #14860</a></li>
<li><strong>Session Management</strong> - Update docs for session management availability - <a href="https://github.com/BerriAI/litellm/pull/14914" target="_blank" rel="noopener noreferrer">PR #14914</a></li>
<li><strong>Cost Documentation</strong> - Add documentation for additional cost-related keys in custom pricing - <a href="https://github.com/BerriAI/litellm/pull/14949" target="_blank" rel="noopener noreferrer">PR #14949</a></li>
<li><strong>Azure Passthrough</strong> - Add azure passthrough documentation - <a href="https://github.com/BerriAI/litellm/pull/14958" target="_blank" rel="noopener noreferrer">PR #14958</a></li>
<li><strong>General Documentation</strong> - Doc updates sept 2025 - <a href="https://github.com/BerriAI/litellm/pull/14769" target="_blank" rel="noopener noreferrer">PR #14769</a>
<ul>
<li>Clarified bridging between endpoints and mode in docs.</li>
<li>Added Vertex AI Gemini API configuration as an alternative in relevant guides.</li>
<li>Linked AWS authentication info in the Bedrock guardrails documentation.</li>
<li>Added Cancel Response API usage with code snippets.</li>
<li>Clarified that SSO (Single Sign-On) is free for up to 5 users.</li>
<li>Alphabetized the sidebar, leaving quick starts / intros at the top of categories.</li>
<li>Documented <code>max_connections</code> under <code>cache_params</code>.</li>
<li>Clarified IAM AssumeRole Policy requirements.</li>
<li>Added transform utilities example to Getting Started (showing request transformation).</li>
<li>Added references to models.litellm.ai as the full models list in various docs.</li>
<li>Added a code snippet for <code>async_post_call_success_hook</code>.</li>
<li>Removed broken links to the callbacks management guide.</li>
<li>Reformatted and linked cookbooks and other relevant docs.</li>
</ul>
</li>
<li><strong>Documentation Corrections</strong> - Corrected docs updates sept 2025 - <a href="https://github.com/BerriAI/litellm/pull/14916" target="_blank" rel="noopener noreferrer">PR #14916</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="new-contributors">New Contributors<a href="https://docs.litellm.ai/release_notes/v1-77-5#new-contributors" class="hash-link" aria-label="Direct link to New Contributors" title="Direct link to New Contributors">​</a></h2>
<ul>
<li>@uzaxirr made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/14761" target="_blank" rel="noopener noreferrer">PR #14761</a></li>
<li>@xprilion made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/14416" target="_blank" rel="noopener noreferrer">PR #14416</a></li>
<li>@CH-GAGANRAJ made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/14779" target="_blank" rel="noopener noreferrer">PR #14779</a></li>
<li>@otaviofbrito made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/14778" target="_blank" rel="noopener noreferrer">PR #14778</a></li>
<li>@danielmklein made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/14639" target="_blank" rel="noopener noreferrer">PR #14639</a></li>
<li>@Jetemple made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/14826" target="_blank" rel="noopener noreferrer">PR #14826</a></li>
<li>@akshoop made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/14818" target="_blank" rel="noopener noreferrer">PR #14818</a></li>
<li>@hazyone made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/14821" target="_blank" rel="noopener noreferrer">PR #14821</a></li>
<li>@leventov made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/14816" target="_blank" rel="noopener noreferrer">PR #14816</a></li>
<li>@fabriciojoc made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/10955" target="_blank" rel="noopener noreferrer">PR #10955</a></li>
<li>@onlylonly made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/14845" target="_blank" rel="noopener noreferrer">PR #14845</a></li>
<li>@Copilot made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/14869" target="_blank" rel="noopener noreferrer">PR #14869</a></li>
<li>@arsh72 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/14899" target="_blank" rel="noopener noreferrer">PR #14899</a></li>
<li>@berri-teddy made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/14914" target="_blank" rel="noopener noreferrer">PR #14914</a></li>
<li>@vpbill made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/14415" target="_blank" rel="noopener noreferrer">PR #14415</a></li>
<li>@kgritesh made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/14893" target="_blank" rel="noopener noreferrer">PR #14893</a></li>
<li>@oytunkutrup1 made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/14858" target="_blank" rel="noopener noreferrer">PR #14858</a></li>
<li>@nherment made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/14933" target="_blank" rel="noopener noreferrer">PR #14933</a></li>
<li>@deepanshululla made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/14974" target="_blank" rel="noopener noreferrer">PR #14974</a></li>
<li>@TeddyAmkie made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/14758" target="_blank" rel="noopener noreferrer">PR #14758</a></li>
<li>@SmartManoj made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/14775" target="_blank" rel="noopener noreferrer">PR #14775</a></li>
<li>@uc4w6c made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/14720" target="_blank" rel="noopener noreferrer">PR #14720</a></li>
<li>@luizrennocosta made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/14783" target="_blank" rel="noopener noreferrer">PR #14783</a></li>
<li>@AlexsanderHamir made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/14827" target="_blank" rel="noopener noreferrer">PR #14827</a></li>
<li>@dharamendrak made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/14721" target="_blank" rel="noopener noreferrer">PR #14721</a></li>
<li>@TomeHirata made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/14164" target="_blank" rel="noopener noreferrer">PR #14164</a></li>
<li>@mrFranklin made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/14860" target="_blank" rel="noopener noreferrer">PR #14860</a></li>
<li>@luisfucros made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/14866" target="_blank" rel="noopener noreferrer">PR #14866</a></li>
<li>@huangyafei made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/14879" target="_blank" rel="noopener noreferrer">PR #14879</a></li>
<li>@thiswillbeyourgithub made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/14949" target="_blank" rel="noopener noreferrer">PR #14949</a></li>
<li>@Maximgitman made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/14965" target="_blank" rel="noopener noreferrer">PR #14965</a></li>
<li>@subnet-dev made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/14938" target="_blank" rel="noopener noreferrer">PR #14938</a></li>
<li>@22mSqRi made their first contribution in <a href="https://github.com/BerriAI/litellm/pull/14972" target="_blank" rel="noopener noreferrer">PR #14972</a></li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="full-changelog"><strong><a href="https://github.com/BerriAI/litellm/compare/v1.77.3.rc.1...v1.77.5.rc.1" target="_blank" rel="noopener noreferrer">Full Changelog</a></strong><a href="https://docs.litellm.ai/release_notes/v1-77-5#full-changelog" class="hash-link" aria-label="Direct link to full-changelog" title="Direct link to full-changelog">​</a></h2>]]></content>
        <author>
            <name>Krrish Dholakia</name>
            <uri>https://www.linkedin.com/in/krish-d/</uri>
        </author>
        <author>
            <name>Ishaan Jaff</name>
            <uri>https://www.linkedin.com/in/reffajnaahsi/</uri>
        </author>
    </entry>
</feed>