Releases · tensorzero/tensorzero

Caution

Security Advisory

This release fixed a high-risk vulnerability affecting the TensorZero Gateway.

Please refer to the security advisory for more details: GHSA-824w-x939-6cmc

New Features

Accept either a string or an array of strings for stop in the OpenAI-compatible inference endpoint (thanks @pragnyanramtha).
Emit additional OpenInference attributes for Arize compatibility.

Bug Fixes

Treat SSE body decoding errors as fatal.

Caution

Breaking Changes

The UI will now require authentication when the gateway requires authentication. Previously, the UI only required authentication for gateway usage.

New Features

Improve error handling (e.g. status code propagation) and logging for complex streaming inferences (e.g. fallbacks).

& multiple under-the-hood and UI improvements (thanks @arisp)

Caution

Breaking Changes

The gateway now defaults to async observability writes to reduce tail latency: inferences are sent to the client before they are persisted in the database. To restore the previous behavior, set observability.async_writes = false. [docs]
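
As a minimal sketch, the opt-out described above might look like this in the gateway's TOML configuration. The key path is copied verbatim from the note. Whether it sits at the top level or under an enclosing table is an assumption to confirm in the linked docs.

```toml
# Restore synchronous observability writes (the behavior before this release).
# Assumption: the dotted key is written exactly as the release note gives it;
# check the docs for the table it belongs under in your configuration file.
observability.async_writes = false
```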

Warning

Deprecations

Removed the TensorZero Autopilot "Sessions" page from the UI. We recently added a TensorZero MCP that integrates nicely with coding agents, and we'll re-introduce advanced TensorZero Autopilot workflows in a platform-agnostic format soon.

Bug Fixes

Return HTTP status code 429 for rate limiting errors.
Fixed a bug affecting ClickHouse database names with hyphens. (thanks @ianliuy!)

New Features

Added TypeScript evaluators (for inference evaluations).
Added support for vLLM's new reasoning field.
Added aggregated variant usage data (tokens, cost, etc.) to the UI.
Added inference cost data to exported OpenTelemetry traces. (thanks @kimsehwan96!)
Added the export.otlp.traces.include_content configuration field (default: false) to include inference content (e.g. prompts, messages) in exported OpenTelemetry GenAI traces.
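
For illustration, opting in to content export could look roughly like the fragment below. The key path is taken as written in the note above. Its exact placement within the configuration file is an assumption, not something these notes confirm.

```toml
# Include prompts and messages in exported OpenTelemetry GenAI traces.
# Assumption: key path copied verbatim from the release note; the field
# defaults to false, so content is omitted unless this is set.
export.otlp.traces.include_content = true
```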

& multiple under-the-hood and UI improvements

New Features

Add an MCP server to the gateway that exposes its API at /mcp.
Report provider prompt caching statistics via API and UI.
Report usage statistics (e.g. tokens, latency, cost) for inference evaluations via CLI tool, API, and UI.
Add the Prometheus metrics tensorzero_input_tokens_total and tensorzero_output_tokens_total.
Add configuration field content_type_overrides to handle file inputs for long-tail providers.

& multiple under-the-hood and UI improvements

Warning

Planned Deprecations

The configuration for inference evaluations should be nested under the relevant functions moving forward [docs]. You can run evaluations by providing a function name and a list of evaluators. The legacy format will be removed in a future release.
```
[functions.write_haiku.evaluators.exact_match]
type = "exact_match"
```
The legacy implementation of GEPA (launch_optimization with GEPAConfig) will be removed in a future release. Please use t0.optimization.gepa.launch instead. [docs]

Bug Fixes

Fixed a UI bug where a custom gateway base_path was not handled correctly in certain routes. (thanks @wangfenjin!)

New Features

Started including embeddings requests in the Prometheus metrics tensorzero_requests_total and tensorzero_inferences_total.
Added the configuration field observability.batch_writes.write_queue_capacity to enable backpressure for observability data in the gateway.
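
A hedged sketch of the backpressure setting named above. The value is arbitrary and purely illustrative: the note states neither a default nor the unit of the capacity, and the key's enclosing table is likewise an assumption.

```toml
# Bound the queue of pending observability writes so the gateway can apply backpressure.
# Assumption: 10000 is a made-up example value, not a documented default or recommendation.
observability.batch_writes.write_queue_capacity = 10000
```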

& multiple under-the-hood and UI improvements (thanks @majiayu000)!

Important

🆕 TensorZero Autopilot

TensorZero Autopilot is an automated AI engineer powered by TensorZero that analyzes LLM observability data, sets up evals, optimizes prompts and models, and runs A/B tests.

It dramatically improves the performance of LLM agents across diverse tasks:

[Figure: bar chart showing baseline vs. optimized scores across diverse LLM tasks]

Bug Fixes

Fixed two edge cases affecting batch inference.
Fixed a UI bug affecting "Try with..." with inputs that include base64 files.
Removed assistant message prefill for JSON functions used with Anthropic (prefill is deprecated by Anthropic).

New Features

Added an implementation of GEPA (automated prompt engineering) based on durable workflows.
Allow users to specify duplicate tool calls in all_of tool evaluators to evaluate parallel tool calling.
Allow users to specify an expiration date for API keys in the UI. (thanks @eibrahim95)
Allow users to specify object_storage.endpoint = "env::MY_ENV_VAR" in addition to static values. (thanks @Meredith2328)
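
The environment-variable form from the last item can be sketched as follows. MY_ENV_VAR is the placeholder name used in the note. The static URL in the comment is hypothetical, and any other object storage fields a real configuration needs are omitted here.

```toml
# Read the object storage endpoint from the MY_ENV_VAR environment variable.
object_storage.endpoint = "env::MY_ENV_VAR"

# A static value remains valid (hypothetical URL, shown for comparison only):
# object_storage.endpoint = "https://storage.example.com"
```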

& multiple under-the-hood and UI improvements (thanks @majiayu000)!

Bug Fixes

Fixed a UI issue that prevented certain pages from rendering when depending on historical configuration.

New Features

Added Postgres as an alternative observability backend to ClickHouse. Postgres is the simplest way to get started; we recommend ClickHouse if you're handling >100 RPS.
Added the openrouter::xxx short-hand for embedding models.
Added support for per-session API keys in the browser (instead of a global environment variable) when auth is enabled.

& multiple under-the-hood and UI improvements!

Warning

Completed Deprecations

Removed the deprecated model_provider_name filter for extra_body and extra_headers. Please use model_name and provider_name instead.
Removed the legacy experimental list_inferences endpoint and method. Please use the new endpoint instead. [docs]
Removed several long-deprecated types and methods from the TensorZero Python SDK.

Warning

Planned Deprecations

The embedded gateway in the TensorZero Python SDK will be removed in a future release (2026.6+). patch_openai_client and build_embedded are deprecated. Please deploy a standalone TensorZero Gateway instead (usage: base_url for OpenAI SDK; build_http for TensorZero SDK).
The variant configuration field weight will be removed in a future release (2026.6+). Please use the new experimentation configuration semantics. [docs]

Bug Fixes

Fixed a compatibility bug in the Valkey-based caching that affected only Redis.

New Features

Added support for launching optimization workflows with dataset_name (instead of an inference query) in launch_optimization_workflow.

& multiple under-the-hood and UI improvements!

Versions listed on this page, newest first:

2026.6.0
2026.5.2
2026.5.1
2026.5.0
2026.4.1
2026.4.0
2026.3.4 (🆕 TensorZero Autopilot)
2026.3.3
2026.3.2
2026.3.1