examples/blog: can-ai-agents-improve-ai-agents starter by anndvision · Pull Request #7404 · tensorzero/tensorzero

anndvision · 2026-04-28T20:58:11Z

Summary

Adds the companion code for the Can AI Agents Improve AI Agents? blog post under examples/blog/can-ai-agents-improve-ai-agents/. This is a self-contained starter that lets a reader's coding agent (Claude Code or Codex) optimize a TensorZero function on the YC Bench Tutorial environment using a local Docker Compose gateway, a markdown skill, and traces from a real baseline rollout.

What's in the starter

docker-compose.yml — three services: tensorzero/postgres:17 (with cron.database_name=tensorzero for pg_cron), a one-shot gateway-run-postgres-migrations, and the gateway itself. Postgres is required because the experimentation routing the agent uses (track_and_stop / adaptive) writes per-variant trial counts and means.
tensorzero.toml — function yc_bench_tutorial_v0::yc_bench_act on openai::gpt-5.4-mini, single initial variant, [gateway.observability] backend = "postgres".
functions/yc_bench_tutorial_v0::yc_bench_act/initial/ — initial system + user templates and user_schema.json (legacy per-role config style).
tools/run_command.json — the YC Bench agent's only tool. Exposed so the gateway accepts well-formed inference requests; the simulator itself is not installed locally.
baseline_data/ — .gitignore (excludes the two large jsonls), fetch.sh (downloads from a sibling anndvision/data Git LFS host with SHA-256 verification), initial_config/ (frozen day-one snapshot of the live config tree, used as a read-only reference once the agent starts editing).
README.md — quick-start (manual + "from the blog button"), what's in the starter, troubleshooting.
SKILL.md — the agent's playbook. Major sections:
- Setup — six numbered steps the agent runs end-to-end on a fresh laptop: locate or clone the repo, make the API key reachable to the gateway (recommends a gitignored .env over a same-shell export), bring up the stack, fetch the baseline traces, smoke-test the initial variant, print a "ready" handoff.
- Environment / Task / Available Models / Baseline data — environment metadata, real baseline numbers (tasks_succeeded mean = 0.800 on the 20-episode test split), JSONL projection patterns (grep → node -e projection, with examples adapted from the eval harness's skill_no_mcp.md).
- Methodology — survey → diagnose → write → restart → probe → iterate → exit. Deliberately does not enumerate failure modes; the agent forms its own hypotheses from the worst-scoring test episodes.
- Routing: Experimentation Config — example track_and_stop block.
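
The projection pattern from the Baseline data section and the "start from the worst-scoring test episodes" step in Methodology can be sketched roughly as below. This is a minimal Python sketch, not the skill's actual node -e snippets. The field names (`episode_id`, `split`, `tasks_succeeded`, `extra`) and the sample records are assumptions for illustration and do not reflect the real schema of the baseline JSONL files.

```python
import json
from statistics import mean


def project(lines, fields):
    """Parse JSONL lines and keep only the requested fields (a 'projection')."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        rows.append({name: record.get(name) for name in fields})
    return rows


def worst_episodes(rows, metric, k):
    """Return the k rows with the lowest metric value (ties keep input order)."""
    return sorted(rows, key=lambda row: row[metric])[:k]


# Hypothetical records; the real files use their own schema.
sample = [
    '{"episode_id": "ep-1", "split": "test", "tasks_succeeded": 1, "extra": "x"}',
    '{"episode_id": "ep-2", "split": "test", "tasks_succeeded": 0, "extra": "y"}',
    '{"episode_id": "ep-3", "split": "train", "tasks_succeeded": 1, "extra": "z"}',
]

rows = project(sample, ["episode_id", "split", "tasks_succeeded"])
test_rows = [row for row in rows if row["split"] == "test"]
test_mean = mean(row["tasks_succeeded"] for row in test_rows)
worst = worst_episodes(test_rows, "tasks_succeeded", 1)
```

Projecting to a few fields before sorting keeps the output small enough for an agent to read, which is the point of filtering large trace files before inspecting them.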

Notes

Baseline traces hosted out-of-tree. inferences.jsonl is 98 MiB, well over the repo's check-added-large-files 1 MiB threshold. Fetched on demand via Git LFS in anndvision/data#1 (already merged); fetch.sh is idempotent, SHA-256-verifies, and is gated behind a .gitignore so the blobs never land here.
§ Setup detects state. It checks whether the agent is already in the example dir, already inside the cloned repo, or somewhere else entirely, and clones only when needed. It also surfaces concrete safe options when OPENAI_API_KEY isn't set or Docker isn't running (per-OS daemon hints, gitignored .env over shell export, no --privileged / root recommendations).
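
The idempotent, SHA-256-verified fetch described in the first note can be sketched as follows. This is a Python illustration of the general technique, not a port of fetch.sh. The `ensure_file` helper and its `download` callable are hypothetical names, and the real script's download mechanics and error handling may differ.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so a large JSONL never has to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ensure_file(path, expected_sha256, download):
    """Fetch `path` only if it is missing or fails verification.

    `download` is a hypothetical callable that writes the file to `path`.
    Returns "cached" when the existing file already matches, else "fetched".
    """
    path = Path(path)
    if path.exists() and sha256_of(path) == expected_sha256:
        return "cached"  # idempotent: a second run does no work
    download(path)
    actual = sha256_of(path)
    if actual != expected_sha256:
        path.unlink()  # never leave a corrupt blob behind
        raise ValueError(f"checksum mismatch for {path}: got {actual}")
    return "fetched"
```

Checking the hash before downloading is what makes a rerun cheap, and checking it again afterwards is what catches a truncated or tampered download.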

Out of scope

The blog-side "Open in Claude Code" CTA component — landing separately.
Codex deeplink (no documented codex:// URL scheme; the blog will ship Claude Code only and a prose pointer for Codex users).
Running the full evaluation harness — that's the user's job after the agent exits.

Test plan

docker compose up -d end-to-end on a clean machine
bash baseline_data/fetch.sh downloads + verifies
All in-skill node -e snippets run against real data
User manually walked through the deeplink flow with a real OpenAI key (no PR-blocking issues found)

Note

Low Risk
Low risk: this PR only adds a new self-contained example directory (docs, compose config, and fetch script) and does not change runtime/library code paths.

Overview
Adds a new examples/blog/can-ai-agents-improve-ai-agents/ starter used in the accompanying blog post, including a runnable Docker Compose stack (gateway + Postgres + one-shot migrations) and a baseline tensorzero.toml/prompt template setup for yc_bench_tutorial_v0::yc_bench_act.

Includes an agent-oriented playbook (SKILL.md) plus a baseline_data/fetch.sh downloader with SHA-256 verification and gitignored large trace files, along with README quick-start and troubleshooting guidance.

Reviewed by Cursor Bugbot for commit 8627461.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 320f15cb1c

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit c0ddecd.

anndvision requested a review from GabrielBianconi April 28, 2026 20:58

anndvision assigned GabrielBianconi Apr 28, 2026

chatgpt-codex-connector Bot reviewed Apr 28, 2026

Comment thread ...an-ai-agents-improve-ai-agents/functions/yc_bench_tutorial_v0__yc_bench_act/user_schema.json

Comment thread examples/blog/can-ai-agents-improve-ai-agents/SKILL.md Outdated

anndvision force-pushed the andrew/blog-can-ai-agents-starter branch from 320f15c to c0ddecd on April 28, 2026 21:08

cursor Bot reviewed Apr 28, 2026

Comment thread examples/blog/can-ai-agents-improve-ai-agents/docker-compose.yml

anndvision force-pushed the andrew/blog-can-ai-agents-starter branch 5 times, most recently from 4264444 to 09df5dd on April 29, 2026 14:31

examples/blog: can-ai-agents-improve-ai-agents starter

8dbd47a

anndvision force-pushed the andrew/blog-can-ai-agents-starter branch from 09df5dd to 8dbd47a on April 29, 2026 14:40

GabrielBianconi assigned virajmehta and unassigned GabrielBianconi May 4, 2026

Merge branch 'main' into andrew/blog-can-ai-agents-starter

8627461

anndvision wants to merge 2 commits into main from andrew/blog-can-ai-agents-starter
