examples/blog: can-ai-agents-improve-ai-agents starter #7404
Open
anndvision wants to merge 2 commits into
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 320f15cb1c
Branch updated from 320f15c to c0ddecd.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit c0ddecd.
Branch updated from 4264444 to 09df5dd.
Branch updated from 09df5dd to 8dbd47a.

Summary
Adds the companion code for the "Can AI Agents Improve AI Agents?" blog post under `examples/blog/can-ai-agents-improve-ai-agents/`. It is a self-contained starter that lets a reader's coding agent (Claude Code or Codex) optimize a TensorZero function on the YC Bench Tutorial environment, using a gateway run locally with Docker Compose, a Markdown skill, and the traces from a real baseline rollout.
What's in the starter
- `docker-compose.yml`: three services: `tensorzero/postgres:17` (with `cron.database_name=tensorzero` for pg_cron), a one-shot `gateway-run-postgres-migrations`, and the gateway itself. Postgres is required because the experimentation routing the agent uses (`track_and_stop` / adaptive) writes per-variant trial counts and means.
- `tensorzero.toml`: function `yc_bench_tutorial_v0::yc_bench_act` on `openai::gpt-5.4-mini`, a single `initial` variant, and `[gateway.observability] backend = "postgres"`.
- `functions/yc_bench_tutorial_v0::yc_bench_act/initial/`: the initial system and user templates plus `user_schema.json` (legacy per-role config style).
- `tools/run_command.json`: the YC Bench agent's only tool. It is exposed so the gateway accepts well-formed inference requests; the simulator itself is not installed locally.
- `baseline_data/`: `.gitignore` (excludes the two large JSONL files), `fetch.sh` (downloads from a sibling anndvision/data Git LFS host with SHA-256 verification), and `initial_config/` (a frozen day-one snapshot of the live config tree, used as a read-only reference once the agent starts editing).
- `README.md`: quick start (manual and "from the blog button"), what's in the starter, and troubleshooting.
- `SKILL.md`: the agent's playbook. Major sections:
  - … `.env` over a same-shell export), bring up the stack, fetch the baseline traces, smoke-test the `initial` variant, and print a "ready" handoff.
  - … `tasks_succeeded` mean = 0.800 on the 20-episode test split), JSONL projection patterns (`grep` → `node -e` projection, with examples adapted from the eval harness's `skill_no_mcp.md`).
  - … `track_and_stop` block.
Notes
- `inferences.jsonl` is 98 MiB, well over the repo's 1 MiB `check-added-large-files` threshold. It is fetched on demand via Git LFS in anndvision/data#1 (already merged); `fetch.sh` is idempotent, verifies SHA-256 checksums, and is gated behind a `.gitignore` so the blobs never land here.
- … `OPENAI_API_KEY` isn't set or Docker isn't running (per-OS daemon hints, gitignored `.env` over shell export, no `--privileged` / root recommendations).
Out of scope
- … `codex://` URL scheme; the blog will ship Claude Code only and a prose pointer for Codex users).
Test plan
- `docker compose up -d` end-to-end on a clean machine
- `bash baseline_data/fetch.sh` downloads + verifies
- `node -e` snippets run against real data
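`SKILL.md` has the agent project the large trace files by piping `grep` into `node -e`, so a whole file is never loaded at once. The sketch below shows the same filter-then-project idea in Python rather than Node; it is not one of the starter's snippets. The function name `project_jsonl`, the sample records, and the field names are hypothetical and are not taken from the real schema of `inferences.jsonl`. It assumes each line is a JSON object.

```python
import io
import json
from typing import Iterable, Iterator


def project_jsonl(
    lines: Iterable[str], needle: str, fields: list[str]
) -> Iterator[dict]:
    """Yield {field: value} for each JSON line whose raw text contains `needle`.

    The substring test plays the role of `grep`: it is cheap and lets us skip
    json.loads() on lines that cannot match. Only top-level keys are projected;
    a key missing from a record comes back as None.
    """
    for raw in lines:
        if needle not in raw:
            continue  # grep-style prefilter on the unparsed text
        text = raw.strip()
        if not text:
            continue  # tolerate blank lines
        record = json.loads(text)
        yield {key: record.get(key) for key in fields}


if __name__ == "__main__":
    # Two made-up records (hypothetical fields); real trace files have their own schema.
    sample = io.StringIO(
        '{"id": "a1", "variant_name": "initial", "score": 1}\n'
        '{"id": "b2", "variant_name": "candidate", "score": 0}\n'
    )
    for row in project_jsonl(sample, '"initial"', ["id", "variant_name"]):
        print(row)  # prints {'id': 'a1', 'variant_name': 'initial'}
```

Against the real data you would pass an open file handle, for example `open("baseline_data/inferences.jsonl", encoding="utf-8")` (assuming that is where the file lands after `fetch.sh` runs); iterating the handle streams one line at a time, so memory stays flat even for a ~100 MiB file.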
Low Risk
Low risk: this PR only adds a new self-contained example directory (docs, compose config, and fetch script) and does not change runtime/library code paths.
Overview
Adds a new `examples/blog/can-ai-agents-improve-ai-agents/` starter used in the accompanying blog post, including a runnable Docker Compose stack (gateway + Postgres + one-shot migrations) and a baseline `tensorzero.toml`/prompt template setup for `yc_bench_tutorial_v0::yc_bench_act`.
Includes an agent-oriented playbook (`SKILL.md`) plus a `baseline_data/fetch.sh` downloader with SHA-256 verification and gitignored large trace files, along with README quick-start and troubleshooting guidance.
Reviewed by Cursor Bugbot for commit 8627461. Bugbot is set up for automated code reviews on this repo.
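The `baseline_data/fetch.sh` downloader is a bash script; the sketch below restates, in Python, the two properties the PR claims for it: idempotence and SHA-256 verification. The names `ensure_file`, `sha256_of`, and `http_download`, the injectable `download` callback, and the download-to-temp-then-rename step are assumptions of this sketch rather than details of `fetch.sh`. It also assumes a plain HTTPS URL that serves the file bytes directly.

```python
import hashlib
import os
import tempfile
import urllib.request
from typing import Callable


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a file, hashing it in chunks to bound memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def http_download(url: str, dest: str) -> None:
    """Hypothetical default downloader: fetch `url` into `dest` with the stdlib."""
    urllib.request.urlretrieve(url, dest)


def ensure_file(
    path: str,
    url: str,
    expected_sha256: str,
    download: Callable[[str, str], None] = http_download,
) -> bool:
    """Ensure `path` holds a file whose SHA-256 equals `expected_sha256`.

    Returns True if a download happened, False if the existing file already
    verified. Skipping verified files is what makes repeated runs idempotent.
    """
    expected = expected_sha256.lower()
    if os.path.isfile(path) and sha256_of(path) == expected:
        return False  # present and verified: nothing to do

    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    # Download into a temp file in the same directory so the final rename is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".part")
    os.close(fd)
    try:
        download(url, tmp_path)
        actual = sha256_of(tmp_path)
        if actual != expected:
            raise ValueError(
                f"SHA-256 mismatch for {url}: expected {expected}, got {actual}"
            )
        os.replace(tmp_path, path)  # only a verified file ever reaches `path`
    finally:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # clean up after a failed or mismatched download
    return True
```

Because the hash is checked before `os.replace`, an interrupted or corrupted download never leaves a partial file at the target path, so rerunning the fetch is always safe.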