
Add YC Bench baseline rollout data for can-ai-agents-improve-ai-agents starter#1

Merged: anndvision merged 1 commit into main from add-can-ai-agents-baseline-data on Apr 27, 2026.

@anndvision (Owner)

Summary

Hosts the baseline rollout traces consumed by the can-ai-agents-improve-ai-agents starter, the companion code for the "Can AI Agents Improve AI Agents?" TensorZero blog post.

The starter ships in tensorzero/tensorzero, but the inference traces (98 MiB) are too large to live there: that repo's pre-commit hooks reject files over 1 MiB, and it doesn't use Git LFS. Instead, the starter's baseline_data/fetch.sh downloads these JSONL files from this repo's LFS-backed media URLs.

Both files are tracked through Git LFS: a *.jsonl filter=lfs ... rule was added to .gitattributes, next to the existing *.csv LFS rule.
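A quick way to check which paths a .gitattributes file routes through LFS is sketched below in Python. It assumes the rule carries the attributes that git lfs track normally writes (filter=lfs diff=lfs merge=lfs -text); the PR abbreviates the exact line. The matcher is a deliberate simplification: it applies git's rule that slash-free patterns match the basename, but it ignores negated attributes, last-match-wins precedence, and the fact that fnmatch's * also crosses /.

```python
import fnmatch
import posixpath


def lfs_patterns(gitattributes_text: str) -> list[str]:
    """Return every pattern whose attribute list includes filter=lfs."""
    patterns = []
    for raw in gitattributes_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, *attrs = line.split()
        if "filter=lfs" in attrs:
            patterns.append(pattern)
    return patterns


def is_lfs_tracked(path: str, patterns: list[str]) -> bool:
    """Approximate git's matching: slash-free patterns test the basename only."""
    name = posixpath.basename(path)
    for pattern in patterns:
        target = path if "/" in pattern else name
        if fnmatch.fnmatchcase(target, pattern):
            return True
    return False
```

For an authoritative answer on a real checkout, git check-attr filter -- <path> reports the effective filter attribute, and the sketch above is only a local approximation of it.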

Files

| File | Rows | Size | SHA-256 |
| --- | ---: | ---: | --- |
| can-ai-agents-improve-ai-agents/baseline_data/inferences.jsonl | 1,380 | 98 MiB | 9bac777bcedd790146ed082252ad77d41496f19f1beaecc8acac12cffe55d176 |
| can-ai-agents-improve-ai-agents/baseline_data/feedback.jsonl | 320 | 36 KiB | e59685147aea4679d6e617e39e12cd8474e052649f0eb48ccca9f6b2a6fe319d |
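The Rows column is the number of JSON records in each file. A minimal Python sketch for re-checking those counts locally follows. It assumes the usual JSONL convention of one JSON value per line and skips blank lines; EXPECTED_ROWS and check_row_counts are hypothetical helpers that simply restate the counts from the table.

```python
import json
from pathlib import Path

# Row counts copied from the Files table (hypothetical helper, keyed by file name).
EXPECTED_ROWS = {
    "inferences.jsonl": 1380,
    "feedback.jsonl": 320,
}


def count_jsonl_rows(path: Path) -> int:
    """Count non-blank lines, failing loudly if any line is not valid JSON."""
    rows = 0
    with path.open("r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON") from exc
            rows += 1
    return rows


def check_row_counts(directory: Path) -> None:
    """Compare each downloaded file in `directory` against EXPECTED_ROWS."""
    for name, expected in EXPECTED_ROWS.items():
        actual = count_jsonl_rows(directory / name)
        if actual != expected:
            raise ValueError(f"{name}: expected {expected} rows, found {actual}")
```

Parsing every line (rather than just counting newlines) costs a little more on the 98 MiB file but also catches truncated downloads that end mid-record.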

Provenance

This is a real baseline rollout of yc_bench_tutorial_v0::yc_bench_act against the initial variant on openai::gpt-5.4-mini, covering 80 unique train tasks and 20 unique test tasks (Codex YC Bench seed 0, 2026-04-23). The files match the artifacts that the autopilot-evals harness dumps to <run_dir>/claude_code/baseline_data/ before it invokes the optimizer agent.

Raw URLs (post-merge)

https://media.githubusercontent.com/media/anndvision/data/main/can-ai-agents-improve-ai-agents/baseline_data/inferences.jsonl
https://media.githubusercontent.com/media/anndvision/data/main/can-ai-agents-improve-ai-agents/baseline_data/feedback.jsonl

(LFS-backed files must be fetched from media.githubusercontent.com/media/..., which serves the actual content; raw.githubusercontent.com returns only the LFS pointer file.)
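The URL rewrite and a guard against accidentally downloading a pointer can be sketched in Python. The prefix swap mirrors the two URLs listed above. The pointer format (a version https://git-lfs.github.com/spec/v1 line, then oid sha256:<hex> and size <bytes>) comes from the Git LFS pointer spec, where the oid is the SHA-256 of the real content, so it should equal the hash in the Files table. The function names are hypothetical and not taken from fetch.sh.

```python
RAW_PREFIX = "https://raw.githubusercontent.com/"
MEDIA_PREFIX = "https://media.githubusercontent.com/media/"
LFS_POINTER_HEADER = b"version https://git-lfs.github.com/spec/v1"


def to_media_url(raw_url: str) -> str:
    """Rewrite a raw.githubusercontent.com URL to its LFS media equivalent."""
    if not raw_url.startswith(RAW_PREFIX):
        raise ValueError(f"not a raw.githubusercontent.com URL: {raw_url}")
    return MEDIA_PREFIX + raw_url[len(RAW_PREFIX):]


def looks_like_lfs_pointer(first_bytes: bytes) -> bool:
    """True if the downloaded bytes are an LFS pointer rather than real content."""
    return first_bytes.startswith(LFS_POINTER_HEADER)


def parse_lfs_pointer(data: bytes) -> tuple[str, int]:
    """Extract (sha256 hex digest, size in bytes) from an LFS pointer file."""
    fields = dict(
        line.split(" ", 1) for line in data.decode("utf-8").splitlines() if line
    )
    oid = fields["oid"]
    if not oid.startswith("sha256:"):
        raise ValueError(f"unsupported oid: {oid}")
    return oid[len("sha256:"):], int(fields["size"])
```

Checking the first bytes of a download is cheap and turns a silent "downloaded a 130-byte text file" mistake into an explicit error.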

Test plan

  • Both URLs resolve after merge and serve the correct bytes (their SHA-256 digests match the table above).
  • Running bash baseline_data/fetch.sh from the starter downloads and verifies both files.
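The download-and-verify step in the test plan can be sketched in Python. This is not the starter's fetch.sh (a bash script not shown here); it is an equivalent written under the assumption that the script downloads each file and compares its SHA-256 with the table. BASELINE_FILES and fetch_and_verify are hypothetical names; the URLs and digests are copied from this PR.

```python
import hashlib
import shutil
import tempfile
import urllib.request
from pathlib import Path

# Hypothetical manifest: file name -> (media URL, expected SHA-256), values from this PR.
BASELINE_FILES = {
    "inferences.jsonl": (
        "https://media.githubusercontent.com/media/anndvision/data/main/can-ai-agents-improve-ai-agents/baseline_data/inferences.jsonl",
        "9bac777bcedd790146ed082252ad77d41496f19f1beaecc8acac12cffe55d176",
    ),
    "feedback.jsonl": (
        "https://media.githubusercontent.com/media/anndvision/data/main/can-ai-agents-improve-ai-agents/baseline_data/feedback.jsonl",
        "e59685147aea4679d6e617e39e12cd8474e052649f0eb48ccca9f6b2a6fe319d",
    ),
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files never load fully into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def fetch_and_verify(url: str, dest: Path, expected_sha256: str) -> Path:
    """Download url to dest, replacing dest only if the SHA-256 matches."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same directory, then rename into place,
    # so a failed or corrupted download never clobbers a good copy.
    tmp = tempfile.NamedTemporaryFile(dir=dest.parent, delete=False)
    tmp_path = Path(tmp.name)
    try:
        with tmp, urllib.request.urlopen(url) as resp:
            shutil.copyfileobj(resp, tmp)
        actual = sha256_of(tmp_path)
        if actual != expected_sha256:
            raise ValueError(f"SHA-256 mismatch for {url}: got {actual}")
        tmp_path.replace(dest)
    except BaseException:
        tmp_path.unlink(missing_ok=True)
        raise
    return dest
```

Usage: for name, (url, digest) in BASELINE_FILES.items(): fetch_and_verify(url, Path("baseline_data") / name, digest). Renaming only after the hash check means a rerun never leaves a half-written or pointer-sized file that looks complete.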

anndvision merged commit f692405 into main on Apr 27, 2026.
anndvision deleted the add-can-ai-agents-baseline-data branch on Apr 27, 2026, 02:06.