
Tech

VCN #45: Bench

When
Wednesday, July 29 · 7:00 PM – 10:00 PM
Listed by
Lu.ma — Frontier Tower
Heads up: this is a hands-on build night and bringing a laptop is mandatory.

You don't know if your coding agent is good. You have a vibe. Tonight we replace the vibe with a number. Everyone ships an agent and says it "feels solid." Nobody can tell you its pass rate on a task it has never seen. VCN #45: Bench is the night you build a real eval and start measuring.

Format:

The walkthrough. What a coding-agent eval actually is. SWE-bench-style task sets, where they come from, and why a public benchmark tells you almost nothing about YOUR repo.

Assemble your own bench. Pull real tasks out of your own codebase: a bug with a known fix, a refactor with a clear oracle, a feature with a passing test. Five tasks beats a thousand you can't trust.

Build the harness. Wire a deterministic oracle per task (the test that decides pass or fail, no LLM judge). Run the agent. Score it. Then run it again, and again, and measure pass@k AND pass^k, the consecutive-green reliability that actually predicts whether you can leave it alone.

Read the leaderboard. Your agent's real numbers on your real tasks. Where it's flaky, where it's solid, what a single failing oracle just told you.

By 10pm you have a repeatable eval that scores any coding agent on your own tasks. We reuse this exact bench next Saturday at the Bake-Off (#46) to score agents head to head.

Builders only. Bring a repo with at least one test you trust. Doors 7pm. Walkthrough 7:30. Frontier Tower Floor 10.

Hosted by Vibe Coding Nights: Rayyan Zahid (Immersive Commons), Michalis Vasileiadis (Hacker Bob), Eric Mockler (AI Geneticist), Devinder Sodhi (Learning Layer Labs). Facilitator: Rayyan Zahid. Guest speaker TBD (open call).

Your ticket includes z.ai + Claude Code for the session and Nebius Token Factory credits to run the labs.

RSVP if you ship a coding agent and can't currently put a number on how good it is.

Frontier Tower members: your ticket is on us. Reach out to the team directly and we'll get you a free RSVP.
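For a feel of the scoring step in "Build the harness", here is a minimal sketch in Python. It is not the session's actual harness. It assumes you have already run the agent n times per task and recorded each oracle verdict as True or False. The function names and task IDs are hypothetical. pass@k uses the common combinatorial estimator (chance that at least one of k sampled runs passes), and pass^k is assumed here to mean the chance that all k sampled runs pass, which is one reading of "consecutive-green reliability".

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate P(at least one of k runs passes), given c passes in n runs."""
    if not (0 < k <= n and 0 <= c <= n):
        raise ValueError("need 0 < k <= n and 0 <= c <= n")
    # comb(n - c, k) is 0 when there are fewer than k failures to draw from.
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Estimate P(all k runs pass), given c passes in n runs (assumed pass^k)."""
    if not (0 < k <= n and 0 <= c <= n):
        raise ValueError("need 0 < k <= n and 0 <= c <= n")
    # comb(c, k) is 0 when there are fewer than k passes to draw from.
    return comb(c, k) / comb(n, k)


def score(results: dict[str, list[bool]], k: int) -> dict[str, tuple[float, float]]:
    """Map each task ID to (pass@k, pass^k) from its recorded oracle verdicts."""
    board = {}
    for task_id, verdicts in results.items():
        n, c = len(verdicts), sum(verdicts)
        board[task_id] = (pass_at_k(n, c, k), pass_hat_k(n, c, k))
    return board


if __name__ == "__main__":
    # Hypothetical verdicts: four runs per task, True means the oracle went green.
    runs = {
        "fix-known-bug": [True, True, False, True],
        "refactor-with-oracle": [True, True, True, True],
    }
    for task_id, (at_k, hat_k) in score(runs, k=2).items():
        print(f"{task_id}: pass@2={at_k:.2f} pass^2={hat_k:.2f}")
```

In this made-up data the first task is green three times out of four, so its pass@2 is 1.00 while its pass^2 is only 0.50. That gap is the flakiness a single pass-rate vibe hides.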
