VCN #45: Bench

- When: Wednesday, July 29 · 7:00 PM – 10:00 PM
- Listed by: Lu.ma, Frontier Tower
Heads up: this is a hands-on build night and bringing a laptop is mandatory.
You don't know if your coding agent is good. You have a vibe. Tonight we replace the vibe with a number.
Everyone ships an agent and says it "feels solid." Nobody can tell you its pass rate on a task it has never seen. VCN #45: Bench is the night you build a real eval and start measuring.
Format:
- The walkthrough. What a coding-agent eval actually is. SWE-bench-style task sets, where they come from, and why a public benchmark tells you almost nothing about YOUR repo.
- Assemble your own bench. Pull real tasks out of your own codebase: a bug with a known fix, a refactor with a clear oracle, a feature with a passing test. Five tasks beats a thousand you can't trust.
- Build the harness. Wire a deterministic oracle per task (the test that decides pass or fail, no LLM judge). Run the agent. Score it. Then run it again, and again, and measure pass@k AND pass^k, the consecutive-green reliability that actually predicts whether you can leave it alone.
- Read the leaderboard. Your agent's real numbers on your real tasks. Where it's flaky, where it's solid, what a single failing oracle just told you.
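For a feel of what the harness step might look like, here is a minimal Python sketch, not the session's actual code. It assumes the oracle is a test command whose exit code decides pass or fail, and that pass^k means "all k independent runs pass". The names `run_oracle`, `pass_at_k`, `pass_hat_k`, and `score_task` are hypothetical. The estimators are the usual combinatorial ones computed from c passes in n runs.

```python
from __future__ import annotations

import subprocess
import sys
from math import comb


def run_oracle(cmd: list[str], cwd: str | None = None, timeout: float = 600.0) -> bool:
    """Run one task's test command (hypothetical oracle); exit code 0 means pass."""
    try:
        result = subprocess.run(cmd, cwd=cwd, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False  # a hung test counts as a failure
    return result.returncode == 0


def _check(n: int, c: int, k: int) -> None:
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate P(at least one of k runs passes), given c passes in n runs."""
    _check(n, c, k)
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Estimate P(all k runs pass), given c passes in n runs (assumed meaning of pass^k)."""
    _check(n, c, k)
    return comb(c, k) / comb(n, k)


def score_task(outcomes: list[bool], k: int) -> dict[str, float]:
    """Turn a list of per-run pass/fail outcomes for one task into both metrics."""
    n, c = len(outcomes), sum(outcomes)
    return {"pass@k": pass_at_k(n, c, k), "pass^k": pass_hat_k(n, c, k)}


if __name__ == "__main__":
    # Stand-in oracle: a Python one-liner that always exits 0.
    # In a real bench this would be the task's own test command,
    # run after the agent has made its attempt on a fresh checkout.
    green = run_oracle([sys.executable, "-c", "raise SystemExit(0)"])
    print("oracle passed:", green)

    # 7 green runs out of 10, scored at k = 3.
    print(score_task([True] * 7 + [False] * 3, k=3))
```

With 7 passes in 10 runs and k = 3, pass@k comes out near 0.99 while pass^k is about 0.29. The same agent looks nearly perfect on one metric and unreliable on the other, which is why the night measures both.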
By 10pm you have a repeatable eval that scores any coding agent on your own tasks. We reuse this exact bench next Saturday at the Bake-Off (#46) to score agents head to head.
Builders only. Bring a repo with at least one test you trust.
Doors 7pm. Walkthrough 7:30. Frontier Tower Floor 10.
Hosted by Vibe Coding Nights: Rayyan Zahid (Immersive Commons), Michalis Vasileiadis (Hacker Bob), Eric Mockler (AI Geneticist), Devinder Sodhi (Learning Layer Labs).
Facilitator: Rayyan Zahid. Guest speaker TBD (open call).
Your ticket includes z.ai + Claude Code for the session and Nebius Token Factory credits to run the labs.
RSVP if you ship a coding agent and can't currently put a number on how good it is.
Frontier Tower members: your ticket is on us. Reach out to the team directly and we'll get you a free RSVP.

