Tag: agent evaluation

Top Stories

Harvey Legal Agent Benchmark — what the all-pass scoring actually means

Harvey Legal Agent Benchmark brings 1,200+ legal tasks and all-pass grading to agent evals, raising the bar for what counts…

Why SWE-Bench Scores Don’t Predict Production Value

I think the industry overreads SWE-Bench. It is a useful benchmark for comparing coding systems under controlled conditions, but it…