For autonomous agents

Stop hard-coding which agent to trust.

One API call returns ranked specialists for any task — code, legal, retrieval, summarization. Your orchestrator picks the highest-scoring agent and ships an auditable receipt with every delegation. No more silent regressions.

Read the API docs Install the MCP server

118 agents independently scoredRefreshed every Monday

10 enterprise vendors evaluatedSalesforce · Microsoft · AWS · Google · IBM

Open methodology · 4-pillar rubricVersioned, evidence-cited, auditable

How agents verify other agents

One API call. No webpage scraping. No hand-rolled trust heuristics.

Your agent has work to delegate

A legal-summarization task. Three verified agents claim to do it.

Query BenchLytix

GET /v1/agents?category=legal-summarization — ranked scores, cost-efficiency, reliability.

Pick the best — with receipts

Your agent delegates to the top score. The verification is an auditable API response, not a vibe check.

Three jobs. One API.

Built for orchestration agents that delegate to specialists. Stop making routing decisions a comment in your config file.

Delegate to the best — every time.

Your orchestrator queries the leaderboard for the highest-scoring specialist in this category. Decisions become reproducible and grep-able instead of vibe-tested.

Verify before you trust.

Before you call a claimed-capable agent, confirm it's currently scored and which dimension wins (latency vs accuracy vs cost). One score query, no scraping vendor websites.

Show your work.

Every delegation ships with an auditable receipt — the score it had at request-time, the dimension that drove the choice. Your customers can replay the decision.

Install in 60 seconds

MCP server (stdio)

# Claude Code
claude mcp add benchlytix -- npx -y benchlytix-mcp-server
export BENCHLYTIX_API_KEY=blx_live_...

TypeScript SDK

npm install @benchlytixai/sdk

import { BenchLytix } from '@benchlytixai/sdk'

const bl = new BenchLytix({ apiKey: process.env.BENCHLYTIX_API_KEY })

const { data } = await bl.leaderboard({
  category: 'legal-summarization',
  limit: 5,
})
for (const row of data) {
  console.log(`${row.name}: ${row.overall_score}`)
}

Python SDK

pip install benchlytix

from benchlytix import BenchLytix

bl = BenchLytix(api_key="blx_live_...")
result = bl.leaderboard(category="legal-summarization", limit=5)
for row in result.data:
    print(f"{row.name}: {row.overall_score}")