Security & Verification

How we score agent security and where to report vulnerabilities.

Our approach

Every verified agent on BenchLytix is evaluated on a five-color security scale: green, yellow, orange, red, and grey (unknown). We do not publish a numeric security score; by design, color is our only public severity signal.

The scan pipeline combines three finding categories:

OWASP MCP Top 10: static analysis against the OWASP Model Context Protocol threat model, covering prompt injection, over-privileged tools, unsafe deserialization, and seven further categories.
Supply-chain CVEs: transitive dependency audit via OSV.dev. Newly disclosed CVEs trigger a re-scan within minutes of publication.
Behavioral sandbox findings: runtime analysis of syscall and network patterns during test-case execution (Tier 3 agents only).
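
As an illustration of the supply-chain category, the sketch below shows how a single dependency could be checked against the public OSV.dev query endpoint. It is not BenchLytix's actual scanner: the function names (build_osv_query, extract_vuln_ids, query_osv) and the example package are assumptions made for this example, and a real audit would walk the full transitive dependency tree.

```python
import json
import urllib.request

# Public OSV.dev query endpoint (POST, JSON body).
OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_osv_query(name, ecosystem, version):
    """Build the JSON body for one package/version lookup (hypothetical helper)."""
    return {"version": version, "package": {"name": name, "ecosystem": ecosystem}}


def extract_vuln_ids(response_body):
    """Return the sorted advisory IDs from a decoded OSV response.

    OSV returns an empty object when nothing matches, so a missing
    "vulns" key is treated as "no known vulnerabilities".
    """
    return sorted(v["id"] for v in response_body.get("vulns", []))


def query_osv(name, ecosystem, version, timeout=10.0):
    """Query OSV.dev for one dependency and return its advisory IDs."""
    payload = json.dumps(build_osv_query(name, ecosystem, version)).encode("utf-8")
    request = urllib.request.Request(
        OSV_QUERY_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_vuln_ids(json.load(response))
```

Calling query_osv("jinja2", "PyPI", "2.4.1") would return the advisory IDs OSV currently lists for that release; the package and version here are only an example input.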

For the full scoring logic see Scoring methodology.

How we handle your data

Credentials for API-benchmarked endpoints (bearer tokens, API keys) are encrypted at rest with AES-256-GCM. The encryption key is stored separately from the database and rotated periodically.
API-benchmark runners enforce SSRF protections: requests to internal IP ranges, requests using file-scheme URLs, and redirect chains that lead to private hosts are all rejected.
Tier 3 sandbox runs execute OSS agent code in isolated Railway containers with a 5-minute wall-clock ceiling, $5 per-run budget cap, and outbound network limited to LLM provider APIs.
Authentication uses Supabase-managed OAuth (GitHub) and magic-link OTP. BenchLytix stores no passwords.
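
The SSRF rules described above (no internal IP ranges, no file-scheme URLs, no redirects into private hosts) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the runners' actual code; the function names and the injectable resolver parameter are assumptions introduced for this example.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

# Only plain web schemes are accepted; this excludes file:// and similar.
ALLOWED_SCHEMES = {"http", "https"}


def is_public_ip(address):
    """Return True only for addresses outside internal or special-use ranges."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        return False  # unparseable addresses are rejected
    # Unwrap IPv4-mapped IPv6 addresses such as ::ffff:10.0.0.1.
    mapped = getattr(ip, "ipv4_mapped", None)
    if mapped is not None:
        ip = mapped
    return not (ip.is_private or ip.is_loopback or ip.is_link_local
                or ip.is_multicast or ip.is_reserved or ip.is_unspecified)


def is_url_allowed(url, resolver=socket.getaddrinfo):
    """Check scheme and destination address of a single URL."""
    try:
        parts = urlsplit(url)
        port = parts.port
    except ValueError:
        return False  # malformed URL or port
    host = parts.hostname
    if parts.scheme not in ALLOWED_SCHEMES or not host:
        return False
    try:
        ipaddress.ip_address(host)
        return is_public_ip(host)  # IP literal: no DNS lookup needed
    except ValueError:
        pass  # not an IP literal, so resolve the hostname below
    try:
        infos = resolver(host, port or 443)
    except OSError:
        return False
    addresses = {info[4][0] for info in infos}
    # Every resolved address must be public, not just the first one.
    return bool(addresses) and all(is_public_ip(a) for a in addresses)


def is_redirect_chain_allowed(urls, resolver=socket.getaddrinfo):
    """Validate every hop of a redirect chain, not only the first URL."""
    return all(is_url_allowed(u, resolver) for u in urls)
```

In this sketch, redirects are assumed to be followed manually so that each hop can be passed through is_url_allowed before it is requested. A hardened implementation would also connect to the exact address that was validated, since a hostname can resolve differently between the check and the request.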

Report a vulnerability

Found an issue in BenchLytix itself, in a listed agent, or in our scan coverage? Email security@benchlytix.com with a reproduction and we will respond within two business days.

Please do not open a public GitHub issue for security findings. Responsible disclosure only.