Changelog

What we've shipped, by phase. For line-level detail, see the full CHANGELOG on GitHub.

Phase 7 — Observability & Tier 3

April 2026

Agent-level LLM cost + token observability, OWASP MCP Top 10 security scanning, Tier 2 API-benchmark runner, Tier 3 OSS-agent sandbox foundations, public API v1 scaffolding.

  • OWASP scan pipeline + supply-chain CVE tracking via OSV.dev
  • MetrxBot-powered internal LLM cost observability (every BenchLytix LLM call emits a structured event)
  • Tier 2 API-benchmark runner with AES-256-GCM credential encryption + SSRF guard
  • Tier 3 OSS sandbox: Railway runtime + per-run scoped LLM keys (behind a feature flag, off while soaking)
  • Community review UI with helpfulness voting + tier badges
  • Natural-language buyer search scaffolding (Haiku + Sonnet, behind a feature flag, off by default)
  • Agent claiming flow + admin moderation queue
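
For readers curious what the SSRF guard on the Tier 2 runner might involve, here is a minimal sketch, not the shipped implementation. It assumes the target hostname has already been resolved to an IPv4 address elsewhere; the names `isBlockedIPv4` and `isAllowedTarget` are hypothetical, and a production guard would also need to cover IPv6 and redirects.

```typescript
// Hypothetical helper: true when an IPv4 literal falls in a range that an
// outbound benchmark request should not be allowed to reach.
function isBlockedIPv4(ip: string): boolean {
  const parts = ip.split(".");
  if (parts.length !== 4) return true; // not a dotted quad: reject
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  if (octets.some((o) => Number.isNaN(o) || o > 255)) return true;
  const a = octets[0] ?? 0;
  const b = octets[1] ?? 0;
  return (
    a === 0 || // "this network"
    a === 10 || // RFC 1918 private range
    a === 127 || // loopback
    (a === 100 && b >= 64 && b <= 127) || // shared address space (CGNAT)
    (a === 169 && b === 254) || // link-local, includes cloud metadata endpoints
    (a === 172 && b >= 16 && b <= 31) || // RFC 1918 private range
    (a === 192 && b === 168) || // RFC 1918 private range
    a >= 224 // multicast and reserved
  );
}

// Hypothetical gate: allow only HTTPS URLs whose resolved address is public.
function isAllowedTarget(rawUrl: string, resolvedIp: string): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // unparseable URL
  }
  if (url.protocol !== "https:") return false;
  return !isBlockedIPv4(resolvedIp);
}
```

Checking the resolved address rather than the hostname string is the important design choice here, since a public-looking hostname can resolve to an internal address.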

Phase 5 — Billing

April 2026

Stripe three-tier checkout (free / verified / verified + runtime), webhook sync, billing portal, and runtime bundle tier (flag-gated).

  • Stripe Checkout session creation with per-tier price routing
  • Subscription state columns on agents table
  • `/pricing` page with conditional CTAs
  • Ownership gating — checkout blocked for unclaimed agents

Phase 4 — Embeddable badge

April 2026

Public SVG badge renderer with usage tracking, copy-to-clipboard snippets, and nightly badge-event purge.

  • `/api/v1/badge/[slug].svg` cached at the edge
  • `<EmbedCodeBlock>` with three framework variants
  • Purge cron drops badge events older than 90 days
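
The 90-day retention rule behind the purge cron can be illustrated with a small sketch. The `BadgeEvent` shape and the function names are assumptions for illustration only; the real job presumably issues a database delete rather than filtering in memory.

```typescript
// Hypothetical event shape; the real table columns are not shown in this changelog.
interface BadgeEvent {
  slug: string;
  occurredAt: Date;
}

const RETENTION_DAYS = 90;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Compute the instant before which events are considered expired.
function purgeCutoff(now: Date, retentionDays: number = RETENTION_DAYS): Date {
  return new Date(now.getTime() - retentionDays * MS_PER_DAY);
}

// Split events into those to keep and those older than the cutoff.
function partitionEvents(
  events: BadgeEvent[],
  now: Date,
): { kept: BadgeEvent[]; purged: BadgeEvent[] } {
  const cutoff = purgeCutoff(now).getTime();
  const kept = events.filter((e) => e.occurredAt.getTime() >= cutoff);
  const purged = events.filter((e) => e.occurredAt.getTime() < cutoff);
  return { kept, purged };
}
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the cutoff logic deterministic and easy to test.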

Phase 3 — Discovery surface

April 2026

Public leaderboard, agent profiles with ISR + JSON-LD SEO, category filters, score-improvement tips.

  • Postgres `get_leaderboard` RPC + RLS policies
  • `/agents/[slug]` with structured data + sitemap
  • Profile pages with score breakdown + CTAs
  • Score-improvement LLM tip generator

Phase 2 — Scoring pipeline

April 2026

Tier 1 automated scoring (Haiku → Sonnet → optional Opus arbitration), admin manual override, email + analytics plumbing.

  • Raw-fetch Anthropic client with microcent cost accounting
  • Admin scoring queue with override + audit log
  • Resend transactional email + PostHog analytics
  • Typed error hierarchy on every API route
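
As an illustration of why microcent accounting is useful, the sketch below keeps per-call cost as an integer so that many small calls can be summed without floating-point drift. It assumes 1 microcent is one millionth of a cent; the `ModelPricing` shape, the function names, and the prices used in the usage note are hypothetical and are not real Anthropic rates.

```typescript
// Hypothetical pricing record, expressed in integer microcents per token.
interface ModelPricing {
  inputMicrocentsPerToken: number;
  outputMicrocentsPerToken: number;
}

// Cost of one LLM call in integer microcents.
function callCostMicrocents(
  pricing: ModelPricing,
  inputTokens: number,
  outputTokens: number,
): number {
  const values = [
    inputTokens,
    outputTokens,
    pricing.inputMicrocentsPerToken,
    pricing.outputMicrocentsPerToken,
  ];
  if (values.some((v) => !Number.isInteger(v) || v < 0)) {
    throw new RangeError("token counts and prices must be non-negative integers");
  }
  return (
    inputTokens * pricing.inputMicrocentsPerToken +
    outputTokens * pricing.outputMicrocentsPerToken
  );
}

// Convert to a dollar string only at the display boundary.
// 1 dollar = 100 cents = 100,000,000 microcents (under the assumption above).
function microcentsToDollars(microcents: number): string {
  return (microcents / 1e8).toFixed(6);
}
```

With made-up prices of 100 and 500 microcents per input and output token, a call using 1200 input and 300 output tokens costs 270000 microcents, which `microcentsToDollars` renders as `0.002700`.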

Phase 1 — Foundation

March 2026

Supabase schema, RLS, shared Zod schemas, slugify, canonical enums, Next.js App Router skeleton.

  • Migrations 0001–0004 (foundation + RLS + leaderboard RPC)
  • Shared Zod schemas across backend + frontend
  • Canonical enum cross-check test
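
For context on the slugify helper mentioned above, a typical implementation looks roughly like the following. This is a generic sketch under the assumption that slugs are lowercase ASCII with hyphen separators; the project's actual rules (length limits, collision handling) are not described here.

```typescript
// Generic slugify sketch: lowercase ASCII words joined by single hyphens.
function slugify(input: string): string {
  return input
    .normalize("NFKD") // split accented letters into base letter + combining mark
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse every other run of characters to one hyphen
    .replace(/^-+|-+$/g, ""); // trim leading and trailing hyphens
}
```

For example, `slugify("My Agent v2!")` yields `my-agent-v2`, which is the kind of value that routes such as `/agents/[slug]` expect.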