Choose a candidate
v0.2 Technical Foundation
Evaluate whether a candidate enterprise agent is suitable for discovery, pilot, controlled production, or scale.
The v0.2 lab uses deterministic weighted scoring. Later releases can add LLM-as-judge, human review, scenario replay, and backend persistence.
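As a rough illustration of what deterministic weighted scoring could look like, the sketch below combines per-criterion ratings into a single score and maps it to one of the four stages named above. The criterion names, weights, and thresholds are all hypothetical placeholders, not the lab's actual configuration; only the stage names come from the text.

```python
# Hypothetical criteria and weights (assumed for illustration only).
WEIGHTS = {
    "data_readiness": 0.30,
    "risk_controls": 0.30,
    "business_value": 0.25,
    "operational_support": 0.15,
}

# Hypothetical minimum scores per stage, checked from highest to lowest.
# Anything below the last threshold falls back to "discovery".
STAGE_THRESHOLDS = [
    (0.85, "scale"),
    (0.70, "controlled production"),
    (0.50, "pilot"),
]


def weighted_score(ratings: dict[str, float]) -> float:
    """Return the weighted average of ratings, each expected in [0, 1]."""
    missing = WEIGHTS.keys() - ratings.keys()
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= ratings[name] <= 1.0:
            raise ValueError(f"rating for {name!r} must be between 0 and 1")
    total_weight = sum(WEIGHTS.values())
    # Normalize by the total weight so the result stays in [0, 1]
    # even if the weights are later edited and no longer sum to 1.
    return sum(WEIGHTS[n] * ratings[n] for n in WEIGHTS) / total_weight


def recommend_stage(score: float) -> str:
    """Map a score to the highest stage whose threshold it meets."""
    for threshold, stage in STAGE_THRESHOLDS:
        if score >= threshold:
            return stage
    return "discovery"


if __name__ == "__main__":
    candidate = {
        "data_readiness": 0.8,
        "risk_controls": 0.6,
        "business_value": 0.9,
        "operational_support": 0.5,
    }
    score = weighted_score(candidate)
    print(f"score={score:.2f} stage={recommend_stage(score)}")
```

Because the function uses only fixed weights and arithmetic, the same ratings always yield the same score and stage, which is the property that distinguishes this approach from an LLM-as-judge or human review step.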