Supercharge LLM Evaluation.
The Right Way.
Private testing is underway. Get early access to shape the roadmap.

Latency: 19ms
Accuracy: 99.1%
Version: 4.1.2
Stars: 4.3k+
Downloads: 2.2k
Feature Highlights
Evaluate with Trusted Metrics
Out-of-the-box support for factuality, faithfulness, coherence, and answer relevance.
Tailor It to Your Needs
Create and run your own custom metrics for niche workflows.
Plug into Your Stack
Works seamlessly with LangChain, LlamaIndex, and OpenInference.
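To make the idea of a custom metric concrete, here is a minimal sketch in plain Python. It does not use this project's actual API: the names `MetricResult` and `keyword_coverage` are hypothetical and chosen only for illustration. It assumes a metric can be modeled as a function that takes a model answer and returns a score between 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class MetricResult:
    """Hypothetical result container; not part of any real API."""
    name: str
    score: float  # expected to fall in [0.0, 1.0]


def keyword_coverage(answer: str, required_terms: list[str]) -> MetricResult:
    """Score an answer by the fraction of required terms it mentions.

    Matching is a case-insensitive substring check, which is deliberately
    simple; a real metric would likely need tokenization or stemming.
    """
    if not required_terms:
        # Nothing is required, so the answer trivially covers everything.
        return MetricResult(name="keyword_coverage", score=1.0)
    text = answer.lower()
    hits = sum(1 for term in required_terms if term.lower() in text)
    return MetricResult(name="keyword_coverage", score=hits / len(required_terms))


# Example: a domain-specific check that an answer mentions key terms.
result = keyword_coverage(
    "Take ibuprofen with food and avoid alcohol.",
    ["ibuprofen", "food", "dosage"],
)
print(result.name, round(result.score, 2))
```

A niche workflow might swap the substring check for a domain rule, such as validating citations or units, while keeping the same answer-in, score-out shape.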
What's coming next
Visual Dashboard & Reporting
Customizable dashboards and automated reports for your evaluation results.

Custom Metric SDK
Build and deploy your own evaluation metrics with our easy-to-use software development kit.

Granular Failure Analysis
Detailed insights into specific failure modes with actionable recommendations.

Dataset Hub
Access curated evaluation datasets across domains.

Built in the open
Evolving with you
