Ragas.io is live now.
Supercharge AI Evaluation.
The Right Way.
Private testing is underway. Get early access to shape the roadmap.
experiment.py
from my_app import my_agent
from ragas import DiscreteMetric, Project  # assumed: Project is importable from ragas (its import isn't shown on this page)

# Load the project and the evaluation dataset stored in it.
project = Project.get(name="my_project")
test_dataset = project.get_dataset(dataset_name="test-yann-lecun")

# A discrete pass/fail LLM-as-judge metric driven by per-sample grading notes.
my_metric = DiscreteMetric(
    name="score",
    prompt="Evaluate {response}\nbased on the grading notes:\n{grading_notes}.",
    values=["pass", "fail"],
)

# Run each dataset item through the agent and score its response.
@project.experiment
async def experiment_func(item):
    response = await my_agent.ask(item.question)
    score = await my_metric.ascore(
        query=item.question, response=response, grading_notes=item.grading_notes
    )
    return {"response": response, "score": score}  # assumed result shape for logging the run
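The snippet stops at the experiment definition, so here is a minimal, hypothetical sketch of driving it over the dataset. It assumes test_dataset is iterable, that each item exposes question and grading_notes, and that the decorated experiment_func can still be awaited per item; the real Ragas runner may differ.
run_experiment.py
import asyncio

from experiment import experiment_func, test_dataset  # hypothetical module split

async def main():
    # Score every dataset item; assumes the returned "score" compares as a plain label string.
    results = [await experiment_func(item) for item in test_dataset]
    passed = sum(1 for r in results if r["score"] == "pass")
    print(f"{passed}/{len(results)} samples passed")

asyncio.run(main())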
Feature Highlights
Evaluate and Experiment
Align LLM judges with human reviews and run experiments in your pipeline with confidence (see the alignment sketch below these highlights).
Synthetic Data for Evaluation
High-quality synthetic data tailored to your use case (see the generator sketch below these highlights).
Integrates with your existing observability stack
Works with Datadog, Sentry, Langfuse, and more.
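A generic illustration of the judge-alignment idea from "Evaluate and Experiment": collect human verdicts for a handful of samples, compare them with the LLM judge's verdicts, and track the agreement rate. This is plain Python with made-up example labels, not a Ragas API.
align_check.py
# Example verdicts only; in practice these come from my_metric and from human reviewers.
judge_verdicts = ["pass", "fail", "pass", "pass", "fail"]
human_labels = ["pass", "fail", "fail", "pass", "fail"]

agreement = sum(j == h for j, h in zip(judge_verdicts, human_labels)) / len(human_labels)
print(f"judge-human agreement: {agreement:.0%}")  # iterate on the judge prompt until this is high enough to trust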
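For the synthetic-data highlight, the open-source ragas library already ships a test-set generator. A minimal sketch, assuming a recent ragas release plus langchain-openai; the document paths and model names are placeholders, and the hosted product's API may differ.
generate_testset.py
from langchain_core.documents import Document
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.llms import LangchainLLMWrapper
from ragas.testset import TestsetGenerator

# Placeholder source documents; load your own corpus here.
docs = [Document(page_content=open(path).read()) for path in ["guide.md", "faq.md"]]

generator = TestsetGenerator(
    llm=LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini")),
    embedding_model=LangchainEmbeddingsWrapper(OpenAIEmbeddings()),
)
testset = generator.generate_with_langchain_docs(docs, testset_size=10)
print(testset.to_pandas().head())  # sample questions and references ready for evaluation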
What's Coming Next
Dataset Hub
Access curated evaluation datasets across domains.
Experiment Tracking
Visualization tools with customizable dashboards and automated reporting.
Custom Metrics
Build and deploy your own evaluation metrics with an easy-to-use SDK.
Error Analysis
Detailed insights into specific failure modes with actionable recommendations.
Early Access
Get early access to upcoming features.