Ragas.io is live now.
Supercharge AI Evaluation.
The Right Way.
Private testing is underway. Get early access to shape the roadmap.
experiment.py
from my_app import my_agent
from ragas import DiscreteMetric, Project  # assumed: Project is importable from ragas (its import isn't shown on this page)

# Load the project and the evaluation dataset stored in it.
project = Project.get(name="my_project")
test_dataset = project.get_dataset(dataset_name="test-yann-lecun")

# A discrete pass/fail LLM-as-judge metric driven by per-sample grading notes.
my_metric = DiscreteMetric(
    name="score",
    prompt="Evaluate {response}\nbased on the grading notes:\n{grading_notes}.",
    values=["pass", "fail"],
)

# Run each dataset item through the agent and score its response.
@project.experiment
async def experiment_func(item):
    response = await my_agent.ask(item.question)
    score = await my_metric.ascore(
        query=item.question, response=response, grading_notes=item.grading_notes
    )
    return {"response": response, "score": score}  # assumed result shape for logging the run
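The snippet stops at the experiment definition, so here is a minimal, hypothetical sketch of driving it over the dataset. It assumes test_dataset is iterable, that each item exposes question and grading_notes, and that the decorated experiment_func can still be awaited per item; the real Ragas runner may differ.
run_experiment.py
import asyncio

from experiment import experiment_func, test_dataset  # hypothetical module split

async def main():
    # Score every dataset item; assumes the returned "score" compares as a plain label string.
    results = [await experiment_func(item) for item in test_dataset]
    passed = sum(1 for r in results if r["score"] == "pass")
    print(f"{passed}/{len(results)} samples passed")

asyncio.run(main())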
Feature Highlights
Evaluate and Experiment
Align LLM judges with human reviews and run experiments in your pipeline with confidence (see the alignment sketch below these highlights).
Synthetic Data for Evaluation
High-quality synthetic data tailored to your use case (see the generator sketch below these highlights).
Integrates with your existing observability stack
Works with Datadog, Sentry, Langfuse, and more.
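A generic illustration of the judge-alignment idea from "Evaluate and Experiment": collect human verdicts for a handful of samples, compare them with the LLM judge's verdicts, and track the agreement rate. This is plain Python with made-up example labels, not a Ragas API.
align_check.py
# Example verdicts only; in practice these come from my_metric and from human reviewers.
judge_verdicts = ["pass", "fail", "pass", "pass", "fail"]
human_labels = ["pass", "fail", "fail", "pass", "fail"]

agreement = sum(j == h for j, h in zip(judge_verdicts, human_labels)) / len(human_labels)
print(f"judge-human agreement: {agreement:.0%}")  # iterate on the judge prompt until this is high enough to trust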
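For the synthetic-data highlight, the open-source ragas library already ships a test-set generator. A minimal sketch, assuming a recent ragas release plus langchain-openai; the document paths and model names are placeholders, and the hosted product's API may differ.
generate_testset.py
from langchain_core.documents import Document
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.llms import LangchainLLMWrapper
from ragas.testset import TestsetGenerator

# Placeholder source documents; load your own corpus here.
docs = [Document(page_content=open(path).read()) for path in ["guide.md", "faq.md"]]

generator = TestsetGenerator(
    llm=LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini")),
    embedding_model=LangchainEmbeddingsWrapper(OpenAIEmbeddings()),
)
testset = generator.generate_with_langchain_docs(docs, testset_size=10)
print(testset.to_pandas().head())  # sample questions and references ready for evaluation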
What's Coming Next
Dataset Hub
Access curated evaluation datasets across domains.
Experiment Tracking
Visualization tools with customizable dashboards and automated reporting.
Custom Metrics
Build and deploy your own evaluation metrics with an easy-to-use SDK.
Error Analysis
Detailed insights into specific failure modes with actionable recommendations.
Early Access
Get early access to upcoming features.