Launch Week #1, Day 5. Online evaluations

At Laminar, we're excited to announce our newest feature: Online Evaluations. This feature allows engineering teams to run custom evaluators, either LLM-based or Python-based, on their LLM calls as they happen in production.

What are Online Evaluations?

Online evaluations run automated checks and produce labels on your LLM calls as they happen in production. Instead of collecting data for post-hoc analysis, Laminar automatically evaluates each model call in real-time by analyzing the inputs and outputs of your LLM spans.

Why We Built It

When you have thousands of LLM calls happening every day, it's hard to know if your LLMs are behaving as expected. Online evaluations allow you to monitor the quality of your LLMs in real-time, collect performance statistics, and detect issues before they impact users.

How It Works

Laminar's online evaluations system is built around three core concepts:

1. Span Paths

Span paths uniquely identify where LLM calls happen in your code. They're automatically constructed from the location of the call, making it easy to track specific functions and endpoints.

2. Span Labels

Labels are values attached to spans that indicate evaluation results.

3. Evaluators

Evaluators analyze inputs and outputs to generate labels. Laminar supports two types of evaluators:

LLM-based evaluators
Python-based evaluators

Setting Up Evaluations

Setting up evaluations

Getting started with Laminar's online evaluations is straightforward:

Navigate to "Traces" in your Laminar dashboard
Select the span you want to evaluate
Click "Add Label" and create or choose a label class
Configure your evaluator:
- Choose between Python code or LLM-based evaluation
- Test your evaluator directly in the UI
Save and enable for production

Once enabled, Laminar will automatically run your evaluator on your LLM calls and attach labels to the spans. This label will be marked as AUTO in the dashboard.

Evaluations in action

Best Practices

Start Simple

Begin with basic format and content checks
Add more sophisticated evaluations gradually
Monitor evaluator performance impact

Layer Your Checks

Technical validation (format, structure)
Content validation (completeness, relevance)
Quality metrics (coherence, accuracy)

Monitor Results

Track evaluation trends over time
Regularly review and refine criteria

Conclusion

Online evaluations represent a significant step forward in LLM operations, bringing immediate quality feedback to production systems. With Laminar's implementation, teams can maintain high standards while gathering valuable insights about their models' behavior. Try out online evaluations today and let us know what you think! Check out our documentation for detailed setup instructions and best practices.