What Are Scores and When Should I Use Them?
Scores are covered in detail on the Evaluation Concepts page, including:
- When to use scores — user feedback, production monitoring, guardrails, experiments
- Score types — numeric, categorical, boolean, and text
- Score configs — enforce schemas and validate values on ingestion
- Scores vs tags — when to use which
- Score comments — add context to any score
How to Create Scores
There are four ways to add scores:
- LLM-as-a-Judge: Set up automated evaluators that score traces based on custom criteria (e.g. hallucination, tone, relevance). These can return numeric, categorical, or boolean (true/false) scores plus reasoning, and can run on live production traces or on experiment results.
- Annotation in the UI: Team members manually score traces, observations, or sessions directly in the Langfuse dashboard. Requires a score config to be set up first.
- Annotation queues: Set up structured review workflows where reviewers work through batches of traces.
- Scores via API/SDK: Programmatically add scores from your application code — for user feedback, guardrail results, custom evaluation pipelines, or open-ended text feedback.
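To make the API/SDK option more concrete, here is a minimal sketch of how application code might shape a score before sending it. It does not call Langfuse: `Score`, `infer_data_type`, and `build_score` are hypothetical names invented for illustration, and the field names are assumptions rather than the official schema. It only models the four score types listed above (numeric, categorical, boolean, text); check the Langfuse SDK and API reference for the actual method and payload format.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Union

# Hypothetical model of a score; NOT the Langfuse SDK or its wire format.
ScoreValue = Union[bool, int, float, str]


@dataclass
class Score:
    trace_id: str            # ID of the trace the score is attached to
    name: str                # e.g. "user_feedback", "toxicity"
    value: ScoreValue
    data_type: str           # "numeric", "categorical", "boolean", or "text"
    comment: Optional[str] = None


def infer_data_type(value: ScoreValue,
                    categories: Optional[Sequence[str]] = None) -> str:
    """Map a Python value onto one of the four score types."""
    # bool is a subclass of int, so it must be checked first.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "numeric"
    if isinstance(value, str):
        if categories is None:
            return "text"  # open-ended feedback
        if value not in categories:
            # Loosely mimics a score config rejecting an unknown category.
            raise ValueError(f"{value!r} is not one of {list(categories)}")
        return "categorical"
    raise TypeError(f"unsupported score value: {value!r}")


def build_score(trace_id: str, name: str, value: ScoreValue,
                comment: Optional[str] = None,
                categories: Optional[Sequence[str]] = None) -> Score:
    """Assemble a score record that application code could then submit."""
    return Score(trace_id, name, value,
                 infer_data_type(value, categories), comment)


# Example: a thumbs-up from a user and a guardrail result on the same trace.
feedback = build_score("trace-123", "user_feedback", True, comment="thumbs up")
guardrail = build_score("trace-123", "toxicity", 0.12)
```

Inferring the type from the value keeps call sites short, while passing `categories` gives a local approximation of the validation a score config would perform on ingestion.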