What Are Scores and When Should I Use Them?

Scores are covered in detail on the Evaluation Concepts page, including:

When to use scores: user feedback, production monitoring, guardrails, experiments
Score types: numeric, categorical, boolean, and text
Score configs: enforce schemas and validate values on ingestion
Scores vs tags: when to use which
Score comments: add context to any score

How to Create Scores

There are four ways to add scores:

LLM-as-a-Judge: Set up automated evaluators that score traces based on custom criteria (e.g. hallucination, tone, relevance). These can return numeric, categorical, or boolean (true / false) scores plus reasoning, and can run on live production traces or on experiment results.
Annotation in the UI: Team members manually score traces, observations, or sessions directly in the Langfuse dashboard. This requires a score config to be set up first.
Annotation queues: Set up structured review workflows where reviewers work through batches of traces.
Scores via API/SDK: Programmatically add scores from your application code, for example for user feedback, guardrail results, custom evaluation pipelines, or open-ended text feedback.
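As a rough illustration of the API/SDK route, the sketch below builds an HTTP request that attaches a user-feedback score to a trace. It is not taken from this page: the endpoint path `/api/public/scores`, the field names (`traceId`, `name`, `value`, `comment`, `dataType`), and Basic authentication with the public key as username and the secret key as password are assumptions to verify against the Langfuse API reference. The helper names `build_score_payload` and `build_score_request`, the host, and the keys are hypothetical. In practice the official SDKs wrap this call for you.

```python
import base64
import json
import urllib.request

# Assumed endpoint path; check the Langfuse API reference before relying on it.
SCORES_PATH = "/api/public/scores"


def build_score_payload(trace_id, name, value, comment=None, data_type=None):
    """Build the JSON body for one score (hypothetical helper).

    Field names are assumptions about the public API, not confirmed here.
    """
    if not trace_id or not name:
        raise ValueError("trace_id and name are required")
    payload = {"traceId": trace_id, "name": name, "value": value}
    if comment is not None:
        payload["comment"] = comment
    if data_type is not None:
        # Passed through unchanged; the accepted values are defined by the API.
        payload["dataType"] = data_type
    return payload


def build_score_request(host, public_key, secret_key, payload):
    """Wrap the payload in a POST request with Basic auth (assumed scheme)."""
    token = base64.b64encode(f"{public_key}:{secret_key}".encode()).decode()
    return urllib.request.Request(
        url=host.rstrip("/") + SCORES_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder host and keys; replace with your own project values.
    body = build_score_payload(
        "trace-123", "user_feedback", 1, comment="Helpful answer"
    )
    request = build_score_request(
        "https://langfuse.example.com", "pk-placeholder", "sk-placeholder", body
    )
    # Sending is left commented out so the sketch has no side effects:
    # with urllib.request.urlopen(request) as response:
    #     print(response.status)
    print(request.full_url, body)
```

Building the payload separately from the request keeps the score data easy to unit test without touching the network, which is useful when scores come from guardrails or custom evaluation pipelines.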
