Rubric Forge: Turn a Fuzzy 'Quality Bar' Into a Defensible Scorecard
Convert a vague standard ('be helpful and accurate') into an anchored, multi-dimensional rubric with score definitions, tie-breakers, and calibration examples that two graders would agree on.
ROLE: You are a measurement scientist who designs grading rubrics used by both human raters and LLM judges at scale. Your rubrics are reproducible: independent graders land within one point. Vague adjectives ('good', 'clear') without anchors are a failure condition.
QUALITY STANDARD:
[QUALITY STANDARD — the fuzzy bar you want, in your words]
TASK BEING GRADED: [TASK]
HARD RULES (auto-fail conditions): [HARD RULES — e.g. no PII leakage, must cite source]
DO THIS:
1. DECOMPOSE the standard into 4-6 independent, non-overlapping dimensions (MECE). Name each and say what it captures and what it explicitly does NOT.
2. For EACH dimension, write a 1-5 anchored scale: a concrete, behavioral description of what a 1, 3, and 5 look like (not 'somewhat good' — describe the artifact).
3. WEIGHTING — assign weights summing to 100% and justify each; mark dimensions that are gates (auto-fail) vs scored.
4. TIE-BREAKERS — rules for ambiguous cases and how to handle missing/empty outputs.
5. CALIBRATION SET — 3 worked examples (a clear 5, a borderline 3, a clear 1) with the score on every dimension and a one-line rationale.
6. RATER INSTRUCTIONS — the exact prompt a human or LLM judge receives, including 'when unsure, score down and flag'.
CONSTRAINTS: Every score level must be falsifiable by looking at the output. State [ASSUMPTION] where you infer intent. This is an evaluation aid, not a compliance certification.
OUTPUT FORMAT: The rubric as a table, the calibration examples, the rater prompt as a copy-paste block, then ONE recommendation on the riskiest dimension to pilot first.
Fill the highlighted [VARIABLES] with your details, then paste into your AI.
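Once the prompt returns a rubric, the weighting and gate rules from step 3 and the "within one point" target from the ROLE line can be checked mechanically. The following is a minimal Python sketch, not part of the prompt itself. The dimension names, the weights, and the helper names `overall_score` and `within_one_rate` are hypothetical and chosen only for illustration. It also assumes that a failed gate yields no numeric score at all, which is one reasonable reading of "auto-fail".

```python
from typing import Dict, Optional, Sequence

# Hypothetical dimensions and integer percentage weights (illustration only).
WEIGHTS: Dict[str, int] = {
    "accuracy": 40,
    "completeness": 25,
    "clarity": 20,
    "tone": 15,
}


def overall_score(
    scores: Dict[str, int],
    gates_passed: Dict[str, bool],
    weights: Dict[str, int] = WEIGHTS,
) -> Optional[float]:
    """Return the weighted 1-5 score, or None if any gate failed (auto-fail)."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100%")
    if set(scores) != set(weights):
        raise ValueError("each scored dimension needs exactly one score")
    if any(not 1 <= s <= 5 for s in scores.values()):
        raise ValueError("scores must be on the 1-5 scale")
    # Assumption: a failed gate overrides every scored dimension.
    if not all(gates_passed.values()):
        return None
    return sum(weights[d] * scores[d] for d in weights) / 100


def within_one_rate(rater_a: Sequence[int], rater_b: Sequence[int]) -> float:
    """Share of items on which two raters differ by at most one point."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty score lists of equal length")
    hits = sum(abs(a - b) <= 1 for a, b in zip(rater_a, rater_b))
    return hits / len(rater_a)


if __name__ == "__main__":
    example = {"accuracy": 5, "completeness": 4, "clarity": 3, "tone": 4}
    print(overall_score(example, {"no_pii_leak": True, "cites_source": True}))
    print(overall_score(example, {"no_pii_leak": False, "cites_source": True}))
    print(within_one_rate([5, 3, 1, 4], [4, 3, 3, 4]))
```

Running `within_one_rate` on the calibration set with two graders gives a quick signal of whether the anchors are concrete enough. A low rate on one dimension suggests that dimension's 1, 3, and 5 descriptions need tightening before the rubric is used at scale.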
More Unicorn Builder prompts
- The Golden Set Architect: Build the Eval That Decides What Ships
- LLM-as-Judge Validator: Prove Your Judge Before You Trust It
- Regression Gate Designer: The CI Check That Blocks a Worse Model
- Red Team Playbook: Break Your Model Like an Adversary Would
- Hallucination Eval Builder: Measure Faithfulness So You Can Reduce It
- Safety & Refusal Calibration: Find the Line Between Safe and Useless