Output Quality Evaluator and Follow-Up Questions

Category: analysis
Subcategory: output-diagnosis
Difficulty: beginner
Target models: claude-sonnet, gpt, gemini-pro
Variables: {{original_request_or_prompt}} {{ai_output}} {{success_criteria}} {{known_context}} {{workflow_surface}}
Tags: prompting, evaluation, follow-up, quality, iteration
Updated April 23, 2026

The Prompt

You are an AI output evaluator. Judge whether the response is good enough to use, explain what is weak, and generate the smallest useful next step.

ORIGINAL REQUEST OR PROMPT:
{{original_request_or_prompt}}

AI OUTPUT:
{{ai_output}}

SUCCESS CRITERIA:
{{success_criteria}}

KNOWN CONTEXT:
{{known_context}}

WORKFLOW SURFACE:
{{workflow_surface}}

Return exactly:
1) Overall verdict (usable, revise, reframe the task, or missing context)
2) What is already working
3) Quality gaps and why they matter
4) Smallest useful next step
   - revise prompt
   - gather missing context
   - accept as good enough
5) Follow-up questions for the human (max 5, only if needed)
6) Best next prompt to run
7) When to stop iterating and gather better source material instead

Rules:
- Evaluate against the stated task and criteria, not personal taste.
- If the real problem is missing context or evidence, say so directly.
- Do not recommend another retry if the output is already good enough.
- Prefer the smallest next step that can improve quality in a measurable way.

When to Use

Use this when you have an AI answer in front of you and are not sure whether the problem is the output, the prompt, or the missing context around the task. It helps avoid the common trap of endlessly retrying without learning what actually needs to change.

Best fits:

  • an answer feels weak, but you cannot tell whether it is salvageable
  • you want targeted follow-up questions instead of generic “add more detail” advice
  • a team needs a more disciplined review step before using AI output
  • you want to decide whether to retry, rewrite the prompt, or pause and gather better source material

Variables

Variable | Description | Good input examples
original_request_or_prompt | The task as it was originally given to the model | full user request, prompt template instance, chat transcript excerpt
ai_output | The output you want evaluated | current answer, draft memo, generated checklist, code explanation
success_criteria | What a good output should achieve | accurate summary, decision-ready recommendation, plain-language rewrite, reusable prompt
known_context | Important background or constraints the evaluator should consider | audience, no-invention rule, approved notes, deadline, risk limits
workflow_surface | Where the next step will happen | chat app, coding agent, editor assistant, review handoff
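
If you fill the template programmatically rather than by hand, a small helper can catch an empty variable before the prompt is sent. The following is a minimal sketch, not part of the prompt itself: the function name `fill_prompt` is hypothetical, and treating all five variables as required is an assumption (write "none" explicitly when there is no known context).

```python
import re

# Variable names as they appear in the prompt template.
REQUIRED_VARIABLES = (
    "original_request_or_prompt",
    "ai_output",
    "success_criteria",
    "known_context",
    "workflow_surface",
)

# Matches placeholders written as {{name}}.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def fill_prompt(template: str, values: dict) -> str:
    """Hypothetical helper: substitute {{name}} placeholders in one pass.

    Raises ValueError if any required variable is missing or blank
    (an assumption of this sketch, not a rule stated by the prompt).
    """
    missing = [
        name for name in REQUIRED_VARIABLES
        if not str(values.get(name, "")).strip()
    ]
    if missing:
        raise ValueError("missing values for: " + ", ".join(missing))
    # Unknown placeholders are left untouched rather than silently dropped.
    return PLACEHOLDER.sub(
        lambda m: str(values.get(m.group(1), m.group(0))), template
    )


example_values = {
    "original_request_or_prompt": "Summarize the meeting notes for executives.",
    "ai_output": "The meeting covered several topics.",
    "success_criteria": "decision-ready recommendation",
    "known_context": "audience: executives; no-invention rule",
    "workflow_surface": "chat app",
}
snippet = "AI OUTPUT:\n{{ai_output}}\n\nWORKFLOW SURFACE:\n{{workflow_surface}}"
filled = fill_prompt(snippet, example_values)
```

Substitution happens in a single pass, so text inside `ai_output` that happens to contain `{{...}}` is not expanded a second time.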

Tips & Variations

  • Paste the exact output, not a paraphrase. Weaknesses are often in wording, structure, or unsupported claims that summaries hide.
  • If quality matters more than speed, ask for a short scorecard by criterion before the next prompt is written.
  • Use this after the result exists; use Prompt Critic and Rewrite Coach when you want to improve the prompt itself before the next run.
  • When repeated attempts keep failing, pay attention to the “gather better source material” section. The bottleneck is often context quality, not model quality.
  • If the answer will be handed to another tool or person, include that in workflow_surface so the next-step guidance stays practical.
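
The tip about repeated failures can be turned into an explicit stopping rule. The sketch below is one possible way to do that, under stated assumptions: the function name `next_action`, the threshold of three consecutive non-usable verdicts, and the mapping from verdicts to next steps are all illustrative choices, not something the prompt prescribes. The verdict and next-step strings are the ones listed in the prompt.

```python
def next_action(verdict_history, max_failed_attempts=3):
    """Hypothetical stopping rule based on the evaluator's past verdicts.

    verdict_history: verdicts in order, oldest first, each one of
    "usable", "revise", "reframe the task", "missing context".
    max_failed_attempts: assumed threshold; tune it for your workflow.
    """
    if not verdict_history:
        raise ValueError("verdict_history must contain at least one verdict")

    latest = verdict_history[-1]
    if latest == "usable":
        # Do not retry an output that is already good enough.
        return "accept as good enough"
    if latest == "missing context":
        return "gather missing context"

    # Count the trailing run of attempts that were not usable.
    failed_run = 0
    for verdict in reversed(verdict_history):
        if verdict == "usable":
            break
        failed_run += 1

    # Repeated failures suggest the bottleneck is context, not the prompt.
    if failed_run >= max_failed_attempts:
        return "gather missing context"
    return "revise prompt"
```

With the default threshold, `next_action(["revise", "revise", "revise"])` stops the retry loop and points back to source material, while a single `"revise"` still recommends another prompt revision.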

Example Output (excerpt)

Overall verdict: revise. The answer is directionally right but too generic for the stated audience and does not support its recommendations with the provided notes.

Follow-up questions: confirm the primary audience, the approved decision deadline, and whether unresolved risks should be named directly.

Best next prompt: a shorter retry that includes those missing details and asks for the exact output structure needed.
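
If the evaluation feeds an automated review step, the verdict can be pulled out of the response text. This is a rough sketch that assumes the evaluator writes the verdict directly after the words "Overall verdict", as in the example above; real responses may be phrased differently, and the function name `extract_verdict` is hypothetical.

```python
import re
from typing import Optional

# The four verdicts named in the prompt's output format.
VERDICTS = ("usable", "revise", "reframe the task", "missing context")

# "Overall verdict", then any punctuation or whitespace, then a known verdict.
_VERDICT_RE = re.compile(
    r"overall verdict\W*(" + "|".join(re.escape(v) for v in VERDICTS) + r")",
    re.IGNORECASE,
)


def extract_verdict(evaluation: str) -> Optional[str]:
    """Return the verdict in lowercase, or None if no known verdict follows the label."""
    match = _VERDICT_RE.search(evaluation)
    return match.group(1).lower() if match else None


sample = "Overall verdict: revise. The answer is directionally right but too generic."
verdict = extract_verdict(sample)
```

A `None` result is worth treating as a signal to read the evaluation by hand rather than guessing. Note that if the model echoes the heading together with its list of options, this pattern would pick up the first option listed, so checking the match against the surrounding text is advisable.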