The Problem
Before I write code, I want to know if the problem is real. Are people paying to solve it today? Is the market big enough? Can someone else build this faster?
Claude can help answer all of these — but each question is its own deep dive. Research the competitors, check the market data, pressure-test the assumptions. Do that and you're going to spend a lot of time and energy for one idea.
I wanted to run all of that in parallel and get a single score at the end.
The Solution
Tweak Idea is a Claude Code skillset. One command triggers the full pipeline:
/tweak:evaluate "A mobile app that lets restaurants sell unsold food at a discount 30 minutes before closing"The full schema:
The pipeline runs in three stages:
Research
Captures your problem and solution, searches the web for competitors and market data, extracts testable hypotheses, and evaluates your founder-market fit — all in parallel.
Evaluation
14 Sonnet agents score your idea independently, one per dimension. No agent sees another's results — preventing anchoring bias.
Verdict
An Opus agent merges scores into a weighted report: GO, PIVOT, or STOP. Surfaces your top strengths and weaknesses, with concrete next steps ranked by score impact.
How It Works
Research
When you submit an idea, three things happen in parallel.
A research agent searches the web for competitors, market signals, and user complaints — building the evidence base that evaluators will draw from.
A hypothesis extractor pulls out every testable claim in your pitch and flags each one for your review. Before scoring begins, you mark which claims you can actually back up — confirmed ones get full scoring credit, unconfirmed ones are withheld. This is what keeps the system honest — it won't inflate your score based on things you haven't verified.
A founder-fit interview probes your specific advantages for this idea — domain expertise, customer access, technical ability — while building a persistent founder profile that gets sharper with each run.
Evaluation
14 Sonnet agents evaluate simultaneously — one per dimension, completely isolated. No agent sees another's results.
Pain Intensity
How much does this problem hurt?
Willingness to Pay
Will people actually pay to solve this?
Solution Gap
Why hasn't this been solved already?
Founder-Market Fit
Are YOU the right person to solve this?
Urgency
Does this need solving NOW?
Frequency
How often do people encounter this problem?
Market Size
How big is the opportunity?
Defensibility
Can you protect this business once built?
Market Growth
Is the problem getting bigger or smaller?
Scalability
Can this grow without proportional cost increase?
Target Customer
Do you know exactly who you're building for?
Behavior Change
How hard is it to get people to adopt?
Mandatory Nature
Are people forced to solve this?
Incumbent Indifference
Will big players try to crush you?
Each dimension comes with a signal table — concrete indicators that separate strong evidence from weak. “I can't stand this” vs. “it would be nice if” for Pain Intensity. “Budget already exists” vs. “requires multiple approvals” for Willingness to Pay. The evaluator maps what it finds to these signals before scoring.
If the shared research doesn't cover a dimension well enough, the evaluator runs its own targeted searches — looking for data points specific to its angle on the idea.
Each evaluator outputs a Score based on confirmed evidence and a Potential — where you'd land if unconfirmed hypotheses hold. A dimension at 2/5 with potential 4/5 tells you exactly what to validate — and how much it would move your total.
Verdict
An Opus agent merges all 14 evaluations into a weighted scorecard — your actual total and your potential side by side, with top strengths and weaknesses. The full report is also available as a self-contained HTML page with a radar chart of all 14 dimensions.
The best part is the recommended next steps. Each one targets a specific dimension, tells you exactly what to do, and shows the expected score impact:
Secure one LOI or paid pilot at the $300/month price point from an identified TA lead — Willingness to Pay: 4/5 → 5/5 (+0.12 weighted)You know exactly what to validate, why, and how much it matters.
Example
Here's a real evaluation — an AI recruiting agent scored PIVOT at 3.0/5.0 with potential 3.7/5.0. Strong pain but low defensibility. View the full report.
Try It Yourself
Tweak Idea is open source and MIT licensed. You need Claude Code with access to Sonnet and Opus models.
Install
npx tweakideaEvaluate
/tweak:evaluateEach run saves a full snapshot to ~/.tweakidea/runs/, including an HTML report with a radar chart of all 14 dimensions.
Found a bug or have an idea? Open an issue on GitHub.