Tweak Idea: evaluate your startup ideas in Claude Code

The Problem

Before I write code, I want to know if the problem is real. Are people paying to solve it today? Is the market big enough? Can someone else build this faster?

Claude can help answer all of these, but each question is its own deep dive: research the competitors, check the market data, pressure-test the assumptions. Done one at a time, that is a lot of time and energy for a single idea.

I wanted to run all of that in parallel and get a single score at the end.

The Solution

Tweak Idea is a Claude Code skillset. One command triggers the full pipeline:

/tweak:evaluate "A mobile app that lets restaurants sell unsold food at a discount 30 minutes before closing"

The pipeline runs in three stages:

Research

Captures your problem and solution, searches the web for competitors and market data, extracts testable hypotheses, and evaluates your founder-market fit — all in parallel.

Evaluation

14 Sonnet agents score your idea independently, one per dimension. No agent sees another's results — preventing anchoring bias.

Verdict

An Opus agent merges the scores into a weighted report with a verdict: GO, PIVOT, or STOP. It surfaces your top strengths and weaknesses, with concrete next steps ranked by score impact.

How It Works

Research

When you submit an idea, three things happen in parallel.

A research agent searches the web for competitors, market signals, and user complaints — building the evidence base that evaluators will draw from.

A hypothesis extractor pulls out every testable claim in your pitch and flags each one for your review. Before scoring begins, you mark which claims you can actually back up: confirmed ones get full scoring credit, and unconfirmed ones are withheld from the score and count only toward your potential. This keeps the system honest, because it won't inflate your score based on things you haven't verified.
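To make the review step concrete, here is a minimal sketch of how reviewed claims could be split before scoring. The `Hypothesis` class, the `split_by_review` helper, and the two example claims are hypothetical illustrations, not Tweak Idea's actual data model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    claim: str
    confirmed: bool  # set by the founder during review, before scoring


def split_by_review(
    hypotheses: list[Hypothesis],
) -> tuple[list[Hypothesis], list[Hypothesis]]:
    """Return (confirmed, unconfirmed) claims, preserving their order."""
    confirmed = [h for h in hypotheses if h.confirmed]
    unconfirmed = [h for h in hypotheses if not h.confirmed]
    return confirmed, unconfirmed


# Hypothetical claims for the restaurant pitch; the review marks are made up.
reviewed = [
    Hypothesis("Restaurants discard unsold food at closing", confirmed=True),
    Hypothesis("Diners will buy discounted food 30 minutes before closing", confirmed=False),
]

score_evidence, potential_only = split_by_review(reviewed)
print("Counts toward score:", [h.claim for h in score_evidence])
print("Counts toward potential only:", [h.claim for h in potential_only])
```

In this sketch only the first list would feed the score, while the second would feed the potential.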

A founder-fit interview probes your specific advantages for this idea — domain expertise, customer access, technical ability — while building a persistent founder profile that gets sharper with each run.

Evaluation

14 Sonnet agents evaluate simultaneously — one per dimension, completely isolated. No agent sees another's results.
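The isolation described above can be pictured as a fan-out in which every evaluator gets the same shared research and nothing else. This is only a sketch of the pattern: `evaluate` is a hypothetical stand-in that makes no model call, and the list is shortened to four dimensions.

```python
from __future__ import annotations

import asyncio

DIMENSIONS = ["Pain Intensity", "Willingness to Pay", "Solution Gap", "Founder-Market Fit"]


async def evaluate(dimension: str, research: str) -> dict:
    """Stand-in for one evaluator agent (hypothetical; no model call is made).

    It receives only the shared research, never another evaluator's result.
    """
    await asyncio.sleep(0)  # placeholder for the real, slow agent call
    return {"dimension": dimension, "evidence_chars": len(research)}


async def evaluate_all(research: str) -> list[dict]:
    # gather() runs every evaluator concurrently and returns results in input order.
    return await asyncio.gather(*(evaluate(d, research) for d in DIMENSIONS))


results = asyncio.run(evaluate_all("shared research notes"))
print([r["dimension"] for r in results])
```

Because each coroutine takes only `dimension` and `research` as input, there is no channel through which one evaluator's score could anchor another's.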

Pain Intensity (12%): How much does this problem hurt?
Willingness to Pay (12%): Will people actually pay to solve this?
Solution Gap (12%): Why hasn't this been solved already?
Founder-Market Fit (12%): Are YOU the right person to solve this?
Urgency: Does this need solving NOW?
Frequency: How often do people encounter this problem?
Market Size: How big is the opportunity?
Defensibility: Can you protect this business once built?
Market Growth: Is the problem getting bigger or smaller?
Scalability: Can this grow without proportional cost increase?
Target Customer: Do you know exactly who you're building for?
Behavior Change: How hard is it to get people to adopt?
Mandatory Nature: Are people forced to solve this?
Incumbent Indifference: Will big players try to crush you?

Each dimension comes with a signal table — concrete indicators that separate strong evidence from weak. “I can't stand this” vs. “it would be nice if” for Pain Intensity. “Budget already exists” vs. “requires multiple approvals” for Willingness to Pay. The evaluator maps what it finds to these signals before scoring.
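One way to picture a signal table is as a small lookup from dimension to strong and weak indicators. The sketch below uses only the four example phrases quoted above. The `SIGNALS` layout and the substring matching in `classify` are simplifying assumptions; the real evaluators are model-driven and are not expected to match text literally.

```python
from __future__ import annotations

# Example signals quoted in the text; the dictionary layout is an assumption.
SIGNALS = {
    "Pain Intensity": {
        "strong": "I can't stand this",
        "weak": "it would be nice if",
    },
    "Willingness to Pay": {
        "strong": "Budget already exists",
        "weak": "requires multiple approvals",
    },
}


def classify(dimension: str, quote: str) -> str | None:
    """Return "strong" or "weak" if the quote contains a known signal, else None."""
    text = quote.lower()
    for strength, phrase in SIGNALS[dimension].items():
        if phrase.lower() in text:
            return strength
    return None


print(classify("Pain Intensity", "Honestly, I can't stand this spreadsheet"))
print(classify("Willingness to Pay", "It requires multiple approvals from finance"))
```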

If the shared research doesn't cover a dimension well enough, the evaluator runs its own targeted searches — looking for data points specific to its angle on the idea.

Each evaluator outputs a Score based on confirmed evidence and a Potential — where you'd land if unconfirmed hypotheses hold. A dimension at 2/5 with potential 4/5 tells you exactly what to validate — and how much it would move your total.
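A small sketch of the Score and Potential pair, assuming each dimension's upside is its weight times the gap between the two. The `DimensionResult` class and the example scores are hypothetical; the 12% weights are the ones listed above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DimensionResult:
    name: str
    weight: float   # share of the total, e.g. 0.12 for 12%
    score: int      # out of 5, based on confirmed evidence
    potential: int  # out of 5, if the unconfirmed hypotheses hold

    @property
    def weighted_upside(self) -> float:
        """How much the total would move if the potential were reached."""
        return self.weight * (self.potential - self.score)


# Hypothetical results for three of the 12% dimensions.
results = [
    DimensionResult("Pain Intensity", 0.12, score=4, potential=4),
    DimensionResult("Willingness to Pay", 0.12, score=2, potential=4),
    DimensionResult("Solution Gap", 0.12, score=3, potential=4),
]

# Largest upside first: this is the order in which validation pays off most.
ranked = sorted(results, key=lambda r: r.weighted_upside, reverse=True)
for r in ranked:
    print(f"{r.name}: {r.score}/5 -> {r.potential}/5 (+{r.weighted_upside:.2f} weighted)")
```

Under these assumptions, a 12% dimension at 2/5 with potential 4/5 is worth 0.24 points on the total, which is why it sorts to the top.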

Verdict

An Opus agent merges all 14 evaluations into a weighted scorecard — your actual total and your potential side by side, with top strengths and weaknesses. The full report is also available as a self-contained HTML page with a radar chart of all 14 dimensions.

The best part is the recommended next steps. Each one targets a specific dimension, tells you exactly what to do, and shows the expected score impact:

Secure one LOI or paid pilot at the $300/month price point from an identified TA lead — Willingness to Pay: 4/5 → 5/5 (+0.12 weighted)

You know exactly what to validate, why, and how much it matters.
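The "+0.12 weighted" figure is consistent with a plain weighted sum: one extra point on a 12% dimension moves the total by 0.12. Here is a sketch of that arithmetic. Only the four weights stated above are used, so the total is partial, and the before and after scores are hypothetical.

```python
from __future__ import annotations

# The four weights stated in the text; the other ten are not listed, so this
# sketch computes only a partial total over these four dimensions.
WEIGHTS = {
    "Pain Intensity": 0.12,
    "Willingness to Pay": 0.12,
    "Solution Gap": 0.12,
    "Founder-Market Fit": 0.12,
}


def weighted_total(scores: dict[str, int]) -> float:
    """Sum of weight * score over the dimensions in WEIGHTS."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores out of 5, before and after the recommended next step.
before = {
    "Pain Intensity": 4,
    "Willingness to Pay": 4,
    "Solution Gap": 3,
    "Founder-Market Fit": 3,
}
after = {**before, "Willingness to Pay": 5}

impact = weighted_total(after) - weighted_total(before)
print(f"+{impact:.2f} weighted")
```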

Example

Here's a real evaluation — an AI recruiting agent scored PIVOT at 3.0/5.0 with potential 3.7/5.0. Strong pain but low defensibility. View the full report.

Try It Yourself

Tweak Idea is open source and MIT licensed. You need Claude Code with access to Sonnet and Opus models.

Install

npx tweakidea

Evaluate

/tweak:evaluate

Each run saves a full snapshot to ~/.tweakidea/runs/, including an HTML report with a radar chart of all 14 dimensions.
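If you want to look through past runs from a script, a sketch like the following lists whatever is in that directory, newest first. It assumes nothing about how each snapshot is laid out or named; `list_runs` is a hypothetical helper, not part of Tweak Idea.

```python
from __future__ import annotations

from pathlib import Path


def list_runs(runs_dir: Path) -> list[Path]:
    """Return the entries in runs_dir, newest first; empty if the directory is missing."""
    if not runs_dir.is_dir():
        return []
    return sorted(runs_dir.iterdir(), key=lambda p: p.stat().st_mtime, reverse=True)


# The directory comes from the text above; its contents are not assumed.
for entry in list_runs(Path.home() / ".tweakidea" / "runs"):
    print(entry.name)
```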

Found a bug or have an idea? Open an issue on GitHub.