~/compare / paelladoc-vs-conductor.md · 7 min · 1411 words
[compare] · paelladoc vs · by

PaellaDoc vs Conductor: review your agents, or verify them

Both run coding agents in parallel on your Mac. The difference is what happens when an agent says done: you read the diff and decide, or a gate runs the code against your criteria.

JL @jlcases PaellaDoc creator · València
PaellaDoc vs Conductor: both run parallel agents on your Mac. Conductor decides done when you review the diff. PaellaDoc decides done when a gate runs the code against your acceptance criteria.

Both run coding agents in parallel on your Mac, in isolated git worktrees, with whatever models you already pay for. On that, they agree. The difference is one step: when an agent finishes, who decides it’s done, your eyes on the diff, or a gate that runs the code?

What Conductor does

Conductor, by Melty Labs, is a Mac app that runs Claude Code, Codex, and Cursor agents in parallel, each in its own isolated worktree. You see at a glance what each one is doing, then you review the diffs and merge what you want. It adds checkpoints to roll back, spotlight testing to sync changes into your main repo, and a multi-model mode to run two models on the same prompt and compare.

Credit where it’s due. It’s free (you bring your own subscription), it’s polished, and the diff-first review workflow is genuinely well made. For a developer who wants to fan out agents and review the results, Conductor is purpose-built and a pleasure to use.

What PaellaDoc does

PaellaDoc also runs agents locally, in isolated worktrees, model-agnostic, with your own subscription. We share that ground with Conductor, and it’s worth saying plainly. Three things are different:

  • It carries a product methodology, not just an agent runner, and it’s extensible through an open SDK. The work becomes first-class, versioned .paella artifacts, a PRD, epics, user stories, acceptance criteria, that you can compare and diff. You make product, not just code.
  • “Done” is decided by an execution gate: the code runs against those acceptance criteria, and a green build doesn’t count.
  • There’s a no-coder mode that builds a whole product from a description, for someone who can’t review a diff at all.

The difference that matters: review vs. verify

Two loops side by side. Conductor: the agent produces a diff, you read it, you merge, you decide by looking. PaellaDoc: the agent produces a diff, a gate runs it against your acceptance criteria, then it's done, the gate decides by running.

Conductor’s loop is diff-first: the agent produces changes, you read them, you merge. It makes you a faster, better-equipped reviewer. PaellaDoc’s loop is verify-first: the agent produces changes, a gate executes them against criteria the agent didn’t write, and only then is it done.

This isn’t a matter of taste. We measured it. Across 210 runs, a coding agent’s output passed the build but was genuinely wrong 40% of the time, and even the strongest frontier model failed a hard task two out of three times, on different runs each time. A green build is not a correct feature. Reading a diff that looks right is the same trap: looks right and is right are different claims, and on a non-deterministic agent you can’t close the gap by eye.

Conductor’s spotlight testing helps here, it lets you sync changes back and test them. But that’s you running tests by hand, when you remember to, on the runs you choose to look at. The gate runs the criteria on every change, every time, whether you’re watching or not.

Code, or product

This is the bigger gap, and it’s easy to miss. Conductor operates on code: agents, diffs, worktrees, merges. It’s very good at moving code around, and it doesn’t pretend to be more, which is part of why it’s clean.

PaellaDoc operates on product. The unit isn’t a diff, it’s a product artifact: a PRD, an epic, a user story, its acceptance criteria, each a real .paella file that’s versioned, portable, and comparable. You can diff your product spec the way Conductor diffs code, and compare two approaches at the level of intent, not just changed lines.

The whole thing is extensible through an open SDK, and the community can build and share packs of four kinds: method packs for the methodology itself, stack packs for your tech stack, design packs for theming and design tokens, and validator packs for the gates that check the work. None of this has an equivalent in a tool that operates on diffs.

So the line is this. Conductor makes you faster at producing code, and the product is whatever you remember to keep coherent on top. PaellaDoc makes you build the product, with the agents and the code underneath it.

It doesn’t assume you’re sitting at the Mac

Conductor is an app you sit in front of: you watch the agents and review the diffs. PaellaDoc relaxes both assumptions.

Point it at a repository you already have and it does a reverse intake, reading the existing code and reconstructing the product context around it, so you’re not stuck with greenfield. And you can drive it from Telegram: start work, check a gate, approve a step, from your phone, nowhere near the machine. Together with the no-coder mode, that’s three ways it reaches past “a developer at a Mac,” which is exactly who Conductor is built for, and built well for.

One repo, or all of them

Conductor runs agents inside a repository. That’s the unit: a repo, its worktrees, its diffs.

In the AI era you don’t have one repo, you have a hundred, scattered across your machine, half of them half-finished. PaellaDoc opens and organizes them: every project in one place, taggable, with its state visible, the agents, the gates, where each one stands. It’s a control room for all your projects, not a runner inside one of them.

Why this matters more if you can’t read code

Conductor’s whole workflow is built around reading the diff. That’s the right design for a developer. It’s the wrong design for someone who can’t read code at all.

A no-coder has no diff to review, or rather, the diff means nothing to them. Conductor doesn’t serve that person, because its core loop is human review. PaellaDoc’s no-coder mode is built for exactly them: the gate is the review they can’t do themselves.

What we share, so this is fair

Both are local, on your Mac. Both are model-agnostic (Claude, Codex, and more). Both use your own subscription instead of charging per token. Both isolate work in git worktrees. If you want parallel agents with a clean review UX, Conductor does that well, and it’s free, which is hard to argue with.

Side by side

  Conductor PaellaDoc
Local, on your Mac Yes Yes
Parallel agents in worktrees Yes Yes
Model-agnostic, your subscription Yes Yes
Works on Code (diffs, merges) Product (PRD, US, AC as .paella artifacts)
Deciding “done” You review the diff Execution gate vs. criteria
Extensible packs via open SDK No Method, stack, design, validator packs
Reverse intake (rebuild product context from an existing repo) No Yes
Drive it remotely (Telegram) No Yes
Manages all your projects (control center, tags) No (per repo) Yes
For non-coders No (diff-first) Yes (no-coder mode)
Polish, review UX Ahead Earlier

Who each is for

If you’re a developer who wants to run several agents at once and review their diffs yourself, with a clean, polished UX, Conductor is built for that, and it’s free. Hard to beat for that job.

If you want “done” decided by running the code against criteria instead of by your eyes, or you need a whole product built by someone who can’t review code, that’s where PaellaDoc is different.

It isn’t better, and it isn’t trying to be Conductor. Conductor is a focused tool for running agents and reviewing their code, free and well made. PaellaDoc is the system around the work: the product, the gate, the packs, all your repos, and the people who can’t read a diff. Different jobs, and one of them still works when there’s no one to review.