Spec-Driven Development in Practice: GitHub Spec Kit, OpenSpec, and GSD Compared

Compare GitHub Spec Kit, OpenSpec, and GSD — the top spec-driven development frameworks for AI-assisted development — with a real feature walkthrough and honest trade-offs.

Authors

Elian Ortega

Software Developer

May 21, 2026

Spec-Driven Development has a clear premise: the quality of what AI produces is proportional to the quality of what you give it. But knowing that isn't enough. The real question is which framework, which artifacts, and which workflow to use when applying SDD to a real feature. This post compares the three frameworks the developer community has converged on: GitHub Spec Kit, OpenSpec, and GSD. Each one is applied to the same real feature so you can see exactly how each one thinks.

The bottleneck shifted

GitHub Spec Kit, OpenSpec, and GSD have accumulated roughly 204,000 combined GitHub stars in under eighteen months. The developer community has converged on these three as working answers to a shared problem.

We've all heard about Spec-Driven Development. The premise is clear: the quality of what AI produces is proportional to the quality of what we give it. Write a vague prompt, get vague code. Specify the intent, get something closer to what you need. What's less discussed is what that looks like in practice: which framework, which artifacts, which workflow, and what the trade-offs are when you apply them to a real feature.

Eighteen months ago, AI in a developer's workflow meant tab-complete on steroids. Today, engineering teams are asking agents to implement entire features. The tooling caught up faster than the workflows did, and the result is familiar: you describe a feature in a prompt, the agent generates several hundred lines of code, and what comes back is plausible but wrong in ways that take longer to untangle than the original work would have taken. The agent didn't misunderstand the language. It misunderstood the intent, because the intent was never specified clearly enough to be misunderstood in a useful way.

If you want the foundations of why SDD matters structurally, we've covered them in an earlier post. This piece starts where that one left off: three frameworks the community is reaching for, applied to a single real feature so you can see how each one thinks.

Where engineering judgment still lives

Code generation is not the hard part of building software. It never was, not when we were writing it by hand, and not now. The hard parts are understanding what the user actually needs, deciding how to model the domain, making architectural trade-offs with downstream consequences, and verifying that what shipped matches what was intended.

Those are judgment calls. Frameworks can scaffold the artifacts that capture that judgment, but they cannot replace it. What the best SDD tooling does is force that judgment to happen earlier, before the agent touches a file, and make it explicit enough to be reviewed, versioned, and challenged.

The example: a forgot-password flow

To make the framework comparison concrete, we'll follow a single feature through each one: a forgot-password flow, built without delegating to Firebase Auth or Auth0. This is illustrative, not a tutorial.

The feature looks simple. It isn't. It has to settle token TTL, rate limiting on the request endpoint, user enumeration prevention (returning the same response whether or not the email exists), email deliverability, expired-link UX, audit logging, and localization. A spec that doesn't surface these requirements upfront will have an agent making silent assumptions about all of them. That is where the frameworks diverge.
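To make three of those hidden requirements tangible, here is a minimal in-memory sketch of token TTL, rate limiting, and user enumeration prevention. It is not taken from any of the frameworks. The 15-minute TTL, the three-requests-per-hour limit, the response text, and every name in it (`request_reset`, `redeem`, `users`, `outbox`) are assumptions chosen for illustration.

```python
import hashlib
import secrets
import time

TOKEN_TTL_SECONDS = 15 * 60      # assumed TTL; a real spec would state this
MAX_REQUESTS_PER_WINDOW = 3      # assumed rate limit per email address
WINDOW_SECONDS = 60 * 60
GENERIC_RESPONSE = "If that address is registered, a reset link is on its way."

users = {"ana@example.com"}      # hypothetical user store
tokens = {}                      # sha256(token) -> (email, expires_at)
recent_requests = {}             # email -> timestamps of recent requests
outbox = []                      # stands in for an email provider


def request_reset(email, now=None):
    """Return the same message whether or not the email is registered."""
    now = time.time() if now is None else now
    window = [t for t in recent_requests.get(email, []) if now - t < WINDOW_SECONDS]
    window.append(now)
    recent_requests[email] = window
    if len(window) <= MAX_REQUESTS_PER_WINDOW and email in users:
        token = secrets.token_urlsafe(32)
        digest = hashlib.sha256(token.encode()).hexdigest()
        tokens[digest] = (email, now + TOKEN_TTL_SECONDS)  # store only the hash
        outbox.append((email, token))
    return GENERIC_RESPONSE


def redeem(token, now=None):
    """Return the email for a valid, unexpired token; tokens are single-use."""
    now = time.time() if now is None else now
    digest = hashlib.sha256(token.encode()).hexdigest()
    record = tokens.pop(digest, None)
    if record is None or now > record[1]:
        return None
    return record[0]
```

Even this small sketch makes several decisions a vague prompt would leave to the agent: how long a token lives, whether it can be reused, and what an unknown address sees. It does not address timing differences between known and unknown addresses, email deliverability, audit logging, or localization.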

GitHub Spec Kit: the formal, spec-first standard

GitHub Spec Kit (currently v0.8.7, MIT) installs a Python CLI called specify and a family of slash commands into your AI coding agent of choice. The install is explicit about version pinning:

The workflow is phase-gated. Before any code is written, the team works through a chained command sequence: /speckit.constitution to establish non-negotiable principles, /speckit.specify to describe what to build in user-story terms, /speckit.plan to document the how, and /speckit.tasks to decompose the plan into reviewable work units. /speckit.implement executes last. Each phase produces a named artifact; skipping a phase means skipping its artifact.
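The gating idea can be modeled in a few lines. This is a sketch of the concept only, not Spec Kit's implementation: the command names come from the sequence above, while the `plan.md` and `tasks.md` filenames and the `next_command` helper are assumptions made for the example.

```python
# Command names are from the article; plan.md and tasks.md are assumed
# artifact names used only for this sketch.
PHASES = [
    ("/speckit.constitution", "constitution.md"),
    ("/speckit.specify", "spec.md"),
    ("/speckit.plan", "plan.md"),
    ("/speckit.tasks", "tasks.md"),
]


def next_command(existing_artifacts):
    """Return the first phase command whose artifact is still missing."""
    for command, artifact in PHASES:
        if artifact not in existing_artifacts:
            return command
    # Every upstream artifact exists, so implementation may proceed.
    return "/speckit.implement"
```

With only `constitution.md` and `spec.md` on disk, `next_command` returns `/speckit.plan`. Implementation is reachable only once all four artifacts exist.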

For the forgot-password flow, /speckit.specify would produce a spec.md inside .specify/specs/[FEATURE-ID]/ that describes the flow entirely in functional terms: no stack decisions, no implementation choices. Something like:

The user enumeration requirement lives in the spec, not in a comment in the code. /speckit.plan would then document the token TTL, the rate-limiting strategy, and the email provider, each decision with its rationale. The constitution.md ships with nine default articles including Test-First, which means the plan-checker enforces that tests exist and fail before implementation begins.

The overhead is real. Birgitta Böckeler's analysis on martinfowler.com noted that a single spec generates eight files. For a three-point story, that's hard to justify. For a feature with the surface area of a password reset flow (token storage, expiry handling, email templating, audit trail), the artifact count carries its weight.

Spec Kit currently supports 30+ AI coding agents. Its constitution.md (architectural principles applied uniformly across every feature) is the most distinctive thing about it and the most direct answer to the problem of agents making inconsistent decisions across a long-running project.

OpenSpec: brownfield-first, change by change

OpenSpec (v1.3.1, MIT, @fission-ai/openspec on npm, Node.js 20.19.0+) takes a different position on what a spec is for. Where Spec Kit treats a spec as an upfront description of a feature to be built, OpenSpec treats a spec as a living record of how the system currently behaves, with each change expressed as a delta against that record.

The directory layout makes the distinction structural:

For the forgot-password flow, /opsx:propose forgot-password creates the changes/forgot-password/ folder. The delta spec inside would contain only what's new:

When the change ships, openspec archive forgot-password merges those requirements into openspec/specs/auth/spec.md and moves the change folder to changes/archive/2026-05-11-forgot-password/. The spec grows to reflect the new behavior. The history of how it got there lives in changes/archive/.
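As a rough mental model of what archiving does, the sketch below merges a delta into the current spec and computes the dated archive folder. It is not OpenSpec's code. Representing requirements as a dictionary keyed by name, the `archive_change` function, and the assumption that the archive lives under `openspec/changes/archive/` are all illustrative choices.

```python
from datetime import date
from pathlib import PurePosixPath


def archive_change(specs, delta, change_name, capability, shipped_on):
    """Merge delta requirements into a capability's spec and name the archive folder.

    specs: {capability: {requirement_name: text}} (assumed representation)
    delta: {requirement_name: text} for what the change adds or modifies
    """
    merged = dict(specs.get(capability, {}))
    merged.update(delta)  # new or modified requirements overwrite by name
    updated_specs = {**specs, capability: merged}
    archive_dir = (
        PurePosixPath("openspec/changes/archive")
        / f"{shipped_on.isoformat()}-{change_name}"
    )
    return updated_specs, archive_dir
```

Calling it with an `auth` spec that already describes login, a delta containing only the reset requirement, and a ship date of 2026-05-11 yields a spec with both requirements and the path `openspec/changes/archive/2026-05-11-forgot-password`.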

The practical advantage for existing codebases is that you never have to reconstruct the entire spec before you can work. You describe what's changing. For teams with an existing auth system that already handles login, session management, and token refresh, this model fits. You're not writing a greenfield spec for something that already exists; you're documenting a delta.

OpenSpec supports 25+ tools and requires no MCP dependency or API keys. The trade-off against Spec Kit is ecosystem maturity: Spec Kit has GitHub's distribution and documentation surface, while OpenSpec's community is active but Discord-driven. The /opsx:ff fast-forward command, which scaffolds all change artifacts in a single step, is a practical time-saver once the workflow is familiar.

GSD: phase-based delivery with atomic execution

GSD (Get Stuff Done, v1.41.2, MIT, get-shit-done-cc on npm) attacks a different problem. Where Spec Kit and OpenSpec focus on the quality of the specification artifact, GSD focuses on what happens after the plan is approved: keeping the agent's execution context clean across a multi-task implementation.

The core insight is that context rot (degradation in output quality as a single AI session accumulates planning artifacts, execution history, and review feedback) is a structural problem, not a prompt engineering problem. GSD's answer is to give each execution unit a fresh ~200K-token context window and commit atomically after each task. One task, one commit. The git log becomes a verifiable trace of what was built and in what order.
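A little arithmetic shows why this is structural. The sketch below compares how much context is already loaded at the start of each task in one shared session versus a fresh window per task. The token counts in the usage note are made up for illustration, and `context_sizes` is a hypothetical helper, not part of GSD.

```python
CONTEXT_BUDGET_TOKENS = 200_000  # the article's approximate per-unit window


def context_sizes(task_tokens, plan_tokens, fresh):
    """Tokens already in context when each task starts.

    task_tokens: tokens each task adds (code read, output, review feedback)
    plan_tokens: tokens of planning material loaded for every task
    fresh: True models a new window per task, False one long-running session
    """
    sizes = []
    carried = 0
    for tokens in task_tokens:
        sizes.append(plan_tokens + carried)
        if not fresh:
            carried += tokens  # a shared session keeps every earlier task's history
    return sizes
```

With five tasks of 50,000 tokens each and a 20,000-token plan, the shared session starts its fifth task at 220,000 tokens, past the budget, while the fresh-window run starts every task at 20,000.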

For the forgot-password flow, the phase lifecycle looks like this:

The plan-checker step is worth noting. Before any execution begins, a "Nyquist auditor" validates that every task in the plan has an automated feedback command: a curl call, a test invocation, something that confirms the task completed correctly. Plans without those commands are rejected and bounced back to the planner, up to three times.
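The rule is simple enough to sketch. The code below is an illustration of the check as described here, not GSD's auditor: the task dictionary shape, the `verify` key, and the `revise` callback standing in for the planner are all assumptions.

```python
MAX_REVISIONS = 3  # per the article: rejected plans bounce back up to three times


def missing_feedback(plan):
    """Names of tasks that lack an automated feedback command."""
    return [task["name"] for task in plan if not (task.get("verify") or "").strip()]


def check_plan(plan, revise):
    """Return (approved_plan, revisions_used), or (None, MAX_REVISIONS) on failure.

    revise(plan, gaps) stands in for the planner and returns a new plan.
    """
    for revisions in range(MAX_REVISIONS + 1):
        gaps = missing_feedback(plan)
        if not gaps:
            return plan, revisions
        if revisions < MAX_REVISIONS:
            plan = revise(plan, gaps)  # bounce the plan back to the planner
    return None, MAX_REVISIONS
```

A plan whose "add request endpoint" task has no `verify` command is sent back once, and it is approved as soon as the planner attaches something runnable, such as a test invocation or a curl call.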

GSD is the most execution-focused of the three frameworks. Its /gsd-discuss-phase command surfaces the forgot-password feature's hidden requirements (token TTL, rate limiting, user enumeration) through structured Q&A rather than through a formal spec artifact. The context is captured, but it's session-local rather than independently reviewable as a specification document.

Community posts report roughly a 4:1 token overhead versus ad-hoc prompting. The Claude Pro plan is generally insufficient for sustained use; most experienced users run Max or the API directly.

A factual comparison

	GitHub Spec Kit	OpenSpec	GSD
Install	uv tool install specify-cli	npm install -g @fission-ai/openspec	npx get-shit-done-cc@latest
Runtime	Python 3.11+	Node.js 20.19.0+	Node.js (via npx)
Primary artifact	spec.md + constitution.md	changes/<name>/proposal.md → merges into specs/	CONTEXT.md + PLAN.md per phase
Default directory	.specify/	openspec/	.planning/
Supported agents	30+	25+	Claude Code–first, others supported
License	MIT	MIT	MIT
GitHub stars	~95.6k	~46.9k	~61.5k
Latest version	v0.8.7 (pre-1.0)	v1.3.1	v1.41.2

Data as of May 2026.

The tool is not the answer

All three frameworks are serious attempts to solve the same problem: the quality of what AI produces is directly proportional to the quality of what we give it. Each one reaches a different conclusion about where that quality is hardest to maintain. Spec Kit says it's at the upfront specification stage, and enforces structure there. OpenSpec says it's at the change boundary in an existing system, and makes delta management explicit. GSD says it's during execution, and manages context hygiene across the whole delivery cycle.

None of them removes the engineering judgment. They structure where it happens and make its outputs reviewable.

At Somnio, we've worked with each of these frameworks and we don't commit to a single one. The question we start with isn't "which framework?" but what the project needs. A greenfield feature in a new product has different requirements than adding a security-sensitive flow to a codebase that's been in production for three years. A team of two has different coordination overhead than a team of fifteen.

What we don't lose sight of, regardless of which framework or which agent: we are building applications to be used by users, not by engineers or AI agents. The specification exists to ensure that what gets built is what the user actually needs. Every phase gate, every artifact, every atomic commit is in service of that. The frameworks are a means to that end. Engineering judgment is what connects them.

At Somnio Software, we design and build high-quality digital products using Flutter, Dart, and modern AI-assisted development practices. If you're looking for a team that takes software quality seriously at every layer, from specification to production, we'd like to hear about your project.