The Reality of Vibe Coding: Why AI Still Fails at Complex Projects

Dong Liang
6 min read · Mar 4, 2025


Documentation-driven development

Today, I embarked on a new experiment — one I call Documentation Driven Development. This approach stands in contrast to the trending “vibe coding” — a term popularized by Andrej Karpathy in a Twitter post. Vibe coding refers to the practice of providing loose, casual, and often ambiguous comments that AI can somehow interpret and convert into functional, sometimes even high-quality code. The idea is compelling: just describe what you need in a few natural-language prompts, and let AI handle the complexity.

You’ve likely seen those viral videos titled “Make a full-stack application in 5 minutes without writing a single line of code.” And in a superficial sense, these demonstrations are correct. Modern AI can indeed generate functional code for a full-stack app in minutes, masking the underlying tech stack complexity with its vast corpus of training data. However, there’s a catch.

The Problem with Vibe Coding

Most of these videos are just proof-of-concept demonstrations rather than realistic development processes. They rarely go beyond generating a landing page or a simple dashboard. While this is still impressive, real-world projects — whether professional or even hobbyist — require significantly more rigor.

From my own painful experiences attempting vibe coding, I’ve realized that AI-generated code is often promising but incomplete. It’s great at boilerplate generation and providing scaffolding, but when you move beyond the surface level, things quickly fall apart. AI starts making mistakes, and worse, it perpetuates errors in an endless cycle of fixing and creating new problems. At some point, debugging AI-generated code becomes more labor-intensive than writing the code manually.

This realization prompted me to explore a more structured approach, inspired by Rob Shocks, whose videos emphasize documentation-driven development.

The Documentation-Led Approach

For my current project, I needed to merge two packages inside my app structure so that there would be only one package.json. This would simplify dependency management and reduce overhead. Initially, I thought this might be a straightforward vibe coding task: just ask the AI to do it and let it figure out the steps. However, after several frustrating attempts that left the AI stuck in loops of errors and misinterpretations, I decided to take a different approach.
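To make the goal concrete, here is a minimal sketch of the mechanical half of that merge, assuming two hypothetical workspace packages at packages/app and packages/editor rather than my actual paths. It only combines the dependency maps and flags version conflicts for manual review; moving the source files and rewriting imports is the part the AI actually struggled with.

```typescript
// Sketch only: merge the dependency maps of two hypothetical workspace packages
// into a single root package.json. Paths and package names are placeholders.
import { readFileSync, writeFileSync } from "fs";

type DepMap = Record<string, string>;

interface PackageJson {
  name?: string;
  dependencies?: DepMap;
  devDependencies?: DepMap;
  [key: string]: unknown;
}

// Combine two dependency maps, warning when the declared version ranges disagree.
function mergeDeps(a: DepMap = {}, b: DepMap = {}): DepMap {
  const merged: DepMap = { ...a };
  for (const [pkg, version] of Object.entries(b)) {
    if (merged[pkg] && merged[pkg] !== version) {
      // A conflicting range needs a human (or at least a careful AI) decision.
      console.warn(`Version conflict for ${pkg}: ${merged[pkg]} vs ${version}`);
    }
    merged[pkg] = merged[pkg] ?? version;
  }
  return merged;
}

const app: PackageJson = JSON.parse(readFileSync("packages/app/package.json", "utf8"));
const editor: PackageJson = JSON.parse(readFileSync("packages/editor/package.json", "utf8"));

const combined: PackageJson = {
  ...app,
  dependencies: mergeDeps(app.dependencies, editor.dependencies),
  devDependencies: mergeDeps(app.devDependencies, editor.devDependencies),
};

// Write the merged manifest to the repository root for review before committing.
writeFileSync("package.json", JSON.stringify(combined, null, 2) + "\n");
```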

I instructed Cursor (an AI coding assistant) to first generate a structured package-structure-analysis document that outlines the various phases, stages, and tasks required for the refactor. Here’s what happened:

For Phases 1 & 2 (Codebase Scanning & Documentation Generation), the AI-generated documentation seemed thorough, but I wasn’t sure how much of it was actually useful.
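To make this more concrete, the plan document had roughly the shape sketched below. This outline is an illustration reconstructed from the details discussed in this post, not the verbatim file the AI produced:

```
# package-structure-analysis (abridged sketch)

## Phase 1: Codebase Scanning
- Inventory both packages: extensions, plugins, utilities, hooks, and shared components
- Map the dependencies and devDependencies declared in each package.json

## Phase 2: Documentation Generation
- Document the target directory structure and the barrel files to create
- List every file to migrate and the exact location it should land in

## Phase 3: Implementation
- Migrate files in the documented order, updating imports as you go
- Update the root package.json and run the install command
- Validate: type-check, build, and run the app
```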

Phase 3 (Implementation) is where things got interesting. The AI started deviating from the plan, taking liberties in some areas while skipping explicitly outlined steps. When I confronted it about these discrepancies, it responded with an honest admission.

## Understanding the Discrepancies

Looking at the implementation versus the plan, I notice several key differences:

1. Incomplete Migration: I only migrated a small subset of the files that needed to be moved. For example, I only migrated one extension (ai-highlight.ts) when there were several others mentioned in the plan.
2. Directory Structure: I created some directories that weren't explicitly mentioned in the plan (like lib/editor/components) and missed creating others.
3. Dependency Installation: While I updated the package.json with the necessary dependencies, I didn't run the installation command to actually install them, which explains many of the TypeScript errors.
4. Partial Implementation: I didn't complete all the tasks outlined in the document, such as migrating all plugins, utilities, and hooks.

The response made sense, and Claude 3.7 recommended some corrective steps. I followed through by updating the packages, and then asked Claude to complete the remaining tasks, which it did.

When it was time to review, I pointed out the problems above by saying:

How can the process work better? I noticed you took liberties on some things and then skipped others explicitly mentioned in these task lists.

The response was:

## Issues with My Implementation

1. Inconsistent Adherence to Task Lists: I followed some parts of the task list closely while taking liberties with others. For example:
   - I created directories not explicitly mentioned in the plan (like lib/editor/components)
   - I skipped migrating some files that were explicitly mentioned
   - I didn't fully implement all the barrel files as specified
2. Selective Migration: I initially only migrated a subset of files (like one extension) instead of migrating all files as specified in the task list.
3. Lack of Systematic Approach: I didn't systematically work through each task in order, which led to gaps in the implementation.
4. Incomplete Documentation Review: While I referenced the documentation, I didn't thoroughly cross-check my implementation against all the specified requirements.

It then made an excellent statement of how to do things better:

## How the Process Could Work Better

1. Strict Task-by-Task Implementation:
   - Follow each task in the exact order specified
   - Complete each task fully before moving to the next
   - Use a checklist approach to mark off completed items
2. Explicit Confirmation of Task Completion:
   - After each major task, provide a summary of what was done
   - Highlight any deviations from the plan and explain why
   - Ask for confirmation before proceeding to the next task
3. Better Documentation Cross-Referencing:
   - Create a consolidated reference of all requirements from different documents
   - Explicitly map each implementation step to the corresponding documentation
   - Regularly check back against the original requirements
4. Systematic File Migration:
   - Create a complete inventory of files to be migrated before starting
   - Track progress with a clear status for each file
   - Verify each migrated file against its original counterpart
5. Clearer Communication About Decisions:
   - When taking liberties or making decisions not specified in the plan, explicitly highlight these
   - Provide rationale for any deviations from the plan
   - Offer alternatives when the plan might not be optimal
6. Regular Validation Steps:
   - Add more intermediate validation steps to ensure the work is on track
   - Run tests after smaller groups of changes rather than at the end
   - Verify directory structures and file contents more frequently
7. Explicit Handling of Dependencies:
   - Better analyze and document dependencies between files before migration
   - Ensure dependent files are migrated in the correct order
   - Verify that all dependencies are properly maintained after migration
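Of these, the “complete inventory of files to be migrated” idea under point 4 is the one I found most actionable. Below is a minimal sketch of what that could look like, assuming a hypothetical source package at packages/editor/src rather than my project’s real layout: it walks the tree and writes a markdown checklist that either I or the AI can tick off, so nothing gets silently skipped.

```typescript
// Sketch only: build a migration inventory as a markdown checklist.
// The paths below are hypothetical placeholders, not my project's real layout.
import { mkdirSync, readdirSync, writeFileSync } from "fs";
import { join } from "path";

// Recursively collect every file underneath a directory.
function listFiles(dir: string): string[] {
  return readdirSync(dir, { withFileTypes: true }).flatMap((entry) =>
    entry.isDirectory() ? listFiles(join(dir, entry.name)) : [join(dir, entry.name)]
  );
}

const sourceRoot = "packages/editor/src"; // hypothetical package being folded in
const files = listFiles(sourceRoot).filter((f) => f.endsWith(".ts") || f.endsWith(".tsx"));

// One unchecked item per file, e.g. "- [ ] packages/editor/src/extensions/ai-highlight.ts"
const checklist = ["# Migration inventory", "", ...files.map((f) => `- [ ] ${f}`)].join("\n");

mkdirSync("docs", { recursive: true });
writeFileSync("docs/migration-checklist.md", checklist + "\n");
console.log(`Inventoried ${files.length} files to migrate.`);
```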

Eloquence vs. Execution: The Bane of Current Agentic Coding Frameworks

Here’s the paradox: AI is incredibly articulate when discussing best coding practices, yet it consistently fails to follow those very principles. Imagine a professor who writes bestselling books on productivity and efficiency but struggles to organize their own daily schedule. Their theories are well-articulated, and they can explain best practices in excruciating detail; however, without internalizing those principles, their day-to-day actions remain far from ideal. In the same way, AI models are trained on vast amounts of text that include these best practices, and they can articulate them flawlessly. Yet, since AI does not “believe” or have intrinsic motivation, it doesn’t commit to those practices when it actually generates code.

This is why a model that is so eloquent and convincing about the best ways to write software can simultaneously make naive, lazy mistakes that an intern wouldn’t. AI doesn’t “believe” in best practices; it merely regurgitates what it has been trained on.

But can this problem be tackled with prompt engineering? The current implementation of guidance in Cursor remains primitive, though it is rapidly evolving. Previously, you could define a single .cursorrules file, which functioned much like the system prompt in a regular LLM session. More recently, Cursor introduced project rules, allowing you to define specific needs or contexts for individual projects.
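To give a flavor of what such guidance can look like, here is an illustrative legacy-style .cursorrules file, i.e. plain natural-language instructions at the repository root. It is a hypothetical sketch distilled from the AI’s own recommendations above, not the exact rule set I used:

```
# .cursorrules (illustrative sketch, not this project's actual rules)
- Before implementing, read the relevant plan document and restate its task list.
- Follow the tasks strictly in the order given; do not skip or reorder steps.
- After each task, summarize what was done and flag any deviation from the plan before continuing.
- If a step requires installing dependencies, run the install command and report the result.
- Do not create directories or files the plan does not name without calling it out explicitly.
```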

There have been attempts to use this system prompt or these project rules to guide the LLM more effectively. One such attempt seeks to emulate the performance of Devin — a highly impressive agentic framework — by instructing the LLM not only to follow but also to dynamically revise the .cursorrules file.

Given that simple prompt adherence remains an unresolved challenge, it’s no surprise that multi-step AI-driven agentic development still struggles. When an AI-generated project requires numerous interdependent steps, errors compound quickly, leading to chaos. If AI cannot reliably adhere to a single detailed task list, how can it be expected to manage the complexities of an entire software development cycle autonomously?

Ultimately, while vibe coding may still work for trivial projects, for anything non-trivial, a structured, documentation-driven approach remains necessary — at least until AI reaches a level where it can truly plan and execute multi-step coding tasks with reliability.

Despite the setbacks, my project was ultimately a success. The packages were merged without obvious issues, and to my surprise, the app built and ran without errors.



