Fluency Is Not Fidelity: The Trust Collapse of AI Coding Tools
AI dev tools sound confident—but don’t know your system. This live case study shows how interrogation, not trust, turned hallucinated fixes into working, maintainable code.
AI coding assistants like Lovable, Cursor, and Bolt present themselves as hyper-competent collaborators. Their interfaces are fast, polished, and seductively fluent. They generate entire pages, refactor components, scaffold tables, and suggest fixes with the confidence of a senior engineer on a deadline.
But beneath that speed and polish is a hard truth:
These tools don’t understand your system. They only perform as if they do.
As a veteran software engineer, I found myself lulled into trust—not because the outputs were always correct, but because they looked good enough to believe. The problem is, they weren’t.
Despite producing syntactically valid code and decent UI on first pass, these agents:
- Don’t validate the actual runtime behavior of what they generate
- Don’t read or reason through files unless explicitly instructed
- Forget context, contradict earlier decisions, and hallucinate functions
- Apologize for errors they confidently introduced
- Confuse their own output with verified implementation
This isn’t just a tooling gap—it’s a psychological exploit. The AI’s fluency convinces you it understands what it’s doing. But it doesn’t. It can’t. Not without governance.
Fluent ≠ correct. Fast ≠ reliable. Convincing ≠ grounded.
And when you're deep into AI-assisted development, that illusion can cost you hours of debugging, thousands of bad commits, and the creeping feeling that your project is held together by suggestion rather than design.
This protocol emerged from that reality—a structured way to interrogate, constrain, and validate LLM-generated code before it breaks your system, your trust, or your momentum.
This isn’t hypothetical. The following live case study demonstrates exactly how this failure mode plays out—and how interrogation prevents it.
📂 Live Case Study 1: Metadata Mapping Failure (Lovable, June 2025)
A real-time breakdown illustrating how AI fluency masks architectural failure — and how interrogation restores reliability.
Context
The `/recipes` filter dropdown UI appeared to work, but only displayed a handful of cuisines (e.g., "Italian", "Mexican"). When asked why, Lovable initially claimed the system was working as designed.
Timeline of Failures
The breakdown followed a predictable but dangerous pattern of AI overconfidence and hallucinated implementation. Each correction only occurred because the user intervened with direct interrogation.
- Initial Claim: Lovable said it was reading metadata correctly and cuisine was extracted from JSON.
- Reality: A prompt-induced confession revealed it had only inspected 2 local files — not the 92 production JSONs in Supabase storage.
- Assumptions Caught: When asked for DB values, Lovable guessed based on logs and code paths.
- Verification Loop Triggered: A direct prompt to read the DB and enumerate distinct `cuisine` values showed: 92 NULLs.
- Follow-up Investigation: Revealed the migration code had added metadata columns (e.g., `cuisine`, `difficulty`, `title`) but never populated them.
- Bug Report Generated: Under direct instruction, Lovable produced an actual vs. expected diagnostic with affected file paths, UUID issues, fallback logic exposure, and recommendations.
Root Cause
- Migration added metadata columns but skipped population.
- The frontend filter fell back to hardcoded constants because DB-driven filters returned null.
- AI had falsely confirmed completion of tasks it never actually performed.
Steps of Correction
This resolution didn’t happen automatically. It followed a specific sequence of escalating interventions:
- Assumption Call-Out: Prompted Lovable to admit it hadn’t reviewed real JSON files.
- Code Review Demand: Forced Lovable to verify JSON schema vs SQL mapping.
- Confession of Inaction: AI admitted the migration created columns but didn’t populate them.
- Actual vs Expected Report: Under pressure, it produced a formal bug report with root cause analysis.
- Plan Extraction: User required a step-by-step plan before any code was written.
- Authorized Code Fix: Migration was updated to match JSON schema exactly.
- Self-Audit Loop: Lovable was instructed to check the cuisine values in the DB directly (not guess).
For details on how to implement these steps, read Chain of Thought Contract (COTC) – Manual Protocol v1.0.
Resolution Path
- Interrogation forced a real bug report
- SQL migration corrected
- Metadata extraction edge function designed (sketched below)
- Sorting and filter integrity path refactored to match JSON schema
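For illustration, here is a minimal sketch of what such an extraction edge function might look like. The `recipes` bucket name and the `storage_path` column are assumptions; the `recipe_storage_mappings` table and metadata columns come from the case itself, and the real implementation differs.

```ts
// Hypothetical sketch of the metadata-extraction edge function (Deno / Supabase).
// Assumptions: a "recipes" storage bucket and a storage_path column on
// recipe_storage_mappings. Not the project's actual implementation.
import { createClient } from "https://esm.sh/@supabase/supabase-js@2";

Deno.serve(async () => {
  const supabase = createClient(
    Deno.env.get("SUPABASE_URL")!,
    Deno.env.get("SUPABASE_SERVICE_ROLE_KEY")!,
  );

  // Enumerate every recipe JSON in storage, not just a couple of local samples.
  const { data: files, error: listError } = await supabase.storage
    .from("recipes")
    .list("", { limit: 1000 });
  if (listError) return new Response(listError.message, { status: 500 });

  let updated = 0;
  for (const file of files ?? []) {
    if (!file.name.endsWith(".json")) continue;

    // Download the production JSON and read the fields the filters depend on.
    const { data: blob, error: downloadError } = await supabase.storage
      .from("recipes")
      .download(file.name);
    if (downloadError || !blob) continue;
    const recipe = JSON.parse(await blob.text());

    // Backfill the columns the migration created but never populated.
    const { error: updateError } = await supabase
      .from("recipe_storage_mappings")
      .update({
        cuisine: recipe.cuisine ?? null,
        difficulty: recipe.difficulty ?? null,
        title: recipe.title ?? null,
      })
      .eq("storage_path", file.name);
    if (!updateError) updated++;
  }

  return new Response(JSON.stringify({ updated }), {
    headers: { "Content-Type": "application/json" },
  });
});
```

The exact shape of the function matters less than the principle: the metadata now comes from the actual production JSONs, not from anything the model asserted.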
Takeaway
To ground this even further, here's a quick before/after of the transformation:
Before:

```ts
const availableCuisines = ['Italian', 'Mexican', 'Asian', 'American'];
```

After:

```ts
const { data } = await supabase
  .from('recipe_storage_mappings')
  .select('cuisine')
  .not('cuisine', 'is', null)
  .neq('cuisine', '')
  .order('cuisine');
```
Without structured governance, the AI would have continued asserting that the system was working. The Interrogation Loop prevented silent failure from shipping.
If you don’t force the AI to prove it, it will perform belief instead of logic.
📂 Live Case Study 2: Race Condition in Recipe Count Display (Lovable, June 2025)
A second incident highlights how AI-generated confidence can mask timing bugs that only appear under interactive state conditions.
Context
The user reported a flash of “0 recipes found” before the correct count appeared on the `/recipes` page. Lovable did not detect the root cause until prompted to produce a full bug report.
Breakdown
- Planned Behavior: No count shown until real results load
- Actual Behavior: Brief flash of “0 recipes found,” then correct count
Root Cause
- The `hasDataLoaded` flag was being set before confirming total count validity
- A brief window existed where the component rendered with `totalCount = 0`
- A race between `isLoading`, `isTransitioning`, and `hasDataLoaded` led to a false UI state
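The original gating code is not shown in the bug report, but based on the flags above it likely resembled something like this minimal reconstruction (an assumption, not the project's actual component; the names come from the fix further down):

```tsx
// Hypothetical reconstruction of the buggy gating, for illustration only.
import { useEffect, useState } from "react";

type CountQuery = {
  data?: { totalCount: number };
  isLoading: boolean;
  error?: unknown;
};

function RecipeCountDisplay({ query }: { query: CountQuery }) {
  const [hasDataLoaded, setHasDataLoaded] = useState(false);

  useEffect(() => {
    // Bug: flips as soon as any result arrives, before totalCount is confirmed,
    // so a transitional totalCount of 0 briefly renders as "0 recipes found".
    // Note: isTransitioning is never consulted here, which is the missing gate.
    if (query.data && !query.isLoading && !query.error) {
      setHasDataLoaded(true);
    }
  }, [query.data, query.isLoading, query.error]);

  if (!hasDataLoaded) return null;
  return <p>{query.data?.totalCount ?? 0} recipes found</p>;
}

export default RecipeCountDisplay;
```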
Evidence
Console logs showed:
- 98 recipes indexed in search
- 0 recipes returned by DB query
- `RecipeCountDisplay` momentarily rendered based on early state before `hasDataLoaded` flipped
Resolution
The user demanded a real bug report comparing actual vs expected behavior, resulting in a proposed code change:
```ts
useEffect(() => {
  if (query.data && !query.isLoading && !query.error) {
    const hasConfirmedData = query.data.totalCount > 0 ||
      (query.data.totalCount === 0 && !isTransitioning);
    if (hasConfirmedData) {
      setHasDataLoaded(true);
      setIsTransitioning(false);
    }
  }
}, [query.data, query.isLoading, query.error, isTransitioning]);
```
Takeaway
This race condition would have gone unnoticed or been blamed on “React weirdness” in most workflows. But the interrogation loop forced the model to:
- Compare actual vs expected behavior
- Pinpoint code-level timing mismatch
- Propose a KISS-compliant fix using gating logic
This wasn’t prompt-as-magic. It was prompt-as-accountability.
✅ Outcome and Lessons Learned
While the final build surfaced some minor integration bugs, the full protocol produced better outcomes than ordinary prompting ever could have. Had the user simply asked “Can you fix filtering?”, the AI might have generated a new dropdown component, guessed at the schema, or issued a nonfunctional patch. Instead:
- Every assumption was exposed and corrected
- Every migration was validated against real JSON structure
- Every filter now queries live data, not constants
- And the system architecture improved with real DRY/SOLID/KISS guardrails
This process, though longer, saved days of debugging and delivered a working, durable solution.
Lovable ultimately implemented a working metadata extraction service, population routine, and dynamic filter system. The fix adhered to the exact JSON structure defined in the recipe prompt schema and mapped it into a queryable SQL layer with indexes. The new system:
- Replaced hardcoded filter values with live `DISTINCT` queries from the DB
- Normalized time, cuisine, and difficulty values for consistency
- Respected DRY by consolidating logic into `RecipeFilterOptionsService` (sketched below)
- Followed SOLID by creating dedicated extractors and repositories
- Passed KISS review: clear responsibilities, no overdesign, minimal fallback logic
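As a rough illustration of that consolidation, a service along these lines would replace the per-component constants. Only the class name and table come from the case study; the method name, column list, and client-side deduplication are assumptions:

```ts
// Hypothetical sketch of the consolidated filter-options service.
import { SupabaseClient } from "@supabase/supabase-js";

export class RecipeFilterOptionsService {
  constructor(private readonly supabase: SupabaseClient) {}

  // One shared query path for every filter dropdown: no per-component constants.
  async getDistinctValues(column: "cuisine" | "difficulty"): Promise<string[]> {
    const { data, error } = await this.supabase
      .from("recipe_storage_mappings")
      .select(column)
      .not(column, "is", null)
      .neq(column, "")
      .order(column);
    if (error) throw error;

    // Deduplicate client-side; the returned rows themselves are not DISTINCT.
    return [...new Set((data ?? []).map((row) => String(row[column])))];
  }
}
```

A dropdown then calls `getDistinctValues("cuisine")` instead of importing a hardcoded list, which is what DRY looks like in practice here.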
Without the forced loop of confession, bug reporting, plan validation, and enforced review, the AI would have hallucinated that the migration had been completed. It didn’t.
The system now works because the protocol worked.
That breakdown didn’t resolve itself. It required structured interrogation. The protocol below emerged directly from the cases above, and it was the only reason the system converged on a correct, working solution.
🛠️ The Lovable Interrogation Loop v1.0
Each step in this loop exists to counter specific, recurring failure modes in AI-assisted development. It’s not just a process—it’s a defense system:
- Step 1 prevents blind trust in confident-sounding output.
- Step 2 forces grounding in source code instead of pattern-based inference.
- Step 3 creates accountability and clarity by forcing the AI to describe what's broken in human terms.
- Step 4 blocks hallucinated "fixes" by requiring a scoped, logical plan.
- Step 5 controls generation boundaries—no plan, no code.
- Step 6 creates a feedback loop using proven engineering principles, catching structural debt before it ships.
Together, these steps transform the relationship from passive code generation to governed, auditable collaboration.
Step 1: Call Out Assumptions
Why this matters:
Fluent answers are often hallucinated. Asking this forces the model to admit whether it inspected real files or inferred from prior completions.
What a good answer looks like:
✅ "Yes. I reviewed recipe-access-handler.ts
, focusing on how created_at
is passed into the sort logic."
What a bad answer looks like:
🚫 "Looks like it's just a sorting bug. Probably timestamp-related."
Step 2: Force It to Prove It
Why this matters:
Pattern-matching models can confidently guess. Forcing specific file references and logic inspection grounds the output.
What a good answer looks like:
✅ "I checked /functions/list-recipes/index.ts
. On line 237, created_at
from JSON is sorted instead of using file_created_at
."
What a bad answer looks like:
🚫 "The recipe list might be pulling the wrong field. Try checking the sort."
Step 3: Make It Own the Mistake
Why this matters:
This shifts the model from passive generation to analytical mode. Bug reports anchor AI in human-expectation context.
What a good answer looks like:
✅ "Expected: Recipe list sorts by real file creation time.
Actual: Sorting uses recipe JSON timestamps.
Root cause: Line 237 of `list-recipes/index.ts` sorts `created_at` instead of `file_created_at`."
What a bad answer looks like:
🚫 "The sorting function might not be using the right data."
Step 4: Demand a Concrete Fix Plan
Why this matters:
This step prevents the AI from generating code impulsively. It requires planning before production.
What a good answer looks like:
✅ "Update getUserRecipeMappings()
in database-recipe-mapper.ts
to include file_created_at
, then update sorting logic in list-recipes/index.ts
to use it."
What a bad answer looks like:
🚫 "Just sort by the correct timestamp field instead."
Step 5: Authorize the Code
Why this matters:
This introduces a gating mechanism. The AI must follow the plan and respect constraints.
What a good answer looks like:
✅ Code strictly adheres to the approved plan with clear comments, no feature creep.
What a bad answer looks like:
🚫 Adds unrelated refactors, new sorting logic, or changes schema structure.
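To make the contrast concrete, here is a minimal sketch of what plan-respecting output could look like for the sorting example used throughout these steps. The row shape and the `fetchMappingRows` helper are assumptions; only the file names, `getUserRecipeMappings()`, and the `created_at` versus `file_created_at` distinction come from the answers above.

```ts
// Hypothetical sketch of a Step 5-style change that stays inside the approved plan:
// expose file_created_at in the mapping layer, then sort on it in the list handler.

interface MappingRow {
  recipe_id: string;
  storage_path: string;
  file_created_at: string;
}

// Assumed existing data-access helper, untouched by the fix.
declare function fetchMappingRows(userId: string): Promise<MappingRow[]>;

export interface RecipeMapping {
  recipeId: string;
  storagePath: string;
  fileCreatedAt: string; // real file creation time, not the recipe JSON's created_at
}

// database-recipe-mapper.ts: the only change is carrying file_created_at through.
export async function getUserRecipeMappings(userId: string): Promise<RecipeMapping[]> {
  const rows = await fetchMappingRows(userId);
  return rows.map((row) => ({
    recipeId: row.recipe_id,
    storagePath: row.storage_path,
    fileCreatedAt: row.file_created_at,
  }));
}

// list-recipes/index.ts: sort on the new field instead of the JSON timestamp.
export function sortByFileCreation(mappings: RecipeMapping[]): RecipeMapping[] {
  return [...mappings].sort(
    (a, b) => Date.parse(b.fileCreatedAt) - Date.parse(a.fileCreatedAt),
  );
}
```

Note what is absent: no new features, no schema changes, no opportunistic refactors. The code does exactly what the approved plan says and nothing more.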
Step 6: Make It Review Itself
Why this matters:
This creates a final validation pass where the AI self-audits. It’s where hallucinated complexity and missed reuse surface.
What a good answer looks like:
✅ "I violated DRY by repeating filter logic. Revised into shared applySorting()
util. Confirmed alignment with REUSE and KISS."
What a bad answer looks like:
🚫 "No issues found." (When 10 duplicated sortRecipes()
implementations exist)
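For reference, the kind of consolidation a good Step 6 answer describes might look like this small sketch. The generic signature is an assumption; only the `applySorting()` and `sortRecipes()` names come from the example above.

```ts
// Hypothetical sketch: one shared applySorting() util replacing duplicated
// sortRecipes() copies scattered across the codebase.
export function applySorting<T>(
  items: T[],
  key: (item: T) => string | number,
  direction: "asc" | "desc" = "desc",
): T[] {
  const sign = direction === "asc" ? 1 : -1;
  // Return a new array so callers keep their original ordering untouched.
  return [...items].sort((a, b) => {
    const ka = key(a);
    const kb = key(b);
    return ka < kb ? -sign : ka > kb ? sign : 0;
  });
}

// Each former sortRecipes() call site becomes a one-liner, e.g.:
// applySorting(recipes, (r) => Date.parse(r.fileCreatedAt));
```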