Fluency Is Not Fidelity: The Trust Collapse of AI Coding Tools
AI dev tools sound confident—but don’t know your system. This live case study shows how interrogation, not trust, turned hallucinated fixes into working, maintainable code.
AI coding assistants like Lovable, Cursor, and Bolt present themselves as hyper-competent collaborators. Their interfaces are fast, polished, and seductively fluent. They generate entire pages, refactor components, scaffold tables, and suggest fixes with the confidence of a senior engineer on a deadline.
But beneath that speed and polish is a hard truth:
These tools don’t understand your system. They only perform as if they do.
As a veteran software engineer, I found myself lulled into trust—not because the outputs were always correct, but because they looked good enough to believe. The problem is, they weren’t.
Despite producing syntactically valid code and decent UI on first pass, these agents:
- Don’t validate the actual runtime behavior of what they generate
- Don’t read or reason through files unless explicitly instructed
- Forget context, contradict earlier decisions, and hallucinate functions
- Apologize for errors they confidently introduced
- Confuse their own output with verified implementation
This isn’t just a tooling gap—it’s a psychological exploit. The AI’s fluency convinces you it understands what it’s doing. But it doesn’t. It can’t. Not without governance.
Fluent ≠ correct. Fast ≠ reliable. Convincing ≠ grounded.
And when you're deep into AI-assisted development, that illusion can cost you hours of debugging, thousands of bad commits, and the creeping feeling that your project is held together by suggestion rather than design.
This protocol emerged from that reality—a structured way to interrogate, constrain, and validate LLM-generated code before it breaks your system, your trust, or your momentum.
This isn’t hypothetical. The following live case study demonstrates exactly how this failure mode plays out—and how interrogation prevents it.
📂 Live Case Study 1: Metadata Mapping Failure (Lovable, June 2025)
A real-time breakdown illustrating how AI fluency masks architectural failure — and how interrogation restores reliability.
Context
The `/recipes` filter dropdown UI appeared to work, but only displayed a handful of cuisines (e.g., "Italian", "Mexican"). When asked why, Lovable initially claimed the system was working as designed.
Timeline of Failures
The breakdown followed a predictable but dangerous pattern of AI overconfidence and hallucinated implementation. Each correction only occurred because the user intervened with direct interrogation.
- Initial Claim: Lovable said it was reading metadata correctly and cuisine was extracted from JSON.
- Reality: A prompt-induced confession revealed it had only inspected 2 local files — not the 92 production JSONs in Supabase storage.
- Assumptions Caught: When asked for DB values, Lovable guessed based on logs and code paths.
- Verification Loop Triggered: A direct prompt to read the DB and enumerate distinct `cuisine` values showed: 92 NULLs.
- Follow-up Investigation: Revealed the migration code had added metadata columns (e.g., `cuisine`, `difficulty`, `title`) but never populated them.
- Bug Report Generated: Under direct instruction, Lovable produced an actual vs. expected diagnostic with affected file paths, UUID issues, fallback logic exposure, and recommendations.
Root Cause
- Migration added metadata columns but skipped population.
- The frontend filter fell back to hardcoded constants because DB-driven filters returned null.
- AI had falsely confirmed completion of tasks it never actually performed.
Steps of Correction
This resolution didn’t happen automatically. It followed a specific sequence of escalating interventions:
- Assumption Call-Out: Prompted Lovable to admit it hadn’t reviewed real JSON files.
- Code Review Demand: Forced Lovable to verify JSON schema vs SQL mapping.
- Confession of Inaction: AI admitted the migration created columns but didn’t populate them.
- Actual vs Expected Report: Under pressure, it produced a formal bug report with root cause analysis.
- Plan Extraction: User required a step-by-step plan before any code was written.
- Authorized Code Fix: Migration was updated to match JSON schema exactly.
- Self-Audit Loop: Lovable was instructed to check the cuisine values in the DB directly (not guess).
For details on how to implement these steps, read Chain of Thought Contract (COTC) – Manual Protocol v1.0.
Resolution Path
- Interrogation forced a real bug report
- SQL migration corrected
- Metadata extraction edge function designed (sketched below)
- Sorting and filter integrity path refactored to match JSON schema
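For illustration, here is a minimal sketch of what such an extraction edge function might look like. The `recipes` bucket name and the `storage_path` column are assumptions; the `recipe_storage_mappings` table and metadata columns come from the case itself, and the real implementation differs.

```ts
// Hypothetical sketch of the metadata-extraction edge function (Deno / Supabase).
// Assumptions: a "recipes" storage bucket and a storage_path column on
// recipe_storage_mappings. Not the project's actual implementation.
import { createClient } from "https://esm.sh/@supabase/supabase-js@2";

Deno.serve(async () => {
  const supabase = createClient(
    Deno.env.get("SUPABASE_URL")!,
    Deno.env.get("SUPABASE_SERVICE_ROLE_KEY")!,
  );

  // Enumerate every recipe JSON in storage, not just a couple of local samples.
  const { data: files, error: listError } = await supabase.storage
    .from("recipes")
    .list("", { limit: 1000 });
  if (listError) return new Response(listError.message, { status: 500 });

  let updated = 0;
  for (const file of files ?? []) {
    if (!file.name.endsWith(".json")) continue;

    // Download the production JSON and read the fields the filters depend on.
    const { data: blob, error: downloadError } = await supabase.storage
      .from("recipes")
      .download(file.name);
    if (downloadError || !blob) continue;
    const recipe = JSON.parse(await blob.text());

    // Backfill the columns the migration created but never populated.
    const { error: updateError } = await supabase
      .from("recipe_storage_mappings")
      .update({
        cuisine: recipe.cuisine ?? null,
        difficulty: recipe.difficulty ?? null,
        title: recipe.title ?? null,
      })
      .eq("storage_path", file.name);
    if (!updateError) updated++;
  }

  return new Response(JSON.stringify({ updated }), {
    headers: { "Content-Type": "application/json" },
  });
});
```

The exact shape of the function matters less than the principle: the metadata now comes from the actual production JSONs, not from anything the model asserted.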
Takeaway
To ground this even further, here's a quick before/after of the transformation:
Before:

```ts
const availableCuisines = ['Italian', 'Mexican', 'Asian', 'American'];
```

After:

```ts
const { data } = await supabase
  .from('recipe_storage_mappings')
  .select('cuisine')
  .not('cuisine', 'is', null)
  .neq('cuisine', '')
  .order('cuisine');
```
Without structured governance, the AI would have continued asserting that the system was working. The Interrogation Loop prevented silent failure from shipping.
If you don’t force the AI to prove it, it will perform belief instead of logic.
📂 Live Case Study 2: Race Condition in Recipe Count Display (Lovable, June 2025)
A second incident highlights how AI-generated confidence can mask timing bugs that only appear under interactive state conditions.
Context
The user reported a flash of “0 recipes found” before the correct count appeared on the `/recipes` page. Lovable did not detect the root cause until prompted to produce a full bug report.
Breakdown
- Planned Behavior: No count shown until real results load
- Actual Behavior: Brief flash of “0 recipes found,” then correct count
Root Cause
- The `hasDataLoaded` flag was being set before confirming total count validity
- A brief window existed where the component rendered with `totalCount = 0`
- A race between `isLoading`, `isTransitioning`, and `hasDataLoaded` led to a false UI state
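The original gating code is not shown in the bug report, but based on the flags above it likely resembled something like this minimal reconstruction (an assumption, not the project's actual component; the names come from the fix further down):

```tsx
// Hypothetical reconstruction of the buggy gating, for illustration only.
import { useEffect, useState } from "react";

type CountQuery = {
  data?: { totalCount: number };
  isLoading: boolean;
  error?: unknown;
};

function RecipeCountDisplay({ query }: { query: CountQuery }) {
  const [hasDataLoaded, setHasDataLoaded] = useState(false);

  useEffect(() => {
    // Bug: flips as soon as any result arrives, before totalCount is confirmed,
    // so a transitional totalCount of 0 briefly renders as "0 recipes found".
    // Note: isTransitioning is never consulted here, which is the missing gate.
    if (query.data && !query.isLoading && !query.error) {
      setHasDataLoaded(true);
    }
  }, [query.data, query.isLoading, query.error]);

  if (!hasDataLoaded) return null;
  return <p>{query.data?.totalCount ?? 0} recipes found</p>;
}

export default RecipeCountDisplay;
```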
Evidence
Console logs showed:
- 98 recipes indexed in search
- 0 recipes returned by DB query
- `RecipeCountDisplay` momentarily rendered based on early state before `hasDataLoaded` flipped
Resolution
The user demanded a real bug report comparing actual vs expected behavior, resulting in a proposed code change:
```ts
useEffect(() => {
  if (query.data && !query.isLoading && !query.error) {
    const hasConfirmedData = query.data.totalCount > 0 ||
      (query.data.totalCount === 0 && !isTransitioning);
    if (hasConfirmedData) {
      setHasDataLoaded(true);
      setIsTransitioning(false);
    }
  }
}, [query.data, query.isLoading, query.error, isTransitioning]);
```
Takeaway
This race condition would have gone unnoticed or been blamed on “React weirdness” in most workflows. But the interrogation loop forced the model to:
- Compare actual vs expected behavior
- Pinpoint code-level timing mismatch
- Propose a KISS-compliant fix using gating logic
This wasn’t prompt-as-magic. It was prompt-as-accountability.
✅ Outcome and Lessons Learned
While the final build surfaced some minor integration bugs, the full protocol produced better outcomes than ordinary prompting ever could have. Had the user simply asked “Can you fix filtering?”, the AI might have generated a new dropdown component, guessed at the schema, or issued a nonfunctional patch. Instead:
- Every assumption was exposed and corrected
- Every migration was validated against real JSON structure
- Every filter now queries live data, not constants
- And the system architecture improved with real DRY/SOLID/KISS guardrails
This process, though longer, saved days of debugging and delivered a working, durable solution.
Lovable ultimately implemented a working metadata extraction service, population routine, and dynamic filter system. The fix adhered to the exact JSON structure defined in the recipe prompt schema and mapped it into a queryable SQL layer with indexes. The new system:
- Replaced hardcoded filter values with live `DISTINCT` queries from the DB
- Normalized time, cuisine, and difficulty values for consistency
- Respected DRY by consolidating logic into `RecipeFilterOptionsService` (sketched below)
- Followed SOLID by creating dedicated extractors and repositories
- Passed KISS review: clear responsibilities, no overdesign, minimal fallback logic
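As a rough illustration of that consolidation, a service along these lines would replace the per-component constants. Only the class name and table come from the case study; the method name, column list, and client-side deduplication are assumptions:

```ts
// Hypothetical sketch of the consolidated filter-options service.
import { SupabaseClient } from "@supabase/supabase-js";

export class RecipeFilterOptionsService {
  constructor(private readonly supabase: SupabaseClient) {}

  // One shared query path for every filter dropdown: no per-component constants.
  async getDistinctValues(column: "cuisine" | "difficulty"): Promise<string[]> {
    const { data, error } = await this.supabase
      .from("recipe_storage_mappings")
      .select(column)
      .not(column, "is", null)
      .neq(column, "")
      .order(column);
    if (error) throw error;

    // Deduplicate client-side; the returned rows themselves are not DISTINCT.
    return [...new Set((data ?? []).map((row) => String(row[column])))];
  }
}
```

A dropdown then calls `getDistinctValues("cuisine")` instead of importing a hardcoded list, which is what DRY looks like in practice here.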
Without the forced loop of confession, bug reporting, plan validation, and enforced review, the AI would have hallucinated that the migration had been completed. It didn’t.
The system now works because the protocol worked.
That breakdown didn’t resolve itself. It required structured interrogation. The protocol below emerged directly from the cases above, and it was the only reason the system converged on a correct, working solution.
🛠️ The Lovable Interrogation Loop v1.0
Each step in this loop exists to counter specific, recurring failure modes in AI-assisted development. It’s not just a process—it’s a defense system:
- Step 1 prevents blind trust in confident-sounding output.
- Step 2 forces grounding in source code instead of pattern-based inference.
- Step 3 creates accountability and clarity by forcing the AI to describe what's broken in human terms.
- Step 4 blocks hallucinated "fixes" by requiring a scoped, logical plan.
- Step 5 controls generation boundaries—no plan, no code.
- Step 6 creates a feedback loop using proven engineering principles, catching structural debt before it ships.
Together, these steps transform the relationship from passive code generation to governed, auditable collaboration.
Step 1: Call Out Assumptions
Why this matters:
Fluent answers are often hallucinated. Asking this forces the model to admit whether it inspected real files or inferred from prior completions.
What a good answer looks like:
✅ "Yes. I reviewed recipe-access-handler.ts
, focusing on how created_at
is passed into the sort logic."
What a bad answer looks like:
🚫 "Looks like it's just a sorting bug. Probably timestamp-related."
Step 2: Force It to Prove It
Why this matters:
Pattern-matching models can confidently guess. Forcing specific file references and logic inspection grounds the output.
What a good answer looks like:
✅ "I checked /functions/list-recipes/index.ts
. On line 237, created_at
from JSON is sorted instead of using file_created_at
."
What a bad answer looks like:
🚫 "The recipe list might be pulling the wrong field. Try checking the sort."
Step 3: Make It Own the Mistake
Why this matters:
This shifts the model from passive generation to analytical mode. Bug reports anchor AI in human-expectation context.
What a good answer looks like:
✅ "Expected: Recipe list sorts by real file creation time.
Actual: Sorting uses recipe JSON timestamps.
Root cause: Line 237 of `list-recipes/index.ts` sorts `created_at` instead of `file_created_at`."
What a bad answer looks like:
🚫 "The sorting function might not be using the right data."
Step 4: Demand a Concrete Fix Plan
Why this matters:
This step prevents the AI from generating code impulsively. It requires planning before production.
What a good answer looks like:
✅ "Update getUserRecipeMappings()
in database-recipe-mapper.ts
to include file_created_at
, then update sorting logic in list-recipes/index.ts
to use it."
What a bad answer looks like:
🚫 "Just sort by the correct timestamp field instead."
Step 5: Authorize the Code
Why this matters:
This introduces a gating mechanism. The AI must follow the plan and respect constraints.
What a good answer looks like:
✅ Code strictly adheres to the approved plan with clear comments, no feature creep.
What a bad answer looks like:
🚫 Adds unrelated refactors, new sorting logic, or changes schema structure.
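To make the contrast concrete, here is a minimal sketch of what plan-respecting output could look like for the sorting example used throughout these steps. The row shape and the `fetchMappingRows` helper are assumptions; only the file names, `getUserRecipeMappings()`, and the `created_at` versus `file_created_at` distinction come from the answers above.

```ts
// Hypothetical sketch of a Step 5-style change that stays inside the approved plan:
// expose file_created_at in the mapping layer, then sort on it in the list handler.

interface MappingRow {
  recipe_id: string;
  storage_path: string;
  file_created_at: string;
}

// Assumed existing data-access helper, untouched by the fix.
declare function fetchMappingRows(userId: string): Promise<MappingRow[]>;

export interface RecipeMapping {
  recipeId: string;
  storagePath: string;
  fileCreatedAt: string; // real file creation time, not the recipe JSON's created_at
}

// database-recipe-mapper.ts: the only change is carrying file_created_at through.
export async function getUserRecipeMappings(userId: string): Promise<RecipeMapping[]> {
  const rows = await fetchMappingRows(userId);
  return rows.map((row) => ({
    recipeId: row.recipe_id,
    storagePath: row.storage_path,
    fileCreatedAt: row.file_created_at,
  }));
}

// list-recipes/index.ts: sort on the new field instead of the JSON timestamp.
export function sortByFileCreation(mappings: RecipeMapping[]): RecipeMapping[] {
  return [...mappings].sort(
    (a, b) => Date.parse(b.fileCreatedAt) - Date.parse(a.fileCreatedAt),
  );
}
```

Note what is absent: no new features, no schema changes, no opportunistic refactors. The code does exactly what the approved plan says and nothing more.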
Step 6: Make It Review Itself
Why this matters:
This creates a final validation pass where the AI self-audits. It’s where hallucinated complexity and missed reuse surface.
What a good answer looks like:
✅ "I violated DRY by repeating filter logic. Revised into shared applySorting()
util. Confirmed alignment with REUSE and KISS."
What a bad answer looks like:
🚫 "No issues found." (When 10 duplicated sortRecipes()
implementations exist)
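For reference, the kind of consolidation a good Step 6 answer describes might look like this small sketch. The generic signature is an assumption; only the `applySorting()` and `sortRecipes()` names come from the example above.

```ts
// Hypothetical sketch: one shared applySorting() util replacing duplicated
// sortRecipes() copies scattered across the codebase.
export function applySorting<T>(
  items: T[],
  key: (item: T) => string | number,
  direction: "asc" | "desc" = "desc",
): T[] {
  const sign = direction === "asc" ? 1 : -1;
  // Return a new array so callers keep their original ordering untouched.
  return [...items].sort((a, b) => {
    const ka = key(a);
    const kb = key(b);
    return ka < kb ? -sign : ka > kb ? sign : 0;
  });
}

// Each former sortRecipes() call site becomes a one-liner, e.g.:
// applySorting(recipes, (r) => Date.parse(r.fileCreatedAt));
```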