The Antidote to Code Slop

How Determinism-in-the-loop saves your tokens, and your sanity.

Chris Arter · June 22, 2026 · 6 min read · Updated June 26, 2026

There is a growing horde of tools, skills, posts, tips, and tweets all trying to solve the same unsolved problem in agentic coding (or vibe coding): codebases eventually devolve into a soup (see Chet from Weird Science).

This isn't just a vibe. GitClear analyzed 211 million changed lines of code from 2020 to 2024. Refactoring ("moved" code) fell from 25% of changes to under 10%, while copy/pasted lines climbed and duplicate blocks multiplied. Their read: AI adds code well and reuses it poorly. That gap, measured across five years of commits, is the soup quantified.

I am a big fan of harness designs that introduce sound engineering practices into the agent workflow. Non-deterministic LLM guardrails don't work reliably on their own, though. Below is perhaps the most famous quality prompt gate.

No mistakes

The missing piece is a deterministic counterweight to the LLM. It is the Yin to its Yang.

I personally call this Determinism-in-the-loop. I am not claiming to have originated the term; it is just what my lizard brain arrived at (yes, my brain).

[Diagram: agentic hook gate from static scripts]

The deterministic half matters for a concrete reason: LLMs catch their own slop poorly but fix it well once something points at it. A 2024 study found that models struggle to detect errors and vulnerabilities in their own output, yet fix them readily when handed a failing test or a static analysis report. The deterministic gate supplies the one thing the LLM can't generate for itself: a reliable signal about what's actually wrong.

This flow gives you guardrails. They set where the LLM is free to do its best work and which standards hold no matter what.

In Claude Code, the entry points are lifecycle hooks like PreToolUse and PostToolUse. The idea matters more than the mechanism: LLMs are one piece of the quality puzzle. You need opinion and static gating to free the agent for the work it does best.
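To make the hook idea concrete, here is a minimal sketch of the decision logic a PostToolUse gate might run. It rests on assumptions worth checking against the hooks documentation: that the hook receives a JSON payload with `tool_name` and `tool_input.file_path`, and that exit code 2 sends the feedback back to the agent. The names `HookPayload`, `fileToCheck`, and `runGate` are made up for this illustration, and the checker is injected so the logic stays testable.

```typescript
// Assumed payload shape; verify the field names against the hooks docs.
interface HookPayload {
  tool_name?: string;
  tool_input?: { file_path?: string };
}

interface CheckResult {
  ok: boolean;
  report: string;
}

interface GateDecision {
  exitCode: number;
  feedback: string;
}

// Tool names treated as file edits (an assumption for this sketch).
const EDIT_TOOLS = ["Write", "Edit", "MultiEdit"];

/** Returns the edited TypeScript file to check, or null if this event is not gated. */
function fileToCheck(payload: HookPayload): string | null {
  if (!payload.tool_name || EDIT_TOOLS.indexOf(payload.tool_name) === -1) return null;
  const filePath = payload.tool_input?.file_path;
  if (!filePath || !/\.tsx?$/.test(filePath)) return null;
  return filePath;
}

/** Runs the deterministic check and turns its result into a hook decision. */
function runGate(
  payload: HookPayload,
  check: (filePath: string) => CheckResult,
): GateDecision {
  const filePath = fileToCheck(payload);
  if (filePath === null) return { exitCode: 0, feedback: "" };
  const result = check(filePath);
  // Exit code 2 is assumed to mean "blocking: show this feedback to the agent".
  return result.ok
    ? { exitCode: 0, feedback: "" }
    : { exitCode: 2, feedback: result.report };
}
```

In a real hook script, the payload would be parsed from stdin, `check` would shell out to a linter or analyzer, the feedback would go to stderr, and the process would exit with the returned code.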

Three Pillars of Static Code Quality

Architecture
Linting
Static Analysis

Architecture

This is the foundational enforcement layer for the whole project. It keeps your application in the same repeatable, expected shape for existing features and new ones. Consistent architecture makes your application structure far more predictable, which makes life easier for the agent. It also enforces patterns that keep code testable. For our example TypeScript stack, I reach for dependency-cruiser.
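As a rough illustration of what an architecture rule expresses, the sketch below models a rule on the shape of dependency-cruiser's `forbidden` entries (`name`, `severity`, `from.path`, `to.path`, with paths as regular expressions). The `src/domain` and `src/infrastructure` layering is a hypothetical example, and `findViolations` is a toy evaluator written for this post, not how dependency-cruiser itself works.

```typescript
// Rule shape modeled on dependency-cruiser's `forbidden` entries.
interface ForbiddenRule {
  name: string;
  severity: "error" | "warn" | "info";
  from: { path: string }; // regex matched against the importing file
  to: { path: string }; // regex matched against the imported file
}

// Hypothetical layering: domain code may not import infrastructure code.
const forbiddenRules: ForbiddenRule[] = [
  {
    name: "no-domain-to-infrastructure",
    severity: "error",
    from: { path: "^src/domain" },
    to: { path: "^src/infrastructure" },
  },
];

interface ImportEdge {
  from: string;
  to: string;
}

/** Toy evaluator: reports every import edge that matches a forbidden rule. */
function findViolations(edges: ImportEdge[], rules: ForbiddenRule[]): string[] {
  const violations: string[] = [];
  for (const edge of edges) {
    for (const rule of rules) {
      const fromMatches = new RegExp(rule.from.path).test(edge.from);
      const toMatches = new RegExp(rule.to.path).test(edge.to);
      if (fromMatches && toMatches) {
        violations.push(`${rule.severity} ${rule.name}: ${edge.from} -> ${edge.to}`);
      }
    }
  }
  return violations;
}
```

The point of the shape is that the rule is data: the agent cannot argue with it, and a violation comes back as a named, specific message it can act on.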

Architecture enforcement vehicles vary a lot by stack. In the Laravel ecosystem, the test suite handles it through Pest's architecture assertions.

Here's an example of what you can enforce in Pest:

arch()
    ->expect('App')
    ->toUseStrictTypes()
    ->not->toUse(['die', 'dd', 'dump']);

Linting

This is another lever for making code predictable and your agent more efficient. A linter encodes your coding style and pushes the agent's output toward your flavor. When the agent drifts, the linter enforces the style on file write. The TypeScript ecosystem has no shortage of linting tools, and lately I've been using BiomeJS.

Static Analysis

Static analysis is a massive lever for code quality. I'd rank it as the top ROI tool for effort spent versus quality gained, next to tests. I lean on BiomeJS for general TypeScript static analysis, though it has gaps I sometimes need to fill.

I often add complexity rules in Biome too, such as noExcessiveCognitiveComplexity. This one goes past taste. Cognitive Complexity is the first code-based metric empirically validated to track how long humans take to understand a function, across a meta-analysis of ~24,000 evaluations. If a person can't hold it in their head, the next agent can't either.

Here's an example from the docs:

function tooComplex() {
    for (let x = 0; x < 10; x++) {
        for (let y = 0; y < 10; y++) {
            for (let z = 0; z < 10; z++) {
                if (x % 2 === 0) {
                    if (y % 2 === 0) {
                        console.log(x > y ? `${x} > ${y}` : `${y} > ${x}`);
                    }
                }
            }
        }
    }
}

Which reports:

code-block.js:1:10 lint/complexity/noExcessiveCognitiveComplexity ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

  ℹ Excessive complexity of 21 detected (max: 15).

  > 1 │ function tooComplex() {
      │          ^^^^^^^^^^
    2 │     for (let x = 0; x < 10; x++) {
    3 │         for (let y = 0; y < 10; y++) {

  ℹ Please refactor this function to reduce its complexity score from 21 to the max allowed complexity 15.
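For a sense of what the agent can do with a report like that, here is one possible refactor. It is a sketch, not the fix from the Biome docs: the helper names (`describePair`, `evens`, `lessComplex`) are mine, and it returns the lines instead of logging them so the behavior can be compared. I have not run Biome against it, so treat any complexity score as unverified; the nesting is simply shallower.

```typescript
/** Formats one pair exactly as the original ternary did. */
function describePair(x: number, y: number): string {
  return x > y ? `${x} > ${y}` : `${y} > ${x}`;
}

/** Even numbers in [0, limit), replacing the nested `% 2 === 0` checks. */
function evens(limit: number): number[] {
  const result: number[] = [];
  for (let n = 0; n < limit; n += 2) result.push(n);
  return result;
}

/** Produces the same lines as tooComplex, in the same order. */
function lessComplex(limit = 10): string[] {
  const lines: string[] = [];
  for (const x of evens(limit)) {
    for (const y of evens(limit)) {
      // The original z loop only repeated each line `limit` times.
      for (let z = 0; z < limit; z++) lines.push(describePair(x, y));
    }
  }
  return lines;
}
```

The two conditionals are gone because the loops only ever visit even values, and the ternary lives in a helper with a name that says what it does.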

Bonus: Tests

I left tests out of the three main pillars on purpose. This post sits at the per-code-edit level, and running your unit suite on every file edit is inefficient for most setups. For tests, I enforce at commit time with lefthook. It runs when the agent tries to commit its work.

When the agent commits, I enforce:

Passing unit & feature tests
The same static checks as above, across the whole repository
A test-coverage ratchet (coverage % can't drop)
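The coverage ratchet in that list is simple enough to sketch. This version assumes an Istanbul-style `json-summary` report where the overall line coverage lives at `total.lines.pct`, and a stored baseline number; `checkRatchet`, its `tolerance` parameter, and the result shape are hypothetical names for illustration, not part of lefthook or any coverage tool.

```typescript
// Relevant slice of an Istanbul-style json-summary report (assumed shape).
interface CoverageSummary {
  total: { lines: { pct: number } };
}

interface RatchetResult {
  ok: boolean;
  newBaseline: number;
  message: string;
}

/**
 * Coverage may rise or hold, but never fall below the stored baseline.
 * `tolerance` absorbs rounding noise between runs.
 */
function checkRatchet(
  summary: CoverageSummary,
  baseline: number,
  tolerance = 0.01,
): RatchetResult {
  const current = summary.total.lines.pct;
  if (current + tolerance < baseline) {
    return {
      ok: false,
      newBaseline: baseline, // a failed run never lowers the bar
      message: `Coverage dropped: ${current}% < ${baseline}%`,
    };
  }
  return {
    ok: true,
    newBaseline: Math.max(current, baseline), // ratchet up on improvement
    message: `Coverage OK: ${current}%`,
  };
}
```

A commit hook would read the report and the stored baseline from disk, fail the commit when `ok` is false, and write `newBaseline` back when it passes.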

Once you explore the static tooling we've had for years, you start to see what you can offload from your agent. The more you offload, the better it gets at actually writing code.

Why offloading works: clear the desk

You might read this section as the consolation prize: static gates can't catch everything, so at least they handle the easy stuff. The opposite is true.

An LLM has a fixed attention budget. Every token it spends re-deriving "don't nest this five deep" or "don't duplicate this block" is a token it can't spend on the actual problem. A linter that owns those rules takes them off the agent's plate completely. What's left is the work no linter can touch: intent, naming, tradeoffs, whether the change even fits the architecture.

So the split is clean. Deterministic tools own anything with a fixed rule. The agent owns anything that needs judgment. The gate clears the agent's desk, and the agent spends its judgment where judgment is the only thing that works.

This also explains the GitClear numbers. Duplication and complexity creep are the decidable problems an unconstrained LLM never spends attention on, because nothing forces them out of its hands. Hand those problems to a machine that solves them deterministically, and the slop never gets written.
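To show why duplication counts as a decidable problem, here is a toy detector that flags repeated runs of lines. It is a sketch of the idea only: it compares exact, whitespace-trimmed text in fixed-size windows, whereas real copy/paste detectors work on tokens and handle renames. The function name, the window size, and the result shape are all assumptions for illustration.

```typescript
interface DuplicateBlock {
  text: string; // the repeated lines, joined with "\n"
  startLines: number[]; // 1-based line numbers where the block begins
}

/** Finds runs of `windowSize` consecutive non-blank lines that appear more than once. */
function findDuplicateBlocks(source: string, windowSize = 3): DuplicateBlock[] {
  const lines = source.split("\n").map((line) => line.trim());
  const seen = new Map<string, number[]>();

  for (let i = 0; i + windowSize <= lines.length; i++) {
    const block = lines.slice(i, i + windowSize);
    if (block.some((line) => line === "")) continue; // ignore windows spanning blank lines
    const key = block.join("\n");
    const starts = seen.get(key) ?? [];
    starts.push(i + 1);
    seen.set(key, starts);
  }

  const duplicates: DuplicateBlock[] = [];
  seen.forEach((startLines, text) => {
    if (startLines.length > 1) duplicates.push({ text, startLines });
  });
  return duplicates;
}
```

No judgment is involved anywhere in that function, which is the whole argument: a check like this either fires or it doesn't, and the agent only has to respond to the result.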

What should we gate?

Deciding what to shape with prompts versus what to enforce with determinism can be hard to pin down. This is my thought framework:

What does a “clean” codebase look like?
Of those traits, how much can be reasonably enforced with static tools?
What are the ambiguous parts? (This is where we fill in the gap with prompts & skills)

My goal is always to offload as much concrete quality gating onto deterministic static checks as I can.

The big picture

Most of these tools pre-date AI. Almost nothing I'm recommending is new. These are the quality gates we've used to hold a software quality floor for decades. The best quality harnesses lean on what already makes projects nice for people and optimize that access for agents.

One note: GitClear shows code quality dropping alongside AI adoption; it does not prove that AI caused the drop. You don't need causation to act on it. The gates earned their place before agents existed.

If you want to try this without wiring up everything by hand, I built a package called Bully for Claude Code. It ships a config file for enforcing rules on the PostToolUse hook: https://github.com/dynamik-dev/bully
