Responsible AI Usage: Don't Burn Tokens on Problems Scripts Already Solved

Every LLM call has a cost. Not just the dollar on your invoice — electricity, water for datacenter cooling, carbon, chip wear, and somebody's attention to verify the output afterward. Most of that cost is invisible to the person making the request, which is exactly why it gets wasted so casually.

Being responsible with AI isn't about avoiding it. It's about using it where it actually earns its keep, and refusing to use it where a boring, deterministic solution would do the same job for a fraction of the footprint.

The Waste Is Real

A single GPT-class inference isn't free the way a local function call is free. Training the models took megawatt-hours. Serving them takes datacenters full of accelerators running hot around the clock. When you fire off a prompt to solve a problem that a for loop could have solved, you're pulling on that whole infrastructure to get an answer you could have computed locally in microseconds.

Multiply that by the number of engineers now reaching for an LLM as their first instinct, and the aggregate waste gets ugly fast. It's the same pattern as leaving lights on in an empty room — individually trivial, collectively substantial, and entirely avoidable with a small amount of discipline.

The responsible move is to treat model calls as a finite resource. Not because you'll personally run out, but because the whole system has real costs that show up somewhere, even if not on your bill.

The Determinism Test

Before reaching for an AI tool, ask whether the task meets these criteria:

  1. The inputs are well-defined.
  2. The transformation rules are knowable.
  3. The output should be the same every time given the same input.
  4. You will run this more than once.

If you answered yes to all four, you don't have an AI problem. You have a scripting problem. Write the script. Test it. Commit it. Move on — and save the tokens for work that actually needs them.

If you answered no to any of them — especially the third — then you might be in genuine LLM territory. Things like summarizing a long thread, drafting an email in a specific tone, classifying ambiguous user feedback, or pulling intent from messy free-text. Tasks where "close enough" is the actual goal and exact reproducibility isn't.

Why Determinism Is the Dividing Line

Every LLM call is a roll of the dice. Even with temperature set to zero, even with the same prompt, you can get drift over time as the model updates, as the context window gets crowded, as some upstream tokenizer changes. For tasks where the right answer is knowable, that variance isn't a feature — it's a defect generator, and it's expensive to catch.

Consider renaming a thousand files according to a pattern. You could:

  • Write a script. It runs in milliseconds on your laptop. It uses almost no energy. It will produce the same result on every machine, every time, forever.
  • Ask an LLM to do it. It costs tokens. It pulls on a datacenter. It runs slower. It might miss one. It might rename two files the same way. You won't know unless you check, and the act of checking defeats the purpose.
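
The script version of that first bullet can be sketched in a few lines. The specific rule here — zero-padding numeric suffixes so report_1.txt becomes report_001.txt — is a hypothetical stand-in; swap in whatever your actual pattern is:

```shell
#!/usr/bin/env bash
# Sketch of a deterministic bulk rename. The zero-padding rule is a
# hypothetical example pattern, not a prescription.
set -euo pipefail

touch report_1.txt report_2.txt report_10.txt   # sample inputs for the demo

for f in report_*.txt; do
  n="${f#report_}"; n="${n%.txt}"               # extract the numeric suffix
  new="$(printf 'report_%03d.txt' "$n")"        # zero-pad to three digits
  [ "$f" = "$new" ] || mv -n "$f" "$new"        # -n: never clobber an existing file
done
```

Same input, same output, every time — and the `-n` flag makes the "rename two files the same way" failure mode structurally impossible rather than merely unlikely.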

The script wins on every axis that matters: cost, speed, reliability, and environmental footprint. It isn't close.

The Hidden Cost of Tokens

People underestimate how expensive non-determinism is, because both the per-token cost and the per-request latency look small. The real cost is the tax you pay every time you have to verify the output.

A deterministic script needs to be verified once — when you write it. After that, every run is trustworthy by construction, and the energy cost per run approaches zero.

A non-deterministic process needs to be verified every single time. You're either spot-checking the output (cheap but risky) or building an automated verification layer (expensive and ironic — you've now written the script you were trying to avoid writing, just to validate the AI that replaced it). Either way, you're spending more compute, more human attention, and more energy than the deterministic path would have required.

If the LLM gets it right 99% of the time, you might think that's good enough. But if you run it ten thousand times, that's a hundred wrong outcomes you have to find and fix — plus ten thousand model calls that you didn't need to make. A script that gets it right 100% of the time produces zero errors and burns almost nothing. The math is brutal at scale, and the planet is downstream of that math.

Where AI Genuinely Earns Its Keep

This isn't an anti-AI piece. There are entire categories of work where deterministic code is the wrong tool, and using an LLM is the responsible choice because nothing else will do the job:

  • Natural language understanding. Extracting intent from a customer support ticket. Summarizing a meeting transcript. Translating tone.
  • Fuzzy classification. Is this PR description "small," "medium," or "large"? Is this image "professional" or "casual"? Humans disagree on the answer; a deterministic script can't help you here.
  • Generative tasks where variation is desirable. Writing first drafts, brainstorming names, suggesting alternative phrasings.
  • Code synthesis from intent. Asking an LLM to write the script for you, which you then commit and run deterministically forever after. This is the best of both worlds: one model call produces a reusable artifact that replaces thousands of future model calls.

The pattern in all of these: the input isn't fully structured, the output has no single correct answer, and the task is exercising judgment rather than mechanical transformation. That's the kind of work where tokens are well spent.

The Code-Generation Loophole

The most responsible use of AI for deterministic problems is to use it once — to generate the script — and then never use it again for that task. A single prompt that produces a rename-files.sh you commit to your repo is orders of magnitude cheaper than running the rename through an LLM every time you need it.

This is the move I'd encourage most: let the model do the creative lift of writing the solution, then let deterministic infrastructure carry the load of executing it. You get leverage from AI without paying for AI on every run.

A Concrete Example: Commit Message Validation

Here's a real pattern I've seen teams get wrong. The requirement: every commit message has to reference a Jira ticket like PROJ-1234. Someone decides the enforcement mechanism should be an instruction in CLAUDE.md telling the agent to check for a ticket reference before committing.

That's a script problem wearing a chatbot costume. The rule is knowable, the input is a string, and the answer is a boolean. No commit, on any branch, by any developer, needs a model call to answer "does this string match a regex?"

The responsible version is a commit-msg hook, maybe fifteen lines of shell:

#!/usr/bin/env bash
# .git/hooks/commit-msg  (or via husky, lefthook, pre-commit, etc.)

commit_msg_file="$1"

# Match a Jira-style ticket id: PROJ-1234, FOO-42, etc.
if ! grep -qE '[A-Z]+-[0-9]+' "$commit_msg_file"; then
  echo "✖ Commit message must reference a Jira ticket (e.g. PROJ-1234)." >&2
  echo "  Your message: $(cat "$commit_msg_file")" >&2
  exit 1
fi

Write it once. Commit it to the repo. It runs locally on every commit for every developer, costs nothing, never drifts, and never forgets. The same check in CLAUDE.md would fire an LLM call on every commit the agent makes, and do nothing at all for the humans on the team committing from their terminal. Same rule, two enforcement mechanisms: one is infrastructure, the other is vibes.
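
One wrinkle with "commit it to the repo": .git/hooks itself isn't version-controlled. A common way around that is a tracked hooks directory wired up via git's core.hooksPath setting (available since git 2.9). A sketch, run in a throwaway repo — the .githooks directory name is just a convention:

```shell
# Keep hooks in a tracked directory and point git at it with core.hooksPath.
# Demo runs in a temporary repo so it's safe to execute anywhere.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email "dev@example.com" && git config user.name "Dev"

mkdir -p .githooks
cat > .githooks/commit-msg <<'EOF'
#!/usr/bin/env bash
# Same ticket-id check as the hook above, condensed.
grep -qE '[A-Z]+-[0-9]+' "$1" || {
  echo "✖ Commit message must reference a Jira ticket (e.g. PROJ-1234)." >&2
  exit 1
}
EOF
chmod +x .githooks/commit-msg
git config core.hooksPath .githooks   # each clone opts in once

touch file.txt && git add file.txt
git commit -q -m "no ticket" || echo "rejected, as expected"
git commit -q -m "PROJ-1234: add file" && echo "accepted"
```

Each clone runs the `git config` line once (often wrapped in a setup script or a package manager hook), and from then on the check is enforced for every committer, human or agent alike.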

The broader lesson: any rule you'd write in natural language and expect an agent to follow is usually better expressed as code that the machine can't forget. CLAUDE.md is for context and taste — things that genuinely require judgment. It is not a linter, a test runner, or a pre-commit hook. Don't ask an LLM to be any of those things when the real tool is right there and free.

Tests Follow the Same Rule

The same logic applies to testing. If your system is deterministic, your tests should be deterministic too. Snapshot tests, fixture-based assertions, table-driven tests — these are reliable because the system under test is reliable, and the test harness reflects that.

When teams reach for LLM-as-judge to evaluate the output of a deterministic system, they're often patching over a different problem: their assertions are too loose or their fixtures are too brittle. Tightening the assertions is almost always the better fix — cheaper to run, faster to debug, and uses none of the shared infrastructure that LLM calls depend on.
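
"Tightening the assertions" can be as small as a golden file and a diff. Everything in this sketch — transform.sh, the fixture names — is a hypothetical stand-in for whatever deterministic system you actually test:

```shell
#!/usr/bin/env bash
# Golden-file test sketch. transform.sh is a toy stand-in for the real
# system under test; the fixture names are illustrative.
set -euo pipefail

# Stand-in "system under test": uppercases its input.
cat > transform.sh <<'EOF'
#!/usr/bin/env bash
tr '[:lower:]' '[:upper:]'
EOF
chmod +x transform.sh

printf 'hello world\n' > fixture.in        # input fixture
printf 'HELLO WORLD\n' > fixture.golden    # expected (golden) output

./transform.sh < fixture.in > fixture.out
diff -u fixture.golden fixture.out && echo "PASS"
```

When the diff is empty the test passes; when it isn't, the diff itself is the failure message — exact, reproducible, and free, which is precisely what an LLM judge can't give you for a deterministic system.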

LLM-as-judge has a real role, but it's evaluating the output of non-deterministic systems — judging the quality of generated text, scoring model outputs against rubrics, comparing two LLM responses. Use it where determinism doesn't exist on either side. Don't use it as a crutch for flaky deterministic tests.

A Practical Heuristic

When you catch yourself reaching for an AI tool to solve a problem, pause and ask:

If I gave this exact problem to ten different engineers, would they all produce the same answer?

If yes, you want a script. The answer is knowable, the solution is shareable, and every run after the first is essentially free. That's the responsible choice.

If no, an LLM might genuinely help — because the variance isn't a bug, it's the whole point, and there's no cheaper tool that can do what the model does.

Be the Person Who Thinks Before Prompting

The best engineers I know default to deterministic tools and only reach for AI when the problem actually requires judgment. They aren't anti-AI — several of them build AI products for a living. They just refuse to pay token costs, energy costs, and verification costs for outcomes that should have been free and certain.

That's what responsible AI usage looks like in practice. Not a ban, not a guilt trip — just a small mental checkpoint before each prompt. Is this a script problem wearing a chatbot costume? If so, write the script. Save the model calls for the work that actually needs them.

The quiet professionalism of knowing when not to use the shiny tool is, increasingly, the thing that separates thoughtful engineers from wasteful ones. Everything else is a shell script waiting to be written.