thisago's blog


Should I Use LLM?


We used to wear clothes that held craftsmanship and eat food we planted ourselves. Now we wear polyester and eat food packed in plastic bags. The price is lower, but are the drawbacks worth it?

Seven months ago I wrote down my thoughts on LLMs and my stand against them. I was never confident in that post: it didn't highlight the real issues and read like a frustration report.

Now, after using cutting-edge models from big corps again (mainly Opus from Anthropic) over the last 50 days, I have some extra points to make.

The evolution of 7 months

Before the previous post I was mainly using GPT-4.1 and GPT-5-mini.[1] Now, with Opus, the change is perceptible.

It's not only the models. Previously I mainly used Aider, which didn't have a subagents feature. Now, with Claude Code and OpenCode, the capabilities have increased:

  • Parallel subagent dispatching
  • Smarter prompt usage (AKA skills)
  • Per-subagent specialized prompts
  • MCP servers

OK, not a list of mind-blowing innovations. Most of the innovation lies in the SaaS inference systems. However, since frontier models are getting very capable, client-side features work much better now. I remember the struggle in 2024 to get a local Llama 3.1 8B to use tools correctly.

To be honest, I like all this stuff. I tinkered a little and built some agents and skills, making a setup for reusable prompts across SKILLs/AGENTs with Org Mode, compatible with both OpenCode and Claude Code. It even replies the way I want.

I have nothing to complain about regarding the technological advancement; it's really impressive.

However, there are a couple of implications (good and bad) that make the "should I use LLM?" decision kinda complex.

The good

Generates code fast

Every language I tried with it worked pretty smoothly:

  • Go
  • TypeScript
  • Python
  • Nim
  • Bash
  • Awk
  • jq
  • mongosh (JS)
  • Org Mode
  • YAML

Most of the time its output is usable. Syntax errors are really rare. The ones I remember were mostly in YAML, where it added unquoted strings containing colons:

$ yq <<<'text: "If quoted: OK"'
text: "If quoted: OK"
$ yq <<<'text: If unquoted: Syntax error' 2>&1
Error: bad file '-': yaml: mapping values are not allowed in this context
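As a guard against this specific mistake, here is a minimal sketch of a post-processing step that quotes plain values containing ": " before the file reaches a YAML parser. It is a heuristic under stated assumptions, not a YAML parser: the function name `quote_risky_scalars` is hypothetical, it only handles simple one-line `key: value` entries, and it would also quote a trailing comment that contains a colon.

```python
import re

# Matches a simple one-line "key: value" entry, optionally inside a list item.
_ENTRY = re.compile(r"^(\s*(?:-\s+)?[\w.-]+):[ \t]+(.*)$")

# First characters that mean the value is not a plain scalar
# (quoted, block scalar, flow collection, anchor, alias, tag, comment).
_NOT_PLAIN = ('"', "'", "|", ">", "{", "[", "&", "*", "!", "#")


def quote_risky_scalars(yaml_text: str) -> str:
    """Double-quote plain scalar values that contain ': ' (heuristic only)."""
    fixed = []
    for line in yaml_text.splitlines():
        match = _ENTRY.match(line)
        if match:
            key, value = match.group(1), match.group(2)
            if value[:1] not in _NOT_PLAIN and ": " in value:
                # Escape backslashes and double quotes for a double-quoted scalar.
                escaped = value.replace("\\", "\\\\").replace('"', '\\"')
                line = f'{key}: "{escaped}"'
        fixed.append(line)
    return "\n".join(fixed)


print(quote_risky_scalars("text: If unquoted: Syntax error"))
```

Already quoted values pass through unchanged, so running it twice is harmless.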

Most of my usage issues come from a lack of detail in planning, not from the model's limitations, and that's something I can tweak in the prompts.

However, obvious as it sounds, this doesn't mean you'll deliver features at the speed it can generate code.

Really cheap

Considering the real costs of self-hosting LLMs, SaaS inference from big corps is ridiculously cheap.

Technical debt becomes a thing of the past

Those 37 TODOs you left in the codebase for future-you can now be solved with low-quality requests:

$ rg TODO: internal/mypackage
refactor this thing

Speed means money

It's for sure a strategic tool for gaining an advantage in the market: prototyping apps, testing new approaches, and refining data.

Most people are using it, so the strategic bonus of using it is way lower than the loss of not using it.

Subagents can be parallel

I stuck with this one: having the main agent delegate work to parallel subagents is awesome.

The bad

Privacy

This is the most critical point for me, but I underrated it in the previous post.

As someone who picks self-hosted and privacy-friendly alternatives to common big tech solutions, I now face a new limitation: my computer cannot run an LLM that really replaces theirs.

One hope is to rent SaaS GPUs and host my own model there, ensuring encryption and that usage leaves no traces.

But if self-hosted models on consumer-grade GPUs are still not enough, there might be some privacy/security mitigations for SaaS LLM inference, e.g.:

  • Local small language model to rewrite human text: reduces fingerprinting and filters content.
  • Virtualize the agentic application: OS-level security and manageable exposure.
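The "filter content" half of the first mitigation can be approximated without any model at all. The sketch below is an assumption-heavy illustration, not a complete privacy filter: the `REDACTIONS` patterns and the `redact` function are hypothetical, they only cover a few obvious identifiers, and the IP pattern will also catch four-part version strings.

```python
import re

# Hypothetical patterns; a real filter needs a list tuned to your own data.
REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "<EMAIL>"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "<IP>"),
    (re.compile(r"(?i)\b(api[_-]?key|token|password)\s*[:=]\s*\S+"), r"\1=<SECRET>"),
    (re.compile(r"/home/[^/\s]+"), "/home/<USER>"),
]


def redact(prompt: str) -> str:
    """Replace obvious identifiers before the prompt leaves the machine."""
    for pattern, replacement in REDACTIONS:
        prompt = pattern.sub(replacement, prompt)
    return prompt


print(redact("API_KEY=abc123 fails in /home/alice/app, mail bob@example.com"))
```

Rewriting the prose itself to reduce stylistic fingerprinting is a different problem that regular expressions can't solve, which is where the local small model would come in.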

Excessive speed

The speed at which it generates complex code is absurd for humans. This brings anxiety and makes it harder to think about the problem. It's like a momentum that pushes you faster than you can think, so you end up just vomiting replies back to the LLM, ignoring even some typos.

Cognition

In the last post I mentioned a laziness in reasoning that I perceived developing in me, from avoiding reading the codebase to preferring to decide things with the LLM. And those were the GPT-4.1 times. Now I have the same problem, but intensified.

There are even a few studies on the cognitive harm it can bring.[2] "Cognitive debt" is a term that applies well.

In other words, the cognitive load stays below the level at which you learn.[3] Maybe this can be solved with added friction:

  • Some prompt that turns off "YOLO mode"
  • Write and read in a (natural) language you're a newbie in.
  • Reduce chat-bot iteration and work spec-driven, writing the full documentation on your own.

It will indeed feel unproductive.

But it depends on how much productivity is worth sacrificing.

It owns the code

When you're the author of the code, you know exactly where to change it. When the LLM edits it, your own changes first require a code review, like we always did when working with others.

With insanely fast code generation, it seems more productive to just ask the LLM to do the edit.

We're adopting a no-code, no-deep-thinking working style, in which engineers become users of a chat-bot.

You vibe-code or you're the bottleneck

Reviewing a +2398 -1290 churn without getting stuck for hours or overwhelmed is not trivial.

If you vibe-code, welcome to the world of script-kiddies. If you review, you might delay the planned delivery, which "expects you to use AI".

There are indeed mitigations (large PRs were never good for reviewers), but in my experience, if you want to hold quality, you're likely to spend the same amount of time reviewing and polishing as if you had written it yourself, unless the problem is too hard for you or you're writing boilerplate by hand.

And you might agree: writing code is more pleasing than reviewing it, and it's easier to get into the flow.

Expensive technology

Frontier LLMs aren't consumer-grade software. You can't self-host them unless you're ready to spend a good amount of cash, and even then the result won't be comparable to the cutting edge.

The hope is that the technology gets more efficient and/or powerful enough computers get more accessible.

SSD and RAM in retail

Prices more than doubled. An SSD I bought for BRL 400 is now BRL 1k at Amazon BR.

Are homelabs under potential threat?

Considerations

LLMs might get restricted

The money these companies raise from investors can dry up, and since inference requires an immense amount of computational power, service prices can rise. Or hardware can get even more expensive.

If this is true, we're in a window of opportunity to generate massive amounts of text.

Conclusion?

The options ordered by preference:

  1. Run frontier OSS models on rented GPU infra to replace Opus 4.8. Stick with it if the loss is manageable.
  2. Consider some privacy workaround with SaaS LLM inference. If fingerprint leaks remain frequent, discard it.
  3. Evaluate the results and consider keeping LLMs local only, with smaller models for data fetching and aggregation.

In the end, I conclude that having an LLM generate code you need to trust is counterintuitive.

No-code was never production-grade quality.

Footnotes:

[1] GPT-5 was excessively slow.

[3] See: Zone of Proximal Development.