The Query Carries the Architecture

We focus on keeping log files local. The question you ask about those logs carries the same information — endpoint names, failure modes, stack details, incident timelines. When you send a diagnostic query to a cloud API, you send a compressed map of your production system.

Michael Shatny · June 22, 2026 · 7 min read

The Log Is the Obvious Concern

For years, the conversation around log security has focused on the file itself. Who has read access. Whether it travels encrypted. Whether it gets retained longer than compliance allows. Whether a third-party vendor receives it during an incident investigation. The log file is the obvious concern, and the industry has built real infrastructure around protecting it.

When cloud AI APIs arrived with the ability to reason over log content, the debate continued in the same frame. Should you send your logs to an external API? What data classification applies? What does your privacy policy say about third-party processing? The concern was still about the file — now traveling to a language model instead of a SIEM vendor.

That framing misses half the problem.

What the Diagnostic Query Actually Reveals

Consider a question a developer might ask while investigating an incident:

“Why did HikariPool-1 run out of connections right after the session-cleanup job ran, and why did that cause /api/auth to return 500?”

That question is not abstract. It contains the name of a specific connection pool, the library it belongs to (HikariCP, from the Java ecosystem), the name of a scheduled job, the causal relationship between the job and the failure, the affected endpoint, and the error class (an HTTP 500). An attacker reading that question knows your stack, one of your scheduled jobs, your auth surface, and the failure mode that takes down authentication.

The question is a compressed architectural description of your system at the moment of failure. It carries more signal than a stack trace, because it includes your interpretation — the causal chain you've already assembled in your head before you asked.

Sending that query to a cloud API sends your production map to a third party. The log file you protected stays local. The knowledge it generated does not.
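To make that concrete, here is a minimal sketch of how little effort it takes to pull structured signals back out of the example question. The patterns are hypothetical and deliberately naive; anyone with access to the query text could apply far broader ones.

```python
import re

QUERY = (
    "Why did HikariPool-1 run out of connections right after the "
    "session-cleanup job ran, and why did that cause /api/auth to return 500?"
)

# Hypothetical patterns, written only for this illustration.
PATTERNS = {
    "connection_pool": r"\bHikariPool-\d+\b",   # HikariCP's default pool naming
    "scheduled_job": r"\b[\w-]+(?= job\b)",     # the word right before " job"
    "endpoint": r"(?<!\w)/[\w/-]+",             # anything shaped like a URL path
    "http_status": r"\b[45]\d{2}\b",            # 4xx / 5xx status codes
}


def extract_signals(query: str) -> dict[str, list[str]]:
    """Return every match for each pattern, keyed by the kind of signal."""
    return {kind: re.findall(rx, query) for kind, rx in PATTERNS.items()}


if __name__ == "__main__":
    for kind, hits in extract_signals(QUERY).items():
        print(f"{kind}: {hits}")
```

Four regular expressions recover the pool, the job, the endpoint, and the status code. The causal link between them is already spelled out in the sentence itself.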

Voice Makes It Literal

Typed queries are somewhat filtered. You compose them. You choose what to include. The act of writing creates a small editorial distance between what you know and what you transmit.

Voice removes that distance. When you speak a question out loud — “why are the connections dropping on our auth service every time the session cleanup job runs?” — you are narrating your production environment in real time. The vocabulary is richer, less curated, and more accurate. You say the actual service name, the actual job name, the actual symptom as you're experiencing it.

A voice interface to log analysis that routes through a cloud transcription service and a cloud language model sends a live narration of your incident to two external systems, one after the other. Most developers would not read their on-call runbook aloud on a public call. This is structurally the same exposure.

The solution is not to avoid voice. It is to keep transcription and inference on the same machine as the logs.

Three Tiers of Log Intelligence

Log analysis now has three tiers, though the third only became practical recently.

grep

Pattern matching. Finds what you already know to look for. Fast, local, zero cost. Useless for causation, correlation, or anything requiring interpretation.

Cloud AI

$10–90 / 1M tokens

Sophisticated reasoning over arbitrary content. A 500MB log file costs $437–$2,625 to analyze via cloud APIs. Your query and your log leave your network.

Local LLM

Comparable reasoning running on your own hardware. Ollama made this practical. The quality gap between a 7B quantized model and a frontier API has narrowed to the point where it is irrelevant for most log analysis tasks.

The framing of local LLMs as a cost optimization is incomplete. Cost is the most visible benefit. The architectural benefit is that reasoning stays co-located with data. The place where inference happens is the place where your production knowledge lives. Moving inference to the cloud moves that knowledge boundary outward.
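The cloud figure in the tier comparison is simple arithmetic: estimate the token count from the file size, then multiply by a per-token rate. A rough sketch follows, assuming about 4 bytes per token (a common rule of thumb for English text, and only an assumption for log lines) and decimal megabytes. The two rates in the usage lines are illustrative values that reproduce the quoted range under those assumptions; they are not taken from any provider's price list.

```python
def estimate_tokens(size_bytes: int, bytes_per_token: float = 4.0) -> float:
    """Rough token count for a text file; the ratio is an assumption."""
    return size_bytes / bytes_per_token


def estimate_cost(
    size_bytes: int,
    usd_per_million_tokens: float,
    bytes_per_token: float = 4.0,
) -> float:
    """Cost in USD of sending the whole file once as input tokens."""
    tokens = estimate_tokens(size_bytes, bytes_per_token)
    return tokens / 1_000_000 * usd_per_million_tokens


LOG_SIZE = 500 * 1000 * 1000  # 500 MB, decimal

if __name__ == "__main__":
    print(f"tokens: {estimate_tokens(LOG_SIZE):,.0f}")
    for rate in (3.50, 21.00):  # illustrative USD per 1M input tokens
        print(f"${rate:.2f}/1M tokens -> ${estimate_cost(LOG_SIZE, rate):,.2f}")
```

The estimate scales linearly with both the rate and the tokens-per-byte ratio, so a denser tokenizer or a higher-priced model moves the total proportionally. It also counts a single pass over the file; follow-up questions that resend the log multiply it.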

The Full Local Stack

When LogParseIQX added voice input, the requirement was not just that the LLM run locally. It was that no component of the pipeline touch a network boundary. That meant the speech-to-text layer had to be local too.

faster-whisper runs OpenAI's Whisper model via CTranslate2, entirely on-device. On Apple Silicon, it runs on CPU with int8 quantization — no GPU required. The base model is 150MB, downloads once, and transcribes a 6-second voice query in under two seconds. Ollama handles the LLM inference against the transcribed text. Both run on the same machine as the logs they are analyzing.

The pipeline is: microphone → faster-whisper → Ollama → terminal output. At no point does audio, transcription, log content, or the diagnostic query cross a network boundary. The voice recording of you describing a production incident stays on your machine. So does everything it produced.
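The shape of that pipeline can be sketched in a few dozen lines. This is not LogParseIQX's actual code. It is a minimal illustration that assumes the spoken question has already been recorded to a WAV file, that Ollama is listening on its default local port, and that a model (here the placeholder name llama3) has already been pulled. The file paths, the prompt wording, and the 8,000-character log excerpt are all hypothetical choices.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
LLM_MODEL = "llama3"  # assumption: any model already pulled into Ollama


def transcribe(wav_path: str) -> str:
    """Speech-to-text on the local CPU with int8 quantization."""
    # Imported lazily so the rest of the module works without the voice extra.
    from faster_whisper import WhisperModel

    model = WhisperModel("base", device="cpu", compute_type="int8")
    segments, _info = model.transcribe(wav_path)
    return " ".join(segment.text.strip() for segment in segments)


def build_prompt(question: str, log_excerpt: str) -> str:
    """Combine the transcribed question with a slice of the log."""
    return (
        "You are analyzing application logs.\n\n"
        f"Log excerpt:\n{log_excerpt}\n\n"
        f"Question: {question}\n"
    )


def ask_local_llm(prompt: str) -> str:
    """POST the prompt to the local Ollama server and return its answer."""
    payload = json.dumps({"model": LLM_MODEL, "prompt": prompt, "stream": False})
    request = urllib.request.Request(
        OLLAMA_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


def answer_voice_query(wav_path: str, log_path: str, max_chars: int = 8000) -> str:
    """WAV file in, answer out, with every step on this machine."""
    question = transcribe(wav_path)
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        log_excerpt = handle.read()[-max_chars:]  # keep only the tail of the log
    return ask_local_llm(build_prompt(question, log_excerpt))
```

A call such as answer_voice_query("query.wav", "app.log") would run the whole chain. The only network address in the sketch is localhost, and the only download is the one-time fetch of the Whisper model weights.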

The Boundary Is a Design Decision

Most infrastructure security thinking focuses on what enters and exits the network perimeter. Logs are data. Data has classification levels. Classification determines what can leave.

Diagnostic reasoning about that data is not usually classified separately. It should be. The interpretation of sensitive data — the causal chains you construct, the hypotheses you form, the questions you ask — often carries more operational intelligence than the raw data itself. A raw stack trace is noise. The question “why does this stack trace appear only when request concurrency exceeds forty?” is a finding.

Keeping reasoning local is not a paranoid position. It is a recognition that the boundary between data and knowledge about that data is worth drawing deliberately, rather than leaving it to default to wherever the cheapest inference happens to run.
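One way to draw that boundary deliberately is to encode it, so that a configuration change cannot silently move inference off the machine. The sketch below is an assumption about how such a guard might look, not something taken from LogParseIQX: it accepts an inference endpoint only if its host is a loopback address. The function names are hypothetical.

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback(url: str) -> bool:
    """True only if the URL's host is localhost or a loopback IP address."""
    host = urlparse(url).hostname
    if host is None:  # no scheme or no host: cannot be verified, so reject
        return False
    if host == "localhost":  # simplification: trusts the local hosts file
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # any other DNS name is treated as remote


def require_local(url: str) -> str:
    """Return the URL unchanged, or raise if it would leave this machine."""
    if not is_loopback(url):
        raise ValueError(f"refusing to send diagnostic reasoning to {url!r}")
    return url


if __name__ == "__main__":
    print(require_local("http://localhost:11434/api/generate"))
```

Wrapping every endpoint in require_local turns the policy into a failure at startup rather than a line in a document that someone has to remember.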

The marginal cost of running a 7B model locally on an M-series chip you already own is effectively zero. The engineering cost of the local stack (Ollama, a voice library, 300 lines of Python) is a weekend. The architectural cost of sending production knowledge to a cloud API on every incident is harder to quantify and impossible to reverse after the fact.

LogParseIQX is open source at github.com/semanticintent/logparseiqx. It installs with pip install logparseiqx. Voice input requires pip install "logparseiqx[voice]" (quoted so that shells such as zsh do not treat the brackets as a glob pattern).