research

The work that justifies the practice.

Every engagement we ship is downstream of a research thesis we've tested in production. This page is where those theses live, in the order we arrived at them. Some are frameworks. Some are tools. Some are open problems we haven't closed yet.

None of this was commissioned. None of it was published for citation. It exists because the work required it, and it remains here because the next engagement will require it again.

01 · framework
active

The Cognitive Efficiency Framework

The industry is scaling models. We are compressing what the models need to read.

LLMs are treated as tools. They are not. They are energy — a generative force that must be channeled, not invoked. The industry's response to unreliable output has been to expand context windows, increase training data, and hope. Our response has been to invert the question: what if the model needed less, not more?

The framework defines a three-tier compression model (working context, structural context, and canonical reference) paired with deterministic routing (the Arbiter), tiered caching, and front-of-house / back-of-house model selection. A production deployment of this architecture in a mid-market data agent achieved an approximately 85% cost reduction relative to equivalent Snowflake Cortex Agent workflows, with no measurable loss in quality.
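To make the routing idea concrete, here is a minimal sketch of "rules first, cache second, model last." It is an illustration under stated assumptions, not the production Arbiter: the `RULES` table, the `call_model` interface, the "front" / "back" tier labels, and the eight-word threshold are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical deterministic rules: queries that never need a model call.
RULES: List[Tuple[re.Pattern, Callable[[re.Match], str]]] = [
    (re.compile(r"^ping$"), lambda m: "pong"),
    (re.compile(r"^add (\d+) (\d+)$"),
     lambda m: str(int(m.group(1)) + int(m.group(2)))),
]


def normalize(query: str) -> str:
    """Lowercase and collapse whitespace so equivalent queries share a cache key."""
    return " ".join(query.lower().split())


@dataclass
class Arbiter:
    # Assumed interface: call_model(tier, query) -> answer text.
    call_model: Callable[[str, str], str]
    cache: Dict[str, str] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def route(self, query: str) -> str:
        key = normalize(query)
        # Tier 1: deterministic rules, no model involved.
        for pattern, handler in RULES:
            match = pattern.match(key)
            if match:
                self.log.append("rule")
                return handler(match)
        # Tier 2: exact-match cache on the normalized query.
        if key in self.cache:
            self.log.append("cache")
            return self.cache[key]
        # Tier 3: escalate to a model; short queries go to the cheaper tier.
        # The word-count threshold is an arbitrary placeholder.
        tier = "front" if len(key.split()) <= 8 else "back"
        self.log.append(f"model:{tier}")
        answer = self.call_model(tier, key)
        self.cache[key] = answer
        return answer


def fake_model(tier: str, query: str) -> str:
    """Stand-in for a real model client, used only for demonstration."""
    return f"[{tier}] answer to: {query}"
```

With `Arbiter(call_model=fake_model)`, repeating the same question with different casing or spacing is served from the cache on the second call, and `arbiter.log` records which tier handled each request.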

llm.architecture cost.reduction semantic.caching production.deployed
02 · system
deployed

The Codex — A Structural Registry

Documentation describes what humans should do. A structural registry defines what the system actually does.

The Codex is a runtime-queryable registry of every unit in a system — functions, components, actions, tools, chains, guardrails, module boundaries. It is not documentation. It is the canonical structural definition of the system, queried by the system itself at both build time and runtime.

A production deployment against a custom ERP currently indexes 1,235 source artifacts, 607 tools, 224 user-facing actions, 190 operation chains, and 103 connections, enforced by 21 self-auditing taxonomy dimensions. The registry surfaces violations continuously and currently tracks 1,034 structural observations against the active codebase. The system is its own auditor.
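A toy version of the idea shows a registry that can be queried and can audit itself. This is a sketch only: the `Unit` fields, the two check functions, and the example unit names are hypothetical and do not reflect the Codex's actual schema or taxonomy dimensions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Unit:
    name: str
    kind: str                        # e.g. "tool", "action", "chain"
    module: str
    depends_on: Tuple[str, ...] = ()


@dataclass
class Registry:
    units: Dict[str, Unit] = field(default_factory=dict)
    checks: List[Callable[["Registry"], List[str]]] = field(default_factory=list)

    def register(self, unit: Unit) -> None:
        if unit.name in self.units:
            raise ValueError(f"duplicate unit: {unit.name}")
        self.units[unit.name] = unit

    def query(self, kind: str) -> List[Unit]:
        """Return every registered unit of the given kind."""
        return [u for u in self.units.values() if u.kind == kind]

    def audit(self) -> List[str]:
        """Run every check against the registry and collect observations."""
        observations: List[str] = []
        for check in self.checks:
            observations.extend(check(self))
        return observations


def missing_dependencies(reg: Registry) -> List[str]:
    """Flag references to units that were never registered."""
    return [
        f"{u.name}: depends on unregistered unit '{d}'"
        for u in reg.units.values()
        for d in u.depends_on
        if d not in reg.units
    ]


def chains_need_steps(reg: Registry) -> List[str]:
    """Flag chains that declare no steps at all."""
    return [f"{u.name}: chain has no steps"
            for u in reg.query("chain") if not u.depends_on]
```

Because the checks run against the same data the system queries at runtime, an audit cannot drift from what is actually registered, which is the property the registry is built around.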

structural.registry runtime.metadata self.auditing production.deployed
03 · platform
active

The Agent Workbench

Discipline over intelligence. Control loops, not pipelines.

A multi-agent orchestration platform built on the premise that smarter models still fail in ambiguous situations. Every agent is a Seed — a disciplined eight-step loop (orient, intend, act, verify, align, absorb, deposit, advance) with mandatory state persistence, bounded iteration, and mechanical verification at every transition.
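The loop described above can be sketched in a few lines. This is a minimal illustration, assuming each step is a function from state to state with a paired check. The `run_seed` signature, the `done` flag, and the JSON journal are hypothetical choices, not the Workbench's actual interfaces.

```python
import json
from typing import Callable, Dict

STEPS = ("orient", "intend", "act", "verify",
         "align", "absorb", "deposit", "advance")

State = Dict[str, object]


def run_seed(
    handlers: Dict[str, Callable[[State], State]],
    checks: Dict[str, Callable[[State], bool]],
    state: State,
    persist: Callable[[str], None],
    max_cycles: int = 3,
) -> State:
    """Run the eight-step loop until state['done'] is truthy.

    Bounded iteration: at most max_cycles passes through STEPS.
    Verified transitions: each step's check must pass before moving on.
    State persistence: the state is serialized after every step.
    """
    for cycle in range(max_cycles):
        for step in STEPS:
            # Hand each handler a copy so it cannot mutate the prior state.
            state = handlers[step](dict(state))
            if not checks[step](state):
                raise RuntimeError(
                    f"verification failed after '{step}' in cycle {cycle}")
            state["last_step"] = step
            state["cycle"] = cycle
            persist(json.dumps(state, sort_keys=True))
        if state.get("done"):
            return state
    raise RuntimeError(f"no completion within {max_cycles} cycles")
```

The loop never decides for itself that it is finished or that a step went well. Completion and correctness are read from the state by plain predicates, and running out of cycles is an error rather than a silent exit.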

The workbench sits above the Seeds as the orchestration layer: which agents exist, how they hand off, where the human intervenes, and what gets escalated. The philosophy: agentic scaffolding should make situations unambiguous rather than rely on model capability. Phases 1–19 are complete. Phase 20 (Command Center) is in active build. Phase 21 (Event-Driven North Star) is designed.

multi.agent control.loops orchestration active.build
04 · deployed
deployed

The Data Agent

An NL-to-SQL agent with production-grade cost discipline.

A Slack-native data agent running against a medallion Snowflake warehouse. Handles natural-language questions, routes through the cognitive efficiency pipeline for cost optimization, executes generated SQL against semantic views, formats results in Block Kit, and escalates failures to admins. Includes scheduled report generation with natural-language schedule parsing, thread-based multi-turn conversations, concurrency guards, and Langfuse + Prometheus observability.

Materially more capable than Snowflake's native Cortex Agent offering — the cost optimization layer alone is the differentiator. In production use since Q4 2025 at a $100M multi-brand holding company as the primary query interface for leadership.

nl.to.sql slack.native scheduled.reports production.deployed
05 · thesis
active

Discipline Over Intelligence

Smarter AI models still fail in ambiguous situations. Architecture does not.

The founding thesis underneath every system we ship. The industry's dominant bet (larger models, more training data, longer context windows) optimizes for capability. Our bet is that capability alone is insufficient because failure modes in agentic systems are usually structural, not cognitive. An agent that, by construction, cannot be wrong beats an agent that is merely smart.

The practical expression is a set of non-negotiable system properties: bounded iteration, verified transitions, deterministic routing before LLM escalation, mechanical enforcement of module boundaries, and guardrails that cannot be overridden by the agent itself. The system does not trust its own intelligence. It trusts its architecture.

thesis agent.design failure.modes
06 · open problem
exploratory

Gateway Numbers — a reframing of Collatz

A new formulation of the 3n+1 problem that shifts which tools apply.

An independent research direction on the Collatz conjecture. The "gateway" construction identifies the odd numbers of the form (4^m − 1)/3 (1, 5, 21, 85, ...), for which a single 3n+1 step lands exactly on 4^m, a power of 2, so the rest of the path is pure halving. Computational experiments indicate that gateway 5 alone catches approximately 94% of all orbits in tested ranges, suggesting a structural lens through which the dynamics may be more tractable.
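The construction is easy to explore numerically. The sketch below is one possible way to set up such an experiment, not the code behind the figure above: the function names are hypothetical, the trivial gateway 1 is excluded, and starting points that are pure powers of 2 are counted under `None`, which is an assumption about how orbits are tallied.

```python
from collections import Counter
from typing import Dict, Optional


def gateway(m: int) -> int:
    """Return (4**m - 1) / 3, an integer because 4**m leaves remainder 1 mod 3."""
    return (4 ** m - 1) // 3


def is_gateway(n: int) -> bool:
    """True when 3n + 1 is a power of 4, i.e. n = (4**m - 1)/3 for some m >= 1."""
    k = 3 * n + 1
    is_power_of_two = k & (k - 1) == 0
    has_even_exponent = (k.bit_length() - 1) % 2 == 0
    return n >= 1 and is_power_of_two and has_even_exponent


def entry_gateway(n: int) -> Optional[int]:
    """Follow the Collatz map from n and return the first gateway > 1 on the path.

    Returns None when n reaches 1 without meeting one (n is a power of 2).
    Assumes the orbit of n reaches 1, which holds for every n tested to date.
    """
    while n != 1:
        if is_gateway(n):
            return n
        n = n // 2 if n % 2 == 0 else 3 * n + 1
    return None


def gateway_shares(limit: int) -> Dict[Optional[int], float]:
    """Fraction of starting points 2..limit entering through each gateway."""
    counts = Counter(entry_gateway(n) for n in range(2, limit + 1))
    total = limit - 1
    return {g: c / total for g, c in counts.items()}
```

For example, `gateway_shares(10**5)[5]` gives the share of starting points up to 100,000 whose path passes through 5. The exact figure depends on the range tested and on how power-of-2 starting points are counted.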

The work is not a proof. It is a reformulation — a change of coordinates that moves the real difficulty into a different mathematical frame, where different tools become relevant. The honest status: potentially publishable as an equivalent formulation, not as a solution. Active.

number.theory collatz open.problem
why this exists

Every engagement we ship is the compiled form of an idea we had to think through first.

The implementation arm of this firm is priced, scoped, and delivered. The research arm is not. It is where we prove to ourselves, before we propose to a client, that the architecture holds.

We publish selectively. Some ideas are ready. Some are not. Some we deliberately hold as internal tooling because the edge disappears the moment they're commoditized. This page will grow slowly and on our schedule, and that is the point.

If your problem is adjacent to one of these, we should talk.

Most of our engagements start with a client whose operational problem turned out to require one of the frameworks above. If something here resonates, apply. We respond to every application within 48 hours.

Apply for Engagement