Introducing the MAVENS Stack

Recent advances in free and open-source software (FOSS) distribution have made it possible to build interesting things that would have been significantly harder even a year ago, using a set of FOSS components that are greater together than the sum of their parts:

model-agnostic open agent harnesses (e.g., OpenCode, pi);
agent memory extensions (e.g., beads);
portable skill files and prompt assets (e.g., agentskills);
versioning and filesystem-backed workspace organization (e.g., Git);
intermediate text representations with rendering and export (e.g., TeX, docx, html, pdf, png); and
terminal-native workflows (e.g., tmux, vim, emacs).

Taken individually, these are just software methodologies/packages. However, wired together, they transform into a powerful software substrate for knowledge work that is unlike anything previously available.¹

As a shorthand, I’m calling the pattern used to build this system the MAVENS (Memory, Agent, Versioned, Exportable, ruNtime, Skills) stack.² The MAVENS stack is a new paradigm in software development, and it represents a clear path to unlocking value in knowledge work not only in patent practice, but across many other domains.

maven (noun): expert or person of skill; from Yiddish meyvn, from Hebrew mevin (מבין), “one who understands.”

One of the biggest wins of the MAVENS stack, and one that cannot be emphasized enough, is that the FOSS constituent parts are community-maintained, which allows the knowledge worker (e.g., patent practitioner) to focus on improving the things that matter to the knowledge worker (the runtime, skills and their outputs). The FOSS parts are essential, but they are like the machinist’s caliper or the electrician’s multimeter–generic tools that Just Work.

The next biggest win is that the open agent harness+memory can be used separately to improve the MAVENS stack itself, leading to a self-improvement feedback loop. For example, a client recently gave me a bucket of TeX as a disclosure, including pages of formulas that would have taken weeks to transcribe by hand, and which would have been error-prone in any event. I was able to extend my MAVENS stack to include a TeX-archive import feature, and map it onto the runtime. This code is completely reusable.

In another case, a colleague sent me a set of figures that did not have reference numerals. I quickly extended my MAVENS stack to add them in programmatically based on the existing figure numbering and, optionally, the content of an associated specification. This was done in minutes, and is a tool that can be used in perpetuity. There are many much more exotic examples of this type of work, all represented using the MAVENS software architecture.
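As a rough illustration of what numbering "based on the existing figure numbering" could look like, here is a minimal Python sketch. It assumes a common drafting convention, not stated above, in which the elements of FIG. N receive numerals N00, N02, N04, and so on. The function name, the step of 2, and the element names are all hypothetical, and the sketch works on a list of element names rather than on the figures themselves.

```python
def assign_numerals(figure_number, elements, step=2):
    """Map element names to reference numerals derived from the figure number.

    Assumed convention: elements of FIG. 3 are numbered 300, 302, 304, ...
    """
    base = figure_number * 100
    if elements and base + step * (len(elements) - 1) >= base + 100:
        raise ValueError("too many elements for this figure's numeral range")
    return {name: base + i * step for i, name in enumerate(elements)}


numerals = assign_numerals(3, ["housing", "sensor", "controller"])
for name, numeral in numerals.items():
    print(f"{name} {numeral}")
```

Keeping the mapping as plain data means the same table could, in principle, drive both figure annotation and a consistency check against the specification.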

The step change here is not the ability to work with TeX from inventors to preserve math fidelity, or to do image generation/processing, or any of the other myriad things the software can do. Those are just examples. The leap in capabilities comes from the combination of the components, which enables the MAVENS stack itself to be modified and improved while it is being used.

Confidentiality and Privacy and Governance, Oh my!

A brief digression to discuss model access, where the devil is in the details. For example, in the MS Azure model menagerie, there is an essential distinction between models sold directly by Azure (“Azure Direct models”) and models that are surfaced through Azure but are still from Partners and Community. The former are the cleaner case: Microsoft hosts them in Microsoft’s Azure environment, under Microsoft’s product and data-protection commitments, and Microsoft says prompts and completions are not shared with other customers or with the underlying model provider and are not used to train foundation models absent permission or instruction. Partner and Community models have their own ToS.

For example, whereas OpenAI models are Azure Direct, Anthropic’s models (Claude) are not, which means that the governing privacy posture is different, and Anthropic’s privacy policy and commercial terms are part of the analysis. As such, before using the MAVENS stack, it is essential to understand these terms in all of their excruciating detail.

Deployment characteristics also drive data location, and here the official terms matter too. Azure speaks in terms of region, geography, and DataZone. For Azure Direct models, data stored at rest remains in the customer-designated Azure geography, but the location of processing for prompts and responses depends on deployment type. A Standard or regional deployment processes in the deployment region. A Global deployment may process prompts and responses in any geography where the relevant model is deployed. A DataZone deployment may process prompts and responses anywhere within the Microsoft-defined data zone, today the United States or the European Union. So if you stand up a US DataZone deployment, processing may occur anywhere in the United States; if you stand one up in an EU member state, processing may occur in that or another EU member state. Thus, when using the MAVENS stack, it is essential to become intimately familiar with model categories, deployment types, geography of data at rest, and where inferencing may occur.
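The deployment rules in the preceding paragraph can be written down as a small lookup, which makes it easier to check a planned deployment against a client's data-location requirements. This is a minimal Python sketch of the rules as summarized here, not a call into any Azure API; the type labels, the `Deployment` class, and the example region name are hypothetical, and the official terms remain the authority.

```python
from dataclasses import dataclass
from typing import Optional

# Labels for the deployment types discussed above (illustrative strings).
STANDARD, GLOBAL, DATAZONE = "standard", "global", "datazone"


@dataclass(frozen=True)
class Deployment:
    deployment_type: str
    region: str                      # e.g. "eastus2" (hypothetical example)
    data_zone: Optional[str] = None  # "US" or "EU" for DataZone deployments


def processing_scope(d: Deployment) -> str:
    """Return where prompts and responses may be processed."""
    if d.deployment_type == STANDARD:
        return f"region:{d.region}"
    if d.deployment_type == DATAZONE:
        if d.data_zone not in {"US", "EU"}:
            raise ValueError("DataZone deployments need data_zone 'US' or 'EU'")
        return f"zone:{d.data_zone}"
    if d.deployment_type == GLOBAL:
        return "any geography where the model is deployed"
    raise ValueError(f"unknown deployment type: {d.deployment_type}")


def is_permitted(d: Deployment, permitted_scopes: set) -> bool:
    """Check a deployment against the scopes a client has approved."""
    return processing_scope(d) in permitted_scopes


print(processing_scope(Deployment(DATAZONE, "eastus2", "US")))
```

Note that this only models where inferencing may occur; data at rest follows the customer-designated geography and would need its own check.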

Deconstructing the MAVENS stack

I will now describe the technical underpinnings of the MAVENS stack (in no particular order, and with AI slop images for maximum engagement).

1. the Agent harness

An open agent harness allows the practitioner to interact with models in an agentic way. It’s the practitioner’s interface into the model provider(s) (which can be any, including the latest and greatest from OpenAI, Anthropic, xAI or others) and their respective model(s). The agent harness I’m primarily using currently, OpenCode, can be accessed in a Terminal User Interface (TUI, yay), CLI, via the web, desktop (beta, pretty rough), and via an IDE like Visual Studio Code.

Thus, the MAVENS stack may require a change in how you work. I will argue this is for the better–programmers have known for a long time that text, pipes and composability beat monolithic GUIs for software work, hands down. And all knowledge work is now software work.

This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.

– The Bell System Technical Journal. Bell Laboratories. M. D. McIlroy, E. N. Pinson, and B. A. Tague. “UNIX Time-Sharing System: Foreword”. 1978. 57 (6, part 2). p. 1902

It stands to reason that knowledge workers should embrace Unix principles to work as effectively as possible.

Harness engineering and harness training

I already mentioned that coding agents can be used to improve the agent harness and other aspects of the MAVENS stack, to extend functionality on an as-needed basis, and that those customizations/extensions compound over time. Some recent research in this area hints at what will be coming down the pike soon:

Changing the harness around a fixed large language model (LLM) can produce a 6× performance gap on the same benchmark [46]. The harness–the code that determines what to store, retrieve, and show to the model–often matters as much as the model itself.

This paper showed that automated harness engineering produces better classifications, faster, than hand-curated techniques in legal and legal-adjacent domains, including USPTO molecule-reactant datasets and criminal charging from case descriptions. Meta-Harness significantly outperformed hand-curated baselines on LawBench, using far fewer tokens.

This was accomplished via harness training:

At a high level, [Meta-Harness] repeatedly proposes, evaluates, and logs new harnesses. […] [B]y leaving diagnosis and edit decisions to the proposer rather than hard-coding search heuristics, Meta-Harness can improve automatically as coding agents become more capable.
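The propose, evaluate, and log cycle in that quote can be sketched in a few lines of Python. This is a toy illustration of the loop shape only, not the Meta-Harness implementation: the `search` function, the dict-shaped "harness" with a single `top_k` setting, and the scoring function are all hypothetical stand-ins. In the real system, the proposer would be a coding agent reading the log, and the evaluator would be a benchmark run.

```python
def search(propose, evaluate, seed, rounds):
    """Repeatedly propose a harness, evaluate it, and log the result."""
    log = [(seed, evaluate(seed))]
    for _ in range(rounds):
        candidate = propose(log)  # the proposer sees the full history
        log.append((candidate, evaluate(candidate)))
    best = max(log, key=lambda entry: entry[1])
    return best, log


def toy_propose(log):
    """Nudge the best harness so far toward an untried neighbor."""
    best, _ = max(log, key=lambda entry: entry[1])
    tried = {harness["top_k"] for harness, _ in log}
    for step in (1, -1):
        k = best["top_k"] + step
        if k >= 1 and k not in tried:
            return {"top_k": k}
    return dict(best)  # nothing new to try nearby


def toy_evaluate(harness):
    """Pretend benchmark: retrieving 5 items happens to be ideal."""
    return -abs(harness["top_k"] - 5)


(best_harness, best_score), history = search(
    toy_propose, toy_evaluate, seed={"top_k": 2}, rounds=6
)
print(best_harness, best_score, len(history))
```

The design point worth noticing is that `search` hard-codes no heuristics about what to change; all of that judgment lives in the proposer, which is what lets the loop improve as the proposer does.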

For now, I’m left to wonder, what would it take for a patent practitioner, or another knowledge worker, to learn to trust a self-improving orchestration layer? More on this in a future post, perhaps.³

Clawed Open

While I was doing final edits on this post, as if on cue, the interwebs began to vibrate with the news that the source code for the Claude Code harness had escaped into the wild.

The code was quickly ported via purported clean room engineering. The claw-code developer wrote the following:

At 4 AM on March 31, 2026, I woke up to my phone blowing up with notifications. The Claude Code source had been exposed, and the entire dev community was in a frenzy. My girlfriend in Korea was genuinely worried I might face legal action from Anthropic just for having the code on my machine — so I did what any engineer would do under pressure: I sat down, ported the core features to Python from scratch, and pushed it before the sun came up.

The whole thing was orchestrated end-to-end using oh-my-codex (OmX) by @bellman_ych — a workflow layer built on top of OpenAI’s Codex (@OpenAIDevs). I used $team mode for parallel code review and $ralph mode for persistent execution loops with architect-level verification. The entire porting session — from reading the original harness structure to producing a working Python tree with tests — was driven through OmX orchestration.

The result is a clean-room Python rewrite that captures the architectural patterns of Claude Code’s agent harness without copying any proprietary source. I’m now actively collaborating with @bellman_ych — the creator of OmX himself — to push this further. The basic Python foundation is already in place and functional, but we’re just getting started. Stay tuned — a much more capable version is on the way.

Without taking a position as to the legitimacy of this activity, a few things seem clear.

First, Anthropic needs better InfoSec. They seem to have physical security buttoned up:

Even this blankness is doled out grudgingly: all but two of the ten floors that the company occupies are off limits to outsiders. Access to the dark heart of the models is limited even further. Any unwitting move across the wrong transom, I quickly discovered, is instantly neutralized by sentinels in black. When I first visited, this past May, I was whisked to the tenth floor, where an airy, Scandinavian-style café is technically outside the cordon sanitaire. Even there, I was chaperoned to the bathroom. – https://www.newyorker.com/magazine/2026/02/16/what-is-claude-anthropic-doesnt-know-either

Second, the claw-code repository is of a piece with the types of LLM-based copying risks people have warned about for a while now in the IP community, which I wrote about in reference to (somewhat ironically) the Anthropic C compiler project, and which have, in the interim, cropped up in other contexts.

Third, setting aside the morality of the Claude clones, for practical purposes, the horse is out of the barn, and the digital commons will be the immediate beneficiary. Everyone can now read the Claude Code harness, its code and its prompts, to develop their own coding agents. They can do so in conjunction with any models. This leak seems likely to include trade secret information that, until yesterday, appeared to have given Claude Code a not insignificant edge over competitors like OpenAI Codex.⁴

Having said that, some of the technical details about the inner workings of Claude Code sound kind of obvious in hindsight. A more tinfoil-hat interpretation is that Anthropic may not be terribly unhappy to have the wider FOSS world iterate on parts of the harness. But that is speculation.

Porcelain, not plumbing

One of the practical upsides of using an agent harness is that it abstracts away all of the LLM drudgery (counting tokens, counting costs, authenticating with providers, keeping a current list of models and their respective capabilities/parameters, etc.). Anyone who has spent any time coding against an LLM knows this bean counting starts to feel like medieval torture after a while, especially due to constant drift and churn among providers and their model offerings. The harness can simply ingest data from a FOSS API (e.g., models.dev) that the community keeps up to date. On its own, this is a huge win. Programming directly against provider APIs, which can change in sudden and highly disruptive ways, is a thing of the past. A lot of boilerplate is no longer necessary, given the better abstractions of the MAVENS stack.
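To make the bean counting concrete, here is a minimal Python sketch of the kind of bookkeeping a harness does once it has a model catalog in hand. The catalog below is inline sample data with made-up model names, limits, and prices; its field names are assumptions for illustration and do not reflect the actual models.dev schema.

```python
# Hypothetical catalog: context limit in tokens, prices in USD per million tokens.
CATALOG = {
    "example-provider/example-large": {
        "context": 200_000, "usd_per_mtok_in": 3.0, "usd_per_mtok_out": 15.0,
    },
    "example-provider/example-small": {
        "context": 32_000, "usd_per_mtok_in": 0.25, "usd_per_mtok_out": 1.25,
    },
}


def estimate_cost(model_id, tokens_in, tokens_out, catalog=CATALOG):
    """Estimate the USD cost of one request, refusing requests that cannot fit."""
    model = catalog[model_id]
    if tokens_in + tokens_out > model["context"]:
        raise ValueError(f"{model_id}: request exceeds context window")
    return (tokens_in / 1e6) * model["usd_per_mtok_in"] + (
        tokens_out / 1e6
    ) * model["usd_per_mtok_out"]


def cheapest_that_fits(tokens_in, tokens_out, catalog=CATALOG):
    """Pick the lowest-cost model whose context window can hold the request."""
    costs = {}
    for model_id in catalog:
        try:
            costs[model_id] = estimate_cost(model_id, tokens_in, tokens_out, catalog)
        except ValueError:
            continue  # request does not fit this model
    if not costs:
        raise ValueError("no model in the catalog can hold this request")
    return min(costs, key=costs.get)


print(cheapest_that_fits(100_000, 10_000))
```

When the catalog is community-maintained data rather than code, a provider's price change or a new model becomes a data refresh instead of a programming task.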

Open agent harnesses also have a flexible model layer. A useful knowledge work system should be able to work with whichever models make sense in the moment: local models where they are good enough or operationally attractive, frontier hosted systems where stronger performance is worth it. Ad hoc or case-by-case selection should be available when it’s called for or as input modalities dictate. It’s nice to be able to ask OpenAI and Anthropic the same question, back-to-back, and compare responses.⁵

One potential wrinkle in the MAVENS stack is lackluster web and desktop support. While usable, those interfaces are not as polished as the terminal. But if you’ve read this far, there’s a good chance a terminal interface won’t scare you, and it could be just the thing you’re looking for.

2. agent Memory

Screaming at goldfish

An excerpt from Steve Yegge that might resonate with you if you’ve tried using an agentic model to do anything remotely complex without agent memory (vibe coding, coding with a straight face, or otherwise):

Your agent enthusiastically begins working on phase 1 (out of 6). You go through the whole canonical multi-step vibe coding workflow loop together: design the solution, review the design, write code, review code, make fixes, write more tests, review, cleanup, rebase, update plans, git operations. (All stuff in our book!)

So far, the agent is happy, and you are happy. Everyone is so, so happy.

The agent eventually finishes phases 1 and 2 (out of 6), with you repeating this dev loop many times.

It’s working, but during this time, several compactions/restarts happen, which resets the agent’s memory.

By the start of phase 3 (out of 6), the AI has mostly forgotten where it came from. It wakes up, plops in your video cassette, and reads about phase 3. It then declares, “Oh wow, this is a big project, I’m going to break it into five phases and create a markdown plan.” Which should sound eerily familiar.

Then it begins working on phase 1 (out of 5) of phase 3 (out of 6). Except it just calls it “phase 1”, with no mention of the six outer phases it was just working on.

You start to sweat.

The agent finishes the first two phases 1 and 2 (out of 5), which takes many more compactions, and it has been displaying signs of increasing dementia along the way.

When the agent arrives at phase 3 (out of 5) of phase 3 (out of 6), it has gone full-blown bugshit amnesiac, and it announces triumphantly:

“Congratulations, the system is DONE! 🎉 Let’s start manual testing! 🚀”

And you’re like, “Woah… w-what? What… about… all the other phases? The inner ones and outer ones? I distinctly remember a lot more phases!”

And they deadass look you in the eye and say, “What phases?”

And you bite back a scream, and go look in their markdown plan, only to discover in horror that they have been creating a dozen 6-phase plans a day for the past three weeks since your last Great Markdown Purge, and you have literally hundreds of new markdown files, all with low-context titles like cleanup-tech-debt-plan-phase-4.md — all partly implemented, all partly obsolete, all 100% useless.

C’mon, raise your hand if this has happened to you. ✋ If you’re not sure, you might want to go look in your plans/ directory. Surprise! 🎉

After six months of this situation happening like twenty times a day, if you’re at all like me, you start thinking, I need a Plan Manager.

If you’ve tried using markdown-based plan files to guide an agent, you’ve probably choked back that same sweaty sob/scream, and you might know that agent memory is the way out.

A tool like beads helps to alleviate the “demented agent” issue by giving the model “a persistent, structured memory” that it can use “to handle long-horizon tasks without losing context”.⁶

For patent practitioners, the approximate parallel is a model context resetting or compacting in the midst of drafting a 68-page patent application, with the context window at 39% (304.2K tokens). Regardless of the knowledge domain, the ability to reset the context window predictably and somewhat losslessly becomes Very Important.

Previously, you’d need to attempt to store all of the things that the agent needs to know in flat text files and find a way to reinject that context. With memory, you can just throw all of your implementation tasks and context into hierarchical issues à la beads, and kill the model session and start a fresh one if the context gets too wild and woolly. When you come back, at 0% context window consumed, the model still has (mostly) everything it needs to pick up where it left off. This is simultaneously a faster, cheaper and less error-prone way to use agents.⁷
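The idea of parking work in hierarchical issues and resuming from a fresh session can be illustrated with a toy task store. This Python sketch is not the beads API or data model; the function names, the JSON layout, and the task IDs are hypothetical, chosen only to show why externalized, structured state survives a context reset.

```python
import json
from pathlib import Path


def add_task(store, task_id, title, parent=None):
    """Record an open task; `parent` links it into the hierarchy."""
    store[task_id] = {"title": title, "parent": parent, "status": "open"}


def close_task(store, task_id):
    """Mark a task as done."""
    store[task_id]["status"] = "closed"


def ready(store):
    """Open tasks with no open children: the next things to work on."""
    parents_with_open_children = {
        task["parent"]
        for task in store.values()
        if task["status"] == "open" and task["parent"] is not None
    }
    return [
        task_id
        for task_id, task in store.items()
        if task["status"] == "open" and task_id not in parents_with_open_children
    ]


def save(store, path):
    """Persist the task store so a later session can reload it."""
    Path(path).write_text(json.dumps(store, indent=2), encoding="utf-8")


def load(path):
    """Reload the task store in a fresh session."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

A session might add "Draft application" with children "Claims" and "Detailed description", close the claims task, and call `save`. A brand-new session that calls `load` and then `ready` sees only the detailed description as outstanding, without replaying any of the earlier conversation.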

Editing modalities - Agents and Subagents

Some agent harnesses support editing modalities, or agent modes. Interestingly, pi does not expose separate modes in quite the same way. OpenCode ships with two, Plan and Build, which (essentially) provide read-only access and write access, respectively, to the filesystem, including use of tools. A typical patent drafting workflow looks like this:

In plan mode, plan complex features/functionality, optionally based on existing language. This is all done in memory with iterative model turns.
Once the plan looks good, save it as persistent hierarchical tasks using beads.
Switch to build mode and begin iteratively carrying out the plan. This involves generating new files or editing existing ones.

OpenCode ships with two subagents, General and Explore. General can research complex questions and execute multi-step tasks (including with tools); Explore can quickly search files (think grep, with extras).

The agent/subagent layer is where the MAVENS stack starts to feel like a real knowledge work environment. In OrdinarySkill.ai, I modify the default agent set with additional agents and subagents that are purpose-built to do patent-specific tasks. It works amazingly well–if you have used a coding agent, it’s like that–but for patent work.

The OpenCode API documentation is a good read if you are interested in more details about OpenCode’s agent implementation.

3. Versioning

Git is old hat for most software developers, and software patent practitioners. Suffice it to say that keeping the artifacts of patent drafting in a distributed VCS is very comforting, for all of the reasons it is in software development.

4. Export

Everything up to this point has been a reversion to the text-first, composable Unix mode: a terminal, a good editor, plain text files, and version control are a very natural environment for agentic work. It’s a shame that reality must intrude, in the form of document production.

For patent work, Word is the lingua franca. The USPTO effectively penalizes applicants for filing PDFs, which is only a slightly nuttier decision than the one to standardize the entire filing system around the dumpster fire of a file format that is docx.

Regardless, Word is the craggy remote airstrip where 99% of patent deliverables must land.

That means the drafting environment must be able to move cleanly from structured source into picture-perfect MS Word output. TeX, Pandoc, and related tooling are not glamorous, but they are powerful building blocks for getting there.
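As a small illustration of the export step, here is a Python sketch that wraps a Pandoc conversion to docx. It assumes Pandoc is installed and on the PATH, and that a styled Word template is supplied through Pandoc's `--reference-doc` option; the function names and file names are hypothetical. Getting truly picture-perfect Word output usually takes more than one command, so treat this as a starting point.

```python
import shutil
import subprocess


def pandoc_command(source, output, reference_doc=None):
    """Build the pandoc argv; formats are inferred from the file extensions."""
    cmd = ["pandoc", str(source), "-o", str(output)]
    if reference_doc is not None:
        # The reference document supplies the Word styles for the output.
        cmd.append(f"--reference-doc={reference_doc}")
    return cmd


def export_docx(source, output, reference_doc=None):
    """Run the conversion, failing loudly if pandoc is missing or errors out."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc not found on PATH")
    subprocess.run(pandoc_command(source, output, reference_doc), check=True)


print(" ".join(pandoc_command("spec.tex", "spec.docx", "firm-template.docx")))
```

Separating command construction from execution keeps the conversion easy to inspect and test without Pandoc installed.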

However, as much as I enjoy dumping on Word, Microsoft has in recent years done some things that make working on Windows not only bearable but so interoperable with Unix that there’s almost zero friction. I have nothing but love for Windows Subsystem for Linux, which allows any MAVENS stack to directly interoperate with Word and the Windows filesystem. The fact that Arch Linux is available as an official WSL distribution is the cherry on top.

5. The ruNtime

The runtime includes matter state that the system can reason about.
Patent drafting is not just “write a lot of text.” It is a constrained document-building problem with dependencies everywhere: claims and specification must stay aligned, figures need to map cleanly into the detailed description, optional sections come and go, and terminology has to remain coherent over tens of pages. The ruNtime provides a stable substrate for this work that is grounded in document structure and practitioner know-how.

The runtime is where I think a lot of the durable value sits in the MAVENS stack, for patent practitioners but really for all knowledge workers who apply it. Agent harnesses will improve, models will churn, and export tooling will remain mildly cursed. A well-designed runtime gives all of those moving parts a common contract. It lets different agents, skills, and renderers operate against the same data without each inventing its own private version of reality. For me, this is the real promise: less hidden state, fewer surprises, and a workflow that gets better because the software actually knows what the work is.
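To give a flavor of what "matter state that the system can reason about" might mean, here is a minimal Python sketch of a matter record with two consistency checks: claim terms missing from the description, and figure numerals the description never mentions. The `Matter` class, its fields, and the three-digit numeral pattern are assumptions for illustration, not the runtime described above.

```python
import re
from dataclasses import dataclass


@dataclass
class Matter:
    claims: list            # claim texts
    description: str        # detailed description text
    figure_numerals: dict   # e.g. {"FIG. 1": {100, 102}}


def unsupported_terms(matter, terms):
    """Terms that appear in the claims but nowhere in the description."""
    claims_text = " ".join(matter.claims).lower()
    description = matter.description.lower()
    return [
        term for term in terms
        if term.lower() in claims_text and term.lower() not in description
    ]


def undescribed_numerals(matter):
    """Figure numerals that the description never mentions, per figure."""
    mentioned = {int(n) for n in re.findall(r"\b\d{3}\b", matter.description)}
    report = {}
    for figure, numerals in matter.figure_numerals.items():
        missing = sorted(set(numerals) - mentioned)
        if missing:
            report[figure] = missing
    return report


matter = Matter(
    claims=["A system comprising a processor and a transceiver."],
    description="The system 100 includes a processor 102 coupled to a memory.",
    figure_numerals={"FIG. 1": {100, 102, 104}},
)
print(unsupported_terms(matter, ["processor", "transceiver"]))
print(undescribed_numerals(matter))
```

Because both checks read the same record, an agent, a skill, or a renderer can rely on them without keeping its own private copy of the matter.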

6. The Skills

If the runtime is the source of truth, the skills are the reusable recipes/moves.
A skill is not just a giant prompt pasted into a text file. It is a named, portable procedure for doing a recurring piece of patent work: routing a workflow, drafting a first pass of particular text, reviewing for consistency, harmonizing terminology, generating a figure prompt, flagging possible support issues, and otherwise teeing up issues for practitioner review. The important thing is that skills can be recursively improved independently of the agent harness or model provider, with the practitioner as a guide. Skills are packages of reusable technique.

Once a useful drafting pass or review sequence exists, it can be reused across matters, sharpened over time, and composed with other skills, without collapsing into prompt soup. Instead of evaporating with individual chats, this knowledge starts compounding in an organized repository. And these tools are editable, in the tool. So after you finish a session where you correct a bunch of things, you can say something like “update my claims review skill file to include avoiding passive voice and nominalizations.” It will do it and report back. This is an incredibly powerful, and fast, way to work, where the practitioner can crystallize and preserve judgment for repeated use.
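The "update my claims review skill file" request boils down to a small, repeatable text edit. Here is a minimal Python sketch, assuming a skill file with YAML frontmatter (name and description) followed by a Markdown body; the "## Rules" heading, the function name, and the sample skill are hypothetical. In practice the agent makes this edit itself, so this only shows the shape of the change.

```python
def add_rule(skill_text, rule, heading="## Rules"):
    """Append a bullet under the rules heading, creating it if needed."""
    bullet = f"- {rule}"
    lines = skill_text.rstrip("\n").split("\n")
    if bullet in lines:
        return skill_text  # already recorded; keep the file unchanged
    if heading not in lines:
        lines += ["", heading]
    start = lines.index(heading)
    end = start + 1
    # Walk to the end of this section (the next heading or end of file).
    while end < len(lines) and not lines[end].startswith("#"):
        end += 1
    # Back up over blank lines so the bullet joins the existing list.
    while end > start + 1 and lines[end - 1] == "":
        end -= 1
    lines.insert(end, bullet)
    return "\n".join(lines) + "\n"


SKILL = (
    "---\n"
    "name: claims-review\n"
    "description: Review claims for clarity.\n"
    "---\n"
    "\n"
    "## Rules\n"
    "- Flag terms lacking antecedent basis.\n"
)
updated = add_rule(SKILL, "Avoid passive voice and nominalizations.")
print(updated)
```

Because the skill is a plain text file under version control, each such update shows up as a reviewable diff.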

This is also one of the more interesting places for professional differentiation. While the MAVENS stack is open, the best skills can still reflect a practitioner’s own drafting habits, review heuristics, and sense of what makes an application good. Good taste is difficult to express in code, but the skill layer moves us closer.

Closing

AI software tooling has become critical to patent practice. It is not docketing software, accounting software, or some other ancillary system, removed from the legal work. An AI patent drafting stack reaches directly into the substantive legal function itself. It does not displace ultimate practitioner responsibility for what is claimed, argued, supported and filed. But it sits close enough to the heart of the work that it must be treated with the same seriousness.

At the same time, there is a merger underway between software functions and legal functions, which implies a merger of tooling. As legal work itself becomes more software-shaped, the tools built for sophisticated software work become increasingly applicable to sophisticated legal work.

What has become possible only very recently is the ability to assemble drafting environments from mature FOSS tooling, and to use them to access the best models available, while fully customizing the system for a given knowledge work domain, allowing the knowledge worker to build or shape a workflow around the actual work. For patent practice in particular, drafting has enough structure, repeatability, and document-centric gravity that this kind of environment can be very effective.

As I use agentic systems more, and particularly the MAVENS stack, I find myself running dozens of lightweight agents interactively to solve different problems concurrently. tmux allows me to run these agent sessions persistently, and no file, directory or data source is safe from the grip of my MAVENS stack. One agent tees up a first-pass critique of a claim set, while another steps through an office action analysis workflow, and another generates terrible images for a draft blog post. This is the promise of agentic AI–on-demand, parallel application of knowledge models to digital artifacts, wherever they may be.
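Persistent, parallel agent sessions of this kind can be launched from a short script. The Python sketch below builds `tmux new-session` commands for detached, named sessions, one per task. The task names and working directories are hypothetical, and it assumes both tmux and the agent's command-line binary (here `opencode`) are installed.

```python
import re
import subprocess


def session_name(task):
    """Turn a task label into a tmux-safe session name."""
    return re.sub(r"[^A-Za-z0-9_-]+", "-", task).strip("-")


def tmux_command(task, workdir, agent_cmd):
    """Build argv for a detached tmux session running the agent in workdir."""
    return [
        "tmux", "new-session", "-d",
        "-s", session_name(task),
        "-c", workdir,
        *agent_cmd,
    ]


def launch(task, workdir, agent_cmd):
    """Start the session; attach later with `tmux attach -t <name>`."""
    subprocess.run(tmux_command(task, workdir, agent_cmd), check=True)


cmd = tmux_command("claims critique: matter 42", "/work/matter-42", ["opencode"])
print(" ".join(cmd))
```

Each session keeps running after the terminal closes, so an agent can be checked on or resumed at any point.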

In any event, I hope that you, too, always have an agent running.

Notes

¹ There are, naturally, optional components that can be added to make the stack operator surface more appealing (e.g., nvim, tmux, WSL), or to support different workflows (e.g., web views, desktop extensions), but those are the main components.
² When it comes to gratuitous acronyms, why should software developers have all the fun?
³ There are other projects that might have interesting applications in this area, including self-correcting or self-learning approaches inspired by auto research. These approaches can be quantified.
⁴ There are also even less scrupulous uses this code leak could be put to.
⁵ FOSS agent harnesses also confer all of the other attendant benefits found in most FOSS, like auditability, stability, and a large user base that makes bugs shallow.
⁶ https://github.com/steveyegge/beads
⁷ Beads isn’t the only way to approach agent memory. For example, Claude recently rolled out a form of memory that, unlike beads, is less explicit and not external. Others may emerge. The great thing about the MAVENS framework is that swapping out the Memory component from beads to something else can be done in a few minutes, if new libraries come along.