Using AI to Improve AI Tools Built by AI
Bootstrapping and self-auditing AI-built tools
The Bootstrapping Origin
As I wrote in this LinkedIn post, I’ve become completely enamoured with using AI to build tools for AI so those tools can help AI build better tools for AI.
It all started with two problems. The first problem is that I can’t keep up with the pace of change in the AI space. It’s so rapid that by the time I learn a new technique or best practice, it’s already outdated. The second problem is that I like building tools, but creating AI tools requires up-to-date expertise in the domain, which gets us right back to the first problem.
I’m forever playing catch-up, which meant the only way to scratch my tool-builder itch was to eliminate myself as the weakest link, i.e. substitute myself with AI, and build tools that could keep themselves up to date.
And that’s how I ended up using AI to build AI tools to improve the AI tools built by AI.
Recursion for the win.
For this, I needed two things: a repeatable mechanism for synthesising knowledge from multiple sources, and another for writing reusable AI skills based on that synthesis. If I could encode this process, I could use those tools to iteratively improve themselves and stay up to date.
My first attempts gave rise to the unusable library tool I wrote about in Complexity Cascades. Starting fresh after that humiliation, I pointed Claude Code at its own documentation on agent skills and asked it to encode the expertise in, well, a skill-writing skill (unimaginative, but practical).
I followed that up by, again using Claude, researching and designing a workflow for comprehensive knowledge synthesis from multiple sources (documents, URLs, whatever), and used skill-writing to create a topic-synthesis skill based on the results.
You can probably see where this is going, but I then ran topic-synthesis over the same documentation used to create skill-writing. This produced a more thorough synthesis of skill-writing expertise, which I then used to assess and improve skill-writing itself. With the improved skill-writing skill, I reviewed and improved topic-synthesis, and so on.
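The mutual-improvement loop above can be sketched in a few lines. This is a toy illustration only: the "skills" are reduced to version counters, `improve` stands in for what is really an LLM invocation, and none of these names are real cogworks or Claude APIs.

```python
# Hypothetical sketch of the mutual bootstrapping loop described above.
# The "skills" are just version counters here; in reality each step is an
# LLM invocation, and none of these names are real APIs.

def improve(version, using):
    """Regenerate one tool with the help of the other (placeholder)."""
    return version + 1  # stands in for an actual LLM regeneration step

skill_writing, topic_synthesis = 1, 1  # v1 of each, built from the docs

for _ in range(2):
    # A fresh synthesis of the skill-writing docs improves skill-writing...
    skill_writing = improve(skill_writing, using=topic_synthesis)
    # ...and the improved skill-writing regenerates topic-synthesis.
    topic_synthesis = improve(topic_synthesis, using=skill_writing)

print(skill_writing, topic_synthesis)  # both tools end up at v3
```

The point of the sketch is the shape of the loop: each tool's output becomes the other tool's input, so improvements propagate in both directions with every pass.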
We were off to the races.
At this point there was still too much human-in-the-loop, so cogworks was born. Not only was the name cooler, but I now had an agent that orchestrated the whole process, calling on cogworks-encode (the evolved topic-synthesis) and cogworks-learn (the evolved skill-writing) as its component skills.
Of course, I wasn’t done yet. AI tools are only as good as the prompts that define them, so I needed to check the quality of the cogworks tool definitions. If these were suboptimal, that would have implications for every tool generated downstream. Naturally, I fed Claude documentation on prompt engineering, created an advanced-prompting skill, and used it to audit the cogworks (v1) tools to see whether they followed the very prompting principles encoded in advanced-prompting.
They didn’t.
Which meant the advanced-prompting (v1) skill’s assessment was unreliable output from an unreliable tool that was itself a product of the unreliable toolkit it had just improved.
And yes, you guessed it.
I regenerated cogworks-encode (v1) with the prompt-improved cogworks-learn (v2) skill, then regenerated the skills creation and advanced prompting synthesis from source using the freshly generated cogworks-encode (v2), then generated cogworks-learn (v3) using cogworks-learn (v2) and the updated skill-writing synthesis, then generated advanced-prompting (v2) with the updated cogworks-learn (v3) skill...
It’s hard to keep typing through the tears...
The Point - Bootstrapping and Dogfooding
The point of all this isn’t to brag about how I suffered through a recursive improvement loop for my AI tools. The point is this:
If your tools can audit themselves, they can improve themselves.
I wish this was a revolutionary insight, but alas, it’s a well-known principle in software engineering and AI research that has been around for decades.
In compiler design, self-hosting has long been considered a milestone of maturity. Ken Thompson’s 1984 Turing Award lecture, “Reflections on Trusting Trust,” explored the implications of a compiler that compiles itself. If your tool can process its own output, you’ve established a powerful consistency check. Of course, there is a world of difference between a compiler and a prompt-driven AI tool, but the underlying principle of self-application as a quality check is the same.
In software engineering more broadly, the practice is called “eating your own dogfood”. The phrase traces back to Microsoft in 1988, where manager Paul Maritz used it to describe the practice of internal teams using their own pre-release software. Harrison’s analysis in IEEE Software argued that dogfooding creates one of the strongest quality feedback loops available when the people closest to a system’s design become its most demanding users. If you won’t use your own product, why should anyone else?
My Take-Away
AI-assisted software engineering is a strange new world that I find simultaneously fascinating and terrifying. So much of it feels like trying to catch smoke but, for now, I think iterative self-evaluation and self-improvement is the way to go when building AI tools with AI.
One last thing, however, to avoid spiralling into insanity: know when to stop. The recursive improvement loop is powerful, but unlike compiler self-hosting, which has a natural termination point (the compiler either compiles itself or it doesn’t), prompt-driven tools don’t reliably converge to a fixed point. LLMs aren’t deterministic. Each iteration of the improvement loop produces different results, not converging results. The advanced-prompting skill that audits cogworks today will generate different observations than the one that audits it tomorrow, even if nothing else has changed.
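Since there is no natural fixed point, a practical improvement loop needs an explicit stopping rule. Here is a minimal sketch of one such rule, stopping when an iteration’s measured gain falls below a threshold. Everything here is hypothetical: `score` and `improve` stand in for an LLM-based audit and regeneration, the threshold is arbitrary, and the toy score table is illustrative only.

```python
# Hypothetical sketch: an improvement loop with an explicit stopping rule.
# score() and improve() stand in for an LLM-based audit and regeneration;
# neither is a real API, and the numbers below are illustrative only.

MIN_GAIN = 0.02   # stop once an iteration improves the audit score by <2%
MAX_ITERS = 5     # hard cap, since stochastic loops may never converge

def run_loop(tool, score, improve):
    best = score(tool)
    for i in range(MAX_ITERS):
        candidate = improve(tool)
        gain = score(candidate) - best
        if gain < MIN_GAIN:
            return tool, i  # diminishing returns: keep the current version
        tool, best = candidate, best + gain
    return tool, MAX_ITERS

# Toy stand-ins: successive versions yield smaller and smaller gains.
scores = {"v1": 0.60, "v2": 0.80, "v3": 0.90, "v4": 0.95}
next_version = {"v1": "v2", "v2": "v3", "v3": "v4", "v4": "v4"}

tool, iters = run_loop("v1", scores.get, next_version.get)
print(tool, iters)  # stops at v4 after the gain drops below the threshold
```

The hard cap matters as much as the gain threshold: because each audit of a stochastic system scores differently, a gain-only rule can loop indefinitely on noise.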
Hospedales et al.’s 2022 survey on meta-learning (the study of systems that improve their own learning processes) frames the ability to learn how to learn, not just what to learn, as a fundamental capability. The challenge is knowing when the meta-learning starts producing diminishing returns.
With each iteration of cogworks, things genuinely improved. The temptation to keep going stems partly from the impossible pursuit of perfection, but also from seeing how another pass would improve the tools further.
For now, however, I’ll stop chasing the dopamine hits and use cogworks for its intended purpose.
Happy coding!
Editor’s Note (2026-02-19): An important clarification (thank you, Avishek!) - self-audit does not automatically imply self-improvement. In deterministic systems (like compilers), self-hosting creates a clear fixed point. In stochastic LLM systems, recursion produces a distribution rather than convergence. Sometimes it just spirals out of hand, as the discussion below explores.

