From time to time I'm going to write posts in English. Yes, I'm trying to show off all the features my blog is going to have right away. I like to write about software development (and AI in particular) in English because I'm reading and watching so much content about it, produced mainly by English-speaking experts. Also, I want to practice more, and writing is one of the best ways to do it, even if I make mistakes or simplify things along the way. The struggle is real, so it should definitely be helpful for me.
There is one idea I keep returning to every time I think about how much AI models can accomplish in real-world, valuable tasks. It's from a talk by Andrej Karpathy: https://www.youtube.com/watch?v=LCEmiRjPEtQ. BTW, he is a really interesting person, mostly known for coining the term “vibe coding” (in this tweet: https://x.com/karpathy/status/1886192184808149383). He worked on machine learning at Tesla and OpenAI, then moved to his own educational startup, and now he also produces a lot of material about LLMs and AI in general. And he is really good at articulating things that are emerging at the AI frontier. So, the idea I want to discuss here is the “autonomy slider” that, apparently, you need to have in any AI product. And it's not about some arbitrary UI/UX pattern, as you might think (given my frontend development background).
It's funny to see how quickly people get used to the capabilities and intelligence of frontier AI, and then start yelling about how stupid it is in some situations. I do that too. I think it's mostly autonomy miscalibration: your expectations aren't aligned with what the model can and can't actually do. It's hard to get this right all the time, because the frontier is moving so fast, along with the tools (in coding the tooling is usually called a “harness”). You have to adapt to the landscape with each update, if you actually buy the idea that you'll solve problems more effectively with AI (which is not a given).
When I started tinkering with applying LLMs to coding tasks, the chat interface was the only thing available. The next step was a simple agent called Aider: basically a loop with read/write tools. And I was pretty satisfied with the results, because it was easy to feel the ceiling of its capabilities, and the tool got the balance of autonomy right. Lately it has become much harder, because we have really powerful but jagged intelligence (this term was also coined by Karpathy, in https://x.com/karpathy/status/1816531576228053133) and tools that are overloaded with features (because nobody knows yet how to design them in a way that would be genuinely useful).
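To make “basically a loop” concrete, here is a minimal sketch of what such an agent loop looks like. To be clear, this is not Aider's actual implementation: `llm_complete` stands in for whatever chat-completion call you use, and the reply format is an assumption made up for illustration.

```python
# A minimal sketch of the "loop with read/write tools" idea, not Aider's
# actual implementation. `llm_complete` stands in for any chat-completion
# call, and the reply format (a dict with "content", "tool", "args") is
# an assumption made for illustration.
from pathlib import Path

def read_file(path: str) -> str:
    return Path(path).read_text()

def write_file(path: str, content: str) -> str:
    Path(path).write_text(content)
    return f"wrote {len(content)} characters to {path}"

TOOLS = {"read_file": read_file, "write_file": write_file}

def agent_loop(task: str, llm_complete, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = llm_complete(messages)                  # model picks the next step
        messages.append({"role": "assistant", "content": reply["content"]})
        if reply.get("tool") is None:                   # no tool call means we're done
            return reply["content"]
        result = TOOLS[reply["tool"]](**reply["args"])  # execute the requested tool
        messages.append({"role": "user", "content": result})  # feed the result back
    return "step budget exhausted"
```

The autonomy question is basically where in this loop a human gets to look: before every tool call, only before writes, or never.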
I feel like I'm solving a difficult optimization problem every time I apply agentic coding to the task at hand. And the autonomy slider here is about the dimensions of that optimization, the ones I can use to control the process. For now it's mostly heuristics: we tweak the parameters (LLM choice, prompts, context-gathering techniques, available tools, skills, etc.) and hope that the result meets our expectations.
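Just to illustrate what I mean by dimensions, here is a toy config, entirely made up rather than any real tool's format, where each field is one of those knobs and `autonomy` is the slider itself:

```python
# A toy illustration of the optimization dimensions, not any real tool's
# config format. Every field is a knob you end up tuning per task.
from dataclasses import dataclass, field

@dataclass
class AgentConfig:
    model: str = "some-frontier-model"                       # LLM choice
    system_prompt: str = "You are a careful coding assistant."
    context_files: list[str] = field(default_factory=list)  # context gathering
    tools: list[str] = field(default_factory=lambda: ["read_file"])
    autonomy: int = 1  # 0 = suggest only, 1 = ask before each write,
                       # 2 = auto-apply edits, 3 = run unattended
```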
P.S. This became even more unhinged with the OpenClaw release. The “ghost” (yes, Karpathy's term again) lives on your computer and can do whatever it wants. You can't control it anymore, at least not in any meaningful sense, and you have to be brave enough or stupid enough (or both) to let it run without any steering.