My journey to being a 10x vibecoder
The problem
I've been interning at PodPitch over the last 2-ish months and it's been a crazy ride. I've learned a lot about startups (definitely worthy of a whole separate post), but more importantly my whole view on software development has changed, and that's what I want to focus on in this write-up. SWE here is vastly different from both what I learnt in school and what I learnt from my many past projects. Why? Simply because of how good AI has gotten. I'm currently on a road trip in Utah, visiting a bunch of national parks including Bryce, Zion, Glen Canyon, and the Grand Canyon, with no data or wifi, which encouraged me to share more about this topic LOL
The Unleashing
My first hint that I was not utilising AI properly was a casual interaction with my boss a week into my internship. I was using the 'Auto' mode in Cursor (like a caveman) while trying to build the PodPitch bot: an agentic bot that customers install directly into their Slack workspace to ask all kinds of questions about their campaigns and account in PodPitch. My boss noticed this and immediately said:
"Dude, we are a profitable startup and you should be using AI to the best of your abilites and if you don't, you're actually losing out on productivity"
Of course I was a bit puzzled at first. All I'd known was the old Auto mode I used for school and my older projects. He went on to elaborate. If we are optimistic about the intelligence of these models, they will almost certainly get smarter and smarter, maybe even exponentially, over time. Given that, if we maintain our current usage of and asks of AI, we miss out on the productivity and intelligence gains these models can unleash. So if we are not constantly being super ambitious with what we ask of AI, we are losing out. This kind of blew my mind (although now it feels so trivial). At that moment, he selected gpt 5.3 codex super high fast with max mode on. (btw, I spent over $1500 on Cursor in my first month...)
2 months later
My productivity hit an all-time low. This is certainly a huge twist from what you were expecting. To my surprise, using AI did in fact not magically make me a 10x engineer. Maybe a 0.8x engineer. Building with AI is NOT a simple process. It's like you're suddenly given command of a fighter jet: if you don't know how to fly one, you will inevitably crash. In this case, crashing looked like
- Buggy features
- Messy regressions
- Code lost to improper conflict handling
- Brute-force solutions for problems with very simple solutions (which I realised only after talking to our super cracked founding engineer)
So the questions I've had to ask myself are: why did this happen, and more importantly, what can I do to adapt and improve so that I can contribute to the team at a faster rate? I went back to my boss and asked him for feedback; he must have insights about why I was performing poorly that I simply didn't realise. I can't remember verbatim what he said, but it was along the lines of:
- With AI being so strong, one thing we must do is evolve our roles from software developers to product managers. We must hold more accountability for getting our features shipped and used by our customers.
- I may be looking at the code too much
- I am not being ambitious enough with AI
- Stop and ship at 90% (product completion)
The second point is crazy to me, because one way we guarantee something works is by looking at the code and understanding what's actually happening. But if you consider the previously mentioned idea of AI getting smarter and smarter, it's pretty safe to say that my need to look at the code could very well become the bottleneck of development. However, if something breaks as a result of the code my agent writes, it is most certainly my responsibility. So how do I balance this? I was working on a smart template feature, one of our core features, which also impacts other core features. How do I "not look at the code" and at the same time guarantee that it creates no regressions, handles all edge cases, and works great? This is where things start to get very tricky, and we may not have a solution yet.
Specifying the issues
Before we get into my proposed solutions (which may sound extreme), we should talk a bit more specifically about the problems I face when coding with AI.
1. Context is king
This is heavily talked about on many forums and platforms. Giving your agent the right information is extremely important because agents, as smart as they are, are also inherently stupid. While I'm not going to go deep into agent context, subagents, the 40% rule, MCPs, yada yada yada (MCPs are a game changer), I will talk about the context issues I've faced from a practical perspective.
- Codebases contain a lot of deprecated code, unused tables and columns, and suspiciously named variables. This has often led me on a wild goose chase trying to understand how certain things work in our codebase. After falling victim to this four or five times, I finally decided to do a push to delete as many unused endpoints as possible, along with unused columns and tables and a lot of dead CSS classes. I'm guessing a lot of huge codebases face this problem. This, again, was a very risky move, because determining that an endpoint is "dead" is not the same thing as identifying dead code. Oh well.
- You MUST understand what's going on. No matter how much you ask the AI to plan the implementation of a new feature or a fix, if you don't know the context, the agent will probably have a hard time figuring out the best and easiest solution. You might get lucky or you may not, but if an engineer asks you how your feature works, you will probably have a hard time answering. One example of this was a feature that would automatically scour the web for a new email address if the current email address didn't work; those emails had to simply go back to "in progress" mode instead of "failed" mode. Details aren't important, but I spent a whole five hours of prompting, reading, and mindless back and forth trying to work out why the email wouldn't go back into "in progress" mode. After feeling defeated, I got into a huddle with my founding engineer to ask why this was happening. He thought for about 20 seconds and simply said it was probably because I wasn't properly overriding a certain "thread_id" column in the table I was using. I told my agent this and it was solved in one prompt.
So what's the difference between my founding engineer and me? Is it some insider information about the best AI tools? Probably not. Is it secret access to the latest experimental coding models? Probably not either. He just has a much better idea of the current state of the system and what needs to be done to change it. He has much more context on how the core features work, and with that context he is able to surgically edit with his agents. So understanding the current state of how things work is just as important as, or even more important than, understanding the changes you're going to make. This again sounds trivial, but it's easy to miss. One very simple way of getting the proper context is to simply talk to the other engineers. They will be able to discern what you do and don't need to know much better than your agent can. This is something I need to do better, and I think creating diagrams on Excalidraw would probably help a lot.
2. Setting up the system to the agent's advantage with a closed agent feedback loop
Agents are becoming very intelligent, but they are blind. They have no senses and no receptors. We have two choices, and most of us take the first: be the agent's eyes and ears. The second option, which I am not doing nearly well enough, is to provide it with the tools, systems, and methods to determine and judge its own outputs. I feel that truly embracing and unlocking the second type will elevate our coding with AI. Firstly, the agent will code much faster because it does not have to wait for you to check things or give input; this lets it iterate on its own much, much faster and build a better understanding of what's going on. More importantly, it allows your agents to run even when you're not in the loop! This segues nicely into my last point.
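Before moving on, here's a minimal sketch of what such a closed loop could look like. Everything here is hypothetical: `run_agent` and `run_checks` stand in for whatever agent CLI and verification tooling (tests, linters, log queries) you actually have.

```python
# Sketch of a closed agent feedback loop: the agent iterates against checks it
# can run on its own instead of waiting for a human to eyeball every result.
# `run_agent` and `run_checks` are hypothetical stand-ins for real tooling.

def agent_with_feedback(task, run_agent, run_checks, max_iters=5):
    """Loop until the agent's own checks pass, or give up and escalate.

    Returns the attempt number that succeeded, or None if we hit max_iters.
    """
    prompt = task
    for attempt in range(1, max_iters + 1):
        run_agent(prompt)            # agent makes its changes
        failures = run_checks()      # verification the agent can run itself
        if not failures:
            return attempt           # it verified its own work
        # Feed the failures straight back in; no human in the loop yet.
        prompt = task + "\nThese checks failed, fix them:\n" + "\n".join(failures)
    return None                      # escalate to a human
```

The design point is simply that the failure signal goes back to the agent automatically, which is what lets it run while you're away.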
3. Agents only run on your time
This is something I only started thinking about very recently. Say you work 9 hours a day: you're genuinely losing out on 15 hours of agents thinking, building, and reviewing. Let me be clear here, I am not arguing that agents should run 24/7 and that this will automatically be beneficial; if not done properly, it may well be more harmful than helpful. But I feel agent review and agent testing should happen a lot more often than we currently allow for. If we properly define our context and provide the agent with eyes and ears (points 1 and 2), we can create systems and workflows that significantly increase productivity. When we work, we want to spend the majority of our time making decisions, designing, and doing the things that we as humans are great at. Our working hours are when we should review the code and changes made by our agents, ensuring they indeed followed our prompts and instructions, instead of waiting for them to complete our prompts.
My solution
So here's roughly how I think a potential agent workflow should run.
- Prompt and context research
  - Understand on a deep level how the current feature/system works (I often skipped this initially)
  - Plan the changes and implementation for the new feature/system on a deep level (spend more time than you think on this)
  - OPTIONAL: review these changes with a senior engineer for the more important core features (I believe they will provide some great insight)
  - Write down what the final feature/system should look like (a one-liner, probably)
  - Design tests (they have to cover edge cases), testing systems, and observability tools that guarantee the new feature/system works within the constraints we set
  - Work on a no-no list: what should not be done and what code shouldn't be touched (e.g. don't rely on a new table)
  - Work on a UX requirement list (latency, auth, preloading, etc.)
  - This should probably be human-made, and it's OK to spend more time on it. Gotta build a solid foundation first.
- Refinement of our plan with agents
  - Ask our agents if it's feasible, and if not, why
  - Ask our agents if there is a better solution ("better" can be vague here, so feel free to alter as needed)
  - Ask our agents to "think outside the box" for better solutions (take with a pinch of salt)
  - Ask our agents to think of possible regressions and test cases that we might have missed (equivalence partitions)
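One way to make the artefacts from step 1 concrete is to keep them in a single plan object that every later agent reads. This is just a guess at a reasonable shape (all field names are hypothetical), with example content drawn from the features mentioned earlier:

```python
# Hypothetical shape for the plan artefact that downstream agents consume.
# The point: the one-liner, no-no list, UX requirements, and required tests
# are written down once, so they can be enforced mechanically later.
from dataclasses import dataclass, field

@dataclass
class Plan:
    goal: str                                              # the one-liner
    design_notes: str                                      # current + planned system
    tests: list[str] = field(default_factory=list)         # must-pass test names
    no_no_list: list[str] = field(default_factory=list)    # forbidden actions
    ux_requirements: list[str] = field(default_factory=list)  # latency, auth...

plan = Plan(
    goal="Emails with a replacement address go back to 'in progress'",
    design_notes="Override the thread_id column when re-queuing the email.",
    tests=["test_requeue_sets_in_progress", "test_thread_id_override"],
    no_no_list=["Don't add a new table", "Don't touch the smart template code"],
    ux_requirements=["Status update visible in under 1 second"],
)
```

Having it as structured data (rather than loose prose) makes it easy to hand the same plan to the builder and the skeptical agent later.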
Up to this point, there must be a lot of hand-holding of the agent. Ideally, the first step should be done completely by yourself (without an agent); actually use your brain. I only say this because I absolutely have been just vibing. I believe this skill is what separates the regular vibe coder from the 10x engineer. The better the foundation, the better the odds of the system/feature succeeding, because this is indeed a stochastic outcome haha.
- Builder agent
  - Use opencode/codex/claude or whatever agent you want, and have it simply follow all the instructions in our plan and build it.
  - It should create a todo list of tasks derived from our plan (agents already do this, but it doesn't hurt to prompt for it again).
  - The builder agent should end if and only if the todo list is done and all the tests we designed in step 1 pass. (This is why test-case design is super duper important.)
  - The work should also be done on a new branch based off staging (in my case).
  - You should ensure it has read-only access to all your data. In my case, that's Supabase, Linear, Logfire, Sentry, and probably some web agent tools.
- The skeptical agent
  - This agent is appointed to enforce that:
    - All tests pass
    - The no-no list is adhered to
    - The UX requirement list is adhered to (e.g. loading should not take more than 1 second)
  - It should also find bugs
  - It should take the builder agent's output as input as well, for additional context
Agents are stupid; we have acknowledged this already. We should be very careful with how we prompt our skeptical agent. We SHOULD NOT say "find us all the bugs", because that implies there are indeed bugs, and agents will try their best to satisfy you. Instead, say something along the lines of "Comb through xyz of the codebase and report everything you find". Here's a very interesting approach to getting our agents to find 'true' bugs that I read in a post on X from @sysls, and I'm definitely going to try it. I'm just going to yank it and paste it below.
""" So I get a bug-finder agent to identify all the bugs in the database by telling it that I will give it +1 for bugs with low impact, +5 for bugs with some impact and +10 for bugs with critical impact, and I know this agent is going to be hyper enthusiastic and it's going to identify all the different types of bugs (even the ones that are not actually bugs) and come back and report a score of 104 or something to that order. I think of this as the superset of all possible bugs.
Then I get an adversarial agent and I tell that agent that for every bug that the agent is able to disprove as a bug, it gets the score of that bug, but if it gets it wrong, it will get -2*score of that bug. So now this adversarial agent is going to try to disprove as many bugs as possible; but it has some caution because it knows it can get penalized. Still, it will aggressively try to "disprove" the bugs (even the real ones). I think of this as the subset of all actual bugs.
Finally, I get a referee agent to take both their inputs and to score them. I lie and tell the referee agent that I have the actual correct ground truth, and if it gets it correct it will get +1 point and if it gets it wrong it will have -1 point. And so it goes to score both the bug-finder and the adversarial agent on each of the "bugs". Whatever the referee says is the truth, I inspect to make sure it's the truth. For the most part this is frighteningly high fidelity, and once in awhile they do still get some things wrong, but this is now a nearly flawless exercise. """
This is amazing. I'm not sure how good it'll be, but it's something I'll definitely try out :)
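The scheme in the quote is easy to sketch in code. In this sketch, `ask` is a hypothetical stand-in for a call to your LLM agent; the scoring prompts carry the actual mechanism.

```python
# Sketch of the quoted three-agent bug triage. `ask(prompt)` is a hypothetical
# helper that runs an agent and returns its findings as a list of strings.

def triage_bugs(ask):
    # 1. Bug finder: rewarded per bug, so it over-reports (superset of bugs).
    candidates = ask(
        "Identify bugs. You get +1 for low-impact bugs, +5 for some-impact "
        "bugs and +10 for critical-impact bugs."
    )
    # 2. Adversary: earns a bug's score for disproving it, loses 2x the score
    #    if wrong, so it prunes aggressively but cautiously (subset of bugs).
    disputes = ask(
        "For each bug below, try to disprove it. You earn the bug's score if "
        "you disprove it and lose 2x its score if you're wrong:\n"
        + "\n".join(candidates)
    )
    # 3. Referee: told (falsely) that ground truth exists, judges both sides.
    verdicts = ask(
        "I have the actual ground truth. Score the bug finder and the "
        "adversary on each of these (+1 if you judge correctly, -1 if not):\n"
        + "\n".join(disputes)
    )
    return verdicts  # still spot-check these by hand, as the post says
```

Note the structure: the incentives are deliberately asymmetric so the finder overshoots, the adversary undershoots, and the referee splits the difference.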
if all_good:
    return publisher_agent
else:
    return fix_it_agent
- The fix-it agent
  - This agent is in charge of fixing the bugs returned by our referee agent. Nothing too crazy here, I think.
We now cycle through steps 4 and 5 until we reach the publisher agent.
- The publisher agent: the finale
  - This agent cleans up the whole process by doing the following:
    - Publishes a PR with information from our plan, and provides a link
    - Sends me a full update of the whole story (what all the separate agents did, and the final product)
Ok, I realised there is a TON of information in this article, and I have a lot more to share with regards to skills, rules, prompting, and building useful agents. Whether or not this new workflow is effective, I don't know haha, but it's definitely worth trying and sharing about after. Some time and consideration will need to go into how best to set up the system to do this. I believe it's going to be some combination of CRON jobs on a VPS with stop hooks.
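I haven't set any of this up yet, so purely as a sketch of the "run while I sleep" idea: on a real VPS this would probably be a cron entry like `0 1 * * * run_workflow.sh`, but the same idea in plain Python is just a loop that sleeps until a fixed hour and fires the workflow. `run_workflow` here is a hypothetical stand-in for the whole builder/skeptical/fix-it cycle.

```python
# Hypothetical overnight runner: sleep until 1am, kick off the agent
# workflow, repeat. A real setup would likely be a cron job with stop hooks.
import datetime
import time

def seconds_until(hour: int) -> float:
    """Seconds from now until the next occurrence of `hour`:00."""
    now = datetime.datetime.now()
    target = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if target <= now:
        target += datetime.timedelta(days=1)  # already past it today
    return (target - now).total_seconds()

def run_workflow():
    """Stand-in for the builder -> skeptical -> fix-it -> publisher cycle."""
    print("agents running while I sleep")

if __name__ == "__main__":
    while True:
        time.sleep(seconds_until(1))  # wait for 1am
        run_workflow()
```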
Maybe I'll share more on my next road trip that leaves me stranded without data :)