you2idea@video:~$ watch -GyX21BL1Nw [20:24]
// transcript — 540 segments
0:02 Last month, I covered agent harnesses and why they're the next evolution for
0:07 AI agents, especially for agentic coding. The idea here is simple. If we
0:11 give too large of a request to our coding agent, even if we have a lot of
0:16 context engineering, the agent is going to completely fall on its face. And it's
0:21 all about context management. Agents don't do that well when you start to
0:24 fill their context window. It is the most precious resource when we are
0:29 engineering with them. And so that's what brings us to the idea of an agent
0:33 harness. It's really a wrapper of persistence and progress
0:38 tracking that we build over our coding agent. So that way we're able to string together
0:43 multiple different sessions with state management, a git workflow. It can get
0:47 pretty elaborate, but it allows us to extend how much we're able to send into
0:53 a system at once. And this really is the future of AI coding. If we're going to
0:57 push the boundaries of what is possible with our coding agents, it's going to be
1:01 with a harness as a wrapper. But there is a big problem here because if we're
1:06 building this harness, like this one from Anthropic that we'll talk about in
1:09 this video, we're trying to push the boundaries of our coding agent turning
1:14 it essentially into a full-on engineer. But engineers do a lot more than just
1:20 coding. They also communicate in a platform like Slack, giving us updates
1:24 on the progress. They manage the tasks in something like Linear or Jira.
1:27 They're maintaining the GitHub repository. We need all of these things
1:32 in the tool belt for our agent for it to be a true AI engineer. And this diagram,
1:36 what you're looking at right here is actually what I've built to show you
1:40 right now. I've been experimenting with some ideas here. How can we take an
1:44 agent harness and build a tool belt into it so that it can really be a full
1:48 engineer? So, I'll show you how this works right now, how you can extend this
1:52 for yourself. And stick around to the end of the video as well because I'll
1:55 talk about how this really is the future of Agentic Coding. Some big things that
1:59 I'm working on personally as well. And of course, this entire harness I have as
2:03 a GitHub repository for you, which I'll link to in the description. So, I
2:07 encourage you to try it out and even extend it yourself. I made it super easy
2:11 to tweak all the different sub agents that we'll talk about in a little bit to
2:14 connect to the different services. In the readme here, there's a really quick
2:19 setup guide. I'm also using Arcade. This is the platform to make it super easy
2:23 for us to connect to Linear, GitHub, and Slack through MCP. So, I'll talk about
2:27 that a bit more as well. Once you have this all set up, all you have to do to
2:32 send the context into the harness to begin is create an appspec. You can
2:36 think of this like a PRD. It's all of the features that you want it to build
2:42 autonomously in the harness loop. And so, you want to take this appspec and
2:46 use it as an example: give it to your coding agent, because there is a specific
2:50 format that works best for this harness. The biggest thing here is we have our
2:55 task list in a special JSON format. This is the official recommendation from
3:00 Anthropic because I've built my harness on top of Anthropic's harness for
3:04 long-running tasks that they open sourced at the end of last year. And of course
3:08 that does mean that I am using the Claude Agent SDK to run this harness,
3:13 but you can use your Anthropic subscription. So really cost effective,
3:17 and the Claude Agent SDK is powering all of the harness experimentation I'm doing
3:22 right now. So for this app specifically, just to give you a really cool example
3:25 of what this harness can build, I'm extending my second brain. It's yet
3:28 another thing I've covered on my channel recently. I want to build a dashboard
3:33 where I can paste in a bunch of research that my second brain has done and then
3:38 it'll in real time generate a layout that's unique to the specific research
3:42 that I gave it. So, I can glean insights really quickly. And boom, take a look at
3:47 that. We have a beautiful TL;DR for this pretty extensive research document.
3:51 It's like 2,000 words in total. We can view the full thing as well. And this is
3:56 not a simple application. There is an agent behind the scenes deciding the
4:00 components to generate in real time to customize the dashboard based on what we
4:06 pasted in. And so using the harness to build this, it decided to create 44
4:12 tasks in total in Linear. And so I ran all of this already. So everything is
4:16 done. So we can see all the tasks here. And then we also have the progress
4:20 tracker meta task. And so we need to hand off to the next agent session every
4:25 time we go through that loop in the harness, and we need to let the next
4:29 agent know what we just did so that it can pick up where we left off.
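That handoff can be pictured as a small structured note that the finishing session posts to the progress tracker meta task. Below is a minimal sketch with a hypothetical note format; the real harness defines its own format in its prompts.

```python
# Sketch of a session-handoff note like the one posted to the meta
# "progress tracker" issue in Linear. The field layout below is
# hypothetical; the real harness defines its own in its prompts.

def build_handoff_note(session: int, completed: list[str], next_up: str) -> str:
    """Render a handoff note so the next agent session can pick up cleanly."""
    lines = [
        f"## Session {session} handoff",
        "Completed this session:",
        *[f"- {task}" for task in completed],
        f"Suggested next task: {next_up}",
    ]
    return "\n".join(lines)

note = build_handoff_note(1, ["Scaffold project", "Set up dashboard shell"],
                          "Implement TL;DR card")
print(note)
```

The point is only that the note is cheap to write and cheap for the next session to read, which is what keeps each fresh context window small.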
4:33 It's also managing the GitHub repository. We got pull requests. It's
4:37 making a commit for every single feature that it built. That's really cool. You
4:41 can tweak this to your heart's content as well. And we're providing updates in
4:45 Slack. And so, for the sake of simplicity, I just have it message me
4:50 after the first and second sessions. And then when my application is fully
4:53 complete, so that I can come back to my computer to test everything myself, just
4:57 like you would do when you're reviewing the output from a real engineer. So we
5:01 have everything managed in Linear, everything in GitHub, and then letting us
5:05 know when things are done. This is just beautiful to me. And by the way, I just
5:09 want you to know that this is just the starting point for a harness. There's a lot
5:12 more work that I'm doing on top of this, and a lot of ways you could extend this as
5:16 well. Another really good example is you could build the harness to just watch
5:20 24/7 for any issues that you create in Linear and then it would pick those up
5:24 automatically. And so you can change the way that you interact with this harness.
5:28 The sky is really the limit for the way that you build it into these tools. You
5:31 could even have it work with GitHub issues, add in some other platform you
5:35 have like Asana or Jira. It's entirely up to you. All right, so with that, let's
5:39 now get into running this harness. We'll even do a live demo on a simpler
5:42 application and then of course I'll show you how this all works. I want you to
5:45 learn from this and see how you can extend it yourself. And so like I said
5:49 earlier, the readme is really easy to follow. You just set up your virtual
5:52 environment. Make sure you have Claude Code installed and that you've logged in
5:56 because this harness is going to use the same subscription that you have with
6:00 Claude Code. So, really easy there. The main thing that I want to cover right
6:05 now is setting up your env. So, Arcade is our ticket here to connect super
6:09 easily to Linear, Slack, and GitHub. That's why I wanted to include it
6:12 because then we don't have to set up all of the individual MCP servers. And so,
6:16 you could change this harness to use those directly if you want. But Arcade
6:20 has a free tier. They also implement what's called agent authorization. So
6:24 they walk us through the OAuth flows really easily with these different
6:27 services. So we could even share this harness with our team members with our
6:32 Arcade MCP gateway. And they don't have to create a new Linear API key and a new
6:36 Slack app, but we also don't have to share those credentials with them. So
6:39 it's a really really powerful platform. And so once you're signed in on the free
6:43 tier, you just create your MCP gateway. You give it a name, description, LLM
6:47 instructions. For the authentication, set it to Arcade headers. And then for
6:51 the allowed tools, look at this. Boom. We got GitHub. I'll search for Linear.
6:55 And then we got Linear. And then finally, Slack. It is that easy to add
7:00 in all 91 tools. And by the way, we are using the new tool discovery for MCP and
7:04 Claude Code. So, it's not like we're just dumping 91 tool definitions directly
7:09 into our coding agent. That would not be context efficient. And so, there we go.
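The context saving from tool discovery can be illustrated with a tiny registry: the agent's prompt carries only a search entry point, and full definitions are pulled on demand. This is a stdlib illustration of the idea, not Claude Code's actual mechanism, and the tool names below are invented.

```python
# Why deferred tool discovery saves context: the agent's prompt carries one
# search tool instead of all 91 definitions, and matching definitions are
# fetched on demand. Tool names and descriptions here are invented.

TOOL_REGISTRY = {
    "linear_create_issue": "Create an issue in a Linear project.",
    "github_create_pr": "Open a pull request on a GitHub repository.",
    "slack_post_message": "Post a message to a Slack channel.",
}

def search_tools(query: str) -> dict[str, str]:
    """Return only the tool definitions whose name or description matches."""
    q = query.lower()
    return {
        name: desc
        for name, desc in TOOL_REGISTRY.items()
        if q in name or q in desc.lower()
    }

print(search_tools("linear"))  # only the Linear tool, not all three
```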
7:12 You can create this. I'm just going to use the one that I already have. Copy
7:17 your URL because you set that as one of your environment variables and then you
7:20 get your API key from the dashboard as well. That easy to get everything set
7:24 up. Then just use your email here. We can also configure the specific GitHub
7:28 repo that the harness leverages. So generally what I do is I'll create an
7:33 empty repo and then add it in here. And then you can define a Slack channel for
7:37 updates too. And you can even change the model that each of our sub agents are
7:42 using for coding, Linear, and GitHub. And so we can make things really cost effective
7:45 or just really fast, right? Like we just want to really quickly create things in
7:49 Linear. So let's just use Haiku for the model. So do all that configuration and
7:54 then you'll run the authorize Arcade script. So you just have to do this one
7:58 time because then it'll go through the OAuth flow. So the harness now has access
8:03 to your Linear project, your Slack channel, and the GitHub repo that you're
8:07 working in. And then with all of that taken care of, we can run our harness.
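Pulling that configuration together, the env might look roughly like the sketch below. Every variable name here is a guess for illustration; check the repo's README for the real names.

```
# Hypothetical .env sketch: every variable name here is illustrative,
# not necessarily what the repo actually uses.
ARCADE_API_KEY=your-arcade-api-key        # from the Arcade dashboard
ARCADE_GATEWAY_URL=https://your-gateway   # the MCP gateway URL you copied
ARCADE_USER_EMAIL=you@example.com         # identity used for the OAuth flows
GITHUB_REPO=you/harness-target            # the empty repo the harness works in
SLACK_CHANNEL=#harness-updates            # where progress updates are posted
CODING_MODEL=sonnet                       # per-sub-agent model choices:
LINEAR_MODEL=haiku                        #   haiku, sonnet, or opus
GITHUB_MODEL=haiku
```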
8:11 Just a single command that we need to run to send our appspec into the
8:17 harness. And so make sure that you have your appspec fully fleshed out with the
8:20 help of your coding agent, because, looking at the first prompt here that's
8:25 sent to our initializer agent, it's going to read the appspec to understand what
8:28 we're building. So this is the single source of truth initially before we have
8:33 everything set up in Linear. And now I'm using WSL here because sub agents don't
8:37 actually work that well in Windows with the Claude Agent SDK. So use WSL, Mac, or
8:42 Linux to run this. And so I'm going to activate my virtual environment here if
8:47 I can type. There we go. All right. And then I'll run the command to kick off
8:51 the agent. And then I'm just going to specify the directory here. So it's
8:55 going to create this from scratch in the generations folder. So this is the
8:59 default location for all of the projects that it creates. And so I'll send this
9:02 off and it's going to kick off the initializer agent to scaffold everything
9:08 for our project: Linear, the GitHub repo, the initial configuration for our
9:11 codebase. I'll come back once it's done some of that. All right, take a look. So
9:16 it delegated to the Linear agent to get things set up for us. So it starts the
9:20 project initially and now it's building all of these tasks. And so if I go to my
9:25 projects here, we got our new Pomodoro timer project. So if I go to the
9:29 issues here, there are six right now. And it's going to create more and more.
9:32 Actually, it probably only needs six for this because it's a really simple
9:36 application. So it created five to build out the app. And then we have the
9:40 meta project progress tracker as well. So this is where we're going to update
9:44 things with our progress over time as we're handing off between the different
9:49 sessions for the harness. So all the setup is done in Linear. And now it's
9:52 moving on to initializing the Git repository, calling the GitHub sub agent
9:56 for this. And so remember, we're using sub agents for context isolation. So
10:00 we're not bloating the main context window for our primary orchestrator
10:05 here. And so yeah, there's going to be a lot that it does here. It'll go on for a
10:09 while. And so while we wait for this, I'm going to go back to our diagrams
10:12 here because I want to show you exactly how this works. I think the diagrams are
10:17 a lot better of a visual than just watching the logs as it's running. And
10:21 of course, I'll show you the project once it's done, but let's cover this in
10:24 the meantime. So, going to the original harness here, I want to talk about what
10:28 Anthropic built to set the stage for how I've improved it to create our full AI
10:33 engineer. And so, we start with the appspec as the primary context that
10:38 goes into our initializer agent. And most of the harnesses that I've seen
10:41 over the past few months, they always start with an initializer. Because
10:45 before we get into the main loop of implementing all of the features that we
10:49 have in Linear, or in this case our local feature list, we need to set the stage
10:53 for our project. We need something to create those features in the first
10:57 place. And I don't know if you saw that blip there for a sec, but it actually
11:00 popped up the browser because it was validating our code behind the scenes
11:04 with Playwright. So anyway, with our initializer agent here, it creates the
11:08 feature list. It's everything we have to knock out that we laid out in our
11:12 appspec. It creates a way to initialize the project and it scaffolds the project
11:18 and the git repository. And so these are the core artifacts that we have after
11:22 the initializer runs. We have the source of truth for everything that has to be
11:26 built. And our coding agent, when it knocks out all the features, it'll go
11:30 back here and update things. And so this is our place to keep track of what have
11:34 we built already, what do we still have to build. And then for the session
11:38 handoff, we have a simple text file, and I appreciate the simplicity of
11:42 this harness. But I think there really is a big use case to have the agent
11:48 work where we actually work, which is why I wanted to build this. But anyway,
11:52 I'll wrap up here with the coding agent loop. Every single time the agent runs,
11:56 we're running in a fresh context window. The whole point of this agent being able
12:00 to go for a longer time is that we're stringing together different agent
12:04 sessions and each one of them we want to start over so that we have fresh
12:08 context. So it starts by getting its bearings on the codebase and so reading
12:12 the feature list, like, okay, what should we build next? It'll do
12:15 regression testing. This is important for reliability of the harness because a
12:19 lot of times one agent is going to break what a different agent worked on
12:22 earlier. And then after it validates that, it'll pick the next feature,
12:26 implement it, then update and commit, which includes making the git commit and
12:31 updating these two files as well. And so what I've built is very similar. I mean,
12:35 you can even see that I purposely have the same architecture for the diagram
12:40 here, but there are some big differences because of the service integrations and
12:44 how I'm using sub agents to orchestrate everything. So we still start with the
12:49 appspec going into an initializer agent. But now like we saw in the logs earlier,
12:53 it's delegating to the Linear agent to set up the project in Linear and all of
12:58 the issues. And then, just so that our codebase knows what Linear
13:03 project we're tied to, we also have a single local file. So for the most part,
13:07 I'm avoiding local files. I don't have all of these files, but we need at least
13:11 one file to point us to the right project ID. And then we'll also create
13:16 the meta Linear issue. So this is replacing our local progress file. And then
13:21 we'll create that git repo with our GitHub sub agent. And so now Linear is
13:25 our source of truth instead of these local files. And so now when each agent
13:29 runs, it's going to start by reading the Linear project. So that way it knows
13:33 what is our project in Linear. It'll call the Linear agent to then find,
13:37 okay, what are the features that we should validate? What should we pick up
13:40 next? And we're using Arcade for authentication. So the agent has access
13:45 to all of these services. And so then it'll do that implementation, use the
13:49 GitHub agent to push, and then we can also use the Slack sub agent to give a
13:54 progress update. And we're just going to loop over and over and over again until
14:00 every single task in Linear is done. And I've set it up in a way where most of the
14:05 time it's going to just do one task at a time. But if the agent figures out it's
14:08 simple enough, it might actually just try to knock out multiple of them in a
14:12 single session. And this is all configurable in the prompts that we'll
14:16 get into as well. So the last thing I want to cover while we wait for our
14:19 harness to complete is the architecture and how you can tweak things for
14:24 yourself. Every single agent that we have in this harness, for coding and the
14:28 different services, is controlled by these prompts that we have in the
14:32 prompts folder. And so when we create our agent, we're using the Claude Agent
14:37 SDK. So we're defining everything in code. We're not using a .claude folder
14:41 like you would with Claude Code. We have our system prompt loaded in right here.
14:47 So we're loading in from this file. And so for our orchestrator, this is our system
14:51 prompt where we're describing: we're building from the appspec, here are the
14:55 sub agents that we have access to, here's what our workflow looks like. All that's
14:59 defined in the system prompt. And then when we are in our very first session,
15:04 that's when we use the initializer task. And so I'll show you here in the code.
15:07 I promise I'll stay pretty high level with the code here. We're checking: is this
15:11 our first run? Do we have things initialized in Linear or not? If it is
15:16 our first run, then this function is going to load in the prompt from this
15:20 file. So we're controlling things with markdown files, just like you would with
15:24 sub agents in Claude Code. And then otherwise we're going to load the continuation
15:30 task. And so this is what we run every single loop when we're going to build that
15:34 next feature. So we read that Linear project. We know what Linear project we're
15:37 working with. We delegate to the Linear agent to figure out what we
15:40 should work on next, everything that I've already explained in the diagram.
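That first-run branch, initializer prompt on the first session and continuation prompt on every loop after, can be sketched in a few lines. The file names are hypothetical stand-ins for whatever the repo's prompts folder actually uses.

```python
# Sketch of choosing which markdown prompt drives a session: the initializer
# on the first run, the continuation task on every loop after. The file names
# are hypothetical stand-ins for the repo's prompts folder.
from pathlib import Path

def load_session_prompt(prompts_dir: Path, first_run: bool) -> str:
    """Load this session's task prompt from its markdown file."""
    name = "initializer.md" if first_run else "continuation.md"
    return (prompts_dir / name).read_text(encoding="utf-8")
```

Keeping the prompts as plain markdown files is what makes the harness easy to tweak: editing the workflow never touches the code.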
15:42 I'm now just showing you how this maps to the prompts that we have for all the
15:47 sub agents here so that you can tweak all this for yourself. You can connect more
15:50 services, change how often it communicates in Slack, anything that you
15:55 want to do. And so the last thing that I want to show you here is that instead of
15:59 defining our MCP servers and our sub agents in the .claude folder, we're doing
16:04 it here in our Claude Agent SDK definition. And so we're connecting to
16:09 obviously our Arcade MCP gateway and then the Playwright MCP server. It's the
16:13 same kind of configuration that you would set up with something like Claude
16:17 Desktop, for example. And then we have all of our agent definitions right here.
16:21 So this is being imported from this file. It is super easy to add on more
16:26 sub agents if you want because for every agent we just give it the description.
16:30 This is how our orchestrator knows when to call upon the sub agent. We are
16:34 loading the prompt from the file. Like for our Linear agent, we're loading it
16:38 directly from the Linear agent prompt right here, just speaking to how we
16:42 manage issues and projects and things like that. We have the tools
16:47 that it's allowed to use with the Arcade MCP. And then finally, the model.
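The shape of each sub agent definition described here (description, prompt file, allowed tools, model from the env) can be modeled as plain data. This is a stdlib sketch of that shape, not the Claude Agent SDK's actual classes, and all names are illustrative.

```python
# Sketch of a sub agent definition: a description (how the orchestrator
# decides when to delegate), a markdown prompt file, allowed tools, and a
# model read from the environment. All names here are illustrative; the
# Claude Agent SDK defines its own types for this.
import os
from dataclasses import dataclass, field

@dataclass
class SubAgent:
    description: str                 # tells the orchestrator when to delegate
    prompt_file: str                 # markdown file holding the system prompt
    tools: list[str] = field(default_factory=list)
    model: str = "sonnet"            # haiku, sonnet, or opus

def linear_agent() -> SubAgent:
    return SubAgent(
        description="Manages Linear projects and issues for the harness.",
        prompt_file="prompts/linear_agent.md",
        tools=["linear_create_issue", "linear_update_issue"],
        model=os.environ.get("LINEAR_MODEL", "haiku"),
    )

agent = linear_agent()
print(agent.description)
```

Adding another sub agent is then just another small definition like this, which matches how the transcript says the repo keeps them easy to extend.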
16:52 So this is from our env. We can use Haiku, Sonnet, or Opus. And so we just build up
16:56 these agent definitions here. So you can change the prompts, change the
17:00 description, add in another one. Very easy to configure. And that's all just
17:03 brought into our agent automatically. And so it really is all of these
17:07 markdown documents that define the entire flow. Really just using the
17:12 Claude agent SDK as the wrapper around these different prompts, connecting
17:16 everything together into this pretty elaborate system that's able to handle a
17:20 lot. Like going back here, we're not done quite yet, but we finished three
17:24 out of the five issues for this simple Pomodoro timer app. I'll come back once
17:28 everything is done so we can see the full example now that you know how it
17:32 all works and how you can extend this yourself. And here we go. The big
17:36 reveal. The application that we've been creating throughout this video is
17:40 complete. And interestingly enough, because this application was so
17:43 incredibly simple, it decided to build everything in the initializer session,
17:48 which I actually prompted it to do if it determined it was simple enough, just
17:52 to show you how dynamic this system can be. And of course, I showed you the more
17:56 complex app earlier where it did have to do many different sessions for 44 tasks.
18:00 But yeah, our application looks really good. We can start it here. We can pause
18:05 it, skip to our break. The Pomodoro technique is really awesome for
18:08 productivity, by the way. But yeah, we got our update in Slack that the project
18:12 is complete. It links to our GitHub repository here where we have six
18:16 commits, one for the initialization and then one for each of our tasks. And of
18:20 course, they're all marked as done with our progress tracking filled out as
18:24 well. So we started the initialization and then this is the project complete
18:28 status at the end. Super super cool. We built this entire thing just during this
18:33 video as I was covering the code and our diagrams.
1:36 what you're looking at right here is actually what I've built to show you
1:40 right now. I've been experimenting with some ideas here. How can we take an
1:44 agent harness and build a tool belt into it so that it can really be a full
1:48 engineer? So, I'll show you how this works right now, how you can extend this
1:52 for yourself. and stick around to the end of the video as well because I'll
1:55 talk about how this really is the future of Agentic Coding. Some big things that
1:59 I'm working on personally as well. And of course, this entire harness I have as
2:03 a GitHub repository for you, which I'll link to in the description. So, I
2:07 encourage you to try it out and even extend it yourself. I made it super easy
2:11 to tweak all the different sub aents that we'll talk about in a little bit to
2:14 connect to the different services. In the read me here, there's a really quick
2:19 setup guide. I'm also using Arcade. This is the platform to make it super easy
2:23 for us to connect to Linear, GitHub, and Slack through MCP. So, I'll talk about
2:27 that a bit more as well. Once you have this all set up, all you have to do to
2:32 send the context into the harness to begin is create an appspec. You can
2:36 think of this like a PRD. It's all of the features that you want it to build
2:42 autonomously in the harness loop. And so, you want to take this appspec and
2:46 use it as an example. So give it to your coding agent because there is a specific
2:50 format that works best for this harness. The biggest thing here is we have our
2:55 task list in a special JSON format. This is the official recommendation from
3:00 Anthropic because I've built my harness on top of Anthropic's harness for
3:04 longunning tasks that they open sourced at the end of last year. And of course
3:08 that does mean that I am using the Claude agents SDK to run this harness,
3:13 but you can use your Anthropic subscription. So really cost effective
3:17 and the Cloud Agent SDK is powering all of the harness experimentation I'm doing
3:22 right now. So for this app specifically, just to give you a really cool example
3:25 what this harness can build, I'm extending my second brain. It's yet
3:28 another thing I've covered on my channel recently. I want to build a dashboard
3:33 where I can paste in a bunch of research that my second brain has done and then
3:38 it'll in real time generate a layout that's unique to the specific research
3:42 that I gave it. So, I can glean insights really quickly. And boom, take a look at
3:47 that. We have a beautiful TLDDR for this pretty extensive research document. I
3:51 It's like 2,000 words in total. We can view the full thing as well. And this is
3:56 not a simple application. There is an agent behind the scenes deciding the
4:00 components to generate in real time to customize the dashboard based on what we
4:06 pasted in. And so using the harness to build this, it decided to create 44
4:12 tasks in total in linear. And so I ran all of this already. So everything is
4:16 done. So we can see all the tasks here. And then we also have the progress
4:20 tracker meta task. And so we need to hand off to the next agent session every
4:25 time we go through that loop in the harness. And so we need to let the next
4:29 agent know what did we do right now so that it can pick up where we left off.
4:33 It's also managing the GitHub repository. We got pull requests. It's
4:37 making a commit for every single feature that it built. That's really cool. You
4:41 can tweak this to your heart's content as well. And we're providing updates in
4:45 Slack. And so, for the sake of simplicity, I just have it message me
4:50 after the first and second sessions. And then when my application is fully
4:53 complete, so then I can come back to my computer to test everything myself, just
4:57 like you would do when you're reviewing the output from a real engineer. So we
5:01 have everything managed in linear everything in GitHub and then letting us
5:05 know when things are done. This is just beautiful to me. And by the way, I just
5:09 want you to know that like this is just the starting point for a harness. A lot
5:12 more work that I'm doing on top of this. A lot of ways you could extend this as
5:16 well. Another really good example is you could build the harness to just watch
5:20 24/7 for any issues that you create in linear and then it would pick those up
5:24 automatically. And so you can change the way that you interact with this harness.
5:28 The sky is really the limit for the way that you build it into these tools. You
5:31 could even have it work with GitHub issues, add in some other platform you
5:35 have like Aana or Jira. It's entirely up to you. All right, so with that, let's
5:39 now get into running this harness. We'll even do a live demo on a simpler
5:42 application and then of course I'll show you how this all works. I want you to
5:45 learn from this and see how you can extend it yourself. And so like I said
5:49 earlier, the readme is really easy to follow. You just set up your virtual
5:52 environment. Make sure you have Cloud Code installed and that you've logged in
5:56 because this harness is going to use the same subscription that you have with
6:00 Cloud Code. So, really easy there. The main thing that I want to cover right
6:05 now is setting up your env. So, Arcade is our ticket here to connect super
6:09 easily to linear Slack and GitHub. That's why I wanted to include it
6:12 because then we don't have to set up all of the individual MCP servers. And so,
6:16 you could change this harness to use those directly if you want. But Arcade
6:20 has a free tier. They also implement what's called agent authorization. So
6:24 they walk us through the OOTH flows really easily with these different
6:27 services. So we could even share this harness with our team members with our
6:32 Arcade MCP gateway. And they don't have to create a new linear API key and a new
6:36 Slack app, but we also don't have to share those credentials with them. So
6:39 it's a really really powerful platform. And so once you're signed in on the free
6:43 tier, you just create your MCP gateway. You give it a name, description, LLM
6:47 instructions. For the authentication, set it to arcade headers. And then for
6:51 the allowed tools, look at this. Boom. We got GitHub. I'll search for linear.
6:55 And then we got linear. And then finally, Slack. It is that easy to add
7:00 in all 91 tools. And by the way, we are using the new tool discovery for MCP and
7:04 Cloud Code. So, it's not like we're just dumping 91 tool definitions directly
7:09 into our coding agent. That would not be contexts efficient. And so, there we go.
7:12 You can create this. I'm just going to use the one that I already have. copy
7:17 your URL because you set that as one of your environment variables and then you
7:20 get your API key from the dashboard as well. That easy to get everything set
7:24 up. Then just use your email here. We can also configure the specific GitHub
7:28 repo that the harness leverages. So generally what I do is I'll create a
7:33 empty repo and then add it in here. And then you can define a slack channel for
7:37 updates too. And you can even change the model that each of our sub aents are
7:42 using for coding linear GitHub. And so we can make things really cost effective
7:45 or just really fast, right? Like we just want to really quickly create things in
7:49 linear. So let's just use haiku for the model. So do all that configuration and
7:54 then you'll run the authorize arcade script. So you just have to do this one
7:58 time because then it'll go through the OOTH flow. So the harness now has access
8:03 to your linear project, your Slack channel, and the GitHub repo that you're
8:07 working in. And then with all of that taken care of, we can run our harness.
8:11 Just a single command that we need to run to send our appspec into the
8:17 harness. And so make sure that you have your appspec fully fleshed out with the
8:20 help of your coding agent because looking at the first prompt here that's
8:25 sent to our initializer agent is going to read the appspec to understand what
8:28 we're building. So this is the single source of truth initially before we have
8:33 everything set up in linear. And now I'm using WSL here because sub aents don't
8:37 actually work that well in Windows with the cloud agent SDK. So use WSL Mac or
8:42 Linux to run this. And so I'm going to activate my virtual environment here if
8:47 I can type. There we go. All right. And then I'll run the command to kick off
8:51 the agent. And then I'm just going to specify the directory here. So it's
8:55 going to create this from scratch in the generations folder. So this is the
8:59 default location for all of the projects that it creates. And so I'll send this
9:02 off and it's going to kick off the initializer agent to scaffold everything
9:08 for our project linear the GitHub repo the initial configuration for our
9:11 codebase. I'll come back once it's done some of that. All right, take a look. So
9:16 it delegated to the linear agent to get things set up for us. So it starts the
9:20 project initially and now it's building all of these tasks. And so if I go to my
9:25 projects here, we got our new Pomodoro timer task or project. So if I go to the
9:29 issues here, there's six right now. And it's going to create more and more.
9:32 Maybe actually probably only need six for this cuz it's a really simple
9:36 application. So it created the five to build out the app. And then we have the
9:40 meta project progress tracker as well. So this is where we're going to update
9:44 things with our progress over time as we're handing off between the different
9:49 sessions for the harness. So all the setup is done in Linear. And now it's
9:52 moving on to initializing the Git repository, calling the GitHub subagent
9:56 for this. And so remember, we're using subagents for context isolation. So
10:00 we're not bloating the main context window for our primary orchestrator
10:05 here. And so yeah, there's going to be a lot that it does here. It'll go on for a
10:09 while. And so while we wait for this, I'm going to go back to our diagrams
10:12 here because I want to show you exactly how this works. I think the diagrams are
10:17 a much better visual than just watching the logs as it's running. And
10:21 of course, I'll show you the project once it's done, but let's cover this in
10:24 the meantime. So, going to the original harness here, I want to talk about what
10:28 Anthropic built to set the stage for how I've improved it to create our full AI
10:33 engineer. And so, we start with the appspec as the primary context that
10:38 goes into our initializer agent. And most of the harnesses that I've seen
10:41 over the past few months, they always start with an initializer. Because
10:45 before we get into the main loop of implementing all of the features that we
10:49 have in Linear, or in this case our local feature list, we need to set the stage
10:53 for our project. We need something to create those features in the first
10:57 place. And I don't know if you saw that blip there for a sec, but it actually
11:00 popped up the browser because it was validating our code behind the scenes
11:04 with Playwright. So anyway, with our initializer agent here, it creates the
11:08 feature list. It's everything we have to knock out that we laid out in our
11:12 appspec. It creates a way to initialize the project and it scaffolds the project
11:18 and the git repository. And so these are the core artifacts that we have after
11:22 the initializer runs. We have the source of truth for everything that has to be
11:26 built. And our coding agent, when it knocks out a feature, will go
11:30 back here and update things. And so this is our place to keep track of what have
11:34 we built already, what do we still have to build. And then for the session
11:38 handoff, we have a simple text file, which I appreciate the simplicity of
11:42 this harness. But I think there really is a big use case to have the agent
11:48 work where we actually work, which is why I wanted to build this. But anyway,
11:52 I'll wrap up here with the coding agent loop. Every single time the agent runs,
11:56 we're running in a fresh context window. The whole point of this agent being able
12:00 to go for a longer time is that we're stringing together different agent
12:04 sessions and each one of them we want to start over so that we have fresh
12:08 context. So it starts by getting its bearings on the codebase, reading
12:12 the feature list, like, okay, what should we build next? It'll do
12:15 regression testing. This is important for reliability of the harness because a
12:19 lot of times one agent is going to break what a different agent worked on
12:22 earlier. And then after it validates that, it'll pick the next feature,
12:26 implement it, update, and commit, which includes making the git commit and then
12:31 updating these two files as well. And so what I've built is very similar. I mean,
12:35 you can even see that I purposely have the same architecture for the diagram
12:40 here but there are some big differences because of the service integrations and
12:44 how I'm using subagents to orchestrate everything. So we still start with the
12:49 appspec going into an initializer agent. But now like we saw in the logs earlier,
12:53 it's delegating to the Linear agent to set up the project in Linear and all of
12:58 the issues. And then just so that we know for our codebase, like, what Linear
13:03 project are we tied to? We also have a single local file. So for the most part,
13:07 I'm avoiding local files. I don't have all of these files, but we need at least
13:11 one file to point us to the right project ID. And then we'll also create
13:16 the meta Linear issue. So this is replacing our Claude progress file. And then
13:21 we'll create that git repo with our GitHub subagent. And so now Linear is
13:25 our source of truth instead of these local files. And so now when each agent
13:29 runs, it's going to start by reading the Linear project. So that way it knows
13:33 what our project in Linear is. It'll call the Linear agent to then find,
13:37 okay, what are the features that we should validate? What should we pick up
13:40 next? And we're using Arcade for authentication. So the agent has access
13:45 to all of these services. And so then it'll do that implementation, use the
13:49 GitHub agent to push, and then we can also use the Slack subagent to give a
13:54 progress update. And we're just going to loop over and over and over again until
14:00 every single task in Linear is done. And I've set it up in a way where most of the
14:05 time it's going to just do one task at a time. But if the agent figures out it's
14:08 simple enough, it might actually just try to knock out multiple of them in a
14:12 single session. And this is all configurable in the prompts that we'll
14:16 get into as well. So the last thing I want to cover while we wait for our
14:19 harness to complete is the architecture and how you can tweak things for
14:24 yourself. Every single agent that we have in this harness for coding the
14:28 different services, they are controlled by these prompts that we have in the
14:32 prompts folder. And so when we create our agent, we're using the Claude Agent
14:37 SDK. So we're defining everything in code. We're not using a .claude folder
14:41 like you would with Claude Code. We have our system prompt loaded in right here.
14:47 So we're loading in from this file. And so our orchestrator, this is our system
14:51 prompt where we're describing: we're building from the appspec, here are the
14:55 subagents that we have access to, here's what our workflow looks like. All that's
14:59 defined in the system prompt. And then when we are in our very first session,
15:04 that's when we use the initializer task. And so I'll show you here in the code.
15:07 I promise I'll stay pretty high level with the code here. We're checking: is this
15:11 our first run? Do we have things initialized in Linear or not? If it is
15:16 our first run, then this function is going to load in the prompt from this
15:20 file. So we're controlling with markdown files just like you would with subagents
15:24 in Claude Code. And then otherwise we're going to load the continuation task. And
15:30 so this is what we run every single loop when we're going to build that next
15:34 feature. So we read that Linear project. We know what Linear project we're
15:37 working with. We delegate to the Linear agent to figure out what we
15:40 should work on next. This is everything I've already explained in the diagram.
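The first-run check and the per-session loop just described can be sketched like this. This is a minimal illustration only: the file names, helper names, and issue shape are my assumptions, not the harness's actual code.

```python
from pathlib import Path

# Sketch of the harness's session logic (names and file layout assumed).
PROMPTS_DIR = Path("prompts")

def pick_session_prompt(project_pointer: Path) -> str:
    """First session: no local Linear project-pointer file exists yet, so
    load the initializer prompt. Every later session loads continuation."""
    if not project_pointer.exists():
        return str(PROMPTS_DIR / "initializer.md")
    return str(PROMPTS_DIR / "continuation.md")

def run_sessions(issues: list) -> list:
    """Loop fresh-context sessions until every Linear issue is done.
    Each pass stands in for: validate, implement, commit, update Linear."""
    completed = []
    while any(not issue["done"] for issue in issues):
        issue = next(i for i in issues if not i["done"])
        issue["done"] = True  # one feature knocked out per session
        completed.append(issue["title"])
    return completed
```

In the real harness the session body is of course a full agent run with subagent delegation; the loop structure is the point here.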
15:42 I'm now just showing you how this maps to the prompts that we have for all the
15:47 subagents here so that you can tweak all this for yourself. You can connect more
15:50 services, change how often it communicates in Slack, anything that you
15:55 want to do. And so the last thing that I want to show you here is that instead of
15:59 defining our MCP servers and our subagents in the .claude folder, we're doing
16:04 it here in our Claude Agent SDK definition. And so we're connecting to,
16:09 obviously, our Arcade MCP gateway and then the Playwright MCP server. Same
16:13 way that you'd set up the configuration with something like Claude
16:17 Desktop, for example. And then we have all of our agent definitions right here.
16:21 So this is being imported from this file. It is super easy to add on more
16:26 subagents if you want, because for every agent we just give it the description.
16:30 This is how our orchestrator knows when to call upon the subagent. We are
16:34 loading the prompt from the file. Like for our Linear agent, we're loading it
16:38 directly from the Linear agent prompt right here, speaking to how we
16:42 manage issues and projects and things like that. Then we have the tools
16:47 that it's allowed to use with the Arcade MCP. And then finally, the model.
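Putting those four pieces together, a subagent definition might be sketched roughly like this. The real harness uses the Claude Agent SDK's agent-definition type; the field names, tool names, and prompt path below are assumptions that mirror the idea, not the SDK's verbatim API.

```python
from dataclasses import dataclass, asdict
from pathlib import Path

@dataclass
class SubagentDef:
    description: str   # how the orchestrator decides when to delegate
    prompt_file: str   # markdown prompt, like a Claude Code subagent
    tools: list        # tools exposed through the Arcade MCP gateway
    model: str         # "haiku", "sonnet", or "opus", read from .env

def build_agent(name: str, spec: SubagentDef) -> dict:
    """Resolve the prompt file (empty string if it's missing) and flatten
    the definition into a mapping handed to the orchestrator."""
    path = Path(spec.prompt_file)
    prompt = path.read_text() if path.exists() else ""
    return {"name": name, "prompt": prompt, **asdict(spec)}

linear_agent = SubagentDef(
    description="Manages the Linear project and its issues for the harness.",
    prompt_file="prompts/linear_agent.md",               # assumed path
    tools=["Linear_CreateIssue", "Linear_UpdateIssue"],  # assumed names
    model="sonnet",
)
```

Adding another subagent is then just another definition like `linear_agent`, which is the "very easy to configure" property described next.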
16:52 So this is from our .env. We can use Haiku, Sonnet, or Opus. And so we just build up
16:56 these agent definitions here. So you can change the prompts, change the
17:00 description, add in another one. Very easy to configure. And that's all just
17:03 brought into our agent automatically. And so it really is all of these
17:07 markdown documents that define the entire flow. Really just using the
17:12 Claude Agent SDK as the wrapper around these different prompts, connecting
17:16 everything together into this pretty elaborate system that's able to handle a
17:20 lot. Like going back here, we're not done quite yet, but we finished three
17:24 out of the five issues for this simple Pomodoro timer app. I'll come back once
17:28 everything is done so we can see the full example now that you know how it
17:32 all works and how you can extend this yourself. And here we go. The big
17:36 reveal. The application that we've been creating throughout this video is
17:40 complete. And interestingly enough, because this application was so
17:43 incredibly simple, it decided to build everything in the initializer session,
17:48 which I actually prompted it to do if it determined the app was simple enough,
17:52 just to show you how dynamic this system can be. And of course, I showed you the more
17:56 complex app earlier where it did have to do many different sessions for 44 tasks.
18:00 But yeah, our application looks really good. We can start it here. We can pause
18:05 it, skip to our break. The Pomodoro technique is really awesome for
18:08 productivity, by the way. But yeah, we got our update in Slack that the project
18:12 is complete. It links to our GitHub repository here where we have six
18:16 commits, one for the initialization and then one for each of our tasks. And of
18:20 course, they're all marked as done with our progress tracking filled out as
18:24 well. So we started the initialization and then this is the project complete
18:28 status at the end. Super super cool. We built this entire thing just during this
18:33 video as I was covering the code and our diagrams. So I want to end this video by
18:38 talking about the future of AI coding with these harnesses and some things
18:41 that I'm working on myself because here's the thing. I hope that in
18:45 following this video you're inspired to try this harness yourself, even build on
18:49 top of it. And I hope I made that clear enough for you. But in the end, what's
18:54 most powerful is building AI coding workflows and harnesses that are
18:58 specific to your use case, exactly how you want to manage tasks, how you want
19:02 to share context between different sessions. I really believe that if you
19:07 build your own optimized workflow, it's going to be way better than anything
19:11 that's off the shelf. But there's nothing that really helps you build that
19:14 right now. And it's such a powerful concept. And so that's what I'm going to
19:18 be working on. So, my open source project, Archon, I worked on this a lot
19:23 last year. This is my command center for AI coding. It gained a lot of traction.
19:26 I know it's not as many stars as something like OpenClaw, but I was
19:29 really happy with the traction that it gained, but it's really not as relevant
19:34 of a tool right now because it was all about task management and RAG for AI
19:38 coding. But task management is getting built into all these tools like Claude
19:41 Code, and coding agents these days are so good at looking up documentation that
19:46 RAG just isn't as important for coding specifically. And so I want to keep the
19:51 vision of Archon being the command center for AI coding, but I want to turn
19:56 it into the n8n of AI coding: being able to define and orchestrate your own AI
20:01 coding workflows and harnesses so you can build something like this really
20:05 easily but actually make it custom to you. So, that's what I'm working on
20:08 behind the scenes right now. I know there hasn't been a lot of updates with
20:11 Archon because I've been shifting the vision, but I'm super excited for that.
20:15 And so, if you appreciate this video and you're looking forward to more things on
20:19 AI coding and these harnesses, I would really appreciate a like and a
20:23 subscribe. And with that, I will see you
$

Turn Claude Code into Your Full Engineering Team with Subagents

@ColeMedin 20:24 8 chapters
[AI agents and automation][developer tools and coding][hardware setup and infrastructure][marketing and growth hacking][productivity and workflows]
// chapters
// description

Agent harnesses are the future of agentic coding and how we can build our coding agent into a full blown AI engineer. Just like a human engineer, we need it to manage issues, create branches, open PRs, notify you when things are done, pick up the next task automatically, etc. That's what I've been experimenting with using subagents, the Anthropic harness for long-running agents, and Arcade's MCP gateway. I've taken my Claude Code to the next level - it tracks progress in Linear, works with repo
