// transcript — 540 segments
0:00 Introduction to Agent Harnesses
0:02 Last month, I covered agent harnesses and why they're the next evolution for
0:07 AI agents, especially for agentic coding. The idea here is simple. If we
0:11 give too large of a request to our coding agent, even if we have a lot of
0:16 context engineering, the agent is going to completely fall on its face. And it's
0:21 all about context management. Agents don't do that well when you start to
0:24 fill their context window. It is the most precious resource when we are
0:29 engineering with them. And so that's what brings us to the idea of an agent
0:33 harness. It's really a wrapper of persistence and progress tracking that we
0:38 build over our coding agent, so that we're able to string together
0:43 multiple different sessions with state management and a git workflow. It can get
0:47 pretty elaborate, but it allows us to extend how much we're able to send into
0:53 a system at once. And this really is the future of AI coding. If we're going to
0:57 push the boundaries of what is possible with our coding agents, it's going to be
1:01 with a harness as a wrapper. But there is a big problem here, because when we're
1:06 building a harness like this (this is Anthropic's, which we'll talk about in
1:09 this video), we're trying to push the boundaries of our coding agent, turning
1:14 it essentially into a full-on engineer. But engineers do a lot more than just
1:20 coding. They also communicate in a platform like Slack, giving us updates
1:24 on the progress. They manage the tasks in something like Linear or Jira.
1:27 They're maintaining the GitHub repository. We need all of these things
1:32 in the tool belt for our agent for it to be a true AI engineer. And this diagram,
1:35 Building an AI Engineer (How it Works)
1:36 what you're looking at right here is actually what I've built to show you
1:40 right now. I've been experimenting with some ideas here. How can we take an
1:44 agent harness and build a tool belt into it so that it can really be a full
1:48 engineer? So, I'll show you how this works right now, how you can extend this
1:52 for yourself. And stick around to the end of the video as well, because I'll
1:55 talk about how this really is the future of agentic coding, plus some big things
1:59 that I'm working on personally as well. And of course, this entire harness I have as
2:03 a GitHub repository for you, which I'll link to in the description. So, I
2:07 encourage you to try it out and even extend it yourself. I made it super easy
2:11 to tweak all the different subagents that we'll talk about in a little bit to
2:14 connect to the different services. In the README here, there's a really quick
2:19 setup guide. I'm also using Arcade. This is the platform to make it super easy
2:23 for us to connect to Linear, GitHub, and Slack through MCP. So, I'll talk about
2:27 that a bit more as well. Once you have this all set up, all you have to do to
2:32 send the context into the harness to begin is to create an app spec. You can
2:36 think of this like a PRD. It's all of the features that you want it to build
2:42 autonomously in the harness loop. And so, you want to take this app spec and
2:46 use it as an example, giving it to your coding agent, because there is a specific
2:50 format that works best for this harness. The biggest thing here is we have our
2:55 task list in a special JSON format. This is the official recommendation from
3:00 Anthropic because I've built my harness on top of Anthropic's harness for
3:04 long-running tasks that they open sourced at the end of last year. And of course
3:08 that does mean that I am using the Claude Agent SDK to run this harness,
3:13 but you can use your Anthropic subscription. So it's really cost effective,
3:17 and the Claude Agent SDK is powering all of the harness experimentation I'm doing
3:22 right now. So for this app specifically, just to give you a really cool example
3:25 of what this harness can build, I'm extending my second brain. It's yet
3:28 another thing I've covered on my channel recently. I want to build a dashboard
3:33 where I can paste in a bunch of research that my second brain has done and then
3:38 it'll in real time generate a layout that's unique to the specific research
3:42 that I gave it. So, I can glean insights really quickly. And boom, take a look at
3:47 that. We have a beautiful TL;DR for this pretty extensive research document.
3:51 It's like 2,000 words in total. We can view the full thing as well. And this is
3:56 not a simple application. There is an agent behind the scenes deciding the
4:00 components to generate in real time to customize the dashboard based on what we
4:06 pasted in. And so using the harness to build this, it decided to create 44
4:12 tasks in total in Linear. And so I ran all of this already. So everything is
4:16 done. So we can see all the tasks here. And then we also have the progress
4:20 tracker meta task. This is how we hand off to the next agent session every
4:25 time we go through that loop in the harness: we need to let the next
4:29 agent know what we just did so that it can pick up where we left off.
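To make that handoff concrete, here's a minimal sketch of the kind of comment a session might post to the progress-tracker issue. The format is my own illustration, not the repo's actual code; the real harness writes whatever its prompts tell the Linear subagent to write.

```python
def format_handoff(completed: list[str], next_task: str, notes: str) -> str:
    """Build a progress-tracker comment so the next session can resume.

    Illustrative only: field names and layout are assumptions, not the
    harness's real handoff format.
    """
    done = "\n".join(f"- [x] {t}" for t in completed) or "- (nothing yet)"
    return (
        "## Session handoff\n"
        f"### Completed this session\n{done}\n"
        f"### Next up\n- [ ] {next_task}\n"
        f"### Notes for the next agent\n{notes}\n"
    )
```

The point of a structured note like this is that the next session's fresh context window can reconstruct state from one short read instead of replaying the whole history.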
4:33 It's also managing the GitHub repository. We got pull requests. It's
4:37 making a commit for every single feature that it built. That's really cool. You
4:41 can tweak this to your heart's content as well. And we're providing updates in
4:45 Slack. And so, for the sake of simplicity, I just have it message me
4:50 after the first and second sessions. And then when my application is fully
4:53 complete, so I can come back to my computer to test everything myself, just
4:57 like you would do when you're reviewing the output from a real engineer. So we
5:01 have everything managed in Linear, everything in GitHub, and then it's letting us
5:05 know when things are done. This is just beautiful to me. And by the way, I just
5:09 want you to know that this is just the starting point for a harness. There's a lot
5:12 more work that I'm doing on top of this, and a lot of ways you could extend it as
5:16 well. Another really good example is you could build the harness to just watch
5:20 24/7 for any issues that you create in Linear, and then it would pick those up
5:24 automatically. And so you can change the way that you interact with this harness.
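That watch-mode idea can be sketched in a few lines. This is not code from the repo: `fetch` and `dispatch` are stand-ins for the real Linear MCP calls and the harness kickoff, injected so the loop stays testable.

```python
import time


def new_issue_ids(seen: set[str], fetched: list[str]) -> list[str]:
    """Return issue IDs we haven't dispatched to the harness yet."""
    return [i for i in fetched if i not in seen]


def watch(fetch, dispatch, poll_seconds=60, max_polls=None):
    """Poll for Linear-style issues and hand new ones to the harness.

    In practice, `fetch` would list open issues via the Linear MCP tools
    and `dispatch` would kick off a harness run for that issue.
    """
    seen: set[str] = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for issue_id in new_issue_ids(seen, fetch()):
            dispatch(issue_id)
            seen.add(issue_id)
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(poll_seconds)
```

A real deployment would also want persistence for `seen` across restarts, but the core loop is just this: poll, diff against what's been handled, dispatch the rest.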
5:28 The sky is really the limit for the way that you build it into these tools. You
5:31 could even have it work with GitHub issues, add in some other platform you
5:35 Setting Up Our AI Engineer Harness
5:35 have, like Asana or Jira. It's entirely up to you. All right, so with that, let's
5:39 now get into running this harness. We'll even do a live demo on a simpler
5:42 application and then of course I'll show you how this all works. I want you to
5:45 learn from this and see how you can extend it yourself. And so like I said
5:49 earlier, the README is really easy to follow. You just set up your virtual
5:52 environment. Make sure you have Claude Code installed and that you've logged in,
5:56 because this harness is going to use the same subscription that you have with
6:00 Claude Code. So, really easy there. The main thing that I want to cover right
6:05 now is setting up your .env. So, Arcade is our ticket here to connect super
6:09 easily to Linear, Slack, and GitHub. That's why I wanted to include it,
6:12 because then we don't have to set up all of the individual MCP servers. And so,
6:16 you could change this harness to use those directly if you want. But Arcade
6:20 has a free tier. They also implement what's called agent authorization. So
6:24 they walk us through the OAuth flows really easily with these different
6:27 services. So we could even share this harness with our team members with our
6:32 Arcade MCP gateway. And they don't have to create a new Linear API key and a new
6:36 Slack app, but we also don't have to share those credentials with them. So
6:39 it's a really really powerful platform. And so once you're signed in on the free
6:43 tier, you just create your MCP gateway. You give it a name, description, LLM
6:47 instructions. For the authentication, set it to Arcade headers. And then for
6:51 the allowed tools, look at this. Boom. We got GitHub. I'll search for Linear,
6:55 and then we got Linear. And then finally, Slack. It is that easy to add
7:00 in all 91 tools. And by the way, we are using the new tool discovery for MCP and
7:04 Claude Code. So, it's not like we're just dumping 91 tool definitions directly
7:09 into our coding agent. That would not be context efficient. And so, there we go.
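For a rough idea of what wiring in the gateway involves, here's a sketch of an HTTP MCP server entry built from environment variables. The variable names and the exact config keys here are my assumptions for illustration, not the repo's actual ones.

```python
import os


def arcade_gateway_config(env=os.environ) -> dict:
    """Build an HTTP MCP server entry for the Arcade gateway.

    ARCADE_GATEWAY_URL, ARCADE_API_KEY, and the header shape are
    illustrative guesses; check the repo's README for the real names.
    """
    return {
        "type": "http",
        "url": env["ARCADE_GATEWAY_URL"],
        "headers": {"Authorization": f"Bearer {env['ARCADE_API_KEY']}"},
    }
```

Because the gateway is one URL plus a key, swapping it for direct per-service MCP servers (or sharing it with teammates) only touches this one entry.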
7:12 You can create this. I'm just going to use the one that I already have. Copy
7:17 your URL, because you set that as one of your environment variables, and then you
7:20 get your API key from the dashboard as well. It's that easy to get everything set
7:24 up. Then just use your email here. We can also configure the specific GitHub
7:28 repo that the harness leverages. So generally what I do is I'll create an
7:33 empty repo and then add it in here. And then you can define a Slack channel for
7:37 updates too. And you can even change the model that each of our subagents is
7:42 using for coding, Linear, and GitHub. And so we can make things really cost effective
7:45 or just really fast, right? Like, we just want to really quickly create things in
7:49 Linear, so let's just use Haiku for the model. So do all that configuration, and
7:54 then you'll run the authorize Arcade script. You just have to do this one
7:58 time, because then it'll go through the OAuth flow, so the harness now has access
8:03 to your Linear project, your Slack channel, and the GitHub repo that you're
8:06 Running the Harness Live
8:07 working in. And then with all of that taken care of, we can run our harness.
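As a rough sketch of what an entry point like this tends to look like (the argument names and the default `generations/` folder here are hypothetical, not necessarily what the repo uses):

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Hypothetical CLI shape: point the harness at an app spec and an
    output directory. Names are illustrative, not the repo's."""
    p = argparse.ArgumentParser(description="Run the agent harness")
    p.add_argument("app_spec", type=Path,
                   help="Path to the app spec (PRD-style doc)")
    p.add_argument("--project-dir", type=Path,
                   default=Path("generations") / "app",
                   help="Where the harness scaffolds the project")
    return p.parse_args(argv)
```

The key design point is that the app spec is the only required input; everything else (services, models, channels) comes from the .env set up earlier.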
8:11 Just a single command that we need to run to send our app spec into the
8:17 harness. And so make sure that you have your app spec fully fleshed out with the
8:20 help of your coding agent, because, looking at the first prompt here that's
8:25 sent to our initializer agent, it's going to read the app spec to understand what
8:28 we're building. So this is the single source of truth initially, before we have
8:33 everything set up in Linear. And now, I'm using WSL here because subagents don't
8:37 actually work that well on Windows with the Claude Agent SDK. So use WSL, Mac, or
8:42 Linux to run this. And so I'm going to activate my virtual environment here, if
8:47 I can type. There we go. All right. And then I'll run the command to kick off
8:51 the agent. And then I'm just going to specify the directory here. So it's
8:55 going to create this from scratch in the generations folder. So this is the
8:59 default location for all of the projects that it creates. And so I'll send this
9:02 off and it's going to kick off the initializer agent to scaffold everything
9:08 for our project: Linear, the GitHub repo, the initial configuration for our
9:11 codebase. I'll come back once it's done some of that. All right, take a look. So
9:16 it delegated to the Linear agent to get things set up for us. So it starts the
9:20 project initially and now it's building all of these tasks. And so if I go to my
9:25 projects here, we got our new Pomodoro timer project. So if I go to the
9:29 issues here, there are six right now, and it's going to create more and more.
9:32 Actually, it'll probably only need six for this, because it's a really simple
9:36 application. So it created the five to build out the app. And then we have the
9:40 meta project progress tracker as well. So this is where we're going to update
9:44 things with our progress over time as we're handing off between the different
9:49 sessions for the harness. So all the setup is done in Linear. And now it's
9:52 moving on to initializing the Git repository, calling the GitHub sub agent
9:56 for this. And so remember, we're using subagents for context isolation, so
10:00 we're not bloating the main context window for our primary orchestrator
10:05 here. And so yeah, there's going to be a lot that it does here. It'll go on for a
10:09 while. And so while we wait for this, I'm going to go back to our diagrams
10:11 How the Agent Harness Works (Diagram)
10:12 here because I want to show you exactly how this works. I think the diagrams are
10:17 a much better visual than just watching the logs as it's running. And
10:21 of course, I'll show you the project once it's done, but let's cover this in
10:24 the meantime. So, going to the original harness here, I want to talk about what
10:28 Anthropic built to set the stage for how I've improved it to create our full AI
10:33 engineer. And so, we start with the app spec as the primary context that
10:38 goes into our initializer agent. And most of the harnesses that I've seen
10:41 over the past few months, they always start with an initializer. Because
10:45 before we get into the main loop of implementing all of the features that we
10:49 have in Linear, or in this case our local feature list, we need to set the stage
10:53 for our project. We need something to create those features in the first
10:57 place. And I don't know if you saw that blip there for a sec, but it actually
11:00 popped up the browser because it was validating our code behind the scenes
11:04 with Playwright. So anyway, with our initializer agent here, it creates the
11:08 feature list. It's everything we have to knock out that we laid out in our
11:12 appspec. It creates a way to initialize the project and it scaffolds the project
11:18 and the git repository. And so these are the core artifacts that we have after
11:22 the initializer runs. We have the source of truth for everything that has to be
11:26 built. And as our coding agent knocks out all the features, it'll go
11:30 back here and update things. And so this is our place to keep track of what have
11:34 we built already, what do we still have to build. And then for the session
11:38 handoff, we have a simple text file, which I appreciate the simplicity of
11:42 this harness. But I think there really is a big use case to have the agent
11:48 work where we actually work, which is why I wanted to build this. But anyway,
11:52 I'll wrap up here with the coding agent loop. Every single time the agent runs,
11:56 we're running in a fresh context window. The whole point of this agent being able
12:00 to go for a longer time is that we're stringing together different agent
12:04 sessions and each one of them we want to start over so that we have fresh
12:08 context. So it starts by getting its bearings on the codebase, reading
12:12 the feature list: okay, what should we build next? It'll do
12:15 regression testing. This is important for reliability of the harness because a
12:19 lot of times one agent is going to break what a different agent worked on
12:22 earlier. And then after it validates that then it'll pick the next feature
12:26 implement it, update, and commit, which includes making the git commit and then
12:31 updating these two files as well. And so what I've built is very similar. I mean
12:35 you can even see that I purposely have the same architecture for the diagram
12:40 here but there are some big differences because of the service integrations and
12:44 how I'm using subagents to orchestrate everything. So we still start with the
12:49 app spec going into an initializer agent. But now, like we saw in the logs earlier,
12:53 it's delegating to the Linear agent to set up the project in Linear and all of
12:58 the issues. And then, just so that our codebase knows which Linear
13:03 project it's tied to, we also have a single local file. So for the most part,
13:07 I'm avoiding local files. I don't have all of these files, but we need at least
13:11 one file to point us to the right project ID. And then we'll also create
13:16 the meta Linear issue. So this is replacing the local progress file. And then
13:21 we'll create that git repo with our GitHub subagent. And so now Linear is
13:25 our source of truth instead of these local files. And so now, when each agent
13:29 runs, it's going to start by reading the Linear project, so that it knows
13:33 what our project in Linear is. It'll call the Linear agent to then find:
13:37 okay, what are the features that we should validate? What should we pick up
13:40 next? And we're using Arcade for authentication. So the agent has access
13:45 to all of these services. And so then it'll do that implementation, use the
13:49 GitHub agent to push, and then we can also use the Slack subagent to give a
13:54 progress update. And we're just going to loop over and over and over again until
14:00 every single task in Linear is done. And I've set it up so that most of the
14:05 time it's going to do just one task at a time. But if the agent figures out it's
14:08 simple enough, it might actually just try to knock out multiple of them in a
14:12 single session. And this is all configurable in the prompts that we'll
14:15 Customizing Prompts and Subagents
14:16 get into as well. So the last thing I want to cover while we wait for our
14:19 harness to complete is the architecture and how you can tweak things for
14:24 yourself. Every single agent that we have in this harness for coding the
14:28 different services, they are controlled by these prompts that we have in the
14:32 prompts folder. And so when we create our agent, we're using the Claude Agent
14:37 SDK. So we're defining everything in code. We're not using our .claude folder
14:41 like you would with Claude Code. We have our system prompt loaded in right here.
14:47 So we're loading it in from this file. And so for our orchestrator, this is our system
14:51 prompt, where we're describing: we're building from the app spec, here are the
14:55 subagents that we have access to, here's what our workflow looks like. All that's
14:59 defined in the system prompt. And then when we are in our very first session,
15:04 that's when we use the initializer task. And so I'll show you here in the code.
15:07 I promise I'll stay pretty high level with the code here. We're checking: is this
15:11 our first run? Do we have things initialized in Linear or not? If it is
15:16 our first run, then this function is going to load in the prompt from this
15:20 file. So we're controlling with markdown files, just like you would with subagents
15:24 in Claude Code. And then otherwise, we're going to load the continuation task. And
15:30 so this is what we run every single loop when we're going to build that next
15:34 feature. So we read that Linear project, we know what Linear project we're
15:37 working with, and we delegate to the Linear agent to figure out what we
15:40 should work on next: everything that I've already explained in the diagram.
15:42 I'm now just showing you how this maps to the prompts that we have for all the
15:47 subagents here, so that you can tweak all this for yourself. You can connect more
15:50 services, change how often it communicates in Slack, anything that you
15:55 want to do. And so the last thing that I want to show you here is that instead of
15:59 defining our MCP servers and our subagents in the .claude folder, we're doing
16:04 it here in our Claude Agent SDK definition. And so we're connecting to,
16:09 obviously, our Arcade MCP gateway and then the Playwright MCP server, the same
16:13 kind of way that you'd configure something like Claude
16:17 Desktop, for example. And then we have all of our agent definitions right here.
16:21 So this is being imported from this file. It is super easy to add on more
16:26 subagents if you want, because for every agent we just give it the description.
16:30 This is how our orchestrator knows when to call upon the subagent. We are
16:34 loading the prompt from the file. Like, for our Linear agent, we're loading it
16:38 directly from the Linear agent prompt right here, just speaking to how we
16:42 manage issues and projects and things like that. We have the tools that
16:47 it's allowed to use with the Arcade MCP. And then finally, the model.
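Putting those pieces together, a subagent definition like the one described might be assembled roughly like this. The prompt file naming scheme, the env variable names, and the example tool name below are my assumptions for illustration, not the repo's actual code:

```python
import os
from pathlib import Path


def build_agent_definition(name: str, description: str, tools: list[str],
                           prompts_dir: Path, env=os.environ) -> dict:
    """Assemble one subagent definition: a description (how the
    orchestrator decides to delegate), a prompt loaded from a markdown
    file, an allowed-tools list, and a model read from the environment.
    Names are illustrative.
    """
    prompt_file = prompts_dir / f"{name}_agent.md"
    return {
        "description": description,
        "prompt": prompt_file.read_text() if prompt_file.exists() else "",
        "tools": tools,
        "model": env.get(f"{name.upper()}_AGENT_MODEL", "haiku"),
    }
```

Because everything is data (a markdown file, a tool list, an env var), adding another subagent is just one more call like `build_agent_definition("linear", "Manages Linear issues and projects", ["Linear_CreateIssue"], Path("prompts"))`, with the tool name here being a hypothetical example.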
16:52 So this is from our .env. We can use Haiku, Sonnet, or Opus. And so we just build up
16:56 these agent definitions here. So you can change the prompts, change the
17:00 description, add in another one. Very easy to configure. And that's all just
17:03 brought into our agent automatically. And so it really is all of these
17:07 markdown documents that define the entire flow. Really just using the
17:12 Claude agent SDK as the wrapper around these different prompts, connecting
17:16 everything together into this pretty elaborate system that's able to handle a
17:20 lot. Like going back here, we're not done quite yet, but we finished three
17:24 out of the five issues for this simple Pomodoro timer app. I'll come back once
17:28 everything is done so we can see the full example now that you know how it
17:32 all works and how you can extend this yourself. And here we go. The big
17:33 Final Results and App Review
17:36 reveal. The application that we've been creating throughout this video is
17:40 complete. And interestingly enough, because this application was so
17:43 incredibly simple, it decided to build everything in the initializer session,
17:48 which I actually prompted it to do if it determined it was simple enough, just
17:52 to show you how dynamic this system can be. And of course, I showed you the more
17:56 complex app earlier where it did have to do many different sessions for 44 tasks.
18:00 But yeah, our application looks really good. We can start it here. We can pause
18:05 it, skip to our break. The Pomodoro technique is really awesome for
18:08 productivity, by the way. But yeah, we got our update in Slack that the project
18:12 is complete. It links to our GitHub repository here where we have six
18:16 commits, one for the initialization and then one for each of our tasks. And of
18:20 course, they're all marked as done with our progress tracking filled out as
18:24 well. So we started with the initialization, and then this is the project-complete
18:28 status at the end. Super, super cool. We built this entire thing just during this
18:33 video as I was covering the code and our diagrams. So I want to end this video by
18:34 Future of AI Coding Workflows
1:35 Building an AI Engineer (How it Works)
1:36 what you're looking at right here is actually what I've built to show you
1:40 right now. I've been experimenting with some ideas here. How can we take an
1:44 agent harness and build a tool belt into it so that it can really be a full
1:48 engineer? So, I'll show you how this works right now, how you can extend this
1:52 for yourself. and stick around to the end of the video as well because I'll
1:55 talk about how this really is the future of Agentic Coding. Some big things that
1:59 I'm working on personally as well. And of course, this entire harness I have as
2:03 a GitHub repository for you, which I'll link to in the description. So, I
2:07 encourage you to try it out and even extend it yourself. I made it super easy
2:11 to tweak all the different sub aents that we'll talk about in a little bit to
2:14 connect to the different services. In the read me here, there's a really quick
2:19 setup guide. I'm also using Arcade. This is the platform to make it super easy
2:23 for us to connect to Linear, GitHub, and Slack through MCP. So, I'll talk about
2:27 that a bit more as well. Once you have this all set up, all you have to do to
2:32 send the context into the harness to begin is create an appspec. You can
2:36 think of this like a PRD. It's all of the features that you want it to build
2:42 autonomously in the harness loop. And so, you want to take this appspec and
2:46 use it as an example. So give it to your coding agent because there is a specific
2:50 format that works best for this harness. The biggest thing here is we have our
2:55 task list in a special JSON format. This is the official recommendation from
3:00 Anthropic because I've built my harness on top of Anthropic's harness for
3:04 longunning tasks that they open sourced at the end of last year. And of course
3:08 that does mean that I am using the Claude agents SDK to run this harness,
3:13 but you can use your Anthropic subscription. So really cost effective
3:17 and the Cloud Agent SDK is powering all of the harness experimentation I'm doing
3:22 right now. So for this app specifically, just to give you a really cool example
3:25 what this harness can build, I'm extending my second brain. It's yet
3:28 another thing I've covered on my channel recently. I want to build a dashboard
3:33 where I can paste in a bunch of research that my second brain has done and then
3:38 it'll in real time generate a layout that's unique to the specific research
3:42 that I gave it. So, I can glean insights really quickly. And boom, take a look at
3:47 that. We have a beautiful TLDDR for this pretty extensive research document. I
3:51 It's like 2,000 words in total. We can view the full thing as well. And this is
3:56 not a simple application. There is an agent behind the scenes deciding the
4:00 components to generate in real time to customize the dashboard based on what we
4:06 pasted in. And so using the harness to build this, it decided to create 44
4:12 tasks in total in linear. And so I ran all of this already. So everything is
4:16 done. So we can see all the tasks here. And then we also have the progress
4:20 tracker meta task. And so we need to hand off to the next agent session every
4:25 time we go through that loop in the harness. And so we need to let the next
4:29 agent know what did we do right now so that it can pick up where we left off.
4:33 It's also managing the GitHub repository. We got pull requests. It's
4:37 making a commit for every single feature that it built. That's really cool. You
4:41 can tweak this to your heart's content as well. And we're providing updates in
4:45 Slack. And so, for the sake of simplicity, I just have it message me
4:50 after the first and second sessions. And then when my application is fully
4:53 complete, so then I can come back to my computer to test everything myself, just
4:57 like you would do when you're reviewing the output from a real engineer. So we
5:01 have everything managed in linear everything in GitHub and then letting us
5:05 know when things are done. This is just beautiful to me. And by the way, I just
5:09 want you to know that like this is just the starting point for a harness. A lot
5:12 more work that I'm doing on top of this. A lot of ways you could extend this as
5:16 well. Another really good example is you could build the harness to just watch
5:20 24/7 for any issues that you create in linear and then it would pick those up
5:24 automatically. And so you can change the way that you interact with this harness.
5:28 The sky is really the limit for the way that you build it into these tools. You
5:31 could even have it work with GitHub issues, add in some other platform you
5:35 Setting Up Our AI Engineer Harness
5:35 have like Aana or Jira. It's entirely up to you. All right, so with that, let's
5:39 now get into running this harness. We'll even do a live demo on a simpler
5:42 application and then of course I'll show you how this all works. I want you to
5:45 learn from this and see how you can extend it yourself. And so like I said
5:49 earlier, the readme is really easy to follow. You just set up your virtual
5:52 environment. Make sure you have Cloud Code installed and that you've logged in
5:56 because this harness is going to use the same subscription that you have with
6:00 Cloud Code. So, really easy there. The main thing that I want to cover right
6:05 now is setting up your env. So, Arcade is our ticket here to connect super
6:09 easily to linear Slack and GitHub. That's why I wanted to include it
6:12 because then we don't have to set up all of the individual MCP servers. And so,
6:16 you could change this harness to use those directly if you want. But Arcade
6:20 has a free tier. They also implement what's called agent authorization. So
6:24 they walk us through the OOTH flows really easily with these different
6:27 services. So we could even share this harness with our team members with our
6:32 Arcade MCP gateway. And they don't have to create a new linear API key and a new
6:36 Slack app, but we also don't have to share those credentials with them. So
6:39 it's a really really powerful platform. And so once you're signed in on the free
6:43 tier, you just create your MCP gateway. You give it a name, description, LLM
6:47 instructions. For the authentication, set it to arcade headers. And then for
6:51 the allowed tools, look at this. Boom. We got GitHub. I'll search for linear.
6:55 And then we got linear. And then finally, Slack. It is that easy to add
7:00 in all 91 tools. And by the way, we are using the new tool discovery for MCP and
7:04 Cloud Code. So, it's not like we're just dumping 91 tool definitions directly
7:09 into our coding agent. That would not be contexts efficient. And so, there we go.
7:12 You can create this. I'm just going to use the one that I already have. copy
7:17 your URL because you set that as one of your environment variables and then you
7:20 get your API key from the dashboard as well. That easy to get everything set
7:24 up. Then just use your email here. We can also configure the specific GitHub
7:28 repo that the harness leverages. So generally what I do is I'll create a
7:33 empty repo and then add it in here. And then you can define a slack channel for
7:37 updates too. And you can even change the model that each of our sub aents are
7:42 using for coding linear GitHub. And so we can make things really cost effective
7:45 or just really fast, right? Like we just want to really quickly create things in
7:49 linear. So let's just use haiku for the model. So do all that configuration and
7:54 then you'll run the authorize arcade script. So you just have to do this one
7:58 time because then it'll go through the OOTH flow. So the harness now has access
8:03 to your linear project, your Slack channel, and the GitHub repo that you're
8:06 Running the Harness Live
8:07 working in. And then with all of that taken care of, we can run our harness.
8:11 Just a single command that we need to run to send our appspec into the
8:17 harness. And so make sure that you have your appspec fully fleshed out with the
8:20 help of your coding agent because looking at the first prompt here that's
8:25 sent to our initializer agent is going to read the appspec to understand what
8:28 we're building. So this is the single source of truth initially before we have
8:33 everything set up in linear. And now I'm using WSL here because sub aents don't
8:37 actually work that well in Windows with the cloud agent SDK. So use WSL Mac or
8:42 Linux to run this. And so I'm going to activate my virtual environment here if
8:47 I can type. There we go. All right. And then I'll run the command to kick off
8:51 the agent. And then I'm just going to specify the directory here. So it's
8:55 going to create this from scratch in the generations folder. So this is the
8:59 default location for all of the projects that it creates. And so I'll send this
9:02 off and it's going to kick off the initializer agent to scaffold everything
9:08 for our project linear the GitHub repo the initial configuration for our
9:11 codebase. I'll come back once it's done some of that. All right, take a look. So
9:16 it delegated to the linear agent to get things set up for us. So it starts the
9:20 project initially and now it's building all of these tasks. And so if I go to my
9:25 projects here, we got our new Pomodoro timer task or project. So if I go to the
9:29 issues here, there's six right now. And it's going to create more and more.
9:32 Maybe actually probably only need six for this cuz it's a really simple
9:36 application. So it created the five to build out the app. And then we have the
9:40 meta project progress tracker as well. So this is where we're going to update
9:44 things with our progress over time as we're handing off between the different
9:49 sessions for the harness. So all the setup is done in Linear. And now it's
9:52 moving on to initializing the Git repository, calling the GitHub sub agent
9:56 for this. And so remember, we're using sub agents for context isolation. So
10:00 we're not bloating the main context window for our primary orchestrator
10:05 here. And so yeah, there's going to be a lot that it does here. It'll go on for a
10:09 while. And so while we wait for this, I'm going to go back to our diagrams
10:12 here because I want to show you exactly how this works. I think the diagrams are
10:17 a much better visual than just watching the logs as it's running. And
10:21 of course, I'll show you the project once it's done, but let's cover this in
10:24 the meantime. So, going to the original harness here, I want to talk about what
10:28 Anthropic built to set the stage for how I've improved it to create our full AI
10:33 engineer. And so, we start with the appspec as the primary context that
10:38 goes into our initializer agent. And most of the harnesses that I've seen
10:41 over the past few months, they always start with an initializer. Because
10:45 before we get into the main loop of implementing all of the features that we
10:53 have in Linear, or in this case our local feature list, we need to set the stage
10:53 for our project. We need something to create those features in the first
10:57 place. And I don't know if you saw that blip there for a sec, but it actually
11:00 popped up the browser because it was validating our code behind the scenes
11:04 with Playwright. So anyway, with our initializer agent here, it creates the
11:08 feature list. It's everything we have to knock out that we laid out in our
11:12 appspec. It creates a way to initialize the project and it scaffolds the project
11:18 and the git repository. And so these are the core artifacts that we have after
11:22 the initializer runs. We have the source of truth for everything that has to be
11:26 built. And when our coding agent knocks out features, it'll go
11:30 back here and update things. And so this is our place to keep track of what
11:34 we've built already and what we still have to build. And then for the session
11:38 handoff, we have a simple text file, which I appreciate the simplicity of
11:42 this harness. But I think there really is a big use case to have the agent
11:48 work where we actually work, which is why I wanted to build this. But anyway,
11:52 I'll wrap up here with the coding agent loop. Every single time the agent runs,
11:56 we're running in a fresh context window. The whole point of this agent being able
12:00 to go for a longer time is that we're stringing together different agent
12:04 sessions and each one of them we want to start over so that we have fresh
12:08 context. So it starts by getting its bearings on the codebase, reading
12:12 the feature list, like, okay, what should we build next? It'll do
12:15 regression testing. This is important for reliability of the harness because a
12:19 lot of times one agent is going to break what a different agent worked on
12:22 earlier. And then after it validates that, it'll pick the next feature,
12:26 implement it, update, and commit, which includes making the git commit and then
12:31 updating these two files as well. And so what I've built is very similar. I mean,
12:35 you can even see that I purposely kept the same architecture for the diagram
12:40 here but there are some big differences because of the service integrations and
12:44 how I'm using sub agents to orchestrate everything. So we still start with the
12:49 appspec going into an initializer agent. But now, like we saw in the logs earlier,
12:53 it's delegating to the Linear agent to set up the project in Linear and all of
12:58 the issues. And then just so that we know for our codebase what Linear
13:03 project we're tied to, we also have a single local file. So for the most part,
13:07 I'm avoiding local files. I don't have all of these files, but we need at least
13:11 one file to point us to the right project ID. And then we'll also create
13:16 the meta Linear issue. So this is replacing our claude-progress file. And then
13:21 we'll create that git repo with our GitHub sub agent. And so now Linear is
13:25 our source of truth instead of these local files. And so now when each agent
13:29 runs, it's going to start by reading the Linear project. So that way it knows
13:33 what our project in Linear is. It'll call the Linear agent to then find,
13:37 okay, what are the features that we should validate? What should we pick up
13:40 next? And we're using Arcade for authentication. So the agent has access
13:45 to all of these services. And so then it'll do that implementation, use the
13:49 GitHub agent to push, and then we can also use the Slack sub agent to give a
13:54 progress update. And we're just going to loop over and over and over again until
14:00 every single task in Linear is done. And I've set it up in a way where, most of
14:05 the time, it's going to just do one task at a time. But if the agent figures out it's
14:08 simple enough, it might actually just try to knock out multiple of them in a
14:12 single session. And this is all configurable in the prompts that we'll
14:16 get into as well. So the last thing I want to cover while we wait for our
14:19 harness to complete is the architecture and how you can tweak things for
14:24 yourself. Every single agent that we have in this harness, for coding and
14:28 the different services, is controlled by the prompts that we have in the
14:32 prompts folder. And so when we create our agent, we're using the Claude Agent
14:37 SDK. So we're defining everything in code. We're not using a .claude folder
14:41 like you would with Claude Code. We have our system prompt loaded in right here.
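The loading itself can be as simple as reading markdown off disk. A minimal sketch, where the `prompts/` folder matches what's described here but the `{placeholder}` convention is my assumption:

```python
from pathlib import Path


def load_prompt(name: str, prompts_dir: Path = Path("prompts"), **values: str) -> str:
    """Read a markdown prompt file and optionally fill {placeholder} values.
    The placeholder style is an assumed convention, not the repo's exact one."""
    text = (prompts_dir / f"{name}.md").read_text(encoding="utf-8")
    return text.format(**values) if values else text
```

The orchestrator's system prompt, the initializer task, and every sub agent prompt can all go through one helper like this.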
14:47 So we're loading it in from this file. And so for our orchestrator, this is our
14:51 system prompt, where we're describing: we're building from the appspec, here are
14:55 the sub agents that we have access to, here's what our workflow looks like. All
14:59 that's defined in the system prompt. And then when we are in our very first session,
15:04 that's when we use the initializer task. And so I'll show you here in the code. I
15:07 promise I'll stay pretty high level with the code here. We're checking: is this
15:11 our first run? Do we have things initialized in Linear or not? If it is
15:16 our first run, then this function is going to load in the prompt from this
15:20 file. So we're controlling things with markdown files, just like you would with sub
15:24 agents in Claude Code. And then otherwise we're going to load the continuation task. And
15:30 so this is what we run every single loop when we're going to build that next
15:34 feature. So we read that Linear project. We know what Linear project we're
15:37 working with. We delegate to the Linear agent to figure out what we
15:40 should work on next. This is everything that I've already explained in the diagram.
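The first-run branch described above, initializer prompt on the first session and continuation prompt on every loop after, can be sketched like this. The `.linear_project` marker name is my stand-in for the single local file that stores the Linear project ID:

```python
from pathlib import Path


def select_task_prompt(project_dir: Path, prompts_dir: Path) -> str:
    """Return the initializer prompt on the first run, and the continuation
    prompt on every loop after. File names are illustrative stand-ins."""
    marker = project_dir / ".linear_project"  # written once Linear is set up
    name = "continuation_task.md" if marker.exists() else "initializer_task.md"
    return (prompts_dir / name).read_text(encoding="utf-8")
```

Because the check is just "does the marker file exist," every session after the first automatically drops into the build-the-next-feature loop.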
15:42 I'm now just showing you how this maps to the prompts that we have for all the
15:47 sub agents here so that you can tweak all this for yourself. You can connect more
15:50 services, change how often it communicates in Slack, anything that you
15:55 want to do. And so the last thing that I want to show you here is that instead of
15:59 defining our MCP servers and our sub agents in the .claude folder, we're doing
16:04 it here in our Claude Agent SDK definition. And so we're connecting to
16:09 our Arcade MCP gateway and then the Playwright MCP server, the same kind
16:13 of way that you'd set up the configuration with something like Claude
16:17 Desktop, for example. And then we have all of our agent definitions right here.
16:21 So this is being imported from this file. It is super easy to add on more
16:26 sub agents if you want because for every agent we just give it the description.
16:30 This is how our orchestrator knows when to call upon the sub agent. We are
16:34 loading the prompt from the file. Like for our Linear agent, we're loading it
16:38 directly from the Linear agent prompt right here, speaking to how we
16:42 manage issues and projects and things like that. We have the tools
16:47 it's allowed to use with the Arcade MCP. And then finally, the model.
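Shaped roughly like the SDK's agent definitions, each entry is just a description, a prompt loaded from markdown, a tool allow-list, and a model. A plain-Python sketch, where the tool names and prompt filename are illustrative and not the repo's exact values:

```python
import os
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SubAgentDef:
    """One sub agent: the description is how the orchestrator decides
    when to delegate; the prompt is loaded from a markdown file."""
    description: str
    prompt: str
    tools: list[str] = field(default_factory=list)
    model: str = "haiku"  # haiku, sonnet, or opus, usually read from .env


def make_linear_agent(prompts_dir: Path) -> SubAgentDef:
    # Tool names here are hypothetical Arcade MCP tool IDs for illustration.
    return SubAgentDef(
        description="Manages the Linear project and issues for the harness",
        prompt=(prompts_dir / "linear_agent.md").read_text(encoding="utf-8"),
        tools=["Linear_CreateIssue", "Linear_UpdateIssue"],
        model=os.getenv("LINEAR_AGENT_MODEL", "sonnet"),
    )
```

Adding another sub agent is just another definition like this; the orchestrator decides when to call it based on the description.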
16:52 So this is from our .env. We can use Haiku, Sonnet, or Opus. And so we just build up
16:56 these agent definitions here. So you can change the prompts, change the
17:00 description, add in another one. Very easy to configure. And that's all just
17:03 brought into our agent automatically. And so it really is all of these
17:07 markdown documents that define the entire flow. Really just using the
17:12 Claude Agent SDK as the wrapper around these different prompts, connecting
17:16 everything together into this pretty elaborate system that's able to handle a
17:20 lot. Like going back here, we're not done quite yet, but we finished three
17:24 out of the five issues for this simple Pomodoro timer app. I'll come back once
17:28 everything is done so we can see the full example now that you know how it
17:32 all works and how you can extend this yourself. And here we go. The big
17:36 reveal. The application that we've been creating throughout this video is
17:40 complete. And interestingly enough, because this application was so
17:43 incredibly simple, it decided to build everything in the initializer session,
17:48 which I actually prompted it to do if it determined it was simple enough, just
17:52 to show you how dynamic this system can be. And of course, I showed you the more
17:56 complex app earlier where it did have to do many different sessions for 44 tasks.
18:00 But yeah, our application looks really good. We can start it here. We can pause
18:05 it, skip to our break. The Pomodoro technique is really awesome for
18:08 productivity, by the way. But yeah, we got our update in Slack that the project
18:12 is complete. It links to our GitHub repository here where we have six
18:16 commits, one for the initialization and then one for each of our tasks. And of
18:20 course, they're all marked as done with our progress tracking filled out as
18:24 well. So we started the initialization and then this is the project complete
18:28 status at the end. Super super cool. We built this entire thing just during this
18:33 video as I was covering the code and our diagrams. So I want to end this video by
18:38 talking about the future of AI coding with these harnesses and some things
18:41 that I'm working on myself because here's the thing. I hope that in
18:45 following this video you're inspired to try this harness yourself, even build on
18:49 top of it. And I hope I made that clear enough for you. But in the end, what's
18:54 most powerful is building AI coding workflows and harnesses that are
18:58 specific to your use case, exactly how you want to manage tasks, how you want
19:02 to share context between different sessions. I really believe that if you
19:07 build your own optimized workflow, it's going to be way better than anything
19:11 that's off the shelf. But there's nothing that really helps you build that
19:14 right now. And it's such a powerful concept. And so that's what I'm going to
19:18 be working on. So, my open source project, Archon, I worked on this a lot
19:23 last year. This is my command center for AI coding. It gained a lot of traction.
19:26 I know it doesn't have as many stars as something like OpenClaw, but I was
19:29 really happy with the traction that it gained. But it's really not as relevant
19:34 of a tool right now because it was all about task management and RAG for AI
19:38 coding. But task management is getting built into all these tools like Claude
19:41 Code, and coding agents these days are so good at looking up documentation that
19:46 RAG just isn't as important for coding specifically. And so I want to keep the
19:51 vision of Archon being the command center for AI coding, but I want to turn
19:56 it into the n8n for AI coding: being able to define and orchestrate your own AI
20:01 coding workflows and harnesses so you can build something like this really
20:05 easily but actually make it custom to you. So, that's what I'm working on
20:08 behind the scenes right now. I know there hasn't been a lot of updates with
20:11 Archon because I've been shifting the vision, but I'm super excited for that.
20:15 And so, if you appreciate this video and you're looking forward to more things on
20:19 AI coding and these harnesses, I would really appreciate a like and a
20:23 subscribe. And with that, I will see you