you2idea@video:~$ watch i5kwX7jeWL8 [29:20]
// transcript — 1078 segments
0:02 Not only have I built hundreds of AI agents myself, I've seen other people
0:07 build thousands for every use case under the sun. And those who are the most
0:11 successful are the ones who don't overcomplicate it. And in this video, I want
0:15 to show you how that can be you as well. Cuz here's the thing, and I see this all
0:20 of the time. When people first think about building their AI agents, their
0:24 perfectionism kicks in and they worry about creating the perfect system
0:28 prompt, defining the perfect tools, thinking about the LLM they want to use.
0:31 They consider the context and observability and latency and security
0:35 and deployment. They get overwhelmed with everything. And that might be you
0:38 as well. So, what I have to say to you right now is take a deep breath. That is
0:43 why I'm here. Honestly, you can learn 90% of what you need to build AI agents
0:48 from just this video. And that, my friend, is what I have for you in this
0:52 video. I want to cover each of the core components of building agents like
0:56 system prompts, tools, security, and context. And I want to break down what
1:00 you should focus on to build the first 90% of your agent. Basically, creating
1:04 that proof of concept. And then, honestly, even more importantly, I want
1:08 to talk about what you shouldn't focus on at first, because otherwise you're
1:12 overcomplicating it: the kinds of things you will need to look into at
1:15 some point when you want to specialize your agents and move to production. But
1:19 that's what my other content is for right now. Whether you're new to
1:22 building agents or you just want to build them faster, I want to help you
1:26 focus on the first 90% to make things dead simple. Oh, and by the way, this
1:30 agent that you're looking at right here is the mascot for the new Dynamis
1:33 Agentic coding course. So, if you want to master building systems around AI
1:37 coding, check out the link in the description. All right. So, the first
1:40 thing I want to cover with you is the four core components of any AI agent.
1:47 Quick recap: an AI agent is any large language model that is given the
1:51 ability to interact with the outside world on your behalf through tools. And
1:54 so, it can do something like book a meeting on your calendar, search the
1:58 internet for you. These tools are the first component of agents: the
2:01 functions that we give it that it can call upon to perform actions. And then
2:07 the brain for our AI agent is the large language model. It processes our
2:11 requests and it decides based on the instructions we give it which tools to
2:16 use. And speaking of those core instructions, that is our agent program,
2:21 aka the system prompt. It's the highest level set of instructions we give to any
2:25 AI agent at the start of any conversation that instructs it on its
2:30 persona, goals, how to use tools. We'll cover the different core components of
2:34 system prompts in a little bit. And then last but not least, we have memory
2:38 systems. That's the context we have from our conversations, both the short-term
2:42 and long-term memory. We'll talk about this a bit more when we get into context
2:47 as well. And so, as we go through each of these core components, I'm going to
2:50 move pretty quickly because I just want to cover the basics with you, but I'll
2:54 also link to different videos on my channel throughout this video if you
2:57 want to dive deeper into anything. And when building an AI agent, it is really
3:02 simple to get started. And I'll show you an example in code in just a bit here so
3:06 you can really see what I'm talking about. So when you're building the very
3:10 core of your AI agent, it's really just three steps. You need to pick a large
3:14 language model, write a basic system prompt as the agent instructions, and
3:18 then add your first tool because you need a tool otherwise it's really just a
3:22 regular large language model, not an agent. And so for picking a large
3:26 language model, I would highly recommend using a platform called Open Router
3:29 because it gives you access to pretty much any large language model you could
3:35 possibly want. And so Claude Haiku 4.5 is the general one that I use just as
3:38 I'm prototyping my AI agents, but you could use GPT 5 Mini. You could use an
3:43 open source model like DeepSeek, for example. Like all of them are available
3:47 on this platform. And then when creating your system prompt, you just want to
3:50 define your agent's role and behavior. And you can refine this over time as
3:53 well, just starting really simple and then adding your first tool. Like you
3:57 can give it access to search the web. You can give it the ability to perform
4:01 mathematical computations with a calculator tool. Like literally whatever
4:04 it is, just start simple and then once you have this foundation, that's when
4:08 you can build on more capabilities and integrations. And I want to show you
4:11 more than just theory as well. Like let's actually go and build an AI agent
4:15 right now so you can see practically how dead simple it really is. And I'll have
4:19 a link to this repo in the description as well if you want to dive into this
4:23 extremely basic agent that's covering all of the components in this video,
4:27 even some things we'll talk about in a bit like observability. So you can get
4:30 this up and running yourself, even use this as a template for your first agent
4:35 if you want. And so I'm going to build it from scratch with you right now, like
4:38 show you line by line how simple this really is. It's going to be less than 50
4:42 lines in the end, just like I promised in the slide. And so first I'm going to
4:46 import all of the Python dependencies. I'm using Pydantic AI since it's my
4:51 favorite AI agent framework, but it really doesn't matter the one that you
4:54 use. The principles that I'm covering in this video apply no matter how you're
4:57 building your agents, even if it's with a tool like n8n, because what I'm
5:01 focusing on here is just defining our four core components. LLM, tools,
5:07 memory, and a system prompt. And so the first thing I'm going to do is define
5:11 the large language model that I want to leverage. And just like I talked about a
5:15 little bit ago, I'm using OpenRouter. So right now I'm going to use Claude
5:19 Haiku 4.5 as my model. But literally just changing this line or just changing
5:24 my environment variable here. A single line change. I can swap to any model I
5:29 want like Gemini or DeepSeek or OpenAI. It's that easy. After I have my LLM
5:34 defined, now I define the agent itself including the system prompt, the
5:38 high-level instructions. And so I'm importing this from a separate file.
5:42 I'll just show you a very very basic example of a system prompt here and then
5:45 more on this in a little bit. The core components that I generally include
5:50 are the persona and goals, tool instructions, the output format (how
5:54 it communicates back to us), and then also any other miscellaneous instructions I
5:57 want to include. So I have this saved here. Now this is a part of my agent
6:01 that I've defined. And so the next thing that we need to add is a tool to really
6:06 turn it from an LLM or a chatbot into a full-fledged agent. And the way that you
6:11 do that with most AI agent frameworks is you define a Python function very
6:16 simply and then you add what is called a decorator. This signals to Pydantic AI that
6:21 this function right here I want to take and attach to my agent as a capability
6:26 that it can now invoke. And so the agent defines these parameters when it calls
6:30 the tool. So like in this case this is a very basic tool to add two numbers
6:34 together because large language models as token prediction machines actually
6:38 suck at math. Interesting fact. And so it defines these parameters and it
6:42 leverages this docstring, as it's called; this comment is included as
6:47 a part of the prompt to the LLM because it defines when and how to leverage this
6:51 tool which in this case the functionality is very basic just adding
6:54 two numbers together. But this could be a tool to search the web based on a
6:59 query it defines, or create an event in our calendar based on a time range and title
7:02 that it defines. Right, like all those things are parameters, and then we
7:05 perform the functionality for the agent based on that. That is the tool that we've
7:09 got for the agent. And that is really good. We've created our agent and added
7:13 the tools. The only thing we have to do now is set up a way to interact with it.
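The decorator mechanism described here can be sketched in plain Python. This is a hypothetical illustration of what a framework does conceptually (registering the function and exposing its signature and docstring to the prompt), not the actual Pydantic AI API:

```python
import inspect

# Hypothetical sketch of an agent framework's tool decorator: it records
# the function, and keeps its signature and docstring so they can be
# included in the prompt sent to the LLM.
TOOLS = {}

def tool(fn):
    """Register fn as an agent tool and keep its metadata for the prompt."""
    TOOLS[fn.__name__] = {
        "fn": fn,
        "signature": str(inspect.signature(fn)),
        "description": inspect.getdoc(fn) or "",
    }
    return fn

@tool
def add_numbers(a: float, b: float) -> float:
    """Add two numbers. Use this whenever the user asks for a sum,
    because LLMs are unreliable at arithmetic."""
    return a + b

def tool_prompt() -> str:
    """Render the registered tools as text for the system prompt."""
    return "\n".join(
        f"- {name}{meta['signature']}: {meta['description']}"
        for name, meta in TOOLS.items()
    )
```

The docstring doing double duty as documentation for humans and as instructions for the model is the key idea here.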
7:17 So I'm going to create a very basic command line interface here. We start
7:21 with an empty conversation history. This is where we'll add memory, which is the fourth
7:25 component of agents. And so in an infinite loop here, we're getting the
7:28 input from the user, and we're exiting the program if they say exit.
7:32 Otherwise, we are going to call the agent. So it's very simply agent.run
7:37 with the user's latest message, passing in for short-term memory the
7:41 conversation history so it knows what we said to each other up until this point.
7:46 And then I'm going to add on to the conversation history everything that we
7:50 just said and then print out the agent's latest response. Take a look at that.
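The loop just described, with the conversation history as short-term memory, looks roughly like this. The agent call is stubbed out; in the real script it would be your framework's agent.run:

```python
# Sketch of the chat loop with short-term memory. run_agent is a stub
# standing in for the real LLM call.
def run_agent(message: str, history: list[dict]) -> str:
    # Echo how much context the "agent" saw, for illustration only.
    return f"(reply to {message!r} with {len(history)} prior messages)"

def chat_turn(user_input: str, history: list[dict]) -> str:
    """One turn: call the agent with history, then record both messages."""
    reply = run_agent(user_input, history)
    history.append({"role": "user", "content": user_input})
    history.append({"role": "assistant", "content": reply})
    return reply

def chat_loop():
    history: list[dict] = []  # the fourth component: memory
    while True:
        user_input = input("You: ")
        if user_input.strip().lower() == "exit":
            break
        print("Agent:", chat_turn(user_input, history))
```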
7:53 And then even after we call our main function here, we are still below 50
7:59 lines of code. It is that easy to define our agents. And obviously there's so
8:03 many more things that we have to do to really get our agent to the point where
8:06 it's production ready. But again, I just want to focus on making it dead simple
8:10 for you right now. And I know that a lot of this might be review for you if you
8:14 built agents in the past. But especially if you have built a lot of AI agents
8:18 already, you're probably like me, where a lot of times you just overcomplicate
8:22 things because you know how much can go into building agents. What I'm trying
8:25 to do is just draw you back to the fundamentals, because you need to keep
8:29 things simple when you're first creating any agent, really any software at all.
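As an aside, the single-line model swap from earlier boils down to reading the model id from an environment variable. The variable name and the default model id here are assumptions for illustration, not from any particular framework:

```python
import os

# The model id lives in an environment variable, so changing LLMs is a
# config change rather than a code change. "LLM_MODEL" and the
# OpenRouter-style default id are hypothetical names.
def get_model_id() -> str:
    return os.environ.get("LLM_MODEL", "anthropic/claude-haiku-4.5")
```

Setting LLM_MODEL to another provider/model string before launch is then the whole swap.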
8:33 And so yeah, we can go into the terminal now and interact with our agent. So I'm
8:37 going to run agent.py here. Everything that we just built, I can say hello to
8:41 get a super simple response back here. And then I can say for example, what is
8:45 and I'll just do a couple of bigger numbers that I want to add together. And
8:49 so here it knows thanks to the tool description that it should use the add
8:54 numbers tool that we gave it to produce this sum. There we go. Take a look at
8:58 that. And I can even say did you use the tool, right? And it should say yes. Like
9:01 it actually recognizes based on the conversation history that it used the add
9:05 numbers tool. Okay, perfect. So we got this agent with conversation history. It
9:09 knows when to use this tool. And now at this point we can start to expand the
9:13 tools that we give it. We can refine our system prompt, play around with
9:16 different LLMs. And I want to talk about that as well. Now, starting with large
9:20 language models, choosing your LLM, like I was saying when I was building the
9:24 agent, Claude Haiku 4.5 is the one that I recommend just a cheap and fast option
9:28 that's really good for building proof of concepts when I don't want to spend a
9:31 lot of money on tokens as I'm iterating on my agent initially. And then Claude
9:36 Sonnet 4.5 is generally the best all-around right now. This might change
9:40 in literally a week and people have different opinions. The main thing that
9:44 I want to communicate here is don't actually worry about picking the perfect
9:47 LLM up front, especially when you're using a platform like Open Router where
9:52 it makes it so easy to swap between LLMs. Even if you're not using Open
9:56 Router, it still is really easy. And then if you want a local model for
10:00 privacy reasons or you want to be 100% free running on your hardware, then
10:04 Mistral Small 3.1 or Qwen 3 are the ones that I recommend right now. And if you
10:08 haven't ever tried Open Router or a tool like it that really just routes you
10:12 between the different LLM providers, I would highly recommend trying one
10:15 because it makes it so easy to iterate on the LLM for your agent, giving you
10:19 instant access. Take a look at this: we've got Grok, Anthropic, Gemini, we've
10:25 got the GPT models, we've got Qwen 3, all the open-source ones. No matter
10:28 what you want to experiment with, you've got it here. And so just use this as
10:32 your tool to iterate on the LLM very quickly and just not have to think about
10:36 it that much. And then for the system prompt component, I promised I would
10:39 dive a little bit more into the different categories that I have. So
10:41 that's what I want to talk about very quickly. It can be especially easy to
10:46 overthink the system prompt because it's just such a broad problem to solve of
10:50 like what should the top level instruction set be for my agent? And so
10:54 I like to keep things simple by working off of a template that I use for all of
10:58 my AI agents at least as a starting point. I always have persona and goals,
11:02 tool instructions and examples, output format, and miscellaneous instructions.
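That template can be captured as a simple fill-in-the-blanks string. The section wording here is illustrative, not a canonical format:

```python
# A minimal fill-in-the-blanks version of the system prompt template:
# persona and goals, tool instructions and examples, output format, and
# miscellaneous instructions.
SYSTEM_PROMPT_TEMPLATE = """\
# Persona & Goals
You are {persona}. Your goal is to {goal}.

# Tool Instructions & Examples
{tool_instructions}

# Output Format
{output_format}

# Miscellaneous Instructions
{misc}
"""

def build_system_prompt(**sections: str) -> str:
    """Fill the template; every section is required by name."""
    return SYSTEM_PROMPT_TEMPLATE.format(**sections)
```

Starting every agent from the same skeleton and only refining the section contents is what keeps the prompt from becoming an overthought blob.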
11:07 And what you shouldn't worry about at this point is setting up elaborate
11:12 prompt evaluations or split testing your system prompts. You can get into that
11:15 when you really want to refine your agent instructions. But right now, just
11:20 keep it simple and refine at a high level as you are manually testing your
11:24 agent. And if you want to see that system prompt template in action, I've
11:27 got you covered. I'll have a link to this in the description as well. It's a
11:32 real example of me filling out those different sections, creating a system
11:36 prompt for a task management agent. So, I have my persona defined here. I'm
11:40 defining the goals for the task management agent. The tool instructions
11:44 like how I can use different tools together to manage tasks in my platform.
11:49 The output format, just specifying ways that I want it to communicate back to me
11:53 or things to avoid. And some examples. Now, this applies more to complex agents
11:57 and system prompts where you actually want to kind of give an example of a
12:01 workflow of chaining different tools together, so it doesn't really apply
12:03 here. And then the last thing is just miscellaneous instructions. This is also
12:08 the place to go to add in extra instructions to fix those little issues
12:12 you see with your agent that don't necessarily fit into the other sections. So,
12:15 a catchall to make sure that there's a place to put anything as you're
12:19 experimenting with your agent and refining your system prompt. And then as
12:23 far as tools go for your AI agents, there's just a few things I want to
12:26 cover quickly to help you keep things simple and focused. The first is that
12:31 you should keep your tools to under 10 for your AI agents, at least when
12:34 starting out. And you definitely want to make sure that each tool's purpose is
12:38 very distinct. Because if your tools have overlapping functionality or if you
12:42 have too many, then your large language model starts to get overwhelmed with all
12:46 the possibilities of its capabilities and it'll use the wrong tools. It will
12:51 forget to call tools, and it's just a mess. Like definitely keep it to under
12:56 10. And then also, MCP servers are a great way to find prepackaged sets of
13:00 tools you can bring into your agent when you're, you know, creating
13:02 something initially and you just want to move very quickly. And so definitely
13:06 based on what you're building, you'll probably be able to find an MCP server
13:10 that gives you some functionality right out of the box for your agents. And then
13:14 the last thing I'll say is a lot of people ask me, "What capabilities should
13:18 I focus on learning first when I'm building agents?" And tools and RAG is
13:24 always the answer that I have for them. Giving your AI
13:28 agent tools that allow it to search your documents and knowledge base.
13:31 That's what retrieval augmented generation is. And so really, it's
13:35 giving your agents the ability to ground their responses in real data. And I
13:40 would say that probably over 80% of AI agents running out in the wild right
13:44 now, no matter the industry or niche, are using RAG to some extent as part of
13:49 the capabilities for the agent. And then continuing with our theme here, what not
13:55 to focus on when building tools: don't worry about multi-agent systems or
13:59 complex tool orchestration yet. When you have a system that starts
14:03 to have more than 10 tools, that is generally when you start to split into
14:07 specialized sub-agents and you have routing between them. Those kinds of
14:12 systems are powerful and necessary for a lot of applications, but definitely
14:15 overengineering when you're just getting started creating your agent or a system.
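The RAG idea mentioned a moment ago can be illustrated with a toy retriever: fetch the most relevant documents for a query, then hand them to the LLM as grounding context. Real systems use embeddings and a vector store; simple word overlap and made-up documents stand in here:

```python
# Toy RAG sketch: keyword-overlap retrieval over a hypothetical
# knowledge base, then a grounded prompt for the LLM.
DOCS = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support hours are 9am to 5pm, Monday through Friday.",
    "Premium plans include priority email support.",
]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by how many query words they share; return top k."""
    words = set(query.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def grounded_prompt(query: str) -> str:
    """Build a prompt that grounds the answer in retrieved context."""
    context = "\n".join(retrieve(query, DOCS))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

Swapping the word-overlap ranking for embedding similarity is the main upgrade on the path to a real RAG pipeline.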
14:18 Also, if you want to learn more about RAG and building that into your agents,
14:21 check out the video that I'll link to right here. I cover that all of the time
14:25 on my channel because it is so important. And so with that, moving on
14:29 to the next thing, we have our security essentials because it is important to
14:32 think about security when you're building any software upfront. But I
14:36 don't want you to overcomplicate it yet, right? Like, don't become a security
14:40 expert overnight. There are existing tools out there to help us with
14:43 security. So we can still move quickly as we're building our agent initially.
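To show how lightweight this can be, here is a conceptual input/output guard in plain Python. It is not any particular library's API (not Guardrails AI); the regexes and banned-word list are made up for illustration:

```python
import re

# Conceptual input/output guard: screen what goes into the LLM and what
# comes out. The patterns and banned words are illustrative placeholders.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def input_guard(prompt: str) -> str:
    """Redact obvious PII before the prompt leaves for a cloud LLM."""
    prompt = EMAIL.sub("[REDACTED_EMAIL]", prompt)
    return SSN.sub("[REDACTED_SSN]", prompt)

def output_guard(response: str, banned: tuple[str, ...] = ("darn",)) -> bool:
    """Return True if the response is acceptable; the caller retries otherwise."""
    return not any(word in response.lower() for word in banned)
```

Wrapping the agent call with input_guard before and output_guard after (retrying on failure) is the whole pattern; libraries just ship richer validators.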
14:47 We'll definitely want to pay more attention to security when we're going
14:50 into production. But at first there are a couple of tools that I want to call
14:54 out here. And then just some general principles to follow. Like for example,
14:58 don't hardcode your API keys, right? Like you don't want to have your OpenAI
15:02 or Anthropic API key just sitting there right in your code or your n8n workflow
15:06 for example. You always want to store that in a secure way through things like
15:12 environment variables. And then also when we think about building AI agents
15:15 in particular, there's a lot of security that we want to implement through what
15:18 are called guardrails, right? So limiting what kind of information can
15:22 come into the large language model and then also limiting the kinds of
15:27 responses that the agent can give and having it like actually retry if it
15:31 produces any kind of response that isn't acceptable for us. And there's a super
15:36 popular open-source repository that I lean on all the time to help with
15:39 guardrails, very creatively called Guardrails AI. And so it's a Python
15:43 framework (I always love building my AI agents with Python) that helps you
15:48 build reliable AI applications by giving you both the input and output
15:51 guardrails that I'm talking about. So limiting what goes in and limiting what
15:56 the agent can produce. And they provide a lot of different options for
15:59 guardrails. Like for example, one thing that you want to avoid quite often is
16:04 inserting any kind of PII, personally identifiable information into a prompt
16:08 to an LLM, especially when it's going out to some model in the cloud like
16:12 Anthropic or Gemini instead of a local LLM. So limiting that kind of thing,
16:16 maybe detecting any vulgar language that's output from an LLM, because they
16:20 will do that sometimes. Like those are just some examples of input and output
16:24 guardrails. And it is very easy to install this as a Python package and
16:29 bring these guards right into your code as you are interacting with your agents
16:32 like we saw earlier when I had that, you know, simple command line tool to talk
16:35 to the agent. Like I could just add a guard before or after that call to the
16:39 agent. So yeah, guardrails don't have to be complicated. There are tools like
16:42 this, even completely open-source ones like Guardrails AI, that make it very
16:46 easy. Okay, so we've talked about guardrails and I gave you one example of
16:50 best practices for security in our codebase. But what about the other
16:54 million different vulnerabilities we have to account for in our codebase and
16:59 the dependencies we're bringing into our project? We can't expect ourselves to
17:03 become security experts overnight. And so it's important to learn these things,
17:07 but also we can lean on existing tools to help us with this vulnerability
17:12 detection. There are a lot of options out there for this, but Snyk Studio is
17:16 one that I've been leaning on a lot recently. And they also have an MCP
17:21 server within the studio to help us handle vulnerability detection
17:25 automatically right within our coding process. So like always, I'm trying to
17:29 focus on open-source solutions for this video, but there's really no open-
17:33 source alternative to Snyk that I know about. This platform is incredible. So
17:38 in Snyk Studio, we can set up these different projects and integrations. We
17:42 can have it analyze our codebase and dependencies for vulnerabilities in our
17:46 GitHub repositories. They have a CLI. We can do things locally. They have the MCP
17:49 server that I'm going to show you in a little bit. I'll link to all this in the
17:53 description. But yeah, the MCP server in particular is super cool to me because
17:57 we can have vulnerability detection built right into our AI coding
18:02 workflows. Now, take a look at this. I have the Snyk MCP server connected
18:07 directly to my Claude Code after I went through the Snyk authentication process
18:10 in the CLI. And you can connect this to literally any AI coding assistant or MCP
18:15 client. So now within Claude I could build this into a full AI coding
18:18 workflow which is very cool. I'm going to show you a simple demo right now.
18:23 I'll just say, you know, use the Snyk MCP to analyze my code and dependencies
18:29 for vulnerabilities. And so it's able to leverage different tools within the MCP server to check for
18:37 both right like it's a very robust solution here. And so I'll let it go for
18:41 a little bit. I'll pause and come back once it has run the vulnerability
18:44 detection. Okay, this is so cool. Take a look at this. So, within my basic agent
18:49 repository, first it used the Snyk MCP server to analyze for any
18:53 vulnerabilities with my dependencies, things like Pydantic AI, for example.
18:59 And then it does a code scan. So, this would also detect things like if I had
19:03 my API keys hardcoded like the example that I gave earlier. So, it
19:07 found three issues with my dependencies and nothing with my code, which I'm very
19:11 proud of. I got no issues with my code. And not only does it do the analysis,
19:15 but it gives me a summary and lists the actions I can take to remedy things.
19:18 Like here are the medium severity vulnerabilities that I have
19:22 within a few of my dependencies. Nothing in my code. And then it gives me
19:26 recommendations to fix things. And so I can go and say, yes, action on this
19:30 now. And it's going to update my requirements.txt to fix these things. And I
19:34 could even run the Snyk MCP server again. And you can definitely see how
19:37 you'd build this kind of thing directly into the validation layer of your AI
19:42 coding workflow. Very, very neat for any AI agent or really any software you want to build.
1:40 thing I want to cover with you is the four core components of any AI agent,
1:47 which quick recap, an AI agent is any large language model that is given the
1:51 ability to interact with the outside world on your behalf through tools. And
1:54 so, it can do something like book a meeting on your calendar, search the
1:58 internet for you. That's the first part of agents is these tools. It's the
2:01 functions that we give it that it can call upon to perform actions. And then
2:07 the brain for our AI agent is the large language model. It processes our
2:11 requests and it decides based on the instructions we give it which tools to
2:16 use. And speaking of those core instructions, that is our agent program,
2:21 aka the system prompt. It's the highest level set of instructions we give to any
2:25 AI agent at the start of any conversation that instructs it on its
2:30 persona, goals, how to use tools. We'll cover the different core components of
2:34 system prompts in a little bit. And then last but not least, we have memory
2:38 systems. That's the context we have from our conversations, both the short-term
2:42 and long-term memory. We'll talk about this a bit more when we get into context
2:47 as well. And so, as we go through each of these core components, I'm going to
2:50 move pretty quickly because I just want to cover the basics with you, but I'll
2:54 also link to different videos on my channel throughout this video if you
2:57 want to dive deeper into anything. And when building an AI agent, it is really
3:02 simple to get started. And I'll show you an example in code in just a bit here so
3:06 you can really see what I'm talking about. So when you're building the very
3:10 core of your AI agent, it's really just three steps. You need to pick a large
3:14 language model, write a basic system prompt as the agent instructions, and
3:18 then add your first tool because you need a tool otherwise it's really just a
3:22 regular large language model, not an agent. And so for picking a large
3:26 language model, I would highly recommend using a platform called Open Router
3:29 because it gives you access to pretty much any large language model you could
3:35 possibly want. And so Claude Haiku 4.5 is the general one that I use just as
3:38 I'm prototyping my AI agents, but you could use GPT 5 Mini. You could use an
3:43 open source model like DeepSeek, for example. Like all of them are available
3:47 on this platform. And then when creating your system prompt, you just want to
3:50 define your agents role and behavior. And you can refine this over time as
3:53 well. just starting really simple and then adding your first tool. Like you
3:57 can give it access to search the web. You can give it the ability to perform
4:01 mathematical computations with a calculator tool. Like literally whatever
4:04 it is, just start simple and then once you have this foundation, that's when
4:08 you can build on more capabilities and integrations. And I want to show you
4:11 more than just theory as well. Like let's actually go and build an AI agent
4:15 right now so you can see practically how dead simple it really is. And I'll have
4:19 a link to this repo in the description as well if you want to dive into this
4:23 extremely basic agent that's covering all of the components in this video,
4:27 even some things we'll talk about in a bit like observability. So you can get
4:30 this up and running yourself, even use this as a template for your first agent
4:35 if you want. And so I'm going to build it from scratch with you right now, like
4:38 show you line by line how simple this really is. It's going to be less than 50
4:42 lines in the end, just like I promised in the slide. And so first I'm going to
4:46 import all of the Python dependencies. I'm using Pantic AI since it's my
4:51 favorite AI agent framework, but it really doesn't matter the one that you
4:54 use. The principles that I'm covering in this video applies no matter how you're
4:57 building your agents, even if it's with a tool like N8N because what I'm
5:01 focusing on here is just defining our four core components. LLM, tools,
5:07 memory, and a system prompt. And so the first thing I'm going to do is define
5:11 the large language model that I want to leverage. And just like I talked about a
5:15 little bit ago, I'm using open router. So right now I'm going to use cloud
5:19 haiku 4.5 as my model. But literally just changing this line or just changing
5:24 my environment variable here. A single line change. I can swap to any model I
5:29 want like Gemini or DeepSeek or OpenAI. It's that easy. After I have my LLM
5:34 defined, now I define the agent itself including the system prompt, the
5:38 highlevel instructions. And so I'm importing this from a separate file.
5:42 I'll just show you a very very basic example of a system prompt here and then
5:45 more on this in a little bit. The core components that I generally include
5:50 including the persona goal tool instructions, the output format like how
5:54 it communicates back to us and then also any other miscellaneous instructions I
5:57 want to include. So I have this saved here. Now this is a part of my agent
6:01 that I've defined. And so the next thing that we need to add is a tool to really
6:06 turn it from an LLM or a chatbot into a full-fledged agent. And the way that you
6:11 do that with most AI Asian frameworks is you define a Python function like very
6:16 simply and then you add what is called a decorator. This signals to paid AI that
6:21 this function right here I want to take and attach to my agent as a capability
6:26 that it can now invoke. And so the agent defines these parameters when it calls
6:30 the tool. So like in this case this is a very basic tool to add two numbers
6:34 together because large language models as token prediction machines actually
6:38 suck at math. interesting fact. And so it defines these parameters and it
6:42 leverages this dock string as it's called like this comment is included as
6:47 a part of the prompt to the LLM because it defines when and how to leverage this
6:51 tool which in this case the functionality is very basic just adding
6:54 two numbers together. But this could be a tool to search the web based on a
6:59 query it defines create an event in our calendar based on a time range and title
7:02 that it defines right like all those things are parameters and then we
7:05 perform the functionality for the agent based on that. That is the tool that we
7:09 got for the agent. And that is really good. We've created our agent and added
7:13 the tools. The only thing we have to do now is set up a way to interact with it.
7:17 So I'm going to create a very basic command line interface here. We start
7:21 with empty conversation. This is where we'll add memory, which is the fourth
7:25 component of agents. And so in an infinite loop here, we're getting the
7:28 input from the user. Uh and we're exiting the program if they say exit.
7:32 Otherwise, we are going to call the agent. So it's very simply agent.run run
7:37 with the user's latest message and passing in for short-term memory the
7:41 conversation history so it knows what we said to each other up until this point
7:46 and then I'm going to add on to the conversation history everything that we
7:50 just said and then print out the agents latest response. Take a look at that.
7:53 And then even after we call our main function here, we are still below 50
7:59 lines of code. It is that easy to define our agents. And obviously there's so
8:03 many more things that we have to do to really get our agent to the point where
8:06 it's production ready. But again, I just want to focus on making it dead simple
8:10 for you right now. And I know that a lot of this might be review for you if you
8:14 built agents in the past. But especially if you have built a lot of AI agents
8:18 already, you're probably like me where a lot of times you just overcomplicate
8:22 things cuz you know how much can go into building agents. That's what I'm trying
8:25 to do is just draw you back to the fundamentals because you need to keep
8:29 things simple when you're first creating any agent really any software at all.
8:33 And so yeah, we can go into the terminal now and interact with our agent. So I'm
8:37 going to run agent.py here. Everything that we just built, I can say hello to
8:41 get a super simple response back here. And then I can say for example, what is
8:45 and I'll just do a couple of bigger numbers that I want to add together. And
8:49 so here it knows thanks to the tool description that it should use the add
8:54 numbers tool that we gave it to produce this sum. There we go. Take a look at
8:58 that. And I can even say did you use the tool, right? And it should say yes. Like
9:01 it actually recognizes based on the conversation history that it used the add
9:05 numbers tool. Okay, perfect. So we got this agent with conversation history. It
9:09 knows when to use this tool. And now at this point we can start to expand the
9:13 tools that we give it. We can refine our system prompt, play around with
9:16 different LLMs. and I want to talk about that as well. Now, starting with large
9:20 language models, choosing your LLM, like I was saying when I was building the
9:24 agent, Claude Haiku 4.5 is the one that I recommend just a cheap and fast option
9:28 that's really good for building proof of concepts when I don't want to spend a
9:31 lot of money on tokens as I'm iterating on my agent initially. And then Claude
9:36 Sonnet 4.5 is generally the best all-around right now. This might change
9:40 in literally a week and people have different opinions. The main thing that
9:44 I want to communicate here is don't actually worry about picking the perfect
9:47 LLM up front, especially when you're using a platform like OpenRouter where
9:52 it makes it so easy to swap between LLMs. Even if you're not using
9:56 OpenRouter, it still is really easy. And then if you want a local model for
10:00 privacy reasons or you want to be 100% free running on your hardware, then
10:04 Mistral Small 3.1 or Qwen 3 are the ones that I recommend right now. And if you
10:08 haven't ever tried OpenRouter or a tool like it that really just routes you
10:12 between the different LLM providers, I would highly recommend trying one
10:15 because it makes it so easy to iterate on the LLM for your agent, giving you
10:19 instant access. Take a look at this: we've got Grok, Anthropic, Gemini, we've
10:25 got the GPT models, we've got Qwen 3, all the open-source ones. No matter
10:28 what you want to experiment with, you've got it here. And so just use this as
10:32 your tool to iterate on the LLM very quickly and just not have to think about
10:36 it that much. And then for the system prompt component, I promised I would
10:39 dive a little bit more into the different categories that I have. So
10:41 that's what I want to talk about very quickly. It can be especially easy to
10:46 overthink the system prompt because it's just such a broad problem to solve of
10:50 like what should the top level instruction set be for my agent? And so
10:54 I like to keep things simple by working off of a template that I use for all of
10:58 my AI agents at least as a starting point. I always have persona and goals,
11:02 tool instructions and examples, output format, and miscellaneous instructions.
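Those four categories can be stitched together mechanically. The heading style and section contents below are invented placeholders, not the exact template from the video; only the category names mirror it:

```python
# Assemble a system prompt from the template's four sections.
def build_system_prompt(persona_and_goals: str, tool_instructions: str,
                        output_format: str, misc: str) -> str:
    sections = [
        ("Persona and Goals", persona_and_goals),
        ("Tool Instructions and Examples", tool_instructions),
        ("Output Format", output_format),
        ("Miscellaneous Instructions", misc),
    ]
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections)

# Illustrative contents for a task-management agent like the one described.
prompt = build_system_prompt(
    persona_and_goals="You are a concise task-management assistant. Help the "
                      "user create, list, and complete tasks.",
    tool_instructions="Call create_task before list_tasks when asked to add "
                      "and then show tasks.",
    output_format="Reply in short bullet points; avoid filler.",
    misc="Never invent task IDs.",
)
```

Keeping each section to a couple of sentences, as the video suggests, also keeps the whole prompt well under a few hundred lines.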
11:07 And what you shouldn't worry about at this point is setting up elaborate
11:12 prompt evaluations or split testing your system prompts. You can get into that
11:15 when you really want to refine your agent instructions. But right now, just
11:20 keep it simple and refine at a high level as you are manually testing your
11:24 agent. And if you want to see that system prompt template in action, I've
11:27 got you covered. I'll have a link to this in the description as well. It's a
11:32 real example of me filling out those different sections, creating a system
11:36 prompt for a task management agent. So, I have my persona defined here. I'm
11:40 defining the goals for the task management agent. The tool instructions
11:44 like how I can use different tools together to manage tasks in my platform.
11:49 The output format, just specifying ways that I want it to communicate back to me
11:53 or things to avoid. Some examples. Now, this applies more to more complex agents
11:57 and system prompts where you actually want to kind of give an example of a
12:01 workflow of chaining different tools together, so it doesn't really apply
12:03 here. And then the last thing is just miscellaneous instructions. This is also
12:08 the place to go to add in extra instructions to fix those little issues
16:12 you see with your agent that don't necessarily fit into all the others. So,
12:15 a catchall to make sure that there's a place to put anything as you're
12:19 experimenting with your agent and refining your system prompt. And then as
12:23 far as tools go for your AI agents, there's just a few things I want to
12:26 cover quickly to help you keep things simple and focused. The first is that
12:31 you should keep your tools to under 10 for your AI agents, at least when
12:34 starting out. And you definitely want to make sure that each tool's purpose is
12:38 very distinct. Because if your tools have overlapping functionality or if you
12:42 have too many, then your large language model starts to get overwhelmed with all
12:46 the possibilities of its capabilities and it'll use the wrong tools. It will
12:51 forget to call tools. uh and it's just a mess. Like definitely keep it to under
12:56 10. And then also MCP servers are a great way to find prepackaged sets of
13:00 tools you can bring into your agent when you're, you know, creating
13:02 something initially and you just want to move very quickly. And so definitely
13:06 based on what you're building, you'll probably be able to find an MCP server
13:10 that gives you some functionality right out of the box for your agents. And then
13:14 the last thing I'll say is a lot of people ask me, "What capabilities should
13:18 I focus on learning first when I'm building agents?" and I want to give
13:24 them, tools and RAG is always the answer that I have for them. Giving your AI
13:28 agent tools that allow it to search your documents and knowledge base.
13:31 That's what retrieval augmented generation is. And so really, it's
13:35 giving your agents the ability to ground their responses in real data. And I
13:40 would say that probably over 80% of AI agents running out in the wild right
13:44 now, no matter the industry or niche, are using RAG to some extent as part of
13:49 the capabilities for the agent. And then continuing with our theme here, what not
13:55 to focus on when building tools is don't worry about multi-agent systems or
13:59 complex tool orchestration yet. When you have a system that starts
14:03 to have more than 10 tools, that is generally when you start to split into
14:07 specialized sub-agents and you have routing between them. Those kinds of
14:12 systems are powerful and necessary for a lot of applications, but definitely
14:15 overengineering when you're just getting started creating your agent or a system.
14:18 Also, if you want to learn more about RAG and building that into your agents,
14:21 check out the video that I'll link to right here. I cover that all of the time
14:25 on my channel because it is so important. And so with that, moving on
14:29 to the next thing, we have our security essentials because it is important to
14:32 think about security when you're building any software upfront. But I
14:36 don't want you to over complicate it yet, right? Like don't become a security
14:40 expert overnight. There are existing tools out there to help us with
14:43 security. So we can still move quickly as we're building our agent initially.
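Two of the principles from this section in miniature, using only the standard library. The env-var name and the regex are illustrative; a real project would lean on a dedicated library for the second part:

```python
import os
import re

# 1) Never hardcode API keys: read them from the environment instead.
#    None here simply means the variable isn't configured yet.
api_key = os.getenv("OPENROUTER_API_KEY")

# 2) Toy input guardrail: redact email addresses (one kind of PII) before a
#    prompt leaves for a cloud LLM. Real validators are far more thorough;
#    this regex is only an illustration of the idea.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact_pii(prompt: str) -> str:
    return EMAIL_RE.sub("[REDACTED EMAIL]", prompt)

safe = redact_pii("Contact me at jane.doe@example.com about the invoice.")
# safe == "Contact me at [REDACTED EMAIL] about the invoice."
```

The same redaction call slots in right before the agent call in the CLI loop from earlier, which is the "guard before or after the call" pattern described below.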
14:47 We'll definitely want to pay more attention to security when we're going
14:50 into production. But at first there are a couple of tools that I want to call
14:54 out here. And then just some general principles to follow. Like for example,
14:58 don't hardcode your API keys, right? Like you don't want to have your OpenAI
15:02 or Anthropic API key just sitting there right in your code or your n8n workflow
15:06 for example. You always want to store that in a secure way through things like
15:12 environment variables. And then also when we think about building AI agents
15:15 in particular, there's a lot of security that we want to implement through what
15:18 are called guard rails, right? So limiting what kind of information can
15:22 come into the large language model and then also limiting the kinds of
15:27 responses that the agent can give and having it like actually retry if it
15:31 produces any kind of response that isn't acceptable for us. And there's a super
15:36 popular open source repository that I lean on all the time to help with
15:39 guardrails and very creatively called Guardrails AI. And so it's a Python
15:43 framework (because I always love building my AI agents with Python) that helps
15:48 build reliable AI applications by giving you both the input and output guard
15:51 rails that I'm talking about. So limiting what goes in and limiting what
15:56 the agent can produce. And they provide a lot of different options for
15:59 guardrails. Like for example, one thing that you want to avoid quite often is
16:04 inserting any kind of PII, personally identifiable information, into a prompt
16:08 to an LLM, especially when it's going out to some model in the cloud like
16:12 Anthropic or Gemini instead of a local LLM. So limiting that kind of thing,
16:16 maybe detecting any vulgar language that's output from an LLM because they
16:20 will do that sometimes. Like those are just some examples of input and output
16:24 guard rails. And it is very easy to install this as a Python package and
16:29 bring these guards right into your code as you are interacting with your agents
16:32 like we saw earlier when I had that, you know, simple command line tool to talk
16:35 to the agent. Like I could just add a guard before or after that call to the
16:39 agent. So yeah, guardrails don't have to be complicated. There are tools like
16:42 this, even completely open-source ones like Guardrails AI that make it very
16:46 easy. Okay, so we've talked about guardrails and I gave you one example of
16:50 best practices for security in our codebase. But what about the other
16:54 million different vulnerabilities we have to account for in our codebase and
16:59 the dependencies we're bringing into our project? We can't expect ourselves to
17:03 become a security expert overnight. And so it's important to learn these things,
17:07 but also we can lean on existing tools to help us with this vulnerability
17:12 detection. There are a lot of options out there for this, but Snyk Studio is
17:16 one that I've been leaning on a lot recently. And they also have an MCP
17:21 server within the studio to help us handle vulnerability detection
17:25 automatically right within our coding process. So like always, I'm trying to
17:29 focus on open-source solutions for this video, but there's really no open-
17:33 source alternative to Snyk that I know about. This platform is incredible. So
17:38 in the Snyk Studio, we can set up these different projects and integrations. We
17:42 can have it analyze our codebase and dependencies for vulnerabilities in our
17:46 GitHub repositories. They have a CLI. We can do things locally. They have the MCP
17:49 server that I'm going to show you in a little bit. I'll link to all this in the
17:53 description. But yeah, the MCP server in particular is super cool to me because
17:57 we can have vulnerability detection built right into our AI coding
18:02 workflows. Now, so take a look at this. I have the Snyk MCP server connected
18:07 directly to my Claude Code after I went through the Snyk authentication process
18:10 in the CLI. And you can connect this to literally any AI coding assistant or MCP
18:15 client. So now within Claude Code I could build this into a full AI coding
18:18 workflow which is very cool. I'm going to show you a simple demo right now.
18:23 I'll just say, you know, use the Snyk MCP to analyze my code and dependencies
18:29 for vulnerabilities. And so it's able to leverage different tools within the MCP server to check for
18:37 both right like it's a very robust solution here. And so I'll let it go for
18:41 a little bit. I'll pause and come back once it has run the vulnerability
18:44 detection. Okay, this is so cool. Take a look at this. So, within my basic agent
18:49 repository, first it used the Snyk MCP server to analyze for any
18:53 vulnerabilities with my dependencies, things like Pydantic AI, for example.
18:59 And then it does a code scan. So, this would also detect things like if I had
19:03 my environment variables hardcoded like the example that I gave earlier. So, it
19:07 found three issues with my dependencies and nothing with my code, which I'm very
19:11 proud of. I got no issues with my code. And not only does it do the analysis,
19:15 but it gives me a summary and lists the actions I can take to remedy things.
19:18 Like here are the medium severity vulnerabilities that I have
19:22 within a few of my dependencies. Nothing in my code. And then it gives me
19:26 recommendations to fix things. And so I can go and say yes, action on this
19:30 now. And it's going to update my requirements.txt, fix these things. And I
19:34 could even run the Snyk MCP server again. And you can definitely see how
19:37 you'd build this kind of thing directly into the validation layer of your AI
19:42 coding workflow. Very, very neat for any AI agent or really any software you want
19:46 to build at all. Moving on, I want to talk about memory. Now, managing the
19:50 tokens that we're passing into the LLM calls for our agents. And this really is
19:54 a hot topic right now, especially with all the rate limiting that people are
19:58 getting with AI coding assistants like Claude Code. It really is important to
20:03 manage our context efficiently, only giving our agent the information it
20:07 actually needs and not completely bloating our system prompts with
20:10 thousands of lines of instruction and tools that it doesn't actually need.
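One of the tricks this section describes, a sliding window over recent messages, is essentially one slice in Python (the message dicts here are illustrative):

```python
# Sliding-window short-term memory: only the N most recent messages go into
# the context for the next LLM call. Slicing handles short histories fine,
# since history[-10:] on a 3-message list just returns all 3.
WINDOW = 10

def recent_context(history: list, n: int = WINDOW) -> list:
    return history[-n:]

history = [{"role": "user", "content": f"message {i}"} for i in range(25)]
context = recent_context(history)
# len(context) == 10; the oldest surviving message is "message 15"
```

Full history can still be kept on disk or in a database; the window only limits what is sent to the model each turn.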
20:14 That's what you want to avoid. And so, just a couple of simple tips here going
20:18 along with our theme. The first one is to keep your prompts very concise. Both
20:23 your system prompts and then also the tool descriptions that describe to your
20:27 agent when and how to use tools like I showed in the code earlier. You don't
20:31 need to over complicate it. That's why I have these templates for you like the
20:34 one for the system prompt, right? Like you have your goal just a couple of
20:38 sentences, your persona just a couple of sentences. Keep it very organized and
20:42 keeping it organized also helps you keep it quite concise. You don't need to
20:47 overthink it. And so keeping your system prompts to just a couple of hundred
20:51 lines at most is generally what I recommend. Some solutions might need
20:54 more, but that's when I'd start to question like could you really make that
20:58 more concise or split it into different specialized agents so each agent still
21:03 has a simple system prompt. Another thing you can do for agents that have
21:07 longer conversations is you can limit kind of in a sliding window to the 10 or
21:12 20 most recent messages, for example, that you actually include in the
21:15 context. And going back to the code, I'll even show you what that looks like
21:19 here. Like right now when we call our agent, we run it, we're passing in the
21:22 entire conversation history. But in Python, if I wanted to include just the
21:26 last 10 messages, I could do something like this. And so now maybe like, you
21:30 know, all previous messages aren't really as relevant anymore. We just want
21:34 to include the most recent 10. That's how we can do that. So that's another
21:37 really popular strategy. Also, tools like n8n have that as an option baked
21:41 directly into their short-term memory nodes. So very useful to know. And then
21:46 also when you start to have so much information about a single user that you
21:51 don't want to include it in the short-term memory, that's when you can
21:55 look at long-term memory. But also, don't build it from scratch. Again,
21:59 don't over complicate it. There are tools that you can use just like with
22:03 security to help us with long-term memory, and Mem0 is one of those. Mem0 is
22:09 a completely open-source long-term memory agentic framework. And so I'll
22:12 show the GitHub in a second here, but yeah, when you have so much information
22:16 about a user that you can't just include it all in context, you need some way to
22:21 search through a longer term set of memories and bring only the ones in that
22:24 are relevant to the current conversation, which actually does use
22:27 RAG under the hood, by the way. So again, another example why it's such an
22:32 important capability. Um, but yeah, basically you're able to pull core
22:36 memories from conversations and store it to be searched later. That's what
22:44 Mem0 offers us. And it's so easy to include in our Python code, just like
22:48 Guardrails AI. I'll show you an example really quickly in their quick start. You
22:48 install it as a Python package and then you basically have a function to search
22:53 for memories like performing RAG to find memories related to the latest message
22:57 and then you have a function to add memories. And so it'll use a large
23:01 language model to extract the key information to store to be retrieved
23:05 later. And so this definitely solves the context problem because now you're able
23:09 to basically have infinite memory for an agent, but you don't have to give it all
23:13 to the LLM at once. It just retrieves things as needed. And of course, the
23:16 last thing I want to hit on for context is what not to focus on when you're
23:21 first building your agent. Do not worry about advanced memory compression
23:24 techniques. There's a lot of cool things that Anthropic especially has been doing
23:27 research on, but like don't worry about that. Don't worry about specialized sub
23:31 agents. These are both solutions to handle the memory problem when it starts
23:36 to get really really technical. But right now, just start simple and you can
23:40 always optimize things as you're starting to expand your agent and go to
23:44 production and you hit some limits. But right now, focusing on these things up
23:49 front is all you need to get through the first 90%, probably even beyond depending on
23:54 how simple your agents are. And context was the last of the four core components
23:57 of agents. So, we've covered the core four and security. Now, I want to talk a
24:02 bit about observability and deployment. Getting our agent ready for production.
24:06 And I will say that security, observability, and deployment definitely
24:10 go a lot more into the last 10% of building an agent. But I want to touch
24:13 on them here because there are some ways to design stuff up front very simply,
24:18 especially with observability. I want to introduce you to Langfuse right now. And
24:22 I covered this on my channel already. Link to a video right here on Langfuse
24:25 if you want to dive more into observability. But we can set up the
24:29 ability to watch for the actions that our agent is taking, view them in a
24:34 dashboard. We can do things like testing different prompts for our agents. It is
24:38 a beautiful platform and it's actually super easy to incorporate into our code.
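Per Langfuse's documentation, the connection is typically configured through environment variables like these (key values elided; the host is cloud.langfuse.com or your self-hosted instance):

```
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_HOST=https://cloud.langfuse.com
```

The setup function described next reads values like these, which is why no further Langfuse code appears in the agent itself.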
24:43 And so I did this very sneakily already when I built the agent with you, but I
24:47 have this function here called setup observability. And all it does is it
24:53 initializes Langfuse based on some environment variables that I have set
24:56 here. And I cover all that in my YouTube video on Langfuse if you're curious. But
24:59 you basically just connect to your Langfuse instance. And then after you
25:05 set up the connection and instrument your agent, your Pydantic AI agent, for
25:09 observability, that is all you have to do. Literally no more code in here for
25:13 Langfuse. And it's going to watch for all of our agent executions, even
25:17 getting a sense of the tool calls that it's making under the hood. So take a
25:20 look at this. So I'm in the Langfuse dashboard now where I can view that
25:25 execution that we had from our test earlier where it used the add numbers
26:29 function and we have all of this very rich data around the number of tokens
26:33 that it used, the latency. We can view the tools and also look at the different
25:37 parameters that we have like the tool arguments like for the numbers to add.
25:41 We can view the system prompt that was leveraged here based on that template we
25:45 have defined. We have all this observability that also really helps for
25:49 monitoring our agents in production when other users are leveraging the agent,
25:53 since we can't just look at our own chat and see how the agent is performing. And
25:57 there's so many other things within Langfuse as well that I don't want to get
26:01 into right now, like evals for your agent. It is a totally open-source platform,
26:06 just like Mem0 and Guardrails AI. So again, focusing on open source a lot in
26:10 this video. There are other solutions for this kind of observability like
26:14 Helicone and LangSmith, for example, but Langfuse is the one that I love using.
26:18 And I know I didn't cover it too much in the code, but it really is as simple as
26:21 what I showed you. And so you can use the repository that I have linked below
26:25 as your template to like start an agent with observability baked right in if
26:29 you're interested. And then the very last component that I want to at least
26:33 touch on right now is how you can configure your agent upfront to work
26:37 well for deployment when you're ready to take your agent into production. Now,
26:41 obviously that's going to be part of the last 10%. Not something I'm going to
26:44 talk about a lot in this video, but the one big golden nugget that I want to
26:49 give you here is you should always think about how you can build your AI agent to
26:55 run as a Docker container. Docker is my method for packaging up any application
27:00 especially AI agents, that I want to deploy to the cloud. And also, I will say
27:06 that AI coding assistants are very good at setting up Docker configuration, like
27:10 your Dockerfiles and Docker Compose files. Yeah. So leverage those and then
27:14 you can add, you know, like a simple Streamlit application with Python or
27:18 build a React front end to create a chat interface for your agent if it is a
27:22 conversationally driven agent. Or otherwise, what I like to do for more, you
27:26 know, like background agents that run on a data set periodically, I'll run it just
27:29 as a serverless function. So it's kind of like: background agent, run it as
27:33 serverless in a Docker container; conversational agent, run it in a
27:37 Docker container also with a front-end application. That's pretty much the
27:41 two tracks I have for any agent that I want to deploy. So yeah, just think like
27:45 Docker native. Have that in your mind from the get-go when you're building
27:48 your agent. What you don't want to focus on for observability and deployment and
27:52 everything production ready is Kubernetes orchestration, extensive LLM
27:57 evals, or prompt A/B testing. Like some of the things we have in Langfuse that are
28:00 very powerful when you want to super refine your agent tools and system
28:03 prompt and everything like don't even worry about that yet. You can definitely
28:07 get there and like I said core part of the last 10%. But right now also don't
28:12 even think about like the infrastructure that much because unless you're running
28:15 local large language models, you don't really need heavy infrastructure for
28:19 your agents at all. Like obviously it depends on the amount of usage of your
28:23 agent. But for most use cases, just like a couple of vCPUs and a few gigabytes of
28:28 RAM is all you need to run an AI agent even if you have a front-end application
28:33 as well. Very, very lightweight as long as you are calling a third party for the
28:38 large language model, like OpenRouter or, you know, Anthropic or Gemini, whatever.
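Thinking Docker-native from the start can be as little as this; a minimal sketch assuming the agent lives in `agent.py` with a `requirements.txt` (both names are assumptions based on the files shown in the video):

```dockerfile
# Minimal image for a Python agent; API keys are injected at run time
# (e.g. docker run -e OPENROUTER_API_KEY=...), never baked into the image.
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "agent.py"]
```

The same image works for both tracks: run it directly for a conversational agent with a front end, or hand it to a serverless container platform for a background agent.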
3:06 you can really see what I'm talking about. So when you're building the very
3:10 core of your AI agent, it's really just three steps. You need to pick a large
3:14 language model, write a basic system prompt as the agent instructions, and
3:18 then add your first tool because you need a tool otherwise it's really just a
3:22 regular large language model, not an agent. And so for picking a large
3:26 language model, I would highly recommend using a platform called Open Router
3:29 because it gives you access to pretty much any large language model you could
3:35 possibly want. And so Claude Haiku 4.5 is the general one that I use just as
3:38 I'm prototyping my AI agents, but you could use GPT 5 Mini. You could use an
3:43 open source model like DeepSeek, for example. Like all of them are available
3:47 on this platform. And then when creating your system prompt, you just want to
3:50 define your agent's role and behavior. And you can refine this over time as
3:53 well, just starting really simple and then adding your first tool. Like you
3:57 can give it access to search the web. You can give it the ability to perform
4:01 mathematical computations with a calculator tool. Like literally whatever
4:04 it is, just start simple and then once you have this foundation, that's when
4:08 you can build on more capabilities and integrations. And I want to show you
4:11 more than just theory as well. Like let's actually go and build an AI agent
4:15 right now so you can see practically how dead simple it really is. And I'll have
4:19 a link to this repo in the description as well if you want to dive into this
4:23 extremely basic agent that's covering all of the components in this video,
4:27 even some things we'll talk about in a bit like observability. So you can get
4:30 this up and running yourself, even use this as a template for your first agent
4:35 if you want. And so I'm going to build it from scratch with you right now, like
4:38 show you line by line how simple this really is. It's going to be less than 50
4:42 lines in the end, just like I promised in the slide. And so first I'm going to
4:46 import all of the Python dependencies. I'm using Pydantic AI since it's my
4:51 favorite AI agent framework, but it really doesn't matter the one that you
4:54 use. The principles that I'm covering in this video apply no matter how you're
4:57 building your agents, even if it's with a tool like n8n because what I'm
5:01 focusing on here is just defining our four core components. LLM, tools,
5:07 memory, and a system prompt. And so the first thing I'm going to do is define
5:11 the large language model that I want to leverage. And just like I talked about a
5:15 little bit ago, I'm using OpenRouter. So right now I'm going to use Claude
5:19 Haiku 4.5 as my model. But literally just changing this line or just changing
5:24 my environment variable here. A single line change. I can swap to any model I
5:29 want like Gemini or DeepSeek or OpenAI. It's that easy. After I have my LLM
5:34 defined, now I define the agent itself including the system prompt, the
5:38 high-level instructions. And so I'm importing this from a separate file.
5:42 I'll just show you a very very basic example of a system prompt here and then
5:45 more on this in a little bit. The core components that I generally include,
5:50 including the persona, goals, tool instructions, the output format like how
5:54 it communicates back to us, and then also any other miscellaneous instructions I
5:57 want to include. So I have this saved here. Now this is a part of my agent
6:01 that I've defined. And so the next thing that we need to add is a tool to really
6:06 turn it from an LLM or a chatbot into a full-fledged agent. And the way that you
6:11 do that with most AI agent frameworks is you define a Python function like very
6:16 simply and then you add what is called a decorator. This signals to Pydantic AI that
6:21 this function right here I want to take and attach to my agent as a capability
6:26 that it can now invoke. And so the agent defines these parameters when it calls
6:30 the tool. So like in this case this is a very basic tool to add two numbers
6:34 together because large language models, as token prediction machines, actually
6:38 suck at math. Interesting fact. And so it defines these parameters and it
6:42 leverages this dock string as it's called like this comment is included as
6:47 a part of the prompt to the LLM because it defines when and how to leverage this
6:51 tool which in this case the functionality is very basic just adding
6:54 two numbers together. But this could be a tool to search the web based on a
6:59 query it defines create an event in our calendar based on a time range and title
7:02 that it defines right like all those things are parameters and then we
7:05 perform the functionality for the agent based on that. That is the tool that we
7:09 got for the agent. And that is really good. We've created our agent and added
7:13 the tools. The only thing we have to do now is set up a way to interact with it.
7:17 So I'm going to create a very basic command line interface here. We start
7:21 with empty conversation. This is where we'll add memory, which is the fourth
7:25 component of agents. And so in an infinite loop here, we're getting the
7:28 input from the user. Uh and we're exiting the program if they say exit.
7:32 Otherwise, we are going to call the agent. So it's very simply agent.run run
7:37 with the user's latest message and passing in for short-term memory the
7:41 conversation history so it knows what we said to each other up until this point
7:46 and then I'm going to add on to the conversation history everything that we
7:50 just said and then print out the agents latest response. Take a look at that.
7:53 And then even after we call our main function here, we are still below 50
7:59 lines of code. It is that easy to define our agents. And obviously there's so
8:03 many more things that we have to do to really get our agent to the point where
8:06 it's production ready. But again, I just want to focus on making it dead simple
8:10 for you right now. And I know that a lot of this might be review for you if you
8:14 built agents in the past. But especially if you have built a lot of AI agents
8:18 already, you're probably like me where a lot of times you just overcomplicate
8:22 things cuz you know how much can go into building agents. That's what I'm trying
8:25 to do is just draw you back to the fundamentals because you need to keep
8:29 things simple when you're first creating any agent really any software at all.
8:33 And so yeah, we can go into the terminal now and interact with our agent. So I'm
8:37 going to run agent.py here. Everything that we just built, I can say hello to
8:41 get a super simple response back here. And then I can say for example, what is
8:45 and I'll just do a couple of bigger numbers that I want to add together. And
8:49 so here it knows thanks to the tool description that it should use the add
8:54 numbers tool that we gave it to produce this sum. There we go. Take a look at
8:58 that. And I can even say did you use the tool, right? And it should say yes. Like
9:01 it actually recognizes based on the conversation history that it used the ad
9:05 numbers tool. Okay, perfect. So we got this agent with conversation history. It
9:09 knows when to use this tool. And now at this point we can start to expand the
9:13 tools that we give it. We can refine our system prompt, play around with
9:16 different LLMs. And I want to talk about that as well. Now, starting with large
9:20 language models, choosing your LLM, like I was saying when I was building the
9:24 agent, Claude Haiku 4.5 is the one that I recommend just a cheap and fast option
9:28 that's really good for building proof of concepts when I don't want to spend a
9:31 lot of money on tokens as I'm iterating on my agent initially. And then Claude
9:36 Sonnet 4.5 is generally the best all-around right now. This might change
9:40 in literally a week and people have different opinions. The main thing that
9:44 I want to communicate here is don't actually worry about picking the perfect
9:47 LLM up front, especially when you're using a platform like OpenRouter where
9:52 it makes it so easy to swap between LLMs. Even if you're not using
9:56 OpenRouter, it still is really easy. And then if you want a local model for
10:00 privacy reasons or you want to be 100% free running on your hardware, then
10:04 Mistral Small 3.1 or Qwen 3 are the ones that I recommend right now. And if you
10:08 haven't ever tried OpenRouter or a tool like it that really just routes you
10:12 between the different LLM providers, I would highly recommend trying one
10:15 because it makes it so easy to iterate on the LLM for your agent, giving you
10:19 instant access. Take a look at this. We got Grok, Anthropic, Gemini, we've
10:25 got the GPT models, we've got Qwen 3, all the open-source ones. No matter
10:28 what you want to experiment with, you've got it here. And so just use this as
10:32 your tool to iterate on the LLM very quickly and just not have to think about
10:36 it that much. And then for the system prompt component, I promised I would
10:39 dive a little bit more into the different categories that I have. So
10:41 that's what I want to talk about very quickly. It can be especially easy to
10:46 overthink the system prompt because it's just such a broad problem to solve of
10:50 like what should the top level instruction set be for my agent? And so
10:54 I like to keep things simple by working off of a template that I use for all of
10:58 my AI agents at least as a starting point. I always have persona and goals,
11:02 tool instructions and examples, output format, and miscellaneous instructions.
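As a sketch, that template filled in for a simple agent might look like this. The section headings are the four categories above; the contents are placeholder examples, not the exact prompt from the video:

```python
# A filled-in skeleton of the system prompt template: persona and goals,
# tool instructions and examples, output format, and misc instructions.
SYSTEM_PROMPT = """
# Persona and Goals
You are a helpful task management assistant. Your goal is to help the
user create, update, and organize their tasks.

# Tool Instructions and Examples
Use the `add_numbers` tool whenever the user asks for a sum.
Example: "What is 4123 + 9876?" -> call add_numbers(4123, 9876).

# Output Format
Respond in short, plain-language sentences. Avoid jargon.

# Miscellaneous Instructions
If a request is ambiguous, ask one clarifying question before acting.
"""
```

A couple of sentences per section really is enough to start; you refine each section as you manually test the agent.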
11:07 And what you shouldn't worry about at this point is setting up elaborate
11:12 prompt evaluations or split testing your system prompts. You can get into that
11:15 when you really want to refine your agent instructions. But right now, just
11:20 keep it simple and refine at a high level as you are manually testing your
11:24 agent. And if you want to see that system prompt template in action, I've
11:27 got you covered. I'll have a link to this in the description as well. It's a
11:32 real example of me filling out those different sections, creating a system
11:36 prompt for a task management agent. So, I have my persona defined here. I'm
11:40 defining the goals for the task management agent. The tool instructions
11:44 like how I can use different tools together to manage tasks in my platform.
11:49 The output format, just specifying ways that I want it to communicate back to me
11:53 or things to avoid. Some examples. Now, this applies more to complex agents
11:57 and system prompts where you actually want to give an example of a
12:01 workflow of chaining different tools together, so it doesn't really apply
12:03 here. And then the last thing is just miscellaneous instructions. This is also
12:08 the place to go to add in extra instructions to fix those little issues
12:12 you see with your agent that don't necessarily fit into the other categories. So,
12:15 a catchall to make sure that there's a place to put anything as you're
12:19 experimenting with your agent and refining your system prompt. And then as
12:23 far as tools go for your AI agents, there's just a few things I want to
12:26 cover quickly to help you keep things simple and focused. The first is that
12:31 you should keep your tools to under 10 for your AI agents, at least when
12:34 starting out. And you definitely want to make sure that each tool's purpose is
12:38 very distinct. Because if your tools have overlapping functionality or if you
12:42 have too many, then your large language model starts to get overwhelmed with all
12:46 the possibilities of its capabilities and it'll use the wrong tools. It will
12:51 forget to call tools, and it's just a mess. Like definitely keep it to under
12:56 10. And then also MCP servers are a great way to find prepackaged sets of
13:00 tools you can bring into your agent when you're creating
13:02 something initially and you just want to move very quickly. And so definitely
13:06 based on what you're building, you'll probably be able to find an MCP server
13:10 that gives you some functionality right out of the box for your agents. And then
13:14 the last thing I'll say is a lot of people ask me, "What capabilities should
13:18 I focus on learning first when I'm building agents?" And tools and RAG is
13:24 always the answer that I have for them. Giving your AI
13:28 agent tools that allow it to search your documents and knowledge base:
13:31 that's what retrieval augmented generation is. And so really, it's
13:35 giving your agents the ability to ground their responses in real data. And I
13:40 would say that probably over 80% of AI agents running out in the wild right
13:44 now, no matter the industry or niche, are using RAG to some extent as part of
13:49 the capabilities for the agent. And then continuing with our theme here, what not
13:55 to focus on when building tools is don't worry about multi-agent systems or
13:59 complex tool orchestration yet. When you have a system that starts
14:03 to have more than 10 tools, that is generally when you start to split into
14:07 specialized sub-agents and you have routing between them. Those kinds of
14:12 systems are powerful and necessary for a lot of applications, but definitely
14:15 overengineering when you're just getting started creating your agent or a system.
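Going back to keeping each tool's purpose distinct: here's a plain-Python sketch of what that looks like in practice. Most agent frameworks, Pydantic AI included, pull the tool description the LLM sees straight from the function's docstring, so a sharp docstring is how you keep the model from picking the wrong tool:

```python
def add_numbers(a: float, b: float) -> float:
    """Add exactly two numbers together and return their sum.

    Use this tool whenever the user asks for the sum of two numbers.
    Do NOT use it for subtraction, multiplication, or anything else;
    one distinct purpose per tool keeps the LLM from getting confused.
    """
    return a + b
```

Registering it is usually just a decorator or a list entry, depending on your framework; the docstring is the part that does the work.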
14:18 Also, if you want to learn more about RAG and building that into your agents,
14:21 check out the video that I'll link to right here. I cover that all of the time
14:25 on my channel because it is so important. And so with that, moving on
14:29 to the next thing, we have our security essentials because it is important to
14:32 think about security when you're building any software upfront. But I
14:40 don't want you to overcomplicate it yet, right? Like don't become a security
14:40 expert overnight. There are existing tools out there to help us with
14:43 security. So we can still move quickly as we're building our agent initially.
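A first principle worth baking in from day one is keeping API keys out of your code. A minimal sketch, where the `OPENROUTER_API_KEY` variable name is just an assumed example, loaded from your environment or a `.env` file:

```python
import os

def get_api_key() -> str:
    # Read the key from the environment instead of hardcoding it.
    # Never do: api_key = "sk-or-v1-abc123..."  (that ends up in version control!)
    key = os.environ.get("OPENROUTER_API_KEY")
    if not key:
        raise RuntimeError(
            "OPENROUTER_API_KEY is not set; add it to your environment "
            "or a .env file (and add .env to .gitignore)."
        )
    return key
```

The same pattern applies whatever provider key you're using; the point is that the secret never appears in the source.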
14:47 We'll definitely want to pay more attention to security when we're going
14:50 into production. But at first there are a couple of tools that I want to call
14:54 out here. And then just some general principles to follow. Like for example,
14:58 don't hardcode your API keys, right? Like you don't want to have your OpenAI
15:02 or Anthropic API key just sitting there right in your code or your n8n workflow,
15:06 for example. You always want to store that in a secure way through things like
15:12 environment variables. And then also when we think about building AI agents
15:15 in particular, there's a lot of security that we want to implement through what
15:18 are called guardrails, right? So limiting what kind of information can
15:22 come into the large language model and then also limiting the kinds of
15:27 responses that the agent can give and having it like actually retry if it
15:31 produces any kind of response that isn't acceptable for us. And there's a super
15:36 popular open-source repository that I lean on all the time to help with
15:39 guardrails, very creatively called Guardrails AI. It's a Python
15:43 framework (I always love building my AI agents with Python) that helps you
15:48 build reliable AI applications by giving you both the input and output
15:51 guardrails that I'm talking about. So limiting what goes in and limiting what
15:56 the agent can produce. And they provide a lot of different options for
15:59 guardrails. Like for example, one thing that you want to avoid quite often is
16:04 inserting any kind of PII, personally identifiable information into a prompt
16:08 to an LLM, especially when it's going out to some model in the cloud like
16:12 Anthropic or Gemini instead of a local LLM. So limiting that kind of thing,
16:16 maybe detecting any vulgar language that's output from an LLM, because they
16:20 will do that sometimes. Like those are just some examples of input and output
16:24 guardrails. And it is very easy to install this as a Python package and
16:29 bring these guards right into your code as you are interacting with your agents
16:32 like we saw earlier when I had that, you know, simple command line tool to talk
16:35 to the agent. Like I could just add a guard before or after that call to the
16:39 agent. So yeah, guardrails don't have to be complicated. There are tools like
16:42 this, even completely open-source ones like Guardrails AI, that make it very
16:46 easy. Okay, so we've talked about guardrails and I gave you one example of
16:50 best practices for security in our codebase. But what about the other
16:54 million different vulnerabilities we have to account for in our codebase and
16:59 the dependencies we're bringing into our project? We can't expect ourselves to
17:03 become security experts overnight. And so it's important to learn these things,
17:07 but also we can lean on existing tools to help us with this vulnerability
17:12 detection. There are a lot of options out there for this, but Snyk Studio is
17:16 one that I've been leaning on a lot recently. And they also have an MCP
17:21 server within the studio to help us handle vulnerability detection
17:25 automatically right within our coding process. So like always, I'm trying to
17:29 focus on open-source solutions for this video, but there's really no
17:33 open-source alternative to Snyk that I know about. This platform is incredible. So
17:38 in Snyk Studio, we can set up these different projects and integrations. We
17:42 can have it analyze our codebase and dependencies for vulnerabilities in our
17:46 GitHub repositories. They have a CLI. We can do things locally. They have the MCP
17:49 server that I'm going to show you in a little bit. I'll link to all this in the
17:53 description. But yeah, the MCP server in particular is super cool to me because
17:57 we can have vulnerability detection built right into our AI coding
18:02 workflows. Now, take a look at this. I have the Snyk MCP server connected
18:07 directly to my Claude Code after I went through the Snyk authentication process
18:10 in the CLI. And you can connect this to literally any AI coding assistant or MCP
18:15 client. So now within Claude Code I could build this into a full AI coding
18:18 workflow which is very cool. I'm going to show you a simple demo right now.
18:23 I'll just say, you know, use the Snyk MCP to analyze my code and dependencies
18:29 for vulnerabilities. And so it's able to leverage different tools within the MCP server to check for
18:37 both right like it's a very robust solution here. And so I'll let it go for
18:41 a little bit. I'll pause and come back once it has run the vulnerability
18:44 detection. Okay, this is so cool. Take a look at this. So, within my basic agent
18:49 repository, first it used the Snyk MCP server to analyze for any
18:53 vulnerabilities with my dependencies, things like Pydantic AI, for example.
18:59 And then it does a code scan. So, this would also detect things like if I had
19:03 my API keys hardcoded like the example that I gave earlier. So, it
19:07 found three issues with my dependencies and nothing with my code, which I'm very
19:11 proud of. I got no issues with my code. And not only does it do the analysis,
19:15 but it gives me a summary and lists the actions I can take to remedy things.
19:18 Like here are the medium-severity vulnerabilities that I have
19:22 within a few of my dependencies. Nothing in my code. And then it gives me
19:26 recommendations to fix things. And so I can go and say, yes, action on this
19:30 now. And it's going to update my requirements.txt to fix these things. And I
19:34 could even run the Snyk MCP server again. And you can definitely see how
19:37 you'd build this kind of thing directly into the validation layer of your AI
19:42 coding workflow. Very, very neat for any AI agent or really any software you want
19:46 to build at all. Moving on, I want to talk about memory now: managing the
19:50 tokens that we're passing into the LLM calls for our agents. And this really is
19:54 a hot topic right now, especially with all the rate limiting that people are
19:58 getting with AI coding assistants like Claude Code. It really is important to
20:03 manage our context efficiently, only giving our agent the information it
20:07 actually needs and not completely bloating our system prompts with
20:10 thousands of lines of instruction and tools that it doesn't actually need.
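One concrete way to keep that context lean, which I'll come back to in a moment as the sliding window, is a plain list slice over the conversation history. A sketch, assuming the usual list of role/content message dicts:

```python
def trim_history(conversation_history, max_messages=10):
    # Sliding window: keep only the most recent messages so the context
    # passed to the LLM stays small as the chat grows.
    return conversation_history[-max_messages:]

# Example: 25 messages accumulated, but only the last 10 go to the agent.
history = [{"role": "user", "content": f"message {i}"} for i in range(25)]
recent = trim_history(history, max_messages=10)
```

That one slice is the whole technique; frameworks just wrap it in a nicer option.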
20:14 That's what you want to avoid. And so, just a couple of simple tips here going
20:18 along with our theme. The first one is to keep your prompts very concise: both
20:23 your system prompts and then also the tool descriptions that describe to your
20:27 agent when and how to use tools like I showed in the code earlier. You don't
20:31 need to over complicate it. That's why I have these templates for you like the
20:34 one for the system prompt, right? Like you have your goal just a couple of
20:38 sentences, your persona just a couple of sentences. Keep it very organized and
20:42 keeping it organized also helps you keep it quite concise. You don't need to
20:47 overthink it. And so keeping your system prompts to just a couple of hundred
20:51 lines at most is generally what I recommend. Some solutions might need
20:54 more, but that's when I'd start to question like could you really make that
20:58 more concise or split it into different specialized agents so each agent still
21:03 has a simple system prompt. Another thing you can do for agents that have
21:07 longer conversations is you can limit the context, kind of in a sliding window, to the 10 or
21:12 20 most recent messages, for example, that you actually include in the
21:15 context. And going back to the code, I'll even show you what that looks like
21:19 here. Like right now when we call our agent, we run it, we're passing in the
21:22 entire conversation history. But in Python, if I wanted to include just the
21:26 last 10 messages, I could do something like this. And so now maybe like, you
21:30 know, all previous messages aren't really as relevant anymore. We just want
21:34 to include the most recent 10. That's how we can do that. So that's another
21:37 really popular strategy. Also, tools like N8N have that as an option baked
21:41 directly into their short-term memory nodes. So very useful to know. And then
21:46 also when you start to have so much information about a single user that you
21:51 don't want to include it in the short-term memory, that's when you can
21:55 look at long-term memory. But also, don't build it from scratch. Again,
21:59 don't over complicate it. There are tools that you can use just like with
22:03 security to help us with long-term memory, and Mem0 is one of those. Mem0 is
22:09 a completely open-source long-term memory agentic framework. And so I'll
22:12 show the GitHub in a second here, but yeah, when you have so much information
22:16 about a user that you can't just include it all in context, you need some way to
22:21 search through a longer term set of memories and bring only the ones in that
22:24 are relevant to the current conversation, which actually does use
22:27 RAG under the hood, by the way. So again, another example why it's such an
22:32 important capability. But yeah, basically you're able to pull core
22:36 memories from conversations and store it to be searched later. That's what
22:40 Mem0 offers us. And it's so easy to include in our Python code, just like
22:44 Guardrails AI. I'll show you an example really quickly from their quick start. You
22:48 install it as a Python package and then you basically have a function to search
22:53 for memories, like performing RAG to find memories related to the latest message,
22:57 and then you have a function to add memories. And so it'll use a large
23:01 language model to extract the key information to store to be retrieved
23:05 later. And so this definitely solves the context problem because now you're able
23:09 to basically have infinite memory for an agent, but you don't have to give it all
23:13 to the LLM at once. It just retrieves things as needed. And of course, the
23:16 last thing I want to hit on for context is what not to focus on when you're
23:21 first building your agent. Do not worry about advanced memory compression
23:24 techniques. There's a lot of cool things that Anthropic especially has been doing
23:27 research on, but like don't worry about that. Don't worry about specialized sub
23:31 agents. These are both solutions to handle the memory problem when it starts
23:36 to get really really technical. But right now, just start simple and you can
23:40 always optimize things as you're starting to expand your agent and go to
23:44 production and you hit some limits. But right now, focusing on these things up
23:49 front is all you need to get through the first 90%, probably even beyond depending on
23:54 how simple your agents are. And context was the last of the four core components
23:57 of agents. So, we've covered the core four and security. Now, I want to talk a
24:02 bit about observability and deployment. Getting our agent ready for production.
24:06 And I will say that security, observability, and deployment definitely
24:10 go a lot more into the last 10% of building an agent. But I want to touch
24:13 on them here because there are some ways to design stuff up front very simply,
24:18 especially with observability. I want to introduce you to Langfuse right now. And
24:22 I covered this on my channel already. Link to a video right here on Langfuse
24:25 if you want to dive more into observability. But we can set up the
24:29 ability to watch for the actions that our agent is taking, view them in a
24:34 dashboard. We can do things like testing different prompts for our agents. It is
24:38 a beautiful platform and it's actually super easy to incorporate into our code.
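As a rough sketch of what that incorporation can look like: the environment variable names below are the ones Langfuse documents, but the exact client and instrumentation calls depend on your SDK version, so treat this as a hedged outline rather than the definitive setup.

```python
import os

def setup_observability() -> bool:
    """Initialize Langfuse tracing if it's configured; no-op otherwise.

    A hedged sketch: LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, and
    LANGFUSE_HOST are the variables the Langfuse SDK reads, but check
    the Langfuse docs for the exact client/instrumentation calls.
    """
    if not (os.environ.get("LANGFUSE_PUBLIC_KEY")
            and os.environ.get("LANGFUSE_SECRET_KEY")):
        return False  # not configured; the agent still runs, just untraced
    try:
        from langfuse import Langfuse  # real package, import guarded for the sketch
        Langfuse()  # the client picks up the env vars above
        return True
    except ImportError:
        return False
```

Call it once at startup and then forget about it; the agent code itself doesn't change.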
24:43 And so I did this very sneakily already when I built the agent with you, but I
24:47 have this function here called setup observability. And all it does is it
24:53 initializes langfuse based on some environment variables that I have set
24:56 here. And I cover all that in my YouTube video on Langfuse if you're curious. But
24:59 you basically just connect to your Langfuse instance. And then after you
25:05 set up the connection and instrument your agent, your Pydantic AI agent, for
25:09 observability, that is all you have to do. Literally no more code in here for
25:13 Langfuse. And it's going to watch for all of our agent executions, even
25:17 getting a sense of the tool calls that it's making under the hood. So take a
25:20 look at this. So I'm in the Langfuse dashboard now where I can view that
25:25 execution that we had from our test earlier where it used the add numbers
25:29 function and we have all of this very rich data around the number of tokens
25:33 that it used the latency. We can view the tools and also look at the different
25:37 parameters that we have like the tool arguments like for the numbers to add.
25:41 We can view the system prompt that was leveraged here based on that template we
25:45 have defined. We have all this observability that also really helps for
25:49 monitoring our agents in production when other users are leveraging the agent
25:53 and we can't just look at our chat to see how the agent is performing. And
25:57 there's so many other things within Langfuse as well that I don't want to get
26:01 into right now, like evals for your agent. It is a totally open-source platform
26:06 just like Mem0 and Guardrails AI. So again, focusing on open source a lot in
26:10 this video. There are other solutions for this kind of observability like
26:14 Helicone and LangSmith, for example, but Langfuse is the one that I love using.
26:18 And I know I didn't cover it too much in the code, but it really is as simple as
26:21 what I showed you. And so you can use the repository that I have linked below
26:25 as your template to like start an agent with observability baked right in if
26:29 you're interested. And then the very last component that I want to at least
26:33 touch on right now is how you can configure your agent upfront to work
26:37 well for deployment when you're ready to take your agent into production. Now,
26:41 obviously that's going to be part of the last 10%. Not something I'm going to
26:44 talk about a lot in this video, but the one big golden nugget that I want to
26:49 give you here is you should always think about how you can build your AI agent to
26:55 run as a Docker container. Docker is my method for packaging up any application
27:00 especially AI agents, that I want to deploy to the cloud. And also, I will say
27:06 that AI coding assistants are very good at setting up Docker configuration, like
27:10 your Dockerfiles and Docker Compose files. So leverage those, and then
27:14 you can add a simple Streamlit application with Python or
27:18 build a React front end to create a chat interface for your agent if it is a
27:22 conversationally driven agent. Otherwise, what I like to do for
27:26 background agents that run on a data set periodically is run it just
27:29 as a serverless function. So it's kind of like: background agent, run it as
27:33 serverless in a Docker container; conversational agent, run it in a
27:37 Docker container with a front-end application. That's pretty much the
27:41 two tracks I have for any agent that I want to deploy. So yeah, just think
27:45 Docker native. Have that in your mind from the get-go when you're building
27:48 your agent. What you don't want to focus on for observability and deployment and
27:52 everything production-ready is Kubernetes orchestration, extensive LLM
27:57 evals or prompt AB testing. Like some of the things we have in Langfuse that are
28:00 very powerful when you want to super refine your agent tools and system
28:03 prompt and everything like don't even worry about that yet. You can definitely
28:07 get there and like I said core part of the last 10%. But right now also don't
28:12 even think about like the infrastructure that much because unless you're running
28:15 local large language models, you don't really need heavy infrastructure for
28:19 your agents at all. Like obviously it depends on the amount of usage of your
28:23 agent. But for most use cases, just like a couple of vCPUs and a few gigabytes of
28:28 RAM is all you need to run an AI agent even if you have a front-end application
28:33 as well. Very, very lightweight, as long as you are calling a third party for the
28:38 large language model, like OpenRouter or Anthropic or Gemini, whatever
28:42 that might be. So there you go that's everything that I have for you today
28:46 helping you just keep it simple which will not just help you build better
28:49 agents even when you have to scale complexity but it'll also just help you
28:53 get over that hurdle of motivation because I'm giving you permission to not
28:57 be perfect at first. You just start with the foundations like I showed you and
29:01 then build on top and iterate as you need. And so I hope that inspires you to
29:05 just go and build your next AI agent right now because it can be super simple
29:10 to start. And so with that, if you appreciate this video and you're looking
29:13 forward to more things on building AI agents and using AI coding assistants,
29:17 I'd really appreciate a like and a subscribe. And with that, I will see you
$

Learn 90% of Building AI Agents in 30 Minutes

@ColeMedin 29:20 15 chapters
// chapters
// description

Not only have I built hundreds of AI agents myself, I've seen other people build thousands of AI agents for every use case under the sun. The people who are the most successful are the ones who don't overcomplicate it - and I want that to be you too. It's easy to think building AI agents is super complicated, but honestly you can learn 90% of what you need to know (and what to focus on) from this video. No matter how you're building your agents, I'll show you here what you need to think about,

now: 0:00
// tags
[AI agents and automation][developer tools and coding][content creation and YouTube][marketing and growth hacking][productivity and workflows]