// transcript — 1078 segments
0:00 Master the First 90% of Building AI Agents
0:02 Not only have I built hundreds of AI agents myself, I've seen other people
0:07 build thousands for every use case under the sun. And those who are the most
0:11 successful are the ones who don't overcomplicate things. And in this video, I want
0:15 to show you how that can be you as well. Because here's the thing, and I see this all
0:20 of the time. When people first think about building their AI agents, their
0:24 perfectionism kicks in and they worry about creating the perfect system
0:28 prompt, defining the perfect tools, thinking about the LLM they want to use.
0:31 They consider the context and observability and latency and security
0:35 and deployment. They get overwhelmed with everything. And that might be you
0:38 as well. So, what I have to say to you right now is take a deep breath. That is
0:43 why I'm here. Honestly, you can learn 90% of what you need to build AI agents
0:48 from just this video. And that, my friend, is what I have for you in this
0:52 video. I want to cover each of the core components of building agents like
0:56 system prompts, tools, security, and context. And I want to break down what
1:00 you should focus on to build the first 90% of your agent. Basically, creating
1:04 that proof of concept. And then, honestly, even more importantly, I want
1:08 to talk about what you shouldn't focus on at first because otherwise, you're
1:12 overcomplicating it. The kinds of things you will need to look into at
1:15 some point when you want to specialize your agents and move to production. But
1:19 that's what my other content is for right now. Whether you're new to
1:22 building agents or you just want to build them faster, I want to help you
1:26 focus on the first 90% to make things dead simple. Oh, and by the way, this
1:30 agent that you're looking at right here is the mascot for the new Dynamis
1:33 Agentic coding course. So, if you want to master building systems around AI
1:37 coding, check out the link in the description. All right. So, the first
1:40 thing I want to cover with you is the four core components of any AI agent,
1:47 which, quick recap: an AI agent is any large language model that is given the
1:51 ability to interact with the outside world on your behalf through tools. And
1:54 so, it can do something like book a meeting on your calendar, search the
1:58 internet for you. That's the first part of agents: these tools. It's the
2:01 functions that we give it that it can call upon to perform actions. And then
2:07 the brain for our AI agent is the large language model. It processes our
2:11 requests and it decides based on the instructions we give it which tools to
2:16 use. And speaking of those core instructions, that is our agent program,
2:21 aka the system prompt. It's the highest level set of instructions we give to any
2:25 AI agent at the start of any conversation that instructs it on its
2:30 persona, goals, how to use tools. We'll cover the different core components of
2:34 system prompts in a little bit. And then last but not least, we have memory
2:38 systems. That's the context we have from our conversations, both the short-term
2:42 and long-term memory. We'll talk about this a bit more when we get into context
2:47 as well. And so, as we go through each of these core components, I'm going to
2:50 move pretty quickly because I just want to cover the basics with you, but I'll
2:54 also link to different videos on my channel throughout this video if you
2:57 want to dive deeper into anything. And when building an AI agent, it is really
3:02 simple to get started. And I'll show you an example in code in just a bit here so
3:06 you can really see what I'm talking about. So when you're building the very
3:10 core of your AI agent, it's really just three steps. You need to pick a large
3:14 language model, write a basic system prompt as the agent instructions, and
3:18 then add your first tool because you need a tool otherwise it's really just a
3:22 regular large language model, not an agent. And so for picking a large
3:26 language model, I would highly recommend using a platform called Open Router
3:29 because it gives you access to pretty much any large language model you could
3:35 possibly want. And so Claude Haiku 4.5 is the general one that I use just as
3:38 I'm prototyping my AI agents, but you could use GPT 5 Mini. You could use an
3:43 open source model like DeepSeek, for example. Like all of them are available
3:47 on this platform. And then when creating your system prompt, you just want to
3:50 define your agent's role and behavior. And you can refine this over time as
3:53 well, just starting really simple and then adding your first tool. Like you
3:57 can give it access to search the web. You can give it the ability to perform
4:01 mathematical computations with a calculator tool. Like literally whatever
4:04 it is, just start simple and then once you have this foundation, that's when
4:08 you can build on more capabilities and integrations. And I want to show you
4:11 more than just theory as well. Like let's actually go and build an AI agent
4:15 right now so you can see practically how dead simple it really is. And I'll have
4:19 a link to this repo in the description as well if you want to dive into this
4:23 extremely basic agent that's covering all of the components in this video,
4:27 even some things we'll talk about in a bit like observability. So you can get
4:30 this up and running yourself, even use this as a template for your first agent
4:35 if you want. And so I'm going to build it from scratch with you right now, like
4:38 show you line by line how simple this really is. It's going to be less than 50
4:42 lines in the end, just like I promised in the slide. And so first I'm going to
4:46 import all of the Python dependencies. I'm using Pydantic AI since it's my
4:51 favorite AI agent framework, but it really doesn't matter which one you
4:54 use. The principles that I'm covering in this video apply no matter how you're
4:57 building your agents, even if it's with a tool like N8N because what I'm
5:01 focusing on here is just defining our four core components. LLM, tools,
5:07 memory, and a system prompt. And so the first thing I'm going to do is define
5:11 the large language model that I want to leverage. And just like I talked about a
5:15 little bit ago, I'm using Open Router. So right now I'm going to use Claude
5:19 Haiku 4.5 as my model. But literally just changing this line or just changing
5:24 my environment variable here. A single line change. I can swap to any model I
5:29 want like Gemini or DeepSeek or OpenAI. It's that easy. After I have my LLM
5:34 defined, now I define the agent itself including the system prompt, the
5:38 high-level instructions. And so I'm importing this from a separate file.
5:42 I'll just show you a very very basic example of a system prompt here and then
5:45 more on this in a little bit. The core components that I generally include
5:50 are the persona and goals, tool instructions, the output format, like how
5:54 it communicates back to us, and then also any other miscellaneous instructions I
5:57 want to include. So I have this saved here. Now this is a part of my agent
6:01 that I've defined. And so the next thing that we need to add is a tool to really
6:06 turn it from an LLM or a chatbot into a full-fledged agent. And the way that you
6:11 do that with most AI agent frameworks is you define a Python function very
6:16 simply and then you add what is called a decorator. This signals to Pydantic AI that
6:21 this function right here I want to take and attach to my agent as a capability
6:26 that it can now invoke. And so the agent defines these parameters when it calls
6:30 the tool. So like in this case this is a very basic tool to add two numbers
6:34 together because large language models as token prediction machines actually
6:38 suck at math. Interesting fact. And so it defines these parameters and it
6:42 leverages this docstring, as it's called. This comment is included as
6:47 a part of the prompt to the LLM because it defines when and how to leverage this
6:51 tool which in this case the functionality is very basic just adding
6:54 two numbers together. But this could be a tool to search the web based on a
6:59 query it defines, or create an event in our calendar based on a time range and title
7:02 that it defines. Right, like all those things are parameters, and then we
7:05 perform the functionality for the agent based on that. That is the tool that we
7:09 got for the agent. And that is really good. We've created our agent and added
7:13 the tools. The only thing we have to do now is set up a way to interact with it.
7:17 So I'm going to create a very basic command line interface here. We start
7:21 with an empty conversation history. This is where we'll add memory, which is the fourth
7:25 component of agents. And so in an infinite loop here, we're getting the
7:28 input from the user. And we're exiting the program if they say exit.
7:32 Otherwise, we are going to call the agent. So it's very simply agent.run
7:37 with the user's latest message and passing in for short-term memory the
7:41 conversation history so it knows what we said to each other up until this point
7:46 and then I'm going to add on to the conversation history everything that we
7:50 just said, and then print out the agent's latest response. Take a look at that.
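The flow just described can be sketched end to end without any framework. This is a dependency-free stand-in, not the video's actual code: the real version uses Pydantic AI and OpenRouter, while here the "LLM decision" is faked with a stub so the structure (model name from an env var, system prompt, one tool, conversation history in a loop) runs anywhere. All names are illustrative.

```python
import os

# Model picked via env var so swapping LLMs is a one-line (zero-line) change.
# The default name is illustrative of an OpenRouter-style model id.
MODEL = os.getenv("LLM_MODEL", "anthropic/claude-haiku-4.5")

SYSTEM_PROMPT = "You are a helpful assistant. Use tools for any math."

def add_numbers(a: float, b: float) -> float:
    """Add two numbers. Use this tool for any arithmetic the user asks for."""
    # In a real framework this docstring is sent to the LLM so it knows
    # when and how to call the tool.
    return a + b

TOOLS = {"add_numbers": add_numbers}

def run_agent(user_message: str, history: list) -> str:
    """One toy 'agent step'. A real framework would send the system prompt,
    the history, and the tool docstrings to the LLM and let it decide which
    tool to call; here that decision is faked so the sketch is runnable."""
    nums = [float(w) for w in user_message.split()
            if w.replace(".", "").isdigit()]
    if len(nums) >= 2:  # pretend the LLM chose the add_numbers tool
        reply = f"The sum is {TOOLS['add_numbers'](nums[0], nums[1])}"
    else:
        reply = "Hello! Ask me to add two numbers."
    # Short-term memory: append this exchange to the conversation history.
    history.append({"user": user_message, "agent": reply})
    return reply

def main():
    history = []  # empty conversation to start
    while True:
        user_input = input("You: ")
        if user_input.strip().lower() == "exit":
            break
        print("Agent:", run_agent(user_input, history))
```

Calling `main()` gives the same interact-until-exit loop described above; the point is that the four components (LLM, system prompt, tool, memory) fit comfortably in a few dozen lines.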
7:53 And then even after we call our main function here, we are still below 50
7:59 lines of code. It is that easy to define our agents. And obviously there's so
8:03 many more things that we have to do to really get our agent to the point where
8:06 it's production ready. But again, I just want to focus on making it dead simple
8:10 for you right now. And I know that a lot of this might be review for you if you
8:14 built agents in the past. But especially if you have built a lot of AI agents
8:18 already, you're probably like me, where a lot of times you just overcomplicate
8:22 things because you know how much can go into building agents. What I'm trying
8:25 to do is just draw you back to the fundamentals, because you need to keep
8:29 things simple when you're first creating any agent, really any software at all.
8:33 And so yeah, we can go into the terminal now and interact with our agent. So I'm
8:37 going to run agent.py here. Everything that we just built, I can say hello to
8:41 get a super simple response back here. And then I can say for example, what is
8:45 and I'll just do a couple of bigger numbers that I want to add together. And
8:49 so here it knows thanks to the tool description that it should use the add
8:54 numbers tool that we gave it to produce this sum. There we go. Take a look at
8:58 that. And I can even say did you use the tool, right? And it should say yes. Like
9:01 it actually recognizes based on the conversation history that it used the add
9:05 numbers tool. Okay, perfect. So we got this agent with conversation history. It
9:09 knows when to use this tool. And now at this point we can start to expand the
9:13 tools that we give it. We can refine our system prompt, play around with
9:16 different LLMs. And I want to talk about that as well. Now, starting with large
9:20 language models, choosing your LLM, like I was saying when I was building the
9:24 agent, Claude Haiku 4.5 is the one that I recommend just a cheap and fast option
9:28 that's really good for building proof of concepts when I don't want to spend a
9:31 lot of money on tokens as I'm iterating on my agent initially. And then Claude
9:36 Sonnet 4.5 is generally the best all-around right now. This might change
9:40 in literally a week and people have different opinions. The main thing that
9:44 I want to communicate here is don't actually worry about picking the perfect
9:47 LLM up front, especially when you're using a platform like Open Router where
9:52 it makes it so easy to swap between LLMs. Even if you're not using Open
9:56 Router, it still is really easy. And then if you want a local model for
10:00 privacy reasons or you want to be 100% free running on your hardware, then
10:04 Mistral Small 3.1 or Qwen 3 are the ones that I recommend right now. And if you
10:08 haven't ever tried Open Router or a tool like it that really just routes you
10:12 between the different LLM providers, I would highly recommend trying one
10:15 because it makes it so easy to iterate on the LLM for your agent, giving you
10:19 instant access. Take a look at this. We got Grok, Anthropic, Gemini, we've
10:25 got the GPT models, we've got Qwen 3, all the open-source ones. No matter
10:28 what you want to experiment with, you've got it here. And so just use this as
10:32 your tool to iterate on the LLM very quickly and just not have to think about it that much.
10:34 Crafting Your System Prompt
10:36 And then for the system prompt component, I promised I would
10:39 dive a little bit more into the different categories that I have. So
10:41 that's what I want to talk about very quickly. It can be especially easy to
10:46 overthink the system prompt because it's just such a broad problem to solve:
10:50 what should the top-level instruction set be for my agent? And so
10:54 I like to keep things simple by working off of a template that I use for all of
10:58 my AI agents at least as a starting point. I always have persona and goals,
11:02 tool instructions and examples, output format, and miscellaneous instructions.
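As a concrete (and entirely made-up) example, that four-part template filled in for a task-management agent might look something like this; every section heading matches the template, and the wording is illustrative, not a prompt from the video:

```python
# Illustrative skeleton of the four-part system prompt template.
# Adapt each section to your own agent; the content here is invented.
SYSTEM_PROMPT = """\
## Persona & Goals
You are a task management assistant. Your goal is to help the user
create, update, and review tasks quickly and accurately.

## Tool Instructions & Examples
- Use `create_task` for new tasks; always confirm the title back.
- Call `list_tasks` before `update_task` so you reference real task IDs.
- Example: "add buy milk" -> create_task(title="buy milk")

## Output Format
Reply in short bullet points. Never invent task IDs.

## Miscellaneous Instructions
If a request is ambiguous, ask one clarifying question before acting.
"""
```

Starting from a skeleton like this and editing one section at a time keeps the prompt from sprawling as you iterate.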
11:07 And what you shouldn't worry about at this point is setting up elaborate
11:12 prompt evaluations or split testing your system prompts. You can get into that
11:15 when you really want to refine your agent instructions. But right now, just
11:20 keep it simple and refine at a high level as you are manually testing your
11:24 agent. And if you want to see that system prompt template in action, I've
11:27 got you covered. I'll have a link to this in the description as well. It's a
11:32 real example of me filling out those different sections, creating a system
11:36 prompt for a task management agent. So, I have my persona defined here. I'm
11:40 defining the goals for the task management agent. The tool instructions
11:44 like how I can use different tools together to manage tasks in my platform.
11:49 The output format, just specifying ways that I want it to communicate back to me
11:53 or things to avoid. Some examples. Now, this applies more to more complex agents
11:57 and system prompts where you actually want to kind of give an example of a
12:01 workflow of chaining different tools together, so it doesn't really apply
12:03 here. And then the last thing is just miscellaneous instructions. This is also
12:08 the place to go to add in extra instructions to fix those little issues
12:12 you see with your agent that don't necessarily fit into the other sections. So,
12:15 a catchall to make sure that there's a place to put anything as you're
12:19 experimenting with your agent and refining your system prompt.
12:20 Creating Your Tools (Agent Capabilities)
12:23 And then as far as tools go for your AI agents, there's just a few things I want to
12:26 cover quickly to help you keep things simple and focused. The first is that
12:31 you should keep your tools to under 10 for your AI agents, at least when
12:34 starting out. And you definitely want to make sure that each tool's purpose is
12:38 very distinct. Because if your tools have overlapping functionality or if you
12:42 have too many, then your large language model starts to get overwhelmed with all
12:46 the possibilities of its capabilities and it'll use the wrong tools. It will
12:51 forget to call tools, and it's just a mess. Like definitely keep it to under
12:56 10. And then also MCP servers are a great way to find prepackaged sets of
13:00 tools you can bring into your agent when you're, you know, creating
13:02 something initially and you just want to move very quickly. And so definitely
13:06 based on what you're building, you'll probably be able to find an MCP server
13:10 that gives you some functionality right out of the box for your agents. And then
13:14 the last thing I'll say is a lot of people ask me, "What capabilities should
13:18 I focus on learning first when I'm building agents?" And tools and RAG is
13:24 always the answer that I have for them. Giving your AI
13:28 agent tools that allow it to search your documents and knowledge base.
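A toy version of such a document-search tool can make the idea concrete. This sketch is a stand-in only: it scores documents by keyword overlap, where a real RAG setup would use embeddings and a vector store, and the documents are invented:

```python
# Toy document-search tool: a stand-in for real RAG, which would use
# embedding similarity against a vector store instead of keyword overlap.
DOCS = [
    "Refunds are processed within 5 business days.",
    "Support hours are 9am to 5pm, Monday through Friday.",
    "Premium plans include priority email support.",
]

def search_docs(query: str, top_k: int = 1) -> list:
    """Return the documents most relevant to the query.
    The agent calls this as a tool, then grounds its answer in the result."""
    q_words = set(query.lower().split())
    scored = sorted(
        DOCS,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]
```

The agent-facing shape (a query in, relevant text out) stays the same when you swap the keyword matching for a proper retriever.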
13:31 That's what retrieval augmented generation is. And so really, it's
13:35 giving your agents the ability to ground their responses in real data. And I
13:40 would say that probably over 80% of AI agents running out in the wild right
13:44 now, no matter the industry or niche, are using RAG to some extent as part of
13:49 the capabilities for the agent. And then continuing with our theme here, what not
13:55 to focus on when building tools is don't worry about multi-agent systems or
13:59 complex tool orchestration yet. When you have a system that starts
14:03 to have more than 10 tools, that is generally when you start to split into
14:07 specialized sub-agents and you have routing between them. Those kinds of
14:12 systems are powerful and necessary for a lot of applications, but definitely
14:15 overengineering when you're just getting started creating your agent or system.
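For context only (this is exactly the thing to postpone), the shape of that later routing step is often as simple as classifying the request and dispatching to a specialized sub-agent. Everything here is illustrative: the agent names are made up, and a real router would ask an LLM to classify rather than match keywords:

```python
# Illustrative shape of a multi-agent router. Deliberately a "later"
# pattern; a proof of concept does not need this. Names are invented.
def classify_intent(message: str) -> str:
    """Stub classifier. A real router would ask an LLM to pick a route."""
    if "calendar" in message.lower():
        return "calendar_agent"
    if "email" in message.lower():
        return "email_agent"
    return "general_agent"

# Each sub-agent would be a full agent with its own prompt and tools;
# here they are placeholder functions.
SUB_AGENTS = {
    "calendar_agent": lambda m: f"[calendar agent handling: {m}]",
    "email_agent": lambda m: f"[email agent handling: {m}]",
    "general_agent": lambda m: f"[general agent handling: {m}]",
}

def route(message: str) -> str:
    return SUB_AGENTS[classify_intent(message)](message)
```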
14:18 Also, if you want to learn more about RAG and building that into your agents,
14:21 check out the video that I'll link to right here. I cover that all of the time
14:25 on my channel because it is so important.
14:26 AI Agent Security
14:29 And so with that, moving on to the next thing, we have our security essentials, because it is important to
14:32 think about security when you're building any software upfront. But I
14:36 don't want you to overcomplicate it yet, right? Like don't become a security
14:40 expert overnight. There are existing tools out there to help us with
14:43 security. So we can still move quickly as we're building our agent initially.
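One security basic that costs nothing from day one is keeping secrets out of the code itself. A minimal sketch of the habit, where the environment variable name is illustrative (use whatever your provider or framework expects):

```python
import os

def load_api_key(var: str = "OPENROUTER_API_KEY") -> str:
    """Read a secret from the environment instead of hardcoding it.
    The variable name here is illustrative; failing loudly when it is
    missing beats silently shipping a key in source control."""
    key = os.getenv(var)
    if key is None:
        raise RuntimeError(
            f"Set {var} in your environment (or a .env file); "
            "never commit API keys to source control."
        )
    return key
```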
14:47 We'll definitely want to pay more attention to security when we're going
14:50 into production. But at first there are a couple of tools that I want to call
14:54 out here. And then just some general principles to follow. Like for example,
14:58 don't hardcode your API keys, right? Like you don't want to have your OpenAI
15:02 or Anthropic API key just sitting there right in your code or your N8N workflow
15:06 for example. You always want to store that in a secure way through things like
15:12 environment variables. And then also when we think about building AI agents
15:15 in particular, there's a lot of security that we want to implement through what
15:18 are called guardrails, right? So limiting what kind of information can
15:22 come into the large language model and then also limiting the kinds of
15:27 responses that the agent can give and having it like actually retry if it
15:31 produces any kind of response that isn't acceptable for us.
15:32 Guardrails AI
15:36 And there's a super popular open-source repository that I lean on all the time to help with
15:39 guardrails, very creatively called Guardrails AI. And so it's a Python
15:43 framework (I always love building my AI agents with Python) that helps
15:48 build reliable AI applications by giving you both the input and output
15:51 guardrails that I'm talking about. So limiting what goes in and limiting what
15:56 the agent can produce. And they provide a lot of different options for
15:59 guardrails. Like for example, one thing that you want to avoid quite often is
16:04 inserting any kind of PII, personally identifiable information into a prompt
16:08 to an LLM, especially when it's going out to some model in the cloud like
16:12 Anthropic or Gemini instead of a local LLM. So limiting that kind of thing,
16:16 maybe detecting any vulgar language that's output from an LLM, because they
16:20 will do that sometimes. Like those are just some examples of input and output
16:24 guardrails. And it is very easy to install this as a Python package and
16:29 bring these guards right into your code as you are interacting with your agents
16:32 like we saw earlier when I had that, you know, simple command line tool to talk
16:35 to the agent. Like I could just add a guard before or after that call to the
16:39 agent. So yeah, guardrails don't have to be complicated. There are tools like
16:42 this, even completely open-source ones like Guardrails AI, that make it very easy.
16:45 Snyk MCP Server
16:46 Okay, so we've talked about guardrails and I gave you one example of
16:50 best practices for security in our codebase. But what about the other
16:54 million different vulnerabilities we have to account for in our codebase and
16:59 the dependencies we're bringing into our project? We can't expect ourselves to
17:03 become a security expert overnight. And so it's important to learn these things,
17:07 but also we can lean on existing tools to help us with this vulnerability
17:12 detection. There are a lot of options out there for this, but Snyk Studio is
17:16 one that I've been leaning on a lot recently. And they also have an MCP
17:21 server within the studio to help us handle vulnerability detection
17:25 automatically right within our coding process. So like always, I'm trying to
17:29 focus on open-source solutions for this video, but there's really no
17:33 open-source alternative to Snyk that I know about. This platform is incredible. So
17:38 in Snyk Studio, we can set up these different projects and integrations. We
17:42 can have it analyze our codebase and dependencies for vulnerabilities in our
17:46 GitHub repositories. They have a CLI. We can do things locally. They have the MCP
17:49 server that I'm going to show you in a little bit. I'll link to all this in the
17:53 description. But yeah, the MCP server in particular is super cool to me because
17:57 we can have vulnerability detection built right into our AI coding
18:02 workflows. Now, take a look at this. I have the Snyk MCP server connected
18:07 directly to my Claude Code after I went through the Snyk authentication process
18:10 in the CLI. And you can connect this to literally any AI coding assistant or MCP
18:15 client. So now within Claude I could build this into a full AI coding
18:18 workflow which is very cool. I'm going to show you a simple demo right now.
18:23 I'll just say, you know, use the Snyk MCP to analyze my code and dependencies
18:29 for vulnerabilities. And so it's able to leverage different tools within the MCP server to check for
18:37 both, right? Like it's a very robust solution here. And so I'll let it go for
18:41 a little bit. I'll pause and come back once it has run the vulnerability
18:44 detection. Okay, this is so cool. Take a look at this. So, within my basic agent
18:49 repository, first it used the Snyk MCP server to analyze for any
18:53 vulnerabilities with my dependencies, things like Pydantic AI, for example.
18:59 And then it does a code scan. So, this would also detect things like if I had
19:03 my API keys hardcoded, like the example that I gave earlier. So, it
19:07 found three issues with my dependencies and nothing with my code, which I'm very
19:11 proud of. I got no issues with my code. And not only does it do the analysis,
19:15 but it gives me a summary and lists the actions I can take to remedy things.
19:18 Like here are the medium severity vulnerabilities that I have
19:22 within a few of my dependencies. Nothing in my code. And then it gives me
19:26 recommendations to fix things. And so I can go and say, yes, action on this
19:30 now. And it's going to update my requirements.txt to fix these things. And I
19:34 could even run the Snyk MCP server again. And you can definitely see how
19:37 you'd build this kind of thing directly into the validation layer of your AI
19:42 coding workflow. Very, very neat for any AI agent or really any software you want to build.
19:45 Managing Agent Context (Memory)
1:38 The 4 Core Components of AI Agents
1:40 thing I want to cover with you is the four core components of any AI agent,
1:47 which quick recap, an AI agent is any large language model that is given the
1:51 ability to interact with the outside world on your behalf through tools. And
1:54 so, it can do something like book a meeting on your calendar, search the
1:58 internet for you. That's the first part of agents is these tools. It's the
2:01 functions that we give it that it can call upon to perform actions. And then
2:07 the brain for our AI agent is the large language model. It processes our
2:11 requests and it decides based on the instructions we give it which tools to
2:16 use. And speaking of those core instructions, that is our agent program,
2:21 aka the system prompt. It's the highest level set of instructions we give to any
2:25 AI agent at the start of any conversation that instructs it on its
2:30 persona, goals, how to use tools. We'll cover the different core components of
2:34 system prompts in a little bit. And then last but not least, we have memory
2:38 systems. That's the context we have from our conversations, both the short-term
2:42 and long-term memory. We'll talk about this a bit more when we get into context
2:47 as well. And so, as we go through each of these core components, I'm going to
2:50 move pretty quickly because I just want to cover the basics with you, but I'll
2:54 also link to different videos on my channel throughout this video if you
2:57 want to dive deeper into anything. And when building an AI agent, it is really
3:02 simple to get started. And I'll show you an example in code in just a bit here so
3:06 you can really see what I'm talking about. So when you're building the very
3:10 core of your AI agent, it's really just three steps. You need to pick a large
3:14 language model, write a basic system prompt as the agent instructions, and
3:18 then add your first tool because you need a tool otherwise it's really just a
3:22 regular large language model, not an agent. And so for picking a large
3:26 language model, I would highly recommend using a platform called Open Router
3:29 because it gives you access to pretty much any large language model you could
3:35 possibly want. And so Claude Haiku 4.5 is the general one that I use just as
3:38 I'm prototyping my AI agents, but you could use GPT 5 Mini. You could use an
3:43 open source model like DeepSeek, for example. Like all of them are available
3:47 on this platform. And then when creating your system prompt, you just want to
3:50 define your agents role and behavior. And you can refine this over time as
3:53 well. just starting really simple and then adding your first tool. Like you
3:57 can give it access to search the web. You can give it the ability to perform
4:01 mathematical computations with a calculator tool. Like literally whatever
4:04 it is, just start simple and then once you have this foundation, that's when
4:08 you can build on more capabilities and integrations. And I want to show you
4:11 more than just theory as well. Like let's actually go and build an AI agent
4:15 right now so you can see practically how dead simple it really is. And I'll have
4:19 a link to this repo in the description as well if you want to dive into this
4:23 extremely basic agent that's covering all of the components in this video,
4:27 even some things we'll talk about in a bit like observability. So you can get
4:30 this up and running yourself, even use this as a template for your first agent
4:35 if you want. And so I'm going to build it from scratch with you right now, like
4:38 show you line by line how simple this really is. It's going to be less than 50
4:42 lines in the end, just like I promised in the slide. And so first I'm going to
4:46 import all of the Python dependencies. I'm using Pantic AI since it's my
4:51 favorite AI agent framework, but it really doesn't matter the one that you
4:54 use. The principles that I'm covering in this video applies no matter how you're
4:57 building your agents, even if it's with a tool like N8N because what I'm
5:01 focusing on here is just defining our four core components. LLM, tools,
5:07 memory, and a system prompt. And so the first thing I'm going to do is define
5:11 the large language model that I want to leverage. And just like I talked about a
5:15 little bit ago, I'm using open router. So right now I'm going to use cloud
5:19 haiku 4.5 as my model. But literally just changing this line or just changing
5:24 my environment variable here. A single line change. I can swap to any model I
5:29 want like Gemini or DeepSeek or OpenAI. It's that easy. After I have my LLM
5:34 defined, now I define the agent itself including the system prompt, the
5:38 highlevel instructions. And so I'm importing this from a separate file.
5:42 I'll just show you a very very basic example of a system prompt here and then
5:45 more on this in a little bit. The core components that I generally include
5:50 including the persona goal tool instructions, the output format like how
5:54 it communicates back to us and then also any other miscellaneous instructions I
5:57 want to include. So I have this saved here. Now this is a part of my agent
6:01 that I've defined. And so the next thing that we need to add is a tool to really
6:06 turn it from an LLM or a chatbot into a full-fledged agent. And the way that you
6:11 do that with most AI Asian frameworks is you define a Python function like very
6:16 simply and then you add what is called a decorator. This signals to paid AI that
6:21 this function right here I want to take and attach to my agent as a capability
6:26 that it can now invoke. And so the agent defines these parameters when it calls
6:30 the tool. So like in this case this is a very basic tool to add two numbers
6:34 together because large language models as token prediction machines actually
6:38 suck at math. interesting fact. And so it defines these parameters and it
6:42 leverages this dock string as it's called like this comment is included as
6:47 a part of the prompt to the LLM because it defines when and how to leverage this
6:51 tool which in this case the functionality is very basic just adding
6:54 two numbers together. But this could be a tool to search the web based on a
6:59 query it defines create an event in our calendar based on a time range and title
7:02 that it defines right like all those things are parameters and then we
7:05 perform the functionality for the agent based on that. That is the tool that we
7:09 got for the agent. And that is really good. We've created our agent and added
7:13 the tools. The only thing we have to do now is set up a way to interact with it.
7:17 So I'm going to create a very basic command line interface here. We start
7:21 with empty conversation. This is where we'll add memory, which is the fourth
7:25 component of agents. And so in an infinite loop here, we're getting the
7:28 input from the user. Uh and we're exiting the program if they say exit.
7:32 Otherwise, we are going to call the agent. So it's very simply agent.run run
7:37 with the user's latest message and passing in for short-term memory the
7:41 conversation history so it knows what we said to each other up until this point
7:46 and then I'm going to add on to the conversation history everything that we
7:50 just said and then print out the agents latest response. Take a look at that.
7:53 And then even after we call our main function here, we are still below 50
7:59 lines of code. It is that easy to define our agents. And obviously there's so
8:03 many more things that we have to do to really get our agent to the point where
8:06 it's production ready. But again, I just want to focus on making it dead simple
8:10 for you right now. And I know that a lot of this might be review for you if you
8:14 built agents in the past. But especially if you have built a lot of AI agents
8:18 already, you're probably like me where a lot of times you just overcomplicate
8:22 things cuz you know how much can go into building agents. That's what I'm trying
8:25 to do is just draw you back to the fundamentals because you need to keep
8:29 things simple when you're first creating any agent really any software at all.
8:33 And so yeah, we can go into the terminal now and interact with our agent. So I'm
8:37 going to run agent.py here. Everything that we just built, I can say hello to
8:41 get a super simple response back here. And then I can say for example, what is
8:45 and I'll just do a couple of bigger numbers that I want to add together. And
8:49 so here it knows thanks to the tool description that it should use the add
8:54 numbers tool that we gave it to produce this sum. There we go. Take a look at
8:58 that. And I can even say did you use the tool, right? And it should say yes. Like
9:05 it actually recognizes based on the conversation history that it used the add
9:05 numbers tool. Okay, perfect. So we got this agent with conversation history. It
9:09 knows when to use this tool. And now at this point we can start to expand the
9:13 tools that we give it. We can refine our system prompt, play around with
9:16 different LLMs. And I want to talk about that as well. Now, starting with large
9:20 language models, choosing your LLM, like I was saying when I was building the
9:24 agent, Claude Haiku 4.5 is the one that I recommend just a cheap and fast option
9:28 that's really good for building proof of concepts when I don't want to spend a
9:31 lot of money on tokens as I'm iterating on my agent initially. And then Claude
9:36 Sonnet 4.5 is generally the best all-around right now. This might change
9:40 in literally a week and people have different opinions. The main thing that
9:44 I want to communicate here is don't actually worry about picking the perfect
9:47 LLM up front, especially when you're using a platform like Open Router where
9:52 it makes it so easy to swap between LLMs. Even if you're not using Open
9:56 Router, it still is really easy. And then if you want a local model for
10:00 privacy reasons or you want to be 100% free running on your hardware, then
10:04 Mistral Small 3.1 or Qwen 3 are the ones that I recommend right now. And if you
10:08 haven't ever tried Open Router or a tool like it that really just routes you
10:12 between the different LLM providers, I would highly recommend trying one
10:15 because it makes it so easy to iterate on the LLM for your agent, giving you
10:19 instant access. Take a look at this: we've got Grok, Anthropic, Gemini, we've
10:25 got the GPT models, we've got Qwen 3, all the open-source ones. No matter
10:28 what you want to experiment with, you've got it here. And so just use this as
10:32 your tool to iterate on the LLM very quickly and just not have to think about
10:36 it that much. And then for the system prompt component, I promised I would
10:39 dive a little bit more into the different categories that I have. So
10:41 that's what I want to talk about very quickly. It can be especially easy to
10:46 overthink the system prompt because it's just such a broad problem to solve of
10:50 like what should the top level instruction set be for my agent? And so
10:54 I like to keep things simple by working off of a template that I use for all of
10:58 my AI agents at least as a starting point. I always have persona and goals,
11:02 tool instructions and examples, output format, and miscellaneous instructions.
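As a concrete sketch, that template can live as a plain string in a separate file. The four section names match the ones just listed; all of the example wording (and tool names like create_task) is hypothetical filler for a task agent, not the author's actual prompt:

```python
# Illustrative system prompt following the four-section template.
# Concrete wording and tool names here are hypothetical examples.
SYSTEM_PROMPT = """\
## Persona & Goals
You are a task management assistant. Help the user create, update,
and prioritize tasks quickly and accurately.

## Tool Instructions & Examples
Use create_task to add a task. Call list_tasks before updating
anything so you always reference real task IDs.

## Output Format
Reply in short plain-text sentences. Never invent task IDs or data.

## Miscellaneous Instructions
If a request is ambiguous, ask one clarifying question before acting.
"""
```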
11:07 And what you shouldn't worry about at this point is setting up elaborate
11:12 prompt evaluations or split testing your system prompts. You can get into that
11:15 when you really want to refine your agent instructions. But right now, just
11:20 keep it simple and refine at a high level as you are manually testing your
11:24 agent. And if you want to see that system prompt template in action, I've
11:27 got you covered. I'll have a link to this in the description as well. It's a
11:32 real example of me filling out those different sections, creating a system
11:36 prompt for a task management agent. So, I have my persona defined here. I'm
11:40 defining the goals for the task management agent. The tool instructions
11:44 like how I can use different tools together to manage tasks in my platform.
11:49 The output format, just specifying ways that I want it to communicate back to me
11:53 or things to avoid. Some examples. Now, this applies more to more complex agents
11:57 and system prompts where you actually want to kind of give an example of a
12:01 workflow of chaining different tools together, so it doesn't really apply
12:03 here. And then the last thing is just miscellaneous instructions. This is also
12:08 the place to go to add in extra instructions to fix those little issues
12:12 you see with your agent that don't necessarily fit into all the others. So,
12:15 a catchall to make sure that there's a place to put anything as you're
12:19 experimenting with your agent and refining your system prompt. And then as
12:23 far as tools go for your AI agents, there's just a few things I want to
12:26 cover quickly to help you keep things simple and focused. The first is that
12:31 you should keep your tools to under 10 for your AI agents, at least when
12:34 starting out. And you definitely want to make sure that each tool's purpose is
12:38 very distinct. Because if your tools have overlapping functionality or if you
12:42 have too many, then your large language model starts to get overwhelmed with all
12:46 the possibilities of its capabilities and it'll use the wrong tools. It will
12:51 forget to call tools, and it's just a mess. Like definitely keep it to under
12:56 10. And then also MCP servers are a great way to find prepackaged sets of
13:00 tools you can bring into your agent when you're creating
13:02 something initially and you just want to move very quickly. And so definitely
13:06 based on what you're building, you'll probably be able to find an MCP server
13:10 that gives you some functionality right out of the box for your agents. And then
13:14 the last thing I'll say is a lot of people ask me, "What capabilities should
13:18 I focus on learning first when I'm building agents?" And tools and RAG is
13:24 always the answer that I have for them. Giving your AI
13:28 agent tools that allow it to search your documents and knowledge base:
13:31 that's what retrieval augmented generation (RAG) is. And so really, it's
13:35 giving your agents the ability to ground their responses in real data. And I
13:40 would say that probably over 80% of AI agents running out in the wild right
13:44 now, no matter the industry or niche, are using RAG to some extent as part of
13:49 the capabilities for the agent. And then continuing with our theme here, what not
13:55 to focus on when building tools is don't worry about multi-agent systems or
13:59 complex tool orchestration yet. When you have a system that starts
14:03 to have more than 10 tools, that is generally when you start to split into
14:07 specialized sub-agents and you have routing between them. Those kinds of
14:12 systems are powerful and necessary for a lot of applications, but definitely
14:15 overengineering when you're just getting started creating your agent or a system.
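Just so the term is concrete for when you do get there, routing between specialized sub-agents reduces to a dispatch step. Everything below (the keywords, the sub-agents) is a hypothetical stand-in to show the shape, not a pattern to adopt on day one:

```python
def route(user_message: str, sub_agents: dict):
    """Send the message to the first sub-agent whose keyword matches,
    falling back to a general-purpose agent. Purely illustrative."""
    lowered = user_message.lower()
    for keyword, agent in sub_agents.items():
        if keyword != "general" and keyword in lowered:
            return agent(user_message)
    return sub_agents["general"](user_message)
```

Real routers usually let an LLM classify the request instead of keyword matching, which is exactly the extra machinery you can skip at first.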
14:18 Also, if you want to learn more about RAG and building that into your agents,
14:21 check out the video that I'll link to right here. I cover that all of the time
14:25 on my channel because it is so important. And so with that, moving on
14:29 to the next thing, we have our security essentials because it is important to
14:32 think about security when you're building any software upfront. But I
14:36 don't want you to over complicate it yet, right? Like don't become a security
14:40 expert overnight. There are existing tools out there to help us with
14:43 security. So we can still move quickly as we're building our agent initially.
14:47 We'll definitely want to pay more attention to security when we're going
14:50 into production. But at first there are a couple of tools that I want to call
14:54 out here. And then just some general principles to follow. Like for example,
14:58 don't hardcode your API keys, right? Like you don't want to have your OpenAI
15:02 or Anthropic API key just sitting there right in your code or your n8n workflow,
15:06 for example. You always want to store that in a secure way through things like
15:12 environment variables. And then also when we think about building AI agents
15:15 in particular, there's a lot of security that we want to implement through what
15:18 are called guardrails, right? So limiting what kind of information can
15:22 come into the large language model and then also limiting the kinds of
15:27 responses that the agent can give, and having it actually retry if it
15:31 produces any kind of response that isn't acceptable for us.
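Conceptually, an input guard screens what goes to the LLM and an output guard validates (and retries) what comes back. This toy sketch only shows the pattern; a library like Guardrails AI ships real, configurable validators instead of these naive checks:

```python
import re

def input_guard(user_message: str) -> str:
    """Block input before it reaches the LLM if it looks like it
    contains PII (toy check: a US-SSN-shaped number)."""
    if re.search(r"\b\d{3}-\d{2}-\d{4}\b", user_message):
        raise ValueError("possible PII detected; refusing to send")
    return user_message

def guarded_reply(call_llm, message: str, max_retries: int = 2) -> str:
    """Output guard: retry the LLM call until the response passes a
    simple acceptability check (toy banned-word list)."""
    banned = {"darn"}  # stand-in for a real vulgarity/toxicity detector
    for _ in range(max_retries + 1):
        reply = call_llm(message)
        if not any(word in reply.lower() for word in banned):
            return reply
    raise RuntimeError("no acceptable response after retries")
```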
15:36 There's a super popular open-source repository that I lean on all the time
15:39 to help with guardrails, very creatively called Guardrails AI. And so it's a Python
15:43 framework (I always love building my AI agents with Python) that helps
15:48 build reliable AI applications by giving you both the input and output
15:51 guardrails that I'm talking about. So limiting what goes in and limiting what
15:56 the agent can produce. And they provide a lot of different options for
15:59 guardrails. Like for example, one thing that you want to avoid quite often is
16:04 inserting any kind of PII, personally identifiable information into a prompt
16:08 to an LLM, especially when it's going out to some model in the cloud like
16:12 Anthropic or Gemini instead of a local LLM. So limiting that kind of thing,
16:16 maybe detecting any vulgar language that's output from an LLM, because they
16:20 will do that sometimes. Like those are just some examples of input and output
16:24 guardrails. And it is very easy to install this as a Python package and
16:29 bring these guards right into your code as you are interacting with your agents
16:32 like we saw earlier when I had that, you know, simple command line tool to talk
16:35 to the agent. Like I could just add a guard before or after that call to the
16:39 agent. So yeah, guardrails don't have to be complicated. There are tools like
16:42 this, even completely open-source ones like Guardrails AI, that make it very
16:46 easy. Okay, so we've talked about guardrails and I gave you one example of
16:50 best practices for security in our codebase. But what about the other
16:54 million different vulnerabilities we have to account for in our codebase and
16:59 the dependencies we're bringing into our project? We can't expect ourselves to
17:03 become a security expert overnight. And so it's important to learn these things,
17:07 but also we can lean on existing tools to help us with this vulnerability
17:12 detection. There are a lot of options out there for this, but Snyk Studio is
17:16 one that I've been leaning on a lot recently. And they also have an MCP
17:21 server within the studio to help us handle vulnerability detection
17:25 automatically right within our coding process. So like always, I'm trying to
17:29 focus on open-source solutions for this video, but there's really no
17:33 open-source alternative to Snyk that I know about. This platform is incredible. So
17:38 in the Snyk Studio, we can set up these different projects and integrations. We
17:42 can have it analyze our codebase and dependencies for vulnerabilities in our
17:46 GitHub repositories. They have a CLI. We can do things locally. They have the MCP
17:49 server that I'm going to show you in a little bit. I'll link to all this in the
17:53 description. But yeah, the MCP server in particular is super cool to me because
17:57 we can have vulnerability detection built right into our AI coding
18:02 workflows. Now, so take a look at this. I have the Snyk MCP server connected
18:07 directly to my Claude Code after I went through the Snyk authentication process
18:10 in the CLI. And you can connect this to literally any AI coding assistant or MCP
18:15 client. So now within Claude I could build this into a full AI coding
18:18 workflow which is very cool. I'm going to show you a simple demo right now.
18:23 I'll just say, you know, use the Snyk MCP to analyze my code and dependencies
18:29 for vulnerabilities. And so it's able to leverage different tools within the MCP server to check for
18:37 both, right? Like it's a very robust solution here. And so I'll let it go for
18:41 a little bit. I'll pause and come back once it has run the vulnerability
18:44 detection. Okay, this is so cool. Take a look at this. So, within my basic agent
18:49 repository, first it used the Snyk MCP server to analyze for any
18:53 vulnerabilities with my dependencies, things like Pydantic AI, for example.
18:59 And then it does a code scan. So, this would also detect things like if I had
19:03 my API keys hardcoded, like the example that I gave earlier. So, it
19:07 found three issues with my dependencies and nothing with my code, which I'm very
19:11 proud of. I got no issues with my code. And not only does it do the analysis,
19:15 but it gives me a summary and lists the actions I can take to remedy things.
19:18 Like here are the medium-severity vulnerabilities that I have
19:22 within a few of my dependencies. Nothing in my code. And then it gives me
19:26 recommendations to fix things. And so I can go and say, yes, action on this
19:30 now. And it's going to update my requirements.txt to fix these things. And I
19:34 could even run the Snyk MCP server again. And you can definitely see how
19:37 you'd build this kind of thing directly into the validation layer of your AI
19:42 coding workflow. Very, very neat for any AI agent or really any software you want
19:46 to build at all. Moving on, I want to talk about memory. Now, managing the
19:50 tokens that we're passing into the LLM calls for our agents. And this really is
19:54 a hot topic right now, especially with all the rate limiting that people are
19:58 getting with AI coding assistants like Claude Code. It really is important to
20:03 manage our context efficiently, only giving our agent the information it
20:07 actually needs and not completely bloating our system prompts with
20:10 thousands of lines of instruction and tools that it doesn't actually need.
20:14 That's what you want to avoid. And so, just a couple of simple tips here going
20:18 along with our theme. The first one is to keep your prompts very concise, both
20:23 your system prompts and then also the tool descriptions that describe to your
20:27 agent when and how to use tools like I showed in the code earlier. You don't
20:31 need to over complicate it. That's why I have these templates for you like the
20:34 one for the system prompt, right? Like you have your goal just a couple of
20:38 sentences, your persona just a couple of sentences. Keep it very organized and
20:42 keeping it organized also helps you keep it quite concise. You don't need to
20:47 overthink it. And so keeping your system prompts to just a couple of hundred
20:51 lines at most is generally what I recommend. Some solutions might need
20:54 more, but that's when I'd start to question like could you really make that
20:58 more concise or split it into different specialized agents so each agent still
21:03 has a simple system prompt. Another thing you can do for agents that have
21:07 longer conversations is to limit, in a sliding window, to the 10 or
21:12 20 most recent messages, for example, that you actually include in the
21:15 context. And going back to the code, I'll even show you what that looks like
21:19 here. Like right now when we call our agent, we run it, we're passing in the
21:22 entire conversation history. But in Python, if I wanted to include just the
21:26 last 10 messages, I could do something like this. And so now maybe like, you
21:30 know, all previous messages aren't really as relevant anymore. We just want
21:34 to include the most recent 10. That's how we can do that. So that's another
21:37 really popular strategy. Also, tools like N8N have that as an option baked
21:41 directly into their short-term memory nodes. So very useful to know. And then
21:46 also when you start to have so much information about a single user that you
21:51 don't want to include it in the short-term memory, that's when you can
21:55 look at long-term memory. But also, don't build it from scratch. Again,
21:59 don't over complicate it. There are tools that you can use just like with
22:03 security to help us with long-term memory, and mem is one of those. Mem is
22:05 Mem0 for Long Term Agent Memory
22:09 a completely open-source long-term memory agentic framework. And so I'll
22:12 show the GitHub in a second here, but yeah, when you have so much information
22:16 about a user that you can't just include it all in context, you need some way to
22:21 search through a longer term set of memories and bring only the ones in that
22:24 are relevant to the current conversation, which actually does use
22:27 RAG under the hood, by the way. So again, another example why it's such an
22:32 important capability. But yeah, basically you're able to pull core
22:36 memories from conversations and store them to be searched later. That's what
22:40 Mem0 offers us. And it's so easy to include in our Python code, just like
22:44 Guardrails AI. I'll show you an example really quickly in their quick start. You
22:48 install it as a Python package and then you basically have a function to search
22:53 for memories, like performing RAG to find memories related to the latest message,
22:57 and then you have a function to add memories. And so it'll use a large
23:01 language model to extract the key information to store to be retrieved
23:05 later. And so this definitely solves the context problem because now you're able
23:09 to basically have infinite memory for an agent, but you don't have to give it all
23:13 to the LLM at once. It just retrieves things as needed. And of course, the
23:16 last thing I want to hit on for context is what not to focus on when you're
23:21 first building your agent. Do not worry about advanced memory compression
23:24 techniques. There's a lot of cool things that Anthropic especially has been doing
23:27 research on, but like don't worry about that. Don't worry about specialized
23:31 sub-agents. These are both solutions to handle the memory problem when it starts
23:36 to get really really technical. But right now, just start simple and you can
23:40 always optimize things as you're starting to expand your agent and go to
23:44 production and you hit some limits. But right now, focusing on these things up
23:49 front is all you need to go the first 90% probably even beyond depending on
23:53 Agent Observability (with Langfuse)
23:54 how simple your agents are. And context was the last of the four core components
23:57 of agents. So, we've covered the core four and security. Now, I want to talk a
24:02 bit about observability and deployment. Getting our agent ready for production.
24:06 And I will say that security, observability, and deployment definitely
24:10 go a lot more into the last 10% of building an agent. But I want to touch
24:13 on them here because there are some ways to design stuff up front very simply,
24:18 especially with observability. I want to introduce you to Langfuse right now. And
24:22 I covered this on my channel already. Link to a video right here on Langfuse
24:25 if you want to dive more into observability. But we can set up the
24:29 ability to watch for the actions that our agent is taking, view them in a
24:34 dashboard. We can do things like testing different prompts for our agents. It is
24:38 a beautiful platform and it's actually super easy to incorporate into our code.
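Incorporating it really can be this small: a guard clause around the SDK initialization. The env var names below follow Langfuse's documented convention, but the actual client and instrumentation calls depend on the SDK version you install, so they're left as a comment rather than guessed:

```python
import os

def setup_observability() -> bool:
    """Initialize tracing only when the Langfuse credentials are set,
    so the agent still runs fine without observability configured."""
    required = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY", "LANGFUSE_HOST")
    if not all(os.environ.get(var) for var in required):
        return False
    # Here you'd create the Langfuse client and instrument the agent
    # (e.g. Pydantic AI's instrumentation hooks); calls vary by SDK version.
    return True
```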
24:43 And so I did this very sneakily already when I built the agent with you, but I
24:47 have this function here called setup_observability. And all it does is
24:53 initialize Langfuse based on some environment variables that I have set
24:56 here. And I cover all that in my YouTube video on Langfuse if you're curious. But
24:59 you basically just connect to your Langfuse instance. And then after you
25:05 set up the connection and instrument your agent, your Pydantic AI agent, for
25:09 observability, that is all you have to do. Literally no more code in here for
25:13 Langfuse. And it's going to watch for all of our agent executions, even
25:17 getting a sense of the tool calls that it's making under the hood. So take a
25:20 look at this. So I'm in the Langfuse dashboard now where I can view that
25:25 execution that we had from our test earlier where it used the add numbers
25:29 function, and we have all of this very rich data: the number of tokens
25:33 that it used, the latency. We can view the tools and also look at the different
25:37 parameters that we have like the tool arguments like for the numbers to add.
25:41 We can view the system prompt that was leveraged here based on that template we
25:45 have defined. We have all this observability that also really helps for
25:49 monitoring our agents in production when other users are leveraging the agent,
25:53 since we can't just look at our own chat and see how the agent is performing. And
25:57 there's so many other things within Langfuse as well that I don't want to get
26:01 into right now, like evals for your agent. It is a totally open-source platform,
26:06 just like Mem0 and Guardrails AI. So again, focusing on open source a lot in
26:10 this video. There are other solutions for this kind of observability like
26:14 Helicone and LangSmith for example, but Langfuse is the one that I love using.
26:18 And I know I didn't cover it too much in the code, but it really is as simple as
26:21 what I showed you. And so you can use the repository that I have linked below
26:25 as your template to like start an agent with observability baked right in if
26:28 Agent Deployment (with Docker)
26:29 you're interested. And then the very last component that I want to at least
26:33 touch on right now is how you can configure your agent upfront to work
26:37 well for deployment when you're ready to take your agent into production. Now,
26:41 obviously that's going to be part of the last 10%. Not something I'm going to
26:44 talk about a lot in this video, but the one big golden nugget that I want to
26:49 give you here is you should always think about how you can build your AI agent to
26:55 run as a Docker container. Docker is my method for packaging up any application
27:00 especially AI agents, that I want to deploy to the cloud. And also, I will say
27:06 that AI coding assistants are very good at setting up Docker configuration, like
27:10 your Dockerfiles and Docker Compose files. So leverage those, and then
27:14 you can add a simple Streamlit application with Python or
27:18 build a React front end to create a chat interface for your agent if it is a
27:22 conversationally driven agent. Otherwise, what I like to do for background
27:26 agents that run on a data set periodically is run them just
27:29 as a serverless function. So it's kind of like: background agent, run it as
27:33 serverless in a Docker container; conversational agent, run it in a
27:37 Docker container also with a front-end application. That's pretty much the
27:41 two tracks I have for any agent that I want to deploy. So yeah, just think like
27:45 Docker native. Have that in your mind from the get-go when you're building
27:48 your agent. What you don't want to focus on for observability and deployment and
27:52 everything production-ready is Kubernetes orchestration, extensive LLM
27:57 evals, or prompt A/B testing. Like some of the things we have in Langfuse that are
28:00 very powerful when you want to super refine your agent tools and system
28:03 prompt and everything like don't even worry about that yet. You can definitely
28:07 get there and like I said core part of the last 10%. But right now also don't
28:12 even think about like the infrastructure that much because unless you're running
28:15 local large language models, you don't really need heavy infrastructure for
28:19 your agents at all. Like obviously it depends on the amount of usage of your
28:23 agent. But for most use cases, just a couple of vCPUs and a few gigabytes of
28:28 RAM is all you need to run an AI agent, even if you have a front-end application
28:33 as well. Very, very lightweight, as long as you are calling a third party for the
28:38 large language model, like Open Router or Anthropic or Gemini.
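As a concrete starting point, the container for an agent like the one in this video can be this small. The file names (agent.py, requirements.txt) are assumptions based on the repo shown earlier, not confirmed paths:

```dockerfile
# Minimal, hypothetical image for a Python agent.
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# API keys (Open Router, Langfuse, ...) are injected as environment
# variables at run time, never baked into the image.
CMD ["python", "agent.py"]
```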
3:06 The First 3 Steps of Building an Agent
3:06 you can really see what I'm talking about. So when you're building the very
3:10 core of your AI agent, it's really just three steps. You need to pick a large
3:14 language model, write a basic system prompt as the agent instructions, and
3:18 then add your first tool, because you need a tool; otherwise it's really just a
3:22 regular large language model, not an agent. And so for picking a large
3:26 language model, I would highly recommend using a platform called Open Router
3:29 because it gives you access to pretty much any large language model you could
3:35 possibly want. And so Claude Haiku 4.5 is the general one that I use just as
3:38 I'm prototyping my AI agents, but you could use GPT-5 Mini. You could use an
3:43 open source model like DeepSeek, for example. Like all of them are available
3:47 on this platform. And then when creating your system prompt, you just want to
3:50 define your agents role and behavior. And you can refine this over time as
3:53 well, just starting really simple. And then add your first tool. Like you
3:57 can give it access to search the web. You can give it the ability to perform
4:01 mathematical computations with a calculator tool. Like literally whatever
4:04 it is, just start simple and then once you have this foundation, that's when
4:08 you can build on more capabilities and integrations. And I want to show you
4:09 Building a Basic AI Agent Together Live
4:11 more than just theory as well. Like let's actually go and build an AI agent
4:15 right now so you can see practically how dead simple it really is. And I'll have
4:19 a link to this repo in the description as well if you want to dive into this
4:23 extremely basic agent that's covering all of the components in this video,
4:27 even some things we'll talk about in a bit like observability. So you can get
4:30 this up and running yourself, even use this as a template for your first agent
4:35 if you want. And so I'm going to build it from scratch with you right now, like
4:38 show you line by line how simple this really is. It's going to be less than 50
4:42 lines in the end, just like I promised in the slide. And so first I'm going to
4:46 import all of the Python dependencies. I'm using Pydantic AI since it's my
4:51 favorite AI agent framework, but it really doesn't matter the one that you
4:54 use. The principles that I'm covering in this video apply no matter how you're
4:57 building your agents, even if it's with a tool like N8N because what I'm
5:01 focusing on here is just defining our four core components. LLM, tools,
5:07 memory, and a system prompt. And so the first thing I'm going to do is define
5:11 the large language model that I want to leverage. And just like I talked about a
5:15 little bit ago, I'm using Open Router. So right now I'm going to use Claude
5:19 Haiku 4.5 as my model. But literally just changing this line or just changing
5:24 my environment variable here. A single line change. I can swap to any model I
5:29 want like Gemini or DeepSeek or OpenAI. It's that easy. After I have my LLM
5:34 defined, now I define the agent itself including the system prompt, the
5:38 high-level instructions. And so I'm importing this from a separate file.
5:42 I'll just show you a very very basic example of a system prompt here and then
5:45 more on this in a little bit. The core components that I generally include:
5:50 the persona, goals, tool instructions, the output format (like how
5:54 it communicates back to us), and then also any other miscellaneous instructions I
5:57 want to include. So I have this saved here. Now this is a part of my agent
6:01 that I've defined. And so the next thing that we need to add is a tool to really
6:06 turn it from an LLM or a chatbot into a full-fledged agent. And the way that you
6:11 do that with most AI agent frameworks is you define a Python function very
6:16 simply and then you add what is called a decorator. This signals to Pydantic AI that
6:21 this function right here I want to take and attach to my agent as a capability
6:26 that it can now invoke. And so the agent defines these parameters when it calls
6:30 the tool. So like in this case, this is a very basic tool that adds two numbers together.
6:34 together because large language models as token prediction machines actually
6:38 suck at math. interesting fact. And so it defines these parameters and it
6:42 leverages this dock string as it's called like this comment is included as
6:47 a part of the prompt to the LLM because it defines when and how to leverage this
6:51 tool which in this case the functionality is very basic just adding
6:54 two numbers together. But this could be a tool to search the web based on a
6:59 query it defines create an event in our calendar based on a time range and title
7:02 that it defines right like all those things are parameters and then we
7:05 perform the functionality for the agent based on that. That is the tool that we
7:09 got for the agent. And that is really good. We've created our agent and added
7:13 the tools. The only thing we have to do now is set up a way to interact with it.
7:17 So I'm going to create a very basic command line interface here. We start
7:21 with empty conversation. This is where we'll add memory, which is the fourth
7:25 component of agents. And so in an infinite loop here, we're getting the
7:28 input from the user. Uh and we're exiting the program if they say exit.
7:32 Otherwise, we are going to call the agent. So it's very simply agent.run run
7:37 with the user's latest message and passing in for short-term memory the
7:41 conversation history so it knows what we said to each other up until this point
7:46 and then I'm going to add on to the conversation history everything that we
7:50 just said and then print out the agents latest response. Take a look at that.
7:53 And then even after we call our main function here, we are still below 50
7:59 lines of code. It is that easy to define our agents. And obviously there's so
8:03 many more things that we have to do to really get our agent to the point where
8:06 it's production ready. But again, I just want to focus on making it dead simple
8:10 for you right now. And I know that a lot of this might be review for you if you
8:14 built agents in the past. But especially if you have built a lot of AI agents
8:18 already, you're probably like me where a lot of times you just overcomplicate
8:22 things cuz you know how much can go into building agents. That's what I'm trying
8:25 to do is just draw you back to the fundamentals because you need to keep
8:29 things simple when you're first creating any agent really any software at all.
8:33 And so yeah, we can go into the terminal now and interact with our agent. So I'm
8:37 going to run agent.py here. Everything that we just built, I can say hello to
8:41 get a super simple response back here. And then I can say, for example, "what is,"
8:45 and add a couple of bigger numbers together. And
8:49 so here, it knows, thanks to the tool description, that it should use the add
8:54 numbers tool that we gave it to produce this sum. There we go. Take a look at
8:58 that. And I can even say, did you use the tool, right? And it should say yes. Like,
9:01 it actually recognizes, based on the conversation history, that it used the add
9:05 numbers tool. Okay, perfect. So we've got this agent with conversation history. It
9:09 knows when to use this tool. And now at this point we can start to expand the
9:13 tools that we give it. We can refine our system prompt, play around with
9:16 different LLMs, and I want to talk about that as well. Now, starting with large
9:17 Choosing Your LLM
9:20 language models, choosing your LLM: like I was saying when I was building the
9:24 agent, Claude Haiku 4.5 is the one that I recommend: a cheap and fast option
9:28 that's really good for building proofs of concept when I don't want to spend a
9:31 lot of money on tokens as I'm iterating on my agent initially. And then Claude
9:36 Sonnet 4.5 is generally the best all-around right now. This might change
9:40 in literally a week and people have different opinions. The main thing that
9:44 I want to communicate here is don't actually worry about picking the perfect
9:47 LLM up front, especially when you're using a platform like OpenRouter, which
9:52 makes it so easy to swap between LLMs. Even if you're not using
9:56 OpenRouter, it still is really easy. And then if you want a local model for
10:00 privacy reasons, or you want to be 100% free running on your own hardware, then
10:04 Mistral Small 3.1 or Qwen 3 are the ones that I recommend right now. And if you
10:08 haven't ever tried OpenRouter or a tool like it, that really just routes you
10:12 between the different LLM providers, I would highly recommend trying one
10:15 because it makes it so easy to iterate on the LLM for your agent, giving you
10:19 instant access. Take a look at this: we've got Grok, Anthropic, Gemini, we've
10:25 got the GPT models, we've got Qwen 3, all the open-source ones. No matter
10:28 what you want to experiment with, you've got it here. And so just use this as
10:32 your tool to iterate on the LLM very quickly and just not have to think about
10:36 it that much. And then for the system prompt component, I promised I would
10:39 dive a little bit more into the different categories that I have. So
10:41 that's what I want to talk about very quickly. It can be especially easy to
10:46 overthink the system prompt, because it's just such a broad problem to solve:
10:50 what should the top-level instruction set be for my agent? And so
10:54 I like to keep things simple by working off of a template that I use for all of
10:58 my AI agents at least as a starting point. I always have persona and goals,
11:02 tool instructions and examples, output format, and miscellaneous instructions.
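As a sketch, that template can be as plain as a string constant in your code. The bracketed contents below are placeholders I'm assuming, not the exact wording of the template from the video:

```python
# Hypothetical skeleton of the system prompt template described above;
# the bracketed text is filler to be replaced per agent.
SYSTEM_PROMPT_TEMPLATE = """
# Persona and Goals
You are [persona]. Your goal is to [primary goal].

# Tool Instructions and Examples
Use [tool_name] when [condition]. Example: [short workflow example].

# Output Format
Respond with [format, e.g. concise markdown]. Avoid [things to avoid].

# Miscellaneous Instructions
[Catchall fixes for little issues discovered while testing the agent.]
"""
```

Keeping all four sections in one place makes it obvious where a new instruction belongs as you iterate.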
11:07 And what you shouldn't worry about at this point is setting up elaborate
11:12 prompt evaluations or split testing your system prompts. You can get into that
11:15 when you really want to refine your agent instructions. But right now, just
11:20 keep it simple and refine at a high level as you are manually testing your
11:24 agent. And if you want to see that system prompt template in action, I've
11:27 got you covered. I'll have a link to this in the description as well. It's a
11:32 real example of me filling out those different sections, creating a system
11:36 prompt for a task management agent. So, I have my persona defined here. I'm
11:40 defining the goals for the task management agent. The tool instructions
11:44 like how I can use different tools together to manage tasks in my platform.
11:49 The output format, just specifying ways that I want it to communicate back to me,
11:53 or things to avoid. Then some examples. Now, this applies more to complex agents
11:57 and system prompts, where you actually want to give an example of a
12:01 workflow chaining different tools together, so it doesn't really apply
12:03 here. And then the last thing is just miscellaneous instructions. This is also
12:08 the place to go to add extra instructions to fix those little issues
12:12 you see with your agent that don't necessarily fit into any of the other sections. So,
12:15 it's a catchall to make sure that there's a place to put anything as you're
12:19 experimenting with your agent and refining your system prompt. And then, as
12:23 far as tools go for your AI agents, there's just a few things I want to
12:26 cover quickly to help you keep things simple and focused. The first is that
12:31 you should keep your tools to under 10 for your AI agents, at least when
12:34 starting out. And you definitely want to make sure that each tool's purpose is
12:38 very distinct. Because if your tools have overlapping functionality or if you
12:42 have too many, then your large language model starts to get overwhelmed with all
12:46 the possibilities of its capabilities, and it'll use the wrong tools. It will
12:51 forget to call tools, and it's just a mess. Like, definitely keep it to under
12:56 10. And then also, MCP servers are a great way to find prepackaged sets of
13:00 tools you can bring into your agent when you're, you know, creating
13:02 something initially and you just want to move very quickly. And so definitely,
13:06 based on what you're building, you'll probably be able to find an MCP server
13:10 that gives you some functionality right out of the box for your agents. And then
13:14 the last thing I'll say is a lot of people ask me, "What capabilities should
13:18 I focus on learning first when I'm building agents?" And tools and RAG is
13:24 always the answer that I have for them. Giving your AI
13:28 agent tools that allow it to search your documents and knowledge base.
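Just to illustrate the shape of that, here's a toy document-search tool. The documents and the keyword matching are made up for the example; a real RAG setup would use embeddings and a vector store instead of word overlap:

```python
# Toy stand-in for a "search the knowledge base" tool an agent could call.
# Real RAG uses semantic search; this keyword version only shows the shape.

DOCS = {
    "refunds.md": "Refunds are processed within 5 business days.",
    "shipping.md": "Standard shipping takes 3 to 7 business days.",
}

def search_docs(query: str, limit: int = 2) -> list[str]:
    """Return the documents sharing the most words with the query."""
    words = set(query.lower().split())
    scored = sorted(
        DOCS.items(),
        key=lambda kv: len(words & set(kv[1].lower().split())),
        reverse=True,
    )
    return [f"{name}: {text}" for name, text in scored[:limit]]
```

The agent calls a function like this, gets grounded text back, and answers from that text instead of from memory.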
13:31 That's what retrieval augmented generation is. And so really, it's
13:35 giving your agents the ability to ground their responses in real data. And I
13:40 would say that probably over 80% of AI agents running out in the wild right
13:44 now, no matter the industry or niche, are using RAG to some extent as part of
13:49 their capabilities. And then, continuing with our theme here, what not
13:55 to focus on when building tools: don't worry about multi-agent systems or
13:59 complex tool orchestration yet. When you have a system that starts
14:03 to have more than 10 tools, that is generally when you start to split into
14:07 specialized sub-agents with routing between them. Those kinds of
14:12 systems are powerful and necessary for a lot of applications, but definitely
14:15 overengineering when you're just getting started creating your agent or system.
14:18 Also, if you want to learn more about RAG and building that into your agents,
14:21 check out the video that I'll link to right here. I cover that all of the time
14:25 on my channel because it is so important. And so with that, moving on
14:29 to the next thing, we have our security essentials because it is important to
14:32 think about security upfront when you're building any software. But I
14:36 don't want you to overcomplicate it yet, right? Like, don't become a security
14:40 expert overnight. There are existing tools out there to help us with
14:43 security. So we can still move quickly as we're building our agent initially.
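For example, one of those general principles, never hardcoding API keys, costs almost nothing to follow from day one. A minimal sketch; the variable name is just a common convention, not something any provider requires:

```python
import os

def require_api_key(name: str = "ANTHROPIC_API_KEY") -> str:
    """Fetch an API key from the environment, failing loudly if it's missing."""
    key = os.getenv(name)
    if not key:
        raise RuntimeError(f"Set {name} in your environment or .env file; never hardcode it")
    return key
```

Pair this with a .env file kept out of version control and you've covered the most common leak.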
14:47 We'll definitely want to pay more attention to security when we're going
14:50 into production. But at first there are a couple of tools that I want to call
14:54 out here. And then just some general principles to follow. Like for example,
14:58 don't hardcode your API keys, right? Like you don't want to have your OpenAI
15:02 or Anthropic API key just sitting there right in your code or your n8n workflow,
15:06 for example. You always want to store that in a secure way, through things like
15:12 environment variables. And then also, when we think about building AI agents
15:15 in particular, there's a lot of security that we want to implement through what
15:18 are called guardrails, right? So, limiting what kind of information can
15:22 come into the large language model, and then also limiting the kinds of
15:27 responses that the agent can give, and having it actually retry if it
15:31 produces any kind of response that isn't acceptable for us. And there's a super
15:36 popular open-source repository that I lean on all the time to help with
15:39 guardrails, very creatively called Guardrails AI. And so it's a Python
15:43 framework, because I always love building my AI agents with Python, that helps you
15:48 build reliable AI applications by giving you both the input and output guardrails
15:51 that I'm talking about. So limiting what goes in, and limiting what
15:56 the agent can produce. And they provide a lot of different options for
15:59 guardrails. Like for example, one thing that you want to avoid quite often is
16:04 inserting any kind of PII (personally identifiable information) into a prompt
16:08 to an LLM, especially when it's going out to some model in the cloud, like
16:12 Anthropic or Gemini, instead of a local LLM. So limiting that kind of thing, or
16:16 maybe detecting any vulgar language that's outputted from an LLM, because they
16:20 will do that sometimes. Like those are just some examples of input and output
16:24 guardrails. And it is very easy to install this as a Python package and
16:29 bring these guards right into your code as you are interacting with your agents
16:32 like we saw earlier when I had that, you know, simple command line tool to talk
16:35 to the agent. Like I could just add a guard before or after that call to the
16:39 agent. So yeah, guardrails don't have to be complicated. There are tools like
16:42 this, even completely open-source ones like Guardrails AI, that make it very
16:46 easy. Okay, so we've talked about guardrails, and I gave you one example of
16:50 best practices for security in our codebase. But what about the other
16:54 million different vulnerabilities we have to account for in our codebase and
16:59 the dependencies we're bringing into our project? We can't expect ourselves to
17:03 become security experts overnight. And so it's important to learn these things,
17:07 but also we can lean on existing tools to help us with this vulnerability
17:12 detection. There are a lot of options out there for this, but Snyk Studio is
17:16 one that I've been leaning on a lot recently. And they also have an MCP
17:21 server within the studio to help us handle vulnerability detection
17:25 automatically, right within our coding process. So like always, I'm trying to
17:29 focus on open-source solutions for this video, but there's really no open-
17:33 source alternative to Snyk that I know about. This platform is incredible. So
17:38 in Snyk Studio, we can set up these different projects and integrations. We
17:42 can have it analyze our codebase and dependencies for vulnerabilities in our
17:46 GitHub repositories. They have a CLI. We can do things locally. They have the MCP
17:49 server that I'm going to show you in a little bit. I'll link to all this in the
17:53 description. But yeah, the MCP server in particular is super cool to me because
17:57 we can have vulnerability detection built right into our AI coding
18:02 workflows. Now, take a look at this: I have the Snyk MCP server connected
18:07 directly to my Claude Code after I went through the Snyk authentication process
18:10 in the CLI. And you can connect this to literally any AI coding assistant or MCP
18:15 client. So now, within Claude Code, I could build this into a full AI coding
18:18 workflow, which is very cool. I'm going to show you a simple demo right now.
18:23 I'll just say, you know, use the Snyk MCP to analyze my code and dependencies
18:29 for vulnerabilities. And so it's able to leverage different tools within the MCP server to check for
18:37 both, right? Like, it's a very robust solution here. And so I'll let it go for
18:41 a little bit. I'll pause and come back once it has run the vulnerability
18:44 detection. Okay, this is so cool. Take a look at this. So, within my basic agent
18:49 repository, first it used the Snyk MCP server to analyze for any
18:53 vulnerabilities in my dependencies, things like Pydantic AI, for example.
18:59 And then it does a code scan. So, this would also detect things like if I had
19:03 my environment variables hardcoded like the example that I gave earlier. So, it
19:07 found three issues with my dependencies and nothing with my code, which I'm very
19:11 proud of. I got no issues with my code. And not only does it do the analysis,
19:15 but it gives me a summary and lists the actions I can take to remedy things.
19:18 Like, here are the medium severity vulnerabilities that I have
19:22 within a few of my dependencies, nothing in my code. And then it gives me
19:26 recommendations to fix things. And so I can go and say, yes, action on this
19:30 now. And it's going to update my requirements.txt and fix these things. And I
19:34 could even run the Snyk MCP server again. And you can definitely see how
19:37 you'd build this kind of thing directly into the validation layer of your AI
19:42 coding workflow. Very, very neat for any AI agent or really any software you want
19:46 to build at all. Moving on, I want to talk about memory now: managing the
19:50 tokens that we're passing into the LLM calls for our agents. And this really is
19:54 a hot topic right now, especially with all the rate limiting that people are
19:58 getting with AI coding assistants like Claude Code. It really is important to
20:03 manage our context efficiently, only giving our agents the information they
20:07 actually need and not completely bloating our system prompts with
20:10 thousands of lines of instructions and tools that they don't actually need.
20:14 That's what you want to avoid. And so, just a couple of simple tips here going
20:18 along with our theme. The first one is to keep your prompts very concise: both
20:23 your system prompts and also the tool descriptions that describe to your
20:27 agent when and how to use tools, like I showed in the code earlier. You don't
20:31 need to over complicate it. That's why I have these templates for you like the
20:34 one for the system prompt, right? Like you have your goal just a couple of
20:38 sentences, your persona just a couple of sentences. Keep it very organized and
20:42 keeping it organized also helps you keep it quite concise. You don't need to
20:47 overthink it. And so keeping your system prompts to just a couple of hundred
20:51 lines at most is generally what I recommend. Some solutions might need
20:54 more, but that's when I'd start to question like could you really make that
20:58 more concise or split it into different specialized agents so each agent still
21:03 has a simple system prompt. Another thing you can do for agents that have
21:07 longer conversations is to limit the context, in a kind of sliding window, to the
21:12 10 or 20 most recent messages, for example, that you actually include.
21:15 And going back to the code, I'll even show you what that looks like
21:19 here. Like right now, when we call our agent and run it, we're passing in the
21:22 entire conversation history. But in Python, if I wanted to include just the
21:26 last 10 messages, I could do something like slicing with conversation[-10:]. And so now, maybe,
21:30 you know, all previous messages aren't really as relevant anymore; we just want
21:34 to include the most recent 10. That's how we can do that. So that's another
21:37 really popular strategy. Also, tools like n8n have that as an option baked
21:41 directly into their short-term memory nodes. So very useful to know. And then
21:46 also when you start to have so much information about a single user that you
21:51 don't want to include it in the short-term memory, that's when you can
21:55 look at long-term memory. But also, don't build it from scratch. Again,
21:59 don't overcomplicate it. There are tools that you can use, just like with
22:03 security, to help us with long-term memory, and Mem0 is one of those. Mem0 is
22:09 a completely open-source long-term memory agentic framework. And so I'll
22:12 show the GitHub in a second here, but yeah, when you have so much information
22:16 about a user that you can't just include it all in context, you need some way to
22:21 search through a longer-term set of memories and bring in only the ones that
22:24 are relevant to the current conversation, which actually does use
22:27 RAG under the hood, by the way. So again, another example of why it's such an
22:32 important capability. Um, but yeah, basically you're able to pull core
22:36 memories from conversations and store them to be searched later. That's what
22:40 Mem0 offers us. And it's so easy to include in our Python code, just like
22:44 Guardrails AI. I'll show you an example really quickly from their quickstart. You
22:48 install it as a Python package, and then you basically have a function to search
22:53 for memories, like performing RAG to find memories related to the latest message,
22:57 and then you have a function to add memories. And so it'll use a large
23:01 language model to extract the key information to store to be retrieved
23:05 later. And so this definitely solves the context problem because now you're able
23:09 to basically have infinite memory for an agent, but you don't have to give it all
23:13 to the LLM at once. It just retrieves things as needed. And of course, the
23:16 last thing I want to hit on for context is what not to focus on when you're
23:21 first building your agent. Do not worry about advanced memory compression
23:24 techniques. There are a lot of cool things that Anthropic especially has been doing
23:27 research on, but like, don't worry about that. Don't worry about specialized
23:31 sub-agents. These are both solutions to handle the memory problem when it starts
23:36 to get really really technical. But right now, just start simple and you can
23:40 always optimize things as you're starting to expand your agent and go to
23:44 production and you hit some limits. But right now, focusing on these things up
23:49 front is all you need to get through the first 90%, probably even beyond, depending on
23:54 how simple your agents are. And context was the last of the four core components
23:57 of agents. So, we've covered the core four and security. Now, I want to talk a
24:02 bit about observability and deployment: getting our agent ready for production.
24:06 And I will say that security, observability, and deployment definitely
24:10 go a lot more into the last 10% of building an agent. But I want to touch
24:13 on them here because there are some ways to design stuff up front very simply,
24:18 especially with observability. I want to introduce you to Langfuse right now. And
24:22 I covered this on my channel already. Link to a video right here on Langfuse
24:25 if you want to dive more into observability. But we can set up the
24:29 ability to watch for the actions that our agent is taking, view them in a
24:34 dashboard. We can do things like testing different prompts for our agents. It is
24:38 a beautiful platform and it's actually super easy to incorporate into our code.
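As a guess at the shape of that incorporation: the LANGFUSE_* variable names below follow the conventions Langfuse documents, but confirm the exact names and client calls against its docs before relying on them:

```python
import os

def setup_observability() -> bool:
    """Enable tracing only when the Langfuse credentials are configured."""
    # Variable names assumed from Langfuse's documented conventions.
    required = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY", "LANGFUSE_HOST")
    missing = [name for name in required if not os.getenv(name)]
    if missing:
        print(f"Observability disabled; missing: {', '.join(missing)}")
        return False
    # ...initialize the Langfuse client and instrument the agent here...
    return True
```

Guarding on configuration like this keeps the agent runnable even before observability is wired up.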
24:43 And so I did this very sneakily already when I built the agent with you, but I
24:47 have this function here called setup_observability. And all it does is
24:53 initialize Langfuse based on some environment variables that I have set
24:56 here. And I cover all that in my YouTube video on Langfuse if you're curious. But
24:59 you basically just connect to your Langfuse instance. And then, after you
25:05 set up the connection and instrument your agent, your Pydantic AI agent, for
25:09 observability, that is all you have to do. Literally no more code in here for
25:13 Langfuse. And it's going to watch for all of our agent executions, even
25:17 getting a sense of the tool calls that it's making under the hood. So take a
25:20 look at this. So I'm in the Langfuse dashboard now where I can view that
25:25 execution that we had from our test earlier, where it used the add numbers
25:29 function, and we have all of this very rich data around the number of tokens
25:33 that it used and the latency. We can view the tools and also look at the different
25:37 parameters that we have, like the tool arguments for the numbers to add.
25:41 We can view the system prompt that was leveraged here, based on that template we
25:45 have defined. We have all this observability, which also really helps for
25:49 monitoring our agents in production when other users are leveraging the agent, since
25:53 we can't just look at our chat and see how the agent is performing. And
25:57 there are so many other things within Langfuse as well that I don't want to get
26:01 into right now, like evals for your agent. It is a totally open-source platform,
26:06 just like Mem0 and Guardrails AI. So again, focusing on open source a lot in
26:10 this video. There are other solutions for this kind of observability, like
26:14 Helicone and LangSmith for example, but Langfuse is the one that I love using.
26:18 And I know I didn't cover it too much in the code, but it really is as simple as
26:21 what I showed you. And so you can use the repository that I have linked below
26:25 as your template to like start an agent with observability baked right in if
26:29 you're interested. And then the very last component that I want to at least
26:33 touch on right now is how you can configure your agent upfront to work
26:37 well for deployment when you're ready to take your agent into production. Now,
26:41 obviously that's going to be part of the last 10%. Not something I'm going to
26:44 talk about a lot in this video, but the one big golden nugget that I want to
26:49 give you here is you should always think about how you can build your AI agent to
26:55 run as a Docker container. Docker is my method for packaging up any application,
27:00 especially AI agents, that I want to deploy to the cloud. And also, I will say
27:06 that AI coding assistants are very good at setting up Docker configuration, like
27:10 your Dockerfiles and Docker Compose files. Yeah. So leverage those, and then
27:14 you can add, you know, a simple Streamlit application with Python, or
27:18 build a React front end, to create a chat interface for your agent if it is a
27:22 conversationally driven agent. Otherwise, what I like to do for more, you
27:26 know, background agents that run on a data set periodically, is run them
27:29 as serverless functions. So it's kind of like: background agent, run it as
27:33 serverless in a Docker container; conversational agent, run it in a
27:37 Docker container with a front-end application as well. Those are pretty much the
27:41 two tracks I have for any agent that I want to deploy. So yeah, just think
27:45 Docker-native. Have that in your mind from the get-go when you're building
27:48 your agent. What you don't want to focus on for observability, deployment, and
27:52 everything production-ready is Kubernetes orchestration, extensive LLM
27:57 evals, or prompt A/B testing. Like, some of the things we have in Langfuse are
28:00 very powerful when you want to super-refine your agent tools and system
28:03 prompt and everything, but don't even worry about that yet. You can definitely
28:07 get there, and like I said, it's a core part of the last 10%. But right now, don't
28:12 even think about like the infrastructure that much because unless you're running
28:15 local large language models, you don't really need heavy infrastructure for
28:19 your agents at all. Like obviously it depends on the amount of usage of your
28:23 agent. But for most use cases, just like a couple of vCPUs and a few gigabytes of
28:28 RAM is all you need to run an AI agent even if you have a front-end application
28:33 as well. Very, very lightweight, as long as you are calling a third party for the
28:38 large language model, like OpenRouter or, you know, Anthropic or Gemini, whatever
28:42 that might be. So there you go, that's everything that I have for you today,
28:46 helping you just keep it simple, which will not just help you build better
28:49 agents, even when you have to scale complexity, but will also help you
28:53 get over that hurdle of motivation, because I'm giving you permission to not
28:57 be perfect at first. You just start with the foundations like I showed you, and
29:01 then build on top and iterate as you need. And so I hope that inspires you to
29:05 just go and build your next AI agent right now because it can be super simple
29:10 to start. And so with that, if you appreciate this video and you're looking
29:13 forward to more things on building AI agents and using AI coding assistants,
29:17 I'd really appreciate a like and a subscribe. And with that, I will see you