// transcript — 656 segments
0:00 The Power and Simplicity of Skills
0:02 It hasn't been that long since Anthropic released skills to the world and it is
0:07 one of the most important advancements in AI recently. And honestly, one of the
0:12 biggest reasons it is so important is because of how beautifully simple it is
0:17 and that is the motto of Anthropic: the simpler the better. We see this looking
0:21 at Claude code as well. I mean, when you get into how skills work and the idea of
0:25 progressive disclosure that we'll get into, you can't help but think to
0:29 yourself: why in the world were skills not commonplace ever since generative AI
0:34 was a thing? And people have been building their own version of skills
0:38 before Anthropic popularized it. It's super easy to incorporate the idea of
0:43 skills and progressive disclosure into any AI agent or tool that you want. And
0:47 that, my friend, is what I'm going to show you how to do today. Because here's
0:50 the thing, and a lot of people don't realize this. We are not limited to the
0:55 Claude ecosystem to take advantage of skills. Anthropic does get a lot of
0:59 credit for popularizing the idea, and we'll get into some of their best
1:02 practices for creating skills as well. But this really is a universal concept.
1:07 It's all about strategizing how we can allow the agent to discover context and
1:12 capabilities as it needs it to be more flexible and context efficient as
1:16 opposed to something like an MCP server or super long global rules where you're
1:19 just dumping a bunch of context into the LLM right away and completely
1:24 overwhelming it. So, as much as I really appreciate the Anthropic ecosystem like
1:29 Claude Desktop and Claude Code, we don't always want to be limited to these
1:33 platforms because a lot of the time you want to build skills into your own
1:36 workflows or AI agents. You want to use different large language models, maybe
1:40 even local AI. There are so many reasons to incorporate skills into our own
1:44 systems and really build them out for ourselves. And that's what I'm going to
1:47 show you how to do. We're going to take all of the concepts from Anthropic's
1:51 version of skills and we're going to map it into our own AI agent with the system
1:55 prompts and the tools we give it. And it's beautifully simple, right? Simple
2:00 but powerful. And so this is only going to take like 10 to 15 minutes. And then
2:04 you'll know after that exactly how to build this kind of thing into your own
2:07 systems. And I've got a template for you of course as well. All right. So there's
2:10 three things I want to cover with you in the next 15 minutes. It's going to be
2:14 super value-packed. So, first we need to get into, at a high level, how skills work
2:18 and why they're so powerful, even if you've used them before. Going over this
2:22 is going to be really valuable. Then we'll get into the template that I have
2:26 for you. This is a GitHub repo that of course I will have linked in the
2:29 description. And so, this is a demonstration using Pydantic AI as my
2:33 agent framework, how we can build our own idea of skills into any framework
2:37 that we want. And so I'll go over how I'm building this with Pydantic AI, but
2:41 the concept here is going to work no matter what tool you end up using like
2:45 LangChain, CrewAI, Agno, no framework at all, literally anything that you
2:50 want. And then just as an opportunity here to show you how far we can take our
2:54 custom agents, I also want to get into evals and observability. So how can we
2:59 make sure our agent is really following all the instructions and capabilities
3:02 that we give it? And so in our case right here, that it's truly leveraging
3:06 the skills that we give it. And so I'll get into that at the end just as a bonus on
3:10 top of everything showing you how to build skills for yourself. All right, so
3:13 Masterclass on Skills (+ Claude Guide)
3:14 now let's go over a really really quick master class on skills, what they are,
3:18 why they are so important. So Anthropic has this article that I'll link to in
3:22 the description, a really good guide, and they cover best practices for
3:26 building skills that we'll talk about in a little bit. And so the problem skills
3:32 are solving: we want to give our agent a lot of different capabilities to
3:35 supercharge them, but we don't want to overwhelm their context window. Agents
3:41 are very prone to being overwhelmed when we give them a lot of information
3:45 through our tools, conversation history, the system prompt, everything goes in
3:49 the window. And so other methods like MCP servers, the problem there is we're
3:53 giving a ton of tools up front to the agent even if it never needs to use them
3:58 in a specific conversation. That is bad. And so with skills, the best way to
4:02 explain it is to go to a diagram here in the article. And I also of course have
4:07 it blown up in another tab right here. So the beautifully simple power of
4:11 skills is the idea of progressive disclosure. Instead of giving all the
4:15 tools up front to our agent like MCP servers, we are allowing our agent to
4:20 discover the capabilities over time as it actually needs them. And so the only
4:24 thing we're giving to the agent right away in the system prompt or you can
4:28 think of it like the global rules is the description of the capability or the
4:32 skill. And so in this case as an example we have a PDF processing skill. So we're
4:37 just telling the agent: hey, you have this capability if you need it, if the
4:40 user actually asks you to do something with PDFs. And then if the agent receives
4:45 that kind of request and it wants to leverage the capability, then it'll read
4:51 the skill.md. So the skill.md, this is the main file that drives any skill that
4:55 you'll see from Anthropic and all the ones that we'll go over with our custom
5:00 implementation here. And so this has the full instructions for the capability.
5:04 Now it's starting to load in the context. It's the second layer of
5:08 progressive disclosure, right? Like the description is layer 1. This is layer
5:12 two. And then there are some other documents oftentimes that the skill.md
5:16 will reference. This is the third layer of progressive disclosure because we can
5:20 load in even more context. If you want to get even more specific about
5:24 something with PDFs, for example, like we have this extra set of instructions
5:29 for if we need to fill out some form, some PDF form, right? Like not all the
5:33 time we're working with PDFs do we care about this, but sometimes we do. So,
5:37 we're just discovering more and more context over time as we actually need it
5:42 for the task at hand. And that saves the LLM from being overwhelmed because if it
5:46 had all these documents and a dozen other skills loaded all at once, it
5:49 would be tens of thousands of tokens just to tell it right away all the
5:52 different capabilities that it has and it probably only has to use one or two
5:54 The Strategy - Build Skills into ANY AI Agent
5:56 of them in a single conversation. All right, so with that explanation, I want
6:00 to now get into the template and exactly how we are translating the ideas from
6:04 Anthropic into our own agent implementation. And again, I'm using
6:08 Pydantic AI because it is my favorite AI agent framework. It has been for
6:13 over a year now. And so starting off, we have the YAML front matter description,
6:17 which by the way, as far as best practice goes, it's good to have this be
6:23 somewhere between 50 and 100 words. You don't want to load in too much right
6:26 away, right? That defeats the purpose of skills. So you want to be pretty short,
6:29 but at least descriptive enough so the agent knows when it should leverage a
6:33 capability. So usually the description is, you know, something around 5% of the total
6:37 context of the skill. That's all you're loading up front. Obviously, this is a
6:42 very very rough estimate. And so when we think about our description here,
6:47 essentially every single skill that we want our agent to have access to, we
6:51 need it to have the description and the path to the skill in the system prompt.
6:56 And so we're going to be taking advantage of what is called a dynamic
6:59 system prompt. We have our static content, the main instructions for our
7:03 agent that doesn't really change. But then also, we're going to collect the
7:08 descriptions from all YAML front matters, all of our skill.mds, and we're
7:13 going to put that in our system prompt. And I'll even get into the code just a
7:16 little bit with you and show you how that works after we go through this
7:20 diagram here. And then we have our skill.md, the main instructions for the
7:25 capability. And as far as best practices go, usually you want this between 300
7:29 and 500 lines long. Now, obviously that varies a ton depending on the
7:33 complexity. It could be a lot shorter as well, but usually around 30% of the
7:37 total context for the skill if you have a lot of reference files. Sometimes you
7:41 don't even need this third layer, by the way, if the capability is simple enough.
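To make the anatomy concrete, here's what a SKILL.md might look like for a hypothetical weather skill — the field names mirror Anthropic's format, but the content here is illustrative, not taken from the template:

```markdown
---
name: weather
description: Fetch current weather and forecasts for a city via an HTTP API. Use when the user asks about weather, temperature, or forecasts for a location. See references/api.md for advanced query parameters.
---

# Weather Skill

## When to use
The user asks about current conditions or a forecast for a specific place.

## How to use
1. Identify the city from the user's request.
2. Call the weather API as described in the sections below.
3. For advanced parameters, read references/api.md (the third layer).

<!-- ...full instructions, typically a few hundred lines... -->
```

The YAML front matter (name plus a short description) is layer 1, the body is layer 2, and the files it points to under references/ are layer 3.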
7:45 But anyway, as far as how this translates to our own system, our own
7:50 agent, we just need a simple tool. Typically, this load skill tool is going
7:55 to take a path to the skill.md. And so, the agent can invoke that. It can give
7:58 the path which should be in the system prompt. And then we're just going to
8:03 take the skill.md and return that as the tool response. And so now that is
8:07 included in the context for the agent because every single time we call a
8:11 tool, whatever it returns is now in the short-term memory for the agent. It is
8:15 that simple. So remember, system prompt has the description and the path to the
8:20 skill. So it has all the context it needs to know when it should leverage
8:23 the skill and then what parameter to pass in as far as the path so it reads
8:28 that file. And then in the skill.md we might also have references to our
8:32 reference files, the third layer of progressive disclosure: the scripts and
8:36 markdowns to read and leverage to take the capabilities even further. And so I
8:41 have a second tool in my system for this to read a reference document. And so you
8:45 just give as the parameters here the skill and then the path to that
8:50 secondary file that you want to leverage. And so this is where you have
8:54 unlimited depth. I mean you could have a fourth layer of progressive disclosure if
8:58 you want but that probably gets way too complicated. So the rest of your skill
9:03 will just live in these files. And I could combine these tools together. They
9:06 work very similarly because it's mainly just to read a certain file. But in my
9:10 experience from the testing I've done, it's helpful to the agent to know this
9:14 distinction. Like, this is the main instruction set for your capability. And
9:18 that's it. I told you it would be simple. And I'll also show you the code
9:22 for the Pydantic AI agent in a little bit after a demo, with all these different skills
9:27 that I have incorporated into the template that I have for you. There are
9:31 quite a few. Like, you could extend this to dozens and dozens, and the
9:35 agent would still work well because there's just that little bit of context
9:39 taken up front for each of the capabilities. All right, so I have my
9:43 custom Pydantic AI skills agent loaded up here in the terminal. This is my
9:47 playground to test out all the different skills that I have incorporated. And I
9:52 can just drop any skills that I want into a skills folder just like you can
9:57 with Claude Code. And so this is using the template that I'll link to in the
10:01 description. I've got instructions here explaining how the skills work and a
10:05 quick start. Very, very easy to get this up and running. Feel free to use this as
10:09 a resource. Give it to your AI coding assistant if you want to incorporate
10:13 this all for yourself. I also use the idea of a tool set in Pydantic AI. So,
10:17 you should be able to pull that out and add skills into your own agent very
10:21 easily. Yeah, go ahead and take a look at this. So, when I start the terminal,
10:26 it is reading all of the skill.md files in my skill folder. It is loading all
10:31 these skills and so I can use any one of them now. So for example, I can say help
10:36 me find a good dinner dish with chicken. So it's going to use the recipe finder
10:40 and I have all the logs here on purpose just so we have visibility into what's
10:43 going on. So you can see it really is using the tool because we're loading the
10:49 skill recipe finder. And then from that we have instructions to understand how
10:54 to leverage some kind of API that this capability gives us. And so now it's
10:59 making that request to help me get some recipes that have chicken in them.
11:03 And take a look at that. I found lots of delicious chicken options for you. And
11:07 wow. Okay, now I'm hungry. All right. But anyway, now I can say like what is
11:12 the weather in uh Tokyo, Japan right now. And so we'll see here that it's
11:16 going to leverage another skill. So there we go. It's loading the weather
11:21 tool. And then it's going to make an API request. And the only reason it knows
11:24 how to do this is because of the instructions that we have here. And I
11:28 could even tell it to load the reference document for the weather skill
11:34 because there's the third layer of progressive disclosure if it needs more
11:38 information. I have this API reference. So it can make more complicated
11:41 requests. And so that's a little bit forced there, but I'm just trying to
11:44 show you an example of the third layer of progressive disclosure as well. So
11:48 really, really cool. And the important thing here is that based on our
11:52 conversation with the agent, we probably only need to leverage one or two of
11:56 these at a time. I chose different skills that are very different from each
12:01 other on purpose here to drive home the point that like most of the time we
12:04 don't need all of this. So if we had an MCP server for every single one of
12:07 these, we would just be overwhelming our LLM for no reason. Okay. So now I want
12:10 High Level Code Dive - How it Works
12:11 to get into the code a little bit with you to show how things are working under
12:16 the hood. And even if you're not super technical, I'm going to keep this high
12:19 level. I'm just going to show you how we're able to leverage this agent, how
12:23 you can use this for yourself. So, first of all, in the readme here, I've got
12:27 instructions to set everything up. And you can change this in your environment
12:30 variables. One of the things you can specify is the directory where it
12:34 searches to load all the skills dynamically into your agent. And so just
12:38 to keep things as similar to the Anthropic implementation as possible, I
12:43 have the skills directory just called skills, just like you have in Claude Code
12:47 for example. And so there's a folder in here for every single one of the skills
12:51 that we have. This looks just like the Anthropic skills where we have a
12:56 skill.md. This is our YAML front matter. This is the description that's loaded
12:59 into the agent up front. I'll show you how that works with the dynamic system
13:02 prompt. So the agent knows like, okay, if I want to use a weather API, let me
13:07 read this entire skill.md file so I can use the API and I know how to do so. And
13:11 then the third layer of progressive disclosure is optional, but for a lot of
13:15 these, we have the reference folder for code review. We even have some Python
13:18 scripts that we can use. And so all of these reference documents are called out
13:22 in the skill.md so the agent knows it can go deeper to get those. And then for
13:26 some like the world clock, we literally just have a skill.md because this is
13:30 just really like for converting time zones. Obviously, we don't need that
13:34 much context for the agent to know how to do that. And so it's just a pretty
13:38 simple skill.md only a couple of hundred lines long. And so the way that this
13:43 works, I have my Pydantic AI agent. I have a lot of other content on my
13:47 channel covering Pydantic AI. So I won't go too much into the weeds here, but we
13:51 have our agent definition, which by the way, this agent you can use Open Router,
13:55 Ollama, or OpenAI, easy to extend it for others as well. So again, not stuck to
14:00 the Claude ecosystem. And what we're doing here is we are creating a dynamic
14:04 system prompt. And so when we first define the agent, we're not setting the
14:08 system prompt at all because we're going to do it right here. And so in Pydantic AI,
14:13 the way that you do this is you reference your agent.system_prompt.
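The idea being described can be sketched with plain Python. This is a minimal illustration, not the repo's actual code — the function name, prompt wording, and naive front matter parsing (single-line `key: value` pairs only) are my own assumptions — with the Pydantic AI wiring shown as a comment:

```python
from pathlib import Path

def build_skills_prompt(skills_dir: Path) -> str:
    """Collect the YAML front matter from every SKILL.md and render the
    'layer 1' block (name, path, description) for the dynamic system prompt."""
    lines = ["Available skills (load with the load_skill tool when relevant):"]
    for skill_md in sorted(skills_dir.glob("*/SKILL.md")):
        text = skill_md.read_text()
        meta = {}
        if text.startswith("---"):
            # Naive front matter parse: single-line key: value pairs between the --- markers.
            for raw in text.split("---", 2)[1].strip().splitlines():
                if ":" in raw:
                    key, _, value = raw.partition(":")
                    meta[key.strip()] = value.strip()
        name = meta.get("name", skill_md.parent.name)
        desc = meta.get("description", "")
        lines.append(f"- {name} (path: {skill_md}): {desc}")
    return "\n".join(lines)

# With Pydantic AI you would then register this as the dynamic system prompt
# (agent construction omitted; STATIC_INSTRUCTIONS is a placeholder):
#
# @agent.system_prompt
# def system_prompt() -> str:
#     return STATIC_INSTRUCTIONS + "\n\n" + build_skills_prompt(Path("skills"))
```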
14:18 This is our Python decorator. Now the function below this is where we get to
14:22 define the system prompt for our agent. So we can inject things at runtime
14:26 because what we're doing with this line of code right here is we're calling a
14:30 function, which I'm not going to get into the weeds of, that is going to search
14:34 this skills directory. It's going to find every single skill.md, take the YAML
14:39 front matter, extract it from the skill.md, and then put it into
14:44 the system prompt. And so we have all of the descriptions of the skills and their
14:49 paths as well as our primary system prompt. So we still have our base
14:53 instructions that don't change. In fact, a lot of my instructions here are just
14:56 telling the agent how to use skills. So it's very important. Large language
15:00 models by themselves do not understand how to leverage these capabilities. What
15:04 Claude did and what we have to do ourselves is be very descriptive here. Like, here's
15:08 what skills are. Here's the metadata so you know the ones that are available to
15:12 you. And then here is step by step how you leverage a skill when the
15:16 description is screaming out to you that you want to use that capability. So all
15:20 the prompting that I went over here is the first layer of progressive
15:24 disclosure. The dynamic system prompt is how we are letting the agent know about
15:28 everything upfront. And so then we get into our tool set. And so I'm giving a
15:34 single tool set to my Pydantic AI agent that has everything it needs to work
15:38 with the skills to basically read everything we have in the skills
15:41 directory. And so going to that definition here, there are three tools
15:46 that we are giving. So we have the load skill and read reference. And then one
15:51 other tool that I have here just to make it easier for the agent is to list the
15:54 reference documents, just in case the skill.md doesn't reference one directly;
16:00 at least then our agent is still able to find it. So we're just trying to
16:05 make it as easy as possible to discover everything. That's the whole point of
16:08 skills: discovering the capabilities that the agent has access to. And so for
16:13 example, to load a skill, we just have to give the name of the skill. That's
16:17 one of the things that's included in the system prompt. And so what we do here is
16:22 we have our skill path that's set in our environment variable and then we have
16:25 the name and then we're looking for the skill.md there. And so we're loading all
16:29 of that and then we're returning the content of the skill.md. That's how
16:33 we're including it now in the context window for the agent going forward. And
16:36 it's very similar for when we're reading one of our reference documents as well.
16:41 The code is actually almost identical, but there are some differences there to
16:44 make sure that we're reading a reference document specific to the skill, with some
16:48 protections that I have in place for the agent. And so, yeah, that's pretty much
16:52 the agent as a whole. Like, that's how it works. It's super simple
16:56 in the end. And also, because I have this as a Pydantic AI tool set, you can
17:01 take this tool set, like you could copy this file and then just a couple of
17:05 these other ones here, and you could bring this into your own Pydantic AI agent
17:08 in just a couple of minutes. And your AI coding assistant could help you do this
17:13 so incredibly fast. So, please use this as a resource for yourself. Just give it
17:17 this repository and say like, "Hey, I have all these skills." You can put like
17:21 literally any skill that you want in this folder right here and then the next
17:23 time you interact with the agent, it'll have those capabilities automatically.
17:27 So, very dynamic system that I built for you. Really good starting point for any
17:31 kind of system you want to create. Now, one other really important thing to
17:33 Build Your Own Custom Skills
17:35 cover here is how you can build your own skills. And so, this guide that I have
17:39 linked in the description is a really good starting point. Another quick tip
17:44 that I have for you, super useful. If you go into Claude Desktop, you can use it
17:48 to help you build your skills that you can then bring into the skill directory
17:52 for your custom agent. So, you just go to file settings and then capabilities.
17:57 You scroll all the way down to skills, go to example skills, and you can toggle
18:02 on the skill creator. So, this is really meta, but this is a skill to help you
18:07 build more skills. So when Claude uses this, it's going to pull in all the
18:10 instructions and best practices for creating skills and guide you through
18:14 that process. You can go here and say like, "Hey, help me build a skill for
18:17 LinkedIn posting or help me build a skill to generate powerpoints or to
18:21 create standard operating procedures, like whatever you need." And it'll walk
18:24 you through creating that. And then it'll create a skill.md and then
18:28 potentially some of those reference documents and you can take that and just
18:31 put it in a new folder here in the skills directory. It is that easy to
18:35 build your own skills, and the sky is the limit for the capabilities that you
18:39 can create. All right, so at this point we now know the importance of skills and
18:41 Testing Agent Reliability with Evals
18:44 how to build them into any agent that we want. But the big question we have here
18:49 is reliability. When we take these capabilities and you could have dozens
18:53 of skills, you give them to your agent. How do you know that the agent is always
18:57 going to leverage them when you want it to? Like you might have a skill for
19:01 creating X posts, but you ask it to help you with content creation and it doesn't
19:04 pull that skill because it doesn't know that content creation should mean that
19:08 it should pull the X skill, right? Like you want to test for those things, but
19:11 when you have dozens of different skills, it's really annoying to interact
19:15 with the agent and send in a question and make sure it's using each one of
19:19 them properly every single time you make changes to your agent. So that is where
19:23 evals come in. We can create an automated way to define questions and
19:27 the expected tool calls or in this case the expected skills that it leverages.
19:32 And so I'll cover what that looks like with you and then also get into
19:35 observability. So when our agent is running in production using these
19:39 different skills, we can look at how real users are interacting with our
19:42 agent and making sure the agent is responding appropriately. And I'm pretty
19:46 excited to get into this for the last part of the video here. I'm going to be
19:49 pretty brief, but this is something I don't get to cover enough on my channel.
19:53 Evals and observability are things that are super important, but I don't get to
19:58 make content on them very much. So, luckily for us, Pydantic AI
20:04 has a very robust evaluation framework built right in. And so, essentially what
20:10 we can do is create these YAML files where we define all of our test cases.
20:15 And so, for example, I'm going to send in the question, what's [snorts] the
20:18 weather in New York right now? And then I have an evaluator to make sure that
20:22 the weather skill was loaded. So we can also create our custom evaluators. I
20:27 don't want to get into the code too much right now. You can read up on this in
20:31 the documentation. Use this as an example for your AI coding assistant.
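In spirit, the evaluator is very simple: each golden case pairs a question with the skill(s) the agent is expected to load, and the check passes if the recorded load_skill calls cover them. This sketch is my own stripped-down illustration of that idea, not the Pydantic Evals API — `run_agent` stands in for actually running the agent and inspecting its message history:

```python
# Each golden case pairs a user question with the skills the agent should load.
GOLDEN_CASES = [
    {"question": "What's the weather in New York right now?",
     "expect_skills": {"weather"}},
    {"question": "Help me find a good dinner dish with chicken",
     "expect_skills": {"recipe-finder"}},
]

def skills_loaded_evaluator(loaded: set[str], expected: set[str]) -> bool:
    """Pass if every expected skill was loaded via the load_skill tool."""
    return expected <= loaded

def run_evals(run_agent) -> tuple[int, int]:
    """Run every golden case; returns (passed, total).

    `run_agent` takes a question and returns the set of skill names the
    agent actually loaded during that run (e.g. extracted from tool calls
    in the message history)."""
    passed = 0
    for case in GOLDEN_CASES:
        loaded = run_agent(case["question"])
        if skills_loaded_evaluator(loaded, case["expect_skills"]):
            passed += 1
    return passed, len(GOLDEN_CASES)
```

In the real template, Pydantic AI's evals framework drives this from YAML-defined cases instead of a hard-coded list.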
20:34 But I have this custom evaluator to make sure that the right skills are loaded
20:38 based on the questions that I send in. So now instead of me having to go into
20:41 the agent and ask it each of these things to make sure it's using the
20:45 different skills like the code review skill and the research assistant skill,
20:49 now I can just run this single script. So I can call this Python script right
20:54 here to run my evaluators. It loads in my golden data set, as I call it, and goes
20:58 through the questions one at a time. So, I am paying for the LLM credits, but
21:03 these are really cheap, really fast, just a smoke test to make sure that all
21:08 the different skills that I've added to my folder are actually being used
21:12 properly by my agent. If it's not, then it means that maybe there's an issue in
21:15 my loading capability or just my system prompt needs to be better or the skill
21:19 descriptions need to be better. Like, there's definitely different things that
21:22 need to be adjusted if the agent isn't working as you expect it to. And so evals
21:27 are very important to run every single time you change the system prompt for
21:31 your agent or even just the skills that you're giving it access to. So I also
21:35 have instructions in the readme for how to run the evals. And you can feel free
21:39 to poke around in the code for this as well if you want to see how to set this
21:42 up for your own Pydantic AI agents. But you want to do this for pretty much any
21:46 agent you're deploying to production. Evals are so important, and skills are
21:49 just a good example because there are so many different capabilities that we
21:53 want to test for here. So the logs are pretty verbose, but I'm just using
21:57 Haiku for all the tests here. So it's really nice and fast. But at the bottom
22:02 here, 25 out of 25 cases have passed. And so I've sent in a lot of different
22:06 requests to make sure the agent properly understands all of the skills that I've
22:10 given it. So it's good to do this instead of having to do a bunch of
22:14 manual testing myself after every single time I adjust my agent. So, I'll also
22:19 link to this page for Pydantic Evals in the description if you want to dive into
22:22 this and really take your agents seriously before deploying them to
22:25 Production Observability with Logfire
22:26 production. And the other thing I want to talk about is observability with
22:30 Logfire, because evals are great when you want to test your agent locally, but how
22:33 about when users are actually using your agent in production? You want to be
22:37 able to peer into the traces, as they're called, to see the decisions your agent
22:41 is making when people are using it out in the wild. And so, that's why we need
22:45 a tool like Logfire. So it's created by the Pydantic team. They also made Pydantic
22:49 AI. So it's just a fantastic integration to have here. And it's really easy to
22:53 set it up. And so there's just a minimal amount of code that I have to have in my
22:59 agent definition file. So the logfire token is one of the environment
23:01 variables. I explained that in the readme. And then we can configure
23:05 Logfire. So it's going to instrument all the Pydantic AI agents, as in every time we
23:09 invoke a tool or interact with the LLM, it's going to send all of that as
23:14 telemetry data. So we can track this running it locally like you're seeing
23:18 right here, but then also in production. And so yeah, I'm also going to have a
23:22 link to Logfire. I just wanted to mention this quickly because this is so
23:25 important being able to see our usage like token usage and cost in production.
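The setup really is minimal. Here's a sketch of the pattern — the `init_observability` wrapper and the lazy import are my own additions so the agent still runs without telemetry, and it assumes the `instrument_pydantic_ai` helper available in recent Logfire releases:

```python
import os

def init_observability() -> bool:
    """Configure Logfire tracing if LOGFIRE_TOKEN is set; no-op otherwise."""
    token = os.environ.get("LOGFIRE_TOKEN")
    if not token:
        return False  # no token: run without telemetry
    import logfire  # lazy import: only needed when telemetry is on

    logfire.configure(token=token)
    # Every Pydantic AI agent run, tool call, and LLM request now emits traces.
    logfire.instrument_pydantic_ai()
    return True
```

Call this once at startup; after that, traces show up in the Logfire dashboard whether the agent runs locally or in production.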
23:29 Looking into the different traces, like if a user reports a problem, we can go
23:32 in here and see like, okay, where did the agent mess up? Like is it something
23:36 wrong with their system? Did it just use a tool incorrectly like a bad parameter?
23:39 We can look at all the parameters, all the tool calls that it made. So we can
23:44 see the decisions even when we're not running the agent locally. It's very
23:47 very important to have evals and observability when you want to take your
23:52 agent seriously, and the pairing of Logfire and Pydantic AI just makes it so easy. So
3:13 Masterclass on Skills (+ Claude Guide)
3:14 now let's go over a really really quick master class on skills, what they are,
3:18 why they are so important. So Anthropic has this article that I'll link to in
3:22 the description, a really good guide, and they cover best practices for
3:26 building skills that we'll talk about in a little bit. And so the problem skills
3:32 are solving. We want to give our agent a lot of different capabilities to
3:35 supercharge them, but we don't want to overwhelm their context window. Agents
3:41 are very prone to being overwhelmed when we give them a lot of information
3:45 through our tools, conversation history, the system prompt, everything goes in
3:49 the window. And so other methods like MCP servers, the problem there is we're
3:53 giving a ton of tools up front to the agent even if it never needs to use them
3:58 in a specific conversation. That is bad. And so with skills, the best way to
4:02 explain it is to go to a diagram here in the article. And I also of course have
4:07 it blown up in another tab right here. So the beautifully simple power of
4:11 skills is the idea of progressive disclosure. Instead of giving all the
4:15 tools up front to our agent like MCP servers, we are allowing our agent to
4:20 discover the capabilities over time as it actually needs them. And so the only
4:24 thing we're giving to the agent right away in the system prompt or you can
4:28 think of it like the global rules is the description of the capability or the
4:32 skill. And so in this case as an example we have a PDF processing skill. So we're
4:37 just telling the agent, hey you have this capability if you need it. If the
4:40 user actually asks you to do something with PDFs and then if the agent receives
4:45 that kind of request and it wants to leverage the capability then it'll read
4:51 the skill.md. So the skill.md this is the main file that drives any skill that
4:55 you'll see from anthropic and all the ones that we'll go over with our custom
5:00 implementation here. And so this has the full instructions for the capability.
5:04 Now it's starting to load in the context. It's the second layer of
5:08 progressive disclosure, right? Like the description is layer 1. This is layer
5:12 two. And then there are some other documents oftentimes that the skill.md
5:16 will reference. This is the third layer of progressive disclosure because we can
5:20 load in even more context. If you want to get even more specific about
5:24 something with PDFs, for example, like we have this extra set of instructions
5:29 for if we need to fill out some form, some PDF form, right? Like not all the
5:33 time we're working with PDFs do we care about this, but sometimes we do. So,
5:37 we're just discovering more and more context over time as we actually need it
5:42 for the task at hand. And that saves the LLM from being overwhelmed because if it
5:46 had all these documents and a dozen other skills loaded all at once, it
5:49 would be tens of thousands of tokens just to tell it right away all the
5:52 different capabilities that it has and it probably only has to use one or two
5:54 The Strategy - Build Skills into ANY AI Agent
5:56 of them in a single conversation. All right, so with that explanation, I want
6:00 to now get into the template and exactly how we are translating the ideas from
6:04 Anthropic into our own agent implementation. And again, I'm using
6:08 Pydantic AI because it is my favorite AI agent framework. It has been for
6:13 over a year now. And so starting off, we have the YAML front matter description,
6:17 which by the way, as far as best practice goes, it's good to have this be
6:23 somewhere between 50 and 100 words. You don't want to load in too much right
6:26 away, right? That defeats the purpose of skills. So you want to be pretty short,
6:29 but at least descriptive enough so the agent knows when it should leverage a
6:33 capability. So usually the description is something around 5% of the total
6:37 context of the skill. That's all you're loading up front. Obviously, this is a
6:42 very rough estimate. And so when we think about our description here,
6:47 essentially every single skill that we want our agent to have access to, we
6:51 need it to have the description and the path to the skill in the system prompt.
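As a sketch, the YAML front matter at the top of a skill.md might look like this for a weather skill (the wording and fields are illustrative, not taken from any official skill):

```yaml
---
name: weather
description: >
  Look up current weather and forecasts for any city through a weather
  API. Use this whenever the user asks about temperature, conditions,
  or forecasts for a specific location.
---
```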
6:56 And so we're going to be taking advantage of what is called a dynamic
6:59 system prompt. We have our static content, the main instructions for our
7:03 agent that doesn't really change. But then also, we're going to collect the
7:08 descriptions from all YAML front matters, all of our skill.mds, and we're
7:13 going to put that in our system prompt. And I'll even get into the code just a
7:16 little bit with you and show you how that works after we go through this
7:20 diagram here. And then we have our skill.md, the main instructions for the
7:25 capability. And as far as best practices go, usually you want this between 300
7:29 and 500 lines long. Now, obviously that varies a ton depending on the
7:33 complexity. It could be a lot shorter as well, but usually around 30% of the
7:37 total context for the skill if you have a lot of reference files. Sometimes you
7:41 don't even need this third layer, by the way, if the capability is simple enough.
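As a minimal, stdlib-only sketch of that first layer, here is how the descriptions could be gathered for a dynamic system prompt. The skills/&lt;name&gt;/skill.md layout matches what's described in the video; the function names and the simple front matter parser are my own illustration, not the template's actual code:

```python
# Layer 1 sketch: collect each skill's YAML front matter description so
# it can be injected into a dynamic system prompt. Function names and
# the simple "key: value" parser are illustrative assumptions.
from pathlib import Path

def parse_front_matter(text: str) -> dict:
    """Parse simple 'key: value' front matter between --- markers."""
    meta = {}
    if text.startswith("---"):
        block = text.split("---", 2)[1]
        for line in block.strip().splitlines():
            if ":" in line:
                key, _, value = line.partition(":")
                meta[key.strip()] = value.strip()
    return meta

def collect_skill_descriptions(skills_dir: str) -> str:
    """Build the system-prompt section listing every skill and its path."""
    lines = []
    for skill_md in sorted(Path(skills_dir).glob("*/skill.md")):
        meta = parse_front_matter(skill_md.read_text())
        name = meta.get("name", skill_md.parent.name)
        desc = meta.get("description", "")
        lines.append(f"- {name} ({skill_md}): {desc}")
    return "Available skills:\n" + "\n".join(lines)
```

In a real agent, the string this returns would be appended to the static instructions inside the dynamic system prompt function.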
7:45 But anyway, as far as how this translates to our own system, our own
7:50 agent, we just need a simple tool. Typically, this load skill tool is going
7:55 to take a path to the skill.md. And so, the agent can invoke that. It can give
7:58 the path which should be in the system prompt. And then we're just going to
8:03 take the skill.md and return that as the tool response. And so now that is
8:07 included in the context for the agent because every single time we call a
8:11 tool, whatever it returns is now in the short-term memory for the agent. It is
8:15 that simple. So remember, system prompt has the description and the path to the
8:20 skill. So it has all the context it needs to know when it should leverage
8:23 the skill and then what parameter to pass in as far as the path so it reads
8:28 that file. And then in the skill.md we might also have references to our
8:32 reference files, the third layer of progressive disclosure: the scripts and
8:36 markdowns to read and leverage to take the capabilities even further. And so I
8:41 have a second tool in my system for this to read a reference document. And so you
8:45 just give as the parameters here the skill and then the path to that
8:50 secondary file that you want to leverage. And so this is where you have
8:54 unlimited depth. I mean you could have a fourth layer of progressive disclosure if
8:58 you want but that probably gets way too complicated. So the rest of your skill
9:03 will just live in these files. And I could combine these tools together. They
9:06 work very similarly because it's mainly just to read a certain file. But in my
9:10 experience from the testing I've done, it's helpful to the agent to know this
9:14 distinction. Like this is the main instruction set for your capability. And
9:18 that's it. I told you it would be simple. And I'll also show you the code
9:22 for the Pydantic AI agent in a little bit after a demo. All these different skills
9:27 that I have incorporated into the template that I have for you. There's
9:31 quite a bit. Like you could extend this to dozens and dozens and dozens and the
9:35 agent would still work well because there's just that little bit of context
9:38 Live Demo of my Pydantic AI Skills Agent
9:39 taken up front for each of the capabilities. All right, so I have my
9:43 custom Pydantic AI skills agent loaded up here in the terminal. This is my
9:47 playground to test out all the different skills that I have incorporated. And I
9:52 can just drop any skills that I want into a skills folder just like you can
9:57 with Claude Code. And so this is using the template that I'll link to in the
10:01 description. I've got instructions here explaining how the skills work and a
10:05 quick start. Very, very easy to get this up and running. Feel free to use this as
10:09 a resource. Give it to your AI coding assistant if you want to incorporate
10:13 this all for yourself. I also use the idea of a tool set in Pydantic AI. So,
10:17 you should be able to pull that out and add skills into your own agent very
10:21 easily. Yeah, go ahead and take a look at this. So, when I start the terminal,
10:26 it is reading all of the skill.md files in my skills folder. It is loading all
10:31 these skills and so I can use any one of them now. So for example, I can say help
10:36 me find a good dinner dish with chicken. So it's going to use the recipe finder
10:40 and I have all the logs here on purpose just so we have visibility into what's
10:43 going on. So you can see it really is using the tool because we're loading the
10:49 skill recipe finder. And then from that we have instructions to understand how
10:54 to leverage some kind of API that this capability gives us. And so now it's
10:59 making that request to help me get some recipes that have chicken in them.
11:03 And take a look at that. I found lots of delicious chicken options for you. And
11:07 wow. Okay, now I'm hungry. All right. But anyway, now I can say like what is
11:12 the weather in Tokyo, Japan right now. And so we'll see here that it's
11:16 going to leverage another skill. So there we go. It's loading the weather
11:21 tool. And then it's going to make an API request. And the only reason it knows
11:24 how to do this is because of the instructions that we have here. And I
11:28 could even tell it to load the reference document for the weather skill
11:34 because there's the third layer of progressive disclosure if it needs more
11:38 information. I have this API reference. So it can make more complicated
11:41 requests. And so that's a little bit forced there, but I'm just trying to
11:44 show you an example of the third layer of progressive disclosure as well. So
11:48 really, really cool. And the important thing here is that based on our
11:52 conversation with the agent, we probably only need to leverage one or two of
11:56 these at a time. I chose different skills that are very different from each
12:01 other on purpose here to drive home the point that like most of the time we
12:04 don't need all of this. So if we had an MCP server for every single one of
12:07 these, we would just be overwhelming our LLM for no reason. Okay. So now I want
12:11 to get into the code a little bit with you to show how things are working under
12:16 the hood. And even if you're not super technical, I'm going to keep this high
12:19 level. I'm just going to show you how we're able to leverage this agent, how
12:23 you can use this for yourself. So, first of all, in the readme here, I've got
12:27 instructions to set everything up. And you can change this in your environment
12:30 variables. One of the things you can specify is the directory where it
12:34 searches to load all the skills dynamically into your agent. And so just
12:38 to keep things as similar to the Anthropic implementation as possible, I
12:43 have the skills directory just called skills, just like you have in Claude Code,
12:47 for example. And so there's a folder in here for every single one of the skills
12:51 that we have. This looks just like the Anthropic skills where we have a
12:56 skill.md. This is our YAML front matter. This is the description that's loaded
12:59 into the agent up front. I'll show you how that works with the dynamic system
13:02 prompt. So the agent knows like, okay, if I want to use a weather API, let me
13:07 read this entire skill.md file so I can use the API and I know how to do so. And
13:11 then the third layer of progressive disclosure is optional, but for a lot of
13:15 these, we have the reference folder for code review. We even have some Python
13:18 scripts that we can use. And so all of these reference documents are called out
13:22 in the skill.md so the agent knows it can go deeper to get those. And then for
13:26 some like the world clock, we literally just have a skill.md because this is
13:30 just really like for converting time zones. Obviously, we don't need that
13:34 much context for the agent to know how to do that. And so it's just a pretty
13:38 simple skill.md only a couple of hundred lines long. And so the way that this
13:43 works, I have my Pydantic AI agent. I have a lot of other content on my
13:47 channel covering Pydantic AI. So I won't go too much into the weeds here, but we
13:51 have our agent definition, and by the way, with this agent you can use OpenRouter,
13:55 Ollama, or OpenAI; it's easy to extend it for others as well. So again, not stuck to
14:00 the Claude ecosystem. And what we're doing here is we are creating a dynamic
14:04 system prompt. And so when we first define the agent, we're not setting the
14:08 system prompt at all because we're going to do it right here. And so in Pydantic AI,
14:13 the way that you do this is you reference your agent.system_prompt.
14:18 This is our Python decorator. Now the function below this is where we get to
14:22 define the system prompt for our agent. So we can inject things at runtime
14:26 because what we're doing with this line of code right here is we're calling a
14:30 function that I'm not going to get into the weeds of that is going to search
14:34 this skills directory. It's going to find every single skill.md, take the YAML
14:39 front matter, extract that from the skill.md, and then put it into
14:44 the system prompt. And so we have all of the descriptions of the skills and their
14:49 paths as well as our primary system prompt. So we still have our base
14:53 instructions that don't change. In fact, a lot of my instructions here are just
14:56 telling the agent how to use skills. So it's very important. Large language
15:00 models by themselves do not understand how to leverage these capabilities. What
15:04 Claude did and what we have to do ourselves is be very descriptive here. Like here's
15:08 what skills are. Here's the metadata so you know the ones that are available to
15:12 you. And then here is step by step how you leverage a skill when the
15:16 description is screaming out to you that you want to use that capability. So all
15:20 the prompting that I went over here is the first layer of progressive
15:24 disclosure. The dynamic system prompt is how we are letting the agent know about
15:28 everything upfront. And so then we get into our tool set. And so I'm giving a
15:34 single tool set to my Pydantic AI agent that has everything it needs to work
15:38 with the skills to basically read everything we have in the skills
15:41 directory. And so going to that definition here, there are three tools
15:46 that we are giving. So we have the load skill and read reference. And then one
15:51 other tool that I have here just to make it easier for the agent is to list the
15:54 reference documents just in case the skill.md doesn't reference it directly
16:00 so that at least our agent is still able to find it. So we're just trying to
16:05 make it as easy as possible to discover everything. That's the whole point of
16:08 skills: discovering the capabilities that the agent has access to. And so for
16:13 example, to load a skill, we just have to give the name of the skill. That's
16:17 one of the things that's included in the system prompt. And so what we do here is
16:22 we have our skill path that's set in our environment variable and then we have
16:25 the name and then we're looking for the skill.md there. And so we're loading all
16:29 of that and then we're returning the content of the skill.md. That's how
16:33 we're including it now in the context window for the agent going forward. And
16:36 it's very similar for when we're reading one of our reference documents as well.
16:41 The code is actually almost identical, but there are some differences there to
16:44 make sure that we're reading a reference document specific to the skill. some
16:48 protections that I have in place for the agent. And so, yeah, that's pretty much
16:52 the agent as a whole. Like, that's how it works. It's super simple
16:56 in the end. And also, because I have this as a Pydantic AI tool set, you can
17:01 take this tool set, like you could copy this file and then just a couple of
17:05 these other ones here, and you could bring this into your own Pydantic AI agent
17:08 in just a couple of minutes. And your AI coding assistant could help you do this
17:13 so incredibly fast. So, please use this as a resource for yourself. Just give it
17:17 this repository and say like, "Hey, I have all these skills." You can put like
17:21 literally any skill that you want in this folder right here and then the next
17:23 time you interact with the agent, it'll have those capabilities automatically.
17:27 So, very dynamic system that I built for you. Really good starting point for any
17:31 kind of system you want to create. Now, one other really important thing to
17:35 cover here is how you can build your own skills. And so, this guide that I have
17:39 linked in the description is a really good starting point. Another quick tip
17:44 that I have for you, super useful. If you go into Claude Desktop, you can use it
17:48 to help you build your skills that you can then bring into the skill directory
17:52 for your custom agent. So, you just go to File, then Settings, and then Capabilities.
17:57 You scroll all the way down to skills, go to example skills, and you can toggle
18:02 on the skill creator. So, this is really meta, but this is a skill to help you
18:07 build more skills. So when Claude uses this, it's going to pull in all the
18:10 instructions and best practices for creating skills and guide you through
18:14 that process. You can go here and say like, "Hey, help me build a skill for
18:17 LinkedIn posting or help me build a skill to generate powerpoints or to
18:21 create standard operating procedures, like whatever you need." And it'll walk
18:24 you through creating that. And then it'll create a skill.md and then
18:28 potentially some of those reference documents and you can take that and just
18:31 put it in a new folder here in the skills directory. It is that easy to
18:35 build your own skills and the sky is the limit for the capabilities that you
18:39 can create. All right, so at this point we now know the importance of skills and
18:44 how to build them into any agent that we want. But the big question we have here
18:49 is reliability. When we take these capabilities and you could have dozens
18:53 of skills, you give them to your agent. How do you know that the agent is always
18:57 going to leverage them when you want it to? Like you might have a skill for
19:01 creating X posts, but you ask it to help you with content creation and it doesn't
19:04 pull that skill because it doesn't know that content creation should mean that
19:08 it should pull the X skill, right? Like you want to test for those things, but
19:11 when you have dozens of different skills, it's really annoying to interact
19:15 with the agent and send in a question and make sure it's using each one of
19:19 them properly every single time you make changes to your agent. So that is where
19:23 evals come in. We can create an automated way to define questions and
19:27 the expected tool calls or in this case the expected skills that it leverages.
19:32 And so I'll cover what that looks like with you and then also get into
19:35 observability. So when our agent is running in production using these
19:39 different skills, we can look at how real users are interacting with our
19:42 agent and making sure the agent is responding appropriately. And I'm pretty
19:46 excited to get into this for the last part of the video here. I'm going to be
19:49 pretty brief, but this is something I don't get to cover enough on my channel.
19:53 Evals and observability are things that are super important, but I don't get to
19:58 make content on it on my channel very much. So, luckily for us, Pydantic AI
20:04 has a very robust evaluation framework built right in. And so, essentially what
20:10 we can do is create these YAML files where we define all of our test cases.
20:15 And so, for example, I'm going to send in the question, what's the
20:18 weather in New York right now? And then I have an evaluator to make sure that
20:22 the weather skill was loaded. So we can also create our custom evaluators. I
20:27 don't want to get into the code too much right now. You can read up on this in
20:31 the documentation. Use this as an example for your AI coding assistant.
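As a sketch, one case in such a dataset file might look like the following. The field names here are illustrative, not the exact Pydantic Evals schema, so check the documentation for the format your version expects:

```yaml
cases:
  - name: weather_question
    inputs: "What's the weather in New York right now?"
    evaluators:
      # hypothetical custom evaluator that checks load_skill was
      # called with the expected skill name
      - SkillWasLoaded: weather
```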
20:34 But I have this custom evaluator to make sure that the right skills are loaded
20:38 based on the questions that I send in. So now instead of me having to go into
20:41 the agent and ask it each of these things to make sure it's using the
20:45 different skills like the code review skill and the research assistant skill,
20:49 now I can just run this single script. So I can call this Python script right
20:54 here to run my evaluators. It loads in my golden data set as I call it. Goes
20:58 through the questions one at a time. So, I am paying for the LLM credits, but
21:03 these are really cheap, really fast, just a smoke test to make sure that all
21:08 the different skills that I've added to my folder are actually being used
21:12 properly by my agent. If it's not, then it means that maybe there's an issue in
21:15 my loading capability or just my system prompt needs to be better or the skill
21:19 descriptions need to be better. Like, there's definitely different things that
21:22 need to be adjusted if the agent isn't working as you expect it to. And so evals
21:27 are very important to run every single time you change the system prompt for
21:31 your agent or even just the skills that you're giving it access to. So I also
21:35 have instructions in the readme for how to run the evals. And you can feel free
21:39 to poke around in the code for this as well if you want to see how to set this
21:42 up for your own Pydantic AI agents. But you want to do this for pretty much any
21:46 agent you're deploying to production. Evals are so important and skills are
21:49 just a good example because there's just so many different capabilities that we
21:53 want to test for here. So the logs are pretty verbose, but I'm just using
21:57 Haiku for all the tests here. So it's really nice and fast. But at the bottom
22:02 here, 25 out of 25 cases have passed. And so I've sent in a lot of different
22:06 requests to make sure the agent properly understands all of the skills that I've
22:10 given it. So it's good to do this instead of having to do a bunch of
22:14 manual testing myself after every single time I adjust my agent. So, I'll also
22:19 link to this page for Pydantic Evals in the description if you want to dive into
22:22 this and really take your agents seriously before deploying them to
22:26 production. And the other thing I want to talk about is observability with
22:30 Logfire, because evals are great when you want to test your agent locally, but what
22:33 about when users are actually using your agent in production and you want to be
22:37 able to peer into the traces, as they're called, to see the decisions your agent
22:41 is making when people are using it out in the wild. And so, that's why we need
22:45 a tool like Logfire. It's created by the Pydantic team, who also made Pydantic
22:49 AI, so it's just a fantastic integration to have here. And it's really easy to
22:53 set it up. And so there's just a minimal amount of code that I have to have in my
22:59 agent definition file. So the logfire token is one of the environment
23:01 variables. I explained that in the readme. And then we can configure
23:05 Logfire. So it's going to instrument all the Pydantic AI agents, as in every time we
23:09 invoke a tool or interact with the LLM, it's going to send all of that as
23:14 telemetry data. So we can track this running it locally like you're seeing
23:18 right here, but then also in production. And so yeah, I'm also going to have a
23:22 link to Logfire. I just wanted to mention this quick like this is so
23:25 important being able to see our usage like token usage and cost in production.
23:29 Looking into the different traces, like if a user reports a problem, we can go
23:32 in here and see like, okay, where did the agent mess up? Like is it something
23:36 wrong with their system? Did it just use a tool incorrectly like a bad parameter?
23:39 We can look at all the parameters, all the tool calls that it made. So we can
23:44 see the decisions even when we're not running the agent locally. It's very
23:47 very important to have evals and observability when you want to take your
23:52 agent seriously, and Pydantic AI with Logfire just makes it so easy. So
23:57 there you go. That is your guide for building skills into any AI agent that
24:02 you want. And you even got a bit of a bonus with evals and observability
24:05 because it is important to make sure we're constantly checking our agent
24:09 making sure that it's leveraging our tools properly. Because with skills, we
24:14 can give our agents dozens and dozens of capabilities like I've shown you here.
24:18 So, if you appreciate this video and you're looking forward to more things on
24:22 building AI agents, I would really appreciate a like and a subscribe. And