// transcript — 656 segments
0:02 It hasn't been that long since Anthropic released skills to the world and it is
0:07 one of the most important advancements in AI recently. And honestly, one of the
0:12 biggest reasons it is so important is because of how beautifully simple it is
0:17 and that is the motto of Anthropic: the simpler the better. We see this looking
0:21 at Claude Code as well. I mean, when you get into how skills work and the idea of
0:25 progressive disclosure that we'll get into, you can't help but think to
0:29 yourself, why in the world were skills not commonplace ever since generative AI
0:34 was a thing? And people have been building their own version of skills
0:38 before Anthropic popularized it. It's super easy to incorporate the idea of
0:43 skills and progressive disclosure into any AI agent or tool that you want. And
0:47 that, my friend, is what I'm going to show you how to do today. Because here's
0:50 the thing, and a lot of people don't realize this. We are not limited to the
0:55 Claude ecosystem to take advantage of skills. Anthropic does get a lot of
0:59 credit for popularizing the idea, and we'll get into some of their best
1:02 practices for creating skills as well. But this really is a universal concept.
1:07 It's all about strategizing how we can allow the agent to discover context and
1:12 capabilities as it needs it to be more flexible and context efficient as
1:16 opposed to something like an MCP server or super long global rules where you're
1:19 just dumping a bunch of context into the LLM right away and completely
1:24 overwhelming it. So, as much as I really appreciate the Anthropic ecosystem like
1:29 Claude Desktop and Claude Code, we don't always want to be limited to these
1:33 platforms because a lot of the time you want to build skills into your own
1:36 workflows or AI agents. You want to use different large language models, maybe
1:40 even local AI. There's so many reasons to incorporate skills into our own
1:44 systems and really build it out for ourselves. And that's what I'm going to
1:47 show you how to do. We're going to take all of the concepts from Anthropic's
1:51 version of skills and we're going to map it into our own AI agent with the system
1:55 prompts and the tools we give it. And it's beautifully simple, right? Simple
2:00 but powerful. And so this is only going to take like 10 to 15 minutes. And then
2:04 you'll know after that exactly how to build this kind of thing into your own
2:07 systems. And I've got a template for you of course as well. All right. So there's
2:10 three things I want to cover with you in the next 15 minutes. It's going to be
2:14 super value-packed. So, first we need to get into, at a high level, how skills work
2:18 and why they're so powerful, even if you've used them before. Going over this
2:22 is going to be really valuable. Then we'll get into the template that I have
2:26 for you. This is a GitHub repo that of course I will have linked in the
2:29 description. And so, this is a demonstration using Pydantic AI as my
2:33 agent framework, how we can build our own idea of skills into any framework
2:37 that we want. And so I'll go over how I'm building this with Pydantic AI, but
2:41 the concept here is going to work no matter what tool you end up using like
2:45 LangChain, CrewAI, Agno, no framework at all, literally anything that you
2:50 want. And then just as an opportunity here to show you how far we can take our
2:54 custom agents, I also want to get into evals and observability. So how can we
2:59 make sure our agent is really following all the instructions and capabilities
3:02 that we give it? And so in our case right here, that it's truly leveraging
3:06 the skills that we give it. And so I'll get into that at the end just as a bonus on
3:10 top of everything showing you how to build skills for yourself. All right, so
3:14 now let's go over a really really quick master class on skills, what they are,
3:18 why they are so important. So Anthropic has this article that I'll link to in
3:22 the description, a really good guide, and they cover best practices for
3:26 building skills that we'll talk about in a little bit. And so the problem skills
3:32 are solving. We want to give our agent a lot of different capabilities to
3:35 supercharge them, but we don't want to overwhelm their context window. Agents
3:41 are very prone to being overwhelmed when we give them a lot of information
3:45 through our tools, conversation history, the system prompt, everything goes in
3:49 the window. And so with other methods like MCP servers, the problem is we're
3:53 giving a ton of tools up front to the agent even if it never needs to use them
3:58 in a specific conversation. That is bad. And so with skills, the best way to
4:02 explain it is to go to a diagram here in the article. And I also of course have
4:07 it blown up in another tab right here. So the beautifully simple power of
4:11 skills is the idea of progressive disclosure. Instead of giving all the
4:15 tools up front to our agent like MCP servers, we are allowing our agent to
4:20 discover the capabilities over time as it actually needs them. And so the only
4:24 thing we're giving to the agent right away in the system prompt or you can
4:28 think of it like the global rules is the description of the capability or the
4:32 skill. And so in this case as an example we have a PDF processing skill. So we're
4:37 just telling the agent, hey, you have this capability if you need it, if the
4:40 user actually asks you to do something with PDFs. And then if the agent receives
4:45 that kind of request and wants to leverage the capability, then it'll read
4:51 the skill.md. So the skill.md is the main file that drives any skill that
4:55 you'll see from Anthropic and all the ones that we'll go over with our custom
5:00 implementation here. And so this has the full instructions for the capability.
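To make the three layers concrete, here is a minimal Python sketch; the file names and the in-memory stand-in for a filesystem are hypothetical, not Anthropic's actual implementation:

```python
# Minimal simulation of three-layer progressive disclosure. File names
# and contents here are hypothetical stand-ins.
FILES = {
    "skills/pdf/SKILL.md": "Full instructions for working with PDFs...",
    "skills/pdf/reference/forms.md": "Extra instructions for PDF forms...",
}

def build_context(task_mentions_pdfs: bool, task_needs_forms: bool) -> list[str]:
    # Layer 1: only the short description is always in the system prompt.
    context = ["PDF processing skill: use when the user asks about PDFs."]
    if task_mentions_pdfs:
        # Layer 2: the agent reads SKILL.md only when the task calls for it.
        context.append(FILES["skills/pdf/SKILL.md"])
        if task_needs_forms:
            # Layer 3: reference docs load only for the deepest tasks.
            context.append(FILES["skills/pdf/reference/forms.md"])
    return context

print(len(build_context(False, False)))  # 1: just the description
print(len(build_context(True, True)))    # 3: all three layers loaded
```

The point of the sketch is that context cost scales with what the task actually needs, not with how many capabilities exist.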
5:04 Now it's starting to load in the context. It's the second layer of
5:08 progressive disclosure, right? Like the description is layer 1. This is layer
5:12 two. And then there are some other documents oftentimes that the skill.md
5:16 will reference. This is the third layer of progressive disclosure because we can
5:20 load in even more context. If you want to get even more specific about
5:24 something with PDFs, for example, like we have this extra set of instructions
5:29 for if we need to fill out some form, some PDF form, right? Like not all the
5:33 time we're working with PDFs do we care about this, but sometimes we do. So,
5:37 we're just discovering more and more context over time as we actually need it
5:42 for the task at hand. And that saves the LLM from being overwhelmed because if it
5:46 had all these documents and a dozen other skills loaded all at once, it
5:49 would be tens of thousands of tokens just to tell it right away all the
5:52 different capabilities that it has and it probably only has to use one or two
5:56 of them in a single conversation. All right, so with that explanation, I want
6:00 to now get into the template and exactly how we are translating the ideas from
6:04 Anthropic into our own agent implementation. And again, I'm using
6:08 Pydantic AI because it is my favorite AI agent framework. It has been for
6:13 over a year now. And so starting off, we have the YAML front matter description,
6:17 which by the way, as far as best practice goes, it's good to have this be
6:23 somewhere between 50 and 100 words. You don't want to load in too much right
6:26 away, right? That defeats the purpose of a skill. So you want to be pretty short,
6:29 but at least descriptive enough so the agent knows when it should leverage a
6:33 capability. So usually the description is, you know, something around 5% of the
6:37 total context of the skill. That's all you're loading up front. Obviously, this is a
6:42 very very rough estimate. And so when we think about our description here,
6:47 essentially every single skill that we want our agent to have access to, we
6:51 need it to have the description and the path to the skill in the system prompt.
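As a sketch of how those descriptions can end up in the system prompt (with a deliberately naive front-matter parser and a made-up directory layout, so treat it as illustrative rather than the template's exact code):

```python
from pathlib import Path

def parse_front_matter(text: str) -> dict[str, str]:
    # Naive YAML front-matter parser: grabs "key: value" lines between
    # the first pair of "---" markers. A real implementation would use
    # an actual YAML library.
    _, block, _ = text.split("---", 2)
    meta = {}
    for line in block.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta

def skills_prompt_section(skills_dir: Path) -> str:
    # Collect name, description, and path for every SKILL.md so the
    # agent knows what it can load and where to find it.
    lines = ["Available skills:"]
    for skill_md in sorted(skills_dir.glob("*/SKILL.md")):
        meta = parse_front_matter(skill_md.read_text())
        lines.append(f"- {meta.get('name', skill_md.parent.name)}: "
                     f"{meta.get('description', '')} (path: {skill_md})")
    return "\n".join(lines)
```

This string would then be appended to the static instructions each time the agent starts up.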
6:56 And so we're going to be taking advantage of what is called a dynamic
6:59 system prompt. We have our static content, the main instructions for our
7:03 agent that doesn't really change. But then also, we're going to collect the
7:08 descriptions from all YAML front matters, all of our skill.mds, and we're
7:13 going to put that in our system prompt. And I'll even get into the code just a
7:16 little bit with you and show you how that works after we go through this
7:20 diagram here. And then we have our skill.md, the main instructions for the
7:25 capability. And as far as best practices go, usually you want this between 300
7:29 and 500 lines long. Now, obviously that varies a ton depending on the
7:33 complexity. It could be a lot shorter as well, but usually around 30% of the
7:37 total context for the skill if you have a lot of reference files. Sometimes you
7:41 don't even need this third layer, by the way, if the capability is simple enough.
7:45 But anyway, as far as how this translates to our own system, our own
7:50 agent, we just need a simple tool. Typically, this load skill tool is going
7:55 to take a path to the skill.md. And so, the agent can invoke that. It can give
7:58 the path which should be in the system prompt. And then we're just going to
8:03 take the skill.md and return that as the tool response. And so now that is
8:07 included in the context for the agent because every single time we call a
8:11 tool, whatever it returns is now in the short-term memory for the agent. It is
8:15 that simple. So remember, system prompt has the description and the path to the
8:20 skill. So it has all the context it needs to know when it should leverage
8:23 the skill and then what parameter to pass in as far as the path so it reads
8:28 that file. And then in the skill.md we might also have references to our
8:32 reference files, the third layer of progressive disclosure: the scripts and
8:36 markdowns to read and leverage to take the capabilities even further. And so I
8:41 have a second tool in my system for this to read a reference document. And so you
8:45 just give as the parameters here the skill and then the path to that
8:50 secondary file that you want to leverage. And so this is where you have
8:54 unlimited depth. I mean you could have a fourth layer of progressive disclosure if
8:58 you want but that probably gets way too complicated. So the rest of your skill
9:03 will just live in these files. And I could combine these tools together. They
9:06 work very similarly because it's mainly just to read a certain file. But in my
9:10 experience from the testing I've done, it's helpful to the agent to know this
9:18 distinction. Like this is the main instruction set for your capability. And
9:18 that's it. I told you it would be simple. And I'll also show you the code
9:22 for the Pydantic AI agent in a little bit after a demo, with all these different
9:27 skills that I have incorporated into the template that I have for you. There's
9:31 quite a bit. Like you could extend this to dozens and dozens and the
9:35 agent would still work well because there's just that little bit of context
9:39 taken up front for each of the capabilities. All right, so I have my
9:43 custom Pydantic AI skills agent loaded up here in the terminal. This is my
9:47 playground to test out all the different skills that I have incorporated. And I
9:52 can just drop any skills that I want into a skills folder just like you can
9:57 with Claude Code. And so this is using the template that I'll link to in the
10:01 description. I've got instructions here explaining how the skills work and a
10:05 quick start. Very, very easy to get this up and running. Feel free to use this as
10:09 a resource. Give it to your AI coding assistant if you want to incorporate
10:13 this all for yourself. I also use the idea of a tool set in Pydantic AI. So,
10:17 you should be able to pull that out and add skills into your own agent very
10:21 easily. Yeah, go ahead and take a look at this. So, when I start the terminal,
10:26 it is reading all of the skill.md files in my skill folder. It is loading all
10:31 these skills and so I can use any one of them now. So for example, I can say help
10:36 me find a good dinner dish with chicken. So it's going to use the recipe finder
10:40 and I have all the logs here on purpose just so we have visibility into what's
10:43 going on. So you can see it really is using the tool because we're loading the
10:49 skill recipe finder. And then from that we have instructions to understand how
10:54 to leverage some kind of API that this capability gives us. And so now it's
10:59 making that request to help me get some recipes that have chicken in them.
11:03 And take a look at that. I found lots of delicious chicken options for you. And
11:07 wow. Okay, now I'm hungry. All right. But anyway, now I can say like what is
11:12 the weather in Tokyo, Japan right now. And so we'll see here that it's
11:16 going to leverage another skill. So there we go. It's loading the weather
11:21 tool. And then it's going to make an API request. And the only reason it knows
11:24 how to do this is because of the instructions that we have here. And I
11:28 could even tell it to, like, load the ref document for the weather skill
11:34 because that's the third layer of progressive disclosure if it needs more
11:38 information. I have this API reference. So it can make more complicated
11:41 requests. And so that's a little bit forced there, but I'm just trying to
11:44 show you an example of the third layer of progressive disclosure as well. So
11:48 really, really cool. And the important thing here is that based on our
11:52 conversation with the agent, we probably only need to leverage one or two of
11:56 these at a time. I chose different skills that are very different from each
12:01 other on purpose here to drive home the point that like most of the time we
12:04 don't need all of this. So if we had an MCP server for every single one of
12:07 these, we would just be overwhelming our LLM for no reason. Okay. So now I want
12:11 to get into the code a little bit with you to show how things are working under
12:16 the hood. And even if you're not super technical, I'm going to keep this high
12:19 level. I'm just going to show you how we're able to leverage this agent, how
12:23 you can use this for yourself. So, first of all, in the readme here, I've got
12:27 instructions to set everything up. And you can change this in your environment
12:30 variables. One of the things you can specify is the directory where it
12:34 searches to load all the skills dynamically into your agent. And so just
12:38 to keep things as similar to the Anthropic implementation as possible, I
12:43 have the skills directory just called skills just like you have in Claude Code
12:47 for example. And so there's a folder in here for every single one of the skills
12:51 that we have. This looks just like the Anthropic skills where we have a
12:56 skill.md. This is our YAML front matter. This is the description that's loaded
12:59 into the agent up front. I'll show you how that works with the dynamic system
13:02 prompt. So the agent knows like, okay, if I want to use a weather API, let me
13:07 read this entire skill.md file so I can use the API and I know how to do so. And
13:11 then the third layer of progressive disclosure is optional, but for a lot of
13:15 these, we have the reference folder for code review. We even have some Python
13:18 scripts that we can use. And so all of these reference documents are called out
13:22 in the skill.md so the agent knows it can go deeper to get those. And then for
13:26 some like the world clock, we literally just have a skill.md because this is
13:30 just really like for converting time zones. Obviously, we don't need that
13:34 much context for the agent to know how to do that. And so it's just a pretty
13:38 simple skill.md, only a couple hundred lines long. And so the way that this
13:43 works, I have my Pydantic AI agent. I have a lot of other content on my
13:47 channel covering Pydantic AI. So I won't go too much into the weeds here, but we
13:51 have our agent definition, which, by the way, you can use with OpenRouter,
13:55 Ollama, or OpenAI, and it's easy to extend it for others as well. So again, not stuck to
14:00 the Claude ecosystem. And what we're doing here is we are creating a dynamic
14:04 system prompt. And so when we first define the agent, we're not setting the
14:08 system prompt at all because we're going to do it right here. And so in Pydantic
14:13 AI, the way that you do this is you reference agent.system_prompt.
14:18 This is our Python decorator. Now the function below this is where we get to
14:22 define the system prompt for our agent. So we can inject things at runtime
14:26 because what we're doing with this line of code right here is we're calling a
14:30 function that I'm not going to get into the weeds of that is going to search
14:34 this skills directory. It's going to find every single skill.md, take the YAML
14:39 front matter, extract that from the skill.md, and then put it into
14:44 the system prompt. And so we have all of the descriptions of the skills and their
14:49 paths as well as our primary system prompt. So we still have our base
14:53 instructions that don't change. In fact, a lot of my instructions here are just
14:56 telling the agent how to use skills. So it's very important. Large language
15:00 models by themselves do not understand how to leverage these capabilities. What
15:04 Claude did and what we have to do ourselves is be very descriptive here. Like here's
15:08 what skills are. Here's the metadata so you know the ones that are available to
15:12 you. And then here is step by step how you leverage a skill when the
15:16 description is screaming out to you that you want to use that capability. So all
15:20 the prompting that I went over here is the first layer of progressive
15:24 disclosure. The dynamic system prompt is how we are letting the agent know about
15:28 everything upfront. And so then we get into our tool set. And so I'm giving a
15:34 single tool set to my Pydantic AI agent that has everything it needs to work
15:38 with the skills to basically read everything we have in the skills
15:41 directory. And so going to that definition here, there are three tools
15:46 that we are giving. So we have the load skill and read reference. And then one
15:51 other tool that I have here just to make it easier for the agent is to list the
15:54 reference documents just in case the skill.md doesn't reference them directly,
16:00 so that at least then our agent is still able to find them. So we're just trying to
16:05 make it as easy as possible to discover everything. That's the whole point of
16:08 skills: discovering the capabilities that the agent has access to. And so for
16:13 example, to load a skill, we just have to give the name of the skill. That's
16:17 one of the things that's included in the system prompt. And so what we do here is
16:22 we have our skill path that's set in our environment variable and then we have
16:25 the name and then we're looking for the skill.md there. And so we're loading all
16:29 of that and then we're returning the content of the skill.md. That's how
16:33 we're including it now in the context window for the agent going forward. And
16:36 it's very similar for when we're reading one of our reference documents as well.
16:41 The code is actually almost identical, but there are some differences there to
16:44 make sure that we're reading a reference document specific to the skill, with
16:48 some protections that I have in place for the agent. And so, yeah, that's pretty much
16:52 the agent as a whole. Like, that's how it works. It's super simple
16:56 in the end. And also, because I have this as a Pydantic AI tool set, you can
17:01 take this tool set, like you could copy this file and then just a couple of
17:05 these other ones here, and you could bring this into your own Pydantic AI agent
17:08 in just a couple of minutes. And your AI coding assistant could help you do this
17:13 so incredibly fast. So, please use this as a resource for yourself. Just give it
17:17 this repository and say like, "Hey, I have all these skills." You can put like
17:21 literally any skill that you want in this folder right here and then the next
17:23 time you interact with the agent, it'll have those capabilities automatically.
17:27 So, very dynamic system that I built for you. Really good starting point for any
17:31 kind of system you want to create. Now, one other really important thing to
17:35 cover here is how you can build your own skills. And so, this guide that I have
17:39 linked in the description is a really good starting point. Another quick tip
17:44 that I have for you, super useful. If you go into Claude Desktop, you can use it
17:48 to help you build your skills that you can then bring into the skill directory
17:52 for your custom agent. So, you just go to file settings and then capabilities.
17:57 You scroll all the way down to skills, go to example skills, and you can toggle
18:02 on the skill creator. So, this is really meta, but this is a skill to help you
18:07 build more skills. So when Claude uses this, it's going to pull in all the
18:10 instructions and best practices for creating skills and guide you through
18:14 that process. You can go here and say like, "Hey, help me build a skill for
18:17 LinkedIn posting, or help me build a skill to generate PowerPoints or to
18:21 create standard operating procedures, like whatever you need." And it'll walk
18:24 you through creating that. And then it'll create a skill.md and then
18:28 potentially some of those reference documents and you can take that and just
18:31 put it in a new folder here in the skills directory. It is that easy to
18:35 build your own skills, and the sky is the limit for the capabilities that you
18:39 can create. All right, so at this point we now know the importance of skills and
18:44 how to build them into any agent that we want. But the big question we have here
18:49 is reliability. When we take these capabilities, and you could have dozens
18:53 of skills, and give them to your agent, how do you know that the agent is always
18:57 going to leverage them when you want it to? Like you might have a skill for
19:01 creating X posts, but you ask it to help you with content creation and it doesn't
19:04 pull that skill because it doesn't know that content creation should mean that
19:08 it should pull the X skill, right? Like you want to test for those things, but
19:11 when you have dozens of different skills, it's really annoying to interact
19:15 with the agent and send in a question and make sure it's using each one of
19:19 them properly every single time you make changes to your agent. So that is where
19:23 evals come in. We can create an automated way to define questions and
19:27 the expected tool calls or in this case the expected skills that it leverages.
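Here is a framework-free sketch of that idea; the template itself uses Pydantic Evals, so the names and the keyword-routing stub agent below are made up purely for illustration:

```python
# Framework-free sketch of a golden dataset: questions paired with the
# skills we expect the agent to load. The real template uses Pydantic
# Evals; these names are illustrative only.
GOLDEN_DATASET = [
    {"question": "What's the weather in New York right now?",
     "expected_skills": {"weather"}},
    {"question": "Help me find a good dinner dish with chicken",
     "expected_skills": {"recipe-finder"}},
]

def run_evals(agent_fn) -> list[str]:
    # agent_fn(question) -> set of skill names the agent actually loaded.
    # Returns the questions where an expected skill was never loaded.
    failures = []
    for case in GOLDEN_DATASET:
        loaded = agent_fn(case["question"])
        if not case["expected_skills"] <= loaded:
            failures.append(case["question"])
    return failures

# A stub agent that routes on keywords, standing in for the real one:
def stub_agent(question: str) -> set[str]:
    loaded = set()
    if "weather" in question.lower():
        loaded.add("weather")
    if "dinner" in question.lower() or "chicken" in question.lower():
        loaded.add("recipe-finder")
    return loaded

print(run_evals(stub_agent))  # [] means every case passed
```

In the real setup, agent_fn would run the actual agent and record which skill.md files its tools loaded during the conversation.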
19:32 And so I'll cover what that looks like with you and then also get into
19:35 observability. So when our agent is running in production using these
19:39 different skills, we can look at how real users are interacting with our
19:42 agent and making sure the agent is responding appropriately. And I'm pretty
19:46 excited to get into this for the last part of the video here. I'm going to be
19:49 pretty brief, but this is something I don't get to cover enough on my channel.
19:53 Evals and observability are things that are super important, but I don't get to
19:58 make content on it on my channel very much. So, luckily for us, Pydantic AI
20:04 has a very robust evaluation framework built right in. And so, essentially what
20:10 we can do is create these YAML files where we define all of our test cases.
20:15 And so, for example, I'm going to send in the question, what's the
20:18 weather in New York right now? And then I have an evaluator to make sure that
20:22 the weather skill was loaded. So we can also create our custom evaluators. I
20:27 don't want to get into the code too much right now. You can read up on this in
20:31 the documentation. Use this as an example for your AI coding assistant.
20:34 But I have this custom evaluator to make sure that the right skills are loaded
20:38 based on the questions that I send in. So now instead of me having to go into
20:41 the agent and ask it each of these things to make sure it's using the
20:45 different skills like the code review skill and the research assistant skill,
20:49 now I can just run this single script. So I can call this Python script right
20:54 here to run my evaluators. It loads in my golden data set as I call it. Goes
20:58 through the questions one at a time. So, I am paying for the LLM credits, but
21:03 these are really cheap, really fast, just a smoke test to make sure that all
21:08 the different skills that I've added to my folder are actually being used
21:12 properly by my agent. If it's not, then it means that maybe there's an issue in
21:15 my loading capability or just my system prompt needs to be better or the skill
21:19 descriptions need to be better. Like, there's definitely different things that
21:22 need to be adjusted if the agent isn't working as you expect it to. And so evals
21:27 are very important to run every single time you change the system prompt for
21:31 your agent or even just the skills that you're giving it access to. So I also
21:35 have instructions in the readme for how to run the evals. And you can feel free
21:39 to poke around in the code for this as well if you want to see how to set this
21:42 up for your own Pydantic AI agents. But you want to do this for pretty much any
21:46 agent you're deploying to production. Evals are so important, and skills are
21:49 just a good example because there's just so many different capabilities that we
21:53 want to test for here. So logs are pretty verbose, but I'm just using
21:57 Haiku for all the tests here. So it's really nice and fast. But at the bottom
22:02 here, 25 out of 25 cases have passed. And so I've sent in a lot of different
22:06 requests to make sure the agent properly understands all of the skills that I've
22:10 given it. So it's good to do this instead of having to do a bunch of
22:14 manual testing myself after every single time I adjust my agent. So, I'll also
22:19 link to this page for Pydantic Evals in the description if you want to dive into
22:22 this and really take your agent seriously before deploying them to
22:26 production. And the other thing I want to talk about is observability with
22:30 Logfire, because evals are great when you want to test your agent locally, but what
22:33 about when users are actually using your agent in production and you want to be
22:37 able to peer into the traces, as they're called, to see the decisions your agent
22:41 is making when people are using it out in the wild. And so, that's why we need
22:45 a tool like Logfire. It's created by the Pydantic team, who also made Pydantic
22:49 AI. So it's just a fantastic integration to have here. And it's really easy to
22:53 set it up. And so there's just a minimal amount of code that I have to have in my
22:59 agent definition file. So the Logfire token is one of the environment
23:01 variables. I explained that in the readme. And then we can configure
23:05 Logfire. So it's going to instrument all the Pydantic AI agents, as in every
23:09 time we invoke a tool or interact with the LLM, it's going to send all of that as
23:14 telemetry data. So we can track this running it locally like you're seeing
23:18 right here, but then also in production. And so yeah, I'm also going to have a
23:22 link to Logfire. I just wanted to mention this quickly because this is so
23:25 important, being able to see our usage, like token usage and cost, in production.
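As a rough illustration of the kind of per-tool-call trace data an observability tool records (this is not Logfire's API, just a minimal stand-in):

```python
import time

# Minimal stand-in for the kind of trace an observability tool like
# Logfire records per agent run: one entry per tool call with arguments,
# result size, and timing. Illustrative only, not Logfire's API.
TRACES: list[dict] = []

def traced(tool_name, fn):
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        TRACES.append({
            "tool": tool_name,
            "args": args,
            "kwargs": kwargs,
            "duration_s": time.perf_counter() - start,
            "result_chars": len(str(result)),
        })
        return result
    return wrapper

# Wrap a (hypothetical) tool so every call leaves a trace entry:
load_skill = traced("load_skill", lambda name: f"instructions for {name}")
load_skill("weather")
print(TRACES[0]["tool"])  # load_skill
```

With real telemetry you get the same shape of data, plus token usage and cost, for every trace in production.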
23:29 Looking into the different traces, like if a user reports a problem, we can go
23:32 in here and see like, okay, where did the agent mess up? Like is it something
23:36 wrong with their system? Did it just use a tool incorrectly like a bad parameter?
23:39 We can look at all the parameters, all the tool calls that it made. So we can
23:44 see the decisions even when we're not running the agent locally. It's very
23:47 very important to have evals and observability when you want to take your
23:52 agent seriously, and Logfire and Pydantic AI just make it so easy.
3:14 now let's go over a really really quick master class on skills, what they are,
3:18 why they are so important. So Anthropic has this article that I'll link to in
3:22 the description, a really good guide, and they cover best practices for
3:26 building skills that we'll talk about in a little bit. And so the problem skills
3:32 are solving. We want to give our agent a lot of different capabilities to
3:35 supercharge them, but we don't want to overwhelm their context window. Agents
3:41 are very prone to being overwhelmed when we give them a lot of information
3:45 through our tools, conversation history, the system prompt, everything goes in
3:49 the window. And so other methods like MCP servers, the problem there is we're
3:53 giving a ton of tools up front to the agent even if it never needs to use them
3:58 in a specific conversation. That is bad. And so with skills, the best way to
4:02 explain it is to go to a diagram here in the article. And I also of course have
4:07 it blown up in another tab right here. So the beautifully simple power of
4:11 skills is the idea of progressive disclosure. Instead of giving all the
4:15 tools up front to our agent like MCP servers, we are allowing our agent to
4:20 discover the capabilities over time as it actually needs them. And so the only
4:24 thing we're giving to the agent right away in the system prompt or you can
4:28 think of it like the global rules is the description of the capability or the
4:32 skill. And so in this case as an example we have a PDF processing skill. So we're
4:37 just telling the agent, hey you have this capability if you need it. If the
4:40 user actually asks you to do something with PDFs and then if the agent receives
4:45 that kind of request and it wants to leverage the capability then it'll read
4:51 the skill.md. So the skill.md this is the main file that drives any skill that
4:55 you'll see from anthropic and all the ones that we'll go over with our custom
5:00 implementation here. And so this has the full instructions for the capability.
5:04 Now it's starting to load in the context. It's the second layer of
5:08 progressive disclosure, right? Like the description is layer 1. This is layer
5:12 two. And then there are some other documents oftentimes that the skill.md
5:16 will reference. This is the third layer of progressive disclosure because we can
5:20 load in even more context. If you want to get even more specific about
5:24 something with PDFs, for example, like we have this extra set of instructions
5:29 for if we need to fill out some form, some PDF form, right? Like not all the
5:33 time we're working with PDFs do we care about this, but sometimes we do. So,
5:37 we're just discovering more and more context over time as we actually need it
5:42 for the task at hand. And that saves the LLM from being overwhelmed because if it
5:46 had all these documents and a dozen other skills loaded all at once, it
5:49 would be tens of thousands of tokens just to tell it right away all the
5:52 different capabilities that it has and it probably only has to use one or two
5:56 of them in a single conversation. All right, so with that explanation, I want
6:00 to now get into the template and exactly how we are translating the ideas from
6:04 Anthropic into our own agent implementation. And again, I'm using
6:08 Pydantic AI because it is my favorite AI agent framework. It has been for
6:13 over a year now. And so starting off, we have the YAML front matter description,
6:17 which by the way, as far as best practice goes, it's good to have this be
6:23 somewhere between 50 and 100 words. You don't want to load in too much right
6:26 away, right? That defeats the purpose of skills. So you want to be pretty short,
6:29 but at least descriptive enough so the agent knows when it should leverage a
6:33 capability. So usually the description is, you know, something around 5% of the total
6:37 context of the skill. That's all you're loading up front. Obviously, this is a
6:42 very very rough estimate. And so when we think about our description here,
6:47 essentially every single skill that we want our agent to have access to, we
6:51 need it to have the description and the path to the skill in the system prompt.
6:56 And so we're going to be taking advantage of what is called a dynamic
6:59 system prompt. We have our static content, the main instructions for our
7:03 agent that doesn't really change. But then also, we're going to collect the
7:08 descriptions from all YAML front matters, all of our skill.mds, and we're
7:13 going to put that in our system prompt. And I'll even get into the code just a
7:16 little bit with you and show you how that works after we go through this
7:20 diagram here. And then we have our skill.md, the main instructions for the
7:25 capability. And as far as best practices go, usually you want this between 300
7:29 and 500 lines long. Now, obviously that varies a ton depending on the
7:33 complexity. It could be a lot shorter as well, but usually around 30% of the
7:37 total context for the skill if you have a lot of reference files. Sometimes you
7:41 don't even need this third layer, by the way, if the capability is simple enough.
7:45 But anyway, as far as how this translates to our own system, our own
7:50 agent, we just need a simple tool. Typically, this load skill tool is going
7:55 to take a path to the skill.md. And so, the agent can invoke that. It can give
7:58 the path which should be in the system prompt. And then we're just going to
8:03 take the skill.md and return that as the tool response. And so now that is
8:07 included in the context for the agent because every single time we call a
8:11 tool, whatever it returns is now in the short-term memory for the agent. It is
8:15 that simple. So remember, system prompt has the description and the path to the
8:20 skill. So it has all the context it needs to know when it should leverage
8:23 the skill and then what parameter to pass in as far as the path so it reads
8:28 that file. And then in the skill.md we might also have references to our
8:32 reference files, the third layer of progressive disclosure: the scripts and
8:36 markdowns to read and leverage to take the capabilities even further. And so I
8:41 have a second tool in my system for this to read a reference document. And so you
8:45 just give as the parameters here the skill and then the path to that
8:50 secondary file that you want to leverage. And so this is where you have
8:54 unlimited depth. I mean, you could have a fourth layer of progressive disclosure if
8:58 you want but that probably gets way too complicated. So the rest of your skill
9:03 will just live in these files. And I could combine these tools together. They
9:06 work very similarly because it's mainly just to read a certain file. But in my
9:10 experience from the testing I've done, it's helpful to the agent to know this
9:14 distinction. Like, this is the main instruction set for your capability. And
9:18 that's it. I told you it would be simple. And I'll also show you the code
9:22 for the Pydantic AI agent in a little bit, after a demo of all the different skills
9:27 that I have incorporated into the template that I have for you. There's
9:31 quite a bit. Like, you could extend this to dozens and dozens, and the
9:35 agent would still work well because there's just that little bit of context
9:39 taken up front for each of the capabilities. All right, so I have my
9:43 custom Pydantic AI skills agent loaded up here in the terminal. This is my
9:47 playground to test out all the different skills that I have incorporated. And I
9:52 can just drop any skills that I want into a skills folder just like you can
9:57 with Claude Code. And so this is using the template that I'll link to in the
10:01 description. I've got instructions here explaining how the skills work and a
10:05 quick start. Very, very easy to get this up and running. Feel free to use this as
10:09 a resource. Give it to your AI coding assistant if you want to incorporate
10:13 this all for yourself. I also use the idea of a tool set in Pydantic AI. So,
10:17 you should be able to pull that out and add skills into your own agent very
10:21 easily. Yeah, go ahead and take a look at this. So, when I start the terminal,
10:26 it is reading all of the skill.md files in my skill folder. It is loading all
10:31 these skills and so I can use any one of them now. So for example, I can say help
10:36 me find a good dinner dish with chicken. So it's going to use the recipe finder
10:40 and I have all the logs here on purpose just so we have visibility into what's
10:43 going on. So you can see it really is using the tool because we're loading the
10:49 skill recipe finder. And then from that we have instructions to understand how
10:54 to leverage some kind of API that this capability gives us. And so now it's
10:59 making that request to help me get some recipes that have chicken in them.
11:03 And take a look at that. I found lots of delicious chicken options for you. And
11:07 wow. Okay, now I'm hungry. All right. But anyway, now I can say, like, what is
11:12 the weather in Tokyo, Japan right now? And so we'll see here that it's
11:16 going to leverage another skill. So there we go. It's loading the weather
11:21 tool. And then it's going to make an API request. And the only reason it knows
11:24 how to do this is because of the instructions that we have here. And I
11:28 could even tell it to, like, load the reference document for the weather skill
11:34 because there's the third layer of progressive disclosure if it needs more
11:38 information. I have this API reference. So it can make more complicated
11:41 requests. And so that's a little bit forced there, but I'm just trying to
11:44 show you an example of the third layer of progressive disclosure as well. So
11:48 really, really cool. And the important thing here is that based on our
11:52 conversation with the agent, we probably only need to leverage one or two of
11:56 these at a time. I chose different skills that are very different from each
12:01 other on purpose here to drive home the point that like most of the time we
12:04 don't need all of this. So if we had an MCP server for every single one of
12:07 these, we would just be overwhelming our LLM for no reason. Okay. So now I want
12:11 to get into the code a little bit with you to show how things are working under
12:16 the hood. And even if you're not super technical, I'm going to keep this high
12:19 level. I'm just going to show you how we're able to leverage this agent, how
12:23 you can use this for yourself. So, first of all, in the readme here, I've got
12:27 instructions to set everything up. And you can change this in your environment
12:30 variables. One of the things you can specify is the directory where it
12:34 searches to load all the skills dynamically into your agent. And so just
12:38 to keep things as similar to the anthropic implementation as possible, I
12:43 have the skills directory just called skills, just like you have in Claude Code,
12:47 for example. And so there's a folder in here for every single one of the skills
12:51 that we have. This looks just like the Anthropic skills where we have a
12:56 skill.md. This is our YAML front matter. This is the description that's loaded
12:59 into the agent up front. I'll show you how that works with the dynamic system
13:02 prompt. So the agent knows like, okay, if I want to use a weather API, let me
13:07 read this entire skill.md file so I can use the API and know how to do so. And
13:11 then the third layer of progressive disclosure is optional, but for a lot of
13:15 these, we have the reference folder for code review. We even have some Python
13:18 scripts that we can use. And so all of these reference documents are called out
13:22 in the skill.md so the agent knows it can go deeper to get those. And then for
13:26 some like the world clock, we literally just have a skill.md because this is
13:30 just really like for converting time zones. Obviously, we don't need that
13:34 much context for the agent to know how to do that. And so it's just a pretty
13:38 simple skill.md, only a couple hundred lines long. And so the way that this
13:43 works, I have my Pydantic AI agent. I have a lot of other content on my
13:47 channel covering Pydantic AI. So I won't go too much into the weeds here, but we
13:51 have our agent definition, which by the way, this agent you can use Open Router,
13:55 Ollama, or OpenAI, and it's easy to extend it for others as well. So again, not stuck to
14:00 the Claude ecosystem. And what we're doing here is we are creating a dynamic
14:04 system prompt. And so when we first define the agent, we're not setting the
14:08 system prompt at all because we're going to do it right here. And so in Pydantic AI,
14:13 the way that you do this is you reference your agent.system_prompt.
14:18 This is our Python decorator. Now the function below this is where we get to
14:22 define the system prompt for our agent. So we can inject things at runtime
14:26 because what we're doing with this line of code right here is we're calling a
14:30 function that I'm not going to get into the weeds of that is going to search
14:34 this skills directory. It's going to find every single skill.md and take the YAML
14:39 front matter. It's going to extract that from the skill.md and then put it into
14:44 the system prompt. And so we have all of the descriptions of the skills and their
14:49 paths as well as our primary system prompt. So we still have our base
14:53 instructions that don't change. In fact, a lot of my instructions here is just
14:56 telling the agent how to use skills. So it's very important. Large language
15:00 models by themselves do not understand how to leverage these capabilities. What
15:04 Claude did and what we have to do ourselves is be very descriptive here. Like, here's
15:08 what skills are. Here's the metadata so you know the ones that are available to
15:12 you. And then here is step by step how you leverage a skill when the
15:16 description is screaming out to you that you want to use that capability. So all
15:20 the prompting that I went over here is the first layer of progressive
15:24 disclosure. The dynamic system prompt is how we are letting the agent know about
15:28 everything upfront. And so then we get into our tool set. And so I'm giving a
15:34 single tool set to my Pydantic AI agent that has everything it needs to work
15:38 with the skills to basically read everything we have in the skills
15:41 directory. And so going to that definition here, there are three tools
15:46 that we are giving. So we have the load skill and read reference. And then one
15:51 other tool that I have here just to make it easier for the agent is to list the
15:54 reference documents just in case the skill.md doesn't reference them directly,
16:00 so that at least then our agent is still able to find them. So we're just trying to
16:05 make it as easy as possible to discover everything. That's the whole point of
16:08 skills: discovering the capabilities that the agent has access to. And so for
16:13 example, to load a skill, we just have to give the name of the skill. That's
16:17 one of the things that's included in the system prompt. And so what we do here is
16:22 we have our skill path that's set in our environment variable and then we have
16:25 the name and then we're looking for the skill.md there. And so we're loading all
16:29 of that and then we're returning the content of the skill.md. That's how
16:33 we're including it now in the context window for the agent going forward. And
16:36 it's very similar for when we're reading one of our reference documents as well.
16:41 The code is actually almost identical, but there are some differences there to
16:44 make sure that we're reading a reference document specific to the skill, with some
16:48 protections that I have in place for the agent. And so, yeah, that's pretty much
16:52 the agent as a whole. Like, that's how it works. It's super simple
16:56 in the end. And also, because I have this as a Pydantic AI tool set, you can
17:01 take this tool set, like you could copy this file and then just a couple of
17:05 these other ones here, and you could bring this into your own Pydantic AI agent
17:08 in just a couple of minutes. And your AI coding assistant could help you do this
17:13 so incredibly fast. So, please use this as a resource for yourself. Just give it
17:17 this repository and say like, "Hey, I have all these skills." You can put like
17:21 literally any skill that you want in this folder right here and then the next
17:23 time you interact with the agent, it'll have those capabilities automatically.
17:27 So, very dynamic system that I built for you. Really good starting point for any
17:31 kind of system you want to create. Now, one other really important thing to
17:35 cover here is how you can build your own skills. And so, this guide that I have
17:39 linked in the description is a really good starting point. Another quick tip
17:44 that I have for you, super useful. If you go into Claude Desktop, you can use it
17:48 to help you build your skills that you can then bring into the skill directory
17:52 for your custom agent. So, you just go to file settings and then capabilities.
17:57 You scroll all the way down to skills, go to example skills, and you can toggle
18:02 on the skill creator. So, this is really meta, but this is a skill to help you
18:07 build more skills. So when Claude uses this, it's going to pull in all the
18:10 instructions and best practices for creating skills and guide you through
18:14 that process. You can go here and say like, "Hey, help me build a skill for
18:17 LinkedIn posting, or help me build a skill to generate PowerPoints, or to
18:21 create standard operating procedures, like whatever you need." And it'll walk
18:24 you through creating that. And then it'll create a skill.md and then
18:28 potentially some of those reference documents and you can take that and just
18:31 put it in a new folder here in the skills directory. It is that easy to
18:35 build your own skills, and the sky's the limit for the capabilities that you
18:39 can create. All right, so at this point we now know the importance of skills and
18:44 how to build them into any agent that we want. But the big question we have here
18:49 is reliability. When we take these capabilities and you could have dozens
18:53 of skills, you give them to your agent. How do you know that the agent is always
18:57 going to leverage them when you want it to? Like you might have a skill for
19:01 creating X posts, but you ask it to help you with content creation and it doesn't
19:04 pull that skill because it doesn't know that content creation should mean that
19:08 it should pull the X skill, right? Like you want to test for those things, but
19:11 when you have dozens of different skills, it's really annoying to interact
19:15 with the agent and send in a question and make sure it's using each one of
19:19 them properly every single time you make changes to your agent. So that is where
19:23 evals come in. We can create an automated way to define questions and
19:27 the expected tool calls or in this case the expected skills that it leverages.
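Conceptually, each eval case pairs a question with the skills the agent is expected to load, and the evaluator compares that expectation against the tool calls actually recorded during the run. Here is a small sketch of the idea (not the real Pydantic Evals API; every name here is hypothetical):

```python
from dataclasses import dataclass

@dataclass
class Case:
    question: str
    expected_skills: set[str]

def skills_loaded(tool_calls: list[tuple[str, str]]) -> set[str]:
    """Collect skill names from recorded (tool_name, argument) pairs."""
    return {arg for name, arg in tool_calls if name == "load_skill"}

def evaluate(case: Case, tool_calls: list[tuple[str, str]]) -> bool:
    """Pass only if every expected skill was actually loaded during the run."""
    return case.expected_skills <= skills_loaded(tool_calls)

case = Case("What's the weather in New York right now?", {"weather"})
recorded = [("load_skill", "weather"), ("read_reference", "api.md")]
print(evaluate(case, recorded))
# → True
```

Pydantic Evals wraps this same pattern in a proper framework: a dataset of cases, custom evaluators, and a runner that executes the agent against each question and scores the results.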
19:32 And so I'll cover what that looks like with you and then also get into
19:35 observability. So when our agent is running in production using these
19:39 different skills, we can look at how real users are interacting with our
19:42 agent and making sure the agent is responding appropriately. And I'm pretty
19:46 excited to get into this for the last part of the video here. I'm going to be
19:49 pretty brief, but this is something I don't get to cover enough on my channel.
19:53 Evals and observability are things that are super important, but I don't get to
19:58 make content on it on my channel very much. So, luckily for us, Pydantic AI
20:04 has a very robust evaluation framework built right in. And so, essentially what
20:10 we can do is create these YAML files where we define all of our test cases.
20:15 And so, for example, I'm going to send in the question, what's the
20:18 weather in New York right now? And then I have an evaluator to make sure that
20:22 the weather skill was loaded. So we can also create our custom evaluators. I
20:27 don't want to get into the code too much right now. You can read up on this in
20:31 the documentation. Use this as an example for your AI coding assistant.
20:34 But I have this custom evaluator to make sure that the right skills are loaded
20:38 based on the questions that I send in. So now instead of me having to go into
20:41 the agent and ask it each of these things to make sure it's using the
20:45 different skills like the code review skill and the research assistant skill,
20:49 now I can just run this single script. So I can call this Python script right
20:54 here to run my evaluators. It loads in my golden data set, as I call it. It goes
20:58 through the questions one at a time. So, I am paying for the LLM credits, but
21:03 these are really cheap, really fast, just a smoke test to make sure that all
21:08 the different skills that I've added to my folder are actually being used
21:12 properly by my agent. If it's not, then it means that maybe there's an issue in
21:15 my loading capability or just my system prompt needs to be better or the skill
21:19 descriptions need to be better. Like, there's definitely different things that
21:22 need to be adjusted if the agent isn't working as you expect it to. And so evals
21:27 are very important to run every single time you change the system prompt for
21:31 your agent or even just the skills that you're giving it access to. So I also
21:35 have instructions in the readme for how to run the evals. And you can feel free
21:39 to poke around in the code for this as well if you want to see how to set this
21:42 up for your own Pydantic AI agents. But you want to do this for pretty much any
21:46 agent you're deploying to production. Evals are so important, and skills are
21:49 just a good example because there's just so many different capabilities that we
21:53 want to test for here. So, the logs are pretty verbose, but I'm just using
21:57 Haiku for all the tests here. So it's really nice and fast. But at the bottom
22:02 here, 25 out of 25 cases have passed. And so I've sent in a lot of different
22:06 requests to make sure the agent properly understands all of the skills that I've
22:10 given it. So it's good to do this instead of having to do a bunch of
22:14 manual testing myself after every single time I adjust my agent. So, I'll also
22:19 link to this page for Pydantic Evals in the description if you want to dive into
22:22 this and really take your agent seriously before deploying them to
22:26 production. And the other thing I want to talk about is observability with
22:30 Logfire, because evals are great when you want to test your agent locally, but how
22:33 about when users are actually using your agent in production and you want to be
22:37 able to peer into the traces, as they're called, to see the decisions your agent
22:41 is making when people are using it out in the wild. And so, that's why we need
22:45 a tool like Logfire. So it's created by the Pydantic team. They also made Pydantic
22:49 AI. So it's just a fantastic integration to have here. And it's really easy to
22:53 set it up. And so there's just a minimal amount of code that I have to have in my
22:59 agent definition file. So the Logfire token is one of the environment
23:01 variables. I explained that in the readme. And then we can configure
23:05 Logfire. So it's going to instrument all the Pydantic AI agents, as in every time we
23:09 invoke a tool or interact with the LLM, it's going to send all of that as
23:14 telemetry data. So we can track this running it locally like you're seeing
23:18 right here, but then also in production. And so yeah, I'm also going to have a
23:22 link to Logfire. I just wanted to mention this quickly because this is so
23:25 important: being able to see our usage, like token usage and cost, in production.
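The instrumentation being described comes down to a couple of lines at startup. A configuration sketch, assuming a `LOGFIRE_TOKEN` environment variable is set and a recent Logfire release (exact helper names can vary between versions):

```python
import logfire

# Picks up the write token from the LOGFIRE_TOKEN environment variable.
logfire.configure()

# Instrument every Pydantic AI agent in the process: each model call and
# tool invocation is exported to Logfire as telemetry (traces and spans).
logfire.instrument_pydantic_ai()
```

After this runs, every agent invocation shows up in the Logfire dashboard, whether the agent is running locally or in production.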
23:29 Looking into the different traces, like if a user reports a problem, we can go
23:32 in here and see like, okay, where did the agent mess up? Like is it something
23:36 wrong with their system? Did it just use a tool incorrectly like a bad parameter?
23:39 We can look at all the parameters, all the tool calls that it made. So we can
23:44 see the decisions even when we're not running the agent locally. It's very
23:47 very important to have evals and observability when you want to take your
23:52 agent seriously, and the pair of Logfire and Pydantic AI just makes it so easy. So
23:57 there you go. That is your guide for building skills into any AI agent that
24:02 you want. And you even got a bit of a bonus with evals and observability
24:05 because it is important to make sure we're constantly checking our agent,
24:09 making sure that it's leveraging our tools properly. Because with skills, we
24:14 can give our agents dozens and dozens of capabilities like I've shown you here.
24:18 So, if you appreciate this video and you're looking forward to more things on
24:22 building AI agents, I would really appreciate a like and a subscribe. And
$

Claude Skills Aren't Just for Claude - Here's How to Build Them for ANY Agent

@ColeMedin 24:27 9 chapters
[AI agents and automation][developer tools and coding][productivity and workflows][hardware setup and infrastructure][content creation and YouTube]
// description

Claude Skills (aka Agent Skills) is one of the most powerful advancements in AI recently. It's beautifully simple, yet it gives agents so much power, flexibility, and context efficiency. But what if you don't want to be limited to the Anthropic ecosystem? What if you want to use different models, run things locally, build your own agents, or create custom workflows without relying on Claude Code or Claude Desktop? That's exactly what I'll show you in this video. I'll walk you through building
