Why agent skills are the next productivity unlock

Agent skills are reusable units of institutional knowledge, each capturing a small, well-defined part of a workflow. They aren’t as flashy as autonomous agents, but they offer something more valuable for senior engineers: reproducibility and leverage. The key is managing prompts and inputs deliberately.

Skills TL;DR

Simply put, skills let you reuse prompts in a structured way. Writing one forces you to think carefully about the problem being solved, how to solve it, what inputs it requires, and finally, what it should produce.

You can get started with skills by using sites like skills.sh, which allow you to browse and install skills created by others, or you can simply create a folder with a SKILL.md file in it, plus any additional context such as scripts and reference materials.
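To make the folder layout concrete, here is a minimal sketch that scaffolds a skill on disk. It assumes the common convention of a SKILL.md that opens with a short front matter block containing `name` and `description`; the `scripts/` and `references/` folder names and the `scaffold_skill` helper are illustrative, so check your agent's documentation for the exact format it expects.

```python
import tempfile
from pathlib import Path


def scaffold_skill(root: Path, name: str, description: str) -> Path:
    """Create <root>/<name>/SKILL.md plus empty scripts/ and references/ folders."""
    skill_dir = root / name
    # Optional supporting material lives next to SKILL.md (folder names are illustrative).
    (skill_dir / "scripts").mkdir(parents=True, exist_ok=True)
    (skill_dir / "references").mkdir(exist_ok=True)

    skill_md = skill_dir / "SKILL.md"
    skill_md.write_text(
        # Assumed front matter fields: name and description.
        "---\n"
        f"name: {name}\n"
        f"description: {description}\n"
        "---\n\n"
        f"# {name}\n\n"
        "Describe the problem, the steps, the inputs, and the expected output here.\n",
        encoding="utf-8",
    )
    return skill_md


if __name__ == "__main__":
    # Demo in a throwaway directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        path = scaffold_skill(Path(tmp), "my-skill", "Write unit tests following team conventions.")
        print(path.read_text(encoding="utf-8"))
```

In practice you would point `root` at wherever your agent looks for skills, then edit the generated SKILL.md by hand.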

Once you have the SKILL.md in place, all you need is a general purpose coding agent. You can just invoke the skill ($my-skill for Codex, /my-skill for Claude) and its prompt will be loaded into your context along with any additional instructions you provide at invocation time.

Funnily enough, there’s a skill to find other skills better suited for a task at hand, which might be a good place to start.

Skills are playbooks for repeatable work

Skills function as agents’ playbooks. They contain procedural guidance that specifies how to perform a task. Once agents understand their action space, they can reliably transform inputs into well-defined outputs, like a proper function.

These tasks shouldn’t be one-offs, like “write tests for function X in repo Y”, but task classes, such as “write unit tests for function X with these conventions”. The latter is reusable across any context that shares the same conventions and rules.
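One way to picture the difference is to treat the skill as a template where the conventions stay fixed and only the target changes. The sketch below is purely illustrative: the `UNIT_TEST_SKILL` wording, the `render_skill` helper, and the example file paths are all made up for this article, not part of any agent's API.

```python
from string import Template

# Hypothetical skill prompt: the structure is fixed, the target is a parameter.
UNIT_TEST_SKILL = Template(
    "Write unit tests for `$function` in `$path`.\n"
    "Conventions:\n"
    "$conventions\n"
    "Done means: the new tests follow every convention above and the suite passes."
)


def render_skill(function: str, path: str, conventions: list[str]) -> str:
    """Fill in the task-specific inputs while keeping the shared rules."""
    bullet_list = "\n".join(f"- {rule}" for rule in conventions)
    return UNIT_TEST_SKILL.substitute(
        function=function, path=path, conventions=bullet_list
    )


if __name__ == "__main__":
    team_rules = ["use pytest", "one behaviour per test"]
    # Same skill, two different targets.
    print(render_skill("parse_date", "utils/dates.py", team_rules))
    print()
    print(render_skill("slugify", "utils/text.py", team_rules))
```

A one-off prompt would hard-code the function and the repo; the task class keeps them as inputs, which is what makes it reusable.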

For example, I’ve recently been playing around with the RPI framework (Research → Plan → Implement), and I wanted to make skills around each of these phases.

Research: understands the context surrounding the problem, finds all the relevant files, doesn’t try to solve anything. Research minimizes future exploration, by outputting an artifact that later phases can simply reference instead of trying to find everything on their own.
Plan: outlines exact implementation steps to achieve the desired outcome, including filenames, lines, and code snippets, explicitly defining what “done” means. Plan minimizes future thinking, by outputting an artifact that the next phase can simply execute.
Implement: writes the code. If properly planned, the implementation is easy and expected.

Each phase gets its own context window, so it isn’t polluted by context from the other phases. Output artifacts chain the phases together, and the results are fascinating.
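The chaining can be sketched in a few lines. This is a rough model under stated assumptions, not how any particular agent implements it: `Agent` is a hypothetical stand-in for "start a fresh agent session with this prompt and return its output", and the artifact file names are invented for illustration.

```python
import tempfile
from pathlib import Path
from typing import Callable, Optional

# Hypothetical interface: any callable that takes a prompt and returns text.
# Each call models a brand-new session with an empty context window.
Agent = Callable[[str], str]

PHASES = ["research", "plan", "implement"]


def run_rpi(task: str, agent: Agent, workdir: Path) -> dict[str, Path]:
    """Run Research -> Plan -> Implement, chaining phases through files on disk."""
    artifacts: dict[str, Path] = {}
    previous: Optional[Path] = None
    for phase in PHASES:
        # Only the task and the previous phase's artifact are carried forward.
        prompt = f"Phase: {phase}\nTask: {task}\n"
        if previous is not None:
            body = previous.read_text(encoding="utf-8")
            prompt += f"Input artifact ({previous.name}):\n{body}\n"
        output = agent(prompt)
        previous = workdir / f"{phase}.md"
        previous.write_text(output, encoding="utf-8")
        artifacts[phase] = previous
    return artifacts


if __name__ == "__main__":
    # Demo with a fake agent that just reports how much prompt it received.
    with tempfile.TemporaryDirectory() as tmp:
        result = run_rpi(
            "add retry logic",
            lambda prompt: f"({len(prompt)} chars of prompt seen)",
            Path(tmp),
        )
        for phase, path in result.items():
            print(phase, "->", path.name)
```

The point of the design is that the implement phase never sees the research transcript, only the plan artifact, so each phase starts from a small, deliberate input.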

If skills are just prompts, how much do they help?

While I was writing this article, SkillsBench, the first skills benchmarking framework, was published. Its authors measured agent performance across 86 real-world tasks, with and without skills, and the results were striking.

SkillsBench graph showing how agents perform with and without skills

Their experiment shows that, on average, the smallest model with a good skill (Claude’s Haiku 4.5) can outperform the strongest model without any skills (Claude’s Opus 4.5)!

Of course, the improvement rate varies significantly between task classes. Some tasks are so specialized that general-purpose models cannot solve them unaided, so guidance is paramount. Others are so simple (e.g. engineering chores) that skills help only minimally.

Skills can sometimes hurt the outcome of a task. One example is a skill generated wholesale with the Skill Creator skill: it is tempting, but the result is often too generic and not tailored enough to your workflows.

But overall, performance across tasks vastly improves with skills, just because you’re thinking deeply about the problem space and guiding agents through it.

How do you adopt skills today?

Start small.

Pick one small part of a workflow you repeat often and turn it into a skill. Don’t aim for perfection. Just define the inputs, the rough sequence of steps, the rules, and what a “done” artifact should look like.
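If it helps to keep yourself honest, a tiny check like the one below can flag which of those four pieces a draft is still missing. The section names (`Inputs`, `Steps`, `Rules`, `Done`) are my own convention for this sketch, not something any agent requires.

```python
# Assumed section names, mirroring the four things listed above.
REQUIRED_SECTIONS = ("Inputs", "Steps", "Rules", "Done")


def missing_sections(skill_text: str) -> list[str]:
    """Return the required section headings that do not appear in a SKILL.md draft."""
    headings = {
        line.lstrip("#").strip().lower()
        for line in skill_text.splitlines()
        if line.startswith("#")
    }
    return [name for name in REQUIRED_SECTIONS if name.lower() not in headings]


if __name__ == "__main__":
    draft = (
        "# Release notes\n\n"
        "## Inputs\n- tag range\n\n"
        "## Steps\n1. collect commits\n\n"
        "## Done\nA CHANGELOG entry exists.\n"
    )
    print(missing_sections(draft))
```

An empty result doesn't mean the skill is good, only that you have written something down for each part.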

Run it a few times. Tweak the prompt. Tighten the instructions. Notice how it feels when the agent follows your playbook instead of improvising.

As things start to click, you’ll naturally split bigger workflows into a few more skills. The system becomes clearer. Reuse gets easier. And that’s when the productivity starts to compound.