By Josh Proto

Feb 13, 2026

Claude Code Skills and the Problem of Context in Modern Software Development

As AI-assisted development tools mature, the core challenge facing engineering teams is no longer whether large language models can write code. The more pressing question is how to structure repositories, workflows, and information so that these tools behave predictably, remain accurate, and support real production work instead of generating noise. A recent episode of the Code & Cognition podcast explores this question through hands-on experience with Claude Code, Anthropic's CLI-based coding assistant. The conversation focuses less on surface-level features and more on how Claude Code's emerging mechanisms, such as skills, slash commands, subagents, and context isolation, change the way developers think about the "work" of software engineering. What follows is a synthesis of what happens when these tools are used seriously inside live projects.

From "Can It Code?" to "What Does It See?"

Early experiments with AI coding assistants often stall for the same reason: "God Prompts". Large markdown files describing architecture, style guides, deployment steps, and coding conventions are loaded into every prompt, on the assumption that more information will lead to better answers. In practice, this approach produces the opposite result: working with the LLM becomes slower and less precise than traditional development, and the model is still prone to hallucination. Claude Code's evolution reflects a recognition that context management is a first-class engineering problem. Rather than treating the model as a passive recipient of everything the repository contains, Claude Code introduces mechanisms that allow teams to decide when information should be visible and why.

The Early Baseline: CLAUDE.md

The original Claude Code workflow centered on a single file: CLAUDE.md. This file functioned as a persistent memory layer, containing everything the assistant should know about the repository: coding standards, architectural patterns, commands, and conventions. This approach works well for small projects, but it does not scale. Every interaction loads the entire file, regardless of relevance. Linting rules influence design discussions, deployment notes surface during test writing, and over time the model's output drifts away from developer intent. The model has no reliable way to tell which of the many instructions in CLAUDE.md matter for the task at hand. Static global context creates unintended coupling between unrelated tasks.

Slash Commands as Explicit Context Gates

Slash commands were the first meaningful step toward solving this problem. Instead of loading everything up front, teams can store prompts as discrete files and invoke them intentionally. A /commit command can load commit message formatting rules. A /pr command can load pull request templates and review heuristics. A /test command can focus the model on affected files only. This makes Claude Code more predictable and more targeted in its use: developers know exactly what information is being introduced and when. Slash commands are also highly composable and can be called from other commands, subagents, or even skills. In practice, slash commands are the most reliable primitive in Claude Code workflows. They are explicit, inspectable, and deterministic, qualities that help operationalize and standardize AI output in a production process.
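One way to picture this gating is as a lookup from a command name to a single block of instructions. The sketch below is a toy model in Python, not Claude Code's implementation: the `COMMANDS` table, its instruction text, and the `build_context` function are hypothetical names chosen for illustration.

```python
# Toy model of explicit context gating. All names here are hypothetical;
# this does not reflect how Claude Code is implemented internally.

# Each command maps to one discrete block of instructions.
COMMANDS = {
    "/commit": "Write commit messages in the imperative mood, under 72 characters.",
    "/pr": "Fill in the pull request template and list review heuristics.",
    "/test": "Only run and discuss tests for files affected by the change.",
}


def build_context(user_input: str) -> list[str]:
    """Return the instruction blocks that a given input makes visible."""
    parts = user_input.split(maxsplit=1)
    command = parts[0] if parts else ""
    if command not in COMMANDS:
        # No command invoked, so no extra instructions are loaded.
        return []
    return [COMMANDS[command]]


if __name__ == "__main__":
    print(build_context("/commit fix flaky login test"))
    print(build_context("refactor the login module"))
```

The point of the sketch is that nothing enters the context unless a developer typed the command that asks for it, which is what makes the behavior easy to inspect.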

Figure: A summary of slash commands used in our recent project

Subagents and Context Isolation

Subagents extend the same principle further. Rather than sharing a single context window, subagents operate in isolated environments where they can read files, analyze code, or generate plans, then return a summary before their internal state is discarded. This matters because many AI-assisted tasks are exploratory by nature. Searching for patterns, scanning directories, or reasoning through alternatives can generate large amounts of intermediate output that is irrelevant once a decision is made. By isolating that work, subagents prevent exploratory reasoning from contaminating the main thread of development.
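The isolation idea can be sketched with an ordinary function whose scratch work never leaves its own scope. This is only an analogy under stated assumptions: `explore_in_isolation`, the `repo` dictionary, and the file names are hypothetical, and the "exploration" is a simple substring search rather than anything a model does.

```python
# Toy model of context isolation. The names are hypothetical and the
# exploration is simulated; no model or Claude Code API is called.


def explore_in_isolation(file_contents: dict[str, str], needle: str) -> str:
    """Do noisy exploratory work in a private scratchpad and return a summary."""
    scratchpad: list[str] = []  # intermediate output, private to this call
    matches: list[str] = []
    for path, text in sorted(file_contents.items()):
        scratchpad.append(f"read {path} ({len(text)} chars)")
        if needle in text:
            matches.append(path)
    # The scratchpad is dropped when the function returns; only this
    # one-line summary reaches the caller.
    return f"{len(matches)} file(s) mention '{needle}': {', '.join(matches)}"


main_context: list[str] = ["user: where do we call the billing client?"]
repo = {
    "api/orders.py": "from billing import client",
    "api/users.py": "import hashlib",
    "jobs/invoice.py": "billing.client.charge()",
}
main_context.append(explore_in_isolation(repo, "billing"))
print(main_context)
```

However many files the helper reads, the main context grows by exactly one entry, which mirrors the summary-then-discard behavior described above.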

What 'Claude Code Skills' Were Intended to Solve

Claude Code Skills sit somewhere between slash commands and subagents. They are markdown files containing structured instructions, templates, and references that Claude can load automatically when it believes they are relevant to the current task. Instead of waiting for a command to be invoked explicitly, the model recognizes from context that a certain type of work is being requested and loads the appropriate procedural knowledge on its own. In theory, this enables more natural interactions. In practice, however, we found that automatic loading introduces a new variable: Claude Code has to judge relevance itself, and it can misjudge.

When Skills Fall Short

Our initial attempts to use skills focused on static rules such as linting conventions, naming standards, and formatting guidelines, which seemed like obvious candidates for conditional loading. Real-world use quickly revealed a limitation. Claude does not always load a skill simply because it exists, even when the context suggests that it should. If the model believes it can complete a task without additional context, it may bypass the skill entirely. This inconsistency makes skills unreliable when used as passive rule containers. The problem is not that skills fail; it is that rules alone do not strongly signal when a skill should be used.

Reframing Skills as Procedures

The turning point came when skills were treated not as rulebooks, but as procedural guides. Instead of encoding "how code should look," successful skills encode "how work should be done." One example discussed in our conversation was a skill created to manage Shortcut tickets. Rather than listing labels or naming conventions in isolation, the skill described:

How tickets are classified
What level of detail is required
How stories relate to epics and sprints
How templates should be applied
How metadata is inferred from context

When Claude was asked to create or update tickets, it consistently loaded the skill because the task aligned with the described procedure. When Claude was merely retrieving ticket data, the skill was sometimes skipped. This distinction matters: skills load most reliably when they describe a process that the requested task clearly matches.

Reliability Through Redundancy

Because skill invocation is not guaranteed to happen automatically, teams quickly discovered that relying on Claude to "do the right thing" on its own was insufficient for production work. Instead, reliability emerged from layering explicit mechanisms on top of the more implicit ones. In practice, this means using slash commands to deliberately load a skill before performing a task, adding lightweight hooks that prompt Claude to reconsider available skills when a request is ambiguous, and designing skills that reference other documents or commands rather than duplicating logic internally. None of these solutions is particularly elegant, and each introduces a degree of redundancy that feels unnecessary in theory. In practice, that redundancy is what makes Claude Code Skills dependable. When AI is embedded in a production workflow, conceptual purity matters far less than predictable behavior.
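The "lightweight hook" idea can be sketched as a small function that inspects an incoming prompt and, when it looks related to a known procedure, emits a reminder to check the matching skill. Everything specific here is an assumption made for illustration: the payload shape (a dictionary with a `prompt` key), the skill names, and the trigger words are hypothetical, and the hook documentation for your tool should be checked before wiring anything like this into a real workflow.

```python
# Sketch of a "reconsider your skills" reminder. The payload shape (a dict
# with a "prompt" key), the skill names, and the trigger words are all
# assumptions for illustration, not a documented interface.

# Hypothetical skills, each with trigger words that hint at its procedure.
SKILL_HINTS = {
    "shortcut-tickets": ("ticket", "epic", "sprint"),
    "release-notes": ("changelog", "release"),
}


def build_reminder(payload: dict) -> str:
    """Return a reminder naming skills whose trigger words appear in the prompt."""
    prompt = str(payload.get("prompt", "")).lower()
    relevant = [
        name
        for name, words in SKILL_HINTS.items()
        if any(word in prompt for word in words)
    ]
    if not relevant:
        return ""  # stay silent rather than add noise to the context
    return "Before answering, check these skills: " + ", ".join(relevant)


print(build_reminder({"prompt": "Create a ticket for the login bug"}))
```

Keyword matching is deliberately crude. The design choice is that an explicit, inspectable nudge runs alongside the model's own judgment, so a missed skill costs one redundant reminder rather than a wrong result.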

Measuring the Impact of Better Context Design

One of the most concrete outcomes discussed in the episode came from an informal comparison between two Claude Code setups:

A legacy configuration with large, monolithic documentation
A modular configuration using slash commands and procedural skills

The modular setup produced dramatic improvements:

Token usage dropped substantially
File reads decreased
Search time fell
Output became more focused and accurate

These gains were not merely about cost, although cost reduction did follow. Smaller, more relevant context windows reduced hallucination and improved reasoning quality. Claude spent less time reconciling irrelevant information and more time executing the task at hand.
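To see why modular loading reduces token usage, a back-of-the-envelope estimate helps. The numbers below are illustrative assumptions, not measurements from the episode: the section sizes are invented, and four characters per token is only a rough heuristic for English text.

```python
# Back-of-the-envelope comparison of context cost. The section sizes and the
# 4-characters-per-token ratio are illustrative assumptions, not measurements
# from the episode.

CHARS_PER_TOKEN = 4  # rough heuristic for English text

# Hypothetical documentation sections and their sizes in characters.
SECTIONS = {
    "architecture": 12_000,
    "style": 8_000,
    "deployment": 6_000,
    "commit_rules": 2_000,
}


def estimate_tokens(section_names: list[str]) -> int:
    """Estimate the tokens needed to load the named sections."""
    total_chars = sum(SECTIONS[name] for name in section_names)
    return total_chars // CHARS_PER_TOKEN


monolithic = estimate_tokens(list(SECTIONS))  # everything, every time
modular = estimate_tokens(["commit_rules"])   # only what one task needs
savings = 1 - modular / monolithic
print(f"monolithic={monolithic} modular={modular} savings={savings:.0%}")
```

The exact ratio depends entirely on how the documentation is split, but the shape of the result holds: a monolithic file pays for every section on every interaction, while a gated setup pays only for the sections a task actually needs.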

Figure: Performance improvements from modular context design vs. monolithic documentation

High-Context Refactors in Practice

Claude Code's strengths become especially clear during large refactors. In one case, a personal project originally built on MongoDB was migrated to PostgreSQL using Prisma. The application contained dozens of database-dependent endpoints. Claude Code was asked to:

Understand the existing schema
Replace database interactions
Introduce the new ORM
Update queries and models accordingly

Within minutes, the application was largely functional. Errors remained, but the bulk of the mechanical work was complete. Beyond speed, the migration showed that Claude Code can operate across files, layers, and abstractions while remaining grounded in the actual codebase.

Creativity and Structural Changes

Claude Code was also used for less mechanical tasks, such as redesigning a dashboard in a personal betting application. Given minimal guidance, the assistant:

Introduced data visualizations
Improved layout and information hierarchy
Added interactive elements like drag-and-drop
Generated UI code that aligned with the existing project structure

The developer did not consult documentation for the underlying libraries. Instead, they focused on describing outcomes and constraints, and once it had received thorough designs, Claude handled the translation into implementation.

Figure: Dashboard redesign with data visualizations and improved layout, produced from a high-context prompt

Development Without an Editor

One of the more striking anecdotes from the episode involved a hackathon project built using Claude Code almost exclusively through the command line. A small team built a playable top-down game in a matter of hours without opening a code editor. They relied on natural language interaction, iterative feedback, and Claude's ability to manage unfamiliar libraries. This is not blanket permission to start vibe coding every project, but it points to what is possible when AI coding tools like Claude Code are managed by experienced software engineers. When context is managed well, AI tools can support workflows that would previously have been impractical.

The Open Question of Review and Judgment

Despite these successes, the episode surfaced unresolved tensions, particularly around code review. Claude can review pull requests. It can identify inconsistencies, suggest improvements, and flag potential issues. However, it will almost always find something to critique. This raises a practical problem. If an assistant always has feedback, how do teams decide when that feedback matters? How do they prevent endless refinement loops? And where does human judgment remain indispensable? The emerging consensus is that AI-assisted review works best as augmented judgment, and teams still prefer to keep a human in the loop. Developers remain responsible for deciding which suggestions align with project goals and which can be safely ignored.

Toward Spec-Driven and Spectrum Development

The conversation concluded by gesturing toward what comes next. As tools like Claude Code mature, the boundary between design, implementation, review, and documentation continues to blur. Specifications generate code. Code updates documentation. Reviews feed back into standards. A well-planned application is therefore better positioned to be built quickly and with fewer errors in this new age of AI code assistants. This "spectrum development" model does not eliminate engineering discipline and expertise; it requires more of both. How systems are structured, how context is constrained, and how responsibility is assigned matter more than ever, because they dictate the speed and quality of software developed with AI code assistants.

Josh Proto

Cloud Strategist

Josh is a Cloud Strategist passionate about helping engineers and business leaders navigate how emerging technologies like AI can be skillfully used in their organizations. In his free time you'll find him rescuing pigeons with his non-profit or singing Hindustani & Nepali Classical Music.
