feat: Deterministic LLM agents through composable skills with static type checking #13
Conversation
Reference library for Agent Skills with CLI and Python API:
- validate: Check skill directories for valid SKILL.md with proper frontmatter
- read-properties: Parse and output skill properties as JSON
- to-prompt: Generate suggested <available_skills> XML for agent prompts

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude <[email protected]>
* Document reference SDK
* Update docs/integrate-skills.mdx

Co-authored-by: Keith Lazuka <[email protected]>
Introduces the ability to build complex behaviours from simpler skills, like higher-order functions in functional programming.

New frontmatter fields:
- level: Composition tier (1=Atomic, 2=Composite, 3=Workflow)
- operation: Safety classification (READ/WRITE/TRANSFORM)
- composes: List of skill dependencies

Benefits:
- Reusability: Write atomic skills once, compose everywhere
- Testability: Each level tested independently
- Safety: READ/WRITE separation propagates upward
- Transparency: Explicit dependency graph via composes field

Changes:
- docs/architecture.mdx: Full design rationale and patterns
- docs/specification.mdx: New field documentation
- skills-ref/models.py: SkillLevel, SkillOperation enums
- skills-ref/parser.py: Parse new fields
- skills-ref/validator.py: Validate composability fields
- examples/: Working examples at all three levels

Backwards compatible: skills without the new fields work unchanged.
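For orientation, here is a minimal Python sketch of how these fields could be modelled. The enum and dataclass shapes are assumptions made for illustration; only the field names and allowed values come from the commit message above.

```python
# Sketch of the new frontmatter fields; class/enum shapes are assumptions,
# only the field names and allowed values come from this PR.
from dataclasses import dataclass, field
from enum import Enum, IntEnum
from typing import List


class SkillLevel(IntEnum):
    ATOMIC = 1      # Level 1: single capability, no composition
    COMPOSITE = 2   # Level 2: composes atomic skills
    WORKFLOW = 3    # Level 3: orchestration (loops, recursion, dispatch)


class SkillOperation(str, Enum):
    READ = "READ"
    WRITE = "WRITE"
    TRANSFORM = "TRANSFORM"


@dataclass
class SkillProperties:
    name: str
    level: SkillLevel = SkillLevel.ATOMIC
    operation: SkillOperation = SkillOperation.READ
    composes: List[str] = field(default_factory=list)


# A Level-2 composite: the WRITE classification propagates upward
# because pdf-save is a WRITE skill.
research = SkillProperties(
    name="research",
    level=SkillLevel.COMPOSITE,
    operation=SkillOperation.WRITE,
    composes=["web-search", "pdf-save"],
)
```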
This commit enhances the composability extension with graph analysis tools:

- New graph module (skills_ref.graph):
  - CompositionGraph: Build and analyze skill dependency graphs
  - Circular dependency detection using DFS
  - Missing dependency validation
  - Level hierarchy violation warnings
  - ASCII and Mermaid diagram generation
  - JSON export with statistics
- New CLI command (skills-ref graph):
  - Visualize skill composition in ASCII, Mermaid, or JSON
  - Built-in validation for cycles and missing dependencies
  - Supports single skills or entire skill directories
- Enhanced exports in __init__.py:
  - CompositionGraph, GraphAnalysis, SkillNode
  - validate_composition convenience function
- Bug fixes:
  - Parser now correctly coerces level strings to integers
  - Validator handles string-to-int coercion for strictyaml compatibility
- Comprehensive test coverage (84 tests total):
  - test_graph.py: 24 tests for graph analysis
  - test_validator.py: Extended with composability field tests

Visual diagram added to architecture.mdx showing the composition hierarchy.
Recursion is a fundamental pattern in functional programming that should be supported in composable skills. This commit clarifies the distinction:

ALLOWED - Self-recursion (a skill composing itself):
- Enables divide-and-conquer algorithms with minimal code
- Supports dynamic parallelisation of sub-agents
- Reduces context consumption through concise recursive definitions
- Follows established functional programming principles

PROHIBITED - Circular dependencies (A → B → A):
- Create ambiguous execution order
- Prevent static analysis of the composition graph
- Indicate design flaws (skills should compose downward)

Changes:
- Updated detect_cycles() to skip self-references
- Added comprehensive tests for self-recursion scenarios
- Documented the distinction in architecture.mdx with examples
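A minimal sketch of the cycle-detection rule described above: self-recursion is permitted, multi-skill cycles are rejected. The real detect_cycles() lives in skills_ref.graph; the standalone signature used here is an assumption.

```python
# Sketch: detect circular dependencies with DFS, skipping self-references
# (A -> A is allowed; A -> B -> A is not). Signature is an assumption.
from typing import Dict, List


def detect_cycles(composes: Dict[str, List[str]]) -> List[List[str]]:
    """Return a list of cycles found in the composition graph."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {skill: WHITE for skill in composes}
    stack: List[str] = []
    cycles: List[List[str]] = []

    def visit(skill: str) -> None:
        color[skill] = GRAY
        stack.append(skill)
        for dep in composes.get(skill, []):
            if dep == skill:
                continue  # self-recursion is explicitly allowed
            if color.get(dep, WHITE) == GRAY:
                cycles.append(stack[stack.index(dep):] + [dep])
            elif color.get(dep, WHITE) == WHITE and dep in composes:
                visit(dep)
        stack.pop()
        color[skill] = BLACK

    for skill in composes:
        if color[skill] == WHITE:
            visit(skill)
    return cycles


# deep-research may compose itself, but a <-> b is flagged.
print(detect_cycles({"deep-research": ["deep-research", "research"],
                     "research": [],
                     "a": ["b"], "b": ["a"]}))
# -> [['a', 'b', 'a']]
```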
- Add deep-research example demonstrating the self-recursion pattern
- Document divide-and-conquer, parallelisation, and context efficiency benefits
- Fix level hierarchy check to not warn about self-recursion
- Update examples README to highlight recursion capability

The deep-research skill shows how a single recursive definition can replace multiple hardcoded depth levels (research-depth-1, research-depth-2, etc.) while enabling natural parallelisation of sub-agents.
cc @dsp @simonw @jspahrsummers

This PR addresses the non-deterministic tool selection problem that becomes critical as MCP tool registries scale beyond 50-100 definitions. The core insight: when tools overlap semantically, LLMs choose inconsistently across invocations, not because of temperature but because of attention mechanics over large sets of similar definitions. The solution is a hierarchical composition system that reduces decision complexity at each level.

This keeps tool sets small at each decision point, enables self-recursion for divide-and-conquer patterns, and propagates safety classifications (READ/WRITE/TRANSFORM) through the hierarchy. Would welcome feedback on the architecture, particularly whether this aligns with where you see MCP tooling evolving.
- Move from _composite to _workflows directory
- Change level from 2 to 3
- Compose 'research' instead of atomics directly (web-search, pdf-save)
- This maintains proper hierarchy: L3 → L2 → L1
- Eliminates semantic overlap at Level 2

Recursion is a form of orchestration (deciding when to recurse, when to stop), which correctly places it at Level 3 (workflows).
- Add decision flowchart for determining skill level
- Add Level Criteria Summary table
- Document L3 patterns: recursion, loops, dynamic dispatch, fan-out, state
- Add L3 → L3 composition section with quarterly-review example
- Fix deep-research example to show Level 3 (recursion = workflow)
- Clarify that deep-research composes research (L2), not atomics

This makes the level classification testable and unambiguous.
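As a rough illustration of the level hierarchy check, here is a sketch that flags upward composition (a skill composing a higher-level skill) while exempting self-recursion and allowing L3 → L3 composition. The exact rule, function name, and messages in the real validator are assumptions.

```python
# Sketch of the level-hierarchy check: composition should point downward
# (L3 -> L2 -> L1); self-recursion and L3 -> L3 composition are allowed.
# Names and messages are illustrative only.
from typing import Dict, List


def check_level_hierarchy(levels: Dict[str, int],
                          composes: Dict[str, List[str]]) -> List[str]:
    """Return warnings for skills that compose a higher-level skill."""
    warnings: List[str] = []
    for parent, deps in composes.items():
        for child in deps:
            if child == parent:
                continue  # self-recursion is exempt (see deep-research)
            if child in levels and levels[child] > levels[parent]:
                warnings.append(
                    f"{parent} (L{levels[parent]}) composes higher-level "
                    f"{child} (L{levels[child]})"
                )
    return warnings


levels = {"trip-optimize": 3, "option-explore": 3,
          "research": 2, "web-search": 1}
composes = {
    "trip-optimize": ["trip-optimize", "option-explore"],  # self + L3 -> L3: fine
    "research": ["web-search", "trip-optimize"],           # L2 -> L3: flagged
}
print(check_level_hierarchy(levels, composes))
# -> ['research (L2) composes higher-level trip-optimize (L3)']
```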
…itecture
Complete example showcasing:
- Fan-out parallelization (evaluate 12 destinations simultaneously)
- Expected value optimization (prioritize high-potential options)
- Gradient descent refinement (local search around best option)
- Self-recursion (trip-optimize and option-explore recurse)
- L3 → L3 composition (trip-optimize calls option-explore)
- Early termination (stop when marginal return < marginal cost)
- Binary constraint filtering (eliminate infeasible options first)
- MECE compliance (clear level separation)
Microeconomic concepts applied:
- Expected value, marginal cost/return, opportunity cost
- Pareto frontier for final recommendations
- Game-theoretic compute efficiency
Skills included:
- Level 3: trip-optimize, option-explore
- Level 2: destination-evaluate, route-price, feasibility-check
- Level 1: flight-search, hotel-search, weather-fetch, visa-check,
activity-search, calendar-read
Features:
- FieldSchema dataclass for typed inputs/outputs with epistemic requirements
- TypeDefinition for custom types (future expansion)
- Parse inputs/outputs from YAML frontmatter
- Validate field schemas (type, range, requires_source, requires_rationale)
- typecheck_composition() validates type compatibility between composed skills
- New CLI command: skills-ref typecheck

Type checking catches:
- Input type mismatches between parent and child skills
- Output type mismatches in composition chains
- Missing composed skill dependencies
- Invalid ranges (min > max)

Supports type widening (integer → number, datetime → date) and the 'any' escape hatch for flexibility.

Documentation added to architecture.mdx explaining the type system, primitive types, epistemic requirements, and CLI usage.

Co-authored-by: Eduardo Aguilar Pelaez <[email protected]>
Co-authored-by: Claude <[email protected]>
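A sketch of the compatibility rule a type checker could apply when wiring composed skills: exact match, the 'any' escape hatch, or one of the widenings named above. Only the two widening pairs come from this PR; how typecheck_composition() actually implements the rule is an assumption.

```python
# Sketch of the type-compatibility rule for composed skills: exact match,
# the 'any' escape hatch, or an allowed widening. Only the two widening
# pairs below are named in this PR; the rest is illustrative.
WIDENINGS = {
    ("integer", "number"),    # an integer output can feed a number input
    ("datetime", "date"),     # a datetime output can feed a date input
}


def is_compatible(produced: str, expected: str) -> bool:
    """True if a value of type `produced` can flow into an input of type `expected`."""
    if "any" in (produced, expected):
        return True
    if produced == expected:
        return True
    return (produced, expected) in WIDENINGS


# An 'integer' output can feed a 'number' input: accepted.
assert is_compatible("integer", "number")
# A 'string' output cannot feed a 'date' input: flagged at composition time.
assert not is_compatible("string", "date")
```

Under the two pairs listed, the producer is always at least as specific as the consumer expects, which is what keeps composition chains checkable before anything runs.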
Force-pushed from a910d53 to d1d32d1.
- Add FP-to-hardware parallel table explaining architectural principles
- Reference FCCM 2014 paper on compiling higher-order functional programs
- Add PR_BODY.md with comprehensive PR description

The architecture applies the same principles that made FP-based hardware synthesis tractable to LLM agent orchestration: type-checked composition, latency-insensitive interfaces, and isolated execution contexts.

Co-authored-by: Eduardo Aguilar Pelaez <[email protected]>
Co-authored-by: Claude <[email protected]>
The comprehensive PR description is now the actual body of PR agentskills#13, not a separate file in the repository.

Co-authored-by: Eduardo Aguilar Pelaez <[email protected]>
Co-authored-by: Claude <[email protected]>
- README.md: Add Static Type System section with epistemic requirements
- README.md: Add Trip Optimizer Showcase to Getting Started
- architecture.mdx: Add Acknowledgements for FCCM 2014 co-authors
- Create CHANGELOG.md to track composable skills release

Ensures valuable context from the PR agentskills#13 description persists in repo files after merge. Type system, theoretical foundation, and acknowledgements now live in their canonical locations.

Co-authored-by: Eduardo Aguilar Pelaez <[email protected]>
Co-authored-by: Claude <[email protected]>
cc @fabiopelosin - This PR directly addresses the concerns you raised in #11 ("Skill composition without context bloat"). How this PR maps to your proposal:
Additional features addressing your concerns:
Showcase example: The trip-optimizer demonstrates all 3 levels with typed contracts.

Would welcome your feedback on whether this addresses your design questions, particularly around whether composition should be spec-level (as implemented here) vs runtime-only.
Community Validation: Related Issues in anthropics/skills

Researching the broader ecosystem, I found two open issues in anthropics/skills that validate the need for this PR:

anthropics/skills#150: Allow "dependencies" in skill metadata
How this PR addresses it: the composes field declares skill dependencies directly in frontmatter.

anthropics/skills#132: Make skills spec future proof
How this PR addresses it: typed inputs/outputs turn each skill into an explicit contract rather than an implementation.

These upstream issues demonstrate that the community is independently arriving at the same conclusions: skills need dependency management and typed contracts. This PR provides a concrete, backwards-compatible implementation.
cc @gattimassimo @rahimnathwani @omnisci3nce - Given your engagement with Issue #11, you may find this PR relevant. TL;DR: This PR implements the composable skills architecture that @fabiopelosin described, with:
The trip-optimizer showcase demonstrates all 3 composition levels with 12 skills. Would welcome your thoughts on whether this addresses the composition challenges you've encountered.
cc @elmariachi111 @remygendron - Your issues in anthropics/skills are directly relevant here.

@elmariachi111 (#150): You requested a way to declare dependencies in skill metadata. This PR adds:

```yaml
composes:
  - web-search
  - pdf-save
```

@remygendron (#132): You proposed "SKILLS should be defined as a contract, not an implementation." This PR adds typed inputs:

```yaml
inputs:
  - name: query
    type: string
    required: true
outputs:
  - name: answer
    type: string
    requires_source: true
```

The requires_source field enforces the epistemic requirements that make outputs part of the contract.

Would value your input on whether this approach addresses your use cases.
cc @Christian-Blank - Your work on task orchestration methodology articulates the same problem this PR addresses. Specifically, your principle:
This PR implements that layered approach for skills:
Your event-sourcing pattern is directly relevant here.

Related industry validation: Salesforce's Agent Graph work calls this "guided determinism", separating LLM reasoning from explicit choreography.

Would welcome your perspective on whether this aligns with the SyntropicSystems vision.
Alignment with @maheshmurag's Vision for Agent Skills

In his AI Engineer Summit talk and Anthropic engineering blog, Mahesh articulated core challenges this PR addresses.

Problems Identified by Mahesh:

Key Quote:

This insight applies directly to skill composition: dependencies are declared statically (via the composes field).

The "One Agent, Many Skills" Architecture

Mahesh advocates for "one universal agent powered by domain-specific skills" rather than multi-agent orchestration. Our hierarchical composition (L1 Atomic → L2 Composite → L3 Workflow) implements exactly this: a single agent selects high-level skills, and composition handles the rest deterministically.
What you're actually describing is MCPs executed in code.

Skills are inherently non-deterministic in nature; the sandbox environment they run in and the APIs they expose via scripts are what tightens that non-determinism.

Hate to break it to you, but this is too dense a PR for a problem that already has a widely regarded solution: MCP code execution. That is where you should consider your efforts IMO.
@numman-ali Is there an implementation of this pattern that you'd recommend? Maybe one of these two?
@rahimnathwani I am still spending time deciding on an approach, but at a minimum, for any MCP I want to use, I simply extract the API commands I care about into a skill using skill-creator. Here are some additional links to explore: https://github.com/jx-codes/lootbox

If you tell me your use case, I can likely give better advice.
Thanks for engaging, @numman-ali. Let me make sure I understand your position before responding. Your argument (as I understand it):

I appreciate the directness. Here's where I'd respectfully push back:

1. "Skills inherently are of non-deterministic nature": is this physics, or an arbitrary philosophical choice?

Barry Zhang (Anthropic, co-creator of Agent Skills) wrote in Making Peace with LLM Non-determinism:

The skill-creator skill itself says:

The question isn't whether skills CAN be non-deterministic; it's whether they SHOULD be for production workflows. This PR provides the "code components" Barry recommends, but at the composition layer.

2. MCP Code Execution has documented limitations

From Anthropic's engineering post and the a16z deep dive:

Static type checking at composition time catches errors BEFORE code generation, reducing the debugging burden.

3. This is fully optional and backwards-compatible

To be clear: this PR doesn't change how existing skills work. All new fields (level, operation, composes, inputs, outputs) are optional. This is an opt-in enhancement for teams who need deterministic composition. If your use case doesn't require it, you can ignore these features entirely. The spec explicitly states: "Teams can adopt composability incrementally."

4. This isn't just about client-side tool calling

You may be viewing this from the client perspective. But consider the gateway MCP pattern, which is increasingly common in production. Why gateways need typed composition:

Skill composability applies at every layer:

A typed composition graph is a "mental model" LLMs can use at ALL these layers, not just a runtime constraint. Is it possible you're considering this only from the user/client side rather than more holistically?

5. Genuine question: What's the most complex production system you've built with MCP code execution?

I ask sincerely because I'd love to understand how you've achieved mission-critical reliability. Specifically:

@rahimnathwani asked a similar question and you mentioned you're "still spending time deciding on an approach." That's fair; this is new territory. But that's precisely why exploring multiple approaches, including typed composition, benefits the community.

The core disagreement

If we accept that skills must be non-deterministic, we're making an arbitrary philosophical choice that limits what agents can reliably accomplish. The SyntropicSystems methodology captures this:

This PR implements that layered approach as an optional extension. My mission is to solve for LLM reliability and efficiency by providing "mental models" that help LLMs reason better than their present "best efforts."
- Add Industry Recognition section to README with a16z, Simon Willison, and Barry Zhang citations
- Add References section to README with FCCM 2014 paper and key sources
- Add Industry Validation section to architecture.mdx with detailed table
- Add Gateway MCP Pattern section with ASCII diagram and source table
- Add Barry Zhang quote on reducing non-determinism with code components
- All sources properly cited with links

These additions strengthen the case for composable skills by showing:
1. Industry recognition of the problems being solved
2. Gateway MCP pattern applicability beyond client-side
3. Theoretical foundation in FP-to-hardware synthesis
Dude @edu-ap, sorry, but if you're going to reply with a wall of text it's obvious you're using a tonne of AI.

You're not being thoughtful in your response - this is an open source protocol, people give their spare time to contribute.

I read your first point and saw that's a blog post from April 2024; things change.

Please come with a hand-written response, based on your personal experience and on real-world user experience, not copy-paste of random facts.
Thanks for the pushback, @numman-ali. Point taken on the HOW, but I'd argue it shouldn't distract from the WHAT.

I took a look at openskills; nice work on the universal CLI loader! I see these as complementary: openskills solves distribution (getting skills to agents); this PR addresses composition (what skills contain and how they fit together). A typed composition graph could actually help loaders like openskills optimise what to load and when.

Would you consider opening PRs to agentskills for:
The format benefits from diverse perspectives. Your loader experience is valuable here.
- Add to Industry Recognition section in README.md
- Add to References section in README.md
- Add to Industry Validation table in architecture.mdx

Key quote: "By writing explicit orchestration logic, Claude makes fewer errors"

This directly validates the composition layer approach.
Force-pushed from e3f77ed to 1197719.
@edu-ap so sorry, had to force push and this PR got auto-closed. You may need to rebase and re-push it, and link back to this for discussion. Apologies for the hassle!
No worries, it happens. I'll do that tomorrow 🙏🏼
@maheshmurag Following your note about the force push, I've rebased and opened a new PR:

→ PR #29: Quasi-deterministic LLM agents through composable skills with static type checking

All discussion from this PR is preserved here for reference. The new PR includes updated acknowledgements and links to community validation from related issues. Thanks for the heads up!
Executive Summary
This PR introduces an optional composability extension to the Agent Skills format, enabling atomic skills to be combined into higher-order workflows with static type checking. All new fields are optional; existing skills work unchanged.
The Problem
LLM agents with flat tool definitions suffer from non-deterministic behaviour that makes them unsuitable for production:
The Solution
This PR introduces composable skills with static type checking, applying principles from functional programming-based hardware synthesis (FCCM 2014) to LLM agent orchestration:
Key Benefits
Core Features Implemented
New Frontmatter Fields:
- level: composition tier (1 = Atomic, 2 = Composite, 3 = Workflow)
- operation: safety classification (READ / WRITE / TRANSFORM)
- composes: list of skill dependencies
- inputs / outputs: typed field schemas defining the skill's contract
Field Schema Properties:
- name, type, required
- range constraints (min / max)
- requires_source, requires_rationale (epistemic requirements)
CLI Commands:
```bash
skills-ref validate ./skills/research
skills-ref graph --format=mermaid ./skills
skills-ref typecheck ./skills
```
Architecture Overview
Three-level composition hierarchy:
- Level 1 (Atomic): single capabilities such as web-search or pdf-save
- Level 2 (Composite): compose atomic skills (e.g. research)
- Level 3 (Workflow): orchestration, including loops, recursion, and dynamic dispatch (e.g. deep-research, trip-optimize)
Showcase: Trip Optimizer
Complete example with 12 skills across 3 levels, demonstrating:
- Fan-out parallelization (evaluate 12 destinations simultaneously)
- Expected value optimization and early termination (stop when marginal return < marginal cost)
- Gradient descent refinement around the best option
- Self-recursion (trip-optimize and option-explore recurse)
- L3 → L3 composition (trip-optimize calls option-explore)
- Binary constraint filtering and MECE-compliant level separation
Industry Validation
The problems addressed align with documented challenges:
Gateway MCP Pattern
Typed composition applies across the stack:
The FP-to-Hardware Parallel
This architecture draws from research on compiling functional programs to parallel hardware (FCCM 2014):
Files Changed
Core Implementation:
Documentation & Examples:
Tests:
Test Results
All 103 tests pass successfully.
Backwards Compatibility
All new fields are optional. Existing skills work unchanged. This is an opt-in enhancement for teams requiring deterministic composition; adoption can be incremental.