Analysis & Verification

test

Execute eval test suites that verify skills actually work when loaded by an agent. Define test cases in cases.yaml with prompts, expected outcomes, and graders. Track baselines to catch regressions after refresh.

Why it matters

A skill can have perfect metadata and pass every lint check, yet still produce wrong code when an agent uses it. The test command closes this gap by actually running prompts through an agent harness and grading the output: integration tests for your skill files.

What it does

  • Discovers tests/ directories (each containing a cases.yaml) inside skill directories
  • Parses declarative test suites with trigger, outcome, style, and regression test types
  • Executes prompts through configurable agent harnesses (Claude Code CLI, generic shell)
  • Grades results with 7 built-in graders: file-exists, command, contains, not-contains, json-match, package-has, llm-rubric
  • Supports custom graders via dynamic module import
  • Runs multiple trials per test case with configurable pass thresholds and flaky test detection
  • Stores baselines for regression tracking across skill updates
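Putting the pieces above together, a suite might look like this. This is a hypothetical sketch: the exact cases.yaml schema, field names, and grader option keys are assumptions, not taken from this page; only the test types and grader names come from the list above.

```yaml
# tests/cases.yaml inside a skill directory (field names are illustrative)
cases:
  - name: generates-streaming-handler
    type: outcome            # one of: trigger, outcome, style, regression
    prompt: "Add a streaming chat endpoint using this skill's conventions"
    trials: 3                # multiple runs help surface flaky behavior
    graders:
      - type: file-exists
        path: src/routes/chat.ts
      - type: contains
        file: src/routes/chat.ts
        value: streamText
      - type: llm-rubric
        rubric: "The handler streams tokens and handles abort signals"
```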
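Custom graders are loaded via dynamic module import, but the expected export shape is not documented here. The following is a guess at a plausible contract; the grade function name, its arguments, and the result shape are all assumptions.

```javascript
// graders/no-todo-comments.mjs -- hypothetical custom grader module.
// Assumed contract: export a grade() function that receives the agent's
// working directory and returns { pass, message }.
import { readFileSync, readdirSync } from "node:fs";
import { join } from "node:path";

export function grade({ workdir }) {
  // Fail if any generated source file still contains a TODO marker.
  const offenders = readdirSync(workdir)
    .filter((f) => f.endsWith(".ts") || f.endsWith(".js"))
    .filter((f) => readFileSync(join(workdir, f), "utf8").includes("TODO"));
  return {
    pass: offenders.length === 0,
    message: offenders.length
      ? `TODO found in: ${offenders.join(", ")}`
      : "no TODO markers",
  };
}
```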

Usage

npx skills-check test [dir] [options]

Options

Flag                      Description
-s, --skill <name>        Test a specific skill
-t, --type <type>         Filter: trigger, outcome, style, or regression
--agent <name>            Agent harness: claude-code or generic
--trials <n>              Number of runs per test case
--dry                     Preview test plan without executing
--update-baseline         Save results as new baseline
--ci                      CI mode with strict exit codes
-f, --format <type>       Output: terminal or json
--isolation <provider>    Run in an isolated container (auto, docker, podman, apple-container, vercel-sandbox, etc.)
--no-isolation            Disable isolation and accept the risk of running directly on the host

Examples

Run all tests

npx skills-check test

Test one skill

npx skills-check test -s ai-sdk-core

Outcome tests only

npx skills-check test --type outcome

Preview plan

npx skills-check test --dry

Update baseline

npx skills-check test --update-baseline

Test in isolation

npx skills-check test --isolation auto

When to use this

  • After refreshing a stale skill
  • To detect regressions in agent behavior
  • Before major version releases

Related commands

  • budget: Measure cost, then verify behavior
  • refresh: Refresh a skill, then verify behavior

CI tip

Run test --ci after a refresh to catch regressions. Use --update-baseline on main after verified changes so future PRs compare against the latest known-good results.
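In GitHub Actions, that could look like the sketch below. The workflow layout and the Node setup step are illustrative; only the skills-check invocation and its flags come from this page.

```yaml
# .github/workflows/skills.yml (illustrative)
name: skills-check
on: [pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      # --ci makes exit codes strict, so any regression fails the job
      - run: npx skills-check test --ci --format json > results.json
```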