Augment tested dozens of AGENTS.md files against its AuggieBench benchmark and published hard numbers. The sweet spot is a 100–150 line hub file that references deeper docs via relative paths. Referenced files are discovered 90% of the time; orphan docs sitting in the repo with no reference from the hub? Under 10%.
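Because orphan docs are almost never found, a cheap pre-merge check is to confirm that the hub stays in range and links every doc. The sketch below assumes the hub is markdown and that deeper docs are linked as relative links such as `[arch](docs/architecture.md)`; `lint_hub` and its parameters are hypothetical, not Augment's tooling.

```python
import re
from pathlib import PurePosixPath

# Relative markdown links to .md files: no URL scheme, not absolute, optional #anchor.
LINK_RE = re.compile(r"\]\((?![a-z]+://)(?!/)([^)\s#]+\.md)(?:#[^)]*)?\)")


def lint_hub(hub_text: str, doc_paths: list[str],
             min_lines: int = 100, max_lines: int = 150) -> dict:
    """Report hub length and which docs are orphans (hypothetical helper).

    The default bounds mirror the 100-150 line sweet spot; treating them as
    hard limits is an assumption.
    """
    n_lines = len(hub_text.splitlines())
    # PurePosixPath collapses "./docs/x.md" to "docs/x.md" so both spellings match.
    referenced = {str(PurePosixPath(m)) for m in LINK_RE.findall(hub_text)}
    orphans = sorted(p for p in doc_paths if str(PurePosixPath(p)) not in referenced)
    return {
        "lines": n_lines,
        "length_ok": min_lines <= n_lines <= max_lines,
        "referenced": sorted(referenced),
        "orphans": orphans,
    }
```

Any path in `orphans` is a doc the agent will most likely never open; the fix is a link from the hub, not more text in it.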
The most counter-intuitive finding: the same instruction block can boost one task by +25% while tanking another by −30%. Lists of "don'ts" without matching "do" alternatives actively hurt. The #1 failure mode is overexploration: too much architecture context causes the agent to read 12 files and burn 80K tokens before writing a single line.
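The "don'ts without a do" finding can be checked mechanically with a crude heuristic. This sketch assumes rules are written one per line and treats "instead", "prefer", or "use" as evidence of a positive alternative; the word lists and `bare_donts` are illustrative assumptions, not part of Augment's study.

```python
import re

# Negative imperatives at the start of a line or bullet: "Don't", "Do not", "Never", "Avoid".
NEGATIVE = re.compile(r"^\s*(?:[-*]\s+)?(?:don['’]t|do not|never|avoid)\b", re.IGNORECASE)
# Crude signal that the same rule also names what to do instead (assumed word list).
ALTERNATIVE = re.compile(r"\b(?:instead|prefer|use)\b", re.IGNORECASE)


def bare_donts(text: str) -> list[str]:
    """Return prohibition lines that offer no 'do' alternative (heuristic sketch)."""
    return [
        line.strip()
        for line in text.splitlines()
        if NEGATIVE.match(line) and not ALTERNATIVE.search(line)
    ]
```

Each flagged line is a candidate for rewriting as "do X instead of Y" rather than for deletion.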
Microsoft shipped an Agent Package Manager (2,145★) with a five-layer architecture:
manifest → resolve → security gate → compile → install.
The killer insight is the compilation step: the same skill primitives are transformed
into per-client output (AGENTS.md, CLAUDE.md, Gemini format) at install time.
Security is baked in: a Unicode injection scanner blocks tag characters, bidi overrides, and
variation selectors before anything enters the agent's context window. Enterprise governance
runs through apm-policy.yml, with tighten-only inheritance.
skill-ecosystem wiki card.
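The security gate and compile steps can be sketched together, assuming a skill is just a (name, body) pair. The code point ranges are the standard Unicode blocks for the three character classes named above; `scan`, `compile_skill`, `SecurityGateError`, and the `CLIENT_TARGETS` file names (GEMINI.md in particular) are assumptions, not APM's actual API or output.

```python
# Code point ranges treated as injection vectors (assumed list, not APM's exact table).
BLOCKED = [
    ("tag character", 0xE0000, 0xE007F),
    ("bidi embedding/override", 0x202A, 0x202E),
    ("bidi isolate", 0x2066, 0x2069),
    ("variation selector", 0xFE00, 0xFE0F),
    ("variation selector supplement", 0xE0100, 0xE01EF),
]

# Hypothetical per-client targets: one primitive, several output formats.
CLIENT_TARGETS = {"agents": "AGENTS.md", "claude": "CLAUDE.md", "gemini": "GEMINI.md"}


class SecurityGateError(ValueError):
    """Raised when a skill contains blocked characters."""


def scan(text: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, category) for every blocked character in text."""
    hits = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        for label, lo, hi in BLOCKED:
            if lo <= cp <= hi:
                hits.append((i, f"U+{cp:04X}", label))
                break
    return hits


def compile_skill(name: str, body: str, clients: list[str]) -> dict[str, str]:
    """Security gate first, then compile one skill into per-client file contents."""
    hits = scan(name) + scan(body)
    if hits:
        raise SecurityGateError(f"blocked characters: {hits}")
    return {
        CLIENT_TARGETS[c]: f"<!-- generated for {c} -->\n## {name}\n\n{body}\n"
        for c in clients
    }
```

Running the gate before compilation means a poisoned skill never reaches any client's context file, regardless of which format it would have been compiled into.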
codejunkie99/brain treats git as an event log: each memory is a JSON blob plus a commit.
The SQLite FTS5 index is a cache rebuilt from that log, never the source of truth. The system
introduces a bitemporal model (time_observed vs time_recorded), which appears to make it the first
agent memory system to draw this distinction, and defines 10 typed events across 6 cognitive layers
(Working → Episodic → Semantic → Personal → Skill → Protocol).
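The value of separating the two timestamps shows up in "as of" queries: a fact can be observed on one date and recorded later. This is a minimal sketch, assuming a simplified event with hypothetical `key` and `value` fields; only `time_observed` and `time_recorded` come from the description above, and this is not brain's actual schema.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime


@dataclass(frozen=True)
class MemoryEvent:
    key: str                  # hypothetical: what the memory is about
    value: str                # hypothetical: the remembered fact
    time_observed: datetime   # when the fact held or was observed
    time_recorded: datetime   # when the memory store wrote it down


def to_blob(event: MemoryEvent) -> str:
    """Serialize one event as the JSON blob that would be committed to git."""
    return json.dumps(asdict(event), default=str, sort_keys=True)


def as_of(events: list[MemoryEvent], observed_by: datetime, known_by: datetime) -> dict[str, str]:
    """Latest value per key observed by `observed_by`, using only events recorded by `known_by`."""
    visible = [e for e in events if e.time_observed <= observed_by and e.time_recorded <= known_by]
    latest: dict[str, str] = {}
    for e in sorted(visible, key=lambda e: e.time_observed):
        latest[e.key] = e.value
    return latest
```

With an upgrade observed on Jan 10 but recorded on Jan 20, asking "what was true on Jan 15?" returns the old value if you only use what was known on Jan 15, and the new value if you use what was known by Jan 25. Because every event is a committed blob, any index over them can be dropped and rebuilt by replaying the log.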
Authority scoring (source kind + a 0–100 score) means not all memories carry equal trust. A secret prefilter runs RegexSet scans before each git commit: prevention > detection. There is deliberately no LLM consolidation, just raw events plus search; the agent does the synthesis.
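A Python analogue of the pre-commit prefilter follows. Python's `re` has no RegexSet (that type comes from Rust's regex crate), so this sketch joins the patterns into one alternation of named groups, which scans the text in a single pass but, unlike a RegexSet, reports only non-overlapping matches. The three patterns are common public token formats chosen for illustration, not brain's list.

```python
import re

# Illustrative patterns only; a real prefilter carries a longer, curated list.
SECRET_PATTERNS = {
    "aws_access_key_id": r"AKIA[0-9A-Z]{16}",
    "github_pat": r"ghp_[A-Za-z0-9]{36}",
    "private_key": r"-----BEGIN [A-Z ]*PRIVATE KEY-----",
}

# One alternation of named groups: a single pass, loosely analogous to a RegexSet.
SECRET_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in SECRET_PATTERNS.items()))


def secret_kinds(blob: str) -> set[str]:
    """Names of the secret patterns that match somewhere in blob."""
    return {m.lastgroup for m in SECRET_RE.finditer(blob)}


def commit_allowed(blob: str) -> bool:
    # Prevention over detection: refuse the commit rather than scrub history later.
    return not secret_kinds(blob)
```

Once a secret lands in a git-backed memory it lives in history, so rejecting the write is much cheaper than detecting it afterwards.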
source: human|self|study|review|env
to beliefs-candidates with differentiated graduation thresholds (human corrections at 2×, others at 3×),
(2) installed pre-commit secret scanning hooks on workspace + wiki repos (12 regex patterns).
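The differentiated thresholds reduce to a per-source lookup. This sketch assumes a hypothetical log of (belief, source) sightings and counts sightings per source rather than pooling across sources; that counting rule and the `graduated` helper are assumptions, and only the source kinds and the 2×/3× thresholds come from the note above.

```python
from collections import Counter
from typing import Literal

Source = Literal["human", "self", "study", "review", "env"]

# Human corrections graduate after 2 sightings; every other source needs 3.
THRESHOLDS: dict[str, int] = {"human": 2}
DEFAULT_THRESHOLD = 3


def graduated(sightings: list[tuple[str, Source]]) -> set[str]:
    """Beliefs whose sightings from a single source reach that source's threshold.

    `sightings` is a hypothetical log of (belief_text, source) pairs; counting
    per source rather than across sources is an assumption of this sketch.
    """
    counts = Counter(sightings)
    return {
        belief
        for (belief, source), n in counts.items()
        if n >= THRESHOLDS.get(source, DEFAULT_THRESHOLD)
    }
```

Lowering the bar only for human corrections lets explicit feedback stick quickly while the agent's own inferences still need repeated confirmation.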
Hermes (113k→123k★, +10k in 5 days) made two architectural moves worth studying.
First, a BOOT.md → hooks migration: startup behavior moved from hardcoded
AIAgent() calls to user-configurable hooks. The old pattern caused 401 errors on
every gateway start because built-in behaviors ran unconditionally.
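The shift from unconditional built-ins to opt-in hooks can be sketched in a few lines. `HookRegistry`, `on`, `fire`, `build_registry`, and the `boot_greeting` config key are hypothetical names for illustration, not Hermes's API.

```python
from collections import defaultdict
from typing import Callable

Hook = Callable[[dict], None]


class HookRegistry:
    """Startup behavior as opt-in hooks instead of hardcoded calls (hypothetical API)."""

    def __init__(self) -> None:
        self._hooks: dict[str, list[Hook]] = defaultdict(list)

    def on(self, event: str, fn: Hook) -> None:
        self._hooks[event].append(fn)

    def fire(self, event: str, ctx: dict) -> int:
        # .get avoids creating an empty entry; no configured hooks means nothing runs.
        hooks = self._hooks.get(event, [])
        for fn in hooks:
            fn(ctx)
        return len(hooks)


def build_registry(config: dict) -> HookRegistry:
    """Register only what the user configured; 'boot_greeting' is a hypothetical key."""
    reg = HookRegistry()
    if config.get("boot_greeting"):
        reg.on("gateway_start", lambda ctx: ctx.setdefault("log", []).append("greeting sent"))
    return reg
```

With an empty config, firing `gateway_start` runs nothing, so a behavior that needs credentials cannot fail on every start the way an unconditional built-in call can.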
Second, tool definition memoization using a composite cache key:
(frozenset(enabled), frozenset(disabled), registry._generation, config.mtime+size)
with a 30s TTL for external state probes. Result: 7.5ms → 0.01ms per turn, a 750× improvement.
The generation counter bumps on every registry mutation, so invalidation is precise.
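A single-entry memo built around that composite key might look like the following. `ToolRegistry`, `ToolDefCache`, the `probe` callable, and the choice to fold the (TTL-limited) probe result into the key are assumptions; the key components and the 30 s TTL come from the description above. The sketch uses `st_mtime_ns` for the config's modification time.

```python
import os
import time
from typing import Callable, Hashable, Optional


class ToolRegistry:
    """Hypothetical registry; _generation is bumped on every mutation."""

    def __init__(self) -> None:
        self._tools: dict[str, dict] = {}
        self._generation = 0

    def register(self, name: str, schema: dict) -> None:
        self._tools[name] = schema
        self._generation += 1


class ToolDefCache:
    """Single-entry memo for per-turn tool definitions (sketch, not Hermes's code)."""

    def __init__(self, registry: ToolRegistry, config_path: str,
                 probe: Callable[[], Hashable], ttl: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.registry = registry
        self.config_path = config_path
        self.probe = probe          # hypothetical external-state check; must return a hashable value
        self.ttl = ttl
        self.clock = clock
        self._probe_value: Hashable = None
        self._probe_at = float("-inf")
        self._key: Optional[tuple] = None
        self._defs: list[dict] = []
        self.builds = 0             # counts cache misses, for illustration

    def _external_state(self) -> Hashable:
        # Re-run the probe at most once per TTL window.
        now = self.clock()
        if now - self._probe_at >= self.ttl:
            self._probe_value = self.probe()
            self._probe_at = now
        return self._probe_value

    def get(self, enabled: set[str], disabled: set[str]) -> list[dict]:
        st = os.stat(self.config_path)
        key = (frozenset(enabled), frozenset(disabled), self.registry._generation,
               st.st_mtime_ns, st.st_size, self._external_state())
        if key != self._key:
            self.builds += 1
            self._defs = [
                {"name": name, **schema}
                for name, schema in sorted(self.registry._tools.items())
                if name in enabled and name not in disabled
            ]
            self._key = key
        return self._defs
```

A repeated turn with unchanged inputs costs one `os.stat` and a tuple comparison; any registry mutation changes `_generation`, so stale definitions are never served.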
nexu-io/open-design (1,902★ in 1 day!) extends SKILL.md with od: frontmatter
(typed fields for mode, inputs, parameters, and design system sections). It's the 5th independent project
to adopt the SKILL.md format, joining Claude Code skills, thClaws, venice/skills, and APM.
Two novel patterns emerged: (1) token-efficient section pruning via
od.design_system.sections, which injects only the relevant design system fragments into context,
and (2) the question-form pattern, where the LLM emits structured <question-form>
XML that the app renders as interactive UI. This inverts the usual context-file pattern:
the agent generates UI schema instead of only consuming it.
thin-harness-fat-skills and agent-context-files wiki cards.
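Both patterns can be sketched with the standard library. Here `wanted` stands in for the list a skill would declare under od.design_system.sections; `prune_design_system`, `parse_question_form`, and the `<question>`/`<option>` child elements and their attributes are hypothetical, not open-design's actual schema.

```python
import xml.etree.ElementTree as ET


def prune_design_system(sections: dict[str, str], wanted: list[str]) -> str:
    """Join only the declared design system fragments, in the declared order."""
    return "\n\n".join(sections[name] for name in wanted if name in sections)


def parse_question_form(xml_text: str) -> list[dict]:
    """Turn an LLM-emitted <question-form> into field specs a UI could render.

    Child element and attribute names are assumptions for illustration.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "question-form":
        raise ValueError(f"expected <question-form>, got <{root.tag}>")
    fields = []
    for q in root.findall("question"):
        fields.append({
            "id": q.get("id"),
            "type": q.get("type", "text"),       # default to a free-text input
            "label": (q.text or "").strip(),     # text before the first child element
            "options": [o.text for o in q.findall("option")],
        })
    return fields
```

Pruning keeps unused design tokens out of the prompt, while parsing the form into plain field specs keeps rendering entirely on the app side.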