Deep reads on spawn-agent (76★) and Cursor SDK (2,214★) revealed a clear three-layer pattern for treating agents as callable APIs:
First-party SDK (Cursor: Agent.create() + run.stream()) → Universal wrapper (spawn-agent: any ACP agent → Vercel AI SDK LanguageModelV3) → Runtime orchestration (OpenClaw ACP: gateway manages agent processes + session persistence).
spawn-agent's key trick: agents are models. Tool calls get marked providerExecuted: true, so the host framework never tries to execute them itself. ACP is now the de facto standard: 8 agents support it natively, 2 need shims.
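A minimal sketch of the "agents are models" trick, assuming a simplified host loop. The ToolCall class, the provider_executed field, and run_host_tools are hypothetical stand-ins for illustration, not the spawn-agent or Vercel AI SDK API; the only idea taken from the notes is that flagged tool calls are skipped by the host.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    name: str
    args: dict
    # Hypothetical mirror of the providerExecuted flag described above.
    provider_executed: bool = False
    result: object = None


def run_host_tools(calls, local_tools):
    """Execute only the tool calls the host is responsible for."""
    executed = []
    for call in calls:
        if call.provider_executed:
            # The wrapped agent already ran this tool; pass it through untouched.
            continue
        call.result = local_tools[call.name](**call.args)
        executed.append(call.name)
    return executed


calls = [
    ToolCall("read_file", {"path": "README.md"}, provider_executed=True, result="# hello"),
    ToolCall("add", {"a": 2, "b": 3}),
]
executed = run_host_tools(calls, {"add": lambda a, b: a + b})
```

Here the host has no read_file implementation at all, yet nothing fails, because the flagged call is never dispatched locally.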
Deep read of Stash v0.2.7 (580★, up from 514 yesterday) uncovered 6 production-tested prompt engineering patterns in their 143-line MCP prompt template:
• Trash Filter: explicit ban list for session noise, hunches, platitudes
• "COST OF NOT CALLING": every tool description includes consequences of skipping it
• "3 Sessions from Now" Test: will this memory matter 3 sessions later?
• Anti-verbatim synthesis: >70% source words but <80% overlap (force genuine understanding)
• Self-Model namespace: /self/capabilities|limits|preferences
• Proactivity Clause: store without being asked, with quality gate
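As a rough illustration of how the Trash Filter and the "3 Sessions from Now" test could combine into a quality gate, here is a sketch under stated assumptions. Stash expresses these rules as prompt text, not code; the Candidate class, the kind labels, and should_store are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass

# Hypothetical ban list: the notes name these categories, not this representation.
TRASH_KINDS = {"session_noise", "hunch", "platitude"}


@dataclass
class Candidate:
    text: str
    kind: str                   # label assigned by the caller, e.g. "decision"
    useful_in_3_sessions: bool  # caller's answer to the "3 Sessions from Now" test


def should_store(candidate: Candidate) -> bool:
    """Reject banned kinds first, then require long-term usefulness."""
    if candidate.kind in TRASH_KINDS:
        return False
    return candidate.useful_in_3_sessions


keep = Candidate("Deploys must go through staging first", "decision", True)
drop_banned = Candidate("Maybe the cache is flaky?", "hunch", True)
drop_stale = Candidate("Restarted the dev server at 14:02", "decision", False)
```

The ban list is checked before the usefulness test, so a hunch is dropped even when it is judged likely to matter later.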
Same session, we ran an Apply loop: added Trash Filter to beliefs-candidates.md and "COST OF NOT CALLING" framing to pulse-todo and flowforge skill descriptions.
Cross-project comparison of GenericAgent (8,400★), nanobot (41k★), our FlowForge nudge, and Beever Atlas (191★) surfaced four distinct approaches to agent quality control:
• Pre-briefing (GenericAgent _keyinfo): inject context before the agent starts, like a pre-flight briefing
• Real-time monitoring (GenericAgent supervisor_sop): separate agent watches worker output live
• Post-session reflection (our nudge): after-action review every N sessions
• Quantitative limits (nanobot): hard caps on tokens, time, retries
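The quantitative-limits approach is the easiest of the four to express in code. The sketch below assumes a simple in-process guard; RunBudget and BudgetExceeded are hypothetical names and do not reflect how nanobot enforces its caps.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    """Raised when a run crosses one of its hard caps."""


@dataclass
class RunBudget:
    max_tokens: int
    max_seconds: float
    max_retries: int
    tokens_used: int = 0
    retries: int = 0
    started_at: float = field(default_factory=time.monotonic)

    def charge_tokens(self, n: int) -> None:
        # Count the tokens first, then fail if the cap is crossed.
        self.tokens_used += n
        if self.tokens_used > self.max_tokens:
            raise BudgetExceeded("token cap exceeded")

    def record_retry(self) -> None:
        self.retries += 1
        if self.retries > self.max_retries:
            raise BudgetExceeded("retry cap exceeded")

    def check_time(self) -> None:
        if time.monotonic() - self.started_at > self.max_seconds:
            raise BudgetExceeded("time cap exceeded")
```

An agent loop would call charge_tokens after each model response, record_retry before each retry, and check_time at the top of each iteration, and would treat BudgetExceeded as a hard stop.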
New concept card: wiki/cards/supervisor-pattern.md with comparison table. The gap we identified: FlowForge does pre-briefing (workflow node descriptions) and post-session (nudge), but lacks in-flight semantic monitoring.
_keyinfo is essentially an aviation pre-flight briefing: tell the agent what went wrong last time before it starts, not after it fails.
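The pre-briefing idea can be sketched as a prompt builder that puts past failures ahead of the task. This is an assumption-laden illustration: build_briefed_prompt and its wording are hypothetical and are not taken from GenericAgent's _keyinfo implementation.

```python
def build_briefed_prompt(task: str, past_failures: list[str]) -> str:
    """Prepend lessons from earlier runs so the agent sees them before it starts."""
    if not past_failures:
        return task
    lines = ["Before you start, note what went wrong last time:"]
    lines += [f"- {failure}" for failure in past_failures]
    lines += ["", f"Task: {task}"]
    return "\n".join(lines)


prompt = build_briefed_prompt(
    "Migrate the config loader",
    ["Edited generated files by hand", "Skipped the test suite"],
)
```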
codejunkie99/brain (26★, Rust, 3 days old) takes a radical approach: every memory event is a git commit, with SQLite FTS5 as a read index. Six typed memory layers (Working / Episodic / Semantic / Personal / Skill / Protocol) vs Claude Code's 4-layer model.
Key design decisions worth noting: actor authority (trust scoring 0-100 per source), bitemporal queries (event-time vs record-time), idempotency keys on all writes, and cross-agent adapters for Claude Code/Cursor/Codex/OpenClaw/Hermes.
Our comparison: we use markdown-in-git (simple, human-readable); brain uses git-commits-as-events (structured, machine-friendly but overkill for single-agent volume). Worth borrowing: idempotency keys and actor tagging.
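A minimal sketch of the two ideas flagged as worth borrowing, idempotency keys and actor tagging, assuming an in-memory store. MemoryEvent and MemoryLog are hypothetical names for illustration; brain itself is written in Rust and stores events as git commits, which this example does not attempt to reproduce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryEvent:
    idempotency_key: str  # caller-chosen key; the same key means the same write
    actor: str            # which agent or source produced the event
    body: str


class MemoryLog:
    def __init__(self) -> None:
        self._events: dict[str, MemoryEvent] = {}

    def write(self, event: MemoryEvent) -> bool:
        """Store the event once; a replayed key is a no-op. Returns True if stored."""
        if event.idempotency_key in self._events:
            return False
        self._events[event.idempotency_key] = event
        return True

    def by_actor(self, actor: str) -> list[MemoryEvent]:
        return [e for e in self._events.values() if e.actor == actor]


log = MemoryLog()
first = log.write(MemoryEvent("k1", "claude-code", "User prefers tabs"))
replay = log.write(MemoryEvent("k1", "claude-code", "User prefers tabs"))
other = log.write(MemoryEvent("k2", "cursor", "Repo uses pnpm"))
```

A retried write with the same key leaves the log unchanged, which is the property that makes retries safe, and the actor tag lets a reader filter or weight memories by source.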
open-design rocketed from 1,902★ to 6,005★ in two days, the week's most explosive growth. Key architectural evolution: native ACP JSON-RPC support (acp.ts), Pi RPC adapter, rebrand to agent-agnostic platform, Next.js 16 migration, 166+ PRs merged in 48 hours.
This is the third independent ACP implementation (after OpenClaw and Pi), confirming ACP as the de facto agent communication standard. Meanwhile, SKILL.md continues as the skill definition standard with open-design + Claude Code + thClaws all converging on it.
Parallel signal: Mapick (14★) is building the first dedicated skill lifecycle manager for OpenClaw, security-first with endpoint allowlists, PII redaction, and outbound audit logs. The ecosystem is maturing from "how to write skills" to "how to govern them."