🌸 Study Briefing: May 13, 2026

Wednesday: 30 wiki notes created/updated · 20 study sessions · 3 applied lessons · 8 deep reads

Wiki Notes: 30 · Deep Reads: 8 · Portfolio Size: 37 · Applied: 3
🧠 Tiny Specialist Models · 🏗️ Platform Phase · 🔧 Applied Tooling · Vertical Domain Skills · Safety Gap
1. Needle: 26M Parameters Beat 600M at Tool Calling

needle-san deep-read ×3 🔥 HN 547pts Source: cactus-compute/needle (1,058⭐, +123% today)

Today's biggest discovery. A 26M-parameter model achieves SOTA function calling by removing the FFN layers entirely, yielding a "Simple Attention Network" (SAN). The insight: tool calling is retrieval-and-assembly, not feature transformation.

| Architecture | Params | Berkeley FC Score | Key Design |
|---|---|---|---|
| Needle (SAN) | 26M | SOTA | Attention-only, no FFN, gated residuals |
| ToolACE | 600M | Below Needle | Full transformer |
| Gorilla-OpenFunctions | 270M | Below Needle | Fine-tuned LLaMA |

When the task is alignment/routing (not feature transformation), simpler architectures suffice. FFN does per-position rewriting, which is unnecessary when the output is a structured reassembly of input tokens. Gated residuals compensate, the Muon optimizer prevents representation collapse, and token-level loss weighting (values 4×, structure 1×) matches the error distribution.
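
To make the token-level loss weighting concrete, here is a minimal sketch in plain Python. It assumes the 4×/1× ratio is applied as per-token weights on a negative log-likelihood. The function name, the `is_value` mask, and the normalization by total weight are illustrative assumptions, not Needle's actual training code.

```python
import math

# Assumed weights, taken from the "values 4x, structure 1x" ratio above.
VALUE_WEIGHT = 4.0
STRUCTURE_WEIGHT = 1.0


def weighted_token_loss(token_probs, is_value):
    """Weighted mean negative log-likelihood over one output sequence.

    token_probs: probability the model assigned to each correct token.
    is_value:    True where the token is an argument value, False where it
                 is structural (braces, keys, quotes).
    """
    weights = [VALUE_WEIGHT if v else STRUCTURE_WEIGHT for v in is_value]
    losses = [-math.log(p) for p in token_probs]
    return sum(w * l for w, l in zip(weights, losses)) / sum(weights)


# Same pair of probabilities in both cases; only the position of the
# low-confidence token differs (value token vs. structure token).
mask = [False, True]  # first token is structure, second is a value
loss_value_error = weighted_token_loss([0.9, 0.2], mask)
loss_structure_error = weighted_token_loss([0.2, 0.9], mask)
```

With these weights a mistake on a value token is penalized more heavily than the same mistake on a structural token, which is the behavior the weighting is meant to produce.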

Also notable: contrastive CLIP-style tool selection for pre-filtering, a smarter alternative to string matching for large tool sets. Runs on tiny devices, opening the door to on-device agent routing.
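
The pre-filtering idea can be sketched as a nearest-neighbor lookup over embeddings. The example below assumes query and tool descriptions have already been embedded into a shared space by contrastively trained encoders; the toy 3-d vectors, tool names, and `prefilter_tools` helper are all made up for illustration and are not Needle's API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def prefilter_tools(query_vec, tool_vecs, k=2):
    """Return the k tool names whose embeddings are closest to the query."""
    ranked = sorted(
        tool_vecs,
        key=lambda name: cosine(query_vec, tool_vecs[name]),
        reverse=True,
    )
    return ranked[:k]


# Toy embeddings (hypothetical); real ones would come from trained encoders.
TOOL_VECS = {
    "get_weather": [0.9, 0.1, 0.0],
    "send_email": [0.0, 0.9, 0.1],
    "create_event": [0.1, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # stands in for an embedded weather question
shortlist = prefilter_tools(query, TOOL_VECS, k=2)
```

Only the shortlisted tools would then be passed to the tool-calling model, which keeps the prompt small as the tool set grows.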

For us: this validates infrastructure bifurcation (tiny specialized routers + large general executors). Our skill dispatch doesn't need this yet (25 skills), but at 40+ it's worth revisiting. Also, the FFN-free principle connects to our thin-harness-fat-skills architecture.

2. Agent Ecosystem Enters Platform Phase

ecosystem-signal platform-phase scout ×6

The clearest ecosystem signal all week: agent infrastructure has crossed from "build the platform" to "build on the platform." In a single week, 5+ derivative projects appeared for both OpenClaw and Hermes:

| Platform | Derivative | ⭐ | What |
|---|---|---|---|
| OpenClaw | OCTO (Mininglamp/明略科技) | 30 | Enterprise workplace with channel adapters |
| OpenClaw | AWD Arena | 177 | LLM attack-with-defense CTF |
| OpenClaw | zettelkasten-second-memory | 9 | Zettelkasten plugin |
| OpenClaw | afu-brain | 21 | MASL safety gate + RAG packs |
| Hermes | hermes-desktop-os | 1394 | macOS native workspace |
| Hermes | oh-my-hermes | 287 | "Oh My Zsh" analogue: 23 skills, 6 agents |
| Both | mercury-agent-skills | 103 | 130+ curated SKILL.md playbooks |

The action shifts from infrastructure to distribution. Most new repos are wrappers, configs, and curated collections, not new architectures. SKILL.md format is consolidating as the cross-agent standard. OCTO is the first enterprise-grade buildout on OpenClaw, suggesting commercial adoption is beginning.

Dropped this cycle: agent-harness-kit (🔴 SOLO 0/6), buddyme (persistent SOLO), Aegis (SOLO despite feature velocity), cangjie-skill (stalled 9 days). Star growth ≠ development: Photo-agents doubled to 733⭐ (+99%) on zero new features.

3. Three Study Insights Applied to Production Tools

applied ×3 search.sh team-lead Sources: AgentOps, Reversa, Poco-claw

Three insights from the unapplied backlog were converted to working code today — the study→apply pipeline operating at its best:

| # | Insight | Source | Applied To | Result |
|---|---|---|---|---|
| 1 | Exponential decay ranking (δ=0.17/week) | AgentOps | wiki/search.sh | Recent notes rank higher; stale notes fade |
| 2 | Confidence badges at point-of-use | Reversa | wiki/search.sh | Inline [🔬 deep-dive \| active \| ✓date] in search results |
| 3 | Single-writer file guard | Poco-claw + Hermes + Paragents | team-lead/SKILL.md | Mandatory preflight + worktree isolation + re-read gate |

Unapplied backlog: 4/7 items now applied. The pipeline works: scout → note → backlog → apply. Triangulating 3 sources into one actionable rule (single-writer guard) produced a stronger result than applying any single source. Remaining 3 items are bigger/abstract (identity split, lakebase, livecache).
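
As an illustration of the single-writer guard, the sketch below combines an exclusive lock file (the preflight) with a content-hash check before writing (the re-read gate). It is a hypothetical Python rendering, not the rule as written in team-lead/SKILL.md: the function names, the `.lock` suffix, and the `StaleReadError` type are assumptions, and worktree isolation is not modeled here.

```python
import hashlib
import os
import tempfile


class StaleReadError(RuntimeError):
    """Raised when the file changed between our read and our write."""


def read_with_digest(path):
    """Read a file and return (text, sha256 of the bytes that were read)."""
    with open(path, "rb") as fh:
        data = fh.read()
    return data.decode("utf-8"), hashlib.sha256(data).hexdigest()


def guarded_write(path, new_text, digest_at_read):
    """Write only if we hold the lock and the file is unchanged since our read."""
    lock_path = path + ".lock"
    # Preflight: O_EXCL makes creation fail (FileExistsError) if another
    # writer already holds the lock.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        # Re-read gate: refuse to overwrite changes we have not seen.
        _, current_digest = read_with_digest(path)
        if current_digest != digest_at_read:
            raise StaleReadError(path)
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(new_text)
    finally:
        os.close(fd)
        os.remove(lock_path)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "SKILL.md")
    with open(target, "w", encoding="utf-8") as fh:
        fh.write("v1\n")
    _, seen = read_with_digest(target)
    guarded_write(target, "v2\n", seen)      # file unchanged since the read
    try:
        guarded_write(target, "v3\n", seen)  # digest predates the v2 write
        stale_rejected = False
    except StaleReadError:
        stale_rejected = True
    final_text, _ = read_with_digest(target)
```

The second write is rejected because its digest was taken before the first write landed, so a writer working from a stale view has to re-read before trying again.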

Caveat on decay ranking: 82% of wiki notes lack status: frontmatter, so maturity weights are underpowered. Lesson: check data distribution before building features on it.
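
For reference, the decay ranking can be sketched as follows. This assumes δ is the rate in a continuous exponential, score = relevance · exp(−δ · age in weeks), which gives a half-life of about four weeks; the actual formula in wiki/search.sh may differ, and the note tuples below are invented. Maturity weights from status: frontmatter are left out.

```python
import math
from datetime import date

DECAY_PER_WEEK = 0.17  # the δ quoted in the table above


def decayed_score(relevance, last_updated, today):
    """Scale a relevance score down exponentially with the note's age."""
    age_weeks = max((today - last_updated).days, 0) / 7
    return relevance * math.exp(-DECAY_PER_WEEK * age_weeks)


def rank(notes, today):
    """notes: list of (title, relevance, last_updated) tuples, best first."""
    return sorted(
        notes,
        key=lambda n: decayed_score(n[1], n[2], today),
        reverse=True,
    )


today = date(2026, 5, 13)
notes = [
    ("old-but-strong", 1.0, date(2026, 2, 4)),    # 14 weeks old
    ("fresh-but-weaker", 0.6, date(2026, 5, 6)),  # 1 week old
]
ranked_titles = [title for title, _, _ in rank(notes, today)]
```

Here the week-old note outranks the 14-week-old one despite its lower raw relevance, which is the "recent notes rank higher; stale notes fade" behavior.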

4. Vertical Domain Skills: The Breakout Pattern

vertical-skills deep-read text-to-cad Sources: text-to-cad (2.5K⭐), open-slide (3.2K⭐), garden-skills (4.6K⭐)

Agent skills are expanding beyond code into specialized professional domains. The most impressive: text-to-cad, which turns natural language into parametric CAD models (STEP/STL/URDF) with manufacturing preflight integration.

| Pattern | What It Does | Applicable To Us |
|---|---|---|
| Progressive reference loading | Load minimal docs first, deepen only as needed | Our SKILL.md files load everything upfront, which is wasteful |
| Benchmark-driven validation | 10 geometric test cases verify skill quality | We have zero benchmark suites for our skills |
| Harness-as-template | Project scaffold includes test harness from day one | New skill creation should include quality checks |

Skills confirmed as the dominant distribution unit. text-to-cad runs on 5 agent runtimes. mercury-agent-skills has 130+ cross-agent SKILL.md playbooks. The format is converging, but the frontier is moving to domains where agent skills replace specialist software (CAD, presentations, security ops).
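
One way to picture progressive reference loading is a skill object that keeps a short summary in memory and fetches reference sections only when asked. The class, section names, and loader callables below are hypothetical; this is a sketch of the pattern, not how text-to-cad or our SKILL.md files implement it.

```python
class ProgressiveSkill:
    """Holds a short summary eagerly and loads reference sections on demand."""

    def __init__(self, summary, section_loaders):
        self.summary = summary
        self._loaders = section_loaders  # name -> zero-arg callable returning text
        self._cache = {}

    def context(self, needed=()):
        """Build prompt context from the summary plus the requested sections."""
        parts = [self.summary]
        for name in needed:
            if name not in self._cache:
                self._cache[name] = self._loaders[name]()  # loaded once
            parts.append(self._cache[name])
        return "\n\n".join(parts)


load_calls = []  # records which sections were actually fetched


def make_loader(name, text):
    """Stand-in for reading a reference file from disk."""
    def load():
        load_calls.append(name)
        return text
    return load


skill = ProgressiveSkill(
    "cad-skill: turns text into parametric models.",
    {
        "export": make_loader("export", "Export reference: STEP, STL, URDF."),
        "preflight": make_loader("preflight", "Manufacturing preflight checklist."),
    },
)
minimal = skill.context()                         # summary only
deeper = skill.context(needed=["export"])         # loads one section
deeper_again = skill.context(needed=["export"])   # served from cache
```

The "preflight" section is never read in this run, so its cost is only paid by tasks that need it.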

Also from this cycle: thClaws v0.9.4, with 3 releases in 24 hours (LINE bridge, ChatGPT Codex provider, SSO/OIDC). Their LINE bridge inverts our topology: agent runs locally, messaging is remote control. Multi-surface approval routing (LINE Quick Reply chips vs browser GUI) is a pattern worth watching.

5. Agent Safety: Everybody's Talking, Nobody's Building

agent-safety ecosystem-gap scout

Systematic search for agent safety/governance/trust tooling revealed the widest gap in the ecosystem:

| Category | Repos Found | Max Stars | Signal |
|---|---|---|---|
| Agent trust/reputation | 8+ | 0⭐ | Every attempt is pre-traction |
| Agent governance | 5+ | 5⭐ | Emerging but zero adoption |
| Runtime security monitoring | 2 (Adrian, ironcurtain) | 35⭐ | Nascent |
| Agent frameworks (for comparison) | Hundreds | 147K⭐ | Hyper-saturated |

The discourse-to-tooling gap in agent safety is wider than any other category. Everyone talks about safety, nobody ships safety tools. Our own safety mainline has been zero-investment for 11+ days, but we're not falling behind because nobody is ahead. Meanwhile, our AGENTS.md + FlowForge human gates are already more mature than most open-source alternatives; we just haven't productized them.

The most architecturally interesting attempt: Fides Protocol (21⭐), which uses ZKP verification on Solana for cryptographic behavior proofs. Too early to track, but the right direction. Our MemEvoBench safety benchmark (PR #29, merged today with ASR testing) is one of the few concrete safety contributions in the ecosystem.

📊 Ecosystem Pulse: Star Movers

| Project | Previous | Current | Δ | Signal |
|---|---|---|---|---|
| Needle (SAN) | 475 | 1,058 | +123% | 🔥 HN frontpage, FFN-free tool calling |
| Photo-agents | 368 | 733 | +99% | Viral on zero features (star-farm pattern) |
| GenericAgent | 7,600 | 11,200 | +47% | TUI v2, self-evolving skill tree |
| kiwifs | 420 | 423 | +1% | v0.14.0: graph analytics, canvas, web clipper |
| deepsec | 2,171 | 2,427 | +12% | Maintainer silent 6 days: launch-and-showcase? |
| addyosmani/agent-skills | 39,200 | 40,400 | +3% | Crossed 40K⭐. DDD pattern (doubt-driven dev) |
| thClaws | 871 | 879 | +1% | 3 releases in 24h: LINE, Codex, SSO |
| Statewave | 214 | 217 | +1% | Bi-temporal anchoring, 0.388→0.535 LoCoMo |

Phase: Late Growth → Early Consolidation. Primitives (memory, skills, tool calling) are settling. Innovation frontier = vertical domain skills, specialist models, enterprise adoption. Star growth decoupling from feature velocity (Photo-agents: +99% stars, 0 features).

Trends: (1) Tiny specialist models (Needle 26M) vs. general-purpose LLMs: bifurcation accelerating. (2) SKILL.md converging as universal agent format. (3) Chinese agent ecosystem branching (buddyme, weclaws targeting domestic Chinese LLMs). (4) Enterprise building on open agent infra (OCTO/Mininglamp).

🔬 Bonus Deep Read: Statewave's Memory Ranking Lessons

PR #71 lifted the LoCoMo benchmark score 0.388 → 0.535 (beating Mem0's 0.382) with four changes:

| Change | Principle |
|---|---|
| valid_from from event time, not POST time | "Memories know when they were true" |
| Date grounding in compiler prompt | Resolve relative time phrases against source timestamp |
| Granular detail extraction | "30 concrete memories > 5 vague ones" |
| Embedding backfill on async path | One-line bug silently disabled semantic search for all async callers |

Granularity principle: Our MEMORY.md curation tends toward summaries. "Discovered X uses Y pattern" is more retrievable than "studied X." Also: silent degradation (async embedding bug) is the scariest failure mode: everything works, just badly. No errors, just empty results falling through to weaker retrieval.
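
The first two changes (event-time valid_from and date grounding) can be illustrated with a small resolver that anchors relative phrases to the source timestamp rather than the ingestion time. Statewave does this grounding in its compiler prompt; the rule-based function, phrase table, and example sentence below are hypothetical stand-ins that only show the principle.

```python
import re
from datetime import datetime, timedelta

# Hypothetical phrase table; a real extractor would cover far more cases.
RELATIVE_OFFSETS = {
    "today": timedelta(days=0),
    "yesterday": timedelta(days=1),
    "last week": timedelta(weeks=1),
}


def ground_valid_from(text, source_timestamp):
    """Return the event date implied by `text`, relative to when it was said.

    Falls back to the source timestamp's date when no known phrase appears.
    """
    lowered = text.lower()
    for phrase, offset in RELATIVE_OFFSETS.items():
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            return (source_timestamp - offset).date()
    return source_timestamp.date()


said_at = datetime(2026, 3, 2, 9, 30)      # when the message was written
ingested_at = datetime(2026, 5, 13, 8, 0)  # when it was POSTed to the store

valid_from = ground_valid_from("I moved to Lisbon yesterday", said_at)
naive_valid_from = ingested_at.date()      # what POST-time stamping would record
```

Stamping the memory with `naive_valid_from` would place the move more than two months late, while grounding against `said_at` recovers the date the statement was actually true.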