Wednesday: 30 wiki notes created/updated · 20 study sessions · 3 applied lessons · 8 deep reads
Today's biggest discovery: a 26M-parameter model achieves SOTA function calling by dropping the FFN layers entirely to form a "Simple Attention Network" (SAN). The insight: tool calling is retrieval and assembly, not feature transformation.
| Architecture | Params | Berkeley FC Score | Key Design |
|---|---|---|---|
| Needle (SAN) | 26M | SOTA | Attention-only, no FFN, gated residuals |
| ToolACE | 600M | Below Needle | Full transformer |
| Gorilla-OpenFunctions | 270M | Below Needle | Fine-tuned LLaMA |
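Attention on its own can act as a lookup. Each output is a softmax-weighted mix of value vectors drawn from the context, which is the "retrieval and assembly" half of the insight. The following is a toy, stdlib-only sketch of one attention-only block with a gated residual and no FFN sublayer. It is not Needle's code. The scalar sigmoid gate (`x + sigmoid(g) * attn(x)`), the identity projections in the usage line, and the absence of normalization and multiple heads are all simplifying assumptions.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(w, v):
    return [dot(row, v) for row in w]

def attention(xs, wq, wk, wv):
    """Single-head scaled dot-product self-attention over a list of token vectors."""
    qs = [matvec(wq, x) for x in xs]
    ks = [matvec(wk, x) for x in xs]
    vs = [matvec(wv, x) for x in xs]
    scale = math.sqrt(len(ks[0]))
    out = []
    for q in qs:
        weights = softmax([dot(q, k) / scale for k in ks])
        # Each output is a convex combination of value vectors: a soft retrieval.
        out.append([sum(w * v[j] for w, v in zip(weights, vs)) for j in range(len(vs[0]))])
    return out

def san_block(xs, wq, wk, wv, gate_logit):
    """Attention-only block: gated residual, deliberately no FFN sublayer.

    Assumes wv maps back to the input width so the residual add lines up.
    """
    g = 1.0 / (1.0 + math.exp(-gate_logit))  # sigmoid gate in (0, 1)
    attn = attention(xs, wq, wk, wv)
    return [[x + g * a for x, a in zip(x_row, a_row)] for x_row, a_row in zip(xs, attn)]

identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(san_block(tokens, identity, identity, identity, gate_logit=0.0))
```

A strongly negative gate logit makes the block pass its input through almost unchanged. That property is one plausible reason gated residuals help a stack without FFNs stay trainable, though this is a hypothesis rather than something the source states.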
Also notable: contrastive, CLIP-style tool selection for pre-filtering, a smarter alternative to string matching for large tool sets. It runs on tiny devices, opening the door to on-device agent routing.
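CLIP-style selection trains a query encoder and a tool-description encoder so that matching (query, tool) pairs land close together in a shared embedding space. At inference time the pre-filter reduces to a cosine-similarity top-k. The sketch covers both halves under stated assumptions. The symmetric InfoNCE loss is the standard CLIP objective, applied here to (query, tool) pairs. The temperature of 0.07 is an assumed default. The three-dimensional vectors and tool names are hypothetical stand-ins for trained encoder outputs.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def log_softmax(row):
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def clip_loss(query_embs, tool_embs, temperature=0.07):
    """Symmetric InfoNCE: query i should match tool i and no other tool in the batch."""
    n = len(query_embs)
    logits = [[cosine(q, t) / temperature for t in tool_embs] for q in query_embs]
    q_to_t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    t_to_q = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return (q_to_t + t_to_q) / 2

def prefilter(query_emb, tools, k=2):
    """Return the k tool names whose embeddings are most similar to the query."""
    ranked = sorted(tools.items(), key=lambda kv: cosine(query_emb, kv[1]), reverse=True)
    return [name for name, _ in ranked[:k]]

# Hypothetical tool embeddings; a real system would produce these with a trained encoder.
tools = {
    "get_weather": [0.9, 0.1, 0.0],
    "send_email": [0.0, 0.95, 0.1],
    "create_event": [0.1, 0.2, 0.9],
}
print(prefilter([0.8, 0.2, 0.1], tools, k=2))  # ['get_weather', 'create_event']
```

Only the shortlist returned by `prefilter` would then be passed to the executor model. That keeps the prompt small as the tool set grows.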
For us: this validates infrastructure bifurcation (tiny specialized routers + large general executors). Our skill dispatch doesn't need this yet (25 skills), but it's worth revisiting at 40+. The FFN-free principle also connects to our thin-harness-fat-skills architecture.
The clearest ecosystem signal all week: agent infrastructure has crossed from "build the platform" to "build on the platform." In a single week, 5+ derivative projects built on OpenClaw and Hermes appeared:
| Platform | Derivative | ⭐ | What |
|---|---|---|---|
| OpenClaw | OCTO (Mininglamp Technology) | 30 | Enterprise workplace with channel adapters |
| OpenClaw | AWD Arena | 177 | LLM attack-with-defense CTF |
| OpenClaw | zettelkasten-second-memory | 9 | Zettelkasten plugin |
| OpenClaw | afu-brain | 21 | MASL safety gate + RAG packs |
| Hermes | hermes-desktop-os1 | 394 | macOS native workspace |
| Hermes | oh-my-hermes | 287 | "Oh My Zsh" for Hermes: 23 skills, 6 agents |
| Both | mercury-agent-skills | 103 | 130+ curated SKILL.md playbooks |
Dropped this cycle: agent-harness-kit (🔴 SOLO 0/6), buddyme (persistent SOLO), Aegis (SOLO despite feature velocity), cangjie-skill (stalled 9 days). Star growth ≠ development: Photo-agents doubled to 733⭐ (+99%) on zero new features.
Three insights from the unapplied backlog were converted to working code today, with the study→apply pipeline operating at its best:
| # | Insight | Source | Applied To | Result |
|---|---|---|---|---|
| 1 | Exponential decay ranking (δ = 0.17/week) | AgentOps | wiki/search.sh | Recent notes rank higher; stale notes fade |
| 2 | Confidence badges at point of use | Reversa | wiki/search.sh | Inline `[🔬 deep-dive \| active \| date]` badges in search results |
| 3 | Single-writer file guard | Poco-claw + Hermes + Paragents | team-lead/SKILL.md | Mandatory preflight + worktree isolation + re-read gate |
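Rows 1 and 2 boil down to a scoring formula, score = relevance × exp(−δ × age in weeks) × maturity weight, plus a formatted badge string. With δ = 0.17/week, a note's score halves roughly every ln 2 / 0.17 ≈ 4.1 weeks. The Python sketch below is an illustration, not the actual `wiki/search.sh` code. The `MATURITY_WEIGHTS` values, the icon mapping, and the "no status" fallback label are hypothetical.

```python
import math
from datetime import date

DECAY_PER_WEEK = 0.17  # δ from the applied-lessons table

# Hypothetical maturity weights keyed on the `status:` frontmatter field.
# Notes with no status get a neutral 1.0, so sparse frontmatter weakens this signal.
MATURITY_WEIGHTS = {"active": 1.0, "archived": 0.5}
DEPTH_ICONS = {"deep-dive": "🔬"}  # hypothetical icon mapping

def decayed_score(relevance, updated, today, status=None):
    """Rank score = relevance * exp(-δ * age_in_weeks) * maturity weight."""
    age_weeks = max((today - updated).days, 0) / 7
    weight = MATURITY_WEIGHTS.get(status, 1.0)
    return relevance * math.exp(-DECAY_PER_WEEK * age_weeks) * weight

def badge(depth, status, updated):
    """Point-of-use confidence badge appended to a search hit."""
    icon = DEPTH_ICONS.get(depth)
    label = f"{icon} {depth}" if icon else depth
    return f"[{label} | {status or 'no status'} | {updated.isoformat()}]"

today = date(2024, 1, 31)
print(decayed_score(1.0, date(2024, 1, 3), today))  # 4 weeks old: about 0.51
print(badge("deep-dive", "active", date(2024, 1, 3)))
```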
Caveat on decay ranking: 82% of wiki notes lack a `status:` frontmatter field, so the maturity weights are underpowered. Lesson: check the data distribution before building features on it.
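The single-writer guard from row 3 combines two checks. A preflight refuses to write unless this agent holds an exclusive lock. A re-read gate then rejects the write if the file changed after the agent last read it. This is a minimal sketch assuming cooperative writers on one filesystem. The `.lock` sidecar convention and the `guarded_write` helper are hypothetical, and worktree isolation (a separate git checkout per agent) is not shown. Creating the lock with `O_CREAT | O_EXCL` makes acquiring it atomic.

```python
import hashlib
import os

class WriteConflict(Exception):
    pass

def digest(path):
    """Content hash taken when the agent reads the file."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def guarded_write(path, new_text, expected_digest, owner):
    """Write only if we hold the lock and the file is unchanged since we read it."""
    lock_path = path + ".lock"  # hypothetical lock-file convention
    try:
        # O_EXCL makes creation fail if another writer already holds the lock.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise WriteConflict(f"{path} is locked by another writer") from None
    try:
        os.write(fd, owner.encode())  # record who holds the lock, for debugging
        # Re-read gate: abort if someone modified the file after our read.
        if digest(path) != expected_digest:
            raise WriteConflict(f"{path} changed since it was read; re-read first")
        with open(path, "w", encoding="utf-8") as f:
            f.write(new_text)
    finally:
        os.close(fd)
        os.remove(lock_path)
```

In use, an agent calls `digest()` when it reads a file and passes that value back to `guarded_write()`. A stale digest forces a fresh read instead of silently overwriting another agent's work.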
Agent skills are expanding beyond code into specialized professional domains. The most impressive: text-to-cad, which turns natural language into parametric CAD models (STEP/STL/URDF) with manufacturing preflight integration.
| Pattern | What It Does | Applicable To Us |
|---|---|---|
| Progressive reference loading | Load minimal docs first, deepen only as needed | Our SKILL.md files load everything upfront, which is wasteful |
| Benchmark-driven validation | 10 geometric test cases verify skill quality | We have zero benchmark suites for our skills |
| Harness-as-template | Project scaffold includes test harness from day one | New skill creation should include quality checks |
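Progressive reference loading can be modeled as tiers. A short summary is always loaded, and deeper reference tiers are appended one at a time only while the agent still needs more context. The sketch assumes a hypothetical `Skill` structure and an `is_sufficient` callback supplied by the caller. It does not reflect text-to-cad's actual file layout.

```python
from dataclasses import dataclass, field

@dataclass
class Skill:
    name: str
    summary: str  # tier 0: always loaded
    references: list[list[str]] = field(default_factory=list)  # tiers 1..n, loaded on demand

def load_context(skill, depth):
    """Return the summary plus reference tiers up to `depth` (0 = summary only)."""
    parts = [skill.summary]
    for tier in skill.references[:depth]:
        parts.extend(tier)
    return "\n\n".join(parts)

def load_until(skill, is_sufficient, max_depth=None):
    """Deepen one tier at a time until `is_sufficient(context)` says stop."""
    limit = len(skill.references)
    if max_depth is not None:
        limit = min(max_depth, limit)
    for depth in range(limit + 1):
        context = load_context(skill, depth)
        if is_sufficient(context):
            return depth, context
    return limit, context  # ran out of tiers; return the deepest context

# Hypothetical skill with two reference tiers.
skill = Skill("example-cad-skill", "Generate parametric parts from a text spec.",
              [["Export formats: STEP, STL."], ["Manufacturing preflight checklist."]])
depth, context = load_until(skill, lambda ctx: "STEP" in ctx)
print(depth)
```

The design choice is to pay for deep references only when a task needs them. An upfront load spends those tokens on every invocation.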
Also from this cycle: thClaws v0.9.4, with 3 releases in 24 hours (LINE bridge, ChatGPT Codex provider, SSO/OIDC). Their LINE bridge inverts our topology: the agent runs locally, and messaging acts as the remote control. Multi-surface approval routing (LINE Quick Reply chips vs. browser GUI) is a pattern worth watching.
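The core of multi-surface approval routing is that one pending approval is shown on several surfaces at once, and the first decision from any surface wins. The sketch captures only that resolution rule. The surface names, the `ApprovalRequest` class, and the approve/deny vocabulary are hypothetical. No LINE or browser rendering is modeled, and this is not thClaws's implementation.

```python
import threading
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ApprovalRequest:
    request_id: str
    action: str
    surfaces: tuple = ("line", "browser")  # hypothetical surface names
    decision: Optional[str] = None
    decided_by: Optional[str] = None
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False, compare=False)

    def resolve(self, surface, decision):
        """Record the first decision from any registered surface; later ones are ignored."""
        if surface not in self.surfaces:
            raise ValueError(f"unknown surface: {surface}")
        if decision not in ("approve", "deny"):
            raise ValueError(f"unknown decision: {decision}")
        with self._lock:  # callbacks from different surfaces may race
            if self.decision is None:
                self.decision, self.decided_by = decision, surface
                return True
            return False

req = ApprovalRequest("req-1", "deploy")
req.resolve("line", "approve")   # first tap wins
req.resolve("browser", "deny")   # arrives late and is ignored
print(req.decision, req.decided_by)
```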
A systematic search for agent safety/governance/trust tooling revealed the widest gap in the ecosystem:
| Category | Repos Found | Max Stars | Signal |
|---|---|---|---|
| Agent trust/reputation | 8+ | 0⭐ | Every attempt is pre-traction |
| Agent governance | 5+ | 5⭐ | Emerging but zero adoption |
| Runtime security monitoring | 2 (Adrian, ironcurtain) | 35⭐ | Nascent |
| Agent frameworks (for comparison) | Hundreds | 147K⭐ | Hyper-saturated |
The most architecturally interesting attempt: Fides Protocol (21⭐), which uses ZKP verification on Solana for cryptographic behavior proofs. It is too early to track, but it points in the right direction. Our MemEvoBench safety benchmark (PR #29, merged today with attack-success-rate (ASR) testing) is one of the few concrete safety contributions in the ecosystem.
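Attack success rate (ASR) is the fraction of adversarial attempts in which the attack achieved its goal, so lower is safer. The sketch computes it overall and per attack category. The record shape (`category`, `attack_succeeded`) and the category names are hypothetical and are not MemEvoBench's actual schema.

```python
from collections import defaultdict

def attack_success_rate(results):
    """ASR = successful attacks / total attack attempts."""
    if not results:
        raise ValueError("no attack attempts recorded")
    return sum(1 for r in results if r["attack_succeeded"]) / len(results)

def asr_by_category(results):
    """Per-category ASR, which shows which attack families get through."""
    buckets = defaultdict(list)
    for r in results:
        buckets[r["category"]].append(r)
    return {cat: attack_success_rate(rs) for cat, rs in buckets.items()}

# Hypothetical benchmark records.
results = [
    {"category": "injection", "attack_succeeded": True},
    {"category": "injection", "attack_succeeded": False},
    {"category": "poisoning", "attack_succeeded": False},
    {"category": "poisoning", "attack_succeeded": False},
]
print(attack_success_rate(results))  # 0.25
print(asr_by_category(results))      # {'injection': 0.5, 'poisoning': 0.0}
```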
| Project | Previous ⭐ | Current ⭐ | Δ | Signal |
|---|---|---|---|---|
| Needle (SAN) | 475 | 1,058 | +123% | 🔥 HN frontpage, FFN-free tool calling |
| Photo-agents | 368 | 733 | +99% | Viral on zero features (star-farm pattern) |
| GenericAgent | 7,600 | 11,200 | +47% | TUI v2, self-evolving skill tree |
| kiwifs | 420 | 423 | +1% | v0.14.0: graph analytics, canvas, web clipper |
| deepsec | 2,171 | 2,427 | +12% | Maintainer silent 6 days: launch-and-showcase? |
| addyosmani/agent-skills | 39,200 | 40,400 | +3% | Crossed 40K⭐. DDD pattern (doubt-driven dev) |
| thClaws | 871 | 879 | +1% | 3 releases in 24h: LINE, Codex, SSO |
| Statewave | 214 | 217 | +1% | Bi-temporal anchoring, 0.388→0.535 LoCoMo |
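The Δ column is plain percent change, (current − previous) / previous × 100. The star-farm signal is high star growth with nothing shipped. The sketch recomputes Δ from the table and flags that pattern. The 50% threshold is an assumption, and so is the `shipped` flag, which is read from the Signal column. Rows whose Signal doesn't state whether features shipped are left out rather than guessed.

```python
def pct_change(previous, current):
    """Percent change between two star counts."""
    return (current - previous) / previous * 100

def flag_decoupling(rows, growth_threshold=50.0):
    """Projects whose star growth meets the threshold with no new features shipped."""
    return [name for name, prev, cur, shipped in rows
            if pct_change(prev, cur) >= growth_threshold and not shipped]

# (project, previous stars, current stars, shipped features this cycle?)
TRACKED = [
    ("Photo-agents", 368, 733, False),
    ("GenericAgent", 7_600, 11_200, True),
    ("kiwifs", 420, 423, True),
    ("thClaws", 871, 879, True),
]
print(round(pct_change(475, 1_058)))  # Needle: 123
print(flag_decoupling(TRACKED))       # ['Photo-agents']
```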
Phase: Late Growth → Early Consolidation. Primitives (memory, skills, tool calling) are settling. The innovation frontier is vertical domain skills, specialist models, and enterprise adoption. Star growth is decoupling from feature velocity (Photo-agents: +99% stars, 0 features).
Trends: (1) Tiny specialist models (Needle, 26M) vs. general-purpose LLMs: the bifurcation is accelerating. (2) SKILL.md is converging as the universal agent format. (3) The Chinese agent ecosystem is branching (buddyme, weclaws targeting domestic Chinese LLMs). (4) Enterprises are building on open agent infra (OCTO/Mininglamp).
Statewave's PR #71 lifted its LoCoMo score from 0.388 to 0.535 (beating Mem0's 0.382) with four changes:
| Change | Principle |
|---|---|
| `valid_from` from event time, not POST time | "Memories know when they were true" |
| Date grounding in compiler prompt | Resolve relative time phrases against source timestamp |
| Granular detail extraction | "30 concrete memories > 5 vague ones" |
| Embedding backfill on async path | One-line bug silently disabled semantic search for all async callers |
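The first two changes both separate *when a fact became true* from *when the system heard about it*. Relative phrases are resolved against the source message's timestamp, not against ingest time. Statewave does its date grounding inside the compiler prompt. The sketch substitutes a small deterministic resolver to show the principle, and its `Memory` record, phrase list, and helper names are hypothetical. Unknown phrases are left ungrounded instead of guessed, and the memory then falls back to the source timestamp, never the POST time.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

@dataclass
class Memory:
    text: str
    valid_from: datetime   # event time: when the fact became true
    recorded_at: datetime  # ingest (POST) time: when the system stored it

_FIXED = {"today": 0, "yesterday": 1}

def ground_date(phrase: str, source_time: datetime) -> Optional[datetime]:
    """Resolve a relative time phrase against the source timestamp, not wall-clock now."""
    phrase = phrase.strip().lower()
    if phrase in _FIXED:
        return source_time - timedelta(days=_FIXED[phrase])
    if phrase == "last week":
        return source_time - timedelta(weeks=1)
    m = re.fullmatch(r"(\d+) (day|week)s? ago", phrase)
    if m:
        n, unit = int(m.group(1)), m.group(2)
        return source_time - timedelta(days=n * (7 if unit == "week" else 1))
    return None  # unknown phrase: leave ungrounded rather than guess

def make_memory(text, when_phrase, source_time, recorded_at):
    grounded = ground_date(when_phrase, source_time)
    event_time = grounded if grounded is not None else source_time
    return Memory(text, valid_from=event_time, recorded_at=recorded_at)

# Message sent March 10 but ingested March 15: "yesterday" means March 9.
sent = datetime(2024, 3, 10, tzinfo=timezone.utc)
ingested = datetime(2024, 3, 15, tzinfo=timezone.utc)
print(make_memory("Moved to Berlin", "yesterday", sent, ingested).valid_from.date())  # 2024-03-09
```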