Changelog
All notable changes to cubby.network are documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
Pre-1.0: breaking changes may land in minor-version bumps (0.MAJOR.MINOR). At 1.0 the API surface freezes and subsequent breaks require a major bump.
[Unreleased]
Changed
- GitHub org renamed `aethon-network` → `cubby-network`. Brings the URL slug into line with the project name (cubby.network) that the v0.2.0-beta.1 brand cutover settled on. All repo and GHCR URLs that previously read `github.com/aethon-network/platform` / `ghcr.io/aethon-network/platform` are now `cubby-network/platform`. New clones / docker pulls should use the new URL.
Security
- `/autonomy/incident-loop` `auto_execute=True` now requires `change-approver` in addition to `network-operator`. Triage-only calls (`auto_execute=False`) are unchanged. Closes a defence-in-depth gap where a leaked plain-operator token could trigger an autonomous remediation.
- Adversarial audit pass: 22 findings closed across the safety spine, API surface, evidence chain, container/K8s hardening, and test coverage. Highlights: `LlmDrivenChangeWorkflow.execute` is now gated behind an explicit unsafe opt-in; legacy approval fallback fails closed; OIDC stale-JWKS capped at TTL × 5; JSON depth-bomb defence at the body-cap middleware; nested-injection scan; signed-approval expiry option; chain-tip orphan detection; production K8s manifest now fail-closed (PVC + digest required); `injection_success_rate` metric renamed to `injection_attempt_rate` with new `injection_landed_rate`.
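
As an illustration of the JSON depth-bomb defence mentioned above, the following is a minimal sketch of one way to bound nesting depth before a body reaches the JSON parser. The function names and the limit of 32 are assumptions made for this example; the changelog does not state how the project's middleware is implemented.

```python
def max_json_depth(raw: str) -> int:
    """Return the deepest object/array nesting in a JSON text without parsing it."""
    depth = 0
    deepest = 0
    in_string = False
    escaped = False
    for ch in raw:
        if in_string:
            # Brackets inside string literals must not count toward depth.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch in "[{":
            depth += 1
            deepest = max(deepest, depth)
        elif ch in "]}":
            depth -= 1
    return deepest


def reject_if_too_deep(raw: str, limit: int = 32) -> None:
    """Raise ValueError when nesting exceeds the (hypothetical) limit."""
    if max_json_depth(raw) > limit:
        raise ValueError("JSON body exceeds maximum nesting depth")
```

Scanning the raw text keeps the check linear in body size and avoids handing a deeply nested document to a recursive parser.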
[0.2.0-beta.1] — 2026-04-25
First external beta. Four adversarial audit cycles closed (35+ findings across three LLM audits and one human audit). CI is strict and blocking. Operator onboarding, rollback runbooks, and the cubby.network site shipped. Brand cutover from the legacy "Aethon" name complete.
Security
- Plan-hash binding. `canonical_plan_payload` now mixes in `intent_type` + `workflow_pack_version`, so captured approvals can't be replayed onto a semantically different change.
- Rejector veto removed. Single rejector can no longer block a legitimate change; rejections are group-scoped with a quorum symmetric to approvals.
- Rewrite policy escalation. Injection-sanitise path now enforces `rewritten_keys ⊆ original_keys`. Keeps malicious rewrite policies from smuggling new privileged fields.
- Unicode injection bypass. NFKD-normalise + `Cf`-to-space replacement catches zero-width-space and fullwidth-glyph evasions.
- Snapshot credential leak. Secret scanner extended with 9 HIGH-severity network-device patterns (SNMP community, BGP password, enable secret, TACACS/RADIUS key, IKE PSK, JunOS auth-key, Cisco type-7, key-string).
- OIDC algorithm confusion. Validator enforces an explicit asymmetric-only allowlist (RS256/RS384/RS512, ES256/ES384, PS256). `alg=none` and HS* rejected outright on the OIDC path.
- OIDC JWKS spoofing. 15-minute TTL on cached JWKS, HTTPS-only scheme, falls back to cache on transient fetch failure.
- Workflow race. Per-intent `asyncio.Lock` + new `get_or_create` helper prevent concurrent callers from driving the same workflow twice.
- Signer key_id enumeration. User-facing "failed signer verification" message is now generic; detail stays in the log.
- Metadata DoS. 64 KB body cap on both declared `Content-Length` and streamed requests (chunked transfer can't bypass).
- Runtime rewrite enforcement. `_execute_tool_call` now honours the `SafetyGate.review()` verdict and executes rewritten args. Previously `authorize()` silently discarded them, making the injection-sanitise path dead code.
- Anthropic system-prompt boundary. `agent.system_prompt` goes through the real `system` parameter; tool output returns as `tool_result` content blocks, not stuffed inside a user-role JSON blob.
- Production fails fast on simulated adapters. `build_demo_harness` derives `allow_simulated` from `NETOPS_ENV`; a production boot refuses to register any simulated plugin.
- Web-research injection scan. Every hit's title + snippet is run through the same scanner that guards persistent memory before persisting to the wiki store. Poisoned hits are dropped and surfaced as `refused_hits`.
- Route RBAC. `/runbooks/evaluate` and `/events/webhook` require the `network-operator` role. `make_role_dependency` factory added for future gates.
- Probe info disclosure. `/livez` stays unauth and minimal; `/health` and `/readyz?detail=1` expose plugin / signer / backend state only to authenticated callers.
- Shared-secret CAB. Boot warning is ERROR-logged + printed to stderr; production boot raises `RuntimeError` unless `NETOPS_CAB_ACKNOWLEDGE_SHARED_SECRET=1` is set.
- Autonomy privilege laundering fixed. `/autonomy/incident-loop` now propagates the caller's identity and roles into the cascaded drift remediation (previously forged `system:autonomy` / `team-lead`). Cascade path escalates risk to HIGH to force CAB sign-off.
- Policy fail-closed. Triage, drift, and capacity workflows check `policy_decision.allowed` and return `FAILED` before any side-effectful work (ticketing, evidence writes, remediation cascade).
- Verify contract unified across SDK, router, simulator, real adapter, and engine. Previously dropped credentials; verification was silently unauthenticated.
- Evidence scan-before-write. Secret scanner runs against the in-memory payload first; on a HIGH finding, nothing touches disk.
- OAuth refresh HTTPS-only. `NETOPS_CODEX_TOKEN_URL` must be `https://`; rejected at construction.
- Artifact path traversal. `LocalFsArtifactStore._path` resolves the candidate and uses `Path.relative_to` for containment. Catches symlink-escape that textual `..` checks miss.
- `.env.example` ships empty `NETOPS_EVIDENCE_LEGACY_KEY_IDS` + `NETOPS_EVIDENCE_CHAIN_RESET_BUNDLE_IDS` defaults instead of non-empty values that normalised bypass.
- CI security job is blocking. `pip-audit . --strict` + `bandit -ll` against the real dep graph, with no `|| true` swallows.
- Docker compose dev-only posture. Every service binds to `127.0.0.1`; stock credentials replaced with `${*:-CHANGE_ME_*}` placeholders. README warns that compose is not a production baseline.
- Per-principal rate limiting on the four expensive routes. LLM bucket (intent/compile, autonomy/incident-loop, architect/) burst=10 / 10-per-min; evidence-write bucket (digital-twin//refresh) burst=30 / 1-per-2s; search bucket (knowledge/similar) burst=60 / 1-per-second. 429 with `Retry-After: 60` on exhaustion.
- Drift risk ladder corrected. Manual review = LOW, auto-remediate = MEDIUM, autonomy cascade = HIGH. Earlier code had auto-remediate at a lower approval threshold than manual review, the opposite of what risk classification is for.
- Drift fail-close is top-level. Policy denial closes the workflow with `FAILED` regardless of whether auto-remediate is happening. Earlier the fail-close only ran inside the auto-remediate branch.
- Autonomy survives denied triage. The incident loop no longer `KeyError`s on missing artifacts when triage was policy-denied; it returns a clean `FAILED` envelope with the reasons.
- `action_taken` reflects real execution. Was set on whether drift existed; now keys on `remediation_result.workflow_state == CLOSED`.
- MCP host + scaffold verify contract unified to `verify(plan, snapshot, creds)`. Anyone driving the platform via MCP or generating an adapter from scaffold no longer hits a broken signature.
- Production CAB escape hatch removed. `NETOPS_CAB_ACKNOWLEDGE_SHARED_SECRET=1` no longer bypasses the production refusal. An open-source product can't ship a crypto-separation opt-out.
- K8s single-writer posture. `replicas: 1` + `Recreate` rollout for both the API and worker. Header comment spells out the distributed-lock + shared-storage prerequisites for horizontal scale.
- K8s pod hardening. `runAsNonRoot` + `runAsUser: 10001` + `seccompProfile: RuntimeDefault` + `automountServiceAccountToken: false`. Per-container `allowPrivilegeEscalation: false`, `readOnlyRootFilesystem: true`, dropped ALL capabilities. Default-deny `NetworkPolicy`.
- Supply-chain hardening. Every GitHub Action SHA-pinned (not tag-pinned). Dockerfile base image (`python:3.11-slim-bookworm`) digest-pinned. CI security job is blocking; PyPI publishing is gated on a repo variable and uses trusted publishing (no long-lived `PYPI_API_TOKEN`).
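
To make the "Unicode injection bypass" entry above concrete, here is a minimal sketch of NFKD normalisation followed by `Cf`-to-space replacement, using only the standard-library `unicodedata` module. The function name `normalise_for_scan` is hypothetical, and the project's real scanner may differ in detail.

```python
import unicodedata


def normalise_for_scan(text: str) -> str:
    """Fold compatibility glyphs and neutralise invisible format characters."""
    # NFKD maps compatibility forms (e.g. fullwidth Latin letters) to their
    # ordinary equivalents.
    decomposed = unicodedata.normalize("NFKD", text)
    # Characters in general category Cf (format), such as U+200B ZERO WIDTH
    # SPACE, are replaced with a plain space so they become visible separators.
    return "".join(
        " " if unicodedata.category(ch) == "Cf" else ch for ch in decomposed
    )
```

Running the scanner on the normalised text means a pattern written against plain ASCII also sees input that was disguised with fullwidth glyphs or zero-width characters.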
Added
- The cubby.network site (https://cubby.network). Hand-written landing page (hero → install → why → what → how → guarantees → docs grid). Static doc rendering from canonical Markdown. Auto-deployed via Cloudflare Pages on every push to `main` that touches `website/` or the markdown sources.
- Three-tier install path. Docker (`ghcr.io/cubby-network/platform`), pipx (`pipx install cubby-network` once PyPI is claimed), source. Multi-arch (amd64 + arm64) image with SBOM and provenance attestations.
- Release workflow. Tag push → CI matrix + security + contract gate → wheel + sdist + multi-arch Docker → GitHub Release with the CHANGELOG as notes.
- Auto-deploy workflow. Pushes to `main` that touch `website/` or the markdown docs rebuild and republish the site.
- `docs/OPERATOR_GUIDE.md`: env-var matrix, demo-vs-production posture, real-adapter wiring, per-approver Ed25519 upgrade path, auth upgrade, secrets custody, "ready for a second human" checklist.
- `docs/ROLLBACK.md`: how to recover when a change leaves the network in a bad state. Covers self-rollback, stuck-workflow recovery, false-success triage, evidence-chain recovery.
- `docs/RELEASE.md`: single runbook for cutting a release. SemVer, CHANGELOG, tagging, hotfix flow.
- `docs/GLOSSARY.md`: every term cubby.network uses differently than the industry baseline.
- `docs/adr/`: Architecture Decision Records folder. Three initial ADRs: plan-hash binding, wiki-over-RAG, SafetyGate three-verdict contract.
- `QUICKSTART.md`: 30-minute path from clone to first signed change against a real Nokia SR Linux lab. Verified clean from a fresh venv.
- `CONTRIBUTING.md`: branching model (GitHub Flow + SemVer + pre-release chain) + DCO sign-off requirement.
- `CHANGELOG.md`: Keep-a-Changelog format. This file.
- `.github/CODEOWNERS`: auto-requests reviews on safety-critical files.
- `.github/ISSUE_TEMPLATE/tester-friction.yml`: structured form for friction reports.
- `cubby smoke` reports selected runtime, simulated-vs-real device mix, signer state, and a readiness verdict.
- `cubby config`: renders the resolved `RuntimeConfig` with per-field set-vs-default state; redacts sensitive values.
- `api_max_body_bytes` runtime config knob (default 64 KB).
- `cab_acknowledge_shared_secret` runtime config knob for non-production deployments that accept the shared-secret limitation.
- `ClaudeAgentRuntime` has 7 unit tests pinning the message shape (system channel, `tool_result` blocks, `is_error` flagging, convergence bound) plus a live-API contract test gated on `NETOPS_LLM=1`.
- Richer demo network fixture. Two access switches, two-peer distribution layer (vPC peer-link), edge firewall. 19 interfaces, six services, six telemetry series, three alerts, five incidents.
- Live Nokia SR Linux devicelab validated end-to-end (snapshot, CAB-signed change, rollback-on-verification-failure, evidence chain verify).
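
The `api_max_body_bytes` knob listed above caps request bodies at 64 KB by default. The sketch below shows one way such a cap can cover both a declared `Content-Length` and a streamed body. `read_capped_body`, `BodyTooLarge`, and the reading of 64 KB as 64 × 1024 bytes are assumptions for illustration, not the project's actual middleware.

```python
from typing import Iterable, Optional

# Assumption: "64 KB" is taken to mean 64 * 1024 bytes.
API_MAX_BODY_BYTES = 64 * 1024


class BodyTooLarge(Exception):
    """Raised when a request body exceeds the configured cap."""


def read_capped_body(
    chunks: Iterable[bytes],
    declared_length: Optional[int] = None,
    limit: int = API_MAX_BODY_BYTES,
) -> bytes:
    # Reject early when the client declares an oversized body.
    if declared_length is not None and declared_length > limit:
        raise BodyTooLarge(f"declared {declared_length} bytes > {limit}")
    received = bytearray()
    for chunk in chunks:
        received.extend(chunk)
        # Count what actually arrives, so a chunked transfer without a
        # Content-Length header cannot slip past the cap.
        if len(received) > limit:
            raise BodyTooLarge(f"streamed body exceeded {limit} bytes")
    return bytes(received)
```

Checking the running total inside the loop stops reading as soon as the limit is crossed, rather than buffering the whole body first.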
Changed
- Brand cutover from Aethon → cubby.network. Product name in prose, `cubby-network` package slug, `cubby` CLI, `Cubby` as standalone persona name. The `NETOPS_*` env-var prefix is intentionally kept (descriptive, not branded). At this release the GitHub org was still `aethon-network`; the org slug was renamed to `cubby-network` post-release (see [Unreleased]).
- `GET /digital-twin/{name}` → `POST /digital-twin/{name}/refresh`. The route writes signed evidence; GET-with-side-effects was REST-wrong and a CSRF risk. Breaking change for any caller of the old GET path.
- Agent runtime resolution order: `ANTHROPIC_API_KEY` > `OPENAI_API_KEY` > Codex CLI OAuth > mock. Claude Opus 4.7 is the default model.
- Evidence scanner now catches network-device credentials (SNMP community, BGP MD5, enable secret, TACACS/RADIUS key, IKE PSK, JunOS authentication-key, Cisco type-7, key-string).
- Ruff ignore list curated for a Python 3.9+ codebase; format-string rules are deliberately off.
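
The agent runtime resolution order listed above can be read as a simple first-match precedence. The sketch below illustrates that reading; the function name, the return labels, and the `codex_oauth_available` flag are hypothetical and stand in for whatever detection the platform actually performs.

```python
from typing import Mapping


def resolve_runtime(
    env: Mapping[str, str], codex_oauth_available: bool = False
) -> str:
    """Pick a runtime label using the documented precedence."""
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    # Assumption: Codex CLI OAuth availability is detected elsewhere and
    # passed in as a boolean.
    if codex_oauth_available:
        return "codex-cli"
    return "mock"
```

For example, `resolve_runtime({"OPENAI_API_KEY": "sk-test"})` returns `"openai"`, while an empty environment falls through to `"mock"`.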
Fixed
- Three latent bugs: `ValidationFinding` TypeError (wrong kwarg), CAB duplicate-approver counting (same signer counted twice toward quorum), undefined `p` in `cdn_placement.total_offload_rps`.
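
The duplicate-approver bug above is an instance of counting approvals rather than distinct signers. A minimal sketch of the corrected idea follows; `quorum_met` and the `(signer_key_id, signature)` tuple shape are assumptions for this example, not the project's CAB data model.

```python
from typing import Iterable, Tuple


def quorum_met(approvals: Iterable[Tuple[str, str]], required: int) -> bool:
    """Return True when at least `required` distinct signers have approved.

    Each approval is a hypothetical (signer_key_id, signature) pair.
    """
    # A set collapses repeated approvals from the same signer to one vote.
    distinct_signers = {key_id for key_id, _signature in approvals}
    return len(distinct_signers) >= required
```

With this shape, two approvals from the same key count once, so `quorum_met([("k1", "s1"), ("k1", "s2")], 2)` is `False`.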
Prior history
Pre-0.2 releases were not tagged. Major milestones in main history, in order:
| Commit | Summary |
|---|---|
| 9eefe8d | 9 findings from the first adversarial audit: safety, wiring, observability. |
| e0edc53 | 6 autonomous work items closed; full CAB-signed change validated on live Nokia SR Linux. |
| 0266ff5 | Devicelab validated end-to-end against real Nokia SR Linux. |
| 53a4273 | `tests/devicelab`: lab-agnostic smoke suite + Containerlab topology. |
| 733f29b | Lifted patterns from NetClaw, Hermes, obsidian-wiki, GIRA paper, Angler. |
| 9463bbe | Enforce signed CAB approvals, migrate knowledge to wiki, expand agent layer. |
| 8ce28cd | Vendor doc wiki: 35 docs, 6 vendors. |
| bd53718 | Generic LLM-driven change workflow + operation card wiki. |
| a74ae3b | P1/P2 security hardening pass. |
| 87ed550 | MVP packaging, transports, vendor doc vault, event ingestors. |
| 1067668 | Initial commit. |
[Unreleased]: https://github.com/cubby-network/platform/compare/v0.2.0-beta.1...HEAD
[0.2.0-beta.1]: https://github.com/cubby-network/platform/releases/tag/v0.2.0-beta.1