SmarterAgents: Autonomous Multi-Agent Orchestration Framework

An elite, production-grade local AI agent framework designed for unified memory architectures, specifically optimized for the AMD Framework 16 Laptop (Ryzen 9 7940HS / Radeon 780M iGPU / GFX1103) running accelerated inference via the Mesa Vulkan RADV driver on Fedora Linux.

This framework abandons monolithic agent loops in favor of a Decoupled Model Context Protocol (MCP) architecture. It utilizes a single, resident 14B-class LLM, dynamically toggling its cognitive reasoning state to orchestrate a multi-persona pipeline (Planner \rightarrow Builder \rightarrow Reviewer). All local state, execution failures, and design variables are actively vectorized into a localized CPU-bound RAG pipeline for low-latency, self-healing execution.

1. System Architecture & Directory Layout

The environment strictly separates the transport broker layer from tool execution and active agent sandboxes. No automated agents write to the core/ or dev/ directories.

.
├── agents                       # Immutable agent configuration profiles, tools, and dynamic spaces
│   ├── default_agent.md
│   ├── modules
│   │   └── geoscaper            # Low-level workspace tool logic invoked by MCP server
│   ├── tools
│   │   └── tool_rules.gbnf      # Strict structural logit constraints for Builder turns
│   └── workspaces/              # Isolated Agent Sandbox Environment
│       └── MMDDprojectname_workspace/  # Dynamic task-specific runtime directory
│           ├── 00cache.json     # Global design vectors, tokens, and active styles
│           ├── 00memory_vault.db# In-process sqlite-vec engine (768-dim float array store)
│           ├── 00tasks.json     # Project feature checklist and sequence goals [Fully Embedded]
│           ├── 00lessons_learned.md # Global closed-fault and successful repair ledger
│           ├── 01plan_handoff.md# Planner execution payload (Thinking: Enabled)
│           ├── 02build_handoff.md# Builder compilation summary payload
│           └── 03review_audit.md# Accumulative Audit Ledger [Append-Only Block Records]
├── core                         # Source logic and foundational runtime engines ONLY
│   ├── smarterframework.py      # Transport broker (Discover, Constrain, Execute)
│   ├── llama.cpp                # Core inference execution backend
│   ├── cache/                   # System-level internal runtime engine caches
│   └── logs/                    # Standard input/output framework system logs
├── dev                          # Strict User Space: Backups, manual synching, manual logs
│   ├── 00plan_v1.md
│   ├── 00scratch.txt
│   ├── agent_stream.log
│   ├── llama-server.log
│   └── staging
│       ├── 01CACHE
│       ├── 02OLD
│       └── 03PROTO
├── models                       # Binary Storage Tier
│   ├── Huihui-Qwen3-8B-abliterated-v2.i1-Q5_K_M.gguf
│   ├── Qwen3.5-4B-Q4_K_M.gguf
│   └── Qwen3-14B-Instruct-Abliterated-Q4_K_M.gguf
├── README.md
├── stage.sh
└── start.sh

2. Core Operational Pipelines

The Decoupled Transport Broker (`smarterframework.py`)

The Python orchestrator is entirely tool-agnostic. It does not parse code, resolve paths, or execute system commands directly. It strictly manages the MCP network cycle:

Discover: Queries the MCP server for capabilities (tools/list).
Constrain: Maps schema parameters to local GBNF JSON grammars.
Execute: Forwards rigid argument values to the target tool (tools/call).

The Multi-Persona State Machine

To prevent instruction leakage and context dilution, the orchestrator executes a hard wipe of the context window between every phase transition, toggling the model's cognitive mode natively:

Phase 1: The Planner. (enable_thinking=True). Outputs strategic architecture to 01plan_handoff.md.
Phase 2: The Builder. (enable_thinking=False + GBNF Constraint). Emits zero-latency JSON tool payloads to execute components. Outputs to 02build_handoff.md.
Phase 3: The Reviewer. (enable_thinking=True). Audits compiled outputs against 00tasks.json constraints. Appends results to 03review_audit.md.

Active Memory Pipeline (`MemoryVault`)

All workspace iterations are systematically vectorized in-process. To preserve Vulkan execution lanes for the primary 14B model, the embedding pipeline runs entirely on the host CPU.

Engine: sqlite-vec + nomic-embed-text-v1.5.
Self-Healing Recovery: If the Builder receives a non-zero exit code or the Reviewer logs a rejection block, the framework automatically extracts the error trace, executes a k-NN distance search against the 00memory_vault.db, and spins up a fresh Builder instance pre-loaded with the top 3 historic fixes.

3. Target Hardware Optimization

This framework operates safely within a 32GB Shared Memory Ceiling operating at 89.6 GB/s bandwidth.

VRAM Allocation: Up to ~9.5 GB allocated to the 14B Q4_K_M model matrix via Vulkan.
Context Cap Limit: Context space is artificially restricted to 16384 (--ctx-size 16384) to prevent the KV Cache from exceeding ~4.4GB and causing out-of-memory kernel panics.
Unified Memory Safety: Pre-allocates unified memory safely and prevents page table collisions by bypassing OS memory mapping (--no-mmap).
Physical Core Alignment: Restricts processing to exactly 8 physical threads (--threads 8), eliminating SMT resource contention.
GPU Queue Splitting: Batch processing is throttled (--batch-size 128 and --ubatch-size 64) to prevent amdgpu driver timeouts (TDR).

4. Setup & Compilation

Step 1: Install Python Dependencies

The orchestrator requires specific libraries for runtime type guarding and the local vector engine:

pip install pydantic sentence-transformers sqlite-vec numpy

Step 2: Compile the Vulkan Inference Engine

Compile llama-server locally using the Mesa Vulkan driver stack, utilizing Link-Time Optimization (LTO) and jemalloc:

cd core/llama.cpp
cmake -B build -DGGML_VULKAN=1 -DLLAMA_LTO=ON -DLLAMA_JEMALLOC=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_FLAGS="-O3 -march=native -mtune=native" -DCMAKE_CXX_FLAGS="-O3 -march=native -mtune=native"
cmake --build build --config Release -j$(nproc)

Step 3: ELF Dynamic Linking Fix (One-Time Execution)

To execute llama-server seamlessly from the root folder without dynamic linker errors, patch the binary to search its own directory for compiled shared objects:

sudo dnf install patchelf
patchelf --set-rpath '$ORIGIN' ./core/llama.cpp/build/bin/llama-server

Step 4: Execute the Pipeline

Run the unified start script from the root directory. This script initializes the Vulkan backend, performs memory health checks, boots the CPU-based MemoryVault, and triggers the asynchronous smarterframework.py orchestrator loop.

chmod +x start.sh
./start.sh

README.md