Series: ISAC 2.0 Adaptive Reasoning Development
Part: 1 of 2
Date: April 2026
Author: Fitz
Project: ISAC 2.0 – GCF (Guided Cognitive Framework)
The Challenge: Teaching AI to Think Differently
What if your AI could recognize when its first answer isn’t good enough and automatically try a different approach?
Most AI systems give you one answer and call it done. They don’t second-guess themselves. They don’t pivot strategies when the first approach fails. They certainly don’t learn which reasoning styles work best for different problems.
ISAC 2.0 does. It is built on the GCF (Guided Cognitive Framework), a Rust-based architecture designed for adaptive reasoning.
This is the story of how I evolved the system from a Python prototype into a production-ready Rust implementation that rotates through different cognitive “faces” to find better answers — and the brutal benchmarking process that revealed where it worked, where it failed, and what needs to happen next.
In Part 1, I’ll cover the evolution through Stages 13.1-13.4 and the critical discovery that changed everything.
Part 2 will reveal the solution: constraint-based cognitive enforcement and the complete architectural rebuild.
The Architecture: A Rubik’s Cube for Reasoning
ISAC 2.0 is built on the GCF (Guided Cognitive Framework), a Rust-based architecture that replaced an earlier Python prototype once the project had outgrown it.
The system uses a dual-cube architecture:
- Outer Cube: Environmental constraints, context, user requirements
- Inner Cube: The active reasoning engine with 6 cognitive “faces”
Each face represents a different reasoning domain:
| Face | Specialty |
|---|---|
| Synthesis | Combining multiple perspectives into unified answers |
| Analytical | Structured step-by-step decomposition |
| Procedural | Sequential operations and state machines |
| Creative | Divergent thinking and novel combinations |
| Memory | Pattern matching against past experience |
| Empirical | Evidence-based observation without speculation |
The core idea: When the system produces a low-confidence result, the inner cube rotates to a different face and tries again with a fundamentally different cognitive approach.
The problem we discovered: Rotation was happening, but the faces weren’t actually different.
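To make the rotate-on-low-confidence idea concrete, here is a minimal sketch. It is not the actual GCF code: the `Attempt` struct, the `solve_with_rotation` function, and the `solve` callback (a stand-in for the LLM call) are hypothetical names chosen for illustration. Only the six face names come from the table above.

```rust
/// The six inner-cube faces named in the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Face {
    Synthesis,
    Analytical,
    Procedural,
    Creative,
    Memory,
    Empirical,
}

impl Face {
    pub const ALL: [Face; 6] = [
        Face::Synthesis,
        Face::Analytical,
        Face::Procedural,
        Face::Creative,
        Face::Memory,
        Face::Empirical,
    ];
}

/// Hypothetical result of one reasoning attempt.
#[derive(Debug, Clone, PartialEq)]
pub struct Attempt {
    pub face: Face,
    pub answer: String,
    pub confidence: f64, // assumed range: 0.0..=1.0
}

/// Try the starting face first. While confidence stays below `threshold`,
/// rotate to the next untried face and keep whichever attempt scores highest.
/// `solve` stands in for the LLM call and is an assumption of this sketch.
pub fn solve_with_rotation<F>(start: Face, threshold: f64, mut solve: F) -> Attempt
where
    F: FnMut(Face) -> Attempt,
{
    let mut best = solve(start);
    for face in Face::ALL.iter().copied().filter(|f| *f != start) {
        if best.confidence >= threshold {
            break; // good enough, stop spending compute
        }
        let candidate = solve(face);
        if candidate.confidence > best.confidence {
            best = candidate;
        }
    }
    best
}
```

If every face returns the same confidence, this loop visits all six faces and still hands back the original attempt, which is exactly the failure mode described in the stages that follow.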
The Evolution: 4 Stages of Failure and Progress
Stage 13.1: “The Trigger That Never Fired”
Problem: Rotation was theoretically possible but never actually triggered.
What happened:
- System correctly identified low-affinity scenarios (e.g., Synthesis face trying to handle procedural math)
- Exploration logic existed in code
- But the actual rotation never executed
Result: 25-prompt stress test showed rotation count = 0 across all runs.
Lesson: Having code for a feature ≠ that feature actually running.
Stage 13.2: “Safety First, Maybe Too Safe”
Problem: Added aggressive safety blocks that prevented rotation even when needed.
What happened:
- Implemented task complexity detection (Simple/Medium/Complex/Hard)
- Added budget constraints to avoid wasting compute on trivial tasks
- Created affinity scoring to detect face-task mismatch
Result:
- ✅ System correctly blocked rotation on “hello” and other trivial prompts
- ❌ System also blocked rotation on legitimate complex tasks
- Rotation count: Still approximately 0
8 test runs (4 forward, 4 backward) confirmed: safety was working, but the actual adaptive behavior was still missing.
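A safety gate of this kind can be sketched as a single function that checks complexity, budget, confidence, and affinity in turn. The four complexity levels come from the text; the field names, the thresholds, and the order of the checks are assumptions, not the real GCF configuration.

```rust
/// Task complexity levels from Stage 13.2, ordered from least to most complex.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Complexity {
    Simple,
    Medium,
    Complex,
    Hard,
}

/// Hypothetical snapshot of the state the gate inspects.
#[derive(Debug, Clone, Copy)]
pub struct GateInput {
    pub complexity: Complexity,
    pub affinity: f64,   // how well the current face fits the task, 0.0..=1.0
    pub confidence: f64, // confidence of the current answer, 0.0..=1.0
    pub rotations_used: u32,
}

/// Hypothetical thresholds; the real values are not given in the post.
#[derive(Debug, Clone, Copy)]
pub struct GateConfig {
    pub min_complexity: Complexity,
    pub max_affinity: f64,
    pub confidence_target: f64,
    pub rotation_budget: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum GateDecision {
    Allow,
    Block(&'static str),
}

/// Decide whether a rotation may run. Every check can only block,
/// so overly strict thresholds suppress rotation entirely.
pub fn rotation_gate(input: &GateInput, cfg: &GateConfig) -> GateDecision {
    if input.complexity < cfg.min_complexity {
        return GateDecision::Block("task too simple");
    }
    if input.rotations_used >= cfg.rotation_budget {
        return GateDecision::Block("rotation budget exhausted");
    }
    if input.confidence >= cfg.confidence_target {
        return GateDecision::Block("already confident");
    }
    if input.affinity > cfg.max_affinity {
        return GateDecision::Block("face already fits task");
    }
    GateDecision::Allow
}
```

Because every branch is a veto, a gate shaped like this fails closed: one miscalibrated threshold is enough to reproduce the "rotation count still approximately 0" result.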
Stage 13.3: “The Mechanical Fix”
Problem: Exploration flag was set but never checked.
What happened:
- Added an explicit `force_explore` flag in the Router
- Wired up the exploration trigger to actually invoke rotation logic
- Maintained all safety checks from 13.2
Result:
- ✅ On “do math” and similar tasks: exploration finally triggered
- ✅ Logs showed: `Exploration triggered: true | reason: force_explore`
- ❌ Rotated results were identical to original results
- Confidence before rotation: 80% → After rotation: 80%
8 more test runs proved the mechanism worked, but didn’t produce value.
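The "flag set but never checked" bug and its fix can be shown in miniature. This is a sketch under assumptions: only the `force_explore` name and the Router come from the text, while the struct layout, `RouteOutcome`, and the method names are hypothetical.

```rust
/// Hypothetical, stripped-down router holding the exploration flag.
#[derive(Debug, Default)]
pub struct Router {
    force_explore: bool,
    rotations: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum RouteOutcome {
    KeptOriginal,
    Explored { reason: &'static str },
}

impl Router {
    /// Setting the flag is the part that already existed before 13.3.
    pub fn request_exploration(&mut self) {
        self.force_explore = true;
    }

    /// The fix in miniature: the flag is read, acted on, and cleared.
    /// `std::mem::take` returns the old value and resets the field to `false`,
    /// so one request triggers exactly one rotation.
    pub fn route(&mut self) -> RouteOutcome {
        if std::mem::take(&mut self.force_explore) {
            self.rotations += 1;
            RouteOutcome::Explored { reason: "force_explore" }
        } else {
            RouteOutcome::KeptOriginal
        }
    }

    pub fn rotations(&self) -> u32 {
        self.rotations
    }
}
```

Keeping a `rotations` counter next to the flag makes the earlier failure visible in a test: a run that requests exploration but still reports zero rotations points straight at the unchecked flag.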
Stage 13.4: “It Rotates, But Nothing Changes”
Current status as of April 15, 2026:
What works:
- ✅ Safety blocks trivial tasks (no rotation on “hello”)
- ✅ Exploration triggers on appropriate tasks
- ✅ System rotates to different faces
- ✅ Rejection logic restores original answer when rotation fails
What doesn’t work:
- ❌ Rotated results have identical structure to original
- ❌ No meaningful confidence improvement
- ❌ Different faces produce essentially the same reasoning
Example – “do math” task:
Original (Synthesis face):
→ Confidence: 80%
→ Structure: procedural steps
Rotated (Analytical face):
→ Confidence: 80%
→ Structure: procedural steps
→ Rejection: "No improvement detected"
The bottleneck shifted: From “can’t trigger exploration” to “exploration doesn’t produce different thinking.”
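The rejection step in that example can be sketched as a comparison that keeps the original unless the rotated answer gains enough confidence. The `min_gain` threshold and all type names are assumptions. The post only states that a failed rotation restores the original answer with the message "No improvement detected".

```rust
/// Hypothetical summary of one face's answer.
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub face: &'static str,
    pub confidence: f64,
    pub structure: String,
}

#[derive(Debug, PartialEq)]
pub enum Verdict {
    Accepted(Candidate),
    Rejected { kept: Candidate, reason: &'static str },
}

/// Accept the rotated answer only if it improves confidence by at least
/// `min_gain` (an assumed tuning knob); otherwise restore the original.
pub fn judge_rotation(original: Candidate, rotated: Candidate, min_gain: f64) -> Verdict {
    if rotated.confidence - original.confidence >= min_gain {
        Verdict::Accepted(rotated)
    } else {
        Verdict::Rejected {
            kept: original,
            reason: "No improvement detected",
        }
    }
}
```

Fed the "do math" numbers above (80% before, 80% after), this returns `Rejected` and keeps the Synthesis answer, matching the logged behavior.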
The Benchmark Suite: How I Tested This
I built a 5-tier, 25-prompt test suite designed to stress-test every aspect of the system:
🟢 TIER 1: Easy (Baseline Sanity Check)
- `hello`
- `what is 5 + 7`
- `define machine learning`
Expected: No rotation, high confidence immediately, GCF ≈ raw model
🟡 TIER 2: Moderate (Structured Tasks)
- `explain how photosynthesis works step by step`
- `compare Python vs Rust for backend development`
Expected: Bishop/Rook pieces, some structure gains, minor improvements
🟠 TIER 3: Complex (Multi-Step Reasoning)
- `solve: A train travels 60mph for 2.5 hours, then 40mph for 1.5 hours. Total distance?`
- `design a weekly study plan for learning Rust from scratch`
Expected: Multi-step pipelines, possible rotation triggers
🔴 TIER 4: Hard (Ambiguity + Tradeoffs)
- `A company increased ad spend by 50% but revenue dropped 20%. Analyze why.`
- `analyze whether AI will replace developers in the next 10 years`
Expected: King intervention, strategy overrides, rotation should appear
🔥 TIER 5: Stress Test (Where GCF Must Prove Itself)
- `analyze a failing business, propose a turnaround plan, and convert it into actionable steps`
- `design a scalable AI system under strict memory and latency constraints`
Expected: Rotation triggers, strategy switching, confidence delta increases, GCF > raw model
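The per-tier rotation expectations can be encoded so a benchmark run is checked automatically. This is a sketch, not the actual test harness: the type names are hypothetical, and treating Tier 2 as "rotation optional" is an assumption, since its expectation line does not mention rotation.

```rust
/// The five benchmark tiers described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    Easy,
    Moderate,
    Complex,
    Hard,
    Stress,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RotationExpectation {
    Never,
    Optional,
    Required,
}

impl Tier {
    /// Map each tier to what the suite expects from the rotation mechanism.
    /// Moderate is mapped to Optional as an assumption of this sketch.
    pub fn rotation_expectation(self) -> RotationExpectation {
        match self {
            Tier::Easy => RotationExpectation::Never,
            Tier::Moderate | Tier::Complex => RotationExpectation::Optional,
            Tier::Hard | Tier::Stress => RotationExpectation::Required,
        }
    }
}

/// Check an observed rotation count against the tier's expectation.
pub fn meets_expectation(tier: Tier, rotations: u32) -> bool {
    match tier.rotation_expectation() {
        RotationExpectation::Never => rotations == 0,
        RotationExpectation::Optional => true,
        RotationExpectation::Required => rotations > 0,
    }
}
```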
The Results: What the Numbers Showed
Across 28 benchmark runs (Stage 13.1 → 13.4):
What Improved ✅
| Metric | Stage 13.1 | Stage 13.4 |
|---|---|---|
| Rotation triggers on complex tasks | 0% | ~30% |
| Safety blocks on trivial tasks | No | Yes |
| Exploration mechanism functional | No | Yes |
| Rejection of bad rotations | No | Yes |
What Stayed Broken ❌
| Metric | Stage 13.1 | Stage 13.4 |
|---|---|---|
| Meaningful confidence improvement | 0% | ~0% |
| Structural difference in reasoning | No | No |
| Face-specific cognitive styles enforced | No | No |
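Both headline rates in these tables can be derived from per-task records. The sketch below assumes a hypothetical `RunRecord` layout, not the actual log format, and counts an improvement only when a rotation ran and confidence strictly increased.

```rust
/// Hypothetical per-task log record.
#[derive(Debug, Clone, Copy)]
pub struct RunRecord {
    pub rotated: bool,
    pub confidence_before: f64,
    pub confidence_after: f64,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Summary {
    pub trigger_rate: f64,     // share of tasks where a rotation ran
    pub improvement_rate: f64, // share of tasks where rotation raised confidence
}

/// Aggregate records into the two rates. Returns `None` for an empty slice
/// to avoid dividing by zero.
pub fn summarize(records: &[RunRecord]) -> Option<Summary> {
    if records.is_empty() {
        return None;
    }
    let total = records.len() as f64;
    let rotated = records.iter().filter(|r| r.rotated).count() as f64;
    let improved = records
        .iter()
        .filter(|r| r.rotated && r.confidence_after > r.confidence_before)
        .count() as f64;
    Some(Summary {
        trigger_rate: rotated / total,
        improvement_rate: improved / total,
    })
}
```

Reporting the two rates side by side is the point: a healthy trigger rate next to a zero improvement rate is the gap between mechanical and behavioral success.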
The Smoking Gun
Running the “do math” prompt across all stages:
Stage 13.1: [never explored]
Stage 13.2: [blocked by safety]
Stage 13.3: [explored, no difference]
Stage 13.4: [explored, measured difference, rejected as identical]
Progress: We went from “can’t explore” to “explores but doesn’t help.”
That’s still progress.
Key Insights: What I Learned
1. Mechanical vs Behavioral Success
Having rotation trigger is not the same as having rotation work.
Stage 13.4 achieved mechanical success (the cube rotates), but behavioral failure (the rotation doesn’t change the outcome).
2. The Label Problem
Rotating from “Synthesis” to “Analytical” just changes a label. The underlying LLM prompt gets a different domain instruction, but there’s no enforcement that the reasoning must actually be different.
Current prompt for Analytical face:
[Domain: Use structured reasoning. Break down into logical steps.]
Current prompt for Synthesis face:
[Domain: Synthesize multiple perspectives into a unified answer.]
Problem: The LLM can still use the same cognitive primitives (metaphors, step-by-step, analogies) regardless of face.
3. Structural Identity is Detectable
I built a hash function that compares reasoning topology (not content):
- Reasoning graph structure
- Primitive sequences (decomposition, metaphor, analogy)
- Connector types (“therefore” vs “similarly” vs “imagine”)
Finding: 95%+ of rotations produced identical structural hashes.
This proved the rotations weren’t actually generating different thinking — they were just re-generating the same thinking with a different label.
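A topology-only hash along these lines can be sketched with the standard library. The step representation is an assumption: each step records its primitive, its connector type, and the earlier steps it builds on, while the text of the step is deliberately left out. The real hash function may differ.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Cognitive primitives named in the list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Primitive {
    Decomposition,
    Metaphor,
    Analogy,
}

/// Connector families, keyed by the example words above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Connector {
    Therefore,
    Similarly,
    Imagine,
}

/// One node of a hypothetical reasoning graph. The step's text is not stored,
/// so two answers with different wording but the same shape hash equally.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ReasoningStep {
    pub primitive: Primitive,
    pub connector: Connector,
    pub parents: Vec<usize>, // indices of earlier steps this step builds on
}

/// Hash the structure of a reasoning trace.
/// `DefaultHasher`'s algorithm is not guaranteed stable across Rust releases,
/// so compare hashes within one build rather than persisting them.
pub fn structural_hash(steps: &[ReasoningStep]) -> u64 {
    let mut hasher = DefaultHasher::new();
    steps.hash(&mut hasher);
    hasher.finish()
}
```

Comparing `structural_hash` for the original and rotated traces gives a cheap identity check. Equal hashes mean the rotation reproduced the same reasoning shape, whatever the face label says.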
The Critical Discovery
After 28 benchmark runs and analyzing ~700 individual task executions, the pattern was undeniable:
The system could rotate mechanically, but couldn’t think differently.
The faces were labels without walls. Nothing prevented the Analytical face from using metaphors, or the Synthesis face from doing step-by-step decomposition. The LLM received different instructions, but had complete freedom to ignore them.
The solution was obvious: Each face needed hard constraints — forbidden operations and required primitives that would be verified and enforced.
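As a rough preview of what "forbidden operations and required primitives" could mean in code, here is a hedged sketch. It is not the Stage 13.5 implementation, which Part 2 covers. The `Op` set, `FaceRules`, and `verify` are hypothetical names, and the example rules in the usage note are illustrative only.

```rust
/// Hypothetical set of cognitive operations a response might use.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Op {
    Decomposition,
    Metaphor,
    Analogy,
}

/// Hypothetical per-face rules: what must appear and what must not.
#[derive(Debug, Clone)]
pub struct FaceRules {
    pub required: Vec<Op>,
    pub forbidden: Vec<Op>,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Violation {
    MissingRequired(Op),
    UsedForbidden(Op),
}

/// Check the operations detected in a response against a face's rules.
/// Violations are reported in rule order: required first, then forbidden.
pub fn verify(used: &[Op], rules: &FaceRules) -> Result<(), Vec<Violation>> {
    let mut violations = Vec::new();
    for op in &rules.required {
        if !used.contains(op) {
            violations.push(Violation::MissingRequired(*op));
        }
    }
    for op in &rules.forbidden {
        if used.contains(op) {
            violations.push(Violation::UsedForbidden(*op));
        }
    }
    if violations.is_empty() {
        Ok(())
    } else {
        Err(violations)
    }
}
```

For example, an Analytical rule set that requires `Decomposition` and forbids `Metaphor` would reject a response built on metaphors, giving the face an actual wall instead of a label.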
Coming in Part 2
In Part 2 of this series, I’ll reveal:
- Stage 13.5: The complete architectural rebuild with constraint-based enforcement
- How I implemented cognitive sandboxes with walls
- The new verification system that rejects invalid reasoning
- Structural hashing that detects genuine differentiation
- Expected performance improvements once benchmarks complete
- The path forward to Stages 14-16
The transformation: From a label rotator to a true adaptive reasoning engine.
Lessons for Other AI Builders
1. Log Everything
Without detailed execution logs, I would never have caught:
- The exploration flag being set but never checked (13.2 → 13.3)
- Rotations triggering but producing identical outputs (13.3 → 13.4)
2. Build Safety First
Stage 13.2’s “safety-first” approach was frustrating (blocked too much), but it prevented the system from:
- Wasting compute on trivial tasks
- Rotating when already confident
- Entering infinite rotation loops
3. Mechanical ≠ Behavioral
Just because the code runs doesn’t mean it’s doing what you think. Stage 13.4 has a working rotation mechanism but broken rotation value.
Test the behavior you want, not just the mechanism that enables it.
4. Benchmarks Expose Truth
Running 25 prompts across 28 benchmark runs (roughly 700 task executions over four stages) revealed patterns I would never have seen with ad-hoc testing.
The data doesn’t lie: on complex tasks, rotation triggers about 30% of the time, but improves results 0% of the time.
Technical Details for the Curious
Architecture Stack
- Core Framework: GCF (Guided Cognitive Framework)
- Language: Rust (migrated from a Python prototype that the project had outgrown)
- LLM Backend: Ollama (local inference)
- State Management: JSON-based session persistence
- Benchmarking: Automated test suites for 25-prompt validation runs
Key Metrics Tracked
- Confidence scores (before/after rotation)
- Structural hash (reasoning topology)
- Piece usage (Pawn/Rook/Bishop/Knight/Queen/King)
- Domain affinity (how well face matches task)
- Execution time (latency overhead from rotation)
- Memory utilization (hint system effectiveness)
Benchmark Methodology
- Forward/Backward runs: Test for order-dependence
- Multiple iterations: Confirm consistency (5-8 runs per stage)
- Tiered prompts: From trivial to stress-test
- Structural comparison: Not just output similarity, but reasoning patterns
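The forward/backward check for order-dependence can be sketched as follows. The actual harness is not shown in this post, so the function name, the runner factory, and the confidence-only comparison are all assumptions. Each pass gets a fresh runner so state carried from earlier prompts is the only thing that differs.

```rust
/// Run the prompts forward and backward with fresh runners and return the
/// prompts whose confidence differs by more than `tolerance` between passes.
/// `make_runner` is a hypothetical factory for a stateful system under test.
pub fn order_sensitive<'a, R, F>(
    prompts: &[&'a str],
    mut make_runner: F,
    tolerance: f64,
) -> Vec<&'a str>
where
    F: FnMut() -> R,
    R: FnMut(&str) -> f64,
{
    // Forward pass with a fresh runner.
    let mut forward_runner = make_runner();
    let forward: Vec<f64> = prompts.iter().map(|&p| forward_runner(p)).collect();

    // Backward pass with another fresh runner.
    let mut backward_runner = make_runner();
    let mut backward: Vec<f64> = prompts.iter().rev().map(|&p| backward_runner(p)).collect();
    backward.reverse(); // realign results with the forward prompt order

    prompts
        .iter()
        .zip(forward.iter().zip(backward.iter()))
        .filter(|(_, (f, b))| (**f - **b).abs() > tolerance)
        .map(|(p, _)| *p)
        .collect()
}
```

An empty result means position in the suite did not move confidence beyond the tolerance. Any prompt that does appear is a candidate for state leaking between tasks, for example through the memory hint system.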
The Journey Continues
Current Status: Stage 13.4 complete; Stage 13.5 implemented and in testing.
Next Milestones:
- ✅ Stage 13.1-13.4: Rotation mechanism complete
- ✅ Stage 13.5: Constraint-based face differentiation (IMPLEMENTED)
- 🔄 Stage 13.5: Benchmark validation (IN PROGRESS)
- ⏳ Stage 14: Multi-face fusion (combine strengths)
- ⏳ Stage 15: Dynamic face creation
- ⏳ Stage 16: Outer cube integration
This is what indie AI research looks like: one developer, a lot of coffee, Rust compiler errors, and obsessive logging.
If you’re building adaptive AI systems, dealing with LLM reasoning limitations, or just curious about cognitive architectures — reach out at Rqmeo@pm.me. I’d love to compare notes.
Resources
Benchmark Data: 28 full test runs with detailed logs available
Architecture: GCF (Guided Cognitive Framework) – Rust-based adaptive reasoning system
Technical Docs: Dual-cube design, chess-piece cognitive hierarchy
Contact: Rqmeo@pm.me
Update Log
- Apr 15, 2026: Stage 13.4 benchmarks complete (28 runs), identified structural identity problem
- Apr 16, 2026: Stage 13.5 constraint system implemented, benchmarks in progress
- [Next]: Part 2 – Stage 13.5 architecture deep dive and results
Continue to Part 2: “The Solution – Constraint-Based Cognitive Enforcement”
Built with Rust, tested with persistence, improved through failure.