Series: ISAC 2.0 Adaptive Reasoning Development
Part: 1 of 2
Date: April 2026
Author: Fitz
Project: ISAC 2.0 – GCF (Guided Cognitive Framework)

The Challenge: Teaching AI to Think Differently

What if your AI could recognize when its first answer isn’t good enough and automatically try a different approach?

Most AI systems give you one answer and call it done. They don’t second-guess themselves. They don’t pivot strategies when the first approach fails. They certainly don’t learn which reasoning styles work best for different problems.

ISAC 2.0 does. It is built on the GCF (Guided Cognitive Framework), a Rust-based architecture designed for adaptive reasoning.

This is the story of how I evolved the system from a Python prototype into a production-ready Rust implementation that rotates through different cognitive “faces” to find better answers — and the brutal benchmarking process that revealed where it worked, where it failed, and what needs to happen next.

In Part 1, I’ll cover the evolution through Stages 13.1-13.4 and the critical discovery that changed everything.

Part 2 will reveal the solution: constraint-based cognitive enforcement and the complete architectural rebuild.

The Architecture: A Rubik’s Cube for Reasoning

ISAC 2.0 runs on the GCF (Guided Cognitive Framework), a Rust-based architecture that grew out of an earlier Python prototype once the project had outgrown that implementation.

The system uses a dual-cube architecture:

  • Outer Cube: Environmental constraints, context, user requirements
  • Inner Cube: The active reasoning engine with 6 cognitive “faces”

Each face represents a different reasoning domain:

Face | Specialty
Synthesis | Combining multiple perspectives into unified answers
Analytical | Structured step-by-step decomposition
Procedural | Sequential operations and state machines
Creative | Divergent thinking and novel combinations
Memory | Pattern matching against past experience
Empirical | Evidence-based observation without speculation

The core idea: When the system produces a low-confidence result, the inner cube rotates to a different face and tries again with a fundamentally different cognitive approach.
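A minimal Rust sketch of the dual-cube idea and the low-confidence rotation rule, assuming a fixed cyclic face order and a hypothetical 0.7 confidence cutoff. `Face`, `OuterCube`, `InnerCube`, `rotate_if_low`, and `LOW_CONFIDENCE` are illustrative names, not the GCF API.

```rust
/// The six inner-cube faces, in the order listed in the table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Face {
    Synthesis,
    Analytical,
    Procedural,
    Creative,
    Memory,
    Empirical,
}

impl Face {
    pub const ALL: [Face; 6] = [
        Face::Synthesis,
        Face::Analytical,
        Face::Procedural,
        Face::Creative,
        Face::Memory,
        Face::Empirical,
    ];

    /// Next face in a fixed cyclic order (hypothetical rotation policy).
    pub fn next(self) -> Face {
        let i = Face::ALL.iter().position(|&f| f == self).unwrap();
        Face::ALL[(i + 1) % Face::ALL.len()]
    }
}

/// Outer cube: environmental constraints, context, user requirements.
pub struct OuterCube {
    pub constraints: Vec<String>,
}

/// Inner cube: tracks which reasoning face is currently active.
pub struct InnerCube {
    pub active: Face,
}

/// Hypothetical cutoff below which a result counts as "low confidence".
pub const LOW_CONFIDENCE: f64 = 0.7;

impl InnerCube {
    /// Rotates to the next face when confidence is below the cutoff.
    /// Returns the face that should handle the retry, or None to keep the result.
    pub fn rotate_if_low(&mut self, confidence: f64) -> Option<Face> {
        if confidence < LOW_CONFIDENCE {
            self.active = self.active.next();
            Some(self.active)
        } else {
            None
        }
    }
}
```

Note that this rule only changes which face is active; nothing in it makes the next face reason differently.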

The problem we discovered: Rotation was happening, but the faces weren’t actually different.

The Evolution: 4 Stages of Failure and Progress

Stage 13.1: “The Trigger That Never Fired”

Problem: Rotation was theoretically possible but never actually triggered.

What happened:

  • System correctly identified low-affinity scenarios (e.g., Synthesis face trying to handle procedural math)
  • Exploration logic existed in code
  • But the actual rotation never executed

Result: 25-prompt stress test showed rotation count = 0 across all runs.

Lesson: Having code for a feature ≠ that feature actually running.

Stage 13.2: “Safety First, Maybe Too Safe”

Problem: Added aggressive safety blocks that prevented rotation even when needed.

What happened:

  • Implemented task complexity detection (Simple/Medium/Complex/Hard)
  • Added budget constraints to avoid wasting compute on trivial tasks
  • Created affinity scoring to detect face-task mismatch

Result:

  • ✅ System correctly blocked rotation on “hello” and other trivial prompts
  • ❌ System also blocked rotation on legitimate complex tasks
  • Rotation count: Still approximately 0

8 test runs (4 forward, 4 backward) confirmed: safety was working, but the actual adaptive behavior was still missing.
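The Stage 13.2 gate can be sketched as a chain of independent vetoes, including a rotation cap of the kind that prevents infinite loops. The thresholds, field names, and `may_rotate` are hypothetical; the sketch only illustrates why stacked checks tend to over-block, since any single condition is enough to stop rotation.

```rust
/// Task complexity levels from the Stage 13.2 classifier.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Complexity {
    Simple,
    Medium,
    Complex,
    Hard,
}

/// Inputs the gate considers (field names are hypothetical).
pub struct GateInput {
    pub complexity: Complexity,
    pub confidence: f64, // 0.0..=1.0
    pub affinity: f64,   // how well the active face matches the task, 0.0..=1.0
    pub rotations_used: u32,
}

/// Hypothetical thresholds; not the values used in ISAC 2.0.
pub const MIN_AFFINITY: f64 = 0.5;
pub const CONFIDENT: f64 = 0.85;
pub const MAX_ROTATIONS: u32 = 2;

/// Returns Ok(()) if rotation is allowed, or Err with the first blocking reason.
pub fn may_rotate(g: &GateInput) -> Result<(), &'static str> {
    if g.complexity == Complexity::Simple {
        return Err("trivial task"); // e.g. "hello"
    }
    if g.rotations_used >= MAX_ROTATIONS {
        return Err("rotation budget exhausted"); // guards against rotation loops
    }
    if g.confidence >= CONFIDENT {
        return Err("already confident");
    }
    if g.affinity >= MIN_AFFINITY {
        return Err("face already matches task");
    }
    Ok(())
}
```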

Stage 13.3: “The Mechanical Fix”

Problem: Exploration flag was set but never checked.

What happened:

  • Added explicit force_explore flag in Router
  • Wired up exploration trigger to actually invoke rotation logic
  • Maintained all safety checks from 13.2

Result:

  • ✅ On “do math” and similar tasks: exploration finally triggered
  • ✅ Logs showed: Exploration triggered: true | reason: force_explore
  • ❌ Rotated results were identical to original results
  • Confidence before rotation: 80% → After rotation: 80%

8 more test runs proved the mechanism worked, but didn’t produce value.
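A sketch of the Stage 13.3 fix, assuming a `Router` struct that owns the `force_explore` flag. The flag name comes from the logs; `Route`, `route`, and `log_line` are hypothetical. The essential change is that the decision path actually reads the flag, while the safety gate still runs first.

```rust
/// Minimal router state; only `force_explore` is named in the original logs.
pub struct Router {
    pub force_explore: bool,
}

#[derive(Debug, PartialEq)]
pub enum Route {
    Stay,
    Explore { reason: &'static str },
}

impl Router {
    /// Decides whether to explore. The safety gate gets the first say;
    /// after that, the previously ignored flag is finally checked.
    pub fn route(&mut self, gate_allows: bool, low_confidence: bool) -> Route {
        if !gate_allows {
            return Route::Stay;
        }
        if self.force_explore {
            self.force_explore = false; // consume the flag so it fires once
            return Route::Explore { reason: "force_explore" };
        }
        if low_confidence {
            return Route::Explore { reason: "low_confidence" };
        }
        Route::Stay
    }
}

/// Formats a decision in the style of the log line quoted above.
pub fn log_line(route: &Route) -> String {
    match route {
        Route::Explore { reason } => format!("Exploration triggered: true | reason: {reason}"),
        Route::Stay => "Exploration triggered: false".to_string(),
    }
}
```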

Stage 13.4: “It Rotates, But Nothing Changes”

Current status as of April 15, 2026:

What works:

  • ✅ Safety blocks trivial tasks (no rotation on “hello”)
  • ✅ Exploration triggers on appropriate tasks
  • ✅ System rotates to different faces
  • ✅ Rejection logic restores original answer when rotation fails

What doesn’t work:

  • ❌ Rotated results have identical structure to original
  • ❌ No meaningful confidence improvement
  • ❌ Different faces produce essentially the same reasoning

Example – “do math” task:

Original (Synthesis face):
  → Confidence: 80%
  → Structure: procedural steps

Rotated (Analytical face):
  → Confidence: 80%
  → Structure: procedural steps
  → Rejection: "No improvement detected"

The bottleneck shifted: From “can’t trigger exploration” to “exploration doesn’t produce different thinking.”
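The rejection step can be sketched as a comparison on both structure and confidence, assuming each answer carries a structural hash and a confidence score. `Answer`, `accept_or_restore`, and the 0.05 `MIN_GAIN` margin are hypothetical.

```rust
/// A candidate answer as seen by the rejection step (hypothetical shape).
pub struct Answer {
    pub face: &'static str,
    pub confidence: f64,
    pub structure_hash: u64,
}

/// Hypothetical minimum confidence gain a rotation must deliver.
pub const MIN_GAIN: f64 = 0.05;

/// Keeps the rotated answer only if it is structurally different AND more
/// confident by at least MIN_GAIN; otherwise restores the original with a reason.
pub fn accept_or_restore(original: Answer, rotated: Answer) -> (Answer, Option<&'static str>) {
    if rotated.structure_hash == original.structure_hash {
        return (original, Some("No improvement detected: identical structure"));
    }
    if rotated.confidence < original.confidence + MIN_GAIN {
        return (original, Some("No improvement detected: confidence gain too small"));
    }
    (rotated, None)
}
```

Under this rule the "do math" example is restored on the first check, because both answers share the same procedural structure.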

The Benchmark Suite: How I Tested This

I built a 5-tier, 25-prompt test suite designed to stress-test every aspect of the system:

🟢 TIER 1: Easy (Baseline Sanity Check)

  • hello
  • what is 5 + 7
  • define machine learning

Expected: No rotation, high confidence immediately, GCF ≈ raw model

🟡 TIER 2: Moderate (Structured Tasks)

  • explain how photosynthesis works step by step
  • compare Python vs Rust for backend development

Expected: Bishop/Rook pieces, some structure gains, minor improvements

🟠 TIER 3: Complex (Multi-Step Reasoning)

  • solve: A train travels 60mph for 2.5 hours, then 40mph for 1.5 hours. Total distance?
  • design a weekly study plan for learning Rust from scratch

Expected: Multi-step pipelines, possible rotation triggers

🔴 TIER 4: Hard (Ambiguity + Tradeoffs)

  • A company increased ad spend by 50% but revenue dropped 20%. Analyze why.
  • analyze whether AI will replace developers in the next 10 years

Expected: King intervention, strategy overrides, rotation should appear

🔥 TIER 5: Stress Test (Where GCF Must Prove Itself)

  • analyze a failing business, propose a turnaround plan, and convert it into actionable steps
  • design a scalable AI system under strict memory and latency constraints

Expected: Rotation triggers, strategy switching, confidence delta increases, GCF > raw model
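One way to encode the tiers is a table-driven suite where each tier declares its rotation expectation, paraphrasing the “Expected:” lines above. The types, `check`, and the four-prompt `SUITE` sample are hypothetical. The Tier 3 train prompt also gets a reference answer (60 × 2.5 + 40 × 1.5 = 150 + 60 = 210 miles), so correctness can be checked alongside rotation behavior.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    Easy,
    Moderate,
    Complex,
    Hard,
    Stress,
}

/// What a tier expects from rotation (paraphrased, hypothetical encoding).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RotationExpectation {
    Never,
    Optional,
    Expected,
}

impl Tier {
    pub fn rotation(self) -> RotationExpectation {
        match self {
            Tier::Easy => RotationExpectation::Never,
            Tier::Moderate | Tier::Complex => RotationExpectation::Optional,
            Tier::Hard | Tier::Stress => RotationExpectation::Expected,
        }
    }
}

/// A sample of the suite's prompts (the full suite has 25).
pub const SUITE: &[(Tier, &str)] = &[
    (Tier::Easy, "hello"),
    (Tier::Easy, "what is 5 + 7"),
    (Tier::Complex, "solve: A train travels 60mph for 2.5 hours, then 40mph for 1.5 hours. Total distance?"),
    (Tier::Stress, "design a scalable AI system under strict memory and latency constraints"),
];

/// Reference answer for the train prompt: 150 + 60 miles.
pub const TRAIN_DISTANCE_MILES: f64 = 60.0 * 2.5 + 40.0 * 1.5;

/// Checks one run against its tier's expectation; `rotated` would come from the run log.
pub fn check(tier: Tier, rotated: bool) -> bool {
    match tier.rotation() {
        RotationExpectation::Never => !rotated,
        RotationExpectation::Optional => true,
        RotationExpectation::Expected => rotated,
    }
}
```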

The Results: What the Numbers Showed

Across 28 benchmark runs (Stage 13.1 → 13.4):

What Improved ✅

Metric | Stage 13.1 | Stage 13.4
Rotation triggers on complex tasks | 0% | ~30%
Safety blocks on trivial tasks | No | Yes
Exploration mechanism functional | No | Yes
Rejection of bad rotations | No | Yes

What Stayed Broken ❌

Metric | Stage 13.1 | Stage 13.4
Meaningful confidence improvement | 0% | ~0%
Structural difference in reasoning | No | No
Face-specific cognitive styles enforced | No | No

The Smoking Gun

Running the “do math” prompt across all stages:

Stage 13.1: [never explored]
Stage 13.2: [blocked by safety]
Stage 13.3: [explored, no difference]
Stage 13.4: [explored, measured difference, rejected as identical]

Progress: We went from “can’t explore” to “explores but doesn’t help.”

That’s still progress.

Key Insights: What I Learned

1. Mechanical vs Behavioral Success

Having rotation trigger is not the same as having rotation work.

Stage 13.4 achieved mechanical success (the cube rotates), but behavioral failure (the rotation doesn’t change the outcome).

2. The Label Problem

Rotating from “Synthesis” to “Analytical” just changes a label. The underlying LLM prompt gets a different domain instruction, but there’s no enforcement that the reasoning must actually be different.

Current prompt for Analytical face:

[Domain: Use structured reasoning. Break down into logical steps.]

Current prompt for Synthesis face:

[Domain: Synthesize multiple perspectives into a unified answer.]

Problem: The LLM can still use the same cognitive primitives (metaphors, step-by-step, analogies) regardless of face.
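To make the label problem concrete, here is a hypothetical prompt assembly using the two domain instructions quoted above. `domain_instruction`, `build_prompt`, and `differing_lines` are illustrative names. The point is that switching faces changes exactly one line of the prompt and adds no check on the output.

```rust
/// Domain instructions quoted from the current face prompts.
pub fn domain_instruction(face: &str) -> &'static str {
    match face {
        "Analytical" => "Use structured reasoning. Break down into logical steps.",
        "Synthesis" => "Synthesize multiple perspectives into a unified answer.",
        _ => "",
    }
}

/// Hypothetical prompt assembly: the face contributes one bracketed line.
pub fn build_prompt(face: &str, task: &str) -> String {
    format!("[Domain: {}]\n{}", domain_instruction(face), task)
}

/// Counts positions where two prompts differ line by line
/// (extra trailing lines in the longer prompt are ignored by `zip`).
pub fn differing_lines(a: &str, b: &str) -> usize {
    a.lines().zip(b.lines()).filter(|(x, y)| x != y).count()
}
```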

3. Structural Identity is Detectable

I built a hash function that compares reasoning topology (not content):

  • Reasoning graph structure
  • Primitive sequences (decomposition, metaphor, analogy)
  • Connector types (“therefore” vs “similarly” vs “imagine”)

Finding: 95%+ of rotations produced identical structural hashes.

This proved the rotations weren’t actually generating different thinking — they were just re-generating the same thinking with a different label.
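A minimal sketch of a content-independent structural hash, assuming sentences are classified by their leading connector word and hashed with Rust's `DefaultHasher`. It covers only the primitive-sequence and connector parts of the comparison, not the reasoning graph; `Primitive`, `classify`, and `structural_hash` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Reasoning primitives detected from connectors (hypothetical set).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Primitive {
    Deduction,   // "therefore"
    Analogy,     // "similarly"
    Imagination, // "imagine"
    Step,        // "first", "then", "step"
}

/// Maps a sentence to a primitive by its leading connector (keyword-based assumption).
fn classify(sentence: &str) -> Option<Primitive> {
    let s = sentence.trim().to_lowercase();
    if s.starts_with("therefore") {
        Some(Primitive::Deduction)
    } else if s.starts_with("similarly") {
        Some(Primitive::Analogy)
    } else if s.starts_with("imagine") {
        Some(Primitive::Imagination)
    } else if s.starts_with("first") || s.starts_with("then") || s.starts_with("step") {
        Some(Primitive::Step)
    } else {
        None
    }
}

/// Hashes the sequence of primitives only, so different content with the
/// same reasoning shape produces the same hash.
pub fn structural_hash(text: &str) -> u64 {
    let seq: Vec<Primitive> = text.split('.').filter_map(classify).collect();
    let mut h = DefaultHasher::new();
    seq.hash(&mut h);
    h.finish()
}
```

`DefaultHasher` output is stable within one build but not guaranteed across Rust releases, so stored hashes would need a fixed hash function. Splitting on periods also mishandles decimals such as 2.5.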

The Critical Discovery

After 28 benchmark runs and analyzing ~700 individual task executions, the pattern was undeniable:

The system could rotate mechanically, but couldn’t think differently.

The faces were labels without walls. Nothing prevented the Analytical face from using metaphors, or the Synthesis face from doing step-by-step decomposition. The LLM received different instructions, but had complete freedom to ignore them.

The solution was obvious: Each face needed hard constraints — forbidden operations and required primitives that would be verified and enforced.
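Hard constraints of this kind could be checked after generation by comparing the primitives detected in an answer against per-face forbidden and required sets. This is a hypothetical sketch, not the Stage 13.5 implementation; `FaceConstraints`, `Violation`, `verify`, and the `ANALYTICAL` walls are assumed names and values.

```rust
use std::collections::HashSet;

/// Hypothetical per-face walls: primitives the face may not use and must use.
pub struct FaceConstraints {
    pub forbidden: &'static [&'static str],
    pub required: &'static [&'static str],
}

/// Example walls for the Analytical face (assumed values).
pub const ANALYTICAL: FaceConstraints = FaceConstraints {
    forbidden: &["metaphor", "analogy"],
    required: &["decomposition"],
};

#[derive(Debug, PartialEq)]
pub enum Violation {
    Forbidden(String),
    Missing(String),
}

/// Checks detected primitives against the walls and lists every violation,
/// forbidden ones first, each group in declaration order.
pub fn verify(walls: &FaceConstraints, detected: &[&str]) -> Vec<Violation> {
    let used: HashSet<&str> = detected.iter().copied().collect();
    let mut out = Vec::new();
    for &p in walls.forbidden {
        if used.contains(p) {
            out.push(Violation::Forbidden(p.to_string()));
        }
    }
    for &p in walls.required {
        if !used.contains(p) {
            out.push(Violation::Missing(p.to_string()));
        }
    }
    out
}
```

An empty result means the answer stayed inside the face's walls; anything else would be grounds to reject it.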

Coming in Part 2

In Part 2 of this series, I’ll reveal:

  • Stage 13.5: The complete architectural rebuild with constraint-based enforcement
  • How I implemented cognitive sandboxes with walls
  • The new verification system that rejects invalid reasoning
  • Structural hashing that detects genuine differentiation
  • Expected performance improvements once benchmarks complete
  • The path forward to Stages 14-16

The transformation: From a label rotator to a true adaptive reasoning engine.

Lessons for Other AI Builders

1. Log Everything

Without detailed execution logs, I would never have caught:

  • The exploration flag being set but never checked (13.2 → 13.3)
  • Rotations triggering but producing identical outputs (13.3 → 13.4)

2. Build Safety First

Stage 13.2’s “safety-first” approach was frustrating (blocked too much), but it prevented the system from:

  • Wasting compute on trivial tasks
  • Rotating when already confident
  • Entering infinite rotation loops

3. Mechanical ≠ Behavioral

Just because the code runs doesn’t mean it’s doing what you think. Stage 13.4 has a working rotation mechanism, but the rotations themselves add no value.

Test the behavior you want, not just the mechanism that enables it.

4. Benchmarks Expose Truth

Running the 25-prompt suite across 28 runs (5-8 runs per stage over four stages, roughly 700 task executions) revealed patterns I would never have seen with ad-hoc testing.

The data doesn’t lie: by Stage 13.4, rotation triggers on roughly 30% of complex tasks, but improves results essentially 0% of the time.

Technical Details for the Curious

Architecture Stack

  • Core Framework: GCF (Guided Cognitive Framework)
  • Language: Rust (migrated from a Python prototype whose capabilities the project had outgrown)
  • LLM Backend: Ollama (local inference)
  • State Management: JSON-based session persistence
  • Benchmarking: Automated test suites for 25-prompt validation runs

Key Metrics Tracked

  • Confidence scores (before/after rotation)
  • Structural hash (reasoning topology)
  • Piece usage (Pawn/Rook/Bishop/Knight/Queen/King)
  • Domain affinity (how well face matches task)
  • Execution time (latency overhead from rotation)
  • Memory utilization (hint system effectiveness)
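A sketch of a per-execution metrics record and the two headline rates (rotation trigger rate, and improvement rate among rotated runs), assuming "improvement" means a changed structural hash plus higher confidence. `RunRecord`, `improved`, and `summarize` are hypothetical; piece usage, affinity, and memory metrics are omitted for brevity.

```rust
/// One task execution's tracked metrics (hypothetical field names).
pub struct RunRecord {
    pub rotated: bool,
    pub confidence_before: f64,
    pub confidence_after: f64,
    pub hash_before: u64,
    pub hash_after: u64,
    pub latency_ms: u64,
}

impl RunRecord {
    /// Counts as an improvement only if a rotation happened, the reasoning
    /// structure changed, and confidence rose.
    pub fn improved(&self) -> bool {
        self.rotated
            && self.hash_after != self.hash_before
            && self.confidence_after > self.confidence_before
    }
}

/// Returns (rotation trigger rate, improvement rate among rotated runs), both in 0.0..=1.0.
pub fn summarize(runs: &[RunRecord]) -> (f64, f64) {
    if runs.is_empty() {
        return (0.0, 0.0);
    }
    let rotated = runs.iter().filter(|r| r.rotated).count();
    let improved = runs.iter().filter(|r| r.improved()).count();
    let trigger_rate = rotated as f64 / runs.len() as f64;
    let improve_rate = if rotated == 0 { 0.0 } else { improved as f64 / rotated as f64 };
    (trigger_rate, improve_rate)
}
```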

Benchmark Methodology

  • Forward/Backward runs: Test for order-dependence
  • Multiple iterations: Confirm consistency (5-8 runs per stage)
  • Tiered prompts: From trivial to stress-test
  • Structural comparison: Not just output similarity, but reasoning patterns
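The forward/backward comparison can be sketched as a per-prompt diff, assuming each run's outcomes are stored keyed by prompt text. Order dependence could arise if, for example, memory hints carry over between prompts; `order_dependent` is a hypothetical helper.

```rust
use std::collections::BTreeMap;

/// Returns prompts whose outcome differs between a forward and a backward run.
/// Outcomes are keyed by prompt, so the comparison itself ignores run order;
/// prompts missing from the backward run are also reported.
pub fn order_dependent<'a>(
    forward: &BTreeMap<&'a str, String>,
    backward: &BTreeMap<&'a str, String>,
) -> Vec<&'a str> {
    forward
        .iter()
        .filter(|(prompt, out)| backward.get(*prompt) != Some(*out))
        .map(|(prompt, _)| *prompt)
        .collect()
}
```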

The Journey Continues

Current Status: Stage 13.4 complete; Stage 13.5 implemented and in testing.

Next Milestones:

  • ✅ Stage 13.1-13.4: Rotation mechanism complete
  • ✅ Stage 13.5: Constraint-based face differentiation (IMPLEMENTED)
  • 🔄 Stage 13.5: Benchmark validation (IN PROGRESS)
  • ⏳ Stage 14: Multi-face fusion (combine strengths)
  • ⏳ Stage 15: Dynamic face creation
  • ⏳ Stage 16: Outer cube integration

This is what indie AI research looks like: one developer, a lot of coffee, Rust compiler errors, and obsessive logging.

If you’re building adaptive AI systems, dealing with LLM reasoning limitations, or just curious about cognitive architectures — reach out at Rqmeo@pm.me. I’d love to compare notes.

Resources

Benchmark Data: 28 full test runs with detailed logs available
Architecture: GCF (Guided Cognitive Framework) – Rust-based adaptive reasoning system
Technical Docs: Dual-cube design, chess-piece cognitive hierarchy

Contact: Rqmeo@pm.me

Update Log

  • Apr 15, 2026: Stage 13.4 benchmarks complete (28 runs), identified structural identity problem
  • Apr 16, 2026: Stage 13.5 constraint system implemented, benchmarks in progress
  • [Next]: Part 2 – Stage 13.5 architecture deep dive and results

Continue to Part 2: “The Solution – Constraint-Based Cognitive Enforcement”

Built with Rust, tested with persistence, improved through failure.