By Billy P. | April 2026
What Is ISAC?
ISAC stands for Intelligent Strategic Awareness Companion. He’s a desktop AI assistant that I’ve been building from scratch for the past 18 months – and when I say from scratch, I mean everything. The brain, the voice, the interface, the security, the memory. All of it.
You’ve probably used things like ChatGPT or Siri. They’re cloud-based – your words go to a server somewhere, get processed, and come back. ISAC is different. He runs entirely on your own computer. No internet required. No data leaving your machine. No subscription. He’s yours.
But what makes ISAC actually interesting isn’t that he’s local. It’s how he thinks.
The Problem with Normal AI
When you ask ChatGPT a question, it treats every single query the same way. Whether you say “hi” or ask it to plan a business strategy, the same amount of computing power fires up and the same approach is used. There’s no concept of “this is a simple question, I’ll keep it quick” or “this is complex, I need to really think about this.”
That’s like using a sledgehammer to hang a picture frame and also to demolish a wall. Same tool, wildly different jobs.
I wanted to fix that.
How ISAC Thinks: The Cognitive Cube
The brain behind ISAC is something I designed called the Cognitive Cube Framework – or GCF for short. It’s inspired by chess.
Imagine a cube with six faces, and each face is a chess board with 64 squares. That gives ISAC 384 cells to think with. Different chess pieces represent different ways of thinking:
The Pawn handles the simple stuff. You say “hey, what’s the weather?” – that’s a Pawn job. Quick, cheap, done.
The Knight is the creative thinker. Need a brainstorm? Want a fresh angle on something? The Knight jumps sideways, just like in chess.
The Bishop is the analyst. Give it a comparison or a step-by-step problem and it breaks things down methodically.
The Rook is the doer. It executes tools – runs code, searches the web, manages files. Straight lines, gets the job done.
The Queen is the heavy hitter. Complex research, multi-part answers, pulling together information from everywhere. Expensive to run, but worth it when you need depth.
The King is the boss. He doesn’t do the thinking himself – he validates everyone else’s work. Is this answer safe? Is this tool call appropriate? Should I approve this action? The King has the final say on everything.
When you ask ISAC something, the Cube looks at your question, figures out how complex it is, picks the right piece (or combination of pieces), allocates a budget of cells on the board, and only then starts thinking. Simple question? One Pawn, one cell, instant response. Complex question? Maybe a Rook gathers data, a Bishop analyses it, and a Queen synthesises the final answer – across multiple faces of the Cube.
It’s not just a metaphor. It’s an actual resource management system that decides how much computing power to spend on each question.
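To make that concrete, here's a toy sketch of the routing idea in Python. The keyword heuristics, cell budgets, and function names are invented for illustration, not ISAC's actual code, but they show the shape of it: classification is cheap pure logic, and no model is touched until a piece and budget are chosen.

```python
# Toy sketch of complexity-based piece routing. The heuristics and
# budgets mirror the article's description, not ISAC's real code.

PIECES = {
    "trivial":    ("Pawn",   1),   # piece, cell budget
    "creative":   ("Knight", 8),
    "analytical": ("Bishop", 12),
    "tool":       ("Rook",   8),
    "complex":    ("Queen",  24),
}

def classify(query: str) -> str:
    q = query.lower()
    if any(w in q for w in ("research", "synthesise", "comprehensive")):
        return "complex"
    if any(w in q for w in ("delete", "execute", "scan", "run ")):
        return "tool"
    if any(w in q for w in ("compare", "analyse", "step-by-step")):
        return "analytical"
    if any(w in q for w in ("brainstorm", "ideas", "creative")):
        return "creative"
    return "trivial"

def route(query: str) -> dict:
    piece, budget = PIECES[classify(query)]
    return {"piece": piece, "cells": budget, "query": query}

print(route("hey, what's the weather?"))
# routes to the Pawn with a one-cell budget, before any model call
```

A real router would use a trained classifier rather than keyword matching, but the principle is the same: the expensive model only sees queries after this free triage step.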
How a 3GB Model on the Cube Can Compete with GPT-5.2 and Claude Opus 4.6
This is the part that surprises people. GPT-5.2 runs on models with hundreds of billions of parameters, spread across data centres filled with thousands of GPUs drawing megawatts of power. Claude Opus 4.6 is the same story. These are some of the most powerful AI systems ever built. How could a 3GB model running on a laptop possibly compete?
The answer is: it doesn’t have to be smarter. It just has to waste less.
The Dirty Secret of Big Models
When you ask GPT-5.2 a question, the model has to do everything internally. Before it even starts generating your answer, it has to:
- Figure out what kind of question you’re asking
- Decide how to approach it
- Determine whether tools are needed
- Plan the structure of its response
- Evaluate whether its answer is safe
- Maintain context from your conversation history
- Format the output appropriately
And then – after all of that – it generates the actual answer.
All of that happens inside one enormous model. The routing, the safety checks, the memory, the planning – it’s all baked into those hundreds of billions of parameters. A huge portion of that model’s capacity isn’t generating your answer at all. It’s doing the cognitive overhead around your answer. Think about that. You’re paying for a model the size of a building, and a significant chunk of it is just figuring out what you asked and how to structure the reply.
This isn’t a criticism of those models – they’re remarkable engineering. But it’s an architectural inefficiency. Every query, no matter how simple, activates the full weight of the system.
What the Cube Does Instead
The GCF strips all of that overhead out of the model’s job and handles it externally, before the model is ever called.
By the time the language model receives a prompt from the Cube, the hard decisions have already been made. The Cube has already classified the query. Already allocated a budget. Already selected the right reasoning strategy. Already determined which tools are needed. Already set the safety boundaries. Already pulled relevant memories. The model’s only remaining job is to generate high-quality text within a narrow, well-defined scope.
The model doesn’t think about thinking. The Cube thinks about thinking. The model just thinks.
Example 1: Research Task
You ask: “Research the pros and cons of electric vehicles and give me a structured comparison.”
What GPT-5.2 does: The entire model activates. Hundreds of billions of parameters fire to classify this as a research task, decide on a comparison format, recall information about EVs, structure the output, check safety, and generate – all in one monolithic pass through one model. It works. But it’s like driving a lorry to the corner shop.
What ISAC does with a 3GB model:
- The Cube’s router classifies this as a complex analytical query – no model needed yet, this is pure logic running on the CPU.
- The budget system allocates cells across two faces – still no model call.
- The strategy phase selects a three-piece pipeline: Rook (gather data) → Bishop (analyse) → Queen (synthesise) – still no model.
- The Rook executes with a focused prompt: “List the key advantages and disadvantages of electric vehicles with supporting data.” The 3GB model generates a clean factual list. That’s all it has to do. Not classify. Not strategise. Not format. Just generate facts within a tight scope.
- The Bishop executes with a different focused prompt: “Given these pros and cons, provide a structured analytical comparison with clear categories.” It receives the Rook’s output as context. Narrow scope again.
- The Queen synthesises: “Combine this analysis into a comprehensive recommendation with a clear conclusion.” Final pass, clear objective.
- The King validates the complete output for quality and safety.
Three model calls, each one simple and focused. The 3GB model never had to figure out what kind of question it was answering. Never had to decide on a strategy. Never had to evaluate its own safety. The Cube did all of that for free – no GPU needed, no tokens burned.
The end result? A structured, multi-perspective analysis that reads like it came from a much larger model, because the structure came from the Cube and the content came from a model that was only asked to do what it’s good at.
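The three-call pipeline above can be sketched as plain sequential code. The prompts are taken from the steps described; `generate` is a stub standing in for a call to the local 3GB model, so the chaining logic can run anywhere:

```python
# Illustrative Rook -> Bishop -> Queen pipeline. `generate` is a
# placeholder for local model inference; each step gets a narrow
# prompt plus the previous step's output as context.

def generate(prompt: str) -> str:
    # Stand-in for a call to the local model.
    return f"[model output for: {prompt[:40]}...]"

def run_pipeline(topic: str) -> str:
    facts = generate(            # Rook: gather data
        f"List the key advantages and disadvantages of {topic} "
        "with supporting data."
    )
    analysis = generate(         # Bishop: analyse
        "Given these pros and cons, provide a structured analytical "
        f"comparison with clear categories.\n\n{facts}"
    )
    answer = generate(           # Queen: synthesise
        "Combine this analysis into a comprehensive recommendation "
        f"with a clear conclusion.\n\n{analysis}"
    )
    return answer

print(run_pipeline("electric vehicles"))
```

The point of the structure: each prompt is self-contained and narrow, so a small model never has to hold the whole task in its head at once.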
Example 2: Quick Factual Question
You ask: “What’s the capital of Japan?”
What GPT-5.2 does: The full model activates. Hundreds of billions of parameters engage to process a question that needs one word. It’s correct, obviously. But it’s absurd overkill.
What ISAC does: The router identifies this as a trivial factual query in microseconds. One Pawn. One cell. The 3GB model gets a razor-sharp prompt and returns “Tokyo” almost instantly. Total cost: negligible. The Cube spent less energy on this than GPT-5.2 spent deciding how to format the response.
For simple queries, the Cube doesn’t just match the big models – it’s faster and dramatically more efficient. No wasted compute. No overhead.
Example 3: Tool Use with Safety
You ask: “Run a system scan and delete any temporary files over 30 days old.”
What GPT-5.2 does: The model decides whether to use tools, generates a tool call, and relies on internal guardrails to decide whether this is safe. Those guardrails are baked into the model’s weights – you can’t inspect them, you can’t audit them, and you can’t customise them. If the model hallucinates a wrong file path, there’s no second check.
What ISAC does: The Cube routes this to the Rook (tool execution piece). Before the Rook can touch anything, three separate safety layers fire in sequence:
- The role gate checks whether the current user has permission to execute system tools. If you’re in Normal mode, this is denied before anything else happens.
- The tool governance layer classifies file deletion as a Dangerous operation.
- The King piece evaluates the specific request: “Is deleting files older than 30 days from the temp directory reasonable and safe?” The King uses the LLM to reason about this, but only about the safety question – not about how to do it.
Only after all three layers approve does the Rook execute. Every step is logged to an audit trail. If something goes wrong, you can see exactly what was approved, by which layer, and why.
This isn’t something a big model can replicate internally. The governance is structural – it’s built into the architecture, not trained into weights that might drift or be jailbroken.
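The layered gate is easy to express in code precisely because it's structural. Here's a minimal sketch; the role names, the tool tiers, and the King's hard-coded rule (standing in for what would be a focused LLM call) are all simplifications invented for illustration:

```python
# Toy three-layer approval chain for a dangerous tool call.
# Every decision is appended to an audit trail before execution.

DANGEROUS = {"delete_files", "format_disk"}

def role_gate(user_role: str) -> bool:
    # Layer 1: only Developer mode may execute system tools.
    return user_role == "developer"

def governance_tier(tool: str) -> str:
    # Layer 2: classify the operation.
    return "dangerous" if tool in DANGEROUS else "safe"

def king_validates(tool: str, args: dict) -> bool:
    # Layer 3: in ISAC this is a focused LLM call about safety only.
    # A hard-coded rule stands in for it here.
    return tool == "delete_files" and args.get("older_than_days", 0) >= 30

def log(audit: list, approved: bool) -> bool:
    print(("APPROVED" if approved else "DENIED"), audit)
    return approved

def approve(user_role: str, tool: str, args: dict) -> bool:
    audit = []
    if not role_gate(user_role):
        audit.append(("role_gate", "denied"))
        return log(audit, False)
    audit.append(("role_gate", "ok"))
    tier = governance_tier(tool)
    audit.append(("governance", tier))
    if tier == "dangerous" and not king_validates(tool, args):
        audit.append(("king", "denied"))
        return log(audit, False)
    audit.append(("king", "ok"))
    return log(audit, True)

approve("developer", "delete_files", {"older_than_days": 30})  # approved
approve("normal", "delete_files", {"older_than_days": 30})     # denied at the role gate
```

Because each layer is ordinary code rather than model weights, you can read it, test it, and change its thresholds, which is the whole argument for structural governance.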
Example 4: Multi-Turn Conversation with Memory
You’ve been discussing a project for twenty minutes. You say: “Based on everything we’ve talked about, what are the three biggest risks?”
What GPT-5.2 does: The model processes your entire conversation history in its context window. For a long conversation, this can be thousands of tokens – all fed through hundreds of billions of parameters just to maintain context. The model has to re-read and re-process everything to figure out what “everything we’ve talked about” means.
What ISAC does: The Cube’s memory system has been tagging and scoring every exchange throughout the conversation. Important points got high importance scores. Trivial small talk got low scores and has already faded. The Cube assembles a focused context package – just the high-importance memories relevant to the current query – and hands it to a Bishop piece with a clear instruction: “Given this project context, identify the three most significant risks.”
The 3GB model receives a clean, pre-filtered, pre-organised context. Not the entire conversation. Just the bits that matter. It produces a focused, relevant answer because it was given focused, relevant input. The memory system did the hard work of deciding what matters – something the big models have to do internally, burning tokens and compute.
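Here's a rough sketch of that importance-scored memory idea. The scores, the decay rate, and the drop threshold are made-up numbers for the sketch, not ISAC's actual tuning:

```python
# Toy importance-scored memory store: important memories persist,
# trivial ones fade, and the model only ever sees the top few.

from dataclasses import dataclass, field

@dataclass
class Memory:
    text: str
    importance: float  # 0.0 (trivial) to 1.0 (critical)

@dataclass
class MemoryStore:
    memories: list = field(default_factory=list)

    def add(self, text: str, importance: float) -> None:
        self.memories.append(Memory(text, importance))

    def decay(self, rate: float = 0.9) -> None:
        # Each cycle, everything fades a little; memories that fall
        # below the threshold are forgotten entirely.
        self.memories = [
            Memory(m.text, m.importance * rate)
            for m in self.memories
            if m.importance * rate > 0.1
        ]

    def context_package(self, top_k: int = 3) -> list:
        # Hand the model only the highest-importance memories.
        ranked = sorted(self.memories, key=lambda m: -m.importance)
        return [m.text for m in ranked[:top_k]]

store = MemoryStore()
store.add("Project deadline is 1 June", 0.9)
store.add("Budget capped at £10k", 0.8)
store.add("User prefers concise answers", 0.7)
store.add("Chatted about the weather", 0.1)
store.decay()
print(store.context_package())
# the weather small talk has already faded out of the package
```

A production version would score importance with the model itself and persist to disk, but the economics are visible even in the toy: the context handed to the model stays small no matter how long the conversation runs.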
Example 5: Self-Correction
You ask something ambiguous and ISAC’s first answer isn’t great.
What GPT-5.2 does: You tell it the answer was wrong. The model tries again with the same approach, hoping for different results. There’s no mechanism for it to choose a fundamentally different reasoning strategy.
What ISAC does: The Cube tracks confidence scores. If the King’s validation flags low confidence, or if you push back, the Cube can re-route the query to a completely different piece. First attempt was a Pawn (too shallow)? The Cube escalates to a Bishop for analytical depth. Bishop’s analysis was too rigid? Re-route to a Knight for creative lateral thinking. Each retry is a genuinely different reasoning strategy, not just the same model trying harder.
In cross-check mode, the Cube goes even further: it sends the query to two different models simultaneously and has the King resolve any disagreements. The big cloud models can't do this, because each of them is a single monolithic model. ISAC can, because the Cube is the decision-maker, not the model.
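The escalation ladder can be sketched in a few lines. The piece ordering and confidence threshold are illustrative assumptions, and the runner is a stub in place of real piece execution:

```python
# Toy escalation ladder: when confidence is low, re-route to a
# genuinely different reasoning strategy rather than retrying the
# same one. Threshold and ordering are invented for the sketch.

ESCALATION = {
    "Pawn": "Bishop",    # too shallow -> analytical depth
    "Bishop": "Knight",  # too rigid -> lateral thinking
    "Knight": "Queen",   # still unsatisfying -> full synthesis
}

def answer_with_retries(query, run_piece, threshold=0.6, start="Pawn"):
    piece = start
    while True:
        result, confidence = run_piece(piece, query)
        if confidence >= threshold or piece not in ESCALATION:
            return piece, result
        piece = ESCALATION[piece]  # a different strategy, not "try harder"

# Stub runner: pretend only the Bishop is confident on this query.
def fake_runner(piece, query):
    return f"{piece} answer", (0.8 if piece == "Bishop" else 0.3)

print(answer_with_retries("ambiguous question", fake_runner))
# escalates Pawn -> Bishop and stops there
```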
The Numbers
Here’s how the resource comparison actually looks:
| Aspect | GPT-5.2 / Claude Opus 4.6 | ISAC + 3GB Model on GCF |
|---|---|---|
| Model size | Hundreds of billions of parameters | 1.5–3 billion parameters |
| Hardware required | Data centre GPU clusters | A laptop |
| Internet required | Yes, always | No, never |
| Monthly cost | £20/month subscription or API fees | Free forever |
| Privacy | Your data goes to external servers | Everything stays on your machine, encrypted |
| Routing intelligence | Internal – baked into model weights, invisible | External – Cube framework, zero model cost, fully visible |
| Safety governance | Internal – model-level guardrails, not auditable | External – three independent layers, fully auditable |
| Strategy selection | Implicit – model decides internally, you can’t see how | Explicit – visible on the board, you can watch it think |
| Tool execution | Model requests tools, single safety check | Rook piece with role gate + governance tier + King approval |
| Self-correction | Same model tries again the same way | Re-routes to a different reasoning strategy entirely |
| Multi-model verification | Not possible – single model | Built-in cross-check with King resolution |
| Quality on focused tasks | Excellent | Comparable or better – model only handles what it’s good at, framework handles everything else |
| Quality on open-ended creative tasks | Excellent | Lower – honest about this, raw model size still wins here |
| Quality on multi-step research | Excellent | Strong – piece pipeline breaks complex tasks into focused steps |
| Transparency | Black box – you see the answer, nothing else | Full visibility: which piece, which face, what budget, what confidence, full audit trail |
What You Actually Need to Run ISAC
This is where it gets really interesting for normal people. Let’s talk about what these systems actually need to run.
What GPT-5.2 and Claude Opus 4.6 need:
These models run on data centres. We’re talking thousands of NVIDIA A100 or H100 GPUs, each one costing £25,000+. Racks and racks of servers. Industrial cooling systems. Megawatts of electricity. Hundreds of gigabytes of VRAM just to load the model into memory. The infrastructure behind a single ChatGPT conversation costs millions of pounds to build and thousands per day to operate. You never see this because you access it through a website – but it’s there, and someone’s paying for it. That someone is you, through your subscription.
What ISAC needs:
- RAM: 8GB minimum. That’s it. A basic laptop from 2018 has 8GB. The 3GB model fits in memory alongside Windows and the Electron frontend with room to spare. With 16GB you’re comfortable. With 32GB you could run multiple models simultaneously.
- CPU: Any modern processor. The Cube’s routing, governance, and board management run on CPU and they’re lightweight – classification and strategy selection happen in microseconds. Even a budget laptop CPU handles this without breaking a sweat.
- GPU: Optional. Genuinely optional. A GPU accelerates model inference (faster responses), but ISAC runs perfectly well on CPU only. A laptop with integrated graphics works. If you have a dedicated GPU – even an older one with 4GB VRAM – inference speeds up dramatically. You do not need a £1,500 graphics card.
- Storage: The 3GB model plus the entire framework fits in under 5GB of disk space. That’s smaller than most games.
- Internet: None. Zero. Not for setup, not for operation, not ever (unless you choose to use web search tools).
Let me put this in perspective:
| Resource | GPT-5.2 Infrastructure | ISAC on a Laptop |
|---|---|---|
| RAM | Hundreds of GB of VRAM across GPU clusters | 8GB system RAM (minimum) |
| GPUs | Thousands of A100/H100 cards | None required (optional, any will do) |
| CPUs | Server-grade multi-socket systems | Any modern laptop processor |
| Storage | Petabytes across distributed systems | Under 5GB |
| Power consumption | Megawatts | Your laptop charger |
| Cost to build | Millions of pounds | You already own the hardware |
| Cost to run | Thousands per day in electricity and cooling | Pennies in electricity |
| Internet | Required for every single query | Never needed |
The Cube’s architecture is what makes this possible. Because the framework handles routing, governance, memory, and strategy on the CPU – which costs almost nothing computationally – the model only activates for actual text generation. And because each model call is narrow and focused (thanks to the piece system), the model runs fast even on modest hardware. A Pawn query on a laptop CPU returns in under a second. A full Queen pipeline might take a few seconds. That’s it.
You could run ISAC on the laptop you’re reading this on right now.
Where ISAC Wins
Focused tasks. When the question has a clear shape – research, analysis, comparison, tool execution, factual retrieval, structured output – the Cube’s ability to decompose the task, select the right strategy, and give the model a narrow focused scope produces results that match or exceed what the big models produce. The model is smaller, but it’s being used better.
Efficiency. A Pawn answering “what’s the capital of Japan?” uses a fraction of the compute that GPT-5.2 burns on the same question. Over hundreds of queries a day, this adds up to a massive difference in energy, cost, and hardware requirements.
Safety and governance. Three independent, auditable safety layers that you control, running locally, with no possibility of cloud outage or policy change breaking your system. You can inspect every decision. You can customise every threshold. The big models give you a black box and ask you to trust them.
Privacy. Nothing ever leaves your machine. Full stop. No terms of service that might change. No data being used to train the next version. No API logs on someone else’s server. For sensitive work, this isn’t a nice-to-have – it’s a requirement.
Transparency. You can literally watch ISAC think. The 3D cube lights up, the board shows which cells are active, you can see which piece was chosen, what confidence it had, and what the King thought of the result. Try asking GPT-5.2 which internal layer decided how to approach your question. You can’t. It doesn’t know.
Where the Big Models Still Win
I want to be completely honest about this, because overclaiming would undermine the real advantages.
Pure creative generation. If you want a 5,000-word short story with rich character development and surprising plot twists, a 3GB model is not going to match Claude Opus 4.6. Raw creative language generation benefits from raw model size in ways that the framework can’t fully compensate for. The Cube can help with structure and planning, but the prose itself comes from the model, and bigger models write better prose.
Extremely broad knowledge. Larger models have been trained on more data and can recall more obscure facts. A 3GB model has a narrower knowledge base. The Cube can partially compensate by routing to tools (web search, databases), but for pure recall without tools, the bigger model knows more.
Nuanced multi-language work. Large models handle translation, code-switching, and multilingual tasks better because they’ve seen more multilingual data. A compact model fine-tuned for the GCF would likely be English-focused initially.
Long-form generation in a single pass. The Cube excels at breaking tasks into pieces, but sometimes you want one long, flowing, coherent piece of text generated in a single pass. Larger models maintain coherence over longer generations better than small models.
Now Imagine GPT-5.2 Running on the Cube
Here’s the thought that should really make you sit up. Everything I’ve described so far is about a tiny 3GB model competing with the big models. But the Cube is model-agnostic. It doesn’t care what model sits underneath. So what happens if you put a big model inside the Cube?
Right now, when you use ChatGPT, every single query – whether it’s “hi” or “write me a 50-page business plan” – goes through the same pipeline. The full model activates. Every parameter fires. OpenAI’s servers burn the same resources whether you’re asking for the time or asking it to solve differential equations. There’s no triage. No budgeting. No strategy selection. Just: receive query, activate everything, generate response, hope for the best.
Now imagine OpenAI ran GPT-5.2 through the Cognitive Cube instead.
A “hey, how’s it going?” message: The Cube classifies this as trivial in microseconds. Routes to Pawn. The Pawn sends a minimal prompt to GPT-5.2 with a tight scope: “Generate a brief friendly greeting.” GPT-5.2 produces two sentences and stops. Total GPU time: a fraction of what it currently uses. Multiply that saving across the millions of casual messages ChatGPT receives every hour, and you’re looking at an enormous reduction in compute costs.
A research request: The Cube decomposes it into a Rook → Bishop → Queen pipeline. Three focused calls to GPT-5.2 instead of one massive open-ended one. Each call has a clear objective and a tight scope. The model generates better output because it’s not trying to do everything at once – and the total token count is often lower because each step is efficient.
A dangerous tool call: Instead of GPT-5.2 internally deciding whether a tool is safe (using the same weights it uses for everything else), the Cube’s three-layer governance fires. Role check. Tool classification. King validation as a separate focused call: “Is this specific action safe?” Structured safety, not statistical safety.
Scaling across a data centre: Right now, every ChatGPT user gets the same treatment – full model, every query. With the Cube, 70% of queries (greetings, simple questions, factual lookups) route to Pawn and consume a fraction of the resources. 20% route to mid-tier pieces. Only 10% – the genuinely complex requests – activate the full Queen pipeline. The same data centre, the same hardware, the same model – but serving three to five times more users because the Cube prevents waste.
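A back-of-envelope check of that triage claim: assume (and these per-tier cost fractions are pure assumptions for the sketch, not measured numbers) that a Pawn pass costs 5% of a full model activation and a mid-tier pass costs 40%.

```python
# Back-of-envelope capacity check for the 70/20/10 triage split.
# The relative GPU-cost fractions are assumptions, not measurements.

mix = {              # (share of queries, assumed cost vs a full pass)
    "pawn":  (0.70, 0.05),
    "mid":   (0.20, 0.40),
    "queen": (0.10, 1.00),
}

avg_cost = sum(share * cost for share, cost in mix.values())
print(f"average cost per query: {avg_cost:.3f} of a full pass")
print(f"capacity multiplier:    {1 / avg_cost:.1f}x")
```

Under these made-up fractions the average query costs 0.215 of a full pass, a multiplier of roughly 4.7×, which is consistent with the 3–5× range estimated below.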
The numbers would be staggering:
| Metric | GPT-5.2 Current | GPT-5.2 on the Cube |
|---|---|---|
| Simple query cost | Full model activation | Pawn – minimal activation |
| Complex query cost | Full model activation | Piece pipeline – focused activation |
| GPU utilisation | Same for every query | Scaled to actual complexity |
| Queries per GPU per hour | Fixed | 3–5× higher (estimated) |
| Energy per query | Constant regardless of complexity | Proportional to actual need |
| Safety checking | Internal, statistical, invisible | External, structural, auditable |
| Capacity planning | Linear – more users = more GPUs | Intelligent – Cube triages before GPU |
This isn’t theoretical. The architecture exists. It’s running. It just happens to be running a 3GB model on a laptop right now instead of GPT-5.2 in a data centre. But the Cube doesn’t know the difference. Swap the model, and every efficiency gain scales up.
If a company like OpenAI or Anthropic ran their models through an architecture like the Cube, they could serve dramatically more users on the same hardware, reduce their energy consumption significantly, provide transparent and auditable safety governance instead of black-box guardrails, and offer their users genuine visibility into how their queries are being processed.
The Cube doesn’t just help small models punch above their weight. It makes big models stop wasting theirs.
The Real Point
The Cube doesn’t try to beat the big models at everything. That would be dishonest and architecturally impossible at 3GB.
What the Cube does is change where the intelligence lives. Instead of needing a bigger brain, ISAC uses a smarter process. For the 80% of tasks that have structure – questions, research, analysis, tool use, comparisons, instructions – the Cube lets a small model deliver results in the same league as models 100 times its size, running on your own hardware, for free, with full privacy, full transparency, and full control.
And for the 20% where raw model size genuinely matters? The Cube’s model-agnostic design means you can swap in a bigger model when you need one. Use the compact model for daily work. Swap to a larger model for creative projects. The Cube doesn’t care – it manages the reasoning the same way regardless of what model sits underneath.
That’s the real advantage. Not that a 3GB model is secretly as good as GPT-5.2. It’s that the Cube makes a 3GB model good enough for most real work, while giving you things the big models fundamentally cannot: privacy, transparency, control, and zero cost.
What Can ISAC Actually Do?
Here’s a quick rundown of what’s built and working:
He talks. ISAC has a voice – push-to-talk, speech recognition, text-to-speech. You can have a conversation with him out loud.
He remembers. Conversations are saved with tags and importance scores. He has a knowledge graph that maps relationships between things you’ve discussed. Important memories stick around; trivial ones fade over time, just like a real brain.
He’s secure. ISAC has a full role-based access system. By default he’s locked down – you can only chat and use basic features. To unlock the developer tools, you need a cryptographically signed access code that expires after a set time. It’s the same kind of security used in professional systems, but running entirely on your machine.
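The general pattern behind a signed, expiring access code is worth seeing. This is a minimal standard-library sketch of the idea (sign the payload plus an expiry, verify both the signature and the clock); ISAC's actual scheme may differ, and the secret and field layout here are invented:

```python
# Toy time-limited, HMAC-signed access code using only the stdlib.
import hmac, hashlib, time, base64

SECRET = b"machine-local-secret"  # never leaves the device

def issue_code(role: str, ttl_seconds: int = 3600) -> str:
    expiry = str(int(time.time()) + ttl_seconds)
    payload = f"{role}:{expiry}".encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(payload + b":" + sig.encode()).decode()

def verify_code(code: str):
    payload = base64.urlsafe_b64decode(code.encode())
    role, expiry, sig = payload.rsplit(b":", 2)
    expected = hmac.new(SECRET, role + b":" + expiry, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig.decode(), expected):
        return None  # tampered with
    if int(expiry) < time.time():
        return None  # expired
    return role.decode()

code = issue_code("developer", ttl_seconds=60)
print(verify_code(code))  # "developer" while the code is still valid
```

The useful property: verification needs no network and no database, just the local secret and the system clock, which fits an offline-first design.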
He watches his own health. Circuit breakers detect when something’s going wrong and protect the system. Performance metrics track how fast he’s responding. Health monitoring checks that all his components are running properly.
He can swap brains. ISAC can switch between different AI models on the fly without restarting. Want a smaller, faster model for quick chats and a bigger, smarter one for research? Swap at any time.
He can double-check himself. There’s a cross-check mode where ISAC sends the same question to two different AI models and compares the answers. If they disagree, the King piece decides which one is right.
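The cross-check flow is simple enough to sketch directly. Both model calls are stubs here, and the King's tie-break rule is a placeholder for what would be a focused LLM call:

```python
# Toy cross-check: send one query to two models; if they disagree,
# a focused King call resolves it. All model calls are stubbed.

def model_a(q: str) -> str:
    return "Tokyo"

def model_b(q: str) -> str:
    return "Tokyo"

def king_resolve(query: str, a: str, b: str) -> str:
    # Placeholder: in ISAC this is a separate focused LLM call.
    return a

def cross_check(query, m1, m2):
    a, b = m1(query), m2(query)
    if a == b:
        return a, "agreed"
    return king_resolve(query, a, b), "king-resolved"

print(cross_check("What's the capital of Japan?", model_a, model_b))
```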
He has a 3D visualisation. The Electron desktop app shows a rotating 3D cube that lights up in real time as ISAC thinks – you can actually see which faces are active, which pieces are working, and how much of the budget has been used.
He’s self-contained. The whole thing packages into an installer. Double-click, install, run. No Python knowledge required, no command line, no cloud account.
The Project in Numbers
For anyone curious about the scale of this project:
- 18 months of development
- 30+ implementation phases completed
- ~17,000 lines of code across the backend and frontend
- 384 cognitive cells across the Cube’s 6 faces
- 33 distinct permissions in the security system
- 23 unit tests in the hardening suite alone
- 6 chess pieces mapped to reasoning strategies
- 1 developer – just me
What’s Next
The prototype is done and working. The next step is migrating the whole system from Python (great for prototyping) to Rust (great for performance and safety). This will make ISAC faster, more efficient, and deployable on everything from powerful desktops down to laptops and smaller devices.
I’m also working on giving ISAC his own custom voice – not a stock text-to-speech voice, but one designed specifically for his character. Calm, precise, and distinctly his.
And the long-term goal? That compact model I described above. A purpose-built AI brain designed specifically for the Cube architecture, where the framework and the model work together as one system rather than fighting each other. A 3GB brain with a world-class coach.
A Cube Within a Cube
This is the idea that keeps me up at night. And honestly, it might be the most important thing the GCF could ever do.
Right now, ISAC has one Cube. Six faces, 384 cells, six piece types. It handles one query at a time through a single lifecycle pipeline. That’s already powerful enough to let a 3GB model compete with systems a hundred times its size. But what happens when a problem is so big, so complex, so layered that even a Queen piece across multiple faces isn’t enough?
You go deeper. You put a Cube inside the Cube.
How It Works
The concept is fractal reasoning. When the outer Cube encounters a task that exceeds what a single piece pipeline can handle, instead of just escalating to a bigger piece, it spawns an entirely new Cube instance as a child process. That inner Cube has its own six faces, its own 384 cells, its own piece selection, its own lifecycle governance. It runs a complete reasoning cycle on a sub-problem, returns its results to the outer Cube, and the outer Cube continues.
Think of it like management. The CEO (outer Cube) doesn’t try to do everything personally. They identify that a problem needs a whole department’s attention, spin up that department (inner Cube), let them work the problem with their own team structure, and receive the finished result. The CEO manages the strategy. The department manages the execution.
There’s already a seed of this in the current architecture. The Inner Cube Trigger (CB5) activates a deeper reasoning mode when confidence is low. But a full Cube-within-a-Cube takes that much further. Not just deeper thinking on the same problem, but a complete independent reasoning system working a sub-problem with its own strategy, its own budget, its own pieces.
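The recursive shape of the idea fits in a few lines. The task names, the decomposition plan, and the depth limit below are all invented for the sketch; real sub-problem discovery would come from the outer Cube's Queen, not a hand-written dictionary:

```python
# Toy fractal decomposition: a Cube spawns child Cubes for known
# sub-problems, up to a depth limit, then synthesises their results.

def solve(task: str, subtasks: dict, depth: int = 0, max_depth: int = 2) -> str:
    children = subtasks.get(task, [])
    if not children or depth >= max_depth:
        # Leaf: an ordinary piece pipeline handles it directly.
        return f"[{task}: solved by piece pipeline]"
    # Spawn an inner Cube per sub-problem, then synthesise.
    results = [solve(sub, subtasks, depth + 1, max_depth) for sub in children]
    return f"[{task}: synthesised from {len(results)} inner Cubes]"

plan = {
    "TB research": ["current treatments", "clinical trials", "comparison"],
    "comparison": ["risk analysis"],
}
print(solve("TB research", plan))
# the outer Cube reports a synthesis of three inner-Cube results
```

Each recursive call would, in the real system, carry its own budget, its own King validation, and its own audit trail, so oversight scales with depth rather than thinning out.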
What This Changes for Research
This is where it gets genuinely exciting. Imagine you ask ISAC to research a complex medical topic: “What are the current treatment options for drug-resistant tuberculosis, and which show the most promise based on recent clinical trials?”
With a single Cube, the Queen piece does its best in one pass. Good, but limited by the scope a single piece can cover.
With nested Cubes:
- The outer Cube’s Queen identifies three sub-problems: current treatments, clinical trial data, and comparative analysis.
- Each sub-problem spawns its own inner Cube.
- Inner Cube 1 runs a Rook (web search for current treatment protocols) followed by a Bishop (structured analysis of what it found). It returns a clean summary of existing treatments.
- Inner Cube 2 runs a Rook (search for recent clinical trials) followed by a Knight (creative pattern recognition across the trial data, spotting trends that a purely analytical approach might miss). It returns trial highlights with emerging patterns.
- Inner Cube 3 receives the outputs from the first two, runs a Bishop (comparative analysis) followed by a Queen (synthesis into a recommendation). It produces the final answer.
- The outer Cube’s King validates the complete result.
That’s three independent reasoning systems, each with their own strategy, working in parallel on different aspects of the same problem. The 3GB model was called maybe eight or nine times total, but each call was surgically focused. The combined output has the depth and coverage you’d expect from a much larger system because the architecture created that depth through structure, not through raw model size.
What This Changes for Everything Else
Research is just the obvious application. Nested Cubes could transform any task that benefits from decomposition:
Software development. Outer Cube receives “build me a web app with user authentication, a dashboard, and a payment system.” It spawns three inner Cubes. One handles the auth module. One handles the dashboard. One handles payments. Each inner Cube runs its own Rook (code generation) and Bishop (code review) pipeline. The outer Cube’s Queen integrates the three modules. You get a complete, reviewed, integrated application from a process that understood the architecture, not just the code.
Business strategy. “Analyse whether we should expand into the European market.” Outer Cube spawns inner Cubes for market analysis, regulatory landscape, competitive positioning, and financial modelling. Each one works independently with its own piece selection. The outer Queen synthesises a strategy document that covers all four dimensions with genuine depth in each one.
Education. “Explain quantum entanglement to me – I have a physics degree but haven’t studied quantum mechanics since university.” The outer Cube identifies the user’s level and spawns inner Cubes: one for the core concept (Bishop for structured explanation), one for the mathematical framework (Bishop again, different scope), one for real-world applications and recent experiments (Rook for research, Knight for creative analogies). The result isn’t just an explanation, it’s a multi-layered learning experience tailored to exactly the right level.
Creative projects. “Help me write a screenplay about a heist in space.” Outer Cube breaks it into world-building, character development, plot structure, and dialogue. Four inner Cubes, each with their own creative approach. The Knight gets heavy use here for lateral thinking. The outer Queen weaves the pieces into a coherent narrative. The result has more creative diversity because four independent reasoning processes contributed, each approaching their piece from a different angle.
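The decomposition pattern behind all four examples is the same: an outer Cube splits the task, inner Cubes work independently, and the outer Queen integrates. Here's a minimal sketch of that shape in Python – every name here (CognitiveCube, decompose, solve) is my own illustrative stand-in, not the actual GCF implementation:

```python
from dataclasses import dataclass, field

@dataclass
class CognitiveCube:
    """Illustrative nested-Cube sketch: an outer Cube decomposes a task,
    spawns inner Cubes for each sub-task, and integrates their results."""
    task: str
    depth: int = 0
    children: list = field(default_factory=list)

    def decompose(self, subtasks):
        # The outer Cube's Queen splits the task; each piece of work
        # gets its own inner Cube, one nesting level deeper.
        self.children = [CognitiveCube(t, depth=self.depth + 1) for t in subtasks]
        return self.children

    def solve(self):
        if not self.children:
            # Leaf Cube: reason about its own narrow sub-task.
            return f"[{self.task}: solved at depth {self.depth}]"
        # Each inner Cube reasons independently; the outer Queen integrates.
        parts = [child.solve() for child in self.children]
        return f"{self.task} <= " + " + ".join(parts)

outer = CognitiveCube("build web app")
outer.decompose(["auth module", "dashboard", "payments"])
print(outer.solve())
```

The point of the structure is that each inner Cube is a complete reasoning unit, so the same class recurses to any depth without special-casing.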
The Scaling Maths
A single Cube has 384 cells across its 6 faces. With one level of nesting, each of those 384 outer cells could theoretically spawn an inner Cube with its own 384 cells. That's 384 × 384 = 147,456 inner cells – a ceiling of 147,840 cognitive cells once you count the outer board itself – though in practice you'd never use anywhere near that many.
More realistically, a complex research task might use 3 to 5 inner Cubes, each using 10 to 20 cells. That’s a total of maybe 50 to 100 active cells across the nested structure. Still running on a laptop. Still using a 3GB model. But producing work with the depth and coverage of a system that would normally need a model orders of magnitude larger.
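The arithmetic is easy to check for yourself:

```python
CELLS = 6 * 64        # six faces x 64 squares = 384 cells per Cube

inner = CELLS ** 2    # theoretical ceiling: every outer cell spawns a full inner Cube
total = CELLS + inner # all the inner boards plus the outer board itself

print(inner)          # 147456 inner cells at one level of nesting
print(total)          # 147840 cells including the outer board

# A realistic complex task is far smaller, as described above:
realistic = 5 * 20    # 5 inner Cubes x 20 cells each
print(realistic)      # 100 active cells - still laptop-scale
```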
And because each Cube instance has its own governance, its own King validation, and its own audit trail, the safety architecture scales with the complexity. More Cubes doesn’t mean less oversight. It means more oversight, applied at exactly the right level.
Why This Hasn’t Been Done
The honest answer is that most AI development focuses on making bigger models. The assumption is that more parameters equals more capability, and the way to solve harder problems is to throw more compute at a bigger model. The idea that the orchestration layer could contribute as much as the model itself is relatively unexplored.
The GCF proves that orchestration matters. Nested Cubes take that proof and extend it recursively. If one layer of intelligent orchestration can make a 3GB model compete with GPT-5.2, what can two layers do? What about three?
I don’t know yet. But I intend to find out.
The Fourth Dimension
Here’s something that hadn’t fully clicked until I started thinking about nested Cubes. A Cube within a Cube isn’t just a software trick. It’s a tesseract. A four-dimensional hypercube.
Stay with me on this. The Cognitive Cube has three spatial dimensions. Six faces, 64 cells per face, pieces moving across the board. That’s a 3D reasoning space. But the moment you nest a Cube inside another Cube, you’ve added a fourth dimension: depth.
Think about how a tesseract works. In geometry, a tesseract is the four-dimensional analogue of a cube, and its standard 3D projection looks like a smaller cube floating inside a larger cube, with all the corresponding corners connected. If you’ve ever seen those wireframe diagrams of a hypercube, that’s exactly what nested Cubes look like when you map the reasoning structure. The outer Cube is the larger frame. The inner Cube is the smaller one inside it. The connections between them are the handoffs, the data flowing from outer to inner and back.
Now overlay them. If you project the inner Cube’s activity onto the outer Cube’s board, you get a 4D cognitive map. The X, Y, and Z axes are the spatial board (which face, which cell, which piece). The W axis is depth (which nesting level). A single query that spawns three inner Cubes creates a structure that exists in four-dimensional reasoning space. You can’t draw it on paper, but the data is there, and it’s real.
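That 4D coordinate and its "shadow" can be sketched in a few lines. The tuple layout below follows the axes described above (face, cell, piece, plus W for depth); the function names are my own illustration, not the GCF's API:

```python
# A point in the 4D reasoning space: (face, cell, piece) is the 3D board
# position, and depth is the W axis - which nesting level the activity is on.
def make_point(face, cell, piece, depth):
    return (face, cell, piece, depth)

def project_to_3d(point):
    """Projecting the 4D structure onto the outer board means dropping W -
    this is the 'shadow' you'd see on the spinning on-screen cube."""
    face, cell, piece, _depth = point
    return (face, cell, piece)

inner_activity = make_point(face=1, cell=27, piece="Knight", depth=2)
print(project_to_3d(inner_activity))  # (1, 27, 'Knight')
```

Two activations on different nesting levels can project to the same 3D cell, which is exactly why the visible cube can only ever show a flattened view of what the nested structure is doing.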
Why does this matter? Because the fourth dimension isn’t just a mathematical curiosity here. It’s functional. Each layer of nesting adds a dimension of cognitive capability that the previous layer couldn’t access. A single Cube can think broadly across its six faces. A nested Cube can think broadly AND deeply at the same time. The outer Cube provides strategic width. The inner Cube provides tactical depth. Together they occupy a reasoning space that neither could reach alone.
If you visualise ISAC’s 3D cube spinning on screen while an inner Cube runs, you’re watching a shadow of a four-dimensional process. The spinning cube you can see is the 3D projection of a 4D reasoning structure. The inner Cube’s activity exists in a dimension you can’t directly render, but you can see its effects when the results flow back to the outer Cube and cells light up with answers that have more depth than a 3D process could produce.
This could genuinely contribute to how we think about higher-dimensional computation. Most 4D discussions in computing are abstract, mathematical, theoretical. The GCF’s nested Cubes would be a working, practical, observable example of four-dimensional information processing. You could log every cell activation across every nesting level, plot them in 4D space, and study the patterns. How does reasoning propagate through the fourth dimension? Do certain types of problems create recognisable 4D shapes? Does the optimal nesting depth correlate with problem complexity in a predictable way?
Nobody has this data because nobody has built this system. The GCF with nested Cubes wouldn’t just be a better AI framework. It would be a laboratory for studying four-dimensional cognitive structures using real reasoning data from real tasks.
A Testing Ground for Science
Everything I’ve described so far has a practical purpose. Making AI more efficient. Running on less hardware. Better safety. Smarter reasoning. But there’s another side to this that I think about a lot, and it’s the side that could matter far more than any product.
The GCF is a testable cognitive architecture. Not theoretical. Not a whiteboard sketch. A working system that produces observable, measurable, loggable data about how structured reasoning behaves under different conditions. Every cell activation, every piece selection, every handoff, every confidence score, every governance decision, every nesting event is recorded. That’s not just an AI feature. That’s a scientific instrument.
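To make that concrete, here's one plausible shape for a single audit-log entry – the field names and event types are my own sketch of the kind of record described above, not the GCF's actual schema:

```python
import json
import time

def log_event(event_type, face, cell, piece, depth, confidence=None):
    """Serialise one observable moment of reasoning as a JSON line.
    Hypothetical event types: activation, handoff, governance, nesting."""
    entry = {
        "ts": time.time(),        # when it happened
        "event": event_type,      # what kind of moment this was
        "face": face,             # where on the Cube
        "cell": cell,
        "piece": piece,           # which thinking style was in play
        "depth": depth,           # nesting level (0 = outer Cube)
        "confidence": confidence, # None for events that carry no score
    }
    return json.dumps(entry)

line = log_event("activation", face=3, cell=12, piece="Bishop",
                 depth=1, confidence=0.82)
print(line)
```

An append-only file of lines like this is what turns the system into an instrument: every run becomes a dataset you can replay and analyse later.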
In the right research environment, this could be genuinely groundbreaking.
What Researchers Could Study
Emergent behaviour in structured cognition. When you give a system rigid rules (pieces, faces, budgets, lifecycles) and let it process thousands of queries, patterns emerge that nobody designed. Which pieces get selected together most often? Do certain face combinations produce higher confidence than others? Does the board develop hot spots? These emergent patterns aren’t programmed. They arise from the interaction between structure and use. Studying them could tell us something fundamental about how cognition organises itself when given a framework to operate within.
The relationship between structure and intelligence. The GCF proves that you can produce intelligent-seeming behaviour from a small model by wrapping it in the right structure. The model doesn’t get smarter. The architecture makes it appear smarter by using it better. This raises a genuinely important question: how much of what we call intelligence is the raw processing power, and how much is the organisational structure around it? The GCF gives researchers a controlled environment to test this. You can hold the model constant and change the structure. Hold the structure constant and change the model. Measure the difference. Quantify how much intelligence comes from architecture versus raw capability.
Four-dimensional information flow. With nested Cubes, every reasoning process generates 4D data. Researchers could map how information propagates through the fourth dimension (nesting depth), study whether 4D reasoning structures have properties that 3D structures don’t, and look for mathematical relationships between problem complexity and optimal dimensionality. This has never been possible with a practical system before.
Cognitive scaling laws. The AI industry has scaling laws for models: more parameters generally equals better performance, up to a point. But nobody has scaling laws for cognitive architecture. Does doubling the number of faces improve reasoning? Does tripling the cells per face matter more than adding nesting depth? Is there an optimal ratio between board size and model size? The GCF could produce the first empirical data on how reasoning frameworks scale independently of model size.
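The experiments above all follow the same pattern: hold everything constant except one variable, then sweep it. A hedged sketch of what that experiment grid could look like – the model names, face counts, and metrics here are placeholder assumptions:

```python
from itertools import product

# Hypothetical sweep: vary one architectural dimension at a time
# while holding the query set fixed.
models = ["3gb-local", "7gb-local"]
face_counts = [4, 6, 8]
nesting_depths = [0, 1, 2]

grid = list(product(models, face_counts, nesting_depths))
print(len(grid))  # 18 configurations

def run_config(model, n_faces, depth):
    """Placeholder: run the fixed query set under one configuration
    and return the logged metrics for later analysis."""
    return {"model": model, "faces": n_faces, "depth": depth,
            "confidence": None, "cells_used": None}

results = [run_config(*cfg) for cfg in grid]
```

Comparing rows of `results` that differ in exactly one column is what would let you attribute a performance change to architecture rather than to the model.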
The Simulation Theory Connection
This is where it gets philosophical. And I want to be careful here, because I’m a developer, not a physicist. But the parallels are hard to ignore.
Simulation theory asks a simple question: could our reality be a computed simulation running on some kind of substrate? The usual objection is resources. Simulating an entire universe down to quantum interactions would require unimaginable computational power. The assumption is that you’d need a computer the size of the thing you’re simulating, or close to it.
But the GCF challenges that assumption directly.
ISAC proves that a 3GB model, with the right cognitive architecture, can produce output that competes with systems hundreds of times larger. The framework handles the overhead. The model handles the content. The result is disproportionate to the raw resources involved. The Cube doesn’t simulate a bigger model. It achieves comparable results through smarter structure.
Now scale that principle up. If a small computational system with the right framework can produce behaviour that appears to come from a much larger system, what does that tell us about the relationship between a simulation’s apparent complexity and its actual computational cost? Could a universe that looks infinitely complex actually be running on something far simpler than we assume, if the orchestration layer is sophisticated enough?
The GCF gives us a concrete, testable example of this principle. A system where the architecture multiplies the apparent capability of the underlying compute. Where the output exceeds what the raw resources should be able to produce. Where structure creates the illusion of scale.
If you nest Cubes, the effect compounds. One layer of architecture makes a small model look big. Two layers make it look bigger. Three layers push into territory where the gap between actual resources and apparent capability becomes almost absurd. At some point, the question stops being “how efficient can we make AI?” and starts being “what does it mean that structure can substitute for scale to this degree?”
I’m not claiming the GCF proves we live in a simulation. That would be ridiculous. But I am saying that the GCF demonstrates a principle that simulation theory depends on: that the right architecture can produce output wildly disproportionate to the resources behind it. And the nested Cube structure gives researchers a practical, measurable, repeatable system to study exactly how far that principle extends.
Consciousness and Emergence
There’s one more thread here. The GCF’s chess pieces aren’t conscious. Obviously. They’re routing labels in a Python script. But when you watch the system operate, when you see it classify a query, select a strategy, execute across multiple faces, validate its own work, and course-correct when confidence is low, something interesting happens. The behaviour looks like thinking. Not because the model is thinking, but because the architecture creates a process that mirrors how we describe thinking.
That raises a question researchers have been asking for decades. Is consciousness something that requires a specific type of substrate (biological neurons, quantum processes), or is it something that can emerge from any sufficiently complex organised system? The GCF isn’t going to answer that question. But it does provide a new kind of system to study it with. A system where the reasoning structure is fully transparent, fully logged, and fully controllable.
If emergence is real – if complex behaviour can arise from the interaction of simple rules and structures – then the GCF is a laboratory for studying exactly that. You can watch emergence happen in real time on the board. You can measure it. You can replay it. You can change one variable and see how the emergent behaviour shifts.
Most AI systems are black boxes. You see the input and the output but nothing in between. The GCF is a glass box. Everything is visible. For researchers studying emergence, cognition, dimensional computation, or the relationship between structure and intelligence, that visibility isn’t just convenient. It’s the whole point.
What’s Needed
The GCF prototype is built. The architecture works. The data is there. But turning it into a proper research platform needs the right environment:
Access to research institutions that study cognitive architectures, emergence, or computational theory. The GCF needs peer review and formal evaluation by people who understand what questions to ask of the data.
Controlled experimental frameworks where variables can be isolated. Same model, different board sizes. Same board, different models. Same query set, different nesting depths. Proper scientific method applied to cognitive architecture performance.
Collaboration with physicists and mathematicians who work on higher-dimensional computation and information theory. The 4D tesseract structure isn’t just a visualisation trick. It’s a real data structure that needs formal mathematical analysis.
Funding for dedicated compute to run large-scale experiments. Thousands of queries across hundreds of configurations, logging every cell activation, building the first empirical dataset on cognitive architecture scaling.
In the right hands, with the right support, this isn’t just an AI project anymore. It’s a research instrument that could produce data nobody has ever had access to before. Data about how structure creates intelligence, how dimensions of reasoning interact, and how far the principle of architecture-over-scale can actually be pushed.
That’s the real reason this project keeps me going. Not because I want a better chatbot. Because I think there might be something here that matters beyond software.
Why I Built This
I wanted an AI that felt like it was mine. Not a service I rent. Not a chatbot that forgets me. Not something that sends my data to a server farm. I wanted a companion that runs on my hardware, thinks with a real strategy, remembers our conversations, and respects a chain of command.
ISAC started as a side project inspired by J.A.R.V.I.S. from Iron Man and the SHD AI from The Division. It’s turned into something much bigger – an original cognitive architecture that thinks about thinking, and a desktop companion that’s genuinely useful.
He’s not finished. But he’s alive. And he’s getting smarter.
ISAC is a personal project by Billy P. The Cognitive Cube Framework (GCF) is original work. If you want to follow the development, keep an eye on the blog for updates.