Lily Labs

Lily Labs Ltd · United Kingdom

Lily Labs

Designing AI systems you can trust with consequential work.

We’re a research lab building technology that makes AI evidentially disciplined — not through prompting, but through architecture.

The problem

AI fabricates.

For most applications, you catch it and move on. For regulated finance, healthcare, defence, and scientific research, you can’t. These domains need AI that sources its claims, audits its reasoning, and retracts when wrong — as an architectural property, not a hope.

Our approach

Three layers, one system

Anti-fabrication isn’t a single fix. It’s a stack: a verified foundation underneath, persistent memory in the middle, audit discipline on top. Each layer catches failures the others can’t. We’re building all three.

Figure: Three architectural layers, one system. Architectural cross-section of the Lily Labs anti-fabrication stack, drawn as a pyramid, with the widest tier at the bottom and the narrowest at the top. Three stacked tiers, top to bottom:

  • Layer 3: Evidential Discipline, the governance layer (narrowest). Asks “Does it survive scrutiny?” Contains kill conditions, derivation chains, and retract operations.
  • Layer 2: Persistent Intelligence, the continuity layer. Asks “What do we already know?” Contains memory, context, and continuity across sessions.
  • Layer 1: Concept-Native Intelligence, the reasoning foundation (widest, supporting the layers above). Asks “What is true, and how do we know?” Contains concept space, typed primitives, and provenance.

Two upward arrows show data flow: retrieval and grounding from Layer 1 to Layer 2; claims submitted for audit from Layer 2 to Layer 3. A side curve on the right closes the loop: retractions and corrections flow from Layer 3 back down to Layer 1, so the system corrects its own record.

Layer 1

Concept-Native Intelligence — the reasoning foundation

Today’s AI is statistical autocomplete. It predicts the next word; it doesn’t represent meaning directly. Hallucination isn’t a bug in that architecture — it’s the architecture working as designed.

We’ve filed a patent application on a different foundation. Meaning lives in a geometric concept space built from designed primitives: the irreducible units of meaning, positioned by explicit pairwise similarity rather than by statistical co-occurrence. Words become coordinates. Reasoning becomes geometric operations on those coordinates: distances, paths, midpoints, graph traversal.
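As a rough illustration of what geometric operations on named coordinates can look like, here is a minimal Python sketch. It is not the CNI engine: the three axes, the four concepts, their coordinates, and the function names `distance`, `midpoint`, `nearest`, and `explain` are all invented for this example.

```python
import math

# Hypothetical 3-dimensional concept space. The axis names and coordinates
# are invented for illustration and are not Lily Labs data.
AXES = ("animacy", "size", "agency")
CONCEPTS = {
    "stone": (0.0, 0.2, 0.0),
    "mouse": (1.0, 0.1, 0.6),
    "horse": (1.0, 0.8, 0.7),
    "person": (1.0, 0.5, 1.0),
}


def distance(a, b):
    """Euclidean distance between two named concepts."""
    return math.dist(CONCEPTS[a], CONCEPTS[b])


def midpoint(a, b):
    """Coordinate halfway between two named concepts."""
    return tuple((x + y) / 2 for x, y in zip(CONCEPTS[a], CONCEPTS[b]))


def nearest(point, exclude=()):
    """Name of the stored concept closest to an arbitrary coordinate."""
    candidates = (name for name in CONCEPTS if name not in exclude)
    return min(candidates, key=lambda name: math.dist(CONCEPTS[name], point))


def explain(a, b):
    """Per-axis differences, so a distance can be traced to named dimensions."""
    return {
        axis: round(y - x, 3)
        for axis, x, y in zip(AXES, CONCEPTS[a], CONCEPTS[b])
    }
```

Because every axis has a name, `explain("mouse", "horse")` reports which dimensions account for the gap between two concepts, which is the sense in which a step in such a space can be inspected rather than inferred.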

Three properties matter for trust:

  • Auditable. Every dimension has a defined meaning. Every reasoning step is a traceable operation, not a forward pass through a black box.
  • Updatable without retraining. Adding a fact is adding an edge. No GPU, no million-pound retraining run.
  • Cannot fabricate beyond its graph. Asked about something not in the graph, the system returns “I don’t know” — because there is no path to traverse. Structural, not a tuning target.
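The second and third properties can be sketched with a toy graph. This is an assumption-laden illustration, not the patented design: the `ConceptGraph` class, its method names, and the plant-equipment facts are hypothetical, and breadth-first search stands in for whatever traversal the real system uses.

```python
from collections import deque


class ConceptGraph:
    """Toy directed graph. Nodes and relations are invented examples."""

    def __init__(self):
        self.edges = {}  # node -> list of (relation, neighbour)

    def add_fact(self, subject, relation, obj):
        """Adding a fact is adding an edge; nothing is retrained."""
        self.edges.setdefault(subject, []).append((relation, obj))
        self.edges.setdefault(obj, [])

    def find_path(self, start, goal):
        """Breadth-first search. Returns a list of edges, or None if no path."""
        if start not in self.edges or goal not in self.edges:
            return None
        queue = deque([(start, [])])
        seen = {start}
        while queue:
            node, path = queue.popleft()
            if node == goal:
                return path
            for relation, neighbour in self.edges[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append((neighbour, path + [(node, relation, neighbour)]))
        return None

    def answer(self, start, goal):
        """Return the traversed chain of facts, or refuse when none exists."""
        path = self.find_path(start, goal)
        if path is None:
            return "I don't know"
        if not path:
            return start  # start and goal are the same node
        return " -> ".join(f"{s} {r} {o}" for s, r, o in path)


graph = ConceptGraph()
graph.add_fact("pump P-101", "feeds", "tank T-2")
before = graph.answer("pump P-101", "valve V-7")   # no path yet
graph.add_fact("tank T-2", "drains via", "valve V-7")
after = graph.answer("pump P-101", "valve V-7")    # path exists after one new edge
```

In this sketch the refusal is a consequence of the search finding nothing, and the answer, when there is one, is the chain of stored facts itself.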

Independently corroborated: when researchers used Sparse Autoencoders to extract what GPT-2 had learned from billions of words of text, eleven of our twelve designed primitive categories appeared in its features. Two completely independent approaches converged on the same underlying structure.

UK patent application filed (Stephen G M Brailsford, inventor). The architecture is generalised. The first productised application is Structured Concept Data (SCD) for industrial control systems. Theoretical physics runs in parallel as a research-grade application — a domain that detects fabrication instantly.

Layer 2

Persistent Intelligence — the continuity layer

An AI that never starts from zero. Continuous memory, intelligent context assembly, token-optimised retrieval across thousands of sessions.

We’ve been running this for six months across 34,000 sessions — it catches its own errors, builds on its own history, and maintains coherence over timescales that defeat session-bound systems. It’s our most mature technology and the backbone of everything else we do.

Layer 3

Evidential Discipline — the governance layer

Research pipelines where data in is approved, claims out are audited, and errors are retracted with a record. No claim without a source. No result without a condition that could kill it.
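One way to picture those rules is as a ledger that refuses incomplete claims and records retractions. The Python sketch below is a hypothetical illustration only: the `Claim` and `Ledger` classes, their fields, and the log format are assumptions made for this example and do not describe the actual pipeline.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Claim:
    statement: str
    source: str          # where the claim comes from
    kill_condition: str  # observation that would falsify the claim
    retracted: bool = False


class Ledger:
    """Hypothetical claim ledger with an append-only event log."""

    def __init__(self, approved_sources):
        self.approved_sources = set(approved_sources)
        self.claims = []
        self.log = []

    def _record(self, event, claim, note=""):
        self.log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "event": event,
            "statement": claim.statement,
            "note": note,
        })

    def admit(self, claim):
        """Accept a claim only if it has an approved source and a kill condition."""
        if claim.source not in self.approved_sources:
            raise ValueError(f"unapproved source: {claim.source!r}")
        if not claim.kill_condition.strip():
            raise ValueError("claim has no kill condition")
        self.claims.append(claim)
        self._record("admitted", claim)
        return claim

    def retract(self, claim, reason):
        """Mark a claim as retracted and keep a record of why."""
        claim.retracted = True
        self._record("retracted", claim, reason)

    def standing(self):
        """Claims that have been admitted and not retracted."""
        return [c for c in self.claims if not c.retracted]
```

The retracted claim stays in `claims` and in `log`; only `standing()` stops returning it, so the correction is visible rather than silent.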

We stress-tested the methodology on theoretical physics — a domain where fabrication is instantly detectable, because numbers either match nature or they don’t. Applied to a peer-review-bound paper, it produced 88 audited results, six public retractions, and pre-committed falsification conditions. It caught overclaims within hours and stripped fabricated precision from headline numbers. Physics is our gauntlet, not our market — if the discipline holds there, it holds in finance, healthcare, and defence.


Status

What we’ve proven, what we’re building

Proven — concept-native architecture
The CNI engine runs end-to-end on consumer hardware. 2,054 designed primitives are positioned in a 44-dimensional interpretable concept space, computed from 3,600+ pairwise similarity relationships. 5,387 word engrams provide the English-to-concept-space translation layer. No neural network, no GPU, no training: a ~6 MB lookup table plus linear algebra. 255 of 255 engram regression tests pass; coverage of Wierzbicka’s Natural Semantic Metalanguage (NSM) semantic primes is 89.2%.
Proven — persistent intelligence
Six months of continuous operation across 34,000 sessions.
Proven — discipline methodology
Applied to a peer-review-bound physics paper: 88 audited results, six public retractions, pre-committed falsification conditions. Caught overclaims within hours.
Proven — Structured Concept Data (industrial systems)
Productised as Structured Concept Data (SCD), the first vertical specialisation of the CNI graph, and applied to real ICS documentation from eleven major industrial vendors: Siemens, Allen-Bradley, ABB, Schneider, Honeywell, Beckhoff, Yokogawa, Omron, Phoenix, Wago, and Emerson. 757 entities were extracted across 2,190 pages with zero validation errors. The graph reasons natively over IEC 60812 (FMEA), IEC 61025 (fault trees), IEC 61078 (reliability block diagrams), and IEC 61508 (SIL), with every claim sourced to document, page, and bounding box. Below a 0.5 confidence threshold, the system returns “I don’t know” with a list of the specific source data that would raise confidence.
Patent pending
UK patent application filed, priority date established. Implementation of the generalised concept space and the first domain instances is underway.
In active design
The integrated architecture — concept foundation, persistent intelligence, and discipline pipeline as one system — informed by six months of learning where AI fails and what catches the failures.

We’re clear about which is which.
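The Structured Concept Data entry above describes two behaviours: claims sourced to document, page, and bounding box, and an “I don’t know” response below a 0.5 confidence threshold that lists what evidence is missing. A minimal Python sketch of that response shape follows. Only the 0.5 threshold comes from the text; the `Provenance` and `Evidence` types, the additive weights, and the example documents are assumptions, and the real scoring method is not described here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

CONFIDENCE_THRESHOLD = 0.5  # value quoted in the SCD status entry


@dataclass(frozen=True)
class Provenance:
    document: str
    page: int
    bbox: Tuple[float, float, float, float]  # assumed (x0, y0, x1, y1) layout


@dataclass
class Evidence:
    label: str
    weight: float                            # hypothetical additive contribution
    provenance: Optional[Provenance] = None  # None: not found in any document


def answer(claim, evidence):
    """Return the claim with its sources, or a refusal listing missing evidence."""
    found = [e for e in evidence if e.provenance is not None]
    missing = [e for e in evidence if e.provenance is None]
    # Assumed scoring rule: confidence is the summed weight of located evidence.
    confidence = sum(e.weight for e in found)
    if confidence < CONFIDENCE_THRESHOLD:
        return {
            "answer": "I don't know",
            "confidence": confidence,
            "would_raise_confidence": [e.label for e in missing],
        }
    return {
        "answer": claim,
        "confidence": confidence,
        "sources": [e.provenance for e in found],
    }
```

The design point is that the refusal is actionable: the caller learns which specific items would need to be located before the same question could be answered.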

Who this is for

Where fabrication isn’t an option

Regulated industries
Finance, healthcare, public sector. Where every AI output needs a trail.
Critical national infrastructure
Power, water, transport, manufacturing. Where industrial control systems generate the documentation and fabricated analysis carries operational risk.
Defence and security research
Where air-gapped operation, provenance per claim, and standards-aligned reasoning aren’t optional features.
Scientific research
Where AI-assisted discovery needs the same rigour as the science itself.

Questions

Common questions

What does Lily Labs do?
We design AI systems that can be trusted with consequential work. Three architectural layers — concept-native reasoning, persistent intelligence, and an evidential discipline pipeline — combine into a single anti-fabrication stack for regulated finance, healthcare, defence, critical national infrastructure, and scientific research.
Are you a research lab or a consultancy?
Both. Roughly half our time is original research — some fundamental, some applied. The other half is consulting engagements that draw on that research. Each side feeds the other: research is pressure-tested against real client constraints, and client engagements surface research questions that matter.
Who do you typically work with?
Government and defence procurement, critical national infrastructure operators, enterprise AI teams in regulated sectors such as financial services and healthcare, IP lawyers and investors evaluating our portfolio, and academic research collaborators.
How do engagements begin?
Most start with a fixed-scope discovery — typically two to four weeks — after which the shape of any longer engagement is clear to both sides. Written enquiries only: no phone, no chat, no calendar booking. Initial enquiries are handled by Steve Brailsford directly.
Can you work in air-gapped or classified environments?
Yes. Our architectures are designed for consumer-grade compute, with no cloud dependency and no per-token billing. Air-gap-compatible by default. Where engagements require it, we work within existing client security perimeters.
Is this the same company as R U AI Ready?
Yes. R U AI Ready is our consumer-facing audit service — same company (Lily Labs Ltd), different audience. ruaiready.co.uk gives UK businesses an instant AI-search-readiness score; lily-labs.co.uk is for enterprise and research engagements.

Talk to us

If your work can’t tolerate fabrication, we’d like to hear about it.

curious@lily-labs.co.uk