Lily Labs Ltd · United Kingdom

Lily Labs

Designing AI systems you can trust with consequential work.

We’re a research lab building technology that makes AI evidentially disciplined — not through prompting, but through architecture.


The problem

AI fabricates.

For most applications, you catch it and move on. For regulated finance, healthcare, defence, and scientific research, you can’t. These domains need AI that sources its claims, audits its reasoning, and retracts when wrong — as an architectural property, not a hope.

Our approach

Three layers, one system

Anti-fabrication isn’t a single fix. It’s a stack: a verified foundation underneath, persistent memory in the middle, audit discipline on top. Each layer catches failures the others can’t. We’re building all three.

Layer 1

Concept-Native Intelligence — the reasoning foundation

Today’s AI is statistical autocomplete. It predicts the next word; it doesn’t represent meaning directly. Hallucination isn’t a bug in that architecture — it’s the architecture working as designed.

We’ve filed a patent application on a different foundation. Meaning lives in a geometric concept space built from designed primitives: the irreducible units of meaning, positioned by explicit pairwise similarity rather than by statistical co-occurrence. Words become coordinates. Reasoning becomes geometric operations on those coordinates: distances, paths, midpoints, graph traversal.
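As a rough illustration of what "geometric operations on coordinates" can look like, here is a minimal Python sketch. The concept names, the three-dimensional coordinates, and the helper functions are all hypothetical; they are not taken from the CNI engine, whose space and primitives are not reproduced here.

```python
import math

# Hypothetical toy coordinates, for illustration only.
CONCEPTS = {
    "hot": (1.0, 0.0, 0.0),
    "cold": (-1.0, 0.0, 0.0),
    "warm": (0.5, 0.0, 0.0),
    "loud": (0.0, 1.0, 0.0),
}


def distance(a, b):
    """Euclidean distance between two concept coordinates."""
    return math.dist(a, b)


def midpoint(a, b):
    """Coordinate halfway between two concepts."""
    return tuple((x + y) / 2 for x, y in zip(a, b))


def nearest(point, exclude=()):
    """Name of the stored concept closest to an arbitrary point."""
    candidates = {n: c for n, c in CONCEPTS.items() if n not in exclude}
    return min(candidates, key=lambda n: distance(point, candidates[n]))


if __name__ == "__main__":
    mid = midpoint(CONCEPTS["hot"], CONCEPTS["cold"])
    print(distance(CONCEPTS["hot"], CONCEPTS["cold"]))
    print(mid, nearest(mid))
```

Every step here is an explicit arithmetic operation on named coordinates, which is the sense in which such reasoning can be traced after the fact.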

Three properties matter for trust:

Auditable. Every dimension has a defined meaning. Every reasoning step is a traceable operation, not a forward pass through a black box.
Updatable without retraining. Adding a fact is adding an edge. No GPU, no million-pound retraining run.
Cannot fabricate beyond its graph. Asked about something not in the graph, the system returns “I don’t know” — because there is no path to traverse. Structural, not a tuning target.
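The second and third properties can be sketched with an ordinary directed graph. This is a simplified illustration under our own assumptions, not the patented design: the class name, method names, and example facts are hypothetical, and a plain breadth-first search stands in for whatever traversal the real engine uses.

```python
from collections import deque


class ConceptGraph:
    """Toy fact graph: nodes are concepts, edges are stated facts."""

    def __init__(self):
        self.edges = {}  # node -> set of directly connected nodes

    def add_fact(self, subject, obj):
        """Adding a fact is adding a directed edge; nothing is retrained."""
        self.edges.setdefault(subject, set()).add(obj)
        self.edges.setdefault(obj, set())

    def find_path(self, start, goal):
        """Breadth-first search. Returns a list of nodes, or None."""
        if start not in self.edges or goal not in self.edges:
            return None
        queue = deque([[start]])
        seen = {start}
        while queue:
            path = queue.popleft()
            if path[-1] == goal:
                return path
            for nxt in sorted(self.edges[path[-1]]):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(path + [nxt])
        return None

    def answer(self, start, goal):
        """Return the supporting path, or refuse when no path exists."""
        path = self.find_path(start, goal)
        return " -> ".join(path) if path else "I don't know"


if __name__ == "__main__":
    g = ConceptGraph()
    g.add_fact("pump", "motor")
    g.add_fact("motor", "power supply")
    print(g.answer("pump", "power supply"))
    print(g.answer("pump", "valve"))
```

In this sketch the refusal is not a tuned behaviour: when no path exists there is simply nothing to return, and the answer that is returned doubles as its own audit trail.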

Independently corroborated: when researchers used Sparse Autoencoders to extract what GPT-2 had learned from billions of words of text, eleven of our twelve designed primitive categories appeared in its features. Two completely independent approaches converged on the same underlying structure.

UK patent application filed 31 March 2026 (Stephen G M Brailsford, inventor). The architecture is generalised. The first productised application is Structured Concept Data (SCD) for industrial control systems. Theoretical physics runs in parallel as a research-grade application: a domain where fabrication is instantly detectable.

Layer 2

Persistent Intelligence — the continuity layer

An AI that never starts from zero. Continuous memory, intelligent context assembly, token-optimised retrieval across thousands of sessions.

We’ve been running this for six months across 34,000 sessions — it catches its own errors, builds on its own history, and maintains coherence over timescales that defeat session-bound systems. It’s our most mature technology and the backbone of everything else we do.

Layer 3

Evidential Discipline — the governance layer

Research pipelines where data in is approved, claims out are audited, and errors are retracted with a record. No claim without a source. No result without a condition that could kill it.
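One way to picture these rules is as a ledger that refuses to record a claim unless it carries an approved source and a falsification condition, and that logs every retraction. The sketch below is illustrative only: the class names, field names, and error messages are hypothetical and do not describe the actual pipeline.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Claim:
    statement: str
    source: Optional[str]          # where the supporting data came from
    kill_condition: Optional[str]  # observation that would falsify the claim
    retracted: bool = False


@dataclass
class Ledger:
    approved_sources: Set[str]
    claims: List[Claim] = field(default_factory=list)
    retractions: List[str] = field(default_factory=list)

    def submit(self, claim: Claim) -> None:
        """Record a claim only if it is sourced and falsifiable."""
        if claim.source not in self.approved_sources:
            raise ValueError("no claim without an approved source")
        if not claim.kill_condition:
            raise ValueError("no result without a falsification condition")
        self.claims.append(claim)

    def retract(self, claim: Claim, reason: str) -> None:
        """Mark a claim as withdrawn and keep a record of why."""
        claim.retracted = True
        self.retractions.append(f"{claim.statement}: {reason}")
```

A retracted claim stays in the ledger with its flag set, so the record shows both what was asserted and why it was withdrawn.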

We stress-tested the methodology on theoretical physics — a domain where fabrication is instantly detectable, because numbers either match nature or they don’t. Applied to a peer-review-bound paper, it produced 88 audited results, six public retractions, and pre-committed falsification conditions. It caught overclaims within hours and stripped fabricated precision from headline numbers. Physics is our gauntlet, not our market — if the discipline holds there, it holds in finance, healthcare, and defence.


Status

What we’ve proven, what we’re building

Proven (concept-native architecture): The CNI engine runs end-to-end on consumer hardware. 2,054 designed primitives are positioned in a 44-dimensional interpretable concept space, computed from 3,600+ pairwise similarity relationships. 5,387 word engrams provide the English-to-concept-space translation layer. No neural network, no GPU, no training: a ~6MB lookup table plus linear algebra. 255 of 255 engram regression tests pass; Wierzbicka NSM semantic-primes coverage is 89.2%.
Proven (persistent intelligence): Six months of continuous operation across 34,000 sessions.
Proven (discipline methodology): Applied to a peer-review-bound physics paper, it produced 88 audited results, six public retractions, and pre-committed falsification conditions. It caught overclaims within hours.
Proven (Structured Concept Data, industrial systems): Structured Concept Data (SCD) is the first productised vertical specialisation of the CNI graph. It has been applied to real ICS documentation from eleven major industrial vendors: Siemens, Allen-Bradley, ABB, Schneider, Honeywell, Beckhoff, Yokogawa, Omron, Phoenix, Wago, and Emerson. 757 entities were extracted across 2,190 pages with zero validation errors. The graph reasons natively over IEC 60812 (FMEA), IEC 61025 (fault trees), IEC 61078 (reliability block diagrams), and IEC 61508 (SIL), with every claim sourced to document, page, and bounding box. Below a 0.5 confidence threshold, the system returns “I don’t know” with a list of the specific source data that would raise confidence.
Patent pending: UK patent application filed 31 March 2026, priority date established. Implementation of the generalised concept space and the first domain instances is underway.
In active design: The integrated architecture — concept foundation, persistent intelligence, and discipline pipeline as one system — informed by six months of learning where AI fails and what catches the failures.

We’re clear about which is which.
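The confidence-threshold behaviour described for Structured Concept Data can be pictured with a short Python sketch. Only the 0.5 threshold and the document, page, and bounding-box provenance come from the description above; the class names, field names, bounding-box layout, and example values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

CONFIDENCE_THRESHOLD = 0.5  # threshold stated in the text


@dataclass(frozen=True)
class Provenance:
    document: str
    page: int
    bbox: Tuple[float, float, float, float]  # assumed layout: x0, y0, x1, y1


@dataclass
class Finding:
    statement: str
    confidence: float
    provenance: Provenance
    missing_sources: List[str]  # data that would raise confidence


def respond(finding: Finding) -> str:
    """Answer with provenance, or decline and list what is missing."""
    if finding.confidence < CONFIDENCE_THRESHOLD:
        needed = "; ".join(finding.missing_sources) or "unspecified"
        return f"I don't know. Data that would raise confidence: {needed}"
    p = finding.provenance
    return f"{finding.statement} [{p.document}, p. {p.page}, bbox {p.bbox}]"


if __name__ == "__main__":
    src = Provenance("example-manual.pdf", 12, (10.0, 20.0, 110.0, 40.0))
    print(respond(Finding("Example claim", 0.9, src, [])))
    print(respond(Finding("Example claim", 0.3, src, ["wiring diagram"])))
```

The design choice worth noting is that a refusal is still informative: it names the specific evidence that would change the answer.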

Who this is for

Where fabrication isn’t an option

Regulated industries: Finance, healthcare, public sector. Where every AI output needs a trail.
Critical national infrastructure: Power, water, transport, manufacturing. Where industrial control systems generate the documentation and fabricated analysis carries operational risk.
Defence and security research: Where air-gapped operation, provenance per claim, and standards-aligned reasoning aren’t optional features.
Scientific research: Where AI-assisted discovery needs the same rigour as the science itself.

Questions

Common questions

What does Lily Labs do?

We design AI systems that can be trusted with consequential work. Three architectural layers (concept-native reasoning, persistent intelligence, and an evidential discipline pipeline) combine into a single anti-fabrication stack for regulated finance, healthcare, defence, critical national infrastructure, and scientific research.

Are you a research lab or a consultancy?

Both. Roughly half our time is original research, some fundamental, some applied. The other half is consulting engagements that draw on that research. Each side feeds the other: research is pressure-tested against real client constraints, and client engagements surface research questions that matter.

Who do you typically work with?

Government and defence procurement, critical national infrastructure operators, enterprise AI teams in regulated sectors such as financial services and healthcare, IP lawyers and investors evaluating our portfolio, and academic research collaborators.

How do engagements begin?

Most start with a fixed-scope discovery, typically two to four weeks, after which the shape of any longer engagement is clear to both sides. Written enquiries only: no phone, no chat, no calendar booking. Initial enquiries are handled by Steve Brailsford directly.

Can you work in air-gapped or classified environments?

Yes. Our architectures are designed for consumer-grade compute, with no cloud dependency and no per-token billing. Air-gap-compatible by default. Where engagements require it, we work within existing client security perimeters.

Is this the same company as R U AI Ready?

Yes. R U AI Ready is our consumer-facing audit service: same company (Lily Labs Ltd), different audience. ruaiready.co.uk gives UK businesses an instant AI-search-readiness score; lily-labs.co.uk is for enterprise and research engagements.

Talk to us

If your work can’t tolerate fabrication, we’d like to hear about it.

curious@lily-labs.co.uk
