Generative AI.
RAG, agents, copilots and content systems — engineered for production with evaluation, guardrails and observability from day one. No demo-ware.
Agents and RAG, done properly.
A GenAI prototype takes a weekend. A GenAI product takes retrieval quality, evaluation, safety, observability, cost control, integration and change management. We ship the product.
We've built agents for claims triage, tenancy queries, audit evidence, tax enquiries and procurement — each live in production, with human-in-the-loop where stakes require it.
Where we help.
LLM selection & evaluation
Bake-off across frontier and small open models on your ground truth. Commercials, latency, safety, quality — all on the same sheet.
RAG & knowledge-base design
Chunking strategy, hybrid retrieval, re-ranking, citations, freshness, access controls. Not a vector-DB demo.
Agents & workflows
Tool-using agents with typed schemas, deterministic guardrails, retries, fallback paths and per-call cost budgets.
Copilots & UX
In-product copilots with the right affordances for trust: citations, edits, undo, audit trail, feedback loop.
Safety, evals & observability
Offline eval harness, online quality monitoring, red-team suite, prompt versioning, cost/latency dashboards.
Fine-tuning & distillation
Where a small specialised model beats a frontier API on cost, latency and quality — we build it.
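To make the "hybrid retrieval" item under RAG & knowledge-base design concrete, here is a minimal sketch of one common way to merge keyword and vector search results: reciprocal rank fusion. The function name, the document IDs and the constant `k=60` are illustrative assumptions, not a description of any particular engagement.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by several retrievers rise to the top.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; tie-break on ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a keyword index and a vector index.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

In a real pipeline the fused list would typically be passed to a re-ranker and filtered by access controls before anything reaches the model.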
A production-first delivery path.
Scope & evals
Define tasks, collect ground truth and build an offline eval harness before a single prompt is written.
Retrieval & baseline
Data ingestion, retrieval bake-off, prompt v1 against evals.
Agent build
Tools, guardrails, UX, integration. Fortnightly demos against evals.
Hardening & launch
Red-team, load test, observability, rollout plan, training.
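The retries, fallback paths and cost budgets listed under Agents & workflows can be sketched in a few lines. This is an illustration under stated assumptions: `Tool`, `call_with_budget`, the flat per-call prices and the stub tools are all hypothetical names and numbers, and a production agent would meter real token usage instead.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


class NoAnswer(Exception):
    """Raised when no tool succeeds within the budget."""


@dataclass(frozen=True)
class Tool:
    name: str
    run: Callable[[str], str]
    cost_per_call_usd: float  # assumed flat price per attempt


def call_with_budget(
    tools: Sequence[Tool],
    query: str,
    budget_usd: float,
    max_attempts: int = 2,
) -> Tuple[str, str, float]:
    """Try tools in order (primary first), retrying each, without overspending."""
    spent = 0.0
    for tool in tools:
        for _ in range(max_attempts):
            if spent + tool.cost_per_call_usd > budget_usd:
                break  # cannot afford another attempt with this tool
            spent += tool.cost_per_call_usd
            try:
                return tool.run(query), tool.name, spent
            except RuntimeError:
                continue  # retry, then fall through to the next tool
    raise NoAnswer(f"no tool answered within ${budget_usd:.2f}")


def flaky_model(query: str) -> str:
    # Stand-in for a model call that keeps timing out.
    raise RuntimeError("upstream timeout")


def human_handoff(query: str) -> str:
    # Deterministic fallback: hand the case to a person.
    return "Routed to a human reviewer."


tools = [
    Tool("primary_llm", flaky_model, 0.02),
    Tool("human_handoff", human_handoff, 0.001),
]
answer, used, spent = call_with_budget(tools, "Is this claim covered?", budget_usd=0.05)
print(used, answer, round(spent, 3))
```

Here the primary tool fails twice, so the call falls back to the human handoff while total spend stays under the budget.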
- ✓ Production agent: In your cloud, integrated with your systems, monitored.
- ✓ Evaluation harness: Offline + online, gated to CI/CD.
- ✓ Retrieval layer: Ingestion, chunking, hybrid search, access control.
- ✓ Safety & red-team report: Known failure modes, mitigations, residual risks.
- ✓ Cost & quality dashboards: Langfuse / equivalent, per-session unit economics.
- ✓ Operator runbook & training: Your team ready to own and extend.
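As a rough picture of what an evaluation harness "gated to CI/CD" can look like, the sketch below scores a system against ground-truth cases and returns a non-zero exit code when the pass rate drops below a threshold. The names `EvalCase`, `pass_rate` and `ci_exit_code`, the substring check and the stub data are all assumptions made for illustration; real harnesses usually use richer graders.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    expected: str  # phrase the answer must contain to count as a pass


def pass_rate(system: Callable[[str], str], cases: Sequence[EvalCase]) -> float:
    """Fraction of cases whose answer contains the expected phrase."""
    if not cases:
        raise ValueError("need at least one eval case")
    passed = sum(
        case.expected.lower() in system(case.prompt).lower() for case in cases
    )
    return passed / len(cases)


def ci_exit_code(
    system: Callable[[str], str], cases: Sequence[EvalCase], threshold: float
) -> int:
    """Return 0 (pass) or 1 (fail) so a CI job can block a release."""
    return 0 if pass_rate(system, cases) >= threshold else 1


# Made-up stub standing in for the system under test.
CANNED = {
    "When is rent due?": "Rent is due on the 1st of each month.",
    "Who handles repairs?": "Repairs are handled by the housing team.",
    "Can I sublet?": "I am not sure.",
}


def stub_system(prompt: str) -> str:
    return CANNED.get(prompt, "")


cases = [
    EvalCase("When is rent due?", "1st"),
    EvalCase("Who handles repairs?", "housing team"),
    EvalCase("Can I sublet?", "written consent"),
    EvalCase("Is there parking?", "permit"),
]

rate = pass_rate(stub_system, cases)
print(rate, ci_exit_code(stub_system, cases, threshold=0.8))
```

A CI step would pass the returned code to `sys.exit`, so a prompt or model change that lowers quality fails the build before it ships.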
A real engagement.
City council — AI-assisted FOI response drafting.
A RAG agent drafts Freedom-of-Information responses grounded in council records, routes to the right reviewer, and tracks deadline risk. 58% of responses now drafted in < 2 minutes; backlog eliminated.