Case study · Accounting

AI triage on 42,000 client queries a year — freeing 9,000 partner-hours.

A UK Top-50 accountancy firm was losing partner time to recurring client queries. We built a RAG agent that auto-drafts responses with citations and routes exceptions to partners, clearing the drag on advisory capacity.

Auto-resolved: 73%

Partner time freed/yr: 9,000h

Advisory uplift: £680k

Kick-off to live: 11 wks

Overview

Advisory work was being crowded out by routine.

Partners were spending hours a week on queries the firm had answered hundreds of times. Client satisfaction was solid; partner capacity for higher-value advisory was not. The firm wanted to fix that without adding headcount.

We scoped the problem, built a production-grade RAG system integrated with their practice-management platform, and took it live within 11 weeks.

Approach

Production, not a prototype.

Week 1: scoping + evals

Collected 800 ground-truth queries; built an offline eval harness before prompt v1.

Weeks 2–3: retrieval

Ingested the firm's knowledge base (VAT, payroll, R&D, corporate tax); built hybrid retrieval with re-ranking.

Weeks 4–8: agent & UI

Tool-using agent, structured outputs, human-review workflow, partner dashboard.

Weeks 9–10: red-team & rollout

Safety testing, partner training, phased rollout across three offices.

Ongoing: operate

Quality monitoring, retrieval refresh, cost optimisation.
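To make the weeks 2–3 retrieval step concrete, here is a minimal sketch of how keyword and vector results might be fused before re-ranking. The case study does not say which fusion method was used; reciprocal rank fusion (RRF) is our assumption for illustration, the document IDs are hypothetical, and the re-ranking stage itself is left out.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists for a single client query.
keyword_hits = ["vat-registration", "vat-thresholds", "payroll-rti"]
vector_hits = ["vat-thresholds", "rd-relief", "vat-registration"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
# ['vat-thresholds', 'vat-registration', 'rd-relief', 'payroll-rti']
```

Documents that appear in both lists rise to the top, which is the point of combining the two retrievers; a re-ranker would then reorder the fused shortlist.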

Queries auto-resolved: 73%

Partner hours freed / year: 9,000h

Annual advisory uplift: £680k

Avg. response time: −71%

High-stakes errors in 6 months

Client satisfaction: 4.6/5
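As a back-of-envelope check on how the headline figures relate, the sketch below derives two implied numbers. It assumes that all freed partner hours come from auto-resolved queries and that the uplift is spread evenly across those hours; the case study states neither, so treat the results as rough orders of magnitude only.

```python
# Figures quoted in the case study.
queries_per_year = 42_000
auto_resolved_share = 0.73
partner_hours_freed = 9_000
advisory_uplift_gbp = 680_000

# Assumption: every freed hour is attributable to an auto-resolved query.
auto_resolved = round(queries_per_year * auto_resolved_share)
minutes_per_query = partner_hours_freed * 60 / auto_resolved

# Assumption: the uplift is spread evenly across the freed hours.
uplift_per_hour = advisory_uplift_gbp / partner_hours_freed

print(auto_resolved)                # 30660
print(round(minutes_per_query, 1))  # 17.6
print(round(uplift_per_hour, 2))    # 75.56
```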

Technology

Tools & frameworks we use.

Azure OpenAI

Claude

LangGraph

pgvector

Langfuse

Ragas

Microsoft Entra ID

Azure App Service

FAQ

Common questions.

What about hallucinations?

Structured outputs with a citation-required schema; answers without retrieval hits escalate to a human. The evaluation harness gates every prompt change before deploy.
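A citation-required schema with escalation can be sketched roughly as follows. This is an illustration, not the production code: the class names, field names, and routing labels are hypothetical, and the extra check that every citation points at a retrieved document is our assumption about what "citation-required" could reasonably include.

```python
from dataclasses import dataclass, field


@dataclass
class Citation:
    source_id: str  # hypothetical knowledge-base document ID
    quote: str      # passage offered in support of the answer


@dataclass
class DraftAnswer:
    query: str
    answer: str
    citations: list[Citation] = field(default_factory=list)


def route(draft: DraftAnswer, retrieved_ids: set[str]) -> str:
    """Return "auto_resolve" or "escalate" for a drafted answer."""
    # No retrieval hits: nothing grounds the answer, so a human takes over.
    if not retrieved_ids:
        return "escalate"
    # The schema requires at least one citation.
    if not draft.citations:
        return "escalate"
    # Assumed extra check: citations must point at retrieved documents.
    if any(c.source_id not in retrieved_ids for c in draft.citations):
        return "escalate"
    return "auto_resolve"


grounded = DraftAnswer(
    query="When is the VAT return due?",
    answer="See the cited guidance note.",
    citations=[Citation("vat-returns", "Returns are due ...")],
)
uncited = DraftAnswer(query="Can we claim R&D relief?", answer="Probably.")

print(route(grounded, {"vat-returns"}))  # auto_resolve
print(route(uncited, {"rd-relief"}))     # escalate
print(route(grounded, set()))            # escalate
```

Defaulting to escalation on any failed check keeps the failure mode conservative: an uncited or ungrounded draft costs partner time rather than reaching a client.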

Did partners trust it?

Only after the rollout showed parity with partner-drafted responses in blind review. Trust was earned, not assumed.

What's the cost model?

Per-session unit economics tracked in Langfuse. Average cost per auto-resolved query is 1% of the partner-hour cost.

