Awaab Ishak's death and the subsequent legal and regulatory response changed what the sector expects from housing data. Every association we work with now has a "damp and mould" programme. The quality of those programmes varies. This is what, in our experience, a defensible one looks like — and what the honest limits are.
The signal sources
A damp-and-mould risk signal must triangulate. No single source is reliable alone.
- Repairs history. Mould-related tickets, ventilation complaints, window condensation, plaster repairs. A clean taxonomy of job reasons is essential; most estates have drift in how jobs are coded, and the first week of any engagement is unpicking it.
- Tenant contact logs. Not just formal complaints — every contact where the tenant mentioned a relevant symptom. These are usually in a CRM, often in free text. LLM-assisted classification is genuinely helpful here; we have seen it surface signals that were missed by repairs coding alone.
- Surveyor and officer visits. Scheduled and ad-hoc. Surveyor notes are rarely structured; capture the ones that are, enrich the ones that are not.
- Property characteristics. Construction type, age, orientation, thermal performance, ventilation type, last heating system upgrade. Predominantly useful for cohort-level reasoning.
- EPC and SAP data. Patchy in coverage; useful for thermal-bridging risk.
- Environmental sensors. Where deployed, the best signal by far — humidity, temperature, dew point. Most associations have sensors in 1–15% of stock; the right aspiration is 30–60%, starting with high-risk cohorts.
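To make the sensor signal concrete, here is a minimal sketch of two features that could be derived from raw readings: dew point (using the Magnus approximation) and the longest run of consecutive days above a humidity threshold. The function names, the daily-mean input and the 70% default are assumptions for illustration, not a description of any particular sensor platform.

```python
import math

# Magnus approximation constants (a commonly used parameter set).
MAGNUS_B = 17.62
MAGNUS_C = 243.12  # degrees Celsius


def dew_point_c(temp_c: float, rel_humidity_pct: float) -> float:
    """Approximate dew point in Celsius from air temperature and relative humidity (%)."""
    gamma = math.log(rel_humidity_pct / 100.0) + (MAGNUS_B * temp_c) / (MAGNUS_C + temp_c)
    return (MAGNUS_C * gamma) / (MAGNUS_B - gamma)


def longest_run_above(daily_humidity_pct: list, threshold_pct: float = 70.0) -> int:
    """Longest number of consecutive days whose mean humidity exceeds the threshold."""
    longest = 0
    current = 0
    for value in daily_humidity_pct:
        # Extend the current run on a humid day, otherwise reset it.
        current = current + 1 if value > threshold_pct else 0
        longest = max(longest, current)
    return longest
```

A surface temperature at or below the computed dew point is where condensation would be expected, which is why dew point tends to be more useful than humidity alone.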
Model design — keep it interpretable
Every housing data leader asks whether to use a deep neural network or a gradient-boosted model. Our answer is the same every time: use the model you can explain to a housing officer and to the Regulator. In practice that means a gradient-boosted model (XGBoost or LightGBM) with SHAP-style per-feature contributions, not a neural network.
Why it matters: when a property is flagged, the officer needs to know why. When the Regulator audits, the chain from input to output must be reconstructible. Opaque models fail this test.
The output
Our reference pattern: a per-property risk tier (critical / elevated / monitor / routine), a monthly refresh cadence, a per-property explanation ("this property is elevated because: three mould-related repairs in 18 months, tenant reported condensation in May, sensor humidity sustained above 70% for eleven consecutive days"), and a worklist for the relevant team.
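The tier and explanation can be sketched in a few lines. This is an illustration only: the score cut-offs are hypothetical placeholders, and the `Contribution` records are assumed to be produced upstream (for example from SHAP values mapped to plain-language reasons), which is not shown here.

```python
from dataclasses import dataclass

# Hypothetical cut-offs, highest first; real values would come from calibration.
TIER_THRESHOLDS = [(0.8, "critical"), (0.5, "elevated"), (0.2, "monitor")]


@dataclass
class Contribution:
    reason: str    # plain-language description of the feature value
    weight: float  # signed contribution to the risk score (e.g. a SHAP value)


def assign_tier(score: float) -> str:
    """Map a risk score in [0, 1] to one of the four tiers."""
    for cutoff, tier in TIER_THRESHOLDS:
        if score >= cutoff:
            return tier
    return "routine"


def explain(tier: str, contributions: list, top_n: int = 3) -> str:
    """Build the officer-facing sentence from the strongest risk-raising factors."""
    # Keep only factors that push risk up, strongest first.
    drivers = sorted(
        (c for c in contributions if c.weight > 0),
        key=lambda c: c.weight,
        reverse=True,
    )[:top_n]
    if not drivers:
        return f"this property is {tier}; no single factor stands out"
    return f"this property is {tier} because: " + ", ".join(c.reason for c in drivers)
```

Limiting the sentence to the top few risk-raising factors keeps the explanation readable on a worklist; the full contribution list can still be stored for audit.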
Stewardship — where it usually breaks
The model is the easier half. The harder half is what happens after. Critical-tier properties need owners, SLAs, visit cadences, and closure discipline. The common failure mode we see: the model runs, the list is generated, and the list is too long to action.
The design choice that avoids this: calibrate the model against operational capacity. If the team can visit 80 critical properties per month, the model should promote no more than that — plus a defensible monitoring regime for the next tier. Over-flagging is not a cautious posture; it is a signal-obscuring posture.
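One simple way to express capacity calibration is to rank by score and promote only as many properties as the team can visit. The sketch below assumes a dictionary of property IDs to risk scores and a hypothetical function name; a real implementation might also apply a minimum score so that a quiet month does not promote low-risk properties just to fill the list.

```python
def split_by_capacity(scores: dict, monthly_capacity: int) -> tuple:
    """Return (critical, next_tier) lists of property IDs, highest risk first.

    At most `monthly_capacity` properties are promoted to critical; the
    remainder stays in ranked order for the monitoring regime.
    """
    if monthly_capacity < 0:
        raise ValueError("monthly_capacity must be non-negative")
    # Sort by descending score; break ties on property ID so runs are repeatable.
    ranked = sorted(scores, key=lambda pid: (-scores[pid], pid))
    return ranked[:monthly_capacity], ranked[monthly_capacity:]
```

For example, with a capacity of 80 the first list never exceeds 80 entries, however many properties score highly; the second list is what the monitoring regime has to cover.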
The honest limits
- It will miss some. A per-property risk signal will miss properties with no contact history, no repairs, no sensors. The mitigation is a rotating sample visit programme, not a claim of complete coverage.
- It will wrongly flag some. Properties will be flagged that do not have an issue. Officer time is the scarce resource; accept a modest false-positive rate and use each false positive as a way into a conversation.
- It does not replace physical inspection. It routes it.
- It does not absolve. A working damp-and-mould model is a tool the association can defend in front of the Regulator. It is not a substitute for the operational, legal and cultural work around responsiveness.
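The rotating sample visit programme mentioned in the first limit can be made repeatable with a stable hash. The sketch below is one possible approach, not a prescribed method: it assigns each property with no signal to a fixed slot in a cycle, so every such property comes up once per cycle. The 24-month cycle and the function names are assumptions, and a team with limited capacity might visit only a capped subset of each month's slot.

```python
import hashlib


def rotation_slot(property_id: str, cycle_months: int) -> int:
    """Stable slot in [0, cycle_months) derived from the property ID."""
    # sha256 is used rather than hash() so the slot is the same on every run.
    digest = hashlib.sha256(property_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % cycle_months


def sample_for_month(unflagged_ids: list, month_index: int, cycle_months: int = 24) -> list:
    """Properties with no risk signal that are due a sample visit this month."""
    slot = month_index % cycle_months
    return sorted(pid for pid in unflagged_ids if rotation_slot(pid, cycle_months) == slot)
```

Because the slot depends only on the property ID, the schedule can be reconstructed later without storing it, which helps when the sampling approach itself has to be evidenced.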
Governance posture
Every damp-and-mould model we have shipped has a named model owner (typically the Director of Assets or equivalent), a quarterly review of accuracy and drift, a ticket path for tenants or officers to challenge a tier, and a documented evaluation regime. The Regulator of Social Housing will increasingly ask about this. Boards should expect it.
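As an illustration of what a quarterly review might compute, the sketch below measures how often critical-tier visits confirmed a real issue, and uses the population stability index (PSI) to compare the tier mix between two periods. PSI is one common drift measure chosen here for illustration; the input shapes and function names are assumptions.

```python
import math
from typing import Optional


def tier_precision(outcomes: list, tier: str = "critical") -> Optional[float]:
    """Share of visited properties in `tier` where an issue was confirmed.

    `outcomes` is a list of (tier, issue_confirmed) pairs from completed visits.
    Returns None when no property in the tier was visited.
    """
    confirmed = [found for visited_tier, found in outcomes if visited_tier == tier]
    if not confirmed:
        return None
    return sum(confirmed) / len(confirmed)


def population_stability_index(baseline: dict, current: dict, epsilon: float = 1e-6) -> float:
    """PSI between two tier-count dictionaries; 0.0 means an identical mix."""
    base_total = sum(baseline.values())
    cur_total = sum(current.values())
    psi = 0.0
    for tier in set(baseline) | set(current):
        # Floor each share at epsilon so an empty tier does not produce log(0).
        b = max(baseline.get(tier, 0) / base_total, epsilon)
        c = max(current.get(tier, 0) / cur_total, epsilon)
        psi += (c - b) * math.log(c / b)
    return psi
```

Neither number is a verdict on its own; they are prompts for the model owner to ask why precision fell or why the tier mix moved.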
The reference stack
For mid-size associations (5k–40k properties) our stack is typically: the operational systems as source; a tenancy-aligned data platform (Snowflake, Databricks or Fabric depending on existing investment); a dbt project for the entity model; an XGBoost model with SHAP explanation; a Power BI or QuickSight surface for internal users; and integration back into the repairs and contact systems for operational routing. Pragmatic, familiar, easy to govern.