Outcomes claims are everywhere in recovery monitoring. Recovery rates, retention rates, recidivism reduction, cost savings per participant. Brochures cite them. Boards review them. Funders ask for them. Most do not survive close reading.
The reason is rarely that the program did not produce the outcome. It is that the records the program kept cannot support the claim it wants to make.
A claim is only as defensible as the discipline behind the data that generated it. Multi-year recovery rates require multi-year cohort tracking. Recidivism reduction requires participant identifiers that stay consistent with court records over time. Cost savings require participant-level event data that tie to budget categories. None of these are produced by good intentions. They are produced by the structure of the records the program keeps.
This is why outcomes infrastructure matters. Not as a research deliverable. As an operating discipline that determines what a program can credibly say about its work, to whom, and over what horizon.
What “outcomes that hold up” actually means
Two studies anchor the recovery monitoring outcomes literature. They are worth understanding in some detail because their methodology is what makes their claims defensible, and that methodology is the model.
McLellan, Skipper, Campbell, and DuPont, BMJ 2008. “Five year outcomes in a cohort study of physicians treated for substance use disorders in the United States.” A five-year follow-up of 904 physicians enrolled across 16 state Physician Health Programs between 1995 and 2001. The headline findings: 78% of physicians completed monitoring without substance-related relapse, and 75% retained their medical license over the five-year period. These are not rates pulled from program reports. They are rates derived from a structured cohort followed under a defined protocol over a defined window.
DuPont, McLellan, White, Merlo, and Gold, Journal of Substance Abuse Treatment 2009. “Setting the standard for recovery: Physicians’ Health Programs.” A synthesis of PHP outcomes data across multiple state programs that confirmed the 75 to 80 percent range of sustained abstinence and license retention over multi-year follow-up. The paper proposed PHPs as a “gold standard” model for substance use treatment.
The “75 to 80 percent” figure is widely cited in physician monitoring marketing. It is defensible when used precisely. It is indefensible when used loosely. The difference rests entirely on whether the records that produced it can be shown.
A program that wants to make a McLellan-style claim today has to be able to do four things. Identify a defined cohort over a defined enrollment window. Track each participant under a consistent monitoring protocol. Capture relapse events with consistent definitions and dates. Reconcile against a credentialing or licensing record over a multi-year horizon.
These are records discipline questions before they are research questions. The McLellan and DuPont studies were possible because the underlying programs kept the records. Most monitoring programs do not.
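To make the point concrete, here is a minimal sketch of how a defined cohort with consistent relapse capture yields a McLellan-style rate. The field names are hypothetical, not the studies' actual protocol; the point is that every input to the rate is a record the program had to have kept all along.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical records shape: a defined cohort, a consistent completion
# flag, and relapse events captured with dates under one definition.
@dataclass
class Participant:
    participant_id: str
    enrolled: date
    completed_monitoring: bool            # finished the defined protocol
    relapse_dates: list[date] = field(default_factory=list)

def no_relapse_completion_rate(cohort: list[Participant],
                               window_start: date,
                               window_end: date) -> float:
    """Share of a defined enrollment cohort that completed monitoring
    with no documented relapse event. Defensible only if the enrollment
    window, protocol, and relapse definition were held constant."""
    enrolled = [p for p in cohort if window_start <= p.enrolled <= window_end]
    if not enrolled:
        raise ValueError("no participants in the defined enrollment window")
    clean = [p for p in enrolled
             if p.completed_monitoring and not p.relapse_dates]
    return len(clean) / len(enrolled)
```

A spreadsheet can hold these fields for a year. Holding them consistently for five years, across staff turnover, is the part that fails.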
The drug court parallel
The drug court outcomes literature is the second well-documented body of evidence in recovery monitoring, and it follows the same pattern.
The Multi-Site Adult Drug Court Evaluation (MADCE), conducted by Rossman, Roman, Zweig, Rempel, and Lindquist and published by the Urban Institute in 2011 under National Institute of Justice funding, studied 23 adult drug courts and 6 comparison jurisdictions. The headline findings: recidivism reductions of 8 to 26 percent at 18-month follow-up across study sites, and cost savings of $4,400 to $13,000 per participant per year compared to traditional case processing.
The MADCE remains the largest and most-cited drug court outcomes study in the field. Subsequent NIJ-funded research has extended the findings to specialized populations, including juvenile drug courts, family drug courts, and veterans treatment courts. The conclusions have held up.
What made the MADCE possible was the records discipline at the participating courts. Every participant identified consistently across the criminal justice system. Every program contact logged. Every status hearing documented. Every graduation, termination, and re-arrest tied to the participant of record over the follow-up period.
A drug court that wants to claim recidivism reduction today has to be able to produce data of that quality. Most cannot. Their records are spread across spreadsheets, paper files, and the institutional memory of long-tenured staff. The numbers may exist; the structure to defend them does not.
Why most monitoring programs cannot make claims like these
The technology a program uses determines, in advance, what kinds of outcomes claims that program can make.
Programs operating on spreadsheets and email face four structural constraints that compound over time.
Cohort definition is unreliable. When a coordinator leaves and another arrives, the rules for who counts as a current participant tend to shift. Inactive cases linger on lists. Active cases get dropped. By the time anyone wants to look back at a five-year cohort, the cohort itself is contested.
Event capture is inconsistent. Relapse events, missed tests, re-arrests, license actions. Each is a critical input to an outcomes claim. Each is captured differently depending on which staff member documented it, what tool they used, and what they thought the documentation standard was. Events get captured twice. Events get missed. Events get logged in a way that cannot be reconciled later.
Stakeholder reconciliation is manual. A meaningful outcomes claim usually requires reconciling internal program records with external authoritative records: licensing board status, court docket disposition, employer file. When that reconciliation happens by phone calls and email exchanges, the cost of doing it across a five-year cohort is prohibitive. So it does not happen.
Disclosure history cannot be reconstructed. A program that wants to demonstrate, in retrospect, what it disclosed to whom and on what authority, often cannot. The disclosures happened. The records of them are scattered, incomplete, or absent.
These four constraints are not failures of effort. They are predictable consequences of running multi-year, multi-stakeholder workflows on tools that were not designed for them. The work gets done. The defensible record of the work does not.
What audit-trail rigor and structured records actually produce
A platform built for outcomes infrastructure produces five things that an outcomes claim depends on.
A definitive cohort. Every participant has a single, persistent identifier tied to a defined enrollment window. Status changes are explicit, timestamped, and auditable. The cohort is a queryable property of the data, not a property of who happens to be running the program this quarter.
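As a sketch of what that looks like in practice (illustrative table and column names, not Reweave Care's actual schema), status changes are appended rather than overwritten, and membership in the cohort becomes a query:

```python
import sqlite3

# Illustrative schema: one persistent identifier per participant;
# status changes are appended, timestamped, and attributed, never
# edited in place.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE participants (
        participant_id TEXT PRIMARY KEY,
        enrolled_at    TEXT NOT NULL          -- ISO 8601
    );
    CREATE TABLE status_changes (
        participant_id TEXT NOT NULL REFERENCES participants,
        status         TEXT NOT NULL,         -- 'active', 'graduated', ...
        effective_at   TEXT NOT NULL,         -- ISO 8601
        recorded_by    TEXT NOT NULL          -- staff attribution
    );
""")

# A participant's status at time T is their latest status change on or
# before T, so "who was active as of a date" is a query, not a debate.
ACTIVE_AS_OF = """
    SELECT s.participant_id
    FROM status_changes AS s
    JOIN (SELECT participant_id, MAX(effective_at) AS last_at
          FROM status_changes
          WHERE effective_at <= :as_of
          GROUP BY participant_id) AS latest
      ON latest.participant_id = s.participant_id
     AND latest.last_at = s.effective_at
    WHERE s.status = 'active'
"""
active = conn.execute(ACTIVE_AS_OF, {"as_of": "2024-12-31"}).fetchall()
```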
Structured event records. Test results, contact events, missed appointments, status changes, relapse documentation, disclosures, graduations, terminations. Each captured with a consistent schema, attributable to the staff member who recorded it, with the time, the participant, the event type, and the relevant context. Events are records, not narratives.
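A minimal sketch of what a consistent event schema can look like; the taxonomy and field names here are illustrative, and a real platform's schema would be richer:

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum

# Illustrative event taxonomy: one closed set of types, not free text.
class EventType(Enum):
    TEST_RESULT = "test_result"
    CONTACT = "contact"
    MISSED_APPOINTMENT = "missed_appointment"
    STATUS_CHANGE = "status_change"
    RELAPSE = "relapse"
    DISCLOSURE = "disclosure"

@dataclass(frozen=True)  # frozen: events are written once, not edited
class Event:
    event_id: str
    participant_id: str       # persistent identifier, never reused
    event_type: EventType
    occurred_at: datetime     # when it happened
    recorded_at: datetime     # when it was documented
    recorded_by: str          # staff attribution
    detail: dict              # type-specific context, validated per type
```

The closed taxonomy is the discipline: an event either fits a defined type with defined fields, or the schema is extended deliberately. Nothing is captured as an unclassifiable narrative.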
Disclosure provenance. Every disclosure that leaves the platform is tied to the consent that authorized it, the data categories included, the recipient, and the purpose. This is the same architecture that consent rigor requires, and the audit benefit is the same. A program can show, in retrospect, exactly what it disclosed to whom, on what basis.
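Sketched as a record (field names illustrative), a disclosure carries its own authorization with it:

```python
from dataclasses import dataclass
from datetime import datetime

# Illustrative shape of a disclosure record: every field an auditor
# would ask about is captured at the moment of disclosure.
@dataclass(frozen=True)
class Disclosure:
    disclosure_id: str
    participant_id: str
    consent_id: str                     # the consent that authorized it
    data_categories: tuple[str, ...]    # e.g. ("test_results", "status")
    recipient: str                      # named organization or role
    purpose: str                        # stated purpose, within consent scope
    disclosed_at: datetime
    disclosed_by: str                   # staff attribution
```

With this shape, “every disclosure in the last twelve months” is a filter on disclosed_at, not an archaeology project.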
Reconciliation pathways. Where the program tracks data that needs to be reconciled against external authoritative records (licensing status, docket disposition), the platform records the external reference identifiers needed for that reconciliation in advance, not in retrospect. The matching becomes a query, not a project.
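A sketch of why capturing the external identifier up front matters; the table and column names are hypothetical, with board_extract standing in for records imported from the licensing board:

```python
import sqlite3

# Because the external identifier (here, a license number) was recorded
# at enrollment, reconciling the cohort against a licensing-board
# extract is a join. The LEFT JOIN matters: participants with no match
# surface as reconciliation gaps instead of silently disappearing.
def reconcile_license_status(conn: sqlite3.Connection) -> list[tuple]:
    return conn.execute("""
        SELECT p.participant_id, b.license_status, b.status_date
        FROM participants AS p
        LEFT JOIN board_extract AS b
               ON b.license_number = p.license_number
    """).fetchall()
```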
A tamper-evident audit log. The record of who did what, when, on which record, with what authority, is itself a record. It cannot be edited retroactively. It can be queried. It is the evidentiary backstop that makes everything else trustworthy.
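One common construction for tamper evidence is a hash chain: each entry's hash covers its content plus the previous entry's hash, so any retroactive edit breaks the chain from that point forward. A minimal sketch, not Reweave Care's actual implementation:

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash before the first entry

def append_entry(log: list[dict], entry: dict) -> None:
    """Append an audit entry whose hash commits to the previous one."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    payload = json.dumps(entry, sort_keys=True, default=str)
    digest = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
    log.append({"entry": entry, "prev_hash": prev_hash, "hash": digest})

def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash; any edited, inserted, or deleted entry
    invalidates everything after it."""
    prev_hash = GENESIS
    for record in log:
        payload = json.dumps(record["entry"], sort_keys=True, default=str)
        expected = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
        if record["prev_hash"] != prev_hash or record["hash"] != expected:
            return False
        prev_hash = record["hash"]
    return True
```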
These five capabilities are not the deliverables of an outcomes study. They are the conditions that make an outcomes study possible. The discipline of producing them is the daily operation of the program. The benefit shows up years later, when someone asks the program to defend a claim and the program can.
The gap the literature itself acknowledges
There is a candid note in the recovery monitoring outcomes literature that deserves attention.
The PHP studies are now seventeen years old. They predate widespread medication-assisted treatment. They predate modern case-management software. They studied programs whose records discipline came from the institutional habits of the participating PHPs and the academic teams that worked alongside them. The literature acknowledges the absence of more recent research, and it specifically flags one question that has not been studied at all.
No published research currently compares the outcomes of monitoring programs equipped with structured case-management platforms against the outcomes of programs running on spreadsheets and email.
The question is research-relevant. It is also strategically relevant. The hypothesis that platform-equipped programs produce more defensible outcomes than spreadsheet-based programs is reasonable on its face, supported by the records-discipline argument above, and consistent with what the McLellan and MADCE studies imply about the conditions for evidence-grade claims. It has not been tested.
Reweave Health is positioned to participate in answering it over time. Programs that adopt platforms with the records discipline described above produce, as a byproduct of their daily operation, the data that would let the question be studied properly. Outcomes research is not the platform’s primary purpose. The infrastructure that makes outcomes research possible is.
What this means for buyers in regulated programs
Programs evaluating monitoring platforms have a buying choice that compounds over the lifetime of the program.
A platform without records discipline can run the program day-to-day at what looks like acceptable cost. The real cost shows up later, when the program is asked to justify its work to a board, a funder, a regulator, or a court, and the records cannot answer the question being asked. By then, the data the answer would have required has either not been captured or cannot be reconstructed at a manageable cost.
A platform with records discipline runs the program day-to-day at the same cost (often lower, given coordinator-time savings). The benefit shows up later, when the program has the records to back the claims it wants to make. Outcomes data, compliance evidence, disclosure history, audit trail. Each is a queryable property of how the work was done, not a project to be undertaken in retrospect.
Three diagnostic questions distinguish the two postures.
Show me the cohort. If the platform cannot return a clean list of every participant who was active during a defined window, attributed to a coordinator and a status, the cohort discipline is not there.
Show me the event log. If event capture is inconsistent across coordinators or across time, the events that an outcomes claim depends on are not records, they are guesses.
Show me the disclosures. If the platform cannot return every disclosure that left the program in the last twelve months, the consent and audit posture is not there, and the next claim about compliance will not be defensible either.
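In a platform built this way, each diagnostic reduces to a query the vendor can run live in a demo. A sketch against the illustrative schemas above (table and column names hypothetical):

```python
# "Show me the cohort": active participants in a window, with attribution.
SHOW_ME_THE_COHORT = """
    SELECT participant_id, status, effective_at, recorded_by
    FROM status_changes
    WHERE status = 'active'
      AND effective_at BETWEEN :window_start AND :window_end
"""

# "Show me the event log": capture volume per type and per recorder,
# which makes inconsistent documentation visible at a glance.
SHOW_ME_THE_EVENT_LOG = """
    SELECT event_type,
           COUNT(*) AS events,
           COUNT(DISTINCT recorded_by) AS distinct_recorders
    FROM events
    GROUP BY event_type
"""

# "Show me the disclosures": everything that left the program, with the
# consent that authorized it.
SHOW_ME_THE_DISCLOSURES = """
    SELECT disclosure_id, recipient, purpose, consent_id, disclosed_at
    FROM disclosures
    WHERE disclosed_at >= :twelve_months_ago
"""
```

If the answer to any of the three is a report that takes a week to assemble, the posture is not there.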
The vendors that can answer these questions cleanly are the ones built for the long run. The vendors that cannot are the ones whose customers will, eventually, find that the outcomes story they want to tell is not in the records they kept.
Where Reweave Health stands on this
Reweave Care, the first platform under the Reweave Health umbrella, is built around the proposition that records discipline is the work, not a deliverable from the work. Cohort definition, event structure, disclosure provenance, reconciliation pathways, and audit-trail integrity are the substrate the platform runs on.
The clinical organizations and program administrators we partner with do not have to choose between operating their program well today and being able to demonstrate the value of that work tomorrow. The infrastructure that supports the second is the same infrastructure that supports the first.
If the records your program is keeping today are the records you want to be defending in five years, we should talk.