
What the Numbers Mean: A Therapist's Guide to Reading Your Dashboard
You didn't go to grad school for statistics. Neither did most of us.
But if you're tracking outcomes — and you should be — you're going to see numbers on your dashboard. Severity bands. Trend arrows. Alerts. Percentages. Words like "reliable change" and "clinically significant."
None of it is as complicated as it sounds. This guide walks through every statistical concept Theracharts surfaces, in plain language, with concrete examples. No formulas. No Greek letters. Just what the numbers mean and how to use them.
Severity bands: where does this score land?
Every validated assessment has severity bands — score ranges that correspond to how bad things are. You already use these intuitively. A PHQ-9 of 18 feels different from a PHQ-9 of 6, and the bands formalize that intuition.
For the PHQ-9, the bands are:
- 0–4: Minimal. Not endorsing significant depression symptoms.
- 5–9: Mild. Some symptoms present, worth monitoring.
- 10–14: Moderate. Clinical threshold — treatment should be addressing depression at this level.
- 15–19: Moderately severe. Most symptoms endorsed at high frequency.
- 20–27: Severe. Active, intensive treatment is strongly indicated.
The GAD-7 follows a similar structure (0–4 minimal, 5–9 mild, 10–14 moderate, 15–21 severe). The other instruments in the library, including the PCL-5, DASS-21, and ISI, have their own published bands or cutoff scores.
Theracharts color-codes these automatically. When you see a score, you also see where it falls on the severity spectrum. The value isn't in any one score — it's in watching which band your client occupies over time, and whether they're moving in the right direction.
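For readers who like to see the logic spelled out, the band lookup can be sketched in a few lines of Python. This is an illustration only, not Theracharts code: the names PHQ9_BANDS and severity_band are made up for the example, and the ranges are the PHQ-9 bands listed above.

```python
# PHQ-9 bands as listed above: (lowest score, highest score, label).
PHQ9_BANDS = [
    (0, 4, "minimal"),
    (5, 9, "mild"),
    (10, 14, "moderate"),
    (15, 19, "moderately severe"),
    (20, 27, "severe"),
]

def severity_band(score, bands=PHQ9_BANDS):
    """Return the label of the band that contains the score."""
    for low, high, label in bands:
        if low <= score <= high:
            return label
    raise ValueError(f"score {score} is outside the instrument's range")

print(severity_band(18))  # moderately severe
print(severity_band(6))   # mild
```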
When is a score change "real"? Understanding reliable change
This is the most important concept in outcome tracking, and the one therapists ask about most.
Your client's PHQ-9 drops from 15 to 10. That's five points. Is that real improvement, or is it just noise — a good week, a mood fluctuation, the client feeling more hopeful on a sunny Tuesday?
This is exactly what the Reliable Change Index answers. It's a statistical threshold, specific to each instrument, that tells you: this change is bigger than what you'd expect from measurement error alone.
Think of it like a thermometer. If a thermometer is accurate to plus or minus one degree, and you get a reading of 99.1 today versus 98.7 yesterday, that 0.4-degree difference is within the margin of error. You can't be confident the temperature actually changed. But if it reads 101.2 today versus 98.6 yesterday — that 2.6-degree difference exceeds the margin of error. The change is real.
The RCI works the same way for clinical assessments. Every instrument has a known amount of measurement error, which is estimated from its reliability (for example, its test-retest reliability). The RCI calculates how much a score needs to change before you can be confident the change is real and not just noise.
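If you're curious what that calculation looks like, one widely used version (usually attributed to Jacobson and Truax) combines the instrument's standard deviation with its reliability. The sketch below assumes that version. This guide does not say which formula Theracharts applies, and the input numbers are invented for illustration, not published values for any instrument.

```python
import math

def reliable_change_threshold(sd, reliability, z=1.96):
    """Smallest score change that exceeds measurement error at the given z level."""
    sem = sd * math.sqrt(1 - reliability)  # standard error of measurement
    s_diff = math.sqrt(2) * sem            # standard error of the difference between two scores
    return z * s_diff

# Made-up inputs for illustration only (assumptions, not real psychometric data).
threshold = reliable_change_threshold(sd=5.0, reliability=0.84)
print(round(threshold, 2))  # 5.54
```

With these made-up inputs, a score would need to move by about 5.5 points to clear the 95% confidence bar. A less reliable instrument, or one with more spread in its scores, would need a bigger change.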
For the instruments you use most often:
- PHQ-9: A change of 5 or more points is clinically meaningful. So your client going from 15 to 10 — that's a real change. Going from 15 to 13? Probably noise.
- GAD-7: A 4-point change meets the threshold. From 14 to 10? Real. From 14 to 12? Inconclusive.
- PCL-5: A 10-point change is the response threshold. PTSD scores tend to be more variable, so the bar is higher.
- DASS-21: About 9 points per subscale (depression, anxiety, stress are scored separately).
- ISI (Insomnia Severity Index): An 8-point change indicates treatment response.
- CORE-10: A 6-point change at the 90% confidence level.
When Theracharts labels a change as "reliable improvement" or "reliable deterioration," it's applying these thresholds. A change that doesn't reach the threshold isn't labeled — not because it doesn't matter, but because you can't be statistically confident it's real yet.
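The labeling step can be sketched as a simple comparison against the thresholds listed above. The function and dictionary names are hypothetical, and the sketch assumes lower scores are better on every instrument shown.

```python
# Thresholds from the list above, in points. Lower scores are better on all of these.
RELIABLE_CHANGE_POINTS = {
    "PHQ-9": 5,
    "GAD-7": 4,
    "PCL-5": 10,
    "DASS-21": 9,  # per subscale
    "ISI": 8,
    "CORE-10": 6,
}

def label_change(instrument, first_score, latest_score):
    """Return a label when the change reaches the threshold, otherwise None."""
    threshold = RELIABLE_CHANGE_POINTS[instrument]
    change = latest_score - first_score
    if change <= -threshold:
        return "reliable improvement"
    if change >= threshold:
        return "reliable deterioration"
    return None  # below the threshold, so no label

print(label_change("PHQ-9", 15, 10))  # reliable improvement
print(label_change("PHQ-9", 15, 13))  # None
print(label_change("GAD-7", 14, 10))  # reliable improvement
```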
What to do with this: When you see "reliable improvement," you can trust it. When you see a smaller change, keep watching. Two or three administrations moving consistently in the same direction can be clinically meaningful even when each individual change is below the threshold.
Trend lines: reading the direction of treatment
A single score is a snapshot. A trend line is the story.
Theracharts tracks every administration and shows you the direction your client is heading. Here's how to read the trend labels:
Improving means scores are moving consistently toward healthier ranges. The app looks at the overall trajectory — if most data points are heading in the right direction, even with some session-to-session variability, the trend reads as improving.
Worsening means the opposite — scores are consistently moving toward more severe ranges.
Non-monotonic means the scores are bouncing around without a clear direction. Up one week, down the next. This is actually useful information — it might indicate that the client's symptoms are reactive to external stressors, that treatment hasn't found traction yet, or that the assessment is capturing real variability in their experience.
Flat means scores aren't changing meaningfully in either direction. This can mean treatment is maintaining gains (good, if they started in a healthy range) or that things are stuck (worth examining, if they're still in a clinical range).
The app also calculates a slope — essentially, how fast things are changing per week. A steeper slope means faster change. This is useful for comparing treatment phases: "During the first eight weeks we saw rapid improvement, and in the last four weeks the pace has slowed" is a conversation worth having with your client.
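One common way to get a "points per week" slope is a least-squares line through the scores. The sketch below assumes that method; the guide does not specify how Theracharts computes its slope, and the scores are hypothetical.

```python
def slope_per_week(weeks, scores):
    """Least-squares slope: average change in score per week."""
    n = len(weeks)
    if n < 2 or n != len(scores):
        raise ValueError("need at least two (week, score) pairs")
    mean_w = sum(weeks) / n
    mean_s = sum(scores) / n
    numerator = sum((w - mean_w) * (s - mean_s) for w, s in zip(weeks, scores))
    denominator = sum((w - mean_w) ** 2 for w in weeks)
    if denominator == 0:
        raise ValueError("all administrations fall in the same week")
    return numerator / denominator

# Hypothetical PHQ-9 scores at weeks 0 through 5.
weeks = [0, 1, 2, 3, 4, 5]
scores = [18, 17, 15, 14, 12, 11]
print(round(slope_per_week(weeks, scores), 2))  # -1.46
```

A negative slope on an instrument where lower is better means the client is improving, here by roughly a point and a half per week.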
The 15% rule: Theracharts flags a change as a meaningful trend when it reaches 15% or more from the first-half average to the second-half average of the assessment period. A 30% or greater change gets flagged as a strong trend. These thresholds are deliberately conservative — when the app says "improving," it means it.
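Here is a rough sketch of that first-half versus second-half comparison. Several details are assumptions made for the example: how an odd number of scores is split, the label text, and the "no clear trend" fallback for changes under 15%.

```python
def half_split_trend(scores, lower_is_better=True):
    """Compare first-half and second-half averages and label the trend."""
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    mid = len(scores) // 2
    # Assumption: with an odd count, the middle score joins the second half.
    first, second = scores[:mid], scores[mid:]
    first_avg = sum(first) / len(first)
    second_avg = sum(second) / len(second)
    if first_avg == 0:
        raise ValueError("percent change is undefined when the first-half average is 0")
    pct = (second_avg - first_avg) / first_avg * 100
    size = abs(pct)
    if size < 15:
        return "no clear trend"
    improving = (pct < 0) == lower_is_better
    direction = "improving" if improving else "worsening"
    return f"strongly {direction}" if size >= 30 else direction

print(half_split_trend([18, 17, 15, 14, 12, 11]))  # improving (about a 26% drop)
print(half_split_trend([20, 19, 12, 11]))          # strongly improving (about a 41% drop)
print(half_split_trend([10, 11, 10, 11]))          # no clear trend
```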
Clinical alerts: what needs your attention right now
Not everything on your dashboard requires the same level of urgency. The alert system flags four situations that warrant immediate clinical attention:
Severe score alerts fire when a client's score lands in the highest severity band for any instrument: a PHQ-9 of 20 or higher or a GAD-7 of 15 or higher, for instance. This isn't about change. It's about current severity being high enough that you should be thinking about it before the next session.
Critical item alerts flag specific responses to safety-relevant questions. If a client endorses the suicidality item on the PHQ-9, that generates an alert regardless of the total score. A PHQ-9 total of 8 with item 9 endorsed at "more than half the days" is clinically different from a PHQ-9 of 8 without it.
Acted-on-target alerts (for DBT diary cards) fire when a client reports engaging in a target behavior — the behaviors you've identified as treatment priorities.
High urge alerts fire when a client reports urge intensity at 4 or 5 on a 5-point scale. A high urge that wasn't acted on is still clinically important — it tells you the skills are working for now, but the risk is elevated.
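The four alert rules can be sketched as plain checks. Everything named here is hypothetical. The sketch treats the top band as starting at 20 on the PHQ-9 and 15 on the GAD-7, following the band ranges given earlier, and it assumes any response above "not at all" on a critical item counts as endorsed.

```python
# Lowest score in the top severity band, from the bands given earlier.
SEVERE_FROM = {"PHQ-9": 20, "GAD-7": 15}
# Safety-relevant items, numbered from 1. Item 9 on the PHQ-9 is the suicidality item.
CRITICAL_ITEMS = {"PHQ-9": [9]}

def assessment_alerts(instrument, item_scores):
    """Return the alerts raised by one completed assessment."""
    alerts = []
    total = sum(item_scores)
    if instrument in SEVERE_FROM and total >= SEVERE_FROM[instrument]:
        alerts.append("severe score")
    for item in CRITICAL_ITEMS.get(instrument, []):
        if item_scores[item - 1] > 0:  # assumption: any endorsement counts
            alerts.append(f"critical item {item}")
    return alerts

def diary_card_alerts(acted_on_target, urge_intensity):
    """Return the alerts raised by one DBT diary card entry."""
    alerts = []
    if acted_on_target:
        alerts.append("acted on target")
    if urge_intensity >= 4:  # 4 or 5 on the urge scale
        alerts.append("high urge")
    return alerts

# PHQ-9 total of 8 with item 9 at "more than half the days" (a score of 2).
print(assessment_alerts("PHQ-9", [1, 1, 1, 1, 1, 1, 0, 0, 2]))  # ['critical item 9']
print(diary_card_alerts(acted_on_target=False, urge_intensity=4))  # ['high urge']
```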
The MBC dashboard: your practice at a glance
If you're tracking outcomes across your caseload, the Measurement-Based Care dashboard gives you four key metrics:
Improving — the percentage of clients whose scores have improved by 10% or more. This is your signal that treatment is working across your caseload.
Declining — the percentage whose scores have worsened by 10% or more. Even one client here deserves attention. Research consistently shows that therapists are poor at detecting deterioration without data.
Average change — the mean percentage change across all tracked clients. A negative number (for instruments where lower is better) means your caseload is improving on average.
Response rate — what percentage of assigned assessments are actually getting completed. This is your engagement metric. A 70% or higher completion rate is strong. Between 40% and 70% is typical. Below 40% is a signal that clients may need reminders, shorter instruments, or a conversation about the value of tracking.
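To make the arithmetic behind these four numbers concrete, here is a small sketch using a hypothetical five-client caseload. The function name, the input format, and the "low" label for rates under 40% are assumptions for the example. Each client's change is expressed as a percentage, with negative meaning improvement on a lower-is-better instrument.

```python
def caseload_metrics(pct_changes, assigned, completed):
    """Summarize a caseload. pct_changes holds one percent change per client."""
    n = len(pct_changes)
    if n == 0 or assigned == 0:
        raise ValueError("need at least one client and one assigned assessment")
    improving = 100 * sum(1 for c in pct_changes if c <= -10) / n
    declining = 100 * sum(1 for c in pct_changes if c >= 10) / n
    response_rate = 100 * completed / assigned
    if response_rate >= 70:
        engagement = "strong"
    elif response_rate >= 40:
        engagement = "typical"
    else:
        engagement = "low"
    return {
        "improving_pct": improving,
        "declining_pct": declining,
        "average_change_pct": sum(pct_changes) / n,
        "response_rate_pct": response_rate,
        "engagement": engagement,
    }

# Hypothetical caseload: five clients, 30 of 40 assigned assessments completed.
metrics = caseload_metrics([-30, -12, -5, 4, 15], assigned=40, completed=30)
print(metrics)
```

In this made-up caseload, two of five clients (40%) are improving, one (20%) is declining, the average change is -5.6%, and the 75% response rate lands in the strong range.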
Engagement detection: is your client pulling away?
The dashboard also watches for disengagement patterns. If a client's assessment completion rate drops by more than 50% over a two-week window compared to the prior two weeks, the app flags them as "at risk" for disengagement. If they stop completing entirely, they're flagged as "disengaged."
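As a rough sketch, the rule compares two completion rates. This version assumes the "more than 50%" drop is relative to the prior window (for example, from 80% to 30%), not a drop of 50 percentage points. The "engaged" label and the function name are made up for the example.

```python
def engagement_status(prior_rate, recent_rate):
    """Compare completion rates (0.0 to 1.0) for the prior and most recent two-week windows."""
    if recent_rate == 0:
        return "disengaged"  # stopped completing entirely
    if prior_rate > 0 and (prior_rate - recent_rate) / prior_rate > 0.5:
        return "at risk"     # relative drop of more than 50%
    return "engaged"

print(engagement_status(prior_rate=0.8, recent_rate=0.3))  # at risk
print(engagement_status(prior_rate=0.8, recent_rate=0.6))  # engaged
print(engagement_status(prior_rate=0.8, recent_rate=0.0))  # disengaged
```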
This matters because disengagement from assessment completion often precedes disengagement from therapy itself. Catching the pattern early gives you a chance to address it in session — whether the issue is assessment fatigue, avoidance of difficult scores, or a broader disconnection from the therapeutic process.
Couples data: perception gaps and parallel trends
If you're doing couples work, the app tracks an additional layer: how the two partners' scores relate to each other.
Perception gaps flag when both partners complete the same assessment on the same day and their scores differ by a meaningful amount. A gap of 2 or more points is flagged as moderate; 5 or more as high. On a relationship satisfaction measure, a 6-point gap between partners tells you they're living in very different subjective realities — and that's a conversation starter.
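The gap check itself is simple arithmetic. A minimal sketch, with a hypothetical function name, using the 2-point and 5-point cutoffs described above:

```python
def perception_gap(score_a, score_b):
    """Label the gap between two partners' scores on the same assessment."""
    gap = abs(score_a - score_b)
    if gap >= 5:
        return "high"
    if gap >= 2:
        return "moderate"
    return None  # gap too small to flag

print(perception_gap(14, 8))   # high (6-point gap)
print(perception_gap(12, 10))  # moderate
print(perception_gap(11, 10))  # None
```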
Parallel trends track whether the partners' scores are moving toward each other (converging), away from each other (diverging), or staying roughly parallel. Converging scores often signal that the couple is aligning on their experience of the relationship. Diverging scores may mean one partner is improving while the other isn't, or that one partner is becoming more honest while the other was already forthcoming.
Cross-measure correlations: when patterns connect
When a client is completing multiple assessments, the app surfaces correlations — patterns where two measures tend to move together. If a client's PHQ-9 and GAD-7 are both improving at the same rate, that's a co-moving pattern. If their depression scores improve while anxiety stays flat, the app flags that divergence.
The app only surfaces these when the correlation is strong (a Pearson r of 0.7 or higher, for the statistically curious) and when there are enough data points to be meaningful. The point isn't to turn you into a data analyst — it's to surface patterns you might not notice when you're focused on one measure at a time.
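For the statistically curious, here is a sketch of how a Pearson r can be computed and checked against the 0.7 cutoff. The minimum of four data points is a hypothetical stand-in, since this guide only says "enough data points," and the scores are invented.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score series, or None if undefined."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # a perfectly flat series has no defined correlation
    return cov / math.sqrt(var_x * var_y)

def co_moving(xs, ys, threshold=0.7, min_points=4):
    """True when there are enough points and the correlation meets the threshold."""
    if len(xs) < min_points:
        return False
    r = pearson_r(xs, ys)
    return r is not None and r >= threshold

phq9 = [18, 16, 15, 12, 11]  # hypothetical depression scores
gad7 = [14, 13, 11, 10, 8]   # hypothetical anxiety scores
print(round(pearson_r(phq9, gad7), 2))  # 0.97
print(co_moving(phq9, gad7))            # True
print(co_moving(phq9, [12] * 5))        # False: anxiety stayed flat
```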
Clinical Updates: the narrative version
All of the above — severity bands, reliable change, trends, alerts, engagement patterns — feeds into the Clinical Update feature. When you generate a Clinical Update for a client, the app compiles everything since your last export into a brief narrative summary, written in clinical language, ready to paste into your EHR.
The narrative doesn't make clinical judgments. It describes what the data shows: "PHQ-9 decreased from 18 to 11 over 6 administrations (reliable improvement, crossed from moderately severe to moderate range). Trend is improving with a slope of -1.2 points per week." You provide the clinical interpretation. The app provides the data summary.
The bottom line
None of this replaces clinical judgment. A client with a "flat" trend line might be doing exactly the maintenance work they need. A client with "reliable improvement" might still be struggling in ways the assessments don't capture. The data is one voice in the room — arguably the most objective one — but it's not the only one.
What outcome tracking gives you is a check against the things we're all vulnerable to as clinicians: confirmation bias, recency effects, the tendency to over-weight the last session, and — most importantly — the well-documented difficulty of detecting deterioration in our own clients without data.
The numbers aren't complicated. They're just a structured way of answering the question every good therapist is already asking: is what I'm doing actually helping?
Ready to see what the data says about your clients? Start tracking outcomes with Theracharts — free for up to 10 clients, with 100+ validated assessments auto-scored and tracked over time.