A new paper aims to trawl medical records to work out how well depressed patients responded to treatment. The authors used Natural Language Processing or NLP (not that NLP) to interpret electronic medical records from over 5,000 patients treated at hospitals in New England. Each record included notes taken at multiple visits.
A crack team of "three experienced board-certified clinical psychiatrists" reviewed the notes and provided a "Gold Standard" classification as to whether patients were Depressed, Recovered or Intermediate at each visit. The problem here is that they didn't actually see the patients; they just had the notes. If the notes were bad, the classification will have been bad too. Garbage In, Garbage Out. Even if you then put a big gold medal on the garbage.
They then found that an NLP algorithm was able to learn how to duplicate the expert opinion, based on the words used in the notes. Using a machine learning approach they were able to teach the computer that if the text contained the word "depressed", it was a sign that the patient was depressed while "much better" was associated with being... guess.
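To make the idea concrete, here is a toy sketch of learning a label from the words in a note. It is emphatically not the paper's pipeline: the notes are invented, there are only two labels instead of three, and the technique (a bag-of-words naive Bayes with add-one smoothing) is just one I picked for illustration. The names `features`, `train` and `classify` are my own.

```python
import math
import re
from collections import Counter, defaultdict


def features(note):
    """Lowercase the note and return its words plus adjacent word pairs."""
    words = re.findall(r"[a-z']+", note.lower())
    return words + [f"{a} {b}" for a, b in zip(words, words[1:])]


def train(labelled_notes):
    """Count how often each feature appears under each label."""
    feature_counts = defaultdict(Counter)
    label_counts = Counter()
    for note, label in labelled_notes:
        label_counts[label] += 1
        feature_counts[label].update(features(note))
    return feature_counts, label_counts


def classify(note, feature_counts, label_counts):
    """Return the label with the highest naive Bayes log-score."""
    vocab = {f for counts in feature_counts.values() for f in counts}
    total_notes = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        counts = feature_counts[label]
        denom = sum(counts.values()) + len(vocab)  # add-one smoothing
        score = math.log(label_counts[label] / total_notes)
        for f in features(note):
            if f in vocab:  # ignore features never seen in training
                score += math.log((counts[f] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented notes standing in for the psychiatrists' "Gold Standard" labels.
TRAINING = [
    ("patient remains depressed with low mood and poor sleep", "depressed"),
    ("still depressed, tearful, no appetite", "depressed"),
    ("feeling much better, mood bright, sleeping well", "recovered"),
    ("reports she is much better and back at work", "recovered"),
]

model = train(TRAINING)
print(classify("he is much better this week", *model))    # recovered
print(classify("remains depressed and tearful", *model))  # depressed
```

Note that the sketch can only ever echo whatever labels it was trained on, which is the Garbage In, Garbage Out worry again.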
In fairness, it's not a bad attempt to turn text into numbers, and in future it could allow you to do interesting things such as comparing two drugs in terms of which one makes people "much better".
I'm concerned about this though. The essence of the original narrative notes is that they contain individual information about each patient's story. You could go through them with a computer and calculate what happens to the average patient given a certain drug. That might be useful information. But if you did that as a replacement for reading about individual patients, you'd be missing the whole point of the narrative notes.
Worse, as this kind of thing becomes feasible it will feed back on itself and encourage clinicians to write their notes (and therefore, inevitably, to think) in machine-readable terms. The authors suggest as much:
"As more health care systems move to electronic medical records, there is a unique opportunity to better quantify outcomes. For example, the 16-item patient-rated QIDS-SR [questionnaire] has been shown to be highly correlated with clinician-rated measures and sensitive to treatment effects.... At minimum, EMR systems that utilize templates could require clinicians to record a clinical status for example, using the 7-point Clinical Global Impression scale..."
Indeed, many say that this is already happening. Now, quantification is generally a good thing, I think, but only so long as it's an aid to understanding, not a replacement for it.
Yet quantification often does become a replacement for understanding, because there's a trap that we face when trying to deal with a complicated set of information. The temptation is to focus on the bit that's easiest to measure, precisely because it's easy, and then assume that it represents the state of the whole thing. But the reason something is easy to measure is often that it doesn't capture the whole phenomenon.
Perlis RH, Iosifescu DV, Castro VM, Murphy SN, Gainer VS, Minnier J, Cai T, Goryachev S, Zeng Q, Gallagher PJ, Fava M, Weilburg JB, Churchill SE, Kohane IS, & Smoller JW (2011). Using electronic medical records to enable large-scale studies in psychiatry: treatment resistant depression as a model. Psychological Medicine, 1-10. PMID: 21682950