Tuesday, February 24, 2009

A Genomic Map of the Mouse Brain

Nature Neuroscience has a nice little report about a new resource that should prove useful for neuroscientists - an anatomic gene expression atlas of the adult mouse brain.

The atlas is freely available at http://mouse.brain-map.org/agea, courtesy of the Allen Foundation. It's a map of the entire adult mouse brain including data on the expression levels of 4,376 genes. You can click on a point in the brain and see which areas have a similar pattern of gene expression:
The hotter the colour, the more strongly the gene expression profile at that point correlates with the profile at your selected region. This allows one to see the different regions of the brain defined not just anatomically, but genomically - fancy. Here I've clicked on a point in the cortex, and the map shows that other points in the cortex tend to have a similar pattern of gene expression. That's hardly surprising, of course.
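
To make the correlation-map idea concrete, here is a minimal sketch in Python of how such a map could be computed. Everything in it is made up for illustration (the array sizes, the function name seed_correlation_map, the random "expression" data); the real atlas is built from measured in situ hybridisation data, but the principle is the same: each voxel gets a vector of expression levels, one per gene, and the map shows how well the seed voxel's vector correlates with every other voxel's.

    import numpy as np

    # Toy data: 500 voxels x 4,376 genes of expression levels. The numbers are
    # random here; the real atlas uses measured in situ hybridisation data.
    rng = np.random.default_rng(0)
    expression = rng.random((500, 4376))

    def seed_correlation_map(expression, seed_voxel):
        """Pearson correlation between one voxel's gene expression profile
        and every other voxel's profile."""
        z = (expression - expression.mean(axis=1, keepdims=True)) \
            / expression.std(axis=1, keepdims=True)
        return z @ z[seed_voxel] / expression.shape[1]

    corr_map = seed_correlation_map(expression, seed_voxel=42)
    print(corr_map[42])              # the seed correlates perfectly with itself (1.0)
    print(corr_map.argsort()[-5:])   # the voxels most similar to the seed (the seed itself comes last)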

This is the kind of thing that will be invaluable for some neuroscientists, and not much use to most others, but it's a source of pretty pictures for everyone - and it's an example of the power of this kind of database. The genomic atlas is derived from the Allen Brain Atlas which allows you to see where in the brain any given gene is expressed. See also BrainMAP.org for a modest attempt to do the same thing for functional neuroimaging.

Lydia Ng, Amy Bernard, Chris Lau, Caroline C Overly, Hong-Wei Dong, Chihchau Kuan, Sayan Pathak, Susan M Sunkin, Chinh Dang, Jason W Bohland, Hemant Bokil, Partha P Mitra, Luis Puelles, John Hohmann, David J Anderson, Ed S Lein, Allan R Jones, Michael Hawrylycz (2009). An anatomic gene expression atlas of the adult mouse brain. Nature Neuroscience, 12(3), 356-362. DOI: 10.1038/nn.2281

Sunday, February 22, 2009

How To Read Minds

(Update 27/4/2009: For a methodological problem which could cast doubt on some (but not all) of the kind of research that I discuss below, see this newer post.)

In the last couple of weeks we've seen not one but two reports about "reading minds" through brain imaging. First, two Canadian scientists claimed to be able to tell which flavor of drink you prefer (Decoding subjective preference from single-trial near-infrared spectroscopy signals). Then a pair of Nashville neuroimagers said that they could tell which of two pictures you were thinking about through fMRI (Decoding reveals the contents of visual working memory in early visual areas); you can read more about this one here. Can it be true? And if so, how does it work?

Although this kind of "mind reading" with brain scanners strikes us as exciting and mysterious, it would be much more surprising if it turned out to be impossible. That would mean that Descartes was right (probably). There's nothing surprising about the fact that mental states can be read using physical measurements, such as fMRI. If you prefer one thing to another, something must be going on in your brain to make that happen. Likewise if you're thinking about a certain picture, activity somewhere in your brain must be responsible.

But how do we find the activity that's associated with a certain mental state? It's actually pretty straightforward - in the sense that it relies upon brute computational force rather than sophisticated neurobiological theories. The trick is data-mining, which I've written about before. Essentially, you take a huge set of measurements of brain activity, and search through them in order to find those which are related to the mental state of interest.

The goal in other words is pattern classification: the search for some pattern of neural activity which is correlated with, say, enjoying a certain drink, or thinking about a bunch of horizontal lines. To find such a pattern, you measure activity over an area of the brain while people are in two different mental states: you then search for some set of variables which differ between these two states.

If this succeeds, you can end up with an algorithm - a "pattern classifier" - which can take a set of activity signals and tell you which mental state it is associated with. Or if you want to be a bit more sensationalist: it can read minds! But importantly, just because it works doesn't mean that anyone knows how it works.

Here's a pic from the first paper showing the neural activity associated with preferring two different drinks (actually pictures of drinks on a screen, not real drinks). X's are the activity measured when the person preferred the first out of two drinks, and O's are when they preferred the second. The 2D "space" represents activity levels in two different measures of neural activity. A spot in the top left corner means that "Feature 2" activity was high while "Feature 1" activity was low.

You can see that the X's and the O's tend to be in different parts of the space - X's tend to be in the top left and O's in the bottom right. That's not a hard-and-fast rule but it's true most of the time. So if you drew an imaginary line down the middle you could do a pretty good job of distinguishing between the X's and the O's. This is what a pattern classifier does. It searches through a huge set of pictures like this and looks for the ones where you can draw such a line.
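
To give a flavour of that "draw a line" step, here is a toy sketch in Python. It is not the algorithm used in either paper - just a generic linear classifier (scikit-learn's LinearSVC) fitted to simulated two-feature data arranged like the X's and O's above.

    import numpy as np
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(1)

    # Simulate two "brain activity" features for 40 trials of each preference.
    # Condition X sits towards the top left, condition O towards the bottom right.
    X_trials = rng.normal(loc=[-1.0, 1.0], scale=1.0, size=(40, 2))
    O_trials = rng.normal(loc=[1.0, -1.0], scale=1.0, size=(40, 2))

    features = np.vstack([X_trials, O_trials])
    labels = np.array([0] * 40 + [1] * 40)   # 0 = preferred drink 1, 1 = drink 2

    # The classifier learns a line (w . x + b = 0) that separates the two clouds.
    clf = LinearSVC(C=1.0).fit(features, labels)
    print("training accuracy:", clf.score(features, labels))
    print("line weights and intercept:", clf.coef_, clf.intercept_)

    # "Mind reading": predict which preference lies behind a new activity pattern.
    print("prediction for [-0.5, 0.8]:", clf.predict([[-0.5, 0.8]]))

In a real study the classifier would, of course, be judged on trials it was never trained on; accuracy on the training data alone tells you very little.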

The second paper uses what's in essence a similar method to discriminate between the neural activity in the visual areas of the brain associated with remembering two different pictures. Indeed, the technique is fast becoming very popular with neuroimagers. (One attractive thing about it is that you can point a pattern classifier at some data that you collected for entirely separate reasons - two publications for the price of one...) But this doesn't mean that we can read your mind. We just have computer programs that can do it for us - and only if they are specially (and often time-consumingly) "trained" to discriminate between two very specific states of mind.

Being able to put someone in an MRI scanner and work out what they are thinking straight off the bat is a neuroimager's pipe dream and will remain so for a good while yet.

Sheena Luu, Tom Chau (2009). Decoding subjective preference from single-trial near-infrared spectroscopy signals. Journal of Neural Engineering, 6(1). DOI: 10.1088/1741-2560/6/1/016003

Stephanie Harrison, Frank Tong (2009). Decoding reveals the contents of visual working memory in early visual areas. Nature.

Saturday, February 21, 2009

Redesigned

You might have noticed that this blog has a new design - love it? Hate it? Got any better ideas?

Sunday, February 15, 2009

Ecstasy vs. Horseriding

Which is more dangerous, taking ecstasy or riding a horse?

This is the question that got Professor David Nutt, a British psychiatrist, into a spot of political bother. Nutt is the Editor of the academic Journal of Psychopharmacology. He recently published a brief and provocative editorial called "Equasy".

Equasy is a fun read with a serious message. (It's open access so you can read the whole thing - I recommend it.) Nutt points out that the way in which we think about the harms of illegal drugs, such as ecstasy, is unlike the way in which we think about other dangerous things such as horseriding - or "equasy" as he dubs it:
The drug debate takes place without reference to other causes of harm in society, which tends to give drugs a different, more worrying, status. In this article, I share experience of another harmful addiction I have called equasy...
He goes on to describe some of the injuries, including brain damage, that you can get from falling off horses. After arguing that horseriding is in some ways comparable to ecstasy in terms of its dangerousness he concludes:
Perhaps this illustrates the need to offer a new approach to considering what underlies society’s tolerance of potentially harmful activities and how this evolves over time (e.g. fox hunting, cigarette smoking). A debate on the wider issues of how harms are tolerated by society and policy makers can only help to generate a broad based and therefore more relevant harm assessment process that could cut through the current ill-informed debate about the drug harms? The use of rational evidence for the assessment of the harms of drugs will be one step forward to the development of a credible drugs strategy.
Or, in other words, we need to ask why we are more concerned about the harms of illicit drugs than we are about the harms of, say, sports. No-one ever suggests that the existence of sporting injuries means that we ought to ban sports. Ecstasy is certainly not completely safe. People do die from taking it and it may cause other more subtle harms. But people die and get hurt by falling off horses. Even if it turns out that on an hour-by-hour basis, you're more likely to die riding a horse than dancing on ecstasy (quite possible), no-one would think to ban riding and legalize E. But why not?
This attitude raises the critical question of why society tolerates –indeed encourages – certain forms of potentially harmful behaviour but not others, such as drug use.
Which is an extremely good question. It remains a good question even if it turns out that horse-riding is much safer than ecstasy. These are just the two examples that Nutt happened to pick, presumably because it allowed him to make that cheeky pun. Comparing the harms of such different activities is fraught with pitfalls anyway - are we talking about the harms of pure MDMA, or street ecstasy? Do we include people injured by horses indirectly (e.g. due to road accidents)?

Yet the whole point is that no-one even tries to do this. The dangerousness of drugs is treated as quite different to the dangerousness of sports and other such activities. The media indeed seem to have a particular interest in the harms of ecstasy - at least according to a paper cited by Nutt, Forsyth (2001), which claims that deaths from ecstasy in Scotland were much more likely to get newspaper coverage than deaths from paracetamol, Valium, and even other illegal drugs. It's not clear why this is. Indeed, when you make the point explicitly, as Nutt did, it looks rather silly. Why shouldn't we treat taking ecstasy as a recreational activity like horse-riding? That's something to think about.

Professor Nutt is well known in psychopharmacology circles both for his scientific contributions and for his outspoken views. These cover drug policy as well as other aspects of psychiatry - for one thing, he's strongly pro-antidepressants (see another provocative editorial of his here.)

As recently-appointed Chairman of the Advisory Council on the Misuse of Drugs - "an independent expert body that advises government on drug related issues in the UK" - Nutt might be thought to have some degree of influence. (He wrote the article before he became chairman). Sadly not, it appears, for as soon as the Government realized what he'd written he got a dressing down from British Home Secretary Jacqui Smith - Ooo-er:
For me that makes light of a serious problem, trivialises the dangers of drugs, shows insensitivity to the families of victims of ecstasy and sends the wrong message to young people about the dangers of drugs.
I'm not sure how many "young people" or parents of ecstasy victims read the Journal of Psychopharmacology, but I can't see how anyone could be offended by the Equasy article. Except perhaps people who enjoy hunting foxes while riding horses (Nutt compares this to drug-fuelled violence). Nutt's editorial was intended to point out that discussion over drugs is often irrational, and to call for a serious, evidence-based debate. It is not really about ecstasy, or horses, but about the way in which we conceptualize drugs and their harms. Clearly, that's just a step too far.

[BPSDB]

D. Nutt (2008). Equasy -- An overlooked addiction with implications for the current debate on drug harms. Journal of Psychopharmacology, 23(1), 3-5. DOI: 10.1177/0269881108099672

Friday, February 13, 2009

Music Reviews?

Two sites I visit daily are Pitchfork and DrownedInSound. They're music review sites, which means they're full of stuff like this:
Major General moves a fair piece between the Hüsker Dü-like urgency of opener "Jeff Penalty" to the loungey, languid closer "I'm Done Singing", hitting mid-1990s alt-rock, tipsy Billy Joel balladry, Sunday afternoon swing, and Eastern European folk-tinged rave-ups along the way. I can't quite tell if "Jeff Penalty" is a highlight or the highlight, but it's certainly a winner, spinning a note-perfect yarn of seeing the Jello Biafra-less Dead Kennedys revue. In Nicolay's tale, the crowd reluctantly accepts "Jeff Whatisname" and refuses to stop believin'. It's as good a song about navigating aging in the scene-- never selling out, after all, just turns you into the old guy in the room-- as any on the Hold Steady's Stay Positive, and if they were to slip it into a setlist sometime soon, they wouldn't miss a step.
Now, I don't know if I'm alone in this, but I don't find that helpful. It would be interesting if you were really into the band in question & cared about every detail of what they do, but when you're looking for recommendations as to what to listen to, is that what you look for? And can a review really tell you why something is any good or not?

What I look for in these reviews is the bit where they tell you what other music it sounds like - because if it's similar to something I already like then I'll probably like it, if not I probably won't. Everyone has things they like, and there's little rhyme or reason to that - at the moment I'm listening to a lot of Darker My Love, The Aliens, and 1990s Greenday, but I couldn't tell you why. I just like them. And I'll probably like stuff which sounds like them. That's what having a certain taste is, surely.

That's why Pandora, last.fm and other automatic music-recommendation engines are rapidly becoming more useful to me than reviewers. You can type a band or a song into Pandora and it'll recommend other music that sounds like it. Perfect. Except that at the moment Pandora only works if you live in the US, for copyright reasons...

Wednesday, February 11, 2009

What's the Best Antidepressant?

Edit: For more discussion of this paper, see here. (29.10.09)

It's escitalopram (Lexapro aka Cipralex) - hurrah! That is if you believe a meta-analysis just published in The Lancet. Should you believe it? The Lancet's a highly-regarded journal. However, this paper certainly bears a close reading.

The question of whether any antidepressant works "better" than any other is an old one. There are many who hold that all antidepressants are pretty much equal. Then again, there are people who deny that they really work at all. If you think about it, it would be pretty odd if tianeptine, a drug which enhances the reuptake of serotonin, was exactly as good as tranylcypromine, which blocks the breakdown of serotonin, noradrenaline and dopamine. They work in completely different ways, so one of them probably ought to work better. Every psychiatrist I've spoken to believes that some drugs are better than others - but they rarely agree on which ones are better. So there's room for more knowledge here.

The Lancet paper tries to establish the comparative efficacy and tolerability of 12 "newer" antidepressants. This includes SSRIs like fluoxetine (Prozac) and citalopram, as well as the noradrenaline reuptake inhibitor reboxetine (Edronax), dual-action venlafaxine (Effexor), and a few others. However, it doesn't include pre-1990 drugs like tricyclics and MAOIs - sometimes regarded as a bit more powerful (but much less safe) than any newer drugs.

The headline results?
Mirtazapine, escitalopram, venlafaxine, and sertraline were among the most efficacious treatments [in that order], and escitalopram, sertraline, bupropion, and citalopram were better tolerated than the other remaining antidepressants [in that order]
In other words, escitalopram has the mildest side effects and is also very effective; mirtazapine is slightly more effective, but the side effects are considerably worse. Sertraline offers a good combination of tolerability and power, but escitalopram is even better. (Sertraline is much cheaper though, because the patent has expired.) Hurrah. Reboxetine, on the other hand, is declared total rubbish, being the least effective and also the worst tolerated of the 12. Oh dear.

But how did they reach these bold conclusions? They did a meta-analysis of 117 randomized controlled trials directly comparing one antidepressant against another ("head-to-head comparator trials"). There was plenty of data - in total the trials covered 25,928 people. But the data was patchy. There are plenty of trials comparing fluoxetine vs. venlafaxine, but there are very few comparing, say, venlafaxine with citalopram. The diagram at the top shows the number of each type of comparison; some drugs were almost never compared with anything. Why? Generally, because these trials are run by drug companies comparing their newest product with an established competitor, in an attempt to show that theirs is better.

In an attempt to get around this problem, the authors did a "multiple-treatments meta-analysis"; essentially, this involves indirectly comparing drug A and drug B, by looking at direct comparisons of both to drug C. If A is much better than C, and B is a little better than C, you can work out that A is better than B.
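
On the log odds ratio scale the indirect comparison is just a subtraction. Here is a toy worked example in Python, with invented numbers rather than anything taken from the paper:

    import numpy as np

    # Invented direct-comparison results (odds ratios for treatment response):
    or_A_vs_C = 2.0   # drug A doubles the odds of response relative to drug C
    or_B_vs_C = 1.25  # drug B is only slightly better than drug C

    # Indirect comparison: on the log scale the common comparator C cancels out,
    # log OR(A vs B) = log OR(A vs C) - log OR(B vs C)
    log_or_A_vs_B = np.log(or_A_vs_C) - np.log(or_B_vs_C)
    print("indirect OR, A vs B:", np.exp(log_or_A_vs_B))   # 1.6

    # A real multiple-treatments meta-analysis does this across every pair of
    # drugs at once, and also carries along the uncertainty of each estimate,
    # which is what the Bayesian model quoted below is for.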

Of course, this involves a lot of assumptions. And in the case where you have 12 drugs, not just 3, it becomes very very complicated. The methods section offers little insight into exactly what the authors did:
We did a random-effects model within a Bayesian framework using Markov chain Monte Carlo methods in WinBUGS (MRC Biostatistics Unit, Cambridge, UK). We modelled the binary outcomes in every treatment group of every study, and specified the relations among the odds ratios (ORs) across studies making different comparisons. This method combines direct and indirect evidence for any given pair of treatments. We used p values less than 0·05 and 95% CIs (according to whether the CI included the null value) to assess significance, and looked at a plausible range for the magnitude of the population difference. We also assessed the probability that each antidepressant drug was the most efficacious regimen, the second best, the third best, and so on, by calculating the OR for each drug compared with an arbitrary common control group, and counting the proportion of iterations of the Markov chain in which each drug had the highest OR, the second highest, and so on. We ranked treatments in terms of acceptability with the same methods.
I don't know what that means, in practice. I know vaguely what it means in theory but in any kind of data-crunching like this, there are always things that can go wrong and difficult decisions to be made. So the analysis might have been completely reasonable - but we don't know. The authors deny that any drug company funded the study. I vaguely know some of them, and I don't believe for a second that they deliberately fixed the results in favor of escitalopram. But readers of the paper have no way of knowing whether their analysis method was reliable or not.
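
For what it's worth, the ranking step described at the end of that quote ("counting the proportion of iterations...") is easy to illustrate with a toy sketch in Python. The numbers below are invented stand-ins for real MCMC output: pretend posterior draws of each drug's log odds ratio against a common control, ranked within each draw.

    import numpy as np

    rng = np.random.default_rng(2)
    drugs = ["escitalopram", "sertraline", "reboxetine"]

    # Pretend posterior samples of each drug's log OR vs a common control
    # (the means and spreads are invented for illustration only).
    samples = np.column_stack([
        rng.normal(0.35, 0.10, 10000),   # "escitalopram"
        rng.normal(0.30, 0.10, 10000),   # "sertraline"
        rng.normal(0.05, 0.15, 10000),   # "reboxetine"
    ])

    # In each posterior draw, which drug has the highest log OR?
    best = samples.argmax(axis=1)
    for i, drug in enumerate(drugs):
        print(f"P({drug} is the most effective) = {np.mean(best == i):.2f}")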

The more basic problem with this kind of thing is that it doesn't address the question of whether some drugs are better for some people. Anecdotal evidence strongly suggests that some are ("Sertraline made me feel terrible, but citalopram helped" - you hear this kind of thing a lot when talking about antidepressants) but there's not much hard evidence. For patients and doctors, though, it would be very useful to know which drug to prescribe to a certain person. The answer will not always be escitalopram.

Reboxetine may not be good for everyone, but for some people, it might be all they need. For example, given that reboxetine tends to have a stimulant-like "energizing" effect and to wake you up, you might assume that it would be good for someone whose main depression symptom was fatigue & sleepiness. You'd have to assume that, though, because there's no scientific evidence.

Finally, just for a sense of perspective, here's what happened in a couple of other recent antidepressant beauty contests. As you can see, they don't really agree on much...
  • Gartlehner et al. (2008) concluded that "Second-generation antidepressants did not substantially differ in efficacy or effectiveness for the treatment of major depressive disorder on the basis of 203 studies; however, the incidence of specific adverse events and the onset of action differed."
  • Montgomery et al. (2007) said that "[in "moderate-to-severe depression"] three antidepressants met these criteria [for superiority to any other drug]: clomipramine, venlafaxine, and escitalopram. Three antidepressants were found to have probable superiority: milnacipran, duloxetine, and mirtazapine." Note that clomipramine is an older drug not considered in the Lancet paper.
  • Papakostas et al. (2008) report that "These results suggest that the NRI reboxetine and the SSRIs differ with respect to their side-effect profile and overall tolerability but not their efficacy in treating MDD."

A. Cipriani, T. Furukawa, G. Salanti, J. Geddes, J. Higgins, R. Churchill, N. Watanabe, A. Nakagawa, I. Omori, H. McGuire (2009). Comparative efficacy and acceptability of 12 new-generation antidepressants: a multiple-treatments meta-analysis. The Lancet. DOI: 10.1016/S0140-6736(09)60046-5

Saturday, February 7, 2009

The Case Against Placebos

In one form or another, this argument has become popular: Most forms of complementary and alternative medicine (CAM) are just elaborate placebos. However, the placebo effect is incredibly powerful and useful, so these treatments are useful too.

Amongst many other people, Michael Brooks from the Guardian makes such a case here. It's an interesting idea. But I don't buy it.

Firstly, to my knowledge, there's no evidence that placebo treatments are clinically effective in the long term. There's no evidence against it, either, but this lack of evidence is important. (I'm not an expert so if such evidence exists, please say so!) There are, certainly, those well-known studies showing that placebos can improve symptoms in the lab, or in short-term clinical trials. And any doctor can tell you that placebos are a useful way of keeping people who want a quick fix satisfied. But is that what we want? Valium is a quick fix for anxiety and insomnia. It works great, in the short term. That doesn't mean you should take it every night. I don't think you should be taking a placebo every night either.

There's something pretty unsettling about the notion of handing out placebos. They're not physiologically addictive, but this doesn't mean that they can't become an expensive and damaging habit. Unlike many people, I'm not especially concerned about the "deception" aspect of it - if deception is what patients need to feel better, then they should get it. What I find unsettling is the idea that we should be medically treating people who we know don't need real medicine.

Prescribing someone any kind of treatment - whether real drugs, sugar pills, CAM, or anything else - legitimizes the notion that they're ill. The idea that one is ill is a very powerful one and you can do someone great harm by leading them to see themselves as ill unnecessarily.

Suppose you have a couple of weeks where you're feeling a bit tired, a bit down, a bit achey, a bit fuzzy. Maybe you're ill - maybe you've got mild anemia, for example. Most likely, though, you're not. Suppose you go to some kind of professional, whether it be your doctor, your homeopath, or anyone else. They might tell you that it's nothing to worry about, it's normal, just get on with your life, and it'll pass. You'd get annoyed, because you'd hoped for a quick fix, but you live with it, and you don't see yourself as suffering from a medical problem, so you don't expect to need treatment. (Could that be the most powerful placebo of all?)

But what if the professional thinks they can treat you? They give you a pill, or a foot rub, or some lovely oil, with confidence and a smile. You expect to get better, and you do. Hooray! Until the next time you start feeling a bit miserable. At which point, you go back to the professional, for more treatment. After all, it worked wonders last time. Again, it works, for a while. Then you start to notice a pain in your back you never did before - could the professional help? Sure. And while you're there, why not see if he has anything to help with that winter cold?

I do not know how often this happens, but it can't be uncommon. Medicalization is not just driven by drug companies. "Complementary and alternative" medicalization is at least as bad; perhaps worse, because drug companies at least have to convince trained doctors to prescribe their drugs. CAM, almost exclusively aimed at consumers, has no such constraints. There is nothing to stop any perfectly healthy person who believes themselves to be ill from going to a homeopath or a nutritionist, and having that belief validated. I would hope that no responsible CAM practitioner would ever give a medical diagnosis, but this isn't the point - if you treat someone, even with sugar pills, you are telling them that they are ill.

If the claims of CAM practitioners, or indeed CAM-as-placebo supporters, were valid, there probably wouldn't be such demand for CAM. If people really could go to a professional placebo-giver and walk out feeling happy and healthy for ever after, that would be great. Such a person would, presumably, rarely if ever need to see another practitioner, at least for the original ailment (and how many can one person have?) Unfortunately, I don't see this happening very often, although again I'm not aware of any evidence on this point. Saying that most CAM customers are satisfied with their service is not equivalent. The sheer amount of CAM, like the sheer amount of antidepressants being prescribed today, strongly suggests that it is, to an important extent, creating its own market.

[BPSDB]

Wednesday, February 4, 2009

"Voodoo Correlations" in fMRI - Whose voodoo?

It's the paper that needs little introduction - Ed Vul et al.'s "Voodoo Correlations in Social Neuroscience". If you haven't already heard about it, read the Neurocritic's summary here or the summary at BPS research digest here. Ed Vul's personal page has some interesting further information here. (Probably the most extensive discussion so far, with a very comprehensive collection of links, is here.)

Few neuroscience papers have been discussed so widely, so quickly, as this one. (Nature, New Scientist, Newsweek, and Scientific American have all covered it.) Sadly, both new and old media commentators seem to have been more willing to talk about the implications of the controversy than to explain exactly what is going on. This post is a modest attempt to, first and foremost, explain the issues, and then to evaluate some of the strengths and limitations of Vul et al.'s paper.

[Full disclosure: I'm an academic neuroscientist who uses fMRI, but I've never performed any of the kind of correlational analyses discussed below. I have no association with Vul et al., nor - to my knowledge - with any of the authors of any of the papers in the firing line.]

1. Vul et al.'s central argument. Note that this is not their only argument.

The essence of the main argument is quite simple: if you take a set of numbers, then pick out some of the highest ones, and then take the average of the numbers you picked, the average will tend to be high. This should be no surprise, because you specifically picked out the high numbers. However, if for some reason you forgot or overlooked the fact that you had picked out the high numbers, you might think that your high average was an interesting discovery. This would be an error. We can call it the "non-independence error", as Vul et al. do.
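
A quick simulation in Python makes the point (this has nothing to do with any particular fMRI analysis; the numbers are random): draw a pile of numbers whose true mean is zero, keep only the largest ones, and the average of what you kept looks impressively large.

    import numpy as np

    rng = np.random.default_rng(3)
    numbers = rng.normal(loc=0.0, scale=1.0, size=10_000)   # true mean is 0

    selected = numbers[numbers > 2.0]                 # keep only the biggest ones
    print("mean of everything:", numbers.mean())      # close to 0
    print("mean of what we kept:", selected.mean())   # roughly 2.4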

Vul et al. argue that roughly half of the published scientific papers in a certain field of neuroscience include results which fall prey to this error. The papers in question are those which attempt to correlate activity in certain parts of the brain (measured using fMRI) against behavioural or self-report measures of "social" traits - essentially, personality. Vul et al. call this "social neuroscience", but it's important to note that it's only a small part of that field.

Suppose, for example, that the magnitude of the neural activation in the amygdala caused by seeing a frightening picture was positively correlated with the personality trait of neuroticism - tending to be anxious and worried about things. The more of a worrier a person is, the bigger their amygdala response to the scary image. (I made this example up, but it's plausible.)

The correlation coefficient, r, is a measure of how strong the relationship is. A coefficient of 1.0 indicates a perfect linear correlation. A coefficient of 0.4 would mean that the link was a lot weaker, although still fairly strong. A coefficient of 0 indicates no correlation at all. This image from Wikipedia shows what linear correlations of different strengths "look like".
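
If you want to see such numbers arise from data, here is a small Python illustration with simulated "neuroticism" scores and "amygdala responses" constructed to correlate strongly or weakly (none of this is real data):

    import numpy as np

    rng = np.random.default_rng(4)
    n = 1000
    neuroticism = rng.normal(size=n)

    # Simulated "amygdala responses" built to track neuroticism strongly or weakly.
    strong = 0.9 * neuroticism + np.sqrt(1 - 0.9 ** 2) * rng.normal(size=n)
    weak = 0.4 * neuroticism + np.sqrt(1 - 0.4 ** 2) * rng.normal(size=n)

    print(np.corrcoef(neuroticism, strong)[0, 1])   # close to 0.9
    print(np.corrcoef(neuroticism, weak)[0, 1])     # close to 0.4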

Vul's argument is that many of the correlation coefficients appearing in social neuroscience papers are higher than they ought to be, because they fall prey to the non-independence error discussed above. Many reported correlations were in the range of r=0.7-0.9, which they describe as being implausibly high.

They say that the problem arises when researchers search across the whole brain for any parts where the correlation between activity and some personality measure is statistically significant - that is to say, where it is high - and then work out the average correlation coefficient in only those parts. The reported correlation coefficient will tend to be a high number, because they specifically picked out the high numbers (since only high numbers are likely to be statistically significantly different from zero.)

Suppose that you divided the amygdala into 100 small parts (voxels) and separately worked out the linear correlation between activity and neuroticism for each voxel. Suppose that you then selected those voxels in which the correlation was greater than (say) 0.8, and work out the average: (say) 0.86. This does not mean that activity across the amygdala as a whole is correlated with neuroticism with r=0.86. The "full" amygdala-neuroticism correlation must be less than this. (Clarification 5.2.09: Since there is random noise in any set of data, it is likely that some of those correlations which reached statistical significance were those which were very high by chance. This does not mean that there weren't any genuinely correlated voxels. However, it means that the average of the correlated voxels is not a measure of the average of the genuinely correlated voxels. This is a case of regression to the mean.)
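
A rough simulation of that scenario in Python (the subject count, the 100 voxels, and the 0.6 selection threshold are all invented for illustration) shows the inflation at work:

    import numpy as np

    rng = np.random.default_rng(5)
    n_subjects, n_voxels, true_r = 16, 100, 0.5

    neuroticism = rng.normal(size=n_subjects)
    # Every voxel's signal genuinely correlates with neuroticism at r = 0.5,
    # plus independent measurement noise for each voxel.
    noise = rng.normal(size=(n_voxels, n_subjects))
    signals = true_r * neuroticism + np.sqrt(1 - true_r ** 2) * noise

    # Sample correlation of each voxel with neuroticism in this data set.
    obs_r = np.array([np.corrcoef(neuroticism, v)[0, 1] for v in signals])

    selected = obs_r[obs_r > 0.6]          # keep only the "high" voxels
    print("true correlation:          ", true_r)
    print("mean over all voxels:      ", round(obs_r.mean(), 2))     # close to 0.5
    print("mean over selected voxels: ", round(selected.mean(), 2))  # well above 0.5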

Vul et al. say that out of 52 social neuroscience fMRI papers they considered, 28 (54%) fell prey to this problem. They determined this by writing to the authors of the papers and asking them to answer some multiple-choice questions about their statistical methodology. This chart shows the reported correlation coefficients in the papers which seemed to suffer from the problem (in red) vs. those which didn't (in green); unsurprisingly, the ones which did tended to give higher coefficients. (Each square is one paper.)
That's it. It's quite simple. But... there is a very important question remaining. We've said that non-independent analysis leads to "inflated" or "too high" correlations, but too high compared to what? Well, the "inflated" correlation value reported by a non-independent analysis is entirely accurate - in that it's not just made up - but it only refers to a small and probably unrepresentative collection of voxels. It only becomes wrong if you think that this correlation is representative of the whole amygdala (say).

So you might decide that the "true" correlation is the mean correlation over all of the voxels in the amygdala. But that's only one option. There are others. It would be equally valid to take the average correlation over the whole amygdalo-hippocampal complex (a larger region). Or the whole temporal cortex. That would be silly, but not an error - so long as you make clear what your correlation refers to, any correlation figure is valid. If you say "The voxel in the amygdala with the greatest correlation with neuroticism in this data-set had an r=0.99", that would be fine, because readers will realize that this r=0.99 figure was probably an outlier. However, if you say, or imply, that "The amygdala was correlated with neuroticism at r=0.99" based on the same data, you're making an error.

My diagram (if you can call it that...) to the left illustrates this point. The ovals represent the brain. The colour of each point in the brain represents the degree of linear correlation between some particular fMRI signal in that spot, and some measure of personality.

Oval 1 represents a brain in which no area is really correlated with personality. So most of the brain is gray, meaning very low correlation. But a few spots are moderately correlated just by chance, so they show up as yellow.

Oval 2 represents a brain in which a large blob of the brain (the "amygdala" let's call it) is really correlated quite well i.e. yellow. However, some points within this blob are, just by chance, even more correlated, shown in red.

Now, if you took the average correlation over the whole of the "amygdala", it would be moderate (yellow) - i.e. picture 2a. However, suppose that instead, you picked out those parts of the brain where the correlation was so high that it could not have occurred by chance (statistically significant).

We've seen that yellow spots often occur by chance even without any real correlation, but red ones don't - it's just too unlikely. So you pick out the red spots. If you average those, the average is obviously going to be very high (red). i.e. picture 2b. But if you then noticed that all of the red spots were in the amygdala, and said that the correlation in the amygdala was extremely high, you'd be making (one form of) the non-independence error.

Some people have taken issue with Vul's argument, saying that it's perfectly valid to search for voxels significantly correlated with a behaviour, and then to report on the strength of that correlation. See for example this anonymous commentator:
many papers conducted a whole brain correlation of activation with some behavioral/personality measure. Then they simply reported the magnitude of the correlation or extracted the data for visualization in a scatterplot. That is clearly NOT a second inferential step, it is simply a descriptive step at that point to help visualize the correlation that was ALREADY determined to be significant.
The academic responses to Vul make the same point (but less snappily).

The truth is that while there is technically nothing wrong with doing this, it could easily be misleading in practice. Searching for voxels in the brain where activation is significantly correlated with something is perfectly valid, of course. But the magnitude of the correlation in these voxels will be high by definition. These voxels are not representative because they have been selected for high correlation. In particular, even if these voxels all happen to be located within, say, the amygdala, they are not representative of the average correlation in the amygdala.

A related question is whether this is a "one-step" or a "two-step" analysis. Some have objected that Vul et al. imply it is a two-step analysis in which the second step is "wrong", whereas in fact it's just a one-step analysis. That's a purely semantic issue. There is only one statistical inference step (searching for significantly correlated voxels). But to then calculate and report the average correlation in those voxels is a second, descriptive step. The second step is not strictly wrong, but it could be misleading - not because it introduces a new, flawed analysis, but because it invites a misinterpretation of the results of the first step.

2. Vul et al.'s secondary argument

The argument set out above is not the only argument in the Vul et al. paper. There's an entirely separate one, introduced on Page 18 (Section F).

The central argument is limited in scope. If valid it means that some papers, those which used non-independent methods to compute correlations, reported inappropriately high correlation coefficients. But it does not even claim that the true correlation coefficients were zero, or that the correlated parts of the brain were in the wrong places. If one picks out those voxels in the brain which are significantly correlated with a certain measure, it may be wrong to then compute the average correlation, but the fact that the correlation is significantly greater than zero remains. Indeed, the whole argument rests upon the fact that they are!

But... this all assumes that the calculation of statistical significance was done correctly. Such calculations can get very complex when it comes to fMRI data, and it can be difficult to correct for the multiple comparisons problem. Vul et al. point out that in some of the papers in question (they only cite one, but say that the same also applies to an unspecified number of others), the calculation of significance seems to have been done wrong. They trace the mistake to a table printed in a paper published in 1995, and accuse some people of having misunderstood this table, leading to completely wrong significance calculations.
The per-voxel false detection probabilities described by E. et al (and others) seem to come from Forman et al.’s Table 2C. Values in Forman et al’s table report the probability of false alarms that cluster within a single 2D slice (a single 128x128 voxel slice, smoothed with a FWHM of 0.6*voxel size). However, the statistics of clusters in 2D (a slice) are very different from those of a 3D volume: there are many more opportunity for spatially clustering false alarm voxels in the 3D case, as compared to the 2D case. Moreover, the smoothing parameter used in the papers in question was much larger than 0.6*voxel size assumed by Forman in Table 2C (in E. et al., this was >2*voxel size). The smoothing, too, increases the chances of false alarms appearing in larger spatial clusters.
If this is true, then it's a knock-down point. Any results based upon such a flawed significance calculation would be junk, plain and simple. You'd need to read the papers concerned in detail to judge whether it was, in fact, accurate. But this is a completely separate point to Vul et al.'s primary non-independence argument. The primary argument concerns a statistical phenomenon; this secondary argument accuses some people of simply failing to read a paper. The primary argument suggests that some reported correlation coefficients are too high, but only this second argument suggests that some correlation coefficients may in fact be zero. And Vul et al. do not say how many papers they think suffer from this serious flaw.
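To be clear, the sketch below is not a reconstruction of the Forman et al. cluster calculation - I won't attempt that here. It's just a toy reminder of why per-voxel significance needs careful correction in the first place: with thousands of voxels and no real effect anywhere, an uncorrected threshold of p < 0.05 still flags hundreds of voxels.

```python
# Multiple comparisons in miniature: correlate a behavioural measure with
# thousands of pure-noise "voxels" and count how many pass p < 0.05 without
# any correction. Around 5% do, despite there being no real effect at all.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_subjects, n_voxels = 20, 10000

behaviour = rng.normal(size=n_subjects)
null_signals = rng.normal(size=(n_voxels, n_subjects))  # no true correlation

p_values = np.array([stats.pearsonr(behaviour, v)[1] for v in null_signals])

print((p_values < 0.05).sum())  # roughly 500 false positives
```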

These two arguments seem to have gotten mixed up in the minds of many people. Responses to the Vul et al. paper have seized upon the secondary accusation that some correlations are completely spurious. The word "voodoo" in the title can't have helped. But this misses the point of Vul et al.'s central argument, which is entirely separate, and seems almost indisputable so far as it goes.

3. Some Points to Note
  • Just to reiterate, there are two arguments about brain-behaviour correlations in Vul et al. The main one - the one everyone's excited about - purports to show that the correlations reported in 54% of the social neuroscience papers surveyed are weaker than claimed, but it cannot be taken to mean that they are zero. The second one claims that some correlations are entirely spurious, because they were based on a very serious error stemming from misreading a paper. But at present only one paper has been named as a victim of this error.
  • The non-independence error argument is easy to understand and isn't really about statistics at all. If you've read this far, you should understand it as well as I do. There are no "intricacies". (The secondary argument, about multiple-comparison testing in fMRI, is a lot trickier however.)
  • How much the non-independence error inflates correlation sizes is difficult to determine, and it will vary from case to case. Amongst other things, the degree of inflation will depend upon two factors: the strictness of the statistical threshold used to pick the voxels (a stricter threshold = higher correlations picked), and the number of voxels picked (if you pick 99% of the voxels in the amygdala, that's nearly as good as averaging over the whole thing; if you pick the one best voxel, you could inflate the correlation enormously). Note, however, that many of the papers that avoided the error still reported pretty strong correlations.
  • It's easy to work out brain activity-behaviour correlations while avoiding the non-independence problem; half of the papers Vul et al. considered in fact did this (the "green" papers). One simply needs to select the voxels in which to calculate the average correlation based on some criterion other than the correlation itself. One could, for example, use an anatomy textbook to select the voxels making up the amygdala, or select the voxels which are strongly activated by seeing a scary picture (the sketch after this list contrasts the two approaches). Many of the "green" papers which did this still reported strong correlations (r=0.6 or above).
  • Vul et al.'s criticisms apply only to reports of linear correlations between regional fMRI activity and some behavioural or personality measure. Most fMRI studies do not try to do this. In fact, many do not include any behavioural or personality measures at all. At the moment, fMRI researchers are generally seeking to find areas of the brain which are activated during experience of a certain emotion, performance of a cognitive process, etc. Such papers escape entirely unscathed.
  • Conversely, although Vul et al. looked at papers from social neuroscience, any paper reporting on brain activity-behaviour linear correlations could suffer from the non-independence problem. The fact that the authors happened to have chosen to focus on social neuroscience is irrelevant.
  • Indeed, Vul & Kanwisher have also recently written an excellent book chapter discussing the non-independence problem in a more general sense. Read it and you'll understand the "voodoo" better.
  • Therefore, "social neuroscience" is not under attack (in this paper.) To anyone who's read & understood the paper, this will be quite obvious.
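As promised above, here is a minimal sketch contrasting the two approaches (again with invented numbers, reusing the toy setup from earlier): selecting voxels by the very correlation being reported inflates the average, and inflates it more as the threshold tightens, whereas selecting them by an unrelated criterion - a stand-in for "response to the scary picture" - leaves the average close to the true value.

```python
# Non-independent vs. independent voxel selection, on the same toy data.
import numpy as np

rng = np.random.default_rng(4)
n_subjects, n_voxels, true_r = 20, 100, 0.5

neuroticism = rng.normal(size=n_subjects)
signals = (true_r * neuroticism
           + np.sqrt(1 - true_r**2) * rng.normal(size=(n_voxels, n_subjects)))

observed_r = np.array([np.corrcoef(neuroticism, v)[0, 1] for v in signals])

# Non-independent: pick voxels by the correlation being reported.
for threshold in (0.6, 0.7):
    picked = observed_r[observed_r > threshold]
    print(threshold, round(picked.mean(), 2))  # rises as the threshold tightens

# Independent: pick voxels by an unrelated criterion (e.g. task activation),
# then report the mean correlation in those voxels.
activation = rng.normal(size=n_voxels)  # stand-in for "response to the picture"
independent_pick = observed_r[activation > 1.0]
print(round(independent_pick.mean(), 2))  # typically close to the true 0.5
```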
4. Remarks: On the Art of Voodoo Criticism

Vul et al. is a sound warning about a technical problem that can arise with a certain class of fMRI analyses. The central point, although simple, is not obvious - no-one has noticed it before, after all - and we should be very grateful to have it pointed out. I can see no sound defense against the central argument: the correlations reported in the "red list" papers are probably misleadingly high, although we do not know by how much. (The only valid defense would be to say that your paper did not, in fact, use a non-independent analysis.)

Some have criticized Vul et al. for their combative or sensationalist tone. It's true that they could have written the paper very differently. They could have used a conservative academic style and called it "Activity-behaviour correlations in functional neuroimaging: a methodological note". But no-one would have read it. Calling their paper "Voodoo correlations" was a very smart move - although there is no real justification for the term, it brilliantly served to attract attention. And attention is what papers like this deserve.

But this paper is not an attack on fMRI as a whole, or social neuroscience as a whole, or even the calculation of brain-behaviour correlations as a whole. Those who treat it as such are the real voodoo practitioners, in the old-fashioned sense: they see Vul sticking pins into a small part of neuroscience, and believe that this will do harm to the whole of it. This means you, Sharon Begley of Newsweek: "The upcoming paper, which rips apart an entire field: the use of brain imaging in social neuroscience...". This means you, anyone who read about this paper and thought "I knew it". No, you didn't; you may have thought that there was something wrong with all of these social neuroscience fMRI papers, but unless you are Ed Vul, you didn't know what it was.

There's certainly much wrong with contemporary cognitive neuroscience and fMRI. Conceptual, mathematical, and technical problems plague the field, just a few of which have been covered previously on Neuroskeptic and on other blogs, as well as in a few papers (although surprisingly few). In all honesty, a few inflated correlations rank low on the list of the problems with the field. Vul's is a fine paper. But its scope is limited. As always, be skeptical of the skeptics.

ResearchBlogging.org Edward Vul, Christine Harris, Piotr Winkielman, Harold Pashler (2008). Voodoo Correlations in Social Neuroscience. Perspectives on Psychological Science