Madelaine White at the Globe and Mail has taken umbrage over a recent study by a prof at U of T on the relationship between women’s estimated fertility and their ability to distinguish gay from straight men by looking at black-and-white photos of them.
Journalists are paid to attract attention, not to tell the truth, and this fact, combined with a complete ignorance of data analysis and scientific technique, results in journalists routinely mangling scientific stories. A colleague from my Caltech days once commented that a recent issue of the LA Times had five “science” stories, all dealing with projects he was familiar with: four of them were completely wrong, and the last was so badly distorted he wasn’t absolutely sure it was even the experiment he was thinking of.
In the present case, Ms White ignores all the interesting and important bits of the study. She reports the number of participants but tells us nothing–not one thing–about the results. Nowhere in her tale does she mention the actual outcome of the experiment, and this cannot be emphasized enough.
This is a common feature of “science” reporting: even on those rare occasions when the experiment is described with a modicum of accuracy, the outcome is completely ignored. Instead of discussing it, Ms White flat-out misrepresents the results, saying, “Ladies, next time your ‘gaydar’ goes off, trust it – especially if you’re ovulating, according to a new study.”
The problem is that the study says nothing of the kind. There is literally nothing in the actual paper (which is hidden behind a paywall but which I have access to via my university account) that could possibly be used by a sane and numerate person to conclude that women’s judgement of male sexual orientation ought to be trusted.
In particular, the actual result of the study, which Ms White never says anything about, is that women are right in their judgement of sexual orientation based on black-and-white photos of men about 60% of the time.
That’s a rate much higher than the 50/50 that pure chance would give, but it is also much lower than the accuracy with which the stock market can be predicted by algorithms that still lose money (trust me on this–I know).
The popular press routinely makes this mistake when making up stories about tests that discriminate between conditions: they treat “statistically better than chance” as meaning “perfectly discriminating”, whereas in reality the two are very different things. It is trivial and common for tests to do much better than chance but still be completely useless as practical predictors. This is especially true when the test is for something that occurs rarely: in reality only about 5% of men are gay, but a woman who trusts her “gaydar” absolutely will, with a 60% success rate, judge that about 40% of men are!
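That arithmetic is easy to check. The sketch below is my own illustration: the 60% accuracy and the 5% base rate come from the discussion above, but the assumption that the 60% applies symmetrically to both gay and straight men is mine.

```python
# Back-of-envelope base-rate check (illustrative assumptions:
# 5% of men are gay, and the 60% accuracy applies symmetrically
# to gay and straight men alike).
p_gay = 0.05
accuracy = 0.60

# Fraction of all men a woman would label "gay":
# correctly labelled gay men plus incorrectly labelled straight men.
labelled_gay = p_gay * accuracy + (1 - p_gay) * (1 - accuracy)

# Of the men she labels gay, the fraction who actually are.
precision = p_gay * accuracy / labelled_gay

print(f"fraction of men labelled gay: {labelled_gay:.0%}")  # 41%
print(f"chance a 'gay' call is right: {precision:.0%}")     # 7%
```

So under these assumptions a woman trusting her 60%-accurate “gaydar” would label about 41% of men gay, and be right only about 7% of the time she did so.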
All of this is lost on Ms White, her editors and her publisher, whose interests are in page views and lots of comments, not informing the public.
All that said, the study does have some peculiar features that make me question even the modest effect the researchers do measure. In particular, they judge women’s fertility at the time of the test by asking them where they are in their menstrual cycle and what their typical cycle length is, and then counting backward by 14 days from the presumed start of their next period.
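As a concrete illustration of that reverse-counting scheme (the function name and sign convention here are my own, not anything from the paper):

```python
# Sketch of the reverse-counting fertility estimate described above:
# ovulation is presumed to fall 14 days before the start of the next
# period, so a woman's position relative to ovulation follows directly
# from her reported cycle day and typical cycle length.
def days_from_ovulation(day_in_cycle: int, cycle_length: int) -> int:
    """Signed days relative to presumed ovulation (negative = before it)."""
    ovulation_day = cycle_length - 14
    return day_in_cycle - ovulation_day

# A woman on day 10 of a typical 28-day cycle would be judged
# 4 days before ovulation.
print(days_from_ovulation(10, 28))  # -4
```

Note that the whole estimate rests on the subject’s self-report being accurate, which is exactly the point at issue below.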
All else being equal, one would expect women to be randomly distributed across all phases of their cycle. One would also expect that some women might be tempted to lie about their current menstrual status if they were menstruating at the time of the study: women tend to be reticent about this, particularly when talking to men (the paper does not give the sex of the researcher who actually took these data).
If we look at the actual distribution of women within their menstrual cycle from Figure 1 in the paper, reproduced with modifications below, we can see that there is a considerable surplus of women reporting to be far from their period:
(Based on Figure 1 from: Nicholas O. Rule, Katherine S. Rosen, Michael L. Slepian, and Nalini Ambady, “Mating Interest Improves Women’s Accuracy in Judging Male Sexual Orientation,” Psychological Science, published online 13 June 2011, DOI: 10.1177/0956797611412394. The online version of the article can be found at: http://pss.sagepub.com/content/early/2011/06/13/0956797611412394)
I have suppressed the parabolic curve the authors use to make their primary point, and added a grid (in red) to demonstrate mine.
The “Time From Ovulation” scale can be thought of roughly in terms of 0.25 equaling one week. Women at either end of the scale are close to their period. Women in the middle are far from it. If we count the number of women in each quarter of the range we find:
First quarter: 3
Second quarter: 11
Third quarter: 18
Fourth quarter: 7
40 women were tested but one reported an irregular menstrual cycle and so was dropped from the data analysis, leaving the 39 we see on the graph.
The distribution is notably not flat, as shown in the following histogram:
The question is: how likely is it that this distribution would come about by chance, given that women ought to be distributed uniformly across the phases of their menstrual cycle?
The answer is easy to reach with about 20 lines of Python: we can simply generate 39 random numbers and drop them into four different bins according to their value. On average we would expect about 10 counts per bin (the 9.75 line shown in the above figure, which is one quarter of 39). In reality there is some variation: that’s what randomness is all about. But to get the observed distribution, with just three people in the lowest bin, eleven in the next, eighteen in the next, and back down to seven in the last, is really unlikely: it happens only about once in a hundred thousand runs of the random numbers.
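My version of that check looks like the following (the bin counts are the ones read off Figure 1 above; the seed and trial count are arbitrary choices of mine, and I’ve added the exact multinomial probability as a sanity check on the simulation):

```python
import math
import random

# Observed counts in the four quarters of the cycle-phase range,
# read off Figure 1 of the paper.
observed = [3, 11, 18, 7]
n = sum(observed)  # 39 women

# Exact probability of this particular split under a uniform
# multinomial with four equally likely bins.
p_exact = (math.factorial(n)
           / math.prod(math.factorial(k) for k in observed)
           * 0.25 ** n)

# Monte Carlo check: drop 39 uniform random numbers into four bins
# and count how often the observed split turns up.
random.seed(1)  # fixed seed so the sketch is reproducible
trials = 500_000
hits = 0
for _ in range(trials):
    bins = [0, 0, 0, 0]
    for _ in range(n):
        bins[int(random.random() * 4)] += 1
    hits += (bins == observed)

print(f"exact multinomial probability: {p_exact:.2e}")
print(f"Monte Carlo estimate:          {hits / trials:.2e}")
```

The exact probability comes out around 9 in a million–once in a hundred thousand-ish–and the simulation agrees to within its noise.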
So this is telling us there is something fishy about these data. I don’t know what. It could be that chance has produced this outcome. That can’t be ruled out. But it’s a lot more likely that the experimental subjects are not being entirely truthful about which phase of their menstrual period they are in, and that could have rather dramatic consequences for the paper’s conclusions.
This is how science is actually done: people take data, think about it, publish their analysis of it, and get subjected to criticism by curious gadflies like me, who would like nothing more than to knock over the other guy’s results, just because we can.
Popular “science” articles have almost nothing to do with this process, instead focusing on misleading and annoying the public in the hopes of getting a steady stream of advertising revenue.