Definition: Lying with statistics
We want you to be prepared for the charlatans of statistics out there. That is why we introduce you to some of their favorite tricks and sleights of hand. Please remember: Despite their bad reputation, statistics are of great importance. In the words of the German statistician Elisabeth Noelle-Neumann: ‘For me, statistics are the information medium for the responsible. Those who know how to handle them will be less open to manipulation. The sentence ‘You can prove anything with statistics’ is only true for the lazy people who do not feel like taking a closer look.’
In a (fictitious) study, the average age at death is collected for various professions. The results are astonishing: While pilots and professional football players die before the age of 60 on average, teachers and physicians live significantly longer. What is the reason for this? Dangerous working conditions, too much stress on the football field, too many aircraft accidents? No. The reason is that the study compares professions that are not directly comparable, because a third variable (besides profession and age at death) distorts the investigation: the average age of the people in each profession. Professional football in its modern form only emerged in the 1960s, and the aviation industry has grown rapidly in recent years. Accordingly, there are on average more young pilots and professional footballers than young teachers or physicians. If a footballer or pilot dies young due to an accident or illness, that case carries more weight than in the other professions, because it is offset by fewer cases of death at an old age.
Clothes make the man and digits make numbers - the more precise numbers are, the more trust we put in them. This trick was already known in Ancient Greece: Herodotus wrote after the Persian war that the enemy’s army counted 5,283,220 men. Though the historian had exaggerated immensely (the real number was closer to 15,000), he made a very well-informed impression and the victory of the Greeks shone in a glorious light. Another example comes from the English theologian John Lightfoot, who knew about the power of precision: ‘Heaven and earth and everything that comes with it, was created by the Trinity at the same moment: On Sunday, October 21st, 4004 BC, at 9 am’. Who could question the date of the world’s creation given such precision? But spurious accuracy as a means of persuasion is not a thing of the past: If an economic report today says that Germans work 1,450,000,000 hours of overtime each year, this number should be treated with caution. Based on approximately 40 million employees in Germany, this would be exactly 36.25 hours of overtime per person per year. But the real number could just as well be anywhere between 33 and 40 hours per person. So it would be more honest to say that, according to estimates, the number of overtime hours in Germany is somewhere between one and two billion.
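The overtime arithmetic can be reproduced in a few lines. The following is a minimal sketch in Python; the variable names are illustrative, and the band of 33 to 40 hours is simply the plausible range named in the text, not a measured confidence interval.

```python
# Figures taken from the text; the employee count is itself only approximate.
total_overtime_hours = 1_450_000_000
employees = 40_000_000

# The "exact" per-person figure implied by the precise-looking total.
hours_per_person = total_overtime_hours / employees
print(hours_per_person)  # 36.25

# Totals implied by the plausible band of 33 to 40 hours per person.
low_total = 33 * employees
high_total = 40 * employees
print(f"{low_total:,} to {high_total:,} hours")  # 1,320,000,000 to 1,600,000,000 hours
```

Both ends of the band fall between one and two billion hours, which is why the rounder statement is the more honest one.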
‘I am a bit skeptical towards statistics’, said Franklin D. Roosevelt, the 32nd president of the United States. ‘Because, according to statistics, a millionaire and a poor lad each own half a million.’ If a statistician adds the wealth of a millionaire to that of someone without money and divides the sum by two, he arrives at two half-millionaires. This sounds exaggerated, but it points to a real problem in statistics. The ‘average’ in statistics usually means the arithmetic mean. Whenever there are extreme values in a group, the mean should be taken with a pinch of salt - especially if the sample is not very big. But doesn’t this level out across a large sample? Let’s take a small town in Massachusetts with 10,000 households: On average, the household income may be USD 60,000 per year (let’s assume it is quite a wealthy area). The next year, the King family moves to this town. Thanks to owning a trading company, their annual household income is USD 200 million. Statistically, the households of the small town now no longer earn USD 600 million per year in total (10,000 households with USD 60,000 each), but USD 800 million. The average household income is now roughly USD 80,000 - according to statistics. Extreme values therefore do not lose their influence on the average, even in larger groups.
The arithmetic mean has a competitor: the median. If a statistician calculates the average monthly income of seven lawyers using the arithmetic mean, he adds the incomes together (1,000 + 2,000 + 5,000 + 7,000 + 10,000 + 20,000 + 95,000 = 140,000) and divides the result by seven (140,000 / 7 = 20,000). Every lawyer seems to earn the quite impressive amount of USD 20,000 per month. The median, however, tells us that the middle value is USD 7,000. This value is located exactly in the middle - three lawyers earn less, three have a higher income. Whether an average lawyer earns USD 20,000 or USD 7,000 per month depends on the choice between arithmetic mean and median.
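Both examples can be checked with Python’s standard `statistics` module. This is a sketch under one loud assumption: the text only gives the town’s average income, so the code pretends that every original household earns exactly USD 60,000. The variable names are illustrative.

```python
from statistics import mean, median

# Small-town example. ASSUMPTION: every original household earns exactly
# USD 60,000 (the text only states the average).
town = [60_000] * 10_000
town.append(200_000_000)  # the King family moves in

mean_after = mean(town)
print(round(mean_after))  # 79992, i.e. roughly USD 80,000
print(median(town))       # 60000, the median does not move

# Lawyer example: monthly incomes in USD, as listed in the text.
lawyers = [1_000, 2_000, 5_000, 7_000, 10_000, 20_000, 95_000]
print(mean(lawyers))    # 20000
print(median(lawyers))  # 7000
```

A single extreme household shifts the mean by about a third, while the median stays where the typical household is.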
Imagine a newspaper headline: ‘Wife murdered again - marriage is dangerous’. The responsible editor drew this conclusion because statistics revealed that 75 percent of all murdered women are killed by their husbands. Declaring marriage a life-threatening venture on this basis is a fallacy. The author only looked at a subset of the data: He should not have asked how many murdered women were killed by their husbands, but how many married women die violently compared to unmarried women. Let’s take the small town of Demise County as an example. Last year (it was not a very good year for the local police), three married women in this town were beaten to death by their husbands. On top of this, one unmarried woman was murdered. Thus, 75 percent of all murdered women were killed by their husbands (three out of four murder victims in total). Demise County has a population of 6,000 married and 1,000 unmarried women. The likelihood of dying violently is therefore 1 in 2,000 for married women (three homicides among 6,000 women), but 1 in 1,000 for single women. The conclusion is that the wedding band is a clear lifesaver in Demise County (in second place - behind moving out of the area).
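The difference between the two questions becomes obvious once both are computed from the same counts. The sketch below uses the Demise County figures from the text; the variable names are illustrative, and exact fractions are used so that the ‘1 in N’ readings come out without rounding.

```python
from fractions import Fraction

# Demise County figures from the text.
husband_murders = 3      # married women killed by their husbands
unmarried_murders = 1    # unmarried women murdered
married_women = 6_000
unmarried_women = 1_000

# The editor's question: what share of murdered women were killed by a husband?
share_by_husband = Fraction(husband_murders, husband_murders + unmarried_murders)
print(f"{float(share_by_husband):.0%}")  # 75%

# The right question: how likely is a violent death within each group?
risk_married = Fraction(husband_murders, married_women)
risk_unmarried = Fraction(unmarried_murders, unmarried_women)
print(risk_married)    # 1/2000
print(risk_unmarried)  # 1/1000

# Relative risk of unmarried versus married women.
print(risk_unmarried / risk_married)  # 2
```

The 75 percent figure describes the victims, not the risk: per head, unmarried women in this example are twice as likely to be murdered.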
The climate is changing and the sea level is rising; does this mean that the U.S. will be inundated soon? It sounds like a logical conclusion. But in reality, meteorologists cannot foresee the exact effects of global warming. Even the most rigorous forecasts can only be extended to a limited degree. Assume the average height of an 18-year-old man was 5’11’’ in 1970, and 20 years later the average young man measured 6’1’’. Even if the typical young man had grown further to 6’3’’ by 2010, another 20 years later, you cannot simply extend this trend indefinitely and state that in 2030 the average 18-year-old will be 6’5’’.
‘Do you accept nuclear energy from a nuclear power plant?’ If you ask ten Greenpeace activists and ten employees of a power plant, you might end up with ten ‘No’ and ten ‘Yes’ answers. The sample, meaning the people who participate in the research, can be chosen one way or another. A convinced environmentalist could easily prove his hypothesis ‘Most of the population disapproves of nuclear energy’ by asking mostly environmental activists. But the manipulation of samples can also be more subtle. If you want to prove that people have become more sluggish, you should conduct the interviews by calling people at home between 8 and 10 pm. The likelihood that stay-at-home people are included in the survey naturally rises - entirely independent of age, sex, income or region. The opposite result can be reached if you conduct the same survey on a Sunday afternoon in a park. Professional researchers therefore not only include different target groups but also ensure that the interviews are carried out on different days of the week and that the method includes both telephone interviews and interviews in public areas (e.g. on the street).
The same environmentalist also asks: ‘Are you also in favor of protecting the environment and against nuclear energy?’ Many interviewees will agree, because they do not want to be seen as the environmental bad guy. In the same breath, they are classified as opponents of nuclear energy. Leading questions are a great instrument for manipulation. The problem is that the original questions are no longer mentioned once the statistics are published. The headline could then be: ‘88 percent of U.S. citizens refuse beef’ - a shock for all cattle breeders. Who would suspect that the underlying question was: ‘Can you imagine reducing your consumption of beef in light of various food scandals, the BSE threat and a higher incidence of parasites like threadworms in meat?’
Graphical representations of statistics are often improved for cosmetic reasons. The most important thing to keep in mind is the scale. Diagrams emphasize important data points to make a statistical result more accessible - but sometimes they overshoot and are simply wrong. Such misleading diagrams are often found across various media outlets. Editors can manipulate scales by choosing different lengths for the intervals or not starting a bar at the zero point.
The ABC party celebrates an increase of its women’s quota by 100 percent - this sounds impressive. The XYZ party has to admit ruefully that it was only able to increase its female share by 20 percent. But how many women are we talking about? Assume that the ABC party had four female delegates and now adds four more. This is in fact an increase of 100 percent. In the end, the ABC party now has eight women - amongst more than 100 representatives. The share of female delegates is therefore still below eight percent. The XYZ party, on the other hand, already has 40 women amongst its 100 representatives - a women’s quota of 40 percent. If it adds eight more women, it can only claim an increase of 20 percent, while in fact it would have been advantageous to claim that it added 100 percent more female representatives than the ABC party (eight instead of four). Or to stress that its parliamentary group now has 500 percent more female delegates than the ABC group (48 instead of eight). You see: You can claim nearly anything with percentages.
Let’s look at another example: an ambitious small winemaker who prides himself on sales consisting of 57 percent white wine, 30 percent red wine and 13 percent sparkling wine. Who would have thought that this young vintner sold just 13 bottles of Sauvignon Blanc, 7 bottles of Cabernet Sauvignon and 3 bottles of sparkling wine this year?
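A short sketch makes the gap between relative increases and absolute shares explicit. The helper names `relative_increase` and `share` are illustrative. Two assumptions are made: the ABC party is given exactly 100 seats for simplicity, and the winemaker’s percentages are read as shares of all bottles sold.

```python
def relative_increase(before, after):
    """Percentage growth relative to the starting count."""
    return 100 * (after - before) / before

def share(part, whole):
    """Percentage of the whole that the part makes up."""
    return 100 * part / whole

SEATS = 100  # ASSUMPTION: both parliamentary groups have exactly 100 seats

abc_before, abc_after = 4, 8
xyz_before, xyz_after = 40, 48

print(relative_increase(abc_before, abc_after))  # 100.0
print(relative_increase(xyz_before, xyz_after))  # 20.0
print(share(abc_after, SEATS), share(xyz_after, SEATS))  # 8.0 48.0

# Winemaker: impressive percentages built on a tiny base.
bottles = {"white": 13, "red": 7, "sparkling": 3}
total = sum(bottles.values())
shares = {kind: round(share(n, total)) for kind, n in bottles.items()}
print(total, shares)  # 23 {'white': 57, 'red': 30, 'sparkling': 13}
```

The party with the larger relative increase ends up with the far smaller share, and the winemaker’s percentages rest on 23 bottles in total.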
Falling unemployment figures are always celebrated by the authorities. Yet nobody asks who the unemployed are and how they are counted. ‘An unemployed person is someone without a job’ - this sounds plausible, but is very naive. Officially, an unemployed person is someone of a certain age who is actively looking for a job and is unable to find work. The millionaire’s wife, the long-term student, the job-seeker who has not registered with the authorities and the disability pensioner do not have a job, but are also not counted in the official statistics. As long as the reader knows how the unemployment figures are calculated, statistics cannot do any harm. But if the statistician reaches into his magic hat and presents us with a number whose basis only he knows, we should be skeptical. If you want to keep track of economic developments, the number of people contributing to social security systems is much more informative than the unemployment figures.
Not every probability of a (statistical) event can be judged correctly by pure intuition. Let’s imagine the following: You have just spent three weeks on holiday in a tropical country. It is reported in the news that red fever is beginning to spread in that country. All tourists are advised to have themselves tested for the illness. The next day, your doctor tells you that the test has an error rate of one percent for people who do not carry the virus and no errors for infected people. What does that mean? Of 100 healthy people tested, 99 are recognized as healthy. One person will be declared ill although she is in fact healthy. If a person is ill, the test result will clearly show this.
Two days later, you check back in with your doctor. In the meantime, you have found out that only every thousandth tourist returning from the tropical country has caught the bug. You stay optimistic, but then you get the shocking result: The test says that you are infected. How likely is it that you really suffer from red fever? Your gut feeling tells you ‘approximately 99 percent’, a very high probability. But is that right?
Let’s do the maths: Assume that roughly 100,000 tourists who returned from your holiday destination have taken the test for red fever. As every thousandth tourist is infected, there will be around 100 ill people amongst those 100,000 tourists. All 100 infected people will be correctly recognized by the test. Out of the 99,900 healthy people, 99 percent will be correctly tested as ‘not infected’. But one percent of those who in reality are not infected will get the test result ‘infected with red fever’. That is an impressive 999 persons out of the group of 99,900. In total, 100 people are diagnosed correctly as ill and 999 wrongly. The probability that you are really infected is not ‘approximately 99 percent’ but only about 9 percent. Calculation: 100 infected people / (100 infected people + 999 wrongly diagnosed) ≈ 0.09.
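The same calculation can be written down as a short script. It is a sketch that first follows the head count from the text and then repeats it with rates, which is Bayes’ rule; the function name `probability_infected` and its parameters are illustrative.

```python
# Head count, following the numbers in the text.
tourists = 100_000
infected = tourists // 1_000      # every thousandth tourist: 100
healthy = tourists - infected     # 99,900
true_positives = infected         # the test misses no infected person
false_positives = healthy // 100  # one percent of the healthy: 999

p = true_positives / (true_positives + false_positives)
print(false_positives)  # 999
print(f"{p:.1%}")       # 9.1%

def probability_infected(prevalence, false_positive_rate, sensitivity=1.0):
    """Probability of being infected given a positive test (Bayes' rule)."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * false_positive_rate
    return true_pos / (true_pos + false_pos)

# Same result from the rates alone: prevalence 0.1 %, false positives 1 %.
p_from_rates = probability_infected(prevalence=0.001, false_positive_rate=0.01)
```

The result depends heavily on how rare the illness is: with a higher `prevalence`, the same test would produce a much more trustworthy positive result.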
That is how much you should trust your gut feeling when it comes to probabilities. There are a lot of entertaining books on the topic, some of which served as an inspiration for this article. If you are interested in a comprehensive introduction, we recommend ‘Statistics for Dummies’ by Deborah Rumsey.
Please note that our glossary entries are simplified explanations of statistical terms. Our goal is to make them accessible to a broad audience, so some definitions may not meet scientific standards.