Charles Petzold



De-Obfuscating the Statistics of Mass Shootings

July 5, 2015
Roscoe, N.Y.

After the horrifying killings at the Mother Emanuel African Methodist Episcopal Church in Charleston, South Carolina, President Obama once more had to speak publicly about a mass shooting. "Let’s be clear," he said. "At some point, we as a country will have to reckon with the fact that this type of mass violence does not happen in other advanced countries. It doesn’t happen in other places with this kind of frequency."

Of course, those people whose function in life is to contradict everything this President says or does were quick to note that other countries do have mass shootings. Some right-wing web sites even went a step further by posting statistics that seem to suggest that when mass shootings are corrected for population, the United States doesn't come out too bad. One such article on IJReview.com included the following chart:

The web site triumphantly exclaimed "Boom, here we go."

The table shows data for 12 of the 34 countries that comprise the Organisation for Economic Co-operation and Development (OECD). These countries are generally considered to be examples of "industrialized" or "advanced" countries and can legitimately be compared.

The first four columns of the table show (not in this order) the number of rampage shootings in these 12 countries during the five-year period from 2009 through 2013; the number of fatalities of the shootings; and the per-capita rates per million of population. Regardless of whether you look at the number of shooting incidents or the number of fatalities, the United States ranks 6th after Norway, Finland, Slovakia, Israel, and Switzerland.

IJReview.com obtained its numbers from a defunct web site called www.RampageShooting.com, but an archived page is available that lists the other 22 countries of the OECD and their populations. (Six additional countries had one rampage shooting each during this five-year period but were not listed in the IJReview summary.) The RampageShooting.com site even highlights the five countries with higher rates than the U.S. with graphics that form guns out of the countries' flags:

Do you see an American flag here? The graphic emphasizes that the United States has lower rates of mass shootings than these five countries. In this analysis, we're not number one.

I’m not going to argue with the validity of the data themselves. I’m going to assume that all the numbers are correct. But I am going to question the validity of ranking countries in this way and drawing conclusions from that ranking.

For this analysis, I’ll focus exclusively on the number of incidents of mass shootings, and not the number of people killed in these mass shootings. The second figure seems to me to involve a second variable, which relates to the average number of people killed in such shootings.

Here is my table that reproduces the countries that experienced mass shootings, ordered by rate of mass shootings per million of population:

Country          Rampage Shootings     Population  Shootings Per Million
Finland                          2      5,421,827                  0.369
Israel                           2      7,941,900                  0.252
Switzerland                      2      8,000,000                  0.250
Norway                           1      5,033,675                  0.199
Slovakia                         1      5,445,325                  0.184
United States                   38    314,941,000                  0.121
Hungary                          1      9,942,000                  0.101
Greece                           1     10,787,690                  0.093
Belgium                          1     11,041,266                  0.091
Netherlands                      1     16,751,323                  0.060
Canada                           2     35,010,000                  0.057
Germany                          3     81,799,600                  0.037
Spain                            1     47,190,493                  0.021
Italy                            1     60,813,326                  0.016
United Kingdom                   1     62,262,000                  0.016
France                           1     65,350,000                  0.015
Mexico                           1    113,910,608                  0.009
Japan                            1    126,659,683                  0.008

Here's a second version of the same table including the other countries that comprise the OECD. These countries are again sorted by rate of mass shootings per million of population, and then by population:

Country          Rampage Shootings     Population  Shootings Per Million
Finland                          2      5,421,827                  0.369
Israel                           2      7,941,900                  0.252
Switzerland                      2      8,000,000                  0.250
Norway                           1      5,033,675                  0.199
Slovakia                         1      5,445,325                  0.184
United States                   38    314,941,000                  0.121
Hungary                          1      9,942,000                  0.101
Greece                           1     10,787,690                  0.093
Belgium                          1     11,041,266                  0.091
Netherlands                      1     16,751,323                  0.060
Canada                           2     35,010,000                  0.057
Germany                          3     81,799,600                  0.037
Spain                            1     47,190,493                  0.021
Italy                            1     60,813,326                  0.016
United Kingdom                   1     62,262,000                  0.016
France                           1     65,350,000                  0.015
Mexico                           1    113,910,608                  0.009
Japan                            1    126,659,683                  0.008
Turkey                           0     74,724,269                  0.000
South Korea                      0     50,004,441                  0.000
Poland                           0     38,186,860                  0.000
Australia                        0     22,841,921                  0.000
Chile                            0     16,572,475                  0.000
Portugal                         0     10,581,949                  0.000
Czech Republic                   0     10,512,208                  0.000
Sweden                           0      9,540,065                  0.000
Austria                          0      8,414,638                  0.000
Ireland                          0      6,399,152                  0.000
Denmark                          0      5,580,413                  0.000
New Zealand                      0      4,445,436                  0.000
Slovenia                         0      2,055,496                  0.000
Estonia                          0      1,340,194                  0.000
Luxembourg                       0        524,853                  0.000
Iceland                          0        320,060                  0.000

Total                           61  1,250,346,146                  0.049
Total Non-US                    23    935,405,146                  0.025

This table, however, includes two summary lines at the bottom that neither RampageShooting.com nor IJReview.com bothered with. These are totals with and without the United States. Perhaps there was a reason why these obviously important totals were excluded.

Just to reiterate: These are the 34 countries that comprise the OECD, with the population and number of mass shootings from the years 2009 through 2013 taken directly from the RampageShooting.com web site. This is all the information that I'll be analyzing.

Statistical Significance

The primary purpose of statistics is to help us understand various phenomena of the real world and possibly to predict what might happen in the future. How meaningful is the fact that Finland tops the chart with a rate of 0.369 mass shootings per million of population over a five-year period? Does it tell us anything significant about Finland? Does it mean that Finland is the mass shooting capital of the world? How could it, with only two mass-shooting incidents in five years? Does it mean that Finland will continue to have two mass shootings every five years? Not necessarily. The numbers are too small to tell us anything.

Tiny numbers do not make good statistics. Yet, all the countries in this table (except one) experienced just three mass shootings or fewer. These are very tiny numbers and their statistical significance is pretty much negligible.

What's additionally interesting is that the top five countries in this table all have populations under 10 million:

Country          Rampage Shootings     Population  Shootings Per Million
Finland                          2      5,421,827                  0.369
Israel                           2      7,941,900                  0.252
Switzerland                      2      8,000,000                  0.250
Norway                           1      5,033,675                  0.199
Slovakia                         1      5,445,325                  0.184

Only seven of the other countries in the OECD have populations less than 8 million. Keep in mind that for a given number of incidents, the lower the population, the higher the per-capita rate. So we're dealing here not only with tiny numbers of incidents (because mass killings are not overall very common) but also with small populations.

There is a phenomenon in statistics called "regression towards the mean." As you examine larger and larger populations, they tend to gravitate towards the average. Smaller populations are statistically more erratic and unstable because they are more susceptible to random fluctuations. For a small country, 1 or 2 additional mass shootings in a five-year period can propel it to the top of the list.
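To see how sensitive the ranking is, a minimal Python sketch can compute how far a single incident moves the per-million rate for countries of different sizes. The populations are copied from the tables above; the name `rate_change_per_incident` is made up for this illustration.

```python
# Populations copied from the tables above.
POPULATIONS = {
    "Norway": 5_033_675,
    "Finland": 5_421_827,
    "Germany": 81_799_600,
    "United States": 314_941_000,
}


def rate_change_per_incident(population: int) -> float:
    """Change in shootings per million caused by one more (or one fewer) incident."""
    return 1_000_000 / population


for country, population in POPULATIONS.items():
    change = rate_change_per_incident(population)
    print(f"{country:<14} {change:.3f} per million for each incident")
```

For Norway, one incident is the entire rate of 0.199. For the United States, one incident shifts the rate by only about 0.003.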

Suppose we were to plot a graph with a horizontal axis based on ranges of rates of mass shootings. For each range of rates, the graph shows the total population of the countries that fit into that range. What should we expect?

We would expect the larger countries to cluster towards the range of tiny rates of mass shootings. By contrast, the smaller countries are the outliers where 1 or 2 mass shootings affect the rate a great deal. These smaller countries should be further from the average and tend more towards extremes, but with small heights in the graph because the populations are so small. In other words, we should expect a graph like this with a long but minuscule tail:

The four tiny bumps to the right of 0.100 are the five countries with the highest rates of mass shootings.

But the problem with that graph is that it doesn't include the United States. Let's add the United States to the graph:

And now we see a bar in this graph with much more statistical significance because the population is very large, but which at the same time is also quite removed from the average established by the other OECD countries.

While it's interesting to examine comparisons of mass-shooting incidents in various countries, it is statistically invalid to compare these countries based on rankings that result from 1 or 2 or 3 mass shootings in the five-year period. When medical statistics are compiled, rates based on fewer than a certain number of incidents of a particular disease or injury are considered to be unreliable. Here's a web page from the New York Department of Health that answers the question "Why are rates based on fewer than 20 cases marked as being unreliable?" The conclusion is that "When the rates are based on only a few cases or deaths, it is almost impossible to distinguish random fluctuation from true changes in the underlying risk of disease or injury."

Most of the countries in the tables posted by RampageShooting.com and IJReview.com have far fewer than 20 incidents of mass shootings. Claiming that these data have statistical validity is either deliberately deceitful or ignorantly deceptive.
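One common way to put a number on this unreliability is to treat the incident count as a Poisson count, whose relative standard error is 1/sqrt(count). That modeling choice is an assumption added here; neither the article nor the quoted page spells it out. A minimal sketch:

```python
import math


def relative_standard_error(count: int) -> float:
    """Relative standard error of a rate whose numerator is a Poisson count."""
    if count <= 0:
        raise ValueError("count must be positive")
    return 1 / math.sqrt(count)


# 1 to 3 incidents are typical of the table; 20 is the health-statistics
# threshold quoted above; 23 and 38 are the non-US and US totals.
for count in (1, 2, 3, 20, 23, 38):
    print(f"{count:>2} incidents: +/- {relative_standard_error(count):.0%}")
```

Under that assumption, a rate built on 2 incidents carries an uncertainty of roughly 70%, while a rate built on 38 incidents is uncertain by roughly 16%.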

In the entire table of mass shooting statistics, only three lines meet any type of criteria for being statistically meaningful. Here they are:

Country              Rampage Shootings     Population  Shootings Per Million
United States                       38    314,941,000                  0.121
All Other Countries                 23    935,405,146                  0.025

Total                               61  1,250,346,146                  0.049

If you want a quick takeaway, the United States has a population that is one-quarter of the total population of the OECD countries, but accounts for more than half of the mass-shooting incidents. That is the truest statement that can be deduced from these data.
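The arithmetic behind these three lines and the takeaway can be checked with a short Python sketch. The figures are copied from the tables above; the helper name `per_million` is illustrative.

```python
def per_million(shootings: int, population: int) -> float:
    """Incidents per million inhabitants over the five-year window."""
    return shootings / population * 1_000_000


# Rows copied from the tables above: (label, rampage shootings, population).
ROWS = [
    ("Finland", 2, 5_421_827),
    ("United States", 38, 314_941_000),
    ("Total Non-US", 23, 935_405_146),
    ("Total", 61, 1_250_346_146),
]

for label, shootings, population in ROWS:
    rate = per_million(shootings, population)
    print(f"{label:<14} {shootings:>3} {population:>15,} {rate:.3f}")

us_population_share = 314_941_000 / 1_250_346_146
us_shooting_share = 38 / 61
print(f"US share of OECD population: {us_population_share:.1%}")
print(f"US share of mass shootings:  {us_shooting_share:.1%}")
```

The two shares come out at about 25% of the population and about 62% of the incidents.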

Nevertheless, let's continue the analysis to understand why a tiny number of incidents is usually treated as statistically insignificant.

A Computer Simulation

This talk about statistical stability and fluctuation of course prompts us to wonder if any of these data are valid. Let's explore this a bit by doing a few computer simulations. Here is an image showing the relative populations of the 34 OECD countries arranged alphabetically from left to right:

During the five-year period from 2009 through 2013 there occurred 61 incidents of mass shootings. Let us randomly distribute those 61 shootings throughout these countries. The implicit assumption is that the rate of mass shooting for each country is the same as the overall actual rate. Each shooting incident is symbolized as a black vertical bar:

Now let's put the results in a table, ordered by the rate of shootings per million:

Random Shootings (Seed = 10097, Incidents = 61)
Country          Rampage Shootings     Population  Shootings Per Million
Slovakia                         1      5,445,325                  0.184
Chile                            3     16,572,475                  0.181
Denmark                          1      5,580,413                  0.179
Ireland                          1      6,399,152                  0.156
Switzerland                      1      8,000,000                  0.125
Mexico                          11    113,910,608                  0.097
Belgium                          1     11,041,266                  0.091
Australia                        2     22,841,921                  0.088
France                           4     65,350,000                  0.061
South Korea                      3     50,004,441                  0.060
Netherlands                      1     16,751,323                  0.060
Turkey                           4     74,724,269                  0.054
United States                   15    314,941,000                  0.048
Japan                            6    126,659,683                  0.047
Spain                            2     47,190,493                  0.042
United Kingdom                   2     62,262,000                  0.032
Canada                           1     35,010,000                  0.029
Poland                           1     38,186,860                  0.026
Italy                            1     60,813,326                  0.016

Total                           61  1,250,346,146                  0.049
Total Non-US                    46    935,405,146                  0.049

This table doesn't look much like the table of the actual numbers. Many more countries have mass shootings, and some of them have quite a few. While the United States still has more than anyone else (it is, after all, the largest country here), the rate of mass shootings isn't nearly as high as the actual figure of 0.121.
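The random distribution described above can be sketched in a few lines of Python. This is not the program used for the article: it relies on Python's own `random` module, so even with the same seed value the counts will differ from the tables shown here. The helper name `distribute` is hypothetical; the populations are copied from the tables above.

```python
import random
from collections import Counter

# Populations of the 34 OECD countries, copied from the tables above.
POPULATIONS = {
    "Australia": 22_841_921, "Austria": 8_414_638, "Belgium": 11_041_266,
    "Canada": 35_010_000, "Chile": 16_572_475, "Czech Republic": 10_512_208,
    "Denmark": 5_580_413, "Estonia": 1_340_194, "Finland": 5_421_827,
    "France": 65_350_000, "Germany": 81_799_600, "Greece": 10_787_690,
    "Hungary": 9_942_000, "Iceland": 320_060, "Ireland": 6_399_152,
    "Israel": 7_941_900, "Italy": 60_813_326, "Japan": 126_659_683,
    "Luxembourg": 524_853, "Mexico": 113_910_608, "Netherlands": 16_751_323,
    "New Zealand": 4_445_436, "Norway": 5_033_675, "Poland": 38_186_860,
    "Portugal": 10_581_949, "Slovakia": 5_445_325, "Slovenia": 2_055_496,
    "South Korea": 50_004_441, "Spain": 47_190_493, "Sweden": 9_540_065,
    "Switzerland": 8_000_000, "Turkey": 74_724_269,
    "United Kingdom": 62_262_000, "United States": 314_941_000,
}


def distribute(incidents: int, populations: dict, seed: int) -> Counter:
    """Assign each incident to a country with probability proportional to population."""
    rng = random.Random(seed)
    countries = list(populations)
    weights = [populations[country] for country in countries]
    return Counter(rng.choices(countries, weights=weights, k=incidents))


counts = distribute(61, POPULATIONS, seed=10097)

# Print the countries that received at least one incident, highest rate first.
by_rate = sorted(counts.items(),
                 key=lambda item: item[1] / POPULATIONS[item[0]],
                 reverse=True)
for country, n in by_rate:
    rate = n / POPULATIONS[country] * 1_000_000
    print(f"{country:<15} {n:>3} {POPULATIONS[country]:>13,} {rate:.3f}")
```

Changing the `seed` argument gives a different random allocation each time, which is the experiment the following tables repeat.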

Since these data were generated from a pseudo-random sequence of numbers that began with a "seed" number indicated in the heading, maybe a different seed will produce different results. Let's try another:

Here's the table with the results:

Random Shootings (Seed = 37542, Incidents = 61)
Country          Rampage Shootings     Population  Shootings Per Million
Ireland                          1      6,399,152                  0.156
Australia                        3     22,841,921                  0.131
Switzerland                      1      8,000,000                  0.125
Sweden                           1      9,540,065                  0.105
Greece                           1     10,787,690                  0.093
Belgium                          1     11,041,266                  0.091
South Korea                      4     50,004,441                  0.080
United States                   24    314,941,000                  0.076
Germany                          5     81,799,600                  0.061
Netherlands                      1     16,751,323                  0.060
Canada                           2     35,010,000                  0.057
Spain                            2     47,190,493                  0.042
Turkey                           3     74,724,269                  0.040
Italy                            2     60,813,326                  0.033
Japan                            4    126,659,683                  0.032
France                           2     65,350,000                  0.031
Mexico                           3    113,910,608                  0.026
Poland                           1     38,186,860                  0.026

Total                           61  1,250,346,146                  0.049
Total Non-US                    37    935,405,146                  0.040

This demonstrates that simple random fluctuation can produce very different results when not very many incidents are involved. Now the United States has a rate of shootings per million that is 50% higher than the average (but still not as high as its actual value). Let's try it again:

And here's the table:

Random Shootings (Seed = 8422, Incidents = 61)
Country          Rampage Shootings     Population  Shootings Per Million
Sweden                           3      9,540,065                  0.314
Slovakia                         1      5,445,325                  0.184
Switzerland                      1      8,000,000                  0.125
Chile                            2     16,572,475                  0.121
Poland                           4     38,186,860                  0.105
Hungary                          1      9,942,000                  0.101
Belgium                          1     11,041,266                  0.091
Canada                           3     35,010,000                  0.086
Turkey                           5     74,724,269                  0.067
Japan                            8    126,659,683                  0.063
France                           4     65,350,000                  0.061
Mexico                           6    113,910,608                  0.053
United States                   14    314,941,000                  0.044
Spain                            2     47,190,493                  0.042
United Kingdom                   2     62,262,000                  0.032
Germany                          2     81,799,600                  0.024
South Korea                      1     50,004,441                  0.020
Italy                            1     60,813,326                  0.016

Total                           61  1,250,346,146                  0.049
Total Non-US                    47    935,405,146                  0.050

And now the U.S. is lower than the average. That's the way randomness works. You really can't anticipate what can happen. But these irregularities are accentuated when small numbers are involved.

But where are these random "seeds" coming from? Am I making them up or experimenting with different values to see which ones will tell a particular story?

Not at all. The seeds that I'm using are from the first several entries in the famous book A Million Random Digits with 100,000 Normal Deviates. I'm using these seeds to generate random numbers and draw the results in a WPF program that you can download and experiment with yourself.

If we keep trying different random distributions of 61 mass shootings, will we ever find a case where 38 of the shootings are in the United States? Perhaps. But it should be clear by this time that the incidence of mass shootings in the United States is intrinsically different from the other OECD countries taken in aggregate.
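That "perhaps" can be made concrete. Under the equal-rate assumption, each of the 61 incidents lands in the United States with a probability equal to its share of the OECD population, so the US count follows a binomial distribution. The sketch below sums the exact probabilities of 38 or more US incidents; the function name `binomial_tail` is illustrative.

```python
from math import comb

US_POPULATION = 314_941_000
OECD_POPULATION = 1_250_346_146
INCIDENTS = 61

# Chance that any single randomly placed incident lands in the United States.
p_us = US_POPULATION / OECD_POPULATION


def binomial_tail(n: int, p: float, k_min: int) -> float:
    """Probability of k_min or more successes in n independent trials."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(k_min, n + 1))


tail = binomial_tail(INCIDENTS, p_us, 38)
print(f"Expected US incidents: {INCIDENTS * p_us:.1f}")
print(f"P(38 or more in the US): {tail:.2e}")
```

The expected US count is about 15, and the probability of reaching 38 is on the order of one in a billion, so trying more seeds is not a practical way to reproduce the real table.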

One approach to see the difference is to artificially inflate the population of the United States by a factor of 4 and then distribute the 61 mass shootings among this artificial population. Because the United States is now 4 times its normal size (and larger than all the other countries combined) it gets more of the random shootings:

And here's the table summarizing the results:

Random Shootings (Seed = 99019, Incidents = 61)
US Population Increased by Factor of 4
Country          Rampage Shootings     Population  Shootings Per Million
New Zealand                      1      4,445,436                  0.225
Czech Republic                   2     10,512,208                  0.190
Denmark                          1      5,580,413                  0.179
Hungary                          1      9,942,000                  0.101
Australia                        2     22,841,921                  0.088
Spain                            3     47,190,493                  0.064
Canada                           2     35,010,000                  0.057
Italy                            2     60,813,326                  0.033
United States                   40  1,259,764,000                  0.032
Turkey                           2     74,724,269                  0.027
Poland                           1     38,186,860                  0.026
South Korea                      1     50,004,441                  0.020
United Kingdom                   1     62,262,000                  0.016
Mexico                           1    113,910,608                  0.009
Japan                            1    126,659,683                  0.008

Total                           61  2,195,169,146                  0.028
Total Non-US                    21    935,405,146                  0.022

This table looks a lot like the one with the real data. The other countries in the table all have incidences of 1, 2, or 3 mass shootings while the United States has 40 mass shootings. The actual figure is 38.

Let's try another random number seed:

And here's the table summarizing the results:

Random Shootings (Seed = 12807, Incidents = 61)
US Population Increased by Factor of 4
Country          Rampage Shootings     Population  Shootings Per Million
Slovakia                         1      5,445,325                  0.184
Greece                           1     10,787,690                  0.093
Australia                        2     22,841,921                  0.088
South Korea                      3     50,004,441                  0.060
Turkey                           4     74,724,269                  0.054
Mexico                           4    113,910,608                  0.035
United Kingdom                   2     62,262,000                  0.032
Japan                            4    126,659,683                  0.032
France                           2     65,350,000                  0.031
United States                   33  1,259,764,000                  0.026
Poland                           1     38,186,860                  0.026
Germany                          2     81,799,600                  0.024
Spain                            1     47,190,493                  0.021
Italy                            1     60,813,326                  0.016

Total                           61  2,195,169,146                  0.028
Total Non-US                    28    935,405,146                  0.030

Now there are a few countries with 4 mass shooting incidents and the United States is down to 33. Shall we try one more? Here goes:

And here's the table summarizing the results:

Random Shootings (Seed = 32533, Incidents = 61)
US Population Increased by Factor of 4
Country          Rampage Shootings     Population  Shootings Per Million
Sweden                           2      9,540,065                  0.210
Turkey                           5     74,724,269                  0.067
United Kingdom                   4     62,262,000                  0.064
Poland                           2     38,186,860                  0.052
Spain                            2     47,190,493                  0.042
Germany                          3     81,799,600                  0.037
Mexico                           4    113,910,608                  0.035
France                           2     65,350,000                  0.031
United States                   33  1,259,764,000                  0.026
Japan                            3    126,659,683                  0.024
Italy                            1     60,813,326                  0.016

Total                           61  2,195,169,146                  0.028
Total Non-US                    28    935,405,146                  0.030

Again, 33 in the United States.

But we are now generating tables of random mass shootings that generally resemble the table of actual mass shootings.

In other words, mass shootings among the OECD countries seem to resemble a random distribution but only if the United States is assumed to have a population that is four times its actual size.
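The factor of 4 can also be checked with simple arithmetic rather than more simulation runs. Under a purely random distribution, the expected US count is 61 times the US share of the inflated population. The sketch below computes that expectation and also solves for the factor at which it equals the actual figure of 38. The names are illustrative, and the closed-form factor is derived here rather than taken from the article.

```python
US_POPULATION = 314_941_000
NON_US_POPULATION = 935_405_146
INCIDENTS = 61
ACTUAL_US = 38
ACTUAL_NON_US = 23


def expected_us_incidents(factor: float) -> float:
    """Expected US count when the US population is multiplied by `factor`."""
    inflated = factor * US_POPULATION
    return INCIDENTS * inflated / (inflated + NON_US_POPULATION)


# Setting expected_us_incidents(f) = 38 and solving for f gives
# f = (38 / 23) * (non-US population / US population).
matching_factor = (ACTUAL_US / ACTUAL_NON_US) * (NON_US_POPULATION / US_POPULATION)

for factor in (1, 4, matching_factor):
    print(f"factor {factor:.2f}: expected US incidents {expected_us_incidents(factor):.1f}")
```

With a factor of 4 the expected US count is about 35. The expectation reaches 38 at a factor of roughly 4.9, which is simply the ratio of the US rate (0.121) to the non-US rate (0.025).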

Probability Distributions

Let's come at this analysis from another direction. If we know the probability of a particular event, we can also calculate the probability that a population of a certain size will experience a specific number of those events.

For example, consider a six-sided die. Toss it ten times. What is the probability that it will land 4 every time in these ten tosses? The probability of landing 4 just once is 1/6, so the probability of ten tosses in a row landing 4 is (1/6)^10.

If you toss a die ten times, what is the probability of it landing on 4 only once, and something else the other nine times? The probability of it landing on 4 is 1/6, and the probability of it landing on something other than 4 is 5/6. For that to happen nine times is (5/6)^9. However, there are ten ways this can happen. The first toss can land on 4, or the second, or the third, etc., so the complete probability is 10 × (1/6) × (5/6)^9. For the probability of two 4's coming up in ten tosses of a die, you have to figure out the combinations of how many ways that can happen, which is 45, so the probability is 45 × (1/6)^2 × (5/6)^8.
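These dice probabilities are easy to verify numerically. A minimal Python sketch, where `prob_k_fours` is a name invented for this illustration and `math.comb` supplies the number of combinations:

```python
from math import comb


def prob_k_fours(k: int, tosses: int = 10, p: float = 1 / 6) -> float:
    """Probability of exactly k fours in the given number of tosses of a fair die."""
    return comb(tosses, k) * p**k * (1 - p) ** (tosses - k)


print("ways to place one 4: ", comb(10, 1))
print("ways to place two 4s:", comb(10, 2))
for k in (10, 1, 2):
    print(f"P(exactly {k} fours in 10 tosses) = {prob_k_fours(k):.3g}")
```

One 4 in ten tosses comes out at about 32%, and two 4's at about 29%.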

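In general, for n trials where the probability of a "success" is p, the probability of k successes is given by the binomial probability formula:

P(k) = C(n, k) × p^k × (1 − p)^(n − k)

where C(n, k) = n! / (k! × (n − k)!) is the number of ways of choosing which k of the n trials are the successes.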

Let's assume that the probability of a mass shooting over a five-year period is the overall OECD average of 0.049 per million of population. The probability is actually 0.000000049 per person. That's the value p. What is the probability of 1 mass shooting in a population of 10,000,000, which is roughly characteristic of countries like Switzerland and Sweden? The variable n is 10,000,000 and the value k is 1. We can actually calculate the probabilities of 0 mass shootings, 1 mass shooting, 2, and so forth, and put them in a graph:

The dark bars show the probabilities. The probability of there being no shootings is a bit over 60% while the probability of there being just one shooting is about 30%. The gray bars show the accumulated probability, which is often useful. The probability of there being 0 or more shootings is obviously 1 or 100%, while the probability of there being 1 or more shootings is close to 40%.
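The bars described above can be recomputed with a short Python sketch of the binomial probability of k events among n people. The function name is made up for this illustration, and the binomial coefficient is evaluated in log space, an implementation choice made here so that a population in the millions does not overflow.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k events among n people, each with probability p."""
    # log C(n, k) = log(n) + log(n - 1) + ... + log(n - k + 1) - log(k!)
    log_coeff = sum(math.log(n - i) for i in range(k)) - math.lgamma(k + 1)
    return math.exp(log_coeff + k * math.log(p) + (n - k) * math.log1p(-p))


n = 10_000_000         # population
p = 0.049 / 1_000_000  # OECD-wide rate per person over five years

at_least = 1.0         # gray bar: probability of k or more shootings
for k in range(5):
    exactly = binomial_pmf(k, n, p)  # dark bar
    print(f"k={k}: exactly {exactly:.3f}, at least {at_least:.3f}")
    at_least -= exactly
```

Each printed line corresponds to one pair of bars: the probability of exactly k shootings and the probability of k or more.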

Here's a similar graph for a population of 25,000,000, which is (very roughly) the population of Australia:

Now the most likely outcome is one mass shooting in a five-year period. Here's the distribution for a population of 50,000,000, such as Italy, Spain, France, the UK, and South Korea:

You can pretty much anticipate which will be the highest bar by just multiplying the rate of 0.049 per million by the population in millions. For this example it's about 2.5, which is the expected number of mass shootings, and the highest bars sit next to that value.

Here's a population of 100,000,000, which is about the size of Mexico and Japan:

Now we're seeing a likelihood that is closer to 4 or 5 mass shootings. In reality, both Mexico and Japan had just one mass shooting in the five-year period. Why the big difference?

The probability of 0.049 per million that is being used for these graphs is the overall rate of mass shootings for the OECD countries, and that number is distorted by the high rate of mass shooting in the United States. For the non-US countries, the rate is actually 0.025. Let's try that with a population of 100,000,000:

And now we get something much closer to reality.
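The rule of thumb used above (multiply the rate by the population) gives the expected number of incidents, around which the highest bars cluster. A minimal sketch using the round populations and rates from the text; `expected_incidents` is an illustrative name:

```python
def expected_incidents(population: int, rate_per_million: float) -> float:
    """Expected number of incidents for a population at a given per-million rate."""
    return population / 1_000_000 * rate_per_million


# (population, rate per million) pairs taken from the text above.
CASES = [
    (10_000_000, 0.049),
    (25_000_000, 0.049),
    (50_000_000, 0.049),
    (100_000_000, 0.049),
    (100_000_000, 0.025),  # non-US rate, for countries the size of Mexico and Japan
]

for population, rate in CASES:
    expected = expected_incidents(population, rate)
    print(f"{population:>11,} at {rate:.3f} per million: {expected:.2f} expected")
```

At the non-US rate a population of 100,000,000 is expected to see about 2.5 incidents instead of about 4.9, which is much closer to the single incident that Mexico and Japan each recorded.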

Finally, let's jump up to a population of 300,000,000, which applies to the United States. Here's the distribution using the total OECD mass shooting rate of 0.049:

In reality the United States had 38 mass shootings. This graph is telling us that the likelihood of that happening is essentially zero.

Again, the problem is that we're using the overall OECD rate of 0.049. If we instead use the US rate of 0.121, then we see something quite different:

But this isn't telling us anything that we didn't already know: the mass shooting rate in the United States is much higher than in the other OECD countries.
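To put a number on "essentially zero", the sketch below uses the Poisson approximation to the binomial distribution. That is a substitution made here, reasonable because the population is huge and the per-person probability is tiny; the graphs in the article are based on the binomial formula. Function names are illustrative.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k events when `mean` events are expected."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


def poisson_tail(k_min: int, mean: float, terms: int = 400) -> float:
    """P(X >= k_min), summing the upper tail directly to avoid 1 - (nearly 1)."""
    return sum(poisson_pmf(k, mean) for k in range(k_min, k_min + terms))


POPULATION_MILLIONS = 300
for rate in (0.049, 0.121):
    mean = POPULATION_MILLIONS * rate
    print(f"rate {rate:.3f}: expected {mean:.1f}, "
          f"P(exactly 38) = {poisson_pmf(38, mean):.2e}, "
          f"P(38 or more) = {poisson_tail(38, mean):.2e}")
```

At the OECD-wide rate, the probability of 38 or more incidents comes out well under one in a hundred thousand. At the US rate, 38 is an unremarkable outcome.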

At the other extreme, here is the distribution for countries with a population of 5,000,000, the approximate population of the three countries at the top of the IJReview and RampageShooting rankings. This uses the rate characteristic of the total OECD countries excluding the United States:

To be sure, the expectation is that these countries will have no mass shootings, but there's a better than 10% probability that they will have at least one.
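The chance of at least one incident is one minus the chance of none, which is simple to compute directly. A minimal sketch, with an illustrative function name and the round figures from the text:

```python
import math


def prob_at_least_one(population: int, rate_per_million: float) -> float:
    """Probability of one or more incidents: 1 - (1 - p) ** population."""
    p = rate_per_million / 1_000_000
    # log1p and expm1 keep precision when p is tiny and population is large.
    return -math.expm1(population * math.log1p(-p))


print(f"5,000,000 people at 0.025 per million:  {prob_at_least_one(5_000_000, 0.025):.1%}")
print(f"10,000,000 people at 0.049 per million: {prob_at_least_one(10_000_000, 0.049):.1%}")
```

For 5,000,000 people at the non-US rate the result is a little under 12%, so a single incident in one of the small countries is not a surprising outcome.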

Conclusions

To get meaningful information from data concerning mass shootings, it is necessary to be aware of statistical fluctuations that result from an insufficient number of incidents. Once that is done, it becomes obvious that the rate of mass shootings in the United States is significantly higher than in the other OECD countries.

Of course, this isn't an academic exercise. Nobody will be surprised to learn that there is political motivation behind these attempts to demonstrate that the United States doesn't have horrendous incidences of mass shootings and other gun crimes. If the United States has levels of gun violence comparable with the rest of the world, there is certainly no need for gun-safety legislation.

Our political arena is open enough to debate these issues. But the debate should not involve the abuse of statistics. If people are opposed to gun-safety legislation, they should own the consequences of that opposition rather than try to hide those consequences behind a bogus interpretation of statistics.

Actual lives are at stake.