With the debates heating up more than ever, we step closer to the General Election each day. The question of the day is – will you vote?
I love democracy. I’m also a statistician. Mix those two and you get the following conclusion: allowing every single citizen to vote is an unnecessary waste of the nation’s resources, and is therefore unpatriotic.
Let’s make some simplifying assumptions regarding the election process:
- The popular vote determines the outcome – let’s assume no electoral college here. It’s how elections should be done, anyway.
- There are only two candidates, and every voter will vote for one or the other. This is almost true in the United States, with less than 1% of the popular votes going toward a non-Republican non-Democrat in 2004.
Now, let’s consider the following scenario. 121 million citizens voted in the 2004 presidential election and mattered (no offense to those voting for Nader, Badnarik, etc., but they really didn’t count for much). 51.2% of them voted for Bush, who went on to serve as our president for the next four years. The remaining 48.8%, of course, wishfully voted for Kerry. Without going far into the statistical details, we know that if we drew a random sample of 2.6 million of those voters, we could be 99.9% confident that their voting outcome would be within 0.1 percentage points of the entire country’s voting results.
I made a simple Sample Size Calculator based on the formulas from this website. It takes four parameters (the population size, the estimate of the result, the confidence interval, and the required confidence level) and gives you the necessary sample size. The noteworthy formula used is the inverse normal function:
=NORMINV(probability, mean, standard_dev)
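It gives you the value below which a normally distributed variable is expected to fall with the given probability. Use 0 for the mean and 1 for the standard deviation to get the standard normal z value. Take care to convert the two-tailed confidence level to a one-tailed probability: for a confidence level CL, pass 1-(1-CL)/2 as the probability.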
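If you would rather not open a spreadsheet, the same lookup can be sketched in Python. This is only an illustration: the function name z_for_confidence is my own, and it leans on the standard library's statistics.NormalDist (Python 3.8 or later) as a stand-in for NORMINV.

```python
from statistics import NormalDist

def z_for_confidence(confidence_level: float) -> float:
    """Two-tailed z value, mirroring NORMINV(1 - (1 - CL) / 2, 0, 1).

    The name and signature are hypothetical, chosen for this sketch.
    """
    # Convert the two-tailed confidence level to a one-tailed probability.
    one_tailed = 1 - (1 - confidence_level) / 2
    # Mean 0 and standard deviation 1 give the standard normal z value.
    return NormalDist(mu=0, sigma=1).inv_cdf(one_tailed)

print(round(z_for_confidence(0.95), 3))   # 1.96
print(round(z_for_confidence(0.999), 3))  # 3.291
```

Forgetting the two-tailed adjustment and passing 0.999 straight in would give a noticeably smaller z, and therefore a sample that is too small.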
The variables and formulas for this problem include:
N = size of the population (number of eligible voters)
p = expected % outcome (% of the population voting for Democrat or Republican); assume 50% for the “worst case” scenario
c = confidence interval (the plus-or-minus margin of error), expressed as a %
CL = confidence level, as a %
Z = NORMINV(1-(1-CL)/2, 0, 1) = the number of standard deviations within which we are CL% confident the result lies
ss = (Z² * (p) * (1-p)) / c² = the sample size needed for an infinite population
ssf = ss / (1 + (ss-1) / N) = the sample size needed for a finite population with size N
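Put together, the variables and formulas above fit in a few lines of Python. Treat this as a rough sketch rather than the calculator itself: the function name sample_size is made up for illustration, NormalDist stands in for NORMINV, and the inputs are the 2004 figures quoted earlier (percentages written as fractions).

```python
from statistics import NormalDist

def sample_size(N: int, p: float, c: float, CL: float) -> tuple[float, float]:
    """Return (ss, ssf) from the formulas above.

    N  = size of the population
    p  = expected outcome, as a fraction (0.5 is the worst case)
    c  = margin of error, as a fraction (0.001 means 0.1 points)
    CL = confidence level, as a fraction (0.999 means 99.9%)
    """
    Z = NormalDist().inv_cdf(1 - (1 - CL) / 2)  # two-tailed z value
    ss = Z**2 * p * (1 - p) / c**2              # infinite population
    ssf = ss / (1 + (ss - 1) / N)               # finite population of size N
    return ss, ssf

# Inputs taken from the 2004 scenario described earlier in the post.
ss, ssf = sample_size(N=121_000_000, p=0.512, c=0.001, CL=0.999)
print(f"infinite population: {ss:,.0f}")
print(f"finite population:   {ssf:,.0f} ({ssf / 121_000_000:.1%} of N)")
```

With these inputs the finite-population figure lands a little above 2.6 million, or about 2.2% of the 121 million voters.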
Another thing to point out is that the required sample size depends on the expected outcome p. The closer p is to the midpoint (50%), the larger p(1-p) becomes, and the larger the sample we need for the same confidence. As historically observed, U.S. voters are split fairly evenly down the middle. However, that’s frequently not the case when you examine voting results by state or county. When a locality leans as heavily as 80% toward red or blue, p(1-p) shrinks from 0.25 to 0.16, and we can also tolerate a much wider margin of error and still call the winner, so a very small sample of voters is enough to be very confident about the result.
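To put rough numbers on the effect of p, here is a small Python sketch using the infinite-population formula only. The 95% confidence level and the 1-point and 10-point margins are illustrative assumptions of mine, not figures from any real poll, and infinite_sample_size is a hypothetical helper name.

```python
from statistics import NormalDist

def infinite_sample_size(p: float, c: float, CL: float) -> float:
    """ss = Z^2 * p * (1 - p) / c^2, ignoring the finite-population correction."""
    Z = NormalDist().inv_cdf(1 - (1 - CL) / 2)
    return Z**2 * p * (1 - p) / c**2

# Same margin of error, increasingly lopsided electorates.
for p in (0.5, 0.6, 0.7, 0.8):
    n = infinite_sample_size(p, c=0.01, CL=0.95)
    print(f"p = {p:.0%}, c = 1%:  about {n:,.0f} voters")

# A lopsided locality can afford a much wider margin and still call the winner.
n_wide = infinite_sample_size(0.8, c=0.10, CL=0.95)
print(f"p = 80%, c = 10%: about {n_wide:,.0f} voters")
```

At a fixed margin, the 80% locality needs 64% of the sample that a 50/50 split does (0.16 versus 0.25). The dramatic savings come from the margin: because c is squared in the denominator, a margin ten times wider cuts the sample by a factor of 100.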
If you aren’t familiar with the concept of confidence intervals, this article probably isn’t making much sense to you. Otherwise, it’s proving a practical political point – in an age where everything needs to go green and at a time when everyone is on a tight budget, let’s not waste all the applications, ballots, and time by allowing every citizen to vote! Instead, we should randomly select 2.2% of the eligible voters in the country and rest assured that their votes are nearly 100% representative of ALL Americans!