What are some of the standard approximations for confidence intervals, and when are they used?
First, let’s review the definition of a confidence interval.
Confidence Interval: A range of values so defined that there is a specified probability that the value of a parameter lies within it.
Example for the most common parameter: the mean. A 95% confidence interval is a range of values with a probability of 95% that the range contains the true mean of the population.
Why use a confidence interval?
Often it is impossible or impractical to have access to an entire population being studied. As a result, we must take a sample to estimate a population parameter such as the mean. When you take a sample, the mean of that individual sample may or may not be close to the population mean we are trying to estimate.
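The "95%" in a 95% confidence interval is a statement about the procedure: if we repeatedly draw samples and build an interval from each, about 95% of those intervals will contain the true mean. A minimal simulation sketch of that idea, using made-up population values (mean 50, standard deviation 10) chosen purely for illustration:

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

random.seed(0)
z = NormalDist().inv_cdf(0.975)      # ~1.96, the 95% normal critical value
true_mu, true_sigma, n = 50.0, 10.0, 100  # hypothetical population and sample size

covered, trials = 0, 2000
for _ in range(trials):
    sample = [random.gauss(true_mu, true_sigma) for _ in range(n)]
    xbar, s = mean(sample), stdev(sample)
    half = z * s / sqrt(n)           # half-width of the 95% interval
    if xbar - half <= true_mu <= xbar + half:
        covered += 1

print(covered / trials)              # typically close to 0.95
```

Any single interval either contains the true mean or it does not; the confidence level describes the long-run success rate of the method.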
This is where confidence intervals can help.
Calculating a Confidence Interval
In order to calculate a given confidence interval, say for a population mean, you need to know the estimate of the sample mean, the sample size, and, if you don’t know the standard deviation of the population, an estimate of the standard deviation from the sample.
Finally, you need to know the probability distribution that best describes the population.
We won’t go into the details here. There is plenty of information on the internet and in texts.
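As a quick sketch of the ingredients just listed (sample mean, sample size, estimated standard deviation), here is a 95% interval computed with the normal approximation. The sample values are hypothetical, invented for illustration only:

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical sample data (illustrative only)
sample = [68, 71, 74, 69, 72, 70, 73, 67, 75, 71]

n = len(sample)
xbar = mean(sample)       # estimate of the population mean
s = stdev(sample)         # estimate of the population standard deviation

# 95% CI under the normal approximation: xbar +/- z * s / sqrt(n)
z = NormalDist().inv_cdf(0.975)   # ~1.96
half_width = z * s / sqrt(n)
print(f"95% CI: ({xbar - half_width:.2f}, {xbar + half_width:.2f})")
```

For small samples one would normally use the t distribution rather than z, which widens the interval; the structure of the calculation is the same.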
But what if we don’t have information about the probability distribution? That’s when approximations can be useful.
We briefly list three approximations. Details on their calculation and use can be found by searching the internet and on Wikipedia.
- Gauss Inequality: Gives an upper bound on the probability that a unimodal random variable lies more than any given distance from its mode, or equivalently a lower bound on the probability that it lies within an interval of a given number of standard deviations around the mode.
- Chebyshev’s Inequality: Gives an upper bound on the probability that a random variable lies more than any given distance from its mean, without requiring unimodality, or equivalently a lower bound on the probability that it lies within an interval of a given number of standard deviations around the mean.
- Vysochanski-Petunin Inequality: Gives an upper bound on the probability that a unimodal random variable lies more than any given distance from its mean, or equivalently a lower bound on the probability that it lies within an interval of a given number of standard deviations around the mean.
Guidelines:
If you have no information about the underlying population distribution, have a unimodal (single-peaked) distribution, and need a confidence interval about the mode, use the Gauss Inequality.
If you have no information about the underlying population distribution, have no information about whether it is unimodal, and need a confidence interval about the mean, use Chebyshev’s Inequality.
Finally, if you have no information about the underlying population distribution, but have a unimodal distribution and need a confidence interval about the mean, use the Vysochanski-Petunin Inequality.
The more you know about a distribution, the tighter the bound. So, if you know that you have a unimodal distribution, the Vysochanski-Petunin Inequality improves upon the bound found by the Chebyshev Inequality.
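The improvement is easy to see numerically. Chebyshev bounds the tail probability beyond k standard deviations by 1/k², while Vysochanski-Petunin tightens it to 4/(9k²) for unimodal distributions (valid for k > √(8/3) ≈ 1.63), a factor of 4/9 smaller:

```python
def chebyshev_tail(k):
    # P(|X - mu| >= k*sigma) <= 1/k^2; any distribution with finite variance
    return 1.0 / k**2

def vp_tail(k):
    # P(|X - mu| >= k*sigma) <= 4/(9*k^2); requires unimodality
    # and k > sqrt(8/3) ~= 1.63
    return 4.0 / (9.0 * k**2)

for k in (2, 3, 4):
    print(f"k={k}: Chebyshev <= {chebyshev_tail(k):.4f}, "
          f"Vysochanski-Petunin <= {vp_tail(k):.4f}")
```

At k = 3, for example, Chebyshev guarantees at most about 11.1% of the probability lies in the tails, while Vysochanski-Petunin guarantees at most about 4.9%.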
In this short video, Robert Donely, Assistant Professor at Queensborough Community College, gives a concrete example of using Chebyshev’s Inequality to construct a 95% confidence interval.
See if you can calculate a tighter bound for Dr. Donely’s example using the Vysochanski-Petunin Inequality as defined above.
Did you get 70 ± 2.98*2 = 70 ± 5.96 = (64.04, 75.96)?