Introduction to Biostatistics
- Statistics are simply a collection of tools that researchers employ to help answer research questions.
- Normal distribution is a mathematical construct.
- It suggests naturally occurring observations follow a given pattern.
- A normal distribution is bell-shaped and symmetric.
- The distribution is determined by the mean mu, and the standard deviation sigma.
- The mean, mu, controls the centre, and the standard deviation, sigma, controls the spread.
- A normal distribution curve is drawn by:
- First, draw a bell-shaped curve.
- Next, place the mean, mu, on the curve.
- Then place sigma on the curve as the segment from the mean to the upper (or lower) inflection point of the curve.
- From this information, the scale on the horizontal axis can be placed on the graph.
- The normal distribution helps us to predict, probabilistically, where cases will fall within a distribution.
- For example, what are the odds, given the population parameters of human height, that someone will grow to more than eight feet?
- Answer: the probability is likely less than .025.
- For any normal curve with mean mu and standard deviation sigma:
- 68 percent of the observations fall within one standard deviation sigma of the mean.
- 95 percent of the observations fall within 2 standard deviations of the mean.
- 99.7 percent of observations fall within 3 standard deviations of the mean.
- Mean = median = mode
- The normal distribution has a skewness of zero.
- In real data, under most circumstances, the mean, median, and mode will not be exactly the same.
- Theoretically, the two tails of the curve never touch the horizontal axis.
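The 68, 95 and 99.7 percent figures above can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `share_within` is introduced here for illustration only. The last lines compute the upper-tail area beyond 1.96 standard deviations, which is one way to read the .025 figure in the height example.

```python
from statistics import NormalDist

# Standard normal distribution: mean mu = 0, standard deviation sigma = 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k: float) -> float:
    """Proportion of observations within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sigma of the mean: {share_within(k):.4f}")

# Area in the upper tail beyond mean + 1.96 sigma (about .025).
upper_tail = 1.0 - std_normal.cdf(1.96)
print(f"above mean + 1.96 sigma: {upper_tail:.4f}")
```

Because any normal distribution can be rescaled to the standard normal, the same proportions hold for every choice of mu and sigma.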
Skewness of distributions
- Skewness is a measure of the asymmetry of the probability distribution.
- If there is zero skewness (i.e., the distribution is symmetric) then the mean = median.
- Skewness is of two types:
- Positive skewness: the right tail is longer; its greatest frequency occurs at a value near the left of the graph. It has relatively few high values. The distribution is said to be right-skewed. Example (observations): 1, 2, 3, 4, 100.
- Negative skewness: the left tail is longer; its greatest frequency occurs at a value near the right of the graph. It has relatively few low values. The distribution is said to be left-skewed. Example (observations): 1, 1000, 1001, 1002, 1003.
- Kurtosis is a measure of how heavy the tails of a distribution are (and, loosely, how peaked it is) compared with a normal distribution.
- The kurtosis value for a normal distribution equals 3. A value above 3 is leptokurtic (sharper peak, heavier tails) and a value below 3 is platykurtic (flatter peak, lighter tails).
- Z-scores express how far an observation lies from the mean in standardized units, which allows values to be compared across different samples.
- The basic unit of the z-score is the standard deviation.
- The formula for calculating z-scores: any normal distribution with mean = mu and standard deviation = sigma can be converted into a standard normal Z distribution by the following transformation: Z = (X - mu) / sigma, where X is an observed value.
- In the normal distribution, z-scores of 1.96 and 2.58 mark the limits on either side of the population mean within which 95 percent and 99 percent of all observations will fall.
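The skewness examples and the z-score transformation above can be sketched in a few lines. This is an illustrative sketch, not a prescribed method: `skewness` uses the population moment coefficient (the mean of cubed z-scores), which is one common definition among several, and the values passed to `z_score` (130, 100, 15) are hypothetical.

```python
from statistics import NormalDist, mean, pstdev

def skewness(values):
    """Population moment coefficient of skewness: mean of cubed z-scores."""
    mu = mean(values)
    sigma = pstdev(values)
    return sum(((x - mu) / sigma) ** 3 for x in values) / len(values)

right_skewed = [1, 2, 3, 4, 100]            # example from the notes
left_skewed = [1, 1000, 1001, 1002, 1003]   # example from the notes
print("right-skewed example:", skewness(right_skewed))  # positive value
print("left-skewed example:", skewness(left_skewed))    # negative value

def z_score(x, mu, sigma):
    """Z = (X - mu) / sigma: distance from the mean in standard deviations."""
    return (x - mu) / sigma

# Hypothetical observation of 130 from a distribution with mu = 100, sigma = 15.
print("z-score:", z_score(130, 100, 15))

# Share of a normal distribution lying within +/- z of the mean.
std_normal = NormalDist()
coverage = {z: std_normal.cdf(z) - std_normal.cdf(-z) for z in (1.96, 2.58)}
for z, share in coverage.items():
    print(f"within +/- {z}: {share:.4f}")
```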
Probability Theory and Test of Significance
- Probability theory arose from the study of games of chance (gambling).
- Probability may be defined as quantifying the chance that a stated outcome of an event will take place.
- Probability values fall on a scale between 0 (impossibility) and 1 (certainty).
- P = Number of nominated outcomes / Number of possible outcomes.
- Statisticians conventionally adopt three critical probability values:
- An outcome that is predicted to occur in less than 1 trial in 20 (p<0.05) is considered to be unlikely or statistically significant.
- An outcome that is predicted to occur in less than 1 trial in 100 (p<0.01) is considered to be very unlikely or statistically highly significant.
- An outcome that is predicted to occur in less than 1 trial in 1000 (p<0.001) is considered to be extremely unlikely or statistically very highly significant.
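As a small worked illustration of the probability ratio and the three conventional thresholds above, the sketch below uses a fair six-sided die (an example chosen here, not taken from the notes). The function names and the "not significant" label for values at or above 0.05 are assumptions made for the illustration.

```python
from fractions import Fraction

def probability(nominated_outcomes, possible_outcomes):
    """P = number of nominated outcomes / number of possible outcomes."""
    return Fraction(len(nominated_outcomes), len(possible_outcomes))

# Illustrative example: rolling an even number with a fair six-sided die.
die_faces = {1, 2, 3, 4, 5, 6}
even_faces = {face for face in die_faces if face % 2 == 0}
p_even = probability(even_faces, die_faces)
print("P(even face) =", p_even)

def significance_label(p_value):
    """Map a p-value onto the three conventional critical values."""
    if p_value < 0.001:
        return "very highly significant"
    if p_value < 0.01:
        return "highly significant"
    if p_value < 0.05:
        return "significant"
    return "not significant"  # label assumed here for p >= 0.05

for p in (0.2, 0.03, 0.004, 0.0002):
    print(p, "->", significance_label(p))
```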
Standard Error or Random Sampling Error
- Several small samples drawn from the same population generally provide different values of the same statistic, yet they are all estimates of the same population parameter. The variation between these individual estimates is due to sampling error.
- The standard error is a statistical constant that measures the dispersion of the sample means around the population mean.
- The sampling error of a sample mean is the difference between the mean of the sample and the mean of the population from which it is drawn.
- The higher variance in the population also causes higher error in samples taken from it.
- The standard error is obtained by dividing the standard deviation by the square root of the sample size.
- As the variance of the population increases, so does the chance that a sample will not reflect the population parameters.
- The way in which sample statistics cluster around a population parameter is called the sampling distribution.
- Formula for S.E.: Random Sampling Error = standard deviation / square root of the sample size, i.e. S.E. = sigma / sqrt(n).
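The standard error formula can be compared against a simulation. The sketch below is illustrative only: the population (mean 50, standard deviation 10), the sample size of 25 and the number of repeated samples are hypothetical values chosen for the demonstration. It draws many samples, takes the mean of each, and compares the spread of those means with sigma / sqrt(n).

```python
import math
import random
from statistics import mean, pstdev, stdev

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 values with mean about 50 and SD about 10.
population = [random.gauss(50, 10) for _ in range(10_000)]
sample_size = 25

# Standard error from the formula: standard deviation / sqrt(sample size).
theoretical_se = pstdev(population) / math.sqrt(sample_size)

# Empirical counterpart: standard deviation of many sample means.
sample_means = [
    mean(random.sample(population, sample_size)) for _ in range(2_000)
]
empirical_se = stdev(sample_means)

print(f"sigma / sqrt(n):          {theoretical_se:.3f}")
print(f"spread of sample means:   {empirical_se:.3f}")
```

The two numbers should be close to each other, and both shrink as the sample size grows.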
Central Limit Theorem
- If a very large number of samples of the same size were taken from a population, the means of these samples would be approximately normally distributed, even when the population itself is not normal, provided the sample size is large enough.
- Hence, the larger the sample relative to the population, the more likely the sample mean is to capture the population mean.
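A quick simulation can make the theorem concrete. In this sketch the population is assumed to be exponential with mean 1 (a strongly right-skewed choice made here for illustration), and the sample sizes and replication count are likewise hypothetical. The skewness of the sample means should move toward zero, the value for a normal distribution, as the sample size grows.

```python
import random
from statistics import mean, pstdev

random.seed(42)  # fixed seed so the sketch is reproducible

def skewness(values):
    """Population moment coefficient of skewness: mean of cubed z-scores."""
    mu = mean(values)
    sigma = pstdev(values)
    return sum(((x - mu) / sigma) ** 3 for x in values) / len(values)

def sample_means(sample_size, replications=3_000):
    """Means of repeated samples from an exponential population (mean 1)."""
    means = []
    for _ in range(replications):
        sample = [random.expovariate(1.0) for _ in range(sample_size)]
        means.append(sum(sample) / sample_size)
    return means

means_small = sample_means(sample_size=2)
means_large = sample_means(sample_size=50)

print("skewness of means, n = 2: ", round(skewness(means_small), 2))
print("skewness of means, n = 50:", round(skewness(means_large), 2))
print("average of means, n = 50: ", round(mean(means_large), 3))
```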
Confidence Interval (CI)
- We can use what we know about the mean and the standard deviation to calculate the range of values within which a sample mean would be expected to fall if it lies close to the mean of the population.
- This range is based on the sample mean falling close to the population mean with a probability of .95, that is, with 5% error.
- Social scientists use 95% as the threshold for testing whether or not results are a product of chance. That is, they accept being wrong in 1 out of 20 cases.
- The purpose of knowing about the CI is:
- The CI is used for hypothesis testing.
- State a null and an alternative hypothesis:
- H0: Expected response is equal in both groups.
- H1: Expected response is different between groups.
- p-value: the probability of observing values at least as extreme as those observed, given that H0 is true.
- Reject H0 if the p-value is less than a given significance level (e.g. 0.05 or 0.01)
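The confidence interval and the H0/H1 decision rule above can be sketched as follows. Everything specific here is an assumption made for illustration: the two groups are simulated, hypothetical data, and the test uses a normal (z) approximation for the difference between two means; with small samples a t-based procedure would normally be preferred.

```python
import math
import random
from statistics import NormalDist, mean, stdev

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical responses for two groups of 40 subjects each.
group_a = [random.gauss(100, 15) for _ in range(40)]
group_b = [random.gauss(110, 15) for _ in range(40)]

def confidence_interval(sample, level=0.95):
    """Normal-approximation CI: mean +/- z * (standard deviation / sqrt(n))."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    standard_error = stdev(sample) / math.sqrt(len(sample))
    centre = mean(sample)
    return centre - z * standard_error, centre + z * standard_error

def two_sample_p_value(a, b):
    """Two-sided p-value for H0: equal expected response in both groups."""
    se_diff = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    z = (mean(a) - mean(b)) / se_diff
    return 2 * (1 - NormalDist().cdf(abs(z)))

lower, upper = confidence_interval(group_a)
print(f"95% CI for group A mean: ({lower:.1f}, {upper:.1f})")

alpha = 0.05  # significance level
p_value = two_sample_p_value(group_a, group_b)
reject_h0 = p_value < alpha
print(f"p-value = {p_value:.4f}; reject H0: {reject_h0}")
```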