Introduction to Biostatistics

Statistics are simply a collection of tools that researchers use to help answer research questions.



Measures of Central Tendency

  • A measure of central tendency is a single number used to represent the centre of a set of data.
  • The basic measures are:
    • Mean, Median and Mode
  • For any symmetrical distribution with a single peak (such as the normal distribution), the mean, median, and mode will be identical.
  • Each measure is designed to represent a typical score.
  • The choice of which measure to use depends on:
    • the shape of the distribution (whether normal or skewed), and
    • the variable’s “level of measurement” (whether the data are nominal, ordinal, or interval).


Mean

  • The mean (or average) is found by adding all the numbers and then dividing by how many numbers you added together.
  • It is the most common measure of central tendency.
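  • Formula for calculation of mean: $\bar{x} = \frac{\sum x}{n}$, where $\sum x$ is the sum of all the scores and $n$ is the number of scores.
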
  • Best for making predictions.
  • Applicable under two conditions:
    • scores are measured at the interval level, and
    • the distribution is more or less normal [symmetrical].


Example:

  • Scores: 3, 4, 5, 6, 7
  • 3 + 4 + 5 + 6 + 7 = 25
  • 25 divided by 5 = 5
  • The mean is 5
  • Advantages of mean
    • Mathematical center of a distribution.
    • Good for interval and ratio data.
    • Does not ignore any information.
    • Inferential statistics is based on mathematical properties of the mean.
  • Disadvantages of mean
    • Influenced by extreme scores and skewed distributions.
    • The mean itself may not be a value that occurs in the data.
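
The calculation of the mean can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the first list is the example data, while the second list, with the extreme score 70, is a hypothetical added only to show how one outlier pulls the mean.

```python
from statistics import mean

scores = [3, 4, 5, 6, 7]

# Mean by hand: the sum of the scores divided by how many there are.
manual_mean = sum(scores) / len(scores)

# The standard-library helper gives the same value.
library_mean = mean(scores)

# Hypothetical data: replacing 7 with an extreme score of 70
# drags the mean well above most of the scores.
skewed_scores = [3, 4, 5, 6, 70]
skewed_mean = sum(skewed_scores) / len(skewed_scores)

print(manual_mean, library_mean, skewed_mean)
```

Four of the five skewed scores lie below the skewed mean, which is why the mean is a poor "typical score" for skewed data.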


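Median

  • When the numbers are arranged in numerical order, the middle one is the median.
  • 50% of observations are above the median, 50% are below it.
  • Formula: the median is the value at position $(n + 1)/2$ in the ordered data. When $n$ is even, this position falls between two scores, and the median is the average of those two.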


Example:

  • Scores: 3, 6, 2, 5, 7
  • Arranged in order: 2, 3, 5, 6, 7
  • The number in the middle is 5
  • The median is 5
  • Advantages:
    •  Not influenced by extreme scores or skewed distribution.
    •  Good with ordinal data.
    • Easier to compute than the mean.
    • Considered as the typical observation.
  • Disadvantages:
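    • May not exist in the data (with an even number of scores it is the average of the two middle values).
    • Does not take the actual values of most scores into account, only their order.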
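
As a rough sketch of the procedure (sort, then take the middle position), the median can be computed as below. The function name `median_by_position` and the six-score list are assumptions made for illustration; only the five-score list comes from the example.

```python
from statistics import median


def median_by_position(values):
    """Return the middle value of the sorted data.

    With an odd count this is the value at position (n + 1) / 2;
    with an even count it is the average of the two middle values.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_scores = [3, 6, 2, 5, 7]        # example data, n = 5
even_scores = [3, 6, 2, 5, 7, 8]    # hypothetical data, n = 6
extreme_scores = [3, 6, 2, 5, 70]   # hypothetical data with an outlier

odd_median = median_by_position(odd_scores)
even_median = median_by_position(even_scores)
extreme_median = median_by_position(extreme_scores)

# The standard-library function agrees with the hand-written version.
assert odd_median == median(odd_scores)
assert even_median == median(even_scores)
print(odd_median, even_median, extreme_median)
```

The outlier in the last list does not move the median, which illustrates why it is preferred for skewed distributions.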


Mode

  • The number that occurs most frequently is the mode.
  • We usually find the mode by creating a frequency distribution in which we count how often each value occurs.
  • If we find that every value occurs only once, the distribution has no mode.
  • If we find that two or more values are tied as the most common, the distribution has more than one mode.


Example:

  • Scores: 2, 2, 2, 4, 5, 6, 7, 7, 7, 7, 8
  • The number that occurs most frequently is 7
  • The mode is 7
  • Advantages:
    •  Good with nominal data.
    • A bimodal distribution might confirm clinical observations (e.g. pre- and post-menopausal breast cancer).
    •  Easy to compute and understand.
    •  The score exists in the data set.
  • Disadvantages:
    • Ignores most of the information in a distribution.
    • Small samples may not have a mode.
    •  More than one mode might exist.
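
The frequency-count approach to finding the mode might look like the following sketch. The helper name `modes` and the two short lists are hypothetical; the convention that a distribution has no mode when every value occurs once follows the notes above.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest frequency.

    Returns an empty list when the data are empty or when every value
    occurs only once (the 'no mode' case described in the notes).
    """
    if not values:
        return []
    counts = Counter(values)          # frequency distribution
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(v for v, c in counts.items() if c == highest)


example = [2, 2, 2, 4, 5, 6, 7, 7, 7, 7, 8]  # example data
bimodal = [1, 1, 2, 3, 3]                    # hypothetical: two modes
no_mode = [1, 2, 3]                          # hypothetical: no mode

print(modes(example), modes(bimodal), modes(no_mode))
```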

Appropriate Measures of Central Tendency

  • Nominal variables - Mode
  • Ordinal variables - Median
  • Interval-level variables - Mean
    • if the distribution is normal (the median is better with a skewed distribution)
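
The selection rule above can be written as a small lookup. This is only a sketch: the function name `suggest_measure` and its `skewed` flag are assumptions, and treating ratio data like interval data follows the earlier remark that the mean is good for interval and ratio data.

```python
def suggest_measure(level, skewed=False):
    """Suggest a measure of central tendency for a level of measurement."""
    level = level.lower()
    if level == "nominal":
        return "mode"
    if level == "ordinal":
        return "median"
    if level in ("interval", "ratio"):
        # The mean suits roughly normal data; the median copes
        # better with a skewed distribution.
        return "median" if skewed else "mean"
    raise ValueError(f"unknown level of measurement: {level!r}")


print(suggest_measure("nominal"))
print(suggest_measure("interval"))
print(suggest_measure("interval", skewed=True))
```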


Measures of Dispersion

“If there is no variability within populations there would be no need for statistics.”

  • Three indices are used to measure variation or dispersion among scores:
    • range
    • variance, and
    • standard deviation (Cozby, 2000).
  • These indices answer the question: How spread out is the distribution?
  • Dispersion/Deviation/Spread tells us a lot about how a variable is distributed.


Range

  • Range is the simplest method of examining variation among scores.
  • It refers to the difference between the highest and lowest values observed.
  • For continuous variables, the range is the arithmetic difference between the highest and lowest observations in the sample. In the case of counts or other whole-number measurements, 1 should be added to the difference because the range is inclusive of the extreme observations.
  • Another statistic, known as the interquartile range, describes the interval of scores bounded by the 25th and 75th percentile ranks; it covers the middle 50 percent of the distribution.
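
A short sketch of the range and interquartile range, using the five scores from the median example. Quartiles can be computed by several conventions that give slightly different answers on small samples; the `method="inclusive"` option of Python's `statistics.quantiles` is one assumed choice here, not necessarily the one the notes intend.

```python
from statistics import quantiles

scores = [3, 6, 2, 5, 7]

# Range: highest observation minus lowest observation.
value_range = max(scores) - min(scores)

# Inclusive range for whole-number data: add 1 so that both
# extreme observations are counted.
inclusive_range = value_range + 1

# Quartiles (25th, 50th, 75th percentiles) under the assumed
# "inclusive" convention, then the interquartile range.
q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1

print(value_range, inclusive_range, q1, q2, q3, iqr)
```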

Percentiles (or quartiles)

  • The first quartile is the 25th percentile (denoted Q1),
  • the median is the 50th percentile (denoted Median), and
  • the third quartile is the 75th percentile (denoted Q3).
  • “A percentile is a value at or below which a given percentage or fraction of the variable values lie.
  • The p-th percentile is the value that has p% of the measurements below it and (100-p)% above it.
  • Thus, the 20th percentile is the value such that one fifth of the data lie below it. It is higher than 20% of the data values and lower than 80% of the data values.”
    • E.g. if you are in the 80th percentile on a section of the GMAT, you scored better on that section than 80% of the students taking the GMAT.
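
The two wordings used above ("at or below" and "below") can give different numbers for the same score, as this sketch shows. The function names and the ten test scores are hypothetical and are not real GMAT data.

```python
def percent_below(values, score):
    """Percentage of values strictly below the given score."""
    return 100 * sum(v < score for v in values) / len(values)


def percent_at_or_below(values, score):
    """Percentage of values at or below the given score."""
    return 100 * sum(v <= score for v in values) / len(values)


# Hypothetical section scores for ten test takers.
section_scores = [500, 520, 540, 560, 580, 600, 620, 640, 660, 680]

below = percent_below(section_scores, 660)
at_or_below = percent_at_or_below(section_scores, 660)

print(below, at_or_below)
```

Under the "below" definition, the test taker with 660 scored better than 80% of the group; under the "at or below" definition the figure is 90%, so it helps to state which convention is in use.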

Standard deviation

  • The standard deviation is the most widely applied measure of variability.
  • It shows how much variation there is from the "average" (mean).
  • Large standard deviations suggest that scores are probably widely scattered.
  • Small standard deviations suggest that there is very little difference among scores.
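  • Formula for S.D. (population form, as used in the example that follows): $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$, where $\mu$ is the population mean and $N$ is the number of values.
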
Example: (Adapted from Wikipedia)

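  • Consider a population consisting of the following values: $2, 4, 4, 4, 5, 5, 7, 9$

  • There are eight data points in total, with a mean (or average) value of 5: $\frac{2 + 4 + 4 + 4 + 5 + 5 + 7 + 9}{8} = \frac{40}{8} = 5$

  • To calculate the population standard deviation, first compute the difference of each data point from the mean, and square the result: $(2-5)^2 = 9$, $(4-5)^2 = 1$, $(4-5)^2 = 1$, $(4-5)^2 = 1$, $(5-5)^2 = 0$, $(5-5)^2 = 0$, $(7-5)^2 = 4$, $(9-5)^2 = 16$

  • Next divide the sum of these values by the number of values and take the square root to give the standard deviation: $\sqrt{\frac{9 + 1 + 1 + 1 + 0 + 0 + 4 + 16}{8}} = \sqrt{\frac{32}{8}} = \sqrt{4} = 2$

  • Therefore, the above has a population standard deviation of 2.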


  • The square of the standard deviation is the variance.
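
The steps of the standard deviation example, and its link to the variance, can be traced in Python. This sketch assumes the eight values of the Wikipedia example are 2, 4, 4, 4, 5, 5, 7, 9, which is consistent with the stated mean of 5 and population standard deviation of 2.

```python
from math import isclose, sqrt
from statistics import pstdev, pvariance

# Assumed population values (eight data points, mean 5).
values = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(values)
mu = sum(values) / n                             # population mean

# Step 1: squared difference of each data point from the mean.
squared_devs = [(x - mu) ** 2 for x in values]

# Step 2: divide the sum by the number of values (the variance),
# then take the square root (the standard deviation).
variance = sum(squared_devs) / n
sd = sqrt(variance)

# The standard-library population functions agree.
assert isclose(sd, pstdev(values))
assert isclose(variance, pvariance(values))

print(mu, variance, sd)
```

Note that `pstdev` and `pvariance` divide by the number of values, as in the example; the sample versions `stdev` and `variance` divide by one less and would give slightly larger results.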





Copyright 2013@Current