Basic Statistical Concepts for Nurses
Statistics are simply tools that researchers employ to help answer
research questions.
Saleem. T.K,
tkpalath@rediffmail.com
________________________________________________________________
Statistical Power
Sampling
Descriptive Statistics
Correlation
Inferential Statistics
Parametric and Non-parametric Tests
Appropriate Statistical Tests
Common Statistical Tests
Choosing the Appropriate Test
Computer-Aided Analysis
References & Bibliography
As the context of health
care changes with developments in pharmaceutical services and
technological advances, nurses and other health care professionals need to
be prepared to respond in knowledgeable and practical ways. Health
information is very often expressed in statistical terms to make it
concise and understandable. Statistics play a vitally important role in
research. They help to answer important research questions, and it is the
answers to such questions that further our understanding of the field and
provide a basis for academic study. The researcher is required to
understand which tools are suitable for a particular research study.
It is essential for healthcare professionals to understand the basic
concepts of statistics, as this enables them to read and evaluate
reports and other literature and to undertake independent research
investigations by selecting the most appropriate statistical test for
their problems. The purpose of analyzing data in a study is to describe
the data in meaningful terms.
Depending on the kinds of
variables identified (nominal, ordinal, interval, and ratio) and the
design of the particular study, a number of statistical techniques are
available to analyze data. There are two approaches to the statistical
analysis of data: the descriptive approach and the inferential approach.
Descriptive statistics convert data into a picture of the information that
is readily understandable. The inferential approach helps to decide
whether the outcome of the study is a result of factors planned within
the design of the study or is determined by chance. The two approaches are
often used sequentially: first, data are described with descriptive
statistics, and then additional statistical manipulations are done to make
inferences about the likelihood that the outcome was due to
chance through inferential statistics. When the descriptive approach is
used, terms like mean, median, mode, variation, and standard deviation are
used to communicate the analysis of the data. When the inferential approach
is used, probability values (P) are used to communicate the
significance or lack of significance of the results (Streiner & Norman,
1996).
Measurement is defined as
“assignment of numeral according to rules” (Tyler 1963:7). Regardless of
the variables under study, in order to make sense of the data collected,
each variable must be measured in such a way that its magnitude or quantity
is clearly identified. The specific strategy for a particular study
depends upon the particular research problem, the sample under study, the
availability of instruments, and the general feasibility of the project
(Brockopp & Hastings-Tolsma, 2003). A variety of measurement methods are
available for use in nursing research. Four measurement scales are used:
nominal, ordinal, interval and ratio.
The nominal level of measurement
The nominal level of
measurement is the most primitive or lowest level of classifying
information. Nominal variables consist of named categories of people,
events, and other phenomena; the categories are exhaustive and mutually
exclusive. These categories are discrete and noncontinuous. For
nominal measurement, the admissible statistical operations are counting of
frequency, percentage, proportion, mode, and coefficient of contingency.
Addition, subtraction
The ordinal level of measurement
The ordinal level of
measurement is second in terms of its refinement as a means of classifying
information. Ordinal implies that the values of variables can be
rank-ordered from highest to lowest.
Interval Level of Measurement
Interval level of
measurement is quantitative in nature. The individual units are
equidistant from one point to the other. Interval data do not have
an absolute zero; temperature measured in Celsius or Fahrenheit is an
example. Interval level of measurement is the third
level of measurement in relation to the complexity of statistical
techniques that can be used to analyze data. Variables within this level of
measurement are assessed incrementally, and the increments are equal.
Ratio Level of Measurement
Ratio level of measurement
is characterized by variables that are assessed incrementally with equal
distances between the increments and a scale that has an absolute zero.
Ratio variables exhibit the characteristics of ordinal and interval
measurement, and values can also be compared by describing one as two or
three times another, or as one-third, one-quarter, and so on. Variables
like time, length and weight are measured on ratio scales and can also be
expressed using nominal or ordinal scales.
The mathematical properties
of interval and ratio scales are very similar, so the statistical
procedures are common to both scales.
Errors of measurement
When a variable is measured
there is the potential for errors to occur. Some of the sources of error
in measurement are instrument clarity, variations in administration,
situational variations, response set bias, transitory personal factors,
response sampling, and instrument format.
Population is defined as
the entire collection of a set of objects, people, or events in a
particular context. It is the entire group of persons
or objects that is of interest to the investigator. In statistics,
population means any collection of individual items or units that is the
subject of investigation, that is, the collection of all items
upon which statements will be based. This might include all patients with
schizophrenia in a particular hospital, or all depressed individuals in a
certain community.
Characteristics of a
population that differ from individual to individual are called
variables. A variable is a concept (construct) that has been so
specifically defined that precise observations and therefore measurement
can be accomplished. Length, age, weight, temperature, and pulse rate are a
few examples of variables.
The sample is a subset of the
population selected by the investigator to participate in a research study;
that is, a subset of observations selected from the
population. It would be unusual for an investigator to describe only
patients with schizophrenia in a particular hospital, and it is unlikely
that an investigator will measure every depressed person in a community.
As it is rarely practicable to obtain measures of a particular variable
from all the units in a population, the investigator has to collect
information from a smaller group or sub-set that represents the group as a
whole. This sub-set is called a sample. Each unit in the sample
provides a record, such as a measurement, which is called an observation.
The sample should represent the population on those critical
characteristics the investigator plans to study.
Dependent and independent variables
An independent variable is
the presumed cause of the dependent variable, which is the presumed effect.
The independent variable is one which explains or accounts for variations
in the dependent variable: a change in the independent variable
results in a change in the other variable. In experiments, the independent
variable is the variable manipulated by the experimenter. A dependent
variable is one which changes in relationship to changes in another
variable. A variable which is dependent in one study may be independent in
another. An intervening variable is one that comes between the independent
and dependent variables.
A hypothesis is a statement or
declaration of the expected outcome of a research study. It is based on
logical rationale and can be tested empirically. Hypotheses
are formulated in experimental research; in some non-experimental
correlational studies, hypotheses may also be developed. Normally, there
are four elements in a hypothesis: (1) dependent and independent
variables, (2) some type of relationship between the independent and
dependent variables, (3) the direction of the change, and (4) the
subjects, i.e. the population being studied. A hypothesis is defined as
“A tentative assumption made in order to draw out and test its logical or
empirical consequences” (Webster 1968).
Standards in formulating a hypothesis (Ahuja, R. 2001):
It should be empirically testable, whether it is right or wrong.
It should be specific and precise.
The statements in the hypothesis should not be contradictory.
It should specify the variables between which the relationship is to be
established.
It should describe one issue only.
Characteristics of a Hypothesis (Treece & Treece, 1989):
It is testable.
It is logical.
It is directly related to the research problem.
It is factually or theoretically based.
It states a relationship between variables.
It is stated in such a form that it can be accepted or rejected.
A directional hypothesis
predicts an outcome in a particular direction, and a nondirectional
hypothesis simply states that there will be a difference between the
groups. There can be two hypotheses, the research hypothesis and the null
hypothesis. The null hypothesis is formed for the statistical purpose of
negating it. If the research hypothesis states that there is a positive
correlation between smoking and cancer, the null hypothesis states that
there is no relation between smoking and cancer. It is easier to negate a
statement than to establish it.
The null hypothesis is a
statistical statement that there is no difference between the groups under
study. A statistical test is used to determine how probable the observed
results would be if the null hypothesis were true; inferential statistics
are used in an effort to reject the null hypothesis, thereby showing that a
difference does exist. The null hypothesis is a technical necessity when
using inferential statistics, with statistical significance used as the
criterion.
When the null hypothesis is
rejected, the observed differences between groups are deemed improbable by
chance alone. For example, if drug A is compared to a placebo for its
effects on depression and the null hypothesis is rejected, the
investigator concludes that the observed differences most likely are not
explainable simply by sampling error. The key word in these statements is
probable. When offering this conclusion, the investigator has the odds on
his or her side. However, what are the chances of the statement being
incorrect? In statistical inference there is no way to say with certainty
that rejection or retention of the null hypothesis was correct. There are
two types of potential errors. A type I error occurs when the null
hypothesis is rejected when indeed it should have been retained; a type II
error occurs if the null hypothesis is retained when indeed it should have
been rejected.
Type I Error
Type I errors occur when
the null hypothesis is rejected but should have been retained, such as
when a researcher decides that two means are different when they are not.
He or she might conclude that the treatment works or that the groups are
not sampled from the same population, whereas in reality the observed
differences are attributable only to sampling error. In a conservative
scientific setting, type I errors should be made rarely. There is a great
disadvantage to advocating treatments that really do not work. The
probability of a type I error is denoted with the Greek letter alpha (α).
Because of the desire to avoid type I errors, statistical models have been
created so that the investigator has control over the probability of a
type I error. At the .05 significance or alpha level, a type I error is
expected to occur in 5 percent of all cases in which the null hypothesis
is true. At the .01 level, it may occur in 1 percent of such cases. Thus,
at the .05 α level, one type I error is expected to be made
in every 20 independent tests. At the .01 α level, one type I error is
expected to be made in every 100 independent tests.
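The arithmetic behind these figures can be sketched in a few lines of
Python. The function names are illustrative only, and the sketch assumes
independent tests in which the null hypothesis is actually true.

```python
def expected_type_i_errors(alpha, n_tests):
    """Expected number of false rejections across n_tests independent
    tests, assuming the null hypothesis is true in every one of them."""
    return alpha * n_tests


def prob_at_least_one_type_i_error(alpha, n_tests):
    """Chance of making one or more type I errors across n_tests
    independent tests, under the same assumption."""
    return 1 - (1 - alpha) ** n_tests


print(expected_type_i_errors(0.05, 20))    # about 1 error per 20 tests
print(expected_type_i_errors(0.01, 100))   # about 1 error per 100 tests
print(round(prob_at_least_one_type_i_error(0.05, 20), 2))  # 0.64
```

The last line shows why running many tests at the .05 level makes at least
one false rejection quite likely.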
Type II Error
The motivation to avoid a
type I error might increase the probability of making a second type of
error. In this case the null hypothesis is retained when it actually was
wrong. For example, an investigator may reach the conclusion that a
treatment does not work when actually it is efficacious. Here the decision
is not to reject the null hypothesis when in actuality the null hypothesis
was false. The probability of a type II error is symbolized by the Greek
letter beta (β).
There are several maneuvers
that will increase control over the probability of different types of
errors and correct decisions. One type of correct decision is
rejecting the null hypothesis when it is in fact false.
Power is defined as the probability of rejecting the null
hypothesis when it should have been rejected, that is, 1 minus the
probability of a type II error. Ultimately, the statistical
evaluation will be more meaningful if it has high power. It is
particularly important to have high statistical power when the null
hypothesis is retained. Retaining the null hypothesis with high power
gives the investigator more confidence in stating that differences between
groups were non-significant. One factor that affects the power is the
sample size. As the sample size increases, power increases. The larger the
sample, the greater the probability that a correct decision will be made in
rejecting or retaining the null hypothesis. Another factor that influences
power is the significance level. As the alpha level is raised (made less
stringent), the power
increases. For instance, if the .05 level is selected rather than the .01
level, there will be a greater chance of rejecting the null hypothesis.
However, there will also be a higher probability of a type I error. By
reducing the chances of a type I error, the chances of correctly
identifying the real difference (power) are also reduced. Thus, the safest
manipulation to affect power without affecting the probability of a type I
error is to increase the sample size. The third factor affecting power is
effect size. The larger the true differences between two groups, the
greater the power. Experiments attempting to detect a very strong effect,
such as the impact of a very potent treatment, might have substantial
power even with small sample sizes. The detection of subtle effects may
require very large samples in order to achieve reasonable statistical
power. It is worth noting that not all statistical tests have equal power.
The probability of correctly rejecting the null hypothesis is higher with
some statistical methods than with others. For example, nonparametric
statistics are typically less powerful than parametric statistics when the
assumptions of the parametric test are met.
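The three influences on power described above (sample size, significance
level, and effect size) can be illustrated with a rough calculation. The
sketch below assumes a two-sided one-sample z-test with a known standard
deviation, which is a simplification chosen for illustration; the function
name and the example values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect_size, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test.

    effect_size is the true difference from the null value, expressed
    in standard deviation units; n is the sample size.
    """
    z = NormalDist()                       # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)      # critical value for this alpha
    shift = effect_size * sqrt(n)          # how far the true mean sits from 0
    # Probability that the test statistic lands in either rejection region
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


baseline = z_test_power(effect_size=0.5, n=30, alpha=0.05)
larger_sample = z_test_power(effect_size=0.5, n=60, alpha=0.05)
stricter_alpha = z_test_power(effect_size=0.5, n=30, alpha=0.01)
larger_effect = z_test_power(effect_size=0.8, n=30, alpha=0.05)

print(f"baseline:        {baseline:.2f}")
print(f"larger sample:   {larger_sample:.2f}")
print(f"stricter alpha:  {stricter_alpha:.2f}")
print(f"larger effect:   {larger_effect:.2f}")
```

Under these assumptions, doubling the sample or enlarging the effect raises
power, while moving from the .05 to the .01 level lowers it.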
The process of selecting a
fraction of the sampling unit (i.e. a collection with specified
dimensions) of the target population for inclusion in the study is called
sampling. Sampling can be probability sampling or non-probability
sampling.
Probability Sampling or Random sampling
Probability sampling, also
called random sampling, is a selection process that gives each member of
the population a known chance of being selected; in simple random
sampling, every member has the same chance. Probability sampling
is the process of selecting samples based on probability theory,
which describes the likelihood that events occur by chance.
Random sampling is the best method for ensuring that a sample is
representative of the larger population. Random sampling can be simple
random sampling, stratified random sampling, or cluster sampling.
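As a minimal sketch, simple and stratified random sampling might be drawn
with Python's standard random module as follows. The patient identifiers,
ward names, and 10 percent sampling fraction are hypothetical, and the
seed is fixed only so that the draw can be repeated.

```python
import random


def simple_random_sample(population, k, seed=None):
    """Draw k units without replacement; every unit has the same chance."""
    rng = random.Random(seed)
    return rng.sample(population, k)


def stratified_random_sample(strata, fraction, seed=None):
    """Draw the same fraction at random from each stratum.

    strata maps a stratum name to the list of units in that stratum.
    """
    rng = random.Random(seed)
    sample = {}
    for name, units in strata.items():
        k = max(1, round(len(units) * fraction))  # at least one unit per stratum
        sample[name] = rng.sample(units, k)
    return sample


# Hypothetical sampling frame of 200 patients
patients = [f"patient_{i:03d}" for i in range(1, 201)]
chosen = simple_random_sample(patients, 20, seed=1)

# Hypothetical strata: 120 medical and 80 surgical patients
wards = {"medical": patients[:120], "surgical": patients[120:]}
by_ward = stratified_random_sample(wards, 0.10, seed=1)

print(len(chosen), len(by_ward["medical"]), len(by_ward["surgical"]))  # 20 12 8
```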
Nonprobability sampling
Nonprobability sampling is
a selection process in which the probability that any one individual or
subject is selected is not equal to the probability that another individual
or subject may be chosen. The probability of inclusion and the degree to
which the sample represents the population are unknown. The major problem
with nonprobability sampling is that sampling bias can occur.
Nonprobability sampling can be convenience sampling, purposive sampling or
quota sampling.
Sampling Error (Standard Error)
Sampling error refers to
the discrepancies that inevitably occur when a small group (sample) is
selected to represent the characteristics of a larger group (population).
It is defined as the difference between a parameter and an estimate of that
parameter which is derived from a sample (Lindquist, 1968:8). The means
and standard deviations calculated from the data collected on a given
sample will not be exactly the same as those derived from data
collected from the entire population. It is the discrepancy between the
characteristics of the sample and the population that constitutes sampling
error.
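A small simulation can make sampling error concrete. The population below
is simulated rather than real data (10,000 values drawn from a normal
distribution with an assumed mean of 120 and standard deviation of 15),
and the seed is fixed so that the sketch is repeatable.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed, for repeatability only

# Simulated population, e.g. systolic blood pressure readings (not real data)
population = [rng.gauss(120, 15) for _ in range(10_000)]

# One random sample of 50 units drawn from that population
sample = rng.sample(population, 50)

population_mean = mean(population)   # the parameter
sample_mean = mean(sample)           # the estimate
sampling_error = sample_mean - population_mean

print(f"population mean = {population_mean:.2f}")
print(f"sample mean     = {sample_mean:.2f}")
print(f"sampling error  = {sampling_error:+.2f}")
```

Drawing a different sample (for example by changing the seed) gives a
different sampling error, which is the point of the definition above.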
Descriptive statistics are techniques which help the investigator to
organize, summarize and describe measures of a sample. Here no predictions
or inferences are made regarding population parameters. Descriptive
statistics are used to summarize observations and to place these
observations within context. The most common descriptive statistics
include measures of central tendency and measures of variability.
Central tendency or “measures of the middle”
There are three commonly
used measures of central tendency: the mean, the median, and the mode. They
are calculated to identify the average, the most typical and the most
common values, respectively, among the data collected. The mean is the arithmetic
average, the median is the point representing the 50th percentile in a
distribution, and the mode is the most common score. Sometimes each of
these measures is the same; on other occasions, the mean, the median, and
the mode can be different. The mean, median, and mode are the same when
the distribution of scores is normal. Under most circumstances the mean,
median, and mode will not be exactly the same. The mode is most likely to
misrepresent the underlying distribution and is rarely used in statistical
analysis. The mean and the median are the most commonly reported measures
of central tendency. The major consideration in choosing between them is
how much weight should be given to extreme scores. The mean takes into
account each score in the distribution; the median finds only the halfway
point. As mean best represents all subjects and because of desirable
mathematical properties, the mean is typically favored in statistical
analysis. Despite the advantages of the mean, there are also some
advantages to the median. In particular, the median disregards outlier
cases, whereas the mean moves further in the direction of the outliers.
Thus, the median is often used when the investigator does not want scores
in the extreme of the distribution to have a strong impact. The median is
also valuable for summarizing data for a measure that might be insensitive
toward the higher ranges of the scale. For instance, a very easy test may
have a ceiling effect: it is too easy to measure
the true ability of the best students. Thus, if some scores stack up at
the extreme, the median may be more accurate than the mean. If the high
scores had not been bounded by the highest obtainable score, the mean might
actually have been higher. The mean, median, and mode coincide only in a
symmetric distribution such as the normal; not all distributions of scores
have a
normal or bell-shaped appearance. The highest point in a distribution of
scores is called the modal peak. A distribution with the modal peak off to
one side or the other is described as skewed. The word skew literally
means "slanted."
The direction of skew is
determined by the location of the tail or flat area of the distribution.
Positive skew occurs when the tail goes off to the right of the
distribution. Negative skew occurs when the tail or low point is on the
left side of the distribution. The mode is the most frequent score in the
distribution. In a skewed distribution, the mode remains at the peak
whereas the mean and the median shift away from the mode in the direction
of the skewness. The mean moves furthest in the direction of the skewness,
and the median typically falls between the mean and the mode. Mode is the
best measure of central tendency when nominal variables are used. Median
is the best measure of central tendency when ordinal variables are used.
Mean is the best measure of central tendency when interval or ratio scales
are used.
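The behaviour of the three measures in a positively skewed distribution
can be checked with Python's statistics module. The lengths of stay below
are hypothetical values chosen so that one extreme score pulls the mean
upward.

```python
from statistics import mean, median, mode

# Hypothetical lengths of hospital stay in days; 30 is an outlier
length_of_stay = [2, 3, 3, 4, 5, 6, 30]

print(mode(length_of_stay))             # 3, the most common value
print(median(length_of_stay))           # 4, the middle value
print(round(mean(length_of_stay), 2))   # 7.57, pulled toward the outlier

# Without the outlier the mean falls back close to the median
trimmed = length_of_stay[:-1]
print(round(mean(trimmed), 2), median(trimmed))   # 3.83 3.5
```

As described above, the mode stays at the peak, the mean moves furthest in
the direction of the skew, and the median falls between them.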
Measures of Variability
If there were no variability
within populations there would be no need for statistics: a single item or
sampling unit would tell us all that is needed to know about the
population as a whole. Three indices are used to measure variation or
dispersion among scores: (1) range, (2) variance, and (3) standard
deviation (Cozby, 2000). The range describes the difference between the
largest and smallest observations made; the variance and standard
deviation are based on the average difference or deviation of observations
from the mean.
Measures of central
tendency, such as the mean and median, are used to summarize information.
They are important because they provide information about the average
score in the distribution. Knowing the average score, however, does not
provide all the information required to describe a group of scores. In
addition, measures of variability are required. The simplest method of
describing variability is the range, which is simply the difference
between the highest score and lowest score. Another statistic, known as
the interquartile range, describes the interval of scores bounded by the
25th and 75th percentile ranks; the interquartile range is bounded by the
range of scores that represent the middle 50 percent of the distribution.
In contrast to ranges, which are used infrequently in statistical
analysis, the variance and standard deviation are used commonly. Since the
mean is the average score in a distribution, the sum of the deviations
around the mean will always equal zero, so it cannot serve as an index of
spread. Yet, in order to understand the characteristics of a distribution
of scores, some estimate of deviation around the mean is important.
The squared deviations around the mean can yield a
meaningful index. The variance is the sum of the squared deviations around
the mean divided by the number of cases.
Range
Range is the simplest
method of examining variation among scores and refers to the difference
between the highest and lowest values produced. It shows how wide the
distribution is over which the measurements are spread. For continuous
variables, the range is the arithmetic difference between the highest and
lowest observations in the sample. In the case of counts or discrete
measurements, 1 should be added to the difference because the range is
inclusive of the extreme observations. The range takes account of only the
most extreme observations. It is therefore limited in its usefulness,
because it gives no information about how observations are distributed.
The interquartile range is the interval between the lowest quartile and the
highest quartile, that is, the middle 50% of the scores.
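A short sketch of the range and interquartile range follows, using
hypothetical scores. Quartiles can be computed by several conventions that
give slightly different answers; the "inclusive" method of Python's
statistics.quantiles is used here as one reasonable choice, not as the
only correct one.

```python
from statistics import quantiles

scores = [4, 7, 8, 10, 12, 15, 21]   # hypothetical test scores

value_range = max(scores) - min(scores)   # 21 - 4 = 17
inclusive_range = value_range + 1         # 18, if the scores are counts

# Cut points that divide the ordered scores into four equal parts
q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
interquartile_range = q3 - q1             # spread of the middle 50 percent

print(value_range, inclusive_range)
print(q1, q2, q3, interquartile_range)
```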
Variance
The variance is a very
useful statistic and is commonly employed in data analysis. However, its
calculation requires finding the squared deviations around the mean rather
than the simple or absolute deviations around the mean. Thus, when the
variance is calculated, the result is expressed in squared units of the
original measure. Taking the square root of the variance puts
the result back into the original metric. The square root of the
variance is known as the standard deviation. The standard deviation is an
approximation of the average deviation around the mean. Although the
standard deviation is not technically equal to the average deviation, it
gives an approximation of how much the average score deviates from the
mean. One method for calculating variance is to first calculate the
deviation scores, that is, each score minus the mean. The sum of the set of
deviation scores is equal to zero, so the deviation scores are squared
before they are summed and averaged. Variance is the square of the standard
deviation; conversely, the standard deviation is the square root of the
variance.
Standard Deviation
The standard deviation is
the most widely applied measure of variability. When observations have
been obtained from every item or sampling unit in a population, the symbol
for the standard deviation is σ (lower case sigma). This is a parameter of
the population. When it is calculated from a sample it is symbolized s.
The standard deviation of a distribution of scores is the square root of
the variance. Large standard deviations suggest that scores do not cluster
around the mean: they are probably widely scattered. Similarly, small
standard deviations suggest that there is very little difference among
scores.
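The calculation described above can be traced step by step on a small set
of hypothetical scores. The sketch also shows the sample versions, which
divide by n - 1 rather than n; that divisor is a standard convention for
estimating a population value from a sample and is added here as
background rather than taken from the text.

```python
from math import sqrt

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores
n = len(scores)

mean = sum(scores) / n                            # 5.0
deviations = [x - mean for x in scores]           # each score minus the mean
sum_of_deviations = sum(deviations)               # always 0
sum_of_squares = sum(d * d for d in deviations)   # 32.0

population_variance = sum_of_squares / n          # 4.0
population_sd = sqrt(population_variance)         # 2.0 (sigma)

sample_variance = sum_of_squares / (n - 1)        # about 4.57
sample_sd = sqrt(sample_variance)                 # about 2.14 (s)

print(mean, sum_of_deviations, sum_of_squares)
print(population_variance, population_sd)
print(round(sample_variance, 2), round(sample_sd, 2))
```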
Normal Distribution
The normal distribution is
a mathematical construct which suggests that naturally occurring
observations follow a given pattern. The pattern is the normal curve,
which places most observations at the mean and lesser number of
observations at either extreme. This curve or bell-shaped distribution
reflects the tendency of the observations concerning a specific variable
to cluster in a particular manner.
The normal curve can be described for any set of data given the mean and
standard deviation of the data and the assumption that the characteristic
under study is normally distributed within the population. In a normal
distribution, about 68% of observations fall within one
standard deviation of the mean, about 95% fall within two standard
deviations of the mean, and about 99.7% fall within three standard
deviations of the mean. Theoretically, the range of the curve is unlimited.
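The share of observations lying within one, two, and three standard
deviations of the mean can be computed from the standard normal
distribution. The sketch uses NormalDist from Python's statistics module;
the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()   # standard normal: mean 0, standard deviation 1

shares = {}
for k in (1, 2, 3):
    # Area under the curve between -k and +k standard deviations
    shares[k] = z.cdf(k) - z.cdf(-k)
    print(f"within {k} SD of the mean: {shares[k]:.4f}")

# within 1 SD of the mean: 0.6827
# within 2 SD of the mean: 0.9545
# within 3 SD of the mean: 0.9973
```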
Standard Scores
One of the problems with
means and standard deviations is that their meanings are not independent
of context. For example, a mean of 45.6 means little unless the scale of
the scores is known. The Z-score is a transformation into standardized
units that provides a context for the interpretation of scores. The Z-score
is the difference between the score and the mean, divided by the standard
deviation. To make comparisons between groups, standard scores rather than
raw scores can be used. Standard scores enable the investigator to
examine the position of a given score by measuring its deviation from
the mean of all scores in standard deviation units.
Most often, the units on
the x axis of the normal distribution are in Z-units. Any variable
transformed into Z-units will have a mean of 0 and a standard deviation of
1. Translation of Z-scores into percentile ranks is accomplished using a
table for the standard normal distribution. Certain Z-scores are of
particular interest in statistics and psychological testing. The Z-score
1.96 represents the 97.5th percentile in a distribution whereas -1.96
represents the 2.5th percentile. A Z-score of less than -1.96 or greater
than +1.96 falls outside of a 95 percent interval bounding the mean of the
Z-distribution. Some statistical definitions of abnormality view these
defined deviations as cutoff points. Thus, a person who is more than 1.96
standard deviations from the mean on some attribute might be regarded as
abnormal. In
addition to the interval bounded by 95 percent of the cases, the interval
including 99 percent of all cases is also commonly used in statistics.
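The Z-score formula and its translation into a percentile rank can be
sketched as follows. Instead of a printed table, the sketch uses the
standard normal distribution from Python's statistics module; the test
score, mean, and standard deviation are hypothetical.

```python
from statistics import NormalDist


def z_score(score, mean, sd):
    """Difference between the score and the mean, in standard deviation units."""
    return (score - mean) / sd


def percentile_rank(z):
    """Percentage of a normal distribution falling below the given Z-score."""
    return NormalDist().cdf(z) * 100


# Hypothetical score of 130 on a scale with mean 100 and SD 15
z = z_score(130, mean=100, sd=15)
print(z)                                   # 2.0

print(round(percentile_rank(1.96), 1))     # 97.5
print(round(percentile_rank(-1.96), 1))    # 2.5
```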
Confidence Intervals
In most statistical
inference problems the sample mean is used to estimate the population
mean. Each sample mean is considered to be an unbiased estimate of the
population mean. Although the sample mean is unlikely to be exactly the
same as the population mean, repeated random samples will form a sampling
distribution of sample means. The mean of the sampling distribution is an
unbiased estimate of the population mean. However, taking repeated random
samples from the population is also difficult and expensive. Instead, it
is necessary to estimate the population mean based on a single sample;
this is done by creating an interval around the sample mean.
The first step in creating
this interval is finding the standard error of the mean. The standard
error of the mean is the standard deviation divided by the square root of
the sample size. Statistical inference is used to estimate the probability
that the population mean will fall within some defined interval. Because
sample means are distributed normally around the population mean, the
sample mean is most probably near the population value. However, it is
possible that the sample mean is an overestimate or an underestimate of
the population mean. Using information about the standard error of the
mean, it is possible to put a single observation of a mean into context.
The ranges that are likely
to capture the population mean are called confidence intervals. Confidence
intervals are bounded by confidence limits. The confidence interval is
defined as a range of values with a specified probability of including the
population mean. A confidence interval is typically associated with a
certain probability level. For example, the 95 percent confidence interval
has a 95 percent chance of including the population mean. A 99 percent
confidence interval is expected to capture the true mean in 99 of each 100
cases. The confidence limits are defined as the values for points that
bound the confidence interval. Creating a confidence interval requires a
mean, a standard error of the mean, and the Z-value associated with the
interval.
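Putting the three ingredients together (mean, standard error of the mean,
and Z-value), a confidence interval might be computed as sketched below.
The blood pressure readings are hypothetical. The sketch uses the Z-value
as the text describes; with a sample this small a t-value would normally
be preferred, so the result should be read as an illustration only.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def confidence_interval(data, level=0.95):
    """Z-based confidence interval for the mean of data."""
    n = len(data)
    m = mean(data)
    sem = stdev(data) / sqrt(n)                     # standard error of the mean
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 1.96 for 95 percent
    return m - z * sem, m + z * sem                 # lower and upper limits


# Hypothetical systolic blood pressure readings (mmHg)
readings = [118, 122, 125, 130, 121, 127, 119, 124, 126, 128]

low95, high95 = confidence_interval(readings, 0.95)
low99, high99 = confidence_interval(readings, 0.99)

print(f"mean = {mean(readings):.1f}")
print(f"95% CI: {low95:.1f} to {high95:.1f}")
print(f"99% CI: {low99:.1f} to {high99:.1f}")
```

The 99 percent interval is wider than the 95 percent interval, because a
higher chance of capturing the population mean requires a broader range.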
Correlation in
research refers to the tendency of variation in one variable to be
related to variation in another variable. Correlational research
examines relationships among variables of interest without any active
intervention on the part of the investigator. Many variables in nursing
and health care are related (for example, the weight of a baby and its
age; the age of a person and blood pressure). Relationships or associations
between variables are referred to as correlations.
Correlations are calculated
for variables measured on ordinal or interval scales. A correlation
coefficient describes this relationship. The correlation coefficient is a
number ranging from -1 to +1 that denotes the degree and kind (i.e.
positive or negative) of relationship that exists between two variables.
The coefficient of correlation gives two indications: first, the magnitude
or size of the relationship, and second, the direction of the correlation,
whether positive or negative.
The fact that variables are
associated or correlated does not necessarily mean that one causes the
other. Height and weight may be correlated in a population, but one
variable cannot be said to be the cause of the other: both are undoubtedly
related to some underlying genetic factor. In statistics, correlation
refers to a quantitative relationship between two variables measured on
ordinal or interval scale.
The correlation coefficient can
be calculated by both parametric and non-parametric methods. A parametric
coefficient is the Product Moment Correlation Coefficient. It is used only
for interval scale observations and is subject to more stringent
conditions than non-parametric alternatives. Of the various non-parametric
coefficients, the Spearman Rank Correlation Coefficient is among the most
widely used. It is appropriate for observations based on ordinal (rank)
scales as well as interval scales.
Inferential statistics are mathematical procedures which help the
investigator to predict or infer population parameters from sample
measures. This is done by a process of inductive reasoning based on the
mathematical theory of probability (Fowler, J., Jarvis, P. & Chevannes M.
2002).
Probability
The idea of probability is
basic to inferential statistics. The goal of all inferential statistical
techniques is the same: to determine as precisely as possible the
probability of an occurrence. Probability can be regarded as quantifying
the chance that a stated outcome of an event will take place. In hypothesis
testing, it refers to the likelihood that the differences between groups
under study are the result of chance. Probability theory expresses the
chance of any given event as a fraction of all possible outcomes. The
probabilities of any set of mutually exclusive outcomes that together cover
all possibilities add up to 1. When a coin is tossed there are two
outcomes, head or tail, each with a chance of 0.5; added together, these
give 1. Similarly, in a class of fifty students, the chance that any one
particular student comes first in the class is 1 in 50 (i.e. 0.02),
assuming all students are equally likely to do so. By convention,
probability values fall on a scale between 0 (impossibility) and 1
(certainty), but they are sometimes expressed as percentages, so the
‘probability’ scale has much in common with the proportion scale.
The chance of committing a type I error is controlled by the alpha value
against which the probability value (p value) of the test is compared. In
the behavioural sciences, .05 is conventionally taken as the alpha value
for testing a hypothesis, so a result is considered significant when
p < .05. When more stringent criteria are required, alpha values of .01 or
.001 are used.
Statistical
Significance (alpha level)
The level of significance
(or alpha level) is set in order to judge the probability that the
difference between the groups has occurred by chance rather than in
response to the manipulation of variables. The decision of whether the
null hypothesis should be rejected depends on the level of error that can
be tolerated. The tolerance level of error is expressed as a level of
significance or alpha level. The usual level of significance or alpha
level is 0.05, although at times levels of 0.01 or 0.001 may be used when
a high level of accuracy is required. In testing the significance of
obtained statistics, if the investigator rejects the null hypothesis when,
in fact, it is true, he or she commits a type I error or alpha error, and
when the investigator fails to reject (accepts) the null hypothesis when,
in fact, it is false, he or she commits a type II or beta error (Singh AK, 2002).
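The meaning of the alpha level can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: it repeatedly draws samples from a population in which the null hypothesis is true (mean 0, standard deviation 1, both chosen arbitrarily), applies a two-tailed z-test, and counts how often the null hypothesis is wrongly rejected. The function name, sample size, number of trials and seed are all hypothetical choices, not part of the original text.

```python
import random
from statistics import NormalDist, mean


def type_one_error_rate(alpha=0.05, n=25, trials=2000, seed=1):
    """Estimate how often a TRUE null hypothesis is rejected at level alpha."""
    rng = random.Random(seed)
    # Two-tailed critical value of the standard normal distribution.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # The null hypothesis is true here: population mean 0, SD 1 (assumed).
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = mean(sample) / (1 / n ** 0.5)  # z-test with known SD = 1
        if abs(z) > z_crit:
            rejections += 1  # a type I (alpha) error
    return rejections / trials


rate = type_one_error_rate(alpha=0.05)
print(f"Observed type I error rate at alpha = 0.05: {rate:.3f}")
```

Over many trials the observed proportion of false rejections should lie close to the chosen alpha level, which is what "a tolerated level of error" means in practice.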
Parametric and
non-parametric tests are commonly employed in behavioural research.
Parametric Tests
A parametric test is one
which specifies certain conditions about the parameters of the population
from which a sample is taken. Such statistical tests are considered
to be more powerful than non-parametric tests and should be used if their
basic requirements or assumptions are met. Assumptions for using
parametric tests:
- The observations must be independent.
- The observations must be drawn from a normally distributed population.
- The samples must be drawn from populations with equal variances
(homogeneity of variance); this condition is more important when the
sample size is particularly small.
- The variables must be expressed in interval or ratio scales.
- The variables under study should be continuous.
Examples of parametric
tests are t-test, z-test and F-test.
Non-parametric tests
A non-parametric test is
one that does not specify any conditions about the parameters of the
population from which the sample is drawn. These tests are called
distribution-free statistics. For non-parametric tests, the variables
under study should be continuous and the observations should be
independent. Requisites for using a non-parametric statistical test are:
- The shape of the distribution of the population from which a
sample is drawn is not known to be normal.
- The variables have been quantified on the basis of nominal
measures (or frequency counts).
- The variables have been quantified on the basis of ordinal
measures or ranking.
- A non-parametric test should be used only when parametric
assumptions cannot be met.
Common non-parametric tests
Chi-square test
Mann-Whitney U test
Rank difference methods (Spearman rho and Kendall’s tau)
Coefficient of concordance (W)
Median test
Kruskal-Wallis test
Friedman test
Two unmatched (unrelated) groups, experimental and control (e.g. patients
receiving a prepared therapeutic intervention for depression and a control
group of patients on routine care):
- See the distribution, whether normal or non-normal.
- If normal, use a parametric test (independent t-test).
- If non-normal, use a nonparametric test (Mann-Whitney U test), or first
try to make the data normal through a transformation such as the natural
log. (A z-transformation only rescales the data and does not change the
shape of the distribution.)
Two matched (related) groups, pre-post design (the same group is rated
before the intervention and rated again after the period of intervention,
i.e. two ratings in the same or related group):
- See the distribution, whether normal or non-normal.
- If normal, use the parametric paired t-test.
- If non-normal, use the nonparametric Wilcoxon Signed-Rank (W) test.
More than two unmatched (unrelated) groups (for example three groups:
schizophrenia, bipolar and control group):
- See the distribution, whether normal or non-normal.
- If normally distributed, use the parametric One-way ANOVA.
- If non-normal, use the nonparametric Kruskal-Wallis test.
More than two matched (related) groups (for example, in an ongoing
intervention, ratings at different times: t1, t2, t3, t4 …):
- See the distribution, whether normal or non-normal.
- If the data are normal, use the parametric Repeated Measures ANOVA.
- If the data are non-normal, use the nonparametric Friedman’s test.
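The guide above can be summarised as a small lookup. The following is a minimal sketch, not a substitute for judgement about the data; the function name choose_test and its three parameters are hypothetical and were introduced only for this illustration.

```python
def choose_test(n_groups, matched, normal):
    """Return the test suggested by the guide above.

    n_groups: number of groups (or repeated ratings) being compared
    matched:  True for related (matched) groups, False for unrelated groups
    normal:   True if the data are judged to be normally distributed
    """
    if n_groups < 2:
        raise ValueError("at least two groups are needed for a comparison")
    guide = {
        # (more than two groups?, matched?, normal?) -> suggested test
        (False, False, True): "independent t-test",
        (False, False, False): "Mann-Whitney U test",
        (False, True, True): "paired t-test",
        (False, True, False): "Wilcoxon signed-rank test",
        (True, False, True): "one-way ANOVA",
        (True, False, False): "Kruskal-Wallis test",
        (True, True, True): "repeated measures ANOVA",
        (True, True, False): "Friedman's test",
    }
    return guide[(n_groups > 2, bool(matched), bool(normal))]


# Example: three unrelated groups whose scores are clearly non-normal.
print(choose_test(n_groups=3, matched=False, normal=False))
```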
Matched (related) and
unmatched (unrelated) observations
When analyzing bivariate
data such as correlations, a single sample unit gives a pair of
observations representing two different variables. The observations
comprising a pair are uniquely linked and are said to be matched or paired.
For example, the systolic blood pressures of 10 patients before
administration of a drug and the measurements of another 10 patients after
administration are unmatched. However, the measurements of the same 10
patients before and after administration of the drug are matched. It is
possible to conduct a more sensitive analysis if the observations are
matched.
Chi-square (χ²) Test
(analyzing frequencies)
The chi-square test is one
of the important non-parametric tests. Guilford (1956) has called it the
‘general-purpose statistic’. Chi-square tests are widely referred to as
tests of homogeneity, randomness, association, independence and goodness
of fit. The chi-square test is used when the data are expressed in terms
of frequencies, proportions or percentages. This test applies only to
discrete data, but any continuous data can be reduced to categories in
such a way that they can be treated as discrete data. The chi-square
statistic is used to evaluate the relative frequency or proportion of
events in a population that fall into well-defined categories. For each
category, there is an expected frequency that is obtained from knowledge
of the population or from some other theoretical perspective. There is
also an observed frequency for each category. The observed frequency is
obtained from observations made by the investigator. The chi-square
statistic expresses the discrepancy between the observed and the expected
frequencies.
There are several uses of, and points to note about, the chi-square test:
1. The chi-square test can be used as a test of the equal probability
hypothesis (the hypothesis that the frequencies in all the given
categories are equal).
2. It can be used to test the significance of the independence hypothesis
(the hypothesis that one variable is not affected by or related to another
variable and hence that the two variables are independent).
3. The chi-square test can be used in testing a hypothesis regarding the
normal shape of a frequency distribution (goodness-of-fit).
4. The chi-square test is used in testing the significance of several
statistics like the phi-coefficient, coefficient of concordance, and
coefficient of contingency.
5. In the chi-square test, the frequencies we observe are compared with
those we expect on the basis of some null hypothesis. If the discrepancy
between the observed and expected frequencies is great, then the value of
the calculated test statistic will exceed the critical value at the
appropriate number of degrees of freedom. The null hypothesis is then
rejected in favor of some alternative. The mastery of the method lies not
so much in the computation of the test statistic itself as in the
calculation of the expected frequencies.
6. The chi-square statistic does not give any information regarding
the strength of a relationship: it only conveys the existence or
non-existence of a relationship between the variables investigated. To
establish the extent and nature of the relationship, additional statistics
such as phi, Cramer’s V, or the contingency coefficient can be used
(Brockopp & Hastings-Tolsma, 2003).
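Because the calculation of expected frequencies is the step that most often causes difficulty, a short sketch may help. It assumes a test of independence on a contingency table, where each expected frequency is (row total × column total) / grand total, and it optionally applies the continuity correction for one degree of freedom. The function names and the 2×2 counts are hypothetical, chosen only for illustration.

```python
def expected_frequencies(observed):
    """Expected counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(observed, yates=False):
    """Return (chi-square statistic, degrees of freedom) for a table."""
    expected = expected_frequencies(observed)
    stat = 0.0
    for o_row, e_row in zip(observed, expected):
        for o, e in zip(o_row, e_row):
            diff = abs(o - e)
            if yates:
                # Continuity correction, intended for 1 degree of freedom.
                diff = max(diff - 0.5, 0.0)
            stat += diff ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df


# Hypothetical counts: rows = treatment / control, columns = improved / not.
table = [[30, 20], [10, 40]]
stat, df = chi_square(table)
corrected, _ = chi_square(table, yates=True)
print(f"chi-square = {stat:.2f}, corrected = {corrected:.2f}, df = {df}")
```

With one degree of freedom the tabulated critical value at the 0.05 level is 3.841, so a calculated value larger than this would lead to rejection of the null hypothesis of independence.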
Tips on analyzing
frequencies
All versions of the
chi-square test compare the agreement between a set of observed
frequencies and those expected if some null hypothesis is true.
The data must be counts of objects classified on a
nominal scale; unambiguous intervals on a continuous scale, such as
successive days or months, may also be treated as categories for the
application of the test.
Apply Yates’ correction in
the chi-square test when there is only one degree of freedom, for example
in a ‘one-way’ test with two categories and in a 2×2 contingency table.
Testing normality of a
data
Parametric statistical
techniques depend upon the mathematical properties of the normal curve.
They usually assume that samples are drawn from populations that are
normally distributed. Before adopting a statistical test, it is essential
to determine whether the data are normal or non-normal. Normality can be
checked in two ways: by plotting the data to see whether they look normal,
or by using formal statistical tests. The commonest of these is the
Kolmogorov-Smirnov test; the Shapiro-Wilk test is another widely used test
of normality. In both, the null hypothesis is that the data come from a
normal distribution. If the P value is not significant (> .05), the data
show no significant departure from normality and a parametric test can
ideally be used; if it is significant (< .05), a non-parametric test
should be used for analysis. Statistical packages like SPSS can
be used for doing these tests.
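To give a feel for what a normality test measures, the sketch below computes only the Kolmogorov-Smirnov distance D: the largest gap between the cumulative proportion of the sample and the cumulative normal curve fitted to the sample mean and standard deviation. It is an illustration under assumptions: the function name and the data are hypothetical, and it does not produce a P value, which still has to come from tables or a statistical package.

```python
from statistics import NormalDist, mean, stdev


def ks_distance_from_normal(data):
    """Largest gap between the sample's cumulative distribution and a
    normal curve with the same mean and standard deviation.

    Note: the data must not all be identical (the SD would be zero).
    """
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = fitted.cdf(x)
        # Compare the normal curve with the step just after and just before x.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


# Hypothetical length-of-stay data (days) with one extreme value.
stay = [1, 1, 1, 1, 1, 1, 1, 1, 1, 100]
print(f"D = {ks_distance_from_normal(stay):.3f}")
```

A value of D close to 0 means the sample follows the fitted normal curve closely; larger values indicate a greater departure from normality.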
t-test and z-test
(comparing means)
In experimental sciences,
comparisons between groups are very common. Usually, one group is the
treatment, or experimental group, while the other group is the untreated,
or control group. If patients are randomly assigned to these two groups,
it is assumed that they differ only by chance prior to treatment.
Differences between groups after the treatment are usually used to
estimate treatment effect. The task of the statistician is to determine
whether any observed differences between the groups following treatment
should be attributed to chance or to the treatment. The t-test is commonly
used for this purpose. There are actually several different types of
t-tests.
Types of t-Tests
· Comparison of a sample mean with a hypothetical population mean.
· Comparison between two scores in the same group of individuals.
· Comparison between observations made on two independent groups.
The t-test and z-test are
parametric inferential statistical techniques used when a comparison of two
means is required. They are used to test the null hypothesis that there is
no difference in means between the two groups. The reporting of the
results of a t-test generally includes the df, the t-value, and the
probability level. A t-test can be one-tailed or two-tailed. If the
hypothesis is directional, a one-tailed test is generally used, and if the
hypothesis is non-directional, a two-tailed test is used. By convention,
the t-test is used when the sample size is less than 30 and the z-test is
used when the sample size is more than 30.
There are dependent and
independent t-tests. The formula to calculate a t-test can
differ depending on whether the samples involved are dependent or
independent. Samples are independent when there are two groups such as an
experimental and a control group. Samples are dependent when the
participants from two groups are paired in some manner. The form of the
t-test that is used with a dependent sample may be termed as paired,
dependent, matched, or correlated (Brockopp & Hastings-Tolsma, 2003).
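The difference between the two formulas can be sketched as follows. This is a minimal illustration, assuming the usual pooled-variance form for independent samples and the mean-of-differences form for paired samples; the function names and the blood pressure readings are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance


def independent_t(a, b):
    """t and df for two independent groups (pooled variance assumed)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


def paired_t(first, second):
    """t and df for dependent (paired) samples, based on the differences."""
    diffs = [y - x for x, y in zip(first, second)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical systolic blood pressures of the same five patients.
before = [140, 150, 145, 160, 155]
after = [135, 142, 140, 150, 149]
t, df = paired_t(before, after)
print(f"paired t = {t:.2f}, df = {df}")
```

The calculated t is then compared with the tabulated critical value for the given df and alpha level, one-tailed or two-tailed as appropriate.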
Degree of freedom (df)
Degree of freedom (df)
is a mathematical concept that describes the number of events or
observations that are free to vary. For each statistical test there is a
formula for calculating the appropriate degree of freedom (for example,
n-1 for a t-test on a single sample of n observations).
Mann-Whitney U-test
The Mann-Whitney U
test is a non-parametric substitute for the parametric t-test, for
comparing the medians of two unmatched samples. For application of the U
test, data must be obtained on an ordinal or interval scale. For example,
we can use the Mann-Whitney U-test to compare the median time taken to
perform a task by a sample of subjects who had not drunk alcohol with that
of another sample who had drunk a standardized volume of alcohol. This test
is used to see group differences when the data are non-normal and the
groups are independent. The test can be applied to groups of unequal or
equal size.
Some key points about using the
Mann-Whitney U-test are:
· This test can be applied to interval data (measurements), to
counts of things, to derived variables (proportions and indices) and to
ordinal data (rank scales, etc.).
· Unlike some test statistics, the calculated value of U
has to be smaller than the tabulated critical value in order to reject the
null hypothesis.
· The test is for a difference in medians. It is a common error to
record a statement like ‘the Mann-Whitney U-test showed there is a
significant difference in means’. There is, however, no need to calculate
the medians of each sample to do the test.
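The calculation of U can be sketched as below. The sketch assumes the common rank-sum form, in which all observations are ranked together (tied values share the average rank) and the smaller of the two U values is reported; the function names and the task times are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 upwards; tied values share the average rank."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


def mann_whitney_u(a, b):
    """Return the smaller of U1 and U2 for two independent samples."""
    ranks = average_ranks(list(a) + list(b))
    rank_sum_a = sum(ranks[:len(a)])
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return min(u_a, u_b)


# Hypothetical times (seconds) to complete a task.
no_alcohol = [12, 15, 11, 14, 13]
alcohol = [18, 16, 21, 17, 19, 20]
print(f"U = {mann_whitney_u(no_alcohol, alcohol)}")
```

The reported U is compared with the tabulated critical value for the two sample sizes; the null hypothesis is rejected when the calculated U is smaller than that value.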
Wilcoxon test -matched
pairs
The Wilcoxon test for
matched pairs is a non-parametric test for comparing the medians of two
matched samples. It calls for a test statistic T whose probability
distribution is known. The observations must be measured on an interval
scale. It is not possible to use this test on ordinal measurements. The
test is for a difference in medians, and it assumes that the samples have
been drawn from parent populations that are symmetrically, though not
necessarily normally, distributed.
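A minimal sketch of the statistic T follows. It assumes the usual procedure: take the difference within each pair, discard zero differences, rank the absolute differences (ties share the average rank), and report the smaller of the rank sums for positive and negative differences. The function name and the scores are hypothetical.

```python
def wilcoxon_t(first, second):
    """Return (T, number of non-zero differences) for matched pairs."""
    diffs = [y - x for x, y in zip(first, second) if y != x]
    ordered = sorted(abs(d) for d in diffs)

    def rank(value):
        # Average rank of an absolute difference among all of them.
        return ordered.index(value) + 1 + (ordered.count(value) - 1) / 2

    positive = sum(rank(abs(d)) for d in diffs if d > 0)
    negative = sum(rank(abs(d)) for d in diffs if d < 0)
    return min(positive, negative), len(diffs)


# Hypothetical pain scores for the same patients before and after a treatment.
before = [7, 6, 8, 5, 9, 7]
after = [5, 6, 4, 6, 6, 2]
t_value, n_pairs = wilcoxon_t(before, after)
print(f"T = {t_value}, pairs used = {n_pairs}")
```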
Pearson Product-Moment
Correlation Coefficient
The Pearson product-moment
correlation is a parametric method commonly used to assess the
association between two variables under study. In this test an estimation
of at least one parameter is involved, measurement is at an interval
level, and it is assumed that the variables under study are normally
distributed within the population.
Spearman Rank
correlation Coefficient
Spearman’s r is a
nonparametric test, equivalent to the parametric Pearson r.
Spearman’s Rank Correlation technique is used when the conditions for the
Product Moment Correlation Coefficient do not apply. This test is widely
used by health scientists; it uses the ranks of the x and y observations,
and the raw data themselves are discarded.
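The relationship between the two coefficients can be sketched in a few lines: the Spearman coefficient is simply the product-moment coefficient calculated on the ranks instead of the raw values. The function names and the data below are hypothetical and serve only as an illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation coefficient of paired observations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 upwards; tied values share the average rank."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


def spearman_rho(x, y):
    """Rank correlation: Pearson r applied to the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical paired observations with a curved but steadily rising pattern.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(f"Pearson r = {pearson_r(x, y):.3f}, Spearman rho = {spearman_rho(x, y):.3f}")
```

Because only the order of the observations matters for the rank coefficient, any steadily rising relationship gives a Spearman value of 1 even when the raw data do not lie on a straight line.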
Tips on using
correlation tests
When observations of one or
both variables are on an ordinal scale, or are proportions, percentages,
indices or counts of things, use the Spearman’s Rank Correlation
Coefficient. The number of units in the sample, i.e. the number of paired
observations, should be between 7 and 30.
When observations are
measured on an interval scale, the Product Moment Correlation Coefficient
should be considered. Sample units must be obtained randomly, and the
data should be bivariate normal, i.e. both x and y should be normally
distributed.
The relationship between
the variables should be rectilinear (straight line), not curved.
Certain mathematical transformations (e.g. logarithmic transformation)
will ‘straighten up’ curved relationships.
A strong and significant
correlation does not mean that one variable is necessarily the cause of the
other. It is possible that some additional, unidentified factor is the
underlying source of variability in both variables.
Correlations measured in
samples estimate correlations in the populations. A correlation in a
sample is not ‘improved’ or strengthened by obtaining more observations:
however, larger samples may be required to confirm the statistical
significance of weaker correlations.
Regression Analysis
Regression analysis is
often used to predict the value of one variable given information about
another variable. The procedure can describe how two continuous variables
are related. Regression analysis is used to examine relationships among
continuous variables and is most appropriate for data that can be plotted
on a graph. Data are usually plotted, so that the independent variable is
seen on the horizontal (x) axis and the dependent variable on the vertical
(y) axis. The statistical procedure for regression analysis includes a
test for the significance of the relationship between two variables. Given
a significant relationship between two variables, knowledge of the value
of the independent variable permits a prediction of the value of the
dependent variable.
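As an illustration of prediction, the sketch below fits a straight line by the ordinary least squares method and uses it to predict the dependent variable. It assumes simple linear regression with one independent variable; the function names and the age and weight figures are hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x_new, slope, intercept):
    """Predicted value of the dependent variable for a given x."""
    return intercept + slope * x_new


# Hypothetical data: age of baby in months (x) and weight in kg (y).
age = [1, 2, 3, 4, 5, 6]
weight = [4.2, 5.1, 5.9, 6.4, 7.0, 7.5]
slope, intercept = fit_line(age, weight)
print(f"weight = {intercept:.2f} + {slope:.2f} * age")
print(f"predicted weight at 7 months: {predict(7, slope, intercept):.2f} kg")
```

Predictions of this kind are only reasonable within, or close to, the range of the observed values of the independent variable.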
One-Way Analysis of
Variance (ANOVA)
When there are three or
more samples, and the data from each sample are thought to be distributed
normally, analysis of variance (ANOVA) may be the technique of choice.
One-way analysis of variance, developed by R. A. Fisher, is a parametric
inferential statistical test that enables investigators to compare two or
more group means. The reporting of the results includes the
df, the F value and the probability level. ANOVA is of two types: simple
(one-way) analysis of variance and complex (two-way) analysis of variance.
One-way analysis of variance is an extension of the
t-test, which permits the investigator to compare more than two means
simultaneously. Researchers studying two or more groups can use ANOVA to
determine whether there are differences among the groups. For example,
nurse investigators who want to assess the levels of helplessness among
three groups of patients (long-term, acute care and outpatients) can
administer an instrument designed to measure levels of helplessness and
then calculate an F ratio. If the F ratio is sufficiently
large, the conclusion can be drawn that there is a difference between at
least two of the means. The larger the F ratio, the more likely it
is that the null hypothesis can be rejected. Other tests, called post hoc
comparisons, can be used to determine which of the means differ
significantly. Fisher’s LSD, Duncan’s new multiple range test, the Newman-Keuls,
Tukey’s HSD, and Scheffe’s test are the post hoc comparison tests that are
most frequently used following ANOVA. In some instances a post hoc
comparison is not necessary because the means of the groups under
consideration readily convey the differences between the groups (Brockopp
& Hastings-Tolsma, 2003).
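The F ratio itself is the variance between the group means divided by the variance within the groups. The following is a minimal sketch of that calculation for a one-way design; the function name and the helplessness scores for the three patient groups are hypothetical.

```python
from statistics import mean


def one_way_anova(*groups):
    """Return (F ratio, df between groups, df within groups)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum(sum((v - mean(g)) ** 2 for v in g) for g in groups)
    df_between, df_within = k - 1, n - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical helplessness scores for three groups of patients.
long_term = [14, 16, 15, 17, 18]
acute_care = [11, 12, 13, 12, 14]
outpatients = [8, 9, 10, 9, 11]
f_ratio, df_b, df_w = one_way_anova(long_term, acute_care, outpatients)
print(f"F({df_b}, {df_w}) = {f_ratio:.2f}")
```

The calculated F is compared with the tabulated critical value for the two df values; a post hoc comparison would then be needed to identify which means differ.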
Kruskal-Wallis
test-more than two samples
The Kruskal-Wallis test is
a simple non-parametric test to compare the medians of three or more
samples. Observations may be interval measurements, counts of things,
derived variables, or ordinal ranks. If there are only three samples, then
there must be at least five observations in each sample. Samples do not
have to be of equal sizes. The statistic K is used to indicate the
test value.
Two-way or Factorial
Analysis of Variance
Factorial analysis of
variance permits the investigator to analyze the effects of two or more
independent variables on the dependent variable (one-way ANOVA is used
with one independent variable and one dependent variable). The term factor
is interchangeable with independent variable and factorial ANOVA therefore
refers to the idea that data having two or more independent variables can
be analyzed using this technique.
Analysis of Covariance
(ANCOVA)
ANCOVA is an inferential
statistical test that enables investigators to adjust statistically for
group differences that may interfere with obtaining results that relate
specifically to the effects of the independent variable(s) on the
dependent variable(s).
Multivariate Analysis
Multivariate analysis
refers to a group of inferential statistical tests that enable the
investigator to examine multiple variables simultaneously. Unlike other
statistical techniques, these tests permit the investigator to examine
several dependent and independent variables simultaneously.
If the data fulfill the
requirements of the parametric assumptions, any of the parametric tests
which suit the purpose can be used. On the other hand, if the data do not
fulfill the parametric requirements, any of the non-parametric statistical
tests which suit the purpose can be selected. Other factors which decide
the selection of appropriate statistical tests are the number of
independent and dependent variables, and the nature of the variables
(whether nominal, ordinal, interval or ratio). When both independent and
dependent variables are interval measures and there are more than one of
them, multiple correlation is the most appropriate statistic. On the other
hand, when they are interval measures and there is only one of each,
Pearson r may be used. With ordinal and nominal measures, the
non-parametric statistics are the common choice.
The availability of computer software has
greatly facilitated the execution of most statistical
techniques. The many statistical packages run on different
types of platforms or computer configurations. For general
data analysis the Statistical Package for the Social
Sciences (SPSS), the BMDP series, and the Statistical
Analysis System (SAS) are recommended. These are
general-purpose statistical packages that perform
essentially all the analyses common to biomedical research.
In addition, a variety of other packages have emerged.
SYSTAT runs on both IBM-compatible and Macintosh systems and
performs most of the analyses commonly used in biomedical
research. The popular SAS program has been redeveloped for
Macintosh systems and is sold under the name JMP. Other
commonly used programs include Stata, which is excellent for
the IBM-compatible computers. The developers of Stata
release a regular newsletter providing updates, which makes
the package very attractive. StatView is a general-purpose
program for the Macintosh computer. Newer versions of
StatView include an additional program called Super ANOVA,
which is an excellent set of ANOVA routines. StatView is
user-friendly and also has superb graphics. For users
interested in epidemiological analyses, Epilog is a
relatively low-cost program that runs on the IBM-compatible
platforms. It is particularly valuable for rate
calculations, analysis of disease-clustering patterns, and
survival analysis. GB-STAT is a low-cost, multipurpose
package that is very comprehensive.
SPSS (Statistical Package for Social
Sciences) is one among the popular computer programs for
data analysis. This software provides a comprehensive set of
flexible tools that can be used to accomplish a wide variety
of data analysis tasks (Einspruch, 1998). SPSS is available
in a variety of platforms. The latest product information
and free tutorial are available at
www.spss.com.
Computer software programs that provide easy
access to highly sophisticated statistical methodologies
represent both opportunities and dangers. On the positive
side, no serious researcher need be concerned about being
unable to utilize precisely the statistical technique that
best suits his or her purpose, and to do so with the kind of
speed and economy that was inconceivable just two decades
ago. The danger is that some investigators may be tempted to
employ after-the-fact statistical manipulations to salvage a
study that was flawed to start with, or to extract
significant findings through use of progressively more
sophisticated multivariate techniques.
References & Bibliography
1. Ahuja R (2001). Research Methods. Rawat Publications, New Delhi. 71-72.
2. Brockopp D Y & Hastings-Tolsma M (2003). Fundamentals of Nursing Research. 3rd Edition. Jones and Bartlett: Boston.
3. Cozby P C (2000). Methods in Behavioral Research (7th Edition). Toronto: Mayfield Publishing Co.
4. Kerr A W, Hall H K, Kozub S A (2002). Doing Statistics with SPSS. Sage Publications, London.
5. Einspruch E L (1998). An Introductory Guide to SPSS for Windows. Sage Publications, Calif.
6. Fowler J, Jarvis P & Chevannes M (2002). Practical Statistics for Nursing and Health Care. John Wiley & Sons: England.
7. Guilford J P (1956). Fundamental Statistics in Psychology and Education. New York: McGraw-Hill Book Co.
8. Lindquist E F (1968). Statistical Analysis in Educational Research. New Delhi: Oxford and IBH Publishing Co.
9. Singh AK (2002). Tests, Measurements and Research Methods in Behavioural Sciences. Bharati Bhawan, New Delhi.
10. Singleton, Royce A. and Straits, Bruce (1999). Approaches to Social Research (3rd Ed), Oxford University Press, New York.
11. Streiner D & Norman G (1996). PDQ Epidemiology (2nd Edition). St. Louis: Mosby.
12. Therese Baker L (1988). Doing Social Research, McGraw Hill Book Co., New York.
13. Treece E W & Treece J H (1989). Elements of Research in Nursing, The C.V. Mosby Co., St. Louis.
14. Tyler L E (1963). Tests and Measurements. Englewood Cliffs, New Jersey: Prentice Hall, a-p7.b-p.14
15. Chalmers TC, Celano P, Sacks H, Smith H (1983). Bias in treatment assignment in controlled clinical trials. N Engl J Med 309:1358.
16. Cohen J (1988). Statistical Power Analysis for the Behavioral Sciences. Erlbaum, Hillsdale, NJ.
17. Cook TD, Campbell DG (1979). Quasi-experimentation: Design and Analysis Issues for Field Studies. Rand-McNally, Chicago.
18. Daniel WW (1995). Biostatistics: A Foundation for Analysis in the Health Sciences, ed 6. Wiley, New York.
19. Daniel WW (1990). Applied Nonparametric Statistics, ed 2. PWS-Kent, Boston.
20. Dawson-Saunders B, Trapp RG (1994). Basic and Clinical Biostatistics, ed 2. Appleton & Lange, Norwalk, CT.
21. Edwards LK, editor (1993). Applied Analysis of Variance in Behavioral Science. Marcel Dekker, New York.
22. Efron B, Tibshirani R (1991). Statistical data analysis in the computer age. Science 253:390.
23. Jaccard J, Becker MA (1997). Statistics for the Behavioral Sciences, ed 3. Brooks/Cole Publishing Co, Pacific Grove, CA.
24. Keppel G (1991). Design and Analysis. Prentice-Hall, Englewood Cliffs, NJ.
25. Kaplan RM, Grant I (200). Statistics and Experimental Design in Kaplan & Sadock's Comprehensive Textbook of Psychiatry 7th Edition.
26. McCall R (1994). Fundamental Statistics for Psychology, ed 6. Harcourt Brace, & Jovanovich, New York.
27. Pett MA (1997). Nonparametric Statistics for Health Care Research: Statistics for Small Samples and Unusual Distributions. Sage Publications, Thousand Oaks, CA.
28. Sacks H, Chalmers DC, Smith H (1982). Randomized versus historical controls for clinical trials. Am J Med 72:233.
29. Ware ME, Brewer CL, editors (1999). Handbook for Teaching Statistics and Research Methods, ed 2. Erlbaum, Mahwah, NJ.