What Statistical Test to Use for Categorical Data

Indian J Dermatol. 2016 Jul-Aug; 61(4): 385–392.

Biostatistics Series Module 4: Comparing Groups – Categorical Variables

Avijit Hazra

From the Department of Pharmacology, Institute of Postgraduate Medical Education and Research, Kolkata, West Bengal, India

Nithya Gogtay

1 Department of Clinical Pharmacology, Seth GS Medical College and KEM Hospital, Mumbai, Maharashtra, India

Received 2016 May; Accepted 2016 May.

Abstract

Categorical variables are commonly represented as counts or frequencies. For analysis, such data are conveniently arranged in contingency tables. Conventionally, such tables are designated as r × c tables, with r denoting the number of rows and c denoting the number of columns. The Chi-square (χ2) probability distribution is particularly useful in analyzing categorical variables. A number of tests yield test statistics that fit, at least approximately, a χ2 distribution and hence are referred to as χ2 tests. Examples include Pearson's χ2 test (or simply the χ2 test), McNemar's χ2 test, Mantel–Haenszel χ2 test and others. The Pearson's χ2 test is the most commonly used test for assessing difference in distribution of a categorical variable between two or more independent groups. If the groups are ordered in some manner, the χ2 test for trend should be used. The Fisher's exact probability test is a test of the independence between two dichotomous categorical variables. It provides a better alternative to the χ2 statistic to assess the difference between two independent proportions when numbers are small, but cannot be applied to a contingency table larger than a two-dimensional one. The McNemar's χ2 test assesses the difference between paired proportions. It is used when the frequencies in a 2 × 2 table represent paired samples or observations. The Cochran's Q test is a generalization of the McNemar's test that compares more than two related proportions. The P value from the χ2 test or its counterparts does not indicate the strength of the difference or association between the categorical variables involved. This information can be obtained from the relative risk or the odds ratio statistic, which are measures of dichotomous association obtained from 2 × 2 tables.

Keywords: Binomial test, Chi-square distribution, Chi-square for trend, Chi-square test, Cochran's Q test, contingency table, Mantel–Haenszel test, McNemar's test, sign test

Introduction

Categorical variables usually represent counts or frequencies. Count data usually pertain to subjects or items with certain attributes, exposures or outcomes and are represented by non-negative integers. Thus, in a group, one may count the number of individuals who are tall, or have been exposed to a viral disease, or have survived lung cancer. For analysis, such data are conveniently arranged in a contingency table.

Such a table provides the frequencies for two or more categorical variables simultaneously and can be constructed by cross-classification (cross-tabulation) of the counts on two or more variables. A table that depicts the frequencies of two variables is called a two-way or two-dimensional contingency table. In the simplest case, as depicted in Table 1, the table will have two rows and two columns, and would therefore be a 2 × 2 (two-by-two) table. A table with r rows and c columns is referred to as an r × c contingency table.

Table 1

Example of a two-dimensional or two-way contingency table

Note that, in such a table, each subject is counted in only one cell, the one corresponding to that particular combination of variable categories. The sum of the counts in all the cells gives the total number of subjects in the study. In the example presented in Table 1, the grouping variable is placed in rows, which is the usual convention. In epidemiological studies, exposure categories are generally placed in rows and outcome categories in columns. For example, when we are examining the relationship between smoking and lung cancer, the exposure categories, namely 'smokers' and 'nonsmokers', would be placed in rows, while the outcome categories, namely 'lung cancer' and 'no lung cancer', would be placed in columns.

In contingency tables, once the row and column totals are known, only some of the cells can take independent values. The values in the other cells are then constrained by the row and column totals. Therefore, a contingency table is said to have certain degrees of freedom (df), calculated as (r − 1) × (c − 1). Thus, a 2 × 2 table will have df = 1, whereas a 3 × 3 table will have df = 4. In the example in Figure 1, once we know that there are 250 smokers and 750 nonsmokers in the study, and that there are a total of 19 individuals with lung cancer, we can assign an independent value to only one of the cells. The values in the other cells are then automatically determined by the row and column totals.
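The constraint imposed by the margins can be made concrete with a short sketch. The margins (250 smokers, 750 nonsmokers, 19 individuals with lung cancer, hence N = 1000) come from the text; the function names and the value 12 chosen for the single free cell are illustrative assumptions.

```python
def complete_2x2(row_totals, col_totals, a):
    """Fill a 2 x 2 table from its margins and one free cell `a` (row 1, column 1)."""
    r1, r2 = row_totals
    c1, c2 = col_totals
    if r1 + r2 != c1 + c2:
        raise ValueError("row and column totals must add up to the same N")
    b = r1 - a   # remainder of row 1
    c = c1 - a   # remainder of column 1
    d = r2 - c   # remainder of row 2
    if min(a, b, c, d) < 0:
        raise ValueError("this cell value is incompatible with the margins")
    return [[a, b], [c, d]]


def degrees_of_freedom(n_rows, n_cols):
    """df of an r x c contingency table: (r - 1) * (c - 1)."""
    return (n_rows - 1) * (n_cols - 1)


# Hypothetical free cell: 12 smokers with lung cancer.
table = complete_2x2((250, 750), (19, 981), a=12)
print(table)                     # [[12, 238], [7, 743]]
print(degrees_of_freedom(2, 2))  # 1
print(degrees_of_freedom(3, 3))  # 4
```

Choosing a different value for the free cell changes all four cells at once, which is what df = 1 means for a 2 × 2 table.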

Figure 1

Common statistical tests to compare categorical data for difference

The analysis of such two-dimensional contingency tables often involves testing for the difference between the two groups using the familiar Chi-square (χ2) test and its variants. Three- and higher-dimensional tables are dealt with by multivariate log-linear analysis. In Table 2, we provide an example of a three-way contingency table that depicts frequencies simultaneously for three categorical variables, namely, health status, gender, and test result. The log-linear model can be used to analyze such a table; it is, however, outside the scope of this module.

Table 2

Example of a three-dimensional or three-way contingency table

Before we take up individual tests and their variants used to assess categorical or count-type data, let us recapitulate, through Figure 1, the tests that are available to compare groups or sets of categorical data for a significant difference.

Chi-square Distribution

This is a continuous probability distribution that takes only positive values and is skewed to the right. The shape of a χ2 distribution is characterized by its df. As the df increases, it becomes more symmetrical and approaches the normal distribution. Figure 2 depicts a set of χ2 distributions with varying df. Thus, like the t-distribution, the χ2 distribution represents a whole family of distributions distinguished by the df parameter, but unlike the t-distribution, it is not symmetrical. However, for df values greater than about 30, it approximates the normal distribution.

Figure 2

Chi-square distributions with different degrees of freedom. The X-axis denotes the χ2 value, whereas the Y-axis denotes the probability density function. Note that as the degrees of freedom (ν) increase, the distribution tends to become more symmetrical
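The way the shape depends on df can be explored numerically. The sketch below evaluates the standard χ2 density and uses the known skewness of the distribution, sqrt(8 / df), as a simple measure of asymmetry; the function names are illustrative and are not taken from the article.

```python
import math


def chi2_pdf(x, df):
    """Probability density of the chi-square distribution at x > 0."""
    half = df / 2
    return x ** (half - 1) * math.exp(-x / 2) / (2 ** half * math.gamma(half))


def chi2_skewness(df):
    """Skewness of the chi-square distribution; it shrinks as df grows."""
    return math.sqrt(8 / df)


# Print df, skewness, and the density evaluated at the mean (x = df).
for df in (1, 2, 5, 10, 30):
    print(df, round(chi2_skewness(df), 2), round(chi2_pdf(float(df), df), 4))
```

The skewness falls steadily with increasing df, which matches the flattening and more symmetrical curves described in the caption.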

This distribution is particularly useful in analyzing categorical variables. A number of statistical tests yield test statistics that fit, at least approximately, a χ2 distribution and hence they may be referred to as χ2 tests. Examples include Pearson's χ2 test (also simply called the χ2 test), McNemar's χ2 test, Cochran–Mantel–Haenszel χ2 test, and others.

Chi-square Test

The Pearson's χ2 test (after Karl Pearson, 1900) is the most commonly used test for the difference in distribution of categorical variables between two or more independent groups.

Suppose we are interested in comparing the proportion of individuals with or without a particular characteristic between two groups. The null hypothesis would be that there is no difference between these two proportions. The data can be arranged in a 2 × 2 contingency table. We will need a larger contingency table to arrange the data if there are more than two groups or if the categorical variable of interest can take more than two possible values.

Starting from the observed frequencies in each cell, the test involves computing the corresponding expected frequencies. The expected frequency is calculated by dividing the product of the applicable row and column totals for that cell by the overall total. The test then determines whether the observed counts differ significantly from the counts that would be expected if there were no difference between groups. The Pearson χ2 statistic is calculated as:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O is the observed count in each cell and E is the expected count in that cell.
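The calculation described above can be sketched in a few lines. The 2 × 2 counts (30 of 100 subjects with the characteristic in one group, 15 of 100 in the other) are hypothetical, and the helper names are illustrative. The P value helper covers only df = 1, where the upper tail of the χ2 distribution can be written with the complementary error function.

```python
import math


def expected_counts(observed):
    """Expected count per cell: row total * column total / overall total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi_square(observed):
    """Sum of (O - E)^2 / E over all cells."""
    expected = expected_counts(observed)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))


def p_value_df1(chi2):
    """Upper-tail P value for a chi-square statistic with df = 1."""
    return math.erfc(math.sqrt(chi2 / 2))


observed = [[30, 70], [15, 85]]          # hypothetical counts
chi2 = pearson_chi_square(observed)
print(expected_counts(observed))         # [[22.5, 77.5], [22.5, 77.5]]
print(round(chi2, 2))                    # 6.45
print(round(p_value_df1(chi2), 3))       # 0.011
```

For tables with more than one degree of freedom, the P value would need the general χ2 distribution function from a statistical package or a printed table.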

The calculated value of the χ2 statistic is referred to the χ2 distribution table, and the resultant significance level (P value) depends on the applicable df. If the P value is less than the selected significance threshold (say 0.05 or 0.01), the null hypothesis can be rejected.

When testing for independence in a contingency table, the χ2 distribution, which is a continuous probability distribution, is used as an approximation to the discrete probability of observed frequencies, namely the multinomial distribution. When the total number of observations is small, the estimates of probabilities in each cell become inaccurate and the risk of Type I error increases. It is not fixed how large N should be, but probably at least 20 with the expected frequency in each cell at least 5. When the expected frequencies are small, the approximation of the χ2 statistic can be improved by a continuity correction known as Yates' correction (after Frank Yates). This involves subtracting 0.5 from the positive discrepancies (observed − expected) and adding 0.5 to the negative discrepancies before these values are squared in the calculation of the usual χ2 statistic. If the sample size is large, this correction will have little effect on the value of the test statistic.
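A minimal sketch of the continuity correction follows, using hypothetical counts. Moving each discrepancy 0.5 toward zero is written here as |O − E| − 0.5; flooring that quantity at zero is an added safeguard for very small discrepancies and is an assumption of this sketch, not something stated in the text.

```python
def expected_counts(observed):
    """Expected count per cell: row total * column total / overall total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def yates_chi_square(observed):
    """Chi-square for a 2 x 2 table with each |O - E| reduced by 0.5."""
    expected = expected_counts(observed)
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            shrunk = max(abs(o - e) - 0.5, 0.0)  # never push past zero
            total += shrunk ** 2 / e
    return total


observed = [[30, 70], [15, 85]]            # hypothetical counts
print(round(yates_chi_square(observed), 2))  # 5.62
```

For the same table the uncorrected statistic is about 6.45, so the corrected value is smaller and the resulting P value larger, which is the conservative behavior the text describes.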

Yates' continuity correction is sometimes considered to be an overly conservative adjustment. It is important to remember that the χ2 test is based on an approximation and the derived P value may differ from that obtained by an "exact" method that does not depend on approximation to a theoretical probability distribution. With small numbers in a 2 × 2 table, the best approach is to use Fisher's exact probability test, which is discussed below.

The analysis of larger contingency tables can also be carried out using the χ2 test as indicated above, with more cells contributing to the test statistic. The results are referred to the χ2 distribution table with the appropriately larger df. All cells should have an expected frequency > 1 and 80% of the cells should have expected frequencies of at least 5. If this is not the case, it may help to combine some categories (this is called collapsing categories) so that the table becomes smaller but the numbers in each cell are greater. However, this is not always logically feasible.
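The expected-frequency requirement can be checked before running the test. The sketch below reads the rule as "every expected count above 1, and at least 80% of cells with an expected count of 5 or more"; the function name and both example tables are hypothetical.

```python
def expected_counts(observed):
    """Expected count per cell: row total * column total / overall total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def meets_expected_count_rule(observed):
    """True if all expected counts exceed 1 and at least 80% are 5 or more."""
    cells = [e for row in expected_counts(observed) for e in row]
    all_above_one = all(e > 1 for e in cells)
    share_at_least_five = sum(e >= 5 for e in cells) / len(cells)
    return all_above_one and share_at_least_five >= 0.8


print(meets_expected_count_rule([[30, 70], [15, 85]]))   # True
print(meets_expected_count_rule([[12, 238], [7, 743]]))  # False
```

In the second table one expected count is 4.75, so only 75% of the cells reach 5 and the rule is not met.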

In the analysis of a large table, a significant result on χ2 testing will not indicate which group is different from the others. It is not appropriate to partition a larger table into several 2 × 2 tables and perform multiple comparisons without adjustment. A 3 × 2 table, for example, will yield three 2 × 2 tables, but a separate test on each table at the original significance level may give a spuriously significant result. One approach is to do an initial χ2 test and, if P < 0.05, perform separate χ2 tests on each separate 2 × 2 table using a Bonferroni correction for multiple comparisons to the significance level. This, however, is not recommended if the correction reduces the significance level to <0.01.
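The arithmetic behind the Bonferroni adjustment is simple enough to sketch. The helper names are illustrative; the correction itself is the usual division of the significance level by the number of comparisons.

```python
from math import comb


def pairwise_tables(n_groups):
    """Number of 2 x 2 tables obtained by comparing the groups in pairs."""
    return comb(n_groups, 2)


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level after Bonferroni correction."""
    return alpha / n_comparisons


m = pairwise_tables(3)
print(m, round(bonferroni_alpha(0.05, m), 4))    # 3 0.0167

m4 = pairwise_tables(4)
print(m4, round(bonferroni_alpha(0.05, m4), 4))  # 6 0.0083
```

With four groups the adjusted level already falls below 0.01, the point beyond which the text advises against this approach.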

Fisher's Exact Test

The Fisher's exact probability test (after Ronald Aylmer Fisher, 1934) is a test of the independence between two dichotomous categorical variables. It provides an alternative to the χ2 statistic to assess the difference between two independent proportions when numbers are small, but cannot be applied to a contingency table larger than a two-dimensional one.

The test examines all the possible 2 × 2 tables that can be constructed with the same marginal totals as the original table (i.e., the numbers in the cells are different but the row and column totals are the same) but which are as or more extreme in their departure from the null hypothesis. The probability of obtaining each of these tables is calculated, and from these a composite P value is derived. This probability is usually doubled to give a two-sided P value.

Thus, instead of referring a calculated statistic to a sampling distribution, the test calculates an exact probability. The calculations are tedious and are seldom attempted by hand. Indeed, even computer software may not execute this test if N is too large, say over 300.
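The enumeration can be sketched with hypergeometric probabilities. The counts below are hypothetical and the function names are illustrative. The sketch returns the one-sided P value, the doubled value described in the text, and, for comparison, a two-sided value obtained by summing all tables no more probable than the observed one, which is a convention some software uses instead of doubling.

```python
from math import comb


def table_probability(a, b, c, d):
    """Hypergeometric probability of one 2 x 2 table given its margins."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)


def fisher_exact(a, b, c, d):
    """Return (one_sided, doubled, two_sided_by_probability)."""
    r1, r2, c1 = a + b, c + d, a + c
    low, high = max(0, c1 - r2), min(r1, c1)
    # Every table with the same margins is indexed by its top-left cell x.
    probs = {x: table_probability(x, r1 - x, c1 - x, r2 - c1 + x)
             for x in range(low, high + 1)}
    upper = sum(p for x, p in probs.items() if x >= a)
    lower = sum(p for x, p in probs.items() if x <= a)
    one_sided = min(upper, lower)
    doubled = min(1.0, 2 * one_sided)
    p_obs = probs[a]
    by_probability = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9))
    return one_sided, doubled, min(1.0, by_probability)


one_sided, doubled, by_probability = fisher_exact(3, 1, 1, 3)  # hypothetical
print(round(one_sided, 4), round(doubled, 4), round(by_probability, 4))
# 0.2429 0.4857 0.4857
```

The two two-sided conventions agree here because the hypothetical table has symmetric margins; with unequal margins they can differ.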

It may be noted that, for large samples, the χ2 test, Yates' corrected χ2 test, and Fisher's exact test give very similar results, but for smaller samples, Fisher's test and Yates' correction give more conservative results than the conventional χ2 test; that is, the P values are larger, and we are less likely to conclude that there is a significant difference between the groups.

The principle of the Fisher's exact test can now be extended from a 2 × 2 contingency table to the general case of an m × n table, and some statistical packages provide a calculation for the more general case. As one example, the Freeman–Halton extension (after Freeman and Halton, 1951) to the Fisher's exact test permits calculation of the exact P value from 2 × 3, 3 × 3, and 2 × 4 tables.

Chi-square Test for Trend

This is applied to a two-way contingency table in which one variable has two categories and the other has multiple mutually exclusive but ordered categories, to assess whether there is a difference in the trend of the proportions in the two groups. Using the ordering in this way gives a test that is more powerful than the conventional χ2 statistic.

It is said that in applying the χ2 test for trend, the counts in individual cells may be small but the overall sample size should be at least 30.

A common application of this test is to assess if there are significant trends (with respect to age, educational status, socioeconomic status, etc.) in the incidence or prevalence of disease. The presence or absence of disease would define the two groups and the frequencies across age bands, socioeconomic groups, educational groups, etc., would be compared. It is also used to test the association of ordinal variables with terminal events such as death and in analyzing dose-response relationships.
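One common form of the trend statistic is the Cochran–Armitage formulation, sketched below. This is an assumption about the calculation; the article does not give its formula, and some variants use N − 1 in place of N. The counts and the equally spaced scores 1, 2, 3 are hypothetical.

```python
def chi_square_trend(cases, totals, scores=None):
    """Cochran-Armitage style chi-square for trend (df = 1).

    cases[i]  : subjects with the outcome in ordered category i
    totals[i] : all subjects in ordered category i
    scores[i] : numeric score of category i (default 1, 2, 3, ...)
    """
    if scores is None:
        scores = list(range(1, len(totals) + 1))
    n = sum(totals)
    p = sum(cases) / n                       # overall proportion with outcome
    sum_nx = sum(t * x for t, x in zip(totals, scores))
    sum_nx2 = sum(t * x * x for t, x in zip(totals, scores))
    sum_rx = sum(r * x for r, x in zip(cases, scores))
    numerator = (sum_rx - p * sum_nx) ** 2
    denominator = p * (1 - p) * (sum_nx2 - sum_nx ** 2 / n)
    return numerator / denominator


# Hypothetical: outcome in 5, 10 and 20 of 100 subjects across three age bands.
print(round(chi_square_trend([5, 10, 20], [100, 100, 100]), 2))  # 10.92
```

When the proportion is the same in every category the statistic is zero, reflecting a horizontal trend line.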

This test for trend was extended by Nathan Mantel and William Haenszel in 1959 for the situation in which cases and controls have been stratified into subgroups to eliminate the possibility of confounding by one or more variables. The test result is adjusted for the strata of the potential confounder involved. This test is generally used for case–control type data and the test statistic has an approximate χ2 distribution with df 1. This stratified trend test is called the Mantel–Haenszel χ2 test, Mantel–Haenszel test, or extended Mantel–Haenszel test. The extended Mantel–Haenszel χ2 that is calculated (χ2MH) reflects the departure of a linear trend from horizontal.

As an example, consider the following data pertaining to individuals at high risk of renal calculi:

[Tables not reproduced; original images IJD-61-385-g006.jpg and IJD-61-385-g007.jpg]

A χ2 for trend analysis with these data returns a χ2 trend value of 49.70, which at df = 1 yields P < 0.001, indicating that the increasing trend in stone proportion with age is statistically highly significant and likely to be true.

Now, let us look at the same data rearranged with stratification by gender:

Now, for the male stratum, the χ2 trend value is 31.73 (df = 1; P < 0.001), and for the female stratum, it is 7.14 (df = 1; P = 0.008). A Mantel–Haenszel analysis will, however, yield a composite χ2 trend value of 35.34 (df = 1; P < 0.001). Thus, we can interpret these results as a significant trend of increasing stone proportion with age, overall as well as separately for males and females. In this particular case, there is no confounding effect of gender.

A Cochran–Mantel–Haenszel procedure follows a similar logic and is used to derive a composite interpretation for repeated tests of independence. It is commonly applied to a situation where we have multiple 2 × 2 tables summarizing independent proportions and these tables represent repeat sets of data such as those obtained through experiments or observations repeated at different times. Thus, we are dealing with three categorical variables, the two that make up the individual 2 × 2 tables and a third nominal variable that identifies the repetitions such as time, location, or study. In essence, a Mantel–Haenszel χ2 statistic is calculated here also. In fact, the terms Cochran–Mantel–Haenszel test and Mantel–Haenszel test have been used interchangeably. This is confusing but not wrong considering that the basic idea of Cochran (1954) was modified by Mantel and Haenszel (1959) to derive the test formula. The Cochran–Mantel–Haenszel test has also been used to quantify the conclusion of meta-analyses dealing with multiple studies that look at the same binary outcome in two-arm trials. Box 1 provides some examples of comparison of independent proportions from published literature.
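For a set of 2 × 2 tables, the composite statistic can be sketched as follows. This is the usual Mantel–Haenszel form without a continuity correction, which is an assumption since the article does not spell out the formula; the two strata are hypothetical.

```python
def mantel_haenszel_chi2(strata):
    """Composite chi-square (df = 1) across several 2 x 2 tables.

    Each stratum is ((a, b), (c, d)). For every stratum, the observed
    top-left cell is compared with its expectation under no association.
    """
    sum_diff = 0.0
    sum_var = 0.0
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        r1, r2 = a + b, c + d
        c1, c2 = a + c, b + d
        sum_diff += a - r1 * c1 / n                       # observed - expected
        sum_var += r1 * r2 * c1 * c2 / (n * n * (n - 1))  # hypergeometric variance
    return sum_diff ** 2 / sum_var


strata = [
    ((30, 70), (15, 85)),   # hypothetical stratum 1 (e.g., one study site)
    ((12, 38), (8, 42)),    # hypothetical stratum 2
]
print(round(mantel_haenszel_chi2(strata), 2))
```

With a single stratum the result equals the Pearson χ2 multiplied by (N − 1) / N, which gives a convenient check on the arithmetic.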

Box 1

Examples of comparison of independent proportions from published literature


Chi-square Goodness-of-fit Test

This represents a different use of the χ2 statistic. The test is applied in relation to a single categorical variable drawn from a population through random sampling. It is used to determine whether sample data are consistent with a hypothesized distribution in the population.

For example, suppose a pharmaceutical company has printed promotional cards which are being enclosed with packs of a new health drink. It claims that 20% of its cards are gold cards, 30% silver cards, and 50% bronze cards and that these cards enable consumers to claim discounts (maximum for the gold card) on purchase of the next pack. We could gather a random sample of these promotional cards, calculate the expected frequencies of each card category, and compare them with the observed frequencies using a χ2 goodness-of-fit test to see whether our sample distribution differs significantly from the distribution claimed by the company. A significant P value would mean we have to reject the null hypothesis of a good fit.
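The card example can be worked through with a hypothetical sample. The claimed proportions (20%, 30%, 50%) come from the text; the observed counts in a sample of 100 cards are invented for illustration. With three categories df = 2, and for df = 2 the upper-tail P value of the χ2 distribution is simply exp(−χ2 / 2).

```python
import math


def goodness_of_fit(observed, claimed_proportions):
    """Chi-square goodness-of-fit statistic against claimed proportions."""
    n = sum(observed)
    expected = [n * p for p in claimed_proportions]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, expected


def p_value_df2(chi2):
    """Upper-tail P value for a chi-square statistic with df = 2."""
    return math.exp(-chi2 / 2)


observed = [12, 28, 60]               # hypothetical gold, silver, bronze counts
chi2, expected = goodness_of_fit(observed, [0.20, 0.30, 0.50])
print(expected)                       # [20.0, 30.0, 50.0]
print(round(chi2, 2))                 # 5.33
print(round(p_value_df2(chi2), 3))    # 0.069
```

In this invented sample the P value exceeds 0.05, so the sample would not contradict the company's claim at that level.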

The χ2 goodness-of-fit test is an alternative to the Anderson–Darling and Kolmogorov–Smirnov goodness-of-fit tests. It can be applied to discrete probability distributions such as the binomial and the Poisson. The Kolmogorov–Smirnov and Anderson–Darling tests are restricted to continuous distributions, for which they are actually more powerful.

Likelihood Ratio Chi-square Test

The likelihood ratio Chi-square test, also called the likelihood ratio test or G test, is an alternative procedure to test the hypothesis of no association of columns and rows in a contingency table of nominal data. Although calculated differently, the likelihood ratio χ2 is interpreted in the same way as Pearson's χ2. For large samples, the likelihood ratio χ2 will be close to the Pearson χ2. Even for smaller samples, it rarely leads to substantially different results if a continuity correction is applied and is therefore infrequently used.
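The G statistic is usually written as G = 2 Σ O ln(O / E), with the expected counts obtained exactly as for Pearson's test. A minimal sketch on hypothetical counts, with illustrative function names:

```python
import math


def expected_counts(observed):
    """Expected count per cell: row total * column total / overall total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def g_statistic(observed):
    """Likelihood ratio chi-square: 2 * sum of O * ln(O / E)."""
    expected = expected_counts(observed)
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            if o > 0:                 # a cell with O = 0 contributes nothing
                total += o * math.log(o / e)
    return 2 * total


observed = [[30, 70], [15, 85]]       # hypothetical counts
print(round(g_statistic(observed), 2))
```

For this table G is about 6.55, close to the Pearson value of about 6.45, in line with the statement that the two statistics agree for reasonably large samples.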

McNemar's Chi-square Test

McNemar's χ2 test (after Quinn McNemar, 1947), also simply called McNemar's test, assesses the difference between paired proportions. Thus, it is used when the frequencies in a 2 × 2 table represent paired (dependent) samples or observations. The null hypothesis is that the paired proportions are equal.

It is important to note that the layout of the contingency table for analyzing paired data is different from that used with unpaired samples. For paired data, a contingency table is created in one of the ways depicted in Table 3 depending on the nature of the data:

Table 3

Data arrangement in two-dimensional contingency table for the McNemar's test

The calculation of the McNemar's χ2 statistic is different from that described above for the Pearson's χ2 test. The test involves calculating the difference between the number of discordant pairs in each category and scaling the square of this difference by the total number of discordant pairs. The value of the McNemar's χ2 is referred to the χ2 distribution table with df 1.

An important observation when interpreting McNemar's test is that the elements of the concordant diagonal do not contribute to the decision about whether (in the above example) the pre- or post-intervention condition is more favorable. Thus, the sum b + c (b and c being the counts of the two kinds of discordant pairs) can be small and the statistical power of the test can be low even though the total N is large. If either b or c is too small or b + c is <25, the traditional advice is not to use McNemar's test and instead use an alternative called the exact binomial test. If this is not feasible, the Edwards continuity correction (after Allen Edwards) version of the McNemar's test can be used to approximate the binomial exact P value.
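The three options mentioned above can be compared on hypothetical discordant counts. Here b and c denote the two discordant cells; the statistic is taken as (b − c)² / (b + c), the Edwards-corrected version as (|b − c| − 1)² / (b + c), and the exact alternative as a two-sided binomial test of b against b + c with probability 0.5. The function names are illustrative.

```python
from math import comb


def mcnemar_chi2(b, c):
    """McNemar's chi-square (df = 1) from the two discordant counts."""
    return (b - c) ** 2 / (b + c)


def mcnemar_chi2_edwards(b, c):
    """McNemar's chi-square with the Edwards continuity correction."""
    return (abs(b - c) - 1) ** 2 / (b + c)


def mcnemar_exact_p(b, c):
    """Two-sided exact binomial P value for the discordant pairs."""
    n, k = b + c, min(b, c)
    one_sided = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * one_sided)


b, c = 15, 5                                # hypothetical discordant pairs
print(mcnemar_chi2(b, c))                   # 5.0
print(mcnemar_chi2_edwards(b, c))           # 4.05
print(round(mcnemar_exact_p(b, c), 4))      # 0.0414
```

Since b + c = 20 is below 25 in this invented example, the exact P value would be the one to report under the traditional advice.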

An extension of McNemar's test applies to situations where there are clusters of paired data in which the pairs within a cluster may not be independent, but independence holds between different clusters. An example is analyzing the effectiveness of a dental procedure; in this case, a pair corresponds to the treatment of an individual tooth in patients who might have multiple teeth treated; the effectiveness of treatment of two teeth in the same patient is not likely to be independent, but the treatment of two teeth in different patients is more likely to be independent.

The sample size requirement sometimes specified for the McNemar's test is that the number of discordant pairs should be at least 10. For small samples, a continuity correction, as stated above, may be applied during the calculation if the number of discordant pairs is small. However, the test cannot be applied to a table larger than 2 × 2. In such a situation, Cochran's Q test may be used.

Exact versions of the McNemar's test, similar to the Fisher's exact test, have been devised but are yet to be readily available in computer software.

Cochran's Q Test

Cochran's Q test (after William Gemmell Cochran, 1950) is essentially a generalization of the McNemar's test that compares more than two related proportions. If there is no difference between the proportions, the test statistic Q has, approximately, a χ2 distribution with (r − 1) df, where r is the number of related proportions being compared.

Although the test assumes random sampling, it is a nonparametric test and does not require a normal distribution. The outcome is to be coded as binary responses with the same interpretation across categories. It should not be applied to small sample sizes. Since Cochran's Q test is for related samples, cases with missing observations for one or more of the variables are automatically excluded from the analysis by software that performs the test.
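The Q statistic can be sketched from a subjects-by-conditions matrix of 0/1 responses using the standard formula Q = (k − 1)(k ΣCj² − T²) / (k T − ΣRi²), where k is the number of related conditions, Cj the column totals, Ri the row totals and T the grand total. The six-subject data set is hypothetical and far too small for real use; it only illustrates the arithmetic.

```python
import math


def cochrans_q(data):
    """Cochran's Q for rows = subjects, columns = related binary outcomes."""
    k = len(data[0])                           # number of related conditions
    col_totals = [sum(col) for col in zip(*data)]
    row_totals = [sum(row) for row in data]
    t = sum(row_totals)
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - t * t)
    denominator = k * t - sum(r * r for r in row_totals)
    if denominator == 0:
        raise ValueError("every subject gave identical responses throughout")
    return numerator / denominator, k - 1      # statistic and its df


data = [                                       # hypothetical 0/1 responses
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
    [0, 1, 0],
]
q, df = cochrans_q(data)
print(round(q, 2), df)                         # 5.2 2
print(round(math.exp(-q / 2), 3))              # 0.074 (P value, valid for df = 2 only)
```

Subjects whose responses are all 0 or all 1 add nothing to the denominator, which mirrors the way concordant pairs drop out of McNemar's test.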

The Cochran's Q test is commonly used for assessing the hypothesis of no inter-rater difference in situations where a number of raters judge the presence or absence of some characteristic in a group of subjects. Box 2 provides examples of comparison of paired proportions from published literature.

Box 2

Examples of comparison of paired proportions from published literature


Binomial Test and Sign Test

The binomial test is an exact test of the statistical significance of deviations from a theoretically expected distribution of observations into two categories, that is, a binomial distribution. One common use of the binomial test is to test the null hypothesis that two "success or failure" type outcomes are equally likely to occur. For example, suppose a die is rolled 100 times and a six is obtained 24 times. Since, theoretically, for an unbiased die a six should appear about 17 times, we can use the binomial test to determine if the observed count is significantly different from the expected count.

The binomial test has certain assumptions. The number of observations has to be small compared to the possible number of observations. The observations are dichotomous, i.e., each observation will fall into one of just two categories. Individual observations are independent and the probability of "success" or "failure" is the same for each observation. For large samples, the binomial distribution is well approximated by continuous probability distributions, and these are used as the basis for alternative tests that are simpler, such as the Pearson's χ2 test or the G test. However, for small samples, these approximations break down, and the binomial test is a better choice. When the observations can fall into more than two categories, and an exact test is required, the multinomial test, based on the multinomial distribution, must be used instead of the binomial test.
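The die example (24 sixes in 100 rolls, probability 1/6 under the null hypothesis) can be evaluated exactly with the binomial probability mass function. The sketch below computes an upper-tail P value and a two-sided P value defined as the total probability of all outcomes no more likely than the observed one; that two-sided definition is one common convention and is an assumption here, as are the function names.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def upper_tail(k, n, p):
    """Probability of k or more successes."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


def binomial_test_two_sided(k, n, p):
    """Total probability of outcomes no more likely than the observed one."""
    p_obs = binom_pmf(k, n, p)
    total = sum(q for q in (binom_pmf(i, n, p) for i in range(n + 1))
                if q <= p_obs * (1 + 1e-9))
    return min(1.0, total)


n, k, p = 100, 24, 1 / 6              # figures from the die example
print(round(n * p, 1))                # 16.7 expected sixes
print(upper_tail(k, n, p))            # one-sided exact P value
print(binomial_test_two_sided(k, n, p))
```

Printing both values makes it easy to see how much the conclusion depends on whether a one-sided or a two-sided question is being asked.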

The sign test is a special case of the binomial test where the probability of success under the null hypothesis is 0.5. It is used in repeated measurement designs that measure a categorical dependent variable on the same subjects before and after some intervention. It tests if the direction of change of counts is random or not. The change is expressed as a binary variable taking the value "+" if the dependent variable value following the intervention is larger than before and "−" if it is smaller. When there is no change, the change is coded 0 and is ignored in the analysis. When we consider this test, we are not focusing on the magnitude of the effect but rather on its direction.

For example, suppose we measure the average number of cigarettes smoked daily by a group of 15 new smokers before and after they are exposed to counseling sessions on the dangers of smoking. After the intervention, out of these 15 individuals, 5 smoke the same number of cigarettes, 9 smoke less, and 1 smokes more. Can we consider that the counseling intervention diminished the smoking tendency? This problem is equivalent to considering 9 favorable outcomes against 1 unfavorable outcome, with the probability of each outcome being 0.5. In this case, a sign test would yield P < 0.05 and we can conclude that counseling did alter the smoking behavior favorably.
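The smoking example can be checked directly. The 5 unchanged subjects are dropped, leaving 10 informative pairs with 1 "+" (smokes more) and 9 "−" (smokes less); the function name is illustrative.

```python
from math import comb


def sign_test(n_plus, n_minus):
    """Sign test on the non-tied pairs; returns (one_sided, two_sided) P values."""
    n = n_plus + n_minus                 # ties have already been excluded
    k = min(n_plus, n_minus)
    one_sided = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return one_sided, min(1.0, 2 * one_sided)


one_sided, two_sided = sign_test(n_plus=1, n_minus=9)
print(round(one_sided, 4), round(two_sided, 4))   # 0.0107 0.0215
```

Both values are below 0.05, in agreement with the conclusion drawn in the text.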

In clinical trials, the binomial test could be used to assess whether a single study site in a multicentric trial has a rate of a particular event similar to that observed in the entire study population. In epidemiology, this test can show whether the observed prevalence in a sample matches the known prevalence in the population. Since the sign test is a statistical method to test for consistent differences between pairs of observations, it has been used to test the null hypothesis that the difference between the median of a numerical variable X and the median of another numerical variable Y is zero, assuming continuous distributions of the two variables X and Y, in the situation when we can draw paired samples from X and Y. It can also test if the median of a data set is significantly greater or less than a specified value.

Beyond the Chi-square Statistic in Comparing Categorical Variables between Groups

The χ2 statistic is used to estimate whether or not a significant difference exists between groups with respect to categorical variables, but the P value it yields does not indicate the strength of the difference or association.

This information can be obtained from the relative risk (risk ratio) or odds ratio. Both are measures of dichotomous association in that they are applied to 2 × 2 tables in such a way as to measure the strength of relationships. These estimates are routinely used in epidemiological research where the relationship between exposure to risk factors and adverse outcomes needs to be studied. They are increasingly being used in interventional studies. These issues will be considered in the risk assessment module.
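As a brief illustration of the two measures, the sketch below uses a hypothetical 2 × 2 table with exposure in rows and outcome in columns (a, b for the exposed with and without the outcome; c, d for the unexposed). The layout and function names are assumptions made for the example.

```python
def relative_risk(a, b, c, d):
    """Risk of the outcome in the exposed divided by risk in the unexposed."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Odds of the outcome in the exposed divided by odds in the unexposed."""
    return (a * d) / (b * c)


# Hypothetical: 30 of 100 exposed and 15 of 100 unexposed develop the outcome.
a, b, c, d = 30, 70, 15, 85
print(round(relative_risk(a, b, c, d), 2))   # 2.0
print(round(odds_ratio(a, b, c, d), 2))      # 2.43
```

Unlike the P value, both numbers describe how strong the association is, with 1 indicating no association.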

Financial support and sponsorship

Nil.

Conflicts of interest

There are no conflicts of interest.




Source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4966396/
