Indian Journal of Dental Research

Year : 2007  |  Volume : 18  |  Issue : 4  |  Page : 163--167

Use of the generalized linear models in data related to dental caries index

SB Javali1, Parameshwar V Pandit2
1 Department of Public Health Dentistry, SDM College of Dental Sciences and Hospital, Dharwad - 580 009, Karnataka, India
2 Department of Statistics, Bangalore University, Jnanabharati, Bangalore - 560 056, Karnataka, India

Correspondence Address:
S B Javali
Department of Public Health Dentistry, SDM College of Dental Sciences and Hospital, Dharwad - 580 009, Karnataka


The aim of this study is to encourage and initiate the application of generalized linear models (GLMs) in the analysis of the covariates of decayed, missing, and filled teeth (DMFT) index data, which are not necessarily normally distributed. GLMs can be fitted under many underlying distributions; here, the Poisson distribution with the log link function and the binomial distribution with the Logit and Probit link functions are considered. The Poisson model is used for modeling the DMFT index data, and the Logit and Probit models are employed to model the dichotomous outcome of DMFT = 0 and DMFT ≠ 0 (caries free/caries present). The data comprised 7188 subjects aged 18-30 years from the study on the oral health status of Karnataka state conducted by SDM College of Dental Sciences and Hospital, Dharwad, Karnataka, India. The Poisson model and the binomial models (Logit and Probit) gave dissimilar results at the 5% level of significance (P < 0.05). The binomial models were a poor fit, whereas the Poisson model showed a good fit for the DMFT index data. Therefore, a suitable modeling approach for DMFT index data is to use a Poisson model for the DMFT response and a binomial model for the caries-free and caries-present outcome (DMFT = 0 and DMFT ≠ 0). These GLMs allow separate estimation of the covariates that influence the magnitude of caries.

How to cite this article:
Javali S B, Pandit PV. Use of the generalized linear models in data related to dental caries index. Indian J Dent Res 2007;18:163-167

The decayed, missing, and filled teeth (DMFT) index is one of the most commonly used indices in epidemiological studies for measuring the degree of caries experience of subjects with primary as well as permanent dentition. It is the simple count of the number of decayed, missing, and filled teeth, which represents the cumulative severity of the dental caries experience. In such studies, the mean DMFT is commonly quoted for the total sample and used as a measure to compare the caries experience between subgroups.
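As a minimal sketch of the arithmetic described above, the following Python snippet computes a DMFT score per subject and the mean DMFT per subgroup. The subject records and the subgroup labels are hypothetical and are not taken from the study data.

```python
from statistics import mean

def dmft(decayed, missing, filled):
    """Return the DMFT score: the count of decayed, missing, and filled teeth."""
    return decayed + missing + filled

# Hypothetical subjects: (subgroup, decayed, missing, filled)
subjects = [
    ("urban", 2, 0, 1),
    ("urban", 0, 0, 0),
    ("rural", 1, 1, 0),
    ("rural", 0, 0, 0),
    ("rural", 3, 0, 2),
]

def mean_dmft_by_group(rows):
    """Group the DMFT scores by subgroup label and return the mean per subgroup."""
    scores = {}
    for group, d, m, f in rows:
        scores.setdefault(group, []).append(dmft(d, m, f))
    return {group: mean(values) for group, values in scores.items()}

group_means = mean_dmft_by_group(subjects)
print(group_means)
```

Comparing such subgroup means is informative only to the extent that the mean summarizes the distribution well, which is the point questioned in the paragraphs that follow.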

In the earlier studies reported in the literature, it was generally assumed that DMFT index data fulfilled the normality assumption. Hence, multiple linear regression (MLR) models were commonly used to estimate the influence of covariates. Numerous studies have determined the influence of different covariates of dental caries experience, such as gender, age, sweet consumption, frequency of brushing, etc. In the vast majority of these studies, the caries data were analyzed using traditional MLR techniques, which assume that dental caries indices are normally distributed. [1],[2],[3],[4],[5]

It has been observed that the worldwide prevalence of dental caries, especially in the developed countries, [6] has declined rapidly during the last 20 years. Thus, the DMFT index, used either to assess the prevalence or incidence of dental caries, has become highly positively skewed among subjects and adults. [7] These changes in the DMFT have had the effect of increasing the proportion of zeros in the distribution.

However, as the prevalence of dental caries declined and the proportion of zeros increased, various investigators have questioned the assumption of normality and have stressed that greater focus should be given to the caries-free component. [8] When the normality assumption does not hold, common regression techniques cannot be applied to the study data. When the caries-free component is so large that the normality assumption is not applicable, there arises a need to describe the nature of the distribution of the DMFT index using models that are readily available in existing statistical software. Various investigators have proposed techniques to transform the data so that the normality assumption holds more closely. However, some statisticians maintain that a discrete index cannot strictly follow a normal distribution, either untransformed or in any transformed state. Numerous models have therefore been proposed by various authors to describe the distribution of the DMFT index. Grainger and Reid [9] suggested that the negative binomial distribution is the most satisfactory model for dental caries; Turlot et al. [10] proposed a model based on a Poisson 'with zeros' distribution; and Fabien et al. [11] introduced GLMs with the Poisson distribution to compare caries indices.
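For orientation, the three count distributions named above can be written in their usual textbook forms. The symbols (mean \mu, dispersion k, extra-zero probability \omega) are notation introduced here, and the zero-inflated form is only an assumption about what the Poisson 'with zeros' model looks like; the exact parameterization used by Turlot et al. is not given in this article.

```latex
% Poisson with mean \mu (variance equal to \mu)
P(Y = y) = \frac{e^{-\mu}\,\mu^{y}}{y!}, \qquad y = 0, 1, 2, \dots

% Negative binomial with mean \mu and dispersion k > 0 (variance \mu + \mu^{2}/k)
P(Y = y) = \frac{\Gamma(y + k)}{\Gamma(k)\, y!}
           \left(\frac{k}{k + \mu}\right)^{k}
           \left(\frac{\mu}{k + \mu}\right)^{y}

% Zero-inflated Poisson: an extra mass \omega at zero (assumed form)
P(Y = 0) = \omega + (1 - \omega)\, e^{-\mu}, \qquad
P(Y = y) = (1 - \omega)\, \frac{e^{-\mu}\,\mu^{y}}{y!}, \quad y \ge 1
```

Under a plain Poisson model the expected proportion of caries-free subjects is e^{-\mu}, so a sample with many more zeros than this suggests that one of the other two forms may describe the data better.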

In public dental health literature, very few studies can be found where the GLM has been applied to caries count data. [8],[11] The aim of the present study was to initiate the application of GLMs in analyzing the covariates of the DMFT index data that do not require the assumption of normality.

Materials and Methods

Study area

The study was carried out from December 2000 to December 2001 in all the districts of Karnataka, including Bangalore district. The state is situated on the western edge of the Deccan plateau on the west coast of South India. It is one of the largest and most populous states in India. Karnataka has an area of 191,791 km² and an approximate population of 44.9 million. The literacy rate is nearly 60%. The domestic production per capita is average when compared with the rest of the country.

Study population and sampling procedure

Adults aged 18-30 years from Karnataka state were included in the study, and a multistage cluster sampling procedure was used. Districts of the state were the primary sampling units; these were divided into talukas. A total of 42 talukas were selected randomly; they contained 41 urban and 117 rural units. From the selected urban and rural units, the required sample of 7188 subjects aged 18-30 years was obtained. The mean age of the subjects was 25.98 ± 5.45 years.
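The sampling stages described above can be sketched roughly as follows. This is an illustration only: the sampling frame, the function name multistage_sample, and the fixed number of subjects drawn per unit are hypothetical, since the article does not state how subjects were selected within units.

```python
import random

def multistage_sample(districts, n_talukas, subjects_per_unit, seed=0):
    """Select talukas at random, then draw subjects within each unit of a selected taluka."""
    rng = random.Random(seed)
    # Stage 1: pool the talukas of all districts and select some at random.
    all_talukas = [t for d in districts for t in d["talukas"]]
    chosen = rng.sample(all_talukas, n_talukas)
    # Stage 2: within each urban or rural unit of a chosen taluka, draw subjects.
    sample = []
    for taluka in chosen:
        for unit in taluka["units"]:
            k = min(subjects_per_unit, len(unit["subjects"]))
            sample.extend(rng.sample(unit["subjects"], k))
    return chosen, sample

# Hypothetical frame: 4 districts x 3 talukas x 2 units x 10 subjects.
districts = [
    {"name": f"district_{d}",
     "talukas": [
         {"name": f"taluka_{d}_{t}",
          "units": [
              {"kind": kind,
               "subjects": [f"s_{d}_{t}_{kind}_{i}" for i in range(10)]}
              for kind in ("urban", "rural")
          ]}
         for t in range(3)
     ]}
    for d in range(4)
]

chosen_talukas, sampled_subjects = multistage_sample(
    districts, n_talukas=5, subjects_per_unit=4
)
print(len(chosen_talukas), len(sampled_subjects))
```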

Clinical examination

Five well-qualified dentists, assisted by three recorders, examined all the subjects. The examinations were carried out at the subjects' homes using plane mouth mirrors, WHO periodontal probes, and natural/artificial light. The DMFT examinations were conducted following standardized and widely accepted criteria, as recommended by the WHO report on oral health. [12] Besides the oral health information, data on socio-demographic and other factors (age, gender, religion, occupation, and location) and on oral hygiene practices (OHP) were collected by a structured personal interview. For a more detailed description of the procedure followed, refer to Oral Health Status, Karnataka state. [13] In each district, 20 subjects were examined twice by the same dentist to assess intra-examiner agreement. The kappa value for intra-examiner agreement on tooth status across the districts ranged from 0.61 to 0.80.
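The intra-examiner agreement statistic can be illustrated with a small sketch of Cohen's kappa for two examinations of the same teeth. The tooth-status codes and the ratings below are hypothetical; the article does not report the underlying agreement tables.

```python
from collections import Counter

def cohen_kappa(first, second):
    """Cohen's kappa for two sets of categorical ratings of the same items."""
    if len(first) != len(second) or not first:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(first)
    # Observed agreement: share of items given the same code both times.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Chance agreement: product of the marginal proportions, summed over codes.
    c1, c2 = Counter(first), Counter(second)
    expected = sum(c1[c] * c2[c] for c in set(c1) | set(c2)) / (n * n)
    if expected == 1.0:
        return 1.0  # both examinations used one identical code throughout
    return (observed - expected) / (1 - expected)

# Hypothetical status of 10 teeth at the first and the repeat examination.
exam_1 = ["sound"] * 6 + ["decayed"] * 4
exam_2 = ["sound"] * 5 + ["decayed"] + ["decayed"] * 3 + ["sound"]

kappa = cohen_kappa(exam_1, exam_2)
print(round(kappa, 3))
```

Values between 0.61 and 0.80, as reported in the study, are conventionally described as substantial agreement.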

Data analysis

The authors were interested in establishing the covariates of caries experience (DMFT index). For convenience in fitting GLMs, the DMFT index was treated as the response variable, and the dummy variable 'female' was created to represent gender. To assess the influence of OHP, a dummy variable was created for subjects who cleaned their teeth without using brush/paste/powder. Similarly, all the other covariates were fitted as dummy variables, except for age (in years), which was fitted as a continuous variable.
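A minimal sketch of this coding scheme is given below. The record fields (gender, location, cleaning_aid, age), the 'urban' dummy, and the choice of reference categories are assumptions made for illustration; only the 'female' dummy, the dummy for cleaning without brush/paste/powder, and continuous age are described in the text.

```python
def encode_subject(record):
    """Turn one interview record into model covariates (dummy variables plus age)."""
    return {
        "female": 1 if record["gender"] == "female" else 0,
        # Assumed coding: urban = 1, rural = 0.
        "urban": 1 if record["location"] == "urban" else 0,
        # 1 when teeth are cleaned without brush/paste/powder.
        "no_brush": 0 if record["cleaning_aid"] in {"brush", "paste", "powder"} else 1,
        # Age stays continuous, in years.
        "age": float(record["age"]),
    }

# Hypothetical interview record.
example = {"gender": "female", "location": "rural", "cleaning_aid": "finger", "age": 24}
covariates = encode_subject(example)
print(covariates)
```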

As can be seen in [Figure 1] and [Table 1], the distribution of the DMFT index is markedly skewed, with the majority of the subjects having a low score and only a minority a high score. About 52% (n = 3739) of the 18-30 year old adults presented without any sign of caries experience. Therefore, before the GLMs were applied, the Shapiro-Wilk test was used to check whether the DMFT index data satisfied the assumption of normality; the data were found not to be normally distributed. Hence, the traditional MLR models were not appropriate, whereas these characteristics suit various GLMs. First, we fitted GLMs with the Logit link function, $f(\pi) = \log\{\pi/(1-\pi)\}$, and the Probit link function, $f(\pi) = \Phi^{-1}(\pi)$ (where $\Phi$ is the standard normal cumulative distribution function), to the dichotomized response (DMFT = 0 and DMFT ≠ 0) on a set of covariates. Second, the Poisson model with the log link function, $\eta_i = \log(\mu_i) = x_i^{\top}\beta$ (where $\mu_i$ is the expected DMFT score of subject $i$), was applied to the same set of covariates; it means that the DMFT index data are not considered as independent events. Here, the majority of the covariates are of a discrete or categorical nature and are more likely to fulfill the underlying assumptions of the Logit and Probit link functions. The findings of the Logit and Probit models were compared with those of the Poisson model.
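The three link functions and their inverses can be sketched in plain Python as follows. The function names are chosen here for illustration and do not correspond to any particular statistical package; the value 0.52 is the caries-free proportion reported above.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def logit(p):
    """Logit link: log odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Inverse logit: map a linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def probit(p):
    """Probit link: inverse standard normal CDF of p in (0, 1)."""
    return _STD_NORMAL.inv_cdf(p)

def inv_probit(eta):
    """Inverse probit: standard normal CDF of the linear predictor."""
    return _STD_NORMAL.cdf(eta)

def log_link(mu):
    """Log link of the Poisson model: log of a positive mean."""
    return math.log(mu)

def inv_log_link(eta):
    """Inverse log link: map a linear predictor back to a positive mean."""
    return math.exp(eta)

p_caries_free = 0.52
print(logit(p_caries_free), probit(p_caries_free))
```

The Logit and Probit links both map a probability onto the whole real line and differ mainly in scale, whereas the log link maps a positive mean count onto the real line.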


A test of normality was employed to see if the DMFT index data followed a Gaussian distribution. The Shapiro-Wilk statistic provided by the univariate procedure of STATISTICA 5.0 [14] and SPSS 11.0 [15] (P ...

... [8] felt that the use of GLM was more appropriate for analyzing the distribution of caries data in children. The selection of the GLM is a difficult issue. However, on examination of the sample DMFT distribution, the investigator should be able to make a good estimate of which model is best suited to their requirements, and for further validation, nonparametric goodness-of-fit tests should also be applied.
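One way to carry out such a check is a Pearson chi-square comparison of the observed DMFT frequencies with those expected under a fitted Poisson distribution. The sketch below is an illustration under stated assumptions: the frequencies are hypothetical, the last cell is treated as 'that score or more', and the chi-square test is one common option, not necessarily the test the authors had in mind.

```python
import math

def poisson_pmf(y, mu):
    """Poisson probability of observing the count y when the mean is mu."""
    return math.exp(-mu) * mu ** y / math.factorial(y)

def poisson_gof(freq):
    """Pearson chi-square fit of a Poisson distribution to count frequencies.

    freq[y] is the number of subjects with score y; the last cell is treated
    as 'that score or more', and its members are counted at that score when
    the mean is computed (an approximation).
    """
    n = sum(freq)
    mu = sum(y * c for y, c in enumerate(freq)) / n  # sample mean estimates mu
    last = len(freq) - 1
    expected = [n * poisson_pmf(y, mu) for y in range(last)]
    expected.append(n - sum(expected))  # tail cell: P(Y >= last)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(freq, expected))
    dof = len(freq) - 2  # cells minus 1, minus one estimated parameter
    return mu, expected, chi2, dof

# Hypothetical frequencies for DMFT = 0, 1, 2, 3, 4, and 5 or more.
observed = [520, 180, 120, 80, 50, 50]
mu_hat, expected, chi2, dof = poisson_gof(observed)
print(round(mu_hat, 2), round(expected[0], 1), round(chi2, 1), dof)
```

With these made-up counts the statistic far exceeds 9.488, the 5% critical value of the chi-square distribution with 4 degrees of freedom, mainly because there are many more zeros than a Poisson distribution with the same mean would predict.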

For consistent estimation, the influence of the covariates was measured with GLM procedures, i.e., the Logit, Probit, and Poisson models, and was interpreted accordingly. It is important to note that the effects of the covariates were multiplicative in the Poisson model, whereas they were additive in nature in the other models, including the normal models. However, in the Logit and Probit models, it is common practice to estimate odds ratios, as they help to interpret the measure of association and are frequently used in dental epidemiological studies.
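The multiplicative interpretation can be made concrete with a single dummy covariate. In that special case the fitted values of both models reproduce the group summaries, so the coefficients have closed forms: the exponentiated Poisson coefficient is the ratio of the group mean DMFT scores, and the exponentiated Logit coefficient is the odds ratio. The scores below are hypothetical, and coding 'caries present' (DMFT ≠ 0) as the event is an assumption made for this sketch.

```python
import math
from statistics import mean

# Hypothetical DMFT scores for two groups defined by one dummy covariate.
dmft_x0 = [0, 0, 0, 1, 2, 0, 3, 0]  # covariate = 0
dmft_x1 = [0, 2, 4, 1, 0, 3, 5, 1]  # covariate = 1

def poisson_coefficients(y0, y1):
    """Closed-form Poisson GLM (log link) estimates for one dummy covariate."""
    b0 = math.log(mean(y0))       # log mean in the reference group
    b1 = math.log(mean(y1)) - b0  # log of the ratio of group means
    return b0, b1

def logit_coefficients(y0, y1):
    """Closed-form Logit estimates for the event 'caries present' (DMFT != 0)."""
    def log_odds(ys):
        present = sum(1 for y in ys if y != 0)
        absent = len(ys) - present
        return math.log(present / absent)
    b0 = log_odds(y0)
    b1 = log_odds(y1) - b0        # log odds ratio
    return b0, b1

pb0, pb1 = poisson_coefficients(dmft_x0, dmft_x1)
lb0, lb1 = logit_coefficients(dmft_x0, dmft_x1)

rate_ratio = math.exp(pb1)  # mean DMFT is multiplied by this factor
odds_ratio = math.exp(lb1)  # odds of caries present are multiplied by this factor
print(round(rate_ratio, 3), round(odds_ratio, 3))
```

In this made-up example the mean DMFT in the second group is about 2.67 times that of the reference group, while the odds of having any caries are 5 times higher. The two models answer different questions, which is why their results need not agree.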

In general, the findings of the present study indicate that the Poisson model could be adopted for DMFT, and Logit and Probit models could be adopted for carious and caries-free subjects. These estimations allow us to separate those factors which influence the actual presence or absence of caries and those factors which influence the magnitude of caries.

In GLM procedures, the link function should be chosen carefully to obtain more accurate estimates of the covariates of dental caries. Considerable differences were observed between the results obtained using the Poisson distribution and those obtained using the binomial distribution. Moreover, with the same probability distribution, the choice of the link function was very important, as it depends on the nature of the clinical data.


The comparison of the results estimated by the use of the Poisson distribution and binomial (Logit and Probit) distribution displayed some differences. These differences could lead to incorrect interpretations, affecting the general results of the studies. Therefore, the appropriate GLMs are to be chosen when comparing the covariates of the data on caries indices.


Acknowledgment

The authors thank Dr. Bhasker Rao, Principal, and Dr. K. V. V. Prasad, Head, and the other staff of the Department of Public Health Dentistry, SDM College of Dental Sciences and Hospital, Dharwad, Karnataka, India for providing the dental caries data used in this paper.


References

1. Dummer PM, Oliver SJ, Hicks R, Kingdon A, Kingdon R, Add M, et al. Factors influencing the caries experience of a group of children at the ages of 11-12 and 15-16 years: Results from an ongoing epidemiological survey. J Dent 1990;18:37-48.
2. Angellio IF, Romano F, Fortunato L, Montanazo D. Procedure of dental caries and enamel defects in children living in areas with different water fluoride concentration. Commun Dent Health 1990;29:424-34.
3. Venobbergen J, Martens L, Lesaffre E, Bogaerts K, Decleack D. Assessing risk indicators for dental caries in primary dentition. Community Dent Oral Epidemiol 2001;29:424-34.
4. Javali SB, Prasad KVV, Tippeswamy V. Determinants of dental caries experience. Indian J Dent Res 2001;12:230-3.
5. Javali SB, Pandit PV. Statistical analysis of data from some determinants of dental caries experience. J Pierre Fauchard Acad 2004;18:59-66.
6. Downer MC. The changing pattern of dental disease over 50 years. Br Dent J 1998;185:36-41.
7. Spencer AJ. Skewed Distributions-new outcome measures. Community Dent Oral Epidemiol 1997;25:52-9.
8. Lewsey JD, Gilthorpe MS, Bulman JS, Bedi R. Is modeling dental caries a normal thing to do? Community Dent Health 2000;17:212-7.
9. Grainger RM, Reid DB. Distribution of dental caries in children. J Dent Res 1954;33:613-23.
10. Turlot JC, Cahen PM, Frank RM. Longitudinal study of the evolution of the frequency of dental caries in a school milieu: A statistical model. Rev Epidemiol Sante Publique 1984;32:398-407.
11. Fabien V, Anne-Matrie OM, Guy H, Pierre-michal C. Use of the generalized linear model with Poisson distribution to compare caries indices. Community Dent Oral Epidemiol 1999;16:93-6.
12. World Health Organization: Oral health surveys. Basic Methods. WHO: Geneva; 1997.
13. Prasad KVV, Thanveer, Joseph J. Oral Health Status Karnataka State; An Epidemiological Survey, MDS, Dissertation, Rajiv Gandhi University of Health Sciences, Bangalore, 1999-2000.
14. Statistical Package: STATISTICA-5.0 version: Tulsa, OK 74104, Stat Soft INC, U. S. A. 1995.
15. Statistical Package for the social sciences: SPSS 11.0.1 users guide, Chicago: SPSS Inc. 2001.