*Corresponding Author:
P. V. Bharatam
Department of Medicinal Chemistry, National Institute of Pharmaceutical Education and Research, Mohali-160 062, India
[email protected]
Date of Submission 24 April 2013
Date of Revision 02 October 2013
Date of Acceptance 13 October 2013
Indian J Pharm Sci 2013;75(6):680-687  


Mechanism-based inhibition of cytochrome P450 involves the bioactivation of the drug to a reactive metabolite, which leads to cytochrome inhibition via various mechanisms. This is generally seen in the Phase I of drug metabolism. However, gemfibrozil (hypolipidemic drug) leads to mechanism-based inhibition after generating glucuronide conjugate (gemfibrozil acyl-β-glucuronide) in the Phase II metabolism reaction. The mechanism involves the covalent binding of the benzyl radical (generated from the oxidation of aromatic methyl group in conjugate) to the heme of CYP2C8. This article deals with the development of a 2D QSAR model based on the inhibitory potential of gemfibrozil, its analogues and corresponding glucuronide conjugates in inhibiting the CYP2C8-catalysed amodiaquine N-deethylation. The 2D QSAR model was developed using multiple linear regression analysis in Accelrys Discovery Studio 2.5 and helps in identifying the descriptors, which are actually contributing to the inhibitory potency of the molecules studied. The built model was further validated using leave one out method. The best quantitative structure activity relationship model was selected having a correlation coefficient (r) of 0.814 and cross-validated correlation coefficient (q 2 ) of 0.799. 2D QSAR revealed the importance of volume descriptor (Mor15v), shape descriptor (SP09) and 3D matrix-based descriptor (SpMax_RG) in defining the activity for this series of molecules. It was observed that volume and 3D matrix-based descriptors were crucial in imparting higher potency to gemfibrozil glucuronide conjugate, as compared with other molecules. The results obtained from the present study may be useful in predicting the inhibitory potential (IC 50 for CYP2C8 inhibition) of the glucuronide conjugates of new molecules and compare with the standard gemfibrozil acyl-β-glucuronide (in terms of pIC 50 values) in early stages of drug discovery and development.


Gemfibrozil, glucuronide conjugation, QSAR, 2D descriptors, mechanism-based inhibition, CYP2C8

Cytochrome P450 (CYP450) is a large family of heme-containing enzymes, which catalyse a variety of metabolic reactions, such as oxidation, reduction, hydroxylation and dealkylation [1,2]. The substrates for these reactions include both the exogenous compounds such as drugs, environmental chemicals, natural products as well as endogenous compounds such as hormones. The importance of CYPs in the biotransformation of lipophilic compounds to the more polar metabolites is widely known. The understanding of inhibition of these enzymes also becomes essential, since it leads to several toxicological complications and drug–drug interactions (DDIs) [3-8]. The inhibition of CYPs can be reversible or irreversible; Mechanism-based inhibition (MBI) is a type of irreversible inhibition, where the substrate is bioactivated to a more reactive metabolite, which leads to CYP inhibition by different mechanisms [5-8]. Various mechanisms by which MBI occurs include are (i) coordination with the heme-iron, (ii) alkylation or arylation of heme-porphyrin and (iii) covalent interaction with active site nucleophilic residues. Several examples of drugs leading to MBI include nortriptyline, stiripentol, paroxetine, fluoxetine, ritonavir and verapamil [5-8].

Gemfibrozil, a hypolipidemic drug, constitutes an interesting example in relation to MBI and DDIs. Gemfibrozil reduces the triglyceride levels and increases high-density lipoprotein (HDL) levels in the patients [9]. Its pharmacological actions are mediated mainly through peroxisome proliferator activated receptor alpha (PPARα) stimulation. It has been reported that the concomitant use of gemfibrozil with other drugs, such as cerivastatin, results in the increased concentration of other drugs and sometimes, fatal cases of rhabdomyolysis may occur. Beckman et al . reported that the plasma concentrations of cerivastatin increased by more than 5 fold in the presence of gemfibrozil, along with a large reduction in the concentrations of the CYP2C8-dependent major metabolite of cerivastatin [9]. Due to certain toxicological complications, the use of gemfibrozil declined. However, this DDI potential of gemfibrozil is currently explored in both in vitro and in vivo studies during drug development.

The glucuronide conjugate of gemfibrozil is believed to be responsible for the toxicological consequences via MBI of cytochrome enzyme (CYP2C8) [10-14]. Mano et al . reported the glucoronidation of gemfibrozil in the hepatocytes by uridine-5′-diphosphoglucuronosyltransferase (UGT) 2B7 enzyme [10]. Ogilvie et al. discussed the inhibitory potential of gemfibrozil glucuronide as the mechanism-based inhibitor of CYP2C8 [11]. The mechanism as described by Baer et al. and Jenkins et al. is shown in fig. 1 [12,14]. It involves the CYP2C8 catalysed benzylic oxidation of the glucuronide, leading to heme-alkylation and inactivation of CYP2C8. It is interesting to note that gemfibrozil in the parent form does not act as the mechanism-based inhibitor; rather, it acts as a reversible inhibitor of CYP2C9. Also, there were differences reported in the in vitro and in vivo potencies of CYP2C8 and CYP2C9 inhibition by gemfibrozil, which suspects the role of the MBI of CYP2C8 by gemfibrozil 1-O-β-glucuronide (gemfibrozil acyl-β-glucuronide) [10-14].


Fig 1: Reaction pathway for conjugation and mechanism-based inhibition.
Reaction pathway for the formation of gemfibrozil glucuronide conjugate, and mechanism-based inhibition of CYP2C8.

Ogilvie et al. and Jenkins et al. recently performed the in vitro studies on conjugates of gemfibrozil and its analogues in causing the inhibition of CYP2C8 catalysed amodiaquine N-deethylation [11,12]. It was observed that only acyl-β-glucuronide of gemfibrozil led to MBI of CYP2C8, whereas few other analogues showed reversible inhibitory properties. The inhibitory potential of all the conjugates and some well-known CYP2C8 inhibitors was reported in terms of IC50 values in micromolar concentration. A higher potency of gemfibrozil glucuronide towards CYP2C8 inhibition was observed in comparison to other analogues and aglycones.

Understanding the specificity of gemfibrozil glucuronide in CYP2C8 inhibition becomes essential, as it would provide information, which may be utilised for the design and development of other CYP inhibitors. Computational methods such as molecular docking and quantitative structure activity studies (QSAR) have been extensively employed to obtain such characteristics features [15-18]. The importance of in silico methods becomes more indispensable, since the in vitro–in vivo extrapolation (IVIVE) of DDIs caused by MBI is challenging. Because of that, various fatal interactions have remained unrecognised for many years.

In this study, a 2D QSAR model, using the IC50 values of a series of molecules studied by Jenkins et al. [12] was built to account for the important descriptors and parameters, which might play a role in determining the higher potency for gemfibrozil glucuronide conjugate towards inhibition of CYP2C8 (as mechanism-based inhibitor), as compared with other analogues. This also helped in understanding the reported biological data and developing predictive correlations. To the best of our knowledge, this is the first QSAR model study for glucuronide conjugates in relation to their inhibitory potential. This type of study would help in predicting the inhibitory potential of glucuronide conjugates of new molecules, in terms of pIC50 values (CYP2C8 inhibition), and CYP2C8 substrates against the standard acyl-β-glucuronide of gemfibrozil, during the early stages of drug discovery and development.

Materials and Methods

QSAR indicates the relationship or correlation between the experimental activity (including inhibitory activity) of a set of compounds and their physicochemical properties using various statistical methods such as multiple linear regression (MLR), partial least squares and others [15-18]. The biological activity, expressed in IC50 or pIC50 values, is used as the dependent variable, while the descriptor values are used as the independent variables in building a QSAR model. In this study, 2D QSAR model was generated using Accelrys Discovery Studio 2.5 [19] on a personal computer with dual core processor and window XP operating system.

Dataset for 2D QSAR model

The IC50 values for the inhibition of CYP2C8 catalysed amodiaquine N-deethylation by molecules were taken from the published work of Jenkins et al., where the experimental IC50 values were evaluated following incubation of the molecules with NADPH-fortified human liver microsomes. The IC50 values were converted to IC50 values and were used as the dependent variables for this study. The 3D structures of compounds were sketched using Sybyl 7.1 [20]. The built molecules were minimised using Powell method for 10 000 iterations with 0.05 kcal/mol Å as gradient. Tripos force field and Gasteiger–Huckel charges were implemented during energy minimisation and geometry optimisations. The energy minimised structures were utilised for calculation of 491 descriptors, including steric, topological, surface, thermodynamic, electronic, charge, spatial, such as molecular weight, molecular refractivity, logP, topological index, partial charge, element count, atom count (all atoms), atom count (carbon), atom count (hydrogen), atom count (oxygen), bond count (all bonds), dipole moment (debye), group count (amine), group count (carboxyl), group count (ether), group count (hydroxyl), group count (methyl), ionisation potential (eV), LogP, polarisability, ring count (all rings), solvent accessibility surface area (Å), using E-Dragon [21]. The 3D descriptors calculated take into account the difference between enantiomers in the dataset of molecules.

Selection of training and test set

The dataset of 29 molecules (fig. 2) was randomly divided into training set (20 compounds) and test set (9 compounds) for MLR models, using pIC50 value as the dependent variable and various 2D descriptors as the independent variables. The structures of training set and test set are tabulated in Tables 1 and 2, respectively, along with the experimental and predicted pIC50 values. The definition of the descriptors utilised in deriving the QSAR equation is given in Table 3. The statistical method utilised in deriving the QSAR equation was MLR method [15-18,22-25].


Fig 2: Structures of gemfibrozil, analogues and conjugates.
Structures of gemfibrozil, gemfibrozil analogues and their corresponding glucuronide conjugates and other known CYP2C8 inhibitors. Gemfibrozil (1) at 200 micromolar concentration and (22) at 100 micromolar concentration (Jenkins et al., 2011).

Molecule Actual pIC50 2D‑QSAR pIC50 Residual
1 3.99 3.74 ‑0.25
2 4.99 4.25 ‑0.74
3 4.66 4.57 ‑0.09
6 3.70 3.81 0.11
7 3.60 3.50 ‑0.10
9 3.70 3.49 ‑0.21
10 3.70 3.89 0.19
11 3.70 3.98 0.28
13 5.46 5.25 ‑0.21
15 4.46 4.37 ‑0.09
16 5.02 4.92 ‑0.10
18 4.00 4.11 0.11
19 4.00 4.16 0.16
21 4.00 3.97 ‑0.03
23 6.00 6.33 0.33
24 4.77 4.55 ‑0.22
25 4.37 4.44 0.07
26 4.10 4.04 ‑0.06
27 4.77 4.76 ‑0.01
28 4.10 4.11 0.01

Table 1: Structures, Experimental And Predicted Ic50 Values For Training Set Of Compounds

Mole Actual pIC50 2D QSAR pIC50 Residual
4 4.65 4.70 0.05
5 4.24 4.53 0.29
8 3.52 3.66 0.14
12 5.85 5.70 ‑0.15
14 5.08 4.82 ‑0.26
17 4.59 4.79 0.20
20 4.66 4.84 0.18
22 3.96 3.69 ‑0.27
29 3.92 4.03 0.11

Table 2: Structures, Experimental And Predicted Ic50 Values For Test Set Of Compounds

Abbreviation Description
Mor15v 3D‑MoRSE descriptor, indicates 15/weighted
  by van der Waals volume
SP09 Indicates shape profile number 9 in Randic
  molecular profile
SpMax_RG 3D matrix‑based descriptor, indicates
  leading Eigen value from reciprocal squared
  geometrical matrix

Table 3: Definitions Of Molecular Descriptors Used In The Qsar Model

Descriptor selection

The selection of potential descriptors from a set of multiple descriptors is a critical step in QSAR analysis. Out of the various 1D, 2D and 3D descriptors calculated by E-Dragon [21], highly correlated descriptors were removed. Pearson’s correlation coefficient was calculated for each pair of descriptors as well as between a descriptor and the activity of a molecule. The correlation matrix was, thereafter, built using the above analysis (Table 4). Simultaneously, the contribution of descriptors for the activity is also calculated in the correlation matrix. This strategy afforded a total of 145 descriptors. The resulting sets of 2D properties in terms of descriptors were given as an input for Genetic Function Approximation (GFA) [25] model protocol under QSAR cluster of protocols in DS 2.5. GFA algorithm is considered to be a stochastic optimisation method, which is inspired by evolutionary principles. It investigates many solutions of a given problem simultaneously to explore various regions of the parameter space [22-25]. In GFA, population size was kept as 100 and the run was set up to 10 000 generations. As the number of compounds in the study is less than 30, maximum equation length was set to 3 with R2 as scoring function. Ten models were generated to select the best set of descriptors, which was tested during MLR analysis.

  Mor15v SP09 SpMax_RG Activity
Mor15v 1.000      
SP09 0.243 1.000    
SpMax_RG 0.187 ‑0.036 1.000  
Activity ‑0.475 0.684 0.358 1.000

Table 4: The Correlation Matrix Of The Three Descriptors Used For Qsar Model

Regression analysis

The dataset of 29 molecules was subjected to regression analysis using MLR as the method for model development. As mentioned earlier, QSAR model was generated using pIC50 value as the dependent variable and various descriptors selected during GFA analysis (Table 3) as the independent variables. The number of variables utilised in the final equation was three for this model. The selection criteria for the best model covered a range of statistical parameters such as correlation coefficient r, cross-validated correlation coefficient r2cv/q2, predictive squared correlation coefficients r2pred, number of compounds in regression (n) and others.

Multiple linear regression analysis

This method is widely used for 2D QSAR model development and is utilised in modelling a linear relationship between the dependent variable (pIC50) and independent variables (descriptors). In the fitting of the MLR model (based on least squares), the sum of squares of differences of observed and predicted pIC50 values is minimised. The values of regression coefficient r2 in MLR are estimated using least squares curve fitting method. The relationship is depicted in the form of a straight line from the model that shows best approximation for all the individual data points. In MLR, the conditional mean of pIC50 (dependent variable) is dependent on many descriptors (more than one independent variable) used in the study. The regression Eqn 1 is generally represented as, Y=B1×x1+b2×x2+c....(1), where, Y refers to the dependent variable (pIC50 in this case), b refers to the regression coefficients for the corresponding independent variables (x), c refers to the regression constant.

Validation of QSAR model

The QSAR model can be validated using leave one out (LOO) method, which is mostly used. Herein, one value from the dependent variable (biological activity in pIC50) is eliminated from the training set, followed by the division of the training dataset into subsets of equal size. The values of dependent variable are determined using the same subsets (utilised in building QSAR model), and this gives the predictive value for the model. The same procedure is thereafter repeated, until each value from the dependent variable has been eliminated once. The q2 is calculated using Eqn 2, q2 = 1−Σ(Ypred−Yact)2/Σ(Yact−Ymean).....(2), where Ypred, Yact, and Ymean refer to the predicted, actual and mean values of the pIC50, respectively. Σ(Ypred-Yact)2 is the predictive residual error sum of squares (PRESS).

The predicted ability of the final QSAR equation obtained was checked using cross-validated squared correlation coefficient (q2) and predictive squared correlation coefficients (r2pred). The fitness plot for the QSAR model supports the statistical significance of the model.

Results and Discussion

Several equations (20) were obtained when the data set was subjected to MLR analysis in order to develop a 2D QSAR model. Finally, the best model was derived after a number of permutations and trials using three descriptors as shown in Table 3. The best model was selected on the basis of statistically significant parameters, such as correlation coefficient (r), its square (r2), cross-validation (q2) and others as discussed before. The best 2D QSAR equation obtained using MLR as, pIC50=−4.2548+3.9824*Mor15v+0.7154*SP09 −2.6936 SpMax_RG, where, n=20training and 9test, r=0.814, r2=0.847, r2adj=0.838, q2=0.799, r2pred=0.718.

In this QSAR study, r2 refers to variance in the biological activity (including inhibitory activity), obtained by the multiplication of the correlation coefficient with 100. Predictive ability of the built QSAR model was evaluated by q2 (LOO method for internal validation), as discussed earlier. Predictive ability was also confirmed for test compounds by external validation (r2pred).

The role of the major descriptors in the QSAR model can be visualised in the correlation matrix, shown in Table 4. It indicates a significant correlation between all the three descriptors. As per the definition of the descriptors, it can be ascertained that the information conveyed by these descriptors is different and distinct. Randic recommends that the descriptors, which are different and distinct in their information content, can be retained for building the QSAR model, even if they show a high correlation. The QSAR equation derived showed a good value of correlation coefficient, r of 0.814, with 84.7% of the variation observed in the inhibitory activity. The validation of the model showed a good internal predictive value (q2) of 0.799.

2D QSAR model generated indicate the positive contribution of Mor15v and SP09 descriptors, and negative contribution of SpMax_RG descriptor. Gemfibrozil glucuronide conjugate contains one hydrophobic phenyl ring and one polar glucuronide moiety, whereas other analogues contain more than two hydrophobic aromatic rings. This tends to increase their lipophilicity, which is indicated by 3D matrix-based descriptor such as SpMax_RG. Therefore, more the value of SpMax_RG, greater negative contribution and lesser would be the inhibitory potential. The descriptors SpMax_RG and Mor15v were found to contribute more significantly to the activity, owing to their higher coefficients in the equation as shown before. This indicates that the volume, shape and 3D matrix-based descriptor of the molecule play a significant role in determining the inhibitory potential of the molecules. Hence, out of the descriptors studied, few play a crucial determining role in imparting higher potency to gemfibrozil glucuronide conjugate, as compared to other molecules.

Fig. 3a shows the plot of experimental v/s predicted pIC50 values for the training set using the obtained QSAR equation (mentioned earlier). The external validation of the test set showed a good predictive ability of 0.718. Table 2 shows the predicted pIC50 values for the test set. Fig. 3b shows the plot of experimental versus predicted pIC50 values for the test set using the obtained QSAR equation. It shall be noted that the different set of descriptors may give different correlation with the inhibitory activity, and this should be taken into consideration while interpreting the derived equation.


Fig 3: The plots of experimental v/s predicted pIC50.
(a) The plot of experimental v/s predicted pIC50 values for the training set, and (b) test set using the obtained QSAR equation.

Glucuronidation plays a pivotal role in the MBI of CYP2C8 by gemfibrozil. 2D QSAR model predicted the contribution of volume, shape and 3D matrix-based descriptors in the inhibitory potential of conjugates of gemfibrozil and its analogues. 2D QSAR studies revealed that Mor15v, SP09 and SpMax_RG were the major contributing descriptors. Mor15v and SP09 contributed positively, while, SpMax_RG contributed negatively to the inhibitory activity. The descriptors Mor15v and SpMax_RG were found to contribute more significantly to the activity, owing to their higher coefficients in the equation. This indicates that the van der Waals volume along with the 3D matrix-based properties of the molecule play a significant role in determining the inhibitory interaction with CYP2C8. A good correlation coefficient (r) of 0.814 was obtained from the QSAR model. The model when validated using LOO method showed a cross-validated correlation coefficient (q2) value of 0.799, and a good predictive value (r2pred, external validation) of 0.718. In this QSAR model, 84.7% of the variance in biological activity was predicted, as indicated by r2 value of 0.847 multiplied by 100.

This QSAR study thus, shows that the inhibitory potential of glucuronide conjugates of gemfibrozil and its analogues is governed by the contribution of volume, shape and 3D matrix-based descriptors. This model would be useful in the prediction of inhibitory potential in terms of pIC50 values (CYP2C8 inhibition) of glucuronide conjugates of new molecules and CYP2C8 substrates against the standard acyl-β-glucuronide of gemfibrozil (potent mechanism-based inhibitor of CYP2C8), during the early stages of drug discovery and development.


NT is thankful to Department of Science and Technology (DST) for DST Inspire Fellowship. The authors thank Mr. Anup Shah for his help and the reviewers for their valuable suggestions.