As legacy toxicogenomics databases have become available, improved data mining approaches are now key to extracting and visualizing subtle relationships between toxicants and gene expression. In the present study, a novel “aggregating bundles of clusters” (ABC) procedure was applied to separate cholestatic from non-cholestatic drugs and model toxicants in the Johnson & Johnson (Janssen) rat liver toxicogenomics database  . Drug-induced cholestasis is an important issue, particularly when a new compound enters the market with this liability, with standard preclinical models often mispredicting this toxicity. Three well-characterized cholestasis-responsive genes (Cyp7a1, Mrp3 and Bsep) were chosen from a previous in-house Janssen gene expression signature; these three genes show differing, non-redundant responses across the 90+ paradigm compounds in our database. Using the ABC procedure, extraneous contributions were minimized in comparisons of compound gene responses. All genes were assigned weights proportional to their correlations with Cyp7a1, Mrp3 and Bsep, and a resampling technique was used to derive a stable measure of compound similarity. The compounds that were known to be associated with rat cholestasis generally had small values of this measure relative to each other but also had large values of this measure relative to non-cholestatic compounds. Visualization of the data with the ABC-derived signature showed a very tight, essentially identically behaving cluster of robust human cholestatic drugs and experimental cholestatic toxicants (ethinyl estradiol, LPS, ANIT and methylene dianiline, disulfiram, naltrexone, methapyrilene, phenacetin, alpha-methyl dopa, flutamide, the NSAIDs–—indomethacin, flurbiprofen, diclofenac, flufenamic acid, sulindac, and nimesulide, butylated hydroxytoluene, piperonyl butoxide, and bromobenzene), some slightly less active compounds (3′-acetamidofluorene, amsacrine, hydralazine, tannic acid), some drugs that behaved very differently, and were distinct from both non-cholestatic and cholestatic drugs (ketoconazole, dipyridamole, cyproheptadine and aniline), and many postulated human cholestatic drugs that in rat showed no evidence of cholestasis (chlorpromazine, erythromycin, niacin, captopril, dapsone, rifampicin, glibenclamide, simvastatin, furosemide, tamoxifen, and sulfamethoxazole). Most of these latter drugs were noted previously by other groups as showing cholestasis only in humans. The results of this work suggest that the ABC procedure and similar statistical approaches can be instrumental in combining data to compare toxicants across toxicogenomics databases, extract similarities among responses and reduce unexplained data varation.
Cluster analysis ; Cholestasis ; Gene signature ; Microarray ; Prediction ; Toxicogenomics
Cholestasis, or the reduction of bile flow, can progress to serious hepatotoxicity in patients and is a particular concern in evaluation of novel drug candidates. Extrahepatic cholestasis occurs within the bile duct, forming blockages via concentrated, precipitated drugs, or by damage of the biliary epithelial cells, and loss of bile transport and flow. Although a variety of drugs are concentrated in the bile, drug-induced obstructive cholestasis is often readily observed preclinically, and can be avoided. Intrahepatic cholestasis associated with hepatotoxicity generally results from bile salt export pump (BSEP) inhibition, which if not adequately handled by accessory pathways such as MRP3, results in accumulation of detergent-like bile salts, that are toxic to hepatocytes. At high dose levels, novel drug candidates can bind to multiple hepatic transporters and trigger cholestatic signals. Therefore, screening for transporter inhibition in membrane vesicles is a typical step in the drug discovery and development process  ,  ,  and  . Identification and translational understanding of cholestatic responses in preclinical species is key step in the nonclinical safety assessment before initiating pharmaceutical clinical development.
The rat is a well-characterized model in which to study cholestasis, and is a principal species for preclinical toxicology studies. There are some differences from human: bile salts are conjugated by different enzymes (cholic acid and chenodeoxycholic acid are predominantly conjugated with glycine in man versus with taurine in the rat), the molecular weight “filter” for drug biliary secretion is generally lower than in humans (∼350 molecular weight in rat, versus ∼500 in humans), and the rat lacks a gall bladder  and  . This latter difference is an experimental advantage in that there is more consistent bile flow regulation in rats. Additionally, laboratory rats eat a well-defined, carbohydrate-rich rodent chow and feed most heavily during the dark cycle, more continually through the day than most humans. These feeding habits contribute to less variation in rat biliary transport and a continual and higher volume of bile flow  . Despite numerous species differences when comparing drug effects in rat to human, most transporter inhibition studies show remarkable consistency; for example, no rat versus human differences were observed in a large study of drug inhibition of BSEP transport  .
A number of models of cholestasis in the rat are well studied and established: biliary ligation, which models bile duct obstruction; ANIT, which damages/destroys biliary epithelium; LPS, which down-regulates much of the hepatic metabolism including most transporters; and glucuronidated ethinyl estradiol-induced cholestasis by inhibition of BSEP  ,  ,  ,  ,  ,  ,  and  . All of these models have been characterized by gene expression on microarrays  ,  ,  and  . There is a misleading tendency to generalize the specifics of each of these different models to all forms of cholestasis. For characterization of novel drug candidates, there is a need to capture as much of the invariant cholestasis gene response as possible across a wide range of pharmaceutical structural classes.
At Janssen we have developed a predictive, 24 h treatment rat toxicogenomic database using approximately one hundred non-proprietary treatments and four times as many proprietary compound treatments, the latter having supporting exposure, clinical chemistry and histopathology data, often out to a month of dosing. We have published predictive gene signatures for non-genotoxic carcinogens  and  macrophage and PPARα receptor activation  and particularly oxidative stress/reactive metabolite responses for detecting idiosyncratic hepatotoxicants  and  . We previously developed a cholestasis gene expression signature using in-house proprietary compounds that induced cholestasis in rat studies (unpublished data). In the present study, we have used three robust non-redundant genes (Cyp7a1, Mrp3 and Bsep) from this previous rat signature which have been implicated in cholestasis  ,  ,  and  . Non-redundant genes differentially respond with large variances to compounds across our database, therefore a single one of these genes suffices for relevant transcriptomics information. Protein expression of MRP3 is up-regulated during cholestasis and genetic defects in MRP3 have been linked with pregnancy or estrogen hormone-induced cholestasis  . Mutations of the gene encoding the human bile salt export pump are implicated in progressive familial intrahepatic cholestasis type 2 (PFIC2), and inhibition of human and/or rat BSEP transporter constructs have been demonstrated to be correlated to cholestatic potential  . CYP7A1 is an important participant in transcriptional regulation, due to bile acid synthesis via nuclear hormone receptors and modulates cholestasis response via the classical pathway of bile enzyme synthesis  ,  and  .
From this starting point of three genes, we then applied the “aggregating bundles of clusters” (ABC) statistical approach  to develop a rat cholestasis gene expression signature that better discriminates cholestatic from non-cholestatic compounds. Both our unpublished proprietary signature and the present 100-gene rat cholestasis signature of this work yield good separation of compounds, and can discriminate many drugs known to cause cholestasis in man.
Male Sprague-Dawley rats, seven to eight weeks old, and approximately 275 g body weight (Charles River Laboratories, Inc.) were used for experiments. Animals were individually housed in wire-bottom cages, on a 12 h/light/dark cycle, and fed Certified Rodent Diet 5002 (LabDiet) ad libitum, with free access to water. On the day prior to dosing, animals were randomized by weight and allocated to groups (n = 3 rats/group). The route of administration for each test article is denoted in Table 1 . Oral gavage was the most commonly utilized mode of administration as it is the typical route for the majority of compounds (predominantly pharmaceuticals) included in the training set. Animals were dosed in the morning hours, with health checks performed at 1 h and 4 h after dosing, and the end of the workday. Any animals deemed to be in poor health status were evaluated by a veterinarian: in these experiments no animals were prematurely terminated due to poor health.
Necropsy was performed on fasted rats, 24 h following dosing. Rats were killed by exsanguination, severing the vena cava under CO2 analgesia, and liver sections (approximately 200 mg of the right medial lobe) were transfered to labeled cryo tubes and snap frozen in liquid nitrogen.
The selected dose level was the maximum tolerated dose, as elucidated from the literature  and  . Compound selection rationale, dose levels, route of exposure, group size, and experiment number are documented within Johnson and Johnson publically-accessible NIEHS-hosted database http://cebs.niehs.nih.gov .
In all instances, the animals were humanely handled and according to institutional guidelines, in accordance with the IACUC and NRC Guide for the Care and Use of Laboratory Animals.
Total RNA was extracted from liver samples using Qiagen RNEasy Midi kits (Qiagen, Inc., Valencia, CA) as per kit instructions. The amount of RNA in the samples was determined spectrophotometrically by absorbance ratio at 260 and 280 nm. Quality of RNA in the samples was assessed using rRNA peaks determined by an Agilent 2100 Bioanalyzer.
Reverse transcription of RNA with a T7 promoter oligo (dT) primer, followed by in vitro transcription using the RiboBeast 1-Round Aminoallyl-aRNA Amplification Kit was used for one round of RNA amplification (Epicentre, Madison, WI). Superscript II (Invitrogen, Carslbad, CA) was used as the reverse transcriptase, and the Rneasy96 block was used for purification (Qiagen, Valencia, CA) as previously described  . Purified biotin-labeled cRNA was fragmented at 94 °C for 20 min and added to Codelink hybridization buffer A and B (GE Healthcare, Chandler, AZ). Each cRNA sample was applied to two Codelink Rat Whole Genome microarrays (GE Healthcare) using the manufacturer’s protocol, and hybridized at 37 °C overnight. Microarrays were washed, stained with an Alexa555-streptavidin conjugate (Invitrogen), and then scanned at 532 nm with an Agilent G2565BA Microarray Scanner (Agilent Technologies, Palo Alto, CA). Fluorescence intensities for the microarray features were determined using Codelink EXPv4.1 software (GE Healthcare). Additional details regarding experimental procedures may be found in Ref.  .
The data for this study was part of a larger set of toxicogenomic experiments which spanned several hybridization batches. There were several vehicle controls run concurrently in each hybridization batch to establish baseline gene expression levels. The normalization procedure consisted of the following steps within each hybridization batch: (i) spot winsorization on the log-intensities to remove outliers; (b) quantile normalization to the median of the vehicle controls in that batch; (c) mean summarization of the log-intensities for each gene of the technical replicates for each RNA sample; (d) log ratios of the mean in (c) with the median of the vehicle control within the hybridization set. Details of the winsorization and quantile normalization can be found in Ref.  . The dataset consists of log-ratios of the expression levels to the median vehicle control of the corresponding hybridization batch as described above, for each sample, for the 35129 probesets on the microarray chip. All analyses described below were conducted on the probesets. Compounds that were known PPAR agonists and macrophage activators (many of the acutely hepatototoxic compounds) were omitted from analysis despite cholestatic properties, since their robust effects on gene expression interfered with subtle signatures, as noted previously  and  . LPS was added to the analysis since it is an important cholestatic model compound in the rat.
The data used in this study is from the non-proprietary Johnson & Johnson (Janssen) toxicogenomics Codelink database, available in normalized ratio format in collaboration with Strand Life Sciences, Pvt., Ltd., at the weblink: http://pubdata.strandls.com/ in the folder “jnj codelink data/”  .
Raw data related to this project are also publically available in the CEBS at the NIEHS website.
To access the CEBS website:
A novel ABC-like procedure was applied  . The following three steps were performed R times:
Once these steps were run R times, the results were collated. Of the number of times that the i th and j th compounds were both included in the same bag (in step (1) of the procedure), the proportion Cij of times that they also fell into the same cluster (in step (3) of the procedure) was determined.
Let Dij = 1 − Cij . A value of Dij close to zero indicated that compounds i and j often clustered together in the base clusters, thereby implying that they were relatively similar to each other. Alternatively, a value of Dij close to 1 indicated that compounds i and j rarely clustered together in the base clusters, thereby implying that they were relatively dissimilar to one another. Thus Dij could serve as a dissimilarity measure. Multidimensional scaling of (Dij ) was then used to generate a two-dimensional representation of the similarities and dissimilarities of the compounds.
Although an in-house proprietary cholestasis signature was available our goal was to generate a robust gene expression signature using only non-proprietary drugs and compounds. Table 1 provides a list of cholestatic (red) and non-cholestatic (black) compounds. Initially expression profiles of only three genes (Cyp7a1, Mrp3 and Bsep), well characterized as cholestasis-responsive genes, were examined. Their expression profiles, plotted in a three-dimensional scatterplot, with known cholestatic compounds highlighted in red, did not reveal a useful separation of the compounds in any readily interpretable fashion (Fig. 1 ).
mRNA expression profiles of 3 genes (Cyp7a1, Mrp3 and Bsep) to cholestatic (red) and non-cholestatic (black) training set pharmaceuticals and model toxicants in male rat liver. Note that no clear separation was observed between classes.
We next examined the simultaneous expression profiles of all genes using several methods. One approach was to reduce the dimensionality of the data using principal component analysis and to then plot the first two components (Fig. 2 ). Another was to calculate the dissimilarity between every pair of compounds using either Euclidean distance (Fig. 3 ) or 1-correlation (Fig. 4 ) and to then use multidimensional scaling to display the compounds on a two-dimensional scatterplot  . This indicated fair separation between groups, typical of many gene expression signatures, however many compounds in the two groups overlapped, with no obvious basis for the way cholestatic and non-cholestatic compounds distributed in space. Clearly while it was insufficient to consider three key genes (Fig. 1 ), consideration of all genes was also unproductive (Fig. 2 , Fig. 3 and Fig. 4 ). Although such mild separations were considered acceptable in past studies  , a better statistical analysis method was sought.
Principal component analysis of training set (plot of first two components). The applied methods generate a composite representative point with arbitrary units within a two-dimensional space reflecting the similarity of one compound to another.
Two-dimensional scatterplot of training set using multidimensional scaling to illustrate dissimilarity between every pair of pharmaceuticals and model toxicants based on Euclidean distance. The applied methods generate a composite representative point with arbitrary units within a two-dimensional space reflecting the similarity of one compound to another.
Two-dimensional scatterplot of training set using multidimensional scaling to illustrate dissimilarity between every pair of pharmaceuticals and model toxicants based on 1-correlation. The applied methods generate a composite representative point with arbitrary units within a two-dimensional space reflecting the similarity of one compound to another.
A novel approach based on the ABC (“aggregating bundles of clusters”) technique was implemented  . The premise of this method is that most of the genes do not contribute to the organization of the compounds but the analysis should tilt towards the genes that do. Because the three selected “focal” non-redundant genes Cyp7a1, Mrp3 and Bsep are implicated with cholestasis, other gene responses could be ranked in importance by how much they correlated to responses of these three genes. Application of the ABC procedure yielded a display of the compounds that was interpretable in the context of cholestasis. Use of Dij as a dissimilarity measure allowed for generation of a two-dimensional representation of similar and dissimilar compounds ( Fig. 5 ). Compounds that are similar to each other according to (Dij ) are oriented close to one another while compounds that are dissimilar according to (Dij ) are spaced further apart. Many of the best-characterized cholestatic compounds (depicted in red) fell into a tight cluster on the lower right-hand side of Fig. 5 , and their gene responses after selection were almost identical. An expanded view of this cholestatic cluster is provided in Fig. 6 . A few compounds with weaker responses, or those that act via a different mechanism, displayed as a “bridge” in this plot between the tight cluster of known cholestatic (red) compounds on the lower right and the clearly non-cholestatic (black) compounds at bottom left in Fig. 5 . Additionally there was some separation of compounds falling in the non-cholestatic left region of Fig. 5 : while a number of the compounds on the top left and those closest to the cluster on the lower right are reportedly cholestatic in man, separation from non-cholestatic compounds was minimal and not investigated further in this study.
Two-dimensional projection of the similarities and dissimilarities of training set pharmaceuticals and model toxicants based on multi-dimensional scaling following the ABC clustering procedure. Non-cholestatic compounds are noted in black and cholestatic compunds are noted in red. The applied statistical methods generate a composite representative point with arbitrary units within a two-dimensional space reflecting the similarity of one compound to another. A strong cholestatic (red) cluster can be noted on the lower right.
Expanded view of the cholestatic (red) compounds in the right-hand “arm” of Fig. 1 . Two-dimensional projection of the similarities and dissimilarities of training set pharmaceuticals and model toxicants based on multi-dimensional scaling following the ABC clustering procedure. The applied statistical methods generate a composite representative point with arbitrary units within a two-dimensional space reflecting the similarity of one compound to another.
Cholestatic potential of compounds on the lower right (cholestatic) and bottom left (non-cholestatic) were confirmed via literature references. From this exercise, a ten-compound cholestatic (cholestatic in rat, and excluding LPS and gastrointestinally toxic NSAIDs) and four-compound non-cholestatic (non-cholestatic in rat, but cholestatic in human) compound training set (Table 2 ) was used to derive an improved gene expression signature for rat-specific cholestasis. We applied a boosting procedure and elastic net methodology to select a signature set of Codelink IDs  and  . The top set of 100 genes derived using an arbitrary cutoff with highest frequencies (i.e., those with highest statistical significance) is recorded as our final rat cholestasis signature and presented in Table 3 .
|Cholestatic compounds from the right arm||Non-cholestatic compounds from the left arm|
|Ethinyl Estradiol||ErythroMC Estolate|
|Codelink||Accession #||Rat gene|
|GE1262741||BE110134||Similar to ATPase inhibitory factor 1|
|GE1174448||BC062045||similar to lipase, member H|
|GE1206195||BI274696||similar to Fam48a protein|
Using the ABC approach, our results indicate that separation of paradigm human cholestasis compounds into cholestatic/non-cholestatic classes using rat experimental data gives as good or better separation than that observed with our proprietary Johnson & Johnson/Janssen compound-derived signature.
In the present study the ABC statistical approach  was used with three non-redundant cholestasis indicator genes, Cyp7a1, Mrp3 and Bsep, to separate and visualize human cholestatic and non-cholestatic compounds in the rat. Although these three genes are well known to be regulated by the nuclear bile acid receptor (FXR), their different responses across the drugs tested (their non-redundant responses) suggest more complicated regulation in rats, consistent with integrated bile salt, cholesterol, sterols and other hepatic lipid metabolism pathways and the host of transcription factors involved (such as LXR, SREBP, PPAR, PXR, CAR and others, in addition to FXR and SHP; reviewed by Karagianni and Talianidis  . These three starting genes are recognized to be important in cholestasis, and were important in our previous in-house cholestasis signature based on multiple doses of proprietary compounds and histopathological evidence of cholestasis (Alex Nie, unpublished data). The separation by the ABC approach of paradigm human cholestasis compounds into cholestatic/non-cholestatic compounds in the rat was as good or better than observed with the proprietary Johnson & Johnson/Janssen compound derived signature. The ABC approach  was used to statistically weigh additional genes according to their similarity/dissimilarity to the three pivotal cholestasis genes and to derive predictions for potential cholestatic propensities of our paradigm compounds. The most robust cholestatic and non-cholestatic compounds in rat (all cholestatic in human) derived from this approach were used as a training set, along with a boosting procedure/elastic net approach  , to select a broad 100-gene signature set associated with rat-specific cholestasis.
The chief advantage of the ABC approach is that it allows tight clustering of similar toxicants of interest (in this case rat cholestatic compounds) and displays the compounds along a continuum. Thus the most active rat cholestatic compounds essentially behaved identically by gene expression: cholestatic model compounds ANIT, LPS, and ethinyl estradiol, many high dose NSAIDs which can cause gut bleeding and endotoxin entry (phenacetin, flurbiprofen, sulindac, flufenamic acid, nimesulide, indomethacin, diclofenac, and ibuprofen  and  , flutamide, disulfiram, methapyrilene, and the tool compounds piperonyl butoxide, bromobenzene, butylated hydroxytoluene, and methylene dianiline clustered together). Naltrexone, 2-acetamidoflourene, amsacrine, hydrazine and tannic acid were slightly outside of the tight cholestatic cluster, and dipyridamole, ketoconazole and cyproterone acetate were further removed from the cluster, showing different gene expression although quite distant from the non-cholestatic compounds.
Aniline was the only known rat cholestatic compound that fell in the “bridging” area between the cholestatic and non-cholestatic compounds. Carbamazepine, clozapine and dieldrin also fell within this bridging area, suggesting pronounced differences in gene expression but also shared features with cholestatic compounds. Erythromycin, glibenclamide, chlorpromazine, simvastatin, sulfamethoxazole, and tamoxifen (all known cholestatic drugs in human) were the closest of the non-cholestatic cluster to the cholestatic compounds, but clearly non-cholestatic in the rat, at least with acute dosing at high dose.
A number of pharmaceuticals that cause cholestasis in humans do not appear to induce cholestasis in rats: chlorpromazine and erythromycin-induced cholestasis appears to be human-specific  and  . Other human cholestatic compounds such as furosemide, rifampicin, niacin, captopril and dapsone showed no cholestatic gene expression response in the rat which may reflects species differences in cholestatic mechanisms such as binding and activation/inactivation of PXR, FXR, LXR and bile salt transporters  . Rifampin and glibenclamide reportedly block both rat BSEP and human BSEP transport in vitro and produce cholestasis in patients  and  but the rat appears to be much better at processing these compounds and many other human cholestasis-inducing drugs  . Species differences in drug effects on metabolic regulatory pathways, drug-induced changes in bile salt solubility, interference with clearance of other endogenous substrates (particularly steroid hormones and bilirubin), and precipitation of compound in the bile ducts, septicemia and localized inflammation, such as pancreatitis, and idiosyncratic/hypersensitivity and inflammation may also modulate cholestasis differently in man versus the rat  .
The regulation of the three cholestasis indicator genes Cyp7a1, Bsep and Mrp3 used as the starting point for the ABC approach, and the responses of these three genes across Janssen toxicogenomics database compounds is different enough that several components of the rat cholestatic response are captured in our analysis. The approach allows for improved characterization of cholestasis in the rat than would be obtained using Cyp7a1 and other redundant, identically regulated genes, such as Cyp8b1.
Importantly, the 100 genes comprising our derived rat cholestasis signature would not have been obvious from most biochemical pathway analyses. There were, however, several duplicate genes with different accession numbers that were selected by the algorithm (monocyte to macrophage differentiation associated, Mmd2, tRNA methyltransferase 10a, Trmt10a, serine dehydratase, Sds, glutathione transferases, Ran, high mobility group nucleosome-binding proteins, Hmgns, and Methylenetetrahydrofolate dehydrogenases, Mthfd), confirming the gene selections. Other transporters selected were for aromatic amino acids and nucleotide sugars, and another Cyp selected, Cyp2d4, is well known for 21-hydroxylation of steroid hormones rather than of bile salts or cholesterol. The robustness of the three starting genes and selection against strongly redundant responses means there are a limited number of tightly linked and/or well-known cholestatic signature genes.
In terms of in vivo study design, our experimental protocol was designed for tissue collection at an optimal time point for detection of robust gene expression changes. The non-proprietary paradigm compounds used in this work were generally administered as single dose, with animals necropsied 24 h later. Histopathology was not performed, as these compounds were typically well-characterized for their cholestatic potential in rat and human. Additionally, this acute treatment regimen was not expected to induce clinical manifestation of cholestasis after such a short dosing period. Exploration of dose-response and identification of additional cholestatic agents with multiple or chronic dosing regimens was not within the scope of this study: such investigations may yield additional insight and improved predictivity.
While other microarray studies of cholestasis exist for the rat, they have generally employed a single cholestatic agent. Such signatures are appropriate for the individual particular cholestatic model studied but may be more limited when applied to the variety of structural classes encountered in drug discovery and development. Comparison to ethinyl estradiol EE-treatment (24 h sampling time) showed minimal overlap with our >80 named genes (aflatoxin aldehyde reductase, FK506 binding protein, guanine nucleotide binding protein alpha inhibiting, high mobility group nucleosomal binding domain 1, NIMA-related kinases, nucleoporins, RAN, member RAS oncogene family, Serine protease inhibitor, Kazal type, and several tubulins)  . Comparison with a 15 named gene for cholestasis derived by the supervised learning approach SVM (support vector machine) showed only one gene overlap—protein kinase c-binding protein  . Comparison with a signature for hepatobiliary injury showed pronounced differences, with dysregulated genes reflecting immune activation and changes in integrins and extracellular matrix markers  , which were missing in our 24-h cholestasis gene signature.
The data mining approach used in the present study provides an innovative analysis of pharmaceuticals and model toxicants that induce cholestasis in the rat, and an improved, more predictive 100-gene signature beyond the original three-gene panel comprised of Cyp7a1, Mrp3 and Bsep. While applicability of this statistical approach remains to be demonstrated with other data sets, similar data mining approaches with future or existing data may yield insights into other toxicological properties, using comprehensive gene expression as surrogates for important biological responses.
The authors thank Komal Singh Sandu and co-workers at Strand Life Sciences, and Jennifer Fostel, Carolyn Favaro and Xiang Yao for enabling public access of this project data.