Supplementary Material - Computers in Biology and Medicine

This Website contains supplementary material from a paper submitted to Computers in Biology and Medicine.

Summary:

  1. Descriptive statistics performed by tissue
  2. Microarray analyses
  3. Steps followed in the implementation of WGCNA
  4. Correlation results summary
  5. Results of gene ontology analysis performed in co-expression clusters fulfilling eligibility criteria
  6. References

Figure index:

  • Fig. 1. PCA by tissue color coded by sex.
  • Fig. 2. Summary of WGCNA results by approach.
  • Fig. 3. Participants dendrogram and trait heatmap.
  • Fig. 4. Probes clustering in approach I from VAT.
  • Fig. 5. Probes clustering in approach I from SMT.
  • Fig. 6. Probes clustering in approach II from VAT.
  • Fig. 7. Probes clustering in approach II from SMT.
  • Fig. 8. Probes clustering in approach III inter-tissue.
  • Fig. 9. Analysis of genetic network topology for several soft-thresholding powers (β) among all approaches.
  • Fig. 10. Co-expression cluster composition comparison between VAT-approach I and SMT-approach I.
  • Fig. 11. Co-expression cluster composition comparison between VAT-approach I and inter-tissue approach III.
  • Fig. 12. Co-expression cluster composition comparison between SMT-approach I and inter-tissue approach III.
  • Fig. 13. Co-expression cluster composition comparison between VAT-approach II and SMT-approach II.
  • Fig. 14. Co-expression cluster composition comparison between VAT-approach II and inter-tissue approach III.
  • Fig. 15. Co-expression cluster composition comparison between SMT-approach II and inter-tissue approach III.
  • Fig 16. Cluster-trait relationships in approach I from VAT.
  • Fig 17. Cluster-trait relationships in approach I from SMT.
  • Fig 18. Cluster-trait relationships in approach II from VAT.
  • Fig 19. Cluster-trait relationships in approach II from SMT.
  • Fig 20. Cluster-trait relationships in approach III from inter-tissue.

Table index:

  • Table 1. Descriptive characteristics of the study participants per tissue subgroups.
  • Table 2. PC correlation of each tissue with Age.
  • Table 3. I VAT Hubs list.
  • Table 4. I SMT Hubs list.
  • Table 5. II VAT Hubs list.
  • Table 6. II SMT Hubs list.
  • Table 7. III Inter Hubs list.
  • Table 8. Intersected Hubs only from Approach III Inter-tissue.
  • Table 9. Common Intersected Hubs between Approaches II and III.
  • Table 10. Correlation results summary in approach I from VAT.
  • Table 11. Correlation results summary in approach I from SMT.
  • Table 12. Correlation results summary in approach II from VAT.
  • Table 13. Correlation results summary in approach II from SMT.
  • Table 14. Correlation results summary in approach III from inter-tissue.
  • Table 15. Functional enrichment analysis of selected clusters by approach.

1. Descriptive statistics performed by tissue

Description of the calculations performed and information contained in the descriptive table can be found in the paper associated with this web site. The excel file can be downloaded by clicking here

  VAT SMT BOTH TISSUES  
  Normal-weight children Normal-weight children Children with obesity P value
Variables 5 boys/1 girl 4 boys/1 girl 5 boys VAT SMT
Table 1. Descriptive characteristics of the study participants per tissue subgroups.
Age (years) 8.17 (1.47) 8.60 (1.82) 10.20 (1.30) 0.040 0.148
Weight (kg) 30.00 [25.00-31.60] 36.52 (18.94) 57.00 [48.00-62.00] 0.006 0.064
Height (cm) 131.98 (8.28) 137.46 (17.56) 141.00 (8.94) 0.097 0.698
BMI (kg/m2) 16.60 [14.73-18.7] 16.89 [14.73-25.74] 28.30 [26.08-29.50] 0.006 0.009
BMI Z-score -0.61 (0.89) -0.16 (1.35) 3.12 (1.07) <0.001 0.003
DBP (mm Hg) 61.33 (7.42) 58.00 (6.68) 71.20 (12.56) 0.139 0.102
SBP (mm Hg) 111.33 (7.84) 108.50 (8.23) 123.00 (15.80) 0.144 0.143
Glucose (mg/mL) 85.17 (13.53) 79.40 (7.02) 89.20 (7.98) 0.573 0.073
Insulin (mU/L) 1.42 [0.62-2.06] 1.42 [0.62-1.61] 2.50 [0.84-5.30] 0.286 0.386
HOMA index 0.29 [0.11-0.44] 0.29 [0.11-0.33] 0.53 [0.17-1.28] 0.522 0.564
QUICKI index 0.51 (0.06) 0.51 (0.07) 0.45 (0.08) 0.250 0.276
Apo-A (mg/dL) 122.67 (21.52) 105.40 (28.60) 144.50 (10.47) 0.100 0.037
Apo-B (mg/dL) 74.50 (22.62) 60.40 (14.81) 61.00 (21.32) 0.373 0.962
HDL-c (mg/dL) 59.67 (8.50) 54.40 (16.77) 69.00 (18.46) 0.304 0.254
LDL-c (mg/dL) 95.00 [43.60-112.00] 16.04 (21.35) 84.10 [48.40-111.00] 0.720 0.758
Total Cholesterol (mg/dL) 162.17 (32.67) 141.20 (31.65) 162.75 (44.55) 0.981 0.423
Triglycerides (mg/dL) 52.50 [38.00-150.00] 53.80 (18.63) 62.50 [35.00-77.00] 0.831 0.668
AST (U/L) 20.50 (3.39) 18.60 (2.70) 20.25 (3.86) 0.916 0.474
ALT (U/L) 14.67 (4.23) 13.80 (4.44) 22.75 (5.12) 0.026 0.026
PCR (mg/L) 29.92 (22.65) 213.80 [19.30-380.4] 15.20 [2.40-72.00] 0.860 0.347
Urea (mg/dL) 25.33 (3.27) 22.40 (2.97) 23.80 (7.26) 0.679 0.705
Proteins (g/dL) 7.22 (0.31) 7.06 (0.62) 7.45 (0.58) 0.425 0.369
Calcium (mg/dL) 10.66 (0.44) 9.85 [9.40-10.60] 10.50 [10.30-10.60] 0.505 0.714
Sodium (mmol/L) 136.50 (2.74) 135.00 (2.24) 135.20 (3.83) 0.528 0.922
Potassium (mmol/L) 4.25 [4.25-5.30] 4.24 (0.48) 4.30 [4.00-4.50] 0.577 1.000
Chlorine (mmol/L) 101.00 (3.16) 99.00 [97.00-105.00] 99.00 [96.00-105.00] 0.495 0.914
TSH (mU/L) 1.74 [0.32-8.50] 1.04 (0.90) 1.01 [0.62-3.41] 0.831 0.574
T4 (ng/dL) 1.40 (0.37) 1.35 (0.40) 1.28 (0.10) 0.534 0.730
FSH (mU/L) 0.30 [0.30-1.70] 0.25 [0.20-1.70] 0.60 [0.20-1.00] 0.904 0.797
Testosterone (pg/mL) 0.45 (0.17) 0.44 (0.26) 0.60 (0.27) 0.249 0.237
Cortisol (nmol/L) 206.08 (15.44) 29.77 (1 .71) 18.98 (16.99) 0.512 0.413
Leukocytes (cell n°) 13853.33 (6628.75) 161120.00 (5487.90) 13200.00 (4988.99) 0.860 0.404
Erythrocytes (cell n°) 4.65 (0.48) 4.46 (0.31) 4.68 (0.42) 0.931 0.417
Hemoglobin (g/dL) 12.75 (0.94) 12.56 (0.53) 13.16 (0.86) 0.475 0.222
Hematocrit (%) 37.67 (3.70) 36.78 (2.06) 39.04 (3.00) 0.522 0.202

We did also evaluated the effect of confounding variable age in the expression of each tissue performing a principal component analysis and correlating the first two components with age.

Fig. 1.
Fig. 1. PCA by tissue color coded by sex. Principal Component Analysis (PCA) of visceral adipose tissue and skeletal muscle tissue color-coded by sex. It should be noted that in each tissue there is only a normal-weight girl, so our results are predominantly referred to boys. For more quality see the pdf here .

Table 2. Results of the correlation of confounding variable age with the first two principal components (PC) of visceral adipose tissue (VAT) and skeletal muscle tissue expression data. The excel file of Table 2 can be downloaded by clicking here

  Cor. statistic Cor. estimate P-value Conf. interval
Table 2. PC correlation of each tissue with Age.
VAT PC1-AGE -1,57 -0,46 0,15 [-0.83, 0.19]
VAT PC2-AGE -1,68 -0,49 0,127 [-0.84, 0.16]
SMT PC1-AGE -1,35 -0,43 0,214 [-0.83, 0.27]
SMT PC2-AGE -1,21 -0,39 0,26 [-0.82, 0.31]

 

 

2. Microarray analyses

Genome-wide mRNA expression analysis from VAT and SMT was performed using the HG-U133 Plus2.0 array technology (Affymetrix, Santa Clara, CA, USA)[1]. A total of 11 array chips for VAT and 10 array chips for SMT were used. Analyses were conducted according to the protocol available in the GeneChip Expression Analysis Technical Manual (Affymetrix, Santa Clara, CA, USA) [2]. Adipose and muscle tissue samples were extracted during abdominal surgery. Approximately 400 mg of intra-abdominal VAT was collected and excised into pieces of approximately 150 mg. SMT samples were collected as 300 mg of rectus femoris. Subsequently, VAT and SMT samples were immersed into tubes containing 1.5 mL RNA-later solution (Qiagen, Izasa, Spain) and stored at −80 °C for further transcriptomic analysis. Total RNA from VAT biopsies was isolated using the RNeasy Lipid Tissue Mini Kit (Qiagen, Izasa, Spain) and concentrated in 15 μl. RNA extraction of SMT biopsies was performed using RNeasy Mini kit and Qiacube system according to the manufacturer's instructions. The integrity of the total RNA from each tissue was checked by visual inspection of the 28S and 18S RNA on agarose gel electrophoresis.

Briefly, 5 μg of total RNA from VAT and 1.3 μg of total RNA from SMT were used as the starting material for the Affymetrix sample preparation. The cDNA was synthesized with the SuperScript Choice System Kit (Life Technologies Corporation, Izasa, Spain) in VAT, and with the One Cycle cDNA Synthesis kit (Affymetrix) in SMT. The cDNA was used for the cRNA synthesis using an IVT Labeling kit (Affymetrix, Santa Clara, CA, USA) and then purified using a GeneChip Sample Cleanup Cluster Kit (Affymetrix). 15 μg of biotin-labelled cRNA from each tissue was fragmented and hybridized to the HG-U133 Plus2.0 array (Affymetrix), which offers a comprehensive analysis of the genome-wide expression of over 47200 VAT transcripts corresponding to 38500 well-characterized human genes; and over 54600 SMT transcripts corresponding to 35900 well-characterized human genes.

 

3. Steps followed in the implementation of WGCNA

Details and performance of the WGCNA procedure is shown in Fig. 2

Fig. 2. Summary of WGCNA results by approachFig. 2. Summary of WGCNA results by approach. The number of participants analyzed in each approach is presented as “N”. Details for the WGCNA procedure are also presented. Resulting total number of clusters, number of clusters passing our defined quality criteria and most correlated ones according to each approach aim are also included. Abbreviations: ß, soft-thresholding power; r, scale-free topology model fit index. For more quality see the pdf here .

Our study design is based on three main analytic approaches, whose general aim is to disentangle the gene co-expression networks associated with obesity and metabolic dysfunction in children in the key metabolic tissues: visceral adipose tissue (VAT) and skeletal muscle tissue (SMT). Each approach involved different subjects or statistical methodologies given their specific objectives:

Approach 1 pursues identifying the gene co-expression clusters associated with obesity (mainly adiposity traits like body mass index or obesity status) in each evaluated tissue individually (VAT and SMT). This suggests that the signatures that have been identified may not necessarily be identical, and as a result, may uncover distinct biological pathways linked to excessive fat accumulation in different tissue matrices. For this purpose, both children with normal-weight and children with obesity were selected and the classical weighted gene co-expression network analysis (WGCNA) approach was implemented for the generation of clusters on their transcriptomic data, individually in VAT and in SMT. At a final step, generated clusters of each tissue were filtered out based on their association with obesity.

Approach 2 and 3, on the other hand, aim to identify the gene co-expression clusters associated with metabolic dysfunction in children with obesity in the evaluated tissues (i.e., referring metabolic dysfunction to any cardiometabolic alteration related to the concept of metabolic syndrome, including disorders in blood pressure, insulin-resistance and lipid alterations). To achieve this objective, solely children exhibiting obesity were chosen, wherein varying degrees of metabolic pathology can be discerned. Approach 2 and 3 differ in their methodological approach, where the former employs the classical weighted gene co-expression network analysis (WGCNA) to identify molecular signatures specific to each tissue (i.e. VAT and SMT), while the latter utilizes a WGCNA variation known as the "consensus approach". The consensus approach involves constraining the learning process to detect gene expression signatures that are shared between both tissues, leading to the identification of clusters that are shared between VAT and SMT.

The specific methodological steps followed in the implementation of WGCNA were as follows:

First, to identify outlier samples, we employed a hierarchical clustering approach with dendrogram visualization. This hierarchical clustering approach was based on clinical traits data similarity measures and is presented in Fig. 3.

Fig. 3. Participants dendrogram and trait heatmap      Fig. 3. Participants dendrogram and trait heatmap. (A) VAT and (B) SMT study population. Children with obesity are individuals 1-5; and normal-weight children are individuals 6-11. . For more quality see the pdf here .

Co-expression clusters are defined as undirected, weighted gene networks where nodes represent genes, and edges between nodes are calculated by the pairwise Pearson correlation [3] (Fig. 4-8). Fig. 4-8 probes clustering in each approach. Gene expression clustering dendrogram computed using dissimilarities based on topological overlap from each approach in children with obesity and normal-weight. The nodes of the cladogram represent each DNA probe analysed. Assignment of original colour clusters (Dynamic Tree Cut) and merged clusters (Merged dynamic) can be observed.

Figure 4: I VAT Probes clusteringFig. 4. Probes clustering in approach I from VAT. For more quality see the pdf here .

Figure 5: I SMT Probes clusteringFig. 5. Probes clustering in approach I from SMT. For more quality see the pdf here .

Figure 6: II VAT Probes clusteringFig. 6. Probes clustering in approach II from VAT. For more quality see the pdf here .

Figure 7: II SMT Probes clusteringFig. 7. Probes clustering in approach II from SMT. For more quality see the pdf here .

Figure 8: III Intertissue Probes clusteringFig. 8. Probes clustering in approach III inter-tissue. For more quality see the pdf here .

Information concerning the network connection strength between all pairwise Pearson correlations of microarray probes was contained in the adjacency matrix (a symmetric N × N matrix with entries in {0,1} depending on the connection strength between gene pairs) [4]. To calculate the adjacency matrix, we needed to choose an optimal absolute threshold value of β (soft thresholding power) that fulfil the scale-free topology criterion [5]. The β selection was thus based on the visualization of a pre-set group of candidates β values with its corresponding output scale-free topology model fit indexes (r). In each analysis, the graphs showing the selection of β whose r was 0.9 (for each approach) can be observed in Fig. 9.

Fig. 9. Analysis of genetic network topology for several soft-thresholding powers (β) among all approachesFig. 9. Analysis of genetic network topology for several soft-thresholding powers (β) among all approaches. The scale-free topology model fit index (r) (y axis) is plotted as a function of β (x-axis). β was selected when r was 0.9 in each case, which can be observed in (A) VAT and (B) SMT from approach I. The selection of β in (C) VAT and (D) SMT from approach II; as well as in (E) both tissues from approach III is also shown.. For more quality see the pdf here .

Once the β value was selected, co-expression clusters were defined as sets of tightly co-regulated probes and calculated using the Topological Overlap Measure (TOM) as a robust measure of network interconnectedness [6]. Once all clusters were identified and the membership of all probe/genes pairs was calculated, those with a very similar profile (TOM > 0.8) were then merged into one single cluster [7]. Each co-expression cluster can be summarized by its eigengene cluster vector, which can be viewed as a summary of the gene expression profile in that cluster and a reflection of cluster behaviour. The membership degree of each gene to its corresponding cluster is measured with Membership Cluster (MM), which provides a value in the interval {0-1} (where higher values indicate higher MM) by correlating the particular gene expression profile with the cluster eigengene vector of a given co-expression cluster [8].

Once clusters were identified for each approach, the resulting clusters between each pair of approaches were compared (in terms of gene composition), so we can conclude how similar are the clusters identified by the method in each approach and tissue group. Common clusters between approaches were identified according to the number of shared genes annotated (please, see Fig. 10-15). Importantly, although some co-expression clusters may have the same colour name in different approaches, this does not necessarily imply that they contain the same genes. Briefly, the co-expression clusters identified in the VAT approach I could be considered to be independent of those identified in the SMT approach I (Fig. 10). Indeed, it may be observed that the few relevant matches belonged to the clusters with the highest number of annotated genes, something to be expected due to the increased probability of matches between large and small clusters. Approach I has also been compared with approach III to illustrate the differences between the clusters identified in normal-weight and children with obesity compared to those identified in only children with obesity. The level of homology between approach III and the VAT approach I was very low (Fig. 11), whereas a bit higher with the SMT of Approach I (Fig. 12). The only notable cluster of both approaches with a relevant number of matching genes was the dark-grey cluster of the VAT of approach I that matched with the blue cluster of approach III. The differences between VAT and SMT of approach II were also assessed to reveal the presence of co-expressed genes in children with obesity, highlighting the match between purple of SMT and blue of VAT. The similarities between approach I and III, as well as between tissues in approach II, maybe due to the homologous genes having a high correlation with BMI Z-score in both groups of children. Fig. 10-15: Co-expression cluster composition comparison between approaches. Numbers in the table indicate gene counts in the intersection of the corresponding clusters. Colouring of the table encodes -log(p), with p being the Fisher’s exact test p-value for the two clusters overlapping. The stronger red colour indicates more significant overlap.

Figure 10: I VAT vs I SMT Cluster comparisonFig. 10. Co-expression cluster composition comparison between VAT-approach I and SMT-approach I. For more quality see the pdf here .

Figure 11: I VAT vs III Inter Cluser comparisonFig. 11. Co-expression cluster composition comparison between VAT-approach I and inter-tissue approach III. For more quality see the pdf here .

Figure 12: I SMT vs III Inter Cluster comparisonFig. 12. Co-expression cluster composition comparison between SMT-approach I and inter-tissue approach III. For more quality see the pdf here .

Figure 13: II VAT vs II SMT Cluster comparisonFig. 13. Co-expression cluster composition comparison between VAT-approach II and SMT-approach II. For more quality see the pdf here .

Figure 14: II VAT vs III Inter Cluster comparisonFig. 14. Co-expression cluster composition comparison between VAT-approach II and inter-tissue approach III. For more quality see the pdf here .

Figure 15: II SMT vs III Inter Cluster comparisonFig. 15. Co-expression cluster composition comparison between SMT-approach II and inter-tissue approach III. For more quality see the pdf here .

We were especially interested in identifying those co-expression clusters that were significantly associated with clinical traits. Thus, the association between each co-expression cluster (eigengene vector) and all clinical traits was quantified using Pearson correlations, and their corresponding p-value. The resulting correlation matrices can be observed in the form of heatmaps, among which we highlight the inter-tissue approach III.

Finally, the intra-network connectivity is a measure of MM that allows identifying hub genes (i.e., highly connected genes that accurately represent the cluster behaviour) in each resulting cluster. Those genes with absolute MM values > 0.8 were selected as hubs for each cluster (Tables 3-7). Among all the hubs of each cluster, we further highlighted those mapped in the functional enrichment

Tables 3, 4, 5, 6 and 7 containing hub genes of each approach. The co-expression clusters selected to extract the hubs were only those that met the quality selection criteria.


It is crucial to highlight that approaches 2 and 3 are expected to yield distinctive signatures and complementary outcomes, which would facilitate the identification of tissue-specific molecular imprints linked to metabolic aberrations in children with obesity, while also underscoring the shared altered pathways, and thereby the most important from the clinical point of view, between them.

In order to understand the similarities and differences between these approaches, we explored the intersection between both. Importantly, the mathematical parameters used to obtain the co-expression clusters were not identical. Indeed, approach 2 implemented the classical (individual) WGCNA pipeline, while consensus WGCNA involve the following different steps: a) scaling the topological overlap matrices of each tissue to be comparable across sets;, and b) computation of consensus topological overlap incorporated a parameter that combined both matrices from each tissue, denominated component-wise minimum.

Therefore, it is crucial to emphasize the importance of analyzing each scenario, particularly the observation of hub-genes that were exclusively identified in consensus approach but not in approach 2, emphasizing the need for a joint analysis of both tissues. The exclusive hub genes can be consulted in Table 8. We did also include Table 9 with the list of common hub genes between approaches II and III to provide additional information concerning the interaction.

 

4. Correlation results summary

In this section we show the main results of correlation analysis between co-expression clusters and clinical traits of approaches I, II and III in each tissue. The results obtained directly from the application of the WGCNA can be seen in Fig. 16-20, as heatmaps showing the correlation between co-expression clusters and clinical traits. More information concerning creation and description of the figures can be observed in the paper associated with this web site.

Fig 16. Cluster-trait relationships in approach I from VATFig 16. Cluster-trait relationships in approach I from VAT. For more quality see the pdf here .

Fig 17. Cluster-trait relationships in approach I from SMTFig 17. Cluster-trait relationships in approach I from SMT. For more quality see the pdf here .

Fig 18. Cluster-trait relationships in approach II from VATFig 18. Cluster-trait relationships in approach II from VAT. For more quality see the pdf here .

Fig 19. Cluster-trait relationships in approach II from SMTFig 19. Cluster-trait relationships in approach II from SMT. For more quality see the pdf here .

Fig 20. Cluster-trait relationships in approach III from inter-tissue.Fig 20. Cluster-trait relationships in approach III from inter-tissue.. For more quality see the pdf here .

The interpretation of correlation results by applying quality criteria to select the clusters of greatest relevance to the pathology in each approach can be seen in Tables 10-14. Quality selection criteria of co-expression cluster, as well as the meaning of the heatmaps are described in the paper associated with this website (paper's Section 2, subsection F).

 

5. Results of gene ontology analysis performed in co-expression clusters fulfilling eligibility criteria

Here we present results of gene ontology analysis performed in co-expression clusters fulfilling eligibility criteria and including the significantly enriched biological pathways (nominal p-value < 0.05) and hub-genes of them from each approach. Table 15. Functional enrichment analysis of selected clusters by approach. The excel file of Table 15 can be downloaded by clicking here

 

6. References

1. C. M. Aguilera, C. Gomez-Llorente, I. Tofe, M. Gil-Campos, R. Cañete, Á. Gil. Genome-wide expression in visceral adipose tissue from obese prepubertal children. Int. J. Mol. Sci. 16:4 (2015) 7723–7737. doi: 10.3390/ijms16047723

2. R. A. Irizarry, B. M. Bolstad, F. Collin, L. M. Cope, B. Hobbs, T.P. Speed. Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Research 31:4 (2003) e15-e15. doi: 10.1093/nar/gng015.

3. G. Pei, L. Chen, W. Zhang. WGCNA Application to Proteomic and Metabolomic Data Analysis. Methods in Enzymology 585 (2017) 135–58. doi: 10.1016/bs.mie.2016.09.016.

4. A.M. Yip, S. Horvath. Gene network interconnectedness and the generalized topological overlap measure. BMC Bioinformatics 8:1 (2007) 22. doi: 10.1186/1471-2105-8-22.

5. A. Rzhetsky, S. M. Gomez. Birth of scale-free molecular networks and the number of distinct DNA and protein domains per genome. Bioinformatics 17-10 (2001) 988-996. doi: 10.1093/bioinformatics/17.10.988.

6. B. Zhang, S. Horvath. A General Framework for Weighted Gene Co-Expression Network Analysis. Statistical Applications in Genetics and Molecular Biology 4:1 (2005). doi: 10.2202/1544-6115.1128.

7. P. Langfelder, S. Horvath. Fast R Functions for Robust Correlations and Hierarchical Clustering. Journal of Statistical Software 46:11 (2012) 1–17. doi: 10.18637/jss.v046.i11.

8. P. Langfelder, R. Luo, M. C. Oldham, S. Horvath. Is My Network Module Preserved and Reproducible?. PLOS Computational Biology 7:1 (2011) 1-29. doi: 10.1371/journal.pcbi.1001057