*Corresponding Author:
Daikai Lu
Department of Otolaryngology, Hwamei Hospital, University of Chinese Academy of Sciences, Ningbo, Zhejiang 315000, China
This article was originally published in a special issue, “Recent Developments in Biomedical Research and Pharmaceutical Sciences”
Indian J Pharm Sci 2022:84(4) Spl Issue “77-83”

This is an open access article distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License, which allows others to remix, tweak, and build upon the work non-commercially, as long as the author is credited and the new creations are licensed under the identical terms


This study aimed to discover correlated gene modules and hub genes for recurrent laryngeal squamous cell carcinoma using weighted gene co-expression network analysis. The microarray dataset of recurrent laryngeal cancer, GSE27020, was obtained from the Gene Expression Omnibus database. Weighted gene co-expression network analysis was used to establish a gene co-expression network and to mine hub genes correlated with key clinical traits. Gene ontology enrichment analyses were performed for the genes in modules related to recurrent laryngeal squamous cell carcinoma. We then built a protein-protein interaction network from the genes in the module of interest and identified hub genes by analyzing this network with the cytoHubba plug-in. Finally, we analyzed the overall survival associated with these hub genes using the Gene Expression Profiling Interactive Analysis database. Forty-four gene co-expression modules were obtained via weighted gene co-expression network analysis. The orangered4 module was the module most strongly correlated with recurrence in laryngeal squamous cell carcinoma patients. Genes in the orangered4 module were related to organelle organization, response to chemical, regulation of catalytic activity and regulation of cell differentiation. Two hub genes related to poor prognosis were discovered, namely annexin A2 and S100 calcium binding protein A10. These hub genes appear to play important roles in recurrent laryngeal squamous cell carcinoma and may improve our understanding of the mechanisms underlying its recurrence.


Laryngeal squamous cell carcinoma, gene ontology, bioinformatics, laryngectomy, tumor

Laryngeal Squamous Cell Carcinoma (LSCC), which accounts for nearly 20 % of all head and neck malignant tumors, is a common malignant tumor of the upper respiratory tract[1]. The 5 y survival rate of LSCC (about 60 %) has changed little in the past decade[2]. Recurrence is a crucial cause of failure of anti-tumor therapy in LSCC patients[3]. A better understanding of the mechanism of LSCC recurrence will help to inhibit tumor progression and improve survival and treatment outcomes. It is therefore important to identify hub biomarkers and potential mechanisms of recurrent LSCC.

With the tremendous development of Ribonucleic Acid (RNA) microarray technology, the study of co-expressed genes related to clinical traits has improved our understanding of the mechanism of recurrent LSCC[4]. Weighted Gene Co-Expression Network Analysis (WGCNA) is an influential tool for analyzing gene expression datasets and discovering gene modules that are highly correlated with clinical traits. Here, we constructed a WGCNA network and identified hub genes associated with recurrent LSCC.

In this study, the modules most related to the progression of clinical staging and recurrence were obtained. Gene Ontology (GO) analysis and functional annotation showed that genes in the orangered4 module, which was the module most strongly correlated with recurrent LSCC, were enriched in organelle organization, response to chemical, regulation of catalytic activity and regulation of cell differentiation in LSCC patients. Finally, we discovered two hub genes, Annexin A2 (ANXA2) and S100 Calcium Binding Protein A10 (S100A10), that may predict recurrence of LSCC. We also used the Gene Expression Profiling Interactive Analysis (GEPIA) database to verify whether these two hub genes can predict the progression and prognosis of LSCC.

Materials and Methods

Data processing:

GSE27020, obtained from the Gene Expression Omnibus (GEO) database[5], was used to build co-expression networks and mine hub genes correlated with recurrent LSCC. The dataset provides gene expression profiles from 34 recurrent LSCC patients and 75 non-recurrent LSCC patients. After log2 conversion and quantile normalization, the GSE27020 data set was normalized using Robust Multiarray Average (RMA)[6].
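The log2 conversion and quantile normalization steps can be illustrated with a minimal Python sketch. It is not the authors' pipeline and does not reproduce the full RMA procedure; the function names are hypothetical, intensities are assumed to be strictly positive, and ties are broken by sort order rather than averaged.

```python
import math

def log2_transform(matrix):
    """Apply log2 to every intensity (assumes strictly positive values)."""
    return [[math.log2(value) for value in row] for row in matrix]

def quantile_normalize(matrix):
    """Force every sample (column) to share the same empirical distribution.

    matrix is a list of rows (genes); each row holds one value per sample.
    """
    n_genes = len(matrix)
    n_samples = len(matrix[0])
    columns = [[matrix[g][s] for g in range(n_genes)] for s in range(n_samples)]
    sorted_columns = [sorted(column) for column in columns]
    # Reference distribution: mean of the values that share a rank across samples.
    rank_means = [
        sum(column[rank] for column in sorted_columns) / n_samples
        for rank in range(n_genes)
    ]
    normalized = [[0.0] * n_samples for _ in range(n_genes)]
    for s, column in enumerate(columns):
        order = sorted(range(n_genes), key=lambda g: column[g])
        for rank, g in enumerate(order):
            normalized[g][s] = rank_means[rank]
    return normalized

# Toy matrix: 4 genes (rows) x 3 samples (columns); values are illustrative only.
toy = [[5, 4, 3], [2, 1, 4], [3, 4, 6], [4, 2, 8]]
normalized_toy = quantile_normalize(toy)
```

After normalization, every column of `normalized_toy` contains the same set of values, which is the property that makes samples comparable before network construction.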

Construction of WGCNA co-expression network:

The co-expression network[7] was constructed from the GSE27020 dataset using the WGCNA package in R. The soft-thresholding power of the co-expression network was set to 6, and 0.9 was set as the correlation coefficient threshold. The minimum number of genes in a module was set to 30. To merge possibly similar modules, we defined 0.2 as the cut height threshold.
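The role of the soft-thresholding power can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the WGCNA implementation: it assumes an unsigned network, in which the adjacency between two genes is the absolute Pearson correlation raised to the power β, and the function names and toy expression values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def soft_threshold_adjacency(expr, beta=6):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)| ** beta (diagonal left at 0)."""
    n = len(expr)
    adjacency = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            value = abs(pearson(expr[i], expr[j])) ** beta
            adjacency[i][j] = adjacency[j][i] = value
    return adjacency

def connectivity(adjacency):
    """Connectivity of each gene: the sum of its adjacencies to all other genes."""
    return [sum(row) for row in adjacency]

# Toy expression profiles (rows are genes, columns are samples).
toy_expr = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with the first gene
    [1.0, 3.0, 2.0, 4.0],  # correlation 0.8 with the first gene
]
toy_adjacency = soft_threshold_adjacency(toy_expr, beta=6)
```

Raising the correlation to the sixth power keeps strong relationships (1.0 stays 1.0) while shrinking moderate ones (0.8 becomes about 0.26), which is how the soft threshold emphasizes strong co-expression without a hard cut-off.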

Identification of significant modules correlated to recurrence:

Module eigengenes and gene significance[8] were used to discover gene modules correlated with recurrence of LSCC. The association between module eigengenes and clinical traits was used to identify the clinically significant module. Gene significance was defined as the mediated p-value of each gene in the linear regression between expression and clinical traits. Module significance was defined as the average absolute gene significance of all genes in the module.
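The quantities described above can be sketched as follows. The sketch makes several assumptions that are not taken from the study: the module eigengene is computed as the first principal component of the standardized module expression by power iteration, gene significance is simplified to the absolute correlation between a gene and a binary recurrence trait (rather than a transformed p-value), and all names and toy values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def standardize(row):
    """Scale one gene profile to mean 0 and unit sample standard deviation."""
    n = len(row)
    mean = sum(row) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in row) / (n - 1))
    return [(v - mean) / sd for v in row]

def module_eigengene(module_expr, iterations=200):
    """First principal component across samples, found by power iteration."""
    z = [standardize(row) for row in module_expr]
    n_samples = len(z[0])
    v = z[0][:]  # start from the first gene's standardized profile
    for _ in range(iterations):
        loadings = [sum(a * b for a, b in zip(row, v)) for row in z]
        v = [sum(loadings[g] * z[g][s] for g in range(len(z)))
             for s in range(n_samples)]
        norm = math.sqrt(sum(a * a for a in v))
        v = [a / norm for a in v]
    # Orient the eigengene so that it follows the average module expression.
    mean_profile = [sum(row[s] for row in z) / len(z) for s in range(n_samples)]
    if pearson(v, mean_profile) < 0:
        v = [-a for a in v]
    return v

def gene_significance(module_expr, trait):
    """Simplified gene significance: |cor(gene expression, trait)|."""
    return [abs(pearson(row, trait)) for row in module_expr]

def module_significance(module_expr, trait):
    """Average absolute gene significance over the genes of the module."""
    scores = gene_significance(module_expr, trait)
    return sum(scores) / len(scores)

# Toy module of three genes in six samples; trait 1 = recurrence, 0 = none.
toy_module = [
    [1, 2, 1, 5, 6, 5],
    [2, 2, 3, 7, 6, 8],
    [0, 1, 1, 4, 5, 4],
]
toy_trait = [0, 0, 0, 1, 1, 1]
toy_eigengene = module_eigengene(toy_module)
toy_module_trait_cor = pearson(toy_eigengene, toy_trait)
```

The module-trait association reported for each module corresponds to a correlation such as `toy_module_trait_cor`, computed between the eigengene and the clinical trait.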

Functional enrichment analysis:

To better understand the biological function of the genes in the module of interest related to recurrent LSCC, we performed GO analysis[9] using the online bioinformatics tool Database for Annotation, Visualization, and Integrated Discovery (DAVID)[10] (https://david.ncifcrf.gov/home.jsp/), with p<0.05 set as the cut-off.
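The idea behind an enrichment p-value can be illustrated with a plain hypergeometric test. This is a simplified sketch, not DAVID's own statistic (DAVID reports a modified, more conservative Fisher exact p-value); the function name and the counts in the example are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_term, n_module, n_overlap):
    """Probability of seeing at least n_overlap annotated genes in the module.

    n_background: genes in the background set
    n_term:       background genes annotated with the GO term
    n_module:     genes in the module being tested
    n_overlap:    module genes annotated with the GO term
    """
    upper = min(n_term, n_module)
    total = comb(n_background, n_module)
    tail = sum(
        comb(n_term, k) * comb(n_background - n_term, n_module - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Toy example: 10 background genes, 4 annotated, 3 in the module, 2 overlapping.
toy_p = hypergeom_enrichment_p(10, 4, 3, 2)
is_enriched = toy_p < 0.05  # same cut-off as in the text
```

With these toy counts the p-value is 1/3, so the term would not pass the 0.05 cut-off.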

Hub genes identification:

The gene module with the highest correlation to recurrent LSCC was identified by the WGCNA algorithm. A Protein-Protein Interaction (PPI) network[11] was established from the genes in this module using Cytoscape v3.7.0[12]. The network was analyzed with Molecular Complex Detection (MCODE), and the hub genes were identified using the cytoHubba plug-in in Cytoscape with the Maximal Clique Centrality (MCC) algorithm[13].
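A minimal sketch of the MCC score is given below. It assumes the usual definition, in which the score of a node is the sum of (|C| - 1)! over all maximal cliques C that contain it, and it enumerates cliques with a basic Bron-Kerbosch search. It is not the cytoHubba implementation, and the node names and edges are placeholders rather than interactions from this study.

```python
from math import factorial

def maximal_cliques(adjacency):
    """Enumerate maximal cliques with Bron-Kerbosch (no pivoting).

    adjacency maps each node to the set of its neighbours.
    """
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(current)
            return
        for node in list(candidates):
            expand(current | {node},
                   candidates & adjacency[node],
                   excluded & adjacency[node])
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(adjacency), set())
    return cliques

def mcc_scores(edges):
    """MCC(v) = sum of (|C| - 1)! over the maximal cliques C containing v."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    scores = {node: 0 for node in adjacency}
    for clique in maximal_cliques(adjacency):
        weight = factorial(len(clique) - 1)
        for node in clique:
            scores[node] += weight
    return scores

# Placeholder network: a triangle g1-g2-g3 plus a pendant edge g3-g4.
toy_edges = [("g1", "g2"), ("g2", "g3"), ("g1", "g3"), ("g3", "g4")]
toy_scores = mcc_scores(toy_edges)
top_hub = max(toy_scores, key=toy_scores.get)
```

In the toy network, g3 belongs to both the triangle and the pendant edge and therefore receives the highest score, which is the sense in which MCC ranks candidate hub genes.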

Overall survival of these hub genes:

GEPIA is a powerful web-based tool for cancer bioinformatics that is based on The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx) data[14]. GEPIA offers several key functions, including patient survival analysis. Since there is no separate record of LSCC in the TCGA database, we analyzed the Head and Neck Squamous Cell Carcinoma (HNSC) data set in TCGA as a substitute. In this study, we performed survival analysis with GEPIA to investigate the relationship between hub gene expression levels and the prognosis of HNSC patients.
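The kind of analysis performed by a survival plotter can be sketched with a Kaplan-Meier estimator applied to high and low expression groups. This sketch is not GEPIA's code; the median expression cut-off is an assumption, the function names are hypothetical, and the patient values are invented for illustration. A log-rank test, not shown here, would normally be used to compare the two curves.

```python
from statistics import median

def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events holds 1 for an observed death and 0 for a censored observation.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

def split_by_median(expression, times, events):
    """Split patients into high and low expression groups at the median."""
    cutoff = median(expression)
    high = [(t, e) for x, t, e in zip(expression, times, events) if x > cutoff]
    low = [(t, e) for x, t, e in zip(expression, times, events) if x <= cutoff]
    return high, low

# Invented example: expression (TPM), follow-up (months), death indicator.
toy_expression = [12.0, 30.0, 8.0, 25.0, 40.0, 5.0]
toy_times = [50, 12, 60, 20, 8, 70]
toy_events = [0, 1, 0, 1, 1, 0]
high_group, low_group = split_by_median(toy_expression, toy_times, toy_events)
high_curve = kaplan_meier(*zip(*high_group))
```

Each curve is a step function that drops at every observed death, and censored patients only reduce the number at risk.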

Results and Discussion

We downloaded the GSE27020 dataset of LSCC patients together with its clinical trait data from the GEO database. There were 34 recurrent LSCC patients and 75 non-recurrent patients in GSE27020. The raw microarray data of GSE27020 were normalized using the limma package in R. When the soft thresholding power beta (β) was set at 6, the scale independence reached 0.90 (fig. 1). We used the one-step network construction and module detection function of WGCNA in R, and 45 gene co-expression modules were finally obtained, as shown in fig. 2.


Fig. 1: The network topology and different soft threshold powers


Fig. 2: Clustering dendrogram of genes, with dissimilarity based on topological overlap

To characterize the relationships between the gene co-expression modules, we analyzed the correlation of their eigengenes. The eigengenes clustered into several groups. As a result, the 45 modules could be divided into two clusters (fig. 3), and a few modules showed a high degree of connectivity.


Fig. 3: Eigengene dendrogram and eigengene adjacency plot

We associated the gene modules with recurrence and identified the most significant correlation. As a result, module orangered4 was the module most strongly related to recurrent LSCC, as shown in fig. 4.


Fig. 4: Module-trait associations. Each row corresponds to a module, each column corresponds to a trait, and each cell contains the corresponding correlation and p value

In this study, we performed GO analysis for the genes in the orangered4 module to identify potential molecular mechanisms. For biological process, genes in the orangered4 module were primarily enriched in organelle organization, response to chemical, regulation of catalytic activity and regulation of cell differentiation. For cellular component, these genes were mainly enriched in vesicle, endoplasmic reticulum, cytoskeleton, organelle membrane and mitochondrion. For molecular function, these genes were enriched in cytoskeletal protein binding, metal ion binding, enzyme regulator activity and phosphatase activity. In the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, these genes were enriched in the relaxin signaling pathway, chemokine signaling pathway, Ras signaling pathway and Mitogen Activated Protein Kinase (MAPK) signaling pathway (fig. 5).


Fig. 5: GO and KEGG analysis of the genes in orangered4 module, (A): Biological process; (B): Cellular component; (C): Molecular function and (D): KEGG analysis

To better understand the associations between genes in the orangered4 module, a PPI network was constructed from these genes. The top five hub genes in the orangered4 module were Leucine-Rich Repeat Binding FLII Interacting Protein 1 (LRRFIP1), ArfGAP with FG Repeats 1 (AGFG1), Myoferlin (MYOF), ANXA2 and S100A10, as shown in fig. 6. We performed an overall survival analysis for these hub genes in HNSC using GEPIA. The results showed that ANXA2 and S100A10 were associated with poor prognosis in HNSC patients, as shown in fig. 7.


Fig. 6: The top five hub genes were identified from the genes in the orangered4 module by STRING and MCODE. The hub genes were identified with a degree cut-off=2, haircut on, node score cut-off=0.2, k-core=2 and max depth=100


Fig. 7: Overall survival analysis of these 5 hub genes by GEPIA survival plotter
Note: p<0.05 was considered statistically significant. ANXA2 and S100A10 were associated with the prognosis of head and neck tumors. The two curves show the low ANXA2 TPM and high ANXA2 TPM groups

In 2016, more than 13 000 new cases of LSCC were diagnosed, and nearly 3600 patients were expected to die from LSCC[15]. About 60 % of patients had advanced LSCC at initial diagnosis[16-18]. LSCC is one of the oncologic diseases with a low 5 y survival rate[15]. The pathogenesis of LSCC is related to many risk factors, the most important of which are tobacco and alcohol consumption[19-22]. In addition, exposure to other environmental factors, including asbestos, polycyclic aromatic hydrocarbons and textile dust, is believed to increase the risk of LSCC[23,24].

Before the early 1990s, total laryngectomy was the standard treatment for advanced LSCC[25]; however, due to surgery-related complications and the existence of anastomoses, the treatment has changed to chemotherapy combined with radiotherapy[26]. Although the treatment of LSCC has improved, the overall 5 y survival rate has not increased[27], so it is necessary to find new and improved methods of diagnosis, prognosis evaluation and treatment. Elucidating the mechanism of recurrence in LSCC will help to inhibit tumor progression and improve the quality of life of LSCC patients. Therefore, exploring susceptibility modules and genes for recurrent LSCC is important.

Here, we established a co-expression network by WGCNA using published data from recurrent LSCC patients. All the genes in this dataset were included in the network. Genes with similar expression patterns were clustered into 45 modules. We identified the module most strongly correlated with recurrence in LSCC patients. The genes in this key module were mainly enriched in organelle organization, response to chemical, regulation of catalytic activity and regulation of cell differentiation. In the KEGG pathway analysis[28], the genes in the key module were mainly enriched in the relaxin signaling pathway, chemokine signaling pathway, Ras signaling pathway and MAPK signaling pathway. We identified two hub genes related to recurrent LSCC and prognosis, namely ANXA2 and S100A10.

WGCNA is widely used to analyze large-scale gene expression data sets and find gene modules highly related to clinical characteristics[29]. Through in-depth analysis of the GSE27020 data set, we determined that the orangered4 module was significantly associated with recurrence in LSCC patients. GO analysis showed that organelle organization, response to chemical, regulation of catalytic activity and regulation of cell differentiation were activated during recurrence in LSCC patients. Moreover, we identified two hub genes, ANXA2 and S100A10, that were most highly correlated with recurrent LSCC and prognosis. Studies have reported that high expression of ANXA2 promotes tumor progression by promoting migration, invasion and metastasis in several types of tumors, including breast cancer[30], esophageal squamous cell carcinoma[31], glioblastoma[32] and human cervical cancer[33]. In addition, S100A10, which mainly binds to annexin A2, mediates the conversion of plasminogen to plasmin[34]. Studies have shown that higher S100A10 expression is linked to worse outcome and chemoresistance in a number of cancer types, including lung, breast, ovarian, pancreatic, gallbladder and colorectal cancers and leukemia[35]. S100A10 plays a key role in cancer progression and prognosis and is a potential target for cancer therapy. These reports are consistent with the results of this study. In summary, we discovered a gene module and two hub genes that play essential roles in recurrent LSCC and that may be novel therapeutic targets for LSCC.

Conflict of interests:

The authors declared no conflict of interest.