ABSTRACT
Objective
Cancer is a disease characterized by an unregulated division of abnormal cells in the body. The discovery of oncogenes and tumor suppressor genes has paved the way for the targeted use of individual biomarkers and proteins in cancer therapy. The signaling pathways in cells are closely linked, and research into these connections would lead to more precise personalized treatments for cancer. An imbalance in the complement system is associated with the development and progression of cancer. Comparable variations in gene expression and common complement biomarkers in different cancer types are poorly understood. This study aims to gain insights into biomarkers linking the complement system to carcinogenesis.
Methods
Clinical and transcriptome data from The Cancer Genome Atlas (TCGA) were used to analyze differentially expressed genes involved in the complement system across different cancer types. Bioinformatics and machine learning techniques were then used to propose complement pathway-related biomarkers of carcinogenesis.
Results
This study identifies complement component 7 (C7), complement factor D (CFD), interleukin-11 (IL11), apolipoprotein C1 (APOC1), and integrin-binding sialoprotein (IBSP) as common biomarkers associated with the complement system in cancer and highlights their diagnostic and prognostic potential.
Conclusions
These biomarkers would pave the way for targeted cancer treatments in the context of precision medicine.
INTRODUCTION
Cancer is a disease characterized by an unregulated division of abnormal cells in the body. While chemotherapy and surgery were initially the only options for the treatment of tumors, the identification of tumor suppressor genes and oncogenes has contributed to the notion that individual biomarkers can be targeted for cancer treatment. Recent developments in multi-omics analysis and next-generation sequencing have shown that signaling pathways in cells are tightly linked and form intricate connections.
The integrity of the immune system is crucial for the detection and elimination of cancer cells through a dynamic mechanism that balances immune evasion and protection1. The complement system is a crucial component of both adaptive and innate immunity and consists of membrane-bound, soluble, and intracellular proteins2. Although some studies exist (reviewed in3), little is known about comparable changes in gene expression and shared biomarkers of the complement system across different types of cancer.
The use of biomarkers to individualize medical treatments is an instrument of precision medicine4. To this end, this study used clinical and transcriptome data from nine distinct cancer types to investigate differentially expressed genes associated with the complement system, with the aim of gaining insights into biomarkers that link the complement system to carcinogenesis. The study design is illustrated in Figure 1. The study also characterizes the common biomarkers associated with the complement system in these cancers and highlights their potential. Common biomarkers associated with complement signaling could pave the way for targeted, patient-tailored treatments in the context of precision medicine.
MATERIALS and METHODS
As stated in the “Data and Code Availability Statement” section, this is a bioinformatics study based on publicly accessible data from the TCGA database. Ethics committee approval and patient consent were therefore not required.
Data Selection and Differential Gene Expression Analysis
RNA-seq-based gene expression profiling data were obtained from The Cancer Genome Atlas (TCGA). Only datasets that included more than 500 tumor and normal cases were used, as 500 cases is the smallest recommended population size for logistic regression analyses5. These datasets covered nine cancer types: uterine corpus endometrial carcinoma (UCEC), thyroid carcinoma (THCA), prostate adenocarcinoma (PRAD), lung squamous cell carcinoma (LUSC), lung adenocarcinoma (LUAD), kidney renal clear cell carcinoma (KIRC), head and neck squamous cell carcinoma (HNSC), colon adenocarcinoma (COAD), and breast invasive carcinoma (BRCA).
The R packages “TCGAbiolinks” (v.2.32.0)6 and “DESeq2” (v.1.44.0)7 were used for dataset acquisition, pre-analysis, and differential gene expression analysis. Logarithmic fold change (logFC) values and Benjamini-Hochberg adjusted p-values for each gene were derived from the DESeq2 results. Genes with an adjusted p-value <0.05 and either logFC >1 (upregulated) or logFC <-1 (downregulated) were designated as differentially expressed genes (DEGs), following standard practice in the literature. The genes associated with the complement system were retrieved from the Molecular Signatures Database (MSigDB)8.
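The thresholding rule described above can be illustrated with a minimal Python sketch. It is not the study's R/DESeq2 pipeline; the function name `classify_gene`, the gene labels, and all numeric values are hypothetical and serve only to show how the two cutoffs combine.

```python
from typing import Optional


def classify_gene(log_fc: float, padj: Optional[float],
                  lfc_cutoff: float = 1.0, alpha: float = 0.05) -> str:
    """Label one gene as 'up', 'down', or 'ns' (not significant)."""
    # A missing adjusted p-value is treated as not significant (assumption).
    if padj is None or padj >= alpha:
        return "ns"
    if log_fc > lfc_cutoff:
        return "up"
    if log_fc < -lfc_cutoff:
        return "down"
    return "ns"


# Hypothetical results table: gene -> (logFC, adjusted p-value).
results = {
    "GENE_A": (2.3, 0.001),    # strong increase, significant
    "GENE_B": (-1.8, 0.01),    # strong decrease, significant
    "GENE_C": (0.4, 0.0001),   # significant but below the fold-change cutoff
    "GENE_D": (3.0, 0.20),     # large change but not significant
    "GENE_E": (-2.5, None),    # no adjusted p-value available
}

calls = {gene: classify_gene(lfc, p) for gene, (lfc, p) in results.items()}
degs = sorted(gene for gene, call in calls.items() if call != "ns")
print(calls)
print(degs)
```

Only genes that pass both the significance and the fold-change criterion end up in `degs`.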
Screening of Differential Gene Expressions Across the Complement System Associated Genes
The differentially expressed genes of each cancer type were examined for genes associated with the complement system. The DEGs of each cancer type related to the complement system were defined as “cancer complement genes” specific to that tumor type.
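As an illustration of this screening step, the sketch below intersects per-cancer DEG lists with a complement gene set, builds a presence/absence vector per cancer type, and extracts the genes shared by all cancer types. All gene symbols, cancer labels, and set contents are placeholders, not data from this study.

```python
# Hypothetical complement gene set (placeholder symbols, not MSigDB content).
complement_genes = {"G1", "G2", "G3", "G4", "G5", "G6"}

# Hypothetical DEG lists; the "X*" genes lie outside the complement gene set.
degs_by_cancer = {
    "CANCER_A": {"G1", "G2", "G3", "X1"},
    "CANCER_B": {"G1", "G3", "G5", "X2"},
    "CANCER_C": {"G1", "G3", "G4", "G6"},
}

# Cancer complement genes: DEGs of each cancer type that belong to the gene set.
cancer_complement_genes = {
    cancer: degs & complement_genes for cancer, degs in degs_by_cancer.items()
}

# Binary presence/absence vector per cancer type, in a fixed gene order.
gene_order = sorted(complement_genes)
presence = {
    cancer: [1 if gene in genes else 0 for gene in gene_order]
    for cancer, genes in cancer_complement_genes.items()
}

# Genes that are cancer complement genes in every cancer type.
shared = set.intersection(*cancer_complement_genes.values())

print(presence)
print(sorted(shared))
```

The binary vectors are the kind of input a matching-based similarity measure needs, and `shared` corresponds to genes common to all cancer types.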
Similarity of Various Cancers Across the Complement System
The distance between cancer types in terms of the distribution of cancer complement genes was investigated using a previously described approach9. The simple matching coefficient (SMC) was used to calculate this distance.
$$\mathrm{SMC}(i, j) = \frac{f_{11} + f_{00}}{f_{00} + f_{01} + f_{10} + f_{11}} \qquad (1)$$
The SMCs were used to assess the strength of the relationships between the different cancer types that carry cancer complement genes. Here, the two cancer types are represented by the letters i and j; f00 denotes the total number of genes that are absent from the individual cancer gene lists of both cancer types; f11 denotes the total number of genes that are present in the individual cancer gene lists of both cancer types; and f10 and f01 denote the total number of genes that are present in the gene list of one cancer type but absent from that of the other. The distance between the cancer types with respect to the cancer complement genes was calculated using the R package “nomclust” (v.2.8.0)10 and visualized with the R package “corrplot” (v.0.92)11.
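A worked example may help make the counts concrete. The sketch below computes the simple matching coefficient for two binary presence/absence vectors; it does not reproduce the “nomclust” implementation, and the vectors are hypothetical.

```python
from typing import Sequence


def simple_matching_coefficient(a: Sequence[int], b: Sequence[int]) -> float:
    """SMC = (f11 + f00) / (f00 + f01 + f10 + f11) for two 0/1 vectors."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    f11 = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)  # present in both
    f00 = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)  # absent in both
    f10 = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)  # only in cancer i
    f01 = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)  # only in cancer j
    return (f11 + f00) / (f00 + f01 + f10 + f11)


# Hypothetical presence/absence of eight genes in two cancer types.
cancer_i = [1, 1, 0, 0, 1, 0, 1, 0]
cancer_j = [1, 0, 0, 1, 1, 0, 0, 0]

smc = simple_matching_coefficient(cancer_i, cancer_j)
print(smc)       # f11 = 2, f00 = 3, f10 = 2, f01 = 1, so SMC = 5/8 = 0.625
print(1 - smc)   # the corresponding dissimilarity, 0.375
```

Because f00 enters the numerator, two cancer types also count as similar for every gene that is absent from both lists, not only for genes they share.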
Evaluation of Immune Cell Infiltration
The online portal CIBERSORTx (https://cibersortx.stanford.edu/) was used to estimate the proportions of immune cells in the different cancer types. The tool applies a deconvolution algorithm to gene expression data using the LM22 gene signature, which allows sensitive and precise identification of 22 phenotypes of human hematopoietic cells. For each cancer type, the median expression value of each gene was used to allow comparison between cancers. CIBERSORTx reports a p-value for each deconvolution, which indicates the level of confidence in the results; a p-value <0.05 was considered significant12. The number of permutations was set to 1000. The distance between the cancer types with respect to immune cell infiltration was calculated and visualized with the R packages “nomclust” and “corrplot”.
Statistical Analysis
The cancer complement genes common to all cancer types were designated as the prospective “cancer complement biomarkers”. Based on the survival data of TCGA patients, the predictive efficacy of each cancer complement biomarker was evaluated and visualized for each cancer type using the R package “survival” (v.3.6.4)13. This approach allowed patients to be classified by risk score and prognostic performance. The p-values of the log-rank test were used to evaluate the prognostic potential of the cancer complement biomarkers.
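To make the log-rank comparison concrete, the following sketch computes the two-group log-rank chi-square statistic and its p-value (one degree of freedom) in plain Python. It is a simplified illustration rather than the “survival” package implementation; the function name and the follow-up times are hypothetical.

```python
import math
from typing import Sequence, Tuple


def logrank_test(times_a: Sequence[float], events_a: Sequence[int],
                 times_b: Sequence[float], events_b: Sequence[int]) -> Tuple[float, float]:
    """Two-group log-rank test; events are 1 (death) or 0 (censored)."""
    records = [(t, e, 0) for t, e in zip(times_a, events_a)]
    records += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in records if e == 1})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for tt, _, g in records if tt >= t and g == 0)  # at risk, group A
        n_b = sum(1 for tt, _, g in records if tt >= t and g == 1)  # at risk, group B
        d_a = sum(1 for tt, e, g in records if tt == t and e == 1 and g == 0)
        d_b = sum(1 for tt, e, g in records if tt == t and e == 1 and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n  # expected deaths in A under the null hypothesis
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("log-rank statistic is undefined for these data")
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical follow-up times (all events observed) for two risk groups.
high_risk = ([1, 2, 3, 4, 5], [1, 1, 1, 1, 1])
low_risk = ([6, 7, 8, 9, 10], [1, 1, 1, 1, 1])

chi2, p_value = logrank_test(*high_risk, *low_risk)
print(chi2, p_value)
```

In this toy example every patient in the first group dies before any patient in the second group, so the test yields a small p-value; two groups with identical follow-up data would give a statistic of 0 and a p-value of 1.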
Logistic Regression Analysis
The R package “nnet” (v.7.3.19)14 was used to develop a logistic regression model that predicted associations between the cancer complement biomarkers and carcinogenesis. The receiver operating characteristic (ROC) curves were generated using the “ROCR” package (v.1.0.11)15.
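The area under a ROC curve can be computed directly from predicted scores, because it equals the probability that a randomly chosen tumor sample receives a higher score than a randomly chosen normal sample. The sketch below uses this pairwise definition; it does not fit a logistic regression or call “ROCR”, and the scores and labels are hypothetical.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    # A correctly ordered pair counts 1, a tie counts 0.5, a reversed pair 0.
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted probabilities; label 1 = tumor, 0 = normal.
scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2]
labels = [1, 1, 1, 0, 0, 0]

auc = roc_auc(scores, labels)
print(round(auc, 3))  # 7 of the 9 tumor/normal pairs are ordered correctly
```

With this definition, an AUC of 0.5 corresponds to random ordering of tumor and normal samples.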
Construction of a Regulatory Network
Transcription factors (TFs), microRNAs (miRNAs), and competing endogenous RNAs (ceRNAs) all influence the expression of genes. The miRNAs that interact with the cancer complement biomarkers were predicted using mirDIP16 and miRNet (which integrates miRNA data from 14 different miRNA databases)17. hTFtarget and miRNet (which integrates TF data from 5 different TF databases) were used to collect TFs associated with the cancer complement biomarkers17, 18. ceRNAs that may affect the cancer complement biomarkers were identified via the Starbase19 and LncACTdb20 databases. The proteins that interact with the biomarkers were obtained from BioGrid (v.4.4.235)21.
Cytoscape (v.3.10.0) was used to map the regulatory network with protein-protein interactions22. The nodes of the network were determined using the “Cytohubba” plugin23.
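Degree is one of the simplest centrality measures used for hub selection: it counts how many distinct neighbors a node has. The sketch below ranks the nodes of a small undirected network by degree; it does not reproduce Cytohubba, and the node names and edges are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def degree_ranking(edges: Iterable[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Rank nodes of an undirected network by their number of distinct neighbors."""
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        neighbors[u].add(v)
        neighbors[v].add(u)
    # Highest degree first; ties are broken alphabetically for reproducibility.
    return sorted(((node, len(nb)) for node, nb in neighbors.items()),
                  key=lambda item: (-item[1], item[0]))


# Hypothetical regulatory edges around two biomarkers.
edges = [
    ("BIOMARKER_A", "TF_1"),
    ("BIOMARKER_A", "MIRNA_1"),
    ("BIOMARKER_A", "PROTEIN_1"),
    ("BIOMARKER_B", "TF_1"),
    ("BIOMARKER_B", "MIRNA_1"),
    ("TF_1", "PROTEIN_1"),
]

ranking = degree_ranking(edges)
print(ranking)
```

Here `BIOMARKER_A` and `TF_1` each have three neighbors and would be the candidate hubs of this toy network.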
Functional Enrichment Analysis of Regulatory Network Elements
Gene Ontology (GO) annotation24, Kyoto Encyclopedia of Genes and Genomes (KEGG) functional overrepresentation25, and Reactome functional overrepresentation26 were analyzed with the R package “clusterProfiler” (v.4.12.0)27 and displayed with the R package “genekitr” (v.1.2.5)28.
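Overrepresentation analysis is commonly based on the hypergeometric test, which asks how likely it is to see at least the observed overlap between a query gene list and a pathway by chance. The sketch below computes this one-sided p-value; it is not the “clusterProfiler” code, it omits multiple-testing correction, and all counts are hypothetical.

```python
from math import comb


def overrepresentation_p(universe: int, pathway: int, query: int, overlap: int) -> float:
    """P(X >= overlap) when `query` genes are drawn without replacement
    from `universe` genes, of which `pathway` belong to the pathway."""
    if not (0 <= pathway <= universe and 0 <= query <= universe):
        raise ValueError("pathway and query sizes must lie within the universe")
    upper = min(query, pathway)
    total = comb(universe, query)
    tail = sum(comb(pathway, i) * comb(universe - pathway, query - i)
               for i in range(overlap, upper + 1))
    return tail / total


# Hypothetical counts: 10 background genes, 5 in the pathway,
# 2 query genes, both of which fall into the pathway.
p = overrepresentation_p(universe=10, pathway=5, query=2, overlap=2)
print(round(p, 4))  # C(5,2) * C(5,0) / C(10,2) = 10 / 45
```

In a real analysis the resulting p-values would still need adjustment for the number of pathways tested.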
RESULTS
Transcriptome Analysis in Different Types of Cancer
Differentially expressed genes were defined as those with an adjusted p-value <0.05 and logFC >1 or logFC <-1 (Supplementary Table S1). According to the results, KIRC had the most DEGs of the nine types of cancer examined, while THCA had the fewest. All cancers except THCA had more upregulated genes than downregulated genes.
Determination of Genes of the Complement System in Different Cancer Types
A total of 522 genes potentially related to the complement system were identified (Supplementary Table S2). The DEGs of each cancer type associated with the complement system, defined as cancer complement genes specific to that tumor type, are listed in Supplementary Table S3. Supplementary Table S4 shows the binary matrix indicating the presence or absence of complement system genes in each cancer type. According to this table, all cancer types in this study showed a considerable number of cancer complement genes with differential expression. Among the 522 complement system genes, PRAD had the lowest proportion of these genes (20%), while KIRC had the highest proportion of these genes (55%).
The common elements of the complement system in different types of cancer are shown in Figure 2A. Five genes, namely apolipoprotein C1 (APOC1), complement component 7 (C7), complement factor D (CFD), integrin-binding sialoprotein (IBSP), and interleukin-11 (IL11), were common to all cancer types. These cancer complement genes, which are common to all cancers, were designated as the prospective “cancer complement biomarkers”. The heatmap of these biomarkers across all cancer types, with their fold change values, is shown in Figure 2B. According to this heatmap, C7 was downregulated in all cancers, and CFD was downregulated in all cancers except KIRC. In contrast, IBSP was upregulated in all cancers, and IL11 was upregulated in all cancers except KIRC. APOC1 was upregulated in all cancers except LUAD and LUSC.
Similarity Analysis Between Cancers Over the Complement System Genes
Considering the distributions of cancer complement genes, a similarity analysis was performed to calculate the distances between cancer types and to determine the strength of correlations between cancer types across these genes.
The SMC values between cancers ranged from 0.50 to 0.75 (Figure 2C). The distance between THCA and KIRC was the largest (SMC=0.50), while UCEC and LUAD (SMC=0.75) and LUSC and LUAD (SMC=0.74) were the most similar cancer types in terms of cancer complement genes.
Evaluation of Immune Cell Infiltration
Immune infiltration in each cancer type was estimated by deconvolution with CIBERSORTx. Deconvolution was not significant for KIRC, PRAD, and THCA (CIBERSORTx p>0.05), whereas the other six cancer types showed significant immune infiltrate deconvolution results (Figure 2D and Supplementary Figure S1). Of the 22 immune cell types, 15 were detected in two or more cancer types, while naive B-cells, gamma delta T-cells, CD4 memory resting T-cells, activated dendritic cells, CD4 memory activated T-cells, resting mast cells, and neutrophils were not detected in any of the cancer types. Memory B-cells were the most common population in all six cancer types (more than 56% in each), and M2 macrophages were present at significantly higher levels in BRCA than in the other cancer types (11%).
The SMC analysis regarding immune cells showed that LUAD and LUSC (SMC=0.41) and LUAD and BRCA (SMC=0.41) are the most strongly correlated of these six cancer types (Figure 2E).
Prognostic Potential of the Cancer Complement Biomarkers
Survival analysis was performed using the Cox regression model and Kaplan-Meier estimates to determine the prognostic power of the five potential cancer complement biomarkers for each cancer type and to compare patient survival between low- and high-risk groups.
Among the five biomarkers, APOC1 showed significant predictive power (p<0.05) for KIRC and THCA, as did C7 for LUAD, PRAD and UCEC, CFD for UCEC, IBSP for COAD, KIRC and LUAD, and IL11 for BRCA, KIRC and LUAD (Figure 3).
Diagnostic Potential of the Cancer Complement Biomarkers
A logistic regression model was developed to predict the relationship between cancers and the five prospective cancer complement biomarkers. ROC curves were generated to investigate the potential predictive value of these biomarkers in each cancer type. Figure 4 illustrates the area under the curve (AUC) of all cancers for each biomarker.
The most common technique for determining correlations between binary outcomes and biomarkers is logistic regression, where the accuracy of a model is provided by the ROC curves. The classification scheme proposed by Hosmer and Lemeshow and confirmed in the literature for the discriminatory power of a biomarker based on the AUC is as follows: ineffective (0.0-0.5), poor (0.5-0.6), sufficient (0.6-0.7), good (0.7-0.8), very good (0.8-0.9), excellent (0.9-1.0)29.
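The classification scheme above can be written as a small lookup function. The listed ranges share their endpoints, so the sketch assumes that each lower bound is inclusive and each upper bound exclusive, with 1.0 counted as excellent; this boundary handling and the function name are assumptions, not part of the cited scheme.

```python
def classify_auc(auc: float) -> str:
    """Map an AUC value to the discriminatory-power category listed above."""
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC must lie between 0 and 1")
    # Assumption: a value on a boundary belongs to the higher category.
    if auc < 0.5:
        return "ineffective"
    if auc < 0.6:
        return "poor"
    if auc < 0.7:
        return "sufficient"
    if auc < 0.8:
        return "good"
    if auc < 0.9:
        return "very good"
    return "excellent"


# Hypothetical AUC values.
for value in (0.29, 0.55, 0.65, 0.75, 0.85, 0.95):
    print(value, classify_auc(value))
```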
According to the logistic regression results, of the 45 analyses (5 biomarkers for 9 cancer types each), only 3 cases had no diagnostic significance [APOC1 for COAD (AUC=0.59), IBSP for PRAD (AUC=0.55), and IL11 for UCEC (AUC=0.29)]. The AUCs for the other cases ranged from good to excellent according to the classification of Hosmer and Lemeshow (Figure 3). To maintain figure clarity, the p-values of the Kaplan-Meier curves and the AUCs of the ROC curves are not displayed within the figures. Instead, these values are presented separately in Table 1.
Regulatory Network around Cancer Complement Biomarkers
The ceRNAs, miRNAs, TFs and proteins associated with these biomarkers are listed in Supplementary Table S5. A total of 445 elements, including 61 ceRNAs, 156 miRNAs, 171 TFs and 57 proteins, were found around these biomarkers. Figure 5A shows the multifactorial regulatory network (MRN) of cancer complement biomarkers. Degree and betweenness centrality analysis with the Cytohubba tool revealed 13 hub elements, namely IL11, CFD, APOC1, C7, IBSP, CREB1, CTCF, EP300, MYC, P63, AR, hsa-mir-16-5p, and hsa-mir-155-5p (Supplementary Table S6).
Functional Enrichment Analysis
The regulatory network elements were used to investigate enriched pathways associated with the cancer complement biomarkers. GO annotation, KEGG functional overrepresentation, and Reactome functional overrepresentation revealed that MRN elements were enriched mainly in carcinogenesis and complement system-associated pathways such as estrogen receptor (ESR)-mediated signaling and SUMOylation (Figure 5B-D, Supplementary Tables S7, S8, and S9).
DISCUSSION
The complement system significantly influences the development and spread of tumors, which in turn affects the prognosis and diagnosis of cancer. By using common biomarker profiles of different cancer types within this pathway, researchers may be able to develop individualized treatments and gain a deeper understanding of cancer biology. In this work, transcriptome and clinical data from TCGA were used to identify common differentially expressed genes associated with the complement system in nine cancer types with large sample sizes.
All cancer types in this study showed a considerable number of cancer complement genes with varying levels of expression. The SMC values for cancer complement genes ranged from 0.50 to 0.75, indicating that any two cancer types agree on the status (differentially expressed or not) of at least half of the complement system genes. LUSC and LUAD (SMC=0.74) are among the most similar cancers in terms of cancer complement genes. Similarly, the results of CIBERSORTx show that different types of immune cells infiltrate different cancer types. In the CIBERSORTx analysis, M2 macrophages were found to be significantly increased in breast cancer compared to the other cancer types, which is consistent with a study showing that M2 macrophages stimulate cell migration and growth in breast cancer30.
The correlation between cancer types in terms of their immune cell proportions illustrates that LUAD and LUSC are the most strongly correlated of the six cancer types with significant deconvolution (SMC=0.41) (Figure 2E). These two findings are consistent with the fact that both are subtypes of lung cancer and are classified together as non-small cell lung cancer31.
Functional enrichment analysis revealed that MRN elements were mainly enriched in pathways related to the complement system and carcinogenesis, including SUMOylation and ESR-mediated signaling. These results are consistent with the literature: SUMOylation is a post-translational modification that regulates immunological responses, carcinogenesis and DNA damage repair32, and the ESR pathway is important in breast growth and development and is a target in breast cancer33.
Five biomarkers, namely APOC1, C7, CFD, IBSP, and IL11, were common to all cancer types. The diagnostic and prognostic performance of these biomarkers, which were determined individually for each cancer type, shows remarkable results in most cancer types and represents an important resource for future research.
The most abundant apolipoprotein in very low density lipoprotein cholesterol is APOC. Recently, APOC1 was found to function as an immunological biomarker that controls macrophage polarization and contributes to the development of renal cell carcinoma34. This protein indicates a poor prognosis and is associated with immune infiltration of the tumor in esophageal squamous cell carcinoma35. Complement C7, a terminal component of the complement cascade, is essential for the formation of the membrane attack complex as it penetrates lipid bilayers36. In an omics study by Chen et al.37, C7 was suggested to be a novel downregulated prognostic biomarker and immunotherapy target in PRAD. Consistent with this literature, C7 was downregulated in all cancers in the present study, suggesting a tumor-suppressive role (Figure 2B).
CFD, also known as adipsin, is an adipokine that is mostly produced in fat tissue and then released into the bloodstream. It plays a crucial role in the activation of the complement system and serves as the rate-limiting component of the alternative complement pathway. IBSP is an essential component of bone formation, renewal and repair. When IBSP binds to complement factor H, cell surface-related complexes are formed that protect cells from complement-mediated lysis38. The proliferation of cancer cells and the inflammatory microenvironment of the tumor are mediated by cytokines. Together with IL-6 and IL-27, IL-11 belongs to the family of glycoprotein 130 cytokines39. Numerous studies have demonstrated the possible involvement of IL-11 in a number of cancers, including prostate, ovarian, pancreatic, breast, uterine, bone, stomach, and colorectal cancers1.
Study Limitations
This study has certain limitations. First, due to the limited availability of cancer data, the analyses were confined to TCGA, with each tumor type represented by a single dataset. While the number of cases was sufficient for statistical and logistic regression analyses, this restriction in sample size limits the generalizability of the findings. Second, transcriptome analyses primarily identify associations between diseases and traits but provide limited insight into the underlying mechanisms. Understanding how various cell types respond to therapy and impact the overall prognosis is crucial. Additionally, further research is needed to elucidate the mechanisms through which cancer complement biomarkers exert tumor-suppressive or carcinogenic effects in the examined cancer types.
CONCLUSION
In conclusion, the growth and distribution of tumors are significantly influenced by the tumor microenvironment (TME), which in turn affects the therapeutic outcome for the patient. The complement system plays an important and complex role in this scenario. It could destroy tumor cells covered with antibodies, induce localized chronic inflammation, or suppress the T-cell response to the tumor, which promotes tumor growth. These contradictions strongly depend on the composition of the TME, the regions of complement activation, and the susceptibility of the tumor cells to the attack of the complement system, according to the latest research results. The proposed five biomarkers of this study and their surrounding network hubs open up fascinating opportunities for translational research and innovation in patient-centred healthcare and precision medicine.


