Reading notes: excerpt from a paper
Breast cancer is the second leading cause of cancer-related death among women in the United States. Roughly 12.5% of women will be diagnosed with breast cancer in their lifetime. Patients whose cancer is detected early have a better prognosis and receive less severe treatment than those diagnosed at a later stage. It is therefore important to identify new biomarkers that enable early detection of breast cancer.

Among the various genetic and epigenetic modifications, DNA methylation (the addition of a methyl group to a cytosine) plays a key regulatory role in cancer cells. Many studies have shown that DNA methylation is one of the most common molecular changes in breast cancer cells. Methylation of many genes has also been shown to be a potential biomarker in complex diseases, including lung, ovarian, and breast cancers.

With next-generation sequencing (NGS) technologies, it is now possible to identify differentially methylated regions as potential biomarkers by examining methylation at every CG site across the whole genome. However, NGS technologies generate huge amounts of data with complex technical and biological features, and such large, complex data sets are challenging to analyze. To address this problem, we have developed a new statistical method that uses a hidden Markov model to identify differentially methylated regions (or multiple individual CG sites) that are potential breast cancer methylation biomarkers. We will demonstrate our method and compare it with a traditional statistical method. The proposed method has the following advantages: 1) it takes information from neighboring CG sites into account, which lets it indirectly remove errors at individual sites, and 2) it can identify a differentially methylated region of any extent, without restricting the unit of analysis to a gene or a CpG island (a region rich in CG sites).
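The abstract does not give the model's states, emission distributions, or parameters, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes three hypothetical hidden states per CG site (`hypo`, `same`, `hyper`), Gaussian emissions on an assumed per-site tumor-minus-normal methylation difference, and a single hypothetical "stay" probability `STAY`. All of these are illustrative assumptions, as are the names `viterbi` and `call_regions`. Viterbi decoding lets neighboring sites outvote a single noisy measurement. Consecutive sites that share a non-`same` state are then merged into regions of any length, without reference to gene or CpG-island boundaries. For simplicity, the sketch also ignores the genomic distance between sites and read coverage.

```python
import math
from itertools import groupby
from typing import Dict, List, Sequence, Tuple

# Hypothetical hidden states for each CG site (an assumption, not the paper's model):
# methylation lower in tumor, unchanged, or higher in tumor.
STATES = ("hypo", "same", "hyper")
# Assumed Gaussian emission parameters (mean, standard deviation) for the
# observed tumor-minus-normal methylation difference at a site.
EMISSIONS: Dict[str, Tuple[float, float]] = {
    "hypo": (-0.3, 0.1),
    "same": (0.0, 0.1),
    "hyper": (0.3, 0.1),
}
# Assumed probability that the next CG site shares the current site's state;
# a high value lets neighboring sites outvote a single noisy measurement.
STAY = 0.95


def log_normal_pdf(x: float, mean: float, sd: float) -> float:
    """Log density of a normal distribution evaluated at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))


def log_transition(a: str, b: str) -> float:
    """Log probability of moving from state a at one site to state b at the next."""
    leave = (1.0 - STAY) / (len(STATES) - 1)
    return math.log(STAY if a == b else leave)


def viterbi(diffs: Sequence[float]) -> List[str]:
    """Most likely state for each site in an ordered run of CG sites (uniform start)."""
    score = {s: math.log(1.0 / len(STATES)) + log_normal_pdf(diffs[0], *EMISSIONS[s])
             for s in STATES}
    back: List[Dict[str, str]] = []
    for x in diffs[1:]:
        prev, score, ptr = score, {}, {}
        for s in STATES:
            # Best predecessor state for s at this site.
            best = max(STATES, key=lambda p: prev[p] + log_transition(p, s))
            score[s] = prev[best] + log_transition(best, s) + log_normal_pdf(x, *EMISSIONS[s])
            ptr[s] = best
        back.append(ptr)
    # Trace back from the best-scoring final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def call_regions(positions: Sequence[int], diffs: Sequence[float]) -> List[Tuple[int, int, str]]:
    """Merge consecutive sites decoded as 'hypo' or 'hyper' into (start, end, state) regions."""
    regions: List[Tuple[int, int, str]] = []
    i = 0
    for state, run in groupby(viterbi(diffs)):
        n = len(list(run))
        if state != "same":
            regions.append((positions[i], positions[i + n - 1], state))
        i += n
    return regions


positions = list(range(100, 210, 10))  # 11 hypothetical CG-site coordinates
diffs = [0.0, 0.0, 0.0, 0.3, 0.3, 0.05, 0.3, 0.3, 0.0, 0.0, 0.0]
print(call_regions(positions, diffs))  # [(130, 170, 'hyper')]
```

In this toy input, the site at position 150 has a difference of only 0.05, which on its own looks unchanged. Its neighbors, however, pull it into the hypermethylated region, so the sketch reports one region from 130 to 170 rather than two fragments. A realistic model would likely use count-based emissions and parameters estimated from the data instead of the fixed values assumed here.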