Childhood acute lymphoblastic leukaemia (ALL) is the most common childhood malignancy. Current treatment strategies are determined by clinical presentation; however, there has been a recent push towards panomics to realise the goals of personalised medicine. This assumes that patients with similar genetics will respond to similar treatments with similar outcomes. We present an approach that utilises the similarity of patients' genetics to predict treatment outcome in ALL. Constructing a spatial representation of patient genetic similarity, or "similarity space", especially one built with markers that underlie treatment outcome, will be useful for this endeavour, whilst acknowledging the complexity of patients' integrated SNP genotypic markers and gene activity. A two-step strategy is required: first, identifying the markers that are predictive of treatment outcome and, second, combining these attributes to construct a similarity space. Our data were generated on a cohort of over 100 precursor B-cell childhood ALL patients diagnosed at the Children's Hospital at Westmead. We profiled 22,277 gene expression probesets from diagnostic bone marrow and genotyped 13,917 SNPs from remission peripheral blood of these patients. We used the popular data-mining technique Random Forest (RF) to rank the probesets and SNPs most predictive of treatment outcome. The rankings of the probesets and SNPs from each RF ensemble were combined into a global list of informative features. In essence, we gauged the overall genetic similarity between patients based on the 250 most predictive probesets and SNPs. We then constructed our similarity space using different visualisation techniques, in particular spectral graph techniques, to tease out and visualise the subtle differences between these patients in terms of treatment outcome. We found that there was little association between clinical presentation markers and patient position in this similarity space. We discovered that our two-step approach using the combined gene expression and SNP data provided a stronger patient separation than either of the two datasets alone. This implies that integrating different types of genetic data offers deeper insight into the underlying biological basis of individual ALL disease. Our approach aims to develop a predictive model in the form of a similarity space that handles the complex genetics underlying ALL. We have demonstrated that there is potential for complex panomic data to be represented in a space reflective of individual patient status, which has applications for clinical management.