African American women have a 4- to 5-fold greater risk of death from breast cancer compared to Caucasian women, even after controlling for stage at diagnosis, treatment, and other known prognostic factors. Our initial cross-sectional studies suggest that the composition of serum differs between African American and Caucasian women and reflects biochemical changes due to socioeconomic status. Thus, we are now tackling a complex multidimensional dataset that includes proteomic, genomic, biometric, geographic, and socioeconomic measurements. These dimensions need to be harmonized, and appropriate statistical approaches applied, in order to determine the combination of factors that drives this racial health disparity. Additionally, we are planning to increase the size of our dataset, which will make the problem computationally challenging. We invite a talented student to participate in this important and exciting project and to get involved in the parallelization of R code and the development of advanced statistical approaches.
Desired skills: statistics, machine learning, computing, bioinformatics
Genomic analyses have moved into the arena of big data and thus require full automation for deployment on advanced computing infrastructure. The computational workflows tend to be complex, consisting of multiple steps, fans, merges, and user-level conditionals. Numerous quality control and job monitoring procedures are required. Deploying and optimizing this large and complex workload is a big challenge in itself. Different strategies are appropriate for running these analyses in the cloud, on analytics platforms, or on traditional grid clusters. NCSA Genomics invites a computationally savvy student to take part in this activity and learn about different workflow management systems, code benchmarking and optimization, cloud computing, and big data analytics.
Desired skills: computing, engineering, bioinformatics, genomics