We are improving the performance of a stepwise epistatic model selection method for Genome-Wide Association Studies. The method itself works well, but the current Java implementation is far too slow for modern data sizes. We would like to deploy this Java code on Spark to see whether the necessary performance gains can be obtained. The successful student applicant will use the Spark Java API to adapt the current code for a Spark platform that is being deployed at NCSA's Innovative Systems Lab. The code will be validated for correctness in collaboration with a student statistician from the lab of Dr. Lipka, who developed this statistical method.
The NCSA Genomics and Data Analytics teams are jointly looking for a student who enjoys running complex statistical analyses in R. We deal with a range of problems in bioinformatics, genomics, cheminformatics, and disciplines outside biology that require advanced statistics. However, most of the relevant code is written as single-threaded R scripts. Methods have been developed to parallelize R code for use in high-performance computing environments. The successful applicant will learn these parallelization approaches and apply them to improve the performance of code for a variety of projects, both with Illinois faculty and with industry partners. A strong statistical background and a love of R are required. Familiarity with Linux is a bonus.
We are starting a collaboration with the University of Birmingham on the effects of environmental pollution on gene expression. Our collaborators at Birmingham are planning to analyze massive amounts of data and need our help automating their workflows. The student will need to learn Nextflow, a workflow management system written in Groovy and Java. Nextflow will be used to wrap a series of bioinformatics tools in a workflow that provides automatic execution on large numbers of files, good data management, and logging. The student must have experience with several programming languages and either a background in biology, biochemistry, or genomics, or a willingness to learn.