Genomic analyses have moved into the arena of big data and therefore require full automation for deployment on advanced computing infrastructure. The computational workflows tend to be complex, consisting of multiple steps, fans, merges, and user-level conditionals, and they require numerous quality control and job monitoring procedures. Deploying and optimizing this large and complex workload is a major challenge in itself. Different strategies are appropriate for running these analyses in the cloud, on analytics platforms, or on traditional grid clusters. NCSA Genomics invites a computationally savvy student to take part in this activity and learn about workflow management systems, code benchmarking and optimization, cloud computing, and big data analytics. Desired skills: computing, engineering, bioinformatics, genomics.
Health disparities, be they racial, economic, rural-urban, gender-based, or age-based, have come to the forefront across the world. Understanding their underlying causes and making reliable predictions that drive informed decisions by policy makers and health practitioners requires tackling complex multidimensional datasets that include proteomic, genomic, biometric, geographic, and socioeconomic measurements. These dimensions need to be harmonized, and correct statistical approaches applied, in order to determine the exact combination of factors that drive health disparities without making the problem computationally intractable. This requires the development of advanced statistical and machine learning approaches. For example, we intend to apply such approaches to study the effects of pollution and poverty on rural and racial health disparities in Illinois. Another aspect of our work involves civil infrastructure: quality of water, sewage, and electricity; proximity to education, transportation, and medical centers; and quality of the buildings where people live and work. We intend to use advanced geostatistical methods to isolate neighborhood clusters and test whether these neighborhoods are more or less likely to exhibit certain soil or water contamination, socioeconomic patterns, or increased health risks. Machine learning (ML), including deep learning, will be applied to high-resolution satellite images, aerial imagery, and LiDAR data (elevation images) to detect variations in the quality of human environments. ML models will be trained to relate elements that are visible in geospatial data, such as roof conditions, vegetation status, and the degree of land use mixing, to socioeconomic and health information about individuals who live and work in Champaign, Urbana, and Rantoul. We are looking for a team of talented students to participate in this important and exciting project. Desired skills: statistics, machine learning, computing, bioinformatics.