2018-2019 Academic Year Mentors

NCSA SPIN mentor Colleen Bushell

The Visual Analytics team at NCSA focuses on the application and development of machine learning and visual information design techniques that aid in the comprehension of complex data, especially for use in precision medicine.

Recent advances in genetic sequencing and health monitoring technology have allowed for the collection of complex data at unprecedented scale. The focus of our R&D work is 1) to apply and refine analytical approaches for the selection and ranking of key features from genetic, microbial, mass spectrometry and other relevant data that can be used as biomarkers to predict health outcomes and/or guide treatment, and 2) to develop interactive visualization methods that put these results in a biological context, allowing researchers to study the data effectively and assisting doctor-patient decision-making. We are looking for students who have strong skills in one of the following areas: biology (human genome, microbial genome, human disease), health care, software engineering, machine learning, visualization or interface design. Potential project areas where students can participate include: information design, user-experience design, visualization software development, literature reviews, and review of current clinical data tools and challenges faced by healthcare providers.

NCSA SPIN mentors Lídia Carvalho Gomes and Elif Ertekin

Thermoelectric materials can convert heat directly into electricity. The discovery of new semiconductors for thermoelectric applications is a challenging task, but it is also extremely important for contributing to and accelerating the development of next-generation clean and renewable energy sources. Many factors can affect a thermoelectric material's properties. For instance, by introducing impurities into the crystalline structure of some thermoelectric materials, we may be able to greatly improve their performance by tuning their electronic properties.

We are therefore interested in using high-throughput first-principles calculations to investigate how defects can help us achieve high-efficiency energy conversion in materials with potential thermoelectric applications. Depending on the student's interest and experience, the work can focus either on the development of computational tools for data management and analysis or on understanding the fundamental physics and chemistry of the materials.

Interested students are expected to have some experience or be willing to learn density functional theory. Basic programming skills will be very useful.

NCSA SPIN mentor Donna Cox

The Advanced Visualization Lab is looking for "Renaissance" students who are interdisciplinary and want to work on projects at the intersection of art + technology. Are you a programmer with an eye for design? A filmmaker who wants to make use of VR/AR technologies? A double-major in music and math? A dancer and an engineer? Tell us what you are passionate about and what project ideas you have. The Advanced Visualization Lab specializes in cinematic data visualization for public audiences, scientists, and performers. We want to support self-motivated students in building digital experiences for cutting-edge arts applications, through the Fiddler Endowment.

NCSA SPIN mentor James Eyrich

This project involves creating and deploying a system that uses multiple GPU cores to perform password cracking. The student will need to determine hardware and software requirements, build the system, and document how to use it.

Separate but related project: Cloud spot pricing for GPU instances can be very cheap and is billed per minute, so it would be worthwhile to evaluate the price/performance and total cost of ownership (TCO) of using temporary cloud instances. A fun project could be to build a system that accepts one or more password hashes, starts up a GPU spot instance, cracks the password, and then shuts down the instance.
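
As a rough illustration of the price/performance evaluation mentioned above, the following Python sketch compares instances by cost per billion hashes and estimates the billed cost of a job. The `GpuInstance` class, its field names, and the 60-second billing increment are assumptions made for illustration only; real prices and hash rates would have to be measured.

```python
import math
from dataclasses import dataclass


@dataclass
class GpuInstance:
    """A cloud GPU offering. All figures passed in are placeholders."""
    name: str
    price_per_hour: float     # USD per hour (hypothetical spot price)
    hashes_per_second: float  # measured cracking throughput


def cost_per_billion_hashes(inst: GpuInstance) -> float:
    """USD spent to compute 1e9 candidate hashes on this instance."""
    hashes_per_hour = inst.hashes_per_second * 3600
    return inst.price_per_hour / hashes_per_hour * 1e9


def job_cost(inst: GpuInstance, keyspace: float, billing_seconds: int = 60) -> float:
    """Worst-case USD cost to exhaust `keyspace` candidates.

    Run time is rounded up to the billing increment, which is assumed
    here to be one minute.
    """
    seconds = keyspace / inst.hashes_per_second
    billed = math.ceil(seconds / billing_seconds) * billing_seconds
    return inst.price_per_hour * billed / 3600
```

Comparing `cost_per_billion_hashes` across instance types gives a simple price/performance ranking; a full TCO comparison would also need hardware purchase, power, and staff costs for the locally built system.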

Skills required: familiarity with GPU hardware, OS deployment (student's choice), and a programming language (IRST would prefer Python 3 or Go, and JavaScript for web)

Contact James Eyrich

SSH-Auditor works, but is currently designed to run at regular intervals and do the entire discovery+scan process, then go quiet for a few hours. An alternative mode would be to have individual components that listen for events. This would enable a discovery to run slower, but continuously, sending a few packets per second all day long.

The store could be wrapped in a component that is a RPC-type service that 'owns' the data.

There would be a discovery component that continually scans for SSH servers and notifies the store when new hosts are found. This could be designed to take a target packet per second rate so it finishes in a certain time frame.
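
A minimal sketch of the rate idea described above, in Python: compute the packets-per-second rate needed to cover a target list within a time frame, then pace the targets at that rate. The function names `required_rate` and `paced` are hypothetical and not part of SSH-Auditor; the actual packet sending is left out.

```python
import time
from typing import Callable, Iterable, Iterator


def required_rate(num_targets: int, window_seconds: float) -> float:
    """Packets per second needed to probe every target once in the window."""
    if num_targets < 0 or window_seconds <= 0:
        raise ValueError("need num_targets >= 0 and window_seconds > 0")
    return num_targets / window_seconds


def paced(
    targets: Iterable[str],
    rate_pps: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[str]:
    """Yield targets no faster than `rate_pps`.

    The next send time is advanced by a fixed interval rather than
    measured from "now", so slow probes do not accumulate drift.
    """
    if rate_pps <= 0:
        raise ValueError("rate_pps must be positive")
    interval = 1.0 / rate_pps
    next_send = clock()
    for target in targets:
        delay = next_send - clock()
        if delay > 0:
            sleep(delay)
        yield target
        next_send += interval
```

For example, covering 65,536 addresses in 24 hours needs `required_rate(65536, 86400)`, which is under one packet per second.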

There would be a component that continually checks the store for outdated hosts and scans them.
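
The store and the outdated-host check described above could look roughly like the following in-memory Python sketch. The class `HostStore` and its method names are assumptions for illustration; in the proposed design these would be methods on an RPC service rather than a local object.

```python
import time
from typing import Dict, List, Optional


class HostStore:
    """In-memory stand-in for the service that 'owns' the host data."""

    def __init__(self, max_age_seconds: float) -> None:
        self.max_age = max_age_seconds
        # Maps host -> time of last scan, or None if never scanned.
        self._last_scanned: Dict[str, Optional[float]] = {}

    def add_host(self, host: str) -> bool:
        """Record a discovered host; return True only if it is new."""
        if host in self._last_scanned:
            return False
        self._last_scanned[host] = None
        return True

    def mark_scanned(self, host: str, now: Optional[float] = None) -> None:
        """Record that `host` was scanned at time `now` (default: current time)."""
        self._last_scanned[host] = time.time() if now is None else now

    def outdated_hosts(self, now: Optional[float] = None) -> List[str]:
        """Hosts never scanned, or last scanned more than max_age ago."""
        current = time.time() if now is None else now
        return sorted(
            host
            for host, scanned_at in self._last_scanned.items()
            if scanned_at is None or current - scanned_at > self.max_age
        )
```

The discovery component would call `add_host` for each SSH server it finds, and the scanning component would poll `outdated_hosts` and call `mark_scanned` after each scan.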

gRPC could be a good fit for this; the design would be similar to how the Boulder flow diagrams work.

If this project is completed before the end of the semester, we have similar tools for which we would like a web interface developed.

IRST would prefer Python 3 or Go, and JavaScript for web; familiarity with Nmap-like tools is useful.

NCSA SPIN mentor Patricia M. Gregg

Prof. Gregg's Volcano Lab utilizes sophisticated finite element modeling approaches to forecast eruptions at active volcanoes worldwide. Geophysical data from satellites and ground-based observations such as GPS and earthquakes are used to provide constraints on models that simulate magma storage, migration, and eruption in volcanic systems. Students will have the opportunity to learn about the factors controlling volcano dynamics and work with team members to build their own models to investigate an active volcanic system (e.g., Mt. St. Helens, Kilauea HI, and Yellowstone WY). Students will also have the opportunity to use high-performance computing techniques to calculate finite element models (FEMs) with billions of degrees of freedom. Coding experience in a language such as Matlab or Python is required. Solid mechanics (stress, strain, failure) and numerical analysis (finite element method) are desired skills, but can be acquired during the project. The successful student will have the chance to participate in publishing peer-reviewed papers based on their results.

NCSA SPIN mentor Kaiyu Guan

Dr. Kaiyu Guan's lab is conducting research that uses novel data from NASA satellites to study environmental impacts on global and U.S. agricultural productivity, using the most powerful supercomputer in scientific research (Blue Waters). We are looking for highly motivated and programming-savvy undergraduate students to join the lab for the SPIN program. The chosen students will be closely mentored by Dr. Guan and will work on issues including processing large satellite datasets, understanding and implementing remote sensing algorithms, and solving questions related to global food production and food security.

NCSA SPIN mentor Roland Haas

Modern scientific simulations have enabled us to study non-linear phenomena that are impossible to study otherwise. Among the most challenging problems is the study of Einstein's theory of relativity, which predicts the existence of gravitational waves, detected very recently by the LIGO collaboration. The Einstein Toolkit is a community-driven framework for astrophysical simulations. I am interested in recruiting a student to help move the Einstein Toolkit's issue tracker, wiki, website and code repositories from self-hosted servers using Trac, MediaWiki and Bitbucket to GitHub. The transition process will preserve existing tickets and wiki content, requiring interaction with the APIs of Trac, MediaWiki and GitHub to transfer the information. The successful applicant will be involved with both the Relativity Group at NCSA and the Blue Waters project, and will be invited to participate in the weekly group meetings and discussions of their research.

Details: The Einstein Toolkit currently uses a self-hosted Trac service to host its issue tracker, a MediaWiki for user-generated documentation, and Apache and PHP for its website to host the code. The project aims to unify these using the facilities offered by GitHub to provide an issue tracker, wiki and website. Website content is mostly static, but some parts of it need to be auto-generated from LaTeX source files and require integration with Travis-CI or similar on GitHub's Pages service. Skills useful for this project include knowledge of, or willingness to learn, GitHub, Travis-CI, Jekyll, Ruby, HTML, LaTeX, JavaScript, PHP, and Python.
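
To give a flavor of the ticket transfer, here is a hedged Python sketch that maps one Trac ticket to the JSON payload accepted by GitHub's "create an issue" REST endpoint (`title`, `body`, `labels`). The function name `trac_ticket_to_issue` and the label format are assumptions; the attribute names follow Trac's standard ticket fields, and fetching from Trac or posting to GitHub is not shown.

```python
from typing import Any, Dict


def trac_ticket_to_issue(ticket_id: int, attrs: Dict[str, Any]) -> Dict[str, Any]:
    """Build a GitHub issue payload from one Trac ticket's attributes.

    `attrs` is assumed to hold standard Trac fields such as 'summary',
    'description', 'reporter', 'component', 'type' and 'priority'.
    """
    # Turn selected Trac fields into labels, skipping empty ones.
    labels = [
        f"{key}: {attrs[key]}"
        for key in ("component", "type", "priority")
        if attrs.get(key)
    ]
    # Keep a provenance note so the original ticket can be traced.
    body = (
        f"{attrs.get('description', '')}\n\n"
        f"Migrated from Trac ticket {ticket_id}, "
        f"originally reported by {attrs.get('reporter', 'unknown')}."
    )
    return {
        "title": attrs.get("summary") or f"Trac ticket {ticket_id}",
        "body": body,
        "labels": labels,
    }
```

A real migration would also need to carry over comments, attachments, and ticket status, and would have to respect GitHub's API rate limits.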

The Einstein Toolkit is a community-driven framework for astrophysical simulations. I am interested in recruiting a student to improve the regression test suites of the Einstein Toolkit. Test suites are used to ensure correctness of results after code changes. Currently these tests are run serially, one after the other, and only text file output is supported. This SPIN project aims to parallelize the test-suite code and add support for binary HDF5 output data.
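
The general idea of running independent tests concurrently instead of serially can be sketched in Python as follows. This is not the Einstein Toolkit's test-suite code (the skills list points to Perl); the names `run_test` and `run_suite` and the pass/fail rule based on exit status are assumptions for illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def run_test(command: List[str]) -> bool:
    """Run one test command; a zero exit status counts as a pass."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0


def run_suite(tests: Dict[str, List[str]], workers: int = 4) -> Dict[str, bool]:
    """Run all tests with up to `workers` running at the same time.

    Threads are sufficient here because each one only waits on a child
    process; the actual work happens outside the Python interpreter.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(run_test, cmd) for name, cmd in tests.items()}
        return {name: future.result() for name, future in futures.items()}
```

This only works if tests do not share output directories or other state, which is one of the things a parallel test suite has to guarantee.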

Skills required: good knowledge of Perl, working knowledge of C; knowledge of Python is beneficial but not required

Contact Roland Haas

NCSA SPIN mentor Eliu Huerta

Combining HPC and AI provides a fresh approach to maximize scientific discovery with gravitational waves and light. If you are interested in developing state-of-the-art deep neural networks using the Blue Waters supercomputer to advance the emergent field of multimessenger astronomy, then this is the project you are looking for. Skills needed: Linux, C, Git, Python, TensorFlow. Familiarity with HPC environments is desirable, but this may also be learned during the internship. No background in astronomy or physics is required.

NCSA SPIN mentor Vlad Kindratenko

The goal of this project is to deploy, maintain, and experiment with the latest release of OpenStack cloud operating system software on a cluster at the Innovative Systems Lab. The purpose of this experimental OpenStack deployment is to gain and maintain operational awareness of the new features and functionality ahead of the NCSA's production cloud, provide NCSA staff and affiliate faculty with a platform to experiment with the new OpenStack functionality, and to study and evaluate new projects within the OpenStack environment. This project is best suited for students interested in system administration, deployment and operation of complex cloud and HPC environments. Requirements: CS 425 or similar course.

This project will involve deployment and evaluation of existing deep learning frameworks on an HPC cluster and on a cloud. The goal is to gain hands-on experience with deep learning codes, frameworks, and methodologies and to support upcoming projects requiring deep learning. The work may also require parallelizing codes to work on multiple nodes. This project is best suited for students interested in the development of machine learning techniques and their applications in science and technology fields. Requirements: CS 446 and CS 420, or similar courses.

NCSA SPIN mentor Kacper Kowalik

Many physical properties of stars are derived from spectroscopic observations. Typically, the equivalent width of stellar absorption lines is measured and physical quantities are obtained through the application of computational models. Ultimately, only the resulting values are published as scientific artifacts. In rare cases scientists make raw data available through external data repositories. The intermediate research data is almost never made public. The goal of this project is to create an interactive web service that would allow researchers to store a large quantity of data (reduced stellar spectra) and enable browsing and measuring individual stellar lines. Knowledge of Python (SciPy stack) and JavaScript (ReactJS) will be very helpful in this project. Familiarity with cloud environments (OpenStack) and containerization (Docker) would be a plus. This project is best suited for students interested in designing user interfaces and scientific data visualization.
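
For readers unfamiliar with the measurement, the equivalent width of an absorption line is the integral of (1 - F/Fc) over wavelength, where F is the observed flux and Fc the continuum level. The Python sketch below evaluates it with the trapezoid rule; the function names and the assumption of a flat, known continuum are illustrative simplifications, not the project's method.

```python
import math
from typing import List, Sequence


def equivalent_width(
    wavelength: Sequence[float], flux: Sequence[float], continuum: float = 1.0
) -> float:
    """Integrate (1 - flux/continuum) over wavelength with the trapezoid rule.

    The result has the same units as `wavelength`. A flat continuum is
    assumed; real spectra need a continuum fit first.
    """
    if len(wavelength) != len(flux) or len(wavelength) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    depth = [1.0 - f / continuum for f in flux]
    total = 0.0
    for i in range(1, len(wavelength)):
        step = wavelength[i] - wavelength[i - 1]
        total += 0.5 * (depth[i] + depth[i - 1]) * step
    return total


def gaussian_line(
    wavelength: Sequence[float], center: float, sigma: float, depth: float
) -> List[float]:
    """Synthetic normalized flux with one Gaussian absorption line (test data)."""
    return [
        1.0 - depth * math.exp(-0.5 * ((w - center) / sigma) ** 2)
        for w in wavelength
    ]
```

For a Gaussian line the analytic equivalent width is depth × sigma × sqrt(2π), which makes `gaussian_line` a convenient check on the integration.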

The yt Hub is a platform for the community affiliated with the yt Project to conduct data-intensive research and easily disseminate large scientific datasets. The goal of this project is to improve the yt Hub's current infrastructure, specifically:

  • Automate deployment and monitoring using infrastructure as code (IaC)
  • Port the current container management system to a scalable container orchestration platform such as Kubernetes or Docker Swarm
  • Improve current user interface for managing running containers
  • (optionally) design and implement integration tests

Familiarity with Python, JavaScript (BackboneJS), cloud providers (OpenStack), infrastructure management automation (Terraform), containers at scale (Docker Swarm/Kubernetes) will be very helpful in this project. This project is best suited for students interested in DevOps methodology.

NCSA SPIN mentor Liudmila Mainzer

African American women have a 4- to 5-fold greater risk of death from breast cancer compared to Caucasian women, even after controlling for stage at diagnosis, treatment, and other known prognostic factors. Our initial cross-sectional studies suggest that the serum composition of African American and Caucasian women differed and reflected biochemical changes due to socioeconomic status. Thus, we are now tackling a complex multidimensional dataset including proteomic, genomic, biometric, geographic and socioeconomic measurements. These dimensions need to be harmonized and correct statistical approaches applied in order to determine the exact combination of factors that drives this racial health disparity. Additionally, we are planning to increase the size of our dataset, which will make the problem computationally challenging. We invite a talented student to participate in this important and exciting project and to get involved in the parallelization of R code and the development of advanced statistical approaches.

Desired skills: statistics, machine learning, computing, bioinformatics

Contact Liudmila Mainzer

Genomic analyses have moved into the arena of big data, thus requiring full automation for deployment on advanced computing infrastructure. The computational workflows tend to be complex, consisting of multiple steps, fans, merges, and user-level conditionals. Numerous quality control and job monitoring procedures are required. Deployment and optimization of this large and complex workload is a big challenge in itself. Different strategies are appropriate for running these analyses in the cloud, on analytics platforms, or on traditional grid clusters. NCSA Genomics invites a computationally savvy student to take part in this activity and learn about the different workflow management systems, code benchmarking and optimization, cloud computing and big data analytics.

Desired skills: computing, engineering, bioinformatics, genomics

Contact Liudmila Mainzer

NCSA SPIN mentors Brendan McGinty and Neil Andrews

Students interested in applied research and corporate engagement can explore opportunities with NCSA Industry partners, including some of the world's largest companies. Areas of need include data analytics, bioinformatics, modeling and simulation, machine learning, cybersecurity, and visualization. It may be possible to match specific interests in any of these areas with existing and prospective industrial contacts, thanks to broad and ever-expanding needs within these organizations, many of which have a global footprint.

NCSA SPIN mentor Michael Miller

This project researches frameworks and workflows for speech-to-text recognition in order to facilitate live auto-captioning and the creation of standard caption files for use in live events and video editing. It utilizes and enhances speech-to-text HPC/cloud services and seeks to advance the state of the art in speech-to-text recognition.

NCSA SPIN mentor Andre Schleife

Computational materials science research produces large amounts of static and time-dependent data for atomic positions and electron densities that is rich in information. Determining underlying processes and mechanisms from this data, and visualizing it in a comprehensive way, constitutes an important scientific challenge. In this project we will use and develop physics-based ray-tracing and stereoscopic rendering techniques to visualize the atomic and electronic structure of existing and novel materials e.g. for solar-energy harvesting and optoelectronic applications. In a team, we will further develop codes based on the physics-based ray-tracer Blender/LuxRender and the yt framework to produce immersive images and movies. Stereoscopic images will be visualized using virtual-reality viewers such as Google Cardboard/Google Daydream. Preliminary implementations for Google Daydream exist and within this project the team will extend these towards volume- and time-dependent rendering of large data sets, as well as implementing user interaction using the Daydream controller. Skills that are beneficial for this work: Android app development; OpenGL/Unity/WebGL; VR code development, ideally Google Daydream; creativity and motivation.

The overarching goal of this project is to develop a data-science infrastructure that will enable a new computational/experimental approach to design semiconductor nanocrystals for bioimaging. Experiments and simulations produce a large variety of data from instruments and electronic-structure calculations, respectively. This data needs to be analyzed, post-processed, and shared in order to enable the design of optimized nanocrystals. We will use and improve a computational collaborative infrastructure needed to facilitate a close feedback loop between experiment and theory, to extract relevant information, and to determine underlying physical processes and mechanisms. This team will 1) develop an infrastructure for optimizing materials parameters using machine learning techniques and 2) develop and optimize data and workflow curation and sharing via an advanced computational infrastructure that we have partially implemented. The data and workflows will be made available to scientists as a web service, and collaborators will be equipped with the tools needed to routinely store newly acquired results in materials data repositories such as the Materials Data Facility. Skills that are beneficial for this work: web development, machine learning techniques, basic understanding of materials research or data characterization and storage.

NCSA SPIN mentor Rebecca Smith

An NCSA faculty fellowship converted an individual-based model for disease in dairy cattle to both C and Java and prepared it for multithreading, but the NCSA programmer left before finishing the project. There is an allocation on XSEDE for running the model optimization. However, changes need to be made to the model to include an economic component and, if time allows, a second disease. Additionally, an algorithm must be coded to allow for stochastic optimization. Skills needed are: ability to program in C or Java and familiarity with multithreading. Skills preferred are: interest in stochastic simulation modeling and/or optimization, familiarity with XSEDE. I can provide funding for the expected work time and pay rate.

NCSA SPIN mentor Sever Tipei

The project involves algorithmic composition and digital sound synthesis using a software package developed at the Illinois Computer Music Project and Argonne National Laboratory. It is an ongoing project that uses stochastic distributions and elements of graph theory and information theory; it requires good C++ programming skills and possibly building/maintaining a graphical user interface. The software presently runs in multitasking mode on the Innovative Systems Laboratory and Computer Music Project computers and is being parallelized at the San Diego Supercomputer Center.