2019 Summer Mentors

NCSA SPIN mentor Aleksei Aksimentiev

Atomic Resolution Brownian Dynamics (ARBD) is a GPU-accelerated code developed by the Aksimentiev lab at Illinois to perform coarse-grained molecular dynamics simulations of biomolecular systems. We are looking for students with an interest and some experience in the development of scientific software to assist with ARBD development. Possible research projects include implementation of popular coarse-grained models in ARBD, redesign of the ARBD class structure, increasing the parallel performance of ARBD on multi-node GPU systems, and development of a graphical user interface. Qualified students should be familiar with the C++ language and programming in a Linux environment. Ideally, they would also have some basic knowledge of CUDA programming. Previous experience in microscopic simulations, numerical algorithms, parallel programming, or GUI development using C/C++ and/or Tcl/Tk would be an asset.

NCSA SPIN mentor Nigel Bosch

Deep neural networks are typically trained to optimize a metric such as mean squared error or cross-entropy. This strategy works well in general, but there are many cases where the metrics we care about in practice are different: precision, recall, F1, and others. These situations are especially common for human-centered machine learning models. For example, it may be very important for a model to have high recall when predicting which students in an online course need extra guidance to succeed, so that the course instructor does not miss any potential issues.
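
To make the difference between these metrics concrete, here is a minimal sketch in plain Python. The labels and the two sets of predictions are made-up illustrative data (1 means "student needs extra guidance"), and the helper names are hypothetical, not part of the project's code.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Illustrative data: 3 of 10 students actually need extra guidance.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
model_a = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # cautious: misses two students
model_b = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # flags more students, misses none

for name, y_pred in [("A", model_a), ("B", model_b)]:
    p, r, f1 = precision_recall_f1(y_true, y_pred)
    print(name, accuracy(y_true, y_pred), round(p, 3), round(r, 3), round(f1, 3))
```

Both hypothetical models reach the same accuracy (0.8), but model A has a recall of 1/3 while model B has a recall of 1.0, so selecting by recall would prefer B even though A has the higher precision.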

In this project, we will build neural networks to predict emotion, learning, and other education-related outcomes. We will compare several different metrics for model selection to answer research questions about which metrics are best suited for training models in educational application domains, and how metrics relate to each other in terms of the models that are selected. Additionally, we will train fully-connected, recurrent, and convolutional neural networks to uncover the relationships between model selection metrics and neural network structures. Prior experience with Python, NumPy, and scikit-learn will be preferred. We will use TensorFlow to train neural network models, but no prior experience with deep learning is needed.

NCSA SPIN mentor Dora Cai

As we head towards intensive analytics, such as deep learning, on extreme-scale data, it is important to apply HPC technologies to meet the challenges. This project will systematically study how to take advantage of HPC for deep learning. The project will focus on two HPC techniques: parallelization and in-memory processing. A series of experiments will be performed using a variety of system settings, environment configurations, algorithm parameters, and parallelization mechanisms. This project is best suited for students who are interested in deep learning and HPC.

Skills desired: Programming experience using R and Python is required.

Contact Dora Cai

NCSA SPIN mentors Lídia Carvalho Gomes, Elif Ertekin

Thermoelectric materials can convert heat directly into electricity. The discovery of new semiconductors for thermoelectric applications is a challenging task, but it is also extremely important for contributing to and accelerating the development of next-generation clean and renewable energy sources.

There are many factors that can affect a thermoelectric material's properties. For instance, by introducing impurities into the crystalline structure of some thermoelectric materials, we may be able to greatly improve their performance by tuning their electronic properties.

We are therefore interested in using high-throughput first-principles calculations to investigate how defects can help us achieve high-efficiency energy conversion in materials with potential thermoelectric applications.

Depending on the student's interest and experience, the work can focus either on the development of computational tools for data management and analysis or on understanding the fundamental physics and chemistry of the materials. Interested students are expected to have some experience with, or be willing to learn, density functional theory. Basic programming skills will be very useful.

NCSA SPIN mentor Donna Cox

The Advanced Visualization Lab is looking for "Renaissance" students who are interdisciplinary and want to work at the intersection of art and technology.

Self-motivated students should describe their experience relevant to the following proposed projects:

  • Experiment with brand new advanced visualization techniques for relational datasets, using animation tools to present big data like that found in the fields of genetics, health, economics, and politics
  • Work with archaeological researchers at the Cyprus Institute to create virtual environments of historical cities and deploy them in virtual reality or augmented reality

This project will explore different visualization techniques for biological data, and will consist of two phases: (1) a research phase, to learn about techniques and software for three-dimensional spatial or network visualization, and (2) an implementation phase, to create a visualization, possibly for an immersive and/or interactive environment (e.g. stereo or VR). The student should have some programming or 3D design experience.

This student will be co-mentored by the Advanced Visualization Lab (AVL) and the VI-Bio team at the NCSA. The AVL specializes in cinematic data visualization, software development, and interactive computer graphics. The VI-Bio team focuses on the application and development of visual information design techniques that aid in the comprehension of complex data.

This project will bring a student into an ongoing international collaboration of plant scientists and computer scientists interested in addressing issues of global hunger by building and experimenting with computational crop models. Specific tasks might include writing software that describes the geometric structure and growth dynamics of common crops, integrating different computational models of crops together, and real-time visualization of virtual crop growth experiments.

This student will be mentored by the Advanced Visualization Lab (AVL). AVL specializes in cinematic data visualization, software development, and interactive computer graphics.

NCSA SPIN mentor Kiel Gilleade

We are looking for a student to review descriptions of breathing apps available on the Google Play store (an Android phone is not required but would be preferable) and write a summary report describing the current state of the market. You will be provided a list of ~250 apps and will be tasked with (1) identifying which apps teach breathing techniques, (2) identifying which breathing techniques are being taught, and (3) describing how each app teaches its breathing technique to the user.

The aim of this project is to develop a mobile stress-reduction tool that will train users to reduce their stress levels through breath control. A mobile biofeedback app will be developed that supports both single-user and group-based breathing exercises for the E4 wristband biosensor. The SDK for the E4 wristband is available for iOS, Android, and Windows (Android development is preferred).

NCSA SPIN mentor Kaiyu Guan

Dr. Kaiyu Guan's lab is conducting research that uses novel data from NASA satellites to study environmental impacts on global and U.S. agricultural productivity, running on the most powerful supercomputer in scientific research (Blue Waters). We are looking for highly motivated and programming-savvy undergraduate students to join the lab for the SPIN program. The chosen students will be closely mentored by Dr. Guan and will work on issues including processing large satellite datasets, understanding and implementing remote sensing algorithms, and solving questions related to global food production and food security.

NCSA SPIN mentor Roland Haas

Modern scientific simulations have enabled us to study non-linear phenomena that are impossible to study otherwise. Among the most challenging problems is the study of Einstein's theory of relativity, which predicts the existence of gravitational waves, detected very recently by the LIGO collaboration. I am looking to recruit a student interested in improving the elliptic solver used to construct initial data for such simulations, to enable it to solve elliptic equations in irregularly shaped domains. Such domains occur when constructing initial data describing two neutron stars, where elliptic equations have to be solved inside the deformed stars. The project involves reviewing original mathematics, numerics, and physics literature on methods related to the immersed surface method, as well as developing proof-of-concept code in Python to demonstrate the performance of the methods. The successful applicant will be involved with the Relativity Group at NCSA and will be invited to participate in the weekly group meetings and discussions of their research projects.
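
As a rough illustration of what solving an elliptic equation on an irregularly shaped domain involves, the sketch below runs Jacobi iteration for Laplace's equation on a disk embedded in a square grid. It is not the immersed surface method and not the group's solver: the function names, grid size, and boundary data are assumptions chosen for illustration, and the curved boundary is simply approximated by whole grid cells.

```python
def solve_laplace_masked(inside, boundary_value, n, tol=1e-10, max_iter=20000):
    """Jacobi iteration for Laplace's equation on an n x n grid.

    Grid points where inside(i, j) is True are unknowns; every other point
    is held fixed at boundary_value(i, j) (Dirichlet data).
    """
    u = [[0.0 if inside(i, j) else boundary_value(i, j) for j in range(n)]
         for i in range(n)]
    unknowns = [(i, j) for i in range(1, n - 1) for j in range(1, n - 1)
                if inside(i, j)]
    for _ in range(max_iter):
        new = [row[:] for row in u]
        change = 0.0
        for i, j in unknowns:
            # Five-point stencil: each unknown becomes the mean of its neighbours.
            new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                + u[i][j - 1] + u[i][j + 1])
            change = max(change, abs(new[i][j] - u[i][j]))
        u = new
        if change < tol:
            break
    return u


# Illustrative setup: a disk of radius 8 cells centred in a 21 x 21 grid.
n, centre, radius = 21, 10, 8
h = 1.0 / (n - 1)


def in_disk(i, j):
    return (i - centre) ** 2 + (j - centre) ** 2 < radius ** 2


def linear_data(i, j):
    # u = x is harmonic, so it is also the exact solution inside the disk.
    return i * h


solution = solve_laplace_masked(in_disk, linear_data, n)
max_error = max(abs(solution[i][j] - i * h)
                for i in range(n) for j in range(n) if in_disk(i, j))
print("max error:", max_error)
```

The linear boundary data is chosen because the five-point stencil reproduces linear functions exactly, which makes the result easy to check. Handling the curved boundary more accurately than this staircase approximation is the kind of problem that methods related to the immersed surface method address.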

NCSA SPIN mentor Eliu Huerta

Identifying the signatures of electromagnetic counterparts of gravitational wave sources in telescope image data will become a grand computational challenge in the LSST era. In this project, we will design neural networks that are suited to process large volumes of telescope images in real-time, performing tasks such as classification, regression and clustering of noise anomalies of transient astrophysical phenomena. This work will lay the foundations to enable contemporaneous multi-messenger discovery campaigns in the LSST era.

NCSA SPIN mentor Daniel Katz

Most scientific computational and data work can be thought of as a set of high-level steps, and these steps can often be expressed as a workflow. Software tools can help scientists define and execute these workflows, for example, via Parsl, a library that allows Python programs to execute functions and external applications in parallel and asynchronously. In addition, workflows can be stored as data sets that are basically graphs containing tasks to be executed, such as in the Common Workflow Language (CWL). This project will build a translator between Parsl and CWL. Interested students should have an interest in data science, high-performance computing, and/or distributed computing. They should be proficient in a Linux/Unix software development environment and skilled in the Python language. They will work as part of the distributed Parsl team, and gain experience in distributed open source software development.
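
The idea of a workflow as a graph of tasks can be sketched with the Python standard library alone. The example below does not use Parsl's API or the CWL format; the task names, the `run_workflow` helper, and the dictionary layout are assumptions made for illustration. It needs Python 3.9 or later for `graphlib`.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter


def run_workflow(tasks, dependencies, max_workers=4):
    """Run tasks in dependency order, executing independent tasks in parallel.

    tasks:        task name -> callable taking a dict of its dependencies' results
    dependencies: task name -> set of task names it depends on
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Every task returned here has all of its dependencies finished.
            ready = sorter.get_ready()
            futures = {
                name: pool.submit(
                    tasks[name],
                    {dep: results[dep] for dep in dependencies.get(name, ())},
                )
                for name in ready
            }
            for name, future in futures.items():
                results[name] = future.result()
                sorter.done(name)
    return results


def load(deps):
    return [3, 1, 2]


def sort_values(deps):
    return sorted(deps["load"])


def total(deps):
    return sum(deps["load"])


def report(deps):
    return f"sorted={deps['sort']} total={deps['total']}"


tasks = {"load": load, "sort": sort_values, "total": total, "report": report}
dependencies = {
    "load": set(),
    "sort": {"load"},
    "total": {"load"},
    "report": {"sort", "total"},
}

results = run_workflow(tasks, dependencies)
print(results["report"])
```

Here `sort` and `total` depend only on `load`, so they can run at the same time, while `report` waits for both. The `dependencies` dictionary is the kind of graph data that a workflow description stores, and a translator would map between such a description and executable code.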

NCSA SPIN mentors Volodymyr Kindratenko and Shirui Luo

Machine Learning (ML) has made transformative impacts on modelling many high-dimensional complex dynamical systems. Multiphase flow is one of the promising targets for using ML to improve both the fidelity and efficiency of Computational Fluid Dynamics (CFD) simulations. We are examining the use of ML to fit CFD simulation data in order to develop closure relations for multiphase flow systems. For example, DNNs can be trained on datasets of flows with different initial velocities and void fractions. The trained model is then used to predict other flow evolutions with different initial conditions. More broadly, we are tackling problems encountered in the interplay between learning and multiphase flow, such as: How can learning algorithms be constructed to include physical constraints such as the incompressibility of the fluid? What dimensionality reduction techniques and coarsening strategies are most applicable to identify hidden low-dimensional features? How can computational scientists, experimentalists, and theorists collaborate to produce a sufficient training database for multiphase flow simulation?

The student will use open source software packages such as TensorFlow and PyTorch to construct networks that improve predictive capabilities based on a high-fidelity DNS simulation database. The student will have access to an HPC platform at NCSA and learn to analyze CFD data at large scale. Besides practicing typical ML skills, the student will also learn, more fundamentally, how neural networks can be designed to best incorporate physical constraints while avoiding overfitting to imposed physics, as typical statistical learning methods can ignore underlying physical principles.

The goal of this project is to deploy, maintain, and experiment with the latest release of OpenStack cloud operating system software on a cluster at the Innovative Systems Lab. The purpose of this experimental OpenStack deployment is to gain and maintain operational awareness of the new features and functionality ahead of the NCSA's production cloud, provide NCSA staff and affiliate faculty with a platform to experiment with the new OpenStack functionality, and to study and evaluate new projects within the OpenStack environment. This project is best suited for students interested in system administration, deployment and operation of complex cloud and HPC environments. Requirements: CS 425 or similar course.

Many research domains, such as computer vision and language understanding, have been transformed by novel machine learning (ML) and deep learning (DL) methods and techniques. However, these methods are very compute-intensive and rely on state-of-the-art hardware and large datasets to achieve an acceptable level of performance. The research team at the Innovative Systems Lab (ISL) at NCSA has been investigating how the neural networks at the core of DL algorithms can be implemented on reconfigurable hardware, with the objective of speeding up execution and reducing power requirements for inference algorithms. FPGAs are a good choice for implementing neural networks since they enable highly customized parallel hardware implementations and provide a great degree of flexibility with regard to numerical data types. Most recently, ISL started to explore a novel platform enabled by IBM's CAPI 2.0 interface and SNAP API. This platform makes it possible to develop FPGA applications using a high-level synthesis (HLS) methodology rather than a traditional hardware design approach, and to integrate kernels accelerated on an FPGA with host-side applications running on IBM POWER9 servers.

The students working on this project will acquire the skillsets that are required to develop ML/DL algorithms in hardware using the HLS approach. The students will be involved with a) evaluating the performance of existing ML/DL implementations on reconfigurable hardware platforms and documenting the results, b) developing new ML/DL algorithms for implementation on reconfigurable hardware and preparing datasets for testing and evaluation, and c) helping ISL research staff with porting the algorithms to reconfigurable hardware. Required skills include completion of ECE 385 and ECE 408 or equivalent courses.

NCSA SPIN mentor Matthew Krafczyk

An important measure of the value of a scientific finding is its ability to be independently reproduced by others skilled in the area. By the time efforts are made to reproduce such findings, years may have elapsed, and the reproduction may be unsuccessful. There are many factors which may prevent the replication of a study, and we seek to understand those related to the computational aspects of the work. We estimate that only about 10% of scientists doing computationally based work release their source code in any form. The successful applicant will join an effort to introduce more transparency to computationally based research. This will include locating source code which was used to create articles and using it to reproduce the articles' results. During this process we will study scientific workflows and development habits which hinder or enable reproducibility, as well as develop tools to empower researchers to make their code available more easily. We will build a website to help elucidate these details of the scientific method to the public. Recommended Skills: R, C/C++, Python, web development including HTML, database engineering including SQL, workflow tools.

NCSA SPIN mentor Liudmila Mainzer

African American women have a 4-5 fold greater risk of death from breast cancer compared to Caucasian women, even after controlling for stage at diagnosis, treatment, and other known prognostic factors. Our initial cross-sectional studies suggest that the composition of serum from African American vs. Caucasian women was different and reflected biochemical changes due to socioeconomic status. Thus, we are now tackling a complex multidimensional dataset including proteomic, genomic, biometric, geographic, and socioeconomic measurements. These dimensions need to be harmonized and correct statistical approaches applied in order to determine the exact combination of factors that drive this racial health disparity. Additionally, we are planning to increase the size of our dataset, which will make the problem computationally challenging. We are also extending our analyses to other health disparity problems and other datasets. We invite a talented student to participate in this important and exciting project and get involved in the optimization of our analysis pipelines and the development of advanced statistical approaches and data analytics.

Skills desired: Statistics, machine learning, computing, bioinformatics

Contact Liudmila Mainzer

NCSA SPIN mentor Michael Miller

This project researches frameworks and workflows for speech-to-text recognition in order to facilitate live auto-captioning and the creation of standard caption files for use in live events and video editing. It utilizes and enhances speech-to-text HPC/cloud services and seeks to advance the state of the art in speech-to-text recognition.

NCSA SPIN mentor Taras Pogorelov

The cell membrane environment is complex and challenging to model. The Pogorelov Lab at Illinois develops workflows that combine computational and experimental molecular data. We work in close collaboration with experimental labs. Modeling approaches include classical molecular dynamics, quantum electronic structure, and quantum nuclear dynamics. These projects include development of workflows for modeling and analysis of the lipid interactions with proteins and ions that are vital for the life of the cell. The qualified student should have experience with R/Python programming, the Linux environment, and the NAMD molecular modeling software.

NCSA SPIN mentor Ignacio Sarmiento Barbieri

The choice of residence determines the neighborhood with which one interacts on a daily basis. It also serves as a primary channel through which households can express demand for amenities and public services (e.g. parks, clean air, public safety). But what happens when individuals are not free to choose where to live? In this project we are examining the effects of racial discrimination in the housing market. In particular, we are interested in a key public good: clean air. We are generating experimental evidence on the impact.

NCSA SPIN mentor Andre Schleife

Computational materials science research produces large amounts of static and time-dependent data for atomic positions and electron densities that are rich in information. Determining underlying processes and mechanisms from this data, and visualizing it in a comprehensive way, constitutes an important scientific challenge. In this project we will continue development of our Unity app that is compatible with Windows Mixed Reality, Google Daydream, and iOS. We will implement new features, such as the display of and interaction with time-dependent data, as well as novel modes of interaction with the data. In addition, we will use and develop physics-based ray-tracing and stereoscopic rendering techniques to visualize the atomic and electronic structure of existing and novel materials, e.g. for solar-energy harvesting and optoelectronic applications. In a team, we will further develop codes based on the physics-based ray-tracer Blender/LuxRender and the yt framework to produce immersive images and movies.

Skills desired: Android app development, OpenGL/Unity/WebGL, VR code development, creativity and motivation

Contact Andre Schleife

NCSA SPIN mentors Aiman Soliman and Volodymyr Kindratenko

Arctic and Polar scientists have been studying the changes of specific landscape features, fauna, and flora over fairly restricted spatial extents using field expeditions and very high-resolution remote sensing datasets. Over the past years, combined efforts in polar geospatial science and HPC have yielded novel high-resolution Digital Elevation Models (DEM), namely the ArcticDEM and the Reference Elevation of Antarctica (REMA). These state-of-the-art archives capture the polar landscape surface at unprecedented spatial (meter-scale) and temporal (2-3 weeks) scales, and represent records of all the changes that happened and are happening at the Earth's poles. However, the size of the archives makes it a real challenge for scientists to extract conclusive results. We are developing DL models that can be applied at scale to conduct an inventory of polar landscape features and quantify their lateral and vertical changes.

The students working on this project will acquire the skillsets that are required to develop DL models while applying them to monitor the current state of polar environments. The students will be involved with a) preparing model training sets from existing field survey data; b) evaluating the performance of different DL architectures that are suited to segment images, such as Convolutional Neural Networks, as well as architectures that are suited for detecting changes in image sequences, such as Recurrent and Siamese Neural Networks; and c) developing HPC workflows to manage and apply the developed DL models to existing elevation data archives leveraging the cyberinfrastructure at NCSA.

NCSA SPIN mentor Sever Tipei

The project centers on DISSCO, software for composition, sound design, and music notation/printing developed at Illinois and Argonne National Laboratory. Written in C++, it includes a graphical user interface using gtkmm; a parallel version is being developed at the San Diego Supercomputer Center. DISSCO has a directed graph structure and uses stochastic distributions, sieves (part of number theory), and elements of information theory to produce musical compositions. Presently, efforts are directed toward refining a system for the notation of music as well as toward the realization of an evolving entity, a composition whose aspects change when computed recursively over long periods of time, thus mirroring the way living organisms are transformed in time (artificial life).

Another possible direction of research is sonification, the aural rendition of computer generated complex data.

Skills desired: Proficiency in C++ programming, familiarity with the Linux operating system, familiarity with music notation preferred but not required

Contact Sever Tipei