2019-2020 Academic Year Mentors

NCSA SPIN mentor Vikram Adve

As so-called "smart devices" become widespread in homes, farms, factories and cities, it is becoming increasingly important to be able to analyze and respond to the information gathered by these devices quickly, often in close to real time, and with increasingly sophisticated machine learning techniques. The computing resources available near such devices—computational power, memory capacity and sometimes energy—are all extremely limited. The major solution being explored today is to use specialized or "heterogeneous" hardware, such as a low-power Field Programmable Gate Array (FPGA) or a custom ML accelerator, which can provide 1-2 orders of magnitude better raw compute power and compute power per watt compared with general-purpose CPUs. The goals of our research are (1) to make it possible to run more computationally intensive workloads (e.g., machine learning and signal processing) near the edge than is possible today; and (2) to enable application developers who may not be computing experts to program these heterogeneous systems, which today are challenging even for computing experts to program. We have developed a compiler system called Heterogeneous Parallel Virtual Machine (HPVM) that aims to support a wide range of high-level (general-purpose and domain-specific) programming languages on a wide range of heterogeneous hardware accelerators. In these SPIN projects (positions available for 1-4 students), you will begin by using HPVM to compile a few neural network models written in Keras, the high-level API of the TensorFlow library, to run on a heterogeneous system built around a Xilinx Zynq-7000 (ARM+FPGA) development board. You will then experiment with different techniques to optimize the generated FPGA designs, in collaboration with the team of graduate students working on the HPVM project. During the course of the year, we may also target a machine learning accelerator such as the Intel Movidius Myriad X VPU, for ML problems in computer vision.
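To give a sense of the starting point, the sketch below defines a small Keras model of the kind that might be handed to the HPVM toolchain; the model architecture is purely illustrative, and the HPVM compilation flow itself is not shown here.

```python
# A minimal Keras model of the kind a student might compile with HPVM.
# The architecture is illustrative; the HPVM frontend/compilation step is not shown.
import tensorflow as tf
from tensorflow import keras

def build_small_cnn(input_shape=(32, 32, 3), num_classes=10):
    """A small convolutional classifier, sized for an edge accelerator."""
    return keras.Sequential([
        keras.layers.Conv2D(16, 3, activation="relu", input_shape=input_shape),
        keras.layers.MaxPooling2D(),
        keras.layers.Conv2D(32, 3, activation="relu"),
        keras.layers.MaxPooling2D(),
        keras.layers.Flatten(),
        keras.layers.Dense(num_classes, activation="softmax"),
    ])

model = build_small_cnn()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```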

Preferred qualifications:

  • Experience with Python programming
  • Experience with at least one machine learning framework, preferably but not necessarily TensorFlow or Keras
  • Evidence of strong intellect
  • Evidence of being diligent and persistent in past endeavors
  • Experience working with large software systems would be a plus but not necessary
  • Experience in systematic performance evaluation would be a plus but not necessary
NCSA SPIN mentor Aleksei Aksimentiev

Atomic Resolution Brownian Dynamics (ARBD) is a GPU-accelerated code developed by the Aksimentiev lab at Illinois to perform coarse-grained molecular dynamics simulations of biomolecular systems. We are looking for students with an interest in, and some experience with, the development of scientific software to assist with ARBD development. Possible research projects include implementation of popular coarse-grained models in ARBD, redesign of the ARBD class structure, improving the parallel performance of ARBD on multi-node GPU systems, and development of a graphical user interface. Qualified students should be familiar with the C++ language and with programming in a Linux environment. Ideally, they would also have some basic knowledge of CUDA programming. Previous experience in microscopic simulations, numerical algorithms, parallel programming, or GUI development using C/C++ and/or Tcl/Tk would be an asset.
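For orientation, the following is a schematic overdamped (Brownian) dynamics update in NumPy, illustrating the kind of step a coarse-grained code like ARBD performs; ARBD itself is written in C++/CUDA, and its actual force fields, units, and data structures are not shown here.

```python
# Schematic Brownian dynamics step (Euler-Maruyama) on a toy system of beads.
# This only conveys the update rule; ARBD's real implementation is C++/CUDA.
import numpy as np

kT = 2.494       # thermal energy in kJ/mol at ~300 K
dt = 1e-4        # time step (reduced units)
D = 1.0          # diffusion coefficient (reduced units)
rng = np.random.default_rng(0)

def harmonic_forces(x, k=10.0):
    """Toy force field: each bead tethered to the origin by a spring."""
    return -k * x

x = rng.normal(size=(100, 3))            # 100 coarse-grained beads in 3D
for step in range(1000):
    F = harmonic_forces(x)
    # Drift from the force plus thermal noise
    x += (D / kT) * F * dt + np.sqrt(2.0 * D * dt) * rng.normal(size=x.shape)

print("mean squared displacement from origin:", np.mean(np.sum(x**2, axis=1)))
```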

NCSA SPIN mentors LĂ­dia Carvalho Gomes, Elif Ertekin

Thermoelectric materials can convert heat directly into electricity. The discovery of new semiconductors for thermoelectric applications is a challenging task, but it is also extremely important for accelerating the development of the next generation of clean and renewable energy sources.

Many factors can affect a thermoelectric material's properties. For instance, by introducing impurities into the crystalline structure of some thermoelectric materials, we may be able to greatly improve their performance by tuning their electronic properties.

We are therefore interested in using high-throughput first-principles calculations to investigate how defects can help us achieve high-efficiency energy conversion in materials with potential thermoelectric applications.

Depending on the student's interest and experience, the work can be focused either on the development of computational tools for data management and analysis or on understanding the fundamental physics and chemistry of the materials. Interested students are expected to have some experience with, or be willing to learn, density functional theory. Basic programming skills will be very useful.
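As a flavor of the data-analysis side, the sketch below computes the standard dimensionless thermoelectric figure of merit, zT = S²σT/κ, which is the usual screening quantity; the function is a hedged illustration and the numbers in the example are placeholders, not results from the project.

```python
# The thermoelectric figure of merit, zT = S^2 * sigma * T / kappa, is the usual
# screening quantity. This helper is illustrative; the example values below are
# placeholders typical of a good thermoelectric, not real project data.
def figure_of_merit(seebeck_V_per_K, conductivity_S_per_m,
                    thermal_conductivity_W_per_mK, temperature_K):
    """Return zT from the Seebeck coefficient, electrical and thermal
    conductivity, and absolute temperature (all in SI units)."""
    power_factor = seebeck_V_per_K**2 * conductivity_S_per_m
    return power_factor * temperature_K / thermal_conductivity_W_per_mK

print(figure_of_merit(seebeck_V_per_K=200e-6,
                      conductivity_S_per_m=1e5,
                      thermal_conductivity_W_per_mK=1.5,
                      temperature_K=600))   # prints 1.6
```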

NCSA SPIN mentor Donna Cox

The Advanced Visualization Lab is looking for "Renaissance" students who are interdisciplinary and want to work at the intersection of art and technology.

Self-motivated students are encouraged to describe their experience relevant to the following proposed projects:

  • Experiment with brand new advanced visualization techniques for relational datasets, using animation tools to present big data like that found in the fields of genetics, health, economics, and politics
  • Work with archaeological researchers at the Cyprus Institute to create virtual environments of historical cities and deploy them in virtual reality or augmented reality
NCSA SPIN mentor James Eyrich

The NCSA Incident Response and Security Team (IRST) collects network- and system-related data via network taps, flow records, active scanning, log collection and other sources. We are looking for an intern who is fluent in Python to continue developing some of the existing tools, create new ones, and generate reports based on existing data. There are some specific reports and tool changes we are interested in, and we are also open to the student participating in tool-direction discussions.

Duties include, but are not limited to:

  • Assist in developing the host activity database and analysis system
  • Update the system contact database
  • Develop detection rules for emerging threats
  • Develop tools for automated log analysis and anomaly identification (a sketch follows this list)
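As a hedged sketch of the automated log analysis mentioned above: the snippet counts failed SSH logins per source address in a syslog-style file and flags hosts above a simple threshold. The file path, regular expression, and threshold are hypothetical placeholders, not IRST's actual tooling.

```python
# Minimal log-analysis sketch: flag source IPs with many failed SSH logins.
# Path, regex, and threshold are placeholders for illustration only.
import re
from collections import Counter

FAILED_LOGIN = re.compile(r"Failed password for .* from (\d+\.\d+\.\d+\.\d+)")

def flag_noisy_hosts(log_path, threshold=20):
    counts = Counter()
    with open(log_path) as log:
        for line in log:
            match = FAILED_LOGIN.search(line)
            if match:
                counts[match.group(1)] += 1
    return {ip: n for ip, n in counts.items() if n >= threshold}

if __name__ == "__main__":
    for ip, n in sorted(flag_noisy_hosts("/var/log/auth.log").items(),
                        key=lambda item: -item[1]):
        print(f"{ip}\t{n} failed logins")
```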
NCSA SPIN mentors Mattia Gazzola, Volodymyr Kindratenko

Newly emerging neuromorphic chip architectures from IBM and Intel enable a novel class of algorithms that describe behaviors more commonly associated with neural cells than with the fixed logic of conventional computers. Spiking neural networks (SNNs) are at the core of these chips; they differ drastically from other current approaches to AI, such as those based on deep neural networks (DNNs) that require extensive offline training. The objective of this project is to explore Loihi—an SNN research chip from Intel—in order to understand and characterize its potential for novel algorithms and applications for simulating nature-inspired processes. We are particularly interested in how to integrate real-time learning and adaptation, an essential feature of all living organisms.

The student will first study the theory of SNNs and explore simulation software, such as GENESIS, to model SNNs consisting of a small number of neurons. The student will then study the architecture of Intel's Loihi chip and its programming model, supported by Intel's NxSDK, via remote access to Intel's Neuromorphic Research Cloud (NRC) system, and will implement small SNNs on this chip and compare results from the simulation software and the chip. Eventually, the student should be able to set up SNNs on Loihi, demonstrate to the research team how they can be implemented and trained there, and describe their potential and limitations with regard to the neural systems that can be efficiently modeled on this architecture. This project is particularly suited for students interested in pursuing an advanced degree at the intersection of neuroscience and computer science.
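To make the core idea concrete, here is a tiny leaky integrate-and-fire (LIF) simulation in plain NumPy, intended only to convey what "spiking" means before moving to GENESIS or NxSDK; this is not NxSDK code, and all parameters are illustrative.

```python
# Tiny leaky integrate-and-fire (LIF) simulation; parameters are illustrative.
# This is a conceptual warm-up, not GENESIS or Intel NxSDK code.
import numpy as np

dt, tau, v_rest, v_thresh, v_reset = 1e-3, 20e-3, 0.0, 1.0, 0.0
steps = 1000
rng = np.random.default_rng(1)

v = np.full(10, v_rest)                    # membrane potentials of 10 neurons
spikes = np.zeros((steps, 10), dtype=bool)
for t in range(steps):
    input_current = 1.2 + 0.3 * rng.normal(size=10)   # noisy constant drive
    v += dt / tau * (-(v - v_rest) + input_current)   # leaky integration
    fired = v >= v_thresh
    spikes[t] = fired
    v[fired] = v_reset                                 # reset after a spike

print("mean firing rate (Hz):", spikes.mean(axis=0).mean() / dt)
```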

NCSA SPIN mentor Kaiyu Guan

Dr. Kaiyu Guan's lab is conducting research that uses novel data from NASA satellites to study environmental impacts on global and U.S. agricultural productivity, on Blue Waters, the most powerful supercomputer in scientific research. We are looking for highly motivated and programming-savvy undergraduate students to join the lab through the SPIN program. The chosen students will be closely mentored by Dr. Guan and will work on processing large satellite datasets, understanding and implementing remote sensing algorithms, and answering questions related to global food production and food security.
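As an example of the kind of per-pixel remote sensing algorithm involved, the sketch below computes the standard Normalized Difference Vegetation Index (NDVI) from red and near-infrared reflectance; the arrays are synthetic placeholders standing in for real satellite tiles.

```python
# NDVI = (NIR - red) / (NIR + red): a standard per-pixel remote sensing index.
# The arrays below are synthetic placeholders for real satellite tiles.
import numpy as np

def ndvi(red, nir, eps=1e-9):
    """Normalized Difference Vegetation Index for reflectance arrays in [0, 1]."""
    red = np.asarray(red, dtype=np.float64)
    nir = np.asarray(nir, dtype=np.float64)
    return (nir - red) / (nir + red + eps)

red = np.random.default_rng(0).uniform(0.02, 0.2, size=(512, 512))
nir = np.random.default_rng(1).uniform(0.2, 0.6, size=(512, 512))
vegetation_mask = ndvi(red, nir) > 0.4    # crude "green vegetation" threshold
print("vegetated fraction:", vegetation_mask.mean())
```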

NCSA SPIN mentor Eliu Huerta

The student will participate in the design and use of distributed training algorithms on high-performance computing platforms (Blue Waters at NCSA, and Theta and Cooley at Argonne) to train neural network models at scale. Fully trained neural network models will then be benchmarked using state-of-the-art GPUs and FPGAs at the Innovative Systems Lab at NCSA. These algorithms will be used to search for and identify gravitational wave signals in LIGO data, and to characterize transient electromagnetic events in telescope images.
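One common way to express data-parallel training at scale is PyTorch's DistributedDataParallel; the sketch below is a minimal, hedged example of that pattern, not the group's actual training code, and the toy model, random data, and "nccl" backend are placeholder choices.

```python
# Minimal data-parallel training sketch with PyTorch DistributedDataParallel.
# Launch with torchrun, which sets RANK/WORLD_SIZE/LOCAL_RANK; model and data
# here are placeholders for the project's real networks and datasets.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    device = f"cuda:{local_rank}"

    model = torch.nn.Sequential(torch.nn.Linear(128, 64), torch.nn.ReLU(),
                                torch.nn.Linear(64, 2)).to(device)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = torch.nn.CrossEntropyLoss()

    for step in range(100):                      # stand-in for a real data loader
        x = torch.randn(32, 128, device=device)
        y = torch.randint(0, 2, (32,), device=device)
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()                          # gradients are all-reduced here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```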

NCSA SPIN mentor Xin Liu

Supermassive black holes (SMBHs) are found at the centers of most galaxies. Measuring SMBH masses is important for fundamental science, such as understanding the origin and evolution of SMBHs, and for enabling the use of quasars—SMBHs that are actively growing by accreting gas from their surroundings, whose emitted light is recorded as time series—as "Standard Candles" for cosmology. However, traditional methods require spectral data, which are highly expensive to gather; the existing ~1,000,000 masses represent ~20 years' worth of state-of-the-art community effort. The Large Synoptic Survey Telescope (LSST) project—in which NCSA is a major partner—will discover ~1,000,000,000 new SMBHs across most of the observable universe; it would take ~20,000 years to weigh them with traditional methods. A much more efficient approach is therefore needed to maximize LSST science, but such an approach is still lacking.

To solve this problem, we propose a pilot SPIN research project to develop a completely new, interdisciplinary approach. We will combine astronomical Big Data with machine learning tools to build an algorithm that weighs SMBHs using quasar light time series, circumventing the need for expensive spectra. There is empirical evidence, and there are theoretical reasons, to believe that the mass information is encoded in the light time series. However, the encoding is highly nonlinear and difficult to fully model using human-engineered statistics. We will train deep learning algorithms that learn directly from the data to map out the nonlinear encoding. The result would transform the study of SMBHs and cosmology.
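To illustrate the kind of model involved, here is a hedged sketch of a 1D convolutional regressor in Keras that maps a quasar light time series to a (log) black-hole mass; the architecture, input length, and synthetic stand-in data are illustrative only and are not the project's actual pipeline.

```python
# Sketch: a 1D CNN regressor from a light time series to log10(M_BH / M_sun).
# Architecture, sequence length, and synthetic data are placeholders.
import numpy as np
from tensorflow import keras

seq_len = 1000                      # photometric points per light curve
model = keras.Sequential([
    keras.layers.Conv1D(32, 7, activation="relu", input_shape=(seq_len, 1)),
    keras.layers.MaxPooling1D(4),
    keras.layers.Conv1D(64, 7, activation="relu"),
    keras.layers.GlobalAveragePooling1D(),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1),          # predicted log10 mass
])
model.compile(optimizer="adam", loss="mse")

# Synthetic stand-in data; in practice these would be simulated or observed
# light curves with spectroscopically measured masses as labels.
rng = np.random.default_rng(0)
x = rng.normal(size=(256, seq_len, 1)).astype("float32")
y = rng.uniform(7.0, 9.5, size=(256, 1)).astype("float32")
model.fit(x, y, epochs=2, batch_size=32, verbose=0)
```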

Skills desired: Experience with basic coding, ideally with Python

Contact Xin Liu

NCSA SPIN mentors Shirui Luo, Volodymyr Kindratenko

Machine learning (ML) has made transformative impacts on the modeling of many high-dimensional, complex dynamical systems. Multiphase flow is one of the promising targets for using ML to improve both the fidelity and the efficiency of computational fluid dynamics (CFD) simulations. We are examining the use of ML to fit CFD simulation data in order to develop closure relations for multiphase flow systems. For example, DNNs can be trained on datasets of flows with different initial velocities and void fractions; the trained model is then used to predict other flow evolutions with different initial conditions. More broadly, we are tackling problems arising from the interplay between learning and multiphase flow, such as: How can learning algorithms be constructed to include physical constraints such as the incompressibility of the fluid? What dimensionality reduction techniques and coarsening strategies are most applicable for identifying hidden low-dimensional features? How can computational scientists, experimentalists and theorists collaborate to produce a sufficient training database for multiphase flow simulation?

The student will use open-source software packages such as TensorFlow and PyTorch to construct networks that improve predictive capabilities based on a high-fidelity DNS simulation database. The student will have access to HPC platforms at NCSA and will learn to analyze CFD data at large scale. Besides practicing typical ML skills, the student will also learn, more fundamentally, how neural networks can be designed to best incorporate physical constraints while avoiding overfitting to the imposed physics, since typical statistical learning methods can ignore underlying physical principles.
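The closure-fitting idea can be sketched as follows: a small fully connected network maps local flow features (e.g., void fraction, slip velocity) to a closure term and is fit to DNS data. The features, the toy target relation, and all sizes below are placeholders, not the project's actual closure model.

```python
# Sketch of fitting a neural closure model: features -> closure term.
# Feature columns, the toy target, and network sizes are placeholders.
import torch
from torch import nn

model = nn.Sequential(nn.Linear(4, 64), nn.Tanh(),
                      nn.Linear(64, 64), nn.Tanh(),
                      nn.Linear(64, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Synthetic stand-in for a DNS-derived training set; columns might represent
# [void fraction, |slip velocity|, shear rate, pressure gradient].
features = torch.rand(4096, 4)
closure_target = (features[:, :1] * features[:, 1:2]).sqrt()  # toy relation

for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(features), closure_target)
    loss.backward()
    optimizer.step()

print("final training loss:", loss.item())
```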

NCSA SPIN mentor Liudmila Mainzer

African American women have a 4-5 fold greater risk of death from breast cancer compared to Caucasian women, even after controlling for stage at diagnosis, treatment, and other known prognostic factors. Our initial cross-sectional studies suggest that the composition of serum from African American vs. Caucasian women was different and reflected biochemical changes due to socioeconomic status. Thus, we are now tackling a complex multidimensional dataset including proteomic, genomic, biometric, geographic and socioeconomic measurements. These dimensions need to be harmonized, and the correct statistical approaches applied, in order to determine the exact combination of factors that drives this racial health disparity. Additionally, we are planning to increase the size of our dataset, which will make the problem computationally challenging, and we are extending our analyses to other health disparity problems and other datasets. We invite a talented student to participate in this important and exciting project and to get involved in optimizing our analysis pipelines, developing advanced statistical approaches, and data analytics.
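As a hedged sketch of the harmonization step, the snippet below merges two data modalities on a shared subject ID and fits a simple regularized logistic model; every column name and value is a hypothetical placeholder, not the study's data.

```python
# Sketch: merge two modalities on subject ID and fit a regularized model.
# All column names and values are synthetic placeholders.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 200
proteomic = pd.DataFrame({"subject_id": range(n),
                          "protein_a": rng.normal(size=n),
                          "protein_b": rng.normal(size=n)})
socio = pd.DataFrame({"subject_id": range(n),
                      "income_index": rng.uniform(0, 1, size=n)})
outcome = pd.Series(rng.integers(0, 2, size=n), name="outcome")

merged = proteomic.merge(socio, on="subject_id")
X = StandardScaler().fit_transform(merged.drop(columns="subject_id"))
clf = LogisticRegression(penalty="l2", C=1.0).fit(X, outcome)
print(dict(zip(merged.drop(columns="subject_id").columns, clf.coef_[0])))
```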

Skills desired: Statistics, machine learning, computing, bioinformatics

Contact Liudmila Mainzer

NCSA SPIN mentor Michael Miller

This project researches frameworks and workflows for speech-to-text recognition to facilitate live auto-captioning and the creation of standard caption files for use in live events and video editing. It utilizes and enhances speech-to-text HPC/cloud services and seeks to advance the state of the art in speech-to-text recognition.
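One concrete piece of such a workflow is writing a standard SRT caption file from time-stamped transcript segments; the sketch below shows that step only, the segments are placeholders, and no particular speech-to-text service is assumed.

```python
# Write a standard SRT caption file from (start, end, text) transcript segments.
# Segments are placeholders; no specific speech-to-text service is assumed.
def to_srt_timestamp(seconds):
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def write_srt(segments, path):
    """segments: iterable of (start_seconds, end_seconds, text)."""
    with open(path, "w", encoding="utf-8") as out:
        for i, (start, end, text) in enumerate(segments, start=1):
            out.write(f"{i}\n{to_srt_timestamp(start)} --> "
                      f"{to_srt_timestamp(end)}\n{text}\n\n")

write_srt([(0.0, 2.5, "Welcome to the lecture."),
           (2.5, 5.0, "Today we discuss auto-captioning.")], "captions.srt")
```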

NCSA SPIN mentors Ashish Misra, Volodymyr Kindratenko

Many research domains, such as computer vision and language understanding, have been transformed by novel machine learning (ML) and deep learning (DL) methods and techniques. However, these methods are very compute-intensive and rely on state-of-the-art hardware and large datasets to achieve an acceptable level of performance. The research team at the Innovative Systems Lab (ISL) at NCSA has been investigating how the neural networks at the core of DL algorithms can be implemented on reconfigurable hardware, with the objective of speeding up execution and reducing power requirements for inference algorithms. FPGAs are a good choice for implementing neural networks since they enable highly customized parallel hardware implementations and provide a great degree of flexibility with regard to numerical data types. Most recently, ISL started to explore a novel platform enabled by IBM's CAPI 2.0 interface and SNAP API. This platform makes it possible to develop FPGA applications using a high-level synthesis (HLS) methodology rather than a traditional hardware design approach, and to integrate kernels accelerated on an FPGA with host-side applications running on IBM POWER9 servers.

The students working on this project will acquire the skillsets required to develop ML/DL algorithms in hardware using an HLS approach. The students will be involved with a) evaluating the performance of existing ML/DL implementations on reconfigurable hardware platforms and documenting the results, b) developing new ML/DL algorithms for implementation on reconfigurable hardware and preparing datasets for testing and evaluation, and c) helping ISL research staff port the algorithms to reconfigurable hardware. Required skills include completion of ECE 385 and ECE 408 or equivalent courses.
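Because FPGAs allow custom numerical types, trained weights are typically quantized before an HLS implementation; the sketch below converts floating-point weights to signed fixed-point values. The 8-bit width and scaling are illustrative assumptions, not the formats used by ISL.

```python
# Quantize float weights to signed fixed point, a typical pre-HLS step.
# Bit widths and scaling here are illustrative choices.
import numpy as np

def quantize_fixed_point(weights, total_bits=8, frac_bits=6):
    """Round to signed fixed point with `frac_bits` fractional bits and saturate."""
    scale = 2 ** frac_bits
    qmin, qmax = -(2 ** (total_bits - 1)), 2 ** (total_bits - 1) - 1
    q = np.clip(np.round(weights * scale), qmin, qmax).astype(np.int8)
    return q, q.astype(np.float64) / scale   # integer codes and dequantized values

w = np.random.default_rng(0).normal(scale=0.5, size=(16, 16))
codes, w_hat = quantize_fixed_point(w)
print("max quantization error:", np.abs(w - w_hat).max())
```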

NCSA SPIN mentor Taras Pogorelov

The cell membrane environment is complex and challenging to model. The Pogorelov Lab at Illinois develops workflows that combine computational and experimental molecular data, working in close collaboration with experimental labs. Modeling approaches include classical molecular dynamics, quantum electronic structure, and quantum nuclear dynamics. These projects include the development of workflows for modeling and analysis of the lipid interactions with proteins and ions that are vital for the life of the cell. Qualified students should have experience with R/Python programming, the Linux environment, and the NAMD molecular modeling software.

The cell environment is complex and crowded and is difficult to capture for substantial timescales with modern computational approaches. The Pogorelov Lab at Illinois uses the specialized supercomputer Anton 2 to model cell-like environments for hundreds of microseconds. We develop computational analysis tools and workflows to mine this large amount of unique data, and we work in close collaboration with experimental labs to cross-validate computational and experimental data when possible. Modeling approaches include classical molecular dynamics and data analysis. These projects include the development of workflows for the analysis of protein-protein and protein-metabolite interactions and of water dynamics vital to the life of the cell. Qualified students should have experience with R/Python programming, the Linux environment, and the NAMD, MDAnalysis, and VMD software packages.
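For a sense of the analysis workflows, here is a minimal MDAnalysis sketch that counts protein-metabolite contacts per trajectory frame; the file names, residue name, and cutoff are placeholders for a real simulation system, not the lab's actual scripts.

```python
# Minimal MDAnalysis sketch: count protein-metabolite contacts per frame.
# File names, the "LIG" residue name, and the 4 A cutoff are placeholders.
import MDAnalysis as mda
from MDAnalysis.analysis import distances

u = mda.Universe("system.psf", "trajectory.dcd")      # placeholder inputs
protein = u.select_atoms("protein and not name H*")
metabolite = u.select_atoms("resname LIG")             # hypothetical residue name

contact_counts = []
for ts in u.trajectory:
    d = distances.distance_array(protein.positions, metabolite.positions,
                                 box=ts.dimensions)
    contact_counts.append(int((d < 4.0).sum()))        # 4 A contact cutoff

print("mean contacts per frame:", sum(contact_counts) / len(contact_counts))
```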

NCSA SPIN mentor Andre Schleife

Computational materials science research produces large amounts of static and time-dependent data for atomic positions and electron densities that is rich in information. Determining underlying processes and mechanisms from this data, and visualizing it in a comprehensive way, constitutes an important scientific challenge. In this project we will continue development of our Unity app, which is compatible with Windows Mixed Reality, Google Daydream, and iOS. We will implement new features, such as the display of, and interaction with, time-dependent data, as well as novel modes of interaction with the data. In addition, we will use and develop physics-based ray-tracing and stereoscopic rendering techniques to visualize the atomic and electronic structure of existing and novel materials, e.g., for solar-energy harvesting and optoelectronic applications. As a team, we will further develop codes based on the physics-based ray-tracer Blender/LuxRender and the yt framework to produce immersive images and movies.

Skills desired: Android app development, OpenGL/Unity/WebGL, VR code development, creativity and motivation

Contact Andre Schleife

In order to develop nanocrystals that are able to distinguish diseased from healthy tissue and to determine how the complex genetics underlying cancer respond to therapy, we need to understand a complex design space. Experiment and theory provide insight into the size, shape, composition, and internal structure of different nanocrystals. Students in this team will work with computational and experimental researchers in several departments to establish a database to store, share, and catalog optical properties and other relevant data describing semiconductor nanocrystals. This requires developing schemas and analysis workflows that can be efficiently shared among multiple researchers. Students will first identify all the information that needs to be included in this catalog. They will then write JSON and Python code and interface with Globus and the Materials Data Facility. They will create well-documented IPython notebooks that operate directly on the Globus file structure and run in the web browser. Students will also develop code that automatically analyzes data stored in the facility, e.g., to verify and validate experimental and computational results against each other. Eventually, both the data and the workflows will be made available to the general public. This project is highly interdisciplinary, and students will work with a team of researchers in bioengineering, materials science, mechanical engineering, and NCSA.
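As a hedged sketch of what a catalog record and a light validation helper might look like, see below; the field names are hypothetical, and the Globus / Materials Data Facility integration itself is not shown.

```python
# Sketch of a catalog record and a light validation helper.
# Field names are hypothetical; Globus/MDF integration is not shown.
import json

REQUIRED_FIELDS = {"sample_id", "composition", "diameter_nm",
                   "emission_peak_nm", "source", "method"}

def validate_record(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS - record.keys()]
    if "diameter_nm" in record and not record["diameter_nm"] > 0:
        problems.append("diameter_nm must be positive")
    return problems

record = {
    "sample_id": "QD-0001",
    "composition": "CdSe/ZnS",
    "diameter_nm": 4.2,
    "emission_peak_nm": 560,
    "source": "experiment",
    "method": "UV-Vis / photoluminescence",
}
print(validate_record(record))
print(json.dumps(record, indent=2))
```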

Skills desired: Writing JSON, XML, or other data-interchange formats; programming in Python; collaborative skills in teams of computational and experimental researchers

Contact Andre Schleife

NCSA SPIN mentors Aiman Soliman, Volodymyr Kindratenko

Arctic and polar scientists have been studying changes in specific landscape features, fauna, and flora over fairly restricted spatial extents using field expeditions and very high-resolution remote sensing datasets. Over the past years, combined efforts in polar geospatial science and HPC have yielded novel high-resolution Digital Elevation Models (DEMs), namely the ArcticDEM and the Reference Elevation Model of Antarctica (REMA). These state-of-the-art archives capture the polar landscape surface at unprecedented spatial (meter-scale) and temporal (2-3 weeks) resolution, and they represent a record of the changes that have happened and are happening at the Earth's poles. However, the sheer size of these archives makes it a real challenge for scientists to extract conclusive results. We are developing DL models that can be applied at scale to conduct an inventory of polar landscape features and quantify their lateral and vertical changes.

The students working on this project will acquire the skillsets that are required to develop DL models while applying them to monitor the current state of polar environments. The students will be involved with a) preparing model training sets from existing field survey data; b) evaluating the performance of different DL architectures that are suited to segment images, such as Convolutional Neural Networks, as well as architectures that are suited for detecting changes in image sequences, such as Recurrent and Siamese Neural Networks; and c) developing HPC workflows to manage and apply the developed DL models to existing elevation data archives leveraging the cyberinfrastructure at NCSA.
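To illustrate the image-segmentation task, here is a hedged sketch of a small encoder-decoder CNN for per-pixel segmentation of elevation patches; the patch size and architecture are illustrative, and the real models would be trained on labeled ArcticDEM/REMA tiles rather than this untrained example.

```python
# Sketch of a small encoder-decoder CNN for per-pixel segmentation of DEM patches.
# Patch size and layer choices are illustrative, not the project's architecture.
from tensorflow import keras
from tensorflow.keras import layers

def small_segmenter(patch=128):
    inputs = keras.Input(shape=(patch, patch, 1))        # single elevation band
    x = layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.UpSampling2D()(x)
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    x = layers.UpSampling2D()(x)
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)  # per-pixel feature mask
    return keras.Model(inputs, outputs)

model = small_segmenter()
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```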

NCSA SPIN mentor Sever Tipei

The project centers on DISSCO, software for composition, sound design and music notation/printing developed at Illinois and Argonne National Laboratory. Written in C++, it includes a graphical user interface built with gtkmm; a parallel version is being developed at the San Diego Supercomputer Center. DISSCO has a directed-graph structure and uses stochastic distributions, sieves (a concept from number theory) and elements of information theory to produce musical compositions. Presently, efforts are directed toward refining a system for the notation of music as well as toward the realization of an evolving entity: a composition whose aspects change when computed recursively over long periods of time, thus mirroring the way living organisms are transformed over time (artificial life).
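DISSCO itself is written in C++, but the sieve idea is easy to illustrate: a sieve can be taken as a union of residue classes (modulus, residue) used to select points from a range, here interpreted as pitches. The sketch below is purely illustrative and does not reflect DISSCO's internal representation.

```python
# Illustrative sieve: the union of residue classes selects pitches from a range.
# This is a conceptual sketch, not DISSCO's C++ implementation.
def sieve(residue_classes, lo, hi):
    """Return all integers n in [lo, hi) with n % modulus == residue
    for at least one (modulus, residue) pair."""
    return sorted(n for n in range(lo, hi)
                  if any(n % m == r for m, r in residue_classes))

# Union of the classes 3@0 and 4@1 over two octaves of MIDI pitch numbers:
pitches = sieve([(3, 0), (4, 1)], 60, 84)
print(pitches)
```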

Another possible direction of research is sonification, the aural rendition of complex computer-generated data.

Skills desired: Proficiency in C++ programming, familiarity with the Linux operating system, familiarity with music notation preferred but not required

Contact Sever Tipei

NCSA SPIN mentors Kimani Toussaint, Salazar Coariti, Adriana Carola

Are you passionate about games and education? The nanomanufacturing (nanoMFG) node is looking to create a game that can help children build spatial reasoning skills, understand some key concepts in nanotechnology, and appreciate the importance of simulations in science. Students interested in this project will help design the storyline behind the game and its implementation. Students should have an interest in gaming and education. Desirable skills are programming, web development and planning.

NCSA SPIN mentor Mao Ye

Modern financial markets generate vast quantities of data. As the data environment has become increasingly "big" and analyses increasingly computerized, the information that different market participants extract and use has grown more varied and diverse. At one extreme, high-frequency traders (HFTs) implement ultra-minimalist algorithms optimized for speed. At the other extreme, some industry practitioners apply sophisticated machine-learning techniques that take minutes, hours, or days to run. The proposed project seeks to understand this full spectrum of machine-based trading, with the purpose to inform the public policy and to augment theoretical studies on financial markets. The research agenda focuses on three main themes. 1) Taxonomy. Developing methodologies for estimating the amount of trading activity due to traders at each horizon from proprietary as well as publicly available data; 2) Machine-Machine Interaction. How do interactions among "cyber-traders" impact markets? Under what conditions do such interactions produce extreme disruptions like the Flash Crash of 2010? 3) Machine-Human Interaction. Does machine-based trading mitigate effects of human behavioral biases? Exacerbate them? Do the algorithms themselves introduce any novel types of biases? How do microstructure effects impact larger-scale outcomes for asset pricing and corporate finance? The proposed project will also organize six workshops on big-data research in finance, supported by NBER and the Extreme Science and Engineering Discovery Environment (XSEDE), to stimulate collaboration between financial economists and experts on high-performance computing (HPC) and big data. We are looking for a student with great English writing skills. It will be a plus if a succesful candidate also knows Chinese and has programing skills, but both skills are not necessary.