Spin – 2021 Summer Mentors


Visual Analytics Group 3D Biological Network Visualization

Charles Blatti, Donna Cox, Colleen Bushell

This project will continue ongoing work using 3D data visualization techniques to interpret top results from gene expression analysis. The student will augment an existing application built on the Unity game engine with a specialized mode for viewing complex gene networks and their relevant annotations in a three-dimensional interactive environment. Previous 3D design or programming experience is valuable.

This student will be co-mentored by the Advanced Visualization Lab (AVL) and the Visual Analytics team at the NCSA. The AVL specializes in cinematic data visualization, software development, and interactive computer graphics. The Visual Analytics team focuses on the application and development of visual information design techniques that aid in the comprehension of complex biological and medical data.

Improving Automatic Anonymization of Text with Machine Learning

Nigel Bosch

Analyzing what people write in online discussion forums can yield valuable insights about people’s sentiment and thought processes. However, much of that text data includes personal identifying information that needs to be removed to protect privacy. This project will address the privacy issue by improving on an existing machine learning system for automatic text anonymization. 

In particular, the project will involve extracting new features from the text. Decision tree and deep neural network models will then be trained with these new features and compared to the existing models to determine whether accuracy improvements can be made. 

 

Proficiency in Python is required, along with data processing experience with Pandas and NumPy. Experience with Scikit-learn and TensorFlow is also helpful, but not required.

Improving Champaign County 211 Service Provision

Anita Chan, Jorge Rojas Alvarez

The Community Data Clinic and Cunningham Township are seeking a student with programming and web design skills for a project rebuilding the Illinois PATH 211 database from scratch. The new version will be a web-based directory based on a relational database of social service providers in the Champaign County area. The student’s main task will be building out this web application to filter results and integrate user feedback.

We are collaborating with local partners and stakeholders to ensure that our platform is as accessible and intuitive as possible, so a keen UI sensibility or experience with front-end accessibility would be a plus. For the first year we are imagining a lightweight proof-of-concept, perhaps just using JavaScript and Firebase, plus HTML and CSS.

Natural Language Processing

AJ Christensen, Jill Naiman, Jana Diesner

 

NCSA’s Advanced Visualization Lab (AVL), in collaboration with the iSchool, is looking for an undergraduate research intern for a project that builds on the work of doctoral candidate Rezvaneh (Shadi) Rezapour and Professor Jana Diesner. Their research uses data mining and natural language processing techniques to study the effects of issue-focused documentary films on various audiences by analyzing reviews and comments on streaming media sites.

This new research will focus specifically on science-themed documentaries that use computational science research in their science explanations. Student researchers will work with mentors in the iSchool and the AVL to collect data from streaming sites and analyze it, both with existing purpose-built software and by developing new tools.

Application of an Agent-Based Model Simulation Platform to Evaluate COVID-19 Control Measures in Northern Illinois Counties

Weihao Ge, Liudmila Sergeevna Mainzer

 

This project aims to immediately benefit northern Illinois counties in response to the pandemic. We hope to collaborate with 11 counties in northern Illinois to acquire detailed local COVID-19 case data. We will apply a well-implemented stochastic agent-based model (ABM) to identify optimal anti-COVID-19 policies and interventions. The work will provide hands-on experience with data analysis and epidemiology simulation on computational clusters. Meanwhile, it will set foundations for real-time epidemic policy evaluation for larger areas. 

We are looking for students who are familiar with Python and, most importantly, enthusiastic about converting research products into community services.

Using Satellite Data for Large-Scale Crop Monitoring

Kaiyu Guan, Jian Peng

 

Dr. Kaiyu Guan’s lab uses novel data from NASA satellites to study environmental impacts on global and U.S. agricultural productivity, running its analyses on Blue Waters, one of the most powerful supercomputers dedicated to scientific research. We are looking for highly motivated, programming-savvy undergraduate students to join the lab for the SPIN program.

The chosen students will be closely mentored by Dr. Guan and will work on problems that include processing large satellite datasets, understanding and implementing remote sensing algorithms, and answering questions related to global food production and food security.

Convergence of Physics-Inspired AI and Extreme Scale Computing for Multi-Messenger Astrophysics

Eliu Huerta

 

We have a research opening for a student interested in the development of physics-inspired AI models for multi-messenger astrophysics. The selected student will work with a team of undergraduate and graduate students, postdocs and faculty with extensive expertise in AI, data and supercomputing.

The student is expected to have knowledge of Python and version control (GitLab, GitHub, etc.). Knowledge of TensorFlow, PyTorch or any other open source platform for deep learning is desired. No knowledge of physics or astronomy is required. The student will become an affiliate of the NCSA Center of Artificial Intelligence Innovation and the NCSA Gravity Group, and will have access to the Hardware Accelerated Learning (HAL) cluster at NCSA, and to the entire ecosystem of AI supercomputers in the U.S., including Bridges-AI, Neocortex and Summit.

On-the-Fly Data Generation for Neural Network Based Gravitational Wave Detection

Roland Haas

 

This project is part of the ongoing effort in the NCSA Gravity Group to study gravitational waves produced by colliding compact objects like black holes and neutron stars. We use machine learning techniques to improve LIGO’s detection capabilities to be able to detect gravitational waves more quickly (lower latency) and increase the set of waveforms that can be detected (for example detecting black holes on elliptic orbits as well as circular orbits).

Training these networks requires large numbers of training waveforms to be available to the network. We currently use the LIGO Algorithm Library (LAL) to produce these waveforms offline, then read them in during the network training phase. For large parameter spaces this becomes infeasible, since the training dataset grows to hundreds of terabytes. This project aims to produce training waveforms on the fly, interleaving a training epoch (on the GPU) with waveform production for the next epoch (on the CPU). You will use Python, LAL and TensorFlow to set up a pipeline that uses CPU cores to produce waveforms while the network trains on waveforms that have already been produced. Students working on this project will have access to the HAL cluster at UIUC and Summit at Oak Ridge National Lab.
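The interleaving described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the group's pipeline: `make_waveform` is a hypothetical stand-in for a LAL waveform call (it returns a simple sine-Gaussian burst), `train_one_epoch` is a placeholder for the TensorFlow training step, and all function names and parameter ranges are invented for the example.

```python
import math
import random
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def make_waveform(seed, n_samples=256):
    """Hypothetical stand-in for a LAL waveform call: a sine-Gaussian burst."""
    rng = random.Random(seed)
    freq = rng.uniform(20.0, 100.0)   # illustrative frequency range
    width = rng.uniform(0.05, 0.2)    # illustrative envelope width
    times = [i / n_samples - 0.5 for i in range(n_samples)]
    return [math.exp(-(t / width) ** 2) * math.sin(2 * math.pi * freq * t)
            for t in times]


def submit_epoch(executor, epoch, n_waveforms):
    """Queue generation of one epoch's waveforms and return the futures at once."""
    seeds = [epoch * n_waveforms + i for i in range(n_waveforms)]
    return [executor.submit(make_waveform, seed) for seed in seeds]


def train_one_epoch(waveforms):
    """Placeholder for the GPU training step; returns a dummy 'loss' value."""
    return sum(sum(x * x for x in w) for w in waveforms) / len(waveforms)


def train_with_prefetch(executor, n_epochs, n_waveforms):
    """Train on epoch k while the workers already generate data for epoch k + 1."""
    losses = []
    pending = submit_epoch(executor, 0, n_waveforms)
    for epoch in range(n_epochs):
        waveforms = [f.result() for f in pending]   # wait for this epoch's data
        if epoch + 1 < n_epochs:
            # Start generating the next epoch before training begins.
            pending = submit_epoch(executor, epoch + 1, n_waveforms)
        losses.append(train_one_epoch(waveforms))   # overlaps with generation
    return losses


def run_demo(use_processes=True):
    """Run three small epochs; threads are offered only for quick local checks."""
    pool_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    with pool_cls(max_workers=2) as executor:
        return train_with_prefetch(executor, n_epochs=3, n_waveforms=4)
```

When using worker processes, call `run_demo()` from under an `if __name__ == "__main__":` guard so that the workers can import the module safely. In a real pipeline the waveform lists would more likely be NumPy arrays handed to the training framework, and the number of workers would be tuned so that generating an epoch takes no longer than training on one.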

 

Skills required:

good understanding of Python

working knowledge of Python multiprocessing module

willing to learn new Python modules (h5py, numpy)

experience using Linux and the command line interface

some basic experience using TensorFlow

Before applying for this project please work through the exercise available at https://wiki.ncsa.illinois.edu/display/~rhaas/SPIN+2021+Exercise as this will be part of the interview with prospective candidates. I will not consider your application unless I have received the exercise.

Improving Speed of Characterizing Numerical Relativity Waveforms

Roland Haas

 

The Laser Interferometer Gravitational-Wave Observatory’s (LIGO) detection of gravitational waves from merging black holes in September 2015 inaugurated a new era in astronomy and astrophysics, opening a window to observe the Universe through gravitational radiation. Occurring 100 years after Einstein’s announcement of his theory of general relativity, the detection spurred world-wide interest in physics and science in general, making headline news around the world. The recent Nobel Prize awarded for this detection and the announcement of the detection of a binary neutron star system by LIGO/Virgo underline the importance of these efforts and the interest that the wider society has in them.

This project is part of the ongoing effort in the NCSA Gravity Group to study gravitational waves produced by colliding compact objects like black holes and neutron stars. We use various software tools for this purpose. For simulations on supercomputers we use the Einstein Toolkit computational framework. We have developed a Python code that converts the simulation results into LIGO injection format files. Part of the conversion is to measure parameters of the waveforms by comparing them to post-Newtonian (PN) waveforms. This project has three goals:

improve the speed of the Python code used to generate PN waveforms

improve the method used to measure parameters

fully integrate multiple Python scripts into a single script

 

Skills required:

knowledge of Python including numerical libraries numpy, scipy

some basic numerical analysis knowledge e.g. for root finding

Before applying for this project please work through the exercise available at https://wiki.ncsa.illinois.edu/display/~rhaas/SPIN+2021+Exercise+II as this will be part of the interview with prospective candidates. Please note that I will not consider your application unless I have received the exercise.

Hybrid Cloud Infrastructure 

Volodymyr Kindratenko

 

The Innovative Systems Lab (ISL) is looking for a student interested in deploying and operating private cloud infrastructure based on OpenStack, RHEL OpenShift, Kubernetes or similar technologies. The student will work with ISL system engineers and C3SR researchers to develop and deploy innovative solutions to support hybrid cloud infrastructure research. The student is expected to have knowledge of Linux system administration, the command line interface (CLI), and Python. Knowledge of any open source or commercial cloud platform is desirable.

Implementation of Algorithms on Reconfigurable Hardware

Volodymyr Kindratenko

 

The Center for AI Innovation, in collaboration with the Innovative Systems Lab (ISL), is looking for students interested in accelerating machine learning algorithms on FPGAs and other unconventional architectures. The students will work with a team of other undergraduate and graduate students and a postdoc on several aspects of FPGA-based computing, ranging from the integration of machine learning frameworks with FPGA-based inference models to the development of HLS-based FPGA codes. The students are expected to have taken ECE 385 or a similar class as well as an applied machine learning class. Knowledge of TensorFlow, PyTorch or any other open source platform for deep learning is desirable; knowledge of HLS design methodology is a plus. The students will become affiliates of the NCSA Center for AI Innovation, and will have access to FPGA systems at ISL and at the Xilinx Center of Excellence for Adaptive Computing at the Coordinated Science Lab.

Deep Learning Model Optimization

Volodymyr Kindratenko

 

The Center for AI Innovation is looking for a student interested in developing optimization techniques that reduce the complexity of deep learning models. We previously developed a technique for network pruning that is carried out simultaneously with model training. The current project seeks to advance this technique by implementing it on new NVIDIA GPUs that have hardware support for sparse matrix operations. The student is expected to have taken ECE 408 or a similar class as well as an applied machine learning class. Proficiency with TensorFlow, PyTorch or any other open source platform for deep learning is required. The student will become an affiliate of the NCSA Center for AI Innovation and will have access to GPU systems at the Innovative Systems Lab at NCSA.

Development of AI Models for Human Action Recognition

Volodymyr Kindratenko

 

The Center for AI Innovation is looking for a student interested in the development and implementation of machine learning models for recognizing human actions. We have previously developed models for human fall detection and aggression detection, and have implemented the human fall detection model on the Raspberry Pi platform. The selected student will work on improving these models, developing new models, and implementing them on low-power edge devices. The student is expected to have a good working knowledge of Python and C++. Knowledge of TensorFlow, PyTorch or any other open source platform for deep learning is required as well. The student will become an affiliate of the NCSA Center for AI Innovation and will have access to advanced GPU hardware for model training.

Deep Learning in Medical Image Analysis

Xiaoxia Liao

 

Image-based pathology (histopathology) is the gold standard for cancer diagnosis but has been unable to differentiate high-risk cancer cases from low-risk ones. In the United States, this inability demands interventions for all patients and generates significant medical side effects, at an estimated annual cost of $4 billion for breast cancer alone. Since the early 2000s, it has been recognized that high-risk cancer cases, unlike low-risk ones, are accompanied by an active tumor microenvironment that allows tumor cells to invade and metastasize. Thus, standard histopathology, which focuses on detecting tumor cells but lacks information on the tumor microenvironment, should ideally be complemented by an optical imaging technology that better reveals the tumor microenvironment.

Using information extracted from the tumor microenvironment will make it possible to overcome this critical inability to differentiate low-risk from high-risk cases. To better take the tumor microenvironment into account, we propose to use multiphoton histopathology. Our hypothesis is that combined standard and multiphoton histopathology, together with a demonstrated platform of a survival convolutional neural network, will dramatically improve the prediction accuracy for breast cancer outcome.

Skill requirements:

Must have taken at least one machine learning or deep learning class

Have knowledge and understanding of deep learning

Programming experience

Machine Learning Approach to Computational Fluid Dynamics

Shirui Luo, Volodymyr Kindratenko

 

Machine learning (ML) has had a transformative impact on the modelling of many high-dimensional complex dynamical systems. Multiphase flow is one of the promising targets for using ML to improve both the fidelity and efficiency of computational fluid dynamics (CFD) simulations. We are examining the use of ML to fit CFD simulation data in order to develop closure relations for multiphase flow systems. For example, deep neural networks (DNNs) can be trained on datasets of flows that differ in initial velocity and void fraction. The trained model is then used to predict the evolution of other flows with different initial conditions.

More broadly, we are tackling problems that arise at the interplay between learning and multiphase flow, such as: How can learning algorithms be constructed to include physical constraints such as the incompressibility of a fluid? What dimensionality reduction techniques and coarsening strategies are most applicable for identifying hidden low-dimensional features? How can computational scientists, experimentalists and theorists collaborate to produce a sufficient training database for multiphase flow simulation?

The student will use open source software packages such as TensorFlow and PyTorch to construct networks that improve predictive capabilities based on a high-fidelity direct numerical simulation (DNS) database. The student will have access to the HPC platforms at NCSA and will learn to analyze CFD data at large scale. Besides practicing typical ML skills, the student will also learn, at a more fundamental level, how neural networks can be designed to best incorporate physical constraints while avoiding overfitting to imposed physics, since typical statistical learning methods can ignore underlying physical principles.

Determination of Biomarkers to be Used in the Diagnosis of Cardiac Microvascular Disease in Postmenopausal Women

Zeynep Madak-Erdogan, Justina Zurauskiene

 

Coronary microvascular disease (CMD) is a common form of heart disease in postmenopausal women. CMD is due to dysfunction of the microvessels that feed the heart muscle and is different from coronary artery disease (CAD), which is due to plaque formation. The majority of patients do not receive a proper diagnosis and have to go back to the hospital with persistent symptoms. We are proposing to identify circulating biomarkers of CMD.

We hypothesize that plasma metabolite and protein profiles are different for postmenopausal women with no heart disease, with CAD, or with CMD. We are collaborating with clinicians from Izmir Katip Celebi Research and Training Hospital, Turkey. They have already recruited 75 patients (three groups of 25: healthy, CAD, and CMD) and completed their full health screening and tests. We are proposing to perform full metabolite and proteomic profiling of plasma samples from these individuals, identify biomarkers using machine learning approaches, and validate our findings in a second cohort of patients that our collaborators are currently recruiting. Our research will identify circulating biomarkers of this debilitating heart disease in postmenopausal women and will have a clinical impact by providing biomarkers that can be used for diagnostic test design in the future.

 

Contact Aiman Soliman

Speech-to-Text Auto Captioning

Michael Miller

This project researches frameworks and workflows for speech-to-text recognition, with the goal of facilitating live auto captioning and the creation of standard caption files for live events and video editing. The work uses and enhances speech-to-text HPC/cloud services and seeks to advance the state of the art in speech-to-text recognition. A successful candidate will need to have completed CS 125 (Intro to Computer Science) or have equivalent experience.

Modeling of the Complex Environment of the Cell 

Taras Pogorelov

The cell environment is complex, crowded, and difficult to capture for substantial timescales with modern computational approaches. The Pogorelov Lab at Illinois uses the specialized supercomputer Anton 2 to model cell-like environments for hundreds of microseconds. We develop computational analysis tools and workflows to mine this large amount of unique data. We work in close collaboration with experimental labs to cross-validate computational and experimental data when possible.

Modeling approaches include classical molecular dynamics and data analysis. These projects include the development of workflows for the analysis of protein-protein and protein-metabolite interactions, and of the water dynamics vital to the life of the cell. The qualified student should have experience with R/Python programming, the Linux environment, and the NAMD, MDAnalysis, and VMD software packages.

Multiscale Modeling of the Cellular Membrane-Associated Phenomena

Taras Pogorelov

The cell membrane environment is complex and challenging to model. The Pogorelov Lab at Illinois develops workflows that combine computational and experimental molecular data. We work in close collaboration with experimental labs. The questions we address include the fundamental mechanisms of membrane activity, the structural dynamics of peripheral and transmembrane proteins, and the development of drugs.

Modeling approaches include classical molecular dynamics, quantum electronic structure, and quantum nuclear dynamics. These projects include the development of workflows for modeling and analyzing the lipid interactions with proteins and ions that are vital for the life of the cell. The qualified student should have experience with R/Python programming, the Linux environment, and the NAMD molecular modeling software.

Virtual Reality and Holographic Data Visualization in Materials Science

Andre Schleife

Computational materials science research produces large amounts of static and time-dependent data that is rich in information, e.g. for atomic positions and electron densities. This poses a challenge when determining underlying processes and mechanisms from this data, in order to advance scientific understanding.

In addition, visualizing this information in a comprehensive way constitutes an important scientific challenge in itself. In this project we aim to build compelling and interactive visualizations using virtual reality hardware (Oculus Quest, Windows Mixed Reality) and the LookingGlass holographic display. This requires developing the software infrastructure that takes materials science data, creates 3D scenes in Blender, exports those to Unity, and, finally, runs the results on VR or holographic hardware. Turning this into an interactive user experience will also be part of the project, and the extension to time-dependent data will be explored. Experience with virtual-reality SDKs (especially Unity), raytracing (Blender), and data visualization is necessary for this project.

AI System for Identification of Wildlife Species in Conservation Videos 

Aiman Soliman, Brian Allan

To date, processing of conservation footage has been performed manually by trained volunteers, with each identification requiring verification by a member of the scientific leadership team, a highly time-consuming process. Moreover, obtaining accurate species population estimates from such footage presents a challenge since, for many species, individual animals cannot be identified, and assumptions must be made about the number of individuals detected.

The objective of the project is to develop deep learning models for high-throughput processing of collected conservation footage, automating species identification and the tracking of individual animals in order to produce more precise population estimates for wildlife and marine species.

Skills desired: TensorFlow, image analysis, basic Linux

Music on High-Performance Computers 

Sever Tipei

The project centers on DISSCO, software for composition, sound design and music notation/printing developed at Illinois and Argonne National Laboratory. Written in C++, it includes a graphical user interface built with gtkmm; a parallel version is being developed at the San Diego Supercomputer Center. DISSCO has a directed graph structure and uses stochastic distributions, sieves (part of number theory) and elements of information theory to produce musical compositions.

Presently, efforts are directed toward refining a system for the notation of music and toward the realization of an evolving entity: a composition whose aspects change when computed recursively over long periods of time, thus mirroring the way living organisms are transformed over time (artificial life).