2018 Summer Mentors

NCSA SPIN mentor Gabrielle Allen

With the recent discoveries of gravitational waves from black hole and neutron star mergers, we want to make our codes and tools for simulating, analyzing, and visualizing these astrophysical systems accessible to K-12 students of different ages. In this project, we are looking either for a student with physics, astronomy, or computing skills who is interested in developing and writing pedagogical materials, or for a student with education, writing, or journalism skills who is interested in science communication.

NCSA SPIN mentor Nigel Bosch

Automatically detecting people's cognitive and affective states has real-world applications that are just beginning to be explored. Examples include smart education software that senses when students are frustrated and offers a hint, or marketers evaluating customers' reactions to a new product. This project will take advantage of recent advances in machine learning research to improve cognition and affect detection models via semi-supervised and unsupervised methods, including adversarial neural networks and denoising autoencoders. Semi-supervised and unsupervised methods will help to address critical issues in this field, especially the lack of labeled data in most emotion datasets. Proficiency in Python is required; linear algebra, machine learning, and data processing are desired skills but can be learned over the course of the project. The successful student will also be included on peer-reviewed publications resulting from the project.
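
For illustration, here is a minimal sketch of a denoising autoencoder in PyTorch, one of the unsupervised methods named above; the feature dimensions and training data are hypothetical stand-ins, not the project's actual code.

    import torch
    import torch.nn as nn

    class DenoisingAutoencoder(nn.Module):
        def __init__(self, n_features=64, n_hidden=16):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(n_features, n_hidden), nn.ReLU())
            self.decoder = nn.Linear(n_hidden, n_features)

        def forward(self, x):
            noisy = x + 0.1 * torch.randn_like(x)      # corrupt the input with Gaussian noise
            return self.decoder(self.encoder(noisy))   # reconstruct the clean input

    model = DenoisingAutoencoder()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    x = torch.randn(32, 64)                            # stand-in for unlabeled affect features
    for _ in range(100):
        optimizer.zero_grad()
        loss = loss_fn(model(x), x)                    # reconstruction loss against the clean input
        loss.backward()
        optimizer.step()

Training on unlabeled data in this way yields learned representations that a downstream affect classifier can reuse, which is how such methods sidestep the shortage of labeled emotion data.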

NCSA SPIN mentor Colleen Bushell

The Visual Analytics team at NCSA focuses on the application and development of machine learning and visual information design techniques that aid in the comprehension of complex data, especially for use in precision medicine.

Recent advances in genetic sequencing and health monitoring technology have allowed for the collection of complex data at unprecedented scale. The focus of our R&D work is: 1) to apply and refine analytical approaches for the selection and ranking of key features from genetic, microbial, mass spectrometry, and other relevant data that can be used as biomarkers to predict health outcomes and/or guide treatment; and 2) to develop interactive visualization methods that put these results in a biological context, allowing researchers to study the data effectively and assisting doctor-patient decision-making. We are looking for students who have strong skills in one of the following areas: biology (human genome, microbial genome, human disease), health care, software engineering, machine learning, visualization, or interface design. Potential project areas where students can participate include information design, user-experience design, visualization software development, literature reviews, and reviews of current clinical data tools and the challenges faced by healthcare providers.
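
As a hedged illustration of the first focus area, the sketch below ranks features by importance with a random forest in scikit-learn; the data are random stand-ins, and the project's actual methods may differ.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    X = np.random.rand(100, 20)            # stand-in for a patients-by-features matrix
    y = np.random.randint(0, 2, 100)       # stand-in for binary health outcomes

    model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    ranking = np.argsort(model.feature_importances_)[::-1]  # most predictive features first
    print(ranking[:5])                     # top candidate biomarkers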

NCSA SPIN mentor Donna Cox

Are you a programmer, a filmmaker, a musician, an architect, a physicist, or a mathematician? Do you know GPU programming, MaxMSP, or Processing? Can you use Houdini, Maya, After Effects, or Unity? Have you built mobile apps, virtual or augmented reality scenes, or computer simulations? The Advanced Visualization Laboratory (AVL) is looking for multi-disciplinary students who can build digital experiences for cutting-edge arts applications. Tell us what you are good at so we can see you in your best light!

NCSA SPIN mentor Anita Chan

As more and more revisions are made to data and scientific analyses available on government websites concerning environmental and climate protection, there has been a growing need for researchers and coders to preserve environmental data and keep citizens informed of such changes. This position will assist the Environmental Data and Governance Initiative (EDGI), a network of scholars and researchers that archives federal environmental data to safeguard it against potential reductions in access by the current administration, develops online tools to support monitoring changes to federal environmental websites, and tracks cuts in funding, research, and regulation at environmentally oriented agencies. These agencies and departments include, but are not limited to, the EPA (Environmental Protection Agency), NOAA (National Oceanic and Atmospheric Administration), NASA (National Aeronautics and Space Administration), USGS (United States Geological Survey), OSHA (Occupational Safety and Health Administration), DOE (Department of Energy), and BLM (Bureau of Land Management).

This position will support collaborations under EDGI's public data working group that include projects for indexing millions of government web pages on a weekly basis, tracking changes on them, and producing regular reports. Additional ongoing efforts include distributed protocol development for data storage, machine learning work that can isolate the most important website changes for enhanced tracking efforts, and security advancements for privacy protection of EDGI volunteers and workshop participants engaging in data preservation and website monitoring. Potential project work could also extend developments made under EDGI's Google Summer of Code partnership, where recent collaborations utilized machine learning algorithms to identify and monitor changes on government agency websites using data from multiple sources: Versionista, PageFreezer, and Internet Archive; another recent collaboration used D3 to develop DataRescue Maps as impactful, publicly meaningful models that allow users to easily visualize changes to government websites archived by EDGI. The data being archived is vital for environmental research and protection, but it can be meaningless or overwhelming in the hands of users without clear graphs or interactive models that help provide context and a general overview of the data.
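
As a toy illustration of the change-tracking idea (not EDGI's actual tooling), the sketch below compares two snapshots of a page with Python's difflib; the snapshot text is invented.

    import difflib

    old_snapshot = "Climate data and analyses.\nDownload the full dataset.\n"
    new_snapshot = "Download the full dataset.\n"

    diff = difflib.unified_diff(
        old_snapshot.splitlines(), new_snapshot.splitlines(),
        fromfile="archived", tofile="current", lineterm="")
    print("\n".join(diff))                 # lines added or removed between captures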

This project welcomes applicants with an interest in environmental data analysis and preservation and other interdisciplinary skills, including Spanish translation; experience in data visualization and coding (Python, Ruby on Rails, or JavaScript in particular); and interest in or experience working with data and databases, web crawling, APIs, machine learning, open science, and community organizing.

NCSA SPIN mentor Elif Ertekin

The goal of this work is to propel the integration of two-dimensional (2D) materials into engineering applications. Two-dimensional materials are characterized by their atomic thickness. The atomic thickness changes the properties of 2D materials compared to their bulk counterparts because transport is confined to two dimensions. This results in unique engineering control of electronic properties (band engineering, transport) and chemical properties (catalysis) that is not otherwise possible. The electronic structure, which we use to tailor properties, depends on the topological structure of 2D materials. As part of this project, we will develop topological analysis methods to connect how the electronic structure changes as defects are manipulated. This work involves modeling the defects at multiple scales using atomistic modeling tools in density functional theory and molecular dynamics and developing descriptions of these defects at the continuum scale. You will learn how to use scientific programming to assemble these methods.

NCSA SPIN mentor Nathan Goldbaum

Molecular clouds are the regions in galaxies where the gas density is high enough to collapse under the influence of gravity, leading to the formation of stars. Understanding how molecular clouds form, evolve, and are destroyed is key to understanding the star formation process on galactic scales. In this project, a student will analyze a 3D numerical simulation of galaxy evolution, focusing on the molecular clouds that form in the simulation. The student will develop a pipeline for extracting the location, shape, and mass of gravitationally bound clouds in the simulation data. The cloud extraction pipeline will be applied to a time series of simulation outputs, tracking the clouds as a function of time, extracting the lifetimes of the clouds, identifying cloud merger events, inferring cloud orbits, and deriving a cloud mass distribution. These data can be directly compared with observations of molecular clouds in nearby galaxies. This project may eventually lead to a journal publication.
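
A minimal sketch of the extraction step, assuming a simple density threshold and connected-component labeling with SciPy; the real pipeline would operate on simulation data and use a physically motivated boundedness criterion.

    import numpy as np
    from scipy import ndimage

    density = np.random.lognormal(size=(64, 64, 64))   # stand-in for a simulated gas density grid
    threshold = 10.0                                   # hypothetical cloud density cutoff
    labels, n_clouds = ndimage.label(density > threshold)

    cell_volume = 1.0                                  # hypothetical cell volume
    for i in range(1, n_clouds + 1):
        mask = labels == i
        mass = density[mask].sum() * cell_volume                # cloud mass
        location = ndimage.center_of_mass(density, labels, i)   # cloud location
        print(i, mass, location)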

The yt project is an open-source, community-developed toolkit for analyzing and visualizing volumetric data. Over the past two years, we have made efforts to improve support for particle data in yt. One aspect of this work is improving yt's visualization capabilities for particle data. In this project, a motivated student will implement a ray tracing pipeline for producing volume-rendered visualizations of 3D smoothed particle hydrodynamics simulations. This project will enable advanced visualizations of simulations produced by a worldwide community of researchers who use yt for their day-to-day analysis and visualization tasks. Familiarity with Python, Cython, C, or C++ will be very helpful in this project. Successful completion of this project will allow in-memory volume rendering of particle simulations with yt. An ambitious student will enable parallel volume rendering on state-of-the-art large-scale N-body simulations on the Blue Waters supercomputer.
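
To make the idea concrete, here is a toy emission-only projection, assuming random particles deposited onto a grid and integrated along one axis; a real pipeline would smooth each particle with its SPH kernel and cast rays through the result.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    pos = np.random.rand(1000, 3)                      # stand-in for SPH particle positions
    mass = np.ones(1000)                               # stand-in for particle masses

    # deposit particles onto a 64^3 grid, then blur as a crude stand-in for kernel smoothing
    grid, _ = np.histogramdd(pos, bins=64, range=[(0, 1)] * 3, weights=mass)
    smoothed = gaussian_filter(grid, sigma=1.5)

    image = smoothed.sum(axis=2)                       # integrate along the line of sight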

NCSA SPIN mentor Kaiyu Guan

Dr. Kaiyu Guan's lab conducts research that uses novel data from NASA satellites to study environmental impacts on global and U.S. agricultural productivity, on the platform of Blue Waters, one of the most powerful supercomputers in scientific research. We are looking for highly motivated and programming-savvy undergraduate students to join the lab for the SPIN program. The chosen students will be closely mentored by Dr. Guan and will work on processing large satellite datasets, understanding and implementing remote sensing algorithms, and solving questions related to global food production and food security.
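
As one hedged example of a classic remote sensing computation a student might implement, the sketch below computes the normalized difference vegetation index (NDVI) from stand-in red and near-infrared bands.

    import numpy as np

    red = np.random.rand(100, 100)     # stand-in for a red-band reflectance tile
    nir = np.random.rand(100, 100)     # stand-in for a near-infrared reflectance tile

    ndvi = (nir - red) / (nir + red + 1e-9)   # higher values indicate denser green vegetation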

NCSA SPIN mentor Roland Haas

Modern scientific simulations have enabled us to study non-linear phenomena that are impossible to study otherwise. Among the most challenging problems is the study of Einstein's theory of relativity, which predicts the existence of gravitational waves, detected very recently by the LIGO collaboration. The Einstein Toolkit is a community-driven framework for astrophysical simulations. I am interested in recruiting a student interested in improving the code base and functionality of the Einstein Toolkit. This project will involve developing code in C and C++ and incorporating it into the existing Cactus framework. The successful applicant will be involved with both the Relativity Group at NCSA and the Blue Waters project and will be invited to participate in the weekly group meetings and discussions of their research projects.

NCSA SPIN mentor Eliu Huerta

This project focuses on the development and implementation of new algorithms for the detection and characterization of gravitational wave sources with advanced LIGO. The successful candidate will participate in the development of novel waveform modeling techniques using machine learning, and in the implementation of a suite of data analysis routines that will be deployed on Illinois' campus cluster. These tools will be used to carry out large-scale data analysis studies to shed light on the detectability of spin-precessing, eccentric binary black hole systems with LIGO. The selected student will also participate in the generation of catalogs of numerical relativity simulations with the Einstein Toolkit using the Blue Waters supercomputer and XSEDE. No knowledge of general relativity or advanced physics is required. Knowledge of Linux, C/C++, and Python is highly desirable. The successful candidate will join NCSA's Gravity Group, which is part of the LIGO Scientific Collaboration and the Einstein Toolkit Consortium.

NCSA SPIN mentor Sandra Kappes

Adapt an existing unit of instruction from NCSA's CI-Tutor learning management system and implement it within NCSA's Moodle learning management system. With help from an NCSA mentor, a suitable instructional design approach will be followed to guide the conversion. This project will provide experience in using LMS technology to develop and deliver instruction, and an introduction to learning theories and their implications for instructional design.

NCSA SPIN mentor Daniel Katz

Most scientific computational and data work can be thought of as a set of high-level steps, and these steps can often be expressed as a workflow. Software tools can help scientists define and execute these workflows; an example is Parsl, a library that allows Python programs to execute functions and external applications in parallel and asynchronously. This project could have two parts, depending on the student's interests and experience. The first, focused on scientific applications, will examine how workflow systems like Parsl can be used to help scientific communities that haven't considered generic workflow tools, or will compare Parsl with other workflow systems in the context of a specific application. The second, focused on software aspects, will work on improving Parsl and the associated parts of the project (documentation, testing, tutorials). Students should be interested in high-performance computing, big data computing, and/or distributed computing. They should be proficient in a Linux/Unix software development environment and skilled in the Python language.
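
A minimal sketch of Parsl's programming model, assuming its standard local-threads configuration (the workflow contents are invented): functions decorated as apps return futures that execute asynchronously.

    import parsl
    from parsl import python_app
    from parsl.configs.local_threads import config

    parsl.load(config)                         # start Parsl with a local thread pool

    @python_app
    def double(x):
        return 2 * x                           # stand-in for a real analysis step

    futures = [double(i) for i in range(4)]    # submitted immediately, run in parallel
    print([f.result() for f in futures])       # block until each result is ready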

NCSA SPIN mentor Volodymyr Kindratenko

Deep neural networks are at the core of artificial intelligence, machine learning, computer vision, and other advanced applications across many disciplines. Such networks allow computers to "learn and infer" rather than "compute," which is essential for many problems in which the models that describe the data are multi-dimensional, non-linear, and generally too complex for traditional mathematical techniques. Many deep learning frameworks have been developed over the course of the past decade, providing advanced neural network construction, training, and inference functionality. However, the vast majority of these codes have been developed for serial execution on a single compute node, which precludes them from training complex network models on large datasets in acceptable time. The challenge is to redesign existing frameworks, or develop new ones, that can take advantage of heterogeneous computing platforms to speed up network training tasks while providing easy-to-use programming abstractions for domain scientists.

In this project, students will analyze state-of-the-art deep learning software frameworks and will work on optimizing them and removing bottlenecks in order to improve the performance of applications relying on these codes. The project will contribute to the development of an NSF-funded computer system for deep learning and will result in open-source software that will be deployed on this system. Students will write C and Python code; ideally they should be familiar with the CUDA and/or OpenCL programming paradigms as well as OpenMP and MPI. Experience with OpenCL programming for FPGAs is a plus.
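
For illustration, here is a minimal sketch of one standard parallelization pattern, assuming data-parallel synchronous SGD with mpi4py: each rank computes gradients on its own data shard and the gradients are averaged across ranks. The actual project may pursue very different designs.

    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    size = comm.Get_size()

    weights = np.zeros(10)                    # toy model parameters, identical on every rank
    local_grad = np.random.rand(10)           # stand-in for a gradient from this rank's data shard

    avg_grad = np.empty_like(local_grad)
    comm.Allreduce(local_grad, avg_grad, op=MPI.SUM)
    avg_grad /= size                          # average the gradients across all ranks

    weights -= 0.01 * avg_grad                # synchronized SGD step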

The goal of this project is to deploy, maintain, and experiment with the latest release of the OpenStack cloud operating system on a cluster at the Innovative Systems Lab. The purpose of this experimental OpenStack deployment is to gain and maintain operational awareness of new features and functionality ahead of NCSA's production cloud, to provide NCSA staff and affiliate faculty with a platform to experiment with new OpenStack functionality, and to study and evaluate new projects within the OpenStack environment. This project is best suited for students interested in system administration and the deployment and operation of complex cloud and HPC environments. Requirements: CS 425 or a similar course.

NCSA SPIN mentor JaeHyuk Kwack

The finite element method is a popular numerical technique in science and engineering for finding approximate solutions to boundary value problems for partial differential equations. It requires discretized domains (i.e., meshes) filled with finite elements (e.g., tetrahedra and hexahedra in 3D). P-refinement is a common way to improve the numerical accuracy of solutions without changing the number of elements in the finite element mesh. It refers to increasing the degree of the highest complete polynomial (p) within an element; as a result, each element achieves higher accuracy for the computed field data. This project is a multi-year continuing effort within the NCSA SPIN program. This year's SPIN intern will work on an enhanced representation of curved shapes. For shape functions with C0 continuity, the gradients are continuous within each element but discontinuous at element boundaries. The SPIN intern will propose an optimal approach to minimize this discontinuity via p-refinement and will implement it in the stand-alone program developed in the previous SPIN project. The developed program will provide an efficient way for users to improve mesh quality without manipulating complicated geometry in a GUI environment. In addition, the SPIN intern will provide an interface to a parallel finite element program for computational fluid dynamics and fluid-structure interaction, allowing the developed program to be used as a building block for an adaptive mesh refinement scheme or a sub-grid mesh generation process for multi-scale analyses. Skills required: C, C++, or Python.
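
A minimal sketch of the C0 gradient discontinuity in one dimension, using two linear elements that share a node (the nodal values are invented):

    import numpy as np

    # nodal coordinates and field values for two linear elements, [0,1] and [1,2]
    x_nodes = np.array([0.0, 1.0, 2.0])
    u_nodes = np.array([0.0, 1.0, 1.5])    # C0: the field itself is continuous at x = 1

    # within each linear element the gradient du/dx is constant
    grad_left = (u_nodes[1] - u_nodes[0]) / (x_nodes[1] - x_nodes[0])    # 1.0
    grad_right = (u_nodes[2] - u_nodes[1]) / (x_nodes[2] - x_nodes[1])   # 0.5

    jump = grad_right - grad_left          # nonzero: the gradient jumps at the shared node
    print(jump)

P-refinement raises the polynomial degree within each element, which is the lever the intern's approach would use to shrink such jumps.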

NCSA SPIN mentor David LeBauer

The TERRA REF program will provide an unprecedented open-access source of data and an integrated phenotyping system for energy sorghum. The TERRA REF system includes field- and controlled-environment digital sensing of energy sorghum along with computational pipelines and open data for the research community. These will be used for crop selection and better understanding of the interactions among genes, traits, and the environment. This position will assist in the development of infrastructure for data processing and access required by the TERRA program.

The intern will work with researchers at NCSA, IGB, Crop Sciences, and Civil Engineering to develop processes and facilitate the cross-disciplinary exchange of data and information. Skills in image analysis, geospatial information systems, informatics, and high-performance computing are desired. Programming can be done in any open-source scripted or compiled language, such as R, Python, or C++.

NCSA SPIN mentor Hon Wai Leong

The workload scheduler is one of the most important components of an HPC system. Because an HPC system integrates many different components (compute, storage, software, etc.), the scheduler's functionality and performance can trigger different kinds of issues throughout the operational life span of the system. Some are known issues with documented fixes, some require new fixes, and some are variants of previously fixed issues that recur after a change in another component of the system (e.g., after a system patch). Most of these issues are identified through human interaction, either via user-reported incidents or human-written automated regression tests. Some minor issues are not detected until they cause a major incident, such as an unexpectedly large volume of job failures. Humans are not always quick enough to detect abnormal behavior in the system. This is where machine learning could be utilized to improve effectiveness. An ML framework can be trained to study the behavior of the scheduler by analyzing scheduler logs, identifying job patterns, and notifying system administrators when it detects an anomaly in the scheduler. Unusual activities detected early can be addressed as soon as they are identified, before they cause more damage or escalate into major problems. The ML framework could also be trained to analyze the regular performance of the scheduler and send notifications when the scheduler is underperforming due to abnormal activity. With a mixture of workloads with different job requirements running on an HPC system, training the ML framework to achieve a fully self-aware mechanism for learning and identifying unusual behaviors that humans cannot easily see is expected to be a challenging task.
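
A hedged sketch of one possible starting point, assuming per-cycle features extracted from scheduler logs and scikit-learn's IsolationForest for unsupervised anomaly detection; the feature set here is invented.

    import numpy as np
    from sklearn.ensemble import IsolationForest

    # rows = scheduling cycles; columns = e.g. queue depth, job failure rate, mean wait time
    X_history = np.random.rand(500, 3)     # stand-in for features from normal operation
    X_latest = np.random.rand(10, 3)       # the most recent cycles to check

    detector = IsolationForest(contamination=0.01, random_state=0).fit(X_history)
    flags = detector.predict(X_latest)     # -1 marks cycles worth alerting administrators about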

NCSA SPIN mentor Bertram Ludaescher

Since data science is often said to consist of 80% data wrangling (or data cleaning/cleansing) and 20% analytics, its importance is increasingly recognized in industry and academia. In this internship, you will explore various data wrangling techniques using specialized tools such as OpenRefine and Trifacta Data Wrangler, and then compare these with solutions based on database technology and on general-purpose libraries in Python and R. The goal is to determine the strengths and weaknesses of the various tools and approaches and to take steps towards more scalable and automated data wrangling solutions for data science.
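
A minimal sketch of the scripted, library-based end of that spectrum, using pandas on an invented messy table:

    import pandas as pd

    df = pd.DataFrame({
        "name": [" Alice", "BOB", "Alice", None],
        "age":  ["34", "n/a", "34", "29"],
    })

    df["name"] = df["name"].str.strip().str.title()         # normalize inconsistent strings
    df["age"] = pd.to_numeric(df["age"], errors="coerce")   # coerce bad values to NaN
    df = df.dropna(subset=["name"]).drop_duplicates()       # drop missing names and exact duplicates
    print(df)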

The ideal candidate will have experience with at least one of Python or R and should be excited about learning new technologies for data management and analysis.

NCSA SPIN mentor Charalampos Markakis

Numerical relativity is a rapidly developing field. The development of black-hole simulations has been revolutionary, and their predictions were recently confirmed with the detection of gravitational waves by LIGO. The next expected source is neutron-star binaries, but their simulation is more complicated, as one needs to model relativistic fluids in curved spacetime and the behavior of matter under the extreme conditions found in neutron-star cores. In this project, you will use methods you are already familiar with from Lagrangian or Hamiltonian mechanics to model fluids in an intuitive way. You will find that a seemingly complex hydrodynamic problem can be greatly simplified and reduced to solving a non-linear scalar field equation. The successful applicants will solve such wave equations numerically in their favorite programming or scripting language (C, Python, Mathematica, etc.). This powerful approach allows one to accurately model oscillating stars or radiating binaries, some of the most promising sources expected to be observed in the next LIGO science runs.
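
To give a flavor of the numerics, here is a minimal sketch of a 1D linear scalar wave equation, u_tt = c^2 u_xx, solved with leapfrog finite differences on invented initial data; the project's equations are non-linear and more involved.

    import numpy as np

    nx, c = 200, 1.0
    x = np.linspace(0.0, 1.0, nx)
    dx = x[1] - x[0]
    dt = 0.5 * dx / c                          # respects the CFL stability condition

    u_prev = np.exp(-200.0 * (x - 0.5) ** 2)   # Gaussian initial pulse
    u = u_prev.copy()                          # zero initial velocity (first-order start)

    for _ in range(500):
        u_next = np.zeros_like(u)              # endpoints stay 0: fixed boundaries
        u_next[1:-1] = (2.0 * u[1:-1] - u_prev[1:-1]
                        + (c * dt / dx) ** 2 * (u[2:] - 2.0 * u[1:-1] + u[:-2]))
        u_prev, u = u, u_next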

Student background: A background in classical mechanics and numerical methods is useful. Familiarity with fluid dynamics or scalar fields is a plus, but training will be provided.

NCSA SPIN mentor Michael Miller

This project researches frameworks and workflows for speech-to-text recognition in order to facilitate live auto-captioning and the creation of standard caption files for use in live events and video editing. It utilizes and enhances speech-to-text HPC/cloud services and seeks to advance the state of the art in speech-to-text recognition. A successful candidate will need to have completed CS 125 (Intro to Computer Science) or have equivalent experience.

NCSA SPIN mentor Andre Schleife

Computational materials science research produces large amounts of static and time-dependent data that are rich in information. Extracting relevant information from these data to determine underlying processes and mechanisms constitutes an important scientific challenge. The goal of this project is to use and develop physics-based ray-tracing and stereoscopic rendering techniques to visualize the structure of existing and novel materials, e.g., for solar-energy harvesting, optoelectronic applications, and focused-ion beam technology. The team will develop codes based, e.g., on the open-source ray tracer Blender/LuxRender and the open-source yt framework to produce image files and movies. Stereoscopic images will be visualized using virtual-reality viewers such as Google Cardboard, Oculus Rift, or HTC Vive. Preliminary implementations exist, and within this project the team will develop GPU-based visualization codes to enable high-throughput rendering of large datasets.
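
As a toy illustration of the core ray-tracing idea (not the project's codes), the sketch below intersects camera rays with spheres standing in for atoms, using NumPy; the positions and shading are invented.

    import numpy as np

    atoms = np.array([[0.0, 0.0, 3.0], [0.8, 0.4, 4.0]])   # hypothetical atom centers
    radius = 0.5
    h, w = 200, 200

    ys, xs = np.mgrid[-1:1:h * 1j, -1:1:w * 1j]
    dirs = np.stack([xs, ys, np.ones_like(xs)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)   # unit rays from a pinhole camera at the origin

    image = np.zeros((h, w))
    for center in atoms:
        b = dirs @ center                                  # d.c term of the ray-sphere quadratic
        disc = b ** 2 - (center @ center - radius ** 2)    # discriminant; > 0 means the ray hits
        t = b - np.sqrt(np.clip(disc, 0.0, None))          # distance to the nearest intersection
        mask = (disc > 0) & (t > 0)
        shade = np.zeros_like(t)
        shade[mask] = 1.0 / t[mask]                        # simple depth-based shading
        image = np.maximum(image, shade)                   # closer spheres appear brighter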

Skills that are beneficial for this work: Java/Android/iOS app development; OpenGL/Unity/WebGL; VR code development, ideally Google Daydream; creativity and motivation.

NCSA SPIN mentor Sever Tipei

The project consists of expanding the features of DISSCO, software that combines composition, sound design, and music notation/printing in a seamless process. Written in C++, it includes a graphical user interface built with gtkmm. Since random distributions can be introduced by the user at all structural levels, DISSCO can produce multiple variants of the same composition that preserve a basic framework while differing in details. Future efforts will include researching how elements of graph theory and information theory could contribute to the realization of an Evolving Entity: a composition that is computed over long periods of time, creating multiple consecutive variants that resemble the transformations of a living organism (artificial life).
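
A toy sketch of the stochastic-variant idea, in Python for brevity (DISSCO itself is C++): a fixed structural framework whose details are drawn from random distributions, so each run yields a different variant of the same piece.

    import random

    rhythm = [1.0, 0.5, 0.5, 2.0]              # fixed structural framework (durations in beats)
    scale = [60, 62, 64, 65, 67, 69, 71]       # MIDI pitches in C major

    # each run picks different pitches over the same framework, giving a new variant
    variant = [(random.choice(scale), duration) for duration in rhythm]
    print(variant)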