2017-2018 Academic Year Mentors

NCSA SPIN mentor Donna Cox

Are you a programmer, a filmmaker, a musician, an architect, a physicist, or a mathematician? Do you know GPU programming, MaxMSP, or Processing? Can you use Houdini, Maya, After Effects, or Unity? Have you built mobile apps, virtual or augmented reality scenes, or computer simulations? The AVL is looking for multi-disciplinary students who can build digital experiences for cutting-edge arts applications. Tell us what you are good at so we can see you in your best light!

NCSA SPIN mentors Mark Fredricksen and Daniel LaPine

We need an easy way to gather the information needed for metrics related to our service level agreements and to verify functionality at all times. Right now a variety of tools are used to gather this information; we want it collected into a single tool for reporting. We would also like to consolidate tools that have redundant capabilities. Skills required: programming or scripting, web development basics, and planning.

NCSA SPIN mentor Kaiyu Guan

Dr. Kaiyu Guan's lab is conducting research that uses novel NASA satellite data to study environmental impacts on global and U.S. agricultural productivity, on the most powerful supercomputer in scientific research (Blue Waters). We are looking for highly motivated and programming-savvy undergraduate students to join the lab for the SPIN program. The chosen students will be closely mentored by Dr. Guan and will work on processing large satellite datasets, understanding and implementing remote sensing algorithms, and answering questions related to global food production and food security.

NCSA SPIN mentor Roland Haas

Modern scientific simulations have enabled us to study non-linear phenomena that are impossible to study otherwise. Among the most challenging problems is the study of Einstein's theory of relativity, which predicts the existence of gravitational waves, detected very recently by the LIGO collaboration. The Einstein Toolkit is a community-driven framework for astrophysical simulations. I am interested in recruiting a student to improve the elliptic solver of the Einstein Toolkit, extending its functionality and improving its speed. Depending on student interest, the project can focus more on mathematical aspects or on actual coding. The successful applicant will be involved with both the Relativity Group at NCSA and the Blue Waters project and will be invited to participate in the weekly group meetings and discussions of their research projects.

Numerical simulations of Einstein's equations of general relativity require realistic sets of initial data that describe astrophysically realistic scenarios. This typically involves solving an elliptic-type (Poisson-like) partial differential equation. The Einstein Toolkit contains a parallel, multigrid elliptic solver, CT_MultiLevel, that can solve the initial data problem using simple grids and boundary conditions. The proposed project extends CT_MultiLevel to support spherical grids, improving the quality of the boundary condition, and employs an improved solution method to speed up finding a solution of the partial differential equation. Students participating in this project will gain experience working with large, collaborative science codes, practice code development for real-world problems, and learn about current numerical methods that are applicable to a wide range of scientific areas. Familiarity with Linux and command-line tools as well as a good working understanding of basic C/C++ is required to implement the methods. Willingness to understand the existing code into which the new methods need to be integrated is also a prerequisite. The mathematical aspects of the problem center on properties of elliptic equations, and some familiarity with the notation of partial differential equations will be helpful to gain the most from the experience.
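
For a flavor of the numerical core (the production solver is C/C++ code), below is a minimal Python sketch of Jacobi relaxation for a 2D Poisson problem, the kind of smoothing step a multigrid elliptic solver builds on; the grid size and source term are illustrative:

    import numpy as np

    def jacobi_poisson(rho, h, n_iter=2000):
        """Relax u toward a solution of laplacian(u) = rho on a unit square
        with homogeneous Dirichlet boundaries, using Jacobi iteration."""
        u = np.zeros_like(rho)
        for _ in range(n_iter):
            # New value at each interior point is the average of its four
            # neighbors minus the scaled source term.
            u[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1]
                                    + u[1:-1, :-2] + u[1:-1, 2:]
                                    - h**2 * rho[1:-1, 1:-1])
        return u

    n = 65
    h = 1.0 / (n - 1)
    rho = np.zeros((n, n))
    rho[n // 2, n // 2] = 1.0 / h**2   # point source in the center
    u = jacobi_poisson(rho, h)

A multigrid solver accelerates exactly this kind of relaxation by correcting the solution on a hierarchy of coarser grids.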

The NCSA gravity group uses laptops and workstations to develop code as well as the campus cluster and Blue Waters for production simulations in which all group members are involved.

A second project focuses on improving the scalability of the Einstein Toolkit and its use for large-scale simulation campaigns. Depending on student interest, sub-projects range from Python scripting to C++ codes running on thousands of CPU cores on Blue Waters. As above, the successful applicant will be involved with both the Relativity Group at NCSA and the Blue Waters project and will be invited to participate in the weekly group meetings and discussions of their research projects.

Beyond its flagship codes McLachlan and GRHydro, which solve Einstein's equations of general relativity and general-relativistic fluid dynamics respectively, the Einstein Toolkit contains utilities to manage code and simulations. SimFactory simplifies downloading and compiling the Einstein Toolkit and submitting and managing simulations on supercomputers. On supercomputers, data is usually removed after it has not been accessed for a set period of time (purged) and typically needs to be copied to offline storage (archived) for later retrieval. The first sub-project aims to add an archiving interface to SimFactory that presents the user with a unified set of commands to archive and retrieve simulation data, independent of the cluster on which the data resides, and that lets the user move data between supercomputer clusters. The Einstein Toolkit contains an extensive set of test suites to ensure correctness of results after code changes. Currently these tests are run serially, one after the other, and only text-file output is supported. The second sub-project aims to parallelize the test suite and add support for binary HDF5 output data. A working knowledge of Perl, Python, and possibly C/C++ will be required to succeed in these projects, as well as a willingness to understand existing code.
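
The parallelization pattern for the second sub-project is easy to sketch. In the hypothetical Python fragment below, the test names and the run_test.sh runner are invented stand-ins for the Einstein Toolkit's actual harness:

    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    # Hypothetical list of test cases; the real harness would discover
    # these from the Einstein Toolkit source tree.
    tests = ["ML_BSSN_test", "GRHydro_shocktube", "WaveToy_gaussian"]

    def run_test(name):
        # Each test runs as an independent subprocess, so tests can execute
        # in parallel; the exit code stands in for pass/fail.
        result = subprocess.run(["./run_test.sh", name], capture_output=True)
        return name, result.returncode == 0

    with ThreadPoolExecutor(max_workers=4) as pool:
        for name, passed in pool.map(run_test, tests):
            print(name, "PASS" if passed else "FAIL")

Comparing binary HDF5 output could then use a library such as h5py instead of text diffs.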

NCSA SPIN mentor Kathryn Huff

The candidate will implement a proof-of-concept Python package demonstrating acceleration of Monte Carlo methods—in particular, Monte Carlo for neutral particle transport—using machine learning techniques. First, a simple, threaded, Monte Carlo Python package will be implemented and tested. Next, the student will implement a few potential acceleration methods inspired by machine learning and optimization techniques in the literature. Finally, these acceleration methods will be tested and compared to one another for simple problems. This work will emphasize implementation of best practices in scientific computing, including integrating documentation, implementing unit tests, investigating scalability, and developing demonstration Jupyter Notebooks. A strong interest in scientific computing combined with interest in data science and statistics will be required.
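
As a sense of scale for the first step, a bare-bones (serial, mono-energetic, 1D slab) Monte Carlo transport kernel fits in a few lines of Python; the cross section, absorption probability, and slab width below are illustrative assumptions, not part of the planned package:

    import random

    SIGMA_T = 1.0      # total macroscopic cross section (1/cm), illustrative
    ABSORB_P = 0.3     # probability that a collision is an absorption
    WIDTH = 5.0        # slab thickness (cm)

    def transport_one(rng):
        """Track one neutral particle through a 1D slab; return its fate."""
        x = 0.0
        while True:
            # Sample the distance to the next collision from an exponential.
            x += rng.expovariate(SIGMA_T)
            if x > WIDTH:
                return "leaked"
            if rng.random() < ABSORB_P:
                return "absorbed"
            # Otherwise the particle scatters; this toy keeps it streaming
            # forward (a real kernel would resample the direction).

    rng = random.Random(42)
    fates = [transport_one(rng) for _ in range(100_000)]
    print(fates.count("leaked") / len(fates), "leakage fraction")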

Skills required: Python, Linux, bash, basic statistics

Skills desired: git, Sphinx, pytest, machine learning

Contact Kathryn Huff

NCSA SPIN mentor Sandra Kappes

Adapt an existing unit of instruction from NCSA's CI-Tutor learning management system and implement it within NCSA's Moodle learning management system. With help from an NCSA mentor, a suitable instructional design approach will be followed to guide the conversion. This project will provide experience in using LMS technology to develop and deliver instruction and an introduction to learning theories and their implications for instructional design.

NCSA SPIN mentor Dan Katz

Most scientific computational and data work can be thought of as a set of high-level steps, and these steps can often be expressed as a workflow. Software tools can help scientists define and execute these workflows; for example, Swift is both a language and a runtime system. This project could have two parts, depending on the student's interests and experience. One, focused on scientific applications, will examine how workflow systems like Swift can be used to help scientific communities that haven't considered generic workflow tools, specifically projects in astronomy such as the Large Synoptic Survey Telescope (LSST) and the Square Kilometer Array (SKA). LSST is a new kind of telescope, currently under construction in Chile, designed to conduct a ten-year survey of the dynamic universe. LSST can map the entire visible sky in just a few nights, and images will be immediately analyzed to identify objects that have changed or moved: from exploding supernovae on the other side of the Universe to asteroids that might impact the Earth. SKA is a massive, international, multi-telescope radio astronomy project that will provide the highest-resolution images in all of astronomy. Both projects represent challenging data acquisition and analysis problems, integrating workflows, scientific codes, and advanced data centers. The second part, focused on software aspects, will examine how Swift might interact with other open-source projects, such as the Apache stack. Interested students should have an interest in high-performance computing, big data computing, and/or distributed computing. They should be proficient in a Linux/Unix software development environment and skilled in the C language. Optional but desirable skills include Java, ANTLR/bison/yacc/lex, sockets, and/or MPI.
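
To make the "set of high-level steps" idea concrete, here is a toy Python rendering of the pattern a workflow system like Swift automates; the calibrate and detect_sources steps are invented placeholders, and in Swift the fan-out would be declared once and executed by the runtime on clusters:

    from concurrent.futures import ProcessPoolExecutor

    def calibrate(image):
        # Stand-in for a per-image calibration step.
        return "calibrated(" + image + ")"

    def detect_sources(calibrated):
        # Stand-in for source detection on one calibrated image.
        return "sources(" + calibrated + ")"

    if __name__ == "__main__":
        images = ["exposure_%d.fits" % i for i in range(8)]
        # A workflow system schedules this fan-out, moves the data, and
        # handles faults; plain Python futures stand in for the runtime.
        with ProcessPoolExecutor() as pool:
            calibrated = list(pool.map(calibrate, images))
            catalogs = list(pool.map(detect_sources, calibrated))
        print(catalogs[0])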

NCSA SPIN mentor Vlad Kindratenko

The goal of this project is to deploy, maintain, and experiment with the latest release of the OpenStack cloud operating system software on a cluster at the Innovative Systems Lab. The purpose of this experimental OpenStack deployment is to gain and maintain operational awareness of new features and functionality ahead of NCSA's production cloud, to provide NCSA staff and affiliate faculty with a platform to experiment with new OpenStack functionality, and to study and evaluate new projects within the OpenStack environment. This project is best suited for students interested in system administration and the deployment and operation of complex cloud and HPC environments. Requirements: CS 425 or a similar course.

This project will involve deployment and evaluation of existing deep learning frameworks on an HPC cluster and in a cloud environment. The goal is to gain hands-on experience with deep learning codes, frameworks, and methodologies and to support upcoming projects requiring deep learning. The work may also require parallelizing codes to work on multiple nodes. This project is best suited for students interested in the development of machine learning techniques and their applications in science and technology fields. Requirements: CS 446 and CS 420, or similar courses.

NCSA SPIN mentor Matthew Krafczyk

An important measure of the value of a scientific finding is its ability to be independently reproduced by others skilled in the area. By the time efforts are made to reproduce such findings, years may have elapsed, and reproduction may be unsuccessful. There are many factors that can prevent the replication of a study, and we seek to understand those related to the computational aspects of the work. We estimate that only about 10% of scientists doing computationally based work release their source code in any form. The successful applicant will join an effort to introduce more transparency to computationally based research. This will include locating the source code used to create articles and using it to reproduce the articles' results. During this process we will study scientific workflows and development habits that hinder or enable reproducibility, as well as develop tools to empower researchers to make their code available more easily. We will build a website to help elucidate these details of the scientific method to the public. Recommended skills: R, C/C++, Python, web development including HTML, database engineering including SQL, workflow tools.

NCSA SPIN mentor JaeHyuk Kwack

The finite element method is a popular numerical technique in science and engineering for finding approximate solutions to boundary value problems for partial differential equations. It requires discretized domains (i.e., meshes) filled with finite elements (e.g., tetrahedra and hexahedra in 3D). p-refinement is a common way to improve the numerical accuracy of solutions without changing the number of elements in the finite element mesh. It refers to increasing the degree of the highest complete polynomial (p) within an element; as a result, each element achieves higher accuracy for computed field data. In this project, the SPIN intern will develop a standalone program for p-refinement of 3D finite element meshes with boundary conditions. Since mesh generation for complicated geometry is usually done in a GUI environment ahead of numerical simulations, it is inefficient in practice to update mesh information such as p-refinement at the beginning of simulations. The developed program will provide an efficient way for users to improve mesh quality without maneuvering complicated geometry in a GUI environment. In addition, the SPIN intern will provide an interface to a parallel finite element program for computational fluid dynamics and fluid-structure interactions, allowing the developed program to be used as a building block for an adaptive mesh refinement scheme or a sub-grid mesh generation process for multi-scale analyses. Skills required: C, C++, or Fortran.
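
The accuracy gain from raising p can be seen without any mesh machinery: interpolating a smooth field on a single 1D "element" with polynomials of increasing degree shrinks the error rapidly. A small illustrative Python check (the eventual deliverable is a C, C++, or Fortran program):

    import numpy as np

    # Interpolate a smooth field on one 1D "element" [-1, 1] with
    # polynomials of increasing degree p, sampled at Chebyshev points.
    f = lambda x: np.sin(3 * x)
    x_fine = np.linspace(-1, 1, 1001)

    for p in (1, 2, 4, 8):
        nodes = np.cos(np.pi * np.arange(p + 1) / p)   # Chebyshev nodes
        coeffs = np.polyfit(nodes, f(nodes), p)         # exact interpolation
        err = np.max(np.abs(np.polyval(coeffs, x_fine) - f(x_fine)))
        print("p = %d: %d nodes per element, max error = %.2e"
              % (p, p + 1, err))

In 3D the node count multiplies accordingly: a Lagrange hexahedron of degree p carries (p+1)^3 nodes, which is the bookkeeping the p-refinement program must generate and attach boundary conditions to.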

NCSA SPIN mentor David LeBauer

The TERRA REF program will provide an unprecedented open-access source of data and an integrated phenotyping system for energy sorghum. The TERRA REF system includes field- and controlled-environment digital sensing of energy sorghum along with computational pipelines and open data for the research community. These will be used for crop selection and better understanding of the interactions among genes, traits, and the environment. This position will assist in the development of infrastructure for data processing and access required by the TERRA program.

The intern will work with researchers at NCSA, IGB, Crop Sciences, and Civil Engineering to develop processes and facilitate the cross-disciplinary exchange of data and information. Skills in image analysis, geospatial information systems, informatics, and high-performance computing will be useful. Programming can be done in any open-source scripted or compiled language, such as R, Python, or C++.

NCSA SPIN mentor Bertram Ludaescher

Data provenance (or data lineage) describes the origin and processing history of data products from workflows or scripts and thus is important metadata in support of transparency and reproducibility in computational and data science. Provenance information often comes in the form of labeled, directed graphs, representing the conceptual or actual dataflow of the computation. Being able to effectively and efficiently query such graphs is an important research problem. As part of this internship, you will learn about different languages for querying graphs (e.g., regular path queries), interesting advanced queries (e.g., to compute the lowest common ancestor(s) in trees and DAGs), and ways to implement such queries using different approaches. The overall goal is to prototype one or more graph querying approaches and evaluate their efficiency on large provenance graphs. Desirable skills: programming experience and interest in algorithms and databases.
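
As a concrete instance of one of the advanced queries mentioned, the Python sketch below computes the lowest common ancestors of two nodes in a tiny, invented provenance DAG by intersecting ancestor sets and keeping the minimal elements:

    # Toy provenance DAG: edges point from a derived data product to the
    # inputs it was derived from (child -> parents).
    parents = {
        "plot": ["table"],
        "table": ["raw_a", "raw_b"],
        "summary": ["raw_b"],
        "raw_a": [],
        "raw_b": [],
    }

    def ancestors(node):
        """All nodes reachable from node via parent edges, including itself."""
        seen, stack = {node}, [node]
        while stack:
            for p in parents[stack.pop()]:
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        return seen

    def lowest_common_ancestors(u, v):
        common = ancestors(u) & ancestors(v)
        # Keep only the minimal elements: common ancestors that are not
        # strict ancestors of another common ancestor.
        return {a for a in common
                if not any(a in ancestors(b) - {b} for b in common)}

    print(lowest_common_ancestors("plot", "summary"))   # {'raw_b'}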

NCSA SPIN mentor Liudmila Mainzer

We are improving the performance of stepwise epistatic model selection for genome-wide association studies. The method itself works well, but the current Java implementation is far too slow for modern data sizes. We would like to deploy this Java code on Spark to see whether the necessary performance gains can be obtained. A successful student applicant will use the Spark Java API to adapt the current code for a Spark platform being deployed at NCSA's Innovative Systems Lab. This code will be validated for correctness in collaboration with a student statistician from the lab of Dr. Lipka, who developed this statistical method.
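
Although the production code will use the Spark Java API, the parallelization pattern is simple to sketch in PySpark: distribute candidate model terms, score them independently, and keep the best. The score_model function and the pair encoding below are invented placeholders:

    from pyspark import SparkContext

    def score_model(pair):
        # Placeholder: in the real project this would evaluate one candidate
        # epistatic (interaction) term against the phenotype data.
        i, j = pair
        return ((i, j), -abs(i - j))   # fake fitness criterion

    sc = SparkContext(appName="stepwise-epistasis-sketch")
    candidates = [(i, j) for i in range(100) for j in range(i + 1, 100)]

    # Distribute candidate terms across the cluster, score them in
    # parallel, and keep the best-scoring pair for this selection step.
    best = (sc.parallelize(candidates)
              .map(score_model)
              .max(key=lambda kv: kv[1]))
    print(best)
    sc.stop()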

The NCSA Genomics and Data Analytics teams are jointly looking for a student who enjoys running complex statistical analyses in R. We deal with a range of problems in bioinformatics, genomics, cheminformatics, and disciplines outside biology that require advanced statistics. However, most such codes are written as single-threaded R scripts. Methods have been developed to parallelize R codes for use in high-performance computing environments. The successful applicant will learn these parallelization approaches and apply them to improve the performance of codes for a variety of projects with both Illinois faculty and industry partners. A strong statistical background and a love of R are required. Familiarity with Linux is a bonus.

We are starting a collaboration with the University of Birmingham on the effects of environmental pollution on gene expression. The collaborators at Birmingham are planning to analyze massive amounts of data and need our help automating their workflows. The student will need to learn Nextflow, a workflow management system built on Groovy. Nextflow will be used to wrap a series of bioinformatics tools into a workflow that provides automatic execution on large numbers of files, good data management, and logging. The student must have experience with several computer languages and a background in biology/biochemistry/genomics, or a willingness to learn.

NCSA SPIN mentor Charalampos Markakis

Numerical relativity is a rapidly developing field. The development of black-hole simulations has been revolutionary, and their predictions were recently confirmed by LIGO's detection of gravitational waves. The next expected source is neutron-star binaries, but their simulation is more complicated, as one needs to model relativistic fluids in curved spacetime and the behavior of matter under the extreme conditions found in neutron-star cores. In this project, you will use methods you are already familiar with from Lagrangian or Hamiltonian mechanics to model fluids in an intuitive way. You will find that a seemingly complex hydrodynamic problem can be greatly simplified, reducing to solving a non-linear scalar field equation. The successful applicant will be able to solve such wave equations numerically in their favorite programming or scripting language (C, Python, Mathematica, etc.). This powerful approach allows one to accurately model oscillating stars or radiating binaries, some of the most promising sources expected to be observed in the next LIGO science runs.
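
A natural warm-up is a 1D scalar wave equation solved with finite differences; the non-linear scalar field equations in the project share the same skeleton with extra source terms. A minimal Python sketch with illustrative grid parameters:

    import numpy as np

    # Solve u_tt = u_xx on [0, 1] with fixed ends, leapfrog in time.
    n, c = 201, 1.0
    x = np.linspace(0.0, 1.0, n)
    dx = x[1] - x[0]
    dt = 0.5 * dx / c                        # CFL-stable time step

    u_prev = np.exp(-200 * (x - 0.5) ** 2)   # Gaussian pulse
    u = u_prev.copy()                        # zero initial velocity

    for _ in range(400):
        u_next = np.empty_like(u)
        # Standard second-order centered update for the wave equation.
        u_next[1:-1] = (2 * u[1:-1] - u_prev[1:-1]
                        + (c * dt / dx) ** 2
                        * (u[2:] - 2 * u[1:-1] + u[:-2]))
        u_next[0] = u_next[-1] = 0.0         # reflecting boundaries
        u_prev, u = u, u_next

    print("max |u| after evolution:", np.abs(u).max())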

Student background: A background in classical mechanics and numerical methods is useful. Familiarity with fluid dynamics or scalar fields is a plus, but training will be provided.

NCSA SPIN mentor Michael Miller

This project researches frameworks and workflows for speech-to-text recognition in order to facilitate live auto-captioning and the creation of standard caption files for use in live events and video editing. It utilizes and enhances speech-to-text HPC/cloud services and seeks to advance the state of the art in speech-to-text recognition.

NCSA SPIN mentor Andre Schleife

Computational materials science research produces large amounts of static and time-dependent data that are rich in information. Extracting relevant information from these data to determine underlying processes and mechanisms constitutes an important scientific challenge. The goal of this project is to use and develop physics-based ray-tracing and stereoscopic rendering techniques to visualize the structure of existing and novel materials, e.g., for solar-energy harvesting, optoelectronic applications, and focused-ion-beam technology. The team will develop codes based, e.g., on the open-source ray tracer Blender/LuxRender and the open-source yt framework to produce image files and movies. Stereoscopic images will be visualized using virtual-reality viewers such as Google Cardboard, Oculus Rift, or HTC Vive. Preliminary implementations exist, and within this project the team will develop GPU-based visualization codes to enable high-throughput rendering of large data sets.
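
One small, self-contained piece of the stereoscopic pipeline is placing the two eye cameras; the Python sketch below computes left- and right-eye positions from a single view, with an illustrative eye separation (the actual rendering is done by tools such as Blender/LuxRender and yt):

    import numpy as np

    def stereo_cameras(position, target, up, eye_separation=0.065):
        """Return left/right eye positions for stereoscopic rendering.

        The two cameras are displaced along the axis perpendicular to the
        viewing direction and the up vector (half the separation each way).
        """
        position, target, up = map(np.asarray, (position, target, up))
        forward = target - position
        forward = forward / np.linalg.norm(forward)
        right = np.cross(forward, up)
        right = right / np.linalg.norm(right)
        offset = 0.5 * eye_separation * right
        return position - offset, position + offset

    left, right = stereo_cameras([0, 0, 5], [0, 0, 0], [0, 1, 0])
    print(left, right)

Each eye's image is then rendered separately and composited side by side for viewers such as Google Cardboard.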

In order to develop nanocrystals that can distinguish diseased from healthy tissue and determine how the complex genetics underlying cancer respond to therapy, we need to understand a complex design space. Experiment and theory provide insight into the size, shape, composition, and internal structure of different nanocrystals. Students in this team will work with computational and experimental researchers in several departments to establish a database to store, share, and catalog optical properties and other relevant data describing semiconductor nanocrystals. This requires developing schemas and analysis workflows that can be efficiently shared among multiple researchers. Students will first identify all information that needs to be included in this catalog. Students will then write JSON and Python code and interface with Globus and the Materials Data Facility. They will create well-documented IPython notebooks that operate directly on the Globus file structure and run in the web browser. Students will also develop code that automatically analyzes data stored in the facility, e.g., to verify and validate experimental and computational results against each other. Eventually, both the data and the workflows will be made available to the general public. This project is highly interdisciplinary, and students will work with a team of researchers in bioengineering, materials science, mechanical engineering, and NCSA.
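
A first deliverable of the cataloguing work is agreeing on a record schema. The Python sketch below shows the general flavor; every field name is invented for illustration and is not drawn from the Materials Data Facility's actual schema:

    import json

    # Hypothetical record for one nanocrystal sample; the team would decide
    # the real field list in the first phase of the project.
    record = {
        "material": "CdSe",
        "diameter_nm": 4.2,
        "shape": "quantum dot",
        "source": "experiment",            # or "computation"
        "optical": {
            "absorption_peak_nm": 580,
            "emission_peak_nm": 605,
        },
        "provenance": {
            "contributor": "example-lab",
            "date": "2017-10-01",
        },
    }

    print(json.dumps(record, indent=2))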

Skills desired: writing JSON, XML, or other data-interchange formats; programming in Python; collaborative skills in teams of computational and experimental researchers

Contact Andre Schleife

NCSA SPIN mentor Sever Tipei

The project involves algorithmic composition and digital sound synthesis using a software package developed at the University of Illinois Computer Music Project and Argonne National Laboratory. It is an ongoing project that uses stochastic distributions and elements of graph theory and information theory, and it requires C++ programming skills and possibly graphical user interface building.
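
For a taste of the stochastic side (the actual package is C++), the Python sketch below draws a short pitch sequence from a weighted distribution, the simplest form of the probabilistic choices an algorithmic-composition engine makes; the pitch set and weights are invented:

    import random

    # Pitch classes and weights for a toy stochastic melody generator.
    pitches = ["C4", "D4", "E4", "G4", "A4"]
    weights = [0.30, 0.15, 0.25, 0.20, 0.10]

    rng = random.Random(7)
    # Each note is drawn independently; a richer model would condition on
    # previous notes (e.g., a Markov chain over intervals).
    melody = rng.choices(pitches, weights=weights, k=16)
    print(" ".join(melody))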