2016-2017 Mentors

NCSA SPIN mentor Peter Christensen

This intern will help our team develop and expand our data collection and visualization platforms. This will include the development and integration of applications to collect and marshal data for use in economic analysis. Our data collection services run as disparate containers on a virtual computing cluster to achieve modularity, reliability, and reusability of our applications. The intern will also support the development of tools to visualize the results of our economic analysis. This internship is an opportunity to sharpen your programming skills in a fast-paced applied research environment. You will also become familiar with the economic intuition that guides public policy decisions and will develop statistical skills that will bring rigor to your work in industry or research settings. This is also an opportunity to use your skills to make a real impact on public policy.

skills desired: comfort designing and implementing NoSQL and SQL databases, comfort working with Python, some statistics background and familiarity with software packages (R/MATLAB/Stata), ability to learn quickly and work independently

Contact Peter Christensen

NCSA SPIN mentor Donna Cox

Are you a programmer, a filmmaker, a musician, an architect, a physicist, or a mathematician? Do you know GPU programming, Max/MSP, or Processing? Can you use Maya, After Effects, or Unity? Have you built mobile apps, virtual or augmented reality scenes, or computer simulations? The AVL is looking for multi-disciplinary students who can build digital experiences for cutting-edge arts applications. Tell us what you are good at so we can see you in your best light!

NCSA SPIN mentor Jeremy Enos

A universal monitoring issue spans a range of Unix platforms, from HPC supercomputers to standalone Linux servers. Particularly for administrative activity on systems with shared administrative responsibility, tracking who did what and when is a well-known challenge. Several approaches have been taken to address the issue, but all have specific weaknesses or cannot meet a rapidly evolving need. With the Multi-Session Monitoring tool concept, a specific set of gateway hosts would be enabled with secure shell session recording capability. Sessions propagating through the gateways would then be recorded as well, so no custom software changes would be needed on any further endpoints for them to be included in the recording. Further, logic could be applied to the recording content, or to system information outside of the sessions, to identify when sessions propagated to different hosts or when an identity change took place. Ultimately, these recordings would be cataloged and made available for historical or live views from a web-based interface. (A Python-based session-to-HTML recorder already exists and could potentially be extended for this effort.) The interface would permit filtering on various criteria, starting with user, endpoint host, idle time, command, and point in time (or live). The result would be the capability to use the interface to very rapidly determine the fourth item when any three of who, what, when, and where an action was taken are provided. This type of tool would make waves in a community facing the challenges of multi-stewardship of Unix resources, particularly when some endpoints (e.g., Unix "appliances") are not permitted to be modified with the existing instrumentation tools that partially address this challenge.
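
To make the who/what/when/where idea concrete, here is a minimal Python sketch of that kind of filtering. The data model is hypothetical (it is not the existing recorder's format); the point is only that supplying any three criteria narrows the records down to the fourth.

    # Hypothetical session-event model and filter; illustrative only.
    from dataclasses import dataclass
    from datetime import datetime

    @dataclass
    class SessionEvent:
        user: str            # who
        host: str            # where
        command: str         # what
        timestamp: datetime  # when

    def find_events(events, user=None, host=None, command=None,
                    start=None, end=None):
        """Yield events matching every criterion that was provided."""
        for e in events:
            if user and e.user != user:
                continue
            if host and e.host != host:
                continue
            if command and command not in e.command:
                continue
            if start and e.timestamp < start:
                continue
            if end and e.timestamp > end:
                continue
            yield e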

The large-scale scientific simulation data routinely generated by researchers on Blue Waters is central to the process by which scientific breakthroughs are made. While this data presents significant challenges in every stage of its lifecycle, its prominence drives concerted efforts at each of those stages. Behind the scenes, however, analogous data is generated: extensive system diagnostic information. As opposed to the physically-structured simulation data, this diagnostic data may be considered a collection of independent events and/or measurements. As stated, this information is extensive, but technical staff have nonetheless taken steps to curate and store it so that it remains available for future analysis. We believe the relative obscurity of this data, along with the fact that it typically has only an indirect impact on discoveries, is the reason that methods for benefiting from it are ad hoc or incomplete. We also believe this data represents a vast set of opportunities for research and development. In this project we will begin by evaluating current development efforts in visualizing this data, toward creating a tool that will actually be deployed and used by HPC professionals. We will then plan future activities by coordinating several perspectives: the existing data, the experts generating the data, those who might benefit from its use, challenges or gaps in coverage that require innovation to address, and what looks fun.

NCSA SPIN mentor Elif Ertekin

The objective of this effort is to create an online design laboratory to computationally screen and assess candidate magnetic shape memory alloys for use in solid state cooling and refrigeration. In the magnetocaloric Heusler shape memory alloys, the application and removal of a magnetic field induces a martensitic phase transformation in the alloy, and the latent heat of the transformation causes a change in temperature. Solid state cooling and refrigeration based on these phase change materials has the potential to replace conventional vapor compression refrigeration in any application requiring heat removal and extraction, greatly offsetting the use of chemicals with high global warming potential (GWP). Today, however, the optimal shape memory alloys and material compositions have not yet been identified. The goal of this project is to develop an online tool for data mining and analysis to identify optimal alloy compositions and structures. The tool will be used to create an online database of candidate materials and compositions, as a shared resource for the shape memory alloy community.

skills required: Python/IPython, database interaction, web design

Contact Elif Ertekin

NCSA SPIN mentor Nathan Goldbaum

The yt project is a Python library for the analysis and visualization of 3D data. We would like to improve yt's capabilities for real-time interactive exploration of data, since right now most interactions happen via scripting. This project will involve writing data exploration widgets based on current yt capabilities. These could be native GUI widgets based on matplotlib's widget framework; web-based widgets using the IPython widget framework or the holoviews widget framework; or another widget framework that the student is interested in working with. This work will be incorporated into the yt package so students and researchers worldwide can benefit from the work done for this project.
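
As a rough sketch of the web-based direction (assuming a Jupyter notebook with ipywidgets installed and yt's "IsolatedGalaxy" sample dataset downloaded), an ipywidgets control could re-render a yt slice whenever the user picks a different field:

    # Sketch: re-render a slice when the user changes the field.
    import yt
    import ipywidgets

    ds = yt.load("IsolatedGalaxy/galaxy0030/galaxy0030")

    def show_slice(field):
        yt.SlicePlot(ds, "z", field).show()

    # A dropdown of fields; choosing one redraws the plot.
    ipywidgets.interact(show_slice, field=["density", "temperature"])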

NCSA SPIN mentor Ben Grosser

How is an interface that foregrounds our friend count changing our conceptions of friendship? Why does adding nonsense to the end of one’s email affect what people write and with whom they connect? What can we learn about attitudes towards surveillance by showing users the traces they leave behind when visiting a website? Artist and NCSA faculty affiliate Ben Grosser creates award-winning interactive experiences, machines, and systems that examine the cultural, social, and political implications of software. For this project, assist Ben in developing a number of new code-based artworks that investigate the role of software in/as culture.

skills required: experience with programming

Contact Ben Grosser

NCSA SPIN mentor Kaiyu Guan

Dr. Kaiyu Guan's lab is conducting research that uses novel data from NASA satellites to study environmental impacts on global and US agricultural productivity, using the most powerful supercomputer in scientific research (Blue Waters). We are looking for highly motivated and programming-savvy undergraduate students to join the lab through the SPIN program. The chosen students will be closely mentored by Dr. Guan and will work on processing large satellite datasets, understanding and implementing remote sensing algorithms, and answering questions related to global food production and food security.

NCSA SPIN mentor Eliu Huerta

Now that gravitational waves have been detected, we are in a unique position to learn about the astrophysical properties of black holes and neutron stars with unprecedented precision. I am interested in recruiting a student who is willing to get involved in the development and exploitation of numerical algorithms for the detection and characterization of gravitational wave sources, including those that may be detected in the next few months. The successful applicant will join the Relativity Group at NCSA and will become a member of the LIGO Scientific Collaboration throughout the SPIN internship.

NCSA SPIN mentor Dan Katz

Scientific software is an essential enabler across computation, experiment, and theory in all disciplines. Much of this software is open source, meaning that in many cases it is produced and shared on a voluntary basis, at least at universities. One of the reasons the system works as well as it does is reputation: those who write the code are recognized for having done so by their peers. However, the informal reputation system used in open source is not the same as the academic reputation system, which is based on publication of peer-reviewed papers, citations (people discussing your paper in their papers), journal impact factors (a measure of how often papers in a given journal are cited), and metrics such as the h-index (a single measure of an author's productivity and citations). To encourage open source software and shared software in academia, we want to map software metrics to existing paper metrics, starting with the idea of software citation. This project will investigate and test possible implementations of software citation, and will use some of those to better understand the impact and knowledge that could be gained by having a software citation system and culture in place. Interested students should have an interest in Internet computing and in sharing/cooperative work in the context of academia. They should be experienced with programming in a Linux/Unix software development environment, and ideally with GitHub or another distributed software management system.
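
To make one of these metrics concrete, the h-index mentioned above has a simple operational definition: the largest h such that the author has h papers with at least h citations each. A short Python function illustrates it:

    def h_index(citations):
        """Largest h such that h papers have at least h citations each."""
        ranked = sorted(citations, reverse=True)
        return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

    # Five papers cited [10, 8, 5, 4, 3] times give an h-index of 4.
    assert h_index([10, 8, 5, 4, 3]) == 4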

Most scientific computational and data work can be thought of as a set of high-level steps, and these steps can often be expressed as a workflow. Software tools can help scientists define and execute these workflows; one example is Swift, which is both a language and a runtime system. This project could have two parts, depending on the student's interests and experience. The first, focused on scientific applications, will examine how workflow tools like Swift can be used to help scientific communities that haven't considered generic workflow tools, specifically astronomy projects such as LSST and SKA. The Large Synoptic Survey Telescope, LSST, is a new kind of telescope, currently under construction in Chile, designed to conduct a ten-year survey of the dynamic universe. LSST can map the entire visible sky in just a few nights, and images will be immediately analyzed to identify objects that have changed or moved: from exploding supernovae on the other side of the Universe to asteroids that might impact the Earth. The Square Kilometre Array, SKA, is a massive, international, multiple-radio-telescope project that will provide the highest-resolution images in all of astronomy. Both projects represent challenging data acquisition and analysis problems, integrating workflows, scientific codes, and advanced data centers. The second part, focused on software aspects, will examine how Swift might interact with other open source projects, such as the Apache stack. Interested students should have an interest in high-performance computing, big data computing, and/or distributed computing. They should be proficient in a Linux/Unix software development environment and skilled in the C language. Optional but desirable skills include Java, ANTLR/bison/yacc/lex, sockets, and/or MPI.
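
To illustrate the "steps expressed as a workflow" idea (in plain Python rather than Swift; the step names are made up), a workflow is essentially a dependency graph whose steps run once their predecessors finish:

    # Illustration only: a workflow as named steps with dependencies,
    # executed in topological order (graphlib requires Python 3.9+).
    from graphlib import TopologicalSorter

    # step -> the steps it depends on (all names hypothetical)
    steps = {
        "calibrate_images": [],
        "detect_objects":   ["calibrate_images"],
        "classify_objects": ["detect_objects"],
        "issue_alerts":     ["classify_objects"],
    }

    def run(step):
        print("running", step)  # stand-in for invoking the real task

    for step in TopologicalSorter(steps).static_order():
        run(step)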

NCSA SPIN mentor Vlad Kindratenko

Help investigate and deploy cloud computing and storage technologies. This project will involve deploying a cloud environment and experimenting with various components and add-ons to enable a flexible and scalable framework for executing complex data-intensive workflows.

NCSA SPIN mentor Andriy Kot

Would you like to help with a data analysis project? The Blue Waters applications group has a big data challenge. Not all data can reside in a database; this is especially true for the system monitoring data for Blue Waters. We have a large number of temporal records stored in a collection of relatively big files, around 50 GB each. The records are variable in length and are not sorted, and each record contains multiple data points. We would like to extract a subset of data points from a subset of records, then perform some basic statistical operations such as sum, min, max, and mean. Doing this in a straightforward way requires a lot of computing time, and looking at the same files repeatedly (e.g., for a different set of data points from the same records) requires the same amount of computing time every time. The proposed solution is to index the data files, either by preprocessing or on demand, and to store the indexes in a database to accelerate all subsequent queries. The interested SPIN student should have some understanding of file I/O (an understanding of parallel file I/O would be a plus) and some relational database experience.
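
A minimal sketch of the proposed idea, assuming for illustration that records are newline-delimited and begin with a comma-separated record ID (the real format differs): scan each big file once, store byte offsets in SQLite, then seek directly on later queries instead of rescanning.

    # Build a byte-offset index once; afterwards, fetch records by seeking.
    import sqlite3

    def build_index(data_path, db_path):
        db = sqlite3.connect(db_path)
        db.execute("CREATE TABLE IF NOT EXISTS idx "
                   "(record_id TEXT PRIMARY KEY, offset INTEGER)")
        with open(data_path, "rb") as f:
            offset = f.tell()
            for line in iter(f.readline, b""):
                record_id = line.split(b",", 1)[0].decode()
                db.execute("INSERT OR REPLACE INTO idx VALUES (?, ?)",
                           (record_id, offset))
                offset = f.tell()
        db.commit()
        return db

    def fetch_record(db, data_path, record_id):
        (offset,) = db.execute("SELECT offset FROM idx WHERE record_id = ?",
                               (record_id,)).fetchone()
        with open(data_path, "rb") as f:
            f.seek(offset)
            return f.readline()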

NCSA SPIN mentor Bertram Ludaescher

Dedekind numbers D(n), for n = 0, 1, 2, ..., form a rapidly growing sequence of integers: 2, 3, 6, 20, 168, 7581, 7828354, 2414682040998, 56130437228687557907788. D(n) counts the number of monotonic Boolean functions of n variables. D(8) is the largest Dedekind number known so far; it was first computed in 1991 by Doug Wiedemann, a calculation that took 200 hours on a Cray-2. In his thesis, Arjen Teijo Zijlstra explains Wiedemann's strategy, implements it in C/C++, and parallelizes it using the Message Passing Interface (MPI). The goal was to gather knowledge about the theory and to check the calculation; another intention of the thesis was to speed up the calculation as much as possible. The goal of this SPIN project is to study and benchmark Zijlstra's implementation, and to try to improve on the techniques presented there. This also provides an interesting challenge to experiment with different parallelization strategies and to study applications: e.g., the NSF-funded Euler project uses model-based diagnosis techniques to debug taxonomy alignments. Thus, the project is interesting both from a "pure HPC" technology point of view and from an applications point of view.
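
As a warm-up, the first few terms can be checked by brute force: a Boolean function is monotonic if raising any input from 0 to 1 never lowers the output from 1 to 0. This only works for very small n, since there are 2^(2^n) functions to test:

    # Brute-force count of monotone Boolean functions of n variables.
    from itertools import product

    def dedekind(n):
        points = list(product([0, 1], repeat=n))          # all 2^n inputs
        count = 0
        for bits in product([0, 1], repeat=len(points)):  # all 2^(2^n) functions
            f = dict(zip(points, bits))
            count += all(f[x] <= f[y]
                         for x in points for y in points
                         if all(a <= b for a, b in zip(x, y)))
        return count

    print([dedekind(n) for n in range(4)])  # [2, 3, 6, 20]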

What does it take to reproduce a script-based scientific workflow?

For example, if the Python or R scripts implementing a workflow are available through an open source repository such as GitHub, are we all set? Not so fast! A user might fail to successfully run the scripts or replicate the results for any of a number of reasons (for starters, the installation may fail due to complex software and version dependencies; or the user may fail to properly run, adapt, or understand the scripts due to a lack of documentation, etc.).

In this project we will experiment with a number of technologies and tools that can improve the reproducibility of script-based workflows. For example, the YesWorkflow (YW) toolkit allows authors to annotate scripts to model and export prospective provenance, i.e., the workflow structure otherwise latent in the script. YW can also be used to reconstruct retrospective provenance or to query other sources of provenance information, e.g., runtime provenance logged directly by the script author or recorded by the DataONE MATLAB tool, the NCEAS recordr tool, or the noWorkflow system (for capturing Python execution provenance). To manage platform and software dependencies of script-based workflows, Docker containers can be used. Last but not least, active elements can be embedded in PDF files to support interactive exploration of published results.
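
For a flavor of the YW approach, annotations live in ordinary comments, so the script still runs unchanged. A minimal sketch of an annotated Python step (block, port, and file names are illustrative):

    # @begin clean_data
    # @in raw_csv @uri file:data/raw.csv
    # @out clean_csv @uri file:data/clean.csv
    import csv

    with open("data/raw.csv") as src, \
         open("data/clean.csv", "w", newline="") as dst:
        rows = (row for row in csv.reader(src) if all(row))  # drop incomplete rows
        csv.writer(dst).writerows(rows)
    # @end clean_data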

Using one or more example scripts, we will apply these different technologies and study their benefits and limitations. The overall goal is to deploy a prototypical example of a “highly reproducible” script-based workflow using a combination of the above-mentioned technologies.

skills desired: experience with SQL databases and scripting languages (e.g., Python, Bash, tcsh, or similar); modern software development tools and practices (e.g., version control with Git or Mercurial, test-driven and agile development, software deployment via Docker containers) a plus

Contact Bertram Ludaescher

NCSA SPIN mentor Michael Miller

This project researches frameworks and workflows for speech-to-text recognition in order to facilitate live auto-captioning and the creation of standard caption files for use in live events and video editing. It utilizes and enhances speech-to-text HPC/cloud services and seeks to advance the state of the art in speech-to-text recognition.

NCSA SPIN mentor Luc Paquette

The C-STEPS sketching tools, developed by Professor Emma Mercier and her team in the College of Education, allow students to work together to solve problems presented to them on tablet computers. Students interact with the tablets using a stylus or their fingers to write and draw on a digital worksheet as they collaborate to solve complex problems. Every input entered by a student on their own tablet is automatically synchronized to the tablets of the other members of the group, allowing the students to work together to solve problems. As they interact with the tablet, C-STEPS collects a complete log of the students' actions in the software, providing a detailed trace of the students' behavior as they solve problems in C-STEPS.

In this project, the selected SPIN intern will apply machine learning approaches to analyze interaction logs collected from engineering undergraduate students who used C-STEPS as part of their regular curriculum. The goal of those analyses will be to discover common behavior patterns used by groups of students in C-STEPS and to study how those patterns relate to good or bad collaborative learning practices. The results of those analyses will be used to provide in-the-moment, actionable reports to teaching assistants so that they can better support students during their learning activities.
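
A hedged sketch of one simple starting point (the action names, features, and cluster count are all illustrative, not the project's chosen method): featurize each group's log as counts of action types, then cluster the groups to surface recurring behavior profiles.

    # Cluster groups by their action-count profiles with scikit-learn.
    from collections import Counter
    from sklearn.cluster import KMeans

    ACTIONS = ["draw", "erase", "write", "undo"]

    def featurize(log):
        """log: list of action names recorded for one group."""
        counts = Counter(log)
        return [counts[a] for a in ACTIONS]

    logs = [["draw", "draw", "write"], ["erase", "undo", "undo"],
            ["draw", "write", "write"], ["undo", "erase", "erase"]]
    X = [featurize(log) for log in logs]
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)
    print(labels)  # groups with similar profiles share a cluster label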

NCSA SPIN mentor Ron Payne

These web-based financial tools are used by the 19 academic partners that make up the XSEDE project to maintain the project budget and to track the monthly invoices submitted by all partners. The enhancements needed include additional functionality and the implementation of a structured reporting system within the tools. Candidates should have proficiency in web development (preferably PHP), JavaScript, and SQL databases.

In addition to the technical skills, a successful candidate should possess a sense of accountability, reliability, and an eagerness to participate in a nationwide NSF-funded project.

NCSA SPIN mentor Andre Schleife

Our goal is to explore and enable physics-based ray tracing and various virtual-reality techniques for use in materials science research and education. Applications include visualization and analysis of scientific data and interactive manipulation of atomic structures. Google Cardboard, HTC Vive, and Oculus Rift are devices that we would like to explore for this purpose. We are developing native Android apps, a JavaScript-based web project, and pre-rendered images and videos to visualize atomic structures and electron densities. You will work in an interdisciplinary team, jointly with other SPIN students and with materials science graduate students, towards this goal. If you are interested, you can also participate in outreach events where people across all age groups will interact with materials and learn about materials science through the software developed in this project.

NCSA SPIN mentor Michael Showerman

This project will involve the development of a mobile application that interfaces with a variety of Blue Waters information services and integrates a customizable alert system. It will involve software development of an Android and iOS app, extending an existing prototype. Prior C++ or Java programming experience is greatly preferred, but mobile programming can be learned as part of this project.

NCSA SPIN mentor Rob Sisneros

The Hadoop MapReduce software framework is commonly used for processing "big data," and as such we are evaluating its potential for success on modern HPC equipment through deployment on Blue Waters. While there has been some work on creating visualization algorithms in that framework, there is little overlap with the state of the art for visualizing large-scale HPC simulation data. We would like to explore the design of a software volume renderer, implemented in this framework, that we can stand up quickly and improve incrementally and modularly. The goal of this project is to create a useful and releasable software package with a low barrier to entry for future contributing developers.
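
One plausible (and heavily simplified) decomposition, sketched in Hadoop Streaming style with Python; the record format and field layout here are hypothetical, not a design decision of the project. Mappers would sample the volume and emit partial colors keyed by pixel; a reducer then depth-sorts each pixel's samples and composites them front to back:

    # Reducer sketch: stdin lines look like "pixel_id<TAB>depth,r,g,b,a".
    import sys
    from collections import defaultdict

    def composite(samples):
        """Front-to-back 'over' compositing of (depth, rgba) samples."""
        color, alpha = [0.0, 0.0, 0.0], 0.0
        for _, (r, g, b, a) in sorted(samples):
            for i, c in enumerate((r, g, b)):
                color[i] += (1.0 - alpha) * a * c
            alpha += (1.0 - alpha) * a
        return color, alpha

    pixels = defaultdict(list)
    for line in sys.stdin:
        pixel, values = line.split("\t")
        depth, r, g, b, a = map(float, values.split(","))
        pixels[pixel].append((depth, (r, g, b, a)))

    for pixel, samples in pixels.items():
        color, alpha = composite(samples)
        print(pixel, color, alpha)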

NCSA SPIN mentor Jeff Terstriep

The CyberGIS Center is dedicated to leading-edge research combining big spatial data and high-performance computing. The center has several ongoing projects in the areas of flood mapping, climate change analysis, disaster response planning, and processing of high-resolution remote sensing data. Projects require development in a variety of technical areas, ranging from web frontend development and visualization, to the creation of modern microservice applications, to parallel algorithm development for spatial data.

The Center is looking for highly motivated undergraduate students to participate in these projects. Students will have an opportunity to work on a variety of projects as part of a team with graduate students and CyberGIS Center staff. Students will not only participate in leading-edge projects but also gain skills and experience with the latest software tools and development techniques.

NCSA SPIN mentor Sever Tipei

"Composition as an Evolving Entity" envisions a musical work in continuous transformation, never reaching an equilibrium, a complex structure whose components permanently fluctuate and adjust to global changes. The project is based on the view of musical works as Complex Dynamic Systems whose structural levels can be represented as vertices of Directed Graphs connected by weighted edges that describe the relationships between them.

Concepts provided by Information Theory, such as Originality (improbability), Redundancy (repetition/familiarity), and Complexity, are used to modify the weights assigned to various edges. The project also builds on software for composition and sound synthesis developed at Illinois, running on NCSA Innovative Systems Laboratory computers, which will be developed further.
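
A loose sketch of the underlying data structure (illustrative Python, not the project's actual software): structural levels as vertices of a weighted directed graph, with an update rule that nudges an edge's weight down as material becomes redundant and up when it is improbable.

    # Structural levels as a weighted directed graph (names invented).
    graph = {  # parent level -> {child level: weight}
        "piece":     {"section_A": 0.7, "section_B": 0.3},
        "section_A": {"phrase_1": 0.5, "phrase_2": 0.5},
    }

    def reweight(graph, src, dst, redundancy, rate=0.1):
        """redundancy in [0, 1]; 0.5 is neutral, higher lowers the weight."""
        graph[src][dst] *= 1.0 + rate * (0.5 - redundancy)

    reweight(graph, "piece", "section_A", redundancy=0.9)  # familiar material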