About SPIN

NCSA has a history of encouraging and nurturing innovative concepts. Some of the best ideas have come from highly motivated, creative undergraduate students working on NCSA challenges, or pursuing their own ideas with inspiration from colleagues and mentors. If you are looking for a unique opportunity to explore your potential, apply for a SPIN (Students Pushing Innovation) internship.

How to Apply

A call for applications and detailed instructions will be available in Fall 2014.

Eligibility

Internships are open to all University of Illinois at Urbana-Champaign undergraduate students enrolled for the 2014-2015 academic year.

Summary of Important Dates

Open House

Friday, August 28 — 3-5 p.m.

SPIN Open House
Calling all University of Illinois undergraduate students! Visit NCSA between 3 p.m. and 5 p.m. to learn more about paid internship and hands-on research opportunities at the center! No advance registration required.

Browse Research Interests of 2013 Mentors and Past SPIN Interns

2013-2014 Mentors

The Public Affairs team at NCSA uses graphic design, web development, social media, video and animation, and other tools to tell stories about the center's research, technologies, science successes, and staff. Current projects / focus areas to which students could contribute include:

  • Adaptive website development (Skills: Web development, CSS, JavaScript, browser compatibility, Content Management Systems, usability and accessibility testing)
  • Event planning & promotion, such as Petascale Day & Supercomputing Conference (Skills: Promotions, social media, public speaking/tours, graphic design)
  • Augmented reality content
  • Infographics & explanatory videos or animations

We are also open to ideas from students about how to promote NCSA and its accomplishments to key audiences.

Help make the world better by developing innovative solutions in distributed cyberinfrastructure.

Scientists, engineers, social scientists, and humanists around the world—many of them at colleges and universities—use advanced digital resources and services every day. Things like supercomputers, collections of data, and new tools are critical to the success of those researchers, who use them to make our lives healthier, safer, and better. The Extreme Science and Engineering Discovery Environment (XSEDE) integrates these resources and services, makes them easier to use, and helps more people use them. XSEDE supports 16 supercomputers and high-end visualization and data analysis resources across the country.

Among the advantages of these digital services is seamless integration with the National Science Foundation's high-performance computing and data resources. XSEDE's suite of advanced digital services connects with other high-end facilities and campus-based resources and serves as the foundation for a national cyberinfrastructure ecosystem. Common authentication and trust mechanisms, a global namespace and filesystems, remote job submission and monitoring, and file transfer services are examples of XSEDE's services. XSEDE's standards-based architecture allows open development of future digital services and enhancements, and the evolution and enhancement of these services provide opportunities for student engagement.

We are actively looking for innovative ideas including:

  • Measuring and monitoring digital services, such as networking and data transfer, in a distributed cyberinfrastructure environment.
  • Creating a central dashboard/console that integrates service-monitoring information and provides notifications (a minimal monitoring sketch follows this list).
  • Innovative solutions for resource management, job scheduling, and queue/runtime prediction in distributed cyberinfrastructure.
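
As a rough illustration of the first two ideas, the sketch below times a test download and flags low throughput; a real monitoring service would record such samples and feed them to a dashboard that sends notifications. The endpoint URL and the threshold are placeholders, not actual XSEDE services.

    # Minimal throughput check that a monitoring dashboard could poll.
    # The test URL and alert threshold below are hypothetical placeholders.
    import time
    import urllib.request

    TEST_URL = "https://example.org/testfile.bin"
    MIN_MBPS = 100.0

    def measure_throughput(url):
        """Download the test file and return throughput in megabits per second."""
        start = time.time()
        with urllib.request.urlopen(url) as response:
            nbytes = len(response.read())
        elapsed = time.time() - start
        return (nbytes * 8 / 1e6) / elapsed

    if __name__ == "__main__":
        mbps = measure_throughput(TEST_URL)
        status = "OK" if mbps >= MIN_MBPS else "ALERT: below threshold"
        print(time.ctime(), TEST_URL, round(mbps, 1), "Mb/s", status)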

We are also looking for students interested in Apache Hadoop and the Campus Cluster.

What is Apache Hadoop? Apache Hadoop is an open source software project that enables the distributed processing of large data sets across clusters of commodity servers. It is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance. Rather than relying on high-end hardware, these clusters achieve resiliency through the software's ability to detect and handle failures at the application level.
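
To make the programming model concrete, here is a minimal word-count example written for Hadoop Streaming, which lets the mapper and reducer be plain scripts that read standard input and write standard output. This is only an illustrative sketch; file names and the submission command will vary with the Hadoop installation.

    #!/usr/bin/env python
    # mapper.py: emit "word<TAB>1" for every word read from standard input.
    import sys

    for line in sys.stdin:
        for word in line.split():
            print(word + "\t1")

The matching reducer sums the counts for each word:

    #!/usr/bin/env python
    # reducer.py: sum the counts for each word; Hadoop delivers the mapper
    # output sorted by key, so equal words arrive on consecutive lines.
    import sys

    current_word = None
    current_count = 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t", 1)
        if word != current_word:
            if current_word is not None:
                print(current_word + "\t" + str(current_count))
            current_word = word
            current_count = 0
        current_count += int(count)
    if current_word is not None:
        print(current_word + "\t" + str(current_count))

The two scripts are submitted roughly as "hadoop jar hadoop-streaming.jar -mapper mapper.py -reducer reducer.py -input <input dir> -output <output dir>" (the exact jar path depends on the installation); Hadoop handles splitting the input, scheduling tasks across the cluster, and retrying failed tasks.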

What is the Illinois Campus Cluster? The Illinois Campus Cluster is a campus-wide resource that meets faculty needs for research computing cycles.

Project Serengeti is an open source project initiated by VMware to automate the deployment and management of Apache Hadoop clusters in virtualized environments such as vSphere. Project Serengeti can run multiple Apache Hadoop distributions from several vendors.

The objective of this SPIN project would be to get the open source virtual machine version of Hadoop running on the Illinois Campus Cluster.

If you are an undergraduate student who is a self-starter and an excellent programmer, has innovative ideas, and is comfortable working on the bleeding edge of supercomputing, submit your SPIN application and become part of the XSEDE team.

One area of focus within the Visual Analytics Group is the development of information design techniques that facilitate the interpretation of complex, big data. This work encompasses the design of information graphics and user-interaction processes (software interface design) that present descriptive data and machine learning results in a form that makes sense to the analyst. These methods enable analysts to ask questions of the data, form hypotheses, and make decisions.

We are looking for self-motivated students with any of the skills and experiences listed below. Students can propose specific projects or articulate their skills and interests relevant to these areas:

  • Graphic designers interested in information graphics, visualization and user interface design
  • Programmers interested in visualization and user interface coding
  • Geneticists/biologists/kinesiologists/pre-med students interested in exploring methods for presenting medical and self-tracking health information
  • Agronomists/agricultural engineering students interested in exploring innovative methods for presenting precision-farming data

The Advanced Visualization Laboratory (AVL) at NCSA works with scientists to develop and use cyberinfrastructure to create high-end, high-fidelity, high-resolution, data-driven scientific visualizations. These visualizations are technically developed and designed to support scientific narratives for public outreach. An important goal is to communicate with and inspire non-expert audiences.

NCSA's AVL team has thrilled millions of people with its visualizations. AVL's credits include two IMAX films, "Hubble 3D" in 2010 and the Oscar-nominated "Cosmic Voyage" in 1996, as well as the Terrence Malick feature film "The Tree of Life." AVL's high-definition television productions include the PBS NOVA episodes "Hunt for the Supertwister," "Runaway Universe," and "The Monster of the Milky Way"; Discovery Channel documentaries; and pieces for CNN and NBC Nightly News. The team has also worked with New York's American Museum of Natural History to produce high-resolution visualizations for shows at the museum's Hayden Planetarium: "Passport to the Universe" and "The Search for Life: Are We Alone?" In February 2006, AVL debuted "Black Holes: The Other Side of Infinity" at the Denver Museum of Nature & Science; the show has since toured worldwide and been translated into several languages. In 2010, AVL created a new scene for "Life: A Cosmic Story," which is playing at the Morrison Planetarium at the California Academy of Sciences in San Francisco, the world's largest all-digital planetarium, and the team is at work creating scenes for an upcoming grand-opening show at the Adler Planetarium in Chicago, the nation's oldest planetarium.

In creating these scientific productions, the AVL team develops new cyber-technologies, including advanced visualization tools and software pipelines. Each of the team members plays a unique role, contributing a variety of skills to the process and productions. AVL's expertise includes advanced graphics and visualization software development, visual design, camera choreography, multimedia production, data management, and render wrangling. The AVL team also has developed software such as Virtual Director™, a patented virtual reality interface that enables gestural motion capture and voice control of navigation, editing, and recording.

AVL is looking to work with students on projects that include:

  • Innovative interactive visualization software
  • Remote collaborative software
  • Software for digital domes and other advanced display systems
  • Data transformation
  • Interactive arts performance software

The Technology Investigation Service (TIS) team at NCSA finds, investigates, and evaluates technologies useful for high-performance and grid computing.

Areas where students could contribute to TIS include:

  • Discovery and investigation of technologies to meet certain defined requirements
  • Testing and evaluation of discovered technologies to verify that they meet those requirements
  • We have evaluated or are evaluating technologies such as Globus Online, BitTorrent Sync, and Duo OTP (all the cool stuff).

If you are willing to find and test interesting new applications in a variety of high-performance computing environments (and learn more about how all this works), contact us.

PGDB is another project area where students can participate. PGDB is a parallel debugger, designed for debugging HPC applications on clusters at scale. It builds upon existing open-source technology such as Python, GDB, MRNet, and LaunchMON, and is already in use on clusters. We are currently focusing on two different areas:

  • Further development of debugger capabilities, including improved scalability. Ideas include scalable deployment of debug symbols and executables, and support for platforms such as Blue Waters. (Skills: Programming experience (C/C++ and/or Python preferred but not required), basic familiarity with HPC, familiarity with UNIX systems)

  • Testing and usability with HPC applications. We're looking for people to use PGDB and provide feedback, and to use it with different HPC applications. We're willing to help people get set up and show them how to submit and run HPC jobs. (Skills: Minimal programming background and an interest in HPC, but no experience needed)
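
For students curious about the underlying plumbing, the sketch below shows how a Python front end can drive a single GDB process through GDB's machine interface (MI). It is only an illustration of the kind of building block a parallel debugger composes across many nodes; it is not PGDB's own code or API, and the ./a.out binary is a placeholder.

    # Drive one GDB instance from Python via the GDB/MI protocol.
    # Illustrative only; PGDB manages many such back ends across a cluster.
    import subprocess

    gdb = subprocess.Popen(
        ["gdb", "--interpreter=mi2", "--quiet", "./a.out"],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)

    def read_until_prompt():
        """Collect GDB's output lines up to the next '(gdb)' prompt."""
        lines = []
        while True:
            line = gdb.stdout.readline()
            if not line or line.startswith("(gdb)"):
                break
            lines.append(line.rstrip())
        return lines

    read_until_prompt()  # discard GDB's startup output

    def run(command):
        """Send one MI command and return the response lines."""
        gdb.stdin.write(command + "\n")
        gdb.stdin.flush()
        return read_until_prompt()

    print(run("-break-insert main"))  # set a breakpoint at main
    print(run("-exec-run"))           # run the program to the breakpoint
    print(run("-gdb-exit"))           # shut GDB down cleanly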

Students can propose projects that fall under three research areas: gesture analysis, modeling creativity, and sonification and visualization of music performance.

"MovingStories" is an interdisciplinary project that brings dance, visual art, and computation together. Using signal processing and machine learning algorithms, and minimally invasive hardware, we seek to understand and classify full-body movement as a basis for developing systems for gestural human-computer interactions that will drive intuitive, yet artistic, visualizations and sonifications. While advancing the integration of artistic disciplines, information visualization, and computation, the project will provide a novel approach to cognitively enhanced human computer interaction founded on a deeper understanding of expressive, nuanced movement.

The aim of the "Modeling Creativity" project is to create generative models of human creativity, with a focus on the creativity of music composition. We use neural models of perception, memory, and cognition, and endow those models with a drive to understand musical structure, balanced by a drive to achieve novelty through a reinforcement learner. Tackling the project with a bottom-up approach, we deconstruct the compositional process to gain a better understanding of its key elements using a variety of machine learning strategies.

"Labyrinth/Room 35" is a collaborative project with New York-based performing and visual artists. Using real-time sonic and visual analytics, we will control and shape the audio and video processing of the performers. The collaboration will produce a multimedia performance that will be premiered in February, 2014.

The goal of the Blue Waters project is to create the most intense and productive sustained-performance computational and data-focused processing system in the world. Blue Waters is one of the most powerful computational systems in the world, and the most powerful in the U.S. academic environment. Blue Waters is uniquely balanced in terms of computational power, the number of general-purpose and accelerator compute nodes, the amount of memory, and internal system bandwidth, so that it can support international science teams working on problems of unprecedented scale and intensity.

At the same time, Blue Waters is the most intense data-processing system in the open science realm when bandwidth and capacity are taken into account. Blue Waters has 25 petabytes of usable online storage with >1 TB/s of sustained bandwidth, the largest aggregate memory of any system and over 300 petabytes of near-line storage.

Other aspects of the Blue Waters project include:

  • Working with science and engineering teams to make their applications more scalable and flexible. This includes using the GPU accelerators and general purpose x86 processors at very large scale.
  • Helping science and engineering teams with new modes of data and storage processing
  • Advancing the state of the art for large-scale systems management and use
  • Resiliency in the face of failure and degradation, including new ways of analyzing and reacting to failures
  • Advanced education and training opportunities for students from high school through postgraduate levels
  • Innovation in data storage and management

The Blue Waters project is open to all ideas for improving the system and the science and engineering applications that use it. The ideas below are examples of areas that may benefit from innovation.

  • Performance visualization of applications at extreme scale
    Blue Waters has dozens of large-scale, highly parallel applications, and we work with science teams to help improve the performance and time to solution for these codes. A variety of debugging and tuning tools are used, including special tools provided by Cray and other experts. However, it remains a challenge to understand subtle interactions among tens of thousands to hundreds of thousands of processes and to identify the few processes that behave slightly differently.

  • Exploring "Big Data" usage on Blue Waters
    Blue Waters is the most intense data-processing system in the open science realm when bandwidth and capacity are taken into account. Investigating how to use this capability to perform novel "Big Data" analysis may require porting and interfacing "big data" software tools and interfaces to the traditional highly parallel computing and storage environment. Examples may include making a MapReduce application run on a Lustre file system, getting good parallel performance for graph-oriented applications, and other areas of "non-traditional," data-focused HPC projects.

  • Using autotuning to optimize application code performance
    It is often a tedious process to adjust parameters, compiler options, runtime options, etc. to get the best performance, because of complex interactions among many software and hardware components. There are tools and libraries that use autotuning (automatically experimenting with parameter settings) to find the best-performing configuration. Libraries typically autotune at installation time rather than per application. For applications, available autotuning tools search over compiler options but, for the most part, are designed for small-scale clusters. (A minimal parameter-sweep sketch appears after this list.)

  • System reliability and resiliency
    Blue Waters has millions of components that all have to operate in their most efficient manner. Because Blue Waters runs highly parallel applications, unlike large farms of individual systems, Amdahl's law requires that all components deliver very consistent performance. So reliability and resiliency include not just explicit hard failures, but also soft error recovery, slowdowns, and variable performance. Another area is understanding the differences between hardware and software errors and automatically identifying the probable causes. While many worry about the number of hardware failures, there is evidence that software fails at least as often. Detecting, understanding, and correcting reliability and resiliency issues are necessary at petascale, will be indispensable at exascale, and will take a great deal of innovation.

  • Data hierarchy analysis
    Due to high latency in data access, more storage hierarchies are being used, from multiple levels of SSD and online storage to near-line (automated tape) storage. The goal is to make these hierarchies transparent to the science user while providing "the right data to the right place at the right time." Interfacing independent layers of storage requires development and integration, but to guide those efforts, good system and use-case modeling could make important contributions and possibly expose opportunities for major innovations in the implementations.

  • Cyber-protection at petascale
    To make the most productive use of Blue Waters, it is important that all users have the most open access to the system possible. On the other hand, preventing incidents and compromises is critical. Doing both at petascale leaves opportunities for innovation.

  • HPC application performance modeling
    Working with science teams, Blue Waters has shown some of the benefits of combining application and system modeling methods to help identify bottlenecks and, more importantly, to guide improvement efforts to the areas with the greatest impact. Modeling efforts can be analytical, automatic roofline, semi-empirical, etc. Trying to model an application can be a great way to understand both applications and large-scale systems. Modeling data-focused applications is a very open area for innovation.

  • System quality assurance
    Petascale systems border on chaos, where one small change can induce a great, unexpected decrease in the quality of service. So continual system health and regression testing is critical to make sure performance and capability stay intact. But such data gathering cannot impinge on the usability or performance experienced by the application teams. What data are the most important indicators of a healthy system, and how often those data need to be measured, are questions waiting for innovative approaches.

  • Resource management
    One of the largest ongoing challenges for the Blue Waters project is to schedule the right tasks at the right time to keep the science and engineering teams working effectively and the entire system used efficiently. This includes not just running jobs, but also placing their tasks in the most advantageous topology available. Of course, perfect placement is not feasible, so the challenge is to determine the best practicable balance of mapping resources to applications while optimizing overall system effectiveness and throughput, all at petascale. Add to the mix the need for consistent performance and the fact that some allocations are I/O-constrained. Talk about needing innovative ideas!

  • Feature extraction techniques for petascale
    Blue Waters supports a very diverse range of science projects. Some science and engineering tasks require understanding and insight from petabytes of data—too much to rely strictly on a person's ability to extract the important features. Automated and/or human-guided tools may assist and speed the time to recognition.

  • 3D NPCF facilities modeling
    The National Petascale Computing Facility (NPCF) is a world-class computational and data storage facility unlike any other in the United States. NPCF has extensive monitoring of all utility use and environmental parameters. It uses extensive water cooling and minimal air cooling, has efficient power distribution, and holds a LEED Gold rating. Combining facility information with computer and storage usage information would enable facility optimizations that could save hundreds of thousands of dollars per year. Being able to model not just air flows but water flows in detail would help with the placement and configuration of new equipment and could improve facility operations. There may also be opportunities to use self-organizing remote sensing technologies.
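
As a small-scale illustration of the autotuning idea mentioned above, the sketch below brute-forces a handful of compiler-flag combinations for a hypothetical benchmark (bench.c) and reports the fastest; real autotuners search far larger spaces and must cope with run-to-run variability at scale.

    # Naive autotuning: try each flag combination, time the resulting binary,
    # and report the fastest. bench.c and the flag sets are placeholders.
    import itertools
    import subprocess
    import time

    SOURCE = "bench.c"
    FLAG_CHOICES = [
        ["-O2", "-O3"],
        ["", "-funroll-loops"],
        ["", "-march=native"],
    ]

    best = None
    for combo in itertools.product(*FLAG_CHOICES):
        flags = [f for f in combo if f]
        subprocess.run(["gcc", *flags, "-o", "bench", SOURCE], check=True)
        start = time.time()
        subprocess.run(["./bench"], check=True)
        elapsed = time.time() - start
        print(" ".join(flags), "->", round(elapsed, 3), "seconds")
        if best is None or elapsed < best[1]:
            best = (flags, elapsed)

    print("fastest configuration:", " ".join(best[0]))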

The Image and Spatial Data Analysis Division conducts research and development in general-purpose data cyberinfrastructure, addressing specifically the growing need to make use of large collections of non-universally accessible, or individually managed, data and software (i.e., executable data). We attempt to address these needs through a common suite of internally and externally created open source tools and platforms that provide automated and assisted curation for data and software collections. To acquire some of the needed high-level metadata not provided with un-curated data, we make heavy use of techniques founded in artificial intelligence, machine learning, computer vision, and natural language processing. To close the gap between the state of the art in these fields and current needs, while also providing the sense of oversight many of our domain users desire, we keep the human in the loop wherever possible by incorporating elements of social curation, crowdsourcing, and error analysis. Given the ever-growing urgency to gain benefit from the deluge of un-curated data, we push for the adoption of solutions derived from these relatively young fields, highlighting the value of having tools to deal with this data where there would otherwise be nothing. Attempting to follow in the footsteps of NCSA's great cyberinfrastructure successes (i.e., Mosaic, httpd, and Telnet), we address these scientific and industrial needs in a manner that is also applicable to the general public. By catering to broad appeal rather than focusing on a niche of the total possible users, we aim to stimulate uptake and provide a life for our software solutions beyond funded project deliverables. Potential project areas students might consider include:

  • Using computer vision, artificial intelligence, or machine learning to automatically extract information from image, video, 3D, document, or audio collections for use as metadata (a minimal metadata-extraction sketch follows this list).
  • Developing novel tools to carry out content-based retrieval within image, video, 3D, document, and audio collections.
  • Developing tools as described above that incorporate elements that keep the human in the loop in order to improve accuracy (e.g., passive crowdsourcing).
  • Developing web-based interactive visualizations of high-dimensional data from large geospatial, health-informatics, and video datasets.
  • Developing services for data curation, data analytics, and data sharing of large scientific datasets.
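
As a minimal illustration of the first project area above, the sketch below pulls basic technical metadata and a crude color signature from each image in a folder using the Pillow imaging library; a real pipeline would produce far richer, learned features, and the folder path here is a placeholder.

    # Extract simple metadata and an average-color descriptor from images.
    # The "images" directory is a placeholder for an un-curated collection.
    import os
    from PIL import Image

    def describe_image(path):
        """Return a small metadata record for one image file."""
        with Image.open(path) as img:
            record = {"file": os.path.basename(path),
                      "format": img.format,
                      "size": img.size}
            pixels = list(img.convert("RGB").resize((64, 64)).getdata())
        # Average color as a very crude content descriptor.
        record["mean_rgb"] = tuple(sum(c) // len(pixels) for c in zip(*pixels))
        return record

    if __name__ == "__main__":
        folder = "images"
        for name in sorted(os.listdir(folder)):
            if name.lower().endswith((".jpg", ".jpeg", ".png")):
                print(describe_image(os.path.join(folder, name)))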

Participating Interns

Kirthi Banothu - Electrical and Computer Engineering

Project scope: Exploring the use of Apache Hadoop and the Campus Cluster
Mentor: Martin Biernat - Collaborative eScience

Lauren Blackburn - Graphic Design

Project scope: Exploring visualization techniques for analyzing precision farming data and medical information
Mentor: Colleen Bushell - Visual Analytics

Chelsey B. Coombs - Molecular and Cellular Biology

Project scope: Creating a middle school and high school engagement program for K-12 students who visit NCSA
Mentor: Trish Barker - Public Affairs

Shunhua Fu - Computer Science

Project scope: Developing a mobile port and GUIs for visualization and investigating the optimization of 4K compression and rendering
Mentor: Donna Cox - Advanced Visualization Laboratory

Shubham Gupta - Computer Engineering

Project scope: Working on integrating 3D gesture recognition into Virtual Director using Leap Motion
Mentor: Donna Cox - Advanced Visualization Laboratory

JunYoung Gwak - Computer Science

Project scope: Working on computer vision techniques to automatically detect and track movement in construction sites
Mentor: Kenton McHenry - Image and Spatial Data Analysis

Yong Won Hong - Statistics

Project scope: Studying motion detection and the meaning of motion for use in human-computer interaction
Mentor: Guy Garnett - Illinois Informatics Institute

Neville Jos - Computer Science and Physics

Project scope: Developing a mobile app to monitor activity on the Blue Waters supercomputer
Mentor: Bill Kramer - Blue Waters

Kevin (Hyung Shin) Lee - Mathematics and Economics

Project scope: Studying the system reliability and resiliency of the Blue Waters supercomputer
Mentor: Bill Kramer - Blue Waters

Nathan Russell - Industrial and Enterprise Systems Engineering

Project scope: Working on data modeling and machine learning techniques applied to precision farming
Mentor: Colleen Bushell - Visual Analytics

Rashad Russell - Computer Science

Project scope: Exploring emerging technologies to push the limits of the web
Mentor: Luigi Marini - Image and Spatial Data Analysis Division

Stacie Sansone - Graphic Design

Project scope: Working on information design and interface concepts for communicating complex data
Mentor: Lisa Gatzke - Visual Analytics

Gil Shohet - Aerospace Engineering

Project scope: Working on data analytics for understanding big data, including analysis of GPS information
Mentor: Michael Welge - Visual Analytics

Colter Wehmeier - Architecture

Project scope: Developing new digital workspaces and presentation techniques by integrating methods from various virtual environments
Mentor: Donna Cox - Advanced Visualization Laboratory

Alexander Zahdeh - Computer Science

Project scope: Evaluating new technologies in high-performance computing environments
Mentor: Peter Enstrom - XSEDE

Continuing Interns from 2012

Nikoli Dryden - Computer Science

Project: A Parallelized GDB-based Debugger
Mentor: Peter Enstrom - XSEDE

Jonathan Kirby - Computer Science

Project: Logging and Synchronization in Virtual Director
Mentor: Donna Cox - Advanced Visualization Laboratory

David Zmick - Computer Science

Project scope: Working with GPU accelerators on Blue Waters, interactive visualizations with emerging input devices, and various other Private Sector Program projects
Mentor: Evan Burness - Private Sector Program

Contact us

Please don't hesitate to contact us if you have questions regarding the application process.
