Summer Mentors by Year:
Michael Miller

Exploring Quantum Sound and Music
This project will be more exploratory in nature, as the use of quantum concepts and algorithms to create sound and music is still an emerging area. We will look for papers, conferences, and other sources to identify current trends and examine which avenues to pursue in the future.
Kevin Chang

Living Encyclopedia
Large language models (LLMs) such as ChatGPT have changed the landscape of artificial intelligence and promise to automate how we perform knowledge work. This project will explore LLMs as the “engines” for building agents for such automation: e.g., to help you find the knowledge you need, to synthesize knowledge on a specific topic, to answer a technical question, or to tutor a student with a personalized learning experience. Techniques: large language models, natural language processing, information retrieval, data mining, machine learning.
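As one heavily simplified illustration of the information-retrieval piece of such an agent, the sketch below ranks documents by keyword overlap with a question and assembles a retrieval-augmented prompt for an LLM. The function names and the toy corpus are invented for illustration; a real system would use a learned retriever and an actual model call.

```python
# Toy retrieval-augmented prompting sketch; all names are illustrative.

def rank_documents(question: str, documents: list[str]) -> list[str]:
    """Return documents ordered by word overlap with the question."""
    q_words = set(question.lower().split())
    def overlap(doc: str) -> int:
        return len(q_words & set(doc.lower().split()))
    return sorted(documents, key=overlap, reverse=True)

def build_prompt(question: str, documents: list[str], top_k: int = 2) -> str:
    """Assemble a prompt from the top-ranked documents plus the question."""
    context = "\n".join(rank_documents(question, documents)[:top_k])
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

docs = [
    "Information retrieval ranks documents by relevance to a query.",
    "Markov chains model sequences of states.",
    "Data mining extracts patterns from large datasets.",
]
prompt = build_prompt("How does information retrieval rank documents?", docs)
print(prompt)
```

In a full agent, the returned prompt would be sent to the LLM, and the ranking step would be replaced by dense-vector or hybrid retrieval.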
April Novak

Native Combinatorial Geometry using Unstructured Meshes for Nuclear Energy Applications
The Monte Carlo method is widely considered the state-of-the-art technique for solving the neutron transport equation. However, statistical simulation of billions (or more) of neutrons is required to achieve converged simulations, and hence the Monte Carlo method remains computationally intensive. Faster-running multigroup methods instead discretize the energy phase space into tens or hundreds of energy intervals, a drastic approximation to the eight orders of magnitude in neutron energy observed in fusion and fission systems. However, these multigroup methods require the generation of cross sections, or spectrum-averaged reaction rates computed using nuclear data. The gold standard for this cross section generation is the Monte Carlo method. Therefore, a typical nuclear engineer faces a sizable inefficiency in computational analysis: (i) the need to generate computational geometry unique to a Monte Carlo code (combinatorial geometry) in order to generate constitutive models (cross sections), while at the same time (ii) generating computational geometry specific to a deterministic solver (unstructured meshes). In other words, nuclear engineers often expend 50% or more of their effort on model preparation across multiple different software tools in order to calculate the data needed for analysis. This project aims to eliminate this inefficiency by taking a novel approach: generating the combinatorial geometry in-line using geometry data structures in MOOSE’s reactor module, an open-source meshing library. The student will develop translation models from C++ geometry objects in MOOSE to the corresponding C++ combinatorial geometry objects in OpenMC to automate the “high-to-low” data generation needs of nuclear analysts. Experience in C++ and Python is encouraged, but no domain-specific knowledge of nuclear engineering is needed.
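The mesh-to-combinatorial-geometry idea can be illustrated with a heavily simplified, library-free sketch: an axis-aligned hexahedral mesh element is equivalent to the intersection of six half-spaces (planes). The MeshElement class and half-space encoding below are illustrative stand-ins, not the actual MOOSE or OpenMC data structures.

```python
# Illustrative mesh-element -> half-space translation; not real MOOSE/OpenMC code.

from dataclasses import dataclass

@dataclass
class MeshElement:
    """Axis-aligned box element: (xmin, xmax, ymin, ymax, zmin, zmax)."""
    bounds: tuple[float, float, float, float, float, float]

def to_halfspaces(elem: MeshElement) -> list[tuple[str, float, int]]:
    """Translate a box element into six (axis, plane_position, sense)
    half-spaces; sense +1 means 'above the plane', -1 'below'."""
    xmin, xmax, ymin, ymax, zmin, zmax = elem.bounds
    return [
        ("x", xmin, +1), ("x", xmax, -1),
        ("y", ymin, +1), ("y", ymax, -1),
        ("z", zmin, +1), ("z", zmax, -1),
    ]

def contains(halfspaces, point) -> bool:
    """A point lies in the cell iff it satisfies every half-space."""
    coords = dict(zip("xyz", point))
    return all(sense * (coords[axis] - pos) >= 0
               for axis, pos, sense in halfspaces)

cell = to_halfspaces(MeshElement((0, 1, 0, 1, 0, 1)))
print(contains(cell, (0.5, 0.5, 0.5)), contains(cell, (2, 0, 0)))  # True False
```

The real translation must handle arbitrary element shapes and curved surfaces, but the core pattern is the same: one mesh cell becomes one CSG region built from bounding surfaces.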
Kathryn Huff

Gaming-engine-enabled interactive nuclear reactor dynamics
Development of a Roblox-based nuclear reactor kinetics and dynamics simulation model has begun. With an eye toward usability and performance, as well as kinetic and dynamic accuracy, this project will explore numerical-method acceleration strategies in the context of nuclear reactor dynamics, advancing the nascent Roblox model toward scalability and real-time-responsive usability by a nuclear engineering novice.
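The equations such a simulation must integrate in real time are the standard point reactor kinetics equations; with one delayed-neutron group, dn/dt = ((ρ − β)/Λ)n + λC and dC/dt = (β/Λ)n − λC. The sketch below uses the simplest possible baseline, an explicit Euler step, which acceleration strategies would replace with implicit or exponential integrators; the parameter values are illustrative, not tied to any particular reactor.

```python
# One-delayed-group point kinetics with an explicit Euler step (baseline
# integrator; parameter values are illustrative only).

def step(n, C, rho, beta=0.0065, Lambda=1e-4, lam=0.08, dt=1e-5):
    """Advance neutron density n and precursor concentration C by dt."""
    dn = ((rho - beta) / Lambda) * n + lam * C
    dC = (beta / Lambda) * n - lam * C
    return n + dt * dn, C + dt * dC

# Sanity check: at rho = 0 with C at its equilibrium value
# beta*n/(Lambda*lam), both derivatives vanish and n should stay flat.
n, C = 1.0, 0.0065 * 1.0 / (1e-4 * 0.08)
for _ in range(1000):
    n, C = step(n, C, rho=0.0)
print(round(n, 6))  # stays near 1.0
```

The stiffness visible here (β/Λ is large) is exactly why naive explicit stepping is too slow for real-time use and why acceleration strategies matter for the Roblox model.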
Rachel Adler

Generative AI mHealth apps for older adults
The population of older adults is rising sharply across the world, especially in the United States, where it is expected to grow by 47%, from 58 million in 2022 to 82 million in 2050, rising from 17% of the population to nearly a quarter (23%). Moreover, older adults, particularly those with disabilities, are more susceptible to physical inactivity. Research suggests that health-related outcomes and quality of life can be improved by including physical activity interventions. This research project aims to create custom health applications that work on both mobile and web interfaces, featuring physical activity interventions delivered through a generative-AI-based conversational agent that also retrieves, processes, and personalizes knowledge from wearable fitness trackers. This conversational agent will be trained on specialized health knowledge and will personalize interactions. We are looking for students interested in contributing to the development of mobile health applications, training and fine-tuning generative AI models, and/or conducting usability testing.
Ismini Lourentzou

VLMs for Open Vocabulary SGG
Scene Graph Generation (SGG) involves extracting objects and their relationships from an image, forming a graph where nodes represent objects and edges represent relationships. Conventional SGG approaches are limited to a closed-set vocabulary, restricting their ability to detect and classify unseen objects and relationships. Subsequent works propose models for Open Vocabulary SGG (Ov-SGG), yet these often rely on predefined textual inputs, e.g., region-caption pairs or a set of semantic labels, which limits their ability to detect unseen objects and relations effectively. This project will involve designing new Vision and Language Models (VLMs) for Open Vocabulary SGG and comparing them against existing baselines.
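The graph representation described above can be sketched concretely: nodes carry object labels, edges carry (subject, predicate, object) relationship triples. In the open-vocabulary setting the labels are free-form strings produced by the model rather than indices into a fixed list, which the toy structure below reflects; the class and method names are illustrative only.

```python
# Toy scene-graph container; names are illustrative, not from any SGG codebase.

class SceneGraph:
    def __init__(self):
        self.objects = {}    # node id -> open-vocabulary label (free-form string)
        self.relations = []  # (subject id, predicate, object id)

    def add_object(self, obj_id: int, label: str):
        self.objects[obj_id] = label

    def add_relation(self, subj: int, predicate: str, obj: int):
        self.relations.append((subj, predicate, obj))

    def triples(self):
        """Yield human-readable (subject, predicate, object) triples."""
        for s, p, o in self.relations:
            yield (self.objects[s], p, self.objects[o])

g = SceneGraph()
g.add_object(0, "person")
g.add_object(1, "skateboard")  # label need not come from a closed set
g.add_relation(0, "riding", 1)
print(list(g.triples()))  # [('person', 'riding', 'skateboard')]
```

An Ov-SGG model's job is to populate such a structure directly from pixels, with both labels and predicates drawn from open vocabulary.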
T. Andrew Manning

Blast: A Web Application for Characterizing the Host Galaxies of Astrophysical Transients
Characterizing the host galaxies of astrophysical transients is important to many areas of astrophysics, including constraining the progenitor systems of core-collapse supernovae, correcting Type Ia supernova distances, and probabilistically classifying transients without photometric or spectroscopic data. Given the increasing transient discovery rate in the coming years, there is substantial utility in providing public, transparent, reproducible, and automatic characterization for large samples of transient host galaxies. We have developed a web application and workflow management system called Blast that ingests live streams of transient alerts, matches transients to their host galaxies, and performs photometry on coincident archival imaging data of the host galaxy. We are looking for a student interested in learning how to develop and deploy research applications and cyberinfrastructure within cloud-native platforms like OpenStack and Kubernetes. Development activities will include improving and optimizing the Blast system in preparation for the much larger volume of transients expected soon in the LSST (Legacy Survey of Space & Time) era: implementing resource limits and requests on task workers, designing an elastic horizontal scaling system to handle dynamic data loads, converting filesystem-based storage to S3-compatible object storage, optimizing workflow efficiency, and more. The student must have some familiarity with Python (Django), Linux, Git, and containerization. Although knowledge of astrophysics is optional, enthusiasm to support scientists by building research software is required.
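One of the listed tasks, implementing resource limits and requests on task workers, can be sketched as a Kubernetes container-spec fragment. The names and values below are hypothetical, not Blast's actual configuration.

```yaml
# Hypothetical fragment of a task-worker pod spec; names and values are
# illustrative only.
containers:
  - name: blast-task-worker      # placeholder container name
    image: blast-worker:latest   # placeholder image
    resources:
      requests:         # baseline the scheduler guarantees the worker
        cpu: "500m"
        memory: "512Mi"
      limits:           # hard cap; exceeding memory triggers an OOM-kill
        cpu: "1"
        memory: "1Gi"
```

An elastic horizontal scaling system would typically pair such requests with a HorizontalPodAutoscaler that adjusts the replica count against observed CPU or queue-depth metrics as the transient alert rate varies.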
Sever Tipei

Music on High-Performance Computers
The project centers on DISSCO, software for composition, sound design and music notation/printing developed at UIUC, NCSA and Argonne National Laboratory. Written in C++, it includes a Graphic User Interface using gtkmm. A parallel version was developed at the San Diego Supercomputer Center with support from XSEDE (Extreme Science and Engineering Discovery Environment).
DISSCO has a directed graph structure and uses stochastic distributions, sieves (part of Number Theory), Markov chains, and elements of Information Theory to produce musical compositions. Presently, efforts are directed toward adding new features, refining a system for music notation, and realizing an Evolving Entity: a composition whose aspects change when computed recursively over long periods of time, thus mirroring the way living organisms are transformed over time (Artificial Life).
Because DISSCO is a “black box” that does not allow the user to intervene during computation, and because the computer makes decisions not controlled by the user, it shares features with AI projects. Further developments are being considered in this area.
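As a toy sketch of one stochastic device mentioned above, the snippet below walks a first-order Markov chain over pitch classes. The transition weights are invented for illustration; DISSCO's own structures (written in C++) are far richer.

```python
# Toy first-order Markov chain over pitch classes; weights are invented.

import random

transitions = {
    "C": {"E": 0.5, "G": 0.5},
    "E": {"G": 0.7, "C": 0.3},
    "G": {"C": 0.6, "E": 0.4},
}

def generate(start: str, length: int, seed: int = 0) -> list[str]:
    """Walk the chain for `length` notes starting from `start`."""
    rng = random.Random(seed)  # seeded for reproducible output
    phrase = [start]
    for _ in range(length - 1):
        options = transitions[phrase[-1]]
        phrase.append(rng.choices(list(options),
                                  weights=list(options.values()))[0])
    return phrase

print(generate("C", 8))
```

Seeding the generator makes a run reproducible, which matters when a composition is meant to be recomputed and compared across long periods of time.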
Felipe Menanteau

Thumbnail Service for Dark Energy Survey Public Data
The objective of this NCSA Spin project is to develop and deploy a cutout thumbnail service for the Dark Energy Survey (DES) public dataset, hosted at the National Center for Supercomputing Applications (NCSA). As the NSF-funded Dark Energy Survey Data Management (DESDM) Project sunsets, preserving and enhancing access to this data for the broader scientific community has become essential. The proposed service would allow users to request thumbnail “cutouts” of specific regions within the DES images, thereby enabling efficient data access without the need to download entire image files. This functionality would benefit astronomers, educators, and the general public, who would be able to engage with high-quality DES data with minimal technical overhead. By building on an existing Python codebase, the project will focus on adapting this service to meet current standards of usability, scalability, and data accessibility. The deployment will leverage Kubernetes for container orchestration, ensuring scalability and reliability, with Celery as the job management system to handle asynchronous cutout requests. Kubernetes will enable flexible resource allocation, allowing the service to meet variable demands without compromising performance, while Celery will manage job queuing, parallelization, and load balancing. This setup will ensure that even large numbers of cutout requests are handled efficiently and effectively. The service will integrate seamlessly into the DES public data infrastructure hosted at NCSA, ensuring long-term data accessibility and alignment with community standards. By the end of the project, we aim to deliver a robust, user-friendly web service that preserves the DES dataset’s accessibility and enhances the ease with which it can be explored.
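The core operation behind the service can be sketched simply: slice a small stamp out of a larger image array around a pixel position, clipping at the image edges. In the real service the array would come from a DES image file and the center from an RA/Dec-to-pixel transform; here a synthetic NumPy array stands in for both, and the function name is illustrative.

```python
# Minimal cutout sketch; a synthetic array stands in for a DES image tile.

import numpy as np

def cutout(image: np.ndarray, x: int, y: int, size: int) -> np.ndarray:
    """Return a stamp of up to size x size pixels centered on (x, y),
    clipped to the image boundaries."""
    half = size // 2
    y0, y1 = max(0, y - half), min(image.shape[0], y + half + 1)
    x0, x1 = max(0, x - half), min(image.shape[1], x + half + 1)
    return image[y0:y1, x0:x1]

tile = np.arange(100).reshape(10, 10)  # stand-in for an image tile
stamp = cutout(tile, x=5, y=5, size=3)
print(stamp.shape)  # (3, 3)
```

In the deployed service, each such call would run inside a Celery task so that many requests can be queued and processed asynchronously.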
Halil Kilicoglu

Developing NLP-based resources for complementary medicine
Complementary medicine (CM) interventions (e.g., natural products, supplements, exercise, diet) are increasingly used by health care consumers and accepted by the medical community, although much remains poorly understood about their effectiveness and safety, as well as the underlying biological mechanisms through which they affect health and well-being. The published literature is a growing source of evidence on CM approaches; however, much of this evidence remains in unstructured text in specialty journals, creating barriers to effectively integrating the evidence with conventional medicine. This research project focuses on developing NLP-based literature mining tools and informatics resources (ontologies, knowledge graphs) to consolidate high-quality evidence on CM approaches and integrate it with conventional medicine in a machine-readable, AI-ready form, and aims to demonstrate the utility of these resources for knowledge management and scientific discovery in complementary and integrative health. We are looking for undergraduate students with strong Python programming skills, data cleaning/analysis experience, and knowledge of NLP and ML. Familiarity with graph databases such as Neo4j and with knowledge graphs is preferred.
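The literature-mining step can be illustrated in miniature: extract (intervention, relation, condition) triples from simple sentences and collect them as knowledge-graph edges. The regex and the three-verb relation vocabulary below are toy stand-ins for real NLP models, and the example sentences are invented.

```python
# Toy relation extraction into knowledge-graph triples; the pattern and
# relation vocabulary are illustrative stand-ins for real NLP models.

import re

PATTERN = re.compile(r"(\w[\w\s]*?)\s+(improves|reduces|treats)\s+([\w\s]+)")

def extract_triples(sentence: str):
    """Return [(intervention, relation, condition)] if the pattern matches."""
    m = PATTERN.search(sentence.lower().rstrip("."))
    return [(m.group(1).strip(), m.group(2), m.group(3).strip())] if m else []

kg = []  # edge list of a tiny knowledge graph
for s in ["Yoga improves sleep quality.",
          "Fish oil reduces inflammation."]:
    kg.extend(extract_triples(s))
print(kg)
```

In the project itself, triples like these would come from trained models and be loaded into a graph database such as Neo4j for querying across the literature.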
Kaiyu Guan, Shenlong Wang

Using Satellite Data for Large-Scale Crop Monitoring
The project will focus on developing advanced machine learning methods to interpret satellite remote sensing data and quantify crop growth and yield. The students will work on large-scale satellite data processing, remote sensing algorithm development, and solving real-world questions related to global food production and environmental sustainability. Students with experience in Python programming, big data analytics, deep learning, or computer vision are most welcome.
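A standard remote-sensing computation students would build on is NDVI (Normalized Difference Vegetation Index), computed per pixel from the red and near-infrared bands. The arrays below are synthetic stand-ins for real reflectance data.

```python
# NDVI from red and near-infrared reflectance; synthetic example arrays.

import numpy as np

def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """NDVI = (NIR - red) / (NIR + red), in [-1, 1]; higher over vegetation."""
    return (nir - red) / (nir + red + 1e-9)  # epsilon avoids divide-by-zero

nir = np.array([[0.5, 0.6], [0.4, 0.7]])  # synthetic reflectance values
red = np.array([[0.1, 0.1], [0.3, 0.05]])
print(np.round(ndvi(nir, red), 2))
```

Vegetation indices like this, computed over billions of pixels and many dates, are typical inputs to the machine learning models that map satellite data to crop growth and yield.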
Matthew Krafczyk

DRYML: Don’t Repeat Yourself Machine Learning
Don’t Repeat Yourself Machine Learning: a machine learning library to reduce code duplication, automate testing, perform hyperparameter searches, and ease model serialization. DRYML aims to empower the machine learning practitioner to spend less time writing boilerplate code and more time implementing new techniques. DRYML provides a model serialization framework along with serialization implementations for many common ML frameworks and model types, a framework for defining and training models on a specific problem, and a system to compare models from different ML frameworks on the same footing. The successful applicant will work to bring new ML workflows and frameworks under the DRYML umbrella and conduct cutting-edge research on a topic of their choice in ML or AI.
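One of the ideas named above, a serialization layer that gives models from different frameworks a common save/load interface, can be sketched with a small registry. The registry, decorator, and method names below are hypothetical illustrations of the pattern, not DRYML's actual API.

```python
# Hypothetical serializer registry illustrating a common save/load
# interface across frameworks; not DRYML's actual API.

import json

SERIALIZERS = {}

def register(framework: str):
    """Class decorator: map a framework name to its serializer class."""
    def wrap(cls):
        SERIALIZERS[framework] = cls
        return cls
    return wrap

@register("toy")
class ToySerializer:
    """Serializer for a 'model' that is just a plain dict."""
    @staticmethod
    def save(model: dict) -> str:
        return json.dumps(model)

    @staticmethod
    def load(blob: str) -> dict:
        return json.loads(blob)

model = {"framework": "toy", "weights": [0.1, 0.2]}
blob = SERIALIZERS[model["framework"]].save(model)
print(SERIALIZERS["toy"].load(blob) == model)  # True
```

Registering one serializer per supported framework is what lets calling code save, load, and compare models without caring which framework produced them.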