Spin – 2024 Summer Mentors

Encapsulated Quantum Optimization Algorithms

Bruno Abreu and Santiago Núñez-Corrales

Optimization problems offer a promising route for the broad adoption of quantum-inspired and quantum-approximate solutions that can shape and materialize quantum advantage beyond traditional physics- or chemistry-based applications. In this project, the SPIN intern will work with a variety of quantum computing frameworks, such as Qiskit, PennyLane, CUDA Quantum, and TKET, to create a repository of examples that illustrate the process of mapping classes of optimization problems into Hamiltonians amenable to variational approaches such as VQE and QAOA.
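As a framework-agnostic illustration of the mapping step, the sketch below (plain Python, not using any of the toolkits above) encodes MaxCut as the diagonal Ising cost Hamiltonian H = Σ_(i,j) (1 − Z_i Z_j)/2 and brute-forces its spectrum; a VQE or QAOA run in Qiskit or PennyLane would instead search for the extremal eigenstate variationally:

```python
from itertools import product

def maxcut_ising_energies(n, edges):
    """Diagonal of the MaxCut cost Hamiltonian H = sum_{(i,j)} (1 - Z_i Z_j)/2.

    Returns a dict mapping each spin configuration z in {+1,-1}^n to the
    number of cut edges, i.e. the eigenvalue of H on that basis state."""
    energies = {}
    for z in product([1, -1], repeat=n):
        energies[z] = sum((1 - z[i] * z[j]) / 2 for i, j in edges)
    return energies

# Triangle graph: the best cut separates one vertex from the other two,
# cutting 2 of the 3 edges.
E = maxcut_ising_energies(3, [(0, 1), (1, 2), (0, 2)])
best = max(E.values())  # -> 2.0
```

For larger instances this brute force is exponential, which is exactly why the variational algorithms named above are of interest.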

 

Prerequisites:

– Basic knowledge of quantum computing: quantum circuits and gates, simulations of quantum circuits

– Python programming skills

For any questions about the project, contact Bruno Abreu

CILogon: Identity and Access Management for Science

Jim Basney

The NCSA CILogon team provides identity and access management services to NCSA projects and the broader scientific community. The team members operate highly-available authentication and authorization services in the cloud and consult with cyberinfrastructure operators to configure the services to meet their unique needs.

 

Preferred experience for intern candidates:

– Good written and verbal communication skills

– Familiarity with HTTP (Hypertext Transfer Protocol)

– Ability to work with others

For any questions about the project, contact Jim Basney

Trusted CI: the NSF Cybersecurity Center of Excellence 

Jim Basney

The Trusted CI team at NCSA provides cybersecurity support for National Science Foundation projects and facilities. The team members meet with project/facility representatives to discuss their cybersecurity needs and produce cybersecurity documentation and training for the projects/facilities.

 

Preferred experience for intern candidates:

– Good written and verbal communication skills

– Familiarity with basic cybersecurity concepts

– Ability to work with others

For any questions about the project, contact Jim Basney

A Machine Learning and Geospatial Approach to Targeting Humanitarian Assistance Among Refugees in Lebanon

Angela Lyons and Aiman Soliman

An estimated 84 million persons are forcibly displaced worldwide, and at least 70% of these are living in conditions of extreme poverty. More efficient targeting mechanisms are needed to better identify vulnerable families who are most in need of humanitarian assistance. Traditional targeting models rely on a proxy means testing (PMT) approach, where support programs target refugee families whose estimated consumption falls below a certain threshold. Despite the method’s practicality, it provides limited insight, its predictions are often inaccurate, and this can undermine targeting effectiveness and fairness. Alternatively, multidimensional approaches to assessing poverty are now being applied to the refugee context. Yet they require extensive information that is often unavailable or costly. This project applies machine learning and geospatial methods to novel data collected from Syrian refugees in Lebanon to develop more effective and operationalizable targeting strategies that provide a reliable complement to current PMT and multidimensional methods. The insights from this project have important implications for humanitarian organizations seeking to improve current targeting mechanisms, especially given increasing poverty and displacement and limited humanitarian funding.
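For intuition, the PMT step described above (fit a proxy model of consumption on a survey sample, then target households predicted to fall below a threshold) can be sketched in a few lines. Everything here is synthetic; the three proxy features and the 20% threshold are illustrative placeholders, not the project's actual data or methodology:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))                 # observable proxies (e.g. assets, household size)
true_w = np.array([1.0, -0.5, 0.3])
consumption = X @ true_w + 5.0 + 0.1 * rng.normal(size=n)

# Fit the proxy model on a "survey" subsample, as PMT does.
A = np.c_[X[:100], np.ones(100)]            # design matrix with intercept
w, *_ = np.linalg.lstsq(A, consumption[:100], rcond=None)

# Predict consumption for all households and target those below a threshold.
pred = np.c_[X, np.ones(n)] @ w
threshold = np.quantile(pred, 0.2)          # target the bottom 20%
targeted = pred < threshold
```

The project's ML and geospatial extensions replace this linear proxy with richer models and additional data sources while keeping the same targeting logic.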

 

Preferred Skills:

– Background in data science and statistical modeling

– Programming languages: Python, R, and/or Stata

– Basic knowledge and skills in machine learning and/or geospatial analysis

– Expertise in creating mappings and other data visualizations

– Experience in programming and development of dashboards

For any questions about the project, contact Angela Lyons or Aiman Soliman

Improving Conserved to Primitive Variable Recovery in the Einstein Toolkit

Roland Haas

 

Computational fluid simulations in astrophysics face unique challenges not present in classical computational fluid simulations. Among these challenges is the need to maintain two copies of the state variables: conserved variables, which can be evolved in time using the evolution equations, and primitive variables, which are required to close the system of evolution equations via an equation of state. At each timestep, and for each point in space, the simulation code solves a non-linear equation to convert from one set to the other. This project aims to improve the treatment of primitive variable recovery in simulations using the Einstein Toolkit. Specifically, we will implement the method described in https://arxiv.org/abs/0704.2608 (section 3.6), de-averaging the conserved variables before primitive recovery, and study the effect this has on simulations.
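To make the recovery step concrete, here is a toy example: flat-space, one-dimensional special-relativistic hydrodynamics with an ideal-gas equation of state (c = 1), not the Einstein Toolkit's production solver. It recovers (rho, v, p) from the conserved (D, S, E) by a root-find on the pressure, the standard structure of conserved-to-primitive schemes:

```python
import numpy as np

def prim_to_cons(rho, v, p, gamma=5.0 / 3.0):
    """Forward map: primitive -> conserved, flat-space 1D SRHD (c = 1)."""
    eps = p / ((gamma - 1.0) * rho)        # ideal-gas specific internal energy
    h = 1.0 + eps + p / rho                # specific enthalpy
    W = 1.0 / np.sqrt(1.0 - v * v)         # Lorentz factor
    D = rho * W
    S = rho * h * W * W * v
    E = rho * h * W * W - p                # total energy density, E = tau + D
    return D, S, E

def cons_to_prim(D, S, E, gamma=5.0 / 3.0, tol=1e-12):
    """Recover (rho, v, p) via a bisection root-find on the pressure."""
    def residual(p):
        v = S / (E + p)
        W = 1.0 / np.sqrt(1.0 - v * v)
        rho = D / W
        h = (E + p) / (rho * W * W)
        eps = h - 1.0 - p / rho
        return (gamma - 1.0) * rho * eps - p   # EOS consistency condition
    lo, hi = 1e-15, 10.0 * (abs(E) + abs(S) + 1.0)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if residual(lo) * residual(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    p = 0.5 * (lo + hi)
    v = S / (E + p)
    return D * np.sqrt(1.0 - v * v), v, p
```

Production codes use Newton iteration and careful bracketing for speed and robustness, but the round trip is the same; the de-averaging step studied in this project changes which conserved values are fed into this solve.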

 

Required skills:

– Solid experience in C++ code development, beyond class projects

– Use of Linux operating system for daily work

– Numerical approximations class or knowledge about polynomial approximations

– Use of git version control system

– Some knowledge about partial differential equations may be useful

– Experience using numpy, scipy, matplotlib or other scientific plotting packages

For any questions about the project, contact Roland Haas

Co-designing a Prototype of an mHealth App with Peer Navigator Support for People with Disabilities

Rachel Adler

 

Adjusting to a newly acquired disability and transitioning from in-patient rehabilitation to community living poses numerous challenges for many individuals. Additionally, accessing in-person interventions and resources can be difficult. We are using a co-design approach, in collaboration with a team of peer navigators with a wide range of physical, cognitive, and sensory disabilities, to create a high-fidelity prototype that will be used to support the remote delivery of a peer support intervention to people with disabilities transitioning to community living. We will complete the prototype of the app and conduct usability testing on the prototype.

 

Required skills: 

– Figma prototyping 

– Good communication skills 

For any questions about the project, contact Rachel Adler

Parallelizing Workflows for Scramjet Simulations

Douglas Friedel

 

The Center for Exascale-enabled Scramjet Design (CEESD) is developing physics-faithful predictive simulations, enabled by advanced computer science methods and massive-scale computational resources, to advance scramjet designs that exploit advanced high-temperature composite materials. CEESD aims to expand access to space, strengthen defense capabilities, and accelerate global transport.

 

Part of this effort utilizes Parsl (an open-source parallel scripting library for Python that assists in programming and executing data-oriented workflows in parallel) and Globus Compute (a federated Function-as-a-Service (FaaS) platform that builds on Parsl to make remote computing easier) to parallelize and distribute the simulation workflow.
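The futures-based pattern that Parsl generalizes can be sketched with only the standard library. This stand-in uses concurrent.futures; Parsl's @python_app decorator exposes the same submit-and-gather model but can dispatch tasks to HPC schedulers and, through Globus Compute, to remote endpoints:

```python
from concurrent.futures import ThreadPoolExecutor

def simulate(case):
    """Stand-in for one simulation task in the workflow."""
    return case * case

with ThreadPoolExecutor(max_workers=4) as pool:
    # Submit all tasks; each submit() returns a future immediately,
    # so the eight tasks run concurrently.
    futures = [pool.submit(simulate, c) for c in range(8)]
    # Gather results in submission order.
    results = [f.result() for f in futures]
# results == [0, 1, 4, 9, 16, 25, 36, 49]
```

In a Parsl workflow the decorated functions would instead launch real simulation stages, and the data dependencies between futures define the parallel structure.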

 

 

Required Skills: Python programming

Preferred Skills: Experience with HPC systems

For any questions about the project, contact Douglas Friedel

HPC Workload Monitoring: Python and Application Tracking

Greg Bauer

 

The NCSA Delta project would like to improve its understanding of the work being run on its Delta HPC GPU resource. The projects and users run a variety of applications, ranging from traditional HPC codes that use C/C++/Fortran and external libraries to Python-based codes that use many packages, some of which are not provisioned in the deployed Anaconda Python installation. We are looking for someone who is interested in learning about HPC systems and how centers like NCSA use packages like XALT (https://xalt.readthedocs.io/en/latest/) and utilities like customs (https://github.com/NERSC/customs), and who would like to help enable this type of tracking on the Delta HPC GPU resource.
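As a rough illustration of the idea behind package tracking, the toy import hook below records which top-level packages a job tries to import. This is only a sketch of the signal such tools collect; the real XALT and customs utilities work at link/launch time and are far more robust:

```python
import sys

class ImportLogger:
    """Minimal sys.meta_path hook recording top-level package names."""
    def __init__(self):
        self.seen = set()

    def find_spec(self, name, path=None, target=None):
        # Called by the import machinery for every module not yet cached.
        self.seen.add(name.partition(".")[0])
        return None  # defer to the normal importers; we only observe

logger = ImportLogger()
sys.meta_path.insert(0, logger)

sys.modules.pop("colorsys", None)   # ensure the import below actually runs
import colorsys                     # stand-in for a workload's dependency

sys.meta_path.remove(logger)
# logger.seen now contains "colorsys"
```

Aggregated across jobs, records like this tell a center which packages its users actually depend on.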

 

 

Required Skills: Some Python programming

Preferred Skills: Some Experience with HPC systems

For any questions about the project, contact Greg Bauer

Visualizing Invisible Wireless Signals 

Elahe Soltanaghai

We are developing a mixed-reality app that visualizes the wireless signals between WiFi devices in homes and buildings, and their interaction with objects and people in the environment. Desired expertise: Blender, Unity, and familiarity with LiDAR and mesh reconstruction algorithms. If you’re interested in this project, please take a look at this paper.

For any questions about the project, contact Elahe Soltanaghai

Gaze-based Immersive Interactions

Elahe Soltanaghai

This project explores a new gaze interaction technique for virtual and augmented reality applications that integrates focal depth into gaze input dimensions, facilitating users to actively shift their focus along the depth dimension for interaction. Familiarity with Unity and some mixed reality programming experience is required for this project. If you’re interested in this project, please take a look at this paper.

For any questions about the project, contact Elahe Soltanaghai

Radar-based Environmental Sensing

Elahe Soltanaghai

This project explores the use of radar for sensing the under-canopy layers of the forest such as soil or biomass moisture. The research activities include the use of signal processing and machine learning for estimating water content from wireless reflections. Required expertise: Python programming, signal processing, and deep learning. If you are interested in this project, review this paper.
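A minimal sketch of the delay-estimation step underlying such radar sensing: a transmitted chirp, a delayed reflection, and a matched filter that recovers the delay. The data here are synthetic and noiseless; real radar returns add noise, clutter, and hardware effects:

```python
import numpy as np

fs = 1_000_000                        # sample rate in Hz (illustrative)
t = np.arange(0, 1e-3, 1 / fs)        # 1 ms observation window
tx = np.cos(2 * np.pi * (50e3 * t + 2e8 * t**2))    # linear chirp, ~50-450 kHz
delay = 40                                           # round-trip delay in samples
rx = np.concatenate([np.zeros(delay), tx[:-delay]])  # the delayed reflection

corr = np.correlate(rx, tx, mode="full")    # matched filter / cross-correlation
lag = int(np.argmax(corr)) - (t.size - 1)   # peak position -> delay estimate
# With propagation speed c, the target range is c * (lag / fs) / 2.
```

Estimating moisture from reflections builds on the same primitive: the delay and amplitude of returns from sub-canopy layers carry the water-content signal the machine learning models then interpret.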

For any questions about the project, contact Elahe Soltanaghai

LLM-based Knowledge Agents

Kevin Chang

 

Large language models, such as ChatGPT, have changed the landscape of artificial intelligence and promise to automate how we perform knowledge work. This project will explore LLMs as the “engines” for building agents for such automation, e.g., to help you find the knowledge you need, to synthesize the knowledge for a specific topic, to answer a technical question, or to tutor a student with a personalized learning experience.

 

Techniques: large language models, natural language processing, information retrieval, data mining, machine learning.

For any questions about the project, contact Kevin Chang

MedTerms: Innovative Software Interfaces for Medical Education

Jessica Saw

 

MedTerms, developed by the Visual Analytics Group (NCSA), is a web-based platform that uses visualization design and innovative software user interfaces to support students in learning human disease from a first-principles approach. NCSA technical expertise the student would gain includes: (1) a combined skillset in software user interface and experience (UIX) design and medical domain knowledge, and (2) creativity and innovation in software interface design and development.

 

A good candidate for this project is pre-med, planning to apply to MD/PhD or MD programs, with an interest in user interface and experience design (no coding experience required).

For any questions about the project, contact Jessica Saw

Foundation Model for Brain MRI

Shirui Luo

 

Much of the success of diffusion models stems from the ability to carefully control the generated outputs through conditioning and guidance, without retraining the model from scratch. In this project, we’ll train an unconditional diffusion model on a library of brain MRI images to act as a foundation model for generic image generation. Building on this model, we’ll explore various downstream tasks with universal guidance, ranging from linear inverse problems (reconstruction, inpainting, deblurring) and nonlinear inverse problems to blind inverse problems (segmentation, image-text generation). A good candidate will need a solid background in diffusion models.

For any questions about the project, contact Shirui Luo

Nutrition Data Collection and Analysis Tool With Generative AI Integration

Volodymyr Kindratenko

This project is an exciting opportunity to collaborate with a team of fellow students in the development and implementation of a cutting-edge tool for collecting nutrition data from images of foods consumed throughout the day and analyzing it using advanced generative AI techniques. By participating in this project, students will gain hands-on experience in the development of innovative tools, explore the intersection of technology and nutrition, and contribute to improving the understanding of nutrition data through advanced analytics and generative AI. Research activities will include developing a mobile application, data analysis based on the collected data, development of a large language model for data analysis, and integration of various sources of data with the large language model.

For any questions about the project, contact Volodymyr Kindratenko

Enhancing Response Accuracy of an LLM-Based Teaching Assistant Tool

Volodymyr Kindratenko

 

We are looking for a dedicated and enthusiastic student to work on an exciting project aimed at improving the response accuracy of an LLM-based teaching assistant tool. Our team has developed a functional version of the tool, accessible at uiuc.chat, and we are eager to enhance its capabilities further. The primary goal of this project is to develop innovative methods to improve the accuracy of the teaching assistant tool’s responses, particularly in the context of factual information. We are inspired by the concepts outlined in the research literature, specifically the Chain-of-Verification methodology. We are interested in creating a novel factual consistency model that will ensure the correctness of fact-based answers provided by the tool.
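A high-level sketch of the Chain-of-Verification loop referenced above, with the model stubbed out by a deterministic placeholder. The prompts, stub, and function names are illustrative only, not the actual uiuc.chat implementation:

```python
def chain_of_verification(question, llm):
    """Sketch of the CoVe stages: draft, plan checks, verify, revise.

    `llm` is any callable mapping a prompt string to a response string."""
    draft = llm(f"Answer: {question}")
    plan = llm(f"List verification questions for: {draft}")
    checks = [llm(f"Answer independently: {q}")
              for q in plan.splitlines() if q.strip()]
    return llm(f"Revise '{draft}' given evidence: {checks}")

def stub_llm(prompt):
    """Deterministic stand-in for a real model, for illustration only."""
    if prompt.startswith("Answer:"):
        return "draft answer"
    if prompt.startswith("List"):
        return "check fact A\ncheck fact B"
    if prompt.startswith("Answer independently"):
        return "evidence"
    return "verified answer"

result = chain_of_verification("Who discovered X?", stub_llm)
```

The key design point is that verification questions are answered independently of the draft, so the reviser sees evidence that was not anchored on the original (possibly wrong) answer.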

For any questions about the project, contact Volodymyr Kindratenko

 

Design and Implementation of Digital Twin Models for Continuous Monitoring and Performance Prediction of Precast Concrete Bridges

Volodymyr Kindratenko

 

This research project aims to design and validate a digital twin model of a precast concrete bridge structure, subsequently integrating this model with sensor data derived from a real-world bridge. This integration will serve as the foundation for driving the simulation and analysis of the bridge model. The digital twin model will be constructed utilizing NVIDIA Omniverse, an innovative platform that facilitates the creation of collaborative and immersive 3D objects, equipped with real-time simulation capabilities.

For any questions about the project, contact Volodymyr Kindratenko

 

R Shiny Apps for Vector Control

Becky Smith

This project aims to convert statistical and mathematical models currently being developed for vector control into R Shiny apps for use by vector control agencies. Apps may be featured on CDC websites.

For any questions about the project, contact Becky Smith

Music on High-Performance Computers

Sever Tipei

The project centers on DISSCO, software for composition, sound design, and music notation/printing developed at UIUC, NCSA, and Argonne National Laboratory. Written in C++, it includes a graphical user interface using gtkmm. A parallel version has been developed at the San Diego Supercomputer Center with support from XSEDE (Extreme Science and Engineering Discovery Environment). Because DISSCO is a “black box” that does not allow the user to interfere during computations, and because the computer makes decisions not controlled by the user, it shares features with AI-type projects. Further developments are being considered in this area.

 

Skills needed:

– Proficiency in C++ programming

– Familiarity with Linux Operating System

– Familiarity with music notation preferred but not required.

For any questions about the project, contact Sever Tipei

Translating State-of-the-art Image Analysis to the Biology Lab

Frank Brooks

Imaging is fundamental to biological and medical research. The use of deep neural networks (DNNs) is revolutionizing the entire field of biomedical imaging. DNNs can be applied to remove noise from images, increase image resolution, decrease imaging time, fill in corrupted image regions, and even synthesize virtual image contrasts that replace traditional biomarkers. These DNNs usually are developed and trained by experts in machine learning, and published as code rather than software. Furthermore, even pre-trained DNNs often require very specific libraries and computing environments in order to work at all. The status quo is that many biologists cannot take advantage of state-of-the-art image processing and analysis techniques which are based on deep learning.

 

The goal of this project is to bridge the gap between GitHub and the biology lab. The first step toward this goal is to try a variety of cutting-edge image analysis techniques on real data. This is not a trivial step; sometimes just pre-processing and re-packaging image data for input into a network—which is written by someone else—can be challenging. The hope is that the SPIN student will try 8-12 networks in a variety of environments ranging from the desktop to the high-performance computing cluster.

Sometimes networks will have to be trained from scratch on new data, other times they will be used only in inference mode. Once it is understood which networks work in which computing environments, the job will be to set up those environments so that someone with very little training in modern computing can use the networks. For example, perhaps the SPIN student will discover that, with a small tweak to the pre-processing pipeline, a standard “weakly supervised image segmentation” network and a new “super-resolution” network have similar computing and library requirements. The SPIN student could then streamline use of these networks by creating a single Docker container or virtual environment that works for both. Ultimately, in a later project, this work might be extended to create a drop-down box in a web interface, like that on HAL at the NCSA, where users can simply select and start environments for their favorite networks. Critical to this future extension, and keeping in mind who the current “customers” are, the SPIN student will have to very thoroughly and clearly document how to use each of the networks tried. It is envisioned that complete documentation would be done via Sphinx and set up like a standalone readthedocs.io page, and that an additional “how-to” presentation will be given upon project completion.

 

In summary, the job is for computation-savvy students to demonstrate and document how to apply existing state-of-the-art deep neural networks to specific problems in biomedical image analysis. The job is not about making new decisions from image data and not about creating or training new neural networks. The main challenge is not in data analysis at all but, instead, is in deciphering the codebases and documentation found on GitHub, and in navigating various computing environments.

 

Skills required:

 

Python—you should be able to read code written in PyTorch and TensorFlow well enough to figure out how to get image data and hyper-parameter values to a variety of networks found on GitHub

 

Attention to detail—your scripts need to be turnkey and your documentation complete

 

Excellent communication—the “customers” here won’t know the jargon so you’ll have to explain things very carefully and be serene when they don’t understand you the first n times that you try

 

Preferred experience (in no particular order):

– Demonstrated project completion

– Familiarity with biomedical image data

– Familiarity with the Sphinx documentation generator

– Computation using Linux-based computing clusters

For any questions about the project, contact Frank Brooks

Foundation Models in Physics

Eliu Huerta

The selected student will participate in the development of foundation models, both large language models and generative AI models, with specific applications in materials science discovery (design of metal-organic frameworks for carbon capture) and in the detection of neutron star mergers through gravitational wave and electromagnetic observations. Requirements: knowledge of Python and Linux is useful; practical knowledge of TensorFlow and PyTorch is also encouraged.

For any questions about the project, contact Eliu Huerta

AIMA: Developing AI-based Modeling Assistants

Bertram Ludaescher

In CS, optimization of machine (= CPU) cycles is important. Optimization of human (= brain) cycles is arguably even more important. In this project you will explore the use of AI/LLMs to assist human modelers and develop a prototype modeling assistant.

 

Two project options are available:

ERA: Conceptual modeling is a critical first step in database design and development. You will experiment with LLMs to translate natural language specifications into a conceptual data model in a formal (rule-based) language. The implications of the generated logic model can be symbolically analyzed and fed back to the Entity-Relationship Assistant for prompt engineering or fine-tuning.

XAFA: The eXplainable Argumentation Framework Assistant will use LLMs to translate natural language specifications into arguments in a graph-based, formal argumentation framework.

For any questions about the project, contact Bertram Ludaescher

WormFindr: Automatic Segmentation of Neurons in C. elegans

Jill Naiman

This project is a part of the larger WormAtlas and C. elegans project, recently funded by the NIH, which aims to study and visualize the anatomy of C. elegans in order to better understand their neural connections. For the WormFindr SPIN project, our goal is to apply machine learning segmentation models to test the effectiveness of these methods in automatically segmenting the neurons in images of C. elegans.

 

Preferred Skills: 

– Experience with programming, preferably in Python

– Knowledge or practice of machine learning methods is welcome, but not required

For any questions about the project, contact Jill Naiman

Quantifying the Effectiveness of Scientific Documentaries Using Natural Language Processing 

Jill Naiman

NCSA’s Advanced Visualization Lab (AVL), in collaboration with the iSchool, is looking for an undergraduate research intern to help with a research project that builds on the research of doctoral candidate Rezvaneh (Shadi) Rezapour and Professor Jana Diesner, which uses data mining and natural language processing techniques to study the effects of issue-focused documentary films on various audiences by analyzing reviews and comments on streaming media sites.

 

This new research will focus specifically on science-themed documentaries that use computational science research in their science explanations. Student researchers would be responsible for working with mentors in iSchool (Professor Jill Naiman) and AVL to collect data from streaming sites and analyze the data using existing purpose-built software and developing new tools. 

 

No skills required. Students will be trained to conduct the classification of text documentary reviews. Preferred: background in interdisciplinary research.

For any questions about the project, contact Jill Naiman

Enhancing Optical Character Recognition (OCR) Capabilities for Historical Documents   

Jill Naiman

This project is a subset of a NASA Astrophysics Data Analysis Program (ADAP) project aimed at creating several science-ready data products to help astronomers search the literature in new ways. This goal is being accomplished by extending the NASA Astrophysics Data System (ADS), known as an invaluable literature resource, into a series of data resources. One part of this project involves the “reading” of figure captions using Optical Character Recognition (OCR) from scanned article pages.

 

A large source of error in the OCR process comes from artifacts present on scanned pages: scan effects such as warping, lighting gradients, and dust can generate many misspellings. This project is focused on better understanding these types of effects, using image processing and analysis to better clean old images before OCR processing and to potentially generate artificial training data using “aged” images of newer, digitized documents.
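A minimal sketch of what generating "aged" training data could look like. The specific artifact models here (a linear lighting gradient, random dust speckles, a box blur) are illustrative placeholders for the scan effects described above, not the project's actual pipeline:

```python
import numpy as np

def age_page(img, rng):
    """Toy 'aging' transform for a clean page image with values in [0, 1]."""
    h, w = img.shape
    gradient = np.linspace(0.8, 1.2, w)[None, :]        # uneven illumination
    dust = (rng.random(img.shape) < 0.01) * rng.random(img.shape)
    noisy = np.clip(img * gradient + dust, 0.0, 1.0)
    # Cheap 3x3 box blur via shifted sums, mimicking slight defocus.
    padded = np.pad(noisy, 1, mode="edge")
    blurred = sum(padded[i:i + h, j:j + w]
                  for i in range(3) for j in range(3)) / 9.0
    return blurred

rng = np.random.default_rng(0)
clean = np.ones((64, 64))        # a blank white page as a placeholder
aged = age_page(clean, rng)
```

Pairs of (clean, aged) images produced this way could, in principle, supervise a model that learns to undo the artifacts before OCR.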

For any questions about the project, contact Jill Naiman

Using Large Language Models to Decompose Research Papers Into Nanopublications 

Tim McPhillips

The bulk of existing and newly reported scientific knowledge is embedded in the natural language texts of research papers. Figures and tables in these papers, along with the computational artifacts associated with them, provide one entrypoint to this knowledge, while recent NLP and LLM advances provide another. In this project we will demonstrate how knowledge in the scientific literature can be harvested via a combination of these approaches. The student will develop tools to automatically extract figures and tables from PDFs of research papers; train, fine-tune, and use LLMs to detect, in the text of these papers, references to figures, tables, and computational artifacts used or produced by the research; extract portions of the text that make scholarly claims (assertions) supported by some combination of the figures, tables, and computational artifacts; and then decompose each publication into a set of nanopublications making subsets of the claims asserted in the original paper and supported by corresponding subsets of the figures, tables, and computational artifacts. Finally, the student will explore the potential for building a scholarly knowledge graph from the nanopublications extracted in this way from a set of related papers in a particular research field chosen jointly by the student and project supervisor.

 

Preferred experience for intern candidates: Python programming experience; interest in LLM and NLP.

For any questions about the project, contact Tim McPhillips

Modeling of the Complex Environment of the Cell

Taras Pogorelov

The cell environment is complex and crowded, and is difficult to capture for sufficiently long timescales with modern computational approaches. The Pogorelov Lab at Illinois uses the specialized supercomputer Anton to model cell-like environments for hundreds of microseconds. We develop computational analysis tools and workflows to mine these large and unique data. We work in close collaboration with experimental labs to cross-validate computational and experimental data when possible. Modeling approaches include classical molecular dynamics and data analysis. These projects include development of workflows for analysis of protein-protein and protein-metabolite interactions, and of water dynamics that are vital for the life of the cell. The qualified student should have experience with R/Python programming, use of a Linux environment, and the NAMD, MDAnalysis, and VMD software packages.

For any questions about the project, contact Taras Pogorelov

Multiscale Modeling of the Cellular Membrane-Associated Phenomena

Taras Pogorelov

The cell membrane environment is complex and challenging to model. The Pogorelov Lab at Illinois develops workflows that combine computational and experimental molecular data. We work in close collaboration with experimental labs. Addressed questions include investigations of fundamental mechanisms of membrane activity, structural dynamics of peripheral and transmembrane proteins, and development of membrane-active drugs. Modeling approaches include classical molecular dynamics, quantum electronic structure, and quantum nuclear dynamics. These projects include development of workflows for modeling and analysis of the lipid interactions with proteins and ions that are vital for the life of the cell. The qualified student should have experience with R/Python programming, use of a Linux environment, and the NAMD molecular modeling software.

For any questions about the project, contact Taras Pogorelov

Capturing Structure and Dynamics of Signaling Proteins

Taras Pogorelov

A detailed understanding of the mechanisms of signal transduction from the surrounding environment into the cell is essential for our progress in structural molecular biology of disease and drug discovery efforts. The Pogorelov Lab at Illinois develops workflows that combine computational and experimental molecular data to capture dynamic structures of receptor tyrosine kinases (RTKs), a family of cell-surface receptors that are key regulators of vital cellular processes. These structures are exceedingly difficult to capture using experimental techniques, mostly due to the fluid nature of the membrane environment. We are combining all-atom molecular dynamics with and without distance restraints, and the project includes, among other developments, a novel implementation of minimally biasing potentials to expedite signaling protein dimer formation in realistic membranes. We work in close collaboration with experimental labs. The qualified student should have experience with R/Python programming, use of a Linux environment, and NAMD, LAMMPS, or other molecular modeling software.

For any questions about the project, contact Taras Pogorelov

Virtual Reality and Eye Tracking for Robotic Inspections 

Mohamad Alipour

Interest in robotic visual inspections of critical infrastructure has rapidly risen in recent years. Many studies have been published in the last decade focusing on machine-learning-based computer models for automated defect detection in imagery. Research on ground, aerial, and marine robotic systems with sensing capabilities and autonomous navigation for structural inspections has also gained heightened traction. To create robotic systems that can deliver human-level actionable inspection output, there is a need for an improved understanding of how human inspectors perform different tasks. This project focuses on developing a data collection platform for sensing, data extraction, and human inspector behavior analysis. We develop a platform based on mixed reality (MR) while leveraging eye-tracking technology to sense and measure the visual performance of inspectors during virtual inspections. The platform created in this study will be helpful in future studies on integrating human inspection patterns into robotic platforms. Depending on the progress of the project, opportunities exist for participating in real tests of target structures with Unmanned Aerial Systems (drones) and robotic control system development. 

 

Student Background and Research Activities: Successful applicants will have strong programming skills and experience with game engines and VR programming (e.g., Unity or similar platforms). No knowledge of civil, structural, or mechanical engineering is required. This project involves exciting research activities, including application development for mixed reality headsets such as the Meta Oculus Quest, HTC Vive, and Magic Leap. Depending on the progress, this project may lead to a conference paper and/or a longer-term research position.

For any questions about the project, contact Mohamad Alipour

 

UniHOI: 3D Human-Object Interaction Benchmark and Visualization Toolsets 

Yuxiong Wang

UniHOI is a community implementation to unify the 3D human-object interaction datasets, supplemented by a Blender add-on to enhance rendering realism and streamline human-object animation visualization.

Key responsibilities:

– Data collection and standardization: extract mocap data from publicly available human-object interaction datasets and standardize their format.

– Development and coding: implement the add-on using Python and Blender’s API for interaction loading, UV mapping, scene configuration, and rendering, based on the collected data.

– Testing and optimization: rigorous testing of the add-on for performance and reliability.

– Documentation: create detailed documentation and tutorials for future users and developers.

– Collaboration: work closely with the project supervisor and potentially other team members, including regular progress updates and feedback sessions.

What we need to see:

– Proficiency in programming, particularly in Python, and successful completion of CS 225 (or a comparable course in data structures and algorithms).

– Experience with at least one course such as CS 415, CS 418, CS 419, or CS 445.

– Understanding of 3D mathematics, such as the concept of 3D transformations.

Ways to stand out from the crowd:

– Hands-on experience with 3D graphics engines and tools, such as Blender, Unity, and Unreal.

– Skills in using physics simulators, such as Isaac Sim, Isaac Gym, MuJoCo, PyBullet, and Taichi.

– Understanding of animation techniques, such as rigging, retargeting, skinning, and UV mapping.

For any questions about the project, contact Yuxiong Wang