This project will continue ongoing work using 3D data visualization techniques to interpret top results from gene expression analysis. The student will augment an existing application built on the Unity game engine with a specialized mode for viewing complex gene networks and their relevant annotations in a three-dimensional interactive environment. Previous 3D design or programming experience is valuable.
This student will be co-mentored by the Advanced Visualization Lab (AVL) and the Visual Analytics team at the NCSA. The AVL specializes in cinematic data visualization, software development, and interactive computer graphics. The Visual Analytics team focuses on the application and development of visual information design techniques that aid in the comprehension of complex biological and medical data.
NEAT (NExt-generation sequencing Analysis Toolkit) is a software toolkit written in Python, primarily focused on generating simulated data similar to the output of next-generation sequencing machines. NCSA Genomics has recently taken over development of the NEAT software package and updated it to Python 3, and now we are ready to take NEAT to the next level. NEAT includes advanced simulations of the statistical properties of genetic data, visualization tools to help inspect and troubleshoot simulation parameters, and analysis tools that examine real data to help shape the simulations. We are in the process of adding advanced features such as wrapper scripts for bacterial population genomics and polyploid organisms. There are opportunities to learn Python, data visualization, production software workflows, and more! We are looking for interns to design, write, and run unit tests, update documentation, and even develop some of the NEAT code using Python packages such as Biopython, Matplotlib, and Pandas.
The Community Data Clinic and Cunningham Township are seeking a student with programming and web design skills for a project rebuilding the Illinois PATH 211 database from scratch. The new version will be a web-based directory based on a relational database of social service providers in the Champaign County area. The student's main task will be building out this web application to filter results and integrate user feedback. We are collaborating with local partners and stakeholders to ensure that our platform is as accessible and intuitive as possible, so a keen UI sensibility or experience with frontend accessibility would be a plus.
For the first year we are imagining a lightweight proof-of-concept, perhaps just using JavaScript and Firebase, plus HTML and CSS. Future architectures might be designed to work from the PATH servers, but that's a long-term goal for now.
Interested applicants should provide links to examples of relevant past projects and briefly explain their interest in community-based work.
This project proposes the creation of an algorithm/library/tool that can be used to detect, analyze, and extract non-textual objects on page images scanned from print sources in the HathiTrust Digital Library and elsewhere. Pages in printed materials (books, journal articles, etc.) often contain important non-textual elements such as tables, figures, maps, musical notation, etc., sometimes with associated captions. When these are scanned and processed using Optical Character Recognition (for text-mining and other computational purposes), these non-textual features and their rich content are most often lost, greatly impoverishing the value of the original. This project will develop methods, algorithms, and tools to detect, extract, and describe these non-textual objects for further use and research.
Skills required: programming experience, computer vision experience (object detection), basic knowledge of metadata or Linked Open Data standards
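As a rough illustration of the detection step (not the project's chosen method), here is a minimal sketch of applying an object-detection model to a scanned page image; it assumes a torchvision Faster R-CNN fine-tuned on page-layout classes such as "figure", "table", and "caption", and the checkpoint path and class count are placeholders:

import torch
import torchvision
from torchvision.transforms import functional as F
from PIL import Image

# Detection model with a small, hypothetical set of page-layout classes.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(num_classes=4)
model.load_state_dict(torch.load("page_layout_model.pt"))  # hypothetical fine-tuned weights
model.eval()

page = Image.open("scanned_page.png").convert("RGB")
with torch.no_grad():
    prediction = model([F.to_tensor(page)])[0]

# Keep confident detections; each box could then be cropped, extracted, and described with metadata.
for box, label, score in zip(prediction["boxes"], prediction["labels"], prediction["scores"]):
    if score > 0.8:
        print(label.item(), [round(v) for v in box.tolist()])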
In this project the goal is to create an algorithm and tool that allows researchers to identify, retrieve, and visualize dated events occurring in a collection of texts from the HathiTrust Digital Library (i.e., scanned printed books from library collections). Visualizing the timeline of events portrayed in a collection in chronological order is a great way to follow along and understand the key milestones or cause-and-effect relationships in a narrative or large collection of narratives. For an example of what a timeline of events might look like, see http://simile.mit.edu/timeline/examples/
Skills required: programming experience, NLP experience (identifying/extracting date entities from text), web development experience
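As a rough illustration of the date-extraction step, here is a minimal sketch using spaCy's off-the-shelf English model (the input sentence is just an example); in the project, extracted dates would be normalized and placed on a timeline:

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The expedition departed in June 1845 and was last sighted in July 1845.")

# Collect DATE entities with their character offsets for later normalization and timeline placement.
dates = [(ent.text, ent.start_char, ent.end_char) for ent in doc.ents if ent.label_ == "DATE"]
print(dates)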
Dr. Kaiyu Guan's lab is conducting research that uses novel data from NASA satellites to study environmental impacts on global and U.S. agricultural productivity, on the platform of the most powerful supercomputer in scientific research (Blue Waters). We are looking for highly motivated and programming-savvy undergraduate students to join the lab for the SPIN program. The chosen students will be closely mentored by Dr. Guan and will work on tasks including processing large satellite datasets, understanding and implementing remote sensing algorithms, and addressing questions related to global food production and food security.
DataVault is an experimental data storage facility to house gravitational waves computed using the Einstein Toolkit. It was developed at NCSA to hold the eccentric waveform catalog of the NCSA gravity group. Scientists can upload waveforms as HDF5 files and search for individual waveforms with a content-aware search engine, via either a web form or a REST API built on the underlying "Girder" framework.
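As a rough illustration of how the REST API can be scripted (not DataVault's actual endpoints), here is a minimal sketch using the girder_client Python package; the API URL, API key, folder id, and search query are placeholders:

import girder_client

# Connect to the Girder API behind DataVault (URL and credentials are placeholders).
gc = girder_client.GirderClient(apiUrl="https://datavault.example.org/api/v1")
gc.authenticate(apiKey="MY_API_KEY")

# Upload a waveform HDF5 file into a DataVault folder.
gc.uploadFileToFolder("FOLDER_ID", "waveform_q1.5_e0.1.h5")

# Text search through Girder's generic resource search endpoint.
results = gc.get("resource/search", parameters={"q": "eccentric", "types": '["item"]'})
print(results)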
DataVault uses a Python backend running inside a set of Docker containers in a virtual machine hosted on NCSA's Nebula cluster. Frontend functionality is provided using JavaScript and the Pug templating engine.
This project aims to improve the functionality of DataVault by offering a more convenient frontend search interface, porting it to the newest Girder version, and transitioning it to Python 3. We will also finalize containerization of DataVault, including its ancillary NGINX and MongoDB servers. Time permitting, the project will explore minting DOIs for the uploaded waveforms.
The successful applicant will be collaborating with Roland Haas, the NCSA Gravity Group and the physics group of Helvi Witek.
Skills required: a solid understanding of Linux, a good grasp of Python (ideally with experience in web design), and a solid understanding of JavaScript for the frontend part. The ability to learn new frameworks and work with an existing code base is required as well.
Before applying: please read the exercise at https://wiki.ncsa.illinois.edu/display/~rhaas/SPIN+2021-2022+Exercise and send me the result as explained there. I will not consider applications without having received the exercise.
This project is part of the ongoing effort in the NCSA Gravity Group to study gravitational waves produced by colliding compact objects like black holes and neutron stars. We use machine learning techniques to improve LIGO's detection capabilities to be able to detect gravitational waves more quickly (lower latency) and increase the set of waveforms that can be detected (for example detecting black holes on elliptic orbits as well as circular orbits).
Training these networks requires large numbers of training waveforms to be available to the network. We currently use LIGO's LIGO Algorithm Library (LAL) to produce these waveforms offline, then read them in during the network training phase. For large parameter spaces this becomes infeasible, since the training dataset grows to hundreds of terabytes. This project aims to produce training waveforms on the fly, interleaving a training epoch (on the GPU) with waveform production for the next epoch (on the CPU). You will use Python, LAL, and TensorFlow to set up a pipeline that uses CPU cores to produce waveforms while using the produced waveforms to train neural networks. Students working on this project will have access to the HAL cluster at UIUC and Summit at Oak Ridge National Lab.
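As a rough sketch of the on-the-fly idea (not the group's actual pipeline), a tf.data generator can feed waveforms produced on the CPU to the GPU while training proceeds; the waveform body below is a random stand-in for a LAL call, and the shapes and parameter ranges are placeholders:

import numpy as np
import tensorflow as tf

def waveform_generator():
    # Placeholder for calls into LAL (e.g. via lalsimulation) that return a whitened
    # time series plus its source parameters for randomly drawn masses.
    rng = np.random.default_rng()
    while True:
        params = rng.uniform(10.0, 80.0, size=2).astype(np.float32)   # component masses (placeholder)
        strain = rng.normal(size=4096).astype(np.float32)             # stand-in for the LAL waveform
        yield strain, params

dataset = (tf.data.Dataset.from_generator(
               waveform_generator,
               output_signature=(tf.TensorSpec(shape=(4096,), dtype=tf.float32),
                                 tf.TensorSpec(shape=(2,), dtype=tf.float32)))
           .batch(64)
           .prefetch(tf.data.AUTOTUNE))   # CPU prepares upcoming batches while the GPU trains

# model.fit(dataset, steps_per_epoch=1000, epochs=10)   # with a Keras model defined elsewhere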
Skills required
Before applying for this project please work through the exercise available at https://wiki.ncsa.illinois.edu/display/~rhaas/SPIN+2021+Exercise as this will be part of the interview with prospective candidates. I will not consider your application unless I have received the exercise.
Background: The phrase "a gut feeling" is cliché, but it may be truer than previously thought. The human gastrointestinal tract (or "gut") contains one of the most densely populated microbial communities on earth, composed of trillions of microbes spanning at least 1,000 different species, collectively known as the "gut microbiome." Interestingly, the gut microbiome and diet are independently linked to diseases including obesity, type 2 diabetes, and cancer. However, there is little research on the impact of foods and nutrients on the gut microbiome and, subsequently, on disease.
Our work: With the advent of advanced sequencing technologies and informatics techniques, we aim to fill this gap. We have collected bacterial samples from participants who consumed specific foods (walnuts, almonds, oats, barley, avocado, and broccoli) during five different clinical trials. Sequencing the DNA of these samples yielded big data on the bacterial composition of each subject's gut before and after the trial. Our goal is to utilize these sizable datasets to find connections between the foods and changes in the gut microbiome, along with various health markers such as blood sugar and body weight.
Current progress: Thus far, we have built a machine learning model that can predict which of the six foods mentioned above a person consumed with 86% accuracy. We are working to improve our model's accuracy, both by applying advanced data science techniques and by incorporating additional data taken from the clinical trials.
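As a rough illustration of this kind of classification workflow (not our actual model), here is a minimal scikit-learn sketch on a hypothetical table of bacterial abundances; the file name and column names are placeholders:

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Rows: samples; columns: bacterial taxa abundances plus the food consumed (placeholder file).
data = pd.read_csv("abundances.csv")
X = data.drop(columns=["food"])
y = data["food"]                      # one of the six study foods

model = RandomForestClassifier(n_estimators=500, random_state=0)
scores = cross_val_score(model, X, y, cv=5)
print("mean cross-validated accuracy:", scores.mean())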
How you can contribute: This project offers opportunities to work with large clinical datasets and machine learning techniques along with industry-standard bioinformatics software to analyze and visualize data. Knowledge of basic data mining, analysis, and visualization techniques is preferred. Familiarity with using command line tools is also a plus.
The student will become familiar with distributed training frameworks and optimization schemes on high performance computing clusters. These methods will be explored in a number of applications, including gravitational wave astrophysics, cosmology, and cancer research, among others. The student will also participate in the deployment of these models on different platforms to conduct AI discovery on edge computing resources.
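As a minimal, generic illustration of distributed training (not the group's specific setup), TensorFlow's MirroredStrategy spreads a model's replicas across the local GPUs of a node; multi-node schemes follow a similar pattern. The model below is a placeholder:

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()       # data parallelism across local GPUs
print("replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():                            # variables created under the strategy are mirrored
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(64,)),
        tf.keras.layers.Dense(1)])
    model.compile(optimizer="adam", loss="mse")

# model.fit(dataset, epochs=10)   # with a tf.data.Dataset defined elsewhere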
We have a research opening for a student interested in the development of physics-inspired AI models for multi-messenger astrophysics. The selected student will work with a team of undergraduate and graduate students, postdocs, and faculty with extensive expertise in AI, data, and supercomputing. The student is expected to have knowledge of Python and version control (GitLab, GitHub, etc.). Knowledge of TensorFlow, PyTorch, or any other open source platform for deep learning is desired. No knowledge of physics or astronomy is required. The student will become an affiliate of the NCSA Center for Artificial Intelligence Innovation and the NCSA Gravity Group, and will have access to the Hardware-Accelerated Learning (HAL) cluster at NCSA and to the entire ecosystem of AI supercomputers in the U.S., including Bridges-AI, Neocortex, and Summit.
The Innovative Systems Lab (ISL) is looking for a student interested in deploying and operating private cloud infrastructure based on OpenStack, RHEL OpenShift, Kubernetes, or similar technologies. The student will work with ISL system engineers and C3SR researchers to develop and deploy innovative solutions to support hybrid cloud infrastructure research. The student is expected to have knowledge of Linux system administration, the CLI, and Python. Knowledge of any open source or commercial cloud platform is desirable.
The Center for AI Innovation, in collaboration with the Innovative Systems Lab (ISL), is looking for students interested in the acceleration of machine learning algorithms on FPGAs and other unconventional architectures. The students will work with a team of other undergraduate and graduate students and a postdoc on several aspects of FPGA-based computing, ranging from the integration of machine learning frameworks with FPGA-based inference models to the development of HLS-based FPGA codes. The students are expected to have taken ECE 385 or a similar class as well as an applied machine learning class. Knowledge of TensorFlow, PyTorch, or any other open source platform for deep learning is desirable; knowledge of HLS design methodology is a plus. The students will become affiliates of the NCSA Center for AI Innovation, and will have access to FPGA systems at ISL and the Xilinx Center of Excellence for Adaptive Computing at the Coordinated Science Laboratory.
The Center for AI Innovation is looking for a student interested in the development of optimization techniques for reducing the complexity of deep learning models. Previously, we developed a technique for network pruning carried out simultaneously with model training. The current project seeks to advance this technique by implementing it on new NVIDIA GPUs that have hardware support for sparse matrix operations. The student is expected to have taken ECE 408 or a similar class as well as an applied machine learning class. Proficiency with TensorFlow, PyTorch, or any other open source platform for deep learning is required. The student will become an affiliate of the NCSA Center for AI Innovation and will have access to GPU systems at the Innovative Systems Lab at NCSA.
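This is not the group's own pruning technique, only a minimal illustration of magnitude pruning with PyTorch's built-in utilities; mapping the resulting sparsity onto the GPUs' structured-sparse hardware is part of what the project would explore:

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(512, 512)   # placeholder layer

# Zero out the 50% of weights with the smallest magnitude; the pruning mask
# stays attached to the module and is applied during subsequent training steps.
prune.l1_unstructured(layer, name="weight", amount=0.5)

sparsity = float((layer.weight == 0).sum()) / layer.weight.numel()
print(f"weight sparsity: {sparsity:.2f}")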
The Center for AI Innovation is looking for a student interested in the development and implementation of machine learning models for recognizing human actions. We have previously developed models for human fall detection and aggression detection, and have implemented the fall detection model on the Raspberry Pi platform. The selected student will work on improving these models and developing new models and their implementations on low-power edge devices. The student is expected to have a good working knowledge of Python and C++. Knowledge of TensorFlow, PyTorch, or any other open source platform for deep learning is required as well. The student will become an affiliate of the NCSA Center for AI Innovation and will have access to advanced GPU hardware for model training.
Image-based pathology (histopathology) is the gold standard for cancer diagnosis but has been unable to differentiate high-risk cancer cases from low-risk ones. In the United States, this inability demands interventions for all patients and generates significant medical side effects, at an estimated annual cost of $4 billion for breast cancer alone. Since the early 2000s, it has been recognized that high-risk cancer cases, but not low-risk ones, are accompanied by an active tumor microenvironment that enables tumor cells to invade and metastasize. Thus, standard histopathology, which focuses on the detection of tumor cells but lacks information on the tumor microenvironment, should ideally be complemented by an optical imaging technology that better reveals the tumor microenvironment. Using information extracted from the tumor microenvironment will make it possible to overcome this critical inability to differentiate low-risk from high-risk cases. To better account for the tumor microenvironment, we propose to use multiphoton histopathology. Our hypothesis is that combined standard and multiphoton histopathology, together with a demonstrated platform of a survival convolutional neural network, will dramatically improve the prediction accuracy for breast cancer outcomes.
Skill requirements
The Legacy Survey of Space and Time (LSST) at the Vera C. Rubin Observatory will obtain terabytes of data per night, producing a high-definition "movie" of the night sky. Correctly detecting, identifying, and segmenting astronomical sources efficiently is a top priority. LSST images will be so deep and detailed that sources will be crowded or "blended" together. Looking to the rapidly developing field of computer vision, our group has developed a proof-of-principle deep learning framework to process images and identify blended sources in them. However, this work, like most others in the literature, has thus far been limited to simulated images and remains untested on a large, real image dataset. To solve this problem, we propose a SPIN research project to apply our novel image segmentation method to real images taken by the most powerful camera in the world, Hyper Suprime-Cam (HSC), on one of the largest ground-based telescopes, the 8.2-meter Subaru telescope. As the closest match to LSST in terms of expected cutting-edge image data quality, HSC is the ideal dataset for evaluating and developing this interdisciplinary approach on real astronomical images. The SPIN student will develop and test deep learning architectures within our already proven framework. The student will leverage NCSA resources, such as the HAL GPU cluster, and collaborate with local experts. This is an opportunity to test and develop a competitive new approach combining astronomy big data with machine learning and computer vision.
Declarative problem solving involves writing constraints as logic rules. For example, the following two logic rules solve the problem of checking whether a graph can be colored with only two colors. The first rule assigns either red or blue to a node X, while the second rule filters out failed candidate solutions, i.e., those in which two neighboring nodes X and Y have the same color:
color(X,red) v color(X,blue) ← node(X). % Generate ..
false ← edge(X,Y), color(X,C), color(Y,C). % .. and test
For example, the utility graph K3,3 can be colored in this way (houses are red, utilities blue, or vice versa), while K5 (cf. Kuratowski's theorem) cannot. Graphs that can be two-colored are also called bipartite. A well-known example of bipartite graphs are Petri nets, which consist of places and transitions and are used to model and analyze concurrency in distributed systems. In social network analysis, affiliation networks and folksonomies can be modeled as bipartite and tripartite graphs, respectively.
The goal of the project is to visualize and analyze declarative solutions obtained from constraint-based specifications similar to the example above (this includes scheduling problems, optimization problems, etc.).
Visualizing Declarative Solutions
In this project you will apply information visualization techniques to display (i) problem instances, (ii) candidate solutions, (iii) failed candidates, and (iv) successful solutions in graph form. The goal is to come up with simple and intuitive renderings for a number of declarative problems and solutions.
Required skills: Programming skills in Python
Desired skills: Experience with Jupyter notebooks and databases (e.g. SQL)
Helpful skills: Experience in information visualization; interest in graph theory or combinatorics
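As a rough illustration of the kind of rendering envisioned (a minimal sketch, not a prescribed design), the two-coloring of the utility graph K3,3 from the example above can be drawn in Python with networkx and matplotlib:

import networkx as nx
import matplotlib.pyplot as plt

# Problem instance: the utility graph K3,3, which is two-colorable (bipartite).
G = nx.complete_bipartite_graph(3, 3)

# A successful solution of the generate-and-test rules above:
# nodes 0-2 are "red" houses, nodes 3-5 are "blue" utilities.
coloring = {n: ("red" if n < 3 else "blue") for n in G.nodes}

nx.draw(G, pos=nx.bipartite_layout(G, nodes=[0, 1, 2]),
        node_color=[coloring[n] for n in G.nodes], with_labels=True)
plt.show()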
The goal of this project is to investigate health disparities arising from birth and death outcomes across the residents of Champaign County by parsing electronic birth and death records from the last decade provided by the Champaign-Urbana Public Health District (CUPHD). The project is currently focused on identifying maternal health disparities and how these disparities are associated with various birth outcomes. To accomplish this task, we are developing a pipeline that digitizes birth and death records using natural language processing (NLP), which allows our program to read and understand the language in the records.
This research will provide an autonomous tool for future projects that require parsing not only electronic birth and death records but also other important medical records.
A sample birth certificate, filled in with fake data for training purposes, is shown below. Note the complexity of the document: the data for the fields vary in type, shape, and length. For example, data may come in the form of checks crossed off in small boxes.
A successful candidate will help the STEM Illinois Nobel project create a community health worker curriculum for middle and high school students. STEM Illinois is deeply rooted in the historic mission of land-grant institutions, which is to democratize higher education and to address the world's most pressing societal challenges. However, over 150 years after the Morrill Act was passed, social inequality reflects the harsh lived experiences of racially marginalized groups. These inequalities are especially visible in the field of computer science, where, despite decades of pipeline programming, the number of underrepresented students remains alarmingly low. The goal of the STEM Illinois project is to create a unique ecosystem that will nurture future computer scientists in industry and the academy. We believe that these students will follow the Illinois tradition and address pressing societal challenges as innovators and Nobel Prize winners.
We seek to increase the number of marginalized students majoring in computer science and medicine. The student will employ their skills as a pre-med student to help us consider the topics to cover and the areas for innovation as we train community health workers. Innovation in training is especially relevant during COVID-19. At the beginning of the pandemic, the governor of New York deployed the Army Corps of Engineers to create a temporary hospital to address anticipated high levels of morbidity and mortality. We argue that another critical step in addressing the pandemic is to create a corps of community health workers to document the impact of COVID-19 and to share their resiliency tools in real time. These young community health workers will create new knowledge and engage in competitions to improve health and wellness using innovative methods. The student will serve as a mentor to Nobel participants.
The intellectual contribution of this project is the documentation of the overwhelming cost of COVID-19 on the social, emotional, physical, and financial health of U.S. citizens. The most important research contribution will be the volumes of data that we collect on marginalized groups' resiliency strategies during a pandemic.
Fertilizer use remains below recommended rates in most of Sub-Saharan Africa, contributing to low agricultural productivity, pervasive poverty, and food insecurity. Small farmers have voiced suspicion that fertilizer is often adulterated, and evidence suggests that these suspicions lead to inefficient fertilizer use: too much in some cases, and too little in others. A key problem is that the quality of fertilizer is unobservable to the uninformed eye at the point of purchase.
We have developed and are currently field-testing a machine-learning-based, automated, rapid-response fertilizer quality verification service that farmers in sub-Saharan Africa can use to accurately assess the nutrient content of fertilizer at the point of purchase. A user can analyze a photograph of mineral fertilizer in the app and receive an immediate evaluation of whether the fertilizer is adulterated.
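This is not our production model, just a minimal sketch of the kind of transfer-learning image classifier such an app might use; the backbone, image size, directory name, and binary labeling are illustrative assumptions:

import tensorflow as tf

# MobileNetV2 backbone with a binary head (adulterated vs. unadulterated).
base = tf.keras.applications.MobileNetV2(input_shape=(224, 224, 3),
                                         include_top=False, weights="imagenet")
base.trainable = False

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid")])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# train_ds = tf.keras.utils.image_dataset_from_directory("fertilizer_photos", image_size=(224, 224))
# model.fit(train_ds, epochs=5)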
We are looking for a student to work with us on improving our existing classification model and to help improve the app as we receive feedback from Tanzanian users.
Students with research experience in machine learning and/or database management will be strongly preferred. Students should also have experience in Android app development.
This project researches frameworks and workflows for speech-to-text recognition to facilitate live auto-captioning and the creation of standard caption files for use in live events and video editing. It utilizes and enhances speech-to-text HPC/cloud services and seeks to advance the state of the art in speech-to-text recognition. A successful candidate should have completed CS125 (Intro to Computer Science) or have equivalent experience.
This project is a subset of a NASA Astrophysics Data Analysis Program (ADAP) project aimed at creating several science-ready data products to help astronomers search the literature in new ways. This goal is being accomplished by extending the NASA Astrophysics Data System (ADS), known as an invaluable literature resource, into a series of data resources. One part of this process is classifying the figures that appear in journal articles by their "type" (for astronomical literature, classes will include things like "images of the sky," "graphs," "simulations," etc.). For this summer research project, a student will help with this image classification, both by hand and by testing machine learning methods, in collaboration with Dr. Jill Naiman and/or a grad student (School of Information Sciences and NCSA). The main parts of the project will involve developing the codebook of image classifications, so that citizen scientists can complete more classifications at a large scale, and running the by-hand classification scripts. Options to extend this work by developing the UI for the classification scripts (in Python, and/or for the Zooniverse citizen science platform) and working with the machine learning methods for image classification are available for interested students.
Required skills: patience (ok with classifying images by hand), attention to detail (to develop the codebook for different and tricky image classes) and curiosity about the machine learning image classification process.
Nice to have skills: experience with Python and machine learning (but these can be taught on the job)
NCSA's Advanced Visualization Lab (AVL), in collaboration with the iSchool, is looking for an undergraduate research intern to help with a research project that builds on the work of doctoral candidate Rezvaneh (Shadi) Rezapour and Professor Jana Diesner, which uses data mining and natural language processing techniques to study the effects of issue-focused documentary films on various audiences by analyzing reviews and comments on streaming media sites. This new research will focus specifically on science-themed documentaries that use computational science research in their science explanations. Student researchers will be responsible for working with mentors in the iSchool (Professor Jill Naiman) and AVL to collect data from streaming sites and analyze the data using existing purpose-built software, as well as developing new tools.
Any required skills or knowledge: none; students will be trained to conduct the classification of text documentary reviews. Preferred: background in interdisciplinary research.
Many disease states, particularly in psychiatry, neurology, and cardiology, are often overlooked in our healthcare systems due to treatment barriers and untimely diagnoses. New disease screening methods are necessary to address these problems. This project proposes developing automated disease screening techniques that can infer clinical states, such as anxiety and manic depressive disorders, using machine learning, modeling, and human speech and language data. The team will integrate models with Clowder to demonstrate the automated annotation of speech/language/health data.
The cell environment is complex and crowded, and it is difficult to capture over substantial timescales with modern computational approaches. The Pogorelov Lab at Illinois uses the specialized supercomputer Anton 2 to model cell-like environments for hundreds of microseconds. We develop computational analysis tools and workflows to mine this large amount of unique data, and we work in close collaboration with experimental labs to cross-validate computational and experimental data when possible. Modeling approaches include classical molecular dynamics and data analysis. These projects include the development of workflows for the analysis of protein-protein and protein-metabolite interactions, as well as water dynamics vital to the life of the cell. The qualified student should have experience with R/Python programming, use of the Linux environment, and the NAMD, MDAnalysis, and VMD software packages.
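As a rough illustration of this kind of trajectory-analysis workflow (a minimal sketch, not the lab's actual scripts), MDAnalysis can load a simulation and compute a simple structural observable; the topology/trajectory file names and atom selections are placeholders:

import MDAnalysis as mda
from MDAnalysis.analysis import rdf

# Load a simulated system (file names are placeholders).
u = mda.Universe("system.psf", "trajectory.dcd")

protein = u.select_atoms("protein")
waters = u.select_atoms("name OH2")   # water oxygens, assuming CHARMM-style naming

# Radial distribution function of water around the protein as a simple starting
# point for characterizing water structure and dynamics near the protein surface.
g = rdf.InterRDF(protein, waters, nbins=100, range=(0.0, 15.0))
g.run()
print(g.results.bins[:5], g.results.rdf[:5])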
The cell membrane environment is complex and challenging to model. The Pogorelov Lab at Illinois develops workflows that combine computational and experimental molecular data, working in close collaboration with experimental labs. Questions addressed include investigations of the fundamental mechanisms of membrane activity, the structural dynamics of peripheral and transmembrane proteins, and drug development. Modeling approaches include classical molecular dynamics, quantum electronic structure, and quantum nuclear dynamics. These projects include the development of workflows for modeling and analyzing the interactions of lipids with proteins and ions that are vital to the life of the cell. The qualified student should have experience with R/Python programming, use of the Linux environment, and the NAMD molecular modeling software.
Computational materials science research produces large amounts of static and time-dependent data, including atomic positions and electron densities, that is rich in information. Determining underlying processes and mechanisms from this data, visualizing it in a comprehensive way, and using it to teach the broad general public as well as undergraduate students constitutes an important challenge. Over the last few years, our team of SPIN students has used cutting-edge VR headsets and the Looking Glass holographic display to visualize electron-density data sets using isosurfaces and volume plots. Interaction with the data was implemented using the Leap Motion device or the VR controllers. It is now our goal to form an interdisciplinary team to develop immersive teaching experiences using these platforms. For this, existing Unity code will need to be extended, potentially to include time-dependent visualizations and narration. Experience with Unity and Blender, as well as teaching skills, will be a very helpful plus for this project.
To date, processing of conservation footage has been performed manually by trained volunteers, with each identification requiring verification by a member of the scientific leadership team, a highly time-consuming process. Moreover, obtaining accurate species population estimates from such footage presents a challenge since, for many species, individual animals cannot be identified and assumptions must be made about the number of individuals detected. The objective of the project is to develop deep learning models for high-throughput processing of conservation footage, automating species identification and the tracking of individual animals in order to produce more precise population estimates for wildlife and marine species.
Skills desired: TensorFlow, image analysis, basic Linux
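As a rough illustration of frame-by-frame inference over footage (a minimal sketch, not the project's pipeline), the loop below assumes a TensorFlow object-detection SavedModel already trained on the species of interest; the model path, video file, and downstream tracker are placeholders:

import cv2
import numpy as np
import tensorflow as tf

detector = tf.saved_model.load("species_detector/saved_model")   # hypothetical trained model

video = cv2.VideoCapture("camera_trap_clip.mp4")
while True:
    ok, frame = video.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    batch = tf.convert_to_tensor(rgb[np.newaxis, ...], dtype=tf.uint8)
    detections = detector(batch)   # boxes/classes/scores for this frame
    # ...pass detections to a tracker to follow individuals across frames and count them
video.release()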
Super-resolution ultrasound localization microscopy (ULM) has great potential as a medical imaging technology due to its unique combination of imaging penetration and spatial resolution. Capturing a ULM image takes an excessive amount of time, which has so far prevented deployment of the technology in a clinical setting. This project proposes using deep learning to develop a shareable microvascular database that shortens data acquisition and post-processing time to facilitate faster technique development and ultimately achieve the clinical translation of ULM.
The project centers on DISSCO, software for composition, sound design, and music notation/printing developed at Illinois and Argonne National Laboratory. Written in C++, it includes a graphical user interface built with gtkmm; a parallel version is being developed at the San Diego Supercomputer Center. DISSCO has a directed graph structure and uses stochastic distributions, sieves (from number theory), and elements of information theory to produce musical compositions. Presently, efforts are directed toward refining a system for the notation of music, as well as toward the realization of an evolving entity: a composition whose aspects change when computed recursively over long periods of time, thus mirroring the way living organisms are transformed over time (artificial life).
Another possible direction of research is sonification, the aural rendition of complex computer-generated data.
Skills desired: proficiency in C++ programming, familiarity with the Linux operating system; familiarity with music notation preferred but not required
Professor John Toenjes is creating a virtual reality dance adventure game in order to investigate how to create new forms of contemporary dance theater. This adventure game will hopefully incorporate motion data capture, storage, and recall in order to build a community of participants in the game, as well as allow humanities scholars to access the data outside of the game. He is looking for creative minds who are interested in game design and production. This project needs an undergraduate who could help with basic programming in Unity 3D and/or Unreal Engine. Knowledge of C++ is a plus; alternatively, someone who can create 3D graphic elements would be very useful to this research project. If you have other skills that would be applicable to the project he would love to hear about them in your application.
The Data Exploration Lab (DXL) is seeking a student interested in interactive 3D data visualization. The student will work with members of the DXL team to further develop new interactive visualization tools within the open source yt platform. The main development pathways include a number of enhancements to the yt_idv package, a new Python package based on imgui and OpenGL for interactive 3D visualization of yt data, and a yt plugin for the napari visualization package. Specific goals for the yt_idv enhancements include UI transfer function control, UI vertex/fragment shader control, and depth-buffer accumulation for blending visualization objects. The work requires a strong foundation in Python, including familiarity with packages in the SciPy stack, as well as proficiency in using Git and working in Linux environments. Familiarity with visualization methods and the C, C++, or GLSL languages is a plus, but not required. In addition to gaining hands-on experience with 3D rendering methods and OpenGL graphics processing, the student will work on scientific communication skills through occasional blog articles submitted to the yt blog. The student will participate in regular meetings with supervising DXL team members as well as weekly group meetings with the wider DXL team.
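As a rough illustration of the napari side of this work (a minimal sketch, not the plugin itself), napari can display a 3D volume interactively; the volume below is a random placeholder rather than an actual yt dataset:

import numpy as np
import napari

volume = np.random.random((64, 64, 64)).astype(np.float32)   # placeholder for a yt-derived grid

viewer = napari.Viewer(ndisplay=3)                            # open the viewer in 3D mode
viewer.add_image(volume, name="density", rendering="mip", colormap="viridis")
napari.run()                                                  # start the interactive event loop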
Modern financial markets generate vast quantities of data. As the data environment has become increasingly "big" and analyses increasingly computerized, the information that different market participants extract and use has grown more varied and diverse. At one extreme, high-frequency traders (HFTs) implement ultra-minimalist algorithms optimized for speed. At the other extreme, some industry practitioners apply sophisticated machine-learning techniques that take minutes, hours, or days to run. The proposed project seeks to understand this full spectrum of machine-based trading, with the purpose of informing public policy and augmenting theoretical studies of financial markets. The research agenda focuses on three main themes. 1) Taxonomy: developing methodologies for estimating the amount of trading activity due to traders at each horizon, from proprietary as well as publicly available data. 2) Machine-Machine Interaction: how do interactions among "cyber-traders" impact markets? Under what conditions do such interactions produce extreme disruptions like the Flash Crash of 2010? 3) Machine-Human Interaction: does machine-based trading mitigate the effects of human behavioral biases, or exacerbate them? Do the algorithms themselves introduce any novel types of biases? How do microstructure effects impact larger-scale outcomes for asset pricing and corporate finance? The proposed project will also organize six workshops on big-data research in finance, supported by NBER and the Extreme Science and Engineering Discovery Environment (XSEDE), to stimulate collaboration between financial economists and experts in high-performance computing (HPC) and big data.
Please see Dr. Mao Ye's keynote speech at the National Bureau of Economic Research for more information.
We are looking for a student with excellent English writing skills. Programming skills are a plus, but not necessary.