Emory BMI GSoC Project Ideas
Welcome to the Department of Biomedical Informatics at Emory University (often stylized "Emory BMI" for GSoC communications). Emory BMI is committed to the open-source development of several biomedical informatics research projects. As a research organization, its source code lives across several open-source project repositories, released under open-source licenses including the BSD 3-Clause License and the MIT License. Most of them can be accessed from the GitHub repositories of the research labs of Emory University School of Medicine: https://github.com/NISYSLAB
Celebrating a Decade of Impact: From our first steps in GSoC 2012 through the milestones of 2023, Emory BMI has thrived, welcoming 6 contributors in 2023, 8 in 2022, and 6 in 2021. As we gear up for GSoC 2024, we're excited to mentor a new wave of contributors. Emory BMI takes pride in seeing past GSoC contributors become long-term collaborators and mentors themselves. We also encourage contributors to collaborate further towards research outcomes alongside their coding, as most of our GSoC projects include a fair amount of research.
Commitment to Diversity and Inclusion: At Emory BMI, we ardently believe in the power of diversity to drive innovation. We are dedicated to creating an inclusive environment that encourages participation from all backgrounds, especially aiming to empower women and minorities in science and technology. Your unique perspectives are invaluable to us, and we invite you to join our diverse community of thinkers and creators.
We have been using Slack as the primary medium of communication. Because the free version of Slack now has limited features, we are gradually moving away from it and asking contributors to communicate via the discussion forums of each project and this central repository instead. These public discussion forums also let contributors reach a larger audience and make our open-source development discussions more transparent.
Please refer to the contributor guidelines for more details on how to apply and a standard template for the application. The ideas list is given below.
Discuss the project on the project's discussion forum (as listed below under each project idea), and once you are ready to submit your application, use the template below. You must submit your application directly using the GSoC Program Site. If you have a project idea that is relevant for Emory Biomedical Informatics but is not listed here, feel free to consult the mentors to discuss your idea. The ideas listed below are open to interpretation. Feel free to contact the mentors with clarifications, questions, or alternative suggestions.
The ideas are marked easy, medium, or hard in difficulty level. They are also tagged 90, 175, or 350 hours, representing small, medium, and large projects, respectively.
[1] Development of an Open-Source EEG Foundation Model
Mentor(s): Mahmoud Zeydabadinezhad (mzeydab -at- emory.edu) and Babak Mahmoudi, PhD
Overview:
Current Status: New project.
Expected Outcomes:
Required Skills:
Source Code: New Project
Discussion Forum: https://github.com/NISYSLAB/Emory-BMI-GSoC/discussions
Effort: 350 Hours
Difficulty Level: Hard
[2] Health-AI Ethics Atlas
Mentor(s): Selen Bozkurt (with possible other collaborators from Emory and Stanford)
Overview:
Key features:
Current Status: New project.
Expected Outcomes:
Required Skills:
Source Code: New Project.
Discussion Forum: https://github.com/NISYSLAB/Emory-BMI-GSoC/discussions/29
Effort: 350 Hours
Difficulty Level: Medium
[3] Python Expansion of the Open Source Electrophysiological Toolbox
Mentor(s): Reza Sameni, PhD (rsameni -at- dbmi.emory.edu)
Overview:
Current Status: Ongoing project.
Expected Outcomes:
Key features:
Required Skills:
Source Code: https://github.com/alphanumericslab/OSET
Discussion Forum: https://github.com/NISYSLAB/Emory-BMI-GSoC/discussions
Effort: 350 Hours
Difficulty Level: Medium
[4] A Framework for Unsupervised Deep Clustering
Mentor(s): Mahmoud Zeydabadinezhad (mzeydab -at- emory.edu) and Babak Mahmoudi, PhD
Overview:
The objective of this project is to develop an open-source framework that utilizes unsupervised deep-learning techniques for data clustering. The framework should be capable of handling low-resource scenarios and be able to scale to large datasets.
Current Status: New project.
Expected Outcomes: The proposed framework will utilize unsupervised deep learning techniques for data clustering. Specifically, the framework will be based on autoencoder networks, which can be trained to learn a compact, low-dimensional data representation. The encoded data will then be used as input to a clustering algorithm. To handle low-resource scenarios, the framework will also incorporate techniques such as active learning and transfer learning. The framework will be implemented in an open-source programming language, such as Python, and will be made publicly available on a platform such as GitHub.
Literature Review: Research and review existing unsupervised deep clustering methods for medical data, specifically EEG data. Identify the current limitations and challenges in this field.
Data Preprocessing: Develop preprocessing techniques to prepare EEG data for unsupervised deep clustering. This includes filtering, denoising, and feature extraction.
Clustering Framework: Design and develop an unsupervised deep clustering framework that can handle medical data, specifically EEG data. This framework should be able to extract meaningful patterns and insights from the data in an unsupervised manner.
Evaluation: Evaluate the performance of the developed framework on publicly available electroencephalography (EEG) data, using a variety of metrics and benchmarks such as normalized mutual information. Compare the results to existing state-of-the-art unsupervised deep clustering methods for medical data.
Deployment: Create an open-source implementation of the developed framework and make it available for the community to use.
Applications: Explore the potential applications of the developed framework on various medical data, specifically EEG data. This includes but is not limited to, epilepsy diagnosis, brain-computer interface, and sleep stage classification.
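The Evaluation item above names normalized mutual information (NMI) as one candidate metric. As a rough illustration of what that metric measures, here is a minimal pure-Python sketch. It assumes ground-truth labels are available for the benchmark data and uses the arithmetic-mean normalization, 2·I(U;V) / (H(U) + H(V)); the function name and the toy labels are hypothetical, and an actual project would more likely rely on an established library implementation.

```python
import math
from collections import Counter


def normalized_mutual_information(labels_true, labels_pred):
    """NMI between two labelings, normalized by the mean of their entropies."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    if n == 0:
        raise ValueError("label sequences must not be empty")

    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    count_joint = Counter(zip(labels_true, labels_pred))

    def entropy(counts):
        # Shannon entropy (natural log) of the empirical distribution.
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    h_true = entropy(count_true)
    h_pred = entropy(count_pred)

    # Mutual information from the joint and marginal counts.
    mutual_info = 0.0
    for (t, p), c in count_joint.items():
        mutual_info += (c / n) * math.log(c * n / (count_true[t] * count_pred[p]))

    if h_true + h_pred == 0.0:
        # Assumed convention: two single-cluster labelings agree perfectly.
        return 1.0
    return 2.0 * mutual_info / (h_true + h_pred)


# Cluster IDs are arbitrary, so a relabeled but identical partition scores 1.
print(normalized_mutual_information([0, 0, 1, 1], [1, 1, 0, 0]))
```

NMI is invariant to how cluster IDs are numbered, which is why it suits unsupervised methods whose cluster indices carry no inherent meaning.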
Required Skills: Python and deep learning
Source Code: New Project
Discussion Forum: https://github.com/NISYSLAB/Emory-BMI-GSoC/discussions
Effort: 350 Hours
Difficulty Level: Hard
[5] A graphical user interface of Foundational Model Toolbox for Image Segmentation
Mentor(s): Ozgur Kara and Babak Mahmoudi, PhD
Overview:
Key Features:
GUI Integration of Foundational Models: The GUI will be designed to incorporate several foundational models, providing users with a range of options to choose from based on their specific requirements.
Dataset Compatibility: It will support various datasets, including those commonly used in image segmentation tasks. The ability to handle different data formats and sizes is a crucial aspect of the GUI.
Scalability to MRI Datasets: A significant feature of this project is the GUI's capability to scale and perform efficiently with MRI datasets. This involves handling high-resolution images and complex data structures typical in medical imaging.
User-Centric Design: Emphasis will be placed on creating an intuitive and accessible interface. This involves clear navigation, real-time feedback mechanisms, and visualization tools that allow users to interact with the models and data effectively.
Extensibility and Customization: The GUI will be designed with extensibility in mind, allowing for the future incorporation of additional models, features, and dataset types. Customization options will enable users to tailor the tool to their specific needs.
Literature Review: Comprehensive analysis of existing foundational models suitable for image segmentation.
Well-Written Documentation: Detailed installation and setup guide for the GUI; user manual explaining GUI features, tools, and navigation; examples and tutorials for common tasks like loading datasets, model training, and image segmentation; technical documentation for developers covering code structure, API usage, and extending the GUI functionality.
Open Source Codes and Demo: Regularly updated code repository on GitHub with source code, dependencies, and installation scripts; demo version of the GUI showcasing its key features and capabilities.
Required Skills:
Core Technical Skills:
Advanced Python Programming: Proficiency in Python, with a strong understanding of software development best practices, including version control (Git), debugging, and code organization.
Deep Learning Frameworks: Extensive experience with deep learning frameworks, preferably PyTorch.
Foundational Model Expertise:
Self-Supervised Learning: Understanding of self-supervised learning principles and techniques, especially as they apply to large-scale foundational models.
Foundational Model Implementation: Experience in implementing, training, and fine-tuning foundational models, with an emphasis on adaptability to various tasks.
Domain Knowledge:
Image Segmentation Techniques: Proficiency in image segmentation techniques, including traditional methods and deep learning approaches.
Medical Image Processing: Familiarity with medical imaging datasets, particularly MRI, and understanding of unique challenges in medical image segmentation.
Signal Processing: Knowledge of signal processing, particularly as it relates to image data.
Additional Skills:
UI/UX Design: Skills in UI/UX design for developing an intuitive and user-friendly graphical interface.
Documentation and Communication: Excellent documentation skills for creating clear and comprehensive user manuals and technical guides. Effective communication skills for collaborating with team members and interacting with the open-source community.
Desirable Additional Qualifications:
Contributions to Open Source: Previous contributions to open-source projects, especially in related domains, are highly desirable.
Research Experience: Experience in research, particularly in areas related to machine learning, image segmentation, or medical imaging.
Source Code: New Project
Discussion Forum: https://github.com/NISYSLAB/Emory-BMI-GSoC/discussions
Effort: 350 Hours
Difficulty Level: Hard
[6] Auto-detect coverage bounding boxes for brain MRI images
Mentor(s): Puneet Sharma and Tony Pan (tony.pan -at- emory.edu)
Overview: In this project, we intend to assess whether a series of MR images encompasses the anatomy of interest, specifically for brain regions. The contributor will develop a methodology to auto-detect and measure the extent of anatomical coverage on brain MR images and determine whether it complies with the expected "bounding boxes" set forth by pre-defined protocol constraints. For example, a protocol may require specific anatomical coverage (e.g., top of the head to ~C2/3 vertebral bodies, with left ear-to-right ear extent), which must be met by the acquired MR image data. This module will be part of a larger pipeline to assess overall quality and compliance on MRI modalities.
Current Status: This is a new module to run on DICOM images, specifically brain MRI images in DICOM format. This module will execute on DICOM images acquired in real-time or on-demand from a PACS. For testing purposes during the application period and early stages of development, brain MRI images obtained from public data sources such as the Cancer Imaging Archive (TCIA) can be used.
Expected Outcomes: We expect a versatile image processing methodology to auto-detect coverage bounding boxes on brain MR images based on stated anatomic landmarks. The algorithm should output a binary compliance score based on comparing the expected "bounding box" with the actual MR image series. The algorithm will be trained on brain (or other body part) data and tested on additional data of the same body part for validation and accuracy.
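To make the idea of a binary compliance score concrete, the following is a minimal 2D sketch, not the intended method. It assumes some upstream step (hypothetical here) has already produced a binary mask of the detected anatomy for one slice, and that the protocol's requirement can be expressed as a box in the same pixel coordinates. The function names and the toy mask are illustrative only; a real solution would work on 3D DICOM series and in patient coordinates.

```python
def mask_bounding_box(mask):
    """Return (row_min, row_max, col_min, col_max) of the nonzero pixels.

    `mask` is a 2D list of 0/1 values; returns None when the mask is empty.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, value in enumerate(row) if value]
    if not rows:
        return None
    return (min(rows), max(rows), min(cols), max(cols))


def compliance_score(detected_box, required_box):
    """Return 1 if detected_box fully contains required_box, else 0."""
    if detected_box is None:
        return 0
    d_r0, d_r1, d_c0, d_c1 = detected_box
    q_r0, q_r1, q_c0, q_c1 = required_box
    contained = (d_r0 <= q_r0 and d_r1 >= q_r1
                 and d_c0 <= q_c0 and d_c1 >= q_c1)
    return 1 if contained else 0


# Toy 5x5 mask standing in for one segmented slice (hypothetical data).
toy_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
detected = mask_bounding_box(toy_mask)
print(detected, compliance_score(detected, (2, 3, 1, 2)))
```

Keeping detection and the compliance check in separate functions would let the same check be reused if the detection step is later replaced by a learned model.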
Required Skills: Python. Prior experience with DICOM and basic human anatomy would be a plus.
Code Challenge: Experience working with machine learning libraries and similar problems is expected. Candidates are encouraged to include links/pointers to code samples or similar projects to highlight their experience in their proposal.
Source Code: New Project
Discussion Forum: https://github.com/NISYSLAB/Emory-BMI-GSoC/discussions
Effort: 90 Hours
Difficulty Level: Medium
[7] Development of a Graphical User Interface for Time Series Toolbox Using Deep Learning
Mentor(s): Ozgur Kara and Mahmoud Zeydabadinezhad (mzeydab -at- emory.edu)
Overview:
Current Status: New project.
Expected Outcomes:
Key features:
User-Friendly Interface: A primary goal is to create an intuitive interface that abstracts the complexity of underlying deep learning operations.
Deep Learning Integration: Incorporation of advanced algorithms tailored for time series data.
Automatic Feature Extraction: Automated processes for detecting and extracting significant features from time series datasets.
Visualization Tools: Tools for visualizing both the raw time series data and the features extracted, aiding in analysis and interpretation.
A report detailing the methodologies used and the performance of the model.
The model weights and code will be made publicly available on a platform such as GitHub.
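As a point of reference for the Automatic Feature Extraction and Visualization Tools items above, the sketch below computes a few classical hand-crafted features from a single time series. This is a simple non-deep-learning baseline and not the learned feature extraction the project targets; the function name, the choice of features, and the toy signal are all assumptions made for illustration.

```python
import math


def extract_features(signal):
    """Compute basic summary features of a 1D sequence of numbers."""
    n = len(signal)
    if n < 2:
        raise ValueError("signal must contain at least two samples")

    mean = sum(signal) / n
    variance = sum((x - mean) ** 2 for x in signal) / n  # population variance

    # Count sign changes of the mean-centered signal.
    centered = [x - mean for x in signal]
    zero_crossings = sum(
        1 for a, b in zip(centered, centered[1:]) if a * b < 0
    )

    # Line length: total absolute change between consecutive samples.
    line_length = sum(abs(b - a) for a, b in zip(signal, signal[1:]))

    return {
        "mean": mean,
        "std": math.sqrt(variance),
        "min": min(signal),
        "max": max(signal),
        "zero_crossings": zero_crossings,
        "line_length": line_length,
    }


# Toy alternating signal (hypothetical data).
features = extract_features([1.0, -1.0, 1.0, -1.0])
print(features)
```

A dictionary of named features like this could feed directly into a table or plot in the GUI, and gives a baseline against which learned features can be compared.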
Required Skills:
Source Code: New Project
Discussion Forum: https://github.com/NISYSLAB/Emory-BMI-GSoC/discussions
Effort: 350 Hours
Difficulty Level: Medium