Medical Datasets

tracking medical datasets, with a focus on medical imaging


List of Medical (Imaging) Datasets

I maintain this list mostly as a personal braindump of interesting medical datasets, with a focus on medical imaging.
Rather than try to group / cluster datasets, I'm going to try to maintain a set of keywords for each.
See commit log for a list of additions over time.
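
Because every entry carries a Keywords line, the list can be filtered mechanically. The following is a minimal sketch, not part of this project: it assumes the plain-text layout used on this page (a name line, a blank line, then description lines ending with a `Keywords:` line), and the helper names `parse_keywords` and `find` are hypothetical.

```python
def parse_keywords(text):
    """Map each dataset name to its set of lower-cased keywords.

    Assumes blocks are separated by blank lines, and that a one-line
    block (the name) precedes the block holding the Keywords line.
    """
    entries = {}
    name = None
    for block in text.strip().split("\n\n"):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue
        tagged = [ln for ln in lines if ln.startswith("Keywords:")]
        if tagged and name is not None:
            raw = tagged[0][len("Keywords:"):]
            entries[name] = {kw.strip().lower() for kw in raw.split(",") if kw.strip()}
        elif len(lines) == 1:
            # A lone line is treated as the name of the next entry.
            name = lines[0]
    return entries


def find(entries, *wanted):
    """Return the sorted names of datasets that carry all wanted keywords."""
    wanted = {w.lower() for w in wanted}
    return sorted(n for n, kws in entries.items() if wanted <= kws)


# Three entries copied from the list, used as sample input.
SAMPLE = """\
CheXpert

224,316 chest radiographs of 65,240 patients, with labels from reports
Keywords: very-large, X-ray, labels

DeepLesion

32000+ CT scans with annotations, meta-data, semantic labels from radiological reports
Keywords: very-large, CT, labels

MRNet

1,370 knee MRI exams with diagnosis (healthy/ACL tear/meniscal tear)
Keywords: large, MRI, labels
"""

entries = parse_keywords(SAMPLE)
print(find(entries, "labels", "very-large"))
```

Matching is case-insensitive and exact per keyword, so spelling variants such as "very large" and "very-large" would be treated as different keywords. Entries without a Keywords line are skipped.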

Please feel free to contribute!

Disclaimer: please remember to solve real clinical problems ☺

Datasets processed by us

Neurite-OASIS

414 T1 MRIs from the OASIS dataset, processed using FreeSurfer and SAMSEG
Includes original images, along with processed volumes and resulting anatomical segmentation maps
Keywords: large, MRI, segmentations, labels, annotations, processed

SynthStrip

Full-head images and ground-truth brain masks from 622 MRI, CT, and PET scans
Includes a landscape of MRI scans with different contrasts, resolutions, and populations from infants to glioblastoma patients
Also includes anatomical segmentation maps for a subset of the images
Keywords: large, diverse, multi-modal, brain masks, segmentations, brain extraction, skull stripping

Main Medical Imaging List

CheXpert

224,316 chest radiographs of 65,240 patients, with labels from reports
Keywords: very-large, X-ray, labels

ChestXray-NIHCC

100000 radiographs
Keywords: very-large, X-ray, labels

MIMIC-CXR

371,920 chest x-rays associated with 227,943 imaging studies
3/16/2019: Not yet linked with MIMIC ICU data. See news article
v2: free-text radiology reports
Need to request access
Keywords: very-large, X-ray, labels

PadChest

160,000 images from 67,000 patients that were interpreted and reported by radiologists
labeled with 174 different radiographic findings, 19 differential diagnoses and 104 anatomic locations organized as a hierarchical taxonomy mapped to standard Unified Medical Language System (UMLS)
Keywords: very-large, X-ray, labels

IBM Xray Eye Gaze

1000+ dataset of eye gaze, radiological reports, dictation, and segmentation on the MIMIC-CXR Database
code to reproduce experiments
Keywords: medium, X-ray, labels

Cancer Imaging Archive

Several collections
Tons of Images of various kinds, including CT, MR, Pathology, PT, with diagnoses
Keywords: very-large, CT, MR, labels

National Lung Screening Trial

Part of Cancer Imaging Archive
50000+ patients with CT data, some pathology, limited availability
Keywords: very-large, CT, labels

DeepLesion

32000+ CT scans with annotations, meta-data, semantic labels from radiological reports
Keywords: very-large, CT, labels

EchoNet-Dynamic

10,000+ labeled echocardiogram videos and human expert tracing
Keywords: very-large, ultrasound, labels

ABCD Neurocognitive Prediction Challenge

MRI for 8500 young (9-10yo) subjects (about 4100 for training)
Keywords: large, MRI

AAPM Sparse-View CT Reconstruction Challenge

4,000 simulated sinogram/image pairs of 2D breast CTs
Keywords: large, CT, reconstruction

Cross-Sectional Multidomain Lexical Processing

two large scale neuroimaging datasets on reading and language development
Over 3000 MRI, fMRI
article | more resources
Keywords: large, MRI

MRNet

1,370 knee MRI exams with diagnosis (healthy/ACL tear/meniscal tear)
Keywords: large, MRI, labels

fastMRI

k-space data
1500 fully sampled knee MRIs, 10K clinical MRIs, and 6.5K brain MRIs.
Part of a challenge
Keywords: large, MRI, k-space

OCMR

Open-Access Multi-Coil k-Space Dataset for Cardiovascular Magnetic Resonance Imaging
k-space data, roughly 250 volumes
Keywords: medium, MRI, k-space

PREVENT-AD

1704 MRI, 556 amyloid and tau CSF samples, blood markers, genetic info and longitudinal cognitive data on ~400 at risk individuals
Keywords: medium, MRI, genetics, labels

Medical Segmentation Decathlon

10 Medical image datasets with segmentations
2000+ CT & MR images of various organs from different sources
Keywords: medium, MRI, segmentations

MASSIVE

Multiple Acquisitions for Standardization of Structural Imaging Validation and Evaluation
8000 diffusion-weighted volumes
10 3D FLAIR, T1-, and T2-weighted datasets of a single healthy subject
Keywords: large, MRI

AOMIC: the Amsterdam Open MRI Collection

1000+ subjects with fMRI and other modalities, with annotated event files; raw and preprocessed
Keywords: medium, fMRI

MRIdata

List of MRI k-space datasets

Cancer Imaging Archive: LDCT

601 series of CT projection data, reconstructed images, and clinical data reports
Keywords: medium, CT, reconstruction

Brain MRI LGG FLAIR abnormality segmentation

Brain MRI images together with manual FLAIR abnormality segmentation masks
110 subjects from TCIA LGG collection with lower-grade glioma cases
Keywords: medium, brain, MRI, segmentation, LGG, FLAIR

Studyforrest

Few subjects, but many modalities (T1,T2,SWI,Angio,DWI, fMRI during Forrest Gump at 3T (audio+visual+eyetracking+physio) and 7T (audio+physio only), some audio tasks, and other important visual tasks)
Keywords: small, multi-modal

Lung Image Database Consortium

LIDC-IDRI consists of diagnostic and lung cancer screening CTs.
1018 cases with some Radiologist Annotations/Segmentations and nodule counts
Also available through LUng Nodule Analysis (LUNA) challenge
Keywords: large, CT, labels

Breast Cancer MRI Dataset

922 breast cancer patients publicly available for machine learning and clinical research.
Contains breast MRIs, clinical, demographics, pathology, treatment, outcomes, and genomic data as well as image annotations (locations) and features.
Keywords: large, MRI, labels

UK Biobank

All imaging
Fundus imaging
Keywords: very-large

OpenOrganelle

high resolution tissue-scale volume electron microscopy (vEM) datasets acquired with the enhanced focused ion beam scanning electron microscopy (FIB-SEM) technology developed at Janelia. Accompanying these EM volumes are automated segmentations and analyses of intracellular sub-structures.
Keywords: very-large, EM, segmentation

BrixIA: COVID19 severity score assessment database

4703 CXR of COVID19 patients, manually annotated Brixia score
Keywords: large, x-ray, covid

COVID-CT

349 CT images collected from several COVID19-related papers
Image captions
Keywords: medium, CT, covid

Pneumonia X-Ray

~5000 xrays
Keywords: medium, x-ray, pneumonia

Medical Imaging Data Resource Center (MIDRC)

998 chest x-ray examinations from 361 COVID+ patients.
Annotations with appearance classification and airspace disease grading
Clinical variables
Keywords: large, x-ray, covid

BIMCV-COVID19

1350+ Xrays, 150+ CTs, 800 diagnoses
Keywords: medium, CT, covid

MosMedData Covid19

1000+ CTs of COVID19 patients
50 are annotated per pixel
Keywords: large, CT, covid, segmentations

COVID-19 LUNG CT LESION SEGMENTATION CHALLENGE

~250 chest CTs with positive RT-PCR SARS-CoV-2, annotations of COVID-19 lesions
Keywords: medium, CT, covid, annotations, segmentations

MedSeg COVID-19 CT

~100 segmented CT slices
Keywords: medium, CT, segmentations, covid

COVID-Chest XRay

~150 xrays, ongoing, some hospital data
Keywords: medium, x-ray, covid

BSTI COVID19

ongoing, about 60 patients at last check, CT
paper pdf
Keywords: medium, CT, covid

Narratives fMRI

345 subjects, 891 functional scans, and 27 diverse stories of varying duration totaling ~4.6 hours of unique stimuli (~43,000 words).
Nature paper
Keywords: medium, fMRI

RICORD

1000 X-rays and 240 CTs with annotations (paper)
Keywords: large, CT, covid, segmentations

FIRE (Fundus Image Registration Dataset)

129 retinal images.
Keywords: small, fundus

DRIVE: Digital Retinal Images for Vessel Extraction

40 retinal images with segmentations
Keywords: small, retinal, segmentations

FLARE: Fast and Low GPU memory Abdominal oRgan sEgmentation

500+ CT scans from 11+ countries with Abdominal Organ Segmentation (the liver, kidney, spleen, and pancreas)
Keywords: large, abdominal, CT

ADNI

Various imaging (longitudinal MRI), Genetics, Clinical data
Several thousand patients
Keywords: large, MRI, genetics, clinical

VISCERAL

~120 image volumes (whole body CT and MRI images)
more than 1900 annotated anatomical structures
Keywords: medium, MRI, CT, whole-body, manual-segmentation

Mindboggle

Seems like 101 manually labelled brain MRIs
Keywords: medium, MRI, brain, manual-segmentation

Cross-Sectional Multidomain Lexical Processing

3000 brain scans (T1w, bold, events)
Standardized tests, scores, demographics
Keywords: large, MRI, fMRI, tests

Duke Breast Cancer Screening DBT

A curated dataset of digital breast tomosynthesis images from 5,060 patients.
Keywords: large, tomosynthesis, DBT, breast, detection

CBIS-DDSM (Curated Breast Imaging Subset of DDSM)

2600+ scanned film mammography studies
Keywords: large, x-ray

Neuromorphometrics

63 manually labelled brain scans. Costs ($1500?) Discussion
Keywords: medium, MRI, brain, manual-segmentation, costly

Automatic Non-rigid Histological Image Registration

This is a challenge for ISBI2019

7-Tesla rs-fMRI

22 participants with cognitive and physiological measures, and 7T rs-fMRI

SpineWeb

200+ subjects across several datasets (CTs, Xrays, MRIs)

Whole-Heart and Great Vessel Segmentation from 3D Cardiovascular MRI in Congenital Heart Disease

20 cardiac MR images in Congenital Heart Disease

Longitudinal Neuroimaging in Children

paper
~50 children (~10yo) with a single follow-up, with MRI, fMRI and assessments
Keywords: medium, fMRI, longitudinal

Longitudinal Neuroimaging on arithmetic processing in children

paper
3T fMRI of 132 typically developing children, 2 time points, four tasks
Keywords: medium, fMRI, longitudinal

Narratives

aggregates auditory story-listening fMRI datasets acquired over the course of roughly seven years
Keywords: medium, fMRI

ATLAS: Anatomical Tracings of Lesions After Stroke

229 T1-weighted MRI scans (n=220) with lesion segmentation
MNI152 standard-space T1-weighted average structural template image
A .csv file containing lesion metadata
paper
Keywords: medium, MRI, segmentations

MITOS_WSI_CMC

21 Canine mammary carcinoma whole slide images.
Annotated by 2/3 experts
Keywords: small, 2D, whole slide imaging

FeTA Dataset

48 manually annotated in utero fetal MR
Keywords: small, mri, fetal, labels

SIMON

Single volunteer, 73 sessions at multiple sites over ~17 years
MRI, at least T1 at each session, with other modalities varying by session.
Phenotype file provided
Keywords: small, MRI, longitudinal

BigBrain

Single volume (histological space, 100 micron) with GM/WM surfaces and cortical layers
ftp://bigbrain.loris.ca | interactive
Keywords: small, histology, high-resolution, segmentations

100 micron MRI of Human Brain

Single volume, ultra-high resolution MRI dataset (100-micron)
Keywords: small, MRI, brain

Natural Scenes Dataset (CMRR initiative)

8-subjects large-scale fMRI (40-sessions, high sampling, high resolution). T1w, T2w, T2*w MRI
Video description
Keywords: small, MRI, brain, fMRI

Brain Catalogue

(ex-vivo) brain MRIs of brains of different animals
Keywords: small, MRI, brain, animals

Multishell diffusion

Diffusion MRI of three healthy traveling adults
Keywords: small, MRI, diffusion, brain

Pre-Natal MRI

Prenatal brain MRI samples (looks like single subject?)
Keywords: small, MRI, fetal

BCNB: Early Breast Cancer Core-Needle Biopsy WSI Dataset

1058 whole slide images (WSIs) with corresponding clinical characteristics
Part of tumor regions are annotated in WSIs.
Clinical characteristics include age, tumor size, tumor type, ER, PR, HER2, HER2 expression, histological grading, surgical, Ki67, molecular subtype, number of lymph node metastases, and ALN status
Paper reference
Keywords: large, breast cancer, multi-modal, WSI, clinical characteristics

BCI: Breast Cancer Immunohistochemical Image Generation Dataset

4870 registered HE-IHC image pairs, covering four expression levels of HER2 (0, 1+, 2+, 3+).
Keywords: large, breast cancer, HE, IHC

Non-imaging

PhysioNet / Pulmonary Edema Severity Grades Based on MIMIC-CXR

This dataset is curated based on MIMIC-CXR, containing 3 metadata files that consist of pulmonary edema severity grades extracted from the MIMIC-CXR dataset through different means: 1) by regular expression (regex) from radiology reports, 2) by expert labeling from radiology reports, and 3) by consensus labeling from chest radiographs.
Keywords: pulmonary edema, severity grades, chest x-ray, radiology reports, MIMIC-CXR

PhysioNet / Computing in Cardiology 2019 Challenge

predict sepsis in an ICU population
5000 ICU patients in three separate hospital systems

eICU-CRD

detailed information about critical care stays for over 200,000 admissions at 200+ hospitals across the US.
With access to MIMIC, can access eICU-CRD immediately after signing an updated DUA.
paper

Non-medical but useful / fun

Moment in time

Other lists or pooling resources (relevant xkcd)

README Source: adalca/medical-datasets