Benchmark datasets, data loaders, and evaluators for graph machine learning
This release introduces the following two:
ogbl-vessel dataset (described here) @jqmcginnis
We have included two updates:
Thanks to the DGL Team, all the LSC data is now hosted on AWS. This significantly improves the download speed around the globe! The underlying data stays exactly the same.
This release includes the three large-scale datasets for OGB-LSC at KDD Cup 2021. Details of the datasets and the KDD Cup can be found here.
Dataset downloads now use HTTP instead of HTTPS.
This version makes a major change to ogbg-code.
ogbg-code has been deprecated due to leakage of the prediction target (i.e., the method name) in the input AST.
ogbg-code2 has been introduced to fix the issue: the method name and its recursive definition in the AST are replaced with a special token _mask_.
We sincerely thank Charles Sutton (@casutton) for finding the data leakage in our dataset.
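To make the masking idea concrete, here is a minimal sketch using Python's standard `ast` module (Python 3.9+ for `ast.unparse`). It is not the preprocessing used to build ogbg-code2; the class `MaskMethodName` and the function `mask_method` are hypothetical names introduced only for illustration, and the sketch assumes the source starts with a single function definition.

```python
import ast

MASK = "_mask_"  # special token named in the release note


class MaskMethodName(ast.NodeTransformer):
    """Hypothetical helper: replace every occurrence of one method name."""

    def __init__(self, target):
        self.target = target

    def visit_FunctionDef(self, node):
        # Visit the body first so recursive calls inside it are masked too.
        self.generic_visit(node)
        if node.name == self.target:
            node.name = MASK
        return node

    def visit_Name(self, node):
        # Plain references such as a recursive call `fact(n - 1)`.
        if node.id == self.target:
            node.id = MASK
        return node

    def visit_Attribute(self, node):
        # Attribute references such as `self.fact(n - 1)`.
        self.generic_visit(node)
        if node.attr == self.target:
            node.attr = MASK
        return node


def mask_method(source):
    """Mask the name of the first function in `source` and return new code."""
    tree = ast.parse(source)
    func = tree.body[0]  # assumption: the first statement is the method
    masked = MaskMethodName(func.name).visit(tree)
    return ast.unparse(masked)


example = "def fact(n):\n    return 1 if n <= 1 else n * fact(n - 1)\n"
masked_example = mask_method(example)
print(masked_example)
```

Without this step, a model could read the target method name directly off the definition node or a recursive call, which is the kind of leakage the note describes.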
This release fixes a bug in the negative samples of ogbl-wikikg and ogbl-citation, and releases new versions of them: ogbl-wikikg2 and ogbl-citation2. The old versions are deprecated.
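For code that still refers to the deprecated names, a small lookup like the one below can redirect to the replacements listed in these notes. This is a sketch only: `REPLACEMENTS` and `resolve_dataset_name` are hypothetical and not part of the OGB package.

```python
import warnings

# Deprecated dataset name -> replacement, as listed in these release notes.
REPLACEMENTS = {
    "ogbg-code": "ogbg-code2",
    "ogbl-wikikg": "ogbl-wikikg2",
    "ogbl-citation": "ogbl-citation2",
}


def resolve_dataset_name(name):
    """Return the current dataset name, warning if `name` is deprecated."""
    replacement = REPLACEMENTS.get(name)
    if replacement is None:
        return name  # not known to be deprecated; pass through unchanged
    warnings.warn(
        f"{name} is deprecated; use {replacement} instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return replacement
```

The resolved name can then be passed to whichever dataset loader the project already uses.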
This release enhances the OGB package in the following ways.
Made ogbn-papers100M data loading more tractable by using compressed binary files: https://github.com/snap-stanford/ogb/issues/46