The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
.. readme-intro

.. image:: https://www.repostatus.org/badges/latest/active.svg
   :target: https://www.repostatus.org/#active
   :alt: Project Status: Active
.. image:: https://github.com/catalyst-cooperative/pudl/workflows/pytest/badge.svg
   :target: https://github.com/catalyst-cooperative/pudl/actions?query=workflow%3Apytest
   :alt: PyTest Status
.. image:: https://img.shields.io/codecov/c/github/catalyst-cooperative/pudl?style=flat&logo=codecov
   :target: https://codecov.io/gh/catalyst-cooperative/pudl
   :alt: Codecov Test Coverage
.. image:: https://img.shields.io/readthedocs/catalystcoop-pudl?style=flat&logo=readthedocs
   :target: https://catalystcoop-pudl.readthedocs.io/en/latest/
   :alt: Read the Docs Build Status
.. image:: https://img.shields.io/badge/code%20style-black-000000.svg
   :target: https://github.com/psf/black
   :alt: Any color you want, so long as it's black.
.. image:: https://results.pre-commit.ci/badge/github/catalyst-cooperative/pudl/main.svg
   :target: https://results.pre-commit.ci/latest/github/catalyst-cooperative/pudl/main
   :alt: pre-commit CI
.. image:: https://zenodo.org/badge/80646423.svg
   :target: https://zenodo.org/badge/latestdoi/80646423
   :alt: Zenodo DOI
.. image:: https://img.shields.io/badge/calend.ly-officehours-darkgreen
   :target: https://calend.ly/catalyst-cooperative/pudl-office-hours
   :alt: Schedule a 1-on-1 chat with us about PUDL.

The `PUDL <https://catalyst.coop/pudl/>`__ Project is an open source data processing
pipeline that makes US energy data easier to access and use programmatically.
Hundreds of gigabytes of valuable data are published by US government agencies, but it's often difficult to work with. PUDL takes the original spreadsheets, CSV files, and databases and turns them into a unified resource. This allows users to spend more time on novel analysis and less time on data preparation.
The project is focused on serving researchers, activists, journalists, policy makers, and small businesses that might not otherwise be able to afford access to this data from commercial sources and who may not have the time or expertise to do all the data processing themselves from scratch.
We want to make this data accessible and easy to work with for as wide an audience as possible: anyone from grassroots youth climate organizers working with Google Sheets to university researchers with access to scalable cloud computing resources, and everyone in between!
PUDL has three core components:
Raw Data Archives
^^^^^^^^^^^^^^^^^
PUDL `archives <https://github.com/catalyst-cooperative/pudl-archiver>`__ all our raw
inputs on `Zenodo <https://zenodo.org/communities/catalyst-cooperative/?page=1&size=20>`__ to ensure
permanent, versioned access to the data. In the event that an agency changes how they
publish data or deletes old files, the data processing pipeline will still have access
to the original inputs. Each of the data inputs may have several different versions
archived, and all are assigned a unique DOI (digital object identifier) and made
available through Zenodo's REST API. You can read more about the Raw Data Archives in
the `docs <https://catalystcoop-pudl.readthedocs.io/en/nightly/#raw-data-archives>`__.
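
Because each archive is exposed through Zenodo's REST API, its file listing can be
retrieved programmatically. The following is a minimal sketch and not part of PUDL
itself: it assumes Zenodo's public ``/api/records/<id>`` endpoint and a JSON response
whose ``files`` entries carry a ``key`` (file name) and a ``links.self`` download URL.
All helper names are hypothetical.

.. code-block:: python

    import json
    from urllib.request import urlopen

    # Assumption: base URL of Zenodo's public records endpoint.
    ZENODO_API = "https://zenodo.org/api/records"
    ZENODO_DOI_PREFIX = "10.5281/zenodo."


    def record_id_from_doi(doi: str) -> str:
        """Extract the numeric record ID from a DOI like 10.5281/zenodo.<id>."""
        if not doi.startswith(ZENODO_DOI_PREFIX):
            raise ValueError(f"Not a Zenodo DOI: {doi}")
        return doi[len(ZENODO_DOI_PREFIX):]


    def record_url(doi: str) -> str:
        """Build the REST API URL for the record behind a Zenodo DOI."""
        return f"{ZENODO_API}/{record_id_from_doi(doi)}"


    def list_files(record: dict) -> list[tuple[str, str]]:
        """Return (file name, download URL) pairs from a record's JSON metadata.

        Assumes each entry in ``files`` has ``key`` and ``links.self`` fields.
        """
        return [(f["key"], f["links"]["self"]) for f in record.get("files", [])]


    def fetch_record(doi: str) -> dict:
        """Download and parse a record's metadata (requires network access)."""
        with urlopen(record_url(doi)) as response:
            return json.load(response)

Calling ``list_files(fetch_record(doi))`` with the DOI of one of the archives would then
yield its file names and download URLs.
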
Data Pipeline
^^^^^^^^^^^^^
The data pipeline (this repo) ingests raw data from the archives, cleans and integrates
it, and writes the resulting tables to `SQLite <https://sqlite.org>`__ and
`Apache Parquet <https://parquet.apache.org/>`__ files, with some accompanying metadata
stored as JSON. Each release of the PUDL software contains a set of DOIs indicating which
versions of the raw inputs it processes. This helps ensure that the outputs are
replicable. You can read more about our ETL (extract, transform, load) process in the
`PUDL documentation <https://catalystcoop-pudl.readthedocs.io/en/nightly/#the-etl-process>`__.
Data Warehouse
^^^^^^^^^^^^^^
The SQLite, Parquet, and JSON outputs from the data pipeline, sometimes called "PUDL
outputs", are updated each night by an automated build process, and periodically
archived so that users can access the data without having to install and run our data
processing system. These outputs contain hundreds of tables and comprise a small
file-based data warehouse that can be used for a variety of energy system analyses.
Learn more about `how to access the PUDL data <https://catalystcoop-pudl.readthedocs.io/en/nightly/data_access.html>`__.
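
Since the SQLite output is an ordinary database file, Python's standard ``sqlite3``
module is enough to start exploring it. This is a minimal sketch that assumes you have
already downloaded the database to a local file. The helper names are hypothetical and
the example makes no assumptions about which tables exist.

.. code-block:: python

    import sqlite3


    def list_tables(conn: sqlite3.Connection) -> list[str]:
        """Return the names of all tables in the database, sorted alphabetically."""
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]


    def row_count(conn: sqlite3.Connection, table: str) -> int:
        """Count the rows in one table.

        Table names cannot be bound as SQL parameters, so the name is checked
        against the tables that actually exist before it is interpolated.
        """
        if table not in list_tables(conn):
            raise ValueError(f"Unknown table: {table}")
        (count,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
        return count

For example, after ``conn = sqlite3.connect("pudl.sqlite")`` (a placeholder path),
``list_tables(conn)`` shows what is available and ``row_count(conn, name)`` gives the
size of any table it lists.
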
PUDL currently integrates data from:

* EIA Form 860:
  `Source Docs <https://www.eia.gov/electricity/data/eia860/>`__,
  `PUDL Docs <https://catalystcoop-pudl.readthedocs.io/en/nightly/data_sources/eia860.html>`__
* EIA Form 860M:
  `Source Docs <https://www.eia.gov/electricity/data/eia860m/>`__
* EIA Form 861:
  `Source Docs <https://www.eia.gov/electricity/data/eia861/>`__,
  `PUDL Docs <https://catalystcoop-pudl.readthedocs.io/en/nightly/data_sources/eia861.html>`__
* EIA Form 923:
  `Source Docs <https://www.eia.gov/electricity/data/eia923/>`__,
  `PUDL Docs <https://catalystcoop-pudl.readthedocs.io/en/nightly/data_sources/eia923.html>`__
* EPA CEMS:
  `Source Docs <https://campd.epa.gov/>`__,
  `PUDL Docs <https://catalystcoop-pudl.readthedocs.io/en/nightly/data_sources/epacems.html>`__
* FERC Form 1:
  `Source Docs <https://www.ferc.gov/industries-data/electric/general-information/electric-industry-forms/form-1-electric-utility-annual>`__,
  `PUDL Docs <https://catalystcoop-pudl.readthedocs.io/en/nightly/data_sources/ferc1.html>`__
* FERC Form 714:
  `Source Docs <https://www.ferc.gov/industries-data/electric/general-information/electric-industry-forms/form-no-714-annual-electric/data>`__,
  `PUDL Docs <https://catalystcoop-pudl.readthedocs.io/en/nightly/data_sources/ferc714.html>`__
* FERC Form 2:
  `Source Docs <https://www.ferc.gov/industries-data/natural-gas/industry-forms/form-2-2a-3-q-gas-historical-vfp-data>`__
* FERC Form 6:
  `Source Docs <https://www.ferc.gov/general-information-1/oil-industry-forms/form-6-6q-historical-vfp-data>`__
* FERC Form 60:
  `Source Docs <https://www.ferc.gov/form-60-annual-report-centralized-service-companies>`__
* US Census TIGER data:
  `Source Docs <https://www.census.gov/geographies/mapping-files/2010/geo/tiger-data.html>`__

Thanks to support from the `Alfred P. Sloan Foundation Energy & Environment Program <https://sloan.org/programs/research/energy-and-environment>`__, from
2021 to 2024 we will be cleaning and integrating the following data as well:

* `EIA Form 176 <https://www.eia.gov/dnav/ng/TblDefs/NG_DataSources.html#s176>`__
  (The Annual Report of Natural Gas Supply and Disposition)
* `FERC Electric Quarterly Reports (EQR) <https://www.ferc.gov/industries-data/electric/power-sales-and-markets/electric-quarterly-reports-eqr>`__
* `FERC Form 2 <https://www.ferc.gov/industries-data/natural-gas/overview/general-information/natural-gas-industry-forms/form-22a-data>`__
  (Annual Report of Major Natural Gas Companies)
* `PHMSA Natural Gas Annual Report <https://www.phmsa.dot.gov/data-and-statistics/pipeline/gas-distribution-gas-gathering-gas-transmission-hazardous-liquids>`__

For details on how to access PUDL data, see the `data access documentation <https://catalystcoop-pudl.readthedocs.io/en/nightly/data_access.html>`__. A quick
summary:

* `Datasette <https://catalystcoop-pudl.readthedocs.io/en/nightly/data_access.html#-access-datasette>`__
  provides browsable and queryable data from our nightly builds on the web:
  https://data.catalyst.coop
* `Kaggle <https://catalystcoop-pudl.readthedocs.io/en/nightly/data_access.html#access-kaggle>`__
  provides easy Jupyter notebook access to the PUDL data, updated weekly:
  https://www.kaggle.com/datasets/catalystcooperative/pudl-project
* `Zenodo <https://catalystcoop-pudl.readthedocs.io/en/nightly/data_access.html#access-zenodo>`__
  provides stable long-term access to our versioned data releases with a citeable DOI:
  https://doi.org/10.5281/zenodo.3653158
* `Nightly Data Builds <https://catalystcoop-pudl.readthedocs.io/en/nightly/data_access.html#access-nightly-builds>`__
  push their outputs to the AWS Open Data Registry:
  https://registry.opendata.aws/catalyst-cooperative-pudl/
  See the `nightly build docs <https://catalystcoop-pudl.readthedocs.io/en/nightly/data_access.html#access-nightly-builds>`__
  for direct download links.
* `The PUDL Development Environment <https://catalystcoop-pudl.readthedocs.io/en/nightly/dev/dev_setup.html>`__
  lets you run the PUDL data processing pipeline locally.

Find PUDL useful? Want to help make it better? There are lots of ways to help!

* Read our `contribution guide <https://catalystcoop-pudl.readthedocs.io/en/nightly/CONTRIBUTING.html>`__,
  including our `Code of Conduct <https://catalystcoop-pudl.readthedocs.io/en/nightly/code_of_conduct.html>`__.
* Report bugs, request features, or ask questions in the
  `GitHub issue tracker <https://github.com/catalyst-cooperative/pudl/issues>`__.
* `Make a recurring financial contribution <https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=PZBZDFNKBJW5E&source=url>`__
  to support our work liberating public energy data.
* `Hire us to do some custom analysis <https://catalyst.coop/hire-catalyst/>`__ and
  allow us to integrate the resulting code into PUDL.

In general, our code, data, and other work are permissively licensed for use by anybody, for any purpose, so long as you give us credit for the work we've done.

* For software we use `the MIT License <https://opensource.org/licenses/MIT>`__.
* For data, documentation, and other non-software works we use the
  `Creative Commons Attribution License v4.0 <https://creativecommons.org/licenses/by/4.0/>`__
  (CC-BY-4.0).

Ways to get in touch:

* For bug reports and feature requests, open a
  `GitHub Issue <https://github.com/catalyst-cooperative/pudl/issues>`__.
* For general questions and conversation about the project, use
  `GitHub Discussions <https://github.com/catalyst-cooperative/pudl/discussions>`__.
* For occasional project updates,
  `sign up for our email list <https://catalyst.coop/updates/>`__.
* To chat with us one-on-one about PUDL, book
  `Office Hours <https://calend.ly/catalyst-cooperative/pudl-office-hours>`__.
* `Follow us here on GitHub <https://github.com/catalyst-cooperative/>`__
* Mastodon: `@CatalystCoop@mastodon.energy <https://mastodon.energy/@CatalystCoop>`__
* BlueSky: `@catalyst.coop <https://bsky.app/profile/catalyst.coop>`__
* `Follow us on LinkedIn <https://www.linkedin.com/company/catalyst-cooperative/>`__
* `Follow us on HuggingFace <https://huggingface.co/catalystcooperative>`__
* Twitter: `@CatalystCoop <https://twitter.com/CatalystCoop>`__
* `Follow us on Kaggle <https://www.kaggle.com/catalystcooperative/>`__
* Email: `[email protected] <mailto:[email protected]>`__

`Catalyst Cooperative <https://catalyst.coop>`__ is a small group of data wranglers
and policy wonks organized as a worker-owned cooperative consultancy. Our goal is a
more just, livable, and sustainable world. We integrate public data and perform
custom analyses to inform public policy
(`Hire us! <https://catalyst.coop/hire-catalyst>`__). Our focus is primarily on
mitigating climate change and improving electric utility regulation in the United
States.