Partridge Save

A fast, forgiving GTFS reader built on pandas DataFrames

Project README

========= Partridge

.. image:: https://img.shields.io/pypi/v/partridge.svg :target: https://pypi.python.org/pypi/partridge

.. image:: https://img.shields.io/travis/remix/partridge.svg :target: https://travis-ci.org/remix/partridge

Partridge is a Python 3.6+ library for working with GTFS <https://developers.google.com/transit/gtfs/>__ feeds using pandas <https://pandas.pydata.org/>__ DataFrames.

Partridge is heavily influenced by our experience at Remix <https://www.remix.com/>__ analyzing and debugging every GTFS feed we could find.

At the core of Partridge is a dependency graph rooted at trips.txt. Disconnected data is pruned away according to this graph when reading the contents of a feed.

Feeds can also be filtered to create a view specific to your needs. It's most common to filter a feed down to specific dates (service_id) or routes (route_id), but any field can be filtered.

.. figure:: dependency-graph.png :alt: dependency graph

Philosophy

The design of Partridge is guided by the following principles:

As much as possible

  • Favor speed
  • Allow for extension
  • Succeed lazily on expensive paths
  • Fail eagerly on inexpensive paths

As little as possible

  • Do anything other than efficiently read GTFS files into DataFrames
  • Take an opinion on the GTFS spec

Installation

.. code:: console

pip install partridge

GeoPandas support

.. code:: console

pip install partridge[full]

Usage

Setup

.. code:: python

import partridge as ptg

inpath = 'path/to/caltrain-2017-07-24/'

Examples

The following is a collection of gists containing Jupyter notebooks with transformations to GTFS feeds that may be useful for intake into software applications.

  • Find the busiest week in a feed and reduce its file size <https://gist.github.com/csb19815/aadef16178dfcb5ba7a8d88fbf718749>_
  • Combine routes by route_short_name <https://gist.github.com/csb19815/67c0247d1eed2286ca0b323a02a1179f>_
  • Merge GTFS with shapefile geometries <https://gist.github.com/csb19815/535ddb5d36a081abac3430f1a58bd875>_
  • Merge multiple agencies into one <https://gist.github.com/csb19815/682e0f6f30844313213fa5715e48df8c>_
  • Rewrite a feed to clean up formatting issues <https://gist.github.com/csb19815/659c8eba4742cc3f1b8f23d66a760a0c>_
  • If a feed has stop_code, replace the contents of stop_id with the contents of stop_code <https://gist.github.com/csb19815/5bf7923ffb1ce7ec155ac9a94a83ea70>_
  • Diff the number of service hours in two feeds <https://gist.github.com/csb19815/476335cb299ddb3d5a1a4b898424bb35>_
  • Investigate the the distance in meters of each stop to the closest point on a shape <https://gist.github.com/sgoel/bff9384129974967817404abe80e7c6a>_
  • Convert frequencies.txt to an equivalent trips.txt <https://gist.github.com/invisiblefunnel/6c9f3a9b537d3f0ad192c24777b6ae57>_
  • Calculate headway for a stop <https://gist.github.com/invisiblefunnel/6015e65684325281e65fa9339a78229b>_

Inspecting the calendar



**The date with the most trips**

.. code:: python

    date, service_ids = ptg.read_busiest_date(inpath)
    #  datetime.date(2017, 7, 17), frozenset({'CT-17JUL-Combo-Weekday-01'})


**The week with the most trips**


.. code:: python

    service_ids_by_date = ptg.read_busiest_week(inpath)
    #  {datetime.date(2017, 7, 17): frozenset({'CT-17JUL-Combo-Weekday-01'}),
    #   datetime.date(2017, 7, 18): frozenset({'CT-17JUL-Combo-Weekday-01'}),
    #   datetime.date(2017, 7, 19): frozenset({'CT-17JUL-Combo-Weekday-01'}),
    #   datetime.date(2017, 7, 20): frozenset({'CT-17JUL-Combo-Weekday-01'}),
    #   datetime.date(2017, 7, 21): frozenset({'CT-17JUL-Combo-Weekday-01'}),
    #   datetime.date(2017, 7, 22): frozenset({'CT-17JUL-Caltrain-Saturday-03'}),
    #   datetime.date(2017, 7, 23): frozenset({'CT-17JUL-Caltrain-Sunday-01'})}


**Dates with active service**

.. code:: python

    service_ids_by_date = ptg.read_service_ids_by_date(path)

    date, service_ids = min(service_ids_by_date.items())
    #  datetime.date(2017, 7, 15), frozenset({'CT-17JUL-Caltrain-Saturday-03'})

    date, service_ids = max(service_ids_by_date.items())
    #  datetime.date(2019, 7, 20), frozenset({'CT-17JUL-Caltrain-Saturday-03'})


**Dates with identical service**


.. code:: python

    dates_by_service_ids = ptg.read_dates_by_service_ids(inpath)

    busiest_date, busiest_service = ptg.read_busiest_date(inpath)
    dates = dates_by_service_ids[busiest_service]

    min(dates), max(dates)
    #  datetime.date(2017, 7, 17), datetime.date(2019, 7, 19)


Reading a feed
~~~~~~~~~~~~~~


.. code:: python

    _date, service_ids = ptg.read_busiest_date(inpath)

    view = {
        'trips.txt': {'service_id': service_ids},
        'stops.txt': {'stop_name': 'Gilroy Caltrain'},
    }

    feed = ptg.load_feed(path, view)


**Read shapes and stops as GeoDataFrames**

.. code:: python

    service_ids = ptg.read_busiest_date(inpath)[1]
    view = {'trips.txt': {'service_id': service_ids}}

    feed = ptg.load_geo_feed(path, view)

    feed.shapes.head()
    #       shape_id                                           geometry
    #  0  cal_gil_sf  LINESTRING (-121.5661454200744 37.003512297983...
    #  1  cal_sf_gil  LINESTRING (-122.3944115638733 37.776439059278...
    #  2   cal_sf_sj  LINESTRING (-122.3944115638733 37.776439059278...
    #  3  cal_sf_tam  LINESTRING (-122.3944115638733 37.776439059278...
    #  4   cal_sj_sf  LINESTRING (-121.9031703472137 37.330157067882...

    minlon, minlat, maxlon, maxlat = feed.stops.total_bounds
    #  -122.412076, 37.003485, -121.566088, 37.77639


Extracting a new feed
~~~~~~~~~~~~~~~~~~~~~

.. code:: python

    outpath = 'gtfs-slim.zip'

    service_ids = ptg.read_busiest_date(inpath)[1]
    view = {'trips.txt': {'service_id': service_ids}}

    ptg.extract_feed(inpath, outpath, view)
    feed = ptg.load_feed(outpath)

    assert service_ids == set(feed.trips.service_id)


Features
--------

-  Surprisingly fast :)
-  Load only what you need into memory
-  Built-in support for resolving service dates
-  Easily extended to support fields and files outside the official spec
   (TODO: document this)
-  Handle nested folders and bad data in zips
-  Predictable type conversions

Thank You
---------

I hope you find this library useful. If you have suggestions for
improving Partridge, please open an `issue on
GitHub <https://github.com/remix/partridge/issues>`__.
Open Source Agenda is not affiliated with "Partridge" Project. README Source: remix/partridge
Stars
146
Open Issues
6
Last Commit
5 months ago
Repository
License
MIT

Open Source Agenda Badge

Open Source Agenda Rating