Conformal Time Series

Conformal prediction for time-series applications.

Conformal PID Control for Time-Series Prediction

This repository is about producing prediction sets for time series.

The methods here are guaranteed to achieve coverage on any sequence, even an adversarial one. We take a control-systems view of the task and introduce a method called Conformal PID Control.

Several methods are implemented herein, including online quantile regression (quantile tracking/P control), adaptive conformal prediction, and more!

This codebase makes it easy to extend the methods and to add new datasets. Both are described below.

Getting Started

To reproduce the experiments in our paper, clone this repo and run the following code from the root directory.

conda create --name pid
conda activate pid
pip install -r requirements.txt
cd tests
bash run_tests.sh
bash make_plots.sh

The one exception is the COVID experiment. For that experiment, you must first run the Jupyter notebook at conformal-time-series/tests/datasets/covid-ts-proc/statewide/death-forecasting-perstate-lasso-qr.ipynb. It requires the deaths.csv data file, which you can download from this Drive link.

Adding New Methods

Step 1: Defining the method.
The core/methods.py file contains all methods. Consider the following method header, for the P controller/quantile tracker, as an example:
def quantile(
    scores,
    alpha,
    lr,
    ahead,
    proportional_lr=True,
    *args,
    **kwargs
):

The first three arguments, scores, alpha, and lr, are required for all methods. The first argument, scores, expects a numpy array of conformal scores. The second argument, alpha, is the desired miscoverage rate. The third argument, lr, is the learning rate. (In our paper, this is $\eta$, and in the language of control, this is $K_p$.)

The remaining arguments are specific to the given method. The argument ahead determines how many steps ahead the prediction is made. For example, ahead=4 means we are making 4-step-ahead predictions, where one step is defined by the resolution of the input array scores. The *args and **kwargs parameters allow methods to accept extra arguments passed in dictionary form.

All methods should return a dictionary of results that includes the method name and the sequence of $q_{t}$. In the quantile example, the dictionary should look like the following, where qs is a numpy array of quantiles the same length as scores:
results = {"method": "Quantile", "q": qs}

Methods that follow this format can be processed automatically by our testing infrastructure.
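To make the interface concrete, here is a minimal sketch of a method that follows it. The name my_quantile_tracker is hypothetical and this is not the repository's implementation. It applies the plain quantile-tracking update $q_{t+1} = q_t + \eta(\mathrm{err}_t - \alpha)$, starts from an arbitrary $q_0 = 0$, and assumes that with ahead-step forecasts the newest usable score at time t is the one from time t - ahead + 1. It ignores options such as proportional_lr.

```python
import numpy as np


def my_quantile_tracker(scores, alpha, lr, ahead, *args, **kwargs):
    """Hypothetical example method; not the repository's implementation."""
    T = scores.shape[0]
    qs = np.zeros(T)  # q_0 = 0 is an arbitrary starting point (assumption)
    for t in range(T - 1):
        # Assumption: with ahead-step forecasts, the newest score available
        # at time t is the one from time t - ahead + 1.
        t_obs = t - ahead + 1
        if t_obs < 0:
            qs[t + 1] = qs[t]  # no feedback available yet, keep q unchanged
            continue
        err = float(scores[t_obs] > qs[t_obs])  # 1 if the score exceeded q, else 0
        qs[t + 1] = qs[t] + lr * (err - alpha)  # raise q after a miss, lower it otherwise
    # Same dictionary layout as the "Quantile" example above.
    return {"method": "MyQuantileTracker", "q": qs}


out = my_quantile_tracker(np.ones(5), alpha=0.5, lr=1.0, ahead=1)
print(out["q"])
```

The returned array has the same length as scores, which is what the results dictionary described above requires.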

Step 2: Edit the config to include your method.
TL;DR: go to each config in tests/configs and, under methods:, add an entry for each method you want to run, along with the learning rates to test. The example below, from tests/configs/AMZN.yaml, asks the testing suite to run the quantile tracker on the Amazon stock price dataset with five different learning rates.
  Quantile:
    lrs:
      - 1
      - 0.5
      - 0.1
      - 0.05
      - 0
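As a rough illustration of what this fragment requests, the sketch below expands it into one run per (method, learning rate) pair. The dictionary is written by hand to mirror what a YAML loader would return for the fragment nested under methods:, and expand_runs is a hypothetical helper, not a function from this repository.

```python
# Hand-written dict mirroring the YAML fragment above, nested under "methods"
# (assumption: this is the shape a YAML loader would produce).
config = {
    "methods": {
        "Quantile": {"lrs": [1, 0.5, 0.1, 0.05, 0]},
    }
}


def expand_runs(config):
    """Hypothetical helper: one (method_name, lr) pair per requested run."""
    return [
        (name, lr)
        for name, settings in config["methods"].items()
        for lr in settings["lrs"]
    ]


runs = expand_runs(config)
for name, lr in runs:
    print(name, lr)
```

Adding a second method under methods: with its own lrs list would simply append more pairs to this list.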

As background, the config files are part of our small testing infrastructure for online conformal prediction. The infrastructure spawns a parallel process for every dataset, so a method can be tested on all datasets with a single command (bash run_tests.sh runs the tests, and bash make_plots.sh plots the results).

The infrastructure works like this.

  • The user defines a file in tests/configs/ describing an experiment, i.e., a dataset name and a combination of methods and settings for each method to run.
  • The script tests/run_tests.sh calls tests/base_test.py on every .yaml file in the tests/configs directory.
  • The script tests/make_plots.sh calls inset_plot.py and base_plots.py to produce the plots in the main text and appendix of our paper, respectively.

Step 3: Edit base_test.py to include your method.
Line 5 of base_test.py imports all the methods; import yours there as well. Then add your method to the large if/else block starting on line 103.
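The shape of such a branch might look like the sketch below. This is not the code in base_test.py: run_method, my_method, and the stand-in body of quantile are hypothetical and exist only so the example runs on its own. The point is that a new method needs one extra branch keyed on its name.

```python
import numpy as np


def quantile(scores, alpha, lr, ahead, *args, **kwargs):
    # Stand-in body so the sketch runs; the real function lives in core/methods.py.
    return {"method": "Quantile", "q": np.zeros(len(scores))}


def my_method(scores, alpha, lr, ahead, *args, **kwargs):
    # Hypothetical new method with placeholder output, same interface as above.
    return {"method": "MyMethod", "q": np.ones(len(scores))}


def run_method(name, scores, alpha, lr, ahead, **kwargs):
    """Hypothetical dispatcher mimicking an if/else block keyed on method name."""
    if name == "Quantile":
        return quantile(scores, alpha, lr, ahead, **kwargs)
    elif name == "MyMethod":  # the branch you would add for your method
        return my_method(scores, alpha, lr, ahead, **kwargs)
    else:
        raise ValueError(f"Unknown method: {name}")


result = run_method("MyMethod", np.arange(4.0), alpha=0.1, lr=0.1, ahead=1)
print(result["method"], result["q"])
```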

Adding New Datasets

Step 1: Load and preprocess the dataset.

First, download your dataset and put it in tests/datasets. Then, edit the tests/datasets.py file to add a name for your dataset and some processing code for it. The dataset should be a pandas DataFrame with a valid datetime index (evenly spaced and correctly formatted, with no invalid values) and at least one column titled y. This column holds the target value.
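A small sanity check along these lines can catch formatting problems before running the tests. The function check_dataset below is a hypothetical helper, not part of this repository, and the toy daily series stands in for a real dataset.

```python
import numpy as np
import pandas as pd


def check_dataset(df):
    """Hypothetical validator for the requirements listed above."""
    if not isinstance(df.index, pd.DatetimeIndex):
        raise ValueError("index must be a DatetimeIndex")
    if df.index.hasnans:
        raise ValueError("index contains invalid timestamps")
    # Evenly spaced means every consecutive gap is the same length.
    steps = df.index.to_series().diff().dropna()
    if steps.nunique() != 1:
        raise ValueError("index must be evenly spaced")
    if "y" not in df.columns:
        raise ValueError("missing target column 'y'")
    return df


# Toy example: six daily observations with the target in column y.
index = pd.date_range("2020-01-01", periods=6, freq="D")
df = pd.DataFrame({"y": np.arange(6, dtype=float)}, index=index)
check_dataset(df)
```

Dropping a single day from the toy frame makes the gaps unequal, so the check raises a ValueError.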

Optionally, including a column titled forecasts or scorecasts will cause the infrastructure to use these forecasts/scorecasts instead of the ones it would have produced on its own. This is useful if you have defined a good forecaster/scorecaster outside our framework and just want to use our code to run conformal prediction on top of it. Extra columns can be used to add information for more complex forecasting/scorecasting strategies.

Step 2: Create a config file for the dataset.
As mentioned above, each dataset needs its own config file describing which methods to run and with what parameters. You can use tests/configs/AMZN.yaml as a template.

After executing these two steps, you should be able to run python base_test.py configs/your_dataset.yaml and the results will be computed! Alternatively, you can just execute the bash scripts.

Workarounds for Known Bugs

On M1/M2 Mac, in order to use Prophet, follow the instructions at this link: https://github.com/facebook/prophet/issues/2250.

README Source: aangelopoulos/conformal-time-series