Open source repo: Class wrapper for scikit-learn’s predict and predict_proba for time series backtesting

GitHub source code here

our repo contains a work-in-progress open source project, which we plan to develop into a library

a python class wrapper for scikit-learn predict and predict_proba for walk forward backtesting in time series analysis

python’s scikit-learn classifiers have predict and predict_proba methods, which apply a fitted (trained) model to an input dataset, including the training dataset itself

however, for purposes of time series, running the final trained model back against the full dataset is not “as of” – the final trained model allows for leakage of future knowledge into past results simply because the model’s been trained on all the data

rather, a true time series approach would fit models in recordset series – that is, fit the model on the data up to a certain day, yield a result, store the result, then refit for the next day, and so on, and finally assemble the stored results together
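the loop described above can be sketched roughly as follows. this is a minimal illustration and not the repo’s actual class: the names `walk_forward_predict_proba`, `make_model`, `min_train` and the stand-in `BaseRateClassifier` are all hypothetical, and the choice to fit on records strictly before position t and then predict record t is an assumption about where the “as of” cut falls. the stand-in classifier only keeps the sketch dependency-free; it mimics the fit / predict_proba interface and nothing more.

```python
from typing import Callable, List, Optional, Sequence


def walk_forward_predict_proba(
    make_model: Callable[[], object],
    X: Sequence,
    y: Sequence,
    min_train: int,
) -> List[Optional[list]]:
    """Return one "as of" probability row per record (hypothetical helper).

    For each position t >= min_train, a fresh model is fitted on records
    [0, t) only and then asked for predict_proba on record t. Positions
    before min_train have no training history and are left as None.
    """
    results: List[Optional[list]] = [None] * len(X)
    for t in range(min_train, len(X)):
        model = make_model()                     # fresh, unfitted estimator each step
        model.fit(X[:t], y[:t])                  # only data strictly before t
        proba = model.predict_proba(X[t:t + 1])  # a single row: record t
        results[t] = list(proba[0])              # store the result for record t
    return results


class BaseRateClassifier:
    """Hypothetical stand-in exposing fit / predict_proba.

    Ignores the features and predicts the share of 0s and 1s seen in training.
    """

    def fit(self, X, y):
        self.p1_ = sum(y) / len(y)
        return self

    def predict_proba(self, X):
        return [[1.0 - self.p1_, self.p1_] for _ in X]


# toy data, made up for illustration: six "days", binary labels
X = [[i] for i in range(6)]
y = [0, 0, 1, 1, 1, 1]

# "as of" results: each row only knows the labels that came before it
as_of = walk_forward_predict_proba(BaseRateClassifier, X, y, min_train=2)

# standard approach: fit once on everything, then predict the full history
full_fit = BaseRateClassifier().fit(X, y).predict_proba(X)

for t, (a, f) in enumerate(zip(as_of, full_fit)):
    print(t, "as of:", a, "full fit:", f)
```

because the function only relies on slicing, `fit` and `predict_proba`, a scikit-learn classifier and NumPy arrays should drop in the same way, for example by passing `lambda: LogisticRegression()` as `make_model`. note that some classifiers (LogisticRegression, for one) raise an error when the training slice contains only one class, so `min_train` needs to be large enough for the early windows.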

(in an effort to reduce the runtime cost of this approach, which refits the model at every step, the Singleton class inherits from multiprocess Process)

and, this is a work in progress…

Example: the standard predict and predict_proba back-propagate the final model’s knowledge across the full dataset by default

versus predict and predict_proba “as of” in a recordset walk-forward approach

as you can see, the differences are material.

to do:

  • allow for user input for ‘score’
  • allow for feature engineering inside the class
  • convert to a pip installable library module
  • test Process inheritance for runtime