by Nick
Posted on 17 June, 2019
As I’ve been writing up a progress report for my NIGMS R35 MIRA award, I’ve been reminded at how much of the work that we’ve been doing is focused on forecasting infrastructure. A common theme in the Reich Lab is making operational forecasts of infectious disease outbreaks. The operational aspect means that we focus on everything from developing and adapting statistical methods to be used in forecasting applications to thinking about the data science toolkit that you need to store, evaluate, and visualize forecasts. To that end, in addition to working closely with the CDC in their FluSight initiative, we’ve been doing a lot of collaborative work on new R packages and data repositories that I hope will be useful beyond the confines of our lab. Some of these projects are fully operational, used in our production flu forecasts for CDC, and some have even gone through a level of code peer review. Others are in earlier stages of development. My hope is that in putting this list out there (see below the fold) we will generate some interest (and possibly find some new open-source collaborators) for these projects.
Here is a partial list of in-progress software that we’ve been working on:
sarimaTD
is an R package that serves as a wrapper to some of the ARIMA modeling functionality in the forecast
R package. We found that we consistently wanted to be specifying some transformations (T) and differencing (D) in specific ways that we have found useful in modeling infectious disease time-series data, so we made it easy for us and others to use specifications.ForecastFramework
is an R package that we have collaborated on with our colleagues at the Johns Hopkins Infectious Disease Dynamics lab. We’ve blogged about this before, and we see a lot of potential in this object-oriented framework for both standardizing how datasets are specified/accessed and how models are generated. That said, there still is a long ways to go to document and make this infrastructure usable by a wide audience. The most success I’ve had using it so far was having PhD students write forecast models for a seminar I taught this spring. I used a single script that could run and score forecasts from each model, with a very simple plug-and-play interface to the models because they had been specified appropriately.predx
integration on the way (see next bullet) we are hoping that this will broaden the scope of possible use cases for Zoltar. To help facilitate our and others use of Zoltar, we are working on two interfaces to the Zoltar API, zoltpy
for python and zoltr
for R. Check out the documentation, especially for zoltr
. There is quite a bit of data available!predx
is an R package designed my colleague and friend Michael Johansson of the US Centers for Disease Control and Prevention and OutbreakScience. Evan Ray, from the Reich Lab team, has contributed to it as well. The goal of predx
is to define some general classes of data for both probabilistic and point forecasts, to better standardize ways that we might want to store and operate on these data.d3-foresight
is the main engine behind our interactive forecast visualizations for flu in the US. We have also integrated it with Zoltar, so that you can view forecasts stored in Zoltar (note, kind of a long load time for that last link) using some of the basic d3-foresight
functionality.The lion’s share of the credit for all of the above are due to some combination of Matthew Cornell, Abhinav Tushar, Katie House, and Evan Ray.