I'm a software developer working on a system that stores thousands of independent metrics, each with several tens of thousands of timeseries data points. I'd like to make predictions about where each metric is headed.
For each metric, I want to do two things:
- First, I want to make a prediction for as many of the observations as we can, as if I had seen only the observations up to that point. That is, if we have a sequence of timeseries observations O = [(t0, v0), (t1, v1), (t2, v2), …], then I want to produce a new prediction series P = [(t0, p0), (t1, p1), (t2, p2), …] with the same time indices. For a given time t, the series can only use observations that came before t to make the prediction at t.
The prediction is permitted to require some minimum number of observations before making any predictions -- that is, the prediction series can start at an index other than t0 (but P can only include indices present in O).
The time indices t0, t1, … are not guaranteed to be equally spaced.
- Second, in the future, when new observations for that metric come in, I don't want to have to recompute all the previous predictions to make one more prediction.
That is, we might first receive 500 observations, make ~500 predictions, receive 3 more observations, make 3 more predictions, and so on. But I don't want the cost of making those additional 3 predictions to include recomputing the first 500 -- the cost to compute one additional prediction should ideally be constant (or at least bounded). (To make this concrete, there's a sketch of the interface I have in mind just after this list.)
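Here's a minimal Python sketch of the shape I'm imagining, using simple exponential smoothing as a stand-in model with a gap-dependent smoothing weight to cope with the uneven spacing. The class name, the `tau` parameter, and the default values are all just illustrative assumptions on my part, not a settled design:

```python
import math


class OnlinePredictor:
    """One-step-ahead predictor with constant-size state.

    Simple exponential smoothing stands in for the real model here;
    any model whose state doesn't grow with the number of observations
    would satisfy the O(1)-per-update requirement.
    """

    def __init__(self, tau=60.0, min_observations=2):
        self.tau = tau                    # decay timescale, in the units of t
        self.min_observations = min_observations
        self.level = None                 # smoothed value: the entire model state
        self.last_t = None                # time of the previous observation
        self.count = 0                    # observations seen so far

    def observe(self, t, v):
        """Fold in one observation (t, v) in O(1) time.

        Returns (t, prediction) for this time index, or None while fewer
        than `min_observations` points have been seen. The prediction is
        computed *before* v is incorporated, so it uses only observations
        strictly earlier than t.
        """
        prediction = (t, self.level) if self.count >= self.min_observations else None

        if self.level is None:
            self.level = v
        else:
            # The time indices aren't equally spaced, so weight the new
            # value by the elapsed gap: a longer gap discounts the old
            # level more heavily.
            alpha = 1.0 - math.exp(-(t - self.last_t) / self.tau)
            self.level += alpha * (v - self.level)

        self.last_t = t
        self.count += 1
        return prediction
```

The gap-dependent alpha is just the usual adaptation of an EWMA to irregularly spaced samples: a fixed alpha implicitly assumes evenly spaced observations, so here the weight depends on how much time has elapsed since the previous point.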
Finally, because I'll be building this myself, predictive accuracy matters less to me than keeping the implementation simple. I'd be writing this in Python or Ruby, which have passable statistical libraries, but they don't compare to R or MATLAB, and one can't generally assume the heavy lifting has been done for you.
Are there well-known statistical predictive models with reference implementations that satisfy these conditions?
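For concreteness, this is roughly how I'd expect to drive whatever model fits the bill, reusing the `OnlinePredictor` sketch from above on some made-up, unevenly spaced data:

```python
import random

random.seed(0)

# Made-up, unevenly spaced observations: (t, v) pairs.
t, stream = 0.0, []
for _ in range(503):
    t += random.uniform(5, 120)                 # irregular gaps
    stream.append((t, 20 + random.gauss(0, 2)))

predictor = OnlinePredictor(tau=60.0)
predictions = []

# First batch: 500 observations yield ~500 predictions.
for obs in stream[:500]:
    p = predictor.observe(*obs)
    if p is not None:
        predictions.append(p)

# Later: 3 new observations cost 3 O(1) updates, not 503.
for obs in stream[500:]:
    p = predictor.observe(*obs)
    if p is not None:
        predictions.append(p)

print(len(predictions))  # 501: every index except the first two
```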