Given a variance σ² for a sample of size N, I would like to remove a single data point, resulting in a subsample of size N−1. I want to do this without rerunning the whole algorithm that computes the variance.
There is Welford's method, which allows you to add a data point without recomputing everything from scratch (i.e., a running variance algorithm). This is proven to be numerically stable.
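For concreteness, here is a minimal sketch of the standard Welford update in Python (the function name and the `(count, mean, m2)` state tuple are my own choices, not from any particular library):

```python
def welford_add(count, mean, m2, x):
    """One Welford step: incorporate a new sample x.

    count: number of samples seen so far
    mean:  running mean of those samples
    m2:    running sum of squared deviations from the mean
    The variance is m2 / count (or m2 / (count - 1) for the
    sample variance).
    """
    count += 1
    delta = x - mean
    mean += delta / count
    delta2 = x - mean        # deviation from the *updated* mean
    m2 += delta * delta2
    return count, mean, m2
```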
However, there is not much online about using Welford's method to remove data points. I have found a StackOverflow answer which discusses removing a data point analogously to how you would add one, but there is no proof that the algorithm in that answer is numerically stable. When I run tests in Python and NumPy, I get a different answer than expected, which makes me think this method is not numerically stable.
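The removal update I am referring to is essentially the add step run in reverse. A sketch of that reversal (my own naming; this is the kind of algorithm in question, not verified code):

```python
def welford_remove(count, mean, m2, x):
    """Reverse one Welford step: remove a previously added sample x.

    Algebraically inverts welford_add, recovering the state the
    accumulator would have had if x had never been added.
    """
    if count < 2:
        # Removing the only point leaves an empty sample.
        return 0, 0.0, 0.0
    count -= 1
    old_mean = mean - (x - mean) / count   # mean before x was added
    m2 -= (x - old_mean) * (x - mean)      # the suspect subtraction
    return count, old_mean, m2
```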
The problem is, I have very little knowledge of numerical stability. From what I have read, the subtraction in the method could result in catastrophic cancellation, or something like that.
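As a toy illustration of catastrophic cancellation (not my actual code): subtracting two nearly equal floats wipes out most of the significant digits.

```python
a = 1.0 + 1e-15
b = 1.0
print(a - b)   # prints ~1.1102e-15, not 1e-15: a large relative error
```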
So I have the following questions:
- Is the method described numerically stable?
- Does the method suffer from something like catastrophic cancellation? And
- If it is not numerically stable, how can I go about finding an algorithm that is? Is that even guaranteed to be possible?
Edit: It turns out there were a few bugs in my code; after fixing them, my implementation gives the same output as the standard NumPy one. At least for me, that's good enough!
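For reference, a minimal test of the kind I mean, assuming the `welford_add`/`welford_remove` helpers sketched above:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=1000)

# Build the running state by adding every point...
count, mean, m2 = 0, 0.0, 0.0
for x in data:
    count, mean, m2 = welford_add(count, mean, m2, x)

# ...then remove one point and compare against NumPy on the subsample.
count, mean, m2 = welford_remove(count, mean, m2, data[-1])
print(m2 / count)          # population variance from the running state
print(np.var(data[:-1]))   # NumPy's answer for the same subsample
```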