Given a variance σ² for a sample of size N, I would like to remove a single data point, resulting in a subsample of size N−1. I want to do this without rerunning the whole algorithm that computes the variance.
There is Welford's method, which allows you to add a data point without recomputing everything from scratch (i.e., a running variance algorithm). This is proven to be numerically stable.
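For concreteness, here is a minimal sketch of the standard Welford update in Python (the function name and the `(count, mean, m2)` state tuple are my own choices, not from any particular library):

```python
def welford_add(count, mean, m2, x):
    """One Welford step: incorporate a new sample x.

    count: number of samples seen so far
    mean:  running mean of those samples
    m2:    running sum of squared deviations from the mean
    The variance is m2 / count (or m2 / (count - 1) for the
    sample variance).
    """
    count += 1
    delta = x - mean
    mean += delta / count
    delta2 = x - mean        # deviation from the *updated* mean
    m2 += delta * delta2
    return count, mean, m2
```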
However, there is not much online about using Welford's method to remove data points. I have found a StackOverflow answer which discusses removing a data point analogously to how you would add one, but there is no proof that the algorithm in that answer is numerically stable. When I run tests in Python and NumPy, I get a different answer than expected, which makes me think this method is not numerically stable.
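The removal update I am referring to is essentially the add step run in reverse. A sketch of that reversal (my own naming; this is the kind of algorithm in question, not verified code):

```python
def welford_remove(count, mean, m2, x):
    """Reverse one Welford step: remove a previously added sample x.

    Algebraically inverts welford_add, recovering the state the
    accumulator would have had if x had never been added.
    """
    if count < 2:
        # Removing the only point leaves an empty sample.
        return 0, 0.0, 0.0
    count -= 1
    old_mean = mean - (x - mean) / count   # mean before x was added
    m2 -= (x - old_mean) * (x - mean)      # the suspect subtraction
    return count, old_mean, m2
```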
The problem is, I have very little knowledge of numerical stability. From what I have read, the subtraction in the method could result in catastrophic cancellation, or something like that.
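As a toy illustration of catastrophic cancellation (not my actual code): subtracting two nearly equal floats wipes out most of the significant digits.

```python
a = 1.0 + 1e-15
b = 1.0
print(a - b)   # prints ~1.1102e-15, not 1e-15: a large relative error
```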
So I have the following questions:
- Is the method described numerically stable?
- Does the method suffer from something like catastrophic cancellation? And
- If it is not numerically stable, how can I go about finding an algorithm that is? Is that even guaranteed to be possible?
Edit: It turns out there were a few bugs in my code; after fixing them, my implementation gives the same output as the standard NumPy one. At least for me, that's good enough!
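For reference, a minimal test of the kind I mean, assuming the `welford_add`/`welford_remove` helpers sketched above:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=1000)

# Build the running state by adding every point...
count, mean, m2 = 0, 0.0, 0.0
for x in data:
    count, mean, m2 = welford_add(count, mean, m2, x)

# ...then remove one point and compare against NumPy on the subsample.
count, mean, m2 = welford_remove(count, mean, m2, data[-1])
print(m2 / count)          # population variance from the running state
print(np.var(data[:-1]))   # NumPy's answer for the same subsample
```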