Coming soon - Get a detailed view of why an account is flagged as spam!
view details

This post has been de-listed

It is no longer included in search results and normal feeds (front page, hot posts, subreddit posts, etc). It remains visible only via the author's post history.

15
Numerically Stable Method for Removing Data From A Running Variance Algorithm
Post Body

Given a variance σ2 for a sample of size N, I would like to remove a single data point, resulting in a subsample of size N-1. I want to do this without rerunning the whole algorithm to compute the variance.

There is Welford's method which allows you to add a data point without recalculating the whole thing (aka a running variance algorithm). This is proven to be numerically stable.

However, there is not much online about using Welford's method to remove data points. I have found this which discusses how one would remove a data point similarly to how you would add one. But, there is no proof that the algorithm mentioned in that StackOverflow article is numerically stable. When I run tests in Python and Numpy, I get a different answer than expected. This makes me think that this method is not numerically stable.

The problem is, I have very little knowledge of numerical stability. Looking into it, it seems that due to the subtraction in the method, this could result in catastrophic cancellation, or something like that.

So I have the following questions:

  1. Is the method described numerically stable?
  2. Does the method result in something like catastrophic cancellation? And
  3. If it is not numerically stable, how can I go about finding an algorithm that is in fact numerically stable? Is this necessarily possible?

Edit: It looks like there were a few bugs in my code, and the code seems to have the same output as the standard numpy implementation! At least for me, that's good enough!

Author
Account Strength
100%
Account Age
10 years
Verified Email
Yes
Verified Flair
No
Total Karma
23,277
Link Karma
1,445
Comment Karma
21,421
Profile updated: 10 hours ago
Posts updated: 1 day ago

Subreddit

Post Details

We try to extract some basic information from the post title. This is not always successful or accurate, please use your best judgement and compare these values to the post title and body for confirmation.
Posted
8 months ago