One of the main servers died last night while I was loading additional data (hard drive failure)

I'm diagnosing the server now, but it doesn't look too promising. It appears the Samsung drive died completely and the RAID on that machine didn't perform as expected. I'll post an update soon.

Update

I've managed to recover some of the data on that node. Apparently the boot SSD failed (the second Crucial SSD to fail this month), but the Samsung drive that held the Elasticsearch data survived (kind of). I had to pull an SSD out of my workstation and put it into the server (since it already had Ubuntu 20.04 on it), and then I was able to recover around 88% of the node's data. Some shards are still unassigned, so historical data will be affected until I can either recover those shards or reload the data into Elasticsearch.
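
A minimal sketch of how unassigned shards can be listed after a node loss, assuming the response of Elasticsearch's `GET _cat/shards?format=json` has already been fetched and parsed (the HTTP call is left out so this runs without a cluster). The function name and the sample index names are hypothetical, not Pushshift's actual setup.

```python
from collections import defaultdict


def unassigned_shards(cat_shards_rows):
    """Group UNASSIGNED shards by index from a parsed `_cat/shards?format=json` response.

    Each row is a dict with (at least) "index", "shard", "prirep" and "state"
    keys; the _cat API returns these values as strings.
    """
    missing = defaultdict(list)
    for row in cat_shards_rows:
        if row["state"] == "UNASSIGNED":
            # prirep is "p" for a primary copy and "r" for a replica copy
            missing[row["index"]].append((int(row["shard"]), row["prirep"]))
    # Sort indices and shard numbers so the output is stable and readable
    return {index: sorted(shards) for index, shards in sorted(missing.items())}


# Hypothetical sample rows; in practice these come from the cluster.
sample = [
    {"index": "rc_2019-06", "shard": "0", "prirep": "p", "state": "STARTED"},
    {"index": "rc_2019-06", "shard": "1", "prirep": "p", "state": "UNASSIGNED"},
    {"index": "rs_2018-01", "shard": "3", "prirep": "p", "state": "UNASSIGNED"},
]
print(unassigned_shards(sample))
```

On a cluster without replicas, an unassigned primary means that shard's documents are unavailable until it is recovered or reloaded. For a shard that stays unassigned, `GET _cluster/allocation/explain` reports why Elasticsearch will not allocate it.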

However, the production API should be running and updating now with current submissions and comments, which is one of the most important things. This will probably push me over the edge to start migrating data to the new ES cluster (7.14). Once that happens, I'm going to bite the bullet and enable replicas, which weren't enabled on the original cluster (due to costs when I first started).
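
For reference, `index.number_of_replicas` is a dynamic index setting, so it can be raised on an existing index with `PUT /<index>/_settings`. The sketch below (helper names hypothetical) builds that request body and does the back-of-the-envelope math behind the cost: every replica is a full copy of its primary, and Elasticsearch never places two copies of the same shard on one node. The 10 TB figure is an arbitrary example, not the real size of the data.

```python
import json


def replica_settings_body(replicas: int) -> str:
    """JSON body for `PUT /<index>/_settings` (number_of_replicas is dynamic)."""
    if replicas < 0:
        raise ValueError("replicas must be >= 0")
    return json.dumps({"index": {"number_of_replicas": replicas}})


def total_storage_tb(primary_tb: float, replicas: int) -> float:
    """Disk needed across the cluster: each replica is a full copy of its primary."""
    return primary_tb * (1 + replicas)


def can_assign_all_replicas(data_nodes: int, replicas: int) -> bool:
    """Copies of the same shard must sit on distinct nodes, so each shard
    needs 1 + replicas data nodes for every copy to be assigned."""
    return data_nodes >= 1 + replicas


print(replica_settings_body(1))   # {"index": {"number_of_replicas": 1}}
primary_tb = 10.0                 # hypothetical primary data size in TB
print(total_storage_tb(primary_tb, 1))  # 20.0
```

With one replica, the storage bill roughly doubles, which is the trade-off against losing shards outright when a single drive dies.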

If you notice anything quirky with the production API, let me know. I've set it up to start re-ingesting comments working backwards so that the most recent 72 hours of data don't have gaps. I'll continue to work on restoring the shards that didn't want to come online, but if they are lost, it isn't too big of a deal because a lot of that data can be easily reloaded. The bigger issue is that reloading data into a cluster that isn't set up with replication is merely kicking the can down the road and doesn't address the root problem, so I think it makes sense to bring up the new cluster with replicas enabled and start migrating data to it. It may take a few weeks to get it fully populated and a few thousand more in NVMe drives, but it will be worth it in the long term.
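
A rough sketch of what "re-ingesting working backwards" over the last 72 hours can look like: split the period into fixed windows, newest first, so the most recent gaps close before older ones. The one-hour step and the function name are assumptions for illustration; each window would be handed to whatever routine actually fetches comments for that time range (not shown, hypothetical).

```python
from datetime import datetime, timedelta, timezone


def backfill_windows(end: datetime, lookback: timedelta, step: timedelta):
    """Yield contiguous (start, end) windows from `end` back to `end - lookback`.

    The newest window comes first; the oldest one is clipped so it never
    reaches past the lookback boundary.
    """
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    floor = end - lookback
    window_end = end
    while window_end > floor:
        window_start = max(window_end - step, floor)
        yield window_start, window_end
        window_end = window_start


# Arbitrary fixed time so the example is reproducible.
now = datetime(2021, 1, 1, 12, 0, tzinfo=timezone.utc)
windows = list(backfill_windows(now, timedelta(hours=72), timedelta(hours=1)))
print(len(windows))  # 72
```

Working newest-first means that if the backfill is interrupted, the data users are most likely to query is already filled in.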

Thanks for your patience as always!

Author: Stuck_In_the_Matrix

Subreddit: r/pushshift

Posted: 3 years ago
External URL: reddit.com/r/pushshift/c...