

One of the main servers died last night while I was loading additional data (hard drive failure)

I am diagnosing the server now, but it doesn't look too promising. It appears the Samsung drive failed completely and the RAID on that machine didn't perform as expected. I'll update soon.

Update

I've managed to recover some of the data on that node. It turns out the boot SSD failed (the second Crucial SSD to fail this month), but the Samsung drive that held the Elasticsearch data survived (kind of). I had to pull an SSD out of my workstation and put it into the server (since it already had Ubuntu 20.04 on it), and after that I was able to recover around 88% of the node's data. Some shards are still unassigned, so historical data will be affected until I can either recover those shards or reload the data into Elasticsearch.
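To see which shards are still unassigned after a node comes back, one option is Elasticsearch's `_cat/shards` API with `format=json`. Below is a minimal sketch, assuming the JSON has already been fetched (for example from `http://localhost:9200/_cat/shards?format=json`; the host is an assumption), that groups the unassigned shards by index. The index names in the sample are hypothetical.

```python
from collections import defaultdict


def unassigned_shards(cat_shards):
    """Group UNASSIGNED shards by index.

    `cat_shards` is the list of dicts returned by
    GET /_cat/shards?format=json (keys include "index", "shard",
    "prirep" and "state"; values arrive as strings).
    """
    by_index = defaultdict(list)
    for row in cat_shards:
        if row["state"] == "UNASSIGNED":
            # prirep: "p" = primary shard, "r" = replica shard
            by_index[row["index"]].append((int(row["shard"]), row["prirep"]))
    return dict(by_index)


if __name__ == "__main__":
    # Hypothetical sample rows; real output carries more columns.
    sample = [
        {"index": "rc_2019", "shard": "0", "prirep": "p", "state": "STARTED"},
        {"index": "rc_2019", "shard": "1", "prirep": "p", "state": "UNASSIGNED"},
        {"index": "rs_2018", "shard": "3", "prirep": "p", "state": "UNASSIGNED"},
    ]
    print(unassigned_shards(sample))
```

For any primary that shows up here, `GET /_cluster/allocation/explain` reports why Elasticsearch did not allocate it, which helps decide between recovering the shard and reloading the data.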

However, the production API should now be running and updating with current submissions and comments, which is one of the most important things. This will probably push me over the edge to start migrating data to the new ES cluster (7.14). Once that happens, I'm going to bite the bullet and enable replicas, which weren't enabled on the original cluster (due to costs when I first started).
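In Elasticsearch, the replica count is a per-index setting and, unlike the number of primary shards, it can be changed on a live index through the update index settings API (`PUT /<index>/_settings`). Here is a minimal sketch using only the Python standard library; the host and the index name `rc_2021` are hypothetical, and the request is built but not sent.

```python
import json
import urllib.request


def replica_settings_request(host, index, replicas=1):
    """Build a PUT /<index>/_settings request that sets number_of_replicas."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{host}/{index}/_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    # Hypothetical host and index name.
    req = replica_settings_request("http://localhost:9200", "rc_2021")
    print(req.get_method(), req.full_url)
    print(req.data.decode())
    # To actually apply the setting: urllib.request.urlopen(req)
```

Elasticsearch never places a replica on the same node as its primary, so with one replica a single failed drive still leaves a complete copy of every shard. The trade-off is roughly double the disk usage.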

If you notice anything quirky with the production API, let me know. I've set it up to re-ingest comments working backwards so that the most recent 72 hours of data has no gaps. I'll continue working on restoring the shards that didn't want to come online, but if they are lost, it isn't too big of a deal because a lot of that data can easily be reloaded. The bigger issue is that reloading data into a cluster that isn't set up with replication merely kicks the can down the road without addressing the root problem. So I think it makes sense to bring up the new cluster with replicas enabled and start migrating data to it. It may take a few weeks to fully populate and a few thousand more in NVMe drives, but it will be worth it in the long term.
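The backfill described above (re-ingesting the most recent 72 hours, starting from now and working backwards) can be sketched as a generator of time windows, newest first. This is a hypothetical illustration rather than the actual ingestion code; the one-hour window size and fetching by UTC time ranges are assumptions.

```python
from datetime import datetime, timedelta, timezone


def backfill_windows(end, hours=72, step=timedelta(hours=1)):
    """Yield contiguous (start, end) windows covering `hours` before `end`, newest first."""
    floor = end - timedelta(hours=hours)
    while end > floor:
        # Clamp the last window so it never reaches past the 72-hour floor.
        start = max(end - step, floor)
        yield start, end
        end = start


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    for start, end in backfill_windows(now, hours=3):
        # A real job would fetch and index comments created in [start, end) here.
        print(start.isoformat(), "->", end.isoformat())
```

Working newest-first means the gaps closest to live data are filled before older ones, which matters most for users querying recent comments.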

Thanks for your patience as always!


Posted
3 years ago