I am diagnosing the server now, but it doesn't look too promising. It appears the Samsung drive failed completely and the RAID on that machine didn't perform as expected. I'll update soon.
Update
I've managed to recover some of the data on that node. Apparently the boot SSD failed (the second Crucial SSD to fail this month), but the Samsung drive that held the Elasticsearch data survived (kind of). I had to rip an SSD out of my workstation and stick it into the server (since it already had Ubuntu 20.04 on it), and then I was able to recover around 88% of the node's data. There are still some shards unassigned, so historical data will be affected until I can either recover those shards or reload the data into Elasticsearch.
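For anyone who wants to check shard state on their own cluster, here is a minimal sketch of how the assigned/unassigned split could be summarized. It assumes rows shaped like the JSON returned by `GET _cat/shards?format=json`; the helper name `summarize_shards` and the sample index names are hypothetical and not taken from the Pushshift cluster.

```python
from collections import Counter


def summarize_shards(rows):
    """Summarize rows shaped like the output of GET _cat/shards?format=json.

    Each row is a dict with at least "index", "shard", "prirep" and "state".
    """
    states = Counter(row["state"] for row in rows)
    # Collect every shard copy that no node currently holds.
    unassigned = sorted(
        (row["index"], int(row["shard"]), row["prirep"])
        for row in rows
        if row["state"] == "UNASSIGNED"
    )
    total = len(rows)
    started_pct = 100.0 * states.get("STARTED", 0) / total if total else 0.0
    return {"total": total, "started_pct": started_pct, "unassigned": unassigned}


# Hypothetical sample data for illustration only.
SAMPLE_ROWS = [
    {"index": "comments-2021", "shard": "0", "prirep": "p", "state": "STARTED", "node": "node-1"},
    {"index": "comments-2021", "shard": "1", "prirep": "p", "state": "STARTED", "node": "node-2"},
    {"index": "comments-2020", "shard": "0", "prirep": "p", "state": "STARTED", "node": "node-1"},
    {"index": "comments-2020", "shard": "1", "prirep": "p", "state": "UNASSIGNED", "node": None},
]

summary = summarize_shards(SAMPLE_ROWS)
print(summary["started_pct"], summary["unassigned"])
```

With no replicas, every unassigned primary in that list is data that cannot be searched until the shard is recovered or the documents are reloaded.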
However, the production API should be running and updating now with current submissions and comments, which is one of the most important things. This will probably push me over the edge to just start migrating data to the new ES cluster (7.14) -- and once that happens, I'm going to bite the bullet and enable replicas, which weren't enabled on the original cluster (due to costs when I first started).
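As a rough illustration of what enabling replicas involves, the sketch below builds (but does not send) a request against Elasticsearch's update index settings endpoint, which accepts `index.number_of_replicas`. The base URL, the index name, and the helpers `build_replica_request` and `storage_multiplier` are assumptions for the example; the actual cluster layout is not described in this post.

```python
import json
import urllib.request


def build_replica_request(base_url, index, replicas=1):
    """Build a PUT <index>/_settings request that sets the replica count.

    The request is only constructed here; pass it to urllib.request.urlopen
    to actually apply it to a cluster.
    """
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def storage_multiplier(replicas):
    """Each replica is a full copy, so disk use scales with 1 + replicas."""
    return 1 + replicas


# Hypothetical host and index name.
req = build_replica_request("http://localhost:9200", "comments-2021", replicas=1)
print(req.get_method(), req.full_url, req.data.decode("utf-8"))
print(storage_multiplier(1))
```

The multiplier is the cost trade-off mentioned above: one replica roughly doubles the disk needed, which is why replicas are often skipped on a budget cluster.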
If you notice anything quirky with the production API, let me know -- I've set it up to start re-ingesting comments working backwards so that the most recent 72 hours of data don't have gaps. I'll continue to work on restoring the shards that didn't want to come online, but if they are lost, it isn't too big of a deal because a lot of that data can be easily reloaded. The bigger issue is that reloading data into a cluster that isn't set up properly with replication is merely kicking the can down the road and not addressing the root problem -- so I think it makes sense to bring up the new cluster with replicas enabled and start migrating data to it. It may take a few weeks to get it fully populated and a few thousand more in NVMe drives, but it will be worth it long term.
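To make "working backwards" concrete, here is one way a backfill could be scheduled so the newest data is repaired first. This is a sketch under assumptions, not the actual Pushshift ingest code: the helper `backward_windows`, the one-hour step, and the use of epoch seconds are all hypothetical.

```python
def backward_windows(end_epoch, total_hours=72, step_hours=1):
    """Yield (start, end) epoch-second windows, newest first.

    The windows are contiguous and together cover exactly the
    total_hours leading up to end_epoch.
    """
    step = step_hours * 3600
    oldest = end_epoch - total_hours * 3600
    window_end = end_epoch
    while window_end > oldest:
        # Clamp the final window so it never reaches past the oldest bound.
        window_start = max(window_end - step, oldest)
        yield (window_start, window_end)
        window_end = window_start


# Hypothetical "now" timestamp, chosen only for illustration.
windows = list(backward_windows(1_000_000, total_hours=72, step_hours=1))
print(len(windows), windows[0], windows[-1])
```

Each window would then be handed to whatever fetches and indexes comments for that time range, so an interrupted backfill leaves the most recent hours complete.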
Thanks for your patience as always!
Posted 3 years ago in r/pushshift (reddit.com/r/pushshift/c...)