

3
TIFU by installing an ESXi patch

So this actually happened today, just a few hours ago. I originally posted this in r/TIFU, but my post was removed because apparently the mod thinks that my entire organization's services/infrastructure being down for 2 hours, and me pulling my hair out trying to fix it, isn't something bad happening to me... So anyway, I can at least give you guys a laugh over here in r/sysadmin.

I come into the office like any other Friday and check monitoring to make sure all my hosts/VMs are running and everything's fine.

I go to start a new project: moving the on-premises FileMaker Pro Server, currently running on a little Mac mini in my office, to a VM on my ESXi host. Part of this involved running a shell script on the host. So I run the patch, reboot the machine, and the host comes back up no problem. A little time goes by, during which I have a meeting. I come back to my office and shit hits the fan: none of the VMs auto-started, and my entire infrastructure had been down for a while without me even realizing it because of the meeting.

Fast forward: I'm scrambling to get to the data center to gain physical access to the host so I can reinstall ESXi and re-register the VMs ASAP. Meanwhile, my main client's phones, website, CRM, project manager, and all other internal applications and intranet, as well as the main domain controller, were all down! Thankfully I have a slave on premises, and the on-site phones have a copper backup/failover, so all was not lost.

I finally make it to the data center after almost 2 hours of downtime, reinstall the OS, register all the VMs, check to make sure all my services were back up, and then head off to lunch to try and forget about the shit show of a Friday I had stumbled into.

Lesson learned: either run HA with a cluster and live migration for no downtime, or at least move sketchy updates, patches, and anything else that could cause unplanned downtime out of business hours to a quieter time, like the middle of the night when no one's accessing services.

TL;DR: I installed a patch, borked my virtualization platform, and brought most of my company's services down for 2 hours today. Way to start my weekend!

Edit: just for some clarification, we are not a large company. We only have 2 locations and one server in a DC, so the number of people affected by this fuck-up wasn't too bad, and thankfully we still have some backup services for redundancy. However, certain services that were down today have no redundancy set up yet.

Author
Chief Technology Officer

Posted
7 years ago