Google Docs link to Changelog of this post - If anything needs correcting, please use the 'comment' function to tell me what needs to be changed.
The changes are in order at the BOTTOM of the gDoc link, along with the current and original post.
Last update: Day 7 - 2021-09-23 07:30 AEST
I apologise for taking so long to update this post. However, I was getting accused of being a shill for and/or employee of VoIP.ms, was continuously abused for trying to provide helpful, useful, and as accurate as possible information, and I basically just gave up. I honestly thought it would be over by now, and I could just unpin this post without needing to update it again, but it looks like I was wrong yet again.
Useful Links
Background
I want to make it clear at this point that this is not me trying to prove someone wrong. This is me trying to figure out what mistakes SOMEONE ELSE might have made. I much prefer to learn what not to do from someone else's mistakes, rather than from my own!
You also have to realise that the people manning the social media pages are not technical! They are not in a position to provide a technical explanation of what's going on, and even if they tried and then made a mistake, people would scream about it.
Please accept that they are doing what they can, and they're in a very difficult position. If I make a mistake, then it doesn't matter. This is just me making educated guesses anyway! If/When they release a full technical post-mortem, everyone may be looking back at this and laughing about how wrong I was.
Now, onto day 1...
Rough Timeline
This all started on Thursday, Sep 16, 2021. I believe I have the date and timestamps right, but if they're wrong, please go to the gdocs link above and correct them!
Sep 16, 2021 (Day 1)
People using VoIP.ms started experiencing issues. VoIP.ms posted on Twitter saying that they were having problems and were investigating.
A few hours later they said that their DNS was under attack and blamed it on a DDoS.
This seemed a bit strange to quite a few people, as DNS is not something that can easily be DDoSed these days, with providers like Cloudflare and Amazon Route53, so a few people (including yours truly here!) went to do some digging to see what's up.
After doing some quick research, we all discovered that their DNS servers were managed by the previous owners of the company. As VoIP.ms had been sold a few years ago (I'm sorry, I don't know the exact amount, hopefully someone can tell me), most people thought that this was just a fumble-fingered transfer of the DNS registration. This is not an unusual event! Even Google have done this to themselves.
Sep 17, 2021 (Day 2, Morning)
Whoever was in charge of the registrar credentials was tracked down, and they finally managed to move their DNS records across to Cloudflare and there was much rejoicing.
Explicit hypothesis by xrobau: Things were working, but I saw plenty of people reporting that things were slow. When things are slow, for indeterminate reasons, the first question you must always ask is "Is the slowness an exact multiple of 5 seconds?". 5 seconds is the default timeout of a DNS request on Linux. If you have a connection that is ALWAYS taking 10 seconds to respond, that usually means two DNS lookups were failing. I'm not a customer, so I can't validate that was what was happening. This is why this is a hypothesis! But my guess is that those two DNS servers were hard-coded into all of their POPs. As they had vanished, every single machine had to be logged into and updated. I may be totally off base here, and until (if?) we have a full post-mortem, no one will know any more.
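The "multiple of 5 seconds" question can be sketched as a tiny helper. This is only an illustration: the function name and the 0.5 second tolerance are made up for the example, and the 5 second figure assumes a resolver left at its default timeout.

```python
# Assumption: the resolver is using its default 5 second per-attempt timeout.
DNS_TIMEOUT_S = 5.0


def suspected_dns_timeouts(latency_s, timeout_s=DNS_TIMEOUT_S, tolerance_s=0.5):
    """Return how many DNS timeouts would explain this latency, or 0 if none.

    A latency counts as suspicious when it sits within tolerance_s of a
    whole multiple of timeout_s (5s, 10s, 15s, ...).
    """
    count = round(latency_s / timeout_s)
    if count >= 1 and abs(latency_s - count * timeout_s) <= tolerance_s:
        return count
    return 0


if __name__ == "__main__":
    # Hypothetical measurements, in seconds.
    for latency in (0.3, 5.1, 7.4, 10.2):
        print(latency, "->", suspected_dns_timeouts(latency), "suspected timeout(s)")
```

A result of 2 for a request that always takes a bit over 10 seconds is what you'd expect if two lookups in a row were timing out. It is a hint to go and look at the resolver config, not proof.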
They posted that everything was working, and indeed it looked like it was. They were still blaming everything on a DDoS, but who cares, right? The important thing is that they were back up, and calls were flowing.
Sep 17, 2021 (Day 2, Afternoon)
Cloudflare started reporting 'host down' errors. This error means that Cloudflare are unable to reach the REAL server. For those that don't know, Cloudflare is basically the world's biggest Web Proxy, who also dabble in DNS and SSL Certificates.
This was, actually, the first time that there was a suggestion that they were really under attack, by the way. Everything prior to now (using Occam's Razor) was more likely a mixup that the front-end people manning Twitter just described as a DDoS, because it was easier that way.
It could have just been them moving their core infrastructure to a new IP range, and doing a stop->copy->start process, rather than copy->stop->update->start process, but as they're claiming it's a DDoS, let's take them at their word, and that outage was a combination of both!
Sep 18, 2021 (Day 3)
People were reporting registration failures, bad audio, and a total inability to reach POPs. This ABSOLUTELY looks and feels like a DDoS now.
There's a brand new Twitter account that has the name 'REvil' in it, claiming that they're responsible for the DDoS, and asking for the ludicrous amount of 100BTC. That's about USD$5m, and is an impossible amount to ask for.
They reference a pastebin which could be this one. That Twitter account doesn't appear to have linked to a SPECIFIC pastebin, and this one was linked to by someone else on Twitter without saying how they found it.
Sep 19 to Sep 23 (Day 4 to now)
They are getting DDoSed by SOMEONE, but I am almost certain that this is not the 'real' REvil. REvil vanished in July. REvil did Ransomware - they would exfiltrate your data, and then blackmail you into paying a large (but not insanely large, like this) ransom. They didn't do DDoS attacks. But in the last few weeks, there has been someone (or some people) claiming to be REvil, using DDoS attacks as a way to blackmail people into paying them to stop.
This IS super effective, especially for VSPs, as it's not easy for SIP to be protected behind something like Cloudflare. You'd need a specialized, specific, global CDN that explicitly handles and manages SIP traffic, and is able to handle a sustained Layer 7 DDoS (eg, a flood of INVITE, REGISTER, OPTIONS, etc) without dying... Hmmm, I think I know what my next project is going to be!
Technical details
What do we know they have done?
- Moved all their web infrastructure behind Cloudflare. This is usually the easy way to avoid a DDoS, but a persistent attacker (like whoever is attacking VoIP.ms) can still make people's lives miserable.
- They are moving (at least some of) the IP addresses of their POPs. This is confirmed by them tweeting that people should be trusting DNS again and not to use IP addresses any more, as well as some of the IP addresses of their POPs changing from the original document they shared.
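For anyone who hard-coded POP IP addresses during the outage, "trusting DNS again" just means looking the hostname up when you connect instead of keeping a fixed address in your config. A minimal sketch, assuming a made-up hostname (`pop.example.net` is a placeholder, not a real VoIP.ms POP) and the usual SIP port 5060:

```python
import socket


def resolve_pop(hostname, port=5060, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses a hostname resolves to.

    resolver is injectable only so the function can be exercised without
    touching the network; by default it is the standard library lookup.
    """
    infos = resolver(hostname, port, type=socket.SOCK_DGRAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Placeholder hostname: substitute the POP hostname from your own account.
    try:
        print(resolve_pop("pop.example.net"))
    except socket.gaierror as exc:
        print("lookup failed:", exc)
```

If the addresses this returns differ from the ones in your PBX or ATA config, the config is pointing at an old IP.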
What are they probably doing?
- Asking their hosting providers for more control over their networks and IP allocations.
- Beefing up hardware and infrastructure.
- HOPEFULLY putting something like a cluster of Kamailio SIP Proxies in front of everything, which can horizontally scale and absorb a flood of SIP requests before it reaches the real servers.
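To give a feel for what a proxy tier like that does with a flood, here is a rough per-source token bucket. This is NOT Kamailio configuration and not anything VoIP.ms have said they run; the class name, rates and burst size are all invented for the example. It also only helps against sources that send too much each, so a flood spread across many (or spoofed) addresses needs more than this.

```python
import time


class PerSourceLimiter:
    """Token bucket per source IP: each source may send rate_per_s requests
    per second on average, with short bursts of up to burst requests."""

    def __init__(self, rate_per_s=10.0, burst=20.0, clock=time.monotonic):
        self.rate = rate_per_s
        self.burst = burst
        self.clock = clock    # injectable so behaviour can be checked without sleeping
        self.buckets = {}     # source_ip -> (tokens_left, last_seen_timestamp)

    def allow(self, source_ip):
        """Return True if this request should be passed on, False if dropped."""
        now = self.clock()
        tokens, last = self.buckets.get(source_ip, (self.burst, now))
        # Refill for the time elapsed since the last request, capped at burst.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[source_ip] = (tokens - 1.0, now)
            return True
        self.buckets[source_ip] = (tokens, now)
        return False


if __name__ == "__main__":
    limiter = PerSourceLimiter(rate_per_s=1.0, burst=2.0)
    # Three back-to-back REGISTERs from one (documentation-range) address.
    print([limiter.allow("198.51.100.7") for _ in range(3)])
```

Each proxy in the cluster would keep its own buckets, which is what lets the tier scale out by adding more boxes.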
Porting numbers out
Number porting is working for numbers from VoIP.ms. Most VSPs are processing them rapidly. If you really must leave them, please pick a provider from the list of alternative providers and tell them you're porting numbers from VoIP.ms - I know that at least we have a team dedicated to expediting these ports. I assume other VSPs do too.
If anyone has any updates or info, please post them here in the comments.
Do Not
- Kick them while they're down. It doesn't achieve anything. Everyone knows they're suffering.
- Link to unverifiable information - I've linked to Pastebin, but also explicitly pointed out that it probably isn't real.
You may also note that I removed the 'Do not encourage people to switch to other providers' rule. When I originally wrote that (5 days ago from this edit!) I truly believed that they just had some mopping up to do, and it was going to be all over within the next 24 hours. However, I was wrong.
Whoever is attacking them looks like they're persistent enough to keep paying for this DDoS forever.