When I started Clarinet (then Hilink Communications) back in 1994, the guru of the Internet in Australia, K. Robert Elz, of the University of Melbourne, where I was working at the time, pressed upon me the importance of having nameserver diversity. Kre initially allowed some of my domains to use the University’s nameserver as a secondary, but then encouraged a quid-pro-quo relationship with an ISP in Ireland, so we could secondary each other’s domains.
It was an arrangement that worked well, and even after it faded, I made a point of arranging diverse secondaries for the domains Clarinet hosted.
A friend of mine (I’ll call him Charles in this post) runs a small ISP, the sort you don’t find much any more, still hanging on since the heyday of the late 1990s. He used to have everything running from his home – dialup, mail and websites. Now the dialup is gone and the ADSL2 is outsourced. Some of the web hosting is outsourced too, but the DNS is (or rather was) hosted locally and served via a 20M/20M SHDSL link – not just the primary DNS server, but the secondary server too.
One Friday, the 20M/20M link went down and Charles rang me for help. A simple request – could I help program his Cisco router so that the backup ADSL2 connection could carry the customer email and websites? Sure, but there was a big problem. It’s not so hard to get the traffic flowing correctly via the ADSL2 connection with some port forwarding rules. The problem was that the DNS entries were ONLY accessible via the IP addresses assigned to the 20M/20M link – the very link that was down. That meant customer email could not be delivered, because no-one could see the updated MX records. Similarly, for the websites to be visible at the new ADSL2 IP address, the DNS entries needed to be updated. But with the nameservers offline, no-one’s browser could get the new IP address for the websites.
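If you want to spot this kind of single point of failure before it bites, a few lines of Python will do it. This is only a rough sketch – it assumes the third-party dnspython package, and the domain name is a placeholder, not one of Charles’s real zones – but it shows the idea: look up a domain’s NS records, resolve each nameserver’s address, and warn if they all land in the same /24, i.e. behind the same link.

```python
# Rough sketch, assuming the third-party dnspython package (pip install dnspython).
# "example.net.au" is a placeholder domain.
import ipaddress
import dns.resolver

def nameserver_networks(domain):
    """Return the set of /24 networks that the domain's nameservers resolve into."""
    networks = set()
    for ns in dns.resolver.resolve(domain, "NS"):
        ns_host = ns.target.to_text()
        for a in dns.resolver.resolve(ns_host, "A"):
            networks.add(ipaddress.ip_network(a.address + "/24", strict=False))
    return networks

if __name__ == "__main__":
    nets = nameserver_networks("example.net.au")
    if len(nets) < 2:
        print("WARNING: every nameserver sits in the same network:", nets)
    else:
        print("Nameservers are spread across:", sorted(str(n) for n in nets))
```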
For Charles, looking at the prospect of an entire weekend of frustrated customers, there was just one solution. For the next 12 hours Charles and his son laboriously went through the 200-odd domains he hosted and redelegated them to Cloudflare. Once each domain’s DNS was on Cloudflare it was reachable again – the MX records and www A records were visible, and the mail and web traffic could flow through the temporary IP address.
It was a painful lesson for Charles, but the next time his 20M/20M link goes down, he won’t have to do it again.
You might think that this only happens to tiny providers like Charles, but it ain’t necessarily so…
In August 2016 the dedicated server hosting provider that Clarinet uses for its main servers went offline. This organisation is multi-homed, etc, etc. But the multi-homing failed due to a router misconfiguration, and it was many, many hours before they came back online.
During that offline period, their website was down and so were their nameservers. The whole kit and caboodle was just gone from the Internet. I was starting to think the absolute worst had happened and the company had folded suddenly. Thankfully that was not the case.
I’ll mention that Clarinet’s own DNS service was not affected: while the primary nameserver (hosted in the offline datacentre in Australia) was unreachable, the secondary servers in Los Angeles and St Louis were fine. For that reason, customers using Clarinet’s nameservers but with email or web services hosted elsewhere, such as Gmail, never even noticed the outage.
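It’s also easy to check from the outside which of a zone’s nameservers are still answering. Another rough sketch, again assuming dnspython and using a placeholder zone name: it queries every delegated nameserver directly for the zone’s SOA, so during an outage you can see at a glance which secondaries are still carrying the load.

```python
# Rough sketch, again assuming dnspython; "clarinet.example" is a placeholder zone.
import dns.message
import dns.query
import dns.resolver

def check_nameservers(zone):
    """Ask each of the zone's delegated nameservers directly for its SOA record."""
    query = dns.message.make_query(zone, "SOA")
    for ns in dns.resolver.resolve(zone, "NS"):
        ns_host = ns.target.to_text()
        ns_addr = dns.resolver.resolve(ns_host, "A")[0].address
        try:
            dns.query.udp(query, ns_addr, timeout=3)
            print(f"{ns_host} ({ns_addr}): answering")
        except Exception:
            print(f"{ns_host} ({ns_addr}): NOT answering")

if __name__ == "__main__":
    check_nameservers("clarinet.example")
```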
It just goes to show that even big datacentre companies can have hiccups, and it actually makes sense for them to outsource their DNS. With their DNS hosted on Cloudflare, CloudNS or anywhere else, and a simple outage notice page on an externally hosted website, they could easily have posted a ‘Back soon’ message where anxious customers like me could find it.
The cost of outsourcing your DNS hosting, or of arranging external secondary servers, is negligible compared with the cost of the customer relations failure when you (temporarily) disappear.