Meaningless codes

February 13th, 2014 by

Given just how much of our lives involves using apps, websites or embedded computers, you might think that an initiative to teach children the basics of how to create software and not just how to use it would be uncontroversial.

Well, that’s the stated goal of Year of Code, and whilst you might be able to find fault with its execution, that’s not the angle that Jeremy Paxman chose to take when interviewing Year of Code’s director, Lottie Dexter, on Newsnight last week. Instead, Paxman decided to ridicule the very notion using the curious argument that because he couldn’t understand some computer code, it was therefore meaningless, and by implication, worthless. That he can’t get his head around what code is, let alone what it means, rather confirms the importance of this initiative.

Here at Mythic Beasts we have some pretty strong views on starting coding young, so Pete took his Raspberry Pi and attempted to explain what coding is in terms that even Jeremy might understand, through the medium of a musical e-Card and via a few lines of Perl that can generate just about every pop hit ever. To read Pete’s views, or just to see a video of him playing the piano, click here.

Single Point of Failure

February 10th, 2014 by

The Tower Bridge Lifeboat station put up a picture on Twitter of what London would look like today if the Thames Barrier wasn’t closed.

I’ve annotated it with the four biggest nodes from the London Internet Exchange.

[Image: london-flood]

It’s not known exactly what fraction of UK Internet traffic passes through these four buildings, but almost every major ISP or content provider exchanges traffic there: Mythic Beasts, Facebook, Twitter, Google, BT, Virgin, Yahoo, Microsoft, Akamai, Netflix, TalkTalk, BSkyB, Vodafone, and hundreds of others. Very slowly, regional exchanges are starting to be built, largely by LINX itself.

Monitoring service improvements

February 4th, 2014 by

We’ve just rolled out some improvements to our monitoring service. All server products, including virtual servers, get access to our basic ping monitoring service, allowing you to receive SMS and email alerts if your server goes off-line. For £5/month you can add enhanced monitoring, which allows you to confirm that individual services such as a web server are working correctly. Customers of our Managed Hosting service not only get access to enhanced monitoring, but also get the benefit of having our staff respond to the alerts for them.

The new features are:

Temporary silencing of alerts

You can now silence alerts for a set period. This is handy if you’re doing maintenance, and don’t want to be constantly pestered by alerts, but also don’t want the risk of forgetting to turn the monitor back on again afterwards.
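As a rough illustration of why a timed silence is safer than switching a monitor off, here is a minimal Python sketch. It is not our implementation: the `Monitor` class and its method names are hypothetical, invented for this example. The point is that the silence carries its own expiry time, so alerts resume without anyone having to remember to turn them back on.

```python
from datetime import datetime, timedelta
from typing import Optional


class Monitor:
    """Hypothetical monitor that can be silenced for a fixed period."""

    def __init__(self) -> None:
        # None means the monitor is not currently silenced.
        self.silenced_until: Optional[datetime] = None

    def silence(self, now: datetime, duration: timedelta) -> None:
        # Store an expiry time rather than an on/off flag, so the
        # silence lapses automatically once the period is over.
        self.silenced_until = now + duration

    def should_alert(self, now: datetime) -> bool:
        # Alert if never silenced, or if the silence has expired.
        return self.silenced_until is None or now >= self.silenced_until
```

Calling `silence(now, timedelta(hours=1))` before maintenance would suppress alerts for the next hour and nothing more.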

Prowl notifications

Prowl is a notification system for iOS, allowing you to receive alerts on an iPhone or iPad. The advantage of Prowl notifications over SMS is that they’re not limited to 160 characters, so we can include a more verbose message, including direct links for silencing the alert. They’re also delivered over the internet, rather than the mobile network, so will work if you have a wifi connection, but no mobile signal.

Support for an Android equivalent (such as Notify My Android) is on the to-do list.

Improved email alerts

The email alerts previously included the same dense text that we use for SMS alerts. The new style notifications are now more verbose, and include links for silencing the monitor, and in the case of web alerts, a link to the page that failed.

Monitoring of arbitrary TCP ports

We provide monitors for most common services, including HTTP, SMTP, IMAP and POP3. You can now also monitor any TCP port. This check simply confirms that the host is accepting connections on this port, and then closes the connection.
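A check of this kind can be sketched in a few lines of Python using only the standard library. This is an illustration of the idea rather than the code our monitoring service runs; the function name and the default timeout are assumptions made for the example.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if the host accepts a TCP connection on the port."""
    try:
        # create_connection completes the TCP handshake; leaving the
        # with-block closes the socket again without sending any data.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable or unresolvable: treat as down.
        return False
```

Note that a successful connection only shows that something is listening on the port, not that the service behind it is behaving correctly, which is what the protocol-specific monitors are for.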

Router fails, no packets dropped.

January 29th, 2014 by

This morning one of our routers in our Cambridge data centre stopped reporting bandwidth data to our billing system. We investigated and whilst it was still routing packets without issue, it appeared to be experiencing hardware failure.

We’ve powered the router down, pending full investigation on our data centre visit this afternoon. Currently all traffic from our Cambridge site is being handled by our other router. This seamlessly failed over with no customer impact.

Depending on your choice of terminology, either ‘Redundancy has been reduced to N’ or ‘The network is at-risk’. At Mythic Beasts we like to speak English, so this translates to: if something else fails before the router is restored to service, there is a risk of a network outage at our Cambridge data centre.

Update: on Friday 31st we fully restored our network to its usual redundant configuration by replacing the router with a similarly over-specified replacement. Customers may have received free bandwidth for some of this period.

Saturday outage report

January 27th, 2014 by

Edit: we’ve now received a report from Telecity, so have updated this report to take account of this.

Further edit: explanation for extended outage in one rack added.

Summary

  • A power interruption occurred at around 8:09am on Saturday 25th January, affecting multiple floors in Sovereign House.
  • For the most part, the interruption was momentary (around 500ms), but long enough to cause a reboot of affected equipment.
  • One of our racks was without power until 10:38am, due to a tripped circuit breaker.
  • Our staff were onsite at 11:15am, and then worked to restore services that had not come back up cleanly. One such service was our SOV DHCP server, and its loss will have affected any virtual servers configured to boot via DHCP.

Details

The power outage was caused by an interruption to the external mains power supply, followed by a failure of the DRUPS (Diesel Rotary Uninterruptible Power Supply) system that is supposed to ensure that power to the data centre is maintained during such a power cut.

The DRUPS system contains three separate units with sufficient capacity to cope with the failure of any one unit. Unfortunately, in this event, the unit that failed did so in a manner that triggered a shutdown of the other two. From the Telecity report:

… one of the units on DRUPS System 1 experienced a fault on its synchronisation card. This fault caused the unit to go into overload which, in turn, had a direct impact on the remaining two units. During the overload condition, the faulty unit back-fed the other two units which, for protection and per design, automatically shut down.

At this point the system went into raw mains bypass mode (i.e. bypassing the UPS systems, and connecting the data centre load directly to the mains). This occurred around 2 minutes after the original mains supply failure, by which point the mains supply had been restored, but there was a 500ms interruption as the bypass occurred.

This much is consistent with our observations: in all but one rack, the logs on our remote PDUs did not record an outage, but the vast majority of equipment attached to them did. The management interfaces in these PDUs draw very little electricity and are known to be able to survive very short power supply interruptions.

As noted above, one of our racks experienced a more extended outage. This was due to the circuit breaker on the power bar being tripped. This was noticed and rectified by data centre staff inspecting racks following the initial outage.

At this point, the faulty DRUPS unit is out of service, meaning that whilst the power supply is protected, there is no redundancy until the unit is repaired and tested.

Conclusion

Whilst we are certainly unhappy about the outage, at this point we have no cause to question our choice of data centre provider. Sovereign House is a major UK internet hub, and is a purpose-built 6 floor data centre, built to the highest industry standards. With the best will in the world, there will always be faults that can take an entire DC, or significant parts of it, off-line, and for this reason, we would always recommend that mission-critical applications are served from multiple sites. Independent routing ensured that our facilities at other sites were unaffected by the Sovereign House outage.

That said, the aftermath of the outage has revealed some areas in which we can improve. In particular, the extended outage of one rack had a knock-on effect on the connectivity of others. Following Sunday’s scheduled maintenance work, we’re now in a position to improve our network topology to make it more resilient. We are also planning improvements to our Virtual Server hosts and database servers to ensure that they can recover more quickly following such an outage, and we have already made changes to our support systems to make them more resilient.

Beyond directly fixing the affected units, Telecity are also planning improvements to their communications during such an incident. This will help us direct our efforts more effectively.

Notes

For the avoidance of doubt, this interruption was completely unrelated to the network upgrades scheduled for Sunday evening, which went ahead as planned.

Finally, thank you to all customers who monitored our status page during the outage.

More bits

January 10th, 2014 by

At the end of last year we took the decision to significantly upgrade our two connections to LINX – our busiest connections to the outside world.

This turned out to be a good plan, as Mythic Beasts got a Christmas present in the form of a new company bandwidth record, thanks to two customers, Blinkbox Music and Raspberry Pi, getting a substantial spike in hits as people unwrapped their Christmas presents.

And it seems that the excitement of all the presents hasn’t worn off, as the Christmas Day record was toppled by a new all-time high yesterday. With the Blinkbox apps very high in the free music app charts, we’re not expecting it to stand for long.

Raspi.tv

January 9th, 2014 by

Here’s an unsolicited customer review of a migration from a dedicated server to one of our managed virtual machines, from Alex at raspi.tv, who’s building a 9-inch HDMI 1080p screen.

New Year, New Server At mythic Beasts

You can find the original twitter conversation at @Mythic_Beasts.

Coping with Christmas

January 7th, 2014 by

Our latest blog post, Coping with Christmas, is on the Raspberry Pi website.

LINX now running at 2x10Gbps

November 29th, 2013 by

Today we’ve upgraded both of our connections to the London Internet Exchange (LINX) from 1Gbps to 10Gbps.

Over the past few weeks we’ve repeatedly broken the company bandwidth record. And since we’ve recently secured more peering agreements — including every major UK connectivity provider — a greater proportion of our traffic is now going out over LINX. So at peak times our bandwidth usage has been enough that in the unlikely event of a failure of one of the LINX LANs, we would have come close to running out of capacity on our other link. Clearly an upgrade was in order!

Our network engineers performed the upgrade this morning, with no disruption, as traffic was automatically and transparently rerouted during the brief downtime. After the upgrade, we have 10Gbps from our data centre in Telecity Sovereign House to LINX Juniper, and 10Gbps from our Harbour Exchange data centre to LINX Extreme.

In the event of the failure of either link or router, traffic will automatically reroute around our internal fibre ring to our other site and out to the peering exchange via our other connection. And, for the time being, we have plenty of capacity to spare.
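The capacity reasoning behind the upgrade is simple arithmetic: if one link fails, the surviving link has to carry the peak traffic of both. A minimal Python sketch follows. The function name is made up for this example, and the traffic figures in the usage note are purely hypothetical, not our real numbers.

```python
def failover_headroom_gbps(peak_gbps_per_link, link_capacity_gbps):
    """Spare capacity (Gbps) on one link if it had to carry all traffic.

    peak_gbps_per_link: peak traffic on each link during normal running.
    link_capacity_gbps: capacity of the single link left after a failure.
    A negative result means the surviving link would be overloaded.
    """
    return link_capacity_gbps - sum(peak_gbps_per_link)
```

With hypothetical peaks of 0.5Gbps and 0.45Gbps, `failover_headroom_gbps([0.5, 0.45], 1.0)` leaves only around 0.05Gbps spare on a 1Gbps link, whereas the same traffic on a 10Gbps link leaves over 9Gbps.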

Sender Verify vs Hotmail

November 26th, 2013 by

We aim to give our users the choice of a range of anti-spam measures. One of the options we provide is sender verify, a simple check whereby before you accept a mail, you check that the sender of that email exists, and would accept mail from you. You can argue about how effective this is as an anti-spam measure, but it seems a perfectly reasonable check to want to make, in the same way that many people choose to not answer their phone to those who withhold caller ID.

Unfortunately, some people object to you asking the question.

We recently had some complaints from users who said that they couldn’t receive mail from people with addresses hosted on Microsoft’s Hotmail servers, and sure enough, Hotmail have blacklisted one of our servers’ IPs for daring to enquire about whether particular sender addresses were valid. This affects not just hotmail.com, but various other Microsoft domains.

Sadly, Microsoft aren’t going to change their policy for us, so we needed to whitelist them. This isn’t entirely trivial as what matters is where the sender’s email address is hosted, which means looking up the MX records for that domain. Fortunately, Exim makes this easy enough, provided that you’re not offended by curly brackets. Adding the following condition to a sender verify ACL will disable the check for Hotmail hosted domains:

!condition = ${if forany{${lookup dnsdb{>: mxh=$sender_address_domain}{$value}fail}}{match {$item}{\Nmx.\.hotmail\.com\N}}}
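For anyone who is offended by curly brackets, the logic of that condition can be sketched in Python. This is only an illustration: the function name is hypothetical, and the MX lookup itself is left out, so the list of MX hostnames is assumed to have been fetched already (in Exim, that is the job of the dnsdb lookup).

```python
import re

# Same pattern as in the Exim condition: "mx", any one character,
# then ".hotmail.com". It is unanchored, so it may match anywhere
# in the hostname.
HOTMAIL_MX = re.compile(r"mx.\.hotmail\.com")


def skip_sender_verify(mx_hosts):
    """Return True if any MX host for the sender's domain is Hotmail's.

    mx_hosts: MX hostnames for the sender's domain (assumed to have
    been looked up elsewhere; no DNS query is made here).
    """
    return any(HOTMAIL_MX.search(host) for host in mx_hosts)
```

So `skip_sender_verify(["mx1.hotmail.com", "mx2.hotmail.com"])` is true and the check would be skipped, whereas a domain whose mail is handled by `mail.example.com` would still be verified.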

I should note that for quite some time, we’ve used a dedicated IP address for performing our sender verify checks in order to minimise the impact of exactly this type of blacklisting. If we hadn’t done this, the blacklist would have made it impossible for any users to send mail to Hotmail-hosted addresses too. As it was, the problem only affected users who had elected to use sender verify on their domains.