Rearranging Raspberry Pi

February 27th, 2017


Replacing the production DB under a running system.

Two years ago we migrated Raspberry Pi from a single big server to a series of virtual machines (VMs) on an even bigger server. As time has gone on, this architecture has served us well; we’ve managed the Pi Zero launch and the busier Raspberry Pi 3 launch, and even briefly ran the website on Raspberry Pis as a test.

However, we had a number of issues with the setup that we were looking to address:

  1. We were out of space at the back end and needed more capacity;
  2. We wanted more redundancy: the setup was dependent on a single dedicated VM host in a single data centre; and
  3. There’s an apparent hardware fault on the current VM host that causes it to very occasionally spontaneously reboot (or, in one case, switch itself off altogether).

It was time for a bit of an upgrade.

Scalable WordPress

The biggest part of the Raspberry Pi configuration is the main WordPress site that serves the front page for www.raspberrypi.org. This consisted of three VMs: two web servers and one database server. WordPress doesn’t provide any built-in functionality for scaling to multiple servers, and although the vast majority of pages are driven entirely by the database, some operations, such as installing plugins or uploading media, result in the creation of local files that need to be available to all web servers.

In order to support WordPress in a multi-server configuration, we arrange the two web servers as a primary and a secondary. The primary delivers half of the public requests and also hosts the administration and content-creation side. The relevant parts of the local filesystem are then regularly rsynced to the secondary server, which serves public requests and can support the full public usage of the site on its own if necessary.

The website is fronted by our “CDN”. This is a cluster of Mac Mini servers that we use to offload much of the static content traffic, and to load balance across the two web servers.

Step 1: Figure out what you’re trying to do and write a plan

More capacity meant a new VM host, and more redundancy meant that it went in a different data centre (Sovereign House, or “SOV”) to the existing VM host (Harbour Exchange, or “HEX”).

Diagnosing the fault on the current VM host is tricky: firstly because it only occurs once or twice a year, and secondly because the machine was hosting quite a busy live website. So our plan was to migrate all VMs onto entirely new hardware in HEX so that we can prod the old box at our leisure. This also gave us a handy opportunity to do something we’d been wanting to do for a while: an OS upgrade on the VM host itself.

To add redundancy to the main site, we can split the two web server VMs across the two sites, but this doesn’t help unless we also replicate the database at the new site. So overall, we wanted to move one web server VM to SOV and add a database VM there, and in HEX we wanted to move the database VM and the remaining web server VM to the new hardware. This all needed to be achieved with minimal downtime: it’s a highly public site, and we’d rather not have two hours of downtime if we can avoid it.

Step 2: Move the database

We brought up a second database VM at the second site (SOV). We set this up as a MySQL replica of the primary database VM; configuring this requires only the briefest of interruptions to service. We then simulated a failure of the primary database server and moved all database services to the alternative site, so the database is now in SOV. Again, this caused only a very brief interruption to service (<5s).
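Bootstrapping a replica with only a brief interruption follows the standard MySQL recipe of the era; a sketch, in which every host name, credential and binlog coordinate is a placeholder rather than our real configuration:

```sql
-- On the primary: create a replication account (placeholder credentials)
CREATE USER 'repl'@'10.0.0.%' IDENTIFIED BY 'secret';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'10.0.0.%';

-- Take a dump that records the binlog position; for InnoDB tables,
-- --single-transaction avoids holding a long write lock:
--   mysqldump --single-transaction --master-data=2 --all-databases > dump.sql

-- On the new replica, after loading dump.sql: point it at the primary
-- using the coordinates recorded in the dump header, then start it
CHANGE MASTER TO
  MASTER_HOST='db-hex.example',
  MASTER_USER='repl',
  MASTER_PASSWORD='secret',
  MASTER_LOG_FILE='mysql-bin.000042',
  MASTER_LOG_POS=107;
START SLAVE;
SHOW SLAVE STATUS\G
```

Once `Seconds_Behind_Master` reaches zero, the replica is in step and can take over with only the seconds of interruption described above.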

We then moved the old database VM to the new VM host in the original site (HEX) and reconfigured it as a MySQL replica. We now have a primary/secondary setup for MySQL on the new VM hosts, with the secondary in a different location (HEX) to the primary (SOV), and the HEX server no longer on the faulty hardware.
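Promoting the replica and later re-pointing the old primary is similarly a handful of statements; again a sketch rather than our exact runbook, with placeholder names:

```sql
-- On the replica in SOV, at the moment of the (simulated) failure:
STOP SLAVE;                  -- stop the replication threads
RESET SLAVE ALL;             -- forget the old primary entirely
SET GLOBAL read_only = OFF;  -- accept writes from the web servers

-- Later, on the rebuilt VM in HEX: configure it as a replica of the
-- newly promoted primary, with coordinates from a fresh dump as above
-- CHANGE MASTER TO MASTER_HOST='db-sov.example', ...;
-- START SLAVE;
```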

It’s worth noting that in normal operation, both web servers use the same database server for all queries. In similar arrangements, it’s quite common to have one or more web servers send read queries to the replica and write queries only to the primary, thereby reducing the load on the primary database server. Unfortunately, standard WordPress doesn’t support this, but there are plugins that do, which we may look at in the future.

Step 3: Move the web servers

We shut down the secondary web server (HEX), moved it to the replacement VM host in the same data centre and brought it back up. The CDN automatically redirected all web traffic to the primary web server until the secondary came back up. Once this was complete, we took the primary web server offline, disabling all administration functions for the main website. Fortunately we’d told everyone in the Raspberry Pi office to drink coffee while we did this, so nobody complained. Again the CDN moved all production traffic, this time to the secondary web server, and we then moved the primary VM to our alternative data centre (SOV), to sit next to the primary database server.

Step 4: Tell the CDN

Unlike many providers, we have independent routing at each of our sites. This gives us much greater resilience to network problems, but means that moving a VM between sites necessitates a change of IP address. We informed our cluster of Mac Minis in the CDN that the primary web server had moved, and the administration site sprang back into life, with traffic once again split evenly across the two sites.
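Our CDN nodes run custom software, but if you were doing the same with a stock load balancer such as nginx, the change would amount to editing the upstream address and reloading; a purely illustrative sketch with documentation-range placeholder IPs:

```
# nginx upstream block (illustrative only; our CDN software is custom)
upstream wordpress {
    server 203.0.113.10;   # primary web server: new address in SOV
    server 198.51.100.20;  # secondary web server in HEX
}
```

followed by `nginx -s reload`, which picks up the new address gracefully without dropping in-flight connections.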

Step 5: Drink coffee

Over the course of about three hours, we’d migrated a high-volume production website from a non-redundant, single-site configuration to a geographically redundant one, moved the primary database and primary web server to a different location, and provided a capacity upgrade. This was all done in the middle of the day with no user-facing downtime, and only a modest maintenance window for the administration portal.

With that done, we can start work on the next stage of the plan: migrating the remainder of the VMs away from the old VM host in HEX.

Ten years on, Chris Lightfoot looks more prescient than ever

February 13th, 2017


(Title shamelessly stolen from Tom Steinberg, mySociety founder, from his tribute to Chris, a founder of Mythic Beasts, who died ten years ago.)

Lots of people have been excited recently about this script, which allows you to remotely reinstall a Linux system with a different version of Linux by giving you a shell in a ramdisk and letting you reinstall the operating system from there.

Chris did this the hard way. Back in 2005 I remember being asked to code review ‘evil.c’, a program that allocated a lot of RAM (800MB!), loaded a compressed filesystem image into it, then uncompressed it straight over the disk. On reboot, the machine should come up running Debian instead of the FreeBSD it had been running earlier. It’s really very important not to swap during this process.

Amazingly it worked, and the first test was on a remote box and it saved us a data centre visit. Here’s the code in its full glory.

#include <sys/types.h>

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <zlib.h>

#include <sys/mman.h>
#include <sys/reboot.h>

#define SIZE        ((size_t)896058269)

#define E(...)      fprintf(stderr, __VA_ARGS__)
#define die()       do { fprintf(stderr, "%s; aborted\n", strerror(errno)); exit(1); } while (0)
#define zdie()      do { fprintf(stderr, "%s; aborted\n", Z.msg); exit(1); } while (0)

int main(void) {
    unsigned char *buf, *outbuf, *p;
    int fd;
    FILE *fp;
    z_stream Z = {0};
    unsigned int nin = 0, nout = 0;

    E("size = %lu\n", (unsigned long)SIZE);

    E("open /dev/amrd0... ");
    if (-1 == (fd = open("/dev/amrd0", O_RDWR | O_DIRECT)))
        die();
    E("done\n");
    /* keep fd open: the decompressed filesystem image is written to it below */

    E("allocate file buffer... ");
    if (!(buf = malloc(SIZE)))
        die();
    E("done\n");

    E("allocate write buffer... ");
    if (!(outbuf = malloc(1024 * 1024)))
        die();
    E("done\n");

    E("lock into memory... ");
    if (-1 == mlockall(MCL_CURRENT | MCL_FUTURE))
        die();
    E("done\n");

    E("open file... ");
    if (!(fp = fopen("/usr/bitter-first-2100M-of-sda.gz", "rb")))
        die();
    E("done\n");

    E("read file... ");
    p = buf;
    while (nin < SIZE) {
        size_t n;
        n = fread(p, 1, 262144, fp);
        if (n == 0)
            die();
        nin += n;
        p += n;
        E("\rread file... %.2f%% ", 100 * (float)nin / (float)SIZE);
    }
    E("done\n");

    fclose(fp);
    E("zlib version = \"%s\"\n", zlibVersion());

    /* Now we need to walk through the buffer decompressing it into the
     * write buffer, then writing the results to the device. */
    E("initialise inflate object... ");
    Z.next_in = buf;
    Z.avail_in = SIZE;
    if (Z_OK != inflateInit2(&Z, 15 + 32))
        zdie();
    E("done\n");

    while (nout < 2100) {
        int i;
        size_t N;

        Z.next_out = outbuf;
        Z.avail_out = 1024 * 1024;
        i = inflate(&Z, 0);
        if (i != Z_OK && i != Z_STREAM_END)
            zdie();
        if (Z.next_out != outbuf + 1024 * 1024) {
            fprintf(stderr, "\ndidn't get 1MB of output\n");
        }

        /* this is where we'd write the data */
        N = 0;
        p = outbuf;
        while (N < 1024 * 1024) {
            ssize_t n;
            do
                n = write(fd, p, 1024 * 1024 - N);
            while (n == -1 && errno == EINTR);
            if (n == -1)
                die();
            N += n;
            p += n;
        }

        ++nout;
        fprintf(stderr, "\r%d / 2100 MB", nout);
    }

    fprintf(stderr, "\n");

    /* this is where we reboot */
    reboot(RB_NOSYNC);

    E("we should have rebooted by now -- probably best to assume we're completely\n"
      "screwed at this point\n");

    return 0;
}

Tax needn’t be taxing thanks to TaxCalc

February 1st, 2017

One of our customers has the lovely-looking bandwidth graph on the right, which plummeted to zero this morning. Normally a huge sudden drop in activity on a customer site would be cause for alarm, but this is the excellent TaxCalc, who make software to calculate tax, and it gets very busy in the run-up to the self-assessment deadline, which was at midnight last night.

TaxCalc are customers of our enhanced management services with a 24/7 SLA, running a fully mirrored setup across two of our data centres, and we’re happy to report that everything went smoothly and their system scaled beautifully to handle the load.

Thankfully our elected overlords have decided to smooth out the load on our servers with new personal tax accounts and shortly we’ll all have to fill in four tax returns per year instead of one.

Managed WordPress

January 23rd, 2017

Analogue photo taken with film and real chemistry. Parallax Photographic Cooperative.

WordPress is an excellent content management system that is behind around 25% of all sites on the internet. Our busiest site is Raspberry Pi, which is now constructed from multiple different WordPress installations and some custom web applications, stitched together into one nearly seamless high-traffic website.

We’ve taken the knowledge we’ve gained supporting this site and rolled it out as a managed service, allowing you to concentrate on your content whilst we take care of keeping the site up and secure. In addition to 24/7 monitoring, plugin security scans, and our custom security hardening, we’re also able to assist with improving site performance.

We’re now hosting a broad range of sites on this service. The simpler cases start with customers such as Ellexus, who make very impressive technology for IO profiling and need a reliable, managed platform that they can easily update.

At the other end of the spectrum we have the likes of Parallax Photographic, a co-operative in Brixton who sell photography supplies for people interested in film photography, using real chemistry to develop the photographs and giving a full analogue feel to the resulting prints. Parallax Photographic use WordPress to host their online shop, embedding WooCommerce into WordPress to create a fully functional e-commerce site.

Parallax were having performance and management issues with their existing self-managed installation of WordPress. We transferred it to our managed WordPress service, in the process adding not only faster hardware but also performance improvements to their WordPress stack, custom security hardening, managed backups and 24/7 monitoring. We took one hour for the final switch-over at 9am on a Sunday morning, leaving them with a faster and more manageable site. They now have more time to spend fulfilling orders and taking beautiful photographs.

Purrmetrix monitors temperature accurately and inexpensively and, as you can see above, with excellent embeddable web analytics. In addition to hosting their website and WooCommerce shop for people to place orders, we are also customers (directly, through their website!) using their service to monitor our Raspberry Pi hosting platform. The heatmap (above) is a real-time export from their system. At the time of writing, it shows a 5C temperature difference between the cold and hot aisles across one of our shelves of 108 Pi 3s. The service provides automated alerts: if that graph goes red, indicating an over-temperature situation, alerts start firing. During the prototyping and beta phase for our Raspberry Pi hosting platform, we used their graphing to demonstrate that it takes about six hours from dual fan failure to critical temperature issues. This is long enough to make maintenance straightforward.

Also embedded in our Raspberry Pi hosting platform are multiple Power over Ethernet modules from Pi Supply, who make a variety of add-ons for the Raspberry Pi, including some decent high-quality audio adapters. With the launch of the Raspberry Pi 3, we had to do some rapid vertical scaling of the Pi Supply managed WooCommerce platform: in thirty seconds we had four times the RAM and double the CPU cores to cope with the additional customer load.

 

We host a wide variety of WordPress sites, including Scottish comedy club Mirth of Forth, personalised embroidery for work and leisure wear, and our own blog that you’re currently reading. So if you’d like us to run your WordPress site for you, from a simple blog to a fully managed e-commerce solution or one of the busiest sites on the web, we’d love to hear from you at sales@mythic-beasts.com.

Don’t leave your laptop in the pub

December 9th, 2016

After about twenty pages of awesome beers, you discover they also have mead.

Last night we had our Christmas party. For a 24/7 operation, that means we have to have at least one laptop with us at the party. We had just one urgent customer issue, which we dealt with easily without ruining the night.

However, in addition to taking your laptop to the pub, Pete would like to remind everyone that it’s equally important to remember to take your laptop home from the pub too, as he didn’t. This meant we had to have a brief security review to evaluate the risks of briefly losing a company laptop. Ten years ago, when we had tens rather than thousands of servers, this would have resulted in a revocation and replacement of the company ssh key on every server under emergency conditions (and those of you with an unencrypted AWS key might worry about total company deletion).

Over the past decade we’ve put more effort into improving our security. The laptop contains an encrypted filesystem; on that filesystem is an encrypted ssh key which will allow someone into our jump box. If they’ve worked out the password for the filesystem and the password for the ssh key, they then also need to guess the password on the jump box before they would be able to access customer or company systems. That’s three different passwords to guess, or two encryption breaks and one password to guess. The passwords are not chosen by the user; they come straight from pwgen and the random number generator. Whilst we’re not worried, we’ll do some extra monitoring of the logs on the jump box for attempts on Pete’s account.

Of course there’s also a risk that someone physically tampered with the hardware to install a key logger between the laptop being left in the pub and being recovered the next day. The laptop passes a brief physical inspection; if it has been tampered with, it has been tampered with very well. If our attacker was sat in the pub with a spare key logger kit just in case the laptop was left behind, it would have been easier and cheaper to stage a break-in at an employee’s house, or to have forced them to check their hand luggage on a flight, or to have installed the key logger before the laptop was bought, or maybe to have compromised the random number generator in any or all of our servers before they were bought. So our threat model remains relatively unchanged, and we don’t think we’re under significantly more risk today than we were yesterday.

On the upside, the server room isn’t on fire.

December 8th, 2016
This is not the correct way to mix servers and water-based fire suppressant.

One of our customers does embedded development and has some custom servers in their office as a build platform. This is hardware specific to the embedded designs they’re working on, and they can’t locate it in a data centre as it requires regular human attention: unstable development drivers cause crashes, and the root flash filesystems need to be re-imaged and replaced.

Recently they’ve moved office, and their new office has a ‘server room’, ideal for putting their very expensive custom kit in, along with a handful of other machines that they keep locally. While doing the fit-out, they noticed that their ‘server room’ is attached to the main sprinkler system. In the event of a fire in the building, the bread may be saved from being overly toasted, but their expensive hand-built development boards would be drowned.

They raised this with their landlords, who billed them the best part of a thousand pounds to resolve the problem; see the picture on the right.

I’m not sure if it’s the belief that the plastic roof will help, the combustible struts holding it up, or the lack of guttering that really emphasises the mismatch between what a landlord thinks a server room looks like and what a real data centre actually provides.

We’re in further discussions to see if we can host their custom kit too, because our server room has non-computer-damaging halon as a fire suppressant, and we will return the servers to them unwashed. If your office server room looks like this, please get in touch at sales@mythic-beasts.com.

Backup Upgrade

November 25th, 2016
We’re using AES rather than 8-rotor Enigma encryption.

We’ve just completed an upgrade to our backup services. We’ve relocated the London node into Meridian Gate, which means that for all London-hosted virtual machines your primary backup is now in a different building to your server. We’ve kept our secondary backup service in our Cambridge data centre, 60 miles distant.

To further improve the service, we’ve taken the opportunity to enable disk encryption, so that all data stored on the primary backup server is now encrypted at rest, providing an additional layer of assurance for our clients and fewer questions to answer on security questionnaires. We’ve also restricted the set of ssh ciphers allowed to access the backup server, to further improve the security of data in transit, and we’ve increased the available space and provided a performance boost in the IO layer, so backups and restores will complete more quickly.
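The cipher restriction itself is a couple of lines in the server’s sshd_config; the values below are an illustrative modern-cipher allowlist rather than our published configuration:

```
# /etc/ssh/sshd_config on the backup server (illustrative values)
Ciphers aes256-gcm@openssh.com,chacha20-poly1305@openssh.com,aes256-ctr
MACs hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com
```

sshd needs a reload for this to take effect, and it’s worth keeping an existing session open while testing so a typo can’t lock you out.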

Of course we’ve kept some important features from the old backup service such as scanning our managed customers’ backups to make sure they’re up to date and making sure that we alert customers before their backups start failing due to lack of space. Obviously all traffic to and from the backup server is free and it supports both IPv6 and IPv4.

If these are the sort of boring tasks on your todo list and you’d like us to do them for you, please get in touch at sales@mythic-beasts.com.

IPv6 Update

November 1st, 2016

Sky completed their IPv6 rollout – any device that comes with IPv6 support will use it by default.

Yesterday we attended the annual IPv6 Council to exchange knowledge and ideas with the rest of the UK networking industry about bringing forward the IPv6 rollout.

For the uninitiated, everything connected to the internet needs an address. With IPv4 there are only 4 billion addresses available which isn’t enough for one per person – let alone one each for my phone, my tablet, my laptop and my new internet connected toaster. So IPv6 is the new network standard that has an effectively unlimited number of addresses and will support an unlimited number of devices. The hard part is persuading everyone to move onto the new network.

Two years ago, when the IPv6 Council first met, roughly 1 in 400 internet connections in the UK had IPv6 support. Since then Sky have rolled out IPv6 everywhere, and by default all their customers have IPv6 connectivity. BT have rolled IPv6 out to all their SmartHub customers and will be enabling IPv6 for their Homehub 5 and Homehub 4 customers in the near future. Today 1 in 6 UK devices has IPv6 connectivity, and when BT complete their rollout it’ll be closer to 1 in 3. Imperial College also spoke about their network, which has IPv6 enabled everywhere.

Major content sources (Google, Facebook, LinkedIn) and CDNs (Akamai, Cloudflare) are all already enabled with IPv6. This means that as soon as you turn on IPv6 on an access network, over half your traffic flows over IPv6 connections. With Amazon and Microsoft enabling IPv6 in stages on their public clouds, IPv6 traffic will continue to grow. Already for some ISPs, IPv6 is the dominant protocol. The Internet Society are already predicting that IPv6 traffic will exceed IPv4 traffic around two to three years from now.

LinkedIn and Microsoft both spoke about deploying IPv6 in their corporate and data centre environments. Both companies are suffering exhaustion of private RFC1918 address space – there just aren’t enough 10.a.b.c addresses to cope with organisations of their scale so they’re moving now to IPv6-only networks.

Back in 2012 we designed and deployed an IPv6-only architecture for Raspberry Pi, and have since designed other IPv6-only infrastructures including a substantial Linux container deployment. Educating the next generation of developers about how networks will work when they join the workforce is critically important.

More bandwidth

October 19th, 2016
We’ve added 476892 kitten pictures per second of capacity.

We’ve brought up some new connectivity today; we’ve added a new 10Gbps transit link out of our Sovereign House data centre. This gives not only more capacity but also some improved DDoS protection options with distance-based blackholing.

We also added a 1Gbps private peering connection to IDNet. We’ve used IDNet for ADSL connections for a long time, not least for their native IPv6 support. A quick inspection shows 17% of traffic over this private link as native IPv6.

Sneak preview from Mythic Labs, Raspberry Pi netboot

August 5th, 2016

We don’t like to pre-announce things that aren’t ready for public consumption. It’s no secret that we’d love to offer hosted Raspberry Pis in the data centre, and in our view the blocker for this being possible is the unreliability of SD cards which require physical attention when they fail. So we’ve provided some assistance to Gordon to help with getting netboot working for the Raspberry Pi. We built a sensible looking netboot setup and spent a fair amount of time debugging and reading packets to try and help work out why the netboot was occasionally stopping.

This isn’t yet a production service and you can’t buy a hosted Raspberry Pi server. Yet. But if you’d be interested, we’d love to hear from you at sales@mythic-beasts.com.



This is a standard Raspberry Pi 3 with a Power over Ethernet (PoE) adapter. You have to boot the Pi once from a magic SD card which enables netboot. Then you remove the SD card and plug the Pi into the powered network port. PoE means we can power cycle it using the managed switch. At boot, it talks to a standard tftpd server and isc-dhcp-server, which deliver a kernel that then runs from an NFS root. It’s a minimal Raspbian Jessie from debootstrap plus sshd, and occupies a mere 381M versus the 1.3G for a standard Raspbian install. The switch reports the Pi 3 consuming 2W.
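The server side of that chain is ordinary DHCP/TFTP/NFS configuration; a minimal sketch, with all addresses and paths assumed for illustration rather than taken from our setup:

```
# isc-dhcp-server (dhcpd.conf): hand out addresses and name a TFTP server
subnet 10.0.0.0 netmask 255.255.255.0 {
    range 10.0.0.100 10.0.0.200;
    option tftp-server-name "10.0.0.1";   # the Pi 3 boot ROM fetches from here
}

# /etc/exports: read-only NFS export holding the minimal Raspbian root
/srv/rpi/root 10.0.0.0/24(ro,no_subtree_check)

# cmdline.txt, served over TFTP: tell the kernel to mount its root over NFS
root=/dev/nfs nfsroot=10.0.0.1:/srv/rpi/root,vers=3 ip=dhcp rootwait
```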

The Raspberry Pi topple is just for fun.