The little computer that did

April 13th, 2016 by

At the end of March we migrated the Raspberry Pi website from a very big multi-core server to a tiny cluster of eight Raspberry Pi 3s. Here’s a bit more detail about how it worked.

The Pi rack not fooling anyone on April 1st


Booting

For the Raspberry Pi 3 launch we tried out some Pis running in a data centre environment under high load, using the SD card for the root filesystem. They kept crashing: if you exceed the write capability of the card, the resulting delays make the kernel think the storage has failed, and the system falls over. We also want to be able to rebuild the filesystem remotely, so that we can fix a broken Pi without physical access. So we’ve put the root filesystem on a network file server, accessed over NFS.

The Raspberry Pi runs the latest kernel, 4.1.18-v7+, and boots from the SD card with the following configuration:

dwc_otg.lpm_enable=0 console=ttyAMA0,115200 console=tty1 root=/dev/nfs rootfstype=nfs
  ip=10.46.189.2::10.46.189.1:255.255.255.252::eth0:off 
  nfsroot=10.46.189.1:/export/10.46.189.2 elevator=deadline 
  fsck.repair=yes rootwait

This brings up eth0 with a block of four IP addresses (a /30 network): one address for the network, one for broadcast, one for the Pi and one for the network fileserver. It then mounts the NFS filesystem at:

nfsroot=10.46.189.1:/export/10.46.189.2

and uses that as the root filesystem.
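The /30 arithmetic can be checked with Python’s ipaddress module. A small sketch, using the addresses from the example boot line above (each Pi would get its own /30):

```python
import ipaddress

# netmask 255.255.255.252 on the kernel command line is a /30
net = ipaddress.ip_network("10.46.189.0/30")

print(net.network_address)            # the network address
print(net.broadcast_address)          # the broadcast address
print([str(h) for h in net.hosts()])  # the two usable hosts:
# 10.46.189.1 is the fileserver and 10.46.189.2 the Pi, as in the boot line
```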

Overly simple introduction to VLANs

On a traditional switch, you plug things in and any ethernet port can talk to any other ethernet port. If you want two different networks, you need two different switches, and any computer that needs to be on both networks needs two network ports. In our case we’re trying to give each Raspberry Pi its own private storage network, so each Pi would need its own switch, and the fileserver would need its own network port for every connected Raspberry Pi to keep them separate. This would get expensive very quickly.

Instead, we turn on virtual LANs (VLANs). We connect our fileserver to port 24 and create a VLAN for ports 1 & 24, another for ports 2 & 24, and so on. The switch configuration for the fileserver port specifies these VLANs as “tagged”, meaning the switch adds a header to the front of every packet from a Raspberry Pi port that allows the fileserver to tell which VLAN, and therefore which Raspberry Pi, the packet came from. The fileserver can reply with the same header, and that packet will be sent only to that specific Raspberry Pi. It behaves as if each Raspberry Pi had its own switch.

Network on the fileserver

The fileserver sees each VLAN as a separate network card, named eth0.N where N identifies the VLAN. We can configure them like any other network interface:

auto eth0.10
iface eth0.10 inet static
	address 10.46.189.1
	netmask 255.255.255.252

auto eth0.11
iface eth0.11 inet static
	address 10.46.189.5
	netmask 255.255.255.252

eth0.10 and eth0.11 appear to be separate network cards, each on a tiny network with one Raspberry Pi at the other end, but in reality there’s a single physical ethernet connection underneath them all.

Network on the Raspberry Pi

On the Raspberry Pi, eth0 is already configured by the boot line above to talk to the fileserver. In our switch configuration, we specify that the private network is “untagged” on the Raspberry Pi’s port, which means its packets won’t carry a VLAN header, and we can access the network as “eth0” rather than “eth0.N” as we did on the fileserver.

In order to do anything useful, we also need to give the Raspberry Pis access to the public network. On our network, the public network is accessible on VLAN 131. We configure this to be a “tagged” VLAN on the Raspberry Pi port, meaning it becomes accessible on the eth0.131 interface. We can configure this in the normal way, and in keeping with other back-end servers on the Raspberry Pi setup, it only has an IPv6 address:

auto eth0.131
iface eth0.131 inet6 static
	address	2a00:1098:0:84:1000:1::2
	netmask 64
	gateway	2a00:1098:0:84::1

Effectively, the Raspberry Pi believes it has two network cards: one, eth0, on a private network shared with the fileserver; the other, eth0.131, with an IPv6 address, connected to the real internet.

Why all that configuration?

In an ideal world we’d have a single IPv6 address for each Pi and mount the network filesystem over it. However, with an NFS root filesystem, another user on the LAN who managed to steal your IPv6 address could potentially access your files. There’s a second complication: IPv4 support is built into the standard Raspberry Pi kernel, so the per-Pi differences are confined to the kernel command line; with IPv6 we’d have to build an initrd to load the IPv6 modules and set up the NFS mounts.

Planning for the future, we’ve spoken to Gordon about how PXE boot on the Raspberry Pi will work, and it’s extremely likely that it will require IPv4 to pull in the bootloader, kernel and initrd. Whilst there is native IPv6 in the Raspberry Pi office, there isn’t any on their test LAN for developing the boot code, and it’s currently not a major priority for the Pi, despite around 5% of the UK having native IPv6.

So if we want to make this commercial, each Pi needs its own storage network and it needs IPv4 on the storage network.

Power over Ethernet

We’ve added a Power over Ethernet HAT to our Raspberry Pis. This means that they receive power over the ethernet cable in addition to the two separate networks. As well as reducing the amount of space used by power bricks, it also means you can power cycle a Raspberry Pi just by re-configuring the switch.

Software

Each Raspberry Pi runs Raspbian with Apache2 installed. We’ve pulled in PHP7 from Debian Stretch to improve PHP performance and then copied all the files for the Raspberry Pi website onto the NFS root for each Raspberry Pi (so the fileserver effectively has 8 copies – one for each Pi). We then just added the IPv6 addresses of the Raspberry Pis into the site’s load balancer, deleted the addresses for the main x86 servers and waited for everything to explode.

Did it work?

Slightly to our surprise, yes, and well. We had a couple of issues: the Pi is much slower than the x86 servers, not only in clock speed but also in the speed of the network card used to access the filesystem and the database server. Some rarely used functions, such as registering a new Raspberry Jam, weren’t really quick enough under the new setup and gave some people error pages as the connections timed out. Uploading images for new WordPress posts was similarly an issue, as receiving a 3MB file and distributing eight copies of it over a 100Mbps network isn’t very fast. But mostly it worked.
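The image-upload bottleneck is easy to put numbers on. A rough best-case sketch (it ignores NFS and TCP overhead, so reality is slower):

```python
# Best-case time to push one 3MB upload to all eight per-Pi NFS roots
# over a single 100Mbps link.
file_megabytes = 3
copies = 8                       # one NFS root per Pi
link_megabits_per_second = 100

total_megabits = file_megabytes * copies * 8   # bytes -> bits
seconds = total_megabits / link_megabits_per_second
print(f"at least {seconds:.2f} seconds per upload")
```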

Did power cycling the Pis via the switch work?

We never tested it in production: every Pi remained up and stable for the whole 3.5 days we had the system in use. In testing it’s been fine.

Can I buy one?

Not yet. At present you can still break a Pi by destroying the flash, and the enclosure doesn’t allow for replacement without taking the whole shelf (which in production would contain 96 Pis) offline. Once we have full netboot for the Pi, it is a service we could offer.

Can I register my interest to buy a Pi in the cloud?

Sure – email us at sales@mythic-beasts.com and we’ll add you to a list to keep you up to date.

Rebuilding Raspberry Pi

March 9th, 2016 by

After the Raspberry Pi 2 launch in February 2015, we had a review of how to improve and scale the hosting setup for Raspberry Pi. There were two components that caused us pain during the Pi 2 launch: the main site, running WordPress, and the forums, powered by phpBB.

The first question from our review was whether we should be putting effort into scaling a WordPress site. WordPress is estimated to be powering as many as a quarter of all websites, and it’s popular for a reason: it makes site development very easy. WordPress is easily extensible through themes and plugins, it’s supported by a vast array of existing third party plugins, and it provides a good built-in framework for delegating and moderating authoring roles.

Unfortunately, this ease of development is at least in part down to a very simplistic execution model: each page is dynamically generated, executing code from every installed plugin and typically resulting in multiple database queries. When the Raspberry Pi site gets busy, it’s usually down to a huge number of visitors hitting a single page which is essentially a static news story. WordPress provides no built-in mechanism for caching such content, so by default we’re dynamically generating many copies of identical, or near-identical, pages.

Losing the flexibility and ease of development offered by WordPress just to cope with the handful of days when the site gets very busy would be unfortunate, so we decided to put effort into making the existing site scalable.

Caching

For pretty much every WordPress problem you can imagine, there’s at least one plugin offering to solve it for you. For site performance, there are a number of plugins, such as WP Supercache, but as WordPress itself provides no framework for identifying the cacheable parts of a page, these can only take a very simple, and typically over-cautious, page-based approach.

For example, if you’re a logged-in user, you might be served a page that is in some way tailored to you, so Supercache bypasses its cache and serves you a dynamic page. Similarly, if Supercache sees a request that looks like a comment being posted, the cache is invalidated and a dynamic page is served, then cached for future requests.

During the Pi 2 launch, we saw significant problems with load spikes when comments were posted. Clearly, a small delay in comments becoming visible on the site is a minor annoyance compared to thousands of visitors being served an error page, so we set about making our caching more aggressive.

We wrote a small hack called staticify. This fetches the key pages from the blog every 60 seconds and renders them to static HTML. That way we always have a page in our static cache, and because we’re selecting the pages that we cache we can afford to be more brutal with our caching: we know that there’s no user-specific content on these pages, so we serve up the same cached page even if you’re logged in.

More virtualisation

An important goal after the Pi 2 launch review was to split different parts of the site onto separate virtual servers. For example, having the WordPress blog and the forum software on different VMs made it much easier to experiment with HipHop VM, which offered a significant performance boost to the blog but is incompatible with the forum software.

Although the Raspberry Pi setup runs as a private cloud on a single host machine, having different components split onto separate VMs makes it much easier to balance resources between them, and if necessary, spin up extra capacity quickly using our public cloud.

IPv6

When Raspberry Pi was hit by a DDoS attack, we built an IPv6-only backend network for the machines to communicate with each other. In the new setup, all access to the backend VMs comes either from one of four front-end load balancers or from a “gateway” VM, so we thought we’d remove IPv4 connectivity from the VMs entirely. For example, this is ifconfig on one of the blog PHP VMs:

eth0      Link encap:Ethernet  HWaddr 52:54:00:3f:8a:5a  
          inet6 addr: 2a00:1098:0:82:1000:x:y:z/64 Scope:Global
          inet6 addr: fe80::5054:ff:fe3f:8a5a/64 Scope:Link

The VM occasionally needs to call out over IPv4: Akismet and Twitter, for example, don’t yet have full IPv6 support, so these requests go through a NAT64 gateway, provided by Mythic Beasts, that proxies the connections so they appear almost seamless to the VM. This is part of the Mythic Beasts IPv6 education project: backward ISPs claim there is no demand for IPv6, whereas we provide multiple services from IPv6-only servers and give discounts if you use IPv6-only services.

SSL

Officially we enabled SSL because we wanted to improve our Google ranking, but handy side effects include irritating the security services and preventing third-party networks from injecting adverts or corrupting downloads. SSL decryption is done on the front-end load balancers, and as they have lots of spare CPU it incurs no performance penalty. The only thing not yet encrypted is image downloads, because of incompatibilities with the current version of NOOBS. We hope to resolve this eventually.

Pi Zero Launch

November 26th at 7am: the Pi Zero launches, a $5 computer given away on the front of a magazine. The bandwidth graph for the Raspberry Pi server does this:

Launch day bandwidth graph


It’s very exciting, and we quickly exceed our previous records from the Pi 2 launch. The two VMs that generate all the webpages for the blog and deliver all the content are humming along at 10-25% capacity, and the database VM is almost completely idle. We’ve successfully cached almost everything: our database server only sees load when a comment is posted or the cache is updated. Meanwhile we comfortably exceed the 4,500 simultaneous users we had for the Pi 2, breaking 10,000 at our peak.


A quick back-of-the-envelope calculation concludes that our staticify script avoided executing WordPress a very large number of times, and that the following slightly dubious claim is mostly true:


The MagPi site was a bit more difficult: it hadn’t had the same level of optimisation, and it went through a number of changes during the day to accelerate it. However, the VM setup meant that the excess load was contained to specific virtual machines; under our original flat hosting setup, the load from The MagPi would have taken everything offline and made identifying the underlying cause much harder.

Raspbian

We now run a full mirror of the main Raspbian site, and we’ve even done a test to make sure that the failover works.

The mirror director is a critical piece of infrastructure: without it, package downloads fail and updates can’t complete. So in the event of a failure, we need to bring the mirror director back up much more quickly than we can restore 4TB+ of data from backup. As a result of this work, we now have a hot spare, which has been fully tested.

Does it work and is WordPress still a good idea?

We weathered the Pi Zero and Christmas Day traffic peaks with ease, and we think we can probably double or triple the number of people using the sites at peak times before we have to think much harder or add hardware. The result is a really useful and very busy site that supports our many contributors, moderators and users with a relatively minimal amount of engineering and administration time, on a comparatively small server setup.

Hosting the Raspberry Pi 3 launch, on a Raspberry Pi 3

February 29th, 2016 by

Four years ago we sat on the phone while Eben Upton pushed the button to launch his educational computer, the Raspberry Pi, and we joined them on a fairly remarkable journey.  “How do you sell and ship 10,000 Raspberry Pis?” turned into “how do you sell and ship 5,000,000 Raspberry Pis?” and “how do you contain the excitement of the internet when you put a computer on the front of a magazine?”

Today, we’re nervously watching all the server graphs as the new Raspberry Pi 3 launches and goes on sale. We’ve had one to play with for a while so we did what we do with any new shiny toy: benchmark it in a real world application.

Raspberry Pi 3

Our Raspberry Pi 3 next to a Raspberry Pi 2 serving requests for the Raspberry Pi 3 launch.

Our favourite application is rendering WordPress pages for the Raspberry Pi website, so we set up a testbed: a Pi 2 and a Pi 3 versus the virtual machines that run the blog. We picked a typical page and tried them out. Initial results weren’t promising: just one fifth of the speed of the production VMs. The VMs have the advantage of being on the same physical server as the VM that hosts the database.

Moving the Pis to the same switch as the database server and upgrading from PHP 5.6 to PHP 7 brought Pi 3 page rendering times to less than twice those of the production servers.

Server               Spec                     Seconds per page
Blog VM (PHP 5.6)    24 x 2.4GHz Ivy Bridge   0.4
Pi 2 (PHP 7)         4 x 0.9GHz A7            0.9
Pi 3 (PHP 7)         4 x 1.2GHz A?            0.7

That’s fast enough to be usable. Parallelising requests across all cores, we can probably sustain about 4 hits/second from the Pi 2, 6 hits/second from the Pi 3 and around 50 hits/second for the main site.
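Those hits/second figures follow from dividing cores by seconds-per-page, assuming requests parallelise perfectly across all cores (an optimistic simplification, which is why the figures quoted above are slightly more conservative):

```python
# throughput ~ cores / seconds-per-uncached-page
servers = {
    "Blog VM (PHP 5.6)": (24, 0.4),
    "Pi 2 (PHP 7)": (4, 0.9),
    "Pi 3 (PHP 7)": (4, 0.7),
}
for name, (cores, secs_per_page) in servers.items():
    print(f"{name}: ~{cores / secs_per_page:.0f} hits/second")
```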

These figures are for uncached pages.  As we’ve seen in the past, 50 hits/second isn’t even close to enough to cope with launch day traffic.  In reality, the vast majority of pages we serve are cached and both Pis can adequately serve 100Mbps of cached pages (versus 4Gbps for the main host).

So we’ve done what any sensible real-world test would do: we’ve put them into the main hosting mix. If you read the headers, you’ll see on some requests

  HTTP/1.1 200 OK
...
  X-Served-By: Raspberry Pi 3
...

indicating your page request came off a Raspberry Pi 3.
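If you’d rather script the check than eyeball it, here is a small sketch that pulls that header out of a raw response block (the header name is the one shown above; fetching the live site is omitted to keep the example self-contained):

```python
def served_by(raw_headers):
    """Return the X-Served-By value from a raw HTTP header block, or None."""
    for line in raw_headers.splitlines():
        name, _, value = line.partition(":")
        if name.strip().lower() == "x-served-by":
            return value.strip()
    return None

example = "HTTP/1.1 200 OK\r\nX-Served-By: Raspberry Pi 3\r\n\r\n"
print(served_by(example))  # Raspberry Pi 3
```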

We’re aiming to serve about 1 in 12 requests from a Pi 2 or a Pi 3, but we may adjust this up or down to keep the Pis in action without melting under the load.

How’s it done?

The backend for the Raspberry Pi site is built from virtual machines. One VM runs the database, and a pair of VMs generate pages for the main, WordPress-based website. One of the pair is designated as primary and also runs the admin backend for WordPress, which synchronises files to the other VM and now, additionally, to both Raspberry Pis. All the backend servers sit on a pure IPv6 network; a cluster of dual-stack front-end servers load balances requests through to the IPv6-only backends.

If you have IPv6 you can see the status of the two Pis here:

stats.pi2.raspberrypi.org
stats.pi3.raspberrypi.org

If you don’t have IPv6, complain to your ISP, then set up a tunnel at he.net.

The two Pis tweet directly at @hostingpi3 and @hostingpi2. Sadly, Twitter doesn’t support IPv6, so this traffic goes via our NAT64 service, which provides outbound connectivity to legacy parts of the internet for IPv6-only servers.

Decimal points are important

January 5th, 2016 by

Ben at Raspberry Pi wanted to use his new vanity domain, rpf.io, as a URL shortener rather than using one of the common big services. The easy solution was an existing service on a paid account, which gives us analytics and tracking. However, demonstrating the age-old principle that if you have to ask, you can’t afford it, his email reads…

$695/month for a .htaccess file

We like open source software, so instead of paying enough money to rent quite a nice car for a trivial .htaccess file, we chose to install YOURLS on a little IPv6-only virtual machine behind our NAT64 and IPv6 Proxy services.

We’ve done some benchmarking: out of the box we could sustain 500 hits/second, and adding php-apc boosted this to well over 2,000 hits/second, which should be enough even if Liz Upton gets very excited with the Raspberry Pi Twitter account.

You can test out the service at http://rpf.io/mythic before we start making these links public.

I did not go to the ball

December 11th, 2015 by


Last night was the annual TrefBash, a large party for people involved in the UK networking industry. Several people asked why Pete didn’t attend, so we’ve lent him the company blog to explain his answer.

 

Pete writes:

I donate a lot of time to Raspberry Pi because I believe that we need more competent people in the tech industry. One thing I see is that at younger ages we have a mixed gender balance, but at older ages the field becomes increasingly male-dominated. A quick look at the boards of the UK Internet Exchanges (100% male) and the attendance lists for conferences will tell you that networking is no different from the rest of the tech industry, with an extremely strong male bias.

It’s improved: it’s been a while since there has been an AGM for an Internet Exchange in the Bavarian Beerhouse, where busty women served male geeks beer. We have a Respect policy for conferences, so it’s now official that you can’t be discriminatory. However, as with all things, it’s two steps forward, one step back.

Not a photo of the burlesque dancer, our blog is safe for work even if the industry events aren't


The theme for last night’s TrefBash15 was Bond meets Rocky Horror. Whilst I’m highly entertained by a theme in which equal-opportunities rapist Frank ’n’ Furter explains to misogynist Bond that you should treat women as well as you treat men, and Bond suggests to Frank that maybe you should ask permission first, I can’t help feeling that wasn’t quite how the event was planned.

To check I wasn’t being hyper-sensitive, I asked a number of people what they thought. Liz Upton generously allowed me to quote her in full.

Liz Upton, Director of Communications, Raspberry Pi


James Bond – well, I SUPPOSE there’s the option of evening dresses for the ladies. That’s the very kindest spin I can put on it. Jesus, this industry sucks.

Liz Upton, Director of Communications, Raspberry Pi
So instead of going, I donated the cost of a night out in London to The Women In Engineering society.

PiZero won!

December 10th, 2015 by
Pi Zero, 2p for scale, not included with prize.


Last week we started a competition to win a Pi Zero. We had a small number of entries: half from school-age people eligible to win, the other half from entertained techies. We’ve also been using this puzzle as a job filter for some time, so we have a rich depth of existing answers.

The competition is very simple: our web page generates a mathematical sum, and to succeed you have to send us the answer within one second. It’s an anti-Turing test: a person can’t do it, but a computer can. When you’ve succeeded, we ask you to send us the code. This gives us two important things: a code sample that the candidate wrote, and an idea of how long it took them to work out that they should automate it.

A textbook answer from an experienced techie takes about 15-30 minutes and delivers a short piece of code that fetches the web page, finds the sum with a regexp or similar, works out the answer with eval, creates the response and sends it back to us. However, people are much more creative than that.

One excellent answer was a single line of shell script which did the whole thing in bash, but even more cleverly searched the process list for the command that had been typed in, added the escaping back in and then mailed itself to us: a one-liner that solved the puzzle, recreated its own source code and emailed itself to us.

Another excellent answer came from someone who guessed that our code had a 32-bit roll-over bug, tried options until we generated an answer bigger than 2^32, and then fixed their otherwise textbook code to reproduce the bug on our side.

The absolute worst answer we’ve ever seen came from someone whose CV listed them as a professional programmer with five years’ experience. After two whole days of typing answers into the website, they finally worked out that the problem had to be automated. After three days of development, they had written a vast piece of Java code that could download the page and find the problem, but it could only work out the answer if there were exactly three numbers, all of which had to be added together. Instead of improving the code for the general case, they put it in a loop and repeatedly called the page until, by sheer luck, they got a page their code could answer. Creative genius, but not in a good way.

On to the entrants

So this is a difficult challenge for school-age children and teenagers. Most of the entries came from older children, 16 and up, and it’s clear that it was hard and that they had to learn new things specifically to solve it. PHP and Python were the preferred languages, the most novice-friendly of the tools available. We were very torn as to who should win. After lots of deliberation, our runner-up is this PHP entry from Nils in Germany, who was also the first to submit:


<?php
// That was fun. There should be more conetsts like this.
// Sorry for the incredibly hacked together code...

$challenge = file_get_contents('http://sphinx.mythic-beasts.com/cgi-bin/job.pl');

$task = preg_replace('/.*Evaluate /is', '', $challenge);
$task = preg_replace('/ \, .*/is', '', $task);

$id = preg_replace('/.*name=\"id\" value\=\"/is', '', $challenge);
$id = preg_replace('/\".*/is', '', $id);

function calc_string( $mathString ){
    $cf_DoCalc = create_function("", "return (" . $mathString . ");" );
    return $cf_DoCalc();
};

$ans = calc_string($task);

$url = 'http://sphinx.mythic-beasts.com/cgi-bin/job.pl';
$data = array('answer' => $ans, 'id' => $id);

$options = array(
    'http' => array(
        'header'  => "Content-type: application/x-www-form-urlencoded\r\n",
        'method'  => 'POST',
        'content' => http_build_query($data),
    ),
);

$context = stream_context_create($options);
$result = file_get_contents($url, false, $context);

var_dump($result);

?>

Things we particularly like: all the comments, code and email are in English to make it easier for us, even though it’s the author’s second language.

Our winner, though, is 13-year-old Nick. From a purely technical standpoint his code isn’t as good as Nils’ entry above, but through its comments it tells the story of slowly working his way towards the solution, and it appropriately credits the help he received, both personal and online.


#Mythic Beasts - Win a Pi Zero
#Written by Nick Lockhart from Chepstow, Wales, aged 13 3/4
#with help from an old fossil (his dad)
#You will need LXML for Python and Requests

from lxml import html
import requests
debugmode = 1

#get the page and parse all <p> elements into a Python list.
#For this purpose we should only get one element.
page = requests.get('http://sphinx.mythic-beasts.com/cgi-bin/job.pl')
tree = html.fromstring(page.content)
paragraphs = tree.xpath('//p/text()')

#Split out the question.
#First take out everything after the sum (space included)
#And then remove everything before the sum (space again included.)
#And convert to a string. Oddly, after the second time, we have to
#reference the second element as there is a blank string in question[0]
#Finally, evaluate it.

question = paragraphs[0].split(' ,')
question = question[0].split('Evaluate ')
question = question[1]
answer = eval(question)

if (debugmode == 1):
    print("The question was: " + question)
    print("The answer should be: " + str(answer))

#There's a hidden input labeled "id", which seems to be randomly generated.
#This is probably to track who's submitting.
#We will also need to extract this.
#I learnt this piece of magic with help from StackOverflow. Thanks,
#Mathias Muller!

secretkey = tree.xpath('//input[@name="id"]/@value')[0]

#That's all we need to POST. Let's generate a payload, send it off and
#extract the response.
#The server expects the answer to be a string, so we convert it to a string.

payload = {"answer": str(answer), "id": secretkey}

if (debugmode == 1):
    print(payload)

response = requests.post("http://sphinx.mythic-beasts.com/cgi-bin/job.pl",
                         data=payload)
responsetree = html.fromstring(response.content)
print(responsetree.xpath('//p/text()'))

Of course, the final comment to everyone who entered: if you ever need any kind of hosting, domain name or similar, send us an email and include your entry number for a freebie / upgrade / discount. Secondly, if you’re seeking summer work or gap-year employment, we’d invite you to get in touch; we guarantee that we’ll read your CV and take your application seriously.

Testing failure: Raspbian

December 6th, 2015 by
Programmer art, just say no.


If you’ve had a look at the Raspbian website today, you’ll have noticed the big red !!!FAILOVER TEST!!! logo in the top right corner. That’s because today is officially unimportant for Raspberry Pi, whereas in three weeks’ time it will be officially very important. Historically, Christmas Day sees our highest traffic loads, as people unwrap their new Raspberry Pis and try them out. The most critical things for us to worry about are the educational and getting-started resources on the website, plus Raspbian and the mirror director, so that people can download new packages for their existing Raspberry Pis.

The majority of the website holds a relatively small amount of data, so pulling an image from backup and redeploying it is a quick operation. Raspbian, however, is a bit harder: it’s a big image with around 4TB of data.

So we picked today to schedule a failover of Raspbian from its normal dedicated server to a VM hosted in the Raspberry Pi cloud. The test aims to check:

  • Is the failover server up to date and does it work?
  • Is the failover setup fast enough to keep up with the traffic load?
  • Does every service successfully fail over?

So far the operation has been very smooth. We had to add a couple of missing packages that had been overlooked during setup and testing, but basically we did a DNS flip and the whole site moved over.

If you’d like to discover that your disaster recovery system works before you have a disaster, have a look at our Managed Services or get in touch: sales@mythic-beasts.com.

Win a PiZero

December 2nd, 2015 by
Pi Zero, 2p for scale, not included with prize.


Thanks to a recent visit to Pi Towers, we’re in possession of a very difficult to get hold of Raspberry Pi Zero. Within Mythic Beasts we don’t have an immediate need for a Pi Zero, so we thought we’d give it away to someone more deserving. So here’s a competition.

Our recruitment page frequently foxes experienced programmers, but teenagers often have little difficulty:

http://sphinx.mythic-beasts.com/cgi-bin/job.pl

This competition is open until 9th December at 17:00. In order to enter you must have been born on or after 1st September 1997. Send us a successful answer, and we’ll pick the one we like best and send the winner a Pi Zero.

People over the age of 18 will have to be satisfied merely with the respect of their peers and can go buy their own Pi Zero, for example from our customer Pi Supply.

Detailed Rules


SPONSOR

The Sponsor is Mythic Beasts Ltd, 103 Beche Road, Cambridge, CB5 8HX.

TERM

The Mythic Beasts Contest begins 2nd December 2015 at 17:00:00 UTC and ends 9 December 17:00:00 UTC. By submitting an Entry, each Entrant (or, where appropriate, the Entrant’s parent or legal guardian) agrees to the Official Rules presented here, and warrants that his or her Entry complies with all requirements set out in the Official Rules. This is a skill-based contest and chance plays no part in the determination of winners.

WHO MAY ENTER

The Contest is open only to individuals born on or after 1st September 1997. Employees of the Sponsor and their immediate family members (spouse, parent, child, sibling and their respective spouses, regardless of where they live) or persons living in the same households of such employees, whether or not related, are not eligible. CONTEST IS VOID WHERE PROHIBITED.

HOW TO ENTER

Visit http://sphinx.mythic-beasts.com/cgi-bin/job.pl and follow the instructions. When the challenge is complete submission details will be provided. You need an email address to receive a reply.

CONTEST PRIZES AND JUDGING

The prize is a Raspberry Pi Zero. This will be given to the best entry at the discretion of the judges.

The Sponsor reserves the right to take such steps as it deems necessary to verify the validity and originality of any Entry and/or Entrant (including an Entrant’s age, identity, address and authorship of the Entry), and to disqualify any Entrant who submits an Entry that is not in accordance with these Official Rules.

LICENCE

By entering the Contest, all Entrants grant an irrevocable, perpetual, royalty-free, worldwide non-exclusive licence to the Sponsor, to reproduce, distribute, and display their Entry.

LIMITATION OF LIABILITY

By entering this Contest, the Entrant (or, where appropriate, the Entrant’s parent or legal guardian) agrees to release, discharge, and hold harmless the Sponsor and its partners, affiliates, subsidiaries, advertising agencies, agents and their employees, officers, directors, and representatives from any claims, losses, and damages arising out of their participation in this Contest.

CONDITIONS

This Contest shall be subject to and governed by the laws of England and Wales.

If for any reason the Contest is not capable of running as planned for any cause beyond the control of Sponsor, Sponsor reserves the right, at its sole discretion, to cancel, terminate, or suspend the Contest. The Sponsor reserves the right, at its sole discretion, to amend the Official Rules at any time during the Contest.

Raspberry Pi Zero: Not executing a trillion lines of PHP

November 27th, 2015 by

A number of people noticed that Raspberry Pi had launched their $5 Pi Zero yesterday. We had advance warning that something was going to happen, even if we didn’t know exactly what. When the Pi 2 launched we had some difficulty keeping up with comment posting and cache invalidation, and we gave a very well received talk about the history and the launch at the UK Network Operators Forum.

Since then we’ve worked with Ben Nuttall to rebuild the entire hosting setup into an IPv6-only private cloud, hosted on one of our very large servers. This gives us:

  • Containment: One part of the site can’t significantly impact the performance of another.
  • Scalability: We can pull VMs into our public cloud and duplicate them if required.
  • Flexibility: We no longer have to have a single software stack that supports everything.

For the Pi 2 launch we sustained around 4,500 simultaneous users before we really started struggling with comment posting and cache invalidation. So our new plan was to be able to handle over 5,000 simultaneous site users before we needed to start adding more VMs. This equates to around 1,000 hits per second.
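
The users-to-hits conversion rests on an assumed think time of roughly five seconds per page view per user – our own assumption here for illustration, not a figure given in the post:

```python
users = 5000
think_time_s = 5                  # assumed seconds between page views per user
hits_per_second = users / think_time_s
print(hits_per_second)            # 1000.0
```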

In order to do this, we need to make sure we can serve the most common 90% of requests without touching the disks or the database, and without using more than 10ms of CPU time per request. We want to reserve all our capacity for pages that have to be dynamic – comment posting and the forums, for example – and make all the common content as cheap as possible.

So we deployed a custom script, staticify. It automatically takes the most popular and important pages, renders them to static HTML, and rewrites the webserver configuration to serve the static copies instead. It runs frequently, so the cache is never more than 60 seconds old and the site still appears dynamic. It also means we serve a file from the filesystem cache (RAM) instead of executing WordPress. During the day we improved this code and deployed it to the MagPi site too, including some horrid hackery to cache popular GET request combinations.
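
staticify itself isn’t published, so the following is only a minimal Python sketch of the idea; the page list, docroot and render step are all hypothetical stand-ins. The atomic rename matters: the webserver must never pick up a half-written file.

```python
import os
import tempfile

def staticify(pages, render, docroot):
    """Render each popular page to a static HTML file under docroot.

    'render' stands in for fetching the page from the backend (for the
    real site, an HTTP request to WordPress). Each file is written under
    a temporary name and renamed into place; rename is atomic on POSIX,
    so the webserver always sees either the old copy or the new one.
    """
    for path in pages:
        html = render(path)
        name = path.strip("/") or "index"        # "/" -> index.html
        target = os.path.join(docroot, name) + ".html"
        os.makedirs(os.path.dirname(target), exist_ok=True)
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(target))
        with os.fdopen(fd, "w") as f:
            f.write(html)
        os.replace(tmp, target)                  # atomic swap into place
```

Rewriting the webserver configuration is then just pointing each cached URL at its .html file; running the script from cron keeps the copies under 60 seconds old.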

It’s fair to say that our very vague back-of-the-envelope calculations held up, and that we exceeded our target of 5,000 simultaneous users. Liz Upton was quite pleased, and we earned a certain amount of respect from our peers.

If the blog were deployed unoptimised to AWS with auto-magic scaling, we’d estimate the bills at many tens of thousands of dollars per month – money that can instead be spent on education. In addition, you’d still need to make sure you could effortlessly scale to thousands of cores without a single bottleneck somewhere in the stack causing them all to lie idle. Under the traffic levels above, the original version of the site (with a hopeless analytics plugin that processed the complete site logs on every request) would have consumed more computing power than has ever existed. At this scale optimisation is a necessity, and if you’re going to optimise, you might as well optimise well.

That said, we think some of our peers possibly overstated our importance in the big scheme of things.

WP Super Cache vs Raspberry Pi 2

March 3rd, 2015 by

On Monday, the Raspberry Pi 2 was announced, and The Register’s predictions of global geekgasm proved to be about right. Slashdot, BBC News, global trending on Twitter and many other sources covering the story resulted in quite a lot of traffic. We saw 11 million page requests from over 700,000 unique IP addresses in our logs from Monday, around 6x the normal traffic load.

The Raspberry Pi website is hosted on WordPress using the WP Super Cache plugin. This plugin generally works very well: the vast majority of page requests are served from a static file rather than hitting PHP and MySQL. The second major part of the site is the forums, and the different parts of the site have wildly differing performance characteristics. In addition, the site is fronted by four load balancers, which serve most of the downloads directly and scrub some malicious requests. We can cope with roughly:

Cached WordPress page       160 pages/second
Non-cached WordPress page    10 pages/second
Forum page                   10 pages/second
Maintenance page             at least 10,000 pages/second

Back in 2012, during the original launch, we had a rather smaller server setup, so we simply put up a maintenance page and directed everyone to buy a Pi direct from Farnell or RS, both of whom had some trouble coping with the demand. We also launched at 6am GMT, so that most of our potential customers would still be in bed, spreading the initial surge over several hours.

This time, Raspberry Pi was a larger organisation coordinating multiple news outlets and press conferences, so the launch time was fixed for 9am on 2nd February 2015. Everything would happen then, apart from the odd journalist with premature timing problems – you know who you are.

Our initial plan was to leave the site up as normal, but set the maintenance page to be the launch announcement. That way if the launch overwhelmed things, everyone should see the announcement served direct from the load balancers and otherwise the site should function as normal. Plan B was to disable the forums, giving more resources to the main blog so people could comment there.

The Launch

turtlebeach

It is a complete coincidence that our director Pete took off to go to this isolated beach in the tropics five minutes after the Raspberry Pi 2 launch.

At 9:00 the announcement went live. Within a few minutes traffic on the site had increased by more than a factor of five, and forum users were starting to comment and chatter to each other. The server load increased from its usual level of 2 to over 400: we had a massive queue of users waiting for pages, because all of the CPU time was being spent generating slow forum pages, starving the main blog of the time needed to deliver its fast cached pages. At this point our load balancers started to kick in and deliver the maintenance page to a large fraction of our site users – the fallback plan. This did annoy the forum and blog users who posted a comment and received the maintenance page back, their submission having just been thrown away – sorry. During the day we did a little tweaking to improve throughput: removing the nf_conntrack module from the firewall to free up CPU for page rendering, and changing the Apache settings to queue earlier, so people received either their requested page or the maintenance page more quickly.
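
The post doesn’t give the exact changes; as a hedged sketch, they were of this shape (the module names are the standard ones for kernels of that era, while the Apache values are invented placeholders, not the settings actually used):

```
# Stop tracking connection state in the firewall to free up CPU for
# page rendering (only safe if no iptables rules rely on state matching);
# dependent modules have to come out first.
rmmod iptable_nat nf_conntrack_ipv4 nf_conntrack

# Apache: cap workers and shorten the accept queue so overload shows up
# quickly and the load balancers serve the maintenance page instead.
MaxRequestWorkers 200
ListenBacklog     128
```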

Disabling the forums freed up lots of CPU time for the main page and gave us a mostly working site. Sometimes it’d deliver the maintenance page, but mostly people were receiving cached WordPress pages of the announcement and most of the comments were being accepted.

Super Cache not quite so super

Unfortunately, we were still seeing problems. The site would cope with the load happily for a good few minutes, and then suddenly have a load spike to the point where pages were not being generated fast enough. It appears that WP Super Cache wasn’t behaving exactly as intended.

When someone posts a comment, Super Cache invalidates its cache of the corresponding page, and starts to rebuild a new one, but providing you have this option ticked…

supercache-anonymouse

…(we did), the now out-of-date cached page should continue to be served until it is overwritten by the newer version.

After a while, we realised that the symptoms we were seeing were entirely consistent with this not working, and at very high traffic levels this behaviour becomes critical. If cached versions are not served whilst a page is being rebuilt, then every subsequent request also triggers a rebuild: more and more CPU time goes on generating copies of the missing cached page, which makes each rebuild take even longer, which in turn means yet more rebuilds pile up.
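
What the ticked option is supposed to give is serve-stale-while-rebuilding, with exactly one process doing the rebuild. A minimal Python sketch of that logic (our illustration, not WP Super Cache’s actual code) uses a lock file so that concurrent requests can’t all start rebuilds:

```python
import os
import time

def serve(path, cache_file, rebuild, max_age=60):
    """Serve a page from cache, rebuilding it at most once at a time."""
    try:
        age = time.time() - os.path.getmtime(cache_file)
    except FileNotFoundError:
        age = None                               # no cached copy yet
    if age is not None and age <= max_age:
        return open(cache_file).read()           # fresh: serve the cache
    lock = cache_file + ".lock"
    try:
        fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        # Someone else is already rebuilding: serve the stale copy rather
        # than piling on -- this is what stops the stampede.
        if age is not None:
            return open(cache_file).read()
        raise                                    # no cache at all yet
    try:
        html = rebuild(path)                     # the expensive render
        with open(cache_file, "w") as f:
            f.write(html)
        return html
    finally:
        os.close(fd)
        os.remove(lock)
```

With this in place, a burst of requests during a rebuild costs one render plus some cheap file reads, rather than one render per request.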

Now we can build a ludicrously over-simplified model of this with a short bit of perl, and draw a graph of how long it takes to rebuild the main page as a function of hit rate – it looks like this.

Supercache performance

This tells us that performance falls off a cliff fairly suddenly at around 60-70 hits/second. At 12 hits/sec (typical usage) a rebuild of the page completes in considerably under a second; at 40 hits/sec (very busy) it takes about 4s; at 60 hits/sec it takes 30s; at 80 hits/sec it takes well over five minutes. At that point the load balancers kick in and just display the maintenance page, waiting for the load to die down before serving traffic as normal again.
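
The perl itself isn’t reproduced in the post, but the feedback loop is easy to model. Here is a Python stand-in under the assumption that every request arriving mid-rebuild starts another rebuild and all rebuilds share the CPU; the constants are illustrative, not fitted to the graph above.

```python
def rebuild_time(hit_rate, page_cpu_s=0.5, cores=8, max_iter=1000):
    """Estimate one rebuild's wall-clock time (seconds) when every request
    that arrives mid-rebuild triggers a rebuild of its own.

    Fixed-point iteration on: t = page_cpu_s * (1 + hit_rate * t) / cores.
    The closed form is page_cpu_s / (cores - page_cpu_s * hit_rate),
    which blows up once hit_rate reaches cores / page_cpu_s -- the cliff.
    """
    t = page_cpu_s / cores
    for _ in range(max_iter):
        concurrent = 1 + hit_rate * t        # rebuilds competing for CPU
        new_t = page_cpu_s * concurrent / cores
        if abs(new_t - t) < 1e-9:            # converged: stable rebuild time
            return new_t
        t = new_t
    return float("inf")                      # no fixed point: past the cliff

for rate in (1, 10, 15, 20):
    print(rate, rebuild_time(rate))
```

With these made-up constants the cliff sits at 16 hits/second; the real site’s cliff at 60-70 hits/second simply corresponds to different constants.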

We still don’t know the exact cause, so either it was something else with exactly the same symptoms, or the setting wasn’t working, or it was interacting badly with another plugin. As soon as we’d figured out the issue, we implemented the sensible workaround: a rewrite hack to serve the front page and announcement page completely statically, with the pages recreated afresh every five minutes from cron, picking up the newest comments. As if by magic the load returned to sensible levels, although there was now a small delay before new comments appeared.
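
The hack itself isn’t shown in the post; under the assumption of an Apache front end, it amounts to a rewrite rule plus a cron job along these lines (paths, hostnames and filenames are hypothetical):

```
# Apache: serve the pre-rendered front page, bypassing PHP entirely
RewriteEngine On
RewriteRule ^/?$ /static/front-page.html [L]

# Crontab: re-render the static copy every five minutes, renaming it
# into place so a half-written file is never served
*/5 * * * * curl -s http://backend.local/ -o /var/www/static/.front.tmp && mv /var/www/static/.front.tmp /var/www/static/front-page.html
```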

Re-enabling the forums

With stable traffic levels, we turned the forums back on. And then immediately off again: they very quickly saturated the database server with connections, causing the forums to stop working and the main website to run slowly. A little further investigation into the InnoDB parameters revealed contention on database locks; we reconfigured, and the database promptly fell over.

Our company pedant points out that actually only the database server process fell over, and it needed restarting, not rebooting. Cunningly, we’d managed to find a set of improved settings for InnoDB that allowed us to see all the tables in the database but not read any data out of them. A tiny bit of fiddling later and everything was happy.

The bandwidth graphs

We ended up with a traffic graph that looks like this.

raspi-launch-bwgraph

On the launch day the graph is a bit lumpy: while we were serving the maintenance page, nobody could get to the downloads page, and downloads of operating system images and NOOBS normally dominate the traffic. Over the next few days the HTML volume starts dropping and the number of system downloads for newly purchased Raspberry Pis increases rapidly. At this point we were reminded of the work we did last year to build a fast distributed downloads setup, and were rather thankful, because we’re now considerably beyond the traffic levels you can sanely serve from a single host.

Could do a bit better

The launch of the Raspberry Pi 2 was a closely guarded secret, and although we were told in advance, we didn’t have a lot of time to prepare for the increased traffic. There are a few things we’d like to improve, and we will be talking to Raspberry Pi about them over the coming months. One is to upgrade the hardware, adding more cores and RAM. Whilst doing this, it would be sensible to split the parts of the site into different VMs, so that the forums, database and WordPress have some separation from each other, making it easier to scale. It would have been really nice to have put our extremely secret test setup with HipHop Virtual Machine into production, but it’s not yet well enough tested for primetime – although a seven-fold performance increase on page rendering certainly would be nice.

Schoolboy error

Talking with Ben Nuttall, we realised that the stripped-down, minimal, super-fast maintenance page didn’t have analytics on it. The difference between our figure of 11 million page requests and Ben’s of 1.5 million therefore indicates how many people saw the static maintenance page during the launch rather than a WordPress-generated page with comments. In hindsight, putting analytics on the maintenance page would have been a really good idea. Not every request that received the maintenance page was necessarily a request to see the launch, nor was each definitely a different visitor, so without detailed analytics we can only estimate that the number of people who saw the announcement was more than 1.5 million but less than 11 million.

Flaming, Bleeding Servers

Liz occasionally has slightly odd ideas about exactly how web servers work:

is-this-thing-on

Now, much to her disappointment, we don’t have any photographs of servers weeping blood or catching fire. [Liz interjects: it’s called METAPHOR, Pete.] But when we retire servers we like to give them a bit of a special send-off.