DNS API – Implementing Dynamic DNS

September 21st, 2013

Last year we announced some improvements to the Mythic Beasts DNS API, and I asserted that this made it good for implementing a Dynamic DNS service. Dynamic DNS is simply a mechanism for programmatically updating a DNS record, typically used to provide a consistent name for a computer that is at the end of an internet connection with a dynamically assigned IP address.

Well, last weekend I had the opportunity to try implementing a Dynamic DNS service with our API, and realised that it actually makes the task unduly difficult. It can be done, but in order to change a record you need to remove the old one, and to remove the old record you need to know what it currently is. That means using the LIST command, grepping out the old record, and then issuing the necessary DELETE and ADD commands. Aside from being a hassle, this introduces an unavoidable race condition between the LIST and DELETE commands.
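
For the record, the old dance went something like this – a sketch only, assuming DELETE and ADD take the same record syntax as the REPLACE example below:

curl --data "domain=MY_DOMAIN&password=MY_PASSWORD&command=LIST" \
https://ctrlpanel.mythic-beasts.com/customer/primarydnsapi | grep "^myhost"
# ...then, racily, remove the old record and add the new one:
curl --data "domain=MY_DOMAIN&password=MY_PASSWORD&command=DELETE \
myhost 300 A 1.2.3.3" https://ctrlpanel.mythic-beasts.com/customer/primarydnsapi
curl --data "domain=MY_DOMAIN&password=MY_PASSWORD&command=ADD \
myhost 300 A 1.2.3.4" https://ctrlpanel.mythic-beasts.com/customer/primarydnsapi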

We’ve now implemented the obvious fix: a REPLACE command, which removes all existing records for the specified host and type and replaces them with the one provided. Obviously this doesn’t work if for some reason you want multiple records for a single host, but for the obvious use case it means that Dynamic DNS can be handled in a single command:

curl --data "domain=MY_DOMAIN&password=MY_PASSWORD&command=REPLACE \
myhost 300 A 1.2.3.4" https://ctrlpanel.mythic-beasts.com/customer/primarydnsapi
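
Wrapped in a tiny script and run from cron, that’s a complete dynamic DNS client. A minimal sketch – the address-discovery service is an assumption, substitute whichever you trust:

#!/bin/sh
# Discover the current public IPv4 address, then push it in a single REPLACE.
IP=$(curl -s https://icanhazip.com)
curl --data "domain=MY_DOMAIN&password=MY_PASSWORD&command=REPLACE \
myhost 300 A $IP" https://ctrlpanel.mythic-beasts.com/customer/primarydnsapi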

The DNS API is a standard feature included with all Mythic Beasts domain registrations. Full documentation can be found here.

IPv6, End users starting to care

September 17th, 2013

We’ve had an IPv6-aware network for quite some time, and we’ve been gradually rolling it out to our services with the aim of eventually having every service we offer fully available over both IPv6 and IPv4. We host the Raspberry Pi website, which has an IPv6-only internal network, IPv6-only virtual machines, and IPv4 on the front end to help out those of you on the ‘legacy’ internet.

A quick skim over the logfiles suggests that about 96% of you still access the site over the legacy IPv4 network – about 4% of hosts now connect over IPv6, which is starting to become a non-trivial fraction of the traffic. Of course, this is much higher than for typical sites: Raspberry Pi users are rather more technically aware than the general population.

Yesterday we had our first real connectivity problem to investigate – an end user within Ja.net (the UK academic network) was unable to fetch files from the Raspberry Pi download server about half of the time. Further investigation showed that they could reach the load balancers in our Sovereign House site, which have connectivity via the London Internet Exchange Juniper LAN, but not the load balancers in our Harbour Exchange site, which have connectivity over the London Internet Exchange Extreme LAN.

When we started investigating we confirmed that it seemed to be a problem with the Extreme LAN: if we forced the connectivity via the Juniper LAN it worked from both sites; if we forced it via the Extreme LAN it failed from both sites. Odder still, a packet dump on our LINX interface didn’t show us passing the packets on at all.

Our IPv4 peering worked fine; this was IPv6-specific.

We then started looking at the routing table on the router. Over IPv4 it looks like

131.111.0.0/16 via 195.66.236.15 dev eth0

and over IPv6 it looks like

2001:630::/32 via fe80::5e5e:abff:fe23:2fc2 dev eth0

That gives us the netblock, and the next hop to send the packet to.

So the next step is to check that you can reach the gateway happily enough.

# ping 195.66.236.15
PING 195.66.236.15 (195.66.236.15) 56(84) bytes of data.
64 bytes from 195.66.236.15: icmp_seq=1 ttl=64 time=0.220 ms

and

# ping6 fe80::5e5e:abff:fe23:2fc2
connect: Invalid argument

Odd. Then I realised that fe80:: in IPv6 means a link-local address – the address is specific to the network interface, so to ping it you have to specify both the destination address and the interface.

# ping6 fe80::5e5e:abff:fe23:2fc2 -I eth9
PING fe80::5e5e:abff:fe23:2fc2(fe80::5e5e:abff:fe23:2fc2) from fe80::21b:21ff:fe65:a4c5 eth9: 56 data bytes
64 bytes from fe80::5e5e:abff:fe23:2fc2: icmp_seq=1 ttl=64 time=0.451 ms

Then the penny dropped. The routing table has eth0 in it, but we’re actually connected via eth9. Under IPv4 this is fine: the next-hop address is globally unique and only reachable over eth9, so we send the packets out of eth9 and they go to the correct destination. Under IPv6 the next hop is a link-local address, and therefore valid over any interface, so we obey the routing table and throw the packets out of eth0, whereupon they fall onto the floor because there’s no fibre connected.

Fixing the config to put the right interface in made it all work, and our end user is now happily able to access all the load balancers on all the v6 addresses in all of the buildings.
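
On a Linux router the actual fix amounts to a one-liner along these lines – a sketch in iproute2 syntax; persisting it depends on how your routes are managed:

# ip -6 route replace 2001:630::/32 via fe80::5e5e:abff:fe23:2fc2 dev eth9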

Obviously if you’re a Mythic Beasts customer and you don’t already have an IPv6 allocation for your real or virtual server, drop us an email and we’ll hand you your own address space to play with.

I do not accept your silly software license

September 9th, 2013

So our newest Mythic Beast started work for us today. The first task is installing your new laptop and reading and signing your employment contract. Today our newest employee fell at the first hurdle.

The laptop in question is a shiny Toshiba Z930. This one came with Windows 8 and a fully charged battery. On first powering it on, it comes up with the Windows 8 licence. This has a tickbox for ‘I accept the license’ and a big button labelled ‘Accept’ to click on.

If you don’t tick the box, it tells you that you have to. There’s no option to reject the licence.

If you press the power button, the laptop suspends itself. If you press and hold the power button, the laptop still suspends itself. Ctrl-Alt-Delete doesn’t work. You can’t remove the battery, as it’s built in. In frustration, our newest employee suggested pouring his coffee over the damn thing to make it power cycle. This was a really stupid idea: not only does the laptop have a spill-proof keyboard, he’d also then have no coffee.

The best plan we could come up with was to wait for the battery to run out, which requires pressing a key about every five minutes to stop the thing suspending itself.

New DNS resolvers

August 28th, 2013

We’ve upgraded our DNS resolvers in our SOV and HEX data centres. New features include DNSSEC validation and IPv6.

The addresses are,

SOV : 2a00:1098:0:80:1000::12 / 93.93.128.2
HEX : 2a00:1098:0:82:1000::10 / 93.93.130.2
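
To use them from a Linux server, an /etc/resolv.conf along these lines is all it takes – a sketch using the SOV pair; pick the addresses local to your site:

nameserver 2a00:1098:0:80:1000::12
nameserver 93.93.128.2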

They’re now DNSSEC-aware, validating resolvers. That means that if a site has correctly configured DNSSEC and we receive an answer that fails the security check, we will return no answer rather than an incorrect or forged one.

To demonstrate the difference,

a non-DNSSEC-validating resolver:
# dig +short sigfail.verteiltesysteme.net
134.91.78.139

a Mythic Beasts server using our resolvers:
# dig +short sigfail.verteiltesysteme.net
<no answer>
#

and on the DNS server it logs an error,

debug.log:28-Aug-2013 15:44:57.565 dnssec: info: validating @0x7fba880b69e0: sigfail.verteiltesysteme.net A: no valid signature found

and correctly drops the reply.

Google’s DNS servers on 8.8.8.8 work the same way as ours, so we’re fairly confident that there will be no compatibility issues.

Downstream ASN

August 12th, 2013

We’ve set up one of our customers with their own full BGP network, split across two of our London sites. Along the way we have:

  • Helped them join RIPE as an LIR
  • Helped them apply for an IPv6 /32 and an ASN
  • Set up a full BGP IPv6 only network
  • Helped them apply for a final /22 of IPv4 space
  • Configured this in the global routing table

They now have the option to run cable or fibre directly to peering exchanges and other ISPs, should they wish, from individual machines hosted within our rack space. In the meantime they’re taking advantage of our colocation, out-of-band access to their routers via serial, and our IPv4 and IPv6 transit.

Joining the London Internet Exchange

August 7th, 2013

We’ve now joined the London Internet Exchange and are present on both of their peering LANs for redundancy: the Juniper LAN in Sovereign House and the Extreme LAN in Harbour Exchange. That gives us three peering exchanges in Sovereign House – Edge-IX, LoNAP and LINX-Juniper – and two in Harbour Exchange – LINX-Extreme and LoNAP.

You can see the current traffic over the LINX public exchanges here – best described as rather a lot. We’re in the process of setting up more direct peerings in addition to the route servers, which provide immediate peering with hundreds of ISPs and tens of thousands of routes. Many UK destinations are now a few hops shorter – which probably won’t be very noticeable – but we have improved redundancy and increased capacity.

Dark Fibre

August 5th, 2013

Over the last twelve months we’ve made a series of networking changes and completely failed to blog about them. Our first announcement is that we now have a dark fibre ring around our core London sites.

This isn’t actually true. We now have a lit fibre ring around our core London sites. It’s currently running at 10Gbps and connects all of our routers together. All our routers connect to the local networks at 10Gbps, so our entire network core is now 10Gbps. We also have some customers using our fibre directly as a layer 2 link between Telecity Sovereign House, Telecity Meridian Gate and Telecity Harbour Exchange 6/7. Our standard offering is a pair of ports in each site on redundant switches (so 6 x 1Gbps ports) with unlimited traffic between them.

As a result of the upgrade we’re able to continue offering free traffic between all London-hosted servers, irrespective of which building the machines are in or which customer owns them – we bill only for traffic that leaves our network. Upgrading to progressively higher bandwidths is now straightforward, as we can add CWDM/DWDM as required to increment in multiples of 10Gbps, or move to 40Gbps and multiples thereof.

For those of you that are interested, the fibre lengths, measured as one-way light time in the fibre, are

  • MER <-> SOV : 1672ns (or 1122ft)
  • SOV <-> HEX : 6423ns (or 4310ft)
  • HEX <-> MER : 5456ns (or 3687ft)
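
(The lengths are derived directly from the propagation delay: light in fibre travels at roughly c/1.467, about 0.2044m per nanosecond, so for MER <-> SOV that’s 1672ns × 0.2044m/ns ≈ 342m ≈ 1122ft. The 1.467 refractive index is an assumption, but a typical one for single-mode fibre.)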

and the latencies across the network from core router to core router (averaged over 10 pings) are

  • MER <-> SOV : 0.096ms
  • SOV <-> HEX : 0.082ms
  • HEX <-> MER : 0.076ms

and from a customer machine in SOV to a customer machine in HEX, passing through at least two routers – 0.5ms.

Power efficiency

July 31st, 2013

We’ve just done a performance comparison of one of our little dedicated servers against a dual-core VM hosted on VMware with another provider.

The VMware machine has two cores of an Intel Xeon E5530 (Nehalem) at 2.4GHz; we have four hyperthreaded cores of an i7-3615QM (Ivy Bridge) at 2.3GHz.

Both machines are running the same operating system install and the same application code, so we ran siege for 30s at a time at different concurrency levels as a benchmark, to find out whether our machine was faster and by how much.
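
Each data point came from an invocation along these lines – a sketch, with the URL and concurrency as placeholders; -b turns off siege’s default pauses between requests:

# siege -b -c 8 -t 30S http://www.example.com/testpage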

The initial comparison was the dual-core VMware service versus our machine. At very low concurrency (1-2 simultaneous requests) ours is slightly slower to render each page. Beyond this, the existing machine shows exactly the predicted load curve, slowing linearly with additional simultaneous users, while the new machine slows only very slightly.

By default we’re running the ondemand CPU frequency governor, which means that when idle the cores are clocked at 1.2GHz. The page render time remains almost constant up to a concurrency of four, as the host spreads the load across the four 1.2GHz cores. Beyond that, the governor starts to raise the clock speed as the load rises, so at a concurrency of eight we’re still rendering pages in the same average time, because each core is now clocked at 2.3GHz – almost twice as fast, so we’ve nearly doubled the amount of CPU available. Only then does the performance begin to drop off, and then sub-linearly.

On the existing host the performance is much more predictable – it takes a constant amount of CPU to render each page request and as you double the concurrency the render time doubles.

If you turn off the power saving on the i7 and set it to performance mode, it gives the expected linear performance decrease with increasing load. Interestingly, it’s slightly slower at maximum CPU utilisation. I think (but haven’t confirmed) that this is because it can’t use the turbo boost feature to raise the clock speed as far as in power-saving mode: it’s always running at a warmer temperature, because it doesn’t cool down between benchmark runs.
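
For the curious, on Linux the governor can be inspected and switched per core through sysfs – a sketch; exact paths vary with kernel and cpufreq driver:

# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
ondemand
# echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor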

We’re going to leave the machine in ondemand mode: whilst it’s slightly slower in normal use, it uses less electricity, so it’s cheaper to run and less harmful to the polar bears. It also has significantly better performance for short peaks – it has a stockpile of cold that it can borrow against for short periods of time.

I wonder if they should start teaching thermodynamics in computer science courses.

Debugging IPv6 support

March 27th, 2013

One of our customers runs a monitoring network, First2Know, and has one of its monitoring nodes hosted with us in one of our London sites. He chose us because we support both IPv4 and IPv6, and the monitoring network does full IPv4/IPv6 monitoring from every location. He kept seeing connectivity issues between his Raleigh node, hosted by RootBSD, and his London node, hosted by us.

Initial investigation indicated that only some IPv6 hosts on our network were affected; in particular, he could reliably ping only one of two machines with IPv6 addresses in the same netblock, hosted on the same switch, within our network. We raised the issue with RootBSD, and they helpfully gave me a VM on their network so I could do some end-to-end testing.

Analysing at both ends with tcpdump indicated that packets were only being lost on the return path from RootBSD to Mythic Beasts; on the outbound path they always travelled fine. More specific testing showed that the connectivity issue was reproducible based on source/destination address and port numbers.
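
The captures themselves were nothing exotic – a filter along these lines at each end, then comparing what was sent with what arrived (the interface name is assumed; the address is the RootBSD test box from the examples below):

# tcpdump -ni eth0 ip6 and host 2607:fc50:1:4600::2 and port 22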

This connect command never succeeds,

# nc -p 41452 -6 2607:fc50:1:4600::2 22

This one reliably works,

# nc -p 41451 -6 2607:fc50:1:4600::2 22
SSH-2.0-OpenSSH_5.3

What’s probably happening is that somewhere along the path the packets are being spread across multiple links using a layer 3 hash, which means the link is chosen by an implementation something like

md5($source_ip . $source_port . $destination_ip . $destination_port) % (number of links)

This means that each connection always sees its packets travel down the same physical link, minimising the risk of performance loss due to out-of-order packet arrival, but each connection is effectively assigned a link at random.
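
You can demonstrate the principle with a couple of lines of shell. This is a toy illustration only – it is not the hash any particular router uses, and the source address and the assumption of three links are made up:

SRC=2a00:1098::1; DST=2607:fc50:1:4600::2; DPORT=22
for SPORT in 41451 41452; do
  # Hash the 4-tuple, take the first 8 hex digits, and pick a link modulo 3
  HASH=$(echo -n "$SRC$SPORT$DST$DPORT" | md5sum | cut -c1-8)
  echo "source port $SPORT -> link $(( 0x$HASH % 3 ))"
done

Adjacent source ports can land on different links, which is exactly the behaviour we saw with nc above.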

Statistically, we think that either 1 in 2 or 1 in 3 links at the affected point were throwing our packets away on this particular route. Nobody had noticed because dual-stack implementations fall back to IPv4 if the IPv6 connection fails; we only found it because this application is IPv6-only, and our IPv6 monitoring is single-stack IPv6 only.

Conversation with RootBSD confirmed that the issue was almost certainly within one of the Tier 1 providers on the link between our networks; neither of us has any layer 3 hashing options enabled on any equipment on the path taken by the packets.

In this case we also discovered that we had some suboptimal IPv6 routing of our own: once we’d fixed the faulty announcement, our inbound routes changed and became shorter via a different provider, all the problems went away, and we were unable to reproduce the issue again.

However, as a result of this we’ve become a customer of First2Know, and we’re using their worldwide network to monitor our global IPv4 and IPv6 connectivity, so we can be alerted to and fix issues like these well before our customers find them.

If this sounds like the sort of problem you’d like to work on, we’re always happy to accept applications at our jobs page.

Raspberry Pi crash

February 20th, 2013

After seven months and two weeks of uptime, our Raspberry Pi mirror server fell over yesterday and required a power cycle to bring it back up. It lasted longer than its first USB hard disk, which failed after about six months. Examining the logs suggests that the flash card is dying: yesterday it remounted read-only and the network stack fell over. /var/log is on the external USB drive, so we were able to see that the machine was alive – it could log ethernet connect/disconnect events, it just couldn’t bring the network back up.

During the time it was up it shipped about 1.5TB of downloads, running an average of 3Mbps of traffic and quite regularly peaking at over 10Mbps.