Sparks

Historical Perspective on Docker, Kubernetes, etc

What Network Services used to Look Like BACK THEN…

25 years ago this summer, after graduating, I started work at the JANET Web Cache Service. Well, not really: I actually started work at Manchester Computing - in the network services team of the University of Manchester - employed to work on what was then called the National JANET Web Caching Service. However, the first thing I was asked to do was redesign the website, so I made a logo and started my tradition of creating searchable names. (Not googleable, because Google wasn’t around in the UK until 1999.) As part of that it became clear that “JWCS” was pretty unique, so we renamed things.

After that, the interesting stuff was building out the service.

Anyway, timeline-wise, national-scale services were relatively rare, let alone ones built out from scratch. During my time at the JWCS, we went from servicing some of the traffic of a small number of universities to servicing most of the web traffic across UK Academia. We also handled Manchester University’s (and UMIST’s) site-caches.

Traffic was not as high then as it is now, but we definitely ran the highest-load web service in UK Academia at the time. We did stats on things (which we formally published, peer-reviewed) and looked for new ways of building (and measuring) efficiencies - in terms of server efficiency and cache efficiency, but also in terms of how we built the servers, etc.

We did everything from hardware builds, OS builds, software builds, kernel hacking, stats generation, support systems, database tooling etc. Interesting times.

Things like Kubernetes, Docker, Vagrant and IPVS (Linux Virtual Server load balancing) didn’t exist. Even Erlang wasn’t open source, and when it and Eddieware were released, we didn’t see much clear benefit after basic load testing (due to the front end needing to hold open connections for clients).

How things were when I joined (Lots of manual processes)

We had 2 small collections of individual servers - in Manchester and Loughborough. Some were inherited SGI/IRIX machines, some FreeBSD servers, and some Linux boxes for testing. A small number of universities used the system, and installation and load balancing were very manual.

To keep costs down, we built custom server hardware, installed FreeBSD onto it, and configured and provisioned the servers ourselves.

This meant:

  • Installing the OS
  • Completing the OS build
  • Installing the software
  • Configuring it for remote management
  • Adding DNS entries for the server
  • Pointing the entries at the server
  • Setting up “jails” for security (among other obvious things)

(I mention the detail to point out the number of manual steps here!)

Also, we had clients - which were universities. Each had 2 caches - a primary and a backup. To support this, each university had a name to point at which we controlled; those names were CNAMEs to the real servers.
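
For a flavour of the mechanism (hypothetical names only - not the real JWCS zone), moving a university’s traffic meant moving a CNAME:

    # Illustrative only - made-up names, not the real zone.
    # In the zone we controlled, each university had a pair of names,
    # e.g. (BIND-style):
    #   uni-a         IN CNAME cache03.jwcs.example.ac.uk.
    #   uni-a-backup  IN CNAME cache07.jwcs.example.ac.uk.
    # which a client can confirm with:
    dig +short uni-a.cache.example.ac.uk CNAME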

Servers connected, things worked. Traffic was certainly orders of magnitude smaller!! Life was indeed simpler back then.

Or was it? There was a fair amount to maintain for new hardware, new clients. Failover was not as good as you’d like, and load balancing was very manual and not really dynamic.

Over time, we optimised things like the new-build install process. We looked at load balancers. We considered things like caching appliances. Both cost more than the annual budget of the service. The trend in building out services at that point in time was to buy bigger and bigger iron (servers) with more CPUs, disk and memory. But again, this was a very expensive route to take.

Instead, we looked at non-dedicated load balancers. During 1999 there was a nascent project then called “Linux Virtual Server” (also known as IPVS), which I tried and which looked promising. In the summer of ’99 I installed it on the Manchester University cluster and it worked great.

So we made plans to restructure the service around this (which were published and are still on the linuxvirtualserver website today!).

How things were when I left (Lots of automation)

We shifted to having 2 primary clusters at JANET backbone locations - in Manchester and London. Each location had 2 physical front end boxes (actually small boxes with very little storage) which were there as LVS (now called IPVS) front ends (called directors), with remote serial console access. Each cluster had between a dozen and 2 dozen physical servers. All had a local IP and a private IP, and the LVS directors used IP-inside-IP tunneling to forward the packets from HTTP requests to the real servers (those packets were decapsulated and delivered via the private IP). The real servers all had the same service IP on a private, non-ARPing interface and could then reply directly to clients.
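
For a flavour of what that looks like in config terms, here’s a minimal sketch using ipvsadm on a modern Linux box - the addresses are made up and the original directors predate this exact tooling, so treat it as illustrative rather than what we actually ran:

    # On a director: define the virtual service (the VIP) and add real
    # servers, forwarding packets to them with IP-in-IP tunneling (-i).
    ipvsadm -A -t 192.0.2.10:80 -s wlc
    ipvsadm -a -t 192.0.2.10:80 -r 10.0.0.11 -i -w 1
    ipvsadm -a -t 192.0.2.10:80 -r 10.0.0.12 -i -w 1

    # On each real server: accept the tunneled packets and hold the VIP
    # on a non-ARPing interface, so replies go straight back to clients
    # without passing through the director.
    modprobe ipip
    ip addr add 192.0.2.10/32 dev tunl0
    ip link set tunl0 up
    sysctl -w net.ipv4.conf.tunl0.rp_filter=0
    sysctl -w net.ipv4.conf.all.arp_ignore=1
    sysctl -w net.ipv4.conf.all.arp_announce=2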

This allowed us to switch to giving each university a primary/secondary cluster to point at, and it also enabled us to redirect traffic between clusters (and between geographic locations) if we needed to perform maintenance. (Meaning we could switch from doing maintenance overnight to doing it in the afternoon if needed.)

We didn’t have any service downtime after this change while I was there.

I switched us over to storing tarball file-system images on a central admin server. Installing a new machine became a matter of booting it from a floppy-based Linux installation, which formatted the drives, pulled the image over the network (using ssh/tar), untarred it onto the drives, and configured the machine - which was operational in the time it took for the install to complete.
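
In spirit, the image-pull step boiled down to something like this (hypothetical host and path names; the real thing was a script on the boot floppy, and this assumes the disk has already been partitioned, formatted and mounted on /mnt):

    # Pull the tarball image from the central admin server and unpack it
    # straight onto the freshly formatted disk.
    ssh admin.example.ac.uk 'cat /images/cache-server.tar.gz' \
        | tar xzpf - -C /mnt
    # ...then per-host configuration (hostname, IPs) and a reboot.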

We’d then plug the boxes into the right spot on the right network at the right location, add them to the list of machines included in the cluster, and traffic would then flow over.

Upgrades could be done similarly: take a server out of rotation, run the automated upgrade, test, and then re-introduce it into rotation.
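
With IPVS in front, taking a server out of rotation is a couple of commands on the director - a sketch using the same made-up addresses as above:

    # Set the weight to 0 so no new connections go to this real server,
    # letting existing connections drain.
    ipvsadm -e -t 192.0.2.10:80 -r 10.0.0.11 -i -w 0
    # Remove it, upgrade and test the box, then add it back into rotation.
    ipvsadm -d -t 192.0.2.10:80 -r 10.0.0.11
    ipvsadm -a -t 192.0.2.10:80 -r 10.0.0.11 -i -w 1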

NOTE: While still somewhat manual and homegrown, this was very much building out in the direction of travel that would look familiar today.

Was this unusual?

As I said, in 1998 building out services from manually managed servers wasn’t uncommon. As mentioned, building out onto bigger and bigger boxes was definitely a trend, and load balancing using DNS was also fairly common. By the time I left, LVS was still very new; while we felt it was a sufficiently tested system, it was barely a year old. The approach of managing lots of machines the way we did (ssh shell scripts, ssh keys, centralised management) was common enough that I saw a number of companies do things this way in the early 00s. However, the lack of tools like containers, Docker, k8s, etc. meant that it was uncommon to do things quite the way we did them - and those that did certainly had no standard way of doing so.

(If you’re familiar with k8s/Docker/etc., you’ll realise that I essentially containerised our services, automated the rollout à la Docker/k8s, and managed the cluster in a similar way.)

Fast Forward 25 years…

No, this isn’t a comment about “Fast Forward” - that was a different employer :-)

How would you build this out today? Well, you’d either still build out the hardware yourself or you’d use someone else’s infrastructure. (The latter being the far more common way, and the easiest to test short term - well, not really, but the most obvious.)

Either way, you’d use a bunch of VMs. If you were doing this yourself, you’d probably still want to have a collection of VMs on the hardware you build out on.

Anyhow, you would:

  • Build out physical infrastructure using physical machines or VMs provisioned using tools like Terraform/Ansible/etc.
  • Build images of your servers using Docker
  • Manage the cluster of services on the machines using Kubernetes
  • And likely, as part of that, have a front end load balancer (a rough sketch of the flow follows this list)
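
As a very rough sketch of that flow - hypothetical image and cluster names, and it assumes Docker and kubectl are already pointed at a registry and a cluster:

    # Build and publish an image of the service.
    docker build -t registry.example.com/webcache:1.0 .
    docker push registry.example.com/webcache:1.0

    # Run it on the cluster and put a load balancer in front of it.
    kubectl create deployment webcache --image=registry.example.com/webcache:1.0
    kubectl scale deployment webcache --replicas=3
    kubectl expose deployment webcache --port=80 --type=LoadBalancer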

So what’s changed?

  • Physical infrastructure & jails replaced by VMs as far as possible
  • Terraform/Ansible - replacing the floppy (!!) based provisioning
  • Docker - replacing the tarball-style service provisioning
  • K8s - replacing the manual process of kicking off the provisioning of services
  • IPVS - perhaps ironically, but not too surprisingly, still available as one of the load balancing mechanisms in K8s (and I think for many it would still make a lot of sense, though it seems they’ve forgotten about the existence/benefit of L7 switching)
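
If you’re curious, kube-proxy’s IPVS mode is only a flag away, and the result is inspectable with the same old tool - a sketch, assuming a Linux node with the IPVS kernel modules available:

    # Ask kube-proxy to program Services via IPVS rather than iptables.
    kube-proxy --proxy-mode=ipvs --ipvs-scheduler=rr
    # On a node, the familiar tool lists the virtual services it created.
    ipvsadm -Ln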

Why mention this?

Mainly because Kubernetes keeps coming up time and time again these days, alongside Docker and the various other tools. So I did a bit of a dive into Kubernetes while on leave, and realised that it did far less than I expected, while also handling a fair amount of the drudgery involved in running a public cluster/service.

If you’ve been thinking “I’ve been running services for decades, why pick up another tool?”, it’s probably worth it. Yes, there’s dockerisation of your application to sort out. Yes, you need a K8s cluster to test with as well as your public one. But once you’ve got these things set up it’s easier than you might expect.

On leave I played around with Linode’s k8s service - which is pretty good, and you can get some free usage out of the service. I found this video, “You need to learn Kubernetes now”, particularly useful and entertaining when learning this (his Docker tutorial is likewise fun). This is primarily because rather than being a 10-hour tutorial, it’s a really focussed 25 minutes. I find the video a BIT intense/over the top, but having seen too many boring tech videos, I think it’s refreshing. If you’re in the US, buy some of his coffee :-)

(On the flip side, if you’re just publishing a personal website for whatever reason - like this site is - there’s very little reason to bother with the K8s infrastructure, because frankly there are simpler mechanisms. But if you already have a k8s cluster, it’s worth just using it.)

From there I went on to setting up my own test cluster for experimenting/dev services, and that’s what I’ll talk about in a follow up post.

Also, of note - I DON’T mean using minikube for local experimenting/development/testing, I mean actually setting up a test cluster of VMs. It’s easier than you think.

Updated: 2023/09/15 21:43:33