Tuesday, January 3, 2012

Help, uploading makes my internet slow!

Consumer broadband connections aren't designed right. They sound good on paper, of course: "The majority of users download far more than they upload, so giving them high download speeds and only a tiny amount of upload speed makes sense." In truth, it usually sort of works... so long as all you ever do is download.

I don't only ever download, though. I create content now and then, and some of it is quite large. Large enough to take well over 10 hours to upload through the wee tiny upload pipe my provider gives me.

On the face of it this isn't really a problem. The material I'm uploading isn't remarkably time sensitive, and I don't produce it at a rate that exceeds my connection's ability to upload it. The problem is what happens to the download side of the pipe while this upload is going on.

Communication is a two-way street, you see. We need to send out a request before we can get any data back, and we need to send out an acknowledgement before we get the next chunk, and so on. Unfortunately, our ability to send these requests and acknowledgements is essentially cut off when the upload pipe is saturated. Packets back up to the point where any sort of interactive online experience is impossible, and the few requests that do manage to make it through can't achieve anything close to a reasonable download rate.
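
You can see this for yourself without any special tools: measure your round-trip time while the upload pipe is idle, then again while something big is going out. A quick sketch (the hostnames and filenames here are just placeholders):

# Baseline round-trip time with an idle upload pipe:
ping -c 10 example.com

# Saturate the upload (any large file to any remote host will do):
scp big-file.tar.gz user@remote-host:

# Meanwhile, in another terminal -- expect times measured in
# seconds rather than tens of milliseconds:
ping -c 10 example.com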

The solution to this, of course, is actually remarkably simple: we just have to keep the pipe from backing up by limiting the rate at which we send data out. Of course, to avoid just moving the logjam further upstream, we also want to make sure we throw out stale packets so they don't take up space, and thus time, in the queue.

Now, I use a Linux box as my main internet gateway, sitting between my connection to the world at large and the legion of devices on the warm, cozy side of the LAN. This means that setting up this rate limiting is a simple and straightforward task.

Or it would be, if the documentation weren't utter shit.

I actually spent quite a few hours paging through man pages and Google looking for the magic incantation that would allow me to set up a sane rate limit queue on my Linux box before coming up with the solution. To save you the trouble, here it is:

tc qdisc add dev shaw root tbf rate 60kbps burst 1540 latency 100ms

The command we're executing is "tc". The "qdisc" part means we're issuing a queueing discipline command. The "add" means we're adding a new queueing discipline. All fine and dandy so far.

Now we specify the device we want to operate on. I've used "ifrename" to name my external interface "shaw", because that's who my provider is. Yours will probably be something uncreative like "eth0", but either way, the "dev shaw" part specifies the correct ethernet interface on my particular system.
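
If you're not sure what your external interface is called, just ask the kernel for the list and pick out the one facing your modem:

# List every network interface the kernel knows about:
ip link show

# Or, on older systems without iproute2:
ifconfig -a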

The "root" part is the simple part of a complex puzzle. You can actually create a whole tree full of queues that feed into each other, each with different limits, algorithms, and a whole lot of other overcomplicated and unnecessary junk. We just want one queue, so we want it to be the root.

The "tbf" part is where we've specified the type of queue, in our case it's a "token bucket filter". Without going into too much detail, data is only sent downstream when there's enough "tokens" in the "bucket" to "pay" for them, and the tokens are replenished in wall-time thus achieving the rate limiting.

We specify this rate limit explicitly with the next part, "rate 60kbps". Watch the units here: in tc-speak, "kbps" means kilobytes per second, not kilobits. My upload bandwidth is actually 0.5mbit, which translates to roughly 64 kilobytes per second, but we want to make sure that we don't run too close to that limit or the logjam might form again at the cable modem.
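
As an aside, tc will also take the rate in bits, which is easier to compare against a provider's advertised numbers. If I have my units right, this spelling asks for exactly the same rate:

# 480 kilobits per second is the same as 60 kilobytes per second:
tc qdisc add dev shaw root tbf rate 480kbit burst 1540 latency 100ms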

The next part, "burst 1540", is where things start going off into the weeds. The documentation regarding this part of the command goes on about technical details of how it needs to be set very large for things to work properly on fast connections because of kernel timer tick resolution limits. Of course, that advice is at least five years out of date, since the kernel went tickless back in 2007 or so. In any case, this specifies the maximum size of the bucket (beyond which no more tokens can be added), and we want it to be at least big enough to pay for a full ethernet frame: 1500 bytes of payload plus the 14-byte ethernet header comes to 1514 bytes, so 1540 gives us a little headroom.

Finally, perhaps the most important part for keeping the logjam from forming is the "latency 100ms" part. This defines how long a packet can linger in the queue before it's thrown out to make way for newer packets. One might think that throwing packets out would make a complete mess of things, but in fact it's exactly what we want: reliable stream protocols like TCP throttle back in the face of packet loss, reducing the logjam, and isochronous protocols of course benefit more from having packets dropped than delivered late.
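
The arithmetic here is comforting: 100ms of queue at 60 kilobytes per second works out to about 6 kilobytes, or roughly four full-sized frames. That's enough to smooth over bursts without ever holding enough backlog to wreck interactivity. You can also watch the queue doing its job, drop counter and all:

# The -s flag adds statistics: bytes and packets sent, drops, overlimits:
tc -s qdisc show dev shaw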

By deploying this rate limiting queue on my Linux box I was able to keep my internet experience interactive while sacrificing little to no upload throughput. This makes me happy, and I hope that if you find yourself in a similar situation, it'll make you happy too.
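
For reference, here's the whole thing as a tiny script you could hang off your boot sequence, written so it's safe to run more than once (the interface name and numbers are from my connection; adjust both for yours):

#!/bin/sh
# Clear any existing root qdisc; ignore the error if there isn't one.
tc qdisc del dev shaw root 2>/dev/null

# Cap uploads just below the pipe's capacity, dropping anything
# that lingers in the queue longer than 100ms.
tc qdisc add dev shaw root tbf rate 60kbps burst 1540 latency 100ms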