If you are developing or deploying broadband CPE, routers, or Wifi devices, what should you look for when testing performance? Hint - it’s more than just throughput, and having a fully automated system that exercises the entire device is key to measuring true performance.
What is performance testing?
What do we mean when we talk about performance testing on a broadband gateway or access point? Well really we’re talking about exercising the device’s network processing ability - that is, the device’s ability to handle (route, forward, consume, or emit) packets. Depending on the design of your device, this may be a dedicated network processor or part of the overall chipset design.
A client, or source, sends packets to a server, or sink. What was sent and what was received are then compared, so the client can determine whether everything it sent actually arrived.
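As a rough illustration, here’s a minimal source/sink pair in Python. A plain UDP socket on localhost stands in for a real device under test, and the payloads and packet count are arbitrary:

```python
# Minimal source/sink sketch: loopback UDP stands in for the device
# under test; packet count and payloads are arbitrary.
import socket

NUM_PACKETS = 100

# Sink: bind a UDP socket and collect whatever arrives.
sink = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sink.bind(("127.0.0.1", 0))   # let the OS pick a free port
sink.settimeout(0.5)
sink_addr = sink.getsockname()

# Source: send numbered packets toward the sink.
source = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sent = [f"pkt-{i}".encode() for i in range(NUM_PACKETS)]
for payload in sent:
    source.sendto(payload, sink_addr)

# Compare what was received against what was sent.
received = []
try:
    while len(received) < NUM_PACKETS:
        data, _ = sink.recvfrom(2048)
        received.append(data)
except socket.timeout:
    pass  # lost packets simply never arrive

lost = set(sent) - set(received)
print(f"sent={len(sent)} received={len(received)} lost={len(lost)}")
```

A real test harness would of course place the source and sink on opposite sides of the device (WAN and LAN), but the compare step is the same idea.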
For gateways, you can test the network processing between WAN and LAN, or on the home network between LAN ports, and get different results, especially over wireless. You can also test bidirectionally (which creates two sources and two sinks), and this will put a lot of strain on the processor.
The importance of application/transport layer testing
Many performance testing tools and processes use lower layer (that is, layer 2 or 3) packets. This is all well and good if you’re looking for a simple “line rate” measurement, but it won’t exercise the things the device is supposed to do - that is, handle the applications that users actually run.
The other thing to take note of is what the source and sink actually compare during the test. At the transport layer, it’s the success of the transmission that matters, which gives different results for TCP and UDP. Since TCP retransmits lost packets, packet loss is still a factor but doesn’t count against the success of the transmission - it shows up as latency rather than failed delivery. In UDP, a lost packet is simply gone, so loss generally shows up as reduced throughput.
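To make that concrete, here’s a back-of-the-envelope sketch of how the same loss rate surfaces differently in the two transports. All of the numbers (packet size, loss rate, RTT) are assumptions, and TCP congestion control is ignored entirely:

```python
# Back-of-the-envelope sketch (assumed numbers): how 1% packet loss
# shows up differently for UDP and TCP.
packets = 10_000
payload_bytes = 1400
loss_rate = 0.01
duration_s = 1.0

# UDP: lost packets are simply gone, so throughput drops by the loss rate.
udp_delivered = packets * (1 - loss_rate)
udp_throughput_mbps = udp_delivered * payload_bytes * 8 / duration_s / 1e6

# TCP: lost packets are retransmitted, so (ignoring congestion control)
# delivery stays complete but the transfer takes longer - loss shows up
# as added delay instead of missing data.
rtt_s = 0.02  # assumed round-trip time
tcp_duration_s = duration_s + packets * loss_rate * rtt_s
tcp_throughput_mbps = packets * payload_bytes * 8 / tcp_duration_s / 1e6

print(f"UDP: {udp_throughput_mbps:.1f} Mbps (1% of the data missing)")
print(f"TCP: {tcp_throughput_mbps:.1f} Mbps (all data, but slower)")
```

Real TCP behavior is much more complicated (window collapse on loss can hurt far more than this suggests), but the asymmetry - UDP loses data, TCP loses time - is the point.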
Throughput versus latency
Speaking of throughput vs. latency, the difference is that throughput measures the data rate of successful transmissions, whereas latency measures the delay between when the transmission was sent by the source and when it was successfully processed by the sink. Latency causes different problems with different applications - for example, watching a video in a high latency situation will require more buffering by the receiver. Both of these factors are affected by the performance of the device’s network processing.
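In code terms, both metrics fall out of the same per-packet records. This is a toy example with made-up timestamps, not a real capture:

```python
# Toy example: deriving throughput and latency from per-packet
# send/receive timestamps (times in seconds, sizes in bytes;
# the values are made up).
records = [
    # (sent_at, received_at, size_bytes)
    (0.000, 0.012, 1400),
    (0.001, 0.014, 1400),
    (0.002, 0.015, 1400),
    (0.003, 0.018, 1400),
]

# Throughput: total bits successfully delivered over the elapsed time.
total_bits = sum(size * 8 for _, _, size in records)
elapsed = max(r for _, r, _ in records) - min(s for s, _, _ in records)
throughput_bps = total_bits / elapsed

# Latency: per-packet delay between send and successful receipt.
latencies = [r - s for s, r, _ in records]
avg_latency_ms = 1000 * sum(latencies) / len(latencies)

print(f"throughput: {throughput_bps / 1e6:.2f} Mbit/s")
print(f"average latency: {avg_latency_ms:.1f} ms")
```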
This is all very subjective, of course. There’s no “right” answer on what “good” throughput and latency are - and users will have a different tolerance for different application performance. When you add Wifi to that, it gets pretty messy.
So what sorts of things affect throughput and latency? The first, of course, is that the network processing simply can’t keep up with demand and drops packets. This will affect throughput if the packets never get there, but retransmissions at lower layers will introduce latency. This may be out of the hands of the network processor - if there’s disruption at the physical layer in DSL or Wifi, retransmissions exist to make the connection robust, but may introduce delay.
If there are bugs in your code or poor cleanup causing memory leaks, these can eventually degrade the device’s network processing ability. This is often triggered by the operation of other protocols running on the device.
While baseline throughput and latency are good metrics, what we REALLY want to know is how the device handles particular applications, because that’s what the user will experience. Since these measurements are subjective, we want to look for changes over time - do we see any severe regressions?
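One simple way to spot those changes is to compare each run against the previous one. This sketch uses hypothetical nightly throughput numbers and an arbitrary 10% cutoff for “severe”:

```python
# Sketch: flag severe run-over-run drops in a metric. The nightly
# numbers are hypothetical, and the 10% cutoff is arbitrary.
def regressions(history, max_drop_pct=10.0):
    """Yield (index, previous, current) wherever the metric dropped
    by more than max_drop_pct percent from the previous run."""
    for i in range(1, len(history)):
        prev, cur = history[i - 1], history[i]
        if cur < prev * (1 - max_drop_pct / 100.0):
            yield i, prev, cur

nightly_mbps = [412.0, 408.0, 415.0, 361.0, 359.0]  # hypothetical runs
for i, prev, cur in regressions(nightly_mbps):
    print(f"run {i}: {prev:.0f} -> {cur:.0f} Mbps (severe drop)")
```

The same comparison works for any application-level metric - latency, loss, or something you define yourself.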
Repeatability and automation
Performance is highly subjective. You’ll end up building your own metrics, but what we really want to know is how performance will change over time due to the real-world behavior of the device - that is, as users use it, will the performance change? Testing this in any way that can provide empirical results requires repeatability and consistency. We also want to be able to mix performance tests with functional tests to simulate user activity. The CDRouter Performance add-on lets you build test packages that combine the two and execute them consistently. You can set your own thresholds for success and failure, so if you know that 90% is the number you’re looking for, you can set exactly that. It also works with Continuous Integration through the CDRouter API, so if you’re trying to improve performance across firmware revisions you’ll be able to do that seamlessly.
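As an illustration of the threshold idea - this is a generic sketch, not the CDRouter API, and the baseline and run numbers are made up:

```python
# Generic sketch (not the CDRouter API): pass/fail a run by comparing
# measured throughput against a percentage threshold of a baseline.
def meets_threshold(measured_mbps, baseline_mbps, threshold_pct=90.0):
    """Return True if measured throughput is at least threshold_pct
    percent of the baseline."""
    return measured_mbps >= baseline_mbps * threshold_pct / 100.0

baseline = 940.0  # hypothetical baseline, e.g. near gigabit line rate
runs = {"fw-1.0": 935.0, "fw-1.1": 901.0, "fw-1.2": 812.0}

for fw, mbps in runs.items():
    verdict = "PASS" if meets_threshold(mbps, baseline) else "FAIL"
    print(f"{fw}: {mbps:.0f} Mbps -> {verdict}")
```

Wiring a check like this into CI means a firmware revision that quietly erodes throughput fails the build instead of shipping.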