At QA Cafe we deal directly with a lot of standards from many different organizations, all of which go into making the home routers and other connected devices that we test with CDRouter.
When it comes to performance testing, the most frequently cited standard is IETF RFC 2544, the “Benchmarking Methodology for Network Interconnect Devices”. Created in 1999, it is the methodology used by most router performance test tools in the market. But is it enough for the complex world of connected devices communicating with real Internet applications?
RFC 2544 defines a set of criteria for the traffic to be used during testing, the duration of those tests, and the measurements taken at the conclusion of the tests. For example, it defines:

- the frame sizes to test (for Ethernet: 64, 128, 256, 512, 1024, 1280, and 1518 bytes)
- the minimum duration of each trial (at least 60 seconds is recommended)
- the measurements to report, such as throughput, latency, frame loss rate, and back-to-back frames
RFC 2544 is more a set of guidelines than it is a test suite.
These requirements are then used to build whatever tests are appropriate to the testers; the RFC says little about what must absolutely be tested, and it does not prescribe procedures or metrics for reporting results. This is of course by design - it was meant to provide the background information necessary to design throughput and latency tests for IP routers, switches, and similar devices.
RFC 2544 was developed by people interested in enterprise routers and switches. As such, legacy use of the standard has created a de-facto set of common practices that were originally designed to test large scale routers. This includes running 120-second throughput trials at each of a number of frame sizes, and so on. While these tests can provide some insight, they do not represent any realistic model of real-world traffic patterns.
The most common practice is to push UDP Echo Request frames for 120 seconds at a variety of frame sizes. Many test suites do this by blasting traffic at one particular frame size at a time, working up or down the sizes defined for Ethernet: 64, 128, 256, 512, 1024, 1280, and 1518 bytes. This presents two issues: the chosen frame sizes don't reflect real traffic mixes, and blasting a single size at line rate doesn't reflect real application behavior.
As one might expect, trying to define "average" real-world traffic is difficult. There isn't a lot of research, and even if there were, an average frame size alone wouldn't exercise the real behavior of a DUT.
Efforts like IMIX and, more recently, NetSecOPEN (though the latter is focused on firewalls and other security applications) attempt to come up with baseline mixes that exercise what a router will actually go through in deployment. These efforts are much broader in scope than just finding an average frame size: they are trying to find good mixes of applications, burst profiles, and so on. This is a good thing, but unfortunately many customer requirements still come from the legacy of asking for RFC 2544 benchmarking.
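As an illustration, one commonly cited "simple IMIX" blend weights three Ethernet frame sizes at a 7:4:1 ratio. The exact sizes and ratios vary between vendors and test tools, so treat these particular numbers as an assumption for the sake of the arithmetic:

```python
# Weighted average frame size of a "simple IMIX" blend.
# The 7:4:1 mix of 64-, 594-, and 1518-byte frames is one commonly
# cited example; real IMIX definitions vary by vendor.
imix = {64: 7, 594: 4, 1518: 1}  # frame size (bytes) -> weight

total_frames = sum(imix.values())
avg_size = sum(size * weight for size, weight in imix.items()) / total_frames

print(f"average frame size: {avg_size:.1f} bytes")  # ~361.8 bytes
```

Note that even this "average" is only a summary statistic - the point of an IMIX is to send the mix itself, not a single averaged size.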
When people ask for "testing in accordance with RFC 2544", what they often mean is running the tests that have become the norm in most test tools: attempting to pass line-rate traffic for 120 seconds at each frame size, then moving on to the next size, and so on. Frame loss and latency are then measured at the end.
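The legacy procedure amounts to a simple sweep. A minimal sketch of the test plan it implies - frame sizes, trial length, and frames per trial at an assumed line rate - might look like the following; the 1 Gbps link speed and the per-frame overhead figure are illustrative assumptions, not part of the RFC:

```python
# Sketch of the legacy RFC 2544-style sweep: one line-rate trial per
# frame size. Link speed and overhead values are illustrative assumptions.
LINE_RATE_BPS = 1_000_000_000   # assumed 1 Gbps link
TRIAL_SECONDS = 120             # the common (de facto) trial length
FRAME_SIZES = [64, 128, 256, 512, 1024, 1280, 1518]  # RFC 2544 Ethernet sizes
PER_FRAME_OVERHEAD = 20         # preamble (8) + inter-frame gap (12) bytes

def frames_per_trial(frame_size: int) -> int:
    """Frames sent in one trial when blasting at full line rate."""
    bits_per_frame = (frame_size + PER_FRAME_OVERHEAD) * 8
    return int(LINE_RATE_BPS / bits_per_frame * TRIAL_SECONDS)

for size in FRAME_SIZES:
    print(f"{size:>5} bytes: {frames_per_trial(size):>12,} frames in {TRIAL_SECONDS}s")
```

Roughly 178 million 64-byte frames per two-minute trial - a torrent of identical packets that no real application ever produces.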
This is not how traffic appears in the real world, however. Even though there is some evidence that the majority of traffic consists of smaller (80-128 byte) frames, that doesn't mean it originates from applications pushing line rate.
The small frames we see are certainly frequent due to signaling protocols, DNS, etc. But even a Voice service using smaller frame sizes would not consume the entirety of a connection’s bandwidth, particularly at multi-gigabit rates. Witnessing small frame sizes taking up the full bandwidth of your connection probably would mean something nefarious is going on!
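To put that in perspective, here is a back-of-the-envelope calculation for a single G.711 voice call (20 ms packetization, so 50 packets per second; the header sizes are the usual RTP/UDP/IPv4/Ethernet figures). Even a steady stream of small voice frames barely registers on a gigabit link:

```python
# Rough bandwidth of one G.711 voice stream vs. a 1 Gbps link.
# Header sizes are the standard RTP/UDP/IPv4/Ethernet values.
PAYLOAD = 160                # bytes of G.711 audio per 20 ms packet
HEADERS = 12 + 8 + 20 + 18   # RTP + UDP + IPv4 + Ethernet (incl. FCS)
PPS = 50                     # packets per second at 20 ms packetization

bits_per_second = (PAYLOAD + HEADERS) * 8 * PPS
share = bits_per_second / 1_000_000_000

print(f"{bits_per_second / 1000:.1f} kbps")   # ~87.2 kbps
print(f"{share:.6%} of a 1 Gbps link")        # well under 0.01%
```

Hundreds of simultaneous calls would still leave the link essentially idle - which is why small frames at 100% line rate are a red flag rather than a realistic workload.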
Even with the right mix of frame sizes, pushing 100% of line rate and measuring frame loss does not provide a lot of information. Moreover, UDP Echo requests sent at line rate don't represent a realistic traffic flow.
As we discuss in our training webinar on performance testing, line-rate throughput and latency can act as a good baseline, but they don't provide a complete picture of how your DUT will handle real-world user behavior. Moreover, today's gateways are application-aware and do more than just route IP traffic.
The way to get the most accurate picture of device performance is to treat each application as its own control variable. DNS latency is one of the biggest causes - if not the biggest cause - of poor user experience with basic Internet applications. TCP and UDP will behave differently, and they will behave differently over different mediums, especially if the physical layer has its own robustness protections: wired Ethernet will behave very differently from wireless!
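A quick way to get a feel for this kind of per-application latency - not CDRouter's implementation, just a minimal sketch using the standard library - is to time individual operations such as a DNS lookup:

```python
import socket
import time

def measure_latency_ms(operation, *args):
    """Time a single operation and return the elapsed milliseconds."""
    start = time.perf_counter()
    operation(*args)
    return (time.perf_counter() - start) * 1000.0

# Example: time one name resolution through the system resolver.
# "localhost" resolves from the local hosts file, so this runs without
# network access; point it at a real hostname to measure your resolver.
elapsed = measure_latency_ms(socket.getaddrinfo, "localhost", 80)
print(f"lookup took {elapsed:.1f} ms")
```

Repeating the measurement many times and looking at the distribution (not just the average) is what reveals the delays a user actually notices.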
In CDRouter Performance, we solve this in two ways. The first is the ability to send line-rate traffic in a number of different scenarios: UDP vs. TCP, IPv4 vs. IPv6, etc. The second is our application-specific latency test cases, which push traffic of a specific type (like DNS, or the DHCP process) to give an indication of how a user will experience delays.
One final word on performance testing. While pushing traffic of any type is a good way to measure DUT quality, performance itself is not the end-all of testing network equipment, particularly in broadband or home networks. Functional testing of individual protocols - making sure your device will behave correctly over time - is critical to proving your DUT will measure up to provider requirements and user expectations.