Wednesday, July 10, 2013

Postcards from the TCP frontier

Maybe it's the weather, maybe there's something in the water, maybe I'm just lucky.

Whatever it is, I seem to have been running across a lot of quite interesting work on TCP-related topics recently.

Here's a sampling:

  • The Visible Effects and Hidden Sources of Internet Latency
    We found in this study that last-mile latencies are often quite high, varying from about 10 ms to nearly 40 ms (ranging from 40–80% of the end-to-end path latency). Variance is also high. One might expect that variance would be lower for DSL, since it is not a shared medium like cable. Surprisingly, we found that the opposite was true: Most users of cable ISPs have last-mile latencies of 0–10 ms. On the other hand, a significant proportion of DSL users have baseline last-mile latencies more than 20 ms, with some users seeing last-mile latencies as high as 50 to 60 ms. Based on discussions with network operators, we believe DSL companies may be enabling an interleaved local loop for these users.
  • WTF? Locating Performance Problems in Home Networks
    In this paper, we design and develop WTF (Where’s The Fault?), a system that reliably determines whether a performance problem lies with the user’s ISP or inside the home network. The tool can also distinguish these problematic situations from the benign case when the network is simply under-utilized. WTF uses cross-layer techniques to discover signatures of various pathologies. We implemented WTF in an off-the-shelf home router; evaluated the techniques in controlled lab experiments under a variety of operating conditions; validated it in real homes where we can directly observe the home conditions and network setup; and deployed it in 30 home networks across North America. The real-world deployment sheds light on common pathologies that occur in home networks. We find, for instance, that many users purchase fast access links but experience significant (and frequent) performance bottlenecks in their home wireless network.
  • Google making the Web faster with protocol that reduces round trips
    Ultimately, Google's goal is not necessarily to replace the Web's current protocols but to bring improvements to how TCP is used with SPDY. SPDY already provides multiplexed connections over SSL, but it runs across TCP, causing some latency issues.
  • QUIC: Quick UDP Internet Connections. Multiplexed Stream Transport Over UDP
    We wish to reduce latency throughout the internet, providing a more responsive set of user interactions. Over time, bandwidth throughout the world will grow, but round trip times, governed by the speed of light, will not diminish. We need a protocol to move requests, responses, and interactions through the internet with less latency along with fewer time-consuming retransmits, and we believe that current approaches are holding us all back.
  • Reducing Web Latency: the Virtue of Gentle Aggression
    In this paper, we explore faster loss recovery methods that are informed by our measurements and that leverage the trend towards multi-stage Web service access. Given the immediate benefits that these solutions can provide, we focus on deployable, minimal enhancements to TCP rather than a clean-slate design. Our mechanisms are motivated by the following design ideal: to ensure that every loss is recovered within 1-RTT. While we do not achieve this ideal, our paper conducts a principled exploration of three qualitatively-different, deployable TCP mechanisms that progressively take us closer to this ideal.
  • MinimaLT: Minimal-latency Networking Through Better Security
    To meet these challenges, we have done a clean-slate design, starting from User Datagram Protocol (UDP), and concurrently considering multiple network layers. We found an unexpected synergy between speed and security. The reason that the Internet uses higher-latency protocols is that, historically, low-latency protocols such as T/TCP have allowed such severe attacks [18] as to make them undeployable. It turns out that providing strong authentication elsewhere in the protocol stops all such attacks without adding latency.
  • Network Utilization: The Flow View
    The actual utilization of the network resources is not easy to predict or control. It depends on many parameters like the traffic demand and the routing scheme (or Traffic Engineering if deployed), and it varies over time and space. As a result it is very difficult to actually define real network utilization and to understand the reasons for this utilization. In this paper we introduce a novel way to look at the network utilization. Unlike traditional approaches that consider the average link utilization, we take the flow perspective and consider the network utilization in terms of the growth potential of the flows in the network.
  • packetdrill: Scriptable Network Stack Testing, from Sockets to Packets
    Testing today’s increasingly complex network protocol implementations can be a painstaking process. To help meet this challenge, we developed packetdrill, a portable, open-source scripting tool that enables testing the correctness and performance of entire TCP/UDP/IP network stack implementations, from the system call layer to the hardware network interface, for both IPv4 and IPv6. We describe the design and implementation of the tool, and our experiences using it to execute 657 test cases. The tool was instrumental in our development of three new features for Linux TCP—Early Retransmit, Fast Open, and Loss Probes—and allowed us to find and fix 10 bugs in Linux. Our team uses packetdrill in all phases of the development process for the kernel used in one of the world’s largest Linux installations.

All of these wonderful papers: there's just so much to learn!

Meanwhile, I've got an interesting problem of my own:

  • A network-related performance problem is reported to me.
  • I am able to reproduce the problem, using a pair of machines located on different continents, connected by a complex network (multiple independent sub-networks in the route between the machines, VPNs, firewalls, and other network intelligence along the way, etc.).
  • I would like to reproduce the problem in my laboratory environment.
  • I have access to various wonderful network simulation tools (e.g., netem).
  • However, I don't know what configuration to give my simulation tools to reproduce the behavior seen "in the real world".
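To give a sense of the kind of configuration I mean: netem impairs a link with a single "tc" command. The numbers and the device name below are purely illustrative guesses, not measurements of any real path — which is exactly the problem:

```shell
# Illustrative guesses only: impose 80ms one-way delay with 10ms of
# jitter (normally distributed) and 0.5% packet loss on eth0.
tc qdisc add dev eth0 root netem delay 80ms 10ms distribution normal loss 0.5%

# Remove the impairment when done.
tc qdisc del dev eth0 root
```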

What I'd like is a tool that I could run on machine A, pointing it at the IP address of machine B. The tool would then examine, probe, measure, and characterize the actual properties of the communications between the two machines, and spit out a nice tidy set of output telling me:

To simulate this connection, issue the following "tc" rules on machine LAB_A, and the following "tc" rules on machine LAB_B, and your connections between these two machines will then exhibit (as closely as we can arrange) the behaviors that are occurring on real connections between machines A and B.

However, I haven't found that tool. So, unfortunately, I just flail away trying different netem configurations, fail to reproduce the real-world problem, and am sad.
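In the meantime, a crude approximation can be scripted by hand: ping machine B from machine A, read off the average RTT, jitter (mdev), and loss from ping's summary line, and translate those into a netem rule. The sketch below does just that translation step; suggest_netem and the eth0 device name are invented placeholders, and halving the RTT to estimate one-way delay is a rough assumption:

```shell
#!/bin/sh
# Rough sketch of the translation step such a tool would need: turn
# measured path statistics into a netem rule. The numbers would come
# from something like "ping -c 20 -q B", whose summary line reports
# "rtt min/avg/max/mdev" and the packet loss percentage.

suggest_netem() {
    # $1 = average RTT in ms, $2 = jitter in ms, $3 = loss percent.
    # Approximate one-way delay as half the round-trip time.
    half=$(( $1 / 2 ))
    echo "tc qdisc add dev eth0 root netem delay ${half}ms ${2}ms loss ${3}%"
}

# Example: a path measured at 180ms RTT, 12ms jitter, 1% loss.
suggest_netem 180 12 1
```

Of course, this captures none of the asymmetry, queueing behavior, or middlebox intelligence of the real path, which is why I still want the real tool.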

Is there a way to make me happy? Let me know!
