IP Performance Measurement                                     C. Paasch
Internet-Draft                                                  R. Meyer
Intended status: Experimental                                S. Cheshire
Expires: January 12, 2023                                     O. Shapira
                                                              Apple Inc.
                                                               M. Mathis
                                                             Google, Inc
                                                           July 11, 2022

              Responsiveness under Working Conditions
                 draft-ietf-ippm-responsiveness-01

Abstract

For many years, a lack of responsiveness, variously called lag, latency, or bufferbloat, has been recognized as an unfortunate, but common, symptom in today's networks. Even after a decade of work on standardizing technical solutions, it remains a common problem for end users.

Everyone "knows" that it is "normal" for a video conference to have problems when somebody else at home is watching a 4K movie or uploading photos from their phone. However, there is no technical reason for this to be the case. In fact, various queue management solutions (fq_codel, cake, PIE) have solved the problem.

Our networks remain unresponsive, not from a lack of technical solutions, but rather from a lack of awareness of the problem and its solutions. We believe that creating a tool whose measurement matches people's everyday experience will create the necessary awareness, and result in a demand for products that solve the problem.

This document specifies the "RPM Test" for measuring responsiveness. It uses common protocols and mechanisms to measure user experience specifically when the network is under working conditions. The measurement is expressed as "Round-trips Per Minute" (RPM) and should be included with throughput (up and down) and idle latency as critical indicators of network quality.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts.

Paasch, et al.            Expires January 12, 2023              [Page 1]

Internet-Draft     Responsiveness under Working Conditions     July 2022
The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on January 12, 2023.

Copyright Notice

Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . .   3
     1.1.  Terminology . . . . . . . . . . . . . . . . . . . . . . .  3
   2.  Design Constraints . . . . . . . . . . . . . . . . . . . . .   4
   3.  Goals  . . . . . . . . . . . . . . . . . . . . . . . . . . .   6
   4.  Measuring Responsiveness Under Working Conditions  . . . . .   6
     4.1.  Working Conditions . . . . . . . . . . . . . . . . . . .   6
       4.1.1.  From single-flow to multi-flow . . . . . . . . . . .   7
       4.1.2.  Parallel vs Sequential Uplink and Downlink . . . . .   7
       4.1.3.  Reaching full link utilization . . . . . . . . . . .   8
       4.1.4.  Final "Working Conditions" Algorithm . . . . . . . .   8
     4.2.  Measuring Responsiveness . . . . . . . . . . . . . . . .  10
       4.2.1.  Aggregating the Measurements . . . . . . . . . . . .  11
   5.  Interpreting responsiveness results  . . . . . . . . . . . .  11
     5.1.  Elements influencing responsiveness  . . . . . . . . . .  11
       5.1.1.
Client side influence . . . . . . . . . .  12
       5.1.2.  Network influence . . . . . . . . . . . . . . . . .  12
       5.1.3.  Server side influence . . . . . . . . . . . . . . .  13
     5.2.  Root-causing Responsiveness  . . . . . . . . . . . . . .  13
   6.  RPM Test Server API  . . . . . . . . . . . . . . . . . . . .  13
   7.  RPM Test Server Discovery  . . . . . . . . . . . . . . . . .  15
   8.  Security Considerations  . . . . . . . . . . . . . . . . . .  16
   9.  IANA Considerations  . . . . . . . . . . . . . . . . . . . .  16
   10. Acknowledgments  . . . . . . . . . . . . . . . . . . . . . .  16
   11. Informative References . . . . . . . . . . . . . . . . . . .  16
   Appendix A.  Example Server Configuration  . . . . . . . . . . .  17
     A.1.  Apache Traffic Server  . . . . . . . . . . . . . . . . .  17
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . .  18

1.  Introduction

For many years, a lack of responsiveness, variously called lag, latency, or bufferbloat, has been recognized as an unfortunate, but common, symptom in today's networks [Bufferbloat]. Solutions like fq_codel [RFC8290] or PIE [RFC8033] have been standardized and are to some extent widely implemented. Nevertheless, people still suffer from bufferbloat.

Although significant, the impact on user experience can be transitory - that is, its effect is not always visible to the user. Whenever a network is actively being used at its full capacity, buffers can fill up and create latency for traffic. The duration of those full buffers may be brief: a medium-sized file transfer, like an email attachment or uploading photos, can create bursts of latency spikes. An example of this is lag occurring during a videoconference, where a connection is briefly shown as unstable. These short-lived disruptions make it hard to narrow down the cause.
We believe that it is necessary to create a standardized way to measure and express responsiveness. Existing network measurement tools could incorporate a responsiveness measurement into their set of metrics. Doing so would also raise awareness of the problem and would help establish a new expectation: that the standard measures of network quality should - in addition to throughput and idle latency - also include latency under load, or, as we prefer to call it, responsiveness under working conditions.

1.1.  Terminology

A word about the term "bufferbloat" - the undesirable latency that comes from a router or other network equipment buffering too much data. This document uses the term as a general description of bad latency, using more precise wording where warranted.

"Latency" is a poor measure of responsiveness, since it can be hard for the general public to understand. The units are unfamiliar ("what is a millisecond?") and counterintuitive ("100 msec - that sounds good - it's only a tenth of a second!").

Instead, we create the term "Responsiveness under working conditions" to make it clear that we are measuring all, not just idle, conditions, and use "round-trips per minute" as the metric. The advantage of round-trips per minute is two-fold: First, it is a metric where higher is better, which tends to be more intuitive for end users. Second, the values typically fall in the four-digit range, which is easy to read and compare. Finally, we abbreviate the measurement to "RPM", a wink to the "revolutions per minute" that we use for car engines.

This document defines an algorithm for the "RPM Test" that explicitly measures responsiveness under working conditions.

2.  Design Constraints

There are many challenges around measurements on the Internet.
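As a parenthetical illustration of the RPM metric defined in Section 1.1 (a sketch of the unit conversion only, not part of the test specification):

```python
def rpm(round_trip_seconds: float) -> float:
    """Convert a measured round-trip time into Round-trips Per Minute."""
    return 60.0 / round_trip_seconds

# 6 ms per round trip is 10000 RPM; 100 ms is only 600 RPM,
# illustrating the "higher is better" property of the metric.
print(round(rpm(0.006)))  # 10000
print(round(rpm(0.100)))  # 600
```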
They include the dynamic nature of the Internet, the diverse nature of the traffic, the large number of devices that affect traffic, and the difficulty of attaining appropriate measurement conditions.

Internet paths are changing all the time. Daily fluctuations in demand make the bottlenecks ebb and flow. To minimize the variability introduced by routing changes, it is best to keep the test duration relatively short.

TCP and UDP traffic, or traffic on ports 80 and 443, may take significantly different paths on the Internet and be subject to entirely different Quality of Service (QoS) treatment. A good test will use standard transport-layer traffic - typical for people's use of the network - that is subject to the transport's congestion control, which might reduce the traffic's rate and thus its buffering in the network.

Traditionally, one thinks of bufferbloat happening on the routers and switches of the Internet. However, the networking stacks of clients and servers can also have huge buffers. Data sitting in TCP sockets or waiting for the application to send or read causes artificial latency, and affects user experience the same way as "traditional" bufferbloat.

Finally, it is crucial to recognize that significant queueing only happens on entry to the lowest-capacity (or "bottleneck") hop on a network path. For any flow of data between two communicating devices, there is always one hop along the path where the capacity available to that flow at that hop is the lowest among all the hops of that flow's path at that moment in time. It is important to understand that the existence of a lowest-capacity hop on a network path is not itself a problem. In a heterogeneous network like the Internet, some hop along the path must necessarily have the lowest capacity for that path.
If that hop were to be improved to make it no longer the lowest-capacity hop, then some other hop would become the new lowest-capacity hop for that path. In this context a "bottleneck" should not be seen as a problem to be fixed, because any attempt to "fix" the bottleneck is futile - such a "fix" can never remove the existence of a bottleneck on a path; it just moves the bottleneck somewhere else.

Arguably, this heterogeneity of the Internet is one of its greatest strengths. Allowing individual technologies to evolve and improve at their own pace, without requiring the entire Internet to change in lock-step, has enabled enormous improvements over the years in technologies like DSL, cable modems, Ethernet, and Wi-Fi, each advancing independently as new developments became ready. As a result of this flexibility we have moved incrementally, one step at a time, from 56 kb/s dial-up modems in the 1990s to Gb/s home Internet service and Gb/s wireless connectivity today.

Note that in a shared datagram network, conditions do not remain static. The hop that is the current bottleneck may change from moment to moment. For example, changes in other traffic may result in changes to a flow's share of a given hop. A user moving around may cause the Wi-Fi transmission rate to vary widely, from a few Mb/s when far from the Access Point, all the way up to Gb/s or more when close to the Access Point.

Consequently, if we wish to enjoy the benefits of the Internet's great flexibility, we need software that embraces and celebrates this diversity and adapts intelligently to the varying conditions it encounters.

Because significant queueing only happens on entry to the bottleneck hop, the queue management at this critical hop of the path almost entirely determines the responsiveness of the entire flow.
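The relationship between queue occupancy and delay is simple arithmetic: data already queued at a hop must drain at that hop's capacity before a newly arriving packet is transmitted. A rough sketch with illustrative numbers (the function name and values are ours, not taken from this draft):

```python
def queueing_delay_seconds(queued_bytes: int, capacity_bits_per_s: float) -> float:
    """Time a newly arriving packet waits behind `queued_bytes` of
    standing queue at a hop draining at `capacity_bits_per_s`."""
    return queued_bytes * 8 / capacity_bits_per_s

# 5 MB of buffered data draining through a 20 Mb/s uplink adds
# two full seconds of latency to every packet behind it.
print(queueing_delay_seconds(5_000_000, 20e6))  # 2.0
```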
If the bottleneck hop's queue management algorithm allows an excessively large queue to form, this results in excessively large delays for packets sitting in that queue awaiting transmission, significantly degrading overall user experience.

In order to discover the depth of the buffer at the bottleneck hop, the RPM Test mimics normal network operations and data transfers, to cause this bottleneck buffer to fill to capacity, and then measures the resulting end-to-end latency under these operating conditions. A well managed bottleneck queue keeps its queue occupancy under control, resulting in consistently low round-trip time and consistently good responsiveness. A poorly managed bottleneck queue will not.

3.  Goals

The algorithm described here defines an RPM Test that serves as a good proxy for user experience. This means:

1.  Today's Internet traffic primarily uses HTTP/2 over TLS. Thus, the algorithm should use that protocol. As a side note: other types of traffic are gaining in popularity (HTTP/3) and/or are already being used widely (RTP). Traffic prioritization and QoS rules on the Internet may subject traffic to completely different paths: these could also be measured separately.

2.  The Internet is marked by the deployment of countless middleboxes like transparent TCP proxies or traffic prioritization for certain types of traffic. The RPM Test must take into account their effect on TCP-handshake [RFC0793], TLS-handshake, and request/response.

3.  The test result should be expressed in an intuitive, nontechnical form.

4.  Finally, to be useful to a wide audience, the measurement should finish within a short time frame. Our target is 20 seconds.

4.  Measuring Responsiveness Under Working Conditions

To make an accurate measurement, the algorithm must reliably put the network in a state that represents those "working conditions".
During this process, the algorithm measures the responsiveness of the network. The following explains how the former and the latter are achieved.

4.1.  Working Conditions

There are many different ways to define the state of "working conditions" under which to measure responsiveness. There is no one true answer to this question. It is a tradeoff between using realistic traffic patterns and pushing the network to its limits.

In this document we aim to generate a realistic traffic pattern by using standard HTTP transactions, while exploring the worst-case scenario by creating multiple such transactions and using very large data objects in them. This allows us to create a stable state of working conditions during which the network is used at nearly its full capacity, without generating DoS-like traffic patterns (e.g., intentional UDP flooding). This creates a realistic traffic mix representative of what a typical user's network experiences in normal operation.

Finally, as end-user usage of the network evolves to newer protocols and congestion control algorithms, it is important that the working conditions can also evolve to continuously represent a realistic traffic pattern.

4.1.1.  From single-flow to multi-flow

A single TCP connection may not be sufficient to reach the capacity of a path quickly. With a 4 MB receive window over a network with a 32 ms round-trip time, a single TCP connection can achieve up to 1 Gb/s throughput. For higher throughput and/or networks with higher round-trip time, TCP allows larger receive window sizes, up to 1 GB. For most applications there is little reason to open multiple parallel TCP connections in an attempt to achieve higher throughput.
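The 1 Gb/s figure above follows from the window-limited throughput bound: a connection can have at most one receive window of data in flight per round trip. A quick check of the arithmetic, using decimal megabytes (an illustrative sketch, not a normative formula):

```python
def window_limited_throughput_bps(rwnd_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on one TCP connection's throughput: at most one
    receive window of data delivered per round-trip time."""
    return rwnd_bytes * 8 / rtt_seconds

# 4 MB receive window over a 32 ms round-trip path:
gbps = window_limited_throughput_bps(4_000_000, 0.032) / 1e9
print(gbps)  # ~1.0 Gb/s, matching the figure in the text
```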
However, it may take some time for a single TCP connection to ramp up to full speed, and one of the goals of the RPM Test is to quickly load the network to capacity, take its measurements, and then finish. Additionally, traditional loss-based TCP congestion control algorithms react aggressively to packet loss by reducing the congestion window. This reaction (intended by the protocol design) decreases the queueing within the network, making it harder to determine the depth of the bottleneck queue reliably.

The purpose of the RPM Test is not to productively move data across the network in a useful way, the way a normal application does. The purpose of the RPM Test is, as quickly as possible, to simulate a representative traffic load as if real applications were doing sustained data transfers, measure the resulting round-trip time occurring under those realistic conditions, and then end the test. Because of this, using multiple simultaneous parallel connections allows the RPM Test to complete its task more quickly, in a way that overall is less disruptive and less wasteful of network capacity than a test using a single TCP connection that would take longer to bring the bottleneck hop to a stable saturated state.

4.1.2.  Parallel vs Sequential Uplink and Downlink

Poor responsiveness can be caused by queues in either (or both) the upstream and the downstream direction. Furthermore, the two directions may differ significantly due to access link conditions (e.g., 5G downstream and LTE upstream) or routing changes within the ISPs. To measure responsiveness under working conditions, the algorithm must explore both directions.

One approach could be to measure responsiveness in the uplink and downlink in parallel. It would allow for a shorter test run-time.
However, a number of caveats come with measuring in parallel:

o  Half-duplex links may not permit simultaneous uplink and downlink traffic. This means the test might not reach the path's capacity in both directions at once and thus not expose all the potential sources of low responsiveness.

o  Debuggability of the results becomes harder: During parallel measurement it is impossible to differentiate whether the observed latency happens in the uplink or the downlink direction.

Thus, we recommend testing uplink and downlink sequentially. Parallel testing is considered a future extension.

4.1.3.  Reaching full link utilization

The RPM Test gradually increases the number of TCP connections and measures "goodput" - the sum of actual data transferred across all connections in a unit of time. When the goodput stops increasing, the network is being used at its full capacity. At this point we are creating the worst-case scenario within the limits of the realistic traffic pattern.

Throughput increases rapidly until the TCP connections complete their slow-start phase. At that point, throughput eventually stalls, often due to receive window limitations, particularly in cases of high network bandwidth, high network round-trip time, low receive window size, or a combination of all three. The only means to further increase throughput is by adding more TCP connections to the pool of load-generating connections. If new connections leave the throughput unchanged, full link utilization has been reached and - more importantly - the working conditions are stable.

4.1.4.  Final "Working Conditions" Algorithm

The following algorithm reaches the working conditions of a network by using HTTP/2 upload (POST) or download (GET) requests of infinitely large files. The algorithm is the same for upload and download and uses the same term "load-generating connection" for each. The actions of the algorithm take place at regular intervals.
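The ramp-up and plateau detection described in Section 4.1.3 can be sketched as follows. This is a simplified illustration, not the draft's normative algorithm: the names and the toy link model are ours, and the draft's actual stability check smooths goodput with a moving average over several intervals rather than comparing single intervals:

```python
def ramp_to_saturation(measure_goodput, tolerance=0.05, max_conns=16):
    """Add load-generating connections one per interval until aggregate
    goodput stops growing by more than `tolerance` (a fraction).
    Returns (connections, goodput) at the detected plateau."""
    conns = 1
    goodput = measure_goodput(conns)
    while conns < max_conns:
        conns += 1
        new_goodput = measure_goodput(conns)
        if new_goodput <= goodput * (1 + tolerance):
            return conns, new_goodput  # plateau: full link utilization
        goodput = new_goodput
    return conns, goodput

# Toy link model: each connection contributes up to 100 Mb/s,
# and the bottleneck link caps aggregate goodput at 350 Mb/s.
conns, goodput = ramp_to_saturation(lambda n: min(n * 100.0, 350.0))
print(conns, goodput)  # the 5th connection adds no goodput: 5 350.0
```

The tolerance exists because real goodput measurements fluctuate; a plateau is declared when growth falls below it, not only when goodput is exactly unchanged.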
For the current draft, the interval is defined as one second. Where:

o  i: The index of the current interval. The variable i is initialized to 0 when the algorithm begins and increases by one for each interval.

o  instantaneous aggregate goodput at interval p: The number of total bytes of data transferred within interval p, divided by the interval duration. If p is negative (i.e., a time interval logically prior to the start of the test, used in moving average calculations), the number of total bytes of data transferred within that interval is considered to be 0.

o  moving average aggregate goodput at interval p: The number of total bytes of data transferred within interval p and the three immediately preceding intervals, divided by four times the interval duration.

o  moving average stability during the period between intervals b and e: Whether or not, for all b<=x.

11.  Informative References

[RFC6335]  Cotton, M., Eggert, L., Touch, J., Westerlund, M., and S. Cheshire, "Internet Assigned Numbers Authority (IANA) Procedures for the Management of the Service Name and Transport Protocol Port Number Registry", BCP 165, RFC 6335, DOI 10.17487/RFC6335, August 2011.

[RFC6762]  Cheshire, S. and M. Krochmal, "Multicast DNS", RFC 6762, DOI 10.17487/RFC6762, February 2013.

[RFC6763]  Cheshire, S. and M. Krochmal, "DNS-Based Service Discovery", RFC 6763, DOI 10.17487/RFC6763, February 2013.

[RFC8033]  Pan, R., Natarajan, P., Baker, F., and G. White, "Proportional Integral Controller Enhanced (PIE): A Lightweight Control Scheme to Address the Bufferbloat Problem", RFC 8033, DOI 10.17487/RFC8033, February 2017.

[RFC8259]  Bray, T., Ed., "The JavaScript Object Notation (JSON) Data Interchange Format", STD 90, RFC 8259, DOI 10.17487/RFC8259, December 2017.
[RFC8290]  Hoeiland-Joergensen, T., McKenney, P., Taht, D., Gettys, J., and E. Dumazet, "The Flow Queue CoDel Packet Scheduler and Active Queue Management Algorithm", RFC 8290, DOI 10.17487/RFC8290, January 2018.

[RFC8446]  Rescorla, E., "The Transport Layer Security (TLS) Protocol Version 1.3", RFC 8446, DOI 10.17487/RFC8446, August 2018.

[RFC8766]  Cheshire, S., "Discovery Proxy for Multicast DNS-Based Service Discovery", RFC 8766, DOI 10.17487/RFC8766, June 2020.

Appendix A.  Example Server Configuration

This section shows fragments of sample server configurations to host a responsiveness measurement endpoint.

A.1.  Apache Traffic Server

Apache Traffic Server, starting at version 9.1.0, supports configuration as a responsiveness server. It requires the generator and statichit plugins. The sample remap configuration file then is:

   map https://nq.example.com/api/v1/config \
       http://localhost/ \
       @plugin=statichit.so \
       @pparam=--file-path=config.example.com.json \
       @pparam=--mime-type=application/json

   map https://nq.example.com/api/v1/large \
       http://localhost/cache/8589934592/ \
       @plugin=generator.so

   map https://nq.example.com/api/v1/small \
       http://localhost/cache/1/ \
       @plugin=generator.so

   map https://nq.example.com/api/v1/upload \
       http://localhost/ \
       @plugin=generator.so

Authors' Addresses

   Christoph Paasch
   Apple Inc.
   One Apple Park Way
   Cupertino, California 95014
   United States of America

   Email: cpaasch@apple.com

   Randall Meyer
   Apple Inc.
   One Apple Park Way
   Cupertino, California 95014
   United States of America

   Email: rrm@apple.com

   Stuart Cheshire
   Apple Inc.
   One Apple Park Way
   Cupertino, California 95014
   United States of America

   Email: cheshire@apple.com

   Omer Shapira
   Apple Inc.
   One Apple Park Way
   Cupertino, California 95014
   United States of America

   Email: oesh@apple.com

   Matt Mathis
   Google, Inc
   1600 Amphitheatre Parkway
   Mountain View, CA 94043
   United States of America

   Email: mattmathis@google.com