<?xml version="1.0" encoding="utf-8"?>

<!DOCTYPE rfc [
  <!ENTITY nbsp    "&#160;">
  <!ENTITY zwsp   "&#8203;">
  <!ENTITY nbhy   "&#8209;">
  <!ENTITY wj     "&#8288;">
]>

<rfc xmlns:xi="http://www.w3.org/2001/XInclude" category="info" docName="draft-ietf-bmwg-benchmarking-stateful-09" number="9693" consensus="true" ipr="trust200902" obsoletes="" updates="" submissionType="IETF" xml:lang="en" tocInclude="true" tocDepth="4" symRefs="true" sortRefs="true" version="3">

  <front>
    <title abbrev="Benchmarking Stateful NATxy Gateways">Benchmarking Methodology for Stateful NATxy Gateways</title>
    <seriesInfo name="RFC" value="9693"/>
    <author fullname="Gábor Lencse" initials="G." surname="Lencse">
      <organization>Széchenyi István University</organization>
      <address>
        <postal>
          <street>Egyetem tér 1.</street>
          <city>Győr</city>
          <code>H-9026</code>
          <country>Hungary</country>
        </postal>
        <email>lencse@sze.hu</email>
      </address>
    </author>
    <author fullname="Keiichi Shima" initials="K." surname="Shima">
      <organization>SoftBank Corp.</organization>
      <address>
        <postal>
          <street>1-7-1 Kaigan</street>
          <region>Minato-ku, Tokyo</region>
          <code>105-7529</code>
          <country>Japan</country>
        </postal>
        <email>shima@wide.ad.jp</email>
        <uri>https://softbank.co.jp/</uri>
      </address>
    </author>
    <date year="2025" month="January"/>

    <area>OPS</area>
    <workgroup>bmwg</workgroup>

    <keyword>Benchmarking</keyword>
    <keyword>Stateful NATxy</keyword>
    <keyword>Measurement Procedure</keyword>
    <keyword>Throughput</keyword>
    <keyword>Frame Loss Rate</keyword>
    <keyword>Latency</keyword>
    <keyword>PDV</keyword>

    <abstract>
      <t>RFC 2544 defines a benchmarking methodology for network
      interconnect devices. RFC 5180 addresses IPv6 specificities, and it also
      provides a technology update but excludes IPv6 transition technologies.
      RFC 8219 addresses IPv6 transition technologies, including stateful
      NAT64. However, none of them discuss how to apply pseudorandom port
      numbers from RFC 4814 to any stateful NATxy (such as NAT44, NAT64, and NAT66)
      technologies.  This document discusses why using pseudorandom port
      numbers with stateful NATxy gateways is a difficult problem. It
      recommends a solution that limits the port number ranges and uses two
      test phases (phase 1 and phase 2). This document shows how the classic
      performance measurement procedures (e.g., throughput, frame loss rate,
      and latency) can be carried out.  New performance metrics and
      measurement procedures are also defined for measuring the maximum
      connection establishment rate, connection tear-down rate, and
      connection tracking table capacity.
      </t>
    </abstract>
  </front>

  <middle>
    <section anchor="intro" numbered="true" toc="default">
      <name>Introduction</name>

      <t><xref target="RFC2544" format="default"/> defines a comprehensive
      benchmarking methodology for network interconnect devices that is still
      in use. It is mainly independent of IP version, but it uses IPv4 in its
      examples.  <xref target="RFC5180" format="default"/> addresses IPv6
      specificities and also adds technology updates but declares IPv6
      transition technologies are out of its scope. <xref target="RFC8219"
      format="default"/> addresses the IPv6 transition technologies, including
      stateful NAT64. It reuses several benchmarking procedures from <xref
      target="RFC2544" format="default"/> (e.g., throughput, frame loss rate),
      and it redefines the latency measurement and adds further ones (e.g., the
      Packet Delay Variation (PDV) measurement).</t>

      <t>However, none of them discuss how to apply pseudorandom port
      numbers from <xref target="RFC4814" format="default"/> when benchmarking
      stateful NATxy gateways (such as NAT44 <xref
      target="RFC3022" format="default"/>, NAT64 <xref target="RFC6146"
      format="default"/>, and NAT66). (It should be noted that stateful NAT66
      is not an IETF specification but refers to an IPv6 version of the
      stateful NAT44 specification.) The authors are not aware of any other
      RFCs that address this question.
      </t>

      <t>First, this document discusses why using pseudorandom port numbers with
      stateful NATxy gateways is a difficult problem. Then, a solution is
      recommended.</t>

      <section numbered="true" toc="default">
        <name>Requirements Language</name>
        <t>The key words "<bcp14>MUST</bcp14>", "<bcp14>MUST NOT</bcp14>",
        "<bcp14>REQUIRED</bcp14>", "<bcp14>SHALL</bcp14>", "<bcp14>SHALL
        NOT</bcp14>", "<bcp14>SHOULD</bcp14>", "<bcp14>SHOULD NOT</bcp14>",
        "<bcp14>RECOMMENDED</bcp14>", "<bcp14>NOT RECOMMENDED</bcp14>",
        "<bcp14>MAY</bcp14>", and "<bcp14>OPTIONAL</bcp14>" in this document
        are to be interpreted as described in BCP 14 <xref target="RFC2119"
        format="default"/> <xref target="RFC8174" format="default"/> when, and
        only when, they appear in all capitals, as shown here.</t>
      </section>
	  
    </section>
    <section anchor="problem" numbered="true" toc="default">
      <name>Pseudorandom Port Numbers and Stateful Translation</name>

      <t>In its appendix, <xref target="RFC2544" format="default"/>
      defines a frame format for test frames, including specific source and
      destination port numbers.  <xref target="RFC4814" format="default"/>
      recommends using pseudorandom and uniformly distributed values for both
      source and destination port numbers. However, stateful NATxy (such as NAT44,
      NAT64, and NAT66) solutions use the port numbers to identify
      connections. The usage of pseudorandom port numbers causes different
      problems depending on the direction:
      </t>
      <ul spacing="normal">
        <li>
          <t>For the client-to-server direction, pseudorandom source and
	  destination port numbers could be used; however, this approach would
	  amount to a denial-of-service attack against the stateful NATxy gateway,
	  because it would exhaust its connection tracking table capacity. To
	  illustrate this, consider some calculations using the recommendations of <xref target="RFC4814" format="default"/>:
          </t>
          <ul spacing="normal">
            <li>
              <t>The recommended source port range is 1024-65535; thus, its size is 64512.</t>
            </li>
            <li>
              <t>The recommended destination port range is 1-49151; thus, its size is 49151.</t>
            </li>
            <li>
              <t>The number of source and destination port number combinations is 3,170,829,312.</t>
            </li>
          </ul>
          <t>
      It should be noted that the usage of different source and destination IP addresses 
	  further increases the number of connection tracking table entries.</t>
        </li>
        <li>
          <t>For the server-to-client direction, the stateful Device Under Test (DUT) would drop any 
	  packets that do not belong to an existing connection; therefore, the 
	  direct usage of pseudorandom port numbers from the ranges mentioned above
	  is not feasible.</t>
        </li>
      </ul>
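      <t>The arithmetic above can be reproduced with a short Python sketch.
      It is only an illustrative check: the variable names are chosen here
      for the example, and the port ranges are the ones quoted above from
      <xref target="RFC4814" format="default"/>.</t>
      <sourcecode type="python"><![CDATA[
```python
# Port ranges as quoted above from RFC 4814 (names are illustrative).
SRC_PORTS = range(1024, 65535 + 1)   # source ports 1024-65535
DST_PORTS = range(1, 49151 + 1)      # destination ports 1-49151

src_size = len(SRC_PORTS)            # size of the source port range
dst_size = len(DST_PORTS)            # size of the destination range

# Every (source port, destination port) pair is a potential new
# connection tracking table entry for a single IP address pair.
combinations = src_size * dst_size

print(src_size, dst_size, combinations)
```
]]></sourcecode>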
    </section>

    <section anchor="setup_term" numbered="true" toc="default">
      <name>Test Setup and Terminology</name>

      <t><xref target="RFC2544" sectionFormat="of" section="12"/> requires
      testing using a single protocol source and destination address pair
      first and then also using multiple protocol addresses. The same
      approach is followed: first, a single source and destination IP address
      pair is used, and then it is explained how to use multiple IP
      addresses.</t>

      <section anchor="setup_term_single" numbered="true" toc="default">
        <name>When Testing with a Single IP Address Pair</name>

        <t>The methodology works with any IP version to benchmark stateful
        NATxy gateways, where x and y are in {4, 6}. To facilitate an easy
        understanding, two typical examples are used: stateful NAT44 and
        stateful NAT64.</t>

        <t>The test setup for the well-known stateful NAT44 (also called
        Network Address and Port Translation (NAPT)) solution is shown in
        <xref target="test_setup_sfnat44" format="default"/>.</t>

        <t>Note that the private IP addresses from <xref target="RFC1918"
        format="default"/> are used to facilitate an easy understanding of the
        example, and the usage of the IP addresses reserved for benchmarking
        is absolutely legitimate.</t>

        <t keepWithNext="true"/>
        <figure anchor="test_setup_sfnat44">
          <name>Test Setup for Benchmarking Stateful NAT44 Gateways</name>
          <artwork align="left" name="" type="" alt=""><![CDATA[
              +--------------------------------------+
     10.0.0.2 |Initiator                    Responder| 198.19.0.2
+-------------|                Tester                |<------------+
| private IPv4|                         [state table]| public IPv4 |
|             +--------------------------------------+             |
|                                                                  |
|             +--------------------------------------+             |
|    10.0.0.1 |                 DUT:                 | 198.19.0.1  |
+------------>|        Stateful NAT44 gateway        |-------------+
  private IPv4|     [connection tracking table]      | public IPv4
              +--------------------------------------+
]]></artwork>
        </figure>
        <t keepWithPrevious="true"/>
        <t>The test setup for the stateful NAT64 solution <xref target="RFC6146"
        format="default"/>, which is also widely used, is shown in
        <xref target="test_setup_sfnat64" format="default"/>.</t>

        <t keepWithNext="true"/>
        <figure anchor="test_setup_sfnat64">
          <name>Test Setup for Benchmarking Stateful NAT64 Gateways</name>
          <artwork align="left" name="" type="" alt=""><![CDATA[
              +--------------------------------------+
    2001:2::2 |Initiator                    Responder| 198.19.0.2
+-------------|                Tester                |<------------+
| IPv6 address|                         [state table]| IPv4 address|
|             +--------------------------------------+             |
|                                                                  |
|             +--------------------------------------+             |
|   2001:2::1 |                 DUT:                 | 198.19.0.1  |
+------------>|        Stateful NAT64 gateway        |-------------+
  IPv6 address|     [connection tracking table]      | IPv4 address
              +--------------------------------------+
]]></artwork>
        </figure>
        <t keepWithPrevious="true"/>
        <t>As for the transport layer protocol, <xref target="RFC2544"
        format="default"/> recommended testing with UDP, and this was kept
        in <xref target="RFC8219" format="default"/>. This document also
        keeps UDP as a general recommendation; thus, the port numbers in the
        following text are to be understood as UDP port numbers. The rationale and
        limitations of this approach are discussed in <xref
        target="udp_or_tcp" format="default"/>.</t>

        <t>The most important elements of the proposed benchmarking system are
        defined as follows:</t>

        <dl newline="false" spacing="normal">
          <dt>Connection:</dt>
	  <dd>Although UDP itself is a connectionless protocol, stateful
	  NATxy gateways keep track of their UDP translation mappings in the
	  form of a "connection", using the same kind of entries as for
	  TCP.</dd>

	  <dt>Connection tracking table:</dt>
	  <dd>The stateful NATxy gateway uses a connection tracking table to
	  be able to perform the stateful translation in the server-to-client
	  direction. Its size, policy, and content are unknown to the
	  Tester.</dd>

	  <dt>Four tuple:</dt>
	  <dd>The four numbers that identify a connection are source IP
	  address, source port number, destination IP address, and destination
	  port number.</dd>

	  <dt>State table:</dt>
	  <dd>The Responder of the Tester extracts the four tuple from each
	  received test frame and stores it in its state table. A recommendation
	  is given for the writing and reading order of the state table in <xref
	  target="st_wr_order" format="default"/>.</dd>

	  <dt>Initiator:</dt>
	  <dd>The port of the Tester that may initiate a connection through
	  the stateful DUT in the client-to-server direction. Theoretically,
	  it can use any source and destination port numbers from the ranges
	  recommended by <xref target="RFC4814" format="default"/>: if the
	  used four tuple does not belong to an existing connection, the DUT
	  will register a new connection into its connection tracking
	  table.</dd>

	  <dt>Responder:</dt>
	  <dd>The port of the Tester that may not initiate a connection
	  through the stateful DUT in the server-to-client direction. It may
	  only send frames that belong to an existing connection. To that end,
	  it uses four tuples that have been previously extracted from the
	  received test frames and stored in its state table.</dd>

	  <dt>Test phase 1:</dt>
	  <dd>The test frames are sent only by the Initiator to the Responder
	  through the DUT to fill both the connection tracking table of the
	  DUT and the state table of the Responder. This is a newly introduced
	  operation phase for stateful NATxy benchmarking. The necessity of
	  this test phase is explained in <xref target="prelim"
	  format="default"/>.</dd>

	  <dt>Test phase 2:</dt>
	  <dd>The measurement procedures defined by <xref target="RFC8219"
	  format="default"/> (e.g., throughput, latency, etc.) are performed in
	  this test phase after the completion of test phase 1. Test frames
	  are sent as required (e.g., a bidirectional test or a unidirectional test
	  in any of the two directions).</dd>
        </dl>

        <t>One further definition is used in the text of this document:</t>
	<dl newline="false" spacing="normal">
          <dt>Black box testing:</dt>
	  <dd>A testing approach in which the Tester is not aware of the
	  details of the internal structure and operation of the DUT. The
	  Tester can only send input to the DUT and observe its output.</dd>
        </dl>
      </section>

      <section anchor="setup_term_multiple" numbered="true" toc="default">
        <name>When Testing with Multiple IP Addresses</name>

        <t>This section considers the number of the necessary and available IP
        addresses.</t>

        <t>In <xref target="test_setup_sfnat44" format="default"/>, the single
        198.19.0.1 IPv4 address is used on the WAN side port of the stateful
        NAT44 gateway. However, in practice, it is not a single IP address,
        but rather an IP address range that is assigned to the WAN side port
        of the stateful NAT44 gateways. Its required size depends on the
        number of client nodes and on the type of the stateful NAT44
        algorithm. (The traditional algorithm always replaces the source port
        number when a new connection is established. Thus, it requires a
        larger range than the extended algorithm, which replaces the source
        port number only when it is necessary. Please refer to Tables 1 and
        2 of <xref target="LEN2015" format="default"/>.)</t>

        <t>When router testing is done, <xref target="RFC2544"
        sectionFormat="of" section="12"/> requires testing using a
        single source and destination IP address pair first and then using
        destination IP addresses from 256 different networks. Bits 16-23
        of the 198.18.0.0/24 and 198.19.0.0/24 addresses can be used to
        express the 256 networks.  As this document does not deal with router
        testing, no multiple destination networks are needed; therefore, these
        bits are available for expressing multiple IP addresses that belong to
        the same "/16" network. Moreover, both the 198.18.0.0/16 and the
        198.19.0.0/16 networks can be used on the right side of the test setup,
        as private IP addresses from the 10.0.0.0/16 network are used on its
        left side.</t>

        <t keepWithNext="true"/>
        <figure anchor="test_setup_sfnat44_multi">
          <name>Test Setup for Benchmarking Stateful NAT44 Gateways Using Multiple IPv4 Addresses</name>
          <artwork align="left" name="" type="" alt=""><![CDATA[
10.0.0.2/16 - 10.0.255.254/16      198.19.0.0/15 - 198.19.255.254/15
           \  +--------------------------------------+  /
            \ |Initiator                    Responder| /
+-------------|                Tester                |<------------+
| private IPv4|                         [state table]| public IPv4 |
|             +--------------------------------------+             |
|                                                                  |
|             +--------------------------------------+             |
| 10.0.0.1/16 |                 DUT:                 | public IPv4 |
+------------>|        Stateful NAT44 gateway        |-------------+
  private IPv4|     [connection tracking table]      | \
              +--------------------------------------+  \
                                   198.18.0.1/15 - 198.18.255.255/15
]]></artwork>
        </figure>

        <t keepWithPrevious="true"/>
        <t>A possible solution for assigning multiple IPv4 addresses is shown
        in <xref target="test_setup_sfnat44_multi" format="default"/>. On the
        left side, the private IP address range is abundantly large. (Bits
        16-31 are used for generating nearly 64k potential different
        source addresses, but bits 8-15 are also available if needed.) On
        the right side, the 198.18.0.0/15 network is used, and it is cut
        into two equal parts. (Asymmetric division is also possible, if
        needed.)</t>
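        <t>The equal division of the right side network can be sketched with
        the ipaddress module of the Python standard library. The assignment of
        the lower half to the DUT and the upper half to the Responder follows
        the figure above; the variable names are illustrative only.</t>
        <sourcecode type="python"><![CDATA[
```python
import ipaddress

# The benchmarking network used on the right side of the test setup.
benchmark_net = ipaddress.ip_network("198.18.0.0/15")

# Cut it into two equal halves. The figure keeps the /15 prefix
# length on the interfaces; the /16 objects below only describe
# the two address halves.
dut_half, responder_half = benchmark_net.subnets(new_prefix=16)

print(dut_half, responder_half, dut_half.num_addresses)
```
]]></sourcecode>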
        <t>It should be noted that these are the potential address ranges. The
        actual address ranges to be used are discussed in <xref
        target="restr_port_range" format="default"/>.</t>
        <t>In the case of stateful NAT64, a single "/64" IPv6 prefix contains
        a high number of bits to express different IPv6 addresses. <xref
        target="test_setup_sfnat64_multi" format="default"/> shows an example
        where bits 96-111 are used for that purpose.
        </t>
        <t keepWithNext="true"/>
        <figure anchor="test_setup_sfnat64_multi">
          <name>Test Setup for Benchmarking Stateful NAT64 Gateways Using
          Multiple IPv6 and IPv4 Addresses</name>
          <artwork align="left" name="" type="" alt=""><![CDATA[
2001:2::[0000-ffff]:0002/64       198.19.0.0/15 - 198.19.255.254/15      
           \  +--------------------------------------+  /
  IPv6      \ |Initiator                    Responder| /
+-------------|                Tester                |<------------+
| addresses   |                         [state table]| public IPv4 |
|             +--------------------------------------+             |
|                                                                  |
|             +--------------------------------------+             |
| 2001:2::1/64|                 DUT:                 | public IPv4 |
+------------>|        Stateful NAT64 gateway        |-------------+
 IPv6 address |     [connection tracking table]      | \
              +--------------------------------------+  \
                                   198.18.0.1/15 - 198.18.255.255/15       
]]></artwork>
        </figure>
        <t keepWithPrevious="true"/>
      </section>
    </section>

    <section anchor="method" numbered="true" toc="default">
      <name>Recommended Benchmarking Method</name>
      <section anchor="restr_port_range" numbered="true" toc="default">
        <name>Restricted Number of Network Flows</name>
        <t>When a single IP address pair is used for testing, then the number
        of network flows is determined by the number of source and 
        destination port number combinations. </t>
        <t>The Initiator <bcp14>SHOULD</bcp14> use restricted ranges for
        source and destination port numbers to avoid the exhaustion of the
        connection tracking table capacity of the DUT as described in <xref
        target="problem" format="default"/>.  If possible, the size of
        the source port number range <bcp14>SHOULD</bcp14> be larger (e.g.,
        on the order of a few tens of thousands), whereas the size of the
        destination port number range <bcp14>SHOULD</bcp14> be smaller (e.g., it may
        vary from a few to several hundreds or thousands as needed).  The
        rationale is that the source and destination port numbers that can be
        observed in Internet traffic are not symmetrical. Whereas source
        port numbers may be random, there are a few very popular destination
        port numbers (e.g., 443 or 80; see <xref target="IIR2020"
        format="default"/>), and others hardly occur. Additionally, it was found that
        the roles of source and destination port numbers are also asymmetric
        in the Linux kernel routing hash function <xref target="LEN2020" format="default"/>.</t>
        <t>However, in some special cases, the size of the source port range
        is limited. For example, when benchmarking the Customer Edge (CE) and
        Border Relay (BR) of a Mapping of Address and Port using Translation
        (MAP-T) system <xref target="RFC7599" format="default"/> together (as
        a compound system performing stateful NAT44), the source port
        range is limited to the number of source port numbers assigned to each
        subscriber. (It could be as low as 2048 ports.)</t>
        
	<t>When multiple IP addresses are used, the port number ranges
        should be even more restricted, as the number of potential network
        flows is the product of the sizes of:</t>
	<ul>
	  <li>the source IP address range,</li>
	  <li>the source port number range,</li>
	  <li>the destination IP address range, and</li>
	  <li>the destination port number range.</li>
	</ul>
	<t>In addition, the recommended method requires the enumeration of all
	their possible combinations in test phase 1 as described in <xref
	target="ctrl_conntrack" format="default"/>.</t>
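	<t>The product mentioned above can be written down as a small Python
	sketch. The function name and the range sizes in the usage lines are
	hypothetical values chosen for illustration; they are not
	recommendations of this document.</t>
	<sourcecode type="python"><![CDATA[
```python
def flow_count(src_ips, src_ports, dst_ips, dst_ports):
    """Return the number of potential network flows.

    Each argument is the size of the corresponding range.
    """
    return src_ips * src_ports * dst_ips * dst_ports


# Hypothetical range sizes, for illustration only.
single_pair = flow_count(1, 40000, 1, 100)
multi_addr = flow_count(16, 40000, 4, 100)

print(single_pair, multi_addr)
```
]]></sourcecode>
	<t>Because all these combinations are enumerated in test phase 1, the
	result also gives the number of test frames needed for that phase.</t>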
        <t>The number of network flows can be used as a parameter. The
        performance of the stateful NATxy gateway <bcp14>MAY</bcp14> be
        examined as a function of this parameter as described in <xref
        target="sc_net_flows" format="default"/>.</t>
      </section>

      <section anchor="prelim" numbered="true" toc="default">
        <name>Test Phase 1</name>
        <t>Test phase 1 serves two purposes:</t>

        <ol spacing="normal" type="1">
	  <li>
	    <t>The connection tracking table of the DUT is filled. This is
	    important because its maximum connection establishment rate may
	    be lower than its maximum frame forwarding rate (that is,
	    its throughput).</t>
          </li>
          <li>
            <t>The state table of the Responder is filled with valid four
            tuples. It is a precondition for the Responder to be able to
            transmit frames that belong to connections that exist in the
            connection tracking table of the DUT.</t>
          </li>
        </ol>

        <t>Whereas both of the above are always necessary before test phase
        2, test phase 1 can also be used without test phase 2. This is done when
        the maximum connection establishment rate is measured (as described in
        <xref target="meas_max_conn_est_rate" format="default"/>).</t>

        <t>Test phase 1 <bcp14>MUST</bcp14> be performed before all tests are
        performed in test phase 2. The following things happen in test phase
        1:</t>

        <ol spacing="normal" type="1">
	  <li>
            <t>The Initiator sends test frames to the Responder through the
            DUT at a specific frame rate.</t>
          </li>
          <li>
            <t>The DUT performs the stateful translation of the test frames,
            and it also stores the new connections in its connection tracking
            table.</t>
          </li>
          <li>
            <t>The Responder receives the translated test frames and updates
            its state table with the received four tuples. The Responder
            transmits no test frames during test phase 1.</t>
          </li>
        </ol>
	<t>When test phase 1 is performed in preparation for test phase 2, the
        applied frame rate <bcp14>SHOULD</bcp14> be safely lower than the
        maximum connection establishment rate. (It implies that maximum
        connection establishment rate measurement <bcp14>MUST</bcp14> be
        performed first.)  Please refer to <xref target="ctrl_conntrack"
        format="default"/> for further conditions regarding timeout and the
        enumeration of all possible four tuples.</t>
      </section>

      <section anchor="consider_stateful" numbered="true" toc="default">
        <name>Consideration of the Cases of Stateful Operation</name>
        <t>The most important events that may happen during the operation
        of a stateful NATxy gateway, and the corresponding actions of the
        gateway, are considered to be as follows.</t>

        <ol>
	  <li>
	    <t>EVENT: A packet not belonging to an existing connection arrives
	    in the client-to-server direction.</t>
	    <t>ACTION: A new connection is registered into the connection
	    tracking table, and the packet is translated and forwarded.</t>
	  </li>
	  <li>
	     <t>EVENT: A packet not belonging to an existing connection
	     arrives in the server-to-client direction.</t>
	     <t>ACTION: The packet is discarded.</t>
	  </li>
          <li>
              <t>EVENT: A packet belonging to an existing connection arrives
              (in any direction).</t>
	      <t>ACTION: The packet is translated and forwarded, and the
	      timeout counter of the corresponding connection tracking table
	      entry is reset.</t>
	  </li>
          <li>
              <t>EVENT: A connection tracking table entry times out.</t>
	      <t>ACTION: The entry is deleted from the connection tracking
	      table.</t>
          </li>
	</ol>
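	<t>The four event and action pairs above can be illustrated with a toy
	model in Python. It does not describe any real gateway: it performs no
	address or port translation and, as a simplifying assumption, uses a
	single four tuple as the key of a connection in both directions. The
	class name ToyConntrack and its methods are hypothetical.</t>
	<sourcecode type="python"><![CDATA[
```python
class ToyConntrack:
    """Toy model of the four event/action pairs; not a real NAT."""

    def __init__(self, timeout):
        self.timeout = timeout   # idle timeout in seconds
        self.entries = {}        # four tuple -> time of last packet

    def expire(self, now):
        # Event 4: delete the entries that have timed out.
        expired = [key for key, last in self.entries.items()
                   if now - last >= self.timeout]
        for key in expired:
            del self.entries[key]

    def packet(self, four_tuple, client_to_server, now):
        """Return True if the packet is forwarded, else False."""
        self.expire(now)
        if four_tuple in self.entries:
            # Event 3: existing connection, reset its timeout.
            self.entries[four_tuple] = now
            return True
        if client_to_server:
            # Event 1: register a new connection and forward.
            self.entries[four_tuple] = now
            return True
        # Event 2: unknown connection from the server side.
        return False


ct = ToyConntrack(timeout=10)
flow = ("10.0.0.2", 1024, "198.19.0.2", 80)
results = [
    ct.packet(flow, client_to_server=False, now=0),   # event 2
    ct.packet(flow, client_to_server=True, now=1),    # event 1
    ct.packet(flow, client_to_server=False, now=5),   # event 3
    ct.packet(flow, client_to_server=False, now=20),  # events 4, 2
]
print(results)
```
]]></sourcecode>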

      	<t>Due to "black box" testing, the Tester is not able to directly
      	examine (or delete) the entries of the connection tracking
      	table. However, the entries can and <bcp14>MUST</bcp14> be
      	controlled by setting an appropriate timeout value and carefully
      	selecting the port numbers of the packets (as described in <xref
      	target="ctrl_conntrack" format="default"/>) to be able to produce
      	meaningful and repeatable measurement results.</t>
        <t>This document aims to support the measurement of the following
        performance characteristics of a stateful NATxy gateway:</t>
        <ul spacing="normal">
	  <li>
            <t>maximum connection establishment rate</t>
          </li>
          <li>
            <t>all "classic" performance metrics like throughput, frame loss rate, latency, etc.</t>
          </li>
          <li>
            <t>connection tear-down rate</t>
          </li>
          <li>
            <t>connection tracking table capacity</t>
          </li>
        </ul>
      </section>

      <section anchor="ctrl_conntrack" numbered="true" toc="default">
        <name>Control of the Connection Tracking Table Entries</name>
        <t>It is necessary to control the connection tracking table entries of
	the DUT to achieve clear conditions for the measurements. One can
	simply achieve the following two extreme situations:</t>

        <ol spacing="normal">
	  <li>
            All frames create a new entry in the connection tracking table
            of the DUT, and no old entries are deleted during the test. This is
            required for measuring the maximum connection establishment
            rate.
          </li>
          <li>
            No new entries are created in the connection tracking table of
            the DUT, and no old ones are deleted during the test. This is ideal
            for the measurements to be executed in phase 2, like throughput,
            latency, etc.
          </li>
        </ol>

        <t>From this point, the following two assumptions are used:</t>

        <ol spacing="normal" type="1">
	  <li anchor="assumption1">
            The connection tracking table of the stateful NATxy is large
            enough to store all connections defined by the different four
            tuples.
          </li>
          <li anchor="assumption2">
            Each experiment is started with an empty connection tracking
            table. (This can be ensured by deleting its content before the
            experiment.)
          </li>
        </ol>

        <t>The first extreme situation can be achieved by:</t>
        <ul spacing="normal">
          <li>
            <t>using different four tuples for every single test frame in test phase 1 and</t>
          </li>
          <li>
            <t>setting the UDP timeout of the NATxy gateway to a value higher
            than the length of test phase 1.</t>
          </li>
        </ul>
        <t>The second extreme situation can be achieved by:</t>

        <ul spacing="normal">
          <li>
            <t>enumerating all possible four tuples in test phase 1 and</t>
          </li>
          <li>
            <t>setting the UDP timeout of the NATxy gateway to a value higher
            than the length of test phase 1 plus the gap between the two
            phases plus the length of test phase 2.</t>
          </li>
        </ul>
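        <t>The two timeout conditions above can be expressed as a single
        check, sketched here in Python. The function name, the parameter names,
        and the durations in the usage lines are hypothetical; all values are
        assumed to be in seconds.</t>
        <sourcecode type="python"><![CDATA[
```python
def timeout_is_sufficient(udp_timeout, phase1, gap=0.0, phase2=0.0):
    """Check the UDP timeout against the length of the test.

    With only phase1 given, this is the condition for the first
    extreme situation; with gap and phase2 also given, it is the
    condition for the second one.
    """
    return udp_timeout > phase1 + gap + phase2


# Hypothetical durations: 40 s phase 1, 2 s gap, 60 s phase 2.
ok_first = timeout_is_sufficient(60, phase1=40)
ok_second = timeout_is_sufficient(60, phase1=40, gap=2, phase2=60)

print(ok_first, ok_second)
```
]]></sourcecode>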

        <t>As described in <xref target="RFC4814" format="default"/>, pseudorandom
        port numbers are <bcp14>REQUIRED</bcp14>, which the authors believe is a good approximation of the
        distribution of the source port numbers a NATxy gateway on the
        Internet may be faced with.
        </t>

        <t>Although the enumeration of all possible four tuples is not a
        requirement for the first extreme situation and the usage of
        different four tuples in test phase 1 is not a requirement for the
        second extreme situation, pseudorandom enumeration of all possible
        four tuples in test phase 1 is a good solution in both cases. Such
        an enumeration may be generated in a computationally efficient way
        by first enumerating all possible four tuples and then using
        Durstenfeld's random shuffle algorithm <xref target="DUST1964"
        format="default"/> to prepare a random permutation of them.</t>
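        <t>A minimal Python sketch of such a pseudorandom enumeration is
        given below. It assumes that all four tuples fit into memory; the
        function names, the seed, and the small address and port ranges in the
        usage lines are hypothetical and serve only as an illustration.</t>
        <sourcecode type="python"><![CDATA[
```python
import itertools
import random


def durstenfeld_shuffle(items, rng):
    """Shuffle the list in place with Durstenfeld's algorithm."""
    for i in range(len(items) - 1, 0, -1):
        j = rng.randint(0, i)    # 0 <= j <= i, both ends included
        items[i], items[j] = items[j], items[i]


def shuffled_four_tuples(src_ips, src_ports, dst_ips, dst_ports,
                         seed):
    """Enumerate all four tuples, then permute them randomly."""
    tuples = list(itertools.product(
        src_ips, src_ports, dst_ips, dst_ports))
    durstenfeld_shuffle(tuples, random.Random(seed))
    return tuples


# Tiny hypothetical ranges: 10 source ports, 3 destination ports.
order = shuffled_four_tuples(
    ["10.0.0.2"], range(1024, 1034),
    ["198.19.0.2"], range(80, 83), seed=1)

print(len(order))
```
]]></sourcecode>
        <t>Using a fixed seed keeps the order repeatable between test
        runs, which is a design choice of this sketch.</t>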

        <t>The enumeration of the four tuples in increasing or decreasing
        order (or in any other specific order) <bcp14>MAY</bcp14> be used as
        an additional measurement.</t>

      </section>

      <section anchor="meas_max_conn_est_rate" numbered="true" toc="default">
        <name>Measurement of the Maximum Connection Establishment Rate</name>
        <t>The maximum connection establishment rate is an important
        characteristic of the stateful NATxy gateway, and its determination is
        necessary for the safe execution of test phase 1 (without frame loss)
        before test phase 2.
        </t>
        <t>The measurement procedure of the maximum connection establishment
        rate is very similar to the throughput measurement procedure defined
        in <xref target="RFC2544" format="default"/>.
        </t>

        <t>The procedure is as follows:</t>
	<ul>
          <li>The Initiator sends a specific number of test frames using all
          different four tuples at a specific rate through the DUT.</li>
	  <li>The Responder counts the frames that are successfully translated
	  by the DUT.</li>
	  <li>If the count of offered frames is equal to the count of received
	  frames, the rate of the offered stream is raised and the test is
	  rerun.</li>
	  <li>If fewer frames are received than were transmitted, the rate of
	  the offered stream is reduced and the test is rerun.</li>
	</ul>

        <t>The maximum connection establishment rate is the fastest rate at
        which the count of test frames successfully translated by the DUT is
        equal to the number of test frames sent to it by the Initiator.
        </t>

        <t>Note: In practice, the usage of binary search is
        <bcp14>RECOMMENDED</bcp14>.</t>
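        <t>A minimal Python sketch of such a binary search is given below as
        an illustration. The trial_passes callback, the error parameter
        (the stopping criterion), and the simulated DUT limit of 250,000
        connections per second are hypothetical; a real Tester would execute
        test phase 1 in the callback.</t>
        <sourcecode type="python"><![CDATA[
def max_conn_est_rate(trial_passes, max_rate, error):
    """Binary search in [0, max_rate] for the highest passing rate.

    trial_passes(rate) is a hypothetical callback: it sends test frames
    with all different four tuples at the given rate and returns True
    if the Responder received as many frames as the Initiator sent.
    """
    lower, upper = 0, max_rate
    while upper - lower > error:
        rate = (lower + upper) // 2
        if trial_passes(rate):
            lower = rate  # no frame loss: raise the rate
        else:
            upper = rate  # frame loss: reduce the rate
    return lower

# Simulated DUT that sustains at most 250,000 new connections/s.
result = max_conn_est_rate(lambda r: r <= 250_000, 1_000_000, 1_000)
]]></sourcecode>
        <t>In this sketch, the returned rate is the highest rate that passed
        a trial, and it lies within the chosen error of the simulated
        limit.</t>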
      </section>
      <section anchor="validation_of_conn" numbered="true" toc="default">
        <name>Validation of Connection Establishment</name>
        <t>Due to "black box" testing, the entries of the connection tracking
        table of the DUT may not be directly examined. However, the presence of the
        connections can be checked easily by sending frames from the Responder
        to the Initiator in test phase 2 using all four tuples stored in the
        state table of the Tester (at a low enough frame rate). The arrival of
        all test frames indicates that the connections are indeed present.
        </t>

        <t>The procedure is as follows:</t>
	  <t>When all N desired test frames have been sent by the
	  Initiator to the Responder at frame rate R in test phase 1 for the
	  maximum connection establishment rate measurement and the Responder
	  has successfully received all N frames, the establishment
	  of the connections is checked in test phase 2 as follows:</t>
          <ul>
            <li>
              The Responder sends test frames to the Initiator at frame rate
              r=R*alpha for the duration of N/r, using a different four tuple
              from its state table for each test frame.
            </li>

            <li>
              The Initiator counts the received frames. If all N frames
              have arrived, then the frame rate R of the maximum connection
              establishment rate measurement (performed in test phase 1) is
              raised for the next iteration; otherwise, it is lowered. (It is
              also lowered if test frames were missing in the preliminary test
              phase.)
            </li>
         </ul>
	
	  <t>Notes:</t>
          <ul spacing="normal">
            <li>
              The alpha is a kind of "safety factor"; it aims to make sure
              that the frame rate used for the validation is not too high, so that the
              test fails only if at least one connection is not
              present in the connection tracking table of the DUT. (Therefore, alpha
              should typically be less than 1, e.g., 0.8 or 0.5.)
            </li>
            <li>
              The duration of N/r and the frame rate of r means that N frames
              are sent for validation.
            </li>
            <li>
              The order of four tuple selection is arbitrary, but
              all four tuples <bcp14>MUST</bcp14> be used.
            </li>
            <li>
              Please refer to <xref target="meas_contr_capacity"
              format="default"/> for a short analysis of the operation of the
              measurement and what problems may occur.
            </li>
          </ul>
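          <t>The elementary step described above can be sketched in Python as
          follows. This is an illustration under stated assumptions: phase1
          and phase2 are hypothetical callbacks standing in for the actual
          frame sending and counting of a Tester, and the numeric values are
          arbitrary.</t>
          <sourcecode type="python"><![CDATA[
def validated_trial(n, rate, alpha, phase1, phase2):
    """One elementary step: test phase 1 followed by validation.

    phase1(n, rate) and phase2(n, r, duration) are hypothetical
    callbacks that return the number of frames received.
    """
    if phase1(n, rate) < n:
        return False      # frames were lost in the preliminary phase
    r = rate * alpha      # reduced frame rate for the validation
    duration = n / r      # sending at r for n/r seconds gives n frames
    return phase2(n, r, duration) == n

# Simulated lossless DUT: every frame sent is also received.
ok = validated_trial(1000, 500.0, 0.5,
                     lambda n, rate: n,
                     lambda n, r, duration: round(r * duration))

# Simulated DUT that lost one connection: one validation frame missing.
failed = validated_trial(1000, 500.0, 0.5,
                         lambda n, rate: n,
                         lambda n, r, duration: n - 1)
]]></sourcecode>
          <t>A True result lets the enclosing search raise R for the next
          iteration; a False result makes it lower R.</t>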
	

      </section>

      <section anchor="real_test" numbered="true" toc="default">
        <name>Test Phase 2</name>

        <t>As for the traffic direction, there are three possible cases
        during test phase 2:</t>

        <ol spacing="normal" type="1">
          <li>
            <t>Bidirectional traffic: The Initiator sends test frames to the
            Responder, and the Responder sends test frames to the
            Initiator.</t>
          </li>
          <li>
            <t>Unidirectional traffic from the Initiator to the Responder: The
            Initiator sends test frames to the Responder, but the Responder
            does not send test frames to the Initiator.</t>
          </li>
          <li>
            <t>Unidirectional traffic from the Responder to the Initiator: The
            Responder sends test frames to the Initiator, but the Initiator
            does not send test frames to the Responder.</t>
          </li>
        </ol>

        <t>If the Initiator sends test frames, then it uses pseudorandom
        source port numbers and destination port numbers from the restricted
        port number ranges. (If it uses multiple source and/or destination IP
        addresses, then their ranges are also limited.)  The Responder
        receives the test frames, updates its state table, and processes the
        test frames as required by the given measurement procedure (e.g., only
        counts them for the throughput test, handles timestamps for latency or
        PDV tests, etc.).</t>

        <t>If the Responder sends test frames, then it uses the four tuples
        from its state table. The reading order of the state table may follow
        different policies (discussed in <xref target="st_wr_order"
        format="default"/>). The Initiator receives the test frames and
        processes them as required by the given measurement procedure.</t>

        <t>As for the actual measurement procedures, the usage of the updated
        ones from <xref target="RFC8219" sectionFormat="of" section="7"/> is
        <bcp14>RECOMMENDED</bcp14>.</t>
      </section>

      <section anchor="meas_conn_tear_down_rate" numbered="true" toc="default">
        <name>Measurement of the Connection Tear-Down Rate</name>
        <t>Connection tear-down can cause significant load for the NATxy
        gateway.  The connection tear-down performance can be measured as
        follows:</t>
        <ol spacing="normal" type="1">
	  <li>Load a certain number of connections (N) into the connection
	  tracking table of the DUT (in the same way as done to measure the
	  maximum connection establishment rate).</li>
          <li>Record TimestampA.</li>
          <li>Delete the content of the connection tracking table of the DUT.</li>
          <li>Record TimestampB.</li>
        </ol>

        <t>The connection tear-down rate can be computed as:</t>

        <t indent="5">connection tear-down rate = N / (TimestampB - TimestampA)</t>

        <t>The connection tear-down rate <bcp14>SHOULD</bcp14> be measured for
        various values of N.</t>
        <t>It is assumed that the content of the connection tracking table may
        be deleted by an out-of-band control mechanism specific to the given
        NATxy gateway implementation (e.g., by removing the appropriate kernel
        module under Linux).</t>
        <t>It is noted that the performance of removing the entire content of
        the connection tracking table at one time may be different from
        removing all the entries one by one.</t>
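        <t>The computation can be sketched in Python as shown below. The
        delete_all_connections callable and the injectable clock are
        assumptions of this sketch; the callable stands for the out-of-band
        control mechanism and is expected to return only when the deletion
        has completed. The example uses a fake clock with arbitrary values
        instead of a real measurement.</t>
        <sourcecode type="python"><![CDATA[
import time

def tear_down_rate(n, delete_all_connections, clock=time.monotonic):
    """Return N / (TimestampB - TimestampA) in connections/s.

    delete_all_connections is a hypothetical callable that triggers
    the deletion on the DUT and returns when it has finished.
    """
    timestamp_a = clock()
    delete_all_connections()
    timestamp_b = clock()
    return n / (timestamp_b - timestamp_a)

# Fake clock: the deletion appears to take 2 seconds.
ticks = iter([10.0, 12.0])
rate = tear_down_rate(1_000_000, lambda: None,
                      clock=lambda: next(ticks))
]]></sourcecode>
        <t>With a real DUT, the default monotonic clock would be used so that
        wall-clock adjustments do not distort the measured interval.</t>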
      </section>

      <section anchor="meas_contr_capacity" numbered="true" toc="default">
        <name>Measurement of the Connection Tracking Table Capacity</name>
        <t>The connection tracking table capacity is an important metric of
        stateful NATxy gateways. Its measurement is not easy, because an
        elementary step of a validated maximum connection establishment rate
        measurement (defined in <xref target="validation_of_conn"
        format="default"/>) may have only a few distinct observable outcomes,
        but some of them may have different root causes:</t>
        <ul spacing="normal">
	  <li>
            <t>During test phase 1, the number of test frames received by the
            Responder is less than the number of test frames sent by the
            Initiator.  It may have different root causes, including:</t>
            <ul spacing="normal">
	      <li>
                <t>The R frame sending rate was higher than the maximum
                connection establishment rate. (Note that now the maximum
                connection establishment rate is considered unknown because
                one cannot measure the maximum connection establishment rate
                without <xref target="assumption1" format="none">assumption 1</xref> in <xref target="ctrl_conntrack"
                format="default"/>.)  This root cause may be eliminated by
                lowering the R rate and re-executing the test. (This step may
                be performed multiple times while R&gt;0.)</t>
              </li>
              <li>
                <t>The capacity of the connection tracking table of the DUT
                has been exhausted (and either the DUT does not want to
                delete connections or the deletion of the connections makes it
                slower; this case is not investigated further in test phase
                1).</t>
              </li>
            </ul>
          </li>
          <li>
            <t>During test phase 1, the number of test frames received by the
            Responder equals the number of test frames sent by the Initiator.
            In this case, the connections are validated in test phase 2.  The
            validation may have two kinds of observable results:</t>
            <ol spacing="normal" type="1">
	      <li>
                <t>The number of validation frames received by the Initiator
                equals the number of validation frames sent by the Responder.
                (It proves that the capacity of the connection tracking table
                of the DUT is enough and both R and r were chosen
                properly.)</t>
              </li>
              <li>
                <t>The number of validation frames received by the Initiator
                is less than the number of validation frames sent by the
                Responder.  This phenomenon may have various root causes:</t>
                <ul spacing="normal">
		  <li>
                    <t>The capacity of the connection tracking table of the
                    DUT has been exhausted. (It does not matter whether some
                    existing connections are discarded and new ones are
                    stored or if the new connections are discarded.  Some
                    connections are lost anyway, and it makes validation
                    fail.)</t>
                  </li>
                  <li>
                    <t>The R frame sending rate used by the Initiator was too
                    high in test phase 1; thus, some connections were not
                    established even though all test frames arrived at the
                    Responder. This root cause may be eliminated by lowering
                    the R rate and re-executing the test.  (This step may be
                    performed multiple times while R&gt;0.)</t>
                  </li>
                  <li>
                    <t>The r frame sending rate used by the Responder was too
                    high in test phase 2; thus, some test frames did not
                    arrive at the Initiator even though all connections were
                    present in the connection tracking table of the DUT.  This
                    root cause may be eliminated by lowering the r rate and
                    re-executing the test.  (This step may be performed
                    multiple times while r&gt;0.)</t>
                  </li>
                </ul>
                <t>This is the problem: As the above three root causes are
                indistinguishable, it is not easy to decide whether R or r
                should be decreased.</t>
              </li>
            </ol>
          </li>
        </ul>
        <t>Experience shows that the DUT may collapse if its memory is
        exhausted.  Such a situation may make the connection tracking table
        capacity measurements rather inconvenient. This possibility is
        included in the recommended measurement procedure, but the detection
        and elimination of such a situation is not addressed (e.g., how the
        algorithm can reset the DUT).</t>
        <t>For the connection tracking table size measurement, first, one needs
        a safe number: C0. It is a precondition that C0 number of connections
        can surely be stored in the connection tracking table of the
        DUT. Using C0, one can determine the maximum connection establishment
        rate using C0 number of connections.  It is done with a binary search
        using validation. The result is R0. The values C0 and R0 will serve as
        "safe" starting values for the following two searches.</t>
	<t>First, an exponential search is performed to find the order of
	magnitude of the connection tracking table capacity. The search stops
	if the DUT collapses or the maximum connection establishment rate
	drops severely (e.g., to one tenth of its previous value) when the number of
	connections is doubled.</t>
        <t>Then, the result of the exponential search gives the order of
        magnitude of the size of the connection tracking table. Before
        disclosing the possible algorithms to determine the exact size of the
        connection tracking table, three possible replacement policies for the
        NATxy gateway are considered:</t>
        <ol spacing="normal" type="1">
	  <li>
            <t>The gateway does not delete any live connections until their timeout expires.</t>
          </li>
          <li>
            <t>The gateway replaces the live connections according to the Least Recently Used (LRU) policy.</t>
          </li>
          <li>
            <t>The gateway does a garbage collection when its connection
            tracking table is full and a frame with a new four tuple
            arrives. During the garbage collection, it deletes the K LRU connections, where K is greater than 1.</t>
          </li>
        </ol>
        <t>Now, it is examined what happens and how many validation frames
        arrive in the three cases.  Let the size of the connection tracking
        table be S and the number of preliminary frames be N, where S is less
        than N.</t>
        <ol spacing="normal" type="1">
	  <li>
            <t>The connections defined by the first S test frames are
            registered into the connection tracking table of the DUT, and
            the last N-S connections are lost.  (It is another question if the
            last N-S test frames are translated and forwarded in test phase 1
            or simply dropped.) During validation, the validation frames with
            four tuples corresponding to the first S test frames will arrive
            at the Initiator and the other N-S validation frames will be
            lost.</t>
          </li>
          <li>
            <t>All connections are registered into the connection tracking
            table of the DUT, but the first N-S connections are replaced (and
            thus lost). During validation, the validation frames with four
            tuples corresponding to the last S test frames will arrive at the
            Initiator, and the other N-S validation frames will be lost.</t>
          </li>
          <li>
            <t>Depending on the values of K, S, and N, fewer than S
            connections may survive.  In the worst case, only S-K+1
            validation frames arrive, even though the size of the connection
            tracking table is S.</t>
          </li>
        </ol>
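        <t>The three cases can be illustrated with a small Python simulation.
        This is a sketch only: the policy names "keep", "lru", and "gc" are
        labels invented here for the three replacement policies, and the
        model assumes that each four tuple is used exactly once in test
        phase 1 (so insertion order equals LRU order) and that K does not
        exceed S.</t>
        <sourcecode type="python"><![CDATA[
from collections import OrderedDict

def surviving_connections(n, s, policy, k=1):
    """Count connections left after n new four tuples reach a
    connection tracking table of size s.

    policy: "keep" (never delete live entries), "lru" (replace the
    least recently used entry), or "gc" (delete the k LRU entries
    when the table is full). Assumes k <= s.
    """
    table = OrderedDict()  # oldest entry first
    for conn in range(n):
        if len(table) >= s:
            if policy == "keep":
                continue  # table full: new connection is not stored
            evict = 1 if policy == "lru" else k
            for _ in range(evict):
                table.popitem(last=False)  # drop the oldest entry
        table[conn] = True
    return len(table)

# S = 100 and N = S + 1 give the worst case for garbage collection.
kept = surviving_connections(101, 100, "keep")
lru = surviving_connections(101, 100, "lru")
gc = surviving_connections(101, 100, "gc", k=10)  # S - K + 1
]]></sourcecode>
        <t>With the first two policies, S connections survive. With garbage
        collection and N = S + 1, the single new connection triggers the
        deletion of K entries, leaving S-K+1 connections.</t>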

        <t>If one knows that the stateful NATxy gateway uses the first or
        second replacement policy and one also knows that both R and r rates
        are low enough, then the final step of determining the size of the
        connection tracking table is simple. If the Responder sent N
        validation frames and the Initiator received N' of them, then the size
        of the connection tracking table is N'.</t>

        <t>In the general case, a binary search is performed to find the
        value of the connection tracking table capacity within E error. The
        search chooses the lower half of the interval if the DUT collapses or
        the maximum connection establishment rate drops severely (e.g., to
        half of its last safe value); otherwise, it chooses the upper half.  The search stops when the
        size of the interval is no longer greater than the E error.</t>
	
        <t>The algorithms for the general case are defined using C-like
        pseudocode in <xref target="meas_contr_capacity_algo"
        format="default"/>. In practice, these algorithms may be made more
        efficient by stopping the binary search for the maximum connection
        establishment rate as soon as an elementary test fails at a rate under
        RS*beta or RS*gamma during the exponential search or during the final
        binary search for the capacity of the connection tracking table,
        respectively. (This saves a considerable amount of execution time by
        eliminating the long-lasting tests at low rates.)
        </t>
        <figure anchor="meas_contr_capacity_algo">
          <name>Measurement of the Connection Tracking Table Capacity</name>
          <sourcecode type="pseudocode"><![CDATA[
// The binarySearchForMaximumConnectionEstablishmentRate(c,r)
// function performs a binary search for the maximum connection
// establishment rate in the [0, r] interval using c number of
// connections.

// This is an exponential search for finding the order of magnitude
// of the connection tracking table capacity
// Variables:
//   C0 and R0 are beginning safe values for the connection
//     tracking table size and connection establishment rate,
//     respectively
//   CS and RS are their currently used safe values
//   CT and RT are their values for the current examination
//   beta is a factor expressing an unacceptable drop in R (e.g.,
//     beta=0.1)
//   maxrate is the maximum frame rate for the media
R0=binarySearchForMaximumConnectionEstablishmentRate(C0,maxrate);
for ( CS=C0, RS=R0; 1; CS=CT, RS=RT )
{
  CT=2*CS;
  RT=binarySearchForMaximumConnectionEstablishmentRate(CT,RS);
  if ( DUT_collapsed || RT < RS*beta )
    break;
}
// At this point, the size of the connection tracking table is
// between CS and CT.

// This is the final binary search for finding the connection
// tracking table capacity within E error
// Variables:
//   CS and RS are the safe values for connection tracking table size
//     and connection establishment rate, respectively
//   C and R are the values for the current examination
//   gamma is a factor expressing an unacceptable drop in R
//     (e.g., gamma=0.5)
for ( D=CT-CS;  D>E; D=CT-CS )
{
  C=(CS+CT)/2;
  R=binarySearchForMaximumConnectionEstablishmentRate(C,RS);
  if ( DUT_collapsed || R < RS*gamma )
    CT=C; // take the lower half of the interval
  else
    CS=C,RS=R; // take the upper half of the interval
}
// At this point, the size of the connection tracking table is
// CS within E error.
]]></sourcecode>

        </figure>
        <t keepWithPrevious="true"/>
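        <t>For readers who prefer an executable form, the two searches may be
        sketched in Python as shown below. This is not a reference
        implementation: the passes callback (a validated elementary test),
        the simulated DUT with a table of 1,000,000 entries and a limit of
        200,000 connections per second, and the 1,000 frames per second
        error of the rate search are all assumptions of the sketch.
        Detection of a DUT collapse is omitted.</t>
        <sourcecode type="python"><![CDATA[
def rate_search(passes, conns, max_rate, error=1_000):
    """Binary search in [0, max_rate] for the highest validated rate.
    The upper bound itself is never tried in this simple variant."""
    lower, upper = 0, max_rate
    while upper - lower > error:
        rate = (lower + upper) // 2
        if passes(conns, rate):
            lower = rate
        else:
            upper = rate
    return lower

def capacity_search(passes, c0, max_rate, e, beta=0.1, gamma=0.5):
    """Exponential search followed by a binary search for capacity.

    passes(conns, rate) is a hypothetical validated elementary test.
    """
    cs, rs = c0, rate_search(passes, c0, max_rate)
    while True:                      # exponential search
        ct = 2 * cs
        rt = rate_search(passes, ct, rs)
        if rt < rs * beta:
            break                    # capacity is between cs and ct
        cs, rs = ct, rt
    while ct - cs > e:               # final binary search
        c = (cs + ct) // 2
        r = rate_search(passes, c, rs)
        if r < rs * gamma:
            ct = c                   # take the lower half
        else:
            cs, rs = c, r            # take the upper half
    return cs

# Simulated DUT: 1,000,000 table entries, 200,000 connections/s.
def simulated(conns, rate):
    return conns <= 1_000_000 and rate <= 200_000

capacity = capacity_search(simulated, 10_000, 1_000_000, 1_000)
]]></sourcecode>
        <t>As in the pseudocode, the exponential search does not terminate if
        the simulated table never fills up, so a real implementation would
        need an upper limit on the number of connections.</t>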
      </section>

      <section anchor="st_wr_order" numbered="true" toc="default">
        <name>Writing and Reading Order of the State Table</name>
        <t>As for the writing policy of the state table of the Responder,
        round robin is <bcp14>RECOMMENDED</bcp14>, because it ensures that its
        entries are automatically kept fresh and consistent with that of the
        connection tracking table of the DUT.
        </t>
        <t>The Responder can read its state table in various orders, for
        example:
        </t>
        <ul spacing="normal">
          <li>
            <t>pseudorandom</t>
          </li>
          <li>
            <t>round robin</t>
          </li>
        </ul>
        <t>Pseudorandom is <bcp14>RECOMMENDED</bcp14> to follow the approach
        of <xref target="RFC4814" format="default"/>.  Round robin may be used
        as a computationally cheaper alternative.
        </t>
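        <t>The writing and reading policies can be illustrated with the
        following Python sketch. The StateTable class and its method names
        are hypothetical, and the sketch assumes that the table has been
        filled completely in test phase 1 before it is read.</t>
        <sourcecode type="python"><![CDATA[
import random

class StateTable:
    """Fixed-size Responder state table with round-robin writing."""

    def __init__(self, size, seed=None):
        self.entries = [None] * size
        self.write_index = 0
        self.read_index = 0
        self.rng = random.Random(seed)

    def write(self, four_tuple):
        """Overwrite the oldest slot, which keeps the table fresh."""
        self.entries[self.write_index] = four_tuple
        self.write_index = (self.write_index + 1) % len(self.entries)

    def read_round_robin(self):
        """Return the entries one after the other, wrapping around."""
        entry = self.entries[self.read_index]
        self.read_index = (self.read_index + 1) % len(self.entries)
        return entry

    def read_pseudorandom(self):
        """Return a uniformly chosen entry."""
        return self.rng.choice(self.entries)

table = StateTable(3, seed=1)
for four_tuple in ["a", "b", "c", "d"]:  # "d" overwrites "a"
    table.write(four_tuple)
in_order = [table.read_round_robin() for _ in range(3)]
picked = table.read_pseudorandom()
]]></sourcecode>
        <t>Round-robin reading needs only an index increment per frame,
        whereas pseudorandom reading needs one random number per frame.</t>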
      </section>
    </section>
    <section anchor="meas_scalability" numbered="true" toc="default">
      <name>Scalability Measurements</name>

      <t>As for scalability measurements, no new types of performance metrics
      are defined, but it is <bcp14>RECOMMENDED</bcp14> to perform measurement
      series in which the values of one or more parameters are
      changed to discover how the various values of the given parameters
      influence the performance of the DUT.
      </t>
      <section anchor="sc_net_flows" numbered="true" toc="default">
        <name>Scalability Against the Number of Network Flows</name>
        <t>The scalability measurements aim to quantify how the performance of
        the stateful NATxy gateways degrades with the increase of the number
        of network flows.</t>
        <t>As for the actual values for the number of network flows to be used
        during the measurement series, it is <bcp14>RECOMMENDED</bcp14> to use
        some representative values from the range of the potential number of
        network flows the DUT may be faced with during its intended usage.</t>
        <t>It matters how the given number of network flows is
        generated. The sizes of the ranges of the source and destination IP
        addresses and port numbers are essential parameters to be reported
        together with the results. Please also see <xref
        target="reporting_format" format="default"/> about the reporting
        format.</t>
        <t>If a single IP address pair is used, then it is <bcp14>RECOMMENDED</bcp14> to use:
        </t>
        <ul spacing="normal">
          <li>
            <t>a fixed, larger source port number range (e.g., a few times
            10,000) and</t>
          </li>
          <li>
            <t>a variable-size destination port number range (e.g., 10, 100,
            1,000, etc.), where its expedient granularity depends on the
            purpose.</t>
          </li>
        </ul>
      </section>
      <section anchor="sc_cpu_cores" numbered="true" toc="default">
        <name>Scalability Against the Number of CPU Cores</name>
        <t>Stateful NATxy gateways are often implemented in software that is
        not bound to specific hardware but can be executed by commodity
        servers. To facilitate the comparison of their performance, it can be
        useful to determine:
        </t>
        <ul spacing="normal">
          <li>
            <t>the performance of the various implementations using a single
            core of a well-known CPU and</t>
          </li>
          <li>
            <t>the scale-up of the performance of the various implementations
            with the number of CPU cores.</t>
          </li>
        </ul>
        <t>If the number of the available CPU cores is a power of two, then it
        is <bcp14>RECOMMENDED</bcp14> to perform the tests with 1, 2, 4, 8,
        16, etc. active CPU cores of the DUT.</t>
      </section>
    </section>

    <section anchor="reporting_format" numbered="true" toc="default">
      <name>Reporting Format</name>
      <t>Measurements <bcp14>MUST</bcp14> be executed multiple times. The
      necessary number of repetitions to achieve statistically reliable
      results may depend on the consistent or scattered nature of the results.
      The report of the results <bcp14>MUST</bcp14> contain the number of
      repetitions of the measurements.  The median is <bcp14>RECOMMENDED</bcp14>
      as the summarizing function of the results complemented with the first
      percentile and the 99th percentile as indices of the dispersion of the
      results.  The average and standard deviation <bcp14>MAY</bcp14> also be
      reported.
      </t>
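      <t>The recommended summary can be computed, for example, as in the
      following Python sketch. The summarize function is hypothetical, the
      input values are arbitrary placeholders rather than measurement
      results, and the "inclusive" interpolation method is only one of
      several possible percentile definitions.</t>
      <sourcecode type="python"><![CDATA[
import statistics

def summarize(results):
    """Return repetitions, median, 1st and 99th percentiles."""
    cuts = statistics.quantiles(results, n=100, method="inclusive")
    return {
        "repetitions": len(results),
        "median": statistics.median(results),
        "p1": cuts[0],    # first percentile
        "p99": cuts[98],  # 99th percentile
    }

# Arbitrary placeholder values for ten repetitions.
summary = summarize([101, 99, 100, 102, 98, 100, 103, 97, 100, 100])
]]></sourcecode>
      <t>A report would state which percentile definition was used, because
      the definitions can differ noticeably when the number of repetitions
      is small.</t>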
      <t>All parameters and settings that may influence the performance of the
      DUT <bcp14>MUST</bcp14> be reported. Some of them may be specific to the
      given NATxy gateway implementation, like the "hashsize" (hash table
      size) and "nf_conntrack_max" (number of connection tracking table
      entries) values for iptables or the limit of the number of states for
      OpenBSD PF (set by the "set limit states number" command in the pf.conf
      file).
      </t>
      <t keepWithNext="true"/>

      <table anchor="iptables-conn-scale" align="left">
	<name>Example Table of the Maximum Connection Establishment Rate of
	Iptables Against the Number of Sessions</name>
	<tbody>
	  <tr>
	    <td align="left">number of sessions (req.)</td>
	    <td align="right">0.4M</td>
	    <td align="right">4M</td>
	    <td align="right">40M</td>
	    <td align="right">400M</td>
	  </tr>
	  <tr>
	    <td align="left">source port numbers (req.)</td>
            <td align="right">40,000</td>
	    <td align="right">40,000</td>
	    <td align="right">40,000</td>
	    <td align="right">40,000</td>
	  </tr>
	  <tr>
	    <td align="left">destination port numbers (req.)</td>
            <td align="right">10</td>
	    <td align="right">100</td>
	    <td align="right">1,000</td>
	    <td align="right">10,000</td>
	  </tr>
	  <tr>
	    <td align="left">"hashsize" (i.s.)</td>
            <td align="right">2<sup>17</sup></td>
	    <td align="right">2<sup>20</sup></td>
	    <td align="right">2<sup>23</sup></td>
	    <td align="right">2<sup>27</sup></td>
	  </tr>
	  <tr>
	    <td align="left">"nf_conntrack_max" (i.s.)</td>
            <td align="right">2<sup>20</sup></td>
	    <td align="right">2<sup>23</sup></td>
	    <td align="right">2<sup>26</sup></td>
	    <td align="right">2<sup>30</sup></td>
	  </tr>
	  <tr>
	    <td align="left">num. sessions / "hashsize" (i.s.)</td>
	    <td align="right">3.05</td>
	    <td align="right">3.81</td>
	    <td align="right">4.77</td>
	    <td align="right">2.98</td>
	  </tr>
	  <tr>
	    <td align="left">number of experiments (req.)</td>
            <td align="right">10</td>
	    <td align="right">10</td>
	    <td align="right">10</td>
	    <td align="right">10</td>
	  </tr>
	  <tr>
	    <td align="left">error of binary search (req.)</td>
	    <td align="right">1,000</td>
	    <td align="right">1,000</td>
	    <td align="right">1,000</td>
	    <td align="right">1,000</td>
	  </tr>
	  <tr>
	    <td align="left">connections/s median (req.)</td>
	    <td></td>
	    <td></td>
	    <td></td>
	    <td></td>	    
	  </tr>
	  <tr>
	    <td align="left">connections/s 1st perc. (req.)</td>
	    <td></td>
	    <td></td>
	    <td></td>
	    <td></td>
	  </tr>
	  <tr>
	    <td align="left">connections/s 99th perc. (req.)</td>
	    <td></td>
	    <td></td>
	    <td></td>
	    <td></td>	    
	  </tr>
	</tbody>
      </table>
      
      <t keepWithPrevious="true"/>

      <t><xref target="iptables-conn-scale" format="default"/> shows an
      example of table headings for reporting the measurement results regarding the
      scalability of the iptables stateful NAT44 implementation against the
      number of sessions. The table indicates the required fields
      (req.) and the implementation-specific ones (i.s.).  A computed value
      was also added in row 6; it is the number of sessions per hashsize
      ratio, which helps the reader to interpret the achieved maximum
      connection establishment rate.  (A lower value results in shorter linked
      lists hanging on the entries of the hash table, thus facilitating higher
      performance. The ratio varies because the number of sessions grows by
      factors of 10, whereas the hash table size is a power of 2.)  To
      reflect the accuracy of the results, the table contains the value of the
      "error" of the binary search, which expresses the stopping criterion for
      the binary search. The binary search stops when the difference between
      the "higher limit" and "lower limit" of the binary search is less than
      or equal to the "error".

      </t>
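      <t>The computed rows of the example table can be reproduced with a few
      lines of Python, shown here as an illustration. The variable names are
      chosen for this sketch; the numbers are those of the table.</t>
      <sourcecode type="python"><![CDATA[
source_ports = 40_000
destination_ports = [10, 100, 1_000, 10_000]
hashsize_exponents = [17, 20, 23, 27]  # "hashsize" is 2**exponent

# Number of sessions = source ports x destination ports.
sessions = [source_ports * d for d in destination_ports]

# Number of sessions per "hashsize", rounded to two decimals.
ratios = [round(s / 2**e, 2)
          for s, e in zip(sessions, hashsize_exponents)]
# ratios == [3.05, 3.81, 4.77, 2.98]
]]></sourcecode>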
      <t>The table <bcp14>MUST</bcp14> be complemented with reporting the
      relevant parameters of the DUT. If the DUT is a general-purpose computer
      and some software NATxy gateway implementation is tested, then the
      hardware description <bcp14>SHOULD</bcp14> include the following:</t>
      <ul>
	<li>computer type</li>
	<li>CPU type</li>
	<li>number of active CPU cores</li>
	<li>memory type, size, and speed</li>
	<li>network interface card type (also reflecting the speed)</li>
	<li>the fact that direct cable connections were used or the type of switch used for
      interconnecting the Tester and the DUT</li>
	</ul>
      <t>The operating system type and
      version, kernel version, and version of the NATxy gateway
      implementation (including the last commit date and number if applicable)
      <bcp14>SHOULD</bcp14> also be given.
      </t>
    </section>

    <section anchor="impl_exp" numbered="true" toc="default">
      <name>Implementation and Experience</name>

      <t>The stateful extension of siitperf <xref target="SIITPERF"
      format="default"/> is an implementation of this concept.  Its first
      version that only supports multiple port numbers is documented in this
      (open access) paper: <xref target="LEN2022" format="default"/>.  Its
      extended version that also supports multiple IP addresses is documented in
      this (open access) paper: <xref target="LEN2024b" format="default"/>.
      </t>

      <t>The proposed benchmarking methodology has been validated by
      performing benchmarking measurements with three radically different
      stateful NAT64 implementations (Jool, tayga+iptables, and OpenBSD PF) in this
      (open access) paper: <xref target="LEN2023" format="default"/>.</t>

      <t>Further experience with this methodology of using siitperf for measuring
      the scalability of the iptables stateful NAT44 and Jool stateful NAT64
      implementations is described in <xref
      target="I-D.lencse-v6ops-transition-scalability" format="default"/>.</t>

      <t>This methodology was successfully applied for the benchmarking of
      various IPv4-as-a-Service (IPv4aaS) technologies without the usage of
      technology-specific Testers by reducing the aggregate of their Customer
      Edge (CE) and Provider Edge (PE) devices to a stateful NAT44 gateway,
      as documented in this (open access) paper: <xref target="LEN2024a"
      format="default"/>.</t>
    </section>

    <section anchor="udp_or_tcp" numbered="true" toc="default">
      <name>Limitations of Using UDP as a Transport Layer Protocol</name>

      <t>The test frame format defined in <xref target="RFC2544"/> exclusively uses UDP (and
      not TCP) as a transport layer protocol. Testing with UDP was kept in
      both <xref target="RFC5180"/> and <xref target="RFC8219"/> regarding the standard benchmarking
      procedures (throughput, latency, frame loss rate, etc.).  The
      benchmarking methodology proposed in this document follows this long-established benchmarking tradition using UDP as a transport layer
      protocol, too. The rationale for this is that the standard benchmarking
      procedures require sending frames at arbitrary constant frame rates,
      which would violate the flow control and congestion control algorithms
      of the TCP protocol. TCP connection setup (using the three-way
      handshake) would further complicate testing.</t>

      <t>Further potential transport layer protocols, e.g., the Datagram Congestion Control Protocol (DCCP) <xref
      target="RFC4340" format="default"/> and the Stream Control Transmission Protocol (SCTP) <xref target="RFC9260"
      format="default"/>, are outside of the scope of this document, as the
      widely used stateful NAT44 and stateful NAT64 implementations do not
      support them. Although QUIC <xref target="RFC9000" format="default"/> is
      also considered a transport layer protocol, QUIC packets are carried
      in UDP datagrams; thus, QUIC does not need special handling.</t>

      <t>Some stateful NATxy solutions handle TCP and UDP differently,
      e.g., iptables uses a 30-second timeout for UDP and a 60-second timeout for TCP. Thus,
      benchmarking results produced using UDP do not necessarily characterize
      the performance of a NATxy gateway well enough when it is used for
      forwarding Internet traffic. In the given example, the timeout values of
      the DUT may be adjusted, but this requires extra consideration.</t>

      <t>Other differences in handling UDP or TCP are also possible. Thus, the
      authors recommend further investigation in
      this field.</t>

      <t>As a mitigation of this problem, this document recommends that
      testing with protocols using TCP (like HTTP and HTTPS up to version 2)
      be performed as described in <xref target="RFC9411"
      format="default"/>.  This approach also solves the potential problem of
      protocol helpers that may be present in the stateful DUT.</t>

      <t>As for HTTP/3, it uses QUIC, which uses UDP as stated above. It
      should be noted that QUIC is treated as any other UDP payload. The
      proposed measurement method does not aim to measure the performance of
      QUIC; rather, it aims to measure the performance of the stateful NATxy
      gateway.</t>
    </section>

   <section anchor="IANA" numbered="true" toc="default">
      <name>IANA Considerations</name>
      <t>This document has no IANA actions.</t>
    </section>

    <section anchor="Security" numbered="true" toc="default">
      <name>Security Considerations</name>
      <t>This document has no further security considerations beyond those of
      <xref target="RFC8219" format="default"/>.  They are cited here so
      that they can be applied not only to the benchmarking of IPv6 transition
      technologies but also to the benchmarking of any stateful NATxy
      gateways (allowing for x=y, too).</t>
    </section>
  </middle>
 <back>

   <displayreference target="I-D.lencse-v6ops-transition-scalability" to="SCALABILITY"/>
   <references>
      <name>References</name>
      <references>
        <name>Normative References</name>

        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"/>
        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.1918.xml"/>
        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.2544.xml"/>
        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.3022.xml"/>
        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.4340.xml"/>
        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.4814.xml"/>
        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.5180.xml"/>
        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.6146.xml"/>
        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.7599.xml"/>
        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml"/>
        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8219.xml"/>
        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.9000.xml"/>
        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.9260.xml"/>
        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.9411.xml"/>
      </references>
      <references>
        <name>Informative References</name>

        <xi:include href="https://datatracker.ietf.org/doc/bibxml3/draft-lencse-v6ops-transition-scalability.xml"/>

        <reference anchor="DUST1964" target="https://dl.acm.org/doi/pdf/10.1145/364520.364540">
          <front>
            <title>Algorithm 235: Random permutation
            </title>
            <author initials="R." surname="Durstenfeld">
              <organization/>
            </author>
            <date month="July" year="1964"/>
          </front>
          <refcontent>Communications of the ACM, vol. 7, no. 7, p. 420</refcontent>
          <seriesInfo name="DOI" value="10.1145/364520.364540"/>
        </reference>

        <reference anchor="IIR2020" target="https://www.iij.ad.jp/en/dev/iir/pdf/iir_vol49_report_EN.pdf">
          <front>
            <title>Periodic Observation Report: Internet Trends as Seen from IIJ Infrastructure - 2020
            </title>
            <author initials="T." surname="Kurahashi">
              <organization/>
            </author>
            <author initials="Y." surname="Matsuzaki">
              <organization/>
            </author>
            <author initials="T." surname="Sasaki">
              <organization/>
            </author>
            <author initials="T." surname="Saito">
              <organization/>
            </author>
            <author initials="F." surname="Tsutsuji">
              <organization/>
            </author>
            <date month="December" year="2020"/>
          </front>
          <refcontent>Internet Initiative Japan Inc.</refcontent>
          <refcontent>Internet Infrastructure Review, vol. 49</refcontent>
        </reference>

        <reference anchor="LEN2015" target="https://www.hit.bme.hu/~lencse/publications/e98-b_8_1580.pdf">
          <front>
            <title>Estimation of the Port Number Consumption of Web Browsing
            </title>
            <author initials="G." surname="Lencse">
              <organization/>
            </author>
            <date month="August" year="2015"/>
          </front>
          <refcontent>IEICE Transactions on Communications, vol. E98-B, no. 8, pp. 1580-1588</refcontent>
          <seriesInfo name="DOI" value="10.1587/transcom.E98.B.1580"/>
        </reference>

        <reference anchor="LEN2020" target="http://ijates.org/index.php/ijates/article/view/291">
          <front>
            <title>Adding RFC 4814 Random Port Feature to Siitperf: Design, Implementation and Performance Estimation
            </title>
            <author initials="G." surname="Lencse">
              <organization/>
            </author>
            <date month="November" year="2020"/>
          </front>
          <refcontent>International Journal of Advances in Telecommunications, Electrotechnics, Signals and Systems, vol. 9, no. 3, pp. 18-26</refcontent>
          <seriesInfo name="DOI" value="10.11601/ijates.v9i3.291"/>
        </reference>

        <reference anchor="LEN2022" target="https://www.sciencedirect.com/science/article/pii/S0140366422001803">
          <front>
            <title>Design and Implementation of a Software Tester for Benchmarking Stateful NATxy Gateways: Theory and Practice of Extending Siitperf for Stateful Tests
            </title>
            <author initials="G." surname="Lencse">
              <organization/>
            </author>
            <date month="August" year="2022"/>
          </front>
          <refcontent>Computer Communications, vol. 192, pp. 75-88</refcontent>
          <seriesInfo name="DOI" value="10.1016/j.comcom.2022.05.028"/>
        </reference>

        <reference anchor="LEN2023" target="https://www.sciencedirect.com/science/article/pii/S0140366423002931">
          <front>
            <title>Benchmarking methodology for stateful NAT64 gateways
            </title>
            <author initials="G." surname="Lencse">
              <organization/>
            </author>
            <author initials="K." surname="Shima">
              <organization/>
            </author>
            <author initials="K." surname="Cho">
              <organization/>
            </author>
            <date month="October" year="2023"/>
          </front>
          <refcontent>Computer Communications, vol. 210, pp. 256-272</refcontent>
          <seriesInfo name="DOI" value="10.1016/j.comcom.2023.08.009"/>
        </reference>

        <reference anchor="LEN2024a" target="https://www.sciencedirect.com/science/article/pii/S0140366424000999">
          <front>
            <title>Benchmarking methodology for IPv4aaS technologies: Comparison of the scalability of the Jool implementation of 464XLAT and MAP-T
            </title>
            <author initials="G." surname="Lencse">
              <organization/>
            </author>
            <author initials="Á." surname="Bazsó">
              <organization/>
            </author>
            <date month="April" year="2024"/>
          </front>
          <refcontent>Computer Communications, vol. 219, pp. 243-258</refcontent>
          <seriesInfo name="DOI" value="10.1016/j.comcom.2024.03.007"/>
        </reference>

        <reference anchor="LEN2024b" target="https://www.sciencedirect.com/science/article/abs/pii/S0140366424001993">
          <front>
            <title>Making stateless and stateful network performance measurements unbiased
            </title>
            <author initials="G." surname="Lencse">
              <organization/>
            </author>
            <date month="September" year="2024"/>
          </front>
          <refcontent>Computer Communications, vol. 225, pp. 141-155</refcontent>
          <seriesInfo name="DOI" value="10.1016/j.comcom.2024.05.018"/>
        </reference>

        <reference anchor="SIITPERF" target="https://github.com/lencsegabor/siitperf">
          <front>
            <title>Siitperf: An RFC 8219 compliant SIIT and stateful NAT64/NAT44 tester
            </title>
            <author>
              <organization/>
            </author>
            <date month="September" year="2023"/>
          </front>
          <refcontent>commit 165cb7f</refcontent>
        </reference>
      </references>
    </references>

    <section anchor="Acknowledgements" numbered="false" toc="default">
      <name>Acknowledgements</name>

      <t>The authors would like to thank <contact fullname="Al Morton"/>,
      <contact fullname="Sarah Banks"/>, <contact fullname="Edwin Cordeiro"/>,
      <contact fullname="Lukasz Bromirski"/>, <contact fullname="Sándor
      Répás"/>, <contact fullname="Tamás Hetényi"/>, <contact
      fullname="Timothy Winters"/>, <contact fullname="Eduard Vasilenko"/>,
      <contact fullname="Minh Ngoc Tran"/>, <contact fullname="Paolo
      Volpato"/>, <contact fullname="Zeqi Lai"/>, and <contact
      fullname="Bertalan Kovács"/> for their comments.</t>
      <t>The authors thank <contact fullname="Warren Kumari"/>, <contact
      fullname="Michael Scharf"/>, <contact fullname="Alexey Melnikov"/>,
      <contact fullname="Robert Sparks"/>, <contact fullname="David Dong"/>,
      <contact fullname="Roman Danyliw"/>, <contact fullname="Erik Kline"/>,
      <contact fullname="Murray Kucherawy"/>, <contact fullname="Zaheduzzaman
      Sarker"/>, and <contact fullname="Éric Vyncke"/> for their reviews and
      comments.</t>
      <t>This work was supported by the Japan Trust International Research
      Cooperation Program of the National Institute of Information and
      Communications Technology (NICT), Japan.</t>
    </section>
  </back>
</rfc>
