Internet-Draft APVPF July 2024
Lim, et al. Expires 9 January 2025 [Page]
Workgroup:
avtcore
Internet-Draft:
draft-lim-rtp-apv-00
Published:
Intended Status:
Standards Track
Expires:
Authors:
Y. Lim
Samsung Electronics
M. Park
Samsung Electronics
M. Budagavi
Samsung Electronics
R. Joshi
Samsung Electronics
K. Choi
Samsung Electronics

RTP payload format for APV

Abstract

This document describes RTP payload format for bitstream encoded with Advanced Professional Video (APC) codec.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 9 January 2025.

Table of Contents

1. Introduction

This document defines the RTP payload format for bitstream encoded with Advanced Professional Video (APV) codec [I-D.lim-apv]. This document defines how to packetize bitstream encoded with APV codec and set the payload header fields. This document also defines how to set the fields of RTP payload header when it carries bitstream encoded with APV codec.The APV codec is a professional video codec that was developed in response to the need for high quality video recording and post production.

The primary purpose of the APV codec is for use in professional video recording and editing workflows for various types of content. The APV codec supports the following features:

2. Terms

2.1. Terms and definitions

  • coded frame: a coded representation of a frame containing all macroblocks of the frame

  • coded representation: a data element as represented in its coded form

  • decoder: an embodiment of a decoding process

  • decoding process: a process specified that reads a bitstream and derives decoded frames from it

  • encoder: an embodiment of an encoding process

  • encoding process: a process that produces a bitstream conforming to [I-D.lim-apv]

  • frame: an array of luma samples and two corresponding arrays of chroma samples in 4:2:2, and 4:4:4 color format

  • Frame Data: a syntax structure containing coded representation of a frame

  • Level: a defined set of constraints on the values that may be taken by the syntax elements and variables of this document, or the value of a transform coefficient prior to scaling

  • MB (macroblock): square block of luma samples and two corresponding blocks of chroma samples of a frame

  • Partitioning: a division of a set into subsets such that each element of the set is in exactly one of the subsets

  • Profile: a specified subset of the syntax of this document

  • syntax element: an element of data represented in the bitstream

  • syntax structure: zero or more syntax elements present together in the bitstream in a specified order

  • tile: a rectangular region of MBs within a particular tile column and a particular tile row in a frame

3. Conventions used in this document

3.1. General

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

4. Overview of APV

4.1. Overview of APV coding tools

The APV codec encodes each frame individually from other frames so that there are no coding dependencies among the frames. A frame is divided into one or more rectangular tiles. Each tiles are also encoded independently from other tiles so that parallel processing of tiles is possible. A tile is further divided into a 16 pixel x 16 pixel size macroblock whch include 4 transform blocks of 8 pixel x 8 pixel. Each transform block is transformed using a fixed point DCT and then transformed coefficients are quantized using uniform scalar quantizer. A prediction is applied to the quantized coefficients in the frequency domain. Finally, entropy coding specially designed to support very high throughput is applied.

4.2. Structure of APV frame data

As the APV codec encodes each frame independently from other frames, the coded data of each individual frame, frame data, is selfcontained. The frame data is consisted of a frame header, a series of encoded tile data, a metadata and a filler data as shown in Figure 1. Metadata and filler data are optional. An encoded tile data is consist of tile_size_minus1 field and tile() syntax structure defined in section 5.3.7 of [I-D.lim-apv].

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|frame header|tile data|tile data|...|tile data| metadata |filler data|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 1: A structure of APV frame data

4.2.1. Frame Header

Frame header as defined in section 5.3.2 of [I-D.lim-apv] provides basic information for decoder configuration and bitstream processing. It includes profile and level of the required decoder, width and height of frame, chroma format and bit depths of pixel data and so on. It also provide distance of capture time between the previously encoded frame and the current frame to indirectly indicate frame rates of encoded video.

4.2.2. Tile Data

Tile data consist of a tile_size_minus1 field and tile() syntax structure. tile() syntax structure includes tile_header and encoded data of Y, Cb and Cr components for the tile. Each tile data contains the tile offset information that allows direct access to any tile for random access and parallel decoding of tiles.

4.2.3. Metadata

Various metadata can be optionally included in each frame through metadata() syntax structure defined in section 5.3.5 of [I-D.lim-apv]. For example, color description or HDR information can be included as metadata

4.2.4. Filler Data

Filler data can be optionally added at the end of encoded frame data as needed. It will be a series of 0xFF as defined in section 5.3.6 of [I-D.lim-apv].

5. Packetization rules

5.1. General rules

The start of an APV Frame Data MUST be alinged with the start of the payload of an RTP packet. The first byte of an APV Frame Data MUST be the first byte of the RTP packet payload after the payload header. There MUST be no RTP packet carrying data from two different APV Frame Data.

5.2. Simple mode

In this mode an APV Frame Data can be fragmented anywhere. The payload of RTP packets does not have to be aligned with begining or end of any particular internal data structure of an APV Frame Data. An example of this mode is shown in Figure 2

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|frame header |tile data|tile data|...|tile data|metadata |filler_data|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             |             \            /           |                |
|             |              \          /            |\               |
|             |\              \        /             | \              |
|             | \              \      /              |  \             |
+-+-+-+-+-+-+-+  +-+-+-+-+-+-+-+     +-+-+-+-+-+-+-+-+  +-+-+-+-+-+-+-+
|RTP packet #1|  |RTP packet #2| ... |RTP packet #k-1|  |RTP packet #k|
+-+-+-+-+-+-+-+  +-+-+-+-+-+-+-+     +-+-+-+-+-+-+-+-+  +-+-+-+-+-+-+-+

Figure 2: Example of simple mode

5.3. Low delay mode

In this mode the begining of tile data of an APV Frame Data MUST be aligned with the begining of RTP packet payloads. The first byte of the tile_size_minus1 field MUST be the first byte of a RTP packet payload after payload header except the first one following frame_header. Metadata and filler data can be added to the payload after the last tile data of a frame data. An example of this mode is shown in Figure 3

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|frame_header |tile data|tile data|...|tile data|metadata |filler_data|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                |      \             |              |                |
|               /|        \           |              |\               |
|              / |          \         |              | \              |
|             /  |             \     /               |  \             |
+-+-+-+-+-+-+-+  +-+-+-+-+-+-+-+     +-+-+-+-+-+-+-+-+  +-+-+-+-+-+-+-+
|RTP packet #1|  |RTP packet #2| ... |RTP packet #k-1|  |RTP packet #k|
+-+-+-+-+-+-+-+  +-+-+-+-+-+-+-+     +-+-+-+-+-+-+-+-+  +-+-+-+-+-+-+-+

Figure 3: Example of low delay mode

In this mode, frame header can be repeated in any packet containig the last part of a frame or a tile. When frame header is repeated it MUST be placed immediately after the end of tile data or filler data, if exist.

5.4. RTP Header Usage

The format of the RTP header is specified in [RFC3550] as reprinted below for convenience. This payload format uses the fields of the header in a manner consistent with that specification.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |            contributing source (CSRC) identifiers             |
   |                             ....                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 4: RTP Header According to [RFC3550]

The RTP header information to be set according to this RTP payload format as follows and the usage of the fields not specified in this section follows the rules defined in [RFC3550] :

Marker bit (M): 1 bit

  • set to 1 for the last packet of a frame.

Timestamp: 32 bits

  • The RTP timestamp is set to the sampling timestamp of a frame. A 90 kHz clock rate MUST be used. The RTP packets containing the data belong to a single frame MUST have same value for this field.

5.5. Payload Header Usage

Each packet carries APV encoded bitstream MUST have a payload header as shown in Figure 5.

    0                   1                   2
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=0|OM |PT |H|S|     FRAGMENT COUNTER (FC)     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 5: Payload Header

Version (V) : 2 bits

  • This field indicates the version of the payload header. The version of the header shown in Figure 5 MUST have '0' as the value of this field.

Operation Mode (OM) : 2 bits

  • This field indicates which operation mode is used for packetization of the bitstream.

    • 00b : reserved

    • 11b : reserved

Payload Type (PT) : 2 bits

  • This field indicates the type of payload. Depending on the packetization mode the semantics of this field is slightly different. When a single packet carries entire frame in simple mode or tile in low delay mode then this field MUST be set to 01b.

    • For simple mode (OM == '01b')

        • 00b: neither the first payload nor the last payload

        • 01b: the last payload of a frame

        • 10b: the first payload of a frame

        • 11b: reserved

    • For low delay mode (OM == '10b')

        • 00b: neither the first payload nor the last payload

        • 01b: the last payload of a tile

        • 10b: the first payload of a tile

        • 11b: reserved

Frame Header repeated (H) : 1 bit

  • This field indicates that the frame header data is repeated in this payload. When the value of this field is equal to '1', the payload carries frame header data. The value of this field MUST NOT to be set to '1' when a payload carries a frame header at the begnining of a frame data. The value of this field MUST be set to 1 when the value of OM field is equal to 10b and the value of PT field is either 00b or 01b and the payload carries a copy of the frame header already sent. When the value of the OM field is equal to 10b and the value of this field is equal to '1', the payload includes a copy of frame header data after the end of tile data. If the payload carries the data from the last tile of a frame and there are metadata or filler data to be carried then the copy of a frame header data is carried after any existing metadata and filler data. When the value of the OM field is equal to 01b then the value of this field is ignored.

Static Frame Header (S) : 1 bit

  • This field indicates the values of frame header data is identical except the value of capture_time_distance field with the immediately preceding frame data sent.

Fragment Counter (FC) : 16 bit

  • This field indicates the number of remaining payload excluding the current one carrying the current frame or tile, depending on the operation mode. When the value of the Operation Mode field is '01b', then the value of this field indicates the number of payload carrying the bitstream from a same frame. When the value of the Operation Mode field is '10b', then the value of this field indicates the number of payload carrying the bitstream from a same tile. Metadata or filler data, when they exist, are considered as an integral part of a frame or tile. When the value of the H field is equal to '1', the frame header repeated after the end of a tile data is considered as an integral part of the data. The value of this field carrying the last part of frame or tile will be '0'.

6. Payload Format Parameters

6.1. Media Type Registration

The receiver MUST ignore any parameter unspecified in this document.

Type name: video

Subtype name: apv

Required parameters: N/A

Optional parameters: profile-id, level-id

Encoding considerations:

  • This type is only defined for transfer via RTP (RFC 3550).

Security considerations:

Interoperability considerations: N/A

Published specification:

Applications that use this media type:

  • Any application that relies on APV encoded video delivery over RTP

Fragment identifier considerations: N/A

Additional information: N/A

Person & email address to contact for further information:

  • Youngkwon Lim (yklwhite@gmail.com)

Intended usage: COMMON

Restrictions on usage: N/A

Author: See Authors' Addresses section of RFC XXXX.

Change controller:

  • IETF <avtcore@ietf.org>

6.1.1. Definition of optional parameters

profile-id:

  • When profile-id is not present, a value of 33 (i.e., the Baseline profile) MUST be inferred.

  • When used to indicate properties of a bitstream, profile-id MUST be derived from the profile_idc in the frame header. When there are more than one value of profile_idc field are found from frame headers then the largest value among them MUST be used.

  • APV encoded data transported over RTP using the technologies of this document SHOULD refer only to frame header that have the same or smaller value in profile_idc.

level-id:

  • When level-id is not present, a value of 153 (corresponding to level 5.1, the highest level) MUST be inferred.

  • When used to indicate properties of a bitstream, level-id MUST be derived from the level_idc in the level_idc in the frame header. When there are more than one value of profile_idc field are found from frame headers then the largest value among them MUST be used.

  • For either receiving or sending, all levels that are lower than the indicated level MUST also be supported.

6.2. SDP Parameters

The receiver MUST ignore any parameter unspecified in this document.

6.2.1. Mapping of Payload Type Parameters to SDP

When Session Description Protocol (SDP) [RFC8866] is used to describe the sessions using this payload format the mapping is done as follows:

  • The media name in the "m=" line of SDP MUST be video.

  • The encoding name in the "a=rtpmap" line of SDP MUST be apv (the media subtype).

  • The clock rate in the "a=rtpmap" line MUST be 90000.

  • The optional parameters profile-id and level-id, when present, MUST be included in the "a=fmtp" line of SDP. The fmtp line is expressed as a media type string, in the form of a semicolon-separated list of parameter=value pairs.

As main application area of APV is high quality video capturing and editing, it is expected that generally one way APV session is offered over RTP using SDP in a declarative stye. All parameters are used to indicate only bitstream properties. For example, in this case, the parameters profile-id and level-id declare the values used by the bitstream, not the capabilities for receiving bitstreams. An example of media representation in SDP for such case is as follows:

        m=video 49170 RTP/AVP 98
        a=rtpmap:98 apv/90000
        a=fmtp:98 profile-id=30; level_id=153;

The above represents a stream of data using [I-D.lim-apv] and its payload specification at the baseline profile and level 5.1.

It is not expected that [I-D.lim-apv] is offered over RTP using SDP in and Offer/Answer model with negotiation.

7. Congestion Control

Congestion control for RTP SHALL be used in accordance with RTP [RFC3550] and with any applicable RTP profile, e.g., AVP [RFC3551]. If best-effort service is being used, an additional requirement is that users of this payload format MUST monitor packet loss to ensure that the packet loss rate is within an acceptable range. Packet loss is considered acceptable if a TCP flow across the same network path and experiencing the same network conditions would achieve an average throughput, measured on a reasonable timescale, that is not less than all RTP streams combined are achieved. This condition can be satisfied by implementing congestion-control mechanisms to adapt the transmission rate, by implementing the number of layers subscribed for a layered multicast session, or by arranging for a receiver to leave the session if the loss rate is unacceptably high.

The bitrate adaptation necessary for obeying the congestion control principle is easily achievable when real-time encoding is used, for example, by adequately tuning the quantization parameter. However, when pre-encoded content is being transmitted, bandwidth adaptation requires the pre-coded bitstream to be tailored for such adaptivity. Regardless of the method used for bandwidth adpatation, the resulting bitstream MUST be compliant with [I-D.lim-apv].

8. Security Considerations

The scope of this section is limited to the payload format itself and to one feature of [I-D.lim-apv] that may pose a particularly serious security risk if implemented naively. Implementers are advised to read and understand relevant security-related documents, especially those pertaining to RTP (see the Security Considerations section in [RFC3550]). Implementers should also consider known security vulnerabilities of video coding and decoding implementations in general and avoid those.

Within this RTP payload format no security threats other than those common to RTP payload formats are known. In other words, neither the various media-plane-based mechanisms nor the signaling part of this document seem to pose a security risk beyond those common to all RTP-based systems.

Because the data compression used with this payload format is applied end-to-end, any encryption needs to be performed after compression. A potential denial-of-service threat exists for data encodings using compression techniques that have non-uniform receiver-end computational load. The attacker can inject pathological datagrams into the bitstream that are complex to decode and that cause the receiver to be overloaded.

APV data can include user-data as a part of metadata. [I-D.lim-apv] does not specify how to process such data. Depending on the user-data, it might be possible for a sender to generate user-data in a manner to crash the receiver. Receivers must ensure that it knows the format of user-data and trust the sender before it process user-data. In any case, processing of user-data is not required for decoding of APV data. So, receivers does not have to try to process unknown user-data.

9. IANA Considerations

A new media type, as specified in Section 6.1 of this document, is to be registered with IANA.

10. References

10.1. Normative References

[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, , <https://www.rfc-editor.org/rfc/rfc2119>.
[RFC3550]
Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550, , <https://www.rfc-editor.org/rfc/rfc3550>.
[RFC3551]
Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video Conferences with Minimal Control", STD 65, RFC 3551, DOI 10.17487/RFC3551, , <https://www.rfc-editor.org/rfc/rfc3551>.
[RFC8866]
Begen, A., Kyzivat, P., Perkins, C., and M. Handley, "SDP: Session Description Protocol", RFC 8866, DOI 10.17487/RFC8866, , <https://www.rfc-editor.org/rfc/rfc8866>.

10.2. Informative References

[I-D.lim-apv]
Lim, Y., Park, M., Budagavi, M., Joshi, R., and K. P. Choi, "Advance Professional Video", Work in Progress, Internet-Draft, draft-lim-apv-00, , <https://datatracker.ietf.org/doc/html/draft-lim-apv-00>.

Authors' Addresses

Youngkwon Lim
Samsung Electronics
6105 Tennyson Pkwy, Ste 300
Plano, TX, 75024
United States of America
Minwoo Park
Samsung Electronics
34, Seongchon-gil, Seocho-gu
Seoul
3573
Republic of Korea
Madhukar Budagavi
Samsung Electronics
6105 Tennyson Pkwy, Ste 300
Plano, TX, 75024
United States of America
Rajan Joshi
Samsung Electronics
11488 Tree Hollow Ln
San Diego, CA, 92128
United States of America
Kwang Pyo Choi
Samsung Electronics
34, Seongchon-gil, Seocho-gu
Seoul
3573
Republic of Korea