ONVIF audio and video playback specification

1. RTSP usage
The replay protocol is based on RTSP [RFC 2326]. However, because RTSP does not directly support many of the features required by CCTV applications, this standard defines several extensions to the protocol; these are detailed below.

This standard makes the following stipulations on the usage of RTSP:

1. RTP/RTSP/HTTP/TCP shall be supported by the server. This is the same transport that a device implementing media streaming through the media service shall support, and the same requirements shall apply to replay streaming.
2. The server shall support the unicast RTP/UDP transport for streaming.
3. Clients should use a TCP-based transport for replay, in order to achieve reliable delivery of media packets.
4. The server MAY elect not to send RTCP packets during replay. In typical usage RTCP packets are not required, because usually a reliable transport will be used, and because absolute time information is sent within the stream, making the timing information in RTCP sender reports redundant.

2. RTSP describe
The SDP returned by the RTSP describe command shall include the TrackReference for each track of the recording to allow a client to map the tracks presented in the SDP to tracks of the recording. The tag shall use the following format:

        a=x-onvif-track:<TrackReference>

For example:
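
A client-side reading of this attribute can be sketched in Python. This is a minimal illustration, not a normative parser: the SDP body, the addresses, the track names and the function name `track_references` are all assumptions made for the example. The pattern accepts both the standard SDP `a=` attribute syntax and the `a:` spelling.

```python
import re

# Matches the ONVIF track attribute. Both "a=" (standard SDP attribute
# syntax) and "a:" are accepted by this sketch.
_TRACK_ATTR = re.compile(r"^a[=:]x-onvif-track:(\S+)\s*$")


def track_references(sdp: str) -> list[tuple[str, str]]:
    """Return (media type, TrackReference) pairs in SDP order."""
    tracks: list[tuple[str, str]] = []
    current_media = None
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            # Media description: "m=<media> <port> <proto> <fmt>"
            current_media = line[2:].split()[0]
            continue
        match = _TRACK_ATTR.match(line)
        if match and current_media is not None:
            tracks.append((current_media, match.group(1)))
    return tracks


# Hypothetical SDP body; every value below is made up for illustration.
EXAMPLE_SDP = """\
v=0
o=- 0 0 IN IP4 192.0.2.1
s=Replay
m=video 0 RTP/AVP 26
a=control:rtsp://192.0.2.1/video
a=x-onvif-track:VIDEO001
m=audio 0 RTP/AVP 0
a=control:rtsp://192.0.2.1/audio
a=x-onvif-track:AUDIO001
"""

tracks = track_references(EXAMPLE_SDP)
```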



3. RTP header extension
In order to allow clients to report a stable and accurate timestamp for each frame played back regardless of the direction of playback, it is necessary to associate an absolute timestamp with each packet, or each group of packets with the same RTP timestamp (e.g. a video frame). This is achieved using an RTP header extension containing an NTP timestamp and some additional information also useful for replay.
The replay mechanism uses the extension ID 0xABAC for the replay extension.
The general form of an RTP packet containing this extension is shown below:



The fields of this extension are as follows:

•  NTP timestamp. An NTP [RFC 1305] timestamp indicating the absolute UTC time associated with the access unit.
•  C: 1 bit. Indicates that this access unit is a synchronization point or "clean point", e.g. the start of an intra-coded frame in the case of video streams.
•  E: 1 bit. Indicates the end of a contiguous section of recording. The last access unit in each track before a recording gap, or at the end of available footage, shall have this bit set. When replaying in reverse, the E flag shall be set on the last frame at the end of the contiguous section of recording.
•  D: 1 bit. Indicates that this access unit follows a discontinuity in transmission. It is primarily used during reverse replay; the first packet of each GOP has the D bit set since it does not chronologically follow the previous packet in the data stream.
•  T: 1 bit. Indicates that this is the terminal frame on playback of a track. A device should signal this flag in both forward and reverse playback whenever no more data is available for a track.
•  mbz: This field is reserved for future use and must be zero.
•  Cseq: 1 byte. This is the low-order byte of the CSeq value used in the RTSP PLAY command that was used to initiate transmission. When a client sends multiple consecutive PLAY commands, this value may be used to determine where the data from each new PLAY command begins.

The replay header extension shall be present in the first packet of every access unit (e.g. video frame).
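
The field list above can be turned into a small parser. The following Python sketch is illustrative only and rests on stated assumptions: the extension content is taken to be three 32-bit words (a 64-bit NTP timestamp followed by one word of flags, Cseq and padding), and the C, E, D and T flags are assumed to occupy the four most significant bits of the flags byte in the order listed. The class and function names are hypothetical.

```python
import struct
from dataclasses import dataclass
from typing import Optional

REPLAY_EXTENSION_ID = 0xABAC


@dataclass
class ReplayExtension:
    ntp_seconds: int      # upper 32 bits of the NTP timestamp
    ntp_fraction: int     # lower 32 bits of the NTP timestamp
    clean_point: bool     # C
    end_of_section: bool  # E
    discontinuity: bool   # D
    terminal: bool        # T
    cseq: int             # low-order byte of the PLAY CSeq


def parse_replay_extension(packet: bytes) -> Optional[ReplayExtension]:
    """Return the replay extension of an RTP packet, or None if absent."""
    if len(packet) < 12:
        return None
    has_extension = bool(packet[0] & 0x10)  # X bit of the RTP header
    csrc_count = packet[0] & 0x0F           # CC field
    if not has_extension:
        return None
    offset = 12 + 4 * csrc_count            # skip fixed header and CSRC list
    if len(packet) < offset + 4 + 12:
        return None
    ext_id, length_words = struct.unpack_from("!HH", packet, offset)
    if ext_id != REPLAY_EXTENSION_ID or length_words < 3:
        return None
    seconds, fraction, flags, cseq = struct.unpack_from("!IIBB", packet, offset + 4)
    return ReplayExtension(
        ntp_seconds=seconds,
        ntp_fraction=fraction,
        clean_point=bool(flags & 0x80),     # assumed bit position
        end_of_section=bool(flags & 0x40),  # assumed bit position
        discontinuity=bool(flags & 0x20),   # assumed bit position
        terminal=bool(flags & 0x10),        # assumed bit position
        cseq=cseq,
    )


# Synthetic packet: V=2, X=1, CC=0, payload type 96, flags C and D set.
rtp_header = bytes([0x90, 96]) + struct.pack("!HII", 1, 0, 0x1234)
extension = struct.pack("!HHIIBBH", 0xABAC, 3, 3900000000, 0x80000000, 0xA0, 7, 0)
info = parse_replay_extension(rtp_header + extension + b"payload")
```

Because the extension is only required in the first packet of each access unit, a caller would typically treat a `None` result as "continuation packet" rather than as an error.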

3.1 NTP Timestamps
The NTP timestamps in the RTP extension header shall correspond to the wallclock time as measured at the original frame grabber before encoding of the stream.
For forward playback of I and P frames, the NTP timestamps in the RTP extension header shall increase monotonically over successive packets within a single RTP stream.
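
As an illustration of how a client might use these timestamps, the Python sketch below converts a 64-bit NTP timestamp (seconds and fraction since 1 January 1900) to a UTC `datetime` and checks a sequence for ordering. The function names are hypothetical. The check treats "monotonically" as non-decreasing, which is an assumption; a stricter reading would use `<` instead of `<=`. Timestamps are assumed to lie in NTP era 0 (before 2036).

```python
from datetime import datetime, timedelta, timezone
from typing import Sequence

# NTP timestamps count seconds from 1900-01-01 00:00:00 UTC.
NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)


def ntp_to_datetime(seconds: int, fraction: int) -> datetime:
    """Convert NTP seconds and a 32-bit binary fraction to a UTC datetime."""
    microseconds = fraction * 1_000_000 // 2**32
    return NTP_EPOCH + timedelta(seconds=seconds, microseconds=microseconds)


def is_forward_ordered(timestamps: Sequence[datetime]) -> bool:
    """True if the timestamps never go backwards (non-decreasing)."""
    return all(a <= b for a, b in zip(timestamps, timestamps[1:]))


unix_epoch = ntp_to_datetime(2208988800, 0)
half_second_later = ntp_to_datetime(2208988800, 0x80000000)
```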

3.2 Compatibility with the JPEG header extension
The replay header extension may co-exist with the header extension used by the JPEG RTP profile; this is necessary to allow replay of JPEG streams that use this extension. The JPEG extension is simply appended to the replay extension; its presence is indicated by an RTP header extension length field with a value greater than 3, and by the extension start codes of 0xFFD8 or 0xFFFF at the start of the fourth word of the extension content.
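
The detection rule described above (length greater than 3, and a start code at the beginning of the fourth word of the extension content) can be sketched as follows. This is a hedged illustration in Python: the function name is hypothetical, the input is assumed to begin at the 16-bit extension ID field, and the test data is synthetic rather than a real JPEG extension.

```python
import struct

REPLAY_EXTENSION_ID = 0xABAC
JPEG_START_CODES = (0xFFD8, 0xFFFF)


def has_jpeg_extension(extension: bytes) -> bool:
    """Check whether a JPEG extension is appended to the replay extension.

    `extension` starts at the extension ID field, i.e. it includes the
    4-byte extension header (ID and length in 32-bit words).
    """
    if len(extension) < 4:
        return False
    ext_id, length_words = struct.unpack_from("!HH", extension, 0)
    if ext_id != REPLAY_EXTENSION_ID or length_words <= 3:
        return False
    content = extension[4:4 + 4 * length_words]
    if len(content) < 14:
        return False
    # The fourth word of the content starts 12 bytes in (after 3 words).
    (start_code,) = struct.unpack_from("!H", content, 12)
    return start_code in JPEG_START_CODES


# Synthetic data: replay extension only, and replay extension plus one word.
replay_only = struct.pack("!HHIII", 0xABAC, 3, 0, 0, 0)
with_jpeg = struct.pack("!HHIIIHH", 0xABAC, 4, 0, 0, 0, 0xFFD8, 0)
```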

The following illustrates a JPEG packet that uses both extensions:



4. RTSP Feature Tag
The Replay Service uses the "onvif-replay" feature tag to indicate that it supports the RTSP extensions described in this standard. This allows clients to query the server's support for these extensions using the Require header as described in [RFC 2326] section 5.3.1.



The Replay Server shall accept SETUP and PLAY commands that include a Require header containing the onvif-replay feature tag.
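
A client request carrying the feature tag might be assembled as in the Python sketch below. The helper `build_rtsp_request`, the URL and the Transport value are assumptions chosen for illustration; only the `Require: onvif-replay` header comes from this section.

```python
from typing import Mapping


def build_rtsp_request(method: str, url: str, cseq: int,
                       headers: Mapping[str, str]) -> str:
    """Serialize an RTSP/1.0 request with CRLF line endings and no body."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    lines.extend(f"{name}: {value}" for name, value in headers.items())
    return "\r\n".join(lines) + "\r\n\r\n"


setup_request = build_rtsp_request(
    "SETUP",
    "rtsp://192.0.2.1/replay/track1",  # hypothetical replay URI
    1,
    {
        "Transport": "RTP/AVP/TCP;unicast;interleaved=0-1",
        "Require": "onvif-replay",
    },
)
```

A server that does not recognize the tag is expected to reject the request as described in [RFC 2326], which lets the client detect missing support before starting playback.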

5. Initiating Playback
Playback is initiated by means of the RTSP PLAY method. For example:



The ReversePlayback capability defined in the ONVIF Replay Control Service Specification signals whether a device supports reverse playback. Reverse playback is indicated using the Scale header field with a negative value. For example, to play in reverse with no data loss, a value of -1.0 would be used.



If a device supports reverse playback it shall accept a Scale header with a value of -1.0. A device MAY accept other values for the Scale parameter. Unless the Rate-Control header is set to "no" (see below), the Scale parameter is used in the manner described in [RFC 2326]. If Rate-Control is set to "no", the Scale parameter, if it is present, shall be either 1.0 or -1.0, to indicate forward or reverse playback respectively. If it is not present, forward playback is assumed.
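
The interaction between Scale and Rate-Control can be sketched as a validation step. The Python function below is a hypothetical illustration of the rules in the preceding paragraph; the name `effective_scale` and the use of `ValueError` to signal rejection are assumptions. Values permitted by [RFC 2326] under rate control are passed through unchanged.

```python
from typing import Optional


def effective_scale(scale: Optional[float], rate_control: str = "yes",
                    reverse_supported: bool = True) -> float:
    """Return the scale to apply, or raise ValueError if it is not allowed."""
    if rate_control not in ("yes", "no"):
        raise ValueError("Rate-Control must be 'yes' or 'no'")
    if scale is None:
        # No Scale header: forward playback is assumed.
        return 1.0
    if rate_control == "no" and scale not in (1.0, -1.0):
        raise ValueError("with Rate-Control=no, Scale must be 1.0 or -1.0")
    if scale < 0 and not reverse_supported:
        raise ValueError("reverse playback is not supported by this device")
    return float(scale)
```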

5.1 Range header field
A device shall support the Range field expressed using absolute times as defined by [RFC 2326]. Absolute times are expressed using the utc-range from [RFC 2326].
Either open or closed ranges may be used. In the case of a closed range, the range is increasing (end time later than start time) for forward playback and decreasing for reverse playback. The direction of the range shall correspond to the value of the Scale header.
In all cases, the first point of the range indicates the starting point for replay.
The time itself shall be given as

    utc-range = "clock" ["=" utc-range-spec]
    utc-range-spec = ( utc-time "-" [ utc-time ] ) / ( "-" utc-time )
    utc-time = utc-date "T" utc-clock "Z"
    utc-date = 8DIGIT
    utc-clock = 6DIGIT [ "." 1*9DIGIT ]

as defined in [RFC 2326].

Examples:
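
Range values of this form can be generated from ordinary timestamps. The Python sketch below formats open and closed ranges according to the grammar above; the function names and the sample times are assumptions made for illustration.

```python
from datetime import datetime, timezone
from typing import Optional


def utc_time(t: datetime) -> str:
    """Format a timezone-aware datetime as utc-time (YYYYMMDDThhmmss[.f]Z)."""
    if t.tzinfo is None:
        raise ValueError("a timezone-aware datetime is required")
    t = t.astimezone(timezone.utc)
    text = t.strftime("%Y%m%dT%H%M%S")
    if t.microsecond:
        # Optional fraction, with trailing zeros removed.
        text += f".{t.microsecond:06d}".rstrip("0")
    return text + "Z"


def range_header(start: datetime, end: Optional[datetime] = None) -> str:
    """Build a Range header; omit `end` for an open range."""
    end_text = utc_time(end) if end is not None else ""
    return f"Range: clock={utc_time(start)}-{end_text}"


start = datetime(2009, 6, 15, 11, 44, 25, tzinfo=timezone.utc)
end = datetime(2009, 6, 15, 11, 45, 0, 500000, tzinfo=timezone.utc)

open_range = range_header(start)           # Range: clock=20090615T114425Z-
closed_range = range_header(start, end)    # forward: end later than start
reverse_range = range_header(end, start)   # reverse: end earlier than start
```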



5.2 Rate-Control header field
This specification introduces the Rate-Control header field, which may be either "yes" or "no". If the field is not present, "yes" is assumed, and the stream is delivered in real time using standard RTP timing mechanisms. If this field is "no", the stream is delivered as fast as possible, using only the flow control provided by the transport to limit the delivery rate.
The important difference between these two modes is that with "Rate-Control=yes", the server is in control of the playback speed, whereas with "Rate-Control=no" the client is in control of the playback speed. Rate-controlled replay will typically only be used by clients that are not ONVIF-specific, as they will not specify "Rate-Control=no".
When replaying multiple tracks of a single recording, started by a single RTSP PLAY command and not using rate-control, the data from the tracks should be multiplexed in time in the same order as they were recorded.
An ONVIF-compliant RTSP server shall support operation with "Rate-Control=no" for playback.
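
Multiplexing several tracks in recording order amounts to a merge of per-track sequences by timestamp. The Python sketch below shows one way a server might do this; the tuple layout `(timestamp, track, payload)` and the function name are hypothetical, and each input track is assumed to be already sorted in playback order.

```python
import heapq
from typing import Iterable, Iterator, Tuple

AccessUnit = Tuple[float, str, bytes]  # (recording time, track name, payload)


def multiplex(tracks: Iterable[Iterable[AccessUnit]],
              reverse: bool = False) -> Iterator[AccessUnit]:
    """Lazily merge per-track access units by recording time.

    For reverse playback each input must be sorted newest first.
    """
    return heapq.merge(*tracks, key=lambda unit: unit[0], reverse=reverse)


# Synthetic data: 25 fps video interleaved with audio frames.
video = [(0.00, "video", b"v0"), (0.04, "video", b"v1"), (0.08, "video", b"v2")]
audio = [(0.02, "audio", b"a0"), (0.06, "audio", b"a1")]

forward_order = [unit[1] for unit in multiplex([video, audio])]
reverse_order = [unit[0] for unit in
                 multiplex([video[::-1], audio[::-1]], reverse=True)]
```

Because `heapq.merge` is lazy, the merge does not require the tracks to be held in memory, which suits delivery that is limited only by transport flow control.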

5.3 Frames header field
The Frames header field may be used to reduce the number of frames that are transmitted, for example to lower bandwidth or processing load. Three modes are possible:

1. Intra frames only. This is indicated using the value "intra", optionally followed by a minimum interval between successive intra frames in the stream. The latter can be used to limit the number of frames received even in the presence of "I-frame storms" caused by many receivers requesting frequent I-frames.
2. Intra frames and predicted frames only. This is indicated using the value "predicted". This value can be used to eliminate B-frames if the stream includes them.
3. All frames. This is the default.

Examples:

To request intra frames only:
    Frames: intra

To request intra frames with a minimum interval of 4000 milliseconds:
    Frames: intra/4000

To request intra frames and predicted frames only:
    Frames: predicted

To request all frames (note that it is not necessary to explicitly specify this mode but the example is included for completeness):
    Frames: all

The interval argument used with the "intra" option refers to the recording timeline, not playback time; thus for any given interval the same frames are played regardless of playback speed. The interval argument shall NOT be present unless the Frames option is "intra".
The server shall support the Frames header field. This does not preclude the use of the Scale header field as an alternative means of limiting the data rate. The implementation of the Scale header field may vary between different server implementations, as stated by [RFC 2326].
An ONVIF-compliant RTSP server shall support the Frames parameters "intra" and "all" for playback.
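
A server-side reading of the Frames header value might look like the Python sketch below. It is an illustration under assumptions: the function name is hypothetical, values are matched case-sensitively, the interval is taken to be a whole number of milliseconds as in the example above, and invalid input is reported with `ValueError`.

```python
from typing import Optional, Tuple

FRAME_MODES = ("intra", "predicted", "all")


def parse_frames_header(value: Optional[str]) -> Tuple[str, Optional[int]]:
    """Return (mode, interval in ms or None) for a Frames header value."""
    if value is None:
        # Header absent: all frames is the default.
        return ("all", None)
    mode, separator, interval = value.strip().partition("/")
    if mode not in FRAME_MODES:
        raise ValueError(f"unknown Frames mode: {mode!r}")
    if not separator:
        return (mode, None)
    if mode != "intra":
        raise ValueError("an interval is only allowed with 'intra'")
    if not interval.isdigit():
        raise ValueError(f"invalid interval: {interval!r}")
    return (mode, int(interval))
```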

5.4 Synchronization points
The transmitted video stream shall begin at a synchronization point (see section "Synchronization Point" of the ONVIF Media Service Specification). The rules for choosing the starting frame are as follows:

•  If the requested start time is within a section of recorded footage, the stream starts with the first clean point at or before the requested start time. This is the case regardless of playback direction.
•  If the requested start time is within a gap in recorded footage and playback is being initiated in the forwards direction, the stream starts with the first clean point in the section following the requested start time.
•  If the requested start time is within a gap in recorded footage and playback is being initiated in the reverse direction, the stream starts with the last clean point in the section preceding the requested start time.
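
The three rules above can be sketched as a lookup over recorded sections. The Python model below is hypothetical: `Section`, its fields and `start_point` are names invented for the example, sections are assumed to be sorted and non-overlapping, and each section is assumed to begin with a clean point.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Section:
    start: float                # start of a contiguous section of recording
    end: float                  # end of that section
    clean_points: List[float]   # sorted ascending; first equals `start`


def start_point(sections: List[Section], requested: float,
                reverse: bool = False) -> Optional[float]:
    """Return the time of the clean point at which the stream starts."""
    for section in sections:
        if section.start <= requested <= section.end:
            # Inside footage: first clean point at or before the request,
            # regardless of playback direction.
            index = bisect_right(section.clean_points, requested)
            return section.clean_points[index - 1] if index else None
    if not reverse:
        # In a gap, forward: first clean point of the following section.
        following = [s for s in sections if s.start > requested]
        return following[0].clean_points[0] if following else None
    # In a gap, reverse: last clean point of the preceding section.
    preceding = [s for s in sections if s.end < requested]
    return preceding[-1].clean_points[-1] if preceding else None


# Two sections of footage with a gap between t=10 and t=20.
footage = [Section(0, 10, [0, 4, 8]), Section(20, 30, [20, 25])]
```

With this data, a request at t=5 starts at 4 in either direction, while a request at t=15 starts at 20 going forwards and at 8 going backwards.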