ONVIF audio backchannel specification

Backchannel connections are handled using RTSP [RFC 2326]. Therefore a mechanism is introduced that lets a client indicate that it wants to set up a backchannel connection. RTSP provides feature tags for such functional extensions. A device that supports bidirectional connections (e.g. audio or metadata connections) shall support the RTSP extensions introduced here.

1. RTSP Require Tag
The RTSP standard [RFC 2326] can be extended with additional header objects. For that purpose a Require tag is introduced to handle special functionality additions (see [RFC 2326], 1.5 Extending RTSP and 12.32 Require).
The Require tag is used to determine whether the server supports this feature. The Require header shall be included in any request for which the server must understand the feature in order to perform the request correctly.
A device that supports backchannel and signals Audio output support via the AudioOutputs capability shall understand the backchannel tag:


An RTSP client that wants to set up an RTSP connection with a data backchannel shall include the Require header in its requests.
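A minimal Python sketch of how a client might serialize such a request. The tag value is assumed here to be www.onvif.org/ver20/backchannel, the feature tag used in the ONVIF Streaming Specification; the helper name build_request and the camera address rtsp://192.168.0.1 are hypothetical and serve only as illustration.

```python
# Assumption: the ONVIF backchannel feature tag as used in the ONVIF Streaming Specification.
BACKCHANNEL_TAG = "www.onvif.org/ver20/backchannel"


def build_request(method, url, cseq, extra_headers=None, backchannel=True):
    """Serialize an RTSP/1.0 request (hypothetical helper, not part of any library)."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    if backchannel:
        # Tell the server it must understand the backchannel extension to handle this request.
        lines.append(f"Require: {BACKCHANNEL_TAG}")
    for name, value in (extra_headers or {}).items():
        lines.append(f"{name}: {value}")
    # RTSP header lines end with CRLF; an empty line terminates the header block.
    return "\r\n".join(lines) + "\r\n\r\n"


# Hypothetical camera address used only for illustration.
describe = build_request("DESCRIBE", "rtsp://192.168.0.1", 1, {"Accept": "application/sdp"})
```

Passing backchannel=False produces the same request without the Require header, which is what a client would send when falling back to a plain RTSP connection.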

2. Connection setup for a bidirectional connection
A client shall include the feature tag in its DESCRIBE request to indicate that a bidirectional data connection shall be established.
A server that understands this Require tag shall include an additional media stream in its SDP file as configured in its Media Profile.
An RTSP server that does not understand the backchannel feature tag, or does not support bidirectional data connections, shall respond with status code 551 (Option not supported) as defined by the RTSP standard. The client can then try to establish an RTSP connection without a backchannel.
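The negotiation and fallback can be sketched on both sides as follows. This is an illustration under stated assumptions, not a reference implementation: the tag value is assumed to be the ONVIF one, headers are modeled as a plain dict, and check_require, describe_with_fallback, send_describe and legacy_camera are hypothetical names. The Unsupported header in the 551 response is the one RFC 2326 recommends for naming the rejected option.

```python
# Assumption: the ONVIF backchannel feature tag as used in the ONVIF Streaming Specification.
BACKCHANNEL_TAG = "www.onvif.org/ver20/backchannel"


def check_require(headers, supported_tags):
    """Server side: map a request's Require header to (status_code, response_headers).

    Simplification: headers is a dict keyed by canonical header names; real RTSP
    header names are case-insensitive.
    """
    required = [tag.strip() for tag in headers.get("Require", "").split(",") if tag.strip()]
    unsupported = [tag for tag in required if tag not in supported_tags]
    if unsupported:
        # 551 Option not supported, with an Unsupported header naming the tags (RFC 2326).
        return 551, {"Unsupported": ", ".join(unsupported)}
    return 200, {}


def describe_with_fallback(send_describe):
    """Client side: try DESCRIBE with the backchannel tag, retry without it on 551.

    send_describe is a hypothetical callable taking require_backchannel and
    returning (status_code, sdp_text). Returns (status, sdp, backchannel_granted).
    """
    status, sdp = send_describe(require_backchannel=True)
    if status == 551:
        status, sdp = send_describe(require_backchannel=False)
        return status, sdp, False
    return status, sdp, status == 200


def legacy_camera(require_backchannel):
    """Stand-in for a device without backchannel support."""
    headers = {"Require": BACKCHANNEL_TAG} if require_backchannel else {}
    status, _ = check_require(headers, supported_tags=set())
    return status, ("v=0\r\n" if status == 200 else "")


result = describe_with_fallback(legacy_camera)
```

Against the legacy stand-in, the first DESCRIBE is rejected with 551 and the retry succeeds, so result reports that no backchannel was granted.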
An SDP file is used to describe the session. To indicate the direction of the media data, the server shall include an a=sendonly attribute in each media section representing media sent from the client to the server, and an a=recvonly attribute in each media section representing media sent from the server to the client.
The server shall list all supported decoding codecs in its own media section, and the client chooses which one is used. The payload type and the encoded bitstream sent by the client shall match one of the a=rtpmap fields provided by the server, so that the server can select the proper audio decoder.
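A client-side sketch of reading such an SDP file: split it into media sections, pick out the backchannel by its a=sendonly attribute (following the client-viewpoint convention stated above), and choose a codec from the server's a=rtpmap list. The example SDP, the class and function names, and the first-match selection policy are assumptions for illustration; the fallback table holds the static RTP/AVP payload types 0 (PCMU) and 8 (PCMA) from RFC 3551.

```python
from dataclasses import dataclass, field

# Static RTP/AVP payload types from RFC 3551, used when an m= line omits a=rtpmap.
STATIC_PAYLOAD_TYPES = {0: "PCMU/8000", 8: "PCMA/8000"}


@dataclass
class MediaSection:
    media: str                   # "video" or "audio" from the m= line
    payload_types: list          # payload type numbers on the m= line, in server order
    control: str = ""            # value of the a=control attribute
    direction: str = "sendrecv"  # SDP default when no direction attribute is present
    rtpmap: dict = field(default_factory=dict)  # payload type -> "encoding/clock rate"


def parse_sdp(sdp):
    """Split an SDP description into media sections (session-level lines are ignored)."""
    sections = []
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            media, _port, _proto, *formats = line[2:].split()
            sections.append(MediaSection(media, [int(f) for f in formats]))
        elif sections and line.startswith("a="):
            attr, current = line[2:], sections[-1]
            if attr.startswith("control:"):
                current.control = attr[len("control:"):]
            elif attr in ("sendonly", "recvonly", "sendrecv", "inactive"):
                current.direction = attr
            elif attr.startswith("rtpmap:"):
                pt, encoding = attr[len("rtpmap:"):].split(None, 1)
                current.rtpmap[int(pt)] = encoding
    return sections


def backchannel_sections(sections):
    # Per the convention above, a=sendonly marks media sent from the client to the server.
    return [s for s in sections if s.direction == "sendonly"]


def choose_codec(section, client_encodings):
    """Return the first (payload_type, rtpmap) the client can encode, or None."""
    for pt in section.payload_types:
        encoding = section.rtpmap.get(pt) or STATIC_PAYLOAD_TYPES.get(pt, "")
        if encoding.split("/")[0].upper() in client_encodings:
            return pt, encoding
    return None


# Hypothetical SDP modeled on the structure described above.
EXAMPLE_SDP = "\r\n".join([
    "v=0",
    "o=- 0 0 IN IP4 192.168.0.1",
    "s=Session",
    "t=0 0",
    "m=video 0 RTP/AVP 26",
    "a=control:rtsp://192.168.0.1/video",
    "a=recvonly",
    "m=audio 0 RTP/AVP 0",
    "a=control:rtsp://192.168.0.1/audio",
    "a=recvonly",
    "m=audio 0 RTP/AVP 0 8",
    "a=control:rtsp://192.168.0.1/audioback",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:8 PCMA/8000",
    "a=sendonly",
])
```

A client that can only produce A-law audio would call choose_codec(backchannel_sections(parse_sdp(EXAMPLE_SDP))[0], {"PCMA"}) and then send RTP packets with payload type 8.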

This SDP file completely describes the RTSP session. The server gives the client its control URLs for setting up the streams.
In the next step the client can set up the streams:

The third SETUP request establishes the audio backchannel connection.
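The SETUP sequence can be sketched as below, assuming RTP is interleaved over the RTSP TCP connection (a UDP Transport header with client_port would serve equally well). The control URLs, CSeq values, session ID and the build_setup helper are hypothetical; the Require header is repeated on every SETUP because the client includes it in its requests, and the tag value is assumed to be the ONVIF one.

```python
# Assumption: the ONVIF backchannel feature tag as used in the ONVIF Streaming Specification.
BACKCHANNEL_TAG = "www.onvif.org/ver20/backchannel"


def build_setup(url, cseq, channel_pair, session_id=None):
    """Serialize one SETUP request using RTP interleaved on the RTSP TCP connection."""
    first_channel = 2 * channel_pair  # RTP on the even channel, RTCP on the next odd one
    lines = [
        f"SETUP {url} RTSP/1.0",
        f"CSeq: {cseq}",
        f"Transport: RTP/AVP/TCP;unicast;interleaved={first_channel}-{first_channel + 1}",
        f"Require: {BACKCHANNEL_TAG}",
    ]
    if session_id is not None:
        # Every SETUP after the first joins the session created by the server's first reply.
        lines.append(f"Session: {session_id}")
    return "\r\n".join(lines) + "\r\n\r\n"


# Hypothetical control URLs, as they might be read from the a=control attributes of the SDP.
urls = ["rtsp://192.168.0.1/video", "rtsp://192.168.0.1/audio", "rtsp://192.168.0.1/audioback"]
first = build_setup(urls[0], cseq=2, channel_pair=0)
session_id = "12345678"  # hypothetical value returned in the Session header of the first reply
rest = [build_setup(url, cseq=3 + i, channel_pair=1 + i, session_id=session_id)
        for i, url in enumerate(urls[1:])]
```

The last request in rest is the one that establishes the backchannel; over TCP the client later writes its audio RTP packets on the interleaved channel it requested there.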
In the next step the client starts the session by sending a PLAY request.

After receiving the OK response to the PLAY request, the client may start sending audio data to the server. It shall not start sending data to the server before it has received the response.
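The ordering rule can be enforced on the client with a small state flag. This is a sketch with hypothetical class and method names; the sent list stands in for the real RTP transport, and the 160-byte packet assumes 20 ms of 8 kHz G.711 audio.

```python
class BackchannelSession:
    """Client-side guard: backchannel audio may only be sent after PLAY's 200 OK."""

    def __init__(self):
        self.can_send = False
        self.sent = []  # stand-in for RTP packets written to the real transport

    def on_play_response(self, status_code):
        # Only a successful (200 OK) response to PLAY unlocks the backchannel.
        if status_code == 200:
            self.can_send = True

    def send_audio(self, packet):
        if not self.can_send:
            raise RuntimeError("backchannel audio sent before the PLAY response was received")
        self.sent.append(packet)

    def on_teardown(self):
        # After TEARDOWN the backchannel is closed again.
        self.can_send = False


session = BackchannelSession()
session.on_play_response(200)
session.send_audio(bytes(160))  # 20 ms of 8 kHz G.711 audio (one byte per sample)
```

Raising an error rather than silently queuing makes a premature send visible during development; a production client might instead buffer audio until the PLAY response arrives.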

The Require header indicates that the PLAY command needs a special interpretation: it covers both starting the video and audio streams from the NVT (network video transmitter) to the client and starting the audio connection from the client to the server.

To terminate the session the client sends a TEARDOWN request.

3. Example: Server with ONVIF backchannel support (with multiple decoding capabilities)
If a device supports multiple audio decoders for the backchannel, it can signal this capability by listing multiple a=rtpmap fields, as illustrated as follows.