Abstract: This specification defines an XMPP protocol extension for initiating and managing peer-to-peer media sessions between two XMPP entities in a way that is interoperable with existing Internet standards. The protocol provides a pluggable model that enables the core session management semantics (compatible with SIP) to be used for a wide variety of application types (e.g., voice chat, video chat, file transfer) and with a wide variety of transport methods (e.g., TCP, UDP, ICE, application-specific transports).
Authors: Scott Ludwig, Joe Beda, Peter Saint-Andre, Robert McQueen, Sean Egan, Joe Hildebrand
Copyright: © 1999 - 2012 XMPP Standards Foundation. SEE LEGAL NOTICES.
Status: Draft
Type: Standards Track
Version: 1.1
Last Updated: 2009-12-23
NOTICE: The protocol defined herein is a Draft Standard of the XMPP Standards Foundation. Implementations are encouraged and the protocol is appropriate for deployment in production systems, but some changes to the protocol are possible before it becomes a Final Standard.
1. Introduction
2. How It Works
3. Requirements
4. Terminology
4.1. Glossary
4.2. Conventions
5. Concepts and Approach
5.1. Overall Session Management
6. Session Flow
6.1. Resource Determination
6.2. Initiation
6.3. Responder Response
6.3.1. Acknowledgement
6.3.2. Errors
6.4. Negotiation
6.5. Acceptance
6.6. Modifying an Active Session
6.7. Termination
6.8. Informational Messages
7. Formal Definition
7.1. Jingle Element
7.2. Action Attribute
7.2.1. content-accept
7.2.2. content-add
7.2.3. content-modify
7.2.4. content-reject
7.2.5. content-remove
7.2.6. description-info
7.2.7. security-info
7.2.8. session-accept
7.2.9. session-info
7.2.10. session-initiate
7.2.11. session-terminate
7.2.12. transport-accept
7.2.13. transport-info
7.2.14. transport-reject
7.2.15. transport-replace
7.2.16. Tie Breaking Related to Jingle Actions
7.3. Content Element
7.4. Reason Element
8. Transport Types
8.1. Datagram
8.2. Streaming
9. Security Preconditions
10. Error Handling
11. Determining Support
12. Conformance by Using Protocols
12.1. Application Formats
12.2. Transport Methods
12.3. Security Preconditions
13. Security Considerations
13.1. Transport Security
13.2. Denial of Service
13.3. Communication Through Gateways
13.4. Information Exposure
13.5. Redirection
14. IANA Considerations
15. XMPP Registrar Considerations
15.1. Protocol Namespaces
15.2. Namespace Versioning
15.3. Jingle Application Formats Registry
15.4. Jingle Transport Methods Registry
16. XML Schemas
16.1. Jingle
16.2. Jingle Errors
17. History
18. Acknowledgements
Appendices
A: Document Information
B: Author Information
C: Legal Notices
D: Relation to XMPP
E: Discussion Venue
F: Requirements Conformance
G: Notes
H: Revision History
The purpose of Jingle is to enable one-to-one, peer-to-peer media sessions between XMPP entities, where the negotiation occurs over the XMPP signalling channel and the media is exchanged over a data channel that is usually a dedicated non-XMPP transport. Jingle is designed in a modular way:
Developers can easily plug in support for a wide variety of application types, such as voice and video chat (see Jingle RTP Sessions [1]), file transfer (see Jingle File Transfer [2]), application sharing, collaborative editing, whiteboarding, and secure transmission of end-to-end XML streams (see Jingle XML Streams [3]).
The transport methods are also pluggable, so that Jingle implementations can use any appropriate datagram transport such as User Datagram Protocol (UDP; RFC 768 [4]) as negotiated via Jingle Raw UDP Transport Method [5] or Jingle ICE-UDP Transport Method [6], or any appropriate streaming transport such as Transmission Control Protocol (TCP; RFC 793 [7]), SOCKS5 Bytestreams [8] as negotiated via Jingle SOCKS5 Bytestreams Transport Method [9], and In-Band Bytestreams [10] as negotiated via Jingle In-Band Bytestreams Transport Method [11].
This modular approach also extends to the security preconditions that need to be met before application data can be exchanged over a given transport, such as negotiation of Transport Layer Security (TLS; RFC 5246 [12]) for streaming transports and negotiation of Datagram Transport Layer Security (DTLS; RFC 4347 [13]) for datagram transports.
It is expected that most application types, transport methods, and security preconditions will be documented in specifications produced by the XMPP Standards Foundation (XSF) [14] or the Internet Engineering Task Force (IETF) [15]; however, developers can also define proprietary methods for custom functionality.
Although Jingle provides a general framework for session management, the original target application for Jingle was simple voice and video chat. We stress the word "simple". The purpose of Jingle was not to build a full-fledged telephony application that supports call waiting, call forwarding, call transfer, hold music, IVR systems, find-me-follow-me functionality, conference calls, and the like. These features are of interest to some user populations, but adding support for them to the core Jingle layer would introduce unnecessary complexity into a technology that is designed for simple but generalized session negotiation.
Furthermore, Jingle is not intended to supplant or replace existing Internet technologies based on the Session Initiation Protocol (SIP; RFC 3261 [16]). Because dual-stack XMPP+SIP clients are difficult to build, Jingle was designed as a pure XMPP signalling protocol. However, Jingle is at the same time designed to interwork with SIP so that the millions of deployed XMPP clients can be added onto existing Voice over Internet Protocol (VoIP) networks, rather than limiting XMPP users to a separate and distinct network.
This section provides a friendly introduction to Jingle.
In essence, Jingle enables two XMPP entities (e.g., romeo@montague.lit and juliet@capulet.lit) to set up, manage, and tear down a multimedia session. The negotiation takes place over XMPP, and the media transfer typically takes place outside of XMPP. A simplified session flow would be as follows: [17]
Romeo                         Juliet
  |                             |
  |   session-initiate          |
  |---------------------------->|
  |   ack                       |
  |<----------------------------|
  |   session-accept            |
  |<----------------------------|
  |   ack                       |
  |---------------------------->|
  |   MEDIA SESSION             |
  |<===========================>|
  |   session-terminate         |
  |<----------------------------|
  |   ack                       |
  |---------------------------->|
  |                             |
To illustrate the basic flow, we show a truncated example with a "stub" application format and transport method (skipping non-essential steps to emphasize the most essential concepts and ignoring security preconditions for now).
Example 1. Initiator sends session-initiate (stub)
<iq from='romeo@montague.lit/orchard'
    id='zid615d9'
    to='juliet@capulet.lit/balcony'
    type='set'>
  <jingle xmlns='urn:xmpp:jingle:1'
          action='session-initiate'
          initiator='romeo@montague.lit/orchard'
          sid='a73sjjvkla37jfea'>
    <content creator='initiator' name='this-is-a-stub'>
      <description xmlns='urn:xmpp:jingle:apps:stub:0'/>
      <transport xmlns='urn:xmpp:jingle:transports:stub:0'/>
    </content>
  </jingle>
</iq>
In this example, the initiator (romeo@montague.lit/orchard) sends a session initiation offer to the responder (juliet@capulet.lit/balcony), where the session is defined as the exchange of "stub" media over a "stub" transport.
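As a rough illustration of how a client might assemble the stub offer above, the following Python sketch builds the same structure with the standard library's xml.etree.ElementTree. The function name build_session_initiate and its parameters are hypothetical and not part of this specification; the sketch also leaves the <iq/> element unqualified rather than placing it in the stream's default namespace, which a real XMPP library would handle.

```python
import xml.etree.ElementTree as ET

JINGLE_NS = "urn:xmpp:jingle:1"


def build_session_initiate(initiator, responder, iq_id, sid,
                           content_name, description_ns, transport_ns):
    """Return an IQ-set carrying a session-initiate with one <content/>.

    Hypothetical helper: names and signature are illustrative only.
    """
    iq = ET.Element("iq", {"from": initiator, "id": iq_id,
                           "to": responder, "type": "set"})
    jingle = ET.SubElement(iq, f"{{{JINGLE_NS}}}jingle", {
        "action": "session-initiate",
        "initiator": initiator,
        "sid": sid,
    })
    content = ET.SubElement(jingle, f"{{{JINGLE_NS}}}content",
                            {"creator": "initiator", "name": content_name})
    # Each <content/> carries one application format and one transport method.
    ET.SubElement(content, f"{{{description_ns}}}description")
    ET.SubElement(content, f"{{{transport_ns}}}transport")
    return iq


if __name__ == "__main__":
    stanza = build_session_initiate(
        "romeo@montague.lit/orchard", "juliet@capulet.lit/balcony",
        "zid615d9", "a73sjjvkla37jfea", "this-is-a-stub",
        "urn:xmpp:jingle:apps:stub:0", "urn:xmpp:jingle:transports:stub:0")
    print(ET.tostring(stanza, encoding="unicode"))
```

ElementTree serializes the namespaces with generated prefixes, so the output differs textually from the example above while describing the same elements.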
After the responding client acknowledges receipt of the session-initiate message (not shown here), it prompts the responding user (if any) to choose whether she wants to proceed with the session. The client does not need to prompt the user if, for example, she has configured it to automatically accept session requests from this particular initiator. If she wants to proceed, she selects the appropriate interface element and her client sends a session-accept message to the initiator.
Example 2. Responder definitively accepts the session
<iq from='juliet@capulet.lit/balcony'
    id='rc61n59s'
    to='romeo@montague.lit/orchard'
    type='set'>
  <jingle xmlns='urn:xmpp:jingle:1'
          action='session-accept'
          responder='juliet@capulet.lit/balcony'
          sid='a73sjjvkla37jfea'>
    <content creator='initiator' name='this-is-a-stub'>
      <description xmlns='urn:xmpp:jingle:apps:stub:0'/>
      <transport xmlns='urn:xmpp:jingle:transports:stub:0'/>
    </content>
  </jingle>
</iq>
The initiating client acknowledges receipt of the session-accept message (not shown here) and the parties can exchange "stub" media data over the "stub" transport.
Eventually, one of the parties (here the responder) will terminate the session.
Example 3. Responder terminates the session
<iq from='juliet@capulet.lit/balcony'
    id='le71fa63'
    to='romeo@montague.lit/orchard'
    type='set'>
  <jingle xmlns='urn:xmpp:jingle:1'
          action='session-terminate'
          sid='a73sjjvkla37jfea'>
    <reason>
      <success/>
    </reason>
  </jingle>
</iq>
The initiating client acknowledges receipt of the session-terminate message (not shown here) and the session is ended.
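The termination message can be sketched in the same way. This is a minimal Python illustration, assuming a hypothetical helper named build_session_terminate; the only reason condition shown so far in this document is <success/>, so the condition parameter defaults to it, and the permitted values are those given in the Reason Element section.

```python
import xml.etree.ElementTree as ET

JINGLE_NS = "urn:xmpp:jingle:1"


def build_session_terminate(sender, recipient, iq_id, sid,
                            condition="success", text=None):
    """Return an IQ-set carrying a session-terminate with a <reason/>.

    Hypothetical helper; `condition` is the name of the reason child
    element and is not validated here.
    """
    iq = ET.Element("iq", {"from": sender, "id": iq_id,
                           "to": recipient, "type": "set"})
    jingle = ET.SubElement(iq, f"{{{JINGLE_NS}}}jingle",
                           {"action": "session-terminate", "sid": sid})
    reason = ET.SubElement(jingle, f"{{{JINGLE_NS}}}reason")
    ET.SubElement(reason, f"{{{JINGLE_NS}}}{condition}")
    if text is not None:
        # Optional human-readable explanation.
        ET.SubElement(reason, f"{{{JINGLE_NS}}}text").text = text
    return iq


if __name__ == "__main__":
    stanza = build_session_terminate(
        "juliet@capulet.lit/balcony", "romeo@montague.lit/orchard",
        "le71fa63", "a73sjjvkla37jfea")
    print(ET.tostring(stanza, encoding="unicode"))
```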
We now "fill in the blanks" for the <description/> and <transport/> elements with a more complex example: a voice chat session, where the application type is a Jingle RTP session (with several different codec possibilities) and the transport method is ICE-UDP.
Example 4. Initiator sends session-initiate
<iq from='romeo@montague.lit/orchard'
    id='ph37a419'
    to='juliet@capulet.lit/balcony'
    type='set'>
  <jingle xmlns='urn:xmpp:jingle:1'
          action='session-initiate'
          initiator='romeo@montague.lit/orchard'
          sid='a73sjjvkla37jfea'>
    <content creator='initiator' name='voice'>
      <description xmlns='urn:xmpp:jingle:apps:rtp:1' media='audio'>
        <payload-type id='96' name='speex' clockrate='16000'/>
        <payload-type id='97' name='speex' clockrate='8000'/>
        <payload-type id='18' name='G729'/>
        <payload-type id='0' name='PCMU'/>
        <payload-type id='103' name='L16' clockrate='16000' channels='2'/>
        <payload-type id='98' name='x-ISAC' clockrate='8000'/>
      </description>
      <transport xmlns='urn:xmpp:jingle:transports:ice-udp:1'
                 pwd='asd88fgpdd777uzjYhagZg'
                 ufrag='8hhy'>
        <candidate component='1' foundation='1' generation='0'
                   id='el0747fg11' ip='10.0.1.1' network='1'
                   port='8998' priority='2130706431'
                   protocol='udp' type='host'/>
        <candidate component='1' foundation='2' generation='0'
                   id='y3s2b30v3r' ip='192.0.2.3' network='1'
                   port='45664' priority='1694498815'
                   protocol='udp' rel-addr='10.0.1.1' rel-port='8998'
                   type='srflx'/>
      </transport>
    </content>
  </jingle>
</iq>
Upon receiving the session-initiate message, the responder determines whether it can proceed with the negotiation. If there is no error, the responder acknowledges the session initiation request.
Example 5. Responder acknowledges session-initiate
<iq from='juliet@capulet.lit/balcony' id='ph37a419' to='romeo@montague.lit/orchard' type='result'/>
When the responding user affirms that she would like to proceed with the session, the responding client sends a session-accept message to the initiator (including in this example the subset of offered codecs that the responding client supports and one or more transport candidates generated by the responder).
Example 6. Responder definitively accepts the session
<iq from='juliet@capulet.lit/balcony'
    id='yd71f495'
    to='romeo@montague.lit/orchard'
    type='set'>
  <jingle xmlns='urn:xmpp:jingle:1'
          action='session-accept'
          responder='juliet@capulet.lit/balcony'
          sid='a73sjjvkla37jfea'>
    <content creator='initiator' name='voice'>
      <description xmlns='urn:xmpp:jingle:apps:rtp:1' media='audio'>
        <payload-type id='97' name='speex' clockrate='8000'/>
        <payload-type id='18' name='G729'/>
      </description>
      <transport xmlns='urn:xmpp:jingle:transports:ice-udp:1'>
        <candidate component='1' foundation='1' generation='0'
                   id='or2ii2syr1' ip='192.0.2.1' network='0'
                   port='3478' priority='2130706431'
                   protocol='udp' type='host'/>
      </transport>
    </content>
  </jingle>
</iq>
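The "subset of offered codecs" in the acceptance above can be illustrated with a short Python sketch. It assumes, purely for illustration, that a codec is identified by its name and clockrate; the actual matching rules belong to the application format specification (Jingle RTP Sessions), and the function name supported_payload_types is hypothetical.

```python
import xml.etree.ElementTree as ET

RTP_NS = "urn:xmpp:jingle:apps:rtp:1"


def supported_payload_types(description, supported):
    """Return the offered payload types the responder supports, in offer order.

    `description` is a parsed <description/> element and `supported` is a
    set of (name, clockrate) pairs, with clockrate None when the offer
    omits the attribute. Matching on this pair is an assumption of the
    sketch, not a rule taken from this specification.
    """
    chosen = []
    for payload in description.findall(f"{{{RTP_NS}}}payload-type"):
        key = (payload.get("name"), payload.get("clockrate"))
        if key in supported:
            chosen.append(dict(payload.attrib))
    return chosen


if __name__ == "__main__":
    offer = ET.fromstring(
        "<description xmlns='urn:xmpp:jingle:apps:rtp:1' media='audio'>"
        "<payload-type id='96' name='speex' clockrate='16000'/>"
        "<payload-type id='97' name='speex' clockrate='8000'/>"
        "<payload-type id='18' name='G729'/>"
        "<payload-type id='0' name='PCMU'/>"
        "</description>")
    for entry in supported_payload_types(
            offer, {("speex", "8000"), ("G729", None)}):
        print(entry)
```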
And the initiating client acknowledges session acceptance:
Example 7. Initiator acknowledges session acceptance
<iq from='romeo@montague.lit/orchard' id='yd71f495' to='juliet@capulet.lit/balcony' type='result'/>
Once the parties finish the transport negotiation, they can exchange media using any of the acceptable codecs.
Eventually, one of the parties (here the responder) will terminate the session.
Example 8. Responder terminates the session
<iq from='juliet@capulet.lit/balcony'
    id='vua614d9'
    to='romeo@montague.lit/orchard'
    type='set'>
  <jingle xmlns='urn:xmpp:jingle:1'
          action='session-terminate'
          sid='a73sjjvkla37jfea'>
    <reason>
      <success/>
      <text>Sorry, gotta go!</text>
    </reason>
  </jingle>
</iq>
The other party then acknowledges termination of the session:
Example 9. Initiator acknowledges termination
<iq from='romeo@montague.lit/orchard' id='vua614d9' to='juliet@capulet.lit/balcony' type='result'/>
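Every acknowledgement in the flow above has the same shape: an empty IQ-result that reuses the 'id' of the IQ-set and swaps its 'from' and 'to' addresses. A minimal Python sketch, with the hypothetical helper name build_ack:

```python
import xml.etree.ElementTree as ET


def build_ack(iq_set):
    """Return the empty IQ-result acknowledging a received IQ-set.

    Hypothetical helper: it copies the 'id' and swaps the addresses.
    """
    if iq_set.get("type") != "set":
        raise ValueError("only an IQ of type 'set' is acknowledged here")
    return ET.Element("iq", {
        "from": iq_set.get("to"),
        "id": iq_set.get("id"),
        "to": iq_set.get("from"),
        "type": "result",
    })


if __name__ == "__main__":
    received = ET.fromstring(
        "<iq from='juliet@capulet.lit/balcony' id='vua614d9' "
        "to='romeo@montague.lit/orchard' type='set'/>")
    print(ET.tostring(build_ack(received), encoding="unicode"))
```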
The protocol defined herein is designed to meet the following requirements:
This document defines the signalling protocol only. Additional documents specify the following:
Various application formats (audio, video, etc.) and, where possible, mapping of those types to the Session Description Protocol (SDP; see RFC 4566 [18]); examples include Jingle RTP Sessions and Jingle File Transfer.
Various transport methods; examples include Jingle ICE-UDP Transport Method, Jingle Raw UDP Transport Method, Jingle In-Band Bytestreams Transport Method, and Jingle SOCKS5 Bytestreams Transport Method.
Various methods of securing the transport before using it to send application data; the only method defined so far is Transport Layer Security as described in Jingle XTLS [19].
Procedures for mapping the Jingle signalling protocol to existing signalling standards such as the IETF's Session Initiation Protocol (SIP) and the ITU's H.323 protocol (see H.323 [20]); see for example draft-saintandre-sip-xmpp-media [21].
In diagrams, the following conventions are used:
Jingle consists of three parts, each with its own syntax and semantics:
This document defines the semantics and syntax for overall session management. It also provides pluggable "slots" for application formats and transport methods, which are specified in separate documents.
At the most basic level, the process for initial negotiation of a Jingle session is as follows:
Even after application data is being exchanged, the parties can adjust the session definition by sending additional Jingle messages, such as content-modify, content-remove, content-add, description-info, security-info, session-info, and transport-replace.
The state machine for overall session management (i.e., the state per Session ID) is as follows:
            o
            |
            | session-initiate
            |
            | +---------->--------------+
            |/                          |
   PENDING  o-----------------------+   |
            |   | content-accept,   |   |
            |   | content-add,      |   |
            |   | content-modify,   |   |
            |   | content-reject,   |   |
            |   | content-remove,   |   |
            |   | description-info, |   |
           \|/  | session-info,     |   |
            |   | transport-accept, |   |
            |   | transport-info,   |   |
            |   | transport-reject, |   |
            |   | transport-replace |   |
            |   +-------------------+   |
            |                           |
            | session-accept           \|/
            |                           |
    ACTIVE  o-----------------------+   |
            |   | content-accept,   |   |
            |   | content-add,      |   |
            |   | content-modify,   |   |
            |   | content-reject,   |   |
            |   | content-remove,   |   |
            |   | description-info, |   |
           \|/  | session-info,     |   |
            |   | transport-accept, |   |
            |   | transport-info,   |   |
            |   | transport-reject, |   |
            |   | transport-replace |   |
            |   +-------------------+   |
            |                           |
            +------------>--------------+
            |
            | session-terminate
            |
            o ENDED
As shown, there are three overall session states:
Note: While it is allowed to send all actions while in the PENDING state, typically the responder will send a session-accept message as quickly as possible in order to expedite the transport negotiation; see the Security Considerations section of this document regarding information exposure when the responder sends transport candidates to the initiator.
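One way to read the state chart is as a transition function over the three states. The Python sketch below is an illustrative interpretation of the diagram, not a normative definition: the names State, NEGOTIATION_ACTIONS, and next_state are hypothetical, and the set of non-transitioning actions is copied from the loops drawn in the chart.

```python
from enum import Enum


class State(Enum):
    PENDING = "pending"
    ACTIVE = "active"
    ENDED = "ended"


# Actions drawn as loops in the chart: they leave the state unchanged.
NEGOTIATION_ACTIONS = frozenset({
    "content-accept", "content-add", "content-modify", "content-reject",
    "content-remove", "description-info", "session-info",
    "transport-accept", "transport-info", "transport-reject",
    "transport-replace",
})


def next_state(state, action):
    """Return the session state after `action`.

    `state` is None before any session exists for the Session ID.
    Raises ValueError for an action the chart does not allow.
    """
    if state is None:
        if action == "session-initiate":
            return State.PENDING
        raise ValueError(f"no session yet, cannot apply {action!r}")
    if state is State.ENDED:
        raise ValueError("session has ended")
    if action == "session-terminate":
        return State.ENDED          # reachable from PENDING and ACTIVE
    if action == "session-accept" and state is State.PENDING:
        return State.ACTIVE
    if action in NEGOTIATION_ACTIONS:
        return state
    raise ValueError(f"{action!r} is not allowed in state {state.name}")


if __name__ == "__main__":
    state = None
    for step in ("session-initiate", "transport-info",
                 "session-accept", "content-add", "session-terminate"):
        state = next_state(state, step)
        print(step, "->", state.name)
```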
The actions related to management of the overall Jingle session are as follows (detailed definitions are provided in the Action Attribute section of this document).
This section defines the high-level flow of a Jingle session. More detailed descriptions are provided in the specifications for Jingle application formats and transport methods.
In order to initiate a Jingle session, the initiator needs to determine which of the responder's XMPP resources is best for the desired application format. Methods for doing so are out of scope for this specification. However, see the Determining Support section of this document for relevant information.
Once the initiator has discovered which of the responder's XMPP resources is ideal for the desired application format, it sends a session initiation request to the responder. This request is an IQ-set containing a <jingle/> element qualified by the 'urn:xmpp:jingle:1' namespace (see Namespace Versioning regarding the possibility of incrementing the version number), where the value of the 'action' attribute is "session-initiate" and where the <jingle/> element contains one or more <content/> elements. Each <content/> element defines a content type to be transferred during the session, and each <content/> element in turn contains one <description/> child element that specifies a desired application format and one <transport/> child element that specifies a potential transport method, as well as (optionally) one <security/> element that specifies a security precondition that needs to be met before the parties can exchange application data over the negotiated transport.
Example 10. Initiator sends session-initiate
<iq from='romeo@montague.lit/orchard'
    id='xs51r0k4'
    to='juliet@capulet.lit/balcony'
    type='set'>
  <jingle xmlns='urn:xmpp:jingle:1'
          action='session-initiate'
          initiator='romeo@montague.lit/orchard'
          sid='a73sjjvkla37jfea'>
    <content creator='initiator' name='voice'>
      <description xmlns='urn:xmpp:jingle:apps:rtp:1' media='audio'>
        <payload-type id='96' name='speex' clockrate='16000'/>
        <payload-type id='97' name='speex' clockrate='8000'/>
        <payload-type id='18' name='G729'/>
        <payload-type id='0' name='PCMU'/>
        <payload-type id='103' name='L16' clockrate='16000' channels='2'/>
        <payload-type id='98' name='x-ISAC' clockrate='8000'/>
      </description>
      <transport xmlns='urn:xmpp:jingle:transports:ice-udp:1'
                 pwd='asd88fgpdd777uzjYhagZg'
                 ufrag='8hhy'>
        <candidate component='1' foundation='1' generation='0'
                   id='el0747fg11' ip='10.0.1.1' network='1'
                   port='8998' priority='2130706431'
                   protocol='udp' type='host'/>
        <candidate component='1' foundation='2' generation='0'
                   id='y3s2b30v3r' ip='192.0.2.3' network='1'
                   port='45664' priority='1694498815'
                   protocol='udp' rel-addr='10.0.1.1' rel-port='8998'
                   type='srflx'/>
      </transport>
    </content>
  </jingle>
</iq>
Application types ought not to be mixed beyond necessity within a single session. Therefore the session initiation request (along with subsequent additions) will include only content-types that can be grouped together into a coherent session within a given Jingle application. For example, two parties might start an audio call but then add a video aspect to that call. If one of the parties decides to send a file to the other party as a result of discussion over the audio/video session or a text chat conversation, conceptually that is probably a separate session (unless file exchange or screen sharing or some other application type is an integral part of a broader collaboration experience and needs to be calibrated with the audio/video session).
Note: The syntax and semantics of the <description/>, <transport/>, and <security/> elements are out of scope for this document, since they are defined in related specifications. The syntax and semantics of the <jingle/> and <content/> elements are specified in this document under Formal Definition.
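The structural rules for a session initiation request described in this section (an IQ-set, a <jingle/> element with action "session-initiate", one or more <content/> elements, each with one <description/>, one <transport/>, and at most one <security/>) can be checked mechanically. The Python sketch below is a simplified illustration with a hypothetical function name, check_session_initiate; it looks only at element names and counts, and it does not replace validation against the schemas in this document.

```python
import xml.etree.ElementTree as ET

JINGLE_NS = "urn:xmpp:jingle:1"


def check_session_initiate(iq):
    """Return a list of structural problems; an empty list means none found."""
    problems = []
    if iq.get("type") != "set":
        problems.append("IQ type is not 'set'")
    jingle = iq.find(f"{{{JINGLE_NS}}}jingle")
    if jingle is None:
        return problems + ["missing <jingle/> element"]
    if jingle.get("action") != "session-initiate":
        problems.append("action is not 'session-initiate'")
    contents = jingle.findall(f"{{{JINGLE_NS}}}content")
    if not contents:
        problems.append("no <content/> element")
    for content in contents:
        # Compare local names only; the namespaces of these children are
        # defined by the application format and transport specifications.
        names = [child.tag.rsplit("}", 1)[-1] for child in content]
        label = content.get("name")
        for required in ("description", "transport"):
            if names.count(required) != 1:
                problems.append(
                    f"content {label!r} needs exactly one <{required}/>")
        if names.count("security") > 1:
            problems.append(f"content {label!r} has more than one <security/>")
    return problems


if __name__ == "__main__":
    offer = ET.fromstring(
        "<iq type='set' id='xs51r0k4'>"
        "<jingle xmlns='urn:xmpp:jingle:1' action='session-initiate' "
        "initiator='romeo@montague.lit/orchard' sid='a73sjjvkla37jfea'>"
        "<content creator='initiator' name='voice'>"
        "<description xmlns='urn:xmpp:jingle:apps:rtp:1' media='audio'/>"
        "<transport xmlns='urn:xmpp:jingle:transports:ice-udp:1'/>"
        "</content></jingle></iq>")
    print(check_session_initiate(offer))
```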
Unless one of the following errors occurs, the responder MUST acknowledge receipt of the initiation request.
Example 11. Responder acknowledges session-initiate
<iq from='juliet@capulet.lit/balcony' id='xs51r0k4' to='romeo@montague.lit/orchard' type='result'/>
However, after acknowledging the session initiation request, the responder might subsequently determine that it cannot proceed with negotiation of the session (e.g., because it does not support any of the offered application formats or transport methods, because a human user is busy or unable to accept the session, because a human user wishes to formally decline the session, etc.). In these cases, the responder SHOULD immediately acknowledge the session initiation request but then terminate the session with an appropriate reason as described in the Termination section of this document.
There are several reasons why the responder might immediately return an error instead of acknowledging receipt of the initiation request:
If the initiator is unknown to the responder (e.g., via presence subscription as defined in RFC 3921 [22]) and the responder has a policy of not communicating via Jingle with unknown entities, it MUST return a <service-unavailable/> error.
Example 12. Initiator is unknown to responder
<iq from='juliet@capulet.lit/balcony'
    id='xs51r0k4'
    to='romeo@montague.lit/orchard'
    type='error'>
  <error type='cancel'>
    <service-unavailable xmlns='urn:ietf:params:xml:ns:xmpp-stanzas'/>
  </error>
</iq>
If the responder does not support Jingle, it MUST return a <service-unavailable/> error.
Example 13. Responder does not support Jingle
<iq from='juliet@capulet.lit/balcony'
    id='xs51r0k4'
    to='romeo@montague.lit/orchard'
    type='error'>
  <error type='cancel'>
    <service-unavailable xmlns='urn:ietf:params:xml:ns:xmpp-stanzas'/>
  </error>
</iq>
If the responder wishes to redirect the request to another address, it MUST return a <redirect/> error.
Example 14. Responder redirection
<iq from='juliet@capulet.lit/balcony'
    id='xs51r0k4'
    to='romeo@montague.lit/orchard'
    type='error'>
  <error type='modify'>
    <redirect xmlns='urn:ietf:params:xml:ns:xmpp-stanzas'>xmpp:voicemail@capulet.lit</redirect>
  </error>
</iq>
If the responder does not have sufficient resources to participate in a session, it MUST return a <resource-constraint/> error.
Example 15. Responder has insufficient resources
<iq from='juliet@capulet.lit/balcony'
    id='xs51r0k4'
    to='romeo@montague.lit/orchard'
    type='error'>
  <error type='wait'>
    <resource-constraint xmlns='urn:ietf:params:xml:ns:xmpp-stanzas'/>
  </error>
</iq>
If the initiation request was malformed, the responder MUST return a <bad-request/> error.
Example 16. Initiation request malformed
<iq from='juliet@capulet.lit/balcony'
    id='xs51r0k4'
    to='romeo@montague.lit/orchard'
    type='error'>
  <error type='cancel'>
    <bad-request xmlns='urn:ietf:params:xml:ns:xmpp-stanzas'/>
  </error>
</iq>
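The five error cases above differ only in the error type and the stanza error condition, so a responder could drive them from a small lookup table. The following Python sketch is one possible arrangement; the situation keys and the helper name build_initiation_error are hypothetical, while the error types and conditions are taken from Examples 12 through 16.

```python
import xml.etree.ElementTree as ET

STANZA_ERROR_NS = "urn:ietf:params:xml:ns:xmpp-stanzas"

# Hypothetical situation keys mapped to (error type, condition element).
INITIATION_ERRORS = {
    "unknown-initiator": ("cancel", "service-unavailable"),
    "jingle-unsupported": ("cancel", "service-unavailable"),
    "redirect": ("modify", "redirect"),
    "insufficient-resources": ("wait", "resource-constraint"),
    "malformed-request": ("cancel", "bad-request"),
}


def build_initiation_error(iq_set, situation, redirect_uri=None):
    """Return the IQ-error a responder sends instead of an acknowledgement."""
    error_type, condition = INITIATION_ERRORS[situation]
    if condition == "redirect" and not redirect_uri:
        raise ValueError("a redirect error needs a target URI")
    iq = ET.Element("iq", {
        "from": iq_set.get("to"),
        "id": iq_set.get("id"),
        "to": iq_set.get("from"),
        "type": "error",
    })
    error = ET.SubElement(iq, "error", {"type": error_type})
    element = ET.SubElement(error, f"{{{STANZA_ERROR_NS}}}{condition}")
    if condition == "redirect":
        element.text = redirect_uri
    return iq


if __name__ == "__main__":
    request = ET.fromstring(
        "<iq from='romeo@montague.lit/orchard' id='xs51r0k4' "
        "to='juliet@capulet.lit/balcony' type='set'/>")
    reply = build_initiation_error(
        request, "redirect", "xmpp:voicemail@capulet.lit")
    print(ET.tostring(reply, encoding="unicode"))
```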
Although in general it is preferable for the responder to send a session-accept message as soon as possible, some forms of negotiation might be necessary before the parties can agree on an acceptable set of application formats and transport methods. There are many potential parameter combinations, as defined in the relevant specifications for various application formats and transport methods.
The allowable negotiations (e.g., content-level and transport-level negotiations) include:
These forms of negotiation can also occur after the session has been accepted.
As soon as possible after receiving the session-initiate message, the responder informs the initiator that she wishes to proceed with the session by sending a session-accept message.
Example 17. Responder accepts the session
<iq from='juliet@capulet.lit/balcony'
    id='jd82f517'
    to='romeo@montague.lit/orchard'
    type='set'>
  <jingle xmlns='urn:xmpp:jingle:1'
          action='session-accept'
          responder='juliet@capulet.lit/balcony'
          sid='a73sjjvkla37jfea'>
    <content creator='initiator' name='voice'>
      <description xmlns='urn:xmpp:jingle:apps:rtp:1' media='audio'>
        <payload-type id='97' name='speex' clockrate='8000'/>
        <payload-type id='18' name='G729'/>
      </description>
      <transport xmlns='urn:xmpp:jingle:transports:ice-udp:1'>
        <candidate component='1' foundation='1' generation='0'
                   id='or2ii2syr1' ip='192.0.2.1' network='0'
                   port='3478' priority='2130706431'
                   protocol='udp' type='host'/>
      </transport>
    </content>
  </jingle>
</iq>
Note: After receiving and acknowledging the "session-initiate" action received from the initiator, the responding client SHOULD present an interface element that enables a human user to explicitly agree to proceeding with the session (e.g., an "Accept Incoming Call?" pop-up window including "Yes" and "No" buttons). The responding client SHOULD NOT return a "session-accept" action to the initiator until the responder has explicitly agreed to proceed with the session (unless the initiator is on a list of entities whose sessions are automatically accepted).
The initiator then acknowledges the responder's definitive acceptance.
Example 18. Initiator acknowledges session acceptance
<iq from='romeo@montague.lit/orchard' id='jd82f517' to='juliet@capulet.lit/balcony' type='result'/>
The session is now in the ACTIVE state. However, this does not necessarily mean that the parties can exchange application data yet, be