Getting Started with WebRTC – HTML5 Rocks
Real-time communication without plugins

http://www.html5rocks.com/en/tutorials/webrtc/basics/

WebRTC is a new front in the long war for an open and unencumbered web. — Brendan Eich, inventor of JavaScript

Imagine a world where your phone, TV and computer could all communicate on a common platform. Imagine it was easy to add video chat to your web application. That’s the vision of WebRTC.

WebRTC implements open standards for real-time, plugin-free video, audio and data communication. The need is real:

  • A lot of web services already use Real-time Communication (RTC), but need downloads, native apps or plugins. These include Skype, Facebook (which uses Skype) and Google Hangouts (which uses the Google Talk plugin).
  • For end users, plugin download, installation and update can be complex, error prone and annoying.
  • For developers, plugins can be difficult to deploy, debug, troubleshoot, test and maintain—and may require licensing and integration of complex, expensive technology. It can be hard to persuade people to install plugins in the first place!

The guiding principles of the WebRTC project are that its APIs should be open source, free, standardised, and more efficient than existing technologies.

Want to try it out? WebRTC is available now in Google Chrome.

A good place to start is the simple video chat application at apprtc.appspot.com. Open the page in Chrome, with PeerConnection enabled on the chrome://flags page, then open the same URL, complete with the room number added as a query string, in a new window. There is a walkthrough of the code later in this article.

Quick start

Haven’t got time to read this article, or just want code?

  1. Get an overview of WebRTC from Justin Uberti’s Google I/O video.

  2. If you haven’t used getUserMedia, take a look at the HTML5 Rocks article on the subject, and view the source for Eric Bidelman‘s photobooth demo.
  3. Get to grips with the PeerConnection API by reading through the demo at webrtc-demos.appspot.com, which implements WebRTC on a single web page.
  4. Learn more about how WebRTC uses servers for signalling, NAT traversal and data communication, by reading through the code and the console logs from the video chat demo at apprtc.appspot.com.

A very short history of WebRTC

For many years, RTC components were expensive, complex and needed to be licensed—putting RTC out of the reach of individuals and smaller companies.

Gmail video chat became popular in 2008, and in 2011 Google introduced Hangouts, which uses the Google Talk service (as does Gmail). Google bought GIPS, a company which had developed many of the components required for RTC, such as codecs and echo cancellation techniques. Google open sourced the technologies developed by GIPS and engaged with relevant standards bodies, the IETF and W3C, to ensure industry consensus. In May 2011, Ericsson built the first implementation of WebRTC.

Other JavaScript APIs used by WebRTC apps, such as getUserMedia and WebSocket, emerged at the same time. Future integration with APIs such as Web Audio will make WebRTC even more powerful—WebRTC has already shown huge promise when teamed up with technologies such as WebGL.

Where are we now?

WebRTC has been available in the stable build of Google Chrome since version 20. The getUserMedia API is ‘flagless’ in Chrome from version 21: you don’t have to enable MediaStream on the chrome://flags page.

Opera 12 shipped with getUserMedia; further WebRTC implementation is planned for Opera this year. Firefox has WebRTC efforts underway, and has demonstrated a prototype version of PeerConnection. Full getUserMedia support is planned for Firefox 17 on desktop and Android. WebRTC functionality is available in Internet Explorer via Chrome Frame, and Skype (acquired by Microsoft in 2011) is reputedly planning to use WebRTC. Native implementations of WebRTC include WebKitGTK+.

As well as browser vendors, WebRTC has strong support from Cisco, Ericsson and other companies such as Voxeo, who recently announced the Phono jQuery plugin for building WebRTC-enabled web apps with phone functionality and messaging.

A word of warning: be skeptical of reports that a platform ‘supports WebRTC’. Often this actually just means that getUserMedia is supported, but not any of the other RTC components.

My first WebRTC

WebRTC client applications need to do several things:

  • Get streaming audio, video or data.
  • Communicate streaming audio, video or data.
  • Exchange control messages to initiate or close sessions and report errors.
  • Exchange information about media such as resolution and format.

More specifically, WebRTC as implemented uses the following APIs.

  • MediaStream: get access to data streams, such as from the user’s camera and microphone.
  • PeerConnection: audio or video calling, with facilities for encryption and bandwidth management.
  • DataChannel: peer-to-peer communication of generic data.

Crossing the streams

The MediaStream API represents a source of streaming media. Each MediaStream has one or more MediaStreamTracks, each of which corresponds to a synchronised media source. For example, a stream taken from camera and microphone input has synchronised video and audio tracks. (Don’t confuse MediaStream tracks with the <track> element, which is something entirely different.)

The getUserMedia() function can be used to get a LocalMediaStream. This has a label identifying the source device (something like ‘FaceTime HD Camera (Built-in)’) as well as audioTracks and videoTracks properties, each of which is a MediaStreamTrackList. In Chrome, the webkitURL.createObjectURL() method converts a LocalMediaStream to a Blob URL which can be set as the src of a video element. (In Opera, the src of the video can be set from the stream itself.)

Currently no browser allows audio data from getUserMedia to be passed to an audio or video element, or to other APIs such as Web Audio. The WebRTC PeerConnection API handles audio as well as video, but audio from getUserMedia is not yet supported in other contexts.

You can try out getUserMedia with the code below, if you have a webcam. Paste the code into the console in Chrome and press return. Permission to use the camera and microphone will be requested in an infobar at the top of the browser window; press the Allow button to proceed. The video stream from the webcam will then be displayed in the video element created by the code, at the bottom of the page.

navigator.getUserMedia = navigator.getUserMedia || navigator.webkitGetUserMedia ||
    navigator.mozGetUserMedia || navigator.msGetUserMedia;
window.URL = window.URL || window.webkitURL;

navigator.getUserMedia({video: true}, function(localMediaStream) {
  var video = document.createElement("video");
  video.autoplay = true;
  video.src = window.URL.createObjectURL(localMediaStream);
  document.body.appendChild(video);
}, function(error) {
  console.log(error);
});
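
To request the microphone as well as the camera, add an audio constraint. The one-line variation below is a sketch: onSuccess and onError are placeholder names for the success and error callbacks shown above, and (as noted earlier) the audio obtained this way cannot yet be used outside PeerConnection.

// Request both camera and microphone in a single permission prompt.
// onSuccess and onError are placeholder names for the callbacks above.
navigator.getUserMedia({video: true, audio: true}, onSuccess, onError);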

The intention is eventually to enable a MediaStream for any streaming data source, not just a camera or microphone. This could be extremely useful for gathering and communicating arbitrary real-time data, for example from sensors or other inputs.

Signalling

WebRTC uses PeerConnection to communicate streams of data, but also needs a mechanism to send control messages between peers, a process known as signalling. Signalling methods and protocols are not specified by WebRTC: signalling is not part of the PeerConnection API. Instead, WebRTC app developers can choose whatever messaging protocol they prefer, such as SIP or XMPP, and any appropriate duplex (two-way) communication channel such as WebSocket, or XMLHttpRequest (XHR) in tandem with the Google Channel API.

The apprtc.appspot.com example uses XHR and the Channel API. Silvia Pfeiffer has demonstrated WebRTC signalling via WebSocket and in May 2012 Doubango Telecom open-sourced the sipml5 SIP client, built with WebRTC and WebSocket.
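
As a concrete illustration, here is a minimal sketch of a WebSocket signalling channel. Everything in it is an assumption for illustration only: the server URL, the JSON message shape and the handler names are not specified by WebRTC or used by the demos above.

// Hypothetical signalling transport: WebRTC does not specify any of this.
var signallingChannel = new WebSocket('wss://signalling.example.com/');

// Relay a serialised SessionDescription (offer or answer) to the peer.
function sendSignal(type, sdp) {
  signallingChannel.send(JSON.stringify({type: type, sdp: sdp}));
}

// Dispatch incoming messages from the remote peer.
signallingChannel.onmessage = function(event) {
  var message = JSON.parse(event.data);
  // Hand message.sdp to the local PeerConnection here, as shown in the
  // processSignalingMessage() walkthrough later in this article.
  console.log('Received ' + message.type + ' from remote peer');
};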

To start a session, WebRTC clients need the following:

  • Local configuration information.
  • Remote configuration information.
  • Remote transport candidates: how to connect to the remote client (IP addresses and ports).

Configuration information is described in the form of a SessionDescription, the structure of which conforms to the Session Description Protocol, SDP. Serialised, an SDP object looks like this:

v=0
o=- 3883943731 1 IN IP4 127.0.0.1
s=
t=0 0
a=group:BUNDLE audio video
m=audio 1 RTP/SAVPF 103 104 0 8 106 105 13 126

// ...

a=ssrc:2223794119 label:H4fjnMzxy3dPIgQ7HxuCTLb4wLLLeRHnFxh810

Signalling proceeds like this:

  1. Caller sends offer.
  2. Callee receives offer.
  3. Callee sends answer.
  4. Caller receives answer.

The SessionDescription sent by the caller is known as an offer, and the response from the callee is an answer. (Note that WebRTC currently only supports one-to-one communication.)

The offer SessionDescription is passed to the caller’s browser via the PeerConnection setLocalDescription() method, and via signalling to the remote peer, whose own PeerConnection object invokes setRemoteDescription() with the offer. This architecture is called JSEP, JavaScript Session Establishment Protocol. (There’s an excellent animation explaining the process of signalling and streaming in Ericsson’s demo video for its first WebRTC implementation.)

[Diagram: JSEP architecture]

Once the signalling process has completed successfully, data can be streamed directly, peer to peer, between the caller and callee, or via an intermediary server (more about this below). Streaming is the job of PeerConnection.

PeerConnection

Below is a WebRTC architecture diagram. As you will notice, the green parts are complex!

[Diagram: WebRTC architecture (from webrtc.org)]

From a JavaScript perspective, the main thing to understand from this diagram is that PeerConnection shields web developers from myriad complexities that lurk beneath. The codecs and protocols used by WebRTC do a huge amount of work to make real-time communication possible, even over unreliable networks:

  • packet loss concealment
  • echo cancellation
  • bandwidth adaptivity
  • dynamic jitter buffering
  • automatic gain control
  • noise reduction and suppression
  • image ‘cleaning’.

PeerConnection sans servers

WebRTC from the PeerConnection point of view is described in the example below. The code is taken from the ‘single page’ WebRTC demo at webrtc-demos.appspot.com, which has local and remote PeerConnection (and local and remote video) on one web page. This doesn’t constitute anything very useful—caller and callee are on the same page—but it does make the workings of the PeerConnection API a little clearer, since the PeerConnection objects on the page can exchange data and messages directly without having to use intermediary servers.

First, a quick explanation of the name webkitPeerConnection00. When PeerConnection using the JSEP architecture was implemented in Chrome (see above), the original pre-JSEP implementation was renamed webkitDeprecatedPeerConnection. This made it possible to keep old demos working with a simple rename. The new JSEP PeerConnection implementation was named webkitPeerConnection00, and as the JSEP draft standard evolves, it might become webkitPeerConnection01, webkitPeerConnection02—and so on—to avoid more breakage. When the dust finally settles, the API name will become PeerConnection.
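
If you want experimental code to survive these renames, one defensive approach is to resolve whichever constructor the browser currently provides. This is only a sketch based on the names mentioned above; future names such as webkitPeerConnection01 remain speculative.

// Pick up whichever PeerConnection constructor this browser exposes.
// The list reflects the renames described above and will change as
// the JSEP draft evolves.
var PeerConnection = window.PeerConnection ||
    window.webkitPeerConnection00 ||
    window.webkitDeprecatedPeerConnection;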

So, without further ado, here is the process of setting up a call using PeerConnection…

Caller

  1. Create a new PeerConnection and add a stream (for example, from a webcam):
    pc1 = new webkitPeerConnection00(null, iceCallback1);
    // ...
    pc1.addStream(localstream);
  2. Create a local SessionDescription, apply it and initiate a session:
    var offer = pc1.createOffer(null);
    pc1.setLocalDescription(pc1.SDP_OFFER, offer);
    // ...
    pc1.startIce(); // start connection process
  3. (Wait for a response from the callee.)
  4. Receive remote SessionDescription and use it:
    pc1.setRemoteDescription(pc1.SDP_ANSWER, answer);

Callee

  1. (Receive call from caller.)
  2. Create PeerConnection and set remote session description:
    pc2 = new webkitPeerConnection00(null, iceCallback2);
    pc2.onaddstream = gotRemoteStream;
    // ...
    pc2.setRemoteDescription(pc2.SDP_OFFER, offer);
  3. Create local SessionDescription, apply it, and kick off response:
    var answer = pc2.createAnswer(offer.toSdp(),
      {has_audio:true, has_video:true});
    // ...
    pc2.setLocalDescription(pc2.SDP_ANSWER, answer);
    pc2.startIce();

Here’s the whole process (sans logging):

// create the 'sending' PeerConnection
pc1 = new webkitPeerConnection00(null, iceCallback1);
// create the 'receiving' PeerConnection
pc2 = new webkitPeerConnection00(null, iceCallback2);
// set the callback for the receiving PeerConnection to display video
pc2.onaddstream = gotRemoteStream;
// add the local stream for the sending PeerConnection
pc1.addStream(localstream);
// create an offer, with the local stream
var offer = pc1.createOffer(null);
// set the offer for the sending and receiving PeerConnection
pc1.setLocalDescription(pc1.SDP_OFFER, offer);
pc2.setRemoteDescription(pc2.SDP_OFFER, offer);
// create an answer
var answer = pc2.createAnswer(offer.toSdp(), {has_audio:true, has_video:true});
// set it on the sending and receiving PeerConnection
pc2.setLocalDescription(pc2.SDP_ANSWER, answer);
pc1.setRemoteDescription(pc1.SDP_ANSWER, answer);
// start the connection process
pc1.startIce();
pc2.startIce();

PeerConnection plus servers

So… That’s WebRTC on one page in one browser. But what about a real application, with peers on different computers?

In the real world, WebRTC needs servers, however simple, so the following can happen:

  • Users discover each other.
  • Users send their details to each other.
  • Communication survives network glitches.
  • WebRTC client applications communicate data about media such as video format and resolution.
  • WebRTC client applications traverse NAT gateways.

In a nutshell, WebRTC needs two types of server-side functionality:

  • User discovery, communication and signalling.
  • NAT traversal and streaming data communication.

NAT traversal, peer-to-peer networking, and the requirements for building a server app for user discovery and signalling, are beyond the scope of this article. Suffice to say that the STUN protocol and its extension TURN are used by the ICE framework to enable PeerConnection to cope with NAT traversal and other network vagaries.

ICE is a framework for connecting peers, such as two video chat clients. Initially, ICE tries to connect peers directly, with the lowest possible latency, via UDP. In this process, STUN servers have a single task: to enable a peer behind a NAT to find out its public address and port. (Google has a couple of STUN servers, one of which is used in the apprtc.appspot.com example.)
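
In the PeerConnection API as implemented in Chrome at the time of writing, the STUN server is passed to the constructor as a configuration string. A minimal sketch, using the same public Google STUN server that appears in the apprtc code later in this article (onIceCandidate is the candidate callback, also shown later):

// Point PeerConnection at a STUN server so ICE can discover this
// peer's public address and port.
var pc = new webkitPeerConnection00('STUN stun.l.google.com:19302',
    onIceCandidate);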

[Diagram: Finding connection candidates]

If UDP fails, ICE tries TCP: first HTTP, then HTTPS. If direct connection fails—in particular, because of enterprise NAT traversal and firewalls—ICE uses an intermediary (relay) TURN server. In other words, ICE will first use STUN with UDP to directly connect peers and, if that fails, will fall back to a TURN relay server. The expression ‘finding candidates’ refers to the process of finding network interfaces and ports.

[Diagram: WebRTC data pathways]

To find out more about how to set up a server to deal with signalling and user discovery, take a look at the code repository for the apprtc.appspot.com demo, which is at code.google.com/p/webrtc-samples/source/browse/trunk/apprtc/. This uses the Google App Engine Channel API. For information about using a WebSocket server for signalling, check out Silvia Pfeiffer’s WebSocket WebRTC app.

A simple video chat client

A good place to try out WebRTC, complete with signalling and NAT traversal using a STUN server, is the video chat demo at apprtc.appspot.com.

This app is deliberately verbose in its logging: check the console to understand the order of events.

Below we give a detailed walk-through of the code.

What’s going on?

The demo starts by running the initialize() function:

function initialize() {
  console.log("Initializing; room=85444496.");
  card = document.getElementById("card");
  localVideo = document.getElementById("localVideo");
  miniVideo = document.getElementById("miniVideo");
  remoteVideo = document.getElementById("remoteVideo");
  resetStatus();
  openChannel();
  getUserMedia();
}

This code initializes variables for the HTML video elements that will display video streams from the local camera (localVideo) and from the camera on the remote client (remoteVideo). resetStatus() simply sets a status message.

The openChannel() function sets up messaging between WebRTC clients:

function openChannel() {
  console.log("Opening channel.");
  var channel = new goog.appengine.Channel('AHRlWrqwxKQHdOiOaux3JkDQaxmTvdlYgz1wL69DE20mE3Xq0WaxE3zznRLD6_jwIGiRFlAR-En4lAlLHWRKk862_JTGHrdCHaoTuJTCw8l6Cf7ChMWiVjU');
  var handler = {
    'onopen': onChannelOpened,
    'onmessage': onChannelMessage,
    'onerror': onChannelError,
    'onclose': onChannelClosed
  };
  socket = channel.open(handler);
}

For signalling, this demo uses the Google App Engine Channel API, which enables messaging between JavaScript clients without polling. (WebRTC signalling is covered in more detail above).

[Diagram: Architecture of the apprtc video chat application]

Establishing a channel with the Channel API works like this:

  1. Client A generates a unique ID.
  2. Client A requests a Channel token from the App Engine app, passing its ID.
  3. App Engine app requests a channel and a token for the client’s ID from the Channel API.
  4. App sends the token to Client A.
  5. Client A opens a socket and listens on the channel set up on the server.

[Diagram: The Google Channel API: establishing a channel]

Sending a message works like this:

  1. Client B makes a POST request to the App Engine app with an update.
  2. The App Engine app passes a request to the channel.
  3. The channel carries a message to Client A.
  4. Client A’s onmessage callback is called.

[Diagram: The Google Channel API: sending a message]

Just to reiterate: signalling messages are communicated via whatever mechanism the developer chooses: the signalling mechanism is not specified by WebRTC. The Channel API is used in this demo, but other methods (such as WebSocket) could be used instead.

After the call to openChannel(), the getUserMedia() function called by initialize() checks if the browser supports the getUserMedia API. (Find out more about getUserMedia on HTML5 Rocks.) If all is well, onUserMediaSuccess is called:

function onUserMediaSuccess(stream) {
  console.log("User has granted access to local media.");
  var url = webkitURL.createObjectURL(stream);
  localVideo.style.opacity = 1;
  localVideo.src = url;
  localStream = stream;
  // Caller creates PeerConnection.
  if (initiator) maybeStart();
}

This causes video from the local camera to be displayed in the localVideo element, by creating an object (Blob) URL for the camera’s data stream and then setting that URL as the src for the element. (createObjectURL is used here as a way to get a URI for an ‘in memory’ binary resource, i.e. the LocalMediaStream for the video.) The data stream is also set as the value of localStream, which is subsequently made available to the remote user.

At this point, initiator has been set to 1 (and it stays that way until the caller’s session has terminated) so maybeStart() is called:

function maybeStart() {
  if (!started && localStream && channelReady) {
    setStatus("Connecting...");
    console.log("Creating PeerConnection.");
    createPeerConnection();
    console.log("Adding local stream.");
    pc.addStream(localStream);
    started = true;
    // Caller initiates offer to peer.
    if (initiator)
      doCall();
  }
}

This function uses a handy construct when working with multiple asynchronous callbacks: maybeStart() may be called by any one of several functions, but the code in it runs only when localStream has been defined, channelReady has been set to true, and communication hasn’t already started. So, if a connection hasn’t already been made, a local stream is available, and a channel is ready for signalling, a connection is created and passed the local video stream. Once that happens, started is set to true, so a connection won’t be started more than once.

PeerConnection: making a call

createPeerConnection(), called by maybeStart(), is where the real action begins:

function createPeerConnection() {
  try {
    pc = new webkitPeerConnection00("STUN stun.l.google.com:19302", onIceCandidate);
    console.log("Created webkitPeerConnection00 with config \"STUN stun.l.google.com:19302\".");
  } catch (e) {
    console.log("Failed to create PeerConnection, exception: " + e.message);
    alert("Cannot create PeerConnection object; Is the 'PeerConnection' flag enabled in about:flags?");
    return;
  }

  pc.onconnecting = onSessionConnecting;
  pc.onopen = onSessionOpened;
  pc.onaddstream = onRemoteStreamAdded;
  pc.onremovestream = onRemoteStreamRemoved;
}

The underlying purpose is to set up a connection, using a STUN server, with onIceCandidate() as the callback (see above for an explanation of ICE, STUN and ‘candidate’). Handlers are then set for each of the PeerConnection events: when a session is connecting or open, and when a remote stream is added or removed. In fact, in this example these handlers only log status messages—except for onRemoteStreamAdded(), which sets the source for the remoteVideo element:

function onRemoteStreamAdded(event) {
  console.log("Remote stream added.");
  var url = webkitURL.createObjectURL(event.stream);
  miniVideo.src = localVideo.src;
  remoteVideo.src = url;
  waitForRemoteVideo();
}

Once createPeerConnection() has been invoked in maybeStart(), a call is initiated:

function doCall() {
  console.log("Send offer to peer");
  var offer = pc.createOffer({audio:true, video:true});
  pc.setLocalDescription(pc.SDP_OFFER, offer);
  sendMessage({type: 'offer', sdp: offer.toSdp()});
  pc.startIce();
}

The offer creation process here is similar to the no-signalling example above but, in addition, a message is sent to the remote peer, giving a serialised SessionDescription for the offer. pc.startIce() starts the connection process using the ICE framework (as described above).

Signalling with the Channel API

The onIceCandidate() callback, registered when the PeerConnection is successfully created in createPeerConnection(), sends information about each candidate that has been ‘gathered’:

function onIceCandidate(candidate, moreToFollow) {
  if (candidate) {
    sendMessage({type: 'candidate',
      label: candidate.label, candidate: candidate.toSdp()});
  }
  if (!moreToFollow) {
    console.log("End of candidates.");
  }
}

Outbound messaging, from the client to the server, is done by sendMessage() with an XHR request:

function sendMessage(message) {
  var msgString = JSON.stringify(message);
  console.log('C->S: ' + msgString);
  var path = '/message?r=85444496' + '&u=34898650';
  var xhr = new XMLHttpRequest();
  xhr.open('POST', path, true);
  xhr.send(msgString);
}

XHR works fine for sending signalling messages from the client to the server, but some mechanism is needed for server–client messaging: this demo uses the Google App Engine Channel API. Messages from the API (i.e. from the App Engine server) are handled by processSignalingMessage():

function processSignalingMessage(message) {
  var msg = JSON.parse(message);
  if (msg.type === 'offer') {
    // Callee creates PeerConnection
    if (!initiator && !started)
      maybeStart();
    pc.setRemoteDescription(pc.SDP_OFFER, new SessionDescription(msg.sdp));
    doAnswer();
  } else if (msg.type === 'answer' && started) {
    pc.setRemoteDescription(pc.SDP_ANSWER, new SessionDescription(msg.sdp));
  } else if (msg.type === 'candidate' && started) {
    var candidate = new IceCandidate(msg.label, msg.candidate);
    pc.processIceMessage(candidate);
  } else if (msg.type === 'bye' && started) {
    onRemoteHangup();
  }
}

If the message is an answer from a peer (a response to an offer), PeerConnection sets the remote SessionDescription and communication can begin. If the message is an offer (i.e. a message from the caller, received by the callee), PeerConnection sets the remote SessionDescription, sends an answer back to the caller, and starts the connection by invoking the PeerConnection startIce() method:

function doAnswer() {
  console.log("Send answer to peer");
  var offer = pc.remoteDescription;
  var answer = pc.createAnswer(offer.toSdp(), {audio:true,video:true});
  pc.setLocalDescription(pc.SDP_ANSWER, answer);
  sendMessage({type: 'answer', sdp: answer.toSdp()});
  pc.startIce();
}

And that’s it! The caller and callee have discovered each other and exchanged information about their capabilities, a call session is initiated, and real-time data communication can begin.

DataChannel

As well as audio and video, WebRTC supports real-time communication for other types of data.

The DataChannel API will enable peer-to-peer exchange of arbitrary data, with low latency and high throughput.

There are many potential use cases for the API, including:

  • Gaming
  • Remote desktop applications
  • Real-time text chat
  • File transfer
  • Decentralized networks

The API has several features to make the most of PeerConnection and enable powerful and flexible peer-to-peer communication:

  • Leveraging of PeerConnection session setup.
  • Multiple simultaneous channels, with prioritization.
  • Reliable and unreliable delivery semantics.
  • Built-in security (DTLS) and congestion control.
  • Ability to use with or without audio or video.

The syntax is somewhat similar to WebSocket, with send() and onmessage, as you will see in the code sample below:

// PeerConnection setup and offer-answer exchange omitted
var dc1 = pc1.createDataChannel("mylabel");  // create the sending DataChannel (reliable mode)
var dc2 = pc2.createDataChannel("mylabel");  // create the receiving DataChannel (reliable mode)

// append received DataChannel messages to a textarea
var receiveTextarea = document.querySelector("textarea#receive");
dc2.onmessage = function(event) {
  receiveTextarea.value += event.data;
};  

var sendInput = document.querySelector("input#send");
// send message over the DataChannel
function onSend() {
  dc1.send(sendInput.value);
}
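
The feature list above mentions both reliable and unreliable delivery semantics. Under the draft specification, unreliable (UDP-like) delivery is requested with a dictionary argument to createDataChannel(); the {reliable: false} option and the channel label below are assumptions taken from the draft, not shipping API.

// Sketch only: request unreliable, UDP-like delivery per the draft spec.
// {reliable: false} and the "game-state" label are illustrative.
var dcUnreliable = pc1.createDataChannel("game-state", {reliable: false});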

For more information about DataChannel, take a look at the IETF’s draft protocol spec.

In conclusion

The APIs and standards of WebRTC can democratise and decentralise tools for content creation and communication—for telephony, gaming, video production, music making, news gathering and many other applications.

Technology doesn’t get much more disruptive than this.

We look forward to seeing what inventive developers make of WebRTC as it becomes widely implemented over the next few months. As blogger Phil Edholm put it, ‘Potentially, WebRTC and HTML5 could enable the same transformation for real-time communications that the original browser did for information.’

Learn more

WebRTC support summary

MediaStream and getUserMedia

  • Chrome 18.0.1008+ (enable MediaStream on about:flags)
  • Opera, Opera Mobile 12
  • Firefox (Q4 2012)

PeerConnection

  • Chrome 20+ (enable on about:flags)
  • Targeting Chrome 22 for general availability
  • Firefox (Q4 2012)

DataChannel

  • Chrome + Firefox (Q4 2012)
  • Internet Explorer support via Chrome Frame
  • Mobile browser support in progress
  • Native APIs for PeerConnection also available

For more information about support for APIs such as getUserMedia, see caniuse.com.
