WebRTC is pretty cool. It allows any web browser to make voice and video calls to any other browser – no extra software required!
The problem is that while there are tons of guides and gobs of sample code out there, nobody really explains how it works.
The end result is that you program away, and then discover that OOPS! It’s not connecting…
So, let’s see how to make WebRTC actually work!
An Intro to WebRTC
Instead of re-inventing the wheel, just go read webrtc.org and also the WebRTC API page on MDN.
You can also find some code examples and demos here.
So why doesn’t your WebRTC code work?
The short answer is that you’re doing it wrong. Don’t worry – it’s not your fault.
The long answer will require some code to look at.
As a good starting point, I suggest this MDN GitHub repo, which I’ll reproduce below in case it disappears:
// WebSocket and WebRTC based multi-user chat sample with two-way video
// calling, including use of TURN if applicable or necessary.
//
// This file contains the JavaScript code that implements the client-side
// features for connecting and managing chat and video calls.
//
// To read about how this sample works: http://bit.ly/webrtc-from-chat
//
// Any copyright is dedicated to the Public Domain.
// http://creativecommons.org/publicdomain/zero/1.0/

"use strict";

// Get our hostname

var myHostname = window.location.hostname;
if (!myHostname) {
  myHostname = "localhost";
}
log("Hostname: " + myHostname);

// WebSocket chat/signaling channel variables.

var connection = null;
var clientID = 0;

// The media constraints object describes what sort of stream we want
// to request from the local A/V hardware (typically a webcam and
// microphone). Here, we specify only that we want both audio and
// video; however, you can be more specific. It's possible to state
// that you would prefer (or require) specific resolutions of video,
// whether to prefer the user-facing or rear-facing camera (if available),
// and so on.
//
// See also:
// https://developer.mozilla.org/en-US/docs/Web/API/MediaStreamConstraints
// https://developer.mozilla.org/en-US/docs/Web/API/MediaDevices/getUserMedia
//

var mediaConstraints = {
  audio: true,            // We want an audio track
  video: {
    aspectRatio: {
      ideal: 1.333333     // 4:3 aspect is preferred
    }
  }
};

var myUsername = null;
var targetUsername = null;      // To store username of other peer
var myPeerConnection = null;    // RTCPeerConnection
var transceiver = null;         // RTCRtpTransceiver
var webcamStream = null;        // MediaStream from webcam

// Output logging information to console.

function log(text) {
  var time = new Date();

  console.log("[" + time.toLocaleTimeString() + "] " + text);
}

// Output an error message to console.

function log_error(text) {
  var time = new Date();

  console.trace("[" + time.toLocaleTimeString() + "] " + text);
}

// Send a JavaScript object by converting it to JSON and sending
// it as a message on the WebSocket connection.

function sendToServer(msg) {
  var msgJSON = JSON.stringify(msg);

  log("Sending '" + msg.type + "' message: " + msgJSON);
  connection.send(msgJSON);
}

// Called when the "id" message is received; this message is sent by the
// server to assign this login session a unique ID number; in response,
// this function sends a "username" message to set our username for this
// session.
function setUsername() {
  myUsername = document.getElementById("name").value;

  sendToServer({
    name: myUsername,
    date: Date.now(),
    id: clientID,
    type: "username"
  });
}

// Open and configure the connection to the WebSocket server.

function connect() {
  var serverUrl;
  var scheme = "ws";

  // If this is an HTTPS connection, we have to use a secure WebSocket
  // connection too, so add another "s" to the scheme.

  if (document.location.protocol === "https:") {
    scheme += "s";
  }
  serverUrl = scheme + "://" + myHostname + ":6503";

  log(`Connecting to server: ${serverUrl}`);
  connection = new WebSocket(serverUrl, "json");

  connection.onopen = function(evt) {
    document.getElementById("text").disabled = false;
    document.getElementById("send").disabled = false;
  };

  connection.onerror = function(evt) {
    console.dir(evt);
  }

  connection.onmessage = function(evt) {
    var chatBox = document.querySelector(".chatbox");
    var text = "";
    var msg = JSON.parse(evt.data);
    log("Message received: ");
    console.dir(msg);
    var time = new Date(msg.date);
    var timeStr = time.toLocaleTimeString();

    switch(msg.type) {
      case "id":
        clientID = msg.id;
        setUsername();
        break;

      case "username":
        text = "<b>User <em>" + msg.name + "</em> signed in at " + timeStr + "</b><br>";
        break;

      case "message":
        text = "(" + timeStr + ") <b>" + msg.name + "</b>: " + msg.text + "<br>";
        break;

      case "rejectusername":
        myUsername = msg.name;
        text = "<b>Your username has been set to <em>" + myUsername +
          "</em> because the name you chose is in use.</b><br>";
        break;

      case "userlist":      // Received an updated user list
        handleUserlistMsg(msg);
        break;

      // Signaling messages: these messages are used to trade WebRTC
      // signaling information during negotiations leading up to a video
      // call.

      case "video-offer":  // Invitation and offer to chat
        handleVideoOfferMsg(msg);
        break;

      case "video-answer":  // Callee has answered our offer
        handleVideoAnswerMsg(msg);
        break;

      case "new-ice-candidate": // A new ICE candidate has been received
        handleNewICECandidateMsg(msg);
        break;

      case "hang-up": // The other peer has hung up the call
        handleHangUpMsg(msg);
        break;

      // Unknown message; output to console for debugging.

      default:
        log_error("Unknown message received:");
        log_error(msg);
    }

    // If there's text to insert into the chat buffer, do so now, then
    // scroll the chat panel so that the new text is visible.

    if (text.length) {
      chatBox.innerHTML += text;
      chatBox.scrollTop = chatBox.scrollHeight - chatBox.clientHeight;
    }
  };
}

// Handles a click on the Send button (or pressing return/enter) by
// building a "message" object and sending it to the server.
function handleSendButton() {
  var msg = {
    text: document.getElementById("text").value,
    type: "message",
    id: clientID,
    date: Date.now()
  };
  sendToServer(msg);
  document.getElementById("text").value = "";
}

// Handler for keyboard events. This is used to intercept the return and
// enter keys so that we can call send() to transmit the entered text
// to the server.
function handleKey(evt) {
  if (evt.keyCode === 13 || evt.keyCode === 14) {
    if (!document.getElementById("send").disabled) {
      handleSendButton();
    }
  }
}

// Create the RTCPeerConnection which knows how to talk to our
// selected STUN/TURN server and then uses getUserMedia() to find
// our camera and microphone and add that stream to the connection for
// use in our video call. Then we configure event handlers to get
// needed notifications on the call.

async function createPeerConnection() {
  log("Setting up a connection...");

  // Create an RTCPeerConnection which knows to use our chosen
  // STUN/TURN server.

  myPeerConnection = new RTCPeerConnection({
    iceServers: [     // Information about ICE servers - Use your own!
      {
        urls: "turn:" + myHostname,  // A TURN server
        username: "webrtc",
        credential: "turnserver"
      }
    ]
  });

  // Set up event handlers for the ICE negotiation process.

  myPeerConnection.onicecandidate = handleICECandidateEvent;
  myPeerConnection.oniceconnectionstatechange = handleICEConnectionStateChangeEvent;
  myPeerConnection.onicegatheringstatechange = handleICEGatheringStateChangeEvent;
  myPeerConnection.onsignalingstatechange = handleSignalingStateChangeEvent;
  myPeerConnection.onnegotiationneeded = handleNegotiationNeededEvent;
  myPeerConnection.ontrack = handleTrackEvent;
}

// Called by the WebRTC layer to let us know when it's time to
// begin, resume, or restart ICE negotiation.

async function handleNegotiationNeededEvent() {
  log("*** Negotiation needed");

  try {
    log("---> Creating offer");
    const offer = await myPeerConnection.createOffer();

    // If the connection hasn't yet achieved the "stable" state,
    // return to the caller. Another negotiationneeded event
    // will be fired when the state stabilizes.

    if (myPeerConnection.signalingState != "stable") {
      log("     -- The connection isn't stable yet; postponing...");
      return;
    }

    // Establish the offer as the local peer's current
    // description.

    log("---> Setting local description to the offer");
    await myPeerConnection.setLocalDescription(offer);

    // Send the offer to the remote peer.

    log("---> Sending the offer to the remote peer");
    sendToServer({
      name: myUsername,
      target: targetUsername,
      type: "video-offer",
      sdp: myPeerConnection.localDescription
    });
  } catch(err) {
    log("*** The following error occurred while handling the negotiationneeded event:");
    reportError(err);
  };
}

// Called by the WebRTC layer when events occur on the media tracks
// on our WebRTC call. This includes when streams are added to and
// removed from the call.
//
// track events include the following fields:
//
// RTCRtpReceiver       receiver
// MediaStreamTrack     track
// MediaStream[]        streams
// RTCRtpTransceiver    transceiver
//
// In our case, we're just taking the first stream found and attaching
// it to the <video> element for incoming media.

function handleTrackEvent(event) {
  log("*** Track event");
  document.getElementById("received_video").srcObject = event.streams[0];
  document.getElementById("hangup-button").disabled = false;
}

// Handles |icecandidate| events by forwarding the specified
// ICE candidate (created by our local ICE agent) to the other
// peer through the signaling server.

function handleICECandidateEvent(event) {
  if (event.candidate) {
    log("*** Outgoing ICE candidate: " + event.candidate.candidate);

    sendToServer({
      type: "new-ice-candidate",
      target: targetUsername,
      candidate: event.candidate
    });
  }
}

// Handle |iceconnectionstatechange| events. This will detect
// when the ICE connection is closed, failed, or disconnected.
//
// This is called when the state of the ICE agent changes.

function handleICEConnectionStateChangeEvent(event) {
  log("*** ICE connection state changed to " + myPeerConnection.iceConnectionState);

  switch(myPeerConnection.iceConnectionState) {
    case "closed":
    case "failed":
    case "disconnected":
      closeVideoCall();
      break;
  }
}

// Set up a |signalingstatechange| event handler. This will detect when
// the signaling connection is closed.
//
// NOTE: This will actually move to the new RTCPeerConnectionState enum
// returned in the property RTCPeerConnection.connectionState when
// browsers catch up with the latest version of the specification!

function handleSignalingStateChangeEvent(event) {
  log("*** WebRTC signaling state changed to: " + myPeerConnection.signalingState);
  switch(myPeerConnection.signalingState) {
    case "closed":
      closeVideoCall();
      break;
  }
}

// Handle the |icegatheringstatechange| event. This lets us know what the
// ICE engine is currently working on: "new" means no networking has happened
// yet, "gathering" means the ICE engine is currently gathering candidates,
// and "complete" means gathering is complete. Note that the engine can
// alternate between "gathering" and "complete" repeatedly as needs and
// circumstances change.
//
// We don't need to do anything when this happens, but we log it to the
// console so you can see what's going on when playing with the sample.

function handleICEGatheringStateChangeEvent(event) {
  log("*** ICE gathering state changed to: " + myPeerConnection.iceGatheringState);
}

// Given a message containing a list of usernames, this function
// populates the user list box with those names, making each item
// clickable to allow starting a video call.

function handleUserlistMsg(msg) {
  var i;
  var listElem = document.querySelector(".userlistbox");

  // Remove all current list members. We could do this smarter,
  // by adding and updating users instead of rebuilding from
  // scratch but this will do for this sample.

  while (listElem.firstChild) {
    listElem.removeChild(listElem.firstChild);
  }

  // Add member names from the received list.

  msg.users.forEach(function(username) {
    var item = document.createElement("li");
    item.appendChild(document.createTextNode(username));
    item.addEventListener("click", invite, false);

    listElem.appendChild(item);
  });
}

// Close the RTCPeerConnection and reset variables so that the user can
// make or receive another call if they wish. This is called both
// when the user hangs up, the other user hangs up, or if a connection
// failure is detected.

function closeVideoCall() {
  var localVideo = document.getElementById("local_video");

  log("Closing the call");

  // Close the RTCPeerConnection

  if (myPeerConnection) {
    log("--> Closing the peer connection");

    // Disconnect all our event listeners; we don't want stray events
    // to interfere with the hangup while it's ongoing.

    myPeerConnection.ontrack = null;
    myPeerConnection.onicecandidate = null;
    myPeerConnection.oniceconnectionstatechange = null;
    myPeerConnection.onsignalingstatechange = null;
    myPeerConnection.onicegatheringstatechange = null;
    myPeerConnection.onnegotiationneeded = null;

    // Stop all transceivers on the connection

    myPeerConnection.getTransceivers().forEach(transceiver => {
      transceiver.stop();
    });

    // Stop the webcam preview as well by pausing the <video>
    // element, then stopping each of the getUserMedia() tracks
    // on it.

    if (localVideo.srcObject) {
      localVideo.pause();
      localVideo.srcObject.getTracks().forEach(track => {
        track.stop();
      });
    }

    // Close the peer connection

    myPeerConnection.close();
    myPeerConnection = null;
    webcamStream = null;
  }

  // Disable the hangup button

  document.getElementById("hangup-button").disabled = true;
  targetUsername = null;
}

// Handle the "hang-up" message, which is sent if the other peer
// has hung up the call or otherwise disconnected.

function handleHangUpMsg(msg) {
  log("*** Received hang up notification from other peer");

  closeVideoCall();
}

// Hang up the call by closing our end of the connection, then
// sending a "hang-up" message to the other peer (keep in mind that
// the signaling is done on a different connection). This notifies
// the other peer that the connection should be terminated and the UI
// returned to the "no call in progress" state.

function hangUpCall() {
  closeVideoCall();

  sendToServer({
    name: myUsername,
    target: targetUsername,
    type: "hang-up"
  });
}

// Handle a click on an item in the user list by inviting the clicked
// user to video chat. Note that we don't actually send a message to
// the callee here -- calling RTCPeerConnection.addTrack() issues
// a |negotiationneeded| event, so we'll let our handler for that
// make the offer.

async function invite(evt) {
  log("Starting to prepare an invitation");
  if (myPeerConnection) {
    alert("You can't start a call because you already have one open!");
  } else {
    var clickedUsername = evt.target.textContent;

    // Don't allow users to call themselves, because weird.

    if (clickedUsername === myUsername) {
      alert("I'm afraid I can't let you talk to yourself. That would be weird.");
      return;
    }

    // Record the username being called for future reference

    targetUsername = clickedUsername;
    log("Inviting user " + targetUsername);

    // Call createPeerConnection() to create the RTCPeerConnection.
    // When this returns, myPeerConnection is our RTCPeerConnection
    // and webcamStream is a stream coming from the camera. They are
    // not linked together in any way yet.

    log("Setting up connection to invite user: " + targetUsername);
    createPeerConnection();

    // Get access to the webcam stream and attach it to the
    // "preview" box (id "local_video").

    try {
      webcamStream = await navigator.mediaDevices.getUserMedia(mediaConstraints);
      document.getElementById("local_video").srcObject = webcamStream;
    } catch(err) {
      handleGetUserMediaError(err);
      return;
    }

    // Add the tracks from the stream to the RTCPeerConnection

    try {
      webcamStream.getTracks().forEach(
        transceiver = track => myPeerConnection.addTransceiver(track, {streams: [webcamStream]})
      );
    } catch(err) {
      handleGetUserMediaError(err);
    }
  }
}

// Accept an offer to video chat. We configure our local settings,
// create our RTCPeerConnection, get and attach our local camera
// stream, then create and send an answer to the caller.

async function handleVideoOfferMsg(msg) {
  targetUsername = msg.name;

  // If we're not already connected, create an RTCPeerConnection
  // to be linked to the caller.

  log("Received video chat offer from " + targetUsername);
  if (!myPeerConnection) {
    createPeerConnection();
  }

  // We need to set the remote description to the received SDP offer
  // so that our local WebRTC layer knows how to talk to the caller.

  var desc = new RTCSessionDescription(msg.sdp);

  // If the connection isn't stable yet, wait for it...

  if (myPeerConnection.signalingState != "stable") {
    log("  - But the signaling state isn't stable, so triggering rollback");

    // Set the local and remote descriptions for rollback; don't proceed
    // until both return.
    await Promise.all([
      myPeerConnection.setLocalDescription({type: "rollback"}),
      myPeerConnection.setRemoteDescription(desc)
    ]);
    return;
  } else {
    log("  - Setting remote description");
    await myPeerConnection.setRemoteDescription(desc);
  }

  // Get the webcam stream if we don't already have it

  if (!webcamStream) {
    try {
      webcamStream = await navigator.mediaDevices.getUserMedia(mediaConstraints);
    } catch(err) {
      handleGetUserMediaError(err);
      return;
    }

    document.getElementById("local_video").srcObject = webcamStream;

    // Add the camera stream to the RTCPeerConnection

    try {
      webcamStream.getTracks().forEach(
        transceiver = track => myPeerConnection.addTransceiver(track, {streams: [webcamStream]})
      );
    } catch(err) {
      handleGetUserMediaError(err);
    }
  }

  log("---> Creating and sending answer to caller");

  await myPeerConnection.setLocalDescription(await myPeerConnection.createAnswer());

  sendToServer({
    name: myUsername,
    target: targetUsername,
    type: "video-answer",
    sdp: myPeerConnection.localDescription
  });
}

// Responds to the "video-answer" message sent to the caller
// once the callee has decided to accept our request to talk.

async function handleVideoAnswerMsg(msg) {
  log("*** Call recipient has accepted our call");

  // Configure the remote description, which is the SDP payload
  // in our "video-answer" message.

  var desc = new RTCSessionDescription(msg.sdp);
  await myPeerConnection.setRemoteDescription(desc).catch(reportError);
}

// A new ICE candidate has been received from the other peer. Call
// RTCPeerConnection.addIceCandidate() to send it along to the
// local ICE framework.

async function handleNewICECandidateMsg(msg) {
  var candidate = new RTCIceCandidate(msg.candidate);

  log("*** Adding received ICE candidate: " + JSON.stringify(candidate));
  try {
    await myPeerConnection.addIceCandidate(candidate);
  } catch(err) {
    reportError(err);
  }
}

// Handle errors which occur when trying to access the local media
// hardware; that is, exceptions thrown by getUserMedia(). The two most
// likely scenarios are that the user has no camera and/or microphone
// or that they declined to share their equipment when prompted. If
// they simply opted not to share their media, that's not really an
// error, so we won't present a message in that situation.

function handleGetUserMediaError(e) {
  log_error(e);
  switch(e.name) {
    case "NotFoundError":
      alert("Unable to open your call because no camera and/or microphone " +
            "were found.");
      break;
    case "SecurityError":
    case "NotAllowedError":         // Current name for a declined permission prompt
    case "PermissionDeniedError":   // Older name for the same thing
      // Do nothing; this is the same as the user canceling the call.
      break;
    default:
      alert("Error opening your camera and/or microphone: " + e.message);
      break;
  }

  // Make sure we shut down our end of the RTCPeerConnection so we're
  // ready to try again.

  closeVideoCall();
}

// Handles reporting errors. Currently, we just dump stuff to console but
// in a real-world application, an appropriate (and user-friendly)
// error message should be displayed.

function reportError(errMessage) {
  log_error(`Error ${errMessage.name}: ${errMessage.message}`);
}
Note that the above code is just a starting point, and will require some modifications for your particular app!
Okay, here goes…
The actual process of negotiating a WebRTC call is not really clear from the above code.
Naturally, you need a signaling server – in the above code, that’s done via a WebSocket connection. You can use whatever other messaging system you want between the two clients, but you’ll need something.
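To make the "you'll need something" part concrete, here is a minimal sketch of the only job the signaling layer really has: getting a message to whoever is named in its target field. The createRelay, register, and deliver names are made up for illustration, and the in-memory Map is just a stand-in for a WebSocket server or whatever transport you pick.

```javascript
// Hypothetical in-memory relay: maps a username to a delivery callback.
// A real app would put a WebSocket (or any other transport) behind this.
function createRelay() {
  const clients = new Map();

  return {
    // Remember how to reach a user.
    register(username, onMessage) {
      clients.set(username, onMessage);
    },

    // Forward a message to msg.target; returns false if nobody is there.
    deliver(msg) {
      const onMessage = clients.get(msg.target);
      if (!onMessage) {
        return false;
      }
      // Serialize and parse to mimic what a network hop does to the object.
      onMessage(JSON.parse(JSON.stringify(msg)));
      return true;
    }
  };
}
```

The relay never looks inside the message. Offers, answers, and ICE candidates are all just envelopes with a target on them.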
The first step is to invite() a remote user to a call.
The invitation involves you calling createPeerConnection(). That just tells your machine where to find the STUN/TURN server for negotiating an actual connection between puters. Your puter then calls getUserMedia() which grabs your cam/mic, and then adds the video/audio tracks to your RTCPeerConnection.
It’s the RTCPeerConnection OFFER that you need to send to your remote client, but hold your horses… Your code shouldn’t actually send any message, or OFFER, to the remote client until RTCPeerConnection.addTrack (or addTransceiver) is called!
When .addTrack() is called, the negotiationneeded event is fired. It’s inside the handleNegotiationNeededEvent handler that you will do createOffer(), setLocalDescription(offer), and then send that OFFER to the remote client.
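Boiled down, the caller's half looks something like the sketch below. It assumes pc is an RTCPeerConnection and that sendSignal is a hypothetical function that hands a message to your signaling channel; sendOfferWhenNeeded is a made-up name too.

```javascript
// Sketch: wire up negotiationneeded so the OFFER goes out only after
// addTrack()/addTransceiver() has been called on the connection.
// `sendSignal` is a hypothetical hand-off to your signaling channel.
function sendOfferWhenNeeded(pc, sendSignal, myName, targetName) {
  pc.onnegotiationneeded = async () => {
    const offer = await pc.createOffer();

    // The state changed while we were waiting; skip this round and let
    // the next negotiationneeded event try again.
    if (pc.signalingState !== "stable") {
      return;
    }

    await pc.setLocalDescription(offer);
    sendSignal({
      type: "video-offer",
      name: myName,
      target: targetName,
      sdp: pc.localDescription
    });
  };
}
```

Nothing is sent when this function is called. The offer only leaves once the browser decides negotiation is needed, which is the whole point.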
On the remote client, they will receive the OFFER message via your signaling system, call createPeerConnection() themselves, call setRemoteDescription() with your OFFER, call createAnswer(), pass that ANSWER to their own setLocalDescription(), and then send the ANSWER back to your client!
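The callee's half can be sketched the same way. The order of the three calls is the part that matters. As before, pc is assumed to be an RTCPeerConnection that already exists, and answerOffer and sendSignal are hypothetical names.

```javascript
// Sketch of the callee's side. The order matters: remote description first,
// then createAnswer(), then the answer becomes the local description.
// `sendSignal` is a hypothetical hand-off to your signaling channel.
async function answerOffer(pc, offerMsg, sendSignal, myName) {
  await pc.setRemoteDescription(offerMsg.sdp);
  const answer = await pc.createAnswer();
  await pc.setLocalDescription(answer);

  sendSignal({
    type: "video-answer",
    name: myName,
    target: offerMsg.name,   // reply to whoever sent the offer
    sdp: pc.localDescription
  });
}
```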
This is where it gets hairy… You have to remember that all of this WebRTC stuff is asynchronous. That means that as soon as your client sends an OFFER to the remote party, your browser may also start sending ICE candidates to the remote puter.
Keep in mind that the remote client hasn’t even accepted your OFFER to talk yet!!
So, locally, your client will start firing handleICECandidateEvent several times. That handler will simply forward the ICE candidates to the remote party. The remote party will need to be able to accept those, and call handleNewICECandidateMsg() to do myPeerConnection.addIceCandidate(candidate).
Now, if you’re paying close attention, you should realize two things:
- As soon as you get an OFFER, you MUST create your own myPeerConnection even if you don’t intend to accept the remote call! Otherwise, you can’t accept ICE candidates, and the call won’t work.
- After the remote party sends its ANSWER back to the caller, that does not mean the call is starting! It just means that the remote party will also start sending ICE candidates back to the caller.
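If you'd rather not bet on the peer connection always being ready in time, one defensive option is to park early ICE candidates and replay them later. This is not what the sample above does. It's a sketch of a different approach, and createCandidateBuffer is a hypothetical helper.

```javascript
// Hypothetical helper: holds ICE candidates that arrive too early and
// replays them once the connection can take them.
function createCandidateBuffer() {
  let pc = null;
  const pending = [];

  return {
    // Call this for every "new-ice-candidate" message from the other side.
    async add(candidate) {
      if (pc) {
        await pc.addIceCandidate(candidate);
      } else {
        pending.push(candidate);
      }
    },

    // Call this once the RTCPeerConnection exists and has its remote
    // description set; queued candidates are applied in arrival order.
    async attach(peerConnection) {
      pc = peerConnection;
      while (pending.length > 0) {
        await pc.addIceCandidate(pending.shift());
      }
    },

    // Number of candidates still waiting.
    get size() {
      return pending.length;
    }
  };
}
```

Either way the goal is the same: no candidate from the other side should ever be dropped just because it showed up early.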
In other words, createAnswer() isn’t about answering a video call; it’s about responding to the OFFER with a “COUNTER-OFFER” and saying, “Yeah, I’m here, let’s start negotiating… but my user hasn’t yet decided to accept your call!”
As such, in my own code, the caller sends an OFFER. The remote client receives it, and sends a COUNTEROFFER (the createAnswer).
As the caller, I see my own cam view since my media is already attached to myPeerConnection. The remote party could be viewing that media since the (muted) stream is already being sent to him, but his client JS doesn’t load it until he clicks the “Answer Call” button. And as the caller, my client JS doesn’t unmute the video/audio until the remote client actually accepts my call.
And since the remote guy hasn’t yet clicked the “Answer Call” button, his client JS also hasn’t called .addTrack yet!
But in the meantime, everything has been negotiated in the background so that if he DOES actually answer the call, all is well.
Okay, one more thing: What happens when the remote party accepts your call, and clicks Answer Call??
Easy! The remote client then gets his media and calls addTrack() or addTransceiver() to attach it to his myPeerConnection. When that happens, negotiationneeded fires on his side, and generally new ICE candidates are sent too. But since everything is all set up already, the re-negotiation happens in the blink of an eye…
And POOF! Both sides are connected to the other, my side unmutes the video, and both sides have remote audio/video streams which they display to their user.
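Here is a rough sketch of what the callee's "Answer Call" button might drive. It is not my actual app code. It assumes pc has already been through the OFFER/COUNTEROFFER dance, getMedia is a hypothetical stand-in for something like () => navigator.mediaDevices.getUserMedia(constraints), and createIncomingCall is a made-up name.

```javascript
// Sketch of the callee's "ringing" phase. Negotiation has already happened
// in the background; accept() is what the "Answer Call" button would trigger.
// `getMedia` is a hypothetical function returning a Promise for a MediaStream.
function createIncomingCall(pc, getMedia) {
  let state = "ringing";

  return {
    get state() {
      return state;
    },

    async accept() {
      // Ignore double clicks and repeat calls.
      if (state !== "ringing") {
        return;
      }
      state = "accepting";

      const stream = await getMedia();

      // Adding tracks fires negotiationneeded, which kicks off the quick
      // renegotiation on the connection that already exists.
      for (const track of stream.getTracks()) {
        pc.addTrack(track, stream);
      }
      state = "accepted";
    }
  };
}
```

Notice that the camera isn't touched until accept() runs, so nobody gets a permission prompt for a call they never pick up.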
Wow, that was crazy
Yeah, tell me about it… The #1 reason why your WebRTC code doesn’t work is bad timing.
Since everything is done via Promises or async/await, and since the browser will just sort of ‘do its own thing’ in terms of ICE candidates and such, your code needs to compensate for all of that.
I can’t count the number of posts I read where people were having exactly the above problems. Hopefully this explanation will make things a bit more clear.
There are other ways of doing it, but I liked this way because there is a minimal amount of mojinations that must occur when the callee clicks the Answer Call button. I prefer as much mojo as possible to happen in the background so that answering a call feels (and is) very speedy.
Also keep in mind that whatever signaling system you’re using, it’s probably also asynchronous. So you really can’t rely on anything happening in a certain order.
If you’re stuck, try adding tons of log() calls and comments as in the code sample above. That was my savior, because it allowed me to see the ICE candidate timing issue. You only have a short amount of time to accept and addIceCandidate() on both sides. If that fails, the browser just closes the connection.
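If sprinkling log lines by hand gets old, something like this sketch can do it in one go. It assumes target is anything with addEventListener (an RTCPeerConnection qualifies); traceEvents, label, and print are hypothetical names.

```javascript
// Sketch: log each connection event with elapsed milliseconds so the order
// of offers, answers, and candidates becomes visible. `print` defaults to
// console.log; pass your own function to capture the lines instead.
function traceEvents(target, label, print = console.log) {
  const start = Date.now();
  const names = [
    "negotiationneeded",
    "signalingstatechange",
    "icegatheringstatechange",
    "iceconnectionstatechange",
    "icecandidate",
    "track"
  ];

  for (const name of names) {
    target.addEventListener(name, () => {
      print(`[${label} +${Date.now() - start}ms] ${name}`);
    });
  }
}
```

Call it once on each side, e.g. traceEvents(myPeerConnection, "caller"), then compare the two consoles side by side.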
As long as you think of it as: OFFER, COUNTEROFFER, always createPeerConnection and accept ICE candidates as soon as possible, and then actually ANSWER the call using your own mechanism, everything should work okay.
Well, I hope that prevents somebody else from pulling all their hair out!
I’m off to the wig shop. 😉