Updated on 12 Jun, 2026

Introduction

Today, we take things like video calls for granted.

We open:

  • Google Meet
  • Zoom
  • Microsoft Teams
  • Discord
  • WhatsApp

Click a button and instantly talk with someone thousands of kilometers away.

It feels simple.

But from a networking perspective, it is one of the most complicated things modern software does.

To understand why WebRTC exists, we must first understand how communication on the Internet evolved.

Because WebRTC did not appear randomly.

It was created after decades of limitations in previous communication models.

The Original Internet Was Not Built for Real-Time Communication

This is the first thing every engineer must understand.

The Internet was not originally designed for:

  • video calls
  • voice calls
  • screen sharing
  • multiplayer gaming
  • live collaboration

The Internet was designed primarily for data exchange.

The assumption was:

Request
↓
Response
↓
Done

A computer asks for information.

Another computer provides information.

The communication ends.

This model worked perfectly for early Internet applications.

Understanding Communication Through a Simple Example

Imagine you visit a website:

You type:

https://example.com

into your browser.

What happens?

The browser sends a request:

GET / HTTP/1.1
Host: example.com

The server receives it and responds:

<html>
  ...
</html>

The browser displays the page.

Communication completed.

The server does not keep talking.

The browser does not keep listening.

Everything ends after the response.

This model is known as:

Request-Response Communication
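
To make the cycle concrete, here is a small sketch in plain JavaScript. It does not touch the network. The helper names `buildRequest` and `handleRequest` are made up for this illustration, and the "server" is just a function that turns one request string into one response string.

```javascript
// Build the text of a minimal HTTP/1.1 GET request.
function buildRequest(host, path) {
  return `GET ${path} HTTP/1.1\r\nHost: ${host}\r\nConnection: close\r\n\r\n`;
}

// Toy "server": one request in, one response out, nothing remembered afterwards.
function handleRequest(rawRequest) {
  const [requestLine] = rawRequest.split("\r\n");
  const [method, path] = requestLine.split(" ");
  if (method !== "GET" || path !== "/") {
    return "HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n";
  }
  const body = "<html>...</html>";
  return (
    "HTTP/1.1 200 OK\r\n" +
    "Content-Type: text/html\r\n" +
    `Content-Length: ${body.length}\r\n\r\n` +
    body
  );
}

const response = handleRequest(buildRequest("example.com", "/"));
console.log(response.split("\r\n")[0]); // HTTP/1.1 200 OK
```

Notice that nothing in `handleRequest` can reach the browser again once it has returned. That is the whole model: ask once, answer once.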

The Restaurant Analogy

Imagine a restaurant.

You:

  1. Place an order
  2. Wait
  3. Receive food

Conversation finished.

You don't keep an open communication channel with the chef.

This is exactly how traditional web communication works.

Browser
    |
Request
    |
Server
    |
Response
    |
Browser

Done.

Why Request-Response Worked So Well

For many years this was enough.

Most applications needed only occasional communication.

Examples:

Reading News

Request article
↓
Receive article

Viewing Products

Request product page
↓
Receive product page

Downloading Files

Request file
↓
Receive file

Searching

Request search results
↓
Receive results

Everything fit naturally into the request-response model.

The First Major Problem

Over time, applications became more interactive.

Users no longer wanted static pages.

They wanted:

  • chats
  • notifications
  • live updates

Let's examine why traditional communication started failing.

Imagine a Chat Application

Suppose Alice sends a message to Bob.

Alice types:

Hello Bob

and clicks send.

The message reaches the server.

Now the server has Alice's message for Bob.

Question:

How does Bob know a new message arrived?

The Naive Solution

Bob continuously asks:

Any new messages?

Server:

No

One second later:

Any new messages?

Server:

No

Eventually:

Any new messages?

Server:

Yes

This technique is called:

Polling

What Is Polling?

Polling means repeatedly asking a server for updates.

Example:

// Ask the server for new messages once per second.
setInterval(() => {
  fetch('/messages').catch(console.error);
}, 1000);

Every second:

Client
↓
Any updates?
↓
Server
↓
No

Why Polling is Inefficient

Suppose:

  • 10,000 users online
  • each polls every second

Requests per second:

10,000

Most requests return:

No updates

The server spends enormous resources answering useless requests.
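
The arithmetic is simple, but it is worth seeing how fast it grows. The sketch below uses a made-up helper, `pollingLoad`, and assumes every user polls at the same fixed interval.

```javascript
// Requests generated by `users` clients that each poll every `intervalSeconds`.
function pollingLoad(users, intervalSeconds) {
  const requestsPerSecond = users / intervalSeconds;
  const requestsPerDay = requestsPerSecond * 86400; // 86,400 seconds in a day
  return { requestsPerSecond, requestsPerDay };
}

console.log(pollingLoad(10000, 1));
// { requestsPerSecond: 10000, requestsPerDay: 864000000 }
```

Slowing the polls down reduces the load, but as the next section shows, it also makes every message arrive later.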

Polling Creates Latency

Imagine Bob polls every 5 seconds.

Timeline:

0s -> Poll
5s -> Poll
10s -> Poll
15s -> Poll

Alice sends a message at:

6s

Bob receives it at:

10s

Delay:

4 seconds

The message already existed.

Bob simply wasn't asking yet.
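
The timeline above can be reproduced with a few lines of code. `pollingDelay` is a hypothetical helper that assumes the first poll happens at 0s and ignores network travel time.

```javascript
// How long a message waits on the server before the next poll picks it up.
function pollingDelay(sentAtSeconds, intervalSeconds) {
  const nextPoll = Math.ceil(sentAtSeconds / intervalSeconds) * intervalSeconds;
  return nextPoll - sentAtSeconds;
}

console.log(pollingDelay(6, 5));  // 4
console.log(pollingDelay(10, 5)); // 0
```

In the worst case the delay is almost the full polling interval. If messages arrive at random moments, the average wait is about half the interval.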

The Need for Real-Time Communication

Users expect:

Send Message
↓
Receive Immediately

Not:

Send Message
↓
Wait
↓
Receive

Thus the industry needed a better model.

Long Polling

Long polling is an improvement over polling.

Instead of:

Client
↓
Any updates?
↓
Server
↓
No

The server waits.

Client
↓
Any updates?
↓
Server waits...

When a message arrives:

Server
↓
Here it is

This reduced useless requests.

But it still had limitations.
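
Here is a rough in-memory sketch of the idea. There is no real HTTP involved: `createLongPollServer` is a stand-in invented for this example, and a "request" is simply a callback the server keeps until it has something to say.

```javascript
// In-memory stand-in for a long-polling endpoint.
function createLongPollServer() {
  const waiting = []; // callbacks for requests the server is holding open

  return {
    // A client "request": instead of answering "No" right away, hold it.
    poll(respond) {
      waiting.push(respond);
    },
    // A new message arrives: answer every held request, then forget them.
    publish(message) {
      const held = waiting.splice(0, waiting.length);
      held.forEach((respond) => respond(message));
    },
    pendingCount: () => waiting.length,
  };
}

const server = createLongPollServer();
const received = [];

// Bob's loop: as soon as one response arrives, he immediately polls again.
function bobPoll() {
  server.poll((message) => {
    received.push(message);
    bobPoll();
  });
}

bobPoll();
server.publish("Hello Bob");
server.publish("How are you?");
console.log(received); // [ 'Hello Bob', 'How are you?' ]
```

Bob still has to issue a fresh request after every single message. The wasted "No" answers are gone, but the reconnect is not.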

Problems with Long Polling

Every update requires:

Connection
↓
Response
↓
Close Connection
↓
Reconnect

Thousands or millions of users create enormous overhead.

Especially for real-time systems.

The Fundamental Limitation

Both polling and long polling suffer from the same problem:

The client always initiates communication.

The server cannot freely push data whenever it wants.

This becomes a major issue for:

  • chat
  • multiplayer
  • stock trading
  • live dashboards

and eventually:

  • voice calls
  • video calls

Enter WebSockets

The next evolution was WebSockets.

Instead of:

Request
↓
Response
↓
Disconnect

we create a persistent connection.

Client <-> Server

Once connected:

Either side can send messages.

Why WebSockets Were Revolutionary

For the first time:

The server could push updates instantly.

Example:

Alice sends:

Hello

Server immediately pushes:

Hello

to Bob.

No polling.

No waiting.

No repeated requests.
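
On the browser side this could look roughly like the sketch below. It relies only on the standard WebSocket methods `addEventListener` and `send`. The helper name `attachChat`, the JSON message shape `{ text }`, and the URL in the usage comment are assumptions made for this example.

```javascript
// Wire a chat onto an already-open WebSocket (or any object with the
// same addEventListener/send methods).
function attachChat(socket, onMessage) {
  // The server can push at any time; we just react when data shows up.
  socket.addEventListener("message", (event) => {
    onMessage(JSON.parse(event.data));
  });

  // The client can also send at any time over the same connection.
  return function send(text) {
    socket.send(JSON.stringify({ text }));
  };
}

// Browser usage (placeholder URL):
// const socket = new WebSocket("wss://chat.example/ws");
// socket.addEventListener("open", () => {
//   const send = attachChat(socket, (msg) => console.log("pushed:", msg.text));
//   send("Hello");
// });
```

There is no timer and no repeated request here. One connection stays open, and both sides use it whenever they have something to say.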

Modern Applications Powered by WebSockets

Examples:

  • Chat applications
  • Notifications
  • Trading systems
  • Collaborative editors
  • Multiplayer games

WebSockets solved many problems.

But not all.

And this is where the story becomes interesting.

The New Challenge

Imagine we want to build Zoom.

Can WebSockets help?

Yes.

Can WebSockets transport video?

Technically yes.

Can WebSockets transport audio?

Technically yes.

So why wasn't Zoom built entirely using WebSockets?

Why did the industry invent WebRTC?

Because video communication introduces a completely different set of problems.

Problems that WebSockets were never designed to solve.

New Problem #1: Massive Bandwidth

A text message:

Hello

may be:

5 bytes

A single uncompressed 1080p video frame may be about:

6 MB

Roughly a million times larger.
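
A figure of about 6 MB is what you get for one uncompressed frame if you assume a 1920 x 1080 picture with 3 bytes per pixel. Those numbers are assumptions for this sketch, and real calls send compressed video that is far smaller, but the raw size shows what the encoder is up against.

```javascript
// Size of one uncompressed frame in bytes.
function rawFrameBytes(width, height, bytesPerPixel) {
  return width * height * bytesPerPixel;
}

const frameBytes = rawFrameBytes(1920, 1080, 3);
const bitsPerSecondAt30Fps = frameBytes * 30 * 8;

console.log(frameBytes);                              // 6220800
console.log((bitsPerSecondAt30Fps / 1e9).toFixed(2)); // "1.49" (Gbps)
```

Raw video at 30 frames per second would need well over a gigabit per second. That is why compression is not optional.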

New Problem #2: Continuous Streaming

Chat:

Message
Pause
Message
Pause

Video:

Frame
Frame
Frame
Frame
Frame
Frame
Frame
...

continuously.

30-60 times every second.

New Problem #3: Latency Sensitivity

A chat message arriving:

500ms late

is acceptable.

A video frame arriving:

500ms late

makes conversation painful.

Human conversation requires very low latency.

New Problem #4: Server Cost Explosion

Suppose:

1,000 users on a video platform.

Each sends:

2 Mbps

video.

If everything flows through servers:

1,000 x 2 Mbps = 2 Gbps

incoming.

Then:

1,000 x 2 Mbps = 2 Gbps

outgoing.

Huge infrastructure costs.
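
The same calculation as a small sketch. `relayBandwidthMbps` is a made-up helper, and `viewersPerStream` is an extra assumption added here: it is 1 for one-to-one calls and grows in group calls, where each stream has to be forwarded to several people.

```javascript
// Bandwidth a relay server needs when every stream passes through it.
function relayBandwidthMbps(users, mbpsPerUser, viewersPerStream = 1) {
  const incoming = users * mbpsPerUser;
  const outgoing = users * mbpsPerUser * viewersPerStream;
  return { incoming, outgoing, total: incoming + outgoing };
}

console.log(relayBandwidthMbps(1000, 2));
// { incoming: 2000, outgoing: 2000, total: 4000 }

console.log(relayBandwidthMbps(1000, 2, 4));
// { incoming: 2000, outgoing: 8000, total: 10000 }
```

The incoming side grows with the number of users. The outgoing side grows with users multiplied by viewers, which is what makes group calls so expensive to relay.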

New Problem #5: Media Processing

Video is not just data.

Video requires:

  • encoding
  • decoding
  • synchronization
  • packet recovery
  • bitrate adaptation
  • congestion control

WebSockets provide none of these.

Before understanding WebRTC, we must first understand the problem it was created to solve.

Many developers learn WebRTC by memorizing:

createOffer()
createAnswer()
setLocalDescription()
setRemoteDescription()

Without understanding:

  • Why these APIs exist
  • Why browsers exchange offers
  • Why STUN servers are needed
  • Why ICE candidates appear
  • Why signaling is required

As a result, they can build simple demos but struggle to design real-world systems.

We will first understand the networking problems that existed before WebRTC and then see how WebRTC solves them.

What Is WebRTC?

WebRTC stands for:

Web Real-Time Communication

It is a technology that allows browsers and applications to communicate directly with each other in real time.

It enables:

  • Video calls
  • Voice calls
  • Screen sharing
  • File sharing
  • Chat systems
  • Multiplayer games
  • Collaborative applications

without requiring media or file data to pass through a central server.

Simple example:

Browser A
     |
     |
     ▼
Browser B

Direct communication.

This is called:

Peer-to-Peer Communication

or

P2P

What Did We Do Before WebRTC?

Before WebRTC, browsers had limited communication capabilities.

A browser could:

Browser
   |
HTTP Request
   |
Server

and

Server
   |
HTTP Response
   |
Browser

That's it.

Everything required a server.

Example:

Sending a file.

User A
   |
Upload
   |
Server
   |
Download
   |
User B

The server handled everything.

Traditional File Sharing

Imagine sharing a 1 GB file.

Without WebRTC:

User A
   |
Upload 1 GB
   |
Server
   |
Store File
   |
Download 1 GB
   |
User B

Total traffic:

2 GB

because:

1 GB Upload
+
1 GB Download

The server becomes responsible for:

  • Storage
  • Processing
  • Network bandwidth

Why This Is Expensive

Suppose:

1,000 users

share:

1 GB files

daily.

Server traffic:

1,000 x 2 GB = 2 TB

per day.

Video Calls Before WebRTC

Video calls had an even bigger problem.

Traditional architecture:

User A
   |
Video Stream
   |
Server
   |
Video Stream
   |
User B

The server continuously receives and sends video.

Every packet travels through the server.

Result:

  • delay
  • buffering
  • lag

Example:

A 3 Mbps video stream.

For one stream from User A to User B:

3 Mbps In + 3 Mbps Out = 6 Mbps

of server bandwidth. A two-way call doubles this.

With thousands of users:

Massive infrastructure costs

The Centralized Communication Problem

Traditional communication systems were:

Centralized

Everything flowed through a server.

Architecture:

Client A
    |
    ▼
 Server
    ▲
    |
Client B

The server becomes:

Single Point of Failure

If the server crashes:

Communication Stops

What Developers Wanted

Developers wanted:

Browser <-> Browser

Direct communication.

Without routing large amounts of data through servers.

Ideal architecture:

Client A <-> Client B

No middleman for the actual data.

This would:

  • Reduce latency
  • Reduce costs
  • Improve scalability
  • Improve quality
  • Reduce server load

This idea is called:

Peer-to-Peer Communication

Why Not Use HTTP?

HTTP is request-response.

Example:

Browser -> GET /user
Server -> Response

Finished

Video calls require:

  • continuous communication
  • real-time streaming
  • bidirectional data flow

HTTP wasn't built for that.

Why Not Use WebSockets?

Many developers think:

“Can we build Zoom using WebSockets?”

Technically yes.

Practically terrible.

Why?

Because WebSockets:

  • transport bytes
  • don't understand audio
  • don't understand video
  • don't handle packet loss in a media-friendly way (TCP retransmits and stalls the stream)
  • don't handle codecs
  • don't handle NAT traversal

You would have to build:

  • media engine
  • packet recovery
  • congestion control
  • encryption
  • peer discovery

from scratch.

That is what WebRTC already provides.
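
The packet loss point deserves a closer look. WebSockets run over TCP, which delivers everything in order, so one lost packet holds back everything behind it. The sketch below is a simplified simulation with invented timings, not a real network test. It compares that ordered delivery with a real-time approach that skips whatever arrives too late.

```javascript
// Each frame: when it was sent and when it finally arrived (milliseconds).
const frames = [
  { seq: 0, sentAt: 0, arrivedAt: 20 },
  { seq: 1, sentAt: 33, arrivedAt: 53 },
  { seq: 2, sentAt: 66, arrivedAt: 386 }, // lost once, then retransmitted
  { seq: 3, sentAt: 99, arrivedAt: 119 },
  { seq: 4, sentAt: 132, arrivedAt: 152 },
];

// Ordered, reliable delivery: a frame cannot reach the application
// before every earlier frame has been delivered.
function deliverInOrder(frames) {
  let previous = 0;
  return frames.map((f) => {
    previous = Math.max(previous, f.arrivedAt);
    return { seq: f.seq, deliveredAt: previous };
  });
}

// Real-time delivery: play what arrives in time, skip what is too late.
function deliverRealTime(frames, maxDelayMs) {
  return frames
    .filter((f) => f.arrivedAt - f.sentAt <= maxDelayMs)
    .map((f) => ({ seq: f.seq, deliveredAt: f.arrivedAt }));
}

console.log(deliverInOrder(frames).map((f) => f.deliveredAt));
// [ 20, 53, 386, 386, 386 ]
console.log(deliverRealTime(frames, 100).map((f) => f.seq));
// [ 0, 1, 3, 4 ]
```

With ordered delivery, frames 3 and 4 arrived on time but sit waiting for frame 2, so the picture freezes. With the real-time approach, one frame is dropped and the call keeps moving. For live conversation, the second behavior is usually the one you want.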

Why Browsers Could Not Do This

This sounds simple:

Browser A
connects to
Browser B

But the Internet doesn't work like that.

Most devices sit behind:

NAT (Network Address Translation)
Firewall

Example:

Your laptop:

192.168.1.5

This IP exists only inside your home network.

Nobody on the Internet can directly reach it.

Therefore:

Browser A

doesn't know how to reach:

Browser B

This is one of the biggest problems WebRTC solves.
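
Addresses like 192.168.1.5 belong to the private IPv4 ranges reserved by RFC 1918: 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16. The sketch below checks whether an address falls into one of them. `isPrivateIPv4` is a helper written for this example, and it deliberately ignores IPv6 and other special ranges.

```javascript
// True if the IPv4 address is in one of the RFC 1918 private ranges.
function isPrivateIPv4(address) {
  const parts = address.split(".").map(Number);
  const valid =
    parts.length === 4 &&
    parts.every((p) => Number.isInteger(p) && p >= 0 && p <= 255);
  if (!valid) {
    throw new Error(`Not an IPv4 address: ${address}`);
  }
  const [a, b] = parts;
  return (
    a === 10 ||                          // 10.0.0.0/8
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168)             // 192.168.0.0/16
  );
}

console.log(isPrivateIPv4("192.168.1.5")); // true
console.log(isPrivateIPv4("8.8.8.8"));     // false
```

If both browsers only know their own private addresses, neither has anything useful to hand to the other.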

High-Level Architecture

Two peers:

Alice
Bob

Need:

  1. Discover each other
  2. Exchange capabilities
  3. Find a route
  4. Connect
  5. Stream media
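
As a preview, steps 2 to 4 map loosely onto the browser's RTCPeerConnection API. The sketch below is one possible shape, not the only way to do it. `pc` is expected to be an RTCPeerConnection. `sendSignal`, the function names, and the message format are assumptions for this example, and how the messages actually reach the other peer (step 1) is left open.

```javascript
// Caller side: describe what we can do and send it to the other peer.
async function startCall(pc, sendSignal) {
  // The browser reports possible routes (ICE candidates) as it finds them.
  pc.onicecandidate = (event) => {
    if (event.candidate) {
      sendSignal({ type: "candidate", candidate: event.candidate });
    }
  };
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  sendSignal({ type: "offer", sdp: offer.sdp });
}

// Both sides: react to whatever the other peer sent us.
async function handleSignal(pc, message, sendSignal) {
  if (message.type === "offer") {
    await pc.setRemoteDescription({ type: "offer", sdp: message.sdp });
    const answer = await pc.createAnswer();
    await pc.setLocalDescription(answer);
    sendSignal({ type: "answer", sdp: answer.sdp });
  } else if (message.type === "answer") {
    await pc.setRemoteDescription({ type: "answer", sdp: message.sdp });
  } else if (message.type === "candidate") {
    await pc.addIceCandidate(message.candidate);
  }
}
```

Alice would call `startCall`, and both Alice and Bob would pass every incoming message to `handleSignal`. Each of these calls exists to solve one of the problems described in this article.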

This sounds easy.

Reality is much harder.

Because of NAT.

The NAT Problem
