Why WebRTC Exists and the Problem It Solves

Updated on 12 Jun, 2026

Introduction

Today, we take things like video calls for granted.

We open:

Google Meet
Zoom
Microsoft Teams
Discord
WhatsApp

Click a button and instantly talk with someone thousands of kilometers away.

It feels simple.

But from a networking perspective, it is one of the most complicated things modern software does.

To understand why WebRTC exists, we must first understand how communication on the Internet evolved.

Because WebRTC did not appear randomly.

It was created after decades of limitations in previous communication models.

The Original Internet Was Not Built for Real-Time Communication

This is the first thing every engineer must understand.

The Internet was not originally designed for:

video calls
voice calls
screen sharing
multiplayer gaming
live collaboration

The Internet was designed primarily for data exchange.

The assumption was:

Request
↓
Response
↓
Done

A computer asks for information.

Another computer provides information.

The communication ends.

This model worked perfectly for early Internet applications.

Understanding Communication Through a Simple Example

Imagine you visit a website:

You type:

https://example.com

into your browser.

What happens?

The browser sends a request:

GET / HTTP/1.1
Host: example.com

The server receives it and responds:

<html>
  ...
</html>

The browser displays the page.

Communication completed.

The server does not keep talking.

The browser does not keep listening.

Everything ends after the response.

This model is known as:

Request-Response Communication

The Restaurant Analogy

Imagine a restaurant.

You:

Place an order
Wait
Receive food

Conversation finished.

You don't keep an open communication channel with the chef.

This is exactly how traditional web communication works.

Browser
    |
Request
    |
Server
    |
Response
    |
Browser

Done.

Why Request-Response Worked So Well

For many years this was enough.

Most applications needed only occasional communication.

Examples:

Reading News

Request article
↓
Receive article

Viewing Products

Request product page
↓
Receive product page

Downloading Files

Request file
↓
Receive file

Searching

Request search results
↓
Receive results

Everything fit naturally into the request-response model.

The First Major Problem

Over time, applications became more interactive.

Users no longer wanted static pages.

They wanted:

chats
notifications
live updates

Let's examine why traditional communication started failing.

Imagine a Chat Application

Suppose Alice sends a message to Bob.

Alice types:

Hello Bob

and clicks send.

The message reaches the server.

Now the server has Alice's message for Bob.

Question:

How does Bob know a new message arrived?

The Naive Solution

Bob continuously asks:

Any new messages?

Server:

No

One second later:

Any new messages?

Server:

No

Eventually:

Any new messages?

Server:

Yes

This technique is called:

Polling

What Is Polling?

Polling means repeatedly asking a server for updates.

Example:

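// Ask the server for new messages once every second.
setInterval(() => {
  fetch('/messages'); // most responses will simply say "nothing new"
}, 1000);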

Every second:

Client
↓
Any updates?
↓
Server
↓
No

Why Polling is Inefficient

Suppose:

10,000 users online
each polls every second

Requests per second:

10,000

Most requests return:

No updates

The server spends enormous resources answering useless requests.
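The arithmetic above can be sketched in a few lines. This is only a back-of-the-envelope helper; the function name `pollingLoad` is made up for illustration, and it assumes every user polls at the same fixed interval.

```javascript
// Requests generated by fixed-interval polling (illustrative helper).
function pollingLoad(users, pollIntervalSeconds) {
  const requestsPerSecond = users / pollIntervalSeconds;
  const requestsPerDay = requestsPerSecond * 60 * 60 * 24;
  return { requestsPerSecond, requestsPerDay };
}

const load = pollingLoad(10000, 1);
console.log(load.requestsPerSecond); // 10000
console.log(load.requestsPerDay);    // 864000000
```

Slowing the polls down reduces the load, but as the next section shows, it also makes messages arrive later.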

Polling Creates Latency

Imagine Bob polls every 5 seconds.

Timeline:

0s -> Poll
5s -> Poll
10s -> Poll
15s -> Poll

Alice sends a message at:

6s

Bob receives it at:

10s

Delay:

4 seconds

The message already existed.

Bob simply wasn't asking yet.
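The same timeline can be expressed as a small calculation. The helper `pollingDelay` is hypothetical; it assumes the first poll happens at 0s and that a poll picks up any message sent at or before that moment.

```javascript
// How long a message waits on the server before the next poll collects it.
function pollingDelay(sentAtSeconds, pollIntervalSeconds) {
  // Time of the first poll at or after the moment the message was sent.
  const nextPoll =
    Math.ceil(sentAtSeconds / pollIntervalSeconds) * pollIntervalSeconds;
  return nextPoll - sentAtSeconds;
}

console.log(pollingDelay(6, 5)); // 4
```

In the worst case the delay approaches the full polling interval, so the only way to lower it with plain polling is to poll more often.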

The Need for Real-Time Communication

Users expect:

Send Message
↓
Receive Immediately

Not:

Send Message
↓
Wait
↓
Receive

Thus the industry needed a better model.

Long Polling

Long polling is an improvement over polling.

Instead of:

Client
↓
Any updates?
↓
Server
↓
No

The server waits.

Client
↓
Any updates?
↓
Server waits...

When a message arrives:

Server
↓
Here it is

This reduced useless requests.

But it still had limitations.
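A rough sketch of what a long-polling client loop might look like is shown below. The names `longPoll`, `requestUpdates` and `shouldContinue` are assumptions for illustration; `requestUpdates` stands for one HTTP request that the server holds open until it has messages (or gives up and returns an empty list).

```javascript
// Long-polling client loop (sketch, not a production implementation).
// requestUpdates: hypothetical async function performing ONE held-open request,
//   e.g. () => fetch('/messages/wait').then((r) => r.json())  (assumed endpoint)
// onMessage: called once per received message
// shouldContinue: returns false when the loop should stop
async function longPoll(requestUpdates, onMessage, shouldContinue) {
  while (shouldContinue()) {
    try {
      // This await may take a long time: the server answers only when
      // something new exists or its own timeout expires.
      const messages = await requestUpdates();
      messages.forEach((message) => onMessage(message));
    } catch (error) {
      // Network failure: pause briefly, then reconnect.
      await new Promise((resolve) => setTimeout(resolve, 1000));
    }
    // Loop around: every delivery is followed by a brand-new request.
  }
}
```

Notice that the loop still opens a fresh request after every response, which is where the remaining overhead comes from.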

Problems with Long Polling

Every update requires:

Connection
↓
Response
↓
Close Connection
↓
Reconnect

Thousands or millions of users create enormous overhead.

Especially for real-time systems.

The Fundamental Limitation

Both polling and long polling suffer from the same problem:

The client always initiates communication.

The server cannot freely push data whenever it wants.

This becomes a major issue for:

chat
multiplayer
stock trading
live dashboards

and eventually:

voice calls
video calls

Enter WebSockets

The next evolution was WebSockets.

Instead of:

Request
↓
Response
↓
Disconnect

we create a permanent connection.

Client <-> Server

Once connected:

Either side can send messages.

Why WebSockets Were Revolutionary

For the first time:

The server could push updates instantly.

Example:

Alice sends:

Hello

Server immediately pushes:

Hello

to Bob.

No polling.

No waiting.

No repeated requests.
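On the client side, this could look roughly like the sketch below, which uses the standard browser `WebSocket` API. The function name `connectChat`, the URL and the JSON message shape are assumptions made for illustration only.

```javascript
// Open one persistent connection and react whenever the server pushes data.
// WebSocketImpl defaults to the browser's WebSocket; it is a parameter only
// so the sketch can be exercised outside a browser.
function connectChat(url, onMessage, WebSocketImpl = WebSocket) {
  const socket = new WebSocketImpl(url);

  // Fires once, when the connection has been established.
  socket.addEventListener('open', () => {
    // Assumed message format: the server is told who joined.
    socket.send(JSON.stringify({ type: 'join', user: 'Bob' }));
  });

  // Fires every time the server pushes a message. Nobody asked for it.
  socket.addEventListener('message', (event) => {
    onMessage(JSON.parse(event.data));
  });

  return socket;
}

// Usage in a browser (hypothetical URL):
// connectChat('wss://chat.example/ws', (msg) => console.log(msg.text));
```

The connection is opened once and then reused in both directions for as long as it stays alive.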

Modern Applications Powered by WebSockets

Examples:

Chat applications
Notifications
Trading systems
Collaborative editors
Multiplayer games

WebSockets solved many problems.

But not all.

And this is where the story becomes interesting.

The New Challenge

Imagine we want to build Zoom.

Can WebSockets help?

Yes.

Can WebSockets transport video?

Technically yes.

Can WebSockets transport audio?

Technically yes.

So why wasn't Zoom built entirely using WebSockets?

Why did the industry invent WebRTC?

Because video communication introduces a completely different set of problems.

Problems that WebSockets were never designed to solve.

New Problem #1: Massive Bandwidth

A text message:

Hello

may be:

5 bytes

A single uncompressed 1080p video frame may be:

about 6 MB

More than a million times larger.
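To see where a figure like 6 MB can come from, here is a small calculation. It assumes an uncompressed 1920x1080 frame at 3 bytes per pixel, which is an assumption of this sketch rather than something stated in the article; the helper name `rawFrameBytes` is made up.

```javascript
// Size of one uncompressed video frame (assumes 3 bytes per pixel, e.g. RGB).
function rawFrameBytes(width, height, bytesPerPixel = 3) {
  return width * height * bytesPerPixel;
}

const frameBytes = rawFrameBytes(1920, 1080);
const textBytes = new TextEncoder().encode('Hello').length;

console.log(frameBytes);             // 6220800
console.log(textBytes);              // 5
console.log(frameBytes / textBytes); // 1244160
```

Real systems compress video heavily before sending it, but even compressed frames are far larger than a chat message.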

New Problem #2: Continuous Streaming

Chat:

Message
Pause
Message
Pause

Video:

Frame
Frame
Frame
Frame
Frame
Frame
Frame
...

continuously.

30-60 times every second.
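A quick calculation shows what that frame rate means in practice. The helpers below are illustrative only, and the raw bitrate assumes uncompressed 1080p frames at 3 bytes per pixel (an assumption of this sketch).

```javascript
// Time budget available for each frame at a given frame rate.
function frameIntervalMs(fps) {
  return 1000 / fps;
}

// Bitrate of completely uncompressed video, in megabits per second.
function rawBitrateMbps(width, height, fps, bytesPerPixel = 3) {
  const bitsPerFrame = width * height * bytesPerPixel * 8;
  return (bitsPerFrame * fps) / 1e6;
}

console.log(frameIntervalMs(30).toFixed(1));            // 33.3
console.log(frameIntervalMs(60).toFixed(1));            // 16.7
console.log(rawBitrateMbps(1920, 1080, 30).toFixed(0)); // 1493
```

At 30 frames per second a new frame is due roughly every 33 ms, and sending raw frames would need well over 1 Gbps, which is why video is always encoded before it travels over the network.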

New Problem #3: Latency Sensitivity

A chat message arriving:

500ms late

is acceptable.

A video frame arriving:

500ms late

makes conversation painful.

Human conversation requires very low latency.

New Problem #4: Server Cost Explosion

Suppose:

1,000 users on a video platform.

Each sends:

2 Mbps

video.

If everything flows through servers:

1,000 x 2 Mbps = 2 Gbps

incoming.

Then, assuming each stream is forwarded to one other user:

1,000 x 2 Mbps = 2 Gbps

outgoing.

Huge infrastructure costs.
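The numbers above can be reproduced with a small helper. `relayBandwidth` is a hypothetical name, and the sketch assumes every stream is forwarded to exactly one other user (group calls would multiply the outgoing side).

```javascript
// Server bandwidth needed when all video is relayed through servers.
function relayBandwidth(users, mbpsPerUser) {
  const ingressGbps = (users * mbpsPerUser) / 1000;
  // Assumption: each incoming stream is sent out to one recipient.
  const egressGbps = ingressGbps;
  return { ingressGbps, egressGbps, totalGbps: ingressGbps + egressGbps };
}

console.log(relayBandwidth(1000, 2));
// { ingressGbps: 2, egressGbps: 2, totalGbps: 4 }
```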

New Problem #5: Media Processing

Video is not just data.

Video requires:

encoding
decoding
synchronization
packet recovery
bitrate adaptation
congestion control

WebSockets provide none of these.

Before understanding WebRTC, we must first understand the problem it was created to solve.

Many developers learn WebRTC by memorizing:

createOffer()
createAnswer()
setLocalDescription()
setRemoteDescription()

Without understanding:

Why these APIs exist
Why browsers exchange offers
Why STUN servers are needed
Why ICE candidates appear
Why signaling is required

As a result, they can build simple demos but struggle to design real-world systems.

We will first understand the networking problems that existed before WebRTC and then see how WebRTC solves them.
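For orientation, here is the order in which those four calls are typically used. This is only a sketch: `pc` is expected to be an `RTCPeerConnection`, while `makeOffer`, `handleOffer`, `handleAnswer` and `sendSignal` are hypothetical names. How the offer and answer travel between the two browsers is left to the application, so `sendSignal` stands in for whatever signaling channel is used.

```javascript
// Caller: describe what we want to send and receive, then pass it on.
async function makeOffer(pc, sendSignal) {
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  sendSignal(offer); // delivered through the app's own signaling channel
}

// Callee: accept the caller's offer and reply with an answer.
async function handleOffer(pc, offer, sendSignal) {
  await pc.setRemoteDescription(offer);
  const answer = await pc.createAnswer();
  await pc.setLocalDescription(answer);
  sendSignal(answer);
}

// Caller: complete the exchange once the answer arrives.
async function handleAnswer(pc, answer) {
  await pc.setRemoteDescription(answer);
}

// In a browser each side would create its own connection first:
// const pc = new RTCPeerConnection();
```

The reasons behind each step are the networking problems covered in the rest of this article.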

What Is WebRTC?

WebRTC stands for:

Web Real-Time Communication

It is a technology that allows browsers and applications to communicate directly with each other in real time.

It enables:

Video calls
Voice calls
Screen sharing
File sharing
Chat systems
Multiplayer games
Collaborative applications

without requiring media or file data to pass through a central server.

Simple example:

Browser A
     |
     |
     ▼
Browser B

Direct communication.

This is called:

Peer-to-Peer Communication

P2P

What Did We Do Before WebRTC?

Before WebRTC, browsers had limited communication capabilities.

A browser could:

Browser
   |
HTTP Request
   |
Server

and

Server
   |
HTTP Response
   |
Browser

That's it.

Everything required a server.

Example:

Sending a file.

User A
   |
Upload
   |
Server
   |
Download
   |
User B

The server handled everything.

Traditional File Sharing

Imagine sharing a 1 GB file.

Without WebRTC:

User A
   |
Upload 1 GB
   |
Server
   |
Store File
   |
Download 1 GB
   |
User B

Total traffic:

2 GB

because:

1 GB Upload
+
1 GB Download

The server becomes responsible for:

Storage
Processing
Network bandwidth

Why This Is Expensive

Suppose:

1,000 users

1 GB files

daily.

Server traffic:

1,000 x 2 GB = 2 TB

per day.
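The same estimate as a tiny calculation. `relayedTrafficTB` is a made-up helper name, and it uses decimal units (1 TB = 1,000 GB) to match the figure above.

```javascript
// Daily server traffic when every file is uploaded to and downloaded from a server.
function relayedTrafficTB(transfersPerDay, fileSizeGB) {
  const uploadGB = transfersPerDay * fileSizeGB;   // sender -> server
  const downloadGB = transfersPerDay * fileSizeGB; // server -> receiver
  return (uploadGB + downloadGB) / 1000;
}

console.log(relayedTrafficTB(1000, 1)); // 2
```

Every byte is paid for twice, before counting the storage needed to hold the files in between.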

Video Calls Before WebRTC

Video calls had an even bigger problem.

Traditional architecture:

User A
   |
Video Stream
   |
Server
   |
Video Stream
   |
User B

The server continuously receives and sends video.

Every packet travels through the server.

Result:

delay
buffering
lag

Example:

A 3 Mbps video stream.

For one stream between two users:

3 Mbps in + 3 Mbps out = 6 Mbps

of server bandwidth. A two-way call doubles that.

With thousands of users:

Massive infrastructure costs

The Centralized Communication Problem

Traditional communication systems were:

Centralized

Everything flowed through a server.

Architecture:

Client A
    |
    ▼
 Server
    ▲
    |
Client B

The server becomes:

Single Point of Failure

If the server crashes:

Communication Stops

What Developers Wanted

Developers wanted:

Browser <-> Browser

Communication directly.

Without routing large amounts of data through servers.

Ideal architecture:

Client A <-> Client B

No middleman for the actual data.

This would:

Reduce latency
Reduce costs
Improve scalability
Improve quality
Reduce server load

This idea is called:

Peer-to-Peer Communication

Why Not Use HTTP?

HTTP is request-response.

Example:

Browser -> GET /user
Server -> Response

Finished

Video calls require:

continuous communication
real-time streaming
bidirectional data flow

HTTP wasn't built for that.

Why Not Use WebSockets?

Many developers think:

“Can we build Zoom using WebSockets?”

Technically yes.

Practically terrible.

Why?

Because WebSockets:

transport bytes
don't understand audio
don't understand video
don't handle packet loss
don't handle codecs
don't handle NAT traversal

You would have to build:

media engine
packet recovery
congestion control
encryption
peer discovery

from scratch.

That is what WebRTC already provides.

Why Browsers Could Not Do This

This sounds simple:

Browser A
connect
Browser B

But the Internet doesn't work like that.

Most devices sit behind:

NAT (Network Address Translation)

Firewall

Example:

Your laptop:

192.168.1.5

This IP exists only inside your home network.

Nobody on the Internet can directly reach it.

Therefore:

Browser A

doesn't know how to reach:

Browser B

This is one of the biggest problems WebRTC solves.
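An address like 192.168.1.5 belongs to one of the private IPv4 ranges reserved for local networks (10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16). The sketch below checks for them; `isPrivateIPv4` is a hypothetical helper that handles only plain dotted IPv4 addresses.

```javascript
// Returns true when the address is in a private (non-routable) IPv4 range.
function isPrivateIPv4(address) {
  const parts = address.split('.').map(Number);
  const valid =
    parts.length === 4 &&
    parts.every((p) => Number.isInteger(p) && p >= 0 && p <= 255);
  if (!valid) {
    throw new Error(`Not an IPv4 address: ${address}`);
  }
  const [first, second] = parts;
  return (
    first === 10 ||                                    // 10.0.0.0/8
    (first === 172 && second >= 16 && second <= 31) || // 172.16.0.0/12
    (first === 192 && second === 168)                  // 192.168.0.0/16
  );
}

console.log(isPrivateIPv4('192.168.1.5')); // true
console.log(isPrivateIPv4('8.8.8.8'));     // false
```

A peer that only knows its private address has nothing useful to hand to the other side, because that address means nothing outside its own network.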

High-Level Architecture

Two peers:

Alice
Bob

They need to:

Discover each other
Exchange capabilities
Find a route
Connect
Stream media

This sounds easy.

Reality is much harder.

Because of NAT.

The NAT Problem
