Introduction
Today, we take things like video calls for granted.
We open:
- Google Meet
- Zoom
- Microsoft Teams
- Discord
Then we click a button and instantly talk with someone thousands of kilometers away.
It feels simple.
But from a networking perspective, it is one of the most complicated things modern software does.
To understand why WebRTC exists, we must first understand how communication on the Internet evolved.
Because WebRTC did not appear randomly.
It was created after decades of limitations in previous communication models.
The Original Internet Was Not Built for Real-Time Communication
This is the first thing every engineer must understand.
The Internet was not originally designed for:
- video calls
- voice calls
- screen sharing
- multiplayer gaming
- live collaboration
The Internet was designed primarily for data exchange.
The assumption was:
Request
↓
Response
↓
Done
A computer asks for information.
Another computer provides information.
The communication ends.
This model worked perfectly for early Internet applications.
Understanding Communication Through a Simple Example
Imagine you visit a website:
You type:
https://example.com
into your browser.
What happens?
The browser sends a request:
GET / HTTP/1.1
Host: example.com
The server receives it and responds:
<html>
...
</html>
The browser displays the page.
Communication completed.
The server does not keep talking.
The browser does not keep listening.
Everything ends after the response.
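As a rough sketch, the request above can be written out as the raw text a browser puts on the wire. The helper name buildGetRequest is made up for this illustration, and the Connection: close header is an added assumption that makes the "done after one response" behavior explicit.

```javascript
// Build the raw text of a minimal HTTP/1.1 GET request.
// Each line ends with CRLF, and an empty line marks the end of the headers.
function buildGetRequest(host, path = '/') {
  return [
    `GET ${path} HTTP/1.1`,
    `Host: ${host}`,
    'Connection: close', // ask the server to close the connection after responding
    '',
    '',
  ].join('\r\n');
}

console.log(buildGetRequest('example.com'));
```

One request goes out, one response comes back, and nothing else is exchanged until the browser asks again.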
This model is known as:
Request-Response Communication
The Restaurant Analogy
Imagine a restaurant.
You:
- Place an order
- Wait
- Receive food
Conversation finished.
You don't keep an open communication channel with the chef.
This is exactly how traditional web communication works.
Browser
|
Request
|
Server
|
Response
|
Browser
Done.
Why Request-Response Worked So Well
For many years this was enough.
Most applications needed only occasional communication.
Examples:
Reading News
Request article
↓
Receive article
Viewing Products
Request product page
↓
Receive product page
Downloading Files
Request file
↓
Receive file
Searching
Request search results
↓
Receive results
Everything fit naturally into the request-response model.
The First Major Problem
Over time, applications became more interactive.
Users no longer wanted static pages.
They wanted:
- chats
- notifications
- live updates
Let's examine why traditional communication started failing.
Imagine a Chat Application
Suppose Alice sends a message to Bob.
Alice types:
Hello Bob
and clicks send.
The message reaches the server.
Now the server has Alice's message for Bob.
Question:
How does Bob know a new message arrived?
The Naive Solution
Bob continuously asks:
Any new messages?
Server:
No
One second later:
Any new messages?
Server:
No
Eventually:
Any new messages?
Server:
Yes
This technique is called:
Polling
What Is Polling?
Polling means repeatedly asking a server for updates.
Example:
// Ask the server for new messages once per second
setInterval(() => {
  fetch('/messages');
}, 1000);
Every second:
Client
↓
Any updates?
↓
Server
↓
No
Why Polling Is Inefficient
Suppose:
- 10,000 users online
- each polls every second
Requests per second:
10,000
Most requests return:
No updates
The server spends enormous resources answering useless requests.
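The numbers above can be checked with a small calculation. This is only a sketch: the function names are made up, and the figure of 100 real updates per second is an assumption chosen for illustration, with each update answered by exactly one poll.

```javascript
// Requests per second generated by clients that poll on a fixed interval.
function pollsPerSecond(users, intervalSeconds) {
  return users / intervalSeconds;
}

// Share of polls that come back empty, given how many real updates
// occur per second (assumes each update is delivered by exactly one poll).
function wastedShare(users, intervalSeconds, updatesPerSecond) {
  const polls = pollsPerSecond(users, intervalSeconds);
  const useful = Math.min(updatesPerSecond, polls);
  return (polls - useful) / polls;
}

console.log(pollsPerSecond(10000, 1));   // 10000
console.log(wastedShare(10000, 1, 100)); // 0.99
```

Under these assumptions, 99 out of every 100 requests carry no new information.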
Polling Creates Latency
Imagine Bob polls every 5 seconds.
Timeline:
0s -> Poll
5s -> Poll
10s -> Poll
15s -> Poll
Alice sends a message at:
6s
Bob receives it at:
10s
Delay:
4 seconds
The message already existed.
Bob simply wasn't asking yet.
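The timeline can be expressed as a short calculation. This sketch assumes polls happen at exact multiples of the interval, starting at 0s, and that a message sent exactly at a poll time is picked up by that poll. The function names are made up for the example.

```javascript
// Time of the first poll at or after the moment the message was sent.
function nextPollAt(sentAt, pollInterval) {
  return Math.ceil(sentAt / pollInterval) * pollInterval;
}

// How long the message sits on the server before the client asks for it.
function pollingDelay(sentAt, pollInterval) {
  return nextPollAt(sentAt, pollInterval) - sentAt;
}

console.log(nextPollAt(6, 5));   // 10
console.log(pollingDelay(6, 5)); // 4
```

The delay depends only on where the message lands between two polls, so in the worst case it approaches the full polling interval.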
The Need for Real-Time Communication
Users expect:
Send Message
↓
Receive Immediately
Not:
Send Message
↓
Wait
↓
Receive
Thus the industry needed a better model.
Long Polling
Long polling is an improvement over polling.
Instead of:
Client
↓
Any updates?
↓
Server
↓
No
The server waits.
Client
↓
Any updates?
↓
Server waits...
When a message arrives:
Server
↓
Here it is
This reduced useless requests.
But it still had limitations.
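The idea of a server holding a request open can be sketched in memory, without any network. This is not how a real server is implemented. The class LongPollChannel and its methods are hypothetical names that stand in for an HTTP endpoint.

```javascript
// In-memory stand-in for a long-polling endpoint (hypothetical names).
class LongPollChannel {
  constructor() {
    this.waiting = []; // requests the "server" is holding open
  }

  // A client asks for updates. Instead of answering "No", the server
  // keeps the request pending until there is something to deliver.
  poll() {
    return new Promise((resolve) => this.waiting.push(resolve));
  }

  // A message arrives: answer every held request, then forget them.
  // Each client must call poll() again to hear about the next message.
  publish(message) {
    const held = this.waiting;
    this.waiting = [];
    held.forEach((resolve) => resolve(message));
    return held.length; // how many requests were answered
  }
}

const channel = new LongPollChannel();
channel.poll().then((message) => console.log('Bob received:', message));
channel.publish('Hello Bob');
```

Note that one published message uses up every pending request. The client has to ask again before it can receive anything else.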
Problems with Long Polling
Every update requires:
Connection
↓
Response
↓
Close Connection
↓
Reconnect
Thousands or millions of users create enormous overhead.
Especially for real-time systems.
The Fundamental Limitation
Both polling and long polling suffer from the same problem:
The client always initiates communication.
The server cannot freely push data whenever it wants.
This becomes a major issue for:
- chat
- multiplayer
- stock trading
- live dashboards
and eventually:
- voice calls
- video calls
Enter WebSockets
The next evolution was WebSockets.
Instead of:
Request
↓
Response
↓
Disconnect
we create a persistent connection.
Client <-> Server
Once connected:
Either side can send messages.
Why WebSockets Were Revolutionary
For the first time:
The server could push updates instantly.
Example:
Alice sends:
Hello
Server immediately pushes:
Hello
to Bob.
No polling.
No waiting.
No repeated requests.
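A minimal client-side sketch of this, using the standard WebSocket API (constructor, addEventListener, send). The function name openChat and the URL in the usage note are made up. The third parameter exists only so the class can be swapped out, for example in tests.

```javascript
// Open a socket and hand every pushed message to `onMessage`.
// `SocketImpl` defaults to the standard WebSocket class.
function openChat(url, onMessage, SocketImpl = globalThis.WebSocket) {
  const socket = new SocketImpl(url);
  // The server can push at any time; the client does not have to ask first.
  socket.addEventListener('message', (event) => onMessage(event.data));
  return socket;
}

// Usage (hypothetical URL). send() only works once the connection is open:
// const socket = openChat('wss://chat.example.com/room', (text) => console.log(text));
// socket.addEventListener('open', () => socket.send('Hello'));
```

The same connection carries messages in both directions, which is what removes the need for polling.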
Modern Applications Powered by WebSockets
Examples:
- Chat applications
- Notifications
- Trading systems
- Collaborative editors
- Multiplayer games
WebSockets solved many problems.
But not all.
And this is where the story becomes interesting.
The New Challenge
Imagine we want to build Zoom.
Can WebSockets help?
Yes.
Can WebSockets transport video?
Technically, yes.
Can WebSockets transport audio?
Technically, yes.
So why wasn't Zoom built entirely using WebSockets?
Why did the industry invent WebRTC?
Because video communication introduces a completely different set of problems.
Problems that WebSockets were never designed to solve.
New Problem #1: Massive Bandwidth
A text message:
Hello
may be:
5 bytes
A raw, uncompressed 1080p video frame may be:
6 MB
Over a million times larger.
New Problem #2: Continuous Streaming
Chat:
Message
Pause
Message
Pause
Video:
Frame
Frame
Frame
Frame
Frame
Frame
Frame
...
continuously.
30-60 times every second.
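Frame size and frame rate together give the raw data rate. The sketch below assumes uncompressed 1080p video at 3 bytes per pixel. Real calls compress video heavily, so these are upper bounds that show the scale, not what actually goes over the network. The function names are made up.

```javascript
// Size of one uncompressed frame in bytes.
function rawFrameBytes(width, height, bytesPerPixel = 3) {
  return width * height * bytesPerPixel;
}

// Bits per second needed to send uncompressed frames at a given rate.
function rawBitsPerSecond(width, height, framesPerSecond, bytesPerPixel = 3) {
  return rawFrameBytes(width, height, bytesPerPixel) * 8 * framesPerSecond;
}

console.log(rawFrameBytes(1920, 1080));        // 6220800 bytes, about 6 MB
console.log(rawBitsPerSecond(1920, 1080, 30)); // 1492992000 bits/s, about 1.5 Gbps
```

This is why video is never sent raw, and why encoding and bitrate adaptation are part of the problem.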
New Problem #3: Latency Sensitivity
A chat message arriving:
500ms late
is acceptable.
A video frame arriving:
500ms late
makes conversation painful.
Human conversation requires very low latency.
New Problem #4: Server Cost Explosion
Suppose:
1,000 users on a video platform.
Each sends:
2 Mbps
video.
If everything flows through servers:
1000 x 2 Mbps = 2 Gbps
Incoming.
Then:
1000 x 2 Mbps = 2 Gbps
Outgoing.
Huge infrastructure costs.
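The same estimate as a small function. It assumes every stream is relayed by the server and forwarded to exactly one viewer by default. In group calls each stream goes to several viewers, so the outgoing side grows faster. The names are made up for this sketch.

```javascript
// Server bandwidth (in Mbps) when every stream is relayed through the server.
function relayBandwidthMbps(users, streamMbps, viewersPerStream = 1) {
  const incoming = users * streamMbps;           // every user uploads one stream
  const outgoing = incoming * viewersPerStream;  // each stream is forwarded to its viewers
  return { incoming, outgoing, total: incoming + outgoing };
}

console.log(relayBandwidthMbps(1000, 2));
// { incoming: 2000, outgoing: 2000, total: 4000 }
console.log(relayBandwidthMbps(1000, 2, 4).outgoing); // 8000
```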
New Problem #5: Media Processing
Video is not just data.
Video requires:
- encoding
- decoding
- synchronization
- packet recovery
- bitrate adaptation
- congestion control
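WebSockets provide none of these at the media level.
They deliver an ordered stream of bytes over TCP and leave the rest to the application.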
Before understanding WebRTC, we must first understand the problem it was created to solve.
Many developers learn WebRTC by memorizing:
createOffer()
createAnswer()
setLocalDescription()
setRemoteDescription()
without understanding:
- Why these APIs exist
- Why browsers exchange offers
- Why STUN servers are needed
- Why ICE candidates appear
- Why signaling is required
As a result, they can build simple demos but struggle to design real-world systems.
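For orientation, here is the order in which those calls typically appear. This is a sketch under a big assumption: both peers live in the same page, so "signaling" is just passing objects between them. In a real application every step marked (signal) travels through a server. The function name connectLocally is made up, error handling is omitted, and the code needs an environment that provides RTCPeerConnection, such as a browser.

```javascript
// Connect two peers that live in the same page (illustration only).
async function connectLocally(PeerConnection = globalThis.RTCPeerConnection) {
  const alice = new PeerConnection();
  const bob = new PeerConnection();

  // ICE candidates are possible network routes. Each peer hands its own
  // candidates to the other side. (signal)
  alice.onicecandidate = (event) => {
    if (event.candidate) bob.addIceCandidate(event.candidate);
  };
  bob.onicecandidate = (event) => {
    if (event.candidate) alice.addIceCandidate(event.candidate);
  };

  // There must be something to negotiate, so Alice opens a data channel.
  const channel = alice.createDataChannel('chat');

  // Alice describes what she wants to send and receive.
  const offer = await alice.createOffer();
  await alice.setLocalDescription(offer);
  await bob.setRemoteDescription(offer);    // (signal) offer: Alice -> Bob

  // Bob replies with what he can do.
  const answer = await bob.createAnswer();
  await bob.setLocalDescription(answer);
  await alice.setRemoteDescription(answer); // (signal) answer: Bob -> Alice

  return { alice, bob, channel };
}
```

Each call in this sequence exists because of a networking problem described in the sections that follow.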
We will first understand the networking problems that existed before WebRTC and then see how WebRTC solves them.
What Is WebRTC?
WebRTC stands for:
Web Real-Time Communication
It is a technology that allows browsers and applications to communicate directly with each other in real time.
It enables:
- Video calls
- Voice calls
- Screen sharing
- File sharing
- Chat systems
- Multiplayer games
- Collaborative applications
without requiring media or file data to pass through a central server.
Simple example:
Browser A
|
|
▼
Browser B
Direct communication.
This is called:
Peer-to-Peer Communication
or
P2P
What Did We Do Before WebRTC?
Before WebRTC, browsers had limited communication capabilities.
A browser could:
Browser
|
HTTP Request
|
Server
and
Server
|
HTTP Response
|
Browser
That's it.
Everything required a server.
Example:
Sending a file.
User A
|
Upload
|
Server
|
Download
|
User B
The server handled everything.
Traditional File Sharing
Imagine sharing a 1 GB file.
Without WebRTC:
User A
|
Upload 1 GB
|
Server
|
Store File
|
Download 1 GB
|
User B
Total traffic:
2 GB
because:
1 GB Upload
+
1 GB Download
The server becomes responsible for:
- Storage
- Processing
- Network bandwidth
Why This Is Expensive
Suppose:
1,000 users
share:
1 GB files
daily.
Server traffic:
1,000 x 2 GB = 2 TB
per day.
Video Calls Before WebRTC
Video calls had an even bigger problem.
Traditional architecture:
User A
|
Video Stream
|
Server
|
Video Stream
|
User B
The server continuously receives and sends video.
Every packet travels through the server.
Result:
- delay
- buffering
- lag
Example:
A 3 Mbps video stream.
For two users:
3 Mbps Upload + 3 Mbps Download = 6 Mbps
server bandwidth.
With thousands of users:
Massive infrastructure costs
The Centralized Communication Problem
Traditional communication systems were:
Centralized
Everything flowed through a server.
Architecture:
Client A
|
▼
Server
▲
|
Client B
The server becomes:
Single Point of Failure
If the server crashes:
Communication Stops
What Developers Wanted
Developers wanted:
Browser <-> Browser
Direct communication, without routing large amounts of data through servers.
Ideal architecture:
Client A <-> Client B
No middleman for the actual data.
This would:
- Reduce latency
- Reduce costs
- Improve scalability
- Improve quality
- Reduce server load
This idea is called:
Peer-to-Peer Communication
Why Not Use HTTP?
HTTP is request-response.
Example:
Browser -> GET /user
Server -> Response
Finished
Video calls require:
- continuous communication
- real-time streaming
- bidirectional data flow
HTTP wasn't built for that.
Why Not Use WebSockets?
Many developers think:
"Can we build Zoom using WebSockets?"
Technically yes.
Practically terrible.
Why?
Because WebSockets:
- transport bytes
- don't understand audio
- don't understand video
- handle packet loss only through TCP retransmission, which stalls later data
- don't handle codecs
- don't handle NAT traversal
You would have to build:
- media engine
- packet recovery
- congestion control
- encryption
- peer discovery
from scratch.
That is what WebRTC already provides.
Why Browsers Could Not Do This
This sounds simple:
Browser A
connect
Browser B
But the Internet doesn't work like that.
Most devices sit behind:
- NAT (Network Address Translation)
- Firewall
Example:
Your laptop:
192.168.1.5
This IP address exists only inside your home network.
Nobody on the Internet can directly reach it.
Therefore:
Browser A
doesn't know how to reach:
Browser B
This is one of the biggest problems WebRTC solves.
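Addresses like 192.168.1.5 belong to ranges reserved for private networks (RFC 1918). A small sketch of how to recognize them follows. The function name is made up, and the check covers only the three RFC 1918 IPv4 ranges, not other special-purpose addresses.

```javascript
// Returns true if the IPv4 address is in one of the RFC 1918 private ranges.
function isPrivateIPv4(address) {
  const parts = address.split('.').map(Number);
  const valid =
    parts.length === 4 &&
    parts.every((n) => Number.isInteger(n) && n >= 0 && n <= 255);
  if (!valid) {
    throw new Error(`Not an IPv4 address: ${address}`);
  }
  const [first, second] = parts;
  return (
    first === 10 ||                                   // 10.0.0.0/8
    (first === 172 && second >= 16 && second <= 31) || // 172.16.0.0/12
    (first === 192 && second === 168)                  // 192.168.0.0/16
  );
}

console.log(isPrivateIPv4('192.168.1.5')); // true
console.log(isPrivateIPv4('8.8.8.8'));     // false
```

An address in one of these ranges means nothing outside its own network, which is why a peer cannot simply send it to the other side and expect a connection.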
High-Level Architecture
Two peers:
Alice
Bob
They need to:
- Discover each other
- Exchange capabilities
- Find a route
- Connect
- Stream media
This sounds easy.
Reality is much harder.
Because of NAT.
The NAT Problem
