How Video Streaming Actually Works: HLS Explained from the Ground Up
DGateway is a unified payment and commerce platform for Africa. Beyond payments, it powers course platforms with built-in video streaming via MediaKit — our open-source video infrastructure. This post explains the engineering behind video streaming, why it's far more complex than most people realize, and why we built MediaKit instead of paying $100K+/year to existing providers.
The Problem Nobody Talks About
Here's what most people think video streaming is:
"You upload a video file, put it on a server, and people watch it."
That's like saying "you write code, put it on a server, and people use your app." Technically true. Practically, a disaster.
Try this experiment: Take a 1-hour lecture video (about 2GB as an MP4) and put it on a regular web server. Then try to watch it on a 3G connection in rural Uganda. Here's what happens:
- The browser tries to download the entire 2GB file before playing
- On a 1 Mbps connection, that's 4.5 hours to buffer a 1-hour video
- If the connection drops for 2 seconds, the download restarts
- Your student gives up and never comes back
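That back-of-the-envelope math is easy to check. A quick sketch, using the file size and link speed from the example above:

```python
def download_hours(file_gb: float, link_mbps: float) -> float:
    """Hours needed to fetch the whole file before playback can begin."""
    megabits = file_gb * 8 * 1000       # GB -> megabits (decimal units)
    return megabits / link_mbps / 3600  # seconds -> hours

# The 2 GB lecture over a 1 Mbps link:
print(round(download_hours(2, 1), 1))   # ~4.4 hours for a 1-hour video
```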
This is the problem that HTTP Live Streaming (HLS) solves. And the solution is genuinely one of the most elegant pieces of engineering in modern computing.
What is HLS?
HLS (HTTP Live Streaming) was created by Apple in 2009. Segmented adaptive streaming, the approach HLS pioneered and shares with its sibling protocol MPEG-DASH, now underpins virtually every major video service:
- Netflix — every movie and show you've ever streamed
- YouTube — 500+ hours of video uploaded every minute
- Twitch — millions of concurrent live streams
- Disney+, HBO Max, Amazon Prime — all of them
- Every course platform — Udemy, Coursera, Skillshare
The core idea is deceptively simple: don't send one big file. Send many tiny files.
But the implementation? That's where it gets fascinating.
The Architecture: From Raw Video to Stream
When you upload a video to a system like MediaKit, here's what actually happens behind the scenes. This is the pipeline that runs on every single video:
┌──────────────────────────────────────────────────────────────────┐
│ THE HLS PIPELINE │
│ │
│ ┌─────────┐ ┌──────────┐ ┌──────────┐ ┌────────────┐ │
│ │ UPLOAD │───▶│ ANALYZE │───▶│TRANSCODE │───▶│ SEGMENT │ │
│ │ Raw MP4 │ │ FFprobe │ │ FFmpeg │ │ 6s chunks │ │
│ └─────────┘ └──────────┘ └──────────┘ └─────┬──────┘ │
│ │ │
│ ┌───────────────────┘ │
│ ▼ │
│ ┌─────────┐ ┌──────────┐ ┌──────────────┐ │
│ │ STORE │◀──│ MANIFEST │◀──│ THUMBNAILS │ │
│ │ CDN/R2 │ │ .m3u8 │ │ Sprites │ │
│ └────┬─────┘ └──────────┘ └──────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────┐ │
│ │ READY TO STREAM │ │
│ │ PlyrPlayer loads master.m3u8 │ │
│ │ ABR selects quality per bandwidth │ │
│ └──────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────┘
Step 1: Ingest & Analyze
The raw video file (MP4, MOV, MKV, WebM) is received and analyzed using FFprobe (part of the FFmpeg suite). We extract:
Duration: 00:42:17
Resolution: 1920x1080 (Full HD)
Codec: H.264 / AVC
Framerate: 30 fps
Bitrate: 8,500 kbps
Audio: AAC, 48kHz, stereo
File size: 2.67 GB
This metadata determines how the video will be transcoded.
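As a rough sketch of that step, here is how the relevant fields can be pulled out of FFprobe's JSON output (produced by `ffprobe -print_format json -show_format -show_streams input.mp4`). The payload below is abbreviated and hand-written to match the numbers above; real output has many more fields:

```python
import json

# Abbreviated, hypothetical ffprobe output for the example video:
probe = json.loads("""
{
  "streams": [
    {"codec_type": "video", "codec_name": "h264",
     "width": 1920, "height": 1080, "r_frame_rate": "30/1"},
    {"codec_type": "audio", "codec_name": "aac",
     "sample_rate": "48000", "channels": 2}
  ],
  "format": {"duration": "2537.0", "bit_rate": "8500000"}
}
""")

video = next(s for s in probe["streams"] if s["codec_type"] == "video")
num, den = video["r_frame_rate"].split("/")   # frame rate is a fraction
fps = int(num) / int(den)

print(video["width"], video["height"], fps)                  # 1920 1080 30.0
print(round(float(probe["format"]["duration"]) / 60, 1))     # 42.3 minutes
```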
Step 2: Multi-Quality Transcoding
This is the most computationally expensive step. The original video is re-encoded into multiple quality levels simultaneously:
| Quality | Resolution | Bitrate | Use Case |
|---|---|---|---|
| 1080p | 1920×1080 | 5,000 kbps | Desktop, fast WiFi |
| 720p | 1280×720 | 2,500 kbps | Tablet, moderate connection |
| 480p | 854×480 | 1,000 kbps | Mobile, slower connection |
Each quality level is a complete re-encode of the entire video. A 42-minute 1080p video might take:
- 1080p encode: ~8 minutes (on a modern CPU with hardware acceleration)
- 720p encode: ~5 minutes
- 480p encode: ~3 minutes
We skip 360p by default because the quality is too degraded for educational content. For a course platform where students need to read code on screen or see slides clearly, 480p is the minimum acceptable quality.
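One way to express the quality ladder is as plain data that drives the encoder flags. A minimal sketch, assuming libx264 and the bitrates from the table above (the helper name and structure are illustrative, not MediaKit's actual code; the flags themselves are standard FFmpeg options):

```python
# The quality ladder from the table above, as data.
LADDER = [
    {"name": "1080p", "height": 1080, "video_kbps": 5000},
    {"name": "720p",  "height": 720,  "video_kbps": 2500},
    {"name": "480p",  "height": 480,  "video_kbps": 1000},
]

def ffmpeg_args(rendition: dict) -> list:
    """Per-rendition encoder flags (libx264; audio settings omitted)."""
    return [
        "-vf", f"scale=-2:{rendition['height']}",   # keep aspect ratio
        "-c:v", "libx264",
        "-b:v", f"{rendition['video_kbps']}k",
        "-maxrate", f"{rendition['video_kbps']}k",
        "-bufsize", f"{rendition['video_kbps'] * 2}k",  # 2x VBV buffer
    ]

print(ffmpeg_args(LADDER[2]))   # flags for the 480p rendition
```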
Step 3: Segmentation
Here's where HLS gets clever. Each quality level is chopped into small segments, typically 6 seconds each:
┌─────────────────────── Original 42-min Video ───────────────────────┐
│ │
│ ████████████████████████████████████████████████████████████████ │
│ │
└──────────────────────────────────────────────────────────────────────┘
│
┌─────────┼─────────┐
▼ ▼ ▼
┌──────────┐ ┌──────┐ ┌──────┐
│ 1080p │ │ 720p │ │ 480p │
└────┬─────┘ └──┬───┘ └──┬───┘
│ │ │
┌────────┘ ┌─────┘ ┌────┘
▼ ▼ ▼
┌───┬───┬───┐ ┌───┬───┐ ┌───┬───┐
│ 0 │ 1 │...│ │ 0 │...│ │ 0 │...│ ← 6-second .ts segments
└───┴───┴───┘ └───┴───┘ └───┴───┘
× 420 each × 420 × 420
Total: 1,260 individual segment files
A 42-minute video becomes:
1080p: 420 segments × 6 seconds each
720p: 420 segments × 6 seconds each
480p: 420 segments × 6 seconds each
─────────────────────────────────────
Total: 1,260 individual video files
Each segment is a self-contained video clip that can be played independently. This is the key insight that makes everything else possible.
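The segment arithmetic above generalizes to any duration and ladder size:

```python
import math

def segment_counts(duration_s: int, segment_s: int, renditions: int):
    """Segments per rendition, and total files across the whole ladder."""
    per = math.ceil(duration_s / segment_s)
    return per, per * renditions

per, total = segment_counts(42 * 60, 6, 3)   # the 42-min, 3-quality example
print(per, total)                            # 420 1260
```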
Step 4: Manifest Files (The Brain)
HLS uses .m3u8 playlist files (called manifests) that tell the video player what's available:
Master playlist (master.m3u8):
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1000000,RESOLUTION=854x480
480p/playlist.m3u8

Quality playlist (1080p/playlist.m3u8):
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:6.006,
segment_000.ts
#EXTINF:6.006,
segment_001.ts
#EXTINF:6.006,
segment_002.ts
...
#EXTINF:4.838,
segment_419.ts
#EXT-X-ENDLIST

The player reads the master playlist, sees all available qualities, and decides which one to play based on the viewer's bandwidth.
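Because the format is line-oriented text, a media playlist can be generated with simple string formatting. A sketch of a VOD-style playlist matching the listing above:

```python
def media_playlist(durations: list) -> str:
    """Build a VOD media playlist like 1080p/playlist.m3u8 above."""
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{round(max(durations))}",
        "#EXT-X-MEDIA-SEQUENCE:0",
    ]
    for i, d in enumerate(durations):
        lines.append(f"#EXTINF:{d:.3f},")       # per-segment duration
        lines.append(f"segment_{i:03d}.ts")     # zero-padded filename
    lines.append("#EXT-X-ENDLIST")              # marks the stream as VOD
    return "\n".join(lines)

playlist = media_playlist([6.006, 6.006, 4.838])
print(playlist)
```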
Step 5: Thumbnail Sprites
While transcoding happens, we also generate:
- Poster image — the thumbnail shown before play
- Sprite sheet — a grid of thumbnail images (one every 5 seconds) used for hover preview on the seek bar
- WebVTT file — maps timestamps to sprite positions
This is how YouTube shows you a preview when you hover over the progress bar. We generate the same thing automatically.
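The WebVTT file is just a list of cues, each mapping a time range to an x,y crop of the sprite sheet. A sketch, assuming 160×90 thumbnails every 5 seconds laid out in rows (the sheet filename and dimensions are illustrative):

```python
def sprite_vtt(duration_s: int, interval_s: int, thumb_w: int, thumb_h: int,
               cols: int, sheet: str = "sprites.jpg") -> str:
    """Map each time range to an x,y crop of the sprite sheet."""
    def ts(t):  # seconds -> HH:MM:SS.mmm
        return f"{t // 3600:02d}:{t % 3600 // 60:02d}:{t % 60:02d}.000"

    cues = ["WEBVTT", ""]
    for i, start in enumerate(range(0, duration_s, interval_s)):
        x, y = (i % cols) * thumb_w, (i // cols) * thumb_h
        cues.append(f"{ts(start)} --> {ts(min(start + interval_s, duration_s))}")
        cues.append(f"{sheet}#xywh={x},{y},{thumb_w},{thumb_h}")
        cues.append("")
    return "\n".join(cues)

vtt = sprite_vtt(20, 5, 160, 90, cols=10)   # 20 s of video, 4 thumbnails
print(vtt)
```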
Adaptive Bitrate: The Magic
The most impressive part of HLS is Adaptive Bitrate Streaming (ABR). Here's how it works in real-time:
Bandwidth
(Mbps)
15 │ ████ ████████
│ ████ ← 1080p ████████
10 │ ████ ████████
│ ████ ▲ ████████
5 │ ████ │ ████████
│ ████ ██████████ ← 720p ████████
2.5 │ ████ ██████████ ████████
│ ████ ██████████ ████████
1 │ ████ ██████████ ▓▓▓▓ ← 480p ████████
│ ████ ██████████ ▓▓▓▓ ████████
0.5 │──████────██████████──▓▓▓▓────────████████──
│ WiFi 4G Tunnel 4G/WiFi
└────────────────────────────────────────────▶ Time
✓ No buffering ✓ No interruption ✓ Seamless switches
Student starts watching a lecture on WiFi
├── Player detects: 15 Mbps bandwidth
├── Selects: 1080p (needs 5 Mbps) ✓
├── Downloads segment_000.ts (1080p)
├── Downloads segment_001.ts (1080p)
│
├── Student walks to the bus, switches to 4G
├── Player detects: 3 Mbps bandwidth
├── Switches to: 720p (needs 2.5 Mbps) ✓
├── Downloads segment_042.ts (720p) ← seamless switch!
│
├── Bus enters a tunnel, drops to 500 Kbps
├── Switches to: 480p (needs 1 Mbps)
├── Buffer running low...
├── Downloads segment_089.ts (480p)
│
├── Bus exits tunnel, back to 4G
├── Gradually upgrades back to 720p
└── Then back to 1080p when stable
The viewer never notices. There's no buffering spinner. No interruption. The quality adjusts automatically and seamlessly. The video just keeps playing.
This is why you sometimes notice a YouTube video gets blurry for a few seconds when your connection dips, then sharpens again. That's ABR at work.
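HLS deliberately leaves the switching policy to the player; HLS.js, for instance, estimates throughput from recent segment downloads. A simplified throughput-based picker captures the core idea (the 0.9 safety factor and the lack of smoothing/hysteresis are illustrative simplifications):

```python
RENDITIONS = [          # (name, kbps) from the master playlist above
    ("1080p", 5000),
    ("720p",  2500),
    ("480p",  1000),
]

def pick_rendition(measured_kbps: float, safety: float = 0.9) -> str:
    """Highest rendition that fits under a safety margin of the estimate.

    Real players use smoothed bandwidth estimates and hysteresis to avoid
    oscillating; the 0.9 factor here is an illustrative assumption.
    """
    budget = measured_kbps * safety
    for name, kbps in RENDITIONS:        # ordered high -> low
        if kbps <= budget:
            return name
    return RENDITIONS[-1][0]             # floor: lowest quality available

print(pick_rendition(15_000))  # 1080p on fast WiFi
print(pick_rendition(3_000))   # 720p on 4G
print(pick_rendition(500))     # 480p in the tunnel
```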
Why Regular MP4 Can't Do This
Let's compare what happens with a regular MP4 file vs HLS:
| Feature | Raw MP4 | HLS |
|---|---|---|
| Start time | Must download beginning of file first | Plays within 1-2 seconds |
| Seeking | May need to re-download from seek point | Instant — just fetch that segment |
| Bad connection | Endless buffering spinner | Drops to lower quality, keeps playing |
| Mobile data | Wastes bandwidth on full quality | Only downloads what's needed |
| Multiple viewers | Each re-downloads the full file | Segments are cached at CDN edge |
| Content protection | Anyone can download the file | Segments can be encrypted (AES-128) |
| Resume | Often restarts from beginning | Picks up from exact segment |
The CDN Layer: Global Distribution
Once the segments are generated, they need to be distributed globally. This is where a CDN (Content Delivery Network) comes in.
┌─────────────────────┐
│ ORIGIN SERVER │
│ (Cloudflare R2) │
│ All 1,260 segments│
└────────┬────────────┘
│
┌──────────────┼──────────────┐
▼ ▼ ▼
┌────────────┐ ┌────────────┐ ┌────────────┐
│ CDN Edge │ │ CDN Edge │ │ CDN Edge │
│ Nairobi │ │ Lagos │ │ Cape Town │
│ ~20ms │ │ ~20ms │ │ ~20ms │
└─────┬──────┘ └─────┬──────┘ └─────┬──────┘
│ │ │
┌─────┴─────┐ ┌─────┴─────┐ ┌─────┴─────┐
│ 🎓 Student│ │ 🎓 Student│ │ 🎓 Student│
│ Kampala │ │ Accra │ │ Joburg │
└───────────┘ └───────────┘ └───────────┘
First viewer: Origin → Edge → Viewer (~200ms)
Next viewers: Edge → Viewer (cached) (~20ms)
When a student in Kampala watches a video:
- Request goes to the nearest CDN edge (e.g., Nairobi)
- If the segment is cached there → served instantly (~20ms)
- If not cached → fetched from origin (e.g., Cloudflare R2 in Europe), cached for next viewer (~200ms first time)
After the first viewer in a region watches a segment, every subsequent viewer gets it from cache. This is why Netflix can serve 250 million subscribers without melting.
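The edge behaviour is essentially a read-through cache keyed by segment URL. A toy sketch (the origin fetch function is a stand-in for a real request to R2/S3):

```python
class EdgeCache:
    """Toy read-through cache: first request hits origin, the rest are local."""
    def __init__(self, fetch_from_origin):
        self.fetch = fetch_from_origin
        self.store = {}
        self.origin_hits = 0

    def get(self, url: str) -> bytes:
        if url not in self.store:            # cache miss -> slow origin path
            self.store[url] = self.fetch(url)
            self.origin_hits += 1
        return self.store[url]               # cache hit -> fast edge path

edge = EdgeCache(lambda url: b"segment-bytes")
for _ in range(1000):                        # 1,000 students, same segment
    edge.get("/videos/42/1080p/segment_000.ts")
print(edge.origin_hits)                      # 1: origin touched once per region
```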
MediaKit supports multiple storage backends:
- Cloudflare R2 — zero egress fees, global distribution
- AWS S3 — the industry standard
- Backblaze B2 — cheapest per-GB storage
- MinIO — self-hosted, full control
AI-Powered Video Intelligence
Modern video platforms don't just stream — they understand the content. MediaKit integrates AI capabilities:
Auto-Generated Chapters
AI watches the video and identifies topic transitions:
00:00 - Introduction
03:42 - Setting up the development environment
08:15 - Understanding REST APIs
14:30 - Building your first endpoint
22:07 - Error handling patterns
31:45 - Testing and deployment
This is incredibly valuable for educational content. Students can jump directly to the topic they need.
Content Moderation
AI scans uploaded videos for:
- Inappropriate content
- Copyright violations
- Quality issues (extremely low resolution, no audio)
Auto Transcription & Captions
AI generates subtitles automatically, making content accessible to:
- Deaf and hard-of-hearing viewers
- Non-native speakers
- People watching in noisy environments
- Search engines (subtitles are indexable text)
The Cost Problem: Why We Built MediaKit
Let's talk about the elephant in the room. Video infrastructure is expensive.
What Mux Charges
Mux is the leading video API provider. Their pricing:
| Service | Cost |
|---|---|
| Video encoding | $0.015 per minute |
| Video storage | $0.007 per minute per month |
| Video streaming | $0.00075 per minute delivered |
Sounds cheap? Let's do the math for a course platform with 100 courses, 10 hours of content each, and 1,000 monthly active students watching 2 hours/month:
Encoding (one-time): 60,000 min × $0.015 = $900
Storage (monthly): 60,000 min × $0.007 = $420/month
Streaming (monthly): 120,000 min × $0.00075 = $90/month
───────────────────────────────────────────────────
Monthly cost: $510/month = $6,120/year
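The arithmetic above, as a quick script (rates taken from the pricing table):

```python
MUX = {"encode_per_min": 0.015,        # one-time, per minute encoded
       "store_per_min_month": 0.007,   # recurring, per minute stored
       "stream_per_min": 0.00075}      # recurring, per minute delivered

library_min = 100 * 10 * 60    # 100 courses x 10 hours of content
watched_min = 1000 * 2 * 60    # 1,000 students x 2 hours watched/month

encode_once  = library_min * MUX["encode_per_min"]
store_month  = library_min * MUX["store_per_min_month"]
stream_month = watched_min * MUX["stream_per_min"]

print(round(encode_once), round(store_month + stream_month))  # 900 510
```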
And that's a small platform. A platform like Udemy with millions of hours? We're talking millions of dollars per year.
What MediaKit Costs
MediaKit is open-source. Self-hosted. You pay for:
| Service | Cost |
|---|---|
| Server (2 vCPU, 4GB RAM) | ~$20/month |
| Storage (Cloudflare R2, 1TB) | ~$15/month |
| CDN delivery (R2 egress) | $0 (R2 has zero egress fees) |
Monthly cost: ~$35/month = $420/year
That's 93% cheaper than Mux for the same workload, and the gap only widens as your library and audience grow.
Feature Comparison
| Feature | Mux | MediaKit |
|---|---|---|
| HLS Streaming | ✅ | ✅ |
| Adaptive Bitrate | ✅ | ✅ |
| Thumbnail Sprites | ✅ | ✅ |
| Player SDK | ✅ (Mux Player) | ✅ (React SDK) |
| AI Chapters | ❌ | ✅ |
| Content Moderation | ❌ | ✅ |
| Image Transform API | ❌ | ✅ |
| Embed Widget | ✅ | ✅ |
| Self-Hostable | ❌ | ✅ |
| Open Source | ❌ | ✅ |
| Private Assets (JWT) | ✅ | ✅ |
| Webhooks | ✅ | ✅ |
| Analytics | ✅ | ✅ |
| Price | $500+/month | ~$35/month |
How MediaKit Integrates with DGateway
┌──────────────────────────────────────────────────────────────┐
│ COURSE CREATOR │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Upload Video │────▶│ DGateway │────▶│ MediaKit │ │
│ │ (Dropzone) │ │ API Proxy │ │ Server │ │
│ └──────────────┘ └──────────────┘ └──────┬───────┘ │
│ │ │
│ ┌─────────────────────┘ │
│ ▼ │
│ ┌──────────────┐ │
│ │ FFmpeg │ │
│ │ Transcode │ │
│ │ 480p/720p/ │ │
│ │ 1080p + HLS │ │
│ └──────┬───────┘ │
│ │ │
│ ▼ │
│ ┌──────────────┐ │
│ │ Cloudflare │ │
│ │ R2 Storage │ │
│ └──────┬───────┘ │
└─────────────────────────────│────────────────────────────────┘
│
┌─────────────────────────────│────────────────────────────────┐
│ STUDENT │
│ ▼ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ PlyrPlayer │◀────│ CDN Edge │◀────│ HLS.js │ │
│ │ (React SDK) │ │ (cached) │ │ ABR Engine │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │
│ ✓ Adaptive quality ✓ Instant start ✓ Seek preview │
└──────────────────────────────────────────────────────────────┘
In DGateway, when a course creator uploads a lesson video:
- Upload → Video goes through DGateway API to MediaKit
- Transcode → MediaKit generates 480p, 720p, 1080p HLS streams
- Store → Segments stored in Cloudflare R2
- Deliver → Student opens lesson, PlyrPlayer loads HLS manifest
- Stream → Adaptive bitrate adjusts to student's connection
- Track → Analytics record play count, watch time, completion
The creator doesn't need to understand any of this. They upload a video, and it just works. Behind the scenes, there are 1,000+ files generated, a CDN distributing globally, and an adaptive algorithm running in the player.
The React SDK
For developers integrating MediaKit into their own apps, we provide a React SDK:
import { MediaKitProvider, PlyrPlayer, Dropzone } from '@mediakit-dev/react';
// Upload videos
<Dropzone
assetType="video"
onUploadComplete={(asset) => console.log(asset.id)}
/>
// Play videos with full HLS support
<PlyrPlayer videoId="42" />
// Optimized images with transforms
<MediaImage
assetId={42}
width={800}
format="webp"
alt="Course thumbnail"
/>

Six components. Drop-in. Full video infrastructure in your app.
The Technical Details That Matter
Why .ts Files?
HLS segments use the .ts (MPEG Transport Stream) format, not .mp4. Why?
Transport Stream was designed for broadcast television, so it is built to tolerate transmission errors, support random access, and multiplex audio and video. Each .ts segment:
- Carries its own timing information (no dependency on previous segments)
- Is packetized into small 188-byte packets, so an error corrupts a packet, not the whole file
- Can be decoded independently
- Keeps audio/video in sync within the segment
This means if a segment is corrupted or lost, only 6 seconds of video is affected. The player can skip to the next segment and continue.
Codec: H.264 vs H.265 vs AV1
The video codec determines compression efficiency:
| Codec | Compression | Browser Support | CPU Usage |
|---|---|---|---|
| H.264 | Baseline | Universal (100%) | Low |
| H.265 (HEVC) | 50% better than H.264 | Safari, some Android | Medium |
| AV1 | 30% better than H.265 | Chrome, Firefox | High |
MediaKit uses H.264 for maximum compatibility. Every browser, every device, every platform supports it. We'll add AV1 as browser support reaches critical mass.
Keyframe Intervals
A video is made up of different frame types:
Frame Types in a Video Stream:
I ─── P ─── P ─── B ─── P ─── P ─── I ─── P ─── P ─── B ─── P ─── I
│ │ │
└──────── Segment 1 (6 sec) ─────────┘──── Segment 2 (6 sec) ───────┘
I-frame (Keyframe): Full image — self-contained, can be decoded alone
┌─────────────┐
│ ████████████│ Complete picture
│ ████████████│ ~100 KB per frame
└─────────────┘
P-frame (Predicted): Only the DIFFERENCES from previous frame
┌─────────────┐
│ ██ │ Just the changes
│ █ │ ~15 KB per frame
└─────────────┘
B-frame (Bi-directional): Differences from BOTH previous AND next
┌─────────────┐
│ ░ │ Smallest size
│ ░ │ ~8 KB per frame
└─────────────┘
- I-frame (Keyframe): Full image, self-contained
- P-frame: Only stores differences from previous frame
- B-frame: Stores differences from both previous and next frames
For HLS, keyframes must align with segment boundaries. If your segments are 6 seconds and your video is 30fps, you need a keyframe every 180 frames. MediaKit enforces this during transcoding:
# Keyframe/GOP flags: -g 180 and -keyint_min 180 force a keyframe every
# 180 frames (6 s at 30 fps); -sc_threshold 0 disables scene-change
# keyframes; -hls_time 6 cuts 6-second segments.
ffmpeg -i input.mp4 \
  -g 180 \
  -keyint_min 180 \
  -sc_threshold 0 \
  -hls_time 6 \
  -hls_segment_type mpegts \
  output.m3u8
If keyframes don't align, seeking becomes janky — the player has to decode from the last keyframe to reach the seek point.
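The relationship between frame rate, segment length, and the -g value is simple multiplication:

```python
def keyframe_interval(fps: int, segment_seconds: int) -> int:
    """GOP length for -g so every segment boundary lands on an I-frame."""
    return fps * segment_seconds

print(keyframe_interval(30, 6))   # 180, the -g value used above
print(keyframe_interval(25, 6))   # 150 for 25 fps (PAL-rate) sources
```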
Content Protection: DRM & Signed URLs
For paid courses, you don't want people downloading your videos. MediaKit provides two levels of protection:
WITHOUT PROTECTION
──────────────────
Anyone with URL ──▶ CDN ──▶ Video segments ──▶ 🏴☠️ Piracy
WITH JWT-SIGNED URLs
────────────────────
Student logs in ──▶ DGateway ──▶ Signed URL (expires in 4h)
│
▼
CDN checks token
├── Valid? ──▶ ✅ Stream video
└── Expired? ──▶ ❌ 403 Forbidden
WITH AES-128 ENCRYPTION
────────────────────────
.ts segments are encrypted on disk
┌─────────────┐ ┌─────────────┐
│ 🔒 seg_001 │ │ 🔒 seg_002 │ ← Unplayable without key
└──────┬──────┘ └──────┬──────┘
│ │
▼ ▼
┌──────────────────────────────┐
│ Key Server (authenticated) │ ← Only logged-in students
│ Returns AES decryption key │ can get the key
└──────────────────────────────┘
1. JWT-Signed URLs
Each video URL includes a signed token:
https://cdn.example.com/videos/42/master.m3u8?token=eyJhbGciOiJI...
The token expires after a configurable duration (default: 4 hours). Without a valid token, the CDN returns 403. This prevents URL sharing.
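In production a JWT library handles the token, but the underlying idea can be sketched with a plain HMAC token. The secret, parameter names, and URL scheme below are illustrative, not MediaKit's actual format:

```python
import hashlib, hmac, time

SECRET = b"hypothetical-signing-key"   # illustrative, never hard-code a real key

def sign_url(path, ttl_s=4 * 3600, now=None):
    """Append an expiry timestamp and an HMAC of path+expiry to the URL."""
    exp = int((now if now is not None else time.time()) + ttl_s)
    sig = hmac.new(SECRET, f"{path}:{exp}".encode(), hashlib.sha256).hexdigest()
    return f"{path}?exp={exp}&sig={sig}"

def verify(url, now=None):
    """CDN-side check: reject expired or tampered tokens (-> 403)."""
    path, query = url.split("?", 1)
    params = dict(kv.split("=") for kv in query.split("&"))
    if (now if now is not None else time.time()) > int(params["exp"]):
        return False                      # expired
    expected = hmac.new(SECRET, f"{path}:{params['exp']}".encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, params["sig"])

url = sign_url("/videos/42/master.m3u8", now=0)
print(verify(url, now=100))        # True: inside the 4-hour window
print(verify(url, now=5 * 3600))   # False: token expired
```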
2. AES-128 Encryption (Coming Soon)
Each segment is encrypted with AES-128. The decryption key is served from a separate endpoint that requires authentication. Even if someone downloads the .ts files, they're encrypted and unplayable.
Why Open Source Matters
We made MediaKit open source (GitHub) because:
- Transparency — You can audit exactly how your videos are processed
- No vendor lock-in — Your videos, your infrastructure, forever
- Community — Bug fixes and features from developers worldwide
- Customization — Fork it, modify it, make it yours
- Cost — No per-minute fees, no surprise bills, no limits
The video infrastructure market is dominated by closed-source, expensive APIs. Mux, Cloudinary, Wistia — they're all great products, but they're all proprietary. If they raise prices, change terms, or shut down, you're stuck.
MediaKit gives you the same capabilities with full ownership. Built with Go and React. Deploy it on a $20 VPS or a Kubernetes cluster. Your choice.
What's Next
We're actively building:
- Live streaming — RTMP ingest → HLS output for live classes
- AV1 encoding — 30% smaller files, same quality
- AI auto-captions — Real-time transcription during upload
- Multi-CDN — Automatic failover between CDN providers
- Video analytics v2 — Heatmaps, engagement scoring, drop-off analysis
If you're building a course platform, a video-heavy app, or any product that needs video — check out MediaKit. Star it, fork it, build with it.
And if you want a complete platform with payments, courses, and video streaming all integrated — DGateway is ready for you.
This post is part of our engineering series where we break down the technology behind DGateway. Follow us for more deep dives into payments, streaming, and building for Africa.