How Video Streaming Actually Works: HLS Explained from the Ground Up
DGateway is a unified payment and commerce platform for Africa. Beyond payments, it powers course platforms with built-in video streaming via MediaKit — our open-source video infrastructure. This post explains the engineering behind video streaming, why it's far more complex than most people realize, and why we built MediaKit instead of paying $100K+/year to existing providers.
The Problem Nobody Talks About
Here's what most people think video streaming is:
"You upload a video file, put it on a server, and people watch it."
That's like saying "you write code, put it on a server, and people use your app." Technically true. Practically, a disaster.
Try this experiment: Take a 1-hour lecture video (about 2GB as an MP4) and put it on a regular web server. Then try to watch it on a 3G connection in rural Uganda. Here's what happens:
- The browser tries to download the entire 2GB file before playing
- On a 1 Mbps connection, that's 4.5 hours to buffer a 1-hour video
- If the connection drops for 2 seconds, the download restarts
- Your student gives up and never comes back
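That back-of-the-envelope math is easy to check. A quick sketch, using the file size and link speed from the example above:

```python
def download_hours(file_gb: float, link_mbps: float) -> float:
    """Hours needed to fetch the whole file before playback can begin."""
    megabits = file_gb * 8 * 1000       # GB -> megabits (decimal units)
    return megabits / link_mbps / 3600  # seconds -> hours

# The 2 GB lecture over a 1 Mbps link:
print(round(download_hours(2, 1), 1))   # ~4.4 hours for a 1-hour video
```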
This is the problem that HTTP Live Streaming (HLS) solves. And the solution is genuinely one of the most elegant pieces of engineering in modern computing.
What is HLS?
HLS (HTTP Live Streaming) was created by Apple in 2009. Segmented adaptive streaming, the approach HLS pioneered and shares with its sibling protocol MPEG-DASH, now underpins virtually every major video service:
- Netflix — every movie and show you've ever streamed
- YouTube — 500+ hours of video uploaded every minute
- Twitch — millions of concurrent live streams
- Disney+, HBO Max, Amazon Prime — all of them
- Every course platform — Udemy, Coursera, Skillshare
The core idea is deceptively simple: don't send one big file. Send many tiny files.
But the implementation? That's where it gets fascinating.
The Architecture: From Raw Video to Stream
When you upload a video to a system like MediaKit, here's what actually happens behind the scenes. This is the pipeline that runs on every single video:
┌──────────────────────────────────────────────────────────────────┐
│ THE HLS PIPELINE │
│ │
│ ┌─────────┐ ┌──────────┐ ┌──────────┐ ┌────────────┐ │
│ │ UPLOAD │───▶│ ANALYZE │───▶│TRANSCODE │───▶│ SEGMENT │ │
│ │ Raw MP4 │ │ FFprobe │ │ FFmpeg │ │ 6s chunks │ │
│ └─────────┘ └──────────┘ └──────────┘ └─────┬──────┘ │
│ │ │
│ ┌───────────────────┘ │
│ ▼ │
│ ┌─────────┐ ┌──────────┐ ┌──────────────┐ │
│ │ STORE │◀──│ MANIFEST │◀──│ THUMBNAILS │ │
│ │ CDN/R2 │ │ .m3u8 │ │ Sprites │ │
│ └────┬─────┘ └──────────┘ └──────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────┐ │
│ │ READY TO STREAM │ │
│ │ PlyrPlayer loads master.m3u8 │ │
│ │ ABR selects quality per bandwidth │ │
│ └──────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────┘
Step 1: Ingest & Analyze
The raw video file (MP4, MOV, MKV, WebM) is received and analyzed using FFprobe (part of the FFmpeg suite). We extract:
Duration: 00:42:17
Resolution: 1920x1080 (Full HD)
Codec: H.264 / AVC
Framerate: 30 fps
Bitrate: 8,500 kbps
Audio: AAC, 48kHz, stereo
File size: 2.67 GB
This metadata determines how the video will be transcoded.
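As a rough sketch of that step, here is how the relevant fields can be pulled out of FFprobe's JSON output (produced by `ffprobe -print_format json -show_format -show_streams input.mp4`). The payload below is abbreviated and hand-written to match the numbers above; real output has many more fields:

```python
import json

# Abbreviated, hypothetical ffprobe output for the example video:
probe = json.loads("""
{
  "streams": [
    {"codec_type": "video", "codec_name": "h264",
     "width": 1920, "height": 1080, "r_frame_rate": "30/1"},
    {"codec_type": "audio", "codec_name": "aac",
     "sample_rate": "48000", "channels": 2}
  ],
  "format": {"duration": "2537.0", "bit_rate": "8500000"}
}
""")

video = next(s for s in probe["streams"] if s["codec_type"] == "video")
num, den = video["r_frame_rate"].split("/")   # frame rate is a fraction
fps = int(num) / int(den)

print(video["width"], video["height"], fps)                  # 1920 1080 30.0
print(round(float(probe["format"]["duration"]) / 60, 1))     # 42.3 minutes
```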
Step 2: Multi-Quality Transcoding
This is the most computationally expensive step. The original video is re-encoded into multiple quality levels simultaneously:
| Quality | Resolution | Bitrate | Use Case |
|---|---|---|---|
| 1080p | 1920×1080 | 5,000 kbps | Desktop, fast WiFi |
| 720p | 1280×720 | 2,500 kbps | Tablet, moderate connection |
| 480p | 854×480 | 1,000 kbps | Mobile, slower connection |
Each quality level is a complete re-encode of the entire video. A 42-minute 1080p video might take:
- 1080p encode: ~8 minutes (on a modern CPU with hardware acceleration)
- 720p encode: ~5 minutes
- 480p encode: ~3 minutes
We skip 360p by default because the quality is too degraded for educational content. For a course platform where students need to read code on screen or see slides clearly, 480p is the minimum acceptable quality.
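One way to express the quality ladder is as plain data that drives the encoder flags. A minimal sketch, assuming libx264 and the bitrates from the table above (the helper name and structure are illustrative, not MediaKit's actual code; the flags themselves are standard FFmpeg options):

```python
# The quality ladder from the table above, as data.
LADDER = [
    {"name": "1080p", "height": 1080, "video_kbps": 5000},
    {"name": "720p",  "height": 720,  "video_kbps": 2500},
    {"name": "480p",  "height": 480,  "video_kbps": 1000},
]

def ffmpeg_args(rendition: dict) -> list:
    """Per-rendition encoder flags (libx264; audio settings omitted)."""
    return [
        "-vf", f"scale=-2:{rendition['height']}",   # keep aspect ratio
        "-c:v", "libx264",
        "-b:v", f"{rendition['video_kbps']}k",
        "-maxrate", f"{rendition['video_kbps']}k",
        "-bufsize", f"{rendition['video_kbps'] * 2}k",  # 2x VBV buffer
    ]

print(ffmpeg_args(LADDER[2]))   # flags for the 480p rendition
```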
Step 3: Segmentation
Here's where HLS gets clever. Each quality level is chopped into small segments, typically 6 seconds each:
┌─────────────────────── Original 42-min Video ───────────────────────┐
│ │
│ ████████████████████████████████████████████████████████████████ │
│ │
└──────────────────────────────────────────────────────────────────────┘
│
┌─────────┼─────────┐
▼ ▼ ▼
┌──────────┐ ┌──────┐ ┌──────┐
│ 1080p │ │ 720p │ │ 480p │
└────┬─────┘ └──┬───┘ └──┬───┘
│ │ │
┌────────┘ ┌─────┘ ┌────┘
▼ ▼ ▼
┌───┬───┬───┐ ┌───┬───┐ ┌───┬───┐
│ 0 │ 1 │...│ │ 0 │...│ │ 0 │...│ ← 6-second .ts segments
└───┴───┴───┘ └───┴───┘ └───┴───┘
× 420 each × 420 × 420
Total: 1,260 individual segment files
A 42-minute video becomes:
1080p: 420 segments × 6 seconds each
720p: 420 segments × 6 seconds each
480p: 420 segments × 6 seconds each
─────────────────────────────────────
Total: 1,260 individual video files
Each segment is a self-contained video clip that can be played independently. This is the key insight that makes everything else possible.
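The segment arithmetic above generalizes to any duration and ladder size:

```python
import math

def segment_counts(duration_s: int, segment_s: int, renditions: int):
    """Segments per rendition, and total files across the whole ladder."""
    per = math.ceil(duration_s / segment_s)
    return per, per * renditions

per, total = segment_counts(42 * 60, 6, 3)   # the 42-min, 3-quality example
print(per, total)                            # 420 1260
```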
Step 4: Manifest Files (The Brain)
HLS uses .m3u8 playlist files (called manifests) that tell the video player what's available:
Master playlist (master.m3u8):
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1000000,RESOLUTION=854x480
480p/playlist.m3u8

Quality playlist (1080p/playlist.m3u8):
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:6.006,
segment_000.ts
#EXTINF:6.006,
segment_001.ts
#EXTINF:6.006,
segment_002.ts
...
#EXTINF:4.838,
segment_419.ts
#EXT-X-ENDLIST

The player reads the master playlist, sees all available qualities, and decides which one to play based on the viewer's bandwidth.
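Because the format is line-oriented text, a media playlist can be generated with simple string formatting. A sketch of a VOD-style playlist matching the listing above:

```python
def media_playlist(durations: list) -> str:
    """Build a VOD media playlist like 1080p/playlist.m3u8 above."""
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{round(max(durations))}",
        "#EXT-X-MEDIA-SEQUENCE:0",
    ]
    for i, d in enumerate(durations):
        lines.append(f"#EXTINF:{d:.3f},")       # per-segment duration
        lines.append(f"segment_{i:03d}.ts")     # zero-padded filename
    lines.append("#EXT-X-ENDLIST")              # marks the stream as VOD
    return "\n".join(lines)

playlist = media_playlist([6.006, 6.006, 4.838])
print(playlist)
```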
Step 5: Thumbnail Sprites
While transcoding happens, we also generate:
- Poster image — the thumbnail shown before play
- Sprite sheet — a grid of thumbnail images (one every 5 seconds) used for hover preview on the seek bar
- WebVTT file — maps timestamps to sprite positions
This is how YouTube shows you a preview when you hover over the progress bar. We generate the same thing automatically.
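The WebVTT file is just a list of cues, each mapping a time range to an x,y crop of the sprite sheet. A sketch, assuming 160×90 thumbnails every 5 seconds laid out in rows (the sheet filename and dimensions are illustrative):

```python
def sprite_vtt(duration_s: int, interval_s: int, thumb_w: int, thumb_h: int,
               cols: int, sheet: str = "sprites.jpg") -> str:
    """Map each time range to an x,y crop of the sprite sheet."""
    def ts(t):  # seconds -> HH:MM:SS.mmm
        return f"{t // 3600:02d}:{t % 3600 // 60:02d}:{t % 60:02d}.000"

    cues = ["WEBVTT", ""]
    for i, start in enumerate(range(0, duration_s, interval_s)):
        x, y = (i % cols) * thumb_w, (i // cols) * thumb_h
        cues.append(f"{ts(start)} --> {ts(min(start + interval_s, duration_s))}")
        cues.append(f"{sheet}#xywh={x},{y},{thumb_w},{thumb_h}")
        cues.append("")
    return "\n".join(cues)

vtt = sprite_vtt(20, 5, 160, 90, cols=10)   # 20 s of video, 4 thumbnails
print(vtt)
```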
Adaptive Bitrate: The Magic
The most impressive part of HLS is Adaptive Bitrate Streaming (ABR). Here's how it works in real-time:
Bandwidth
(Mbps)
15 │ ████ ████████
│ ████ ← 1080p ████████
10 │ ████ ████████
│ ████ ▲ ████████
5 │ ████ │ ████████
│ ████ ██████████ ← 720p ████████
2.5 │ ████ ██████████ ████████
│ ████ ██████████ ████████
1 │ ████ ██████████ ▓▓▓▓ ← 480p ████████
│ ████ ██████████ ▓▓▓▓ ████████
0.5 │──████────██████████──▓▓▓▓────────████████──
│ WiFi 4G Tunnel 4G/WiFi
└────────────────────────────────────────────▶ Time
✓ No buffering ✓ No interruption ✓ Seamless switches
Student starts watching a lecture on WiFi
├── Player detects: 15 Mbps bandwidth
├── Selects: 1080p (needs 5 Mbps) ✓
├── Downloads segment_000.ts (1080p)
├── Downloads segment_001.ts (1080p)
│
├── Student walks to the bus, switches to 4G
├── Player detects: 3 Mbps bandwidth
├── Switches to: 720p (needs 2.5 Mbps) ✓
├── Downloads segment_042.ts (720p) ← seamless switch!
│
├── Bus enters a tunnel, drops to 500 Kbps
├── Switches to: 480p (needs 1 Mbps)
├── Buffer running low...
├── Downloads segment_089.ts (480p)
│
├── Bus exits tunnel, back to 4G
├── Gradually upgrades back to 720p
└── Then back to 1080p when stable
The viewer never notices. There's no buffering spinner. No interruption. The quality adjusts automatically and seamlessly. The video just keeps playing.
This is why you sometimes notice a YouTube video gets blurry for a few seconds when your connection dips, then sharpens again. That's ABR at work.
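HLS deliberately leaves the switching policy to the player; HLS.js, for instance, estimates throughput from recent segment downloads. A simplified throughput-based picker captures the core idea (the 0.9 safety factor and the lack of smoothing/hysteresis are illustrative simplifications):

```python
RENDITIONS = [          # (name, kbps) from the master playlist above
    ("1080p", 5000),
    ("720p",  2500),
    ("480p",  1000),
]

def pick_rendition(measured_kbps: float, safety: float = 0.9) -> str:
    """Highest rendition that fits under a safety margin of the estimate.

    Real players use smoothed bandwidth estimates and hysteresis to avoid
    oscillating; the 0.9 factor here is an illustrative assumption.
    """
    budget = measured_kbps * safety
    for name, kbps in RENDITIONS:        # ordered high -> low
        if kbps <= budget:
            return name
    return RENDITIONS[-1][0]             # floor: lowest quality available

print(pick_rendition(15_000))  # 1080p on fast WiFi
print(pick_rendition(3_000))   # 720p on 4G
print(pick_rendition(500))     # 480p in the tunnel
```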
Why Regular MP4 Can't Do This
Let's compare what happens with a regular MP4 file vs HLS:
| Feature | Raw MP4 | HLS |
|---|---|---|
| Start time | Must download beginning of file first | Plays within 1-2 seconds |
| Seeking | May need to re-download from seek point | Instant — just fetch that segment |
| Bad connection | Endless buffering spinner | Drops to lower quality, keeps playing |
| Mobile data | Wastes bandwidth on full quality | Only downloads what's needed |
| Multiple viewers | Each re-downloads the full file | Segments are cached at CDN edge |
| Content protection | Anyone can download the file | Segments can be encrypted (AES-128) |
| Resume | Often restarts from beginning | Picks up from exact segment |
The CDN Layer: Global Distribution
Once the segments are generated, they need to be distributed globally. This is where a CDN (Content Delivery Network) comes in.
┌─────────────────────┐
│ ORIGIN SERVER │
│ (Cloudflare R2) │
│ All 1,260 segments│
└────────┬────────────┘
│
┌──────────────┼──────────────┐
▼ ▼ ▼
┌────────────┐ ┌────────────┐ ┌────────────┐
│ CDN Edge │ │ CDN Edge │ │ CDN Edge │
│ Nairobi │ │ Lagos │ │ Cape Town │
│ ~20ms │ │ ~20ms │ │ ~20ms │
└─────┬──────┘ └─────┬──────┘ └─────┬──────┘
│ │ │
┌─────┴─────┐ ┌─────┴─────┐ ┌─────┴─────┐
│ 🎓 Student│ │ 🎓 Student│ │ 🎓 Student│
│ Kampala │ │ Accra │ │ Joburg │
└───────────┘ └───────────┘ └───────────┘
First viewer: Origin → Edge → Viewer (~200ms)
Next viewers: Edge → Viewer (cached) (~20ms)
When a student in Kampala watches a video:
- Request goes to the nearest CDN edge (e.g., Nairobi)
- If the segment is cached there → served instantly (~20ms)
- If not cached → fetched from origin (e.g., Cloudflare R2 in Europe), cached for next viewer (~200ms first time)
After the first viewer in a region watches a segment, every subsequent viewer gets it from cache. This is why Netflix can serve 250 million subscribers without melting.
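The edge behaviour is essentially a read-through cache keyed by segment URL. A toy sketch (the origin fetch function is a stand-in for a real request to R2/S3):

```python
class EdgeCache:
    """Toy read-through cache: first request hits origin, the rest are local."""
    def __init__(self, fetch_from_origin):
        self.fetch = fetch_from_origin
        self.store = {}
        self.origin_hits = 0

    def get(self, url: str) -> bytes:
        if url not in self.store:            # cache miss -> slow origin path
            self.store[url] = self.fetch(url)
            self.origin_hits += 1
        return self.store[url]               # cache hit -> fast edge path

edge = EdgeCache(lambda url: b"segment-bytes")
for _ in range(1000):                        # 1,000 students, same segment
    edge.get("/videos/42/1080p/segment_000.ts")
print(edge.origin_hits)                      # 1: origin touched once per region
```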
MediaKit supports multiple storage backends:
- Cloudflare R2 — zero egress fees, global distribution
- AWS S3 — the industry standard
- Backblaze B2 — cheapest per-GB storage
- MinIO — self-hosted, full control
AI-Powered Video Intelligence
Modern video platforms don't just stream — they understand the content. MediaKit integrates AI capabilities:
Auto-Generated Chapters
AI watches the video and identifies topic transitions:
00:00 - Introduction
03:42 - Setting up the development environment
08:15 - Understanding REST APIs
14:30 - Building your first endpoint
22:07 - Error handling patterns
31:45 - Testing and deployment
This is incredibly valuable for educational content. Students can jump directly to the topic they need.
Content Moderation
AI scans uploaded videos for:
- Inappropriate content
- Copyright violations
- Quality issues (extremely low resolution, no audio)
Auto Transcription & Captions
AI generates subtitles automatically, making content accessible to:
- Deaf and hard-of-hearing viewers
- Non-native speakers
- People watching in noisy environments
- Search engines (subtitles are indexable text)
The Cost Problem: Why We Built MediaKit
Let's talk about the elephant in the room. Video infrastructure is expensive.
What Mux Charges
Mux is the leading video API provider. Their pricing:
| Service | Cost |
|---|---|
| Video encoding | $0.015 per minute |
| Video storage | $0.007 per minute per month |
| Video streaming | $0.00075 per minute delivered |
Sounds cheap? Let's do the math for a course platform with 100 courses, 10 hours of content each, and 1,000 monthly active students watching 2 hours/month:
Encoding (one-time): 60,000 min × $0.015 = $900
Storage (monthly): 60,000 min × $0.007 = $420/month
Streaming (monthly): 120,000 min × $0.00075 = $90/month
───────────────────────────────────────────────────
Monthly cost: $510/month = $6,120/year
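The arithmetic above, as a quick script (rates taken from the pricing table):

```python
MUX = {"encode_per_min": 0.015,        # one-time, per minute encoded
       "store_per_min_month": 0.007,   # recurring, per minute stored
       "stream_per_min": 0.00075}      # recurring, per minute delivered

library_min = 100 * 10 * 60    # 100 courses x 10 hours of content
watched_min = 1000 * 2 * 60    # 1,000 students x 2 hours watched/month

encode_once  = library_min * MUX["encode_per_min"]
store_month  = library_min * MUX["store_per_min_month"]
stream_month = watched_min * MUX["stream_per_min"]

print(round(encode_once), round(store_month + stream_month))  # 900 510
```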
And that's a small platform. A platform like Udemy with millions of hours? We're talking millions of dollars per year.
What MediaKit Costs
MediaKit is open-source. Self-hosted. You pay for:
| Service | Cost |
|---|---|
| Server (2 vCPU, 4GB RAM) | ~$20/month |
| Storage (Cloudflare R2, 1TB) | ~$15/month |
| CDN delivery (R2 egress) | $0 (R2 has zero egress fees) |
Monthly cost: ~$35/month = $420/year
That's 93% cheaper than Mux for the same workload, and the gap only widens as your library and audience grow.
Feature Comparison
| Feature | Mux | MediaKit |
|---|---|---|
| HLS Streaming | ✅ | ✅ |
| Adaptive Bitrate | ✅ | ✅ |
| Thumbnail Sprites | ✅ | ✅ |
| Player SDK | ✅ (Mux Player) | ✅ (React SDK) |
| AI Chapters | ❌ | ✅ |
| Content Moderation | ❌ | ✅ |
| Image Transform API | ❌ | ✅ |
| Embed Widget | ✅ | ✅ |
| Self-Hostable | ❌ | ✅ |
| Open Source | ❌ | ✅ |
| Private Assets (JWT) | ✅ | ✅ |
| Webhooks | ✅ | ✅ |
| Analytics | ✅ | ✅ |
| Price | $500+/month | ~$35/month |
How MediaKit Integrates with DGateway
┌──────────────────────────────────────────────────────────────┐
│ COURSE CREATOR │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Upload Video │────▶│ DGateway │────▶│ MediaKit │ │
│ │ (Dropzone) │ │ API Proxy │ │ Server │ │
│ └──────────────┘ └──────────────┘ └──────┬───────┘ │
│ │ │
│ ┌─────────────────────┘ │
│ ▼ │
│ ┌──────────────┐ │
│ │ FFmpeg │ │
│ │ Transcode │ │
│ │ 480p/720p/ │ │
│ │ 1080p + HLS │ │
│ └──────┬───────┘ │
│ │ │
│ ▼ │
│ ┌──────────────┐ │
│ │ Cloudflare │ │
│ │ R2 Storage │ │
│ └──────┬───────┘ │
└─────────────────────────────│────────────────────────────────┘
│
┌─────────────────────────────│────────────────────────────────┐
│ STUDENT │
│ ▼ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ PlyrPlayer │◀────│ CDN Edge │◀────│ HLS.js │ │
│ │ (React SDK) │ │ (cached) │ │ ABR Engine │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │
│ ✓ Adaptive quality ✓ Instant start ✓ Seek preview │
└──────────────────────────────────────────────────────────────┘
In DGateway, when a course creator uploads a lesson video:
- Upload → Video goes through DGateway API to MediaKit
- Transcode → MediaKit generates 480p, 720p, 1080p HLS streams
- Store → Segments stored in Cloudflare R2
- Deliver → Student opens lesson, PlyrPlayer loads HLS manifest
- Stream → Adaptive bitrate adjusts to student's connection
- Track → Analytics record play count, watch time, completion
The creator doesn't need to understand any of this. They upload a video, and it just works. Behind the scenes, there are 1,000+ files generated, a CDN distributing globally, and an adaptive algorithm running in the player.
The React SDK
For developers integrating MediaKit into their own apps, we provide a React SDK:
import { MediaKitProvider, PlyrPlayer, Dropzone } from '@mediakit-dev/react';
// Upload videos
<Dropzone
assetType="video"
onUploadComplete={(asset) => console.log(asset.id)}
/>
// Play videos with full HLS support
<PlyrPlayer videoId="42" />
// Optimized images with transforms
<MediaImage
assetId={42}
width={800}
format="webp"
alt="Course thumbnail"
/>

Six components. Drop-in. Full video infrastructure in your app.
The Technical Details That Matter
Why .ts Files?
HLS segments use the .ts (MPEG Transport Stream) format, not .mp4. Why?
Transport Stream was designed for broadcast television, so it is built to tolerate transmission errors, support random access, and multiplex audio and video. Each .ts segment:
- Carries its own timing information (no dependency on previous segments)
- Is packetized into small 188-byte packets, so an error corrupts a packet, not the whole file
- Can be decoded independently
- Keeps audio/video in sync within the segment
This means if a segment is corrupted or lost, only 6 seconds of video is affected. The player can skip to the next segment and continue.
Codec: H.264 vs H.265 vs AV1
The video codec determines compression efficiency:
| Codec | Compression | Browser Support | CPU Usage |
|---|---|---|---|
| H.264 | Baseline | Universal (100%) | Low |
| H.265 (HEVC) | 50% better than H.264 | Safari, some Android | Medium |
| AV1 | 30% better than H.265 | Chrome, Firefox | High |
MediaKit uses H.264 for maximum compatibility. Every browser, every device, every platform supports it. We'll add AV1 as browser support reaches critical mass.
Keyframe Intervals
A video is made up of different frame types:
Frame Types in a Video Stream:
I ─── P ─── P ─── B ─── P ─── P ─── I ─── P ─── P ─── B ─── P ─── I
│ │ │
└──────── Segment 1 (6 sec) ─────────┘──── Segment 2 (6 sec) ───────┘
I-frame (Keyframe): Full image — self-contained, can be decoded alone
┌─────────────┐
│ ████████████│ Complete picture
│ ████████████│ ~100 KB per frame
└─────────────┘
P-frame (Predicted): Only the DIFFERENCES from previous frame
┌─────────────┐
│ ██ │ Just the changes
│ █ │ ~15 KB per frame
└─────────────┘
B-frame (Bi-directional): Differences from BOTH previous AND next
┌─────────────┐
│ ░ │ Smallest size
│ ░ │ ~8 KB per frame
└─────────────┘
- I-frame (Keyframe): Full image, self-contained
- P-frame: Only stores differences from previous frame
- B-frame: Stores differences from both previous and next frames
For HLS, keyframes must align with segment boundaries. If your segments are 6 seconds and your video is 30fps, you need a keyframe every 180 frames. MediaKit enforces this during transcoding:
# Keyframe/GOP flags: -g 180 and -keyint_min 180 force a keyframe every
# 180 frames (6 s at 30 fps); -sc_threshold 0 disables scene-change
# keyframes; -hls_time 6 cuts 6-second segments.
ffmpeg -i input.mp4 \
  -g 180 \
  -keyint_min 180 \
  -sc_threshold 0 \
  -hls_time 6 \
  -hls_segment_type mpegts \
  output.m3u8
If keyframes don't align, seeking becomes janky — the player has to decode from the last keyframe to reach the seek point.
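The relationship between frame rate, segment length, and the -g value is simple multiplication:

```python
def keyframe_interval(fps: int, segment_seconds: int) -> int:
    """GOP length for -g so every segment boundary lands on an I-frame."""
    return fps * segment_seconds

print(keyframe_interval(30, 6))   # 180, the -g value used above
print(keyframe_interval(25, 6))   # 150 for 25 fps (PAL-rate) sources
```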
Content Protection: DRM & Signed URLs
For paid courses, you don't want people downloading your videos. MediaKit provides two levels of protection:
WITHOUT PROTECTION
──────────────────
Anyone with URL ──▶ CDN ──▶ Video segments ──▶ 🏴☠️ Piracy
WITH JWT-SIGNED URLs
────────────────────
Student logs in ──▶ DGateway ──▶ Signed URL (expires in 4h)
│
▼
CDN checks token
├── Valid? ──▶ ✅ Stream video
└── Expired? ──▶ ❌ 403 Forbidden
WITH AES-128 ENCRYPTION
────────────────────────
.ts segments are encrypted on disk
┌─────────────┐ ┌─────────────┐
│ 🔒 seg_001 │ │ 🔒 seg_002 │ ← Unplayable without key
└──────┬──────┘ └──────┬──────┘
│ │
▼ ▼
┌──────────────────────────────┐
│ Key Server (authenticated) │ ← Only logged-in students
│ Returns AES decryption key │ can get the key
└──────────────────────────────┘
1. JWT-Signed URLs
Each video URL includes a signed token:
https://cdn.example.com/videos/42/master.m3u8?token=eyJhbGciOiJI...
The token expires after a configurable duration (default: 4 hours). Without a valid token, the CDN returns 403. This prevents URL sharing.
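In production a JWT library handles the token, but the underlying idea can be sketched with a plain HMAC token. The secret, parameter names, and URL scheme below are illustrative, not MediaKit's actual format:

```python
import hashlib, hmac, time

SECRET = b"hypothetical-signing-key"   # illustrative, never hard-code a real key

def sign_url(path, ttl_s=4 * 3600, now=None):
    """Append an expiry timestamp and an HMAC of path+expiry to the URL."""
    exp = int((now if now is not None else time.time()) + ttl_s)
    sig = hmac.new(SECRET, f"{path}:{exp}".encode(), hashlib.sha256).hexdigest()
    return f"{path}?exp={exp}&sig={sig}"

def verify(url, now=None):
    """CDN-side check: reject expired or tampered tokens (-> 403)."""
    path, query = url.split("?", 1)
    params = dict(kv.split("=") for kv in query.split("&"))
    if (now if now is not None else time.time()) > int(params["exp"]):
        return False                      # expired
    expected = hmac.new(SECRET, f"{path}:{params['exp']}".encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, params["sig"])

url = sign_url("/videos/42/master.m3u8", now=0)
print(verify(url, now=100))        # True: inside the 4-hour window
print(verify(url, now=5 * 3600))   # False: token expired
```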
2. AES-128 Encryption (Coming Soon)
Each segment is encrypted with AES-128. The decryption key is served from a separate endpoint that requires authentication. Even if someone downloads the .ts files, they're encrypted and unplayable.
Why Open Source Matters
We made MediaKit open source (GitHub) because:
- Transparency — You can audit exactly how your videos are processed
- No vendor lock-in — Your videos, your infrastructure, forever
- Community — Bug fixes and features from developers worldwide
- Customization — Fork it, modify it, make it yours
- Cost — No per-minute fees, no surprise bills, no limits
The video infrastructure market is dominated by closed-source, expensive APIs. Mux, Cloudinary, Wistia — they're all great products, but they're all proprietary. If they raise prices, change terms, or shut down, you're stuck.
MediaKit gives you the same capabilities with full ownership. Built with Go and React. Deploy it on a $20 VPS or a Kubernetes cluster. Your choice.
What's Next
We're actively building:
- Live streaming — RTMP ingest → HLS output for live classes
- AV1 encoding — 30% smaller files, same quality
- AI auto-captions — Real-time transcription during upload
- Multi-CDN — Automatic failover between CDN providers
- Video analytics v2 — Heatmaps, engagement scoring, drop-off analysis
If you're building a course platform, a video-heavy app, or any product that needs video — check out MediaKit. Star it, fork it, build with it.
And if you want a complete platform with payments, courses, and video streaming all integrated — DGateway is ready for you.
This post is part of our engineering series where we break down the technology behind DGateway. Follow us for more deep dives into payments, streaming, and building for Africa.