Engineering · 17 min read

How Video Streaming Actually Works: HLS Explained from the Ground Up

A deep technical breakdown of HTTP Live Streaming (HLS) — the technology behind Netflix, YouTube, and every modern video platform. Learn how raw video becomes adaptive multi-quality streams, why it matters, and how we built MediaKit as an open-source alternative to Mux.


DGateway is a unified payment and commerce platform for Africa. Beyond payments, it powers course platforms with built-in video streaming via MediaKit — our open-source video infrastructure. This post explains the engineering behind video streaming, why it's far more complex than most people realize, and why we built MediaKit instead of paying $100K+/year to existing providers.


The Problem Nobody Talks About

Here's what most people think video streaming is:

"You upload a video file, put it on a server, and people watch it."

That's like saying "you write code, put it on a server, and people use your app." Technically true. Practically, a disaster.

Try this experiment: Take a 1-hour lecture video (about 2GB as an MP4) and put it on a regular web server. Then try to watch it on a 3G connection in rural Uganda. Here's what happens:

  1. The browser tries to download the entire 2GB file before playing
  2. On a 1 Mbps connection, that's 4.5 hours to buffer a 1-hour video
  3. If the connection drops for 2 seconds, the download restarts
  4. Your student gives up and never comes back
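The arithmetic behind that buffering time is easy to sanity-check. A minimal sketch, assuming decimal units (1 GB = 10⁹ bytes) and ignoring TCP overhead, which would only make things worse:

```python
# Back-of-envelope: time to download a 2 GB file on a 1 Mbps link.
# Decimal units (1 GB = 1e9 bytes) assumed purely for illustration.
file_size_bits = 2e9 * 8      # 2 GB in bits
link_speed_bps = 1e6          # 1 Mbps
seconds = file_size_bits / link_speed_bps
hours = seconds / 3600
print(f"{hours:.1f} hours")   # ≈ 4.4 hours to fetch a 1-hour video
```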

This is the problem that HTTP Live Streaming (HLS) solves. And the solution is genuinely one of the most elegant pieces of engineering in modern computing.


What is HLS?

HLS (HTTP Live Streaming) was created by Apple in 2009. Together with its close cousin MPEG-DASH, it now dominates video delivery; every major platform streams over HLS, DASH, or both:

  • Netflix — every movie and show you've ever streamed
  • YouTube — 500+ hours of video uploaded every minute
  • Twitch — millions of concurrent live streams
  • Disney+, HBO Max, Amazon Prime — all of them
  • Every course platform — Udemy, Coursera, Skillshare

The core idea is deceptively simple: don't send one big file. Send many tiny files.

But the implementation? That's where it gets fascinating.


The Architecture: From Raw Video to Stream

When you upload a video to a system like MediaKit, here's what actually happens behind the scenes. This is the pipeline that runs on every single video:

┌──────────────────────────────────────────────────────────────────┐
│                    THE HLS PIPELINE                              │
│                                                                  │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌────────────┐  │
│  │  UPLOAD  │───▶│ ANALYZE  │───▶│TRANSCODE │───▶│  SEGMENT   │  │
│  │ Raw MP4  │    │ FFprobe  │    │ FFmpeg   │    │ 6s chunks  │  │
│  └──────────┘    └──────────┘    └──────────┘    └─────┬──────┘  │
│                                                        │         │
│                                   ┌────────────────────┘         │
│                                   ▼                              │
│  ┌──────────┐    ┌──────────┐    ┌──────────────┐                │
│  │  STORE   │◀───│ MANIFEST │◀───│  THUMBNAILS  │                │
│  │ CDN/R2   │    │ .m3u8    │    │  Sprites     │                │
│  └────┬─────┘    └──────────┘    └──────────────┘                │
│       │                                                          │
│       ▼                                                          │
│  ┌──────────────────────────────────────┐                        │
│  │           READY TO STREAM            │                        │
│  │  PlyrPlayer loads master.m3u8        │                        │
│  │  ABR selects quality per bandwidth   │                        │
│  └──────────────────────────────────────┘                        │
└──────────────────────────────────────────────────────────────────┘

Step 1: Ingest & Analyze

The raw video file (MP4, MOV, MKV, WebM) is received and analyzed using FFprobe (part of the FFmpeg suite). We extract:

Duration:     00:42:17
Resolution:   1920x1080 (Full HD)
Codec:        H.264 / AVC
Framerate:    30 fps
Bitrate:      8,500 kbps
Audio:        AAC, 48kHz, stereo
File size:    2.67 GB

This metadata determines how the video will be transcoded.
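In practice this metadata usually comes out of FFprobe's JSON mode (`ffprobe -v quiet -print_format json -show_format -show_streams input.mp4`). A sketch of pulling out the fields above, using a trimmed, hypothetical sample of that JSON:

```python
import json

# Hypothetical ffprobe output, trimmed to the fields the pipeline cares about.
ffprobe_json = json.loads("""
{
  "streams": [
    {"codec_type": "video", "codec_name": "h264",
     "width": 1920, "height": 1080, "r_frame_rate": "30/1"},
    {"codec_type": "audio", "codec_name": "aac",
     "sample_rate": "48000", "channels": 2}
  ],
  "format": {"duration": "2537.0", "bit_rate": "8500000"}
}
""")

video = next(s for s in ffprobe_json["streams"] if s["codec_type"] == "video")
num, den = video["r_frame_rate"].split("/")   # ffprobe reports fps as a fraction
fps = int(num) / int(den)
duration_s = float(ffprobe_json["format"]["duration"])

print(video["width"], video["height"], fps, duration_s)  # 1920 1080 30.0 2537.0
```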

Step 2: Multi-Quality Transcoding

This is the most computationally expensive step. The original video is re-encoded into multiple quality levels simultaneously:

Quality   Resolution   Bitrate      Use Case
1080p     1920×1080    5,000 kbps   Desktop, fast WiFi
720p      1280×720     2,500 kbps   Tablet, moderate connection
480p      854×480      1,000 kbps   Mobile, slower connection

Each quality level is a complete re-encode of the entire video. A 42-minute 1080p video might take:

  • 1080p encode: ~8 minutes (on a modern CPU with hardware acceleration)
  • 720p encode: ~5 minutes
  • 480p encode: ~3 minutes

We skip 360p by default because the quality is too degraded for educational content. For a course platform where students need to read code on screen or see slides clearly, 480p is the minimum acceptable quality.
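A sketch of what driving that rendition ladder might look like. The resolutions and bitrates mirror the table above; the output paths and exact flag set are illustrative, not MediaKit's actual command line:

```python
# Build one FFmpeg invocation per rendition in the ladder.
LADDER = [
    ("1080p", 1920, 1080, 5000),
    ("720p",  1280,  720, 2500),
    ("480p",   854,  480, 1000),
]

def transcode_cmd(src: str, name: str, w: int, h: int, kbps: int) -> list[str]:
    return [
        "ffmpeg", "-i", src,
        "-vf", f"scale={w}:{h}",        # resize to the target resolution
        "-c:v", "libx264",              # H.264 for maximum compatibility
        "-b:v", f"{kbps}k",             # target video bitrate
        "-c:a", "aac", "-b:a", "128k",  # AAC audio
        f"{name}/index.mp4",            # illustrative output path
    ]

cmds = [transcode_cmd("input.mp4", *r) for r in LADDER]
```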

Step 3: Segmentation

Here's where HLS gets clever. Each quality level is chopped into small segments, typically 6 seconds each:

┌─────────────────────── Original 42-min Video ───────────────────────┐
│                                                                      │
│  ████████████████████████████████████████████████████████████████    │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘
                              │
                    ┌─────────┼─────────┐
                    ▼         ▼         ▼
              ┌──────────┐ ┌──────┐ ┌──────┐
              │  1080p   │ │ 720p │ │ 480p │
              └────┬─────┘ └──┬───┘ └──┬───┘
                   │          │        │
          ┌────────┘    ┌─────┘   ┌────┘
          ▼             ▼         ▼
    ┌───┬───┬───┐ ┌───┬───┐ ┌───┬───┐
    │ 0 │ 1 │...│ │ 0 │...│ │ 0 │...│  ← 6-second .ts segments
    └───┴───┴───┘ └───┴───┘ └───┴───┘
    × 420 each    × 420     × 420

    Total: 1,260 individual segment files

A 42-minute video becomes:

1080p: 420 segments × 6 seconds each
720p:  420 segments × 6 seconds each
480p:  420 segments × 6 seconds each
─────────────────────────────────────
Total: 1,260 individual video files

Each segment is a self-contained video clip that can be played independently. This is the key insight that makes everything else possible.
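The segment math above can be checked in a couple of lines (the last segment is simply shorter, hence the ceiling):

```python
import math

def segment_count(duration_s: float, segment_s: float = 6.0) -> int:
    """Number of HLS segments for a video (last one may be shorter)."""
    return math.ceil(duration_s / segment_s)

per_rendition = segment_count(42 * 60)   # 42-minute video
total = per_rendition * 3                # three quality levels
print(per_rendition, total)              # 420 1260
```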

Step 4: Manifest Files (The Brain)

HLS uses .m3u8 playlist files (called manifests) that tell the video player what's available:

Master playlist (master.m3u8):

#EXTM3U
#EXT-X-VERSION:3
 
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/playlist.m3u8
 
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
720p/playlist.m3u8
 
#EXT-X-STREAM-INF:BANDWIDTH=1000000,RESOLUTION=854x480
480p/playlist.m3u8

Quality playlist (1080p/playlist.m3u8):

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:0
 
#EXTINF:6.006,
segment_000.ts
#EXTINF:6.006,
segment_001.ts
#EXTINF:6.006,
segment_002.ts
...
#EXTINF:4.838,
segment_419.ts
#EXT-X-ENDLIST

The player reads the master playlist, sees all available qualities, and decides which one to play based on the viewer's bandwidth.
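A toy parser for the master playlist above, plus the "highest variant that fits" selection rule. This is a simplified stand-in for what a player like hls.js actually does:

```python
import re

MASTER = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1000000,RESOLUTION=854x480
480p/playlist.m3u8
"""

def parse_master(text: str) -> list[dict]:
    variants, lines = [], text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("#EXT-X-STREAM-INF:"):
            bw = int(re.search(r"BANDWIDTH=(\d+)", line).group(1))
            variants.append({"bandwidth": bw, "uri": lines[i + 1]})  # URI follows the tag
    return variants

def pick_variant(variants, measured_bps):
    """Highest-bitrate variant that fits the measured bandwidth,
    falling back to the lowest one if nothing fits."""
    fitting = [v for v in variants if v["bandwidth"] <= measured_bps]
    return max(fitting, key=lambda v: v["bandwidth"]) if fitting \
        else min(variants, key=lambda v: v["bandwidth"])

variants = parse_master(MASTER)
print(pick_variant(variants, 3_000_000)["uri"])   # 720p/playlist.m3u8
```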

Step 5: Thumbnail Sprites

While transcoding happens, we also generate:

  • Poster image — the thumbnail shown before play
  • Sprite sheet — a grid of thumbnail images (one every 5 seconds) used for hover preview on the seek bar
  • WebVTT file — maps timestamps to sprite positions

This is how YouTube shows you a preview when you hover over the progress bar. We generate the same thing automatically.
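A sketch of generating such a WebVTT file. The grid layout (10 columns, 160×90 tiles, one tile per 5 seconds) and the `sprites.jpg` filename are assumptions; the `#xywh=` media-fragment syntax is the standard way players address a region of a sprite sheet:

```python
def sprite_vtt(duration_s: int, interval: int = 5,
               cols: int = 10, w: int = 160, h: int = 90) -> str:
    """Map each interval of the video to a tile in the sprite sheet."""
    def ts(t):  # seconds -> HH:MM:SS.mmm
        return f"{t // 3600:02d}:{t % 3600 // 60:02d}:{t % 60:02d}.000"
    cues = ["WEBVTT", ""]
    for i, start in enumerate(range(0, duration_s, interval)):
        end = min(start + interval, duration_s)
        x, y = (i % cols) * w, (i // cols) * h   # tile position in the grid
        cues += [f"{ts(start)} --> {ts(end)}",
                 f"sprites.jpg#xywh={x},{y},{w},{h}", ""]
    return "\n".join(cues)

vtt = sprite_vtt(15)  # tiny 15-second example
```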


Adaptive Bitrate: The Magic

The most impressive part of HLS is Adaptive Bitrate Streaming (ABR). Here's how it works in real-time:

  Bandwidth
  (Mbps)
    15 │  ████                              ████████
       │  ████ ← 1080p                     ████████
    10 │  ████                              ████████
       │  ████                    ▲         ████████
     5 │  ████                    │         ████████
       │  ████    ██████████ ← 720p        ████████
   2.5 │  ████    ██████████              ████████
       │  ████    ██████████              ████████
     1 │  ████    ██████████  ▓▓▓▓ ← 480p ████████
       │  ████    ██████████  ▓▓▓▓        ████████
   0.5 │──████────██████████──▓▓▓▓────────████████──
       │  WiFi    4G          Tunnel      4G/WiFi
       └────────────────────────────────────────────▶ Time

  ✓ No buffering    ✓ No interruption    ✓ Seamless switches

Student starts watching a lecture on WiFi
├── Player detects: 15 Mbps bandwidth
├── Selects: 1080p (needs 5 Mbps) ✓
├── Downloads segment_000.ts (1080p)
├── Downloads segment_001.ts (1080p)
│
├── Student walks to the bus, switches to 4G
├── Player detects: 3 Mbps bandwidth
├── Switches to: 720p (needs 2.5 Mbps) ✓
├── Downloads segment_042.ts (720p)  ← seamless switch!
│
├── Bus enters a tunnel, drops to 500 Kbps
├── Switches to: 480p (needs 1 Mbps)
├── Buffer running low...
├── Downloads segment_089.ts (480p)
│
├── Bus exits tunnel, back to 4G
├── Gradually upgrades back to 720p
└── Then back to 1080p when stable

The viewer never notices. There's no buffering spinner. No interruption. The quality adjusts automatically and seamlessly. The video just keeps playing.

This is why you sometimes notice a YouTube video gets blurry for a few seconds when your connection dips, then sharpens again. That's ABR at work.
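The bus-ride scenario above reduces to a small selection loop. This sketch is deliberately naive (real players also weigh buffer occupancy and apply switch hysteresis), but it captures the core rule:

```python
# Toy throughput-based ABR: pick the best rendition that fits the
# measured bandwidth; if nothing fits, fall back to the lowest.
RENDITIONS = {"1080p": 5_000_000, "720p": 2_500_000, "480p": 1_000_000}

def choose(measured_bps: int) -> str:
    fitting = {name: bps for name, bps in RENDITIONS.items() if bps <= measured_bps}
    return max(fitting, key=fitting.get) if fitting else "480p"

journey = [15_000_000, 3_000_000, 500_000, 8_000_000]  # WiFi, 4G, tunnel, 4G
print([choose(bw) for bw in journey])  # ['1080p', '720p', '480p', '1080p']
```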


Why Regular MP4 Can't Do This

Let's compare what happens with a regular MP4 file vs HLS:

Feature             Raw MP4                                   HLS
Start time          Must download beginning of file first     Plays within 1-2 seconds
Seeking             May need to re-download from seek point   Instant — just fetch that segment
Bad connection      Endless buffering spinner                 Drops to lower quality, keeps playing
Mobile data         Wastes bandwidth on full quality          Only downloads what's needed
Multiple viewers    Each re-downloads the full file           Segments are cached at CDN edge
Content protection  Anyone can download the file              Segments can be encrypted (AES-128)
Resume              Often restarts from beginning             Picks up from exact segment

The CDN Layer: Global Distribution

Once the segments are generated, they need to be distributed globally. This is where a CDN (Content Delivery Network) comes in.

                    ┌─────────────────────┐
                    │   ORIGIN SERVER     │
                    │   (Cloudflare R2)   │
                    │   All 1,260 segments│
                    └────────┬────────────┘
                             │
              ┌──────────────┼──────────────┐
              ▼              ▼              ▼
     ┌────────────┐  ┌────────────┐  ┌────────────┐
     │  CDN Edge  │  │  CDN Edge  │  │  CDN Edge  │
     │  Nairobi   │  │  Lagos     │  │  Cape Town │
      │  ~20ms     │  │  ~20ms     │  │  ~20ms     │
     └─────┬──────┘  └─────┬──────┘  └─────┬──────┘
           │               │               │
     ┌─────┴─────┐   ┌─────┴─────┐   ┌─────┴─────┐
     │ 🎓 Student│   │ 🎓 Student│   │ 🎓 Student│
     │ Kampala   │   │ Accra     │   │ Joburg    │
     └───────────┘   └───────────┘   └───────────┘

  First viewer:  Origin → Edge → Viewer  (~200ms)
  Next viewers:  Edge → Viewer (cached)  (~20ms)

When a student in Kampala watches a video:

  1. Request goes to the nearest CDN edge (e.g., Nairobi)
  2. If the segment is cached there → served instantly (~20ms)
  3. If not cached → fetched from origin (e.g., Cloudflare R2 in Europe), cached for next viewer (~200ms first time)

After the first viewer in a region watches a segment, every subsequent viewer gets it from cache. This is why Netflix can serve 250 million subscribers without melting.
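The first-viewer/next-viewer behaviour can be modelled with a plain dictionary standing in for the edge cache; the latencies are the illustrative figures from the diagram:

```python
# Minimal model of an edge cache: the first request for a segment pays
# the origin round-trip, every later request is served from the edge.
ORIGIN_MS, EDGE_MS = 200, 20

cache: dict[str, bytes] = {}

def fetch(segment: str) -> int:
    """Return simulated latency in ms for one viewer request."""
    if segment in cache:
        return EDGE_MS                 # cache hit at the edge
    cache[segment] = b"..."            # pull from origin, then cache
    return ORIGIN_MS

latencies = [fetch("videos/42/1080p/segment_000.ts") for _ in range(3)]
print(latencies)  # [200, 20, 20]
```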

MediaKit supports multiple storage backends:

  • Cloudflare R2 — zero egress fees, global distribution
  • AWS S3 — the industry standard
  • Backblaze B2 — cheapest per-GB storage
  • MinIO — self-hosted, full control

AI-Powered Video Intelligence

Modern video platforms don't just stream — they understand the content. MediaKit integrates AI capabilities:

Auto-Generated Chapters

AI watches the video and identifies topic transitions:

00:00 - Introduction
03:42 - Setting up the development environment
08:15 - Understanding REST APIs
14:30 - Building your first endpoint
22:07 - Error handling patterns
31:45 - Testing and deployment

This is incredibly valuable for educational content. Students can jump directly to the topic they need.

Content Moderation

AI scans uploaded videos for:

  • Inappropriate content
  • Copyright violations
  • Quality issues (extremely low resolution, no audio)

Auto Transcription & Captions

AI generates subtitles automatically, making content accessible to:

  • Deaf and hard-of-hearing viewers
  • Non-native speakers
  • People watching in noisy environments
  • Search engines (subtitles are indexable text)

The Cost Problem: Why We Built MediaKit

Let's talk about the elephant in the room. Video infrastructure is expensive.

What Mux Charges

Mux is the leading video API provider. Their pricing:

Service           Cost
Video encoding    $0.015 per minute
Video storage     $0.007 per minute per month
Video streaming   $0.00075 per minute delivered

Sounds cheap? Let's do the math for a course platform with 100 courses, 10 hours of content each, and 1,000 monthly active students watching 2 hours/month:

Encoding (one-time): 60,000 min × $0.015 = $900
Storage (monthly):   60,000 min × $0.007 = $420/month
Streaming (monthly): 120,000 min × $0.00075 = $90/month
───────────────────────────────────────────────────
Monthly cost: $510/month = $6,120/year
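Reproducing that estimate with the per-minute prices from the table above:

```python
# Reproducing the Mux estimate (prices as listed in the table above).
catalog_min = 100 * 10 * 60       # 100 courses x 10 h each  -> 60,000 min
watched_min = 1_000 * 2 * 60      # 1,000 students x 2 h/mo  -> 120,000 min

encoding_once   = catalog_min * 0.015      # one-time encoding cost
storage_monthly = catalog_min * 0.007      # recurring storage
stream_monthly  = watched_min * 0.00075    # recurring delivery
monthly = storage_monthly + stream_monthly

print(round(encoding_once), round(monthly), round(monthly * 12))  # 900 510 6120
```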

And that's a small platform. A platform like Udemy with millions of hours? We're talking millions of dollars per year.

What MediaKit Costs

MediaKit is open-source. Self-hosted. You pay for:

Service                        Cost
Server (2 vCPU, 4GB RAM)       ~$20/month
Storage (Cloudflare R2, 1TB)   ~$15/month
CDN delivery (R2 egress)       $0 (R2 has zero egress fees)
─────────────────────────────────────────
Monthly total:                 ~$35/month = $420/year

That's roughly 93% cheaper than Mux for the same workload, and the gap only widens as your catalog and audience grow.

Feature Comparison

Feature                Mux              MediaKit
HLS Streaming          ✅               ✅
Adaptive Bitrate       ✅               ✅
Thumbnail Sprites      ✅               ✅
Player SDK             ✅ (Mux Player)  ✅ (React SDK)
AI Chapters            ❌               ✅
Content Moderation     ❌               ✅
Image Transform API    ❌               ✅
Embed Widget           ❌               ✅
Self-Hostable          ❌               ✅
Open Source            ❌               ✅
Private Assets (JWT)   ✅               ✅
Webhooks               ✅               ✅
Analytics              ✅               ✅
Price                  $500+/month      ~$35/month

How MediaKit Integrates with DGateway

┌──────────────────────────────────────────────────────────────┐
│                    COURSE CREATOR                            │
│                                                              │
│  ┌──────────────┐     ┌──────────────┐     ┌──────────────┐ │
│  │ Upload Video  │────▶│  DGateway    │────▶│  MediaKit    │ │
│  │ (Dropzone)    │     │  API Proxy   │     │  Server      │ │
│  └──────────────┘     └──────────────┘     └──────┬───────┘ │
│                                                    │         │
│                              ┌─────────────────────┘         │
│                              ▼                               │
│                      ┌──────────────┐                        │
│                      │  FFmpeg      │                        │
│                      │  Transcode   │                        │
│                      │  480p/720p/  │                        │
│                      │  1080p + HLS │                        │
│                      └──────┬───────┘                        │
│                             │                                │
│                             ▼                                │
│                      ┌──────────────┐                        │
│                      │ Cloudflare   │                        │
│                      │ R2 Storage   │                        │
│                      └──────┬───────┘                        │
└─────────────────────────────│────────────────────────────────┘
                              │
┌─────────────────────────────│────────────────────────────────┐
│                    STUDENT                                   │
│                             ▼                                │
│  ┌──────────────┐     ┌──────────────┐     ┌──────────────┐ │
│  │ PlyrPlayer   │◀────│  CDN Edge    │◀────│  HLS.js      │ │
│  │ (React SDK)  │     │  (cached)    │     │  ABR Engine  │ │
│  └──────────────┘     └──────────────┘     └──────────────┘ │
│                                                              │
│  ✓ Adaptive quality    ✓ Instant start    ✓ Seek preview    │
└──────────────────────────────────────────────────────────────┘

In DGateway, when a course creator uploads a lesson video:

  1. Upload → Video goes through DGateway API to MediaKit
  2. Transcode → MediaKit generates 480p, 720p, 1080p HLS streams
  3. Store → Segments stored in Cloudflare R2
  4. Deliver → Student opens lesson, PlyrPlayer loads HLS manifest
  5. Stream → Adaptive bitrate adjusts to student's connection
  6. Track → Analytics record play count, watch time, completion

The creator doesn't need to understand any of this. They upload a video, and it just works. Behind the scenes, there are 1,000+ files generated, a CDN distributing globally, and an adaptive algorithm running in the player.

The React SDK

For developers integrating MediaKit into their own apps, we provide a React SDK:

import { MediaKitProvider, PlyrPlayer, Dropzone, MediaImage } from '@mediakit-dev/react';
 
// Upload videos
<Dropzone
  assetType="video"
  onUploadComplete={(asset) => console.log(asset.id)}
/>
 
// Play videos with full HLS support
<PlyrPlayer videoId="42" />
 
// Optimized images with transforms
<MediaImage
  assetId={42}
  width={800}
  format="webp"
  alt="Course thumbnail"
/>

Six components. Drop-in. Full video infrastructure in your app.


The Technical Details That Matter

Why .ts Files?

HLS segments use the .ts (MPEG Transport Stream) format, not .mp4. Why?

Transport Stream was designed for broadcast television — it's built to handle packet loss, random access, and multiplexing. Each .ts segment:

  • Has its own timing information (no dependency on previous segments)
  • Includes error correction data
  • Can be decoded independently
  • Handles audio/video sync within each segment

This means if a segment is corrupted or lost, only 6 seconds of video is affected. The player can skip to the next segment and continue.

Codec: H.264 vs H.265 vs AV1

The video codec determines compression efficiency:

Codec          Compression              Browser Support        CPU Usage
H.264          Baseline                 Universal (100%)       Low
H.265 (HEVC)   ~50% better than H.264   Safari, some Android   Medium
AV1            ~30% better than H.265   Chrome, Firefox        High

MediaKit uses H.264 for maximum compatibility. Every browser, every device, every platform supports it. We'll add AV1 as browser support reaches critical mass.

Keyframe Intervals

A video is made up of different frame types:

Frame Types in a Video Stream:

  I ─── P ─── P ─── B ─── P ─── P ─── I ─── P ─── P ─── B ─── P ─── I
  │                                   │                             │
  └──────── Segment 1 (6 sec) ────────┴───── Segment 2 (6 sec) ─────┘

  I-frame (Keyframe):  Full image — self-contained, can be decoded alone
                       ┌─────────────┐
                       │ ████████████│  Complete picture
                       │ ████████████│  ~100 KB per frame
                       └─────────────┘

  P-frame (Predicted):  Only the DIFFERENCES from previous frame
                       ┌─────────────┐
                       │     ██      │  Just the changes
                       │        █   │  ~15 KB per frame
                       └─────────────┘

  B-frame (Bi-directional): Differences from BOTH previous AND next
                       ┌─────────────┐
                       │   ░         │  Smallest size
                       │          ░  │  ~8 KB per frame
                       └─────────────┘

For HLS, keyframes must align with segment boundaries. If your segments are 6 seconds and your video is 30fps, you need a keyframe every 180 frames. MediaKit enforces this during transcoding:

# -g 180            keyframe every 180 frames (6 s at 30 fps)
# -keyint_min 180   minimum keyframe interval
# -sc_threshold 0   disable scene-change keyframes
# -hls_time 6       6-second segments
ffmpeg -i input.mp4 \
  -g 180 -keyint_min 180 -sc_threshold 0 \
  -hls_time 6 -hls_segment_type mpegts \
  output.m3u8

If keyframes don't align, seeking becomes janky — the player has to decode from the last keyframe to reach the seek point.


Content Protection: DRM & Signed URLs

For paid courses, you don't want people downloading your videos. MediaKit provides two levels of protection:

                    WITHOUT PROTECTION
                    ──────────────────
  Anyone with URL ──▶ CDN ──▶ Video segments ──▶ 🏴‍☠️ Piracy

                    WITH JWT-SIGNED URLs
                    ────────────────────
  Student logs in ──▶ DGateway ──▶ Signed URL (expires in 4h)
                                      │
                                      ▼
                              CDN checks token
                              ├── Valid?  ──▶ ✅ Stream video
                              └── Expired? ──▶ ❌ 403 Forbidden

                    WITH AES-128 ENCRYPTION
                    ────────────────────────
  .ts segments are encrypted on disk
  ┌─────────────┐     ┌─────────────┐
  │ 🔒 seg_001  │     │ 🔒 seg_002  │  ← Unplayable without key
  └──────┬──────┘     └──────┬──────┘
         │                   │
         ▼                   ▼
  ┌──────────────────────────────┐
  │  Key Server (authenticated)  │  ← Only logged-in students
  │  Returns AES decryption key  │     can get the key
  └──────────────────────────────┘

1. JWT-Signed URLs

Each video URL includes a signed token:

https://cdn.example.com/videos/42/master.m3u8?token=eyJhbGciOiJI...

The token expires after a configurable duration (default: 4 hours). Without a valid token, the CDN returns 403. This prevents URL sharing.
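A sketch of the signing scheme. MediaKit's actual tokens are JWTs; this stdlib-HMAC version (with an invented secret and parameter names) shows the same expiry-plus-signature idea:

```python
import hashlib
import hmac
import time

SECRET = b"change-me"  # illustrative; a real deployment uses a managed secret

def sign_url(path: str, ttl_s: int = 4 * 3600) -> str:
    """Append an expiry timestamp and an HMAC over path + expiry."""
    exp = int(time.time() + ttl_s)
    sig = hmac.new(SECRET, f"{path}:{exp}".encode(), hashlib.sha256).hexdigest()
    return f"{path}?exp={exp}&sig={sig}"

def verify(url: str) -> bool:
    """Reject tampered paths, forged signatures, and expired links."""
    path, query = url.split("?", 1)
    params = dict(kv.split("=") for kv in query.split("&"))
    exp = int(params["exp"])
    expected = hmac.new(SECRET, f"{path}:{exp}".encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, params["sig"]) and time.time() < exp

url = sign_url("/videos/42/master.m3u8")
print(verify(url))  # True
```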

2. AES-128 Encryption (Coming Soon)

Each segment is encrypted with AES-128. The decryption key is served from a separate endpoint that requires authentication. Even if someone downloads the .ts files, they're encrypted and unplayable.
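For context, this is what the playlist side of AES-128 protection looks like: the standard `#EXT-X-KEY` tag from the HLS spec (RFC 8216) points the player at the key endpoint. The key-server URL here is made up:

```python
import os

def encrypted_playlist_header(key_url: str) -> str:
    """Build the header of an AES-128 protected media playlist."""
    iv = os.urandom(16).hex()  # per-video initialization vector
    return "\n".join([
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        "#EXT-X-TARGETDURATION:6",
        f'#EXT-X-KEY:METHOD=AES-128,URI="{key_url}",IV=0x{iv}',
    ])

header = encrypted_playlist_header("https://api.example.com/keys/42")
```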


Why Open Source Matters

We made MediaKit open source (GitHub) because:

  1. Transparency — You can audit exactly how your videos are processed
  2. No vendor lock-in — Your videos, your infrastructure, forever
  3. Community — Bug fixes and features from developers worldwide
  4. Customization — Fork it, modify it, make it yours
  5. Cost — No per-minute fees, no surprise bills, no limits

The video infrastructure market is dominated by closed-source, expensive APIs. Mux, Cloudinary, Wistia — they're all great products, but they're all proprietary. If they raise prices, change terms, or shut down, you're stuck.

MediaKit gives you the same capabilities with full ownership. Built with Go and React. Deploy it on a $20 VPS or a Kubernetes cluster. Your choice.


What's Next

We're actively building:

  • Live streaming — RTMP ingest → HLS output for live classes
  • AV1 encoding — 30% smaller files, same quality
  • AI auto-captions — Real-time transcription during upload
  • Multi-CDN — Automatic failover between CDN providers
  • Video analytics v2 — Heatmaps, engagement scoring, drop-off analysis

If you're building a course platform, a video-heavy app, or any product that needs video — check out MediaKit. Star it, fork it, build with it.

And if you want a complete platform with payments, courses, and video streaming all integrated — DGateway is ready for you.


This post is part of our engineering series where we break down the technology behind DGateway. Follow us for more deep dives into payments, streaming, and building for Africa.