# Video Rendering

After your video script and timeline have been generated, the next step is rendering: transforming your composition into a final MP4 video file.
## Generation vs. Rendering
It's important to understand the distinction between these two phases:
### Generation
- What it creates: Script, timeline, and audio assets
- AI-powered: Uses LLMs to write content, structure scenes, and synthesize speech
- Output: JSON files (script.json, timeline.json) and audio file (WAV)
- Speed: Fast (typically 10-30 seconds)
- Cost: API calls to AI services
### Rendering
- What it creates: Final MP4 video file
- Deterministic process: Captures browser frames and encodes video
- Output: MP4 file ready for playback
- Speed: Slower (1-5 minutes depending on video length)
- Cost: Compute resources (CPU + RAM)
Think of it this way: Generation is the creative phase, where AI designs your video. Rendering is the production phase, where that design gets turned into pixels.
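As a concrete illustration, the two JSON artifacts handed from generation to rendering can be loaded with a few lines of Python. This is a sketch only: `load_composition` is an illustrative helper, not part of Babulus, and the schema inside the files is not shown here.

```python
import json
from pathlib import Path

def load_composition(comp_dir: str) -> tuple[dict, dict]:
    """Read the generation-phase artifacts that rendering consumes.

    Illustrative helper only; the actual keys inside script.json and
    timeline.json follow Babulus's own schema and are not shown here.
    """
    comp = Path(comp_dir)
    script = json.loads((comp / "script.json").read_text())
    timeline = json.loads((comp / "timeline.json").read_text())
    return script, timeline
```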
## Why Separate Generation and Rendering?
This separation provides several benefits:
- Iteration without regeneration - Tweak styles, timing, or layout without calling AI APIs again
- Multiple formats from one generation - Render different aspect ratios or quality levels
- Cost efficiency - Regenerate only when content changes, not for visual updates
- Parallel processing - Generate many videos quickly, render them later in batch
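For example, the "multiple formats" benefit means one set of artifacts can drive several renders. A minimal sketch, where the format list and the `plan_renders` helper are hypothetical rather than Babulus API:

```python
# One generation, many renders: the same script.json/timeline.json can be
# rendered at several aspect ratios without another AI API call.
# FORMATS and plan_renders are illustrative, not part of Babulus.
FORMATS = [
    {"name": "landscape", "width": 1920, "height": 1080},  # standard widescreen
    {"name": "portrait", "width": 1080, "height": 1920},   # vertical/mobile
]

def plan_renders(comp_dir: str) -> list[str]:
    """List one output path per target format for a single composition."""
    return [
        f"{comp_dir}/{f['name']}_{f['width']}x{f['height']}.mp4"
        for f in FORMATS
    ]
```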
## Rendering Modes
Babulus provides three different rendering modes to suit different use cases:
| Mode | Environment | Speed | Setup | Cost |
|---|---|---|---|---|
| Mode 1: Local | Your machine | Fast | Easy | Free |
| Mode 2: Container | Docker locally | Medium | Moderate | Free |
| Mode 3: Cloud | AWS Fargate | Auto-scales | Complex | Pay per render |
Each mode uses the same rendering engine (Playwright + ffmpeg) but runs in a different environment. Choose based on your needs:
- Use Mode 1 for development, testing, quick iterations
- Use Mode 2 for testing containerized deploys, reproducing production issues
- Use Mode 3 for production at scale, long videos, parallel batch processing
## What Happens During Rendering?
Regardless of mode, the rendering process follows these steps:
1. Load composition - Read script.json and timeline.json
2. Launch browser - Start headless Chromium with Playwright
3. Render frames - Capture video frames at your target FPS (typically 30fps):
   - Navigate through the timeline
   - Apply animations and transitions
   - Capture each frame as a PNG
4. Encode video - Use ffmpeg to:
   - Combine the PNG frames into a video stream
   - Merge the audio track
   - Encode as H.264 MP4
5. Save output - Write the final MP4 file
For a 60-second video at 30fps, this means capturing 1,800 individual frames and encoding them into a single video file.
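The capture and encode steps can be sketched in Python. This is a simplified illustration, not Babulus's actual renderer: `capture_frames` and `build_encode_command` are hypothetical names, and a real frame-accurate renderer must also pause and advance the composition's clock between captures rather than screenshot a live page.

```python
import subprocess
from pathlib import Path

def build_encode_command(frames_dir, audio_path, out_path, fps=30):
    """Build the ffmpeg command that combines captured PNG frames
    with the audio track into an H.264 MP4 (steps 4-5 above)."""
    return [
        "ffmpeg", "-y",
        "-framerate", str(fps),                           # input frame rate
        "-i", str(Path(frames_dir) / "frame_%05d.png"),   # PNG frame sequence
        "-i", str(audio_path),                            # generated WAV narration
        "-c:v", "libx264",                                # H.264 video codec
        "-pix_fmt", "yuv420p",                            # broad player compatibility
        "-c:a", "aac",                                    # re-encode WAV audio to AAC
        "-shortest",                                      # stop at the shorter stream
        str(out_path),
    ]

def encode(frames_dir, audio_path, out_path, fps=30):
    """Run ffmpeg; raises CalledProcessError if encoding fails."""
    subprocess.run(build_encode_command(frames_dir, audio_path, out_path, fps),
                   check=True)

def capture_frames(url, frames_dir, duration_s, fps=30):
    """Capture one screenshot per frame with Playwright (step 3 above).
    Requires `pip install playwright` and `playwright install chromium`.
    For a 60s video at 30fps this loop runs 1,800 times."""
    from playwright.sync_api import sync_playwright
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page(viewport={"width": 1920, "height": 1080})
        page.goto(url)
        for frame in range(duration_s * fps):
            # A real renderer would advance the composition's clock here
            # (e.g. via page.evaluate) so each capture is frame-exact.
            page.screenshot(path=f"{frames_dir}/frame_{frame:05d}.png")
        browser.close()
```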
## Performance Characteristics
Local rendering (Mode 1):
- 60-second video: ~2-3 minutes
- Dependent on your machine's CPU/RAM
- No network latency
Container rendering (Mode 2):
- 60-second video: ~3-4 minutes
- Container startup overhead (~30 seconds)
- Consistent performance regardless of host
Cloud rendering (Mode 3):
- 60-second video: ~4-6 minutes
- Task provisioning overhead (~2 minutes)
- Scales to handle multiple renders in parallel
- No local resource consumption
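These figures fold into a rough back-of-the-envelope model: a fixed startup overhead per mode plus time proportional to video length. The constants below are derived from the estimates above and are illustrative only.

```python
# Approximate per-mode timing, derived from the estimates above.
# Illustrative constants only; real render time varies with hardware,
# composition complexity, and resolution.
STARTUP_OVERHEAD_S = {"local": 0, "container": 30, "cloud": 120}
SECONDS_PER_VIDEO_SECOND = {"local": 2.5, "container": 3.0, "cloud": 3.0}

def estimate_render_time_s(mode: str, video_length_s: float) -> float:
    """Fixed startup overhead plus time proportional to video length."""
    return STARTUP_OVERHEAD_S[mode] + video_length_s * SECONDS_PER_VIDEO_SECOND[mode]
```

Under this model a 60-second local render comes out at about 2.5 minutes, which sits inside the range quoted above.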
## Next Steps
- Mode 1: Local Rendering - Start here for development
- Mode 2: Container Rendering - Test production setup locally
- Mode 3: Cloud Rendering - Deploy to production
## Technical Details
Want to understand what's happening under the hood?
- Video Encoding Guide - H.264, bitrates, quality settings
- Frame Capture - How Playwright captures browser output
- Performance Tuning - Optimize render speed