VideoML Glossary
Comprehensive reference of VideoML and Babulus terminology. All definitions use plain language first, with technical details available for developers.
A
Aspect Ratio
The proportional relationship between width and height of a video. Common ratios: 16:9 (widescreen, 1920×1080), 9:16 (vertical/mobile, 1080×1920), 1:1 (square, 1080×1080), 4:3 (traditional TV, 1440×1080).
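As a quick check of the ratios listed above, a resolution reduces to its aspect ratio when width and height are divided by their greatest common divisor. This is a minimal sketch; `aspectRatio` is an illustrative helper name, not part of any Babulus API.

```javascript
// Reduce a pixel resolution to its simplest width:height ratio.
// `aspectRatio` and `gcd` are hypothetical helpers, not Babulus APIs.
function gcd(a, b) {
  return b === 0 ? a : gcd(b, a % b);
}

function aspectRatio(width, height) {
  const divisor = gcd(width, height);
  return `${width / divisor}:${height / divisor}`;
}

console.log(aspectRatio(1920, 1080)); // 16:9
console.log(aspectRatio(1440, 1080)); // 4:3
```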
Agent
An AI system or automation script that can perform tasks like drafting video scripts, generating variations, or proposing changes. Agents handle repetitive work while humans focus on approvals and creative direction.
Technical: Future feature. Currently, 'agent' refers to planned AI assistance capabilities.
C
Cue
A narration or dialogue segment within a scene. Each cue contains text that gets converted to speech using text-to-speech (TTS) providers. The duration of the generated audio determines how long the scene lasts. Think of cues as the 'script lines' for your voiceover.
Example:
<cue id="intro-1">Welcome to our product tour.</cue>
<cue id="intro-2">Let me show you the key features.</cue>
Component
A reusable UI building block for video content. Components include titles, subtitles, progress bars, lower thirds, code blocks, and more. They're similar to React components but designed specifically for video timelines.
Technical: Components are implemented as Web Components (custom HTML elements with hyphenated names).
Example:
<title-slide title="Welcome" subtitle="Let's get started" />
<lower-third name="Jane Doe" title="CEO" />
<progress-bar position="bottom" />
D
Duration
The length of time an element plays, specified in seconds (s) or frames (f). Duration can be set explicitly (duration='5s') or calculated automatically from child elements or generated audio.
Example:
<!-- Explicit duration -->
<scene duration="5s">...</scene>
<!-- Auto-calculated from audio -->
<scene>
<cue>This text will be converted to audio, and the scene duration will match the audio length.</cue>
</scene>
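The two units mentioned above (seconds and frames) can be normalized to seconds once the frame rate is known. The sketch below assumes values shaped like `5s` or `150f`; `parseDuration` is a hypothetical helper, and the actual VideoML parser may accept other forms.

```javascript
// Convert a duration string such as "5s" or "150f" into seconds.
// `parseDuration` is a hypothetical helper, not the Babulus parser.
function parseDuration(value, fps) {
  const match = /^(\d+(?:\.\d+)?)(s|f)$/.exec(value.trim());
  if (!match) throw new Error(`Unrecognized duration: ${value}`);
  const amount = Number(match[1]);
  // Seconds pass through; frames are divided by the frame rate.
  return match[2] === 's' ? amount : amount / fps;
}

console.log(parseDuration('5s', 30));   // 5
console.log(parseDuration('150f', 30)); // 5
```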
Deterministic
Producing the same output every time given the same inputs. Babulus videos are deterministic: the same VideoML script always generates the same video. This makes approvals meaningful and iterations predictable.
Technical: Determinism is key for production workflows. You can approve a preview knowing the final render will match exactly.
Data Binding
Connecting component properties to dynamic data from your scene, timeline, or JavaScript context. Allows components to automatically update based on playback state or custom data.
Example:
<!-- Bind title to scene data -->
<title-slide title="{{ scene.title }}" />
<!-- Bind progress to timeline -->
<progress-bar value="{{ timeline.time / timeline.duration }}" />
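To illustrate how a `{{ ... }}` placeholder might be resolved, here is a rough sketch that looks up dotted property paths in a context object. It is an assumption about the mechanism, not the Babulus implementation, and it deliberately handles only simple paths; an expression such as `timeline.time / timeline.duration` would need a real expression evaluator and is left untouched here.

```javascript
// Resolve "{{ dotted.path }}" placeholders against a context object.
// Assumption: only plain property paths are supported in this sketch.
function resolveBindings(template, context) {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (whole, path) => {
    const value = path
      .split('.')
      .reduce((obj, key) => (obj == null ? undefined : obj[key]), context);
    // Leave the placeholder in place when the path does not resolve.
    return value === undefined ? whole : String(value);
  });
}

const bindingContext = { scene: { title: 'Welcome' } };
console.log(resolveBindings('{{ scene.title }}', bindingContext)); // Welcome
```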
E
Encoding
The final step of rendering, in which individual PNG frames are combined with audio and compressed into an MP4 video file using ffmpeg. The H.264 codec is used for broad compatibility.
Technical: Uses ffmpeg command-line tool. Typical settings: H.264 video codec, AAC audio codec, MP4 container format.
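The typical settings above map onto a small set of standard ffmpeg options. The sketch below only builds the argument list; the file names, frame-number pattern, and the `buildEncodeArgs` helper are assumptions for illustration, and the options Babulus actually passes may differ.

```javascript
// Build an ffmpeg argument list for PNG frames + audio -> MP4 (H.264/AAC).
// Paths and the helper name are hypothetical.
function buildEncodeArgs({ fps, framePattern, audioPath, outputPath }) {
  return [
    '-framerate', String(fps), // input frame rate of the image sequence
    '-i', framePattern,        // numbered PNG frames, e.g. frame_%05d.png
    '-i', audioPath,           // audio track to mux in
    '-c:v', 'libx264',         // H.264 video codec
    '-pix_fmt', 'yuv420p',     // pixel format most players can decode
    '-c:a', 'aac',             // AAC audio codec
    '-shortest',               // stop at the shorter of video and audio
    outputPath,
  ];
}

const encodeArgs = buildEncodeArgs({
  fps: 30,
  framePattern: 'frames/frame_%05d.png',
  audioPath: 'audio/narration.wav',
  outputPath: 'out.mp4',
});
console.log(['ffmpeg', ...encodeArgs].join(' '));
```

The resulting array can be passed to a process launcher such as Node's `child_process.spawn('ffmpeg', encodeArgs)`.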
F
FPS
Frames Per Second. The number of still images (frames) shown per second of video. Standard values are 24fps (film), 30fps (TV/web), or 60fps (high frame rate). Higher FPS means smoother motion but larger file sizes.
Technical: Set with the fps attribute on the root <vml> element.
Example:
<vml fps="30" width="1920" height="1080">
<!-- 30 frames per second -->
</vml>
G
Generation
The first phase of video creation where Babulus processes your script to create timeline data, generate TTS audio, and prepare assets. Generation is fast (10-30 seconds) and produces JSON files plus audio. This is separate from rendering (which creates the final video).
Technical: Generation creates: script.json (scene structure), timeline.json (timing data), and audio files (TTS output).
L
Layer
A visual container within a scene that holds components or content. Layers stack on top of each other (like Photoshop layers) and can have independent timing within a scene.
Technical: Layers use CSS z-index for stacking order. Higher z-index values appear on top.
Live Mode
A real-time editing mode where videos play continuously without a fixed end time. Useful for live presentations or interactive editing. In Live Mode, you can add scenes on-the-fly, and the video extends indefinitely. When you 'cut' to the next scene, the previous scene's duration is sealed. Contrast with Export Mode, where all durations must be set before rendering.
Technical: Live Mode uses unbounded timelines. Scenes can have open-ended durations (no explicit end time) until a cut event occurs.
Layout
A full-frame structural template that defines regions for content. Layouts include title screens, two-column splits, three-column grids, and content screens. They provide consistent structure across videos.
Example:
<content-screen
title="Feature Overview"
subtitle="Key Capabilities"
>
<!-- Content goes here -->
</content-screen>
P
Project Storage
Cloud-based file storage system for Babulus projects. Files are stored in S3, metadata in DynamoDB, and served via CloudFront CDN. Provides version control and multi-tenant isolation.
Technical: Uses AWS S3 for files, DynamoDB for metadata, CloudFront for CDN.
R
Rendering
The process of converting a VideoML document and generated assets (audio, images) into a final MP4 video file. Rendering captures browser frames at your target FPS and encodes them with ffmpeg.
Technical: Three rendering modes: Local (your machine), Container (Docker locally), Cloud (AWS Fargate). Local is fastest for development; cloud scales for production.
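Capturing frames at a target FPS comes down to simple arithmetic: a video of a given length needs duration × fps frames, each taken at a fixed timestamp. The sketch below shows only that arithmetic; `frameTimestamps` is a hypothetical helper, and the actual capture mechanism is not described here.

```javascript
// List the timestamp (in seconds) at which each frame would be captured.
// `frameTimestamps` is a hypothetical helper for illustration.
function frameTimestamps(durationSeconds, fps) {
  const totalFrames = Math.round(durationSeconds * fps);
  return Array.from({ length: totalFrames }, (_, frame) => frame / fps);
}

const stamps = frameTimestamps(2, 30);
console.log(stamps.length); // 60
console.log(stamps[30]);    // 1
```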
Recording
The process of capturing a live or interactive video session into a fixed-duration VideoML document. In Live Mode, recording seals all open-ended scene durations and creates a deterministic timeline that can be rendered.
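The sealing behavior described for Live Mode and Recording can be pictured with a small model: each cut fixes the previous scene's duration, and recording fixes whatever is still open. This is a sketch under assumed names (`LiveTimelineSketch`, `cut`, `record`); it is not the Babulus API.

```javascript
// Sketch of sealing open-ended scene durations.
// Class and method names are hypothetical.
class LiveTimelineSketch {
  constructor() {
    // Each entry: { id, start, duration }; duration is null while open.
    this.scenes = [];
  }

  // Seal the open scene (if any) so its duration becomes fixed.
  seal(time) {
    const last = this.scenes[this.scenes.length - 1];
    if (last && last.duration === null) last.duration = time - last.start;
  }

  // Cut to a new scene at `time` seconds, sealing the previous one.
  cut(id, time) {
    this.seal(time);
    this.scenes.push({ id, start: time, duration: null });
  }

  // Recording: seal what is still open and return a fixed timeline copy.
  record(endTime) {
    this.seal(endTime);
    return this.scenes.map((scene) => ({ ...scene }));
  }
}

const live = new LiveTimelineSketch();
live.cut('intro', 0);
live.cut('demo', 4);
const recorded = live.record(10);
console.log(recorded);
// [ { id: 'intro', start: 0, duration: 4 }, { id: 'demo', start: 4, duration: 6 } ]
```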
S
Scene
A distinct section of video with its own content and timing. Scenes are the building blocks of videos, similar to slides in a presentation or chapters in a book. Each scene can contain visual layers, audio, and components.
Example:
<scene id="intro" duration="5s">
<layer>
<title-slide title="Welcome" />
</layer>
</scene>
Sequence
A container that plays its children back-to-back in order. If you have three 5-second scenes in a sequence, the total duration is 15 seconds. Each child starts when the previous one ends.
Example:
<sequence>
<scene duration="5s">Scene 1</scene> <!-- 0-5s -->
<scene duration="3s">Scene 2</scene> <!-- 5-8s -->
<scene duration="2s">Scene 3</scene> <!-- 8-10s -->
</sequence>
<!-- Total: 10 seconds -->
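The start and end times in the comments above follow from a running sum over the child durations. A minimal sketch, with `sequenceSpans` as an illustrative helper name:

```javascript
// Compute the start/end time of each child in a sequence (in seconds).
// `sequenceSpans` is a hypothetical helper, not a Babulus API.
function sequenceSpans(durations) {
  let cursor = 0;
  return durations.map((duration) => {
    const span = { start: cursor, end: cursor + duration };
    cursor += duration; // the next child starts where this one ends
    return span;
  });
}

console.log(sequenceSpans([5, 3, 2]));
// [ { start: 0, end: 5 }, { start: 5, end: 8 }, { start: 8, end: 10 } ]
```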
Stack
A container that plays its children simultaneously (in parallel). The total duration equals the longest child. Use stacks to combine audio narration with visuals, or to overlay multiple visual layers.
Example:
<stack>
<!-- Visual layer: 10s -->
<layer duration="10s">
<content-screen />
</layer>
<!-- Audio track: 10s -->
<audio src="narration.wav" duration="10s" />
</stack>
<!-- Total: 10 seconds (longest child) -->
T
Temporal Layout
The automatic calculation of video durations based on content length. When you add a 3-second scene to a video, the total duration grows by 3 seconds—just like adding a paragraph makes a webpage taller. You don't manually calculate total duration; VideoML handles it automatically.
Technical: Implementation uses a reflow algorithm similar to CSS box model layout, but applied to the time dimension instead of spatial dimensions.
Example:
<sequence>
<!-- This sequence automatically becomes 5s total -->
<scene duration="3s">First scene</scene>
<scene duration="2s">Second scene</scene>
</sequence>
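One way to picture the reflow is as a recursive pass over the document tree: sequences sum their children, stacks take the longest child. The sketch below assumes a simplified node shape (`{ type, duration, children }`) and assumes that an explicit duration takes precedence over computed values; neither is confirmed by the VideoML specification.

```javascript
// Recursively compute the duration (in seconds) of a sequence/stack tree.
// Node shape and precedence rule are assumptions for illustration.
function layoutDuration(node) {
  if (node.duration != null) return node.duration; // explicit duration wins
  const durations = (node.children ?? []).map(layoutDuration);
  if (durations.length === 0) return 0;
  if (node.type === 'stack') return Math.max(...durations); // parallel
  return durations.reduce((sum, d) => sum + d, 0);           // back-to-back
}

const tree = {
  type: 'sequence',
  children: [
    { type: 'scene', duration: 3 },
    {
      type: 'stack',
      children: [
        { type: 'layer', duration: 2 },
        { type: 'audio', duration: 4 },
      ],
    },
  ],
};
console.log(layoutDuration(tree)); // 7
```

Adding another child to the sequence changes the total without any manual recalculation, which is the behavior the entry describes.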
Timeline API
JavaScript API for accessing and controlling video playback. Provides access to current frame number, playback time, and frame rate. Available globally as window.timeline.
Example:
// Access current playback state
window.timeline.frame // Current frame (0-indexed)
window.timeline.time // Current time in seconds
window.timeline.fps // Frame rate
// Listen for timeline events
window.addEventListener('timeline:tick', (e) => {
console.log('Frame:', e.detail.frame);
});
TTS
Text-to-Speech. Converts written text into spoken audio using AI voice synthesis. Babulus supports multiple TTS providers with different quality/cost profiles: ElevenLabs (high quality, ~$1/1000 chars), AWS Polly (good quality, ~$4/1M chars), Azure Speech (good quality, ~$16/1M chars). Generated audio determines scene duration in narration-driven videos.
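To compare providers for a given script, the approximate rates quoted above can be turned into a per-character cost. The sketch below uses those figures as-is; the provider keys and the `estimateTtsCost` helper are hypothetical, and real pricing varies by plan and voice.

```javascript
// Approximate USD cost per character, taken from the figures quoted above.
// Provider keys are hypothetical identifiers.
const TTS_RATES_PER_CHAR = {
  elevenlabs: 1 / 1000,
  polly: 4 / 1_000_000,
  azure: 16 / 1_000_000,
};

function estimateTtsCost(provider, text) {
  const rate = TTS_RATES_PER_CHAR[provider];
  if (rate === undefined) throw new Error(`Unknown provider: ${provider}`);
  return text.length * rate;
}

const narration = 'x'.repeat(5000); // stand-in for a 5,000-character script
console.log(estimateTtsCost('elevenlabs', narration).toFixed(2)); // 5.00
console.log(estimateTtsCost('polly', narration).toFixed(2));      // 0.02
console.log(estimateTtsCost('azure', narration).toFixed(2));      // 0.08
```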
V
VideoML
An XML-based markup language for creating videos. Similar to how HTML describes web pages, VideoML describes video scenes, timing, and components. Files use the .babulus.xml extension.
Technical: VideoML is the canonical format for Babulus projects. It can be authored directly as XML or generated from JavaScript/TypeScript DSL code.
VOM
Video Object Model. The in-memory representation of a VideoML document during playback. Just as web browsers parse HTML into a Document Object Model (DOM) for rendering, Babulus parses VideoML into a VOM for video playback. You can inspect and manipulate the VOM using JavaScript, similar to manipulating HTML with JavaScript.
Technical: The VOM is a browser DOM subtree. VideoML elements become actual DOM nodes that can be styled with CSS and controlled with JavaScript.
W
Web Component
A browser standard for creating custom HTML elements with encapsulated behavior. VideoML components are built as Web Components, allowing you to create custom video elements with hyphenated names (like <title-slide> or <progress-bar>).
Technical: Uses the Custom Elements API. No React or Vue required—works with vanilla JavaScript and DOM APIs.
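A custom element is registered with the browser's standard `customElements.define` call. The sketch below defines a made-up `<demo-badge>` element; the element name, its `label` attribute, and the `formatBadge` helper are hypothetical and not part of the Babulus component catalog. The registration is guarded so the snippet is harmless outside a browser.

```javascript
// Pure helper so the formatting logic can be exercised anywhere.
function formatBadge(label) {
  return `[${label}]`;
}

// Register the element only where the Custom Elements API exists.
if (typeof customElements !== 'undefined') {
  class DemoBadge extends HTMLElement {
    // Re-render whenever the `label` attribute changes.
    static get observedAttributes() {
      return ['label'];
    }
    connectedCallback() {
      this.render();
    }
    attributeChangedCallback() {
      this.render();
    }
    render() {
      this.textContent = formatBadge(this.getAttribute('label') ?? '');
    }
  }
  // Custom element names must contain a hyphen.
  customElements.define('demo-badge', DemoBadge);
}
```

In a browser, `<demo-badge label="New"></demo-badge>` would then display `[New]`.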
Can't find what you're looking for?
Use the documentation search (Cmd/Ctrl+K) to find terms across all pages, or check out these resources:
- VideoML Standard - Complete language reference
- Component Catalog - Visual components reference
- Live VOM - Advanced timeline features