VideoML Standard

Canonical docs now live at videoml.org/docs/standard.

Status: Draft (v0.1)

VideoML is an XML-based standard for creating videos using declarative markup. It treats time as a layout axis and uses the browser DOM as the canonical runtime. VideoML feels like web programming with a timeline.


Canonical Root Element

Every VideoML document starts with a <vml> root element. This is the canonical format—XML files are the source of truth, and all derived outputs (MP4 videos, JSON timelines) are generated artifacts.

Required Attributes

<vml
  id="intro"
  title="Introduction Video"
  fps="30"
  width="1920"
  height="1080"
>
  <!-- scenes go here -->
</vml>
  • id: Unique identifier for this video
  • title: Human-readable name
  • fps: Frames per second (typically 24, 30, or 60)
  • width / height: Video dimensions in pixels
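
As a rough illustration of the attribute list above, the following sketch checks a plain object of root attributes before a document is rendered. The helper name checkRootAttributes is hypothetical and not part of VideoML. It assumes that all five listed attributes are required and that fps, width, and height must be positive numbers.

```javascript
// Hypothetical helper (not part of VideoML): report problems with <vml> root attributes.
const REQUIRED_ROOT_ATTRIBUTES = ['id', 'title', 'fps', 'width', 'height'];
const NUMERIC_ROOT_ATTRIBUTES = ['fps', 'width', 'height'];

function checkRootAttributes(attrs) {
  const problems = [];
  for (const name of REQUIRED_ROOT_ATTRIBUTES) {
    if (attrs[name] === undefined || attrs[name] === '') {
      problems.push(`missing attribute: ${name}`);
    }
  }
  // Assumption: numeric attributes must parse to a positive, finite number.
  for (const name of NUMERIC_ROOT_ATTRIBUTES) {
    if (attrs[name] === undefined || attrs[name] === '') continue;
    const value = Number(attrs[name]);
    if (!Number.isFinite(value) || value <= 0) {
      problems.push(`${name} must be a positive number, got "${attrs[name]}"`);
    }
  }
  return problems;
}

console.log(checkRootAttributes({
  id: 'intro', title: 'Introduction Video', fps: '30', width: '1920', height: '1080',
})); // []

console.log(checkRootAttributes({ id: 'intro', fps: 'fast', width: '1920', height: '1080' }));
// [ 'missing attribute: title', 'fps must be a positive number, got "fast"' ]
```

Attribute values are passed as strings here because that is how they arrive from XML.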

File Extension

VideoML files use the .babulus.xml extension. This disambiguates them from generic XML and signals they contain time-based video markup.

Design Decision: XML was chosen over JSON because it maps naturally to HTML/DOM, supports mixed content (text + elements), and allows Web Component syntax (<title-slide />).

Temporal Layout Model

VideoML automatically calculates durations based on content, treating time like CSS treats space. Add a 5-second scene, and the video grows by 5 seconds—no manual duration math required.

Sequence (Back-to-Back)

The <sequence> element plays children one after another. Total duration equals the sum of all child durations.

<sequence>
  <scene duration="3s">Scene 1</scene>  <!-- 0-3s -->
  <scene duration="2s">Scene 2</scene>  <!-- 3-5s -->
  <scene duration="4s">Scene 3</scene>  <!-- 5-9s -->
</sequence>
<!-- Total: 9 seconds -->
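
The arithmetic behind the comments above can be sketched in a few lines. This is a minimal illustration rather than the VideoML runtime, and the function name layoutSequence is hypothetical. It places each child immediately after the previous one and returns the resulting time windows in seconds.

```javascript
// Sketch only: lay out children back-to-back and return each child's time window.
function layoutSequence(durations) {
  let cursor = 0;
  const slots = durations.map((duration) => {
    const slot = { start: cursor, end: cursor + duration };
    cursor += duration; // the next child starts where this one ends
    return slot;
  });
  return { slots, total: cursor };
}

const layout = layoutSequence([3, 2, 4]);
console.log(layout.slots);
// [ { start: 0, end: 3 }, { start: 3, end: 5 }, { start: 5, end: 9 } ]
console.log(layout.total); // 9
```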

Stack (Parallel)

The <stack> element plays children simultaneously. Total duration equals the longest child.

<stack>
  <!-- Visual layer: 10s -->
  <layer duration="10s">
    <title-slide title="Welcome" />
  </layer>

  <!-- Audio narration: 8s -->
  <audio src="narration.wav" duration="8s" />
</stack>
<!-- Total: 10 seconds (longest child) -->
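
To show how the sum rule and the longest-child rule compose when containers nest, here is a small recursive sketch. The node shape ({ type, duration, children }) and the function name durationOf are assumptions made for illustration. A real runtime would also measure audio for nodes without a duration, which this sketch treats as zero.

```javascript
// Sketch only: compute a node's duration in seconds from a hypothetical tree.
function durationOf(node) {
  // An explicit duration wins over anything computed from children.
  if (typeof node.duration === 'number') return node.duration;

  const childDurations = (node.children || []).map(durationOf);
  if (node.type === 'stack') {
    // Parallel children: the longest child sets the length.
    return childDurations.length ? Math.max(...childDurations) : 0;
  }
  // Sequential children: lengths add up.
  return childDurations.reduce((sum, d) => sum + d, 0);
}

const stack = {
  type: 'stack',
  children: [
    { type: 'layer', duration: 10 },
    { type: 'audio', duration: 8 },
  ],
};
console.log(durationOf(stack)); // 10

const sequence = { type: 'sequence', children: [{ type: 'scene', duration: 3 }, stack] };
console.log(durationOf(sequence)); // 13
```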

Automatic Duration

If you omit the duration attribute, VideoML calculates it from child elements or generated audio (TTS).

<scene>
  <cue>Welcome to the tutorial.</cue>
</scene>
<!-- Duration = length of TTS audio for "Welcome to the tutorial." -->
Tip: Narration-driven videos use automatic duration. Motion-driven videos (animations, screen recordings) set explicit durations.

Timeline API

Access current playback state via the global window.timeline object. This API is available in both Live Mode (real-time editing) and Export Mode (rendering).

Properties

| Property | Type | Description |
| --- | --- | --- |
| window.timeline.frame | number | Current frame (0-indexed) |
| window.timeline.time | number | Current time in seconds |
| window.timeline.fps | number | Frames per second |
| window.timelines | Timeline[] | Array of all timelines (for multi-player sync) |
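
The three timeline properties are related. The sketch below assumes the simplest relationship, time = frame / fps with frame 0 at time 0. The source does not state this formula, so treat it as an assumption. The helper names are hypothetical.

```javascript
// Assumption: frame 0 starts at time 0 and every frame lasts 1 / fps seconds.
function frameToTime(frame, fps) {
  return frame / fps;
}

function timeToFrame(time, fps) {
  // The small epsilon guards against floating-point products that land
  // just below a whole frame number.
  return Math.floor(time * fps + 1e-6);
}

console.log(frameToTime(45, 30)); // 1.5
console.log(timeToFrame(1.5, 30)); // 45
console.log(timeToFrame(0.7, 30)); // 21
```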

CSS Variables

VideoML sets CSS custom properties on the root element every frame. Use these in stylesheets for dynamic effects.

/* Access timeline state in CSS */
.progress-bar {
  width: calc(100% * var(--video-time) / var(--video-duration));
}

.fade-in {
  opacity: calc(var(--video-frame) / 30); /* Fade in over 30 frames */
}
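
For intuition, this is roughly what a runtime has to do each frame to keep those custom properties current. It is a sketch, not VideoML's implementation. The function name and the duration field on the state object are assumptions. Values are written as unitless numbers, which is what the calc() expressions above need in order to multiply and divide them.

```javascript
// Sketch only: write timeline state into CSS custom properties.
// `style` is anything with a setProperty(name, value) method, such as
// document.documentElement.style in a browser.
function applyTimelineVars(style, state) {
  style.setProperty('--video-frame', String(state.frame));
  style.setProperty('--video-time', String(state.time));
  style.setProperty('--video-duration', String(state.duration)); // `duration` is an assumed field
}

// A tiny stand-in for a CSSStyleDeclaration so the sketch runs outside a browser.
const fakeStyle = {
  values: new Map(),
  setProperty(name, value) {
    this.values.set(name, value);
  },
};

applyTimelineVars(fakeStyle, { frame: 45, time: 1.5, duration: 9 });
console.log(fakeStyle.values.get('--video-frame')); // 45
console.log(fakeStyle.values.get('--video-time')); // 1.5
```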

JavaScript Example

// Log playback state every frame
window.addEventListener('timeline:tick', (e) => {
  console.log("Frame " + window.timeline.frame + " at " + window.timeline.time + "s");
});

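// Calculate progress percentage
// totalFrames is not provided by the API shown here; 270 is an example value (9 s at 30 fps)
const totalFrames = 270;
const progress = (window.timeline.frame / totalFrames) * 100;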
Performance: timeline:tick fires on every frame (typically 24 to 60 times per second during real-time playback). Avoid expensive operations in tick handlers.
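
One way to keep tick handlers cheap is to run the expensive part only on every Nth frame. The wrapper below is a sketch. The name everyNFrames is hypothetical, and it assumes the event carries detail.frame as listed for timeline:tick under Event Types.

```javascript
// Sketch only: wrap a handler so it runs on frames 0, n, 2n, ... and skips the rest.
function everyNFrames(n, handler) {
  return (event) => {
    if (event.detail.frame % n === 0) handler(event);
  };
}

// Simulate two seconds of ticks at 30 fps without a browser.
let calls = 0;
const throttled = everyNFrames(30, () => { calls += 1; });
for (let frame = 0; frame < 60; frame += 1) {
  throttled({ detail: { frame, time: frame / 30, fps: 30 } });
}
console.log(calls); // 2 (frames 0 and 30)

// In a page: window.addEventListener('timeline:tick', everyNFrames(30, updateStats));
// where updateStats is your own function.
```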

Lifecycle Events

VideoML dispatches events when scenes and cues start/end. Listen to these events to trigger animations, update UI, or log analytics.

Event Types

| Event | Dispatched On | Detail |
| --- | --- | --- |
| timeline:tick | <vml>, window | { frame, time, fps } |
| scene:start | <scene>, window | { sceneId, startTime } |
| scene:end | <scene>, window | { sceneId, endTime } |
| cue:start | <cue>, window | { cueId, text } |
| cue:end | <cue>, window | { cueId } |
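
To make the scene event payloads concrete, the sketch below works out which scene events fall between two consecutive ticks. It illustrates the detail shapes listed above and is not the runtime's scheduling logic. The scene objects ({ id, start, end }), the function name, and the rule that an ending scene is reported before the scene that starts at the same instant are all assumptions.

```javascript
// Sketch only: list scene events whose time falls in (previousTime, currentTime].
function sceneEventsBetween(scenes, previousTime, currentTime) {
  const crossed = (t) => previousTime < t && t <= currentTime;
  const events = [];
  for (const scene of scenes) {
    if (crossed(scene.end)) {
      events.push({ type: 'scene:end', at: scene.end, detail: { sceneId: scene.id, endTime: scene.end } });
    }
    if (crossed(scene.start)) {
      events.push({ type: 'scene:start', at: scene.start, detail: { sceneId: scene.id, startTime: scene.start } });
    }
  }
  // Order by time; at the same instant, report ends before starts (assumed).
  const rank = (event) => (event.type === 'scene:end' ? 0 : 1);
  events.sort((a, b) => a.at - b.at || rank(a) - rank(b));
  return events.map(({ type, detail }) => ({ type, detail }));
}

const scenes = [
  { id: 'intro', start: 0, end: 3 },
  { id: 'demo', start: 3, end: 8 },
];
console.log(sceneEventsBetween(scenes, 2.9, 3.0).map((e) => `${e.type} ${e.detail.sceneId}`));
// [ 'scene:end intro', 'scene:start demo' ]
```

In a browser, each entry could then be dispatched with new CustomEvent(type, { detail, bubbles: true }) on the matching element.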

Example: Scene Transitions

<vml id="demo" fps="30" width="1920" height="1080">
  <script>
    // Log when scenes change
    window.addEventListener('scene:start', (e) => {
      console.log(`Scene started: ${e.detail.sceneId}`);
    });
  </script>

  <scene id="intro" duration="3s">
    <!-- scene content -->
  </scene>
</vml>

Inline JavaScript

VideoML supports <script> tags and on:* event handler attributes. Scripts execute when inserted into the DOM, just like HTML.

Script Blocks

<vml id="interactive" fps="30" width="1920" height="1080">
  <script>
    // Initialize state
    let clickCount = 0;

    function handleClick() {
      clickCount++;
      console.log(`Clicked ${clickCount} times`);
    }
  </script>

  <scene duration="5s">
    <button on:click="handleClick()">Click Me</button>
  </scene>
</vml>

Event Handler Attributes

Use on:* attributes to attach event handlers inline. The handler scope includes:

  • event: The DOM event object
  • target: The element that triggered the event
  • timeline: The current timeline object
  • root: The <vml> root element
<button on:click="console.log(timeline.frame)">Log Frame</button>
<input on:change="target.value = target.value.toUpperCase()" />
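
The four scope names can be pictured as function parameters. The sketch below compiles a handler string with the Function constructor and supplies event, target, timeline, and root as arguments. This is one plausible way to build such a scope and is not necessarily how VideoML does it. The name compileHandler is hypothetical.

```javascript
// Sketch only: turn an on:* attribute value into a callable handler.
// The code string runs unsandboxed, in line with treating VideoML files as trusted source.
function compileHandler(code) {
  const fn = new Function('event', 'target', 'timeline', 'root', code);
  return (event, timeline, root) => fn(event, event.target, timeline, root);
}

// Stand-ins for a DOM element and event so the sketch runs outside a browser.
const input = { value: 'hello' };
const onChange = compileHandler('target.value = target.value.toUpperCase()');
onChange({ type: 'change', target: input }, { frame: 12 }, null);
console.log(input.value); // HELLO
```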

Ignoring Subtrees

Use data-videoml-ignore="true" to exclude DOM subtrees from handler rebinding and mutation recording. This is useful for third-party widgets or performance-sensitive areas.

<div data-videoml-ignore="true">
  <!-- This subtree is ignored by VideoML runtime -->
  <iframe src="external-widget.html"></iframe>
</div>
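
A runtime that honors this attribute needs to know whether a node sits inside an ignored subtree. The sketch below walks up the parent chain to find out. The function name isIgnored is hypothetical. In a browser, element.closest('[data-videoml-ignore="true"]') performs the same check for elements.

```javascript
// Sketch only: true if the node or any ancestor has data-videoml-ignore="true".
function isIgnored(node) {
  for (let current = node; current; current = current.parentElement) {
    if (typeof current.getAttribute === 'function'
      && current.getAttribute('data-videoml-ignore') === 'true') {
      return true;
    }
  }
  return false;
}

// Minimal element stand-ins so the sketch runs outside a browser.
function fakeElement(attributes, parentElement = null) {
  return {
    parentElement,
    getAttribute: (name) => (name in attributes ? attributes[name] : null),
  };
}

const wrapper = fakeElement({ 'data-videoml-ignore': 'true' });
const iframe = fakeElement({ src: 'external-widget.html' }, wrapper);
const button = fakeElement({});

console.log(isIgnored(iframe)); // true
console.log(isIgnored(button)); // false
```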

Scenes and Layers

Scenes are the primary structural unit. Layers provide z-index stacking within scenes.

Scene Element

<scene
  id="intro"
  start="0s"
  duration="5s"
>
  <!-- scene content -->
</scene>
  • id: Unique identifier
  • start: Absolute start time (optional; defaults to sequential placement after the previous item)
  • duration: Scene length (optional if auto-calculated)

Layer Element

Layers stack visually using CSS z-index. Higher z-index appears on top.

<scene duration="8s">
  <!-- Background layer (z-index: 0) -->
  <layer style="z-index: 0">
    <background-gradient colors="blue,purple" />
  </layer>

  <!-- Content layer (z-index: 10) -->
  <layer style="z-index: 10">
    <title-slide title="Hello" />
  </layer>

  <!-- Overlay layer (z-index: 20) -->
  <layer style="z-index: 20">
    <lower-third name="Jane Doe" title="CEO" />
  </layer>
</scene>

Cues and Text-to-Speech

Cues contain narration text that gets converted to audio via TTS providers. When a scene has no explicit duration, the audio length determines the scene length.

Cue Element

<scene>
  <cue id="intro-1" voice="en-US-Neural">
    Welcome to the tutorial. Today we'll cover the basics.
  </cue>
</scene>
<!-- Scene duration = TTS audio length -->
  • id: Unique identifier for this cue
  • voice: TTS voice name (provider-specific)
  • Text content: The narration script

Multiple Cues

Multiple cues in a scene play sequentially (like a sequence).

<scene>
  <cue id="line-1">First sentence.</cue>
  <cue id="line-2">Second sentence.</cue>
</scene>
<!-- Total duration = sum of both audio clips -->

Narration Track

Use <narration> for voiceover that spans multiple transitions. Narration items sit on the timeline (like scenes/transitions) but render no visuals.

<transition id="push-01" effect="push" duration="12f" />
<narration id="layouts-voice">
  <cue id="layouts">
    Use one-column, two-column, three-column, and grid layouts.
  </cue>
</narration>

Place <narration> just before the scenes it should align with. It will start at the next scene's start unless you provide start or duration.


Web Components

VideoML uses Web Components (custom elements) for reusable UI. Any hyphenated tag is treated as a Web Component.

Component Syntax

<!-- Built-in components -->
<title-slide title="Welcome" subtitle="Get Started" />
<lower-third name="Jane Doe" title="CEO" />
<code-block language="javascript">
  console.log('Hello');
</code-block>

<!-- Custom components -->
<my-chart data="[1,2,3]" type="bar" />

Props Attribute

For complex data, use the props attribute with JSON.

<data-visualization
  props='{
    "data": [
      {"month": "Jan", "sales": 100},
      {"month": "Feb", "sales": 150}
    ],
    "chartType": "line",
    "showLegend": true
  }'
/>
Design Decision: Web Components avoid framework lock-in. They work with vanilla JavaScript, React, Vue, or any library.
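
Inside a component, the props attribute arrives as a string and has to be parsed. The sketch below shows one defensive way to do that with JSON.parse. The helper name parseProps and its choice to treat a missing attribute as an empty object are assumptions. Line breaks inside the attribute are harmless, because XML parsers normalize them to spaces and JSON ignores whitespace between tokens.

```javascript
// Sketch only: parse a props attribute value into a plain object.
function parseProps(attributeValue) {
  if (attributeValue == null || attributeValue.trim() === '') return {};
  let parsed;
  try {
    parsed = JSON.parse(attributeValue);
  } catch (error) {
    throw new Error(`Invalid props attribute: ${error.message}`);
  }
  if (parsed === null || typeof parsed !== 'object' || Array.isArray(parsed)) {
    throw new Error('Invalid props attribute: expected a JSON object');
  }
  return parsed;
}

const props = parseProps('{ "data": [{"month": "Jan", "sales": 100}], "chartType": "line", "showLegend": true }');
console.log(props.chartType); // line
console.log(props.data[0].sales); // 100
```

In a custom element, the string would typically come from this.getAttribute('props').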

Transitions

See also: the full transitions guide and integration test gallery.

Transitions are first-class timeline items that sit between scenes. A transition is a container (like a scene) that can hold visuals and audio, and it can either overlap adjacent scenes or insert time between them.

Timing: All time values accept seconds or frames (e.g. 0.6s, 12f). Time expressions can reference scene(), cue(), and mark().
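
The two literal forms can be normalized to frames with a small parser. This sketch covers only the literal forms (such as 0.6s and 12f) and does not handle expressions that use scene(), cue(), or mark(). The function name and the choice to round to the nearest frame are assumptions.

```javascript
// Sketch only: convert a literal time value ("0.6s", "12f") to a frame count.
function parseTimeToFrames(value, fps) {
  const match = /^(\d+(?:\.\d+)?)(s|f)$/.exec(value.trim());
  if (!match) throw new Error(`Unsupported time value: "${value}"`);
  const amount = Number(match[1]);
  // Frames pass through; seconds are scaled by fps. Both round to the nearest frame (assumed).
  return match[2] === 'f' ? Math.round(amount) : Math.round(amount * fps);
}

console.log(parseTimeToFrames('12f', 30)); // 12
console.log(parseTimeToFrames('0.6s', 30)); // 18
console.log(parseTimeToFrames('3s', 24)); // 72
```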

Transition Element

Use <transition> between scenes for crossfades, wipes, and any branded or custom transitions.

| Attribute | Type | Notes |
| --- | --- | --- |
| id | string | Required unique id. |
| start / end / duration | time | Optional. Explicit timing for the transition window. |
| effect | string | Named transition preset (e.g. crossfade, fade, wipe). |
| ease | string | GSAP ease string (e.g. power2.inOut). |
| mode | string | overlap (default) or insert. |
| overflow | string | Visual overflow behavior: clip, extend, allow. |
| overflow-audio | string | Audio overflow behavior: clip, extend, allow. |
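
The mode attribute affects total running time. The sketch below assumes that an insert transition adds its own duration to the timeline, while an overlap transition plays over the adjacent scenes and adds nothing. The source does not spell out the arithmetic for overlap, so that half is an assumption. The item shape ({ kind, frames, mode }) and the function name are hypothetical.

```javascript
// Sketch only: total frames for a flat list of scenes and transitions.
function totalFrames(items) {
  return items.reduce((total, item) => {
    if (item.kind === 'scene') return total + item.frames;
    // Transitions default to overlap, as in the attribute table above.
    const mode = item.mode || 'overlap';
    return mode === 'insert' ? total + item.frames : total;
  }, 0);
}

const scenesOnly = [{ kind: 'scene', frames: 90 }, { kind: 'scene', frames: 60 }];
const withOverlap = [scenesOnly[0], { kind: 'transition', frames: 18 }, scenesOnly[1]];
const withInsert = [scenesOnly[0], { kind: 'transition', frames: 18, mode: 'insert' }, scenesOnly[1]];

console.log(totalFrames(withOverlap)); // 150
console.log(totalFrames(withInsert)); // 168
```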

Audio in Transitions

Transitions can include SFX and music via <sfx>, <music>, or <audio kind=...>.

<transition id="wipe-01" effect="wipe" duration="18f" ease="power2.inOut">
  <sfx id="whoosh" start="0f" />
</transition>

Scene Enter/Exit (Convenience)

For quick fades, use per-scene convenience attributes. These do not crossfade; they only animate the scene itself.

<scene
  id="intro"
  enter="fade"
  enter-duration="12f"
  exit="fade"
  exit-duration="12f"
>
  <cue id="intro">Welcome to the video.</cue>
</scene>
Crossfade: Use a dedicated <transition> element for true crossfades between scenes.

Complete Example

Here's a full VideoML document that combines the core features covered above: an inline script, scenes, layers, a stack, a cue, and Web Components.

<vml
  id="product-demo"
  title="Product Demo Video"
  fps="30"
  width="1920"
  height="1080"
>
  <script>
    // Track scene transitions
    window.addEventListener('scene:start', (e) => {
      console.log(`Started: ${e.detail.sceneId}`);
    });
  </script>

  <!-- Title scene with explicit duration -->
  <scene id="title" duration="3s">
    <layer>
      <title-slide
        title="Product Demo"
        subtitle="Version 2.0"
      />
    </layer>
  </scene>

  <!-- Narration scene with auto duration from TTS -->
  <scene id="intro">
    <stack>
      <!-- Visual layer -->
      <layer>
        <content-screen title="Overview">
          <ul>
            <li>Fast performance</li>
            <li>Easy to use</li>
            <li>Secure by default</li>
          </ul>
        </content-screen>
      </layer>

      <!-- Narration track -->
      <cue id="intro-narration">
        Our new product offers fast performance, ease of use, and security.
      </cue>
    </stack>
  </scene>

  <!-- Code demo scene -->
  <scene id="code-demo" duration="8s">
    <layer>
      <code-block language="javascript" filename="app.js">
import { render } from 'babulus';

render({
  fps: 30,
  width: 1920,
  height: 1080
});
      </code-block>
    </layer>
  </scene>

  <!-- Outro with lower third -->
  <scene id="outro" duration="4s">
    <layer style="z-index: 0">
      <background-gradient colors="#667eea,#764ba2" />
    </layer>
    <layer style="z-index: 10">
      <lower-third
        name="Learn More"
        title="babulus.dev"
      />
    </layer>
  </scene>
</vml>

Design Decisions

Understanding the "why" behind VideoML's approach:

Why XML Over JSON?

  • HTML Familiarity: Web developers already know XML/HTML syntax
  • Mixed Content: XML supports text + elements naturally (JSON requires nested objects)
  • Web Components: Hyphenated custom elements (<title-slide />) map directly to Web Component spec
  • Tooling: XML parsers, validators, and editors are mature and widely available

Why DOM Runtime?

  • Zero Translation: VideoML elements become DOM nodes directly—no virtual layer
  • CSS Compatibility: Use standard CSS for styling, animations, and layout
  • JavaScript Integration: Manipulate video content with familiar DOM APIs
  • Browser Features: Leverage existing browser capabilities (accessibility, dev tools, extensions)

Why Temporal Layout?

  • No Duration Math: Eliminate error-prone manual calculations
  • Composability: Scenes and sequences nest naturally
  • Flexibility: Swap a 3s scene for a 5s scene—total duration updates automatically
  • Narration-Driven: TTS audio length determines timing—video adapts to script changes

Why No Sandboxing?

  • Trust Model: VideoML files are source code, not untrusted user input
  • Power vs Safety: Full JavaScript access enables rich interactions, at the cost of requiring security review
  • Renderer Context: Videos render in isolated headless browsers anyway
  • Future Work: Sandboxed mode may be added for user-generated content scenarios

Quick Reference

Core Elements

| Element | Purpose | Key Attributes |
| --- | --- | --- |
| <vml> | Root element | id, fps, width, height |
| <scene> | Video section | id, start, duration |
| <sequence> | Back-to-back | duration (auto) |
| <stack> | Parallel | duration (auto = max) |
| <layer> | Z-index container | style |
| <cue> | TTS narration | id, voice |

Timeline API Summary

| API | Access |
| --- | --- |
| Current frame | window.timeline.frame |
| Current time | window.timeline.time |
| Frame rate | window.timeline.fps |
| CSS var (frame) | var(--video-frame) |
| CSS var (time) | var(--video-time) |

Event Summary

| Event | Fires When |
| --- | --- |
| timeline:tick | Every frame |
| scene:start | Scene begins |
| scene:end | Scene ends |
| cue:start | Cue audio starts |
| cue:end | Cue audio ends |

Scope and Non-Goals

In Scope

  • XML syntax and root element definition
  • Temporal layout model (sequence, stack, auto-duration)
  • Timeline API specification
  • Lifecycle events (tick, scene, cue)
  • Inline JavaScript and event handlers
  • Web Component integration

Out of Scope

  • Determinism Enforcement: VideoML does not guarantee reproducible output if scripts use randomness or external state
  • Sandboxed Scripting: No security isolation for <script> blocks (treat VideoML as trusted source code)
  • Backward Compatibility: <video> root is deprecated; <vml> is canonical
  • Animation Keyframes: Use CSS animations or JavaScript—VideoML provides timing, not animation primitives
