AWS Polly TTS Quick Start

AWS Polly is Amazon's text-to-speech service. It's cost-effective (~$4 per 1 million characters for standard voices) and works well for development and production workloads.

Prerequisites

AWS Account with Polly access
AWS Credentials configured via one of:
- Environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY
- AWS credentials file: ~/.aws/credentials
- IAM role (if running on EC2/ECS)
Python boto3 library installed:
```
pip install boto3
```

Configuration

1. Add AWS Polly to your config

Edit .babulus/config.yml:

providers:
  aws_polly:
    region: "us-east-1"
    voice_id: "Joanna"
    engine: "standard"  # or "neural" for better quality

2. Set provider in your DSL

Edit your .babulus.yml file:

voiceover:
  provider: aws-polly  # or just "aws"
  sample_rate_hz: 16000  # Required: Polly PCM only supports 8000 or 16000 Hz

Important: AWS Polly's PCM format only supports 8000 Hz or 16000 Hz sample rates. Use 16000 Hz for best quality.
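
For reference, the sample-rate constraint maps onto Polly's `SynthesizeSpeech` API roughly as shown here. This is a minimal sketch that calls boto3 directly, not Babulus's internal implementation; the helper names `build_polly_request` and `synthesize_pcm` are hypothetical. Note that boto3 expects `SampleRate` as a string.

```python
POLLY_PCM_SAMPLE_RATES = (8000, 16000)  # rates Polly accepts for OutputFormat="pcm"


def build_polly_request(text, voice_id="Joanna", engine="standard", sample_rate_hz=16000):
    """Build keyword arguments for boto3's polly.synthesize_speech (hypothetical helper)."""
    if sample_rate_hz not in POLLY_PCM_SAMPLE_RATES:
        raise ValueError(
            f"AWS Polly PCM only supports sample_rate_hz {list(POLLY_PCM_SAMPLE_RATES)}, "
            f"got {sample_rate_hz}"
        )
    return {
        "Text": text,
        "VoiceId": voice_id,
        "Engine": engine,
        "OutputFormat": "pcm",
        "SampleRate": str(sample_rate_hz),  # the API takes the rate as a string
    }


def synthesize_pcm(text, region="us-east-1", **options):
    """Call Polly and return raw PCM bytes. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the request builder works without boto3 installed

    client = boto3.client("polly", region_name=region)
    response = client.synthesize_speech(**build_polly_request(text, **options))
    return response["AudioStream"].read()
```

Calling `synthesize_pcm("Hello", voice_id="Matthew", engine="neural")` would return the raw audio bytes, while passing `sample_rate_hz=24000` fails before any request is sent.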

3. Environment-Based Configuration (Recommended)

For multi-environment workflows:

voiceover:
  provider:
    development: openai
    aws: aws           # Use Polly for AWS testing
    production: elevenlabs

  sample_rate_hz:
    development: 24000  # OpenAI
    aws: 16000          # Polly PCM maximum
    production: 44100   # ElevenLabs

Then generate with:

BABULUS_ENV=aws babulus generate your-video.babulus.yml
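
One way to picture the per-environment mapping above is as a lookup keyed by `BABULUS_ENV`. The sketch below is an illustration under stated assumptions, not Babulus's actual resolution code: `resolve_setting` and `resolve_voiceover` are hypothetical names, the `development` fallback is assumed, and raising an error for an unlisted environment is a choice made here because this guide does not say how that case is handled.

```python
import os


def resolve_setting(value, env):
    """Return a plain value as-is, or pick the entry for `env` from a per-environment mapping."""
    if not isinstance(value, dict):
        return value  # a scalar applies to every environment
    if env not in value:
        raise KeyError(f"no value configured for environment {env!r}")
    return value[env]


def resolve_voiceover(voiceover, env=None):
    """Resolve every voiceover setting for one environment."""
    # Assumption: fall back to "development" when BABULUS_ENV is unset.
    env = env or os.environ.get("BABULUS_ENV", "development")
    return {key: resolve_setting(val, env) for key, val in voiceover.items()}


# Mirrors the voiceover block shown above.
voiceover = {
    "provider": {"development": "openai", "aws": "aws", "production": "elevenlabs"},
    "sample_rate_hz": {"development": 24000, "aws": 16000, "production": 44100},
}
```

With this structure, `resolve_voiceover(voiceover, env="aws")` selects the Polly provider together with its 16000 Hz sample rate, so the two settings stay in step.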

Available Voices

Standard Engine (Cheaper)

English (US): Joanna, Matthew, Ivy, Kendra, Kimberly, Salli, Joey, Justin
English (UK): Amy, Emma, Brian
Many other languages: see the voice list in the AWS Polly documentation

Neural Engine (Better Quality)

English (US): Joanna, Matthew, Ivy, Kendra, Kimberly, Salli, Joey, Justin, Kevin, Ruth, Stephen
Requires engine: "neural" in config
Higher cost: 4x the standard rate (see Pricing)

Voice Selection

You can override the voice in each DSL file:

voiceover:
  provider: aws
  voice: Matthew  # Override default voice
  sample_rate_hz: 16000

Or use the default from config.

Pricing

Standard voices: $4.00 per 1 million characters ($0.004/1K chars)
Neural voices: $16.00 per 1 million characters ($0.016/1K chars)
Much cheaper than ElevenLabs (~$0.30/1K chars)
Compared with OpenAI ($0.015/1K chars): standard voices are cheaper, neural voices cost about the same
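
To get a feel for what a script would cost, the Polly rates listed above can be turned into a quick estimate. This is a rough sketch: `estimate_polly_cost` is a hypothetical helper, and it simply counts characters with `len`, which only approximates how AWS bills.

```python
# Prices in USD per 1 million characters, taken from the list above.
POLLY_PRICE_PER_MILLION = {"standard": 4.00, "neural": 16.00}


def estimate_polly_cost(text, engine="standard"):
    """Estimate the synthesis cost in USD for `text` (hypothetical helper)."""
    return len(text) * POLLY_PRICE_PER_MILLION[engine] / 1_000_000


script = "Welcome to our presentation. Let's get started."
print(f"standard: ${estimate_polly_cost(script, 'standard'):.6f}")
print(f"neural:   ${estimate_polly_cost(script, 'neural'):.6f}")
```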

Troubleshooting

Error: "AWS Polly PCM only supports sample_rate_hz [8000, 16000]"

Solution: Set sample_rate_hz: 16000 in your voiceover config.

Error: "Unable to locate credentials"

Solutions:

Set environment variables:

export AWS_ACCESS_KEY_ID=your_key_id
export AWS_SECRET_ACCESS_KEY=your_secret_key

Or create ~/.aws/credentials:

[default]
aws_access_key_id = your_key_id
aws_secret_access_key = your_secret_key

Or use IAM roles if running on AWS infrastructure
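
To see which of these sources is actually present on a machine, a small diagnostic can help. The sketch below uses only the standard library and checks just the first two options; `find_credential_sources` is a hypothetical helper, and boto3's real lookup chain has more steps (IAM roles among them) that it does not probe.

```python
import os
from pathlib import Path


def find_credential_sources(environ=None, home=None):
    """List the credential sources from this guide that appear to be present."""
    environ = os.environ if environ is None else environ
    home = Path.home() if home is None else Path(home)
    sources = []
    # Both variables must be set for the environment-variable route to work.
    if environ.get("AWS_ACCESS_KEY_ID") and environ.get("AWS_SECRET_ACCESS_KEY"):
        sources.append("environment variables")
    # Only checks that the file exists, not that it holds a valid [default] profile.
    if (home / ".aws" / "credentials").is_file():
        sources.append("~/.aws/credentials")
    return sources


if __name__ == "__main__":
    found = find_credential_sources()
    print("Found:", ", ".join(found) if found else "none (an IAM role may still apply)")
```

For an authoritative answer, `boto3.Session().get_credentials()` returns `None` when boto3 itself cannot resolve any credentials.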

Low Audio Quality

Solution: Switch to neural engine for better quality:

providers:
  aws_polly:
    engine: "neural"

Note: Neural voices cost 4x as much as standard voices ($16 vs. $4 per 1 million characters) but sound significantly better.

Comparison with Other Providers

| Feature | AWS Polly | OpenAI TTS | ElevenLabs |
|---|---|---|---|
| Price (per 1K chars) | $0.004-0.016 | $0.015 | ~$0.30 |
| Quality | Good (neural) / OK (standard) | Very Good | Excellent |
| Sample rates | 8000, 16000 Hz (PCM) | 24000 Hz | 22050-44100 Hz |
| Voices | 60+ voices, 20+ languages | 6 voices | 1000+ voices |
| Credentials | AWS credentials | API key | API key |
| Best for | Cost-conscious production | Development/prototyping | Final production |

Example: Complete DSL

voiceover:
  provider: aws
  voice: Matthew
  sample_rate_hz: 16000
  seed: 1337
  lead_in_seconds: 0.25

scenes:
  - id: intro
    title: "Introduction"
    cues:
      - id: welcome
        label: "Welcome"
        voice: "Welcome to our presentation. Let's get started."

AWS Polly Features Not (Yet) Supported

SSML tags for prosody control
Speech marks for word-level timing
Lexicons for pronunciation customization

These may be added in future releases. For now, Babulus uses plain text synthesis.