Social Media Automation Engine

How It Works

The 11-Step Pipeline

A Temporal durable workflow orchestrates every step. If a step fails, it is retried automatically. If the worker process crashes, the workflow resumes from its last recorded step.
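Temporal drives these automatic retries from a retry policy made of an initial interval, a backoff coefficient, a maximum interval, and a maximum number of attempts. The plain-Python sketch below only illustrates the exponential backoff schedule such a policy produces. `RetryPolicySketch` and `retry_delays` are hypothetical names, the code does not call the Temporal SDK, and the default values are assumptions rather than this project's settings.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RetryPolicySketch:
    """Hypothetical stand-in for a Temporal-style retry policy (not the SDK class)."""

    initial_interval: float = 1.0     # seconds to wait before the first retry
    backoff_coefficient: float = 2.0  # multiplier applied after each failed attempt
    maximum_interval: float = 100.0   # cap on any single wait, in seconds
    maximum_attempts: int = 5         # total attempts, including the first one


def retry_delays(policy: RetryPolicySketch) -> List[float]:
    """Return the wait before each retry, i.e. before attempts 2..maximum_attempts."""
    delays: List[float] = []
    interval = policy.initial_interval
    for _ in range(policy.maximum_attempts - 1):
        delays.append(min(interval, policy.maximum_interval))
        interval *= policy.backoff_coefficient
    return delays


print(retry_delays(RetryPolicySketch()))  # → [1.0, 2.0, 4.0, 8.0]
print(retry_delays(RetryPolicySketch(initial_interval=10.0, maximum_interval=30.0)))
# → [10.0, 20.0, 30.0, 30.0]
```

The cap matters in practice: without `maximum_interval`, a long outage would push retries hours apart.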

💬 Telegram Input (text / photo / video)
→ 👁 Media Analysis (GPT-4o Vision)
→ ✍️ Script Generation (LLM + QA Gate)
→ 🎙 Voice Over (TTS + QA Gate)
→ 📝 Captions (Whisper + QA)
→ 🎵 Background Music (MusicGen)
→ 🎬 Video Render (FFmpeg + QA)
→ ☁️ Cloud Upload (Cloudinary)
→ ✅ Human Approval (Temporal Signal)
→ 🚀 Multi-Platform Publish (4 platforms)

A rejection loops back to Script Generation with the accumulated feedback (up to 5 rounds).
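The rejection loop can be sketched without any framework: every rejection note is appended to a list that the script generator receives on the next round, and the loop gives up after five rounds. In the real system this loop lives inside the Temporal workflow; `produce_until_approved`, `fake_generate`, and `fake_review` are hypothetical names used only for illustration.

```python
from typing import Callable, List, Optional, Tuple

MAX_ROUNDS = 5  # the page states rejections loop back for up to 5 rounds


def produce_until_approved(
    generate: Callable[[List[str]], str],
    review: Callable[[str], Tuple[bool, str]],
    max_rounds: int = MAX_ROUNDS,
) -> Tuple[Optional[str], List[str]]:
    """Regenerate until approved, feeding every earlier rejection note back in."""
    feedback: List[str] = []
    for _ in range(max_rounds):
        draft = generate(feedback)      # the generator sees ALL prior feedback
        approved, note = review(draft)  # stands in for the human approval step
        if approved:
            return draft, feedback
        feedback.append(note)           # accumulate rather than overwrite
    return None, feedback               # gave up after max_rounds rejections


# Hypothetical generator/reviewer pair for demonstration only.
def fake_generate(feedback: List[str]) -> str:
    return f"draft v{len(feedback) + 1} (fixes: {'; '.join(feedback) or 'none'})"


def fake_review(draft: str) -> Tuple[bool, str]:
    if draft.startswith("draft v3"):
        return True, ""
    return False, f"tighten hook in {draft.split()[1]}"


final, notes = produce_until_approved(fake_generate, fake_review)
print(final)  # → draft v3 (fixes: tighten hook in v1; tighten hook in v2)
```

Accumulating (rather than replacing) the notes keeps round 3 from reintroducing a problem the reviewer already flagged in round 1.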

Engineering Decisions

Why It's Built This Way

Every architectural choice was deliberate. Here are the six decisions that define the system.

⚙️

Temporal.io for Workflow Orchestration

The entire pipeline is event-sourced via Temporal. Every step is recorded, replayable, and automatically recoverable on crash. Workflow state persists across process restarts.

Alternatives Considered

Celery, custom state machine, cron jobs

Result

No hand-written state management code. Full replay debugging.
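The idea behind event-sourced recovery can be shown with a toy runner: each completed step's result is appended to a history, and a new process given that history replays the recorded results instead of re-executing those steps. Temporal's actual mechanism (workflow history events plus deterministic replay of workflow code) is far richer; `ReplayingRunner`, `WorkerCrash`, and `pipeline` are hypothetical names for this sketch only.

```python
from typing import Any, Callable, List, Optional, Tuple

Event = Tuple[str, Any]  # (step name, recorded result)


class WorkerCrash(Exception):
    """Simulates the worker process dying mid-workflow."""


class ReplayingRunner:
    """Toy event-sourced runner: completed steps are appended to a history;
    re-running with that history replays recorded results instead of re-executing."""

    def __init__(self, history: Optional[List[Event]] = None) -> None:
        self.history: List[Event] = list(history or [])
        self.cursor = 0                # index of the next expected history event
        self.executed: List[str] = []  # steps actually executed (not replayed) this run

    def step(self, name: str, fn: Callable[[], Any]) -> Any:
        if self.cursor < len(self.history):
            recorded_name, result = self.history[self.cursor]
            if recorded_name != name:  # replay requires a deterministic step order
                raise RuntimeError(f"non-deterministic replay: {recorded_name} != {name}")
            self.cursor += 1
            return result
        result = fn()                        # first time through: really run it
        self.history.append((name, result))  # persist the event
        self.cursor += 1
        self.executed.append(name)
        return result


def pipeline(run: ReplayingRunner, crash_after_tts: bool = False) -> str:
    script = run.step("script", lambda: "hello world")
    audio = run.step("tts", lambda: f"audio({script})")
    if crash_after_tts:
        raise WorkerCrash("worker died after TTS")
    return run.step("render", lambda: f"video({audio})")


first = ReplayingRunner()
try:
    pipeline(first, crash_after_tts=True)
except WorkerCrash:
    pass
resumed = ReplayingRunner(first.history)  # new process, same persisted history
print(pipeline(resumed))  # → video(audio(hello world))
print(resumed.executed)   # → ['render']
```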

🧠

Local-First AI with Cascading Fallbacks

Every AI call cascades through local GPU → cloud free tier → paid fallback, so a single service outage does not stop the pipeline.

LLM Chain

Ollama (local) → GitHub Models → Anthropic Claude

Result

$0 cost while the local and free tiers serve requests. No vendor lock-in. Graceful degradation.
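A cascading chain like Ollama → GitHub Models → Claude reduces to "try each provider in order, fall through on any error". The sketch below assumes each provider is wrapped as a plain callable; `call_with_fallback` and the three provider functions are hypothetical stand-ins, not real client code for those services.

```python
from typing import Callable, List, Sequence, Tuple

Provider = Tuple[str, Callable[[str], str]]


class AllProvidersFailed(Exception):
    pass


def call_with_fallback(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try each provider in order; return (provider_name, completion) from the first success."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # outage, timeout, quota: any failure drops to the next tier
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stand-ins; real clients for Ollama, GitHub Models and Claude would go here.
def local_ollama(prompt: str) -> str:
    raise ConnectionError("local GPU unavailable")


def github_models(prompt: str) -> str:
    return f"[github-models] {prompt}"


def claude(prompt: str) -> str:
    return f"[claude] {prompt}"


chain = [("ollama", local_ollama), ("github-models", github_models), ("claude", claude)]
print(call_with_fallback("Write a hook about Rust", chain))
# → ('github-models', '[github-models] Write a hook about Rust')
```

Returning the provider name alongside the completion makes it easy to log which tier actually served each request.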

⏸

Signal-Based Human-in-the-Loop

Telegram approval uses Temporal Signals: the workflow pauses indefinitely at the approval step, then resumes when the user taps Approve or Reject. No polling. No timeouts.

Mechanism

workflow.wait_condition() → Telegram inline keyboard → Signal

Result

Event-driven. Architecturally clean.
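In the Temporal Python SDK this pattern is typically a `@workflow.signal` handler that sets a field, plus `await workflow.wait_condition(lambda: ...)` in the workflow body. The runnable sketch below mimics that shape with plain `asyncio` so it needs no Temporal server; `ApprovalGateSketch` and its methods are hypothetical names, and the `asyncio.Event` stands in for Temporal's own wake-up mechanism.

```python
import asyncio
from typing import Optional


class ApprovalGateSketch:
    """asyncio stand-in for a Temporal workflow waiting on an approval Signal."""

    def __init__(self) -> None:
        self.decision: Optional[str] = None
        self._changed = asyncio.Event()

    def approve_signal(self, decision: str) -> None:
        # In Temporal this would be a @workflow.signal handler invoked via the bot.
        self.decision = decision
        self._changed.set()

    async def wait_for_decision(self) -> str:
        # Mirrors: await workflow.wait_condition(lambda: self.decision is not None)
        while self.decision is None:
            await self._changed.wait()  # parked: no polling loop, no timeout
            self._changed.clear()
        return self.decision


async def demo() -> str:
    gate = ApprovalGateSketch()
    waiter = asyncio.create_task(gate.wait_for_decision())
    await asyncio.sleep(0.01)       # the "workflow" is suspended here
    gate.approve_signal("approve")  # user taps the inline-keyboard button
    return await waiter


print(asyncio.run(demo()))  # → approve
```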

🔍

5-Gate QA Pipeline with Auto-Remediation

Five automated validation gates catch issues at every stage. Each gate retries up to 3× with corrective feedback injection before alerting.

Gates

Script → TTS → Captions → Render → Visual (screenshot sampling)

Result

Bugs caught before any human sees output.
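Each gate follows the same shape: produce an artifact, validate it, feed the reported problems back into the producer, and alert after the retry budget is spent. In this sketch "retries up to 3×" is read as one initial attempt plus three retries, which is an assumption; `run_gate`, `produce`, and `validate` are hypothetical names, and the `alert` callback stands in for the Telegram alert.

```python
from typing import Callable, List, Optional


def run_gate(
    produce: Callable[[List[str]], str],
    validate: Callable[[str], List[str]],
    alert: Callable[[str], None],
    max_retries: int = 3,
) -> Optional[str]:
    """Produce an artifact, validate it, and retry with the problems fed back in."""
    feedback: List[str] = []
    problems: List[str] = []
    for _ in range(1 + max_retries):  # first attempt plus up to max_retries retries
        artifact = produce(feedback)
        problems = validate(artifact)
        if not problems:
            return artifact
        feedback = problems           # corrective feedback for the next attempt
    alert(f"QA gate failed after {max_retries} retries: {problems}")
    return None


# Hypothetical producer that lengthens its output each time it is corrected.
attempts: List[List[str]] = []


def produce(feedback: List[str]) -> str:
    attempts.append(list(feedback))
    return " ".join(["word"] * (2 * len(attempts)))


def validate(text: str) -> List[str]:
    return [] if len(text.split()) >= 6 else ["too short: add more words"]


alerts: List[str] = []
print(run_gate(produce, validate, alerts.append))  # → word word word word word word
print(len(attempts), alerts)                       # → 3 []
```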

🔐

Content-Addressable Asset Store

Every asset is SHA-256 fingerprinted and stored in SQLite. Trigram Jaccard similarity (threshold 0.35) catches near-duplicate scripts automatically.

Dedup Strategy

SHA-256 fingerprint → trigram Jaccard → same content = same row

Result

Zero redundant work. Automatic deduplication.
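A minimal version of the two-stage check, using only the standard library: an exact SHA-256 lookup in SQLite, then a trigram Jaccard comparison against stored scripts using the 0.35 threshold quoted above. Whether the project uses character or word trigrams is not stated; this sketch assumes character trigrams over lower-cased text, handles text scripts only, and `AssetStore`, `fingerprint`, `trigrams`, and `jaccard` are hypothetical names.

```python
import hashlib
import sqlite3
from typing import Set, Tuple

JACCARD_THRESHOLD = 0.35  # threshold quoted on this page


def fingerprint(content: str) -> str:
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def trigrams(text: str) -> Set[str]:
    # Assumption: character trigrams over lower-cased, whitespace-normalized text.
    norm = " ".join(text.lower().split())
    return {norm[i:i + 3] for i in range(len(norm) - 2)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


class AssetStore:
    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS assets (sha256 TEXT PRIMARY KEY, content TEXT)"
        )

    def add(self, content: str) -> Tuple[str, str]:
        """Return (status, sha256), where status is 'exact', 'near' or 'new'."""
        digest = fingerprint(content)
        row = self.db.execute("SELECT 1 FROM assets WHERE sha256 = ?", (digest,)).fetchone()
        if row:
            return "exact", digest  # same content, same row
        grams = trigrams(content)
        for other_digest, other in self.db.execute("SELECT sha256, content FROM assets"):
            if jaccard(grams, trigrams(other)) >= JACCARD_THRESHOLD:
                return "near", other_digest  # near-duplicate of an existing script
        self.db.execute("INSERT INTO assets VALUES (?, ?)", (digest, content))
        self.db.commit()
        return "new", digest


store = AssetStore()
print(store.add("Five Python tricks you should know")[0])  # → new
print(store.add("Five Python tricks you should know")[0])  # → exact
print(store.add("5 Python tricks you should know!")[0])    # → near
print(store.add("Deploying Kubernetes on bare metal")[0])  # → new
```

The full table scan is fine for a personal-scale library; a larger store would need an index such as MinHash to avoid comparing against every row.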

📈

Self-Learning Intelligence Layer

The system learns from its own performance data: hook type analysis, audience profiling, trend scoring. Performance metrics feed back into scriptwriter prompts via Mem0.

Feedback Loop

Engagement metrics → hook analysis → prompt enrichment → better scripts

Result

Genuine closed loop: each run's performance data shapes the prompts for the next.
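The hook-analysis step of the feedback loop can be sketched as "average engagement per hook type, then mention the top performers in the next prompt". The post schema, the engagement numbers, and the prompt wording below are all made up for illustration, and the Mem0 storage step is omitted; `rank_hook_types` and `enrich_prompt` are hypothetical names.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def rank_hook_types(posts: Iterable[Dict]) -> List[Tuple[str, float]]:
    """Average engagement per hook type, best first (hypothetical post schema)."""
    by_type: Dict[str, List[float]] = defaultdict(list)
    for post in posts:
        by_type[post["hook_type"]].append(post["engagement_rate"])
    return sorted(((t, mean(rates)) for t, rates in by_type.items()),
                  key=lambda pair: pair[1], reverse=True)


def enrich_prompt(base_prompt: str, ranking: List[Tuple[str, float]], top_n: int = 2) -> str:
    best = ", ".join(hook for hook, _ in ranking[:top_n])
    return f"{base_prompt}\nRecent performance favors these hook types: {best}."


# Illustrative, invented metrics.
posts = [
    {"hook_type": "question", "engagement_rate": 0.08},
    {"hook_type": "question", "engagement_rate": 0.06},
    {"hook_type": "statistic", "engagement_rate": 0.12},
    {"hook_type": "story", "engagement_rate": 0.03},
]
print(enrich_prompt("Write a 45-second script about Docker.", rank_hook_types(posts)))
```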

Fault Tolerance

Multi-Tier Fallback Chains

Every external dependency has a cascading fallback. If the local GPU is down, cloud takes over. If cloud is unreachable, a free alternative kicks in.

LLM: Ollama (local GPU) → GitHub Models → Anthropic Claude

TTS: XTTS-v2 (local) → Speaches (local) → ElevenLabs → edge-tts (free)

Music: MusicGen (local) → Freesound API → ElevenLabs → FFmpeg ambient

Image: ComfyUI + FLUX → DALL-E 3

Video: Pexels API → FFmpeg procedural

Tiers: Local (GPU), Cloud (free tier), Free / unlimited
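Because each chain is just an ordered list, the whole table can live in one data structure, with selection reduced to "first provider currently considered healthy". How health is determined (probes, circuit breakers, quota tracking) is outside this sketch and an assumption; `FALLBACK_CHAINS` and `pick_provider` are hypothetical names, and the provider strings are labels copied from the chains above, not client objects.

```python
from typing import Dict, List, Set

# Provider order as listed on this page.
FALLBACK_CHAINS: Dict[str, List[str]] = {
    "llm": ["Ollama", "GitHub Models", "Anthropic Claude"],
    "tts": ["XTTS-v2", "Speaches", "ElevenLabs", "edge-tts"],
    "music": ["MusicGen", "Freesound API", "ElevenLabs", "FFmpeg ambient"],
    "image": ["ComfyUI + FLUX", "DALL-E 3"],
    "video": ["Pexels API", "FFmpeg procedural"],
}


def pick_provider(capability: str, healthy: Set[str]) -> str:
    """Return the highest-priority provider for a capability that is currently healthy."""
    for provider in FALLBACK_CHAINS[capability]:
        if provider in healthy:
            return provider
    raise LookupError(f"no healthy provider for {capability!r}")


# Example: local GPU services are down, so cloud and free tiers take over.
healthy = {"Speaches", "edge-tts", "GitHub Models", "FFmpeg procedural"}
print(pick_provider("tts", healthy))  # → Speaches
print(pick_provider("llm", healthy))  # → GitHub Models
```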

Quality Assurance

5 Automated QA Gates

Every artifact passes through automated validation before proceeding. Each gate retries 3× with corrective feedback injection, then sends a Telegram alert if remediation fails.

Gate 1

QA_SCRIPT

✓ Word count range
✓ Duration estimate
✓ Hook quality analysis
✓ CTA presence
✓ Code backtick validation
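The mechanical QA_SCRIPT checks (word count, duration estimate, CTA presence, backtick balance) fit in one small function; hook quality is left out because it likely needs an LLM judgment. Every threshold, the speaking-rate estimate, and the CTA word list below are assumptions, and `qa_script` is a hypothetical name.

```python
import re
from typing import List

# All thresholds below are illustrative assumptions, not the project's values.
WORD_RANGE = (40, 150)     # acceptable word count for a short-form script
WORDS_PER_SECOND = 2.5     # ~150 spoken words per minute
MAX_SECONDS = 60.0         # target upper bound for a vertical video
CTA = re.compile(r"\b(follow|subscribe|comment|share|save)\b", re.IGNORECASE)


def qa_script(script: str) -> List[str]:
    """Return a list of problems; an empty list means the script passes."""
    problems: List[str] = []
    n_words = len(script.split())
    if not WORD_RANGE[0] <= n_words <= WORD_RANGE[1]:
        problems.append(f"word count {n_words} outside {WORD_RANGE}")
    est_seconds = n_words / WORDS_PER_SECOND
    if est_seconds > MAX_SECONDS:
        problems.append(f"estimated {est_seconds:.0f}s exceeds {MAX_SECONDS:.0f}s")
    if not CTA.search(script):
        problems.append("no call to action found")
    if script.count("`") % 2:
        problems.append("unbalanced backticks around inline code")
    return problems


good = "Follow for more. " + "word " * 47 + "Try `print()` today."
print(qa_script(good))                  # → []
print(len(qa_script("Hello `world")))   # → 3
```

Returning a list of human-readable problems, rather than a boolean, is what lets the gate inject them as corrective feedback on the next attempt.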

Gate 2

QA_TTS

✓ File size & existence
✓ Duration match
✓ Silence ratio
✓ Reverse transcription (Whisper verification)
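The silence-ratio check can be computed with the standard library alone: read the PCM frames with `wave`, count samples below an amplitude threshold, and divide by the total. The sketch assumes 16-bit PCM, and both thresholds are assumptions; `silence_ratio` and the `make_wav` test helper are hypothetical names.

```python
import array
import io
import sys
import wave

SILENCE_THRESHOLD = 500    # assumption: |sample| below this (of 32767) counts as silence
MAX_SILENCE_RATIO = 0.30   # assumption: fail the gate above 30% near-silent samples


def silence_ratio(wav_bytes: bytes) -> float:
    """Fraction of 16-bit PCM samples whose amplitude is below SILENCE_THRESHOLD."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        samples = array.array("h")
        samples.frombytes(wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        return 1.0
    quiet = sum(1 for s in samples if abs(s) < SILENCE_THRESHOLD)
    return quiet / len(samples)


def make_wav(samples, rate: int = 16000) -> bytes:
    """Build a mono 16-bit WAV in memory (test helper)."""
    data = array.array("h", samples)
    if sys.byteorder == "big":
        data.byteswap()
    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(rate)
        out.writeframes(data.tobytes())
    return buf.getvalue()


ratio = silence_ratio(make_wav([8000] * 800 + [0] * 800))
print(ratio, ratio > MAX_SILENCE_RATIO)  # → 0.5 True
```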

Gate 3

QA_CAPTIONS

✓ ASS format parsing
✓ Word coverage %
✓ Timing range validation
✓ Font validation
✓ Spatial overlap detection
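Three of the QA_CAPTIONS checks (ASS parsing, word coverage, timing range) can be sketched with `re` alone. The parser assumes the default `[Events]` field order, where Text is the tenth field and may itself contain commas; font and spatial-overlap checks are omitted. `parse_dialogue`, `word_coverage`, and `timing_errors` are hypothetical names.

```python
import re
from typing import List, Tuple

Event = Tuple[float, float, str]  # (start seconds, end seconds, plain text)
_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})\.(\d{2})")
_OVERRIDE = re.compile(r"\{[^}]*\}")  # ASS override blocks such as {\k50}


def ass_time(value: str) -> float:
    m = _TIME.fullmatch(value.strip())
    if not m:
        raise ValueError(f"bad ASS timestamp: {value!r}")
    h, mnt, s, cs = map(int, m.groups())
    return h * 3600 + mnt * 60 + s + cs / 100


def parse_dialogue(ass_text: str) -> List[Event]:
    events: List[Event] = []
    for line in ass_text.splitlines():
        if not line.startswith("Dialogue:"):
            continue
        # Assumes the default Format order; Text is the 10th field and may contain commas.
        fields = line[len("Dialogue:"):].split(",", 9)
        text = _OVERRIDE.sub("", fields[9]).replace("\\N", " ")
        events.append((ass_time(fields[1]), ass_time(fields[2]), text))
    return events


def _words(text: str) -> List[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


def word_coverage(events: List[Event], script: str) -> float:
    """Share of script words that appear somewhere in the captions."""
    captioned = set(_words(" ".join(text for _, _, text in events)))
    script_words = _words(script)
    if not script_words:
        return 1.0
    return sum(w in captioned for w in script_words) / len(script_words)


def timing_errors(events: List[Event], audio_seconds: float) -> List[str]:
    return [f"bad timing {start:.2f}-{end:.2f}: {text!r}"
            for start, end, text in events
            if not 0 <= start < end <= audio_seconds]


sample = r"""[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:00.00,0:00:01.50,Default,,0,0,0,,{\k50}Hello world
Dialogue: 0,0:00:01.50,0:00:03.00,Default,,0,0,0,,Python is\Nfun
"""
events = parse_dialogue(sample)
print(round(word_coverage(events, "Hello world, Python is really fun"), 2))  # → 0.83
print(timing_errors(events, audio_seconds=3.0))                              # → []
```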

Gate 4

QA_RENDER

✓ Resolution (1080×1920)
✓ Duration & codec
✓ File size bounds
✓ Black frame detection
✓ Frozen frame detection
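The resolution, codec, duration, and size checks can read everything from one `ffprobe` JSON report. The sketch builds the command (to be run with `subprocess.run`) and validates a parsed report; the expected codec, duration window, and size cap are assumptions. Black and frozen frames are not covered here; FFmpeg's `blackdetect` and `freezedetect` filters are one plausible way to implement those checks. `ffprobe_command` and `check_render` are hypothetical names.

```python
import json
from typing import Any, Dict, List


def ffprobe_command(path: str) -> List[str]:
    """ffprobe invocation whose JSON output check_render() expects (run via subprocess)."""
    return [
        "ffprobe", "-v", "error", "-select_streams", "v:0",
        "-show_entries", "stream=codec_name,width,height:format=duration,size",
        "-of", "json", path,
    ]


def check_render(probe: Dict[str, Any], min_s: float = 5.0, max_s: float = 90.0,
                 max_bytes: int = 200 * 1024 * 1024) -> List[str]:
    """Validate ffprobe JSON; every bound here is an illustrative assumption."""
    streams = probe.get("streams") or []
    if not streams:
        return ["no video stream"]
    video, fmt = streams[0], probe.get("format", {})
    problems: List[str] = []
    if (video.get("width"), video.get("height")) != (1080, 1920):
        problems.append(f"resolution {video.get('width')}x{video.get('height')} != 1080x1920")
    if video.get("codec_name") != "h264":
        problems.append(f"unexpected codec {video.get('codec_name')}")
    duration = float(fmt.get("duration", 0))  # ffprobe reports these as strings
    if not min_s <= duration <= max_s:
        problems.append(f"duration {duration:.1f}s outside [{min_s}, {max_s}]")
    size = int(fmt.get("size", 0))
    if not 0 < size <= max_bytes:
        problems.append(f"file size {size} bytes out of bounds")
    return problems


sample = json.loads("""{
  "streams": [{"codec_name": "h264", "width": 1080, "height": 1920}],
  "format": {"duration": "42.500000", "size": "18350112"}
}""")
print(check_render(sample))  # → []
```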

Gate 5

QA_VISUAL

✓ 5-point frame sampling
✓ Brightness analysis
✓ Text visibility check
✓ Layout overlap audit (screenshot-based)
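Frame sampling and brightness analysis can be sketched as: choose five evenly spaced timestamps that avoid the first and last frame, grab each frame with `ffmpeg -ss … -frames:v 1`, and score it by mean luma. Decoding the PNG into RGB tuples is left out (Pillow could do it), the Rec. 601 weights are one common choice, and the darkness threshold is an assumption; `sample_times`, `frame_command`, and `mean_luma` are hypothetical names.

```python
from typing import Iterable, List, Tuple


def sample_times(duration: float, points: int = 5) -> List[float]:
    """Evenly spaced timestamps that avoid the very first and last frame."""
    return [duration * (i + 1) / (points + 1) for i in range(points)]


def frame_command(path: str, t: float, out_png: str) -> List[str]:
    """ffmpeg call that grabs a single frame at time t (run via subprocess)."""
    return ["ffmpeg", "-y", "-v", "error", "-ss", f"{t:.2f}", "-i", path,
            "-frames:v", "1", out_png]


def mean_luma(pixels: Iterable[Tuple[int, int, int]]) -> float:
    """Average Rec. 601 luma of RGB pixels (0 = black, 255 = white)."""
    total, count = 0.0, 0
    for r, g, b in pixels:
        total += 0.299 * r + 0.587 * g + 0.114 * b
        count += 1
    return total / count if count else 0.0


TOO_DARK = 40.0  # assumption: frames darker than this are flagged

print(sample_times(30.0))  # → [5.0, 10.0, 15.0, 20.0, 25.0]
frame = [(10, 10, 10)] * 90 + [(250, 250, 250)] * 10
print(mean_luma(frame) < TOO_DARK)  # → True
```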

Technology

Full Stack Breakdown

Layer	Technology	Role
Orchestration	`Temporal.io`	Durable workflow execution, event sourcing, crash recovery
Bot Interface	`python-telegram-bot`	User interaction — briefs, approvals, analytics
AI - Script	`GPT-4o` / `DeepSeek` / `Claude`	Multi-provider LLM chain with local-first priority
AI - Vision	`GPT-4o Vision` / `Llama 3.3`	Multimodal media analysis for photos/videos
AI - Image	`ComfyUI + FLUX` / `DALL-E 3`	AI-generated visuals with local GPU priority
TTS	`XTTS-v2` / `ElevenLabs` / `edge-tts`	4-tier voice synthesis with local voice cloning
Music	`MusicGen` (Meta) / `Freesound`	Local generative music on GPU
Captions	`OpenAI Whisper` (local)	Word-level timestamps for karaoke-style captions
Video	`FFmpeg`	1080×1920 compositing: backgrounds, code blocks, diagrams, captions
Diagrams	`Mermaid CLI` + Chromium	Mermaid → PNG with adaptive theming, overlaid on video
Code Blocks	`Pygments` + `Pillow`	VS Code-style syntax highlighting → PNG
Storage	`Cloudinary` / `SQLite`	Cloud hosting + local content-addressable asset DB
Memory	`Mem0` (self-hosted)	Semantic memory for script context and learning
Dashboard	`FastAPI` + `HTMX` + `Alpine.js`	12-page Command Center (zero build step)
Trends	Reddit / HN / Google Trends / GitHub / YouTube	5-source trend discovery with scoring algorithm
Publishing	Facebook / Instagram / YouTube / TikTok APIs	Multi-platform auto-publish with per-platform optimization
Deployment	`Docker Compose` (6 services)	Services include PostgreSQL, Temporal, Worker, Bot, Dashboard
Testing	`pytest-asyncio`	8 test modules, shared fixtures, async-native

Economics

$0/month Total Recurring Cost

Every component is open-source, free-tier, or self-hosted. The system delivers capabilities comparable to paid SaaS tools (Descript, Opus Clip, Repurpose.io) at zero recurring cost.

Temporal: MIT, self-hosted
LLM (GPT-4o): GitHub Models free tier
ElevenLabs: 10k chars/mo free
Whisper: local CPU, MIT
FFmpeg: local rendering
Cloudinary: 25GB free tier
Facebook API: Graph API
Docker: self-hosted

Want the Full Technical Details?

The architecture deep-dive covers Temporal workflow anatomy, rendering engine internals, intelligence layer design, deployment topology, and the complete project structure.

Architecture Deep-Dive → 🧠 Runs on Cognitive Silo → View Source Code