Hobby Project Production Running

Social Media Automation Engine

End-to-end autonomous pipeline: send a text, photo, or video to a Telegram bot → AI generates a script → voiceover → captions → video render → you approve → auto-publish to Facebook, Instagram, YouTube, and TikTok. Fully self-hosted. Zero recurring cost.

24,700 Lines of Python
80+ Source Files
5 Temporal Workflows
55 Pipeline Activities
$0 Monthly Cost
The 11-Step Pipeline

A Temporal durable workflow orchestrates every step. If any step fails, it retries automatically. If the process crashes, it resumes exactly where it left off.

💬 Telegram Input: text / photo / video
👁 Media Analysis: GPT-4o Vision
✍️ Script Generation: LLM + QA Gate
🎙 Voice Over: TTS + QA Gate
📝 Captions: Whisper + QA
🎵 Background Music: MusicGen
🎬 Video Render: FFmpeg + QA
☁️ Cloud Upload: Cloudinary
Human Approval: Temporal Signal
🚀 Multi-Platform Publish: 4 platforms

Rejection loops back to Script Generation with accumulated feedback (up to 5 rounds)
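
The rejection loop can be sketched as below. This is a minimal illustration and not the project's code: `run_approval_loop`, `generate`, and `review` are hypothetical names, and it assumes "up to 5 rounds" means at most five generate-and-review cycles.

```python
from typing import Callable, List, Optional, Tuple

MAX_ROUNDS = 5  # from the text; read here as five generate-and-review cycles


def run_approval_loop(
    brief: str,
    generate: Callable[[str, List[str]], str],
    review: Callable[[str], Tuple[bool, str]],
    max_rounds: int = MAX_ROUNDS,
) -> Optional[str]:
    """Regenerate until approved, passing every earlier rejection note back in."""
    feedback: List[str] = []  # accumulated across rounds, never reset
    for _ in range(max_rounds):
        draft = generate(brief, feedback)
        approved, note = review(draft)
        if approved:
            return draft
        feedback.append(note)
    return None  # rounds exhausted without approval
```

Each call to `generate` sees the full feedback history, so a later draft can address every earlier rejection rather than only the most recent one.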

Why It's Built This Way

Every architectural choice was deliberate. Here are the six decisions that define the system.

⚙️

Temporal.io for Workflow Orchestration

The entire pipeline is event-sourced via Temporal. Every step is recorded, replayable, and automatically recoverable on crash. Workflow state persists across process restarts.

Alternatives Considered
Celery, custom state machine, cron jobs
Result
Zero state management code. Full replay debugging.
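
To make "resumes exactly where it left off" concrete, here is a toy, standard-library analogue of durable execution. It is not Temporal's implementation and not the project's code. `run_durable` and the JSON journal file are hypothetical stand-ins for Temporal's event history.

```python
import json
import os
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[], str]]  # (step name, function returning a JSON-safe result)


def run_durable(steps: List[Step], journal_path: str, max_attempts: int = 3) -> Dict[str, str]:
    """Run steps in order, journaling each result and skipping steps already journaled."""
    journal: Dict[str, str] = {}
    if os.path.exists(journal_path):
        with open(journal_path) as fh:
            journal = json.load(fh)  # state left behind by a previous, crashed run
    for name, fn in steps:
        if name in journal:
            continue  # finished before the crash; reuse the recorded result
        for attempt in range(1, max_attempts + 1):
            try:
                journal[name] = fn()
                break
            except Exception:
                if attempt == max_attempts:
                    raise  # retries exhausted; journal on disk stays valid
        with open(journal_path, "w") as fh:
            json.dump(journal, fh)  # persist after every completed step
    return journal
```

Running the function a second time with the same journal path skips every step that already completed, which is the behaviour the orchestrator provides without any hand-written bookkeeping like this.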
🧠

Local-First AI with Cascading Fallbacks

Every AI call chains through local GPU → cloud free tier → paid fallback, so no single service outage stops the pipeline.

LLM Chain
Ollama (local) → GitHub Models → Anthropic Claude
Result
$0 cost while the local and free tiers are available. No vendor lock-in. Graceful degradation.
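
A cascading fallback of this kind might look like the sketch below. The provider callables are hypothetical stubs (no real Ollama, GitHub Models, or Anthropic client calls are shown), and `complete_with_fallback` is an illustrative name.

```python
from typing import Callable, List, Sequence, Tuple

Provider = Tuple[str, Callable[[str], str]]  # (label, function taking a prompt)


class AllProvidersFailed(RuntimeError):
    """Raised only when every tier in the chain has failed."""


def complete_with_fallback(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try providers in priority order; return (provider label, response)."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # any failure falls through to the next tier
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

Returning the provider label alongside the response makes it easy to log which tier actually served a request.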

Signal-Based Human-in-the-Loop

Telegram approval uses Temporal Signals: the workflow pauses indefinitely at the approval step, then resumes when the user taps Approve or Reject. No polling. No timeouts.

Mechanism
workflow.wait_condition() → Telegram inline keyboard → Signal
Result
Event-driven. Architecturally clean.
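
The wait-then-signal pattern can be illustrated with plain asyncio. This is an analogue only: it does not use the Temporal SDK, and `ApprovalGate` and `demo` are hypothetical names. In the real workflow the waiting is done by `workflow.wait_condition()` and the wake-up by a Signal.

```python
import asyncio


class ApprovalGate:
    """Parks a coroutine until a decision arrives, with no polling and no timeout."""

    def __init__(self) -> None:
        self._decision = ""
        self._event = asyncio.Event()

    def signal(self, decision: str) -> None:
        # Stand-in for the Telegram button handler sending a Signal.
        self._decision = decision
        self._event.set()

    async def wait(self) -> str:
        await self._event.wait()  # suspended until signal() is called
        return self._decision


async def demo() -> str:
    gate = ApprovalGate()

    async def user_taps_approve() -> None:
        await asyncio.sleep(0.01)  # simulated human delay
        gate.signal("approve")

    tap = asyncio.create_task(user_taps_approve())
    decision = await gate.wait()
    await tap
    return decision
```

Unlike this in-memory version, a Temporal workflow keeps waiting across process restarts, which is what makes an indefinite pause practical.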
🔍

5-Gate QA Pipeline with Auto-Remediation

Five automated validation gates catch issues at every stage. Each gate retries up to 3× with corrective feedback injection before alerting.

Gates
Script → TTS → Captions → Render → Visual (screenshot sampling)
Result
Bugs caught before any human sees output.
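
One way to structure a gate with corrective feedback injection is sketched here. The names `run_gate`, `produce`, `validate`, and `alert` are hypothetical, and the sketch assumes "retries up to 3×" means one initial attempt plus at most three retries.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class GateResult:
    passed: bool
    artifact: str
    attempts: int
    issues: List[str] = field(default_factory=list)


def run_gate(
    produce: Callable[[List[str]], str],
    validate: Callable[[str], List[str]],
    alert: Callable[[str], None],
    max_retries: int = 3,
) -> GateResult:
    """Produce an artifact, validate it, and retry with the issues fed back in."""
    feedback: List[str] = []
    artifact, issues = "", []
    for attempt in range(1, max_retries + 2):  # 1 initial try + max_retries retries
        artifact = produce(feedback)
        issues = validate(artifact)
        if not issues:
            return GateResult(True, artifact, attempt)
        feedback = issues  # corrective feedback for the next attempt
    alert(f"QA gate failed after {max_retries} retries: {issues}")
    return GateResult(False, artifact, max_retries + 1, issues)
```

The producer receives the validator's own issue list, so each retry is a targeted fix rather than a blind regeneration.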
🔐

Content-Addressable Asset Store

Every asset is SHA-256 fingerprinted and stored in SQLite. Trigram Jaccard similarity (threshold 0.35) catches near-duplicate scripts automatically.

Dedup Strategy
SHA-256 fingerprint → trigram Jaccard → same content = same row
Result
Zero redundant work. Automatic deduplication.
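
The two dedup layers can be sketched with the standard library. Several details are assumptions rather than facts from the project: character-level trigrams over lowercased text, "similarity at or above 0.35 counts as a near-duplicate", and the `assets` table layout.

```python
import hashlib
import re
import sqlite3
from typing import Set

NEAR_DUPLICATE_THRESHOLD = 0.35  # value from the text; comparison direction is assumed


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS assets (sha256 TEXT PRIMARY KEY, size INTEGER NOT NULL)"
    )
    return conn


def store_asset(conn: sqlite3.Connection, data: bytes) -> str:
    """Insert by content hash; identical bytes map to the same row."""
    digest = hashlib.sha256(data).hexdigest()
    with conn:  # commits on success
        conn.execute(
            "INSERT OR IGNORE INTO assets (sha256, size) VALUES (?, ?)",
            (digest, len(data)),
        )
    return digest


def trigrams(text: str) -> Set[str]:
    normalized = re.sub(r"\s+", " ", text.lower()).strip()
    return {normalized[i:i + 3] for i in range(len(normalized) - 2)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0  # empty inputs never match


def is_near_duplicate(a: str, b: str, threshold: float = NEAR_DUPLICATE_THRESHOLD) -> bool:
    return jaccard(trigrams(a), trigrams(b)) >= threshold
```

The hash catches byte-identical assets cheaply, while the trigram comparison catches scripts that were reworded slightly and would therefore hash differently.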
📈

Self-Learning Intelligence Layer

The system learns from its own performance data: hook type analysis, audience profiling, trend scoring. Performance metrics feed back into scriptwriter prompts via Mem0.

Feedback Loop
Engagement metrics → hook analysis → prompt enrichment → better scripts
Result
Closed feedback loop: each run's performance data informs the next script.
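
As a rough illustration of the metrics → hook analysis → prompt enrichment path, the sketch below ranks hook types by engagement rate and appends a hint to the scriptwriter prompt. Everything here is hypothetical: the `PostMetrics` fields, the hook labels, and the wording of the hint. It keeps history in a plain list instead of Mem0.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PostMetrics:
    hook_type: str    # e.g. "question" or "statistic" (illustrative labels)
    views: int
    engagements: int  # likes + comments + shares


def hook_performance(history: List[PostMetrics]) -> Dict[str, float]:
    """Engagement rate (engagements / views) aggregated per hook type."""
    totals: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for post in history:
        totals[post.hook_type][0] += post.engagements
        totals[post.hook_type][1] += post.views
    return {hook: eng / views for hook, (eng, views) in totals.items() if views > 0}


def enrich_prompt(base_prompt: str, history: List[PostMetrics]) -> str:
    """Append the best-performing hook type to the prompt, if any data exists."""
    rates = hook_performance(history)
    if not rates:
        return base_prompt
    best = max(rates, key=lambda hook: rates[hook])
    return (
        f"{base_prompt}\n"
        f"Past data: '{best}' hooks had the highest engagement rate "
        f"({rates[best]:.1%}). Prefer that style."
    )
```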
Multi-Tier Fallback Chains

Every external dependency has a cascading fallback. If the local GPU is down, cloud takes over. If cloud is unreachable, a free alternative kicks in.

LLM: Ollama (local GPU) → GitHub Models → Anthropic Claude
TTS: XTTS-v2 (local) → Speaches (local) → ElevenLabs → edge-tts (free)
Music: MusicGen (local) → Freesound API → ElevenLabs → FFmpeg ambient
Image: ComfyUI + FLUX → DALL-E 3
Video: Pexels API → FFmpeg procedural
Tier legend: Local (GPU) · Cloud (free tier) · Free / unlimited
5 Automated QA Gates

Every artifact passes through automated validation before proceeding. Each gate retries 3× with corrective feedback injection, then sends a Telegram alert if remediation fails.

Gate 1: QA_SCRIPT
✓ Word count range
✓ Duration estimate
✓ Hook quality analysis
✓ CTA presence
✓ Code backtick validation

Gate 2: QA_TTS
✓ File size & existence
✓ Duration match
✓ Silence ratio
✓ Reverse transcription (Whisper verification)

Gate 3: QA_CAPTIONS
✓ ASS format parsing
✓ Word coverage %
✓ Timing range validation
✓ Font validation
✓ Spatial overlap detection

Gate 4: QA_RENDER
✓ Resolution (1080×1920)
✓ Duration & codec
✓ File size bounds
✓ Black frame detection
✓ Frozen frame detection

Gate 5: QA_VISUAL
✓ 5-point frame sampling
✓ Brightness analysis
✓ Text visibility check
✓ Layout overlap audit (screenshot-based)
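
As an example of what a single gate might check, here is a sketch of four of the QA_SCRIPT checks (hook quality analysis is left out). All thresholds, the speaking rate, and the CTA keyword list are assumptions chosen for illustration; the project's actual values are not stated.

```python
import re
from typing import List

WORDS_PER_MINUTE = 150          # assumed speaking rate
MIN_WORDS, MAX_WORDS = 60, 170  # hypothetical bounds
MAX_SECONDS = 60.0              # hypothetical short-form limit
CTA_PATTERN = re.compile(r"\b(follow|subscribe|comment|share|like)\b", re.IGNORECASE)


def qa_script(script: str) -> List[str]:
    """Return a list of issues; an empty list means the script passes."""
    issues: List[str] = []
    word_count = len(script.split())
    if not MIN_WORDS <= word_count <= MAX_WORDS:
        issues.append(f"word count {word_count} outside {MIN_WORDS}-{MAX_WORDS}")
    estimated_seconds = word_count / WORDS_PER_MINUTE * 60
    if estimated_seconds > MAX_SECONDS:
        issues.append(f"estimated duration {estimated_seconds:.0f}s exceeds {MAX_SECONDS:.0f}s")
    if not CTA_PATTERN.search(script):
        issues.append("no call to action found")
    if script.count("`") % 2 != 0:
        issues.append("unbalanced code backticks")
    return issues
```

Returning issues as plain strings means the same list can be handed back to the script generator as corrective feedback on a retry.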
Full Stack Breakdown
Layer | Technology | Role
Orchestration | Temporal.io | Durable workflow execution, event sourcing, crash recovery
Bot Interface | python-telegram-bot | User interaction: briefs, approvals, analytics
AI - Script | GPT-4o / DeepSeek / Claude | Multi-provider LLM chain with local-first priority
AI - Vision | GPT-4o Vision / Llama 3.3 | Multimodal media analysis for photos/videos
AI - Image | ComfyUI + FLUX / DALL-E 3 | AI-generated visuals with local GPU priority
TTS | XTTS-v2 / ElevenLabs / edge-tts | 4-tier voice synthesis with local voice cloning
Music | MusicGen (Meta) / Freesound | Local generative music on GPU
Captions | OpenAI Whisper (local) | Word-level timestamps for karaoke-style captions
Video | FFmpeg | 1080×1920 compositing: backgrounds, code blocks, diagrams, captions
Diagrams | Mermaid CLI + Chromium | Mermaid → PNG with adaptive theming, overlaid on video
Code Blocks | Pygments + Pillow | VS Code-style syntax highlighting → PNG
Storage | Cloudinary / SQLite | Cloud hosting + local content-addressable asset DB
Memory | Mem0 (self-hosted) | Semantic memory for script context and learning
Dashboard | FastAPI + HTMX + Alpine.js | 12-page Command Center (zero build step)
Trends | Reddit / HN / Google Trends / GitHub / YouTube | 5-source trend discovery with scoring algorithm
Publishing | Facebook / Instagram / YouTube / TikTok APIs | Multi-platform auto-publish with per-platform optimization
Deployment | Docker Compose (6 services) | PostgreSQL, Temporal, Worker, Bot, Dashboard
Testing | pytest-asyncio | 8 test modules, shared fixtures, async-native
$0/month Total Recurring Cost

Every component is open-source, free tier, or self-hosted. The system delivers capabilities comparable to paid SaaS tools (Descript, Opus Clip, Repurpose.io) at zero cost.

Temporal: $0 (MIT, self-hosted)
LLM (GPT-4o): $0 (GitHub Models free)
ElevenLabs: $0 (10k chars/mo free)
Whisper: $0 (Local CPU, MIT)
FFmpeg: $0 (Local rendering)
Cloudinary: $0 (25GB free tier)
Facebook API: $0 (Graph API)
Docker: $0 (Self-hosted)
Want the Full Technical Details?

The architecture deep-dive covers Temporal workflow anatomy, rendering engine internals, intelligence layer design, deployment topology, and the complete project structure.