A technical walkthrough of how the Social Media Automation Engine works under the hood. Each section covers a distinct subsystem — from macro architecture to deployment topology.
The system is composed of 6 interconnected subsystems that form a closed feedback loop. Content performance data feeds back into trend scoring and script generation.
```
TREND DISCOVERY ───→ CONTENT CALENDAR ───→ CONTENT PIPELINE ───→ MULTI-PLATFORM PUBLISHER
       ↑                                          │                         │
       │                                          │                         ↓
INTELLIGENCE LAYER ←───── ANALYTICS ENGINE ←──────┴─────────────────────────┘
```

Each subsystem is a separate Temporal workflow running on an independent schedule. The pipeline is event-sourced — every state transition is durable and replayable.
**Trend Discovery** — Scans 5 data sources (Reddit, HN, Google Trends, GitHub, YouTube) every 4 hours. Normalizes scores, deduplicates via trigram Jaccard similarity, filters by niche, and checks Mem0 for already-covered topics.
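Trigram Jaccard deduplication compares the character-trigram sets of two titles; near-duplicates share most trigrams. A minimal sketch (the 0.6 threshold is an illustrative assumption, not the engine's actual setting):

```python
def trigrams(text: str) -> set[str]:
    """Character trigrams of a lowercased, whitespace-collapsed title."""
    t = " ".join(text.lower().split())
    return {t[i:i + 3] for i in range(len(t) - 2)}

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of two trigram sets; 1.0 means identical."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def dedupe(titles: list[str], threshold: float = 0.6) -> list[str]:
    """Keep a title only if it is not too similar to any title already kept."""
    kept: list[str] = []
    for title in titles:
        if all(jaccard(title, k) < threshold for k in kept):
            kept.append(title)
    return kept
```

Trigram Jaccard is cheap, language-agnostic, and robust to small edits (punctuation, casing, word order within a phrase), which is exactly the kind of noise cross-source trend titles have.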
**Content Calendar** — Picks the top-scored topics and assigns optimal publish times based on historical engagement data. Respects per-platform cooldowns and daily production limits.
**Content Pipeline** — The core 11-step workflow, from script generation through video rendering to approval. Durable execution via Temporal, with 5 QA gates and cascading fallback chains.
**Multi-Platform Publisher** — Publishes to Facebook (Graph API), Instagram (Graph API), YouTube (Data API v3), and TikTok (API), with per-platform metadata optimization (aspect ratio, hashtags, descriptions).
**Analytics Engine** — Collects engagement metrics at checkpoints 1h, 6h, 24h, 48h, and 168h post-publish. Calculates viral scores and engagement rates per platform.
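The per-platform engagement rate can be as simple as interactions over views; the sketch below assumes that definition (the real viral-score formula is not reproduced here):

```python
from dataclasses import dataclass

@dataclass
class Metrics:
    """Engagement snapshot pulled at a checkpoint (1h, 6h, 24h, ...)."""
    views: int
    likes: int
    comments: int
    shares: int

def engagement_rate(m: Metrics) -> float:
    """Interactions per view; 0.0 when the post has no views yet."""
    if m.views == 0:
        return 0.0
    return (m.likes + m.comments + m.shares) / m.views
```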
**Intelligence Layer** — Learns from performance data: hook type analysis (6 styles), content feature extraction, audience fingerprinting. Enriches scriptwriter prompts via Mem0 semantic memory.
The ContentPipelineWorkflow is a Temporal durable workflow with 11 activity steps, 5 QA gates, and a human approval signal. Rejection loops back to step 2 with accumulated feedback (up to 5 rounds).
1. **Media analysis** — GPT-4o Vision analyses photos/videos sent to the bot, extracting context, mood, and subjects for script generation. *(optional — skipped for text-only)*
2. **Script generation** — A multi-provider LLM generates the video script, enriched with Mem0 learned preferences and intelligence insights. *(QA_SCRIPT — 3× retry with corrective feedback)*
3. **User notification** — Telegram notification to the user: script preview and generation progress.
4. **Voiceover** — 4-tier TTS chain: XTTS-v2 (local voice cloning) → Speaches/Kokoro → ElevenLabs → edge-tts. *(QA_TTS — reverse transcription verification via Whisper)*
5. **Captions** — OpenAI Whisper (local) generates word-level timestamps; a custom ASS generator produces karaoke-style word-by-word highlighting. *(QA_CAPTIONS — coverage, timing, font, spatial overlap checks)*
6. **Background music** — MusicGen (local GPU) → Freesound API → ElevenLabs → cached loops → FFmpeg procedural ambient. *(best-effort — pipeline continues without music)*
7. **Stock footage** — Pexels API with freshness-aware selection; falls back to procedural FFmpeg animations (gradient, bokeh). *(best-effort)*
8. **Image generation** — ComfyUI + FLUX on the local GPU, with DALL-E 3 as cloud fallback. Used for visual overlays in the video. *(best-effort)*
9. **Rendering** — FFmpeg compositing: animated backgrounds, text cards, code blocks, diagrams, karaoke captions, audio mixing. 1080×1920 output. *(QA_RENDER + QA_VISUAL — frame sampling, black/frozen detection)*
10. **Approval** — Upload to Cloudinary → preview sent via Telegram → a Temporal Signal pauses the workflow indefinitely until the user taps Approve/Reject. *(Temporal Signal — indefinite pause)*
11. **Publishing** — Publish to all enabled platforms (Facebook, Instagram, YouTube, TikTok) with per-platform metadata, and log the result to the DB. *(rejection → loop to step 2, up to 5 rounds)*

Four supporting workflows run on Temporal Schedules, operating autonomously alongside the main content pipeline.
| Workflow | Schedule | Purpose | Key Activities |
|---|---|---|---|
| TrendScanWorkflow | Every 4h | Discover trending topics from 5 data sources | Scan → normalize → deduplicate → filter by niche → store to queue |
| ContentSchedulerWorkflow | Every 12h | Pick top topics, assign optimal publish slots | Score ranking → time slot optimization → platform assignment |
| AnalyticsCollectorWorkflow | Every 4h | Collect engagement metrics at checkpoints | Pull metrics at 1h, 6h, 24h, 48h, 168h → calculate viral scores |
| IntelligenceWorkflow | Every 24h | Learn from performance data | Hook analysis → feature extraction → audience profiling → Mem0 update |
The render activity (909 lines) builds complex FFmpeg filter chains that compose multiple visual layers into a final 1080×1920 vertical video with synchronized audio.
**Animated backgrounds** — Gradient animations with hue rotation plus soft bokeh glow effects. Falls back to Pexels stock footage or solid gradients.
**Code cards** — Pygments + Pillow render VS Code-style syntax-highlighted code to PNG: title bar, line numbers, Monokai theme.
**Diagrams** — Mermaid CLI (mmdc) + Chromium renders diagrams to PNG. Adaptive theming: samples frame brightness to pick a dark or light theme.
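The adaptive-theming decision reduces to a luminance threshold over sampled frame pixels. A sketch, assuming Rec. 709 luminance weights and a 0.5 cutoff (both illustrative, not the project's calibrated values):

```python
def mean_luminance(pixels: list[tuple[int, int, int]]) -> float:
    """Average relative luminance (0-1) of sampled RGB pixels, Rec. 709 weights."""
    total = sum(0.2126 * r + 0.7152 * g + 0.0722 * b for r, g, b in pixels)
    return total / (255 * len(pixels))

def pick_mermaid_theme(pixels: list[tuple[int, int, int]]) -> str:
    """Bright frame -> Mermaid's light 'default' theme; dark frame -> 'dark'."""
    return "default" if mean_luminance(pixels) > 0.5 else "dark"
```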
**Karaoke captions** — Custom ASS generator with word-by-word highlighting: 68px MrBeast/Hormozi style, word-level timestamps from Whisper.
**Audio mix** — Voiceover and generated background music at calibrated levels, with ducking while speech is active and fade in/out.
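One way to express the ducking step is FFmpeg's `sidechaincompress` filter, keyed by the voiceover so the music drops whenever speech is active. A sketch of the filtergraph builder (the threshold/ratio defaults are illustrative, not the project's actual calibration):

```python
def build_ducking_filter(threshold: float = 0.05, ratio: float = 8.0) -> str:
    """Voice ([0:a]) ducks music ([1:a]): split the voice so one copy keys the
    compressor while the other is mixed into the final output."""
    return (
        "[0:a]asplit=2[voice][sidechain];"
        f"[1:a][sidechain]sidechaincompress=threshold={threshold}:ratio={ratio}[ducked];"
        "[voice][ducked]amix=inputs=2:duration=first[out]"
    )
```

The `asplit` is required because a filtergraph input label can only be consumed once: the voice stream feeds both the compressor's sidechain and the final mix.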
**Layout engine** — Centralized 246-line positioning logic with platform-specific safe zones (TikTok, Instagram, YouTube) and intelligent overlay coexistence.
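In practice, a safe zone is the region of the 1080×1920 canvas that platform UI chrome does not cover, and positioning means clamping an overlay into it. A minimal sketch (the margin values are illustrative guesses, not the engine's real numbers):

```python
# Illustrative safe-zone margins (left, top, right, bottom) in pixels on a
# 1080x1920 canvas; the real per-platform values will differ.
SAFE_ZONES = {
    "tiktok":    (60, 140, 160, 320),   # wide right rail + tall bottom UI
    "instagram": (60, 180, 60, 260),
    "youtube":   (60, 100, 60, 220),
}

def clamp_overlay(x: int, y: int, w: int, h: int, platform: str,
                  canvas: tuple[int, int] = (1080, 1920)) -> tuple[int, int]:
    """Return the nearest (x, y) that keeps a w x h overlay inside the safe zone."""
    left, top, right, bottom = SAFE_ZONES[platform]
    cw, ch = canvas
    x = max(left, min(x, cw - right - w))
    y = max(top, min(y, ch - bottom - h))
    return x, y
```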
```python
# Build the FFmpeg filter graph for final video composition
def build_filter_chain(ctx: RenderContext) -> str:
    filters = []

    # 1. Background: animated gradient OR Pexels stock footage
    if ctx.background_video:
        filters.append("[0:v]scale=1080:1920,setsar=1[bg]")
    else:
        filters.append(generate_gradient_background(ctx.duration))

    # 2. Overlays: code blocks, diagrams (position from layout engine)
    for overlay in ctx.overlays:
        x, y = layout_engine.position(overlay, ctx.platform)
        filters.append(f"[bg][{overlay.id}]overlay={x}:{y}:enable='...'")

    # 3. Captions: ASS subtitle burn-in with karaoke styling
    filters.append(f"subtitles={ctx.caption_file}:force_style='...'")

    # 4. Audio: mix voiceover + music with ducking
    filters.append("[voice][music]amix=inputs=2:duration=first")

    return ";".join(filters)
```
The system discovers content ideas autonomously and learns from its own performance to improve over time.
```
score = popularity         × 0.35
      + velocity           × 0.30
      + freshness          × 0.20
      + multi_source_bonus × 0.15
      − coverage_penalty   × 0.50
```
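The same formula as a direct translation, assuming each input is normalized to [0, 1] (the real model's field names may differ):

```python
def trend_score(popularity: float, velocity: float, freshness: float,
                multi_source_bonus: float, coverage_penalty: float) -> float:
    """Weighted trend score; inputs assumed normalized to [0, 1], so a
    fully-covered topic (penalty = 1.0) loses half the maximum score."""
    return (popularity * 0.35
            + velocity * 0.30
            + freshness * 0.20
            + multi_source_bonus * 0.15
            - coverage_penalty * 0.50)
```

Note the positive weights sum to 1.0, while the coverage penalty alone can drive the score negative, which is the point: an already-covered topic should lose to a fresh one even if it is still popular.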
Self-hosted Mem0 stores learned insights as semantic memories. The LLM scriptwriter receives relevant memories as context — creating a genuine feedback loop where each content piece is informed by the performance of all previous ones.
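Conceptually, enrichment searches Mem0 for memories relevant to the topic and prepends them to the scriptwriter's system prompt. A sketch of the assembly step, where the `search` callable stands in for the Mem0 client (not reproduced here):

```python
def enrich_prompt(base_prompt: str, topic: str, search) -> str:
    """Prepend the top memories relevant to `topic` to the scriptwriter prompt."""
    memories = search(topic, limit=5)   # e.g. a Mem0 semantic search
    if not memories:
        return base_prompt              # no learned context yet: use prompt as-is
    learned = "\n".join(f"- {m}" for m in memories)
    return f"Learned from past performance:\n{learned}\n\n{base_prompt}"
```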
SQLite for structured data + Mem0 for semantic memory. Pydantic v2 models enforce type safety across all boundaries.
Built with FastAPI + Jinja2 + HTMX + Alpine.js + Tailwind CSS. Zero build step — no webpack, no npm. 2,905 lines serving 12 pages with real-time updates via HTMX polling.
Docker Compose orchestrates 6 services. A separate local GPU server ("Cognitive Silo") runs all AI models — keeping the pipeline fully self-hosted.
```dockerfile
FROM python:3.11-slim

# System deps: FFmpeg for rendering, Node.js + Chromium for Mermaid diagrams
RUN apt-get update && apt-get install -y \
    ffmpeg fonts-dejavu \
    nodejs npm chromium

# Mermaid CLI for diagram rendering via Puppeteer
RUN npm install -g @mermaid-js/mermaid-cli

WORKDIR /app

# Python deps + Whisper for local captioning
COPY pyproject.toml .
RUN pip install -e . && pip install openai-whisper

COPY . .
```
~24,700 lines of Python across 80+ source files. Every module has a clear responsibility boundary.
```
sma/
├── bot/                        # Telegram bot: handlers, keyboards, 12 commands
│   └── main.py
├── config/                     # YAML config + Pydantic Settings + secrets
│   ├── config.yaml
│   └── loader.py               # Strongly typed hierarchical config
├── dashboard/                  # FastAPI + Jinja2: 12 pages, 2,905 lines
│   ├── app.py
│   ├── templates/              # HTMX + Alpine.js + Tailwind
│   └── static/
├── data/                       # SQLite DB: content-addressable asset store
│   ├── db.py
│   └── models.py
├── logger/                     # structlog: dual-write JSONL + human-readable
├── models/                     # Pydantic v2: 4 schema modules
│   ├── schemas.py              # Pipeline data models
│   ├── trend_models.py         # Trend discovery models
│   ├── scheduler_models.py     # Content calendar models
│   └── autonomy_models.py      # Autonomy control models
├── pipeline/                   # Temporal workflows + activities
│   ├── workflow.py             # ContentPipelineWorkflow (11 steps)
│   ├── worker.py               # Registers 5 workflows + 55 activities
│   ├── trend_workflow.py       # TrendScanWorkflow (4h schedule)
│   ├── scheduler_workflow.py   # ContentSchedulerWorkflow (12h)
│   ├── analytics_workflow.py   # AnalyticsCollectorWorkflow (4h)
│   ├── intelligence_workflow.py
│   ├── layout_engine.py        # Platform-specific safe zones (246 lines)
│   └── activities/             # 27 activity modules
│       ├── ai_activity.py      # Multi-provider LLM orchestration
│       ├── tts_activity.py     # 4-tier TTS fallback chain
│       ├── caption_activity.py # Whisper + ASS generator
│       ├── render_activity.py  # FFmpeg composition (909 lines)
│       ├── qa_activity.py      # 5 QA gates (807 lines)
│       ├── code_renderer.py    # VS Code-style code → PNG
│       ├── diagram_activity.py # Mermaid → PNG
│       ├── facebook_activity.py    # Facebook Graph API
│       ├── instagram_activity.py   # Instagram Graph API
│       ├── youtube_activity.py     # YouTube Data API v3
│       ├── tiktok_activity.py      # TikTok API
│       └── multiplatform_activity.py
├── prompts/                    # AI system prompts for each pipeline stage
├── services/                   # Business logic services
│   ├── ai_service.py           # Multi-provider gateway
│   ├── trend_scanner.py        # 5-source scanner (504 lines)
│   ├── trend_aggregator.py     # Normalize + score + dedup (275 lines)
│   ├── intelligence.py         # Self-learning brain (603 lines)
│   ├── autonomy.py             # 3-level autonomy with safety guards
│   └── posting_optimizer.py    # Time slot optimization
├── tests/                      # pytest-asyncio: 8 test modules
│   ├── conftest.py             # Fixtures: isolated DB, seeded data (248 lines)
│   ├── test_e2e.py             # End-to-end integration (431 lines)
│   ├── test_qa.py              # All 5 QA gates (388 lines)
│   ├── test_intelligence_unit.py   # Intelligence brain unit tests (374 lines)
│   └── ...                     # + regression, diagram, integration tests
├── docker-compose.yaml         # 6 services with health checks
├── Dockerfile                  # Python 3.11 + FFmpeg + Node.js + Chromium + Whisper
└── pyproject.toml              # Dependencies + project metadata
```
Architecture-first engineering: spec before code. Every feature started as a plan document with alternatives considered, risks assessed, and implementation phased.
The system supports three autonomy levels, controllable via Telegram or the dashboard.