HeyGen Apps
Strategic roadmap analysis
36 apps · Two-lens assessment · Prioritized action plan
Legend: category (Video, Audio, Image, Text) · status (Launched, Not yet)
Lens 1 — Competitive advantage × User demand
Where to compete: upper-right = strongest strategic position
Lens 2 — Industry tech maturity × User demand (ICP-weighted)
When to act: upper-left = emerging tech with high demand = golden window
Strategic action plan
Now · P0 capability gaps: complete the core
Video Podcast: Both lenses confirm this is the highest-priority gap. In Lens 1 it sits inside P0 with a deep moat (Avatar multi-character + Voice Cloning, a combination no viable competitor can replicate). In Lens 2 it is at the Stage 1→2 boundary, where Google Podcast API, ElevenLabs GenFM, and ByteDance Coze Space are all pushing the category toward the mainstream. Action: ship within Q2, before the tech window narrows.
Face Swap: The low-hanging fruit. Lens 1 places it at (86, 78): massive demand (200K-800K search volume) with a strong moat from HeyGen's facial reconstruction tech. Lens 2 puts it solidly in Stage 2: proven tech, ready to productize. Unlike Video Podcast, which requires a new creation flow, Face Swap can reuse the existing video processing pipeline. Action: fast-follow after Podcast, aiming for Q3.
Filler Words Remover (expand): Already launched but underexposed. The jump-cut smoothing capability is exclusive (moat y=97, near the ceiling). Package it more prominently as a standalone App to capture the "filler word removal" SEO keyword, while deepening its integration with Instant Highlights.
Next · P1 expansion: two distinct plays
Play 1: High moat — defend and deepen
Product Demo Video Maker · Presentation Videos · Video Redub · Voice Enhance · Voice Swap · Video Inpaint/Edit · Video BG Remover
These apps share a common trait: they apply HeyGen's core assets (Avatar overlay, voice pipeline, video AI) in ways competitors can't easily replicate. In Lens 1 they cluster in the upper band of P1 (moat y=58-80). The play is "defend and deepen": each app extends HeyGen's moat into an adjacent use case. Product Demo Video Maker is especially interesting because Lens 2 puts it in Stage 1, meaning HeyGen can define the category before Loom, VEED, or Synthesia catch up. Prioritize by how directly each app reuses existing infrastructure: Video Redub and Presentation Videos share a pipeline with Video Translate, and Video BG Remover can use existing segmentation models.
Play 2: High demand, lower moat — capture and convert
Auto Captions · AI Video Generator · Voice Clone · Text to Speech · UGC Ad Generator · AI Image Generator
These apps have the opposite profile: massive search volumes (Text to Speech at 500K-2M, Voice Clone at 150K-400K, AI Video Generator at 500K-2M) but thin-to-moderate competitive moats. In Lens 1 they scatter across the lower-right of P1 (high x, low-to-mid y). In Lens 2 they sit mostly in Stage 2-3: the tech is proven and competitors are abundant (ElevenLabs, Runway, Midjourney). The play is not to win on raw capability but to capture traffic and cross-sell into the HeyGen core. A user who clones their voice is one click away from creating an Avatar video with it; a user generating AI video discovers they can add lip-synced translation. Build these as on-ramps, not as standalone products.
Later · P2 utility pack: volume play for "all-in-one" positioning
Don't build these one by one. Bundle Compress, Trim, Resize, Convert, Merge, Video-to-GIF, Change Speed, Rotate, and Downloader into a single "Video Toolkit" pack. Combined raw search volume exceeds 10M/month, a massive SEO surface area, which is why these tools spread far to the right on the Lens 1 demand axis despite sitting at zero moat. Each tool is zero-moat Stage 3 tech, so use lightweight ffmpeg-based implementations, not AI models. The strategic value is the shift in mental model: HeyGen becomes the place for everything video, not just AI avatars. Users who arrive for "compress video" see Avatar Creator, Translate, and Highlights in the sidebar. The funnel: utility tool → awareness → core product trial.
Key insight · Watch the Stage 1 → 2 transition zone
Lens 2 reveals the most time-sensitive signal: apps at the Stage 1/2 boundary (x=26-28) are about to cross from "emerging" to "maturing." That cluster contains Video Agent, Video Podcast, and Filler Words Remover — all three with deep HeyGen moat confirmed by Lens 1. Once these capabilities enter Stage 2, competitors will move fast. The next 2-3 quarters are the window to establish category leadership.

Also in Stage 1 but further from the boundary: Interactive Avatar, Product Demo Video Maker, and Video Inpaint/Edit. These have a longer runway before commoditization, but should be on the active roadmap — not in the "someday" pile.

Contrast this with Stage 3 (TTS, Voice Clone, Auto Captions), where the ship has sailed for differentiation. These apps are still worth having for traffic, but the strategic alpha is gone. Every R&D dollar spent in the Stage 1→2 zone yields more long-term defensibility than a dollar spent on Stage 3 catch-up.