Wan Yi

Pursuit of Perfection in Transcription: A Deep Dive into Multi-Model Technology in SubEnvoy 1.2.1

Pre-release deep dive into SubEnvoy 1.2.1 for macOS, featuring the new multi-model switching capability. Learn the technical differences between Lite and Pro models and why iOS performance constraints shaped our strategy.

In the upcoming SubEnvoy 1.2.1 release, we are bringing a major update for macOS users: Multi-Model Switching.

As a developer deeply rooted in the Apple ecosystem, I have always strived to balance on-device AI against user experience. Today, I want to share a real-world comparison between the built-in Lite and Pro models, and explain why our technical strategy differs between platforms.


🔬 Lite vs. Pro: What the Data Shows

[Figure: transcription progress on the Hokuto no Ken benchmark]

Using Episode 4 of the classic anime Fist of the North Star (Season 1) as a benchmark, we compared official Japanese ASS subtitles with the results from both models.
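To make this kind of comparison concrete, here is a minimal sketch of one way to score a transcript against reference subtitles. It is not SubEnvoy's evaluation code, and the article does not say which metric was used. Character error rate (CER) is an assumption here, chosen because Japanese text has no spaces to split words on. The function names are hypothetical, and the dialogue text is assumed to be already extracted from the ASS file.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    # prev[j] holds the distance between the processed prefix of
    # `reference` and the first j characters of `hypothesis`.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

A lower CER means the model output is closer to the official subtitle line. A line that matches exactly scores 0.0.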

1. The “Divide” in Text Precision

When dealing with complex proper nouns and technique names, the Pro model shows an overwhelming advantage.

  • Technique Recognition: For specific terms like “Hokuto Jūha Zan” (北斗柔破斬), the Pro model transcribes the name with very high accuracy. In contrast, the Lite model occasionally “hallucinates,” replacing obscure terms with more common ones.
  • Emotional Nuance: The Pro model doesn’t just transcribe text; it captures subtle breathing, laughter (like “fufufu”), and pauses. In our tests, the Pro model captured Yuria’s confession “A… Ai shimasu” with its hesitation intact, whereas the Lite model flattened it.

2. Interference Resistance in Complex Scenes

Action-heavy content often comes with loud background music (BGM) and sound effects.

  • Pro Model: Holds up very well against noise, picking clear vocals out of chaotic backgrounds.
  • Lite Model: In extremely noisy environments, it can sometimes get stuck in a loop of filler words or skip uncertain segments entirely.
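The looping failure described for the Lite model is often handled with a simple post-processing pass. The sketch below is a generic heuristic, not something the article says SubEnvoy does. The function name and the `max_repeats` threshold are hypothetical.

```python
from itertools import groupby


def collapse_repeats(segments: list[str], max_repeats: int = 2) -> list[str]:
    """Cap runs of identical consecutive segments at `max_repeats`."""
    cleaned = []
    # groupby yields one group per run of equal, adjacent items.
    for text, run in groupby(segments):
        run_length = sum(1 for _ in run)
        cleaned.extend([text] * min(run_length, max_repeats))
    return cleaned
```

For example, four consecutive “あの” segments followed by real dialogue would be trimmed to two, while non-adjacent repeats are left alone.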

3. Surprising Lyric Transcription

Interestingly, the Pro model even attempts to transcribe the English lyrics in the opening theme (OP). Heavily accented lines still trip it up (e.g., “You wa Shock!” becoming “You are Shark!”), but this attention to detail is exactly what sets the Pro model apart.


💻 Why is Model Switching Exclusive to macOS?

Many users ask: “If the Pro model is so powerful, why can’t I switch on my iPhone or iPad?”

It comes down to weighing best practices against the limits of mobile performance.

The “Heavyweight” Pro Model

SubEnvoy’s Pro model uses a deeper neural network, with a model size exceeding 1 GB. On macOS, thanks to the Unified Memory Architecture and thermal headroom of M-series chips, the Pro model runs comfortably.

The iOS Performance Ceiling

On iOS devices, the situation is very different:

  1. Memory (RAM) Footprint: Loading a model larger than 1 GB puts heavy pressure on system memory. To keep the whole system responsive, iOS strictly limits the RAM usage of individual apps.
  2. Energy Efficiency: Forcing the Pro model on mobile would lead to rapid heat buildup, thermal throttling, and excessive battery drain, contradicting SubEnvoy’s mission as a “productivity tool.”
  3. User Experience: To ensure every iOS user gets a fast, stable, and battery-efficient transcription experience, we currently use a deeply optimized balanced model in the iOS version, rather than offering the bulky Pro model.
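The memory argument above can be sketched as a simple budget check. This is an illustration only, not SubEnvoy's logic. The per-app budget is an input because iOS limits vary by device, and the 1.5x overhead factor for working buffers is a hypothetical placeholder.

```python
GIB = 1024 ** 3


def estimated_peak_bytes(model_bytes: int, overhead_factor: float = 1.5) -> int:
    """Rough peak memory: model weights plus working buffers (assumed factor)."""
    return int(model_bytes * overhead_factor)


def model_fits(model_bytes: int, app_budget_bytes: int,
               overhead_factor: float = 1.5) -> bool:
    """True if the estimated peak stays within the app's memory budget."""
    return estimated_peak_bytes(model_bytes, overhead_factor) <= app_budget_bytes


# Hypothetical numbers: a 1 GiB model against two different app budgets.
fits_large_budget = model_fits(1 * GIB, 2 * GIB)
fits_small_budget = model_fits(1 * GIB, 1 * GIB)
```

With these placeholder values the model fits the 2 GiB budget but not the 1 GiB one. That is the shape of the trade-off described above, not a measurement of any device.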

🚀 Version 1.2.1: A New Milestone

In the upcoming Version 1.2.1, Mac users can pick the model that fits the task:

  • Quick Previews & Casual Interviews: Use the Lite model for lightning-fast processing.
  • Movie Translation, Fine-tuned Subtitles, & Academic Lectures: Switch to the Pro model for the most accurate base transcript.

We are exploring more advanced model quantization techniques, aiming to bring the power of the Pro model to mobile devices in the future without sacrificing the experience.
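As a back-of-the-envelope illustration of why quantization helps, the weight storage of a model scales with the number of bits per parameter. The parameter count below is hypothetical, since the article gives only the model size. The estimate ignores the per-block scale metadata that real quantization formats add.

```python
def weights_size_bytes(num_params: int, bits_per_weight: int) -> int:
    """Approximate storage for the weights alone, ignoring format metadata."""
    return num_params * bits_per_weight // 8


# Hypothetical model with 500 million parameters.
PARAMS = 500_000_000
size_fp16 = weights_size_bytes(PARAMS, 16)  # 16-bit floating point
size_int8 = weights_size_bytes(PARAMS, 8)   # 8-bit quantization
size_int4 = weights_size_bytes(PARAMS, 4)   # 4-bit quantization
```

Under these assumptions, going from 16-bit to 4-bit weights cuts storage to a quarter. The open question is how much accuracy survives that compression.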

Thank you for your continued support of SubEnvoy. See you on the Mac with version 1.2.1.


About the Author

Wan Yi is an indie developer and the creator of SubEnvoy. With a portfolio of multiple apps across the Apple ecosystem, he is dedicated to building high-quality native tools that leverage AI to empower user experiences. You can explore more of his work and stories at wanyi.dev.