PJ Ace shares his comprehensive AI video production workflow, emphasizing a comedy-first approach with public domain IP to create viral ad content and detailing the tools and strategies used from scripting to final edit.
Takeaways
• Prioritize comedy and public domain IP to create highly engaging, viral AI video ads.
• Use a script-to-images-to-video workflow (ChatGPT for scripting, Rev for image generation, VO3 for animation) to optimize efficiency and quality.
• Embrace rapidly evolving AI tools: current workflows are effective, but future platforms like Sora will further streamline and democratize video production.
PJ Ace, a leading AI video ad creator, unveils his step-by-step process for producing viral AI videos that have garnered hundreds of millions of views. The core strategy involves crafting humorous, Super Bowl-style ads using historical or public domain IP, leveraging AI tools like ChatGPT for scripting, Rev for image generation, and VO3 for animation. The goal is to make content highly entertaining and shareable, mitigating potential backlash against AI-generated media.
Creating Viral AI Video Concepts
• 00:04:52 The initial strategy for viral AI video ads focuses on comedy and ridiculousness, akin to a Super Bowl commercial, to engage viewers and avoid 'pitchforks' from those concerned about AI replacing jobs. This approach emphasizes humor first, with subtle brand integration at the end, ensuring entertainment value and higher ad completion rates. Concepts often draw from recognizable public domain IP and historical moments, combined with comedic juxtapositions and internet trends, to make content relatable and shareable.
• 00:06:14 The first step in the AI video workflow is scripting, often involving professional writers to develop three core concepts for the client. The process starts with a 'big idea' often incorporating recognizable, public domain IP (like Pompeii or the Titanic) to blend familiarity with novelty. ChatGPT assists in brainstorming historical bad advice scenarios and generating initial lines, although human iteration is crucial to refine humor and create comedic contrast.
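The brainstorming step above can be sketched as a reusable prompt template. This is a minimal illustration, assuming the prompt is pasted into (or sent to) ChatGPT and then iterated on by human writers; the helper name and wording are hypothetical, not PJ Ace's actual prompts:

```python
# Sketch of a brainstorming prompt for the "historical bad advice" angle.
# The template wording is an illustrative assumption; the real workflow
# refines ChatGPT's output by hand to sharpen the comedic contrast.

def brainstorm_prompt(ip: str, n_ideas: int = 10) -> str:
    """Build a ChatGPT prompt asking for comedic 'bad advice' scenarios
    set around a recognizable public domain IP or historical moment."""
    return (
        f"You are a comedy writer for a Super Bowl-style ad.\n"
        f"Setting: {ip} (public domain / historical).\n"
        f"List {n_ideas} scenarios where a character confidently gives "
        f"terrible advice moments before disaster strikes. "
        f"Lean into absurd juxtaposition and modern internet humor."
    )

print(brainstorm_prompt("the eruption of Pompeii", 5))
```

Keeping the IP and idea count as parameters makes it easy to rerun the same template across the three core concepts developed for a client.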
• 00:12:15 After locking down the script, it is fed into ChatGPT to convert it into a detailed shot list with specific image prompts for each scene. This 'script-to-images' approach is preferred over direct text-to-video generation because it is more efficient, cost-effective, and allows for client feedback on individual shots before animation. Tools like Rev or NanoBanana are used for generating three variations of each image based on the prompts, allowing for iterative refinement of character poses and scene details through conversational commands.
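The script-to-images handoff can be sketched as a small data pipeline. The structures and field names below are assumptions for illustration, since the video describes doing this conversationally in ChatGPT rather than in code:

```python
# Sketch of the script-to-shot-list structure described above: each scene
# becomes one shot entry with an image prompt, and each prompt yields
# three image variations for the client to review before animation.
# Field names are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Shot:
    scene: int
    image_prompt: str  # fed to an image model such as Rev or NanoBanana
    variations: list = field(default_factory=list)  # three candidate images

def script_to_shot_list(scenes: list) -> list:
    """Convert scene descriptions into a shot list (the step ChatGPT
    performs conversationally in the workflow above)."""
    return [Shot(scene=i + 1, image_prompt=desc) for i, desc in enumerate(scenes)]

shots = script_to_shot_list([
    "Roman merchant cheerfully ignoring smoke rising from Vesuvius",
    "Townsfolk toasting as ash begins to fall",
])
for shot in shots:
    # Placeholder for image-model calls; three variations per prompt.
    shot.variations = [f"{shot.image_prompt} -- variation {v}" for v in (1, 2, 3)]
```

Generating and approving still images per shot is what makes the client-feedback loop cheap: a rejected frame costs one image regeneration, not a full video render.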
• 00:20:45 Once images are generated and selected, they are imported into VO3 for animation, which is currently considered the best tool for realistic talking character performances. The original image prompt is combined with dialogue and camera movement instructions (which can be simplified for AI tools like ChatGPT to understand) to create dynamic animated clips. While VO3 excels at character performance and sound effects, external platforms like Epidemic Sound are used for music to maintain consistency across multiple clips. Other animation tools like Kling, Luma Labs, and Minimax are suitable for non-talking segments or higher resolutions.
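The prompt assembly for the animation step can be sketched as simple string composition: the approved image's original prompt plus dialogue plus a camera instruction. The layout is an illustrative assumption, not VO3's documented input format:

```python
# Sketch of combining the original image prompt with dialogue and a
# camera movement instruction into one animation prompt, as described
# above. The exact field order and labels are assumptions.

def animation_prompt(image_prompt: str, dialogue: str, camera: str) -> str:
    """Compose the text given to the animation tool for one clip."""
    return (
        f"{image_prompt}\n"
        f'Dialogue: "{dialogue}"\n'
        f"Camera: {camera}"
    )

print(animation_prompt(
    "Roman merchant cheerfully ignoring smoke rising from Vesuvius",
    "Honestly? I'd just sleep on it.",
    "slow push-in on the merchant's face",
))
```

Reusing the original image prompt verbatim keeps the animated clip visually consistent with the client-approved still, while plain-language camera directions avoid jargon the model may misread.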