by Alibaba ATH Innovation Division
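An upcoming world model from Alibaba that lets creators direct and explore generative 3D environments in real time, rather than produce passive video clips.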

Happy Oyster sits in an emerging category that Alibaba is calling interactive world models — closer to a generative game engine than to a text-to-video system. Two modes (Directing and Wandering) correspond to two jobs a creator actually has: shaping the scene, and living inside it. Most of the 2026 video-model race is still optimizing frame quality; Happy Oyster is instead optimizing for what happens after the first generation — whether a scene is a throwaway artifact or a place you can return to.
The three things that separate Happy Oyster from the video-model pack. Two of these claims hold up under hands-on testing; the audio one needs more samples before I'd commit.
The re-entry promise is the load-bearing claim. If the geometry stays consistent across sessions, this is a generative game engine, not a video model. Test it by walking the same scene twice an hour apart and comparing screenshots from the same camera position.
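To make the two-visit comparison less of an eyeball judgment, here is a minimal sketch that scores how far two captures drift apart. It assumes you have already decoded both screenshots into same-size grids of grayscale values (0 to 255) with whatever imaging tool you prefer; the function names and the threshold of 8.0 are illustrative choices of mine, not anything Happy Oyster provides.

```python
from typing import Sequence

# A screenshot reduced to rows of grayscale values in the range 0-255.
Grid = Sequence[Sequence[int]]


def mean_abs_diff(a: Grid, b: Grid) -> float:
    """Average per-pixel absolute difference between two same-size grids."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("screenshots must have identical dimensions")
    count = sum(len(row) for row in a)
    if count == 0:
        raise ValueError("screenshots must not be empty")
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return total / count


def looks_consistent(a: Grid, b: Grid, threshold: float = 8.0) -> bool:
    """True when the captures differ by less than `threshold` on average.

    The default threshold is an arbitrary illustrative value, not a
    calibrated one; tune it against captures you know to be identical.
    """
    return mean_abs_diff(a, b) < threshold


# Toy data standing in for two captures from the same camera position.
first_visit = [[10, 10, 200], [10, 10, 200]]
second_visit = [[12, 9, 198], [10, 11, 200]]

print(mean_abs_diff(first_visit, second_visit))
print(looks_consistent(first_visit, second_visit))
```

A low score only says the pixels match from that one viewpoint. Lighting changes or animated elements can raise it even when the geometry is stable, so treat it as a first filter and still compare a few positions by eye.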
Build, tweak, and re-stage a 3D scene in real time using natural-language prompts.
First-person exploration of generated worlds with persistent geometry between camera moves.
Audio and visuals generated jointly so footsteps, ambience, and actions stay in sync without post-processing.
Produces explorable 3D environments rather than 2D video frames, enabling re-entry from new angles.
Export a generated scene as a glTF/USDZ asset for use in downstream 3D tools. Only surfaced on the brand hub for now.
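If the export feature does ship, a quick sanity check on the file is worth doing before you pull it into a downstream tool. The sketch below assumes the export is a text `.gltf` file (JSON), not a binary `.glb`, and relies only on the glTF 2.0 rule that a document must carry an `asset` object with a `version` field. The helper name `gltf_summary` and the sample document are hypothetical; nothing here reflects what Happy Oyster actually emits.

```python
import json


def gltf_summary(text: str) -> dict:
    """Parse a JSON .gltf document and report a few top-level counts."""
    doc = json.loads(text)
    if not isinstance(doc, dict):
        raise ValueError("not a glTF document: top level is not an object")
    asset = doc.get("asset")
    # glTF 2.0 requires an `asset` object with a `version` string.
    if not isinstance(asset, dict) or "version" not in asset:
        raise ValueError("not a glTF document: missing asset.version")
    return {
        "version": asset["version"],
        "scenes": len(doc.get("scenes", [])),
        "nodes": len(doc.get("nodes", [])),
        "meshes": len(doc.get("meshes", [])),
    }


# Hand-written stand-in for an exported scene, for illustration only.
sample = json.dumps({
    "asset": {"version": "2.0"},
    "scenes": [{"nodes": [0]}],
    "nodes": [{"mesh": 0}],
    "meshes": [{"primitives": [{"attributes": {"POSITION": 0}}]}],
})

print(gltf_summary(sample))
```

An export with zero meshes or zero nodes would be a sign that the file holds a camera path or an empty shell, not the geometry you walked through.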
The headline claim worth testing is the re-entry promise: walking back through a scene and finding consistent geometry. If that holds, Happy Oyster is not competing with Sora, it is competing with Unreal Engine's prototyping workflow.
Skip step 2 the first time — generate, walk in, see if you like the bones of the world before you start sculpting. Saves 20 minutes when the prompt was wrong anyway.
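1. Write a natural-language prompt for the scene you want: set the setting, mood, and key objects. Happy Oyster generates the base 3D environment in Directing mode.
2. Adjust lighting, geometry, and objects in real time. Every edit persists, so the scene is yours rather than a one-off output.
3. Step into first-person view to wander. Record camera paths, export clips, or iterate by re-entering the scene; the world stays consistent.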
| Spec | Detail |
|---|---|
| Output type | Interactive 3D world (not pre-rendered video) ✓ |
| Modes | Directing + Wandering ✓ |
| Audio | Natively co-generated with visuals ✓ |
| Access | Public access opened April 2026 ✓ |
| API availability | Public REST API documented ✓ |
| Pricing | $0 free tier, $29/mo Studio ✓ |
| Game-engine export | glTF and USDZ supported ~ |
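Scenario: Rapidly prototype a playable level layout before committing to engine assets
Outcome: Iterate in minutes rather than days, and explore the scene repeatedly
Scenario: Previsualize camera moves in a synthetic scene
Outcome: Directors can walk the scene and lock the storyboard before shooting
Scenario: Build branching environments for installation art and demos
Outcome: A single prompt produces a navigable world rather than a flat video clip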
| vs | On | Happy Oyster | Them |
|---|---|---|---|
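| Sora | Output paradigm | Re-explorable 3D world | Linear video clip |
| Runway | User control after generation | Real-time directing + wandering | Re-prompt and regenerate |
| Kling | Camera freedom | Free first-person traversal | Camera path fixed at generation time |
| Veo | Audio | Natively co-generated | Generated separately or absent |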
Quotes gathered from public threads. Not endorsements, just receipts that this is getting real-world use.
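Spent 40 minutes in the same scene in Happy Oyster. It's not a video model; it's a game engine that responds to you.
Re-entered the scene I built yesterday and the geometry was exactly the same. This is the key point everyone is overlooking.
First hands-on with the Happy Oyster early-access build: our camera walkthrough at 4:12 shows what real re-entry consistency looks like.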
Start with "what is Happy Oyster" if you just got here. The comparison articles are the fastest read if you already know Sora/Runway and want to place this model on the map.
Worth 15 minutes of early-access time if you build anything interactive — games, previs, installations. Not worth it yet if you just need a video clip; Kling or Veo will be cheaper and faster for that job.