... an alien forest
... a martian desert
... a grass field
... an underwater world
TL;DR: SphericalDreamer creates large-scale, fully immersive 3D environments from text by generating and fusing multiple panoramic images into a coherent 3D world.
The generation of immersive and navigable 3D environments is increasingly prevalent with the growing adoption of virtual reality and 3D content. However, recent methods face a fundamental limitation: they cannot produce 3D worlds that simultaneously (i) are navigable over long-range spatial extents and (ii) cover the complete omnidirectional field of view (\(360^\circ\) horizontally and \(180^\circ\) vertically). To address this challenge, we introduce SphericalDreamer, a method for generating fully immersive and long-range 3D outdoor environments from textual prompts. Our approach is built on the generation of multiple panoramic images, which are subsequently lifted into 3D and fused together while maintaining visual and geometric consistency. SphericalDreamer produces highly detailed, fully immersive 3D environments, while substantially improving scale and navigability compared to prior approaches.
SphericalDreamer creates fully immersive 3D environments with complete 360° horizontal and 180° vertical coverage, allowing users to freely explore and navigate within the scene.
Generating longer worlds. 3D worlds generated with varying numbers of fused panoramas (N = 3, 5, 7).
Qualitative comparison over the full \(180^\circ \times 360^\circ\) field of view across distant camera viewpoints. Our proposed method, SphericalDreamer, is the only one that supports high-quality, full omnidirectional coverage across distant camera viewpoints. In comparison, SceneScape and WonderJourney renderings are visually plausible only within a restricted field of view, making them non-immersive. For LucidDreamer and LayerPano3D, the second camera lies outside the 3D world, making them non-navigable.
Ablation of LDI and harmonic blending. Rendered frames from 3D worlds generated with and without LDI, and with alternative blending strategies (naive blending, depth interpolation, depth inpainting) replacing harmonic blending.
Method overview. SphericalDreamer generates navigable immersive 3D worlds from textual prompts. In Stage I, a set of spherical building blocks \(\{\mathcal{S}_i\}_{i=0}^{N-1}\) is generated by lifting multiple text-generated layered depth panoramas into 3D. Each block \(\mathcal{S}_i\), also referred to as a sphere, can be geometrically transformed to create a connection interface on its right side, left side, or both. In Stage II, consecutive spheres \(\mathcal{S}_i\) and \(\mathcal{S}_{i+1}\) are positioned to face each other, forming a capsule-like configuration with a missing central region. An intermediate RGB–D view is rendered, its missing regions are inpainted, and the newly synthesized content is lifted into 3D to produce a filling block \(\mathcal{B}_i^{\mathrm{fill}}\), thereby completing the connection between the two spheres. In Stage III, all spheres and filling blocks are assembled to produce the complete 3D world.
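The lifting step in Stage I can be illustrated with a minimal sketch, assuming an equirectangular panorama whose depth values are radial distances from the camera. The function name `lift_panorama` and the axis convention (y up, z forward) are assumptions made for illustration, not the authors' implementation, and layered depth (LDI) handling is omitted.

```python
import numpy as np

def lift_panorama(depth: np.ndarray) -> np.ndarray:
    """Unproject an equirectangular depth map (H, W) into 3D points (H, W, 3).

    Assumes depth stores the radial distance along each viewing ray
    (hypothetical convention: y up, z forward).
    """
    h, w = depth.shape
    # Pixel centers mapped to longitude in (-pi, pi) and latitude in (-pi/2, pi/2).
    lon = ((np.arange(w) + 0.5) / w - 0.5) * 2.0 * np.pi
    lat = (0.5 - (np.arange(h) + 0.5) / h) * np.pi
    lon, lat = np.meshgrid(lon, lat)  # both of shape (H, W)
    # Unit viewing direction for every pixel on the sphere.
    dirs = np.stack(
        [np.cos(lat) * np.sin(lon), np.sin(lat), np.cos(lat) * np.cos(lon)],
        axis=-1,
    )
    # Scale each unit direction by its depth to obtain a 3D point.
    return depth[..., None] * dirs

# A constant-depth panorama lifts to points on a sphere of that radius.
points = lift_panorama(np.full((4, 8), 2.0))
```

Translating the resulting points by a per-sphere offset would place several such blocks side by side. How the spheres are deformed to create connection interfaces and how the gap between them is inpainted are specific to the method and are not covered by this sketch.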
@inproceedings{schnepf2026sphericaldreamer,
title={{SphericalDreamer: Generating Navigable Immersive 3D Worlds with Panorama Fusion}},
author={Antoine Schnepf and Karim Kassab and Flavian Vasile and Andrew Comport},
booktitle={Forty-third International Conference on Machine Learning},
year={2026}
}