Stitching is a digital process that combines the source videos from a VR camera’s multiple lenses into equirectangular video for playback and distribution in VR.
Image: Steve Cooper
Stitching is a fundamental step in immersive video production. It is a computational process that transforms the multiple fisheye videos captured by a VR camera into a spherical video suitable for playback in a VR headset. Each fisheye video is de-warped and arranged before being blended into a single video in “equirectangular” projection, the standard projection for encoding a spherical image in a rectangle (the same projection is commonly used for world maps).
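The equirectangular mapping itself is simple arithmetic: longitude spans the image width and latitude spans its height, which is why a full 360 frame has a 2:1 aspect ratio. The Python sketch below converts between a viewing direction and pixel coordinates. The function names, and the convention that longitude 0 sits at the image center with “up” at the top row, are illustrative assumptions; individual tools may orient the frame differently.

```python
import math

def direction_to_equirect(lon, lat, width, height):
    """Map a direction (longitude, latitude in radians) to equirectangular pixel coords.

    Assumed convention: lon in [-pi, pi) spans the width with 0 at the center column;
    lat in [-pi/2, pi/2] spans the height with +pi/2 (straight up) at the top row.
    """
    x = (lon / (2 * math.pi) + 0.5) * width
    y = (0.5 - lat / math.pi) * height
    return x, y

def equirect_to_direction(x, y, width, height):
    """Inverse mapping: pixel coordinates back to (longitude, latitude) in radians."""
    lon = (x / width - 0.5) * 2 * math.pi
    lat = (0.5 - y / height) * math.pi
    return lon, lat
```

For a 3840 x 1920 frame, the direction straight ahead (longitude 0, latitude 0) lands at pixel (1920.0, 960.0), the center of the image. A stitcher typically runs the inverse mapping for every output pixel and then asks which lens saw that direction.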
3D-180 media is technically not “stitched”, but the term is commonly used as shorthand for the general process of converting captured media to a VR-ready format.
Stitching is done with specialized software tools or plug-ins. The process is very computationally intensive, so powerful hardware and graphics cards are beneficial. Stitching tools include features such as color matching between lenses, optical flow, editable camera rotations and scaling, edge points (which force a stitch to use one lens instead of another), and image stabilization. Some stitching tools require calibration data from the camera that captured the content; if this data exists, it is usually encoded in the header of the files written to the camera’s media card(s). It may include the camera’s “intrinsics” (sensor and lens specifications, lens warp profiles) and “extrinsics” (the positional offsets and orientations of the lenses), and the stitcher uses this metadata to assist the stitching process.
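To make intrinsics and extrinsics concrete, here is a minimal sketch of how a stitcher might use them to find where a world direction lands in one lens’s fisheye image. It assumes an ideal equidistant fisheye model (image radius proportional to the angle from the optical axis), a lens reaching 95 degrees off-axis, and an extrinsic rotation about the vertical axis only. Real lens warp profiles from calibration data are more detailed, and the class and parameter names here are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Intrinsics:
    """Hypothetical per-lens intrinsics for an ideal equidistant fisheye."""
    focal_px: float  # image radius in pixels per radian off the optical axis
    cx: float        # image-circle center, x (pixels)
    cy: float        # image-circle center, y (pixels)

@dataclass
class Extrinsics:
    """Hypothetical per-lens extrinsics: heading only (no tilt, no positional offset)."""
    yaw: float  # radians; positive turns the lens toward +x (right)

def project_to_fisheye(d: Tuple[float, float, float], intr: Intrinsics, extr: Extrinsics,
                       max_theta: float = math.radians(95)) -> Optional[Tuple[float, float]]:
    """Return fisheye pixel (u, v) for world direction d = (x right, y up, z forward),
    or None if the direction is outside this lens's assumed field of view."""
    x, y, z = d
    c, s = math.cos(extr.yaw), math.sin(extr.yaw)
    # Rotate the world direction into the lens frame (undo the lens yaw).
    xl = x * c - z * s
    zl = x * s + z * c
    norm = math.sqrt(xl * xl + y * y + zl * zl)
    theta = math.acos(max(-1.0, min(1.0, zl / norm)))  # angle from the optical axis
    if theta > max_theta:
        return None
    phi = math.atan2(y, xl)    # angle around the optical axis
    r = intr.focal_px * theta  # equidistant model: r = f * theta
    # Image rows grow downward, so "up" (positive y) moves toward smaller v.
    return intr.cx + r * math.cos(phi), intr.cy - r * math.sin(phi)
```

A direction along a lens’s axis lands at the image-circle center, and a direction behind it returns None, which is one way a stitcher can decide which lenses contribute to each output pixel.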
Many camera manufacturers develop and distribute their own stitching software. These tools are often free for camera owners and are designed to stitch footage from their cameras automatically. However, manufacturer stitchers usually have limited features and little or no ability to customize stitches.
Third-party stitching software supports customization and manual control of stitching. These tools require technical expertise to get the best results, but they are used by most immersive video productions. Examples include SGO’s Mistika VR and, at the high end, Foundry’s Nuke with CaraVR.
Regardless of how a creator plans to stitch immersive video, it is a good idea to capture test footage and run it through the planned post-production workflow before committing to a camera and software tools. This both stress-tests the production workflow and provides a chance to evaluate the camera and its output quality.
360 video is captured using two or more fisheye lenses in a VR video camera. Each fisheye lens projects a hemispherical (or nearly hemispherical) image onto a sensor, where it is recorded as a source video. Adjacent source videos overlap significantly at their edges, and during stitching these overlapping sections are blended together to produce a single 360 spherical video in equirectangular projection.
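The blending of overlapping sections can be illustrated with simple feathering: each lens’s contribution fades out linearly as a direction approaches the edge of that lens’s field of view, and the weights are normalized so they sum to one. This is a minimal sketch under stated assumptions (lenses reaching 95 degrees off-axis, a hypothetical 10-degree ramp, input samples already color-matched). Production stitchers also use techniques such as optical flow, which this does not attempt.

```python
import math
from typing import List, Tuple

def feather_weight(theta: float, max_theta: float, ramp: float) -> float:
    """Weight 1.0 well inside a lens's view, falling linearly to 0.0 at its edge.

    theta is the angle (radians) between the direction and the lens's optical axis.
    """
    if theta >= max_theta:
        return 0.0
    return min(1.0, (max_theta - theta) / ramp)

def blend(samples: List[Tuple[float, float]],
          max_theta: float = math.radians(95),
          ramp: float = math.radians(10)) -> float:
    """Blend (pixel_value, theta) samples from each lens into one output value."""
    weights = [feather_weight(theta, max_theta, ramp) for _, theta in samples]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("direction is not covered by any lens")
    # Normalized weighted average: weights always sum to one after division.
    return sum(w * value for w, (value, _) in zip(weights, samples)) / total
```

For a back-to-back two-lens camera, a direction exactly on the seam is 90 degrees from both optical axes, so both lenses get equal weight and the result is the average of the two samples. A direction 60 degrees from one axis is 120 degrees from the other and takes its value entirely from the first lens.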
Stereoscopic 3D-360 VR cameras usually have six or more lenses and sensors. Stitching 3D-360 requires complete oversampling of every point in the world: everything the camera sees must be captured by at least two adjacent lenses. Stitching 3D-360 video is much more complicated than stitching mono 2D-360, so 3D-360 productions should consider bringing in a stitching specialist to make sure the output is comfortable to view. The output of a 3D-360 stitcher is two monoscopic 360 equirectangular videos, one for each eye.
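The oversampling requirement can be checked geometrically. The sketch below considers only directions on the horizon for a hypothetical ring of evenly spaced lenses with identical horizontal fields of view, and counts how many lenses see each sampled heading. In this simplified model, every heading is seen at least twice when the spacing between lenses (360 degrees divided by the lens count) is no more than half of each lens’s field of view. Real rigs must also cover the poles and account for parallax, which this ignores.

```python
from typing import List

def coverage_counts(num_lenses: int, fov_deg: float, step_deg: int = 1) -> List[int]:
    """Count, for each sampled heading on the horizon, how many lenses can see it.

    Assumes lenses evenly spaced around a ring, each with the same horizontal FOV.
    """
    half_fov = fov_deg / 2
    counts = []
    for heading in range(0, 360, step_deg):
        seen_by = 0
        for i in range(num_lenses):
            axis = i * 360.0 / num_lenses
            # Smallest angle between this heading and the lens axis, in [0, 180].
            offset = abs((heading - axis + 180.0) % 360.0 - 180.0)
            if offset <= half_fov:
                seen_by += 1
        counts.append(seen_by)
    return counts

def fully_oversampled(num_lenses: int, fov_deg: float) -> bool:
    """True if every sampled heading is captured by at least two lenses."""
    return min(coverage_counts(num_lenses, fov_deg)) >= 2
```

With six lenses spaced 60 degrees apart, `fully_oversampled(6, 120)` returns True, while `fully_oversampled(6, 100)` returns False because headings near each lens axis are then seen by that lens alone.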
Monoscopic 2D-360 stitching outputs a single 360 equirectangular video. During playback, each eye is shown the same video, and there is no perceived 3D depth in the experience.
Unlike 360 video, 180 video does not require blending between images. Monoscopic 180 is just a single fisheye image and can be captured with a traditional camera fitted with a single circular fisheye lens. Rough de-warping is sufficient for mono 180 production. (Strictly speaking, processing 180 video isn't stitching, but this is the industry-standard term.)
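Because mono 180 involves a single lens, de-warping amounts to a lookup table: for every pixel of the output, compute the viewing direction and find where it lands in the fisheye image. The sketch below builds such a table for a 180 x 180-degree equirectangular output (hence a square frame), assuming an ideal equidistant fisheye pointed straight ahead with a known focal length in pixels per radian and a known image-circle center. The function name and parameters are hypothetical, and a real pipeline would then sample the source frame at these coordinates with interpolation.

```python
import math
from typing import List, Tuple

def build_180_remap(out_w: int, out_h: int, focal_px: float,
                    cx: float, cy: float) -> List[List[Tuple[float, float]]]:
    """Map each output pixel (row, col) to source fisheye coordinates (u, v).

    Output spans longitude and latitude from -90 to +90 degrees.
    Assumes an ideal equidistant fisheye (r = focal_px * theta) aimed straight ahead.
    """
    table = []
    for row in range(out_h):
        lat = (0.5 - (row + 0.5) / out_h) * math.pi  # top row is looking up
        line = []
        for col in range(out_w):
            lon = ((col + 0.5) / out_w - 0.5) * math.pi
            # Unit direction vector: x right, y up, z forward.
            x = math.cos(lat) * math.sin(lon)
            y = math.sin(lat)
            z = math.cos(lat) * math.cos(lon)
            theta = math.acos(max(-1.0, min(1.0, z)))  # angle from the optical axis
            phi = math.atan2(y, x)                     # angle around the axis
            r = focal_px * theta
            line.append((cx + r * math.cos(phi), cy - r * math.sin(phi)))
        table.append(line)
    return table
```

For an image circle of radius R pixels that covers exactly 180 degrees, the equidistant focal length would be R / (pi / 2). The table only needs to be computed once per lens and resolution, then reused for every frame.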
In 3D-180 stitching, the fisheye images from the left and right eyes are de-warped and precisely aligned.
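Precise alignment matters because the two eyes should differ only horizontally; a vertical offset between matching features is uncomfortable to view. One simple check, sketched below as an illustration rather than any particular tool’s method, takes pairs of matching feature positions found in the de-warped left and right images, measures their median vertical disparity, and shifts the right eye to cancel it. Feature matching is assumed to happen elsewhere, the names are hypothetical, and a pure row shift only approximates the rotational corrections real tools apply.

```python
from statistics import median
from typing import List, Tuple

Point = Tuple[float, float]
Match = Tuple[Point, Point]  # ((x_left, y_left), (x_right, y_right)) in pixels

def vertical_disparity(matches: List[Match]) -> float:
    """Median vertical offset (right minus left) across matched features, in pixels.

    The median keeps a few bad matches from skewing the estimate.
    """
    return median(yr - yl for (_, yl), (_, yr) in matches)

def align_right_eye(matches: List[Match]) -> List[Match]:
    """Shift right-eye feature rows so the median vertical disparity becomes zero."""
    dy = vertical_disparity(matches)
    return [((xl, yl), (xr, yr - dy)) for (xl, yl), (xr, yr) in matches]
```

For matched features whose rows differ by 3, 3 and 40 pixels, the median yields a 3-pixel correction, so the single bad match does not distort the result.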
The Canon R5 and RF5.2mm F2.8 L Dual Fisheye Lens. Image: Canon USA
The Canon R5 / R5 C with the RF5.2mm F2.8 L Dual Fisheye Lens is an example of a popular, high-end 3D-180 camera system. The lens pairs two fisheyes, each covering more than 180 degrees, that project side-by-side image circles onto a single 8K sensor. The two lenses are 62mm apart, which is close to the average human IPD (inter-pupillary distance), and because a single sensor records both images, the captures are perfectly synchronized. The Canon R5 C is a modern, traditional video camera capable of cinema-quality capture, so post-processing work such as color grading, noise reduction and sharpening can be done at the highest level of quality.
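Because both image circles share one sensor, an early step in a 3D-180 workflow for a system like this is splitting each frame into left-eye and right-eye fisheye regions. The sketch below estimates nominal circles by assuming each one is centered in its half of the frame and fits within it. The 8192 x 4320 frame size used in the example, the `swap_eyes` flag, and all names are illustrative assumptions; actual circle positions and eye order should come from the lens’s calibration data and the manufacturer’s documentation.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Circle:
    cx: float      # image-circle center, x (pixels)
    cy: float      # image-circle center, y (pixels)
    radius: float  # image-circle radius (pixels)

def split_dual_fisheye(frame_w: int, frame_h: int,
                       swap_eyes: bool = False) -> Tuple[Circle, Circle]:
    """Estimate nominal (left_eye, right_eye) circles for side-by-side fisheyes.

    Assumes each circle is centered in its half of the frame and fits that half.
    swap_eyes is a hypothetical flag for systems whose recorded halves are in
    reverse eye order.
    """
    half_w = frame_w / 2
    radius = min(half_w, frame_h) / 2
    first_half = Circle(half_w / 2, frame_h / 2, radius)
    second_half = Circle(half_w + half_w / 2, frame_h / 2, radius)
    return (second_half, first_half) if swap_eyes else (first_half, second_half)
```

With an assumed 8192 x 4320 frame, the nominal circles are centered at x = 2048 and x = 6144 with a radius of 2048 pixels; each region can then be de-warped on its own before the pair is aligned.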