The Cinematic VR Field Guide

In this section, The Jaunt Team provide an approachable field guide for better understanding everything from proper equipment to camera movement, lighting, and sound.


Written by The Jaunt Team

This is part one of a four-part series on best practices for shooting 360 video for VR.

Introduction

Virtual reality (VR) is truly a new medium. Along with the excitement at the creative possibilities there is also much confusion within the film industry on how best to shoot a compelling piece of VR content. Questions regarding camera movement, blocking, lighting, stereoscopic 3D versus mono, spatial sound capture, and interactivity all get asked repeatedly.

As Jaunt is at the forefront of cinematic virtual reality production, the purpose of this guide is to share our experiences with shooting a vast array of VR content with the wider community: what works and what doesn’t. We are not, however, trying to produce an exhaustive text on the entirety of filmmaking but rather trying to cover the additional complexities and challenges that come with shooting in VR.

Much of what will be discussed is framed through the lens (so to speak) of the Jaunt ONE camera system, as that is the rig with which we are most familiar, and we provide specific details on it wherever applicable. The vast majority of this guide, however, covers general VR shooting techniques, and we attempt to keep the material as agnostic as possible. Virtual reality technology, as well as the language of cinematic VR, is changing at a breakneck pace, so we will endeavor to update this guide from time to time as new techniques present themselves and new technology develops. We hope you enjoy this guide.

-The Jaunt Team


Virtual Reality Basics

According to Wikipedia, virtual reality is a computer technology that replicates an environment, real or imagined, and simulates a user's physical presence and environment to allow for user interaction.

On a computer or cell phone this usually means sight and sound on a display device and speakers or headphones. Devices for touch or force feedback are starting to be introduced. Smell and taste are quite a ways off still.

The key to any form of virtual reality is presence and immersion. It’s these qualities that separate it from any media that has come before it and can create an intense emotional connection to the material. Chris Milk, one of the early directors in VR, has been frequently quoted as calling it the “empathy machine”.

Types of Virtual Reality

There are generally two main types of virtual reality: Cinematic and game engine based. These differ on the means of production, playback method, realism, and amount of interactivity possible.

What is Game Based Virtual Reality?

For a long while this is what people typically thought of when they thought of virtual reality. This is computer graphics generated in real time typically by a 3D CG gaming engine such as Unity or Unreal. Since the world is generated on the fly, using specialized head mounted displays (HMDs) like the Oculus Rift or HTC Vive which include motion tracking, users are able to walk around the environment as if it was real.

Unreal Showdown Cinematic VR Demo © Epic Games

This type of VR also lends itself to highly interactive content with the Rift and Vive also offering tracked hand controls allowing you to pick up and move objects, wield a sword, shoot a gun, and generally interact with the entire environment. It’s very much like being dropped into a video game.

Just because the first round of HMDs have been heavily targeting gamers does not mean that this type of technology is only for gaming. Game engines are just as capable of making an interactive film or music video as they are a game and excel at creating worlds you can visit that are completely unlike real life.

What is Cinematic Virtual Reality?

Jaunt originally specialized in cinematic virtual reality. This is 360 video filmed using a panoramic video camera system and played back as an equirectangular video file which allows the user to look around the scene as it unfolds. (Equirectangular is a rectangular map projection which wraps accurately onto a sphere, and thus displays correctly in a panoramic view.) Depending on the camera system and stitching process the scenes can be either monoscopic (flat) or stereoscopic (3D).
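To make the equirectangular layout concrete, here is a minimal sketch of how a viewing direction maps to a pixel in an equirectangular frame. The function name and the angle conventions (yaw 0 at the image center, pitch +90 at the top row) are our own assumptions for illustration; players and stitching tools may use different conventions.

```python
def direction_to_equirect(yaw_deg, pitch_deg, width, height):
    """Map a viewing direction to (x, y) in an equirectangular frame.

    Assumed convention (hypothetical, for illustration only):
    yaw 0 is the horizontal center of the image and increases to the right,
    pitch +90 is the top row (straight up), pitch -90 is the bottom row.
    """
    # Wrap yaw into [-180, 180) so any angle is accepted.
    yaw = (yaw_deg + 180.0) % 360.0 - 180.0
    # Clamp pitch to the valid pole-to-pole range.
    pitch = max(-90.0, min(90.0, pitch_deg))
    u = (yaw + 180.0) / 360.0    # 0..1 across the width (longitude)
    v = (90.0 - pitch) / 180.0   # 0..1 down the height (latitude)
    return u * width, v * height


# Looking straight ahead lands in the middle of a 4096 x 2048 frame.
print(direction_to_equirect(0, 0, 4096, 2048))
```

Because longitude and latitude are both spread linearly across the frame, the image is twice as wide as it is tall, and rows near the top and bottom are stretched horizontally when viewed flat.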

Escape the Living Dead Equirectangular Image

Here you have the advantage of scenes looking completely real and not computer generated as with game engines. Scenes are also usually captured with spatial sound microphones making them sound just as real. If you hear a dog to your right and turn your head you’ll see the dog in the spot the sound came from. It’s as if you were dropped into a movie.

Unlike in game engines, however, you cannot move around the scene freely. Only if the camera is moved during filming do you move. As new camera systems and acquisition technologies are developed eventually you will be able to move around filmed scenes as well. See below under Types of VR Cameras for more on this.

Though not as interactive as a full game-based environment, you can still add interactivity to cinematic VR. Branching “Choose Your Own Adventure” stories, gaze detection, interactive overlays and interfaces, audio or clip triggers, gestures, and even full CG integration are all possible, and these are becoming more commonplace as mixed reality platforms evolve.

All of this leads to a completely new form of media. A blank canvas with which we’ve only just begun to realize what’s possible. The “killer app” in VR will be some combination of cinema, gaming, and interactive theatre. Right now we’re only in the dress rehearsal and anything is possible. Five years from now VR content may look nothing like it does today.

Stereoscopic vs. Monoscopic VR

VR footage can be either monoscopic or stereoscopic 3D. Monoscopic (or ‘mono’) footage is flat and has no depth since no parallax is present, as both eyes are viewing the same image, with everything projected back to the same depth of the 360 viewing sphere. While you can still turn your head and look around the scene, nothing ever truly seems closer to you, only bigger. This is similar to the difference between a “closeup” in a 2D film versus how something actually gets closer and comes out at you in a 3D film.

With 360 stereoscopic 3D on the other hand, you have full 3D in every direction and objects can actually get closer to the viewer. The parallax resulting from each eye seeing a slightly different image, due to the separation between eyes (or cameras), allows you to perceive a subject at a given distance. This leads to a much more naturalistic and immersive feeling as this is how we actually experience things in real life. Imagine filming a shark underwater in VR. For maximum impact, you’d want the viewer to feel the shark actually getting up close and personal. With stereoscopic 3D, you can achieve this immersion, while in mono, although still menacing, the shark cannot actually get any closer and you lose that sense of presence and immersion - and fear factor!
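As a rough illustration of why parallax matters most for nearby subjects, the sketch below computes the angle between the two eyes' lines of sight to a point straight ahead. The 0.064 m eye separation is an assumed typical value, not a figure from any particular rig, and the function name is ours.

```python
import math


def vergence_angle_deg(distance_m, separation_m=0.064):
    """Angle (degrees) between the two lines of sight to a point straight ahead.

    separation_m is the distance between the eyes (or a camera pair);
    0.064 m is an assumed typical value used only for illustration.
    """
    return math.degrees(2.0 * math.atan(separation_m / (2.0 * distance_m)))


# The angle, and with it the stereo cue, shrinks quickly with distance.
for d in (0.5, 2.0, 10.0, 100.0):
    print(f"{d:6.1f} m -> {vergence_angle_deg(d):.3f} deg")
```

The angle falls off roughly in proportion to distance, which is why a subject a meter away reads as strongly three-dimensional while a distant landscape looks nearly the same to both eyes.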

Wherever possible you should strive to shoot in full 360 3D. So why doesn’t everyone just do that? As you might expect, the camera rigs are more complicated and expensive, the stitching process is far more involved (and computationally expensive), and it can be difficult to get good results without a lot of post-production effort and dollars.

All that said, not every scene or shot necessarily requires shooting in 3D nor is it always possible. Currently, there are very few stereoscopic 360 rigs for shooting underwater. Due to the confines of the protective encasement it’s harder to fit a stereo rig within and smaller mono rigs are typically used. See the Underwater section below for more.

Likewise when shooting in the confines of a car where things are going to be very close quarters you usually have a better shot using a smaller GoPro style rig and stitching the material in mono. Most cameras have a minimum distance to subject that you must respect in order to get a quality stitch and these distances are generally greater when stitching in 3D. If you need to get very close to a subject it may be better to go the mono route. See Distance to Subject below for more information.

Similarly when using drones, weight is always an issue. Therefore there are many instances where we can again use a smaller, lighter GoPro rig and stitch in mono. Very often you are far enough above the landscape where you’re not getting much stereo parallax anyway and the viewer will hardly notice.

In any given show we might have the majority of footage shot in full 360 3D with a smattering of shots as in the above cases filmed with smaller, lighter GoPro rigs and stitched in mono. If done correctly and in the right circumstances your audience will likely not notice.

360 Video

A note must be made about what we call 360 video. How is this different from VR? In an effort to get people into VR and leverage the heavily trafficked platforms that exist now, many companies (Facebook and Google’s YouTube in particular) have started promoting 360° video. This is video you can watch in a web browser, allowing you to pan around the scene with your mouse. Further you could view this same content on a smartphone by panning the device around your position.

Our Jaunt smartphone and web apps also offer this capability, for those times when a VR viewing device is not at hand and for users who do not yet have one to experience the content in full 3D. Brands and companies such as Facebook love 360 video as it allows them to leverage their massive user bases on platforms that everyone is already using.

Minimum VR Requirements

You could talk to one hundred different people about what is essential to be considered virtual reality and get almost as many answers. As we are looking for maximum immersion and presence (that is, the feeling of actually being there), Jaunt assumes a minimum of four things:

  1. 360 Equirectangular Images : This is a scene in which you can look around a full 360°, and up and down from pole to pole. Some camera rigs have instead opted for a front-only 180° field of view (FOV), particularly cameras that are streaming live, to reduce bandwidth and stitching complexity. However, as soon as you look behind you, you’re pulled right out of the scene. Often, to avoid having just a black background behind you, a graphic will be inserted (such as a poster frame from the show or stat card from a game).
  2. Stereoscopic 3D : This is one of the more contentious requirements as many people are filming in mono today as it is both cheaper and simpler to capture and stitch per the reasons given above. However, to truly get that sense of immersion and presence (that is the hallmark of VR), you really need to shoot in stereoscopic 3D wherever possible. Stereo 3D vision is how we see in real life and is equally important in VR.
  3. Spatial 3D Sound : Sound is always an important part of any production. In VR it can be critical. Not only does it help with immersion but it is one of the few cues, along with motion and light, to get your viewer’s attention for an important moment, as they could be looking anywhere. Capturing spatial audio increases your sense of place.
  4. Viewed in an HMD : Finally, none of the above is any good unless you have a method of actually viewing it. Though 360° video is often created for those without a viewing device and allows you to pan around the image, it doesn’t allow you to see in 3D or provide you with spatial audio playback. For the full experience you really must use a proper HMD. The good news is you do not need an expensive HMD. There are some very inexpensive or even free options on the market, with the selection increasing at a dizzying rate, and competition among HMD makers continues to push prices down.

Types of Head Mounted Displays (HMD)

There are many different types of head mounted displays (or HMDs) that vary drastically in price and capability, ranging from the very simple Google Cardboard to the Samsung Gear VR to the Oculus Rift and HTC Vive. It was the cell phone and its suite of miniaturized components - gyroscopes, accelerometers, compact high-resolution screens - that led to the resurgence of viable virtual reality and allowed Palmer Luckey to create the first Oculus headset. And it is the cell phone that is the basis for all of them, even the high-end Rift and Vive.

The higher-end HMDs provide full body tracking and some also include hand controllers creating a “room scale” VR system that allows you to move about and interact with your environment. But using just your cellphone with some simple lenses housed in a cardboard or plastic enclosure gets you a pretty amazing experience. This will only get better as cell phone manufacturers integrate better VR subsystems into their handsets.

The list of HMDs is growing at a breakneck pace, but for a good overall list of the current HMDs on the market or in development see the VR Times.

Cameras

In this section we discuss the various types of VR camera rigs you will encounter, some of the gotchas to be aware of with VR cinematography (and how to avoid them), mounts and rigging solutions, the importance of clean plates, and underwater and aerial VR shoots.

Types of VR Cameras

There are many types of camera systems for shooting VR and the space is evolving rapidly. Each has its own strengths and weaknesses, and we cover the major forms below. There are many other forms of panoramic cameras but we won’t cover those that don’t allow for video capture, such as slit-scan cameras. Where possible, it’s best to research and test each one based on your own needs.

Panoptic

These camera systems are generally inspired by the visual system of flying insects and consist of many discrete camera modules arranged on a sphere, dome, or other shape. The term comes from the Greek “Panoptes”, the name of a giant with a hundred eyes in Greek mythology.

Jaunt has previously developed camera systems using this multiple-module configuration, including the Jaunt ONE.

Jaunt One Camera

This is by far the most popular type of 360 stereoscopic (3D) VR camera rig and many companies have jumped into the fray by designing lightweight rigs to support a variety of off-the-shelf camera modules. Being small, lightweight, and relatively inexpensive, the GoPro has proved to be the go-to camera for snapping together a VR camera rig. In fact, Jaunt’s first production cameras, the GP14 and GP16, consisted of fourteen or sixteen GoPro cameras in a custom 3D printed enclosure.

However, there are numerous problems with a GoPro based system including image quality, heat dissipation, and lack of sync. When shooting VR, it is crucial that all of your camera modules are in lockstep so that overlapping images match precisely and can be easily stitched together in post. Out of the box, GoPros have no built-in syncing capability and even when properly synced in post based on audio/visual cues they can drift over time.
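To get a feel for how quickly unsynced cameras can fall out of step, here is a back-of-the-envelope sketch. It assumes two free-running cameras whose internal clocks differ by a small, constant rate error expressed in parts per million (ppm). The 50 ppm figure is purely illustrative and is not a measured value for GoPro or any other camera.

```python
def drift_in_frames(clock_error_ppm, elapsed_s, fps):
    """Frames of offset accumulated between two free-running cameras.

    clock_error_ppm is the relative rate difference between the two clocks
    in parts per million (a hypothetical input, not a measured spec).
    """
    return clock_error_ppm * 1e-6 * elapsed_s * fps


# Illustrative only: 50 ppm mismatch, 10-minute take, 60 frames per second.
print(f"{drift_in_frames(50, 10 * 60, 60):.2f} frames")
```

Under these assumed numbers the two cameras end up nearly two frames apart by the end of a ten-minute take, even if they were perfectly aligned at the start, which is enough for fast-moving subjects to mismatch across a seam.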

Jaunt GP16 Camera

This isn’t to pan GoPro cameras. That they have enabled so many different VR rigs is a feat unto itself, but they were not originally conceived for this task and the limitations are showing.

Jaunt has since moved on to twenty-four custom built camera modules in the Jaunt ONE that provide four times the sensor size with better low light performance, higher dynamic range with eleven stops of latitude, better color reproduction, global shutters to prevent tearing of fast moving objects, and most importantly synced camera modules.

The number of cameras and their respective fields of view in any given system will determine the overlap of adjacent views. You need sufficient overlap between images in order to properly stitch adjacent frames together, and more if you want to provide a stereo stitch. Having more cameras in a rig, spaced more closely to one another, also provides a shorter minimum distance to camera, allowing subjects to get much closer before stitching falls apart. See Stitching Approaches and Distance to Subject below for more information.
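The relationship between camera count, field of view, and overlap can be sketched with some simple geometry. The model below is an idealized assumption of ours, not a description of the Jaunt ONE: identical pinhole cameras evenly spaced on a single flat ring, each pointing straight outward. The function names and the example numbers are hypothetical, and the distance it returns is only where adjacent views first start to overlap; a usable stitch generally needs subjects farther away than that.

```python
import math


def ring_overlap_deg(num_cameras, hfov_deg):
    """Angular overlap between adjacent cameras evenly spaced on a ring."""
    return hfov_deg - 360.0 / num_cameras


def overlap_start_distance(num_cameras, hfov_deg, ring_radius_m):
    """Distance from the rig center at which adjacent views begin to overlap.

    Idealized 2D model: pinhole cameras on a circle of ring_radius_m,
    pointing radially outward. Returns math.inf if neighbours never overlap.
    """
    half_fov = math.radians(hfov_deg) / 2.0
    half_step = math.pi / num_cameras  # half the angle between neighbours
    if half_fov <= half_step:
        return math.inf
    # Edge rays of two neighbours cross on the line midway between them.
    return ring_radius_m * math.sin(half_fov) / math.sin(half_fov - half_step)


# Hypothetical rigs: same lens and ring radius, different camera counts.
for n in (8, 16):
    print(n, "cameras:",
          f"{ring_overlap_deg(n, 90):.1f} deg overlap,",
          f"views start overlapping at {overlap_start_distance(n, 90, 0.10):.3f} m")
```

In this simplified model, adding cameras or shrinking the ring both pull the start of the overlap region closer to the rig, which matches the intuition that denser, more compact rigs tolerate closer subjects.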

Mirror Rigs

Another type of panoramic 360 camera is the mirror rig. This typically has a number of cameras in a circular configuration shooting up into a collection of mirrors that are facing out into the scene at an angle. A good example of this kind of rig is the Fraunhofer OmniCam.

Fraunhofer Omnicam © Fraunhofer Heinrich Hertz Institute

These rigs can be either mono or stereo and are generally bigger and heavier than other types of panorama rigs due to the mirrors. A big benefit of these rigs, however, is that the mirrors allow the cameras to shoot into a virtual nodal point within the mirrors that provides minimal or no parallax in the scene, making stitching very easy and relatively artifact free.

Because of a shared nodal point, many of these rigs allow for realtime stitching and transmission of live 360 imagery (as there is no issue with stitches having seams). By having two cameras shooting into each mirror, you can create a seamless stereo stitch. The main drawback is the size and weight of these rigs, along with the relatively powerful computer they must be attached to for live stitching.

Fisheye

Many consumer panoramic cameras are of this variety because they are relatively cheap, small, lightweight, and are easily stitched, usually in-camera. Some use one lens, like the Kodak 360 Action Cam, and capture 180 degrees while a two lens system, like the Ricoh Theta, captures a full 360 degrees by stitching the two halves together.

Fisheye cameras are also commonly used in stereo-pair rigs for 180 VR content, as no computational stitching is required since one fisheye sees the full hemisphere.

Kodak Pixpro SP360-4K Camera

Ricoh Theta Cameras

Though they are convenient and easily stitched the quality of this type of camera is relatively low. Many can stream to an iPhone or Android device making them a good remote viewing solution if your VR camera doesn't provide one. See below under Live Preview for more information.

Prosumer versions of these types of cameras also exist with much larger lenses and sensors. Unfortunately, on their own, cameras of this type produce only monoscopic images and not stereoscopic 3D images, lessening the immersion for VR purposes.

Light-field

Light-field cameras are a more complicated technology to join the VR market. They represent a future of virtual reality filmmaking, though their practical use is still a ways off. Instead of focusing light through a lens and onto a sensor, a large array of many smaller lenses captures light rays from every conceivable direction. In order to capture more than a small section of the panorama, the array has to shoot a section of the circle at a time. Currently, this technology is costly, time-consuming, and computationally intensive to implement. Much of the innovation in this realm was being led by the now-defunct company Lytro.

Light field capture allows for some pretty amazing things to be done in post including shifting parallax by moving your head in an HMD, refocusing the image, generating depth mattes and stereo 3D, and pulling mattes without a green screen.

Light field cameras were first popularized in the consumer market with the Lytro Illum still camera.

Paul Debevec at Google has recently been demonstrating camera configurations to bring light fields to a wider audience.

The light-field camera outside the mosaic tile house in Venice, California.

Photogrammetry

To fully realize scene capture for VR you need to change your thinking entirely and move from the current inside-out methodology to an outside-in perspective. That is, instead of filming with an array of cameras that are facing out into the scene, surround the scene with an array of cameras that are looking in.

Microsoft has created a video based photogrammetry technology used to create holographic videos for its HoloLens augmented reality headset called Free Viewpoint Video. An array of cameras placed around a green screen stage captures video from many different angles where it is then processed using advanced photogrammetry techniques to create a full 3D mesh with projection mapped textures of whatever is in the scene. Their technology uses advanced mesh tessellation, smoothed mesh reduction, and compression to create scenes that you can actually walk around in VR or AR.

Another company working in this space, 8i, uses a similar array of cameras to capture what they call volumetric video stored in a proprietary compressed light field format. This technology does not create a full CG mesh (though that is an option) but still allows you to walk about the scene and observe it from any angle. For more info visit 8i. This sector of volumetric capture and streaming has quickly gained interest and platforms in recent months, and may yet play a large role in the future of mixed reality.

Whatever the technology or approach, advanced realtime photogrammetry techniques will be an important capture technology in the not too distant future allowing you to fully immerse yourself in any scene. As the technology improves and reduces in cost, it will also allow consumers to truly connect like never before through holographic video feeds and social environments.

© 2018 Jaunt, Inc.