Meta XR Scene Data Collection

When I started working at Meta in 2021 as a software prototyper, my first big assignment was a data collection effort that would be the foundation for the scene recognition capabilities of the Quest VR headsets. It was a huge project involving many talented computer scientists and data scientists. The idea was to run the headsets through a bunch of different rooms from a typical home environment (living rooms, offices, bedrooms, etc.) while collecting sensor data, then use that data to teach AI algorithms to identify room spaces and furniture.

The first thing to do was create a data collection app. The structure and functionality of the app were built by me and one other prototyper, along with a couple of data scientists who created a custom plugin to interface with the sensors on the headset. The app was created in Unity and ran on an early prototype of what would become the Quest Pro headset. The app was passthrough-enabled, so the real world stayed visible to you while you used it. Users would create and label ‘planes’ in 3D space to represent basic surfaces and objects: floors, walls, ceilings, windows, tabletops, chairs, anything that would be considered a tangible ‘surface’ to identify in 3D space. In the upper left image you can see an example of what this plane data looked like from the perspective of the headset’s forward-facing camera.
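
To make the rest of this more concrete, here is roughly what one of those labeled planes boils down to as data. This is just an illustrative Python sketch with names I made up, not the app's actual schema (the real app was C# in Unity):

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]  # (x, y, z) position in world space


@dataclass
class LabeledPlane:
    """One user-labeled surface: a semantic label plus its corners in 3D.

    Hypothetical shape for illustration only -- not the app's real schema.
    """
    label: str            # e.g. "FLOOR", "WALL", "TABLETOP"
    vertices: List[Vec3]  # four corners of the rectangle, in CCW order
```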

Since creating 3D planes was a core requirement for the app, and considering that we would be working heavily with ‘plane data’ even after the initial data collection effort, I began working on a modular plane creation tool. This tool was wrapped up into a drag-and-drop Unity prefab that worked standalone using a mouse or in VR using controller raycasting.

On the right you can see the plane tool being used to create a new plane on a shelf. The front-left and front-right corners are chosen using the lower tips of the controllers, which had little nubs on the early prototype that were perfect for this situation. After that, the opposite edge of the plane is positioned and the new plane is created. The blue and red grid is just the passthrough boundary warning me about getting too close to the wall.
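
Geometrically, that interaction comes down to building a rectangle from the two front corners plus one more point that sets the depth. Here is a rough Python sketch of the idea (an illustration only, not the tool's actual Unity code):

```python
import numpy as np


def plane_from_corner_picks(front_left, front_right, far_point):
    """Build a rectangular plane from two front corners plus a point on the far edge.

    front_left / front_right: the corners tapped with the controller tips.
    far_point: any point on the opposite edge; only its distance perpendicular
    to the front edge matters.
    Returns the four corners walked around the perimeter of the rectangle.
    """
    p0 = np.asarray(front_left, dtype=float)
    p1 = np.asarray(front_right, dtype=float)

    # Direction of the front edge.
    width = p1 - p0
    width_dir = width / np.linalg.norm(width)

    # Depth vector: the part of (far_point - p0) perpendicular to the front edge.
    to_far = np.asarray(far_point, dtype=float) - p0
    depth = to_far - np.dot(to_far, width_dir) * width_dir

    # front-left, front-right, back-right, back-left
    return [p0, p1, p1 + depth, p0 + depth]
```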

Once the plane is created, it reveals handles for adjusting its size and rotation. There is a green center plane that can be dragged to move the plane around, and grabbable yellow edges and red corners that can be dragged for resizing. A rotation gizmo is shown on whichever edge or corner of the plane is closest to the camera, and dragging it rotates the plane. In the second image on the right you can see this in action.
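
The resize handles work on a simple principle: when a corner is dragged, the opposite corner stays put and the plane's center and size are recomputed from the drag. A simplified Python sketch, assuming the plane is tracked as a center, two in-plane axes, and half-extents (my own formulation for illustration, not the tool's actual code):

```python
import numpy as np


def resize_from_corner_drag(center, axis_u, axis_v, half_w, half_l, dragged_to):
    """Resize a rectangular plane by dragging its (+u, +v) corner while the
    diagonally opposite corner stays fixed.

    center:     current plane center
    axis_u/v:   unit vectors spanning the plane (width and length directions)
    half_w/l:   current half extents along axis_u / axis_v
    dragged_to: world position the grabbed corner was dragged to
    Returns the new center and half extents.
    """
    center = np.asarray(center, dtype=float)
    u = np.asarray(axis_u, dtype=float)
    v = np.asarray(axis_v, dtype=float)

    # The corner diagonally opposite the grabbed one does not move.
    fixed_corner = center - half_w * u - half_l * v

    # Project the drag onto the plane axes; any off-plane component is ignored.
    diag = np.asarray(dragged_to, dtype=float) - fixed_corner
    du, dv = np.dot(diag, u), np.dot(diag, v)

    new_center = fixed_corner + (du / 2.0) * u + (dv / 2.0) * v
    return new_center, abs(du) / 2.0, abs(dv) / 2.0
```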

In addition to creating planes from user input, we would need to be able to load planes from existing data. Plane data on the app side was generally stored as 4 vertices (v0, v1, v2, v3) in CCW order, much like any 3D polygon. It was simple enough to regenerate a plane from three of these verts just by finding width and length vectors from v0->v1 and v0->v3, then using a cross product to find the normal. However, there were also situations where we had an array of vertices with arbitrary size and vertex order. These situations arose from early explorations with having users create planes on top of circular tables or other non-rectangular surfaces, where they would place as many vertices as they wanted all the way around the perimeter of the surface. I referred to these as ‘polyplanes’, and to handle them I came up with my own parsing function that worked roughly like this:

- Find the longest distance between two verts in the array (vA and vB)
- Create our width vector vA->vB
- Find the vertex furthest from the line vA->vB (vC)
- Find the furthest vertex on the opposite side of vA->vB from vC (vD)
- Create our length vector vC->vD

This simplified the arbitrary vertex array into a rectangular 4-vertex plane. You can see this in action to the left here with 6 vertices as I drag one of the verts around and the plane updates live. This method was quick and reliable; however, it had a couple of limitations. First, it did not always create the smallest possible plane in situations where the furthest-apart vertices formed the diagonal of a plane. For instance, feeding the points of a square 4-vertex plane through this function would result in a larger plane rotated by 45 degrees, since the furthest-apart vertices in that case are the diagonal of the original plane. Second, we have to make an assumption about the normal of the plane. In this case we run a cross product between vA->vB and vC->vD, then compare the angle of the cross product to the world-space up vector. If this angle is less than 90 degrees, the cross product is our normal; otherwise the inverse of the cross product is our normal. In our case these issues were not a concern because the incoming vertex arrays mostly represented circular tabletops.
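
Here is a rough Python sketch of both reconstructions: the simple CCW 4-vertex case and the polyplane parser described above. It is an illustration of the approach rather than the production Unity code:

```python
import numpy as np

WORLD_UP = np.array([0.0, 1.0, 0.0])


def plane_from_quad(verts):
    """Rebuild a plane from 4 vertices stored in CCW order (v0, v1, v2, v3):
    width = v0->v1, length = v0->v3, normal = cross(width, length)."""
    v0, v1, _, v3 = [np.asarray(v, dtype=float) for v in verts]
    width, length = v1 - v0, v3 - v0
    normal = np.cross(width, length)
    return v0, width, length, normal / np.linalg.norm(normal)


def plane_from_polyplane(verts):
    """Collapse an arbitrary ring of vertices into a rectangular 4-vertex plane."""
    pts = [np.asarray(v, dtype=float) for v in verts]

    # 1. Longest distance between any two verts -> width vector vA->vB.
    vA, vB = max(((a, b) for i, a in enumerate(pts) for b in pts[i + 1:]),
                 key=lambda pair: np.linalg.norm(pair[1] - pair[0]))
    width = vB - vA
    width_dir = width / np.linalg.norm(width)

    # Perpendicular offset of each vertex from the vA->vB line.
    offsets = [(p - vA) - np.dot(p - vA, width_dir) * width_dir for p in pts]

    # 2. Vertex furthest from the vA->vB line -> vC.
    iC = max(range(len(pts)), key=lambda i: np.linalg.norm(offsets[i]))
    vC, offC = pts[iC], offsets[iC]

    # 3. Furthest vertex on the opposite side of the line -> vD.
    candidates = [i for i in range(len(pts))
                  if i != iC and np.dot(offsets[i], offC) <= 0.0]
    iD = max(candidates, key=lambda i: np.linalg.norm(offsets[i]))
    vD = pts[iD]

    # 4. Length vector vC->vD; normal from the cross product, flipped if it
    #    points more than 90 degrees away from world up.
    length = vD - vC
    normal = np.cross(width, length)
    normal = normal / np.linalg.norm(normal)
    if np.dot(normal, WORLD_UP) < 0.0:
        normal = -normal
    return vA, width, length, normal
```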

While working on this plane tool, I was also getting a feel for how the plane data was being interpreted and stored in the database. In the database, I noticed that planes had a property consisting of four decimal numbers. One of the data scientists told me this was the equation of the plane, which was new to me, so I went on a side quest to understand it. Specifically, the four numbers I was seeing were the values A, B, C, and D of the scalar equation of a plane, which in full looks like Ax + By + Cz = D. To help myself understand this concept, I made a little gizmo tool that would let me play with the values and see how they corresponded to a plane, which you can see below. The equation represents an unbounded plane that extends infinitely, which is hard to visualize, but I realized I could use the coefficients to find where the plane crosses each world axis (at x = D/A, y = D/B, and z = D/C) and treat those three intercepts as points on the plane. With this in mind I fed the intercept points into my polyplane function, which gave me a more tangible visual interpretation of the plane I was dealing with.
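
As an illustration of what that gizmo was doing, here is a small Python sketch (again, not the actual tool) that derives the axis intercepts from A, B, C, D and hands them to the `plane_from_polyplane` sketch from earlier. It assumes none of the coefficients is zero:

```python
import numpy as np
# Reuses plane_from_polyplane from the sketch above.


def plane_equation_intercepts(A, B, C, D):
    """For the plane Ax + By + Cz = D, return the three points where it
    crosses the world axes: (D/A, 0, 0), (0, D/B, 0), (0, 0, D/C).

    Assumes A, B, and C are all non-zero; a plane parallel to an axis never
    crosses that axis and would need different handling.
    """
    return [np.array([D / A, 0.0, 0.0]),
            np.array([0.0, D / B, 0.0]),
            np.array([0.0, 0.0, D / C])]


# Example: visualize the plane 1x + 2y + 1z = 4 by handing its axis
# intercepts to the polyplane parser.
intercepts = plane_equation_intercepts(1.0, 2.0, 1.0, 4.0)
origin, width, length, normal = plane_from_polyplane(intercepts)
```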


Once the data collection app was ready, it was handed off to a team of data collectors. The data collection team had prepared a bunch of rooms, furnished to look like the rooms of a normal household. They would bring a headset into one of the rooms and run the app. To begin, they would go around the room and use the plane tool to create planes on all the surfaces in the room and label them: walls, tables, windows, etc. Once everything was labeled, they would begin a ‘recording’ that continuously logged the headset position, camera frames, and data from the other spatial sensors on the device. They would spend a couple of minutes walking around the room, getting a good perspective on all the labeled surfaces. This process was repeated many times, then the recordings were pulled from the device onto a PC and uploaded to the database. The same thing was done in all the different rooms, and once a room was complete it would be rearranged into a ‘new’ room.

While straightforward in theory, this step ended up being one of the most stressful assignments of my career. I flew out to the data collection location and spent the first week of the effort helping to get things up and running. The app itself had some minor bugs to iron out, and it was very valuable to see the data team using the app in person so we could make improvements along the way. The real issue was the early prototype devices and the operating system that was still in development. The headsets were constantly crashing, losing their spatial tracking, and overheating; controllers were constantly disconnecting; headsets constantly had to be rebooted; the list of issues was endless. I spent countless hours triaging issues and finding solutions. Some issues were solvable: I noticed that the devices often lost tracking if the user got too close to a corner, where the walls did not offer enough contrast for the optical tracking to work. I also discovered that the window shades, which were meant to prevent the reflective window glass from causing tracking issues, were made of a fabric that was mostly transparent to the infrared sensors on the headset, so the device was still seeing reflections through the shades and getting confused. Other issues required daily flashing and testing of different OS builds to find which one was most stable, which was a tedious, inconsistent, and largely inconsequential process. The harsh reality was that we were stress-testing hardware and software that was nowhere near ready for action like this. We still managed to meet our data capture quota, but it took some re-evaluation of the goals and timeline to do so.

The whole idea of doing this was that the manually labeled planes could be compared against the recorded sensor data to train AI algorithms to identify surface planes on their own. If you use any modern Meta VR headset and run Assisted Scene Capture or one of the other space-scanning features, this data collection effort was the foundation for making that technology work. Despite the problems, this project was immensely valuable for me and my career path. I learned a lot of new things about 3D geometry, as well as about live, on-the-ground troubleshooting and problem solving. As my first big assignment at Meta, it also gave me core knowledge and experience that would carry through the rest of my time working there.