As the internet has evolved, and connectivity along with it, visuals have increasingly become the key element that stands out and grabs user attention in ever-busier social feeds.
That started with static images, then moved to GIFs, and now video is the most engaging type of content. In essence, you need striking, interesting visuals to stop people mid-scroll, which, for the most part, is far more effective than trying to catch them with a headline or witty one-liner.
Which is why this is interesting – today, Google has outlined its latest 3D image creation process called ‘LOLNeRF’ (yes, really), which is able to accurately estimate 3D structure from single 2D images.
There are many situations where it would be useful to know 3D structure from a single image, but this is generally difficult or impossible. Read about a framework that learns to model 3D structure and appearance from collections of single-view images → https://t.co/h4xpWBwbaA
— Google AI (@GoogleAI) September 13, 2022
As you can see in these examples, the LOLNeRF process can take a regular 2D image and turn it into a 3D display.
Facebook has offered a version of this for some time, but the new LOLNeRF process is a far more advanced model, enabling more depth and interactivity without the need to capture full 3D models.
As explained by Google:
“In “LOLNeRF: Learn from One Look”, we propose a framework that learns to model 3D structure and appearance from collections of single-view images. LOLNeRF learns the typical 3D structure of a class of objects, such as cars, human faces or cats, but only from single views of any one object, never the same object twice.”
The process is able to estimate color and density for each point in 3D space, and it uses machine learning to predict visual ‘landmarks’ in each image, essentially drawing on what the system has learned from similar images.
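To make that a little more concrete, here is a minimal, illustrative sketch of the general NeRF-style idea LOLNeRF builds on: a small neural network takes a 3D point (plus a per-object latent code) and predicts a color and a density for that point. The layer sizes, variable names and latent-code handling below are assumptions for illustration, not Google’s actual implementation.

```python
import numpy as np

# Toy NeRF-style field: map a 3D point (plus a per-object latent code)
# to an RGB color and a density value. Sizes and structure are illustrative
# assumptions, not the architecture used in the LOLNeRF paper.

rng = np.random.default_rng(0)

LATENT_DIM = 32   # per-object "identity" code (assumed size)
HIDDEN = 64

# Randomly initialised weights stand in for a trained network.
W1 = rng.normal(size=(3 + LATENT_DIM, HIDDEN)) * 0.1
b1 = np.zeros(HIDDEN)
W2 = rng.normal(size=(HIDDEN, 4)) * 0.1   # outputs: R, G, B, density
b2 = np.zeros(4)

def radiance_field(xyz, latent):
    """Return (rgb, density) for a batch of 3D points under one latent code."""
    latent_tiled = np.tile(latent, (xyz.shape[0], 1))
    h = np.maximum(np.concatenate([xyz, latent_tiled], axis=1) @ W1 + b1, 0.0)  # ReLU
    out = h @ W2 + b2
    rgb = 1.0 / (1.0 + np.exp(-out[:, :3]))      # colors squashed to [0, 1]
    density = np.log1p(np.exp(out[:, 3]))        # softplus keeps density non-negative
    return rgb, density

# Query a few points along a camera ray for one object's latent code.
points = np.linspace([0, 0, 0], [0, 0, 1], num=8)
latent_code = rng.normal(size=LATENT_DIM)
rgb, density = radiance_field(points, latent_code)
print(rgb.shape, density.shape)   # (8, 3) (8,)
```

Rendering then works by querying this field at many points along each camera ray and compositing the colors according to the densities, which is how a single learned model can produce views from new angles.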
“Each of these 2D predictions correspond to a semantically consistent point on the object (e.g., the tip of the nose or corners of the eyes). We can then derive a set of canonical 3D locations for the semantic points, along with estimates of the camera poses for each image, such that the projection of the canonical points into the images is as consistent as possible with the 2D landmarks.”
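In spirit, the camera-pose side of that description is a reprojection problem: find canonical 3D landmark positions and per-image camera parameters such that projecting those 3D points into each image lands as close as possible to the predicted 2D landmarks. Below is a heavily simplified sketch of that objective; the pinhole projection, translation-only camera, variable names and the use of SciPy’s generic least-squares solver are assumptions for illustration, not the paper’s actual camera model or solver.

```python
import numpy as np
from scipy.optimize import least_squares

# Toy setup: K canonical 3D landmark points shared across all images, and one
# camera (translation only, for simplicity) per image. Everything here is an
# illustrative simplification of the idea described above.

rng = np.random.default_rng(1)
K = 5           # number of semantic landmarks (e.g. nose tip, eye corners)
N_IMAGES = 3
FOCAL = 1.0

def project(points3d, cam_t):
    """Simple pinhole projection of 3D points after translating by cam_t."""
    p = points3d + cam_t            # camera modelled by translation only (assumed)
    return FOCAL * p[:, :2] / p[:, 2:3]

# Stand-ins for the 2D landmarks a predictor would output for each image.
true_points = rng.normal(size=(K, 3)) + np.array([0.0, 0.0, 5.0])
true_cams = rng.normal(scale=0.3, size=(N_IMAGES, 3))
observed_2d = np.stack([project(true_points, t) for t in true_cams])

def residuals(params):
    """Reprojection error over canonical 3D points and all camera translations."""
    points3d = params[: K * 3].reshape(K, 3)
    cams = params[K * 3 :].reshape(N_IMAGES, 3)
    pred = np.stack([project(points3d, t) for t in cams])
    return (pred - observed_2d).ravel()

# Jointly refine the canonical points and camera parameters from a rough guess.
x0 = np.concatenate([
    (true_points + rng.normal(scale=0.1, size=true_points.shape)).ravel(),
    np.zeros(N_IMAGES * 3),
])
result = least_squares(residuals, x0)
print("final reprojection error:", np.abs(result.fun).max())
```

The key point is that the canonical 3D landmarks and the per-image cameras are solved for together, so consistent poses emerge even though no object is ever photographed twice.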
From this, the process is able to render more accurate, three-dimensional visuals from a single, static source, which could have a range of applications, from AR art to expanded object creation in VR and the future metaverse space.
Indeed, if this process is able to accurately create 3D depictions from a wide range of 2D images, that could greatly accelerate the development of 3D objects to help build metaverse worlds. The concept of the metaverse is that it will facilitate virtually every real-life interaction and experience, but in order to do that, it needs 3D models of real-world objects, from across the spectrum, as source material to fuel this new creative approach.
What if you could just feed a catalog of web images into a system, then have it spit out 3D equivalents, for use in ads, promotions, interactive experiences, etc.?
There’s a range of ways this could be used, and it’ll be interesting to see whether Google is able to translate the LOLNeRF process into more practical, accessible usage options for its own AR and VR ambitions.