Having patented see-through AR display technology in 2016 that Apple cites [1], and knowing how hard the problem is, it's a little bit refreshing to know that Apple recognized pass-through HMD AR as too hard and decided to invest in compensatory technology instead of trying to solve the hard see-through AR problems.
(Caveat: I don't know a lot about pass-through HMD AR, and I assume you're incredibly smart and that the patent is innovative.)
> …it's a little bit refreshing to know that Apple recognized pass-through HMD AR as too hard and decided to invest in compensatory technology…
I understand the framing as "compensatory technology", but is it possible that what Apple's doing is the simpler and better way to solve the problem? Pass-through AR strikes me as an old-school analog approach, like optical printing for special effects. But a 100% digital vision pipeline seems like it could unlock interesting capabilities like "night vision", new ways of highlighting interesting objects, etc.
I think the most prominent attempt at see-through AR was the Microsoft HoloLens. But if you've actually tried a HoloLens, the field of view is atrociously, tragically small: the first HoloLens had a field of view of 30°×17.5°, and the second improved to 43°×29°, which is still best described as "cramped." Couple that with almost all of the device's compute budget going into vision processing, leaving very little compute for actually running apps (the first HoloLens had a 1 GHz Intel Atom from 2015; the second, a superior... Snapdragon 850).
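To put those numbers in rough perspective, here's a back-of-the-envelope comparison (treating each FOV as a flat rectangle measured in square degrees, and assuming ~200°×130° for human binocular vision, which is only a ballpark figure):

```python
def fov_area(h_deg, v_deg):
    # crude comparison: treat the FOV as a flat h x v patch, in square degrees
    return h_deg * v_deg

hololens1 = fov_area(30, 17.5)   # 525 deg^2
hololens2 = fov_area(43, 29)     # 1247 deg^2
human = fov_area(200, 130)       # ~26000 deg^2 (rough assumption)

print(f"HoloLens 1: {hololens1 / human:.1%} of human FOV")  # ~2.0%
print(f"HoloLens 2: {hololens2 / human:.1%} of human FOV")  # ~4.8%
```

Even the improved HoloLens 2 covers only a few percent of what your eyes take in, which is why "cramped" is the kind word for it.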
The other problem, of course, is that nothing can be truly solidly colored. Everything rendered has some transparency, which, combined with the FOV issue, is why HoloLens was never marketed as having anything to do with VR.
The vision processing was done with a separate SoC. Your applications were not contending for compute time with the vision processing. That's just how anemic the CPU was.
> is it possible that what Apple's doing is the simpler and better way to solve the problem? Pass-through AR strikes me as an old-school analog approach, like optical printing for special effects.
I think they're different, and neither is better than the other. If I wanted to drive with an HMD on, or otherwise be in a situation where it could be deadly to have my sight turned off for even a second, or to have any lag, stutter, or other glitch in my eyesight, I'd much rather have a pass-through AR HMD. One's sense of sight seems much more reliable with it by its very nature: you simply don't have those modes of failure with transparent plastic, no matter what's going on in the hardware/software.
Your steering wheel and pedals are digital too. Airliner controls have been digital for decades. Factories have life or death safety systems that count on silicon and a software stack to respond quickly and without error.
Steering is analog. One of the reasons by-wire steering control did not succeed is user feedback; it was tried and failed. The brake pedal is analog too, just assisted and override-able. The gas pedal has been all digital for some time.
Also, it's true that the Airbus sidestick is fully electronic, but there's a nuance to engine control on airliners. Until it was automated, engine control on airliners was a flight engineer's task, like it still is on ocean-going ships. So fully digitalized engine control is a replacement for the FE, not necessarily an automation of what was originally a pilot's task. Which means, I think, that pushing a vehicle forward was never necessarily the responsibility of a pilot or a driver, though controlling where not to go is.
That was the failed attempt I was referring to, literally a decade ago. Toyota is doing a retry with the bZ4X/Lexus RZ, and those are the only known cars to have passed the prototype stage with (front-wheel) steer-by-wire.
In theory, the wavefront emulator is the simplest display: all of the "pixels" are rendered in your brain, so you don't have to build an actual "display".
However, if someone can get the input → photon production pipeline under ~10 ms, that does solve a lot of the rendering issues. It doesn't solve all of the other long-term problems that come with that amount of hardware, though, including weight and complexity.
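As a sanity check on how tight ~10 ms is, a single frame of display scan-out alone consumes most of that budget at common refresh rates (simple arithmetic, no hardware specifics assumed):

```python
# time per frame at common display refresh rates, vs. a ~10 ms target
for hz in (60, 90, 120):
    frame_ms = 1000 / hz
    print(f"{hz:>3} Hz -> {frame_ms:.1f} ms per frame")
```

At 60 Hz a single frame (16.7 ms) already blows the budget; only at 120 Hz does one frame (8.3 ms) fit under 10 ms, and that's before any sensor capture or rendering time is spent.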
That said, there's a lot of known-unknowns that need to be solved, for example I don't have a solution for micropiezo resonance issues that I'm sure will crop up.
Agreed. In fact, this approach is also likely better for military applications which I hope they explore.
Currently, the NVGs fielded by soldiers already display images in a way that's not pass-through (using classic image-intensification tubes). Something like the Apple Vision headset in a lighter and more durable form factor would allow for, e.g., fusion imagery (fusing visible, thermal, and night vision).
I’ve always wanted an NVG to play with, and from watching firearms nerd rants I’ve learned:
- Real NVGs cost way above my budget, and all come with defects. Realistically, the tubes on the civilian market are all defective returns.
- Even real NVGs can't work in complete darkness, such as in a closet; moonlight or starlight is needed.
- Real NVGs on firearms are often used in conjunction with an IR laser and a flashlight for aiming; the advantage is solely that you need a goggle to see them, not that you can operate without making any emissions.
- Partly due to the above, plain old CMOS cameras with modified 1" Sony sensors for better low-light performance are starting to match the practical usefulness of a real NVG (it's Cold War tech, so no wonder). Latency is an issue, but more of an "is an issue" and no longer a blocker.
… Vision Pro-type devices replacing NVGs does indeed seem like a near-future possibility.
I have a few NVGs. They're definitely expensive (a good setup is $10k+) but it's definitely not true that they're all defect returns. L3 image intensifier tubes that enter the civilian market are just as good as the military gets for the same product. For cheap (<$10k) NVGs that's a totally different story though.
They do work in complete darkness! You just need an IR illumination source. If not using a weapon mounted illuminator you can just use an IR flashlight. Thermal doesn't require external illumination but isn't anywhere near as useful for navigation. That said, good IR laser + illuminator weapon mounted systems are a few thousand bucks too.
My point about vision pro type devices is they can take a feed from real NVG tubes and potentially regular cameras and thermal and combine them. It'll be game changing when the tech is ready. It's not a matter of if but when.
It's not in Apple's interests to get into military contracting because of the potential to damage their brand, as well as avoiding the geopolitical tensions that would invite.
I'm quite certain they're still trying to solve the hard see-through AR problems, and are hoping to release a future Vision headset with a true see-through display.
But otherwise I agree: it makes sense for them to focus on what they can do best with current tech as a stopgap. With current see-through HMD tech, AR ends up incredibly disappointing. (See also: the HoloLens and Magic Leap's limited FOV.)
My understanding is that occlusion in "see-through" AR is an unsolved problem; everything looks ghostly and somewhat semitransparent. Until someone solves that, I suspect the re-projection method is the only viable option.
Yes, "projecting black" is effectively impossible. Every once in a while someone from Magic Leap would claim they had solved it, but everyone knew it was bullshit.
I'm sorta surprised by this. Don't the liquid crystal displays in laptops do exactly that? LCDs are a controllably transparent array overlaid on a uniform backlight. So why can't one create a set of glasses where the lenses are LCD arrays without the backlight?
An LCD simply turns "off" to display "black" - so the emissions stop (relative to surrounding pixel emissions)
With a projector, you can't "throw" nothing (aka black). As a result "projected black" is simply lack of projection.
In the case of a translucent or transparent reflection or waveguide surface - which is what the projection reflects off of - "black" is whatever the darkest part of the surface is. In effect whatever else is emitting from the surface that you're looking at will change the depth of "black" you get.
This is why the HoloLens and other see-through AR devices are always tinted: to set a higher threshold for "black" than the surrounding unaided view.
LCDs do not emit light. They have a light emitter (or reflector, for passive monochrome displays ala an LCD watch face) behind them and the liquid crystal part selectively allows that light to be blocked or pass through.
There are three layers of polarizing material. The two outer layers are at right-angle polarizations to each other and would normally be completely opaque on their own. When power is applied to the liquid crystal, it twists the light's polarization to a 45° angle to the other two layers, which then permits some of the incident light to pass through.
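The "crossed polarizers block, intermediate angle leaks" behavior follows Malus's law (transmitted intensity proportional to cos²θ). A tiny, purely illustrative sketch:

```python
import math

def malus(intensity, theta_deg):
    # Malus's law: intensity transmitted through an ideal polarizer
    # oriented at angle theta relative to the light's polarization
    return intensity * math.cos(math.radians(theta_deg)) ** 2

# two polarizers 90 degrees apart: essentially nothing gets through (the "black" state)
print(malus(1.0, 90))             # ~0.0

# add an intermediate 45-degree step (the twisted crystal) and light leaks through
print(malus(malus(1.0, 45), 45))  # 0.25 of the original intensity
```

(Real twisted-nematic cells rotate the polarization continuously rather than in one 45° jump, so this is a simplification of the actual optics.)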
An optically transparent waveguide display can use an LCD layer to block light coming through the front and then not render graphics on that area of the display. It will be opaque black at that point (though rather fuzzy around the edges, as the LCD won't be in focus).
Magic Leap 2 actually employs this technique. It's... a lot like the rest of the device: a good idea on paper.
That's a fair response, though I hope you'd agree that in the context of discussing pass-through vs. see-through "black," the majority of use cases are indeed fully occluding/lit LCDs near the eye and not ML-style lenses.
I don't know. I mean, their newest device is better than the HoloLens 2. But like, that's just a relative statement. Waveguide displays are still objectively dogshit.
You have to be looking through an LCD for that to work, but all existing AR HMDs are head-mounted projectors with transparent mirrors, with no means to control, say, the pixel-level local reflectivity of the mirror. The mirror is just a flat mirror.
If you forgo the projector part and replace the mirror with a transparent LCD, it’s just too close to your eyes and you can’t see anything. If you add a microlens array to the LCD, now the LCD might come to focus but background becomes way too far-focused, and you can’t see anything either.
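The focus problem is easy to quantify with the thin-lens rule of thumb that focusing at distance d demands 1/d diopters of accommodation (the 3 cm figure below is just an illustrative guess at a panel-to-eye distance, not a spec):

```python
# diopters of accommodation needed to focus at a given distance
def accommodation_diopters(distance_m):
    return 1.0 / distance_m

print(accommodation_diopters(0.25))  # 4 D: a comfortable reading distance
print(accommodation_diopters(0.03))  # ~33 D: a panel 3 cm from the eye,
                                     # far beyond what a human eye can do
```

A young eye manages maybe 10-15 D at best, so a bare panel a few centimeters away can never be brought into focus without extra optics, which is exactly the microlens trade-off described above.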
If we were on an Enterprise-D, I guess I could just ask the replicator for a passive illuminated metamaterial light-field image combiner with integrated processing that runs on bus power from a DisplayPort input, but we are not there yet.
So, for now, our AR HMDs can only brighten pixels against backgrounds.
I think I understand. The intended idea was, e.g, a pair of "glasses" where in front of each eye are two layers, (1) a partial transparent mirror and (2) an LCD array, so that you could darken a pixel to block light coming through the mirror, and you would ideally only see light from the projector that was being bounced off the mirror and entering your eye. But you're pointing out that those LCD pixels won't be at the same focal depth as the objects out in the world you're looking at. Thank you.
This is also what I intuited. Can you give some examples of things that require "projected black", especially those that couldn't be solved by using a darkened room?
Hopefully we are not forgetting about Bosch's retinal projection, which solves many problems with lensed HMD AR. If I were a betting man, I would say Bosch came up with this tech almost specifically for Apple to integrate into interactive glasses.
I'm not aware of all the details around the technical complications, but from a physics perspective, you can't make something darker and more opaque by adding more light. Therefore, an AR headset needs to be able to at least partially block light in order to make convincing images. It seems a lot easier just to go with Apple's approach of blocking all light rather than try to develop tech that will only selectively block the light behind the AR objects while allowing other light through.
The blocking all light approach also allows you to hide other potential weaknesses of a device. For example, a lower field of view is much more distracting in a pass-through AR device as you still have your full peripheral vision. VR devices will generally black out the light outside of the FOV making it easier to ignore.
1. Field of view is limited by existing optics miniaturization
2. Subtractive shading (rendering black) might not be solvable
3. Variable-focus objects in the same scene require projecting n>2 significantly different wavefronts; it's not solved how to do this with a single vibrating element
Which pixels need to be darkened to shade an object depends on the distance to that object, and darkening them will also block the light coming from other objects at other distances. It's very inconvenient.
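The depth dependence is just parallax: the occlusion mask has to shift per eye by the binocular disparity of the object being occluded. A rough sketch (the 63 mm interpupillary distance is an assumed typical value):

```python
import math

def disparity_deg(ipd_m, depth_m):
    # angular disparity between the two eyes' lines of sight to a point
    # at the given depth; an occlusion mask must shift roughly this much per eye
    return math.degrees(2 * math.atan((ipd_m / 2) / depth_m))

for d in (0.5, 2.0, 10.0):
    print(f"object at {d:>4} m -> mask offset {disparity_deg(0.063, d):.2f} deg")
```

A mask placed correctly for an object at 0.5 m (~7° of disparity) is several degrees off for the background behind it, so a single fixed-depth blocking layer can't occlude correctly at all distances at once.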
The Magic Leap 2 does. It works OK. It's not as completely useless as armchair quarterbacks online would have you believe, but it's also not the greatest thing. Objects are a little fuzzy around the edges and depend on the accuracy of the object-surface detection, so moving objects can lag.
This is not the same as the Vision Pro's display. This paper describes tracking a single other person and displaying a perspective-correct rendering for them, but the Vision Pro displays a perspective-correct rendering for many viewpoints at once using a lenticular screen.
Apple's solution works for more than one person at the same time and doesn't require any external tracking (though it's already doing the external tracking regardless), at the cost of lower resolution and only being correct in one dimension instead of two.
> There are several established ways to display 3D images. For this research, we used a microlens-array light field display because it’s thin, simple to construct, and based on existing consumer LCD technology. These displays use a tiny grid of lenses that send light from different LCD pixels out in different directions, with the effect that an observer sees a different image when looking at the display from different directions. The perspective of the images shift naturally so that any number of people in the room can look at the light field display and see the correct perspective for their location.
> As with any early stage research prototype, this hardware still carries significant limitations: First, the viewing angle can’t be too severe, and second, the prototype can only show objects in sharp focus that are within a few centimeters of the physical screen surface. Conversations take place face-to-face, which naturally limits reverse passthrough viewing angles. And the wearer’s face is only a few centimeters from the physical screen surface, so the technology works well for this case — and will work even better if VR headsets continue to shrink in size, using methods such as holographic optics.
Do you know where one could read more about Apple's technique? I don't know much about lenticular displays or why the trick only works in one direction (presumably the horizontal one).
It could work in both dimensions, but you'd sacrifice even more resolution that way. For example, imagine you have a 1000x1000-pixel display (I just made this resolution up) and you stick a 1D lenticular screen on top with a pitch of 10 pixels. You've effectively split the display into 10 separate 100x1000 displays that are each viewed from a different angle. You could instead use a 2D lenticular screen and split it up into 100 100x100 displays, each viewable from a different angle in a 10x10 grid, at virtually no extra $ cost. However, you're displaying at 1/10th the resolution just to be able to support perspective-correct views from above or below, which are way less common than from the side.
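The arithmetic of that trade-off, in code form (same made-up 1000x1000 panel and pitch-10 lenticular as above):

```python
def effective_views(width, height, pitch, two_d=False):
    # returns (per-view resolution, number of distinct viewing angles)
    # for a lenticular overlay with the given pixel pitch
    if two_d:
        return (width // pitch, height // pitch), pitch * pitch
    return (width // pitch, height), pitch

res_1d, views_1d = effective_views(1000, 1000, 10)
res_2d, views_2d = effective_views(1000, 1000, 10, two_d=True)
print(res_1d, views_1d)  # (100, 1000) 10  -> 10 horizontal viewpoints
print(res_2d, views_2d)  # (100, 100) 100  -> 10x10 grid of viewpoints
```

Same panel, same lens cost, but the 2D version pays another 10x in per-view pixels just to add vertical perspective correction.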
Okay, "each person" does indeed sound like "multiple at the same time", but it could also mean "any single person". Then the lenticular lens display would only need to produce two images, for the left and the right eye.
Hey @ramboldio, as one of the authors of the paper, do you have insider knowledge that Apple got the idea from your paper vs. Facebook's "bizarre 'reverse passthrough'"¹ prototype from 2021? Is there a licensing arrangement? (Just curious, it's a really interesting idea in any case!)
Could you say more? I agree an Apple VR headset has been in the works for longer than 2 years, but is it that crazy that they were working on multiple approaches and didn't settle on a final design until after 8/2021, which included using some non-trivial ideas from that paper?
Can't really give out any non-public info unfortunately.
I'm not saying that everything was fixed in stone by 8/2021, but any big hardware features like the front-facing display would take longer than that to develop start-to-finish, so I'm just refuting the possibility that Apple could have started development of a front-facing display on the headset and had it ready on the final product in <2 years.
It's not necessarily that a display itself (or any other individual component really) takes >2 years to develop, but that a tightly integrated cutting-edge system can't have significant hardware features added on <2 years before the final product is demoed to the public.
I'm gonna go ahead and play: 'I work near hardware in FAANG, this is totally possible to pull off in 2 years'
...but if you're asserting 'I work at Apple, impossible', I'll give it to you.
Generally people believe way too strongly that phones / other hardware /etc. are set 3 years in advance.
Note it's well-reported Vision Pro just got to DVT in the last 4-6 weeks.
Yeah I definitely agree with you that a FAANG could add a substantial feature to a headset in 2 years, but I think that there are a combination of circumstances that make it pretty impossible in this case.
In my mind it's some combination of:
- New product line (this would be easier in a well established product like iPhone, Mac, etc. or from Quest 2 -> Quest 3). A lot of decisions have to be made much further in advance because you're starting from a clean slate and have to have a final product at the end of similar quality to the iPhones that have been iterated upon for 16 years.
- The Vision Pro is much more constrained in a few key areas that are hugely impacted by the addition of another display: size, energy draw, compute, weight. Much more so than something like a Mac would be. If an additional display wasn't in the budget 2 years ago then you really aren't gonna just find the space for it in all those key areas.
- Custom silicon: Any feature that requires a decent amount of compute and/or IO bandwidth will have to be accounted for when designing the chips. Meta's headsets use off the shelf Qualcomm GPUs and they could in theory bump up a level later in the design process if they need more hardware (not as easily as I described but still possible). Apple simply doesn't have that option.
- By virtue of this being Apple and not Meta/Google/Microsoft/Amazon. I'm not knocking those other companies, but they are differently positioned in the market and are ok releasing more varied and less polished products just to see what sticks. Apple enters later in the game with a product that has had more time to be refined. Google Glass, Oculus, HoloLens, etc. all paved the way for the Vision Pro and it wouldn't really work the other way around in my opinion.
Happy to give insight and answer any more questions you might have! I'm no longer at Apple but I did work on the Vision Pro and am super excited to be able to talk about it now. Of course I can't spill all the secrets and am being careful to only say stuff that's public knowledge, but it's a great feeling to finally talk about it after spending the past 3 years hush-hush.
I have extremely mixed feelings about the fact that Apple seems to have "solved" AR by faking it through VR, and that means that with some iteration, a battery improvement, and a price drop, this could truly be the next revolutionary device that people can't help but want to use.
And I am scared of what it means for our society when "eye contact" no longer means a direct connection in person, but an indirect one through 2 cameras and 2 screens.
Obviously this is old hat for Facetime, and all remote collaboration. But in-person too?
I think that more than anything, the inclusion of this feature is a hint about where Apple intends to take this product, and that they want to send a crystal clear message that this device is meant for interacting with other people.
And it seems this is such an important aspect of the product that they're willing to reduce the addressable market from a cost perspective.
This, to me, is what makes this product intriguing. And it makes me think that Apple's real goal is something closer to a pair of glasses, and they just know they can't get there without a long series of iterations.
This product isn't in the public's hands yet, so I think it's a bit early to conclude that it's categorically creepy.
But even if it ends up being a bit creepy, a) I think Apple is well aware of that and b) again I think this highlights how important they think it is to send the message from day 1 that solving for isolation is a top priority and intrinsic to their end-goal.
> This product isn't in the public's hands yet, so I think it's a bit early to conclude that it's categorically creepy
I was just going by the materials that Apple has released. It's true that I haven't personally seen the device in action in real life. But what Apple has shown certainly triggers a "creepy factor" in me.
Whether or not this is an issue for many people, and whether or not Apple will address it, isn't really relevant. What I've seen right now creeps me out a bit.
When Google Glass came out, there was the question of whether you want to talk to someone who has a camera running at all times (recording?). It's weird how perspective shifts.
Of course, in some places, if it's recording all the time, it might run into some weird wiretapping situation.
It's a shame I don't trust Apple not to do the same anymore. If the advertisement creep into MacOS is any indication, the Vision Pro might have more pop-ups than the Quest.
Huh. To me it massively decreases the "creepy factor", as the idea that someone is watching me when I can't see their eyes looking at me is where the creepiness originates... the only thing we normally interact with where that happens with people are mirrored sunglasses, and those are themselves often associated with cops and (if you even consider this a separate category ;P) creeps.
This scenario seems functionally similar to a FaceTime call or other form of video communication. Both parties know they’re not literally gazing into each other’s eyes, but the technical medium creates a pretty decent long distance approximation of doing just that.
A video call will never replace the real thing, but I don’t think I’d say it’s dystopian.
And I don’t find the tech in this headset all that different, the exception being that there isn’t a 2nd set of cameras and screens on the non-headset side of the conversation.
On the other hand, maybe it is dystopian, but if so, I’d argue that dystopia is already here and has been for awhile.
I don't know what kind of FaceTime calls you're having, but 99% of the time I'm looking at someone's face and we're not making eye contact.
The video is almost never exactly where the camera is, and if the person is using an external camera or a secondary monitor, the eyeline will never line up.
It's fine! But you never have the illusion that you're talking to the person directly. Which is maybe good enough for adults working on professional projects.
But it's not good enough for a parent interacting with a young child.
> This scenario seems functionally similar to a FaceTime call or other form of video communication.
But in a video call, the "distance" the video causes is something that you put up with in exchange for a greater benefit. If I'm talking with someone that has a screen in front of their face showing me their face, that's distance that provides no benefit, so it's a loss without gain. It makes the person seem sketchy.
But you still can't see their eyes. You're seeing an image on a screen. That strikes my creepy nerve because it seems even more opaque than if they are wearing sunglasses or something. Rather than just being aware that their eyes are being hidden, it feels like an attempt is being made to actively deceive me.
It's the same sort of creepy I get when someone is wearing a realistic mask.
Look on the bright side, maybe you can hack the vision pro to emulate direct eye contact in a conversation, when in reality you're scrolling hn comments.
I have to try it out to see whether it's worth it. But since the display can be fairly low-resolution, I don't expect that it adds a lot of cost. Weight would be a bigger concern to me.
While the displays are significant areas of power draw, for this product I don't think they're the low-hanging fruit. This is a device that's performing a whole bunch of computation in real time: video, lidar, infrared, eye tracking, and then it has to drive two 4K+ displays, one for each eye, with 3D-accelerated content being rendered at all times.
Go to an AR web page on your phone and start playing around with 3D objects in your space and you'll notice your phone getting significantly warmer and drawing more power. The Vision Pro is doing this literally all the time.
Also, the camera and sensor work that is tracking your eyes has to happen whether or not there is an outward-facing display.
Apple makes a watch with a display that is always on, and their phones have high resolution OLEDs that can stay on for over 13 hours (iPhone 14 Pro Max 150 nits brightness doing continuous 5G web browsing).
Hm, doesn't it use the outward-facing display at all times to display your status (whether you're fully immersed, able to see others, recording a video, etc.)? I'm sure there's more processing required to show the eyes, but it's still illuminating the display when showing these other states, which could be done via simple LED indicators, saving some energy.
Considering the amount of processing, rendering, sensors, eye tracking and 10x higher resolution displays all happening inside the headset, it's unlikely that the external display consumes enough power to affect battery life significantly. It would probably only save a few minutes at most.
How heavy is an OLED panel? Isn't it like a piece of flexible plastic?
Early reviewers seem to say that the metal construction of the Vision Pro seems to be contributing a lot to its weight. Most other headsets are all plastic.
I think it's one of the most important features of the design. Apple is trying to get VR/AR users out in public so that it can be a mainstream device.
In the long run, adding a second screen isn't that expensive, and the cameras that capture the video of your eyes already have to be inside the system to perform eye tracking. If smartphone manufacturers can make folding phones with second screens for under $1000 I think that the outward-facing display is not the lowest hanging fruit for cost reduction.
Anyone who thought reprojection was the solution for AR would have considered this approach, since otherwise you'd have to find a way to simulate glass. The first consumer-grade passthrough VR was the Gear VR in 2015 or so, so I don't think the idea was originally conceived this late.
[1] https://patents.google.com/patent/US10757400B2/en