Tech Convergence Will Spur Demand for New ADAS Technology

TR10: Augmented Reality

Markus Kähäri wants to superimpose digital information on the real world.

By Erika Jonietz

Finding your way around a new city can be exasperating: juggling maps and guidebooks, trying to figure out where you are on roads with no street signs, talking with locals who give directions by referring to unfamiliar landmarks. If you're driving, a car with a GPS navigation system can make things easier, but it still won't help you decide, say, which restaurant suits both your palate and your budget. Engineers at the Nokia Research Center in Helsinki, Finland, hope that a project called Mobile Augmented Reality Applications will help you get where you're going--and decide what to do once you're there.

Last October, a team led by Markus Kähäri unveiled a prototype of the system at the International Symposium on Mixed and Augmented Reality. The team added a GPS sensor, a compass, and accelerometers to a Nokia smart phone. Using data from these sensors, the phone can calculate the location of just about any object its camera is aimed at. Each time the phone changes location, it retrieves the names and geographical coördinates of nearby landmarks from an external database. The user can then download additional information about a chosen location from the Web--say, the names of businesses in the Empire State Building, the cost of visiting the building's observatories, or hours and menus for its five eateries.
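
The article does not describe Nokia's actual algorithm, but the geometry behind "which landmark is the camera aimed at" can be sketched roughly as follows. This is a minimal illustration, assuming the phone reports a GPS position and a compass heading, that landmarks arrive as (name, latitude, longitude) tuples, and that the camera has a 60-degree horizontal field of view. The function names and the field-of-view value are hypothetical, not taken from the Nokia system.

```python
import math


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2,
    in degrees clockwise from north (0 <= bearing < 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360.0


def landmarks_in_view(phone_lat, phone_lon, heading_deg, landmarks, fov_deg=60.0):
    """Return (name, offset_deg) for each landmark whose bearing falls inside
    the camera's horizontal field of view. offset_deg is negative to the left
    of the view centre and positive to the right.

    fov_deg is an assumed value, not a figure from the article."""
    visible = []
    for name, lat, lon in landmarks:
        b = bearing_deg(phone_lat, phone_lon, lat, lon)
        # Wrap the difference into the range [-180, 180).
        offset = (b - heading_deg + 180.0) % 360.0 - 180.0
        if abs(offset) <= fov_deg / 2.0:
            visible.append((name, offset))
    return visible
```

The offset tells the display how far left or right of centre to draw a label. Compass error feeds straight into that offset, which is one reason the researchers quoted below are also looking at image recognition.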

The Nokia project builds on more than a decade of academic research into mobile augmented reality. Steven Feiner, the director of Columbia University's Computer Graphics and User Interfaces Laboratory, undertook some of the earliest research in the field and finds the Nokia project heartening. "The big missing link when I started was a small computer," he says. "Those small computers are now cell phones."

Despite the availability and fairly low cost of the sensors the Nokia team used, some engineers believe that they introduce too much complexity for a commercial application. "In my opinion, this is very exotic hardware to provide," says Valentin Lefevre, chief technology officer and cofounder of Total Immersion, an augmented-reality company in Suresnes, France. "That's why we think picture analysis is the solution." Relying on software alone, Total Immersion's system begins with a single still image of whatever object the camera is aimed at, plus a rough digital model of that object; image-recognition algorithms then determine what data should be superimposed on the image. The company is already marketing a mobile version of its system to cell-phone operators in Asia and Europe and expects the system's first applications to be in gaming and advertising.

Nokia researchers have begun working on real-time image-recognition algorithms as well; they hope the algorithms will eliminate the need for location sensors and improve their system's accuracy and reliability. "Methods that don't rely on those components can be more robust," says Kari Pulli, a research fellow at the Nokia Research Center in Palo Alto, CA.

All parties agree, though, that mobile augmented reality is nearly ready for the market. "For mobile-phone applications, the technology is here," says Feiner. One challenge is convincing carriers such as Sprint or Verizon that customers would pay for augmented-reality services. "If some big operator in the U.S. would launch this, it could fly today," Pulli says.
Microvision's Color Eyewear platform is uniquely suited to meet the requirements for a consumer augmented reality solution based on our expected combination of see-through capability, brightness and readability in any ambient lighting condition, and lightweight, fashionable form-factor. You just can't do this with any other display. Holding up a phone screen to do AR is just not going to be a compelling user experience.

On another note, I've done transcripts of some form or another for every conference call since April 2004. I may be tooooo busy to get this one done (and it's a long one!). Thanks.


  1. "On another note, I've done transcripts of some form or another for every conference call since April 2004. I may be tooooo busy to get this one done (and it's a long one!). "

    A top exec doesn't warrant an Administrative Assistant that can perform in his absence? Poor excuse. LOL You're the best. Thanks

  2. A top exec doesn't warrant an Administrative Assistant that can perform in his absence?
    That was exactly the kind of waste and abuse of shareholder value we saw under RR and Co. I would rather see some revenue than read Ben's comments pertaining to the CC. More selling and less talk.

  3. Sorry,

    I believe Ben understood I was using "Sarcastic Humor". Forgot that someone else might interpret it differently. My apologies. I agree with your sentiment. Was thanking Ben for his past and present efforts. He is "The Best"

  4. Wow, this is exactly the "killer app" needed to make color eyewear exciting to the masses.

  5. Have you found a solution for efficient tracking of the Color Eyewear?

    Ronald Azuma (prominent Augmented Reality (AR) researcher) concluded the following in a paper from 1993:

    "First, a tracker must be accurate to a small fraction of a degree in orientation and a few millimeters (mm) in position. Errors in measured head orientation usually cause larger registration offsets than object orientation errors do, making this requirement more critical for systems based on Head-Mounted Displays (HMDs). Try the following simple demonstration. Take out a dime and hold it at arm's length. The diameter of the dime covers approximately 1.5 degrees of arc. In comparison, a full moon covers 1/2 degree of arc. Now imagine a virtual coffee cup sitting on the corner of a real table two meters away from you. An angular error of 1.5 degrees in head orientation moves the cup by about 52 mm. Clearly, small orientation errors could result in a cup suspended in midair or interpenetrating the table. Similarly, if we want the cup to stay within 1 to 2 mm of its true position, then we cannot tolerate tracker positional errors of more than 1 to 2 mm."

    Nokia's solution to AR uses not only the GPS, compass and accelerometers but also a camera. It's much easier to augment a captured image with information on a mobile display than to rely on the compass and accelerometers alone.

    But problems are to be solved! Right?

    MVIS has the best currently available display technology for AR. You just need to figure out how to do the tracking! Can the MVIS technology be used as an image capture device? Yes it can!

    Why not capture both the movement of the outside world and the movement of your eyes using the MVIS image scanning technology?

    Combine that with a pair of nicely designed Color Eyewear and you MVIS shareholders will have your fortune made. :-)

    Disclaimer: Well, it is probably not that easy... But using image capturing technology is necessary. How else would you be able to put an AR-generated name tag on all the faces that you do not recognize?

  6. lars,

    Clearly, if you want to superimpose imagery onto specific planes in the world, you'll need a camera. A combination of GPS (which is accurate to within a few meters), compass, and accelerometer could give you the initial 'position' from where image recognition would be used to specify and track items.

    T-immersion has been doing completely software based tracking for a while, check out this video:

  7. Well... I don't think a "cup of coffee" is a good example - it'd be for larger things like buildings. You'd only need to know where the eyewear was pointed - not the eyes themselves, as the eyewear is projecting from a fixed point.

    With all the advances in mapping/GPS (just look at Microsoft's Live Maps), coupled with image recognition, we are quickly converging on making this possible. Affordable MVIS eyewear is still a ways away, so it's looking like all these technologies could become available around the same timeframe.

  8. chris, I agree with you. It depends on the application!

    anonymous, thanks for the link. That is a great illustration of what you can do with camera based AR. (The video shows technology that merges a video stream with animated 3D models and movies.) In Nokia's and Total Immersion's case the real objects you see are first captured by a camera and then shown on a display. In the real world beyond YouTube, you want to look directly at the real objects and not at them in a display...

    I assume the MVIS Color Eyewear will be using a similar technique to the Nomad. That means that you look through a tilted semi-transparent mirror directly at the real world and not at a display where a camera has captured images of the real world. If you want to use camera based tracking for optical see-through AR you would have to calculate the real world position of objects in the video stream and then use their coordinates to calculate where the graphics augmenting them should be drawn on the semi-transparent mirror. Then you lose the benefit of camera based tracking, where there is no offset between where the camera sees the objects and where the augmented graphics should be. I was thinking, is it possible to keep a camera pointed such that its captured video stream perfectly matches the direction your eyes look through an optical see-through display?

    Assuming this is possible. Anything’s possible, right? Then there would still be a delay between the captured video frames from the camera and the direct view of the real world. So the question is how fast would the video camera have to be to keep up with your head movement and be able to correctly position the augmented stuff in your view of the real world? It all depends on how fast you can move your head. Or how fast you *need* to move your head. How fast can you move your head? Say when moving your gaze from facing in the direction of your left shoulder to the direction of your right and still be able to focus on a couple of objects on the way? I suggest that you could do that in a second. Moving your head like that in one second means that you cover an arc of 180 degrees. So, for example, to position objects with no more than one degree of offset between frames you would need a 180 Hz camera. Is that speed enough? Probably much less is enough, and it also depends on what other tracking equipment, such as compasses, accelerometers, etc., you can combine with the camera based tracking. It also depends on how the display will be used. Will the users take a slow walk in the countryside looking at flowers (augmented with their Latin names) or will they dodge virtual bullets in an augmented reality game?

  9. lars, you bring up a good point. When using see-through AR Eyewear, how do you figure out where to place the virtual items based on the video stream? I suppose one way to do this would be to 'calibrate' the equipment to match your field of vision.

    This could work in an automated manner and be initiated by the system upon first use. Based on an estimate of the embedded camera's offset from your eye-level view, a virtual circle would be superimposed on top of the real world. Then, you would be asked to align that circle to an object that the software is tracking. Once the calibration is done, it should work just fine from there, given that the camera stays fixed relative to you.

    I'm not sure about the required framerate, however given the fluid sense of motion in video games at 60 Hz, I believe this to be adequate. And should you move your head in a very swift manner, it creates natural motion blur that would in itself render superfluous the need to perfectly track your movement.

  10. Definitely good points there lars.. how would the equipment know the angle of sight through the "eyepiece"? (I know it's not an eyepiece but rather lasers reflecting into your eye, but it's easier to think of an eyepiece). I don't know if you'd ever want to look at a camera delayed image, which is what makes MVIS so attractive - the overlaying on real vision. By the time you get display technology so good that you would not mind looking at that instead of reality.. that would be way outside of the "MVIS timeframe"

    How do fighter pilot HUDs etc handle it? Another fighter at distance is not exactly a big object.

    I'd think first phase could be large targets like buildings, statues etc where it doesn't have to have a direct pointer to the statue for example but yet give you information on what the statue's name is and some way of linking to info... basically taking the functionality/strength of the internet and mapping it into 3-d space with hotlinks etc for when you require more info.

    In your flower example it wouldn't have to put the Latin name right on the flower itself (though that would be awesome) but rather just give "links" on the flower more towards the outer edge.

    I think a big thing will be how to determine which information is important to the user. As a hiker I would love to look at a mtn in the distance and have the eyewear tell me the mtn name, height, if there were trails to it and.. really neat if it could give you the view from that mtn back to where you are currently standing ;) But imagine if it had info on every plant.. yikes! Fun discussion for a Friday afternoon.

  11. anonymous,
    When using see-through AR Eyewear, how do you figure out where to place the virtual items based on the video stream?

    If you have only one camera and know the real size of an object in a video stream you can use the information to calculate the object's position in the coordinate system of the camera. Since you know the coordinate system of the camera and the coordinate system of the AR Eyewear you are only a couple of matrix multiplications away from transforming the object's position in camera space to that of the AR Eyewear. Precision depends on resolution and speed of the camera and your knowledge of the size of the object. Usually special markers are used to help the tracking. Have a look at

    If you have two or more cameras (with known position and orientation) you do not need to know the real size of the object. You can use image processing to identify similar features in each of the cameras' video streams and then match them. You then know the direction to the specific feature in each camera's coordinate system. Use triangulation to figure out the coordinates of the real object's position. Further reading here

    The above transforms also combine the errors you get from calculating the size and orientation of objects, from not knowing exactly how the camera(s) are aligned in relation to each other, and from not knowing exactly how they are aligned in relation to the see-through eyewear.

    By aligning a camera exactly with the optics in a see-through display you would at least avoid the transformation errors. If the camera is fast enough to keep up with the way you want to interact with the world then you could augment real world objects on the basis of where they are in the video stream, without the need to calculate the real world coordinates and orientation of the objects you augment. You can for example keep track of features in the video stream like in this example of motion estimation. With the video stream you can easily feed a couple of image frames to Polar Rose and have the faces recognized. Or recognize other nice things through services like

    how would the equipment know the angle of sight through the "eyepiece"?
    You could use some eye tracking technique. There exist wearable eye tracking systems. Usually they use a couple of infrared light spots pointing towards the eye, and a camera then detects their reflection in the eye in the video stream.

    Eye tracking is probably a nice way to interact in augmented reality environments. As you mention it has been used in fighter pilot HUDs for a long time.

    By the way, it is not the size of objects that matters, it is the visual angle. The Petronas Towers at the horizon might be no bigger than a person next to you, in arc degrees. And it is the precision in visual angle that is important. You are probably right in that it is easier to keep track of a building at a distance since you would know its exact height and lat,long coords. And a building does not move as a person or some small object probably would.
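
The back-of-envelope figures in the thread above can be checked with a few lines of Python. This is only a rough sketch under simple assumptions: small static geometry, a pinhole camera model, and an object viewed face-on. The function names are made up for illustration, and the focal length in pixels is a hypothetical calibration value rather than a property of any device discussed here.

```python
import math


def registration_offset_mm(distance_m, angle_error_deg):
    """Sideways shift of a virtual object placed distance_m away when the
    measured head orientation is wrong by angle_error_deg."""
    return distance_m * 1000.0 * math.tan(math.radians(angle_error_deg))


def min_frame_rate_hz(head_speed_deg_per_s, max_offset_deg):
    """Frame rate at which the head turns by no more than max_offset_deg
    between two consecutive camera frames."""
    return head_speed_deg_per_s / max_offset_deg


def visual_angle_deg(size_m, distance_m):
    """Angle subtended at the eye by an object of size_m at distance_m."""
    return math.degrees(2.0 * math.atan(size_m / (2.0 * distance_m)))


def distance_from_known_size_m(real_size_m, size_px, focal_length_px):
    """Pinhole-camera range estimate for an object of known real size.
    focal_length_px is an assumed calibration value (focal length expressed
    in pixels), which a real system would have to measure."""
    return focal_length_px * real_size_m / size_px


if __name__ == "__main__":
    # Azuma's coffee cup: 1.5 degrees of error at two meters.
    print(round(registration_offset_mm(2.0, 1.5), 1), "mm")
    # lars's head turn: 180 degrees per second, at most 1 degree per frame.
    print(min_frame_rate_hz(180.0, 1.0), "Hz")
    # A dime (17.91 mm across) at an assumed arm's length of 0.7 m.
    print(round(visual_angle_deg(0.01791, 0.7), 2), "degrees")
```

By the same arithmetic, a 60 Hz camera and a head turning at 180 degrees per second would see about 3 degrees of rotation between frames, so whether 60 Hz is adequate depends on how much of that gap other sensors or prediction can fill.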

