TR10: Augmented Reality

Markus Kähäri wants to superimpose digital information on the real world.

By Erika Jonietz

Finding your way around a new city can be exasperating: juggling maps and guidebooks, trying to figure out where you are on roads with no street signs, talking with locals who give directions by referring to unfamiliar landmarks. If you're driving, a car with a GPS navigation system can make things easier, but it still won't help you decide, say, which restaurant suits both your palate and your budget. Engineers at the Nokia Research Center in Helsinki, Finland, hope that a project called Mobile Augmented Reality Applications will help you get where you're going--and decide what to do once you're there.

Last October, a team led by Markus Kähäri unveiled a prototype of the system at the International Symposium on Mixed and Augmented Reality. The team added a GPS sensor, a compass, and accelerometers to a Nokia smart phone. Using data from these sensors, the phone can calculate the location of just about any object its camera is aimed at. Each time the phone changes location, it retrieves the names and geographical coördinates of nearby landmarks from an external database. The user can then download additional information about a chosen location from the Web--say, the names of businesses in the Empire State Building, the cost of visiting the building's observatories, or hours and menus for its five eateries.
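The article does not describe how the prototype matches a camera direction to a landmark. One plausible approach, sketched here in Python purely as an illustration, is to compute the compass bearing from the phone's GPS fix to each nearby landmark and pick the landmark whose bearing is closest to the compass heading. The function names, the landmark tuples, and the 20-degree half field of view are all assumptions, not details of Nokia's system.

```python
import math

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2,
    in degrees clockwise from true north (0 <= bearing < 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360.0

def angular_difference(a, b):
    """Smallest absolute difference between two headings, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)

def landmark_in_view(phone_lat, phone_lon, heading_deg, landmarks,
                     half_fov_deg=20.0):
    """Return the name of the landmark whose bearing is closest to the
    compass heading, or None if none lies within the assumed half
    field of view. `landmarks` is a list of (name, lat, lon) tuples."""
    best_name, best_offset = None, half_fov_deg
    for name, lat, lon in landmarks:
        offset = angular_difference(
            bearing_deg(phone_lat, phone_lon, lat, lon), heading_deg)
        if offset <= best_offset:
            best_name, best_offset = name, offset
    return best_name

# Hypothetical landmarks: one due north of the phone, one due east.
nearby = [("north tower", 40.01, -74.0), ("east cafe", 40.0, -73.99)]
print(landmark_in_view(40.0, -74.0, 85.0, nearby))
```

A real system would also need the accelerometer-derived tilt and some handling of GPS and compass error, which this sketch ignores.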

The Nokia project builds on more than a decade of academic research into mobile augmented reality. Steven Feiner, the director of Columbia University's Computer Graphics and User Interfaces Laboratory, undertook some of the earliest research in the field and finds the Nokia project heartening. "The big missing link when I started was a small computer," he says. "Those small computers are now cell phones."

Despite the availability and fairly low cost of the sensors the Nokia team used, some engineers believe that they introduce too much complexity for a commercial application. "In my opinion, this is very exotic hardware to provide," says Valentin Lefevre, chief technology officer and cofounder of Total Immersion, an augmented-reality company in Suresnes, France. "That's why we think picture analysis is the solution." Relying on software alone, Total Immersion's system begins with a single still image of whatever object the camera is aimed at, plus a rough digital model of that object; image-recognition algorithms then determine what data should be superimposed on the image. The company is already marketing a mobile version of its system to cell-phone operators in Asia and Europe and expects the system's first applications to be in gaming and advertising.

Nokia researchers have begun working on real-time image-recognition algorithms as well; they hope the algorithms will eliminate the need for location sensors and improve their system's accuracy and reliability. "Methods that don't rely on those components can be more robust," says Kari Pulli, a research fellow at the Nokia Research Center in Palo Alto, CA.

All parties agree, though, that mobile augmented reality is nearly ready for the market. "For mobile-phone applications, the technology is here," says Feiner. One challenge is convincing carriers such as Sprint or Verizon that customers would pay for augmented-reality services. "If some big operator in the U.S. would launch this, it could fly today," Pulli says.
Microvision's Color Eyewear platform is uniquely suited to meet the requirements for a consumer augmented reality solution based on our expected combination of see-through capability, brightness and readability in any ambient lighting condition, and lightweight, fashionable form-factor. You just can't do this with any other display. Holding up a phone screen to do AR is just not going to be a compelling user experience.

On another note, I've done transcripts of some form or another for every conference call since April 2004. I may be tooooo busy to get this one done (and it's a long one!). Thanks.


At March 14, 2007 at 2:24 PM sturocks said...

"On another note, I've done transcripts of some form or another for every conference call since April 2004. I may be tooooo busy to get this one done (and it's a long one!). "

A top exec doesn't warrant an Administrative Assistant that can perform in his absence? Poor excuse. LOL You're the best. Thanks

At March 14, 2007 at 8:00 PM Anonymous said...

A top exec doesn't warrant an Administrative Assistant that can perform in his absence?
That was exactly the kind of waste and abuse of shareholder value we saw under RR and Co. I would rather see some revenue than read Ben's comments pertaining to the CC. More selling and less talk.

At March 15, 2007 at 6:32 AM sturocks said...


I believe Ben understood I was using "Sarcastic Humor". Forgot that someone else might interpret it differently. My apologies. I agree with your sentiment. Was thanking Ben for his past and present efforts. He is "The Best"

At March 15, 2007 at 9:32 AM Chris said...

Wow, this is exactly the "killer app" needed to make color eyewear exciting to the masses.

At March 16, 2007 at 1:52 AM Lars said...

Have you found a solution for efficient tracking of the Color Eyewear?

Ronald Azuma (prominent Augmented Reality (AR) researcher) concluded the following in a paper from 1993:

"First, a tracker must be accurate to a small fraction of a degree in orientation and a few millimeters (mm) in position. Errors in measured head orientation usually cause larger registration offsets than object orientation errors do, making this requirement more critical for systems based on Head-Mounted Displays (HMDs). Try the following simple demonstration. Take out a dime and hold it at arm's length. The diameter of the dime covers approximately 1.5 degrees of arc. In comparison, a full moon covers 1/2 degree of arc. Now imagine a virtual coffee cup sitting on the corner of a real table two meters away from you. An angular error of 1.5 degrees in head orientation moves the cup by about 52 mm. Clearly, small orientation errors could result in a cup suspended in midair or interpenetrating the table. Similarly, if we want the cup to stay within 1 to 2 mm of its true position, then we cannot tolerate tracker positional errors of more than 1 to 2 mm."
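The cup figure in the quote follows from simple trigonometry: an object at distance d appears shifted by roughly d times the tangent of the angular error. A minimal Python check of that arithmetic (the function name is just for illustration):

```python
import math

def registration_offset_mm(distance_m, angular_error_deg):
    """Apparent displacement, in millimetres, of an object at the given
    distance when head orientation is off by the given angle."""
    return distance_m * 1000.0 * math.tan(math.radians(angular_error_deg))

# The quoted example: a cup two meters away, 1.5 degrees of error.
print(round(registration_offset_mm(2.0, 1.5), 1))  # about 52.4 mm
```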

Nokia's solution to AR uses not only the GPS, compass, and accelerometers but also a camera. It's much easier to augment a captured image with information on a mobile phone's display than to rely only on the compass and accelerometers.

But problems are to be solved! Right?

MVIS has the best currently available display technology for AR. You just need to figure out how to do the tracking! Can the MVIS technology be used as an image capture device? Yes it can!

Why not capture both the movement of the outside world as well as the movement of your eyes using the MVIS image scanning technology.

Combine that with a pair of nicely designed Color Eyewear and you MVIS shareholders will have your fortune made. :-)

Disclaimer: Well, it is probably not that easy... But using image-capturing technology is necessary. How else would you be able to put an AR-generated name tag on all the faces that you do not recognize?

At March 16, 2007 at 3:15 AM Anonymous said...


Clearly, if you want to superimpose imagery onto specific planes in the world, you'll need a camera. A combination of GPS (which are accurate on the order of a few meters), compass, and accelerometer could give you the initial 'position' from where image recognition would be used to specify and track items.

T-immersion has been doing completely software based tracking for a while, check out this video:

At March 16, 2007 at 9:43 AM Chris said...

Well... I don't think a "cup of coffee" is a good example - it'd be for larger things like buildings. You'd only need to know where the eyewear was pointed - not the eyes themselves, as the eyewear is projecting from a fixed point.

With all the advances in mapping/GPS (just look at Microsoft's Live Maps), couple that with image recognition and we are quickly converging on making this possible. Affordable MVIS eyewear is still a ways away, so it's looking like all these technologies could become available around the same timeframe.

At March 16, 2007 at 10:30 AM Lars said...

chris, I agree with you. It depends on the application!

anonymous, thanks for the link. That is a great illustration of what you can do with camera-based AR (the video shows technology that merges a video stream with animated 3D models and movies). In Nokia's and Total Immersion's case, the real objects you see are first captured by a camera and then shown on a display. In the real world beyond YouTube, you want to look directly at the real objects and not at them in a display...

I assume the MVIS Color Eyewear will use a technique similar to the Nomad's. That means that you look through a tilted semi-transparent mirror directly at the real world and not at a display where a camera has captured images of the real world. If you want to use camera-based tracking for optical see-through AR, you would have to calculate the real-world position of objects in the video stream and then use their coordinates to calculate where the graphics augmenting them should be drawn on the semi-transparent mirror. Then you lose the benefit of camera-based tracking, where there is no offset between where the camera sees the objects and where the augmented graphics should be. I was thinking: is it possible to keep a camera pointed such that its captured video stream perfectly matches the direction your eyes look through an optical see-through display?

Assuming this is possible. Anything's possible, right? Then there would still be a delay between the captured video frames from the camera and the direct view of the real world. So the question is how fast the video camera would have to be to keep up with your head movement and correctly position the augmented stuff in your view of the real world. It all depends on how fast you can move your head. Or how fast you *need* to move your head. How fast can you move your head? Say, when moving your gaze from facing in the direction of your left shoulder to the direction of your right while still being able to focus on a couple of objects on the way? I suggest that you could do that in a second. Moving your head like that in one second means that you cover an arc of 180 degrees, so your head turns at 180 degrees per second. To position objects with no more than one degree of offset, you would therefore need a 180 Hz camera, so that the head turns at most one degree between frames. Is that speed enough? Probably much less is enough, and it also depends on what other tracking equipment (compasses, accelerometers, etc.) you can combine with the camera-based tracking. It also depends on how the display will be used. Will the users take a slow walk in the countryside looking at flowers (augmented with their Latin names), or will they dodge virtual bullets in an augmented reality game?
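The frame-rate estimate above is just head speed divided by the tolerated offset. A small Python sketch of that back-of-envelope calculation, assuming (as a simplification) that the only source of error is how far the head turns between two camera frames:

```python
def required_frame_rate_hz(head_speed_deg_per_s, max_offset_deg):
    """Frame rate at which the head turns no more than max_offset_deg
    between two consecutive camera frames."""
    return head_speed_deg_per_s / max_offset_deg

def offset_per_frame_deg(head_speed_deg_per_s, frame_rate_hz):
    """How far the head turns between two frames at a given frame rate."""
    return head_speed_deg_per_s / frame_rate_hz

# Shoulder-to-shoulder sweep in one second, one degree of tolerance.
print(required_frame_rate_hz(180.0, 1.0))   # 180.0
# The same sweep with a 60 Hz camera, for comparison.
print(offset_per_frame_deg(180.0, 60.0))    # 3.0
```

Processing latency and display delay would add to the offset in practice, so this is a lower bound rather than a full latency budget.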

At March 16, 2007 at 10:34 AM Ben said...

awesome discussion!

At March 16, 2007 at 2:19 PM Anonymous said...

lars, you bring up a good point. When using see-through AR Eyewear, how do you figure out where to place the virtual items based on the video stream? I suppose one way to do this would be to 'calibrate' the equipment to match your field of vision.

This could work in an automated manner and be initiated by the system upon first use. Based on an estimation of the embedded camera's disposition to your eye-level view, a virtual circle would be superimposed on top of the real world. Then, you would be asked to align that circle to an object that the software is tracking. Once the calibration is done, it should work just fine from there, given that the camera stays fixed relative to you.

I'm not sure about the required framerate, however given the fluid sense of motion in video games at 60 Hz, I believe this to be adequate. And should you move your head in a very swift manner, it creates natural motion blur that would in itself render superfluous the need to perfectly track your movement.

At March 16, 2007 at 2:48 PM chris said...

Definitely good points there, lars.. how would the equipment know the angle of sight through the "eyepiece"? (I know it's not an eyepiece but rather lasers reflecting into your eye, but it's easier to think of an eyepiece.) I don't know if you'd ever want to look at a camera-delayed image, which is what makes MVIS so attractive - the overlaying on real vision. By the time you get display technology so good that you would not mind looking at that instead of reality.. that would be way outside of the "MVIS timeframe".

How do fighter pilot huds etc handle it? Another fighter at distance is not exactly a big object.

I'd think first phase could be large targets like buildings, statues etc where it doesn't have to have a direct pointer to the statue for example but yet give you information on what the statues name is and some way of linking to info... basically taking the functionality/strength of the internet and mapping into 3-d space with hotlinks etc on requiring more info.

In your flower example it wouldn't have to put the Latin name right on the flower itself (though that would be awesome) but rather just give "links" to the flower, placed more towards the outer edge.

I think a big thing will be how to determine which information is important to the user. As a hiker I would love to look at a mtn in the distance and have the eyewear tell me the mtn name, height, if there were trails to it and.. really neat if it could give you the view from that mtn back to where you are currently standing ;) But imagine if it had info on every plant.. yikes! Fun discussion for a Friday afternoon.

At March 19, 2007 at 7:24 AM Lars said...

When using see-through AR Eyewear, how do you figure out where to place the virtual items based on the video stream?

If you have only one camera and know the real size of an object in a video stream, you can use that information to calculate the object's position in the coordinate system of the camera. Since you know the coordinate system of the camera and the coordinate system of the AR Eyewear, you are only a couple of matrix multiplications away from transforming the object's position in camera space to that of the AR Eyewear. Precision depends on the resolution and speed of the camera and on your knowledge of the size of the object. Usually special markers are used to help the tracking. Have a look at
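As a rough illustration of the single-camera case, here is a Python sketch using the standard pinhole camera model: the distance follows from the ratio of real size to image size, and a 4x4 homogeneous matrix moves the point from camera space to eyewear space. The focal length, pixel values, and the 3 cm camera-to-eyewear shift are made-up numbers, and the sketch treats the measured distance as depth along the optical axis, which is only reasonable near the image center.

```python
def distance_from_size(real_height_m, pixel_height, focal_length_px):
    """Pinhole model: distance = focal length * real size / image size."""
    return focal_length_px * real_height_m / pixel_height

def camera_point(u, v, cx, cy, focal_length_px, depth_m):
    """Back-project pixel (u, v) at a known depth into camera
    coordinates (x right, y down, z forward), in meters."""
    x = (u - cx) * depth_m / focal_length_px
    y = (v - cy) * depth_m / focal_length_px
    return (x, y, depth_m)

def transform(matrix, point):
    """Apply a 4x4 homogeneous transform (nested lists, row-major)
    to a 3D point and return the transformed 3D point."""
    vec = (point[0], point[1], point[2], 1.0)
    out = [sum(m * c for m, c in zip(row, vec)) for row in matrix]
    return (out[0], out[1], out[2])

# Hypothetical numbers: a 0.2 m marker that appears 80 px tall
# in a camera with an 800 px focal length.
depth = distance_from_size(0.2, 80.0, 800.0)            # 2.0 m
p_cam = camera_point(400.0, 240.0, 320.0, 240.0, 800.0, depth)

# Assumed calibration: the camera sits 3 cm to the right of the
# eyewear origin and points the same way, so the transform is a
# pure translation along x.
camera_to_eyewear = [
    [1.0, 0.0, 0.0, 0.03],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
p_eye = transform(camera_to_eyewear, p_cam)
print(p_eye)
```

With a rotated camera the upper-left 3x3 block of the matrix would hold the rotation, and any error in that calibration carries straight into the overlay position.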

If you have two or more cameras (with known position and orientation), you do not need to know the real size of the object. You can use image processing to identify similar features in each of the cameras' video streams and then match them. You then know the direction to the specific feature in each camera's coordinate system. Use triangulation to figure out the coordinates of the real object's position. Further reading here
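One common way to do that triangulation, sketched here in Python under the assumption that both viewing rays are already expressed in a shared world coordinate system: because measured rays rarely intersect exactly, take the midpoint of the shortest segment between them. The camera positions and directions below are invented for the example.

```python
def dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))

def triangulate_midpoint(p1, d1, p2, d2):
    """Estimate a 3D point from two rays (origin p, direction d) as the
    midpoint of the shortest segment connecting them."""
    w = tuple(a - b for a, b in zip(p1, p2))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; cannot triangulate")
    s = (b * e - c * d) / denom   # parameter along ray 1
    t = (a * e - b * d) / denom   # parameter along ray 2
    q1 = tuple(p + s * k for p, k in zip(p1, d1))
    q2 = tuple(p + t * k for p, k in zip(p2, d2))
    return tuple((u + v) / 2.0 for u, v in zip(q1, q2))

# Two hypothetical cameras 10 cm apart, both seeing the same feature.
point = triangulate_midpoint(
    (0.0, 0.0, 0.0), (0.05, 0.0, 2.0),
    (0.1, 0.0, 0.0), (-0.05, 0.0, 2.0),
)
print(point)
```

The closer the two rays are to parallel (a distant object, or cameras close together), the more a small direction error moves the estimate, which is one reason depth precision falls off with distance.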

The above transforms also combine the errors you get when calculating the size and orientation of objects, not knowing exactly how the camera(s) are aligned in relation to each other and not knowing exactly how they are aligned in relation to the see-through eyewear.

By aligning a camera exactly with the optics in a see-through display you would at least avoid the transformation errors. If the camera is fast enough to keep up with the way you want to interact with the world, then you could augment real-world objects on the basis of where they are in the video stream, without the need to calculate the real-world coordinates and orientation of the objects you augment. You can, for example, keep track of features in the video stream like in this example of motion estimation. With the video stream you can easily feed a couple of image frames to Polar Rose and have the faces recognized. Or recognize other nice things through services like

how would the equipment know the angle of sight through the "eyepeice"?
You could use some eye tracking technique. There exist wearable eye tracking systems. Usually they use a couple of infrared light spots pointing towards the eye and then a camera detects their reflection in the eye in the video stream.

Eye tracking is probably a nice way to interact in augmented reality environments. As you mention it has been used in fighter pilot huds for a long time.

By the way, it is not the size of objects that matters, it is the visual angle. The Petronas Towers at the horizon might be no bigger than a person next to you, in degrees of arc. And it is the precision in visual angle that is important. You are probably right that it is easier to keep track of a building at a distance, since you would know its exact height and lat/long coordinates. And a building does not move the way a person or some small object probably would.
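To put numbers on the visual-angle point, here is a small Python sketch. The sizes and distances are invented round figures, chosen so that a tall building far away and a person nearby subtend the same angle.

```python
import math

def visual_angle_deg(size_m, distance_m):
    """Angle subtended by an object of the given size, seen face-on
    from the given distance."""
    return math.degrees(2.0 * math.atan(size_m / (2.0 * distance_m)))

# Hypothetical figures: a 450 m building seen from 25 km and a
# 1.8 m person seen from 100 m cover the same visual angle.
building = visual_angle_deg(450.0, 25000.0)
person = visual_angle_deg(1.8, 100.0)
print(round(building, 3), round(person, 3))
```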


