





The Feature: Hyperlinking the World



While most of us snap silly candids with our cameraphones, computer vision researcher Hartmut Neven is leveraging the ubiquity of digital cameras to google the world.



For computer vision researcher Hartmut Neven, the proliferation of cameraphones is an opportunity to put his life's work into every consumer's pocket. Neven, the head of the Laboratory for Human-Machine Interfaces at the University of Southern California's Information Sciences Institute, has developed image-recognition software optimized for mobile phone microprocessors. His technology, sold through start-up Neven Vision, already powers gimmicky MMS services from Vodafone Japan and NTT DoCoMo that automatically overlay special effects like tears or a halo on cameraphone video images. The Los Angeles Police Department is testing the same underlying facial analysis technology in the form of a digital "mugshot book." Officers on the streets point a camera at a suspect to see if his or her face matches anyone in their rogues' gallery.



Neven's eyes are on the future, though. His long-term goal is to bring biometrics to the mobile masses and hyperlink the world through a system best described as "a visual Google."



TheFeature: What do you mean by "visual Google"?



Neven: You take a picture of something, send it to our servers, and we either provide you with more information or link you to the place that will. Let's say you're standing in front of the Mona Lisa in the Louvre. You take a snapshot with your cameraphone and instantly receive an audio-visual narrative about the painting. Then you step out of the Louvre and see a cafe. Should you go in? Take a shot from the other side of the street and a restaurant guide will appear on your phone. You sit down inside, but perhaps your French is a little rusty. You take a picture of the menu and a dictionary comes up to translate. There's a huge variety of potential users in these kinds of situations, from stamp collectors to people who want to check their skin melanoma to police officers who need to identify the person in front of them.
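The query loop Neven describes (snap a picture, send it off, match it against a database, return an info link) can be illustrated with a toy end-to-end sketch. Everything here is hypothetical: the coarse intensity histogram stands in for whatever proprietary features Neven Vision actually extracts, and the database entries and URLs are invented.

```python
# Toy "visual Google" query: extract a compact descriptor from a snapshot
# and return the closest-matching database entry's info link.
import math

def descriptor(pixels, bins=8):
    """Reduce a grayscale image (flat list of 0-255 ints) to a normalized histogram."""
    hist = [0] * bins
    for p in pixels:
        hist[min(p * bins // 256, bins - 1)] += 1
    return [h / len(pixels) for h in hist]

def match(query_desc, database):
    """Return the entry whose descriptor is nearest (Euclidean) to the query's."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(database, key=lambda entry: dist(entry["desc"], query_desc))

# Hypothetical server-side database with precomputed descriptors.
db = [
    {"name": "Mona Lisa", "url": "louvre.example/mona-lisa",
     "desc": descriptor([40] * 90 + [200] * 10)},
    {"name": "Cafe menu", "url": "guide.example/cafe",
     "desc": descriptor([220] * 80 + [30] * 20)},
]

snapshot = [45] * 85 + [210] * 15  # mostly dark pixels, like the painting
best = match(descriptor(snapshot), db)
print(best["name"], best["url"])  # prints "Mona Lisa louvre.example/mona-lisa"
```

A real system would of course use far richer features than a brightness histogram, but the shape of the pipeline (client descriptor, server-side nearest-neighbor lookup, info link back) is the same.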



TheFeature: But how do you seed such a massive database of objects?



Neven: The key is to start with well-defined segments where the cost and effort of building the database is not that large. A nice rollout example would be a movie guide. If you see a billboard of a movie on a bus, you take a shot of it and then are routed to a relevant site where you can download a trailer or get show times. All we would need are images of a couple hundred billboards. The same is true with the Louvre example, where a collection of images already exists. With our technology, it doesn't take an expert to train the system to recognize an object.



TheFeature: You're planning to roll out the first version of this system in a year or so. What will it look like at the beginning?



Neven: Mobile advertising is a natural place to start this and get the kinks out. You could take a picture of the new BMW and automatically be entered into a sweepstakes. An advertising campaign would create awareness about this technology so people would learn how to use it.



There's a big leap between an automobile ad campaign and a visual database of the world though. Pulling off a visual Google is certainly a huge endeavor. We haven't fully explored scalability. We can comfortably do 100,000 objects, but can we do a million? For example, comparison shopping is an attractive application. A lady sees a handbag she likes but she wants to know the price of that handbag at other stores or see similar handbags that may be available. If you have hundreds of handbags that are only differentiated by small features, the system isn't good enough yet to discriminate.



TheFeature: What are the other technical challenges?



Neven: If you have millions of objects in the database, you can't search through every one of them looking for a match. The search strategies have to become smarter. You have to find effective ways to prune down your search early on so you're only comparing the photo you took to the most relevant sets of objects. Also, you eventually need to think about smart balancing between processing on the phone and the server. As the discrimination needs increase, you must account for finer image details. That would require sending a higher resolution image to the server, which could be expensive and clog the system. I think it would make sense to put more of the intelligence on the handset to do some pre-processing. That way, the handset would only send the necessary image features to the server where the recognition process would be completed.
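Neven's pruning idea can be sketched as a two-stage search: a coarse pass against a handful of bucket centroids picks the most promising subset, and only that subset gets the fine per-object comparison. The bucketing scheme below is an illustrative stand-in, not Neven Vision's actual index, and the handset/server split he proposes is noted in the comments as an assumption.

```python
# Sketch of coarse-to-fine search with early pruning (illustrative only).
import math
import random

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_index(objects, k=4, seed=0):
    """Assign each object to the nearest of k centroids sampled from the data."""
    rng = random.Random(seed)
    centroids = [o["feat"] for o in rng.sample(objects, k)]
    buckets = [[] for _ in range(k)]
    for o in objects:
        nearest = min(range(k), key=lambda j: dist(o["feat"], centroids[j]))
        buckets[nearest].append(o)
    return centroids, buckets

def search(query_feat, centroids, buckets):
    """Coarse pass: pick one centroid. Fine pass: scan only that bucket.
    In Neven's proposal, query_feat would be computed on the handset and
    sent to the server instead of a full-resolution image."""
    i = min(range(len(centroids)), key=lambda j: dist(query_feat, centroids[j]))
    return min(buckets[i], key=lambda o: dist(query_feat, o["feat"]))

# Toy database of 25 objects laid out on a grid of 2-D features.
objs = [{"name": f"obj{i}", "feat": [float(i % 5), float(i // 5)]}
        for i in range(25)]
cents, bks = build_index(objs)
hit = search([2.0, 3.0], cents, bks)  # exactly obj17's feature vector
print(hit["name"])  # prints "obj17"
```

With k buckets, the fine pass touches roughly 1/k of the database instead of all of it; real approximate indexes trade a small miss rate for exactly that kind of pruning speedup.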



TheFeature: You're also looking at ways to use your technology for secure mobile commerce.



Neven: Yes, the goal is to use cameraphones to accurately identify humans as well as objects. Right now in my laboratory, we have a working version of a single image multibiometric system. It fuses classic facial feature comparison with iris scanning and skin texture analysis. Your skin texture is like a fingerprint for your face. Our system can create a very high quality biometric signature without expensive sensors or cameras.
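Score-level fusion, one common way to combine multiple biometric matchers, gives a feel for how the facial-feature, iris, and skin-texture signals Neven mentions could be merged into a single accept/reject decision. The weights and threshold below are invented placeholders; the interview doesn't describe the actual system's fusion method.

```python
# Illustrative score-level fusion for a multibiometric check. Each matcher
# returns a similarity score in [0, 1]; a weighted sum is compared to a
# decision threshold. Weights and threshold are made-up placeholders.
WEIGHTS = {"face": 0.4, "iris": 0.35, "skin_texture": 0.25}
THRESHOLD = 0.7

def fuse(scores):
    """Combine per-modality similarity scores into one fused score."""
    return sum(WEIGHTS[m] * scores[m] for m in WEIGHTS)

def authenticate(scores):
    """Accept only if the fused score clears the decision threshold."""
    return fuse(scores) >= THRESHOLD

print(authenticate({"face": 0.9, "iris": 0.8, "skin_texture": 0.75}))  # True
print(authenticate({"face": 0.9, "iris": 0.2, "skin_texture": 0.3}))   # False
```

The appeal of fusion is visible even in this toy: a strong face match alone (second call) can't push a forged attempt over the threshold when the other modalities disagree.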



TheFeature: Is there really enough demand to justify a biometric system in every phone?



Neven: I think one huge application is enough to justify it. We're working with smart card vendors and credit card companies on a solution that allows for more reliable authentication of users. Take mobile banking, for instance. The system would take your picture and provide access to the banking site or not. It adds a second layer of security (above passwords). Access control is another application. At a locked door, you'd be prompted to authenticate yourself with your Bluetooth-enabled phone.



There's a formidable infrastructure out there of modern multimedia cameraphones. It's perhaps the most popular consumer device ever. Now we can inject machine vision into that infrastructure to enable new services.

It's easy to see the precursors of a persistent augmented reality environment being established with some of these ideas.

It also seems to me that some of them could be carried out more efficiently with RFID tags embedded in objects like the movie billboard in his example above. Instead of having your camera send out for information based on image recognition, you would simply access the relevant content associated with that billboard's unique RFID number.

Eventually there will be enough computing power, and enough speed and capacity in database software, for each of us to have always-on cameras gathering information about our surroundings and feeding us a stream of relevant data about where we are and what we're looking at, everywhere we go. A combination of ubiquitous RFID tags and hyper-efficient image recognition technology could be the basis for AR as we will know it.



Thanks to Michael.


