Intel® Chip Chat is a series of interviews with technology experts that brings listeners closer to the innovations and inspirations shaping the future of computing. Viscovery was interviewed at the Intel Developer Forum 2016 in San Francisco. During the interview, Viscovery’s chairman, Don Hsi, talked about the challenges of searching and classifying videos as well as the solutions and products we offer. The following is the transcript of the interview.
Intel Chip Chat
Viscovery and Intel Advance Visual Search with Deep Learning – Intel Chip Chat Episode 505
Release date: 20 November 2016
Welcome to Chip Chat, an interview series that connects you with technology experts on the issues the industry is focused on today. And now, your host, Allyson Klein.
Allyson Klein: Welcome to Chip Chat. I’m Allyson Klein. We are coming to you live from Intel Developer Forum in San Francisco, and I am joined by Don Hsi, Chairman of Viscovery Group. Welcome, Don.
Don Hsi: Thank you, I’m glad to be here today.
Allyson Klein: So Don, why don’t you describe Viscovery? You are the Chairman of the company, so tell us why you founded it.
Don Hsi: Viscovery is a deep learning, machine learning technology company. The name Viscovery comes from Video Discovery.
Allyson Klein: Yeah, sure.
Don Hsi: So the focus has been on recognizing video content and making it structured. Once it is structured, we can search it and we can index it.
Allyson Klein: So there has been tremendous growth in online video creation, and video seems ripe for analysis. What has historically limited that kind of analysis?
Don Hsi: Viscovery has been developing video recognition technology, and previously we focused on image and photo recognition. As we all know, video is 30 frames of photos per second, and the problem with video content has been its unstructured nature. If you want to find a particular piece of footage, for example, and you haven’t previously tagged it manually, it’s literally impossible to go through it all. So sometimes when we forward a video to our friends on Facebook, Twitter or whatnot, we forward the whole link and then say, “Skip to 20 minutes and 30 seconds, then you will see what I want you to see.” It’s very inefficient, and the reason is that the video is not structured. You cannot really search video content. Even today there are certain video search engines, but they really just return manually tagged results, so they are limited by how many tags you put in. There is no programmatic, deep learning, automatic process for that. So based on that, we started to focus on video recognition technology.
Allyson Klein: Now, I read a little bit about your solution, and you look at video across different content classes. Can you talk a little bit about that?
Don Hsi: There are many recognition technologies out there. For example, in security surveillance, you can look at the video and try to detect whether there is an intruder or a car parked illegally. Those approaches just take a portion of the video content and focus on a particular character or behavior. Viscovery takes that approach to what we call the next level: we look at 7 major content classes, which are face, image, text, audio, motion, object and scene. To break one down quickly, take motion. When we look at a video, if I want to find all of Kobe Bryant’s slam dunks at Staples Center in Los Angeles, today that’s not possible. You cannot search video with that kind of query.
Allyson Klein: Right.
Don Hsi: Because to do that, you need to know what a slam dunk is.
Allyson Klein: Uh-huh.
Don Hsi: And a slam dunk is a motion, which is an area where our scientists are solving the problem: how do we detect a slam dunk motion, or a three-pointer? In short, it’s a very complicated technology, which is why we’ve been at it for many years. This year we started to put it all together and began deploying it to our clients.
Allyson Klein: Now tell me about the deep learning approach here. You’re training on a data set to recognize the ‘slam dunk’; talk a little bit about that.
Don Hsi: All right. Because we recognize 7 major content classes, it’s hard to describe our technology in one short sentence. But in general, when we talk about deep learning and machine learning, we talk about CNNs (convolutional neural networks), DNNs (deep neural networks) or R-CNNs, and in addition to that we also adopt what we call feature extraction.
Allyson Klein: Uh-huh.
Don Hsi: By combining this variety of technologies, we are able to derive the motion from a particular piece of video footage. In some cases we also rely on the voice. For example, the broadcaster will say “slam dunk”, right? So we recognize ‘slam dunk’ in the audio. Combining those signals, we are able to put the semantics together and then return the result to the customers.
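The interview does not describe how Viscovery combines these signals. Purely as an illustration, the idea of confirming a visual motion detection with a spoken cue can be sketched as a simple late-fusion rule. Everything below is a hypothetical assumption, not Viscovery’s method: the `MotionDetection` and `SpokenKeyword` records, the label-to-phrase mapping, the 5-second window, the 0.2 audio bonus and the 0.7 threshold.

```python
from dataclasses import dataclass


@dataclass
class MotionDetection:
    """A visual detection from a hypothetical motion classifier."""
    label: str      # e.g. "slam_dunk"
    start_s: float  # segment start time in seconds
    end_s: float    # segment end time in seconds
    score: float    # classifier confidence in [0, 1]


@dataclass
class SpokenKeyword:
    """A keyword hit from a hypothetical speech-recognition pass."""
    phrase: str     # e.g. "slam dunk"
    time_s: float   # time at which the phrase was spoken


def fuse(motions, keywords, phrase_for_label, window_s=5.0,
         audio_bonus=0.2, threshold=0.7):
    """Late fusion (illustrative): boost a visual detection when the matching
    phrase is spoken during the segment or up to window_s seconds after it,
    then keep only events whose fused score reaches the threshold."""
    events = []
    for m in motions:
        phrase = phrase_for_label.get(m.label)
        heard = any(
            k.phrase == phrase and m.start_s <= k.time_s <= m.end_s + window_s
            for k in keywords
        )
        fused = min(1.0, m.score + (audio_bonus if heard else 0.0))
        if fused >= threshold:
            events.append((m.label, m.start_s, m.end_s, round(fused, 2)))
    return events
```

For example, a borderline visual detection at 20 minutes 30 seconds (1230 s) with score 0.6 is kept once the commentator says “slam dunk” at 1234 s, while an unconfirmed 0.55 detection is dropped. A real system would more likely learn the fusion weights than hard-code them.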
Allyson Klein: Now, I know that some customers have actually deployed your technology and have it running in different solutions in the market today. Where are those solutions compared to the ultimate vision of all video being searchable on the fly, whenever we want?
Don Hsi: That’s a very admirable goal, and one we will actually achieve eventually. But as we all know, any search engine, Google or any other per se, needs a huge amount of metadata. Today we are working with our clients, many of which are online video platforms and TV stations, so we are processing large amounts of video in their databases. After each video is processed, we generate the metadata. Over time we will aggregate, collect and accumulate this huge amount of metadata, and it will become the search engine. So today when we work with a client, for example a TV station, in a way we are putting together a search engine for that particular client so they can search internally. But the metadata can be applied not just to search but also to ad delivery. As a matter of fact, that has been our focus today: helping platforms and advertisers achieve much more effective ad delivery results.
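The interview only says that accumulated metadata “will become the search engine.” One common way to turn per-video metadata into something searchable is an inverted index. The sketch below is an assumption for illustration, not Viscovery’s design: it maps each recognized label to `(video_id, start_s)` postings, so a query can return the exact time offset instead of a whole link plus “skip to 20 minutes and 30 seconds.” The class name, the label strings such as `"face:kobe_bryant"` and the tag format are all hypothetical.

```python
from collections import defaultdict


class MetadataIndex:
    """Hypothetical inverted index: label -> list of (video_id, start_s)."""

    def __init__(self):
        self._postings = defaultdict(list)

    def add_video(self, video_id, tags):
        # tags: iterable of (label, start_s) pairs produced by recognition
        for label, start_s in tags:
            self._postings[label].append((video_id, start_s))

    def search(self, *labels):
        """Return sorted (video_id, start_s) hits for the first label, keeping
        only videos whose metadata also contains every other label."""
        if not labels:
            return []
        videos_with_all = set.intersection(
            *({vid for vid, _ in self._postings.get(label, [])} for label in labels)
        )
        hits = [h for h in self._postings.get(labels[0], []) if h[0] in videos_with_all]
        return sorted(hits)
```

Usage: after `index.add_video("game_42", [("slam_dunk", 1230.0), ("face:kobe_bryant", 1229.0)])`, the call `index.search("slam_dunk", "face:kobe_bryant")` returns each slam dunk offset in videos where that face also appears. Requiring co-occurrence per video rather than per time window keeps the sketch short; a production index would likely match labels within the same time span.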
Allyson Klein: Sure, that makes a lot of sense. It’s been a pleasure talking to you today. Final question for you: if folks are listening online and want to learn more about your company, where should they go for more information?
Don Hsi: They can just go to our website, www.viscovery.co. We have a lot of information available online, and of course we also have some of our most recent results on the effectiveness of the ad technology. In short, I think what is very important to share with the audience is that on average we can increase ad efficiency by 30 percent or more.
Allyson Klein: Wow.
Don Hsi: And that is very significant.
Allyson Klein: Fantastic. Well, thank you so much, Don, for being on the program today. I can’t wait to learn more about the solutions you are bringing to market, and I’m sure the listeners on Chip Chat want to find out as well.
Don Hsi: Thank you very much. It’s my pleasure.
Link to the interview on Intel® Chip Chat: https://m.soundcloud.com/intelchipchat/visual-search-deep-learning?from=singlemessage&isappinstalled=0