Results of Year 1
In the first year of the project, we focused mostly on the following aspects:
- data collection,
- definition of user groups and collection of user requirements,
- development of baseline tools for integration in AXES – V0,
- research into innovative methods for audiovisual content analysis,
- first prototype of the AXES-PRO system (‘AXES-lite’).
Data collection
We collected a first dataset consisting of 30+ hours of video and audio content from the different content providers in the consortium (Deutsche Welle, NISV, BBC). This dataset served to test the different components of the AXES-V0 system.
User groups and user requirements
We analysed the different user (sub)groups and their needs, based on interviews, surveys, and observations using a system mockup. We defined personas for each group, and constructed a detailed list of user requirements.
Baseline tools – AXES V0
The different baseline components were integrated as web services in the AXES-V0 system, building on the open-source WebLab platform. We now have one server running at the University of Twente and one at Cassidian premises; all data is preprocessed offline, while search and retrieval are performed online. The components integrated so far are:
- shot cut detection and keyframe extraction
- face detection and tracking
- scene category recognition
- object detection
- automatic speech recognition
- link management and search engine
- search by similarity
- graphical user interface
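The first component, shot cut detection, is commonly implemented by thresholding the distance between colour histograms of consecutive frames, with a keyframe then picked from each detected shot. The following is a minimal sketch of that standard approach, not the actual AXES component; all function names and the threshold value are illustrative.

```python
def l1_distance(h1, h2):
    """L1 distance between two (normalised) frame histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def detect_cuts(histograms, threshold=0.5):
    """Return indices i where a shot cut occurs between frame i-1 and i.

    `histograms` is a list of per-frame colour histograms; a real system
    would compute these from decoded video frames.
    """
    cuts = []
    for i in range(1, len(histograms)):
        if l1_distance(histograms[i - 1], histograms[i]) > threshold:
            cuts.append(i)
    return cuts

def keyframes(cuts, n_frames):
    """Pick the middle frame of each shot as its keyframe."""
    bounds = [0] + cuts + [n_frames]
    return [(bounds[k] + bounds[k + 1]) // 2 for k in range(len(bounds) - 1)]
```

For example, six frames whose histograms change abruptly after frame 2 yield a single cut at index 3 and one keyframe per resulting shot.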
Audio-visual content analysis
We developed new technology that pushes the state of the art in audio-visual content analysis. This work targets not only better performance, but also scalability and weak levels of supervision. Contributions include:
- better models for object and action recognition
- improved face detection and weakly supervised methods for face recognition
- new spatial and spatiotemporal descriptors and kernels
- speaker and face clustering, and an interactive scheme for rapid adaptation of speaker identification models
- video copy detection and novel schemes for better retrieval
- new work on multimodal event detection and speech recognition
- on-the-fly learning of new categories
- unsupervised acoustic model training using time-synchronised audio/text resources
First prototype of AXES-PRO
We built ‘AXES-lite’, a first prototype of what will eventually be the AXES-PRO system. AXES-lite allows keyword-based search as well as similarity search (search by example).
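Similarity search (search by example) typically amounts to ranking database items by the similarity of their feature vectors to the query's. A minimal sketch of this idea, with cosine similarity standing in for whatever features and similarity measure AXES-lite actually uses:

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def search_by_example(query_vec, index, top_k=5):
    """Rank the indexed items by similarity to the query example.

    `index` is a list of (item_id, feature_vector) pairs; in practice the
    features are precomputed offline and stored in the search index.
    """
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [item_id for item_id, _ in ranked[:top_k]]
```

A real system would replace the linear scan with an approximate nearest-neighbour index to scale to large archives.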
The keyword-based search is not limited to the metadata or the ASR output; it also searches the visual content. If a predefined concept matches the keyword, the result is retrieved directly from the database. If not, a new classifier is learnt on the fly in a matter of seconds, starting from a set of images retrieved from a web image search engine like Google Images or Flickr.
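The on-the-fly step can be sketched as follows. This is an illustrative stand-in, not the AXES classifier: it uses a simple nearest-centroid scorer, where the downloaded web images act as positives and a fixed background pool as negatives, and all function names are hypothetical.

```python
def train_on_the_fly(positive_feats, negative_feats):
    """Train a nearest-centroid scorer from features of web images
    (positives) and a fixed background pool (negatives).

    Returns a scoring function: higher score = more likely positive.
    """
    def centroid(feats):
        n = len(feats)
        return [sum(col) / n for col in zip(*feats)]

    pos_c = centroid(positive_feats)
    neg_c = centroid(negative_feats)

    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    def score(x):
        # Closer to the positive centroid than the negative one => positive.
        return sq_dist(x, neg_c) - sq_dist(x, pos_c)

    return score

def rank_database(score, database):
    """Rank (frame_id, feature) pairs in the database, best matches first."""
    ranked = sorted(database, key=lambda pair: score(pair[1]), reverse=True)
    return [frame_id for frame_id, _ in ranked]
```

In the real pipeline, the positives would be features extracted from the freshly downloaded web images, and a discriminative classifier (e.g. a linear SVM) would typically replace the centroid scorer.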
Screenshot of the AXES-lite system