The Department of Homeland Security (DHS) collects and acquires a large amount of video from surveillance systems (e.g., body-worn cameras, vehicle-mounted cameras, and fixed surveillance systems), along with video captured by both citizens and potential offenders (e.g., cell phone video and home surveillance systems). The goal of this project is to develop a set of tools that exploit this video data and extract useful information from it. Initial discussions with DHS generated several project concepts, any one of which could serve as an entry point for this planned research program.
In many cases, several videos of an “event” are obtained from multiple cameras; each camera may see parts of the scene that others do not. We would like to reconstruct, summarize, and extract information about the event using the richest description of the scene that can be generated from all of the available video data. For example, an end user might want to view the event from a viewpoint different from that of any of the real cameras. An investigator might also want to extract analytic measures from the video, such as the number of people in the scene or the types of objects visible. One may also want to summarize the video content or reject a particular video source as not relevant to the task at hand.
We propose to develop techniques for video synthesis and video content analysis to support the use cases summarized above.
- One goal of this project is to create a new “view” of a scene that was not explicitly captured by any camera or set of cameras. We will do this by capturing video data of a common field of interest from moving personal video devices (body-cams) and fixed surveillance cameras, and producing one or more synthetic scenes (output videos), each of which is “more than the sum of the parts”. The output videos are renderings of the video data captured by all of the cameras, as seen from a specified synthetic viewpoint. This would allow investigators to look at the scene from a chosen point of view. Missing data will be indicated, as will areas of low detail arising from low input camera resolution.
- We will also develop methods, using tools from computer vision and machine learning (in particular deep learning), to annotate the synthesized videos to indicate persons, vehicles, events and other possible objects of interest. These methods can also be used to quickly summarize events in the video.
- An interactive dashboard will be created for both end users and developers. The visual analytic interface will allow interactive viewpoint selection and camera source visualization to show which cameras contributed to which portions of the reconstructed video. Highlighting of, and search for, vehicles, events, and objects will be provided, as well as a visualization of the video summary. The dashboard will also allow missing and low-detail regions of the video to be selectively highlighted. A developer/analyst dashboard will also be developed for visualizing the performance of key portions of the video analytic workflow and algorithms.
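
To make the idea of camera source visualization and missing/low-detail marking more concrete, the following is a minimal Python sketch and not the project's method. It assumes that some upstream reconstruction step (not shown) can report, for each pixel of the synthetic view, how much detail each input camera could contribute there. The function name `attribute_sources`, the `DetailGrid` type, the camera names, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical per-camera input: a 2D grid with one entry per pixel of the
# synthetic view. Each entry is a detail score for that camera at that pixel,
# or None when the camera does not see that part of the scene.
DetailGrid = List[List[Optional[float]]]


def attribute_sources(
    detail_by_camera: Dict[str, DetailGrid],
    low_detail_threshold: float = 0.5,  # assumed cutoff, not from the proposal
) -> List[List[Tuple[Optional[str], str]]]:
    """Label each synthetic-view pixel with (best_camera, status).

    status is "missing" when no camera covers the pixel, "low_detail" when
    the best available score is below the threshold, and "ok" otherwise.
    """
    grids = list(detail_by_camera.values())
    if not grids:
        return []
    height = len(grids[0])
    width = len(grids[0][0]) if height else 0

    labels = []
    for row in range(height):
        out_row = []
        for col in range(width):
            best_cam, best_detail = None, None
            # Pick the camera that offers the most detail at this pixel.
            for cam, grid in detail_by_camera.items():
                detail = grid[row][col]
                if detail is not None and (best_detail is None or detail > best_detail):
                    best_cam, best_detail = cam, detail
            if best_cam is None:
                status = "missing"
            elif best_detail < low_detail_threshold:
                status = "low_detail"
            else:
                status = "ok"
            out_row.append((best_cam, status))
        labels.append(out_row)
    return labels


# Example: a 1x3 synthetic view seen by two hypothetical cameras.
example = attribute_sources({
    "bodycam_1": [[0.9, 0.2, None]],
    "fixed_cam_A": [[0.4, None, None]],
})
```

A dashboard could color each pixel by its camera label and overlay a hatch pattern wherever the status is "missing" or "low_detail". A real system would more likely blend several sources per pixel than pick a single winner.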
