2004, International Conference on Information Technology: Coding and Computing, 2004. Proceedings. ITCC 2004.
This paper presents a framework of a system for the query and retrieval of video data based on video events in large video repositories. The events are formulated using domain-independent event primitives, which are represented by spatio-temporal relationships between objects in the video scenes. Complex events are expressible as combinations of simpler events, which facilitates support of event queries from a variety of points of view. In addition, the framework is expected to be adaptable to multiple domains.
Knowledge-Based Systems, 2006
This paper aims to show that by using low level feature extraction, motion, and object identifying and tracking methods, features can be extracted and indexed for efficient and effective video retrieval, such as for an awards ceremony video. Video scene/shot analysis and key frame extraction are used as a foundation to identify objects in video and to find spatial relationships within the video. The compounding of low level features such as colour, texture, and abstract object identification leads into higher level real object identification and tracking and scene detection. The main focus is on using a video style that is different to the heavily used sports and news genres. Using different video styles can open the door to creating methods that could encompass all video types instead of specialized methods for each specific style of video.
2009
Motion information is regarded as one of the most important cues for developing semantics in video data. Yet it is extremely challenging to build semantics of video clips, particularly when they involve interactive motion of multiple objects. Most of the existing research has focused on capturing and modelling the motion of each object individually, thus losing interaction information. Such approaches yield low precision-recall ratios and limited indexing and retrieval performance.
2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), 2011
We present a distributed framework of understanding, indexing, and searching complex events from large amounts of surveillance video content. Video events and relationships between scene entities are represented by Spatio-Temporal And-Or Graphs (ST-AOG) and inferred in a distributed computing system using a bottom-up top-down strategy. We propose a method for sub-graph indexing of ST-AOGs of the recognized events for robust retrieval and quick search. Plain text reports of the scene are automatically generated to describe scene entities' relationships, contextual information, as well as events of interest. When a query is provided as keywords, plain text, voice, or a video clip, the query is parsed and the closest events are extracted utilizing text description and sub-graph matching.
Lecture Notes in Computer Science, 2000
Modeling video data poses a great challenge since they do not have as clear an underlying structure as traditional databases do. We propose a graphical object-based model, called VideoGraph, in this paper. This scheme has the following advantages: (1) In addition to semantics of individual video events, we capture their temporal relationships as well. (2) The inter-event relationships allow us to deduce implicit video information. (3) Uncertainty can also be handled by associating the video event with a temporal Boolean-like expression. This also allows us to exploit incomplete information. The above features make VideoGraph very flexible in representing various metadata types extracted from diverse information sources. To facilitate video retrieval, we also introduce a formalism for the query language based on path expressions. Query processing involves only simple traversal of the video graphs.
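The inter-event temporal relationships this abstract describes can be illustrated with a minimal sketch (not the paper's actual formalism): each event carries a frame interval, and simple Allen-style predicates relate two events. The event names are hypothetical.

```python
# Hypothetical sketch: events as (start_frame, end_frame) intervals,
# related by Allen-style temporal predicates.
def before(a, b):
    # a finishes entirely before b starts
    return a[1] < b[0]

def during(a, b):
    # a is fully contained within b
    return b[0] <= a[0] and a[1] <= b[1]

def overlaps(a, b):
    # a starts first and b begins before a ends
    return a[0] < b[0] < a[1] < b[1]

goal = (120, 150)          # hypothetical event intervals
celebration = (140, 300)
print(overlaps(goal, celebration))  # True
```

Chaining such predicates over pairs of events is one simple way to deduce implicit information, such as which events could plausibly have caused others.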
IEEE Transactions on Multimedia, 2003
In the past few years, modeling and querying video databases have been a subject of extensive research to develop tools for effective search of videos. In this paper, we present a hierarchical approach to model videos at three levels: the object level, the frame level, and the shot level. The model captures the visual features of individual objects at the object level, visual-spatio-temporal (VST) relationships between objects at the frame level, and time-varying visual features and time-varying VST relationships at the shot level. We call the combination of the time-varying visual features and the time-varying VST relationships a content trajectory, which is used to represent and index a shot. A novel query interface that allows users to describe the time-varying contents of complex video shots, such as those of skiers, soccer players, etc., by sketch and feature specification is presented. Our experimental results prove the effectiveness of modeling and querying shots using the content trajectory approach.
International Journal of Computer Theory and Engineering, 2014
With the continuous recording of video data in current surveillance systems, it is almost impossible to quickly identify frames where events of interest occurred within a camera scene. This paper presents the concept of Spatio-temporal Indexing, a novel video indexing and retrieval technique that can be used for event detection. Spatio-temporal Indexing allows users to rapidly retrieve the video clips that contain events of interest from a given video library. The proposed indexing technique analyzes the video and stores the Spatio-temporal Indexes, which are further processed to retrieve video clips queried by users. The proposed technique was tested on hours of video recordings. The results obtained provide 99.9% accuracy for event detection and retrieval. The average processing time is 3 seconds to create an index for 10 minutes of video using an Intel i5-2400 processor.
Multimedia Tools and Applications, 2016
Cognitive video supervision and event analysis in video sequences is a critical task in many multimedia applications. Methods, tools, and algorithms that aim to detect and recognize high-level concepts and their respective spatiotemporal and causal relations in order to identify semantic video activities, actions, and procedures have been in the focus of the research community over the last years. This research area has strong impact on many real-life applications such as service quality assurance, compliance to the designed procedures in industrial plants, surveillance of people-dense areas (e.g., thematic parks, critical public infrastructures), crisis management in public service areas (e.g., train stations, airports), security (detection of abnormal behaviors in surveillance videos), semantic characterization, and annotation of video streams in various domains (e.g., broadcast or user-generated videos). For instance, the dynamic capture of situational awareness concerning crowds in specific mass gathering venues and its intelligent enablement into emergency management information
large, even open-ended, video streams. Video data present a unique challenge for the information retrieval community because properly representing video events is challenging. We propose a novel approach to analyze temporal aspects of video data. We consider video data as a sequence of images that forms a 3-dimensional spatiotemporal structure, and perform multiview orthographic projection to transform the video data into 2-dimensional representations. The projected views allow a unique way to represent video events and capture the temporal aspect of video data. We extract local salient points from 2D projection views and perform detection-via-similarity approach on a wide range of events against real-world surveillance data. We demonstrate that our example-based detection framework is competitive and robust. We also investigate synthetic example driven retrieval as a basis for query-by-example.
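The multiview orthographic projection described above can be sketched roughly as follows (an illustrative assumption, not the paper's implementation): a grayscale clip is treated as a 3-dimensional (T, H, W) spatio-temporal volume and collapsed into three 2-dimensional views by projecting along each axis.

```python
import numpy as np

def orthographic_views(volume):
    # volume: (T, H, W) stack of grayscale frames; max-projection
    # along each axis yields the three orthographic views.
    front = volume.max(axis=0)   # (H, W): spatial view, time collapsed
    top   = volume.max(axis=1)   # (T, W): horizontal motion over time
    side  = volume.max(axis=2)   # (T, H): vertical motion over time
    return front, top, side

clip = np.random.rand(30, 48, 64)  # synthetic 30-frame 48x64 clip
front, top, side = orthographic_views(clip)
print(front.shape, top.shape, side.shape)  # (48, 64) (30, 64) (30, 48)
```

The two time-axis views make an object's motion visible as a 2D streak, which is what allows ordinary 2D salient-point features to capture temporal structure.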
IEEE Transactions on Circuits and Systems for Video Technology, 1998
The rapidity with which digital information, particularly video, is being generated has necessitated the development of tools for efficient search of these media. Content-based visual queries have been primarily focused on still image retrieval. In this paper, we propose a novel, interactive system on the Web, based on the visual paradigm, with spatiotemporal attributes playing a key role in video retrieval. We have developed innovative algorithms for automated video object segmentation and tracking, and use real-time video editing techniques while responding to user queries. The resulting system, called VideoQ (demo available at ), is the first on-line video search engine supporting automatic object-based indexing and spatiotemporal queries. The system performs well, with the user being able to retrieve complex video clips such as those of skiers and baseball players with ease.
2010
This paper proposes an approach for retrieving videos based on object trajectories. First, a trajectory is translated into a sequence of symbols based on a symbolic representation, beyond the initial numeric representation, which does not suffer from scaling, translation, or rotation. Then, in order to compare trajectories based on their symbolic representations, two similarity measures are proposed, inspired by work in bioinformatics. Moreover, based on these similarity measures, two relevance feedback strategies are given. Experimental results on two databases show that the proposed similarity measures give results as good as other existing measures. The real advantages of these measures are the possibility of partial matching and of relevance feedback.
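The symbolic-trajectory idea can be sketched minimally (an assumption for illustration, not the paper's exact alphabet or measures): each step of a 2-D trajectory is quantized into one of eight direction symbols, which is translation-invariant, and a bioinformatics-style edit distance then compares two symbol strings.

```python
import math

def to_symbols(points):
    # Quantize each step's direction into one of 8 compass sectors,
    # labelled 'A'..'H'; the encoding ignores absolute position.
    symbols = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        angle = math.atan2(y1 - y0, x1 - x0) % (2 * math.pi)
        symbols.append("ABCDEFGH"[int(angle / (math.pi / 4)) % 8])
    return "".join(symbols)

def edit_distance(a, b):
    # Classic Levenshtein distance between two symbol strings,
    # standing in for an alignment-based similarity measure.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

t1 = [(0, 0), (1, 0), (2, 0), (3, 1)]
t2 = [(5, 5), (6, 5), (7, 5), (8, 6)]  # same shape, translated
print(to_symbols(t1) == to_symbols(t2))  # True
```

Because local alignment (as in sequence matching) can score substrings, this style of representation is what makes partial trajectory matching natural.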
Lecture Notes in Computer Science, 2008
In this paper, we propose a novel query language for video indexing and retrieval that (1) enables users to make queries both at the image level and at the semantic level, (2) enables users to define their own scenarios based on semantic events, and (3) retrieves videos with both exact matching and similarity matching. For a query language, four main issues must be addressed: data modeling, query formulation, query parsing, and query matching. In this paper we focus and give contributions on data modeling, query formulation, and query matching. We are currently using color histograms and SIFT features at the image level and 10 types of events at the semantic level. We have tested the proposed query language for the retrieval of surveillance videos of a metro station. In our experiments the database contains more than 200 indexed physical objects and 48 semantic events. The results using different types of queries are promising.
IEICE Transactions on Information and Systems
Recently, two approaches have investigated indexing and retrieving videos. One approach utilized the visual features of individual objects, and the other exploited the spatio-temporal relationships between multiple objects. In this paper, we integrate both approaches into a new video model, called the Visual-Spatio-Temporal (VST) model, to represent videos. The visual features are modeled in a topological approach and integrated with the spatio-temporal relationships. As a result, we define rich sets of VST relationships which support and simplify the formulation of more semantic queries. An intuitive query interface which allows users to describe VST features of video objects by sketch and feature specification is presented. The conducted experiments prove the effectiveness of modeling and querying videos by the visual features of individual objects and the VST relationships between multiple objects.
2019 First International Conference on Graph Computing (GC), 2019
Complex Event Processing (CEP) is a paradigm to detect event patterns over streaming data in a timely manner. Presently, CEP systems have inherent limitations in detecting event patterns over video streams due to their data complexity and lack of a structured data model. Modelling complex events in unstructured data like video requires detecting not only objects but also the spatiotemporal relationships among objects. This work introduces a novel video representation technique where an input video stream is converted to a stream of graphs. We propose the Video Event Knowledge Graph (VEKG), a knowledge graph driven representation of video data. VEKG models video objects as nodes and their relationship interactions as edges over time and space. It creates a semantic knowledge representation of video data derived from the detection of high-level semantic concepts from the video using an ensemble of deep learning models. To optimize run-time system performance, we introduce a graph aggregation method, VEKG-TAG, which provides an aggregated view of VEKG for a given time length. We define a set of operators using event rules which can be used as a query and applied over VEKG graphs to discover complex video patterns. The system achieves an F-score ranging between 0.75 and 0.86 for different patterns when queried over VEKG. In the given experiments, pattern search over VEKG-TAG was 2.3 times faster than the baseline.
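The nodes-and-edges representation described above can be sketched in miniature (hypothetical field names and threshold, not the VEKG schema): per-frame object detections become labelled nodes, and a pairwise spatial predicate such as "near" becomes the edges of that frame's graph.

```python
def build_frame_graph(detections, near_thresh=50.0):
    # detections: list of (object_id, label, cx, cy) tuples for one
    # frame; objects whose centroids are closer than near_thresh get
    # a "near" edge between them.
    nodes = {oid: label for oid, label, _, _ in detections}
    edges = []
    for i, (a, _, ax, ay) in enumerate(detections):
        for b, _, bx, by in detections[i + 1:]:
            if ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5 < near_thresh:
                edges.append((a, "near", b))
    return {"nodes": nodes, "edges": edges}

frame = [(1, "person", 10.0, 10.0), (2, "car", 40.0, 20.0),
         (3, "person", 200.0, 200.0)]
g = build_frame_graph(frame)
print(g["edges"])  # [(1, 'near', 2)]
```

A query over the resulting graph stream then reduces to sub-graph matching, which is what makes indexing and time-windowed aggregation of the graphs worthwhile.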
IEEE Transactions on Image Processing, 2000
With the rapid proliferation of multimedia applications that require video data management, it is becoming more desirable to provide proper video data indexing techniques capable of representing the rich semantics in video data. In real-time applications, the need for efficient query processing is another reason for the use of such techniques. We present models that use the object motion information in order to characterize the events to allow subsequent retrieval. Algorithms for different spatiotemporal search cases in terms of spatial and temporal translation and scale invariance have been developed using various signal and image processing techniques. We have developed a prototype video search engine, PICTURESQUE (pictorial information and content transformation unified retrieval engine for spatiotemporal queries) to verify the proposed methods. Development of such technology will enable true multimedia search engines that will enable indexing and searching of the digital video data based on its true content.
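One simple way to obtain the translation and scale invariance this abstract mentions (a sketch under assumptions, not PICTURESQUE's actual algorithms) is to normalise each motion trajectory to zero mean and unit extent before comparing.

```python
import numpy as np

def normalise(traj):
    # Centre the trajectory (translation invariance) and divide by its
    # largest coordinate magnitude (scale invariance).
    t = np.asarray(traj, dtype=float)
    t -= t.mean(axis=0)
    scale = np.abs(t).max()
    return t / scale if scale else t

a = [(0, 0), (2, 0), (4, 2)]
b = [(10, 10), (14, 10), (18, 14)]  # same path, shifted and doubled
print(np.allclose(normalise(a), normalise(b)))  # True
```

After normalisation, two motion signals tracing the same event at different positions or sizes compare as (near) equal, so a single stored template can match many instances.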
Journal of Multimedia, 2009
Multimedia database modeling and representation play an important role in the efficient storage and retrieval of multimedia. Modeling semantic video content that enables spatiotemporal queries is one of the challenging tasks. A video is called "quantizable" if the instants of a video are enough for a person to imagine the missing scenes properly. A semantic query for quantizable videos can be defined in a more flexible way using spatio-temporal instants. In this paper, we provide a semantic modeling and retrieval system, termed G-SMART. First, the videos are quantized according to semantic events. Then semantic instants and events of the video, including objects, events, and locations, are provided as a grammar-based string. This linear string representation enables both spatial and temporal retrieval of the video using Structured Query Language (SQL). The redundancy in this linear representation is reduced by data reduction properties such as removal of implied information. Various types of queries, such as event-object-location, event-location, object-location, event-object, current-next event, projection, and semantic event, are supported by G-SMART. A graphical user interface is designed to build queries and view the query results. G-SMART enables multimodal presentation by displaying the query results in the form of images and videos. We show our results on a tennis video database.
Proceedings of IEEE International Conference on Multimedia Computing and Systems
Modeling moving objects has become a topic of increasing interest in the area of video databases. Two key aspects of such modeling are spatial and temporal relationships. In this paper we introduce an innovative way to represent the trajectory of a single moving object and the relative spatio-temporal relations between multiple moving objects. The representation supports a rich set of spatial topological and directional relations. It also supports both quantitative and qualitative user queries about moving objects. Algorithms for matching trajectories and spatio-temporal relations of moving objects are designed to facilitate query processing. These algorithms can handle both exact and similarity matches. We also discuss the integration of our moving object model, based on a video model, in an object-oriented system. Some query examples are provided to further validate the expressiveness of our model.
International Journal of Pattern Recognition and Artificial Intelligence, 2009
In this paper, we propose an approach for surveillance video indexing and retrieval. The objective of this approach is to answer five main challenges we have met in this domain: (1) the lack of means for finding data in the indexed databases, (2) the lack of approaches working at different abstraction levels, (3) imprecise indexing, (4) incomplete indexing, and (5) the lack of user-centered search. We propose a new data model containing two main types of extracted video content: physical objects and events. Based on this data model we present a new rich and flexible query language. This language works at different abstraction levels, provides both exact and approximate matching, and takes into account users' interest. In order to work with imprecise indexing, two new methods, for object representation and object matching respectively, are proposed. Videos from two projects which have been partially indexed are used to validate the proposed approach. We have analyzed both query language usage and retrieval results. The retrieval results, evaluated by average normalized rank, are promising. The retrieval results at the object level are compared with another state-of-the-art approach.
Pattern Recognition Letters, 2009
With the existence of "semantic gap" between the machine-readable low level features (e.g. visual features in terms of colors and textures) and high level human concepts, it is inherently hard for the machine to automatically identify and retrieve events from videos according to their semantics by merely reading pixels and frames. This paper proposes a human-centered framework for mining and retrieving events and applies it to indoor surveillance video databases. The goal is to locate video sequences containing events of interest to the user of the surveillance video database. This framework starts by tracking objects. Since surveillance videos cannot be easily segmented, the Common Appearance Intervals (CAIs) are used to segment videos, which have the flavor of shots in movies. The video segmentation provides an efficient indexing schema for the retrieval. The trajectories obtained are thus spatiotemporal in nature, based on which features are extracted for the construction of event models. In the retrieval phase, the database user interacts with the machine and provides "feedbacks" to the retrieval results. The proposed learning algorithm learns from the spatiotemporal data, the event model as well as the "feedbacks" and returns the refined results to the user. Specifically, the learning algorithm is a Coupled Hidden Markov Model (CHMM), which models the interactions of objects in CAIs and recognizes hidden patterns among them. This iterative learning and retrieval process contributes to the bridging of the "semantic gap", and the experimental results show the effectiveness of the proposed framework by demonstrating the increase of retrieval accuracy through iterations and comparing with other methods.
2019 IEEE International Conference on Big Data (Big Data), 2019
Video data is highly expressive and has traditionally been very difficult for a machine to interpret. Querying event patterns from video streams is challenging due to their unstructured representation. Middleware systems such as Complex Event Processing (CEP) mine patterns from data streams and send notifications to users in a timely fashion. Current CEP systems have inherent limitations in querying video streams due to their unstructured data model and lack of an expressive query language. In this work, we focus on a CEP framework where users can define high-level expressive queries over videos to detect a range of spatiotemporal event patterns. In this context, we propose: (i) VidCEP, an in-memory, on-the-fly, near real-time complex event matching framework for video streams. The system uses a graph-based event representation for video streams which enables the detection of high-level semantic concepts from video using cascades of Deep Neural Network models; (ii) a Video Event Query Language (VEQL) to express high-level user queries for video streams in CEP; (iii) a complex event matcher to detect spatiotemporal video event patterns by matching expressive user queries over video data. The proposed approach detects spatiotemporal video event patterns with an F-score ranging from 0.66 to 0.89. VidCEP maintains near real-time performance with an average throughput of 70 frames per second for 5 parallel videos with sub-second matching latency.