A machine-learning engine is disclosed that is configured to recognize and learn behaviors, as well as to identify and distinguish between normal and abnormal behavior within a scene, by analyzing movements and/or activities (or absence of such) over time. The machine-learning engine may be configured to evaluate a sequence of primitive events and associated kinematic data generated for an object depicted in a sequence of video frames and a related vector representation. The vector representation is generated from a primitive event symbol stream and a phase space symbol stream, and the streams describe actions of the objects depicted in the sequence of video frames.
G06N 3/00 - Computing arrangements based on biological models
G06K 9/66 - Methods or arrangements for recognition using electronic means using simultaneous comparisons or correlations of the image signals with a plurality of references, e.g. resistor matrix references adjustable by an adaptive method, e.g. learning
G06K 9/62 - Methods or arrangements for recognition using electronic means
2.
Method and system for configurable security and surveillance systems
A method and system for a configurable security and surveillance system are provided. A configurable security and surveillance system may comprise at least one programmable sensor agent and/or at least one programmable content analysis agent. A plurality of processing features may be offered by the configurable security and surveillance system by programming configurable hardware devices in the programmable sensor agents and/or the programmable content analysis agents via a system manager. Device programming files may be utilized to program the configurable hardware devices. The device programming files may be encrypted and decryption keys may be requested to enable the programming of different processing features into the programmable sensor agents and/or the programmable content analysis agents. The device programming files and/or the decryption keys may be received via a network transfer and/or via a machine-readable media from an e-commerce vendor.
G08B 29/18 - Prevention or correction of operating errors
H04N 7/18 - Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
G08B 29/00 - Checking or monitoring of signalling or alarm systemsPrevention or correction of operating errors, e.g. preventing unauthorised operation
H04L 29/06 - Communication control; Communication processing characterised by a protocol
G08B 13/196 - Actuation by interference with heat, light, or radiation of shorter wavelengthActuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems using image scanning and comparing systems using television cameras
H04N 7/167 - Systems rendering the television signal unintelligible and subsequently intelligible
3.
Method and system for identifying an individual in a digital image using location meta-tags
A system and method for tagging an image of an individual in a plurality of photos is disclosed herein. A feature vector of an individual is used to analyze a set of photos on a social networking website such as www.facebook.com to determine if an image of the individual is present in a photo of the set of photos. Photos having an image of the individual are tagged preferably by listing a URL or URI for each of the photos in a database.
A machine-learning engine is disclosed that is configured to recognize and learn behaviors, as well as to identify and distinguish between normal and abnormal behavior within a scene, by analyzing movements and/or activities (or absence of such) over time. The machine-learning engine may be configured to evaluate a sequence of primitive events and associated kinematic data generated for an object depicted in a sequence of video frames and a related vector representation. The vector representation is generated from a primitive event symbol stream and a phase space symbol stream, and the streams describe actions of the objects depicted in the sequence of video frames.
G06F 16/28 - Databases characterised by their database models, e.g. relational or object models
G06K 9/66 - Methods or arrangements for recognition using electronic means using simultaneous comparisons or correlations of the image signals with a plurality of references, e.g. resistor matrix references adjustable by an adaptive method, e.g. learning
G06K 9/62 - Methods or arrangements for recognition using electronic means
5.
Inter-trajectory anomaly detection using adaptive voting experts in a video surveillance system
A sequence layer in a machine-learning engine configured to learn from the observations of a computer vision engine. In one embodiment, the machine-learning engine uses the voting experts to segment adaptive resonance theory (ART) network label sequences for different objects observed in a scene. The sequence layer may be configured to observe the ART label sequences and incrementally build, update, and trim, and reorganize an ngram trie for those label sequences. The sequence layer computes the entropies for the nodes in the ngram trie and determines a sliding window length and vote count parameters. Once determined, the sequence layer may segment newly observed sequences to estimate the primitive events observed in the scene as well as issue alerts for inter-sequence and intra-sequence anomalies.
A machine-learning engine is disclosed that is configured to recognize and learn behaviors, as well as to identify and distinguish between normal and abnormal behavior within a scene, by analyzing movements and/or activities (or absence of such) over time. The machine-learning engine may be configured to evaluate a sequence of primitive events and associated kinematic data generated for an object depicted in a sequence of video frames and a related vector representation. The vector representation is generated from a primitive event symbol stream and a phase space symbol stream, and the streams describe actions of the objects depicted in the sequence of video frames.
G06K 9/00 - Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
G06K 9/66 - Methods or arrangements for recognition using electronic means using simultaneous comparisons or correlations of the image signals with a plurality of references, e.g. resistor matrix references adjustable by an adaptive method, e.g. learning
G06F 17/30 - Information retrieval; Database structures therefor
G06N 99/00 - Subject matter not provided for in other groups of this subclass
G06K 9/62 - Methods or arrangements for recognition using electronic means
7.
Method and system for tagging an individual in a digital image
A system and method for tagging an image of an individual in a plurality of photos is disclosed herein. A feature vector of an individual is used to analyze a set of photos on a social networking website such as www.facebook.com to determine if an image of the individual is present in a photo of the set of photos. Photos having an image of the individual are tagged preferably by listing a URL or URI for each of the photos in a database.
A method and system for matching an unknown facial image of an individual with an image of a celebrity using facial recognition techniques and human perception is disclosed herein. The invention provides a internet hosted system to find, compare, contrast and identify similar characteristics among two or more individuals using a digital camera, cellular telephone camera, wireless device for the purpose of returning information regarding similar faces to the user. The system features classification of unknown facial images from a variety of internet accessible sources, including mobile phones, wireless camera-enabled devices, images obtained from digital cameras or scanners that are uploaded from PCs, third-party applications and databases. Once classified, the matching person's name, image and associated meta-data is sent back to the user. The method and system uses human perception techniques to weight the feature vectors.
G06K 9/00 - Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
G06F 16/583 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
9.
Semantic representation module of a machine-learning engine in a video analysis system
A machine-learning engine is disclosed that is configured to recognize and learn behaviors, as well as to identify and distinguish between normal and abnormal behavior within a scene, by analyzing movements and/or activities (or absence of such) over time. The machine-learning engine may be configured to evaluate a sequence of primitive events and associated kinematic data generated for an object depicted in a sequence of video frames and a related vector representation. The vector representation is generated from a primitive event symbol stream and a phase space symbol stream, and the streams describe actions of the objects depicted in the sequence of video frames.
G06K 9/00 - Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
G06K 9/62 - Methods or arrangements for recognition using electronic means
G06K 9/66 - Methods or arrangements for recognition using electronic means using simultaneous comparisons or correlations of the image signals with a plurality of references, e.g. resistor matrix references adjustable by an adaptive method, e.g. learning
10.
Method and system for configurable security and surveillance systems
A method and system for a configurable security and surveillance system are provided. A configurable security and surveillance system may comprise at least one programmable sensor agent and/or at least one programmable content analysis agent. A plurality of processing features may be offered by the configurable security and surveillance system by programming configurable hardware devices in the programmable sensor agents and/or the programmable content analysis agents via a system manager. Device programming files may be utilized to program the configurable hardware devices. The device programming files may be encrypted and decryption keys may be requested to enable the programming of different processing features into the programmable sensor agents and/or the programmable content analysis agents. The device programming files and/or the decryption keys may be received via a network transfer and/or via a machine-readable media from an e-commerce vendor.
G08B 29/18 - Prevention or correction of operating errors
H04N 7/18 - Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
G08B 29/00 - Checking or monitoring of signalling or alarm systemsPrevention or correction of operating errors, e.g. preventing unauthorised operation
H04L 29/06 - Communication control; Communication processing characterised by a protocol
G08B 13/196 - Actuation by interference with heat, light, or radiation of shorter wavelengthActuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems using image scanning and comparing systems using television cameras
H04N 7/167 - Systems rendering the television signal unintelligible and subsequently intelligible
11.
Semantic representation module of a machine-learning engine in a video analysis system
A machine-learning engine is disclosed that is configured to recognize and learn behaviors, as well as to identify and distinguish between normal and abnormal behavior within a scene, by analyzing movements and/or activities (or absence of such) over time. The machine-learning engine may be configured to evaluate a sequence of primitive events and associated kinematic data generated for an object depicted in a sequence of video frames and a related vector representation. The vector representation is generated from a primitive event symbol stream and a phase space symbol stream, and the streams describe actions of the objects depicted in the sequence of video frames.
G06K 9/00 - Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
G06K 9/66 - Methods or arrangements for recognition using electronic means using simultaneous comparisons or correlations of the image signals with a plurality of references, e.g. resistor matrix references adjustable by an adaptive method, e.g. learning
12.
Method and system for attaching a metatag to a digital image
A system and method for tagging an image of an individual in a plurality of photos is disclosed herein. A feature vector of an individual is used to analyze a set of photos on a social networking website such as Facebook® to determine if an image of the individual is present in a photo of the set of photos. Photos having an image of the individual are tagged preferably by listing a URL or URI for each of the photos in a database.
G06K 9/62 - Methods or arrangements for recognition using electronic means
G06F 16/56 - Information retrievalDatabase structures thereforFile system structures therefor of still image data having vectorial format
G06F 16/583 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
G06K 9/00 - Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
G06K 9/46 - Extraction of features or characteristics of the image
13.
Image classification and information retrieval over wireless digital networks and the internet
A method and system for matching an unknown facial image of an individual with an image of a celebrity using facial recognition techniques and human perception is disclosed herein. The invention provides a internet hosted system to find, compare, contrast and identify similar characteristics among two or more individuals using a digital camera, cellular telephone camera, wireless device for the purpose of returning information regarding similar faces to the user The system features classification of unknown facial images from a variety of internet accessible sources, including mobile phones, wireless camera-enabled devices, images obtained from digital cameras or scanners that are uploaded from PCs, third-party applications and databases. Once classified, the matching person's name, image and associated meta-data is sent back to the user. The method and system uses human perception techniques to weight the feature vectors.
A method and system for a configurable security and surveillance system are provided. A configurable security and surveillance system may comprise at least one programmable sensor agent and/or at least one programmable content analysis agent. A plurality of processing features may be offered by the configurable security and surveillance system by programming configurable hardware devices in the programmable sensor agents and/or the programmable content analysis agents via a system manager. Device programming files may be utilized to program the configurable hardware devices. The device programming files may be encrypted and decryption keys may be requested to enable the programming of different processing features into the programmable sensor agents and/or the programmable content analysis agents. The device programming files and/or the decryption keys may be received via a network transfer and/or via a machine-readable media from an e-commerce vendor.
G08B 29/00 - Checking or monitoring of signalling or alarm systemsPrevention or correction of operating errors, e.g. preventing unauthorised operation
G08B 13/196 - Actuation by interference with heat, light, or radiation of shorter wavelengthActuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems using image scanning and comparing systems using television cameras
H04N 7/18 - Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
H04L 29/06 - Communication control; Communication processing characterised by a protocol
H04N 7/167 - Systems rendering the television signal unintelligible and subsequently intelligible
15.
Method and system for tagging an image of an individual in a plurality of photos
A system and method for tagging an image of an individual in a plurality of photos is disclosed herein. A feature vector of an individual is used to analyze a set of photos on a social networking website to determine if an image of the individual is present in a photo of the set of photos. Photos having an image of the individual are tagged preferably by listing a URL or URI for each of the photos in a database.
Systems and methods for viewing a scene depicted in a sequence of video frames and identifying and tracking objects between separate frames of the sequence. Each tracked object is classified based on known categories and a stream of context events associated with the object is generated. A sequence of primitive events based on the stream of context events is generated and stored together, along with detailed data and generalized data related to an event. All of the data is then evaluated to learn patterns of behavior that occur within the scene.
G06K 9/62 - Methods or arrangements for recognition using electronic means
G06K 9/66 - Methods or arrangements for recognition using electronic means using simultaneous comparisons or correlations of the image signals with a plurality of references, e.g. resistor matrix references adjustable by an adaptive method, e.g. learning
G06T 7/254 - Analysis of motion involving subtraction of images
17.
Semantic representation module of a machine-learning engine in a video analysis system
A machine-learning engine is disclosed that is configured to recognize and learn behaviors, as well as to identify and distinguish between normal and abnormal behavior within a scene, by analyzing movements and/or activities (or absence of such) over time. The machine-learning engine may be configured to evaluate a sequence of primitive events and associated kinematic data generated for an object depicted in a sequence of video frames and a related vector representation. The vector representation is generated from a primitive event symbol stream and a phase space symbol stream, and the streams describe actions of the objects depicted in the sequence of video frames.
G06K 9/00 - Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
G06K 9/66 - Methods or arrangements for recognition using electronic means using simultaneous comparisons or correlations of the image signals with a plurality of references, e.g. resistor matrix references adjustable by an adaptive method, e.g. learning
G06F 17/30 - Information retrieval; Database structures therefor
G06N 99/00 - Subject matter not provided for in other groups of this subclass
18.
Method and system for configurable security and surveillance systems
A method and system for a configurable security and surveillance system are provided. A configurable security and surveillance system may comprise at least one programmable sensor agent and/or at least one programmable content analysis agent. A plurality of processing features may be offered by the configurable security and surveillance system by programming configurable hardware devices in the programmable sensor agents and/or the programmable content analysis agents via a system manager. Device programming files may be utilized to program the configurable hardware devices. The device programming files may be encrypted and decryption keys may be requested to enable the programming of different processing features into the programmable sensor agents and/or the programmable content analysis agents. The device programming files and/or the decryption keys may be received via a network transfer and/or via a machine-readable media from an e-commerce vendor.
A machine-learning engine is disclosed that is configured to recognize and learn behaviors, as well as to identify and distinguish between normal and abnormal behavior within a scene, by analyzing movements and/or activities (or absence of such) over time. The machine-learning engine may be configured to evaluate a sequence of primitive events and associated kinematic data generated for an object depicted in a sequence of video frames and a related vector representation. The vector representation is generated from a primitive event symbol stream and a phase space symbol stream, and the streams describe actions of the objects depicted in the sequence of video frames.
G06K 9/00 - Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
G06K 9/66 - Methods or arrangements for recognition using electronic means using simultaneous comparisons or correlations of the image signals with a plurality of references, e.g. resistor matrix references adjustable by an adaptive method, e.g. learning
G06F 17/30 - Information retrieval; Database structures therefor
G06N 99/00 - Subject matter not provided for in other groups of this subclass
20.
Image classification and information retrieval over wireless digital networks and the internet
A method and system for matching an unknown facial image of an individual with an image of a celebrity using facial recognition techniques and human perception is disclosed herein. The invention provides a internet hosted system to find, compare, contrast and identify similar characteristics among two or more individuals using a digital camera, cellular telephone camera, wireless device for the purpose of returning information regarding similar faces to the user The system features classification of unknown facial images from a variety of internet accessible sources, including mobile phones, wireless camera-enabled devices, images obtained from digital cameras or scanners that are uploaded from PCs, third-party applications and databases. Once classified, the matching person's name, image and associated meta-data is sent back to the user. The method and system uses human perception techniques to weight the feature vectors.
Techniques are disclosed for visually conveying a percept. The percept may represent information learned by a video surveillance system. A request may be received to view a percept for a specified scene. The percept may have been derived from data streams generated from a sequence of video frames depicting the specified scene captured by a video camera. A visual representation of the percept may be generated. A user interface may be configured to display the visual representation of the percept and to allow a user to view and/or modify metadata attributes with the percept. For example, the user may label a percept and set events matching the percept to always (or never) result in alert being generated for users of the video surveillance system.
A system and method for tagging an image of an individual in a plurality of photos is disclosed herein. A feature vector of an individual is used to analyze a set of photos on a social networking website such as Facebook® to determine if an image of the individual is present in a photo of the set of photos. Photos having an image of the individual are tagged preferably by listing a URL or URI for each of the photos in a database.
G06F 17/30 - Information retrieval; Database structures therefor
G06K 9/00 - Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
G06K 9/46 - Extraction of features or characteristics of the image
G06K 9/62 - Methods or arrangements for recognition using electronic means
23.
Method and system for automatically measuring and forecasting the demographic characterization of customers to help customize programming contents in a media network
The present invention is a method and system for forecasting the demographic characterization of customers to help customize programming contents on each means for playing output of each site of a plurality of sites in a media network through automatically measuring, characterizing, and estimating the demographic information of customers that appear in the vicinity of each means for playing output. The analysis of demographic information of customers is performed automatically based on the visual information of the customers, using a plurality of means for capturing images and a plurality of computer vision technologies on the visual information. The measurement of the demographic information is performed in each measured node, where the node is defined as means for playing output. Extrapolation of the measurement characterizes the demographic information per each node of a plurality of nodes in a site of a plurality of sites of a media network. The forecasting and customization of the programming contents is based on the characterization of the demographic information.
A method and system for matching an unknown facial image of an individual with an image of a celebrity using facial recognition techniques and human perception is disclosed herein. The invention provides a internet hosted system to find, compare, contrast and identify similar characteristics among two or more individuals using a digital camera, cellular telephone camera, wireless device for the purpose of returning information regarding similar faces to the user The system features classification of unknown facial images from a variety of internet accessible sources, including mobile phones, wireless camera-enabled devices, images obtained from digital cameras or scanners that are uploaded from PCs, third-party applications and databases. Once classified, the matching person's name, image and associated meta-data is sent back to the user. The method and system uses human perception techniques to weight the feature vectors.
A system and method for tagging an image of an individual in a plurality of photos is disclosed herein. A feature vector of an individual is used to analyze a set of photos on a social networking website to determine if an image of the individual is present in a photo of the set of photos. Photos having an image of the individual are tagged preferably by listing a URL or URI for each of the photos in a database.
A system and method for tagging an image of an individual in a plurality of photos is disclosed herein. A feature vector of an individual is used to analyze a set of photos on a social networking website such as www.facebook.com to determine if an image of the individual is present in a photo of the set of photos. Photos having an image of the individual are tagged preferably by listing a URL or URI for each of the photos in a database.
A method and system for of identifying an unknown individual from a digital image is disclosed herein. In one embodiment, the present invention allows an individual to photograph a facial image an unknown individual, transfer that facial image to a server for processing into a feature vector, and then search social networking Web sites to obtain information on the unknown individual. The Web sites comprise myspace.com, facebook.com, linkedin.com, www.hi5.com, www.bebo.com, www.friendster.com, www.igoogle.com, netlog.com, and orkut.com. A method of networking is also disclosed. A method for determining unwanted individuals on a social networking website is also disclosed.
A machine-learning engine is disclosed that is configured to recognize and learn behaviors, as well as to identify and distinguish between normal and abnormal behavior within a scene, by analyzing movements and/or activities (or absence of such) over time. The machine-learning engine may be configured to evaluate a sequence of primitive events and associated kinematic data generated for an object depicted in a sequence of video frames and a related vector representation. The vector representation is generated from a primitive event symbol stream and a phase space symbol stream, and the streams describe actions of the objects depicted in the sequence of video frames.
G06K 9/62 - Methods or arrangements for recognition using electronic means
G06K 9/48 - Extraction of features or characteristics of the image by coding the contour of the pattern
G06K 9/00 - Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
G06K 9/66 - Methods or arrangements for recognition using electronic means using simultaneous comparisons or correlations of the image signals with a plurality of references, e.g. resistor matrix references adjustable by an adaptive method, e.g. learning
29.
Method and system for optimizing the observation and annotation of complex human behavior from video sources
The present invention is a method and system for optimizing the observation and annotation of complex human behavior from video sources by automatically detecting predefined events based on the behavior of people in a first video stream from a first means for capturing images in a physical space, accessing a synchronized second video stream from a second means for capturing images that are positioned to observe the people more closely using the timestamps associated with the detected events from the first video stream, and enabling an annotator to annotate each of the events with more labels using a tool. The present invention captures a plurality of input images of the persons by a plurality of means for capturing images and processes the plurality of input images in order to detect the predefined events based on the behavior in an exemplary embodiment. The processes are based on a novel usage of a plurality of computer vision technologies to analyze the human behavior from the plurality of input images. The physical space may be a retail space, and the people may be customers in the retail space.
The present invention is a method and system for producing a set of ratings for out-of-home media based on the measurement of behavior patterns and demographics of the people in a digital media network. The present invention captures a plurality of input images of the people in the vicinity of sampled out-of-home media in a digital media network by a plurality of means for capturing images, and tracks each person. The present invention processes the plurality of input images in order to analyze the behavior and demographics of the people. The present invention aggregates the measurements for the behavior patterns and demographics of the people, analyzes the data, and extracts characteristic information based on the estimated parameters from the aggregated measurements. Finally, the present invention calculates a set of ratings based on the characteristic information. The plurality of computer vision technologies can comprise face detection, person tracking, body parts detection, and demographic classification of the people, on the captured visual information of the people in the vicinity of the out-of-home media.
The present invention is a method and system for selecting and storing videos by applying semantically-meaningful selection criteria to the track sequences of the trips made by people in an area covered by overlapping multiple cameras. The present invention captures video streams of the people in the area by multiple cameras and tracks the people in each of the video streams, producing track sequences in each video stream. The present invention determines a first set of video segments that contains the trip information of the people, and compacts each of the video streams by removing a second set of video segments that do not contain the trip information of the people from each of the video streams. The present invention selects video segments from the first set of video segments based on predefined selection criteria for the statistical behavior analysis. The stored video data is an efficient compact format of video segments that contain the track sequences of the people and selected according to semantically-meaningful and domain-specific selection criteria. The final storage format of the videos is a trip-centered format, which sequences videos from across multiple cameras, and it can be used to facilitate multiple applications dealing with behavior analysis in a specific domain.
A system and method for tagging an image of an individual in a plurality of photos is disclosed herein. A feature vector of an individual is used to analyze a set of photos on a social networking website such as www.facebook.com to determine if an image of the individual is present in a photo of the set of photos. Photos having an image of the individual are tagged preferably by listing a URL or URI for each of the photos in a database.
The present invention is a method and system for storing videos by track sequences and selection of video segments in a manner to support “re-mining” by indexing and playback of individual visitors' entire trip to an area covered by overlapping cameras, allowing analysis and recognition of detailed behavior. The present invention captures video streams of the people in the area by multiple cameras and tracks the people in each of the video streams, producing track sequences in each video stream. Using the track sequences, the present invention finds trip information of the people. The present invention determines a first set of video segments that contain the trip information of the people, and compacts each of the video streams by removing a second set of video segments that do not contain the trip information of the people from each of the video streams. The video segments in the first set of video segments are associated with the people by indexing the video segments per person based on the trip information. The final storage format of the videos is a trip-centered format which sequences videos from across multiple cameras in a manner to facilitate multiple applications dealing with behavior analysis, and it is an efficient compact format without losing any video segments that contain the track sequences of the people.
Techniques are disclosed for learning and modeling a background for a complex and/or dynamic scene over a period of observations without supervision. A background/foreground component of a computer vision engine may be configured to model a scene using an array of ART networks. The ART networks learn the regularity and periodicity of the scene by observing the scene over a period of time. Thus, the ART networks allow the computer vision engine to model complex and dynamic scene backgrounds in video.
G06K 9/62 - Methods or arrangements for recognition using electronic means
G06K 9/66 - Methods or arrangements for recognition using electronic means using simultaneous comparisons or correlations of the image signals with a plurality of references, e.g. resistor matrix references adjustable by an adaptive method, e.g. learning
G06T 7/254 - Analysis of motion involving subtraction of images
35.
Inter-trajectory anomaly detection using adaptive voting experts in a video surveillance system
A sequence layer in a machine-learning engine configured to learn from the observations of a computer vision engine. In one embodiment, the machine-learning engine uses the voting experts to segment adaptive resonance theory (ART) network label sequences for different objects observed in a scene. The sequence layer may be configured to observe the ART label sequences and incrementally build, update, and trim, and reorganize an ngram trie for those label sequences. The sequence layer computes the entropies for the nodes in the ngram trie and determines a sliding window length and vote count parameters. Once determined, the sequence layer may segment newly observed sequences to estimate the primitive events observed in the scene as well as issue alerts for inter-sequence and intra-sequence anomalies.
A method, using a mobile device, of initiating the processing of a transaction for a parking session between a parking system and a payment provider on behalf of a user, the method comprising storing a user identifier in memory on the mobile device, receiving a parking identifier and transmitting the user identifier and the parking identifier to a network server.
G06Q 20/40 - Authorisation, e.g. identification of payer or payee, verification of customer or shop credentialsReview and approval of payers, e.g. check of credit lines or negative lists
G07B 15/00 - Arrangements or apparatus for collecting fares, tolls or entrance fees at one or more control points
G08G 1/00 - Traffic control systems for road vehicles
A method and system for matching an unknown facial image of an individual with an image of a celebrity using facial recognition techniques and human perception is disclosed herein. The invention provides a internet hosted system to find, compare, contrast and identify similar characteristics among two or more individuals using a digital camera, cellular telephone camera, wireless device for the purpose of returning information regarding similar faces to the user The system features classification of unknown facial images from a variety of internet accessible sources, including mobile phones, wireless camera-enabled devices, images obtained from digital cameras or scanners that are uploaded from PCs, third-party applications and databases. Once classified, the matching person's name, image and associated meta-data is sent back to the user. The method and system uses human perception techniques to weight the feature vectors.
The present invention is a method and system to provide an automatic measurement of people's responses to dynamic digital media, based on changes in their facial expressions and attention to specific content. First, the method detects and tracks faces from the audience. It then localizes each of the faces and facial features to extract emotion-sensitive features of the face by applying emotion-sensitive feature filters, to determine the facial muscle actions of the face based on the extracted emotion-sensitive features. The changes in facial muscle actions are then converted to the changes in affective state, called an emotion trajectory. On the other hand, the method also estimates eye gaze based on extracted eye images and three-dimensional facial pose of the face based on localized facial images. The gaze direction of the person, is estimated based on the estimated eye gaze and the three-dimensional facial pose of the person. The gaze target on the media display is then estimated based on the estimated gaze direction and the position of the person. Finally, the response of the person to the dynamic digital media content is determined by analyzing the emotion trajectory in relation to the time and screen positions of the specific digital media sub-content that the person is watching.
Techniques are disclosed for identifying anomaly object types during classification of foreground objects extracted from image data. A self-organizing map and adaptive resonance theory (SOM-ART) network is used to discover object type clusters and classify objects depicted in the image data based on pixel-level micro-features that are extracted from the image data. Importantly, the discovery of the object type clusters is unsupervised, i.e., performed independent of any training data that defines particular objects, allowing a behavior-recognition system to forgo a training phase and for object classification to proceed without being constrained by specific object definitions. The SOM-ART network is adaptive and able to learn while discovering the object type clusters and classifying objects and identifying anomaly object types.
Techniques are disclosed for detecting foreground objects in a scene captured by a surveillance system and tracking the detected foreground objects from frame to frame in real time. A motion flow field is used to validate foreground objects(s) that are extracted from the background model of a scene. Spurious foreground objects are filtered before the foreground objects are provided to the tracking stage. The motion flow field is also used by the tracking stage to improve the performance of the tracking as needed for real time surveillance applications.
Embodiments of the present invention provide a method and a system for mapping a scene depicted in an acquired stream of video frames that may be used by a machine-learning behavior-recognition system. A background image of the scene is segmented into plurality of regions representing various objects of the background image. Statistically similar regions may be merged and associated. The regions are analyzed to determine their z-depth order in relation to a video capturing device providing the stream of the video frames and other regions, using occlusions between the regions and data about foreground objects in the scene. An annotated map describing the identified regions and their properties is created and updated.
Techniques are disclosed for a video surveillance system to learn to recognize complex behaviors by analyzing pixel data using alternating layers of clustering and sequencing. A combination of a self organizing map (SOM) and an adaptive resonance theory (ART) network may be used to identify a variety of different anomalous inputs at each cluster layer. As progressively higher layers of the cortex model component represent progressively higher levels of abstraction, anomalies occurring in the higher levels of the cortex model represent observations of behavioral anomalies corresponding to progressively complex patterns of behavior.
Embodiments of the present invention provide a method and a system for analyzing and learning behavior based on an acquired stream of video frames. Objects depicted in the stream are determined based on an analysis of the video frames. Each object may have a corresponding search model used to track an object's motion frame-to-frame. Classes of the objects are determined and semantic representations of the objects are generated. The semantic representations are used to determine objects' behaviors and to learn about behaviors occurring in an environment depicted by the acquired video streams. This way, the system learns rapidly and in real-time normal and abnormal behaviors for any environment by analyzing movements or activities or absence of such in the environment and identifies and predicts abnormal and suspicious behavior based on what has been learned.
The present invention is an embedded audience measurement platform, which is called HAM. The HAM includes hardware, apparatus, and method for measuring audience data from image stream using dynamically-configurable hardware architecture. The HAM provides an end-to-end solution for audience measurement, wherein reconfigurable computational modules are used as engines per node to power the complete solution implemented in a flexible hardware architecture. The HAM is also a complete system for broad audience measurement, which has various components built into the system. Examples of the components comprise demographics classification, gaze estimation, emotion recognition, behavior analysis, and impression measurement.
The present invention is an apparatus and method for object recognition from at least an image stream from at least an image frame utilizing at least an artificial neural network. The present invention further comprises means for generating multiple components of an image pyramid simultaneously from a single image stream, means for providing the active pixel and interlayer neuron data to at least a subwindow processor, means for multiplying and accumulating the product of a pixel data or interlayer data and a synapse weight, and means for performing the activation of an accumulation. The present invention allows the artificial neural networks to be reconfigurable, thus embracing a broad range of object recognition applications in a flexible way. The subwindow processor in the present invention also further comprises means for performing neuron computations for at least a neuron. An exemplary embodiment of the present invention is used for object recognition, including face detection and gender recognition, in hardware. The apparatus comprises a digital circuitry system or IC that embodies the components of the present invention.
A system and method for tagging an image of an individual in a plurality of photos is disclosed herein. A feature vector of an individual is used to analyze a set of photos on a social networking website such as www.facebook.com to determine if an image of the individual is present in a photo of the set of photos. Photos having an image of the individual are tagged preferably by listing a URL or URI for each of the photos in a database.
Techniques are disclosed for a video surveillance system to learn to recognize complex behaviors by analyzing pixel data using alternating layers of clustering and sequencing. A combination of a self organizing map (SOM) and an adaptive resonance theory (ART) network may be used to identify a variety of different anomalous inputs at each cluster layer. As progressively higher layers of the cortex model component represent progressively higher levels of abstraction, anomalies occurring in the higher levels of the cortex model represent observations of behavioral anomalies corresponding to progressively complex patterns of behavior.
Techniques are disclosed for a video surveillance system to learn to recognize complex behaviors by analyzing pixel data using alternating layers of clustering and sequencing. A video surveillance system may be configured to observe a scene (as depicted in a sequence of video frames) and, over time, develop hierarchies of concepts including classes of objects, actions and behaviors. That is, the video surveillance system may develop models at progressively more complex levels of abstraction used to identify what events and behaviors are common and which are unusual. When the models have matured, the video surveillance system issues alerts on unusual events.
A method and system for matching an unknown facial image of an individual with an image of a celebrity using facial recognition techniques and human perception is disclosed herein. The invention provides a internet hosted system to find, compare, contrast and identify similar characteristics among two or more individuals using a digital camera, cellular telephone camera, wireless device for the purpose of returning information regarding similar faces to the user The system features classification of unknown facial images from a variety of internet accessible sources, including mobile phones, wireless camera-enabled devices, images obtained from digital cameras or scanners that are uploaded from PCs, third-party applications and databases. Once classified, the matching person's name, image and associated meta-data is sent back to the user. The method and system uses human perception techniques to weight the feature vectors.
The present invention is a method and system for detecting scene events in an imaged sequence by analysis of occlusion of user-defined regions of interest in the image. The present invention is based on the multi-scale groups of nearby pixel locations employing contrast functions, a feature that is invariant to changing illumination conditions. The feature allows the classification of each pixel location in the region of interest as occluded or not. Scene events, based on the occlusion of the regions of interest, are defined and subsequently detected in an image sequence. Example applications of this invention are automated surveillance of persons for security, and automated person counting, tracking and aisle-touch detection for market research.
Techniques are disclosed for identifying anomaly object types during classification of foreground objects extracted from image data. A self-organizing map and adaptive resonance theory (SOM-ART) network is used to discover object type clusters and classify objects depicted in the image data based on pixel-level micro-features that are extracted from the image data. Importantly, the discovery of the object type clusters is unsupervised, i.e., performed independent of any training data that defines particular objects, allowing a behavior-recognition system to forgo a training phase and for object classification to proceed without being constrained by specific object definitions. The SOM-ART network is adaptive and able to learn while discovering the object type clusters and classifying objects and identifying anomaly object types.
Techniques are disclosed for visually conveying a percept. The percept may represent information learned by a video surveillance system. A request may be received to view a percept for a specified scene. The percept may have been derived from data streams generated from a sequence of video frames depicting the specified scene captured by a video camera. A visual representation of the percept may be generated. A user interface may be configured to display the visual representation of the percept and to allow a user to view and/or modify metadata attributes with the percept. For example, the user may label a percept and set events matching the percept to always (or never) result in alert being generated for users of the video surveillance system.
A method and system for matching an unknown facial image of an individual with an image of an unknown twin using facial recognition techniques and human perception is disclosed herein. The invention provides a internet hosted system to find, compare, contrast and identify similar characteristics among two or more individuals using a digital camera, cellular telephone camera, wireless device for the purpose of returning information regarding similar faces to the user The system features classification of unknown facial images from a variety of internet accessible sources, including mobile phones, wireless camera-enabled devices, images obtained from digital cameras or scanners that are uploaded from PCs, third-party applications and databases. The method and system uses human perception techniques to weight the feature vectors.
Techniques are disclosed for discovering object type clusters using pixel-level micro-features extracted from image data. A self-organizing map and adaptive resonance theory (SOM-ART) network is used to classify objects depicted in the image data based on the pixel-level micro-features. Importantly, the discovery of the object type clusters is unsupervised, i.e., performed independent of any training data that defines particular objects, allowing a behavior-recognition system to forgo a training phase and for object classification to proceed without being constrained by specific object definitions. The SOM-ART network is adaptive and able to learn while discovering the object type clusters and classifying objects.
Techniques are disclosed for visually conveying classifications derived from pixel-level micro-features extracted from image data. The image data may include an input stream of video frames depicting one or more foreground objects. The classifications represent information learned by a video surveillance system. A request may be received to view a classification. A visual representation of the classification may be generated. A user interface may be configured to display the visual representation of the classification and to allow a user to view and/or modify properties associated with the classification.
Techniques are described for analyzing a stream of video frames to identify temporal anomalies. A video surveillance system configured to identify when agents depicted in the video stream engage in anomalous behavior, relative to the time-of-day (TOD) or day-of-week (DOW) at which the behavior occurs. A machine-learning engine may establish the normalcy of a scene by observing the scene over a specified period of time. Once the observations of the scene have matured, the actions of agents in the scene may be evaluated and classified as normal or abnormal temporal behavior, relative to the past observations.
Techniques are disclosed for determining anomalous trajectories of objects tracked over a sequence of video frames. In one embodiment, a symbol trajectory may be derived from observing an object moving through a scene. The symbol trajectory represents semantic concepts extracted from the trajectory of the object. Whether the symbol trajectory is anomalous may be determined, based on previously observed symbol trajectories. A user may be alerted upon determining that the symbol trajectory is anomalous.
Techniques are disclosed for detecting foreground objects in a scene captured by a surveillance system and tracking the detected foreground objects from frame to frame in real time. A motion flow field is used to validate foreground objects(s) that are extracted from the background model of a scene. Spurious foreground objects are filtered before the foreground objects are provided to the tracking stage. The motion flow field is also used by the tracking stage to improve the performance of the tracking as needed for real time surveillance applications.
Techniques are disclosed for detecting foreground objects in a scene captured by a surveillance system and tracking the detected foreground objects from frame to frame in real time. A motion flow field is used to validate foreground objects(s) that are extracted from the background model of a scene. Spurious foreground objects are filtered before the detected foreground objects are provided to the tracking stage. The motion flow field is also used by the tracking stage to improve the performance of the tracking as needed for real time surveillance applications.
Techniques are disclosed for visually conveying an event map. The event map may represent information learned by a surveillance system. A request may be received to view the event map for a specified scene. The event map may be generated, including a background model of the specified scene and at least one cluster providing a statistical distribution of an event in the specified scene. Each statistical distribution may be derived from data streams generated from a sequence of video frames depicting the specified scene captured by a video camera. Each event may be observed to occur at a location in the specified scene corresponding to a location of the respective cluster in the event map. The event map may be configured to allow a user to view and/or modify properties associated with each cluster. For example, the user may label a cluster and set events matching the cluster to always (or never) generate an alert.
G06K 9/66 - Methods or arrangements for recognition using electronic means using simultaneous comparisons or correlations of the image signals with a plurality of references, e.g. resistor matrix references adjustable by an adaptive method, e.g. learning
A sequence layer in a machine-learning engine configured to learn from the observations of a computer vision engine. In one embodiment, the machine-learning engine uses the voting experts to segment adaptive resonance theory (ART) network label sequences for different objects observed in a scene. The sequence layer may be configured to observe the ART label sequences and incrementally build, update, and trim, and reorganize an ngram trie for those label sequences. The sequence layer computes the entropies for the nodes in the ngram trie and determines a sliding window length and vote count parameters. Once determined, the sequence layer may segment newly observed sequences to estimate the primitive events observed in the scene as well as issue alerts for inter-sequence and intra-sequence anomalies.
Techniques are disclosed for visually conveying a trajectory map. The trajectory map provides users with a visualization of data observed by a machine-learning engine of a behavior recognition system. Further, the visualization may provide an interface used to guide system behavior. For example, the interface may be used to specify that the behavior recognition system should alert (or not alert) when a particular trajectory is observed to occur.
A sequence layer in a machine-learning engine configured to learn from the observations of a computer vision engine. In one embodiment, the machine-learning engine uses the voting experts to segment adaptive resonance theory (ART) network label sequences for different objects observed in a scene. The sequence layer may be configured to observe the ART label sequences and incrementally build, update, and trim, and reorganize an ngram trie for those label sequences. The sequence layer computes the entropies for the nodes in the ngram trie and determines a sliding window length and vote count parameters. Once determined, the sequence layer may segment newly observed sequences to estimate the primitive events observed in the scene as well as issue alerts for inter-sequence and intra-sequence anomalies.
Techniques are disclosed for learning and modeling a background for a complex and/or dynamic scene over a period of observations without supervision. A background/foreground component of a computer vision engine may be configured to model a scene using an array of ART networks. The ART networks learn the regularity and periodicity of the scene by observing the scene over a period of time. Thus, the ART networks allow the computer vision engine to model complex and dynamic scene backgrounds in video.
Techniques are disclosed for visually conveying a sequence storing an ordered string of symbols generated from kinematic data derived from analyzing an input stream of video frames depicting one or more foreground objects. The sequence may represent information learned by a video surveillance system. A request may be received to view the sequence or a segment partitioned form the sequence. A visual representation of the segment may be generated and superimposed over a background image associated with the scene. A user interface may be configured to display the visual representation of the sequence or segment and to allow a user to view and/or modify properties associated with the sequence or segment.
A sequence layer in a machine-learning engine configured to learn from the observations of a computer vision engine. In one embodiment, the machine-learning engine uses the voting experts to segment adaptive resonance theory (ART) network label sequences for different objects observed in a scene. The sequence layer may be configured to observe the ART label sequences and incrementally build, update, and trim, and reorganize an ngram trie for those label sequences. The sequence layer computes the entropies for the nodes in the ngram trie and determines a sliding window length and vote count parameters. Once determined, the sequence layer may segment newly observed sequences to estimate the primitive events observed in the scene as well as issue alerts for inter-sequence and intra-sequence anomalies.
A system and method for tagging an image of an individual in a plurality of photos is disclosed herein. A feature vector of an individual is used to analyze a set of photos on a social networking website such as www.facebook.com to determine if an image of the individual is present in a photo of the set of photos. Photos having an image of the individual are tagged preferably by listing a URL or URI for each of the photos in a database.
Techniques are disclosed for a computer vision engine to update both a background model and thresholds used to classify pixels as depicting scene foreground or background in response to detecting that a sudden illumination changes has occurred in a sequence of video frames. The threshold values may be used to specify how much pixel a given pixel may differ from corresponding values in the background model before being classified as depicting foreground. When a sudden illumination change is detected, the values for pixels affected by sudden illumination change may be used to update the value in the background image to reflect the value for that pixel following the sudden illumination change as well as update the threshold for classifying that pixel as depicting foreground/background in subsequent frames of video.
The present invention is a method and system to estimate the visual target that people are looking, based on automatic image measurements. The system utilizes image measurements from both face-view cameras and top-down view cameras. The cameras are calibrated with respect to the site and the visual target, so that the gaze target is determined from the estimated position and gaze direction of a person. Face detection and two-dimensional pose estimation locate and normalize the face of the person so that the eyes can be accurately localized and the three-dimensional facial pose can be estimated. The eye gaze is estimated based on either the positions of localized eyes and irises or on the eye image itself, depending on the quality of the image. The gaze direction is estimated from the eye gaze measurement in the context of the three-dimensional facial pose. From the top-down view the body of the person is detected and tracked, so that the position of the head is estimated using a body blob model that depends on the body position in the view. The gaze target is determined based on the estimated gaze direction, estimated head pose, and the camera calibration. The gaze target estimation can provide a gaze trajectory of the person or a collective gaze map from many instances of gaze.
The invention provides a internet hosted system to find, compare, contrast and identify similar characteristics among two or more individuals or objects using a digital camera, cellular telephone camera, wireless device for the purpose of returning information regarding similar objects or faces to the user The system features classification of images from a variety of Internet accessible sources, including mobile phones, wireless camera-enabled devices, images obtained from digital cameras or scanners that are uploaded from PCs, third-party applications and databases. Once classified, the matching person's name, or the matching object, image and associated meta-data is sent back to the user. The image may be manipulated to emphasize similar characteristics between the received facial image and the matching facial image. The meta-data sent down with the image may include sponsored links and advertisements.
A long-term memory used to store and retrieve information learned while a video analysis system observes a stream of video frames is disclosed. The long-term memory provides a memory with a capacity that grows in size gracefully, as events are observed over time. Additionally, the long-term memory may encode events, represented by sub-graphs of a neural network. Further, rather than predefining a number of patterns recognized and manipulated by the long-term memory, embodiments of the invention provide a long-term memory where the size of a feature dimension (used to determine the similarity between different observed events) may grow dynamically as necessary, depending on the actual events observed in a sequence of video frames.
G06F 17/00 - Digital computing or data processing equipment or methods, specially adapted for specific functions
G06F 15/18 - in which a program is changed according to experience gained by the computer itself during a complete run; Learning machines (adaptive control systems G05B 13/00;artificial intelligence G06N)
G06K 9/62 - Methods or arrangements for recognition using electronic means
G06K 9/46 - Extraction of features or characteristics of the image
G06K 9/66 - Methods or arrangements for recognition using electronic means using simultaneous comparisons or correlations of the image signals with a plurality of references, e.g. resistor matrix references adjustable by an adaptive method, e.g. learning
72.
Detecting anomalous events using a long-term memory in a video analysis system
Techniques are described for detecting anomalous events using a long-term memory in a video analysis system. The long-term memory may be used to store and retrieve information learned while a video analysis system observes a stream of video frames depicting a given scene. Further, the long-term memory may be configured to detect the occurrence of anomalous events, relative to observations of other events that have occurred in the scene over time. A distance measure may used to determine a distance between an active percept (encoding an observed event depicted in the stream of video frames) and a retrieved percept (encoding a memory of previously observed events in the long-term memory). If the distance exceeds a specified threshold, the long-term memory may publish the occurrence of an anomalous event for review by users of the system.
G06F 17/00 - Digital computing or data processing equipment or methods, specially adapted for specific functions
G06F 15/18 - in which a program is changed according to experience gained by the computer itself during a complete run; Learning machines (adaptive control systems G05B 13/00;artificial intelligence G06N)
G06K 9/62 - Methods or arrangements for recognition using electronic means
G06K 9/46 - Extraction of features or characteristics of the image
G06K 9/66 - Methods or arrangements for recognition using electronic means using simultaneous comparisons or correlations of the image signals with a plurality of references, e.g. resistor matrix references adjustable by an adaptive method, e.g. learning
A method and system for matching an unknown facial image of an individual with an image of an unknown twin using facial recognition techniques and human perception is disclosed herein. The invention provides a internet hosted system to find, compare, contrast and identify similar characteristics among two or more individuals using a digital camera, cellular telephone camera, wireless device for the purpose of returning information regarding similar faces to the user The system features classification of unknown facial images from a variety of internet accessible sources, including mobile phones, wireless camera-enabled devices, images obtained from digital cameras or scanners that are uploaded from PCs, third-party applications and databases. The method and system uses human perception techniques to weight the feature vectors.
The present invention is a method and system for measuring human emotional response to visual stimulus, based on the person's facial expressions. Given a detected and tracked human face, it is accurately localized so that the facial features are correctly identified and localized. Face and facial features are localized using the geometrically specialized learning machines. Then the emotion-sensitive features, such as the shapes of the facial features or facial wrinkles, are extracted. The facial muscle actions are estimated using a learning machine trained on the emotion-sensitive features. The instantaneous facial muscle actions are projected to a point in affect space, using the relation between the facial muscle actions and the affective state (arousal, valence, and stance). The series of estimated emotional changes renders a trajectory in affect space, which is further analyzed in relation to the temporal changes in visual stimulus, to determine the response.
The invention provides a internet hosted system to find, compare, contrast and identify similar characteristics among two or more individuals or objects using a digital camera, cellular telephone camera, wireless device for the purpose of returning information regarding similar objects or faces to the user The system features classification of images from a variety of Internet accessible sources, including mobile phones, wireless camera-enabled devices, images obtained from digital cameras or scanners that are uploaded from PCs, third-party applications and databases. Once classified, the matching person's name, or the matching object, image and associated meta-data is sent back to the user. The image may be manipulated to emphasize similar characteristics between the received facial image and the matching facial image. The meta-data sent down with the image may include sponsored links and advertisements.
Embodiments of the present invention provide a method and a system for mapping a scene depicted in an acquired stream of video frames that may be used by a machine-learning behavior-recognition system. A background image of the scene is segmented into plurality of regions representing various objects of the background image. Statistically similar regions may be merged and associated. The regions are analyzed to determine their z-depth order in relation to a video capturing device providing the stream of the video frames and other regions, using occlusions between the regions and data about foreground objects in the scene. An annotated map describing the identified regions and their properties is created and updated.
Embodiments of the present invention provide a method and a module for identifying a background of a scene depicted in an acquired stream of video frames that may be used by a video-analysis system. For each pixel or block of pixels in an acquired video frame a comparison measure is determined. The comparison measure depends on difference of color values exhibited in the acquired video frame and in a background image respectively by the pixel or block of pixels and a corresponding pixel and block of pixels in the background image. To determine the comparison measure, the resulting difference is considered in relation to a range of possible color values. If the comparison measure is above a dynamically adjusted threshold, the pixel or the block of pixels is classified as a part of the background of the scene.
G06K 9/34 - Segmentation of touching or overlapping patterns in the image field
H04N 7/12 - Systems in which the television signal is transmitted via one channel or a plurality of parallel channels, the bandwidth of each channel being less than the bandwidth of the television signal
G06K 9/00 - Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
78.
Background-foreground module for video analysis system
Embodiments of the present invention provide a method and a module for identifying a background of a scene depicted in an acquired stream of video frames that may be used by a video-analysis system. For each pixel or block of pixels in an acquired video frame a comparison measure is determined. The comparison measure depends on difference of color values exhibited in the acquired video frame and in a background image respectively by the pixel or block of pixels and a corresponding pixel and block of pixels in the background image. To determine the comparison measure, the resulting difference is considered in relation to a range of possible color values. If the comparison measure is above a dynamically adjusted threshold, the pixel or the block of pixels is classified as a part of the background of the scene.
Embodiments of the present invention provide a method and a module for identifying a background of a scene depicted in an acquired stream of video frames that may be used by a video-analysis system. For each pixel or block of pixels in an acquired video frame a comparison measure is determined. The comparison measure depends on difference of color values exhibited in the acquired video frame and in a background image respectively by the pixel or block of pixels and a corresponding pixel and block of pixels in the background image. To determine the comparison measure, the resulting difference is considered in relation to a range of possible color values. If the comparison measure is above a dynamically adjusted threshold, the pixel or the block of pixels is classified as a part of the background of the scene.
An estimator/identifier component for a computer vision engine of a machine-learning based behavior-recognition system is disclosed. The estimator/identifier component may be configured to classify an object being one of two or more classification types, e.g., as being a vehicle or a person. Once classified, the estimator/identifier may evaluate the object to determine a set of kinematic data, static data, and a current pose of the object. The output of the estimator/identifier component may include the classifications assigned to a tracked object, as well as the derived information and object attributes.
A tracker component for a computer vision engine of a machine-learning based behavior-recognition system is disclosed. The behavior-recognition system may be configured to learn, identify, and recognize patterns of behavior by observing a video stream (i.e., a sequence of individual video frames). The tracker component may be configured to track objects depicted in the sequence of video frames and to generate, search, match, and update computational models of such objects.
Aspects of a method and system for processing video data are disclosed and may include detecting, within a single chip in a programmable surveillance video camera, one or more moving objects in a raw video signal generated by the programmable surveillance video camera. One or more characteristics of the detected one or more objects may be extracted within the single chip in the programmable surveillance video camera. The extraction may be based on the raw video signal and may be performed prior to compression of the raw video data. The characteristics of the detected one or more objects may include shape, texture, color, motion presence, motion direction, sequence name, location, links, and/or alarm type. One or more textual representations of at least one of the characteristics of the detected one or more objects may be generated within the single chip in the programmable surveillance video camera.
A method and system for matching an unknown facial image of an individual with an image of a celebrity using facial recognition techniques and human perception is disclosed herein. The invention provides a internet hosted system to find, compare, contrast and identify similar characteristics among two or more individuals using a digital camera, cellular telephone camera, wireless device for the purpose of returning information regarding similar faces to the user The system features classification of unknown facial images from a variety of internet accessible sources, including mobile phones, wireless camera-enabled devices, images obtained from digital cameras or scanners that are uploaded from PCs, third-party applications and databases. Once classified, the matching person's name, image and associated meta-data is sent back to the user. The method and system uses human perception techniques to weight the feature vectors.
A machine-learning engine is disclosed that is configured to recognize and learn behaviors, as well as to identify and distinguish between normal and abnormal behavior within a scene, by analyzing movements and/or activities (or absence of such) over time. The machine-learning engine may be configured to evaluate a sequence of primitive events and associated kinematic data generated for an object depicted in a sequence of video frames and a related vector representation. The vector representation is generated from a primitive event symbol stream and a phase space symbol stream, and the streams describe actions of the objects depicted in the sequence of video frames.
A method and system for matching an unknown facial image of an individual with an image of an unknown twin using facial recognition techniques and human perception is disclosed herein. The invention provides a internet hosted system to find, compare, contrast and identify similar characteristics among two or more individuals using a digital camera, cellular telephone camera, wireless device for the purpose of returning information regarding similar faces to the user The system features classification of unknown facial images from a variety of internet accessible sources, including mobile phones, wireless camera-enabled devices, images obtained from digital cameras or scanners that are uploaded from PCs, third-party applications and databases. The method and system uses human perception techniques to weight the feature vectors.
Aspects of a method and system for change detection in localization and tracking of objects in a smart video camera are provided. A programmable surveillance video camera comprises processors for detecting objects in a video signal based on an object mask. The processors may generate a textual representation of the video signal by utilizing a description language to indicate characteristics of the detected objects, such as shape, texture, color, and/or motion, for example. The object mask may be based on a detection field value generated for each pixel in the video signal by comparing a first observation field and a second observation field associated with each of the pixels. The first observation field may be based on a difference between an input video signal value and an estimated background value while the second observation field may be based on a temporal difference between first observation fields.
The invention provides a internet hosted system to find, compare, contrast and identify similar characteristics among two or more individuals or objects using a digital camera, cellular telephone camera, wireless device for the purpose of returning information regarding similar objects or faces to the user. The system features classification of images from a variety of Internet accessible sources, including mobile phones, wireless camera-enabled devices, images obtained from digital cameras or scanners that are uploaded from PCs, third-party applications and databases. Once classified, the matching person's name, or the matching object, image and associated meta-data is sent back to the user. The image may be manipulated to emphasize similar characteristics between the received facial image and the matching facial image. The meta-data sent down with the image may include sponsored links and advertisements.
Embodiments of the present invention provide a method and a system for analyzing and learning behavior based on an acquired stream of video frames. Objects depicted in the stream are determined based on an analysis of the video frames. Each object may have a corresponding search model used to track an object's motion frame-to-frame. Classes of the objects are determined and semantic representations of the objects are generated. The semantic representations are used to determine objects' behaviors and to learn about behaviors occurring in an environment depicted by the acquired video streams. This way, the system learns rapidly and in real-time normal and abnormal behaviors for any environment by analyzing movements or activities or absence of such in the environment and identifies and predicts abnormal and suspicious behavior based on what has been learned.
The present invention is a method and system for handling a plurality of information units in an information processing system, such as a multimodal human computer interaction (HCI) system, through verification process for the plurality of information units. The present invention converts each information unit in the plurality of information units into verified object by augmenting the first meaning in the information unit with a second meaning and expresses the verified objects by object representation for each verified object. The present invention utilizes a processing structure, called polymorphic operator, which is capable of applying a plurality of relationships among the verified objects based on a set of predefined rules in a particular application domain for governing the operation among the verified objects. The present invention is named Object Verification Enabled Network (OVEN). The OVEN provides a computational framework for the information processing system that needs to handle complex data and event in the system, such as handling a huge amount of data in a database, correlating information pieces from multiple sources, applying contextual information to the recognition of inputs in a specific domain, processing fusion of the multiple inputs from different modalities, handling unforeseen challenges in deploying a commercially working information processing system in a real-world environment, and handling collaboration among multiple users.
The invention provides a internet hosted system to find, compare, contrast and identify similar characteristics among two or more individuals or objects using a digital camera, cellular telephone camera, wireless device for the purpose of returning information regarding similar objects or faces to the user. The system features classification of images from a variety of Internet accessible sources, including mobile phones, wireless camera-enabled devices, images obtained from digital cameras or scanners that are uploaded from PCs, third-party applications and databases. Once classified, the matching person's name, or the matching object, image and associated meta-data is sent back to the user. The image may be manipulated to emphasize similar characteristics between the received facial image and the matching facial image. The meta-data sent down with the image may include sponsored links and advertisements.
A method and system for a programmable camera for a configurable security and surveillance system are provided. A programmable sensor agent for video surveillance may comprise a network interface, a processor, an image processor, and an image sensor. The image processor may comprise at least one configurable device. A device programming file may be received by the network interface from a system manager and may be programmed into at least one configurable device in the image processor via a JTAG interface in the processor. The processor may also verify that the programming has been completed successfully. The programmable sensor agent may also comprise a display interface. The device programming file may be selected via the system manager and/or via the display interface in the programmable sensor agent. The programmable sensor agent may also comprise a battery for backup power, a wireless processor, and/or a global positioning system (GPS) processor.