CN106355171A - Video monitoring internetworking system - Google Patents

Publication number
CN106355171A
Authority
CN
China
Prior art keywords
module
image
voice
dictionary
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611063348.0A
Other languages
Chinese (zh)
Inventor
邱林新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Kaida Photoelectric Technology Co Ltd
Original Assignee
Shenzhen Kaida Photoelectric Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Kaida Photoelectric Technology Co Ltd
Priority to CN201611063348.0A
Publication of CN106355171A

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a video monitoring internetworking system that identifies personnel by two modalities, voice and image. The system comprises a collection system, a voice recognition system and an image recognition system. The collection system collects voice and images. The voice recognition system comprises a dictionary scene voice module, a similarity comparison module and a voice recognition engine module. The image recognition system comprises a preprocessing module, a feature extraction module, a training module, a re-identification module and an evaluation module. The system has the advantage that personnel can be recognized effectively.

Description

A video monitoring internetworking system

Technical field

The invention relates to the field of video monitoring, and in particular to a video monitoring internetworking system.

Background

Video surveillance is an important part of a security system. A traditional surveillance system comprises front-end cameras, transmission cables and a video surveillance platform. Cameras can be divided into network digital cameras and analog cameras, and serve to capture the front-end video image signal. Video surveillance is a comprehensive system with strong preventive capability, and it is widely used because it is intuitive, accurate, timely and rich in information. In recent years, with the rapid development of computer, network, image processing and transmission technologies, video surveillance technology has also advanced considerably.

Summary of the invention

The invention aims to provide a video monitoring internetworking system capable of quickly and effectively identifying personnel.

The object of the invention is achieved by the following technical solution:

A video monitoring internetworking system is provided that can identify personnel by both voice and image. It comprises a collection system, a voice recognition system and an image recognition system. The collection system collects voice and images. The voice recognition system comprises a dictionary scene voice module, a similarity comparison module and a voice recognition engine module. The image recognition system comprises a preprocessing module, a feature extraction module, a training module, a re-identification module and an evaluation module. The preprocessing module determines the position of the person in a pedestrian image and obtains a rectangular region containing the person. The feature extraction module extracts appearance features within the rectangular region containing the person. The training module trains multiple cross-modal projection models; each cross-modal projection model contains two projection functions, which map the image features from different cameras into a common feature space, where the similarity calculation is completed. The re-identification module determines whether the database contains a pedestrian image matching the query person and confirms the query person's identity. The evaluation module evaluates system performance.

The beneficial effect of the invention is that personnel are identified effectively.

Description of drawings

The invention is further described with reference to the accompanying drawing. The embodiment in the drawing does not limit the invention in any way; a person of ordinary skill in the art can obtain other drawings from the following drawing without creative effort.

Fig. 1 is a schematic diagram of the structural connections of the invention.

Reference signs:

Collection system 1, voice recognition system 2, image recognition system 3.

Detailed description

The invention is further described with reference to the following embodiment.

Referring to Fig. 1, the video monitoring internetworking system of this embodiment can identify personnel by both voice and image. It comprises a collection system 1, a voice recognition system 2 and an image recognition system 3. The collection system 1 collects voice and images. The voice recognition system 2 comprises a dictionary scene voice module, a similarity comparison module and a voice recognition engine module. The image recognition system 3 comprises a preprocessing module, a feature extraction module, a training module, a re-identification module and an evaluation module. The preprocessing module determines the position of the person in a pedestrian image and obtains a rectangular region containing the person. The feature extraction module extracts appearance features within the rectangular region containing the person. The training module trains multiple cross-modal projection models; each cross-modal projection model contains two projection functions, which map the image features from different cameras into a common feature space, where the similarity calculation is completed. The re-identification module determines whether the database contains a pedestrian image matching the query person and confirms the query person's identity. The evaluation module evaluates system performance.

Preferably, the dictionary scene voice module is adapted to collect, in turn, the dictionary and scene speech in the user vocabulary, and to save the collected feature vectors as templates;

The similarity comparison module is adapted to compare the feature vector of the input speech signal with each feature vector template saved in the dictionary scene voice module in turn, and to output the template with the highest similarity as the speech recognition result.
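The template comparison described above can be sketched as a nearest-template search. The source does not say which similarity measure is used, so cosine similarity is an assumption here, and the template labels and vectors are hypothetical values chosen only for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no direction to compare
    return dot / (norm_a * norm_b)

def recognize(input_vector, templates):
    """Return (label, score) of the stored template most similar to the input."""
    best_label, best_score = None, float("-inf")
    for label, template in templates.items():
        score = cosine_similarity(input_vector, template)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score

# Hypothetical stored templates (assumed values, not from the patent).
templates = {
    "open gate": [0.9, 0.1, 0.3],
    "alarm": [0.1, 0.8, 0.5],
}
label, score = recognize([0.85, 0.15, 0.25], templates)
```

A linear scan like this costs one comparison per template, which is why restricting the template set to a small vocabulary keeps recognition fast.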

This preferred embodiment achieves effective identification of personnel.

Preferably, the templates in the dictionary scene voice module include monitoring system term templates and human voice plus dictionary templates.

This preferred embodiment speeds up recognition.

Preferably, the preprocessing module includes an image fusion unit, which fuses images from different sources so as to capture the overall features of the image more completely. The fusion proceeds as follows: decompose each of the two source images to be fused with a biorthogonal wavelet transform and determine the wavelet coefficients of the decomposed images; for the low-frequency coefficients, select the wavelet coefficients of the decomposed images in a set proportion to form the low-frequency wavelet coefficient matrix of the fused image; for the high-frequency coefficients, use a texture consistency measure to analyze the edge characteristics of the different high- and low-frequency coefficients in a given region, compute the texture consistency measure of the image region, and determine the high-frequency wavelet coefficient matrix of the fused image according to predetermined rules. The texture consistency measure of an image region is defined as:

$EF(x) = \frac{3}{8}\left(EF_l + EF_c\right) + \frac{1}{4}EF_d$

where $EF(x)$ denotes the texture consistency measure of image region $x$, $EF_l$ denotes the texture consistency measure of the high-frequency component images of region $x$ in the horizontal direction, $EF_c$ denotes the measure in the vertical direction, and $EF_d$ denotes the measure in the diagonal direction. Finally, the discrete inverse biorthogonal wavelet transform is applied to the low-frequency and high-frequency wavelet coefficient matrices of the fused image to obtain the fused image.
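As a small worked example of the formula, the sketch below combines three directional measures with the stated weights 3/8, 3/8 and 1/4. The weights sum to 1, so the result is a weighted average of the directional values. How the directional measures themselves are computed is not given in the source, so the inputs here are placeholder numbers.

```python
def texture_consistency(ef_l, ef_c, ef_d):
    """EF(x) = 3/8 * (EF_l + EF_c) + 1/4 * EF_d.

    ef_l, ef_c, ef_d: horizontal, vertical and diagonal texture
    consistency measures of one image region.
    """
    return 3.0 / 8.0 * (ef_l + ef_c) + 1.0 / 4.0 * ef_d

# Placeholder directional measures for one region (assumed values).
ef = texture_consistency(0.8, 0.4, 0.2)  # 0.375 * 1.2 + 0.25 * 0.2
```

The diagonal direction receives a smaller weight (1/4) than the horizontal and vertical directions (3/8 each).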

This preferred embodiment provides an image fusion unit. The texture consistency measure makes it easier to distinguish false edges in the image, so detail is richer and more faithful while the overall visual effect is preserved. Defining a calculation formula for the texture consistency measure of an image region speeds up image fusion.

Preferably, the predetermined rules include:

(1) If more than 88% of the pixel values in an image region have a large texture consistency measure, the region is defined as an edge region, and the wavelet coefficients of the high-frequency image with the largest corresponding edge texture consistency measure are selected to form the high-frequency wavelet coefficient matrix of the fused image;

(2) If more than 88% of the pixel values in an image region have a small texture consistency measure, the region is defined as a smooth region. The energy and matching degree of the two source images in the region are computed, the proportions of the two source images' wavelet coefficients in the wavelet coefficients of the fused image are determined from the energy and matching degree, and the high-frequency wavelet coefficient matrix of the fused image is determined by:

$R_G = \beta_A R_A + \beta_B R_B$

where $R_G$ denotes the high-frequency wavelet coefficient matrix of the fused image, $R_A$ and $\beta_A$ denote the wavelet coefficients of one source image and their proportion in the wavelet coefficients of the fused image, and $R_B$ and $\beta_B$ denote the wavelet coefficients of the other source image and their proportion, with $\beta_A + \beta_B = 1$.
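The two rules can be sketched as follows. The source does not quantify "large" or "small", so the cutoff `threshold` is a hypothetical parameter, and the "mixed" result for regions matching neither rule is an assumption added so the function is total. The weight `beta_a` is passed in directly because the source does not give the energy and matching-degree formulas that determine it.

```python
def classify_region(measures, threshold, fraction=0.88):
    """Label a region from the per-pixel texture consistency measures.

    'edge'   if more than `fraction` of the pixels exceed `threshold`,
    'smooth' if more than `fraction` of the pixels do not exceed it,
    'mixed'  otherwise (a case the source does not address).
    """
    n = len(measures)
    high = sum(1 for m in measures if m > threshold)
    if high > fraction * n:
        return "edge"
    if (n - high) > fraction * n:
        return "smooth"
    return "mixed"

def fuse_smooth(coeff_a, coeff_b, beta_a):
    """R_G = beta_A * R_A + beta_B * R_B with beta_A + beta_B = 1."""
    if not 0.0 <= beta_a <= 1.0:
        raise ValueError("beta_a must lie in [0, 1]")
    beta_b = 1.0 - beta_a
    return [beta_a * a + beta_b * b for a, b in zip(coeff_a, coeff_b)]

# Assumed example values.
region_type = classify_region([0.9] * 9 + [0.1], threshold=0.5)
fused = fuse_smooth([2.0, 4.0], [0.0, 0.0], beta_a=0.25)
```

Because the two weights sum to 1, each fused coefficient lies between the corresponding coefficients of the two source images.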

This preferred embodiment determines the high-frequency wavelet coefficient matrix of the fused image according to predetermined rules, which improves both the quality and the speed of fusion.

Preferably, extracting appearance features within the rectangular region containing the person includes:

(1) Normalizing the illumination of the image, specifically: a. letting the image be I, converting I to the logarithmic domain with a logarithm transform and smoothing it with a difference-of-Gaussians filter; b. applying global contrast equalization to image I;

(2) Normalizing the image size;

(3) Dividing the image into blocks and extracting a feature vector from each image block;

(4) Concatenating the feature vectors of all image blocks, and then applying PCA dimensionality reduction to the concatenated feature vector.
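The steps above can be sketched in plain Python as follows. Several choices are assumptions, since the source does not fix them: the Gaussian widths, dividing by the mean absolute value as the global contrast equalization, and using each block's mean and standard deviation as its feature vector. The sketch assumes the image has already been resized (step 2) and leaves out the final PCA step.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at 3 sigma."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(image, sigma):
    """Separable Gaussian blur; border pixels are replicated."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    h, w = len(image), len(image[0])

    def at(value, size):
        return min(max(value, 0), size - 1)

    horizontal = [[sum(k * image[r][at(c + i - radius, w)]
                       for i, k in enumerate(kernel))
                   for c in range(w)] for r in range(h)]
    return [[sum(k * horizontal[at(r + i - radius, h)][c]
                 for i, k in enumerate(kernel))
             for c in range(w)] for r in range(h)]

def normalize_illumination(image, sigma_inner=1.0, sigma_outer=2.0):
    """Log transform, difference of Gaussians, then global contrast scaling."""
    log_img = [[math.log(1.0 + p) for p in row] for row in image]
    inner, outer = blur(log_img, sigma_inner), blur(log_img, sigma_outer)
    dog = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(inner, outer)]
    mean_abs = sum(abs(v) for row in dog for v in row) / (len(dog) * len(dog[0]))
    scale = mean_abs if mean_abs > 0 else 1.0
    return [[v / scale for v in row] for row in dog]

def block_features(image, block):
    """Concatenate (mean, standard deviation) of each non-overlapping block."""
    h, w = len(image), len(image[0])
    features = []
    for r0 in range(0, h - block + 1, block):
        for c0 in range(0, w - block + 1, block):
            vals = [image[r][c] for r in range(r0, r0 + block)
                    for c in range(c0, c0 + block)]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals)
            features.extend([mean, math.sqrt(var)])
    return features

# Assumed 8x8 test image with a simple intensity ramp.
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
normalized = normalize_illumination(image)
features = block_features(normalized, block=4)  # 4 blocks x 2 values each
```

In practice the concatenated vector would then be projected onto the leading principal components learned from a training set.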

This preferred embodiment provides a feature extraction module that normalizes the illumination of the image before extracting features. This reduces the image distortion caused by illumination changes and makes feature extraction more accurate.

Preferably, the training module includes a sample classification unit and a cross-modal projection model learning unit. The sample classification unit specifically performs:

Let the feature spaces of the two cameras $C_1$ and $C_2$ be $X$ and $Y$, with $d_1$ and $d_2$ denoting the dimensions of the two camera feature spaces. Assume the training data set consists of $K$ pairs of cross-camera image features $\{x_k, y_k\}$, $k=1,\ldots,K$. $s_k = s(x_k, y_k) \in \{-1,+1\}$ denotes the class label of a sample pair, where $-1$ means the pair belongs to different classes and $+1$ means it belongs to the same class. According to the class labels, the training set is divided into a negative sample set and a positive sample set, with $|D_1| + |D_2| = K$;
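A minimal sketch of the sample classification step: cross-camera feature pairs are split by their label into a negative set and a positive set whose sizes add up to K. The function name and the example pairs are hypothetical.

```python
def split_by_label(pairs, labels):
    """Split cross-camera pairs into (negatives, positives) by label -1 / +1."""
    if len(pairs) != len(labels):
        raise ValueError("pairs and labels must have the same length")
    negatives, positives = [], []
    for pair, label in zip(pairs, labels):
        if label == 1:
            positives.append(pair)
        elif label == -1:
            negatives.append(pair)
        else:
            raise ValueError("labels must be -1 or +1")
    return negatives, positives

# Assumed toy data: each pair is (feature from camera 1, feature from camera 2).
pairs = [([0.1, 0.2], [0.1, 0.3]), ([0.9, 0.8], [0.2, 0.1]), ([0.5, 0.5], [0.5, 0.4])]
labels = [1, -1, 1]
negatives, positives = split_by_label(pairs, labels)
```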

The cross-modal projection model learning unit specifically performs:

Let the set of cross-modal projection models be $H=[h_1, h_2, \ldots, h_L]$, where the $L$ sub-models handle $L$ kinds of data differences. Each sub-model consists of a pair of projection functions, $h_l=[p_{Xl}(x), p_{Yl}(y)]$. Omitting the subscript $l$, the projection functions $p_X(x)$ and $p_Y(y)$ project $x\in X$ and $y\in Y$ into a common feature space:

where $u$ and $v$ denote the projection vectors and $a, b\in R$ are linear biases; $p_X(x)$ and $p_Y(y)$ project the original features into the $\{-1,+1\}$ space;

There are also projection functions $q_X(x)$ and $q_Y(y)$ that project $x\in X$ and $y\in Y$ into another common feature space:

$q_X(x) = u^T x + a$

$q_Y(y) = v^T y + b$

The relationship between the data classes and the common feature space is established, and the objective function is defined:

where $E$ denotes the expectation, and the trade-off index balances the importance of same-class sample pairs against different-class sample pairs;

where $w_k$ denotes the weight of the sample pair $\{x_k, y_k\}$ in the learning of the current sub-model, and $s_k=s(x_k,y_k)\in\{-1,+1\}$ denotes the class label of the sample pair,

The parameters $\{u,v,a,b\}$ are learned by minimizing the objective function, which yields the corresponding projection functions.
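The projection functions can be illustrated with a short sketch. The linear projections follow the stated form q_X(x) = uᵀx + a and q_Y(y) = vᵀy + b. Taking the binary projection as the sign of the linear one is an assumption, inferred from the {-1, +1} target space and from the sign terms in the similarity measure used for re-identification. The parameter values are hypothetical, and the objective that would learn them is not reproduced because it is not recoverable from the text.

```python
def linear_projection(weights, bias, features):
    """q(x) = w^T x + bias."""
    return sum(w * f for w, f in zip(weights, features)) + bias

def binary_projection(weights, bias, features):
    """Assumed form p(x) = sign(w^T x + bias), mapping into {-1, +1}."""
    return 1 if linear_projection(weights, bias, features) >= 0 else -1

def codes_agree(x, y, u, a, v, b):
    """True when both camera features land on the same side in the common space."""
    return binary_projection(u, a, x) == binary_projection(v, b, y)

# Hypothetical learned parameters and one cross-camera pair.
u, a = [1.0, -1.0], 0.0
v, b = [0.5, 0.5], -1.0
x, y = [2.0, 1.0], [3.0, 1.0]
same_side = codes_agree(x, y, u, a, v, b)
```

Under this reading, a well-trained sub-model would give agreeing codes for same-class pairs and disagreeing codes for different-class pairs.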

This preferred embodiment uses multiple cross-modal projection models and can therefore cope with a variety of differences in data distribution.

Preferably, identifying whether the database contains a pedestrian image matching the query person and confirming the query person's identity includes:

Let the set of queried persons be $\{f_i, STA(f_i)\}$, $i=1,2,\ldots,N$, where $f_i$ denotes the $i$-th queried person and $STA(f_i)$ denotes the identity of the $i$-th queried person. For the set of query persons $\{g_j, STA(g_j)\}$, $j=1,2,\ldots,M$:

$STA(g_j) = STA(f)$

$f = \arg\max_{i} Z(g_j, f_i)$

The similarity $Z(g_j, f_i)$ between $g_j$ and $f_i$ is expressed as:

$Z(g_j, f_i) = \mathrm{sign}(u^T g_j + a)\cdot \mathrm{sign}(v^T f_i + b) + \left\|(u^T g_j + a) - (v^T f_i + b)\right\|$

Set a threshold $T$, $T\in[1,2]$. If $Z(g_j, f_i) < T$, no image matching the query person exists among the queried persons;

If $Z(g_j, f_i) \geq T$, the queried persons are sorted by similarity in descending order, and the top-ranked one has the same identity as the query person.
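The matching procedure can be sketched as follows, implementing the similarity exactly as written in the text: the product of the two signs plus the absolute difference of the two linear projections. The projection parameters, gallery entries and threshold value are hypothetical, and treating sign(0) as +1 is an assumption.

```python
def sign(value):
    return 1 if value >= 0 else -1  # sign(0) treated as +1 (assumption)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def similarity(g, f, u, a, v, b):
    """Z(g, f) = sign(u.g + a) * sign(v.f + b) + |(u.g + a) - (v.f + b)|."""
    qg = dot(u, g) + a
    qf = dot(v, f) + b
    return sign(qg) * sign(qf) + abs(qg - qf)

def reidentify(query, gallery, u, a, v, b, threshold):
    """Return the identity of the best gallery match, or None below threshold.

    gallery: list of (features, identity) tuples for the queried persons.
    """
    if not gallery:
        return None
    scored = sorted(((similarity(query, f, u, a, v, b), ident)
                     for f, ident in gallery),
                    key=lambda item: item[0], reverse=True)
    best_score, best_identity = scored[0]
    return best_identity if best_score >= threshold else None

# Hypothetical parameters and data.
u, a = [1.0, 0.0], 0.0
v, b = [1.0, 0.0], 0.0
gallery = [([1.2, 0.0], "A"), ([-0.3, 0.0], "B")]
match = reidentify([0.5, 0.0], gallery, u, a, v, b, threshold=1.5)
```

Checking only the top-ranked score against the threshold is equivalent to the two-case rule in the text, since the top-ranked entry is the one with the largest Z.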

This preferred embodiment improves the accuracy and efficiency of personnel identification in the video monitoring internetworking system.

Preferably, evaluating the performance of the image recognition system includes:

Defining the evaluation function:

$F(n) = \frac{\sum_{n=1}^{N} S_n}{N^2}$

where $N$ denotes the number of queries and $S_n$ denotes the number of queries for which the correct result is found within the top $n$ positions. The larger the value of the evaluation function, the better the re-identification performance of the system.
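A short sketch of the evaluation function, assuming the results of the N queries are available as the 1-based rank at which each correct match appeared (or `None` when it was not found). That input representation is an assumption made for illustration.

```python
def evaluation_score(correct_ranks):
    """F = (sum over n = 1..N of S_n) / N^2.

    correct_ranks: for each of the N queries, the 1-based rank of the
    correct match, or None if no correct match was returned.
    S_n counts the queries whose correct match is within the top n.
    """
    n_queries = len(correct_ranks)
    if n_queries == 0:
        raise ValueError("at least one query is required")
    total = 0
    for n in range(1, n_queries + 1):
        total += sum(1 for rank in correct_ranks
                     if rank is not None and rank <= n)
    return total / n_queries ** 2

perfect = evaluation_score([1, 1, 1])  # every query correct at rank 1
mixed = evaluation_score([1, 2, 3])    # S_1 = 1, S_2 = 2, S_3 = 3
```

With this definition the score reaches its maximum of 1 when every query finds the correct match at rank 1, and falls to 0 when no query finds a correct match at all.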

This preferred embodiment provides an evaluation module, which helps in improving the video monitoring internetworking system.

A set of recognition results of the video monitoring internetworking system of the invention is shown in the following table:

N     Average time for person identification     Person recognition accuracy
6     0.14 s                                     95.5%
12    0.12 s                                     95.3%
18    0.16 s                                     95.7%

Finally, it should be noted that the above embodiments only illustrate the technical solution of the invention and do not limit its scope of protection. Although the invention has been described in detail with reference to preferred embodiments, a person of ordinary skill in the art should understand that the technical solution of the invention may be modified or equivalently replaced without departing from its spirit and scope.

Claims (3)

1. A video monitoring internetworking system, characterized in that it can identify personnel by both voice and image and comprises a collection system, a voice recognition system and an image recognition system; the collection system collects voice and images; the voice recognition system comprises a dictionary scene voice module, a similarity comparison module and a voice recognition engine module; the image recognition system comprises a preprocessing module, a feature extraction module, a training module, a re-identification module and an evaluation module; the preprocessing module is used to determine the position of the person in a pedestrian image and obtain a rectangular region containing the person; the feature extraction module is used to extract appearance features within the rectangular region containing the person; the training module is used to train multiple cross-modal projection models, each cross-modal projection model containing two projection functions that map the image features from different cameras into a common feature space and complete the similarity calculation; the re-identification module is used to determine whether the database contains a pedestrian image matching the query person and to confirm the query person's identity; the evaluation module is used to evaluate system performance.

2. The video monitoring internetworking system according to claim 1, characterized in that the dictionary scene voice module is adapted to collect, in turn, the dictionary and scene speech in the user vocabulary and to save the collected feature vectors as templates; and the similarity comparison module is adapted to compare the feature vector of the input speech signal with each feature vector template saved in the dictionary scene voice module in turn, and to output the template with the highest similarity as the speech recognition result.

3. The video monitoring internetworking system according to claim 2, characterized in that the templates in the dictionary scene voice module include monitoring system term templates and human voice plus dictionary templates.
CN201611063348.0A 2016-11-24 2016-11-24 Video monitoring internetworking system Pending CN106355171A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611063348.0A CN106355171A (en) 2016-11-24 2016-11-24 Video monitoring internetworking system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611063348.0A CN106355171A (en) 2016-11-24 2016-11-24 Video monitoring internetworking system

Publications (1)

Publication Number Publication Date
CN106355171A true CN106355171A (en) 2017-01-25

Family

ID=57863012

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611063348.0A Pending CN106355171A (en) 2016-11-24 2016-11-24 Video monitoring internetworking system

Country Status (1)

Country Link
CN (1) CN106355171A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106919954A (en) * 2017-03-02 2017-07-04 深圳明创自控技术有限公司 A kind of cloud computing system for commodity classification
CN108090473A (en) * 2018-01-12 2018-05-29 北京陌上花科技有限公司 The method and device of polyphaser human face identification
CN108345866A (en) * 2018-03-08 2018-07-31 天津师范大学 A kind of pedestrian's recognition methods again based on depth characteristic study
CN108924483A (en) * 2018-06-27 2018-11-30 南京朴厚生态科技有限公司 A kind of automatic monitoring system and method for the field animal based on depth learning technology
CN111292764A (en) * 2018-11-20 2020-06-16 新唐科技股份有限公司 Identification system and identification method
CN111507774A (en) * 2020-04-28 2020-08-07 上海依图网络科技有限公司 A data processing method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104346547A (en) * 2013-07-26 2015-02-11 宁夏新航信息科技有限公司 Intelligent identity identification system
CN104834849A (en) * 2015-04-14 2015-08-12 时代亿宝(北京)科技有限公司 Dual-factor identity authentication method and system based on voiceprint recognition and face recognition
CN105228033A (en) * 2015-08-27 2016-01-06 联想(北京)有限公司 A kind of method for processing video frequency and electronic equipment
CN105426723A (en) * 2015-11-20 2016-03-23 北京得意音通技术有限责任公司 Voiceprint identification, face identification and synchronous in-vivo detection-based identity authentication method and system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
刘凯: ""无交叠多摄像机网络中的人员再辨识"", 《中国博士学位论文全文数据库 信息科技辑》 *
张德祥等: ""基于小波变换纹理一致性测度的遥感图像融合算法"", 《仪器仪表学报》 *
许百林: ""基于矢量两户(VQ)和混合高斯模型(GMM)的说话人识别的研究"", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106919954A (en) * 2017-03-02 2017-07-04 深圳明创自控技术有限公司 A kind of cloud computing system for commodity classification
CN108090473A (en) * 2018-01-12 2018-05-29 北京陌上花科技有限公司 The method and device of polyphaser human face identification
CN108345866A (en) * 2018-03-08 2018-07-31 天津师范大学 A kind of pedestrian's recognition methods again based on depth characteristic study
CN108345866B (en) * 2018-03-08 2021-08-24 天津师范大学 A Pedestrian Re-identification Method Based on Deep Feature Learning
CN108924483A (en) * 2018-06-27 2018-11-30 南京朴厚生态科技有限公司 A kind of automatic monitoring system and method for the field animal based on depth learning technology
CN111292764A (en) * 2018-11-20 2020-06-16 新唐科技股份有限公司 Identification system and identification method
CN111292764B (en) * 2018-11-20 2023-12-29 新唐科技股份有限公司 Identification system and identification method
CN111507774A (en) * 2020-04-28 2020-08-07 上海依图网络科技有限公司 A data processing method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170125