CN107783990A - A data compression method and terminal - Google Patents
- Publication number: CN107783990A
- Application number: CN201610729693.7A
- Authority: CN (China)
- Legal status: Granted (as listed by Google Patents; not a legal conclusion)
Classifications
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/174—Redundancy elimination performed by the file system
- G06F16/1744—Redundancy elimination performed by the file system using compression, e.g. sparse files
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The embodiments of the present invention disclose a data compression method and a terminal. The method includes: the terminal calculates a first feature value of a data block to be compressed using a first calculation strategy; the terminal determines whether a first reference value (a feature value identical to the first feature value) exists in a first lookup library; if the first reference value exists, the terminal compresses the data block to be compressed with a similarity-based compression technique, using the data block corresponding to the first reference value as the reference data block; if the first reference value does not exist, the terminal calculates a second feature value of the data block to be compressed using a second calculation strategy; the terminal determines whether a second reference value exists in a second lookup library, the second reference value being a feature value identical to the second feature value; and if the second reference value exists, the terminal compresses the data block to be compressed with the similarity-based compression technique, using the data block corresponding to the second reference value as the reference data block. The present invention can improve the compression ratio.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a data compression method and a terminal.
Background
Data storage is an indispensable part of a computer system; disks, tapes, flash memory, non-volatile memory, cloud storage and the like can all be used to store data. Huge amounts of data are generated worldwide every day, and storing it raw, without any processing, occupies a large amount of storage space at high cost. To store such data efficiently, data reduction techniques are usually applied to compress it. Commonly used data reduction techniques include data deduplication, general-purpose lossless compression, and similarity-based compression.
Taking backup as an example, two consecutive backup files usually share a large amount of identical data. Deduplication can effectively reduce the amount of backup data actually written to disk: the later backup file stores only the data that differs from the previous backup file, which saves storage cost and reduces the amount of data transmitted over the network. However, a large proportion of operations in database applications are modification queries, so most of the "different data" that must be written to disk is produced by modifications. This modified data is "similar" to the data backed up before the modification, and Delta compression can compress such similar data further. Delta compression works as follows:
First, similar data is selected for the data to be compressed; then Delta compression is used to compress that data with reference to the selected similar data. The higher the similarity, the better the compression. The drawback of the prior art is that the data to be compressed consists of many data blocks that differ considerably from one another. If reference data blocks are selected under a high similarity standard, some data blocks may find no reference data block at all and therefore cannot be Delta-compressed; if reference data blocks are selected under a low similarity standard, the reference data block chosen for some data blocks is only weakly similar to them, so those blocks compress poorly.
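To make the Delta compression principle above concrete, here is a minimal Python sketch. It is not the patent's encoder: it assumes a simplified COPY/INSERT delta format and uses the standard-library `difflib.SequenceMatcher` to find runs shared with the reference block; the helper names `delta_encode`, `delta_decode` and `literal_bytes` are hypothetical. The count of literal bytes (bytes that must be stored verbatim) shrinks as the reference becomes more similar to the target.

```python
from difflib import SequenceMatcher
from typing import List, Tuple, Union

# A delta is a list of operations (a simplified, hypothetical format):
#   ("COPY", offset, length) -> reuse bytes from the reference block
#   ("INSERT", data)         -> bytes that must be stored literally
Op = Union[Tuple[str, int, int], Tuple[str, bytes]]


def delta_encode(reference: bytes, target: bytes) -> List[Op]:
    """Describe `target` as COPY/INSERT operations against `reference`."""
    matcher = SequenceMatcher(None, reference, target, autojunk=False)
    ops: List[Op] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("COPY", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            ops.append(("INSERT", target[j1:j2]))
        # "delete": reference bytes absent from the target need no operation.
    return ops


def delta_decode(reference: bytes, ops: List[Op]) -> bytes:
    """Rebuild the target block from the reference block and the delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "COPY":
            _, offset, length = op
            out += reference[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


def literal_bytes(ops: List[Op]) -> int:
    """Bytes stored verbatim; fewer literal bytes means better compression."""
    return sum(len(op[1]) for op in ops if op[0] == "INSERT")


if __name__ == "__main__":
    ref = b"customer=42;balance=1000;status=active;" * 4
    new = ref.replace(b"balance=1000", b"balance=1250", 1)
    unrelated = bytes(range(200, 256)) * 3
    for name, base in (("similar", ref), ("unrelated", unrelated)):
        ops = delta_encode(base, new)
        assert delta_decode(base, ops) == new
        print(name, literal_bytes(ops), "of", len(new), "bytes stored literally")
```

With a near-identical reference only a few bytes are stored literally, while an unrelated reference forces every byte to be stored as a literal; this is why the quality of the chosen reference data block decides the compression result. `SequenceMatcher` is quadratic in the worst case, so the sketch is only suitable for illustration.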
Summary of the invention
The embodiments of the present invention disclose a data compression method and a terminal that can improve the compression ratio.
In a first aspect, an embodiment of the present invention provides a data compression method. The method includes: a terminal calculates a first feature value of a data block to be compressed using a first calculation strategy; the terminal determines whether a first reference value exists in a first lookup library, where the first reference value is a feature value identical to the first feature value, the first lookup library contains N feature values, each calculated with the first calculation strategy from the data block corresponding to it, and the N feature values in the first lookup library correspond one-to-one to N data blocks, where N ≥ 1; if the first reference value exists, the terminal compresses the data block to be compressed with a similarity-based compression technique, using the data block corresponding to the first reference value as the reference data block; if the first reference value does not exist, the terminal calculates a second feature value of the data block to be compressed using a second calculation strategy. When the similarity between two data blocks is higher than a first similarity threshold, the feature values calculated for them with the first calculation strategy are identical; when their similarity is higher than a second similarity threshold, the feature values calculated for them with the second calculation strategy are identical; and the first similarity threshold is higher than the second similarity threshold. The terminal then determines whether a second reference value exists in a second lookup library, where the second reference value is a feature value identical to the second feature value, the second lookup library contains N feature values, each calculated with the second calculation strategy from the data block corresponding to it, and the N feature values in the second lookup library correspond one-to-one to the N data blocks; if the second reference value exists, the terminal compresses the data block to be compressed with the similarity-based compression technique, using the data block corresponding to the second reference value as the reference data block.
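The two-level lookup in the first aspect can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class name `TwoTierReferenceFinder` is hypothetical, each lookup library is modeled as a Python dict from feature value to data block (one block per feature value), and the two calculation strategies are passed in as functions. The only property assumed of them is the one the method requires: whenever the first (strict) strategy gives two blocks equal feature values, the second (loose) strategy does too.

```python
from typing import Callable, Dict, Hashable, Optional, Tuple

FeatureFn = Callable[[bytes], Hashable]


class TwoTierReferenceFinder:
    """Hypothetical helper: pick a reference block via a strict tier, then a loose tier."""

    def __init__(self, first_strategy: FeatureFn, second_strategy: FeatureFn):
        self.first_strategy = first_strategy    # higher similarity threshold
        self.second_strategy = second_strategy  # lower similarity threshold
        # Each lookup library maps a feature value to the data block it came from.
        self.first_library: Dict[Hashable, bytes] = {}
        self.second_library: Dict[Hashable, bytes] = {}

    def find_reference(self, block: bytes) -> Optional[Tuple[int, bytes]]:
        """Return (tier, reference_block), or None if neither library matches."""
        reference = self.first_library.get(self.first_strategy(block))
        if reference is not None:
            return 1, reference  # first reference value found
        # The second feature value is computed only after a first-tier miss.
        reference = self.second_library.get(self.second_strategy(block))
        if reference is not None:
            return 2, reference  # second reference value found
        return None

    def register(self, block: bytes) -> None:
        """Record a block's feature values in both lookup libraries."""
        self.first_library[self.first_strategy(block)] = block
        self.second_library[self.second_strategy(block)] = block


if __name__ == "__main__":
    # Toy stand-in strategies: equal 16-byte prefixes imply equal 4-byte prefixes.
    finder = TwoTierReferenceFinder(lambda b: b[:16], lambda b: b[:4])
    base = b"HEADER-v1-AAAAAA" + b"x" * 48
    finder.register(base)
    print(finder.find_reference(b"HEADER-v1-AAAAAA" + b"y" * 48))  # tier 1 match
    print(finder.find_reference(b"HEADER-v2-BBBBBB" + b"x" * 48))  # tier 2 match
    print(finder.find_reference(b"ZZZZ" + b"x" * 60))              # no reference
```

The prefix-based strategies in the usage only demonstrate the control flow; they are placeholders, not the patent's strategies. The method itself does not say what happens when neither library matches, so the sketch simply returns `None` in that case.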
By performing the above steps, the terminal compresses data block by block. For each data block to be compressed, it first determines whether there is a reference data block highly similar to it; if so, it compresses the block with reference to that high-similarity block. If not, it determines whether there is a reference data block of lower similarity; if so, it compresses the block with reference to that lower-similarity block. In other words, the embodiment selects reference data blocks using multi-level similarity standards, from high to low, which improves the overall compression ratio and saves storage space.
With reference to the first aspect, in a first possible implementation of the first aspect, before the terminal calculates the first feature value of the data block to be compressed using the first calculation strategy, the method further includes: the terminal divides the data block to be compressed into M data units, each of which has its own initial reference value, where M ≥ 1. The terminal calculating the first feature value using the first calculation strategy includes: substituting the initial reference values of at least two of the M data units into P preset filter functions to calculate the first feature value of the data block to be compressed, where P ≥ 2. The terminal calculating the second feature value using the second calculation strategy includes: substituting the initial reference values of at least two of the M data units into Q preset filter functions to calculate the second feature value of the data block to be compressed, where the P filter functions include the Q filter functions.
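One way to read the data units, initial reference values and filter functions is the "super-feature" style of similarity detection; the sketch below is built on that assumption and is not taken from the patent. It assumes fixed-size 32-byte data units, uses CRC-32 (`zlib.crc32`) as each unit's initial reference value, and defines each filter function as (a * v + b) mod 2^32 followed by a maximum over all units. The constants in the hypothetical `FILTERS` list are arbitrary. Because the second feature uses only the first Q of the P filters, two blocks whose first features match necessarily have matching second features, mirroring the requirement that the first calculation strategy be the stricter one.

```python
import zlib
from typing import List, Sequence, Tuple

MASK32 = (1 << 32) - 1

# P = 4 hypothetical filter functions, each v -> (a * v + b) mod 2**32.
# Odd multipliers make each map a bijection on 32-bit values; the constants are arbitrary.
FILTERS: List[Tuple[int, int]] = [
    (0x9E3779B1, 7),
    (0x85EBCA77, 13),
    (0xC2B2AE3D, 29),
    (0x27D4EB2F, 31),
]


def initial_reference_values(block: bytes, unit_size: int = 32) -> List[int]:
    """Split the block into fixed-size data units and hash each one (assumed: CRC-32)."""
    if not block:
        raise ValueError("the data block must contain at least one data unit")
    units = [block[i:i + unit_size] for i in range(0, len(block), unit_size)]
    return [zlib.crc32(unit) for unit in units]


def apply_filters(values: Sequence[int], filters: Sequence[Tuple[int, int]]) -> Tuple[int, ...]:
    """For each filter, transform every unit's value and keep the maximum."""
    return tuple(max((a * v + b) & MASK32 for v in values) for a, b in filters)


def first_feature(block: bytes) -> Tuple[int, ...]:
    """First calculation strategy: all P filter outputs must agree (stricter)."""
    return apply_filters(initial_reference_values(block), FILTERS)


def second_feature(block: bytes, q: int = 2) -> Tuple[int, ...]:
    """Second calculation strategy: only the first Q of the P filters (looser)."""
    if not 1 <= q <= len(FILTERS):
        raise ValueError("Q must be between 1 and P")
    return apply_filters(initial_reference_values(block), FILTERS[:q])
```

Because each filter output is a maximum, it is determined by a single unit: an in-place edit to other units leaves it unchanged unless the edited unit's transformed value becomes the new maximum. That is what lets similar, not only identical, blocks share feature values, and the more filter outputs a strategy requires to agree, the more similar two blocks must be to collide. Reordering units also leaves these features unchanged, which is a property of this sketch rather than something the patent specifies.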
With reference to the first aspect or its first possible implementation, in a second possible implementation of the first aspect, after the data block to be compressed has been compressed, the method further includes: the terminal adds the first feature value to the first lookup library and the second feature value to the second lookup library; in the first lookup library the data block corresponding to the first feature value is the data block just compressed, and in the second lookup library the data block corresponding to the second feature value is likewise the data block just compressed.
With reference to the second possible implementation of the first aspect, in a third possible implementation of the first aspect, the method further includes: the terminal deletes from the first lookup library every feature value that has been present in it longer than a preset time threshold, and deletes from the second lookup library every feature value that has been present in it longer than the same preset time threshold.
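The bookkeeping in the second and third possible implementations (adding both feature values after a block is compressed, and deleting entries older than a preset time threshold) could look like the following sketch. The `LookupLibrary` class, its `ttl` parameter and the injectable clock are assumptions for illustration; the patent does not say how an entry's age is tracked. Since every entry is timestamped when it is inserted, keeping entries in insertion order lets eviction stop at the first entry that is still young enough.

```python
import time
from collections import OrderedDict
from typing import Callable, Hashable, Optional, Tuple


class LookupLibrary:
    """Hypothetical lookup library: feature value -> data block, with age-based expiry."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl      # preset time threshold, in clock units (seconds by default)
        self.clock = clock
        # Insertion order equals age order, because every add stamps the current time.
        self._entries: "OrderedDict[Hashable, Tuple[float, bytes]]" = OrderedDict()

    def add(self, feature: Hashable, block: bytes) -> None:
        """Add a feature value after its block is compressed (re-adding refreshes it)."""
        self._entries.pop(feature, None)
        self._entries[feature] = (self.clock(), block)

    def get(self, feature: Hashable) -> Optional[bytes]:
        self.evict_expired()
        entry = self._entries.get(feature)
        return None if entry is None else entry[1]

    def evict_expired(self) -> None:
        """Delete feature values that have been present longer than ttl."""
        now = self.clock()
        while self._entries:
            _, (inserted_at, _block) = next(iter(self._entries.items()))
            if now - inserted_at <= self.ttl:
                break  # every later entry is younger, so stop here
            self._entries.popitem(last=False)

    def __len__(self) -> int:
        return len(self._entries)
```

After compressing a block, a caller would add the block's first feature value to one instance and its second feature value to another, both mapping to that block; giving the two instances the same `ttl` models the single preset time threshold.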
With reference to the first aspect or any one of its first to third possible implementations, in a fourth possible implementation of the first aspect, before the terminal calculates the first feature value of the data block to be compressed using the first calculation strategy, the method further includes: the terminal collects the feature values calculated with the second calculation strategy for multiple data blocks during historical compression, and determines how many of those feature values also exist in the second lookup library, where the first calculation strategy was not used to calculate feature values for those data blocks during historical compression; the terminal calculates a first hit rate from the total number of those feature values and the number of matching feature values; and when the first hit rate is higher than a preset first hit-rate threshold and there is a data block to be compressed, the terminal performs the step of calculating the first feature value of the data block to be compressed using the first calculation strategy.
With reference to the first aspect or any one of its first to third possible implementations, in a fifth possible implementation of the first aspect, before the terminal calculates the second feature value of the data block to be compressed using the second calculation strategy, the method further includes: the terminal collects the feature values calculated with the first calculation strategy for multiple data blocks during historical compression, and determines how many of those feature values also exist in the first lookup library, where the second calculation strategy was not used to calculate feature values for those data blocks during historical compression; the terminal calculates a second hit rate from the total number of those feature values and the number of matching feature values; and when the second hit rate is lower than a preset second hit-rate threshold, the terminal performs the step of calculating, if the first reference value does not exist, the second feature value of the data block to be compressed using the second calculation strategy.
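The hit-rate checks in the fourth and fifth possible implementations reduce to a ratio and two threshold comparisons. The sketch below is one hedged reading: `HitCounter`, the function names and any threshold values are hypothetical, and the hit rate is taken to be matched feature values divided by computed feature values, which is an assumption about how the two counts are combined.

```python
from dataclasses import dataclass


@dataclass
class HitCounter:
    """Counts feature values computed during historical compression and how many hit."""
    computed: int = 0
    matched: int = 0

    def record(self, hit: bool) -> None:
        self.computed += 1
        if hit:
            self.matched += 1

    @property
    def rate(self) -> float:
        return self.matched / self.computed if self.computed else 0.0


def enable_first_strategy(second_only_history: HitCounter, first_threshold: float) -> bool:
    """Fourth implementation: history used only the second strategy; enable the
    first (strict) strategy when that history's hit rate exceeds the first threshold."""
    return second_only_history.rate > first_threshold


def enable_second_strategy(first_only_history: HitCounter, second_threshold: float) -> bool:
    """Fifth implementation: history used only the first strategy; enable the
    second (loose) fallback when that history's hit rate is below the second threshold."""
    return first_only_history.rate < second_threshold
```

The two checks point in opposite directions: a high hit rate under the loose strategy alone suggests blocks are similar enough that the strict tier is worth computing first, while a low hit rate under the strict strategy alone suggests many blocks need the loose fallback. This rationale is an interpretation; the patent states only the comparisons.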
In a second aspect, an embodiment of the present invention provides a terminal, including: a first calculation unit, configured to calculate a first feature value of a data block to be compressed using a first calculation strategy; a first determining unit, configured to determine whether a first reference value exists in a first lookup library, where the first reference value is a feature value identical to the first feature value, the first lookup library contains N feature values, each calculated with the first calculation strategy from the data block corresponding to it, and the N feature values in the first lookup library correspond one-to-one to N data blocks, where N ≥ 1; a first compression unit, configured to, when the first determining unit determines that the first reference value exists, compress the data block to be compressed with a similarity-based compression technique, using the data block corresponding to the first reference value as the reference data block; a second calculation unit, configured to, when the first determining unit determines that the first reference value does not exist, calculate a second feature value of the data block to be compressed using a second calculation strategy, where, when the similarity between two data blocks is higher than a first similarity threshold, the feature values calculated for them with the first calculation strategy are identical, when their similarity is higher than a second similarity threshold, the feature values calculated for them with the second calculation strategy are identical, and the first similarity threshold is higher than the second similarity threshold; a second determining unit, configured to determine whether a second reference value exists in a second lookup library, where the second reference value is a feature value identical to the second feature value, the second lookup library contains N feature values, each calculated with the second calculation strategy from the data block corresponding to it, and the N feature values in the second lookup library correspond one-to-one to the N data blocks; and a second compression unit, configured to, when the second determining unit determines that the second reference value exists, compress the data block to be compressed with the similarity-based compression technique, using the data block corresponding to the second reference value as the reference data block.
By performing the above operations, the terminal compresses data block by block. For each data block to be compressed, it first determines whether there is a reference data block highly similar to it; if so, it compresses the block with reference to that high-similarity block. If not, it determines whether there is a reference data block of lower similarity; if so, it compresses the block with reference to that lower-similarity block. In other words, the embodiment selects reference data blocks using multi-level similarity standards, from high to low, which improves the overall compression ratio and saves storage space.
With reference to the second aspect, in a first possible implementation of the second aspect, the terminal further includes: a dividing unit, configured to divide the data block to be compressed into M data units before the first calculation unit calculates the first feature value using the first calculation strategy, each of the M data units having its own initial reference value, where M ≥ 1. The first calculation unit is specifically configured to substitute the initial reference values of at least two of the M data units into P preset filter functions to calculate the first feature value of the data block to be compressed, where P ≥ 2; the second calculation unit is specifically configured to substitute the initial reference values of at least two of the M data units into Q preset filter functions to calculate the second feature value of the data block to be compressed, where the P filter functions include the Q filter functions.
With reference to the second aspect or its first possible implementation, in a second possible implementation of the second aspect, the terminal further includes: an adding unit, configured to add the first feature value to the first lookup library and the second feature value to the second lookup library, where in the first lookup library the data block corresponding to the first feature value is the data block to be compressed, and in the second lookup library the data block corresponding to the second feature value is the data block to be compressed.
With reference to the second possible implementation of the second aspect, in a third possible implementation of the second aspect, the terminal further includes: a deletion unit, configured to delete from the first lookup library every feature value that has been present in it longer than a preset time threshold, and to delete from the second lookup library every feature value that has been present in it longer than the same preset time threshold.
With reference to the second aspect or any one of its first to third possible implementations, in a fourth possible implementation of the second aspect, the terminal further includes: a first statistics unit, configured to, before the first calculation unit calculates the first feature value of the data block to be compressed using the first calculation strategy, collect the feature values calculated with the second calculation strategy for multiple data blocks during historical compression and determine how many of those feature values also exist in the second lookup library, where the first calculation strategy was not used to calculate feature values for those data blocks during historical compression; and a third calculation unit, configured to calculate a first hit rate from the total number of those feature values and the number of matching feature values. When the first hit rate is higher than a preset first hit-rate threshold and there is a data block to be compressed, the first calculation unit is triggered to calculate the first feature value of the data block to be compressed using the first calculation strategy.
With reference to the second aspect or any one of its first to third possible implementations, in a fifth possible implementation of the second aspect, the terminal further includes: a second statistics unit, configured to, before the second calculation unit calculates the second feature value of the data block to be compressed using the second calculation strategy, collect the feature values calculated with the first calculation strategy for multiple data blocks during historical compression and determine how many of those feature values also exist in the first lookup library, where the second calculation strategy was not used to calculate feature values for those data blocks during historical compression; and a fourth calculation unit, configured to calculate a second hit rate from the total number of those feature values and the number of matching feature values. When the second hit rate is lower than a preset second hit-rate threshold, the second calculation unit is triggered to calculate, when the first reference value does not exist, the second feature value of the data block to be compressed using the second calculation strategy.
In a third aspect, an embodiment of the present invention provides a terminal that includes a processor and a memory. The memory is configured to store data and programs, and the processor invokes the program in the memory to perform the following operations: calculating a first feature value of a data block to be compressed using a first calculation strategy; determining whether a first reference value exists in a first lookup library, where the first reference value is a feature value identical to the first feature value, the first lookup library contains N feature values, each calculated with the first calculation strategy from the data block corresponding to it, and the N feature values in the first lookup library correspond one-to-one to N data blocks, where N ≥ 1; if the first reference value exists, compressing the data block to be compressed with a similarity-based compression technique, using the data block corresponding to the first reference value as the reference data block; if the first reference value does not exist, calculating a second feature value of the data block to be compressed using a second calculation strategy, where, when the similarity between two data blocks is higher than a first similarity threshold, the feature values calculated for them with the first calculation strategy are identical, when their similarity is higher than a second similarity threshold, the feature values calculated for them with the second calculation strategy are identical, and the first similarity threshold is higher than the second similarity threshold; determining whether a second reference value exists in a second lookup library, where the second reference value is a feature value identical to the second feature value, the second lookup library contains N feature values, each calculated with the second calculation strategy from the data block corresponding to it, and the N feature values in the second lookup library correspond one-to-one to the N data blocks; and if the second reference value exists, compressing the data block to be compressed with the similarity-based compression technique, using the data block corresponding to the second reference value as the reference data block.
By performing the above operations, the terminal compresses data block by block. For each data block to be compressed, it first determines whether there is a reference data block highly similar to it; if so, it compresses the block with reference to that high-similarity block. If not, it determines whether there is a reference data block of lower similarity; if so, it compresses the block with reference to that lower-similarity block. In other words, the embodiment selects reference data blocks using multi-level similarity standards, from high to low, which improves the overall compression ratio and saves storage space.
With reference to the third aspect, in a first possible implementation of the third aspect, before calculating the first feature value of the data block to be compressed using the first calculation strategy, the processor is further configured to divide the data block to be compressed into M data units, each of which has its own initial reference value, where M ≥ 1. The processor calculating the first feature value using the first calculation strategy specifically means substituting the initial reference values of at least two of the M data units into P preset filter functions to calculate the first feature value of the data block to be compressed, where P ≥ 2; the processor calculating the second feature value using the second calculation strategy specifically means substituting the initial reference values of at least two of the M data units into Q preset filter functions to calculate the second feature value of the data block to be compressed, where the P filter functions include the Q filter functions.
With reference to the third aspect or its first possible implementation, in a second possible implementation of the third aspect, after the data block to be compressed has been compressed, the processor is further configured to add the first feature value to the first lookup library and the second feature value to the second lookup library. In the first lookup library the data block corresponding to the first feature value, and in the second lookup library the data block corresponding to the second feature value, is the data block that was just compressed.
With reference to the second possible implementation of the third aspect, in a third possible implementation of the third aspect, the processor is further configured to delete from the first lookup library any feature value that has been in it longer than a preset time threshold, and to delete from the second lookup library any feature value that has been in it longer than the same preset time threshold.
With reference to the third aspect or any of its first to third possible implementations, in a fourth possible implementation of the third aspect, before computing the first feature value of the data block to be compressed using the first calculation strategy, the processor is further configured to: collect the feature values that the second calculation strategy computed for multiple data blocks during past compression (the first calculation strategy was not used to compute feature values for those blocks), and determine how many of those feature values also appear in the second lookup library; compute a first hit rate from the number of collected feature values and the number of matching feature values; and, when the first hit rate is higher than a preset first hit threshold and a data block is waiting to be compressed, perform the operation of computing its first feature value using the first calculation strategy.
With reference to the third aspect or any of its first to third possible implementations, in a fifth possible implementation of the third aspect, before computing the second feature value of the data block to be compressed using the second calculation strategy, the processor is further configured to: collect the feature values that the first calculation strategy computed for multiple data blocks during past compression (the second calculation strategy was not used to compute feature values for those blocks), and determine how many of those feature values also appear in the first lookup library; compute a second hit rate from the number of collected feature values and the number of matching feature values; and, when the second hit rate is lower than a preset second hit threshold, perform the operation of computing the second feature value of the data block to be compressed using the second calculation strategy if the first reference value does not exist.
By implementing the embodiments of the present invention, the terminal compresses data in units of data blocks. When compressing, it first determines whether there is a reference data block that is highly similar to the data block to be compressed; if there is, it compresses the data block with reference to that highly similar block. If there is not, it determines whether there is a reference data block with lower similarity to the data block to be compressed, and if there is, it compresses the data block with reference to that less similar block. In other words, the embodiments of the present invention select reference data blocks using multiple levels of similarity criteria, from high to low, which improves the overall compression ratio and saves storage space.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed to describe them are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art may derive other drawings from them without creative effort.
Figure 1 is a schematic flowchart of a data compression method according to an embodiment of the present invention;
Figure 2 is a schematic flowchart of another data compression method according to an embodiment of the present invention;
Figure 3 is a schematic diagram of a scenario for generating feature values according to an embodiment of the present invention;
Figure 4 is a schematic diagram of another scenario for generating feature values according to an embodiment of the present invention;
Figure 5 is a schematic diagram of a scenario of selecting reference data blocks by priority according to an embodiment of the present invention;
Figure 6 is a schematic flowchart of another data compression method according to an embodiment of the present invention;
Figure 7 is a schematic flowchart of another data compression method according to an embodiment of the present invention;
Figure 8 is a schematic structural diagram of a terminal according to an embodiment of the present invention;
Figure 9 is a schematic structural diagram of another terminal according to an embodiment of the present invention.
Detailed Description of Embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The general idea of the present invention is to divide the "similarity" criterion used in similarity compression (i.e., Delta compression) into at least two levels (also called "priorities"). When data needs to be compressed, a higher-level similarity criterion is first used to search for data similar to the data to be compressed; if none is found, a slightly lower-level criterion is used, and so on, until similar data is found. The data to be compressed is then compressed by Delta compression with reference to that similar data. In the embodiments of the present invention, the compression ratio equals the size of the data before compression divided by its size after compression, so a larger compression ratio means smaller compressed data and a better compression result.
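The text does not specify the Delta encoder. As a rough stand-in, the sketch below uses the preset-dictionary feature of Python's zlib module (the `zdict` argument), which lets the DEFLATE encoder copy matches out of a reference block, and computes the compression ratio exactly as defined above. The substitution of zlib for a Delta encoder and the helper names are assumptions for illustration, not the patent's method.

```python
import random
import zlib
from typing import Optional


def compression_ratio(original: bytes, compressed: bytes) -> float:
    """Size before compression divided by size after, as defined in the text."""
    return len(original) / len(compressed)


def delta_compress(block: bytes, reference: Optional[bytes] = None) -> bytes:
    """Compress `block`, optionally letting the encoder reuse `reference`.

    zlib's preset dictionary stands in for a real Delta encoder here.
    """
    if reference is None:
        comp = zlib.compressobj(9)
    else:
        comp = zlib.compressobj(9, zdict=reference)
    return comp.compress(block) + comp.flush()


def delta_decompress(data: bytes, reference: Optional[bytes] = None) -> bytes:
    """Inverse of delta_compress; the same reference must be supplied."""
    if reference is None:
        decomp = zlib.decompressobj()
    else:
        decomp = zlib.decompressobj(zdict=reference)
    return decomp.decompress(data) + decomp.flush()


# Demo: a 4 KB block of random bytes and a near-copy with 4 bytes changed.
rng = random.Random(0)
reference_block = bytes(rng.getrandbits(8) for _ in range(4096))
edited = bytearray(reference_block)
edited[100:104] = b"EDIT"
target_block = bytes(edited)

plain = delta_compress(target_block)
with_ref = delta_compress(target_block, reference_block)
ratio_plain = compression_ratio(target_block, plain)        # about 1: random data
ratio_with_ref = compression_ratio(target_block, with_ref)  # much larger
```

Random data barely compresses on its own, but once a similar reference is available almost the whole block can be encoded as back-references, which is the effect the multi-level reference search is trying to exploit.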
Refer to Figure 1, which is a schematic flowchart of a data compression method according to an embodiment of the present invention. The method includes, but is not limited to, the following steps.
Step S101: The terminal computes a first feature value of the data block to be compressed using a first calculation strategy.
Specifically, the terminal may be a mobile phone, a tablet computer, a notebook computer, a handheld computer, a mobile internet device (MID), a wearable device (for example a smart watch such as the iWatch, a smart band, or a pedometer), or another terminal device that performs data compression. Data may be compressed in units of data blocks; common block sizes are 4 KB, 8 KB, and so on. The data block to be compressed is the block that is about to be compressed.
In the embodiments of the present invention, multiple calculation strategies may be preset. Each calculation strategy computes, from a data block, a feature value that reflects the characteristics of that block: the more similar two data blocks are, the closer the feature values that the same strategy computes for them.
The calculation strategies are ordered by priority. For any two strategies with adjacent priorities, the higher-priority one may be called the first calculation strategy and the lower-priority one the second calculation strategy, and they satisfy the following relationship: when the similarity of two data blocks is higher than a first similarity threshold, the feature values computed for them by the first calculation strategy are the same; when the similarity of two data blocks is higher than a second similarity threshold, the feature values computed for them by the second calculation strategy are the same; and the first similarity threshold is higher than the second similarity threshold. In other words, a higher-priority calculation strategy applies a stricter standard for judging two data blocks similar than a lower-priority calculation strategy does.
In the embodiments of the present invention, the terminal computes a feature value of the data block to be compressed based on the first calculation strategy; the computed value may be called the first feature value. The specific way of computing it with the first calculation strategy is not limited here; an optional implementation is illustrated below with reference to Figure 2.
Step 1: The terminal divides the data block to be compressed into M data units, each with its own initial reference value, where $M \geq 1$. For example, the block may be divided into M parts of 4 bytes each, each part being called a data unit. A preset hash function computes a hash value for each of the M data units, and that hash value is the unit's initial reference value. The initial reference value of the i-th data unit can be written $h(x_i)$, where $1 \leq i \leq M$.
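A minimal sketch of Step 1, assuming 4-byte data units as in the example. The text only says "a preset hash function", so the 32-bit BLAKE2b digest used here, and the function name `initial_reference_values`, are hypothetical choices for illustration.

```python
import hashlib
from typing import List

UNIT_SIZE = 4  # bytes per data unit, following the example in the text


def initial_reference_values(block: bytes, unit_size: int = UNIT_SIZE) -> List[int]:
    """Split a block into M data units and return h(x_1), ..., h(x_M).

    Assumption: a 32-bit BLAKE2b digest serves as the preset hash function.
    A trailing partial unit (if the block length is not a multiple of
    unit_size) is hashed as its own, shorter unit.
    """
    values = []
    for start in range(0, len(block), unit_size):
        unit = block[start:start + unit_size]
        digest = hashlib.blake2b(unit, digest_size=4).digest()
        values.append(int.from_bytes(digest, "big"))
    return values
```

For a 4 KB block this yields M = 1024 initial reference values; identical units always map to identical values, which is what lets similar blocks produce matching features later.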
Step 2: The terminal substitutes the initial reference values of at least two of the M data units into P preset filter functions to compute the first feature value of the data block to be compressed, where $P \geq 2$. A specific implementation may be as follows:
1. Configure P groups of parameters. The j-th group can be written $\{S_j, p_j, q_j\}$, where $1 \leq j \leq P$, $S_j$ is the sampling ratio, and $(p_j, q_j)$ is a pair of linear parameters;
2. Sample $\{h(x_1), h(x_2), \ldots, h(x_i), \ldots, h(x_M)\}$ according to the sampling ratio $S_j$ to obtain the sampled sequence $\{h(x_{1j}), h(x_{2j}), \ldots, h(x_{kj})\}$ of k values, where $1 \leq k \leq M$;
3. Apply the filter function built from the parameter pair $(p_j, q_j)$, $f(ij) = h(x_{ij}) \times p_j + q_j$ for $1 \leq i \leq k$, to each element of the sampled sequence $\{h(x_{1j}), h(x_{2j}), \ldots, h(x_{kj})\}$ to obtain the computed sequence $\{f(1j), f(2j), \ldots, f(kj)\}$;
4. Take the maximum of the computed sequence $\{f(1j), f(2j), \ldots, f(kj)\}$, written $f(j)_{\max}$;
5. The first feature value is $H_P = \{f(1)_{\max}, f(2)_{\max}, \ldots, f(P)_{\max}\}$.
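The five sub-steps above can be sketched as follows. Several details are not given in the text and are assumptions here: "sampling ratio $S_j$" is read as keeping every $S_j$-th value, each filter output is reduced modulo $2^{32}$ (a common choice in resemblance sketches; without it, for $p_j > 0$ the maximum would simply be the largest sampled hash, rescaled), and the parameter values in `PARAMS` are hypothetical.

```python
from typing import List, Sequence, Tuple

MASK32 = (1 << 32) - 1  # assumption: filter outputs reduced mod 2**32

# Hypothetical parameter groups {S_j, p_j, q_j}; the text gives no values.
PARAMS: List[Tuple[int, int, int]] = [
    (1, 0x9E3779B1, 7),
    (2, 0x85EBCA77, 13),
    (3, 0xC2B2AE3D, 29),
]


def sample(values: Sequence[int], s: int) -> List[int]:
    """Sub-step 2 (assumed reading): keep every s-th value, starting at the first."""
    return list(values[::s])


def feature_value(values: Sequence[int],
                  params: Sequence[Tuple[int, int, int]] = PARAMS) -> Tuple[int, ...]:
    """Return H_P = (f(1)_max, ..., f(P)_max) for a non-empty list of h(x_i)."""
    feature = []
    for s, p, q in params:                              # sub-step 1: group j
        sampled = sample(values, s)                     # sub-step 2
        computed = [(h * p + q) & MASK32 for h in sampled]  # sub-step 3
        feature.append(max(computed))                   # sub-step 4: f(j)_max
    return tuple(feature)                               # sub-step 5: H_P
```

With $p_j = 1$ and $q_j = 0$ the filter is the identity, so `feature_value([3, 9, 4], [(1, 1, 0), (2, 1, 0)])` keeps `[3, 9, 4]` and `[3, 4]` and returns `(9, 4)`.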
To make the meaning of these formulas concrete, here is a numerical example. Suppose $P = 3$, $M = 5$ and $k = 3$; that is, three parameter groups $\{S_1, p_1, q_1\}$, $\{S_2, p_2, q_2\}$ and $\{S_3, p_3, q_3\}$ are configured, and the sampled sequences obtained with $S_1$, $S_2$ and $S_3$ are all $\{h(x_1), h(x_2), h(x_3)\}$.
Assume the calculation results based on these parameters are as follows:
$f(11) = h(x_1) \times p_1 + q_1 = 6$; $f(21) = h(x_2) \times p_1 + q_1 = 9$; $f(31) = h(x_3) \times p_1 + q_1 = 1$;
$f(12) = h(x_1) \times p_2 + q_2 = 1$; $f(22) = h(x_2) \times p_2 + q_2 = 4$; $f(32) = h(x_3) \times p_2 + q_2 = 5$;
$f(13) = h(x_1) \times p_3 + q_3 = 3$; $f(23) = h(x_2) \times p_3 + q_3 = 2$; $f(33) = h(x_3) \times p_3 + q_3 = 7$;
Then the maximum of $\{f(11), f(21), f(31)\}$ is $f(1)_{\max} = 9$, the maximum of $\{f(12), f(22), f(32)\}$ is $f(2)_{\max} = 5$, and the maximum of $\{f(13), f(23), f(33)\}$ is $f(3)_{\max} = 7$, so the first feature value is $H_P = \{9, 5, 7\}$. In one optional scheme, the input of each of the P filter functions is $\{h(x_1), h(x_2), \ldots, h(x_i), \ldots, h(x_M)\}$; Figure 3 is the corresponding scenario diagram. In another optional scheme, one of the P filter functions takes $\{h(x_1), h(x_2), \ldots, h(x_i), \ldots, h(x_M)\}$ as input, its output serves as the input of another filter function, whose output in turn serves as the input of a further filter function, and so on; Figure 4 is the corresponding scenario diagram.
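The sketch below reproduces the worked example's maximum selection (the parallel scheme of Figure 3) and illustrates the chained scheme of Figure 4. In the chained version, taking the maximum of every stage as one component of the feature value is an assumption; the text only describes how the filter functions are chained. `cascaded_feature` is a hypothetical name.

```python
from typing import Callable, List, Sequence, Tuple

# Filter outputs from the worked example; row j holds f(1j), f(2j), f(3j).
EXAMPLE_OUTPUTS: List[List[int]] = [
    [6, 9, 1],  # parameter group 1
    [1, 4, 5],  # parameter group 2
    [3, 2, 7],  # parameter group 3
]

# Parallel scheme (Figure 3): one maximum per filter function.
H_P: Tuple[int, ...] = tuple(max(row) for row in EXAMPLE_OUTPUTS)  # (9, 5, 7)


def cascaded_feature(values: Sequence[int],
                     filters: Sequence[Callable[[int], int]]) -> Tuple[int, ...]:
    """Chained scheme (Figure 4): each filter consumes the previous filter's output.

    Assumption: the maximum of each stage's output is that stage's component.
    """
    feature = []
    current = list(values)
    for f in filters:
        current = [f(v) for v in current]
        feature.append(max(current))
    return tuple(feature)
```

For instance, chaining $v \mapsto 2v$ and then $v \mapsto v + 1$ over `[1, 2, 3]` gives stage outputs `[2, 4, 6]` and `[3, 5, 7]`, hence the feature `(6, 7)`.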
Note that the feature values of other data blocks are computed with the first calculation strategy in the same way as the first feature value of the data block to be compressed.
Step S102: The terminal determines whether a first reference value exists in a first lookup library. The first reference value is a feature value identical to the first feature value. The first lookup library contains N feature values, each computed with the first calculation strategy from its corresponding data block; the N feature values in the first lookup library correspond one-to-one to N data blocks, where $N \geq 1$.
Specifically, each calculation strategy has its own lookup library. For ease of distinction, the lookup library of the first calculation strategy is called the first lookup library, that of the second calculation strategy the second lookup library, and so on. Taking the first lookup library as an example: it records the correspondence between feature values and data blocks, and each of its feature values is computed with the first calculation strategy from the corresponding data block. If the library contains N feature values, they correspond one-to-one to N data blocks, which may be blocks compressed before the data block currently to be compressed. The second lookup library likewise contains N feature values corresponding one-to-one to the same N data blocks, except that these feature values are computed from the N data blocks with the second calculation strategy.
The terminal needs to determine whether the first lookup library contains a feature value identical to the first feature value; if it does, that identical feature value is called the first reference value for convenience in the following description.
Step S103: If the first reference value exists, the terminal compresses the data block to be compressed by similarity compression, using the data block corresponding to the first reference value as the reference data block.
Specifically, if the first reference value exists in the first lookup library, a data block similar to the data block to be compressed exists, because two data blocks can have the same feature value under the first calculation strategy only if they are similar.
Step S104: If the first reference value does not exist, the terminal computes a second feature value of the data block to be compressed using a second calculation strategy.
Specifically, computing the feature value of the data block to be compressed with the second calculation strategy follows a principle similar to computing it with the first calculation strategy. In an optional scheme, where the terminal computes the first feature value by substituting the initial reference values of at least two of the M data units into P preset filter functions, it computes the second feature value by substituting the initial reference values of at least two of the M data units into Q preset filter functions, where the P filter functions include the Q filter functions. Continuing the example above with $P = 3$, $M = 5$ and $k = 3$, and assuming $Q = 2$, the second feature value might be $H_Q = \{9, 5\}$.
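Because the Q filter functions are a subset of the P filter functions, the second feature value has fewer components that must match, which is why the second calculation strategy is the looser one. The sketch below assumes, as the $H_Q = \{9, 5\}$ example suggests, that the Q functions are the first Q of the P; the text only states that the P functions include the Q. `coarse_feature` and the second block's feature value are hypothetical.

```python
from typing import Sequence, Tuple


def coarse_feature(fine: Sequence[int], q: int) -> Tuple[int, ...]:
    """Derive H_Q from H_P, assuming the Q filter functions are the first Q
    of the P filter functions."""
    if not 1 <= q <= len(fine):
        raise ValueError("need 1 <= Q <= P")
    return tuple(fine[:q])


H_P = (9, 5, 7)               # first feature value from the worked example
H_Q = coarse_feature(H_P, 2)  # (9, 5)

# A hypothetical block whose feature differs only in the third component:
other_H_P = (9, 5, 3)
first_level_hit = other_H_P == H_P                      # False: misses the strict level
second_level_hit = coarse_feature(other_H_P, 2) == H_Q  # True: hits the looser level
```

Any two blocks with equal $H_P$ necessarily have equal $H_Q$, but not the reverse, which matches the requirement that the first similarity threshold be higher than the second.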
Step S105: The terminal determines whether a second reference value exists in a second lookup library. The second reference value is a feature value identical to the second feature value. The second lookup library contains N feature values, each computed with the second calculation strategy from its corresponding data block; the N feature values in the second lookup library correspond one-to-one to the N data blocks.
Specifically, the terminal needs to determine whether the second lookup library contains a feature value identical to the second feature value; if it does, that identical feature value is called the second reference value for convenience in the following description.
Step S106: If the second reference value exists, the terminal compresses the data block to be compressed by similarity compression, using the data block corresponding to the second reference value as the reference data block.
Specifically, if the second reference value exists in the second lookup library, a data block similar to the data block to be compressed exists, because two data blocks can have the same feature value under the second calculation strategy only if they are similar.
As the method above shows, if the first reference value exists, a data block with higher similarity exists, and that block is used as the reference for compression; if the first reference value does not exist but the second reference value does, a data block with lower similarity exists, and that block is used as the reference instead. Because the most similar available reference is always tried first, and a less similar reference is still used when no highly similar one exists, the overall compression ratio improves. Figure 5 schematically shows data compression according to an embodiment of the present invention, with multiple calculation strategies whose priorities decrease from top to bottom.
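Steps S101 to S106, generalized to any number of priority levels, can be sketched as a loop over (strategy, lookup library) pairs ordered from highest to lowest priority. A lower-level feature is computed only after every higher level has missed. Returning `None` when no level hits is an assumption (the text does not say what happens then, e.g. ordinary compression), and all names are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Optional, Tuple

Strategy = Callable[[bytes], Hashable]  # computes a feature value from a block


def find_reference(
    block: bytes,
    strategies: List[Strategy],             # highest priority first
    libraries: List[Dict[Hashable, int]],   # libraries[i]: feature -> block id
) -> Optional[Tuple[int, int]]:
    """Return (level, reference_block_id) for the first level that hits."""
    for level, (strategy, library) in enumerate(zip(strategies, libraries)):
        feature = strategy(block)            # S101 / S104: compute feature value
        reference_id = library.get(feature)  # S102 / S105: look it up
        if reference_id is not None:         # block id 0 is a valid hit
            return level, reference_id       # S103 / S106: use as reference
    return None                              # no similar block at any level


# Toy demo: a strict strategy (whole content) and a loose one (first 2 bytes).
toy_strategies: List[Strategy] = [lambda b: b, lambda b: b[:2]]
toy_libraries: List[Dict[Hashable, int]] = [{b"abcd": 0}, {b"ab": 1}]
```

With these toy strategies, `b"abcd"` hits at level 0 (block 0), `b"abzz"` falls through to level 1 (block 1), and `b"zzzz"` finds no reference.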
In an optional scheme, after the data block to be compressed has been compressed, the method further includes: the terminal adds the first feature value to the first lookup library and the second feature value to the second lookup library, where the data block corresponding to the first feature value in the first lookup library, and to the second feature value in the second lookup library, is the block that was just compressed. In other words, a compressed data block can serve as a reference data block for subsequent compression.
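A minimal sketch of this registration step, assuming every level's feature value has been computed for the block (if a higher level hit early, the lower-level feature values would have to be computed before registering). Overwriting an existing entry with the newer block is also an assumption; `register_block` is a hypothetical name.

```python
from typing import Dict, Hashable, List


def register_block(block_id: int,
                   features: List[Hashable],
                   libraries: List[Dict[Hashable, int]]) -> None:
    """Add the block's feature value for each level to that level's library,
    so the block can serve as a reference for later blocks."""
    if len(features) != len(libraries):
        raise ValueError("need one feature value per lookup library")
    for feature, library in zip(features, libraries):
        library[feature] = block_id  # assumption: newer block replaces older


demo_libraries: List[Dict[Hashable, int]] = [{}, {}]
register_block(7, [b"first-feature", b"second-feature"], demo_libraries)
```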
In another optional scheme, the method further includes: the terminal deletes from the first lookup library any feature value that has been in it longer than a preset time threshold, and deletes from the second lookup library any feature value that has been in it longer than that threshold. The preset time threshold can be set in advance according to actual needs. Feature values that have been stored for a long time are removed because their data blocks are unlikely to be useful as references, and removing them frees storage space. Optionally, before deleting a feature value, the terminal may also check whether its data block has ever been used as a reference data block, and delete the feature value only if it has been stored for a long time and its data block has never been used as a reference. Figure 6 shows a detailed flowchart.
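The aging rule, including the optional "never used as a reference" check, can be sketched with a per-entry timestamp and usage flag. The class layout, the use of `time.monotonic()` as the clock, and all names are assumptions for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Hashable, Optional


@dataclass
class Entry:
    block_id: int
    added_at: float                  # when the feature value entered the library
    used_as_reference: bool = False  # set once a lookup returns this block


@dataclass
class LookupLibrary:
    """Hypothetical lookup library: feature value -> entry with age info."""
    entries: Dict[Hashable, Entry] = field(default_factory=dict)

    def add(self, feature: Hashable, block_id: int,
            now: Optional[float] = None) -> None:
        added = time.monotonic() if now is None else now
        self.entries[feature] = Entry(block_id, added)

    def lookup(self, feature: Hashable) -> Optional[int]:
        entry = self.entries.get(feature)
        if entry is None:
            return None
        entry.used_as_reference = True
        return entry.block_id

    def purge(self, max_age: float, now: Optional[float] = None,
              only_unused: bool = False) -> int:
        """Delete entries older than max_age and return how many were removed.

        With only_unused=True, entries whose block has served as a
        reference are kept (the optional check described in the text).
        """
        now = time.monotonic() if now is None else now
        stale = [
            feature
            for feature, entry in self.entries.items()
            if now - entry.added_at > max_age
            and not (only_unused and entry.used_as_reference)
        ]
        for feature in stale:
            del self.entries[feature]
        return len(stale)
```

Passing explicit `now` values keeps the example deterministic: an entry added at time 0 and used once survives a purge at time 100 with `only_unused=True`, while an unused entry of the same age is removed.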
In another optional scheme, before the terminal computes the first feature value of the data block to be compressed using the first calculation strategy, the method further includes: the terminal collects the feature values that the second calculation strategy computed for multiple data blocks during past compression (the first calculation strategy was not used to compute feature values for those blocks), and determines how many of those feature values also appear in the second lookup library; the terminal computes a first hit rate from the number of collected feature values and the number of matching feature values; and when the first hit rate is higher than a preset first hit threshold and a data block is waiting to be compressed, the terminal performs the step of computing the first feature value of that data block using the first calculation strategy.
Specifically, the first calculation strategy in this optional scheme is the highest-priority strategy among the multiple calculation strategies, and the scheme describes how it is added to them. Before the first calculation strategy is added, the second calculation strategy has the highest priority, so the terminal computes feature values of data blocks with the second calculation strategy and not with the first. Suppose that during past compression the second calculation strategy computed X feature values for multiple data blocks, and Y of them also appear in the second lookup library. The first hit rate may then be Y / X. A first hit rate above the preset first hit threshold indicates that many data blocks in the second lookup library can serve as reference data blocks, which suggests that the second calculation strategy's standard for judging two data blocks similar may be too low. A higher-priority calculation strategy, namely the first calculation strategy, therefore needs to be added.
In another optional scheme, before the terminal computes the second feature value of the data block to be compressed using the second calculation strategy, the method further includes: the terminal collects the feature values that the first calculation strategy computed for multiple data blocks during past compression (the second calculation strategy was not used to compute feature values for those blocks), and determines how many of those feature values also appear in the first lookup library; the terminal computes a second hit rate from the number of collected feature values and the number of matching feature values; and when the second hit rate is lower than a preset second hit threshold, the terminal performs the step of computing the second feature value of the data block to be compressed using the second calculation strategy if the first reference value does not exist.
Specifically, the second calculation strategy in this optional scheme is the lowest-priority strategy among the multiple calculation strategies, and the scheme describes how it is added to them. Before the second calculation strategy is added, the first calculation strategy has the lowest priority, so the terminal computes feature values of data blocks with the first calculation strategy and not with the second. Suppose that during past compression the first calculation strategy computed S feature values for multiple data blocks, and T of them also appear in the first lookup library. The second hit rate may then be T / S. A second hit rate below the preset second hit threshold indicates that few data blocks in the first lookup library can serve as reference data blocks, which suggests that the first calculation strategy's standard for judging two data blocks similar may be too high. A lower-priority calculation strategy, namely the second calculation strategy, therefore needs to be added.
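Both hit rates have the same form: the fraction of historically computed feature values that already appear in the corresponding lookup library (Y / X for the first hit rate, T / S for the second). The sketch counts every collected feature value, duplicates included, and returns 0 when nothing has been collected; both are assumptions, as are the function names and threshold values used below.

```python
from typing import Collection, Hashable, Iterable


def hit_rate(history_features: Iterable[Hashable],
             library: Collection[Hashable]) -> float:
    """Fraction of collected feature values also present in the library."""
    features = list(history_features)
    if not features:
        return 0.0  # assumption: no history means no evidence of hits
    hits = sum(1 for f in features if f in library)
    return hits / len(features)


def should_add_higher(first_rate: float, first_threshold: float) -> bool:
    # Many hits at the current top level: its similarity bar may be too low.
    return first_rate > first_threshold


def should_add_lower(second_rate: float, second_threshold: float) -> bool:
    # Few hits at the current bottom level: its similarity bar may be too high.
    return second_rate < second_threshold
```

For example, if four collected feature values are checked against a library that holds two of them, the hit rate is 0.5, which would trigger adding a higher-priority strategy under a first hit threshold of 0.4 but not adding a lower-priority one under a second hit threshold of 0.4.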
Other solutions can be derived from the above two optional solutions for adding the first and second calculation strategies. For example, following the same principle used to calculate the second hit rate for the first calculation strategy and the first hit rate for the second calculation strategy, the terminal calculates a hit rate for each of the multiple calculation strategies. If the hit rate of the highest-priority calculation strategy is higher than a preset upper threshold T_H, that strategy's standard for judging two data blocks to be similar is relatively low, so a higher-level calculation strategy is added to the multiple calculation strategies for subsequent use (the newly added strategy has a higher level than any existing strategy). If the hit rate of the highest-priority strategy is not higher than T_H, the terminal further determines whether the hit rates of all calculation strategies are lower than a preset lower threshold T_L. If they all are, these strategies' standards for judging two data blocks to be similar are relatively high, so a lower-priority calculation strategy is added to the multiple calculation strategies (the newly added strategy has a lower level than any existing strategy). If not all hit rates are lower than T_L, the multiple calculation strategies are kept unchanged. FIG. 7 is the corresponding flowchart.
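The threshold logic of FIG. 7 can be sketched as below. The representation is an assumption, not taken from the patent: each strategy is identified by a hypothetical integer level, where a higher level means a stricter similarity standard (and higher priority), and a newly added strategy receives the level one above the current maximum or one below the current minimum. `t_high` and `t_low` stand for T_H and T_L.

```python
from typing import Dict, List


def adjust_strategies(
    levels: List[int], rates: Dict[int, float], t_high: float, t_low: float
) -> List[int]:
    """Return the strategy levels after one FIG. 7-style adjustment.

    levels: non-empty list of levels in use; a higher level is a stricter
            similarity standard and has higher priority (hypothetical encoding).
    rates:  measured hit rate of each level in use.
    """
    ordered = sorted(levels, reverse=True)  # highest priority first
    top = ordered[0]
    if rates[top] > t_high:
        # Strictest strategy still hits too often: add a stricter level on top.
        return [top + 1] + ordered
    if all(rates[lv] < t_low for lv in ordered):
        # Every strategy hits too rarely: add a looser level at the bottom.
        return ordered + [ordered[-1] - 1]
    return ordered  # keep the strategies unchanged


print(adjust_strategies([2, 1], {2: 0.90, 1: 0.95}, t_high=0.8, t_low=0.1))  # [3, 2, 1]
print(adjust_strategies([2, 1], {2: 0.05, 1: 0.02}, t_high=0.8, t_low=0.1))  # [2, 1, 0]
print(adjust_strategies([2, 1], {2: 0.50, 1: 0.05}, t_high=0.8, t_low=0.1))  # [2, 1]
```

The function returns a new list instead of mutating its input, so a caller can compare the old and new strategy sets before committing the change.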
In the method described in FIG. 1, the terminal compresses data in units of data blocks. When compressing a data block, it first determines whether a reference data block with high similarity to the data block to be compressed exists and, if so, compresses the data block with reference to that high-similarity reference data block. Otherwise, it determines whether a reference data block with lower similarity exists and, if so, compresses the data block with reference to that lower-similarity reference data block. In other words, the embodiments of the present invention select reference data blocks using multi-level similarity standards, from high to low, which improves the overall compression ratio and saves storage space.
The methods of the embodiments of the present invention are described in detail above. To facilitate implementation of the above solutions, apparatuses of the embodiments of the present invention are provided below.
Referring to FIG. 8, FIG. 8 is a schematic structural diagram of a terminal 80 according to an embodiment of the present invention. The terminal 80 may include a first calculation unit 801, a first judging unit 802, a first compression unit 803, a second calculation unit 804, a second judging unit 805, and a second compression unit 806. Each unit is described in detail below.
The first calculation unit 801 is configured to calculate a first feature value of a data block to be compressed by using a first calculation strategy;
The first judging unit 802 is configured to determine whether a first reference value exists in a first lookup library, where the first reference value is a feature value identical to the first feature value; the first lookup library contains N feature values, each calculated with the first calculation strategy from its corresponding data block; the N feature values in the first lookup library correspond one-to-one to N data blocks; and N is greater than or equal to 1;
The first compression unit 803 is configured to, when the first judging unit 802 determines that the first reference value exists, compress the data block to be compressed by using a similarity compression technique, with the data block corresponding to the first reference value as the reference data block;
The second calculation unit 804 is configured to, when the first judging unit 802 determines that the first reference value does not exist, calculate a second feature value of the data block to be compressed by using a second calculation strategy. When the similarity of two data blocks is higher than a first similarity threshold, the feature values calculated for them with the first calculation strategy are identical; when the similarity of two data blocks is higher than a second similarity threshold, the feature values calculated for them with the second calculation strategy are identical; and the first similarity threshold is higher than the second similarity threshold;
The second judging unit 805 is configured to determine whether a second reference value exists in a second lookup library, where the second reference value is a feature value identical to the second feature value; the second lookup library contains N feature values, each calculated with the second calculation strategy from its corresponding data block; and the N feature values in the second lookup library correspond one-to-one to the N data blocks;
The second compression unit 806 is configured to, when the second judging unit 805 determines that the second reference value exists, compress the data block to be compressed by using a similarity compression technique, with the data block corresponding to the second reference value as the reference data block.
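The cascade performed by units 801 to 806 can be sketched as follows, generalized to any number of strategies ordered from strictest to loosest. This section of the patent does not specify the similarity compression technique; as a stand-in, the sketch uses deflate with the reference block supplied as a preset dictionary (the `zdict` argument of Python's `zlib`), which is one known form of reference-based compression. The toy feature functions, the dict-based lookup libraries, and all function names are hypothetical.

```python
import zlib
from typing import Callable, Dict, Hashable, List, Optional, Tuple

FeatureFn = Callable[[bytes], Hashable]
# One entry per strategy, strictest first: (feature function, lookup library).
# Each lookup library maps a feature value to its reference data block.
Strategy = Tuple[FeatureFn, Dict[Hashable, bytes]]


def compress_block(block: bytes, strategies: List[Strategy]) -> Tuple[Optional[int], bytes]:
    """Return (index of the strategy that found a reference, or None; payload)."""
    for index, (feature_fn, library) in enumerate(strategies):
        reference = library.get(feature_fn(block))
        if reference is not None:
            # Stand-in for the similarity compression technique: deflate the
            # block with the reference block preloaded as a preset dictionary.
            comp = zlib.compressobj(zdict=reference)
            return index, comp.compress(block) + comp.flush()
    # No reference data block at any level: compress the block on its own.
    return None, zlib.compress(block)


def decompress_with_reference(payload: bytes, reference: bytes) -> bytes:
    """Invert the reference-based branch of compress_block."""
    decomp = zlib.decompressobj(zdict=reference)
    return decomp.decompress(payload) + decomp.flush()


def strict_feature(block: bytes) -> Hashable:
    return block  # toy feature: only identical blocks collide


def loose_feature(block: bytes) -> Hashable:
    return block[:12]  # toy feature: blocks sharing a 12-byte prefix collide


ref = b"hello world " * 20
strategies = [
    (strict_feature, {strict_feature(ref): ref}),
    (loose_feature, {loose_feature(ref): ref}),
]
block = b"hello world " * 19 + b"hello there "
index, payload = compress_block(block, strategies)
print(index)                                             # 1 (loose strategy matched)
print(decompress_with_reference(payload, ref) == block)  # True
```

A real deployment would also record which reference block was used, so the decompressor can fetch it; that bookkeeping is omitted here.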
By running the above units, the terminal 80 compresses data in units of data blocks. When compressing a data block, it first determines whether a reference data block with high similarity to the data block exists and, if so, compresses the data block with reference to it; otherwise, it determines whether a reference data block with lower similarity exists and, if so, compresses the data block with reference to that block. In other words, the embodiments of the present invention select reference data blocks using multi-level similarity standards, from high to low, which improves the overall compression ratio and saves storage space.
In an optional solution, the terminal further includes:
a division unit, configured to divide the data block to be compressed into M data units before the first calculation unit 801 calculates the first feature value of the data block to be compressed by using the first calculation strategy, where each of the M data units has its own initial reference value and M is greater than or equal to 1;
the first calculation unit 801 is specifically configured to substitute the initial reference values of at least two of the M data units into P preset filter functions to calculate the first feature value of the data block to be compressed, where P is greater than or equal to 2;
the second calculation unit 804 is specifically configured to substitute the initial reference values of at least two of the M data units into Q preset filter functions to calculate the second feature value of the data block to be compressed, where the P filter functions include the Q filter functions.
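One way the filter-function features could be computed is sketched below. Everything concrete here is an assumption rather than the patent's specification: fixed-size 64-byte data units, CRC-32 as each unit's initial reference value, linear filter functions f_i(x) = (a_i * x + b_i) mod 2^32 with arbitrary constants, and a rule that each filter keeps its maximum output over all units (similar in spirit to the feature sampling used in resemblance detection for delta compression). Taking the first Q of the P filters for the second feature value makes it a prefix of the first, so two blocks whose first feature values match always have matching second feature values, which is consistent with the first similarity threshold being higher than the second.

```python
import zlib
from typing import List, Sequence, Tuple

MASK32 = (1 << 32) - 1

# Hypothetical filter functions f_i(x) = (a_i * x + b_i) mod 2**32 (P = 4 here).
FILTERS: List[Tuple[int, int]] = [
    (2654435761, 97),
    (2246822519, 13),
    (3266489917, 71),
    (668265263, 29),
]


def split_units(block: bytes, unit_size: int = 64) -> List[bytes]:
    """Divide a data block into M fixed-size data units (the last may be shorter)."""
    return [block[i:i + unit_size] for i in range(0, len(block), unit_size)]


def sample(ref_values: Sequence[int], filters: Sequence[Tuple[int, int]]) -> Tuple[int, ...]:
    """For each filter, keep the maximum transformed reference value over all units."""
    return tuple(max((a * x + b) & MASK32 for x in ref_values) for a, b in filters)


def feature_values(block: bytes, q: int = 2) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    """Return (first feature value from all P filters, second from the first Q)."""
    ref_values = [zlib.crc32(unit) for unit in split_units(block)]  # initial reference values
    if len(ref_values) < 2:
        raise ValueError("need at least two data units")
    first = sample(ref_values, FILTERS)
    second = sample(ref_values, FILTERS[:q])  # the Q filters are a subset of the P filters
    return first, second


block = bytes(range(256)) * 2  # 512 bytes -> M = 8 units
first, second = feature_values(block)
print(len(first), len(second))  # 4 2
print(second == first[:2])      # True
```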
In yet another optional solution, the terminal further includes:
an adding unit, configured to add the first feature value to the first lookup library and the second feature value to the second lookup library, where the data block corresponding to the first feature value in the first lookup library, and the data block corresponding to the second feature value in the second lookup library, is the data block to be compressed.
In yet another optional solution, the terminal further includes:
a deletion unit, configured to delete from the first lookup library any feature value that has been in it longer than a preset time threshold, and to delete from the second lookup library any feature value that has been in it longer than the preset time threshold.
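Lookup-library maintenance, that is, adding feature values after a block is compressed and deleting entries older than a time threshold, can be sketched as follows. The class `LookupLibrary`, the helper `register_block`, the injectable clock, and the rule that a newer block replaces an older one with the same feature value are all illustrative assumptions.

```python
import time
from typing import Callable, Dict, Hashable, Optional, Tuple


class LookupLibrary:
    """Maps a feature value to (data block, time the entry was added)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds  # preset time threshold
        self.clock = clock
        self.entries: Dict[Hashable, Tuple[bytes, float]] = {}

    def add(self, feature: Hashable, block: bytes) -> None:
        # Assumption: a newer block replaces an older one with the same feature value.
        self.entries[feature] = (block, self.clock())

    def get(self, feature: Hashable) -> Optional[bytes]:
        entry = self.entries.get(feature)
        return entry[0] if entry is not None else None

    def evict_expired(self) -> int:
        """Delete entries older than the time threshold; return how many were removed."""
        now = self.clock()
        expired = [f for f, (_, added) in self.entries.items() if now - added > self.ttl_seconds]
        for f in expired:
            del self.entries[f]
        return len(expired)


def register_block(first_lib: LookupLibrary, second_lib: LookupLibrary,
                   first_feature: Hashable, second_feature: Hashable, block: bytes) -> None:
    """After compressing `block`, make it a reference candidate at both levels."""
    first_lib.add(first_feature, block)
    second_lib.add(second_feature, block)


# Hypothetical usage with a fake clock so expiry is deterministic.
now = [0.0]
first_lib = LookupLibrary(ttl_seconds=10, clock=lambda: now[0])
second_lib = LookupLibrary(ttl_seconds=10, clock=lambda: now[0])
register_block(first_lib, second_lib, "f1", "s1", b"block-A")
now[0] = 11.0
print(first_lib.evict_expired(), first_lib.get("f1"))  # 1 None
```

Injecting the clock keeps the eviction rule testable; in production the default `time.monotonic` avoids surprises from wall-clock adjustments.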
In yet another optional solution, the terminal further includes:
a first statistics unit, configured to, before the first calculation unit calculates the first feature value of the data block to be compressed by using the first calculation strategy, collect statistics on the multiple feature values calculated for multiple data blocks with the second calculation strategy during historical compression, and determine how many of those feature values also exist in the second lookup library, where the feature values of these data blocks were not calculated with the first calculation strategy during the historical compression;
a third calculation unit, configured to calculate a first hit rate from the total number of these feature values and the number of matching feature values, and, when the first hit rate is higher than a preset first hit-rate threshold and a data block to be compressed exists, trigger the first calculation unit to calculate the first feature value of the data block to be compressed by using the first calculation strategy.
In yet another optional solution, the terminal further includes:
a second statistics unit, configured to, before the second calculation unit 804 calculates the second feature value of the data block to be compressed by using the second calculation strategy, collect statistics on the multiple feature values calculated for multiple data blocks with the first calculation strategy during historical compression, and determine how many of those feature values also exist in the first lookup library, where the feature values of these data blocks were not calculated with the second calculation strategy during the historical compression;
a fourth calculation unit, configured to calculate a second hit rate from the total number of these feature values and the number of matching feature values, and, when the second hit rate is lower than a preset second hit-rate threshold, trigger the second calculation unit to calculate, when the first reference value does not exist, the second feature value of the data block to be compressed by using the second calculation strategy.
It should be noted that, in this embodiment of the present invention, for the specific implementation of each unit, refer also to the corresponding description of the method embodiment shown in FIG. 1.
In the terminal 80 described in FIG. 8, the terminal 80 compresses data in units of data blocks. When compressing a data block, it first determines whether a reference data block with high similarity to the data block exists and, if so, compresses the data block with reference to it; otherwise, it determines whether a reference data block with lower similarity exists and, if so, compresses the data block with reference to that block. In other words, the embodiments of the present invention select reference data blocks using multi-level similarity standards, from high to low, which improves the overall compression ratio and saves storage space.
Referring to FIG. 9, FIG. 9 shows a terminal 90 according to an embodiment of the present invention. The terminal 90 includes a processor 901 and a memory 902, which are connected to each other through a bus.
The memory 902 includes, but is not limited to, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), or a compact disc read-only memory (CD-ROM), and is configured to store related instructions and data.
The processor 901 may be one or more central processing units (CPUs). When the processor 901 is a single CPU, it may be a single-core or multi-core CPU.
The processor 901 in the terminal 90 is configured to read the program code stored in the memory 902 and perform the following operations:
calculating a first feature value of a data block to be compressed by using a first calculation strategy; determining whether a first reference value exists in a first lookup library, where the first reference value is a feature value identical to the first feature value, the first lookup library contains N feature values each calculated with the first calculation strategy from its corresponding data block, the N feature values in the first lookup library correspond one-to-one to N data blocks, and N is greater than or equal to 1; if the first reference value exists, compressing the data block to be compressed by using a similarity compression technique with the data block corresponding to the first reference value as the reference data block; if the first reference value does not exist, calculating a second feature value of the data block to be compressed by using a second calculation strategy, where, when the similarity of two data blocks is higher than a first similarity threshold, the feature values calculated for them with the first calculation strategy are identical, when the similarity of two data blocks is higher than a second similarity threshold, the feature values calculated for them with the second calculation strategy are identical, and the first similarity threshold is higher than the second similarity threshold; determining whether a second reference value exists in a second lookup library, where the second reference value is a feature value identical to the second feature value, the second lookup library contains N feature values each calculated with the second calculation strategy from its corresponding data block, and the N feature values in the second lookup library correspond one-to-one to the N data blocks; and, if the second reference value exists, compressing the data block to be compressed by using a similarity compression technique with the data block corresponding to the second reference value as the reference data block.
By performing the above operations, the terminal 90 compresses data in units of data blocks. When compressing a data block, it first determines whether a reference data block with high similarity to the data block exists and, if so, compresses the data block with reference to it; otherwise, it determines whether a reference data block with lower similarity exists and, if so, compresses the data block with reference to that block. In other words, the embodiments of the present invention select reference data blocks using multi-level similarity standards, from high to low, which improves the overall compression ratio and saves storage space.
In an optional solution, before calculating the first feature value of the data block to be compressed by using the first calculation strategy, the processor 901 is further configured to divide the data block to be compressed into M data units, where each of the M data units has its own initial reference value and M is greater than or equal to 1. The processor 901 calculates the first feature value of the data block to be compressed by substituting the initial reference values of at least two of the M data units into P preset filter functions, where P is greater than or equal to 2, and calculates the second feature value by substituting the initial reference values of at least two of the M data units into Q preset filter functions, where the P filter functions include the Q filter functions.
In yet another optional solution, after the data block to be compressed is compressed, the processor 901 is further configured to add the first feature value to the first lookup library and the second feature value to the second lookup library, where the data block corresponding to the first feature value in the first lookup library, and the data block corresponding to the second feature value in the second lookup library, is the data block to be compressed.
In yet another optional solution, the processor 901 is further configured to delete from the first lookup library any feature value that has been in it longer than a preset time threshold, and to delete from the second lookup library any feature value that has been in it longer than the preset time threshold.
In yet another optional solution, before calculating the first feature value of the data block to be compressed by using the first calculation strategy, the processor 901 is further configured to: collect statistics on the multiple feature values calculated for multiple data blocks with the second calculation strategy during historical compression, and determine how many of those feature values also exist in the second lookup library, where the feature values of these data blocks were not calculated with the first calculation strategy during the historical compression; calculate a first hit rate from the total number of these feature values and the number of matching feature values; and, when the first hit rate is higher than a preset first hit-rate threshold and a data block to be compressed exists, perform the operation of calculating the first feature value of the data block to be compressed by using the first calculation strategy.
In yet another optional solution, before calculating the second feature value of the data block to be compressed by using the second calculation strategy, the processor 901 is further configured to: collect statistics on the multiple feature values calculated for multiple data blocks with the first calculation strategy during historical compression, and determine how many of those feature values also exist in the first lookup library, where the feature values of these data blocks were not calculated with the second calculation strategy during the historical compression; calculate a second hit rate from the total number of these feature values and the number of matching feature values; and, when the second hit rate is lower than a preset second hit-rate threshold, perform the operation of calculating, if the first reference value does not exist, the second feature value of the data block to be compressed by using the second calculation strategy.
It should be noted that, in this embodiment of the present invention, for the specific implementation of each unit, refer also to the corresponding description of the method embodiment shown in FIG. 2.
In the terminal described in FIG. 9, the terminal 90 compresses data in units of data blocks. When compressing a data block, it first determines whether a reference data block with high similarity to the data block exists and, if so, compresses the data block with reference to it; otherwise, it determines whether a reference data block with lower similarity exists and, if so, compresses the data block with reference to that block. In other words, the embodiments of the present invention select reference data blocks using multi-level similarity standards, from high to low, which improves the overall compression ratio and saves storage space.
In summary, by implementing the embodiments of the present invention, the terminal compresses data in units of data blocks. When compressing a data block, it first determines whether a reference data block with high similarity to the data block exists and, if so, compresses the data block with reference to it; otherwise, it determines whether a reference data block with lower similarity exists and, if so, compresses the data block with reference to that block. In other words, the embodiments of the present invention select reference data blocks using multi-level similarity standards, from high to low, which improves the overall compression ratio and saves storage space.
A person of ordinary skill in the art will understand that all or some of the processes of the methods in the above embodiments may be implemented by a computer program instructing related hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium includes any medium capable of storing program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
The above embodiments disclose only preferred embodiments of the present invention and are not intended to limit its scope of protection. A person of ordinary skill in the art will understand that implementing all or some of the processes of the above embodiments, and making equivalent changes in accordance with the claims of the present invention, still falls within the scope of the invention.
Claims (12)
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201610729693.7A CN107783990B (en) | 2016-08-26 | 2016-08-26 | Data compression method and terminal |
| PCT/CN2017/092525 WO2018036290A1 (en) | 2016-08-26 | 2017-07-11 | Data compression method and terminal |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201610729693.7A CN107783990B (en) | 2016-08-26 | 2016-08-26 | Data compression method and terminal |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN107783990A true CN107783990A (en) | 2018-03-09 |
| CN107783990B CN107783990B (en) | 2021-11-19 |
Family ID: 61245421
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201610729693.7A Active CN107783990B (en) | 2016-08-26 | 2016-08-26 | Data compression method and terminal |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN107783990B (en) |
| WO (1) | WO2018036290A1 (en) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110784227A (en) * | 2019-10-21 | 2020-02-11 | Tsinghua University | A method, device and storage medium for multiplexing data set |
| CN111010189A (en) * | 2019-10-21 | 2020-04-14 | Tsinghua University | Multi-path compression method and device for data set and storage medium |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119166347B (en) * | 2024-09-04 | 2025-08-08 | Guangzhou Hongmao Energy Technology Co., Ltd. | Electric power detection system based on multidimensional data |
Citations (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US4386416A (en) * | 1980-06-02 | 1983-05-31 | Mostek Corporation | Data compression, encryption, and in-line transmission system |
| CN1144583A (en) * | 1994-04-01 | 1997-03-05 | Dolby Laboratories Licensing Corporation | Compact source code table for encoder/decoder systems |
| US6804676B1 (en) * | 1999-08-31 | 2004-10-12 | International Business Machines Corporation | System and method in a data processing system for generating compressed affinity records from data records |
| CN102103630A (en) * | 2010-12-08 | 2011-06-22 | China United Network Communications Group Co., Ltd. | Data compression method and device as well as data decompression method and device |
| EP2444909A2 (en) * | 2004-04-15 | 2012-04-25 | Microsoft Corporation | Efficient algorithm and protocol for remote differential compression |
| CN102831222A (en) * | 2012-08-24 | 2012-12-19 | Huazhong University of Science and Technology | Differential compression method based on data de-duplication |
| CN105426413A (en) * | 2015-10-31 | 2016-03-23 | Huawei Technologies Co., Ltd. | An encoding method and device |
| CN105630999A (en) * | 2015-12-28 | 2016-06-01 | Huawei Technologies Co., Ltd. | Data compression method and device for server |
| CN105743509A (en) * | 2016-01-26 | 2016-07-06 | Huawei Technologies Co., Ltd. | Data compression device and method |
| CN106557777A (en) * | 2016-10-17 | 2017-04-05 | China Internet Network Information Center | Improved K-means clustering method based on SimHash |
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8751462B2 (en) * | 2008-11-14 | 2014-06-10 | Emc Corporation | Delta compression after identity deduplication |
| CN102609491A (en) * | 2012-01-20 | 2012-07-25 | Donghua University | Column-storage oriented area-level data compression method |
| US9141301B1 (en) * | 2012-06-13 | 2015-09-22 | Emc Corporation | Method for cleaning a delta storage system |
| CN104348490B (en) * | 2014-11-14 | 2017-09-19 | Beijing Dongfang Guoxin Technology Co., Ltd. | Data splitting compression method based on effect-based preference |
Application Timeline
| Date | Office | Application | Publication | Status |
|---|---|---|---|---|
| 2016-08-26 | CN | CN201610729693.7A | CN107783990B | Active |
| 2017-07-11 | WO | PCT/CN2017/092525 | WO2018036290A1 | Not active (Ceased) |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110784227A (en) * | 2019-10-21 | 2020-02-11 | Tsinghua University | A method, device and storage medium for multiplexing data set |
| CN111010189A (en) * | 2019-10-21 | 2020-04-14 | Tsinghua University | Multi-path compression method and device for data set and storage medium |
| CN110784227B (en) * | 2019-10-21 | 2021-07-30 | Tsinghua University | A method, device and storage medium for multiplexing data set |
| CN111010189B (en) * | 2019-10-21 | 2021-10-26 | Tsinghua University | Multi-path compression method and device for data set and storage medium |
Also Published As
| Publication number | Publication date |
|---|---|
| CN107783990B (en) | 2021-11-19 |
| WO2018036290A1 (en) | 2018-03-01 |
Similar Documents
| Publication | Title |
|---|---|
| RU2626334C2 (en) | Method and device for processing data object |
| US9048862B2 (en) | Systems and methods for selecting data compression for storage data in a storage system |
| CN104008064B (en) | Method and system for multi-level storage compression |
| EP3896564A1 (en) | Data processing method and device, and computer readable storage medium |
| CN108089814B (en) | Data storage method and device |
| CN105204781A (en) | Compression method, device and equipment |
| US10585856B1 (en) | Utilizing data access patterns to determine compression block size in data storage systems |
| CN103150260B (en) | Data de-duplication method and device |
| WO2014094479A1 (en) | Method and device for deleting duplicate data |
| JP7299334B2 (en) | Chunking method and apparatus |
| CN104753626A (en) | Data compression method, equipment and system |
| CN107783990A (en) | A kind of data compression method and terminal |
| CN111061428B (en) | Method and device for data compression |
| CN117097717B (en) | File transmission optimization method, system and electronic device for simulation results |
| CN115809013A (en) | Data deduplication method and related device |
| CN111209254A (en) | File fingerprint acquisition method and device, electronic equipment and storage medium |
| CN117687977A (en) | File deduplication method, device and system, electronic equipment and computer storage medium |
| WO2024021491A1 (en) | Data slicing method, apparatus and system |
| CN119440413B (en) | A method, apparatus, electronic device, and storage medium for reducing the size of a storage system |
| CN116383290B (en) | Data generalization and analysis method |
| CN112749138A (en) | Method, electronic device and computer program product for processing data |
| US20230153005A1 (en) | Block Storage Device and Method for Data Compression |
| EP4068071B1 (en) | Data storage method in storage system and related device |
| CN110968575B (en) | A deduplication method for big data processing system |
| CN108520067A (en) | Method, device and storage medium for compressing and decompressing gzip format files |
Legal Events
| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |