CN118626517B - A classification retrieval method and its connection query method based on the tamper-proof property of blockchain data - Google Patents

A classification retrieval method and its connection query method based on the tamper-proof property of blockchain data

Info

Publication number
CN118626517B
CN118626517B CN202410918721.4A CN202410918721A CN118626517B CN 118626517 B CN118626517 B CN 118626517B CN 202410918721 A CN202410918721 A CN 202410918721A CN 118626517 B CN118626517 B CN 118626517B
Authority
CN
China
Prior art keywords
data
relative
bit
blockchain
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410918721.4A
Other languages
Chinese (zh)
Other versions
CN118626517A (en
Inventor
赵新奎
熊礽荣
程冠杰
智晨
张旭鸿
尹建伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University School Of Software Ningbo Innovation And Management Center
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202410918721.4A priority Critical patent/CN118626517B/en
Publication of CN118626517A publication Critical patent/CN118626517A/en
Application granted granted Critical
Publication of CN118626517B publication Critical patent/CN118626517B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases
    • G06F16/285Clustering or classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

本发明公开了一种基于区块链数据不可篡改特性的高效分类检索结构及其衔接查询方法。本发明中的数据属性按特征进行分类,每个数据是否具有某一个特征用二进制的一个比特标识,其最终索引的数据保持区块链上数据的时序性;检索过程中以位运算和数学运算为主,最终检索得到的结果是满足条件的数据相对于第一个数据的相对偏移量,最后在数据文件中通过计算偏移量移动磁头进行精确I/O。本发明基于区块链上数据不可篡改的独特特性,能以常数数量级进行维护,具有极高的可扩展性;检索单个数据的复杂度也能降为常数数量级。同时提出了衔接查询,可以极大地减少检索开销,解决了现有区块链检索结构随着区块链上数据规模增大而导致的性能下降问题。

The present invention discloses an efficient classification retrieval structure based on the tamper-proof property of blockchain data and its connection query method. In the present invention, data attributes are classified by features, and whether each data has a certain feature is identified by a binary bit. The data finally indexed maintains the temporal sequence of the data on the blockchain. The retrieval process mainly uses bit operations and mathematical operations. The final retrieval result is the relative offset of the data that meets the conditions relative to the first data. Finally, the offset is calculated in the data file to move the magnetic head for precise I/O. Based on the unique tamper-proof property of data on the blockchain, the present invention can be maintained at a constant order of magnitude and has extremely high scalability. The complexity of retrieving a single data can also be reduced to a constant order of magnitude. At the same time, a connection query is proposed, which can greatly reduce the retrieval overhead and solve the problem of performance degradation of the existing blockchain retrieval structure caused by the increase in the scale of data on the blockchain.

Description

Classification retrieval method based on non-tamperable characteristic of blockchain data and connection query method thereof
Technical Field
The invention belongs to the technical field of block chains, and particularly relates to a classification retrieval method based on the non-tamperable characteristic of block chain data and a connection query method thereof.
Background
Blockchains act as a distributed data ledger, with data consistency across multiple participants through a specific consensus mechanism. The blockchain data structure is composed of time-ordered blocks, each block containing transaction records and linked to the previous block by hash pointers to form an unalterable and continuously growing chain. Invariance of blockchain data is ensured by encrypted links between blocks, any modification disabling all subsequent blocks. Data retrieval is a large demand on the blockchain.
In response to the need for data retrieval on blockchain, industry and academia have proposed some preliminary solutions, which are mostly upgraded or improved based on b+tree. The search and maintenance costs of these structures are related to the data size, because the search needs to be performed in all indexes, and the maintenance may be performed in all indexes, and the search and maintenance costs may become unacceptable when the data size is large.
Conventional search structures do not fit well into large data searches in a blockchain scenario, and over time, the size of data on the blockchain is getting larger and growing, meaning that the search and maintenance overhead of these search structures can become larger and larger. In addition, because the data on the blockchain has the property of being unable to be tampered, when the client performs one-time search and if the data on the blockchain is updated, the client performs the search for the same problem again, only the data meeting the condition needs to be searched in the newly added data, and the traditional search structure cannot complete the search under the condition of low cost.
Disclosure of Invention
In view of the above, the application provides a retrieval structure and a link up query method based on the non-tamperable characteristic of data in a blockchain, which can realize constant maintenance cost without increasing with the increase of the data scale.
In a first aspect, the present application provides an efficient classification retrieval structure based on the non-tamperable nature of blockchain data, in which:
Classifying the data attributes according to the characteristics, wherein whether each data has a certain characteristic is identified by a bit of binary system, 0 represents none, 1 represents none, and the data of the final index maintains the time sequence of the data on the block chain;
in the searching process, bit operation and mathematical operation are taken as main materials, the final searching result is the relative offset of the data meeting the condition relative to the first data, and finally the head is moved in the data file by calculating the offset to perform accurate I/O.
The second aspect of the application is based on the classified retrieval structure, and further provides a link query method based on the non-tamperable characteristic of the blockchain data, wherein each time the client retrieves and selects to carry an identifier, the server calculates the identifier to obtain the stopping position of the client when the client retrieves the same problem last time, retrieves the data meeting the condition from the stopping position to the latest data, finally obtains the data meeting the retrieval condition from the data updated after the last retrieval, and the server returns the data and the new identifier to the client.
The high-efficiency classified retrieval structure realizes constant maintenance cost, namely the maintenance cost cannot be increased along with the increase of the total data amount, and accurate I/O can be performed during retrieval, so that a large amount of redundant I/O is avoided. In addition, the application also provides the link inquiry, based on the characteristic that the data on the blockchain is not tamperable, the data obtained by the client after one-time retrieval is latest and cannot be modified, and the client only needs to retrieve the data meeting the condition in the updated data when the client retrieves the same problem next time, so that the retrieval cost is greatly reduced.
Drawings
FIG. 1 is a schematic diagram of the classification search structure of the present invention with the best maintenance overhead when a piece of data is newly added.
FIG. 2 is a schematic diagram showing the worst maintenance overhead of the classification search structure according to the present invention when a piece of data is newly added.
Detailed Description
In order to describe the present invention more specifically, the following describes the technical scheme of the present invention in detail in connection with the specific embodiments.
The method comprises the steps of classifying data according to data scenes and demand scenes, classifying the data according to the data scenes and the demand scenes, wherein the same piece of data can have multiple characteristics, classifying the data according to keywords, classifying the data according to a range of a certain attribute value of the data, finally obtaining the category number F n of the characteristics, and determining the maximum number B n,Bn of sub-nodes of each node to be a constant larger than 2 according to the data scale and the user demand.
When the classification search structure is established, the data on the block chain are ordered according to the time sequence, each data is encoded one by one, each data has F n bits, 1 and 0 of each bit respectively represent whether the data has or does not have F n characteristics, and F n is determined by a specific application scene. B n data are grouped from left to right, and the characteristic bits of the B n data representing each characteristic are divided together in the same manner to obtain F n characteristic codes. The B n data form a leaf node, and the B n leaf nodes are partitioned together to form child nodes of an intermediate node in the same manner, and the intermediate node has F n feature codes, and each of the left-to-right B n bits in each feature code respectively indicates whether the left-to-right leaf nodes have data containing the feature. Similarly, the B n intermediate nodes are partitioned together to form the child nodes of a root node, where the root node has F n feature codes, and each of the left to right B n bits in each feature code indicates whether the subtrees from left to right thereof have data containing the feature. Thus, a classification search tree is constructed, and when all data are constructed, a classification search forest is obtained.
In a preferred example, a plurality of trees corresponding to the combined root nodes form a search forest, only the last constructed tree may not be completed in the construction process, and when new data is added, only the latest constructed tree needs to be maintained without adjusting the whole index structure.
In a preferred example, after the leaf node is retrieved, the relative positions of all 1s in the feature code in the binary code are obtained, and the relative positions of the data are compared with the relative positions, and the position of the result data in the file can be obtained by calculating the head offset according to the relative positions.
In a preferred example, the traversal process is to find all 1s in the binary system, calculate their relative offsets with respect to the first node of the first layer by finding the position of this 1, and recursively obtain the relative offsets of all the data satisfying the condition with respect to the first data.
Based on the efficient classified retrieval structure, the specific steps of the retrieval method are as follows:
Step A, determining the characteristics to be searched, wherein the characteristics can be single characteristics or combination of multi-characteristic logic operation, and the following searching processes all carry out the same characteristic logic operation;
searching in all root nodes in the search forest, and finding all 1 s in the feature code binary system after the combination of the features of the designated features or the logic operation through bit operation;
calculating the relative offset of the 1 relative to the first bit in all root node feature code binary, so as to find the intermediate node to be searched;
Step D, searching in all the intermediate nodes found in the step C, and finding all 1 s in the feature code binary system after the combination of the features of the specified features or the logic operation through bit operation;
Step E, calculating the relative offsets of the 1 relative to the first bit in all intermediate node feature code binary, so as to find the leaf node to be searched;
Step F, searching in all leaf nodes found in the step E, and finding all 1s in the feature code binary system after the combination of the features of the specified feature or the logic operation through bit operation;
Calculating the relative offsets of the 1 relative to the first bit in all leaf node feature code binary, so as to find the data to be retrieved;
And H, calculating the relative offset of the magnetic head to be moved according to the offset obtained in the step G, and reading in a data file.
When new data is added, only F n bits need to be added to indicate whether the data has the characteristics. Additionally, the worst case is that the node needs to be extended, as in fig. 2, although the overhead of maintenance is still on the order of a constant.
In addition, because the data on the blockchain has the property of being non-tamperable, after the client side performs one-time searching, the data obtained by the searching is not modified, which means that the data are always up to date, if the data on the blockchain are updated, the client side performs searching again for the same problem, and only the data meeting the condition need to be searched in the newly-increased data, and repeated searching is not needed, so the embodiment of the application also provides the following link inquiry method:
The client can select to carry an identifier every time, the server calculates the identifier to obtain the stopping position of the client when the client searches the same problem last time, and the server returns the data and the new identifier to the client after the stopping position reaches the data meeting the condition in the latest data and finally obtains the data meeting the searching condition in the data updated after the last search.
In some embodiments, by calculating this identification, it is possible to obtain which search tree the client stopped at when retrieving the same problem last time, and thus which leaf node stopped in this search tree, and finally which bit of this leaf node is encoded for the feature.
In some embodiments, the client will skip all data previously retrieved when retrieving the same problem again, and retrieve data meeting the criteria only among the newly updated data that the client did not retrieve.
In some embodiments, the server returns a new identifier to the client after each search is completed, and the new identifier is used for recording the position where the search of the client is stopped, the identifier is saved by the client, and when the search is performed next time, the client can select the identifier carrying the corresponding search condition to perform the connection query.
The embodiments described above are described in order to facilitate the understanding and application of the present invention to those skilled in the art, and it will be apparent to those skilled in the art that various modifications may be made to the embodiments described above and that the general principles described herein may be applied to other embodiments without the need for inventive faculty. Therefore, the present invention is not limited to the above-described embodiments, and those skilled in the art, based on the present disclosure, should make improvements and modifications within the scope of the present invention.

Claims (9)

1. A classification retrieval method based on the non-tamperable characteristic of block chain data is characterized by comprising the following steps:
Classifying the data attributes according to the characteristics, wherein whether each data has a certain characteristic is identified by a bit of binary system, 0 represents none, 1 represents none, and the data of the final index maintains the time sequence of the data on the block chain;
In the searching process, bit operation and mathematical operation are taken as main materials, the result obtained by final searching is the relative offset of the data meeting the condition relative to the first data, and finally the head is moved in the data file by calculating the offset to perform accurate I/O;
the searching steps are as follows:
Step A, determining the features to be searched;
searching in all root nodes in the search forest, and finding all 1 s in the feature code binary system after the combination of the features of the designated features or the logic operation through bit operation;
calculating the relative offset of the 1 relative to the first bit in all root node feature code binary, so as to find the intermediate node to be searched;
Step D, searching in all the intermediate nodes found in the step C, and finding all 1 s in the feature code binary system after the combination of the features of the specified features or the logic operation through bit operation;
Step E, calculating the relative offsets of the 1 relative to the first bit in all intermediate node feature code binary, so as to find the leaf node to be searched;
Step F, searching in all leaf nodes found in the step E, and finding all 1s in the feature code binary system after the combination of the features of the specified feature or the logic operation through bit operation;
Calculating the relative offsets of the 1 relative to the first bit in all leaf node feature code binary, so as to find the data to be retrieved;
And H, calculating the relative offset of the magnetic head to be moved according to the offset obtained in the step G, and reading in a data file.
2. The method of claim 1, wherein the index order of all data in the leaf nodes is consistent with the time sequence of the data in the blockchain, and the order in which all data in the nodes of all leaves are arranged from left to right is the order in which the data in the blockchain are arranged in time sequence.
3. The method of claim 1, wherein the searching process is to search for features of different classifications by classifying different characteristics of all attributes of each piece of data, and F n features are obtained after classification, and F n is determined by a specific application scene.
4. The method of claim 3, wherein each piece of data has F n features, corresponding to F n bits in the code, the binary bit arrangement sequence in each feature code is the same as the time sequence of the data in the block chain, and each node has at most B n child nodes, wherein B n is a constant greater than 2;
B n data packets are a leaf node, B n leaf nodes are grouped into child nodes of an intermediate node, B n intermediate nodes are grouped into child nodes of a root node, and finally each node is represented by F n bit codes, each feature code is provided with B n bits, and the B n data or child nodes respectively correspond to each other.
5. The method of claim 1, wherein the plurality of combined trees corresponding to the root nodes form a search forest, and only the last tree is possibly incomplete in the construction process.
6. The method of claim 1, wherein the leaf nodes are retrieved to obtain the relative positions of all 1 s in the feature codes in the binary codes, the relative positions of the data are corresponding to the relative positions, and the positions of the result data in the file can be obtained by calculating the head offset according to the relative positions.
7. The method of claim 1, wherein the traversal process searches for all 1 s in the binary system, calculates the relative offset of the 1 s relative to the first node in the same layer by finding the 1 s position, and recursively obtains the relative offset of all the data satisfying the condition relative to the first data.
8. A link query method based on the non-tamperable characteristic of blockchain data is characterized in that a client side carries an identifier in each retrieval selection, a server side calculates the identifier to obtain a stopping position of the client side when the client side retrieves the same problem last time, data meeting the condition is retrieved from the stopping position to the latest data, finally, data meeting the retrieval condition in the data updated after the last retrieval is obtained, and the server side returns the data and the new identifier to the client side.
9. The method of claim 8, wherein the step of calculating the identifier to obtain a certain leaf node stopped in a certain search tree when the client searches the same problem last time, and finally obtaining a certain bit of the feature code in the leaf node;
The server returns a new identifier to the client after each search is completed, and the new identifier is used for recording the position where the search of the client is stopped, the identifier is automatically saved by the client, and the client selects the identifier carrying the corresponding search condition for connection inquiry when searching is performed next time.
CN202410918721.4A 2024-07-10 2024-07-10 A classification retrieval method and its connection query method based on the tamper-proof property of blockchain data Active CN118626517B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410918721.4A CN118626517B (en) 2024-07-10 2024-07-10 A classification retrieval method and its connection query method based on the tamper-proof property of blockchain data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410918721.4A CN118626517B (en) 2024-07-10 2024-07-10 A classification retrieval method and its connection query method based on the tamper-proof property of blockchain data

Publications (2)

Publication Number Publication Date
CN118626517A CN118626517A (en) 2024-09-10
CN118626517B true CN118626517B (en) 2025-09-12

Family

ID=92603334

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410918721.4A Active CN118626517B (en) 2024-07-10 2024-07-10 A classification retrieval method and its connection query method based on the tamper-proof property of blockchain data

Country Status (1)

Country Link
CN (1) CN118626517B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109766389A (en) * 2019-01-09 2019-05-17 华东师范大学 A kind of light client revene lookup method of block chain based on bitmap index

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10581613B2 (en) * 2017-06-09 2020-03-03 Ecole Polytechnique Federale De Lausanne (Epfl) Cryptographically verifiable data structure having multi-hop forward and backwards links and associated systems and methods
US11283616B2 (en) * 2019-04-03 2022-03-22 Hong Kong Baptist University Method for index-based and integrity-assured search in a blockchain
CN113138989B (en) * 2021-03-12 2022-12-27 莘上信息技术(上海)有限公司 Block chain data retrieval method and device
CN113779025B (en) * 2021-08-06 2024-01-26 西安电子科技大学 An optimization method, system and application for classification data retrieval efficiency in blockchain
CN114756603B (en) * 2022-05-23 2023-04-07 天津大学 High-efficiency verifiable query method for lightweight block chain
CN115048377B (en) * 2022-06-10 2025-03-28 东北大学 A spatiotemporal keyword query method in a hybrid storage blockchain environment
CN116628083B (en) * 2023-04-27 2024-05-24 中国人民解放军战略支援部队信息工程大学 Blockchain transaction data expansion storage method and system

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109766389A (en) * 2019-01-09 2019-05-17 华东师范大学 A kind of light client revene lookup method of block chain based on bitmap index

Also Published As

Publication number Publication date
CN118626517A (en) 2024-09-10

Similar Documents

Publication Publication Date Title
US8156117B2 (en) Method and system for storing, searching and retrieving information based on semistructured and de-centralized data sets
CN100530185C (en) Network behavior based personalized recommendation method and system
US7325013B2 (en) Database with efficient fuzzy matching
Krishnan et al. Estimating alphanumeric selectivity in the presence of wildcards
US20100106713A1 (en) Method for performing efficient similarity search
CN109166615B (en) Medical CT image storage and retrieval method based on random forest hash
CN112765181B (en) Data tracing query method based on block index structure
US20080098020A1 (en) Incremental maintenance of an XML index on binary XML data
CN111782663A (en) Aggregation index structure and aggregation index method for improving aggregation query efficiency
US12346328B2 (en) Creating compressed data slabs that each include compressed data and compression information for storage in a database system
CN109815232B (en) Method and system for retrieving and processing data ranking by using binary search tree
CN112579921B (en) Track indexing and query method and system based on inverted sorting index and prefix tree
CN112632118B (en) Data query method, device, computing device and storage medium
CN116881243A (en) Learning type indexing method and system based on time sequence data characteristics
CN116628083B (en) Blockchain transaction data expansion storage method and system
US11853858B2 (en) Chart building user interface providing machine learned chart recommendations
CN119311882B (en) A document comparison method based on improved LCS algorithm
CN114385587B (en) Construction method and query method for relational database version snapshot
CN118626517B (en) A classification retrieval method and its connection query method based on the tamper-proof property of blockchain data
WO2007048015A2 (en) Method and apparatus for a restartable hash in a trie
CN115563058A (en) Retrieval Method of Similar Cases Based on Element Extraction
CN101326522A (en) A Concise Index Structure for XML
CN119226570A (en) Cloud-edge collaborative data storage and retrieval architecture for industrial scenarios
CN114996267B (en) A method for constructing dynamic index of database
US7693850B2 (en) Method and apparatus for adding supplemental information to PATRICIA tries

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP03 Change of name, title or address
CP03 Change of name, title or address

Address after: 310058 Xihu District, Zhejiang, Yuhang Tong Road, No. 866, No.

Patentee after: ZHEJIANG University

Country or region after: China

Patentee after: Zhejiang University School of Software (Ningbo) Innovation and Management Center

Address before: 310058 Xihu District, Zhejiang, Yuhang Tong Road, No. 866, No.

Patentee before: ZHEJIANG University

Country or region before: China

Patentee before: SCHOOL OF SOFTWARE TECHNOLOGY ZHEJIANG University (NINGBO) CONTROL CENTER (NINGBO SOFTWARE EDUCATION CENTER)