CN118626517B

CN118626517B - A classification retrieval method and its connection query method based on the tamper-proof property of blockchain data

Info

Publication number: CN118626517B
Application number: CN202410918721.4A
Authority: CN
Inventors: 赵新奎; 熊礽荣; 程冠杰; 智晨; 张旭鸿; 尹建伟
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University School Of Software Ningbo Innovation And Management Center; Zhejiang University ZJU
Priority date: 2024-07-10
Filing date: 2024-07-10
Publication date: 2025-09-12
Anticipated expiration: 2044-07-10
Also published as: CN118626517A

Abstract

The present invention discloses an efficient classification retrieval structure based on the tamper-proof property of blockchain data and its connection query method. In the present invention, data attributes are classified by features, and whether each data has a certain feature is identified by a binary bit. The data finally indexed maintains the temporal sequence of the data on the blockchain. The retrieval process mainly uses bit operations and mathematical operations. The final retrieval result is the relative offset of the data that meets the conditions relative to the first data. Finally, the offset is calculated in the data file to move the magnetic head for precise I/O. Based on the unique tamper-proof property of data on the blockchain, the present invention can be maintained at a constant order of magnitude and has extremely high scalability. The complexity of retrieving a single data can also be reduced to a constant order of magnitude. At the same time, a connection query is proposed, which can greatly reduce the retrieval overhead and solve the problem of performance degradation of the existing blockchain retrieval structure caused by the increase in the scale of data on the blockchain.

Description

Classification retrieval method based on non-tamperable characteristic of blockchain data and connection query method thereof

Technical Field

The invention belongs to the technical field of block chains, and particularly relates to a classification retrieval method based on the non-tamperable characteristic of block chain data and a connection query method thereof.

Background

Blockchains act as a distributed data ledger, with data consistency across multiple participants through a specific consensus mechanism. The blockchain data structure is composed of time-ordered blocks, each block containing transaction records and linked to the previous block by hash pointers to form an unalterable and continuously growing chain. Invariance of blockchain data is ensured by encrypted links between blocks, any modification disabling all subsequent blocks. Data retrieval is a large demand on the blockchain.

In response to the need for data retrieval on blockchain, industry and academia have proposed some preliminary solutions, which are mostly upgraded or improved based on b+tree. The search and maintenance costs of these structures are related to the data size, because the search needs to be performed in all indexes, and the maintenance may be performed in all indexes, and the search and maintenance costs may become unacceptable when the data size is large.

Conventional search structures do not fit well into large data searches in a blockchain scenario, and over time, the size of data on the blockchain is getting larger and growing, meaning that the search and maintenance overhead of these search structures can become larger and larger. In addition, because the data on the blockchain has the property of being unable to be tampered, when the client performs one-time search and if the data on the blockchain is updated, the client performs the search for the same problem again, only the data meeting the condition needs to be searched in the newly added data, and the traditional search structure cannot complete the search under the condition of low cost.

Disclosure of Invention

In view of the above, the application provides a retrieval structure and a link up query method based on the non-tamperable characteristic of data in a blockchain, which can realize constant maintenance cost without increasing with the increase of the data scale.

In a first aspect, the present application provides an efficient classification retrieval structure based on the non-tamperable nature of blockchain data, in which:

Classifying the data attributes according to the characteristics, wherein whether each data has a certain characteristic is identified by a bit of binary system, 0 represents none, 1 represents none, and the data of the final index maintains the time sequence of the data on the block chain;

in the searching process, bit operation and mathematical operation are taken as main materials, the final searching result is the relative offset of the data meeting the condition relative to the first data, and finally the head is moved in the data file by calculating the offset to perform accurate I/O.

The second aspect of the application is based on the classified retrieval structure, and further provides a link query method based on the non-tamperable characteristic of the blockchain data, wherein each time the client retrieves and selects to carry an identifier, the server calculates the identifier to obtain the stopping position of the client when the client retrieves the same problem last time, retrieves the data meeting the condition from the stopping position to the latest data, finally obtains the data meeting the retrieval condition from the data updated after the last retrieval, and the server returns the data and the new identifier to the client.

The high-efficiency classified retrieval structure realizes constant maintenance cost, namely the maintenance cost cannot be increased along with the increase of the total data amount, and accurate I/O can be performed during retrieval, so that a large amount of redundant I/O is avoided. In addition, the application also provides the link inquiry, based on the characteristic that the data on the blockchain is not tamperable, the data obtained by the client after one-time retrieval is latest and cannot be modified, and the client only needs to retrieve the data meeting the condition in the updated data when the client retrieves the same problem next time, so that the retrieval cost is greatly reduced.

Drawings

FIG. 1 is a schematic diagram of the classification search structure of the present invention with the best maintenance overhead when a piece of data is newly added.

FIG. 2 is a schematic diagram showing the worst maintenance overhead of the classification search structure according to the present invention when a piece of data is newly added.

Detailed Description

In order to describe the present invention more specifically, the following describes the technical scheme of the present invention in detail in connection with the specific embodiments.

The method comprises the steps of classifying data according to data scenes and demand scenes, classifying the data according to the data scenes and the demand scenes, wherein the same piece of data can have multiple characteristics, classifying the data according to keywords, classifying the data according to a range of a certain attribute value of the data, finally obtaining the category number F _n of the characteristics, and determining the maximum number B _n,B_n of sub-nodes of each node to be a constant larger than 2 according to the data scale and the user demand.

When the classification search structure is established, the data on the block chain are ordered according to the time sequence, each data is encoded one by one, each data has F _n bits, 1 and 0 of each bit respectively represent whether the data has or does not have F _n characteristics, and F _n is determined by a specific application scene. B _n data are grouped from left to right, and the characteristic bits of the B _n data representing each characteristic are divided together in the same manner to obtain F _n characteristic codes. The B _n data form a leaf node, and the B _n leaf nodes are partitioned together to form child nodes of an intermediate node in the same manner, and the intermediate node has F _n feature codes, and each of the left-to-right B _n bits in each feature code respectively indicates whether the left-to-right leaf nodes have data containing the feature. Similarly, the B _n intermediate nodes are partitioned together to form the child nodes of a root node, where the root node has F _n feature codes, and each of the left to right B _n bits in each feature code indicates whether the subtrees from left to right thereof have data containing the feature. Thus, a classification search tree is constructed, and when all data are constructed, a classification search forest is obtained.

In a preferred example, a plurality of trees corresponding to the combined root nodes form a search forest, only the last constructed tree may not be completed in the construction process, and when new data is added, only the latest constructed tree needs to be maintained without adjusting the whole index structure.

In a preferred example, after the leaf node is retrieved, the relative positions of all 1s in the feature code in the binary code are obtained, and the relative positions of the data are compared with the relative positions, and the position of the result data in the file can be obtained by calculating the head offset according to the relative positions.

In a preferred example, the traversal process is to find all 1s in the binary system, calculate their relative offsets with respect to the first node of the first layer by finding the position of this 1, and recursively obtain the relative offsets of all the data satisfying the condition with respect to the first data.

Based on the efficient classified retrieval structure, the specific steps of the retrieval method are as follows:

Step A, determining the characteristics to be searched, wherein the characteristics can be single characteristics or combination of multi-characteristic logic operation, and the following searching processes all carry out the same characteristic logic operation;

searching in all root nodes in the search forest, and finding all 1 s in the feature code binary system after the combination of the features of the designated features or the logic operation through bit operation;

calculating the relative offset of the 1 relative to the first bit in all root node feature code binary, so as to find the intermediate node to be searched;

Step D, searching in all the intermediate nodes found in the step C, and finding all 1 s in the feature code binary system after the combination of the features of the specified features or the logic operation through bit operation;

Step E, calculating the relative offsets of the 1 relative to the first bit in all intermediate node feature code binary, so as to find the leaf node to be searched;

Step F, searching in all leaf nodes found in the step E, and finding all 1s in the feature code binary system after the combination of the features of the specified feature or the logic operation through bit operation;

Calculating the relative offsets of the 1 relative to the first bit in all leaf node feature code binary, so as to find the data to be retrieved;

And H, calculating the relative offset of the magnetic head to be moved according to the offset obtained in the step G, and reading in a data file.

When new data is added, only F _n bits need to be added to indicate whether the data has the characteristics. Additionally, the worst case is that the node needs to be extended, as in fig. 2, although the overhead of maintenance is still on the order of a constant.

In addition, because the data on the blockchain has the property of being non-tamperable, after the client side performs one-time searching, the data obtained by the searching is not modified, which means that the data are always up to date, if the data on the blockchain are updated, the client side performs searching again for the same problem, and only the data meeting the condition need to be searched in the newly-increased data, and repeated searching is not needed, so the embodiment of the application also provides the following link inquiry method:

The client can select to carry an identifier every time, the server calculates the identifier to obtain the stopping position of the client when the client searches the same problem last time, and the server returns the data and the new identifier to the client after the stopping position reaches the data meeting the condition in the latest data and finally obtains the data meeting the searching condition in the data updated after the last search.

In some embodiments, by calculating this identification, it is possible to obtain which search tree the client stopped at when retrieving the same problem last time, and thus which leaf node stopped in this search tree, and finally which bit of this leaf node is encoded for the feature.

In some embodiments, the client will skip all data previously retrieved when retrieving the same problem again, and retrieve data meeting the criteria only among the newly updated data that the client did not retrieve.

In some embodiments, the server returns a new identifier to the client after each search is completed, and the new identifier is used for recording the position where the search of the client is stopped, the identifier is saved by the client, and when the search is performed next time, the client can select the identifier carrying the corresponding search condition to perform the connection query.

The embodiments described above are described in order to facilitate the understanding and application of the present invention to those skilled in the art, and it will be apparent to those skilled in the art that various modifications may be made to the embodiments described above and that the general principles described herein may be applied to other embodiments without the need for inventive faculty. Therefore, the present invention is not limited to the above-described embodiments, and those skilled in the art, based on the present disclosure, should make improvements and modifications within the scope of the present invention.

Claims

1. A classification retrieval method based on the non-tamperable characteristic of block chain data is characterized by comprising the following steps:

In the searching process, bit operation and mathematical operation are taken as main materials, the result obtained by final searching is the relative offset of the data meeting the condition relative to the first data, and finally the head is moved in the data file by calculating the offset to perform accurate I/O;

the searching steps are as follows:

Step A, determining the features to be searched;

2. The method of claim 1, wherein the index order of all data in the leaf nodes is consistent with the time sequence of the data in the blockchain, and the order in which all data in the nodes of all leaves are arranged from left to right is the order in which the data in the blockchain are arranged in time sequence.

3. The method of claim 1, wherein the searching process is to search for features of different classifications by classifying different characteristics of all attributes of each piece of data, and F _n features are obtained after classification, and F _n is determined by a specific application scene.

4. The method of claim 3, wherein each piece of data has F _n features, corresponding to F _n bits in the code, the binary bit arrangement sequence in each feature code is the same as the time sequence of the data in the block chain, and each node has at most B _n child nodes, wherein B _n is a constant greater than 2;

B _n data packets are a leaf node, B _n leaf nodes are grouped into child nodes of an intermediate node, B _n intermediate nodes are grouped into child nodes of a root node, and finally each node is represented by F _n bit codes, each feature code is provided with B _n bits, and the B _n data or child nodes respectively correspond to each other.

5. The method of claim 1, wherein the plurality of combined trees corresponding to the root nodes form a search forest, and only the last tree is possibly incomplete in the construction process.

6. The method of claim 1, wherein the leaf nodes are retrieved to obtain the relative positions of all 1 s in the feature codes in the binary codes, the relative positions of the data are corresponding to the relative positions, and the positions of the result data in the file can be obtained by calculating the head offset according to the relative positions.

7. The method of claim 1, wherein the traversal process searches for all 1 s in the binary system, calculates the relative offset of the 1 s relative to the first node in the same layer by finding the 1 s position, and recursively obtains the relative offset of all the data satisfying the condition relative to the first data.

8. A link query method based on the non-tamperable characteristic of blockchain data is characterized in that a client side carries an identifier in each retrieval selection, a server side calculates the identifier to obtain a stopping position of the client side when the client side retrieves the same problem last time, data meeting the condition is retrieved from the stopping position to the latest data, finally, data meeting the retrieval condition in the data updated after the last retrieval is obtained, and the server side returns the data and the new identifier to the client side.

9. The method of claim 8, wherein the step of calculating the identifier to obtain a certain leaf node stopped in a certain search tree when the client searches the same problem last time, and finally obtaining a certain bit of the feature code in the leaf node;

The server returns a new identifier to the client after each search is completed, and the new identifier is used for recording the position where the search of the client is stopped, the identifier is automatically saved by the client, and the client selects the identifier carrying the corresponding search condition for connection inquiry when searching is performed next time.