Disclosure of Invention
In view of the above, the application provides a retrieval structure and a link up query method based on the non-tamperable characteristic of data in a blockchain, which can realize constant maintenance cost without increasing with the increase of the data scale.
In a first aspect, the present application provides an efficient classification retrieval structure based on the non-tamperable nature of blockchain data, in which:
Classifying the data attributes according to the characteristics, wherein whether each data has a certain characteristic is identified by a bit of binary system, 0 represents none, 1 represents none, and the data of the final index maintains the time sequence of the data on the block chain;
in the searching process, bit operation and mathematical operation are taken as main materials, the final searching result is the relative offset of the data meeting the condition relative to the first data, and finally the head is moved in the data file by calculating the offset to perform accurate I/O.
The second aspect of the application is based on the classified retrieval structure, and further provides a link query method based on the non-tamperable characteristic of the blockchain data, wherein each time the client retrieves and selects to carry an identifier, the server calculates the identifier to obtain the stopping position of the client when the client retrieves the same problem last time, retrieves the data meeting the condition from the stopping position to the latest data, finally obtains the data meeting the retrieval condition from the data updated after the last retrieval, and the server returns the data and the new identifier to the client.
The high-efficiency classified retrieval structure realizes constant maintenance cost, namely the maintenance cost cannot be increased along with the increase of the total data amount, and accurate I/O can be performed during retrieval, so that a large amount of redundant I/O is avoided. In addition, the application also provides the link inquiry, based on the characteristic that the data on the blockchain is not tamperable, the data obtained by the client after one-time retrieval is latest and cannot be modified, and the client only needs to retrieve the data meeting the condition in the updated data when the client retrieves the same problem next time, so that the retrieval cost is greatly reduced.
Detailed Description
In order to describe the present invention more specifically, the following describes the technical scheme of the present invention in detail in connection with the specific embodiments.
The method comprises the steps of classifying data according to data scenes and demand scenes, classifying the data according to the data scenes and the demand scenes, wherein the same piece of data can have multiple characteristics, classifying the data according to keywords, classifying the data according to a range of a certain attribute value of the data, finally obtaining the category number F n of the characteristics, and determining the maximum number B n,Bn of sub-nodes of each node to be a constant larger than 2 according to the data scale and the user demand.
When the classification search structure is established, the data on the block chain are ordered according to the time sequence, each data is encoded one by one, each data has F n bits, 1 and 0 of each bit respectively represent whether the data has or does not have F n characteristics, and F n is determined by a specific application scene. B n data are grouped from left to right, and the characteristic bits of the B n data representing each characteristic are divided together in the same manner to obtain F n characteristic codes. The B n data form a leaf node, and the B n leaf nodes are partitioned together to form child nodes of an intermediate node in the same manner, and the intermediate node has F n feature codes, and each of the left-to-right B n bits in each feature code respectively indicates whether the left-to-right leaf nodes have data containing the feature. Similarly, the B n intermediate nodes are partitioned together to form the child nodes of a root node, where the root node has F n feature codes, and each of the left to right B n bits in each feature code indicates whether the subtrees from left to right thereof have data containing the feature. Thus, a classification search tree is constructed, and when all data are constructed, a classification search forest is obtained.
In a preferred example, a plurality of trees corresponding to the combined root nodes form a search forest, only the last constructed tree may not be completed in the construction process, and when new data is added, only the latest constructed tree needs to be maintained without adjusting the whole index structure.
In a preferred example, after the leaf node is retrieved, the relative positions of all 1s in the feature code in the binary code are obtained, and the relative positions of the data are compared with the relative positions, and the position of the result data in the file can be obtained by calculating the head offset according to the relative positions.
In a preferred example, the traversal process is to find all 1s in the binary system, calculate their relative offsets with respect to the first node of the first layer by finding the position of this 1, and recursively obtain the relative offsets of all the data satisfying the condition with respect to the first data.
Based on the efficient classified retrieval structure, the specific steps of the retrieval method are as follows:
Step A, determining the characteristics to be searched, wherein the characteristics can be single characteristics or combination of multi-characteristic logic operation, and the following searching processes all carry out the same characteristic logic operation;
searching in all root nodes in the search forest, and finding all 1 s in the feature code binary system after the combination of the features of the designated features or the logic operation through bit operation;
calculating the relative offset of the 1 relative to the first bit in all root node feature code binary, so as to find the intermediate node to be searched;
Step D, searching in all the intermediate nodes found in the step C, and finding all 1 s in the feature code binary system after the combination of the features of the specified features or the logic operation through bit operation;
Step E, calculating the relative offsets of the 1 relative to the first bit in all intermediate node feature code binary, so as to find the leaf node to be searched;
Step F, searching in all leaf nodes found in the step E, and finding all 1s in the feature code binary system after the combination of the features of the specified feature or the logic operation through bit operation;
Calculating the relative offsets of the 1 relative to the first bit in all leaf node feature code binary, so as to find the data to be retrieved;
And H, calculating the relative offset of the magnetic head to be moved according to the offset obtained in the step G, and reading in a data file.
When new data is added, only F n bits need to be added to indicate whether the data has the characteristics. Additionally, the worst case is that the node needs to be extended, as in fig. 2, although the overhead of maintenance is still on the order of a constant.
In addition, because the data on the blockchain has the property of being non-tamperable, after the client side performs one-time searching, the data obtained by the searching is not modified, which means that the data are always up to date, if the data on the blockchain are updated, the client side performs searching again for the same problem, and only the data meeting the condition need to be searched in the newly-increased data, and repeated searching is not needed, so the embodiment of the application also provides the following link inquiry method:
The client can select to carry an identifier every time, the server calculates the identifier to obtain the stopping position of the client when the client searches the same problem last time, and the server returns the data and the new identifier to the client after the stopping position reaches the data meeting the condition in the latest data and finally obtains the data meeting the searching condition in the data updated after the last search.
In some embodiments, by calculating this identification, it is possible to obtain which search tree the client stopped at when retrieving the same problem last time, and thus which leaf node stopped in this search tree, and finally which bit of this leaf node is encoded for the feature.
In some embodiments, the client will skip all data previously retrieved when retrieving the same problem again, and retrieve data meeting the criteria only among the newly updated data that the client did not retrieve.
In some embodiments, the server returns a new identifier to the client after each search is completed, and the new identifier is used for recording the position where the search of the client is stopped, the identifier is saved by the client, and when the search is performed next time, the client can select the identifier carrying the corresponding search condition to perform the connection query.
The embodiments described above are described in order to facilitate the understanding and application of the present invention to those skilled in the art, and it will be apparent to those skilled in the art that various modifications may be made to the embodiments described above and that the general principles described herein may be applied to other embodiments without the need for inventive faculty. Therefore, the present invention is not limited to the above-described embodiments, and those skilled in the art, based on the present disclosure, should make improvements and modifications within the scope of the present invention.