Enabling Fine-grained Multi-keyword Search

Abstract—Using cloud computing, individuals can store their data on remote servers and allow data access to public users through the cloud servers. As the outsourced data are likely to contain sensitive privacy information, they are typically encrypted before uploaded to the cloud. This, however, significantly limits the usability of outsourced data due to the difficulty of searching over the encrypted data. In this paper, we address this issue by developing the fine-grained multi-keyword search schemes over encrypted cloud data. Our original contributions are three-fold. First, we introduce the relevance scores and preference factors upon keywords which enable the precise keyword search and personalized user experience. Second, we develop a practical and very efficient multi-keyword search scheme.
The proposed scheme can support complicated logic search the mixed “AND”, “OR” and “NO”  perations of keywords. Third, we further employ the classified sub-dictionaries technique to achieve better efficiency on index building, trapdoor generating and query. Lastly, we analyze the security of the proposed schemes in terms of confidentiality of documents, privacy protection of index and trapdoor, and unlinkability of trapdoor. Through extensive experiments using the real-world dataset, we validate the performance of the proposed schemes. Both the security analysis and experimental results demonstrate that the proposed schemes can achieve the same security level comparing to the existing ones and better performance in terms of functionality, query complexity and efficiency.

INTRODUCTION
The cloud computing treats computing as a utility and leases out the computing and storage capacities to the public individuals [1], [2], [3]. In such a framework, the individual can remotely store her data on the cloud server, namely data outsourcing, and then make the cloud data open for public access through the cloud server. This represents a more scalable, low-cost and stable way for public data access because of the scalability and high efficiency of cloud servers, and therefore is favorable to small enterprises.

Note that the outsourced data may contain sensitive privacy information. It is often necessary to encrypt the private data before transmitting the data to the cloud servers [4], [5]. The data encryption, however, would significantly lower the usability of data due to the difficulty of searching over the encrypted data [6]. Simply encrypting the data may still cause other security concerns. For instance, Google Search uses SSL (Secure Sockets Layer) to encrypt the connection between search user and Google server when private data, such as documents and emails, appear in the search results [7].
However, if the search user clicks into another website from the search results page, that website may be able to identify the search terms that the user has used.
On addressing above issues, the searchable encryption (e.g., [8], [9], [10]) has been recently developed as a fundamental approach to enable searching over encrypted cloud data, which proceeds the following operations. Firstly, the data owner needs to generate several keywords according to the outsourced data. These keywords are then encrypted and stored at the cloud server. When a search user needs to access the outsourced data, it can select some relevant keywords and send the ciphertext of the selected keywords to the cloud server. The cloud server then uses the ciphertext to match the outsourced encrypted keywords, and lastly returns the matching results to the search user. To achieve the similar
search efficiency and precision over encrypted data as that of plaintext keyword search, an extensive body of research has been developed in literature. Wang et al. [11] propose a ranked keyword search scheme which considers the relevance scores of keywords. Unfortunately, due to using order-preserving
encryption (OPE) [12] to achieve the ranking property, the proposed scheme cannot achieve unlinkability of trapdoor.
Later, Sun et al. [13] propose a multi-keyword text search scheme which considers the relevance scores of keywords and utilizes a multidimensional tree technique to achieve efficient search query. Yu et al. [14] propose a multi-keyword top-k retrieval scheme which uses fully homomorphic encryption to
encrypt the index/trapdoor and guarantees high security. Cao et al. [6] propose a multi-keyword ranked search (MRSE), which applies coordinate machine as the keyword matching rule, i.e., return data with the most matching keywords.
Although many search functionalities have been developed in previous literature towards precise and efficient searchable encryption, it is still difficult for searchable encryption to achieve the same user experience as that of the plaintext search, like Google search. This mainly attributes to following
two issues. Firstly, query with user preferences is very popular in the plaintext search [15], [16]. It enables personalized search and can more accurately represent user’s requirements, but has
not been thoroughly studied and supported in the encrypted data domain. Secondly, to further improve the user’s experience on searching, an important and fundamental function is to enable the multi-keyword search with the comprehensive logic operations, i.e., the “AND”, “OR” and “NO” operations
of keywords. This is fundamental for search users to prune the searching space and quickly identify the desired data.
Cao et al. [6] propose the coordinate matching search scheme (MRSE) which can be regarded as a searchable encryption scheme with “OR” operation. Zhang et al. [17] propose a conjunctive keyword search scheme which can be regarded as a searchable encryption scheme with “AND” operation with
the returned documents matching all keywords. However, most existing proposals can only enable search with single logic operation, rather than the mixture of multiple logic operations on keywords, which motivates our work. In this work, we address above two issues by developing two Fine-grained Multi-keyword Search (FMS) schemes over encrypted cloud data. Our original contributions can be summarized in three aspects as follows:
• We introduce the relevance scores and the preference factors of keywords for searchable encryption. The relevance scores of keywords can enable more precise returned results, and the preference factors of keywords represent the importance of keywords in the search keyword set specified by search users and correspondingly enables personalized search to cater to specific user preferences. It thus further improves the search functionalities and user experience.
• We realize the “AND”, “OR” and “NO” operations in the multi-keyword search for searchable encryption. Compared with schemes in [6], [13] and [14], the proposed scheme can achieve more comprehensive functionality and lower query complexity.
• We employ the classified sub-dictionaries technique to enhance the efficiency of the above two schemes. Extensive experiments demonstrate that the enhanced schemes can achieve better efficiency in terms of index building, trapdoor generating and query in the comparison with schemes in [6], [13] and [14].