Data Deduplication for Cloud Services
Data deduplication, in broad terms, refers to a specialized data compression technique that eliminates duplicate copies of repeating data (Puzio et al. 370; Srinivasan 50; Wang 31). This popular technique allows cloud storage providers to significantly reduce the storage space they need. Fuller et al. (671) summarize the principle of data deduplication: "only a single copy of each piece of data is stored." Although deduplication removes data replication and redundancy, some researchers, including Puzio et al. (369), have observed that it raises major data security and privacy issues for cloud computing users. Nevertheless, several state-of-the-art technologies for handling security issues during deduplication have been introduced in recent years. This paper briefly describes these technologies and compares them.
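As a minimal sketch of the principle quoted above, the following Python class splits data into fixed-size chunks, identifies each chunk by its SHA-256 digest, and stores each distinct chunk only once, keeping a per-file list of chunk references. The class name `DedupStore` and the 4-byte chunk size are hypothetical choices for illustration; real systems use much larger, often variable-size, chunks.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store that keeps each distinct chunk exactly once."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size  # hypothetical fixed chunk size in bytes
        self.chunks = {}              # SHA-256 hex digest -> chunk bytes
        self.files = {}               # file name -> ordered list of chunk digests

    def put(self, name, data):
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # store the chunk only if unseen
            recipe.append(digest)
        self.files[name] = recipe

    def get(self, name):
        # Rebuild the file from its ordered list of chunk references.
        return b"".join(self.chunks[d] for d in self.files[name])

    def stored_bytes(self):
        return sum(len(c) for c in self.chunks.values())


store = DedupStore(chunk_size=4)
store.put("a.txt", b"ABCDABCDABCD")  # three identical chunks
store.put("b.txt", b"ABCDWXYZ")      # shares "ABCD" with a.txt
print(store.stored_bytes())  # 8 bytes stored for 20 bytes of logical data
```

Twenty bytes of logical data occupy only eight bytes of storage, because the repeated chunk is kept once and merely referenced by both files.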
State-of-the-Art Technologies
Most users of cloud computing services cite data confidentiality and security as their main concerns when storing data online. Consequently, various state-of-the-art technologies that aim to enhance security during the data deduplication process have emerged in recent years.
Convergent encryption refers to a cryptosystem that produces identical ciphertext from identical plaintext files (Dasgupta and Naseem 115). Douceur et al. (11) introduced this technology in an attempt to combine data confidentiality with the possibility of data deduplication. Convergent encryption encrypts the plaintext under a deterministic symmetric encryption scheme, using a key that is itself derived deterministically from the plaintext. Specifically, each data copy is encrypted under a convergent key, which the user derives by computing the cryptographic hash value of the data copy's content. After generating the key and encrypting the data, users retain the keys and send the ciphertext to the cloud.
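The steps above can be sketched in Python as follows. Because the standard library has no block cipher, the sketch assumes a toy deterministic stream cipher built from SHA-256 in counter mode; a real system would use a vetted cipher such as AES. The function names `keystream`, `convergent_key`, `ce_encrypt`, and `ce_decrypt` are hypothetical.

```python
import hashlib
from typing import Tuple


def keystream(key: bytes, length: int) -> bytes:
    """Toy deterministic keystream: SHA-256(key || counter) blocks (not a vetted cipher)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def convergent_key(plaintext: bytes) -> bytes:
    # The convergent key is the cryptographic hash of the data copy itself.
    return hashlib.sha256(plaintext).digest()


def ce_encrypt(plaintext: bytes) -> Tuple[bytes, bytes]:
    """Return (key, ciphertext); the same plaintext always yields the same pair."""
    key = convergent_key(plaintext)
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, keystream(key, len(plaintext))))
    return key, ciphertext


def ce_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    # XOR with the same keystream undoes the encryption.
    return bytes(c ^ k for c, k in zip(ciphertext, keystream(key, len(ciphertext))))


# The user keeps `key`; only `ct` would be sent to the cloud.
key, ct = ce_encrypt(b"project plan v1")
print(ce_decrypt(key, ct))  # b'project plan v1'
```

Because both the key and the keystream depend only on the plaintext, encryption is fully deterministic, which is exactly the property that makes deduplication of ciphertexts possible.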
Fuller et al. (671) observe that because the encryption is deterministic, identical copies of data generate the same key and the same ciphertext. This, in turn, allows cloud providers to perform deduplication on the ciphertext. For instance, Carstensen et al. (48) point out that if two users, Alice and Bob, start with the same plaintext m, they arrive at the same ciphertext, so deduplication is possible. Vendors and projects known to use this technology include Permabit, GNUnet, and Freenet.
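The Alice-and-Bob example can be run end to end in a short Python sketch. It again assumes a toy SHA-256 counter-mode keystream as a stand-in for a real deterministic cipher, and the `CloudServer` class is hypothetical; the point is only that the provider can detect duplicates by comparing ciphertexts, without ever seeing the plaintext or the keys.

```python
import hashlib


def ce_encrypt(plaintext):
    """Convergent encryption with a toy SHA-256 counter-mode keystream (illustrative only)."""
    key = hashlib.sha256(plaintext).digest()  # convergent key K = H(m)
    stream, counter = b"", 0
    while len(stream) < len(plaintext):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return key, bytes(p ^ s for p, s in zip(plaintext, stream))


class CloudServer:
    """Hypothetical provider that deduplicates ciphertext without seeing plaintext or keys."""

    def __init__(self):
        self.blobs = {}   # ciphertext digest -> ciphertext (stored once)
        self.owners = {}  # ciphertext digest -> set of users referencing it

    def upload(self, user, ciphertext):
        tag = hashlib.sha256(ciphertext).hexdigest()
        duplicate = tag in self.blobs
        self.blobs.setdefault(tag, ciphertext)
        self.owners.setdefault(tag, set()).add(user)
        return duplicate  # True when an identical ciphertext was already stored


m = b"same plaintext m"
alice_key, alice_ct = ce_encrypt(m)  # Alice and Bob encrypt independently...
bob_key, bob_ct = ce_encrypt(m)      # ...but obtain the same key and ciphertext
server = CloudServer()
print(server.upload("alice", alice_ct))  # False: first copy is stored
print(server.upload("bob", bob_ct))      # True: deduplicated, no second copy
print(len(server.blobs))                 # 1
```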
Proof of Ownership (POW) technology
Proof of Ownership is a state-of-the-art technique developed by Halevi et al. (491) to strengthen the security of deduplication for cloud storage. After deduplication, a significant risk arises: a user may claim ownership of data stored on the cloud server without actually possessing it. Proof of Ownership emerged to mitigate the risk posed by such opportunistic users. Its underlying idea is to solve the problem of using a small hash value as a proxy for the whole file during client-side deduplication, where anyone who learns the hash could otherwise obtain the file.
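The risk described above can be illustrated with a minimal Python sketch of a naive client-side deduplication server that accepts a file's hash as if it were the file. The class `NaiveClientSideServer` and its methods are hypothetical and deliberately insecure; they only show why a short hash is a poor proxy for possession.

```python
import hashlib


class NaiveClientSideServer:
    """Hypothetical, deliberately insecure server that treats a file hash as proof of possession."""

    def __init__(self):
        self.files = {}   # SHA-256 hex digest -> file contents
        self.owners = {}  # SHA-256 hex digest -> set of users

    def upload(self, user, data):
        digest = hashlib.sha256(data).hexdigest()
        self.files.setdefault(digest, data)
        self.owners.setdefault(digest, set()).add(user)
        return digest

    def claim_by_hash(self, user, digest):
        # Client-side deduplication shortcut: if the hash is already known,
        # skip the upload and register the user as an owner.
        if digest in self.files:
            self.owners[digest].add(user)
            return True
        return False

    def download(self, user, digest):
        if user not in self.owners.get(digest, set()):
            raise PermissionError("user does not own this file")
        return self.files[digest]


server = NaiveClientSideServer()
secret = b"Alice's confidential contract"
leaked_hash = server.upload("alice", secret)
# Mallory never had the file, only its short hash value:
server.claim_by_hash("mallory", leaked_hash)
print(server.download("mallory", leaked_hash) == secret)  # True
```

A user who never held the file obtains it from nothing more than a 64-character hash string, which is precisely the gap that a proof of ownership is meant to close.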
This technology is implemented as an interactive protocol run between the prover (the client) and the verifier (the storage server). The storage server derives a short value φ(m) from a copy of the data m. To prove ownership of m, the client must compute a value φ' and run the proof protocol with the verifier; the proof passes only if φ' = φ(m). In this way, the proof mechanism enhances security in client-side deduplication (Srinivasan 50; Wang 31), because the client can prove to the server that it actually holds the file. Scholars such as Stanek et al. (363) have supported client-side deduplication methods that use Proof of Ownership because they enable clients to prove their ownership of data copies to the cloud or storage server.
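To make the prover/verifier exchange concrete, here is a minimal Python sketch. It is a simplified challenge-response in which φ(m) is an HMAC of the file keyed with a fresh server nonce; this choice is an assumption for illustration and is not the construction of Halevi et al., whose schemes are built on Merkle trees over an encoding of the file. The names `Verifier`, `prove`, and `phi_prime` are hypothetical.

```python
import hashlib
import hmac
import secrets


class Verifier:
    """Storage server that holds the full file m (hypothetical class)."""

    def __init__(self, m):
        self.m = m
        self.nonce = None

    def challenge(self):
        # A fresh random nonce per run stops a prover from replaying an old answer.
        self.nonce = secrets.token_bytes(16)
        return self.nonce

    def verify(self, phi_prime):
        if self.nonce is None:
            return False  # no challenge has been issued yet
        # phi(m): a short value that depends on the nonce and on every byte of m.
        phi_m = hmac.new(self.nonce, self.m, hashlib.sha256).digest()
        return hmac.compare_digest(phi_prime, phi_m)


def prove(nonce, file_bytes):
    """Prover (client) side: computing phi' requires the whole file, not just its hash."""
    return hmac.new(nonce, file_bytes, hashlib.sha256).digest()


m = b"contents of the shared file"
server = Verifier(m)
print(server.verify(prove(server.challenge(), m)))  # True: the client holds m
# An opportunistic user who knows only the file's hash cannot answer:
print(server.verify(prove(server.challenge(), hashlib.sha256(m).digest())))  # False
```

Because the nonce changes on every run, a valid φ' cannot be precomputed or reused; it can only be produced by someone who has all of m at proof time.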
DupLESS (Server-aided CE)
DupLESS (server-aided encryption for deduplicated storage) is a state-of-the-art technique that modifies convergent encryption by adding a secure component, a dedicated key server, for key generation. As described by Li et al. (1296), DupLESS clients encrypt their data under message-based keys, which they obtain from a key server via a pseudorandom function (PRF) protocol. This enables users to store their encrypted data with an existing storage service. The cloud service provider then performs deduplication on behalf of the clients while still achieving strong confidentiality. Puzio et al. (369) agree that DupLESS not only achieves data confidentiality and good performance but also saves space by removing data replication and redundancy.
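The key-retrieval step can be sketched in Python. The sketch assumes that the PRF protocol is realized with RSA blind signatures, an oblivious PRF in which the key server never sees the client's message hash; this is how the original DupLESS design instantiates it, but the toy RSA parameters, the class `KeyServer`, and the function `message_key` are hypothetical and insecure.

```python
import hashlib
import math
import secrets

# Toy RSA modulus for the key server: two Mersenne primes, far too small for real use.
# In a real deployment only N and E would be public; P and Q stay on the key server.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537
N_BYTES = (N.bit_length() + 7) // 8


class KeyServer:
    """Holds the secret exponent; it only ever sees blinded values."""

    def __init__(self):
        self._d = pow(E, -1, (P - 1) * (Q - 1))  # modular inverse (Python 3.8+)

    def evaluate(self, blinded):
        return pow(blinded, self._d, N)


def message_key(m, server):
    """Client side: obtain a deterministic, server-aided key for message m."""
    h = int.from_bytes(hashlib.sha256(m).digest(), "big") % N
    while True:  # random blinding factor r, invertible mod N
        r = secrets.randbelow(N - 2) + 2
        if math.gcd(r, N) == 1:
            break
    blinded = (h * pow(r, E, N)) % N    # hides h from the key server
    signed = server.evaluate(blinded)   # equals h^d * r mod N
    s = (signed * pow(r, -1, N)) % N    # unblind: s = h^d mod N
    if pow(s, E, N) != h:               # check the server's answer
        raise ValueError("key server returned an invalid response")
    return hashlib.sha256(s.to_bytes(N_BYTES, "big")).digest()


server = KeyServer()
k1 = message_key(b"quarterly report", server)
k2 = message_key(b"quarterly report", server)  # different r, same key
print(k1 == k2)  # True
```

Identical files still map to identical keys, so deduplication keeps working, but the key now depends on a secret held by the key server. An attacker who can guess candidate files can therefore no longer derive their keys offline, as it can with plain convergent encryption. The resulting key would then drive a deterministic encryption step like the one used in convergent encryption.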
ClouDedup is a newer state-of-the-art technique that provides block-level deduplication while coping with the inherent security weaknesses of convergent encryption (CE); it is therefore considered an advancement of CE. ClouDedup has two basic components: a gateway, which is in charge of access control and provides the main protection against potential attacks, and a metadata manager, which is in charge of the actual deduplication as well as key management operations. The main goal of this technology, as described by Fuller et al. (672), is to preserve data confidentiality without losing the benefits of deduplication. In particular, as Stanek et al. (365) note, the method guarantees confidentiality for all files that users store in the cloud. The diagram below summarizes how ClouDedup works.
As the diagram above shows, ClouDedup is a heavily modified encryption technique that uses several mechanisms, including the gateway and the metadata manager, to enhance security during deduplication. The gateway prevents attacks against convergent encryption by encrypting the ciphertexts that result from CE with an additional encryption algorithm, using the same keying material for all inputs so that identical blocks still produce identical outputs and can be deduplicated. The metadata manager, as shown in the diagram, manages the storage of data blocks together with block signatures and encrypted keys. In essence, it maintains a small database and a linked list to keep track of file composition and file ownership while avoiding storing multiple copies of the same data segment.
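A compact Python sketch of this pipeline follows, under several stated assumptions: blocks are fixed 8-byte slices, a toy SHA-256 keystream stands in for real ciphers, and the gateway's layer is a toy SIV-style deterministic scheme (a MAC-derived IV followed by a stream cipher) rather than the algorithm ClouDedup itself specifies. All class and function names are hypothetical, and for brevity the client keeps its block keys instead of the metadata manager storing them in encrypted form.

```python
import hashlib
import hmac
import secrets

BLOCK_SIZE = 8  # hypothetical block size in bytes


def xor_stream(key, data):
    """Toy stream cipher: XOR with SHA-256(key || counter) blocks."""
    stream, counter = b"", 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


def client_encrypt(data):
    """Block-level convergent encryption: returns (block keys, CE ciphertext blocks)."""
    keys, blocks = [], []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        key = hashlib.sha256(block).digest()  # convergent key for this block
        keys.append(key)
        blocks.append(xor_stream(key, block))
    return keys, blocks


class Gateway:
    """Adds a second, deterministic encryption layer with ONE secret for all inputs."""

    def __init__(self):
        self._mac_key = secrets.token_bytes(32)
        self._enc_key = secrets.token_bytes(32)

    def wrap(self, ce_block):
        # Identical CE blocks still give identical outputs (so they deduplicate),
        # but without the gateway secret nobody can recompute them from a guess.
        siv = hmac.new(self._mac_key, ce_block, hashlib.sha256).digest()[:16]
        return siv + xor_stream(self._enc_key + siv, ce_block)

    def unwrap(self, wrapped):
        siv, body = wrapped[:16], wrapped[16:]
        ce_block = xor_stream(self._enc_key + siv, body)
        check = hmac.new(self._mac_key, ce_block, hashlib.sha256).digest()[:16]
        if not hmac.compare_digest(siv, check):
            raise ValueError("wrapped block failed its integrity check")
        return ce_block


class Node:
    """Linked-list node: one block signature in a file's composition."""

    def __init__(self, signature, next_node=None):
        self.signature = signature
        self.next = next_node


class MetadataManager:
    """Stores each distinct block once and tracks file composition and ownership."""

    def __init__(self):
        self.blocks = {}  # block signature -> wrapped block (stands in for cloud storage)
        self.files = {}   # (owner, file name) -> head Node of the file's block list

    def store(self, owner, name, wrapped_blocks):
        head = None
        for wrapped in reversed(wrapped_blocks):
            signature = hashlib.sha256(wrapped).hexdigest()
            self.blocks.setdefault(signature, wrapped)  # duplicate blocks are not stored again
            head = Node(signature, head)
        self.files[(owner, name)] = head

    def load(self, owner, name):
        out, node = [], self.files[(owner, name)]
        while node is not None:
            out.append(self.blocks[node.signature])
            node = node.next
        return out


def upload(owner, name, data, gateway, manager):
    keys, ce_blocks = client_encrypt(data)
    manager.store(owner, name, [gateway.wrap(b) for b in ce_blocks])
    return keys  # simplification: the client keeps its block keys


def download(owner, name, keys, gateway, manager):
    wrapped = manager.load(owner, name)
    return b"".join(xor_stream(k, gateway.unwrap(w)) for k, w in zip(keys, wrapped))
```

Uploading the same 8-byte block from two different users stores it only once in `MetadataManager.blocks`, while each user's linked list still records the full composition of that user's file.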
Comparison of the State-of-the-Art Technologies
As demonstrated in the previous section, various technologies aimed at improving data confidentiality and security have emerged in recent years. While some of these technologies are modifications of others, some are entirely new technologies introduced to address security issues in data deduplication for cloud storage. This section compares them.
First, three of the technologies (convergent encryption, Proof of Ownership, and ClouDedup) are client-side deduplication techniques (Rass and Slamanig 13; Chang et al. 27). In client-side deduplication, the client software transmits the hash value of the file or data to the cloud storage provider. In all three technologies, the client plays an important role in securing its data. Both convergent encryption and ClouDedup involve generating an encryption key, which users derive by computing the cryptographic hash value of the content of the data copy (Carstensen et al. 49; Dasgupta and Naseem 119). Likewise, in both cases, after users have generated the key and encrypted the data, they retain the keys while the ciphertext is sent to the cloud.
By contrast, unlike these three technologies, DupLESS enhances data confidentiality in server-aided deduplication. Here, the cloud service provider performs deduplication on behalf of the user, while strong confidentiality is achieved through server-aided key generation. Another notable difference among these technologies is whether they rely on key generation or on an interactive algorithm for security (Park 91; Dasgupta and Naseem 37). While DupLESS, ClouDedup, and convergent encryption generate encryption keys during deduplication and data storage, Proof of Ownership relies on an interactive algorithm that the client uses to prove that it owns the data stored in the cloud. Despite these similarities and differences, all of these technologies play an important role in enhancing security and data confidentiality in cloud storage.
In conclusion, various state-of-the-art technologies have emerged in recent years to help ensure security during data deduplication for cloud services. Although these technologies have their own strengths and limitations, researchers such as Li et al. (1297) agree that they have played a critical role in enhancing security in cloud storage.
Works Cited
Carstensen, Jared, Bernard Golden, and JP Morgenthal. Cloud Computing: Assessing the Risks. Ely: IT Governance Publishing, 2012. Print.
Chang, Victor, Robert J. Walters, and Gary Wills. Delivery and Adoption of Cloud Computing Services in Contemporary Organizations. Hershey, PA: Information Science Reference, 2015. Print.
Dasgupta, Dipankar, and Durdana Naseem. A Framework for Compliance and Security Coverage Estimation for Cloud Services. Hershey, PA: Information Science Reference, 2014. Print.
Douceur, J. R., A. Adya, J. Benaloh, W. J. Bolosky, and G. Yuval. "A Secure Directory Service Based on Exclusive Encryption." IEEE Publications (2002): 10-32. Print.
Fuller, Benjamin, Adam O'Neill, and Leonid Reyzin. "A Unified Approach to Deterministic Encryption: New Constructions and a Connection to Computational Entropy." Journal of Cryptology 28.3 (2015): 671-717. Print.
Halevi, S., D. Harnik, A. Shulman-Peleg, and B. Pinkas. "Proofs of Ownership in Remote Storage Systems." Proceedings of the ACM Conference on Computer and Communications Security (2011): 491-500. Print.
Li, Jin, Yan K. Li, Xiaofeng Chen, Patrick P. C. Lee, and Wenjing Lou. "A Hybrid Cloud Approach for Secure Authorized Deduplication." IEEE Transactions on Parallel and Distributed Systems 26.5 (2015): 1206-1216. Print.
Park, James J. Frontier and Innovation in Future Computing and Communications. Dordrecht: Springer, 2014. Print.
Prajapati, Priteshkumar, and Parth Shah. "Efficient Cross User Data Deduplication in Remote Data Storage." IEEE Publications Database (2014): 1-5. Print.
Puzio, Pasquale, Refik Molva, Melek Onen, and Sergio Loureiro. "ClouDedup: Secure Deduplication with Encrypted Data for Cloud Storage." IEEE Publications Database (2013): 363-370. Print.
Rass, Stefan, and Daniel Slamanig. Cryptography for Security and Privacy in Cloud Computing. Boston: Artech House, 2014. Print.
Srinivasan, S. Security, Trust, and Regulatory Aspects of Cloud Computing in Business Environments. Hershey, PA: Information Science Reference, 2014. Print.
Stanek, Jan, Alessandro Sorniotti, Elli Androulaki, and Lukas Kencl. "A Secure Data Deduplication Scheme for Cloud Storage." Journal of Computer Science.
Wang, Lizhe. Cloud Computing: Methodology, Systems, and Applications. Boca Raton, FL: CRC Press, 2012. Print.