
Data Deduplication for Cloud Services

Updated on June 24, 2018

Data deduplication, in broad terms, refers to a specialized data compression method that eliminates duplicate copies of repeating data (Puzio et al. 370; Srinivasan 50; Wang 31). In essence, this popular technique allows cloud storage providers to significantly reduce the amount of storage space they need. Fuller et al. (671) describe the principle of data deduplication as straightforward, that is, ‘only a single copy of each piece of data is stored.’ Although deduplication removes data replication and redundancy, some researchers, including Puzio et al. (369), have observed that it introduces major data security and privacy issues for cloud computing users. Nevertheless, in recent years, several state of the art technologies for handling security issues during deduplication have been introduced. This paper provides brief descriptions of these technologies as well as a comparison of them.
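The ‘single copy’ principle can be illustrated with a minimal sketch of a content-addressed store. It is not taken from any of the cited works: the class name DedupStore, its methods, and the sample data are hypothetical, and SHA-256 is assumed as the fingerprint function.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store (hypothetical): each distinct piece of data is kept once."""

    def __init__(self):
        self.chunks = {}        # fingerprint -> stored bytes
        self.logical_bytes = 0  # total bytes clients asked to store

    def put(self, data: bytes) -> str:
        fingerprint = hashlib.sha256(data).hexdigest()
        self.logical_bytes += len(data)
        # Keep the bytes only if this content has not been seen before.
        self.chunks.setdefault(fingerprint, data)
        return fingerprint

    def get(self, fingerprint: str) -> bytes:
        return self.chunks[fingerprint]

    def physical_bytes(self) -> int:
        # Bytes actually held after deduplication.
        return sum(len(chunk) for chunk in self.chunks.values())


store = DedupStore()
ref_a = store.put(b"quarterly report" * 100)
ref_b = store.put(b"quarterly report" * 100)  # duplicate upload, stored once
ref_c = store.put(b"holiday photo" * 100)
```

Here the two identical uploads resolve to the same reference, so the store holds two entries while three were submitted. The gap between logical_bytes and physical_bytes() is the space saved.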

State of the Art Technologies

Most users of cloud computing services cite data confidentiality and security as their main concerns when storing data online. Consequently, various state of the art technologies that aim to enhance security during the data deduplication process have emerged in recent years.

Convergent Encryption

Convergent encryption refers to a cryptosystem that produces identical ciphertext from identical plaintext files (Dasgupta et al. 115). Douceur et al. (11) introduced this technology in their attempt to combine data confidentiality with the possibility of data deduplication. Convergent encryption mainly involves encrypting plaintext using a symmetric (deterministic) encryption scheme with a key that is itself derived deterministically from the plaintext. Precisely, it encrypts a data copy using a convergent key, which users derive by computing the cryptographic hash value of the content of the data copy itself. After users have generated the key and encrypted the data, they retain the keys and send the ciphertext to the cloud.

Fuller et al. (671) observe that since the encryption is deterministic in nature, identical copies of data generate the same key as well as the same ciphertext. In turn, this allows cloud providers to perform data deduplication on the ciphertext. For instance, Carstensen et al. (48) point out that if two users, Bob and Alice, start with the same plaintext m, they arrive at the same ciphertext, and hence data deduplication is possible. Systems that have been known to use this technology include Permabit, GNUnet, and Freenet.
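The Bob and Alice example can be sketched in a few lines, under loud assumptions. The function names are hypothetical, SHA-256 is assumed as the hash, and the cipher is a toy keystream built from SHA-256 that stands in for a real symmetric cipher such as AES. It only demonstrates determinism and should not be used to protect real data.

```python
import hashlib


def convergent_key(plaintext: bytes) -> bytes:
    # The key is the cryptographic hash of the content itself.
    return hashlib.sha256(plaintext).digest()


def _keystream(key: bytes, length: int) -> bytes:
    # Toy deterministic keystream (assumption): SHA-256 over key + counter.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(out[:length])


def xor_cipher(key: bytes, data: bytes) -> bytes:
    # XOR with the keystream; applying it twice restores the input.
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))


def convergent_encrypt(plaintext: bytes):
    key = convergent_key(plaintext)
    return key, xor_cipher(key, plaintext)


def convergent_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    return xor_cipher(key, ciphertext)


m = b"the same plaintext m"
alice_key, alice_ct = convergent_encrypt(m)  # Alice keeps alice_key, uploads alice_ct
bob_key, bob_ct = convergent_encrypt(m)      # Bob does the same, independently
```

Because both users derive the key from m alone, alice_ct and bob_ct are byte-for-byte identical, so the provider can keep one copy without ever seeing m. Either user's retained key decrypts that shared copy.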

Proof of Ownership (POW) technology

Proof of Ownership is a state of the art technology developed by Halevi et al. (491) in an attempt to enhance data deduplication for cloud storage. After deduplication, a significant risk arises whereby any user can claim ownership of information or data stored on the cloud server. Proof of Ownership technology emerged to mitigate the risk posed by such opportunistic users. The concept of this technology is to solve the problem of using small hash values as a proxy for the whole file during client-side deduplication.

This technology is implemented mainly as an interactive algorithm run by the prover (client) and the verifier (storage server). The storage server (verifier) derives a short value, say, φ(m), from a copy of data m. To prove ownership of m, the client (prover) must send a value φ' to the verifier and run the proof algorithm with it. The proof passes only if φ' = φ(m). Precisely, the proof mechanism in this technology provides a solution that enhances security in client-side deduplication (Srinivasan 50; Wang 31). To this extent, the client or user can prove to the server that he or she indeed has the file. Scholars such as Stanek et al. (363) have supported client-side deduplication methods that use ‘Proof of Ownership’ because it enables clients to prove their ownership of information or data copies to the cloud or storage server.
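The prover and verifier exchange can be sketched as a simple challenge-response. This is a simplified illustration, not the construction of Halevi et al.: the function names, the block size, and the random block-sampling challenge are all assumptions. Here φ is the hash of a fresh nonce together with a few randomly chosen blocks of the file.

```python
import hashlib
import hmac
import secrets

BLOCK_SIZE = 64  # assumed block size for this sketch


def split_blocks(data: bytes) -> list:
    # An empty file yields one empty block so indexing stays valid.
    return [data[i:i + BLOCK_SIZE] for i in range(0, max(len(data), 1), BLOCK_SIZE)]


def make_challenge(stored: bytes, samples: int = 4):
    # Verifier: pick a fresh nonce and random block positions.
    n_blocks = len(split_blocks(stored))
    nonce = secrets.token_bytes(16)
    indices = [secrets.randbelow(n_blocks) for _ in range(samples)]
    return nonce, indices


def respond(data: bytes, nonce: bytes, indices: list) -> bytes:
    # Prover: compute phi' over the nonce and the requested blocks.
    blocks = split_blocks(data)
    digest = hashlib.sha256(nonce)
    for i in indices:
        digest.update(blocks[i])
    return digest.digest()


def verify(stored: bytes, nonce: bytes, indices: list, response: bytes) -> bool:
    # Verifier: recompute phi(m) from its own copy and compare in constant time.
    expected = respond(stored, nonce, indices)
    return hmac.compare_digest(expected, response)


stored_file = bytes(range(256)) * 8
nonce, indices = make_challenge(stored_file)
honest = respond(stored_file, nonce, indices)
# Someone who only learned the file's short hash cannot answer the challenge.
attacker_guess = hashlib.sha256(stored_file).digest()
```

Because the nonce and block positions change on every run, knowing a fixed short hash of the file is not enough to pass. The prover needs the blocks themselves.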

DupLESS (Server-aided CE)

DupLESS, also known as server-aided encryption for deduplicated storage, is a state of the art technology that uses a modified convergent encryption scheme with a secure component for key generation. In DupLESS, as described by Li et al. (1296), clients encrypt their data under message-based keys, which they obtain from a key server via a PRF protocol. In turn, this enables users to store their encrypted data with an existing storage service. The cloud service provider performs deduplication on behalf of the clients or users while achieving strong security and confidentiality at the same time. Puzio et al. (369) agree that DupLESS technology not only achieves data confidentiality and performance but also saves space by removing data replication and data redundancy.
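The idea of a message-based key obtained from a key server can be sketched as follows. The KeyServer class, its derive method, and the secret value are hypothetical, and HMAC-SHA256 is assumed as the PRF. The published DupLESS design uses an oblivious PRF so that the key server does not learn the message hash. This sketch makes no attempt at obliviousness.

```python
import hashlib
import hmac


class KeyServer:
    """Hypothetical key server holding a secret that clients never see."""

    def __init__(self, secret: bytes):
        self._secret = secret

    def derive(self, message_hash: bytes) -> bytes:
        # PRF evaluation (assumed here to be HMAC-SHA256) keyed by the server secret.
        return hmac.new(self._secret, message_hash, hashlib.sha256).digest()


def message_key(server: KeyServer, message: bytes) -> bytes:
    # The client hashes the message and asks the server for the matching key.
    return server.derive(hashlib.sha256(message).digest())


server = KeyServer(b"key-server secret (hypothetical)")
k_alice = message_key(server, b"shared document")
k_bob = message_key(server, b"shared document")
# Plain convergent encryption would use this key, which anyone can compute offline.
k_offline = hashlib.sha256(b"shared document").digest()
```

Identical messages still map to identical keys, so deduplication keeps working, but the key can no longer be computed from the message alone. An attacker guessing candidate files would have to query the key server for each guess.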


ClouDedup

ClouDedup is a new state of the art technology that aims to provide deduplication at block level while, at the same time, coping with the inherent security exposures of convergent encryption (CE). This technology is considered an advancement of CE. ClouDedup has two basic components: a gateway, which is in charge of access control and provides the main protection against potential attacks, and a metadata manager, which is in charge of the actual deduplication as well as key management operations. The main goal of this technology, as described by Fuller et al. (672), is to provide data privacy or confidentiality without losing the benefits of deduplication. In particular, as mentioned by Stanek et al. (365), the method guarantees confidentiality for all files stored by users or clients in the cloud. The diagram below shows a brief summary of the function of ClouDedup technology.

As noted in the diagram above, ClouDedup is a highly modified encryption technique that employs mechanisms including a gateway and a metadata manager to help enhance security during the deduplication process. The gateway prevents attacks against convergent encryption by encrypting the ciphertexts that result from CE with another encryption algorithm while using the same keying material for all inputs. The metadata manager, on the other hand, as shown in the diagram, handles the storage of data as well as block signatures and encrypted keys. In essence, the metadata manager maintains a small database as well as a linked list to keep track of file composition and file ownership while avoiding the storage of multiple copies of the same data segment.
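The division of labour between the two components can be sketched under heavy simplification. All class, method, and file names are hypothetical. The toy SHA-256 keystream stands in for real ciphers and is not secure. A Python list stands in for the linked list of file composition, and key management is omitted.

```python
import hashlib


def keystream_xor(key: bytes, data: bytes) -> bytes:
    # Toy deterministic cipher (assumption); applying it twice restores the input.
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def client_encrypt_block(block: bytes) -> bytes:
    # Convergent encryption on the client: key derived from the block itself.
    return keystream_xor(hashlib.sha256(block).digest(), block)


class Gateway:
    """Adds a second encryption layer with the same keying material for all inputs."""

    def __init__(self, secret: bytes):
        self._secret = secret

    def wrap(self, ce_block: bytes) -> bytes:
        return keystream_xor(self._secret, ce_block)

    def unwrap(self, stored_block: bytes) -> bytes:
        return keystream_xor(self._secret, stored_block)


class MetadataManager:
    """Tracks block signatures, file composition, and ownership."""

    def __init__(self):
        self.blocks = {}  # block signature -> stored block (one copy each)
        self.files = {}   # (owner, file name) -> ordered block signatures

    def store_file(self, owner: str, name: str, wrapped_blocks: list) -> None:
        signatures = []
        for block in wrapped_blocks:
            signature = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(signature, block)  # skip blocks already held
            signatures.append(signature)
        self.files[(owner, name)] = signatures

    def load_file(self, owner: str, name: str) -> list:
        return [self.blocks[s] for s in self.files[(owner, name)]]


gateway = Gateway(b"gateway secret (hypothetical)")
manager = MetadataManager()
alice_blocks = [b"header", b"shared body", b"alice footer"]
bob_blocks = [b"header", b"shared body", b"bob footer"]
manager.store_file("alice", "a.doc", [gateway.wrap(client_encrypt_block(b)) for b in alice_blocks])
manager.store_file("bob", "b.doc", [gateway.wrap(client_encrypt_block(b)) for b in bob_blocks])
```

Because the gateway layer is deterministic, the two shared blocks still collapse to one stored copy each, so six submitted blocks occupy four slots. The stored bytes can no longer be matched against a plain convergent ciphertext without the gateway secret.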

Comparison of the State of the Art Technologies

As demonstrated in the previous section, various technologies aimed at improving data confidentiality and security have emerged in recent years. While some of these technologies are modifications of others, some are completely new technologies introduced to address security issues in data deduplication for cloud storage. This section provides a comparison of these state of the art technologies.

First, three of the technologies, namely convergent encryption, Proof of Ownership, and ClouDedup, are all client-side deduplication techniques (Rass and Slamanig 13; Chang 27). In client-side deduplication, the client software transmits the hash value of the file or data to the cloud storage provider. Precisely, with these three technologies, the user or client plays an important role in enhancing the security of their data. Both convergent encryption and ClouDedup involve the generation of an encryption key, which users derive by computing the cryptographic hash value of the content of the information or data copy (Carstensen et al. 49; Dasgupta et al. 119). Equally, in both cases, after users have generated the key and encrypted the data, they retain the keys while the ciphertext is sent to the cloud.

Conversely, unlike these three technologies, DupLESS enhances data confidentiality in server-aided deduplication. In this context, the cloud service provider or the server performs deduplication on behalf of the user while achieving strong security or confidentiality through the generation of encryption keys. Another notable difference between these state of the art technologies is whether they involve the generation of keys or of algorithms for security purposes (Park 91; Dasgupta and Naseem 37). While DupLESS, ClouDedup, and convergent encryption involve the generation of an encryption key during deduplication and data storage, Proof of Ownership involves an interactive algorithm, which the client uses to prove that it is the owner of the data stored in the cloud. Despite these similarities and differences, all of these technologies play an important role in enhancing security and data confidentiality in cloud storage.

In conclusion, various state of the art technologies have emerged in recent years to help ensure security during data deduplication for cloud services. Although these technologies have their own strengths and limitations, researchers such as Li et al. (1297) agree that they have played a critical role in enhancing security in cloud storage.

Works Cited

Fuller, Benjamin, Adam O’Neill, and Leonid Reyzin. "A Unified Approach to Deterministic Encryption: New Constructions and a Connection to Computational Entropy." Journal of Cryptology. 28.3 (2015): 671-717. Print.

Stanek, Jan, Alessandro Sorniotti, Elli Androulaki, and Lukas Kencl. A Secure Data Deduplication Scheme for Cloud Storage. Journal of Computer Science

Puzio, Pasquale, Refik Molva, Melek Onen, and Sergio Loureiro. "Cloudedup: Secure Deduplication with Encrypted Data for Cloud Storage." IEEE Publications Database (2013): 363-370. Print.

Prajapati, Priteshkumar, and Parth Shah. "Efficient Cross User Data Deduplication in Remote Data Storage." IEEE Publications Database. (2014): 1-5. Print.

Li, Jin, Yan K. Li, Xiaofeng Chen, Patrick P. C. Lee, and Wenjing Lou. "A Hybrid Cloud Approach for Secure Authorized Deduplication." IEEE Transactions on Parallel and Distributed Systems. 26.5 (2015): 1206-1216. Print.

Srinivasan, S. Security, Trust, and Regulatory Aspects of Cloud Computing in Business Environments. Hershey, PA: Information Science Reference, 2014. Print.

Wang, Lizhe. Cloud Computing: Methodology, Systems, and Applications. Boca Raton, Fla: CRC Press, 2012. Print.

Carstensen, Jared, Bernard Golden, and JP Morgenthal. Cloud Computing: Assessing the Risks. Ely: IT Governance Publishing, 2012. Print.

Dasgupta, Dipankar, and Durdana Naseem. A Framework for Compliance and Security Coverage Estimation for Cloud Services. Hershey, PA: Information Science Reference, 2014. Print.

Park, James J. Frontier and Innovation in Future Computing and Communications. Dordrecht: Springer, 2014. Print.

Chang, Victor, Robert J. Walters, and Gary Wills. Delivery and Adoption of Cloud Computing Services in Contemporary Organizations. Hershey, PA: Information Science Reference, 2015. Print.

Rass, Stefan, and Daniel Slamanig. Cryptography for Security and Privacy in Cloud Computing. Boston: Artech House, 2014. Print.

Douceur, J.R, A Adya, J Benaloh, W.J Bolosky, and G Yuval. "A Secure Directory Service Based on Exclusive Encryption." IEEE Publications (2002): 10-32. Print.

Halevi, S, D Harnik, A Shulman-Peleg, and B Pinkas. "Proofs of Ownership in Remote Storage Systems." Proceedings of the ACM Conference on Computer and Communications Security. (2011): 491-500. Print.

