ArtsAutosBooksBusinessEducationEntertainmentFamilyFashionFoodGamesGenderHealthHolidaysHomeHubPagesPersonal FinancePetsPoliticsReligionSportsTechnologyTravel

Crack password of documents - Word, Excel, Pdf - security concerns

Updated on April 2, 2013

There are many tools/software out there claiming they can recover or crack the password of documents and all these tools are getting good customer attention. But the question here is how many of you have succeeded with such tools! Most of these tools share common methods or concepts explained below and we should really think about success rate of such tools before spending money on them.

I believe if a document is strongly protected (128 bit encryption), as of now the possibility of cracking the password and open the document is almost nil! All we can do is try our luck (yes, I really mean it) by these tools. And if we are really lucky, voila, we got the password or else we can leave the password cracking tool running for millions of years!!

This article is for learning purpose only and trying to show vulnerability of legacy RC4 40 bit encryption. 

Common type of attacks (methods of cracking)

Password cracking tools usually uses any/all of the below basic methods, with some mix and match on the approach.

1. Dictionary attack

As in the name, in this method the tool will try to open the document by trying a set of all possible words/combination of words from an exhaustive list - called a dictionary. This again can be dictionary + numbers, number + dictionary + special characters etc. Usually there will be options where user can mention what kind of dictionary should be used, how many characters of length tool should try etc.

Dictionary attack usually will take less time to complete compare to Brute force attack, which may run forever!! I highlighted ‘to complete’ because completing the attack does not mean the password has been recovered. It may complete without any result and finally say ‘we could not find the password’ – damn heh.

2. Brute force attack

While dictionary attack try to open the document by trying all the possible combination of dictionary words, the brute force will try to open the document by trying all the possible combination of alphabets, numbers, special characters, punctuations etc. Example like, start with a, then aa, then aaa a….., then next b, bb, bbb, b….other combinations ab, ac, ad etc. Each tool would be having their own method for these combinations and also there will be options like password minimum and maximum length, what kind of characters should be included – lowercase, uppercase, only alphabets etc. If we could mention at least some option, there is a chance that you can recover the password. If you have no idea about password, then very very minimal chance to get the password.

Ok, well, is there any guaranteed method?

The good news (bad news to someone) is that, yes, we have some guaranteed method that will not recover the password but will remove the  password (decrypt the document) so that we can open the document with out password.

The bad news (good news to someone) is, not all documents, especially documents created and protected using latest versions of software like MS office 2010 etc, can be unprotected. The reason being they use stronger 128 bit encryption and other such methods to protect the document.

So what kind of protection can be broken?

Well, if the document is encrypted using RC4-40 bit encryption, then we can break the encryption. Earlier versions of MS office (97/2000/XP/2002/2003) uses RC4 encryption and out of these MS Office 97/2000 uses only 40 bit encryption. XP/2002 and 2003 provided with an option to choose different encryption methods. Hence for XP/2002 and 2003, we can unprotect document only if it has been protected using default method – that is if user had not chosen any advanced option while setting his password.

How do we unprotect (decrypt) the document then?

Before we check this, it is better to know what is RC4 encryption. RC4 is a stream cipher encryption algorithm which is used to encrypt text streams (of documents in our case). Once encrypted, to decrypt it uses a key which is generated out of a password.

So here is the catch, it uses a key to decrypt the document. So if we have the key we can decrypt the document without a password.

How to get the encryption key?

The method to get the key remains same, brute force. But here the advantage is, the time taken for brute forcing the key is very very minimal compare to brute forcing the password. As it is a 40 bit encryption, the entire scope of the key is 240(called key space) combinations. So we will try all these possible combination to get the key, which can be then used to decrypt the document.

To search the entire key space with now-a-days advanced processors (Intel dual core, core 2 etc.), and with a single process (one thread), it will take only couple of days – may be a week. And if we use more advanced processor/ or multiple process (threads) it will break in minutes to hours.

See the below chart from Wikipedia showing theoretical limit of brute force attack on various levels of encryption.

Source

Do you still believe if someone/tool claim he can brute force a 128 bit encrypted document and waste your money?

And there is one more new concept called GPU accelerated technique, which uses the power of graphic cards to brute force and the RC4 key can be brute forced in seconds or minutes!! Oh..man.

Once we got the key, then we can decrypt the document using common RC4 algorithm. (You can get this algorithm in various programming languages from the internet, so no worry)

How to validate the encryption key?

This question would have come to your mind when I said we will try all possible key combinations to check which key is the right one to decrypt. Well in case of word documents, the verification is doing against verifier string/characters (encrypted verifier) which is stored in the RC4 encryption header in the document. So we should also know a little about document headers and the document structure.

You may read more about the MS Word document header structure here.

Source

If you would like to know how programmatically read the document header, what the structure of a document is, how we validate the key against the verifier string and very importantly, how technically we can crack the document please check my next post on this Crack password - RC4 40 bit decryption of documents - If I include that also, this document will run pages.

I would also suggest you to do a search on Internet using some of the new terms and key words learned here to learn further if you are interested.

Security concerns – last but not least

Hence if you are really concerned about the security of your documents, I would recommend you to always select 128 bit and other advanced security options from the option ‘Advanced’.

Also when you plan to buy a password cracking/recovering tool think twice.

If you like to publish your own article on hubpages register here

working

This website uses cookies

As a user in the EEA, your approval is needed on a few things. To provide a better website experience, hubpages.com uses cookies (and other similar technologies) and may collect, process, and share personal data. Please choose which areas of our service you consent to our doing so.

For more information on managing or withdrawing consents and how we handle data, visit our Privacy Policy at: https://corp.maven.io/privacy-policy

Show Details
Necessary
HubPages Device IDThis is used to identify particular browsers or devices when the access the service, and is used for security reasons.
LoginThis is necessary to sign in to the HubPages Service.
Google RecaptchaThis is used to prevent bots and spam. (Privacy Policy)
AkismetThis is used to detect comment spam. (Privacy Policy)
HubPages Google AnalyticsThis is used to provide data on traffic to our website, all personally identifyable data is anonymized. (Privacy Policy)
HubPages Traffic PixelThis is used to collect data on traffic to articles and other pages on our site. Unless you are signed in to a HubPages account, all personally identifiable information is anonymized.
Amazon Web ServicesThis is a cloud services platform that we used to host our service. (Privacy Policy)
CloudflareThis is a cloud CDN service that we use to efficiently deliver files required for our service to operate such as javascript, cascading style sheets, images, and videos. (Privacy Policy)
Google Hosted LibrariesJavascript software libraries such as jQuery are loaded at endpoints on the googleapis.com or gstatic.com domains, for performance and efficiency reasons. (Privacy Policy)
Features
Google Custom SearchThis is feature allows you to search the site. (Privacy Policy)
Google MapsSome articles have Google Maps embedded in them. (Privacy Policy)
Google ChartsThis is used to display charts and graphs on articles and the author center. (Privacy Policy)
Google AdSense Host APIThis service allows you to sign up for or associate a Google AdSense account with HubPages, so that you can earn money from ads on your articles. No data is shared unless you engage with this feature. (Privacy Policy)
Google YouTubeSome articles have YouTube videos embedded in them. (Privacy Policy)
VimeoSome articles have Vimeo videos embedded in them. (Privacy Policy)
PaypalThis is used for a registered author who enrolls in the HubPages Earnings program and requests to be paid via PayPal. No data is shared with Paypal unless you engage with this feature. (Privacy Policy)
Facebook LoginYou can use this to streamline signing up for, or signing in to your Hubpages account. No data is shared with Facebook unless you engage with this feature. (Privacy Policy)
MavenThis supports the Maven widget and search functionality. (Privacy Policy)
Marketing
Google AdSenseThis is an ad network. (Privacy Policy)
Google DoubleClickGoogle provides ad serving technology and runs an ad network. (Privacy Policy)
Index ExchangeThis is an ad network. (Privacy Policy)
SovrnThis is an ad network. (Privacy Policy)
Facebook AdsThis is an ad network. (Privacy Policy)
Amazon Unified Ad MarketplaceThis is an ad network. (Privacy Policy)
AppNexusThis is an ad network. (Privacy Policy)
OpenxThis is an ad network. (Privacy Policy)
Rubicon ProjectThis is an ad network. (Privacy Policy)
TripleLiftThis is an ad network. (Privacy Policy)
Say MediaWe partner with Say Media to deliver ad campaigns on our sites. (Privacy Policy)
Remarketing PixelsWe may use remarketing pixels from advertising networks such as Google AdWords, Bing Ads, and Facebook in order to advertise the HubPages Service to people that have visited our sites.
Conversion Tracking PixelsWe may use conversion tracking pixels from advertising networks such as Google AdWords, Bing Ads, and Facebook in order to identify when an advertisement has successfully resulted in the desired action, such as signing up for the HubPages Service or publishing an article on the HubPages Service.
Statistics
Author Google AnalyticsThis is used to provide traffic data and reports to the authors of articles on the HubPages Service. (Privacy Policy)
ComscoreComScore is a media measurement and analytics company providing marketing data and analytics to enterprises, media and advertising agencies, and publishers. Non-consent will result in ComScore only processing obfuscated personal data. (Privacy Policy)
Amazon Tracking PixelSome articles display amazon products as part of the Amazon Affiliate program, this pixel provides traffic statistics for those products (Privacy Policy)
ClickscoThis is a data management platform studying reader behavior (Privacy Policy)