Crack password of documents - Word, Excel, PDF - security concerns
Many tools out there claim they can recover or crack document passwords, and they attract plenty of customer attention. But how many of you have actually succeeded with such tools? Most of them share the common methods explained below, and we should think hard about their success rate before spending money on them.
I believe that if a document is strongly protected (128-bit encryption and a password that is not easy to guess), the chance of cracking the password and opening the document is, as of now, almost nil! All we can do with these tools is try our luck (yes, I really mean it). If we are really lucky, voila, we get the password; otherwise we can leave the password cracking tool running for millions of years!
This article is for learning purposes only; it tries to show the vulnerability of legacy 40-bit RC4 encryption.
Common types of attacks (methods of cracking)
Password cracking tools usually use one or more of the basic methods below, often mixing and matching the approaches.
1. Dictionary attack
As the name suggests, in this method the tool tries to open the document with words and combinations of words taken from a long list, called a dictionary. Variations include dictionary + numbers, number + dictionary + special characters, and so on. Usually there are options that let the user specify what kind of dictionary to use, what password lengths to try, etc.
A dictionary attack usually takes less time to complete than a brute force attack, which may run forever! I stress ‘to complete’ because completing the attack does not mean the password has been recovered. It may finish without any result and simply report ‘we could not find the password’. Damn, eh?
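To make the idea concrete, here is a minimal sketch of how a dictionary candidate list could be built and tried. The word list, the suffixes, and the `is_correct` callback (a stand-in for "try to open the document") are all assumptions for illustration, not how any particular tool works.

```python
from itertools import product

def dictionary_candidates(words, suffixes=("",), capitalize=True):
    """Yield each word (and its capitalized form) combined with each suffix."""
    for word in words:
        variants = [word]
        if capitalize and word.capitalize() != word:
            variants.append(word.capitalize())
        for base, suffix in product(variants, suffixes):
            yield base + suffix

def dictionary_attack(candidates, is_correct):
    """Return the first accepted candidate, or None when the list runs out."""
    for candidate in candidates:
        if is_correct(candidate):
            return candidate
    return None

words = ["summer", "dragon", "letmein"]   # hypothetical tiny word list
suffixes = ["", "1", "123", "2010"]       # hypothetical number suffixes
candidates = list(dictionary_candidates(words, suffixes))

print(len(candidates))                                           # 24
print(dictionary_attack(candidates, lambda p: p == "Dragon123")) # Dragon123
print(dictionary_attack(candidates, lambda p: p == "xK#9q"))     # None
```

The last line is the "completed without a result" case: the list is exhausted quickly, but a password that is not derived from the dictionary is never tried.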
2. Brute force attack
While a dictionary attack tries combinations of dictionary words, a brute force attack tries every possible combination of letters, numbers, special characters, punctuation, etc. For example: a, then aa, then aaa, and so on; then b, bb, bbb, and so on; then other combinations such as ab, ac, ad, etc. Each tool has its own method of generating these combinations, and there are usually options such as minimum and maximum password length and which characters to include (lowercase, uppercase, letters only, etc.). If we can specify at least some of these options, there is a chance of recovering the password. If we have no idea about the password, the chance of getting it is very, very small.
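A short sketch shows why those options matter so much. The generator below walks the candidates shortest first (just one possible order, since each tool has its own), and `search_space` counts how many candidates a given character set and length range produce. Both function names are mine, for illustration only.

```python
from itertools import product

def brute_force_candidates(charset, min_len, max_len):
    """Yield every string over charset, from min_len to max_len characters."""
    for length in range(min_len, max_len + 1):
        for combo in product(charset, repeat=length):
            yield "".join(combo)

def search_space(charset_size, min_len, max_len):
    """Count the candidates without generating them."""
    return sum(charset_size ** n for n in range(min_len, max_len + 1))

print(list(brute_force_candidates("ab", 1, 2)))
# ['a', 'b', 'aa', 'ab', 'ba', 'bb']

print(search_space(26, 1, 6))  # lowercase letters only, up to 6 characters: 321272406
print(search_space(95, 1, 8))  # 95 printable ASCII characters, up to 8 characters
```

Knowing that the password is lowercase and at most 6 characters long leaves about 321 million candidates; with no hints at all, the count grows into the thousands of trillions.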
OK, well, is there any guaranteed method?
The good news (bad news for some) is that yes, there is a guaranteed method. It does not recover the password, but it removes the password (decrypts the document) so that we can open the document without it.
The bad news (good news for some) is that not all documents can be unprotected, especially those created and protected with recent versions of software such as MS Office 2010. The reason is that they use stronger 128-bit encryption and other such methods to protect the document.
So what kind of protection can be broken?
Well, if the document is encrypted with 40-bit RC4, then we can break the encryption. Earlier versions of MS Office (97/2000/XP/2002/2003) use RC4 encryption, and of these, MS Office 97/2000 use only 40-bit encryption. XP/2002 and 2003 provide an option to choose a different encryption method. Hence for XP/2002 and 2003 we can unprotect a document only if it was protected with the default method, that is, if the user did not choose any advanced option while setting the password.
How do we unprotect (decrypt) the document then?
Before we look at this, it helps to know what RC4 encryption is. RC4 is a stream cipher, used in our case to encrypt the text streams of a document. It encrypts and decrypts with the same key, and that key is generated from the password.
So here is the catch: decryption needs the key, not the password. If we have the key, we can decrypt the document without knowing the password.
How to get the encryption key?
The method for getting the key remains the same: brute force. The advantage here is that brute forcing the key takes far less time than brute forcing the password. Because this is 40-bit encryption, the key space (the full set of possible keys) is 2^40, about 1.1 trillion combinations. So we try all of these combinations until we find the key, which can then be used to decrypt the document.
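A quick bit of arithmetic shows where the key search starts to win. This sketch (the function name is my own) finds the shortest password length at which the number of possible passwords exceeds the 40-bit key space.

```python
KEY_SPACE_40_BIT = 2 ** 40

def shortest_length_exceeding(charset_size: int, limit: int) -> int:
    """Smallest password length whose number of passwords exceeds limit."""
    length = 1
    while charset_size ** length <= limit:
        length += 1
    return length

print(KEY_SPACE_40_BIT)                                 # 1099511627776
print(shortest_length_exceeding(26, KEY_SPACE_40_BIT))  # lowercase only: 9
print(shortest_length_exceeding(95, KEY_SPACE_40_BIT))  # printable ASCII: 7
```

In other words, once a password has 7 or more characters drawn from the full printable ASCII set, there are more passwords of that length than there are 40-bit keys, and the key space stays the same size however long the password gets.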
Searching the entire key space on a current processor (Intel dual core, Core 2, etc.) with a single process (one thread) takes only a couple of days, maybe a week. With a more advanced processor or multiple processes (threads), the key breaks in minutes to hours.
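These estimates are easy to reproduce. The throughput figures below are assumptions picked only to illustrate the arithmetic, not measurements of any real tool or hardware.

```python
def seconds_to_search(key_bits: int, keys_per_second: float) -> float:
    """Worst-case time to try every key of the given size."""
    return 2 ** key_bits / keys_per_second

def describe(seconds: float) -> str:
    """Format a duration in the most readable unit."""
    if seconds < 3600:
        return f"{seconds / 60:.1f} minutes"
    if seconds < 86400:
        return f"{seconds / 3600:.1f} hours"
    return f"{seconds / 86400:.1f} days"

# Hypothetical throughputs in keys tried per second.
rates = {
    "one CPU thread": 2_000_000,
    "eight CPU threads": 16_000_000,
    "much faster hardware": 1_000_000_000,
}
for name, rate in rates.items():
    print(f"{name}: {describe(seconds_to_search(40, rate))}")
```

With these assumed rates the full search takes about 6.4 days, 19.1 hours and 18.3 minutes respectively. On average the key turns up after searching half the key space, so typical times are about half of these.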
See the chart below, from Wikipedia, which shows the theoretical limits of a brute force attack at various encryption strengths.
Would you still believe someone, or some tool, that claims to brute force a 128-bit encrypted document, and waste your money on it?
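To put a number on that, here is a small calculation under one very generous assumption of mine: that the whole 40-bit key space can be searched in a single second.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# How many times larger the 128-bit key space is than the 40-bit one.
ratio = 2 ** 128 // 2 ** 40
print(ratio == 2 ** 88)  # True

# Assumption: all 2^40 keys are tried in one second, so a 128-bit
# search needs `ratio` seconds in the worst case.
years_for_128_bit = ratio / SECONDS_PER_YEAR
print(f"{years_for_128_bit:.1e} years")  # 9.8e+18 years
```

Even at that speed, the 128-bit search takes billions of billions of years, so "millions of years" is, if anything, an understatement.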
There is also a newer technique called GPU acceleration, which uses the power of graphics cards for brute forcing; with it the RC4 key can be brute forced in seconds or minutes! Oh, man.
Once we have the key, we can decrypt the document with the standard RC4 algorithm. (Implementations of this algorithm are available on the internet in various programming languages, so no worries.)
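For reference, the RC4 algorithm itself is only a few lines. Below is a plain Python sketch of the cipher, checked against a widely published test vector (key "Key", plaintext "Plaintext"). It covers only the cipher, not how a document stores its encrypted streams or derives its key.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt data with RC4 (the operation is the same both ways)."""
    # Key-scheduling algorithm: shuffle the 256-byte state using the key.
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]

    # Pseudo-random generation: XOR each data byte with a keystream byte.
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(byte ^ S[(S[i] + S[j]) % 256])
    return bytes(out)

ciphertext = rc4(b"Key", b"Plaintext")
print(ciphertext.hex())         # bbf316e8d940af0ad3
print(rc4(b"Key", ciphertext))  # b'Plaintext'
```

Because RC4 simply XORs the data with a keystream, running it again with the same key undoes the encryption, which is why one function serves both directions.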
How to validate the encryption key?
This question probably came to mind when I said we would try all possible keys to find the one that decrypts the document. In the case of Word documents, each key is verified against a verifier string (the encrypted verifier) stored in the RC4 encryption header of the document. So we should also know a little about document headers and document structure.
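The general idea of checking a key against a stored verifier can be shown with a toy example. Everything here is an assumption made for illustration: the header layout (a 16-byte verifier followed by its SHA-256 hash, both encrypted), the tiny 16-bit key space, and all function names. It is not the real Word header format or key derivation.

```python
import hashlib

def rc4(key: bytes, data: bytes) -> bytes:
    """Standard RC4: key scheduling followed by keystream XOR."""
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(byte ^ S[(S[i] + S[j]) % 256])
    return bytes(out)

KEY_BYTES = 2  # toy 16-bit key space so the search finishes quickly

def make_toy_header(key: bytes, verifier: bytes) -> bytes:
    """Encrypt verifier + hash(verifier) under key (hypothetical layout)."""
    return rc4(key, verifier + hashlib.sha256(verifier).digest())

def key_is_valid(key: bytes, header: bytes) -> bool:
    """Decrypt the header and check that the hash matches the verifier."""
    plain = rc4(key, header)
    verifier, stored_hash = plain[:16], plain[16:]
    return hashlib.sha256(verifier).digest() == stored_hash

def find_key(header: bytes):
    """Try every key in the toy key space; return the one that validates."""
    for candidate in range(2 ** (8 * KEY_BYTES)):
        key = candidate.to_bytes(KEY_BYTES, "big")
        if key_is_valid(key, header):
            return key
    return None

secret_key = (0x1A2B).to_bytes(KEY_BYTES, "big")
header = make_toy_header(secret_key, b"0123456789abcdef")
print(find_key(header).hex())  # 1a2b
```

The point of the verifier is that a wrong key decrypts the header into garbage whose hash does not match, so the search knows exactly when it has hit the right key without ever seeing the password.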
You may read more about the MS Word document header structure here.
If you would like to know how to read the document header programmatically, what the structure of a document is, how we validate the key against the verifier string and, most importantly, how we can technically crack the document, please check my next post, Crack password - RC4 40 bit decryption of documents. If I included all of that here, this article would run to many pages.
I would also suggest searching the Internet for some of the new terms and keywords learned here if you are interested in learning further.
Security concerns – last but not least
So if you are really concerned about the security of your documents, I recommend always selecting 128-bit encryption and the other advanced security options under ‘Advanced’.
Also, think twice when you plan to buy a password cracking/recovery tool.