Hello fellow Techie! In this article, I will show how to decrypt and crack a password-protected PDF file using Python, with a few different approaches.
PDF is an abbreviation that stands for Portable Document Format. It's a versatile file format created by Adobe that gives people an easy, reliable way to present and exchange documents - regardless of the software, hardware, or operating systems being used by anyone who views the document.
There are two common ways of encrypting a PDF:
The User Password (or document open password) prevents the document from being opened or viewed.
The Owner Password (also called the permissions password or master password) sets document restrictions such as printing, copying content, editing, extracting pages and commenting. When this password is set, you need it to modify the document.
You can actually apply both protections to the document. Password protecting is the most common method of encrypting PDFs.
Encrypting a PDF document protects its content from unauthorized access.
Confidential PDF documents can be encrypted and protected with a password. Only people who know the password will be able to decrypt, open and view those documents.
Decrypting means undoing the encryption of a PDF file so that we can read it. According to the file format's specification, PDF supports encryption using the AES algorithm in Cipher Block Chaining (CBC) mode.
Different PDF tools may use other, more advanced algorithms, so the methods below may or may not work depending on the algorithm used, the password and the protection in place.
PyPDF3 is a pure-python PDF library capable of splitting, merging together, cropping, and transforming the pages of PDF files.
It can also add custom data, viewing options, and passwords to PDF files. It can retrieve text and metadata from PDFs as well as merge entire files together.
pip install PyPDF3
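import PyPDF3

# Open the encrypted PDF in binary mode; the file is closed automatically
with open('IOT.pdf', 'rb') as pdffile:
    pdfreader = PyPDF3.PdfFileReader(pdffile)
    try:
        # decrypt() returns 0 if the password is wrong, 1 if it matches
        # the user password and 2 if it matches the owner password
        result = pdfreader.decrypt('password')
        print('Decryption outcome:', result)
    except NotImplementedError as errMsg:
        # Raised when the file uses an algorithm PyPDF3 does not support
        print('IOT.pdf cannot be decrypted, error message:', errMsg)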
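The integer that decrypt() hands back is easy to misread, so here is a small helper that turns it into a message. This is only a sketch: describe_decrypt_result is a name I made up for illustration, and the meanings follow the documented return values of decrypt() (0 for failure, 1 for the user password, 2 for the owner password).

```python
def describe_decrypt_result(code):
    """Translate the integer returned by PdfFileReader.decrypt() into text.

    Hypothetical helper, not part of PyPDF3.
    """
    meanings = {
        0: "password rejected",
        1: "user password accepted",
        2: "owner password accepted",
    }
    # Anything else is unexpected, so report it rather than guess
    return meanings.get(code, "unexpected return value: {!r}".format(code))


print(describe_decrypt_result(1))  # user password accepted
```

You would pass it the value returned by pdfreader.decrypt('password').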
Cracking passwords with pikepdf
This method is a matter of probability: it exploits the weak and common passwords that people often use. We take a word list and try each entry until the PDF opens.
This is more of a brute-force method, and you can try as many combinations as you like. You may need some good computing power, since it can be time- and CPU-intensive.
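To get a feel for why this can be so expensive, here is a rough sketch that generates every candidate over a small alphabet and counts how many there are. The function names candidate_passwords and search_space_size are my own, chosen for illustration; nothing here touches a PDF.

```python
import itertools


def candidate_passwords(alphabet, max_length):
    """Yield every string over `alphabet` with length 1..max_length."""
    for length in range(1, max_length + 1):
        for combo in itertools.product(alphabet, repeat=length):
            yield "".join(combo)


def search_space_size(alphabet_size, max_length):
    """Count the candidates the generator above would produce."""
    return sum(alphabet_size ** n for n in range(1, max_length + 1))


# Digits only, up to 4 characters: 10 + 100 + 1000 + 10000
print(search_space_size(10, 4))  # 11110

# Letters and digits (62 symbols), up to 8 characters: far larger
print(search_space_size(62, 8))
```

The count grows exponentially with the password length, which is why a word list of likely passwords is usually tried before exhaustive combinations.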
pikepdf is a Python library allowing the creation, manipulation and repair of PDFs. It provides a Pythonic wrapper around the C++ PDF content transformation library, QPDF.
pip install pikepdf
We shall also use tqdm for this method.
tqdm is a Python library for creating progress meters or progress bars.
tqdm got its name from the Arabic word taqaddum, which means 'progress'.
It is a fast, extensible progress bar for Python and the CLI.
pip install tqdm
Here is the wordlist I am using:
import pikepdf
from tqdm import tqdm

# Text file containing one candidate password per line
password_text_file = "wordlist.txt"

# Read each line and store it in the passwords list
with open(password_text_file, errors="ignore") as f:
    passwords = [line.strip() for line in f]

# Iterate over the passwords, showing a progress bar
for password in tqdm(passwords, "Cracking PDF File"):
    try:
        # Opening the PDF succeeds only if the password is correct
        with pikepdf.open("IOT.pdf", password=password) as p:
            print("[+] Password found:", password)
            break
    except pikepdf.PasswordError:
        # Wrong password: continue with the next one
        continue
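The same loop can be separated from the PDF library, which makes it easy to test without a real file. The sketch below assumes a function crack (my own name) that takes any iterable of candidates and a checker callable. Here the checker is a stand-in that compares against a fixed string; in the script above it would wrap pikepdf.open and return False when pikepdf.PasswordError is raised.

```python
def crack(candidates, try_password):
    """Return the first candidate that try_password accepts, or None."""
    for password in candidates:
        if try_password(password):
            return password
    return None


# Stand-in checker for illustration only; it does not open any PDF
def fake_check(password):
    return password == "letmein"


found = crack(["123456", "password", "letmein", "qwerty"], fake_check)
print(found)  # letmein
```

Because crack accepts any iterable, the candidates can come from a word list file or from a generator without changing the loop.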
PyMuPDF is a Python binding for MuPDF (current version 1.18.*), a lightweight PDF, XPS, and e-book viewer, renderer, and toolkit, which is maintained and developed by Artifex Software, Inc.
pip install PyMuPDF
import fitz  # this is PyMuPDF

# Open the PDF by its path (fitz expects a filename, not a file object)
doc = fitz.open("IOT.pdf")

# The document should be password protected
# (newer PyMuPDF releases name this attribute needs_pass)
assert doc.needsPass
# print(doc.permissions)

# Decrypt the document; authenticate() returns 0 if the password is wrong
if not doc.authenticate("pass"):
    print('cannot decrypt the document')
Full code here: GitHub Repo
Disclaimer Note: None of the methods above is guaranteed to fully work since there are different mechanisms used in encryption.
If you want a non-Python solution, try using QPDF.
QPDF is a command-line tool and C++ library that performs content-preserving transformations on PDF files. It supports linearization, encryption, and numerous other features.
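If you already know the password, QPDF can write a decrypted copy with its --password and --decrypt options. Here is a minimal sketch that builds that command from Python. The helper name qpdf_decrypt_command and the output file name are my own assumptions, and the command is only printed, not run.

```python
import shlex


def qpdf_decrypt_command(password, src, dst):
    """Build the argument list for: qpdf --password=... --decrypt src dst"""
    return ["qpdf", "--password=" + password, "--decrypt", src, dst]


cmd = qpdf_decrypt_command("password", "IOT.pdf", "IOT_decrypted.pdf")

# Show the command as it would be typed in a shell
print(" ".join(shlex.quote(part) for part in cmd))

# To actually run it (requires qpdf to be installed and on your PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when the password contains spaces or special characters.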
Feel free to suggest any cool ideas out there that I could learn about too.
That's It! If you enjoyed this article, consider subscribing to my channel for related content especially about Tech, Python & Programming.