Crack Protected PDFs Using Python

Hello fellow techie! In this article, I will show how to decrypt and crack a password-protected PDF file using Python, with a few different approaches.

Have you ever come across an encrypted PDF file that happens to be a resource you need for your reading or for solving a particular problem at hand?

PDF Defined

PDF is an abbreviation that stands for Portable Document Format. It's a versatile file format created by Adobe that gives people an easy, reliable way to present and exchange documents - regardless of the software, hardware, or operating systems being used by anyone who views the document.

There are two common ways of protecting a PDF with a password:
📌 The User Password, or Document Open password, prevents anyone without it from opening or viewing the document.

📌 The Owner Password, also called the Permission password or master password, is used to set document restrictions on printing, copying contents, editing, extracting pages, commenting, and so on. When this password is set, you need it to modify the document.

You can actually apply both protections to the document. Password protecting is the most common method of encrypting PDFs.

Encrypting a PDF document protects its content from unauthorized access.

Confidential PDF documents can be encrypted and protected with a password. Only people who know the password will be able to decrypt, open and view those documents.

Decrypting means undoing the encryption of a PDF file so that we can read it. According to the file format's specifications, PDF supports encryption using the AES algorithm in Cipher Block Chaining (CBC) mode.

Different PDF software may use other, more advanced algorithms, so the methods we are going to use may or may not work depending on the algorithm, the password and the protection in place.
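
Before trying any of the methods, it can help to check whether a file is encrypted at all. Here is a rough sketch that uses only the standard library. It relies on the fact that an encrypted PDF refers to an /Encrypt dictionary, and it simply searches the raw bytes for that name. The function names and the sample byte strings are my own, made up for illustration, and this is a heuristic rather than a real PDF parser.

```python
from pathlib import Path


def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Heuristic: encrypted PDFs reference an /Encrypt dictionary.

    This is a shortcut, not a parser. It can report a false positive if
    the same bytes happen to appear somewhere else in the file.
    """
    return b"/Encrypt" in pdf_bytes


def file_looks_encrypted(path: str) -> bool:
    # Hypothetical convenience wrapper: reads the whole file into memory.
    return looks_encrypted(Path(path).read_bytes())


# Tiny synthetic trailers, only to show the idea (not complete PDF files).
plain_trailer = b"trailer\n<< /Size 12 /Root 1 0 R >>"
protected_trailer = b"trailer\n<< /Size 12 /Root 1 0 R /Encrypt 11 0 R >>"

print(looks_encrypted(plain_trailer))      # False
print(looks_encrypted(protected_trailer))  # True
```

On a real file you would call file_looks_encrypted('IOT.pdf') and only move on to the methods below if it returns True.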

Method 1: Using PyPDF3

PyPDF3 is a pure-python PDF library capable of splitting, merging together, cropping, and transforming the pages of PDF files.

It can also add custom data, viewing options, and passwords to PDF files. It can retrieve text and metadata from PDFs as well as merge entire files together.

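pip install PyPDF3

import PyPDF3

# Open the PDF in binary mode; the with-block closes it even if an error occurs
with open('IOT.pdf', 'rb') as pdffile:
    pdfreader = PyPDF3.PdfFileReader(pdffile)

    try:
        # decrypt() returns 0 if the password is wrong and a non-zero code otherwise
        result = pdfreader.decrypt('password')
        print('Decryption outcome:', result)
    except NotImplementedError as errMsg:
        # Raised when the file uses an encryption algorithm the library does not support
        print('IOT.pdf cannot be decrypted, error message:', errMsg)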
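
The number that decrypt() hands back is easy to misread, so here is a small helper to turn it into words. PyPDF3 is a fork of PyPDF2, and the PyPDF2 documentation describes the return value as 0 for a wrong password, 1 for a match on the user password and 2 for a match on the owner password. I am assuming PyPDF3 behaves the same way; the helper name and the dictionary are my own and are not part of either library.

```python
# Return codes as documented for PdfFileReader.decrypt() in PyPDF2.
# Assumption: PyPDF3, being a fork, uses the same codes.
DECRYPT_RESULTS = {
    0: "password did not match",
    1: "matched the user password",
    2: "matched the owner password",
}


def describe_decrypt_result(code: int) -> str:
    """Translate a decrypt() return code into a readable message."""
    return DECRYPT_RESULTS.get(code, f"unexpected return code {code}")


print(describe_decrypt_result(0))  # password did not match
print(describe_decrypt_result(2))  # matched the owner password
```

You would use it as describe_decrypt_result(pdfreader.decrypt('password')).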

Method 2: Cracking passwords with pikepdf

This method is probabilistic: it exploits the weak and common passwords that people often use. We take a word list and try each entry in turn until one of them opens the PDF.

This is more of a brute-force method, and you can try as many combinations as you like! You may need some good computing power to achieve this, since it can be time and CPU intensive.
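
To get a feel for how time intensive this can be, here is a quick back-of-the-envelope sketch. It counts how many passwords exist up to a given length and divides by a guessing speed. The 1000 guesses per second figure is purely an assumption I picked for the arithmetic, not a measurement of pikepdf or of any machine, and the function names are my own.

```python
import string


def keyspace(alphabet_size: int, max_length: int) -> int:
    """Count every possible password of length 1 up to max_length."""
    return sum(alphabet_size ** n for n in range(1, max_length + 1))


def worst_case_hours(candidates: int, guesses_per_second: float) -> float:
    """Hours needed to try every candidate at a fixed guessing speed."""
    return candidates / guesses_per_second / 3600


digits = len(string.digits)              # 10 symbols
lowercase = len(string.ascii_lowercase)  # 26 symbols

ASSUMED_SPEED = 1000.0  # guesses per second, an illustrative assumption

print(keyspace(digits, 4))     # 11110
print(keyspace(lowercase, 6))  # 321272406
print(round(worst_case_hours(keyspace(lowercase, 6), ASSUMED_SPEED), 1))  # 89.2
```

Every extra character multiplies the work by the size of the alphabet, which is why a good word list usually beats trying every combination.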

Tools like PyCrunch, a Python implementation of Crunch, can also help us generate word lists from the characters and combinations we feed in.
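
If you just want to see the idea behind such a generator, here is a minimal sketch using itertools from the standard library. To be clear, this is not PyCrunch's API; it is my own illustration of generating every combination of a given alphabet, and the function names and file name are made up.

```python
from itertools import product
from typing import Iterator


def generate_candidates(alphabet: str, min_length: int, max_length: int) -> Iterator[str]:
    """Yield every string over `alphabet`, from min_length to max_length characters."""
    for length in range(min_length, max_length + 1):
        for combo in product(alphabet, repeat=length):
            yield "".join(combo)


def write_wordlist(path: str, alphabet: str, min_length: int, max_length: int) -> int:
    """Write one candidate per line to `path` and return how many were written."""
    count = 0
    with open(path, "w", encoding="utf-8") as handle:
        for candidate in generate_candidates(alphabet, min_length, max_length):
            handle.write(candidate + "\n")
            count += 1
    return count


print(list(generate_candidates("ab", 1, 2)))
# ['a', 'b', 'aa', 'ab', 'ba', 'bb']
```

Calling write_wordlist("wordlist.txt", "0123456789", 4, 4) would produce all 10000 four-digit PINs, one per line.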

pikepdf is a Python library allowing the creation, manipulation and repair of PDFs. It provides a Pythonic wrapper around the C++ PDF content transformation library, QPDF.

pip install pikepdf

We shall also use tqdm for this method.

tqdm is a Python library for creating progress meters or progress bars. It got its name from the Arabic word taqaddum, which means 'progress'.
It is a fast, extensible progress bar for Python and the CLI.

pip install tqdm

The script below reads its candidate passwords, one per line, from a word list file named wordlist.txt:

import pikepdf
from tqdm import tqdm

# Text file containing one candidate password per line
password_text_file = "wordlist.txt"

# Read each line and store it in the passwords list
with open(password_text_file, errors="ignore") as wordlist:
    passwords = [line.strip() for line in wordlist]

# Iterate over the passwords, showing a progress bar
for password in tqdm(passwords, "Cracking PDF File"):
    try:
        # Try to open the PDF file with the current password
        with pikepdf.open("IOT.pdf", password=password):
            # If the password is correct, report it and break the loop
            print("[+] Password found:", password)
            break

    # A wrong password raises PasswordError
    except pikepdf.PasswordError:
        # Continue the loop with the next password
        continue
else:
    # The loop ended without a break, so no password matched
    print("[-] Password not found in the word list")
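
For very large word lists, loading every line into memory first can be wasteful. Here is a sketch of an alternative layout that reads the file lazily and keeps the "does this password work?" check separate from the loop. The names iter_passwords, find_password and try_password are my own, and the demo uses a throwaway word list with a stand-in check instead of a real PDF.

```python
import os
import tempfile
from typing import Callable, Iterable, Iterator, Optional


def iter_passwords(path: str) -> Iterator[str]:
    """Yield non-empty candidates from a word list, one line at a time."""
    with open(path, encoding="utf-8", errors="ignore") as handle:
        for line in handle:
            candidate = line.strip()
            if candidate:
                yield candidate


def find_password(candidates: Iterable[str],
                  try_password: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate accepted by try_password, or None."""
    for candidate in candidates:
        if try_password(candidate):
            return candidate
    return None


# Demo: a temporary word list and a stand-in check (no PDF involved).
with tempfile.TemporaryDirectory() as tmp:
    wordlist_path = os.path.join(tmp, "wordlist.txt")
    with open(wordlist_path, "w", encoding="utf-8") as handle:
        handle.write("123456\n\nletmein\nsecret\n")
    found = find_password(iter_passwords(wordlist_path), lambda p: p == "letmein")

print(found)  # letmein
```

With pikepdf, try_password would be a small function that calls pikepdf.open with the candidate, returns True if it opens and returns False when pikepdf.PasswordError is raised.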

Method 3: PyMuPDF

It is a Python binding for MuPDF (current version 1.18.*), a lightweight PDF, XPS, and e-book viewer, renderer, and toolkit, which is maintained and developed by Artifex Software, Inc.

pip install PyMuPDF

import fitz  # this is PyMuPDF

# Open the document by file name; PyMuPDF reads the file itself
doc = fitz.open("IOT.pdf")

# the document should be password protected
# (older releases name this attribute needsPass instead of needs_pass)
assert doc.needs_pass

# print(doc.permissions)

# decrypt the document; authenticate() returns 0 when the password is wrong
if not doc.authenticate("pass"):
    print('cannot decrypt the document')

Full code here: GitHub Repo

Disclaimer: None of the methods above is guaranteed to work in every case, since different tools use different encryption mechanisms.

If you want a non-Python solution, try using QPDF:

QPDF is a command-line tool and C++ library that performs content-preserving transformations on PDF files. It supports linearization, encryption, and numerous other features.

Download qpdf here or Visit GitHub
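
If you already know the password and still want to drive QPDF from a script, here is a small sketch that builds the command line and runs it with subprocess. It uses qpdf's --password and --decrypt options. The function names and file names are my own, and I am assuming the qpdf executable is installed and available on your PATH.

```python
import shutil
import subprocess
from typing import List


def build_qpdf_decrypt_command(input_pdf: str, output_pdf: str, password: str) -> List[str]:
    """Build the argument list for: qpdf --password=... --decrypt in.pdf out.pdf"""
    return ["qpdf", f"--password={password}", "--decrypt", input_pdf, output_pdf]


def decrypt_with_qpdf(input_pdf: str, output_pdf: str, password: str) -> bool:
    """Run qpdf and report success. This sketch treats any non-zero exit code as failure."""
    if shutil.which("qpdf") is None:
        raise FileNotFoundError("qpdf executable not found on PATH")
    command = build_qpdf_decrypt_command(input_pdf, output_pdf, password)
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0


# Only print the command here, so the example runs even without qpdf installed.
print(" ".join(build_qpdf_decrypt_command("IOT.pdf", "IOT-decrypted.pdf", "password")))
# qpdf --password=password --decrypt IOT.pdf IOT-decrypted.pdf
```

Passing the arguments as a list, rather than one shell string, means a password containing spaces or quotes does not need any escaping.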

There are other PDF manipulation libraries you could read about, such as:
🛠 py XPDF on PyPI
🛠 Python Poppler on PyPI
🛠 py pdftk on PyPI

Feel free to suggest any cool ideas out there that I could learn about too.

That's It! If you enjoyed this article, consider subscribing to my channel for related content especially about Tech, Python & Programming.

📢Follow me on Twitter :♥

Ronnie Atuhaire