Highlight Text In PDF With Different Colors Using Python

About this Post

This post shares a simple Python script that highlights text in a PDF, using a different color for each search term.

Prerequisite

  • Python 3 (current PyMuPDF releases no longer support Python 2)

Python Package Required

  • PyMuPDF (install with: pip install PyMuPDF; it is imported as fitz)

Sample File

Link - https://easyupload.io/2hiobb

Python Code

# fitz is the import name of PyMuPDF; it is used to highlight text in the PDF
import fitz
from fitz.utils import getColor

# read the PDF file as binary and open it from memory
with open("sample.pdf", "rb") as f:
    pdf_bytes = f.read()

doc = fitz.open(stream=pdf_bytes, filetype="pdf")


# function for highlighting text with a color
# (the method names below are the snake_case ones used by current PyMuPDF releases;
# older releases spelled them searchFor, addHighlightAnnot and setColors)
def highlight(document, text, color_name):
    # a highlight annotation is drawn with its stroke color
    color = {"stroke": getColor(color_name)}
    # loop through the pages one by one (the sample PDF has only one page)
    for page in document:
        # search_for returns a list of Rect objects, one for each occurrence of the text
        text_instances = page.search_for(text.strip())
        for inst in text_instances:
            # an annotation is an additional object attached to a page (here, a highlight)
            annot = page.add_highlight_annot(inst)
            # set the highlight color
            annot.set_colors(color)
            # apply the change to the annotation
            annot.update()



# The sample PDF contains the words "red", "green", "blue", "pink", "yellow", "brown", "purple" and "orange".
# Each word is highlighted in the color it names.
color_list = ["red", "green", "blue", "pink", "yellow", "brown", "purple", "orange"]

for val in color_list:
    highlight(doc, val, val)

# save the output PDF
doc.save("new.pdf")
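
The script above relies on named colors, which getColor turns into a triple of red, green and blue values between 0 and 1. For a shade that has no name, the same kind of triple can be built by hand. Here is a minimal sketch, assuming the color is given as a hex string; hex_to_rgb is a hypothetical helper of my own, not part of PyMuPDF.

```python
def hex_to_rgb(hex_color):
    """Convert '#RRGGBB' (or 'RRGGBB') into three floats between 0 and 1.

    Hypothetical helper for illustration; it is not a PyMuPDF function.
    """
    value = hex_color.lstrip("#")
    if len(value) != 6:
        raise ValueError("expected a color like '#RRGGBB', got %r" % hex_color)
    # take two hex digits per channel and scale 0..255 down to 0..1
    return tuple(int(value[i:i + 2], 16) / 255.0 for i in (0, 2, 4))


# same dictionary shape as in the highlight function, with a custom shade
color = {"stroke": hex_to_rgb("#FFA500")}
```

The resulting dictionary can be used in place of the one built from getColor inside the highlight function.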


Screenshots

Sample PDF

Highlighted PDF

