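In this post we use Python to copy the images out of a PDF file. In WPS, for example, this feature is paid, as shown below:
It needs a membership: 198 yuan for two years. Heh. In Python there is no such charge, haha. Let's extract the images from the PDF below.
There are 11 images in total. Straight to the code: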
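import os
from io import BytesIO

from PIL import Image
import PyPDF2


def extract_images_from_pdf(pdf_path, image_dir):
    # Create the output folder if it does not exist yet.
    os.makedirs(image_dir, exist_ok=True)

    with open(pdf_path, 'rb') as pdf_file:
        # PyPDF2 3.x names; older releases spell these
        # PdfFileReader, numPages, getPage and getObject.
        pdf_reader = PyPDF2.PdfReader(pdf_file)
        for page_num, page in enumerate(pdf_reader.pages, start=1):
            try:
                x_objects = page['/Resources']['/XObject'].get_object()
            except KeyError:
                continue  # this page has no XObjects, so no images
            for name in x_objects:
                obj = x_objects[name]
                if obj['/Subtype'] != '/Image':
                    continue
                try:
                    # _data holds the raw stream bytes, so this only works
                    # when the stream is itself an image file (e.g. JPEG).
                    img = Image.open(BytesIO(obj._data))
                    # Prefix the page number so that same-named images on
                    # different pages do not overwrite each other.
                    img.save(os.path.join(image_dir, f'page{page_num}_{name[1:]}.png'))
                except Exception as e:
                    # Report the failure instead of silently dropping it.
                    print(f'Skipped {name} on page {page_num}: {e}')


if __name__ == '__main__':
    pdf_path = r'C:\Users\XXXX\Python_project\python提取pdf中图片\input.pdf'
    image_dir = r'C:\Users\XXXX\Python_project\python提取pdf中图片\images'
    extract_images_from_pdf(pdf_path, image_dir)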
Run it and check whether the images have landed in the folder, as shown below:
done
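If you would rather not open the folder by hand, a small check like the one below can list what was written. This is only a sketch: `list_saved_images` is a hypothetical helper name, not part of PyPDF2 or Pillow, and it assumes the images were saved with a `.png` extension as in the script above.

```python
import os


def list_saved_images(image_dir, ext='.png'):
    """Return the sorted names of regular files in image_dir ending with ext."""
    return sorted(
        name for name in os.listdir(image_dir)
        # Compare case-insensitively and ignore sub-folders.
        if name.lower().endswith(ext)
        and os.path.isfile(os.path.join(image_dir, name))
    )
```

Calling `len(list_saved_images(image_dir))` after a run should give 11 for the sample PDF, provided every image could be opened and no two saved files shared a name.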