Workbook提供的方法如下:
- get_sheet_names:获取所有表格的名称(新版已经不建议使用,通过Workbook的sheetnames属性即可获取)
- get_sheet_by_name:通过表格名称获取Worksheet对象(新版也不建议使用,通过Worksheet[‘表名‘]获取)
- get_active_sheet:获取活跃的表格(新版建议通过active属性获取)(已经不可用)
- remove_sheet:删除一个表格
- create_sheet:创建一个空的表格
- copy_worksheet:在Workbook内拷贝表格
常用的Worksheet方法如下:
- iter_rows:按行获取所有单元格,内置属性有(min_row,max_row,min_col,max_col)
- iter_columns:按列获取所有的单元格
- append:在表格末尾添加数据
- merged_cells:合并多个单元格
- unmerged_cells:移除合并的单元格
Cell对象比较简单,常用的属性如下:
- row:单元格所在的行
- column:单元格坐在的列
- value:单元格的值
- coordinate:单元格的坐标
4.2 获取工作簿中所有表名
import openpyxl wb = openpyxl.load_workbook('C:\\Users\\zdb\\Desktop\\test.xlsx') print(wb.get_sheet_names())
['Sheet1', 'Sheet2', 'Sheet3']
4.3 读取列表元素
import openpyxl wb = openpyxl.load_workbook('C:\\Users\\zdb\\Desktop\\test.xlsx') sheet = wb.get_sheet_by_name('Sheet1') print(sheet['A1'].value) print(sheet.cell(row=2, column=2).value)
4.4 遍历Excel
import openpyxl wb = openpyxl.load_workbook('C:\\Users\\zdb\\Desktop\\test.xlsx') sheet = wb.get_sheet_by_name('Sheet1') for r in sheet: for j in r: print(j.value) print('......')
1 4 ...... 2 3 ......
4.5 获取行列数
import openpyxl wb = openpyxl.load_workbook('C:\\Users\\zdb\\Desktop\\test.xlsx') sheet = wb.get_sheet_by_name('Sheet1') print(sheet.max_row) print(sheet.max_column)
2 2
4.6 获取活跃worksheet
import openpyxl wb = openpyxl.load_workbook('C:\\Users\\zdb\\Desktop\\test.xlsx') sheet = wb.active # 获取当前活跃的worksheet print(sheet) sheet.title = 'abc' # sheet重命名 wb.save('C:\\Users\\zdb\\Desktop\\test_copy.xlsx') # 保存覆盖
<Worksheet "Sheet1">
4.7 创建、删除工作表
import openpyxl wb = openpyxl.load_workbook('C:\\Users\\zdb\\Desktop\\test.xlsx') print(wb.get_sheet_names()) # 显示表名 wb.create_sheet(index=0, title='First sheet') # 创建表 print(wb.get_sheet_names()) wb.remove_sheet(wb.get_sheet_by_name('Sheet1')) # 删除表 print(wb.get_sheet_names())
['Sheet1'] ['First sheet', 'Sheet1'] ['First sheet']
4.8 创建xlsx,将值写入工作表
import openpyxl wb = openpyxl.Workbook() # 创建xlsx sheet = wb.create_sheet("mySheet") # 创建工作表 sheet['A1'] = "Hello World" sheet.cell(row=2, column=1).value = "zdb" wb.save('test.xlsx')
注意:所有的更变之后记得调用save()方法保存变更结果
4.9 项目:更新一个电子表格
import openpyxl """更新某些产品的售价""" wb = openpyxl.load_workbook("productSales.xlsx") sheet = wb.get_sheet_by_name("Sheet") # 新价格表 PRICE_UPDATES = {"Garlic" : 3.07, "Celery" : 1.19, "Lemon" : 1.27} for rowNum in range(2, sheet.max_row + 1): produceName = sheet.cell(row=rowNum, column=1).value if produceName in PRICE_UPDATES: sheet.cell(row=rowNum, column=2).value = PRICE_UPDATES[produceName] wb.save('updateProduceSales.xlsx') # 保存新文件
五. 处理CSV
- CSV文件是简化的电子表格,保存为纯文本文件
- 逗号分隔单元格
- 值没有类型,所有东西都是字符串
- 没有字体大小和颜色的设置
- 没有多个工作表
- 不能指定单元格的宽度和高度
- 不能合并单元格
- 不能嵌入图像和图表
5.1 读操作
import csv, os os.chdir('C:\\Users\\zdb\\Desktop\\') exampleFile = open('test1.csv', encoding='utf-8') exampleReader = csv.reader(exampleFile) exampleData = list(exampleReader) print(exampleData) print(exampleData[1][1]) # 第一行第一列,从第0行开始
[['\ufeff1', '4'], ['2', '3']] 3
5.2 写操作
import csv, os os.chdir('C:\\Users\\zdb\\Desktop\\') outFile = open('output.csv', 'w', newline='') # 写形式打开文件 outputWriter = csv.writer(outFile) outputWriter.writerow(['a', 'b', 'c', 'd']) outputWriter.writerow((['e', 'f', 'g', 'h'])) outFile.close() # 关闭文件
注意:newline关键字必不可少,忘记则行距将有两倍
5.3 项目:从多个csv文件中删除比表头
import csv, os # 从多个csv文件中删除第一行 os.makedirs("headerRemoved", exist_ok=True) for csvFilename in os.listdir('.'): # 遍历每个csv文件 if not csvFilename.endswith('.csv'): continue print("Remove header from " + csvFilename + '...') csvRows = [] # 创建新的csv文件副本,不包含第一行 csvFileObj = open(csvFilename) readerObj = csv.reader(csvFileObj) for row in readerObj: if readerObj.line_num == 1: continue csvRows.append(row) csvFileObj.close() # 生成没有第一行的新文件 csvFileObj = open(os.path.join('headerRemoved', csvFilename), 'w', newline='') csvWriter = csv.writer(csvFileObj) for row in csvRows: # 写入每一行 csvWriter.writerow(row) csvFileObj.close()
六. 处理PDF
6.1 从pdf提取文本
import PyPDF2 pdfFileObj = open("myPDF.pdf", 'rb') pdfReader = PyPDF2.PdfFileReader(pdfFileObj) print(pdfReader.numPages) # pdf文件的页数 pageObj = pdfReader.getPage(0) print(pageObj.extract_text()) # 输出第0页的文本内容
6.2 解密pdf
import PyPDF2 pdfFileObj = open("myPDF.pdf", 'rb') pdfReader = PyPDF2.PdfFileReader(pdfFileObj) print(pdfReader.isEncrypted) # True则为加密文件 pdfReader.decrypt('mima') # 输入解密密码 pageObj = pdfReader.getPage(0) print(pageObj.extract_text())
6.3 创建pdf文件
模块不允许直接编辑PDF,必须创建一个新的PDF,然后从已有的文档拷贝内容。
- 打开一个或多个已有的PDF文件,得到PdfFileReader对象
- 创建一个新的PdfFileWriter对象
- 将页面从PdfFileReader对象拷贝到PdfFileWriter对象中
- 最后,利用PdfFileWriter对象写入输出的PDF
import PyPDF2 """两个pdf文件合成一个新的PDF文件,并保存""" pdf1File = open('1.pdf', 'rb') # 二进制读 pdf2File = open('2.pdf', 'rb') pdf1Reader = PyPDF2.PdfFileReader(pdf1File) pdf2Reader = PyPDF2.PdfFileReader(pdf2File) pdfWriter = PyPDF2.PdfFileWriter() # 创建一个新的PdfFileWriter对象 for pageNum in range(pdf1Reader.numPages): pageObj = pdf1Reader.getPage(pageNum) pdfWriter.addPage(pageObj) for pageNum in range(pdf2Reader.numPages): pageObj = pdf2Reader.getPage(pageNum) pdfWriter.addPage(pageObj) pdfOutputFile = open('fin.pdf', 'wb') pdfWriter.write(pdfOutputFile) pdfOutputFile.close() pdf1File.close() pdf2File.close()
6.4 旋转页面
import PyPDF2 minutesFile = open("1.pdf", 'rb') pdfReader = PyPDF2.PdfFileReader(minutesFile, strict=False) page = pdfReader.getPage(0) page.rotateClockwise(90) pdfWriter = PyPDF2.PdfFileWriter() pdfWriter.addPage(page) resultPdfFile = open("rotatedPage.pdf", 'wb') pdfWriter.write(resultPdfFile) resultPdfFile.close() minutesFile.close()
6.5 叠加页面
import PyPDF2 """在第一页添加水印""" minutesFile = open("1.pdf", 'rb') pdfReader = PyPDF2.PdfFileReader(minutesFile, strict=False) page = pdfReader.getPage(0) pdfWatermarkReader = PyPDF2.PdfFileReader(open("watermark.pdf", 'rb')) page.mergePage(pdfWatermarkReader.getPage(0)) pdfWriter = PyPDF2.PdfFileWriter() pdfWriter.addPage(page) for pageNum in range(1, pdfReader.numPages): pageObj = pdfReader.getPage(pageNum) pdfWriter.addPage(pageObj) resultPdfFile = open("watermarkedCover.pdf", 'wb') pdfWriter.write(resultPdfFile) minutesFile.close() resultPdfFile.close()
6.6 加密pdf
import PyPDF2 pdfFile = open("meetingminutes.pdf", 'rb') pdfReader = PyPDF2.PdfFileReader(pdfFile) pdfWriter = PyPDF2.PdfFileWriter() for pageNum in range(pdfReader.numPages): pdfWriter.addPage(pdfReader.getPage(pageNum)) pdfWriter.encrypt("mima") resultPdf = open("encrytedminutes.pdf", 'wb') pdfWriter.write(resultPdf) resultPdf.close()
6.7 项目:从多个PDF中合并选择的页面
import PyPDF2, os # 1.获取所有pdf文件 pdfFiles = [] for filename in os.listdir('.'): if filename.endswith('.pdf'): pdfFiles.append(filename) # 2.排序 pdfFiles.sort(key=str.lower) # 3.为输出文件创建PdfFileWriter对象 pdfWriter = PyPDF2.PdfFileWriter() for filename in pdfFiles: pdfFileObj = open(filename, 'rb') pdfReader = PyPDF2.PdfFileReader(pdfFileObj) for pageNum in range(1, pdfReader.numPages): pageObj = pdfReader.getPage(pageNum) pdfWriter.addPage(pageObj) pdfOutput = open("allminutes.pdf", 'wb') pdfWriter.write(pdfOutput) pdfOutput.close()
七. 处理docx
pip install python-docx
7.1 读取完整文本
例:读取Word文档
import docx doc = docx.Document('demo.docx') print(len(doc.paragraphs)) print(doc.paragraphs[0].text) print(doc.paragraphs[1].text) print(doc.paragraphs[1].runs) print(doc.paragraphs[1].runs[0].text) print(doc.paragraphs[1].runs[1].text) print(doc.paragraphs[1].runs[2].text)
例:从.docx文件中取得完整的文本
import docx def getText(filename): doc = docx.Document(filename) fullText = [] for para in doc.paragraphs: fullText.append(para.text) return '\n'.join(fullText) print(getText('demo.docx'))
7.2 写入docx文档
import docx doc = docx.Document() doc.add_paragraph("Hello world!") doc.save("helloworld.docx")
例2
import docx doc = docx.Document() doc.add_paragraph("Hello world!") paraObj1 = doc.add_paragraph("This is a second paragraph.") paraObj2 = doc.add_paragraph("This is a yet another paragraph.") paraObj1.add_run(" This text is being added to the second paragraph.") doc.save("newHelloworld.docx")
7.3 添加标题
import docx doc = docx.Document() doc.add_paragraph("Hello world!", 'Title') doc.save("helloworld.docx")
例2
import docx doc = docx.Document() doc.add_heading("Header 0", 0) doc.add_heading("Header 1", 1) doc.add_heading("Header 2", 2) doc.add_heading("Header 3", 3) doc.add_heading("Header 4", 4) doc.save("headings.docx")
八. 处理JSON
8.1 存取操作
例:存
import json numbers = [1, 3, 5, 7, 9] filename = "numbers.json" with open(filename, 'w') as f_obj: json.dump(numbers, f_obj)
例:取
import json filename = "numbers.json" with open(filename) as f_obj: numbers = json.load(f_obj) print(numbers)
[1, 3, 5, 7, 9]