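6. Modifying XML
Earlier, the movie titles were an absolute mess. Now, print them out again:
for movie in root.iter('movie'):
    print(movie.attrib)
Fix the "2" in Back 2 the Future. This is essentially a find-and-replace problem. Write code to find the movie titled "Back 2 the Future" and save it as a variable:
b2tf = root.find("./genre/decade/movie[@title='Back 2 the Future']")
print(b2tf)
Output: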
<Element 'movie' at 0x10ce00ef8>
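Notice that the .find() method returns an element of the tree. Much of the time, it is more useful to edit the content within that element.
Change the title attribute of the Back 2 the Future element variable to "Back to the Future". Then print the variable's attributes to see your change. You can do this by accessing the element's attribute and assigning a new value to it:
b2tf.attrib["title"] = "Back to the Future"
print(b2tf.attrib)
Output: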
{'favorite': 'False', 'title': 'Back to the Future'}
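The find-then-edit pattern can be sketched in a self-contained form as follows. The inline SAMPLE string is a hypothetical stand-in for movies.xml so that the snippet runs on its own, and the None check is an added precaution rather than part of the original walkthrough.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-movie sample standing in for movies.xml.
SAMPLE = """
<collection>
  <genre category="Action">
    <decade years="1980s">
      <movie favorite="False" title="Back 2 the Future"/>
      <movie favorite="True" title="THE KARATE KID"/>
    </decade>
  </genre>
</collection>
"""

root = ET.fromstring(SAMPLE)

# find() returns the first matching element, or None when nothing matches.
b2tf = root.find("./genre/decade/movie[@title='Back 2 the Future']")
if b2tf is not None:
    # set() has the same effect as assigning into the attrib dictionary.
    b2tf.set("title", "Back to the Future")

titles = [movie.get("title") for movie in root.iter("movie")]
print(titles)
```

Guarding against None matters because a typo in the XPath expression would otherwise surface as an AttributeError on the next line.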
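Write your changes back to the XML file so that they are permanently fixed in the document. Then print the movie attributes again to make sure the change worked. Use the .write() method to do this:
tree.write("movies.xml")
tree = ET.parse('movies.xml')
root = tree.getroot()
for movie in root.iter('movie'):
    print(movie.attrib)
Output: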
{'favorite': 'True', 'title': 'Indiana Jones: The raiders of the lost Ark'}
{'favorite': 'True', 'title': 'THE KARATE KID'}
{'favorite': 'False', 'title': 'Back to the Future'}
{'favorite': 'False', 'title': 'X-Men'}
{'favorite': 'True', 'title': 'Batman Returns'}
{'favorite': 'False', 'title': 'Reservoir Dogs'}
{'favorite': 'False', 'title': 'ALIEN'}
{'favorite': 'True', 'title': "Ferris Bueller's Day Off"}
{'favorite': 'FALSE', 'title': 'American Psycho'}
{'favorite': 'False', 'title': 'Batman: The Movie'}
{'favorite': 'True', 'title': 'Easy A'}
{'favorite': 'True', 'title': 'Dinner for SCHMUCKS'}
{'favorite': 'False', 'title': 'Ghostbusters'}
{'favorite': 'True', 'title': 'Robin Hood: Prince of Thieves'}
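The write-then-reparse round trip can also be tried without touching the real file. The sketch below assumes a hypothetical one-movie document and writes it to a temporary directory; the file name movies_copy.xml is illustrative only.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical one-movie document; the tutorial works on movies.xml instead.
tree = ET.ElementTree(ET.fromstring(
    '<collection><movie favorite="False" title="Back 2 the Future"/></collection>'
))
tree.getroot().find("movie").set("title", "Back to the Future")

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "movies_copy.xml")
    # Write the in-memory tree to disk, then parse the file back in.
    tree.write(path, encoding="utf-8", xml_declaration=True)
    reloaded_root = ET.parse(path).getroot()

reloaded_titles = [movie.get("title") for movie in reloaded_root.iter("movie")]
print(reloaded_titles)
```

Re-parsing the written file is what confirms that the change reached the disk and not just the in-memory tree.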
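7. Fixing Attributes
The multiple attribute is incorrect in some places. Use ElementTree to fix this indicator based on how many formats each movie comes in. First, print the format attribute and text to see which parts need fixing.
for form in root.findall("./genre/decade/movie/format"):
    print(form.attrib, form.text)
Output: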
{'multiple': 'No'} DVD
{'multiple': 'Yes'} DVD,Online
{'multiple': 'False'} Blu-ray
{'multiple': 'Yes'} dvd, digital
{'multiple': 'No'} VHS
{'multiple': 'No'} Online
{'multiple': 'Yes'} DVD
{'multiple': 'No'} DVD
{'multiple': 'No'} blue-ray
{'multiple': 'Yes'} DVD,VHS
{'multiple': 'No'} DVD
{'multiple': 'Yes'} DVD,digital,Netflix
{'multiple': 'No'} Online,VHS
{'multiple': 'No'} Blu_Ray
This tag needs some work.
You can use a regular expression to look for commas: a comma in the format text means the multiple attribute should be "Yes", and no comma means it should be "No". The .set() method makes it easy to add and modify attributes.
Note: re is Python's standard regular expression module. If you want to know more about regular expressions, consider a dedicated tutorial on the topic.
import re
for form in root.findall("./genre/decade/movie/format"):
    # Search for the commas in the format text
    match = re.search(',', form.text)
    if match:
        form.set('multiple', 'Yes')
    else:
        form.set('multiple', 'No')
# Write out the tree to the file again
tree.write("movies.xml")
tree = ET.parse('movies.xml')
root = tree.getroot()
for form in root.findall("./genre/decade/movie/format"):
    print(form.attrib, form.text)
Output:
{'multiple': 'No'} DVD
{'multiple': 'Yes'} DVD,Online
{'multiple': 'No'} Blu-ray
{'multiple': 'Yes'} dvd, digital
{'multiple': 'No'} VHS
{'multiple': 'No'} Online
{'multiple': 'No'} DVD
{'multiple': 'No'} DVD
{'multiple': 'No'} blue-ray
{'multiple': 'Yes'} DVD,VHS
{'multiple': 'No'} DVD
{'multiple': 'Yes'} DVD,digital,Netflix
{'multiple': 'Yes'} Online,VHS
{'multiple': 'No'} Blu_Ray
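Because the check only looks for a literal comma, the same rule can be sketched without a regular expression. This is a variation, not the tutorial's code: the SAMPLE document is hypothetical, and the empty format element is included only to show that form.text can be None.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the third format element is deliberately empty.
SAMPLE = """
<collection>
  <movie title="A"><format multiple="False">Blu-ray</format></movie>
  <movie title="B"><format multiple="No">Online,VHS</format></movie>
  <movie title="C"><format multiple="Yes"/></movie>
</collection>
"""
root = ET.fromstring(SAMPLE)

for form in root.iter("format"):
    # form.text is None for an empty element, so fall back to "".
    text = form.text or ""
    form.set("multiple", "Yes" if "," in text else "No")

flags = [form.get("multiple") for form in root.iter("format")]
print(flags)
```

Calling re.search(',', None) would raise a TypeError, so a fallback like this is worth considering whenever an element might have no text.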
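8. Moving Elements
Some of the data has been placed in the wrong decade. Use what you have learned about XML and ElementTree to find and fix the decade errors.
It is useful to print out both the decade tags and the year tags throughout the document.
for decade in root.findall("./genre/decade"):
    print(decade.attrib)
    for year in decade.findall("./movie/year"):
        print(year.text, '\n')
Output: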
{'years': '1980s'}
1981

1984

1985

{'years': '1990s'}
2000

1992

1992

{'years': '1970s'}
1979

{'years': '1980s'}
1986

2000

{'years': '1960s'}
1966

{'years': '2010s'}
2010

2011

{'years': '1980s'}
1984

{'years': '1990s'}
1991

The two years that sit in the wrong decade both belong to movies from 2000. Use an XPath expression to find out which movies these are.
for movie in root.findall("./genre/decade/movie[year='2000']"):
    print(movie.attrib)
Output:
{'favorite': 'False', 'title': 'X-Men'}
{'favorite': 'FALSE', 'title': 'American Psycho'}
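Instead of spotting the mismatches by eye, the decade label can be derived from each year and compared with the parent decade tag. The following is a minimal sketch on a hypothetical three-movie sample; it assumes decade labels always follow the "1990s" pattern used in this document.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with two movies filed under the wrong decade.
SAMPLE = """
<collection>
  <genre category="Action">
    <decade years="1990s">
      <movie title="X-Men"><year>2000</year></movie>
      <movie title="Batman Returns"><year>1992</year></movie>
    </decade>
  </genre>
  <genre category="Thriller">
    <decade years="1980s">
      <movie title="American Psycho"><year>2000</year></movie>
    </decade>
  </genre>
</collection>
"""
root = ET.fromstring(SAMPLE)

misplaced = []
for decade in root.findall("./genre/decade"):
    for movie in decade.findall("movie"):
        year = int(movie.findtext("year"))
        expected = f"{year // 10 * 10}s"  # e.g. 1992 -> "1990s"
        if expected != decade.get("years"):
            misplaced.append((movie.get("title"), decade.get("years"), expected))

print(misplaced)
```

Each tuple holds the movie title, the decade it is currently filed under, and the decade its year implies.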
To move the X-Men data, you first have to add a new decade tag, the 2000s, to the Action genre. The ET.SubElement() function can be used to add this tag as the last child of the Action genre.
action = root.find("./genre[@category='Action']")
new_dec = ET.SubElement(action, 'decade')
new_dec.attrib["years"] = '2000s'
print(ET.tostring(action, encoding='utf8').decode('utf8'))
Output:
<?xml version='1.0' encoding='utf8'?>
<genre category="Action">
    <decade years="1980s">
        <movie favorite="True" title="Indiana Jones: The raiders of the lost Ark">
            <format multiple="No">DVD</format>
            <year>1981</year>
            <rating>PG</rating>
            <description>
            'Archaeologist and adventurer Indiana Jones is hired by the U.S. government to find the Ark of the Covenant before the Nazis.'
            </description>
        </movie>
        <movie favorite="True" title="THE KARATE KID">
            <format multiple="Yes">DVD,Online</format>
            <year>1984</year>
            <rating>PG</rating>
            <description>None provided.</description>
        </movie>
        <movie favorite="False" title="Back to the Future">
            <format multiple="No">Blu-ray</format>
            <year>1985</year>
            <rating>PG</rating>
            <description>Marty McFly</description>
        </movie>
    </decade>
    <decade years="1990s">
        <movie favorite="False" title="X-Men">
            <format multiple="Yes">dvd, digital</format>
            <year>2000</year>
            <rating>PG-13</rating>
            <description>Two mutants come to a private academy for their kind whose resident superhero team must oppose a terrorist organization with similar powers.</description>
        </movie>
        <movie favorite="True" title="Batman Returns">
            <format multiple="No">VHS</format>
            <year>1992</year>
            <rating>PG13</rating>
            <description>NA.</description>
        </movie>
        <movie favorite="False" title="Reservoir Dogs">
            <format multiple="No">Online</format>
            <year>1992</year>
            <rating>R</rating>
            <description>WhAtEvER I Want!!!?!</description>
        </movie>
    </decade>
    <decade years="2000s" /></genre>
Now append the X-Men movie to the 2000s decade and remove it from the 1990s decade, using .append() and .remove() respectively.
xmen = root.find("./genre/decade/movie[@title='X-Men']")
dec2000s = root.find("./genre[@category='Action']/decade[@years='2000s']")
dec2000s.append(xmen)
dec1990s = root.find("./genre[@category='Action']/decade[@years='1990s']")
dec1990s.remove(xmen)
print(ET.tostring(action, encoding='utf8').decode('utf8'))
Output:
<?xml version='1.0' encoding='utf8'?>
<genre category="Action">
    <decade years="1980s">
        <movie favorite="True" title="Indiana Jones: The raiders of the lost Ark">
            <format multiple="No">DVD</format>
            <year>1981</year>
            <rating>PG</rating>
            <description>
            'Archaeologist and adventurer Indiana Jones is hired by the U.S. government to find the Ark of the Covenant before the Nazis.'
            </description>
        </movie>
        <movie favorite="True" title="THE KARATE KID">
            <format multiple="Yes">DVD,Online</format>
            <year>1984</year>
            <rating>PG</rating>
            <description>None provided.</description>
        </movie>
        <movie favorite="False" title="Back to the Future">
            <format multiple="No">Blu-ray</format>
            <year>1985</year>
            <rating>PG</rating>
            <description>Marty McFly</description>
        </movie>
    </decade>
    <decade years="1990s">
        <movie favorite="True" title="Batman Returns">
            <format multiple="No">VHS</format>
            <year>1992</year>
            <rating>PG13</rating>
            <description>NA.</description>
        </movie>
        <movie favorite="False" title="Reservoir Dogs">
            <format multiple="No">Online</format>
            <year>1992</year>
            <rating>R</rating>
            <description>WhAtEvER I Want!!!?!</description>
        </movie>
    </decade>
    <decade years="2000s"><movie favorite="False" title="X-Men">
            <format multiple="Yes">dvd, digital</format>
            <year>2000</year>
            <rating>PG-13</rating>
            <description>Two mutants come to a private academy for their kind whose resident superhero team must oppose a terrorist organization with similar powers.</description>
        </movie>
    </decade></genre>
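The same create, append, and remove sequence could be wrapped in a small helper and applied to the other misplaced movie, American Psycho. The function move_movie below is hypothetical (it is not part of ElementTree), and the sample genre is a cut-down stand-in for the Thriller section.

```python
import xml.etree.ElementTree as ET

def move_movie(genre, title, target_years):
    """Move the movie with the given title into the decade labelled target_years.

    Hypothetical helper: creates the target decade if it does not exist yet.
    Returns True if a movie was moved, False if the title was not found.
    """
    for decade in genre.findall("decade"):
        for movie in decade.findall("movie"):
            if movie.get("title") == title:
                target = genre.find(f"decade[@years='{target_years}']")
                if target is None:
                    target = ET.SubElement(genre, "decade", {"years": target_years})
                # Elements do not know their parent, so remove via the decade we found.
                decade.remove(movie)
                target.append(movie)
                return True
    return False

# Hypothetical cut-down version of the Thriller genre.
genre = ET.fromstring("""
<genre category="Thriller">
  <decade years="1980s">
    <movie title="Ferris Bueller's Day Off"><year>1986</year></movie>
    <movie title="American Psycho"><year>2000</year></movie>
  </decade>
</genre>
""")

moved = move_movie(genre, "American Psycho", "2000s")
layout = {
    decade.get("years"): [movie.get("title") for movie in decade.findall("movie")]
    for decade in genre.findall("decade")
}
print(moved, layout)
```

The helper walks decade by decade because ElementTree elements hold no reference to their parent, which is also why the walkthrough looks up dec1990s separately before calling .remove().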
9. Building XML Documents
Nice, so you were able to move an entire movie to a new decade. Save your changes back to the XML file.
tree.write("movies.xml")
tree = ET.parse('movies.xml')
root = tree.getroot()
print(ET.tostring(root, encoding='utf8').decode('utf8'))
Output:
<?xml version='1.0' encoding='utf8'?>
<collection>
    <genre category="Action">
        <decade years="1980s">
            <movie favorite="True" title="Indiana Jones: The raiders of the lost Ark">
                <format multiple="No">DVD</format>
                <year>1981</year>
                <rating>PG</rating>
                <description>
                'Archaeologist and adventurer Indiana Jones is hired by the U.S. government to find the Ark of the Covenant before the Nazis.'
                </description>
            </movie>
            <movie favorite="True" title="THE KARATE KID">
                <format multiple="Yes">DVD,Online</format>
                <year>1984</year>
                <rating>PG</rating>
                <description>None provided.</description>
            </movie>
            <movie favorite="False" title="Back to the Future">
                <format multiple="No">Blu-ray</format>
                <year>1985</year>
                <rating>PG</rating>
                <description>Marty McFly</description>
            </movie>
        </decade>
        <decade years="1990s">
            <movie favorite="True" title="Batman Returns">
                <format multiple="No">VHS</format>
                <year>1992</year>
                <rating>PG13</rating>
                <description>NA.</description>
            </movie>
            <movie favorite="False" title="Reservoir Dogs">
                <format multiple="No">Online</format>
                <year>1992</year>
                <rating>R</rating>
                <description>WhAtEvER I Want!!!?!</description>
            </movie>
        </decade>
        <decade years="2000s"><movie favorite="False" title="X-Men">
                <format multiple="Yes">dvd, digital</format>
                <year>2000</year>
                <rating>PG-13</rating>
                <description>Two mutants come to a private academy for their kind whose resident superhero team must oppose a terrorist organization with similar powers.</description>
            </movie>
        </decade></genre>
    <genre category="Thriller">
        <decade years="1970s">
            <movie favorite="False" title="ALIEN">
                <format multiple="No">DVD</format>
                <year>1979</year>
                <rating>R</rating>
                <description>"""""""""</description>
            </movie>
        </decade>
        <decade years="1980s">
            <movie favorite="True" title="Ferris Bueller's Day Off">
                <format multiple="No">DVD</format>
                <year>1986</year>
                <rating>PG13</rating>
                <description>Funny movie about a funny guy</description>
            </movie>
            <movie favorite="FALSE" title="American Psycho">
                <format multiple="No">blue-ray</format>
                <year>2000</year>
                <rating>Unrated</rating>
                <description>psychopathic Bateman</description>
            </movie>
        </decade>
    </genre>
    <genre category="Comedy">
        <decade years="1960s">
            <movie favorite="False" title="Batman: The Movie">
                <format multiple="Yes">DVD,VHS</format>
                <year>1966</year>
                <rating>PG</rating>
                <description>What a joke!</description>
            </movie>
        </decade>
        <decade years="2010s">
            <movie favorite="True" title="Easy A">
                <format multiple="No">DVD</format>
                <year>2010</year>
                <rating>PG--13</rating>
                <description>Emma Stone = Hester Prynne</description>
            </movie>
            <movie favorite="True" title="Dinner for SCHMUCKS">
                <format multiple="Yes">DVD,digital,Netflix</format>
                <year>2011</year>
                <rating>Unrated</rating>
                <description>Tim (Rudd) is a rising executive who “succeeds” in finding the perfect guest, IRS employee Barry (Carell), for his boss’ monthly event, a so-called “dinner for idiots,” which offers certain advantages to the exec who shows up with the biggest buffoon. </description>
            </movie>
        </decade>
        <decade years="1980s">
            <movie favorite="False" title="Ghostbusters">
                <format multiple="Yes">Online,VHS</format>
                <year>1984</year>
                <rating>PG</rating>
                <description>Who ya gonna call?</description>
            </movie>
        </decade>
        <decade years="1990s">
            <movie favorite="True" title="Robin Hood: Prince of Thieves">
                <format multiple="No">Blu_Ray</format>
                <year>1991</year>
                <rating>Unknown</rating>
                <description>Robin Hood slaying</description>
            </movie>
        </decade>
    </genre>
</collection>
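In the output, the newly created 2000s decade runs straight into the neighbouring tags because elements added with ET.SubElement() carry no surrounding whitespace. One possible way to tidy this before writing is ET.indent(), sketched below on a small hypothetical fragment. This step is not part of the original walkthrough, and ET.indent() is only available in Python 3.9 and later.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment with no whitespace between tags, similar to the
# freshly added 2000s decade.
root = ET.fromstring(
    '<genre category="Action"><decade years="2000s">'
    '<movie title="X-Men"><year>2000</year></movie></decade></genre>'
)

# ET.indent() rewrites the whitespace between elements in place.
# It requires Python 3.9 or newer.
ET.indent(root, space="  ")
pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

Calling ET.indent() on the tree before tree.write() should give every element its own consistently indented line in the saved file.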
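10. Conclusion
There are a few key things to remember about XML and using ElementTree.
Tags build the tree structure and designate which values should be described there. Smart structuring makes XML easy to read and write. Every element needs an opening and a closing tag (or a self-closing tag), and the nesting of tags shows the parent-child relationships.
Attributes further describe how to validate a tag or allow for boolean designations. Attributes typically take very specific values so that the XML parser (and the user) can use them to check the tag values.
ElementTree is an important Python library that lets you parse and navigate an XML document. ElementTree breaks the XML document down into a tree structure that is easy to work with. When in doubt, print it out (print(ET.tostring(root, encoding='utf8').decode('utf8'))): this print statement shows the entire XML document at once, which helps you check your work whenever you edit, add, or remove something in the XML.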
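The point about attributes taking very specific values can be sketched as a simple check. The ALLOWED set below is an assumption about which values of the favorite attribute count as valid; the sample reuses the 'FALSE' value that appears in the movie data.

```python
import xml.etree.ElementTree as ET

ALLOWED = {"True", "False"}  # assumed set of valid values for "favorite"

# Hypothetical two-movie sample; one value uses inconsistent capitalisation.
root = ET.fromstring(
    '<collection>'
    '<movie favorite="True" title="Easy A"/>'
    '<movie favorite="FALSE" title="American Psycho"/>'
    '</collection>'
)

# Collect every movie whose "favorite" value is outside the allowed set.
invalid = [
    (movie.get("title"), movie.get("favorite"))
    for movie in root.iter("movie")
    if movie.get("favorite") not in ALLOWED
]
print(invalid)
```

A list like this shows which attributes still need normalising with .set() before the document is written back.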