Extracting specific content from XML with Python

Summary: extracting specific content from an XML file with Python

1. Method one: parsing the XML file with Python (xml.dom.minidom)

As a sample, here is an XML file picked more or less at random: the Jenkins web.xml (saved locally as web.xml).

<?xml version="1.0" encoding="UTF-8"?>
<!--
The MIT License
Copyright (c) 2004-2009, Sun Microsystems, Inc., Kohsuke Kawaguchi, Tom Huybrechts, id:digerata, Yahoo! Inc.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
-->
<web-app xmlns="http://xmlns.jcp.org/xml/ns/javaee"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/javaee http://xmlns.jcp.org/xml/ns/javaee/web-app_3_1.xsd"
         version="3.1"
         metadata-complete="true">
  <display-name>Jenkins v2.336</display-name>
  <description>Build management system</description>
  <servlet>
    <servlet-name>Stapler</servlet-name>
    <servlet-class>org.kohsuke.stapler.Stapler</servlet-class>
    <init-param>
      <param-name>default-encodings</param-name>
      <param-value>text/html=UTF-8</param-value>
    </init-param>
    <init-param>
      <param-name>diagnosticThreadName</param-name>
      <param-value>false</param-value>
    </init-param>
    <async-supported>true</async-supported>
  </servlet>
  <servlet-mapping>
    <servlet-name>Stapler</servlet-name>
    <url-pattern>/*</url-pattern>
  </servlet-mapping>
  <filter>
    <filter-name>suspicious-request-filter</filter-name>
    <filter-class>jenkins.security.SuspiciousRequestFilter</filter-class>
    <async-supported>true</async-supported>
  </filter>
  <filter>
    <filter-name>diagnostic-name-filter</filter-name>
    <filter-class>org.kohsuke.stapler.DiagnosticThreadNameFilter</filter-class>
    <async-supported>true</async-supported>
  </filter>
  <filter>
    <filter-name>encoding-filter</filter-name>
    <filter-class>hudson.util.CharacterEncodingFilter</filter-class>
    <async-supported>true</async-supported>
  </filter>
  <filter>
    <filter-name>compression-filter</filter-name>
    <filter-class>org.kohsuke.stapler.compression.CompressionFilter</filter-class>
    <async-supported>true</async-supported>
  </filter>
  <filter>
    <filter-name>authentication-filter</filter-name>
    <filter-class>hudson.security.HudsonFilter</filter-class>
    <async-supported>true</async-supported>
  </filter>
  <filter>
    <filter-name>csrf-filter</filter-name>
    <filter-class>hudson.security.csrf.CrumbFilter</filter-class>
    <async-supported>true</async-supported>
  </filter>
  <filter>
    <filter-name>plugins-filter</filter-name>
    <filter-class>hudson.util.PluginServletFilter</filter-class>
    <async-supported>true</async-supported>
  </filter>
  <!--
  The Headers filter allows us to override headers sent by the container
  that may be in conflict with what we want.  For example, Tomcat will set
  Cache-Control: no-cache for any files behind the security-constraint
  below.  So if Hudson is on a public server, and you want to only allow
  authorized users to access it, you may want to pay attention to this.
  See: http://www.nabble.com/No-browser-caching-with-Hudson- -tf4601857.html
  <filter>
    <filter-name>change-headers-filter</filter-name>
    <filter-class>hudson.ResponseHeaderFilter</filter-class>
    <!- The value listed here is for 24 hours.  Increase or decrease as you see 
    fit.  Value is in seconds. Make sure to keep the public option ->
    <init-param>
      <param-name>Cache-Control</param-name>
      <param-value>max-age=86400, public</param-value>
    </init-param>
    <!- It turns out that Tomcat just doesn't want to let
    go of its cache option.  If you override Cache-Control,
    it starts to send Pragma: no-cache as a backup.
     ->
    <init-param>
      <param-name>Pragma</param-name>
      <param-value>public</param-value>
    </init-param>
  </filter>
  <filter-mapping>
    <filter-name>change-headers-filter</filter-name>
    <url-pattern>*.css</url-pattern>
  </filter-mapping>
  <filter-mapping>
    <filter-name>change-headers-filter</filter-name>
    <url-pattern>*.gif</url-pattern>
  </filter-mapping>
  <filter-mapping>
    <filter-name>change-headers-filter</filter-name>
    <url-pattern>*.js</url-pattern>
  </filter-mapping>
  <filter-mapping>
    <filter-name>change-headers-filter</filter-name>
    <url-pattern>*.png</url-pattern>
  </filter-mapping>
  -->
  <filter-mapping>
    <filter-name>suspicious-request-filter</filter-name>
    <url-pattern>/*</url-pattern>
  </filter-mapping>
  <filter-mapping>
    <filter-name>diagnostic-name-filter</filter-name>
    <url-pattern>/*</url-pattern>
  </filter-mapping>
  <filter-mapping>
    <filter-name>encoding-filter</filter-name>
    <url-pattern>/*</url-pattern>
  </filter-mapping>
  <filter-mapping>
    <filter-name>compression-filter</filter-name>
    <url-pattern>/*</url-pattern>
  </filter-mapping>
  <filter-mapping>
    <filter-name>authentication-filter</filter-name>
    <url-pattern>/*</url-pattern>
  </filter-mapping>
  <filter-mapping>
    <filter-name>csrf-filter</filter-name>
    <url-pattern>/*</url-pattern>
  </filter-mapping>
  <filter-mapping>
    <filter-name>plugins-filter</filter-name>
    <url-pattern>/*</url-pattern>
  </filter-mapping>
  <listener>
    <!-- Must be before WebAppMain in order to initialize the context before the first use of this class. -->
    <listener-class>jenkins.util.SystemProperties$Listener</listener-class>
  </listener>
  <listener>
    <listener-class>hudson.WebAppMain</listener-class>
  </listener>
  <listener>
    <listener-class>jenkins.JenkinsHttpSessionListener</listener-class>
  </listener>
  <!--
    JENKINS-1235 suggests containers interpret '*' as "all roles defined in web.xml"
    as opposed to "all roles defined in the security realm", so we need to list some
    common names in the hope that users will have at least one of those roles.
  -->
  <security-role>
    <role-name>admin</role-name>
  </security-role>
  <security-role>
    <role-name>user</role-name>
  </security-role>
  <security-role>
    <role-name>hudson</role-name>
  </security-role>
  <security-constraint>
    <web-resource-collection>
      <web-resource-name>Hudson</web-resource-name>
      <url-pattern>/loginEntry</url-pattern>
      <!--http-method>GET</http-method-->
    </web-resource-collection>
    <auth-constraint>
      <role-name>**</role-name>
    </auth-constraint>
  </security-constraint>
  <!-- Disable TRACE method with security constraint (copied from jetty/webdefaults.xml) -->
  <security-constraint>
    <web-resource-collection>
      <web-resource-name>Disable TRACE</web-resource-name>
      <url-pattern>/*</url-pattern>
      <http-method>TRACE</http-method>
    </web-resource-collection>
    <auth-constraint />
  </security-constraint>
  <security-constraint>
    <web-resource-collection>
      <web-resource-name>other</web-resource-name>
      <url-pattern>/*</url-pattern>
    </web-resource-collection>
    <!-- no security constraint --> 
  </security-constraint>
  <login-config>
    <auth-method>FORM</auth-method>
    <form-login-config>
      <form-login-page>/login</form-login-page>
      <form-error-page>/loginError</form-error-page>
    </form-login-config>
  </login-config>
  <!-- if specified, this value is used as the Hudson home directory -->
  <env-entry>
    <env-entry-name>HUDSON_HOME</env-entry-name>
    <env-entry-type>java.lang.String</env-entry-type>
    <env-entry-value></env-entry-value>
  </env-entry>
  <!-- configure additional extension-content-type mappings -->
  <mime-mapping>
    <extension>xml</extension>
    <mime-type>application/xml</mime-type>
  </mime-mapping>
  <!--mime-mapping> commenting out until this works out of the box with JOnAS. See  http://www.nabble.com/Error-with-mime-type%2D-%27application-xslt%2Bxml%27-when-deploying-hudson-1.316-in-jonas-td24740489.html
    <extension>xsl</extension>
    <mime-type>application/xslt+xml</mime-type>
  </mime-mapping-->
  <mime-mapping>
    <extension>log</extension>
    <mime-type>text/plain</mime-type>
  </mime-mapping>
  <mime-mapping>
    <extension>war</extension>
    <mime-type>application/octet-stream</mime-type>
  </mime-mapping>
  <mime-mapping>
    <extension>ear</extension>
    <mime-type>application/octet-stream</mime-type>
  </mime-mapping>
  <mime-mapping>
    <extension>rar</extension>
    <mime-type>application/octet-stream</mime-type>
  </mime-mapping>
  <mime-mapping>
    <extension>webm</extension>
    <mime-type>video/webm</mime-type>
  </mime-mapping>
  <error-page>
    <exception-type>java.lang.Throwable</exception-type>
    <location>/oops</location>
  </error-page>
  <session-config>
    <cookie-config>
      <!-- See https://www.owasp.org/index.php/HttpOnly for the discussion of this topic in OWASP -->
      <http-only>true</http-only>
    </cookie-config>
    <!-- Tracking mode is managed by WebAppMain.FORCE_SESSION_TRACKING_BY_COOKIE_PROP -->
  </session-config>
</web-app>

Extracting a single field (the first <filter-name> value):

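# coding=utf-8
"""
    Author: gaojs
    Purpose: print the first <filter-name> value in web.xml
    New features:
    Date: 2022/6/2 17:12
"""
import xml.dom.minidom

# Parse the file and get the root element (<web-app>)
dom = xml.dom.minidom.parse('web.xml')
root = dom.documentElement
# All <filter-name> elements in document order; tags inside comments are ignored
filter_list = root.getElementsByTagName('filter-name')
# firstChild is the element's text node; .data is its string value
print(filter_list[0].firstChild.data)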

Output:

suspicious-request-filter
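One pitfall with firstChild.data: an empty element, such as <env-entry-value></env-entry-value> in the file above, has no child node, so firstChild is None and .data raises AttributeError. Below is a minimal sketch of a safer helper, assuming an empty string is the desired result for empty elements. The helper name element_text is hypothetical, and a small inline XML string stands in for web.xml so the snippet runs on its own.

```python
import xml.dom.minidom


def element_text(node):
    """Hypothetical helper: join all text/CDATA children of a minidom element.

    Returns '' for an empty element instead of failing on firstChild being None.
    """
    return ''.join(
        child.data
        for child in node.childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    )


# Small stand-in for web.xml: same default namespace, one empty element
SAMPLE = """<web-app xmlns="http://xmlns.jcp.org/xml/ns/javaee">
  <env-entry>
    <env-entry-name>HUDSON_HOME</env-entry-name>
    <env-entry-value></env-entry-value>
  </env-entry>
</web-app>"""

dom = xml.dom.minidom.parseString(SAMPLE)
root = dom.documentElement
name = element_text(root.getElementsByTagName('env-entry-name')[0])
value = element_text(root.getElementsByTagName('env-entry-value')[0])
print(repr(name), repr(value))  # 'HUDSON_HOME' ''
```

With the original one-liner, root.getElementsByTagName('env-entry-value')[0].firstChild.data would fail on the real web.xml for exactly this reason.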

Extracting every value of a tag in bulk and writing the values to a text file:

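# coding=utf-8
"""
    Author: gaojs
    Purpose: print every <filter-name> value and write them to filter_result.txt
    New features:
    Date: 2022/6/2 17:12
"""
import xml.dom.minidom

# Parse web.xml and get the root <web-app> element
dom = xml.dom.minidom.parse('web.xml')
root = dom.documentElement
# All <filter-name> elements, in document order
filter_list = root.getElementsByTagName('filter-name')

# Open the output file once instead of re-opening it for every element;
# 'w' overwrites the previous run ('a' would keep appending duplicates)
with open('filter_result.txt', 'w', encoding='utf-8') as fout:
    for node in filter_list:
        s = node.firstChild.data
        print(s)
        fout.write(s + '\n')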

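Resulting file:

filter_result.txt holds one name per line, 14 in total: the seven names from the <filter> elements (suspicious-request-filter through plugins-filter), followed by the same seven again from the <filter-mapping> elements.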
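Because <filter-name> appears in both <filter> and <filter-mapping>, a flat tag search lists each name twice. If the goal is each filter's name together with its class, one option is to walk the <filter> elements instead. The sketch below swaps in the standard-library xml.etree.ElementTree (not used in the original post). Note that web.xml declares a default namespace, so every tag is namespace-qualified; the sketch handles this with a namespaces mapping under an arbitrary prefix j. The inline XML is a trimmed stand-in for web.xml.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for web.xml; the real file would be loaded with ET.parse('web.xml').getroot()
SAMPLE = """<web-app xmlns="http://xmlns.jcp.org/xml/ns/javaee">
  <filter>
    <filter-name>encoding-filter</filter-name>
    <filter-class>hudson.util.CharacterEncodingFilter</filter-class>
  </filter>
  <filter>
    <filter-name>csrf-filter</filter-name>
    <filter-class>hudson.security.csrf.CrumbFilter</filter-class>
  </filter>
  <filter-mapping>
    <filter-name>encoding-filter</filter-name>
    <url-pattern>/*</url-pattern>
  </filter-mapping>
</web-app>"""

# The default namespace turns every tag into '{http://xmlns.jcp.org/xml/ns/javaee}...',
# so a bare findall('filter') finds nothing. Map a prefix to the URI instead.
NS = {'j': 'http://xmlns.jcp.org/xml/ns/javaee'}

root = ET.fromstring(SAMPLE)
filters = {}
for f in root.findall('j:filter', NS):  # direct <filter> children only, not <filter-mapping>
    name = f.findtext('j:filter-name', default='', namespaces=NS)
    cls = f.findtext('j:filter-class', default='', namespaces=NS)
    filters[name] = cls

for name, cls in filters.items():
    print(name, '->', cls)
```

Keying a dict by filter name also removes the duplicates, since each <filter> element is visited exactly once.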

2. Method two: extracting specific XML content with a regular expression

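import os
import re

with open('web.xml', mode='r', encoding='utf-8') as fin:
    text = fin.read()

# Non-greedy capture of the text between the tags.
# Caveat: this is a plain text search, so it also matches tags inside XML comments.
result = re.findall(r'<filter-name>(.*?)</filter-name>', text)

os.makedirs('array', exist_ok=True)  # the output directory must exist
with open('array/filter_result.txt', 'w', encoding='utf-8') as fout:
    for key in result:
        print(key)
        fout.write(key + '\n')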

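Output:

19 names: the 14 <filter-name> values that the minidom version also finds, plus five change-headers-filter entries from the commented-out block, because the regex does not skip XML comments.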
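The regex approach is quick, but it does not understand XML syntax: it matches tags inside comments (like the commented-out change-headers-filter block in web.xml), whereas a parser turns comments into comment nodes and skips them. Here is a small self-contained comparison, with an inline string standing in for web.xml:

```python
import re
import xml.dom.minidom

SAMPLE = """<web-app xmlns="http://xmlns.jcp.org/xml/ns/javaee">
  <filter>
    <filter-name>csrf-filter</filter-name>
  </filter>
  <!--
  <filter>
    <filter-name>change-headers-filter</filter-name>
  </filter>
  -->
</web-app>"""

# Regex: plain text search, so the commented-out element is included
regex_names = re.findall(r'<filter-name>(.*?)</filter-name>', SAMPLE)

# Parser: the comment becomes a Comment node, so only the real element is found
dom = xml.dom.minidom.parseString(SAMPLE)
parser_names = [
    node.firstChild.data
    for node in dom.documentElement.getElementsByTagName('filter-name')
]

print(regex_names)   # ['csrf-filter', 'change-headers-filter']
print(parser_names)  # ['csrf-filter']
```

For one-off text grepping the regex is fine; when the result must reflect what the XML actually means, a parser is the safer choice.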
