[Python] Processing XML Files with SAX and DOM


Contents

  • Preface
  • The SAX Module
    • Reading XML Files with SAX
      • Common Functions
      • The SAX Parser
      • The SAX Event Handler
    • Complete Example: Parsing an XML File with SAX

Preface

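SAX and DOM are both techniques for processing XML files, but they work differently. SAX is event-driven: it reads the XML file sequentially as a stream and fires an event for each construct it meets (start tag, character data, end tag), and the application parses the file by handling those events. DOM instead loads the entire XML file into memory as a tree and parses it by traversing that tree. SAX keeps memory use low but offers no random access to the document; DOM allows random access and modification at the cost of holding the whole document in memory. Which one to use depends on the requirements.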
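To make the contrast concrete, here is a minimal sketch that counts the same elements both ways. The document text and the names `ItemCounter`, `count_items_sax` and `count_items_dom` are made up for illustration; only `xml.sax` and `xml.dom.minidom` come from the standard library.

```python
import xml.sax
import xml.dom.minidom

XML_TEXT = "<root><item>a</item><item>b</item></root>"


class ItemCounter(xml.sax.ContentHandler):
    """SAX style: react to events while the parser streams the document."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "item":
            self.count += 1


def count_items_sax(text):
    handler = ItemCounter()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.count


def count_items_dom(text):
    # DOM style: build the whole tree in memory first, then query it.
    document = xml.dom.minidom.parseString(text)
    return len(document.getElementsByTagName("item"))


print(count_items_sax(XML_TEXT), count_items_dom(XML_TEXT))
```

The SAX version never holds more than the counter, while the DOM version keeps every node of the document alive until `document` is released.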

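The SAX Module

The SAX module is one way of parsing XML documents. It is built on an event-driven model: it walks through the elements and attributes of the document one by one and fires the corresponding events. Compared with the DOM model, the SAX model is more lightweight and is well suited to large XML documents.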

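Reading XML Files with SAX

xml.sax is a package in the Python standard library for parsing XML documents. It provides an event-based API: while the document is being parsed, the parser fires events, and the application parses and processes the document by handling them.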

Common Functions

make_parser creates and returns a SAX parser as an XMLReader object.

def make_parser(parser_list=()):
    """Creates and returns a SAX parser.

    Creates the first parser it is able to instantiate of the ones
    given in the iterable created by chaining parser_list and
    default_parser_list.  The iterables must contain the names of Python
    modules containing both a SAX parser and a create_parser function."""

In other words, it instantiates the first parser it can from the modules named in parser_list followed by default_parser_list. Each iterable must contain the names of Python modules that contain both a SAX parser and a create_parser function.
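A minimal sketch of using make_parser directly is shown below. The handler class `TagCollector` and the in-memory document are assumptions made for this example; the reader methods used (`setContentHandler`, `parse`) are part of the standard XMLReader interface.

```python
import io
import xml.sax


class TagCollector(xml.sax.ContentHandler):
    """Hypothetical handler that records the name of every start tag."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        self.tags.append(name)


reader = xml.sax.make_parser()      # an XMLReader object
handler = TagCollector()
reader.setContentHandler(handler)   # tell the reader where to send events
reader.parse(io.BytesIO(b"<a><b/><c/></a>"))  # file name or file-like object

print(handler.tags)
```

Calling make_parser yourself is mainly useful when the reader has to be configured (features, error handler) before parsing starts.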


parse creates a SAX parser and uses it to parse an XML document.

def parse(source, handler, errorHandler=ErrorHandler()):
    parser = make_parser()
    parser.setContentHandler(handler)
    parser.setErrorHandler(errorHandler)
    parser.parse(source)
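As the source above shows, parse is a shortcut for the make_parser / setContentHandler / parse sequence. The sketch below uses it to read attribute values; the `CategoryCollector` class and the sample document are illustrative assumptions, not part of the library.

```python
import io
import xml.sax


class CategoryCollector(xml.sax.ContentHandler):
    """Hypothetical handler that collects the category of each <goods>."""

    def __init__(self):
        super().__init__()
        self.categories = []

    def startElement(self, name, attrs):
        if name == "goods":
            # attrs behaves like a read-only mapping of attribute names
            self.categories.append(attrs.get("category", ""))


source = io.BytesIO(
    b'<storehouse><goods category="fish"/><goods category="fruit"/></storehouse>'
)
handler = CategoryCollector()
xml.sax.parse(source, handler)   # source may also be a file name

print(handler.categories)
```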

parseString is similar to parse, but it parses the XML contained in the string passed as the string parameter.

def parseString(string, handler, errorHandler=ErrorHandler()):
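A short sketch of parseString follows. The helper `extract_text` and the class `TextCollector` are names invented for this example. The text is encoded to bytes before parsing, which is a cautious assumption so the call behaves the same across Python 3 versions.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that gathers all character data."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect and join later.
        self.parts.append(content)


def extract_text(xml_text):
    handler = TextCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return "".join(handler.parts)


print(extract_text("<p>Hello, <b>SAX</b>!</p>"))
```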

SAXException encapsulates errors and warnings raised by XML operations.

class SAXException(Exception):
    """Encapsulate an XML error or warning. This class can contain
    basic error or warning information from either the XML parser or
    the application: you can subclass it to provide additional
    functionality, or to add localization. Note that although you will
    receive a SAXException as the argument to the handlers in the
    ErrorHandler interface, you are not actually required to raise
    the exception; instead, you can simply read the information in
    it."""
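In practice the exception most often seen is SAXParseException, a subclass of SAXException that the default error handler raises for malformed input. The sketch below relies on that default behavior; the function name `is_well_formed` is an assumption made for this example.

```python
import xml.sax


def is_well_formed(xml_bytes):
    """Return True if the document parses, False if the parser rejects it."""
    try:
        xml.sax.parseString(xml_bytes, xml.sax.ContentHandler())
    except xml.sax.SAXParseException as exc:
        # The exception carries the position and a message from the parser.
        print("parse error at line", exc.getLineNumber(), ":", exc.getMessage())
        return False
    return True


print(is_well_formed(b"<a><b/></a>"))
print(is_well_formed(b"<a><b></a>"))   # <b> is never closed
```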

The SAX Parser

Its main job is to send events to the event handler.
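One way to see the parser acting as an event source is to feed it the document in pieces. This is a sketch under the assumption that the reader returned by make_parser is the default expat-based one, which supports the incremental `feed` / `close` methods; `StartTagRecorder` is a made-up name. Exactly when an event is delivered relative to the chunk that completes it can vary with the underlying expat version, so the intermediate printout is only indicative.

```python
import xml.sax


class StartTagRecorder(xml.sax.ContentHandler):
    """Hypothetical handler that remembers each start tag it is sent."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startElement(self, name, attrs):
        self.seen.append(name)


reader = xml.sax.make_parser()
handler = StartTagRecorder()
reader.setContentHandler(handler)

reader.feed(b"<log><entry/>")       # first chunk of the document
print("after first chunk:", handler.seen)

reader.feed(b"<entry/></log>")      # rest of the document
reader.close()                      # signals end of input
print("after close:", handler.seen)
```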

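The SAX Event Handler

The event handler is implemented with the ContentHandler class: subclass it and override the callbacks you need.

# ===== CONTENTHANDLER =====

class ContentHandler:
    """Interface for receiving logical document content events.

    This is the main callback interface in SAX, and the one most
    important to applications. The order of events in this interface
    mirrors the order of the information in the document."""

Note the last sentence: the order of events in this interface mirrors the order of the information in the document.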

class ContentHandler:
    """Interface for receiving logical document content events.

    This is the main callback interface in SAX, and the one most
    important to applications. The order of events in this interface
    mirrors the order of the information in the document."""

    def __init__(self):
        self._locator = None

    def setDocumentLocator(self, locator):
        """Called by the parser to give the application a locator for
        locating the origin of document events.

        SAX parsers are strongly encouraged (though not absolutely
        required) to supply a locator: if it does so, it must supply
        the locator to the application by invoking this method before
        invoking any of the other methods in the DocumentHandler
        interface.

        The locator allows the application to determine the end
        position of any document-related event, even if the parser is
        not reporting an error. Typically, the application will use
        this information for reporting its own errors (such as
        character content that does not match an application's
        business rules). The information returned by the locator is
        probably not sufficient for use with a search engine.

        Note that the locator will return correct information only
        during the invocation of the events in this interface. The
        application should not attempt to use it at any other time."""
        self._locator = locator

    def startDocument(self):
        """Receive notification of the beginning of a document.

        The SAX parser will invoke this method only once, before any
        other methods in this interface or in DTDHandler (except for
        setDocumentLocator)."""

    def endDocument(self):
        """Receive notification of the end of a document.

        The SAX parser will invoke this method only once, and it will
        be the last method invoked during the parse. The parser shall
        not invoke this method until it has either abandoned parsing
        (because of an unrecoverable error) or reached the end of
        input."""

    def startPrefixMapping(self, prefix, uri):
        """Begin the scope of a prefix-URI Namespace mapping.

        The information from this event is not necessary for normal
        Namespace processing: the SAX XML reader will automatically
        replace prefixes for element and attribute names when the
        http://xml.org/sax/features/namespaces feature is true (the
        default).

        There are cases, however, when applications need to use
        prefixes in character data or in attribute values, where they
        cannot safely be expanded automatically; the
        start/endPrefixMapping event supplies the information to the
        application to expand prefixes in those contexts itself, if
        necessary.

        Note that start/endPrefixMapping events are not guaranteed to
        be properly nested relative to each-other: all
        startPrefixMapping events will occur before the corresponding
        startElement event, and all endPrefixMapping events will occur
        after the corresponding endElement event, but their order is
        not guaranteed."""

    def endPrefixMapping(self, prefix):
        """End the scope of a prefix-URI mapping.

        See startPrefixMapping for details. This event will always
        occur after the corresponding endElement event, but the order
        of endPrefixMapping events is not otherwise guaranteed."""

    def startElement(self, name, attrs):
        """Signals the start of an element in non-namespace mode.

        The name parameter contains the raw XML 1.0 name of the
        element type as a string and the attrs parameter holds an
        instance of the Attributes class containing the attributes of
        the element."""

    def endElement(self, name):
        """Signals the end of an element in non-namespace mode.

        The name parameter contains the name of the element type, just
        as with the startElement event."""

    def startElementNS(self, name, qname, attrs):
        """Signals the start of an element in namespace mode.

        The name parameter contains the name of the element type as a
        (uri, localname) tuple, the qname parameter the raw XML 1.0
        name used in the source document, and the attrs parameter
        holds an instance of the Attributes class containing the
        attributes of the element.

        The uri part of the name tuple is None for elements which have
        no namespace."""

    def endElementNS(self, name, qname):
        """Signals the end of an element in namespace mode.

        The name parameter contains the name of the element type, just
        as with the startElementNS event."""

    def characters(self, content):
        """Receive notification of character data.

        The Parser will call this method to report each chunk of
        character data. SAX parsers may return all contiguous
        character data in a single chunk, or they may split it into
        several chunks; however, all of the characters in any single
        event must come from the same external entity so that the
        Locator provides useful information."""

    def ignorableWhitespace(self, whitespace):
        """Receive notification of ignorable whitespace in element content.

        Validating Parsers must use this method to report each chunk
        of ignorable whitespace (see the W3C XML 1.0 recommendation,
        section 2.10): non-validating parsers may also use this method
        if they are capable of parsing and using content models.

        SAX parsers may return all contiguous whitespace in a single
        chunk, or they may split it into several chunks; however, all
        of the characters in any single event must come from the same
        external entity, so that the Locator provides useful
        information."""

    def processingInstruction(self, target, data):
        """Receive notification of a processing instruction.

        The Parser will invoke this method once for each processing
        instruction found: note that processing instructions may occur
        before or after the main document element.

        A SAX parser should never report an XML declaration (XML 1.0,
        section 2.8) or a text declaration (XML 1.0, section 4.3.1)
        using this method."""

    def skippedEntity(self, name):
        """Receive notification of a skipped entity.

        The Parser will invoke this method once for each entity
        skipped. Non-validating processors may skip entities if they
        have not seen the declarations (because, for example, the
        entity was declared in an external DTD subset). All processors
        may skip external entities, depending on the values of the
        http://xml.org/sax/features/external-general-entities and the
        http://xml.org/sax/features/external-parameter-entities
        properties."""

# ===== DTDHandler =====
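The claim that events mirror document order can be checked with a small logging handler. This is an illustrative sketch: `EventLogger` and the one-line sample document are assumptions, and only the callbacks documented in ContentHandler above are overridden.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Hypothetical handler that records every event it receives, in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append(("startDocument",))

    def endDocument(self):
        self.events.append(("endDocument",))

    def startElement(self, name, attrs):
        # Copy the attributes into a plain dict for easy inspection.
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        self.events.append(("chars", content))


logger = EventLogger()
xml.sax.parseString(b'<note id="1">hi</note>', logger)

for event in logger.events:
    print(event)
```

Because characters may be called more than once for a single run of text, code that needs the full text of an element should join the chunks rather than rely on a single call.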

Complete Example: Parsing an XML File with SAX

SAX_parse_XML.py

import xml.sax

get_record = []  # collects the data read from the XML document


class GetStorehouse(xml.sax.ContentHandler):  # event handler
    def __init__(self):
        super().__init__()
        self.CurrentDate = ""  # name of the element currently being read
        self.category = ""     # category attribute of the current <goods>
        self.title = ""        # second-level product classification
        self.name = ""
        self.amount = ""
        self.price = ""

    def startElement(self, label, attributes):
        # Called when an opening tag is met; label is the tag name
        # passed in by the parser.
        self.CurrentDate = label
        if label == "goods":
            self.category = attributes["category"]
            # Start with empty fields for every new <goods> element.
            self.title = self.name = self.amount = self.price = ""

    def endElement(self, label):
        # Store the text of the element that has just been closed.
        if label == "title":
            get_record.append(self.title)
        elif label == "name":
            get_record.append(self.name)
        elif label == "amount":
            get_record.append(self.amount)
        elif label == "price":
            get_record.append(self.price)
        # Reset so that whitespace between elements is ignored.
        self.CurrentDate = ""

    def characters(self, content):
        # Text may arrive in several chunks, so append instead of overwrite.
        if self.CurrentDate == "title":
            self.title += content
        elif self.CurrentDate == "name":
            self.name += content
        elif self.CurrentDate == "amount":
            self.amount += content
        elif self.CurrentDate == "price":
            self.price += content


# =======
parser = xml.sax.make_parser()  # create the parser's XMLReader object
# Parse the data as plain XML names, with namespace processing turned off.
parser.setFeature(xml.sax.handler.feature_namespaces, 0)
Handler = GetStorehouse()
parser.setContentHandler(Handler)
parser.parse("storehouse.xml")
print(get_record)

Output:

['淡水鱼', '鲫鱼', '18', '8', '温带水果', '猕猴桃', '10', '10']
storehouse.xml

<storehouse>
    <goods category="fish">
        <title>淡水鱼</title>
        <name>鲫鱼</name>
        <amount>18</amount>
        <price>8</price>
    </goods>
    <goods category="fruit">
        <title>温带水果</title>
        <name>猕猴桃</name>
        <amount>10</amount>
        <price>10</price>
    </goods>
</storehouse>
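A flat list loses the link between a value and the product it belongs to. As one possible variation (a sketch, not the article's original code), the handler below builds one dictionary per goods element. The class name `GoodsHandler`, its attributes, and the inlined copy of the storehouse data are assumptions made so the example runs on its own.

```python
import xml.sax

STOREHOUSE_XML = """<storehouse>
    <goods category="fish">
        <title>淡水鱼</title><name>鲫鱼</name>
        <amount>18</amount><price>8</price>
    </goods>
    <goods category="fruit">
        <title>温带水果</title><name>猕猴桃</name>
        <amount>10</amount><price>10</price>
    </goods>
</storehouse>"""


class GoodsHandler(xml.sax.ContentHandler):
    """Hypothetical handler that builds one dict per <goods> element."""

    FIELDS = ("title", "name", "amount", "price")

    def __init__(self):
        super().__init__()
        self.records = []
        self._current = None   # dict for the <goods> being read
        self._buffer = None    # text chunks of the field being read

    def startElement(self, name, attrs):
        if name == "goods":
            self._current = {"category": attrs.get("category")}
        elif name in self.FIELDS and self._current is not None:
            self._buffer = []

    def characters(self, content):
        if self._buffer is not None:      # ignore text outside the fields
            self._buffer.append(content)

    def endElement(self, name):
        if name in self.FIELDS and self._buffer is not None:
            self._current[name] = "".join(self._buffer)
            self._buffer = None
        elif name == "goods":
            self.records.append(self._current)
            self._current = None


handler = GoodsHandler()
xml.sax.parseString(STOREHOUSE_XML.encode("utf-8"), handler)

for record in handler.records:
    print(record)
```

Keeping the state on the handler instance rather than in a module-level list also lets the same class be reused for several documents.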

