http://www.oreillynet.com/xml/blog/2007/03/parsing_xml_backwards.html
Okay, I’ve heard jokes about people parsing XML files backwards, starting at the end of the file and reporting SAX events in reverse document order, but it seems that someone has actually gone and done it. The justification sounds almost plausible: an instant messaging client (Adium on the Mac) that writes out XML message log files and uses backwards parsing as a method for retrieving the last N messages in constant time, regardless of how many messages the file contains in total. However, it’s crazy to think of doing this for XML in general.
First problem: the document encoding. You don’t know what it is unless you sniff the beginning of the file and read the XML declaration, if present. A specific application may always write out XML in the same encoding and thus not bother to check, but this is not good enough for the general case.
Second problem: the DOCTYPE declaration. This can define entities and fixed attribute values, and again it’s at the beginning of the file, not the end. If you parse the file backwards and hit an entity reference, you have no idea what to do with it. A specific application may decide it’s just not going to handle entities, but that won’t work for XML in general.
Third problem: comments. Say you’re parsing backwards through an XML file and you see this: -->
. Must be the end of a comment, right? Wrong, it’s the end of an element: </hello-->
. This is a killer for efficient parsing as it means you need potentially unbounded look-ahead (or look-behind, in this case) to decide what something is. (This problem could be avoided if comments were symmetrical and ended with --!>
, but XML just wasn’t designed to be parsed backwards).
Fourth problem: processing instructions. As with the comments, how can you tell what this is:?>
. The problem this time is that text can contain unescaped >
characters (as long as they don’t follow ]]
), so a backwards parser may need to look a very long way ahead to tell if this is the end of a processing instruction or just some text.
Fifth problem: documents that are not well-formed. I shudder to think what this parser would do with them, especially considering that it may stop parsing before it even reaches the beginning of the document.
Sixth problem: how do you append to the document in the first place if the root element is already closed? And if you never close the root element, then it’s not an XML file at all.
In summary, don’t use this technique! Sure, you might think that all of these problems can be avoided by making sure that your XML is in a fixed encoding, doesn’t use entities or processing instructions or whatever, but what’s the point of choosing an open standard text format if you’re going to impose arbitrary limitations on its use?
There are a number of other ways that applications can use to maintain growable log files. You can write multiple well-formed XML documents to a single file, following each one by a binary trailer that gives the size of the last chunk of XML. Then it is trivial for code to jump backwards through the file, grabbing a little document each time and passing it to a real XML parser. Or if your filesystem doesn’t suck, you can place each XML document in a separate file with a suitable filename and scan the directory, just like a maildir. Or if you don’t like writing this kind of code, grab a simple database library like Berkeley DB and let it do the work for you. Just don’t parse XML backwards!
[Note: unless you’re writing a syntax highlighting XML editor, and you want to do efficient update while the user edits the file. Then go for it.]
分享到:
相关推荐
郁闷啊,有时候不得不承认,无论是什么事,曾经是好的,到后边未必还是好的,不要拿曾经的种种来判断今天的结果, 前景:之前本地用jeecg(1.7版本)设计流程、发布流程、修改流程,所有的操作都是项目有汉字启动的,...
bootchart执行出错,替换ubuntu下面的/usr/share/pyshared/pybootchartgui三个脚本就可以了。
Pyramid Scene Parsing Network 是关于语义分割的论文,效果不错。
C++ parsing DSL
This is a code for XML . API includes creating xml file, parsing of xml file, searching
用sax方法解析xml文件,url可以随意改写,解析后的数据以systemout打印在logcat中,可通过logcat看到打印的时间。源码中有写测试语句。
1. Overview of Web Applications, XML, and Java 2. Parsing XML Documents 3. Const
Chapter 5: Parsing XML with DOM . . . . . . . . . . . . . . . . . . . . . . . . . . . 91 Chapter 6: Parsing XML with SAX . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 Chapter 7: XSLT ...
·比RapidXML功能强很多.比Tiny都强多了.速度和Rapid差不多 ·源代码只有285k 3个文件 ·Low memory consumption and fragmentation (compared to other DOM style parsers)....·Lacks UTF-16/32 parsing.
Nodejs: Using for Parsing the XML file from the program
Lexical Analysis and Parsing .pdf
资源来自pypi官网。 资源全名:nr.parsing.date-0.4.4.tar.gz
The xml parsing example code
String Parsing.zip西门子PLC编程实例程序源码下载String Parsing.zip西门子PLC编程实例程序源码下载String Parsing.zip西门子PLC编程实例程序源码下载String Parsing.zip西门子PLC编程实例程序源码下载 1.合个人...
mo-sql-parsing-9.294.22344.tar.gz
资源来自pypi官网。 资源全名:bibliograph.parsing-1.0.1.tar.gz
dll控件常规安装方法(仅供参考): 一、如果在运行某软件或编译程序时提示缺少、找不到dll等类似提示,您可将下载来的dll拷贝到指定目录即可(一般是system系统目录或放到软件同级目录里面),或者重新添加文件引用...
evb project XML parsing demo