
XML Parsing: Element Tree (etree) vs. minidom [duplicate]

DOM and SAX are the classic interfaces for parsing XML. Python had to provide them because they are well known and standardized.

The ElementTree package was intended to provide a more Pythonic interface. It is all about making things easier for the programmer.
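To show the difference in style, here is a rough sketch that pulls the same data out of a small document with both xml.dom.minidom and xml.etree.ElementTree from the standard library. The sample XML and the function names are made up for illustration.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
SAMPLE = "<library><book id='1'>Dune</book><book id='2'>Emma</book></library>"

def books_minidom(xml_text):
    """DOM style: W3C method names, and text lives in separate Text nodes."""
    doc = xml.dom.minidom.parseString(xml_text)
    result = []
    for book in doc.getElementsByTagName("book"):
        # The title is the data of the element's first (Text) child node.
        result.append((book.getAttribute("id"), book.firstChild.data))
    return result

def books_etree(xml_text):
    """ElementTree style: attributes via get(), text as a plain attribute."""
    root = ET.fromstring(xml_text)
    return [(book.get("id"), book.text) for book in root.iter("book")]

print(books_minidom(SAMPLE))  # [('1', 'Dune'), ('2', 'Emma')]
print(books_etree(SAMPLE))    # [('1', 'Dune'), ('2', 'Emma')]
```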

Depending on your build, each of those has an underlying C implementation that makes them run fast.

None of these tools is being deprecated. Each has its merits (SAX, for example, doesn't need to read the whole input into memory).
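As a rough illustration of the streaming point, the sketch below counts elements with xml.sax. The handler only sees start-tag events as they arrive, so no tree is built. The BookCounter class and the tiny input are hypothetical; for a large file you would pass a filename or file object to xml.sax.parse() instead of using parseString().

```python
import xml.sax

class BookCounter(xml.sax.ContentHandler):
    """Counts <book> start tags as events arrive; nothing is kept in a tree."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        # Called once per start tag, in document order.
        if name == "book":
            self.count += 1

def count_books(xml_bytes):
    handler = BookCounter()
    xml.sax.parseString(xml_bytes, handler)
    return handler.count

print(count_books(b"<library><book/><book/><magazine/></library>"))  # 2
```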

There is also a third-party module, lxml, which is another popular choice (full-featured and fast).

    • And if you have performance issues with ElementTree, there's lxml, which provides a compatible interface but uses a battle-hardened, highly tuned C library behind the scenes.
    • ElementTree is "more Pythonic" mainly because you say myNode[3] instead of myNode.childNodes[3] to get the fourth child. It takes 2 lines of code to tweak any DOM implementation so you can do the same. More importantly, ElementTree treats text content very differently from nearly every other tool, and that makes some common tasks much more difficult. For example, to collect all the text, you not only have to recurse, you also have to grab 2 properties off each node (text at the start of an element is stored in its text property, while text that follows a sub-element is stored in that sub-element's tail!)
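A small sketch of the text/tail point, using only the standard library (the sample markup and the all_text helper are hypothetical). It also shows that indexing an Element skips text, whereas minidom's childNodes includes text nodes, so the two indexes don't line up one-to-one. In practice, Element.itertext() does the same recursion for you.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

SRC = "<p>Hello <b>bold</b> and <i>italic</i> world</p>"

def all_text(elem):
    """Collect all text under elem in document order.

    ElementTree keeps text in two places: elem.text (before the first child)
    and child.tail (after each child's end tag, up to the next sibling).
    """
    parts = [elem.text or ""]
    for child in elem:
        parts.append(all_text(child))
        # The tail is stored on the child but is really text of the parent.
        parts.append(child.tail or "")
    return "".join(parts)

root = ET.fromstring(SRC)
print(all_text(root))            # Hello bold and italic world
print("".join(root.itertext()))  # same result via the built-in helper

# Indexing an Element yields only element children (no text nodes) ...
print(root[1].tag)  # i
# ... while minidom's childNodes also counts the text nodes.
dom_p = xml.dom.minidom.parseString(SRC).documentElement
print(dom_p.childNodes[1].tagName)  # b (childNodes[0] is the text "Hello ")
```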

Python probably has two interfaces because ElementTree was added to the standard library a good deal later than minidom. The reason for adding it was likely its far more "Pythonic" API compared to the W3C-controlled DOM.

If you're concerned about speed, there's also lxml, which builds an ElementTree-compatible DOM using libxml2 and should be quite fast. The lxml developers make available a benchmark suite comparing lxml with ElementTree's Python and C implementations.

If you're concerned about memory use, you shouldn't be using a tree API anyway. PullDOM (xml.dom.pulldom) might be a better choice, but I'm extrapolating from my experience with Java's excellent pull parser; there doesn't seem to be much current information on PullDOM.
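For what it's worth, here is a minimal sketch of the pull approach with xml.dom.pulldom. It iterates over parse events and calls expandNode() only for the elements it cares about, so the document is never assembled into one full tree. The catalog format, the expensive_items function and the price threshold are assumptions made up for illustration.

```python
from xml.dom import pulldom

def expensive_items(xml_text, threshold):
    """Return names of <item> elements whose price attribute exceeds threshold.

    Only each matching <item> is expanded into a small DOM subtree.
    """
    names = []
    events = pulldom.parseString(xml_text)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "item":
            # Pull the rest of this <item> (its children) into node.
            events.expandNode(node)
            if float(node.getAttribute("price")) > threshold:
                name = node.getElementsByTagName("name")[0]
                names.append(name.firstChild.data)
    return names

# Hypothetical input document.
SRC = """<catalog>
  <item price="4.50"><name>Pen</name></item>
  <item price="120.00"><name>Chair</name></item>
</catalog>"""

print(expensive_items(SRC, 10))  # ['Chair']
```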

