Documentation: the xmlproc APIs
Using the API
Ordinary XML parsing
An application that uses the xmlproc API has to import the xmlproc
module (non-validating parsing) or the xmlval module (validating
parsing). A parser object is created by instantiating the
XMLProcessor class (non-validating) or the XMLValidator class
(validating). Both classes have the same interface.
If you want to receive information about the document being parsed you
must implement an object conforming to the Application interface, and
tell the parser about it with the set_application method.
If you want to receive error events and react to them you must
implement an object conforming to the ErrorHandler interface, and tell
the parser to use your error handler with the set_error_handler
method.
It is also possible to control the way the parser interprets system
identifiers, by implementing an object conforming to the
InputSourceFactory interface and giving it to the parser with the
Working with DTDs and catalog files
See the DTD API documentation and
the catalog file documentation.
List of interfaces
These are the classes of interest to xmlproc application writers:
This is the interface implemented by the two XML parser objects and is
used to control parsing.
- Instantiates a parser.
- Tells the parser where to send data events.
- Tells the parser where to send error events.
- Tells the parser which object to use to map system identifiers
to file-like objects.
- Tells the parser which object to use to map public identifiers
to system identifiers.
def set_dtd_listener(self, dtd_listener):
- Tells the parser where to send DTD parse events. The dtd_listener
object must implement the
- Makes the parser parse the XML document with the given system identifier.
- Resets the parser to process another file, losing all unparsed data.
- Makes the parser parse a chunk of data.
- Closes the parser, making it process all remaining data. The
effects of calling feed after close and before the first reset are
- Returns the system identifier of the current entity being
- Returns the current offset (in characters) from the start of
- Returns the current line number.
- Returns the current column position.
- Returns the object holding information about the DTD of the
document. This object conforms to the DTD
interface. (Note that the DTD object returned by XMLProcessor will have
much less information, since the XMLProcessor does not keep as much
- Tells the parser which language to report errors in. 'language'
must be an ISO 639 language code (case does not matter). A KeyError
will be raised if the language is not supported.
- Tells the parser whether to keep reporting data events to the
application after a well-formedness error (0) or to stop
reporting data (1, which is the default).
def set_read_external_subset(self, read):
- Tells the parser whether to read the external DTD subset of
documents (including external parameter entities). Note that
XMLValidator will ignore this method and always read the external
- The parser creates circular data structures during parsing. When
the parser object is no longer to be used and you wish to free the
memory it has allocated, call this method. The parser object will
be non-functional afterwards.
- This method returns the list that holds the stack of open elements.
Note that this list is live and must not be modified
by the application.
- Returns the raw XML string that triggered the current callback event.
- Returns a snapshot of the current stack of open entities as a list
of (entity name, entity sysid) tuples.
This is the interface of the objects that data events from the parsed
- Called by the parser to give the application an object to query
for the current location. The object conforms to the parser interface.
- Called at the start of the document, first of all method calls,
- Called at the end of the document, last of all method calls.
- Notifies the application of comments. (Note that it is improper
for applications to let information in comments affect their
- Called by the parser for each start tag. 'name' is the name of
the element, 'attrs' is a hash mapping attribute names to attribute values.
- Called by the parser for each end tag. 'name' is the name of the
- Called by the parser whenever it encounters textual data. (This
callback does not distinguish between character entity references,
entity references, CDATA marked sections or plain text.)
- The validating parser calls this method instead of handle_data
for whitespace that appears in elements which do not allow mixed
content (i.e. #PCDATA content).
- Called to notify the application of processing instructions.
- Called to notify the application of the contents of the DOCTYPE
- Called to notify the application of the contents of the XML
declaration (and also for text declarations in external parsed
entities). The values of the parameters will be None if the PI
attributes were not present in the document.
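
The callbacks listed above can be sketched as a small application
object. This is only an illustration and makes several assumptions:
apart from handle_data, the method names used here (set_locator,
doc_start, doc_end, handle_start_tag, handle_end_tag) and the
(data, start, end) argument convention for handle_data are not given in
the text above and should be checked against the xmlproc source. The
object is driven by hand at the end so that the sketch runs without
xmlproc installed; in real use it would be passed to set_application.

```python
class ElementCounter:
    """Counts start tags per element name and collects character data.

    Method names and signatures are assumptions for illustration.
    """

    def __init__(self):
        self.locator = None
        self.counts = {}
        self.text = []

    def set_locator(self, locator):
        # The parser hands over an object to query for the current location.
        self.locator = locator

    def doc_start(self):
        # Reset state so that one instance can be reused across documents.
        self.counts = {}
        self.text = []

    def doc_end(self):
        pass

    def handle_start_tag(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1

    def handle_end_tag(self, name):
        pass

    def handle_data(self, data, start, end):
        # Assumed convention: the reported text is the slice data[start:end].
        self.text.append(data[start:end])


# Hand-driven event sequence standing in for a real parse.
app = ElementCounter()
app.doc_start()
app.handle_start_tag("doc", {})
app.handle_start_tag("p", {"id": "a"})
buf = "<p>Hello</p>"
app.handle_data(buf, 3, 8)
app.handle_end_tag("p")
app.handle_start_tag("p", {})
app.handle_end_tag("p")
app.handle_end_tag("doc")
app.doc_end()

result_counts = app.counts
result_text = "".join(app.text)
```

Keeping all state on the object and resetting it in the start-of-document
callback is one way to make the same application object safe to reuse
after the parser has been reset.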
This interface is used to receive information about errors encountered
during the parsing of the document.
- Creates a new error handler, and gives it the locator to use to
locate error events.
- Tells the error handler where to find location information for
the error events. The object given in the 'loc' parameter conforms
to the Parser interface.
- Returns the locator of this error handler.
- Called to handle a warning message.
- Called to handle a non-fatal error.
- Called to handle a fatal error.
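
An error handler along the lines described above might simply collect
messages together with their location. The sketch below assumes the
method names set_locator, get_locator, warning, error and fatal, and
assumes that the locator offers get_line and get_column; none of these
names are confirmed by the text above. FixedLocator is a hypothetical
stand-in for the parser, used only so the sketch runs on its own.

```python
class CollectingErrorHandler:
    """Records (level, line, column, message) tuples instead of printing.

    Method names are assumptions for illustration.
    """

    def __init__(self, locator):
        self.locator = locator
        self.messages = []

    def set_locator(self, loc):
        self.locator = loc

    def get_locator(self):
        return self.locator

    def _record(self, level, msg):
        # Ask the locator where the parser currently is.
        line = self.locator.get_line()
        column = self.locator.get_column()
        self.messages.append((level, line, column, msg))

    def warning(self, msg):
        self._record("warning", msg)

    def error(self, msg):
        self._record("error", msg)

    def fatal(self, msg):
        self._record("fatal", msg)


class FixedLocator:
    """Hypothetical stand-in for the parser; always reports one position."""

    def __init__(self, line, column):
        self._line = line
        self._column = column

    def get_line(self):
        return self._line

    def get_column(self):
        return self._column


handler = CollectingErrorHandler(FixedLocator(3, 14))
handler.warning("undeclared attribute")
handler.fatal("mismatched end tag")
```

Collecting messages rather than printing them leaves the decision about
what to do with warnings and errors to the calling application.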
This interface is used by the parser to resolve any public identifiers
used in the document to their corresponding system identifiers. The
default implementation always returns the given system identifier, but
the interface has been included mainly to allow support for catalog
- Called to resolve the system identifier at which this external
parameter entity can be found.
- Called to resolve the system identifier at which this document
type definition can be found. (Called from the DOCTYPE declaration.)
- Called to resolve the system identifier of an external entity.
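
A resolver of this kind can be sketched as a lookup table that falls
back to the given system identifier, which matches the default
behaviour described above. The method names (resolve_pe_pubid,
resolve_doctype_pubid, resolve_entity_pubid) and the (pubid, sysid)
argument order are assumptions, and the local path in the table is
hypothetical.

```python
class TableResolver:
    """Maps public identifiers to system identifiers via a dictionary.

    Method names and argument order are assumptions for illustration.
    """

    def __init__(self, table):
        self.table = dict(table)

    def _resolve(self, pubid, sysid):
        # Unknown or missing public identifiers leave the sysid unchanged.
        if pubid is None:
            return sysid
        return self.table.get(pubid, sysid)

    def resolve_pe_pubid(self, pubid, sysid):
        return self._resolve(pubid, sysid)

    def resolve_doctype_pubid(self, pubid, sysid):
        return self._resolve(pubid, sysid)

    def resolve_entity_pubid(self, pubid, sysid):
        return self._resolve(pubid, sysid)


resolver = TableResolver({
    # Hypothetical local copy of a well-known DTD.
    "-//W3C//DTD XHTML 1.0 Strict//EN": "dtds/xhtml1-strict.dtd",
})

known = resolver.resolve_doctype_pubid(
    "-//W3C//DTD XHTML 1.0 Strict//EN",
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd",
)
unknown = resolver.resolve_entity_pubid("-//Example//Unknown//EN", "other.ent")
no_pubid = resolver.resolve_pe_pubid(None, "local.ent")
```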
This interface allows users to control the way in which the
parser interprets system identifiers. This is especially useful for
embedding the parser in a larger document system, which may want
system identifiers to refer to other documents inside the document
system rather than only to ordinary URLs. It also allows
the application to interpret system identifiers that are URIs, but not
URLs, such as URNs.
The default implementation interprets system identifiers as URLs.
- This method returns a file-like object from which the document
referred to by the system identifier can be read.
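
As an illustration of the idea, the sketch below serves documents from
an in-memory table keyed by system identifier, so that identifiers which
are not URLs can still be opened. The method name create_input_source is
an assumption, and the urn:x-example identifiers are made up for the
example.

```python
import io


class InMemorySourceFactory:
    """Returns file-like objects for documents held in a dictionary.

    The method name create_input_source is an assumption for illustration.
    """

    def __init__(self, documents):
        self.documents = dict(documents)

    def create_input_source(self, sysid):
        if sysid not in self.documents:
            raise IOError("no document for system identifier %r" % (sysid,))
        # StringIO gives a file-like object with read() and readline().
        return io.StringIO(self.documents[sysid])


factory = InMemorySourceFactory({
    "urn:x-example:doc1": "<doc/>",
})

text = factory.create_input_source("urn:x-example:doc1").read()
```

Each call returns a fresh file-like object, so the same system
identifier can be opened more than once without sharing a read position.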
Last update 2000-05-11 14:20, by
Lars Marius Garshol.