Documentation: the xmlproc DTD APIs

Working with DTDs

Accessing DTD information

Complete DTD information is available only from the validating parser. Both parsers, however, implement a get_dtd method, which returns an object conforming to the DTD interface and containing whatever DTD information that parser has collected.

Parsing a DTD without parsing a document

This is now supported through the dtdparser module.

List of interfaces

These are the interfaces used to discover information about the DTD of the parsed document:

The DTD interface

This is the interface of the object that holds information about the DTD of the current document. The methods below can be used to query it.

def get_root_elem(self):
Returns the name of the element declared as the root element (as a string), or None if none was declared.
def get_elem(self,name):
Returns the element object of the element with the given name. Throws a KeyError if no such element has been declared.
def get_elements(self):
Returns a list of all declared element names.
def get_notation(self,name):
Returns a (pubid, sysid) tuple representing the named notation. If no such notation has been declared a KeyError is thrown.
def get_notations(self):
Returns a list of the names of all declared notations.
def get_general_entities(self):
Returns a list of all declared general entity names.
def get_parameter_entities(self):
Returns a list of all declared parameter entity names.
def resolve_pe(self,name):
Returns the entity object (either InternalEntity or ExternalEntity) of the parameter entity with the given name. If no parameter entity with this name has been declared a KeyError is thrown.
def resolve_ge(self,name):
Returns the entity object (either InternalEntity or ExternalEntity) of the general entity with the given name. If no general entity with this name has been declared a KeyError is thrown.
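
As a rough illustration of how the query methods above fit together, here is a minimal sketch. StubDTD and StubElementType are hypothetical stand-ins written for this example (they are not xmlproc classes), and summarize_dtd and has_element are made-up helper names. The only behaviour taken from the descriptions above is that get_elements returns names and that get_elem fails with a KeyError for undeclared names.

```python
class StubElementType:
    """Hypothetical stand-in exposing only get_name and get_attr_list."""

    def __init__(self, name, attrs):
        self._name = name
        self._attrs = list(attrs)

    def get_name(self):
        return self._name

    def get_attr_list(self):
        return list(self._attrs)


class StubDTD:
    """Hypothetical stand-in for an object conforming to the DTD interface."""

    def __init__(self, root, elems):
        self._root = root
        self._elems = elems  # maps element name -> StubElementType

    def get_root_elem(self):
        return self._root

    def get_elem(self, name):
        return self._elems[name]  # KeyError for undeclared names

    def get_elements(self):
        return list(self._elems)


def summarize_dtd(dtd):
    """Map each declared element name to a sorted list of its attribute names."""
    summary = {}
    for name in dtd.get_elements():
        summary[name] = sorted(dtd.get_elem(name).get_attr_list())
    return summary


def has_element(dtd, name):
    """Return True if 'name' is declared, relying on the documented KeyError."""
    try:
        dtd.get_elem(name)
    except KeyError:
        return False
    return True


dtd = StubDTD("doc", {
    "doc": StubElementType("doc", ["version"]),
    "para": StubElementType("para", ["id", "class"]),
})
print(dtd.get_root_elem())
print(summarize_dtd(dtd))
print(has_element(dtd, "table"))
```

The helpers touch only methods listed in the DTD and ElementType interfaces, so they should in principle work with any conforming object, though this has only been tried against the stubs shown here.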

The ElementType interface

This class encapsulates information about an element type.

def get_name(self):
Returns the name of the element type.
def get_attr_list(self):
Returns a list of the names of the declared attributes for this element (as strings).
def get_attr(self,name):
Returns the attribute object of the given attribute or throws a KeyError if none has been declared.
def get_start_state(self):
Returns the start state of the content model of the element. (No guarantee is made as to the type of this value; treat it as an opaque magic cookie.)
def final_state(self,state):
Returns true if the given state (as returned by get_start_state or next_state) is a final state, i.e. one in which the element is allowed to end.
def next_state(self,state,elem_name):
Returns the next state of the element (again of an unspecified type) when an element with the given name is encountered in the given state. Character data is represented by the element name '#PCDATA'. If the element is not allowed in this state, the value 0 is returned.
def get_valid_elements(self,state):
Returns a list of the valid elements in the given state, or the empty list if none are valid (or if the state is unknown).
def get_content_model(self):
Returns the element content model in (sep,cont,mod) format, where cont is a list of (name,mod) and (sep,cont,mod) tuples. ANY content models are represented as None, and EMPTYs as ("",[],"").
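
The state methods above can be combined to check a sequence of child elements against a content model. The following is a minimal sketch under stated assumptions: StubElementType is a hypothetical stand-in that hard-codes the content model (title, para+) as a small transition table, and children_allowed is a made-up helper name. Only the documented conventions are relied on: states are opaque values, 0 signals a disallowed element, and final_state says whether the element may end.

```python
class StubElementType:
    """Hypothetical stand-in modelling the content model (title, para+)."""

    # (state, child name) -> next state; 0 is never used as a real state
    _transitions = {
        (1, "title"): 2,
        (2, "para"): 3,
        (3, "para"): 3,
    }
    _final = {3}

    def get_start_state(self):
        return 1

    def final_state(self, state):
        return state in self._final

    def next_state(self, state, elem_name):
        # 0 means "not allowed here", as described above
        return self._transitions.get((state, elem_name), 0)

    def get_valid_elements(self, state):
        return [n for (s, n) in self._transitions if s == state]


def children_allowed(elem_type, child_names):
    """Walk the content model and report whether the child sequence is valid."""
    state = elem_type.get_start_state()
    for name in child_names:
        state = elem_type.next_state(state, name)
        if state == 0:
            return False  # this child is not allowed at this point
    # The element may only end in a final state.
    return bool(elem_type.final_state(state))


section = StubElementType()
print(children_allowed(section, ["title", "para", "para"]))  # True
print(children_allowed(section, ["title"]))                  # False
print(section.get_valid_elements(1))                         # ['title']
```

Character data would be fed to next_state under the name '#PCDATA' in the same way as any other child name.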

The Attribute interface

This class encapsulates information about an attribute.

def get_name(self):
Returns the name of the attribute.
def get_type(self):
Returns the declared type of the attribute. (ID, CDATA etc.)
def get_decl(self):
Returns the default declaration of the attribute. (#IMPLIED, #REQUIRED, #FIXED or #DEFAULT.)
def get_default(self):
Returns the default value of the attribute, or None if none has been declared.
def validate(self,value,err):
Takes an attribute value ('value') and an ErrorHandler ('err') and validates the attribute value for correctness, reporting errors to 'err'.
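
To show how the accessor methods above relate to one another, here is a small sketch that renders an attribute roughly as it might appear in an attribute-list declaration. StubAttribute and format_attribute are hypothetical names invented for this example. The sketch assumes that get_type returns a plain string, so enumerated types are not handled, and it does not exercise validate, since the ErrorHandler interface is not described in this section.

```python
class StubAttribute:
    """Hypothetical stand-in for an object conforming to the Attribute interface."""

    def __init__(self, name, a_type, decl, default=None):
        self._name = name
        self._type = a_type
        self._decl = decl
        self._default = default

    def get_name(self):
        return self._name

    def get_type(self):
        return self._type

    def get_decl(self):
        return self._decl

    def get_default(self):
        return self._default


def format_attribute(attr):
    """Render one attribute in a form resembling an ATTLIST entry."""
    parts = [attr.get_name(), attr.get_type()]
    decl = attr.get_decl()
    if decl in ("#REQUIRED", "#IMPLIED"):
        parts.append(decl)  # no default value to show
    elif decl == "#FIXED":
        parts.append('#FIXED "%s"' % attr.get_default())
    else:
        # '#DEFAULT': an ordinary default value, written without a keyword
        parts.append('"%s"' % attr.get_default())
    return " ".join(parts)


print(format_attribute(StubAttribute("id", "ID", "#REQUIRED")))
print(format_attribute(StubAttribute("lang", "CDATA", "#DEFAULT", "en")))
print(format_attribute(StubAttribute("version", "CDATA", "#FIXED", "1.0")))
```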

The Entity interface

This class encapsulates information about entities. It is implemented by two classes: InternalEntity and ExternalEntity. InternalEntity only implements a subset of the interface.

def is_internal(self):
True if the entity is internal, false otherwise.
def is_parsed(self):
True if the entity is parsed, false otherwise. (Not implemented by InternalEntity.)
def get_pubid(self):
Returns the public identifier of the entity. (Not implemented by InternalEntity.)
def get_sysid(self):
Returns the system identifier of the entity. (Not implemented by InternalEntity.)
def get_notation(self):
Returns the name of the notation associated with the entity, or None if there is none. (Not implemented by InternalEntity.)
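
Because InternalEntity implements only part of this interface, code that handles arbitrary entities probably needs to check is_internal before calling anything else. The sketch below illustrates that pattern. StubInternalEntity, StubExternalEntity and describe_entity are hypothetical names written for this example. The internal stub deliberately omits the other methods to mirror the documented subset, and the external stub treats an entity as parsed when it has no notation, which follows the XML rule that entities declared with NDATA are unparsed.

```python
class StubInternalEntity:
    """Hypothetical stand-in; implements only the documented subset."""

    def is_internal(self):
        return True


class StubExternalEntity:
    """Hypothetical stand-in implementing the full Entity interface."""

    def __init__(self, pubid, sysid, notation=None):
        self._pubid = pubid
        self._sysid = sysid
        self._notation = notation

    def is_internal(self):
        return False

    def is_parsed(self):
        # Assumption of this stub: an entity with a notation is unparsed.
        return self._notation is None

    def get_pubid(self):
        return self._pubid

    def get_sysid(self):
        return self._sysid

    def get_notation(self):
        return self._notation


def describe_entity(ent):
    """Summarize an entity without calling methods it may not implement."""
    if ent.is_internal():
        return {"kind": "internal"}
    return {
        "kind": "external",
        "parsed": bool(ent.is_parsed()),
        "pubid": ent.get_pubid(),
        "sysid": ent.get_sysid(),
        "notation": ent.get_notation(),
    }


print(describe_entity(StubInternalEntity()))
print(describe_entity(StubExternalEntity(None, "logo.gif", "gif")))
```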

The DTDConsumer interface

This interface is used to receive parse events from the DTD parser.

def set_error_handler(self,err):
Sets the error handler of the DTDConsumer. The error handler does not have to be used, but the DTDConsumer must accept this method call.
def dtd_start(self):
Called before any DTD events arrive. (Note: this is called once for the internal DTD subset, if there is one, and once for the external DTD subset, if it is parsed.)
def dtd_end(self):
Called when the DTD is completely parsed. (Note: this is called once for the internal DTD subset, if there is one, and once for the external DTD subset, if it is parsed.)
def new_general_entity(self,name,val):
Called when an internal general entity declaration is encountered. 'val' contains the entity replacement text.
def new_external_entity(self,ent_name,pub_id,sys_id,ndata):
Called when an external general entity declaration is encountered. 'ndata' is the name of the associated notation, or None if none was associated.
def new_parameter_entity(self,name,val):
Called when an internal parameter entity declaration is encountered. 'val' contains the entity replacement text.
def new_external_pe(self,name,pubid,sysid):
Called when an external parameter entity declaration is encountered.
def new_notation(self,name,pubid,sysid):
Called when a notation declaration is encountered.
def new_element_type(self,elem_name,elem_cont):
Called when an element type declaration is encountered. 'elem_cont' is a tuple, as returned by the get_content_model method of the ElementType interface.
def new_attribute(self,elem,attr,a_type,a_decl,a_def):
Called when an attribute declaration is encountered. 'elem' is the name of the element, 'attr' the name of the attribute, 'a_type' the name of the attribute type (ID, CDATA...), 'a_decl' the name of the declared default type (#REQUIRED, #IMPLIED...) and 'a_def' the declared default value (or None if none was declared).
def handle_comment(self,contents):
Called when a comment is encountered inside the DTD.
def handle_pi(self,target,data):
Called when a processing instruction is encountered inside the DTD.
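
A consumer conforming to this interface might look roughly like the sketch below, which simply records each declaration it is told about. RecordingConsumer and model_to_string are hypothetical names invented for this example. The events are fired by hand rather than by a parser, because how a consumer is attached to the DTD parser is not covered in this section. model_to_string assumes that the separator is a string such as ',' or '|' and the modifier one of '', '?', '*' or '+'.

```python
class RecordingConsumer:
    """Hypothetical DTDConsumer that stores what it is told in plain containers."""

    def __init__(self):
        self.err = None
        self.subsets_seen = 0
        self.general_entities = {}    # name -> replacement text
        self.external_entities = {}   # name -> (pubid, sysid, ndata)
        self.parameter_entities = {}  # name -> replacement text
        self.external_pes = {}        # name -> (pubid, sysid)
        self.notations = {}           # name -> (pubid, sysid)
        self.elements = {}            # name -> content model
        self.attributes = {}          # (elem, attr) -> (type, decl, default)

    def set_error_handler(self, err):
        self.err = err  # accepted as required, but unused in this sketch

    def dtd_start(self):
        self.subsets_seen += 1

    def dtd_end(self):
        pass

    def new_general_entity(self, name, val):
        self.general_entities[name] = val

    def new_external_entity(self, ent_name, pub_id, sys_id, ndata):
        self.external_entities[ent_name] = (pub_id, sys_id, ndata)

    def new_parameter_entity(self, name, val):
        self.parameter_entities[name] = val

    def new_external_pe(self, name, pubid, sysid):
        self.external_pes[name] = (pubid, sysid)

    def new_notation(self, name, pubid, sysid):
        self.notations[name] = (pubid, sysid)

    def new_element_type(self, elem_name, elem_cont):
        self.elements[elem_name] = elem_cont

    def new_attribute(self, elem, attr, a_type, a_decl, a_def):
        self.attributes[(elem, attr)] = (a_type, a_decl, a_def)

    def handle_comment(self, contents):
        pass

    def handle_pi(self, target, data):
        pass


def model_to_string(model):
    """Render a (sep, cont, mod) content model in DTD-like syntax."""
    if model is None:
        return "ANY"
    sep, cont, mod = model
    if not cont:
        return "EMPTY"
    parts = []
    for item in cont:
        if len(item) == 2:   # (name, mod) leaf
            parts.append(item[0] + item[1])
        else:                # nested (sep, cont, mod) group
            parts.append(model_to_string(item))
    return "(" + sep.join(parts) + ")" + mod


# Fire a few events by hand, as a parser presumably would.
consumer = RecordingConsumer()
consumer.dtd_start()
consumer.new_element_type("doc", (",", [("title", ""), ("para", "+")], ""))
consumer.new_element_type("br", ("", [], ""))
consumer.new_attribute("doc", "version", "CDATA", "#IMPLIED", None)
consumer.new_general_entity("copy", "(c)")
consumer.dtd_end()

for name in sorted(consumer.elements):
    print(name, model_to_string(consumer.elements[name]))
```

Since dtd_start may be called once per DTD subset, the sketch counts the calls instead of resetting its state, so declarations from both subsets end up in the same dictionaries.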

Last update 2000-05-11 14:20, by Lars Marius Garshol.