Preface
- Who is this book for?
- What the book covers
- The choice of programming language
- What is it?
- A common denominator
- Python can talk to anything
- Python is a natural fit for XML programming
- Acknowledgments
- Up-to-dateness
Working with XML
1. XML and information systems
- 1.1. Representing data digitally
- 1.1.1. Notations
- 1.1.2. Data representation
- 1.1.3. Serialization and deserialization
- 1.1.4. Data model
- 1.1.5. Summary
- 1.2. XML and digital data
- 1.3. Information systems
- 1.3.1. Anatomy of classical information systems
- 1.3.2. Structured vs unstructured systems
- 1.3.3. Ontologies
- 1.3.4. Information models
- 1.3.5. Summary
- 1.4. XML and information systems
- 1.4.1. XML in traditional information systems
- 1.4.2. Bridging information systems
2. The XML processing model
- 2.1. A bit of XML history
- 2.2. An introduction to XML namespaces
- 2.2.1. Why namespaces?
- 2.2.2. The syntax of namespaces
- 2.2.3. Consequences for the data model
- 2.2.4. What namespaces do
- 2.3. Documents and parsers
- 2.3.1. Storing XML documents
- 2.3.2. The parser model
- 2.3.3. What does the parser do?
- 2.4. The result of parsing
- 2.4.1. Why use a parser?
- 2.4.2. Logical and lexical information
- 2.4.3. DTD information
- 2.4.4. Drawing the line
3. Views of documents
- 3.1. Documents viewed as events
- 3.1.1. Generating output with events
- 3.2. Documents viewed as trees
- 3.3. Virtual views
- 3.4. Virtual documents
4. Common processing tasks
- 4.1. Serialization and deserialization
- 4.2. Translation
- 4.2.1. Data format differences
- 4.2.2. Differences in the data model
- 4.2.3. Differences in the information model
- 4.2.4. Ontological differences
- 4.3. Validation
- 4.4. Modification
- 4.5. Information extraction
5. Characters — the atoms of text
- 5.1. Terminology
- 5.1.1. What is a character?
- 5.1.2. What is a character set?
- 5.2. Digital text
- 5.2.1. Character sets and encodings
- 5.2.2. Character repertoires
- 5.3. Important character standards
- 5.3.1. ISO 8859
- 5.3.2. The problem with exchange
- 5.3.3. The Windows code pages
- 5.3.4. Unicode
- 5.3.5. Other character sets
- 5.3.6. XML and Unicode
- 5.4. Characters in programming languages
- 5.4.1. C
- 5.4.2. C++
- 5.4.3. Java
- 5.4.4. Perl
- 5.4.5. Python
- 5.4.6. Common Lisp
- 5.4.7. Tcl
- 5.4.8. Ada 95
- 5.5. Further problems
Event-based processing
6. Event-based processing
- 6.1. Benefits and disadvantages
- 6.2. Writing event-based applications
- 6.3. Tools for event-based processing
- 6.3.1. What parsers are there?
- 6.4. RSS: An example application
- 6.4.1. Typical RSS usage
- 6.4.2. The structure of RSS documents
- 6.4.3. RSS 1.0
7. Using the XML parsers
- 7.1. xmlproc
- 7.1.1. Interface outline
- 7.1.2. Interface reference
- 7.1.3. An example application
- 7.1.4. Using xmlproc to validate documents
- 7.1.5. Namespace support in xmlproc
- 7.1.6. Pitfalls
- 7.2. Pyexpat
- 7.2.1. The interface
- 7.2.2. An example application
- 7.2.3. Dealing with encodings and namespaces
- 7.2.4. Lexical and DTD information
- 7.2.5. Handling external entities
- 7.3. xmllib
- 7.3.1. Interface overview
- 7.3.2. The parser control interface
- 7.3.3. The general interface
- 7.3.4. The specialized interface
- 7.3.5. An example application
- 7.3.6. Handling lexical information
- 7.3.7. More advanced use
- 7.3.8. Pitfalls
- 7.4. Working in Jython
- 7.5. Choosing a parser
8. SAX: An introduction
- 8.1. Background and history
- 8.2. Introduction
- 8.2.1. What SAX does
- 8.2.2. The SAX parsers
- 8.2.3. An overview of SAX
- 8.2.4. A very simple example
- 8.3. The SAX classes
- 8.3.1. XMLReader
- 8.3.2. ContentHandler
- 8.3.3. Attributes
- 8.3.4. ErrorHandler
- 8.3.5. The xml.sax module
- 8.4. Two example applications
- 8.4.1. RSS to HTML converter
- 8.4.2. A statistics collector
- 8.5. The Python SAX utilities
9. Using SAX
- 9.1. An introduction to XBEL
- 9.1.1. The structure of XBEL documents
- 9.2. Thinking in SAX
- 9.2.1. Acting after the event
- 9.2.2. Tracking state
- 9.3. Application-specific data structures
- 9.3.1. The XBEL object structure
- 9.3.2. The XBEL structure builder
- 9.3.3. The XBEL serializer
- 9.3.4. The XBEL to HTML converter
- 9.4. Example applications
- 9.4.1. The RSS to HTML converter revisited
- 9.4.2. An XML generator
- 9.4.3. A document example
- 9.5. Tips and tricks
- 9.5.1. Pitfalls in SAX programming
- 9.5.2. How to write an error handler
- 9.5.3. Using SAX in Jython
- 9.6. Speed
- 9.6.1. Optimizing code
- 9.6.2. Benchmarks
10. Advanced SAX
- 10.1. The advanced parts of the API
- 10.1.1. SAXException
- 10.1.2. SAXParseException
- 10.1.3. InputSource
- 10.1.4. EntityResolver
- 10.1.5. DTDHandler
- 10.1.6. Locator
- 10.1.7. SAX 2.0 extensibility support
- 10.1.8. SAX 2.0 and namespaces
- 10.1.9. The LexicalHandler
- 10.1.10.
- 10.2. Parser filters
- 10.2.1. Developing filters
- 10.2.2. The character joiner filter
- 10.2.3. The attribute inheritance filter
- 10.2.4. The XInclude filter
- 10.3. Working with entities
- 10.3.1. Public identifiers and catalog files
- 10.3.2. Using the SAX EntityResolver
- 10.4. Mapping non-XML data to XML
Tree-based processing
11. An introduction to the DOM
- 11.1. Tree-based processing
- 11.2. Getting to know the DOM
- 11.2.1. The Python DOMs
- 11.2.2. The specification language
- 11.2.3. The basic DOM model
- 11.3. A DOM overview
- 11.3.1. A quick introduction
- 11.3.2. The flat API
- 11.4. The fundamental DOM interfaces
- 11.4.1. The Document interface
- 11.4.2. The Element interface
- 11.4.3. The CharacterData, Text and Comment interfaces
- 11.4.4. The attribute interface
- 11.4.5. The DocumentFragment interface
- 11.4.6. The DOMImplementation interface
- 11.5. A simple example application
- 11.6. The extended DOM interfaces
- 11.6.1. The CDATASection interface
- 11.6.2. The DocumentType interface
- 11.6.3. The Notation interface
- 11.6.4. The Entity interface
- 11.6.5. The EntityReference interface
- 11.6.6. The ProcessingInstruction interface
12. Using the DOM
- 12.1. Creating DOM trees
- 12.1.1. Creating an empty document
- 12.1.2. Loading an XML document
- 12.2. DOM serialization
- 12.2.1. Non-XML serialization
- 12.3. Some examples
- 12.3.1. Modifying an RSS document
- 12.3.2. XBEL to HTML conversion
- 12.3.3. Shakespeare revisited
- 12.3.4. Using DOM for serialization
- 12.4. An example: a tree walker
13. Advanced DOM
- 13.1. Other DOM implementations
- 13.1.1. Using the Java DOMs
- 13.1.2. minidom
- 13.2. The HTML part of the DOM
- 13.3. DOM Level 2
- 13.3.1. DOM namespace support
- 13.3.2. Other level 2 extensions
- 13.3.3. Traversal
- 13.4. Future directions for the DOM
- 13.5. DOM performance
- 13.5.1. Loading XML documents
- 13.5.2. Serialization
- 13.5.3. Memory use
14. Other tree-based APIs
- 14.1. qp_xml
- 14.1.1. The qp_xml API
- 14.1.2. An example application
- 14.1.3. Performance
- 14.2. Groves
- 14.2.1. What groves are
- 14.2.2. What can groves be used for?
- 14.2.3. Grove software
- 14.2.4. The GPS implementation
- 14.2.5. An example property set
- 14.2.6. Using the grove
Declarative processing
15. Introducing XSLT
- 15.1. Declarative processing
- 15.2. XSLT background
- 15.2.1. A quick overview
- 15.2.2. Usage contexts for XSL and XSLT
- 15.2.3. An overview of XSL and XSLT implementations
- 15.2.4. XPath: uses and implementations
- 15.3. Introducing XSLT
- 15.3.1. The XSLT processing model
- 15.3.2. The XSLT and XPath data models
- 15.3.3. XSLT basics
- 15.3.4. Some more useful XSLT instructions
- 15.3.5. Processing modes
- 15.3.6. Useful bits and pieces
- 15.3.7. Some pitfalls
- 15.4. More examples
- 15.4.1. XBEL to HTML conversion
16. XSLT in more detail
- 16.1. XPath in more detail
- 16.1.1. The context
- 16.1.2. Location paths
- 16.1.3. XPath expressions
- 16.1.4. The abbreviated syntax
- 16.2. More advanced XSLT topics
- 16.2.1. Instantiation elements
- 16.2.2. Output methods
- 16.2.3. Combining stylesheets
- 16.2.4. Conflict resolution: precedence
- 16.2.5. Single-template stylesheets
- 16.2.6. Variables, result tree fragments and named templates
- 16.2.7. Extra XPath functions
- 16.2.8. Keys and cross-references
- 16.2.9. Messages
- 16.2.10. XSLT extensions and fallback
- 16.2.11. Producing XSLT stylesheets as output
- 16.3. More advanced XSLT examples
- 16.3.1. Converting Shakespeare's plays to HTML
- 16.3.2. The rfc-index example
- 16.4. XSLT performance
17. Using XSLT in applications
- 17.1. The XSLT processor APIs
- 17.1.1. Using 4XSLT
- 17.1.2. Sablotron
- 17.2. Larger examples of XSLT programming
- 17.2.1. Some XPath utility functions
- 17.2.2. The group and item elements
- 17.2.3. An XBEL conversion application
- 17.3. Using XPath in software
- 17.3.1. The 4XPath APIs
- 17.3.2. Creating XPath expressions
- 17.3.3. Mapping XML to objects
- 17.4. The future of XSLT
18. Architectural forms
- 18.1. Introduction
- 18.2. Uses of architectural forms
- 18.3. Architectural forms software
- 18.4. An example
Java and XML
19. The Java XML parsers
- 19.1. XML and Java
- 19.2. The Java XML parsers
- 19.2.1. Xerces-J
- 19.2.2. Ælfred
- 19.2.3. XP
20. SAX in Java
- 20.1. The Java version of SAX
- 20.2. JAXP
- 20.2.1. How to create a parser
- 20.2.2. The JAXP APIs
- 20.2.3. JAXP examples
- 20.3. The Java SAX APIs
- 20.3.1. The XMLReader interface
- 20.3.2. The ContentHandler interface
- 20.3.3. The ErrorHandler interface
- 20.3.4. The DTDHandler interface
- 20.3.5. The EntityResolver interface
- 20.3.6. The Attributes interface
- 20.3.7. The Locator interface
- 20.3.8. The XMLFilter interface
- 20.3.9. The InputSource class
- 20.3.10. The SAXException
- 20.3.11. The SAXParseException
- 20.3.12. The SAXNotSupportedException
- 20.3.13. The SAXNotRecognizedException
- 20.3.14. The helpers package
- 20.4. Java SAX examples
- 20.4.1. RSS conversion
- 20.4.2. XBEL conversion
21. DOM in Java
- 21.1. JAXP and the DOM
- 21.1.1. The DocumentBuilderFactory class
- 21.1.2. The DocumentBuilder class
- 21.2. The Java DOM APIs
- 21.2.1. The DOMImplementation interface
- 21.2.2. The Node interface
- 21.2.3. The NodeList interface
- 21.2.4. The NamedNodeMap interface
- 21.2.5. The Document interface
- 21.2.6. The DocumentType interface
- 21.2.7. The Element interface
- 21.2.8. The Attr interface
- 21.2.9. The CharacterData interface
- 21.2.10. The Text interface
- 21.2.11. The Comment interface
- 21.2.12. The CDATASection interface
- 21.2.13. The ProcessingInstruction interface
- 21.3. Using some Java DOMs
- 21.3.1. Accessing Xerces directly
- 21.3.2. Accessing the DOM through JAXP
- 21.4. JDOM
- 21.4.1. A JDOM example application
22. Using XSLT in Java applications
- 22.1. Using JAXP
- 22.1.1. JAXP API reference
- 22.1.2. A JAXP example
- 22.2. The SAXON XSLT Processor
- 22.2.1. SAXON XSLT extensions
- 22.3. The Xalan XSLT Processor
Processing in depth
23. Other approaches to processing
- 23.1. Pull APIs
- 23.2. Hybrid event/tree-based approaches
- 23.2.1. Pyxie
- 23.2.2. eventdom
- 23.3. Simplified approaches
24. Schemas
- 24.1. Schemas and XML
- 24.1.1. The schema languages
- 24.1.2. XML Schemas
- 24.1.3. Other languages
- 24.2. Validating documents
- 24.2.1. Why validate?
- 24.2.2. Using a validating parser
- 24.2.3. Other approaches to validation
- 24.3. DTD programming
- 24.3.1. The xmlproc DTD APIs
- 24.3.2. DTD normalization
- 24.3.3. Producing test documents
25. Creating XML
- 25.1. Creating XML from HTML
- 25.1.1. How to read HTML documents
- 25.1.2. A larger example
- 25.2. Creating XML from SGML
- 25.3. Creating XML from other document formats
- 25.4. Creating XML from data formats
26. The tabproc framework
- 26.1. Input handling
- 26.1.1. The CSV file reader
- 26.1.2. The DB-API generator
- 26.1.3. The DBF file reader
- 26.2. Generating XML from tables
- 26.2.1. The generic XML representation
- 26.2.2. The simple XML mapping
- 26.2.3. The XSLT generator
- 26.3. A SAX XMLReader interface
- 26.4. Handling the XML output
- 26.5. Examples of use
- 26.5.1. Making an RSS document
- 26.5.2. Making a web page
27. The RSS development kit
- 27.1. The RSS object structure
- 27.1.1. The structure builder
- 27.1.2. The serializers
- 27.1.3. The rsslib module
- 27.2. The client kit
- 27.2.1. The config module
- 27.2.2. The clientlib module
- 27.3. The RSS email client
- 27.4. The GUI RSS client
- 27.5. The RSS editor
Appendices
A1. A lightning introduction to Python
- A1.1. A quick introduction
- A1.2. Basic building blocks
- A1.2.1. Variables, values and types
- A1.2.2. The numeric types
- A1.2.3. Strings
- A1.2.4. No value
- A1.2.5. Truth values
- A1.2.6. The sequence types
- A1.2.7. Dictionaries
- A1.2.8. The statements
- A1.2.9. Functions
- A1.3. An example program
- A1.4. Classes and objects
- A1.4.1. Defining classes
- A1.4.2. Inheritance and scoping
- A1.4.3. The magic methods
- A1.5. Various useful APIs
- A1.5.1. The string module
- A1.5.2. The sys module
- A1.5.3. File handling
- A1.5.4. Modules and packages
- A1.5.5. Exception handling
- A1.5.6. Memory management
- A1.5.7. Documentation strings
- A1.5.8. Unicode support
- A1.5.9. A useful idiom
A2. Glossary of terms
A3. The Python XML packages
- A3.1. The Python interpreter