Developing XML Applications - table of contents

Preface

Who is this book for?
What the book covers
The choice of programming language
- What is it?
- A common denominator
- Python can talk to anything
- Python is a natural fit for XML programming
Acknowledgments
Uptodateness

Working with XML

1. XML and information systems

1.1. Representing data digitally
- 1.1.1. Notations
- 1.1.2. Data representation
- 1.1.3. Serialization and deserialization
- 1.1.4. Data model
- 1.1.5. Summary
1.2. XML and digital data
1.3. Information systems
- 1.3.1. Anatomy of classical information systems
- 1.3.2. Structured vs unstructured systems
- 1.3.3. Ontologies
- 1.3.4. Information models
- 1.3.5. Summary
1.4. XML and information systems
- 1.4.1. XML in traditional information systems
- 1.4.2. Bridging information systems

2. The XML processing model

2.1. A bit of XML history
2.2. An introduction to XML namespaces
- 2.2.1. Why namespaces?
- 2.2.2. The syntax of namespaces
- 2.2.3. Consequences for the data model
- 2.2.4. What namespaces do
2.3. Documents and parsers
- 2.3.1. Storing XML documents
- 2.3.2. The parser model
- 2.3.3. What does the parser do?
2.4. The result of parsing
- 2.4.1. Why use a parser?
- 2.4.2. Logical and lexical information
- 2.4.3. DTD information
- 2.4.4. Drawing the line

3. Views of documents

3.1. Documents viewed as events
- 3.1.1. Generating output with events
3.2. Documents viewed as trees
3.3. Virtual views
3.4. Virtual documents

4. Common processing tasks

4.1. Serialization and deserialization
4.2. Translation
- 4.2.1. Data format differences
- 4.2.2. Differences in the data model
- 4.2.3. Differences in the information model
- 4.2.4. Ontological differences
4.3. Validation
4.4. Modification
4.5. Information extraction

5. Characters — the atoms of text

5.1. Terminology
- 5.1.1. What is a character?
- 5.1.2. What is a character set?
5.2. Digital text
- 5.2.1. Character sets and encodings
- 5.2.2. Character repertoires
5.3. Important character standards
- 5.3.1. ISO 8859
- 5.3.2. The problem with exchange
- 5.3.3. The Windows code pages
- 5.3.4. Unicode
- 5.3.5. Other character sets
- 5.3.6. XML and Unicode
5.4. Characters in programming languages
- 5.4.1. C
- 5.4.2. C++
- 5.4.3. Java
- 5.4.4. Perl
- 5.4.5. Python
- 5.4.6. Common Lisp
- 5.4.7. Tcl
- 5.4.8. Ada95
5.5. Further problems

Event-based processing

6. Event-based processing

6.1. Benefits and disadvantages
6.2. Writing event-based applications
6.3. Tools for event-based processing
- 6.3.1. What parsers are there?
6.4. RSS: An example application
- 6.4.1. Typical RSS usage
- 6.4.2. The structure of RSS documents
- 6.4.3. RSS 1.0

7. Using the XML parsers

7.1. xmlproc
- 7.1.1. Interface outline
- 7.1.2. Interface reference
- 7.1.3. An example application
- 7.1.4. Using xmlproc to validate documents
- 7.1.5. Namespace support in xmlproc
- 7.1.6. Pitfalls
7.2. Pyexpat
- 7.2.1. The interface
- 7.2.2. An example application
- 7.2.3. Dealing with encodings and namespaces
- 7.2.4. Lexical and DTD information
- 7.2.5. Handling external entities
7.3. xmllib
- 7.3.1. Interface overview
- 7.3.2. The parser control interface
- 7.3.3. The general interface
- 7.3.4. The specialized interface
- 7.3.5. An example application
- 7.3.6. Handling lexical information
- 7.3.7. More advanced use
- 7.3.8. Pitfalls
7.4. Working in Jython
7.5. Choosing a parser

8. SAX: An introduction

8.1. Background and history
8.2. Introduction
- 8.2.1. What SAX does
- 8.2.2. The SAX parsers
- 8.2.3. An overview of SAX
- 8.2.4. A very simple example
8.3. The SAX classes
- 8.3.1. XMLReader
- 8.3.2. ContentHandler
- 8.3.3. Attributes
- 8.3.4. ErrorHandler
- 8.3.5. The xml.sax module
8.4. Two example applications
- 8.4.1. RSS to HTML converter
- 8.4.2. A statistics collector
8.5. The Python SAX utilities
- 8.5.1. XMLGenerator

9. Using SAX

9.1. An introduction to XBEL
- 9.1.1. The structure of XBEL documents
9.2. Thinking in SAX
- 9.2.1. Acting after the event
- 9.2.2. Tracking state
9.3. Application-specific data structures
- 9.3.1. The XBEL object structure
- 9.3.2. The XBEL structure builder
- 9.3.3. The XBEL serializer
- 9.3.4. The XBEL to HTML converter
9.4. Example applications
- 9.4.1. The RSS to HTML converter revisited
- 9.4.2. An XML generator
- 9.4.3. A document example
9.5. Tips and tricks
- 9.5.1. Pitfalls in SAX programming
- 9.5.2. How to write an error handler
- 9.5.3. Using SAX in Jython
9.6. Speed
- 9.6.1. Optimizing code
- 9.6.2. Benchmarks

10. Advanced SAX

10.1. The advanced parts of the API
- 10.1.1. SAXException
- 10.1.2. SAXParseException
- 10.1.3. InputSource
- 10.1.4. EntityResolver
- 10.1.5. DTDHandler
- 10.1.6. Locator
- 10.1.7. SAX 2.0 extensibility support
- 10.1.8. SAX 2.0 and namespaces
- 10.1.9. The LexicalHandler
- 10.1.10.
10.2. Parser filters
- 10.2.1. Developing filters
- 10.2.2. The character joiner filter
- 10.2.3. The attribute inheritance filter
- 10.2.4. The XInclude filter
10.3. Working with entities
- 10.3.1. Public identifiers and catalog files
- 10.3.2. Using the SAX EntityResolver
10.4. Mapping non-XML data to XML

Tree-based processing

11. An introduction to the DOM

11.1. Tree-based processing
11.2. Getting to know the DOM
- 11.2.1. The Python DOMs
- 11.2.2. The specification language
- 11.2.3. The basic DOM model
11.3. A DOM overview
- 11.3.1. A quick introduction
- 11.3.2. The flat API
11.4. The fundamental DOM interfaces
- 11.4.1. The Document interface
- 11.4.2. The Element interface
- 11.4.3. The CharacterData, Text and Comment interfaces
- 11.4.4. The attribute interface
- 11.4.5. The DocumentFragment interface
- 11.4.6. The DOMImplementation interface
11.5. A simple example application
11.6. The extended DOM interfaces
- 11.6.1. The CDATASection interface
- 11.6.2. The DocumentType interface
- 11.6.3. The Notation interface
- 11.6.4. The Entity interface
- 11.6.5. The EntityReference interface
- 11.6.6. The ProcessingInstruction interface

12. Using the DOM

12.1. Creating DOM trees
- 12.1.1. Creating an empty document
- 12.1.2. Loading an XML document
12.2. DOM serialization
- 12.2.1. Non-XML serialization
12.3. Some examples
- 12.3.1. Modifying an RSS document
- 12.3.2. XBEL to HTML conversion
- 12.3.3. Shakespeare revisited
- 12.3.4. Using DOM for serialization
12.4. An example: a tree walker

13. Advanced DOM

13.1. Other DOM implementations
- 13.1.1. Using the Java DOMs
- 13.1.2. minidom
13.2. The HTML part of the DOM
13.3. The DOM level 2
- 13.3.1. DOM namespace support
- 13.3.2. Other level 2 extensions
- 13.3.3. Traversal
13.4. Future directions for the DOM
13.5. DOM performance
- 13.5.1. Loading XML documents
- 13.5.2. Serialization
- 13.5.3. Memory use

14. Other tree-based APIs

14.1. qp_xml
- 14.1.1. The qp_xml API
- 14.1.2. An example application
- 14.1.3. Performance
14.2. Groves
- 14.2.1. What groves are
- 14.2.2. What can groves be used for?
- 14.2.3. Grove software
- 14.2.4. The GPS implementation
- 14.2.5. An example property set
- 14.2.6. Using the grove

Declarative processing

15. Introducing XSLT

15.1. Declarative processing
15.2. XSLT background
- 15.2.1. A quick overview
- 15.2.2. Usage contexts for XSL and XSLT
- 15.2.3. An overview of XSL and XSLT implementations
- 15.2.4. XPath: uses and implementations
15.3. Introducing XSLT
- 15.3.1. The XSLT processing model
- 15.3.2. The XSLT and XPath data models
- 15.3.3. XSLT basics
- 15.3.4. Some more useful XSLT instructions
- 15.3.5. Processing modes
- 15.3.6. Useful bits and pieces
- 15.3.7. Some pitfalls
15.4. More examples
- 15.4.1. XBEL to HTML conversion

16. XSLT in more detail

16.1. XPath in more detail
- 16.1.1. The context
- 16.1.2. Location paths
- 16.1.3. XPath expressions
- 16.1.4. The abbreviated syntax
16.2. More advanced XSLT topics
- 16.2.1. Instantiation elements
- 16.2.2. Output methods
- 16.2.3. Combining stylesheets
- 16.2.4. Conflict resolution: precedence
- 16.2.5. Single-template stylesheets
- 16.2.6. Variables, result tree fragments and named templates
- 16.2.7. Extra XPath functions
- 16.2.8. Keys and cross-references
- 16.2.9. Messages
- 16.2.10. XSLT extensions and fallback
- 16.2.11. Producing XSLT stylesheets as output
16.3. More advanced XSLT examples
- 16.3.1. Converting Shakespeare's plays to HTML
- 16.3.2. The rfc-index example
16.4. XSLT performance

17. Using XSLT in applications

17.1. The XSLT processor APIs
- 17.1.1. Using 4XSLT
- 17.1.2. Sablotron
17.2. Larger examples of XSLT programming
- 17.2.1. Some XPath utility functions
- 17.2.2. The group and item elements
- 17.2.3. An XBEL conversion application
17.3. Using XPath in software
- 17.3.1. The 4XPath APIs
- 17.3.2. Creating XPath expressions
- 17.3.3. Mapping XML to objects
17.4. The future of XSLT

18. Architectural forms

18.1. Introduction
18.2. Uses of architectural forms
18.3. Architectural forms software
18.4. An example

Java and XML

19. The Java XML parsers

19.1. XML and Java
19.2. The Java XML parsers
- 19.2.1. Xerces-J
- 19.2.2. Ælfred
- 19.2.3. XP

20. SAX in Java

20.1. The Java version of SAX
20.2. JAXP
- 20.2.1. How to create a parser
- 20.2.2. The JAXP APIs
- 20.2.3. JAXP examples
20.3. The Java SAX APIs
- 20.3.1. The XMLReader interface
- 20.3.2. The ContentHandler interface
- 20.3.3. The ErrorHandler interface
- 20.3.4. The DTDHandler interface
- 20.3.5. The EntityResolver interface
- 20.3.6. The Attributes interface
- 20.3.7. The Locator interface
- 20.3.8. The XMLFilter interface
- 20.3.9. The InputSource class
- 20.3.10. The SAXException
- 20.3.11. The SAXParseException
- 20.3.12. The SAXNotSupportedException
- 20.3.13. The SAXNotRecognizedException
- 20.3.14. The helpers package
20.4. Java SAX examples
- 20.4.1. RSS conversion
- 20.4.2. XBEL conversion

21. DOM in Java

21.1. JAXP and the DOM
- 21.1.1. The DocumentBuilderFactory class
- 21.1.2. The DocumentBuilder class
21.2. The Java DOM APIs
- 21.2.1. The DOMImplementation interface
- 21.2.2. The Node interface
- 21.2.3. The NodeList interface
- 21.2.4. The NamedNodeMap interface
- 21.2.5. The Document interface
- 21.2.6. The DocumentType interface
- 21.2.7. The Element interface
- 21.2.8. The Attr interface
- 21.2.9. The CharacterData interface
- 21.2.10. The Text interface
- 21.2.11. The Comment interface
- 21.2.12. The CDATASection interface
- 21.2.13. The ProcessingInstruction interface
21.3. Using some Java DOMs
- 21.3.1. Accessing Xerces directly
- 21.3.2. Accessing the DOM through JAXP
21.4. JDOM
- 21.4.1. A JDOM example application

22. Using XSLT in Java applications

22.1. Using JAXP
- 22.1.1. JAXP API reference
- 22.1.2. A JAXP example
22.2. The SAXON XSLT Processor
- 22.2.1. SAXON XSLT extensions
22.3. The Xalan XSLT Processor

Processing in depth

23. Other approaches to processing

23.1. Pull APIs
- 23.1.1. RXP
23.2. Hybrid event/tree-based approaches
- 23.2.1. Pyxie
- 23.2.2. eventdom
23.3. Simplified approaches

24. Schemas

24.1. Schemas and XML
- 24.1.1. The schema languages
- 24.1.2. XML Schemas
- 24.1.3. Other languages
24.2. Validating documents
- 24.2.1. Why validate?
- 24.2.2. Using a validating parser
- 24.2.3. Other approaches to validation
24.3. DTD programming
- 24.3.1. The xmlproc DTD APIs
- 24.3.2. DTD normalization
- 24.3.3. Producing test documents

25. Creating XML

25.1. Creating XML from HTML
- 25.1.1. How to read HTML documents
- 25.1.2. A larger example
25.2. Creating XML from SGML
- 25.2.1. SP
25.3. Creating XML from other document formats
25.4. Creating XML from data formats

26. The tabproc framework

26.1. Input handling
- 26.1.1. The CSV file reader
- 26.1.2. The DB-API generator
- 26.1.3. The DBF file reader
26.2. Generating XML from tables
- 26.2.1. The generic XML representation
- 26.2.2. The simple XML mapping
- 26.2.3. The XSLT generator
26.3. A SAX XMLReader interface
26.4. Handling the XML output
26.5. Examples of use
- 26.5.1. Making an RSS document
- 26.5.2. Making a web page

27. The RSS development kit

27.1. The RSS object structure
- 27.1.1. The structure builder
- 27.1.2. The serializers
- 27.1.3. The rsslib module
27.2. The client kit
- 27.2.1. The config module
- 27.2.2. The clientlib module
27.3. The RSS email client
27.4. The GUI RSS client
27.5. The RSS editor

Appendices

A1. A lightning introduction to Python

A1.1. A quick introduction
A1.2. Basic building blocks
- A1.2.1. Variables, values and types
- A1.2.2. The numeric types
- A1.2.3. Strings
- A1.2.4. No value
- A1.2.5. Truth values
- A1.2.6. The sequence types
- A1.2.7. Dictionaries
- A1.2.8. The statements
- A1.2.9. Functions
A1.3. An example program
A1.4. Classes and objects
- A1.4.1. Defining classes
- A1.4.2. Inheritance and scoping
- A1.4.3. The magic methods
A1.5. Various useful APIs
- A1.5.1. The string module
- A1.5.2. The sys module
- A1.5.3. File handling
- A1.5.4. Modules and packages
- A1.5.5. Exception handling
- A1.5.6. Memory management
- A1.5.7. Documentation strings
- A1.5.8. Unicode support
- A1.5.9. A useful idiom

A2. Glossary of terms

A3. The Python XML packages

A3.1. The Python interpreter