Parsing XML in Kamaelia using an expat parser

January 13, 2008 at 03:05 PM | categories: python, oldblog

Generally we've used SAX for parsing XML, but it's useful to show how to parse XML with a parser like expat, which works by calling back into your code. The trick, as usual with anything long running, is to put the long-running call into a thread and have each callback emit a message. The following is a minimal example:
import time
import Axon
import xml.parsers.expat
from Kamaelia.Chassis.Pipeline import Pipeline
from Kamaelia.Util.Console import ConsoleEchoer

class Parser(Axon.ThreadedComponent.threadedcomponent):
    data = "<h1> Default </h1>"  # Can be overridden by kwargs as normal

    def start_element(self,name,attrs):
        self.send(("START", name,attrs), "outbox")

    def end_element(self,name):
        self.send(("END", name), "outbox")

    def char_data(self,data):
        data = data.strip()
        self.send(("DATA", data), "outbox")

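    def main(self):
        # Runs in its own thread, so the blocking Parse() call is safe here
        p = xml.parsers.expat.ParserCreate()
        # expat calls these handlers back as it walks the document
        p.StartElementHandler = self.start_element
        p.EndElementHandler = self.end_element
        p.CharacterDataHandler = self.char_data
        p.Parse(self.data, 1)  # 1 = this is the final chunk of input
        # Crude pause, presumably so queued messages are delivered before shutdown
        time.sleep(1)
        self.send(Axon.Ipc.producerFinished(), "signal")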

Pipeline(
    Parser(data="<body><h1>Hello</h1> world <p>Woo</p></body>"),
    ConsoleEchoer(),
).run()
This generates the following output:
('START', u'body', {})('START', u'h1', {})('DATA', u'Hello')('END', u'h1')('DATA', u'world')('START', u'p', {})('DATA', u'Woo')('END', u'p')('END', u'body')
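For readers without Kamaelia installed, the same idea (a blocking expat parse in a thread, with each callback emitting a message) can be sketched using only the standard library. This is a rough illustration rather than how Axon works internally: the names parse_in_thread and collect_events are made up for this example, a queue.Queue stands in for the outbox, and a None sentinel stands in for producerFinished.

```python
import queue
import threading
import xml.parsers.expat


def parse_in_thread(data, outbox):
    """Parse `data` with expat, putting one tuple on `outbox` per callback."""

    def start_element(name, attrs):
        outbox.put(("START", name, attrs))

    def end_element(name):
        outbox.put(("END", name))

    def char_data(text):
        outbox.put(("DATA", text.strip()))

    p = xml.parsers.expat.ParserCreate()
    p.StartElementHandler = start_element
    p.EndElementHandler = end_element
    p.CharacterDataHandler = char_data
    p.Parse(data, True)  # True = this is the final chunk of input
    outbox.put(None)     # sentinel: assumed stand-in for producerFinished


def collect_events(data):
    """Run the parser in a worker thread and gather the messages it emits."""
    outbox = queue.Queue()
    worker = threading.Thread(target=parse_in_thread, args=(data, outbox))
    worker.start()
    events = []
    while True:
        event = outbox.get()  # blocks until the parser thread emits something
        if event is None:
            break
        events.append(event)
    worker.join()
    return events


events = collect_events("<body><h1>Hello</h1> world <p>Woo</p></body>")
for event in events:
    print(event)
```

The consumer only ever sees tuples arriving on a queue, which is the property the Kamaelia version relies on as well.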
The nice thing about emitting parse events as plain messages is, of course, that whatever consumes them can be tested in isolation from the XML handling code: you just feed it the same tuples directly. Indeed, it allows for a much simpler test harness overall.
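As a small sketch of what such a test might look like, assume the consumer is written as a plain function over the event tuples. The function collect_headings below is hypothetical (it is not part of Kamaelia), and the events are written by hand, so no XML parser is involved at all.

```python
def collect_headings(events):
    """Return the text found inside <h1> elements, given parser-style tuples."""
    headings = []
    inside_h1 = False
    for event in events:
        kind = event[0]
        if kind == "START" and event[1] == "h1":
            inside_h1 = True
        elif kind == "END" and event[1] == "h1":
            inside_h1 = False
        elif kind == "DATA" and inside_h1:
            headings.append(event[1])
    return headings


# Hand-written events in the same shape the Parser component sends
canned_events = [
    ("START", "body", {}),
    ("START", "h1", {}),
    ("DATA", "Hello"),
    ("END", "h1"),
    ("DATA", "world"),
    ("END", "body"),
]

result = collect_headings(canned_events)
assert result == ["Hello"]
```

Because the test input is just a list of tuples, edge cases (empty data, nested elements, missing end tags) can be exercised without constructing XML documents for each one.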