Benchmarking - Kamaelia vs Stackless

November 21, 2007 at 11:48 PM | categories: python, oldblog

Interesting post by rhonabwy on comparing Kamaelia to Stackless. The benchmark figures there are pretty grim from my perspective, but a useful starting point:

10 concurrent objects, 1000 loops
python timeit.py -s "import hackysack" "hackysack.runit(10,1000)"

10 loops, best of 3: 127 msec per loop

100 concurrent objects, 1000 loops
python timeit.py -s "import hackysack" "hackysack.runit(100,1000)"

10 loops, best of 3: 587 msec per loop

1000 concurrent objects, 1000 loops
python timeit.py -s "import hackysack" "hackysack.runit(1000,1000)"

10 loops, best of 3: 6.05 sec per loop

10000 concurrent objects, 1000 loops
python timeit.py -s "import hackysack" "hackysack.runit(10000,1000)"

10 loops, best of 3: 60.4 sec per loop

The grim part, of course, is the scaling: from 100 objects upwards, ten times the objects costs roughly ten times the time. Handily though, rhonabwy posted the code as well. I noticed that a key change we often make in mature code - pausing the component when it has nothing to do - was missing, so I changed the main loop slightly from this:
    def main(self):
        yield 1
        while 1:
            if self.dataReady('inbox'):

To this:
    def main(self):
        yield 1
        while 1:
            while not self.anyReady():
                self.pause()
                yield 1

            while self.dataReady('inbox'): # Note change from "if"

The inner "while not self.anyReady()" loop is new code, and the "if" on the dataReady line has changed to a "while". So, how does this perform? Well, first of all, running the unchanged code I get this:

~> python /usr/lib/python2.5/timeit.py "import hackysack" "hackysack.runit(10,1000)"
10 loops, best of 3: 182 msec per loop

~> python /usr/lib/python2.5/timeit.py "import hackysack" "hackysack.runit(100,1000)"
10 loops, best of 3: 820 msec per loop

~> python /usr/lib/python2.5/timeit.py "import hackysack" "hackysack.runit(1000,1000)"
10 loops, best of 3: 9.23 sec per loop

I lost patience at that point. It does show the same bad scaling properties though. So I then added in the changes above and reran it. This is what I got:

~> python /usr/lib/python2.5/timeit.py "import hackysack" "hackysack.runit(10,1000)"
10 loops, best of 3: 206 msec per loop

~> python /usr/lib/python2.5/timeit.py "import hackysack" "hackysack.runit(100,1000)"
10 loops, best of 3: 267 msec per loop

~> python /usr/lib/python2.5/timeit.py "import hackysack" "hackysack.runit(1000,1000)"
10 loops, best of 3: 816 msec per loop

Now, I don't actually have that much memory on my machine, so going much above this causes it to start swapping. Even factoring that in, this is the next level up:

~> python /usr/lib/python2.5/timeit.py "import hackysack" "hackysack.runit(5000,1000)"
10 loops, best of 3: 2.77 sec per loop

Now, clearly this still isn't as good as Stackless, which isn't surprising: Stackless removes a whole layer of stack frame shenanigans from the entire system, and it also implements its scheduler and channel handling in C. I also don't know how well the Stackless scheduler is optimised. Probably better than ours - ours is a very simple round robin scheduler.
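
To make the pause point concrete, here is a toy round robin scheduler. It is only a sketch: the Component class, deliver() and run() below are made up for illustration and are not Axon's API or its real scheduler. What it tries to show is that a component which pauses when idle stops costing a generator resumption on every pass, whereas a busy-polling one gets resumed every time round.

```python
from collections import deque


class Component(object):
    """Toy component (hypothetical, not Axon): one inbox and a paused flag."""

    def __init__(self, polite):
        self.inbox = deque()
        self.paused = False
        self.polite = polite  # True: pause when idle; False: busy-poll
        self.handled = 0

    def main(self):
        while True:
            if self.polite:
                while not self.inbox:
                    self.paused = True  # ask the scheduler to skip us
                    yield 1
            while self.inbox:
                self.inbox.popleft()
                self.handled += 1
            yield 1


def deliver(component, msg):
    """Put a message in the inbox and wake the receiver."""
    component.inbox.append(msg)
    component.paused = False


def run(components, passes):
    """Round robin over the components; return how many resumptions happened.

    A real scheduler would drop paused components from its run queue
    rather than test a flag, but the count comes out the same.
    """
    gens = [(c, c.main()) for c in components]
    resumptions = 0
    for _ in range(passes):
        deliver(components[0], "sack")  # one message per pass
        for component, gen in gens:
            if component.paused:
                continue
            next(gen)
            resumptions += 1
    return resumptions


def count_resumptions(n, passes, polite):
    return run([Component(polite) for _ in range(n)], passes)


print(count_resumptions(100, 50, polite=False))  # busy-polling
print(count_resumptions(100, 50, polite=True))   # pausing when idle
```

With 100 components and one message per pass, the busy-polling variant is resumed 5000 times over 50 passes, the pausing one 149 times (100 on the first pass, then only the one component that actually has work).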

However, despite all this, after this small optimisation the scaling properties are significantly better and, to my mind rather importantly, much more in line with the scaling properties that Stackless exhibits. A key point: with this optimisation, the numbers imply that we DO have better scaling properties using generators than with threads. Woo :-)
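
A rough way to put numbers on "significantly better" is to look at how much the time grows for each tenfold increase in objects. The snippet below just does that arithmetic on my timings from above (the growth_factors helper is only for illustration); it says nothing about absolute speed.

```python
# Per-loop timings in milliseconds for 10, 100 and 1000 objects,
# copied from the runs earlier in this post.
before_ms = [182.0, 820.0, 9230.0]  # unchanged code
after_ms = [206.0, 267.0, 816.0]    # with the pause added


def growth_factors(timings):
    """Ratio of each timing to the previous one (one entry per 10x step)."""
    return [later / earlier for earlier, later in zip(timings, timings[1:])]


print([round(f, 2) for f in growth_factors(before_ms)])
print([round(f, 2) for f in growth_factors(after_ms)])
```

Before the change, each tenfold step costs about 4.5x and then 11.3x; afterwards it is about 1.3x and then 3.1x.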

More seriously, this does imply that a natural way to optimise Kamaelia systems could be to simply create a new base class Axon.TaskletComponent.taskletcomponent that uses channels to implement inboxes/outboxes and scale out that way. It'd mean that one could take the bulk of their code with them and make small changes to gain further optimisations.

(Aside: I'm a bit irritated by the memory consumption though. I know I don't have a huge amount of memory and I was running other applications on the system, but I did expect better. I'll have to look into that, I think, when I get a round tuit.)

Overall though, despite not performing as well as Stackless (which I did expect, understanding the changes it makes), I am very pleased with the scaling properties being similar :-)

My full version of the code:
#!/usr/bin/python

import Axon
import time
import random
import sys

class hackymsg:
    def __init__(self,name):
        self.name = name

class counter:
    # shared kick counter, used as the terminating condition
    def __init__(self):
        self.count = 0
    def inc(self):
        self.count += 1

class hackysacker(Axon.Component.component):
    def __init__(self,name,circle,cntr,loops):
        Axon.Component.component.__init__(self)
        self.cntr = cntr
        self.name = name
        self.loops = loops # terminating condition
        self.circle = circle # a list of all the hackysackers
        circle.append(self)

    def main(self):
        yield 1
        while 1:
            while not self.anyReady():
                self.pause()
                yield 1
            while self.dataReady('inbox'):
                msg = self.recv('inbox')
                if msg == 'exit':
                    return
                if self.cntr.count > self.loops:
                    for z in self.circle:
                        z.inboxes['inbox'].append('exit')
                    return
                #print "%s got hackysack from %s" % (self.name, msg.name)
                kickto = self.circle[random.randint(0,len(self.circle)-1)]
                while kickto is self:
                    kickto = self.circle[random.randint(0,len(self.circle)-1)]
                #print "%s kicking hackysack to %s" %(self.name, kickto.name)
                msg = hackymsg(self.name)
                kickto.inboxes['inbox'].append(msg)
                self.cntr.inc()
                #print self.cntr.count
            yield 1

def runit(num_hackysackers=5,loops=100):
    cntr = counter()
    circle=[]
    first_hackysacker = hackysacker('1',circle,cntr,loops)
    first_hackysacker.activate()
    for i in range(num_hackysackers):
        foo = hackysacker(repr(i),circle,cntr,loops)
        foo.activate()

    # throw in the first sack...
    msg = hackymsg('me')
    first_hackysacker.inboxes['inbox'].append(msg)

    Axon.Component.scheduler.run.runThreads()

if __name__ == "__main__":
    runit(num_hackysackers=1000,loops=1000)
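
For anyone wanting to reproduce the "10 loops, best of 3" style figures from inside Python rather than via the timeit.py command line, something like the following should do. It is a sketch: the runit here is a trivial stand-in (hypothetical) so that the snippet runs without Axon installed, and passing a callable to timeit.Timer needs Python 2.6 or later.

```python
import timeit


def runit(num_hackysackers=5, loops=100):
    """Stand-in workload (hypothetical); swap in hackysack.runit for real timings."""
    total = 0
    for i in range(num_hackysackers * loops):
        total += i
    return total


timer = timeit.Timer(lambda: runit(10, 1000))
totals = timer.repeat(repeat=3, number=10)  # three runs of ten calls each
best_per_loop = min(totals) / 10            # seconds per call in the best run
print("10 loops, best of 3: %.3g msec per loop" % (best_per_loop * 1000))
```

Taking the minimum of the three runs, rather than the mean, is what "best of 3" refers to.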

