Colin's Journal: A place for thoughts about politics, software, and daily life.
Python is my language of choice, something that I’ve used for a number of websites, applications and tools. Its speed of execution has never been an issue for me, even when using it to wrangle large XML files into something more useful.
Despite this, I’m happy to see the existence of the Unladen Swallow project. Whether this tiny team of two Google engineers will really be able to make Python five times faster remains to be seen, but the attempt will surely be worthwhile. Improvements have already been made and merged back into the mainline of Python, and the approach being taken seems solid.
The project’s plan is both practical and well thought out. There have been several attempts to re-implement or speed up Python before (PyPy, Psyco, IronPython, Jython), none of which have come close to challenging CPython in terms of adoption. By accepting that CPython is the common implementation of choice, and concentrating on making it better, the project’s benefits should be widely felt. Similarly, planning to gain the performance improvements by applying the relevant academic research, rather than trying to discover something new, should ensure a measure of success in a short period of time.
I have finished porting SimpleTAL to Python 3. Release 5.0 of SimpleTAL is for Python 3.1 and provides functionality similar to that of SimpleTAL 4.2 for Python 2.5. The differences between using 4.2 and 5.0 are documented on the SimpleTAL notes page.
At first the porting process was fairly easy. I started by getting all test cases to run cleanly under Python 2.6 with the -3 flag, and then ran 2to3 to convert the basic syntax. The next step was to run the test cases under Python 3 to highlight issues that required manual changes. The sgmllib module has been removed from the standard library, so I had to remove HTMLStructureCleaner from simpleTALUtils (it was unused within the library itself). The iterator protocol change from “next” to “__next__” meant my iterator-detecting code had to be updated.
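As a rough illustration of the iterator protocol change, here is a minimal sketch of what an iterator check might look like under Python 3. It is not SimpleTAL's actual code: the is_iterator helper and the Countdown class are hypothetical names invented for this example.

```python
def is_iterator(candidate):
    """Return True if candidate follows the Python 3 iterator protocol.

    Hypothetical helper: Python 2 code would have looked for a "next"
    method here, whereas Python 3 iterators expose "__next__".
    """
    return hasattr(candidate, "__next__") and hasattr(candidate, "__iter__")


class Countdown:
    """Tiny example iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


print(is_iterator(Countdown(3)))  # Countdown defines __next__ and __iter__
print(is_iterator([1, 2, 3]))     # a list is iterable, but has no __next__
```

A list is iterable without being an iterator, so a check of this kind distinguishes the two cases.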
The changes to character set handling in Python 3 introduced slightly more complex changes for the template handling. In Python 2.x the SimpleTAL library would handle all encoding / decoding itself, but in Python 3 this is not always required as there is now a clean separation between bytes and strings.
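To make the bytes / strings separation concrete, here is a small sketch of the kind of normalisation a template loader could do in Python 3. The to_text function and its utf-8 default are assumptions made for this example, not part of the SimpleTAL API.

```python
def to_text(source, encoding="utf-8"):
    """Return source as a str, decoding only when given bytes.

    Hypothetical helper: in Python 3 a str is already Unicode text,
    so only bytes input needs an explicit decode step.
    """
    if isinstance(source, bytes):
        return source.decode(encoding)
    return source


# Bytes read from a file opened in binary mode need decoding...
print(to_text(b"<p>caf\xc3\xa9</p>"))
# ...while text that is already a str passes through untouched.
print(to_text("<p>café</p>"))
```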
One issue that I hit when porting to Python 3 was the use of regular expressions. In order for SimpleTAL to pass through singleton XML elements from the template (i.e. <tag /> rather than <tag></tag>) it needs to carry out a regex check against the raw XML that the SAX library provides. This is done by retrieving the xml.sax.handler.property_xml_string property, which is documented as returning a string. In practice, however, the Python 3 SAX implementation returns bytes, which at first I assumed the regex library would not work with. A little research later, I learned that the regex library can work on bytes as well.
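The re module accepts bytes as long as the pattern itself is also a bytes object. A minimal sketch of a singleton check along those lines follows. The pattern and the is_singleton helper are assumptions for illustration, not the expression SimpleTAL actually uses.

```python
import re

# Bytes pattern (note the rb prefix): matches a tag that ends in "/>",
# allowing optional whitespace between the slash and the bracket.
SINGLETON_END = re.compile(rb"/\s*>\Z")


def is_singleton(raw_element):
    """Return True if the raw element text ends like <tag />.

    Hypothetical helper: accepts bytes (as the Python 3 SAX parser
    returned in practice) or str, which is encoded to bytes first.
    """
    if isinstance(raw_element, str):
        raw_element = raw_element.encode("utf-8")
    return SINGLETON_END.search(raw_element) is not None


print(is_singleton(b"<tag />"))  # singleton form
print(is_singleton(b"<tag>"))    # ordinary opening tag
```

Mixing the two types, for example a str pattern against bytes input, raises a TypeError, which is why the sketch normalises everything to bytes before matching.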
One final surprise was the huge performance gain moving from Python 2.6 to 3.1. The SimpleTAL performance tests show a minimum speed increase of 60% (on the METAL test), with some tests clocking in 90% increases. Both HTML and XML basic template expansions are now hitting over 1600 pages/sec on a single 1.7GHz CPU.
Email: colin at owlfish.com