Colin's Journal: A place for thoughts about politics, software, and daily life.
There’s a small debate on the use of English on the continent going on thanks to Bjørn Stærk, with Tobias Schwarz providing some dialogue. As a native English speaker who knows no other languages, I find it hard to comment on this issue with any level of authority. One aspect I feel I can comment on is raised by Tobias: the use of English as the common language between people who speak two different languages (in this case, Germans and French using English to talk to each other).
The important aspect of this is not its evident benefit to the likes of me, but that it pushes forward the ideal of a Europe that can identify with itself. A Europe that can speak a common language, along with its common currency, is a Europe full of people more likely to get along with one another. In this sense any common language would do; that it happens to be English just brings the bonus of being able to communicate with large chunks of the rest of the world.
I can definitely agree with Bjørn in welcoming the opportunity to read the views of Continentals in English. I’ve tried to track down European weblogs that are written in English, and I’ve not had much luck. The most promising list I found does not break up the weblogs by language, and seems to feature many broken links. Maybe I should start a list myself…
I’m nearly done with the refactoring of SimpleTAL for version 2.0. All of the unit test cases that I have now pass, both for HTML and XML templates, and I’ve even added a few more to cover some commands that were not tested before. As part of final testing I updated my weblog software to use the new version, and found some interesting relics.
This weblog has archives going back to my first ever post, back long before my current weblog software was written. I started out with a few Python scripts that would grab HTML files and merge them together, looking for special tags in a template file; it was very crude but it mostly worked. The current system stores all my posts as XML, has TAL templates, and some cunning logic in between. When I moved to this new system I had to port over all the old posts to the new format if I wanted to keep a consistent history. The cut-over didn’t happen cleanly, however, and there were a small number of posts which had strange XML source files.
The files had the content body of the posting encapsulated in a CDATA section, and the body was also escaped using entity references. In SimpleTAL 1.x the posting body would be sent through an SGML parser prior to inclusion in the template, and that parser would expand the entity references back into normal characters. In SimpleTAL 2.0 “structure” content is not sent through a parser (if you need to use TAL in the structure then you can create a compiled template before placing it into the context), and so the entity references are included as literal text. A few search-and-replace commands have sorted those postings out, and it looks like the new version of SimpleTAL works reasonably well with my weblog system.
I have some documentation and clean-up work to do prior to releasing SimpleTAL 2.0, but it shouldn’t take me long to do. The performance of the new engine is pretty good, anywhere between 1.5X and 7X faster than the current version (depending on the type of template). The top performance is a little lower (~70 templates/sec) than I achieved with my prototype version, but I had expected that there would be some sort of drop once the character encoding conversion and other must-have features were added.
Yesterday I thought I would take a look at the performance of SimpleTAL, and see if there were any easy ways of improving it. I took a small (one-screenful) template consisting of lots of ordinary text, a repeat command, and a couple of content commands, and timed SimpleTAL expanding it 200 times. The result was around 5 templates/sec.
I had an idea of pre-parsing the template and turning it into a series of events (start tag, data, and end tag). I implemented this fairly quickly, and found that performance improved up to the 11 templates/sec mark. I know, however, that Zope’s TAL engine can go significantly faster than this, so I started looking at it again, trying to work out how I could make a much larger improvement.
The current SimpleTAL implementation uses OO methodology fairly heavily. This means that for each tag in the template an object is created, along with at least one handler object (often more). The tag is then passed to each handler, which does various things to it based on the evaluated expressions coming back from the simpleTALES module. The result is that for a given run of the template, even with the HTML/XML parsing done beforehand, there is a significant amount of object creation (expensive), a large number of method calls rather than variable accesses (expensive), and text manipulation/parsing.
The Zope way of getting around this is to parse the template into an intermediate byte code. This byte code is then used by an interpreter to expand the template, with very little in the way of object creation. I’m now refactoring SimpleTAL in a similar way to see how much improvement I can get, and so far it’s looking promising. I’m still a long way from finishing, but I have content and repeat working well enough to run my performance template, and the result is now around 90 templates/sec – roughly 18 times the original 5 templates/sec, a near 95% cut in the time taken per template! The unfortunate side effect, though, is that the code is harder to understand because it’s driven by data structures rather than objects, which will make maintaining it a lot harder.
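To make the problem concrete, here is a small Python sketch. The post layout and the `<post>`/`<body>` element names are made up for illustration, and the standard library’s `html.unescape` stands in for the search-and-replace clean-up; this is not how SimpleTAL itself processes structure content.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical post file: the body is inside CDATA *and* entity-escaped,
# so the XML parser hands the entity references back as literal text.
post_xml = "<post><body><![CDATA[&lt;p&gt;Hello &amp; welcome&lt;/p&gt;]]></body></post>"


def load_body(xml_text):
    """Return the text of the <body> element exactly as the parser sees it."""
    return ET.fromstring(xml_text).find("body").text


raw = load_body(post_xml)
# An engine that no longer re-parses structure content would emit this verbatim.
print(raw)    # &lt;p&gt;Hello &amp; welcome&lt;/p&gt;

# One-off repair: expand the entity references back into ordinary characters.
fixed = html.unescape(raw)
print(fixed)  # <p>Hello & welcome</p>
```

Running the fix once over the stored files, rather than on every page build, keeps the template engine itself free of the extra parsing step.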
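A minimal timing harness in the same spirit, assuming the expansion can be wrapped in a zero-argument callable. `string.Template` is only a stand-in for SimpleTAL here (with a Python loop playing the part of a repeat), so the number it prints says nothing about SimpleTAL’s own speed.

```python
import time
from string import Template


def templates_per_second(expand, runs=200):
    """Call expand() `runs` times and return the average throughput."""
    start = time.perf_counter()
    for _ in range(runs):
        expand()
    elapsed = time.perf_counter() - start
    return runs / elapsed


# Stand-in template: placeholders instead of TAL content, a loop instead of repeat.
page = Template("<h1>$title</h1><ul>$items</ul>")
item = Template("<li>$name</li>")
names = ["alpha", "beta", "gamma"]


def expand_once():
    items = "".join(item.substitute(name=n) for n in names)
    return page.substitute(title="Timing test", items=items)


print(f"{templates_per_second(expand_once):.0f} templates/sec")
```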
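A rough sketch of the pre-parsing idea using the standard library’s `html.parser`: the template is parsed once into (start tag, data, end tag) events, and each later run only replays the cached list. The `EventRecorder`, `pre_parse` and `replay` names are hypothetical, and a real TAL engine would interpret `tal:` attributes during the replay instead of just echoing the markup.

```python
import html
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Parse a template once, recording it as a flat list of events."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag, None))

    def handle_data(self, data):
        self.events.append(("data", data, None))


def pre_parse(template_text):
    """Run the (slow) parser a single time and return the cached events."""
    recorder = EventRecorder()
    recorder.feed(template_text)
    recorder.close()
    return recorder.events


def replay(events):
    """Rebuild the markup from cached events without invoking the parser again."""
    out = []
    for kind, value, attrs in events:
        if kind == "start":
            parts = [value]
            for name, val in attrs:
                # Attributes written without a value (e.g. <input checked>) arrive as None.
                parts.append(name if val is None else f'{name}="{html.escape(val)}"')
            out.append("<" + " ".join(parts) + ">")
        elif kind == "end":
            out.append(f"</{value}>")
        else:
            out.append(html.escape(value, quote=False))
    return "".join(out)


events = pre_parse('<p class="intro">Hello <b>world</b></p>')
print(replay(events))  # <p class="intro">Hello <b>world</b></p>
```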
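To show the general shape of the byte-code approach, here is a toy compiler and interpreter. It is not Zope’s or SimpleTAL’s actual instruction set: the node format, the opcodes (`TEXT`, `INSERT`, `LOOP`, `NEXT`) and the cut-down `content`/`repeat` attributes are all assumptions made for the example. The template is compiled once into a flat list, and each expansion is a single loop over that list.

```python
import html


# Hypothetical node format: text is a str; an element is
# (tag, tal_attributes, children), with TAL reduced to "content" and "repeat".
def compile_nodes(nodes, code=None):
    """Flatten a node tree into a list of (opcode, argument) byte codes."""
    if code is None:
        code = []
    for node in nodes:
        if isinstance(node, str):
            code.append(("TEXT", node))
            continue
        tag, tal, children = node
        repeat = tal.get("repeat")  # e.g. "item items": loop variable, then sequence name
        if repeat:
            var, path = repeat.split()
            loop_start = len(code)
            code.append(("LOOP", [var, path, None]))  # exit address patched below
        code.append(("TEXT", f"<{tag}>"))
        if "content" in tal:
            code.append(("INSERT", tal["content"]))
        else:
            compile_nodes(children, code)
        code.append(("TEXT", f"</{tag}>"))
        if repeat:
            code.append(("NEXT", loop_start))
            code[loop_start][1][2] = len(code)  # first instruction after the loop
    return code


def run(code, context):
    """Interpret the byte code with a program counter; no per-tag objects are built."""
    out, loops, env, pc = [], [], dict(context), 0
    while pc < len(code):
        op, arg = code[pc]
        if op == "TEXT":
            out.append(arg)
        elif op == "INSERT":
            out.append(html.escape(str(env[arg])))
        elif op == "LOOP":
            var, path, end = arg
            if not loops or loops[-1][0] != pc:  # first arrival: start a new iteration
                loops.append((pc, iter(env[path])))
            try:
                env[var] = next(loops[-1][1])
            except StopIteration:
                loops.pop()
                pc = end
                continue
        elif op == "NEXT":
            pc = arg
            continue
        pc += 1
    return "".join(out)


template = [
    ("h1", {"content": "title"}, []),
    ("ul", {}, [("li", {"repeat": "item items", "content": "item"}, [])]),
]
code = compile_nodes(template)  # compiled once, run many times
print(run(code, {"title": "Fruit", "items": ["apple", "pear"]}))
# <h1>Fruit</h1><ul><li>apple</li><li>pear</li></ul>
```

Even at this size the maintenance cost is visible: the jump addresses patched into `LOOP` are much harder to follow than a handler object per tag.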
We were on our way to the airport, on a long and indirect series of flights back home for Christmas, when I first saw our neighbourhood wind turbine. The fact that until then we had failed to notice a 94-metre-tall wind turbine protruding from the cityscape indicates how far away from it we live. Since returning to Toronto I have spent the odd moment looking out over our deck to see whether we can see it from our flat, and several times thought that I could perhaps see it, hiding behind a tree in the distance.
When I awoke this morning and glanced out over the deck I noticed, in the near distance, the turbine happily rotating away. It is indeed behind a tree some distance away, but quite visible when it’s turning. Apparently it cost approximately C$1.2M to build (it’s not clear if any maintenance is included in that figure) and will generate 1,800 MWh of electricity a year, which at market rates (now fixed by the government at 4.3c/kWh for most consumers) could bring in $77K per year.
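The revenue figure is easy to check, and the same numbers give a rough simple payback period. The payback step is a back-of-the-envelope extension that assumes the 4.3c/kWh rate holds for the turbine’s whole life and ignores maintenance entirely; `Decimal` is used to avoid floating-point rounding in the money arithmetic.

```python
from decimal import Decimal

annual_output_kwh = Decimal(1_800) * 1_000  # 1,800 MWh expressed in kWh
price_per_kwh = Decimal("0.043")            # 4.3 cents per kWh, in dollars
build_cost = Decimal(1_200_000)             # approximately C$1.2M

annual_revenue = annual_output_kwh * price_per_kwh
# Simple payback: ignores maintenance, financing and any change in price.
payback_years = build_cost / annual_revenue

print(annual_revenue)                           # 77400.000
print(payback_years.quantize(Decimal("0.1")))   # 15.5
```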
As such it’s only a symbolic and public-awareness development, but it seems that when a large site is developed the cost of wind-generated electricity is pretty comparable to that of a new coal station and close to that of a modern gas-fired station (see the excellent British wind energy site and this rather more independent FT article). Wind energy is only part of the answer (apparently only 10% of the UK’s electricity needs can be met this way before reliability becomes an issue), but it is still nice to have our own neighbourhood turbine, and even nicer to see how quickly wind generation is being built.
I received an email this morning from Thomas Weholt which detailed an interesting problem he encountered when using SimpleTAL. The source of the problem turned out to be that the path resolution rules being used would match an attribute before looking at the mapping an object provides (more details here). I spent some time looking at what might be a good fix for this, and then found that Zope behaves the same way, so for now I’ve left the implementation as is.
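One way the attribute-before-mapping order can surprise you, sketched in Python: a dictionary that stores a key called `items` also has an `items()` method, so an attribute-first lookup returns the method rather than the stored value. The `resolve_step` function and its `attribute_first` switch are hypothetical, not SimpleTAL’s or Zope’s path-resolution code, and this is not necessarily the exact case Thomas ran into.

```python
def _by_attribute(obj, name):
    if hasattr(obj, name):
        return True, getattr(obj, name)
    return False, None


def _by_key(obj, name):
    try:
        return True, obj[name]
    except (KeyError, IndexError, TypeError):
        return False, None


def resolve_step(obj, name, attribute_first=True):
    """Resolve one path segment, trying attributes and mapping keys in a chosen order."""
    lookups = [_by_attribute, _by_key]
    if not attribute_first:
        lookups.reverse()
    for lookup in lookups:
        found, value = lookup(obj, name)
        if found:
            return value
    raise KeyError(name)


post = {"title": "Hello", "items": ["a", "b"]}
print(resolve_step(post, "title"))                         # Hello
print(resolve_step(post, "items", attribute_first=False))  # ['a', 'b']
# With the default attribute-first order, "items" finds dict.items (a bound method),
# so the stored list is shadowed.
```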
The research, however, got me looking at another potential problem: when content is included using the “structure” keyword, any TAL attributes it contains will be expanded. This allows for some very cool and interesting things, but it does present a problem when you need to display user input strings using structure: the user’s input has access to all of the attributes of all objects that are included in the context, which is a potential security problem. I was in two minds as to whether I should provide a way of disabling this, so I again checked how Zope handles this situation, and found that it does not expand TAL included in this fashion. So that both behaviours are available, I’ve now added an “allowTALInStructure” parameter which controls whether any TAL found in “structure” content will be expanded. I also found, while creating some unit test cases for XML templates, that SimpleTAL 1.0 could not handle content included using “structure”; thankfully that turned out to be a one-line fix.
The end result is that I’ve just uploaded version 1.1 of SimpleTAL. I’ve run through all of the unit test cases I have, and compared the results of my weblog program using the new version to the old, and everything still seems to work.
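A sketch of why the switch matters, with `string.Template` placeholders standing in for TAL attributes (a deliberate simplification). `render_structure` and its `allow_tal_in_structure` keyword are hypothetical; they only mirror the idea behind the `allowTALInStructure` parameter, not its actual signature.

```python
from string import Template


def render_structure(user_text, context, allow_tal_in_structure=False):
    """Insert user-supplied markup, expanding embedded placeholders only if allowed.

    string.Template placeholders stand in for TAL attributes in this sketch.
    """
    if allow_tal_in_structure:
        return Template(user_text).safe_substitute(context)
    return user_text


context = {"title": "My weblog", "admin_password": "s3cret"}
comment = "Great post! $admin_password"

print(render_structure(comment, context))                               # Great post! $admin_password
print(render_structure(comment, context, allow_tal_in_structure=True))  # Great post! s3cret
```

Leaving expansion off by default matches the Zope behaviour described above, so user-supplied text can only reach the context when a template author opts in.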
Here are some links to a few tech articles that have caught my eye over the last few days. First up, a problem with RSS – it seems that lots of sites out there are not creating valid XML files for their RSS feeds, and so aggregators are being modified to handle not just valid XML but malformed XML as well. An article by Mark explains why this is happening, but provides no ideas on how to deal with it.
Why should anyone care whether their RSS feeds are valid XML? If they are valid XML files, they can be used by any other program that understands XML. If they are not valid, they can only be used by certain programs, and so the cost of software rises (either fewer features, because people are spending their time writing parsers to handle bad XML, or higher prices to cover the extra effort). What was really surprising about the article (on xml.com) was that even Scripting News, a site run by the person responsible for one of the most popular RSS aggregators, occasionally publishes bad XML! There really is no excuse for this lack of quality in RSS feeds: XML processing tools are freely available and easy to use, so why do people insist on rolling their own that don’t work?
Another story, this time an interview on the art of programming, and how it might be improved (via Slashdot). It’s a very theoretical discussion, but an interesting one that has some relevance to my previous thoughts on RSS. The idea expressed is that programming doesn’t scale well to large systems because you only need a small bug in one piece to cause a large failure, rather than a failure that is on the scale of the original defect. The solution proposed is that systems should communicate using pattern recognition rather than via defined protocols. This approach would endorse having XML parsers handle bad XML rather than complain; software modules should extract whatever information they can out of what they are given rather than demanding that it matches a well defined protocol.
The alternative I would promote is that software should demand all communication be done using well defined protocols, but make no assumptions as to what the information means to others, or care about any extra information that may be present. In practice this would mean that software should demand valid XML, then extract from that XML whatever it finds interesting and ignore the rest. A bug in a software module is then localised to a specific set of information; the rest of the system carries on running, with only the modules that rely on that piece of information affected.
Finally, as most people reading this will already have found out first hand, the Internet was struggling today thanks to the spread of an SQL Server worm. What this highlighted to me was not the number of people running unpatched versions of the software (not unexpected), but rather the number of people who have made their databases directly accessible from the Internet. There seems little reason why anyone would do this, but the sheer volume of traffic generated by this thing shows that a very large number of people do indeed have databases running open on the network. It’s also a classic example of a small defect in one module having a disproportionately large effect on the whole system. It would be relatively easy for network switches and firewalls to match patterns of network usage that could be deemed ‘unusual’ and drop packets that fall into this category. If this is what Jaron Lanier is referring to in his interview then I can see what he means, but I would think of it as just robust programming, rather than a huge change in how we think about software.
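A small illustration of the point about tools, using Python’s standard `xml.etree.ElementTree`: a feed built by string concatenation breaks on an unescaped ampersand, while the same feed built with the library comes out well-formed. The channel title is an invented example.

```python
import xml.etree.ElementTree as ET

title = "Fish & Chips"

# Hand-rolled string building forgets to escape the ampersand.
hand_rolled = f'<rss version="2.0"><channel><title>{title}</title></channel></rss>'


def is_well_formed(text):
    """Return True if a strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Building the feed with an XML library escapes the text automatically.
rss = ET.Element("rss", version="2.0")
channel = ET.SubElement(rss, "channel")
ET.SubElement(channel, "title").text = title
generated = ET.tostring(rss, encoding="unicode")

print(is_well_formed(hand_rolled))  # False
print(generated)                    # <rss version="2.0"><channel><title>Fish &amp; Chips</title></channel></rss>
print(is_well_formed(generated))    # True
```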
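A sketch of the “demand valid XML, take what you need” approach, again with `xml.etree.ElementTree`. The feed and the `urn:example` namespace are invented; the function rejects malformed input outright (the parser raises `ET.ParseError`), but silently ignores any elements it doesn’t recognise.

```python
import xml.etree.ElementTree as ET


def item_titles(feed_text):
    """Demand well-formed XML, then pull out only the item titles.

    Unknown elements (extensions, extra metadata) are simply ignored.
    """
    root = ET.fromstring(feed_text)  # raises ET.ParseError on malformed input
    return [item.findtext("title", default="") for item in root.iter("item")]


feed = """<rss version="2.0"><channel>
  <title>Example</title>
  <item><title>First</title><extra:rating xmlns:extra="urn:example">5</extra:rating></item>
  <item><title>Second</title><unknownThing/></item>
</channel></rss>"""

print(item_titles(feed))  # ['First', 'Second']
```

A bug in whatever produces the `rating` extension cannot break this reader, because it never looks at that element.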
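As one simple reading of “unusual” traffic, here is a per-source rate filter: any source sending more than `max_packets` packets within a sliding window of `window` seconds has further packets dropped. The class and its thresholds are hypothetical, real switches and firewalls are configured quite differently, and a filter like this only catches individually noisy sources.

```python
from collections import defaultdict, deque


class RateFilter:
    """Drop packets from any source that exceeds max_packets within `window` seconds."""

    def __init__(self, max_packets, window):
        self.max_packets = max_packets
        self.window = window
        self.history = defaultdict(deque)  # source -> timestamps of recently allowed packets

    def allow(self, source, timestamp):
        times = self.history[source]
        # Forget packets that have slid out of the time window.
        while times and timestamp - times[0] >= self.window:
            times.popleft()
        if len(times) >= self.max_packets:
            return False  # unusually busy source: drop the packet
        times.append(timestamp)
        return True


f = RateFilter(max_packets=3, window=1.0)
print([f.allow("10.0.0.1", t) for t in (0.0, 0.1, 0.2, 0.3)])  # [True, True, True, False]
```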
A fairly good article by the BBC on the recent strengthening of the French/German alliance. The timing of these developments is interesting, and I’m not sure what to make of it. My personal reaction is to think about the current work of the convention on the future of the EU, and to consider that any constitutional arrangement will have to ensure that a French/German alliance does not dominate policy.
This is also likely to be the response of the leaders of the other members of the EU – and surely France and Germany know this. So could it be that this is exactly the response that the pair (or one of the pair) is looking for? If so, why? I suppose it might push the federal cause a little further ahead, but I’m not sure it would achieve much. Another answer might be that they are trying to concentrate minds – France and Germany are moving forward on European integration, so other countries need to come forward with commitments on integration if they don’t want to be left behind.
Hopefully I’ll find some ideas on this out there somewhere…
It’s been rather cold out recently. Not cold in the British sense of “it’s been really cold recently, there was a frost on the ground this morning!”, but cold in the sense of “beware you don’t freeze to death on your way to work”. This morning it was around -20C and, according to Environment Canada, it’s currently -16C. That’s without the wind chill. Thankfully this morning there was little in the way of wind, but tonight there is enough to put the forecast wind chill at -35C.
So it’s cold. Despite this coldness, however, I noticed on the way home from work that a couple of shops in Chinatown still have their shop fronts completely open. When I type “shop fronts” I really mean it – the whole front of the shop – open to the elements, which currently means -16C. The increasing costs of energy in Ontario don’t seem to be biting as hard as perhaps they should.
I’ve had some great feedback on my SimpleTAL library, and a few questions. The original pages that I put up were a little spartan, even by my standards, but I’ve been adding to them over the last couple of days to try and make them a little more informative. I’ve added a couple of examples that show how to use the library, and a page documenting the differences between this implementation of TAL and the Zope version.
It would be nice to add pages demonstrating each of the different TAL attributes and how they work, but it’s a fair amount of work, so for now I’m relying on the Zope documentation. One aspect of the documentation that I will work on, however, is a description of the SimpleTAL API. It is easy enough to work out from the source, but it’s much nicer to have it written up on a web page.
One of my shoe laces broke this morning, leaving just enough lace to keep my shoe on my foot. At lunch I went to buy a replacement shoe lace, and thankfully the local chemist had them. I was expecting that I would have to buy a pair of shoe laces instead of the one that I needed, but I was wrong. I had, in fact, to purchase two pairs of shoe laces.
Shoe laces also come in multiple lengths, with a handy (inaccurate) chart on the packaging indicating what length you may need based on the number of eyelets your shoes have. Sod’s law – my shoes fall at the upper end of one length recommendation. Still, I got the size indicated, and although they are a little on the shy side, they will do. The question remains, however: why do you have to buy two pairs, with a single pair not an option? How many people have two identically coloured pairs of shoes, with the same number of eyelets, that suffer broken laces at the same time? If shoe laces have to be sold four at a time, why can they not at least put two different sizes in the same packet, so that you can buy in the confidence that at least one of them will be correct?
Email: colin at owlfish.com