Colin's Journal: A place for thoughts about politics, software, and daily life.
I updated my RSS aggregator this weekend so that it distinguishes between changes to existing posts and new posts. Originally it would compare the title and description of every RSS item that it read in with those already in the database (via a checksum, for performance reasons). A problem I kept encountering was that some items would be updated several times after they first appeared, and my aggregator would treat each updated version as a new post.
Now I use the <guid> element, if it is present, to identify unique items; if it is not present, I use the item's title and link instead. If the description of an item has changed since the last time it was read, I update the version in my database but leave the date of discovery the same, so that the reverse chronological ordering isn't affected. While doing this I encountered a problem when pulling data out of MySQL.
The problem is that the Python module I use to access MySQL (MySQL for Python), while happy to accept Unicode strings as parameters, will present any data retrieved from the database as a plain string. When comparing the Unicode extracted from the RSS feed with the results of the database query, Python attempted to convert the plain string to Unicode by treating it as ASCII, which caused an error whenever it contained latin1 characters.
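The way out is to decode the database value explicitly, using the character set it was stored in, before comparing it with the feed's text. Below is a minimal sketch in modern Python 3, assuming the column holds latin1 text and that the driver hands values back as raw bytes; the helper names are hypothetical. Note that Python 3 no longer decodes implicitly: by default, comparing bytes with str simply returns False, so the same mistake would show up as a spurious mismatch rather than an error.

```python
# Hypothetical helpers: decode values returned by the database driver using the
# character set the column was stored in (assumed latin1 here) before comparing
# them with text parsed from an RSS feed.
DB_CHARSET = "latin-1"  # assumption: matches the MySQL column's character set


def to_text(value, encoding=DB_CHARSET):
    """Return value as str, decoding bytes explicitly rather than implicitly as ASCII."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def description_changed(feed_description, db_description):
    """True if the feed's description differs from the stored one."""
    return to_text(feed_description) != to_text(db_description)


# A latin1-encoded value as a driver might return it, and the same text from the feed.
stored = "Caf\u00e9 cr\u00e8me".encode("latin-1")
from_feed = "Caf\u00e9 cr\u00e8me"

# Decoding as ASCII (what Python 2 did behind the scenes) fails on latin1 bytes:
try:
    stored.decode("ascii")
except UnicodeDecodeError as exc:
    print("ASCII decode failed:", exc.reason)

print(description_changed(from_feed, stored))  # False: same text once decoded
```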
Unfortunately MySQL doesn't seem to support the storage of Unicode (certainly not at version 3.23.49); you have to store your strings in a particular character set. This will work fine for me (latin1 covers everything I need), but I can't see how it would work if you subscribed to two RSS feeds, say one in big5 and one in latin1. The documentation for version 4.1 states that it adds "Extensive Unicode (UTF8) support.", so hopefully once it makes it into Debian stable this problem will go away...
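The identification rule described in this post (the <guid> if present, otherwise title and link, plus a checksum of the description to spot updates) can be sketched roughly as follows. This is a minimal sketch, not the aggregator's actual code: a plain dict stands in for the MySQL table, items are assumed to be already parsed into dicts, and MD5 is an assumed choice of checksum, since the post doesn't say which one is used.

```python
import hashlib
from datetime import datetime, timezone


def item_key(item):
    """Identify an item by its <guid> if present, otherwise by title and link."""
    guid = item.get("guid")
    if guid:
        return ("guid", guid)
    return ("title+link", item.get("title", ""), item.get("link", ""))


def checksum(text):
    """Checksum of the description (MD5 is an assumption), for cheap comparison."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def store_item(db, item, now=None):
    """Insert a new item or update a changed one; return 'new', 'updated' or 'unchanged'.

    The discovery date is only set on insert, so an update never moves an item
    in the reverse chronological listing.
    """
    now = now or datetime.now(timezone.utc)
    key = item_key(item)
    digest = checksum(item.get("description", ""))
    existing = db.get(key)
    if existing is None:
        db[key] = {"item": item, "checksum": digest, "discovered": now}
        return "new"
    if existing["checksum"] != digest:
        existing["item"] = item
        existing["checksum"] = digest  # existing["discovered"] is left untouched
        return "updated"
    return "unchanged"


db = {}  # stands in for the database table
first = {"guid": "urn:example:1", "title": "Hello", "link": "http://example.com/1",
         "description": "Draft text"}
edited = dict(first, description="Final text")

print(store_item(db, first))   # new
print(store_item(db, edited))  # updated
print(store_item(db, edited))  # unchanged
```

One consequence of the fallback rule is worth noting: for a feed without <guid> elements, editing an item's title or link produces a new key, so that edit would still be treated as a new post.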
I carry bad news with these words. Citron, our favourite restaurant in Toronto, has passed away. It is no more, replaced physically, though not culinarily, by a third version of the Butler's Pantry. Its friendly staff, great selection of new world and fusion dishes, and delicious desserts will be most sorely missed. Citron was a great little restaurant for spending the entire evening in, relaxing with a bottle of wine and conversing over great food, with no worry about the passing of time. They updated their menu a few times a year with the changing seasons, which made it hard to tire of their offerings, as is so easy to do with favourite eateries.
The spread of SARS is being particularly felt in Ontario this week. There has been a request for voluntary quarantine of everyone who visited Toronto's Scarborough Grace Hospital on or after the 16th of March. It's estimated that this will affect thousands, although how many will actually place themselves into quarantine for ten days is questionable. In fact it seems like the perfect cover for ten days' sick leave: "Boss, I got this tan in quarantine!".
Today it's been announced that starting this weekend there will be screening of passengers at the airport to try and limit the further import and export of the disease. The total number of cases around the world, broken down by country, and other interesting information can be found at the WHO site. Currently it stands at 53 deaths and 1485 total cases.
It's been a quiet start to the week, with work taking up most of my time. There are a few eager skaters around which is encouraging, and I'm thinking it's nearly time for me to dust mine off and try and remember how to use them.
I seem to have my aggregator working OK now, so in the near future I'll add some configuration options to it and look at releasing it to the world. I'm not sure how much the world will care, but someone somewhere might find it useful.
I've started reading Tesseracts, a collection of short science fiction stories by Canadian authors, which I received at Ad Astra (see Friday). I'm travelling back in science fiction time, having already read (most of) the fourth collection in the series, which I got last year. We now have the third collection as well, so just the second remains to be acquired at some point, maybe next year.
The first story is a variation on the Blade Runner world: not badly written, but not particularly interesting. The second is hard to describe, but I'll try anyway. It is set in a far future with human immortality, the inability to reproduce, a decaying society, an automated baby factory, some off-worlders of indeterminate species, and finally the development of warrior children by encouraging them to fight to the death over Christmas. I doubt I've somehow given the plot away...
This post can't be classed as breaking news, but it's an important subject from the perspective of how this war started. As Charles Dodgson points out here, the original French position was not a threat to veto under any circumstances. It was that France would veto any resolution that automatically authorised war, because the UN weapons inspectors had not given up on disarming Iraq through inspections.
This statement was made on the 10th of March, and was reported fairly accurately by the BBC here. By the 12th, however, the British were spinning it as a French threat to veto under any circumstances. Unfortunately France did not respond to this interpretation by issuing a statement to clarify its position, an omission that implicitly gave credibility to the British interpretation. It wasn't through lack of time either; the war started a week later, on the 20th.
In my opinion this lack of clarification is the biggest mistake the French made in handling the crisis. A statement to the effect that France would back a war, and commit troops to the effort, as and when the weapons inspectors declared that Iraq could not be disarmed peacefully would have turned the tables on the US/UK position. The public could easily have supported such a position, and the focus would have shifted back to whether inspections worked rather than the politics of France versus the US.
There's another film being shot in our neighbourhood, apparently called Soul Food. It's not in IMDB, so I can't tell you anything more about it than that.
Ad Astra starts tomorrow, and while I haven't yet decided whether to go for the whole weekend, I will certainly be going to the con tomorrow for Jason's show...
I wonder why more people don't use the image element of the RSS specification. It's been available since version 0.91 and is still there in 2.0, yet of the 17 feeds that my aggregator is currently collecting only one (my own weblog) includes an image. The BBC RSS feeds have them, but that's the only other place where I've seen them used. You would imagine, given the popularity of favicons, that more sites would be interested in being able to associate an image with their posts.
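For an aggregator, picking the image up is not much work. In RSS 2.0 the image element sits inside <channel> and has required url, title and link children (width, height and description are optional). Here is a minimal sketch using Python's standard ElementTree; the sample feed, its URLs and the function name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed with a channel-level <image>.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Weblog</title>
    <link>http://example.com/</link>
    <description>An example feed</description>
    <image>
      <url>http://example.com/logo.png</url>
      <title>Example Weblog</title>
      <link>http://example.com/</link>
    </image>
    <item><title>First post</title></item>
  </channel>
</rss>"""


def channel_image(rss_text):
    """Return the channel's <image> as a dict, or None if the feed has no image."""
    root = ET.fromstring(rss_text)        # root is the <rss> element
    image = root.find("channel/image")    # only the channel-level image
    if image is None:
        return None
    return {
        "url": image.findtext("url"),
        "title": image.findtext("title"),
        "link": image.findtext("link"),
    }


print(channel_image(SAMPLE_FEED)["url"])  # http://example.com/logo.png
```

An aggregator could then display the returned url next to each post from that feed, much as LiveJournal shows user pictures on friends pages.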
As more of the major websites start using RSS we should hopefully see the use of images expand. By using an image in their RSS feeds a publisher can avoid the problem of losing brand identity (being one new article among hundreds a user may have aggregated), and it gives users an easy way to quickly identify the source of a feed. A good example of how this can work is the use of images on the friends pages of LiveJournal (LJ), where each post has next to it the image selected by the LJ user who made the post. It also shows a potential explanation for the lack of image use in RSS: even though my feed has an image, the LJ aggregator does not include it in the friends page.
If aggregators do not support the image element, it becomes a chicken-and-egg problem: few tools display it, and so few feeds bother to supply it.
I've just finished watching Robin Cook's resignation speech, and it's good. In fact it goes beyond good: it's eleven minutes of clear, rational, well-argued explanation of exactly why the UK should not go to war. While there are additional reasons to oppose UK participation in the war that aren't mentioned in this speech, each of the reasons it does mention is strong enough to stand on its own. If you can spare more than eleven minutes on this subject, also read the accompanying news article.
This snippet alone should show just how desperate the US is to wage war on Iraq, regardless of any justification for it:
Furthermore, he said, Iraq probably had no weapons of mass destruction in the "commonly understood" sense of being a credible threat that could be delivered on "a city target."
While it still seems highly unlikely that parliament will rebel against the government on this vote, at least the cause now has someone that can articulate the argument forcefully and with credibility.
I'm back after a good trip down to Connecticut for the weekend, celebrating an 80th birthday and visiting some of Shana's extended family. The weather was wonderful, as it was forecast to be here in Toronto, and the environment interesting. The countryside of New England, or at least the small part I saw of it, is very wooded with houses dotted throughout it in a strange semi-natural arrangement. Houses have plenty of land around them, which made it difficult for me to think of them as belonging to a village, and gave an impression of a continuous wood dotted with houses and roads. It was not my first trip there, and I'm sure I'll be back again, so hopefully I'll get to see more of the place and form a fairer impression.
The last thing I had expected on the trip was to see much evidence of any political debate going on, but there were a couple of instances where the current debate surfaced. It's understandable that the issue was raised by people with either far-left (or what passes for far-left in the US) or far-right stances, but it was healthy to see first hand some of the internal debate going on in the US.
If you are at all interested in British politics, even in a passive way, then there is a weblog that you should be reading. Formerly known as British Spin, it went quiet for a while when its author stopped blogging, but it is back now and as good as ever.
It is now known simply as British Politics, and I've added a link to it in my very slowly growing list of recommended weblogs.
Would you like to know the truth behind the story regarding the definition of an island? What about the "toys for pigs" affair?
Find some of the answers at Press Watch (via Why Do They Call Me Mr Happy)
I've changed my RSS template so that it no longer includes the date/time in the title of each post. The change is motivated by my newly minted web-based, template-driven RSS aggregator. I'll say more about the aggregator in another post; for now, the relevant point is that its default template displays the date/time next to the titles of posts, so my own feeds were looking a bit strange...
I've just uploaded a bug fix release of SimpleTAL, now at version 3.1. This release also introduces the new code layout and distutils support, so installing should be even easier. The only thing to bear in mind when upgrading is that the import statements need to be changed to: from simpletal import simpleTAL, simpleTALES.
On Monday I read this brief article regarding Object Prevalence [slashdot.org], but I've only just got around to writing down some of my thoughts on the subject. Object Prevalence is essentially just a mechanism for object persistence, with the added bonus of using a log to try and ensure some level of recoverability in the event of abnormal shutdown.
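To make the mechanism concrete, here is a minimal sketch of the idea in Python. It is not the API of any real prevalence library (Prevayler popularised the approach in Java); the class, file names and command format are all made up. All state lives in memory, every change is a command appended to a journal on disk before it is applied, and a restart reloads the last snapshot and replays the journal on top of it.

```python
import json
import os
import pickle
import tempfile


class PrevalentSystem:
    """Hypothetical sketch of Object Prevalence: in-memory state, a write-ahead
    journal of commands, and periodic snapshots of the whole object graph."""

    def __init__(self, directory):
        self.snapshot_path = os.path.join(directory, "snapshot.pickle")
        self.journal_path = os.path.join(directory, "journal.log")
        self.state = {}
        self._recover()

    def _apply(self, command):
        # State only ever changes by applying a (logged) command.
        if command["op"] == "set":
            self.state[command["key"]] = command["value"]
        elif command["op"] == "delete":
            self.state.pop(command["key"], None)

    def _recover(self):
        # Load the last snapshot, then replay any commands journalled since.
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path, "rb") as f:
                self.state = pickle.load(f)
        if os.path.exists(self.journal_path):
            with open(self.journal_path, encoding="utf-8") as f:
                for line in f:
                    self._apply(json.loads(line))

    def execute(self, command):
        # Write-ahead: make the command durable before changing memory.
        with open(self.journal_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(command) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self._apply(command)

    def snapshot(self):
        # Persist the whole in-memory state, then start a fresh journal.
        with open(self.snapshot_path, "wb") as f:
            pickle.dump(self.state, f)
        open(self.journal_path, "w", encoding="utf-8").close()


with tempfile.TemporaryDirectory() as d:
    system = PrevalentSystem(d)
    system.execute({"op": "set", "key": "alice", "value": {"posts": 3}})
    system.snapshot()
    system.execute({"op": "set", "key": "bob", "value": {"posts": 1}})
    # Simulate an abnormal shutdown: a new instance rebuilds state from disk.
    restarted = PrevalentSystem(d)
    print(restarted.state)
```

The snapshot here is a pickle of plain dicts; in a real system it would hold pickled application objects, readable only by code that has those classes, which is where the data accessibility concern comes from.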
The comments on the article include some good points regarding the problems with this approach. It obviously cannot scale to large enterprise applications like those seen in telecommunications and other such industries. It does not provide atomicity and transactions, limits the ability to use third-party reporting products, and restricts the ability to perform ad-hoc queries of your data. All of this is well and good, but it's also rather obvious. There is a bigger issue with using the Object Prevalence approach, even when the size of your dataset, the nature of your transactions, and your reporting requirements would otherwise lead you to think it's a suitable solution.
The problem is one of data accessibility. An application is ultimately only a tool; it's the data that you care about. The data is what differentiates your installation from all others, and it's the data you are dealing with that holds the value of what you are doing. Object Prevalence, though, locks your data into the implementation of the application: only this particular application can load and make sense of the data you have. This in turn causes two other problems, one of integration and one of application lock-in.
The issue of integration is potentially the most serious. If you want any other application to be able to access that data, then you need to interface at the object level, or the application that hosts the data needs to be modified so that it can export the data in another, non-application-specific form. Either route will almost always require modification of your Object Prevalence based application, especially if you want the data sharing to be real time. Often integration of systems is a two-way exercise: you want to be able to send data the other way as well, which in turn means your Object Prevalence application must also support an import mechanism or some object-level API specifically to allow this kind of integration. By the time you have made the changes to your application to support these interfaces, you have done much of the work that you were trying to avoid by using Object Prevalence in the first place.
Most integration between applications happens at a very crude level. Often it's just a matter of reading data out of this database and writing it into a flat file, and conversely reading that file and loading it into that table. These integrations can be done cheaply and easily because the storage mechanism where the data sits is widely understood and completely application independent. Transferring data between spreadsheets and databases is often done using files in CSV (Comma Separated Values) format, one of the crudest, but simplest, ways of formatting data.
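As an illustration of how little code this crude kind of integration needs, here is a minimal sketch of a table-to-CSV-to-table round trip using Python's standard sqlite3 and csv modules. The two in-memory SQLite databases stand in for "this database" and "that table", and the table, columns and rows are made up.

```python
import csv
import io
import sqlite3


def export_table_to_csv(conn, table, out):
    """Write every row of `table` to `out` as CSV, header row first."""
    cursor = conn.execute(f"SELECT * FROM {table}")  # table name assumed trusted
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor)


def import_csv_to_table(conn, table, src):
    """Load CSV rows (header first) from `src` into an existing `table`."""
    reader = csv.reader(src)
    header = next(reader)
    columns = ", ".join(header)
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f"INSERT INTO {table} ({columns}) VALUES ({placeholders})", reader)
    conn.commit()


source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
source.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE customers (id INTEGER, name TEXT)")

buffer = io.StringIO()  # stands in for the flat file
export_table_to_csv(source, "customers", buffer)
buffer.seek(0)
import_csv_to_table(target, "customers", buffer)

print(target.execute("SELECT * FROM customers ORDER BY id").fetchall())
# [(1, 'Ada'), (2, 'Grace')]
```

Neither side needs to know anything about the other application's objects; CSV values come back as strings, and SQLite's INTEGER column affinity converts the id column back to integers on insert.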
The other drawback, that of application lock-in, has the same root cause, but occurs when it's time to move to a different application or system. Any team looking to migrate data from an old application to a new one wants the old data in a database or a nice, simple, application-independent format. If the data only exists as live data objects written out using Object Prevalence, your estimated cost of migration has just increased significantly.
Ultimately, if your data matters to you, you have to understand how it's stored. A relational database is preferable, but an easy-to-read file format (XML or, at a push, a fixed-field format) is fine as well. Anything that is stored in its own application-dependent format should be avoided unless the application has proven support for exporting and importing in an application-independent format. It should always be possible to get your data out of an application-specific format, but the opportunity cost of simple integration, as well as the actual cost of eventual migration, should be weighed carefully before going in that direction.
We actually remembered Pancake Tuesday this year, and made some excellent pancakes. Lots of lemon juice and sugar topped them off nicely, and the tossing went well enough that none of them ended up on the walls, floor or ceiling.
There's a bunch (although more than one hundred probably counts as more than a bunch) of Labour MPs who are opposing the introduction of foundation hospitals. A foundation hospital is one that is not run by the government, but rather as an independent, not-for-profit organisation. Foundation hospitals are, at the same time, still part of the NHS and "monitored" by stakeholder councils.
Obviously the effectiveness of the scheme will depend a great deal on the amount of control that remains with central government and the monitoring council. Too many targets and diktats from the government will eliminate the independence of the foundation hospitals, forcing them to ignore local concerns and to lose the opportunity to experiment and innovate in how provision is made. There is also the potential for abuse, especially in areas where there is no real choice as to which hospital NHS patients are sent to, and presumably this is what the monitoring councils are meant to stop.
Whether any of this will work is very much open to question, but it is hard to see it being worse than the centralised system that is in place today. The reasons given for opposition tend to focus on the fact that only qualifying hospitals will be given this independence, leading to a "two-tier" hospital system where the independent hospitals can set their own wages and, it is presumed, deliver better care. The obvious answer would be to make all hospitals independent, and while the government claims to have this aim, it seems unlikely to happen quickly. The opposing MPs, however, are against any differentiation in the service, still holding the ridiculous belief that all hospitals across the board can be brought up to the same level. This has never been achieved in the private sector (even when the product is identical, as in the telephony market, there is still differentiation), and I can't think of a public sector service that is of uniform quality across the board.
While MPs waking up to their responsibility of questioning the government is to be welcomed, it seems to me that they have picked a poor subject to do this over. Hopefully this new assertiveness will also be applied to those issues where the government is truly heading in a poor direction.
(Inspired by Why Do They Call Me Mr Happy?)
Thanks to the efforts of Michael Twomey, the next release of SimpleTAL will be packaged in a more Python-friendly way, with distutils support. Installation will be as simple as 'python setup.py install'.
The timing of the next release will depend on the presence of any bug reports...
Version 3.0 of SimpleTAL is now available! The major new feature in this release is support for METAL. METAL is an HTML/XML macro language that augments TAL. It allows a sub-tree of a document to be associated with a macro name, and additionally for customisation points (slots) within that sub-tree to be defined. Once a macro has been defined in this way you can then graft the sub-tree into a (potentially different) document and optionally customise the macro by passing in another sub-tree (i.e. filling the slot).
While METAL is pretty nifty, it's also very hard to explain without an example. So if you are unsure what it is and why you might want to use it, take a look at this example.
Last Modified: Mon, 30 Jun 2003 23:21:00 BST
Made with PubTal 3.2.0
Copyright 2008 Colin Stewart
Email: colin at owlfish.com