Colin's Journal: A place for thoughts about politics, software, and daily life.
I had hoped to release PubTal 3.0 sooner than this, but I ran into a bug that took a considerable amount of time to fix. During my attempts to fix the bug I introduced a lot more test cases, and so found more bugs lurking in the code. The result of this delay should therefore be considerably better software!
I’m particularly pleased with the improved OpenOffice support and better HTMLText handling, although the convenience of having built-in FTP support is a worthy contender for best new feature.
In the course of adding image support to PubTal’s OpenOffice converter, I noticed that the HTML it was generating was not always valid, and so I set about trying to fix it.
OpenOffice is a huge application with a wide range of features, and it has a correspondingly large file format. The specification is 571 pages long, and the book on the subject is inaccurate. The book appears to have been written by looking at the output of the program, rather than the excellent (but large) DTDs.
I’ve not had the time, nor the motivation, to write something that would handle the whole format. With OpenOffice using an XML format, I could however pluck out a few basic things that I could easily convert to HTML. The problem with this approach is that supported XML structures can appear in unexpected places within a file. This means that several assumptions made by the conversion code, such as text:p never being nested, turn out to be wrong under hard-to-predict circumstances.
To correct this I’ve added a filter to the OpenOffice plugin. This filter silently blocks any XML structures that are not explicitly supported, and passes only the supported ones through to the conversion code. To make this useful I’ve had to trawl through the conversion code in conjunction with the DTDs, and work out exactly which XML fragments I can support.
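As an illustration only, a whitelist filter of this kind could be sketched in Python with the standard xml.sax module. This is not PubTal's actual code: the SUPPORTED set, the class names and the filter_events helper are all made up for the example, and it assumes that an unsupported element is dropped together with everything inside it.

```python
import xml.sax
from xml.sax.handler import ContentHandler

# Hypothetical whitelist; the plugin's real list of supported elements is not given here.
SUPPORTED = {"text:p", "text:span", "text:h"}


class WhitelistFilter(ContentHandler):
    """Pass supported elements downstream; silently drop everything else."""

    def __init__(self, downstream, supported=SUPPORTED):
        super().__init__()
        self.downstream = downstream
        self.supported = supported
        self.blocked_depth = 0  # greater than 0 while inside an unsupported element

    def startElement(self, name, attrs):
        # Assumption: anything nested inside a blocked element is blocked too.
        if self.blocked_depth or name not in self.supported:
            self.blocked_depth += 1
            return
        self.downstream.startElement(name, attrs)

    def endElement(self, name):
        if self.blocked_depth:
            self.blocked_depth -= 1
            return
        self.downstream.endElement(name)

    def characters(self, content):
        if not self.blocked_depth:
            self.downstream.characters(content)


class Recorder(ContentHandler):
    """Stand-in for the conversion code: it just records what it is given."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        self.events.append(("text", content))


def filter_events(xml_text):
    """Parse xml_text and return the events that survive the whitelist."""
    recorder = Recorder()
    xml.sax.parseString(xml_text.encode("utf-8"), WhitelistFilter(recorder))
    return recorder.events
```

With this sketch, a text:p nested inside an unsupported element never reaches the downstream handler, so the conversion code only sees structures it was written for.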
This reduces the chances of the code producing bad HTML, but it doesn’t eliminate them. The conversion code is modular, which means that one part might accidentally produce HTML that combines in an invalid way with the output of a different part. To solve this half of the problem I’ve written another filter, applied to the output of the conversion code.
This HTML filter increases the chances of valid HTML being written, by keeping track of what elements are valid within other elements. Ideally it would do full validation against a relevant DTD, but that seems like too much work, and would probably impose too much processing overhead.
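The idea of tracking which elements are valid inside which others can be sketched roughly as follows. This is an assumption-laden illustration, not the filter PubTal ships: the ALLOWED_CHILDREN table is a tiny hypothetical subset, the NestingFilter class is invented for the example, and it chooses to drop an invalid tag while keeping its text, which may not match what the real filter does. It also ignores attributes and escaping.

```python
# Hypothetical, heavily abbreviated table of which children each element may contain.
ALLOWED_CHILDREN = {
    "body": {"p", "ul", "div"},
    "div": {"p", "ul", "div"},
    "p": {"em", "strong", "a", "img"},
    "ul": {"li"},
    "li": {"p", "ul", "em", "strong", "a", "img"},
    "em": {"strong", "a"},
    "strong": {"em", "a"},
}


class NestingFilter:
    """Write a tag only if it is allowed inside the element currently open."""

    def __init__(self, root="body"):
        self.context = [root]  # elements actually written and still open
        self.emitted = []      # one flag per start() call that awaits its end()
        self.output = []

    def start(self, name):
        allowed = name in ALLOWED_CHILDREN.get(self.context[-1], set())
        self.emitted.append(allowed)
        if allowed:
            self.context.append(name)
            self.output.append("<%s>" % name)

    def text(self, data):
        # Text is always passed through, even when its enclosing tag was dropped.
        self.output.append(data)

    def end(self, name):
        # Only close tags that were actually written.
        if self.emitted.pop():
            self.context.pop()
            self.output.append("</%s>" % name)

    def result(self):
        return "".join(self.output)
```

For example, a p started inside another p is not in the table for p, so its tags are dropped and only its text is written. A table lookup like this is far cheaper than validating against a full DTD, at the cost of catching only the nesting errors the table knows about.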
I’m fairly certain that the combination of these improvements will result in only valid HTML or XHTML being produced, but I can’t be certain without significantly more work.
At least the code now handles images.
I’ve finally taken the time to add a long overdue feature to PubTal, namely upload support. My development version now includes a new command (uploadSite.py) that determines which files have changed since the last upload, and then uploads them to a server. The only upload method I’ve implemented is FTP, which is good enough for me. More upload methods can be added using plugins, should anyone care to develop them.
To keep things fast PubTal now uses a database (using anydbm which should mean it works everywhere) to record the MD5 checksum of every file it generates. When an upload occurs, PubTal records, on a per-user, per-server basis, the checksum of each file it uploads. The normal operation is to find all files in the built website, and then compare the checksums held for them in the two databases. When uploading, PubTal checks that all the directories exist on the FTP site, and then creates them if they don’t.
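A rough sketch of the change detection described above, in Python. The function names (md5_of, changed_files, record_upload) are hypothetical, and unlike PubTal this version recomputes each checksum from disk rather than reading it from a database of generated files. The uploaded argument can be any mapping; a plain dict is assumed here, standing in for the per-user, per-server database.

```python
import hashlib
import os


def md5_of(path):
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(site_dir, uploaded):
    """List files whose checksum differs from the one recorded at the last upload."""
    changed = []
    for folder, _dirs, files in os.walk(site_dir):
        for name in files:
            full = os.path.join(folder, name)
            # Use forward slashes so keys look the same on every platform.
            rel = os.path.relpath(full, site_dir).replace(os.sep, "/")
            if uploaded.get(rel) != md5_of(full):
                changed.append(rel)
    return sorted(changed)


def record_upload(site_dir, uploaded, rel_paths):
    """Remember the checksum of each file that has just been uploaded."""
    for rel in rel_paths:
        uploaded[rel] = md5_of(os.path.join(site_dir, rel))
```

A file that has never been uploaded has no recorded checksum, so it is reported as changed. If a dbm-style database is swapped in for the dict, bear in mind that it hands values back as bytes, so the digests would need encoding or decoding before the comparison.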
There are several additional options for uploads:
I’m pleased with the current development version; in my local tests it works very well. I’ve a few more ideas that I want to have a shot at implementing before I release this, however.
(The picture is from a trip, a couple of weeks ago, to the beaches. The spray from the lake forms amazing ice coatings on the trees and rocks – click on the picture for a large version).
Subversion should, if all goes to plan, have its version 1.0 release on Monday. This news should be of great interest to anyone who uses CVS, either professionally or personally. Subversion started development back in 2000 (I’ve been following its progress on and off since October 2001) and so reaching the 1.0 milestone is a very significant achievement.
Subversion has been developed to replace CVS as the free version control system of choice. Although the working philosophy behind Subversion is very similar to CVS (such as using copy-modify-merge instead of locking) it has a significantly improved feature set, and a very different architecture to support those features. The top three most important features that Subversion has, and that CVS lacks, are:
Subversion can either work on a local repository (and it must be local – network shares will not work), or over a network. Currently there are two different server implementations, one that sits inside Apache and uses the WebDAV/DeltaV protocol, and a standalone server that uses a custom protocol. The standalone server is probably more interesting because it doesn’t depend on Apache, and can act as a traditional daemon (either running in the background or through inetd), or be used through SSH in a similar fashion to CVS.
Subversion includes many utilities, including a not-yet 1.0 quality CVS to Subversion converter. By running this on my small (20MB) home repository I successfully created a new Subversion repository in about 15 minutes. The resulting repository is very similar in size to the CVS one, despite the completely different storage mechanisms (Berkeley DB versus RCS files).
It will take some considerable time for Subversion to take over from CVS as the major free version control tool. For new projects it is easy to choose Subversion over CVS, but for existing projects it will take a 1.0 quality converter before many people will be willing to switch. For large repositories there are questions regarding the scalability of Subversion, which will only be answered through performance testing. As these questions are answered, and the converter is improved in quality, there will be little to prevent most people from migrating from CVS to Subversion.
To learn more about Subversion take a look at Subversion: The Definitive Guide, an excellent free book on the subject.
It looks like the UK might join almost every other existing EU state, and impose restrictions on the citizens of those new member states looking for work. Although this is unlikely to affect many people, because the number looking for work in the UK is not going to be very large, it is still bad news.
The two major issues I have with this are:
1 – You have the right as an EU citizen to live and work anywhere in the EU. This is one of the most visible benefits of EU membership, and so it should be absolutely protected. The idea that an EU state can shut its borders to some EU citizens and not others makes EU citizenship a joke.
2 – This action lends credibility to those who think immigration from new member states is a problem. The government, having acted, has endorsed this viewpoint, and left in people’s minds the idea that the government “had to do something” to solve this “problem”. When no major migration does occur, and the restrictions are lifted, very few will notice. The perception left in people’s minds will be that EU expansion equals a migration problem, and therefore any future expansion is a problem.
It would have been nice if we could have learnt from the introduction of the Euro. Just prior to the introduction of the currency, both in real terms and in terms of notes and coins, there were numerous scare stories about how bad the resulting chaos would be. Despite this, continental governments ploughed on regardless, and there was a virtually flawless introduction of the single currency. These governments could have run around talking about contingency plans, but this would have led to a public perception of likely failure, and maybe even a self-fulfilling prophecy. Instead they stuck to their original plans, and so the scary stories are now seen as just that, instead of warnings that had to be acted upon.
When I woke this morning there was sun streaming through the window, so I had hopes for good photo opportunities today. By the time I got to High Park, however, the day had turned decidedly grey, with occasional snow being the only highlight.
I had a walk around, but as can be seen from the shot to the right the light just wasn’t up to much. I did find a squirrel in a tree hole, but it was being uncooperative and not coming out for photos.
I got back my prints from Henry’s this week, and they are very good. The colours don’t match the monitor exactly (which is to be expected), but they are far closer than the other prints I’ve had made. I’ll certainly use them in preference to either Blacks or the local development shop.
I’ve been messing around with Jabber again after a long absence. The link with AIM now seems stable, so I can be reached at email@example.com or wibble103 on AIM if you fancy a chat. Writing code to link with Jabber is very easy, but I’m not sure if I can think of a useful application for it.
It has been a stressful past twenty four hours, but thankfully it’s over in a satisfactory manner. My troubles started when I looked at the photos I had taken, in the late afternoon, down on the beaches. I started with the ones from the end of the batch which contained the sunset, and on a first glance they looked very good.
A closer inspection showed a dark speck in the sky, and a quick look at some other frames showed the same thing. At this point I was just annoyed at having taken pictures with a mucky lens, but things soon got worse. It turned out that the lens was in fact clean; the problem was dust on my sensor.
Unfortunately I tried to clean it there and then, and managed to make things worse by adding moisture to the dust. I have to admit to having been pretty devastated by this – I was genuinely worried that I had permanently damaged the sensor.
After getting home this afternoon I had a go with some sensor swabs from Photographic Solutions. They didn’t at first seem to be having much effect, but after putting more effort into the cleaning motion, the muck on my sensor started to shift.
Now things are pretty much back as new – there’s still some slight dust visible at f/22, but it’s pretty minor. I’m extremely relieved that I’ve been able to recover from my mistake, and hoping that I’ll never again have to resort to the sensor swabs.
This image of the sunset is one of those that had the original dust problem; as it is, this one looks better cropped anyway…
It has been a long time since I last posted anything regarding the war in Iraq, mainly because I had nothing to add to the daily news reports. Given today’s news, I do have something to say (or write) on the subject, and for once it is hopeful.
Prior to the war, we were told that our government had intelligence showing that there were weapons of mass destruction in Iraq, and it was even possible that some of these could be just 45 minutes from use. Now the government is set to follow in the footsteps of the US and launch an independent inquiry into the failure to find any of these WMDs.
Another news item that caught my notice today was David Blunkett’s plan to lower the burden of proof from “beyond reasonable doubt” to “the balance of probabilities” for cases pertaining to the prevention of people carrying out terrorist activities. In these cases, the evidence used would be kept secret from defendants so as to protect MI5, MI6, and GCHQ intelligence sources.
The timing of these two developments couldn’t have been worse for David Blunkett’s agenda. The idea of people being locked up without them, or the public, hearing any evidence against them is bad. The idea of this happening when a “security-vetted judge” decides that, on the balance of probabilities, they intend to commit terrorist acts is frightening. Doing all of this based on intelligence evidence, yes the same intelligence that incorrectly justified an entire war, is staggering. It is so staggeringly bad in fact that I feel fairly hopeful that it will never happen.
An inquiry into the intelligence agencies’ failure regarding weapons of mass destruction in Iraq will surely be in the headlines for the next twelve months or so. In this case it will be very hard to justify bringing in laws that will lock people up based on nothing but the secret evidence of these agencies.
That the UK government is considering allowing one part of the establishment to lock people up, on the basis of undisclosed evidence presented by another part of the establishment, shows how little they care about or understand freedom. I hope that enough MPs understand how bad these proposals are that they will never be adopted. However, I fear we may have to rely on the government’s terrible timing to save us.
I got back from Mexico on Friday night. I unexpectedly had some trouble changing my ticket from the Saturday to Friday, but thankfully I could do it in the end. I was delivering a course, and although there were some serious logistical problems, it seemed to go pretty well.
Yesterday was spent sleeping in and catching up with news. I spent most of today finishing off releases of several pieces of software (see below), and then dashing out of the door to catch the sunset.
I didn’t have much time, so I went down to Exhibition Place. I got a few OK pictures, including some shots of our local windmill. I like this one in particular, even if it isn’t a perfectly composed shot.
PubTal now includes the OpenOffice plugin by default, which in turn now handles smart quotes and hyphens properly.
SimpleTAL has had a slew of edge case problems addressed, but no interesting new features.
TALAggregator now has category support, and handles errors better. This version is the same as the one I’ve been running on my machine continuously with no problems for months. As an interesting side note: I’ve been using TALAggregator since September 2001, bringing in a total of 9708 articles from 51 different feeds.
I’ve been meaning to get these out for some time as most of the work was done months ago.
Email: colin at owlfish.com