Colin's Journal: A place for thoughts about politics, software, and daily life.
I’ve let nearly two weeks slip by since last updating my weblog. The event most worthy of note during that period occurred over the past weekend, and that was our trip down to Arkansas. The trip itself was a bit of a nightmare on the outbound leg, a combination of bad luck with the weather and the typical bad customer service of American Airlines.
Discounting the travel as a necessary evil, the trip overall was a great deal of fun. I met the part of Shana’s extended family that I had until now missed out on meeting, and was reacquainted with those that I had met before. The weather in Arkansas was a very welcome break from Toronto, with highs somewhere between 70 and 80 Fahrenheit (21-26 C).
I, of course, took my camera with me. The poor Toronto weather has kept photography to a minimum recently, so I was looking forward to this opportunity to take some shots. Unfortunately the combination of rusty skills and time pressure meant that I came back with fewer good pictures than I had hoped for. If I had been alone I could have taken many more, but then the trip wasn’t intended to be a photography outing. This shot of a daffodil was one of the ones that did come out, despite the flowers being well past their best.
Prior to the trip to Arkansas, I had spent a good few days trying to unite the software which creates my website (PubTal) with my weblog program. I’ve done this in the form of a plugin for PubTal, which I’ll probably include in the next version I release. I can now write weblog posts in a text editor, or even OpenOffice.
The low cost of bandwidth, coupled with a competitive North American telephone market, has led to some interesting business models becoming viable.
Take for instance this rather nifty-sounding service, UK 2 ME. The website allows you to enter a U.S. or Canadian phone number, and it will in turn allocate you a national-rate non-geographic UK number (i.e. one in the 0870 range).
There is no charge for the service. Callers pay the normal UK national rate, and are connected through to your North American number, several thousand miles away. I haven’t tried the service yet, so I don’t know what the call quality is like. If it turns out to be good then it’ll be a very useful service, especially for travellers.
The service provider makes money because the combined cost of the bandwidth and the North American termination is lower than the interconnect fee that it will receive for calls to the UK number. How long that’ll be true for I don’t know, but in the meantime it is an easy-to-offer service with virtually no overhead. There’s no customer service required, no billing to be done, just some automated provisioning to the network.
I hate finding bugs immediately after I’ve released software. It is particularly annoying when the bug is in the install script and not the application. I’ve just uploaded PubTal 3.0.1, now featuring an installer that includes the OpenOffice plugin….
Over this weekend our host had a problem with spamassassin, and it stopped marking anything as spam. I don’t receive a huge amount of spam, somewhere in the range of 30-60 messages a day, but it is more than enough to drive me crazy without filtering.
I set about trying to find a quick and easy filtering solution, and settled on DSPAM based on its reputation for accurate filtering. DSPAM has, unfortunately, got several issues that stop it being the non-intrusive spam filtering solution that I would like.
Firstly, DSPAM is designed to work at the MTA level rather than working with email clients. Configuring MTAs is a pain, so at first I just ran it directly from Evolution with some limited success. The second problem I encountered was its speed, or lack of it. Although the website touts speed as one of DSPAM’s major benefits, I didn’t see much evidence of this, with processing taking nearly one second per mail.
The final show stopper came when I finally tried to integrate it with my MTA (exim). The configuration wasn’t too bad, but once I had it all set up I couldn’t get it to successfully process email because DSPAM would suffer a segmentation fault.
At this point I gave up and tried something else: Bogofilter. It was very easy to compile and install, except for the application of a small patch that is required for it to work with Berkeley DB. Training on my mailboxes of spam and my inbox was extremely fast, and integrating it into Evolution was very simple.
Since doing this our host has got spamassassin working again. I’m still leaving Bogofilter as a second line of defence, and it has already caught some spams that spamassassin let through.
I had hoped to release PubTal 3.0 sooner than this, but I ran into a bug that took a considerable amount of time to fix. During my attempts to fix the bug I introduced a lot more test cases, and so found more bugs lurking in the code. The result of this delay should therefore be considerably better software!
I’m particularly pleased with the improved OpenOffice support and better HTMLText handling, although the convenience of having built-in FTP support is a worthy contender for best new feature.
In the course of adding image support to PubTal’s OpenOffice converter, I noticed that the HTML it was generating was not always valid, and so I set about trying to fix it.
OpenOffice is a huge application with a wide range of features, and it has a correspondingly large file format. The specification is 571 pages long, and the book on the subject is inaccurate. The book appears to have been written by looking at the output of the program, rather than the excellent (but large) DTDs.
I’ve not had the time, nor the motivation, to write something that would handle the whole format. With OpenOffice using an XML format, I could however pluck out a few basic things that I could easily convert to HTML. The problem with this approach is that supported XML structures can appear in unexpected places within a file. This means that several assumptions made by the conversion code, such as text:p never being nested, turn out to be wrong under hard-to-predict circumstances.
To correct this I’ve added a filter to the OpenOffice plugin. This filter silently blocks any XML structures that are not explicitly supported, while passing through all the others to the conversion code. To make this useful I’ve had to trawl through the conversion code in conjunction with the DTDs, and work out exactly what XML fragments I can support.
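As a rough illustration of the idea (not the actual PubTal code), a filter like this can be sketched as a SAX handler that sits in front of the conversion code. The `AllowlistFilter` class, the `filter_xml` helper and the contents of `SUPPORTED` are all made up for this example, and I am assuming here that an unsupported element is dropped together with everything inside it.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator

# Hypothetical allowlist; the real set of supported structures is not listed here.
SUPPORTED = {"doc", "text:p", "text:span"}


class AllowlistFilter(xml.sax.ContentHandler):
    """Pass supported elements downstream; silently drop everything else."""

    def __init__(self, downstream, supported):
        super().__init__()
        self.downstream = downstream
        self.supported = supported
        # Depth inside an unsupported subtree (0 means we are not blocking).
        self.blocked_depth = 0

    def startElement(self, name, attrs):
        if self.blocked_depth or name not in self.supported:
            # Assumption: the whole subtree of an unsupported element is dropped,
            # even if it contains elements that are otherwise supported.
            self.blocked_depth += 1
            return
        self.downstream.startElement(name, attrs)

    def endElement(self, name):
        if self.blocked_depth:
            self.blocked_depth -= 1
            return
        self.downstream.endElement(name)

    def characters(self, content):
        if not self.blocked_depth:
            self.downstream.characters(content)


def filter_xml(source, supported=SUPPORTED):
    """Return the XML in `source` with unsupported structures removed."""
    out = io.StringIO()
    # XMLGenerator stands in for the conversion code in this sketch.
    handler = AllowlistFilter(XMLGenerator(out), supported)
    xml.sax.parseString(source.encode("utf-8"), handler)
    return out.getvalue()
```

In this sketch the parser runs without namespace processing, so a name such as text:p is matched as a plain string. A real filter would more likely compare namespace URIs and local names.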
This reduces the chances of the code producing bad HTML, but it doesn’t eliminate them. The conversion code is modular, which means that one part might accidentally produce HTML that combines in an invalid way with the output of a different part. To solve this half of the problem I’ve written another filter, applied on the output of the conversion code.
This HTML filter increases the chances of valid HTML being written, by keeping track of what elements are valid within other elements. Ideally it would do full validation against a relevant DTD, but that seems like too much work, and would probably impose too much processing overhead.
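To give a feel for what keeping track of valid nesting might look like, here is a minimal sketch. It is not the filter from PubTal: the `NestingFilter` class and the tiny `ALLOWED_CHILDREN` table are invented for the example, and I am assuming that a misplaced element has its tags dropped while its text is kept.

```python
from html import escape

# Hypothetical and deliberately tiny content model: parent -> allowed children.
ALLOWED_CHILDREN = {
    "body": {"p", "ul"},
    "p": {"em", "strong"},
    "ul": {"li"},
    "li": {"p", "em", "strong"},
    "em": {"strong"},
    "strong": {"em"},
}


class NestingFilter:
    """Drop tags that are not allowed inside the nearest element actually written."""

    def __init__(self, root="body"):
        self.root = root
        self.open = []  # (name, emitted) for every element currently open
        self.out = []

    def _parent(self):
        # The effective parent is the nearest open element that was written out.
        for name, emitted in reversed(self.open):
            if emitted:
                return name
        return self.root

    def start(self, name):
        emitted = name in ALLOWED_CHILDREN.get(self._parent(), set())
        if emitted:
            self.out.append(f"<{name}>")
        self.open.append((name, emitted))

    def end(self, name):
        open_name, emitted = self.open.pop()
        if open_name != name:
            raise ValueError(f"mismatched end tag: {name}")
        if emitted:
            self.out.append(f"</{name}>")

    def text(self, data):
        # Assumption: text always passes through, even where a full DTD
        # check would reject it (for example directly inside ul).
        self.out.append(escape(data))

    def result(self):
        return "".join(self.out)
```

For example, a nested paragraph is flattened into its parent:

```python
f = NestingFilter()
f.start("p"); f.text("a"); f.start("p"); f.text("b"); f.end("p"); f.end("p")
print(f.result())  # <p>ab</p>
```

A lookup table like this is far cheaper than validating against a DTD, which is also why it can only improve the odds of valid output rather than guarantee it.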
I’m fairly certain that the combination of these improvements will result in only valid HTML or XHTML being produced, but I can’t be certain without significantly more work.
At least the code now handles images.
Email: colin at owlfish.com