Colin's Journal: A place for thoughts about politics, software, and daily life.
Writing web page content in OpenOffice is a lot easier than writing pages in a text editor, even though I’ve been using HTMLText rather than raw HTML. The PubTal OpenOffice plugin (available here) works well enough that I could convert my remaining pages over to using PubTal.
I’ve been avoiding moving the last of my archived content over to PubTal because it’s stuff that I don’t really care about any more. With OpenOffice I could just drag and drop existing pages out of my web browser, and then clean up a few things like the relative links. The main benefit of having done this is that all of the pages on my site now validate, and they are all produced using the same template.
I haven’t decided yet whether to convert other pages from HTMLText to OpenOffice, but it’s tempting for ease of maintenance.
I am having to reconsider the use of AbiWord as an editor for web page content. The reason is not a flaw in the idea itself, but rather the quality of the AbiWord software. Even the stable version (2.0) has some significant bugs that make it untrustworthy for handling important content.
The two most serious problems I’ve hit are:
I will continue to maintain and distribute the AbiWord plugin in the hope that future versions of the software will address these fatal defects, but I am now going to look at alternative editors.
The most promising is OpenOffice. The software is well maintained by a large team, is regarded as being of high quality, and the file format is very well documented. My initial impression of the file format is that it will be easier to handle than the AbiWord format turned out to be.
The biggest drawback to attempting an OpenOffice PubTal plugin is the huge number of features that OpenOffice has. Most of these features will not translate well into a web page, and so will have to be ignored by the plugin.
Features currently supported include:
Things that I haven’t been able to get working yet:
If you download and use this plugin please email me and let me know whether it works for you. I’m using AbiWord 1.99.5, and aside from the unsupported stuff, it seems 100% reliable so far.
Last week I wrote a short article on the importance of a template based solution for web page maintenance, and the sort of innovations that could be made to ease template design. At the end of that article I noted two other problems with web publication tools today: markup of the content, and the handling of non-journal style pages. This article addresses some thoughts I’ve had on the first of these two problems.
The most popular template based systems today are those provided by blogging software. They allow an author to enter new web content either using a web browser (thin client) or a small application (fat client). When the author decides to include a link, make some text bold, or apply some other markup the most common solution is to have them enter HTML codes.
Alternatives to entering the HTML manually include utilising IE specific enhanced textarea widgets, using a different markup language such as Textile, or providing buttons that automatically insert the HTML tags.
The markup in which an author’s content is written, and the markup in which it is published, must be treated separately, even if they happen to be the same. The reason for this is that an evolving web also means evolving markup for web pages, for example, the transition between HTML4 and XHTML1. When an author of a site chooses to move their pages from HTML to XHTML the software they use needs to be able to rebuild old pages using XHTML.
For software to be able to perform transformations of markup from one language to another it needs to be able to parse the original markup perfectly. If the original markup is HTML this poses two problems: writing and parsing correct HTML programmatically is fairly difficult, and if users enter markup by hand then there will be errors in it. The inability to convert cleanly to a new publishing markup language is a major defect in all of the blogging tools today that store and accept content using HTML markup. It is a hole that can be coded out of, but never in a 100% satisfactory way.
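To make the separation concrete, here is a minimal sketch of content stored in a strict, easy to parse format and then published as either HTML 4 or XHTML 1. The element names (`content`, `para`, `bold`, `break`) and the `publish` function are invented for this illustration; they are not PubTal's or any blogging tool's actual format.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical strict authoring format: well-formed XML with a tiny vocabulary.
SOURCE = "<content><para>Hello <bold>world</bold><break/>again</para></content>"

# Illustrative mapping from authoring elements to published tags.
TAGS = {"para": "p", "bold": "b", "break": "br"}


def render(node, xhtml):
    """Recursively render one authoring element in the chosen output markup."""
    tag = TAGS[node.tag]  # an unknown element raises KeyError: the format is strict
    if tag == "br":
        # Empty elements are the visible difference between the two targets.
        return "<br />" if xhtml else "<br>"
    inner = escape(node.text or "")
    for child in node:
        inner += render(child, xhtml) + escape(child.tail or "")
    return "<%s>%s</%s>" % (tag, inner, tag)


def publish(source, xhtml):
    """Parse the stored content and rebuild it for the requested target."""
    root = ET.fromstring(source)  # malformed input fails here, not at publish time
    return "".join(render(child, xhtml) for child in root)


print(publish(SOURCE, xhtml=False))
print(publish(SOURCE, xhtml=True))
```

Because the stored form is parsed strictly, moving a site to a new publishing markup only means adding another rendering path; the old content never has to be repaired by hand.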
The solution to this problem requires a combination of three things:
The critical piece missing today of these three items is the third one: a GUI tool that allows the markup of web content in a strict, easy to parse, format. The bare minimum that such a tool should be able to support includes: links, text decoration (bold, italic, etc), lists (bullet and numeric), and images. There are lots of other types of markup which would be very useful (e.g. tables), but for most web content this limited list would suffice. Today there are many weblog authors whose tools and knowledge are such that they don’t use even the most basic markup in their content. A GUI application supporting these features, and whose output is in a strict format, would be enough to bring painless, sustainable content authoring to a much wider audience.
Writing such a tool, while not technically difficult, does take time and effort. I hope to one day soon find an open source tool to do this. In the meantime however I have a partial solution: AbiWord.
AbiWord is an open source word processor. A word processor isn’t really the best choice of tool for editing web content, simply because it has too many features that are not needed or do not apply to the web. For example, AbiWord supports Mail Merge, multiple document sections with different headers and footers on the pages, and other such features that are needed for document creation, but not for editing web content.
Despite these drawbacks the use of AbiWord does bring some significant advantages:
To see whether or not this can work I’ve written a plugin for PubTal which takes AbiWord documents, converts them to HTML markup, and then publishes them using PubTal templates. There is still much testing to be done, but it now handles: headings, text decoration (bold, italic, underline, strikeout, overline, superscript, subscript), pre-formatted text, hyperlinks, bookmarks (anchors), bullet lists, numeric lists, footnotes/endnotes, and tables.
The biggest missing feature is the ability to include images in the content. The problem here is that AbiWord doesn’t record the original location of the image file – it just places the binary content (encoded using base64) into the XML file. I can probably live with that restriction for most pages, at least until I can find a better solution.
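One possible direction, sketched under assumptions: since the image bytes are embedded as base64 text, a plugin could decode them and write them out as separate files at publish time. The XML layout used below (a `d` element with `name` and `base64` attributes) is a simplified stand-in chosen for illustration, not a verified description of the AbiWord file format, and `extract_images` is a hypothetical helper.

```python
import base64
import xml.etree.ElementTree as ET

# Simplified, assumed layout; a real AbiWord file may differ (e.g. namespaces).
SAMPLE = '<doc><data><d name="image_0" base64="yes">aGVsbG8=</d></data></doc>'


def extract_images(xml_text):
    """Return {name: raw bytes} for every base64-encoded data item found."""
    root = ET.fromstring(xml_text)
    images = {}
    for item in root.iter("d"):
        if item.get("base64") == "yes":
            # b64decode ignores embedded newlines by default.
            images[item.get("name")] = base64.b64decode(item.text or "")
    return images


for name, payload in extract_images(SAMPLE).items():
    print(name, len(payload), "bytes")
```

The decoded bytes could then be saved next to the generated page, with the image's file name derived from the item name, which sidesteps the missing original location.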
Spam in blog comments was always inevitable because it brings two benefits to spammers:
As is clear from the discussion on Making Light, it is a losing battle to try and block comment spammers based on their IP addresses.
I’m currently thinking that there are two likely approaches to blocking this kind of spam that might stand a chance. The first approach is to show an image of a random letter in a hard to OCR font, and then ask the user to enter the letter (or series of letters) into the form with their comment. This is used on several large sites today, but I don’t know how effective it actually is.
The second approach would be to apply statistical filtering to comments in the same way as it is used for email. This approach has been very successful in reducing email spam getting into in-boxes as can be seen by the technique’s continued roll-out. It seems like an easy enough extension to apply this kind of filtering to comments in weblogs.
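As a rough illustration of what statistical filtering of comments could look like, here is a small naive Bayes style scorer. It is a sketch under stated assumptions, not the method of any particular mail filter: the `CommentFilter` class is invented for this example, it uses add-one smoothing, and it ignores the prior probability of spam.

```python
import math
import re
from collections import Counter


class CommentFilter:
    """Tiny word-frequency classifier for comments (illustrative only)."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.totals = {"spam": 0, "ham": 0}

    @staticmethod
    def tokens(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        """Record the words of a comment already labelled 'spam' or 'ham'."""
        for word in self.tokens(text):
            self.words[label][word] += 1
            self.totals[label] += 1

    def score(self, text):
        """Return log-odds of spam; positive means the comment looks spam-like."""
        vocab = len(set(self.words["spam"]) | set(self.words["ham"])) or 1
        log_odds = 0.0
        for word in self.tokens(text):
            # Add-one smoothing so unseen words never produce a zero probability.
            p_spam = (self.words["spam"][word] + 1) / (self.totals["spam"] + vocab)
            p_ham = (self.words["ham"][word] + 1) / (self.totals["ham"] + vocab)
            log_odds += math.log(p_spam / p_ham)
        return log_odds


comment_filter = CommentFilter()
comment_filter.train("cheap pills buy now", "spam")
comment_filter.train("buy cheap watches", "spam")
comment_filter.train("great post about templates", "ham")
comment_filter.train("I agree with your post", "ham")
print(comment_filter.score("buy cheap pills") > 0)
```

A weblog could hold comments scoring above some threshold for moderation rather than deleting them, so that the author's corrections feed back into the training data.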
I’m sure we’ll hear a lot more about weblog comment spam as time goes on.
While it doesn’t strike me as strange that Germany bans heavy lorries from its roads on Sundays, it does seem strange that the government is working hard to maintain this ban. Germany is struggling economically and the government has accepted that it needs to reform labour markets. Yet when presented with a politically easy opportunity to remove an obstacle to growth such as this, it fights to keep it.
In other (rather older) news there’s tax competition at work in Denmark, where tax on alcohol has been reduced significantly. This is in an attempt to reduce the amount of booze bought in the rest of the EU and (legally) imported. We can hold out hope that similar pressure will eventually cap the tax we see on alcohol in the UK as well. (In an ideologically inconsistent fashion I don’t care how high tax on cigarettes gets!)
We could also do with some price competition here in Ontario, where the government run monopoly keeps prices significantly higher than the UK (e.g. £3 a pint!).
Until I moved to Canada I had never really considered the question of when someone should be allowed to vote, and when they shouldn’t. When you are a citizen of a country, and almost all of the people you know are also citizens, the question of eligibility does not arise.
The issue is of particular importance in Latvia because 21% of Latvian residents are not citizens, and they are currently excluded from all elections including local elections. It seems clear to me that when nearly a quarter of the permanent residents in a country are disenfranchised in this way, something needs to change.
As a Brit in Canada I can’t vote in any Canadian election whether National or Provincial. If I “landed” (i.e. became a permanent resident) then I would still be excluded from voting, regardless of how long I lived here, unless I took up Canadian citizenship. Conversely I’m eligible to vote in the UK despite being out of the country for the last 3 years.
In the EU any EU Citizen is allowed to vote (even stand for office) at the local level if they are a resident. This logic hasn’t been extended to voting in national elections, and I doubt it will be any time soon.
With the increased mobility (particularly in Europe) of people between countries I would like to see the right to vote being tied to permanent residency. The latest EU directive on freedom of movement will bring an immediate right to permanent residency for EU citizens after 5 years in a member state. This seems to me like an appropriate length of time before someone is able to make an informed electoral choice in a country.
PubTal has had many changes made. It now supports XHTML, has a simpler configuration syntax, more content types, and better character set support.
Thanks to Florian Schulze for all of the patches and ideas!
Coding web pages is difficult. It has been difficult from the start of the web and has, in some respects, become harder as time has gone on and the technologies involved have grown. The preferred approach to making web site design easier used to be WYSIWYG (what you see is what you get), the idea being that Desktop Publishing was easy for anyone to do, so why shouldn’t web page publishing be the same way?
It is easy to denounce the WYSIWYG approach because of the poor quality HTML that it tends to generate, but this is to ignore its biggest flaw. The problem with using WYSIWYG design is not that the resulting code is a mess, but rather that the result of the design is a page.
The problem with building a web page is that at some point you will want to change the content of that page. Maybe you need to change your contact details that are at the bottom of the page. Maybe the site navigation bar down the side now needs another entry. Or it could simply be time to abandon the dark-purple on black colour scheme that looked so good when you first decided that you had something worth putting on the web.
Regardless of the motivation for wanting to update a web page there will certainly come a time when it needs to be done. If you have one page this isn’t a problem; if you have several hundred then it is. Part of the solution is to separate content from design, to keep the HTML in one place so that changes can be made once. This solution has been known for a long time and yet it has not been a technique that many had access to.
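A minimal sketch of the idea, using Python's standard `string.Template` rather than any real publishing tool: the HTML lives in one template, each page supplies only its content, and a site-wide detail such as the footer is changed in exactly one place. The `PAGE` template and `build_site` function are invented for this illustration, and the sketch does no HTML escaping.

```python
from string import Template

# The design: one template shared by every page on the site.
PAGE = Template(
    "<html><head><title>$title</title></head>"
    "<body><h1>$title</h1>$body<p>$footer</p></body></html>"
)


def build_site(pages, footer):
    """Render every page's content through the single shared template."""
    return {
        name: PAGE.substitute(title=page["title"], body=page["body"], footer=footer)
        for name, page in pages.items()
    }


pages = {
    "index.html": {"title": "Home", "body": "<p>Welcome.</p>"},
    "about.html": {"title": "About", "body": "<p>Some history.</p>"},
}

# Changing the contact details means editing one string and rebuilding.
site = build_site(pages, footer="Contact: someone at example.com")
print(site["about.html"])
```

With several hundred pages the rebuild is still a single step, which is the whole point of keeping the HTML in one place.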
The rise of blogging tools has brought this powerful technique to many, at least for journal style web pages such as this. Blogging tools have made the process of publishing on the web easy enough that almost any web reader can now become a web writer, should they choose to do so. There are still however many further improvements that can be made to make the task of publishing on the web easier. As Felix Salmon explains in today’s post, altering the templates of such blogging tools requires a significant technical ability. My own contribution to the ease of web publication, PubTal, certainly requires users to be able to code in HTML in order to generate their own templates.
I think the problem of web page template design can be solved by allowing users to work with components that fit together to form templates. Components can then be designed and built by those who know, or are willing to learn, the technologies behind them. Meanwhile users can mix-and-match components to form individual designs. Here’s an example of how this might work:
Using the scheme outlined here a GUI tool could be developed that allows for easy template design using the drag-n-drop of components. With components being distributed over the ‘net there would soon be a huge variety of template designs possible, without any of the problems of normal WYSIWYG design. The underlying technologies required to develop a system such as this are already in place; it’s just a matter of writing the tools to use them (no small task).
There are at least two other problems with the current crop of web publication tools that I’ve not written about yet: markup of the content, and the handling of non-journal style pages. That’ll have to wait for another day.
I’m working on some enhancements to PubTal at the moment, and so far it has been surprisingly easy. The next version will require updates to existing configuration files because I have consolidated a number of the configuration directives.
New features I’ve been able to implement include:
The next challenge will be adding the ability to specify an extra plugin directory, and an option to suppress output of the XML declaration for XHTML files (working around a CSS bug in IE 6).
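For the second of these, the transformation itself is small. IE 6 is known to drop into its quirks rendering mode when anything, including the XML declaration, precedes the DOCTYPE, so removing the declaration from the generated page avoids the problem. The sketch below is a hypothetical helper, not PubTal's actual implementation or option name.

```python
import re

# Matches an XML declaration (and surrounding whitespace) at the very start.
XML_DECLARATION = re.compile(r"^\s*<\?xml[^>]*\?>\s*")


def strip_xml_declaration(page):
    """Return the page without a leading XML declaration, if it has one."""
    return XML_DECLARATION.sub("", page, count=1)


page = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
    "<html>...</html>"
)
print(strip_xml_declaration(page).splitlines()[0])
```

Without the declaration the character encoding is no longer stated in the file itself, so the page would need to be served with an explicit charset or written in UTF-8.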
Email: colin at owlfish.com