I wrote TALAggregator because existing RSS aggregators lacked a few features I was looking for, and because RSS aggregators are fairly easy to write. Here are the main features I wanted:
- Template driven web output. I wanted to be able to integrate the output from the RSS aggregator into any web page.
- Multi-user. I wanted each user to be able to have as many templates as they wanted, and for the aggregator to share feeds across users to reduce load and bandwidth usage.
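- Last-Modified support. Using the Last-Modified header means unchanged feeds are not downloaded again, which reduces the workload all round, for the aggregator and for the servers publishing the feeds. A must-have.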
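As a rough illustration of the Last-Modified point, here is a minimal Python sketch of a conditional fetch. It is not TALAggregator's actual code: the function names and the `opener` parameter are assumptions made for the example, and it only shows the general idea of sending `If-Modified-Since` and treating a 304 response as "nothing new".

```python
import urllib.error
import urllib.request


def build_request(url, last_modified=None):
    """Build a GET request, adding If-Modified-Since when a value is stored."""
    request = urllib.request.Request(url)
    if last_modified:
        request.add_header("If-Modified-Since", last_modified)
    return request


def fetch_if_changed(url, last_modified=None, opener=urllib.request.urlopen):
    """Return (body, last_modified); body is None when the feed is unchanged.

    `opener` is a hypothetical hook so the function can be exercised
    without a network connection; by default it is urllib's urlopen.
    """
    request = build_request(url, last_modified)
    try:
        with opener(request) as response:
            return response.read(), response.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Not Modified: keep the stored timestamp, skip parsing entirely.
            return None, last_modified
        raise
```

Because feeds are shared across users, one stored Last-Modified value per feed would be enough to save the download for everyone subscribed to it.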
Once I started writing my aggregator I found that there were several other features that were desirable, so I added them in:
- HTML Handling. I wanted the templates to be able to include either the text as delivered by the RSS feed (potentially including HTML), or to remove the markup and keep the text.
- Bad RSS. Not all RSS is good, so I added support to try to parse bad RSS.
- Mozilla sidebar. Keeping the latest articles in the Mozilla sidebar is a great way to keep up to date, and was trivial to implement with my template system.
- Updated articles. I found several feeds whose <item> elements were updated after first publication. This meant they would appear several times in the new articles list, so I developed a fingerprinting technique for determining whether an item is a new one or an updated one.
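To give a feel for what fingerprinting an item can look like, here is a small Python sketch. It is an assumption-laden illustration, not the technique TALAggregator actually uses: the choice of fields (link or title for identity, title plus description for content), the SHA-1 digest, and the names `classify_item` and `seen` are all invented for the example.

```python
import hashlib


def _digest(*parts):
    """Hash a sequence of strings into one hex fingerprint."""
    joined = "\x00".join(part.strip() for part in parts)
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


def classify_item(seen, feed_url, item):
    """Classify an item as 'new', 'updated' or 'unchanged'.

    `seen` maps an identity fingerprint to the last content fingerprint.
    `item` is a plain dict with optional 'title', 'link' and 'description'.
    """
    # Identity: which article is this? (assumed: the link, else the title)
    identity = _digest(feed_url, item.get("link") or item.get("title", ""))
    # Content: what does the article currently say?
    content = _digest(item.get("title", ""), item.get("description", ""))

    previous = seen.get(identity)
    seen[identity] = content
    if previous is None:
        return "new"
    return "unchanged" if previous == content else "updated"
```

With something along these lines, an edited item keeps its identity fingerprint and only its content fingerprint changes, so it can be shown as updated rather than listed again as a new article.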
A few other things that are implementation notes rather than features:
- Performance. Aggregation performance depends almost entirely on network speed. Retrieving, parsing, updating, and inserting new articles from a local feed takes about 0.1 seconds.
- Times. All times are converted to GMT and stored as GMT. I'll add support for users to specify a GMT time offset so that all articles can appear dated in either GMT or local time.
- Unicode support. All data is converted to Unicode before storage. Unfortunately MySQL doesn't yet support Unicode, so the data then gets converted to the character set of the database.
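The time and character set notes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the aggregator's real code: the function names are made up, dates are assumed to arrive in the RFC 822 style used by RSS, a date without a usable zone is assumed to be GMT already, and falling back to numeric character references for characters the database character set cannot hold is one possible policy among several.

```python
from datetime import timedelta, timezone
from email.utils import parsedate_to_datetime


def to_gmt(rfc822_date):
    """Parse an RFC 822 style date and normalise it to GMT (UTC)."""
    parsed = parsedate_to_datetime(rfc822_date)
    if parsed.tzinfo is None:
        # Assumption: a date with no usable zone is treated as GMT.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def for_user(gmt_time, offset_hours=0):
    """Shift a stored GMT time by a user-chosen offset for display."""
    return gmt_time.astimezone(timezone(timedelta(hours=offset_hours)))


def encode_for_database(text, charset="latin-1"):
    """Encode Unicode text into the database character set.

    Characters the charset cannot represent become numeric character
    references (an assumed policy, chosen so no data is silently lost).
    """
    return text.encode(charset, errors="xmlcharrefreplace")
```

For example, `to_gmt("Tue, 10 Jun 2003 09:41:01 +0200")` gives 07:41:01 GMT, and `for_user(..., -5)` on that value displays it as 02:41:01 for a user five hours behind GMT.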