Archives and technology

November 1, 2005

One thing I didn’t explicitly mention when discussing the BBC Archive catalogue project was the technology that’s being used to produce it.

“It’s all run with Ruby on Rails. Yes, the BBC have allowed me (after some persuasion) to rapidly prototype and deploy this 7,000,000-row database-backed site in everyone’s new favourite web framework. This first version is really just a prototype; wisely, the BBC have decided to get it out there quick and see the public reaction.” – Matt Biddulph

Rails and, to a far lesser extent, Django are among the fast, elegant web frameworks powering many of the new web applications and Web 2.0 startups that are mushrooming into view. It will be interesting to see how well Rails scales to meet the heavy demand the BBC Archive site is likely to attract. I’d also be interested to see how they are approaching the database end, and which database they’re using.

12 reasons not to use Microsoft

Microsoft’s Scoble has just outlined the reasons he thinks people don’t use Microsoft technologies for these types of web projects.

Another reason I would add is the extra administration cost, in particular the patch management needed to keep up with a constant stream of security fixes. After every significant patch you need to check that your web application still works. Sometimes it doesn’t, which means unproductive work just to get back to where you were before the patch. Either way, more time goes into testing. All in all, that is a lot of time, effort and money spent with no real payoff for your users.

Text Indexing

I have begun researching the state of full-text indexing support in databases, a feature that’s very useful in projects such as archive catalogues.
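To make concrete what a full-text index does for a catalogue search, here is a minimal inverted-index sketch in Python. It is an illustration under simplifying assumptions (lowercase alphanumeric tokens, no stemming, no stop words, every query word required) and is not how SQL Server, Oracle or any other database actually implements its text engine. The names `tokenize` and `InvertedIndex` and the sample record titles are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase the text and keep runs of ASCII letters and digits as terms
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Hypothetical minimal full-text index: maps each term to record ids."""

    def __init__(self) -> None:
        # term -> set of ids of the records containing that term
        self.postings: dict[str, set[int]] = defaultdict(set)

    def add(self, record_id: int, text: str) -> None:
        # Index every term of the record's text under its id
        for term in tokenize(text):
            self.postings[term].add(record_id)

    def search(self, query: str) -> set[int]:
        # AND semantics: return only records that contain every query term
        terms = tokenize(query)
        if not terms:
            return set()
        # Copy the first posting set so intersecting never mutates the index
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


if __name__ == "__main__":
    index = InvertedIndex()
    index.add(1, "Evening news bulletin")
    index.add(2, "Nature documentary: the ocean")
    index.add(3, "Late news bulletin")
    print(sorted(index.search("news bulletin")))  # [1, 3]
```

The point of the structure is that a query looks up a handful of precomputed term lists instead of scanning every row, which is what makes searching a catalogue of millions of records practical. A real engine also has to decide when and where those lists are rebuilt, which is exactly the burden the quote below complains about.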

Given that many organisations in NZ have de facto Microsoft technology policies, I started by looking at what Microsoft’s SQL Server now offers. The last time I looked it was pretty poor compared to Oracle, but I expected considerable progress since then and hoped to find that SQL Server now rivalled Oracle’s capabilities.

I was somewhat surprised by what one particularly well-respected former Microsoft product manager had to say about the current state of SQL Server’s text indexing.

“SQL Server 2000, even though it technically has a full-text search feature, actually has a very-badly grafted-on full-text engine that is poorly integrated, slow, unreliable, and assumes that programmers have nothing better to do than think about when the indexes are built and where they are stored. In production, the full text engine grafted onto SQL Server 2000 falls down all the time.” – Joel Spolsky

Doesn’t sound promising.