Shelter

Tech, philosophy and random musings


22 May, 2006 by Alex

Single-sign-on (SSO) and Service Oriented Architecture (SOA)

Today I’ll share a few thoughts and design issues when dealing with Single-sign-on (SSO) and Service Oriented Architecture (SOA) (or read John Reynolds’s SOA Elevator Pitch). SSO is a pipe-dream that’s been around since the dawn of computing: you sign into one service, and if you then enter other services under the same domain you first logged into, you’re already logged in and your session follows you. (If any of this didn’t make sense, I fear this post is not for you; it is rather technical. 🙂)

Our problem

We’re a big organisation with a diverse set of operating systems, servers and skilled people. We’ve got Solaris (servers), MacOS X, Windows and Linux (servers and users), carefully spread across both servers and users, although most users are on Windows, some on MacOS X. We have bucketloads of different services: backend servers and logging stuff, budget applications, HR systems, issue tracking tools, wikis, time reporting tools, staff directories … too many to count. I spend a significant part of each week logging into systems, some of them with different usernames and passwords.

For many years, vendors have pushed their various SSO solutions on us, most of them complicated and fragile, some better but requiring a lot of work, and a few reasonable ones. We’ve created a few minor and sub-par ones ourselves. They are all pretty expensive systems though, not necessarily from a purchase angle alone, but certainly from an implementation stand-point; lots of work needs to be put in to configure, implement and maintain these systems. Lots of people in the IT industry deal with SSO as their prime job.

SSO systems usually try to handle the problem of user identity, or co-operate with other systems such as LDAP and X.500, or pure authentication systems such as Radius or even Kerberos ticketing. Then applications themselves store bits of stuff in their local sessions, keep some user information in their local databases, synchronise some of that out, but mostly keep it to themselves. There are lots of problems here, so let’s talk about what I’d like to see them do.

A better system

Here’s what I would want from a better system:

  • Web Services API
  • User identity management
  • Roles and groups management
  • Profile management
  • Session handling

Most a) SSO, b) user management and c) session management systems are either just one of these three, or are too tied to some technology (Windows only, or Java only, or LDAP only, etc). We need one that does all of this, elegantly and simply, and through web services. Notice that web services is the first point on that list; if it ain’t web services, it’s not a solution.

A design I’m considering with my colleagues is a simple database system with users and groups, a default profile attached to each user, a default session data blob, a timer mechanism, and the ability to add application-specific data blobs over time (using the same XML schemas). The only interface into these goodies is through a web service: REST or SOAP in, and a generic XML schema (Topic Maps based) out (or embedded in SOAP out).
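
To make that concrete, here’s a minimal sketch of what one such exchange might look like; the endpoint, parameters and schema below are invented for illustration, not a finished design:

GET http://ws.example.com/session/get?user=alexj&app=wiki

<session user="alexj" app="wiki">
<token expires="2006-05-22T14:00:00">a1b2c3d4e5</token>
<profile>
<name>Alexander Johannesen</name>
<group>developers</group>
</profile>
</session>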

By doing it this way, any system in the future is technology-agnostic outside the world of web services; we’re not tied to Java, Windows, LDAP, whatever. It’s very easy to implement in existing applications (even applications that never anticipated being part of a larger system such as this), partly by removing complex code (code that does user management, session handling, and possibly some degree of SSO; out with it, and in with web services instead) but also because all of our platforms know XML in such a basic form.

Now, since this is SOA, it becomes apparent that there are a great many opportunities for innovation here, especially within rapid prototyping and testing out various functionality, mixing in experimental services and so forth; we can create simple PHP scripts to try out an idea, hack some Perl to discover some new semantics, use Ruby to put up exciting new applications, or chuck stuff into Lucene without worrying about what technology the data is coming from. It also helps with scalability and performance issues; smaller bits are easier to move around than large ones, and these issues can now be handled on the network level instead of within your chosen development technology (instead of designing an application to handle distributed transactions, you split the transaction further up the pipe and design your application more simply; less complex code to worry about).

Finally, we’re looking at reusing OSUser from Atlassian (they’re working on the next generation of their user-management module, called AtlassianUser, but they’re difficult to squeeze info out of; will it be open-source, will it be available to others, when is it due, etc?), but if you know of alternatives, please let me know.

Filed Under: Uncategorized

19 May, 2006 by Alex

The importance of user-interfaces

All through my computer-infused life I’ve struggled with user-interfaces, and I’m pretty darn sure I’m not the only one.

The funny part about talking about the importance of a good user-interface is that we all know the importance of a good user-interface, yet when it comes down to it, it is the part of our systems that gets the least priority! From start to finish we talk about business requirements, functional prototypes and user acceptance testing (the “user” here not being an end-user, but usually the owner of the project). Only if money is left over or we’ve got too much time on our hands are things like information architecture, usability testing, persona pathways, participation and interaction design sparsely interspersed, usually by the wrong people, in a haphazard way. Why is that?

I’m a technologist and a geek; I’ve been doing technical and functional stuff all my life, yet I’d fight you to the death to do user-centred design, usability testing in iterative lumps and be a bit creative about the information architecture before you try to nail it down in a strict taxonomy!

Ease up, people! We humans have a great sense of order in things; we classify, sort and think about their placement. On the other hand, we are also forgiving, fuzzy creatures. Why are we mostly developing systems that cater to the first trait while ignoring the second? It is that crazy combination of the two that makes us humans work the way we work! In other words, why are we creating systems that work against human nature?

I’m annoyingly baffled, in an unsurprised way.

Filed Under: Uncategorized

17 May, 2006 by Alex

Not this time

As some of you might know, I was planning to move from Canberra, Australia back to Oslo, Norway around August this year; I’ve had interest from two successful consultancy companies (the only two I’ve bothered contacting, too). It seems now that both have come to nothing:

The smaller company can’t get past their meet-in-person interviewing practice (meaning I’d need to go to Norway for two interviews before they’ll truly hire me, but I’m currently a public servant and as such don’t have that kind of jet-set budget), and the second company, my old company in fact, hasn’t replied to me (nor to my chase mails) for a couple of months after proclaiming serious interest. Not sure what happened there, but maybe I jogged their memory too hard? 🙂

I’ve also had a few other good leads to jobs elsewhere, but somehow each and every one of them has resulted in interest but no practical solution (some can’t pay “enough” [family to feed], some are abroad and there are visa issues, some have found similar skillsets closer to home, etc). Maybe I need to address larger companies who can afford me?

I guess I’m stuck in Canberra for now, in a place where 80% of all jobs are inaccessible to me because I’m not an Australian citizen, unless I go contracting … which I don’t feel comfortable doing as I’m the sole provider for my family and my network here is poor (I’ve only been here two years).

Now, it’s not that my current job is so bad, but I feel the time to move on and do great things has come. Yes, there’s the option of doing great things where I’m at, but I’ve struggled with this place’s idea of “innovation” and resource priorities of late. Oh well, I’ll grit my teeth and we’ll see what happens next.

Filed Under: Uncategorized

15 May, 2006 by Alex

A few thoughts on my online communication

As you may have noticed, I have been on a rather extended hiatus in regards to blogging. I thought I should quickly summarise my absence:

1. I was rambling too much about my miseries, too much for my own comfort. Yes, work has been trying, so here’s the quick summary of that; the public service is a slow bitch when you come from a commercial high-flying world. I’ve had to learn to deal with this better, because, in this town, 80% of all jobs are government, and for the moment I seem to be stuck here. I also thought that there would be companies out there who would see my wonderful CV and snatch me up before anybody else would, but there were errors in my plan (see next point).

2. Everyone is a frigging expert while I’m starting to sound a bit like a wannabe jerk. Not a good sound at all, so my writing will surely be toned down a notch, and my expertise adjusted. I feel a bit stupid, really, but still haven’t figured out if I’m behind or ahead of current thinking on a number of issues. Report at 11.

3. Nothing is surprising anymore, and everyone is a blogger. Everyone writes about a number of “new” things, but seriously, none of them really are. Everything is a regurgitation of other ideas, and I simply don’t get surprised anymore. I don’t feel there is anything important to write about, no matter how wrong that might be. (See point 2 above)

4. Communication was failing me; after 17 years in the IT industry I came to a stand-still in communication, be it with friends, family or people I know around the world. I kept responding to things, but the number of replies was decreasing. I’ve felt the dreaded “am I missing emails?” syndrome, another bad state to be in, when emails and blog comments hold higher importance than real life. I’ve adjusted the importance I place on the online world accordingly, and no longer rely on online friendships for self-realisation.

So I feel a bit fresher and wiser; I’m a bit humbler, properly embarrassed, and will probably be a bit gentler in my online approach.

Filed Under: Uncategorized

15 May, 2006 by Alex

Wiki as a KM and PM tool

I posted a comment to Denham Grey’s blog about capturing corporate knowledge, from which two people have asked me to say more. Two people asking for that qualifies, in my book, for a blog post, so here goes:

First, the two acronyms; KM (Knowledge Management) is a cauldron which contains many things (processes, methods, systems, software, etc) that try to manage (meaning collect, store, find, repurpose and change) “knowledge”. PM (Project Management) is that crazy category of “things we do to do things on time and within budget.”

Right then, a Wiki is basically a really simplified, web-based, page-driven “anyone can make edits to a page” system, but instead of wasting my time rewriting what’s been said before, here’s the world’s best Wiki explaining itself. I’ve been doing Wikis since 1997 (two years after they were ‘invented’), so I’ve been doing them for quite a while now, seeing them grow and flourish.

Knowledge What?

First of all, let me just state that I don’t believe in Knowledge Management. I do have some hope in Knowledge Representation Management at least; the difference between the two is the realisation that “knowledge” is a human thing that computers don’t have and don’t handle right now, not in the next few years, possibly not until Quantum Computing and serious AI systems take off, possibly long after I’ve passed away, and that the only thing they can do, and do well, is represent little bits of information. The current thinking that we’re on that golden path to “Knowledge” in computers is what brought us all that ontology noise and semantic web porn, but I’ll leave that rant for another day.

The goal of KM is a worthwhile thing though, and we use a variety of systems, methods, tricks, software and sanity to trick ourselves into believing we’ve got a good grasp on the concept of “we’re doing KM.” For most people it involves some kind of intranet in the shape of a Content Management System, possibly with a few KM features bolted on, and perhaps some records management and customer relationship management system. So, we can list a few good acronyms here; CMS, CRM, KMS, RMS. You can google them if you like; hours of fun reading, if you’re a masochist.

In short, most of these systems are huge databases with an underlying data-model that tries to do what it says on the respective tin. A popular game with enterprise management is to buy one system for each component of your enterprise: one for taxes, one for the website, one for the intranet, one for customer relations, one for finance, one for leave and pay, another for filling in your hours, one for the helpdesk, one for systems support, one for deployment and / or configuration, one for holding your hand, another for wiping your bottom, and so forth.

So the first obvious problem with all this is of course that there are many of them! And all sporting their own unique way of doing things! With their own unique user-interface! Most of them use some proprietary user-management module, which results in you having about 5 usernames and passwords just to get through a normal week.

One can argue that all these systems combined surely hold a bit of the corporate knowledge, and quite possibly, if you merged all those data-models and interfaces and methods and ways of reporting, we might have a pretty good Knowledge Representation System … provided, of course, that you knew all those data-models by heart, the user-interface was far smarter than you, and everybody in the world was working towards making you a happy human being in tune with the universe.

I’ve seen some pretty complex enterprise setups in my life, and I’ll swear that no one – no one! – has ever come close to capturing knowledge (in representation-form or otherwise) with this one-system-for-every-part nonsense. The proof is in the pudding, and I’ve yet to find a pudding that tastes wonderful, is good in shape and form, looks pleasing and leaves me feeling satisfied after use.

What’s a document?

It’s a good question; what’s a document? A Word document? A meeting invitation in Outlook? A mail? A picture? A diagram? A todo list? Meeting minutes? A draft of a specification? A combination of many things? An atomic unit?

People’s notions of what a “document” is are quite varied; do you mean a document on a company, by the company, for the company, is it about fish, a todo list for fishermen, a complaint about our smell of fish, a fishy document … what is it? In my book, chasing the “document” paradigm seems like a worthwhile thing not to do, because a “document” often represents some finished work, a piece we can fit into our KM machinery. (In rare circumstances we refer to draft documents, which really are drafts, before we treat them again as a produced document.)

Instead, let’s work with something that has proven itself to work quite well; a web page. It has proven itself over the last decade to be a very good spot for information, especially for changing information. Web pages change all the time.

The Wiki way

The Wiki is a changing web page about something, anything. So instead of creating a document about “Fisheries” you make a Wiki page about “Fisheries”. Instead of using a special tool (like a word processor), you use the browser directly. Instead of saving it locally through drafts (my_doc_v1.doc, my_doc_v2.doc), sharing it over email (my_doc_v1.doc, my_doc_v2.doc, my_doc_v3.doc … uh, who’s making changes to what document?!), getting it back for more edits (my_doc_v5.doc, my_doc_v6.doc, oops! my_doc_v5.5.doc, my_doc_v7.doc), and uploading it through the intranet thingy (my_doc_v2.html, using the most abysmal HTML known to man) … instead of all that, you simply go to the page, click an edit button, make some changes, click the save button, and you’re done. Everybody can edit and save all pages; no need to share it around as it is naturally shareable.

Ok, so let’s assume we all know the simplicity of this model. What’s stopping us from dealing with almost all of those KM tools in a Wiki way? What stops you from setting up a page about yourself with a picture of you, your contact details, where you fit in the organisation, what you do, how you do it, what your hobbies are and what other extraordinary skills you’ve got? What stops you from setting up a page about a project? With links to documentation of various kinds? What stops you from setting up a page with your hours on it?

The answer to a lot of those questions is mostly “you can’t mine and reuse the data for other purposes”, again referring us back to the KM machinery. But that’s just where things are about to change, and in big ways. Do you really need everything to be in a highly-structured database? I mean, seriously, I know you want to use that data, mine it, sort it and report on it, but do we really do it? And if we really do it, does it matter if the data comes from a database of fields or a database of pages?

Most good Wiki engines support different ways of taking your input and converting it into something more useful for computer processing, either through crude file export or more sophisticated Web Services APIs. The latter is what I’ve done, with huge success.

Web Services

A page in a Wiki system is usually stored internally in a loosely structured way, often in something known as Wiki markup; it consists of plain vanilla text that is given some special meaning, so that “it was *the dog* who ate it” is displayed with “the dog” in bold.

Here’s a better example, a page called “DimwitProject_HoursWeek52_AlexJ”:

Dimwit Project
————–
Application design : 20h
Usability study : 9h
XML schema work : 3h

It isn’t hard to find or write a little parser (a lot of Wikis have lots of these out of the box) that can convert the above to:

<hours>
<title>Dimwit Project</title>
<item type="Application design" duration="20h" />
<item type="Usability study" duration="9h" />
<item type="XML schema work" duration="3h" />
</hours>
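
As a rough illustration, here’s what such a parser could look like in PHP; the function name and the exact markup conventions are mine for this example, not something a particular Wiki ships with:

<?php
// Convert "label : Nh" lines of Wiki markup into the <hours> XML above.
function hours_to_xml( $wikitext ) {
    $lines = explode( "\n", trim( $wikitext ) );
    $title = trim( array_shift( $lines ) );  // the first line is the title
    $xml   = "<hours>\n<title>" . htmlspecialchars( $title ) . "</title>\n";
    foreach ( $lines as $line ) {
        // skip the underline and anything that isn't a "type : duration" pair
        if ( preg_match( '/^-+$/', trim( $line ) ) || strpos( $line, ':' ) === false ) {
            continue;
        }
        list( $type, $duration ) = array_map( 'trim', explode( ':', $line, 2 ) );
        $xml .= '<item type="' . htmlspecialchars( $type )
              . '" duration="' . htmlspecialchars( $duration ) . '" />' . "\n";
    }
    return $xml . '</hours>';
}

echo hours_to_xml( "Dimwit Project\n--------------\nApplication design : 20h\nUsability study : 9h\nXML schema work : 3h" );
?>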

The road from the Wiki to the rest of the KM machinery gets a lot shorter as these systems increasingly utilise web services; in fact, you can use the Wiki as the interface to almost all of it.

At work these days we’re using an enterprise Wiki system called Confluence that has both SOAP and XML-RPC web services available, and I’ve created parsers and scripts that basically allow me to use Confluence as a Wiki interface into a number of services. What happens then?
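
For a flavour of what talking to it looks like, here’s a sketch in PHP against Confluence’s XML-RPC interface; the host and credentials are made up, and you should check your own instance’s documentation for the exact method names and signatures:

<?php
// A sketch: fetch the raw markup of a page from Confluence over XML-RPC.
// Requires PHP's xmlrpc extension; endpoint and login details are invented.
function confluence_call( $method, $params ) {
    $request = xmlrpc_encode_request( "confluence1.$method", $params );
    $context = stream_context_create( array( 'http' => array(
        'method'  => 'POST',
        'header'  => "Content-Type: text/xml\r\n",
        'content' => $request,
    ) ) );
    $response = file_get_contents( 'http://wiki.work.com/rpc/xmlrpc', false, $context );
    return xmlrpc_decode( $response );
}

$token = confluence_call( 'login', array( 'alexj', 'secret' ) );
$page  = confluence_call( 'getPage', array( $token, 'DIMWIT', 'DimwitProject_HoursWeek52_AlexJ' ) );
echo $page['content'];  // raw Wiki markup, ready for a parser like the one above
?>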

Well, first of all you get one point of origin for most of your normal processes, the very pipe-dream that portal systems dream of, only in portals you’re at the whim of developers creating user-interfaces and systems in particular ways for it to really work. In the Wiki, you’re already familiar with the interface, and, perhaps more importantly, it is within the page-paradigm, which is easy to bookmark, easy to reference, easy to modify, easy to remember, and easy to search. And if there are some things portal systems suck at, it is pretty much all of the things just listed. And the Wiki has a distinct Google advantage: free-text parsing and linking that can convey importance much better than most metadata can! (I know; bold statement, solely based on the tremendous success Google has shown us in this area)

Second, because of the simplicity of adding and editing data, the information stays fresher. If you only allow people to use the Wiki for most things, the Wiki’s content will be fresher still. Instead of John using Word to write down the minutes of a meeting, make him do it straight in the Wiki. Instead of letting Doreen write a draft letter to the fish caterer in Word, let her do it in the Wiki. Instead of adding a meeting to Sonja’s schedule in Outlook (where perhaps a few know how to properly use that information), just put it on her Wiki page.

Third, and this bit is somewhat philosophical and possibly psychological: an open space for all to work in a) helps people understand what others are doing and what they’re working on, b) helps generate an atmosphere of less secrecy, and c) promotes a less rigid structure for live information (there are no longer just draft and published documents; they are all living, changing all the time).

Here’s what I do

First of all, I created a really simple URL for people to go to, such as wiki.work.com or work.com/wiki. (I’d recommend that you create a few shortcuts as well, so that wiki.work.com/project/MyProject is the same as wiki.work.com/projectMyProject, as this gives the impression of structured data.)

For project management, every project has a starting Wiki page. On it, I first write a) what the project is about and for (divided into 1. the problem and 2. the solution), b) where we’re up to, and c) where you can find some current representation of what we’re doing (an application in test, a document, a draft design guide, whatever we can prove our existence through). Then I write who’s involved in the project: stakeholders, developers, watchers, and all of these have their own pages, which are Wiki pages themselves. Finally, I have a separate documentation page with links to all our various documentation, all Wiki pages.

If we have a Word document, it will immediately be Wikified and deleted from the offender’s PC. This is important; delete all Word (or other proprietary formats) as soon as you possibly can; if the Wiki is to work for us, we must work with it. This is probably the hardest transition for most people at first, but after a short while they’ll never look back. 🙂

Once a day I update the front page with status information. I usually do it at a specific time every day, like 1pm. For every important bit of information I might add a comment on progress and possible resolution. Once in a while I create a Gantt chart (because some people can’t live without them) which I’ll attach to the Wiki page and link to. If I can’t give people a good overview of where we’re up to on that front page, I doubt any other PM software would do a better job.

Each piece of documentation is a separate page, linked from the documentation page. It doesn’t matter if this documentation page gets long; group the links reasonably well and label them, and people will have no problem finding things. If I need to write a report, I create a page that’s a sub-page of the project reports page. You can almost never have enough pages, and you certainly will never run out of them.

Some Wikis support structured pages (and our Confluence does just that) where you can create sub-pages of a page, and that structure can automatically be called upon in terms of navigation, display and organisation. Use this wisely. Some Wikis also support things like tags, blogging, WYSIWYG editors, sub-Wikis, etc, and all this will help you out in creating a good intranet.

Some pages are worth republishing, and this is done by taking the page name and pushing it through a simple PHP script I’ve got that fetches the page content through web services and displays it on our various other webs. Over time this will probably run the whole website, but currently there are assorted pages done this way, and I’m working on making all news / newsletters done this way, repurposing bits of news. (Our Confluence supports various blogging paradigms, and creating and reusing newsfeeds from pages / Wiki blogs is easy.)
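
Such a republishing script doesn’t have to be more than a few lines. A sketch, reusing the hypothetical confluence_call() helper from above (the space and page parameters are invented):

<?php
// republish.php?space=DIMWIT&page=News — display a Wiki page on another web.
require 'confluence_call.php';  // the XML-RPC helper sketched earlier

$token = confluence_call( 'login', array( 'alexj', 'secret' ) );
$page  = confluence_call( 'getPage', array( $token, $_GET['space'], $_GET['page'] ) );

echo '<html><body><h1>' . htmlspecialchars( $page['title'] ) . '</h1>'
   . '<pre>' . htmlspecialchars( $page['content'] ) . '</pre></body></html>';
?>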

Some pages are reports in themselves, sporting simple Wiki macros that take information from various places and create a summary page (which is the report itself). If your Wiki markup is well-structured, creating quite sophisticated reports is easy. For example, I can create an automatic page that is a monthly and / or yearly summary of all my hours spent, using the Wiki markup I described earlier.
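
As a hypothetical example of that last report, summing the hours out of the <hours> XML produced earlier takes very little code:

<?php
// Sum the durations out of a set of <hours> XML documents; a sketch only.
function total_hours( $xml_pages ) {
    $total = 0;
    foreach ( $xml_pages as $xml ) {
        preg_match_all( '/duration="(\d+)h"/', $xml, $matches );
        foreach ( $matches[1] as $h ) {
            $total += (int) $h;
        }
    }
    return $total;
}

$weeks = array(
    '<hours><title>Dimwit Project</title><item type="Application design" duration="20h" /></hours>',
    '<hours><title>Dimwit Project</title><item type="Usability study" duration="9h" /></hours>',
);
echo total_hours( $weeks ) . 'h';  // prints "29h"
?>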

Hmm, I do a lot more than this, and I might update this post as I go.

Conclusions

I suppose there are a lot of straw-men in this post, ungrounded facts and dubious claims, so I suggest you don’t take my word for it but simply try it out yourself. Start simple; install some Wiki engine, start documenting your own projects, and invite a select few to participate. See what happens. There are a number of things to be said about whether an organisation is ‘ripe’ for a Wiki approach or not. I personally have witnessed conservative, technologically-challenged folks use Wikis with ease and pleasure, but I’m sure there are counter-stories to be told as well.

Often people think of Wikis as another tool for their toolbox, but in my experience Wikis tend to work best when you remove some of those other tools; it seems to be worth the re-training and the initial frustration and scare. Just because everybody uses Office doesn’t mean it is the best tool for the job, nor does it mean you should even use the darn thing; we have a tendency to use Office for all the wrong things as well, just because it is there. Time to rethink.

Finally; spreadsheets. Yeah. Wikis can’t compete with them. Sorry. 🙂 There are however smart ways to link into the information within them (again, think web services) and reuse that information for all your Wiki pleasures.

Enjoy.

Filed Under: Uncategorized

12 May, 2006 by Alex

I’m baaaaack!

Ok, so I’ve reconsidered my life and blogging, and I’m back to fiddle some more. This is just a warning. Oh, and I’ve changed to Blogger to handle it; I just couldn’t be bothered with all that silly code anymore.

I’m redirecting my old feed https://shelter.nu/shelter.rss to https://shelter.nu/atom.xml. If any of this means anything to you, you know what to do. If not, don’t worry.

Also, I’ll be reposting some recent mails just in case. All my old posts and comments will still be available as they are, but I’ll probably just put a little message at the front gate telling people to go to https://shelter.nu/ from now on.

Wish me luck!

Filed Under: Uncategorized

12 May, 2006 by Alex

Knee-deep in SOA

Lately at work I’ve been knee-deep in SOA; Service-Oriented Architecture, a concept that for me has roots in web services but extends beyond them; it has made holistic thinking possible for me.

There is more to application design than solving the problem at hand, far beyond the scope of any requirements document; it’s more about supporting the infrastructure than whatever else you think you’re doing. Maybe this requires some explanation:

Business analysts create requirements documents to solve some business problem. However, unless the business analyst is especially sharp and holistic, there are so many undercurrents, twists and turns on the way to the final solution that more often than not the stated problem isn’t the one we should attempt to solve. A lot of places employ Solution / Systems / Information Architects to try to rectify this problem, but often it simply ain’t enough; detaching from technical solutions seems to be a huge problem when trying to understand business problems.

A lot of us are technically inclined people; we take the business requirements and make technical solutions of them. Anyone with a speck of experience in these areas knows how dreadfully wrong that can go, and we say “oh, you need to employ the right people to make it work.” Most of the time, that’s true; with really super people these things will work much better. But how often do we have the luxury of only working with top-dogs? I’ll leave that an open question, of course.

Enter the SOA; think of your business requirements as services, tiny and disjointed or large and intertwined, and open up interfaces to them, and what happens? Well, notice that word: interface. Not application. Not program. Not even requirement. It’s a service, a service that programs, applications and requirements can use to solve their problems.

We all know web services by now; most of the time it means either a SOAP or REST call where XML is the carrier of various bits of information. Because of the openness of these technologies we can quickly cook up various other applications and programs from a smørgåsbord (yeah, look it up) of services that might address your issues or wants. They are completely decoupled from the applications that use them, meaning a clear separation of business logic, application frameworks and user-interfaces. If played correctly, it can have an amazing synergetic effect on everything you do.

If you’re a geek like me, the prospect of this is great, but over the next little while I’d like to talk about all those things it affects in better business management, usability, information architecture, user-interface design, application design, application scalability and performance enhancements, and more.

To sign off though, I’d like to talk a little about what I’ve done so far. First, I’ve created a hub from which all web services come, something like http://ws.example.com/, which works either as an application context for your servlets / scripts / etc that are plain web services, or as a proxy for external services. This hub has a wrapper so that it doesn’t matter if you want to use SOAP or REST or even, partly, RSS/Atom feeds of stuff.
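
As a sketch of what that wrapper can look like (the service registry and URL scheme here are invented for illustration, not our actual setup), a small PHP front controller can hide whether a service is local or proxied:

<?php
// ws.php — a minimal hub front controller: map a service name from the
// URL to either a local script or a remote service that gets proxied.
$services = array(
    'thesaurus' => 'http://internal.work.com/thesaurus',  // remote, proxied
    'opac'      => 'local/opac.php',                      // local script
);

$name = basename( parse_url( $_SERVER['REQUEST_URI'], PHP_URL_PATH ) );
if ( !isset( $services[$name] ) ) {
    header( 'HTTP/1.0 404 Not Found' );
    exit;
}

$target = $services[$name];
if ( strpos( $target, 'http://' ) === 0 ) {
    // proxy: forward the query string and relay the XML untouched
    header( 'Content-Type: text/xml' );
    echo file_get_contents( $target . '?' . $_SERVER['QUERY_STRING'] );
} else {
    include $target;  // the local service emits its own XML
}
?>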

Next, put some good services in there, like user-authentication, and you’re half-way there to the single-sign-on pipe-dream. I’ve implemented it with some thesaurus services, authentication, an OPAC service and a wrapper for Amazon.com.

I can now, through PHP, talk to a few services that were written in Java and Perl and create a completely new application. I can pass bits of information into our OPAC and do more complex searches. As a test, we recently created a Lucene database prototype of about 11.4 million MARC records. The user-interface looked terrible, but through the SOA hub we split the requests in two (at random, fire your web service request at two different servers for load balancing), took the first record from the service and fed it to another lookup-service, fed the subject headings from this request to the thesaurus, did a third search in the Lucene prototype for the subjects that had thesaurus entries, and voila! we could present a new application with a good user-interface, all in two days from start to finish. And we really didn’t break a sweat, either.
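
That load-balancing trick is embarrassingly simple, by the way; something like this (server names invented):

<?php
// Pick one of two Lucene servers at random for each web service request.
$servers = array( 'http://lucene1.work.com', 'http://lucene2.work.com' );
$base    = $servers[ array_rand( $servers ) ];
$result  = file_get_contents( $base . '/search?q=' . urlencode( 'Monteverdi' ) );
?>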

That’s what I want to talk about; what happens once you’ve got a basic SOA in place; synergy!

Filed Under: Uncategorized

9 March, 2006 by Alex

Here is a How to Topic Maps, Sir!

It is said that people who write technical articles explaining a given something shouldn’t know all there is to know about the subject; the exercise of becoming as knowledgeable as the headline touts lets the writer feel the real pain their readers feel in comprehending the same.

It is also said that the landing on the moon really happened in a film studio in London. Some say that Elvis isn’t dead. A lot of things – as it happens – have been said about a number of things, but apparently nobody has said much about their CD collection and Topic Maps, and hence I rise forth to the tedious task of not knowing what I’m talking about, to make you and me as knowledgeable about this subject as the headline reads.

This is not a tutorial. This is an essay written to precede a tutorial I’m writing about Topic Maps and how to sort your CD collection with them, which will itself come in parts. The reason for this prelude is two-fold: philosophy vs. real life.

There is a great deal of philosophy involved in working with Topic Maps. Not in the sense of arguing for extensionalism or purporting a theory of when cats die in boxes, but in the sense of epistemology, the philosophy of knowledge. It is about how we perceive things, how human cognition works, about how we label things, how we categorise and find our way in the vast information layer between our brains and our tools.

The other reason is that this essay was easier to write. Bear with me.

Why Topic Maps?

What are Topic Maps? No, let’s start at the other end; what is a computer, and why did we design one? What need do we have for a computer? Well, the name says a lot of it; we need to do computations, and a computer these days is all about processing vast amounts of information as fast as possible. But the computer is a very logical thing, and humans are not, despite rumours that say otherwise. For a computer to work, humans must tell it what to do, and for a logical beast to behave the way a human wants it to behave, they need to come to certain compromises; the computer must act as if it were a bit more human, and the human must act a bit more logically. These compromises are also known as abstractions, and a world-wide, all-encompassing goal is to make abstractions that are as close to human nature as possible without losing that logical processing power.

We keep trying to create technological solutions that resemble human nature so that the information can be processed and handled as well as possible by both man and machine. Sometimes the abstraction happens in the user interface, where we create a cute icon for a complex computation or write the word “do” when what really happens is “do, fiddle, tweak, load, count, compute, save, tweak, squiggle, save again, spit and you’re done.” Other times the abstraction happens on the data model level, creating tables in such a way as to make human sense. Maybe the abstraction is on the hardware level. And, in fact, bits of abstractions are everywhere, from the inner CPU out through software to the keyboard you type on and the screen you’re viewing. Unfortunately, all these bits of abstractions don’t necessarily make it easy to grasp what is going on, because they are – surprise, surprise! – bits that more often than not speak their own parables, and don’t form a complete story.

Topic Maps is an abstraction that tries to bring together quite a lot of these bits, from the data model to the user interface, making an effort to tell the same story across the many layers we have in computers. And as such, it not only permeates the technical layers of “data model” and “user interface”, but also reaches the people involved in using it, from designers and developers, project managers and general management, to users and interested parties. John the developer can now speak the same language as the user, which is no small feat in itself and one that should lower the cost of miscommunication.

Topic Maps tries hard to lower the cost of miscommunication. It is a data model and accompanying exchange formats and APIs that share a common set of terms, so when the user speaks about a bad association role, we all know – from the content producer that put it there, to the developer that implements the functions to do it, to the manager that handles the user’s request – exactly what that is.

For many, this is music to their ears.

Music

What is music? A simple question we all know the instinctive answer to, but often fail to formalise. One much used statement is that music is “the art of arranging sounds in time so as to produce a continuous, unified, and evocative composition, as through melody, harmony, rhythm, and timbre.” [Dictionary.com] Now, as much fun as we could have going through the academics of music, let’s just note for the record that music ain’t as straightforward to classify as many would like it to be.

We could go to Amazon.com and have a look at what’s on offer:

[screenshot: Amazon.com’s music categories]

Something for everyone, it seems, and poking into each of these categories we find yet more sub-categories, related categories and subjects, searches and recommendations. In fact, Amazon.com does a pretty darn good job in assisting us in trying to find what we’re after. In my case, I’m after a new recording of Monteverdi’s Marian Vespers, published in 1610. Under “classical” I could use the “historical period” link to browse forth to the baroque era – although it could be renaissance, and neither this nor baroque are classical, but never mind the nitpicking – or search for “Vespro della Beata Virgine” or “Monteverdi” or both, or I could browse composers under “M”. I’m pretty confident that what’s available will come up; my trust in superior technology is paramount.

[screenshot: Amazon.com search results for “Claudio Monteverdi”]

I search for “Claudio Monteverdi”, and the “most popular search for Monteverdi” is his eighth book of madrigals, and hence not what we’re after. In the full list of our search for Monteverdi, the first CD that pops up has nothing to do with Monteverdi. Nothing at all. Neither does the second. Nor the third. What’s up with that? Umm, this is definitely the wrong tree. Let’s try something else, as I must have done something wrong.

I click on “related searches” to get to just “Monteverdi”; that surely must be the right way to go. Now, this gave me three CDs of two of Monteverdi’s operas, plus the old list above of unrelated CDs. But I’m not after opera, I’m after vespers. My bad.

Ok, I’ll try a search for “Monteverdi vesper”; that ought to do the trick. It found three; the first one has excerpts, the second too, and the third is an out-of-stock version of the vespers. Now, I personally have several recordings of these vespers, and none of them showed up in the search. Alright, alright, I’ll try historical time period. Then go to baroque. And then either vocal music or religious and sacred music. Hmm. Let’s try sacred. Now a massive list pops up, and I do a search for “Monteverdi” within this context, umm, which brings up one CD without any Monteverdi music on it.

Ok, last try, and straight for the jugular; I do a general search for “Parrott Vespers Monteverdi”. That surely must yield what I want. And lo and behold; the first CD is the 2000 reissue of the 1996 recording of the vespers by Andrew Parrott. Ok, fine and well, I found something familiar, but I was after something else: what is the latest release of these vespers? What is related to this familiar one that I finally found?

Well, Boston Baroque has a version of it from 1997, and it got good reviews I see, but I know there are many, many more. For someone like me, who studies music and embarks on journeys of pilgrimage just to hear a 17th version of a “Psalmus 109: Dixit Dominus”, diversity is very, very important. And, as such, a problem with most CD collection software I’ve had the displeasure of using over the years, just like my problem with Amazon.com.

Classification

The story so far tells us a tale of certain types of music that can be classified in many ways, and of the misguided attempts by yours truly to trust that a computer system gives what is expected of it, which also points to the problem of findability. My musical thirst seems to be hindered by the fact that my music can be performed in various ways, giving it further classifications not automatically present in general terms; it can be re-issued, re-recorded, re-ordered, re-performed, re-opened, re-searched and / or re-decorated. There are so many ‘fuzzy’ things about music that it has proven to be one of the more challenging problems to solve.

But why this fuss about music and Amazon.com? You’ll see later that Topic Maps lends itself to a natural way of doing classifications, pointing to resources and merging information from different layers, so that I wouldn’t have to search and search and search in all the wrong places to get all the wrong results. To understand how Topic Maps can make things better, let’s take a closer look at what my musical problem really is.


For me, Amazon.com did not do a good job; it did not meet my expectations, and did not provide what I wanted. In fact, the same search job in Google.com gave far better results, even to items within Amazon.com itself, but that is another story. However, the jolt gave us some important hints as to why music is so difficult.

The Marian Vespers of 1610 by Monteverdi are many things: sacred music, renaissance style, baroque style, Roman style, Venetian style, composed over several years, no evidence of being performed before their publication in 1610, using several librettists, Catholic music, show piece, adapted secular music, performed by many in many styles, recorded by many in many styles, played differently by many, played with different settings by many, basso continuo, a cappella, sad, joyous, separation-of-styles period, recordings re-issued many times, and so forth. You simply can’t put them into one category and feel happy about it.

Classifications touch on an area which is a bit risky; human cognition. It is the ultimate goal that a computer can use human cognition in terms of helping us process our data in best possible ways, because it lowers the cost – be it money, time, resources, power – of the interface between man and machine. Classification is a basic human function we perform all the time in order to gain some knowledge about things around us, but a computer, being a bit thick on the ‘human’ side but truly remarkable on the ‘function’ side, needs to convert data between human thinking and computational power. Remember what was said earlier about finding the best abstraction between computer logic and human concepts.

As such, we delve into more philosophical ponderings in search of good abstractions, and come up with something well known within the library world called faceted classification: the idea that a ‘thing’ can fit into more than one universal category. Faceted classification usually solves the problem of finding subjects with a great many options like this, be it top-down classification or more complex clustered facets. A common view about information like this is that there is an inner truth, some point you can cling to, and add as many property values as you need to it. So, “Marian Vespers of 1610” – a nice point to cling related information to – can have all of our facets listed above as properties; is-a sacred music, is-in this style, has-a setting, been-recorded by, and so forth.

This is, in simple terms, the world view of the table; something has a row of columns with data about the thing in it. We create programs that convert that table of things and present it as humanly as possible, and for record-keeping with a finite set of properties, this works like a charm. But what happens when we’ve got infinite sets of properties attached to a thing? We create more tables to hold more versatile properties, and make ‘relational links’ between those tables. Welcome to the very common world of relational databases.

From tables to nodes

In the relational database, that magic “point to cling related information to” is a row in a table. It is one thing, a subject, a point, a node, whatever, and we put properties on it. We can have a table called “works”, and chuck in a lot of rows representing each work Monteverdi has ever written. At row 6, we find “Marian Vespers”, and in our table there are columns like “published”, “composed”, and “librettist”.

[image: a mock-up of the “works” table]

Oh sure, this is a simple mock-up of a table, but it will demonstrate some very important things;

If something has more than one name, such as a nickname or a name in another language, what do you do? Add more columns and call them “nickname” and “latin_name”? As any sensible E-R designer knows, this is the time to refactor the model. We create a new table, call it “names”, where we can put in as many names as we like. Then we create a third, lookup table to bind the information from “works” to “names”. We could call this one “names_lookup”, and voila! We then create some SQL to represent this, and create a user-interface that a) knows about this data model and b) presents it in the most likable fashion.

[image: the refactored “works”, “names” and “names_lookup” tables]
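
Since we’re talking SQL anyway, the refactored model might look something like this (a sketch; the column names are invented for illustration):

CREATE TABLE works        ( work_id INT PRIMARY KEY, published INT, composed VARCHAR(50) );
CREATE TABLE names        ( name_id INT PRIMARY KEY, name VARCHAR(100) );
CREATE TABLE names_lookup ( work_id INT, name_id INT );

-- all names, in any language, for work number 6 ("Marian Vespers")
SELECT n.name
FROM   works w, names n, names_lookup l
WHERE  w.work_id = 6 AND l.work_id = w.work_id AND n.name_id = l.name_id;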

Yes, our names can now flow. Umm, apart from such questions as which name to use as a default, which one to sort on, which one to use if your mother’s sister just gave birth to triplets, or what name to use if the Moon is aligned with Venus. Hard to say, but by extending the “names” table with further properties you could solve this problem. Unless there are many types of names, in which case it is easily solvable by creating a new table called “name_types” and another called “name_types_lookup”, creating more brilliant SQL to represent this, and chucking a user interface on top that a) knows about this data model and b) presents it in the most likable fashion. Yes, we can extend and fiddle with our data model like this until all the little bits of our information are properly dynamic.

So when that model is sufficiently huge, what have you got, apart from a legacy monster of a database that requires a massive team of specialists – in the model designed, in the SQL created, in the user interfaces hence devised, in RDBMS for support, and in magic to make it all sound like a good thing to do? You’ve got an ugly, slow hog. And we don’t want hogs; we want smooth and elegant and fast. Oh sure, you can design pretty smooth and elegant data models with plain vanilla RDBMS too, but the more complex the data, the more complex the data model, and the more of a hog it becomes. Legacy is something we’re all trying to get away from, not embrace.

Some in the RDBMS world refer to maintaining these massive systems as putting lipstick on a hog, and apart from the technical challenges these systems can have (performance, cost, scalability, development) they are fragile because they require so much legacy knowledge about them; knowledge which, in fact, many of these systems were designed to remove. A catch-22 indeed, where we design complex systems to make it easier for the user to find information, but in doing so create hogs that are very difficult to maintain and develop.

Topic Maps is hence a data model that can be recreated in a lot of existing tools, and there is a very good article by Marc de Graauw that demonstrates how you can use your RDBMS to create a Topic Maps model in it and use it for all the goodness that it gives, similar to what I demonstrate above.

But let’s take a few steps back, and get back to that magic “point to cling related information to.”

Putting the map back in Topic Maps

[image: a map of my interests; boxes connected by lines, with me in the middle]

Let’s talk about the “map” in Topic Maps first. Above is a map I did of myself some time ago. There is a magical “point to cling related information to” in the middle, named after myself, from which I drew lines to other subjects I’m interested in. In the mathematical world this resembles what is known as a graph, and mathematicians work on these with graph theory. I personally think it is just a pretty map of things I’m interested in. My subjects could just as well be cities, and the lines between them roads. It’s just a map.

There is a strong correlation between a map and a category system, where entering a category can be seen as zooming in on a subject on the map. There are two very interesting things about a map such as mine above: the subjects scattered about, and the actual lines between them. In my map, the boxes represent “subjects” and the lines represent “interest”. In graph theory my subjects are called “nodes” and the lines “edges”. In Topic Maps, these are called “topics” and “associations.” See, there is a common thread running through these maps.

Nodes. They are very small, and they don’t really hold much value in and of themselves, but when we attach other nodes to them through associations / relations / roads, they hold valuable information. This fact lies at the core of a Topic Map. There are topics, and the topics have associations. That really is all there is to it. No magic, no mystery, just plain vanilla logic that happens to resemble human nature in its simplistic form.

Now these nodes and associations on their own is all fine and well, but it isn’t until we put them in some order or system that they can be used for something. After all, our much sought-after abstraction that is human and machine at the same time does not come from wild ideas about “nodes” and “edges” or “cities” and “roads”, it comes from writing software on top of hardware to do the job. So, let’s have a look at how we can do the job.

The truth about relational databases

The truth about relational databases is that they really are Topic Maps trying to get out. Think about what your RDBMS is trying to do; you have a lot of tables with information bits, and you create relations between them to represent something vital to your business requirements, write SQL to mirror that, and try your best at fitting a user interface on top to make it all work. The more relations you’ve got, the more complex your model is going to be. And for what? To create an application that both a computer and a human can handle well.

Where and when do you stop expanding your model? When it gets too complex? Too slow? Too unmaintainable? Too crazy to keep going? Too often you get bogged down in the design of models; which relations are hogs, which ones are necessary, which ones are not? Why not look at it from the other direction: everything is in relation to something. Instead of learning all there is to learn about how to limit a model, why not learn all there is to learn about how to expand your application?

The truth about Topic Maps is that it is a data model that is very successful for a wide variety of applications, and I’m not talking about applications here as in “a program” but as in “an area in which to apply a solution.”

Let’s have a peek at a Topic Maps version of the RDBMS problem of music earlier on.

[image: a small Topic Map; four topics connected through green association boxes]

Now, before you panic, let’s go through it slowly. There are four topics in this picture. Yes, that’s right; the four circles are topics. They have funny labels which aren’t labels at all, but IDs. In my passport there is a complicated number which tells the people reading it that this passport, Alexander Johannesen’s, has this number. If it had a different number, it wouldn’t be his. The same with a unique ID number in a database. It is something that says that this thing is unique. The same for our topics; the ID must be unique so that we can point to it and say “I’m talking about this one.”

Each topic has one or more names associated with it. It isn’t required to have a name, though, but it can be helpful to the human who is to fiddle with the map. Some of these topics also have more than one name, which is fine, and the names can have a “scope”, meaning a determining value that says something about the name. Our example “mvw_007” has the name “Vespro della Beata Virgine” in the unconstrained scope (meaning the default for most things) and the name “Marian Vespers” with the scope “english.” This means that the name “Marian Vespers” has a value (“english”) which says something about it. We can use this for many things, but here it is used for the languages of names. If someone viewed this Topic Map and told the Topic Maps application that she would only like to see topics with an English name, this scope on names is what the system uses to determine what to display.

Finally, there are some green boxes that seem to hold relational information. Pay close attention to these; they are the backbone of the knowledge in a Topic Map. Note that each line coming or going from an association has a name, and these names are the roles the topics play in that association. The topic “mvw_007” obviously plays the role of “work” in the association “Composition”. To find the composer, you obviously look to the role of “composer.”

If this still seems totally incomprehensible, this is a good time to panic. Get it over with and out of the system, because these things – albeit abstract – are still just bits of data. They all – be they topics, associations or roles – represent … something. See, I told you there was some philosophy in there. Let’s go to BASIC.

10 LET TOPIC1$ = "Monteverdi"
20 LET TOPIC2$ = "Marian Vespers"
30 LET ASSOC1$ = TOPIC1$ + " : " + TOPIC2$

Here we have a very – very! – basic Topic Maps representation written in BASIC. It doesn’t do much as such, but just like a variable in BASIC – a name that represents something – there are symbols that resolve to values. A 1-to-1 scenario could be our association at line 30. A number of functions could hence be written to perform various tasks based on these topics and associations. For more complex Topic Maps representations, however, we might turn to other computer languages more suited to the task, especially those that support object-oriented programming. Like PHP.

class Topic {
    var $name = '';
    function Topic( $name ) { $this->name = $name; }
}

class TwoWayAssociation {
    var $members = array();
    function TwoWayAssociation( $a1, $a2 ) {
        $this->members[] = $a1;
        $this->members[] = $a2;
    }
}

$composer_001 = new Topic( "Monteverdi" );
$work_006 = new Topic( "Marian Vespers" );
$an_association = new TwoWayAssociation( $composer_001, $work_006 );

These are simple examples indeed, to prove a very simple point: Topic Maps are not as complicated or difficult as many believe. There are objects that point to objects that point to objects. Objects are of certain types, and given a certain number of types behaving in a certain way, we call that a Topic Map. In a Topic Map we have three basic objects called Topics, Occurrences and Associations. Further, we extend these with Names, Scopes, Roles and other specialised assertions; there we have an object model. Then we apply a number of rules to that object model (in the sense of which objects have which properties and links), and we have a Topic Map. Then a set of further rules for merging two or more Topic Maps together, and we have the Topic Maps we know of and practice today.
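
To stretch the toy PHP model one step further (still my own sketch, not any real Topic Maps API), scoped names and role-bearing associations are just more small objects and rules:

class ScopedName {
    var $value = '';
    var $scope = 'unconstrained';
    function ScopedName( $value, $scope ) { $this->value = $value; $this->scope = $scope; }
}

class RoleAssociation {
    var $type = '';
    var $roles = array();  // role name => topic
    function RoleAssociation( $type ) { $this->type = $type; }
    function addRole( $role, $topic ) { $this->roles[$role] = $topic; }
}

$assoc = new RoleAssociation( "Composition" );
$assoc->addRole( "composer", $composer_001 );  // topics from the example above
$assoc->addRole( "work", $work_006 );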

To lull you back into the safety of RDBMS for a small fraction of a second, think of the topics in our drawing above as rows in a table called “topics”, and the green associations in another table called “associations”. Relax with another table called “topics_associations_lookup”. It’s all data. Just data. But hang on; we don’t want things in a table, we want our nodes of information to be a lot freer than that. We want to be able to attach any association to it without worrying about the flexibility of the “data model.” We want to search our graph of nodes in clever ways, not top-down, because when you look at it, information isn’t top-down; it is associative. The abstraction level of the RDBMS works well for records, but not all that well for information that should – in some mysterious way – resemble human cognition.

So take note: the rules we apply through table definitions and the functional SQL shape the same thing; data objects and rules that together can be called Topic Maps, as long as those definitions and SQL statements comply with the Topic Maps standard. And that’s all there is to it.

Think of the topics and associations as stand-alone objects that have small bits of data attached to them. Those who are in the deep end of object-oriented programming may think, “Hey, wait a minute! This is nothing but a graph structure with various properties attached to the nodes.” And you would be absolutely correct. That is what it is, with certain names and labels attached in clever ways. No magic. No smoke and mirrors. Just a nice little data model with some rather clever ideas and rules running through it. Welcome to Topic Maps.

The end

Well, well, we’re at the end of the essay, and I hope I’ve given you some taste of what Topic Maps are and how they can be applied, and a small grasp of the concepts involved. I realise that I haven’t solved any actual problem through this essay, such as showing you how to actually build CD collection software with Topic Maps to demonstrate its incredible oomph. Fear not; the tutorial in the works will be attacking this exact problem. Until then, Monteverdi’s “Laetatus Sum” sings out its rolling hills, and lulls me back into Venetian comfort.

Filed Under: Data Modelling, Semantic Web, Topic Maps

21 October, 2000 by Alex

Criticism, MARCXML, the culture of MARC, and the long difficult struggle to stay alive

[note : just realized I didn’t post this one, which was written quite some time ago. A nice filler to, er, fill with.]

Today I want to talk a little bit about criticism in general, and within and of the library world, and make a few points about the culture that permeates its standards and its view of technology. There are a few things that need to be said, both in the context of pushing the library world forward and about how to understand me and why I’m doing this. I’m not a librarian in technical terms (I guess I just lost a few readers. Again. But please read on; be strong!), never was, never will be, so why am I even bothering?

Before I get started on that, though, let’s refresh ourselves with my criticism of MARCXML of a few days ago;

[Whoever deals with MARCXML] waste most of their time trying to figure out why the hell someone came up with this evil way of making your brain melt. Well, obviously, if your brain melts, it’s evil, but there is something so anti-XML about the way MARCXML was designed I’m starting to wonder.

Do you know what’s glaringly missing from the above quote?

Oh, crap!

Expletives! Now, I know a lot of people – especially proper folks – will “ignore my message” if I throw expletives into my prose. And certainly, if all I did was swear and use foul language right, left and center I’d ignore me, too. Don’t get me wrong; I hardly ever use bad language, I don’t approve of it under normal circumstances, and I certainly wouldn’t subject myself or others to filthy disposition or discourse — unless, of course, it brought something to the topic at hand, or – more to the point – the discussion itself.

There have been times when soothing and lulling words have been repeated so often they mean nothing anymore, when the truth told in goody words no longer shakes people and wakes them up from their dream-world where everything is fluffy, white and wonderful. A well-meaning “shit” can have an awakening effect, and when you mix in my own passion for the subject, it’s hard to filter out what might be considered uncouth. So, I said “shit” on the NGC4LIB mailing-list a couple of times, and perhaps I called a few “bullshits”, said that stuff was “crap”; nothing big, really.

Now tell me, are these really words that would throw off my whole spiel, remove my opinion from the collective communicado flow, and pee in the well of truth? Are people – especially library people – so distanced from reality that when the language of the commons meets them (yes, I’m playing the elitist card here), they put their hands over their ears and shout “don’t want to hear it!” while shaking their heads from side to side?

The reason I ask is not that I can’t control my filthy mouth. No, I ask because I wonder whether the truth does the same kind of thing to them; that when I say “you must get out of the MARC conundrum NOW!” it comes across to them as “you gotta get your shit together man! Screw this MARC crap!” In other words, does the truth of what I and lots of others are saying come across as the equivalent of foul language, and thus get ignored? Whether it is filthy language or not, it comes across that way?

I don’t actually believe this is so, of course. I know there’s lots of librarians who understand most of these issues, but really, they can’t do much about it, and feel like there’s no point in raising their voices and rocking the boat. And so the boat silently sinks.

Criticism

Criticism is mostly about rocking the boat. Sure, there’s positive criticism, like “you’re not ugly, just beautiful-impaired!”, but aren’t we over that silly brand of political correctness by now? Criticism is telling it straight; that what someone else has done is not up to scratch, that surely there must be some improvement that could be made. But the library world doesn’t work like that. Criticism in the library world uses a different word; approval. I know, I know, it sounds ludicrous, and if you’re a librarian yourself you are right to call “bullshit” (although, of course, you don’t), but most of the crap that comes out of the library world becomes a de facto standard because not enough librarians have stood up and called it crap. When no one calls it crap, the proposals become real. Sometimes that’s ok, but other times it’s outright scary. And then there are times when what comes out threatens to ruin a whole sector with its poison, all in the name of not standing up, of being afraid of rocking the boat.

One such poison is of course MARCXML as mentioned earlier, where the very notion of XML in it is pure unrefined anti-XML evil. Another is EAD, an XML standard for Encoded Archival Description (yeah, you try to figure out what it is with a name like that), where the general idea is a good one (digital description of your archives with a focus on non-bibliographic materials), but the actual output is terrible: littered with poor XML use and poor data modeling, and, as usual, steeped in archaic library terms and conditions. How can I dig into the details and criticize it in a positive way, as they all claim you should?

Positive criticism is hailed over anything else. If you don’t have anything good to say, then don’t say it. But after you’ve said the good bits, why are we so shy about delivering the bad bits? Are we then to wrap all bad criticism up in good wrapping so as, um, not to hurt people? Is that what it comes down to, sugaring our salt?

Haste

Ok, normally I wouldn’t question these things. I’m not stupid, and understand that positive criticism is perhaps the best way forward, especially if you are ever to have lunch with the people you criticize at some later point. But there’s a catch; it makes progress painfully slow, if progress happens at all.

You see, if I pamper you with good words about something that stinks, and say it smells like flowers and perhaps needs a tiny extra fragrance to make it better, you’re not removing that stink; you’re adding to it. There are of course a million ways of doing this, and yes, there are many ways in which you can pad your blows and make it easier to a) deliver the goods, and b) be listened to. I understand all this. But I don’t have the time, and neither do you.

Sorry to say, but we don’t have time to dick about with niceties anymore. I started my MARC cultural experiences over 5 years ago, and nothing good has happened in the field of “saving the library world from MARC” in those years. It was already in peril back then, with a plethora of MARC standards (who would have thought that one standard in reality was more like 20 standards, all almost the same but with local tweaks?). Sure, the odd experiment and the rare project or prototype has popped up from time to time, but in reality there’s nothing out there with the potential to pull off a rescue mission for all those afflicted with MARC. You can throw your FRBR and RDA some other direction, because they have neither vendor nor library infrastructure support on a larger scale. Heck, you all talk about it, but no one is actually doing it. Apart from that odd prototype, of course, shiny and fresh as it sits in a corner waiting to become obsolete.

So let’s talk about timescales, because my statement of being in haste is far truer now than it was 5 years ago. I started gently back then, asking questions, prodding the shortcomings of a format, only to find out that there’s really nothing wrong with it per se. No, it was the virus named MARC that was causing the sickness I witnessed, a culturally dependent, nauseating disease of rules, half-rules, standards, chaos, vendor bingo, conveniences, myth and magic. The format was and is only a carrier of the disease itself.

MARC came about in the 60s and 70s and was great! Truly awesome! It kept going through the 80s, when one should have started looking for alternatives, but because it was still a Good Thing (TM), it just kept going and growing. The 90s and the dawn of the Internet hit us. Still going. 95-96 saw a tremendous explosion in Internet activity, and it was around this time that most larger libraries started thinking more seriously about their online presence.

Here’s the thing; even at that time, no one thought that this newfangled network that was going to dominate all future computing, commerce and communication was something to take advantage of in terms of library infrastructure. Or let me put it this way; how many Z39.50 implementations have happened between 1995 and 2008? The answer should be your new battlecry.

Now, let’s quickly check yesterday’s status (i.e. roughly before 1995). I read with my retrospecticles that when the world (i.e. Internet folks of various kinds) wanted bibliographic meta data, or, heck, advice on any kind of meta data, they turned to the one institution that had yonks and buckets of experience in the field; the library world. Dublin Core was one of those fantastic things that came out of that “world leader in meta data” role you had back then, a simple start to meta data description that has now fallen into obscurity and disuse because it was never extended (at least not in the way most people on the net needed it) and was later drowned in librarian lingo and committee orgies.

The present situation is rather different. The world has stopped asking librarians for advice on these things, and I can think of a number of reasons why;

  • The library world didn’t keep up: Technology and the Internet developed far too fast for librarians to keep pace with, and they went from early adopters, and sometimes even innovators, to hanging onto the long tail for dear life.
  • The world also knows about meta data: Yes, as crazy as it sounds, librarians are not the only ones who know how to deal with meta data, and as IT professionals all over the world understand the problem and can develop for it, there are bound to be knowledge and initiatives that are not based in library traditions.
  • Snobbish library attitude to anything non-academic: This one doesn’t apply to all librarians of course, but there is an “us” and “them” mentality in the ranks of academic librarians. Are you an IT guy who stumbled into their world? Mate, you’re a second-class citizen. Unless you’ve got a library school masters of some kind, you will never carry any weight in their world, and there will always be an invisible force-field between librarians and everyone else. And librarians: you can protest all you like, but this is a problem I’ve talked about lots before, it is not just my opinion, and it is something you really need to take seriously.
  • Library world business models that are incompatible with the future: Most libraries have one business model: collect stuff, make it available to “everyone”, and get a yearly budget from local or national government to do so. There are a couple of exceptions, which I must talk about in the next section on business models.

But before we get to the business juice we need to wrap up why I think we’re in haste, and the keyword here is “internet time.” Library world time is totally and utterly incompatible with internet time. The library world is a safe, steady, generic march in the direction of the internet long-tail, while internet time whooshes past at a scary pace, slow and fast at the same time, like a teenager freaking out in their first driving lesson, without direction, all the while the police sit by wondering whether to write out a ticket or just roll around laughing. Or both.

The library world is based on those steady, calm, majestically paced movements, and while that has worked for hundreds of years, all of a sudden something remarkable happened; information turned digital. And the world just exploded with possibilities. The library world was in this field actually an early adopter of computers and software, but only to the point of applying them to solve a mostly analog problem; paper books. So while the rest of the world went forward, libraries stuck to their books. And these days, who are you gonna ask for advice on how to manage meta data for your eBooks?

We’re in haste because the library world is missing out on more and more opportunities to be a real player in the meta data playing field. Now both Amazon and Google have APIs that give you what the library world has refused to give the world for a long time, with the added advantage for Google and Amazon that they evolve, they’re adding more and better data to their APIs, and they play it open.

But maybe I’m wrong; maybe we aren’t in haste because the library world has already lost. There is no more ground to cover, unless you want to bicker about the proper LCSH to use for which book, but last time I checked this wasn’t a solved problem within the library world either.

Business models

I’ll mainly focus on OCLC, quoting a bit from their about section;

“Founded in 1967, OCLC Online Computer Library Center is a nonprofit, membership, computer library service and research organization dedicated to the public purposes of furthering access to the world’s information and reducing the rate of rise of library costs. More than 69,000 libraries in 112 countries and territories around the world use OCLC services to locate, acquire, catalog, lend and preserve library materials.” (my emphasis)

OCLC is one of the most important players in the library world today. Forget puny Library of Congress, or any of the many national libraries. If OCLC says we’re gonna burn censored books at the stake, most libraries will follow (and for non-librarians, the translation of that statement is roughly that if Republicans say that you should vote for them because Palin is a female, you all go and do so). They’re powerful, and people around the world listen to them very carefully. But there’s a snag.

Libraries around the world pay good money to be members of OCLC. Basically, the business model of the most influential (and dare I say most powerful) library organisation in the world is one totally disconnected from the normal governmentally funded library model. The implications here are huge; where taxpayers would probably be better off having all the meta data freed for everyone to use (that’s what you pay for, right? And that would enable innovation, right?), OCLC must keep it locked down in order to stay alive.

Ok, maybe I’m being overly pessimistic, as indeed OCLC are slowly opening up their meta data repositories, but it’s going soooo sloooow! It was argued before I entered the library world, it was argued during it, and it has been argued ever since. All the while we’re waiting, more opportunities are being missed. And I’m not saying this lightly. Why do you think Google made their book API available? Fed up waiting for the library world to do it, that’s why. And when they do it, consider it a lost battle no matter the quality of the meta data; Google has more data and more seriously wicked hackers than the whole library world combined could ever dream of, and they play it open, and they play it with as much dedication and passion as librarians themselves, and they will kick your arse at your own game.

What other opportunities are we missing out on while we sit here and talk about the fact that we’ve got problems?

 [note : Funny to see how things went with the OCLC debacle after this post was originally written.]

Filed Under: Uncategorized
