9 April 2010

Do libraries understand the future? Or how to get there?

I can be quite harsh when stating my opinions, and I often feel I step on people's toes a lot by doing so. But it's not my fault; I blame my Norwegian upbringing where we tend to state things as they are, and then we discuss things until the cows come home, but we do it with complete respect for the other side. You might say polite bickering and being tempered yet pigheaded about things is a national favorite pastime, and then some of us went on to the pigheaded Olympics and cleaned the tables and set the whole thing on fire. And then we debated with the smoldering rubble some more.

My past is littered with opinion pieces on all things library, from the culture, its place in society, its technology and direction, and I've written both here on this blog and on various mailing-lists, as well as given live presentations in various forums and at conferences. I'm a prolific library-bitcher, you might say, as most of what I say ain't necessarily wonderful praise and kissing the boots of those who make library decisions.


Today I won't bitch, though. It still may not be safe for work, but I won't say how stupid the library world has been in missing opportunities, how they've misplayed the technology ball, how they've been blind to keeping up societal appearances, how they've lost their political clout, the lack of philosophical standing or progress, how current management streams are drying up the drive and killing the courage they once had, et cetera. No, that's all been said before, and I won't even mention it, not a single word, not me, not today, uh-uh, nosirree.

The reason I won't bitch about these things might be interesting to some; I don't really care anymore. I know that some of you out there who know me think I'm engaging in hyperbolic crud, because normally such a statement would grate against my ideals of love, peace and bibliophilia, but it is true; I don't care what happens to the library anymore, and this is a good thing; a thing not worth breaking is easier to shape and change. My passion for all things library has shifted.

My real passions

I have a bunch of passions, from music and movies, to science and education, from software development and technology, all the way to epistemology and philosophy. And the ideals of my library world certainly bump into or totally encompass some of my passions, but the library ideal is one that needs to be explained not from a contextual or cultural point (also known as the status quo), but from that of the future and the values that cross over from the past and into it (also known as, err, "the future").

These days the most marked of my passions is the one of science and education. I will extend myself well beyond my borders for people who genuinely wish to know more, who want to study, to find things out. I've been an inquisitive pain in the arse my whole life, an entrepreneur and inventor, someone who just can't leave something that works alone until I understand how it works (even if that means I'll break the darn thing in the process, a little bit like my library career). Sharing my passion for knowing is what I, as a human being, love the most.

In the past, my passion would have had a good corollary to books, because as we know the past is riddled with books and parchments as the epitome of knowledge and information. If you were to become someone important, you had books. If you were someone important, you had written one or at least starred in one. Libraries formed wherever power dwelled. But then stupid humans and their silly ideals of freedom and unity and all that crazy stuff invented something new; books and information to the people for free. Well, those are at least the ideals dating from the late 17th century of the western world, slowly creeping into the world and changing it in more ways than the history books give them credit for.

That is a piece of history I simply adore and love, feeding my ideals and passion for protecting it, furthering it, pushing that same agenda.

But then we have the now, where most important discourse and information is still somewhat found in books. But only just; a new crazy idea came along not that long ago, the idea of making things digital and hooking them together in networks across the globe. The reason we've still got those darn libraries is because the long tail still points into them, into books and journals. But what when they don't anymore?

Library : A definition

Books. And then "modern libraries are increasingly being redefined as places to get unrestricted access to information in many formats and from many sources. They are understood as extending beyond the physical walls of a building, by including material accessible by electronic means, and by providing the assistance of librarians in navigating and analyzing tremendous amounts of knowledge with a variety of digital tools."

Is your library like this? Does it do these things? How does it do it? I want to talk a bit about these things, because, frankly, they are together my list of the most important things in regards to all things library;
  1. The potential of what the library can do and offer
  2. The promises made
  3. The actual offer
  4. Plans for the future
I've written tons and tons on the potential I see in the library before, whined about the promises made and how they stack up (and how in the end made me resign from it all together), and about what drives the machinery and direction. I've written about this so much I've forgotten links and most of what I've said there. But I won't go hunting for it anymore. I mean, what's the point? And I wrote at the beginning that I wouldn't whine about it here, so let's talk about the future instead. The future is now!

This post will be more or less on the future of the library world, or, more prosaically, how the library world enters into it. They've got three choices; enter into it physically by going there, enter into it by planning for it, and enter into it by shaping it.

Boldly going there

This is the easy one, and the default one. You just go there, or, because all of time is encapsulated in the one dimension of non-spatial reality, you just sit there and let the future huddle around you. No, this isn't as zen as it sounds, it's really what most people do, go about their business, where "going about your business" usually means doing things without thinking too much, letting everybody else shape your future.

Now, as a preamble I don't actually have that ability, so it's hard for me to sympathize much with it. I can see how it works, but for me I find it hard to let that become some excuse for not improving and doing better. I'm one of those where even buying lunch is a stream of philosophical implications (how far removed from my ancestors is it ok for me to eat them?), and a walk on the beach becomes an internal monologue about how it got there and the eco-system of it (narrated in a faux Sir Attenborough voice), or where my daily work is a constant bombardment of ideas and thoughts of how to do things better, how to improve that which is not quite right (I don't believe in the silly 'if it works, don't fix it' mentality which has spread across this planet like a virus), pondering language and linguistics and neuroscience even when making variable names or looping through a hashmap. Whatever I do at any given time is not a compartment, a category of activities, a single unit of a constrained domain; it's all connected, from the micro to the macro. I'm just an insignificant extension of the cosmos, trying to make things better. I'm evolution.

So, standing still doesn't work for me, but I hear lots of other things on this planet do it, and do it well. But I think that's a bit deceitful; nothing stands still, not even rocks. Nothing stays the same, give it enough time. In a few thousand years, my favorite rock around where I live will become just a grain of sand on the beach. So by this analogy, I don't think even the library world stands still. Heck, we know it doesn't. We all know that little by little, even the most conservative bastion of all things that are meant to stand still in the universe slowly creeps and crawls its way towards some distant and different place.

So this is all about the scale of time and our place in it. Let's talk about the human scale of time, the time it takes for a generation to react to the former generation. Rocks on this scale are terribly, terribly, utterly mind-bogglingly slow. A rock from generation to generation is so slow they are practically eternal, and indeed, for much of human history this has been the default position. It wasn't until mid 19th century that we figured out that rocks were so slow that we needed to invent a brand new scale to make any sense of it, the geological time, spanning 4 billion years. Compared to a mere generation of humans, that's practically forever.

But the library world on our human generational scale is comparatively damn slow still. The building they built last generation is still there, doing pretty much the same things. The librarians in there are doing practically the same things as the last generation. The only two ways it has from time to time gathered speed are by 1) planning ahead, and 2) having a paradigm thrust upon it from the outside.

Boldly planning for it

The library world has never really had a great need for planning for the future, at least not the more organized types of libraries we've had the last 200 years or so. The world of knowledge and the written word hasn't really moved much since movable type, so even if some things changed here and there they had plenty of time to get their heads around it. What is 10-15 years of thinking and tinkering with a problem that has a 100 year span?

Nothing. It's perfectly alright to spend time getting it right when the problem is, on our normal human scale, slow. But what happens when the problem isn't just fast, but changes the human culture in which our scale is rooted?

Enter the digital age. It all started with computers becoming common, not only in places that had the resources to buy expensive and complex computers, but more so when they became cheap enough to go into any home. The age of ZX80/81 / Spectrum 16/48k / Commodore 64 / Amiga / Macintosh II / BBC Micro (all cheaper home computers) changed the world as we know it, probably far more than we give it credit for. I was a Spectrum 48K owner myself, living in a country dominated by Commodore 64s. So what does a 10 year old kid who wants to play games on his computer do when there are no games around to buy? He has to make them himself. That sealed my destiny and I became a geek, but perhaps a bit more importantly to this conversation, I became a librarian through the process of reading, borrowing, sharing and researching the written materials (remember, no internet in those days), because otherwise my programs wouldn't work and my insatiable lust for making the darn thing work would die. Luckily for me, the addiction was like cocaine, but instead of ruining my life I became computer literate.

When you become computer literate, the world looks very different to you. Problems everywhere become programming tasks, creating a small sliver of interfacing between the digital and the real world in the process. This sliver has since grown rather large, encompassing most of society, not a trivial feat in so few years. Even in the library world it has crept in and helped out. But we need to look at what it helped out with;

Cataloging. Searching said catalog. Bookkeeping. Writing reports. Did I forget anything?

I'm not really trying to be snarky here, and of course I know computers do more than that at the library, however, when we're trying to look at the present of what they actually do, I don't think I'm all that far fetched. There's the odd interesting project, some application running on a (secondary) server somewhere, maybe a new GUI into the catalog, or maybe some exhibition website, or maybe some self-serving library-card database thingy. But seriously, it's not like the computer systems in the average (or hip and cool) library are doing anything amazing (but please point me to them if they exist! Nothing would be cooler!). It's all pretty ... well, average. Standard stuff. Even if you blog and do Wiki's as part of your communication, you're not above average. You are average.

To get out of being average when you need to be great (more on this later) you need to plan to become great. You need to come up with some activities and goals in order to move faster than not moving at all. But how does the library stack up to this?

Of course they are planning. Every day is another plan. But we need to discern between planning for tomorrow and planning for the future. Tomorrow is just around the corner, and that, truly, is just planning to stay up to date, keeping up appearances, to plan for being relevant to what's going on right now. Putting up a Wiki or starting to blog or even putting together a prototype of a search engine that presents records in an FRBR manner, or creating a process and system that streamlines ILL with digital copies and distributes them in a copyright-enabled fashion, or even a backend system that converts MARC records into RDF and spreads them across a clustered system of servers in Linked Data fashion complete with cool URIs and ontologies to work with the data, that ain't planning for the future! Maybe you're about to start a project that acts as a portal for information, a collection-point, or a federated search point, or a dynamic system for understanding user requests and dispatching semantic contextual networks to semantic engines that convert them into knowledge nuggets and present them to researchers; you're still not planning for the future. These things are the least you should do, but these things are not the future, they are only today and tomorrow.

Planning now to do something funky within the next year or so is not planning for the future. So then, what is the future? And by "future" I mean to actually have one.

Boldly shaping it

You need to shape it. You need to look into your crystal balls, and determine what the future should hold. No, don't look into the ball to look for what the future holds, that path leads to stupidity, and, well, it doesn't work. No, you must insert yourself into the fabric of modern development far more than you normally have, you must reach out and not only point to the future, but invent it!

I was extremely happy to see Jessamyn join BoingBoing as a guest writer. That's a perfect example of what needs to happen, a high-class librarian writing for a high-class blog, about all things weird and wonderful, reaching out with a subtle librarian view of the world. (My favorite post!) However, even after you've immersed yourself in what's going on in society or even try to shape bits of it by your very existence, there are some bigger issues we haven't even dared go to yet. Well, let's ;

The need for conserving the libraries isn't the need for conserving the houses, or the books, or even the library cultural spot in its society (which are all good reasons, mind you). It's not to keep certain people in jobs, nor is it to keep the services alive. No, it's to preserve the librarian ideal. The librarian profession is not worth keeping if its ideals aren't in tune with reality, and I can point to the thousands of professions through the ages who have died when those parts of society it was attached to, died off. There's a certain notion of librarian philosophy that I'd like to talk about ;

"When the gulf between theory and practice in librarianship is discussed generally two themes emerge, which are that theorizing about librarianship is mostly non-existent and, when such theorizing exists at all, it is largely irrelevant to library practice."

I'm sometimes inclined to say that the reason the library world is in trouble is in the above quoted paragraph. Ok, so it's easy to see the gap between the mostly non-existent library philosophy, but we must remember and let it sink in that philosophy is defined as the action of doing philosophy, not as an archive or to think of it as history, or even use old thinking as if it applies to the now, or, heavens forbid, the future.

Library philosophy needs to happen, at least a hell of a lot more, and it needs to be a bigger part of what libraries do, it should include all librarians, it should be part of the fabric of librarianship. You need to ponder epistemological implications of digital identity, to think through the notion of copyright for the greater good, or your academic standing in an academic world that's moving to semantic networks, the loss of the bibliocentric view and the impact on collection management, or the systemic notion of semantic knowledge networks. Or you need to find out what fragmented semantic contexts offer knowledge management, or how the iPad will influence citing and sharing of notes, how to address those notes themselves as they are often more valuable than the original text (and lots of bloggers know this quite well already). Or, perhaps even more importantly, you need to establish an ethical guidebook to global knowledge management, models for information distribution and wealth, or ontological analysis of human and non-human identity. You need to re-think what those ideals are in order to preserve them. Only then can you shape any future worth having.

All those meetings and planning of cool projects you do? It's all fluff and nonsense unless there are some serious philosophies to back them up, new ideas, visions of what the future might be, and certainly visions that are based on the library ideals worth keeping. In the absence of philosophy there will be the status quo. And that's worse than standing still, even if you look ever so handsome and your moldy paper smells oh so good.

Here's a relevant quote from my distant past ; 'I'm still in love with the library ideals and concepts. I still love books. And maps. And old pictures. And just surfing the catalog. Or snooping in the newspaper reels. Or finding a microfilm, wondering what's on it, what it means, and who did it. Even subject headings and its contextual meaning. I love catalogers. And I love librarians. I just don't love what we're collectively doing with the concept of "library."' And I should add; I don't love the lack of philosophy, or the lack of shaping the future.

Let's think a bit more seriously about this. Let us philosophy!

But the question in today's post was whether the library world understands the future. My assertion is, no, they don't. They understand it's coming, they understand it will involve technology, and that books will be less and less important, they understand that they need to have cool projects (and by 'cool' I'm happy to settle for just 'relevant') and to keep that up, and that they need to accommodate the onslaught of the digital impact. They understand all these things, because they are close to them. These things are on their scale, they understand these things because they deal with them every day.

But deep thinking? Good luck.


29 September 2009

Library Pontifications

Once in a while I get some email from people who ask me some questions or ask me to clarify something I've said in some setting. The other day I ranted on the NGC4LIB (Next-generation catalog 4 libraries) mailing-list about, uh, something or other. And I got email, which I answered, but since I got no reply I'm posting it here in a blog-edited form so that it doesn't go to waste ;
I think I am starting to understand your rants against the culture of MARC, and I'd probably feel offended if I knew what all of the above meant.
Hmm. Well, it wasn't meant to offend anyone. I guess if people thought they were hardcore into persistent identity management, then maybe they would feel I've either overlooked their hard work or don't think what they're doing is the right kind, or something.

I usually have two goals with my "rants"; 1. flush out those who already are on the right track, and make them more vocal and visible, and 2. if no one is on the right track, inspire people in the library world to at least have a look at it. I can do this because I have no vested interest in the library world as such; I cannot lose my library job as I'm not working for a library. :)
Naturally, to feel outside of the mainstream creates a crisis of confidence in one's abilities. What does it mean these days to say that one is a cataloger or that one works in tech services, and is it perceived as a joke for those on the outside? Oh yeah...they still produce cards. What do they know about databases?
Librarians are, from the outside, an incredibly gifted bunch of people who know what they're doing; they have been granted powers outside the realm of normal people (including professionals like software developers, believe it or not), and they know stuff we normal folks don't.

However, having been on the inside you get to glimpse the reality of an underfunded, underprioritized sub-culture of society who knows as little about the "real-world" as normal folks know of the library world. There is a great divide between them, and very little has been done to open up. The blame for this I put squarely on the library world (as the real-world is, well, real and out there) who for many years have demanded a library degree even for software development positions, and when we finally get there we are treated as second-class citizens because we don't have that mark of librarianship that comes from library school. It's a bizarre thing, really, and perhaps the most damaging one you've got, this notion that librarians must have a library degree, as if normal people will never understand the beauty of why a 245 c is needed, or the secret of why shelves must be called stacks, and so on.

One thing that has got me very disillusioned about the library way is philosophy. I deliberately sought out the library as a place to work because I have a few passions mixed with my skills which I thought were a good match, and one of the strongest passions was epistemology. One would think that if there was one institutional string of places that could appreciate the finer details of epistemology, it would be the libraries and the people within. That's what they concern themselves with, no?

Err, no. No, they don't. There's the odd person that ponders how an OCLC number can verify some book's identity, but these are very plain boring questions of database management. Then along came FRBR which not only dips its toes into epistemology, but outright talks about it! The authors of it clearly had knowledge and wisdom about such things. So, one would think there was hope. Like, when it came out in 1993. That's more than 15 years ago. And people still haven't got it. How much time do you reckon it's going to take, and more importantly, how many years until it's way too late?

But no, RDA comes out of the woodwork and proves once and for all that there is no hope of libraries ever taking on the issues at either a philosophical or a practical level. Let me explain this one, as it sits at the core of much of my "ranting."

FRBR defines work, expression, manifestation, item, and these are semi-philosophical definitions that we're supposed to attach semantics and knowledge to. There are primarily two ways to do that; define entities of knowledge, or create relationships between entities. (Note these two basic ways of doing knowledge management; entities and relationships, as they spring up in all areas of knowledge representation.)
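To make the "entities and relationships" point a bit more tangible, here is a minimal sketch in Python. It is not anything FRBR itself prescribes: the class names, the `trace` helper and the Hamlet labels are all my own assumptions for illustration. Only the four entity kinds and the three relationship names come from the FRBR model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A thing we can point at; 'kind' is one of the four FRBR entity types."""
    kind: str   # "work", "expression", "manifestation" or "item"
    label: str  # illustrative, human-readable label (an assumption, not FRBR)


@dataclass(frozen=True)
class Relationship:
    """A named, directed link between two entities."""
    source: Entity
    name: str
    target: Entity


# Hypothetical example data.
work = Entity("work", "Hamlet")
expression = Entity("expression", "Hamlet, English text")
manifestation = Entity("manifestation", "Hamlet, a 1993 paperback edition")
item = Entity("item", "the copy on my shelf")

relationships = [
    Relationship(work, "is realized through", expression),
    Relationship(expression, "is embodied in", manifestation),
    Relationship(manifestation, "is exemplified by", item),
]


def trace(start, rels):
    """Follow outgoing relationships from 'start' and return the labels visited."""
    path = [start.label]
    current = start
    while True:
        following = next((r.target for r in rels if r.source == current), None)
        if following is None:
            return path
        path.append(following.label)
        current = following


print(" -> ".join(trace(work, relationships)))
```

Notice that the meaning sits in the relationship names as much as in the entities, and that the sketch sidesteps the hard part entirely: deciding which real-world thing counts as which kind.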

Now, can you without looking stuff up tell me the difference between a work and an expression? Or between a manifestation and an item? Sure, we can discuss if this or that thing is an item or something else, back and forth, but is that a good foundation upon which to lay all future library philosophy? Because that's just what it is; a philosophical model we use to make sense of the real world. FRBR is confusing, even if it is a great leap forward in epistemological thinking. For example, when it comes down to identity management (persistent identifiers for one thing can be expressed through a multitude, like a proxy, which FRBR fails at miserably) it is right there in the centre of it, but a lot of it focuses on the wrong part of it, the part that involves human cognition to make decisions about identity.

Anyway, I guess at this point all I'm trying to say is that there are glimpses of what I'm talking about in the library world, and I was attracted to it, I wanted to dedicate parts of my life to fixing a lot of what was broken in the real-world. I came to the library because they are the shining beacon of light in our society.

So, what happened?
Which is why I am interested in smarting up about some of these things. Where should one go for a decent but not mind-blowing introduction to the types of things you have described lately?
It's hard to say what will blow your mind, and what will not. But since you're a library type person I'm going to go out on a limb here, and assume you're a smart person. :) So, I'm going to assume that http://en.wikipedia.org/wiki/Epistemology won't blow your mind. So let's assume we're using the definition for "subject" as such ;
  • An area of knowledge, a topic, an area of interest or study
In terms of philosophy we usually expand that definition a bit wider (so it will also include most discourse and literature) but I'll try to keep it simple. First, a question?

"What does it mean that something
is something?"

This is the basic question for identity, that something exists and that we can talk about and refer to it. Referring to things is a huge portion of what the library does, not only as an archive, but as a living institution where knowledge is harboured. We're talking about subjects put into systems, about being subject-centric in the way we deal with things. Just like our brains do.

Now, for me there are a few things that have happened the last 20-30 years. The world has become more and more knowledge centric (they've gone from "all knowledge is in books" to "knowledge can be found in many places", and the advent of computers and the internet plays no small part in that), while libraries have become more book specific, more focused on the collection part rather than what the collection actually harbours in terms of knowledge (and I suspect this is because there are no traditional tracks within the library world for technology), probably because it's easier and fits better into budget driven government run institutions.

However, this isn't beneficial to the knowledge management part. Libraries are moving steadily towards being archives, but the world wants them to become knowledge specialists. Ouch. And so the libraries will be closed down when they don't deliver knowledge. Archives are what Google does best, and they're not that bad at harbouring basic knowledge. What hope in hell have you got then?

I'm running out of time right now, but feel free to ask any question and point to any of my wrongs, and laugh at it as well; I need the discourse as much as (I hope) you do. Let me just quickly run through that list with comments and pointers ; [editor's note: this is a list of things I felt the library world 'have no clue about' from my mail to the mailing-list]
  • No idea about digital persistent identification.
What happens to identifiers when people stop maintaining them? They lose their semantic and intrinsic value, and become moot. How many libraries maintain their age old software? No, a more human, less technological means of resolving is needed, and when the world went digital the choice of multiple identities became not only possible but inevitable. Yet, when the library world manages identities as OCLC / LOC record numbers at the item level, things go horribly wrong and you cannot take what you've defined and learned into the philosophical space. Even if the OCLC / LOC numbers are maintained till the end of the world, they do not solve basic epistemological problems.
  • No subject-centricity.
FRBR does actually provide some, but it is not focused on the epistemological problems, only one of identifying the problem of identification without providing a mechanism (real or philosophical) for doing so.
  • No understanding of semantics in data modeling.
The AACR2 / RDA world is, in some definition of the terms, a data model. And between entities in data models there are semantics, meaning the relationships themselves, their names, roles and thought purpose. But you have to understand, as a human, all of AACR2 / RDA to be able to model anything with it; there's no platform on which to stand, there are no atomic parts you can use to build molecules and then cells and then beings. The whole model is, in fact, a hobbled-together set of fields without structure (and no, numbering them is not a structure :), and without structure there's only rules. And rules without structure are only human-enforceable.
  • No clue about ontologies, inferencing, guides by analogy
This is a stab at what the Semantic Web people are doing. They have a long background from AI and knowledge management, and if you guys were at least on par with that group, there could be some better understanding of the issues. The SemWeb crowd understand a lot of first-order logic, inferencing, analogy, case-based reasoning, and so forth, all stuff you need to have computers understand a tad bit better how your data is hobbled together, how they all interact, how entities and relationships (remember those? :) are mapped.

I should of course make a note here that I think that the SemWeb efforts are mostly wrong, and that they could learn an awful lot from librarians in the way to deal with collections and access, but that's a different discourse for some other time. :)
  • no real knowledge about collection management ( ... wait for it ...) with multiple hooks and identities
I was actually hoping people would jump on this one, getting offended that I said they had no real knowledge of collection management (which is their forte, it is what they do!), but I guess they saw the hook and line of *identities*, and jumped over it. Dang.

It's all about the identity of what you are collecting. Crikey, publishers haven't even got ISBN to work (how many times do I put in one ISBN to get a completely different book ...), and one would think that would provide hints to why this is hard, and perhaps what to do otherwise. Hmm.

-- end of mail except some more personal ramblings not fit for generic consumption --


25 September 2008

MARCXML : Beast of burden

Lately I've been talking with librarians again. I left their den about 8 months ago and went a bit cool after that, needing some fresh air and to distance myself a bit from everything in that world. But, as I said, I've been lured back again by my own stupid notion to save humanity from itself through the channels the library world offers.

As much as I'm a fanboy of the library world, I'm also quite critical of library world thinking, the collective direction it's heading and the way they deal with probably their biggest challenge ever; their own survival when the book turns digital.

Today I'll rant a bit about a piece of technology that often is hailed as being the library world's ticket into the modern techie world, a piece of the future solution, albeit with a few minor warts that could be sorted out. I don't agree; I think MARCXML is the plague, and I'm here to tell you why. First, here's how Library of Congress describes it;
framework for working with MARC data in a XML environment
First of all; framework? Framework suggests something more than a mere format, and yes, there's an XSLT sheet or two there that could convert MARCXML to HTML or somesuch. That's not a framework, that's a format with a few conversion scripts. Framework suggests tools I can use to get some juice, which is nowhere in sight.

Anyway, let's move on to the 8 main design goals or considerations, with my comments;

1. Simple and Flexible MARC XML Schema

The core of the MARC XML framework is a simple XML schema which contains MARC data. This base schema output can be used where full MARC records are needed or act as a "bus" to enable MARC data records to go through further transformations such as to Dublin Core and/or processes such as validation. The MARC XML schema will not need to be edited to reflect minor changes to MARC21. The schema retains the semantics of MARC.

All control fields, including the leader are treated as a data string. Fields are treated as elements with the tag as an attribute and indicators treated as attributes. Subfields are treated as subelements with the subfield code as an attribute.
Oh, it's simple alright, in the same sense that a frog that sits in a pot of cold water that's slowly getting hotter to the boiling-point won't hop out to save himself, attributed to very simple neuron- and nerve-control over time (meaning, they're great at short-time tasks, but suck if the time stretches out a bit). We're talking about mechanisms that are so simple you wonder how they didn't get outed in the evolution of things.

Let's start with "All control fields, including the leader are treated as a data string." Here's a quick example;
<leader>01142cam  2200301 a 4500</leader>
<controlfield tag="001"> 92005291 </controlfield>
<controlfield tag="003">DLC</controlfield>
<controlfield tag="005">19930521155141.9</controlfield>
<controlfield tag="008">920219s1993 caua j 000 0 eng </controlfield>
Not sure you can see it straight away, but they've got a reliance here on whitespace being preserved, in a format that had as a goal to get rid of reliance on whitespace. How's that for a good start? I'm not sure how many times this has bitten me, as pretty much any and all XML tools out there will be whitespace-agnostic by default (meaning, they'll often reduce it). In order to use MARCXML properly you have to change the whitespace options in pretty much all your tools, if they allow you to.
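To make the whitespace point concrete, here's a minimal sketch in Python. It assumes the MARC 21 convention that leader positions 12-16 hold the base address of data; the `base_address` helper and the "collapse runs of spaces" step are my own illustration of what a whitespace-normalizing tool might do, not anything the Library of Congress ships.

```python
import xml.etree.ElementTree as ET

# The leader from the example above, exactly 24 characters long.
LEADER_XML = "<leader>01142cam  2200301 a 4500</leader>"


def base_address(leader: str) -> str:
    """Return leader positions 12-16 (base address of data in MARC 21)."""
    return leader[12:17]


# ElementTree keeps the text as written, double space included.
raw = ET.fromstring(LEADER_XML).text

# Simulate a tool that normalizes whitespace (an assumption for illustration).
collapsed = " ".join(raw.split())

print(len(raw), repr(base_address(raw)))
print(len(collapsed), repr(base_address(collapsed)))
```

One lost space and every positional value after it shifts by one, so the record still parses as XML but no longer means what it said.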

Next up, if you go to lengths to create an XML schema you should already be aware that semantic meta data becomes part of your names and fields (and I'll get back to this point a lot, really). Sure it's a quick and dirty way to get your XML chops started, but is it wise to do this?
<datafield tag="245" ind1="1" ind2="0">
<subfield code="a">Arithmetic /</subfield>
I'll translate what this does for you;
The MARC tag 245 means "title statement", and the code "a" means, uh, title. This particular madness comes from the culture of MARC itself which I'll rant about some other time (and have in the past), so I'll try to stick to the pure XML part of it ;

What were you thinking? That 245 is easier to remember than "title"? Hardly. Perhaps the international side is more convincing, that 245 is easier to remember for those who want a title in Norwegian ("tittel")? I seriously can't think of any other format that does it this way, and it doesn't seem to have stopped the success of other formats in the world. No, this particular thing has all to do with the fact that MARCXML isn't as much XML as it is MARC; it's really MARC with a bad hairdo, showing a thinking that as long as we can just claim it has some affiliation with XML then we're hip and cool and we're drinking the new techie XML kool-aid.

And this is by far the biggest problem with MARCXML; it thinks it is XML, but it really isn't, which leads to all sorts of unfortunate situations, like ;
  • Librarians are fooled into thinking their meta data is ready for an increasingly XMLish world
  • Librarians think they can throw XML tools and programmers at it with ease
  • Librarians think they get all the XML goodies and benefits
Let's run through these;

Librarians are fooled into thinking their meta data is ready for an increasingly XMLish world

There's not much these days that hasn't got some anchoring in XML technology. I don't need to go into detail about all the XML technology used to even write and publish this little blog post. But when your MARCXML isn't real XML, all the XML technology in the world is rendered useless for you.

Let me try to clarify this as simply as I can, through the use of XPath (an XML query language used pretty much anywhere there is XML technology). Here's what I would write if the XML is real;
And here is what I have to do with MARCXML;
It really isn't optimized for computerized fetching or indexing, and what's more important is this; Notice the tree-structure of the former example, and the lack of obvious structure in the latter. Let's talk about structure, because, frankly, if you aren't then you shouldn't use XML.
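A minimal sketch of that contrast, using the limited XPath support in Python's standard library. The "real XML" record with a plain `<title>` element is a hypothetical layout of my own, and the MARCXML snippet leaves out the namespace a real record would carry, so treat both as illustrations rather than the actual formats.

```python
import xml.etree.ElementTree as ET

# Hypothetical semantic markup: the element name says what the data is.
SEMANTIC = "<record><title>Arithmetic</title></record>"

# MARCXML-style markup, as in the example earlier (namespace omitted).
MARCXML = """<record>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Arithmetic /</subfield>
  </datafield>
</record>"""

# With semantic names the path reads like the question you are asking.
semantic_title = ET.fromstring(SEMANTIC).findtext("title")

# With MARCXML you filter generic elements on magic attribute values.
marc_title = ET.fromstring(MARCXML).findtext(
    "datafield[@tag='245']/subfield[@code='a']"
)

print(semantic_title)
print(marc_title)
```

Both queries work, but only the first can be written by someone who has never seen a MARC manual.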

We humans have a good sense of structure. Our brains are great at categorization, we do it all the time, break things into category prototypes and derivatives to gather some kind of meaning. A tree-structure is the closest and easiest structure that binds humans and computers together, in the sense that trees are easy for a computer to work with, and easy for a human to understand. (We humans have a natural knack for prototypes and graphs [not the presentation slide kind] that I've talked about earlier, which we shouldn't misinterpret here)

With these faux but useful tree-structures comes mediation between man and computer, a way to advance us further. Take note, because this is an understated and overlooked benefit of XML over any binary (or XML wannabe) format out there. And none of these benefits can you find in MARCXML because there are only two levels involved; field and sub-field. It's, in fact, rather flat and with non-semantic names. Can you get any further from the reasons XML was created?

Librarians think they can throw XML tools and programmers at it with ease

No you can't. Your XML is bad, and XML tools and programmers are going to struggle with your XML. They'll waste most of their time trying to figure out why the hell someone came up with this evil way of making your brain melt. Well, obviously, if your brain melts, it's evil, but there is something so anti-XML about the way MARCXML was designed I'm starting to wonder.

There's probably a ton of tools out there that deal great with XML, but not a single tool (at least in the mainstream) that has ever heard of MARCXML, and even when you throw the MARCXML Schema at them it does them little to no good. You still need domain experts to do anything with it, you still need special knowledge to move around it, and you get absolutely nothing for free in the lack of typed data and semantically rich markup.
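Here's a rough sketch of what that "special knowledge" looks like once it ends up in code. The `FIELD_NAMES` table and the `name_fields` helper are hypothetical and cover only two subfields of tag 245; the point is that somebody has to hand-write and maintain a table like this, because the markup itself carries none of the meaning.

```python
import xml.etree.ElementTree as ET

# Hypothetical, deliberately tiny lookup table: the domain knowledge that
# the markup does not carry, typed in by hand from the MARC documentation.
FIELD_NAMES = {
    ("245", "a"): "title",
    ("245", "c"): "statement_of_responsibility",
}

MARCXML = """<record>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Arithmetic /</subfield>
    <subfield code="c">Carl Sandburg ; illustrated as an anamorphic adventure by Ted Rand.</subfield>
  </datafield>
</record>"""


def name_fields(record_xml: str) -> dict:
    """Map (tag, code) pairs to readable names using the lookup table."""
    named = {}
    root = ET.fromstring(record_xml)
    for datafield in root.iter("datafield"):
        tag = datafield.get("tag")
        for subfield in datafield.iter("subfield"):
            code = subfield.get("code")
            # Anything not in the table stays opaque to a generic tool.
            label = FIELD_NAMES.get((tag, code), f"unknown_{tag}_{code}")
            named[label] = subfield.text
    return named


fields = name_fields(MARCXML)
print(fields["title"])
```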

Librarians think they get all the XML goodies and benefits

XML comes with a host of good stuff, like xml:id attributes and ID/IDREF references that lots of tools understand (including XSLT), built-in language support, extensibility through namespaces, mixed content models, character encoding rules and guarantees, Unicode (for the most part), and when you think of all the XML technologies out there that already adhere to and use these benefits to create a complete development universe, who's missing out on all of this?

2. Lossless Conversion of MARC to XML

3. Roundtripability from XML back to MARC

Both of these are the same; we're not using any of the goodness of XML, we're pretty much MARC in a small XML wrapper, so we can easily convert back and forth from MARC and MARCXML. But conversions between XML schemas aren't in scope, so as long as you're working in your own little non-shared universe you're good to go, but life sucks if you dare step out of it.

4. Data Presentation

Once MARC data has been converted to XML, data presentation is possible by writing a XML stylesheet to select the MARC elements to be displayed and to apply the appropriate markup.

This must be part of that "framework" they're talking about but, um, you can present MARC elements and records with or without XML, and converting it into something else in the first place denotes that you can do "stuff" with it. This point is mere fluff.

5. MARC Editing

Some single or batch updates such as adding, updating, or deleting a field to a MARC record can be accomplished with simple XML transformations
Ugh, more fluff. This is basically saying "you can do stuff with it. Do it yourself."

6. Data Conversion

Most data conversions can be written as XML transformations. For more complex transformations of the data, software tools which read MARC XML can be written.
And yet more fluff, saying the same "you can do stuff with it. Do it yourself."

7. Validation of MARC data

Validation with this schema is accomplished via a software tool. This software, external to the schema, will provide three possible levels of validation:
* Basic XML validation according to the MARC XML Schema
* Validation of MARC21 tagging (field and subfield)
* Validation of MARC record content, e.g., coded values, dates, and times.
Now it's getting crazy. First, "basic validation according to MARC XML Schema" means you can make sure that the XML document hasn't got more than 5 elements, the right set of very few attributes, and that's it. Basically, the advantage you get here is to make sure that the crappy structure of MARCXML is preserved and valid. Goody.

Secondly, validation of tagging doesn't exist! What they really mean is that the formatting in the tagging attributes is according to certain character-based rules, that the type (which is extremely loose) is correct. Tagging, you may ask. No, not tagging (which would be useful), but the MARC tags, which come in the absolute number of 999 and are, of course, all numbers. And the validation doesn't even adhere to the type-based system the tags themselves denote. Incredible, ain't it?

Third, the bragging of "Validation of MARC record content" is pure nonsense and doesn't exist unless, you guessed it, you made it yourself or found someone else's code. Good luck with all that.
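For a taste of what "made it yourself" means, here's a minimal sketch of content validation for a single control field. It assumes the MARC 21 convention that field 005 is a timestamp of the form yyyymmddhhmmss.f; the `parse_005` function is my own hypothetical helper, not part of any MARCXML tooling.

```python
from datetime import datetime


def parse_005(value: str) -> datetime:
    """Parse a MARC 005 value, assumed to look like yyyymmddhhmmss.f."""
    if len(value) != 16 or value[14] != ".":
        raise ValueError(f"not a 005 timestamp: {value!r}")
    # strptime rejects impossible dates and times for us.
    stamp = datetime.strptime(value[:14], "%Y%m%d%H%M%S")
    # The final character is tenths of a second; int() rejects non-digits.
    tenths = int(value[15])
    return stamp.replace(microsecond=tenths * 100_000)


# The 005 value from the control field example earlier in this post.
parsed = parse_005("19930521155141.9")
print(parsed.isoformat())
```

That's one field out of hundreds, and the schema gives you none of it; to the XML tooling the value is just a string.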

8. Extensibility

By using XML as the structure for MARC records, users of the MARC in the XML framework can more easily write their own tools to consume, manipulate, and convert MARC data.
Finally, the biggest bullshit statement of all, the one that basically says "now it's in XML; everything will be easy from here on in."

This last section gets its own headline;

What really happens

Seriously, have the people involved in MARCXML any expertise in XML? I know this is a bold and somewhat insulting statement. I can understand why MARCXML became what it is, because it's the first and simplest step one can take in getting anything into XML. The claims made about it, though, do not hold up to scrutiny, and in fact are outright bullshitting you into thinking MARCXML should even be considered to be a part of your development tool-chest. It should not.

The whole idea of XML is to have your meta data be the markup, and the data be, uh, data. When we have complex titles, here's what it should look like;
<title>Arithmetic <responsibility>Carl Sandburg ; illustrated as an anamorphic adventure by Ted Rand.</responsibility></title>

But even this isn't good enough; we need typed data values, so that we can verify that what we put in can be used for something we know about, and this is glaringly absent from MARCXML. They probably thought that the problem was too hard, we'll deal with it later, but we are much later now, and nothing has changed. It's luring poor innocent librarians into thinking they're XML savvy, having catalogers think it solves some kind of meta data exchange problem with non-librarians, and making library techies embarrassed to ask XML questions in the fora of the world.

Take a look at this insane example they provide on their website. If you're a MARC junkie you might make something out of it, but if you are anyone else you'll balk at the complexities thrown at you. And the really bad part is that this stuff ain't complex, it only looks that way through crap XML. Here, being in XML is working against you. So, don't show this to your parents.

Finally, forget that MARCXML ever came to be, and look to MADS and MODS instead. Anything but MARCXML. I beg you.


6 December 2007

So long, and thanks for all the smelly fish

It's time to reveal what's going on in my life on a big scale, hinted at in my last dozen or so posts.

Yes, I'm leaving the library world. Yes, I'm leaving Canberra and the National Library of Australia. In fact, I'm leaving Australia altogether. I'm leaving process-oriented committee-driven work (surrounded by the hum of millions upon millions of flies, one of the seriously exciting features about Canberra). I'm leaving good friends and colleagues whom I'll miss, and a cute house that's been a safe home and haven for the last four years. I'm leaving an extended family who - despite their better knowledge - have accepted me as one of them, with friendships that will bring us back to Australia in a few years, I'm sure. I'm even leaving behind our wonderful dog Oscar (in good hands, I might add ; he's going back to his original owners) whom the whole family will miss dearly.

I'm going back to Norway, to work for Bekk Consulting again, and I start at the beginning of February 2008. I'll write more about my role there later, but needless to say these guys know what they're doing, they do it fast and really well, no mucking about. They do not meddle in the semantics of FRBR for 15 years before taking baby-steps to prototype it. Either it's the right thing to do, or it is not. And if it's not, these guys don't do it. And I can't wait to get back into the habit of not doing things we shouldn't do.

We've got a house lined up (in Oslo) centrally and near my beloved woods, and a car. We're scared and excited at the same time, and hope that our Australian friends will forgive us and cheer us on in our adventure, and our Norwegian friends welcome us home and invite us for dinner, if only for a period.

There are many things I want to talk about, from the state of the library world, to the evils of recruiting companies, to Australian business ethics, the value of friendship, and how to plan your future (which is a short piece about how you can try but will fail), but all in due time.

Right now it's time to pack, reflect, and wrap up my Australian adventure in the most positive way I can. Watch this roof, and wish us good luck in our latest big adventure.


28 March 2007

Quiet, oh so quiet ...

Those who wonder why I've been so quiet of late should go read this Lorcan Dempsey bit. Now you also know why I might be a bit quiet for a little while as well, as there are big deadlines and interesting stuff coming up. Watch this space.
