8 February 2010

Richard Dawkins "Greatest Show on Earth"

If I wasn't an indoctrinated corporate drone I would be a scientist, and indeed, back when I was a wee boy I dreamed of becoming a geologist. Boy, did I know my gray rocks from the slightly lighter gray rocks and so on. I took great delight in walks in nature finding moraines and tills and other long-gone remnants of geological implication (glaciers, mostly), and I could tell rhomb porphyry from feldspar and point out the probable processes involved in creating the shapes and colors. It was a glorious time, and I've still got it I think (and I've passed it on to my kids who always make me carry tons of rocks back home ... there's poetic justice if I ever heard it), but nowadays mostly through the local geography (which is interesting in its own right, as the Kiama area is the remnant of several epochs of volcanic activity on top of sandstone, with a strong iron presence. I'll probably make a post about all this in the future sometime).

Knowing something about geology makes you somewhat aware of what's known as geological time, a time frame that spans billions of years. And, as some might suspect, trying to get a grip on what 'billions of years' means for a mere human is a daunting and often failed task. But a rudimentary understanding of geological time and processes also rendered me immune to a lot of the otherwise human misunderstanding and nonsense that our cultures have built up over time to explain all that which we didn't understand. So if you understand unstable (i.e. radioactive) isotopes in rocks and their half-life, how they break down (as a figure of speech) from an unstable to a stable form, you have no problem understanding other processes that also run across billions of years, and indeed, run parallel to geological time and processes. And to someone who not only knows a few things about rocks but also about those things which you find inside rocks, evolution is not hard to grasp, at least not the tenet that it is right there, in front of you, staring back at you after you chipped that piece of rock off the rock wall. For me, it was the most natural thing, and indeed sparked my deep interest in all things biological as well.

So for me, reading Dawkins' book "The Greatest Show on Earth" was more like reading a dumbed-down defense of something that I thought no one was stupid enough to refute. But, there it was, in the first chapter, a fleshing out that there were indeed idiots out there who just couldn't grasp the most basic notions and evidence, people who actually thought everything we see now has been unchanging for all the time the earth and the universe have existed; about 10,000 years. Huh? *blink* Maybe the sub-title should have tipped me off; "The evidence for evolution", as if we needed more evidence than what was taught in school.

Then I realized that not all the kids I went to school with paid too much attention when such big issues came up. They probably passed the tests and all, but I did not see them engage with (or annoy with too many questions) the teacher the way I think I did, they didn't go out into the woods to climb rocks and find fossils themselves, they didn't look at the layers in the side of a deep canyon with a river at the bottom and deduce what was responsible for the canyon, what dug it, how the shape came to be. I guess they ended up not knowing as much, at least not on these subjects.

And that was the greatest shock for me; the world really needs to be convinced that evolution is real!?

It was like someone punched me in the gut; here I was thinking our human species was going places, and then I found out that the truth somehow is in question. Their argument against it is nothing short of a laughing matter, and it all comes down to the fact that their faith is in disagreement with the science. Ouch. So who do we think is right? The people of faith and no facts, or thousands of scientists working together for hundreds of years on the greatest Utopian adventure humankind has ever ventured on? Oh, the irony.

For me, in short, the book is great; it's well-written, perhaps two notches too intelligent in places (c'mon, references to poetry? Who reads poetry anymore? And it uses a lot of big words), but a tad too apologetic, as there is nothing excusable about being ignorant by choice (although I understand that this angle is mostly for the US market) and, I feel, just way too soft on the "opposition." These people are clearly not just history deniers, they are outright dishonest about their thirst for truth and knowledge, probably wouldn't know epistemology if it hit them over the head, and cannot fathom that human traits and physiology only make sense in evolutionary terms (have you checked your vestigial parts lately?), nor that since the discovery of genetics there is a huge amount of science that only works if evolution is true over geological time. I agree that thinking evolution is not true is crazy on a scale of, err, biblical proportions, and as much as this book wasn't for me, I guess there is a strong need for it if there truly are this many nut cases out there who will deny anything if it doesn't sync with their faith or holy book. Weird.

4 February 2010

Topic Maps, 10 years down the line

I'm told, by way of my own imagination based on loose rumors put out by flying pink fairies, that Topic Maps is a waning technology, poorly supported by the IT industry at large, hard to wrap your head around, and generally icky to deal with.

All of this is, unfortunately, true.

But, as in all stories told by only one side, there is another side just waiting to come out into the light, just one day, real soon now. This day may never come, but here is my own little attempt to shed some light on a few of the issues with the Topic Maps world. It was about 10 years ago I first got a whiff of Topic Maps, so it seems fitting for my first post in 2010 to take some Topic Maps rumors, loose observations and vague statements, and make some comments along the way. Here we go ;

1. Topic Maps are hard

Why, yes, to a commoner or some person with a somewhat traditional approach to computing, Topic Maps can indeed seem like an alien concept at first. The first time I started reading up on it I was mesmerized and frightened at the same time, wondering where the magic would bring me and just how painful it would be for me when reality would kick in (and me) ; there were new notions and concepts, new words, new paradigms everywhere! Reification, role types, associations, occurrences, occurrence types, typified information, subjects and topics, ontologies (upper, lower, specialized ones), the list goes on. It is terrifying indeed, and for many, many people it is so terrifying that SQL and C# and .Net and C and PHP seem like a comforting auntie lulling you back into things we know and know well, no hard thinking required (just lots of hair to pull out).

Until you realize a few things, that is. For example, the vocabulary is anchored in information science, and with a bit of research or learning it shouldn't take that long to get familiar with it. Even the complex issues of reification and ontologies after some time will be as normal and self-explainable as second-cousins and language. (And yes, there is a correlation between the examples given! See if you can find them!) And perhaps more importantly, the problems you can solve with Topic Maps can completely and utterly eradicate the major problems those traditional methods give us, one of the biggest bug-bears that I'd ever had! (Anyone wish to offer me a book deal on how to solve most of the main IT development problems in seriously interesting ways? :)

Can I just mention that having a small epiphany about Topic Maps has the effect of you never returning to the real world and looking at it the same way, ever again? I have never met a person who got Topic Maps and then returned to the old ways, at least not without making huge compromises. Getting it will change you in good ways, and is most definitely worth the effort despite the pain.

Tips to newbies: It's not really hard, even if it seems hard. But it requires you to change your mind on some key issues.

2. Topic Maps are poorly supported in the real-world

Oh yes, indeed. If you talk to anyone, any company in your immediate serenity (yes, a tautological pun) and ask them about their use of Topic Maps, you'd most likely get a blank stare back and a careful "What would we need maps for?"

There's the odd technically inclined person who might know a tad about what these fabled Topic Maps are all about, but very, very few people understand what they are, and even fewer have implemented them into something useful. (The exception to this is, oddly enough, the country of Norway, and some scantily-clad areas of southern Germany.) No mainstream software package comes with the stuff wrapped in, no word-processor touts its amazingness, no operating system comes with support for it, and no popular software of any kind uses it.

But then, there's the odd system that uses it. You'll find it also in the odd Norwegian government portal, which is bizarre in its own right, and perhaps deep down in some underfunded academic project or perhaps some commercial project where parts of the data-model masquerade as it. My old website uses it. I have a framework or two. There's the odd other open-source project, a few APIs, and a host of other well-meaning but obscure projects that perhaps have got it, albeit well hidden and kept away from children.

For a technology that stands out as something that can fix it all, I find it bizarre that it is found so seldom, but then bizarre is not the same as surprised. And when you look at the "competition", the well-funded, well-marketed, well-established world of the Semantic Web, championed by none other than the W3C and Tim Berners-Lee, well you have to concede that it shouldn't be much of a surprise at all, really. Topic Maps is a tiny group of enthusiasts (a few hundred, being liberal with statistics) who'll saw off their right leg if it meant we could get the specs done in time, while the Semantic World is littered with academia, organisations and companies (we're talking thousands upon thousands of people actively working on it), so no, you should not be surprised.

Tips to newbies: As the saying goes, if a million flies eat it ... surely, it has some nutritional value or greater worth over, say, that green grass the cows are dumping it on?

3. Topic Maps is dying and obsolete; use RDF instead

There was a period about 10 years ago which I regard as the Topic Maps time of bloom ; the trees had beautiful flowers on, the pink and purple petals falling over the world of IT like a slow-motion rainfall of beauty. Everywhere you turned there were people talking about it and potential projects popping up all the time.

But time went by. Topic Maps was too hard for most (see point 1 and 2), and not just the technical implications themselves and the language and terms used, but also the philosophy of it, the very idea of why we should be using it over, say, any relational database or traditional software stack. I mean, what's the point, really?

The point is easy to miss, admittedly. A technology that can be used for everything is hard to pin down and said to be good for something. And we have focused just too damn much on knowledge management systems, and not only that, but used our own special language in the process which often is quite remote from knowledge management speech in the enterprise arena (but you find it rife in academia). When the world looks to Topic Maps, all they see is a difficult way to do knowledge management. Ugh.

Myself, I'm using Topic Maps in highly non-traditional ways. I use maps for my application (definitions, actions and functionality), for functional topology (generic functionality in hyper-systems based on typification), for business logic (rules, conditions, interactions) and, perhaps just as important, for the actual development itself (modules and plugins, deployment, versioning, services) which makes for a highly (and this "highly" is quite higher than any normally used "highly") customizable and flexible framework for making great semantic applications. But more on the details at some later stage.

Tips for newbies: No, it's not dead nor dying, just not as popular as stuff that's easier or more accessible

4. Topic Maps is nothing new

Well, given its roughly 20 year history (and I'm counting from the early days of HyTime), in Internet years it's an old, old dog, so by that alone we can't say there's anything new, but most people would take "new" here to mean something like "we've been doing X for years, so why do we need this?", where X usually points to some bit of the Topic Maps paradigm that indeed has been done before. Of course it has. There is nothing new in Topic Maps except, of course, putting it all together and standardizing one cohesive and complete way of doing pretty damn most of what you would need for your complex data-model, identity management, semantic or otherwise relational, interoperable information and / or structural need, chucking in knowledge management, too, for good measure.

There is of course nothing new with Topic Maps, except that all that old stuff is bundled into a new thing, if you allow a 20 year old standard to be called "new." But then again, "the standard" is really a family of standards, all evolving and changing with the times. There's always a sub-standard (no pun intended ... well, not a lot of pun intended) in the woodworks, always some half-baked document to explain something or other, always something that is so damn specific and concise that the overall grooviness and funky bits are pushed to the side-lines.

Topic Maps is new and old at the same time, but it really is groovy and funky once you overcome the technical jargon and the concise nature of the standards.

Tips to newbies: The king is dead. Long live the king!

5. The Topic Maps community is, um, a bit tricky

Oh, yes indeed. And this one is the hardest to write about as I'm part of this community and know pretty much everyone, some more than others.

So let's say it this way; I'm a difficult person in certain ways, for example I talk a lot, I overflow with ideas rather than code, I don't care too much about political correctness, and I speak my mind and use language that could alienate people with too strong attachments to their ties or their social buckets.

And the core of the Topic Maps community is loaded with weirdos like me; highly opinionated, rough ideas, hard on woo, and soft on business. But the problem isn't the weirdos, but the low number of them. Any successful community with such a wide-ranging and all-encompassing area of what Topic Maps is all about (which is, uh, almost anything) going from epistemology to identity management to ontology work, well, you need a lot of personalities to match them all to make it seem like a lively place. We, on the other hand, have a handful of people, and the contrast between us all is sometimes just too great. And, I've noticed, we're not very good with newbies, either, so even if we answer their questions, quite often our answers are just too far out there for normal people to comprehend (and I've got a ton of circumstantial and anecdotal evidence to back it up).

I'm part of many different communities on the web, but there is only one champion of how fast an online discussion goes private (and it's not of the good kind; it's the kind where we need to express our frustrations in private [because, ultimately, we're nice people who don't want to offend anyone even when they deserve it, those bastards], lest we blow up and our eyes will bleed!), and that's the community which is located on a private server where you must write to the list owner in an email to be added. *sigh*

I tried my "question of the week" thing on the mailing-list for a while, and some of those went well, but too many of those questions quickly descended into nothing or private arenas. So, I'm officially giving up on it for now. Maybe I'll come back stronger once my spine grows back, who knows?

Tips for newbies: Be strong, keep at it, ask for clarification! We don't know just how alien we are. And please join in as we need more weirdos.

6. What, exactly, is Topic Maps, anyways? I don't get it!

Yes, indeed, what exactly is this darn Topic Maps thing? The funny thing is that there is no correct answer to that question. First of all, it's a family of standards that we collectively call "Topic Maps", but it could also mean either the TMDM (Topic Maps Data Model) standard or the XTM (XML Topic Maps, the XML interchange format) standard, depending on your non-sexual preferences. Some might even go out on a limb (obviously not the limb cut off in point no. 2) and claim that it means the TMRM (Topic Maps Reference Model) which is a more abstract framework, or possibly even just the philosophical direction - or, dare I say it, zeitgeist? - of the thing, like a blueprint for how to build a key-value recursive property framework with an identity and knowledge management system. Your mileage may vary.

But then we have a problem as it is not a technology nor a format. It is more akin to a language, a model or a direction of sorts. No, not a language like SQL (even though the TMQL (Topic Maps Query Language) could be said to hold that place) that is to be parsed by a computer, nor a language like Norwegian or English. No, we're talking about a language that sits right in the middle between the computer and the human, a kind of mediator or translator, a model in which both machine and human can do things that each part understands equally well, a model which is defined through information science, math and human language.

So what is it? It's a language that both computers and humans can use without pulling too much in either direction, a language in the middle that, if spoken by many parties (computers and humans both), they can all join hands and sing beautiful knowledge management songs together, share and propagate with ease. But of course, Topic Maps isn't limited to just knowledge management, oh no. You can solve unsurmountable things with it as you can make it represent whatever you want it to, and I really, truly mean anything. If you want a topic to represent your thing, off you go. It's that flexible.

It can work as the basis for pretty much any system that has structures in it of any kind or shape, and that, by and large, is pretty much any system ever built. So it's actually quite hard to explain just what you can use it for, even though traditionally it's content management, portals and knowledge management.

Tips to newbies: It's only a model ...

So there you go, a quick summary of bits and bobs about Topic Maps. In my next installment, I'll summarize my navel fluff collection, next the train-table changes of Minnamurra station of the last 10 years, and finally I thought I'd summarize all the redundant technology that's gathering dust in my garage. Stay tuned for exciting times ahead!

8 December 2009

I ain't dead!

Right, so I'm still here, in the rubble of my mind, trying to work something out. I haven't blogged in the last few weeks, because, again I'm lost to the infinite machine of just too bloody much to blog about. Some of these things are somewhat secret stuff, but a lot of it I should yell out for all to see, and I'm sure with a bit of patience and Macedonian Oil I just might.

But not now. "Now" is just a futuristic recap of things I've blogged about in the near future, using cheesy book titles ;

  • The end is far, far away : Studies in Cosmology, thoughts on the flat universe model and evolutionary natural selection, and how timing is everything
  • Ontology schmology bolony : Everything I know about ontologies, linked data and inference, and just what a bloody mess it all is (and the possible ways through it, as far as I can see)
  • Library end-times : This is what they were, this is what they are, this is what they'll become
  • Evolution as a driver for moral philosophy : Philosophical greats had good questions that now make for redundant answers
  • Have you heard this?! : Science as a language of beauty, art and transcendence
  • Functionally complete impotence : Programming languages that mean well, but are ugly, smell bad and won't make you light up after having sex with them
  • Atheism and agnosticism : A transsexual ploy to power (or, a Tale of two Ditties)
  • Software : Sitting comfortably? How sitting in front of a computer makes you a terrible programmer
  • Books I've read : The good stuff (and where to go next)
  • Books I've read : Time I've wasted (and the reasons for it)
  • Lingua Panga! : How language poisons everything (the big problems of humanity blamed on humans talking too much)

Right, so that should leave some clues as to where I am and where my mind is. My bedside table is brimming over with books and notes, and I've got a few half-written articles (novels, is more like it) sitting around waiting for me to retire so I can bloody well finish them.

I also have some real articles in the oven, about service-oriented architecture (SOA) perils and solutions in a time of cloud hysteria, parallel processing mania and key-value minimalist thinking as a way to leverage, er, something or other. I'm sure it'll be great once I figure out what I'm writing.

Oh, and it's hot here now. We've been to the beach rather often, and being a dad of three crazy kids I don't get to go in the water much, but I enjoy helping Sam dismantle the beach with a shovel and bucket. It's also end-of-year stuff with school, Lilje playing in some musical number or two, and generally for Grace and Lilje to say goodbye to Minnamurra Primary as we're moving them over to Shellharbour Anglican College next year following Julie's new job there. We're also moving houses in about 10 days (closer to the beach, so it must be good, although I'll miss the close proximity to all my coffee-shops), so that's going to be crazy time.

Ok, time to go and treat kids for head-lice which has rampaged through their school of late. And then, dinner. Wish me luck.

27 October 2009

On identity

What are you talking about?

   We're always talking about something, but have you wondered why we humans are so good at it? It's not because we're smart, that our brain has got some amazing capacity for language, or even that we've evolved a great sense of logic and inference so we can break sentences up into compartments, parse it and make some sense of it. No, it's because we've got a tremendous imagination!

   And it seems that our frontal lobe is to blame; it is linked to a number of cognitively important things, like dreaming (preparing the brain for situations and trauma; did you know that no matter the trauma you will be over it [as in, able to move on] within 7 months?), Déjà vu (the frontal lobe is always a few milliseconds ahead of you), intuition (simulating possibilities, feeding you with probables), and in this context, filling in the gaps as best it can.

   And boy is it good at it. Remember that meme that was floating around some time ago, about how researchers have found that if you removed some of the letters from words in a text, the brain is still able to fill in the gaps so that you can make sense of it? The brain will fill in whatever gap there is, and this is also being heavily linked to religion and why people believe in rather bizarre things, from ghosts to conspiracies to "alternative medicine" ("You know what they call alternative medicine which is proven to work? Medicine." -- Tim Minchin). But I'm not going to get into what they believe here, only how they believe in the same bizarre things as their peers.

   But first some background. My recent adventures in library-land are about trying to get some traction on identity management, which I have tried to explain there for the last two or three years with little to no success. I'm not even sure why the library world - full of people who should know a thing or two about epistemology - doesn't seem to grasp the basics of epistemology. (Maybe it's another one of those gaps the brain fills in with rubbish?) How do we know that we're talking about the same thing?

   If I have a book A in my collection and Bob has a book B in his collection, how can we determine if these two books share some common properties or, if we're really lucky, are written by the same author, have the same title, and are the same edition, published by the same publisher? We're trying to establish some form of identity. Now, we humans are good at this stuff because we're all fuzzy and have got this brain which fills in the gaps for us, but when we make systems of it we need other ways to denote identity.

   The library world has a setup which is based around the title and the author, so for example we get "Dune" by Frank Herbert (1920-1986), or if we are to cite it, something like this (from NLA's catalog) ;

  • APA Citation:  Herbert, Frank,  1972  Dune  Chilton Book Co., Philadelphia :
  • MLA Citation:   Herbert, Frank,  Dune  Chilton Book Co., Philadelphia :  1972
  • Australian Citation:  Herbert, Frank,  1972,  Dune  Chilton Book Co., Philadelphia :
   Never mind that when you look at the record itself it lists Herbert as "Herbert, Frank, 1920-" confusing a lot of automata by not knowing he died over 20 years ago. So we've got several ways of citing the book, several ways of denoting the author ... what to do?


   The library world is doing a lot of match and merge (on human prose, no less!), where since you know that a lot of authors have died since their records were last updated, you can parse the author field and try to match "sub-fields" within it to match on that. However, this quickly becomes problematic ;

  • Herbert, Frank (1920-)
  • Herbert, Frank (1921-1986)
  • Herbert, Francis (-1986)
  • Herbert, Franklin (1920-)
  • Herbert, Franklin Patrick Jr (1919-)
  • Herbert, Francis (1030-)
  • Herbert, Frank Morris (1920-)

   Which of these is the real Frank Herbert who wrote the book "Dune"? Four of them, actually. Now, if you're a human you can do some searching and probably find out which ones they are, but if you're a computer you have Buckley's trying to figure these things out, no matter how well you parse and analyse the author's individual "sub-fields". People make mistakes and enter imprecise or outright wrong information into the metadata (for a variety of reasons), so we need some other method that's a bit better than this. However, do note that this is the way it's currently being done. Add internationalization to the mix, and you'll have loads of fun trying to make sense of your authority records, as they are called.
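
To make the fragility concrete, here is a minimal Python sketch of strict "sub-field" matching over the variants listed above. The regular expression, the function names and the reference record "Herbert, Frank (1920-1986)" are assumptions made for illustration, not how any particular library system does it.

```python
import re

# The author-field variants from the list above, exactly as written.
VARIANTS = [
    "Herbert, Frank (1920-)",
    "Herbert, Frank (1921-1986)",
    "Herbert, Francis (-1986)",
    "Herbert, Franklin (1920-)",
    "Herbert, Franklin Patrick Jr (1919-)",
    "Herbert, Francis (1030-)",
    "Herbert, Frank Morris (1920-)",
]

# Assumed layout: "Surname, Given names (born-died)", dates optional.
PATTERN = re.compile(
    r"^(?P<surname>[^,]+),\s*(?P<given>[^(]+?)\s*\((?P<born>\d*)-(?P<died>\d*)\)$"
)


def parse_author(field):
    """Split an author field into surname, given, born and died sub-fields."""
    match = PATTERN.match(field)
    if match is None:
        raise ValueError("unrecognised author field: " + field)
    return match.groupdict()


def strict_match(a, b):
    """Two author fields match only if every sub-field is identical."""
    return parse_author(a) == parse_author(b)


# Hypothetical "clean" reference record to match against.
REFERENCE = "Herbert, Frank (1920-1986)"

matches = [v for v in VARIANTS if strict_match(v, REFERENCE)]
same_surname = [v for v in VARIANTS if parse_author(v)["surname"] == "Herbert"]

print(len(matches), "strict matches,", len(same_surname), "surname-only matches")
```

Strict matching finds none of the seven, while matching on the surname alone accepts all seven, including whichever ones are not the author of "Dune". Any rule in between is a guess about which typos to forgive.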

   Now, my book A just happened to be "Dune" by Frank Herbert, so I sent a mail to Bob with the following link and asked if that happened to be the same book ;
http://en.wikipedia.org/wiki/Dune_(novel)
   Did you notice what just happened? I used a URI as an identifier for a subject. If you pop that URI into your browser, it will take you to Wikipedia's article on the book and provide a lot of info there in human prose about this book, and this would make it rather easy for Bob to say that, yes indeed, that's the same book I've got. So now we've got me and Bob agreeing that we have the same book.

   How can our computer systems do the same? They cannot read English, certainly not to any capacity to reason or infer the identity of the subject noted on that Wikipedia page. But here's the thing; that URI is two things ;

  1. An HTTP URI which a browser can resolve, will get a web page back for, and which it displays to a human to read.
  2. A series of characters and letters in a string.

   It's the second point which is interesting for us when computers need to find identity. It is a string that represents something. It isn't the web page itself, just an identifier for that page, just a representation of a particular subject. This brings us back to epistemology, and more specifically representationalism; we've created a symbol, a string of letters, that doesn't need to be read or understood when the strings are put together, but is simply a pattern, a shape, a symbol, an icon, a token, whatever. It's not a URI anymore, but simply a token. And because it's a string of characters, it's easy to compare one token against the other. "http://bingo.com" and "http://bingo.com" have the same equivalence as "abc" and "abc", that is, they are the same. Those symbols, those tokens, are equal.

   So now we can say that the URI http://en.wikipedia.org/wiki/Dune_(novel) is simply a token and a URI at the same time. This is deliberate, and bloody brilliant at the same time; it means that we can compare a host of them for equality as well as being resolvable in case we want to have a look at what they are. This becomes a mechanism for both human understanding of what's on the other end of the URI, and for doing computational comparisons.
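
As a rough sketch of what "comparing tokens" means in practice, the Python below treats the URI as an opaque string. The two catalogue records and their field names are hypothetical; only the URI comes from the text above.

```python
# The URI from the text, used purely as a token (a string to compare).
DUNE = "http://en.wikipedia.org/wiki/Dune_(novel)"

# Hypothetical records from two collections that describe the book differently.
my_record = {"title": "Dune", "author": "Herbert, Frank (1920-)", "subject": DUNE}
bobs_record = {"title": "Dune (novel)", "author": "Herbert, Frank Morris (1920-)", "subject": DUNE}


def same_subject(a, b):
    """Plain string equality on the identifier: no parsing, no reading of prose."""
    return a["subject"] == b["subject"]


print(same_subject(my_record, bobs_record))             # the tokens are equal
print(my_record["author"] == bobs_record["author"])     # the prose fields are not
```

This sketch compares the strings exactly as given and does no URI normalization; a real system would have to decide whether, say, a trailing slash or different letter case counts as the same token.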

   So are we to use a URI for each of the variations of Frank Herbert's name? No, that would bring us back to square one. No, the idea is for sharing these URIs (but more on URIs for multiple names in a minute) in a reasonable fashion, but this is where it gets slightly complex because when you talk to Semantic Web people it's all about established ontologies and shared data. When you talk to people, it's all about resolvable URIs. But there's a bit that's missing ;
I love http://en.wikipedia.org/wiki/Semantic_Web
   That's a classic statement, but what am I saying? Do I love the Semantic Web (the subject), or do I love that web page article at Wikipedia explaining the Semantic Web (a resource)?

   Incidentally, my classic statement is known as a statement in the RDF world, or as a triple (because it's got three parts, the three words / notions, formally called subject, predicate and object). Whenever we're working with RDF, we're working with URIs. Every single entity is translated into its URI form like so ;
I [http://shelter.nu/me.html]
love [http://en.wikipedia.org/wiki/Love#Interpersonal_love]
Semantic Web [http://en.wikipedia.org/wiki/Semantic_Web]
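
The three lines above can be sketched as a data structure. This is a minimal Python illustration, not an RDF library; the `Triple` type and the idea of keeping statements in a plain set are assumptions for the example, while the URIs are the ones given in the text.

```python
from collections import namedtuple

# A statement is three URIs in a fixed order: subject, predicate, object.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

statement = Triple(
    subject="http://shelter.nu/me.html",
    predicate="http://en.wikipedia.org/wiki/Love#Interpersonal_love",
    object="http://en.wikipedia.org/wiki/Semantic_Web",
)

# A tiny "graph" is just a set of such statements.
graph = {statement}

# Someone else writing down the same three tokens has made the same statement.
again = Triple(
    "http://shelter.nu/me.html",
    "http://en.wikipedia.org/wiki/Love#Interpersonal_love",
    "http://en.wikipedia.org/wiki/Semantic_Web",
)

print(again in graph)
print(statement.object)
```

Note that nothing in the structure says whether the object URI stands for the Semantic Web as a subject or for the web page about it; the triple only carries the token.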
   I need to talk a bit about namespaces at this point. If you're not familiar with them, they're basically a shorthand for mostly the first part of a URI, like a representation that can be reused, and then glued together by means of the magical colon : character, so for example I have many things to say about me and my universe, each of which will get translated into a URI ;
me [http://shelter.nu/me.html]
topic maps [http://shelter.nu/tm.html]
fields of interest [http://shelter.nu/foi.html]
blog [http://shelter.nu/blog/]
Writing out the URI for each thing is tedious, and also is prone to errors, so what we do is to create a namespace as such ;
alex = http://shelter.nu/
Now we can use that namespace with a colon to write all those URIs in a faster, less error-prone way ;
me [alex:me.html]
topic maps [alex:tm.html]
fields of interest [alex:foi.html]
blog [alex:blog/]
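
Expanding such a shorthand back into the full URI is a simple lookup and concatenation. Here is a minimal Python sketch; the `NAMESPACES` table and the `expand` function are names made up for the example, and the only prefix defined is the `alex` one from above.

```python
# Prefix table: the namespace declared above.
NAMESPACES = {"alex": "http://shelter.nu/"}


def expand(name, namespaces=NAMESPACES):
    """Turn 'prefix:local' into a full URI; leave anything else untouched."""
    prefix, colon, local = name.partition(":")
    if colon and prefix in namespaces:
        return namespaces[prefix] + local
    return name  # unknown prefix, or already a full URI such as 'http://...'


for short in ["alex:me.html", "alex:tm.html", "alex:foi.html", "alex:blog/"]:
    print(short, "->", expand(short))
```

Because the result is just a string again, two people using different prefixes for the same namespace still end up with equal tokens once everything is expanded.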
   Namespaces are also a good way to modularize and more easily extend existing stuff, and they help us organize and care for our various bits and bobs. Well, so the theory goes. But when you muck around with lots of data from many places, it quickly becomes a situation that I call name-despaced, where there's just too many namespaces around. When it gets complex like that with hundreds of namespaces around, we're pretty much back to having non-semantic markup again and no one really wants that. This all is of course the result (but not end result) of the organic way information and people organize stuff. Some namespaces will die, while others will be popular and live on, and we're still in early days.

   Anyway, back to solving our identity management problems. The issue here is that just sharing the data doesn't give us semantics (meaning), nor does sharing our ontologies. We need both human comprehension and computational logic in order to pull it all off, and the reason we care about this these days is that the amount of data is growing beyond our wildest imaginations and will continue to grow. The computational part is reading in ontologies and sorting data accordingly. The human part is creating the ontologies.

   So what are these ontologies? Well, they're just models, really, an abstract representation of something in reality, so when FRBR spends its time in prose and blogs and articles and debate, it's really trying to make us all agree on a specific way of modeling said domain. When we formalize this effort, mostly into XML schemas or RDF / OWL statements, we are creating an ontology. It's like a meta language in which we can describe our models further. This is usually modularized from the most abstract to the most concrete way of thinking, so from what's known as an upper ontology (pie-in-the-sky) down through various layers (all called many different things, of course, like middle, reason, core, manifest, etc.) to the most concrete ones.


   Karen Coyle (a voice of reason on the future of the library world) recently "debated" with me on these things, and I pointed her to "Curing the web's identity crisis", an article by Steve Pepper (a fellow Topic Mapper) which more people really should read and make an effort at understanding. Now I think there's some confusion as to what is being explained (well, I never got a reply, so I don't know, to be honest. It's probably me. :), and also as to why we (us terrible representialists) keep bringing this up, but I'm kinda back to where I started in this blog post, trying to argue the case for creating identity of things through more layers than are currently being used.

   We (both RDF and Topic Maps) use URIs as tokens for identity. But in the RDF world there is no distinction between subject identity and resource identity, and I suspect this is where Karen's confusion kicks in. In the Topic Maps world we make this distinction quite clear, and we keep the resource-specific identities as well (so URIs for internal Topic Map identity, external subject identity, and external resource identity), and this is vitally important to understand!
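To make the three kinds of identity a bit more tangible, here is a rough Python sketch. The field names borrow the terms I'd expect from the Topic Maps data model (item identifiers, subject identifiers, subject locators), but the class, the function, the `tm:topic-*` identifiers and the simplified matching rule are illustrative assumptions, not a real Topic Maps engine.

```python
from dataclasses import dataclass, field

@dataclass
class Topic:
    # Internal Topic Map identity (meaningful only inside one map).
    item_identifiers: set = field(default_factory=set)
    # External subject identity: URIs of pages that *describe* the subject.
    subject_identifiers: set = field(default_factory=set)
    # External resource identity: URIs of resources that *are* the subject.
    subject_locators: set = field(default_factory=set)

def same_subject(a, b):
    """Simplified rule: two topics match only if they share an identity
    of the same kind. Real merging rules are richer than this."""
    return bool(a.subject_identifiers & b.subject_identifiers
                or a.subject_locators & b.subject_locators)

uri = "http://shelter.nu/me.html"
person = Topic(item_identifiers={"tm:topic-1"}, subject_identifiers={uri})  # the person described by the page
page = Topic(item_identifiers={"tm:topic-2"}, subject_locators={uri})       # the web page itself

print(same_subject(person, page))  # same URI, different kind of identity
```

The same URI shows up on both topics, yet they are not treated as the same thing: one is about the person the page describes, the other is about the page. That is the distinction that gets lost when subject and resource identity are not kept apart.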

Let me exemplify with how I would like to see future library cataloging being done ;

I have a resource of sorts at hand; it could be a book or a link or a CD or something. Doesn't matter, but for the example it's written by Frank Herbert, apparently, and is called "Dune Genesis." It's an eBook. I pop "Frank Herbert" into a textbox of sorts, the system automatically does some searching, and finds 5 URIs that match that name. One of those URIs is from Wikipedia and another is from the Library of Congress. That means LoC has verified that whatever explains the subject of "Frank Herbert" is at the URI at Wikipedia, and that there is a reasonable equality between the two; one Wikipedia page, one authority record at LoC. The other URIs more or less confirm it (and this speaks to trust and government). I choose to accept the LoC URI as an author subject URI. Nothing more needs to be entered, no dates, no names, no nothing. Just one URI.

   Now I pop the name "Dune Genesis" into my tool, and it does its magic, but it returns only a Wikipedia URI, and because it's tradition not to "trust" Wikipedia it means I have a "new" record I need to catalog. However, the Wikipedia page contains RDFa, so my tool asks if I want to try and auto-populate meta data, and I choose yes. Fields get populated, and I go over them, checking that they are good, add some, edit some, delete some, and hit save.

   Two things now happen; the system automatically creates a URI for me, a subject identity URI that, if resolved, will point to a page somewhere on our webserver with our meta data. That URI is fed back into whatever loop that tool uses for federated URIs; it could be library custom-made (see EATS below, or look to the brilliant www.subj3ct.com website for federated identity management) or something as simple as Google (for example, I use Ontopedia a lot, so if I search for "Alexander Johannesen Ontopedia", I will get as a first result a page representing a URI I can use for talking about me). This creates a dual system of identity, one for the subject, one for the meta data about the book, both using the same URI.

   Do you dig it? Can you see it? Can you see the library world slowly using such a simple mechanism for totally ruling the meta data and identity management boulevard, or what? I pointed to Conal Tuohy's EATS system. Make him give it to you, collaborate to make this just work, open-source it and make it a tool for librarians to automatically create, use, harvest and share identities and resources using the same URIs, and you've got what you need.

   This is complex stuff, and I think I need a drink now. A nice hot tea will do, and I'll try to clarify more in the coming days. Until then, ponder "what the heck you are talking about."

21 October 2009

Old post, as good as new

I just realized that I wrote this ages ago but never posted it. It has a few gems in it ;
Criticism is mostly about rocking the boat. Sure, there's positive criticism, like "you're not ugly, just beautiful-impaired!", but aren't we over this silly, excessive political correctness by now? Criticism is to tell it straight, that what someone else has done is not up to scratch, that surely there must be some improvement that could be made. But the library world doesn't work like that. Criticism in the library world uses a different word; approval.

15 October 2009

Ontological Ponderings

The last few months have been interesting for me in a philosophical sense. My job is, on an architectural level, about using ontologies in software development: in the process (development, deployment, documentation), the infrastructure (SOA, servers, clusters) and the end result of it (business applications). So needless to say, I've been going a bit epistemental, and I promised myself yesterday to jot down my thoughts and worries, if for no other reason than for future reference.

One big thing that seems to go through my ponderings like a theme is the linguistic flow of the definition language itself, in how the mode of definition changes the relative inference of the results of using that ontology over static data (not to mention how it gets even trickier with dynamic data). We usually say that the two main ontological expressions (is_a, has_a) of most triplets (I use the example of triplets / RDF as they are the most common ones, although I use Topic Maps association statements myself) define a flat world from which we further classify the round world. But how do we do this? We make up statements like this ;

Alex is_a Person
Alex has_a Son

Anyone who works in this field understands what's going on, and that things like "Alex" and "Person" and "Son" are entities, defined with URIs, so actually they become ;

http://shelter.nu/me.html is_a http://psi.ontopedia.net/Person
http://shelter.nu/me.html has_a http://en.wikipedia.org/wiki/Son

Well, in RDF they do. In Topic Maps we have these as subject identifiers, but pretty much the same deal (except some subtleties I won't go into here). But our work is not done. Even those ontological expressions have their URIs as well, giving us ;

http://shelter.nu/me.html http://shelter.nu/psi/is_a http://psi.ontopedia.net/Person
http://shelter.nu/me.html http://shelter.nu/psi/has_a http://en.wikipedia.org/wiki/Son
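For the code-minded, the two statements above can be held as plain tuples of URIs and queried with a simple pattern match. This is only a sketch in Python; the `match` helper is a made-up name, and real triple stores and inferencing engines do far more than this.

```python
IS_A = "http://shelter.nu/psi/is_a"
HAS_A = "http://shelter.nu/psi/has_a"

# The two statements from above as (subject, predicate, object) tuples.
triples = [
    ("http://shelter.nu/me.html", IS_A, "http://psi.ontopedia.net/Person"),
    ("http://shelter.nu/me.html", HAS_A, "http://en.wikipedia.org/wiki/Son"),
]

def match(triples, s=None, p=None, o=None):
    """Return every triple that agrees with the given parts; None is a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Everything that is_a Person:
for subject, _, _ in match(triples, p=IS_A, o="http://psi.ontopedia.net/Person"):
    print(subject)
```

Because every part of a statement is just a URI, the predicate can be matched like any other value, which is what makes changing it (is_a versus something vaguer) so consequential for the results.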

Right, so now we've got triplets of URIs we can do inferencing over. But there are a few snags. Firstly, a tuple like this is nothing but a set of properties for a non-virtual property and does not function like a proxy (like for instance the Topic Maps Reference Model does), and transforming between these two forms gives us a lot of ambiguity that quickly becomes a bit of a problem if you're not careful (it can completely render inferencing useless, which is kinda sucky). Now given that most ontological expressions are defined by people, things can get hairy even quicker. People are funny that way.

So I've been thinking about the implications of more ambiguous statement definitions, so instead of saying is_a, what about was_a, will_be_a, can_be_a, is_a_kindof_a? What are the ontological implications of playing around with the language itself like this? It's just another property, and as such will create a different inferred result, but that's the easy answer. The hard answer lies between a formal definition language and the language in which I'm writing this blog post.

We tend to define that "this is_a that", this being the focal point from which our definition flows. So, instead of listing all Persons of the world, we list this one thing that is a Person, and move on to the next. And for practical reasons, that's the way it must be, especially considering the scope of the Semantic Web itself. But what if this creates bias we do not want?

Alex is_a Person, for sure, but at some point I shall die, and then I change from an is_a to a was_a. What implications, if any, will this have on things? Should is_a and was_a be synonyms, antonyms, allegoric of, or projection through? Do we need special ontologies that deal with discrepancies over time, a clean-up mechanism that alters data and subsequently changes queries and results? Because it's one thing to define and use data as is, another completely to deal with an ever-changing world, and I see most - if not all - ontology work break when faced with a changing world.
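One way to picture the is_a / was_a question, purely as a thought experiment and not something I claim any existing ontology language does this way: keep a single statement with a validity period, and derive the wording at query time. Everything in this Python sketch is hypothetical, including the class, the function and the years.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Statement:
    subject: str
    kind: str
    start: int                  # first year the statement holds (invented value)
    end: Optional[int] = None   # last year it holds; None means "still open"

def predicate_at(statement, year):
    """Pick will_be_a / is_a / was_a depending on when we ask."""
    if year < statement.start:
        return "will_be_a"
    if statement.end is not None and year > statement.end:
        return "was_a"
    return "is_a"

alex = Statement("Alex", "Person", start=1970, end=2070)  # years are made up
for year in (1900, 2009, 2100):
    print(year, alex.subject, predicate_at(alex, year), alex.kind)
```

The data never has to be rewritten when the world changes; only the answer changes with the moment of asking. Whether that counts as a synonym, an antonym or a projection is still the open question.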

I think I've decided to go with a kind_of ontology (an ontology where there is no defined truth, only an inferred kind-system), for no other reason than that it makes cognitive sense to me and hopefully to other people who will be using the ontologies. This resonates with me especially these days as I'm sick of the distinction people make between language and society, that the two are different. They are not. Our languages are just like music, with the ebb and flow, drama and silence that make words mean different things. By adding the ambiguity of "kind of" instead of truth statements I'm hoping to add a bit of semiotics to the mix.

But I know it won't fix any real problems, because the problem is that we are human, and as humans we're very good at reading between the lines, at being vague, clever with words, and don't need our information to be true in order to live with it. Computers suck at all these things.

This is where I'm having a semi-crisis of belief, where I'm not sure that epistemological thinking will ever get past the stage of basic tinkering with identity in which we create a false world of digital identities to make up for any real identity of things. I'm not sure how we can properly create proxies of identity in a meaningful way, nor in a practical way. If you're with me so far, the problem is that we need to give special attention to every context, something machines simply aren't capable of doing. Even the most kick-ass inferencing machines break down under epistemological pressure, and it's starting to bug me. Well, bug me in a philosophical kind of way. (As for mere software development and such, we can get away with a lot of murder.)

I'm currently looking into how we can replicate the warm, fuzzy impreciseness of human thinking through cumulative histograms over ontological expressions. I'm hoping that there is a way to create small blobs of "thinking" programs (small software programs or, probably more correctly, script languages) that can work over ontological expressions without the use of formal logic at all (first-order logic, go to hell!) that can be shared, that can learn what data can and can't be trusted to have some truthiness. Here's to hoping.

The next issue is directional linguistics, in how the vectors of knowledge are defined. The order in which you gain your knowledge matters, just like it matters a great deal how you sort it. This is mostly ignored, and the data is treated as it's found and entered. I'm not happy with that state of things at all, and I know that if I had been taught about axioms before I got sick of math, my understanding of axiomatic value systems would be quite different. Not because I can't sit down now and figure it out, but because I've built a foundation which is hard to re-learn when wrong, hard to break free from. Any foundation sucks in that way; even our brains work this way, making it very hard to un-learn and re-train your brain. Ontological systems are no different; they build up a belief-system which may prove to be wrong further down the line, and I doubt these systems know how to deal with that, nor do the people who use such systems. I'm not happy.

Change is the key to all this, and I don't see many systems designed to cope with change. Well, small changes, for sure, but big, walloping changes? Changes in the fundamentals? Nope, not so much.

We humans can actually deal with humongous change pretty well, even though it may be a painful process to go through. Death, devastation, sickness and other large changes we adapt to. There's the saying, "when you've lost everything, there's nothing more to lose and everything to gain", and it holds remarkably true for the human adventure on this planet (look it up; the Earth is not really all that glad to have us around). But our computer systems can't deal with a CRC failure, much less a hard-drive crash just before tax-time.

There's something about the foundations of our computer systems that is terribly rigid. Now, of course, them being based on bits and bytes and hard-core logic, there's not too much you can do about the underlying stuff (apart from creating quantum machines; they're pretty awesome, and can alter the way we compute far more than the mere efficiency claims tell us) to make it more human. But we can put human genius on top of it. Heck, the ontological paradigm is one such important step in the right direction, but as long as the ontologies are defined in first-order logic and truth-statements, it is not going to work. It's going to break. It's going to suck.

Ok, enough for now. I'm heading for Canberra over the weekend, so see you on the other side, for my next ponder.


7 October 2009

Stupidity of systems and debt collection

Today's tale is an example of stupidity put into system. Or, a system that has accumulated enough stupidity to grow sentience, and has become a cancer on society.

A preamble; in my distant, distant past (over 20 years ago now), I accumulated a bit of debt due to unfortunate circumstances, not too big for the world to get scared, but not small enough not to cause trouble. I lost a house over it, basically stemming from taxes on income the government of the country I was living in at the time thought I should pay when I, in fact, didn't have an income at the time (in their wisdom they demanded I prove that I didn't have an income, a bit like proving that something doesn't exist which is, in fact, impossible. And when you're arguing with a system, you're not going to be heard). It's a long story, one I'd rather try to forget, but suffice to say I have some experience of debt, debt collection and the various instances and how they work.

Since my distant past I try to help people make sense of these systems, mostly for minor things (like when you forget to pay a bill twice ... you'd be surprised how easy it is :), but sometimes also for larger debts that take time, patience and good negotiating skills to overcome. But I've done it again and again.

So, the other day we got a message on our answering machine from some person who's got the world's fastest talking voice, saying something like 'Hi, Ribbedy Rabbedy from Bing and Bong here (honestly, it sounded just like that!), calling on an urgent matter, call us back on !*$*!!!*$$%%!*$ (I had to go through the message over 10 times to get these numbers right) with reference number %*@%*@%*$$* (another 10 times to get this number), bye!'

I called back straight away, because we have a pretty good system in our house for bills coming in and getting dealt with, and I knew of nothing outstanding. Everything gets put into the 'in' folder and dealt with at least three times a week, and once dealt with, a bill is moved from one side of the desk's folder drawer to the other, with a big cross across it and 'paid' written in large letters, before being filed safely. But when I called the number, I was greeted by a receptionist who didn't know who'd called me, couldn't find anything with my reference number, couldn't tell me quite what it was they do ('business services' yeah, that explains it) and in the end we gave up. I thought, if it is that important, they'll get back to me.

Didn't hear another thing for two weeks. Maybe they made a mistake, and were after someone else.

Then last night we got a call from someone with a thick Indian accent, probably some poor outsourced guy in Bangalore just trying to fill his quota, trying to explain to first my 9 year old daughter, then to my wife, and finally to me, about something or other. We just couldn't work it out, except big words such as "serious matter" and "debt", and all this smack in the middle of dinner-time. What the hell? It sounded more and more like a scam, as he was being very secretive, refusing to tell me anything of value, so I tried to just get out of him what company he was calling from, which was something like B'n'B, D'n'D, E'n'E, or any other combo of letters that go with ee-enn-ee ("what do you do? We do business services" Aaaargh!). My daughter confused and my wife worked up, I ended the conversation by saying, in a stern but polite manner, that if there is a serious matter and you can't communicate properly, send us a friggin' letter.

Today came a letter. Well, a bill actually, accompanied with threats of "garnishee your wages, tax refund, bank account or *** or take you to court" with "urgency" and "serious" plastered all over it.

I paid the bill after going through our paperwork and not finding a 'paid' version of it, chalking it up as 'human failure to pop an old bill where it belongs for filing' (so, most likely my fault), and then the phone rang. Yup, another representative for this company bugging us. Having just paid the bill, I asked why he was calling, but because these guys (and no Indian accent this time, albeit there was a foreign element to it; since I'm a foreigner myself I detect these things) can only read from scripts he insisted on talking to my wife. I said, no, you just called me on my phone, I'm her husband, is there anything we need to know that the letter / bill doesn't address? "If I could only talk to your wife, I could answer that question."

This is where it gets complicated, and I must invoke the powers of logic, inference and bloody common sense. The next 3 minutes went on with me stating "you called me, you tell me, my wife doesn't want to speak to you because you're rude, inconsiderate and mysterious about matters which could be cleared up in no time and you insist on being stupidly pigheaded because 'for legal reasons' that you can't explain further you can't explain it to anyone but her *if* there is or isn't anything of importance you need to tell her that the letter didn't."

"For legal reasons" is more often than not business speak for "we don't want to get into legal trouble ourselves", and is something I've been thinking a bit about lately. I've had phone calls from various companies we have services from, Telstra being one of them, who do courtesy calls to you to make sure everything is fine, or nag about some service they're pushing, or some such, and they all start with asking me about info to confirm that I am who I am. "For legal reasons."

So I am to tell a stranger, who is calling me on my own bloody phone and claims to be from Telstra or otherwise, personal info for verification of who I am? What is my option for verifying that they are who they claim to be? At present, there is none; this is a one-way street, because I am me, lucky to be their client, and they are whoever the hell they want to be. This whole identity conundrum has been bugging me more and more of late, and culminated today with this idiot (who in his defence was reading from a script) failing miserably to understand that in any conversation there are two parties; you and whoever you are addressing at the time. It may not be who you want to be talking to, but that doesn't alter the reality of it.

I ended the conversation by saying 'I'm going to say no' to his insistent nagging to talk to my wife. The letter and all this insane phone terror comes from Dun & Bradstreet (signed 'sincerely' Corey Smith, National Collections Manager, who I suspect has his name and scanned signature in many D&B templates), one of the bigger players in the debt collecting and reporting business (I've had slightly better dealings with their Norwegian branch in the past, but only marginally).

What's all this hubbub about, you may ask? 63$. Yup, that's right, 63 Australian shiny little dollars, and not only that, but Centrelink - an arm of the Australian government for family benefits, like child support, pensions and the like - had overpaid us the 63$, and now apparently wants it back the hard way, at any cost (and you can just imagine the cost of all this rubbish!). Instead of, you know, just deducting it from our next payment.

63 friggin' dollars. They should feel so ashamed of themselves. This is what you get when stupid systems grow sentience instead of a brain.


29 September 2009

Library Pontifications

Once in a while I get some email from people who ask me some questions or ask me to clarify something I've said in some setting. The other day I ranted on the NGC4LIB (Next-generation catalog 4 libraries) mailing-list about, uh, something or other. And I got email, which I answered, but since I got no reply I'm posting it here in a blog-edited form so that it doesn't go to waste ;
I think I am starting to understand your rants against the culture of MARC, and I'd probably feel offended if I knew what all of the above meant.
Hmm. Well, it wasn't meant to offend anyone. I guess if people thought they were hardcore into persistent identity management, then maybe they would feel I've either overlooked their hard work or don't think what they're doing is the right kind, or something.

I usually have two goals with my "rants"; 1. flush out those who already are on the right track, and make them more vocal and visible, and 2. if no one is on the right track, inspire people in the library world to at least have a look at it. I can do this because I have no vested interest in the library world as such; I cannot lose my library job as I'm not working for a library. :)
Naturally, to feel outside of the mainstream creates a crisis of confidence in one's abilities. What does it mean these days to say that one is a cataloger or that one works in tech services, and is it perceived as a joke for those on the outside? Oh yeah...they still produce cards. What do they know about databases?
From the outside, librarians are an incredibly gifted bunch of people who know what they're doing; they have been granted powers outside the realm of normal people (including professionals like software developers, believe it or not), and they know stuff we normal folks don't.

However, having been on the inside you get to glimpse the reality of an underfunded, underprioritized sub-culture of society that knows as little about the "real-world" as normal folks know of the library world. There is a great divide between them, and very little has been done to open up. The blame for this I put squarely on the library world (as the real-world is, well, real and out there), which for many years has demanded a library degree even for software development positions, and when we finally get there we are treated as second-class citizens because we don't have that mark of librarianship that comes from library school. It's a bizarre thing, really, and perhaps the most damaging one you've got, this notion that librarians must have a library degree, as if normal people will never understand the beauty of why a 245 c is needed, or the secret of why shelves must be called stacks, and so on.

One thing that has got me very disillusioned about the library way is philosophy. I deliberately sought out the library as a place to work because I have a few passions mixed with my skills which I thought were a good match, and one of the strongest passions was epistemology. One would think that if there was one institutional string of places that could appreciate the finer details of epistemology, it would be the libraries and the people within. That's what they concern themselves with, no?

Err, no. No, they don't. There's the odd person that ponders how an OCLC number can verify some book's identity, but these are very plain, boring questions of database management. Then along came FRBR, which not only dips its toes into epistemology, but outright talks about it! The authors of it clearly had knowledge and wisdom about such things. So, one would think there was hope. Like, when it came out in 1993. That's more than 15 years ago. And people still haven't got it. How much time do you reckon it's going to take, and more importantly, how many years until it's way too late?

But no, RDA comes out of the woodwork and proves once and for all that there is no hope of libraries ever taking on the issues at either a philosophical or a practical level. Let me explain this one, as it sits at the core of much of my "ranting."

FRBR defines work, expression, manifestation, item, and these are semi-philosophical definitions that we're supposed to attach semantics and knowledge to. There are primarily two ways to do that; define entities of knowledge, or create relationships between entities. (Note these two basic ways of doing knowledge management, entities and relationships, as they spring up in all areas of knowledge representation.)

Now, can you, without looking stuff up, tell me the difference between a work and an expression? Or between a manifestation and an item? Sure, we can discuss whether this or that thing is an item or something else, back and forth, but is that a good foundation upon which to lay all future library philosophy? Because that's just what it is; a philosophical model we use to make sense of the real world. FRBR is confusing, even if it is a great leap forward in epistemological thinking. When it comes down to identity management, for example (persistent identifiers for one thing can be expressed through a multitude, like a proxy, which FRBR fails at miserably), it is right there in the centre of it, but a lot of it focuses on the wrong part, the part that involves human cognition to make decisions about identity.

Anyway, I guess at this point all I'm trying to say is that there are glimpses of what I'm talking about in the library world, and I was attracted to it; I wanted to dedicate parts of my life to fixing a lot of what was broken in the real-world. I came to the library because they are the shining beacon of light in our society.

So, what happened?
Which is why I am interested smarting up about some of these things. Where should one go for a decent but not mind-blowing introduction to the types of things you have described lately?
It's hard to say what will blow your mind, and what will not. But since you're a library type person I'm going to go out on a limb here, and assume you're a smart person. :) So, I'm going to assume that http://en.wikipedia.org/wiki/Epistemology won't blow your mind. So let's assume we're using the definition for "subject" as such ;
  • An area of knowledge, a topic, an area of interest or study
In terms of philosophy we usually expand that definition a bit wider (so it will also include most discourse and literature) but I'll try to keep it simple. First, a question:

"What does it mean that something is something?"

This is the basic question for identity, that something exists and that we can talk about and refer to it. Referring to things is a huge portion of what the library does, not only as an archive, but as a living institution where knowledge is harboured. We're talking about subjects put into systems, about being subject-centric in the way we deal with things. Just like our brains do.

Now, for me there are a few things that have happened over the last 20-30 years. The world has become more and more knowledge centric (it has gone from "all knowledge is in books" to "knowledge can be found in many places", and the advent of computers and the internet plays no small part in that), while libraries have become more book specific, more focused on the collection part rather than what the collection actually harbours in terms of knowledge (and I suspect this is because there are no traditional tracks within the library world for technology), probably because it's easier and fits better into budget driven government run institutions.

However, this isn't beneficial to the knowledge management part. Libraries are moving steadily towards being archives, but the world wants them to become knowledge specialists. Ouch. And so the libraries will be closed down when they don't deliver knowledge. Archives are what Google does best, and they're not that bad at harbouring basic knowledge. What hope in hell have you got then?

I'm running out of time right now, but feel free to ask any question and point to any of my wrongs, and laugh at it as well; I need the discourse as much as (I hope) you do. Let me just quickly run through that list with comments and pointers ; [editor's note: this is a list of things I felt the library world 'have no clue about' from my mail to the mailing-list]
  • No idea about digital persistent identification.
What happens to identifiers when people stop maintaining them? They lose their semantic and intrinsic value, and become moot. How many libraries maintain their age-old software? No, a more human, less technological means of resolving is needed, and when the world went digital the choice of multiple identities became not only possible but inevitable. Yet, when the library world manages identities as OCLC / LOC record numbers at the item level, things go horribly wrong and you cannot take what you've defined and learned into the philosophical space. Even if the OCLC / LOC numbers are maintained till the end of the world, they do not solve basic epistemological problems.
  • No subject-centricity.
FRBR does actually provide some, but it is not focused on the epistemological problems, only on identifying the problem of identification without providing a mechanism (real or philosophical) for solving it.
  • No understanding of semantics in data modeling.
The AACR2 / RDA world is, in some definition of the terms, a data model. And between entities in data models there are semantics, meaning the relationships themselves, their names, roles and thought purpose. But you have to understand, as a human, all of AACR2 / RDA to be able to model anything with it; there's no platform on which to stand, there are no atomic parts you can use to build molecules and then cells and then beings. The whole model is, in fact, a cobbled-together set of fields without structure (and no, numbering them is not a structure :), and without structure there are only rules. And rules without structure are only human-enforceable.
  • No clue about ontologies, inferencing, guides by analogy.
This is a stab at what the Semantic Web people are doing. They have a long background from AI and knowledge management, and if you guys were at least on par with that group, there could be some better understanding of the issues. The SemWeb crowd understand a lot of first-order logic, inferencing, analogy, case-based reasoning, and so forth, all stuff you need to have computers understand a tad bit better how your data is cobbled together, how it all interacts, how entities and relationships (remember those? :) are mapped.

I should of course make a note here that I think that the SemWeb efforts are mostly wrong, and that they could learn an awful lot from librarians in the way to deal with collections and access, but that's a different discourse for some other time. :)
  • No real knowledge about collection management ( ... wait for it ...) with multiple hooks and identities.
I was actually hoping people would jump on this one, getting offended that I said they had no real knowledge of collection management (which is their forte, it is what they do!), but I guess they saw the hook and line of *identities*, and jumped over it. Dang.

It's all about the identity of what you are collecting. Crikey, publishers haven't even got ISBN to work (how many times do I put in one ISBN and get a completely different book ...), and one would think that would provide hints as to why this is hard, and perhaps what to do otherwise. Hmm.

-- end of mail except some more personal ramblings not fit for generic consumption --
