6 April 2010

Before I write what I write before the next time I write

It seems my last poll revealed that there are still people in the library world who haven't rejected me, or, perhaps a stronger theory, who like to watch road accidents. So my next piece will be about why the library world fails so badly at technology and at seeing the future (or even their own relevance to it), but I'm somewhat busy these days with real work, so give me a few more days, ok?

However, all is not lost. I've got a few things to say about, well, the stuff I work with, that bucket I stick my head in every day to see if the crap I put in it yesterday has turned to gold yet. No luck so far.

There's a peculiar discussion going on in the Semantic Web mailing-list at the W3C, which Bernard Vatant will fill you in on. It's funny to watch; where are the success stories, where is the commercial viability, does it even work in academia, has it got traction, what do we do now? Why aren't more people doing it? Why hasn't the world adopted this specific and undoubtedly brilliant world-view yet? Are we all mad!?

I'm sure you can fill in our own Topic Maps echo here, but the more you dig, the more you discover that most of the sillies put up as a reason or a scapegoat for the lack of world dominance are things that, frankly, the Topic Maps community figured out long ago, and some of those missing features in their world are dominant features in ours. And we haven't taken over the world, either. Bummer.

It's frustrating, I know, but what can we do? There's no amount of technological suaveness that can beat a status quo that feeds upon itself. No new ideas can beat old ones that seem to work, because, well, the definition of "works" is so multi-faceted and complex and, eh, making lots of money for lots of people. The Semantic Web and Topic Maps don't make lots of money. Heck, they don't make money, period. They're convenient little technologies that will stay small and insignificant.

I have a plan, though, and it will piss off some of the Topic Maps purists (or, let's face it, even pragmatists) and hopefully some Semantic Web people as well. First, I'll rename it something cool - maybe something like NoSQL or something - and then rename the integral concepts, strip away the jargon, and make it web-friendly by injecting it straight into HTML5-based technology, and relate all queries through SQL. Mwuahaha, I might even throw some REST APIs in there, just to stir it up some more. And I shall call it ; the web.

Man, I hate these technical wars over standards and ways of doing things. The thing I love about Topic Maps isn't the standard or the specs. No, it's the thinking I'm forced to do in rejecting some parts, while loving others. It's what I take from it. It's the epiphanies it yields.

NoSQL? Semantic Web? Topic Maps? SQL? They're all just abstract interfaces into a set of memory positions shaped by various registers, stacks and pops. Standardizing our ways is just a step on the ladder of the future, not a platform upon which we have to stand firm.

Anyway, the whole NoSQL thing is something I'll have to write about more later. Right now dinner and kids and cleaning the house beckon.


19 March 2010

Newbie tips for Topic Maps bliss

Well, hi there, pilgrim! So, you've noticed this fandangled thing called "Topic Maps", and you heard it was an interesting, smart or new way of solving hard problems of some sort? Well, you've come to the right place. Let me, as an elder of the movement, give you a few hints and tips on how to go about this complex notion ;

1. Don't do it

Yeah, I won't lie to you; unless you know more than a smidgen about information science and / or knowledge management, and especially unless you know quite a bit about data models and how to interact with them in complex computer systems, I'd urge you to stay well clear of it. Topic Maps is full of complex models, silly jargon, weird people and technical APIs. Nothing in the normal world is easy, and the Topic Maps world only makes it harder. Topic Maps won't solve your problem, unless you already know how to solve it, at which point find some other technology that people actually know, ok? What's the point of building the best system out there with amazing technology that no one knows how to use, extend, appreciate or even keep a straight face while talking about?

Also, at what technical level do you think you need it? If the answer is not very technical at all, why care about the underlying technology? An Excel spreadsheet might fix your problem much better. Use that. Unless you know that multi-dimensional graph-based technology will save you, look the other way. If you don't know this stuff, it will lure you in with magical, wistful promises of a better tomorrow that will never see the light of day.

2. Make someone else do it if you must

Ok, so this one isn't all that different from the first tip, but since Topic Maps indeed can solve hard problems in brilliant ways, there are people out there who could use it and help you solve them. The truth is that people who are steeped in this stuff, who know it inside out, can indeed solve pretty much any complex issue you might have with it, and even help you become smarter in doing it, even teach you how it all works. And there are other benefits to letting them do it for you; you don't have to become one of them in the process.

The truth is that Topic Maps really is cool and brilliant and all that, but it is ridiculously hard to grasp and even harder to master, and it will change you into a weirdo in the process. Have you really got the time, resources and personality traits it takes to get into this stuff? Really?

3. Topic Maps people are few and rare, and, uh, strange

I kid you not; these guys are not your average cup of tea, so tread gently, and expect to be surprised in some way or another. Expect them to say things that make no sense whatsoever; they have their own language littered with technical jargon even technologists wouldn't understand. I don't think they get out much, at least not outside their own field, so you need to translate their terminology into your own if they are to be made sense of, as they themselves rarely compromise and adapt to how the rest of the world sees things.

Also, expect some slight geeky behavior, like mistaking things like pages, tags, websites, business objectives, servers, networks, computers and pasta for topics, topics, topics, topics, topics, topics and topics, in that order. It's quite similar to Asperger's syndrome, and if you know how, you can use it to your advantage, but caution and patience must be urged, and just like with the real thing there is unfortunately no cure, only workarounds.

4. Patience is not a virtue, but stubbornness might be

Look, the mystical world of Topic Maps is full of concepts you never dreamed existed, things that make you ask fundamental questions about identity and philosophy, about what knowledge is, and about paradigms of models and technological culture. Patience is not enough to grasp this stuff; you need sheer stubbornness and bloody-mindedness to get anywhere, and - dare I say it? - perhaps some weird personality trait. Maybe a limp or a monocle. Not only do you have to understand the technicality of the stuff, but also the weirdness of the culture itself and - perhaps even more important - the personal implications this knowledge might have upon your own thought processes. You may end up getting a cape.

Once you tread down the path of Topic Maps and actually get anywhere (and that in itself is a hallmark of your stubbornness), your brain will change; you will see things differently. I'm not going to say that that is a good thing, but it can be, especially if you like uprooting your preconceived notions and planting new ones. The world is built on foundations far removed from the Topic Maps world, but once you grasp this other world it is hard not to see your old world in a new light, and this can be challenging. You might even start to sound like one of these blubbering idiots yourself, saying topics, topics, topics, topics, topics when you used to speak a coherent language people around you actually understood. There's great danger in getting an epiphany or two.

5. If you like your job, stay clear

Hmm, I see a pattern in my tips, but the thing is that once you have converted, your old job might look boring and infantile by comparison. You might get (morally repulsive) urges to work on Topic Maps, but your organisation probably won't understand what the hell you're on about (remember, the whole painful stubborn process you went through has to happen to each and every person in your whole organisation!), and you might start looking around for another job where the Topic Maps goodness is practiced. Don't be fooled!

These jobs don't really exist. No one thinks Topic Maps on your CV is a good thing, because they, too, haven't walked that stupid painful stubborn path to enlightenment, and you'll come across as a bit of a show-off with nothing to show for it. (The exception to this tip is if you live in Norway. If you want to know why, the answer is that, again, the Topic Maps culture is repellingly weird.) No one who does real business gives a rat's ass about Topic Maps, and no one who does real business in the future will either. Even people who do weird but similar things and have modest success (like people doing Semantic Web / RDF work) shun Topic Mappers. For your own job security, stay clear.

6. However ...

However, if you are weird, not scared by overly complex or outlandish technologies, if you think that strange new cultures only make you stronger (and you've got a strong immune system to boot), if you think job security is only the stuff of boring people, and, indeed, if you have a monocle, cape and a glass eye, perhaps this might be the place for you after all.

And if so, contact me; I get off on this stuff. Otherwise, you have been warned.


4 February 2010

Topic Maps, 10 years down the line

I'm told, by way of my own imagination based on loose rumors put out by flying pink fairies, that Topic Maps is a waning technology, poorly supported by the IT industry at large, hard to wrap your head around, and generally icky to deal with.

All of this is, unfortunately, true.

But, as in all stories told by only one side, there is another side just waiting to come out into the light, just one day, real soon now. This day may never come, but here is my own little attempt to shed some light on a few of the issues with the Topic Maps world. It was about 10 years ago I first got a whiff of Topic Maps, so it seems fitting that my first post in 2010 takes some Topic Maps rumors, loose observations and vague statements, and makes some comments along the way. Here we go ;

1. Topic Maps are hard

Why, yes, to a commoner or some person with a somewhat traditional approach to computing, Topic Maps can indeed seem like an alien concept at first. The first time I started reading up on it I was mesmerized and frightened at the same time, wondering where the magic would bring me and just how painful it would be for me when reality would kick in (and me) ; there were new notions and concepts, new words, new paradigms everywhere! Reification, role types, associations, occurrences, occurrence types, typified information, subjects and topics, ontologies (upper, lower, specialized ones); the list goes on. It is terrifying indeed, and for many, many people it is so terrifying that SQL and C# and .Net and C and PHP seem like a comforting auntie lulling you back into things we know and know well, no hard thinking required (just lots of hair to pull out).

Until you realize a few things, that is. For example, the vocabulary is anchored in information science, and with a bit of research or learning it shouldn't take that long to get familiar with it. Even the complex issues of reification and ontologies will after some time be as normal and self-explanatory as second cousins and language. (And yes, there is a correlation between the examples given! See if you can find it!) And perhaps more importantly, the problems you can solve with Topic Maps can completely and utterly eradicate the major problems those traditional methods give us, some of the biggest bug-bears I've ever had! (Anyone wish to offer me a book deal on how to solve most of the main IT development problems in seriously interesting ways? :)

Can I just mention that having a small epiphany about Topic Maps has the effect that you never return to the real world and look at it the same way, ever again? I have never met a person who got Topic Maps and returned to the old ways, at least not without making huge compromises. Getting it will change you in good ways, and is most definitely worth the effort despite the pain.

Tips to newbies: It's not really hard, even if it seems hard. But it requires you to change your mind on some key issues.

2. Topic Maps are poorly supported in the real-world

Oh yes, indeed. If you talk to anyone, any company in your immediate serenity (yes, a tautological pun) and ask them about their use of Topic Maps, you'd most likely get a blank stare back and a careful "What would we need maps for?"

There's the odd technically-inclined person who might know a toddle about what these fabled Topic Maps are all about, but very, very few people understand what they are, and even fewer have implemented them into something useful. (The exception to this is, oddly enough, the country of Norway, and some scantily-clad areas of southern Germany.) No mainstream software package comes with the stuff wrapped in, no word-processor touts its amazingness, no operating system comes with support for it, and no popular software of any kind uses it.

But then, there's the odd system that uses it. You'll find it also in the odd Norwegian government portal, which is bizarre in its own right, and perhaps deep down in some underfunded academic project, or perhaps some commercial project where parts of the data-model masquerade as it. My old website uses it. I have a framework or two. There's the odd other open-source project, a few APIs, and a host of other well-meaning but obscure projects that perhaps have got it, albeit well hidden and kept away from children.

For a technology that stands out as something that can fix it all, I find it bizarre that it is found so seldom, but then bizarre is not the same as surprised. And when you look at the "competition", the well-funded, well-marketed, well-established world of the Semantic Web, championed by none other than the W3C and Tim Berners-Lee, well, you have to concede that it shouldn't be much of a surprise at all, really. Topic Maps is a tiny group of enthusiasts (a few hundred, being liberal with statistics) who'll saw off their right leg if it meant we could get the specs done in time, while the Semantic Web world is littered with academia, organisations and companies (we're talking thousands upon thousands of people actively working on it), so no, you should not be surprised.

Tips to newbies: As the saying goes, if a million flies eat it ... surely, it has some nutritional value or greater worth over, say, that green grass the cows are dumping it on?

3. Topic Maps is dying and obsolete; use RDF instead

There was a period about 10 years ago which I regard as the Topic Maps time of bloom ; the trees had beautiful flowers on them, the pink and purple petals falling over the world of IT like a slow-motion rainfall of beauty. Everywhere you turned there were people talking about it, and potential projects popping up all the time.

But time went by. Topic Maps was too hard for most (see points 1 and 2), and not just the technical implications themselves and the language and terms used, but also the philosophy of it, the very idea of why we should be using it over, say, any relational database or traditional software stack. I mean, what's the point, really?

The point is easy to miss, admittedly. A technology that can be used for everything is hard to pin down and call good for something. And we have focused just too damn much on knowledge management systems, and not only that, but we've used our own special language in the process, which often is quite remote from knowledge management speech in the enterprise arena (though you find it rife in academia). When the world looks to Topic Maps, all they see is a difficult way to do knowledge management. Ugh.

Myself, I'm using Topic Maps in highly non-traditional ways. I use maps for my application (definitions, actions and functionality), for functional topology (generic functionality in hyper-systems based on typification), for business logic (rules, conditions, interactions) and, perhaps just as important, for the actual development itself (modules and plugins, deployment, versioning, services) which makes for a highly (and this "highly" is quite higher than any normally used "highly") customizable and flexible framework for making great semantic applications. But more on the details at some later stage.

Tips for newbies: No, it's not dead nor dying, just not as popular as stuff that's easier or more accessible.

4. Topic Maps is nothing new

Well, given its roughly 20-year history (and I'm counting from the early days of HyTime), in Internet years it's an old, old dog, so by that alone we can't say there's anything new, but most people would mean "new" here to mean something like "we've been doing X for years, so why do we need this?", where X usually points to some bit of the Topic Maps paradigm that indeed has been done before. Of course it has. There is nothing new in Topic Maps except, of course, putting it all together and standardizing one cohesive and complete way of doing pretty damn most of what you would need for your complex data-model, identity management, semantic or otherwise relational, interoperable information and / or structural needs, chucking in knowledge management, too, for good measure.

There is of course nothing new with Topic Maps, except that all that old stuff is bundled into a new thing, if you allow a 20-year-old standard to be called "new." But then again, "the standard" is really a family of standards, all evolving and changing with the times. There's always a sub-standard (no pun intended ... well, not a lot of pun intended) in the woodworks, always some half-baked document to explain something or other, always something that is so damn specific and concise that the overall grooviness and funky bits are pushed to the side-lines.

Topic Maps is new and old at the same time, but it really is groovy and funky once you overcome the technical jargon and the concise nature of the standards.

Tips to newbies: The king is dead. Long live the king!

5. The Topic Maps community is, um, a bit tricky

Oh, yes indeed. And this one is the hardest to write about as I'm part of this community and know pretty much everyone, some more than others.

So let's say it this way; I'm a difficult person in certain ways, for example I talk a lot, I overflow with ideas rather than code, I don't care too much about political correctness, and I speak my mind and use language that could alienate people with too strong attachments to their ties or their social buckets.

And the core of the Topic Maps community is loaded with weirdos like me; highly opinionated, rough ideas, hard on woo, and soft on business. But the problem isn't the weirdos, it's the low number of them. For any successful community covering such a wide-ranging and all-encompassing area as what Topic Maps is all about (which is, uh, almost anything), going from epistemology to identity management to ontology work, well, you need a lot of personalities to match them all to make it seem like a lively place. We, on the other hand, have a handful of people, and the contrast between us all is sometimes just too great. And, I've noticed, we're not very good with newbies, either, so even if we answer their questions, quite often our answers are just too far out there for normal people to comprehend (and I've got a ton of circumstantial and anecdotal evidence to back that up).

I'm part of many different communities on the web, but there is only one champion of how fast an online discussion goes private (and it's not of the good kind; it's the kind where we need to express our frustrations in private [because, ultimately, we're nice people who don't want to offend anyone even when they deserve it, those bastards], lest we blow up and our eyes will bleed!), and that's the community which is located on a private server where you must write to the list owner in an email to be added. *sigh*

I tried my "question of the week" thing on the mailing-list for a while, and some of those went well, but too many of those question quickly descended into nothing or private arenas. So, I'm officially giving up on it for now. Maybe I'll come back stronger once my spine grows back, who knows?

Tips for newbies: Be strong, keep at it, ask for clarification! We don't know just how alien we are. And please join in as we need more weirdos.

6. What, exactly, is Topic Maps, anyways? I don't get it!

Yes, indeed, what exactly is this darn Topic Maps thing? The funny thing is that there is no correct answer to that question. First of all, it's a family of standards that we collectively call "Topic Maps", but it could also mean either the TMDM (Topic Maps Data Model) standard or the XTM (Topic Maps XML exchange format) standard, depending on your non-sexual preferences. Some might even go out on a limb (obviously not the limb cut off in point no. 2) and claim that it means the TMRM (Topic Maps Reference Model), which is a more abstract framework, or possibly even just the philosophical direction - or, dare I say it, zeitgeist? - of the thing, like a blueprint for how to build a recursive key-value property framework with identity and knowledge management built in. Your mileage may vary.

But then we have a problem, as it is not a technology nor a format. It is more akin to a language, a model or a direction of sorts. No, not a language like SQL (even though TMQL (Topic Maps Query Language) could be said to hold that place) that is to be parsed by a computer, nor a language like Norwegian or English. No, we're talking about a language that sits right in the middle between the computer and the human, a kind of mediator or translator, a model in which both machine and human can do things that each party understands equally well, a model which is defined through information science, math and human language.

So what is it? It's a language that both computers and humans can use without pulling too much in either direction, a language in the middle that, if spoken by many parties (computers and humans both), lets them all join hands and sing beautiful knowledge management songs together, sharing and propagating with ease. But of course, Topic Maps isn't limited to just knowledge management, oh no. You can solve insurmountable things with it, as you can make it represent whatever you want it to, and I really, truly mean anything. If you want a topic to represent your thing, off you go. It's that flexible.

It can work as the basis for pretty much any system that has structures in it of any kind or shape, and that, by and large, is pretty much any system ever built. So it's actually quite hard to explain just what you can use it for, even though traditionally it's content management, portals and knowledge management.

Tips to newbies: It's only a model ...

So there you go, a quick summary of bits and bobs about Topic Maps. In my next installment, I'll summarize my navel fluff collection, then the timetable changes at Minnamurra station over the last 10 years, and finally I thought I'd summarize all the redundant technology that's gathering dust in my garage. Stay tuned for exciting times ahead!


15 October 2009

Ontological Ponderings

The last few months have been interesting for me in a philosophical sense. My job is at an architectural level, using ontologies in software development, both in the process (development, deployment, documentation), the infrastructure (SOA, servers, clusters) and the end result of it (business applications). So needless to say, I've been going a bit epistemental, and I promised myself yesterday to jot down my thoughts and worries, if for no other reason than for future reference.

One big thing that seems to go through my ponderings like a theme is the linguistic flow of the definition language itself, in how the mode of definition changes the relative inference of the results of using that ontology over static data (not to mention how it gets even trickier with dynamic data). We usually say that the two main ontological expressions (is_a, has_a) of most triplets (I use the example of triplets / RDF as they are the most common ones, although I use Topic Maps association statements myself) define a flat world from which we further classify the round world. But how do we do this? We make up statements like this ;

Alex is_a Person
Alex has_a Son

Anyone who works in this field understands what's going on, and that things like "Alex" and "Person" and "Son" are entities, and defined with URIs, so actually they become ;

https://shelter.nu/me.html is_a http://psi.ontopedia.net/Person
https://shelter.nu/me.html has_a http://en.wikipedia.org/wiki/Son

Well, in RDF they do. In Topic Maps we have these as subject identifiers, but pretty much the same deal (except some subtleties I won't go into here). But our work is not done. Even those ontological expressions have their URIs as well, giving us ;

https://shelter.nu/me.html https://shelter.nu/psi/is_a http://psi.ontopedia.net/Person
https://shelter.nu/me.html https://shelter.nu/psi/has_a http://en.wikipedia.org/wiki/Son

Right, so now we've got triplets of URIs we can do inferencing over. But there are a few snags. Firstly, a tuple like this is nothing but a set of properties for a non-virtual property and does not function like a proxy (like, for instance, the Topic Maps Reference Model does), and transforming between these two forms gives us a lot of ambiguity that quickly becomes a bit of a problem if you're not careful (it can completely render inferencing useless, which is kinda sucky). Now, given that most ontological expressions are defined by people, things can get hairy even quicker. People are funny that way.
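(A quick aside: to make "inferencing over triplets" concrete, here's a tiny, hand-rolled PHP sketch of a naive transitive is_a walk. The URIs are the ones from my example above plus an invented Mammal one; this is illustrative only, not how any real RDF or Topic Maps engine works.)

// A tiny in-memory triple store with naive transitive inference over is_a.
$is_a = 'https://shelter.nu/psi/is_a' ;
$triples = array (
    array ( 'https://shelter.nu/me.html', $is_a, 'http://psi.ontopedia.net/Person' ),
    array ( 'http://psi.ontopedia.net/Person', $is_a, 'http://psi.ontopedia.net/Mammal' ),
) ;

// Infer every type a subject has by following is_a chains.
function inferTypes ( $triples, $subject, $predicate ) {
    $types = array () ;
    foreach ( $triples as $t ) {
        list ( $s, $p, $o ) = $t ;
        if ( $s === $subject && $p === $predicate ) {
            $types[] = $o ;
            // Whatever the object is_a, the subject is_a too.
            $types = array_merge ( $types, inferTypes ( $triples, $o, $predicate ) ) ;
        }
    }
    return array_unique ( $types ) ;
}

print_r ( inferTypes ( $triples, 'https://shelter.nu/me.html', $is_a ) ) ; // Person, Mammal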

So I've been thinking about the implications of more ambiguous statement definitions; instead of saying is_a, what about was_a, will_be_a, can_be_a, is_a_kindof_a? What are the ontological implications of playing around with the language itself like this? It's just another property, and as such will create a different inferred result, but that's the easy answer. The hard answer lies between a formal definition language and the language in which I'm writing this blog post.

We tend to define that "this is_a that", this being the focal point from which our definition flows. So, instead of listing all Persons of the world, we list this one thing which is a Person, and move on to the next. And for practical reasons, that's the way it must be, especially considering the scope of the Semantic Web itself. But what if this creates bias we do not want?

Alex is_a Person, for sure, but at some point I shall die, and then I change from an is_a to a was_a. What implications will this, if any, have on things? Should is_a and was_a be synonyms, antonyms, allegoric of, or projection through? Do we need special ontologies that deal with discrepancies over time, a clean-up mechanism that alters data and subsequently changes queries and results? Because it's one thing to define and use data as is, another thing completely to deal with an ever-changing world, and I see most - if not all - ontology work break when faced with a changing world.

I think I've decided to go with a kind_of ontology (an ontology where there is no defined truth, only an inferred kind-system), for no other reason than that it makes cognitive sense to me and hopefully to the other people who will be using the ontologies. This resonates with me especially these days as I'm sick of the distinction people make between language and society, that the two are different. They are not. Our languages are just like music, with the ebb and flow, drama and silence that make words mean different things. By adding the ambiguity of "kind of" instead of truth statements I'm hoping to add a bit of semiotics to the mix.
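(If it helps, here is roughly what I mean by "kind of" instead of truth, as a deliberately naive PHP sketch; the predicate, names and numbers are all invented on the spot.)

// Statements carry a degree of kind-ness instead of a hard truth value.
$statements = array (
    array ( 'Alex', 'kind_of', 'Person', 1.0 ),    // as sure as it gets
    array ( 'Alex', 'kind_of', 'Musician', 0.7 ),  // kind of, most days
    array ( 'Alex', 'kind_of', 'Librarian', 0.2 ), // only by association
) ;

// How strongly does the ontology "kind of" believe subject -> object?
function kindness ( $statements, $subject, $object ) {
    foreach ( $statements as $st ) {
        list ( $s, $p, $o, $degree ) = $st ;
        if ( $s === $subject && $o === $object ) return $degree ;
    }
    return 0.0 ; // no statement, no opinion
}

echo kindness ( $statements, 'Alex', 'Musician' ) ; // 0.7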

But I know it won't fix any real problems, because the problem is that we are human, and as humans we're very good at reading between the lines, at being vague, clever with words, and don't need our information to be true in order to live with it. Computers suck at all these things.

This is where I'm having a semi-crisis of belief, where I'm not sure that epistemological thinking will ever get past the stage of basic tinkering with identity, in which we create a false world of digital identities to make up for any real identity of things. I'm not sure how we can properly create proxies of identity in a meaningful way, nor in a practical way. If you're with me so far, the problem is that we need to give special attention to every context, something machines simply aren't capable of doing. Even the most kick-ass inferencing machines break down under epistemological pressure, and it's starting to bug me. Well, bug me in a philosophical kind of way. (As for mere software development and such, we can get away with a lot of murder.)

I'm currently looking into how we can replicate the warm, fuzzy impreciseness of human thinking through cumulative histograms over ontological expressions. I'm hoping that there is a way to create small blobs of "thinking" programs (small software programs or, probably more correctly, scripts) that can work over ontological expressions without the use of formal logic at all (first-order logic, go to hell!), that can be shared, and that can learn what data can and can't be trusted to have some truthiness. Here's to hoping.
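(Again, a deliberately simple sketch of the histogram idea: count how often independent sources agree or disagree with a statement and derive a truthiness score from the tally, no formal logic involved. The statements and votes below are invented.)

// Tally agreement and disagreement per statement across sources.
$observations = array (
    array ( 'Alex is_a Person', 'agree' ),
    array ( 'Alex is_a Person', 'agree' ),
    array ( 'Alex is_a Person', 'disagree' ),
    array ( 'Alex has_a Monocle', 'agree' ),
) ;

$histogram = array () ;
foreach ( $observations as $obs ) {
    list ( $statement, $vote ) = $obs ;
    if ( ! isset ( $histogram[$statement][$vote] ) ) $histogram[$statement][$vote] = 0 ;
    $histogram[$statement][$vote]++ ;
}

// Truthiness is simply the share of agreeing observations.
foreach ( $histogram as $statement => $votes ) {
    $agree = isset ( $votes['agree'] ) ? $votes['agree'] : 0 ;
    $total = $agree + ( isset ( $votes['disagree'] ) ? $votes['disagree'] : 0 ) ;
    printf ( "%s: %.2f\n", $statement, $total > 0 ? $agree / $total : 0.0 ) ;
}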

The next issue is directional linguistics, in how the vectors of knowledge are defined. There's importance in what order you gain your knowledge, just like there's great importance in how you sort it. This is mostly ignored, and the data is treated as it's found and entered. I'm not happy with that state of things at all, and I know that if I had been taught about axioms before I got sick of math, my understanding of axiomatic value systems would be quite different. Not because I can't sit down now and figure it out, but because I've built a foundation which is hard to re-learn when wrong, hard to break free from. Any foundation sucks in that way; even our brains work this way, making it very hard to un-learn and re-train your brain. Ontological systems are no different; they build up a belief-system which may prove to be wrong further down the line, and I doubt these systems know how to deal with that, nor do the people who use such systems. I'm not happy.

Change is the key to all this, and I don't see many systems designed to cope with change. Well, small changes, for sure, but big, walloping changes? Changes in the fundamentals? Nope, not so much.

We humans can actually deal with humongous change pretty well, even though it may be a painful process to go through. Death, devastation, sickness and other large changes we adapt to. There's the saying, "when you've lost everything, there's nothing more to lose and everything to gain", and it holds remarkably true for the human adventure on this planet (look it up; the Earth is not really all that glad to have us around). But our computer systems can't deal with a CRC failure, let alone a hard-drive crash just before tax-time.

There's something about the foundations of our computer systems that is terribly rigid. Now, of course, them being based on bits and bytes and hard-core logic, there's not too much you can do about the underlying stuff (apart from creating quantum machines; they're pretty awesome, and can alter the way we compute far more than the mere efficiency claims tell us) to make it more human. But we can put human genius on top of it. Heck, the ontological paradigm is one such important step in the right direction, but as long as the ontologies are defined in first-order logic and truth-statements, it is not going to work. It's going to break. It's going to suck.

Ok, enough for now. I'm heading for Canberra over the weekend, so see you on the other side, for my next ponder.


29 September 2009

Library Pontifications

Once in a while I get some email from people who ask me questions or ask me to clarify something I've said in some setting. The other day I ranted on the NGC4LIB (Next-generation catalog 4 libraries) mailing-list about, uh, something or other. And I got email, which I answered, but since I got no reply I'm posting it here in a blog-edited form so that it doesn't go to waste ;
I think I am starting to understand your rants against the culture of MARC, and I'd probably feel offended if I knew what all of the above meant.
Hmm. Well, it wasn't meant to offend anyone. I guess if people thought they were hardcore into persistent identity management, then maybe they would feel I've either overlooked their hard work or don't think what they're doing is the right kind, or something.

I usually have two goals with my "rants"; 1. flush out those who already are on the right track, and make them more vocal and visible, and 2. if no one is on the right track, inspire people in the library world to at least have a look at it. I can do this because I have no vested interest in the library world as such; I cannot lose my library job as I'm not working for a library. :)
Naturally, to feel outside of the mainstream creates a crisis of confidence in one's abilities. What does it mean these days to say that one is a cataloger or that one works in tech services, and is it perceived as a joke for those on the outside? Oh yeah...they still produce cards. What do they know about databases?
Librarians are, from the outside, an incredibly gifted bunch of people who know what they're doing; they have been granted powers outside the realm of normal people (including professionals like software developers, believe it or not), and they know stuff we normal folks don't.

However, having been on the inside you get to glimpse the reality of an underfunded, underprioritized sub-culture of society which knows as little about the "real world" as normal folks know of the library world. There is a great divide between them, and very little has been done to open up. The blame for this I put squarely on the library world (as the real world is, well, real and out there), which for many years has demanded a library degree even for software development positions, and when we finally get there we are treated as second-class citizens because we don't have that mark of librarianship that comes from library school. It's a bizarre thing, really, and perhaps the most damaging one you've got, this notion that librarians must have a library degree, as if normal people will never understand the beauty of why a 245 c is needed, or the secret of why shelves must be called stacks, and so on.

One thing that has got me very disillusioned about the library way is philosophy. I deliberately sought out the library as a place to work because I have a few passions mixed with my skills which I thought were a good match, and one of the strongest passions was epistemology. One would think that if there was one institutional string of places that could appreciate the finer details of epistemology, it would be the libraries and the people within. That's what they concern themselves with, no?

Err, no. No, they don't. There's the odd person who ponders how an OCLC number can verify some book's identity, but these are very plain, boring questions of database management. Then along came FRBR, which not only dips its toes into epistemology but outright talks about it! The authors of it clearly had knowledge and wisdom about such things. So, one would think there was hope. Like, when it came out in 1993. That's more than 15 years ago. And people still haven't got it. How much time do you reckon it's going to take, and more importantly, how many years until it's way too late?

But no, RDA comes out of the woodwork and proves once and for all that there is no hope of libraries ever taking on the issues at either a philosophical or a practical level. Let me explain this one, as it sits at the core of much of my "ranting."

FRBR defines work, expression, manifestation, item, and these are semi-philosophical definitions that we're supposed to attach semantics and knowledge to. There are primarily two ways to do that; define entities of knowledge, or create relationships between entities. (Note these two basic ways of doing knowledge management, entities and relationships, as they spring up in all areas of knowledge representation. There's a naive little sketch of the two just below.)
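[editor's note for the blog version: here's that deliberately naive PHP sketch of the two mechanisms side by side, entities of knowledge and relationships between them; the FRBR-ish example data is invented purely for illustration]

// Entities of knowledge ...
$entities = array (
    'w1' => array ( 'type' => 'work', 'label' => 'Hamlet' ),
    'e1' => array ( 'type' => 'expression', 'label' => 'Hamlet, the English text' ),
    'm1' => array ( 'type' => 'manifestation', 'label' => 'A particular paperback edition' ),
    'i1' => array ( 'type' => 'item', 'label' => 'The dog-eared copy on my shelf' ),
) ;

// ... and relationships between them.
$relationships = array (
    array ( 'e1', 'realises', 'w1' ),
    array ( 'm1', 'embodies', 'e1' ),
    array ( 'i1', 'exemplifies', 'm1' ),
) ;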

Now, can you, without looking stuff up, tell me the difference between a work and an expression? Or between a manifestation and an item? Sure, we can discuss whether this or that thing is an item or something else, back and forth, but is that a good foundation upon which to lay all future library philosophy? Because that's just what it is; a philosophical model we use to make sense of the real world. FRBR is confusing, even if it is a great leap forward in epistemological thinking. For example, when it comes down to identity management (persistent identifiers for one thing can be expressed through a multitude, like a proxy, which FRBR fails at miserably), it is right there in the centre of it, but a lot of it focuses on the wrong part, the part that involves human cognition to make decisions about identity.

Anyway, I guess at this point all I'm trying to say is that there are glimpses of what I'm talking about in the library world, and I was attracted to it; I wanted to dedicate parts of my life to fixing a lot of what was broken in the real world. I came to the library because they are the shining beacon of light in our society.

So, what happened?
Which is why I am interested smarting up about some of these things. Where should one go for a decent but not mind-blowing introduction to the types of things you have described lately?
It's hard to say what will blow your mind, and what will not. But since you're a library type person I'm going to go out on a limb here, and assume you're a smart person. :) So, I'm going to assume that http://en.wikipedia.org/wiki/Epistemology won't blow your mind. So let's assume we're using the definition for "subject" as such ;
  • An area of knowledge, a topic, an area of interest or study
In terms of philosophy we usually expand that definition a bit wider (so it will also include most discourse and literature) but I'll try to keep it simple. First, a question?

"What does it mean that something
is something?"

This is the basic question of identity, that something exists and that we can talk about and refer to it. Referring to things is a huge portion of what the library does, not only as an archive, but as a living institution where knowledge is harboured. We're talking about subjects put into systems, about being subject-centric in the way we deal with things. Just like our brains do.

Now, for me there are a few things that have happened over the last 20-30 years. The world has become more and more knowledge-centric (it has gone from "all knowledge is in books" to "knowledge can be found in many places", and the advent of computers and the internet plays no small part in that), while libraries have become more book-specific, more focused on the collection part rather than on what the collection actually harbours in terms of knowledge (and I suspect this is because there are no traditional tracks within the library world for technology), probably because it's easier and fits better into budget-driven, government-run institutions.

However, this isn't beneficial to the knowledge management part. Libraries are moving steadily towards being archives, but the world wants them to become knowledge specialists. Ouch. And so the libraries will be closed down when they don't deliver knowledge. Archives are what Google does best, and they're not that bad at harbouring basic knowledge either. What hope in hell have you got then?

I'm running out of time right now, but feel free to ask any question and point out any of my wrongs, and laugh at them as well; I need the discourse as much as (I hope) you do. Let me just quickly run through that list with comments and pointers ; [editor's note: this is a list of things I felt the library world 'has no clue about', from my mail to the mailing-list]
  • No idea about digital persistent identification.
What happens to identifiers when people stop maintaining them? They lose their semantic and intrinsic value, and become moot. How many libraries maintain their age-old software? No, a more human, less technological means of resolving is needed, and when the world went digital the choice of multiple identities became not only possible but inevitable. Yet, when the library world manages identities as OCLC / LOC record numbers at the item level, things go horribly wrong and you cannot take what you've defined and learned into the philosophical space. Even if the OCLC / LOC numbers are maintained till the end of the world, they do not solve basic epistemological problems.
  • No subject-centricity.
FRBR does actually provide some, but it is not focused on the epistemological problems; it only identifies the problem of identification without providing a mechanism (real or philosophical) for dealing with it.
  • No understanding of semantics in data modeling.
The AACR2 / RDA world is, in some definition of the terms, a data model. And between entities in data models there are semantics, meaning the relationships themselves, their names, roles and intended purpose. But you have to understand, as a human, all of AACR2 / RDA to be able to model anything with it; there's no platform on which to stand, there are no atomic parts you can use to build molecules and then cells and then beings. The whole model is, in fact, a cobbled-together set of fields without structure (and no, numbering them is not a structure :), and without structure there are only rules. And rules without structure are only human-enforceable.
  • No clue about ontologies, inferencing, guides by analogy
This is a stab at what the Semantic Web people are doing. They have a long background in AI and knowledge management, and if you guys were at least on par with that group, there could be some better understanding of the issues. The SemWeb crowd understands a lot about first-order logic, inferencing, analogy, case-based reasoning, and so forth, all stuff you need if computers are to understand a tad bit better how your data is cobbled together, how it all interacts, how entities and relationships (remember those? :) are mapped.

I should of course make a note here that I think that the SemWeb efforts are mostly wrong, and that they could learn an awful lot from librarians in how to deal with collections and access, but that's a different discourse for some other time. :)
  • no real knowledge about collection management ( ... wait for it ...) with multiple hooks and identities
I was actually hoping people would jump on this one, getting offended that I said they had no real knowledge of collection management (which is their forte, it is what they do!), but I guess they saw the hook and line of *identities*, and jumped over it. Dang.

It's all about the identity of what you are collecting. Crikey, publishers haven't even got ISBN to work (how many times do I put in one ISBN and get a completely different book ...), and one would think that would provide hints as to why this is hard, and perhaps what to do otherwise. Hmm.

-- end of mail except some more personal ramblings not fit for generic consumption --


24 August 2009

What event model ontology?

Hmm, it seems that no one has blogged, tweeted or otherwise mentioned the blog post from my last plea, which I'm quite disappointed with. However, I'll chalk this one down to the complexity of what I'm trying to accomplish, and my failed attempt at explaining what it is.

In the meantime I've been working at it, converging various models from all sorts of weird places (anything from WebServices and SOAP stacks, to operating systems like Linux, to event models in Java and .Net, to more conceptual stuff in the Semantic Web world), but boy, you can tell that we live in a world shaped by iterative, imperative paradigms of approaching the software world.

One thing I learned quite early was declarative and functional programming, introduced to me, of all places, through using XSLT many years ago. It may not be the most obvious place to find it, and this is one of those hidden gems of the language which still doesn't enjoy much of a following. And no wonder; people come into it from the imperative stuff that dominates the world, polluting us all with filthy thoughts of changing variables (at least in Scala you can choose between var and val), functions that aren't truly functional, and the classical idea in object-oriented programming of a taxonomical structure that doesn't hold up to scrutiny.
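(To make the "changing variables" bit concrete, here's a toy contrast in PHP, nothing more than an illustration: the functional version simply states what the result is instead of mutating its way there.)

$prices = array ( 10, 25, 40 ) ;

// Imperative: loop, mutate, accumulate.
$doubled_imperative = array () ;
foreach ( $prices as $p ) {
    $doubled_imperative[] = $p * 2 ;
}

// Declarative-ish: describe the transformation, let the runtime do the iterating.
$doubled_declarative = array_map ( function ( $p ) { return $p * 2 ; }, $prices ) ;

var_dump ( $doubled_imperative === $doubled_declarative ) ; // bool(true)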

Let me clarify that last point. Why are we doing this stuff? Why are we creating computer programs?

To solve problems. And who are we solving problems for? For humans. It's the classical example (albeit extrapolated) of garbage in, garbage out. I've talked about this a lot in the past, about the constant translation that happens between human and machine, and how we are creating translation models in both worlds in order to "move forward" and solve problems better. But this exercise becomes increasingly harder as our legacy grows, so trying to teach functional programming to people who don't understand certain basic principles of lambda calculus is going to be hard, just like it's hard to teach Topic Maps to people who live in a SQL world. Or like it's hard to teach auto-generating user-interfaces to a user-interface developer.

These are usually called paradigm shifts, where some important part of your existing world is totally changed as you learn some other, even more important knowledge. You must shift your thinking from one way to a rather different other. And this is hard. Patterns of knowledge in your brain are maintained by traversing certain paths often, and as such strengthening those paths (following the pattern that an often-travelled path must be the right path). But if the path is wrong, there are some pretty strong paths you need to unlearn. Damn, that is hard! Which is why I urge you to try it out.

I'm currently using Topic Maps and human-behaviour-driven ontologies for auto-generating applications and user-interfaces over functionally complete models of both virtual and concrete human domains, all with temporality and continuous change as the central paradigms. Yeah, pretty hefty stuff, and I've spent years trying to unlearn stuff I learnt in the years before that. And those years were spent unlearning some other stuff before that. My whole life has been one huge unlearning experience, and I don't think any other way conceptually grasps the beauty of life better; nature and life both are in perpetual change. Needless to say, I'm enjoying every single crazy second of it!

But back to my event model ontology. I've learned one important thing in all this; Sowa has suggested a shift from logical inference to analogy, and this, coupled with the OODA loop, can create an intriguing platform for knowledge management and an eco-system for software applications. I'll let you know more as things progress from here. I'm excited!

And as always, I'd love to hear your comments on all of this. I beg you. Again. :)


5 August 2009

Can I ask you a favour? (Does social media actually work?)

Hi everybody. Could I ask you a favour? I'm not getting much response to my quest for a unified software architecture ontology, so could I humbly ask you to blog, tag, link or otherwise gossip about my previous post on the matter? I would really appreciate it, and I promise I'll share my findings with you all.

(My subtitle "Does social media actually work?" is a blatant attempt to get circulation going by mocking the whole debacle which I try to, ahem, you know, promote. Thanks.)


15 May 2009

Spilling a few beans

I think enough time has passed, don't you? I've been hinting to what I'm up to these days, but I've been rather careful about spilling the beans, I guess because, well, it's a brand new adventure and every storyteller should get their story together well before writing it down. I'm keen to talk about this stuff, though, because it is wickedly cool and I'm keen to not only do it, but to talk about it and involve more people in it as well.

As you probably saw from my last post I'm currently in India, and yes, my new employer is an Indian company, but I work from home (in gorgeous Kiama, Australia, 1.5 hours south of Sydney) and travel to India every so often (4-5 times a year as a rough guide). We work over the Internet, including video conferencing and remote controlling and the like, and as such it's an interesting new challenge for me to be somewhat isolated from the smiles and sideways nods and the tacit knowledge floating down the hallways of our headquarters in Mumbai. I've got plenty of ideas on how to deal with that, so we'll see how it goes.

My company is Free Systems Technology Labs, a nifty medium-sized IT development company with main offices in Mumbai (from where I'm writing this) and most R&D and development in Bangalore (where I've been the last week). It's a daughter-company of another company mostly known for more hardware-orientated stuff, like computer building, server hosting and various gadgets, but they have a number of software outlets as well. I'll be working with anything from planning to execution, and mostly in the domain of Topic Maps. Yes, the very thing I've been talking about for the last 9 years is now going to be my main concern, as opposed to secondary or third (or, in some periods, not at all) at the whims of other jobs, and I can't even begin to tell you how exciting the prospect of that is to me; I believe in the ideals and practice of Topic Maps so strongly, and it's going to be good for my soul to pour it into something as cool as what we're going to do. (More on that later.) The guys here also happen to share many of my own ideals (open-source, development methods, goals, community and societal building, and so much more), and they've been spoiling me. I'll miss the tea, that's for sure.

I became part of this through a weird mix of happenstance, but mostly because the people involved here have been, put simply, a fantastic bunch, in terms of technical brilliance, sincerity and honesty, and in convincing me that I should join (they obviously think I'm good for something :). I've been with the company almost three months now, where the first two months were more like a warm-up, but it's been a very good ride so far.

But I need to talk about something that's been on my mind ever since they got in touch with me last year, and that's prejudice. The world is full of it, and I entered this adventure with a slight degree of scepticism. No, not the bad kind, but a certain carefulness, because, you know, they're Indians, and Indians got their mouth full of rice, and you're not getting any! (A joke I got from an Indian friend, so that makes it alright, yeah? :) Not only did they have to convince me, but also my wife. "Honey, how about I drop my great-paying, safe, cushy job in one of the richest countries in the world, and rather work for strangers from a strange land full of poverty and strong smells and interesting hairdos, and do it over the internet?" Yeah, she was keen, as you can imagine.

You can't work for Indians! They are supposed to work for us!

Sure. But they kept talking with me, flew me to Belgium (they own half of a company there) and were not only completely honest with me but simply blew me away with their knowledge, seriousness, and most importantly their friendliness and openness. Me and the wife thought long and hard about it (probably longer and harder than my company wanted me to :), and here we are.

Everything I knew about India was either heavily adjusted, or simply wrong, but I've seriously enjoyed being corrected. I've embraced everything that's been thrown at me, including very hot food, weird drinks, amazingly crazy traffic and the sweltering heat, the chaos, the smells, the meetings and the way they interact, the attitudes and the values. I think the tagline "Incredible India" is truer than they think.

Ok, that's enough for a first intro, now I have to get to bed. I'm flying home tomorrow and I'm looking forward to seeing the wife and kids again (Lilje just won an award for her art at school, so I'm mighty proud as well), and we'll be spending the weekend together, and on Sunday celebrate Norway's national day in Sydney.

And then, a little bit later, I'll tell you about the wickedly cool stuff we're going to do with Topic Maps.


9 May 2009

Where in the world is Alexander?

Short answer; Bangalore, India.

Longer answer; my new employer, which I started with a couple of months back, is an Indian company with strong ties to back-end systems and support, hardware manufacture and design, and software services. I'll tell more as things progress, and I'll probably talk a lot more about how they plan to use Topic Maps to solve some really crazy and hard problems. But before I get into that kind of detailed stuff, I wanted to just quickly show you this picture, which pretty much summarises my first impression of this crazy, lively, contrasting, weird, interesting place, and if you can't read the sign, it says "Follow traffic rules." I realise that in India, if you ask kindly, they just might do what you ask, but riding as a passenger in a car through this traffic was, err, an experience I won't forget anytime soon. However, it's interesting that in a language such as my own (English, or Norwegian, or Swedish, or Danish) we base our expression mostly on words alone, while in India the reason traffic works is that they've got such a strong foothold in semiotics. A honk here, two honks there as we pass a car, a blink of our beam lights racing past a "moto" (a small scooter that's kinda rebuilt as a tiny car) ... I still have much to learn about this language. The cool thing is that it's global; even I can do it. Except I would never drive here. Never. Ever.

Anyway, I'm in India to train staff and meet and plan with them in all things black magic, drink their excellent Indian tea and eat their amazing food, and generally get a feel for the country, the culture, and most importantly, the people I'm working with, who so far have turned out to be a fantastic bunch. I'm here for another week or so, and I'll suss out the details and let you know all about it in due time. Until then, there's a chapati drenched in yummy chutney with my name on it. India is, truly, an amazing place.

P.S. Hey Barta, where can I get my sweaty hands on your TM as a filesystem? Would love a play with it right about now. Oh, and that near NLP query stuff you mentioned that one time in the back-alley while drinking gin and discussing the meaning of wife. Or life. Or whatever.


20 March 2009

Resurrection : xSiteable Framework

I've just started in my new job (yes, more on that later, I know, I know) and was flipping through a lot of my old projects over the years, big and small, looking for an old Information Architecture / prototyping tool / website generator application I made with some help from IA superstar Donna Spencer (née Maurer) back when I lived in Canberra, Australia.

I found three generations of the xSiteable project. Generation 1 is the one a lot of people have found online and used, the XSLT framework for generating Topic Maps-based websites. I meant to continue with generation 2, the xSiteable Publishing Framework (which runs the Topic Maps-based National Treasures website for the National Library of Australia), but never got around to polishing it enough for publication, and before I came to my senses I was way into developing generation 3, which I now call the xSiteable Framework (which sports a full REST stack and Topic Maps). And yes, I'm still too lazy to polish it enough for publication (which includes writing tons of documentation), at least as of now, but I showed this latest incarnation to a friend lately, and he said I had to write something about it. Well, specifically about how my object model is set up, because it's quite different from the normal way of dealing with OO paradigms.

First of all, PHP is dynamic, and has some cool "magic" functions in the OO model which one can use for funky stuff. Instead of just extending the normal stuff with some extras I've gone and embraced it completely, and changed my programming paradigms and conventions at the same time. Let's just jump in with some example code;
// Check (and fetch) all users with a given email
$usercheck = $this->database->users->_find ( 'email', 'lucky@bingo.com' ) ;
Tables are contextually defined in databases, so $this->database->users points directly to the 'users' table in the database. (Well, they're not really table names, but for this example it works that way.) The framework checks all levels of granularity, and will always return FALSE or the part of the object you asked for, so, for example ;
// Get the domain of a users email address
$domain = $this->database->users->ajohanne->email->__after ( '@' ) ;
Again, it's like a tree-structure of data, a stream of granularity to get in and out of the data. This does require you to know the schema (and change the code if you change the schema), but apart from that, in a stable environment, this really is helpful (it's also cached, so it's really fast, too).

You might also have noticed ... users->ajohanne->email ... Where did that "ajohanne" bit come from? Well, as things are designed, the framework will again try to find stuff that isn't already found, so it will automatically look up "ajohanne" in designated fields. All objects that extend the framework have two very important fields: one is the integer primary identifier, the other is the qualified unique name (not a normal name as such, but most often a computer-generated one that isn't a plain number. Systems will often use something like a username as the qualified name, and hence "ajohanne" was my username in one such system). Why do this?

Well, PHP is dynamic, so in my static example above, explicitly using 'ajohanne' as part of the query isn't the best way to go in more flexible systems; just pop your found user in dynamically instead ;
$domain = $this->database->users->$username->email->__after ( '@' ) ;
Easy. And this applies to all parts of the tree, so this works as well ;
$domain = $this->database->$some_table->$some_id->$some_field->__after ( '@' ) ;
Now, from the two examples above you might see a different pattern, too. All data parts have unrestrained names, all query operations use a single underscore, and all string operations use two underscores. (__after is a shortcut for substr ( $str, strpos ( $str, $pattern ) + strlen ( $pattern ) ), and I've got a heap of little helpers like this built in.) Through this I always know what the type of the object interface is, and with PHP magic functions these types are easy to pull down and react to. As some of my objects are extendable, I need to pass _* and __* functionality up and down the object tree.
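To give you an idea of how those conventions hang together, here's a stripped-down sketch of such a dispatcher. This is not the actual xSiteable code - the class name and internals are made up - it just shows how PHP's __get and __call make the _* / __* convention possible ;
// Hypothetical sketch - not the real xSiteable classes
class xs_Node {
    protected $value ;
    protected $children ;

    function __construct ( $value, $children = array () ) {
        $this->value = $value ;
        $this->children = $children ;
    }

    // Plain property names navigate one level down the tree; FALSE if nothing is there
    function __get ( $name ) {
        return isset ( $this->children[$name] ) ? $this->children[$name] : FALSE ;
    }

    // One underscore = query operation, two underscores = string operation
    function __call ( $name, $args ) {
        if ( strpos ( $name, '__' ) === 0 ) return $this->stringOp ( substr ( $name, 2 ), $args ) ;
        if ( strpos ( $name, '_' ) === 0 ) return $this->queryOp ( substr ( $name, 1 ), $args ) ;
        return FALSE ;
    }

    function stringOp ( $op, $args ) {
        $str = (string) $this->value ;
        if ( $op == 'after' ) {
            $pos = strpos ( $str, $args[0] ) ;
            return $pos === FALSE ? FALSE : substr ( $str, $pos + strlen ( $args[0] ) ) ;
        }
        return FALSE ;
    }

    function queryOp ( $op, $args ) {
        if ( $op == 'find' ) {
            list ( $field, $value ) = $args ;
            $hits = array () ;
            foreach ( $this->children as $key => $child ) {
                $f = $child->$field ;
                if ( $f !== FALSE && $f->value == $value ) $hits[$key] = $child ;
            }
            return $hits ? $hits : FALSE ;
        }
        return FALSE ;
    }
}

// Tiny usage example for the sketch
$users = new xs_Node ( NULL, array (
    'ajohanne' => new xs_Node ( NULL, array ( 'email' => new xs_Node ( 'lucky@bingo.com' ) ) )
) ) ;
$domain = $users->ajohanne->email->__after ( '@' ) ;      // "bingo.com"
$found  = $users->_find ( 'email', 'lucky@bingo.com' ) ;  // array holding the ajohanne node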

Traditionally, we use getters and setters ;
$u = $obj->getUsername() ;
$obj->setUsername ( $u ) ;
I turn them all into properties, so ;
$u = $obj->username ;
$obj->username = $u ;
But they are still backed by full internal functions on the object, thanks to another of PHP's magic functions ;
class obj extends xs_SimpleObject {
    function getUsername () {
        ...
    }
    function setUsername ( $value ) {
        ...
    }
}
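What makes that work is the usual __get / __set trick; something along these lines is all the base class needs (a sketch of the idea, not the actual xs_SimpleObject) ;
// Sketch only - not the real xs_SimpleObject
class xs_SimpleObject {
    // $obj->username is routed to getUsername()
    function __get ( $name ) {
        $getter = 'get' . ucfirst ( $name ) ;
        return method_exists ( $this, $getter ) ? $this->$getter () : FALSE ;
    }

    // $obj->username = $u is routed to setUsername ( $u )
    function __set ( $name, $value ) {
        $setter = 'set' . ucfirst ( $name ) ;
        if ( method_exists ( $this, $setter ) ) $this->$setter ( $value ) ;
    }
}

class obj extends xs_SimpleObject {
    private $username ;
    function getUsername () { return $this->username ; }
    function setUsername ( $value ) { $this->username = $value ; }
}

$o = new obj () ;
$o->username = 'ajohanne' ;   // setUsername() behind the scenes
echo $o->username ;           // getUsername(), prints "ajohanne"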
The framework isn't just about object persistence. In fact, it is not about that at all. I hate ORMs in the sense that they still drag your OO applications back into the relational database stone age with some sugar on top. What I've done instead is implement a TMRM model in a relational database layer, so it's a generic meta model (Topic Maps) driving that backend, not tables, table names, lookup tables and all that mess. In fact, crazy as it sounds, there are only four tables in the whole darn thing. I'm relying on backend RDBMS to be good at what they should be good at: clever indices and easier joins in a recursive environment (which, when all data is in the one table, it indeed is), where the system uses filters to derive joins instead of doing complex cross-operations (which take lots of time and resources to pull off, and are the main bottleneck in pretty much any application ever created that has a database backend).
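To give a rough flavour of what I mean by filters instead of cross-operations, here's a toy version of the idea - one generic assertions table, with "joins" expressed as filtered self-joins. The schema, table name and use of PDO with SQLite are made up for the example; the real thing looks different ;
// Toy illustration only - made-up schema, not the actual four tables
$db = new PDO ( 'sqlite::memory:' ) ;

$db->exec ( "CREATE TABLE assertions (
    id      INTEGER PRIMARY KEY,
    subject TEXT,   -- what the assertion is about
    kind    TEXT,   -- the type of assertion ('type', 'email', ...)
    value   TEXT    -- a literal, or a reference to another subject
)" ) ;

$db->exec ( "INSERT INTO assertions ( subject, kind, value ) VALUES ( 'ajohanne', 'type', 'user' )" ) ;
$db->exec ( "INSERT INTO assertions ( subject, kind, value ) VALUES ( 'ajohanne', 'email', 'lucky@bingo.com' )" ) ;

// "the email of everything typed as a user" becomes a filtered self-join
// on the one table, instead of a cross-operation over many lookup tables
$rows = $db->query ( "
    SELECT t.subject, e.value AS email
    FROM assertions t
    JOIN assertions e ON e.subject = t.subject AND e.kind = 'email'
    WHERE t.kind = 'type' AND t.value = 'user'
" )->fetchAll ( PDO::FETCH_ASSOC ) ;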

A long time ago I thought that the link between persistent URIs for identity management in Topic Maps and the URI (and using links as application state) in REST were made for each other, and I wanted to try it out. In fact, that alone was the very inspiration for me to do the 3rd generation of xSiteable, hacking out code that basically has one URI for every part of the Topic Map, for every part of the TM API, and for other parts of your application. Here are some sample URIs ;
http://mysite.com/prospect/12
http://mysite.com/api/tm/topics/of_type:booking
http://mysite.com/admin/db/prospects
At each of these there are GET, PUT, POST and DELETE options, so when I create a new prospect, it's a POST to http://mysite.com/prospect or a direct PUT to http://mysite.com/prospect/[new_id], for example.
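If you're wondering what the dispatch amounts to, it's conceptually not much more than this bare-bones sketch (the handler functions here are made-up stand-ins, not the framework's real ones) ;
// Bare-bones sketch of the dispatch idea - handlers are made-up stand-ins
function handle_get ( $resource, $id )        { return "GET $resource/$id" ; }
function handle_create ( $resource, $body )   { return "POST $resource" ; }
function handle_put ( $resource, $id, $body ) { return "PUT $resource/$id" ; }
function handle_delete ( $resource, $id )     { return "DELETE $resource/$id" ; }

$method = $_SERVER['REQUEST_METHOD'] ;
$parts  = explode ( '/', trim ( parse_url ( $_SERVER['REQUEST_URI'], PHP_URL_PATH ), '/' ) ) ;

$resource = isset ( $parts[0] ) ? $parts[0] : '' ;    // e.g. 'prospect'
$id       = isset ( $parts[1] ) ? $parts[1] : NULL ;  // e.g. '12'
$body     = file_get_contents ( 'php://input' ) ;

switch ( $method ) {
    case 'GET'    : echo handle_get ( $resource, $id ) ;        break ;
    case 'POST'   : echo handle_create ( $resource, $body ) ;   break ;  // id assigned by the map
    case 'PUT'    : echo handle_put ( $resource, $id, $body ) ; break ;  // create/replace at a known id
    case 'DELETE' : echo handle_delete ( $resource, $id ) ;     break ;
}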

All in all, this means I have many ways into the system and its data, none of them more correct than the other as they all resolve to topics in the topic map. This lowers the need for error checking greatly, and the application is more like a huge proxy for a Topic Map with a REST interface. It's a cute and very effective way of doing it. I'm trying various scaling tests, and with the latest Topic Maps distribution protocols that I can use for distributing the map across any cluster, it's looking really sexy (although I still have some work to do in this area, but the basics rock!).

Anyway, there's a quick intro. I guess I should follow this up with some more detailed code examples. Yeah, maybe next week, as I need to get some other stuff done now, but I like the object model I've got in place, and it's so easy to work with without losing the ability to do the complex stuff. Take care.

Labels: , , , ,

20 October 2008

I went to TMRA 2008, and all I got was the best days of my life ...

Update: I've added an embedded version of the slides at the bottom of the post; my cool animations and lots of fonts are wrong, but hey, you can read it at least. :)

Not to put too much sugar in your otherwise fine brew of tea, but being at TMRA 2008 this year was one of the most fantastic experiences I've had so far. Not only did I catch up with some old friends, I met some new ones I know I'll stay in touch with. So many smart and easy-going folks gathered in one place ... I'm surprised it didn't disintegrate in a puff of logic, as there really must be some cosmic law against it. Although, I see the TED conferences still churning out good stuff, so it must be allowed. And yes, I do equate TMRA with TED; it was that great.

This year I was invited to hold the opening keynote speech, which I called "You're all crazy - subjectively speaking", a romp on the Topic Maps community, a plea to remember epistemology in all things data modeling, and the message that being "subject-centric" is not a technical feat; it's about social processes and agreement (or, at least, rough understanding of each other).

I used a few cheap interactive ploys to hold the audience's attention, making them audibly disagree or agree with certain assertions I put up on the screen. It was very effective at raising the collective awareness of the issues I was trying to point out, and especially helpful when I needed to show that there are some things we all disagree on. And not only that, but things we should disagree on. I think people in general thought it was a good speech, and the feedback was great, so thanks to all for that.

I'd like to thank Lars Marius Garshol and Lutz Maicher for inviting and encouraging me, Patrick Durusau, Jack Park (you need a website or blog, mate!) and Robert Barta for just being who you are, and everyone else for making me once again believe so strongly that the Topic Maps community is the best thing since recursive properties and frames theory!

I'm sure I'll write more on what went down at TMRA 2008, but right now I need to make porridge for my kids. Later.

Labels: , ,

10 October 2008

Keynote speaking at TMRA 2008

Oops, I totally forgot to mention to the world that I'm the intro keynote speaker at the TMRA 2008 conference (one of the two yearly Topic Maps conferences) in Leipzig next week (15-17 October). My talk is titled "We're all crazy - subjectively speaking" and will contain at least one bad joke, two pretty good ones, some philosophical ranting and hopefully lots of community building. I really, really hope to see you there; find me, say hello, let's have tea and discuss whether my two jokes really were good or not.

The big question is, how did I forget to tell you about this? I'll let you know that in a few days time or so.

Labels: , , , , ,

3 July 2008

Round and round it goes

This morning was a good one. I got on the bus, armed with breakfast banana in hand, and right there in front of me sat fellow Topic Mapper Stian Danenbarger (from Bouvet), who happened to be living just literally down the road from me. I've been living at Korsvoll (in Oslo) for 6 months now without bumping into him, how odd is that?

Anyways, the last few days I've written about Language and Semantics and about context for understanding communication (all with strong relations to programming languages), and needless to say this became the topic (heh) of discussion on the bus this morning as well.

In this post I'll try to summarize the discussion so far, fold in the discussion I had on the bus this morning, and couple it with a discussion I've had with Reginald Braithwaite on his blog, from "My mixed feelings about Ruby". Let's start with Reginald and move backwards.

Background
Matz has said that Ruby is an attempt to solve the problem of making programmers happy. So maybe we aren’t happy with some of the accidental complexity. But can we be happy overall? Can we find a way to program in harmony with Ruby rather than trying to Greenspun it into Lisp?
I think that the goal of making programmers happy is a good one, although I suspect there's more than one way to please a programmer. One way is perhaps rooted in the syntax of the language at hand. Then there's the semantics of your language keywords. Another is to have good APIs to work with. Another is how meta the language is (i.e. how much freedom the programmer has in changing the semantics of the language, where Lisp is very meta while Java is not at all), and yet another is the community around it. Or the type and amount of documentation. Or its run-time environment. Or how the code is run (interpreted? compiled? half-compiled to bytecodes?).

Can we find ways in programming that would make all programmers happy? I need to point back to my first post about Language and Semantics and simply reiterate that there's a tremendous lack of focus on why we program in most modern programming languages. Their idea is to shift bits around, and seldom to satisfy some overall, more abstract problem. So for me it becomes more important to convey semantics (i.e. meaning) through my programming than merely having the ability to do so. Most languages will solve any problem you have, so what do the different languages actually offer us? In fact, how different are they most of the time?
At this moment in time I have extremely mixed feelings about Ruby. I sorely miss the elegance and purity of languages like Scheme and Smalltalk. But at the same time, I am trying to keep my mind open to some of the ways in which Ruby is a great programming language.
I think we really agree here. My own experiences with over 8 years of professional XSLT development (yes, look it up :) have taught me some valuable lessons about how elegant functional programming can be, just like Lisp and the mix-a-lot Smalltalk (which I like the less of the two). But then I like certain ways that Ruby does things too, with a better syntax for one. I like to bicker about syntax. Yeah, I'm one of those. And I think I bicker about syntax for very good reasons, too ;

Context

In "just enough to make some sense" I talk about context; how many hints do we need to provide in order to communicate well? Make no mistake; when we program, we are doing more than solving the shifting of bits and bytes back and forth. We are giving hints to 1) a computer to run the code, and 2) the programmer (either the original developer, or someone else looking at her code). Most arguments about syntax seems to stem from 1) in which 2) becomes a personal opinion of individuals rather than a communal excericse. In other words, syntax seems to come from some human designer trying to express hints best to the computer in order to shift bits about, instead of focusing entirly on their programming brothers and sisters.

In the first quote, about Ruby being designed to please the programmer, that would imply that 2) was in focus, but the focus of that quoted statement is all wrong; it pleases some programmers, but certainly not all, otherwise why are we even talking about this stuff?

Ok, we're ready to move on to the crux of the matter, I think.
I am arguing that while it is easy to agree that languages ought to facilitate writing readable programs, it is not easy to derive any tangible heuristics for language design from this extremely motherhood and apple pie sentiment.
Readability is an important and strong word. And it is very important, indeed. We need everything to be readable, from syntax to APIs to environments and onwards. I think we all want this pipe-dream, but we all see different ways of accomplishing it. Some say it's impossible, others say it's easy, while people like Reginald are, I think, right there in the middle, the ultimate pragmatic stance. And if I had never done Topic Maps I would be right there with him. Like Stian Danenbarger said this morning, there's more to readability than just reading the code well.

Topic Maps

Yeah, it's time to talk about what happens when you drink the kool-aid and you accept the paradigm shift that comes with it. There are mainly three things I've learned through Topic Maps ;
  • Everything is a model, from the business ideals and processes, to design and definition, our programming languages, our databases, the interaction against our systems, and the human aspect of business and customers. Models, models, everywhere ...
  • All we want to do is to work with models, and be able to change those models at will
  • All programming is to satisfy recreating those models
Have you ever looked at model-driven architecture or domain-driven design? These are somewhat abstract principles for creating complex systems. Now, I'm not going to delve into the pros and cons of these approaches, but merely point out that they were "invented" out of a need that programming languages didn't solve, namely the focus on models.

Think about it; in every aspect of our programming life, all we do is try to capture models which somehow mimic the real-life problem-space. The shifting of bits wouldn't be necessary if there wasn't a model we're working towards. We create abstract models of programming that we use in order to translate between us humans and those pesky computers that aren't smart enough to understand "buy cheap, sell expensive" as a command. This is the main purpose of our jobs - to make models that translate human problems into computer-speak - and then we choose our programming language to do this in. In other words, the direction is not language first, then the problem, but the other way around. In my first post in this series I talked about tools, and about choosing the "right tool for the job." This is a good moment to lament some of what I see as the real problems of modern programming languages.

What objects?

Object-oriented programming. Now, don't get me wrong, I think OOP is a huge improvement over the process-oriented imperative ways of the olden days. But as I said in my last post, it looks so much like the truth that we mistakenly treat it as truth. The truth is there's something fundamentally wrong with what we know as object-oriented programming.

First of all, it's not labeled right. Stian Danenbarger mentioned that someone (can't remember the name; Morten someone?) said it should be called "class-based programming", or - if you know the Linnaean world - taxonomical programming. If you know about RDF and the Semantic Web, it too is based loosely on recursive key/value pairs, creating those tree-structures as the operative model. This is dangerously deceitful, as I've written about in my two previous posts. The world is not a tree-structure, but a mix of trees, graphs and vectors, with some semi-ordered chaos thrown in.

Every single programming approach, be it a language or a paradigm like OOP or functional, comes with its own meta model of how to translate between computers and the humans that use them. Every single approach is an attempt to recreate those models, to make it efficient and user-friendly to use and reuse those models, and make it easy to change the models, remove the models, make new ones, add others, mix them, and so on. My last post goes into much detail about what those meta models are, and those meta models define the communication from human to computer to human to computer to human, and on and on and on.

It's a bit of a puzzle, then, why our programming languages focus less on the models and more on shifting those bits around. When shifting bits is the modus operandi and we leave the models in the hands of programmers who normally don't think too much about those models (and, perhaps by inference, programmers who don't think about those models go on to design programming languages in which they want to shift bits around ...), you end up with some odd models, which most of the time are incompatible with each other. This is how all models are shifted to the API level.

Everyone who has ever designed an API knows how hard it can be. Most of the time you start in one corner of your API thinking it's going smoothly until you meet the other end, and you hack and polish your API as best you can, and release version 1.0. If anyone but you uses that API, how long until requests for change, bugs, "wouldn't it make more sense to ...", "What do you mean by 'construct objects' here?", and on and on and on. Creating APIs is a test of all the skills you've got. And all of the same can be said about creating a programming language.

Could the problem simply be that we're using a taxonomic programming language paradigm in which we try to create graph-structured applications? I like to think so. Why isn't there native support in languages for typed objects, the most basic building block of categorisation and graphing?

$mice = all objects of type 'mouse' ;

Or cleanups?

set free $mice of type 'lab' ;

Or relationships (with implicit cardinality)?

with $mice of type ('woodland')
add relationship 'is food' to objects of type 'owl' ;

Or prowling?

with $mice that has relationship to objects of type ('owl')
add type ('owl food') ;

Or workflow models?

in $workflow at option ('is milk fresh?') add possible response ('maybe')
with task ('smell it') and path back to parent ;

[disclaimer : these are all tongue-in-cheek examples]

I know you can extend some languages to do the basic bidding here - for example, in JavaScript I can change the prototype for basic objects and types - but it's an extension each programmer must make, and the syntax is bound to the limits of the meta model of the language, making most such extensions look kludgy and inelegant. And unless they know all the problems I think we've been talking about here, they really won't do this. This sort of discussion certainly does not appear where people learn programming skills.
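To be fair, you can fake the simplest of those constructs in most languages already. Here's a throwaway PHP sketch of an "all objects of type X" query - a made-up registry, nothing standard - which also shows why it still reads nothing like the pseudo-syntax above ;
// Made-up registry, just to show how far (and how kludgily) you get without native support
class TypedRegistry {
    private $entries = array () ;

    function add ( $object, $types ) {
        $this->entries[] = array ( 'object' => $object, 'types' => $types ) ;
    }

    // $mice = all objects of type 'mouse'
    function ofType ( $type ) {
        $hits = array () ;
        foreach ( $this->entries as $entry ) {
            if ( in_array ( $type, $entry['types'] ) ) $hits[] = $entry['object'] ;
        }
        return $hits ;
    }
}

$registry = new TypedRegistry () ;
$registry->add ( 'mickey', array ( 'mouse' ) ) ;
$registry->add ( 'pinky', array ( 'mouse', 'lab' ) ) ;
$registry->add ( 'archimedes', array ( 'owl' ) ) ;

$mice = $registry->ofType ( 'mouse' ) ;   // array ( 'mickey', 'pinky' )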

No, most programming languages follow the tree-structure quite faithfully, or more precisely the taxonomic model (which is mostly trees, but with the odd jump (relationship) sideways in order to deal with the kludges that didn't fit the tree). Our programs are exactly that; data and code, and the programming languages define not only the syntax for how to deal with the data and code, but the very way we think about dealing with blobs of data and code.

They define the readability of our programs. So, Reginald closes;
Again we come down to this: readability is a property of programs, and the influence of a language on the readability of the programs is indirect. That does not mean the language doesn't matter, but it does make me suspicious of the argument that we can look at one language and say it produces readable programs and look at another language and say it does not.
Agreed, except I think most of the languages we do discuss are all forged over the same OOP and functional anvil, in the same "shifting the bits and bytes back and forth" kind of thinking. I think we need to think in terms of the reason we program; those pesky models. Therein lies the key to readability: when the code resembles the models we are trying to recreate.

Syntax for shifting bits around

Yes, syntax is perhaps more important than we like to admit. Syntax defines the nitty-gritty way we shift those bits around in order to accomplish those modeling ideals. It's all in the eyes of the beholder, of course, just as every programming language meta model has its own answer. What is the general consensus on good syntax that conveys the right amount of semantics in order for us all to agree on its meaning?

There are certain things which seem to be agreed on. Using angle brackets and the equals sign for comparators of basic types, for example, or using colon-equals to assign values (although there's a 50/50 split on that one), using curly brackets to denote blocks (but not closures), using square brackets for arrays or lists (but not in functional languages), using parentheses for functional lists, certain keywords such as const for constants, var for variables (mostly in loosely typed languages, for some reason) or int or Int for integers (basic types or basic type classes), and so on. But does any of this really matter?

As far as shifting bytes around goes, I'd say they don't matter. What matters is why we're shifting the bytes around. And most languages don't care about that. And so I don't care about the syntax or the language quirks of inner closures when inner closures are a symptom of us using the wrong tools for the modeling job at hand. We're bickering about how to best do it wrong instead of focusing on doing it right. Um, IMHO, of course, but that's just the Topic Maps drugs talking.

Just like Robert Barta (who I wish would come to dinner more often), I too dream of a Topic Maps (or graph based) programming language. Maybe it's time to dream one up. :)

Labels: , , , ,

2 July 2008

Just enough to make some sense

I've realized that my previous post on language and semantics could possibly be a bit hard to understand without having the proper context wrapped around it, so today I'll continue my journey of explaining life, universe and everything. Today I want to talk about "just enough complexity for understanding, but not more."

Mouses

Let's talk about mouse. Or a mouse. Mice. Let's talk about this ;

One can argue whether this is really enough context for us to talk about this thing. What does "mouse" mean here? The Disney mouse? A computer mouse? The mouse shadow in the second moon? In order for me to communicate clearly with my fellow human beings I need to provide just enough information so that we can figure this out, so I say "mouse, you know the furry, multivorous, small critter that ..." ;


This is too much information, at least for most cases. I'm not trying to give you all the information I know about mice, but just enough for me to say "I saw a mouse yesterday in the pantry." Talking about context is incredibly hard, because, frankly, what does context mean? And how much background information do I need to provide to you in order for you to understand what I'm talking about?

In terms of language "context" means verbal context as words and expressions that surrounds a word, and social context as the connection between the words and those who hear or read them based on the human constraints (age, gender, knowledge, etc.) There's also some controversy about this, and we often also imply certain mental models (social context of understanding).

In general, though, we talk about context as "that stuff that surrounds the issue": solid objects, ideas, my mental state, what I see, what I know, what my audience sees and knows, hears, smells, cultural and political history, musical tastes, and on and on and on. Everything in the moment and everything in the past, in order to understand the current communication that takes us to the future.

Yup, it's pretty big and heady stuff, and it's a darn interesting question; how much context do you need in order to communicate well? My previous post was indeed about how much context we need to put into our language and definition in order to communicate well.

A bit of background

Back in 1956 a paper by the cognitive psychologist George A. Miller changed a lot of how we think about our own capacity for juggling stuff in our heads. It's a most famous paper, where further research since has added to and confirmed the basic premise that there's only so much we're able to remember at the same time. And the figure that came up was 7, plus / minus 2.

Of course that number is specific to that research, and may mean very little in the scheme of more specific settings. It's a general rule, though, that hints to the limits we have in cognition, in the way we observe and respond to communication. And it certainly helps us understand the way we deal with context. Context can be overly complex, or overly simple. Maybe the right amount of context is 7, plus / minus 2?

Just right



I'm not going to speculate much in what it means that "between 5 and 9 equally-weighted error-less choices" defines arbitrary constraints on our mental storage capacity (short-term especially), but I'll for sure speculate that it guides the way we can understand context, and perhaps especially where it's loosely defined.

We humans have a tendency to think that those things that look like the truth must be the truth. We do this perhaps especially in the way we deal with computer systems, because, frankly, it's easy to define structures and limitations there. It's what we do.

An example of this is how we observe anything as containers that may contain things, which in themselves might be containers holding yet more things or containers, and so on. Our world is filled with this notion, from taxonomies, to object-oriented programming, to XML, to how we talk about structures and things, to how science was defined, and on and on and on. Tree-structures, basically.

But as anyone with a decent taxonomic background knows, taxonomies don't always work as a strict tree-structure. Neither, as anyone who's meddled in OO for too long or fiddled with XML until the angle-brackets break will know, do our models. These things look so much like the truth that we pursue them as truth.

Things are more chaotic than we like. They're more, in fact, like graph structures, where relationships between things go back and forth, up and down, over and under already established relationships. It can be quite tricky, because the simple "this container contains these containers" mentality is gone, and a more complex model appears ;


This is the world of the Semantic Web and Topic Maps, of course, and many of the reasons why these emerging technologies are, er, emerging is of course that all containers aren't containers at all, and that the semantics of "this thing belongs to that thing" isn't precise enough when we want to communicate well. Explaining the world in terms of tree-structures puts too many constraints on us, so many that we spend most of our time trying to fit our communication into them rather than simply expressing it.

We could go back to frames theory as well, with the recursive key/value properties that you find naturally in b-trees, where a value is either a literal or another property. RDF is based on this model, for example, where the recursiveness is used for creating graph structures. (Which is one reason I hate RDF: using anonymous nodes for literals.)
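In code, that frames idea is nothing more than nested key/value pairs where a value is either a literal or another frame - and the moment you allow a value to point back at an existing frame, your tree has quietly become a graph. A throwaway PHP sketch (made-up data, obviously) ;
// A value is either a literal or another frame (here just nested arrays)
$mouse = array (
    'name'    => 'wood mouse',
    'habitat' => array (
        'name'    => 'woodland',
        'part_of' => 'temperate forest',
    ),
) ;

// The reference back into an existing frame is what breaks the pure tree
$owl = array (
    'name' => 'tawny owl',
    'eats' => &$mouse,
) ;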

Programming languages and meta models

Programming languages don't extend the basic pre-defined model of the language much. Some languages allow some degree of flexibility (such as Ruby, Lisp and Python), some offer tweaking (such as PHP, Lua and Perl), while others offer macroing and overloading of syntax (mostly the C family), and yet more are just stuck in their modeling ways (Java). [note: don't take these notions too strictly; there's a host of features to these languages that mix and match various terms, both within and outside of the OO paradigm]

What they all have in common is that the defined meta model is linked to shifting bits and bytes around a computer program, and that all human communication and / or understanding is left in the hands of programmers. Let's talk about meta models.

Most programming languages have a set of keywords and syntax that make up a model of programming. This is the meta model; it's the foundation of a language, a set of things on which you build your programs. All programming languages have more or less of them, and the more they have, the stricter they usually are as well. Some are object-oriented languages, others functional, some imperative, and yet others mix things up. If I write ;

Integer i = new Integer ( 34 ) ;

in Java, there's only so many ways to interpret that. It's basically an instance of the Integer class that holds the integer number 34. But what about

$i = new Int ( 34 ) ;

in PHP? There is no built-in class called Int in PHP, so this code either fails or produces an instance of some class called Int, but we do not know what that means, at least not at this point. And this is what the meta model defines; built-in types, classes, APIs and the overall framework, how things are glued together.

As such, Java and .Net have huge meta models defined, so huge that you can spend your whole career in just one part of them. PHP has a medium-sized meta model, Perl an even smaller one, all the way down to assembler with its rather puny meta model. Syntax and keywords are not just how we program; they define the constraints of our language. There are things that are easy and things that are hard in every language, and there is no one answer to what the best programming language is. They all do things differently.

The object-oriented ways of Java differ from those of Ruby, which differ from the ways of C++, which differ from the ways of PHP. The functional ways of Erlang differ from XSLT, which differs from Lisp.

The right answer?

There is no right answer. One can always argue about the little differences between all these meta models, and we do, all the time. We bicker about operator overloading, about whether multiple inheritance is better than single inheritance, about the real difference between interfaces and abstract classes, about getter and setter methods (or the lack thereof), about whether types should be first-class objects or not, about what closures are, about whether to use curly brackets or define program structure through whitespace, and on and on and on.

My previous post was another way of saying that we perhaps should argue less about the meta model of our language, and worry more about the reason the program was created than about how a certain problem was solved. We don't have the mental capacity to juggle too much stuff around in our brains, and if the meta model is huge, our ability to focus on perhaps the important bits becomes less.

There are so many levels of communication in our development stack. Maybe we should introduce a more semantically sane model into it, to move a few steps closer to the real problem: the communication between man and machine? I'm not convinced that either OO or functional programming solves the human communication problem. Let's speculate and draw sketches on napkins.

Labels: , , , , , , ,