
Note : This document was written for an internal workshop at Bekk Consulting. Copyright to them and me. Don't use it without permission, except the tricks and tips within.

XSLT - all-singing? All-dancing?

a simple and opinionated introduction to XSLT in real life

by Alexander Johannesen, Bekk Consulting AS

1. Intro

This document tries to balance the needs of system programmers and architects on the one side and interface implementers on the other. XSL is most often used to transform XML into something more visible, like HTML, and hence a lot of its more practical and good uses tend to get lost and are never used.

It also offers some strategy and hints from people who have used XSL in development and production environments, pointing out both the pitfalls and the places where it excels.

1.1 Hyperspace

The Internet is often more about hype than anything, and XML and its various cousin technologies are no exception. Ever since XML was introduced back in 1996 as a subset of SGML, it has been a growing and evolving set of standards, expressions and uses, surrounded by myth and legend. The latter probably comes from strong evangelism, but maybe also from a hope of hyping it into more than it really is, so that it sticks around long enough for most of its quirks to get sorted out. Lots of theories, folks.

But whatever the reasons for the fog surrounding it: is it worth the hype? Yes. Maybe. And no. Let's explore these options a little.

1.2 Yes.

Yes, XML and its cousins are pretty cool, making life easier for a lot of computers and people out there who deal with objects, data and structures. XML sometimes gives pretty obvious, simple and elegant solutions to somewhat complex, ugly and difficult problems.

1.3 Maybe.

Maybe, if we look at everything from a distance for a moment, we'll see that our structures are wrong, that something basic and fundamental needs changing. That is not always easy to change, and it is one of the bigger pitfalls of relying totally on structures that later turn out to be not-so-great structures at all.

Note also the cases where your XML data arrives unsorted, unformatted and untested: instead of writing rather complex XSLT, you might consider pre-processing the data before storing it as XML.

1.4 And no.

No, XML can't do it all, and some things don't really belong in it at all, especially things that are not text-based. Not to say you can't do it with XML, but rather that you shouldn't. Organised and structured data is only useful if the structure means something when you read it back, and sometimes you have data that can't be structured in a hierarchical way, like real-time acquisitions or binary or semi-binary data. In those cases an alternative approach may be much better for your solution.

2. What you should know

First, you should know about SGML, the mother of both XML and HTML. Standardised back in 1986, it is a complex monster of a DTD-based markup language that can do more or less anything, and then some. XML is a simplified subset of SGML, and HTML was defined as an application of it.

You of course need to know what XML is. If you don't know, stop reading this document right now, as it wouldn't make much sense to you. Do a search online for XML basics or a FAQ to get you going. Don't like structured data? Go somewhere else, as this document is not for you.

You should know what XSL really is. XML is a group term that contains funny things like XSL, XML Protocol, XML Schema, XML Query, XLink, XPointer, XML Base, DOM, RDF, XHTML, MathML, SVG, and more. In the same way, consider XSL to be a node on the XML tree with several sub-nodes of its own: XSLT, XSL-FO, XPath, QName, XQuery, and more being added as we speak. Of these, XSLT is the part through which XSL transforms XML documents.

An XSLT stylesheet is itself an XML document, meaning it must be well-formed and standards compliant. If you know how namespaces in XML work, XSLT is simply a namespace whose elements XSLT parsers pick up and act on. This has one strict consequence: the input to an XSLT transformation must be conformant XML. That sets certain limits on the characters that can be used, the structure of the data, and how you can manipulate it.
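To make that a bit more concrete, here is a minimal sketch of a stylesheet. The input element message and the output element greeting are made up for illustration; only the xsl: elements and the namespace URI come from the XSLT 1.0 specification.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The stylesheet is itself well-formed XML; the xsl: prefix is bound to the XSLT namespace. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Elements in the XSLT namespace are instructions to the parser. -->
  <xsl:template match="/">
    <!-- Elements outside the XSLT namespace (greeting is hypothetical) go to the output as written. -->
    <greeting>
      <xsl:value-of select="/message"/>
    </greeting>
  </xsl:template>

</xsl:stylesheet>
```

Given an input document consisting of a single message element, this should produce a greeting element with the same text inside.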

2.1 Who wants to use XSLT?

Someone smart. Yes, really: anybody with a minimum of sense and intelligence should at least look at it. As XML is the de facto exchange format for textual data over the Internet today, it makes sense to use a transformer that specialises in XML. Anywhere structured textual data needs to be processed in any way, XSL is an ideal way of doing just that.

As XSL is an XML-transforming specialist, you may want to use it instead of massive programming in your favourite language to do selections, sorting, transforming and otherwise changing your XML data into something else. All XSL parsers are optimised for these tasks, and it is very likely that they will outperform your own program in such tasks. And it may save you a lot of coding, too.

2.2 Who shouldn't use XSL?

A word of warning: don't use it if you don't mean it. It's more about system architecture than about being yet another cool tool to solve a certain problem, and if you haven't got XML at both ends of your system, XSL may not solve your problems as intended. Let me repeat those important words: there must be XML. Even though HTML also descends from SGML, that in no way implies that using HTML with XSL will be easy. In fact, the XHTML standard was created for this very reason. If you need to reuse a lot of HTML, make sure it is converted to XHTML before attempting to use it with XSL. There are tools that can help you with this.

So basically, if you've got a lot of loose HTML (note; not XHTML) that you want to reuse, XSL will not be your friend. If you want to use HTML snippets (partial code, unstructured elements), XSL will not like you. If you have a lot of weird characters or binary data in there, XSL will hate you. You can't just send anything at an XSL parser and think it works right away; it must pass the test of XML.

So, think XML. And let's get productive.

2.3 Basic flow

As I've already mentioned a few times, the input is XML and must conform to that standard. I think this cannot be repeated too often, as many sticky situations can bite you from behind if you overlook it. The output, however, can be more or less anything text-based, but there are two outputs it is mostly used for: XML and HTML.

The XSL parser reads its input and looks at the namespace of every XML node to find those that belong to the XSL standard. An example:

<tag>example of XML tag</tag>

In the example above, the XML tag will either pass through as is or, if there is an XSL instruction that reacts to it, be processed accordingly. If you don't know anything about namespaces: a namespace is simply a mechanism in the XML standard for mixing data from various sources. Let's look a little closer at namespaces.

2.4 Name that tune

A quick example shows what namespaces are:

<tag>example of XML tag</tag>

<au:tag>example of XML tag from Australia</tag>

<no:tag>example of XML tag from Norway</no:tag>

All three of these tags have a namespace attached: the first one uses the default namespace (simply the root of your input XML dataset), the second one a namespace for Australia, and the third one a namespace for Norway. In a lot of cases you want to use data nodes from several sources (like various documents or databases), but sometimes the data have the same tag names, and hence you need a way to differentiate between them. Namespaces give us this mechanism.

Note that the Australia-tag doesn't have an exactly matching end-tag. You can't mix namespaces within one element: the start-tag and the end-tag must belong to the same namespace, and hence an unprefixed end-tag, which really relates to a different namespace, is invalid there.
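For comparison, a well-formed version of the three tags might look like the sketch below. The root element tags, the prefixes au and no, and all three namespace URIs are made up for illustration; any URI you control will do.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The namespace URIs and prefixes below are hypothetical. -->
<tags xmlns="http://example.org/default"
      xmlns:au="http://example.org/australia"
      xmlns:no="http://example.org/norway">
  <!-- No prefix: belongs to the default namespace declared on the root. -->
  <tag>example of XML tag</tag>
  <!-- Start-tag and end-tag carry the same prefix, so each element is well-formed. -->
  <au:tag>example of XML tag from Australia</au:tag>
  <no:tag>example of XML tag from Norway</no:tag>
</tags>
```

A stylesheet that wants to pick out only the Australian tags would declare the same URI for a prefix of its own and select on that prefix, for instance au:tag.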

Advice : Namespaces can cause blisters and frustration. If you don't have to use them, try to avoid them as they make life so much more complex.

Why this emphasis on namespaces? Simply because XSL is a group of instructions to a parser, with behaviour defined by the XSL specifications, that is marked up through the XSL namespace. Phew! One long sentence, but the whole definition of XSL in one sentence. Not bad.

3. What can it do?

XSL can do a lot of practical transformations and manipulation of structured textual data. It can move things around, sort it, change it, and output it to some format you may think is the right format for the job.

It can be used for single-pass or multiple-pass transformation for a variety of results, from converting one XML into another for further processing, to converting it to HTML for displaying.
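As a small taste of the sorting and converting, here is a sketch that turns a list of people into an HTML list ordered by name. The input elements staff, person and name are assumptions made for this example.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Assumes a hypothetical input of the form
       <staff><person><name>Pedro</name></person> ... </staff> -->
  <xsl:template match="/staff">
    <ul>
      <xsl:for-each select="person">
        <!-- xsl:sort must be the first child of xsl:for-each; the default is ascending text order. -->
        <xsl:sort select="name"/>
        <li><xsl:value-of select="name"/></li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>
```

Doing the same in a general-purpose language usually means parsing, collecting, sorting and serialising by hand.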

3.1 ...and what it shouldn't do

Anything dynamic. No, really, it isn't suitable for anything dynamic. Or anything too complex. Or flexible. If you need parameterised functions, dynamic includes, variable searching or regular expressions, look elsewhere. Quite a number of deliberate limitations have been put on XSL to avoid complexity and bloat in the language. For tasks like these I'd suggest you look to your favourite programming language instead.

Also, don't use XSL in the hope that the XML structure will make your life easier. If your input is bad XML, the XSL will be a nightmare to control and maintain. A classic pitfall is when you have XML that looks like this:

45653

Ocean

95865

Grass

This creates hopelessly complex XPath expressions. Another typical example of how your input XML can make life difficult is the flat model, very typical when translating CSV (Comma Separated Values) to XML:

1001

Oscar

available

1002

Pedro

vacation

1002

Christie

unavailable

This may look like sane code at first, but XSL is not an ordered language; it is an event-based language, meaning you have no knowledge of the order in which elements will be triggered. Yeah, they look nice and orderly, but in XSL those lines separating the elements so nicely are just whitespace, and in XML, as in XSL, HTML and SGML, whitespace is ignored and can't normally be used as a delimiter, unless you're one of those with too much time and sanity to waste.
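To show what the flat model forces on you, here is a sketch that rebuilds the records from loose siblings. The element names staff, id, name, status and person are assumptions, as the flat model itself says nothing about what belongs together.

```xml
<!--
  Flat input with hypothetical element names; each record is three loose siblings:

  <staff>
    <id>1001</id> <name>Oscar</name> <status>available</status>
    <id>1002</id> <name>Pedro</name> <status>vacation</status>
  </staff>
-->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/staff">
    <staff>
      <!-- Nothing ties a name to its id except the fact that it comes right after it. -->
      <xsl:for-each select="id">
        <person id="{.}">
          <name><xsl:value-of select="following-sibling::name[1]"/></name>
          <status><xsl:value-of select="following-sibling::status[1]"/></status>
        </person>
      </xsl:for-each>
    </staff>
  </xsl:template>
</xsl:stylesheet>
```

Once the records are nested like this, a later pass can simply ask for /staff/person[@id='1002']/status instead of counting siblings. One missing status element in the flat input, though, and every record after it silently borrows the wrong value.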

In short; make sure your XML is nice and neat, contextual and structural, and I'm tempted to say, mostly self-explanatory. Even programmers are by most still regarded as thinking and relational human beings.

3.2 ...and what it really cannot do!

There are a number of things it can't do as of this writing, which is based on the XSLT 1.0 specifications;

True variables: variables in XSL are constants, with block scope only. A variable defined in one template stays there.

Variables in XPath expressions at run-time: a path stored in a variable as a string cannot be evaluated as an expression.

DOCTYPE manipulation: nope, any DOCTYPE definition is hidden from XSL, as it is a declaration and not a node in the tree.

Loose structure (like HTML): the input must be well formed XML.

Conditional includes.

File and directory control.

Sorting dynamically.
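The first point in the list is the one that surprises most programmers. Since a variable can't be reassigned, a counter has to be passed down through recursion instead. A minimal sketch, where the template name count-up and its parameters current and last are made up for this example:

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:call-template name="count-up">
      <xsl:with-param name="current" select="1"/>
      <xsl:with-param name="last" select="3"/>
    </xsl:call-template>
  </xsl:template>

  <!-- A variable cannot be incremented, so each step hands a new value to the next call. -->
  <xsl:template name="count-up">
    <xsl:param name="current"/>
    <xsl:param name="last"/>
    <xsl:if test="$current &lt;= $last">
      <xsl:value-of select="$current"/>
      <xsl:text>&#10;</xsl:text>
      <xsl:call-template name="count-up">
        <xsl:with-param name="current" select="$current + 1"/>
        <xsl:with-param name="last" select="$last"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

Run against any XML input, this should print 1, 2 and 3 on separate lines. It works, but it is a lot of markup for a three-step loop.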

3.3 ...and, of course, how to still do it

As in any technology that is distinguishable from magic, there are workarounds to most problems, with their additional bloat and strings attached. As a general rule, if you're looking for too many workarounds, you're working in a language that may not be suited for your needs.

Most good parsers around have extensions to the XSL specification to do Frequently Asked for Features, like regular expressions on strings or getting the current document's URL. Most often they are defined in a separate namespace, like the common library from the rogue EXSLT specifications (see www.exslt.org for more).

The parser picks up the common namespace, and passes the nodes through the extension functions in the parser. Do note, however, that most parsers, except those that comply with extension libraries such as EXSLT, have their own names for these functions. Some research might be needed before you solve your problem, and the final code is in no way portable between parsers.
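A sketch of what using the EXSLT common library can look like, assuming your parser supports it. The function exsl:node-set and the namespace URI come from EXSLT; the colours variable and the result and item elements are made up for this example.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:exsl="http://exslt.org/common"
                extension-element-prefixes="exsl">

  <!-- In XSLT 1.0 this variable holds a result tree fragment, which XPath cannot walk into. -->
  <xsl:variable name="colours">
    <colour>blue</colour>
    <colour>green</colour>
  </xsl:variable>

  <xsl:template match="/">
    <result>
      <!-- exsl:node-set() converts the fragment into a node-set that can be selected from. -->
      <xsl:for-each select="exsl:node-set($colours)/colour">
        <item><xsl:value-of select="."/></item>
      </xsl:for-each>
    </result>
  </xsl:template>
</xsl:stylesheet>
```

On a parser without EXSLT support this stylesheet fails at the function call, so check your parser's documentation before relying on it.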

4. Clever boy

Even if you have extensions to XSL, these are made mostly for special cases and for using XSL for things other than what it was designed for at the time. XSL is quite a clever boy even without all the extensions, as several technologies and protocols are involved in an XSL transformation, ranging from searching and selection through XPath expressions to sorting and manipulation of a tree of nodes or a single node.

The basic loop of an XSL parser is to select something, do something with that selection, and select again until the end is reached. There is mainly one way of selecting data: through the "select=" attribute attached to a variety of XSL tags, like <xsl:apply-templates>, <xsl:for-each>, <xsl:value-of>, and so on.
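A sketch of that select-and-process loop, with select= used on three different XSL tags. The input elements manual, chapter, section and title are assumptions made for this example.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- select on xsl:apply-templates: hand every chapter to the template below -->
    <xsl:apply-templates select="manual/chapter"/>
  </xsl:template>

  <xsl:template match="chapter">
    <!-- select on xsl:value-of: write out the text of this chapter's title -->
    <xsl:value-of select="title"/>
    <xsl:text>&#10;</xsl:text>
    <!-- select on xsl:for-each: loop over the sections of this chapter only -->
    <xsl:for-each select="section">
      <xsl:text>  </xsl:text>
      <xsl:value-of select="title"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Each select is evaluated relative to the node currently being processed, which is why the same expression title gives the chapter title in one place and the section title in another.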

4.1 XPath

The most powerful way of selecting anything is through XPath. Do note that you can loop through a known structure of an XML node tree without the use of XPath, and it is recommended for performance to do so, but when your tree is bigger than a few lines, XPath is your friend. Whenever you need to dynamically select a part of your XML document for processing, you'll probably use XPath to do it. XPath is an expression language specially constructed to work miracles on XML nodes, and to know XPath well is to master XSL well, both in performance and abilities.

Explaining the wonders of XPath is beyond the scope of this document, but a quick introduction is necessary;

Just like a DOM API, XPath gives the XSL programmer the ability to break down the node tree (here, the XML input document) and select the parts you need. A quick example, given the following XML;

Ocean

Mountain

Grass

Sky

we can select all tags as follows;

/various-tags/tag

An XPath expression is somewhat similar to a path in most common file systems, with a few exceptions. Given the same XML, we can select all tag nodes with "blue" as the value of their color attribute as follows;

/various-tags/tag[@color='blue']

There are many criteria you can attach to an XPath expression, ranging from the positions of nodes, their relationships, their attributes and their content, to simple mathematical evaluations and linear lookups in external XML documents. It can be a powerful tool indeed.
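A few of these criteria in action, as a sketch. The element and attribute names follow the expressions above, but the colour values in the assumed input are guesses made for illustration.

```xml
<!--
  Assumed input (the colour values are guesses for illustration):

  <various-tags>
    <tag color="blue">Ocean</tag>
    <tag color="grey">Mountain</tag>
    <tag color="green">Grass</tag>
    <tag color="blue">Sky</tag>
  </various-tags>
-->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- By attribute: every tag whose color is blue (Ocean and Sky in the assumed input). -->
    <xsl:for-each select="/various-tags/tag[@color='blue']">
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>

    <!-- By position: the second tag (Mountain). -->
    <xsl:value-of select="/various-tags/tag[2]"/>
    <xsl:text>&#10;</xsl:text>

    <!-- By content and relationship: the first tag after the one containing Grass (Sky). -->
    <xsl:value-of select="/various-tags/tag[.='Grass']/following-sibling::tag[1]"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Simple arithmetic: the number of tags that are not blue (2). -->
    <xsl:value-of select="count(/various-tags/tag) - count(/various-tags/tag[@color='blue'])"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Each predicate in square brackets narrows down the selection made by the path in front of it.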

Do be aware of some pitfalls, mainly in two areas:

Performance : XPath can be quite consuming of your computer's resources, as it can easily be the loop in which the XSL parser spends most of its time and processing. Copying nodes and simple string manipulation are nothing compared to recursive searches with multiple options on complex node trees.

Complexity : XPath expressions, unless you split them up or find other workarounds, can be unnecessarily long, and it can be hard to work out what they really do. Sometimes they even do more than you bargained for, and they can often become a major source of bugs that are hard to track down.

5. Simple pleasures

Simple pleasures are always the best, and with XSL this is not only no exception, but a must for keeping your code maintainable. XSL is not a pretty language, and given enough complexity the code will not only be arcane in visibility but also poor in implementation. XSL was not meant for a lot of things, and even though you can probably do it, it doesn't mean that you should. If the code is too bloated and ugly, maybe you're doing it wrong or using the wrong method. Just because you can do it with XSL doesn't mean you have to, and most often you really shouldn't.

XSL documents have a tendency to get very large and difficult to grasp. Hence it is recommended that when your XSL document exceeds 4 to 6 pages, you move parts of it out to a separate document and include it (through xsl:import or xsl:include), thus perhaps creating reusable libraries of XSL code. Remember to group related templates together.

One of the main reasons for the complexity of XSL is bad naming of templates, variables and parameters. Keep the names as sane as possible, and use long descriptive names if you can, as this has no effect on performance.
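A sketch of how such a split might look. The file names common-layout.xsl and date-formatting.xsl are hypothetical and would have to exist next to the main stylesheet for this to run.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- xsl:import must come before all other top-level elements;
       imported templates lose against templates defined in this file. -->
  <xsl:import href="common-layout.xsl"/>

  <!-- xsl:include pulls templates in as if they were written right here,
       at the same precedence as this file's own templates. -->
  <xsl:include href="date-formatting.xsl"/>

  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>
</xsl:stylesheet>
```

Use xsl:import for library code you want to be able to override locally, and xsl:include when you only want to cut a large file into pieces.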

6. Hairy palms and turning blind

There are many pitfalls when doing XSL, and even though some of these lie in the language itself, there are bigger and far more severe pitfalls in the general architecture of your system that might prove fatal. If you haven't thought through your structured data you might find yourself in a blind alley where XSL can't even be used, or where the extensions you're so fond of using are suddenly no longer available.

A short list of useful hints in planning and implementing your use of XSL:

Don't use XSL for bloody everything

XSL isn't the saviour for all, and it surely doesn't live up to the hype of modern-day XML evangelists who tell you about all the bells and whistles XSL has got in order to impress you.

You can't change the node tree

If you need to change a tree structure, you can only create a new one with XSL. It is a compiler, not a program.

Don't think in one generation

XSL performs very well when given several passes over the same data. Too many try to do everything with their XML in one sitting, forgetting that the job becomes simpler, quicker and more robust through two or more passes.

Don't overuse dynamic techniques

Trust me that most data don't need to be parsed every time someone wants to see a page. Most XSL results can be cached, unless the data is returned from a database and is unique to every user. Look into what data needs dynamic transformation, and what data you can statically build and bind long before the data reaches the user. This enhances performance, and can make it easier to separate content and design.

String manipulation

Don't do it. XSL truly sucks when strings and numbers need to be changed, fixed, converted, moved, added, sorted or otherwise handled in any way apart from checking that the data is really there. Do it elsewhere. It makes your life simpler. Make sure the XML input is as ready as possible for output: already checked for case, already stripped of leading, trailing, and annoying characters, already converted to a nice date format, and already removed if a node was left blank.

General conditionals

They suck in XSL. Keep them simple, and be prepared to write lots of code twice if not more due to the block scope rules of XSL.

Keep these in mind, and prosperity and a happier life will be yours. Don't overlook any of them, and don't think that someone down the line will have to deal with it. Quality of data is always an important success factor, and the earlier in the chain it is introduced, the higher the smell of success. Don't think that you'll "chuck it into XML, and let the other guy handle the rest", as this will most certainly make the other guy ask you why you did this stupid thing. Most questions I see about XSL are about workarounds due to strange or difficult input, apart, of course, from those questions that are linked to the semantics of the language itself and the various decisions made during its creation.
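The block-scope problem mentioned under general conditionals has one common workaround: put the condition inside the variable rather than the variable inside the condition. A sketch, assuming hypothetical input elements person, name and status:

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="person">
    <!-- A variable set inside xsl:if would be gone again at the closing tag,
         so the choice goes inside the variable instead. -->
    <xsl:variable name="label">
      <xsl:choose>
        <xsl:when test="status = 'available'">in the office</xsl:when>
        <xsl:when test="status = 'vacation'">on holiday</xsl:when>
        <xsl:otherwise>not reachable</xsl:otherwise>
      </xsl:choose>
    </xsl:variable>

    <xsl:value-of select="name"/>
    <xsl:text> is </xsl:text>
    <xsl:value-of select="$label"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

It keeps the output code in one place, but notice that a three-way choice already costs eight lines of markup.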

7. Working example

A working example for those who want to have some actual code to look at:

The XML

How do you make HTML from XML through XSL?

What you need

You need an XML file, an XSL file, and an XSL parser.

You also need basic knowledge of XML and structured data, and to know how to open, edit and close files and run simple programs.

How to make XML

How to structure your data

There are many ways to structure your data, but always keep in mind that your data should represent both how it logically fits together and how best to find your way around in it.

How to make XSL

What is the difference between XML and XSL?

XML is a data structure, and an XSL is an XML file with additional information on how to transform an XML into something else, such as HTML or text files.

XSL is XML?

Yes, but where XML holds data in a structure, XSL holds specific information about how the data in the XML is to be processed, in addition to other information that should also be part of the result.

The XSL

Manual


Chapter .





.
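A minimal sketch of a stylesheet in this spirit, producing an HTML page with numbered chapters and sections. It assumes the XML uses the hypothetical element names manual, chapter, section, title and para; adjust the names to whatever your own XML uses.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" encoding="UTF-8"/>

  <!-- Assumed structure: manual > title, chapter* ; chapter > title, para*, section* ;
       section > title, para* -->
  <xsl:template match="/manual">
    <html>
      <head><title>Manual</title></head>
      <body>
        <h1><xsl:value-of select="title"/></h1>
        <xsl:apply-templates select="chapter"/>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="chapter">
    <h2>
      <xsl:text>Chapter </xsl:text>
      <!-- Numbers the chapter by its position among its chapter siblings. -->
      <xsl:number count="chapter"/>
      <xsl:text>. </xsl:text>
      <xsl:value-of select="title"/>
    </h2>
    <xsl:apply-templates select="para | section"/>
  </xsl:template>

  <xsl:template match="section">
    <h3>
      <!-- Multi-level numbering: chapter number, a dot, then section number. -->
      <xsl:number level="multiple" count="chapter | section"/>
      <xsl:text>. </xsl:text>
      <xsl:value-of select="title"/>
    </h3>
    <xsl:apply-templates select="para"/>
  </xsl:template>

  <xsl:template match="para">
    <p><xsl:value-of select="."/></p>
  </xsl:template>
</xsl:stylesheet>
```

The numbering is left to xsl:number rather than done by hand, which keeps the templates short and avoids the counter tricks that constant variables would otherwise force on you.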