“Raw Data Now!”

January 28, 2010

‘Data’ is not synonymous with ‘meaning’, although amid all the recent fuss about Sir Tim Berners-Lee’s attempt to overturn the UK Civil Service’s ingrained culture of secrecy, this might easily be overlooked.

The announcement of data.gov.uk is to be welcomed, but it is only the first step on a long and complex road. The fears expressed by the data custodians, that data might be interpreted differently from the way intended, just show how much we are still governed by vested interests who act ‘for our own good’. Sorry, but give us the data and let us make our own interpretations, good or bad.

So, data.gov.uk is a good thing. But it could turn into a veritable Pandora’s box without some kind of agreed framework within which data are interpreted and evaluated. I am indebted to the KIDMM community for flagging up the fact that a European focus group has been working on this very problem for some time.

The all-Europe Comité Européen de Normalisation (CEN) is a rather shadowy organisation which seems to work on standards issues in the background, and then suddenly springs into the limelight with a proposal for a new ISO standard. One of their workshops – Discovery of and Access to eGovernment Resources (CEN/ISSS WS/eGov-Share) – appears to have done precisely this with (I assume) a proposal to the SC34 working group (ISO/IEC JTC1/SC34/WG3). This working group is concerned with producing standard architectures for information management and interchange based on SGML, and their current focus is the Topic Maps standard (ISO/IEC 13250).

Well, you know me. Any mention of Topic Maps and I’m anybody’s. So when I hear of an initiative that has developed a proposal specifying a protocol for the exchange of information about semantic descriptions – one that conforms to the Atom Syndication Format and the Topic Maps Data Model and, moreover, works with semantic descriptions represented in XTM 1.0, XTM 2.0 and RDF/XML – well, Nirvana!

Thanks to KIDMM, if you’re interested (and you should be!), this is where you can find the full specification of the protocol: SDShare: Protocol for the Syndication of Semantic Descriptions.
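To make the idea a little more concrete, here is a minimal sketch (in Python) of what consuming an SDShare-style Atom feed of semantic-description fragments might look like. The endpoint URL is hypothetical and the feed layout is an assumption of mine; the specification itself defines the actual feed types and link relations.

```python
# A minimal sketch of polling an SDShare-style Atom feed. The URL below is
# hypothetical, and the assumption that fragments appear as ordinary Atom
# entries is mine -- see the SDShare specification for the real feed structure.
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
FEED_URL = "http://example.org/sdshare/fragments"  # illustrative endpoint only

def list_entries(feed_url):
    """Yield (updated, title, href) for each entry in an Atom feed."""
    with urllib.request.urlopen(feed_url) as resp:
        root = ET.parse(resp).getroot()
    for entry in root.findall(ATOM + "entry"):
        updated = entry.findtext(ATOM + "updated", default="")
        title = entry.findtext(ATOM + "title", default="")
        link = entry.find(ATOM + "link")
        href = link.get("href") if link is not None else None
        yield updated, title, href

if __name__ == "__main__":
    for updated, title, href in list_entries(FEED_URL):
        print(updated, title, href)
```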

Let us know what you think of it, and of its potential in making sense of the vast amounts of data due to be released on the Web.


Open Source Software: A Serious Option At Last?

January 24, 2010

I am increasingly impressed by what open source (OS) software communities are offering – not just in terms of the sheer range of applications, but in their quality too. That’s an observation vindicated by the recent award of DoD 5015.02 Records Management Certification to Alfresco, according it the kudos of being the first open source product to demonstrate compliance with the strict DoD 5015.02-STD specification for records management. That’s a significant achievement even Microsoft can’t match.

If you visit the Mecca of OS developers, SourceForge, you’ll find hundreds, if not thousands, of little niche applications of the sort often found on computing magazine cover-CDs, which will be of great use to some but of no interest to most. But bear with them. Like any jumble sale or bric-a-brac market, you have to plough through the dross to find the jewels. One particular jewel I am playing with at the moment is VirtueMart.

VirtueMart is an OS online e-commerce application, allowing anyone to set up an online sales presence with an incredible level of detail and functionality. It runs under the OS Joomla! CMS, which is itself a jewel – although one has to give an equal plug to Mambo, the original OS CMS project from which Joomla! forked some five years ago. Both VirtueMart and Mambo utilise the LAMP development and deployment environment – Linux, Apache, MySQL, PHP – although I’m using the Windows variant, WAMP.

Why is this relevant to those interested in KO? Well, because I can’t think of any more complex real-world application requiring solid KO expertise than an e-commerce site. VirtueMart has to support and integrate:

  • vendor identity and brand
  • product classes, categories, instances and descriptions
  • manufacturer information
  • site visitors
  • existing customers
  • product reviews by customers
  • multiple payment methods
  • discount & coupon schemes
  • ordering & order status reporting
  • multiple tax regimes
  • shipping methods & rates

All of these entities (if that’s the right term) have numerous attributes which need to be configurable, depending on what you’re selling – a flavour of which is sketched below. The VirtueMart developers, all of whom have given freely of their time and expertise, have done a really impressive job. Might they have done even better, I wonder, if KO professionals had been prepared to donate their expertise?
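Purely by way of illustration – this is a sketch of my own, not VirtueMart’s actual schema – here is how a handful of those entities might be modelled. Even a toy version makes plain how quickly questions of classes, categories and configurable attributes pile up.

```python
# An illustrative data model (not VirtueMart's actual schema) for a few of the
# entities an e-commerce site has to integrate.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Category:
    name: str
    parent: Optional["Category"] = None          # categories form a tree -- a small taxonomy

@dataclass
class Manufacturer:
    name: str
    country: str = ""

@dataclass
class Product:
    sku: str
    name: str
    category: Category
    manufacturer: Manufacturer
    price: float
    tax_class: str = "standard"                     # hooks into multiple tax regimes
    attributes: dict = field(default_factory=dict)  # configurable per product type

books = Category("Books")
secondhand = Category("Second-hand", parent=books)
press = Manufacturer("Example Press", "UK")
item = Product("BK-001", "Knowledge Organization Primer", secondhand, press, 9.99,
               attributes={"binding": "paperback", "condition": "good"})
print(item.category.parent.name, "->", item.category.name, ":", item.name)
```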

——————————-

In the course of sifting through Sourceforge, I discovered a number of applications relevant to KO. I shall be featuring these over the next few weeks in our KOOLTools section, as and when I have the time to test them. Bookmark it.


Self-Signifying Data

January 13, 2010

** Event rescheduled to March 24, 2010 **

The Digitally-supported Brain

Those ISKO UK members who attended Dave Snowden’s memorable seminar in April 2009 – Human-machine symbiosis for data interpretation – may recall what I remember as a moment of collective gestalt. ‘Self-signifying data’ was the phrase, I believe, and it proved more memorable for many even than Dave’s ‘Blanket Octopodes’ (see Dave’s slide set for an explanation).

Why raise this again now? Well, because ISKO UK member Jan Wyllie and his business partner Simon Eaton have for some time been developing a web site/application based on those very principles of self-signifying data described by Dave Snowden. Their Open Intelligence initiative draws on years of experience in Content Analysis, brought up to date through Web 2.0 technology. And it’s highly relevant to KO professionals, because KO technologies – categorisation and taxonomies – are at the heart of the Open Intelligence approach.

If making systematic inferences from communications flows using faceted taxonomies, using content analysis techniques to turn the tables on the knowledge glut, or increasing the value and productivity of work groups through a much higher level of common knowledge rings a bell for you, then consider attending Jan Wyllie’s one-day Ark Group Masterclass, Content analysis: Using taxonomies to improve collaboration. It is to be held on 10 February 2010, in London, and will feature “the first public showing of the all new Open Intelligence software dedicated to making the social networking experience of creating collaborative intelligence, an engaging, as well as a valuable and productive use of a community’s knowledge working time.”

Further details of this important and pioneering effort are available on the Open Intelligence web site, where you can also book your place.


Trying to please everyone

September 18, 2009

One of the enduring attractions of our profession (that’s information management, knowledge management, records management, information science, knowledge organization – whatever you want to call it) for me, is that it impacts upon everything. Yes, literally, everything. When we build a taxonomy, relate descriptors in a thesaurus or assign keywords, we are mediators among a multiplicity of points-of-view, creeds and catechisms. But while that heterogeneity, that multicultural dimension, is often the root of our sense of fulfilment, contention can lie just below the surface.

To focus on one problem in particular, how can we know whether a taxonomy we build is ‘true’ – or perhaps ‘authoritative’? Is there such a thing as ‘universal truth’? Do we all see things the same way? Or, to put it another way, how do we distinguish between – and accommodate – the subjective and the objective?

For instance, when we build a taxonomy, or a navigation scheme for a web site, how can we capture the viewpoint of the majority, whilst also allowing for the individual – even idiosyncratic – point-of-view? Thus do philosophy and politics enter an otherwise cosy world.

It’s a problem addressed recently by Fran Alexander of the Department of Information Studies, University College London, who mounted a highly stimulating poster at ISKO UK’s conference on 22-23 June 2009. The poster provides an interesting first sight of the complex nexus among business sector objectives, the attendant socio-economic-environmental constraints, and the influence exerted by the relative subjectivity or objectivity of the domain.

The degree to which a conceptual framework is held in common, the coherence of interpretation of that framework among its stakeholders, and the terminological system designed to represent it, all depend upon a process of intersubjective creation of shared meaning within a defined socio-cultural context. In other words, politics. Taxonomy is therefore partly political, partly individual and partly pragmatic.

Melvil Dewey deserves his place in the history of KO for his balanced accommodation of all three dimensions at the time he devised the DDC. But we’re over 130 years further on now, and the mix of political, personal and practical elements required to reflect current understanding of the world (or organization) has changed immensely. Dewey’s innocent assumptions, drawn from the Weltanschauung of his time, appear at the very least inappropriate, sometimes biased and often incorrect in a 21st-century context.

In a rather adept (and certainly persuasive) essay in the latest issue of Knowledge Organization*, Richard Davies asks ‘Should Philosophy Books Be Treated As Fiction?’. He makes the point that, in the terms used here, the intersubjective creation of meaning in the domain of philosophy has barely occurred; rather the opposite in fact, each philosopher seeming bent upon distinguishing his/her approach from predecessors. This occurs, although to a lesser degree, in most other domains as well, amongst them the 15 or so covered by Fran Alexander’s research.

Fran’s conclusion is that “The mediation of subjectivity/objectivity is becoming increasingly relevant in a ‘user-centric’ age.” In other words, an awareness of the degree of ‘objectivity’ of a taxonomy project is becoming vital to its functional effectiveness – and this is inevitably governed, to some extent, by political considerations and by the degree to which the role of the taxonomist is perceived to have a political dimension by those who provide the support for such projects.

This is an interesting piece of research and I urge you to take a closer look at Fran’s poster, and to allow it to stimulate your own thoughts on the issues involved.

* Davies, Richard. ‘Should Philosophy Books Be Treated As Fiction?’ Knowledge Organization, 36(2/3), 121-129.



Death of the document?

June 29, 2009

With not even a soupçon of a suspicion of the quagmire I was entering, I recently looked up the definition of ‘document’. In case you didn’t know, the glib dictionary definitions hide a debate that has, well, not exactly raged, but rather limped on for nearly twenty years now. I don’t know, but I guess it was the arrival of the digital ‘document’ with the first word processors in the early 1980s which sparked it in the first place.

It turns out that there’s no one definition of ‘document’ that everyone’s happy with. We can all agree what a cup is, or a bus, but not, it seems, a ‘document’. And to cap it all, a recent paper in the Journal of Documentation (Frohmann, Berndt. Revisiting “what is a document?”, JDoc 65(2), 2009) tells us that we shouldn’t bother anyway. Shame, I’d been planning to investigate where the ‘document’ stands in the light of Web 2.0, much as Steve Bailey and James Lappin are doing for records. And then what happens? Google announces the death of the document.

How so? Well, instinctively, we humans don’t welcome change. We are ruled by nostalgia – or rather, inertia. Come any new technology, we always try to replicate the old model within it, failing to see that it offers scope for completely new ways of doing things. Web 2.0 is just the catch-all term for a number of such new ways – new models of communication and interaction – Blogs, Wikis, Facebook, Twitter, LinkedIn and now, Google Wave. All of them are document-agnostic.

Pedigree

The team that developed Google Maps moved on to look at the various ways in which ICT supports the ways we communicate and share information. They range from the historic, fixed snapshot (documents, including email and blogs) through the quasi-dynamic SMS and IM to real-time telephony. In all of them, the concept of the link begins to eclipse the concept of the discrete document.

Google Wave integrates the best features of email and IM to take a significant step toward the ideals of the Semantic Web. The plus is that discrete, siloed documents are no longer the focus of communication; rather, documents become just one element in a conversation. And a conversation, one might note, in which any kind of editor function has been eliminated. It remains to be seen how that disintermediation helps or hinders effective information sharing.

Features

Wave offers four main innovative features which take it way beyond conventional email. The first tackles the problem of ‘threading’. A Wave starts with a message, just as in normal email, discussion lists, forums and blogs. However, Wave allows participants’ comments or replies to be embedded in-line in the original message adjacent to the text to which they refer. The logic of the would-be conversation is no longer fragmented across multiple, separate messages, linked only by a tenuous ‘thread’ which is easily broken. The advantages of this consolidation apply to attachments too, which are a pain to find again in anything but the shortest thread. A Wave therefore becomes a multi-participant conversation, complete with associated resources, attached or linked.
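As a toy illustration of the idea – emphatically not Google’s actual Wave data model – an in-line reply can be represented as a message anchored to the span of text it comments on, so the whole exchange stays in one place:

```python
# A toy representation of in-line threading: replies carry an anchor into the
# text they comment on. This is an illustration only, not Wave's real model.
from dataclasses import dataclass, field
from typing import Optional, Tuple, List

@dataclass
class Blip:
    author: str
    text: str
    anchor: Optional[Tuple[int, int]] = None     # character span in the parent being replied to
    replies: List["Blip"] = field(default_factory=list)

root = Blip("alice", "Shall we move the seminar to March 24?")
start = root.text.index("March 24")
root.replies.append(Blip("bob", "Works for me.", anchor=(start, start + len("March 24"))))
root.replies.append(Blip("carol", "Only if we keep the same venue."))
```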

Wave’s second key feature builds upon the quasi real-time echoing of participant keyboard input familiar from IM applications. Google’s step forward in this case is to echo updates to all participant screens in as near real-time as current technology allows. No longer do you have to watch that scribbling pencil for seconds that feel like minutes; characters appear virtually as the writer types. This live, as-you-type updating works well with simultaneous multiple editing too.

Thirdly, Wave authors are allowed to specify the scope of participation, from public to group to private, and whether each member has read-only, authoring or editing rights. The group and private categories can be expanded or contracted at any time.

Lastly, and perhaps most significant of all, participants who join the conversation late don’t lose out. When they join a conversation in progress, they can simply click a button to see each and every change made to the original message up to that point, in a kind of slow-motion, automated playback of a wiki page history. The Wave Playback facility could prove to be the silver bullet that records managers have been looking for to bring email under control and to tame the anarchic tendencies of Web 2.0. It could equally serve as a point-by-point versioning system where that’s useful.
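The underlying idea can be sketched very simply (Wave itself uses a far richer operational-transformation model, so treat this purely as an illustration): if every change is kept as an append-only log of operations, a latecomer just replays the log.

```python
# A deliberately simplified illustration of 'playback': keep every edit as an
# operation in an append-only log, then replay the log to see each version.
ops = [
    ("insert", 0, "Agenda: KO and Web 2.0"),
    ("insert", 22, "\n- taxonomies vs tags"),
    ("insert", 43, "\n- machine tags"),
]

def replay(ops):
    text = ""
    for kind, pos, payload in ops:
        if kind == "insert":
            text = text[:pos] + payload + text[pos:]
        yield text                                # one snapshot per operation

for version in replay(ops):
    print("----")
    print(version)
```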

Google have made the most of the opportunities provided by current technology by including further features, such as context-aware corrections as-you-type (‘Spelly’), detection and insertion of links as you type (‘Linky’), and ‘Polly’, a gadget for conducting surveys and polls. Particularly impressive is ‘Rosy’, a robot drawing on Google Translate which can translate in real-time, as you type, from any of 40 languages. There’s easy linking to Google Maps too, as you might expect, and yet more.

The original Wave video (1h20) can be found on YouTube, while Smarterware have chopped it up into eight 30-60 second chunks for those who can’t afford 80 mins. online. Alternatively, there’s an excellent summary of Wave on Mashable.

But by now you’re asking, ‘OK, nice, but so what?’

Changing how we work

Wave combines previously separate communication applications into an integrated communication space far better resembling what third-generation knowledge management sophists revere: the conversation. It enables a whole new level of real-time, disintermediated, collaborative communication in which the document is just one part of that greater whole. What’s more, another of Wave’s robots – ‘Bloggy’ – allows Wave content to be published to blogs or, via the Wave API (Application Programming Interface), whole Waves to be embedded in a blog, or in any Web page come to that.

As if that weren’t enough, Google are making the Wave source code, its XML-based communications protocol and its External API open source. That opens the floodgates for developers around the globe to create extensions and gadgets of any kind imaginable. There is already a Twitter extension – ‘Twave’ – which integrates Twitter feeds within a Wave, incoming or outgoing. Although Google obviously hope that most Wave developments will be hosted by them, they are acknowledging the corporate perspective by allowing anyone to run their own Wave server. How that fits with their advertising-based business model remains to be seen.

Implications for KO

Possibly the single most significant thing about Wave is that Google are recognising the potentially unlimited development resources available through the open source community. And that’s where KO might just find a new lease of life. We’re all familiar with the ongoing debate, a little less polarized now than it was four years ago, on formal taxonomies versus folksonomic tagging à la del.icio.us or Technorati. Wave, it seems, has adopted a flat tagging approach similar to Twitter hashtags. However, there’s lots of room between the two for rapprochement, as evidenced by the emergence of RDF-style machine tags (triple tags) on Flickr a while back, or by Wikipedia’s extensive category tree. Open Intelligence, a knowledge-sharing site set up by ISKO UK member Jan Wyllie, is pioneering a faceted tag system which may just provide some clues to where KO might be going in the Web 2.0 world.
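Machine tags are worth a moment’s pause, because they show how much structure can be smuggled into a ‘flat’ tag. The convention is namespace:predicate=value; the sketch below is a generic parser of that pattern, not any particular site’s API.

```python
# Parse Flickr-style machine tags ('triple tags') of the form
# namespace:predicate=value. A generic sketch, not any site's actual API.
import re

MACHINE_TAG = re.compile(r"^(?P<ns>[^:=\s]+):(?P<pred>[^:=\s]+)=(?P<value>.+)$")

def parse_tag(tag: str):
    """Return (namespace, predicate, value), or None for an ordinary flat tag."""
    m = MACHINE_TAG.match(tag)
    return (m.group("ns"), m.group("pred"), m.group("value")) if m else None

print(parse_tag("geo:lat=51.5074"))        # ('geo', 'lat', '51.5074')
print(parse_tag("knowledgeorganization"))  # None -- just a flat tag
```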

It would seem not unreasonable, therefore, to pose the question: might someone (ISKO UK?) sponsor some research into how established KO techniques may be applied to findability in Google Wave? It could make for a challenging doctoral dissertation. Then someone with the necessary technical savvy just might develop a Wave extension allowing tags to be selected from a thesaurus. An attractive prospect, methinks.

Let’s not play catch-up yet again. Let’s get involved!



David and Goliath 2.0?

May 1, 2009

There is superficial search and there is deep search. While Google is great at the first, it’s not so good at the second. There are some enterprise search applications which can claim the centre-ground between the deep and the superficial, but most of the runners in that particular race fall somewhere along the way and barely even glimpse the finishing line. Not that it matters any more, apparently, because if search analyst Stephen Arnold is right, search is dead.

Stephen Wolfram

Arnold is right that the domain of knowledge discovery is ripe for an orthogonal change – a disruptive intervention, as complexity theorists would call it. Enter US-based British mathematician Stephen Wolfram. Wolfram is no stranger to orthogonal change, having published, in 2002, a monster of a book entitled A New Kind of Science (NKS).

NKS essentially proposed that accepted scientific method be augmented by an inverted approach, whereby hypotheses are not solely tested by experimentation, but experimentation may also generate hypotheses. At 1280 pages, it took me months to read, despite its author writing very lucidly about complex mathematical concepts (maths was never my strong point).

In NKS, Wolfram presents (in narrative and over 1000 illustrations) the results of years of computational experimentation with ‘simple programs’. Simple programs are typified by cellular automata – grids of cells, each of which can take one of a finite set of defined ‘states’ (+ or -, on or off, 1-2-3-4-5, etc.) in any number of dimensions, accompanied by rules governing how adjacent cells interact over time. Wolfram devised hundreds of such cellular automata and associated interaction rules, then explored, through his Mathematica computation engine, how each of them developed – or not – over time.
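Rule 150, pictured below, is a case in point, and is simple enough to state in a couple of lines: each cell’s next value is the XOR of itself and its two neighbours. A minimal sketch (with wrap-around edges purely for convenience):

```python
# Rule 150: each cell's next state is the XOR of its left neighbour, itself
# and its right neighbour. Edges wrap around here purely for convenience.
def step_rule150(cells):
    n = len(cells)
    return [cells[(i - 1) % n] ^ cells[i] ^ cells[(i + 1) % n] for i in range(n)]

width, generations = 31, 16
row = [0] * width
row[width // 2] = 1                       # start from a single 'on' cell
for _ in range(generations):
    print("".join("#" if c else "." for c in row))
    row = step_rule150(row)
```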

Wolfram’s depiction of his Rule 150

Result of running Rule 150 over many iterations

He discovered that a significant proportion of them can produce surprisingly complex and sustainable patterns of results (illustrated in the book, and above), some resembling patterns discovered decades earlier by complexity pioneers such as Lorenz and Mandelbrot.

Wolfram was much criticized at the time NKS was published for not employing ‘proper’ scientific method in his research. That’s a bit like criticizing Einstein for straying outside the boundaries of Newtonian physics, it seems to me. He was also criticized for not having any immediate applications for his discoveries.

Well, seven years on, Wolfram appears to be striking back at his critics with the imminent launch of Wolfram Alpha, a ‘computational knowledge engine’ combining Mathematica with principles he first described in NKS.

What’s a ‘computational knowledge engine’? Well, PCMag (29 April, 2009) in the US reported:

“Wolfram Alpha has trillions of pieces of curated data,” Wolfram said. “We’re getting data from both free data and licensed data – some of it is very static. A lot is data from feeds that come into our system, and we’re running through this partially automated, partially human process, correlating data and verifying data. It’s set up so it’s organized and clean and computable.”

Wolfram says that there are four main components to Wolfram Alpha (WA): data curation, internal algorithm and computation, linguistic understanding, and automated presentation. The first two components sound a bit like what Google does, and some commentators have gone as far as claiming that WA might even out-Google Google. However, WA appears to be a different kind of application altogether – a knowledge aggregator and synthesizer with real-time presentational graphics. The Washington Post (April 24, 2009) said:

When it was first unveiled in March, Wolfram Alpha, a new type of search engine created by computer scientist Stephen Wolfram, got a lot of buzz. Naturally, some people threw out the “Google killer” title; but it seems to be a different beast, as it’s all about knowledge search. That is to say, you ask a question, and you get an answer; with Google, you ask a question and you get a link to a bunch of documents. That may sound a bit bland, and simplistic, but the select few who have seen it, seem to think it works really well and could be a game changer.

There is considerable cynicism surrounding the WA announcement, and perhaps Google deserves to enjoy a brief whiff of schadenfreude before WA launches publicly in May. We’ve also yet to hear what the Semantic Web community thinks about WA and how it relates (if at all) to what they are trying to achieve. Until we know more about how Wolfram Alpha works and what kind of results it can produce over what domains of discourse, it’s difficult to form an opinion. You can find out whether all the fuss is warranted by keeping an eye on the Wolfram Alpha Blog and monitoring the responses in the specialist media.


Topic Maps Go Open Source

April 27, 2009

Topic Maps (ISO 13250), with its XML syntax XTM, is a Semantic Web-related technology for describing knowledge structures. A number of start-up companies in Europe and the US initiated programmes in the early 2000s to develop applications supporting the creation and navigation of Topic Maps. Of them, only Ontopia in Norway seems to have survived in any commercial sense, with its Ontopia Knowledge Suite (OKS) incorporating the Omnigator Topic Map navigator and the Ontopoly Topic Map editor. Despite a committed cadre of enthusiasts across the globe (including myself), Topic Maps as a knowledge organization technology proved difficult to promote outside of Norway. As a result, Ontopia was acquired by Norwegian IT consultancy Bouvet ASA in March 2007.

Bouvet themselves have now acknowledged that Topic Maps does not appear to be a technology with any conventional commercial potential. They have therefore announced that the Ontopia suite of Topic Maps applications is to be made open source. In my view, this is the best decision they could have made. Topic Maps is an XML mark-up standard with more readily understandable semantics, and far greater flexibility for describing the widest variety of knowledge structures, than RDF as adopted by the Semantic Web developers.
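For anyone who hasn’t met the model, a rough in-memory sketch of its core notions – topics, names, occurrences, and associations with typed roles – looks something like this. It is an illustration of the model’s shape only, not Ontopia’s API and not the XTM syntax.

```python
# A rough, in-memory sketch of the core Topic Maps notions: topics carrying
# names and occurrences, and associations relating topics via typed roles.
# Illustration only -- not Ontopia's API and not the XTM serialisation.
from dataclasses import dataclass, field
from typing import List, Tuple, Dict

@dataclass
class Topic:
    identifier: str
    names: List[str] = field(default_factory=list)
    occurrences: List[Tuple[str, str]] = field(default_factory=list)  # (type, value or URL)

@dataclass
class Association:
    assoc_type: str
    roles: Dict[str, Topic] = field(default_factory=dict)             # role type -> playing topic

puccini = Topic("puccini", names=["Giacomo Puccini"],
                occurrences=[("homepage", "http://example.org/puccini")])
tosca = Topic("tosca", names=["Tosca"])
composed = Association("composed-by", roles={"composer": puccini, "work": tosca})
print(composed.roles["composer"].names[0], "composed", composed.roles["work"].names[0])
```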

Visit the Ontopia site for further information.