KOnnect

“Raw Data Now!”

January 28, 2010

‘Data’ is not synonymous with ‘meaning’, although in all the recent fuss about Sir Tim Berners-Lee’s attempt to overturn the UK Civil Service’s ingrained culture of secrecy, this might easily be overlooked.

The announcement of data.gov.uk is to be welcomed, but it is only the first step on a long and complex road. The fears expressed by the data custodians, that data might be interpreted differently from the way intended, just show how much we are still governed by vested interests who act ‘for our own good’. Sorry, give us the data, and let us make our own interpretations, good or bad.

So, data.gov.uk is a good thing. But it could turn into a veritable Pandora’s box without some kind of agreed framework within which data are interpreted and evaluated. I am indebted to the KIDMM community for flagging up the fact that a European focus group has been working on this very problem for some time.

The all-Europe Comité Européen de Normalisation (CEN) is a rather shadowy organisation which seems to work on standards issues in the background, and then suddenly spring into the limelight with a proposal for a new ISO standard. One of their workshops – Discovery of and Access to eGovernment Resources (CEN/ISSS WS/eGov-Share) – appears to have done precisely this with (I assume) a proposal to the SC34 working group (ISO/IEC JTC1/SC34/WG3). This working group is concerned with producing standard architectures for information management and interchange based on SGML, and their current focus is the Topic Maps standard (ISO/IEC 13250).

Well, you know me. Any mention of Topic Maps and I’m anybody’s. So when I hear of an initiative which has developed a proposal which specifies a protocol for the exchange of information about semantic descriptions which conforms to the Atom Syndication Format and the Topic Maps Data Model, and moreover, which works with semantic descriptions represented in XTM 1.0, XTM 2.0 and RDF/XML, then, well, Nirvana!

Thanks to KIDMM, if you’re interested (and you should be!), then this is where you can find the full specification of the protocol SDShare: Protocol for the Syndication of Semantic Descriptions.
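For readers wondering what ‘conforms to the Atom Syndication Format’ means in practice: an SDShare client starts by reading an ordinary Atom feed. The following is a minimal sketch, using only Python’s standard library, of listing a feed’s entries and the links they point to. The feed, its example.org URLs and the RDF/XML media type are invented for illustration; the SDShare specification itself defines which feeds and link relations a client should actually follow.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace (RFC 4287)

# Hypothetical feed: titles, ids and URLs are invented for illustration only.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example semantic descriptions</title>
  <id>http://example.org/feeds/collection</id>
  <updated>2010-01-28T00:00:00Z</updated>
  <entry>
    <title>Snapshot of descriptions</title>
    <id>http://example.org/feeds/snapshot/1</id>
    <updated>2010-01-27T12:00:00Z</updated>
    <link rel="alternate" type="application/rdf+xml"
          href="http://example.org/snapshots/1.rdf"/>
  </entry>
</feed>"""


def list_entries(feed_xml):
    """Return (title, updated, href, media type) for each entry's alternate link."""
    root = ET.fromstring(feed_xml)
    results = []
    for entry in root.findall(ATOM + "entry"):
        title = entry.findtext(ATOM + "title")
        updated = entry.findtext(ATOM + "updated")
        for link in entry.findall(ATOM + "link"):
            # In Atom, a link without a rel attribute counts as "alternate".
            if link.get("rel", "alternate") == "alternate":
                results.append((title, updated, link.get("href"), link.get("type")))
    return results


for item in list_entries(SAMPLE_FEED):
    print(item)
```

A real client would then fetch each linked description and merge it into its own store; that step depends on the formats (XTM 1.0, XTM 2.0, RDF/XML) it chooses to support.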

Let us know what you think of it, and of its potential in making sense of the vast amounts of data due to be released on the Web.

findability, knowledge management, knowledge organization, resource description, resource description-markup-RDF, sense-making, Uncategorized | Tagged: CEN, data, information, meaning, semantic web, semantics, standards
Posted by bbater

Open Source Software: A Serious Option At Last?

January 24, 2010

I am increasingly impressed by what open source (OS) software communities are offering. Not just in terms of the sheer range of applications, but by their quality too. That’s an observation vindicated by the recent award of DoD 5015.02 Records Management Certification to Alfresco, according it the kudos of being the first open source product to demonstrate compliance with the strict DoD 5015.02 STD specification for records management. That’s a significant achievement even Microsoft can’t match.

If you visit the Mecca of OS developers, SourceForge, you’ll find hundreds, if not thousands, of little niche applications of the sort often found on computing magazine cover-CDs, which will be of great use to some, but of no interest to most. But bear with them. Like any jumble sale or bric-a-brac market, you have to plough through the dross to find the jewels. One particular jewel I am playing with at the moment is VirtueMart.

VirtueMart is an OS online e-commerce application, allowing anyone to set up an online sales presence at an incredible level of detail and functionality. It runs under the OS Joomla! CMS, which in itself is a jewel, although one has to give an equal plug to Mambo, the original OS CMS project from which Joomla! forked some five years ago. VirtueMart, Joomla! and Mambo all utilise the LAMP development and deployment environment – Linux, Apache, MySQL, PHP – although I’m using the Windows variant, WAMP.

Why is this relevant to those interested in KO? Well, because I can’t think of any more complex real-world application requiring solid KO expertise than an e-commerce site. VirtueMart has to support and integrate:

vendor identity and brand
product classes, categories, instances and descriptions
manufacturer information
site visitors
existing customers
product reviews by customers
multiple payment methods
discount & coupon schemes
ordering & order status reporting
multiple tax regimes
shipping methods & rates

All of these entities (if that’s the right term) have numerous attributes which need to be configurable, depending on what you’re selling. The VirtueMart developers, all of whom have given of their time and expertise freely, have done a really impressive job. Might they have done even better, I wonder, if KO professionals had been prepared to donate their expertise?
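To make the KO angle concrete, here is a rough sketch of how a few of those entities might be modelled: a category hierarchy, products carrying free-form configurable attributes, and more than one tax regime. The class names, fields and tax rates are hypothetical and purely illustrative; this is not VirtueMart’s actual database schema.

```python
from dataclasses import dataclass, field
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional


@dataclass
class Category:
    name: str
    parent: Optional["Category"] = None

    def path(self):
        """Return the broader-to-narrower path of category names."""
        node, trail = self, []
        while node is not None:
            trail.append(node.name)
            node = node.parent
        return list(reversed(trail))


@dataclass
class Product:
    sku: str
    name: str
    manufacturer: str
    category: Category
    net_price: Decimal
    # Configurable attributes: which ones matter depends on what is being sold.
    attributes: dict = field(default_factory=dict)


# Illustrative tax rates keyed by country code (hypothetical configuration).
TAX_RATES = {"GB": Decimal("0.175"), "DE": Decimal("0.19")}


def gross_price(product, country):
    """Apply the destination country's tax rate, rounded to two decimal places."""
    rate = TAX_RATES[country]
    return (product.net_price * (1 + rate)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


music = Category("Music")
jazz = Category("Jazz", parent=music)
album = Product("CD-001", "Example Album", "Example Records", jazz,
                Decimal("10.00"), attributes={"format": "CD"})
print(album.category.path(), gross_price(album, "GB"), gross_price(album, "DE"))
```

Decimal is used rather than float so that prices round the way an accountant would expect.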

——————————-

In the course of sifting through SourceForge, I discovered a number of applications relevant to KO. I shall be featuring these over the next few weeks in our KOOLTools section, as and when I have the time to test them. Bookmark it.

classification, Classification-design-opportunity, Classification-implementation-example, Uncategorized | Tagged: e-commerce
Posted by bbater

Self-Signifying Data

January 13, 2010

Event rescheduled to March 24, 2010

The Digitally-supported Brain

Those ISKO UK members who attended Dave Snowden’s memorable seminar in April 2009 – Human-machine symbiosis for data interpretation – may recall what I remember as a moment of collective gestalt. ‘Self-signifying data’ was the phrase, I believe, and it proved more memorable for many even than Dave’s ‘Blanket Octopodes’ (see Dave’s slide set for an explanation).

Why raise this again now? Well, because ISKO UK member Jan Wyllie and business partner Simon Eaton have for some time been developing a web site/application based on those very principles of self-signifying data described by Dave Snowden. Their Open Intelligence initiative draws on years of experience in Content Analysis, brought up to date through Web 2.0 technology. And it’s highly relevant to KO professionals, because KO technologies – categorisation and taxonomies – are at the heart of the Open Intelligence approach.
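Open Intelligence’s own software isn’t described in any detail here, so what follows is only a sketch of the general idea behind self-signifying data combined with faceted content analysis: contributors tag their own items against agreed facets, and patterns emerge from simple counts and cross-tabulations. The facet names and items are hypothetical.

```python
from collections import Counter

# Hypothetical items, each self-signified by its contributor against two facets.
items = [
    {"topic": "energy", "sentiment": "concern"},
    {"topic": "energy", "sentiment": "optimism"},
    {"topic": "energy", "sentiment": "concern"},
    {"topic": "health", "sentiment": "optimism"},
]


def facet_counts(items, facet):
    """Count how often each value of one facet occurs."""
    return Counter(item[facet] for item in items if facet in item)


def cross_tab(items, facet_a, facet_b):
    """Count co-occurrences of values from two facets (a simple contingency table)."""
    return Counter((item[facet_a], item[facet_b])
                   for item in items if facet_a in item and facet_b in item)


print(facet_counts(items, "topic"))
print(cross_tab(items, "topic", "sentiment"))
```

Because the contributors, not an analyst, assign the facet values, the counts reflect the community’s own signification of its material.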

If making systematic inferences from communications flows using faceted taxonomies, using content analysis techniques to turn the tables on the knowledge glut, or increasing the value and productivity of work groups because they will be working with a much higher level of common knowledge rings a bell for you, then consider attending Jan Wyllie’s one-day Ark Group Masterclass Content analysis: Using taxonomies to improve collaboration. It’s to be held on 10 February 2010, in London, and will feature “the first public showing of the all new Open Intelligence software dedicated to making the social networking experience of creating collaborative intelligence, an engaging, as well as a valuable and productive use of a community’s knowledge working time.”

Further details of this important and pioneering effort are available on the Open Intelligence web site, where you can also book your place.

classification, Classification-implementation-example, knowledge management, knowledge organization, sense-making | Tagged: collective intelligence, content analysis, open intelligence, self-signifying data, sense-making
Posted by iskouk

Trying to please everyone

September 18, 2009

For me, one of the enduring attractions of our profession (that’s information management, knowledge management, records management, information science, knowledge organization – whatever you want to call it) is that it impacts upon everything. Yes, literally, everything. When we build a taxonomy, relate descriptors in a thesaurus or assign keywords, we are mediators among a multiplicity of points-of-view, creeds and catechisms. But while that heterogeneity, that multicultural dimension, is often the root of our sense of fulfilment, contention can lie just below the surface.

To focus on one problem in particular, how can we know whether a taxonomy we build is ‘true’ – or perhaps ‘authoritative’? Is there such a thing as ‘universal truth’? Do we all see things the same way? Or, to put it another way, how do we distinguish between – and accommodate – the subjective and the objective?

For instance, when we build a taxonomy, or a navigation scheme for a web site, how can we capture the viewpoint of the majority, whilst also allowing for the individual – even idiosyncratic – point-of-view? Thus do philosophy and politics enter an otherwise cosy world.

It’s a problem addressed recently by Fran Alexander of the Department of Information Studies, University College London, who mounted a highly stimulating poster at ISKO UK’s conference on 22-23 June 2009. The poster provides an interesting first sight of the complex nexus among business sector objectives, attendant socio-economic-environmental constraints, and the influence exerted by the relative subjectivity/objectivity of the domain.

The degree to which a conceptual framework is held in common, the coherence of interpretation of that framework among its stakeholders, and the terminological system designed to represent it, all depend upon a process of intersubjective creation of shared meaning within a defined socio-cultural context. In other words, politics. Taxonomy is therefore partly political, partly individual and partly pragmatic.

Melvil Dewey deserves his place in the history of KO for his balanced accommodation of all three dimensions at the time he devised the DDC. But we’re over 130 years further on now, and the mix of political, personal and practical elements required to reflect current understanding of the world (or organization) has changed immensely. Dewey’s innocent assumptions, drawn from the Weltanschauung of his time, appear at the very least inappropriate, sometimes biased and often incorrect in a 21st-century context.

In a rather adept (and certainly persuasive) essay in the latest issue of Knowledge Organization*, Richard Davies asks ‘Should Philosophy Books Be Treated As Fiction?’. He makes the point that, in the terms used here, the intersubjective creation of meaning in the domain of philosophy has barely occurred; rather the opposite in fact, each philosopher seeming bent upon distinguishing his/her approach from predecessors. This occurs, although to a lesser degree, in most other domains as well, amongst them the 15 or so covered by Fran Alexander’s research.

Fran’s conclusion is that “The mediation of subjectivity/objectivity is becoming increasingly relevant in a ‘user-centric’ age.” So, an awareness of the degree of ‘objectivity’ of a taxonomy project is becoming vital to its functional effectiveness, and this is inevitably governed to some extent by political considerations and the degree to which the role of the taxonomist is perceived to have a political dimension by those who provide the support for such projects.

This is an interesting piece of research and I urge you to take a closer look at Fran’s poster, and to allow it to stimulate your own thoughts on the issues involved.

* Davies, Richard. Should Philosophy Books Be Treated As Fiction? Knowledge Organization, 36(2/3), 121-129.

Classification-conceptualisation-critique, knowledge management, knowledge organization, sense-making, Taxonomy-purpose-critique, Uncategorized | Tagged: intersubjective meaning, subjectivity, taxonomy
Posted by bbater

Linked Data: A Crystallizing Vision

August 4, 2009

Semantic. (Credit: Lifeboat Foundation)

It seems like only yesterday that cyberspace was abuzz with excited predictions of the coming Nirvana of the Internet – the Semantic Web (SemWeb). But commentators were split into two distinct camps. The enthusiasts talked of nothing less than a revolution bringing greatly enriched search and navigation experiences, while the snarks muttered ‘jam tomorrow’ and slunk away to their slithy dens. Well, as might have been expected, neither was quite right. The SemWeb will come, but it will creep up upon us incrementally, often unnoticeably. And guess what? It’s happening already.

Taming the Mashup

I’m not going to be trapped into appearing to describe a precise chronology, but the first evidence I came across that the SemWeb was putting down real roots was Google Squared. I posted briefly in May 2009 about this and also about Google’s Smarter Search, suggesting that these enhancements to the regular Google experience might prove attractive to the users of ‘popular search’, but reserving judgement on their usefulness for enterprise users.

I now think that the ability of Google Squared to pull in a variety of ‘related’ information is useful, but that ‘useful’ here falls far short of ‘meaningful’. That’s because Google is fixated upon ad hoc keyword frequency correlations, with little or no consideration of semantics. Semantics are key to useful discovery, and the success of search enhancements like Google Smarter Search and Google Squared will depend upon how far Google is willing to trust the SemWeb as it develops, and consequently how far it is willing to shift towards a semantics-based model, as it has already done to some degree with these products.

Maintaining GoodRelations with your SearchMonkey

Hot on the heels of Google Squared, comes Yahoo! SearchMonkey. Much like Google’s SemWeb-aware offerings, Yahoo! SearchMonkey provides a means to enhance search results with relevant links, images and structured data derived not only from microformats and RDF embedded in the page, but also from various remote web service APIs. However, SearchMonkey appears to differ from the Google product in one vital respect.

While I can find no reference to Google Squared offering any kind of developer interface, Yahoo! SearchMonkey leaves no doubt that it is designed to allow any host website, or third-party developers, to develop and configure a custom SearchMonkey application tailored to specific requirements. The application can then be distributed virally via site badges and buttons allowing others to add it to their search profiles.

For those interested, you can try out the Yahoo! SearchMonkey developer interface via a quickstart tutorial for developers, which should answer any nagging questions you may have. Suffice it to say here that the application allows the developer to specify a URL pattern which will trigger the fetching and display of the enhanced information.
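SearchMonkey’s actual configuration format isn’t reproduced here, but the idea of a URL pattern that triggers an enhancement is easy to illustrate. A minimal sketch, assuming shell-style wildcards and two hypothetical patterns:

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Hypothetical trigger patterns a developer might register for an application.
TRIGGER_PATTERNS = ["*.example.com/products/*", "shop.example.org/item/*"]


def should_enhance(url, patterns=TRIGGER_PATTERNS):
    """Return True if the host and path of a result URL match any trigger pattern."""
    parts = urlsplit(url)
    target = parts.netloc + parts.path  # scheme, query and fragment are ignored
    return any(fnmatchcase(target, pattern) for pattern in patterns)


print(should_enhance("http://www.example.com/products/widget-42"))
print(should_enhance("http://www.example.com/about"))
```

fnmatchcase is used rather than fnmatch so that matching is case-sensitive on every operating system; ignoring the query string is a design choice of this sketch, not something SearchMonkey is known to do.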

You then have a choice of presenting this information either as an ‘Enhanced Result’ (first figure) that reconfigures the search result itself, or as an ‘Infobar’ pane (second figure) displayed below the result that contains expandable lists of additional information. The example here shows three infobars.

Yahoo! have even gone so far as to explain the psychology of the expected user interaction with each of these devices, and their comments are worth repeating here:

Because Infobars and Enhanced Results behave so differently, users deal with them in fundamentally different ways.

When a user views an Enhanced Result, there is a critical fraction of a second where the user is subconsciously trying to determine whether the result has any relevance. Hence, Enhanced Results use a standard template designed to make it clear that the Enhanced Result is in fact a search result, not an advertisement.

By contrast, when a user views an Infobar, this is always a conscious act. The user has already decided that the result might be relevant, and they are looking for more information. For this reason, Infobars lift some of the restrictions described for Enhanced Result applications.

It seems to me that Yahoo! SearchMonkey is streets ahead of the comparable Google offering. While Google Squared is essentially self-serving (enhanced search results = more users = greater advertising exposure), Yahoo! SearchMonkey opens its technology to the developer community with no strings attached – yet. Of course, it’s entirely possible that once SearchMonkey has caught on, a revenue model will emerge and become a condition of use. We’ll have to wait and see.

That issue aside, we now come to the really interesting bit (for me, anyway): Yahoo! SearchMonkey seems to have gone one step further than Google Squared in another sense too. The Google product, to its credit, acknowledges the usefulness of the RDF triple as a source of (inferred) related information. But because its semantic reasoning is still governed largely by ad hoc associations between keywords, it has not yet taken that next step of embracing the SemWeb concept of a defining ontology.

In contrast, Yahoo! SearchMonkey has taken that bold step. In particular, it has adopted an ontology dubbed GoodRelations, which describes the often subtle relationships which can occur between the web resources online vendors maintain, their product or service domain, the precise products or services they are offering online, and the associated prices and terms & conditions.

GoodRelations, Monkey or no Monkey

GoodRelations is a lightweight ontology designed to be used for enriching the information associated with goods and services on the Web. The GoodRelations ontology complements products and services ontologies like eClassOWL, by providing a vocabulary for expressing things such as:

Web site X is offering cellphones of a certain make and model at a certain price
Company Y offers maintenance for pianos that weigh less than 150 kg
Company Z, a car rental company, leases out cars of a certain make and model from a particular set of branches across (this or that) country

GoodRelations has been under development since 2003, by a team led by Prof. Dr. Martin Hepp at the Bundeswehr University München, Germany, supported by various other institutions such as the Austrian BMVIT/FFG, a Young Researcher’s Grant (Nachwuchsförderung 2005-2006) from the Leopold-Franzens-Universität Innsbruck, and by the European Commission under the project SUPER (FP6-026850). It offers comprehensive support for every aspect of e-commerce, details of which are available on the GoodRelations web site.

Officially released on July 28, 2008, the GoodRelations ontology is available under the Creative Commons Attribution 3.0 licence. Under this licence, you are free to copy, distribute and transmit the work, and to remix/adapt it (e.g. to import the ontology and create specializations of its elements), as long as you attribute the work, e.g. by stating “This work is based on the GoodRelations ontology, developed by Martin Hepp” with a link back to http://purl.org/goodrelations/. The GoodRelations ontology has a full range of features, including support for all ISO 4217 currencies, for international standards such as ISO 3166, UN/CEFACT, eCl@ss and UNSPSC, and for even the oddest product bundles. For instance, it can easily handle an offer of 2 kg of butter + 2 cellphones for €99.
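To show what that butter-and-cellphones bundle might look like as data, here is a sketch that writes it out as N-Triples using only Python’s standard library. The GoodRelations term names (gr:Offering, gr:includesObject, gr:TypeAndQuantityNode and so on) are given as I understand the v1 vocabulary and should be checked against the current specification; the shop namespace and resource names are hypothetical. KGM and C62 are the UN/CEFACT codes for kilogram and for one unit respectively.

```python
GR = "http://purl.org/goodrelations/v1#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://example.org/shop#"  # hypothetical shop namespace


def iri(value):
    return f"<{value}>"


def literal(value, datatype=None):
    """Quote a value as an N-Triples literal, escaping backslashes and quotes."""
    text = '"' + str(value).replace("\\", "\\\\").replace('"', '\\"') + '"'
    return f"{text}^^<{datatype}>" if datatype else text


def bundle_offer(price_eur, items):
    """Describe one offer bundling several goods at a single price.

    items: list of (local_name, label, amount, uncefact_unit_code).
    """
    offer, price = iri(EX + "bundleOffer"), iri(EX + "bundlePrice")
    triples = [
        (iri(EX + "shop"), iri(RDF + "type"), iri(GR + "BusinessEntity")),
        (iri(EX + "shop"), iri(GR + "offers"), offer),
        (offer, iri(RDF + "type"), iri(GR + "Offering")),
        (offer, iri(GR + "hasBusinessFunction"), iri(GR + "Sell")),
        (offer, iri(GR + "hasPriceSpecification"), price),
        (price, iri(RDF + "type"), iri(GR + "UnitPriceSpecification")),
        (price, iri(GR + "hasCurrency"), literal("EUR")),
        (price, iri(GR + "hasCurrencyValue"), literal(price_eur, XSD + "float")),
    ]
    for local, label, amount, unit in items:
        node, good = iri(EX + local + "Node"), iri(EX + local)
        triples += [
            (offer, iri(GR + "includesObject"), node),
            (node, iri(RDF + "type"), iri(GR + "TypeAndQuantityNode")),
            (node, iri(GR + "amountOfThisGood"), literal(amount, XSD + "float")),
            (node, iri(GR + "hasUnitOfMeasurement"), literal(unit)),
            (node, iri(GR + "typeOfGood"), good),
            (good, iri(RDF + "type"), iri(GR + "SomeItems")),
            (good, iri(RDFS + "label"), literal(label)),
        ]
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# The example above: 2 kg of butter + 2 cellphones for EUR 99.
ntriples = bundle_offer(99, [("butter", "Butter", 2, "KGM"),
                             ("cellphone", "Cellphone", 2, "C62")])
print(ntriples)
```

The point of the TypeAndQuantityNode is that quantity and unit belong to the bundle, not to the product itself, which is what lets one price cover goods measured in quite different ways.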

Yahoo!’s adoption of the GoodRelations ontology indicates to me a somewhat ‘purer’ commitment to SemWeb standards than the apparently revenue-besotted Google developments, and also rather validates what Herr Hepp and his team have taken pains to develop. And the GoodRelations team’s release of the GoodRelations Annotator in April 2009 is the icing on the cake. This is an online service where anyone can create a machine-readable description of their business and their range of products using the GoodRelations vocabulary for e-commerce. When the SemWeb finally goes global, such metadata will be worth its weight – or maybe bit-count – in gold.

Juice up Your Web Site

So, that’s the USA and Germany spoken for. What about the UK? Well, as it happens, I can report that UK-based company Talis, which specializes in extending the range of the public library OPAC (Online Public Access Catalogue), has unleashed a technology evangelist into the wild, and he’s come up with something rather neat.

Juice in action

Richard Wallis thinks that Internet-savvy library users increasingly want enriched results from the OPAC – links to Amazon, Google Books, WorldCat, Open Library, LibraryThing, whatever. Consequently, he has developed a couple of JavaScript libraries which can easily be configured to fetch and display related information from selected sources to enhance any search. The framework goes by the name of Juice – JavaScript User Interface Componentised Extension framework. What’s more, Juice is not confined to OPACs; it can be embedded in any web page with just a few lines of code.

Wallis’ innovation doesn’t pretend to conform to SemWeb standards, but instead utilizes common web technologies (JavaScript, Ajax) to aggregate data from a variety of sources into a ‘hole in the page’ which you make for that purpose. With just a few tweaks to the standard, downloadable code (available as standard extensions), you can include links to information from such diverse sources as the library-oriented sites mentioned above, and also Copac, Waterstones, del.icio.us, Google Maps and Twitter. Anyone with the necessary skills can develop further extensions.
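Juice itself is JavaScript, and its real extension API isn’t reproduced here. The sketch below only mirrors the pattern described above, in Python: small, pluggable extensions that each build a link for a record, and a step that fills a placeholder ‘hole’ in the page with their output. The extension names, the placeholder element, the sample ISBN and the two URL templates are assumptions for illustration and should be verified before use.

```python
from html import escape

# Hypothetical registry of link-building extensions, keyed by name.
EXTENSIONS = {}


def extension(name):
    """Decorator that registers a link-building extension under a name."""
    def register(func):
        EXTENSIONS[name] = func
        return func
    return register


@extension("openlibrary")
def open_library(isbn):
    return "Open Library", f"https://openlibrary.org/isbn/{isbn}"  # assumed URL pattern


@extension("worldcat")
def worldcat(isbn):
    return "WorldCat", f"https://www.worldcat.org/isbn/{isbn}"  # assumed URL pattern


def fill_hole(page_html, isbn, enabled, hole='<div id="juice-hole"></div>'):
    """Replace the placeholder element with links from the enabled extensions."""
    items = []
    for name in enabled:
        label, url = EXTENSIONS[name](isbn)
        items.append(f'<li><a href="{escape(url)}">{escape(label)}</a></li>')
    widget = '<div id="juice-hole"><ul>' + "".join(items) + "</ul></div>"
    return page_html.replace(hole, widget, 1)


page = '<h1>Catalogue record</h1><div id="juice-hole"></div>'
print(fill_hole(page, "9780000000002", ["openlibrary", "worldcat"]))  # placeholder ISBN
```

Adding a new source means writing one more small decorated function; nothing else in the page has to change, which is the appeal of the ‘hole in the page’ approach.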

Juice is available under the GNU General Public License v2, and full details and a download of Juice version 0.5 (146KB) are available at the Juice project site on Google Code. There is a useful review of Juice by David Tebbutt in Information World Review, which is where I first heard about it. If you want to know even more, then also catch the very entertaining Juice introductory video, which captures a talk given by Richard Wallis at the Code{4}Lib conference in February 2009.

ISKO UK: Linked Data: A Crystallizing Vision

ISKO UK are proposing to run an all-day event on Linked Data in November 2009. We hope to have speakers who will tell us more about Linked Data as defined by Tim Berners-Lee and the SemWeb community, some who will describe Linked Data initiatives currently under way (such as at the BBC), and also some who will describe similar and related developments which don’t necessarily fit within the SemWeb definition – like Juice and GoodRelations. If you’d support us by attending such an event, then drop a comment on this post saying ‘Yes Please’ or something equally encouraging.

knowledge organization, metadata, Uncategorized | Tagged: Semantic Technologies
Posted by bbater

Death of the document?

June 29, 2009

With not even a soupçon of the quagmire I was entering, I recently looked up the definition of ‘document’. In case you didn’t know, the glib dictionary definitions hide a debate that has, well, not exactly raged, but rather limped on for nearly twenty years now. I don’t know, but I guess that it was the arrival of the digital ‘document’ with the first word processors in the early 1980s which sparked it in the first place.

It turns out that there’s no one definition of ‘document’ that everyone’s happy with. We can all agree what a cup is, or a bus, but not, it seems, a ‘document’. And to cap it all, a recent paper in the Journal of Documentation (Frohmann, Bernd. Revisiting “what is a document?”, JDoc 65(2), 2009) tells us that we shouldn’t bother anyway. Shame, I’d been planning to investigate where the ‘document’ stands in the light of Web 2.0, much as Steve Bailey and James Lappin are doing for records. And then what happens? Google announces the death of the document.

How so? Well, instinctively, we humans don’t welcome change. We are ruled by nostalgia – or rather, inertia. Come any new technology, we always try to replicate the old model within it, failing to see that it offers scope for completely new ways of doing things. Web 2.0 is just the catch-all term for a number of such new ways – new models of communication and interaction – Blogs, Wikis, Facebook, Twitter, LinkedIn and now, Google Wave. All of them are document-agnostic.

Pedigree

The team that developed Google Maps moved on to look at how ICT supports the ways we communicate and share information. These channels range from the historic, fixed snapshot (documents, including email and blogs) through the quasi-dynamic SMS and IM to real-time telephony. In all of them, the concept of the link begins to eclipse the concept of the discrete document.

Google Wave integrates the best features of email and IM to move a significant step forward toward the ideals of the Semantic Web. The plus is that discrete, siloed documents are no longer the focus of communication. Rather, documents become just one element in a conversation. And a conversation, one might note, in which any kind of editor function has been eliminated. It remains to be seen how that disintermediation helps or hinders effective information sharing.

Features

Wave offers four main innovative features which take it way beyond conventional email. The first tackles the problem of ‘threading’. A Wave starts with a message, just as in normal email, discussion lists, forums and blogs. However, Wave allows participants’ comments or replies to be embedded in-line in the original message adjacent to the text to which they refer. The logic of the would-be conversation is no longer fragmented across multiple, separate messages, linked only by a tenuous ‘thread’ which is easily broken. The advantages of this consolidation apply to attachments too, which are a pain to find again in anything but the shortest thread. A Wave therefore becomes a multi-participant conversation, complete with associated resources, attached or linked.
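The threading idea is easy to picture as a data structure: each message (a ‘blip’, in Wave’s terms) keeps its replies anchored to the character position they respond to, and the conversation is rendered by interleaving them. A minimal sketch follows; the class and method names are hypothetical, and this is not how Wave itself stores conversations.

```python
from dataclasses import dataclass, field


@dataclass
class Blip:
    """One message in a conversation, with replies anchored inside its text."""
    author: str
    text: str
    # Maps a character offset in `text` to the replies attached at that point.
    inline_replies: dict = field(default_factory=dict)

    def reply_at(self, offset, reply):
        """Attach `reply` immediately after the first `offset` characters of this blip."""
        if not 0 <= offset <= len(self.text):
            raise ValueError("offset lies outside the blip text")
        self.inline_replies.setdefault(offset, []).append(reply)
        return reply

    def render(self, depth=0):
        """Flatten the conversation, placing each reply next to the text it refers to."""
        pad = "  " * depth
        lines, start = [], 0
        for offset in sorted(self.inline_replies):
            segment = self.text[start:offset].strip()
            if segment:
                lines.append(f"{pad}{self.author}: {segment}")
            for reply in self.inline_replies[offset]:
                lines.extend(reply.render(depth + 1))
            start = offset
        tail = self.text[start:].strip()
        if tail:
            lines.append(f"{pad}{self.author}: {tail}")
        return lines


root = Blip("alice", "Meeting on Friday? Agenda attached.")
root.reply_at(18, Blip("bob", "Friday suits me."))
print("\n".join(root.render()))
```

Because each reply lives beside the text it answers, nothing depends on a subject-line ‘thread’ surviving from one message to the next.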

Wave’s second key feature builds upon the quasi real-time echoing of participant keyboard input familiar from IM applications. Google’s step forward in this case is to echo updates to all participant screens in as near real-time as current technology allows. No longer do you have to watch that scribbling pencil for seconds that feel like minutes; characters appear virtually as the writer types. This live, as-you-type updating works well with simultaneous multiple editing too.

Thirdly, Wave authors are allowed to specify the scope of participation, from public, to group, to private, and whether each member has read-only, authoring or editing rights. The group and private categories can be expanded or contracted at any time.

Lastly, and perhaps most significantly of all, participants who join the conversation late don’t lose out. When they join a conversation in progress, they can simply click a button to see each and every change made to the original message up to that point, in a kind of slow-motion automated playback of a wiki page history. The Wave Playback facility could prove to be the silver bullet that records managers have been looking for to bring email under control and to tame the anarchic tendencies of Web 2.0. But it could equally be used as a point-by-point versioning system where that’s useful.
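Playback depends on keeping every change rather than just the latest state. A minimal sketch, assuming each change is recorded as a simple insert or delete operation; the names are hypothetical, and Wave’s own operational-transformation machinery, which reconciles concurrent edits, is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    author: str
    kind: str        # "insert" or "delete"
    position: int
    text: str = ""   # text inserted (for "insert")
    length: int = 0  # characters removed (for "delete")


def apply(doc, edit):
    """Return the document after applying one edit."""
    if edit.kind == "insert":
        return doc[:edit.position] + edit.text + doc[edit.position:]
    if edit.kind == "delete":
        return doc[:edit.position] + doc[edit.position + edit.length:]
    raise ValueError(f"unknown edit kind: {edit.kind}")


def playback(history, initial=""):
    """Yield (author, version) after each edit, oldest first, like a slow-motion replay."""
    doc = initial
    for edit in history:
        doc = apply(doc, edit)
        yield edit.author, doc


history = [
    Edit("alice", "insert", 0, text="Draft policy"),
    Edit("bob", "insert", 12, text=" on email retention"),
    Edit("carol", "delete", 0, length=6),
]
for author, version in playback(history):
    print(f"{author:>6}: {version}")
```

Since every version can be rebuilt from the log, the same history serves both the latecomer catching up and the records manager who needs to know who changed what, and when.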

Google have made the most of the opportunities provided by current technology by including further features, such as context-aware corrections as-you-type (‘Spelly’), detection and insertion of links as you type (‘Linky’), and ‘Polly’, a gadget for conducting surveys and polls. Particularly impressive is ‘Rosy’, a robot drawing on Google Translate which can translate in real-time, as you type, from any of 40 languages. There’s easy linking to Google Maps too, as you might expect, and yet more.

The original Wave video (1h20) can be found on YouTube, while Smarterware have distilled its highlights into eight 30-60 second clips for those who can’t spare 80 minutes online. Alternatively, there’s an excellent summary of Wave on Mashable.

But by now you’re asking, ‘OK, nice, but so what?’

Changing how we work

Wave combines previously separate communication applications into an integrated communication space that far better resembles what third-generation knowledge management sophists revere: the conversation. It enables a whole new level of real-time, disintermediated, collaborative communication in which the document is just one part of a greater whole. What’s more, another of Wave’s robots, ‘Bloggy’, allows Wave content to be published to blogs, while the Wave API (Application Programming Interface) allows whole Waves to be embedded in a blog, or in any Web page come to that.

As if that weren’t enough, Google are making the Wave source code, its XML-based communications protocol and its External API open source. That opens the floodgates for developers around the globe to create extensions and gadgets of any kind imaginable. There is already a Twitter extension – ‘Twave’ – which integrates Twitter feeds within a Wave, incoming or outgoing. Although Google obviously hope that most Wave developments will be hosted by them, they are acknowledging the corporate perspective by allowing anyone to run their own Wave server. How that fits with their advertising-based business model remains to be seen.

Implications for KO

Possibly the single most significant thing about Wave is that Google are recognising the potentially unlimited development resources available through the open source community. And that’s where KO might just find a new lease of life. We’re all familiar with the ongoing debate, a little less polarized now than it was four years ago, on formal taxonomies versus folksonomic tagging à la del.icio.us or Technorati. Wave, it seems, has adopted a flat tagging approach similar to Twitter hashtags. However, there’s lots of room between the two for rapprochement, as evidenced by the emergence of RDF-style machine tags (triple tags) on Flickr a while back, or by Wikipedia’s extensive category tree. Open Intelligence, a knowledge-sharing site set up by ISKO UK member Jan Wyllie, is pioneering a faceted tag system which may just provide some clues to where KO might be going in the Web 2.0 world.
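Machine tags show how little machinery that middle ground needs. On Flickr a machine tag takes the form namespace:predicate=value, so it can be split into something very like an RDF triple, with the tagged resource as subject. A sketch, assuming the character rules in the regular expression (a letter first, then letters, digits or underscores) and a hypothetical resource identifier:

```python
import re

# Machine tag of the form namespace:predicate=value (character rules are an assumption).
MACHINE_TAG = re.compile(r"^([A-Za-z][A-Za-z0-9_]*):([A-Za-z][A-Za-z0-9_]*)=(.+)$")


def split_tags(resource, tags):
    """Separate flat tags from machine tags, turning the latter into triples."""
    flat, triples = [], []
    for tag in tags:
        match = MACHINE_TAG.match(tag)
        if match:
            namespace, predicate, value = match.groups()
            triples.append((resource, f"{namespace}:{predicate}", value.strip('"')))
        else:
            flat.append(tag)
    return flat, triples


flat, triples = split_tags("photo/123", ["sunset", "geo:lat=51.5", 'dc:creator="J. Smith"'])
print(flat)
print(triples)
```

Flat tags pass through untouched, so a folksonomy and a more formal vocabulary can coexist on the same resource.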

It would therefore seem not unreasonable to ask whether someone (ISKO UK?) might sponsor research into how established KO techniques could be applied to findability in Google Wave. It could make for a challenging doctoral dissertation. Then someone with the necessary technical savvy just might develop a Wave extension allowing tags to be selected from a thesaurus. An attractive prospect, methinks.
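As a hint of what such an extension might do, here is a sketch of thesaurus-controlled tagging: free-text tags, including non-preferred entry terms (UF), are mapped to preferred terms, and partial input produces suggestions. The mini-thesaurus and function names are hypothetical, and no Wave API is involved.

```python
# Hypothetical mini-thesaurus: preferred term -> broader term (BT) and entry terms (UF).
THESAURUS = {
    "records management": {"BT": "information management", "UF": ["records control", "RM"]},
    "taxonomies": {"BT": "knowledge organization systems", "UF": ["taxonomy", "classification schemes"]},
    "knowledge organization systems": {"BT": None, "UF": ["KOS"]},
    "information management": {"BT": None, "UF": []},
}

# Index every preferred and entry term, case-insensitively, to its preferred form.
ENTRY_INDEX = {}
for preferred, entry in THESAURUS.items():
    ENTRY_INDEX[preferred.lower()] = preferred
    for alias in entry["UF"]:
        ENTRY_INDEX[alias.lower()] = preferred


def normalise_tag(tag):
    """Map a free-text tag to its preferred term, or None if the thesaurus lacks it."""
    return ENTRY_INDEX.get(tag.strip().lower())


def suggest_tags(typed, limit=5):
    """Return preferred terms whose preferred or entry forms start with the typed text."""
    prefix = typed.strip().lower()
    matches = sorted({pref for term, pref in ENTRY_INDEX.items() if term.startswith(prefix)})
    return matches[:limit]


print(normalise_tag("Taxonomy"), suggest_tags("rec"))
```

The broader-term links are not used here, but they are what would let such an extension offer ‘tag more generally’ as well as ‘tag consistently’.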

Let’s not play catch-up yet again. Let’s get involved!

Posted by bbater

Google Ups its Stakes in the Search 2.0 Race

May 13, 2009

A fortnight ago, I commented that ‘Google deserves to enjoy a brief whiff of schadenfreude’ before Stephen Wolfram launches his computational knowledge engine in May. Well, Google appear to have pipped him to the post in the first round of Search 2.0, although the actual finishing line in the web search race is still nowhere in sight. Of course, it might not exist at all.

This week Google have unveiled a smarter search which, according to the BBC News item, ‘uses semantic web technology’. Smarter search uses any embedded metadata in a web page – metadata in RDF mark-up as well as conventional META tags – to seek and gather information related to the search query, and to display it with each hit in what they call a ‘rich snippet’. Not a Wolfram-blaster on its own. But there’s more.
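Google hasn’t said exactly how smarter search reads a page, so what follows is only a sketch of the general idea: collect conventional META name/content pairs and RDFa-style property values from a page’s markup, which is the raw material for a ‘rich snippet’. It uses Python’s standard HTMLParser; the sample page and its ex: properties are hypothetical, and the parser handles only flat, un-nested property elements.

```python
from html.parser import HTMLParser


class MetadataExtractor(HTMLParser):
    """Collect META name/content pairs and flat RDFa-style property values."""

    def __init__(self):
        super().__init__()
        self.meta = {}     # conventional <meta name="..." content="...">
        self.rdfa = {}     # property="..." values, from content= or element text
        self._open = None  # property whose element text is being collected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""
        elif attrs.get("property"):
            if attrs.get("content") is not None:
                self.rdfa[attrs["property"]] = attrs["content"]
            else:
                self._open = attrs["property"]
                self.rdfa[self._open] = ""

    def handle_data(self, data):
        if self._open:
            self.rdfa[self._open] += data

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current property value.
        self._open = None


PAGE = """<html><head>
<meta name="description" content="A review of Juice">
<meta name="keywords" content="OPAC, JavaScript">
</head><body><div typeof="ex:Review">
<span property="ex:itemReviewed">Juice</span>
<span property="ex:rating">4</span>
</div></body></html>"""

parser = MetadataExtractor()
parser.feed(PAGE)
snippet = {"summary": parser.meta.get("description"), **parser.rdfa}
print(snippet)
```

A production extractor would need a proper RDFa processor to handle nesting, prefixes and datatypes; the sketch only shows why embedded metadata makes a richer result so cheap to assemble.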

Google also unveiled Google Squared, which collates information – text-based, numerical, graphical – and displays it in summary form, e.g. a table. Showing a command of smoke-and-mirrors communication rivalling that of politicians, Google spokesperson Marissa Mayer explained:

“What they are basically doing is looking for structures on the web that seem to imply facts. Like something ‘is’ something.”

“Different tables, different structures, and then corroborating the evidence around whether or not something is a fact by looking at whether that fact occurs across pages.”
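Taken at face value, Mayer’s description suggests a corroboration step: a candidate fact is trusted only when several independent pages assert it. A sketch of that idea, assuming facts have already been extracted as (subject, attribute, value) triples; the data is invented, and this is an illustration, not Google’s method.

```python
from collections import defaultdict


def corroborate(extractions, min_pages=2):
    """Keep (subject, attribute, value) facts asserted by at least `min_pages` distinct pages.

    extractions: iterable of (page_url, (subject, attribute, value)).
    Returns a dict mapping each surviving fact to its number of supporting pages.
    """
    support = defaultdict(set)
    for page, fact in extractions:
        support[fact].add(page)  # a set, so repeats on one page count once
    return {fact: len(pages) for fact, pages in support.items() if len(pages) >= min_pages}


extractions = [
    ("http://a.example/1", ("Mars", "moons", "2")),
    ("http://b.example/2", ("Mars", "moons", "2")),
    ("http://c.example/3", ("Mars", "moons", "3")),
    ("http://a.example/1", ("Mars", "moons", "2")),  # same page again
]
print(corroborate(extractions))
```

Counting distinct pages rather than mentions is the whole trick; it is also exactly where such a scheme is vulnerable, since one error copied across many sites will pass the test.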

That’s clear then.
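For what it is worth, here is one naive reading of Mayer’s description, sketched in Python. It extracts crude “X is Y” statements from each page and keeps only those asserted on at least two distinct pages. The pattern, the threshold and the sample pages are all assumptions for illustration. Nothing here claims to describe how Google Squared actually works.

```python
import re
from collections import defaultdict

# Naive pattern: "<Capitalised subject> is [a|an|the] <object>" up to punctuation.
IS_PATTERN = re.compile(r"\b([A-Z][\w ]*?) is (?:a |an |the )?([\w ]+?)[.,;]")


def extract_facts(text: str) -> set[tuple[str, str]]:
    """Return (subject, object) pairs implied by 'X is Y' sentences on one page."""
    return {(s.strip(), o.strip().lower()) for s, o in IS_PATTERN.findall(text)}


def corroborate(pages: dict[str, str], min_pages: int = 2) -> dict[tuple[str, str], int]:
    """Keep only facts asserted on at least min_pages distinct pages."""
    support: dict[tuple[str, str], set[str]] = defaultdict(set)
    for url, text in pages.items():
        for fact in extract_facts(text):
            support[fact].add(url)
    return {fact: len(urls) for fact, urls in support.items() if len(urls) >= min_pages}


# Hypothetical pages.
PAGES = {
    "https://example.org/a": "Paris is the capital of France. Lyon is a city.",
    "https://example.org/b": "Paris is the capital of France, as everyone knows.",
    "https://example.org/c": "Paris is a small village.",
}

if __name__ == "__main__":
    print(corroborate(PAGES))
```

Counting distinct pages rather than raw mentions stops a single site from “proving” a fact by repetition. It does nothing, of course, to stop a falsehood that has been widely copied.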

Before you think the balance of schadenfreude might just have tipped back in favour of Stephen Wolfram, Google also announced Google Search Options, a feature which lets users refine, filter and view search results in different ways until they make sense (presumably).

I’m sure these Google enhancements will prove hugely popular, and I sincerely hope Google will continue with its highly innovative approach to squeezing ever more value out of web search. But when the phrase ‘lipstick on a pig’ keeps flashing up in my mind’s eye, I need to remind myself that my benchmark is enterprise search – or deep search – a different animal altogether.

But let’s look on the bright side. At least Google is at last acknowledging the value of metadata.

findability, metadata, resource description, resource description-markup-RDF | Tagged: metadata, Semantic Technologies, web search
Posted by bbater

David and Goliath 2.0?

May 1, 2009

There is superficial search and there is deep search. While Google is great at the first, it’s not so good at the second. There are some enterprise search applications which can claim the centre-ground between the deep and the superficial, but most of the runners in that particular race fall somewhere along the way and barely even glimpse the finishing line. Not that it matters any more, apparently, because if search analyst Stephen Arnold is right, search is dead.

Stephen Wolfram

Arnold is right that the domain of knowledge discovery is ripe for an orthogonal change – a disruptive intervention, as complexity theorists would call it. Enter US-based British mathematician Stephen Wolfram. Wolfram is no stranger to orthogonal change, having published, in 2002, a monster of a book entitled A New Kind of Science (NKS).

NKS essentially proposed that accepted scientific method be augmented by an inverted approach, in which experimentation not only tests hypotheses but may also generate them. At 1280 pages, the book took me months to read, even though its author writes very lucidly about complex mathematical concepts (maths was never my strong point).

In NKS, Wolfram presents (in narrative and over 1000 illustrations) the results of years of computational experimentation with ‘simple programs’. Simple programs are typified by cellular automata – grids of cells, in any number of dimensions, each of which can be in one of a finite set of defined ‘states’ (+ or -, on or off, 1-2-3-4-5 etc.), together with rules that determine how each cell’s next state depends on its own current state and those of its neighbours. Wolfram devised hundreds of such cellular automata and associated rules, then used his Mathematica computation engine to explore how each of them developed – or not – over time.

Wolfram's depiction of his Rule 150

Result of running Rule 150 over many iterations

He discovered that a significant proportion of them can produce surprisingly complex and sustainable patterns (as in the illustrations from the book above), some resembling patterns discovered decades earlier by complexity pioneers such as Lorenz and Mandelbrot.
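Rule 150 is one of Wolfram’s 256 ‘elementary’ cellular automata. These have one dimension, two states, and a rule that looks only at a cell and its two immediate neighbours. The binary digits of the rule number (150 = 10010110) give the new state for each of the eight possible neighbourhoods. For Rule 150 this works out as the exclusive-or of the three cells. Here is a minimal Python sketch. It assumes that cells beyond the edge of the row are permanently off and starts from a single ‘on’ cell.

```python
def step(cells: list[int], rule: int) -> list[int]:
    """Apply an elementary cellular automaton rule once to a row of 0/1 cells.

    The neighbourhood (left, centre, right) is read as a 3-bit number n,
    and the new state is bit n of the rule number (Wolfram's numbering).
    Cells beyond either edge are treated as 0.
    """
    padded = [0] + cells + [0]
    return [
        (rule >> ((padded[i - 1] << 2) | (padded[i] << 1) | padded[i + 1])) & 1
        for i in range(1, len(padded) - 1)
    ]


def run(rule: int, generations: int) -> list[list[int]]:
    """Start from a single 'on' cell and record every generation.

    The row is 2 * generations + 1 cells wide, so for a rule that maps
    000 to 0 (as Rule 150 does) the pattern never reaches the edges.
    """
    width = 2 * generations + 1
    row = [0] * width
    row[generations] = 1
    history = [row]
    for _ in range(generations):
        row = step(row, rule)
        history.append(row)
    return history


if __name__ == "__main__":
    for generation in run(150, 15):
        print("".join("#" if cell else "." for cell in generation))
```

Even fifteen generations of this one-line rule produce a nested, triangular pattern. Changing only the number passed to run() is enough to explore the other 255 rules in the same way.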

Wolfram was much criticized at the time NKS was published for not employing ‘proper’ scientific method in his research. That’s a bit like criticizing Einstein for straying outside the boundaries of Newtonian physics, it seems to me. He was also criticized for not having any immediate applications for his discoveries.

Well, seven years on, Wolfram appears to be striking back at his critics with the imminent launch of Wolfram Alpha, a ‘computational knowledge engine’ combining Mathematica with principles he first described in NKS.

What’s a ‘computational knowledge engine’? Well, PCMag (29 April, 2009) in the US reported:

“Wolfram Alpha has trillions of pieces of curated data,” Wolfram said. “We’re getting data from both free data and licensed data – some of it is very static. A lot is data from feeds that come into our system, and we’re running through this partially automated, partially human process, correlating data and verifying data. It’s set up so it’s organized and clean and computable.”

Wolfram says that there are four main components to Wolfram Alpha (WA): data curation, internal algorithm and computation, linguistic understanding, and automated presentation. The first two components sound a bit like what Google does, and some commentators have gone as far as claiming that WA might even out-Google Google. However, WA appears to be a different kind of application altogether – a knowledge aggregator and synthesizer with real-time presentational graphics. The Washington Post (April 24, 2009) said:

When it was first unveiled in March, Wolfram Alpha, a new type of search engine created by computer scientist Stephen Wolfram, got a lot of buzz. Naturally, some people threw out the “Google killer” title; but it seems to be a different beast, as it’s all about knowledge search. That is to say, you ask a question, and you get an answer; with Google, you ask a question and you get a link to a bunch of documents. That may sound a bit bland, and simplistic, but the select few who have seen it, seem to think it works really well and could be a game changer.
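The four components Wolfram lists, and the contrast between getting an answer and getting a list of links, can be made concrete with a toy pipeline. Everything below is hypothetical. A small table of exact unit definitions stands in for curated data, and a single regular expression stands in for linguistic understanding. It reflects nothing of how Wolfram Alpha is actually built.

```python
import re
from typing import Optional

# Data curation: a tiny table of exact unit definitions (metres per unit).
METRES_PER_UNIT = {"metre": 1.0, "kilometre": 1000.0, "mile": 1609.344, "foot": 0.3048}
IRREGULAR_PLURALS = {"feet": "foot"}

# Linguistic understanding: one hypothetical question pattern.
QUESTION = re.compile(r"convert\s+(\d+(?:\.\d+)?)\s+(\w+)\s+(?:to|into)\s+(\w+)", re.IGNORECASE)


def normalise(unit: str) -> str:
    """Reduce a unit word to the singular form used in the curated table."""
    unit = unit.lower()
    if unit in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[unit]
    return unit[:-1] if unit.endswith("s") else unit


def understand(question: str) -> Optional[tuple[float, str, str]]:
    """Turn a natural-language question into (amount, source unit, target unit)."""
    match = QUESTION.fullmatch(question.strip().rstrip("?").strip())
    if match is None:
        return None
    amount, source, target = match.groups()
    return float(amount), normalise(source), normalise(target)


# Algorithm and computation: derive the answer rather than retrieve a document.
def compute(amount: float, source: str, target: str) -> float:
    return amount * METRES_PER_UNIT[source] / METRES_PER_UNIT[target]


# Automated presentation: format the result for the reader.
def present(amount: float, source: str, target: str, result: float) -> str:
    return f"{amount:g} {source}(s) = {result:,.3f} {target}(s)"


def answer(question: str) -> str:
    parsed = understand(question)
    if parsed is None or parsed[1] not in METRES_PER_UNIT or parsed[2] not in METRES_PER_UNIT:
        return "Sorry, I cannot compute an answer to that."
    return present(*parsed, compute(*parsed))


if __name__ == "__main__":
    print(answer("Convert 3 miles to metres?"))
    print(answer("Who won the cup?"))
```

The weak link is obvious even at this scale. Computation and presentation are easy once a question has been understood, but the understanding step only covers what someone has anticipated and curated.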

There is considerable cynicism surrounding the WA announcement, and perhaps Google deserves to enjoy a brief whiff of schadenfreude before WA launches publicly in May. We’ve also yet to hear what the Semantic Web community thinks about WA and how it relates (if at all) to what they are trying to achieve. Until we know more about how Wolfram Alpha works and what kind of results it can produce over what domains of discourse, it’s difficult to form an opinion. You can find out whether all the fuss is warranted by keeping an eye on the Wolfram Alpha Blog and monitoring the responses in the specialist media.

findability, knowledge synthesis | Tagged: research report, retrieval
Posted by bbater

Topic Maps Go Open Source

April 27, 2009

Topic Maps (ISO/IEC 13250), with its XML syntax XTM, is a Semantic Web-related technology for describing knowledge structures. A number of start-up companies in Europe and the US in the early 2000s initiated programmes to develop applications supporting the creation and navigation of Topic Maps. Of them, only Ontopia in Norway seems to have survived in any commercial sense, with its Ontopia Knowledge Suite (OKS) incorporating the Omnigator Topic Map navigator and Ontopoly Topic Map editor. Despite a committed cadre of enthusiasts across the globe (including myself), Topic Maps as a knowledge organization technology proved difficult to promote outside of Norway. As a result, Ontopia was acquired by Norwegian IT consultancy Bouvet ASA in March 2007.

Bouvet themselves have now acknowledged that Topic Maps does not appear to be a technology with any conventional commercial potential. They have therefore announced that the Ontopia suite of Topic Maps applications is to be made open-source. In my view, this is the best decision they could have made. Topic Maps is an XML mark-up standard with more readily understandable semantics, and far greater flexibility for describing the widest variety of knowledge structures, than RDF, the framework adopted by the Semantic Web developers.
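To make the comparison with RDF concrete, here is a minimal sketch that builds a two-topic map and serialises it as XTM 2.0 using only Python’s standard library. It assumes the XTM 2.0 element names topicMap, topic, name/value, association, role and topicRef, and it uses the familiar Puccini/Tosca example. The helper functions are hypothetical. Each association role is typed explicitly, so the relationship reads the same from either side and can take more than two players. An RDF triple, by contrast, is binary and needs an intermediate node for n-ary relationships.

```python
import xml.etree.ElementTree as ET

XTM_NS = "http://www.topicmaps.org/xtm/"
ET.register_namespace("", XTM_NS)  # serialise XTM elements without a prefix


def q(tag: str) -> str:
    """Qualify a tag name with the XTM namespace."""
    return f"{{{XTM_NS}}}{tag}"


def topic_ref(parent: ET.Element, topic_id: str) -> None:
    ET.SubElement(parent, q("topicRef"), href=f"#{topic_id}")


def add_topic(tm: ET.Element, topic_id: str, name: str) -> ET.Element:
    topic = ET.SubElement(tm, q("topic"), id=topic_id)
    name_el = ET.SubElement(topic, q("name"))
    ET.SubElement(name_el, q("value")).text = name
    return topic


def add_association(tm: ET.Element, assoc_type: str, roles: list[tuple[str, str]]) -> ET.Element:
    """roles: (role type id, player id) pairs; any number of roles is allowed."""
    assoc = ET.SubElement(tm, q("association"))
    topic_ref(ET.SubElement(assoc, q("type")), assoc_type)
    for role_type, player in roles:
        role = ET.SubElement(assoc, q("role"))
        topic_ref(ET.SubElement(role, q("type")), role_type)
        topic_ref(role, player)
    return assoc


def build_example() -> ET.Element:
    tm = ET.Element(q("topicMap"), version="2.0")
    for topic_id, name in [
        ("puccini", "Giacomo Puccini"),
        ("tosca", "Tosca"),
        ("composed-by", "composed by"),
        ("composer", "composer"),
        ("work", "work"),
    ]:
        add_topic(tm, topic_id, name)
    add_association(tm, "composed-by", [("composer", "puccini"), ("work", "tosca")])
    return tm


if __name__ == "__main__":
    print(ET.tostring(build_example(), encoding="unicode"))
```

The role types (composer, work) and the association type are themselves topics, so the vocabulary describing the map lives in the map, where a human reader can find it.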

Visit the Ontopia site for further information.