Via momolondon list: opencellid.org data dumps

The readme.txt file describes the tabular data structure (split into a cells file and a measures file).

I think the cells data is the one most folk will be interested in re-using. Table headings are:

# id,lat,lon,mcc,mnc,lac,cellid,range,nbSamples,created_at,updated_at
For example:
7,44.8802,-0.526878,208,10,18122,32951790,0,2,2008-03-31 15:22:22,2008-04-07 08:57:33
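To make the structure concrete, here's a minimal sketch (Python stdlib only) of reading one cells row into typed values. The field list is the readme.txt heading above, and the sample row is the one quoted; everything else is illustrative.

```python
import csv
from io import StringIO

# Column names from the readme.txt (cells file)
FIELDS = ["id", "lat", "lon", "mcc", "mnc", "lac", "cellid",
          "range", "nbSamples", "created_at", "updated_at"]

sample = ("7,44.8802,-0.526878,208,10,18122,32951790,0,2,"
          "2008-03-31 15:22:22,2008-04-07 08:57:33")

reader = csv.DictReader(StringIO(sample), fieldnames=FIELDS)
row = next(reader)

# CSV fields arrive as strings; convert the ones we care about
cell = {
    "lat": float(row["lat"]),
    "lon": float(row["lon"]),
    "mcc": int(row["mcc"]),      # mobile country code (208 = France)
    "mnc": int(row["mnc"]),      # mobile network code
    "cellid": int(row["cellid"]),
}
print(cell)
```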

This could be RDFized using something similar to the (802.11-centric) Wireless Ontology. Perhaps even using lqraps.

 My main OpenID provider is currently LiveJournal, delegated from my own danbri.org domain. I suspect it’s much more likely that danbri.org would go offline or be hacked again (sorry DreamHost) than LJ; but either could happen!

In such circumstances, what should a ‘relying party’ (aka consumer) site do? Apparently myopenid has been down today; these are not theoretical scenarios. And my danbri.org site was hacked last year, due to a DreamHost vulnerability. The bad guys merely added viagra adverts; they could easily have messed with my OpenID delegation URL instead.

I don’t know the OpenID 2.0 spec inside-out (to put it mildly!) but one model that strikes me as plausible: the relying party should hang onto FOAF and XFN ‘rel=me’ data that you’ve somehow confirmed (eg. those from http://danbri.org/foaf.rdf or my LJ FOAF) and simply offer to let you log in with another OpenID known to be associated with you. You might not even know in advance that these other accounts of yours offer OpenID; after all there are new services being rolled out on a regular basis. For a confirmed list of ‘my’ URLs, you can poke around to see which are OpenIDs.

danbri$ curl -s http://danbri.livejournal.com/ | grep openid
<link rel="openid.server" href="http://www.livejournal.com/openid/server.bml" />

danbri$ curl -s http://flickr.com/photos/danbri/ | grep openid
<link rel="openid2.provider" href="https://open.login.yahooapis.com/openid/op/auth" />

Sites do go down. It would be good to have a slicker user experience when this happens. Given that we have formats - FOAF and XFN at least - that allow a user to be associated with multiple (possibly OpenID-capable) URLs, what would it take to have OpenID login make use of this?
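As a sketch of what a relying party (or a curious user) could automate, here's a stdlib-only Python version of the curl-and-grep trick above. The HTML is a cut-down fragment for illustration; a real relying party would also need full Yadis/XRDS discovery as described in the OpenID 2.0 spec.

```python
from html.parser import HTMLParser

class OpenIDLinkFinder(HTMLParser):
    """Collect <link rel="openid.server"> and <link rel="openid2.provider"> hrefs."""
    def __init__(self):
        super().__init__()
        self.endpoints = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        # Naive: assumes a single-valued rel attribute
        rel = a.get("rel", "")
        if rel in ("openid.server", "openid2.provider"):
            self.endpoints[rel] = a.get("href")

# Sample fragment, as returned by the Flickr page above
html = '''<head>
<link rel="openid2.provider" href="https://open.login.yahooapis.com/openid/op/auth" />
</head>'''

finder = OpenIDLinkFinder()
finder.feed(html)
print(finder.endpoints)
```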

More great music-related stuff from Yves Raimond. He’s just announced (on the Music ontology list) a D2RQ mapping of the MusicBrainz SQL into RDF and SPARQL. There’s a running instance of it on his site. The N3 mapping files are on the motools sourceforge site.

Yves writes…

Added to the things that are available within the Zitgist mapping:

• SPARQL end point
• Support for tags
• Supports a couple of advanced relationships (still working my way through it, though)
• Instrument taxonomy directly generated from the db, and related to performance events
• Support for orchestras

This is pretty cool, since the original MusicBrainz RDF is rather dated (if it’s even still available). The new representations are much richer and probably also easier to maintain.
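For anyone who hasn't poked at a SPARQL endpoint before, here's a sketch of sending a query over the SPARQL protocol via a plain GET request. The endpoint URL is a placeholder (see Yves' site for the real one), and the query assumes Music Ontology terms such as mo:MusicArtist.

```python
from urllib.parse import urlencode

# Placeholder endpoint URL -- not the real service address
ENDPOINT = "http://example.org/musicbrainz/sparql"

query = """
PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name WHERE {
  ?artist a mo:MusicArtist ;
          foaf:name ?name .
} LIMIT 10
"""

# The SPARQL protocol allows the query to be passed as a GET parameter;
# fetching this URL would return the results (e.g. as SPARQL XML results)
url = ENDPOINT + "?" + urlencode({"query": query})
print(url)
```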

Nearby in the Web: discussion of RDF views into MySpace; and the RDB2RDF Incubator Group at W3C discussions are getting started (this group is looking at technology such as D2RQ which map non-RDF databases into our strange parallel world…)

For many years, the 24×7 IRC chatrooms #swig and #foaf (and previously #rdfig) have been logged to HTML and RDF by Dave Beckett’s IRC logging code. The RDF idiom used here dates from 2001 or so, and looks like this:

<foaf:ChatChannel rdf:about="irc://irc.freenode.net/swig">
  <foaf:chatEventList>
    <rdf:Seq>
      <rdf:li>
        <foaf:chatEvent rdf:ID="T00-05-11">
          <dc:date>2008-04-03T00:05:11Z</dc:date>
          <dc:description>
danbri: do you know of good scutter data for playing with codepiction? would be fun to get back into that (esp. parallelization).
          </dc:description>
          <dc:creator>
            <wn:Person foaf:nick="kasei"/>
          </dc:creator>
        </foaf:chatEvent>
      </rdf:li>
    </rdf:Seq>
  </foaf:chatEventList>
</foaf:ChatChannel>

Dave has offered to make a one-time search-and-replace fix to the old logs, if we want to agree a new idiom. The main driver for this is that the old logs have a class for ‘chat event’ but use an initial lowercase letter for the term, ie. ‘chatEvent’ instead of ‘ChatEvent’. None of these properties are yet documented in the FOAF schema, and since the data for this term is highly concentrated, and its maintainer has offered to change it, I suggest we document these FOAF terms as used, with the fix of having a class foaf:ChatEvent instead of foaf:chatEvent.
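The one-time fix itself could be as small as a regular expression that renames the element while leaving the (correctly lowercase) foaf:chatEventList property alone. A hypothetical sketch, not Dave's actual script:

```python
import re

def fix_log(xml: str) -> str:
    """Rename foaf:chatEvent elements to foaf:ChatEvent.

    The \\b word boundary stops the pattern matching inside
    foaf:chatEventList, which stays lowercase (it's a property).
    """
    return re.sub(r"(</?)foaf:chatEvent\b", r"\1foaf:ChatEvent", xml)

old = '<foaf:chatEvent rdf:ID="T00-05-11">...</foaf:chatEvent>'
print(fix_log(old))
```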

Almost all RDF vocabularies stick to the rule of using initial lowercase letters for properties, and initial capitals for classes. This makes RDF easy to read for those who know this trick; consequently a lowercase class name can be very confusing, for experts and beginners alike. I’d therefore rather not introduce one into FOAF if it can be avoided. But I would like to document the IRC logging data format, and continue to use FOAF for it.

The markup also uses the wn:Person class from my old Wordnet 1.6 namespace (currently offline, but it will be repaired eventually, albeit with a later Wordnet installation). This follows early FOAF practice, where we minimised terms and used Wordnet a lot more. I suggest Dave updates this to use foaf:Person instead. The dc:creator property used here might also be switched to the newer DC Terms notion of ‘creator’, since that flavour of ‘creator’ has a range of the DC Terms “Agent” class, a more modern and FOAF-compatible idiom. That property, btw, is a candidate for use instead of foaf:maker, which I introduced with some regret only because the old dc:creator property had such weak usage rules. But then if we change the DC namespace used for ‘creator’ here, should we change the other ones too? Hmm hmm hmm etc.

The main known consumer of this IRC log data is XSLT created and maintained by the ubiquitous Dave. If you know of other downstream consumer apps please let us know by commenting in the blog or by email.

While there are other ways in which such chatlogs might be represented in RDF (eg. SIOC, or using something linked to XMPP), let’s keep the scope of this change quite small. The goal is to make a minimal fix to the current archived instance data and associated tools, so that the FOAF vocabulary it uses can be documented.

Comments welcomed…

I just signed up to give a talk at the Microformats vEvent in London, May 27th; thanks to the organizers (Frances Berriman and Drew McLellan of microformats.org) for inviting me :)

I’ve called it “One Big Happy Family: Practical Collaboration on Meaningful Markup” and my goal really is to help make it easier for enthusiasts for both RDF and Microformats to say ‘we’ rather than ‘they’ a bit more often when discussing complementary efforts from this community. As I said on the foaf-dev list yesterday, “anything good for Microformats is good for FOAF”; vice-versa too, I hope. There’s only one Web and we’re all doing our bit, with the tools and techniques we know best.

Here’s the abstract:

This talk explores some ways in which the Microformat and RDF approaches can complement each other, and some ways in which we can share data, tools and experiences between these two technologies. It will outline the often-unarticulated history of the RDF design, the techniques used for parsing and querying RDF data, and the things made easy and hard through this approach. RDF techniques can be contrasted with the different choices made for Microformats. However these differences obscure an underlying similarity that comes from shared ‘Webby’ values.

Edit: it seems I’m incapable of spelling “compl[ie]mentary”. Freudian slip? :)

BTW the London Web Week site has just gone live; check it out…

The Internet is beginning a fundamental transition into the broadband, commercial information superhighway of the future. Today, the Internet offers immediate opportunities for commercial applications by connecting millions of PC, Macintosh and workstation users with businesses and organizations around the world. Tomorrow, as network capabilities and performance increase, this global link will deliver interactive services, information and entertainment into consumers’ homes. Mosaic Communications Corporation intends to support companies and consumers throughout this transition, and to accelerate the coming of this new era with tools that ease and advance online communications.

Mosaic Communications Corporation: Who We Are: Our Story: The Future of Interactive Media

jwz and friends have restored mcom.com to its former 1994-era glory, reminding us that the future’s always up for grabs.

I was really very impressed by Obama’s speech this week. And somewhat surprised to hear a major US politician speak on complex, subtle issues in thoughtful, nuanced terms. For non-USAmericans, I think it’s often hard to empathise with US-style patriotism; in particular, the seeming impossibility of seeing the US as anything other than a shining beacon of goodness, as the world’s policeman. Watching the warmongering, flag-waving television news in the US in 2001/2 terrified me, and left me feeling like an alien in a strange land. But this speech did give me a vivid sense of an America that one could admire, even aspire to live in, and one that was a lot more honest with us all about its failings and difficulties, as well as justifiably proud of its many strengths. I really think this was a landmark speech, and one that shows the US at its best.

I was reading around the various responses, and so here’s a quick link-dump. Daily show; The Onion; Andrew Sullivan; Juan Cole; Fox News.

And also Daily Kos quoting, of all people, Mike Huckabee:

And one other thing I think we’ve gotta remember. As easy as it is for those of us who are white, to look back and say “That’s a terrible statement!”…I grew up in a very segregated south. And I think that you have to cut some slack — and I’m gonna be probably the only Conservative in America who’s gonna say something like this, but I’m just tellin’ you — we’ve gotta cut some slack to people who grew up being called names, being told “you have to sit in the balcony when you go to the movie. You have to go to the back door to go into the restaurant. And you can’t sit out there with everyone else. There’s a separate waiting room in the doctor’s office. Here’s where you sit on the bus…” And you know what? Sometimes people do have a chip on their shoulder and resentment. And you have to just say, I probably would too. I probably would too. In fact, I may have had more of a chip on my shoulder had it been me.

Quite so. And Huckabee deserves praise for acknowledging this. A similar perspective would bring some rationality to foreign policy discussions too. Understandably, most of the commentary we’ve seen on this speech has been on US domestic politics. But one point that seems to have gone underemphasised in the commentary I’ve read (even beyond The Onion!) is that for all the folk inside the US sympathising with Wright’s “God Damn America” outburst, there are hundreds of thousands or more out here in the rest of the world who are frustrated, angry and outraged by the actions of successive US governments. Giving the world a US president who seems capable of acknowledging this and beginning to address it would be a breath of fresh air. Elect him already! :) (and not that guy who jokes about killing my friends, please…).

Speaks, reads, writes
Stephanie Booth asks:

 I vaguely remember somebody telling me about some emerging “standard” (too big a word) for encoding language skills. Or was it a dream?

That would’ve been me, showing markup from the FOAFX beta from Paola Di Maio and friends, which explores the extension of FOAF with expertise information. This is part of the ExpertFinder discussions alongside the FOAF project (see also wiki, mailing list). FOAFX and the ExpertFinder community are looking at ways of extending FOAF to better describe people’s expertise, both self-described and externally accredited. This is at once a fascinating, important and terrifyingly hard-to-scope problem area. It touches on longstanding “Web of trust” themes, on educational metadata standards, and on the various ways of characterising topics or domains of expertise. In other words, in any such problem space, there will always be multiple ways of “doing it”. For example, here is how the Advogato community site characterises my expertise regarding opensource software: foaf.rdf (I’m in the Journeyer group, apparently; some weighted average of people’s judgements about me).

One thing FOAFX attempts is to describe language skills. For this, they extend the idiom proposed by Inkel some years ago in his “Speaks, Reads, Writes” schema. In the original (which is in Spanish, but see also the English version), the classification was effectively binary: one could either speak, read, or write a language; or one couldn’t. You could also say you ‘mastered’ it, meaning that you could speak, read and write it. In FOAFX, this is handled differently: we get a 1-5 score. I like this direction, as it allows me to express that I have some basic capability in Spanish, without appearing to boast that I’m anything like “fluent”. But … am I a “1” or a “2”? Should I poll my long-suffering Spanish-speaking friends? Take an online quiz? Introducing numbers gives the impression of mathematical precision, but in skill characterisation this is notoriously hard (and not without controversy).

My take here is that there’s no right thing to do. So progress and experimentation are to be celebrated, even if the solution isn’t perfect. On language skills, I’d love some way also to allow people to say “I’m learning language X”, or “I’m happy to help you practice your English/Spanish/Japanese/etc.”. Who knows, with more such information available, online Social Network sites could even prove useful…

Here btw is the current RDF markup generated by FOAFX:

<foaf:Person rdf:ID="me">
  <foaf:mbox_sha1>6e80d02de4cb3376605a34976e31188bb16180d0</foaf:mbox_sha1>
  <foaf:givenname>Dan</foaf:givenname>
  <foaf:family_name>Brickley</foaf:family_name>
  <foaf:homepage rdf:resource="http://danbri.org/" />
  <foaf:weblog rdf:resource="http://danbri.org/words/" />
  <foaf:depiction rdf:resource="http://danbri.org/images/me.jpg" />
  <foaf:jabberID>danbrickley@gmail.com</foaf:jabberID>
  <foafx:language>
    <foafx:Language>
      <foafx:name>English</foafx:name>
      <foafx:speaking>5</foafx:speaking>
      <foafx:reading>5</foafx:reading>
      <foafx:writing>5</foafx:writing>
    </foafx:Language>
  </foafx:language>
  <foafx:language>
    <foafx:Language>
      <foafx:name>Spanish</foafx:name>
      <foafx:speaking>1</foafx:speaking>
      <foafx:reading>1</foafx:reading>
      <foafx:writing>1</foafx:writing>
    </foafx:Language>
  </foafx:language>
  <foafx:expertise>
    <foafx:Expertise>
      <foafx:field>::</foafx:field>
      <foafx:fluency>
        <foafx:Language>
          <foafx:name>English</foafx:name>
        </foafx:Language>
      </foafx:fluency>
    </foafx:Expertise>
  </foafx:expertise>
</foaf:Person>

The apparent redundancy in the markup (expertise, Expertise) is due to RDF’s so-called “striped” syntax. I have an old introduction to this idea; in short, RDF lets you define properties of things, and categories of thing. The FOAFX design effectively says: there is a property of a person called ‘expertise’ which relates that person to another thing, an ‘Expertise’, which itself has properties like ‘fluency’.

The FOAFX design tries to navigate between generic and specific, by including language-oriented markup as well as more generic skill descriptions. I think this is probably the right way to go. There are many things that we can say about human languages that don’t apply to other areas of expertise (eg. opensource software development). And there are many things we can say about expertise in general (like expressions of willingness to learn, to teach, … indications of formal qualification) which are cross-domain. Similarly, there are many things we might say in markup about opensource projects (picking up on my Advogato mention earlier) which have nothing to do with human languages. Yet both human language expertise and opensource skills are things we might want to express via FOAF extensions. For example, the DOAP project already lets us describe opensource projects and our roles in them.

The Semantic Web design challenge here is to provide a melting pot for all these different kinds of data, one that allows each specific problem to be solved adequately in a reasonable time-frame, without precluding the possibility for richer integration at a later date. I have a hunch that the Advogato design, which expresses skills in terms of group membership, could be a way to go here.

This is related to the idea of expressing group-membership criteria through writing SPARQL queries. For example, we can talk about the Group of people who work for W3C. Or we can talk about the Group of people who work for W3C as listed authoritatively on the W3C site. Both rules are expressible as queries; the latter a query that says things about the source of claims, as well as about what those claims assert. This notion of a group defined by a query allows for both flavours; the definition could include criteria relating to the provenance (ie. source) of the claims, but it needn’t. So we could express the idea of people who speak Spanish, or the idea of people who speak French according to having passed some particular test, or being certified by some agency. In either case, the unifying notion is “person X is in group Y”, where Y is a group identified by some URL.

What I like about this model is that it allows for a very loose division of labour: skill-related markup is necessarily going to be widely varied. Yet the idea that such scattered evidence boils down to people falling into definable groups gives some overall cohesion to this diversity. I could for example run a query asking for people with (in the foafx idiom) “Spanish skills of 2 or more”. I could add a constraint that the person be at least a “Journeyer” regarding their opensource skills, according to Advogato, or perhaps mix in data expressed in DOAP terms regarding their roles in opensource project work. These skills effectively define groups (loosely, sets) of people, and skill search can be pictured in Venn diagram terms.

Of course all this depends on getting enough data out there for any such queries to be worthwhile. Maybe a Facebook app that re-published data outside of Hotel Facebook would be a way of bootstrapping things here?
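The “groups as queries” idea can be illustrated without any RDF machinery at all. Here's a toy Python sketch with invented people and scores, standing in for what would really be SPARQL queries over FOAF/DOAP/Advogato data; all the names and numbers are made up.

```python
# Hypothetical skill claims about three invented people.
# In practice these would come from FOAFX markup, Advogato's
# foaf.rdf, DOAP files, etc.
people = {
    "alice": {"spanish": 3, "advogato": "Master"},
    "bob":   {"spanish": 1, "advogato": "Journeyer"},
    "carol": {"spanish": 2, "advogato": "Journeyer"},
}

# Advogato's certification levels, weakest first
RANKS = ["Observer", "Apprentice", "Journeyer", "Master"]

def group(predicate):
    """A group defined by a rule: the set of people matching it."""
    return {name for name, skills in people.items() if predicate(skills)}

spanish_2_plus = group(lambda s: s.get("spanish", 0) >= 2)
journeyer_up = group(
    lambda s: RANKS.index(s.get("advogato", "Observer"))
              >= RANKS.index("Journeyer"))

# Skill search as Venn-diagram intersection of the two groups
print(spanish_2_plus & journeyer_up)
```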

I’m in Cork, mainly for the excellent Social Network Portability event on Sunday, but am also staying through Blogtalk’08, which has been great. I’ve uploaded my slides from my talk (slideshare in Flash, included inline here, or a pdf). I have some rough speaking notes too; maybe I’ll get those online. I have no idea how they relate to whatever actually came out of my mouth during the talk :) Apologies to those without PDF or Flash. I haven’t tried Keynote’s HTML output yet.

Basically much of what I was getting at in the talk, and my thoughts are only just congealing on this … is that the idea of a ‘claim’ is a useful bridge between Semantic Web and Social Networking concerns. Also that it helps us understand how technologies fit together. FOAF defines a dictionary of terms for making claims, as do XFN and hCard. RDF/XML, Microformats, RDFa and GRDDL define textual notations for publishing documents that encode claims, and SPARQL gives us a way of asking questions about the claims made in different documents.
