February 20, 2007

The Strange Content Label Incubator Group

Today the W3C Content Label Incubator Group published its final report. I must say I'm mystified by this group. I was when I first learnt about them and I still am.

See, here's what they want to do:

In essence what's required is a way of making any number of assertions about a resource or group of resources. In order to be trustworthy, the label containing those assertions should be testable in some way through automated means.

So now you might guess that they are part of the Semantic Web community - but you would be wrong. In fact, they seem to be actively trying to avoid both the SW label and RDF itself.

Let's have a look at an example of what they say about their relation to RDF:

It is anticipated that the primary encoding will be in RDF but that alternatives will be considered: for example, extensions for RSS and ATOM to allow a default cLabel to be declared at the channel/feed level with overriding cLabels at item/entry level.

I think what they are trying to say is that they want to encode their model mostly as RDF/XML. Which immediately raises the question: are the data models compatible, the same even? Well, mostly ... more about that in a sec. And what they should then be saying in the second part of the example is that they still need ways to embed RDF in ATOM and RSS (other than 1.0, obviously).

So, what is their problem with RDF? It's the groups. RDF makes statements about resources, but they want to make statements about groups of resources. Now you may point out that OWL at least allows you to make statements about classes, and that classes can be described in pretty sophisticated ways ... and I'm not sure whether they have thought about that, but in any case: the groups of resources they are envisioning aren't easily or naturally captured in OWL. Here is their example:

  • As a matter of policy, all content created after 1 January 2005 meets WAI AA standard.
  • Content created after 1 January 2006 meets the Mobile Web Initiative's mobileOK standard.
  • There is no sex or violence in any content but resources whose URLs contain the word "-pg" may portray bare breasts, bare buttocks, alcohol or gambling.
  • The content is organized in such a way that the genre of a resource (pop, film, fashion etc.) can be inferred from its host, such as http://fashion.example.com
  • All material is copyright Exemplary Multimedia Company
  • Some metadata is unique to a given resource, such as title and author. This can be accessed using a URI associated with the resource. This might be a URL, an internal ID number or the resource's ISAN number.

But then - you will have a hard time defining any formalism (other than a full-fledged imperative programming language) that can capture them. And you'll still need some metadata attached to each element - how else will I know when it was created? And in any case, why? Why do I need such complex groups? I think this is where their argument fully collapses. Here's what they say:

Rather than spend considerable time and effort to create a complete set of metadata for each resource, the Exemplary Multimedia Company wishes to group resources together for descriptive purposes.

Translation: because the content provider can't be bothered to apply these rules himself (which he could easily do with a script, where he would have a full-fledged imperative language at his disposal). I think that's a pretty weak excuse to throw away a standard.
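To show how little work that is, here is a rough sketch of such a script for the example rules quoted above. It is only an illustration: the function name and the property names (wai_aa, mobile_ok and so on) are made up by me and are not taken from the report or from any cLabel or RDF vocabulary, and I assume the provider already knows the URL and creation date of each resource.

```python
from datetime import date
from urllib.parse import urlparse


def describe(url: str, created: date) -> dict:
    """Expand the group-level rules into per-resource assertions.

    All keys are hypothetical names chosen for this sketch.
    """
    host = urlparse(url).hostname or ""
    return {
        # Applies to all material.
        "copyright": "Exemplary Multimedia Company",
        # No sex or violence in any content.
        "sex": False,
        "violence": False,
        # URLs containing "-pg" may portray the milder material.
        "may_contain_pg_material": "-pg" in url,
        # Genre is inferred from the first label of the host name,
        # e.g. fashion.example.com gives "fashion".
        "genre": host.split(".")[0],
        # Policy rules keyed on the creation date.
        "wai_aa": created > date(2005, 1, 1),
        "mobile_ok": created > date(2006, 1, 1),
    }


if __name__ == "__main__":
    label = describe(
        "http://fashion.example.com/shows/spring-pg.html", date(2006, 3, 1)
    )
    for key, value in label.items():
        print(key, "=", value)
```

The resulting dictionary could then be serialized as ordinary per-resource statements in whatever format one likes, RDF included. Note that the script still needs the creation date as input, which is exactly the per-resource metadata the grouping approach cannot do without either.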

Elsewhere in the document they give a different reason for their dislike of RDF: that it does not allow defaults. That's just false. OWL does not, but the logic programming approaches under discussion for the rule stack do ... RDF is agnostic to these things. In any case, that's not the point. I'd guess the creators of filter software who are part of this community have built their filter files around these kinds of groups, and they want to keep it that way. And maybe, for this kind of application, it even makes sense as a way to save bandwidth ... but if this is the real reason, then it should be argued that way (although I'd still say it's wrong).

There are more problems with their current document, but this post is already ridiculously long. Just one more example: they specify trust as a core problem in the mission statement and then barely touch it (that reminds me of a different community that does the same ... ahh, that'll be the Semantic Web community)
