October 2, 2008

Tackling the Curse Of Prepayment - Collaborative Knowledge Formalization Beyond Lightweight

We finally got around to writing up our ideas on how to overcome the motivation and incentive problems for collaborative heavyweight knowledge formalization:

This paper argues for collaborative incremental augmentation of text retrieval as an approach that can be used to immediately show the benefits of relatively heavyweight knowledge formalization in the context of Web 2.0 style collaborative knowledge formalization. Such an approach helps to overcome the "Curse of Prepayment"; i.e. the hitherto necessary very large initial investment in formalization tasks before any benefit of Semantic Web technologies is visible. Some initial ideas about the architecture of such a system are presented and it is placed within the overall emerging trend of "people powered search".

You can read the entire paper here. I will present it at the INSEMTIVE workshop at this year's ISWC; if you're in Karlsruhe, it would be great to see you there!


March 2, 2008

Collaborative Knowledge Formalization Beyond Lightweight - Tackling the Curse of Prepayment; Part II

This is the second in a series of three posts - you may wish to start with the first.

'Knowledge' Does Not Equal 'Knowledge'

When the collaborative knowledge formalization community talks about 'knowledge', they mean something quite different from what most of the Uppercase Semantic Web community or the knowledge-based systems community have in mind. The collaborative knowledge formalization community thinks of taxonomies, thesauri, SKOS or structured data; the other communities are thinking of Logic Programs, Description Logics, OWL or First Order Logic. Current collaborative knowledge formalization approaches just don't support the formalisms that are commonly associated with knowledge formalization.
Now you might argue that it has to be this way - that highly formal representations are just not well suited to being edited in the Web 2.0 style collaboration that is the topic of the collaborative knowledge formalization community. Indeed this may be the case, but it's surely worth trying. There is no definite argument proving that highly formal representations cannot be edited in this way, and I believe that trying to bring knowledge formalization with more powerful and more complex formalisms to the crowd will at the very least bring advances in robust reasoning and usable knowledge formalization interfaces.

The Challenges Of Using More Heavyweight Formalisms

There are, however, many challenges entailed in moving to more heavyweight formalisms. Challenges such as:

  • Usability / Debuggability: Formalisms such as OWL or First Order Logic are harder to understand; in particular, errors are much harder to find.
  • Robustness: A single faulty statement added to a knowledge base with a million axioms may break everything. Unless this problem is tackled, open collaborative knowledge formalization is impossible.
  • Performance and the Language Expressivity / Performance tradeoff: Current reasoners for representation languages such as OWL or FOL could not dream of supporting a continuously updated knowledge base of even a fraction of the size of Wikipedia; hence something would have to give: there would have to be restrictions on language expressivity, reasoning algorithms that do not achieve soundness and/or completeness, or languages that are not purely declarative would have to be used.
  • Mixed Formality: The kind of collaborative knowledge formalization approaches discussed here rely on incremental and partial formalization - hence the data store is never fully formalized and contains data at different levels of formality. Current reasoning approaches are not well suited to tackle this.

The Curse of Prepayment - Again

All of the problems in the previous section are real and important - but there is one that trumps them all: what is the immediate benefit of formalizing even small parts of a data store? What do I get from spending time and/or money on bringing a part of my data store to a more formal level? Answering this question then allows me to decide on the tradeoffs needed to address the challenges described in the previous section.

Here the collaborative knowledge formalization community has the same problem as the wider Semantic Web community: "What exactly do I get in extra benefit from using OWL? And is this worth the effort?" I believe there is an answer to that question - but I'll describe it in the next installment of this series*.

* The first ever cliffhanger on this blog ;)


February 20, 2008

The CKC Challenge: Exploring Tools for Collaborative Knowledge Construction

The 'challenge' in which some tools for the collaborative creation of structured knowledge were compared, and in which SOBOLEO participated, has now been described in an IEEE Intelligent Systems publication (that is freely available here; you can also find a longer technical report version here).

The publication is light on real conclusions, but is a decent overview of tools for the collaborative creation of structured knowledge. In the conclusions you can also read that we are working on integrating SOBOLEO and BibSonomy - true, but somewhat surprising for me to see it announced publicly by people other than the BibSonomy guys or us.


Collaborative Knowledge Formalization Beyond Lightweight - Tackling the Curse of Prepayment; Part I

The Curse Of Prepayment
The Curse of Prepayment is also often referred to as the chicken-and-egg problem of Semantic Technologies: Semantic Technologies promise great functionality once a great amount of knowledge is formalized. And because knowledge formalization is difficult, cumbersome and often not well supported, you need to make a great up-front investment before you see any functionality. This insight is not new at all, and there are already numerous approaches that try to address it; of particular interest here are approaches that try to harness Web 2.0 ideas for this task. These Web 2.0 approaches to knowledge formalization can be roughly separated into two groups:

  1. The first group is based on the observation that lots of people are successfully creating structured data with tagging applications. These approaches then try to extend these systems with a bit more structure, a bit more formality. Our own SOBOLEO system, GroupMe, Int.ere.st, BibSonomy and gnizr are examples of these kinds of systems.
  2. The second group of systems starts from the observation that people are spending large amounts of time creating semi-structured data in wikis. These systems then try to give people the tools and the support to create data with more structure, more formality. Semantic MediaWiki, Freebase, IkeWiki and MyOntology are examples of these kinds of systems.

Making Every Penny Count, Immediately
What makes these systems interesting, what gives them a chance to tackle the Curse of Prepayment are five closely related properties:

  • Simple: Formalization is simple, can be done with little training, little effort and not only by logic experts.
  • Collaborative: Formalization can be done jointly in a group - in this way the cost is spread over multiple persons; the prepayment needed from every person is reduced. 
  • Incremental: Not everything needs to be formalized at once, formalization can be done incrementally.
  • Partial: The tools can work with data stores that are only partly formalized, that contain data at different levels of formality.
  • Immediate: Formalized data can be used immediately and immediately brings some benefit to the user.

Together these five properties can be summed up as: "Making Every Penny Count, Immediately". There is an immediate benefit for formalizing even small parts; and because these systems are simple and collaborative, formalizing these small parts is relatively cheap.

The exact nature of this 'immediate benefit' differs between the systems mentioned above, for example it is:

  • Tables and less redundant data: The unique selling point of Semantic MediaWiki: as soon as just a few attribute values have been specified, these can be used to create tables and overview pages that before had to be maintained manually.
  • Hierarchical Organization: In systems like SOBOLEO or BibSonomy, tags can be organized hierarchically; this allows for more effective maintenance of the tag repository as well as for more effective navigation and retrieval. This works after adding just one such relation.
  • Advanced Search: For example, in the SOBOLEO system adding just one synonym for a tag/concept will already improve the search experience: searching for this synonym will then also return the documents annotated with that tag/concept.
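
The last two benefits can be illustrated with a minimal sketch. This is not SOBOLEO's or BibSonomy's actual implementation; the `Concept` class, the `search` function and the example data are hypothetical and assumed only for illustration. It shows how a single synonym or a single hierarchical relation already changes what a search returns.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A hypothetical tag/concept with optional synonyms and narrower concepts."""
    label: str
    synonyms: set = field(default_factory=set)
    narrower: list = field(default_factory=list)


def matching_concepts(query, concepts):
    """Return labels of concepts that match the query by label or synonym,
    together with every concept below them in the hierarchy."""
    q = query.lower()
    found = set()
    stack = [c for c in concepts
             if q == c.label.lower() or q in {s.lower() for s in c.synonyms}]
    while stack:
        concept = stack.pop()
        if concept.label not in found:
            found.add(concept.label)
            stack.extend(concept.narrower)
    return found


def search(query, concepts, annotations):
    """Return the sorted documents annotated with any matching concept."""
    labels = matching_concepts(query, concepts)
    return sorted(doc for doc, tags in annotations.items() if tags & labels)


# Made-up example data: one synonym and one hierarchical relation.
owl = Concept("OWL", synonyms={"Web Ontology Language"})
languages = Concept("ontology languages", narrower=[owl])
tagging = Concept("tagging")
concepts = [owl, languages, tagging]

annotations = {
    "doc-a": {"OWL"},
    "doc-b": {"tagging"},
}

print(search("Web Ontology Language", concepts, annotations))  # ['doc-a']
print(search("ontology languages", concepts, annotations))     # ['doc-a']
print(search("tagging", concepts, annotations))                # ['doc-b']
```

Without the synonym, the first query would find nothing; without the single hierarchical relation, the second would find nothing. That is the sense in which even one formalized statement pays off right away.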

This post is the first in a series of three posts. The next will focus on the challenges for collaborative knowledge formalization that we encounter when moving beyond the very lightweight formalisms currently employed in the tools mentioned above.


October 3, 2007

The Ontology Maturing Approach to Collaborative and Work-Integrated Ontology Development: Evaluation Results and Future Directions

A paper by the usual suspects (Andreas Walter, Simone Braun, Andreas Schmidt and me) about some evaluation results and our plans for future work surrounding the ontology maturing ideas.  It will be presented at the Workshop on Emergent Semantics and Ontology Evolution at the ISWC 2007.

Ontology maturing as a conceptual process model is based on the assumption that ontology engineering is a continuous collaborative and informal learning process and always embedded in tasks that make use of the ontology to be developed. For supporting ontology maturing, we need lightweight and easy-to-use tools integrating usage and construction processes of ontologies. Within two applications – Imagination for semantic annotation of images and SOBOLEO for semantically enriched social bookmarking – we have shown that such ontology maturing support is feasible with the help of Web 2.0 technologies. In this paper, we want to present the conclusions from two evaluation sessions with end users and summarize requirements for further development.

The entire paper can be downloaded here.


July 17, 2007

SOBOLEO: Vom kollaborativen Tagging zur leichtgewichtigen Ontologie (SOBOLEO: From Collaborative Tagging to Lightweight Ontology)

(Abstract translated from the German)

So far there is no integrated tool that supports both the collaborative creation of an index of relevant internet resources ("social bookmarking") and the creation of a shared ontology that is used to organize this index. Current tools allow either the creation of an ontology, or the structuring of resources according to a given, unchangeable ontology, or without any structure at all. In this paper we show how collaborative tagging and collaborative ontology development can be combined so that their respective weaknesses are avoided and their strengths complement each other. We present SOBOLEO, a system that enables the collaborative, web-based creation, extension and maintenance of taxonomies and a shared bookmark collection, and at the same time supports the annotation of internet resources with concepts from the taxonomy created.

For once a German publication (this may be my first) about the relations between tagging and semantic annotations (and SOBOLEO). It's written by the usual people (Simone Braun, Andreas Schmidt and me) and will be presented at the German Mensch und Computer conference by Simone.

It focuses on the question of why you might want to augment tags with more structure. You can access the entire paper here.


June 28, 2007

Defining Folksonomy

Recently I skimmed over the (interesting) proceedings of the "Bridging the Gap between Semantic Web and Web 2.0" workshop. What surprised me was that, for all the papers talking about Folksonomies, there was no convincing definition of Folksonomy. So I thought I'd share one with you:

A Folksonomy is the computer stored record of the use of labels by many people.

This might be a bit surprising at first, but I think it'll become clearer when I discuss two often used candidate definitions:

First, Wikipedia: "A Folksonomy is a user generated taxonomy used to categorize and retrieve web content such as Web pages, photographs and Web links, using open-ended labels called tags". The problem with this definition (and all others that argue for a similarity to taxonomies) is that the most salient feature of a Taxonomy is the explicit representation of a hierarchical structure - and that's something a Folksonomy lacks. So in this sense a Folksonomy is more like a controlled vocabulary - except that it isn't controlled .... so that leads nowhere.

Second, based on Peter Mika's groundbreaking "Ontologies are us" paper, some people say that a Folksonomy is a tripartite graph of persons, concepts and documents. There's nothing wrong with that, but for me that's still an incomplete definition because it does not try to capture what is represented by this graph; it only talks about the very basic structure.

In the end a Folksonomy really only is a computer accessible sample of the use of language to name things. But then - naming things is at the core of language, conceptualizations and ontologies, and having a simple way to observe it (as imperfect as it may be) is no small thing!
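
Both the tripartite view and the "record of label use" definition can be made concrete in a small sketch. The record layout, function names and data below are hypothetical assumptions for illustration only; they are not taken from Mika's paper or from any tagging system. A Folksonomy is stored as a list of (person, label, document) records, and simple observations about how people name things fall out of it.

```python
from collections import Counter, defaultdict

# Each record says: this person used this label for this document.
# All names and data below are made up for illustration.
assignments = [
    ("alice", "eswc2007", "photo-1"),
    ("bob", "eswc2007", "photo-2"),
    ("bob", "gaming", "photo-2"),
    ("carol", "semanticweb", "photo-1"),
]


def label_usage(records):
    """Count how many distinct people used each label."""
    people = defaultdict(set)
    for person, label, _doc in records:
        people[label].add(person)
    return Counter({label: len(users) for label, users in people.items()})


def labels_for(records, document):
    """Return the sorted labels attached to a document by anyone."""
    return sorted({label for _person, label, doc in records if doc == document})


print(label_usage(assignments).most_common(1))  # [('eswc2007', 2)]
print(labels_for(assignments, "photo-1"))       # ['eswc2007', 'semanticweb']
```

The graph structure is only the container; what makes it interesting is that each record is one observed act of naming.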


June 8, 2007

The Perils of Tagging

If you try to find pictures of the European Semantic Web Conference on Flickr using the tag eswc2007, all you currently find are hundreds of pictures from the Electronic Sports World Cup 2007 - an event sharing the same acronym and hence the same tag. Oh, the irony...


April 25, 2007

Try Out Tools For The Collaborative Construction Of Structured Knowledge

One of the main activities in CKC 2007 is an open challenge to evaluate a number of tools for the collaborative construction of knowledge. The aim of the challenge is to test existing approaches to figure out collectively what we expect of such tools, which features are useful and which are not, and which direction the field should follow.

Please go to:
http://km.aifb.uni-karlsruhe.de/ws/ckc2007/challenge.html

For example you can try out collaborative Protégé or our SOBOLEO tool I described earlier.


March 30, 2007

SOBOLEO

We present SOBOLEO, a system for the web-based collaborative engineering of SKOS ontologies and annotation of web resources. SOBOLEO enables the simple creation, extension and maintenance of taxonomies. At the same time, it supports the annotation of web resources with concepts from this taxonomy.

And another publication. I talked about SOBOLEO before, but now it got its own demo paper. SOBOLEO is a rather nice tool that brings together Semantic Search, a lightweight annotation tool (AJAX bookmarklet) and a collaborative real time taxonomy editor. In the coming days SOBOLEO (and all other demos) will also be available for the workshop participants to try out - I'm curious how that'll work out; how well it will be able to hold its own against the likes of collaborative Protégé and BibSonomy (I always like it when the other approaches that you dismiss in the "related work" section are actually present at the same workshop :)

It's also published at the Workshop on Social and Collaborative Construction of Structured Knowledge @ WWW. Authors are the developers of SOBOLEO - Valentin Zacharias and Simone Braun. You can read the entire (demo) paper here.


Ontology Maturing: a Collaborative Web 2.0 Approach to Ontology Engineering

Most of the current methodologies for building ontologies rely on specialized knowledge engineers. This is in contrast to real-world settings, where the need for maintenance of domain specific ontologies emerges in the daily work of users. But in order to allow for participatory ontology engineering, we need to have a more realistic conceptual model of how ontologies develop in the real world. We introduce the ontology maturing process, which is based on the insight that ontology engineering is a collaborative informal learning process, and for which we analyze characteristic evolution steps and triggers that have users engage in ontology engineering within their everyday work processes. This model integrates tagging and folksonomies with formal ontologies and shows maturing pathways between them. As implementations of this model, we present two case studies and the corresponding tools.

This paper was accepted to the workshop on Social and Collaborative Construction of Structured Knowledge @ WWW2007 (authors are Simone Braun, Andreas Schmidt, Andreas Walter, Gabor Nagypal and me). You can read the complete text here.

It got pretty good reviews, so you might actually want to read it :)


December 8, 2006

Ontology Maturing with Lightweight Collaborative Ontology Editing Tools

Another publication. Will be presented at the Workshop on Productive Knowledge Work : Management and Technological Challenges (ProKW), 4th Conference on Professional Knowledge Management - Experiences and Visions (WM 2007).

Authors are Simone Braun, Andreas Schmidt and me.

Ontology building is an important prerequisite for state-of-the-art semantic technologies for knowledge worker support. But ontology engineering methods have so far neglected the early phase of ontology building where a conceptualization only exists rather informally and undergoes continuous evolution through collaboration and interaction within the community. We have to view ontology building as a maturing process that requires collaborative editing support and the integration into the daily work processes of knowledge workers. In the spirit of current Web 2.0 applications, we present an AJAX-based lightweight ontology editor as a first approach to this problem.

I won't be at the conference, but the other two will be. My role in writing the paper was rather small anyway. I did, however, do most of the work in defining and implementing one "lightweight collaborative ontology editing tool" presented in the paper. A rather nice AJAX application. A collaborative editor for a subset of SKOS. The cool thing about the editor is that it supports truly collaborative work, Google Spreadsheet style; i.e. two people can change the same concept at exactly the same time and nothing will break. Users see almost realtime* updates of the changes other people make to the same taxonomy.

The paper is not yet online, but someday you'll find it here.

* depending on configuration and connection - but maybe a third of a second.
