February 21, 2007

QEDWiki

Very very cool, hadn't heard of it before: a mashup-creating wiki system for the enterprise:

QEDWiki is a lightweight mash-up maker written in PHP 5 and hosted on a LAMP, WAMP, or MAMP stack. A mash-up assembler will use QEDWiki to create a personalized, ad hoc Web application or mash-up by assembling a collection of widgets on a page, wiring them together to define the behavior of the mash-up application, and then possibly sharing the mash-up with others. Mash-up enablers provide QEDWiki with a collection of widgets that provide application domain- or information-specific functionality. These widgets are represented within QEDWiki as PHP scripts.

They also have a good demo video (although the speaker is annoying - somebody should tie his hands next time).

I haven't had time to test it yet - but the demo looks amazing.

Semantic Search and Synonyms

Synonyms (and homonyms) are really the boring basis for semantic search, and I'd probably be one of the first to say that someone building semantic search shouldn't spend too much time on them because they're just not exciting enough ... but if there were a search engine that handled this well, it would just have saved me half a day. It could have told me that what the knowledge engineering community calls knowledge formulation by "Domain Experts" and "Subject Matter Experts" is called End User Programming in the Software Engineering community. And similarly that provenance (or traceability) is called Lineage in the DB community. And don't get me started on Algorithmic Debugging, aka Declarative Debugging, Declarative Diagnosis, Guided Debugging, Rational Debugging, aka Deductive Debugging.

But then - at this level these labels are often not synonyms but similar concepts - and then it might be interesting again. The query "Similarity based semantic search" still returns nothing ;-)
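
To make the synonym idea concrete, here is a minimal sketch of the kind of query expansion I mean, assuming nothing more than a hand-maintained mapping between the communities' terms; a real engine would need a proper thesaurus or something learned from the literature:

  # A minimal sketch: expand a query with its known cross-community synonyms.
  SYNONYMS = {
      "provenance": ["traceability", "lineage"],
      "knowledge formulation": ["end user programming"],
      "algorithmic debugging": ["declarative debugging", "declarative diagnosis",
                                "guided debugging", "rational debugging",
                                "deductive debugging"],
  }

  def expand(query):
      """Rewrite a query into the OR of all its known synonyms."""
      terms = [query] + SYNONYMS.get(query.lower(), [])
      return " OR ".join('"%s"' % t for t in terms)

  print(expand("provenance"))  # "provenance" OR "traceability" OR "lineage"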

February 20, 2007

The Strange Content Label Incubator Group

Today the W3C Content Label Incubator Group published its final report. I must say I'm mystified by this group. I was when I first learnt about them and I still am.

See, here's what they want to do:

In essence what's required is a way of making any number of assertions about a resource or group of resources. In order to be trustworthy, the label containing those assertions should be testable in some way through automated means.

So now you might guess that they are part of the Semantic Web community - but you would be wrong. It actually seems they are actively trying to avoid both the SW label and the use of RDF.

Let's have a look at an example of what they say about their relation to RDF:

It is anticipated that the primary encoding will be in RDF but that alternatives will be considered: for example, extensions for RSS and ATOM to allow a default cLabel to be declared at the channel/feed level with overriding cLabels at item/entry level.

I think what they are trying to say is that they want to mostly encode their model as RDF/XML. Which immediately raises the question: are the data models compatible, the same even? Well, mostly ... more about that in a sec. And what they should be saying in the second part of the example is that they still need ways to embed RDF in ATOM and RSS (other than RSS 1.0, obviously).

So, what is their problem with RDF? It's the groups. RDF makes statements about resources, but they want to make statements about groups of resources. Now you may point out that OWL at least allows making statements about classes, and that classes can be described in pretty sophisticated ways (see the sketch after the list below) ... and I'm not sure whether they have thought about that, but in any case: the groups of resources they are envisioning aren't easily or naturally captured in OWL.

  • As a matter of policy, all content created after 1 January 2005 meets WAI AA standard.
  • Content created after 1 January 2006 meets the Mobile Web Initiative's mobileOK standard.
  • There is no sex or violence in any content but resources whose URLs contain the word "-pg" may portray bare breasts, bare buttocks, alcohol or gambling.
  • The content is organized in such a way that the genre of a resource (pop, film, fashion etc.) can be inferred from its host, such as http://fashion.example.com
  • All material is copyright Exemplary Multimedia Company
  • Some metadata is unique to a given resource, such as title and author. This can be accessed using a URI associated with the resource. This might be a URL, an internal ID number or the resource's ISAN number.
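
To be fair on the class point: the blanket copyright rule, at least, is the kind of group OWL can state directly, as a value restriction on a class of resources. A rough sketch in Python using rdflib (the ex: namespace and class name are made up); the date- and URL-pattern-based groups are the ones with no such natural encoding:

  from rdflib import Graph, BNode, Literal, Namespace, RDF, RDFS
  from rdflib.namespace import OWL, DC

  EX = Namespace("http://example.com/labels#")  # hypothetical namespace
  g = Graph()

  # "All material is copyright Exemplary Multimedia Company" as a statement
  # about a class: every member of ex:SiteContent has that dc:rights value.
  restriction = BNode()
  g.add((restriction, RDF.type, OWL.Restriction))
  g.add((restriction, OWL.onProperty, DC.rights))
  g.add((restriction, OWL.hasValue, Literal("Exemplary Multimedia Company")))
  g.add((EX.SiteContent, RDFS.subClassOf, restriction))

  print(g.serialize(format="xml"))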

But then - you will have a hard time defining any formalism (other than a full-fledged imperative programming language) that can. And you'll still need some metadata attached to each element - how else will I know when it was created? And in any case, why? Why do I need such complex groups? And I think this is where their argument fully collapses; here's what they say:

Rather than spend considerable time and effort to create a complete set of metadata for each resource, the Exemplary Multimedia Company wishes to group resources together for descriptive purposes.

= Because the content provider can't be bothered to apply these rules himself (which he could easily do with a script - where he would have a full-fledged imperative language at his disposal). I think that's a pretty weak excuse to throw away a standard.
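
And for what it's worth, such a script really is only a few lines. A rough sketch, assuming the provider already has a list of URLs and creation dates (the resource list and field names below are made up):

  from datetime import date

  # Hypothetical resource listing the provider already has.
  resources = [
      ("http://fashion.example.com/spring.html", date(2006, 3, 1)),
      ("http://music.example.com/party-pg/clip.html", date(2004, 6, 12)),
  ]

  def describe(url, created):
      """Apply the grouping rules from the report to a single resource."""
      meta = {"rights": "Exemplary Multimedia Company"}
      if created > date(2005, 1, 1):
          meta["accessibility"] = "WAI AA"
      if created > date(2006, 1, 1):
          meta["mobile"] = "mobileOK"
      meta["adult-themes"] = "-pg" in url               # may portray bare breasts, alcohol, gambling
      meta["genre"] = url.split("//")[1].split(".")[0]  # host prefix, e.g. "fashion"
      return meta

  for url, created in resources:
      print(url, describe(url, created))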

Elsewhere in the document they give a different reason for their dislike of RDF - that it does not allow defaults. That's just false: OWL does not, but the logic programming approaches under discussion for the rule stack do ... RDF itself is agnostic to these things. In any case, that's not the point. I'd guess the creators of filter software that are part of this community have created their filter files with these kinds of groups - and they want to keep it this way. And maybe, for these kinds of applications, it even makes sense - as a way to preserve bandwidth ... but if this is the real reason, then it should be argued this way (although I'd still say it's wrong).

There are more problems with their current document, but this post is already ridiculously long. Just one more example: they specify trust as a core problem in the mission statement and then barely touch on it (that reminds me of a different community that does the same ... ahh, that'll be the Semantic Web community).

February 16, 2007

Mandatory Email Signature

--

FZI Forschungszentrum Informatik an der Universität Karlsruhe Haid-und-Neu-Str. 10-14
D-76131 Karlsruhe
Tel.: +49-721-9654-0
Fax: +49-721-9654-959

Stiftung des bürgerlichen Rechts
Stiftung Az: 14-0500.1 Regierungspräsidium Karlsruhe

Vorstand:
Prof. Dr.-Ing. Rüdiger Dillmann
Dipl. Wi.-Ing. Michael Flor
Prof. Dr. Dr.-Ing. Jivka Ovtcharova
Prof. Dr. rer. nat. Rudi Studer

Vorsitzender des Kuratoriums:
Ministerialdirigent Günther Leßnerkraus

Name of the company, address, form of organization, registration location and number, names of the directors and of the head of the supervisory board. This, in short, is what I and every employee in Germany have to append to every (business) email sent - mandated by law. Not kidding here - in a striking show of just how little they know about electronic communication, the German government made a law equating business emails with snail mail (where this kind of information used to be included in small print in the footer). The definition of business mail is so broad that it has to be understood as every email sent by an employee of a company to someone on the outside. Not including this kind of information results in a fine for the company. It is still unclear whether this law applies to SMS (like status SMS sent by mobile phone providers) ... And no, it is not allowed to just attach a vCard (not every application can read them) or to include a link to a website with this information.

I'm just happy that the responsible persons apparently have not yet heard of this thing called "Instant Messaging" - it would just not be the same if every message had a 10-line signature ...

February 11, 2007

Apple Knowledge Navigator

Apple's late-1980s vision of a computer interface of the future - not that different from descriptions of how people should interact with Semantic Web agents (video below, or watch it at Google Video).

Not Really Sure If I Want To ...

But anyway - as you probably guessed from the picture above, I'm now the trillionth person to stumble around in Second Life. My avatar is named Valentin Biedermann - Biedermann is a German word for a respectable, no-fun person. I intend to live up to this label by permanently carrying a sign: "Ban Genitals* in Second Life" - as soon as I figure out how to create such a sign.

* This is probably the pretext Google was waiting for to ban this blog (again). Ah well, I myself was surprised to see that there are people finding my old blog with search strings containing "gay sex blog".

February 8, 2007

Sporadic Link Post

Yahoo's Pipes

From O'Reilly Radar:

Yahoo! Pipes was released today with the goal of allowing people to easily mix, match, filter, sort and merge data sources into RSS feeds. These resulting RSS feeds are called Pipes and they allow you to do things like find all of the parks in your city or convert the news to Flickr photos. The product allows you to browse pipes, search for pipes, share pipes, or clone somebody else's pipe.

More:

Yahoo!'s new Pipes service is a milestone in the history of the Internet. It's a service that generalizes the idea of the mashup, providing a drag and drop editor that allows you to connect Internet data sources, process them, and redirect the output. Yahoo! describes it as "an interactive feed aggregator and manipulator" that allows you to "create feeds that are more powerful, useful and relevant." While it's still a bit rough around the edges, it has enormous promise in turning the web into a programmable environment for everyone.

Very cool stuff, but "generalizing the idea of the mashup" - wasn't this the job of the Semantic Web? (Yeah, a pipe-like service on RDF would be much, much cooler.)
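
Even without the drag-and-drop editor, the core of such a pipe - fetch, merge, filter, sort - fits in a few lines. A sketch in Python using the feedparser library; the feed URLs and the keyword are placeholders:

  import time
  import feedparser  # third-party feed parsing library

  FEEDS = [
      "http://example.org/news.rss",       # placeholder feed URLs
      "http://example.net/blog/atom.xml",
  ]

  def pipe(feeds, keyword):
      """Merge several feeds and keep the items mentioning a keyword, newest first."""
      entries = []
      for url in feeds:
          entries.extend(feedparser.parse(url).entries)
      matched = [e for e in entries
                 if keyword.lower() in e.get("title", "").lower()]
      matched.sort(key=lambda e: time.mktime(e.published_parsed)
                   if e.get("published_parsed") else 0,
                   reverse=True)
      return matched

  for entry in pipe(FEEDS, "park"):
      print(entry.title, "-", entry.link)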

Even more here and here.

February 4, 2007

Crowdsourcing Surveillance

This morning I spent some time trying to find Jim Gray. Jim Gray is a renowned computer scientist who went missing after setting out on a solo trip with his sailboat. A group of people got together to organize a satellite run over the area where he is suspected to be, split the pictures into many tiles, and through Amazon's Mechanical Turk site you can volunteer to analyze this data and try to find foreign objects in the satellite pictures. You can read more about it here.

Sadly I didn't find anything that looked really promising - I hope someone else has more luck.

However, I think that this kind of "crowdsourcing of surveillance" has a big future - the next time a large number of surveillance tapes needs to be looked at quickly after a terrorist attack, why not let volunteers do it? Have Israeli citizens watch real-time surveillance footage to look out for rocket launchers in Lebanon? Have relatives of soldiers back home monitor the perimeter of bases while the soldiers rest? US citizens can already help monitor the southern border from an Internet-connected computer.

February 3, 2007

Public Health And Cybercrime

From a BBC article summarizing Vint Cerf's talk in Davos:

Mr Cerf, who is one of the co-developers of the TCP/IP standard that underlies all Internet traffic and now works for Google, likened the spread of botnets to a "pandemic".

Of the 600 million computers currently on the Internet, between 100 and 150 million were already part of these botnets, Mr Cerf said.

Botnets are made up of large numbers of computers that malicious hackers have brought under their control after infecting them with so-called Trojan virus programs.

While most owners are oblivious to the infection, the networks of tens of thousands of computers are used to launch spam e-mail campaigns, denial-of-service attacks or online fraud schemes.

Technology writer John Markoff said: "It's as bad as you can imagine, it puts the whole Internet at risk."

I think likening the spread of botnets to a pandemic points in the right direction - some of the tradeoffs are similar to those faced in public health. For example, the costs of protection (the license for the virus scanner and the resulting lower performance of the computer) are borne by the user of the computer, but we all benefit from less vulnerable computers, i.e. smaller botnets. It's very similar to vaccination - getting vaccinated carries some risk, and the best case for each person is that she is the only one without a vaccination: no risk of getting the disease, and it was not even necessary to take the small risk of vaccination.

It is for this reason that government involvement could make us all better off. The government could either reward the positive side effects of protecting a computer by giving away free licenses for virus scanners (the same reasoning that often leads governments to pay for vaccinations). Or it could regulate - similar to mandatory vaccinations. It could demand that PCs be sold only with virus scanners valid for at least five years, that every email provider must scan incoming emails, and that ISPs must protect their customers with firewalls ... Yes, it could make PCs, email accounts and Internet connections more expensive - but the cost of dealing with spam, DDoS attacks and other kinds of cybercrime would decrease.

Transclusion: Fixing Electronic Literature

Enjoyable Google tech talk by Ted Nelson (famous, for example, for coining the terms hypertext and hypermedia) about how he thinks the web should be structured (his vision includes typed links ;-). I don't think the presented vision is realistic for a large part of the web - but the talk is worth the 40 minutes (plus 15 minutes of discussion). My favorite quote (taken a bit out of context):

You can read my apologies in [...] for any part I had in creating HTML links [...] I think it's one of the worst things that has happened to the human race [...]

Or watch it at Google Video.
