WWW2007 Workshop on Tagging and Metadata for Social Information Organization (12 February 2007 / 8 May 2007)

The WWW2007 Workshop on Tagging and Metadata for Social Information Organization will be held in Banff (Canada).

We hope to bring together researchers as well as practitioners to explore social, design and computational aspects of tagging and social information organization. The topic space is wide, but the following are some of the areas of special interest:

  • Semantics. Ontology and hierarchy creation; Semantic issues in cross-system tagging; Standardization efforts.
  • Cognition. The cognitive aspects of categorization; Organizing and retrieving tagged objects.
  • Social networks. Social sharing; Relationship building; Ratings systems and collaborative filtering; Identity management and self-presentation; Tagging in blogs, wikis, etc.
  • Usability and Interfaces. Search, navigation, browsing and filtering; Novice users and tags; Evaluating existing tagging environments and user behavior; Motivations for tagging.
  • Multimedia. Tagging multiple kinds of media; Tagging across media types; Tagging at varying scales; Non-text-based annotations and tags.

Innovative Collaborative Filtering Venture going down: Findory

Findory was one of the most innovative collaborative filtering applications ever: it provided a recommender system to help you find interesting news. Greg is letting Findory go:

Findory appears to have sufficient resources to run on autopilot through most of 2007. Findory will eventually fade away, but I believe it has touched immortality through the impact it had.

Apparently, Greg wants “to spend more time on health and with family.” Also, it is no secret that Greg was unhappy about the popularity of the site, and Findory was facing the “me too!” routine, with all the big players copying his ideas or simply creating similar tools.

I think that in the coming decade, dropping projects and reducing your workload to spend more time with your family will become a cool thing to do.

Projects and companies have to come and go. We should not be slaves to our projects.

Color terminal under Mac OS X

One thing that has annoyed me since I started using Mac OS X is that there is no color in the terminal. So I added the following lines to my .bashrc file:


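# Advertise a color-capable terminal type
export TERM="xterm-color"
# -G turns on colorized output in the BSD ls that ships with Mac OS X
alias ls="ls -G"
# Bold green user@host, bold blue working directory, then $ and a color reset
PS1="\[\033[01;32m\]\u@\h\[\033[01;34m\] \w \$\[\033[00m\] "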

For some reason, I also had to add the following line at the end of the global bashrc file (/etc/bashrc) so that my user bashrc file is read:


# Source the per-user file only if it exists and is readable
[ -r ~/.bashrc ] && . ~/.bashrc

See also my post I have had it with Firefox under MacOS.

No Great Researcher is Special

Peter Turney tells us that there is almost no instance of a great discovery or invention that was not made independently, and nearly simultaneously, by more than one person. I think it is very important to keep this in mind, given our culture of overinflated egos.

Peter then asks a very important question. If his theory is true, and he offers strong backing for it, then why is this fact getting so little recognition?

Turing award recipient tears apart artificial intelligence

Nice article by Peter Naur, Turing award recipient, in the latest issue of Communications of the ACM (January 2007). He takes on the hypothesis that the human mind is nothing but a Turing machine, and tears it apart:

(…) human thinking basically is a matter of the plasticity of the elements of the nervous system, while computers—Turing machines—have no plastic elements.

Naturally, this is not the first time someone has objected to this hypothesis, which motivates the field of Artificial Intelligence (most famously, Roger Penrose also objected, with a completely different line of argument).

I’m not sure I understand why we can’t model the neural system in a Turing machine. My concern would be more to see whether we can do so in almost O(n) time, where n is the number of neurons or another complexity metric. It would suffice that machine intelligence requires prohibitive (not necessarily NP-hard) computations for Naur’s point to hold. Even if, on paper, you could simulate a brain in O(n) time, you would still have to demonstrate that the simulation is numerically stable.

He takes a shot at Natural Language Processing:

Talking of verbal ‘word senses’ given by ‘sets of linguistic contexts’ is an impossible way of describing human linguistic activity. Choosing between alternative senses of a polysemous word does not arise when people speak. (…) Typically the meaning of a word is ephemeral, entirely a matter of the particular conversation taking place.

The guy is interesting and clearly has a sizeable ego:

I have tried to have these articles published in journals, so far without any success. The present presentation, when published in the Communications of the ACM, will in fact be the first presentation of the Synapse-State Theory of mental life to appear in a journal. So I am clearly at the beginning of that twenty year period that it usually takes to have a scientific breakthrough accepted.

He is clearly a troublemaker, and I’m sure that Communications of the ACM debated whether to publish his article. But it is hard to deny a Turing Award recipient the right to publish. Maybe some will regret that he was awarded the prize in the first place.

I am now expecting a debate to ignite. Naur has made it clear that he expects to be around another 20 years to defend his point. This should prove interesting. Much is at stake.

Too Much Semantics is Harmful in Information Technology

It has become evident that, in the realm of Web Services, the REST paradigm is taking over while the Simple Object Access Protocol (SOAP) is progressively being forgotten, except in some academic circles and by some companies interested in selling tools to alleviate the pain (see note 1).

Here is what Clay Shirky was saying in 2001:

This attempt to define the problem at successively higher layers is doomed to fail because it’s turtles all the way up: there will always be another layer above whatever can be described, a layer which contains the ambiguity of two-party communication that can never be entirely defined away.

No matter how carefully a language is described, the range of askable questions and offerable answers make it impossible to create an ontology that’s at once rich enough to express even a large subset of possible interests while also being restricted enough to ensure interoperability between any two arbitrary parties.

The sad fact is that communicating anything more complicated than inches-to-millimeters, in a data space less fixed than stock quotes, will require AI of the sort that’s been 10 years away for the past 50 years.

The main reason being put forward is that SOAP is simply too complex. But what does complexity mean here? The Web is incredibly complex if you consider how many parts it has, yet we consider it to be simple.

How do we recognize a simple technology? The first criterion any engineer would use is the number of points of failure. SOAP architectures can break in many more ways than REST architectures, and so they are more complex. Meanwhile, theoretical computer science teaches us that something is more complex if it requires more CPU cycles to run. SOAP architectures are more complex in this light as well, as there is simply a lot more XML going around and the requests are far more verbose.

I’d like to propose that there is another criterion for complexity: semantics. One should always aim for the simplest possible solution… and providing lots of semantics is not a simple feat. SOAP architectures necessarily include semantics to define the meaning of terms used in the description and interfaces of the service. This is totally absent from REST architectures. It is not so much that there is no semantics in the REST paradigm, but it is kept extremely simple: you only need to know about the semantics of the main HTTP operations (POST, GET, PUT and DELETE). In fact, the Wikipedia REST entry includes the following quotation attributed to Roy Fielding:

REST’s client-server separation of concerns simplifies component implementation, reduces the complexity of connector semantics (…)

I think this is fundamental. What makes REST simple is that it reduces the amount of semantics the software has to worry about.
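To make the point concrete, here is a minimal sketch of what "only four operations" can look like. It is not a real web framework: the `handle` function, the in-memory `store`, and the status codes chosen are assumptions made for illustration. Every resource, whatever it represents, is manipulated through the same small set of verbs.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory resource store, keyed by path.
store: Dict[str, str] = {}
_next_id = 0


def handle(method: str, path: str, body: Optional[str] = None) -> Tuple[int, Optional[str]]:
    """Dispatch on the HTTP-style method; return (status code, response body)."""
    global _next_id
    if method == "POST":
        # Create a new resource under a collection and return its path.
        _next_id += 1
        new_path = f"{path.rstrip('/')}/{_next_id}"
        store[new_path] = body or ""
        return 201, new_path
    if method == "PUT":
        # Create or replace the resource at exactly this path.
        status = 200 if path in store else 201
        store[path] = body or ""
        return status, None
    if method == "GET":
        if path in store:
            return 200, store[path]
        return 404, None
    if method == "DELETE":
        if path in store:
            del store[path]
            return 204, None
        return 404, None
    # Anything else is outside the uniform interface.
    return 405, None
```

A client that understands these four verbs can call `handle("POST", "/products", "widget")` and then `handle("GET", "/products/1")` without knowing anything else about what a product is.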

Why would semantics be a bad idea? Well, simply because semantics implies coupling, and too much coupling makes a system too complex. Without any coupling, we cannot do anything, but when we throw in too much, we harm the system. What type of coupling are we talking about? Well, if I pass the variable x to the function f, there is relatively little coupling. All I do is establish a relationship between the function f and the variable x. But what if x is meant to be the cost of a product? Then x must be tied explicitly to the product ID, to some price identifier, and so on. This makes the system harder to maintain, harder to debug, and more failure-prone.

Fundamentally, software design is about communication. But not communication between machines… rather communication between developers. And communication between distributed folks works much better when the message they need to send to each other is kept very simple. That is why the SOAP philosophy is fundamentally flawed.

So, when you design software, you should include as little semantics as possible as this will make your system simpler, and thus, easier to manage.

This is, of course, contrary to what AI enthusiasts do.

1. See recent posts by Larry O’Brien and Nick Gall.

Priority R-Tree

I have been doing some reading on Priority R-Trees (PR-Trees). R-trees are the equivalent of B-trees for spatial data. Apparently, PR-Trees perform much like R-trees on average, but with a much better worst-case bound.
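For readers who have not met R-trees, here is a rough sketch of the basic idea: each node stores a minimum bounding rectangle, and a window query skips any subtree whose rectangle misses the query. This is a naive packed tree, not the PR-Tree construction from the paper. The names `Rect`, `Node`, `build` and `search`, and the sort-by-xmin packing, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rect:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "Rect") -> bool:
        # Two axis-aligned rectangles overlap iff they overlap on both axes.
        return (self.xmin <= other.xmax and other.xmin <= self.xmax
                and self.ymin <= other.ymax and other.ymin <= self.ymax)


def bounding(rects: List[Rect]) -> Rect:
    """Smallest rectangle enclosing all the given rectangles."""
    return Rect(min(r.xmin for r in rects), min(r.ymin for r in rects),
                max(r.xmax for r in rects), max(r.ymax for r in rects))


@dataclass
class Node:
    mbr: Rect        # minimum bounding rectangle of everything below
    children: list   # Rect objects in a leaf, Node objects otherwise
    is_leaf: bool


def build(rects: List[Rect], fanout: int = 2) -> Node:
    """Naive bulk load: sort by xmin, pack leaves, then pack levels upward."""
    if not rects:
        raise ValueError("need at least one rectangle")
    rects = sorted(rects, key=lambda r: r.xmin)
    level = [Node(bounding(rects[i:i + fanout]), rects[i:i + fanout], True)
             for i in range(0, len(rects), fanout)]
    while len(level) > 1:
        level = [Node(bounding([n.mbr for n in level[i:i + fanout]]),
                      level[i:i + fanout], False)
                 for i in range(0, len(level), fanout)]
    return level[0]


def search(node: Node, query: Rect) -> List[Rect]:
    """Return stored rectangles intersecting the query, pruning by MBR."""
    if not node.mbr.intersects(query):
        return []
    if node.is_leaf:
        return [r for r in node.children if r.intersects(query)]
    hits: List[Rect] = []
    for child in node.children:
        hits.extend(search(child, query))
    return hits
```

How much pruning happens depends on how well the rectangles are grouped. As I understand it, that grouping is where the PR-Tree differs from this naive packing.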

Reference

Lars Arge, Mark de Berg, Herman Haverkort, and Ke Yi, The Priority R-Tree: A Practically Efficient and Worst-Case Optimal R-Tree, In Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data (SIGMOD ’04), Paris, France, June 2004, 347-358.

Human Brain Evolution Slows To A Crawl

According to Chung-I Wu from the University of Chicago, the human brain hardly evolves anymore.

The researchers say the slowdown may be due to the increasing complexity of interactions within the brain. “We know that proteins with more interacting partners evolve more slowly,” Wu said. “Mutations that disrupt existing interactions aren’t tolerated. On the basis of individual neurons of the brain, humans may indeed have a far more active, or even more complex, transcription profile than chimpanzee. We suggest that such abundant and complex transcription may increase gene-gene interactions and constrain coding-sequence evolution.”

Any mathematician worth their salt will think of an iterative algorithm that has reached a local maximum and can no longer improve the solution by much. Are human beings a local maximum of evolution?

What is also interesting is that chimps are now evolving faster than we are. This feels like a David Brin novel!
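Here is a toy sketch of that analogy: a greedy hill climber that stops as soon as neither neighbor is better, even when a higher peak exists elsewhere. The `fitness` landscape is entirely made up for illustration and is not a model of evolution.

```python
def hill_climb(f, x, step=1):
    """Move to the better neighbor until neither one improves on x."""
    while True:
        best = max((x - step, x + step), key=f)
        if f(best) <= f(x):
            return x  # no neighbor is strictly better: a local maximum
        x = best


def fitness(x):
    # Hypothetical landscape: a small peak at x = 2 and a higher one at x = 10.
    return max(5 - (x - 2) ** 2, 20 - (x - 10) ** 2)
```

Starting from 0, the climber gets stuck on the small peak at 2. Starting from 6, it reaches the higher peak at 10. Where you end up depends on where you started.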

(Source: Peter Turney.)

Peter Turney launches his blog

Peter Turney now has a blog. Who is Peter? One of the most interesting researchers I have ever met. It is not so much that the research he does is different from any other research, but it is the way he does it.

His first post is rather deep, but I was able to follow most of it. He makes us a promise:

I suspect that it may be possible to reduce attributional similarity to relational similarity, and I have started sketching an algorithm for performing this reduction.

Maybe we will learn more about it later?

Outsourcing Email: Universities Switching to Google Apps for Education

My friend Owen sent me this article about Lakehead University switching more than 38,000 e-mail accounts to Google Mail in three days. Here’s the core of the article:

Because it is getting the whole suite for no charge and it is entirely hosted by Google rather than on university hardware, the university expects to save $2 million to $3 million a year on maintenance and about $6 million annually on infrastructure.

And, Jafri said, students, staff and faculty now get 2GB each of storage space, versus 60MB with the old system. In addition, he expects Google to deliver 99 per cent availability. “It’s very hard for us to get to that level of availability.”

There is no doubt in my mind that we will see a progressive but, eventually, extensive outsourcing of email functions by all small to medium organizations. What is interesting is that public institutions are starting to do it and that it is making the press.

(Disclaimer: I switched to Gmail a long time ago and have never looked back.)
