Hierarchy of Collaborative Filtering Distribution

I think that, increasingly, both creators and clients want to regain control. The beauty of it is that businesses can be built on putting customers back in charge. To a large extent, I keep preferring Amazon to my local bookstore in part because I have more control when using Amazon.

Increasingly, we are seeing that creators want to stay in control. Publishers struggle to stay in charge, but they are fighting a losing war. The next logical step is that the clients will want more “control” as well.

This issue led me to design this “Hierarchy of Collaborative Filtering Distribution”1.

Definition: In a collaborative filtering recommender system, we have two types of human agents: the creators who want to sell their content, and the clients who are willing to share some of their preferences. By this definition, Google is typically not a collaborative filtering recommender system.

Level 1. The data (and goods) are centralized. The creators relinquish total control. The clients need to trust one entity with their preferences. The business value is in controlling the channel and the data, and in providing good tools. (Think: Amazon, or standard distribution channels.)

Level 2. Only the meta-data is centralized. The creators keep the control, but the clients need to trust one entity with their preferences. Some of the business value lies in the client’s metadata. (Think: inDiscover.)

Level 3. Both the data and the metadata are distributed, and only the aggregation needs to happen at one point of contact. The clients and the creators use interoperable tools and data formats and keep tight control of their data. The business value is in the tools and services themselves, not in the data. (Think: Semantic Webish applications.)

Regarding music, getting to level 3 is not hard. Sites like inDiscover and webjay already make playlists available in XML. This is where the work of people like Lucas Gonze on XML formats for MP3 playlists can become interesting.

Imagine a world where artists post on various web sites not only their MP3s, but also some standard XML file allowing aggregation. Imagine also that users post their playlists (inDiscover and webjay users do this already). We then have the possibility of a level 3 distributed recommender system “à la Semantic Web”.

This can then be very interesting research-wise and business-wise.
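To make the level 3 idea a little more concrete, here is a minimal sketch of what the aggregation step could look like. Everything in it is an assumption made for illustration: the playlists are hard-coded instead of being fetched from independent sites and parsed from XML, the user and track names are made up, and simple co-occurrence counting stands in for whatever recommendation method a real aggregator would use.

```python
from collections import Counter
from itertools import combinations

# Hypothetical playlists, as they might look after being fetched from
# several independent sites and parsed (fetching and parsing omitted).
playlists = {
    "alice": ["song_a", "song_b", "song_c"],
    "bob": ["song_a", "song_b"],
    "carol": ["song_b", "song_d"],
}

def cooccurrence(playlists):
    """Count how often each ordered pair of tracks shares a playlist."""
    counts = Counter()
    for tracks in playlists.values():
        for x, y in combinations(sorted(set(tracks)), 2):
            counts[(x, y)] += 1
            counts[(y, x)] += 1
    return counts

def recommend(track, playlists, k=3):
    """Return up to k tracks that most often appear alongside `track`."""
    counts = cooccurrence(playlists)
    scored = [(n, other) for (t, other), n in counts.items() if t == track]
    # Highest count first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:k]]

print(recommend("song_a", playlists))
```

The point of the sketch is that the aggregator only needs read access to published playlists; the preferences themselves stay wherever the users chose to post them.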

Update: Rod Savoie points me to DLORN (Distributed Learning Object Repository Network) as a related tool.

1- I checked and this concept appears to be new. If you ever use it, you have to cite this blog entry! There is related work however, such as Tomas Olsson, Bootstrapping and Decentralizing Recommender Systems, 2003 and Resource Profiles by Stephen Downes (also in 2003).

Open University announces £5.6m project to make learning material free on the internet

The Open University (OU) announced a £5.6m project to make learning material free on the internet. The OU is the largest university in the UK by the number of students.

The Open University today announced a GBP £5.65 million (US $9.9 million) project to make a selection of its learning materials available free of charge to educators and learners around the world.

Supported by a grant of US $4.45 million from The William and Flora Hewlett Foundation the University will launch the website in October 2006.

The provision on the internet of ‘Open Educational Resources’, free at point of use and available to everyone, reflects The Open University’s mission of promoting fair access for all. During the initial phase of this initiative, the University will select and make available educational resources from all study levels from access to postgraduate and from a full range of subject themes: arts and history, business and management, health and lifestyle, languages, science and nature, society and technology. Learners will also be able to benefit from a range of study skills development material.

In this Wikipedia and OpenCourseWare (OCW) era, free, systematic and organized educational online content is not, in itself, a very impressive proposition.

What puzzles me is how you can spend so much money just to “select and make available some educational resources”. Some web developers are really overpaid these days! (yes, that’s a joke)

Kidding aside, with this kind of budget, I’m really looking forward to seeing what they’ll put together. It had better be more impressive than a few pages with PDF documents on them. And it had better be more than “here’s the first lesson in our calculus course, enter your VISA card number here if you want to take up the rest of the course”.

With this kind of money, they should be able to innovate somewhat. Let’s see what they come up with.

Palo – Open-Source MOLAP for Excel

I guess I knew about Palo but I haven’t blogged it yet. Anyhow. Palo is a free open source MOLAP backend for Excel with .Net and Java APIs. They now have a Linux version of their server. While I’m not an Excel user myself, not even a Windows user, Palo has the potential to turn Excel into a very potent Business Intelligence tool, for free. It won’t be quite as scalable, with regard to the number of dimensions, as commercial tools, but it is probably faster than most alternatives.

The company behind it, Jedox, looks interesting. But I would be a bit scared to have a company rely on a product (Excel) that I don’t control. Microsoft can, at any time, deeply change the way Excel works through a major update. Whatever technology they have could become obsolete overnight if Microsoft so wishes. Sounds scary.

(Disclaimer: I have never used Palo nor any Jedox product or service.)

FIFO Data Structure in Python

This post is obsolete, see this more recent discussion.

For some odd reason, Python doesn’t come with a good FIFO data structure (as of 2.4). Here’s one.


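class Fifo:
    """FIFO queue built from two lists: data[0] is the outbox, data[1] the inbox."""

    def __init__(self):
        self.data = [[], []]

    def append(self, value):
        # New items always go to the end of the inbox.
        self.data[1].append(value)

    def pop(self):
        # Outbox empty: swap the two lists, then reverse the new outbox
        # so that the oldest item sits at the end, ready to be popped.
        if not self.data[0]:
            self.data.reverse()
            self.data[0].reverse()
        return self.data[0].pop()

    def __len__(self):
        return len(self.data[0]) + len(self.data[1])

    def tolist(self):
        # Oldest item first: the outbox is stored in reverse order.
        temp = self.data[0][:]
        temp.reverse()
        return temp + self.data[1]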
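
For comparison, here is a short sketch of the same first-in, first-out behaviour using collections.deque from the standard library of current Python versions. It is an alternative to the class above, not a rewrite of it, and the item values are made up.

```python
from collections import deque

queue = deque()
queue.append("first")
queue.append("second")
queue.append("third")

oldest = queue.popleft()   # removes the oldest item from the front
remaining = list(queue)    # oldest first, like Fifo.tolist()

print(oldest, remaining, len(queue))
```

Both approaches avoid calling pop(0) on a plain list, which shifts every remaining element on each call.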

China and India as academic powerhouses?

Ian cites the Guardian:

China and India – already braced to become two of the world’s greatest economic powers – are now expected to become two of its most important academic powerhouses.

What I am waiting for is for the business world to shift to Asia. Right now, blue-collar jobs, tech jobs, and now even academic jobs are moving to Asia, but it seems business people get to keep their jobs, for now. This drives students to get business degrees. The typical explanation is that the markets are in the Western world. For how long?

On the other hand, how can we explain that with the hard time Asia is giving us, the unemployment rate remains very low in Canada?

The answer is that we are driven by the service industry. But can we sustain a service industry if the rest of the industrial backbone is in Asia? For how long?

UCR Time Series Classification/Clustering page

Eamonn Keogh just posted this announcement:

We are pleased to announce a new resource for researchers working on time series classification and clustering.

The UCR Time Series Classification/Clustering page has the largest collection of test datasets (with objective class labels) in the world.

Furthermore, in order to encourage research that is both reproducible and comparable to other works, we have created training/test data splits, and listed results for some standard benchmarking algorithms such as Euclidean Distance and Dynamic Time Warping.

We have also made Matlab code available that allows all results to be exactly reproduced.

The page is located here: http://www.cs.ucr.edu/~eamonn/time_series_data/

We encourage donations of datasets/results and suggestions/bug fixes.
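
For readers who have not met the two benchmark distances named in the announcement, here is a small sketch of Euclidean distance and Dynamic Time Warping used with a one-nearest-neighbour classifier. This is not the Matlab code from the UCR page: the toy series and labels are made up, and the DTW variant (squared pointwise cost, no warping window) is just one common choice.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw(a, b):
    """Dynamic Time Warping distance with squared pointwise cost, no window."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])

def nearest_label(query, train, distance):
    """1-nearest-neighbour: label of the closest training series."""
    return min(train, key=lambda item: distance(query, item[1]))[0]

# Made-up toy data: (label, series)
train = [
    ("bump_early", [0, 1, 2, 1, 0, 0, 0]),
    ("flat", [0, 0, 0, 0, 0, 0, 0]),
]
query = [0, 0, 0, 1, 2, 1, 0]  # the same bump, shifted in time

print(nearest_label(query, train, euclidean))
print(nearest_label(query, train, dtw))
```

On this toy example, Euclidean distance matches the shifted bump to the flat series, while DTW warps the time axis and matches it to the other bump. Which distance does better on real data is exactly the kind of question fixed training/test splits help answer.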

The Cost of Graduation

From Ian I got to this Observer article. The story is that university education is becoming less attractive as the costs increase and the salaries for graduates don’t correspondingly increase. The net result is that universities might be in an increasingly competitive game as western universities progressively lose their edge in the worldwide market and as students look for more cost-efficient alternatives.

(…) more and more A-level students ask about alternatives to university, said the author of the research, Peter Brown, director of Gabbitas Educational Consultants. (…) we are seeing Chinese universities [are also] more financially attractive.

Asian universities stand to win big. In the western world, the first university to offer high quality, but significantly cheaper university education, by essentially cutting down on the fat and keeping what really matters, is going to win big time.

We are at a pivotal point where it might be good timing for a radical rethinking of university education.

Did I mention that you can listen to Stanford lectures on your iPod? See http://itunes.stanford.edu/. Chances are good that Stanford will be among the winners.

On moving a sofa around a corner

I just thought it was a cool title for a paper (if anyone has read it, let me know if it is any good):


On moving a sofa around a corner

in Geometriae Dedicata, Volume 42, Number 3, June 1992

Joseph L. Gerver

A necessary condition is given for a region of the plane to have the greatest possible area of any region able to move around a right-angled corner in a hallway of unit width. A region is constructed, with area 2.2195… and bounded by 18 analytic pieces, which satisfies this condition. It is conjectured that this is the unique region of maximum area.

Problem Solving Heuristics

Ian recalls some of the basic problem solving heuristics:

  • If you are having difficulty understanding a problem, try drawing a picture.
  • If you can’t find a solution, try assuming that you have a solution and seeing what you can derive from that (“working backward”).
  • If the problem is abstract, try examining a concrete example.
  • Try solving a more general problem first. This is the “inventor’s paradox”: a more ambitious plan may actually have more chances of success.

While I never studied these heuristics, I think I use them all. I probably learned them by trial and error. Maybe we ought to teach those.

I would add a few which I feel are very potent:

  • Try to sketch a solution hastily, then try to find faults in your solution.
  • If you can’t solve a problem, try to solve a related, but simpler problem.
  • If you can’t solve a problem, try dividing it into smaller problems (divide-and-conquer).

The Combinatorial Object Server

It looks like it is quite old, but I found the Combinatorial Object Server for the first time this week and I thought I’d share it with my readers. I was looking for irreducible polynomials with binary coefficients (don’t ask why) and I found that this server can generate them on the fly for you! A beautiful application of the web.

Here are some things it can do:

  • Permutations and their restrictions
  • Subsets or Combinations
  • Permutations or Combinations of a Multiset
  • Set Partitions
  • Numerical Partitions and relatives
  • Binary, rooted, free and other trees
  • Necklaces, Lyndon words, DeBruijn Sequences
  • Irreducible and Primitive Polynomials over GF(2) to GF(5)

This reminds me a bit of the famous Plouffe’s inverter which, given a floating point number, will give you a matching mathematical constant.
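
For the curious, here is a brute-force sketch of how irreducible polynomials with binary coefficients can be enumerated. It is certainly not how the Combinatorial Object Server does it; the integer-bitmask representation and the function names are my own choices for illustration, and trial division is only practical for small degrees.

```python
def degree(p):
    """Degree of a GF(2) polynomial stored as an int (bit i = coefficient of x^i)."""
    return p.bit_length() - 1

def poly_mod(a, b):
    """Remainder of a divided by b over GF(2); subtraction is XOR."""
    db = degree(b)
    while a and degree(a) >= db:
        a ^= b << (degree(a) - db)
    return a

def is_irreducible(p):
    """Trial division by every polynomial of degree 1 up to deg(p) // 2."""
    d = degree(p)
    if d < 1:
        return False
    for q in range(2, 1 << (d // 2 + 1)):
        if poly_mod(p, q) == 0:
            return False
    return True

def irreducibles(d):
    """All irreducible polynomials of exact degree d over GF(2)."""
    return [p for p in range(1 << d, 1 << (d + 1)) if is_irreducible(p)]

def show(p):
    """Human-readable form, highest power first."""
    terms = []
    for i in range(degree(p), -1, -1):
        if (p >> i) & 1:
            terms.append("1" if i == 0 else "x" if i == 1 else "x^%d" % i)
    return " + ".join(terms)

for p in irreducibles(4):
    print(show(p))
```

For degree 4 this prints the three irreducible polynomials x^4 + x + 1, x^4 + x^3 + 1 and x^4 + x^3 + x^2 + x + 1.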
