Tag-Cloud Drawing: Algorithms for Cloud Visualization

You might have noticed tag clouds if you are into Web 2.0. I think they are an interesting widget on their own. With Owen Kaser, I wrote a paper on tag-cloud optimization. I expect this will be a popular topic. Our paper has a definitively hard component, with lots of (non-human) experimental evidence:

Tag clouds provide an aggregate of tag-usage statistics. They are typically sent as in-line HTML to browsers. However, display mechanisms suited for ordinary text are not ideal for tags, because font sizes may vary widely on a line. As well, the typical layout does not account for relationships that may be known between tags. This paper presents models and algorithms to improve the display of tag clouds that con- sist of in-line HTML, as well as algorithms that use nested tables to achieve a more general 2-dimensional layout in which tag relationships are considered. The first algorithms leverage prior work in typesetting and rectangle packing, whereas the second group of algorithms leverage prior work in Electronic Design Automation. Experiments show our algorithms can be efficiently implemented and perform well.

To appear in the proceedings of the Tagging and Metadata for Social Information Organization (WWW 2007) workshop.

Download it on arxiv.

Making Firefox Prettier under Linux

I think Firefox is a capable browser. It is also multiplatform. Unfortunately, I feel that the Linux version is sometimes far uglier than the Windows version. This has to do, partly, with the fact that Firefox uses GTK. This Gentoo HOWTO on how to Integrate Firefox with KDE helped me quite a bit making Firefox look nicer.

Yes. Looks do matter.

Google Summer of Code and Collaborative Filtering

Andre reports that the Taste Collaborative Filtering library was selected by the Google Summer of Code program. You can check out the various summer project ideas. Among those, you have multidimensional rating algorithms: for a reference, see my 2003 paper with Harold Boley and others.

Seems like Google likes collaborative filtering. Last year, the collaborative filtering algorithm Slope One was integrated in the Drupal project.

The big scientific questions in computing have been answered

The Economist tell us about the downfall of Vannevar Bush’s research model: ideas are conceived in universities and then passed on to industry for commercialization. I honestly do not know if this model was ever true. I do not think that academic research has ever been about producing ideas that industry can commercialize.

Many people are quite happy to consider that academic research is meant to address longer term issues whereas industry must tackle immediate needs, but as the article points out, there are major flaws in this theory. Most industry researchers are part of solid teams where people all have (relatively) long lasting jobs. Meanwhile, university researchers are often surrounded by students who come and go. In universities, the researchers often must keep seeking funding, whereas, in industry laboratories, funding is the manager’s problem. Longer term thinking? Let us look at most university researchers publication lists: often we see lots of short papers, lots of short time goals. Maybe these all fit together into a long term plan, maybe they don’t. Meanwhile, increasingly, industry researchers must go from the fundamental ideas all the way to the product, a really long cycle in comparison.

It would be more honest to say that industry researchers have to try to deliver commercial products whereas academic researchers do not, rather than to assume that one is either complimentary to the other. I do not think that industry researchers spend much time reading academic papers. They are too busy learning about how their ideas can be taken to market. Even if you have the best laboratory in the world, and you have Claude Shannon, and Hamming and so on, working for you, in 2007… it might not help you that much. The Nortel of the world have their software written in India and their hardware components designed in China.

What about Computer Science? The article is quite pessimistic about academic research:

In Bush’s time the science that went into computing was itself closer to basic research. By contrast, many of the big scientific questions in computing have been answered—at least well enough for companies to find that innovation emerges from new ways of arranging today’s technologies rather than inventing new ones.

Fine. But we have to be careful about what this means. It only means that given our industrial needs and abilities, building a new generation of tools is not worth it. We prefer to build iPods out of existing parts. It is simply too expensive at this point, given what the potential gains might be, to build drastically new concepts about computing. But it could all change suddenly.

Streaming Maximum-Minimum Filter Using No More than Three Comparisons per Element

My paper Streaming Maximum-Minimum Filter Using No More than Three Comparisons per Element will appear in the Nordic Journal of Computing. Despite the scary title, anyone with an elementary Mathematics background can read and appreciate the paper. There are mathematical proofs, but they are simple. The algorithm itself is very simple (that is the whole point of my paper). I am pretty proud of this small paper because I think I was able to find one of the few remaining basic algorithmic problems that did not have a satisfactory solution: people have worked on this problem for the last 20 years and it seems to me that they found increasingly complicated solutions, whereas what I propose goes back to the roots (it looks a lot like the early solutions that were proposed).

This is subjective, of course, since people patented previous solutions to this problem and some would object that they already have better solutions (in some technical sense). Some may also object that it is not a basic problem, or that there are many more basic problems out there to be found. Well, read my paper and see for yourself.

Here is the abstract:

The running maximum-minimum (max-min) filter computes the maxima and minima over running windows of size w. This filter has numerous applications in signal processing and time series analysis. We present an easy-to-implement online algorithm requiring no more than 3 comparisons per element, in the worst case. Comparatively, no algorithm is known to compute the running maximum (or minimum) filter in 1.5 comparisons per element, in the worst case. Our algorithm has reduced latency and memory usage.

There is one very interesting open problem arising from this paper. Can you find an optimal algorithm (and prove it is optimal) of the running max-min problem or at least, a tight lower bound on the number of comparisons per element required? I conjecture that my solution, being so elegant, is probably optimal is some sense, but I could not prove it. Further investigations of this problem are probably not difficult, but they may require more complicated Mathematics.

One More Step Toward Infinite Storage

Acccording to a new IDC study reported in Wired, the world had 185 exabytes of storage available last year and will have 601 exabytes in 2010. Meanwhile, the amount of “digital information” generated will grow from 161 exabytes last year to 988 exabytes in 2010.

Their point is that we lack the storage capacity to store everything. This seems to go against the theory that we nearly have infinite storage. But I do not think so. How can they tell how much storage is available? Do they include the NSA? Do they include Echelon? Do they include all of the secret agencies in the world storing massive quantities of data in general?

What might be a more interesting observation is that few people store everything as of now. And I do not expect that people will start storing everything soon. Storage costs must still come down a bit, and software must adapt. But in a few short years, everyone will store copies of everything. And managing all this data, whatever managing means, will become a big deal. And it will not be a nice database problem either because this data will not follow nice database schemas.

Oracle buys Hyperion

This might be a very big deal: Oracle just bought Hyperion for US$3.3 billion. The most obvious effect is that the Business Intelligence market has now fewer players then ever and Oracle is now a very serious player indeed.

I wonder what this will mean for the defunct JOLAP? Could Oracle bring it back? It would be really nice to have one standard API for OLAP, even if it is limited to the Java programming language.

(Source: Owen)

18 queries. 0.389 seconds. Valid XHTML

Powered by WordPress

Subscribe to this blog in a reader or by Email.