Geometric Wavelets

Ah! Just learned about Geometric Wavelets. Most recent work on wavelets as been very uninspiring to me, but this is pretty good. Seems it has been around for a number of years (most recent paper I can see is Le Pennec and Mallat in 2000).

The idea is simple, but I only saw one talk about it, so don’t take my word for it, read the papers.

Take a function f(x) over any (convex) region S. The function is approximated by some polynomial (of fixed degree), call it p(x). Split the region exactly in two by a straight line, so that the two remaining regions S1 and S2 are still convex. Over each subregions, by regression, solve for the polynomials (of fixed degree) p1(x) and p2(x) approximating f(x). Optimize the splitting so that the approximation error made by p1 and p2 is minimal. Then your two new wavelets are the polynomials p(x)-p1(x) and p(x)-p2(x) limited to S1 and S2 respectively. These wavelets are not continuous, but if f(x) is itself a polynomial (up to a fixed degree), then the wavelets vanish. Repeat this splitting for as long as you need to.

The gist of the result is that you can do state-of-the-art image compression. This is surprising because the wavelets are rather complex and difficult to encode especially if you consider that encoding their support region is no joke. The key point is that you can define the region over which each wavelet is defined simply by encoding the various splitting lines, and that’s an easier problem than encoding polygonal regions from scratch.

I strongly suspect that these algorithms are not practical though. They use up too many CPU cycles to save up on bandwidth and storage space in a world where storage is infinite but CPU cycles are comparatively more expensive. But the idea is neat!

Text Mining ICML 2006 Tutorial Slides

This is too good not to pass on. Ronen Feldman posted slides from his ICML 2006 Tutorial on Text Mining.

In this tutorial we will present the general theory of Information Extraction and will demonstrate several systems that use these principles to enable interactive exploration of large textual collections. We will present a general architecture for information extraction and will outline the algorithms and data structures behind the systems. The Tutorial will cover the state of the art in this rapidly growing area of research. Several real world applications of information extraction will be presented in the areas of business intelligence, competitive intelligence, bio information, and military intelligence.

(Source: Yuhong (by email).)

Leaving for Curves and Surfaces (Avignon)

In a few hours, I’m “planing” (opposite of “deplaning”) for Avignon, France where I’ll attend Curves and Surfaces 2006. The web site is currently down.

I will be talking about monotone curves.

For applications such as pattern recognition, curve reconstruction and so on, it is important to be able to study the properties of curves and chains (a chain is just a discrete, usually finite, curve). For discrete functions, while we can’t talk about smoothness, we can talk about monotonicity, for example. A perfectly monotone (or piecewise monotone) discrete function (or signal) is unlikely to be noisy.

What is the equivalent of monotonicity for curves?

The typical definition of a monotone curve is a so-called v-monotone curve: under a change of basis where v is aligned with the x-axis, then the curve’s x-component is always increasing. For most settings, this is a very strong requirement.

We decided to look at an alternative definition of what it could mean for a curve (or a chain) to be monotone. For functions, we know that f is monotone if the inverse image of balls are connected. So, we decided that an arc-length parametrized curve s would be R-monotone if the inverse images of balls are connected. We go on to show it is a sensible definition. The definition also applies to chains. We can then filter noisy chains to increase their degree of monotonicity (R).

In the coming months/weeks, I’ll post the preprint. It is also available to those who ask by email.

Update: I posted my slides on the web. Comments are invited even if you don’t attend Curves and Surfaces.

Kamel Aouiche in Montreal!

I met Kamel in Montreal today. I invited him over for a post-doc and he arrived 2 months ahead of time (talk about eagerness!) from Lyon (Eric laboratory). Kamel is a very promising, young (well, younger than me) researcher in OLAP, Data Warehousing and Data Mining. Among other things, Kamel will be working of view size estimation in data warehouses.

Oh! How did Kamel learned about me and my work? Nah! Not a conference or a talk… nothing like that, he just got to me through my blog! So this blog has a purpose after all!

Slashdot | String Theory a Disaster for Physics?

According to Slashdot, Mathematician Peter Woit of Columbia University calls String Theory ‘a disaster for physics.’ Reading the article, you learn that “Virtually every young mathematically inclined particle theorist must sign on to the string agenda to get an academic job.” I think this is probably more common than we care to admit. Academia tends to reinforce its own bias.

  • Big shot professor X comes up with an idea.
  • He gets 20 Ph.D. students to work on it. A few good results come up, but mostly, the idea remains a promise.
  • Other soon follow the new trend, or at least, they send some of their Ph.D. students to the front, just in case.
  • There is little evidence that the idea is worthwhile, but this is quickly dismissed… “we just need more research”…
  • As the trend grows, researchers who have invested a lot in the idea become numerous, and they start infiltrating hiring committees, program committees, funding agencies, and so on. Their career now depends on the fact that the trend continues. After all, trying something new is much harder than just continuing whatever we are doing!
  • The idea finally bears some fruits. There is some very limited industry adoption. It is hyped by academia the world over.
  • Nothing further comes up, people start recycling old ideas. Researchers refuse to see the lack of progress for what it is.
  • Big shot professor Y comes up with a related idea which builds on professor X’s original idea, but in a novel way”.
  • The cycle continues…

There is a simple solution to this problem: encourage independent thinking. Next time you review a paper or a funding application which attempts something fresh, don’t dismiss it because “the paper ignores 20 years of research.” We need to ignore fashions more, and encourage novel ideas. Don’t reward people because they follow the trends. We are supposed to give people tenure so that they can take risks, say things others don’t want to hear. Let’s build on that. Next time you are on a hiring committee and someone wants to hire a string theorist, or the equivalent in your field, be watchful.

Second order probability

Luciano Floridi poses the problem of second-order probabilities:

(…) we actually lack a theory of second order probability, let alone a philosophical justification for it.

By second order, he means “probabilities about probabilisties”.

I beg to differ with Luciano. What about this reference?

Gaifman, H. 1986. A theory of higher order probabilities. In Proceedings of the 1986 Conference on theoretical Aspects of Reasoning About Knowledge

International Conference on Weblogs and Social Media (December 8, 2006 / March 26-28, 2007)

The first International Conference on Weblogs and Social Media call for papers is out. It will be held in Boulder, Colorado. The subjects of interest are quite wide ranging:

  1. AI methods for ethnographic analysis through social media.
  2. Blogosphere vs. mediasphere; measuring the influence of blogs on the media.
  3. Centrality/influence of bloggers/blogs; ranking/relevance of blogs; web pages ranking based on blogs.
  4. Crawling/spidering and indexing.
  5. Human Computer Interaction; social media tools; navigation.
  6. Multimedia; audio/visual processing; aggregating information from different modalities.
  7. Semantic analysis; cross-system and cross-media name tracking; named relations and fact extraction; discourse analysis; summarization.
  8. Semantic Web; unstructured knowledge management.
  9. Sentiment analysis; polarity/opinion identification and extraction.
  10. Social Network Analysis; communities identification; expertise discovery; collaborative filtering.
  11. Text categorization; gender/age identification; spam filtering.
  12. Time Series Forecasting; measuring predictability of phenomena based on social media.
  13. Trend identification/tracking.
  14. Visualization, aggregation and filtering.

Google funding the implementation of Slope One in Drupal

I learned on Ted Serbinski that Scott Reynolds, who implements the Slope One Collaborative Filtering algorithm in Drupal, is being paid by Google. It is one of 14 projects that Google decided to fund. Way to go Google!

My little family

My family (June 20th 2006)

Click on the picture to see it all. Don’t ask me what Louka ate, I have no idea. You can see Lohan is quite aware of the camera. My wife is enjoying a good book.

Update: According to my wife, Louka was eating sand. Gross!

Does theory helps in algorithmic design?

Olivier Bousquet questions Statistical Learning Theory.

So, I think that the theoretical analysis is mainly a way to popularize an algorithm and to raise its visibility. The effect is then that more people try it out, and streamline it. So in the end, the algorithm may be adopted, but a theoretical analysis rarely justifies the algorithm and never provides guarantees. Hence theory is a way to attract attention to an algorithm. It should also be a way to get new insights for the development of new algorithms, but this happens much less frequently than is claimed!

I’m currently working on a paper where I spent a lot of time fiddling and proving some bounds, as Olivier points out we often do. Of course, not being a pure theory guy, I also run extensive experiments. I cannot help but notice a strong disconnect between the theory and the practice, not only in what I do, but in the papers I cite.

The counter-argument might be: but theory helps design better algorithms. Like Olivier, I don’t buy it.

Ah! I still do theory of course, but let’s not fool ourselves in thinking we are smarter than nature.

Next Page »

18 queries. 0.409 seconds. Valid XHTML

Powered by WordPress

Subscribe to this blog in a reader or by Email.