Hidden Gems: seeking diamonds in your data (November 7th 2007)

On November 7th 2007, Hazel Webb will give a lunch talk at 100 Sherbrooke West room SU-2720 (LORIT) from 12:30 to 13:30 on seeking diamonds in your data.

The talk will be webcast at http://mediasrv.lorit.ca/presentation (Windows Media Player and Internet Explorer under Windows required). During the presentation, emails can be sent to lorit@licef.teluq.uqam.ca (MSN Messenger is also supported).

Everyone is invited to attend remotely or in person!

Hidden Gems: seeking diamonds in your data

Data warehouses, repositories for ever-increasing quantities of data, have become commonplace in most medium to large-sized enterprises. Finding efficient methods to explore the data for suspected and/or previously unknown information continues to be an active area of research.

We introduce the diamond cube operator which, we believe, will aid decision support and analysis. It seeks a single large dense area of data by pruning lesser attributes, i.e., those that are related to few others. An algorithm has been designed, implemented and tested on a real-world terrorism data set. Our preliminary results confirm that this operation can be used to find a dense subset within a data cube.
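
As a rough illustration of the pruning idea (not the algorithm presented in the talk), here is a minimal Python sketch. It assumes a two-dimensional cube stored as a set of (row, column) cells and a hypothetical threshold `k`: any attribute value that appears in fewer than `k` of the remaining cells is dropped, and the pruning repeats until nothing changes. The function name, the representation, and the threshold are all assumptions made for the example.

```python
from collections import Counter

def diamond(cells, k):
    """Repeatedly drop attribute values that occur in fewer than k remaining cells.

    `cells` is an iterable of (row, column) pairs; `k` is a hypothetical
    density threshold. Returns the surviving set of cells.
    """
    cells = set(cells)
    while True:
        rows = Counter(r for r, _ in cells)
        cols = Counter(c for _, c in cells)
        kept = {(r, c) for r, c in cells if rows[r] >= k and cols[c] >= k}
        if kept == cells:  # fixed point: every remaining value is dense enough
            return cells
        cells = kept

# Hypothetical data: rows "a" and "b" are dense, "c" and "d" are not.
example = {("a", "x"), ("a", "y"), ("b", "x"), ("b", "y"), ("c", "x"), ("d", "z")}
dense = diamond(example, k=2)
```

With `k=2`, the cells involving rows `c` and `d` and column `z` are pruned, leaving the four cells that relate `a` and `b` to `x` and `y`.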

In this talk an overview of On-line Analytical Processing (OLAP) will be presented, together with the diamond cube operator, some of its mathematical properties and anticipated benefits to the commercial and scientific communities.

Your platform is your software

Seb Paquet sent me a link to Delusions of Facebook – Should you be a Facebook Startup? I am immediately reminded of McLuhan.

The medium is the message.

When Microsoft Windows came along, many people only noticed the obvious. Windows made it easy to build a good-looking application. Microsoft offered a usable standard API. Microsoft Windows was used by millions of people. What a great platform for a business! However, a lot of good companies (such as Netscape, WordPerfect, Novell, Stacker, Lotus) were crushed by Microsoft because they tried hard to complement what Microsoft was offering. These companies were often not bought, just made irrelevant. Microsoft destroyed the software industry over time and few people beyond Microsoft benefited from it.

One tragic mistake some people have made is to assume you can simply move to a new platform when the time comes. The mere fact that you use Microsoft Windows this year does not preclude you from supporting MacOS next year, does it? But platform dependence builds up while you are not watching and is a lot more insidious than one might think. I am not even talking about the fact that your users are on one specific platform: that is a business concern. You grow a lot of technological dependencies on a platform without ever realizing it. Anyone who has switched from Windows to Linux, or from Linux to MacOS or from Windows to MacOS has some idea of these hard-to-describe dependencies. And over time, once you have adopted a new platform, you just work differently.

There were a large number of companies in the Unix/DOS era that just assumed they could port their software to Windows. Most failed. Meanwhile, Oracle made platform independence a requirement for their software from the get-go. They are still around and going strong. (Ironically, they try hard to lock their own customers into their own software.)

Then the web arrived. Again, several companies from the desktop era tried to adapt themselves to the web… most failed. Even Microsoft has a very hard time adapting. And the giants of today (Yahoo!, Google…) did not even exist in the desktop era.

So, if all you build is a Facebook application, that is what you are. Even if, in theory, you can recast your technology as something else, using a new platform, that does not make it practical. If Facebook makes your application irrelevant, or breaks it in some way, your company may very well die on the spot.

And, you know what, this is a great thing sometimes. As a researcher with an interest in applications, I know I will be busy for years to come.

Cool new native HTML widgets

I name this the coolest hack of the week. Glen decided to apply the tag-cloud idea to other HTML widgets. See this example and recall that I am only using straight HTML/CSS (no ECMAScript, no Flash):

Update: according to Benoît Boudeau, it does not work under IE 6. According to Kamel Aouiche, it does not work under IE 7. It does not work under Safari.

Why is this important? Because you can convey extra information along with a selection widget. You could ask people to select which type of product they want while making more popular products appear in a larger font. In other words, you can hide a recommender system in the selection process.
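
To make the idea concrete, here is a minimal sketch of how such a widget could be generated ahead of time, so that the page itself remains plain HTML/CSS. This is not Glen's code: the function name, the product names, the popularity counts, and the linear mapping from popularity to font size are all assumptions for illustration. Whether a browser honors `font-size` on an `option` element varies, as the update above suggests.

```python
from html import escape

def cloud_select(name, popularity, min_pct=80, max_pct=200):
    """Build a <select> whose options are sized by popularity.

    `popularity` maps option labels to hypothetical counts. Font sizes are
    scaled linearly between min_pct and max_pct (percent of the base size).
    """
    lo, hi = min(popularity.values()), max(popularity.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    options = []
    for label, count in popularity.items():
        pct = round(min_pct + (max_pct - min_pct) * (count - lo) / span)
        options.append(
            f'  <option style="font-size: {pct}%">{escape(label)}</option>'
        )
    return f'<select name="{escape(name)}">\n' + "\n".join(options) + "\n</select>"

# Hypothetical popularity counts.
widget = cloud_select("product", {"laptop": 50, "mouse": 10, "tablet": 30})
```

The most popular item gets the largest font, the least popular the smallest, and everything else falls in between.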

We just need a good name for these new widgets. Cloudgets? Taglets?

Publish or perish? Let them perish!

There has been much ink spilled on the evils of publish or perish, that is, the way professors and would-be professors are mostly gauged by what they wrote and, especially, by how much they wrote.

Most recently, David Lorge Parnas, one of the most prolific authors in Computer Science (I found 242 papers under his name), has published an article on this topic: Stop the numbers game (Communications of the ACM, Volume 50, Number 11 (2007), Pages 19-21). It starts like this:

As a senior researcher, I am saddened to see funding agencies, department heads, deans, and promotion committees encouraging younger researchers to do shallow research.

Here are the evils of the numbers game according to Parnas:

  • It encourages superficial research.
  • It encourages overly large groups.
  • It encourages repetition.
  • It encourages small, insignificant studies.
  • It rewards publication of half-baked ideas.

He concludes as follows:

Sadly, the present evaluation system is self-perpetuating. Those who are highly rated by the system are frequently asked to rate each other and others; they are unlikely to want to change a system that gave them their status.

What Parnas says is not new. See my posts Productivity measures are counterproductive?, Are we destroying research by evaluating it? and On the upcoming collapse of peer review.

What Parnas misses is that the publication process itself is changing. While Parnas is a prolific author, he does not publish in open archives and he does not appear to have a blog. In 2005, he predicted the failure of Wikipedia as an encyclopedia because it lacks a classical peer review process. I think he is dead wrong: our current classical peer review process is not the only one that can work, and it is not the optimal system.

But whether you agree with him or not on the evils of the counting game, I do not think you can easily reject this last recommendation:

When serving on recruiting, promotion, or grant-award committees, read the candidate’s papers and evaluate the contents carefully. Insist that others do the same.

I actually just reviewed a couple of grant applications and in both instances, I drilled down to the papers the researcher wrote and reviewed them. Sometimes you have pleasant surprises (results are as strong as or stronger than the researcher claimed), but you also get unpleasant surprises (an article touted as the cornerstone of one’s research is a thin 1-pager).

Recruiting is a bit of a tougher issue. If the candidate has single-author papers, then you can probably do a decent job if you know the field, but most candidates will only have multiple-author papers. Reading the papers may not be a good predictor of the candidate’s ability.

(Thanks to Sébastien Paquet for a thoughtful discussion.)

Play the strongest checkers program in the world

I just attended a talk by Jonathan Schaeffer, the guy behind Chinook, the best checkers computer player in the world. You can play a watered-down version of Chinook on the Web.

The way they built this was to enumerate all positions with 10 pieces or fewer. Hence, if there are 10 pieces or fewer, Chinook knows whether it can win, get a draw, or lose since it has explored all possibilities. What they have recently shown is that, in any game, they can get to a 10-piece configuration where Chinook cannot lose. In effect, Chinook is mathematically unbeatable.
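
To give a flavor of what "exploring all possibilities" means, here is a toy sketch. It is not Chinook and it is not checkers: I assume a hypothetical take-away game where two players alternately remove 1, 2 or 3 stones from a pile, and whoever takes the last stone wins. The table below plays the role of an endgame database: every position is labeled as a win or a loss for the player to move, working up from the smallest positions. (Unlike checkers, this toy game has no draws.)

```python
def build_table(max_stones, moves=(1, 2, 3)):
    """Label every pile size as 'win' or 'loss' for the player to move."""
    # With no stones left, the player to move has already lost.
    table = {0: "loss"}
    for n in range(1, max_stones + 1):
        # Outcomes, for the opponent, of every position we can move to.
        outcomes = [table[n - m] for m in moves if m <= n]
        # We win if at least one move leaves the opponent in a lost position.
        table[n] = "win" if "loss" in outcomes else "loss"
    return table

def best_move(n, table, moves=(1, 2, 3)):
    """Return a move that leaves the opponent in a lost position, if any."""
    for m in moves:
        if m <= n and table[n - m] == "loss":
            return m
    return None  # every move leads to a position the opponent wins

table = build_table(20)
```

Once the table exists, playing perfectly is a matter of lookup: no rules are learned, and no inference is involved.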

I am not an expert in this area, far from it, but it seems to me that one could design an even better computer program. There may be instances where Chinook gets a draw, whereas a win might have been possible. However, using this brute force approach, it seems like it will be several decades before we can improve substantially over Chinook. Again because it relies on brute force, it may not be possible to generalize this approach to more complex games (like Chess).

Jonathan is a great speaker and, no doubt, a great researcher. As a hacker, I appreciate Chinook: they had to use extremely clever designs. However, we have a Wizard-of-Oz effect here: once you see how they have beaten the game, you realize it depends fundamentally on extreme brute force. They used 50 powerful computers running for years to solve the problem. Their files are so large, and copied so often, that they have to consider file copying to be a lossy operation!

Chinook did not learn rules: it enumerated all possibilities! Many of my AI friends would rather see intelligence as the result of inference. Somehow, we learn rules that are applied in a logical fashion by our brain. Chinook is also not an instance of Machine Learning as it is commonly done: there is very little statistics involved, just brute force. Finally, Chinook did not try to think like a human being, whatever this could mean.

Famous tech drop-outs

I decided to update my list of famous tech drop-outs.

Mark Zuckerberg – Facebook
Bill Gates – Microsoft
Steve Jobs – Apple
Michael Dell – Dell
Larry Ellison – Oracle
Mike Lazaridis – Blackberry
Shawn Fanning – Napster
Blake Ross – Firefox

I am sure you can help me complete my list. (Note: I think a better list would be “people without a college degree who made it big in tech.”)

See also my preceding article Where would we be without school drop outs?

Update: I do not mean to imply that there is a causal relation between dropping out of college and making it big.

Optical disks, soon to be obsolete?

I will make a prediction. Optical disks, such as DVDs, HD-DVDs, Blu-Rays, and so on, will not matter in five years. And no, tape will not replace them.

I see only one viable storage technology in five years: fast volatile memory hooked up to a super-fast network.

Why will it happen? One word: YouTube. Unavoidably, video-on-the-Web will get better resolution, you will have longer streams, and it will only get more popular.

How are we going to make this happen? Fiber in every house?

Meanwhile, Sony and Microsoft are fighting over which format will dominate the market (HD-DVD or Blu-Ray). I say neither.

Early impressions on Facebook


Facebook has been the hot networking site for quite some time now. It was founded in 2004 by a teenager; this same teenager, Mark Zuckerberg, is now 23, has no degree, and is about 2300 times richer than I will ever be. (No, I am not bitter.)

Some colleagues asked me to join Facebook today. My friends from MyDYO Inc. are there too. So I joined. Here are my impressions.

  • The ads feel out-of-place. As a disclaimer, my blog is not any prettier, but I do not have millions and millions to spend on graphical design.
  • It is far more popular than I expected. It seems that about 50% of everyone I know is on Facebook, including many people who do not have a web presence.
  • Oddly, people seem to assume that the data put there is private.
  • It is a walled garden. As far as I can tell, there is no way to share content through URIs without having visitors log into Facebook. Not very RESTful. However, the application is very responsive.
  • The search engine appears very limited. Running Google through this data would be much more fun!
  • The first few minutes are fun. Finding out that you are more connected than you thought is always pleasing. However, I cannot see why I would spend much time in a walled garden where most of the content seems to be a list of friends you have not seen in years. There is a reason why I have forgotten all these names… I am busy.
  • There is clearly a viral effect at work, but I do not understand why it would work better than with other networking sites I tried.
  • I quickly browsed the applications. According to Seb, this is where the real value lies. And indeed, I was impressed. Thinking a bit more about it, I think that Facebook serves as a form of OpenID: you sign on to Facebook once and you can automagically use a large number of applications without having to create several accounts and reenter the same data, again and again. I see no reason why we can’t have an open-world non-proprietary Facebook, other than the fact that we have not yet managed to get OpenID off the ground.

See also my post Academic blogging: why still bother?

What happens after a technological singularity?


A technological singularity is a rapid sequence of technological changes tearing apart our society. For example, imagine we can create smarter-than-human machines. Suppose that, in turn, these machines can create other machines that are even smarter than they are. If the timing is just right, you could get infinite intelligence in a finite time. Of course, a technological singularity does not need to be so drastic. It suffices that we exceed the speed at which most human beings can adapt.
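
To see what "just right" timing could look like, here is a back-of-the-envelope sketch. The symbols are my own assumptions, purely for illustration: say the first generation of machines needs a time T_0 to build its successor, and each new generation works faster than the previous one by a constant factor r, with 0 < r < 1.

```latex
T_n = r^n \, T_0, \quad 0 < r < 1
\qquad\Longrightarrow\qquad
\sum_{n=0}^{\infty} T_n \;=\; T_0 \sum_{n=0}^{\infty} r^n \;=\; \frac{T_0}{1 - r} \;<\; \infty .
```

With r = 1/2, for instance, infinitely many generations fit within a total time of 2 T_0. If each generation is no faster than the last (r ≥ 1), the sum diverges and the finite-time blow-up disappears.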

My own definition of a technological singularity is the achievement of such a high level of sophistication that, as far as human beings are concerned, technological progress becomes irrelevant. It could be that we have such an advanced technology that our brain cannot even comprehend progress. Or maybe, we are all trying to kill each other so that the insanity can stop. Or it could happen on the day spammers find a way to get spam directly in our brains and we are all buying pills to get our penises to be longer.

It is hard to tell if such a singularity is a distinct possibility. For example, it could become increasingly expensive to improve our technological sophistication faster. One limit is the size of our brain. Your brain has a limited memory and processing speed. But we could possibly expand our brain or replace it with better hardware.

(As I said before, I do not care for AI. I do not want my laptop to be talking back to me. But I would not mind replacing my brain with a piece of hardware that gives me a photographic memory and twice the processing speed.)

For fun, assume that a technological singularity does happen. It may not be a catastrophe. For example, maybe we have the technology to keep us all alive nearly forever and in a quasi-paradise. What happens next? Clearly, our technology cannot improve further, and even if it does, nobody cares.

My own prediction is that some strong religious figure would emerge and people would become highly spiritual. Science and technology would be frowned upon. It could even be that we would go back to a medieval state.

See also my posts Duck Typing, Artificial Intelligence and Philosophy, The Big Bang is Intelligent, and How artificial intelligences are already at war with us.

Productivity measures are counterproductive?

Michael has a long post on why it seems foolish to measure scientists according to one unidimensional metric (such as the h-index). His argument is mostly that you can game these metrics rather easily if you have a large enough social network. Given how hard people work at gaming the PageRank metric, and the often-quoted fact that over 50% of all married people cheat on their spouse, we would be naive to think that researchers do not game the metrics. For that matter, it is known that several journals cheat to increase their impact factor (another unidimensional metric).
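
For reference, the h-index is the largest number h such that a researcher has h papers with at least h citations each. Here is a minimal Python sketch of the computation; the citation counts are made up for the example. It also hints at why the metric is easy to game: a handful of extra citations to the right papers moves the number.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h

# Hypothetical citation counts for five papers.
before = h_index([10, 8, 5, 4, 3])
# Three friendly citations spread over the two weakest papers.
after = h_index([10, 8, 5, 5, 5])
```

Here `before` is 4 and `after` is 5, even though nothing about the research itself has changed.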

The question really is, does it hurt us that people play these games? After all, if we accept that the rule of the game is to get a high h-index, then why should I care how people go about it?

Michael is actually reacting to an article, The Mismeasurement of Science, which identifies several ill-effects of these unidimensional measures, including the facts that:

  • many authors ignore or hide results that do not fit with the story being told in the paper because doing so makes the paper less complicated and thus, more appealing;
  • science is becoming a more ruthlessly self-selecting field where those who are less aggressive and less self-aggrandizing are also less likely to receive recognition.

In turn, I conjecture that we have the following measurable effects:

  • Science is becoming less attractive as a career. If you are going to pursue a high h-index, if this becomes your goal, then how is this more interesting, as a game, than making a lot of money? Should we be surprised that Science Faculties are bleeding students while Business Schools are turning down students? When accounting becomes sexier than physics, we have a problem. Women, who are less attracted to careers where you compare the size of your appendage, are harder to find than ever in Computer Science. Should we get a clue?
  • Research papers, while becoming easier to read and cite, fail to provide us with enough data to correctly appreciate the results and their applications. In particular, research papers are increasingly dismissed by practitioners who need not only a nice story, but also the full story, including the dirty secrets.

Whatever rules we set, they have consequences. I am particularly worried about the fact that we are making science uninteresting by redefining it from “scientific discovery” to “achieving a high h-index”.

Maybe we have to go back and ask fundamental questions. Why do we do science? What do we really expect from scientists? What should we really reward?

See also my posts Are we destroying research by evaluating it?, On the upcoming collapse of peer review, and Assessing a researcher… in 2007.
