Wednesday, October 31st, 2007

Hidden Gems : seeking diamonds in your data (November 7th 2007)

Filed under: Science and Technology — Daniel Lemire @ 11:00

On November 7th 2007, Hazel Webb will give a lunch talk at 100 Sherbrooke West room SU-2720 (LORIT) from 12:30 to 13:30 on seeking diamonds in your data.

The talk will be webcast at http://mediasrv.lorit.ca/presentation (Windows Media Player and Internet Explorer under Windows required). During the presentation, emails can be sent to lorit@licef.teluq.uqam.ca (MSN messenger also supported).

Everyone is invited to attend remotely or in person!

Hidden Gems : seeking diamonds in your data

Data warehouses, repositories for ever-increasing quantities of data, have become commonplace in most medium to large-sized enterprises. Finding efficient methods to explore the data for suspected and/or previously unknown information continues to be an active area of research.

We introduce the diamond cube operator which, we believe, will aid decision support and analysis. It seeks a single large dense area of data by pruning lesser attributes, i.e., those that are related to few others. An algorithm has been designed, implemented and tested on a real-world terrorism data set. Our preliminary results confirm that this operation can be used to find a dense subset within a data cube.

In this talk an overview of On-line Analytical Processing (OLAP) will be presented, together with the diamond cube operator, some of its mathematical properties and anticipated benefits to the commercial and scientific communities.
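
The pruning idea in the abstract can be illustrated with a minimal sketch. This is not the algorithm from the talk: the function name diamond, the threshold k, and the rule "keep an attribute value only if it appears in at least k remaining cells" are assumptions made for illustration.

```python
from collections import Counter

def diamond(cells, k):
    """Hypothetical sketch: repeatedly drop attribute values that appear
    in fewer than k of the remaining cells, until nothing changes."""
    cells = set(cells)
    while cells:
        ndims = len(next(iter(cells)))
        # For each dimension, count how many remaining cells use each value.
        counts = [Counter(cell[d] for cell in cells) for d in range(ndims)]
        kept = {
            cell for cell in cells
            if all(counts[d][cell[d]] >= k for d in range(ndims))
        }
        if kept == cells:
            break  # stable: every remaining value is related to at least k cells
        cells = kept
    return cells

# Toy two-dimensional data (made up): each pair is one non-empty cell.
cells = {("a", "x"), ("a", "y"), ("b", "x"), ("b", "y"),
         ("c", "x"), ("c", "z"), ("d", "z")}
dense = diamond(cells, 2)
```

On this toy data, removing "d" leaves "z" with a single cell, which in turn leaves "c" with a single cell, so the pruning cascades until only the dense block formed by a, b and x, y remains.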

Your platform is your software

Filed under: Business / Economics / Politics, Science and Technology — Daniel Lemire @ 9:19

Seb Paquet sent me a link to Delusions of Facebook - Should you be a Facebook Startup? I am immediately reminded of McLuhan.

The medium is the message.

When Microsoft Windows came along, many people only noticed the obvious. Windows made it easy to build a good-looking application. Microsoft offered a standard API which was usable. Microsoft Windows was used by millions of people. What a great platform for a business! However, a lot of good companies (such as Netscape, Word Perfect, Novell, Stacker, Lotus) were crushed by Microsoft because they tried hard to complement what Microsoft was offering. These companies were often not bought, just made irrelevant. Microsoft destroyed the software industry over time and few people beyond Microsoft benefited from it.

One tragic mistake some people have made is to assume you can simply move to a new platform when the time comes. The mere fact that you use Microsoft Windows this year does not preclude you from supporting MacOS next year, does it? But platform-dependence builds up while you are not watching and is a lot more insidious than one might think. I am not even talking about the fact that your users are on one specific platform: these are business concerns. You grow a lot of technological dependencies toward a platform without ever realizing it. Anyone who has switched from Windows to Linux, or from Linux to MacOS or from Windows to MacOS has some idea of these hard-to-describe dependencies. And over time, once you have adopted a new platform, you just work differently.

There were a large number of companies in the Unix/DOS era that just assumed they could port their stuff to Windows. Most failed. Meanwhile, Oracle made platform-independence a requirement for their software from the get-go. They are still around and going strong. (Ironically, they try hard to lock their own customers into their own software.)

Then the web arrived. Again, several companies from the desktop era tried to adapt themselves to the web… most failed. Even Microsoft has a very hard time adapting. And the giants of today (Yahoo!, Google…) did not even exist in the desktop era.

So, if all you build is a Facebook application, it is what you are. Even if, in theory, you can recast your technology as something else, using a new platform, that does not make it practical. If Facebook makes your application irrelevant, or breaks it in some way, your company may very well die on the spot.

And, you know what, this is a great thing sometimes. As a researcher with an interest in applications, I know I will be busy for years to come.

Monday, October 29th, 2007

Cool new native HTML widgets

Filed under: Science and Technology — Daniel Lemire @ 12:41

I name this the coolest hack of the week. Glen decided to apply the tag-cloud idea to other HTML widgets. See this example and recall that I am only using straight HTML/CSS (no ECMAScript, no Flash):

Update: according to Benoît Boudeau, it does not work under IE 6. According to Kamel Aouiche, it does not work under IE 7. It does not work under Safari.

Why is this important? Because you can convey extra information along with a selection widget. You could ask people to select which type of product they want while making more popular products appear in a larger font. In other words, you can hide a recommender system in the selection process.
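
To make the idea concrete, here is a small sketch that generates such a widget. It is not Glen's actual markup: the function name cloud_select, the point sizes, and the product counts are all made up, and I assume a non-empty popularity table. As the update above suggests, whether a browser honors a font size on individual options varies.

```python
from html import escape

def cloud_select(name, popularity, min_pt=10, max_pt=24):
    """Hypothetical sketch: emit a plain HTML select list whose options
    are sized in proportion to their popularity (no scripting needed)."""
    lo, hi = min(popularity.values()), max(popularity.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    lines = [f'<select name="{escape(name)}" size="{len(popularity)}">']
    for label, count in popularity.items():
        # Linear interpolation between the smallest and largest font size.
        size = min_pt + (count - lo) * (max_pt - min_pt) // span
        lines.append(f'  <option style="font-size: {size}pt">{escape(label)}</option>')
    lines.append("</select>")
    return "\n".join(lines)

# Made-up popularity counts for three product types.
widget = cloud_select("product", {"Laptop": 120, "Phone": 300, "Tablet": 30})
print(widget)
```

The popularity counts play the role of the recommender: the most frequently chosen product is simply drawn largest.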

We just need a good name for these new widgets. Cloudgets? Taglets?

Saturday, October 27th, 2007

Publish or perish? Let them perish!

Filed under: Academia/Research — Daniel Lemire @ 8:23

There has been much ink spilled on the evils of publish or perish, that is, the way professors and would-be professors are mostly gauged by what they wrote and especially, how much they wrote.

Most recently, David Lorge Parnas, one of the most prolific authors in Computer Science (I found 242 papers in his name), has published an article on this topic: stop the numbers game (Communications of the ACM, Volume 50, Number 11 (2007), Pages 19-21). It starts like this:

As a senior researcher, I am saddened to see funding agencies, department heads, deans, and promotion committees encouraging younger researchers to do shallow research.

Here are the evils of the numbers game according to Parnas:

  • It encourages superficial research.
  • It encourages overly large groups.
  • It encourages repetition.
  • It encourages small, insignificant studies.
  • It rewards publication of half-baked ideas.

He concludes as follows:

Sadly, the present evaluation system is self-perpetuating. Those who are highly rated by the system are frequently asked to rate each other and others; they are unlikely to want to change a system that gave them their status.

What Parnas says is not new. See my posts Productivity measures are counterproductive?, Are we destroying research by evaluating it? and On the upcoming collapse of peer review.

What Parnas misses is that the publication process itself is changing. While Parnas is a prolific author, he does not publish in open archives and he does not appear to have a blog. He has predicted the failure of Wikipedia as an encyclopedia in 2005 because it lacks a classical peer review process. I think he is dead wrong: our current classical peer review process is not the only one that can work, and it is not the optimal system.

But whether you agree with him or not on the evils of the counting game, I do not think you can easily reject this last recommendation:

When serving on recruiting, promotion, or grant-award committees, read the candidate’s papers and evaluate the contents carefully. Insist that others do the same.

I actually just reviewed a couple of grant applications and in both instances, I drilled down to the papers the researcher wrote and reviewed them. Sometimes you have pleasant surprises (results are as strong as or stronger than the researcher claimed), but you also get bad surprises (an article touted as the cornerstone of one’s research is a thin 1-pager).

Recruiting is a bit of a tougher issue. If the candidate has single-author papers, then you can probably do a decent job if you know the field, but most candidates will only have multiple-author papers. Reading the papers may not be a good predictor of the candidate’s ability.

(Thanks to Sébastien Paquet for a thoughtful discussion.)

Wednesday, October 24th, 2007

Play the strongest checkers program in the world

Filed under: Science and Technology — Daniel Lemire @ 17:51

I just attended a talk by Jonathan Schaeffer, the guy behind Chinook, the best checkers computer player in the world. You can play a watered-down version of Chinook on the Web.

The way they built this is to enumerate all games with 10 pieces or fewer. Hence, if there are 10 pieces or fewer, Chinook knows whether it can win, get a draw, or lose since it has explored all possibilities. What they have recently shown is that, in any game, they can get to a 10-piece configuration where Chinook cannot lose. In effect, Chinook is mathematically unbeatable.
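
To give a feel for what "explored all possibilities" means, here is a sketch on a toy game, not checkers and not Chinook's code. I assume a made-up game where two players alternately remove 1, 2 or 3 tokens and the player who cannot move loses; the function names build_table and best_move are hypothetical. Unlike checkers, this toy game has no draws.

```python
def build_table(max_tokens, moves=(1, 2, 3)):
    """Label every position of the toy game, from the smallest one up,
    as a "win" or a "loss" for the player about to move."""
    table = {0: "loss"}  # no tokens left: the player to move cannot move
    for n in range(1, max_tokens + 1):
        # Outcomes (for the opponent) of every position reachable in one move.
        reachable = [table[n - m] for m in moves if m <= n]
        # A position is won if some move hands the opponent a lost position.
        table[n] = "win" if "loss" in reachable else "loss"
    return table

def best_move(table, n, moves=(1, 2, 3)):
    """Look up a winning move in the table, or None if the position is lost."""
    for m in moves:
        if m <= n and table[n - m] == "loss":
            return m
    return None

table = build_table(10)
print(table[10], best_move(table, 10))
```

Once the table exists, playing perfectly is a lookup: no search and no learning. The catch is that the table has one entry per position, which is why the same approach on checkers required enormous files.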

I am not an expert in this area, far from it, but it seems to me that one could design an even better computer program. There may be instances where Chinook gets a draw, whereas a win might have been possible. However, using this brute force approach, it seems like it will be several decades before we can improve substantially over Chinook. Again because it relies on brute force, it may not be possible to generalize this approach to more complex games (like Chess).

Jonathan is a great speaker and, no doubt, a great researcher. As a hacker, I appreciate Chinook: they had to use extremely clever designs. However, we have a Wizard-of-Oz effect here: once you see how they have beaten the game, you realize it depends fundamentally on extreme brute force. They used 50 powerful computers running for years to solve the problem. Their files are so large, and copied so often, that they have to consider file copying to be a lossy operation!

Chinook did not learn rules: it enumerated all possibilities! Many of my AI friends would rather see intelligence as the result of inference. Somehow, we learn rules that are applied in a logical fashion by our brain. Chinook is also not an instance of Machine Learning as it is commonly done: there is very little statistics involved, just brute force. Finally, Chinook did not try to think like a human being, whatever this could mean.

Monday, October 22nd, 2007

Famous tech drop-outs

Filed under: Science and Technology — Daniel Lemire @ 12:27

I decided to update my list of famous tech drop-outs.

Mark Zuckerberg - Facebook
Bill Gates - Microsoft
Steve Jobs - Apple
Michael Dell - Dell
Larry Ellison - Oracle
Mike Lazaridis - Blackberry
Shawn Fanning - Napster
Blake Ross - Firefox

I am sure you can help me complete my list. (Note: I think a better list would be “people without a college degree who made it big in tech.”)

See also my preceding article Where would we be without school drop outs?

Update: I do not mean to imply that there is a causal relation between dropping out of college and making it big.

Friday, October 19th, 2007

Optical disks, soon to be obsolete?

Filed under: Science and Technology — Daniel Lemire @ 9:24

I will make a prediction. Optical disks, such as DVDs, HD-DVDs, Blu-Rays, and so on, will not matter in five years. And no, tape will not replace them.

I see only one viable storage technology in five years: fast volatile memory hooked up to a super-fast network.

Why will it happen? One word: youtube. Unavoidably, video-on-the-Web will get better resolution, you will have longer streams, and it will only get more popular.

How are we going to make this happen? Fiber in every house?

Meanwhile, Sony and Microsoft are fighting over which format will dominate the market (HD-DVD or Blu-Ray). I say neither.
