Monday, August 29th, 2005

Pentaho - Open Source Business Intelligence

Filed under: Data Warehousing and OLAP — Daniel Lemire @ 7:51

In relation to a previous post of mine about open source Business Intelligence where I wrote “So, maybe someone out there should start a support company for Open Source Business Intelligence?”, Krishnaswamy Ram pointed me to Pentaho, which seems to be exactly what I had imagined a smart businessman could do.

The Pentaho BI Project provides enterprise-class reporting, analysis, dashboard, data mining and workflow capabilities that help organizations operate more efficiently and effectively. The software offers flexible deployment options that enable use as embeddable components, customized BI application solutions, and as a complete out-of-the-box, integrated BI platform.

Friday, August 26th, 2005

Journals are already dead! Long live eprint servers!

Filed under: Open Access, Academia/Research — Daniel Lemire @ 23:37

For researchers who actually want to be read, there are several good eprint servers including arxiv.org (which I don’t use, but many physicists seem to like it) and cogprints (great for AI-related stuff). Of course, you can simply post your papers on your web site and let Google find them (my favorite solution).

On this topic, Suresh cites Cosmic Variance:

Most people these days post to the arxiv before they even send their paper to a journal, and some have stopped submitting to journals altogether. (I wish they all would, it would cut down on that annoying refereeing we all have to do.) And nobody actually reads the journals; they serve exclusively as ways to verify that your work has passed peer review.

I think we are slowly getting to the point where paper-based publications are going to be completely unnecessary. Right now, people still ask me for page numbers when I say I published a given paper. I was even asked for photocopies of the journal issue. These people will soon die and we will finally be free to leave the trees in the forest.

Should you encourage your M.Sc. students to go for a Ph.D.?

Filed under: Academia/Research — Daniel Lemire @ 12:01

Should you encourage your M.Sc. students to go for a Ph.D.? If you want to get more grant money, publish more papers and be generally viewed as a more “important” researcher, then you should definitely push all your talented M.Sc. students to go for a Ph.D.

Yet, Yuhong does differently:

I never encourage my master students to get Ph.D., though some have the talent. I know that a Ph.D. does not gain a lot more happiness in one’s life. I even find that normal people enjoy better life than researchers. So why impose research to my students?

Myself? I remember the first time a student came to my office to inquire about an academic career. She was a bright first-year student. The type that went to the best high school, got the best grades, had probably been involved in several extracurricular activities, in short, the perfect student. She was the best student in my class. Maybe she is reading this and will recognize herself. She also wanted to have a family. My answer to her? Make a choice: either a family or an academic career. She left my office pretty disappointed. I could never figure out whether she was disappointed in me or in life.

Is it true you can’t be a great scientist and also a family person? Of course not. Some people become astronauts, get a Ph.D., and get a gold medal at the Olympics. Such people exist. However, is it a reasonable plan? For a young lady, I don’t think so. I don’t think you can have 2-3 kids, raise them well, feed them well, spend quality time with them, and at the same time, pursue a solid academic career. There are counterexamples, but…

What we need to do is to:

  • Stop sending more and more people to the Ph.D. track. Make sure those who get on the Ph.D. track have fair expectations; make sure they are not betting their lives on what this Ph.D. can bring to them.
  • When reviewing a colleague, clearly separate work done with students from work done by the researcher. It is easy: just check the names on the papers.
  • Value academic simplicity: fewer papers, fewer students, less money, more quality of life, and happier professors.

Further reading: The 2003-2004 Taulbee survey shows that the number of new Ph.D.s in Computer Science is sharply on the rise (up 17% from the year before), whereas the number of undergraduates is about to take a significant drop since the number of new students has gone down significantly (60%).

Wednesday, August 24th, 2005

PODS 2006 (December 1st, 2005 / June 26-28, 2006)

Filed under: Data Warehousing and OLAP — Daniel Lemire @ 21:10

The PODS 2006 call for papers is out. It will be held in Chicago along with SIGMOD.

The PODS symposium series, held in conjunction with the SIGMOD conference series, provides a premier annual forum for the communication of new advances in the theoretical foundation of database systems. For the 25th edition, original research papers providing new insights in the specification, design, or implementation of data management tools are called for.

SIGMOD 2006 (November 17, 2005 / June 27-29, 2006)

Filed under: Data Warehousing and OLAP — Daniel Lemire @ 21:06

The SIGMOD call for papers is out. It will be held in Chicago (cool!).

The annual ACM SIGMOD conference is a leading international forum for database researchers, developers, and users to explore cutting-edge ideas and results, and to exchange techniques, tools, and experiences. We invite the submission of original research contributions as well as proposals for demonstrations, tutorials, industrial presentations, and panels. We encourage submissions relating to all aspects of data management defined broadly and particularly encourage work that represent deep technical insights or present new abstractions and novel approaches to problems of significance. We especially welcome submissions that help identify and solve data management systems issues by leveraging knowledge of applications and related areas, such as information retrieval and search, operating systems & storage technologies, and web services.

Tuesday, August 23rd, 2005

Now you can prepare your math slides using MathML!!!

Filed under: — Daniel Lemire @ 19:38

You know TeX, you know HTML and you don’t like PDF or PowerPoint slides? Leverage the fact that Firefox supports MathML and your troubles are over!

Following one of my earlier posts, Peter Jipsen was nice enough to email me to let me know that ASCIIMathML officially works with both HTML Slidy and S5. Peter has proof in the form of an online set of slides.

Saturday, August 20th, 2005

Hitflip DVD recommender is using the Slope One collaborative filtering algorithm

Filed under: — Daniel Lemire @ 19:09


Jan Miczaika from the Otto Beisheim Graduate School of Management just sent me an email. Their movie (DVD) recommender system hitflip (German site) is using the Slope One collaborative filtering algorithm I presented at SIAM Data Mining 2005. I believe he found the technical report I wrote about it useful (Implementing a Rating-Based Item-to-Item Recommender System in PHP/SQL).
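
For readers who have not seen Slope One, here is a minimal in-memory sketch of the weighted variant in Python. It is an illustration only, not hitflip's code and not the PHP/SQL from the technical report; the function names (train_slope_one, predict) and the toy ratings are assumptions made up for this example. The idea is to store, for each pair of items, the summed rating difference and the number of users who rated both, and then predict a rating as the count-weighted average of "rating on the other item plus average difference".

```python
def train_slope_one(ratings):
    """Build pairwise statistics from {user: {item: rating}}.

    Returns two dicts keyed by (j, i):
      diff_sums[(j, i)] = sum over co-raters of (rating_j - rating_i)
      counts[(j, i)]    = number of users who rated both j and i
    """
    diff_sums = {}
    counts = {}
    for user_ratings in ratings.values():
        for j, r_j in user_ratings.items():
            for i, r_i in user_ratings.items():
                if i == j:
                    continue
                diff_sums[(j, i)] = diff_sums.get((j, i), 0.0) + (r_j - r_i)
                counts[(j, i)] = counts.get((j, i), 0) + 1
    return diff_sums, counts


def predict(user_ratings, target, diff_sums, counts):
    """Weighted Slope One prediction of `target` for one user.

    Returns None when no item rated by the user was co-rated with `target`.
    """
    numerator = 0.0
    denominator = 0
    for i, r_i in user_ratings.items():
        c = counts.get((target, i), 0)
        if i == target or c == 0:
            continue
        # (average difference + r_i) * c  ==  diff_sum + r_i * c
        numerator += diff_sums[(target, i)] + r_i * c
        denominator += c
    if denominator == 0:
        return None
    return numerator / denominator


# Toy data (hypothetical users and DVDs).
ratings = {
    "john": {"A": 5, "B": 3, "C": 2},
    "mark": {"A": 3, "B": 4},
    "lucy": {"B": 2, "C": 5},
}
diff_sums, counts = train_slope_one(ratings)
lucy_a = predict(ratings["lucy"], "A", diff_sums, counts)
print(round(lucy_a, 2))
```

On this toy data, A is on average 0.5 above B (two co-raters) and 3 above C (one co-rater), so the prediction for lucy on A is ((2 + 0.5) * 2 + (5 + 3) * 1) / 3 = 13/3, or about 4.33.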

Jan had interesting comments:

  • Instead of working live, you can replace the INSERTs to your DBMS with INSERT DELAYED statements and do batch processing. We had thought about this option with inDiscover, but it proved to be unnecessary for us, even using MySQL, which has relatively slow INSERTs. Batch processing is an OK alternative when resources are limited, but I prefer true online systems.
  • Brand new DVDs that have not been rated a sufficient number of times (say, twice) are not recommended. One trick you can use is to recommend new DVDs which are similar to DVDs the user might like. This is a form of the cold-start problem, and Jan’s solution appears pretty generic and sensible.
  • In his experience, it is useful to precompute recommendations for users, only updating them when this particular user enters new data. Of course, in theory, you should invalidate these recommendations continuously as new data (from other users) is entered. But I suspect Jan felt it was “close enough”.
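
Jan's second and third points can be sketched in a few lines of Python. This is my own hedged illustration, not hitflip's implementation: the class name CachedRecommender, the compute callback and the min_ratings parameter are all hypothetical. The sketch keeps one cached recommendation list per user, throws it away only when that same user rates something, and hides items that have been rated fewer than min_ratings times. It does not implement the "recommend similar new DVDs" trick.

```python
class CachedRecommender:
    """Per-user recommendation cache with a minimum-ratings filter (sketch)."""

    def __init__(self, compute, min_ratings=2):
        # compute is a hypothetical callable:
        #   compute(user_ratings: dict, eligible_items: set) -> list
        self._compute = compute
        self._min_ratings = min_ratings
        self._ratings = {}        # user -> {item: rating}
        self._rating_counts = {}  # item -> number of users who rated it
        self._cache = {}          # user -> cached recommendation list

    def add_rating(self, user, item, rating):
        user_ratings = self._ratings.setdefault(user, {})
        if item not in user_ratings:
            self._rating_counts[item] = self._rating_counts.get(item, 0) + 1
        user_ratings[item] = rating
        # Only this user's cached list is invalidated; other users keep
        # their (possibly slightly stale) recommendations.
        self._cache.pop(user, None)

    def eligible_items(self):
        # Items with too few ratings are not recommended at all.
        return {item for item, n in self._rating_counts.items()
                if n >= self._min_ratings}

    def recommend(self, user):
        if user not in self._cache:
            self._cache[user] = self._compute(
                self._ratings.get(user, {}), self.eligible_items())
        return self._cache[user]


# Usage with a stand-in compute function that just lists unseen eligible items.
calls = []

def list_unseen(user_ratings, eligible):
    calls.append(1)
    return sorted(eligible - set(user_ratings))

rec = CachedRecommender(list_unseen)
rec.add_rating("ann", "dvd1", 5)
rec.add_rating("bob", "dvd1", 4)
rec.add_rating("bob", "dvd2", 3)
first = rec.recommend("ann")         # dvd2 has only one rating so far
rec.add_rating("carl", "dvd2", 4)    # another user's data: ann's cache is kept
stale = rec.recommend("ann")
rec.add_rating("ann", "dvd3", 2)     # ann's own data: her cache is dropped
fresh = rec.recommend("ann")
```

The stale list is the price of the "close enough" approach: ann does not see dvd2 until she enters a new rating herself.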