Sunday, October 16th, 2005

Analyzing Large Collections of Electronic Text Using OLAP

Filed under: Abstracts, Data Warehousing and OLAP — Daniel Lemire @ 10:27

Steven will be presenting our paper Analyzing Large Collections of Electronic Text Using OLAP at APICS 2005. This work is based on an idea by Owen Kaser: what happens if we apply multidimensional databases (OLAP) to literary research?

Data mining and information retrieval techniques are used routinely for literary research and for text processing in general, but the decision support techniques common in the business world (sometimes called “Business Intelligence”) have not yet seen much use in text processing. The main difference between decision support systems and data mining is that in decision support the user remains in control, so simple yet extremely efficient algorithms are favoured over sophisticated but possibly expensive ones. Ideally, all decision support algorithms should be O(1) after accounting for precomputations. With storage now cheap enough to feel almost infinite, decision support research is due for a technological and scientific boom.

Computer-assisted reading and analysis of text has various applications in the humanities and social sciences. The increasing size of many electronic text archives has the advantage of a more complete analysis but the disadvantage of taking longer to obtain results. On-Line Analytical Processing is a method used to store and quickly analyze multidimensional data. By storing text analysis information in an OLAP system, a user can obtain solutions to inquiries in a matter of seconds as opposed to minutes, hours, or even days. This analysis is user-driven allowing various users the freedom to pursue their own direction of research.
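To make the “O(1) after precomputation” idea concrete, here is a minimal sketch, not the system described in the paper. It assumes three hypothetical dimensions (author, book, word) and made-up toy counts, and it precomputes every roll-up so that any query becomes a single dictionary lookup. The names `build_cube`, `query` and `ALL` are illustrative only.

```python
from collections import defaultdict
from itertools import product

ALL = "*"  # wildcard: "aggregate over every value of this dimension"


def build_cube(facts):
    """Precompute word counts for every roll-up of (author, book, word).

    `facts` is an iterable of (author, book, word, count) tuples.
    """
    cube = defaultdict(int)
    for *dims, count in facts:
        # Each fact feeds 2**3 cells: every dimension is either kept
        # as-is or rolled up to ALL.
        for cell in product(*[(d, ALL) for d in dims]):
            cube[cell] += count
    return cube


def query(cube, author=ALL, book=ALL, word=ALL):
    """Answer a query with one hash lookup; no scan of the raw facts."""
    return cube.get((author, book, word), 0)


# Toy data, invented purely for illustration.
facts = [
    ("author_a", "book_1", "love", 3),
    ("author_a", "book_1", "duty", 1),
    ("author_a", "book_2", "love", 2),
    ("author_b", "book_3", "love", 4),
]
cube = build_cube(facts)
print(query(cube, word="love"))                     # total for "love"
print(query(cube, author="author_a"))               # all words by author_a
print(query(cube, author="author_a", word="love"))  # "love" by author_a
```

The trade-off is the one hinted at above: the cube stores up to eight cells per fact in exchange for constant-time answers, which only makes sense when storage is cheap.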

Wednesday, October 12th, 2005

Logitech USB Desktop Microphone under Linux

Filed under: — Daniel Lemire @ 21:09

I got my new Logitech USB Desktop Microphone working under Linux. It should have been very easy, but I hit a small snag.

Plug the device in and type “lsusb”; you should see:

Bus 001 Device 004: ID 0556:0001 Asahi Kasei Microsystems Co., Ltd AK5370 I/F A/D Converter

Ah! The device is called AK5370.

Run “dmesg”; you should see two lines like these:

usb 1-3: new full speed USB device using ohci_hcd and address 4

usbcore: registered new driver snd-usb-audio

If you don’t see the second line, you have a problem. In my case, I didn’t have the usbaudio driver, so I only got the first line. I had to compile usbaudio. To do so, I ran “uname -a”, which gave me “Linux romeo 2.6.10-gentoo-r6”. I went to /usr/src/linux-2.6.10-gentoo-r6 and typed

genkernel --no-clean --menuconfig all

Next, after the menu opened up, I went under driver/audio and chose usb audio drivers (and loadable modules). Exiting genkernel launched the compilation of the module and all I had to do was to unplug/replug my microphone. You should check that /dev/dsp1 appears.

All I had to do after this was to launch mhwaveedit and choose “hw:1,0” as my recording device, so that I would record from my microphone rather than from my sound card. Setting the sampling rate to 44100 Hz seemed to be necessary.

To enable the microphone under KDE, launch kmix and choose the appropriate device. If you don’t see the device, quit kmix (through the File menu) and restart it. That said, I don’t see why you would need the microphone under KDE. In any case, make sure you turn the gain all the way up for optimal sound quality.

Voilà! Isn’t Linux friendly?

For recording tips, see this page by Bob Cunningham.

Update: sometimes you might have to force the driver to load by running “modprobe snd-usb-audio”. In theory, modprobe shouldn’t be necessary, as devices should be recognized automatically, but I sometimes need to help my kernel a bit. (Bugs?)
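The two checks from this post (is the device listed by lsusb, and did the kernel register the audio driver) can be sketched as a small Python helper. This is only an illustration: the function names are hypothetical, the line formats are assumed to match the samples shown above, and I understand some kernels word the dmesg message as “registered new interface driver”, so the pattern accepts both.

```python
import re

# Matches lines such as:
#   Bus 001 Device 004: ID 0556:0001 Asahi Kasei ... AK5370 I/F A/D Converter
LSUSB_LINE = re.compile(
    r"Bus (?P<bus>\d+) Device (?P<device>\d+): ID "
    r"(?P<vendor>[0-9a-fA-F]{4}):(?P<product>[0-9a-fA-F]{4})\s*(?P<name>.*)"
)

# Assumption: the kernel log says "registered new driver <name>" or
# "registered new interface driver <name>".
DRIVER_LINE = re.compile(r"registered new (?:interface )?driver (?P<driver>\S+)")


def find_usb_device(lsusb_output, needle):
    """Return the parsed fields of the first lsusb line whose description
    contains `needle` (case-insensitive), or None if there is no match."""
    for line in lsusb_output.splitlines():
        match = LSUSB_LINE.match(line.strip())
        if match and needle.lower() in match.group("name").lower():
            return match.groupdict()
    return None


def driver_registered(dmesg_output, driver="snd-usb-audio"):
    """Return True if the kernel log reports that `driver` was registered."""
    for line in dmesg_output.splitlines():
        match = DRIVER_LINE.search(line)
        if match and match.group("driver") == driver:
            return True
    return False


if __name__ == "__main__":
    # Sample text copied from the post; on a real machine you would feed in
    # the output of the lsusb and dmesg commands instead.
    lsusb_sample = (
        "Bus 001 Device 004: ID 0556:0001 Asahi Kasei Microsystems Co., Ltd "
        "AK5370 I/F A/D Converter"
    )
    dmesg_sample = (
        "usb 1-3: new full speed USB device using ohci_hcd and address 4\n"
        "usbcore: registered new driver snd-usb-audio"
    )
    print(find_usb_device(lsusb_sample, "AK5370"))
    print(driver_registered(dmesg_sample))
```

If `driver_registered` returns False while the device is listed, that is the situation described above, where the driver has to be built or loaded with modprobe.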

Monday, October 10th, 2005

Doing the Martin Shuffle

Filed under: — Daniel Lemire @ 21:52

Through Will’s I got to the Martin Shuffle, which is a cool randomized algorithm to quickly find songs on an MP3 player (without browsing them one by one). They implement a nice Markov Decision Process using my favorite language: Python.

Academic Authorship

Filed under: Academia/Research — Daniel Lemire @ 14:05

I don’t know where this comes from, but Yuhong seems upset:

A professorship is not only a position to do research, but also a resource to exploit the other’s work by acquiring the authorship.

She’s right, of course. If you seek fame and fortune through a professorship, you have to become an “academic entrepreneur” where you seek to employ people (read: students) at the lowest possible wage (what she describes as “slave labor”) so that they do the kind of research that can sustain large research grants.

Last year, I chatted with some graduate students and realized that students actually enjoy working for such professors, not only for the money but also because they feel they are getting better training than they would with a lone crazy professor. Let’s face it: the factory model has something comforting, even for the students. Working with a lone crazy professor means you won’t have fixed deadlines or fixed research subjects.

I simply think that a professorship is a very open ended career. There are many models, and some of them are hard to compare. I believe this derives from “academic freedom”. In practice, as long as you can find a significant number of peers to vouch for the quality of your work, no matter how you achieve it, then you are ok.

However, some routes are more rewarding, or better rewarded, than others. Some research topics are better funded than others: Bin Laden detectors are better funded than graph theory theorems. And why not? I think it is healthy.

As long as professors who do not work on Bin Laden detectors, who get small grants, or who have few students still keep their jobs and don’t get insulted publicly, we are OK and academic freedom is safe.

Update: I got too many insults by email. OK, I didn’t mean that the government should be funding Bin Laden detectors, only that it is fine for some subjects to be better funded than others.

Firefox 1.0.7: better memory management?

Filed under: — Daniel Lemire @ 13:36

I love Firefox, but one thing that’s causing me grief is its memory leakage. On my Gentoo box, Firefox 1.0.6 would quickly eat up to 55% of my available memory. I had to kill Firefox every two days to keep my machine working. I upgraded to 1.0.7 yesterday, and it runs smoothly using only 25% of my available memory (according to “top”). Given that my browser is arguably the most important application running on my machine, and given that I’m unlikely to run two browsers at any one time, I don’t care if Firefox uses up to 25% of my memory, but please, no more. When I’m not browsing the web, I’ve got to do real work like research and teaching. Interestingly, the release notes don’t mention anything about improved memory usage. Also, my Mandrake box (running 10.1) doesn’t seem to have this problem, nor do my Windows boxes, irrespective of the Firefox version number. Anyhow, I’m crossing my fingers, hoping that Firefox will be well behaved this time.

Friday, October 7th, 2005

Google launches an online RSS aggregator

Filed under: — Daniel Lemire @ 20:08

Google did it, finally. They launched a beta of their RSS aggregator. It is still a bit immature, but I’m trying it out. I’m already a big fan of Gmail, which has become my sole email client.

Amazon’s Developer Contest

Filed under: — Daniel Lemire @ 8:35

Amazon is launching a web services developer contest:

Build an innovative and entrepreneurial application using Visual Studio 2005 with Amazon Web Services and you can win the grand prize: $5,000!
The first 100 entries will receive an Amazon Web Services t-shirt.

The deadline to submit your software application is December 31, 2005. If you ask me, the prizes are not very high, and the requirement to use Visual Studio is a bit sad, but a contest is a contest, and if I were a student, I’d participate.
