Friday, August 18th, 2006

Embedding fonts for IEEE

Filed under: — Daniel Lemire @ 16:22

IEEE requires that your PDF files embed all fonts. Earlier, I told you how to get pdflatex to embed all its fonts, but what if you are including figures whose fonts are not embedded and you are getting desperate? Luke Fletcher gives us the solution.

  1. convert to ps: pdftops ICRA05.pdf
  2. convert back to pdf using prepress settings: ps2pdf14 -dPDFSETTINGS=/prepress ICRA05.ps
  3. check new ICRA05.pdf for horrendous formatting errors due to double conversion.

(Source: Owen Kaser.)
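If you have many files to fix, the two conversions above can be scripted. Here is a minimal Python sketch, assuming pdftops and ps2pdf14 are installed and on your PATH. The function names and the "-embedded" output suffix are my own illustrative choices, not part of the original tip: writing to a new file keeps the original PDF around so you can compare the two.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(pdf_path, output_suffix="-embedded"):
    """Return the two command lines for the PDF -> PS -> PDF round trip."""
    pdf = Path(pdf_path)
    ps = pdf.with_suffix(".ps")
    # Assumption: write to a new file instead of overwriting the original,
    # so both versions can be compared afterwards.
    out = pdf.with_name(pdf.stem + output_suffix + ".pdf")
    return [
        ["pdftops", str(pdf), str(ps)],
        ["ps2pdf14", "-dPDFSETTINGS=/prepress", str(ps), str(out)],
    ]


def embed_fonts(pdf_path):
    """Run both conversions; raise if a tool is missing or a step fails."""
    for cmd in build_commands(pdf_path):
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} not found on PATH")
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for cmd in build_commands("ICRA05.pdf"):
        print(" ".join(cmd))
```

Step 3 still has to be done by eye. If you have the pdffonts tool that usually ships alongside pdftops, running it on the new file should list each font with a column saying whether it is embedded.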


A Tectonic Shift in Global Higher Education

Filed under: Academia/Research — Daniel Lemire @ 10:34

(…) India, which accounts for a quarter of the developing world’s population and has the third largest higher education system in the world. Today, 23 percent of all higher education enrollments in India are in distance education–specifically in 13 national and state open universities and 106 institutions, mostly public, that teach both on campus and by correspondence. The government’s target is that by 2010, 40 percent of all higher education participation will take place using distance education.

(A Tectonic Shift in Global Higher Education, Sir John Daniel et al., August 2006.)

Thursday, August 17th, 2006

Google launches online, shareable, spreadsheet tool!

Filed under: — Daniel Lemire @ 12:12

Google has done it again! Spreadsheets.google.com offers free (as in “no money”) shareable, online spreadsheets. The UI feels a lot like Excel and you can save and load Excel documents. Unfortunately, it does not appear to support the Open Document Format. Unlike with Excel, you can easily share your Google spreadsheet.


Wednesday, August 16th, 2006

Get an RSS feed of your favorite researcher

Filed under: Academia/Research, Open Access — Daniel Lemire @ 20:11

Want to monitor the publications of a researcher? As long as he submits his papers to arXiv.org and/or Cogprints, you can use citebase to get an RSS feed: enter the author’s name, do a search, then click on the RSS link. ArXiv also has RSS feeds if you are only interested in this particular repository.

Source: Peter Turney.

Further reading: See my earlier post on this topic.

Technique without theory or theory from technique? An examination of practical, philosophical, and foundational issues in data mining

Filed under: Data Warehousing and OLAP — Daniel Lemire @ 11:44

Korukonda wrote a paper in the Journal of Human-Centred Systems (AI & Society) calling into question the (philosophical) foundations of Data Mining as a science. He puts it quite bluntly:

the current status of Data Mining as an intellectual discipline is tenuous.

This is of particular interest to me since I consider myself a Data Mining researcher. Unlike Korukonda, however, I consider that Data Mining can be equally user-driven (as in OLAP) or data-driven. I do not think that there is a well-established definition of Data Mining, except that we all agree it has to do with analyzing lots of data. The lack of a theoretical foundation for Data Mining is a well-known problem which was explicitly identified during IEEE Data Mining 2005 as one of the top 10 challenges facing the community.

Korukonda makes some good points. First of all, he reminds us that “Knowledge from Data” can be misleading. He gives the example of “idiot correlation” where the wrong hypothesis is tested. He takes an example from the New York Times where a reporter wrote that they noticed a 4.5% increase in sales at Red Lobster in March despite the war in Iraq. Why would this be interesting? It is well known that when working over large data sets, you will invariably find surprising relations between different variables, but these relations are not necessarily meaningful. The fact that they are well supported by the historical data is not sufficient to make them useful. Statistically, because the number of candidate relations grows exponentially with the number of variables, some of them will fit the historical data by chance alone. A review of the Data Mining literature would prove him right: many researchers design systems to blindly find relationships. This is one reason why I prefer user-driven Data Mining as in OLAP where meaningless relations are automatically discarded by the user.
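To see how easily chance alone produces "well supported" relations, here is a small Python sketch. It is my own illustration, not something taken from Korukonda's paper: the number of series (200), their length (12 points, think of a year of monthly figures), the 0.8 threshold and the seed are all arbitrary assumptions. Every series is independent random noise, so any strong correlation it finds is meaningless by construction.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def count_strong_pairs(n_series=200, n_points=12, threshold=0.8, seed=1):
    """Count pairs of independent noise series with |r| above threshold."""
    rng = random.Random(seed)
    # Each series is pure Gaussian noise, unrelated to every other series.
    series = [[rng.gauss(0, 1) for _ in range(n_points)]
              for _ in range(n_series)]
    hits = 0
    for i in range(n_series):
        for j in range(i + 1, n_series):
            if abs(pearson(series[i], series[j])) > threshold:
                hits += 1
    return hits


if __name__ == "__main__":
    total_pairs = 200 * 199 // 2
    print(count_strong_pairs(), "of", total_pairs,
          "unrelated pairs have |r| > 0.8")
```

With only 200 variables there are already 19,900 pairs to test, so even a rare coincidence is expected to show up a number of times. A system that blindly reports the strongest relations would report exactly these.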

Nevertheless, he correctly points out that even though the process is flawed, it can still be a useful paradigm. When facing large data sets, you can either give up or try something. Data Mining is the best paradigm we have for these types of problems.

Being a nice guy, Korukonda gives us the way out:

This focus needs to shift to the “why” questions, if DM is to establish itself as a scientific investigative tool or as a long-term solution to business problems. In other words, the outcome of DM should extend beyond discovery of patterns to finding causal explanations for the observed patterns and relationships.

Meanwhile, he cites Taschek who tells us that Data Mining is dead:

One reason data mining as a term failed was because data mining products did not work. Sure, the technology theoretically allowed companies to dig through historical data but it never lived up to its promise. (Taschek, eWeek, 2001)

I think that the Taschek quote must refer to data-driven Data Mining, because user-driven Data Mining is doing quite well with a worldwide market of $6 billion in software products alone.

Maybe the real question is whether it is sensible to take the human out of the loop. I always think that as long as strong AI escapes us, we have to keep the human in there because that’s our only chance for intelligence. Yes, paying an analyst to look at your data is expensive, but you can lower the costs by giving him the best tools money can buy.

Monday, August 14th, 2006

The Scare Effect

Filed under: — Daniel Lemire @ 15:06

Stephen cites Gwynne Dyer on these new scary terrorists who use liquid explosives:

Maybe it’s cynical, but there are strong grounds for suspecting that this is all a charade. If they infiltrated these terrorist cells many months ago and now have arrested most of the members, then why would they institute drastic new security measures on flights at this point? And did they really only realize in the last few days that explosives come in liquid form as well?

I’ve known since I was a kid watching western movies that there are super-powerful liquid explosives. Putting those in bottles and getting on a plane is not exactly rocket science.

Either we have been assuming that terrorists (Muslim or otherwise) are stupid people, or else, our governments think we are fools. They had to know, for years and years, that terrorists could use liquid explosives. Aren’t our governments supposed to anticipate threats?

If you infiltrate a terrorist cell and find out they are clever and have good tricks to blow up airplanes, why would you make this public? Surely, you want to tell other governments, but why the big media outburst with all the juicy details on how you can fit enough explosive in a toothpaste tube to blow up a plane? Wouldn’t it be a better strategy to just arrest these guys, go to court and put them in jail? Aren’t you just helping to promote terrorists by turning the public light on young fools? Whether the airplanes have been blown up or not, these guys have won by showing that you can have a huge worldwide impact just by plotting such a scheme. The proponents of the war on terror have also won by heating up the security issue once more.

Clearly, this is all part of a public relations scheme.

I hear some guys in the USA got arrested for having too many cellular phones (and being Muslims didn’t help). Yes, you can blow up planes using cellular phones as detonators. Yes, Muslims own cellular phones. Where are we going with this?

It looks like we are building a case against Islam. That’s what it looks like. Yes, I want terrorists arrested efficiently. Yes, I want sane security measures taken. But, no, I do not want a war on Islam. Sometimes, there are wars you cannot win.

(Disclaimer: I’m an atheist.)

IRMA 2007 - Data Warehousing and Mining track (October 1, 2006 / May 19-23, 2007)

Filed under: Data Warehousing and OLAP — Daniel Lemire @ 8:29

The IRMA 2007 International Conference has a Data Warehousing and Mining track organized by Andrew Kusiak and Qiang Zhu. The conference will be held in Vancouver.

A key to success for enterprises in today’s competitive markets is their ability to manage the staggering volumes and complexity of data from various sources in an efficient and economic manner. Data warehousing and mining have become prevailing technologies for data analysis, knowledge extraction and decision support in modern enterprises and organizations. The emergence of vast applications and other related software/hardware technologies continues to raise challenges (e.g., demand for real-time, active, mobile, parallel, distributed, secure, and spatio-temporal characteristics) for data warehousing and mining. The objective of this track is to provide a forum for researchers and practitioners to disseminate and exchange ideas on both the technical and managerial issues associated with data warehousing and mining.

Papers on pretty much every related topic are invited.
