Monday, April 17th, 2006

ACM International Collegiate Programming Contest Final Results

Filed under: — Daniel Lemire @ 20:44

The ACM International Collegiate Programming Contest results are out. The top ten schools are given below.

  1. Saratov State University (Russia)
  2. Jagiellonian University - Krakow (Poland)
  3. Altai State Technical University (Russia)
  4. University of Twente (The Netherlands)
  5. Shanghai Jiao Tong University (China)
  6. St. Petersburg State University (Russia)
  7. Warsaw University (Poland)
  8. Massachusetts Institute of Technology (USA)
  9. Moscow State University (Russia)
  10. Ufa State Technical University of Aviation (Russia)

Asian Universities are beginning to dominate

Filed under: Academia/Research, Science and Technology — Daniel Lemire @ 19:54

Asian Universities are beginning to dominate the Top Institutions portion of the Top Scholars and Institutions survey conducted and published by the Journal of Systems and Software each year in October. In the latest survey findings, published in October 2005, among the leading institutions of the world based on counting the number of software engineering research publications emerging from them, three of the top five institutions are Asian. Korea Advanced Institute of Science and Technology is number one, National Chiao Tung University of China is number two, and Seoul National University of Korea is number 4. The non-Asian institutions in the top five are Carnegie Mellon University (including its Software Engineering Institute), at number three; and Fraunhofer Institute for Experimental Software Engineering, at number five. What is striking about this particular fact is that as recently as three years ago, in earlier such survey findings, Asian schools were only marginally represented in the top 10, and the top institutions were clearly North American.

Source: Robert L. Glass, Practical programmer: Is the crouching tiger a threat?, Communications of the ACM, Volume 49, Number 3 (2006), Pages 19-20.

Friday, April 14th, 2006

What are the computer language people waiting for?

Filed under: Data Warehousing and OLAP — Daniel Lemire @ 11:41

The glorious time when people could design a new insightful computer language is gone.

Or is it? In our Data Warehousing and OLAP classes, we cover MDX and various APIs for OLAP. Arguably, MDX is the de facto standard OLAP language. But as far as languages go, it is just ugly. Microsoft chose to mimic SQL closely and yet extend it dramatically into a multidimensional setting with a large dose of abstraction. I’ve never designed computer languages, but I’ve used them, and just as a painter can recognize a bad brush even if he can’t design one, I just don’t like MDX.

But even if I’m wrong, you can’t hope to teach MDX to a busy decision maker even if he has sufficient programming experience:

I believe that OLAP using MDX with Mondrian requires expert language knowledge and it would be very difficult for a user, with only domain knowledge, to be able to issue correct queries. (Hazel Webb)

What is needed is a simpler, easier language: something that someone who knows about control structures (loops and if clauses) and has a basic understanding of what a data cube is (drilling down, rolling up, slicing and so on) can quickly pick up and use, say within a day.

Would make a great Ph.D. thesis.
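
To make the cube vocabulary concrete, here is a minimal Python sketch of slicing and rolling up. It is not MDX and not any real OLAP API: the tiny `CUBE` table, the dimension names, and the `slice_cube` and `roll_up` helpers are all made up for illustration, and a real system would run against a proper cube rather than a dictionary.

```python
from collections import defaultdict

# Hypothetical fact table: (year, region, product) -> sales.
CUBE = {
    (2005, "East", "Widgets"): 10,
    (2005, "West", "Widgets"): 20,
    (2005, "East", "Gadgets"): 5,
    (2006, "East", "Widgets"): 7,
    (2006, "West", "Gadgets"): 3,
}
DIMENSIONS = ("year", "region", "product")


def slice_cube(cube, dimension, value):
    """Keep only the cells where the given dimension equals value."""
    axis = DIMENSIONS.index(dimension)
    return {cell: measure for cell, measure in cube.items() if cell[axis] == value}


def roll_up(cube, keep):
    """Sum the measure over every dimension not listed in keep."""
    axes = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for cell, measure in cube.items():
        totals[tuple(cell[a] for a in axes)] += measure
    return dict(totals)


# Total sales per year, then sales per product in the East region only.
sales_by_year = roll_up(CUBE, ["year"])
east_by_product = roll_up(slice_cube(CUBE, "region", "East"), ["product"])
```

With the sample data, `sales_by_year` is `{(2005,): 35, (2006,): 10}` and `east_by_product` is `{("Widgets",): 17, ("Gadgets",): 5}`. Two small functions already cover a fair share of everyday queries, which is roughly the level of simplicity I have in mind.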

Monday, April 10th, 2006

Slashdot: Why Is Data Mining Still A Frontier?

Filed under: Data Warehousing and OLAP — Daniel Lemire @ 21:31

Slashdot asks “Why Is Data Mining Still A Frontier?” The article itself is not very exciting, but the comments are great. Here are some I like:

I would suggest that, in practice, the real difficulty is that the problems that need to really be solved for data mining to be as effective as some people seem to wish it was are, when you actually get down to it, issues of pure mathematics. Research in pure mathematics (and pure CS which is awfully similar really) is just hard. Pretending that this is a new and growing field is actually somewhat of a lie.

Available datasets are not themselves in anything like normal relational form, and so have potential internal inconsistencies. And that gets in the way before you even have the chance to try to form intelligent inferences based on relations between data sets, which of course are terribly inconsistent.

The ultimate problem, is that for most datasets, there are an infinite (at least), set of relations that can be induced from the data. This doesn’t even address the issue, that the choice of available data is a human task. However, going back to assuming we have all the data possible, you still need to have a specific performance task in mind.

To sum it up:

  • Data Mining requires hard and fancy Mathematics.
  • Data cleaning and integration is hard.
  • There are infinitely many ways to mine data and it is not obvious a priori what is useful.

I think Data Mining is a beautiful research topic. However, as the comments indicate, it is very hard and it requires wide-ranging expertise.

Kunal Anand: Some XML exam questions

Filed under: — Daniel Lemire @ 10:19

Kunal has almost picked up my challenge on his blog: come up with deep homework questions having to do with XML.

  • Given at least 10 blog/link feeds, determine the top ten outbound URLs.
  • Parse an iTunes library file and capture all the unique artist/albums.
  • Given a user’s XML file from del.icio.us, determine the top 10 intersecting tags.
  • Scrape a dynamic list from a web site (i.e. the Google Zeitgeist) and serialize a well-formed Atom feed.

The last one seems like mostly hard labour, probably requiring quite a bit of fiddling.

The other ones are all interesting because they are examples of aggregation, which is not trivial to do in XSLT/XPath. Naturally, Kunal suggests solving these problems using a nice scripting language like Python, but solving them in XSLT is much more fun because it is harder.
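
For comparison, here is roughly what the scripting-language route might look like for the first question. This is only a sketch under assumptions: it treats every feed as RSS 2.0-style XML where each `<item>` carries its outbound URL in a `<link>` child, the function name `top_outbound_urls` is mine, and the two sample feeds are invented. Real feeds (Atom, or links buried inside post bodies) would need more work.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def top_outbound_urls(feed_documents, n=10):
    """Return the n most frequent <item><link> URLs across several feeds."""
    counts = Counter()
    for document in feed_documents:
        root = ET.fromstring(document)
        # Only links inside <item> elements count; the channel's own
        # <link> is skipped because it is not a child of an <item>.
        for link in root.findall(".//item/link"):
            if link.text and link.text.strip():
                counts[link.text.strip()] += 1
    return counts.most_common(n)


# Invented sample feeds, assumed to follow the RSS 2.0 layout.
FEED_A = """<rss version="2.0"><channel><title>A</title>
<link>http://example.org/feed-a</link>
<item><title>x</title><link>http://example.org/one</link></item>
<item><title>y</title><link>http://example.org/two</link></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>B</title>
<item><title>z</title><link>http://example.org/one</link></item>
<item><title>w</title><link>http://example.org/three</link></item>
</channel></rss>"""

ranking = top_outbound_urls([FEED_A, FEED_B], n=2)
```

The aggregation itself is a single `Counter`, which is exactly the part that takes real effort in XSLT 1.0, where grouping has to be expressed with keys.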

Friday, April 7th, 2006

CSWWS 2006: Call for Participation

Filed under: — Daniel Lemire @ 16:48

The 2006 Canadian Semantic Web Working Symposium will be held June 6th at Laval University, Quebec, Canada. We invite you to attend. This event will be held in conjunction with Canadian AI 2006.

  • On-line registration is available. THERE WILL BE NO ON-SITE REGISTRATION.
  • Industry partner and sponsor: OntoText (ontotext.com)
  • One plenary talk by Professor Michael N. Huhns, Director of the Center of Information Technology at the University of South Carolina, USA.
  • Two Tutorials:
    • State of Affairs in Semantic Web Services
      Michael Stollberg
      Leopold-Franzens Universität Innsbruck, Austria
    • MDA Standards for Ontology Development
      Dragan Gasevic, Dragan Djuric, Vladan Devedzic
      Simon Fraser University, Canada
  • Proceedings in the Semantic Web and Beyond series of Springer-Verlag:
    • A Trust Model for Sharing Ratings of Information Providers on the Semantic Web
      Jie Zhang, Robin Cohen
    • Ontoligent Interactive Query Tool
      Christopher Baker, Xioa Su, Greg Butler, Volker Haarslev
    • A Rule-based Approach for Semantic Annotation Evolution in the CoSWEM System
      Phuc-Hiep Luong, Rose Dieng-Kunt
    • Incorporating multiple ontologies into IEEE learning object metadata standard
      Phaedra Mohammed, Permanand Mohan
    • Applying and Inferring Fuzzy Trust in Semantic Web Social Networks
      Mohsen Lesani, Saeed Bagheri
    • Integrating Ontologies by Means of Semantic Partitioning
      Gizem Olgu, Atilla Elçi
    • DatalogDL: Datalog Rules Parameterized by Description Logic
      Jing Mei, Harold Boley, Jie Li, Virendrakumar C. Bhavsar, Zuoquan Lin
    • Completion Rules for Uncertainty Reasoning with the Description Logic ALC
      Volker Haarslev, Hsueh-Ieng Pai, Nematollaah Shiri
    • Fulfilling the Needs of a Metadata Creator and Analyst - An Investigation of RDF Browsing and Visualization Tools
      Shah Khusro and A. Min Tjoa
    • A Semantic Web Mediation Architecture
      Michael Stollberg, Emilia Cimpian, Adrian Mocan, Dieter Fensel
    • Resolution-based Explanations for Reasoning in Description Logic ALC
      Xi Deng, Volker Haarslev, Nematollaah Shiri
    • A Distributed Agent System Upon Semantic Web Technologies to Provide Biological Data
      Farzad Kohantorabi, Gregory Butler, Christopher J.O. Baker
    • Toward the Identification and Elimination of Semantic Conflicts for Integration of Ontologies
      Yevgen Biletskiy, David Hirtle, Olga Vorochek

    Birds of a Feather and poster sessions are also scheduled with the parallel session.

You can purchase the proceedings from Amazon.

Thursday, April 6th, 2006

Bill Gates is a cheap bastard!

Filed under: Business / Economics / Politics — Daniel Lemire @ 15:14

Here’s a picture of Bill Gates in his office.

I’m not impressed.
