Should Academia care about standards?

In the December 2006 edition of IEEE Computer, Simone Santini, from the Universidad Autónoma de Madrid, asks “Standards: What Are They Good For?” The gist of his argument is that using concepts like XML in Computer Science is harmful. He argues that there is no such thing as XML technologies since XML, like all standards, is nothing but a rather slimy mix of politics and industrial concerns. He says that no W3C standard deserves a Computer Science paper. (Does he know that the W3C has never issued a standard? It publishes Recommendations.)

Why? Mostly because “having academia operate on industrial principles makes about as much sense as having the industry operate on academic ones.” Interestingly, he argues that if Computer Science had been as focused on standards 40 years ago as it is now, we would still be programming in Fortran and Cobol on OS/360 operating systems.

Do you see the problem with his argument? On the one hand, academia should not operate on industrial principles; on the other hand, academic research is somehow meant to lead to industrial progress (such as Java, C++ and Windows XP?).

I like the guy, for sure, just because he dares to publish this in IEEE Computer… an engineering publication. Do you know people more obsessed with standards than engineers? He is not going to win a popularity contest, and I am amazed he managed to get this opinion piece to press.

Here are some arguments I would like to submit to him:

  • There is zero evidence that having researchers interested in standards, such as people studying XML, has a harmful effect on the pace of technology. It may harm some theoretical research work, but I seriously doubt it.
  • Computer Science, as a branch of Mathematics, should not care about XML or the W3C. But academic research does not stop at Computer Science. Information Technology is an increasingly important discipline, and it cares very much about Web standards. Moreover, Computer Science is sometimes considered an engineering discipline (think about “software engineering”), and engineers need to care about standards. Finally, Computer Science should be an empirical discipline and not just a branch of Mathematics; in this respect, it should care about standards and study them from a scientific point of view (maybe to improve them!). For example, XML is a bit more than just an “unranked labeled tree”: it is one of the most interesting phenomena in Information Technology that I know of. Choosing not to study XML would be like choosing not to study the Web.
  • Not all standards are ugly compromises. It often goes down that way, but some standards are fun and interesting.
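
To make the “unranked labeled tree” view of XML concrete, here is a minimal Python sketch using the standard library’s xml.etree.ElementTree. Each element is a node labeled by its tag, and a node may have any number of ordered children, which is what “unranked” means. The sample document and the helper `shape` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A made-up document: <book> has four children, and <author> appears three times.
SAMPLE = "<book><title>T</title><author>A</author><author>B</author><author>C</author></book>"

def shape(node):
    """Return the tree as nested (label, [children]) pairs, ignoring text and attributes."""
    return (node.tag, [shape(child) for child in node])

tree = shape(ET.fromstring(SAMPLE))
print(tree)
```

Running it prints `('book', [('title', []), ('author', []), ('author', []), ('author', [])])`: nothing in the tree model fixes how many children a node may have.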

So, my answer is clear: yes, academia should care about standards, despite Simone Santini’s point of view.

(Disclaimer: I do not think that any of my papers have been standards-centered, and most, if not all, are not standards-aware. I do not generally write papers about XML or XQuery or XHTML. My point, though, is that such work is perfectly legitimate.)

Christmas parties are more fun with kids

[Photos: lohan_avion.JPG, louka_police.jpg]

Yes, there is snow in Montreal

I think this is our first snowman picture!

[Photo: PC2600191.JPG]

A Better Alternative to Piecewise Linear Time Series Segmentation

I will present my paper A Better Alternative to Piecewise Linear Time Series Segmentation at SIAM Data Mining 2007 in April. The paper is available on arxiv.org (pdf).

Here’s the abstract:

Time series are unstructured data; they are difficult to monitor, summarize and predict. Segmentation organizes time series into few intervals having uniform characteristics (flatness, linearity, modality, monotonicity and so on). For scalability, we require fast linear time algorithms. The popular piecewise linear model can determine where the data goes up or down and at what rate. Unfortunately, when the data does not follow a linear model, the computation of the local slope creates overfitting. We argue against declaring as flat, intervals where the slope is not significant. We propose an adaptive time series model where the polynomial degree of each interval varies (constant, linear and so on). Given a number of regressors, the cost of each interval is its polynomial degree: constant intervals cost 1 regressor, linear intervals cost 2 regressors, and so on. Our goal is to minimize the Euclidean (l_2) error for a given model complexity. Experimentally, we investigate the model where intervals can be either constant or linear. Over synthetic random walks, historical stock market prices, and electrocardiograms, the adaptive model provides a more accurate segmentation than the piecewise linear model without increasing the cross-validation error or the running time, while providing a richer vocabulary to applications. Implementation issues, such as numerical stability and real-world performance, are discussed.
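
The cost model in the abstract can be illustrated on a single interval. The Python sketch below computes the squared Euclidean (l_2) error of the best constant fit (1 regressor) and of the best least-squares line (2 regressors) over one interval. It is not the paper’s segmentation algorithm, which must also decide where intervals start and end under a total regressor budget; the function names are hypothetical.

```python
def constant_error(y):
    """Squared l_2 error of the best constant fit (the mean); costs 1 regressor."""
    mean = sum(y) / len(y)
    return sum((v - mean) ** 2 for v in y)

def linear_error(y):
    """Squared l_2 error of the least-squares line over x = 0..n-1; costs 2 regressors."""
    n = len(y)
    if n < 3:
        return 0.0  # a line passes exactly through one or two points
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (v - y_mean) for x, v in enumerate(y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return sum((v - (intercept + slope * x)) ** 2 for x, v in enumerate(y))

ramp = [1.0, 2.0, 3.0, 4.0]
print(constant_error(ramp), linear_error(ramp))  # 5.0 0.0
```

On a perfectly linear ramp, the extra regressor removes all of the error; on a flat interval, it buys nothing, which is the intuition for letting each interval pick its own degree.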

Is this web page trying to sell me something?

Mindset is a research program to train software to recognize commercial pages. One application of this tool is that you can try to exclude commercial pages from the result set.

Of course, if this is as good as spam filtering, people will only be partly happy with the results. And yes, there are many commercial Web sites trying to pass themselves off as non-commercial. Everyone is out to sell something, after all.

Interesting question: would you ever want to do the reverse, that is, exclude non-commercial content?

(Source: Turney.)

We do not need to teach math and science

Roger Schank, a math whiz, says we do not need to teach math and science.

What (…) makes no sense is the idea that math and science are important subjects. You can live a happy life without ever having taken a physics course or knowing what a logarithm is.

On the other hand, being able to reason on the basis of evidence actually is important. Thinking rationally and logically is important. Knowing how to function in a world that includes new technology and all kinds of health issues is important. Knowing how things work and being able to fix them and perhaps design them is important.

Let’s get serious. We don’t need more math and science. We need more people who can think.

Of course, while I agree with his point, he is being a bit too hasty. If you want to run a business, you need to know that if you save 10% and then save 10% again, you do not save 20% (you save 19%). You need to know that if you sell something for US$10 and US$1 = CAN$1.5, then you sell it for CAN$15. So, we do need to teach mathematics, if only because everyone (at least in Canada) is responsible for filling out tax forms. Yes, you can get your tax forms filled out by an accountant, but, in principle, you are the one responsible for any mistake made.
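
Both business calculations can be checked with exact rational arithmetic; Python’s fractions module avoids floating-point rounding. Two successive 10% savings leave you paying 90% of 90%, and a US$10 price at US$1 = CAN$1.5 comes to CAN$15. The variable names are only for illustration.

```python
from fractions import Fraction

# Two successive 10% savings: you pay 90% of 90%, i.e., 81% of the original price.
rate = Fraction(1, 10)
remaining = (1 - rate) * (1 - rate)
total_saving = 1 - remaining
print(total_saving)  # 19/100, not 20/100

# Converting a US$10 price at US$1 = CAN$1.5.
price_usd = Fraction(10)
cad_per_usd = Fraction(3, 2)
price_cad = price_usd * cad_per_usd
print(price_cad)  # 15
```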

You do not need to know what a logarithm is? It depends on what you do for a living. If you are a programmer and you need to sort a bunch of entries, you need to understand that the first algorithm you think up (typically Bubble sort) is not going to cut it if you have to sort 10,000,000 entries. To really understand the issue, you need the concept of a logarithm.
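
A back-of-the-envelope count shows where the logarithm comes in. This Python sketch compares the worst-case number of comparisons of Bubble sort, n(n-1)/2, with the roughly n log2 n comparisons of an O(n log n) sort such as merge sort. It counts comparisons only and ignores constant factors, so it is an estimate, not a benchmark.

```python
import math

n = 10_000_000
# Worst-case comparisons for Bubble sort: n(n-1)/2.
bubble_comparisons = n * (n - 1) // 2
# Order of magnitude for an O(n log n) sort such as merge sort: about n log2 n comparisons.
nlogn_comparisons = n * math.log2(n)
ratio = bubble_comparisons / nlogn_comparisons
print(f"{bubble_comparisons:.2e} vs {nlogn_comparisons:.2e} (ratio about {ratio:,.0f})")
```

For 10,000,000 entries, the quadratic count is on the order of 5 × 10^13 comparisons against roughly 2.3 × 10^8, a factor of roughly 200,000.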

What about trigonometry? Most people doing non-trivial work in a factory need a basic understanding of trigonometry.

I must admit that I have little use for the chemistry I learned, but then, I have little use for the geography either. Who needs to know where Val d’Or is?

Where he is right, however, is that math and science are not as important as some lobbies make them out to be. I know a lot about how to solve nonlinear differential equations: far more than I need, considering that I haven’t seen a nonlinear differential equation in nearly 10 years.

But to be fair, education takes a long time to adjust. My education was modeled after the space race. We were all to be rocket designers or astronauts (or maybe cosmonauts if you were a pessimist). So physics, differential equations, algebra, and so on, were thought to be central. It turns out that rocket science is a fringe subject. Few people work in the space industry.

In Computer Science, for a long time, we thought that we were limited by our computing resources, so designing extra-efficient algorithms was very important. As it turns out, Tim Berners-Lee convinced me that we are not building the future out of algorithms.

So, one generation teaches the next what they think is most important. The old generation is almost always wrong. Yet, it almost seems not to matter because apparently, we manage to go forward.

Nothing to worry about here. The Americans will not go bankrupt because people in Bulgaria are twice as good at math. If you learn math, you probably learn to think straight, but there are other ways to learn to think straight.

(Source: Downes.)

My predictions for year 2007

It is that time of the year again. Before I share my predictions for 2007, let us look at how well I did with my 2006 predictions.

  • Microsoft Flight Simulator X comes close but this prediction didn’t come true because optical disks are not doing so well and the video game market is disappointing: 100 GB of storage on a single optical disk will be common by the end of 2006. Amazing video games using upward of 30 GB will come on the market and impress reviewers. I imagine a flight simulator containing the complete maps of the entire planet including every single house.
  • Mostly true; see, for example, Google Analytics. Google will still be the most interesting player by the end of 2006. They will leverage their massive storage capacities to do amazing Data Mining and they will know, better than anyone else, what the pulse of the planet is. Google will start analyzing social trends and will get into decision support.
  • Not true. Generally speaking, year 2006 will be the year Data Mining becomes mainstream. Data warehousing will increasingly be a big deal for large corporations and we will see shortages in Data Engineering.
  • I say this one came true. Thanks in part to fancy open source content management software, eLearning will grow in most universities. By the end of 2006, we won’t be asking “why eLearning” but “how eLearning”.
  • Clearly, the Netflix contest and the surge of interest in collaborative filtering make this one true. eCommerce will all be about personalization and Data Mining, and much less about work flow and web site design.

So, that’s 3 true predictions and 2 failed predictions. That’s slightly better than my 50% rate of last year. Let’s be daring for 2007:

  • We will see something like “Google Games”.
  • We will see something like “Google Slides/PowerPoint”. Google will offer a full office suite on the Web and it will be pretty good for 80% of the office tasks.
  • Governments will take tougher measures to stop spam and other illegal online behavior. We will see a lot more cybercops around.
  • Television will become more irrelevant than ever.
  • Apple will continue to grow and gain mindshare.
  • Since all machines will be connected all the time on the Web, OS-agnostic Web-based office software will be a big deal by the end of 2007 and it will start to make a dent in Microsoft’s monopoly to the point where Microsoft will have to acknowledge it and start reacting, in some way. We will come to see this as the end of an era: the operating system and office software will become secondary. The Open Document Format will gain some real mindshare, mostly in Europe.
  • Ontologies, queries by natural language processing, Semantic Web, all these things will fail to make a dent in Google’s monopoly.
  • Blogging will still be popular. Maybe the number of blogs will go down, but the quality of the remaining blogs will be good and the technology will improve. There will be tricks beyond pings and trackbacks to network the various blogs.
  • Western universities will increasingly focus on continuing education. We will see more and more quality offers to complete one’s education with a master’s degree or certificate taken online. While it has so far been a secondary, and not so interesting, cash cow, it will become a vital issue in many universities as the number of foreign students starts to diminish.
  • Video blogging will be common: I’ll be subscribed to at least two video blogs.
  • Videoconferencing will be mainstream. My wife and my colleagues will be using it regularly. We will finally have “phones with pictures”, though we will be using our computers to get the desired effect.
  • Within academia, posting talks on the web using digital video will become common.
  • The WS-* SOA stack will still go nowhere.
  • For less than $4000, I will be able to buy a PC, or the equivalent, with 10 TB of storage.
  • Carrying a laptop will be out. People will carry tiny computers, such as cell phones, because laptops are too large to be convenient. With most of our data and applications on the Web, we will stop breaking our backs. Hotels will start offering nice computers you can use to do real work.

Roleplaying is Indicative of a Delusional Mind?

According to Wired, the Israeli army does not like roleplaying:

Israeli officials view a fondness for Dungeons and Dragons (D&D) as being indicative of a delusional mind, RPGers are out of touch with reality. “The game indicates a weak personality,” one security official said. “One of the tests we do, either by asking soldiers directly or through information provided us, is to ask whether they take part in the game,” he added. “If a soldier answers in the affirmative, he is sent to a professional for an evaluation, usually a psychologist.”

Ok. Where to start?

Back when I was in high school, we did play D&D, hidden in a corner under the stairs. Yes, I admit it, I was a nerd. Oh! Didn’t you know that I’m a nerd? Though I prefer “hacker” these days.

Back then, there were serious concerns that playing D&D was like being in a dangerous cult. I was attending a very religious institution whose motto is now, roughly, “educate oneself to serve better” (s’instruire pour mieux servir), I kid you not! All the adults around me were horrified to hear about what we were doing. Things took a turn for the worse when we developed an interest in Lovecraft. As far as I know, none of us turned into a dangerous killer. One of us is a senior software engineer, another one is a robotics engineer, and the last one… I lost track. My point is that there is zero evidence that we turned into evil people, though we all gravitated toward nerdy jobs.

Now, I learn that in Israel, I would have been sent to a psychologist?

Ok. Let us face the truth: we all play roles. That is what life is. You can turn yourself into a soldier. Yes, you might kill people or get killed, but deep down, this is roleplaying. When American soldiers go to Iraq to fight for democracy, is this reality? What is real?

Roleplaying has nothing to do with being delusional. These people know that they are not facing a freaking dragon in the physical world. Nevertheless, this dragon is real in that they interact with it, learn from it, and eventually can even kill it. This dragon might change their life (for example, they may get defeated and sink into depression… who knows?).

While we are at it, I hate the way we use the word virtual. Second Life is not a virtual world. The Web is not virtual. Email relationships are not virtual. They are real relationships, real worlds. A world where people dress up, meet, have fun, learn, get to know each other, build things… that is a real world.

This is not a simple matter of semantics. Saying that things are delusional or virtual amounts to dismissing them as less important.

Which one is more important? My so-called virtual identity on the web (represented best by this blog) or my so-called real identity (my physical body)? To many of the readers of this blog, my virtual identity is far more important since they never get to meet me. So, this Web identity is not virtual at all! Not for them! But my physical body might as well be virtual, for them.

We know of Jesus (sorry, I’m not religious at all, but this is a good example) through books. How is it different from knowing someone through email? Is Jesus real? Many people think so. Christians and Muslims do, at least. Is he virtual? Any more so than me?

What is virtual, then? It is a representation of what is. The software model of a store is virtually the store: it represents the store, but it is not a store. Be careful, though. Amazon.com is an actual store, not a virtual store! But my blog is a virtual notebook. It is not a notebook! But it can represent a notebook. If I use an icon to represent my identity, it is a virtual identity. But my identity on the Web is not virtual. In object-oriented programming, the objects and classes are virtual representations of the components of your system: the software class “Student” is a virtual representation of a student.

Please send this post to the Israeli army. Anyone?

Collaborative Filtering Made Easy

Bryan O’Sullivan wrote an excellent Collaborative Filtering tutorial. He gives the first Python implementation of the Slope One collaborative filtering algorithm that I know of. Excellent work.
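
For readers who want the gist without leaving this page, here is a minimal weighted Slope One sketch in Python. It is not O’Sullivan’s code; the data layout (a dictionary mapping each user to a dictionary of item ratings) and the function names are assumptions made for illustration. The idea: compute the average rating difference between every pair of items, then predict a user’s rating for an item by adding these differences to the user’s existing ratings, weighting each term by the number of users who rated both items.

```python
def build_deviations(ratings):
    """dev[i][j] = average of (r_i - r_j) over users who rated both i and j."""
    dev, count = {}, {}
    for user_ratings in ratings.values():
        for i, ri in user_ratings.items():
            for j, rj in user_ratings.items():
                if i != j:
                    dev.setdefault(i, {})
                    count.setdefault(i, {})
                    dev[i][j] = dev[i].get(j, 0.0) + (ri - rj)
                    count[i][j] = count[i].get(j, 0) + 1
    for i in dev:
        for j in dev[i]:
            dev[i][j] /= count[i][j]
    return dev, count

def predict(user_ratings, item, dev, count):
    """Weighted Slope One prediction of `item` for one user, or None without evidence."""
    numerator, denominator = 0.0, 0
    for j, rj in user_ratings.items():
        c = count.get(item, {}).get(j, 0)  # users who rated both `item` and j
        if j != item and c > 0:
            numerator += (dev[item][j] + rj) * c
            denominator += c
    return numerator / denominator if denominator else None

ratings = {
    "alice": {"A": 5, "B": 3, "C": 2},
    "bob": {"A": 3, "B": 4},
    "carol": {"B": 2, "C": 5},
}
dev, count = build_deviations(ratings)
print(predict(ratings["carol"], "A", dev, count))  # 4.333... (13/3)
```

Here A is rated 0.5 above B on average (two users) and 3 above C (one user), so Carol’s prediction for A is ((2 + 0.5) × 2 + (5 + 3) × 1) / 3 = 13/3.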

slidePresenter: Very Good (Free) Remote Slides Presentation Software

I have been looking long and hard for a way to project slides remotely. I once proposed, on my site, that the ideal solution might be AJAX-based. I used Webhuddle with some success and even wrote a script to convert PDF files to a zip file of gif images for this purpose, but I don’t like to rely on a Java applet. I also tested Vyew, which is AJAX-based, but found that it had unacceptable limitations (like the 50-slide limit!).

Finally, I found what I was looking for! slidePresenter is a great piece of software you can install on your server. It requires PHP, and it is free software.

It works in a very simple manner. Clients connect to your site and see an image. You also see the image, but you can press a forward (or backward) arrow to move the presentation forward (or backward). As you change slides, the clients automatically change slides too. Installing it takes about 30 seconds. Uploading your slides is a bit trickier. If you have PowerPoint or PDF slides, you first have to convert them to images. In my case, I had to do some manual work to get slidePresenter to see my presentation, but the author will no doubt improve this over time.
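
The synchronization idea fits in a few lines. This Python sketch is hypothetical (slidePresenter itself is written in PHP, and its internals may differ): the server keeps a single current-slide index, the presenter moves it forward or backward, and each client polls and switches image only when the index has changed. The class, the method names and the slide-N.png file naming are assumptions.

```python
class SlideSession:
    """Shared state for one remote presentation (hypothetical sketch)."""

    def __init__(self, num_slides):
        self.num_slides = num_slides
        self.current = 0  # zero-based index of the slide everyone should see

    def forward(self):
        # Presenter presses the forward arrow; stay on the last slide at the end.
        self.current = min(self.current + 1, self.num_slides - 1)

    def backward(self):
        # Presenter presses the backward arrow; stay on the first slide at the start.
        self.current = max(self.current - 1, 0)

    def poll(self, last_seen):
        """Client check: return the image to display if the slide changed, else None."""
        if self.current == last_seen:
            return None
        return f"slide-{self.current + 1}.png"

session = SlideSession(num_slides=3)
print(session.poll(last_seen=0))  # None: the client already shows the first slide
session.forward()
print(session.poll(last_seen=0))  # slide-2.png
```

In this sketch, polling means the server never tracks individual clients, which suits a plain web host; pushing updates would cut latency but needs more infrastructure.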

Assuming you have a videoconference setup, using Skype for example, then you are ready to give (free) talks all over the world.
