Computer Science Departments will not survive

When asked what would happen if Computer Science departments do not collaborate in the creation of a new Academic field, Web Science, Ben Shneiderman, from the University of Maryland, said: “they will not survive.” At least, his point of view is clear.

Writing and Maintaining Software are not Engineering Activities

Raghavendra Rao Loka, an experienced software developer working for Synopsys, wrote a column for IEEE Computer (February 2007, pages 110-112), Software Development: What Is the Problem? He has written what I claimed earlier is known to anyone who has done real software development for a living, that is…

Many developers view software development (…) as a science or engineering activity(…) Writing software is neither: I view it as a craft or art, similar to the work required of teachers and writers. (…) So it’s not clear why we call software development software engineering. (…)

Compare this with what I wrote earlier…

(…) saying that software projects fail for lack of engineering is like saying that the latest Stephen King’s novel is boring because he forgot to draw a UML diagram of the book.

One nagging bit we inherit from the myth of software development as software engineering is schedules. Raghavendra Rao Loka could not be clearer on this topic: “I expect experts to be forthcoming about schedules and their irrelevance.”

Why is Computer Science Education Obsolete?

I have spent a great deal of time lately thinking about why Computer Science education fails to attract as many students as it once did. One of the most significant events of late has been the launch of the Web Science Research Initiative. Basically, Tim Berners-Lee has concluded that a Computer Science education is not an ideal foundation to study the Web. Why is that? I think the answer is probably related to why, if we discard Software Engineering, Computer Science is really the new Physics: attracting a few bright individuals, but failing to attract crowds.

Computer Science was founded by Mathematicians and Physicists as a Data Processing Science. And because Computers are Data Processing Devices, this new science would be the science of computers. What an attractive proposition!

Except that computers are not data processing devices: they evolved beyond this initial status. Modeling computers as Turing Machines is no longer useful in most cases. The Web is hardly a physical network of Turing Machines. Computers are very social and cultural devices. Building a new application like YouTube has very little to do with programming a data processing device. It would seem like Tim agrees with me:

Within computer science, Web-related research has largely focused on information-retrieval algorithms and on algorithms for the routing of information through the underlying Internet. Outside of computing, researchers grow ever more dependent on the Web; but they have no coherent agenda for exploring the emerging trends on the Web, nor are they fully engaged with the emerging Web research community to more specifically focus on providing for scientists’ needs. (Berners-Lee et al., Science, 2006)

Information Technology is more important than ever. Computers are more important than ever. But, alas, it does not follow that Computer Science is still relevant. I carefully studied many programs out there and it seems obvious to me that Computer Science education is not adapting to what computers are really used for in 2007. It is basically staying true to the vision of Computer Science as a data processing science.

See also my post More CS Ph.D.s than ever, what about research jobs?

TexMaker: a cross-platform LaTeX editor

I just installed TeXMaker. This is one of the best LaTeX editors I have come across. Better yet, it runs under Linux and MacOS. Version 1.5 just came out.

Taking charge of your IT

CIO (and Slashdot) report on the new trend where users take charge of their IT needs. I’m one of these pesky users. I have my own server (daniel-lemire.com) complete with wikis, blogs, version control and so on. I use Google Mail for my email. I manage my own computers.

This is not limited to IT, by the way. I also almost entirely bypass my school’s library. I use Google Scholar. All UQAM gives me is access to documents through a proxy.

This trend is not really new. Back 30 years ago, secretaries would type scientific papers. Today, you have to be a pretty rich researcher to have a secretary who types your papers and technical reports. (Admittedly, I’m more than happy to delegate some of my funding applications.)

Computers are about giving users more control, not less. We shall delegate less to human beings in the future, not more. But we will grow more dependent on computers. Let us hope they do not become evil!

This does not mean that we will all be out of a job in the near future. But if you do a lot of routine labor, and managing a server is routine labor, then plan on getting a better job. Either your job will get outsourced, or servers will get smarter.

JavaScript is interesting

If you think JavaScript (errr… ECMAScript) is uninteresting, think again! This Yahoo! talk ought to change your mind (part 1 of 3).

How artificial intelligences are already at war with us

In the most recent Communications of the ACM (February 2007), Joshua Goodman and his coauthors tell us, in Spam and the Ongoing Battle for the Inbox [1], that it is very difficult to build reliable CAPTCHAs, or (reverse) Turing tests, to differentiate machines from human beings. In the most reliable tests, machines had a success rate of 5%, whereas in other cases they had a success rate of 67%. A 95% failure rate may seem high, but it only means that the machine needs to try 20 times on average to succeed once. So you slow down the machine by a factor of 20 (in the best of cases), and since machines are thousands of times faster than human beings, you have achieved very little. They do not report human error rates, but I know that I fail Blogger’s tests routinely and I’m not an idiot (though you may think otherwise if you wish), not blind, and so on.
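The "20 times on average" figure is the mean of a geometric distribution: with success probability p per try, the expected number of tries is 1/p. A minimal sketch of that arithmetic, assuming each attempt is independent (the function name is made up for illustration):

```python
def expected_attempts(success_rate: float) -> float:
    """Mean number of independent tries needed to get one success."""
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    # Mean of a geometric distribution with success probability p is 1/p.
    return 1.0 / success_rate


# The two success rates quoted from the article.
for rate in (0.05, 0.67):
    attempts = expected_attempts(rate)
    print(f"success rate {rate:.0%}: about {attempts:.1f} attempts per solved test")
```

At a 67% success rate the machine needs fewer than two tries on average, so the slowdown is negligible.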

This is not just a theoretical concern. I have used visual CAPTCHAs before on my blog and they failed me. I still got spammed. The solution I now use is to apply a very simple CAPTCHA, but one that is unique to my blog. Since I am not a very popular blogger, I hope that spammers will not bother breaking my CAPTCHAs. If I ever, by some strange turn of events, became a popular blogger, my solution would be to routinely craft new CAPTCHAs.

This means that there are AI bots out there at war with legitimate bloggers.

To those who doubt AI can be used for evil purposes, well, there you go. There are people out there purposely designing AIs for evil (spamming is certainly unethical). We are not talking about the military. We are not talking about crazy scientists. We are talking about the worst kind of evil masterminds: greedy unethical capitalists.

[1] They cite Using Machine Learning to Break Visual Human Interaction Proofs by Chellapilla and Simard.

Crash course in sane Web programming

What the current SOAP fad has done is to make us forget how to build and deploy applications on the Web according to the true HTTP specification. Even Wikipedia is incredibly confused and confusing with respect to HTTP. HTTP is ridiculously simple, but widely ignored and misrepresented.

GET

Get some resource identified by a URI. This request should not change the state of the resource. The resource itself may change over time, however.

POST

Add a new resource (post a new message, a new comment, a new post, a new file) or modify an existing resource. The provided URI is not the URI of the new resource, but rather the URI of a related resource (for example, the URI of the blog or posting board).

PUT

Create or replace a resource having the given URI. This method is idempotent!

DELETE

Delete a resource.

What does this mean?

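  • A POST form should never replace a resource: replacing the resource at a given URI is what PUT is for. Unlike GET, however, POST is neither safe nor idempotent: submitting the same form twice may create two resources.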
  • GET queries are stateless. No matter who does the GET, the same result should come out. If I copy and paste a URL in my browser and pass it to someone else, they should end up with the same resource. A GET query cannot create, change or delete a resource. GETs are safe. I should always be able to follow a link without fear of deleting or buying something.
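To make the distinctions concrete, here is a minimal sketch of the four methods acting on a toy in-memory store. The `ResourceStore` class, its method names and the URIs are all hypothetical; this models the semantics described above, not an actual HTTP server.

```python
import itertools


class ResourceStore:
    """Toy server state keyed by URI (hypothetical, for illustration only)."""

    def __init__(self):
        self.resources = {}
        self._ids = itertools.count(1)

    def get(self, uri):
        # Safe: reads the resource and never changes server state.
        return self.resources.get(uri)

    def post(self, collection_uri, body):
        # Not idempotent: each call adds a new resource related to the
        # given URI, and the server picks the URI of the new resource.
        new_uri = f"{collection_uri}/{next(self._ids)}"
        self.resources[new_uri] = body
        return new_uri

    def put(self, uri, body):
        # Idempotent: create or replace the resource at exactly this URI.
        self.resources[uri] = body

    def delete(self, uri):
        # Deleting twice leaves the same state as deleting once.
        self.resources.pop(uri, None)


store = ResourceStore()
first = store.post("/blog/comments", "Nice post")
second = store.post("/blog/comments", "Nice post")  # a second, distinct comment
store.put("/blog/about", "v1")
store.put("/blog/about", "v1")  # repeating the PUT changes nothing
print(first, second, len(store.resources))
```

Repeating the POST yields two comments, while repeating the PUT still leaves a single `/blog/about` resource.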

As to why this might not work, see what Parand had to say about it.

Would you pass my XML course?

Some people will love this. I prepared a mockup exam for my INF 6450 students. See if you can pass it (in French, but you can probably grok most of it if only you know the basic XML vocabulary). I’m generally impressed by how well my students get by in this course. The full XML course is online, but requires you to have Firefox (warning: sometimes my server is slow).

According to “highly reputable” (well…) people, this is a Mickey Mouse course. But do not take their word for it, go see for yourself (with Firefox 2.0 or better). Indeed, there is no Software Engineering. No real Computer Science (as in, algorithms, data structures, and so on). Well, I do offer a real Computer Science course, but I still think that teaching XML is way cool and fully justified. It is a programming and IT course. Programming is fun. Getting by with crazy declarative languages like XSLT is hilarious. Figuring out how to do aggregations in XSLT is really a nasty problem (with several elegant and simple solutions). Figuring out how to intersect sets in XSLT, given that all you have is a union operator, is really fun too. And you never have a student ask you why he needs to learn this. Students see immediately why this is required to be a top-notch Web developer.
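For the curious, one commonly cited XPath 1.0 idiom for the intersection puzzle is `$a[count(. | $b) = count($b)]`: a node of `$a` is also in `$b` exactly when adding it to `$b` does not make `$b` any bigger. Whether this is the solution expected in the course is an assumption on my part. The same trick, sketched with Python sets standing in for node-sets (the function name is made up):

```python
def intersect_with_union_only(a: set, b: set) -> set:
    """Intersect two sets using only union and counting."""
    # x is in b exactly when the union {x} | b is no larger than b itself.
    return {x for x in a if len({x} | b) == len(b)}


print(intersect_with_union_only({"title", "author", "date"}, {"author", "date", "isbn"}))
```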

I still do not cover XQuery or XSLT 2.0 very well. I’m starting to cover CSS 3.0, but barely. MathML is poorly supported so I do not go far in it.

XQuery seemed nice, but I’m still waiting for the real cool applications. So far, XQuery is still, to me, a poor man’s XSLT.

XSLT 2.0 looks good, but support for it is still rare and I still do not have a good use case. Certainly, XSLT 2.0 cleaned up a few things, but I was careful not to introduce my students to the nasty parts of XSLT 1.0, which is good since they are going away now. Regular expressions in XSLT 2.0 are a nice feature, but it almost seems like they require no special introduction: if you know both regular expressions and XSLT, then there is nothing special happening. Being able to generate several documents might be nice, but I still do not see the use case and it seems a trivial addition anyhow.

XLink? Badly supported, not exciting. Still useful in, say, SVG, but trivially so.

SVG? Might be nice, but it is painful to do by hand. In theory, you could have data being transformed to SVG through XSLT, but do people really do that?

XSLFO? No use case. DocBook does fine if you need to generate PDF technical reports. Want to generate bills in PDF? I cannot imagine doing it in XSLFO. Do people really do that?

AJAX is nice and it is a great DOM API use case. But the cross-browser issues are so terrible that you can only go so far.

Java-wise, I now try to show that there are several ways to tackle XML. For example, the iterative approach is rarely included and I think it is very nice.

J2EE, web services? I cover REST (quickly done), I cover some SOA. For the rest, my course is already packed and I do not want to get into enterprise computing, which I think is boring and totally lacking in real innovation.

Ontologies? I cover RDF, which is barely useful but still has good use cases (Dublin Core and 3 or 4 others). Anything beyond that is probably a waste of my (undergraduate) students’ time.

DTD, Relax NG… these are the good guys, but they are barely useful. XML is at its best as an extensible language, which is not a very schema-friendly concept. Very, very few people need to write DTDs or Relax NG schemas. You sometimes need to read them to figure out what you have to output, and it is useful to check that you are producing good XML, but validating is usually a waste of time unless you have problems. XML Schema? Please! Let us not waste time with this pitiful excuse for a spec.

(Disagree with my statements? Please comment!)

The Web is not virtual

The Web is not virtual. Amazon.com is an actual store. An online course is an actual course. Email is not virtual communication. Communities on the Web are not virtual.

Something is virtual if it is a mere representation of what is. My blog is a virtual notebook: it is not a notebook. But my blog is not virtual! It is a real blog! My identity on the Web is not virtual, but an avatar in a video game is a virtual me. A virtual community would be a representation of a community, so real people and real communication between these people would not occur. Virtual memory is virtual because we make software believe that there is memory, when really, there is none. We commonly work with virtual hardware: you make the operating system believe that it runs directly on a machine, when, in fact, it runs inside a software box emulating a machine.

Something virtual is not real. If it is real, it cannot be virtual.

The word virtual is a dangerous one used by reactionary folks who like to dismiss anything electronic as not being quite real. It is deeply rooted in reactionary thinking. For example, they sometimes suggest that electronic meetings have to happen in virtual rooms with virtual chairs. (Think Second Life.) Experience shows that in the electronic reality, it is often better not to virtualize the real world, for the very simple reason that we can do better than set up a virtual reality: we can create an actual one that works better. The Web has no virtual chair, virtual corridor, and so on. The Internet has real Web pages, real blogs, real instant messaging, and so on. Virtual representations of people do not work well; it is better to build real Web identities.
