Language, Mathematics and Programming

Even if you have extensive training in Mathematics, the average Mathematics paper is indistinguishable from the ramblings of a madman. Many of these papers seek to solve narrow problems. And yet, we respect Mathematicians.

Software programming is a form of communication, usually between human beings and machines. While different in style, programming is a subset of the language of Mathematics. If you dig into the average source code, it is indistinguishable from ramblings, even if you are an expert developer.

Yet, we denigrate programming. Many will even deny that it is a Mathematical language. But Mathematics and Programming are not so different:

Mathematics: Building on the previous research papers requires you to dig through endless piles of boring, badly written research papers.
Programming: Maintaining millions of lines of code written by various people over the years is difficult, boring and error-prone.

Mathematics: Inventing new theorems or new mathematical theories requires much creativity.
Programming: Coming up with the next best iPhone application requires much creativity.

Mathematics: For most people, mastering even part of Mathematics requires a decade or more.
Programming: Please read Teach yourself programming in ten years by Peter Norvig.

Mathematics: The language of Mathematics has directly contributed to technological progress. Electricity, engines, nuclear power, space travel all required extensive use of Mathematics.
Programming: Google changed the world through the brilliance of its software engineers. The open source revolution has changed how people think about collaboration.

Mathematics: Some Mathematicians are widely recognized as being extremely smart.
Programming: Some famous people have done a fair share of difficult and technical programming: Donald Knuth and TeX, Tim Berners-Lee and the Web, Linus Torvalds and Linux.

Why is programming getting so little respect?

  • The intense commercialization of programming has commoditized it. As Paul Graham might say, painters were initially “portrait takers”. It is only when painting lost its commercial function that it became recognized as a noble art. However, just like painters always used their free time to create great art, the best programmers are open sourcing beautiful code all the time.
  • The study of programming itself remains rather informal. You can get degrees in Computer Science, Computer Engineering or Software Engineering, but there is no degree in Programming. Programming is taught in universities, but generally only in the first few courses of a degree. Yet, there are degrees in Communication, Fine Art, Architecture, Music or Dance. While a degree in Computer Science or Software Engineering can make you a better programmer, the fact remains that your professors are not expert practitioners.

How can we fix this? I have this secret dream of setting up the equivalent of a “Creative Writing” program, but for programmers. Call it “Creative Programming”. Basically, students would come together to write great code. Yes, such code might be useful commercially, but that would be a secondary consideration. The pursuit of greatness would be the only goal that matters. It would treat programming as a bona fide language. It would attract the best programmers as guest lecturers. Would this ever work out? I do not know.

I am sure that many will point out that my secret dream is impractical. Beauty should not come first: we want cheap, reliable, maintainable code. We also want programmers to be replaceable, inexpensive and practical. However, human beings can pursue greatness while being practical. Compromise is possible.

Let me conclude by quoting Donald Knuth:

(…) computer programming is an art, because it applies accumulated knowledge to the world, because it requires skill and ingenuity, and especially because it produces objects of beauty. A programmer who subconsciously views himself as an artist will enjoy what he does and will do it better.

Further reading: The best software developers are great at Mathematics? and Is programming “technical”?

Who the heck got Universities into the email business?

My current employer, UQAM, refuses to allow email forwarding. Students would rather forward their emails to their existing GMail accounts, for example. And the IT Department (the SITEL) agrees that it would have several benefits. However, they refuse to allow it for the following reasons:

  • Email forwarding may create infinite email loops. These may disrupt services and require human intervention.
  • Invalid or failing remote servers may saturate the local servers as they are unable to forward the emails.
  • Professors and management send confidential information by email. Yet, without full control of the email service, the University cannot ensure the needed confidentiality.
  • With email forwarding, it may be impossible to ensure and prove that an email was received and read. Thus, homework assignments, administrative inquiries or security advisories may never reach the students, or we may be unable to prove that they reached the students because of email forwarding.
  • Because we are a Canadian University, email forwarding creates a risk that the emails may transit through American servers, where the Canadian law on privacy is not applicable.
  • Email forwarding may put students at risk if remote accounts are stolen or lost.

Can you help me debunk or mitigate these arguments? I know that some of these arguments are bogus, but I am looking for solid references. (Not that I expect to change their mind.)

A larger issue: shouldn’t universities stick with research and teaching? I understand that we must have networks, cables, computers, firewalls, but do we need to provide our students with email services?

Update: Turns out that our IT people encourage students who want forwarding to GMail (say) to use the POP3 protocol. It is unclear to me how email forwarding can be a dangerous practice whereas POP3 “forwarding” can be safe.

Is programming “technical”?

According to student evaluations, most of my students appreciate short programming assignments. Yet, every year, some students think that programming is below them or unimportant.

Maybe I should start my courses with this theorem:

Theorem. If you understand an idea, you can implement it in software.

There is no denying that programming requires a lot of technical knowledge. Most programmers do technical jobs, involving testing, building or refactoring code. But programming is ultimately a form of communication. And it is as noble as Mathematics or English. Let us compare:

  • Writers are considered sexy and non-technical people. Yet, grammar and spelling are technical. Moreover, most writers earn a living by writing ads for boring products. Some of them make a living with grand novels, but fewer than you think.
  • Physicists are great thinkers. Yet, their mathematical derivations are often mind-numbing and technical. Many physicists spend years running extremely technical experiments. And when they don’t, they program extremely complex (and technical) simulations.

For some reason, being a writer is considered more prestigious than being a programmer. If you ask me, Linus Torvalds is every bit as cool as J. K. Rowling. And I’d rather have a lunch date with Linus.

Most common questions about recommender systems…

I get ten to fifteen questions a week on recommender systems from entrepreneurs and engineers. Sometimes, I help people find their way in the literature. On occasion, for a consulting fee, I get my hands dirty and evaluate, design or code specific algorithms. But mostly, I answer the same questions again and again:

1. How much data do I need?

Given your data, you can use cross-validation or A/B testing to measure objectively the effectiveness of a recommender system.
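As a rough illustration of the cross-validation idea, here is a minimal sketch. It assumes ratings arrive as (user, item, rating) triples; the function name `k_fold_mae` and the item-average predictor are placeholders chosen for the example, not a recommended algorithm.

```python
from collections import defaultdict
from statistics import mean


def k_fold_mae(ratings, k=3):
    """Estimate the mean absolute error of an item-average predictor
    with k-fold cross-validation.

    `ratings` is a list of (user, item, rating) triples. The item-average
    predictor is only a stand-in: swap in the recommender under test.
    """
    errors = []
    for fold in range(k):
        # Every k-th rating is held out; the others are used for training.
        test = ratings[fold::k]
        train = [r for i, r in enumerate(ratings) if i % k != fold]

        global_mean = mean(r for _, _, r in train)
        by_item = defaultdict(list)
        for _, item, r in train:
            by_item[item].append(r)

        for _, item, actual in test:
            # Fall back to the global mean for items unseen in training.
            predicted = mean(by_item[item]) if item in by_item else global_mean
            errors.append(abs(predicted - actual))
    return mean(errors)
```

Running such a measurement on growing portions of your data is one way to see whether more data still improves the error, under the assumption that the held-out ratings resemble what the live system will face.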

2. We have this system in place, how do we know whether it is sane?

See previous question.

3. My online recommender system is slow!

Laziness is your friend: don’t recompute the recommendations each time you have new data.
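One possible shape for this laziness is a small cache that records new data cheaply and recomputes only when recommendations are requested and enough new data has piled up. The class name `LazyRecommender`, the `compute` callback and the `max_pending` threshold are all hypothetical choices for this sketch.

```python
class LazyRecommender:
    """Cache recommendations; refresh them only on demand and only if stale."""

    def __init__(self, compute, max_pending=100):
        self._compute = compute          # expensive function: events -> recommendations
        self._max_pending = max_pending  # hypothetical staleness budget
        self._events = []
        self._pending = 0                # events seen since the last recomputation
        self._cached = None

    def add_event(self, event):
        # Recording new data is cheap: nothing is recomputed here.
        self._events.append(event)
        self._pending += 1

    def recommendations(self):
        # Recompute on first use, or once enough new data has accumulated.
        if self._cached is None or self._pending >= self._max_pending:
            self._cached = self._compute(list(self._events))
            self._pending = 0
        return self._cached
```

The trade-off is that users may briefly see slightly stale recommendations; how stale is acceptable depends on the application.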

4. My customers don’t like the recommendations!

  • Keep expectations in check: recommending products is difficult and even human beings have trouble doing it,
  • Explain the recommendations: nobody trusts a black box,
  • Allow your users to freely explore your data and products in convenient and exciting ways.

5. Which algorithm is best?

You should start with simple algorithms: they worked well enough for Amazon. To do better, a mix of different algorithms is probably best. You can combine them using ensemble methods.
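A weighted average of scores is among the simplest ways to combine several algorithms. The sketch below assumes each algorithm returns a dictionary mapping items to scores on a comparable scale; the function name `blend` and the example weights are illustrative only.

```python
def blend(score_dicts, weights):
    """Rank items by the weighted average of their scores across algorithms.

    `score_dicts` holds one {item: score} dictionary per algorithm and
    `weights` holds one weight per algorithm.
    """
    total = sum(weights)
    items = set().union(*score_dicts)
    blended = {}
    for item in items:
        # An algorithm with no opinion on an item contributes a score of 0.
        weighted = sum(w * s.get(item, 0.0) for s, w in zip(score_dicts, weights))
        blended[item] = weighted / total
    # Highest blended score first; ties are broken by item name.
    return sorted(blended, key=lambda item: (-blended[item], item))


popularity = {"a": 1.0, "b": 0.5}
similarity = {"b": 1.0, "c": 0.8}
print(blend([popularity, similarity], [1, 1]))  # ['b', 'a', 'c']
```

In practice the weights would themselves be tuned, for instance with the same cross-validation or A/B testing used to evaluate the system.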

The best software developers are great at Mathematics?

One of the upsides of working for a university is the stimulating academic discussions. Yesterday, a philosopher challenged me with a question:

Beyond the fact that software is expressed in mathematical artefacts (bits, algorithms), are Information Systems fundamentally Mathematical?

For my convenience, I temporarily rephrase the question to something simpler and more concrete:

How are Software Developers limited by their mathematical weaknesses?

I plan several blog posts around this question, but let me start with an example.

A common and powerful language to process XML is XPath. XPath is used within web applications, scripts, databases, and so on. I often ask students the following question about XPath. Are these two expressions equivalent?

$x="some string"

and

not($x!="some string").

(The symbol “!=” means “different from”.)

Invariably, most students conclude that they are equivalent. Wrong!

Let us examine the semantics.

  • The expression $x="some string" means that at least one element of $x is equal to "some string".
  • The expression $x!="some string" means that some element of $x is different from "some string".
  • The negation of $x!="some string" is that all elements of $x are equal to "some string". (Sorry if it sounds confusing.)

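Thus, whenever $x is not empty, the expression not($x!="some string") is a more restrictive condition than the expression $x="some string". (When $x is empty, it is the other way around: the first expression is true and the second is false.)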
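The existential semantics can be mimicked in a few lines of Python. This is only a model of the comparison rules, with $x represented as a list of strings; the helper names `xpath_eq` and `xpath_ne` are made up for the illustration and no XPath engine is involved.

```python
def xpath_eq(values, s):
    # Models $x = s: true if at least one item equals s.
    return any(v == s for v in values)


def xpath_ne(values, s):
    # Models $x != s: true if at least one item differs from s.
    return any(v != s for v in values)


for x in (["some string"], ["some string", "other"], []):
    print(x, xpath_eq(x, "some string"), not xpath_ne(x, "some string"))
```

With a single matching item both expressions are true. With ["some string", "other"] the first is true and the second is false. With an empty sequence the first is false while the second is true, since no item differs from the string.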

Great software developers routinely think through far more complex mathematical problems. Yet, they do not think of them as being Mathematics.

Open Sourcing your software hurts your competitiveness as a researcher?

Almost all software I write for my research is open sourced. A fellow researcher argued today that I risk reducing the gap between me and my pursuers. Similarly, I should keep my data to myself (and avoid listing good sources of research data).

Here is my take on this issue.

  1. Sharing can’t hurt the small fish. Almost nobody sets out to beat Daniel Lemire at some conference next year. I have no pursuer. And guess what? You probably don’t. But if you do, you are probably doing quite well already, so stop worrying. Yes, yes, they will give you a grant even if you don’t actively sabotage your competitors. Relax already!
  2. Sharing your code makes you more convincing. By making your work easier to reproduce, you are instantly more credible. Trust is important in science. Why would anyone trust that I actually wrote the code and ran the experiments? Because I published my code, that’s why!
  3. Source code helps spread your ideas faster. In the long run, you should not care about getting papers accepted at some hot conference. What matters is the impact you have had. Make it easy for me to use your ideas! Help yourself!
  4. Sharing raises your profile in industry. Having open source software makes you more attractive to software engineers.
  5. You write better software if you share it. While not all code I publish is bug-free, documented or even usable, I care slightly more about my code because I publish it.

Finally, does sharing code work? Do people download and use my software? Here are download statistics for my latest source-code publications:

  • A compressed alternative to the Java BitSet class: over 280 downloads
  • Rolling Hash C++ Library: over 200 downloads
  • Lemur Bitmap Index C++ Library: over 2,000 downloads
  • Fast Nearest-Neighbor Retrieval under the Dynamic Time Warping: over 1,400 downloads

Related reading: Good prototyping software and The challenge of doing good experimental work by Suresh Venkatasubramanian. And More on algorithms and implementation by Michael Mitzenmacher.
