A journal that gets it

There is a special issue of JIME on Semantic Web for Education (as in “Learning Objects”). I picked it up from one of Downes’ recent posts.

Not only is the issue interesting, from what I could tell, it is a journal that gets it. First of all, reviews are on-line, for all to see. Don’t get me wrong: I don’t mind peer review. I cherish it. But I’ve gotten too many poorly written, poorly prepared reviews. I really wish reviews would go on-line when the paper is published. This way, it might give an incentive to the reviewer.

Plus, if a poor paper gets accepted, you can trace back the reasons why it was accepted…

It still doesn’t help if your paper gets rejected for bad reasons, but in such instances, you can go elsewhere with your paper. At the very least, you know that if the paper makes it, you’ll have an exciting review to go with it. Useful for you and the reader.

Update: other blogs have picked up this issue of JIME.

Met with Martin Brooks this afternoon

I had another crazy meeting with Martin Brooks. Martin is the closest thing to a mad scientist I’ve ever seen. But he is a friendly mad scientist. Right now, he is working with a Toronto company for the Japan World Fair. (I could not find this event on Google.) Martin is convinced that broadband will change the world. He recently completed the Music Grid project, which allowed people from all over the world to teach each other about music, including kids in very remote locations where they could never have had any music lessons.

They are setting up a very fancy videoconferencing station, with motion sensors and all, in UQAM’s Président-Kennedy building (near Place-des-Arts). As far as I can tell, you can just stop by and look at the gear. It is pretty amazing; I was there this afternoon. They are connected live with France, to some museum over there. You can go on a stand and the system will not only record your image and voice, but also detect the motion when you point your finger, so you can use your hands to… well, I couldn’t quite get the full picture… but it looks amazing. This seems related to what some UQAM people are doing in the arts with kids this summer. Sorry, didn’t get all the details. But quite possibly, some Montréal kid will get to play with broadband equipment I can only dream about.

Actually Martin wanted to tell me about his latest work in monotonicity preserving simplification and not about broadband. He came up with the following statement: if you interpolate any real data in such a way that you don’t add any extrema, you get a function which has way too much complexity for any sensible modelling. He gets to this result by interpolating data such as images, and then he computes contour lines using some crazy lisp code on a very old Mac laptop (did I mention he was like a mad scientist?) and he gets extremely complex contour lines… much more so than you’d expect. Interesting.

Had lunch with Anna again

I had lunch with Anna again. We went to the Lotus Bleu restaurant. We chatted about the creative class theory by Richard Florida. Some of my colleagues will remember me going on about how we need to look at the gay index (how friendly a place is to gays) if we want to improve the economy. That’s because gays and creative people are attracted by the same things: nice cafés, exciting culture and music, a free, accepting society, readily available services, and so on. Anna and I both used to live in Fredericton. According to club zone, there is no gay or lesbian bar in Fredericton. By comparison, Montréal has an entire gay village, where Anna and I both lived. (I can’t speak for Anna, but no, I’m not gay.)

This doesn’t mean you can’t achieve great things anywhere in the world. Just that some of us are somewhat happier in a free and open and culturally rich place. Some of us also think that it leads to more creativity and more wealth.

Now, this theory leads me to think that Mr Bush is leading the USA to ruin. Will he get reelected?

Note: I was born in Drummondville. There is at least one gay bar there and I know a few lesbians and bisexual ladies there. The economy is doing well, I hear. Ah.

Is Python going bad? or The curse of unicode….

I’ve wasted a considerable amount of time in the last two days upgrading my RSS aggregator so that it will have better support for Atom feeds. I use the feedparser library.

One thing that gets to me is how unintuitive unicode is under Python. For example, the following is a string…

t="éee"

Just copy this in your python interpreter, and it will work nicely. For example,


>>> t='éee'
>>> print t
éee

However, for some reason, if I just type “t”, then it can’t print it properly…

>>> t
'\xe9ee'

See how it is already confusing? (And we haven’t used unicode yet!)

Next, we can map this string to unicode…

r=unicode(t)

which has the following result…

>>> r=unicode(t)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)

Ah… so it tries to interpret t as ascii… fair enough, we know it is “latin-1″ or “iso8859-1″. It is already quite strange that “print” knows what to do with my string, but nothing else in Python seems to know… so we do


>>> r=unicode(t,'latin-1')
>>> r
u'\xe9ee'
>>> print r
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)

because, see, you can’t print unicode directly… but you can do the following…


>>> print r.encode('latin-1')
éee
>>> print r.encode('iso-8859-1')
éee

but also


>>> r.encode('latin-1')
'\xe9ee'
>>> r.encode('iso-8859-1')
'\xe9ee'

What is my beef?

  • If ‘print’ assumes ‘latin-1′ then shouldn’t everything else? Why is this not consistent? If it is unsafe to assume ‘latin-1′, then why does print do it?
  • The encode, decode thing is a mess. We had a perfectly valid construct for converting things to strings, and that’s ‘str’. Now, we have a new one called ‘encode’. So that, given some unicode, I can do either t.encode(‘ascii’) or str(t) for the same result. Bad. Now, I’m stuck forever in a world where I have to figure out whether I encode or decode a string, and which is which. This is hard. This is confusing.
  • A string object should know its encoding so I don’t have to. What happens if I receive a string from some library and I need to convert it to unicode? How am I supposed to know what the encoding of the string is? There is no sensible way to communicate this right now, which makes debugging a pain. The only excuse I see is that sometimes it is impossible for Python to know the encoding… well, then it should just fail and require the programmer to specify the encoding. There are way too many things that can go wrong when you expect the programmer to keep track of his strings and which is encoded how…
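For what it is worth, the encode/decode direction can be sketched as follows. This is only a minimal sketch, and it assumes Python 3 rather than the Python 2 interpreter used in the sessions above: there, byte strings (bytes) and text strings (str) are separate types, so the direction is harder to mix up. The variable names are just for illustration.

```python
# Assumption: Python 3, where bytes and str are distinct types.
raw = b"\xe9ee"                      # the bytes of "éee" in latin-1

# decode: bytes -> text. The programmer has to state the encoding.
text = raw.decode("latin-1")

# encode: text -> bytes, again with an explicit encoding.
round_trip = text.encode("latin-1")
utf8_bytes = text.encode("utf-8")    # same text, different bytes

# Guessing the wrong encoding fails loudly instead of silently.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# ascii() escapes non-ASCII characters, so this prints on any terminal.
print(ascii(text), round_trip == raw, len(utf8_bytes), ascii_ok)
```

The bytes still do not know their own encoding, so that complaint stands; the sketch only shows that decode goes from bytes to text and encode goes the other way.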

How to be a great scientist

Two links that are invaluable to researchers who want to know how to succeed…

The second one was found by Seb.

Qualities for a good Ph.D. supervisor

Offline someone commented that more than half of the Ph.D. students are foreigners and that the Ph.D. is serving as a funding source. True. But that’s somewhat of a cynical view if you ask me.

In any case, you are a young student, and despite reading my blog, you still want to do a Ph.D. Maybe because you come from a Third World country and getting some cash to study is a compelling idea on its own. But whatever your reasons, here’s what you should be looking for, I humbly suggest…

  • Look at the past projects and students. Did all the students this prof. supervised end up on welfare? Or can you google them as Harvard professors now? Can you find traces of the past projects this prof. was involved with or did they all fail? Look beyond the fanfare: look for evidence the prof. can’t control easily. Google past students.
  • Is the prof. aware of what the world is like, right now? Does he know the employment rate and career possibilities for young Ph.D.s or does he just pretend he knows? Where does he get his facts from, if he has any?
  • Does he give you the full story, with the pros and cons of doing a Ph.D. with him? Pros and cons of the research life?
  • Does he need to consume graduate students to get his research going or is the training of students only tangential to his research? In other words, can the guy still do research without students or are students cheap labour?

Selling your services as a scientific paper writer?

Nice post on Critical Mass today about a researcher who sold his services as co-author on eBay and actually got 50 bids and many phone calls. It would appear that many people, from industry to students, are willing to pay so they can produce high quality scientific content with their names on it.

I’m not sure it is an interesting line of work though. Most people don’t realize how expensive it is to write a good scientific journal article. Probably upward of $50K. It’d be difficult to sell 10 pages for $50K. Of course, there are other types of “research”, like journalistic research, where you get 10 pages for much less… but real science is awfully expensive.

What’s more interesting to me is the reasons why people were interested: “There’s this whole constellation of things they could get from it. They could get credentials. They would get the ability to have their questions actually answered.”

Why do I pick on this bit of news? Because I was actually offered jobs like this, and I always turned them down. I was offered money to write journal articles at least twice by totally different people. It was meant to promote a product or a service, in the end, or rather, give the product or service some credibility. I think this is misguided since there is an actual proper form for such publications: patents, technical reports or white papers.

In any case, it would actually be doable: sell your services as a scientist who publishes papers to give credibility to products and services. It would be similar to being a patent consultant, I guess, except that law is not so involved anymore. I have found that a lot of people everywhere think they have very unique ideas. They’d love to have their ideas validated and pushed in a very prestigious publication, just like having patents.

Writing papers is like taking pictures for Playboy. You look at beauty most of the time, and you have to capture the beauty… you have to make sure enough is being shown, but not too much. It is seen as a very romantic job where you are living a dream, but you are, in fact, just doing your job. The only difference is that few people write papers attracting as many eyeballs as Playboy pictures, and most earn less money too.

The world is changing, and I’m there!

Tonight, I really feel like the world is changing.

The typical problem scientists and scholars in general have is that we need to be able to predict paradigm changes, or at least study them. But how can you know that things are changing while you are in the middle of it? Can humans study humans? Well, I’m not a social scientist, so I don’t have to worry, officially, about such issues… but all scholars are affected to some level by this paradox.

Well, I’ve been using inDiscover.net. Yes, I’m linking to my own project; well, it isn’t my project, but I’m involved from the side. So, it is self-promotion. Fine. Still, using inDiscover.net has made me realize how the world has changed. A bit like when Stephen Downes worked on his MuniMall portal project and, while the project was a failure, he realized that the world was changing and he embarked on a mission (see his value statement on his site).

Look at the right-hand side of my blog: you should see my current playlist from inDiscover.net. All of this music is free. It is out there. You can download the MP3s and listen to the same music I listen to. No matter where you are in the world. You can then share your playlist with the world. You can have my playlist, live, as XML, that you can incorporate in any application, any web site.

I hope to write later on why I think this is a paradigm shift. We are beyond the world of blogs, beyond the Web… this is deep. I think it will eventually change society all the way.

Ok, I’m making many claims here… I need to write this up, but it is late…

WWW 2004

Ok, I’m not attending WWW 2004, but Elliotte Rusty Harold (ERH) is, and he reports very well on what he sees and hears. Anna and I are really kicking ourselves for not going to WWW2004… but this is a long story. Damn.

Data storage issues?

According to ERH, Rick Rashid, who I think is head of Microsoft Research, asked an interesting question… what happens when everyone has 1 TeraByte of storage? It now costs US$1000 to have that much storage, but it will be dirt cheap soon enough.

First, how much is 1 TeraByte… I think it is 1024 GB… Ok… How many 1MB pictures can I store… that’s 1024*1024… so about a million pictures… over a year… that’s about a picture every 30 seconds. So, yes, that’s a lot.
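The back-of-the-envelope count above can be checked with a few lines of Python. This is only a sketch under the same assumptions as the text: binary units (1 TB = 1024 GB, 1 GB = 1024 MB), pictures of exactly 1 MB, and a 365-day year.

```python
# Assumptions: binary units, 1 MB per picture, 365-day year.
MB_PER_TB = 1024 * 1024
PICTURE_SIZE_MB = 1

pictures = MB_PER_TB // PICTURE_SIZE_MB        # how many pictures fit in 1 TB
seconds_per_year = 365 * 24 * 60 * 60
seconds_per_picture = seconds_per_year / pictures

print(pictures, round(seconds_per_picture, 1))
```

So filling the disk in a year means taking a picture roughly every 30 seconds, day and night.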

I’m reminded of this cool interview with Jim Gray, also from Microsoft Research… and a Turing award recipient… and a model for me as a researcher… Jim wrote this…

Today disk-capacity growth continues at this blistering rate, maybe a little slower. But disk access, which is to say, “Move the disk arm to the right cylinder and rotate the disk to the right block,” has improved about tenfold. The rotation speed has gone up from 3,000 to 15,000 RPM, and the access times have gone from 50 milliseconds down to 5 milliseconds. That’s a factor of 10. Bandwidth has improved about 40-fold, from 1 megabyte per second to 40 megabytes per second. Access times are improving about 7 to 10 percent per year. Meanwhile, densities have been improving at 100 percent per year.

So, we have to be careful: yes, we’ll all have 1TB of storage soon. In fact, the geeks like myself will soon have far more, I think. Just because I think I always had slightly more storage and CPU power than the average Joe (not much, but a little more). However, this unlimited storage world brings its own flock of problems. If you can record a million 1MB pictures a year… what if you want to trace back something in this giant collection of pictures? You can’t possibly look at all of them, it would be boring… plus, your computer might not have the bandwidth to display all of this in a very short time, even if it were useful.
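To put a rough number on the bandwidth worry: here is a sketch of how long it would take just to read 1 TB once, assuming the 40 megabytes per second figure from the Jim Gray quote above is sustained the whole time (real disks rarely manage that), and binary units again.

```python
# Assumptions: 1 TB = 1024 * 1024 MB, and a sustained 40 MB/s transfer rate
# (the bandwidth figure quoted from Jim Gray); both are simplifications.
TOTAL_MB = 1024 * 1024
BANDWIDTH_MB_PER_S = 40

scan_seconds = TOTAL_MB / BANDWIDTH_MB_PER_S
scan_hours = scan_seconds / 3600

print(round(scan_hours, 1))
```

That is a bit over seven hours for a single sequential pass, before any seeking, decoding or displaying.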

Other people have blogged about this because it has other consequences. For the last few years, for example, I’ve been storing all of my important files, and their revision history, using CVS, on daniel-lemire.com. I can afford this because my host has affordable storage. This means that in several years, I’ll be able to go back in time and precisely see what I had in terms of research results on a given day. This is not exactly new: experimental researchers keep lab books. However, this is electronic data… which means that software can go, parse it, analyse it. Here’s a dream: a piece of software will go over my logs, learn how I do research and assist me in some way.

Is there anything solid to Semantic Web?

Here’s an interesting quote from ERH:

RDF and OWL enable standard, machine interpretable semantics. XML enables only syntax. Or at least so he claims. I agree about XML, but so far I’ve yet to see evidence that there’s more semantics in RDF/OWL than in plain XML. There’s a huge number of big words being tossed around this morning (business inferencing, .NET tier, dynamic applications, reclassify corporate data, proprietary metadata markup, “align the semantics of federated distributed sources”, “rich, automatic, service orchestration”, etc.) which mostly seems to obscure the fact that none of this does anything.

or

They’re investing in and exploring a broad range of applications: Semantic blogging, information portals, and SMILE Joint. He’s describing several application areas, but again I don’t see how RDF/OWL have anything to do with what he’s talking about. I realized part of the problem: no one is showing any code. I feel like I’m a mechanical engineer in 1904 listening to a bunch of other engineers talks about airplanes, but nobody’s willing to show me how they actually expect to get their flying machines into the air. Maybe they can do it, but I won’t believe it until I see a plane in the air, and even then I really want to take the machine apart before I believe it isn’t a disguised hot air balloon. A lot of what I’m hearing this morning sounds like it could float a few balloons.

I think that ERH thinks exactly as I do. You want to semantically mark up something using XML, you do this…

<team>
<player>joe</player>
<player>jill</player>
</team>

I’ve seen no evidence that there is anything more out there you can do. Oh… there are extensive specifications, nice diagrams, but like ERH says, I haven’t seen the plane fly yet, just engineers talking about it.
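For illustration, here is roughly what consuming the team markup above looks like. It is a minimal sketch, assuming Python’s standard xml.etree.ElementTree module; the variable names are mine. The XML gives the program a tree to walk, and whatever “team” and “player” mean is decided by the code that reads it.

```python
import xml.etree.ElementTree as ET

# The same markup as in the example above.
document = """<team>
<player>joe</player>
<player>jill</player>
</team>"""

root = ET.fromstring(document)

# The parser only hands back structure; treating <player> elements as
# "members of the team" is an interpretation made by this program.
players = [player.text for player in root.findall("player")]

print(root.tag, players)
```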

Of course, I should point to this article which also goes in the same direction. And I’m currently talking with a new colleague, Jean Robillard who has his own arguments.

Had lunch with Anna

I had lunch with Anna today. Anna is the computational linguist I designed the slope one collaborative filtering algorithm with. What is slope one? No published paper yet, but that’s the algorithm Sean implemented in PHP/SQL for indiscover.net. It is a very neat algorithm and we hope to have a paper out on it soon!

Anna has a Ph.D. in linguistics from McGill… in Montréal… the city we both love… She recently left a researcher job at NRC, where she was my neighbour, for a job in some cool Montréal start-up. This start-up seems soooo cool. They might just change the world… well, a tiny bit of it. Of course, Anna couldn’t tell me more than what’s public knowledge, but it sounded so cool.

Unavoidable subject: Ph.D.s leaving pure research for industry and doing so totally on purpose. Anna chose to go back to industry. That’s a concept most tenured or half-tenured scholars don’t understand… But watch it! Anna is one hell of a scholar, wherever she is…

We talked about the fact that academia doesn’t have words to describe scholars who go outside of academia… it is like… like they fall off the world.

Note: Some might object that NRC is not academia. And they’d be right… but somehow, NRC manages to walk a fine line and remain mostly an academic creature while being a giant government lab. So the difference is more of a nuance, really.

After meeting Seb last week in Montréal, now Anna… I’m having so much fun! The next one I want to see in Montréal is Harold: quit chatting with Tim Berners-Lee and come by Montréal. I might not be as smart as Tim, but I know some cool cafés Tim doesn’t know! (Explanation: Harold is attending WWW2004, and I think he is in a workshop together with Tim so he’ll probably talk to him. Harold is famous, you know. You get that way after working 16 hours a day for the last 20 years.)
