Best Possible Way to GET/PUT an XML File?

After searching for a good GTD-compliant to-do list manager, I finally designed my own. I wanted a tool that:

  • lets me back up the data
  • does not suffer from vendor lock-in
  • keeps stuff confidential (sorry, you can’t know what I have to do today)
  • will not lose or corrupt my data (ever).

Among the tools I have reviewed is Chandler, which looks good but is still in alpha. Actiontastic is very nice, but it is not GTD-compliant in my opinion, and it is hard to tell what its license is. What’s next is really brilliant, but it is not GTD-compliant either, and it comes with its own Web server, which is a bit odd.

Then you have dcubed and MonkeyGTD, both TiddlyWiki-based solutions. They are very nice and no doubt some people will like them, but I never could get used to TiddlyWiki and I distrust it.

So far, the best application I have found is PHP-GTD, but its developers are not hacking on it fast enough; judging by how slowly new versions come out, they seem to have a case of spaghetti code.

What I did is actually pretty sweet. I simply fill out an XML file that looks like this:


<goal title="stay alive" category="personal">
<nextaction title="stop the fire in my kitchen" />
<action title="go get some milk" tickle="2008-12-12" />
<someday title="go on a diet" />
</goal>

In any case, you see the idea. My application supports deadlines, goals, actions, next actions, ticklers, lists, descriptions, some-day projects and so on. I can easily extend it (recall what the X of XML stands for!).

The XML file is linked to an XSLT file. This XSLT file (executed by the browser) generates HTML which, thanks to ECMAScript, lets me navigate through all of the data. As far as I can tell, I support many of the same views as an application like PHP-GTD, except that my application is a thousand times faster and has a tenth of the code. Everything is in XML and, in this instance, it does make things much better. I do not even want to think about designing a database schema for this data.
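
A browser finds the stylesheet through a processing instruction at the top of the XML file, such as <?xml-stylesheet type="text/xsl" href="gtd.xsl"?>. The file name gtd.xsl is only an example. Below is a minimal sketch of the same XML-to-HTML step run outside the browser, which can be handy for checking a stylesheet. It uses the JDK's standard javax.xml.transform API. The class name and the toy stylesheet are assumptions, not the author's actual files: the stylesheet only lists a goal's next actions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: applies an XSLT 1.0 stylesheet to an XML string,
// as the browser would, and returns the resulting HTML as a string.
final class GtdRenderer {

    // Toy stylesheet (an assumption): goal title as a heading, next actions as a list.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "  <xsl:output method='html'/>"
        + "  <xsl:template match='/goal'>"
        + "    <h1><xsl:value-of select='@title'/></h1>"
        + "    <ul>"
        + "      <xsl:for-each select='nextaction'>"
        + "        <li><xsl:value-of select='@title'/></li>"
        + "      </xsl:for-each>"
        + "    </ul>"
        + "  </xsl:template>"
        + "</xsl:stylesheet>";

    static String render(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter html = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(html));
            return html.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }
}
```

Given the goal example above, the output contains the goal title and "stop the fire in my kitchen". It does not contain the plain action "go get some milk", because the toy stylesheet only selects nextaction elements.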

So, what is the problem? Well, I can happily edit an XML file, but before I release this software to the world (and I think it has value, even though it took me only one evening to write), I need a user-friendly way to edit the data. It won’t do to have people edit an XML file by hand. It is easy enough for me to include, through ECMAScript, some way to add actions and so on. However, how and where do I save the data?

There is no browser-independent way for an HTML page, even a local one, to modify a local XML file. This probably means that I need some kind of server-side companion to my XSLT/ECMAScript application. Of course, it appears that TiddlyWiki manages to store its own data in an HTML file, but I am not certain I trust this sort of mechanism: I would always be afraid of losing unsaved data. Google Gears is browser-specific (it won’t work with Camino, Safari, Konqueror, and so on). It is fine and sweet to build Firefox-only applications, but that is ultimately as bad as writing Internet Explorer-only applications. I do not consider a browser-specific application to be a Web application.

What I need is brutally simple: a server-side application that lets me retrieve the file (GET) and then replace it with another one (PUT or POST) after the user has edited it. I mention POST because I am toying with the idea of version control: instead of overwriting the existing file, edits would be reversible.

So, security issues aside, I think I only need a server-side application that’s really very, very simple. Maybe ten lines of Perl or Python.
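
Here is a minimal sketch of such a companion. To keep one language across this page it is written in Java rather than Perl or Python, using the JDK's built-in com.sun.net.httpserver package. All class and method names are hypothetical, and security is ignored as above: it binds to 127.0.0.1 only and has no authentication. GET returns the XML file and PUT replaces it. Rather than a separate POST, each PUT keeps the previous version as a numbered backup (todo.xml.v00001, ...), which is one assumed way to get reversible edits. The new content is written to a temporary file and renamed over the original, so a reader never sees a half-written file.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical minimal companion: GET returns one XML file, PUT replaces it.
// No authentication or access control: local use only.
final class XmlFileServer {

    static byte[] load(Path file) {
        try {
            return Files.readAllBytes(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Earlier versions kept by save(), oldest first (zero-padded numbers sort correctly).
    static List<Path> versions(Path file) {
        Path dir = file.toAbsolutePath().getParent();
        String prefix = file.getFileName() + ".v";
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.filter(p -> p.getFileName().toString().startsWith(prefix))
                          .sorted()
                          .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Backs up the current file, writes the new content to a temporary file,
    // then renames it over the original. (No fsync: a power cut may lose the last write.)
    static synchronized void save(Path file, byte[] content) {
        try {
            Path dir = file.toAbsolutePath().getParent();
            Files.createDirectories(dir);
            if (Files.exists(file)) {
                String name = String.format("%s.v%05d", file.getFileName(), versions(file).size() + 1);
                Files.copy(file, dir.resolve(name)); // fails rather than overwrite a backup
            }
            Path tmp = Files.createTempFile(dir, "upload", ".tmp");
            Files.write(tmp, content);
            // On POSIX systems an atomic rename replaces the target; the JDK documents
            // this as implementation-specific, so other systems may throw instead.
            Files.move(tmp, file, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Serves GET and PUT on every path; port 0 picks a free port.
    static HttpServer start(int port, Path file) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/", exchange -> {
            try {
                String method = exchange.getRequestMethod();
                if (method.equals("GET") && Files.exists(file)) {
                    byte[] body = load(file);
                    exchange.getResponseHeaders().set("Content-Type", "application/xml");
                    exchange.sendResponseHeaders(200, body.length);
                    try (OutputStream out = exchange.getResponseBody()) {
                        out.write(body);
                    }
                } else if (method.equals("GET")) {
                    exchange.sendResponseHeaders(404, -1); // -1: no response body
                } else if (method.equals("PUT")) {
                    save(file, exchange.getRequestBody().readAllBytes());
                    exchange.sendResponseHeaders(204, -1);
                } else {
                    exchange.getResponseHeaders().set("Allow", "GET, PUT");
                    exchange.sendResponseHeaders(405, -1);
                }
            } finally {
                exchange.close();
            }
        });
        server.start();
        return server;
    }
}
```

For example, XmlFileServer.start(8080, Paths.get("todo.xml")) would answer GET and PUT at http://127.0.0.1:8080/. Because of the same-origin policy, the ECMAScript page making those requests would also need to be served from that origin. Serving the HTML and XSLT files is left out of this sketch.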

I searched, but I can’t find any discussion of the best way to do something so simple. Naturally, my goal here is to keep things so incredibly simple that anyone can pick up my application and build their own variants.

Can anyone help me?

CIKM 2007 accepted papers

The list of CIKM 2007 accepted papers is out. Thanks again to Owen for pointing this out to me.

Some papers that caught my eye…

  • Anthony Don, Elena Zheleva, Machon Gregory, Sureyya Tarkan, Loretta Auvil, Tanya Clement, Ben Shneiderman, Catherine Plaisant, Discovering interesting usage patterns in text collections: integrating text mining with visualization (looks like an HCI paper)
  • Akihiro Inokuchi, Koichi Takeda, An Online Analytical Processing of Text Data (no preprint to be found; does anyone have a copy?)
  • Stefan Buettcher, Charles Clarke, Index Compression is Good, Especially for Random Access (no preprint, but I like Stefan’s work)
  • Fianny Mingfei Jiang, Jian Pei, Ada Wai-chee Fu, IX-Cubes: Iceberg Cubes for Data Warehousing and OLAP on XML Data (no preprint to be found, but I like icebergs)

Maybe you don’t

I have been looking for a written-down version of this deep-sounding quote from my favorite TV show (Battlestar Galactica) ever since I heard it, and I finally found it on IMDB. It was uttered in the episode Resurrection Ship: Part 2 in 2006. Here it is:

Lt. Sharon ‘Boomer’ Valerii: [Adama asks Sharon why the Cylons hate humanity so much] I don’t know if ‘hate’ is the right word… it’s like you said at the ceremony… you said something that sounded like it wasn’t the speech you had prepared. You said, ‘Man never asked itself why it should survive.’ Maybe you don’t.

Context: the Cylons, advanced AI machines originally created by humans, nearly wiped out humanity without any apparent reason.

VLDB 2007 accepted papers

Owen pointed out to me that the list of VLDB 2007 accepted papers is available. On a first pass, here are some papers that caught my attention:

Notice that all these papers are available from the authors’ home pages. The days when a researcher could afford not to post electronic documents are over.

A Comparison of Five Probabilistic View-Size Estimation Techniques in OLAP

I worked hard this year with Kamel Aouiche on a paper called A Comparison of Five Probabilistic View-Size Estimation Techniques in OLAP, and we just learned that it was accepted, with good reviews, at the ACM Workshop on Data Warehousing and OLAP (DOLAP 2007). I think this is a very solid paper.

The view-size estimation problem is just an instance of the cardinality estimation problem: given a large stream of values, estimate how many distinct values there are, using very little memory. There has been a lot of work on this topic in theoretical computer science. In data warehousing, people (at least in academia) commonly use probabilistic counting to solve this problem. Fancier techniques have recently been proposed as well.

In practice, we found that a technique due to Cai et al. (2005), which we call adaptive counting, worked best. It is a small variant of probabilistic counting that seems to do away with its main drawback (its instability on medium-size views) while remaining very scalable (you can throw lots of memory at the problem without slowing down the processing).
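
For reference, here is a sketch of the baseline: Flajolet and Martin's probabilistic counting with stochastic averaging (PCSA). It is not Cai et al.'s adaptive counting, whose details are not given in this post. Each value is hashed, and the low bits of the hash choose one of m bitmaps. In that bitmap, the code sets the bit whose index is the number of trailing zeros in the remaining hash bits. The estimate is (m/φ)·2^(mean index of the lowest unset bit). The hash function (the 64-bit finalizer of MurmurHash3), the class name and the parameter choices are assumptions.

```java
// Sketch of Flajolet-Martin probabilistic counting with stochastic averaging (PCSA).
// This is the classic baseline, NOT Cai et al.'s adaptive counting.
final class ProbabilisticCounter {
    private static final double PHI = 0.77351; // Flajolet-Martin correction constant
    private final long[] bitmaps;              // one 64-bit bitmap per bucket
    private final int log2m;                   // number of buckets is 2^log2m

    ProbabilisticCounter(int log2m) {
        this.log2m = log2m;
        this.bitmaps = new long[1 << log2m];
    }

    // 64-bit finalizer of MurmurHash3 (assumed hash): spreads nearby keys over all bits.
    private static long mix(long z) {
        z = (z ^ (z >>> 33)) * 0xff51afd7ed558ccdL;
        z = (z ^ (z >>> 33)) * 0xc4ceb93e1a85ec53L;
        return z ^ (z >>> 33);
    }

    // Constant work per value, whatever the number of buckets.
    void add(long value) {
        long h = mix(value);
        int bucket = (int) (h & (bitmaps.length - 1));      // low bits choose the bucket
        int rank = Long.numberOfTrailingZeros(h >>> log2m); // P(rank = k) = 2^-(k+1)
        bitmaps[bucket] |= 1L << Math.min(rank, 63);
    }

    // Estimate = (m / PHI) * 2^(average index of the lowest unset bit).
    double estimate() {
        int m = bitmaps.length;
        double sum = 0;
        for (long bitmap : bitmaps) {
            sum += Long.numberOfTrailingZeros(~bitmap);
        }
        return m / PHI * Math.pow(2, sum / m);
    }
}
```

Duplicates set bits that are already set, so only distinct values affect the estimate. With 2^8 = 256 bitmaps, the standard error of PCSA is roughly 0.78/sqrt(256), or about 5%. Adding bitmaps costs memory but not extra work per value. PCSA is known to be inaccurate when the number of distinct values is small compared with the number of bitmaps, which is the kind of weakness variants such as adaptive counting aim to fix.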

Is the cosine similarity transitive?

A simple enough similarity measure is the cosine similarity measure. It is often used in Information Retrieval and it works well. It is also quite simple: cos(v,w) = <v/|v|, w/|w|>, the inner product of v and w once both are scaled to unit length.
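
For concreteness, here is a sketch of that formula in Java. The class and method names are made up for illustration. Dividing the dot product by the product of the two Euclidean norms is the same as taking the inner product of the normalized vectors.

```java
// Hypothetical helper: cosine similarity of two dense vectors.
final class Cosine {
    // cos(v, w) = <v/|v|, w/|w|> = (v . w) / (|v| |w|)
    static double similarity(double[] v, double[] w) {
        if (v.length != w.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        double dot = 0, normV = 0, normW = 0;
        for (int i = 0; i < v.length; i++) {
            dot += v[i] * w[i];
            normV += v[i] * v[i];
            normW += w[i] * w[i];
        }
        // Undefined for a zero vector: 0/0 yields NaN here.
        return dot / (Math.sqrt(normV) * Math.sqrt(normW));
    }
}
```

Because the norms appear in the denominator, the measure is undefined (NaN in this sketch) when either vector is zero. Reflexivity therefore only holds for nonzero vectors.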

Clearly, it is reflexive (cos(v,v)=1) and symmetric (cos(v,w)=cos(w,v)). But it is also transitive: if cos(v,w) is near 1, and cos(w,z) is near 1, then cos(v,z) is near 1.

Can you prove transitivity?

I do have a hastily derived inequality, but I want to know if anyone can do better. (It is not hard.)

(Yes, I am looking for a two-liner.)
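
One possible derivation applies the triangle inequality to the normalized vectors; it is not necessarily the inequality the author has in mind. Write \hat v = v/|v| and likewise for w and z. Since \|\hat v - \hat w\|^2 = 2 - 2\cos(v,w), we get:

```latex
\|\hat v - \hat z\| \le \|\hat v - \hat w\| + \|\hat w - \hat z\|
\;\Longrightarrow\;
\sqrt{1-\cos(v,z)} \le \sqrt{1-\cos(v,w)} + \sqrt{1-\cos(w,z)},
\quad\text{i.e.}\quad
\cos(v,z) \ge 1 - \Bigl(\sqrt{1-\cos(v,w)} + \sqrt{1-\cos(w,z)}\Bigr)^{2}.
```

In particular, if cos(v,w) and cos(w,z) are both at least 1 - ε, then cos(v,z) ≥ 1 - 4ε.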

How to manage email (Inbox Zero)

I have totally blown up the blog categories on this Web site. Sorry, I can’t maintain a taxonomy in the long run. To me, a taxonomy is useful as a snapshot of your mind, but it is not a valid long-term knowledge management system. I will never sort emails or blog posts into folders ever again. The era when we tried to copy paper folders in the digital world is over. Digital knowledge management is finally here, and this video by Merlin Mann is an important step in training yourself for it.

Finally giving up on PDAs

I was one of the early adopters of PDAs: circa 1985, I always had a pocket computer with me. I have been a PalmOS user for about 7 years now. But the sorry state of the market, the much improved free online offerings (such as Google Calendar), and the wider availability of WiFi make PDAs less attractive.

I figured out that I can safely have roaming access to my calendar, mail, todos, and so on, at a tiny cost, online. I nearly never work without a computer, and if any computer is good enough to give me access to my data, why carry a PDA?

Well, there is one need that online applications cannot fill. Sometimes, I really just want to jot down an idea, or an appointment, and I do not have a computer with me. Oddly, I am often at a meeting and there is no WiFi available. Or sometimes, I have a nice work-related idea, but I am shopping for milk. Or sometimes, I am at work, and I recall that I must go shop for milk. For jotting down something in a hurry, paper does the job. I will go back to what I did 20 years ago, when I carried a small (paper) notebook in my pocket. Paper is cheap, quick and efficient, it does not need to be recharged, and so on.

Ah! But here is the real twist. I will not manage my data with paper. Paper is the expendable part of my system. You just collect the random information on paper, and later you sort it out in digital form, where it remains in a more permanent and manageable form. Have you ever tried creating a backup of a paper calendar? What about searching for a meeting by keywords (did I ever meet with John Borzak?)?

I am already an avid user of online applications, including wikis, blogs, Web-based mail and calendaring applications (Google’s), version control systems (subversion), and so on. This has already allowed me to be a roaming computer user: I can just go from my home computer, to my laptop, to my office computer, and always have all of my data at hand. The nice thing about such a setting is that you also do away with the need for backups, since you are constantly backing up your data remotely as part of the process.

In any case, I’ll let you know how my new paper-based approach goes. Give me a few weeks.

PDFView is dead, vive Skim!

PDFView, my trusty MacOS PDF viewer, is dead. Fortunately, Skim comes to the rescue. Skim has pretty much the same features as PDFView. For example, you can tell it to automatically reload a PDF file when it changes on disk, which is a needed feature if you are going to use LaTeX seriously. Also, though not without flaws, Skim lets you annotate PDF files in fancy ways. To do so, it cheats: it does not actually modify the PDF file itself, but rather the file’s metadata… which means that if you share the PDF file by email, the annotations will silently disappear. Fortunately, there are ways around it.

I have never understood why Adobe and others make it so difficult to annotate PDF files. This is really an historical mistake.

Computing the Hamming distance between two strings in Java?

Odd. This morning I was looking for some Java code to compute the Hamming distance between two strings, and could not find any. There are plenty of code samples for the Hamming distance between integers, but I am really looking for something that can process String objects.

Does anyone know where I could find this?

Update. Yes, I know it is not difficult. I am really looking for a Java two-liner.
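
Here is a sketch of such a two-liner. It assumes the usual definition: the strings must have equal length, and the distance counts the positions whose characters differ. IntStream requires Java 8 or later, and the class and method names are hypothetical. The code compares UTF-16 char values, so a character outside the Basic Multilingual Plane counts as two positions.

```java
import java.util.stream.IntStream;

// Hypothetical helper: Hamming distance between two equal-length strings.
final class HammingDistance {
    static long between(String a, String b) {
        if (a.length() != b.length())
            throw new IllegalArgumentException("Hamming distance needs strings of equal length");
        // Count the indices where the UTF-16 chars differ.
        return IntStream.range(0, a.length()).filter(i -> a.charAt(i) != b.charAt(i)).count();
    }
}
```

For example, HammingDistance.between("karolin", "kathrin") is 3.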
