Thursday, November 25th, 2004

Computing argmax fast in Python

Filed under: — Daniel Lemire @ 18:03

Python doesn’t come with a built-in argmax function, that is, a function that tells you where the maximum is in an array. I keep needing to write my own argmax function, so I’m getting better and better at it. Here’s the best I could come up with so far:

from itertools import izip
argmax = lambda array: max(izip(array, xrange(len(array))))[1]

The really nice thing about izip and xrange is that they don’t actually build lists in memory; they produce their items lazily, one at a time. The itertools module has plenty of similar functions, such as imap or ifilter. Very neat.
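
For readers on a current interpreter, here is a rough sketch of the same idea in Python 3, where zip and range are already lazy. The function names argmax and argmax_key are mine, and the second variant is an alternative I am assuming is acceptable, not the one-liner above; the two differ in how they break ties.

```python
def argmax(array):
    """Return the index of a maximum of a non-empty sequence."""
    # Pair each value with its index. Tuples compare by value first and
    # then by index, so ties resolve to the largest index, as in the
    # izip/xrange one-liner.
    return max(zip(array, range(len(array))))[1]


def argmax_key(array):
    """Alternative sketch: compare indexes by the value they point at."""
    # max() returns the first maximal item it sees, so ties resolve to
    # the smallest index here.
    return max(range(len(array)), key=array.__getitem__)
```

Both versions raise ValueError on an empty sequence, since max has nothing to compare.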

Here’s a challenge to you: can you do better? That is, can you write a faster, less memory hungry function? If not, did I find the optimal solution? If so, do I get some kind of prize?

Next, suppose you want to find the argmax while excluding some set of bad indexes. You can do it this way:

from itertools import izip, ifilter
argmaxbad = lambda array,badindexes: max(ifilter(lambda t: t[1] not in badindexes,izip(array, xrange(len(array)))))[1]
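
A possible Python 3 rendering of the same filter, as a sketch only: the name argmax_excluding is hypothetical, and converting the bad indexes to a set is my assumption to keep each membership test cheap.

```python
def argmax_excluding(array, bad_indexes):
    """Return the index of a maximum, skipping the given indexes."""
    bad = set(bad_indexes)  # assumption: a set, so "i not in bad" is fast
    # Lazily yield (value, index) pairs for the indexes that are allowed.
    candidates = ((value, i) for i, value in enumerate(array) if i not in bad)
    # Raises ValueError if every index has been excluded.
    return max(candidates)[1]
```

With the array [5, 1, 7, 3] and index 2 excluded, the largest remaining value is 5, so the function returns 0.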

Python is amazingly easy.

As a side note, you can also compute the intersection and union of sets (stored as lists) in Python in a similar (functional) spirit:
def intersection(set1,set2): return filter(lambda s:s in set2,set1)
def union(set1, set2): return set1 + filter(lambda s:s not in set1, set2)
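
In Python 3, filter returns an iterator, so the list concatenation in union no longer works as written. Here is a minimal sketch of list-based equivalents, assuming the items are hashable so that a temporary set can speed up the membership tests; the parameter names are mine.

```python
def intersection(list1, list2):
    """Items of list1 that also appear in list2, in list1's order."""
    members = set(list2)  # assumption: items are hashable
    return [s for s in list1 if s in members]


def union(list1, list2):
    """list1 followed by the items of list2 that are not in list1."""
    seen = set(list1)
    return list1 + [s for s in list2 if s not in seen]
```

If order and duplicates do not matter, the built-in set type does this directly: set(a) & set(b) for the intersection and set(a) | set(b) for the union.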

Update: you can do the same things with hash tables:
max(izip(hashtable[max].itervalues(), hashtable[max].iterkeys()))[1]
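
For a plain dictionary in Python 3, one way to sketch the same lookup is to let max compare the keys by their values. The names argmax_dict and table are hypothetical, and tie-breaking differs from the izip version: this returns the first maximal key in iteration order instead of the largest key.

```python
def argmax_dict(table):
    """Return a key whose value is maximal in a non-empty dict."""
    # Iterating a dict yields its keys; table.get maps each key to its value.
    return max(table, key=table.get)
```

For example, argmax_dict({"a": 2, "b": 7, "c": 5}) returns "b".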

Tuesday, November 23rd, 2004

Aaron Straup Cope’s NYTimes Widgets

Filed under: — Daniel Lemire @ 10:02

One of the most interesting talks we had at SWIG’04 was “Design Issues and Technical Challenges Making the Eatdrinkfeelgood Markup Language RDF” where Aaron showed why it was hard to use RDF in an XML project. I think it all boils down to the fact that we have no good widespread way of serializing RDF to XML. In any case, Aaron finally sent me a link to his NYTimes Widgets.

It lacks sufficient documentation for me to grok it quickly, but from what I understand, Aaron tried to create a useful and innovative RDF application. Here’s what he says about his widgets:

The New York Times includes a large amount of topical metadata with each article it publishes. These are widgets that, having harvested the data, try to do something interesting with it.

Saturday, November 20th, 2004

Good software engineering according to Paul Graham

Filed under: — Daniel Lemire @ 10:27

Paul Graham describes what good software developers do:

In software, paradoxical as it sounds, good craftsmanship means working fast. If you work slowly and meticulously, you merely end up with a very fine implementation of your initial, mistaken idea. Working slowly and meticulously is premature optimization. Better to get a prototype done fast, and see what new ideas it gives you.

Thursday, November 18th, 2004

Globalization and the American IT Worker

Filed under: Science and Technology — Daniel Lemire @ 18:42

Norman Matloff wrote a solid paper called Globalization and the American IT Worker, published in the latest issue (Nov. 2004) of Communications of the ACM. Here’s a rather bleak quote:

University computer science departments must be honest with students regarding career opportunities in the field. The reduction in programming jobs open to U.S. citizens and green card holders is permanent, not just a dip in the business cycle. Students who want technological work must have less of a mindset on programming and put more effort into understanding computer systems in preparation for jobs not easily offshored (such as system and database administrators). For instance, how many graduates can give a cogent explanation of how an OS boots up?

RSS is the Semantic Web

Filed under: — Daniel Lemire @ 16:30

Here’s what Stephen Downes has to say about the Semantic Web:

RSS is the semantic web. It is not the official semantic web as I said, it is not sanctioned by any standards body or organization whatsoever. But RSS is what has emerged as the de facto description of online content, used by more than four million sites already worldwide, used to describe not only resources, but people, places, objects, calendar entries, and in my way of thinking, learning resources and learning objects.

What makes RSS work is that it approaches search a lot more like Google and a lot less like the Federated search described above. Metadata moves freely about the internet, is aggregated not by one but by many sources, is recombined, and fed forward. RSS is now used to describe the content of blogs, and when aggregated, is the combining of blog posts into new and novel forms. Sites like Technorati and Bloglines, Popdex and Blog Digger are just exploring this potential. RSS is the new syntax, and the people using it have found a voice.

Yuhong Yan’s Home Page

Filed under: Academia/Research — Daniel Lemire @ 16:25

My ex-colleague Yuhong Yan has now registered her own domain name, where she wants to publish her results. Here’s what she has to say…

Why do I use commercial service to host my web site? 1) to avoid the complexity and regulations if using NRC resources; 2) to bring the research to the real world in a faster and more controllable way. You are encouraged to use the information, software and send me your suggestions.

I never used my employers’ web hosting services myself, but I find it interesting to see that Yuhong is taking charge of her identity on the Web.

TOOL: The Open Opinion Layer

Filed under: — Daniel Lemire @ 8:22

Here’s an interesting paper by Hassan Masum, TOOL: The Open Opinion Layer. Here’s the abstract:

Shared opinions drive society: what we read, how we vote, and where we shop are all heavily influenced by the choices of others. However, the cost in time and money to systematically share opinions remains high, while the actual performance history of opinion generators is often not tracked.

This article explores the development of a distributed open opinion layer, which is given the generic name of TOOL. Similar to the evolution of network protocols as an underlying layer for many computational tasks, we suggest that TOOL has the potential to become a common substrate upon which many scientific, commercial, and social activities will be based.

Valuation decisions are ubiquitous in human interaction and thought itself. Incorporating information valuation into a computational layer will be as significant a step forward as our current communication and information retrieval layers.
