itemgetter, attrgetter and the key keyword argument

One thing that consistently amazes me is Python’s flexibility and how concisely it lets you express things. Case in point: suppose you have a list of tuples like [(1,2), (3,4), (5,0)].

The task is to find the tuple with the smallest value in the second position.

import sys
list_of_tuples = [(1,2), (3,4), (5,0)]
min_tuple = None
minimum = sys.maxint
for pair in list_of_tuples:
    x,y = pair
    if y < minimum:
        minimum = y  # remember the smallest second element seen so far
        min_tuple = pair
print min_tuple

Sure, this works, but it is verbose and harder to maintain. They say that when in Rome, do as the Romans do, so let’s do it the Python way.

def snd(pair):
    x,y = pair
    return y

list_of_tuples = [(1,2), (3,4), (5,0)]
min(list_of_tuples, key=snd)

Nice, this code looks much better, and the intent is now clear. The function snd extracts the second element, min calls it on every tuple, and the tuples are compared by those values.

Using Itemgetter

operator.itemgetter is a really cool thing to have. It can be used to get a particular item from a sequence.

The usage of itemgetter is not really obvious, so here’s an example.

import operator
list__ = [1,2,3]
print operator.itemgetter(1)(list__) ## prints 2

What’s going on here? itemgetter(position) returns a callable that takes a sequence and returns the item at that position. itemgetter could be roughly implemented as:

def __itemgetter(position):
    def applier(sequence):
        return sequence.__getitem__(position)
    return applier

Note: this sketch handles only a single position, whereas operator.itemgetter accepts several items (or even slices), applies __getitem__ for each of them and returns the results as a tuple. That part is omitted for simplicity. The documentation provides an equivalent implementation, so interested readers can go through that.
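To make the multi-item and slice forms from the note concrete, here is a small sketch. The names row, first_and_third, middle and marks are made up for this example, and the code is written so that it should run on both Python 2 and Python 3.

```python
import operator

row = ['a', 'b', 'c', 'd']

# Several positions at once: the results come back as a tuple.
first_and_third = operator.itemgetter(0, 2)(row)   # ('a', 'c')

# A slice object works too, because it is simply handed to __getitem__.
middle = operator.itemgetter(slice(1, 3))(row)     # ['b', 'c']

# itemgetter is not limited to sequences: a dict key works as well.
marks = operator.itemgetter('marks')({'name': 'Foo', 'marks': 30})  # 30
```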

So what happens? The outer function takes a position, and the inner function takes the sequence and closes over position. The inner function is then returned, and it is, of course, a callable.

Let’s see how this is useful in our original problem.

import operator

list_of_tuples = [(1,2), (3,4), (5,0)]
min(list_of_tuples, key=operator.itemgetter(1))

We specify the key function as operator.itemgetter(1), which returns a callable, which then gets applied to every element of list_of_tuples. Pretty nifty eh?

Using Attrgetter

operator.attrgetter works more or less similarly to itemgetter, except that it looks up an attribute instead of an index.
Suppose you have a class which goes like

class Student(object):
    def __init__(self, id, name, marks):
        self.id = id
        self.name = name
        self.marks = marks
    
    def __str__(self):
        return '%s has marks %s' %(self.name, self.marks)

And say we have a list of Student instances named students.
The objective is to find the student with the maximum marks. We can use the max function here, which luckily supports the key parameter as well.

students = [ Student(0, 'Foo', 30), Student(1, 'Bar', 95), Student(2, 'Baz', 80)]

best_student = max(students, key=operator.attrgetter('marks')) # don't forget the quotes
print best_student

Tell me you’re not impressed by Python. attrgetter fetches the value of the given attribute (note that it looks the attribute up by name, much as the built-in getattr does, so you need to pass the name as a string), and max uses that value as the key for the comparison.
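In the same spirit as the itemgetter sketch earlier, attrgetter could be roughly approximated as below. This is only a simplified stand-in: my_attrgetter is a hypothetical name, and it handles a single, non-dotted attribute, whereas the real operator.attrgetter also accepts several names at once.

```python
import operator

class Student(object):
    def __init__(self, id, name, marks):
        self.id = id
        self.name = name
        self.marks = marks

def my_attrgetter(name):
    # Hypothetical, simplified stand-in for operator.attrgetter.
    def applier(obj):
        # Look the attribute up by its name, given as a string.
        return getattr(obj, name)
    return applier

bar = Student(1, 'Bar', 95)

sketch_value = my_attrgetter('marks')(bar)                    # 95
real_value = operator.attrgetter('marks')(bar)                # 95

# The real attrgetter also takes several names and returns a tuple.
name_and_marks = operator.attrgetter('name', 'marks')(bar)    # ('Bar', 95)
```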

Remember, key can be used with max, min and sorted (and with list.sort as well). Make use of it. Combined with itemgetter and attrgetter, it makes such comparisons very easy.
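For instance, the same key function from the min example can drive sorted and list.sort. This is a minimal sketch; the names by_second and largest_first are made up for illustration.

```python
import operator

list_of_tuples = [(1, 2), (3, 4), (5, 0)]

# Sort by the second element of each tuple; the original list is untouched.
by_second = sorted(list_of_tuples, key=operator.itemgetter(1))
# [(5, 0), (1, 2), (3, 4)]

# The same key, largest second element first.
largest_first = sorted(list_of_tuples, key=operator.itemgetter(1), reverse=True)
# [(3, 4), (1, 2), (5, 0)]

# list.sort accepts key too, and sorts in place.
list_of_tuples.sort(key=operator.itemgetter(1))
```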

Python closures oddity

So today, I came across some weird code.

def foo(val):
    print val

lambda_list = [ lambda: foo(i) for i in xrange(3) ]
for lambda__ in lambda_list:
    lambda__()

Don’t peek. Try to predict the output.

.
.
.
.
.
.
.
.

Surprising, isn’t it? I was certainly surprised to see the output come out as

2
2
2

Why’s this happening?

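Well, it turns out that Python closes over names, not values. What that means is, the value of i in our code is looked up only when the lambda__() function actually gets called within the for loop. By then the list comprehension has finished, so i holds its final value, 2, and every lambda sees that same value.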
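The late lookup can be seen in isolation with a smaller sketch. The function names rebinding_demo and make_getters are hypothetical, and the code is written so that it should run on both Python 2 and Python 3.

```python
def rebinding_demo():
    x = 1
    get_x = lambda: x   # the lambda remembers the name x, not the value 1
    x = 10              # rebinding x after the lambda was created
    return get_x()      # x is looked up only now

late_value = rebinding_demo()   # 10

def make_getters():
    getters = []
    for i in range(3):
        getters.append(lambda: i)   # all three lambdas share the same i
    return getters

# The loop has finished by the time the lambdas are called, so i is 2.
results = [getter() for getter in make_getters()]   # [2, 2, 2]
```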

Solution?

1. Just pass in the value of i as a parameter to the function. The catch is that we are calling the function as lambda__() without any arguments, but a default argument works nicely in this case: it is evaluated when the lambda is created, so each lambda keeps its own value of i.

lambda_list = [ lambda i=i: foo(i) for i in xrange(3) ]

2. Use functools.partial to bundle the function with its argument, and then call the result later.

import functools
lambda_list = [ functools.partial(foo, i) for i in xrange(3) ]

3. If you are a closeted Javascript programmer, you could also do this…

lambda_list = [ (lambda i: (lambda: foo(i)))(i) for i in xrange(3) ]  

But seriously, please don’t do this; it’s terribly unreadable.
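As a quick check, the sketch below runs the default-argument and functools.partial fixes side by side, plus a named factory function as a more readable take on the nested-lambda idea. Here foo returns its argument instead of printing it so the results can be compared, and make_caller is a hypothetical helper that is not part of the original code.

```python
import functools

def foo(val):
    # Return instead of print, so the results can be collected and compared.
    return val

# Fix 1: a default argument captures the current value of i.
with_default = [lambda i=i: foo(i) for i in range(3)]

# Fix 2: functools.partial stores the argument alongside the function.
with_partial = [functools.partial(foo, i) for i in range(3)]

# Fix 3, spelled out: each call to make_caller gets its own i.
def make_caller(i):
    def caller():
        return foo(i)
    return caller

with_factory = [make_caller(i) for i in range(3)]

results = [[f() for f in funcs]
           for funcs in (with_default, with_partial, with_factory)]
# Each inner list is [0, 1, 2].
```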
