itemgetter, attrgetters and keyword argument key

One thing which consistently amazes me is python’s flexibility and ability to concisely do stuff. Case in point – suppose you have a list of tuples like [(1,2), (3,4), (5,0)]

The task is to find the tuple which has the min value in 2nd position.

import sys
list_of_tuples = [(1,2), (3,4), (5,0)]
min_tuple = None
minimum = sys.maxint
for pair in list_of_tuples:
    x,y = pair
    if y < minimum:
        min_tuple = pair
print min_tuple

Sure, this works, but too verbose, difficult to maintain etc. They say that when in Rome, do as the Romans do. Let's do it the python way.

def snd(pair):
    x,y = pair
    return y

list_of_tuples = [(1,2), (3,4), (5,0)]
min(list_of_tuples, key=snd)

Nice, this code looks much better. Now the intent is clear. The function snd takes out the second element, and that element is used for comparison.

Using Itemgetter

operator.itemgetter is a really cool thing to have. It can be used to get a particular item from a sequence.

The usage of itemgetter is not really obvious, here's an example.

import operator
list__ = [1,2,3]
print operator.itemgetter(1)(list__) ## prints 2

What's going on here? itemgetter(position) returns a callable, that takes a sequence, and returns the value at position in sequence. itemgetter could be roughly implemented as

def __itemgetter(position):
    def applier(sequence):
        return sequence.__getitem__(position)
    return applier

Note: of course this works only for a single element, whereas operator.itemgetter can take a series of elements (or even slices) and __getitem__ gets applied for all of them - that part is omitted for simplicity. An equivalent source is provided in documentation - so interested readers can go through that.

So what happens? the outer function takes a position, and then the inner function takes the sequence and closes position in it. Now the inner function is returned, which, of course is a callable.

Lets see how this is useful in our original problem

import operator

list_of_tuples = [(1,2), (3,4), (5,0)]
min(list_of_tuples, key=operator.itemgetter(1))

We specify the key function as operator.itemgetter(1), which returns a callable, which then gets applied to every element of list_of_tuples. Pretty nifty eh?

Using Attrgetter

operator.attrgetter works more or less similarly to itemgetter, except that it looks up an attribute instead of an index.
Suppose you have a class which goes like

class Student(object):
    def __init__(self, id, name, marks):
        self.id = id
        self.name = name
        self.marks = marks
    
    def __str__(self):
        return '%s has marks %s' %(self.name, self.marks)

And say we have a list of Student instances, named students
The objective is to find student with maximum marks. We can use max function here, which luckily supports the key parameter as well.

students = [ Student(0, 'Foo', 30), Student(1, 'Bar', 95), Student(2, 'Baz', 80)]

best_student = max(students, key=operator.attrgetter('marks')) # don't forget the quotes
print best_student

Tell me you're not impressed by python. attrgetter gets the value for the given attribute ( Note that it uses __getattr__ for introspection behind the scenes, so you need to pass a string ) and max can use that value as key for performing the comparison.

Remember, key can be used with max, min and sorted. Make use of it. Along with itemgetter and attrgetter, comparisons can be done very easily.

Standard

Python closures oddity

So today, I came across some weird code.

def foo(val):
    print val

lambda_list = [ lambda: foo(i) for i in xrange(3) ]
for lambda__ in lambda_list:
    lambda__()

Don't peek. Try to predict the output.

.
.
.
.
.
.
.
.

Surprising, isn't it? I was certainly surprised to see the output come out as

2
2
2

Why's this happening?

Well, it turns out that python closes on names, and not on values. What that means is, the value of i in our code is looked up only when the lambda__() function gets actually called within the for loop.

Solution?

1. Just pass in the value of i as a parameter to the function. Unfortunately, the problem is, we are calling the function as lambda__() without any parameters. But default arguments would work nicely in this case.

lambda_list = [ lambda i=i: foo(i) for i in xrange(3) ]

2. Use functools.partial to wrap the function and the argument in, and then call it later

lambda_list = [ functools.partial(foo, i) for i in xrange(3) ]

3. If you are a closeted Javascript programmer, you could also do this...

lambda_list = [ (lambda i: (lambda: foo(i)))(i) for i in xrange(3) ]  

But seriously, please don't do this, its terribly unreadable.

Standard

Using decorators with python

Decorators are quite often used in python code. It makes stuff lot more easier. Consider this awesome decorator in django.

@login_required
def some_view(request):
    # do stuff here

login_required is said to be a decorator, and some_view is said to be the decorated function. This decorator ensures that the user is logged in before this view is processed - otherwise they are redirected to another page ( In fact - the page to be redirected can be specified through the decorator itself )

How does this work?

Before seeing how decorators work, there are some stuff that you need to know.

Closures

Python supports the concept of closures. Consider the following.

def outer_function(args):
    def inner_function(more_args):
        # some stuff here
        # notice that both args and more_args 
        # are accessible here

        print args, more_args
        pass
    return inner_function

fun = outer_function(1)
fun(2) # prints 1 2

Everytime I call the function outer_function a new definition of inner_function is created and it is returned. The fun thing is, even though the scope of args is only within the definition of outer_function it is available when inner_function is executed.

That was confusing. Let me give a better, practical example.

Functional programming languages have a concept of partial functions. Lets make a partial function.

def partial(function, *args, **kwargs):
    def ret(*iargs, **ikwargs):
        return function(*(args+iargs), **dict(ikwargs, **kwargs))
    return ret

Now I'm going to create a partial function out of add, which takes two parameters and returns their sum.

def add(x,y):
    return x+y
>>> add(5,6)
11

The partial function I create will be called add5, and will take a single parameter, add that to 5 and return the sum. To create a partial function, we call the function partial as follows

>>> add5 = partial(add, 5) 
# the first parameter to be supplied to add is 5
>>> add5(6)
11
>>> add5(10)
20

Notice how the value we passed in partial gets "stored" somehow? This works because the returned function remembers all its variables in its scope and the enclosing scopes. This can be confirmed by printing the value of locals() within the returned function.

...  
      def ret(*iargs, **ikwargs):
        print locals()
        return function(*(args+iargs), **dict(ikwargs, **kwargs))
    return ret
...
>>> add5 = partial(add, 5)
>>> add5(6)
{'args': (5,), 'iargs': (6,), 'kwargs': {}, 'function': <function add at 0x92e1844>>, 'ikwargs': {}}
11

so clearly, the value passed to partial is accessible to the inner function. keep this in mind, we're going to use this concept in decorators.

Why decorate?

Decorating a function is often required for modifying the result of function itself, or for separating out stuff that is not related to the action performed by the function.

Let us see examples for both.

A very trivial example to consider will be, suppose you have a series of functions which returns a string. And the string thus returned has to be converted to upper cased. A decorator, uppercase, can be written as follows.

def uppercase(func):
    def ret(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return ret
 

def string_function(string__):
    # do some stuff with string, and return it
    return string__

string_function = uppercase(string_function) # apply the decorator

What the heck? where's the @thing I've seen for decorators?

@uppercase
def string_function(..):
   ...

is equivalent to

def string_function(..):
   ...

string_function = uppercase(string_function)

So you see, @uppercase is just syntactic sugar for calling the function uppercase with the function to be decorated as the argument. Personally, whenever I see a decorator in @decorator pattern, I mentally parse it as func = decorator(func)

Lets see another example. This is a very widely used application of the decorators. We're going to write a memoize decorator.

Memoize? What the heck is that?

Memoization is a really cool technique which can be applied for pure functions. Pure functions - as you might know, always gives same output for the same input values - so its often useful to store the output for already computed inputs. When we get an input which is already computed, we can simply retrieve the output from the storage, instead of computing it all over again. Think of it as a cache.

Lets consider an example. Fibonacci series is a well known series. If you don't know what it is, you should check it out, its fun!

A naive recursion implementation to find nth item of Fibonacci series [ with n starting from 0 ]could be written as follows.

 def fib(n):
    if n == 0 or n == 1:
        return n
    else:
        return fib(n-1)+fib(n-2)

Lets run this for bigger numbers.

In [28]: %timeit fib(35)
1 loops, best of 3: 8.56 s per loop

Of course, you may get a different output, but that's incredibly long for me. Lets think of some ways to reduce the time.

fib(5) can be expressed as fib(3)+fib(4) which in turn can be expressed as fib(2)+fib(1)+fib(3)+fib(2). See how the pattern repeats - we need to calculate fib for lower values again and again.

Given that fib itself is a pure function, why not store the values as we compute? Lets apply memoization in this case.

A memoize decorator

Our strategy to attack this problem is to have a dictionary, with each key having value as function(key). As we compute newer function(key) values, we store an entry key: function(key) in the dictionary.

def memoize(function):
    cache = {}
    def compute(key):
        if key not in cache:
            cache[key] = function(key)
        return cache[key]
    return compute

Now, lets apply this memoize decorator to our fibonacci function

@memoize
def fib(n):
    ...

Time to run the tests again.

In [37]: %timeit fib(35)
1000000 loops, best of 3: 345 ns per loop

Yaaay! look at the improvement, from 8ish seconds, it has come down to 345 nanoseconds. So there you go, that's the kind of performance boost you can get by using memoization.

That's it for now.

Standard

Using *args (reloaded)

I had already written a post on using the asterisk operator [ splat operator anyone ? ] and the double asterisk operator. The problem was, my understanding of the concept wasn’t too clear. Here goes another attempt at it.

Asterisk operator ( * ) can be used in two ways.

1. Getting variable number of arguments into a function

Scenario : You are writing a simple function to chain together arbitrary number of lists to form a single list (by the way itertools.chain does exactly this, but returns a iterator instead). Here, you do not know in advance how many parameters are going to be passed.

def chain(*args):
    result = []
    for arg in args:
        result.extend(arg)
    return result


In  : chain([1,2,3], [4,5,6])
Out : [1,2,3,4,5,6]

What happens here is, all the arguments passed are wrapped into a tuple - args in our case. Now all we need to do is take out the arguments one by one from that tuple, using the loop and adding it to our result list.

2. Unpacking argument lists

In this case, if our arguments are wrapped up in a list or a tuple - the asterisk operator can be used to unpack them.

Scenario : You have a nested list and want to find the sum of each position in lists and create a new list. Add 1st element of list 1, list 2 and list 3, Add 2nd element and so on. [[1,2,3],[4,5,6],[7,8,9]] should provide output as [12,15,18]

To tackle this, we use the zip function, it creates tuples of elements from each corresponding position of the list.


In : zip([1,2,3],[4,5,6])
Out : [(1, 4), (2, 5), (3, 6)]

Clearly, zip takes a number of lists, not a nested list. But our input is a nested list. Asterisk operator to the rescue! . We can use the asterisk operator to unpack our nested list.

def inner_sum(nested_list):
    result = []
    for tup in zip(*nested_list):
        result.append(sum(tup))
    return result

In  : inner_sum([[1,2,3],[4,5,6],[7,8,9]])
Out : [12, 15, 18]


Fun fun fun

Scenario: Transposing a matrix ( essentially a nested list ). A very concise solution would be to use zip with asterisk operator, along with a list comprehension.


In  : matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
In  : [ list(x) for x in zip(*matrix) ]
Out : [[1, 4, 7], [2, 5, 8], [3, 6, 9]]

This is similar to how we worked out the inner_sum function, except that we create a new list out of a tuple.

Hopefully that clears up the asterisk operator's usage.

Standard

Python memory model – I

Coming from a C/C++ background, Python looked deceptively familiar for me – and I made a big mistake here. I assumed that python uses the same memory model as C. This is first of a series of posts to try and summarize in broad terms how python’s memory model works.

Btw, if you are a newbie, you should definitely check ipython out. It’s an awesome interpreter with a lot of useful features, like say tabbing for autocompletion etc. Fun thing to learn python with.

The very first thing is to NOT compare C with Python. They both are completely different languages with different idioms, so its best not to compare them.

Everything in python is an object – yes, everything. Numbers are objects. Strings are objects. Lists are objects. Classes are objects. And objects have an identity. Lets find out more about identity of objects.

In python, the id() function can be used for checking identity of an object.

In [1]: id(5)
Out[1]: 151765128

In [2]: id('string')
Out[2]: 3075578752L

In [3]: id([1,2,3])
Out[3]: 156305932

Objects have values too. The values of the object may or may not change ( more on that later ) but the identity of an object cannot change.

That wasn't so interesting... I know, I know. Well, my point is, in Python everything is an object and has an identity. Please re-read that last line - because this is important.

Lets move onto variables [ in python, it is referred to as names ]. In python, we'd do something like x = 5. What happens here is x gets mapped to the object which has the value 5.
Or, you could also say, x is bound to object having value 5.

x ----> [ object with value 5 ]

Everything's good till now? awesome. Lets talk more about values now.

In Python, objects can be of two types with respect to values. Mutable and immutable. If you can modify the object and make it have a new value, then the object is mutable and otherwise immutable (Note: There are some caveats - an immutable container may contain a mutable object, but the content of container itself cannot be modified )

Integers, Floats, strings, tuples etc are immutable in python - that is you cannot modify the value of the object.

Wait, what the heck, then how am I able to do the following?

In [1]: name = 5

In [2]: name = 6

In [3]: name
Out[3]: 6

I hear you ask, didn't I change the value of 'name' now?
Nope, you didn't.

lets view the operation in terms of names and objects.

name = 5
name ---------> [ Object with value 5 ]
============================================================================================
name = 6
name ----x [ Object with value 5 ]
|
|------------> [ Object with value 6 ]


Okay, I'm not so good with asciiart now. Well, the point is, 'name' gets rebound to object with value 6. So what you are doing is not changing the value of object, but constructing a new object with value 6.

Whee! I hope that helped. If you like to read long technical documents then this link from docs.python will be very useful for you. I'll be back with mutable objects in my next post.

Standard

Reversing list in Haskell

I’m on the 5th problem of 99 haskell problems. The first attempt went something like this

rev :: [a] -> [a]
rev (x:xs) = rev xs ++ x
rev [] = []

Let’s analyze this one. Assume we call the function as

rev [1,2,3]

rev (x:xs) matches 1: [2,3], so we have rev [2,3] ++ 1. Similarly, we’ll have rev[3] ++ 2 ++ 1 and then finally rev [] ++ 3 ++ 2 ++ 1. The whole thing can be shown as

step 1: rev [1,2,3]
step 2: rev [2,3] ++ 1
step 3: rev [3] ++ 2 ++ 1
step 4: rev [] ++ 3 ++ 2 ++ 1

This gives

[] ++ 3 ++ 2 ++ 1

which is

[3, 2, 1]

Okay, moving forward. Next attempt at writing the function involves usage of foldl. So, here we go

rev xs = foldl (\acc x -> x : acc) [] xs

The foldl function takes the accumulator and an element from the list. The element is prepended onto the accumulator. Which goes like

rev [1,2,3]
step 1: acc = [], 1 : acc
step 2: acc = [1], 2 : acc
step 3: acc = [2,1] , 3 : acc
step 4: acc = [3,2,1]

and thus, we get [3,2,1] as result from this.

Now, notice the lambda function. The parameters on left and right are flipped, so why not use the flip function and take advantage of currying?

rev xs = foldl (flip (:)) [] xs

And finally, using point free notation, we can write

rev = foldl (flip (:)) []

Pretty nifty, eh.

Standard

Fibonacci series in haskell

So I was tired of doing (boring) stuff, and all – so I decided to take up a new challenge, the Project Euler. To sweeten the deal, I’ve decided that I’d use only Haskell to solve them.

The second question is to generate the sum of even terms of Fibonacci series. I’m going to show how the series is generated , and thus the following is – not – the solution to the problem. I’ve assumed that the highest term generated should be less than 15000, in this snippet.

takeWhile (\(x, y) -> x < 15000) (iterate (\(x, y) -> (x+y, x)) (1, 0))

Let me break it down -

1. \  syntax is used to define a lambda. So we define two lambdas in this snippet, one for incrementing the series, and other for checking

2. Iterate function is used to generate newer terms from existing terms by using a function.

3. takeWhile is used to pick out elements from a list till the condition is satisfied.

Now, the iterate function produces an infinite list, which is bounded by using takeWhile function.

Test run:

Prelude> takeWhile (\(x,y) -> x < 15000) (iterate (\(x, y) -> (x+y, x)) (1, 0))

[(1,0),(1,1),(2,1),(3,2),(5,3),(8,5),(13,8),(21,13),(34,21),(55,34),(89,55),(144,89),(233,144),(377,233),(610,377),(987,610),(1597,987),(2584,1597),(4181,2584),(6765,4181),(10946,6765)]

As you can see, a list of tuples of fibonacci's series are generated.

That's it for now. Hopefully I've written something that makes sense ;)

PS: Interestingly, wordpress.com's source highlighting plugin does not support haskell - so if anybody is listening, please add it soon!

Standard