Easy Data Structures in Python

Python is a beautiful, easy-to-read language. It’s also usually easy to write. For the most part, it’s simple to make your classes work well with the language by implementing a few “magic methods”, but writing some of them gets tedious.

A nice feature that Kotlin comes with is the data modifier, which, when used on a class, will generate the equals(), hashCode(), and toString() methods for it, based on the properties declared in the primary constructor. I wanted to do something similar to this in Python, since implementing __eq__, __hash__, __str__, and __repr__ can be tedious. Also, including such method definitions makes your class look less appealing (those double-unders are a bit unsightly).

Let’s look at how this can be done.

Descriptors

My first thought was to use descriptors, since I was reading about them (and finally getting to the point where I really understand them) when the idea came to me. It didn’t take too long for me to give up on that idea. Why would I need to make non-data descriptors when I could give a simple function?

Decorators and Monkey-Patching

I had wanted to use a decorator from the beginning, but its implementation changed fairly quickly once I decided against descriptors.

The decorator needs to know two things: 1) which class to modify and 2) which fields to base all of the method calculations on. Since the class is provided by the basic decorator call, I only had to decide how to figure out which fields to use. I briefly considered searching through the __dict__ or something like that, but quickly dismissed it; there were too many chances to miss a field or to include a field that the user didn’t want included. So, I decided to ask for it. The structure of the decorator looks like this:

def data(*field_names):
    def data_class(cls):
        # do stuff to the class
        return cls
    return data_class

data takes in a varargs of (supposedly) strings that are names of fields in the class (properties and other descriptors will work for this too). It then defines the actual decorator function and returns it to be used on the class.

The # do stuff… area can be filled with simple assignments to cls’s __eq__, __hash__, __str__, and __repr__ methods, such as cls.__str__ = to_string(*field_names).
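To make the two-step call a little more concrete, here is a minimal sketch in which the “do stuff” step only records the field names on the class. The _data_fields attribute and the two example classes are hypothetical names used purely for illustration; they are not part of the final decorator.

```python
def data(*field_names):
    # Step 1: data('x', 'y') runs first and captures the field names.
    def data_class(cls):
        # Step 2: the returned function receives the class being decorated.
        # Hypothetical stand-in for "do stuff": just remember the names.
        cls._data_fields = field_names
        return cls
    return data_class


@data('x', 'y')
class Decorated:
    pass


class Manual:
    pass


# The @ syntax above is shorthand for this explicit two-step call.
Manual = data('x', 'y')(Manual)

print(Decorated._data_fields)  # ('x', 'y')
print(Manual._data_fields)     # ('x', 'y')
```

Both classes end up with the same attribute, which shows that the outer call only collects arguments and the inner function does the actual work on the class.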

Now we need to define the functions that will provide the definitions of our methods. Let’s start with __eq__, shall we?

`__eq__`

First, we need to tell the produced __eq__ method what fields are to be used. Since we can’t put any new parameters into the function signature (__eq__ has a specific signature it must follow to be used “magically” by Python), we need to provide the fields via a closure or a class definition. Being a fairly simple definition, I decided to go with a closure. Here’s the start of it:

def equals(*field_names):
    def __eq__(self, other):
        # comparison logic goes here
    return __eq__

Now we need to implement the comparison logic. To do that, we’ll loop over the field_names, mapping them to the actual values of self and other, then comparing those values:

return all(getattr(self, field_name) == getattr(other, field_name) for field_name in field_names)

This works pretty well, but will raise an AttributeError if self or other doesn’t have the field. If self doesn’t have it, this is a big problem and should raise an error, but, since other isn’t necessarily expected to be the same type of object, we should just return False if this happens.

Since inlining this check will be ugly, we’ll move the mapping functionality to a function:

return all(_fields_are_equal(self, other, field_name) for field_name in field_names)

and define the function thusly:

def _fields_are_equal(self, other, field_name):
    self_value = getattr(self, field_name)
    try:
        other_value = getattr(other, field_name)
    except AttributeError:
        return False
    return self_value == other_value

Now we have our definition for the equality checker. Add the following line to the class decorator:

cls.__eq__ = equals(*field_names)

An interesting side effect is that this definition will allow the class to be equal to a namedtuple with the same list of field names (assuming the values of those fields are the same). Personally, I’m glad of this, since the basic idea behind this is to supply this functionality to simple data-based classes, which is mostly what a namedtuple is. For a little while, I considered doing a type check, but decided against that. It’s not particularly pythonic, and I actually like being the same as a namedtuple. You can change this, obviously.
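Here is a small sketch that puts the equality pieces together and tries them out, including the namedtuple case. The Point2D class and the PointTuple namedtuple are hypothetical examples made up for this demonstration.

```python
from collections import namedtuple


def _fields_are_equal(self, other, field_name):
    # A missing field on self is a real error, so let it propagate.
    self_value = getattr(self, field_name)
    try:
        other_value = getattr(other, field_name)
    except AttributeError:
        # other lacks the field, so the two objects cannot be equal.
        return False
    return self_value == other_value


def equals(*field_names):
    def __eq__(self, other):
        return all(_fields_are_equal(self, other, name) for name in field_names)
    return __eq__


class Point2D:  # hypothetical example class
    def __init__(self, x, y):
        self.x = x
        self.y = y


Point2D.__eq__ = equals('x', 'y')

PointTuple = namedtuple('PointTuple', ['x', 'y'])  # hypothetical example

print(Point2D(1, 2) == Point2D(1, 2))     # True
print(Point2D(1, 2) == Point2D(1, 3))     # False
print(Point2D(1, 2) == PointTuple(1, 2))  # True
print(Point2D(1, 2) == "not a point")     # False
```

The last comparison is the case the helper function exists for: the string has no x attribute, so the AttributeError is swallowed and the result is simply False.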

`__hash__`

Again, our hashing function is going to need the field names, so it’ll be structured like this:

def hash_code(*field_names):
    def __hash__(self):
        # hashing code here
    return __hash__

Since our class will compare as equal to namedtuples with the same data, we should also have their hash code match, in order to fit with the agreement between __eq__ and __hash__ (things that compare equal should have the same hash value). So, let’s simply make a tuple of the fields and run the hash function of that:

return hash(tuple(getattr(self, field_name) for field_name in field_names))

This gives us the same hash code that a namedtuple holding the same field values would give us. Now, don’t forget to add this line to the decorator function:

cls.__hash__ = hash_code(*field_names)
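As a quick check of that claim, the sketch below assembles the hashing closure and compares its result with the hash of a plain tuple and of a namedtuple. As before, Point2D and PointTuple are hypothetical names used only for this example.

```python
from collections import namedtuple


def hash_code(*field_names):
    def __hash__(self):
        # Hash a tuple of the field values, in the order the names were given.
        return hash(tuple(getattr(self, name) for name in field_names))
    return __hash__


class Point2D:  # hypothetical example class
    def __init__(self, x, y):
        self.x = x
        self.y = y


Point2D.__hash__ = hash_code('x', 'y')

PointTuple = namedtuple('PointTuple', ['x', 'y'])  # hypothetical example

p = Point2D(1, 2)
print(hash(p) == hash((1, 2)))            # True
print(hash(p) == hash(PointTuple(1, 2)))  # True
```

Keep in mind that this only works when every field value is itself hashable, and that changing a field while the object sits in a set or dict will change its hash out from under the container.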

`__str__` and `__repr__`

I grouped these two together because, usually, they’re the same. With the data class decorator, they’re always the same. So, let’s start with the same basic building block of a function:

def to_string(*field_names):
    def __str__(self):
        # da code
    return __str__

The format I’m going for here is a pretty typical format of ClassName(field1=value1, field2=value2). The first thing we need is the name of the class. That’s easy, we just get it off of self:

class_name = type(self).__name__

We’re going to need an opening paren right after, so let’s combine that right away. Replace the previous line with:

start = type(self).__name__ + '('

Next we’ll have to go through each field, getting the field name, then an =, and then the value in that field. Let’s define a quick helper function to get that bit:

def _field_printout(self, field_name):
    return field_name + '=' + str(getattr(self, field_name))

Each of those fields needs to be separated by a comma and a space, so we’ll do a join:

middle = ', '.join(_field_printout(self, field_name) for field_name in field_names)

Lastly, we need to close it all with a closing paren:

return start + middle + ')'

Put it together and add the following line in the class decorator:

cls.__str__ = cls.__repr__ = to_string(*field_names)
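Put together, the string-building pieces might look like the following sketch. The Point2D class is a hypothetical example, and the methods are assigned by hand here only so the snippet can run on its own.

```python
def _field_printout(self, field_name):
    return field_name + '=' + str(getattr(self, field_name))


def to_string(*field_names):
    def __str__(self):
        start = type(self).__name__ + '('
        middle = ', '.join(_field_printout(self, name) for name in field_names)
        return start + middle + ')'
    return __str__


class Point2D:  # hypothetical example class
    def __init__(self, x, y):
        self.x = x
        self.y = y


Point2D.__str__ = Point2D.__repr__ = to_string('x', 'y')

print(str(Point2D(1, 2)))   # Point2D(x=1, y=2)
print(repr(Point2D(1, 2)))  # Point2D(x=1, y=2)
print([Point2D(3, 4)])      # [Point2D(x=3, y=4)]
```

Because the helper uses str() on each value, string fields are printed without quotes; swapping in repr() there would be one way to change that if you prefer.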

Using it

Here’s an example of its use:

@data('x', 'y')
class Point2D():
    def __init__(self, x, y):
        self.x = x
        self.y = y

That’s all you need. This very simple case would likely have been better off as a namedtuple, but if you wanted to add more methods to it, it’s easier to do so with the class than with the tuple.
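For reference, here is one way the whole thing could look in a single file, followed by the decorated class in action. This is a condensed sketch of the pieces built up in this post (the string helper is inlined), and the distance_to method is a hypothetical addition that shows why you might prefer a class over a namedtuple.

```python
import math


def _fields_are_equal(self, other, field_name):
    self_value = getattr(self, field_name)
    try:
        other_value = getattr(other, field_name)
    except AttributeError:
        return False
    return self_value == other_value


def equals(*field_names):
    def __eq__(self, other):
        return all(_fields_are_equal(self, other, n) for n in field_names)
    return __eq__


def hash_code(*field_names):
    def __hash__(self):
        return hash(tuple(getattr(self, n) for n in field_names))
    return __hash__


def to_string(*field_names):
    def __str__(self):
        fields = ', '.join(n + '=' + str(getattr(self, n)) for n in field_names)
        return type(self).__name__ + '(' + fields + ')'
    return __str__


def data(*field_names):
    def data_class(cls):
        cls.__eq__ = equals(*field_names)
        cls.__hash__ = hash_code(*field_names)
        cls.__str__ = cls.__repr__ = to_string(*field_names)
        return cls
    return data_class


@data('x', 'y')
class Point2D:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def distance_to(self, other):  # hypothetical extra method
        return math.hypot(self.x - other.x, self.y - other.y)


a, b = Point2D(0, 0), Point2D(3, 4)
print(a)                            # Point2D(x=0, y=0)
print(a == Point2D(0, 0))           # True
print(len({a, Point2D(0, 0), b}))   # 2
print(a.distance_to(b))             # 5.0
```

The set ends up with two elements because the two origin points compare equal and hash the same, which is exactly the agreement between __eq__ and __hash__ discussed earlier.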

Pick and Choose

If you don’t like the implementation of some of these functions, you can choose to leave out some to define your own. When you do this, you no longer use the class decorator; instead you can write your own that only sets the ones you want, or you set them manually. For example, if the Point2D class only wanted the __hash__ and __eq__ methods, it could be defined like this:

class Point2D():
    def __init__(self, x, y):
        self.x = x
        self.y = y

    _field_names = ('x', 'y')
    __hash__ = hash_code(*_field_names)
    __eq__ = equals(*_field_names)

Creating _field_names isn’t necessary, but it lets you write the names down only once.
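The other option mentioned above, writing your own decorator that sets only the methods you want, could be sketched like this. The eq_and_hash name is hypothetical, and the hand-written __repr__ is there only to show that the decorator leaves it alone.

```python
def _fields_are_equal(self, other, field_name):
    self_value = getattr(self, field_name)
    try:
        other_value = getattr(other, field_name)
    except AttributeError:
        return False
    return self_value == other_value


def equals(*field_names):
    def __eq__(self, other):
        return all(_fields_are_equal(self, other, n) for n in field_names)
    return __eq__


def hash_code(*field_names):
    def __hash__(self):
        return hash(tuple(getattr(self, n) for n in field_names))
    return __hash__


def eq_and_hash(*field_names):  # hypothetical decorator name
    def decorate(cls):
        # Only equality and hashing are patched in; string methods stay as-is.
        cls.__eq__ = equals(*field_names)
        cls.__hash__ = hash_code(*field_names)
        return cls
    return decorate


@eq_and_hash('x', 'y')
class Point2D:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):  # hand-written, untouched by the decorator
        return '<{}, {}>'.format(self.x, self.y)


print(Point2D(1, 2) == Point2D(1, 2))              # True
print(hash(Point2D(1, 2)) == hash(Point2D(1, 2)))  # True
print(repr(Point2D(1, 2)))                         # <1, 2>
```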

Outro

You are free to use this code all you want, or you can go !!!HERE!!!! to download the file that contains all these definitions, including the documentation.

Programming Ideas With Jake

Jake explores ideas in Java and Python programming