Except the Unexpected

by Marty Alchin on November 21, 2007 about Django

No, that title isn’t a typo. Predictably, yesterday’s post drew out an opposing view, and I’m very glad for it. While I haven’t changed my mind on the subject, Cedric did raise some reasonable points that I neglected in my original post. Maybe it’s just that I’m growing tired of posting every day, but I didn’t adequately explain my views, and for that I apologize. Only for not explaining, though, not for the views themselves. Today, I’ll try to be more detailed in my thoughts on the subject, and offer more recommendations than just “don’t return None” and “embrace exceptions”.

Also, I will apologize for the length of this post. I hope it helps, though.

Defining “exception”

A quick dictionary search for the word “exception” provides numerous options for defining the word. Removing useless definitions (“the act of excepting or the fact of being excepted”) and more specialized uses (criticism and legal usage), we’re left with a few variations on a theme:

something excepted; an instance or case not conforming to the general rule
one that is excepted, especially a case that does not conform to a rule or generalization
an instance that does not conform to a rule or generalization

The recurring theme here is that exceptions generally require a rule from which to deviate. But we’re programmers here, so what kind of rules are we talking about? In a nutshell, the rule is whatever action a piece of code is expected to perform. This might be defined by a function’s name, documentation, usage examples, or other communication, but it will be specific to each piece of code.

Any time you write a function, you’re designating a purpose for it, a task it should perform. That task, however simple or complicated, however blunt or subtle, is its rule. Deviations from that rule are exceptions. (To throw in more tongue-twisters for fun, consider this: exceptions violate expectations; they are excepted from what’s expected.)

However, just defining the concept of an exception doesn’t really do much for anybody. It’s like one of those zen sayings that doesn’t make any sense unless you’re expressly looking for meaning in the universe. And even then, it doesn’t really provide an answer, it just makes you appreciate the question. So how should we apply the concept of exceptions to programming? Well, exactly how it works will depend on the rule, and thus on each function itself.

So, it seems to me that the best designs focus on defining the rule, rather than the exceptions. Any required exceptions will be defined naturally as a side-effect of having a well-defined rule in place.

Defining a rule

When you write a function, you’re defining a task, or set of tasks, that the function will perform (some systems may make this definition more formal by designing by contract, but that’s beyond the scope of this article). Exactly how it’s performed is irrelevant to this discussion, and since rules will be different for different functions, the real key here is just to make sure that you do at least consciously define a rule for what the function will do. A few examples:

A dictionary’s __getitem__ method will accept a key and return the value referenced by that key.
A dictionary’s get method will accept a key and return the value referenced by that key, if such a value is present.
A Django ORM Manager’s get method will accept values for any number of a model’s fields, and retrieve an object for the only row in the underlying table that contains those values.

First, consider the dictionary. By using Python’s standard dictionary syntax, x[i], you’re implicitly calling __getitem__. So when using this syntax, the rule specifies that the key supplied must match a value. If this isn’t the case, it’s an exception to the rule, and Python reacts accordingly by raising a KeyError. If you supply a value for the key that can’t be used as a key (such as a list; try x[[]]), it can’t even try to look it up as a key, so that’s another exception: TypeError. These cases aren’t covered by the rule, so they’re considered to be exceptions.
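
As a minimal sketch of the two exceptions just described, consider a hypothetical dictionary named prices (the name and its contents are assumptions made up for illustration):

```python
# Hypothetical data, used only for illustration.
prices = {"spam": 2, "eggs": 3}

# The rule holds: the supplied key matches a value.
spam_price = prices["spam"]

# A missing key is an exception to the rule, so Python raises KeyError.
try:
    prices["bacon"]
except KeyError as exc:
    missing_key_error = exc

# A list is unhashable, so the lookup can't even be attempted: TypeError.
try:
    prices[[]]
except TypeError as exc:
    unhashable_error = exc
```

Neither failure needs any special handling inside the dictionary itself; both fall out naturally from the rule that a key must match a value.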

Dictionaries provide another option, however, which allows for keys to be missing. This is a different function, and thus a different rule. By including “if such a value is present” in its rule, the get method must handle the inverse of that condition. If an appropriate value isn’t present, it’s no longer an exception, but an anticipated aspect of the rule, and it handles this by returning None. *gasp* Yes, this violates my previous post, and that’s why I agreed with Cedric that I should clarify my point. This situation isn’t evil, because returning None is appropriate within the rule defined for the function.

It’s also important to note that calling get instead of __getitem__ is a choice you make; it’s not automatic. The “standard” dictionary access technique uses __getitem__, with get being generally reserved for more specialized situations which need to follow a different rule.
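
A short sketch of that different rule, again using a hypothetical prices dictionary (an assumption for illustration only). The second argument to get is standard Python, though it isn’t discussed above:

```python
# Hypothetical data, used only for illustration.
prices = {"spam": 2, "eggs": 3}

# Key present: get behaves like ordinary lookup.
found = prices.get("spam")

# Key absent: the rule anticipates this case, so None comes back
# and no exception is raised.
missing = prices.get("bacon")

# get also accepts an explicit default to return instead of None.
fallback = prices.get("bacon", 0)
```

The point is that the caller asked for this behavior by name, rather than having None show up unannounced.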

In the case of the get method of Django’s model managers, you’ll notice the rule is very complex, and thus has many potential points of failure. Just going by the rule I laid out above, here are the ways it could go wrong:

You supply an argument that’s not a valid model field
It can’t access the underlying database table
There are no rows in the database matching the supplied options
There’s more than one matching row (remember, the rule says “the only row”)

Of course, there are more things that can go wrong, but those are mostly implementation details or things that are out of Django’s control. Those listed above are based solely on the rule I provided, and of course, that rule is probably a bit oversimplified as well.
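
To show how those failures fall out of the rule, here is a rough stand-in written in plain Python. It is not Django’s implementation: the function get_only, the three exception classes, and the sample rows are all hypothetical names invented for this sketch. It works on an in-memory list, so the “can’t access the underlying database table” case isn’t modeled.

```python
class InvalidField(Exception):
    """Hypothetical: a lookup named a field the rows don't have."""


class NoMatch(Exception):
    """Hypothetical: no row matched the supplied values."""


class MultipleMatches(Exception):
    """Hypothetical: more than one row matched the supplied values."""


def get_only(rows, fields, **lookups):
    """Return the only row whose values match every supplied lookup."""
    for name in lookups:
        if name not in fields:
            raise InvalidField(name)
    matches = [
        row for row in rows
        if all(row[name] == value for name, value in lookups.items())
    ]
    if not matches:
        raise NoMatch(lookups)
    if len(matches) > 1:
        # The rule says "the only row", so two matches break it.
        raise MultipleMatches(lookups)
    return matches[0]


# Made-up sample data standing in for a database table.
FIELDS = ("id", "title")
ROWS = [
    {"id": 1, "title": "Spam"},
    {"id": 2, "title": "Eggs"},
    {"id": 3, "title": "Spam"},
]

article = get_only(ROWS, FIELDS, id=2)  # exactly one match
```

Calling get_only(ROWS, FIELDS, title="Spam") would raise MultipleMatches, title="Bacon" would raise NoMatch, and author="Marty" would raise InvalidField. Each exception corresponds to one clause of the rule being broken.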

Some words of advice

So when should your rules include provisions for None? How inclusive should your rules be? How should you convey the nature of these rules to programmers? These are all good questions, and while I don’t pretend to have a perfect answer (in fact, I doubt there is one), I’ll offer some advice based on my own experiences.

If you’re unsure, start by being as specific as possible. If you’re just throwing together a function for a specific task, make that task as specific and singularly focused as you can, while still doing what you need. It may grow later on, and you may even expect it to, but if you’re not sure exactly how it will grow, don’t plan too much for that growth right away. At least, not for your rule. Start simple and refine it as needs arise. However, don’t mistake this for an excuse to make one function do everything, without regard for other design principles. Allow your function to accommodate more situations later on, if need be, without writing code for those cases right away. To be a bit more specific:
If a function returns something, start by assuming it will always return something useful. You’re not likely to write a function that never gets erroneous input, and never experiences any other difficulties, but it’s often useful to start by pretending that it will always be successful. If a function has to look up a supplied value in a dictionary, just use standard syntax and let any KeyErrors go uncaught if the key isn’t valid. If you’re fetching an object from a cache, either return the cached object (if it’s still cached), or create a new one, cache it, and return that new one. You might find valid reasons to return None at some point, but don’t plan for it unless you already know of a situation where it will be useful.
If the function returns a list (or other iterable), always return a list (or other iterable), unless something actually went wrong. It’s quite common to see functions like find_all_in_document(document, word), where (as Cedric rightly pointed out) it’s quite likely that the supplied word simply isn’t present in the given document. This isn’t an exception. The rule specifies that it return all instances of that word. In set theory, both “any” and “all” of something that doesn’t exist are called the empty set. In Python terms, this would be represented by an empty list (or other iterable), not None. By returning an empty list (or other iterable), the function lets the calling code perform operations on the result directly, without worrying about whether it’s empty, unless an empty list (or other iterable) means something special. For instance, print(len(find_all_in_document(document, 'spam'))) would simply print 0 if the word wasn’t present. The body of a loop such as for word in find_all_in_document(document, 'spam'): would never execute if the word wasn’t found. This is a very simple way to make a lot of code a lot simpler, without raising exceptions. Of course, if something really did go wrong, that’s probably outside the scope of your rule anyway, and should merit an exception.
If you do return None for a good reason, try to make that function the alternative, not the default, and make the default raise an exception instead. This one’s considerably more subjective, but I think it’s still good advice. Remember the example of the dictionary. The “standard” (most-used, most-documented) tactic is direct access, x[i], which will raise an exception if the key isn’t present. The None-returning variation, x.get(i), is the alternative, available if necessary, but not used by default. This separation and priority helps make sure that programmers make a conscious decision to deal with a function that returns None, rather than it being an unexpected side-effect of a function that didn’t stipulate that in its rules.
Document your rules wherever possible. Since programmers will have to write their own code based on what your code is expected to do, it’s best for everyone if you’re upfront about it. Name your functions as descriptively as you can: get_article_from_cache, find_in_document, list_all_users, whatever. The nouns in those examples can usually be removed if they’re instance methods of a class that’s named appropriately (Article.get_from_cache, Document.find, User.objects.list, etc.), but the idea remains the same. Also, use docstrings to explain what will happen, outlining the rule explicitly. Write it up in your program’s documentation, which should be distributed along with the program, as well as made available on the Web, if possible.
Keep rules consistent wherever possible. If you write multiple functions that behave similarly and are named similarly, try to keep the rules as similar as possible. If you write a base class with some methods, make sure that if any subclasses override those methods, you keep the rule the same, unless there’s some very obvious reason it should differ (and make sure that reason is indeed obvious; again, through documentation). This is especially true when subclassing or emulating built-in types, because programmers will likely have assumptions about a method’s rule ingrained in them from long before they use your new class. Breaking those assumptions should be done rarely, always with good reason, and always with great care. Again, document it if you have to do this.
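
Several of these suggestions can be sketched together. What follows is only one possible reading of the advice, not a definitive implementation. The function names find_all_in_document and get_article_from_cache come from the text above, but their bodies, the get_article_from_cache_or_none variant, and the _ARTICLE_CACHE dictionary are assumptions invented for illustration.

```python
def find_all_in_document(document, word):
    """Return the starting position of every occurrence of word in document.

    Always returns a list; the list is empty when the word is absent.
    """
    positions = []
    start = document.find(word)
    while start != -1:
        positions.append(start)
        start = document.find(word, start + 1)
    return positions


# Hypothetical cache contents, used only for illustration.
_ARTICLE_CACHE = {"intro": "Except the Unexpected"}


def get_article_from_cache(slug):
    """Return the cached article for slug.

    Raises KeyError if nothing is cached under that slug.
    """
    return _ARTICLE_CACHE[slug]


def get_article_from_cache_or_none(slug):
    """Return the cached article for slug, or None if it isn't cached."""
    return _ARTICLE_CACHE.get(slug)


# Callers can use the result directly, whether or not anything was found.
spam_count = len(find_all_in_document("eggs and bacon", "spam"))
for position in find_all_in_document("eggs and bacon", "spam"):
    print(position)  # the loop body never runs: the list is empty
```

The default function raises, the alternative spells out in its name that None is possible, and each docstring states the rule, so a caller has to opt in to handling None.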

Conclusion (finally!)

As with anything else, I won’t pretend to be perfect in this regard myself. I came from a PHP background, and I wrote some of my Python code before I fully understood some of these philosophical concepts. But once you know, always try to be a better programmer, and this is one way I feel we could all be better programmers.