CSI: Python Type System, episode 2

[s02e02] Dealing With the Contravariance Related Bug

Paweł Święcki
Daftcode Blog


Illustration by Justyna Mieleszko

This is the second episode of the CSI: Python Type System series. The first episode can be found here.

In the first episode, we got to the bottom of the error reported by mypy: we understood exactly what was wrong with the initial code and why it wasn’t type-safe. Now, we need to do something about it.

The goal of this episode is not to give the ultimate solution to the problem, but to approach it from different perspectives and provide some (fairly simple) suggestions. Choosing and implementing the right one depends on the specific use case.

Strategy 0: Ignoring the Error and Just Moving On

a “do nothing” strategy

You can do that. It may lead to bugs. We are all adults here, and I’m not stopping you 😜 If you are sure the code won’t be exploited, just use `# type: ignore` and move on.

But what if we really wanted to make the code type-safe? Before I demonstrate some ideas for doing so, I will extend the code with a Human animal, which eats only Chocolate (is it paradise or hell? 🤔):
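The embedded snippet is not reproduced here, so the following is only a minimal sketch of what the extended code might look like. The Food, Meat, Chocolate, Animal and Dog definitions are assumptions reconstructed from the description of the first episode, and the `feed` helper is hypothetical.

```python
class Food:
    """Base class for everything edible."""


class Meat(Food):
    pass


class Chocolate(Food):
    pass


class Animal:
    def eat(self, food: Food) -> str:
        return f"{type(self).__name__} eats {type(food).__name__}"


class Dog(Animal):
    # Assumed source of the mypy error from the first episode:
    # the parameter type is narrowed from Food to Meat.
    def eat(self, food: Meat) -> str:
        return f"Dog eats {type(food).__name__}"


class Human(Animal):
    # Same narrowing, this time from Food to Chocolate.
    def eat(self, food: Chocolate) -> str:
        return f"Human eats {type(food).__name__}"


def feed(animal: Animal, food: Food) -> str:
    # Hypothetical helper: this call type-checks, because every
    # Animal is declared to accept any Food.
    return animal.eat(food)


# Nothing stops a Dog from being handed Chocolate through `feed`.
result = feed(Dog(), Chocolate())
```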

Now, the “chocolate exploit” is real.

Strategy 1: Using `isinstance` Checks

an awful anti-pattern strategy

How about adding a series of isinstance checks inside the Animal.eat() method? They would delegate “eating” to the eat method of an appropriate class (e.g. if food is a Meat instance, Dog.eat() would be called, etc.) or handle eating by itself if nothing matches. Something like this:

Let me say this once and for all: this code is hacky, unpythonic and just awful. It’s an anti-pattern that adds some kind of method resolution algorithm on top of Python’s built-in one. Having this in a codebase would be hell in terms of readability and maintenance. The fact that it wouldn’t properly fix our code (just like the next strategy, see below) is the least of our problems. Just don’t do it, please 😐
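The original snippet is missing here, so this is a guess at its shape rather than the author's code: a rough sketch in which Animal.eat() inspects the runtime type of `food` and calls another class's eat method by hand. The return strings are made up for illustration.

```python
class Food: ...
class Meat(Food): ...
class Chocolate(Food): ...


class Animal:
    def eat(self, food: Food) -> str:
        # Hand-rolled dispatch on the runtime type of `food`.
        if isinstance(food, Meat):
            return Dog.eat(self, food)
        if isinstance(food, Chocolate):
            return Human.eat(self, food)
        # Nothing matched: handle eating here.
        return f"generic eating of {type(food).__name__}"


class Dog(Animal):
    def eat(self, food: Meat) -> str:
        return f"dog-style eating of {type(food).__name__}"


class Human(Animal):
    def eat(self, food: Chocolate) -> str:
        return f"human-style eating of {type(food).__name__}"


delegated = Animal().eat(Meat())
# Dog.eat overrides Animal.eat, so the checks are never even reached here.
still_exploitable = Dog().eat(Chocolate())
```

In this sketch the checks do nothing for a Dog instance, since the overriding Dog.eat() is called directly.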

Strategy 2: Adding an Abstract Base Class

a limited strategy

This strategy uses the abstract base class (ABC) pattern, implemented with the abc module from the standard library. Begin by turning the Animal class into an abstract base class: BaseAnimal. Then leave the BaseAnimal.eat() method untyped (it’s safe, see below). Finally, put the rest of the things common to all animals into BaseAnimal.

I assume we want to instantiate animals other than Dog and Human as well. In this case, we need an additional class, e.g. OtherAnimal. The food parameter of its eat() method is annotated as Food.

The code would look like this:
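The embedded code is not available, so here is a minimal sketch under the assumptions stated in the text: BaseAnimal is abstract with an unannotated eat(), and Dog, Human and OtherAnimal are the concrete classes. The `describe` method is a hypothetical stand-in for "the rest of the things common to all animals".

```python
from abc import ABC, abstractmethod


class Food: ...
class Meat(Food): ...
class Chocolate(Food): ...


class BaseAnimal(ABC):
    @abstractmethod
    def eat(self, food):  # deliberately left unannotated
        ...

    def describe(self) -> str:  # hypothetical shared behaviour
        return f"I am a {type(self).__name__}"


class Dog(BaseAnimal):
    def eat(self, food: Meat) -> str:
        return f"Dog eats {type(food).__name__}"


class Human(BaseAnimal):
    def eat(self, food: Chocolate) -> str:
        return f"Human eats {type(food).__name__}"


class OtherAnimal(BaseAnimal):
    def eat(self, food: Food) -> str:
        return f"OtherAnimal eats {type(food).__name__}"


try:
    BaseAnimal()  # abstract classes cannot be instantiated
except TypeError:
    base_is_abstract = True
else:
    base_is_abstract = False
```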

You can also implement the abstract base class pattern “manually” by adding `raise NotImplementedError` in `BaseAnimal.eat()`.

BaseAnimal.eat() is left unannotated. This is basically the same as annotating it with Any. It’s done to silence mypy. Is it safe? Yes, this is fine in this case — it’s not a class that can be instantiated, so BaseAnimal.eat() will never be called. A food of a wrong type won’t be passed to it, then.

Now, instantiating a generic BaseAnimal is not possible. Thus, it’s impossible to feed the wrong food by passing a Dog (or a Human) instance to a function where an instance of “other animal” (i.e. neither a dog nor a human) is expected. Now, it’s the OtherAnimal class that supports those “other animals”. The type of eat’s food parameter is Food, but nothing inherits from this class, so the original issue is not repeated at a lower level.

Unfortunately, the chocolate exploit is still possible. We can still define a function that expects a BaseAnimal instance, that is, an instance of one of the concrete classes inheriting from it:
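A minimal sketch of such a function, assuming the ABC layout described above. `feed_chocolate` is a hypothetical name, and the claim about mypy assumes its default configuration.

```python
from abc import ABC, abstractmethod


class Food: ...
class Meat(Food): ...
class Chocolate(Food): ...


class BaseAnimal(ABC):
    @abstractmethod
    def eat(self, food):  # unannotated, so `food` is effectively Any
        ...


class Dog(BaseAnimal):
    def eat(self, food: Meat) -> str:
        return f"Dog eats {type(food).__name__}"


def feed_chocolate(animal: BaseAnimal) -> str:
    # Nothing to complain about for the checker: BaseAnimal.eat takes anything.
    return animal.eat(Chocolate())


exploit = feed_chocolate(Dog())
```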

So now, using a forbidden Food subtype is possible. Sadly, using completely unrelated types is possible as well. That’s because the food parameter of BaseAnimal.eat is unannotated, so its type is Any:

Also, with this approach we can create a BaseAnimal subtype whose eat method takes a food parameter of any type at all, e.g.:
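As a rough illustration (the `feed_anything` helper and the values passed are made up), an unannotated parameter lets any object through:

```python
from abc import ABC, abstractmethod


class Food: ...
class Meat(Food): ...


class BaseAnimal(ABC):
    @abstractmethod
    def eat(self, food):  # no annotation: treated as Any
        ...


class Dog(BaseAnimal):
    def eat(self, food: Meat) -> str:
        return f"Dog eats {type(food).__name__}"


def feed_anything(animal: BaseAnimal, thing) -> str:
    # `thing` does not have to be a Food at all.
    return animal.eat(thing)


number_meal = feed_anything(Dog(), 42)
text_meal = feed_anything(Dog(), "not food")
```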
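A sketch of such a subtype, assuming a Railroad class that has nothing to do with Food:

```python
from abc import ABC, abstractmethod


class Railroad:
    """Deliberately not a Food subclass."""


class BaseAnimal(ABC):
    @abstractmethod
    def eat(self, food):  # unannotated in the base class
        ...


class Monkey(BaseAnimal):
    # Any parameter type is compatible with the unannotated base method,
    # so a checker has no reason to object to this override.
    def eat(self, food: Railroad) -> str:
        return f"Monkey eats {type(food).__name__}"


meal = Monkey().eat(Railroad())
```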

Now, monkeys can eat only instances of Railroad, which is not a Food subtype. Believe me, they won’t be happy about it! 🙊

Therefore, this solution is limited. To make the code type-safe, we additionally need to:

  1. remember not to annotate anything with BaseAnimal, that is: do not make any function expect a BaseAnimal instance;
  2. keep track of the proper type of eat’s food parameters in all new BaseAnimal subtypes — mypy won’t do that for us;
  3. remember not to subclass OtherAnimal class (or, if you really want to, don’t make its instances eat subtypes of Food).

Thanks to Paweł Stiasny and Bartosz Stalewski for helping me better understand the pitfalls of this strategy.

Strategy 3: Eliminating One of the Class Hierarchies

a strategy changing an initial assumption

So far, I have not challenged the following assumption behind the original code: it’s suitable to use class hierarchies to model both animals and food. It’s true that the assumption was related to the original issue. Yet, I think the flaw was not in the assumption, but in the code that misused both hierarchies by incorrectly combining them. Strategy 4 and Strategy 5, which avoid the problem while keeping both hierarchies, will, in my opinion, show that.

Now, I will give up implementing animals as a class hierarchy. Dog and Human can be implemented like this:
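A minimal sketch of the two standalone classes, with the Food hierarchy assumed to be unchanged:

```python
class Food: ...
class Meat(Food): ...
class Chocolate(Food): ...


class Dog:  # no Animal base class any more
    def eat(self, food: Meat) -> str:
        return f"Dog eats {type(food).__name__}"


class Human:  # unrelated to Dog
    def eat(self, food: Chocolate) -> str:
        return f"Human eats {type(food).__name__}"


dog_meal = Dog().eat(Meat())
human_meal = Human().eat(Chocolate())
```

With no shared base class, there is no override for a checker to complain about, and no base-typed parameter through which a Dog could be handed Chocolate.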

The infamous “chocolate exploit” just isn’t possible now. Right, but we just lost all connection between them. To restore it, we can do at least two things.

First, in the typing realm, we can reconnect both types using a Union type. With the type defined this way, we can properly annotate all the places where an animal is expected:
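The original snippet is missing; the sketch below is one plausible shape for it. The `feed_chocolate` function is hypothetical and contains the marked line. It runs fine, but a checker should reject the call for the Dog member of the union.

```python
from typing import Union


class Food: ...
class Meat(Food): ...
class Chocolate(Food): ...


class Dog:
    def eat(self, food: Meat) -> str:
        return f"Dog eats {type(food).__name__}"


class Human:
    def eat(self, food: Chocolate) -> str:
        return f"Human eats {type(food).__name__}"


# A typing-only alias; it creates no runtime relationship between the classes.
Animal = Union[Dog, Human]


def feed_chocolate(animal: Animal) -> str:
    return animal.eat(Chocolate())  # <- marked line


# Only a Human is fed here, so no dog is harmed at runtime.
safe_meal = feed_chocolate(Human())
```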

Now, mypy reports the following error for the marked line:

error: Argument 1 to "eat" of "Dog" has incompatible type "Chocolate"; expected "Meat"

Great! Feeding a dog with chocolate is now impossible. Our main goal is achieved — Lassie is saved!

Safe and happy Lassie. [source]

Second, in the runtime class realm, we can reconnect both classes using mixin classes. I will add one: CanEatMixin. It will have a generic eat method. The purpose of this mixin is not typing-related, and it doesn’t define a typing protocol either. It’s created just to provide a generic eat implementation. To stress that, I’m explicitly adding super calls in both inheriting classes:
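A minimal sketch of the mixin, with one assumption worth flagging: the mixin's eat() is left unannotated here so that the narrower overrides do not conflict with it. The original code may have handled this differently.

```python
class Food: ...
class Meat(Food): ...
class Chocolate(Food): ...


class CanEatMixin:
    # Generic implementation only; not meant to appear in annotations.
    def eat(self, food):  # unannotated by assumption
        return f"{type(self).__name__} eats {type(food).__name__}"


class Dog(CanEatMixin):
    def eat(self, food: Meat) -> str:
        # Explicitly reuse the generic implementation.
        return super().eat(food)


class Human(CanEatMixin):
    def eat(self, food: Chocolate) -> str:
        return super().eat(food)


dog_meal = Dog().eat(Meat())
human_meal = Human().eat(Chocolate())
```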

Generally, adding a super call should be done by default. So far I focused on types, so I did not do that explicitly.

Let me rephrase: CanEatMixin is only a provider of a generic eat implementation and it should not be used in type annotations. If it were, we would be back to square one. The Animal union is the type to be used in type annotations. It doesn’t affect the runtime, though.

Downsides of this approach:

  1. A Union type is made up of a flat list of types. Mixin classes cannot (or rather, should not) create hierarchies either. So, we really do need to give up any proper animal hierarchy.
  2. This strategy somewhat separates the Animal type from the classes. This is not very clean and might get even messier along the way. Also, it makes us manually update the Animal type whenever we implement a new animal class.
  3. Just like in the strategy of using ABC, we need to make sure to annotate all eat’s food parameters with a proper food type, so monkeys won’t be forced to eat railroad 😬

Thanks to Paweł Stiasny for suggesting to use Union instead of inheritance.

In both strategies discussed below, I restore the original assumption of implementing animals and food as class hierarchies.

Strategy 4: Tying Relations Between Hierarchies

a simple strategy introducing a more significant change

Another way to deal with our problem is to strictly fix relations between elements of both class hierarchies. For instance, we can tie every animal class with the most fitting food class. It can be done in many different ways. One of the simpler ones is to use class attributes. Here, I define food_cls on every animal class:
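A minimal sketch of this idea, with made-up return strings. The food is created inside eat() from the class stored in `food_cls`, so callers never pass food in.

```python
class Food: ...
class Meat(Food): ...
class Chocolate(Food): ...


class Animal:
    food_cls = Food  # the most fitting food for a generic animal

    def eat(self) -> str:
        food = self.food_cls()  # instantiated here, not passed in
        return f"{type(self).__name__} eats {type(food).__name__}"


class Dog(Animal):
    food_cls = Meat


class Human(Animal):
    food_cls = Chocolate


dog_meal = Dog().eat()
human_meal = Human().eat()
```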

Using this code, we just cannot pass the wrong type of food to the eat method, because food is not passed as eat’s argument anymore. It’s instantiated inside the eat method from the Food class stored as a class attribute on the animal class.

This approach is simple and effective, but it has downsides as well:

  1. We cannot control Food creation outside of the Animal class anymore. A partial (but possibly not very clean) solution would be to pass all necessary Food.__init__ arguments via the eat method.
  2. Implementing this strategy will make you change all of the code that used Animal.eat methods.
  3. The code seems less flexible in terms of what an animal can eat. As a partial fix, we can rely on inheritance of class attributes (e.g., if food_cls was not defined on Human, the one from Animal would be used).

Something Extra

Let’s say we have a Monkey(Animal) class. What if we assigned, in that class, something wrong to food_cls? Like this:

class Monkey(Animal):
    food_cls = Railroad

Mypy will accept it. However, with typing and mypy we can control that as well.

Normally we annotate stuff like this: my_food: Food. It means that my_food variable has type Food, i.e. it accepts only Food instances. On the other hand, we can tell mypy that a variable is to accept only classes themselves. We use a special Type type here. It’s used like this:

food_cls: Type[Food]

It means that food_cls may only accept Food class (not instance) and classes (not instances) inheriting from it. This is what it would look like in practice:
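A minimal sketch, assuming the annotation is placed on the Animal base class. The offending Monkey class is left commented out, since Railroad is not a Food subclass and a checker should reject the assignment.

```python
from typing import Type


class Food: ...
class Meat(Food): ...


class Railroad:
    """Not a Food subclass."""


class Animal:
    # Only the Food class itself or its subclasses may be assigned here.
    food_cls: Type[Food] = Food

    def eat(self) -> str:
        food = self.food_cls()
        return f"{type(self).__name__} eats {type(food).__name__}"


class Dog(Animal):
    food_cls = Meat  # fine: Meat is a subclass of Food


# class Monkey(Animal):
#     food_cls = Railroad  # rejected by the checker: not a Food subclass

dog_meal = Dog().eat()
```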

Nice! For more about the Type type, see the docs.

Strategy 5: Using Multiple Dispatch

even more radical and possibly unpythonic strategy

Our Animal/Food code might seem a good fit for a restructuring of another kind: using the multiple dispatch pattern. The Python language does not natively support multiple dispatch. Fortunately, there is the Multiple Dispatch library. There is no time for explanations, let’s dive right in!

For multiple dispatch to work, we need to define an eat function with multiple implementations, selected by the types of the passed arguments. For the sake of variety, I’ve added a Chicken-eating LapDog:
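The original snippet is not available, so this is a sketch under a few assumptions: the third-party `multipledispatch` package is installed, Chicken is a subclass of Meat, LapDog is a subclass of Dog, and the functions return strings instead of printing so the result is easy to inspect. The explicit `namespace` dict is an addition that keeps these registrations separate from any other function named `eat`.

```python
from multipledispatch import dispatch


class Food: ...
class Meat(Food): ...
class Chicken(Meat): ...
class Chocolate(Food): ...


class Animal: ...
class Dog(Animal): ...
class LapDog(Dog): ...
class Human(Animal): ...


namespace = {}  # private registry for this example


@dispatch(Animal, Food, namespace=namespace)
def eat(animal, food):
    return "Animal eating Food"


@dispatch(Dog, Meat, namespace=namespace)
def eat(animal, food):
    return "Dog eating Meat"


@dispatch(LapDog, Chicken, namespace=namespace)
def eat(animal, food):
    return "LapDog eating Chicken"


@dispatch(Human, Chocolate, namespace=namespace)
def eat(animal, food):
    return "Human eating Chocolate"


# The most specific matching implementation is chosen for each call.
lapdog_chicken = eat(LapDog(), Chicken())
lapdog_meat = eat(LapDog(), Meat())  # falls back to the Dog/Meat version
human_chocolate = eat(Human(), Chocolate())
generic = eat(Animal(), Food())
```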

Unfortunately, currently there is no way to cleanly combine multiple dispatch with Python typing and mypy type checker. I hope this will be somehow possible in the future.

Now, when eat is called, the library chooses the most specific implementation of the function that fits the types of the passed arguments.

Everything works as expected. Now, after running eat(lassie, chocolate_bar) (which I won’t do for safety reasons 😅), we would have 'Animal eating Food' printed, as that would be the most specific eat implementation that fits types of the passed objects. We definitely don’t want that, since the food would be a chocolate bar. Now, to prevent feeding Lassie with chocolate, we just need to define this exact forbidden eat version, and raise an exception inside:
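A sketch of the forbidden combination, again assuming the `multipledispatch` package is installed. The choice of ValueError and the message text are assumptions, since the exception used in the original code is not shown.

```python
from multipledispatch import dispatch


class Food: ...
class Meat(Food): ...
class Chocolate(Food): ...


class Animal: ...
class Dog(Animal): ...


namespace = {}  # private registry for this example


@dispatch(Animal, Food, namespace=namespace)
def eat(animal, food):
    return "Animal eating Food"


@dispatch(Dog, Meat, namespace=namespace)
def eat(animal, food):
    return "Dog eating Meat"


@dispatch(Dog, Chocolate, namespace=namespace)
def eat(animal, food):
    # More specific than Animal/Food, so it wins for a Dog with Chocolate.
    raise ValueError("Dogs must not eat chocolate")


lassie, chocolate_bar = Dog(), Chocolate()
try:
    eat(lassie, chocolate_bar)
except ValueError as error:
    outcome = str(error)
else:
    outcome = "fed"
```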

Now, calling eat(lassie, chocolate_bar) raises that exception.

Lassie is saved, once more!

An alternative solution, depending on our needs, would be to remove the “Animal eating Food” implementation altogether. In that case, calling eat(lassie, chocolate_bar) would raise NotImplementedError.

Downsides of this solution:

  1. We need to know which combinations of parameter types we want to implement and which to explicitly exclude.
  2. We need to keep track of the types involved as well. We do not want to implement eat’s animal parameter with Monkey type and food parameter with Railroad type (or even the other way around 😵).
  3. Adopting this strategy will make you change all the code that used Animal.eat methods.

Downsides 1 and 2 are not a problem with two parameters of three or four alternative types each. With more and more parameters and types, keeping track of all the combinations becomes more and more troublesome.

This multiple dispatch might seem… odd, surprising, or even unpythonic? For sure, it begs for more comments, explanations, and examples. Do you want to know more about the multiple dispatch in Python? Let me know in the comments below! I might write a blog post about it 😎

Summary

On one hand, the discussed strategies are fairly simple. They do not use any complex patterns that introduce higher levels of abstraction. I think it just wasn’t needed in our case. Even the multiple dispatch strategy — while possibly being strange to some — does not, in fact, add any layers compared to the classical object-oriented approach.

On the other hand, even those simple strategies aren’t really quick fixes to the code. Rather, they are workarounds, better or worse (apart from Strategy 0). I think it’s quite clear now that in the original code both class hierarchies just did not work well together, in a fundamental way. Thus, it couldn’t be fixed quickly and easily. To avoid the “chocolate exploit” we needed to make more severe structural changes. They might even lead to deep refactorings in the surrounding codebase.

Now, which one is the best depends on the particular use case. Maybe you will find another one, even better in your case. Just remember to look at the bigger picture: how the code in question is being used and how it might evolve. Just don’t make the code worse while fixing it 😉

If you are interested in the subject, I recommend reading Eric Lippert’s Wizards and warriors blog post series (part one can be found here). It begins with a case similar to ours but tackles more and more complex examples with more and more complex solutions. I think the conclusion is particularly interesting. It uses C#, but it applies to Python, and other languages, as well.

So, are we finished? Is there anything left after dealing with the issue identified in s02e01? Yes, there’s more! 🎁 I’ve decided to make one more episode of CSI: Python Type System. In s02e01 I briefly mentioned the concept of contravariance. I think it will be helpful to define it more formally, along with the related concepts of covariance and invariance. Those definitions, as well as multiple accompanying examples, will improve our understanding of relationships between types. There you will also find the list of sources for the whole blog post series. Enjoy:
