First Steps with Python Type System

[s01e01]

Paweł Święcki
Daftcode Blog

Illustration by Magdalena Tomczyk

What is it all about?

Over the last few years, type annotation syntax and semantics have been gradually introduced into the Python language. Typing in Python is still quite a new and often misunderstood subject. In this post, I will introduce its basics; some more advanced features will be covered in the follow-up to this text.

The post is based on PEP 483: The Theory of Type Hints, PEP 484: Type Hints, the Python typing docs, the mypy docs, mypy GitHub issues, and my personal experience of working with typing in real-life code. I’m using Python 3.6 and mypy 0.620.

1. Type vs. Class

To get a good grasp of Python’s type system, we need to distinguish types from classes. Before PEP 483 and PEP 484 were published, there was some confusion regarding these two concepts. Now, generally speaking, a type is a type checker concept and a class is a runtime concept. This basic characterization already gives us the crucial information: types do not belong to the “realm” of runtime. Instead, they live on another “layer” of a program, a layer meant for the type checker.

But what is a type checker? It’s a tool that analyzes our code (similar to flake8, but way smarter). It doesn’t run our code in any way; it statically checks the code for type consistency. The Python community’s official type checker is mypy, and I will use it here. There are also Facebook’s pyre-check and Google’s pytype.

1.1. How to define a type?

A type checker needs information about types to check our program. There are three basic ways to define a type:

  1. by defining a class,
  2. by specifying functions that work with variables of a type,
  3. by using more basic types to create more complex ones.

In the first case, the statement class Animal: ... defines the Animal class and the Animal type at the same time. This also works for built-ins: int, float, str, list, dict, etc. are both classes and types. In this case, inheritance relationships between classes map one-to-one onto subtyping relationships: if Dog is a subclass of Animal, then Dog is a subtype of Animal, and so on. This approach to typing is called “nominal subtyping”. In a minute, I will show how it works in the context of type checking.

The second case is in the spirit of duck typing: we define a type by specifying which functions/methods work with variables of this type. E.g. if an object has a __len__ method, then it has the Sized type. This approach to typing is called “structural subtyping”. It’s a topic in itself, and it will be covered in this blog post only marginally.

In the third case, we use previously defined types (defined in whichever way) to build more complex types. E.g. we can define the following type: “a list that contains only integers or strings”. I will introduce those kinds of types later on.
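The three ways can be sketched side by side. This is only a minimal illustration: the Box class and the describe function are hypothetical names introduced here, and the List and Union types it uses are explained later in the post.

```python
from typing import List, Sized, Union


# 1. Defining a class also defines a type of the same name (nominal subtyping).
class Animal:
    ...


class Dog(Animal):  # Dog is a subclass, and therefore a subtype, of Animal
    ...


# 2. Any object with a __len__ method has the Sized type (structural subtyping).
class Box:  # hypothetical class; note that it does not inherit from Sized
    def __len__(self) -> int:
        return 3


def describe(thing: Sized) -> str:  # hypothetical helper
    return f"{len(thing)} item(s)"


# 3. More basic types combined into a more complex one.
mixed: List[Union[int, str]] = [1, 2, "3"]

print(describe(Box()))
print(describe(mixed))
```

Box never mentions Sized, yet a Box instance can be passed to describe, because only the presence of __len__ matters.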

2. Type annotation syntax

To annotate our code with type information we need a special syntax. This syntax was gradually introduced to the language, but I will focus on its current (and most likely final) state.

2.1. Annotating variables

To annotate a variable, we use its name followed by a colon and the name of a type. Initializing the variable is optional:

name: Type

Type annotation together with initialization of the variable, not surprisingly, looks like this:

name: Type = initial_value

So from now on, mypy knows that, in this scope, name should be of type Type, and it will check that this is indeed the case. In fact, the first check is performed at the assignment itself: does initial_value fit Type?

If a variable is declared but not initialized, we cannot use it (a NameError would be raised). Later on, once we assign a value to it, mypy will check the type of that value against the declared type of the variable.

Let’s see what it looks like. For convenience, I will put all mypy errors in comments (and wrap them, if necessary).
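A minimal sketch of annotated variables follows. The variable names are made up for illustration, and the comments show roughly what mypy reports. The code still runs, because annotations are not enforced at runtime.

```python
name: str = "Scooby"   # OK: the initial value is a str
age: int = 7           # OK

height: float          # declared, but not initialized yet
height = 0.6           # OK: the assigned value matches the declared type

nickname: str = 42
# error: Incompatible types in assignment
#        (expression has type "int", variable has type "str")

age = "seven"
# error: Incompatible types in assignment
#        (expression has type "str", variable has type "int")
```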

So even in these simple cases, mypy is already useful 🙌

2.2. Annotating functions

We can also annotate the types of a function’s parameters and the type of its return value. The following syntax is used:

def function(param1: Type1, param2: Type2) -> ReturnType:
    ...

Let’s take a look:
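Here is a minimal sketch with two annotated functions. The name add is an assumption made for this example, while broken_add is the function discussed in the text. The comment shows roughly what mypy reports.

```python
def add(a: int, b: int) -> int:
    return a + b  # OK: int + int gives an int


def broken_add(a: int, b: int) -> str:
    return a + b
    # error: Incompatible return value type (got "int", expected "str")


print(add(1, 2))
print(broken_add(1, 2))  # still runs: annotations are not checked at runtime
```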

In the second case, mypy knows that broken_add declares the wrong return type by checking the return type of the + operator (or, in fact, of the __add__ method) when it’s used on two ints: the result is an int, not a str, so the return type of the function is declared incorrectly.

3. Subtyping

Before we start playing with all of Python’s typing goodies, we need to better understand the basic subtyping relationship.

Let’s take a look at those two classes:

class Animal:
    ...

class Dog(Animal):
    ...

Basically, a subtype is a less general type. In our case, Dog is less general than Animal, so it's a subtype of Animal. But let’s dive a bit deeper and see how the subtyping relation is defined in Python. This definition determines the assignment rules and the attribute-usage rules that mypy enforces on the code.

3.1. Definition

Let <: mean "is a subtype of". (So B <: A reads “B is a subtype of A”.)

Now, B <: A if and only if:

  1. every value of type B is also in the set of values of type A; and
  2. every function of type A is also in the set of functions of type B.

(“Function of type A” basically means “function accepting objects of type A as its argument”. So it can be either a stand-alone function with a parameter of type A or a method defined on class A.)

So the set of values becomes smaller in the process of subtyping, while the set of functions becomes larger (see docs).

In case of our two types, Dog <: Animal means that:

  1. Set of Dogs is a subset of Animals (every Dog is an Animal, but not every Animal is a Dog). That basically means there are fewer Dogs than Animals.
  2. Set of functions of Animal is a subset of functions of Dog (Dog can do whatever Animal can, but Animal cannot do everything Dog can). Basically, Animals can do less than Dogs.

3.2. Assignment rules

This definition determines which assignment is acceptable and which isn’t. Let’s try to assign a variable of one type to a variable of the other type.
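A minimal sketch of both assignments, using the variable names scooby and an_animal from the discussion. The unsafe assignment is placed first here so that each variable still has exactly its declared type when mypy checks the line. The error comment shows roughly what mypy reports.

```python
class Animal:
    ...


class Dog(Animal):
    ...


an_animal: Animal = Animal()
scooby: Dog = Dog()

scooby = an_animal
# error: Incompatible types in assignment
#        (expression has type "Animal", variable has type "Dog")

an_animal = scooby  # OK: a variable of type Dog always holds an Animal
```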

Assigning scooby to an_animal is type-safe because scooby is guaranteed to be an Animal.

Assigning an_animal to scooby is not type-safe because an_animal might not be a Dog.

Checking inheritance relationships is part of the nominal subtyping approach.

3.3. Attribute rules

Mypy keeps an eye not only on assignments but also on attribute usage. More precisely, it checks whether an attribute is actually defined on an object. Let’s see it in action.

class Animal:
    def eat(self): ...

class Dog(Animal):
    def bark(self): ...

Now an Animal can eat, and a Dog can both eat (by inheritance) and bark.
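The attribute check can be sketched as follows. The classes are repeated so that the snippet is self-contained, the return values are made up for illustration, and the try/except is there only so that the script keeps running past the failing call.

```python
class Animal:
    def eat(self) -> str:
        return "eating"


class Dog(Animal):
    def bark(self) -> str:
        return "woof"


an_animal: Animal = Animal()
scooby: Dog = Dog()

an_animal.eat()  # OK
scooby.eat()     # OK: eat is inherited from Animal
scooby.bark()    # OK

try:
    an_animal.bark()
    # error: "Animal" has no attribute "bark"
except AttributeError:
    print("an_animal cannot bark")  # the same mistake, found only at runtime
```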

Mypy makes sure that methods are indeed defined on the objects in question. an_animal does not have a bark method defined, so an error is reported.

Checking attributes, especially methods, is a part of the structural subtyping approach. Within this approach “the subtype relation is deduced from the declared methods” [source].

4. Defining complex types

Let’s see how we can use more basic types to create more complex types. This is the third way to define a type in Python. I will focus on a few typical complex types; the rest work basically the same way.

4.1. List

One of the most basic of them is List. It is spelled the same as the built-in list, except for the capital L. The syntax is as follows: List[TypeOfElements]. So a list of integers is List[int], a list of strings is List[str], and so on. Let’s look at the code:
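A minimal sketch of List annotations follows; the variable names are made up for illustration, and the comments show roughly what mypy reports.

```python
from typing import List

numbers: List[int] = [1, 2, 3]            # OK
numbers.append(4)                         # OK

names: List[str] = ["Scooby", "Shaggy"]   # OK

broken: List[int] = [1, 2, "3"]
# error: List item 2 has incompatible type "str"; expected "int"

numbers.append("5")
# error: Argument 1 to "append" of "list" has incompatible type "str";
#        expected "int"
```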

This makes sense, but we all know that in a Python list we can put items of multiple types: [1, 2, '3'] is still a valid list. In a minute we will see how to express types like “integer or string”. But first, let’s take a look at tuple and dict types.

4.2. Tuple

In Python, a tuple traditionally serves two purposes. First, it’s an “immutable list”. Second, it’s a “record” or “row of values”, where the value at each position usually has a specifically defined type; think of it as a row in an SQL database. The Tuple type (with a capital T) supports both approaches.

To define a tuple-as-record, use the following syntax: Tuple[Type1, Type2, Type3] (etc.).

To define a tuple-as-immutable-list, use Tuple with the ellipsis object (spelled with three dots: ...): Tuple[TypeOfAllElements, ...].
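Both flavours can be sketched like this. The variable names and values are made up for illustration, and the error comment shows roughly what mypy reports.

```python
from typing import Tuple

# Tuple-as-record: a fixed number of positions, each with its own type.
person: Tuple[str, int, float] = ("Velma", 21, 1.63)   # OK

wrong_record: Tuple[str, int, float] = ("Velma", 21)
# error: Incompatible types in assignment (expression has type
#        "Tuple[str, int]", variable has type "Tuple[str, int, float]")

# Tuple-as-immutable-list: any number of elements, all of the same type.
scores: Tuple[int, ...] = (1, 2, 3)   # OK
no_scores: Tuple[int, ...] = ()       # OK: an empty tuple fits as well
```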

4.3. NamedTuple

In Python there is also namedtuple. Even without typing involved, it’s a very handy extension of tuple. It adds field-name look-up and a nice string representation:
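A minimal sketch of a plain namedtuple; the Point name and its fields are made up for illustration.

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])

origin = Point(x=0, y=0)

print(origin.x)    # field-name look-up
print(origin[1])   # positional access still works, as with any tuple
print(origin)      # prints: Point(x=0, y=0)
```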

In Python 3, namedtuple has a younger, typed sibling: NamedTuple (again, spelled with a capital N and T). At runtime, it has exactly the same API as namedtuple, but additionally it supports type annotations:
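A minimal sketch of the typed version, using the class-based syntax available since Python 3.6. The Point name and its fields are made up for illustration, and the error comment shows roughly what mypy reports.

```python
from typing import NamedTuple


class Point(NamedTuple):
    x: int
    y: int


point = Point(x=1, y=2)

print(point.x)   # field-name look-up, now with a known type (int)
print(point)     # prints: Point(x=1, y=2)
x, y = point     # unpacking works as with any tuple

bad_point = Point(x="1", y=2)
# error: Argument "x" to "Point" has incompatible type "str"; expected "int"
```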

Nice, readable and handy! 😎

As a side note: if you know NamedTuple, you basically already know Python 3.7’s dataclass, which is a kind of NamedTuple on steroids (see docs).

4.4. Dict

Another essential Python type is dict. Its type is defined similarly to Tuple: Dict[KeyType, ValueType]. So a dict mapping integer keys to string values would be Dict[int, str].

There are also types for Python’s other collections: Set, FrozenSet, DefaultDict, Counter, Deque, and many others. For the full list see the docs.
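A minimal sketch of a Dict annotation; the id_to_name variable is made up for illustration, and the comments show roughly what mypy reports.

```python
from typing import Dict

id_to_name: Dict[int, str] = {1: "Scooby", 2: "Shaggy"}   # OK
id_to_name[3] = "Velma"                                   # OK

id_to_name["4"] = "Fred"
# error: Invalid index type "str" for "Dict[int, str]"; expected type "int"

id_to_name[5] = 5
# error: Incompatible types in assignment
#        (expression has type "int", target has type "str")
```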

Now let’s focus on another way to create complex types.

4.5. Union

Suppose we have a variable that can be of type str or of type int, depending on the situation (like a different data source). To define such a type we use Union. In our case, it would be Union[str, int]. In the square brackets we can put as many types as we want: Union[Type1, Type2, Type3, Type4] (etc.). Formally:

Union[t1, t2, ...]. Types that are subtype of at least one of t1 etc. are subtypes of this. [source]

Some examples:
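A minimal sketch of Union in a few positions. All names, including the normalize function, are made up for illustration, and the error comment shows roughly what mypy reports.

```python
from typing import List, Union

numeric_id: Union[str, int] = 42     # OK: an int is allowed
textual_id: Union[str, int] = "42"   # OK: a str is allowed

wrong_id: Union[str, int] = 4.2
# error: Incompatible types in assignment (expression has type "float",
#        variable has type "Union[str, int]")

# Union also works as an element type, e.g. for a list of ints or strs.
mixed: List[Union[int, str]] = [1, 2, "3"]   # OK


def normalize(value: Union[str, int]) -> int:
    return int(value)   # int() accepts both a str and an int


print(normalize("7") + normalize(3))
```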

4.6. None type and Optional type

A common programming pattern is to use one variable for either a concrete value or something symbolizing no value (when the value is missing, corrupted, not yet available, inadequate in the current context, etc.). In Python, None is the object most commonly used to indicate that there is no value. The type of None is NoneType, but in a typing context there is an alias for it, which is… None itself. The alias is very useful since it does not involve importing anything. Therefore the most natural way to express a value-of-type-T-or-no-value type would be Union[T, None]. So int-or-nothing would be Union[int, None].

This something-or-nothing pattern is so common that in Python’s typing system Union[T, None] has an alias: Optional[T]. E.g. to express Union[int, None] use Optional[int].

Forgetting about “optionality” of variables quite often causes bugs. Mypy can really help us out here 💪
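A minimal sketch of such a bug and its fix. The AGES dict and the three functions are hypothetical names introduced for this example, and the exact wording of the mypy error depends on the mypy version.

```python
from typing import Dict, Optional

AGES: Dict[str, int] = {"Scooby": 7}


def find_age(name: str) -> Optional[int]:
    return AGES.get(name)   # dict.get returns None for a missing key


def age_next_year_broken(name: str) -> int:
    return find_age(name) + 1
    # mypy reports an "Unsupported operand types for +" error here,
    # because find_age may return None


def age_next_year(name: str) -> Optional[int]:
    age = find_age(name)
    if age is None:
        return None
    return age + 1   # OK: after the check mypy knows that age is an int


print(age_next_year("Scooby"))
print(age_next_year("Shaggy"))
```

The broken version works for known names but raises a TypeError at runtime for a missing one; mypy points at the problem before the code is ever run.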

So, now you are acquainted with the basics of Python’s typing system. Want to know more? Check out my next blog post about more advanced features of Python’s typing, along with some useful tips.
