First Steps with Python Type System
[s01e01]
What is it all about?
Over the last few years, type annotation syntax and semantics have gradually been introduced to the Python language. Typing in Python is still a fairly new and often misunderstood subject. In this post, I will introduce its basics, while some more advanced features will be covered in the follow-up to this text.
The post is based on PEP 483: The Theory of Type Hints, PEP 484: Type Hints, the Python typing docs, the mypy docs, mypy GitHub issues and my personal experience of working with typing in real-life code. I’m using Python 3.6 and mypy 0.620.
1. Type vs. Class
To get a good grasp of Python’s type system we need to distinguish types from classes. Before PEP 483 and PEP 484 were published, there was some confusion regarding those two concepts. Now, generally speaking, a type is a type checker concept and a class is a runtime concept. This basic characterization already gives us the crucial information: types do not belong to the “realm” of runtime. Instead, types live on another “layer” of a program, a layer meant for the type checker.
But what is a type checker? It’s a tool that analyses our code (similar to flake8, but way smarter). It doesn’t run our code in any way, but statically checks the code for type consistency. Python’s community official type checker is mypy and I will use it here. There are also Facebook’s pyre-check and Google’s pytype.
1.1. How to define a type?
A type checker needs information about types to check our program. There are three basic ways to define a type:
- by defining a class,
- by specifying functions that work with variables of a type,
- by using more basic types to create more complex ones.
In the first case, the statement class Animal: ... defines the Animal class and the Animal type at the same time. It also works for built-ins: int, float, str, list, dict (etc.), which are both classes and types. In this case inheritance relationships between classes are mapped one-to-one to subtyping relationships. So if Dog is a subclass of Animal then Dog is a subtype of Animal, etc. This approach to typing is called “nominal subtyping”. In a minute, I will show how it works in the context of type checking.
The second case is in the spirit of duck typing: we define a type by specifying which functions/methods work with variables of this type. E.g. if an object has a __len__ method then it has the Sized type. This approach to typing is called “structural subtyping”. It’s a topic in itself and it will be covered in this blogpost only marginally.
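To make the structural idea a bit more concrete, here is a minimal sketch. The Bucket class and the size_of function are made-up names used only for illustration. Bucket does not inherit from anything special, yet it fits the Sized type simply because it defines __len__.

```python
from typing import Sized


class Bucket:
    """Hypothetical class: no special base class, but it defines __len__."""

    def __init__(self, items: int) -> None:
        self.items = items

    def __len__(self) -> int:
        return self.items


def size_of(thing: Sized) -> int:
    # Works for anything that has __len__: lists, strings, Bucket, ...
    return len(thing)


bucket_size = size_of(Bucket(3))  # accepted by mypy: Bucket has __len__
list_size = size_of([1, 2])       # accepted: list has __len__ too
# size_of(42)                     # rejected by mypy: int has no __len__
```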
In the third case, we use earlier defined types (in whichever way) to define more complex types. E.g. we can define the following type: ‘a list that contains only instances of integers or strings’. I will introduce those kinds of types later on.
2. Type annotation syntax
To annotate our code with type information we need a special syntax. This syntax was gradually introduced to the language, but I will focus on its current (and most likely final) state.
2.1. Annotating variables
To annotate a variable we use its name followed by a colon and the name of a type. Initializing the variable is optional:
name: Type
Type annotation together with initialization of the variable, not surprisingly, looks like this:
name: Type = initial_value
So from now on, mypy knows that, in this scope, name should have the type Type, and it will check if that’s indeed the case. In fact, the first check is performed at the very assignment stage: does initial_value fit the type Type?
If a variable is not initialized we cannot use it (NameError would be raised), but later on, after we initialize it, mypy will compare the type of the value against the declared type of the variable.
Let’s see what it looks like. For convenience, I will put all mypy errors in comments (and wrap them, if necessary).
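A minimal sketch of such checks follows. The variable names are only illustrative, and the comments paraphrase what mypy reports rather than quoting it word for word. Note that Python itself runs this file without complaint, because annotations are not enforced at runtime.

```python
age: int = 42          # OK: 42 is an int
name: str              # declared only; using it now would raise NameError
name = "Scooby"        # OK: the value matches the declared type

count: int = "seven"   # mypy error: expression has type "str",
                       #             variable has type "int"
name = 7               # mypy error: expression has type "int",
                       #             variable has type "str"
```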
So even in these simple cases, mypy is already useful 🙌
2.2. Annotating functions
We can also annotate the types of a function’s parameters and the type of its return value. The following syntax is used:
def function(param1: Type1, param2: Type2) -> ReturnType:
    ...
Let’s take a look:
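Here is a small sketch with one correctly annotated function and one with a wrong return type. The name add is my own choice for the first function, and the error comment paraphrases mypy’s message.

```python
def add(a: int, b: int) -> int:
    return a + b       # OK: int + int gives an int


def broken_add(a: int, b: int) -> str:
    return a + b       # mypy error: incompatible return value type
                       #             (got "int", expected "str")


total = add(1, 2)
broken_total = broken_add(1, 2)  # still an int at runtime
```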
In the second case, mypy knows that broken_add has a wrong return type by checking the return type of the + operator (or the __add__ method, in fact) when it’s used on two ints: it’s an int, not a str, so the return type of the function is declared incorrectly.
3. Subtyping
Before we start playing with all of Python’s typing goodies, we need to better understand the basic subtyping relationship.
Let’s take a look at these two classes:
class Animal:
    ...
class Dog(Animal):
    ...
Basically, a subtype is a less general type. In our case Dog is less general than Animal, so it's a subtype of Animal. But let’s dive a bit deeper and see how the subtyping relation is defined in Python. This definition will determine the assignment rules and the attribute usage rules that mypy enforces on the code.
3.1. Definition
Let <: mean “is a subtype of”. (So B <: A reads “B is a subtype of A”.)
Now, B <: A if and only if:
- every value of type B is also in the set of values of type A; and
- every function of type A is also in the set of functions of type B.
(“Function of type A” basically means “function accepting objects of type A as its argument”. So it can be either a stand-alone function with a parameter of type A or a method defined on class A.)
So the set of values becomes smaller in the process of subtyping, while the set of functions becomes larger (see docs).
In the case of our two types, Dog <: Animal means that:
- The set of Dogs is a subset of Animals (every Dog is an Animal, but not every Animal is a Dog). That basically means there are fewer Dogs than Animals.
- The set of functions of Animal is a subset of the functions of Dog (a Dog can do whatever an Animal can, but an Animal cannot do everything a Dog can). Basically, Animals can do less than Dogs.
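The “set of functions” part can be sketched with two stand-alone functions. The names feed and walk are hypothetical and serve only as an illustration: a function written for Animal also works for Dog, but not the other way round.

```python
class Animal:
    ...


class Dog(Animal):
    ...


def feed(animal: Animal) -> str:
    # A "function of type Animal": accepts any Animal
    return "fed " + type(animal).__name__


def walk(dog: Dog) -> str:
    # A "function of type Dog": accepts Dogs only
    return "walked " + type(dog).__name__


fed_animal = feed(Animal())  # OK
fed_dog = feed(Dog())        # OK: functions of Animal are functions of Dog
walked_dog = walk(Dog())     # OK
# walk(Animal())             # mypy error: argument has type "Animal",
#                            #             expected "Dog"
```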
3.2. Assignment rules
This definition determines which assignment is acceptable and which isn’t. Let’s try to assign a variable of one type to a variable of the other type.
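A minimal sketch of both directions is below. I put the rejected assignment first so that both variables still hold exactly what they were declared with when mypy looks at it. The error comment paraphrases mypy’s message; at runtime both assignments succeed.

```python
class Animal:
    ...


class Dog(Animal):
    ...


scooby: Dog = Dog()
an_animal: Animal = Animal()

scooby = an_animal   # mypy error: expression has type "Animal",
                     #             variable has type "Dog"
an_animal = scooby   # OK: a Dog variable can always be used as an Animal
```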
Assigning scooby to an_animal is type-safe because scooby is guaranteed to be an Animal.
Assigning an_animal to scooby is not type-safe because an_animal might not be a Dog.
Checking inheritance relationships is part of the nominal subtyping approach.
3.3. Attribute rules
Mypy keeps an eye not only on assignments but also on attribute usage. More precisely it checks if an attribute is actually defined on an object. Let’s see it in action.
class Animal:
    def eat(self): ...
class Dog(Animal):
    def bark(self): ...
Now Animal can eat and Dog can both eat (by inheritance) and bark.
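A runnable sketch of attribute checking follows. I gave the methods return values and wrapped the faulty call in try/except only so that the file runs to the end; those details are my additions for illustration.

```python
class Animal:
    def eat(self) -> str:
        return "nom nom"


class Dog(Animal):
    def bark(self) -> str:
        return "woof"


an_animal: Animal = Animal()
scooby: Dog = Dog()

scooby.eat()      # OK: inherited from Animal
scooby.bark()     # OK: defined on Dog
an_animal.eat()   # OK

try:
    an_animal.bark()   # mypy error: "Animal" has no attribute "bark"
    bark_failed = False
except AttributeError:
    # Without a type checker the mistake only shows up here, at runtime
    bark_failed = True
```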
Mypy makes sure that methods are indeed defined on the objects in question. an_animal does not have a bark method defined, so an error is reported.
Checking attributes, especially methods, is part of the structural subtyping approach. Within this approach “the subtype relation is deduced from the declared methods” [source].
4. Defining complex types
Let’s see how we can use more basic types to create more complex types. This is the third way to define a type in Python. I will focus on a few typical complex types; the rest work basically the same way.
4.1. List
One of the most basic of them is List. Its spelling is the same as the built-in list except for the capital L. The syntax is the following: List[TypeOfElements]. So a list of integers is List[int], a list of strings is List[str] and so on. Let’s look into the code:
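A minimal sketch with illustrative variable names; the error comments paraphrase what mypy reports.

```python
from typing import List

numbers: List[int] = [1, 2, 3]
numbers.append(4)      # OK
numbers.append("5")    # mypy error: argument has type "str", expected "int"

words: List[str] = ["a", "b", 3]   # mypy error: list item 2 has type "int",
                                   #             expected "str"
```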
This makes sense, but we all know that in a Python list we can put items of multiple types: [1, 2, '3'] is still a valid list. In a minute we will see how to express types like “integer or string”. But first, let’s take a look at tuple and dict types.
4.2. Tuple
In the Python language tuple traditionally has two purposes. First, it’s an “immutable list”. Second, it’s a “record” or “row of values”, where the value at each position usually has a specifically defined type; think of it as a row in an SQL database. The Tuple type (with a capital T) supports both approaches.
To define tuple-as-record use the following syntax: Tuple[Type1, Type2, Type3] (etc.).
To define tuple-as-immutable-list use Tuple with the ellipsis object (spelled with three dots: ...): Tuple[TypeOfAllElements, ...].
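Both flavours can be sketched like this. The variable names are made up, and the comments describe the kind of error mypy reports rather than its exact wording.

```python
from typing import Tuple

# Tuple-as-record: fixed length, one type per position
person: Tuple[str, int] = ("Shaggy", 17)
bad_person: Tuple[str, int] = ("Shaggy", "17")     # mypy error: 2nd item is a str
too_long: Tuple[str, int] = ("Shaggy", 17, True)   # mypy error: 3 items, 2 expected

# Tuple-as-immutable-list: any length, one type for all elements
primes: Tuple[int, ...] = (2, 3, 5, 7)
no_primes: Tuple[int, ...] = ()                    # OK: empty is fine too
mixed: Tuple[int, ...] = (2, "3")                  # mypy error: "3" is a str
```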
4.3. NamedTuple
In Python there is also namedtuple. Even without typing involved it’s a very handy extension of tuple. It adds field-name look-up and a nice string representation:
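A short sketch using collections.namedtuple; the Point type is just an example of mine.

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])

p = Point(1, 2)
x_by_name = p.x      # field-name look-up
x_by_index = p[0]    # still works like a regular tuple
text = repr(p)       # "Point(x=1, y=2)"
print(text)
```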
In Python 3 namedtuple has a younger typed sibling: NamedTuple (again, spelled using capital N and T). At runtime, it has exactly the same API as namedtuple but additionally, it supports type annotations:
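Here is the same illustrative Point written with typing.NamedTuple and the class syntax available since Python 3.6. The error comment paraphrases mypy.

```python
from typing import NamedTuple


class Point(NamedTuple):
    x: int
    y: int


p = Point(1, 2)
x, y = p                  # unpacks like a regular tuple
text = repr(p)            # same representation as namedtuple

bad = Point("1", 2)       # mypy error: argument 1 has type "str",
                          #             expected "int"
```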
Nice, readable and handy! 😎
As a side note: if you know NamedTuple, you basically already know Python 3.7’s dataclass, which is kind of NamedTuple on steroids (see docs).
4.4. Dict
Another essential Python type is dict. Its type is defined similarly to Tuple: Dict[KeyType, ValueType]. So a dict mapping integer keys to string values would be Dict[int, str].
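A minimal sketch with a made-up id_to_name mapping; the comments describe what mypy complains about in my own words.

```python
from typing import Dict

id_to_name: Dict[int, str] = {1: "Scooby", 2: "Shaggy"}

id_to_name[3] = "Velma"    # OK: int key, str value
id_to_name["4"] = "Fred"   # mypy error: the key should be an int
id_to_name[5] = 5          # mypy error: the value should be a str
```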
There are also types for other Python collections: Set, FrozenSet, DefaultDict, Counter, Deque, and many others. For the full list see the docs.
Now let’s focus on another way to create complex types.
4.5. Union
Suppose we have a variable that can have str type or int type, depending on the situation (like a different data source). To define such a type we use Union. In our case, it would be Union[str, int]. In square brackets we can put as many types as we want: Union[Type1, Type2, Type3, Type4] (etc.). Formally:
Union[t1, t2, ...]. Types that are subtype of at least one of t1 etc. are subtypes of this. [source]
Some examples:
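A small sketch; the width variable and the describe function are hypothetical names. The isinstance check shows how mypy follows the code and treats the value as one concrete member of the Union inside each branch.

```python
from typing import Union

width: Union[str, int] = 10   # OK: int is one of the allowed types
width = "10px"                # OK: so is str
width = 1.5                   # mypy error: float is neither str nor int


def describe(value: Union[str, int]) -> str:
    if isinstance(value, int):
        return "number " + str(value)   # value is treated as int here
    return "text " + value              # and as str here
```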
4.6. None type and Optional type
A common programming pattern is to use one variable for a concrete value or alternatively something symbolizing no value (when the value is missing, corrupted, not yet available, inadequate in the current context, etc.). In Python, None is the object most commonly used to indicate that there is no value. The type of None is NoneType, but in a typing context, there is an alias for it, which is… None itself. The alias is very useful since it does not involve importing anything. Therefore the most natural way to express a value-of-type-T-or-no-value type would be Union[T, None]. So int-or-nothing would be Union[int, None].
This something-or-nothing pattern is so common that in Python’s typing system Union[T, None] has an alias: Optional[T]. E.g. to express Union[int, None] use Optional[int].
Forgetting about “optionality” of variables quite often causes bugs. Mypy can really help us out here 💪
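A sketch of a typical case follows. USER_IDS and the three functions are made-up names for illustration. The unsafe version forgets that the lookup may return None, which mypy flags; the safe version checks for None first.

```python
from typing import Optional

USER_IDS = {"scooby": 1}   # hypothetical lookup table


def find_user_id(name: str) -> Optional[int]:
    # dict.get returns None when the key is missing
    return USER_IDS.get(name)


def next_id_unsafe(name: str) -> int:
    # mypy error: the left operand of + may be None
    # (at runtime this raises TypeError for unknown names)
    return find_user_id(name) + 1


def next_id(name: str) -> Optional[int]:
    user_id = find_user_id(name)
    if user_id is None:
        return None
    return user_id + 1     # mypy knows user_id is an int here
```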
So, now you are acquainted with the basics of Python’s typing system. Want to know more? Check out my next blogpost about more advanced features of Python’s typing, along with some useful tips.