Expert python topics you should know

Expert python topics you should know: multi-version handling, dunders, decorators, interfaces, context managers, asyncio, profiling, logging and more.

Photo by David Clode / Unsplash

What are the main topics that distinguish an advanced developer from a merely effective python programmer? You are good at python when the code you write is elegantly simple and idiomatic.

Each language and community has its own way of solving certain kinds of problems. That specific way of doing things is what we call idiomatic. We want our code to be idiomatic because it is easier to understand and because it solves problems with well-known, tested techniques. Being idiomatic means writing simple code that relies on existing solutions for common problems. We don’t reinvent the wheel.

In this post I will describe the main topics that can make your code more idiomatic, and some advanced features you need to be familiar with as an advanced python developer.

Multiple python versions

There will be situations where you need multiple versions of python. You may be just fine using the default python 2 or 3 of your system. But sometimes a client or project requires a very specific version, or you work on several projects and each of them uses a different one. In this scenario, you need a way to manage your python versions. This is not the same as managing dependency versions: I’m talking about the version of the python language itself.

The solution to this problem is very simple: just use pyenv. With it, you can easily have any version you want at your disposal.

$ pyenv versions # lists all installed versions
$ pyenv install 3.7.4 # installs a specific version
$ pyenv global 3.7.4 # sets the default version for your user
$ pyenv local 3.7.4 # sets the version for the current directory

pyenv also installs the development headers that you will need when building C/C++ extensions. You shouldn’t have to worry about the exact path: CMake’s find_package is going to help you with that.

find_package(Python3 COMPONENTS Development)
target_include_directories(<project_name>
    PUBLIC ${Python3_INCLUDE_DIRS})

Dunder methods

Dunder (or magic) methods are methods whose names start and end with a double underscore, like __init__ or __str__. They are the mechanism we use to interact directly with python’s data model. A language’s data model describes:

  • How values are stored in memory.
  • Object identity, equality and truthiness.
  • Name resolution, function/method dispatching.
  • Basic types, type/value composition.
  • Evaluation order, eagerness/laziness.

Basically, the dunder methods allow us to interact with core concepts of the python language. You can also see them as a mechanism for implementing behaviors and interface methods.
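To make that concrete, here is a minimal sketch of a class that plugs into built-in syntax through a handful of dunders. The Playlist class and its attributes are made up for illustration; only the dunder names come from the language itself.

```python
class Playlist:
    """A tiny container that plugs into built-in syntax through dunder methods."""

    def __init__(self, name, tracks=()):
        self.name = name
        self.tracks = list(tracks)

    def __repr__(self):
        # Used by repr() and by the interactive prompt
        return f"Playlist({self.name!r}, {self.tracks!r})"

    def __len__(self):
        # Used by len(); also decides truthiness because __bool__ is not defined
        return len(self.tracks)

    def __getitem__(self, index):
        # Used by playlist[index]; also makes the object iterable
        return self.tracks[index]

    def __eq__(self, other):
        # Used by ==
        if not isinstance(other, Playlist):
            return NotImplemented
        return (self.name, self.tracks) == (other.name, other.tracks)


empty = Playlist("empty")
mix = Playlist("mix", ["intro", "outro"])

print(repr(mix))          # Playlist('mix', ['intro', 'outro'])
print(len(mix), mix[0])   # 2 intro
print(bool(empty))        # False
print(mix == Playlist("mix", ["intro", "outro"]))  # True
```

Notice that the class never inherits from list: implementing the right dunders is enough for len(), indexing, iteration, comparison and truthiness to work.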

For a detailed description of many useful dunders and related concepts, I recommend reading this guide.

@ Function decorators

Decorators are nothing more than a special case of higher-order functions with @ syntax support. Applying one is equivalent to function composition: writing @add10 above the definition of add1 is the same as writing add1 = add10(add1) after it. We can use a decorator not only for normal functions but also for methods.

from functools import wraps
def add10(f):
    @wraps(f)
    def g(*args, **kwargs):
        return f(*args, **kwargs) + 10
    return g
@add10
def add1(a):
    return a + 1
add1(0)  # 11

wraps from functools is itself another decorator. It copies the metadata of the original function (such as __name__ and __doc__) onto the wrapper.
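A small sketch of what that metadata copying means in practice. The decorator and function names (plain, preserving, first, second) are hypothetical; the only real API used is functools.wraps.

```python
from functools import wraps


def plain(f):
    # No @wraps: the returned function keeps its own name and docstring
    def wrapper(*args, **kwargs):
        return f(*args, **kwargs)
    return wrapper


def preserving(f):
    @wraps(f)  # copies __name__, __doc__, etc. from f and sets __wrapped__
    def wrapper(*args, **kwargs):
        return f(*args, **kwargs)
    return wrapper


@plain
def first():
    """Docstring of first."""


@preserving
def second():
    """Docstring of second."""


print(first.__name__, first.__doc__)    # wrapper None
print(second.__name__, second.__doc__)  # second Docstring of second.
print(second.__wrapped__.__name__)      # second
```

Without wraps, tracebacks, help() and any tool that inspects function names will see the anonymous wrapper instead of the function you actually wrote.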

Interfaces

Interfaces help us to enforce the implementation of certain characteristics by other code that commits to doing so.

One of the characteristics we can enforce is the definition of specific methods, as we would do when defining a normal java interface:

from abc import ABC, abstractmethod
class Animal(ABC):
    @abstractmethod
    def make_sound(self):
        return "indistinguishable noise"
class Cat(Animal):
    def make_sound(self):
        return "miauu"
class Dog(Animal):
    def make_something(self):
        return "eat"

We used the @abstractmethod decorator to enforce the definition of specific methods in child classes:

>>> Cat().make_sound()
'miauu'
>>> Dog()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't instantiate abstract class Dog
           with abstract methods make_sound

But the previous enforcement only triggers when we try to create an instance. If we want to be even stricter, we can use a metaclass to make the script fail while the class definition is being loaded.

class Animal(type):
    def __new__(cls, name, bases, body):
        if 'make_sound' not in body:
            raise TypeError('no make_sound method')
        return super().__new__(cls, name, bases, body)
class Cat(metaclass=Animal):
    def make_sound(self):
        return "miauu"
class Dog(metaclass=Animal):
    def make_something(self):
        return "eat"
Traceback (most recent call last):
  [...] class Dog(metaclass=Animal):
TypeError: no make_sound method

In the same way that instantiating a class creates an object, instantiating a metaclass creates a class. Metaclasses are a way of controlling the creation of classes. This example also indirectly shows that the __new__ dunder is responsible for creating the instance, while __init__ initializes the instance previously created by __new__.
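The division of labor between __new__ and __init__ is easier to see on an ordinary class. This is a minimal sketch; Tracked, Dynamic and the calls list are hypothetical names used only to record what happens.

```python
calls = []


class Tracked:
    def __new__(cls, *args, **kwargs):
        calls.append("__new__")
        # object.__new__ only takes the class; the arguments go to __init__
        return super().__new__(cls)

    def __init__(self, value):
        calls.append("__init__")
        self.value = value


t = Tracked(42)
print(calls)    # ['__new__', '__init__']
print(t.value)  # 42

# Instantiating a metaclass yields a class: type(name, bases, namespace)
Dynamic = type("Dynamic", (), {"greet": lambda self: "hi"})
print(type(Dynamic) is type, Dynamic().greet())  # True hi
```

The last lines show the metaclass side of the analogy: calling type, the default metaclass, with a name, bases and a namespace produces a new class at runtime.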

Since python 3.6, instead of defining __new__ inside a metaclass, we can use __init_subclass__. Our interface example then becomes:

from _collections_abc import _check_methods  # private helper
class Animal:
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if _check_methods(cls, 'make_sound') is NotImplemented:
            raise TypeError("make_sound not implemented")
class Cat(Animal):
    [...] # same as in the previous example
class Dog(Animal):
    [...] # same as in the previous example

Using _check_methods earns you extra style points, but keep in mind that it is a private helper of the standard library, so it may change without notice.
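If you would rather stay on public APIs, a variant with getattr and callable does a similar job. This is a sketch, not a drop-in equivalent: unlike the metaclass version, it also accepts a make_sound inherited from a parent class.

```python
class Animal:
    def __init_subclass__(cls, **kwargs):
        # Cooperate with other classes in the hierarchy
        super().__init_subclass__(**kwargs)
        if not callable(getattr(cls, "make_sound", None)):
            raise TypeError(f"{cls.__name__} must define make_sound")


class Cat(Animal):
    def make_sound(self):
        return "miauu"


try:
    # The check runs while the class statement is executed
    class Dog(Animal):
        def make_something(self):
            return "eat"
except TypeError as exc:
    error = str(exc)

print(Cat().make_sound())  # miauu
print(error)               # Dog must define make_sound
```

The error is raised at class definition time, with no instance of Dog ever being created.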

Context manager

A context manager is, in mundane words, a class with __enter__ and __exit__ methods. Context managers give us support for the RAII pattern through the with syntax. An important thing to remember is that __exit__ receives the exception details (type, value and traceback), so you can choose whether an exception raised inside the with block propagates. If __exit__ returns a true value, the exception is suppressed; if it returns a false value (including None), the exception keeps propagating on its own. You should not re-raise the received exception inside the __exit__ method.

For example, let’s suppose we had nothing better to do than to use the low-level http.client library; we could wrap HTTPConnection inside a context manager:

from http.client import HTTPConnection
# not really necessary but looks cool
from contextlib import AbstractContextManager

class Conn(AbstractContextManager):
    def __init__(self, host):
        self.host = host
    def __enter__(self):
        self.conn = HTTPConnection(self.host, 80)
        return self.conn
    def __exit__(self, *args):
        self.conn.close()

with Conn('example.com') as conn:
    conn.request('GET', '/')
    res = conn.getresponse()
    print(res.status, res.reason)

That code is quite verbose; we can shorten it with the contextlib module, whose documentation is worth reading in full. For instance, we can use a generator instead of a full AbstractContextManager class implementation.

from http.client import HTTPConnection
from contextlib import contextmanager
@contextmanager
def Conn(host):
    conn = HTTPConnection(host, 80)
    try:
        yield conn
    finally:
        conn.close()

We can also get rid of Conn completely with closing:

from contextlib import closing
with closing(HTTPConnection("example.com", 80)) as conn:
    conn.request('GET', '/')
    res = conn.getresponse()
    print(res.status, res.reason)
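The propagate-or-suppress choice that __exit__ has, mentioned at the start of this section, can be sketched as follows. The Suppress class is a hypothetical illustration; the standard library already ships a production version of the same idea as contextlib.suppress.

```python
class Suppress:
    """Swallow only the listed exception types; let everything else propagate."""

    def __init__(self, *exc_types):
        self.exc_types = exc_types
        self.caught = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # exc_type is None when the block finished without an exception
        if exc_type is not None and issubclass(exc_type, self.exc_types):
            self.caught = exc_value
            return True   # truthy: the exception is suppressed
        return False      # falsy: the exception (if any) keeps propagating


with Suppress(KeyError) as guard:
    {}["missing"]
print(type(guard.caught).__name__)  # KeyError

try:
    with Suppress(KeyError):
        1 / 0
except ZeroDivisionError:
    print("ZeroDivisionError propagated")
```

Note that __exit__ never uses raise: returning a false value is all it takes for the original exception to continue on its way.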

Asynchronous programming

When we use async and await we are doing cooperative concurrency (not parallelism). You may want to check some documentation or a tutorial if you are not familiar with those terms.
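As a quick refresher, here is a minimal sketch of cooperative concurrency. The worker coroutine and the events list are hypothetical names; everything runs on a single thread, and tasks only switch at an await.

```python
import asyncio

events = []


async def worker(name, delay):
    events.append(f"{name} start")
    # await hands control back to the event loop so other tasks can run
    await asyncio.sleep(delay)
    events.append(f"{name} end")
    return name


async def main():
    # gather schedules both coroutines on the same thread and event loop
    return await asyncio.gather(worker("slow", 0.2), worker("fast", 0.1))


results = asyncio.run(main())
print(results)  # ['slow', 'fast'] (argument order, not completion order)
print(events)   # ['slow start', 'fast start', 'fast end', 'slow end']
```

Both workers are in flight at the same time, yet no two lines of python ever execute simultaneously: that is concurrency without parallelism.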

In practice, we have a bunch of async functions and an event loop. And that’s it. But what happens when you actually want to define a truly parallel operation? What if some important client wants some custom, crazy high-performance network code? How can we create low-level parallel code that, from the point of view of python, appears to be asynchronous code?

First, we need a way to turn blocking code into a coroutine. The following make_async function does exactly that:

import asyncio
import inspect
from functools import partial, wraps
def make_async(g):
    @wraps(g)
    async def f(*args, **kwargs):
        loop = asyncio.get_running_loop()
        # None selects the loop's default ThreadPoolExecutor,
        # so we don't create a new pool on every call
        return await loop.run_in_executor(
            None, partial(g, *args, **kwargs)
        )
    # rebind the name in the caller's module to the coroutine version
    frm = inspect.stack()[1]
    mod = inspect.getmodule(frm[0])
    setattr(mod, g.__name__, f)
    return f

A very neat function, right? Yes, but it only gives us true parallelism if the g function releases the GIL. The following code, which uses pybind11, defines a C++ module x with a send_message function that releases the GIL internally.

#include <pybind11/pybind11.h>
#include <string>
#include <thread>
#include <chrono>
std::string send_message(std::string input)
{
    pybind11::gil_scoped_release release; // GIL RELEASE
    std::this_thread::sleep_for(std::chrono::seconds(1));
    return input + " done!";
}
PYBIND11_MODULE(x, m) {
    m.def("send_message", &send_message,
          "sends something through the network");
}

The pybind11::gil_scoped_release class releases the GIL when it is constructed and acquires it again when it is destroyed, which here happens at the end of the function call.

import asyncio
from x import send_message
make_async(send_message) # using the function from above
async def send(msg):
    print("sending", msg)
    result = await send_message(msg)
    print("sent", result)
    return result
async def main():
    await asyncio.gather(*(send(str(i)) for i in range(3)))
asyncio.run(main())

And the output is what you would expect:

sending 0
sending 1
sending 2
sent 0 done!
sent 1 done!
sent 2 done!
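If you want to try the idea without compiling a C++ extension, the following sketch uses only the standard library. It assumes python 3.9+ for asyncio.to_thread, and blocking_send is a hypothetical stand-in that uses time.sleep, which also releases the GIL while it waits.

```python
import asyncio
import time


def blocking_send(msg):
    # Stand-in for a blocking call; time.sleep releases the GIL while waiting
    time.sleep(0.2)
    return f"{msg} done!"


async def main():
    start = time.perf_counter()
    # asyncio.to_thread (Python 3.9+) runs each call in the default thread pool
    results = await asyncio.gather(
        *(asyncio.to_thread(blocking_send, str(i)) for i in range(3))
    )
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(main())
print(results)  # ['0 done!', '1 done!', '2 done!']
print(f"took {elapsed:.2f}s")  # the three sleeps overlap instead of adding up
```

The three calls overlap because each one sleeps outside the GIL. A pure python CPU-bound function in the same position would still be serialized by the GIL.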

Profiling

Profiling is one of those techniques that we reach for when we have really fucked something up. It’s a great tool to know, but expect to suffer while figuring out why you are getting some esoteric crash, or why something isn’t working as it should.

For call trees and CPU time we can use cProfile and KCacheGrind:

$ python -m cProfile -o script.profile main.py
$ pyprof2calltree -i script.profile -o script.calltree
$ kcachegrind script.calltree
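When you only care about one piece of code, you can also drive cProfile from inside the script and read the numbers with pstats, without any external viewer. A minimal sketch, where slow_sum is a hypothetical function standing in for your real workload:

```python
import cProfile
import io
import pstats


def slow_sum(n):
    return sum(i * i for i in range(n))


profiler = cProfile.Profile()
profiler.enable()
slow_sum(100_000)
profiler.disable()

# Sort by cumulative time and print the five most expensive entries
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
stats.print_stats(5)
report = buffer.getvalue()
print(report)
```

The report lists call counts plus total and cumulative time per function, which is usually enough to spot the hot path before reaching for a graphical tool.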

But cProfile, profile and hotshot (python 2 only) aren’t that useful if we have multi-threaded code or if a bottleneck comes from non-explicit function calls. A much more effective profiler is yappi. You won’t go back to cProfile after playing around with it. Don’t take my word for it: the PyCharm IDE uses yappi by default if you have it installed.

To use yappi we need to add some code to our script:

import yappi
yappi.start(builtins=True)
# YOUR CODE GOES HERE
# a context manager would be great for this
func_stats = yappi.get_func_stats()
func_stats.save('script.calltree', 'CALLGRIND')
yappi.stop()
yappi.clear_stats()

After the profiling ends we can open the profiling file with:

$ kcachegrind script.calltree

Another important aspect of profiling is to record memory usage.

We can take a memory snapshot at any moment with pympler:

from pympler import muppy, summary
all_objects = muppy.get_objects()
summary.print_(summary.summarize(all_objects), limit=100)

The main features of pympler can be accessed through ClassTracker.
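If installing pympler is not an option, the standard library’s tracemalloc module covers part of the same ground. Note that this is a different tool, not pympler’s API: the sketch below compares two snapshots to see which source lines allocated memory in between.

```python
import tracemalloc

tracemalloc.start()

before = tracemalloc.take_snapshot()
data = [str(i) * 10 for i in range(50_000)]  # allocate something noticeable
after = tracemalloc.take_snapshot()

# Compare the snapshots and group the differences by source line
top = after.compare_to(before, "lineno")
for stat in top[:3]:
    print(stat)

current, peak = tracemalloc.get_traced_memory()
print(f"current={current / 1024:.0f} KiB, peak={peak / 1024:.0f} KiB")
tracemalloc.stop()
```

Only allocations made after tracemalloc.start() are traced, so start it as early as possible when you are hunting a leak.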

Network analysis

Some applications are very hard to understand, and we need to start treating them as black boxes. Or maybe we have a very obscure problem when we send data through the network.

The most practical approach is to modify your software so that it records every request it receives or sends. In a perfect world, you would be able to turn that feature on and off while in production.

But most mortals neither understand their own systems well enough nor want to make that time investment. Even in that case, there are things we can do:

  • Incoming traffic: Make sure that our server receives plain HTTP traffic. We can do this by placing it behind a load balancer or a reverse proxy that terminates TLS, so that clients are still served over HTTPS. Then we can simply use wireshark to read the incoming traffic.
  • Outgoing traffic: We need a proxy, and if we are making HTTPS requests we need to install custom certificates so that the proxy can act as a man in the middle. mitmproxy is the tool for this.

In normal scenarios, where you understand and control the codebase, you should be logging and analyzing the traffic internally without relying on the previous tricks, especially because you may not be able to do “mitm attacks” on a production server under heavy load without slowing everything down.

Logs are your best friend…
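The “record every request, and switch it on and off in production” idea from this section can be sketched with the standard logging module. All names here (the traffic logger, log_traffic, fetch_user) are hypothetical, and fetch_user stands in for a real network call.

```python
import logging
from functools import wraps

traffic_log = logging.getLogger("traffic")  # hypothetical logger name


def log_traffic(func):
    """Log arguments and result of a request function when DEBUG is enabled."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        # isEnabledFor keeps the overhead minimal while recording is off
        if traffic_log.isEnabledFor(logging.DEBUG):
            traffic_log.debug("%s request args=%r kwargs=%r",
                              func.__name__, args, kwargs)
        result = func(*args, **kwargs)
        if traffic_log.isEnabledFor(logging.DEBUG):
            traffic_log.debug("%s response %r", func.__name__, result)
        return result
    return wrapper


@log_traffic
def fetch_user(user_id):
    # Hypothetical stand-in for a real network call
    return {"id": user_id, "name": "example"}


logging.basicConfig(format="%(name)s %(levelname)s %(message)s")

traffic_log.setLevel(logging.WARNING)  # recording off
fetch_user(1)

traffic_log.setLevel(logging.DEBUG)    # recording on, no restart needed
fetch_user(2)
```

In a real service the setLevel call would be triggered by whatever runtime control you already have, for example a signal handler or an admin endpoint; that part is left out of the sketch.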

Logging

Most of the time, logging just works and you shouldn’t worry. But under heavy load, these are the options:

  • Don’t log at all, and use only metrics instead. Or,
  • Send the logs through the network: logging locally puts extra load on your server and your code, and you may need to write code designed specifically to handle the logging.
  • Avoid logging to a disk that you use for something else: don’t put load on a disk that other tasks depend on.
  • If you write your logs locally, make sure that you rotate them. You could use RotatingFileHandler, but logrotate is better.
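For the rotation option, this is roughly what RotatingFileHandler looks like in use. It is a minimal sketch: the tiny maxBytes value, the temporary directory and the rotation_demo logger name are assumptions chosen to make the rollover visible, not recommended settings.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler

log_dir = tempfile.mkdtemp()  # throwaway directory for the demo
log_path = os.path.join(log_dir, "app.log")

# Roll over before the file exceeds ~1 KiB and keep at most 3 old files
handler = RotatingFileHandler(log_path, maxBytes=1024, backupCount=3)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

logger = logging.getLogger("rotation_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep the demo output out of the root logger

for i in range(200):
    logger.info("event number %d", i)

handler.close()
print(sorted(os.listdir(log_dir)))
# ['app.log', 'app.log.1', 'app.log.2', 'app.log.3']
```

Older entries are simply discarded once the backup count is reached, so size the limits according to how far back you need to look.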

Obviously, how you approach any logging problem will depend on how frequent, how important, and how big the logs are. In most cases, you can just log and forget about it.