Expert python topics you should know
Multi-version handling, dunders, decorators, interfaces, context managers, asyncio, profiling, logging and more.
What are the main topics that distinguish an advanced developer from a merely good-enough Python programmer? You are good at Python when the code you write is elegantly simple and idiomatic.
Each language and community has its own way of solving certain kinds of problems. That specific way of doing things is what we call idiomatic. We want our code to be idiomatic because not only will we be writing code that is easier to understand, we will also be solving problems with well-known and tested techniques. Being idiomatic means writing simple code that relies on existing solutions for common problems. We don’t reinvent the wheel.
In this post I will describe the main topics that can make your code more idiomatic, and some advanced functionality you need to be familiar with as an advanced Python developer.
Multiple Python versions
There will be situations where you need multiple versions of Python. You may be just fine using your system’s default Python 2 or 3, but some client or project may require a very specific version. You may also work on several projects, each of which uses a different version. In that scenario you need a way to manage your Python versions, and this is not the same as managing dependency versions. I’m talking about the version of the Python language itself.
The solution to this problem is very simple: just use pyenv. With it, you will have any version you want at your disposal, very easily.
$ pyenv versions # lists all installed versions
$ pyenv install 3.7.4 # installs a specific version
$ pyenv global 3.7.4 # activates the specific version
$ pyenv local 3.7.4 # version for a directory
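Once a version is activated, a quick sanity check from inside the interpreter (a trivial snippet, nothing pyenv-specific) is to ask Python itself:

import sys

# confirm which interpreter and version pyenv resolved
print(sys.version)      # e.g. 3.7.4 (default, ...)
print(sys.executable)   # path of the interpreter actually running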
pyenv also installs the development headers that you will need when building C/C++ extensions. You shouldn’t worry about their exact path, though; CMake’s find_package will find them for you.
find_package(Python3 COMPONENTS Development)
target_include_directories(<project_name>
    PUBLIC ${Python3_INCLUDE_DIRS})
Dunder methods
Dunder, or magic, methods are methods whose names start and end with a double underscore, like __init__ or __str__. They are the mechanism we use to interact directly with Python’s data model. A language’s data model describes:
- How values are stored in memory.
- Object identity, equality and truthiness.
- Name resolution, function/method dispatching.
- Basic types, type/value composition.
- Evaluation order, eagerness/laziness.
Basically, dunder methods allow us to interact with core concepts of the Python language. You can also see them as a mechanism for implementing behaviors and interface methods.
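As a small illustrative sketch (the class here is made up), implementing a few dunders is all it takes to plug a type into len(), truthiness, equality and the REPL representation:

class Basket:
    def __init__(self, items):
        self.items = list(items)

    def __len__(self):            # len(basket)
        return len(self.items)

    def __bool__(self):           # truthiness: an empty basket is falsy
        return bool(self.items)

    def __eq__(self, other):      # equality by value, not identity
        return isinstance(other, Basket) and self.items == other.items

    def __repr__(self):           # how the object prints in the REPL
        return f"Basket({self.items!r})"

len(Basket(['apple']))            # 1
bool(Basket([]))                  # False
Basket(['a']) == Basket(['a'])    # True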
For a detailed description of many useful dunders and related concepts, I recommend reading this guide.
Function decorators
Decorators are nothing more than a special case of higher-order functions with @ syntax support. They are roughly equivalent to function composition. We can use a decorator not only on normal functions but also on class methods.
from functools import wraps

def add10(f):
    @wraps(f)
    def g(*args, **kwargs):
        return f(*args, **kwargs) + 10
    return g

@add10
def add1(a):
    return a + 1

add1(0)  # 11
wraps from functools is itself another decorator; it keeps the metadata of the original wrapped function.
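A quick check of what wraps buys us (not from the original post, just an illustration):

add1.__name__       # 'add1' thanks to @wraps; without it this would be 'g'
add1.__wrapped__    # the original, undecorated add1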
Interfaces
Interfaces help us to enforce the implementation of certain characteristics by other code that commits to doing so.
One of the characteristics we can enforce is the definition of specific methods, just as we would with a normal Java interface:
from abc import ABC, abstractmethod

class Animal(ABC):
    @abstractmethod
    def make_sound(self):
        return "indistinguishable noise"

class Cat(Animal):
    def make_sound(self):
        return "miauu"

class Dog(Animal):
    def make_something(self):
        return "eat"
We used the @abstractmethod decorator to enforce the definition of specific methods in child classes:
>>> Cat().make_sound()
'miauu'
>>> Dog()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't instantiate abstract class Dog with abstract methods make_sound
But the previous enforcement required us to try to create an instance in the first place. If we want to be even stricter, we can use a metaclass to make the script fail while the class definition is being loaded.
class Animal(type):
    def __new__(cls, name, bases, body):
        if 'make_sound' not in body:
            raise TypeError('no make_sound method')
        return super().__new__(cls, name, bases, body)

class Cat(metaclass=Animal):
    def make_sound(self):
        return "miauu"

class Dog(metaclass=Animal):
    def make_something(self):
        return "eat"
Traceback (most recent call last):
[...] class Dog(metaclass=Animal):
TypeError: no make_sound method
In the same way that instantiating a class creates an object, instantiating a metaclass creates a class. Metaclasses are a way of controlling the creation of classes. This example also indirectly shows that the __new__ dunder is responsible for creating the instance, while __init__ initializes the instance previously created by __new__.
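A tiny sketch (a toy class, purely illustrative) makes that split visible:

class Point:
    def __new__(cls, *args, **kwargs):
        print("__new__: creating the instance")
        return super().__new__(cls)

    def __init__(self, x, y):
        print("__init__: initializing the instance")
        self.x, self.y = x, y

Point(1, 2)
# __new__: creating the instance
# __init__: initializing the instance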
Since Python 3.6, instead of putting __new__ in a metaclass we can simply use __init_subclass__. Our interface example then becomes the following:
from _collections_abc import _check_methods

class Animal:
    def __init_subclass__(cls, *args, **kwargs):
        if _check_methods(cls, 'make_sound') is NotImplemented:
            raise TypeError("make_sound not implemented")

class Cat(Animal):
    [...]  # same as in the previous example

class Dog(Animal):
    [...]  # same as in the previous example
If you use _check_methods (a private helper from the standard library’s _collections_abc module) you will have extra style points.
Context manager
Context managers, or in mundane words, classes with __enter__ and __exit__ methods. Context managers give us support for the RAII pattern through the with syntax. An important thing to remember when implementing __exit__ is that you should check the exception values, because you get to choose whether or not to propagate an exception that happened inside the with block: if you return a true value, the exception is suppressed. But under no circumstances are you expected to re-raise the passed-in exception inside the __exit__ method.
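As a minimal sketch of that choice (the class is made up; the standard library already ships an equivalent as contextlib.suppress), returning a true value from __exit__ swallows the exception, while a falsy return lets it propagate:

class Suppress:
    def __init__(self, *exc_types):
        self.exc_types = exc_types

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # True -> suppress the exception, falsy -> propagate it
        return exc_type is not None and issubclass(exc_type, self.exc_types)

with Suppress(KeyError):
    {}['missing']       # raises KeyError, swallowed by __exit__
print("still running")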
For example, let’s suppose we had nothing better to do than to use the low-level http.client library; we could wrap HTTPConnection inside a context manager:
from http.client import HTTPConnection
# not really necessary but looks cool
from contextlib import AbstractContextManager

class Conn(AbstractContextManager):
    def __init__(self, host):
        self.host = host

    def __enter__(self):
        self.conn = HTTPConnection(self.host, 80)
        return self.conn

    def __exit__(self, *args):
        self.conn.close()

with Conn('example.com') as conn:
    conn.request('GET', '/')
    res = conn.getresponse()
    print(res.status, res.reason)
That code is very verbose; we can fix that with the contextlib module, whose documentation I recommend reading in full. If we want, we can use a generator instead of a full AbstractContextManager implementation.
from http.client import HTTPConnection
from contextlib import contextmanager

@contextmanager
def Conn(host):
    conn = HTTPConnection(host, 80)
    try:
        yield conn
    finally:
        conn.close()
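Usage stays exactly the same as with the class-based version:

with Conn('example.com') as conn:
    conn.request('GET', '/')
    res = conn.getresponse()
    print(res.status, res.reason)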
We can also get rid of the Conn class completely by using closing:
from contextlib import closing

with closing(HTTPConnection("example.com", 80)) as conn:
    conn.request('GET', '/')
    res = conn.getresponse()
    print(res.status, res.reason)
Asynchronous programming
When we use async and await we are doing cooperative concurrency (not parallelism). You may want to check some documentation or a tutorial if you are not familiar with those terms.
In practice, we have a bunch of async functions and an event loop. And that’s it.
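As a quick reminder of that model (a toy example, not from the original post), a few coroutines scheduled on one loop run concurrently by taking turns:

import asyncio

async def worker(name):
    print(name, "started")
    await asyncio.sleep(1)   # cooperative: yields control back to the loop
    print(name, "finished")

loop = asyncio.get_event_loop()
tasks = [loop.create_task(worker(str(i))) for i in range(3)]
loop.run_until_complete(asyncio.wait(tasks))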
But what happens when you actually want to define a real parallel operation? What if some important client wants some custom, crazily high-performance network code? How can we create low-level parallel code that, from Python’s point of view, looks like ordinary asynchronous code?
First, we need a way to turn blocking code into a coroutine. The following make_async function does exactly that:
import asyncio
import inspect
from functools import wraps
from concurrent.futures import ThreadPoolExecutor

def make_async(g):
    @wraps(g)
    async def f(*args, **kwargs):
        loop = asyncio.get_event_loop()
        return await loop.run_in_executor(
            ThreadPoolExecutor(),
            lambda: g(*args, **kwargs)
        )
    # replace g with f in the caller's module
    frm = inspect.stack()[1]
    mod = inspect.getmodule(frm[0])
    setattr(mod, g.__name__, f)
A very neat function, right? Yeah, but this will only truly run in parallel if the function g releases the GIL. The following code, which uses pybind11, defines a C++ module x with a send_message function that releases the GIL internally.
#include <pybind11/pybind11.h>
#include <string>
#include <thread>
#include <chrono>

std::string send_message(std::string input)
{
    pybind11::gil_scoped_release release; // GIL RELEASE
    std::this_thread::sleep_for(std::chrono::seconds(1));
    return input + " done!";
}

PYBIND11_MODULE(x, m) {
    m.def("send_message", &send_message,
          "sends something through the network");
}
The pybind11::gil_scoped_release class releases the GIL when it is constructed and acquires it again when it is destroyed, at the end of the function call.
import asyncio
from x import send_message

make_async(send_message)  # using the function from above

async def send(msg):
    print("sending", msg)
    result = await send_message(msg)
    print("sent", result)
    return result

loop = asyncio.get_event_loop()
to_send = [loop.create_task(send(str(i))) for i in range(3)]
loop.run_until_complete(asyncio.wait(to_send))
And the output is what you would expect:
sending 0
sending 1
sending 2
sent 0 done!
sent 1 done!
sent 2 done!
Profiling
Profiling is one of those techniques we use when we have really messed something up. It’s a great tool to know for those moments when you are suffering, trying to figure out why you are getting some esoteric crash or why something isn’t working as it should.
For call trees and CPU time we can use cProfile and KCacheGrind:
$ python -m cProfile -o script.profile main.py
$ pyprof2calltree -i script.profile -o script.calltree
$ kcachegrind script.calltree
But cProfile, profile and hotshot aren’t that useful if we have multi-threaded code or if a bottleneck is generated by non-explicit function calls. A much more effective profiler is yappi, and it really is: you won’t go back to cProfile after playing around with yappi. Don’t take my word for it; the PyCharm IDE uses yappi by default if you have it installed.
To use yappi we need to add some code to our script:
import yappi
yappi.start(builtins=True)
# YOUR CODE GOES HERE
# a context manager would be great for this
func_stats = yappi.get_func_stats()
func_stats.save('script.calltree', 'CALLGRIND')
yappi.stop()
yappi.clear_stats()
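As the comment above hints, wrapping this in a context manager keeps it tidy; a sketch using the same yappi calls could look like this:

from contextlib import contextmanager
import yappi

@contextmanager
def profiled(path='script.calltree'):
    yappi.start(builtins=True)
    try:
        yield
    finally:
        yappi.get_func_stats().save(path, 'CALLGRIND')
        yappi.stop()
        yappi.clear_stats()

with profiled():
    pass  # YOUR CODE GOES HERE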
After the profiling ends we can open the profiling file with:
$ kcachegrind script.calltree
Another important aspect of profiling is recording memory usage. We can take a memory snapshot at any moment with pympler:
from pympler import muppy, summary
all_objects = muppy.get_objects()
summary.print_(summary.summarize(all_objects), limit=100)
The main features of pympler can be accessed through ClassTracker.
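A minimal ClassTracker sketch (the tracked class is whatever you care about; Conn here is just the class from the context-manager example):

from pympler.classtracker import ClassTracker

tracker = ClassTracker()
tracker.track_class(Conn)     # track memory used by instances of Conn
tracker.create_snapshot()
# ... run the code that allocates Conn instances ...
tracker.create_snapshot()
tracker.stats.print_summary()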
Network analysis
Some applications are very hard to understand, and we need to start treating them as black boxes. Or maybe we have a very obscure problem when we send data through the network.
The most practical kind of analysis would be to modify your software so it records every request it receives or sends, and in a perfect world you would be able to turn that feature on and off in production.
But most mortals don’t understand their own systems well enough, nor want to make that time investment. Even then, there are things we can do:
- Incoming traffic: ensure that our server receives plain HTTP traffic; we can do this by sitting behind a load balancer or a reverse proxy, so we can keep serving HTTPS externally. That way we can simply use Wireshark to read the incoming traffic.
- Outgoing traffic: we need a proxy, and if we are making HTTPS requests we need to install custom certificates so we can do the man-in-the-middle. This requires the use of mitmproxy.
In normal scenarios, where you understand and control the codebase, you should be logging and analyzing the traffic internally without relying on the previous tricks, especially because you may not be able to run “mitm attacks” against a production server under heavy load without slowing everything down.
Logs are your best friend…
Logging
Most of the time, logging just works and you shouldn’t worry. But under heavy load, we can approach logging by:
- Don’t log at all and only use metrics instead. Or,
- Send the logs through the network: logging locally puts your server and your code under extra load, and you may need code specially designed to handle it.
- Avoid logging to a disk that you use for something else: don’t put extra load on a disk that other tasks depend on.
- If you want to write your local logs yourself, please ensure that you rotate them. You could use RotatingFileHandler, but logrotate is better (see the sketch after this list).
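A minimal sketch of the network and rotation options from the list above (the file name, size limits and collector address are made up):

import logging
from logging.handlers import RotatingFileHandler, SocketHandler

logger = logging.getLogger("app")

# local file, rotated so it never grows unbounded
logger.addHandler(RotatingFileHandler("app.log",
                                      maxBytes=10 * 1024 * 1024,
                                      backupCount=5))

# or ship the records to a remote collector instead of the local disk
logger.addHandler(SocketHandler("logs.example.com", 9020))

logger.warning("something happened")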
Obviously, how you approach any logging problem will depend on how often you log, how important the logs are, and how big they are. In most cases you can just log and forget that it exists.