Serialize arbitrary objects to YAML with Python
When working with YAML files in Python, you are probably using the PyYAML library. It is handy for storing configuration files, nested structures or sequence-like data. A nice feature is its ability to dump and load arbitrary Python objects as well, not only the basic ints, strs, floats and lists. In this article we are going to customize this serialization process and adapt the resulting format to our needs.
Suppose we have a User model written with the help of another great library named attrs:
import attr
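A minimal sketch of what such a model could look like. The field names name and age are taken from the YAML output shown later in the article; the rest of the class (no validators, no defaults) is an assumption, not necessarily the original definition.

```python
import attr


@attr.s
class User:
    # Field names follow the YAML records shown later in the article;
    # the absence of validators and defaults is an assumption.
    name = attr.ib()
    age = attr.ib()


user = User(name='Misha', age=24)
print(user)
```

attrs generates __init__, __repr__ and comparison methods for us, so two users with the same fields compare equal.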
We can create our own store as a list of users and serialize it to (or deserialize it from) a file:
users = []
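Here is a rough sketch of such a store, assuming a User class with name and age fields. It dumps to a string to keep the example self-contained; passing an open file as the second argument of yaml.dump writes to that file instead. The exact layout of the output depends on the PyYAML version.

```python
import attr
import yaml


@attr.s
class User:
    # Assumed model: two plain attributes, as in the article's output.
    name = attr.ib()
    age = attr.ib()


users = []
users.append(User(name='Misha', age=24))
users.append(User(name='Bob', age=42))

# With no stream argument, yaml.dump returns the document as a string.
text = yaml.dump(users)
print(text)
```

Keep in mind that a document containing !!python/object tags can only be read back with the full, unsafe loader (for example yaml.load with Loader=yaml.Loader), which should never be used on untrusted input.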
It’s pretty neat to have such a convenient way of storing any Python object in a file. But look at the output:
- !!python/object:__main__.User {age: 24, name: Misha}
That’s not what I expected to see, and it can’t be easily edited by hand without accidentally messing up its structure. What we actually want is something like this:
- name: Misha
Stepping back and storing our users as regular dictionaries doesn’t help either:
users = [{'name': 'Misha', 'age': 24}, {'name': 'Bob', 'age': 42}]
The output still differs slightly from the desired one:
- {age: 24, name: Misha}
That is easy to fix by passing additional parameters to the dump function:
yaml.dump(users, default_flow_style=False, indent=2)
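To see the effect of these parameters, here is a small self-contained example using the same list of dictionaries. Note that keys are sorted alphabetically by default, so age comes before name. Newer PyYAML releases (5.1 and later) already default to block style, so passing default_flow_style=False explicitly mainly matters for older versions.

```python
import yaml

users = [{'name': 'Misha', 'age': 24}, {'name': 'Bob', 'age': 42}]

# Block style: one key per line instead of inline {age: 24, name: Misha}.
text = yaml.dump(users, default_flow_style=False, indent=2)
print(text)
# - age: 24
#   name: Misha
# - age: 42
#   name: Bob
```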
At this point we’ve got nicely formatted output. You can also provide the default_style argument. YAML flow scalars come in three styles: double-quoted, single-quoted and plain (unquoted). Pass default_style='"' if you want the items in your output to be quoted.
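A quick illustration of the quoting styles, using a record with string values only. This is just a sketch: with a quoted default_style, non-string values such as integers are written with an explicit tag (for example !!int), which is why the example sticks to strings.

```python
import yaml

record = {'name': 'Misha'}

# Plain (unquoted) scalars are the default.
plain = yaml.dump(record, default_flow_style=False)

# Force double-quoted or single-quoted scalars for keys and values.
double_quoted = yaml.dump(record, default_flow_style=False, default_style='"')
single_quoted = yaml.dump(record, default_flow_style=False, default_style="'")

print(plain, double_quoted, single_quoted, sep='')
```

All three documents load back to the same dictionary; only the presentation differs.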
But that only works for regular dictionaries, and we wanted such nice output for our custom objects. Each record still carries a type definition, !!python/object:__main__.User. We can shift the responsibility for loading an instance from the library to the class itself. This way our class knows how to create its instances from regular dicts, and the resulting file is clean and human-readable. We could implement a to_dict method on our class, but fortunately attrs has its own attr.asdict function.
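For illustration, this is how attr.asdict and the reverse direction could look, assuming the User model has just the name and age fields seen in the output above.

```python
import attr


@attr.s
class User:
    # Assumed model with the two fields used throughout the article.
    name = attr.ib()
    age = attr.ib()


user = User(name='Misha', age=24)

# Object -> plain dict, ready to be dumped without any Python-specific tag.
data = attr.asdict(user)
print(data)

# Plain dict -> object: the class builds its own instances from dicts.
restored = User(**data)
print(restored)
```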
The next thing we need is our own Dumper class that will be passed to yaml.dump when serializing User objects. This class is a somewhat complex thing that inherits from the Emitter, Serializer, Representer and Resolver bases. The only one we care about is Representer, which decides how our object will look in the target output.
from yaml.representer import SafeRepresenter
We need to define our own represent method, which creates a dict from our object and then invokes the existing represent_mapping implementation. This is the place where you can provide your own logic or call something like an obj.serialize method.
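A possible shape for such a representer is sketched below. The method name represent_user and the use of the standard map tag are assumptions made for illustration; the original listing may differ in its details.

```python
import attr
from yaml.representer import SafeRepresenter


@attr.s
class User:
    # Assumed model with the two fields used throughout the article.
    name = attr.ib()
    age = attr.ib()


class UserRepresenter(SafeRepresenter):
    def represent_user(self, user):
        # Turn the object into a dict and reuse the stock mapping logic.
        # The standard map tag keeps !!python/object out of the output.
        return self.represent_mapping('tag:yaml.org,2002:map',
                                      attr.asdict(user))


# A representer produces a node tree; the emitter turns it into text later.
node = UserRepresenter().represent_user(User(name='Misha', age=24))
print(node.tag)
```

The resulting node is an ordinary mapping node, so the emitter has no way to tell it apart from a dumped dictionary.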
Then we declare our dumper class, which is just a subclass of both SafeDumper and our UserRepresenter. The safe dumper restricts serialization to simple Python objects, unlike the regular one, which is as powerful as pickle.
from functools import partial
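Putting the pieces together could look roughly like this. The names UserDumper and dump_users, the way the representer is registered, and the use of partial to pre-bind the dumper are all assumptions for the sake of a runnable sketch, not necessarily the original code.

```python
from functools import partial

import attr
import yaml
from yaml.dumper import SafeDumper
from yaml.representer import SafeRepresenter


@attr.s
class User:
    # Assumed model with the two fields used throughout the article.
    name = attr.ib()
    age = attr.ib()


class UserRepresenter(SafeRepresenter):
    def represent_user(self, user):
        # Represent a User as a plain mapping built from its attributes.
        return self.represent_mapping('tag:yaml.org,2002:map',
                                      attr.asdict(user))


class UserDumper(UserRepresenter, SafeDumper):
    """A SafeDumper that additionally knows how to represent User."""


# Registered on UserDumper only; the stock SafeDumper stays untouched.
UserDumper.add_representer(User, UserDumper.represent_user)

# Hypothetical convenience wrapper with our dumper and style pre-bound.
dump_users = partial(yaml.dump, Dumper=UserDumper, default_flow_style=False)

users = [User(name='Misha', age=24), User(name='Bob', age=42)]
text = dump_users(users)
print(text)
```

Because the dumper is based on SafeDumper, any other unregistered custom object raises an error instead of silently leaking a !!python/object tag into the file.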
With the code above in place, let’s check how it works with our users storage:
users = [User(name='Misha', age=24), User(name='Bob', age=42)]
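The loading side is not shown above, so here is one way it might be done, assuming the file contains plain mappings with name and age keys. Since the document no longer carries Python-specific tags, yaml.safe_load is enough, and the class rebuilds its instances from the resulting dicts.

```python
import attr
import yaml


@attr.s
class User:
    # Assumed model with the two fields used throughout the article.
    name = attr.ib()
    age = attr.ib()


# A hand-editable document in the clean format we were aiming for.
text = """\
- name: Misha
  age: 24
- name: Bob
  age: 42
"""

# safe_load yields plain dicts; the class turns each one into a User.
users = [User(**record) for record in yaml.safe_load(text)]
print(users)
```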
Here we go! Now we can dump, edit and load our users seamlessly within a YAML file, and we can set up the same process for any object we want. I’m going to leave you with a picture of this cat and its pet. Have a good weekend!