Introducing Jdict Module in Python

Providing attribute access to Python dictionary entries

Acknowledgment

Thanks to Asher Sterkin, BlackSwan Technologies SVP of Engineering, who proposed the idea for this project and provided invaluable guidance on its development. The project is an offshoot of BlackSwan’s work developing a Cloud AI Operating System, or CAIOS, which is intended to provide 10x productivity improvements when coding for cloud/serverless environments.

Problem Statement

JavaScript has advantages over native Python when it comes to accessing attribute values in a dictionary object. In this article, we will demonstrate how to achieve the same level of usability and performance in Python as with JavaScript.

JavaScript Dictionary Access

With JavaScript, key/value pairs can be accessed directly from a dictionary object either through the indexer or as a property of the object.

var dict = {FirstName: "Chris", "one": 1, 1: "some value"};
// using indexer
var name = dict["FirstName"];
// as property
var name = dict.FirstName;

In other words, in JavaScript, one can use dict.x, dict['x'], or dict[y] where y = 'x' interchangeably.

Python Dictionary

In Python, object attributes are accessed with obj.attr notation, but this notation does not work for dictionary keys.

In a dictionary, you can get a value in the following ways:

dict = {"Name": "Chris", "Age": 25}
dict['Name']
dict[x] where x = 'Name'
dict.get('Name', default) or dict.get(x, default)
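The difference from JavaScript can be checked directly. The following is a minimal sketch with hypothetical sample data: the three access forms listed above return the same value, while attribute-style access on a built-in dict raises AttributeError.

```python
# Hypothetical sample data, used only for illustration.
person = {"Name": "Chris", "Age": 25}

# The three access forms listed above all return the same value.
key = "Name"
by_literal = person["Name"]
by_variable = person[key]
by_get = person.get("Name", "unknown")

# Attribute-style access is not supported by the built-in dict.
try:
    person.Name
except AttributeError as exc:
    attribute_error = str(exc)

print(by_literal, by_variable, by_get)
print(attribute_error)
```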

Web API and Configuration Files

In Python, almost all external documents are converted into dictionaries from one of these formats: JSON/YAML configuration files, messages exchanged via Web APIs, or AWS Lambda events. XML is also sometimes used.

AWS SDK

Our team often has to work with deeply nested data, such as responses from the AWS SDK or the event parameter of a Lambda function handler.

{
    'Buckets': [{
        'Name': 'string',
        'CreationDate': datetime(2015, 1, 1)
    }],
    'Owner': {
        'DisplayName': 'string',
        'ID': 'string'
    }
}

Code Write/Read Speed Optimization

The problem is work efficiency. For example, by our estimate, JavaScript dot notation requires only 75% of the writing and reading overhead of Python's bracket notation (one dot character versus two brackets and two quotes per access).
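The overhead is easiest to see on a nested structure. The sketch below reads two values from a response shaped like the AWS SDK result shown earlier; the bucket and owner values are hypothetical. Each level of nesting costs a pair of brackets and a pair of quotes, where dot notation would need a single character.

```python
from datetime import datetime

# Hypothetical response with the same shape as an AWS SDK bucket listing.
response = {
    "Buckets": [{"Name": "my-bucket", "CreationDate": datetime(2015, 1, 1)}],
    "Owner": {"DisplayName": "chris", "ID": "abc123"},
}

# Native Python: brackets and quotes at every level.
first_bucket_name = response["Buckets"][0]["Name"]
owner_name = response["Owner"]["DisplayName"]

# The JavaScript-style equivalent would read:
#   response.Buckets[0].Name
#   response.Owner.DisplayName
print(first_bucket_name, owner_name)
```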

Attribute Access

In order to provide non-trivial access to attributes in Python, one has to implement two magic methods: __getattr__ and __setattr__.

Based on the discussion above, we need to extend the behavior of the existing dict class with these two magic methods. The adapter design pattern accomplishes this task. There are two options to consider: Object Adapter or Class Adapter.

Evaluating Object Adapter

Applying the Object Adapter design pattern means wrapping the original dict object with an external one and implementing the required magic methods.
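As an illustration of the idea, here is a minimal Object Adapter sketch. The class name DictWrapper and the _data attribute are hypothetical and are not taken from the jdict code base.

```python
from typing import Any


class DictWrapper:
    """Hypothetical object adapter: holds a dict and forwards attribute access to it."""

    def __init__(self, data: dict) -> None:
        # Bypass our own __setattr__ so the wrapped dict is stored on the instance.
        object.__setattr__(self, "_data", data)

    def __getattr__(self, name: str) -> Any:
        # Called only when normal attribute lookup fails.
        try:
            return self._data[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name: str, value: Any) -> None:
        # Attribute assignment writes through to the wrapped dict.
        self._data[name] = value


original = {"Name": "Chris"}
wrapped = DictWrapper(original)
wrapped.Age = 25

print(wrapped.Name, original["Age"])
```

Note that the wrapper is not a dict itself: operations such as len(wrapped) or wrapped["Name"] do not work until each of them is forwarded explicitly.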

Python collections.abc

One possibility is to implement the Mapping and MutableMapping abstractions from the collections.abc module, and then to add the __getattr__ and __setattr__ magic methods to them. Indeed, that was how the initial version of jdict was implemented.

This method turned out to be heavyweight and inefficient:

It required reproducing all the methods of the Python dictionary.
It behaved even worse when we needed to deal with deeply nested data structures.
To learn how we finally addressed nested structures, see the JSON hook and botocore patch sections below.
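The following sketch shows roughly what such an implementation involves. It is not the original first version of jdict; the class name MappingAdapter is hypothetical. MutableMapping requires five abstract methods before the two attribute methods can even be added, and a nested dictionary still comes back as a plain dict.

```python
from collections.abc import MutableMapping
from typing import Any, Iterator


class MappingAdapter(MutableMapping):
    """Hypothetical sketch of a collections.abc-based adapter."""

    def __init__(self, data: dict) -> None:
        object.__setattr__(self, "_data", data)

    # The five abstract methods that MutableMapping requires.
    def __getitem__(self, key: Any) -> Any:
        return self._data[key]

    def __setitem__(self, key: Any, value: Any) -> None:
        self._data[key] = value

    def __delitem__(self, key: Any) -> None:
        del self._data[key]

    def __iter__(self) -> Iterator:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)

    # The attribute-access methods added on top.
    def __getattr__(self, name: str) -> Any:
        try:
            return self._data[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name: str, value: Any) -> None:
        self._data[name] = value


adapter = MappingAdapter({"Owner": {"ID": "abc123"}})
nested = adapter.Owner  # still a plain dict, so nested.ID would fail
print(type(nested))
```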

UserDict

UserDict is another possible form of Object Adapter for a Python dictionary. In this case, it comes from the Python standard library.

Using this option does not offer any significant advantage, since:

After Python 2.2, it’s possible to inherit directly from the built-in dict class.
We would still have to implement the attribute-access magic methods ourselves.
It incurs the overhead of regular __getitem__, __setitem__ operations.
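A minimal sketch of the UserDict variant illustrates these points. The class name AttrUserDict is hypothetical. UserDict keeps the real dictionary in its data attribute, so both magic methods must treat that name specially, and every access goes through an extra layer.

```python
from collections import UserDict
from typing import Any


class AttrUserDict(UserDict):
    """Hypothetical UserDict-based adapter with attribute access."""

    def __getattr__(self, name: str) -> Any:
        # Guard 'data' to avoid infinite recursion if it is not set yet.
        if name == "data":
            raise AttributeError(name)
        try:
            return self.data[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name: str, value: Any) -> None:
        # UserDict.__init__ assigns self.data, which must stay a real attribute.
        if name == "data":
            object.__setattr__(self, name, value)
        else:
            self.data[name] = value


source = {"Name": "Chris"}
record = AttrUserDict(source)
record.Age = 25

print(record.Name, record["Age"])
```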

Named Tuples

Another idea was to make the dictionary behave like named tuples, which support attribute-level access.

This approach also turned out to be ineffective:

It created a complete copy of the original dictionary and thus was impractical from a performance point of view.
It did not solve the nested data structure problem.
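Both drawbacks can be reproduced with a short sketch. The Record type and the sample data are hypothetical.

```python
from collections import namedtuple

# Hypothetical sample data with one nested dictionary.
source = {"Name": "Chris", "Owner": {"ID": "abc123"}}

# Build a named tuple type from the keys and copy the values into it.
Record = namedtuple("Record", source.keys())
record = Record(**source)

# The tuple is a copy: later changes to the dictionary are not visible.
source["Name"] = "Alex"

# The nested value is still a plain dict without attribute access.
try:
    record.Owner.ID
    nested_access_works = True
except AttributeError:
    nested_access_works = False

print(record.Name, nested_access_works)
```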

Jdict Class Adapter

After completing our research, we came to the conclusion that applying the Class Adapter design pattern has the best potential.

The class adapter uses inheritance and can only extend the base class and supply additional functionality to it.

This is how our Class Adapter code looks:

from typing import Any
from copy import deepcopy
import json

class jdict(dict):
    """
    The class gives access to the dictionary through the attribute name.
    """

    def __getattr__(self, name: str) -> Any:
        # Called only when normal attribute lookup fails.
        try:
            return self.__getitem__(name)
        except KeyError:
            raise AttributeError(name + ' not in dict')

    def __setattr__(self, key: str, value: Any) -> None:
        # Attribute assignment is stored as a dictionary item.
        self.__setitem__(key, value)
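A short usage sketch follows. The class is repeated in compact form so that the snippet runs on its own, and the sample values are hypothetical. Because jdict inherits from dict, the instance remains a real dictionary and works with code that expects one, such as json.dumps.

```python
import json
from typing import Any


class jdict(dict):
    """Compact copy of the class above, repeated so the snippet is self-contained."""

    def __getattr__(self, name: str) -> Any:
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name + " not in dict") from None

    def __setattr__(self, key: str, value: Any) -> None:
        self[key] = value


person = jdict(Name="Chris", Age=25)
person.City = "Paris"  # stored as person["City"]

print(person.Name, person["City"])
print(json.dumps(person))
```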

__deepcopy__

def __deepcopy__(self, memo):
    return jdict((k, deepcopy(v, memo)) for k,v in self.items())

We also added the __deepcopy__ method to the adapter so that deepcopy() of a jdict object explicitly returns a jdict, preserving the advantage of attribute-level access in the copy.

from caios.jdict import jdict
import copy

py_dict = dict(a = [1, 2, 3], b = 7)
j_dict = jdict(a = [1, 2, 3], b = 7)
py_copy = copy.deepcopy(py_dict)
j_copy = copy.deepcopy(j_dict)

print(type(py_copy))
<class 'dict'>

print(type(j_copy))
<class 'caios.jdict.jdict.jdict'>

Dealing with nested data structures

While applying the Class Adapter design pattern turned out to be the optimal starting point, it still left open the question of how to deal with nested data structures. In other words, what should be done about a jdict that contains another dict?
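The problem can be reproduced in a few lines. In this sketch the class is repeated in compact form and the sample data is hypothetical: the outer object supports dot notation, but the inner dictionary is still a plain dict.

```python
from typing import Any


class jdict(dict):
    """Compact copy of the class adapter, repeated so the snippet is self-contained."""

    def __getattr__(self, name: str) -> Any:
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name + " not in dict") from None

    def __setattr__(self, key: str, value: Any) -> None:
        self[key] = value


response = jdict({"Owner": {"DisplayName": "chris", "ID": "abc123"}})
owner = response.Owner  # works: the outer object is a jdict

try:
    owner.ID  # fails: the inner object is a plain dict
    nested_attribute_access_works = True
except AttributeError:
    nested_attribute_access_works = False

print(type(owner), nested_attribute_access_works)
```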

In order to solve this problem, we need to consider separately JSON object deserialization and explicit creation of a dict somewhere in the underlying SDK.

JSON Decoding

When we receive data from external sources in JSON format, Python’s json module decodes every JSON object into a dict by default. The decoder, however, accepts a hook that changes this behavior:

An object_pairs_hook, if specified, will be called with the result of every JSON object decoded with an ordered list of pairs. The return value of object_pairs_hook will be used instead of the dict. This feature can be used to implement custom decoders. If object_hook is also defined, the object_pairs_hook takes priority.

Thus, we utilize this hook in order to create a jdict instead of a dict during JSON decoding. This approach covers 80% of the cases we deal with in practice.
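A minimal sketch of the hook in use is shown below. The class is repeated in compact form and the JSON text is hypothetical. Since the hook is called for every decoded JSON object, nested objects become jdict instances as well.

```python
import json
from typing import Any


class jdict(dict):
    """Compact copy of the class adapter, repeated so the snippet is self-contained."""

    def __getattr__(self, name: str) -> Any:
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name + " not in dict") from None

    def __setattr__(self, key: str, value: Any) -> None:
        self[key] = value


# Hypothetical JSON document with nested objects and a list of objects.
text = '{"Owner": {"DisplayName": "chris", "ID": "abc123"}, "Buckets": [{"Name": "my-bucket"}]}'

# The hook receives a list of (key, value) pairs, which the dict constructor accepts.
data = json.loads(text, object_pairs_hook=jdict)

print(data.Owner.ID, data.Buckets[0].Name)
```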

Botocore Patch

The object pairs hook mentioned above, however, does not help with the Boto3 SDK. The reason is that many AWS service APIs return XML instead of JSON, and the results are parsed by the BaseXMLResponseParser, which creates and populates dict objects directly.

Structure of Python

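Since the JSON hook does not help in this case, we need to look at automatically rewriting the Python code before it is compiled.

To understand how Python works and how we can solve this problem, let’s look at the full path of a program from source code to execution: the source is parsed into an abstract syntax tree (AST), the AST is compiled into bytecode, and the bytecode is executed by the interpreter.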

Abstract Syntax Tree (AST)

Given this path from source code to execution, we can solve the problem by rewriting the code at the AST level. By traversing the AST, we replace every regular dictionary with a jdict, so that the Boto3 SDK returns jdict objects, as required.

Below is the code of the class that walks through the abstract syntax tree and changes the Python dictionary to jdict.

import ast
from typing import Any

class jdictTransformer(ast.NodeTransformer):
    """
    The node visitor class that traverses the abstract syntax tree and
    calls the visitor function for each node found. Inherits from class
    NodeTransformer.
    """

    def visit_Module(self, node: Any) -> Any:
        # Transform the module body first, then prepend the jdict import.
        node = self.generic_visit(node)
        import_node = ast.ImportFrom(module='caios.jdict.jdict',
                                     names=[ast.alias(name='jdict')],
                                     level=0)
        node.body.insert(0, import_node)
        return node

    def visit_Dict(self, node: Any) -> Any:
        # Wrap every dict literal {...} in a call: jdict({...}).
        node = self.generic_visit(node)
        name_node = ast.Name(id='jdict', ctx=ast.Load())
        new_node = ast.Call(func=name_node, args=[node], keywords=[])
        return new_node
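To show the rewriting end to end, here is a self-contained sketch under a few assumptions. The transform helper and the DictToJdict class are hypothetical stand-ins, not the caios implementation. Instead of inserting an import statement, the sketch passes jdict to exec through the namespace, so it runs without the caios package installed. Newly created nodes carry no line numbers, so ast.fix_missing_locations is called before compiling.

```python
import ast
from typing import Any


class jdict(dict):
    """Compact copy of the class adapter, repeated so the snippet is self-contained."""

    def __getattr__(self, name: str) -> Any:
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name + " not in dict") from None


class DictToJdict(ast.NodeTransformer):
    """Hypothetical transformer: wraps every dict literal in a jdict(...) call."""

    def visit_Dict(self, node: ast.Dict) -> ast.AST:
        node = self.generic_visit(node)  # rewrite nested literals first
        return ast.Call(func=ast.Name(id="jdict", ctx=ast.Load()),
                        args=[node], keywords=[])


def transform(source: str) -> ast.Module:
    """Hypothetical helper: parse the source, rewrite it, and fill in locations."""
    tree = DictToJdict().visit(ast.parse(source))
    return ast.fix_missing_locations(tree)


# Hypothetical library code that builds its result from dict literals.
source = "def parse():\n    return {'Owner': {'ID': 'abc123'}}\n"

namespace = {"jdict": jdict}  # makes the name jdict visible to the rewritten code
exec(compile(transform(source), "<example>", "exec"), namespace)
result = namespace["parse"]()

print(type(result).__name__, result.Owner.ID)
```

Only dict literals are rewritten in this sketch; calls to dict() and dictionary comprehensions are left untouched.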

Patch Module

Using the AST transformer, we created a patch for the botocore module that converts XML responses to jdict at runtime:

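import sys

def patch_module(module: str) -> None:
    # The module must already be imported so that it is present in sys.modules.
    parsers = sys.modules[module]
    filename = parsers.__dict__['__file__']
    with open(filename) as source_file:
        src = source_file.read()
    # transform() is expected to apply jdictTransformer to the source.
    inlined = transform(src)
    code = compile(inlined, filename, 'exec')
    # Re-execute the rewritten code in the module's own namespace.
    exec(code, vars(parsers))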

In this case, we are patching the botocore parsers file.

import boto3
import caios.jdict

caios.jdict.patch_module('botocore.parsers')

Limitations of the method

There are several limitations to the method above:

Each jdict instance actually stores two dictionaries: the inherited one and another one in __dict__.
If a dictionary key is not a valid Python name, attribute-based access won’t work. Consider, for example, dict.created-at. It would have to be written either as dict['created-at'] or as dict.created_at (which would require a schema change).
Another limitation is encountered when a field name is the value of another variable. One could write dict[x] but not dict.x, because dict.x means dict['x'], not the value of the x variable.
If a dictionary contains a key with the name of a dict class method (e.g. keys), then accessing it via the dot notation will return the dict method, while accessing it via __getitem__ will return the dictionary value.
In other words, d.keys will not be equal to d['keys'].
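These limitations can be observed with a short sketch. The class is repeated in compact form and the keys are hypothetical.

```python
from typing import Any


class jdict(dict):
    """Compact copy of the class adapter, repeated so the snippet is self-contained."""

    def __getattr__(self, name: str) -> Any:
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name + " not in dict") from None

    def __setattr__(self, key: str, value: Any) -> None:
        self[key] = value


d = jdict({"keys": "value", "created-at": "2015-01-01"})

# A key that shadows a dict method: dot notation finds the method first.
method_wins = callable(d.keys)
item_value = d["keys"]

# A key that is not a valid Python name: d.created-at would parse as d.created - at.
created = d["created-at"]

# A key held in a variable: d.x looks up the literal key 'x', not the value of x.
x = "created-at"
via_variable = d[x]
literal_x_exists = hasattr(d, "x")

# The instance __dict__ exists alongside the inherited dictionary, but stays empty.
instance_dict = vars(d)

print(method_wins, item_value, created, via_variable, literal_x_exists, instance_dict)
```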

To Be Pursued

At the moment, our program does not handle configuration files such as YAML, since we do not need them yet. The program also does not support CSV and tables. We are currently developing a program that will work with AWS tables.

Third Party Libraries

While working on this project, we did not discover any suitable third-party libraries to utilize. At the time of final writing for this article, I did, in fact, encounter three possibilities.

In our project, we conceivably could use any of these options. All three are based on the idea of creating an adapter and overriding the dictionary functions in it. In addition, some of them add functionality that is not required for our work.

Conclusions

This effort demonstrates that dictionary objects can be managed nearly as efficiently in Python as in JavaScript. Our team believes it will noticeably increase productivity.
This effort simplified working with Python dictionaries. Now, we do not need to “break fingers” by typing endless sequences of brackets and quotes.
The JSON hook solved the nested data structures problem for JSON decoding. This covers 80% of the cases we encounter in practice.
The botocore patch solved the problem of results coming from the Boto3 SDK, which parses AWS service API results arriving in XML and builds dict objects on the spot.
If required, the same patch could be applied to other libraries.