Parsing Nested JSON Records in Python
JSON is the typical format web services use for message passing, and it is also relatively human-readable. Despite being more readable than most alternatives, JSON objects can be quite complex. When analyzing complex JSON data in Python, there is no clear, general method for extracting information (see here for a tutorial on working with JSON data in Python). This post provides a solution for the case where you know the path through the nested JSON to the desired information.
Motivating Example
Suppose you have a JSON record structured as follows.
This record has two keys at the top level: employees and firm. The value of the employees key is a list of two objects with the same schema; each object has the keys name, role, and nbr. The value of the firm key is an object with the keys name and location.
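For concreteness, here is one record that matches this description, written as a JSON string and parsed with the standard-library json module. Only the key layout and the employee names Alice and Bob come from the post; the role, nbr, firm name, and location values are hypothetical placeholders.

```python
import json

# Hypothetical record: the key layout follows the description above,
# but role, nbr, the firm's name and its location are placeholders.
raw = """
{
  "employees": [
    {"name": "Alice", "role": "dev", "nbr": 1},
    {"name": "Bob", "role": "dev", "nbr": 2}
  ],
  "firm": {"name": "Example Corp", "location": "Springfield"}
}
"""

record = json.loads(raw)  # parses into nested dicts and lists
print(sorted(record))     # top-level keys: ['employees', 'firm']
```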
Suppose you want to extract the names of the employees. This record causes problems for approaches that simply search for a key name, since they return the name of the firm as well.
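To make the problem concrete, here is a minimal sketch of such a key-name search. The helper find_values_by_key is hypothetical, written only for illustration, and the record uses placeholder values for everything except the employee names.

```python
from typing import Any, List


def find_values_by_key(node: Any, key: str) -> List[Any]:
    """Hypothetical naive search: collect every value stored under `key`, at any depth."""
    found: List[Any] = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            found.extend(find_values_by_key(v, key))  # keep searching below
    elif isinstance(node, list):
        for item in node:
            found.extend(find_values_by_key(item, key))
    return found


record = {
    "employees": [{"name": "Alice"}, {"name": "Bob"}],
    "firm": {"name": "Example Corp"},  # placeholder firm name
}

print(find_values_by_key(record, "name"))  # ['Alice', 'Bob', 'Example Corp']
```

The search has no notion of where a key sits in the structure, so the firm's name is mixed in with the employees' names.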
Solution
Calling the extract_element_from_json function on the above record along the path employees, then name, delivers the desired result: a list containing ‘Alice’ and ‘Bob’.
Under the Hood
This function descends into the record(s) in obj following the keys specified in path to retrieve the desired information. When the value of a key in path is a list, the function splits and continues descending into each element of that list in a depth-first manner. This is how both ‘Alice’ and ‘Bob’ are returned: since the value of employees is a list, the traversal splits on both of its elements, and each value of name is appended to the output list.
If obj is a single dictionary/JSON record, this function returns a list containing the desired information; if obj is a list of dictionaries/JSON records, it returns a list of lists, one per record.
If any element of path is missing from the corresponding level of the nested dictionary/JSON, this function returns None.
The full function was inspired by what’s discussed here.
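A minimal sketch of a function with the behavior described above follows. It is written for this explanation, not taken from the original listing, and it makes two assumptions: path is a list of key strings, and "returns None" for a missing key is interpreted as recording None in the output at the point where that branch of the path breaks off. The record uses the same hypothetical placeholder values as before.

```python
from typing import Any, List


def extract_element_from_json(obj: Any, path: List[str]) -> list:
    """Sketch: follow the keys in `path` through nested JSON, splitting on lists.

    Assumes `path` is a list of key strings; a missing key is recorded as None.
    """

    def extract(node: Any, ind: int, out: list) -> None:
        if ind == len(path):
            # Every key has been consumed: `node` is a desired value.
            out.append(node)
        elif isinstance(node, list):
            # A list before the path ends: continue on each element in order,
            # which makes the traversal depth-first.
            for item in node:
                extract(item, ind, out)
        elif isinstance(node, dict) and path[ind] in node:
            # Descend one level along the path.
            extract(node[path[ind]], ind + 1, out)
        else:
            # Missing key, or a scalar where an object was expected.
            out.append(None)

    def extract_record(record: Any) -> list:
        out: list = []
        extract(record, 0, out)
        return out

    if isinstance(obj, list):
        # A list of records yields one result list per record.
        return [extract_record(record) for record in obj]
    return extract_record(obj)


record = {
    "employees": [
        {"name": "Alice", "role": "dev", "nbr": 1},
        {"name": "Bob", "role": "dev", "nbr": 2},
    ],
    "firm": {"name": "Example Corp", "location": "Springfield"},  # placeholders
}

print(extract_element_from_json(record, ["employees", "name"]))  # ['Alice', 'Bob']
print(extract_element_from_json([record, record], ["firm", "location"]))  # [['Springfield'], ['Springfield']]
print(extract_element_from_json(record, ["employees", "salary"]))  # [None, None]
```

Checking whether the path is exhausted before testing for a list means a list stored at the end of the path is returned whole rather than split into its elements.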
Update
This post is featured in Issue #374 of PyCoder’s Weekly.