The AvscReader class calls .read() to accomplish the following two things:
- read the .avsc file, parse it into a JSON object, and append it to a list stored in the reader
- build a namespace tree where each node in the tree has a name, a dictionary of file objects, and a dictionary of child nodes
  - each node represents a namespace; in the RecordWithNestedUnion test example, the namespace tree that is ultimately built consists of a root node, a child of the root node named records, and a child of records named nested
Once the namespace tree has been built, it is passed to the AvroWriter class. This class traverses the tree and writes the information in each node's file objects to Python files as Python classes.
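As a rough illustration of the tree and the traversal described above, the sketch below models a node as a small dataclass and walks the tree depth-first, yielding the relative path each file object could be written to. The names Node, File, and iter_output_paths, and the layout of one .py module per file object inside a directory per namespace, are assumptions made for this example; the actual AvscReader and AvroWriter classes may be structured differently.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Iterator


@dataclass
class File:
    """Hypothetical stand-in for a file object: one generated Python class."""
    name: str


@dataclass
class Node:
    """Hypothetical namespace node: a name, its files, and its child nodes."""
    name: str = ""
    files: dict[str, File] = field(default_factory=dict)
    children: dict[str, Node] = field(default_factory=dict)


def iter_output_paths(node: Node, parent: PurePosixPath = PurePosixPath()) -> Iterator[PurePosixPath]:
    """Walk the tree depth-first and yield one .py path per file object."""
    # The root node has an empty name, so it adds no directory level.
    here = parent / node.name if node.name else parent
    for file in node.files.values():
        yield here / f"{file.name}.py"
    for child in node.children.values():
        yield from iter_output_paths(child, here)


# Tree from the RecordWithNestedUnion example: root -> records -> nested.
nested = Node(
    "nested",
    files={n: File(n) for n in ("NestedUnion", "NestedUnion2", "CommonReference")},
)
records = Node(
    "records",
    files={"RecordWithNestedUnion": File("RecordWithNestedUnion")},
    children={"nested": nested},
)
root = Node(children={"records": records})

paths = [str(p) for p in iter_output_paths(root)]
```

Keeping files and children in dictionaries keyed by name makes it cheap to look up or create a namespace while the tree is being built, and the writer only needs a simple recursive walk afterwards.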
AvscReader is given an .avsc file to parse. Below is an example of the contents of RecordWithNestedUnion.avsc:

```json
{"type": "record", "name": "RecordWithNestedUnion", "namespace": "records", "fields": [
  {"name": "nestedUnion", "type": ["null",
    {"type": "record", "name": "NestedUnion", "namespace": "records.nested", "fields": [
      {"name": "categories", "type": ["null",
        {"type": "array", "items": {"type": "record", "name": "CommonReference", "fields": [
          {"name": "group", "type": "int"},
          {"name": "isApproved", "type": ["null", "boolean"],
            "default": null
          },
          {"name": "index", "type": ["null", "int"]}
        ]
        }
        }
      ],
        "default": null
      }
    ]
    }
  ],
    "default": null
  },
  {"name": "nestedUnion2", "type": ["null",
    {"type": "record", "name": "NestedUnion2", "namespace": "records.nested", "fields": [
      {"name": "categories2", "type": ["null",
        {
          "type": "array",
          "items": "CommonReference"
        }
      ],
        "default": null
      }
    ]
    }
  ],
    "default": null
  }
]
}
```

- Each element of type record or enum in the .avsc file has its own .avsc file associated with it. Therefore, each element of type record or enum will generate a Python file and needs to be added as a file object to one of the nodes in the namespace tree
- AvscReader creates these files in breadth-first search order
  - in the example above, AvscReader creates the file objects for RecordWithNestedUnion, NestedUnion, NestedUnion2, and CommonReference, in that order
- As AvscReader traverses each element in the file, it adds each record it comes across to a queue to be processed later
  - the queue is initially populated with whatever AvscReader parsed. In the case of the above example, the queue is initially populated with RecordWithNestedUnion
- In general, the overall flow of building the namespace tree is as follows:
  - create empty root_node
  - populate queue
  - until queue is empty:
    - grab the first item in the queue
    - get the namespace of the item
    - create or set current_node to namespace
    - create skeleton file for the item
    - if item is a record
      - traverse its fields
        - for each field in fields
          - get type of field
            - get type of nested elements if needed (for arrays, unions, etc.)
              - create that nested element if necessary
          - create that field
          - add the field to file
          - if field was a record, add it to the queue
    - if item is an enum
      - add enum symbols to file
    - add file to current_node
  - set file_tree attribute in AvscReader to root_node
- The resulting namespace tree after reading RecordWithNestedUnion.avsc is structured as follows:
  - root_node
    - name=''
    - files={}
    - children={Node<'records'>}
  - records
    - name='records'
    - files={File<'RecordWithNestedUnion'>}
    - children={Node<'nested'>}
  - nested
    - name='nested'
    - files={File<'NestedUnion'>, File<'NestedUnion2'>, File<'CommonReference'>}
    - children={}
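
The breadth-first flow and the resulting tree described above can be sketched in a few dozen lines. This is a simplified illustration, not the project's implementation: Node, build_tree, and nested_named_types are hypothetical names, file objects are reduced to a plain name-to-kind mapping, field creation is omitted, and a nested type without its own namespace is assumed to inherit the namespace of the enclosing record, as the Avro specification describes. The CommonReference fields are also abbreviated.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical namespace node; files maps a type name to its kind."""
    name: str = ""
    files: dict[str, str] = field(default_factory=dict)
    children: dict[str, Node] = field(default_factory=dict)


def nested_named_types(avro_type):
    """Yield record/enum definitions reachable from a field type, without
    descending into a nested record's own fields."""
    if isinstance(avro_type, list):              # union: check every branch
        for branch in avro_type:
            yield from nested_named_types(branch)
    elif isinstance(avro_type, dict):
        kind = avro_type.get("type")
        if kind in ("record", "enum"):
            yield avro_type
        elif kind == "array":
            yield from nested_named_types(avro_type["items"])
        elif kind == "map":
            yield from nested_named_types(avro_type["values"])
    # Plain strings are primitives or references to already-defined types.


def build_tree(schemas):
    """Build the namespace tree breadth-first; return (root, creation order)."""
    root, order = Node(), []
    queue = deque((schema, "") for schema in schemas)
    while queue:
        item, inherited = queue.popleft()
        namespace = item.get("namespace", inherited)
        # Create or look up the node for each part of the dotted namespace.
        current = root
        for part in filter(None, namespace.split(".")):
            current = current.children.setdefault(part, Node(part))
        current.files[item["name"]] = item["type"]
        order.append(item["name"])
        if item["type"] == "record":
            for fld in item["fields"]:
                for nested in nested_named_types(fld["type"]):
                    queue.append((nested, namespace))
    return root, order


common_reference = {"type": "record", "name": "CommonReference", "fields": [
    {"name": "group", "type": "int"}]}
schema = {"type": "record", "name": "RecordWithNestedUnion", "namespace": "records", "fields": [
    {"name": "nestedUnion", "type": ["null", {
        "type": "record", "name": "NestedUnion", "namespace": "records.nested", "fields": [
            {"name": "categories", "type": ["null", {"type": "array", "items": common_reference}]}]}]},
    {"name": "nestedUnion2", "type": ["null", {
        "type": "record", "name": "NestedUnion2", "namespace": "records.nested", "fields": [
            {"name": "categories2", "type": ["null", {"type": "array", "items": "CommonReference"}]}]}]},
]}

root, order = build_tree([schema])
```

Because nested records are appended to the back of the queue rather than processed immediately, both NestedUnion and NestedUnion2 are handled before CommonReference, which matches the creation order given for the example.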
- the current node represents the namespace or directory of the .avsc file
- the File class represents the contents and meta information of the .avsc file
- the item in the queue is the contents of an individual .avsc file
  - everything that gets added to the queue will have a Python file created for it
- get_field_type identifies the type of the element being parsed (str, arrays, unions, etc.)
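
To illustrate the kind of decision a function like get_field_type has to make, the sketch below classifies a raw Avro type as it appears in a parsed schema. The function classify_avro_type and its return labels are hypothetical and written only for this example; they are not the project's get_field_type, whose signature and return values may differ.

```python
# Type names taken from the Avro specification.
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}
COMPLEX = {"record", "enum", "array", "map", "fixed"}


def classify_avro_type(avro_type):
    """Return a coarse label for a raw Avro type from a parsed schema."""
    if isinstance(avro_type, list):
        return "union"                      # e.g. ["null", "int"]
    if isinstance(avro_type, dict):
        kind = avro_type["type"]
        if isinstance(kind, str) and kind in COMPLEX:
            return kind                     # e.g. {"type": "array", ...}
        return classify_avro_type(kind)     # e.g. {"type": "string"}
    if avro_type in PRIMITIVES:
        return "primitive"
    return "reference"                      # name of a previously defined type


labels = [
    classify_avro_type(t)
    for t in (
        "int",
        ["null", "boolean"],
        {"type": "array", "items": "CommonReference"},
        "CommonReference",
    )
]
```

In the example schema, a string such as CommonReference in categories2 refers to a record that was defined earlier, so it is labeled as a reference rather than as a new record to enqueue.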