cookbase.parsers
Module contents
A package of parsing tools used by the Cookbase platform.
Submodules
cookbase.parsers.jsonfoodex
Parsing suite for the Cookbase platform that converts FoodEx2 data into JSON documents.
The main command, parsexml, translates FoodEx2 XML data losslessly into a collection
of JSON documents. It can also filter out and discard selected hierarchies, together
with the terms that belong only to those hierarchies. Field contents are parsed into
Python built-in types (str, int and bool). The original ordering and format are
preserved, but the JSON mapping has a few particularities worth noting:
- The JSON output represents the content of the root <catalogue> tag.
- The <hierarchyGroups> tag is mapped into a JSON object that holds an array with the text from each contained <hierarchyGroup> tag.
- The <hierarchyAssignment> tag is mapped into a JSON object whose key is the <hierarchyCode> tag content and whose value is a JSON document including all its data.
- The <implicitAttribute> tag is mapped into a JSON object whose key is the <attributeCode> tag content and whose value is an array with the text from each contained <attributeValue> tag.
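The three tag mappings listed above can be illustrated with the standard library's xml.etree.ElementTree. This is a minimal sketch, not the jsonfoodex source: the XML fragment is a hypothetical, simplified stand-in for FoodEx2 data, the helper names are illustrative, and every value is kept as a string rather than converted to int or bool as the real parser does.

```python
import json
import xml.etree.ElementTree as ET
from typing import Any, Dict, List

# Hypothetical, simplified FoodEx2-like fragment (not real catalogue data).
SAMPLE = """<term>
  <hierarchyGroups>
    <hierarchyGroup>expo</hierarchyGroup>
    <hierarchyGroup>report</hierarchyGroup>
  </hierarchyGroups>
  <hierarchyAssignment>
    <hierarchyCode>report</hierarchyCode>
    <parentCode>A0B9Z</parentCode>
    <order>1</order>
  </hierarchyAssignment>
  <implicitAttribute>
    <attributeCode>F01</attributeCode>
    <attributeValue>A0001</attributeValue>
    <attributeValue>A0002</attributeValue>
  </implicitAttribute>
</term>"""


def map_hierarchy_groups(elem: ET.Element) -> List[str]:
    # <hierarchyGroups> -> array with the text of each <hierarchyGroup>
    return [g.text or "" for g in elem.findall("hierarchyGroup")]


def map_hierarchy_assignment(elem: ET.Element) -> Dict[str, Dict[str, Any]]:
    # <hierarchyAssignment> -> {<hierarchyCode>: {child tag: text, ...}}
    code = elem.findtext("hierarchyCode")
    return {code: {child.tag: child.text for child in elem}}


def map_implicit_attribute(elem: ET.Element) -> Dict[str, List[str]]:
    # <implicitAttribute> -> {<attributeCode>: [text of each <attributeValue>]}
    code = elem.findtext("attributeCode")
    return {code: [v.text or "" for v in elem.findall("attributeValue")]}


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    term = {
        "hierarchyGroups": map_hierarchy_groups(root.find("hierarchyGroups")),
        "hierarchyAssignment": map_hierarchy_assignment(root.find("hierarchyAssignment")),
        "implicitAttribute": map_implicit_attribute(root.find("implicitAttribute")),
    }
    print(json.dumps(term, indent=2))
```

Keying assignments and attributes by their codes lets a consumer look up a given hierarchy or attribute directly instead of scanning a list.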
The -d/--discardedhierarchies option lets the user choose which hierarchies to discard
(including the terms that are related only to them) by providing a list of hierarchy
codes. If the option is not used, all hierarchies not directly related to food
preparation are discarded by default: botanic, pest, biomo, legis, feed, partcon,
place, vetdrug, report, fpurpose, replev, targcon and feedAddExpo. To keep every
hierarchy, pass the -d/--discardedhierarchies option without any hierarchy codes.
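The discarding rule described above can be sketched as a filter over already-parsed terms. This assumes, hypothetically, that each term is a dict whose "hierarchyAssignment" field maps hierarchy codes to assignment data (as in the mapping described earlier); the function name is illustrative and is not part of cookbase.

```python
from typing import Any, Dict, Iterable, List, Optional, Set

# Default hierarchies listed in the documentation above.
DEFAULT_DISCARDED: Set[str] = {
    "botanic", "pest", "biomo", "legis", "feed", "partcon", "place",
    "vetdrug", "report", "fpurpose", "replev", "targcon", "feedAddExpo",
}


def discard_hierarchies(
    terms: Iterable[Dict[str, Any]],
    discarded: Optional[Set[str]] = None,
) -> List[Dict[str, Any]]:
    """Drop discarded hierarchies and the terms that belong only to them."""
    if discarded is None:  # option not used: fall back to the defaults
        discarded = DEFAULT_DISCARDED
    kept = []
    for term in terms:
        assignments = term.get("hierarchyAssignment", {})
        remaining = {
            code: data for code, data in assignments.items() if code not in discarded
        }
        if assignments and not remaining:
            # Every hierarchy of this term is discarded, so drop the term too.
            continue
        kept.append({**term, "hierarchyAssignment": remaining})
    return kept
```

Passing an empty set mirrors using -d with no codes: nothing is discarded, because only None triggers the default list.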
The -cb/--cookbase flag makes the parser generate, for each catalogue term, an
identifier (_id) suitable for the Cookbase platform.
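The option semantics above (a default list when -d is omitted, an empty list when -d is given alone, and a boolean -cb flag) map naturally onto argparse. This is a hedged reconstruction of the command-line interface, not the module's actual parser: the program name and the xml_path positional argument are hypothetical, and the hierarchize subcommand's options are omitted.

```python
import argparse

DEFAULT_DISCARDED = [
    "botanic", "pest", "biomo", "legis", "feed", "partcon", "place",
    "vetdrug", "report", "fpurpose", "replev", "targcon", "feedAddExpo",
]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="jsonfoodex")  # hypothetical name
    commands = parser.add_subparsers(dest="command", required=True)

    parsexml = commands.add_parser("parsexml")
    parsexml.add_argument("xml_path")  # hypothetical positional argument
    # nargs="*": omitted -> default list; "-d" alone -> [] (discard nothing)
    parsexml.add_argument(
        "-d", "--discardedhierarchies", nargs="*", default=DEFAULT_DISCARDED
    )
    # store_true: Cookbase "_id" values are generated only when the flag is given
    parsexml.add_argument("-cb", "--cookbase", action="store_true")

    commands.add_parser("hierarchize")  # options omitted in this sketch
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Because -d accepts zero or more values, the XML path should come before it on the command line; otherwise argparse would collect the path as a hierarchy code.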
The hierarchize command builds a JSON document describing a hierarchy tree.
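One way to build such a tree is to group terms by parent code and then recurse from the top-level terms. This is a sketch under assumptions: the input format (term code mapped to parent code, with None marking top-level terms) and the output shape ({"code", "children"} nodes) are hypothetical and not taken from the hierarchize implementation.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional


def hierarchize(parents: Dict[str, Optional[str]]) -> List[Dict[str, Any]]:
    """Build nested {"code", "children"} nodes from a term -> parent mapping."""
    children: Dict[Optional[str], List[str]] = defaultdict(list)
    for code, parent in parents.items():
        children[parent].append(code)  # preserves the input ordering

    def build(code: str) -> Dict[str, Any]:
        return {"code": code, "children": [build(c) for c in children[code]]}

    # Terms whose parent is None are the roots of the tree.
    return [build(root) for root in children[None]]
```

The sketch assumes the parent links form a forest; a cycle in the input would cause unbounded recursion.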
cookbase.parsers.termcode
A module that generates numeric identifiers from FoodEx2 term code strings and
translates between the two representations. A term code is a string of five
alphanumeric characters, e.g. 'A111J'. Although most term codes start with an A,
this module does not impose that restriction.
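A five-character alphanumeric code can be mapped to a unique integer by reading it as a base-36 number. The sketch below shows that bijection; base 36 is an assumption chosen for illustration, and the numeric scheme actually used by cookbase.parsers.termcode may differ. The function names are hypothetical.

```python
import string

ALPHABET = string.digits + string.ascii_uppercase  # "0-9A-Z": 36 symbols
CODE_LENGTH = 5


def code_to_int(code: str) -> int:
    """Read a five-character term code as a base-36 number (assumed scheme)."""
    code = code.upper()
    if len(code) != CODE_LENGTH or any(c not in ALPHABET for c in code):
        raise ValueError(f"not a five-character alphanumeric term code: {code!r}")
    return int(code, 36)


def int_to_code(number: int) -> str:
    """Inverse of code_to_int, zero-padded to five characters."""
    if not 0 <= number < 36 ** CODE_LENGTH:
        raise ValueError(f"number out of range for a term code: {number}")
    digits = []
    for _ in range(CODE_LENGTH):
        number, remainder = divmod(number, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))
```

Nothing in this encoding privileges a leading A, which matches the module's stated lack of that restriction.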
cookbase.parsers.utils
cookbase.parsers.utils.check_for_duplicate_keys(ordered_pairs: List[Tuple[Hashable, Any]]) → Dict[KT, VT]
Checks for duplicate keys in a JSON object.
The function is intended to be used as the object_pairs_hook argument of the json.load() function.
Parameters: ordered_pairs (list[tuple[Hashable, Any]]) – A list of key-value pairs representing the whole content of a JSON object
Returns: A dictionary containing the JSON document
Return type: dict[str, Any]
Raises: ValueError – There is at least one duplicate key in the JSON object.
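The hook mechanism works because the json module calls object_pairs_hook with the raw list of key-value pairs for every decoded object, before duplicates would be silently collapsed into a dict. Below is a minimal stand-in written to match the documented behaviour; it is a sketch, not the package's source.

```python
import json
from typing import Any, Dict, Hashable, List, Tuple


def check_for_duplicate_keys(ordered_pairs: List[Tuple[Hashable, Any]]) -> Dict[Hashable, Any]:
    """Build a dict from the pairs, refusing any key that appears twice."""
    result: Dict[Hashable, Any] = {}
    for key, value in ordered_pairs:
        if key in result:
            raise ValueError(f"Duplicate key in JSON object: {key!r}")
        result[key] = value
    return result


if __name__ == "__main__":
    # Nested objects are checked too, since the hook runs for every object.
    json.loads('{"a": {"c": 1, "c": 2}}', object_pairs_hook=check_for_duplicate_keys)
```

json.load() accepts the same keyword, so the hook applies equally when reading from a file.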
-
cookbase.parsers.utils.parse_cbr(path: str) → Dict[str, Any]
Parses a Cookbase Recipe (CBR).
Parameters: path (str) – The path to the CBR document
Returns: A dictionary containing the parsed CBR
Return type: dict[str, Any]
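A function with this signature could be sketched as a JSON load guarded against duplicate keys. This assumes, without confirmation from this page, that a CBR document is stored as a UTF-8 JSON file; parse_cbr_sketch and _reject_duplicates are hypothetical names, and the real parse_cbr may perform additional steps.

```python
import json
from typing import Any, Dict, List, Tuple


def _reject_duplicates(pairs: List[Tuple[str, Any]]) -> Dict[str, Any]:
    # Same idea as an object_pairs_hook duplicate check: compare key counts.
    keys = [key for key, _ in pairs]
    if len(keys) != len(set(keys)):
        raise ValueError("duplicate key in JSON object")
    return dict(pairs)


def parse_cbr_sketch(path: str) -> Dict[str, Any]:
    """Load a CBR document, assuming it is a JSON file (hypothetical)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f, object_pairs_hook=_reject_duplicates)
```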
-
cookbase.parsers.utils.populate_collection(collection_dir: str, object_type: str) → None
Bulk inserts CBDM objects into collections.
Parameters:
- collection_dir (str) – The local path to the directory containing the objects to insert
- object_type (str) – The type of object to insert into the collection
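A bulk insert of this kind can be sketched as "read every object file in the directory, then insert them in one call". The sketch assumes, hypothetically, one JSON document per *.json file and a MongoDB-style database where db[object_type] returns a collection exposing insert_many (as pymongo's Database and Collection do); the mapping from object_type to a collection name is an illustration, not the documented behaviour.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Mapping


def load_objects(collection_dir: str) -> List[Dict[str, Any]]:
    """Read one JSON document per *.json file, in file-name order (assumption)."""
    docs = []
    for path in sorted(Path(collection_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            docs.append(json.load(f))
    return docs


def populate_collection_sketch(db: Mapping[str, Any], collection_dir: str, object_type: str) -> None:
    """Insert all objects found in collection_dir into db[object_type]."""
    docs = load_objects(collection_dir)
    if docs:  # pymongo's insert_many does not accept an empty list
        db[object_type].insert_many(docs)
```

A single insert_many call sends the documents in bulk rather than issuing one insert per object.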