It's hard to answer without any examples, but in general you can use jsonschema for this.
Here's a metaschema definition in YAML:
"$schema": http://json-schema.org/draft-07/schema
title: Metaschema for BigQuery fields definition schemas
description: "See also: https://cloud.google.com/bigquery/docs/schemas"
type: array
minItems: 1
uniqueItems: true
items:
  "$id": "#/items"
  title: Single field definition schema
  type: object
  examples:
  - name: Item_Name
    type: STRING
    mode: NULLABLE
    description: Name of catalog item
  - name: Item_Category
    type: STRING
    mode: REQUIRED
  - name: Exchange_Rate
    type: NUMERIC
  additionalProperties: false
  required:
  - name
  - type
  properties:
    name:
      "$id": "#/items/properties/name"
      title: Name of field
      description: "See also: https://cloud.google.com/bigquery/docs/schemas#column_names"
      type: string
      minLength: 1
      maxLength: 128
      pattern: "^[a-zA-Z_]+[a-zA-Z0-9_]*$"
      examples:
      - Item_Name
      - Exchange_Rate
    description:
      "$id": "#/items/properties/description"
      title: Description of field
      description: "See also: https://cloud.google.com/bigquery/docs/schemas#column_descriptions"
      type: string
      maxLength: 1024
    type:
      "$id": "#/items/properties/type"
      title: Name of BigQuery data type
      description: 'See also: https://cloud.google.com/bigquery/docs/schemas#standard_sql_data_types'
      type: string
      enum:
      - INTEGER
      - FLOAT
      - NUMERIC
      - BOOL
      - STRING
      - BYTES
      - DATE
      - DATETIME
      - TIME
      - TIMESTAMP
      - GEOGRAPHY
    mode:
      "$id": "#/items/properties/mode"
      title: Mode of field
      description: 'See also: https://cloud.google.com/bigquery/docs/schemas#modes'
      type: string
      default: NULLABLE
      enum:
      - NULLABLE
      - REQUIRED
      - REPEATED
This is the most precise metaschema I've been able to derive from the GCP docs. Structures and arrays are not supported here, though.
YAML is used here only for readability; you can easily convert it to JSON if needed.
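If JSON is preferred, the conversion could look roughly like this. It's only a minimal sketch: the inline YAML string is an abbreviated stand-in for the full metaschema, and the variable names are my own.

```python
import json

import yaml  # provided by the PyYAML package

# Abbreviated stand-in for the full metaschema, inlined so the sketch
# runs on its own; in practice you would read the real YAML file instead.
metaschema_yaml = """
type: array
minItems: 1
items:
  type: object
  required:
  - name
  - type
"""

# Parse the YAML into plain Python objects, then serialize them as JSON.
metaschema = yaml.safe_load(metaschema_yaml)
metaschema_json = json.dumps(metaschema, indent=2)
print(metaschema_json)
```

The resulting JSON text can be stored in a file and later loaded with json.load() instead of yaml.safe_load().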
Assuming the metaschema from above is saved as "/path/to/metaschema.yaml", usage looks like this:
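import json
from pathlib import Path

import jsonschema
import yaml

# Load the metaschema (the YAML document above) into plain Python objects.
metaschema = yaml.safe_load(Path("/path/to/metaschema.yaml").read_text())

# Parse the BigQuery schema to check; here it comes from an inline JSON string.
schema = json.loads("""[{"name": "foo", "type": "STRING"}]""")

# Raises jsonschema.exceptions.ValidationError if the schema is invalid.
jsonschema.validate(schema, metaschema)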
The yaml module used above is provided by the PyYAML package.
If the schema is valid, the jsonschema.validate() function simply returns. Otherwise, jsonschema.exceptions.ValidationError is raised with an explanation of the error.
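If you'd rather see every problem at once instead of a single exception, something like the following might work. It's a sketch under assumptions: the metaschema dict is an abbreviated stand-in for the full one, and collect_errors is a hypothetical helper name, not part of jsonschema. Draft7Validator and iter_errors() are the library's own.

```python
import jsonschema

# Abbreviated stand-in for the full metaschema (assumption: enough
# keywords to demonstrate error reporting, not the complete definition).
metaschema = {
    "type": "array",
    "minItems": 1,
    "items": {
        "type": "object",
        "additionalProperties": False,
        "required": ["name", "type"],
        "properties": {
            "name": {"type": "string", "pattern": "^[a-zA-Z_]+[a-zA-Z0-9_]*$"},
            "type": {"type": "string", "enum": ["STRING", "NUMERIC", "INTEGER"]},
            "mode": {"type": "string", "enum": ["NULLABLE", "REQUIRED", "REPEATED"]},
        },
    },
}


def collect_errors(schema):
    """Hypothetical helper: return one message per validation problem."""
    validator = jsonschema.Draft7Validator(metaschema)
    return [
        f"{list(error.absolute_path)}: {error.message}"
        for error in validator.iter_errors(schema)
    ]


# The first field has an invalid name, the second one lacks "type".
bad_schema = [{"name": "1foo", "type": "STRING"}, {"name": "bar"}]
errors = collect_errors(bad_schema)
for line in errors:
    print(line)

# The plain validate() call reports only one of the problems.
try:
    jsonschema.validate(bad_schema, metaschema)
except jsonschema.ValidationError as exc:
    print("validate() raised:", exc.message)
```

Each reported line starts with the path to the offending element, which helps when the schema has many fields.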
It's up to you whether to use JSON or YAML and how to store and parse schemas.
It's also up to you whether to normalize the names of types and modes to uppercase or lowercase; note that the metaschema above accepts only the uppercase names.
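If you decide to accept lowercase names, one option is to normalize them before validating. A minimal sketch, where normalize_case is a hypothetical helper of my own and not something from jsonschema or BigQuery:

```python
def normalize_case(fields):
    """Return a copy of the field list with `type` and `mode` uppercased."""
    normalized = []
    for field in fields:
        field = dict(field)  # shallow copy so the caller's data is untouched
        for key in ("type", "mode"):
            # Only touch string values; anything else is left for the
            # metaschema validation to reject.
            if isinstance(field.get(key), str):
                field[key] = field[key].upper()
        normalized.append(field)
    return normalized


schema = [{"name": "foo", "type": "string", "mode": "nullable"}]
normalized = normalize_case(schema)
print(normalized)
```

The normalized list can then be passed to jsonschema.validate() in place of the original schema.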