Validate JSON data using schemas

From RidgeRun Developer Wiki
Revision as of 14:19, 21 March 2024 by Anavarro

Introduction

JSON files are a great way to expose configuration options in an application without having to change the code. However, we or our users might enter data that the application does not expect, so the input needs to be validated. One way to do this is programmatically, by checking that certain keys are present and hold values of the right type. For more complex configurations, where some attributes are optional or nested, this quickly becomes difficult to maintain.
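
To illustrate how quickly hand-written checks grow, here is a minimal sketch of programmatic validation for a hypothetical configuration with a required name (string), a required age (number), and an optional address object. The function name check_person and the field layout are assumptions made for illustration only.

```python
def check_person(data):
    """Return a list of problems found in a hypothetical 'person' config."""
    if not isinstance(data, dict):
        return ["top-level value must be an object"]

    problems = []

    # Required keys: each one needs a presence check and a type check
    if "name" not in data:
        problems.append("missing 'name'")
    elif not isinstance(data["name"], str):
        problems.append("'name' must be a string")

    if "age" not in data:
        problems.append("missing 'age'")
    elif isinstance(data["age"], bool) or not isinstance(data["age"], (int, float)):
        # bool is a subclass of int in Python, so it is excluded explicitly
        problems.append("'age' must be a number")

    # Optional keys are checked only when present, and nested objects
    # need their own round of checks
    if "address" in data:
        address = data["address"]
        if not isinstance(address, dict):
            problems.append("'address' must be an object")
        else:
            for key in ("street", "city", "state"):
                if key in address and not isinstance(address[key], str):
                    problems.append(f"'address.{key}' must be a string")

    return problems


print(check_person({"name": "Joe", "address": {"city": 5}}))
# ["missing 'age'", "'address.city' must be a string"]
```

Every new attribute adds another branch, and the function still does not reject unknown keys or handle nested lists.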

One alternative is to use schemas, as defined by the JSON Schema specification: https://json-schema.org/specification

Let's look at a simple example.

Dependencies

We are going to use Python's jsonschema library. Install it with:

pip install jsonschema

Validators exist for other languages as well. See the list at https://json-schema.org/implementations#validators

File: schema.json

{
    "type": "object",
    "required": ["name", "age"],
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "number"},
        "children": {
            "type": "array",
            "items": { "$ref": "#"}
        },
        "address": { "$ref": "#/$defs/address"}
    },
    "additionalProperties": false,
    "$defs": {
        "address": {
            "type": "object",
            "properties": {
                "street": {"type": "string"},
                "city": {"type": "string"},
                "state": {"type": "string"}
            },
            "additionalProperties": false
        }
    }
}

The schema describes a "person" object with the properties name (string), age (number), children (an array whose items are also persons, through the recursive reference "$ref": "#"), and address, an object defined in the $defs section with the properties street, city, and state (all strings). For a person, name and age are required, and setting "additionalProperties": false rejects any property that is not listed in the schema.

Let's define an object to validate:

File: data.json

{
    "name": "Joe",
    "age": 25,
    "address": {
        "street": "123 Main St",
        "city": "Springfield",
        "state": "IL"
    }
}

Now let's write the Python code to validate it and save it as validate.py:

import json

from jsonschema import ValidationError, validate

# Load the schema and the data to validate
with open('schema.json', 'r') as f:
    schema = json.load(f)

with open('data.json', 'r') as f:
    data = json.load(f)

try:
    # validate() raises ValidationError if data does not conform to schema
    validate(data, schema)
    print("Data is valid")
except ValidationError as e:
    print("Data is invalid")
    print(e)

As expected, if we run this script the data is reported as valid:

python3 validate.py 
Data is valid

However, if we edit data.json and remove the "age" property, the validation fails with a message like:

Data is invalid
'age' is a required property

Failed validating 'required' in schema:
    {'$defs': {'address': {'additionalProperties': False,
                           'properties': {'city': {'type': 'string'},
                                          'state': {'type': 'string'},
                                          'street': {'type': 'string'}},
                           'type': 'object'}},
     'additionalProperties': False,
     'properties': {'address': {'$ref': '#/$defs/address'},
                    'age': {'type': 'number'},
                    'children': {'items': {'$ref': '#'}, 'type': 'array'},
                    'name': {'type': 'string'}},
     'required': ['name', 'age'],
     'type': 'object'}

On instance:
    {'address': {'city': 'Springfield',
                 'state': 'IL',
                 'street': '123 Main St'},
     'name': 'Joe'}

Validation also fails if we change the type of a property, for example by setting name to the integer 1:

Data is invalid
1 is not of type 'string'

Failed validating 'type' in schema['properties']['name']:
    {'type': 'string'}

On instance['name']:
    1

There are many more keywords and validations available. To learn more, visit https://json-schema.org/specification
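
As a small taste of those extra validations, the sketch below tightens a reduced version of the person schema with a few standard JSON Schema keywords: minLength, minimum, maximum, and enum. The specific limits, the top-level state property, and the list of allowed states are illustrative assumptions, not part of the schema above. The helper is_valid is likewise a hypothetical name.

```python
from jsonschema import ValidationError, validate

# Illustrative constraints; the limits and allowed states are assumptions
schema = {
    "type": "object",
    "required": ["name", "age"],
    "properties": {
        "name": {"type": "string", "minLength": 1},
        "age": {"type": "integer", "minimum": 0, "maximum": 150},
        "state": {"enum": ["IL", "CA", "NY"]},
    },
}


def is_valid(data):
    """Return True if data conforms to the schema above, False otherwise."""
    try:
        validate(data, schema)
        return True
    except ValidationError:
        return False


print(is_valid({"name": "Joe", "age": 25, "state": "IL"}))  # True
print(is_valid({"name": "", "age": 25}))                     # False: empty name
print(is_valid({"name": "Joe", "age": -3}))                  # False: age below minimum
print(is_valid({"name": "Joe", "age": 25, "state": "ZZ"}))  # False: state not in enum
```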