How to complete API regression testing quickly and painlessly with the PyTest + JSON Schema bundle

Feb 5, 2024

Positive API usage scenarios can be tested quickly and easily with JSON Schema. In this article, we will discuss useful techniques and tricks that speed up the process, as well as sources of data for such testing. I’ll keep it beginner-friendly: if you want to integrate JSON schemas into your project quickly, online generators are not enough for you, and you don’t plan to dive into the documentation, welcome.

I have posted the examples from this article on GitLab (they were prepared for the PiterPy conference, hence the project’s name): https://gitlab.com/bp3_niva/piterpy-2023-sample

Hello, everyone! In narrow circles, I am widely known as Uncle Vova. This year marks 17 years since I started working in testing, and 7 of them have been at Maxilect, where I focus on test automation. My current role is a lead, but in my heart, I remain an expert in backend testing.

Why is all this necessary?

Testing is always a bottleneck in development, partly because it is always playing catch-up. That’s why in my articles I talk about how to do things as quickly as possible. Let’s talk about using JSON Schema in regression testing (effectively, in smoke testing).

I’ll start with a brief digression — just “for the little ones.” What is smoke testing?

I specialize in backend testing, i.e., API is my everything. I apply the JSON schema directly to verify that each API response corresponds to the expected result. However, the potential applications of the JSON schema are not limited to this. In fact, any JSON can be validated in this way, regardless of where it comes from — it could be messages from Kafka/Rabbit or even fields from a database. Often, writing a schema is much faster than pulling data from somewhere for comparison.

Unfortunately, online schema generators found on the internet do questionable things. So, I want to explain how to quickly prepare a JSON schema yourself, without using third-party tools. I really love Python and PyTest — so I’ll build my narrative around them.

I’ll be explaining using highly simplified examples. You can find their source code on GitLab: https://gitlab.com/bp3_niva/piterpy-2023-sample

By switching between branches, you’ll find all the mentioned sections.

About the example in detail

In real-life testing, we interact with some API that responds using JSON. However, for the purpose of simplicity, we don’t discuss everything under the server’s hood. We only need the JSON it sends. So, we will work with it as a string or a file.

Let’s assume the JSON contains information about a user: an identifier, full name, gender, date of birth, phone, etc. And let “result” be the field indicating that the user’s card was processed correctly.

answer = {
  "id": 1,
  "name": "Иван",
  "patronymic": "Петрович",
  "surname": "Белкин",
  "sex": "М",  # Cyrillic letter, matching the enum used in the schemas below
  "birth_date": "1999-06-06",
  "phone": "+79991234567",
  "passport": None,
  "result": "ok",
  "height": 1.76,
  "citizenship": True
}

For now, I’ve chosen the example so that it uses only primitive data types. As the narrative progresses, we’ll go from simple to complex, building more sophisticated schemas with non-trivial checks.

Let’s take the draft of a primitive schema:

schema_str = {
 "$schema": "http://json-schema.org/draft-07/schema#"
}

And let’s take a test:

from jsonschema import validate
from src.schema import schema_str
from src.test_json import answer

def test_this():
 validate(answer, schema_str)

Note that `validate` does not work with raw JSON text here: both the response and the schema are Python dictionaries. That is why the example above contains `None` and `True` (capitalized), although the actual API response would contain `null` and `true`. In this form, the test passes because the schema is still empty and therefore accepts anything.
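In a real test the response arrives as text, so it has to be parsed before validation. A minimal sketch of that step, assuming the body is already available as a string (`raw_body` is a made-up stand-in, not part of the sample project):

```python
import json

from jsonschema import validate

# Hypothetical raw response body; note the JSON literals null and true.
raw_body = '{"id": 1, "passport": null, "citizenship": true}'

# json.loads converts JSON text to Python objects: null -> None, true -> True.
answer = json.loads(raw_body)

# A schema with no constraints accepts any instance, so nothing is raised here.
validate(answer, {"$schema": "http://json-schema.org/draft-07/schema#"})
```

After parsing, the dictionary looks exactly like the hand-written `answer` above, which is why the examples can skip the HTTP part.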

Example 1. Type Checking

A JSON API response can contain various primitive types: string, boolean, integer, number, and null. (It may seem unusual that a “float” is called “number” here, but keep in mind that Python and JSON name their types differently.)

In our schema, we can specify that the ID is an integer; the name, patronymic, surname, gender, date of birth, phone, and result are strings; passport is null (in JSON terms); height is a number (the type that covers both integer and fractional values); and citizenship is boolean.

schema_str = {
 "$schema": "http://json-schema.org/draft-07/schema#",
 "type": "object",
 "properties": {
        "id": {"type": "integer"},
        "name": {"type": "string"},
        "patronymic": {"type": "string"},
        "surname": {"type": "string"},
        "sex": {"type": "string"},
        "birth_date": {"type": "string"},
        "phone": {"type": "string"},
        "passport": {"type": "null"},
        "result": {"type": "string"},
        "height": {"type": "number"},
        "citizenship": {"type": "boolean"}
    }
}

Here, we see that the entire schema has `"type": "object"`. We will come back to this a bit later.

A test using this schema runs without errors. However, if we put the height value in quotes in the API response (making it a string), the test fails with the error “'1.76' is not of type 'number'”. You can validate any field this way.
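The failure described above can be reproduced in a few lines. This is only a sketch with a trimmed-down schema; the helper name `error_message` is invented for the illustration:

```python
from jsonschema import ValidationError, validate

schema = {
    "type": "object",
    "properties": {"height": {"type": "number"}},
}


def error_message(instance):
    """Return the validation error text, or None if the instance is valid."""
    try:
        validate(instance, schema)
    except ValidationError as error:
        return error.message
    return None


ok_message = error_message({"height": 1.76})     # valid, so None
bad_message = error_message({"height": "1.76"})  # string instead of number
print(bad_message)
```

In a PyTest test you would normally let the exception propagate, since an uncaught `ValidationError` already fails the test with a readable message.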

The `validate` function raises a single error and does not report the others, even if the JSON contains several problems. Finding the rest usually takes several validation runs. In practice, fixing the first problem leads you to the others in most cases, and if it does not, further iterations will reveal them anyway.
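If sequential runs become tedious, the `jsonschema` library also offers validator classes with an `iter_errors` method that yields every problem found. The sketch below goes beyond what the article's tests use and works on a deliberately broken, made-up response:

```python
from jsonschema import Draft7Validator

schema = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "height": {"type": "number"},
    },
}

# Both fields have the wrong type on purpose.
broken = {"id": "1", "height": "1.76"}

validator = Draft7Validator(schema)
# iter_errors yields all problems instead of raising on one of them.
errors = sorted(validator.iter_errors(broken), key=lambda e: list(e.path))
for error in errors:
    print(list(error.path), error.message)
```

Each error carries a `path` attribute that points to the offending field, which helps when several fields fail at once.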

Next, let’s talk about more advanced validation.

Example 2. Value Checking

It’s important for us that “result” is always “ok.” This can be specified in the schema:

schema_str = {
 "$schema": "http://json-schema.org/draft-07/schema#",
 "type": "object",
 "properties": {
        "id": {"type": "integer"},
        "name": {"type": "string"},
        "patronymic": {"type": "string"},
        "surname": {"type": "string"},
        "sex": {"type": "string"},
        "birth_date": {"type": "string"},
        "phone": {"type": "string"},
        "passport": {"type": "null"},
        "result": {"type": "string", "const": "ok"},
        "height": {"type": "number"},
        "citizenship": {"type": "boolean"}
    }
}

If the API responds with “OOK,” the test will fail with an error.

Similarly, you can list the allowed values with an enumeration (`enum`). Gender is an easy example: male or female, written in the schema with the Cyrillic letters “М” and “Ж”:

schema_str = {
 "$schema": "http://json-schema.org/draft-07/schema#",
 "type": "object",
 "properties": {
        "id": {"type": "integer"},
        "name": {"type": "string"},
        "patronymic": {"type": "string"},
        "surname": {"type": "string"},
        "sex": {"type": "string", "enum": ["М", "Ж"]},
        "birth_date": {"type": "string"},
        "phone": {"type": "string"},
        "passport": {"type": "null"},
        "result": {"type": "string", "const": "ok"},
        "height": {"type": "number"},
        "citizenship": {"type": "boolean"}
    }
}

Before this check was introduced, the gender could be anything; now the test passes only if the gender is “М” or “Ж.” Note that these are Cyrillic letters: a Latin “M” looks the same but does not match.
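A short sketch of how `const` and `enum` behave, reduced to the two fields involved. The Cyrillic letters are written as escape sequences here only to make the difference from the Latin look-alike visible; the instances are invented for the illustration:

```python
from jsonschema import Draft7Validator

schema = {
    "type": "object",
    "properties": {
        # "\u041c" is Cyrillic "М" and "\u0416" is Cyrillic "Ж".
        "sex": {"type": "string", "enum": ["\u041c", "\u0416"]},
        "result": {"type": "string", "const": "ok"},
    },
}
validator = Draft7Validator(schema)

good = validator.is_valid({"sex": "\u0416", "result": "ok"})
bad_result = validator.is_valid({"sex": "\u0416", "result": "OOK"})
# A Latin "M" looks identical on screen but is a different character.
latin_m = validator.is_valid({"sex": "M", "result": "ok"})
```

`is_valid` returns a boolean instead of raising, which is handy for quick experiments; the tests themselves can keep using `validate`.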

Example 3. Minimum and Maximum

Let’s set boundaries for ID — let it not be equal to zero (i.e., let’s set its minimum value equal to 1).

schema_str = {
 "$schema": "http://json-schema.org/draft-07/schema#",
 "type": "object",
 "properties": {
        "id": {"type": "integer", "minimum": 1},
        "name": {"type": "string"},
        "patronymic": {"type": "string"},
        "surname": {"type": "string"},
        "sex": {"type": "string", "enum": ["М", "Ж"]},
        "birth_date": {"type": "string"},
        "phone": {"type": "string"},
        "passport": {"type": "null"},
        "result": {"type": "string", "const": "ok"},
        "height": {"type": "number"},
        "citizenship": {"type": "boolean"}
    }
}

If the API response contains ID=0, the test will fail with an error.

You can work with maximum values in the same way.

You can also specify exclusiveMinimum and exclusiveMaximum in a schema. These are variations of minimum and maximum, respectively, which do not include the specified values, i.e. work as more than / less than (not “more than or equal to” / “less than or equal to”).
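The difference between the inclusive and exclusive variants only shows at the boundary value. A minimal sketch comparing the two on the number 1:

```python
from jsonschema import Draft7Validator

inclusive = Draft7Validator({"type": "integer", "minimum": 1})
exclusive = Draft7Validator({"type": "integer", "exclusiveMinimum": 1})

# The boundary itself passes "minimum" but not "exclusiveMinimum".
boundary_inclusive = inclusive.is_valid(1)
boundary_exclusive = exclusive.is_valid(1)

# Anything strictly greater passes both.
above_inclusive = inclusive.is_valid(2)
above_exclusive = exclusive.is_valid(2)
```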

Example 4. Length Limit

Another handy thing is the length limit. We can specify that the phone number must have 12 characters (counting “+”).

schema_str = {
 "$schema": "http://json-schema.org/draft-07/schema#",
 "type": "object",
 "properties": {
        "id": {"type": "integer", "minimum": 1},
        "name": {"type": "string"},
        "patronymic": {"type": "string"},
        "surname": {"type": "string"},
        "sex": {"type": "string", "enum": ["М", "Ж"]},
        "birth_date": {"type": "string"},
        "phone": {"type": "string", "minLength": 12, "maxLength": 12},
        "passport": {"type": "null"},
        "result": {"type": "string", "const": "ok"},
        "height": {"type": "number"},
        "citizenship": {"type": "boolean"}
    }
}

You can also define a pattern using a regular expression. Let’s say you can specify the following conditions for a date:

schema_str = {
 "$schema": "http://json-schema.org/draft-07/schema#",
 "type": "object",
 "properties": {
        "id": {"type": "integer", "minimum": 1},
        "name": {"type": "string"},
        "patronymic": {"type": "string"},
        "surname": {"type": "string"},
        "sex": {"type": "string", "enum": ["М", "Ж"]},
        "birth_date": {"type": "string", "pattern": 
"^(19[789]|20[012])\\d-(0\\d|1[0-2])-([0-2]\\d|3[01])$"},
        "phone": {"type": "string", "minLength": 12, "maxLength": 12},
        "passport": {"type": "null"},
        "result": {"type": "string", "const": "ok"},
        "height": {"type": "number"},
        "citizenship": {"type": "boolean"}
    }
}

If, as a result of some error, the API response contains month 16, the test will fail.

Note that validating dates with a regular expression is not best practice: it does not account for leap years or month lengths, for example. However, in this specific case, the regular expression helps catch changes in the data format, such as the day coming first and the year last.
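For comparison, here is a sketch of a different technique that the article does not use: the standard `"format": "date"` keyword combined with the library's `FormatChecker`. Format checks are skipped unless a checker is passed explicitly, so treat this as an optional extra rather than a replacement for the pattern:

```python
from jsonschema import Draft7Validator, FormatChecker

pattern_schema = {
    "type": "string",
    "pattern": "^(19[789]|20[012])\\d-(0\\d|1[0-2])-([0-2]\\d|3[01])$",
}
format_schema = {"type": "string", "format": "date"}

pattern_validator = Draft7Validator(pattern_schema)
# Without format_checker the "format" keyword is not enforced.
format_validator = Draft7Validator(format_schema, format_checker=FormatChecker())

impossible = "2023-02-29"  # 2023 is not a leap year

pattern_accepts = pattern_validator.is_valid(impossible)
format_accepts = format_validator.is_valid(impossible)
real_date_ok = format_validator.is_valid("1999-06-06")
```

The pattern accepts the impossible date because it only checks the shape of the string, while the format check rejects it as a calendar date.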

Example 5. List of Fields

In fact, JSON is an object with certain fields, and the schema allows you to check the presence or absence of these fields.

Let’s say we lost the date field in the response. The object will remain valid in its current form. To make fields mandatory, use the `required` keyword. In it we list the fields that must be present in a valid object:

schema_str = {
 "$schema": "http://json-schema.org/draft-07/schema#",
 "type": "object",
 "required": [
        "id",
        "name",
        "patronymic",
        "surname",
        "sex",
        "birth_date",
        "phone",
        "passport",
        "result",
        "height",
        "citizenship"
    ],
 "properties": {
        "id": {"type": "integer", "minimum": 1},
        "name": {"type": "string"},
        "patronymic": {"type": "string"},
        "surname": {"type": "string"},
        "sex": {"type": "string", "enum": ["М", "Ж"]},
        "birth_date": {"type": "string", "pattern":
            "^(19[789]|20[012])\\d-(0\\d|1[0-2])-([0-2]\\d|3[01])$"},
        "phone": {"type": "string", "minLength": 12, "maxLength": 12},
        "passport": {"type": "null"},
        "result": {"type": "string", "const": "ok"},
        "height": {"type": "number"},
        "citizenship": {"type": "boolean"}
    },
 "additionalProperties": False
}

Another useful keyword is `additionalProperties`, which determines whether the object may contain fields that the schema does not list. By default it is true, meaning that additional fields are allowed. If you set it to false, validation rejects any field not described in the schema.

Interestingly, on my current project, `additionalProperties` is set to true at the developers’ request. In this setup, I update the tests as part of normal work once I receive a task to change the list of properties (true keeps their builds green while the task is in progress). However, I’ve also seen the opposite. On another project, this keyword was set to false because the developers wanted the tests to turn red whenever the properties changed. So, in part, it’s a matter of preference. To avoid confusion, I recommend specifying `additionalProperties` explicitly, even when the value matches the default.
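To see both keywords in action, here is a sketch with a two-field schema. The extra field `nickname` and the helper `messages` are made up for the illustration:

```python
from jsonschema import Draft7Validator

schema = {
    "type": "object",
    "required": ["id", "name"],
    "properties": {
        "id": {"type": "integer"},
        "name": {"type": "string"},
    },
    "additionalProperties": False,
}
validator = Draft7Validator(schema)


def messages(instance):
    """Collect the text of every validation error for the instance."""
    return [error.message for error in validator.iter_errors(instance)]


# "name" is listed in "required" but absent from the response.
missing = messages({"id": 1})
# "nickname" is not described in "properties".
extra = messages({"id": 1, "name": "Иван", "nickname": "vanya"})
complete = messages({"id": 1, "name": "Иван"})
```

With `"additionalProperties": True` (or with the keyword omitted) the second instance would pass, which is the behavior the developers asked for on the current project.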

Example 6. Arrays

Suppose we have a function that returns a list of entities, and each test object is an element in that list. JSON schema allows us to work with the minimum and maximum number of items in the list (minItems and maxItems).

You probably don’t expect the list to return 0 objects, so minItems can be set to 1. And maxItems allows validating default values for the number of objects per page, for example, when testing an API with pagination. Typically, API responses for lists default to returning 20 items, which fit on one page on the frontend. When you click the next page button, the API provides another 20 elements. Thus, maxItems can be set to 20 — ensuring that pagination works correctly (so that a developer from another project, for example, doesn’t change this value to something more familiar, like 10 or 50).
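A sketch of the pagination check described above, assuming a page size of 20 and using placeholder objects instead of full user records:

```python
from jsonschema import Draft7Validator

# Assumed page size of 20, as in the pagination scenario above.
page_schema = {
    "type": "array",
    "minItems": 1,
    "maxItems": 20,
    "items": {"type": "object"},
}
validator = Draft7Validator(page_schema)

full_page = [{"id": n} for n in range(1, 21)]  # exactly 20 items

empty_is_valid = validator.is_valid([])
full_is_valid = validator.is_valid(full_page)
oversized_is_valid = validator.is_valid(full_page + [{"id": 21}])
```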

Returning to the example, the type is now set to array, and its items are the objects we discussed earlier. I set minItems to 1 because an empty array is not expected. To illustrate how other properties work, I added the “driver_license” field, a list of license categories (category “B” in the sample data):

schema_str = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "array",
    "minItems": 1,
    "items": {
        "type": "object",
        "required": [
            "id",
            "name",
            "patronymic",
            "surname",
            "sex",
            "birth_date",
            "phone",
            "passport",
            "result",
            "height",
            "citizenship"
        ],
        "properties": {
            "id": {"type": "integer", "minimum": 1},
            "name": {"type": "string"},
            "patronymic": {"type": "string"},
            "surname": {"type": "string"},
            "sex": {"type": "string", "enum": ["М", "Ж"]},
            "birth_date": {"type": "string", "pattern":
                "^(19[789]|20[012])\\d-(0\\d|1[0-2])-([0-2]\\d|3[01])$"},
            "phone": {"type": "string", "minLength": 12, "maxLength": 12},
            "passport": {"type": "null"},
            "result": {"type": "string", "const": "ok"},
            "height": {"type": "number", "exclusiveMaximum": 3},
            "citizenship": {"type": "boolean"},
            "driver_license": {
                "type": "array",
                "items": {"type": "string", "enum": ["A", "B", "C", "D", "E"]},
                "uniqueItems": True
            }
        },
        "additionalProperties": False
    }
}

Note that the new “driver_license” field is not listed as required. Thus, all objects mentioned before (without this field) remain valid.

Here, I specified that the field is an array whose elements are strings from the list “A”, “B”, “C”, “D”, “E” and that they must be unique (`uniqueItems`). This means the test will fail if the response contains category “Q”, a mistyped “BB”, or the same category twice.
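The nested array can be checked in isolation. A minimal sketch with the `driver_license` sub-schema and a few made-up values:

```python
from jsonschema import Draft7Validator

license_schema = {
    "type": "array",
    "items": {"type": "string", "enum": ["A", "B", "C", "D", "E"]},
    "uniqueItems": True,
}
validator = Draft7Validator(license_schema)

single = validator.is_valid(["B"])
empty = validator.is_valid([])            # no minItems here, so empty is fine
unknown = validator.is_valid(["Q"])       # not in the enum
typo = validator.is_valid(["BB"])         # not in the enum either
repeated = validator.is_valid(["B", "B"])  # violates uniqueItems
```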

Example 7. Alternative Values

I started my explanation by emphasizing that we would create JSON schemas without using online generators because they often struggle when the type of the same field varies from record to record.

Consider a simple example. In some cases, including in Russia, middle names may not be specified in passports. Therefore, we cannot make this field mandatory. However, if we do choose to record its existence, the field can contain either a real string or null.

answer = [
    {
        "id": 1,
        "name": "Иван",
        "patronymic": "Петрович",
        "surname": "Белкин",
        "sex": "М",
        "birth_date": "1999-06-06",
        "phone": "+79991234567",
        "passport": None,
        "result": "ok",
        "height": 1.76,
        "citizenship": True,
        "driver_license": ["B"]
    },
    {
        "id": 2,
        "name": "Программист",
        "patronymic": None,
        "surname": "Мамкин",
        "sex": "М",
        "birth_date": "2002-04-10",
        "phone": "+79991232367",
        "passport": None,
        "result": "ok",
        "height": 1.82,
        "citizenship": False,
        "driver_license": []
    }
]

If you pass such a list to an online generator, it will create a schema based on the first type it encounters. For example, the “patronymic” field will be inferred as a string. Online generators typically come with their own validators, and this list will fail validation because the patronymic in the second element is null. This creates an absurd situation: the generator cannot validate its own input against the schema it has just produced.

By creating the schema manually, you can specify the type using a list [“string”, “null”] for the “patronymic” field:

schema_str = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "array",
    "minItems": 1,
    "items": {
        "type": "object",
        "required": [
            "id",
            "name",
            "patronymic",
            "surname",
            "sex",
            "birth_date",
            "phone",
            "passport",
            "result",
            "height",
            "citizenship"
        ],
        "properties": {
            "id": {"type": "integer", "minimum": 1},
            "name": {"type": "string"},
            "patronymic": {"type": ["string", "null"]},
            "surname": {"type": "string"},
            "sex": {"type": "string", "enum": ["М", "Ж"]},
            "birth_date": {"type": "string", "pattern":
                "^(19[789]|20[012])\\d-(0\\d|1[0-2])-([0-2]\\d|3[01])$"},
            "phone": {"type": "string", "minLength": 12, "maxLength": 12},
            "passport": {"type": "null"},
            "result": {"type": "string", "const": "ok"},
            "height": {"type": "number", "exclusiveMaximum": 3},
            "citizenship": {"type": "boolean"},
            "driver_license": {
                "type": "array",
                "items": {"type": "string", "enum": ["A", "B", "C", "D", "E"]},
                "uniqueItems": True
            }
        },
        "additionalProperties": False
    }
}

In this case, the list will pass the validation.
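A reduced sketch of the same idea shows how a type list interacts with `required`: the key must be present, but its value may be a string or null. The instances are invented for the illustration:

```python
from jsonschema import Draft7Validator

schema = {
    "type": "object",
    "required": ["patronymic"],
    "properties": {"patronymic": {"type": ["string", "null"]}},
}
validator = Draft7Validator(schema)

with_value = validator.is_valid({"patronymic": "Петрович"})
with_null = validator.is_valid({"patronymic": None})
wrong_type = validator.is_valid({"patronymic": 42})
# null is allowed as a value, but the key itself is still mandatory.
key_missing = validator.is_valid({})
```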

Online generators have another problem: they put too much redundant information into the schema, which makes it harder to maintain. That’s why I insist on writing JSON schemas by hand. They are easier to read and modify, and they look neater.

Another alternative I want to cover in this example concerns the ID. Let’s assume that the ID in the response can be either numeric or a string like ID002. That is, it is either an integer with a minimum of 1 or a string that matches a certain pattern. These conditions can be expressed with `anyOf`, which lists all the possible options:

schema_str = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "array",
    "minItems": 1,
    "items": {
        "type": "object",
        "required": [
            "id",
            "name",
            "patronymic",
            "surname",
            "sex",
            "birth_date",
            "phone",
            "passport",
            "result",
            "height",
            "citizenship"
        ],
        "properties": {
            "id": {"anyOf": [{"type": "integer", "minimum": 1},
                             {"type": "string", "pattern": "^ID\\d{3}$"}]},
            "name": {"type": "string"},
            "patronymic": {"type": ["string", "null"]},
            "surname": {"type": "string"},
            "sex": {"type": "string", "enum": ["М", "Ж"]},
            "birth_date": {"type": "string", "pattern":
                "^(19[789]|20[012])\\d-(0\\d|1[0-2])-([0-2]\\d|3[01])$"},
            "phone": {"type": "string", "minLength": 12, "maxLength": 12},
            "passport": {"type": "null"},
            "result": {"type": "string", "const": "ok"},
            "height": {"type": "number", "exclusiveMaximum": 3},
            "citizenship": {"type": "boolean"},
            "driver_license": {
                "type": "array",
                "items": {"type": "string", "enum": ["A", "B", "C", "D", "E"]},
                "uniqueItems": True
            }
        },
        "additionalProperties": False
    }
}
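The `anyOf` rule for the ID can be tried out on its own. A small sketch with a handful of made-up identifiers:

```python
from jsonschema import Draft7Validator

id_schema = {
    "anyOf": [
        {"type": "integer", "minimum": 1},
        {"type": "string", "pattern": "^ID\\d{3}$"},
    ]
}
validator = Draft7Validator(id_schema)

numeric_id = validator.is_valid(2)
string_id = validator.is_valid("ID002")
zero_id = validator.is_valid(0)           # integer, but below the minimum
lowercase_id = validator.is_valid("id002")  # the pattern is case-sensitive
short_id = validator.is_valid("ID02")     # only two digits
```

An instance is valid as soon as it satisfies at least one of the listed options; it does not have to match exactly one.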

Example 8. References to Definitions

Let’s say the API response contains multiple dates: not just the user’s birth date but also some activation date. I adhere to the principle that if you write something twice, you are doing something wrong. To check the date above, we used a fairly complex regular expression: `^(19[789]|20[012])\d-(0\d|1[0-2])-([0-2]\d|3[01])$`. This pattern can be moved to a separate place and reused wherever needed. To do this, we create a “$defs” section in the schema:

"$defs": {
        "formatted_date": {"type": "string", "pattern": 
"^(19[789]|20[012])\\d-(0\\d|1[0-2])-([0-2]\\d|3[01])$"}
    },

Then, in the schema, wherever a date is expected, we use a reference to `formatted_date`:

"properties": {
            "id": {"anyOf": [{"type": "integer", "minimum": 1},
                             {"type": "string", "pattern": "^ID\\d{3}$"}]},
            "name": {"type": "string"},
            "patronymic": {"type": ["string", "null"]},
            "surname": {"type": "string"},
            "sex": {"type": "string", "enum": ["М", "Ж"]},
            "birth_date": {"$ref": "#/$defs/formatted_date"},
            "activation_date": {"$ref": "#/$defs/formatted_date"},
            "phone": {"type": "string", "minLength": 12, "maxLength": 12},
            "passport": {"type": "null"},
            "result": {"type": "string", "const": "ok"},
            "height": {"type": "number", "exclusiveMaximum": 3},
            "citizenship": {"type": "boolean"},
            "driver_license": {
                "type": "array",
                "items": {"type": "string", "enum": ["A", "B", "C", "D", "E"]},
                "uniqueItems": True
            }
        },

This approach allows changing the regular expression in only one place if needed. For example, if developers have not yet decided on the date format and occasionally change it.
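Put together, the two fragments form a schema like the one sketched below, trimmed to the two date fields. The activation date value is invented and deliberately uses the wrong format. One assumption worth flagging: draft-07 itself names this section `definitions`, and `$defs` appeared in later drafts, but the reference here is a plain JSON pointer into the document, so it should resolve either way.

```python
from jsonschema import Draft7Validator

schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "$defs": {
        "formatted_date": {
            "type": "string",
            "pattern": "^(19[789]|20[012])\\d-(0\\d|1[0-2])-([0-2]\\d|3[01])$",
        }
    },
    "type": "object",
    "properties": {
        # Both fields reuse the single definition above.
        "birth_date": {"$ref": "#/$defs/formatted_date"},
        "activation_date": {"$ref": "#/$defs/formatted_date"},
    },
}
validator = Draft7Validator(schema)

response = {"birth_date": "1999-06-06", "activation_date": "06-06-2023"}
errors = list(validator.iter_errors(response))
# The path of each error names the field that failed the shared pattern.
failing_fields = [list(error.path) for error in errors]
```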

However, it’s crucial to remember that validation reports only one problem at a time. This means that when a test fails, it can be unclear what exactly went wrong: the API response or, for instance, the shared regular expression (in which case every date fails the check).

Conclusion

The topic of schemas is very extensive, and everyone can find in it what suits them best. It is a tool that allows you to quickly obtain a means to validate most endpoints. If the project does not involve complex dependency chains, where creating one entity requires creating a lot of things around it, JSON schemas can help cover almost the entire project in just a couple of working days.

If you have any questions, feel free to ask in the comments.

Article author: Vladimir Vasyaev, Maxilect.

The article is based on a presentation by the specialist at the PiterPy 2023 conference.
