Inheritance for SQLAlchemy Models

In software engineering one of the key principles of object oriented software is the concept of inheritance. It can be used to increase code re-use which reduces the volume of tests and speeds up development. You can use inheritance in SQLAlchemy as described here. However, this inheritance is mainly used to describe relationships between tables and not for the purpose of re-using certain pieces of models defined elsewhere.

The openapi specification allows for inheritance using the allOf statement. This means that you could, for example, define a schema for id properties once and re-use that schema for any number of objects where you can customise things like the description that may differ object by object. You can also use allOf to combine objects, which is a powerful way of reducing duplication. You could, for example, define a base object with an id and name property that you then use repeatedly for other objects so that you don’t have to keep giving objects an id and a name.

If this feature could be brought to SQLAlchemy models, you would have a much shorter models.py files which is easier to maintain and understand. The plan for the openapi-SQLAlchemy package is to do just that. The first step has been completed with the addition of support for allOf for column definitions. If you aren’t familiar with the package, the Reducing API Code┬áDuplication article describes the aims of the package.

To start using the column inheritance feature, read the documentation for the feature which describes it in detail and gives and example specification that makes use of it.

openapi-SQLAlchemy Now Supports $ref for Columns

A new version of openapi-SQLAlchemy has been release which adds support for $ref for columns. The package now supports openapi schemas such as the following:

components:
  schemas:
    Id:
      type: integer
      description: Unique identifier for the employee.
      example: 0
    Employee:
      description: Person that works for a company.
      type: object
      properties:
        id:
          $ref: "#/components/schemas/Id"
        name:
          type: string
          description: The name of the employee.
          example: David Andersson.

If you are interested in how this was accomplished with decorators and recursive programming, the following article describes the implementation: Python Recursive Decorators.

Reducing API Code Duplication

One of the basic principles of good software engineering is the DRY principle – Don’t Repeat Yourself. It means that information should only exist in one place and should not be repeated elsewhere. This leads to code that is easier to maintain since any change only has to be made once, among a range of other benefits.

Practicing the principle is harder than stating it. For example, in the case of an API that is supported by a database, chances are there are overlaps between the database and API schema.

In my experience, there usually is significant overlap between the schema that is defined in the database and the schema that is returned by the linked API. Changes to the database schema might accidently not be properly propagate to the API schema or the API interface documentation might not get updated.

A popular tool for exposing a database schema to a Python application is the SQLAlchemy library. This is usually achieved by defining a models.py file with classes that map to tables in the database. For one of the API endpoints you might retrieve some of these objects from the database, apply some business logic to them and then return them through the API interface.

To communicate to your users how to interact with your API, you might write an openapi specification. You could even go further and use tools like connexion to map endpoints to Python functions for fulfilment. One part of that openapi specification is to define the returned schema of each endpoint.

To get closer to fulfilling the DRY principle you might wish that there was some way to connect the SQLAlchemy models and openapi schema so that you only have to define the schema in one place. To fulfil that wish I started an open source package called openapi-SQLAlchemy.

The aim of the package is to accept an openapi specification and simplify creating the SQLAlchemy models file. The aim for the MVP is the following. Given an openapi specification, when it is read into a python dictionary and passed to the module, a model factory is returned. That model factory can be called with the name of a schema which returns a class that is a valid SQLAlchemy model. For example:

# example-spec.yml
openapi: "3.0.0"

info:
  title: Test Schema
  description: API to illustrate openapi-SQLALchemy MVP.
  version: "0.1"

paths:
  /employee:
    get:
      summary: Used to retrieve all employees.
      responses:
        200:
          description: Return all employees from the database.
          content:
            application/json:
              schema:
                type: array
                items:
                  "$ref": "#/components/schemas/Employee"

components:
  schemas:
    Employee:
      description: Person that works for a company.
      type: object
      properties:
        id:
          type: integer
          description: Unique identifier for the employee.
          example: 0
        name:
          type: string
          description: The name of the employee.
          example: David Andersson.
        division:
          type: string
          description: The part of the company the employee works in.
          example: Engineering
        salary:
          type: number
          description: The amount of money the employee is paid.
          example: 1000000.00
      required:
        - id
        - name
        - division

Normally the following models.py file would be required.

from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Integer, String, Float


Base = declarative_base()


class Employee(Base):
    """
    Person that works for a company.

    Attrs:
        id: Unique identifier for the employee.
        name: The name of the employee.
        division: The part of the company the employee works in.
        salary: The amount of money the employee is paid.

    """
    __tablename__ = "employee"
    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String, index=True, nullable=False)
    division = Column(String, index=True, nullable=False)
    salary = Column(Float, nullable=False)

As you can see there is a lot of duplicate information. The aim is to instead pass the specification to openapi-SQLAlchemy and reduce the models.py file to the following:

from yaml import loads
from sqlalchemy.ext.declarative import declarative_base
from openapi_sqlalchemy import ModelFactory


Base = declarative_base()
with open("example-spec.yml") as spec_file:
    SPEC = yaml.load(spec_file)
model_factory = ModelFactory(base=Base, spec=SPEC)


Employee = model_factory(name="Employee")

There is significantly less duplicate information across the specification and models file. The name of the object (Employee) is repeated a few times, although this can be viewed as a reference which means that it is acceptable.

Whilst things like whether a column is nullable can be derived from the required property of the object, there are some additional pieces of information that have to be included in the specification file. For example, not every schema must be a table. Also, the primary key, auto increment and index column modifications are not currently recorded in the specification. The final Employee schema might look like the following.

    Employee:
      description: Person that works for a company.
      type: object
      x-tablename: employee
      properties:
        id:
          type: integer
          description: Unique identifier for the employee.
          example: 0
          x-primary-key: true
          x-autoincrement: true
        name:
          type: string
          description: The name of the employee.
          example: David Andersson.
          x-index: true
        division:
          type: string
          description: The part of the company the employee works in.
          example: Engineering
          x-index: true
        salary:
          type: number
          description: The amount of money the employee is paid.
          example: 1000000.00
      required:
        - id
        - name
        - division

There are more column modifiers that would need to be supported, such as the unique constraint. There are also more column types, such as foreign keys, and there is also inheritance and references that will need to be supported. All of these need to be supported whilst also ensuring that the specification remains valid.

This problem does not seem to be difficult to achieve in Python. There are also some opportunities to use some Python tricks to reduce the amount of code that needs to be written. As I develop this package I expect to write updates on progress and also write articles on some of the Python tricks.