One of the basic principles of good software engineering is the DRY principle – Don’t Repeat Yourself. It means that information should only exist in one place and should not be repeated elsewhere. This leads to code that is easier to maintain since any change only has to be made once, among a range of other benefits.
Practicing the principle is harder than stating it. For example, in the case of an API that is supported by a database, chances are there are overlaps between the database and API schema.
In my experience, there usually is significant overlap between the schema that is defined in the database and the schema that is returned by the linked API. Changes to the database schema might accidently not be properly propagate to the API schema or the API interface documentation might not get updated.
A popular tool for exposing a database schema to a Python application is the SQLAlchemy library. This is usually achieved by defining a models.py file with classes that map to tables in the database. For one of the API endpoints you might retrieve some of these objects from the database, apply some business logic to them and then return them through the API interface.
To communicate to your users how to interact with your API, you might write an openapi specification. You could even go further and use tools like connexion to map endpoints to Python functions for fulfilment. One part of that openapi specification is to define the returned schema of each endpoint.
To get closer to fulfilling the DRY principle you might wish that there was some way to connect the SQLAlchemy models and openapi schema so that you only have to define the schema in one place. To fulfil that wish I started an open source package called openapi-SQLAlchemy.
The aim of the package is to accept an openapi specification and simplify creating the SQLAlchemy models file. The aim for the MVP is the following. Given an openapi specification, when it is read into a python dictionary and passed to the module, a model factory is returned. That model factory can be called with the name of a schema which returns a class that is a valid SQLAlchemy model. For example:
# example-spec.yml
openapi: "3.0.0"
info:
title: Test Schema
description: API to illustrate openapi-SQLALchemy MVP.
version: "0.1"
paths:
/employee:
get:
summary: Used to retrieve all employees.
responses:
200:
description: Return all employees from the database.
content:
application/json:
schema:
type: array
items:
"$ref": "#/components/schemas/Employee"
components:
schemas:
Employee:
description: Person that works for a company.
type: object
properties:
id:
type: integer
description: Unique identifier for the employee.
example: 0
name:
type: string
description: The name of the employee.
example: David Andersson.
division:
type: string
description: The part of the company the employee works in.
example: Engineering
salary:
type: number
description: The amount of money the employee is paid.
example: 1000000.00
required:
- id
- name
- division
Normally the following models.py file would be required.
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Integer, String, Float
Base = declarative_base()
class Employee(Base):
"""
Person that works for a company.
Attrs:
id: Unique identifier for the employee.
name: The name of the employee.
division: The part of the company the employee works in.
salary: The amount of money the employee is paid.
"""
__tablename__ = "employee"
id = Column(Integer, primary_key=True, autoincrement=True)
name = Column(String, index=True, nullable=False)
division = Column(String, index=True, nullable=False)
salary = Column(Float, nullable=False)
As you can see there is a lot of duplicate information. The aim is to instead pass the specification to openapi-SQLAlchemy and reduce the models.py file to the following:
from yaml import loads
from sqlalchemy.ext.declarative import declarative_base
from openapi_sqlalchemy import ModelFactory
Base = declarative_base()
with open("example-spec.yml") as spec_file:
SPEC = yaml.load(spec_file)
model_factory = ModelFactory(base=Base, spec=SPEC)
Employee = model_factory(name="Employee")
There is significantly less duplicate information across the specification and models file. The name of the object (Employee) is repeated a few times, although this can be viewed as a reference which means that it is acceptable.
Whilst things like whether a column is nullable can be derived from the required property of the object, there are some additional pieces of information that have to be included in the specification file. For example, not every schema must be a table. Also, the primary key, auto increment and index column modifications are not currently recorded in the specification. The final Employee schema might look like the following.
Employee:
description: Person that works for a company.
type: object
x-tablename: employee
properties:
id:
type: integer
description: Unique identifier for the employee.
example: 0
x-primary-key: true
x-autoincrement: true
name:
type: string
description: The name of the employee.
example: David Andersson.
x-index: true
division:
type: string
description: The part of the company the employee works in.
example: Engineering
x-index: true
salary:
type: number
description: The amount of money the employee is paid.
example: 1000000.00
required:
- id
- name
- division
There are more column modifiers that would need to be supported, such as the unique constraint. There are also more column types, such as foreign keys, and there is also inheritance and references that will need to be supported. All of these need to be supported whilst also ensuring that the specification remains valid.
This problem does not seem to be difficult to achieve in Python. There are also some opportunities to use some Python tricks to reduce the amount of code that needs to be written. As I develop this package I expect to write updates on progress and also write articles on some of the Python tricks.
2 thoughts on “Reducing API Code Duplication”