Learning Outcomes and Material¶
This exercise introduces OpenAPI for writing API documentation and a quick glance at tools related to it. You will learn the basic structure of an OpenAPI document, and how to offer API documentation directly from Flask. Documentation in the first part of this exercise will be made for the same version of SensorHub API that was used in the previous testing material.
The second part of this exercise focuses on self-documenting APIs that use hypermedia. You will learn how to prepare your API for dynamic navigation by machine clients. The second part takes a break from SensorHub and focuses on the MusicMeta API, initially designed in the API design extra material.
Introduction Lecture¶
This is an optional introduction lecture that adds some depth and visuality to the material in this exercise. As usual it's not necessary for completing the exercise but can be interesting if you want to know a little bit more. However, it might be a better fit for explaining the OpenAPI format than this exercise material in its current form.
API Documentation with OpenAPI¶
Your API is only as good as its documentation. It hardly matters how neat and useful your API is if no one knows how to use it. This is true whether it is a public API, or between different services in a closed architecture. Good API documentation shows all requests that are possible; what parameters, headers, and data are needed to make those requests; and what kinds of responses can be expected. Examples and semantics should be provided for everything.
API documentation is generally done with description languages that are supported by tools for generating the documentation. In this exercise we will be looking into OpenAPI and Swagger - an API description format and the toolset around it. These tools allow creating and maintaining API documentation in a structured way. Furthermore, various tools can be used to generate parts of the documentation automatically, and reuse schemas in the documentation in the API implementation itself. All of these tools make it easier to maintain documentation as the distance between your code and your documentation becomes smaller.
While there are a lot of fancy tools to generate documentation automatically, you first need a proper understanding of the API description format. Without understanding the format it is hard to evaluate when and how to use fancier tools. This material will focus on just that: understanding the OpenAPI specification and being able to write documentation with it.
Preparation¶
There are a couple of pages that are useful to have open in your browser for this material. First there is the obvious OpenAPI specification. It is quite a hefty document and a little hard to get into at first. Nevertheless, after going through this material you should have a basic understanding of how to read it. The second page to keep handy is the Swagger editor where you can paste various examples to see how they are rendered. It is also very useful when documenting your own project, to ensure that your documentation conforms to the OpenAPI specification.
On the Python side, there are a couple of modules that are needed. Primarily we want Flasgger, which is a Swagger toolkit for Flask. We also have some use for PyYAML. Perform these sorceries and you're all set:
pip install flasgger
pip install pyyaml
Very Short Introduction to YAML¶
At its core OpenAPI is a specification format that can be written in JSON or YAML. We are going to use YAML in the examples for two reasons: it's the format supported by Flasgger, but even more importantly it is much less "noisy" which makes it a whole lot more pleasant to edit. YAML is "a human-friendly data serialization language for all programming languages". It is quite similar to JSON but much like Python, it removes all the syntactic noise from curly braces by separating blocks by indentation. It also strips the need for quotation marks for strings. To give a short example, here is a comparison of the same sensor serialized first in JSON:
{
    "name": "test-sensor-1",
    "model": "uo-test-sensor",
    "location": {
        "name": "test-site-a",
        "description": "some random university hallway"
    }
}
And the same in YAML:
name: test-sensor-1
model: uo-test-sensor
location:
  name: test-site-a
  description: some random university hallway
The only other thing you really need to know is that items that are in an array together are prefixed with a dash (-) instead of a key. A quick example of a list of sensors serialized in JSON:
{
    "items": [
        {
            "name": "test-sensor-1",
            "model": "uo-test-sensor",
            "location": "test-site-a"
        },
        {
            "name": "test-sensor-2",
            "model": "uo-test-sensor",
            "location": null
        }
    ]
}
And again the same in YAML:
items:
  - name: test-sensor-1
    model: uo-test-sensor
    location: test-site-a
  - name: test-sensor-2
    model: uo-test-sensor
    location: null
Note the lack of differentiation between string values and the null value. Here null is simply a reserved keyword that is converted automatically in parsing. Numbers work similarly. If you absolutely need the string "null" instead, then you can add quotes, writing location: 'null' instead.
Finally there are two ways to write longer pieces of text: literal and folded style. Examples below:
multiline: |
  This is a very long description
  that spans a whole two lines
folded: >
  This is another long description
  that will be in a single line
There are a few more details to YAML but they will not be relevant for this exercise. Feel free to look them up in the specification.
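The conversions described above can be checked from Python. The snippet below is a minimal sketch that assumes PyYAML is installed (as in the preparation step); the key `label` is made up for the illustration and is not part of the SensorHub data.

```python
import yaml  # PyYAML, installed earlier with pip

document = """\
name: test-sensor-2
location: null
label: 'null'
altitude: 44.51
multiline: |
  This is a very long description
  that spans a whole two lines
folded: >
  This is another long description
  that will be in a single line
"""

data = yaml.safe_load(document)

print(repr(data["location"]))   # the reserved word null becomes Python's None
print(repr(data["label"]))      # quoted, so it stays a string
print(repr(data["altitude"]))   # numbers are converted automatically
print(repr(data["multiline"]))  # literal style keeps the line break
print(repr(data["folded"]))     # folded style joins the lines with a space
```

Both long text styles keep a single trailing newline by default, which is usually harmless in descriptions.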
OpenAPI Structure¶
An OpenAPI document is a rather massively nested structure. In order to get a better grasp of the structure we will start from the top, the OpenAPI Object. This is the document's root level object, and contains a total of 8 possible fields, out of which 3 are required.
- openapi: Required. This is the version of the OpenAPI specification used by the document. For our purposes it will be "3.0.3".
- info: Required. This will contain metadata about the API. Introduced later in this section.
- servers: This is an array listing servers for the API. This can list relative URLs, so we're just going to put the URL "/api" as the only item here, and call it done.
- paths: Required. Basically the main portion of the document, this will contain documentation for every single path (URI) available in the API. We will talk about it more in this section.
- components: Another important portion. Contains various reusable components that can be referenced from other parts of the document.
- security: Array of possible security mechanisms that can be used in the API. Not discussed in this material.
- tags: Array of tags that can be used to categorize operations. Not discussed in this material.
- externalDocs: Can contain a link to external documentation. Not discussed in this section.
The next sections will dive into info, paths and components in more detail. Presented below is the absolute minimum of what must be in an OpenAPI document. It is absolutely useless as a document, but should give you an idea of the very basics.
openapi: 3.0.3
info:
  title: Absolute Minimal Document
  version: 1.0.0
paths:
  /:
    get:
      responses:
        '200':
          description: An empty root page
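To make the required fields concrete, here is a small sketch that represents the minimal document as a Python dictionary and reports which required fields are missing. The function name `missing_required_fields` is hypothetical and the check is deliberately shallow; it is not a substitute for validating the document in the Swagger editor.

```python
# Required fields of the root object and the info object in OpenAPI 3.0.3
REQUIRED_ROOT_FIELDS = ("openapi", "info", "paths")
REQUIRED_INFO_FIELDS = ("title", "version")


def missing_required_fields(document):
    """Return names of required root and info fields that are absent."""
    missing = [name for name in REQUIRED_ROOT_FIELDS if name not in document]
    info = document.get("info", {})
    missing += [f"info.{name}" for name in REQUIRED_INFO_FIELDS if name not in info]
    return missing


minimal_document = {
    "openapi": "3.0.3",
    "info": {"title": "Absolute Minimal Document", "version": "1.0.0"},
    "paths": {
        "/": {
            "get": {
                "responses": {
                    "200": {"description": "An empty root page"},
                },
            },
        },
    },
}

print(missing_required_fields(minimal_document))
print(missing_required_fields({"openapi": "3.0.3"}))
```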
Info Object¶
The info object contains some basic information about your API. This information will be displayed at the top of the documentation. It should give the reader relevant basic information about what the API is for, and - especially for public APIs - terms of service and license information. The fields are quite self-descriptive in the OpenAPI specification, but we've listed them below too.
- title: The title of the API. Required.
- version: The version number of this API. Required. Extremely important when providing multiple versions of the same API (often needed to give clients a transition period when the API changes).
- description: A longer description of the API. CommonMark can be used in the description. It is most likely that the description should use literal or folded style.
- termsOfService: A URL that points to the terms of service document for the API. Obviously important for public APIs, but this is not a law course so we will not bother with TOS.
- contact: Contact information for the API. This has to contain an object with up to three fields: name, url, and email.
- license: License information for the API data. The value for this field is an object with two fields: license name and url. Again, obviously important, but promptly ignored within this course.
Below is an example of a completely filled info object.
info:
  title: Sensorhub Example
  version: 0.0.1
  description: |
    This is an API example used in the Programmable Web Project course.
    It stores data about sensors and where they have been deployed.
  termsOfService: http://totally.not.placehold.er/
  contact:
    url: http://totally.not.placehold.er/
    email: pwp-course@lists.oulu.fi
    name: PWP Staff List
  license:
    name: Apache 2.0
    url: https://www.apache.org/licenses/LICENSE-2.0.html
Components Object¶
The components object is a very handy feature in the OpenAPI specification that can drastically reduce the amount of work needed when maintaining documentation. It is essentially a storage for reusable objects that can be referenced from other parts of the documentation (the paths object in particular). Anything that appears more than once in the documentation should be placed here. That way you do not need to update multiple copies of the same thing when making changes to the API.
This object has various fields that categorise the components by their object type. First we will go through all the fields. After that we're going to introduce the component types that are most likely to end up here in their own subsections.
- schemas: This field is for storing reusable schemas. Possibly the most important reusable component type as schemas not only come up often, but are also rather chonky.
- responses: This is for storing responses. Useful if multiple routes in the API return identical responses but we are mostly trying to avoid that. We will talk more about responses in the Paths Object section.
- parameters: This stores parameters that can be present in URIs, headers, cookies, and queries. Due to the hierarchy of URIs, these will be repeated a lot, and make a good candidate to be placed in the Components Object.
- examples: Examples that can be shown in the API documentation for both request and response bodies.
- requestBodies: Essentially a couple of levels up from example, stores the entire request body instead. This can be useful as POST and PUT requests often have similar bodies.
- headers: Stores headers. If there are headers that are reused as-is, they should be placed here.
- securitySchemes: Reusable security scheme components. If the API has authentication, it is quite likely repeated often, and therefore is best placed here.
- links: Stores links that can point out known relationships and traversal. We'll talk more about links when discussing hypermedia.
- callbacks: Stores out-of-band callbacks that can be made related to the parent. Not discussed here.
Out of these we are going to dive into details about schemas, parameters, and requestBodies next.
Schema Object¶
The schemas field in components will be the new home for all of our schemas. The structure is rather simple: it's just a mapping of schema name to schema object. Schema objects are essentially JSON schemas, just written out in YAML (that is, in our case - OpenAPI can be written in JSON too). OpenAPI does adjust the definitions of some properties of JSON schema, as specified in the schema object documentation. Below is a simple example of how to write the sensor schema we used earlier into a reusable schema component in OpenAPI.
components:
  schemas:
    Sensor:
      type: object
      properties:
        model:
          description: Name of the sensor's model
          type: string
        name:
          description: Sensor's unique name
          type: string
      required:
        - name
        - model
Since we already wrote these schemas once as part of our model classes, there's little point in writing them manually again. With a couple of lines in the Python console you can output the results of your json_schema methods into YAML, which you can then copy-paste into your document:
import yaml
from sensorhub import Sensor
print(yaml.dump(Sensor.json_schema()))
Parameter Object¶
Describing all of the URL variables in our routes under the parameters field in components is usually a good idea. Even in a small API like the course project, at least the root level variables will be present in a lot of URIs. For instance the sensor variable is already present in at least three routes:
/api/sensors/<sensor>/
/api/sensors/<sensor>/measurements/
/api/sensors/<sensor>/measurements/1/
For the sake of defining each thing in only one place, it seems very natural for parameters to be reusable components. Also, although we didn't talk about query parameters much, they can also be described here - useful if you have lots of resources that support filtering or sorting using similar queries. In OpenAPI a parameter is described through a few fields.
- name: Obviously required. This is the parameter's name within the documentation. Not necessarily the same as the URI variable name in your code (which is just an implementation detail and not visible to clients).
- in: Required field. This field defines where the parameter is. For route variables this value should be "path". For query parameters it's "query". We won't cover the other options ("header" and "cookie") in this exercise.
- description: Optional description that explains what the parameter is. Should be provided, but less relevant for hypermedia APIs.
- required: This field indicates whether the parameter is required or not. For path parameters this must be present and set to true.
- deprecated: Should be set to true for parameters that are going to go out of use in future versions of the API. Should not be used with path parameters.
- schema: Can contain a schema that defines the type of the parameter.
Below is an example of the sensor path parameter:
components:
  parameters:
    sensor:
      description: Selected sensor's unique name
      in: path
      name: sensor
      required: true
      schema:
        type: string
As you can see it's quite a few lines just to describe one parameter. All the more reason to define it in one place only. If you look at the parameter specification it also lists quite a few ways to describe parameter style besides schema, but for our purposes schema will be sufficient.
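Keeping parameters in one place also makes it easy to check that nothing was forgotten. The sketch below looks for path variables (written with curly braces in OpenAPI paths) that have no matching path parameter in components. The function name `undefined_path_parameters` and the `{measurement}` variable are hypothetical, chosen only for the illustration.

```python
import re


def undefined_path_parameters(path, components):
    """Return path variables that have no path parameter defined in components."""
    defined = {
        parameter["name"]
        for parameter in components.get("parameters", {}).values()
        if parameter.get("in") == "path"
    }
    # OpenAPI marks path variables with curly braces, e.g. {sensor}
    used = re.findall(r"\{([^{}]+)\}", path)
    return [name for name in used if name not in defined]


components = {
    "parameters": {
        "sensor": {
            "description": "Selected sensor's unique name",
            "in": "path",
            "name": "sensor",
            "required": True,
            "schema": {"type": "string"},
        },
    },
}

print(undefined_path_parameters("/sensors/{sensor}/measurements/", components))
print(undefined_path_parameters("/sensors/{sensor}/measurements/{measurement}/", components))
```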
Security Scheme Component¶
If your API uses authentication, it is quite likely that it is used for more than one resource. Therefore placing security schemes in the reusable components part seems like a smart thing to do. What exactly a security scheme should contain depends on its type. For API keys there are four fields to fill.
- type: Indicates the type. Required, and for API key it should be "apiKey"
- description: Just a short description, optional.
- name: Name of the header, query parameter, or cookie where the API key is expected to be.
- in: Defines where the API key should be presented, possible values are "header", "query", and "cookie".
A quick example.
components:
  securitySchemes:
    sensorhubKey:
      type: apiKey
      name: Sensorhub-Api-Key
      in: header
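To illustrate how the name and in fields work together, here is a rough sketch of reading an API key from the place that the scheme names. It uses plain dictionaries rather than Flask's request object, and `extract_api_key` is a hypothetical helper, not something Flask or Flasgger provides.

```python
def extract_api_key(scheme, headers=None, query=None, cookies=None):
    """Return the API key from the place the scheme names, or None if absent."""
    if scheme.get("type") != "apiKey":
        raise ValueError("only apiKey security schemes are handled here")
    sources = {"header": headers or {}, "query": query or {}, "cookie": cookies or {}}
    source = sources[scheme["in"]]
    name = scheme["name"]
    if scheme["in"] == "header":
        # HTTP header names are case-insensitive, so compare in lowercase
        lowered = {key.lower(): value for key, value in source.items()}
        return lowered.get(name.lower())
    return source.get(name)


sensorhub_key = {"type": "apiKey", "name": "Sensorhub-Api-Key", "in": "header"}
key = extract_api_key(sensorhub_key, headers={"sensorhub-api-key": "secret-token"})
print(key)
```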
Paths Object¶
The paths object is the meat of your OpenAPI documentation. This object needs to document every single path (route) in your API. All available methods also need to be documented with enough detail that a client can be implemented based on the documentation. This sounds like a lot of work, and it is, but it's also necessary. Luckily there are ways to reduce the work, but first let's take a look at how to do it completely manually to get an understanding of what actually goes into these descriptions.
By itself the paths object is just a mapping of path (route) to path object that describes it. So the keys in this object are just your paths, including any path parameters. Unlike Flask where these are marked with angle brackets (e.g. <sensor>), in OpenAPI they are marked with curly braces (e.g. {sensor}). So, for instance, the very start of our paths object would be something like:
paths:
  /sensors/:
    ...
  /sensors/{sensor}/:
    ...
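The difference in notation is mechanical, so it can be converted with a regular expression. The following sketch assumes that "/api" is listed in the servers field as described earlier, and strips it from the route. The function name and the `<int:measurement>` variable are hypothetical.

```python
import re

# Matches Flask route variables such as <sensor> or <int:measurement>
FLASK_VARIABLE = re.compile(r"<(?:[^:<>]+:)?([^<>]+)>")


def flask_route_to_openapi_path(route, server_prefix="/api"):
    """Convert a Flask route into the key used in the paths object."""
    # Replace <name> or <converter:name> with {name}
    path = FLASK_VARIABLE.sub(r"{\1}", route)
    # The server URL is documented separately, so drop it from the path
    if server_prefix and path.startswith(server_prefix + "/"):
        path = path[len(server_prefix):]
    return path


print(flask_route_to_openapi_path("/api/sensors/<sensor>/"))
print(flask_route_to_openapi_path("/api/sensors/<sensor>/measurements/<int:measurement>/"))
```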
Note that these paths are appended to whatever you put in the root level servers field. Since we put /api there, these paths in full would be the same as our routes: /api/sensors/ and /api/sensors/{sensor}/.
Path Object¶
A single path is mostly a container for other objects, particularly: parameters and operations. As we discussed earlier, pulling the parameters from the components object is a good way to avoid typing the same documentation twice. The operations refer to each of the HTTP methods that are supported by this resource. Before moving on to operations, here is a quick example of referencing the sensor parameter we placed in components:
paths:
  /sensors/{sensor}/:
    parameters:
      - $ref: '#/components/parameters/sensor'
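The value after the "#" is simply a path of keys from the document root. As a rough sketch of what a rendering tool does with it, the hypothetical `resolve_ref` function below follows such a reference inside a document that has been loaded into a dictionary. It only handles references within the same document.

```python
def resolve_ref(document, ref):
    """Follow a local reference such as '#/components/parameters/sensor'."""
    if not ref.startswith("#/"):
        raise ValueError("only references inside the same document are handled")
    node = document
    for key in ref[2:].split("/"):
        # JSON Pointer escapes: ~1 stands for '/', ~0 stands for '~'
        key = key.replace("~1", "/").replace("~0", "~")
        node = node[key]
    return node


document = {
    "components": {
        "parameters": {
            "sensor": {
                "description": "Selected sensor's unique name",
                "in": "path",
                "name": "sensor",
                "required": True,
                "schema": {"type": "string"},
            },
        },
    },
    "paths": {
        "/sensors/{sensor}/": {
            "parameters": [{"$ref": "#/components/parameters/sensor"}],
        },
    },
}

param_refs = document["paths"]["/sensors/{sensor}/"]["parameters"]
resolved = [resolve_ref(document, item["$ref"]) for item in param_refs]
print(resolved[0]["description"])
```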
In short, a reference is made with the $ref key, using the referred object's address in the documentation as the value. When this is rendered, the contents from the referenced parameter are shown in the documentation.
Operation Object¶
An operation object contains all the details of a single operation done to a resource. These match to the HTTP methods that are available for the resource. Operations can be roughly divided into two types: ones that return a response body (GET mostly) and ones that don't. Once again OpenAPI documentation for operations lists quite a few fields. We'll narrow the list down a bit.
- description: A description of what the operation does. While in REST it should be clear from the method being used, it's still nice to write a small summary.
- responses: This field is required, and contains a mapping of all the possible responses, including errors.
- security: This is an array of possible security schemes used for this operation. Note that only one of the listed measures needs to be satisfied. Therefore if you only have one way to authorize the operation, this array should have exactly one item. Ideally a reference to an existing security scheme.
- parameters: This field can have parameters that are specific to one operation instead of the whole path. Mostly useful for GET methods that support filtering and/or sorting via query parameters.
- requestBody: This field shows what is expected from the request body. Very relevant for POST, PUT, and PATCH operations. As discussed earlier, this could potentially be something where you want to use references to reusable components.
The responses part is a mapping of status code to a response description. Important thing to note is that the codes need to be quoted: an unquoted 200 would be read by YAML as an integer, whereas the OpenAPI specification expects these keys to be strings so that the document is compatible between JSON and YAML. The contents of a response are discussed next.
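The nesting described so far (paths, then path, then operation, then responses) can be walked with a couple of loops once the document is loaded into a dictionary. The sketch below lists every documented operation with its status codes; `list_operations` is a hypothetical helper and the sample data is cut down to the bare structure.

```python
# Operation keys allowed in a path object; other keys (like parameters) are not operations
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(paths):
    """Yield (METHOD, path, sorted status codes) for every documented operation."""
    for path, path_item in paths.items():
        for key, operation in path_item.items():
            if key not in HTTP_METHODS:
                continue  # skip non-operation fields such as 'parameters'
            yield key.upper(), path, sorted(operation.get("responses", {}))


paths = {
    "/sensors/": {
        "get": {"responses": {"200": {"description": "List of sensors"}}},
        "post": {"responses": {"201": {"description": "The sensor was created"}}},
    },
    "/sensors/{sensor}/": {
        "parameters": [{"$ref": "#/components/parameters/sensor"}],
        "get": {
            "responses": {
                "200": {"description": "Data of single sensor"},
                "404": {"description": "The sensor was not found"},
            },
        },
    },
}

for method, path, codes in list_operations(paths):
    print(method, path, codes)
```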
Response Object¶
A response object is an actual representation of what kind of data is to be expected from the API. This also includes all error responses that can be received when the client makes an invalid request. At the very minimum a response object needs to provide a description. For error responses this might be sufficient as well. However for 200 responses the documentation generally also needs to provide at least one example of a response body. This goes into the content field. The content field itself is a mapping of media type to media type objects.
The media type defines the contents of the response through schema and/or example(s). This time we will show how to do that with examples. In our SensorHub API we can have two kinds of sensors returned from the sensor resource: sensors with a location, and without. For completeness' sake it would be best to show an example of both, in which case using the examples field is a good idea. The examples field is a mapping of example name to an example object that usually contains a description, and then finally value where the example itself is placed. Here is a full example all the way from document root to the examples in the sensor resource. It's showing two responses (200 and 404), and two different examples (deployed-sensor and stored-sensor).
paths:
  /sensors/{sensor}/:
    parameters:
      - $ref: '#/components/parameters/sensor'
    get:
      description: Get details of one sensor
      responses:
        '200':
          description: Data of single sensor with extended location info
          content:
            application/json:
              examples:
                deployed-sensor:
                  description: A sensor that has been placed into a location
                  value:
                    name: test-sensor-1
                    model: uo-test-sensor
                    location:
                      name: test-site-a
                      latitude: 123.45
                      longitude: 123.45
                      altitude: 44.51
                      description: in some random university hallway
                stored-sensor:
                  description: A sensor that lies in the storage, currently unused
                  value:
                    name: test-sensor-2
                    model: uo-test-sensor
                    location: null
        '404':
          description: The sensor was not found
The next example shows how to use a single example, in the example field. In this case the example content is simply dumped as the field's value. This time the response body is an array, as denoted by the dashes.
paths:
  /sensors/:
    get:
      description: Get the list of managed sensors
      responses:
        '200':
          description: List of sensors with shortened location info
          content:
            application/json:
              example:
                - name: test-sensor-1
                  model: uo-test-sensor
                  location: test-site-a
                - name: test-sensor-2
                  model: uo-test-sensor
                  location: null
One final example shows how to include the Location header when documenting 201 responses. This time a headers field is added to the response object while content is omitted (because in this API a 201 response does not have a body).
paths:
  /sensors/:
    post:
      description: Create a new sensor
      responses:
        '201':
          description: The sensor was created successfully
          headers:
            Location:
              description: URI of the new sensor
              schema:
                type: string
Here the key in the headers mapping must be identical to the actual header name in the response.
Request Body Object¶
In POST, PUT, and PATCH operations it's usually helpful to provide an example or schema for what is expected from the request body. Much like a response object, a request body is also made of the description and content fields. As stated earlier, it might be better to put these into components from the start, but we're showing them embedded into the paths themselves. As such there isn't much new to show here as the content field should contain a similar media type object as the respective field in responses. Our example here shows the POST method for sensors collection, with both schema (referenced) and an example:
paths:
  /sensors/:
    post:
      description: Create a new sensor
      requestBody:
        description: JSON document that contains basic data for a new sensor
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/Sensor'
            example:
              name: new-test-sensor-1
              model: uo-test-sensor-plus
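When a schema and an example sit side by side like this, they can drift apart as the API changes. The sketch below is a very shallow consistency check between the two, using the Sensor schema from the components section. The helper names are hypothetical, and this is not a replacement for proper JSON schema validation.

```python
def missing_required(schema, example):
    """Return required property names that the example does not include."""
    return [name for name in schema.get("required", []) if name not in example]


def undocumented(schema, example):
    """Return example keys that the schema does not describe."""
    return [name for name in example if name not in schema.get("properties", {})]


sensor_schema = {
    "type": "object",
    "properties": {
        "model": {"description": "Name of the sensor's model", "type": "string"},
        "name": {"description": "Sensor's unique name", "type": "string"},
    },
    "required": ["name", "model"],
}

example = {"name": "new-test-sensor-1", "model": "uo-test-sensor-plus"}

print(missing_required(sensor_schema, example))
print(undocumented(sensor_schema, example))
```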
Full Example¶
You can download the full SensorHub API example below. Feed it to the Swagger editor to see how it renders. In the next section we'll go through how to have it rendered directly from the API server.
Swagger with Flasgger¶
Flasgger is a toolkit that brings Swagger to Flask. At the very minimum it can be used for serving the API documentation from the server, with the same rendering that is used in the Swagger editor. It can also do other fancy things, some of which we'll look into, and some will be left to the reader's curiosity.
Basic Setup¶
When setting up documentation the source YAML files should be put into their own folder. As the first step, let's create a folder called doc, under the folder that contains your app (or the api.py file if you are using a proper project structure). Download the example from above and place it into the doc folder.
In order to enable Flasgger, it needs to be imported, configured, and initialized. Very much like Flask-SQLAlchemy and Flask-Caching earlier. This whole process is shown in the code snippet below.
from flask import Flask
from flasgger import Swagger, swag_from
app = Flask(__name__, static_folder="static")
# ... SQLAlchemy and Caching setup omitted from here
app.config["SWAGGER"] = {
    "title": "Sensorhub API",
    "openapi": "3.0.3",
    "uiversion": 3,
}
swagger = Swagger(app, template_file="doc/sensorhub.yml")
This is actually everything you need to do to make the documentation viewable. Just point your browser to http://localhost:5000/apidocs/ after starting your Flask test server, and you should see the docs.
NOTE: Flasgger requires all YAML documents to use the start of document marker, three dashes (---).
Modular Swaggering¶
If holding all of your documentation in a ginormous YAML file sounds like a maintenance nightmare to you, you are probably not alone. Even by itself OpenAPI supports splitting the description into multiple files using file references. If you paid attention you may have noticed that the YAML file was passed to Swagger constructor as template_file. This indicates it is intended to simply be the base, not the whole documentation.
Flasgger allows us to document each view (resource method) in either a separate file, or in the method's docstring. First let's look at using docstrings. In order to document from there, you simply move the contents of the entire operation object inside the docstring, and precede it with three dashes. This separates the OpenAPI part from the rest of the docstring. Here is the newly documented GET method for sensors collection:
class SensorCollection(Resource):
    def get(self):
        """
        This is normal docstring stuff, OpenAPI description starts after dashes.
        ---
        description: Get the list of managed sensors
        responses:
          '200':
            description: List of sensors with shortened location info
            content:
              application/json:
                example:
                  - name: test-sensor-1
                    model: uo-test-sensor
                    location: test-site-a
                  - name: test-sensor-2
                    model: uo-test-sensor
                    location: null
        """
        body = {"items": []}
        for db_sensor in Sensor.query.all():
            item = db_sensor.serialize(short_form=True)
            body["items"].append(item)
        return Response(json.dumps(body), 200, mimetype=JSON)
The advantage of doing this is bringing your documentation closer to your code. If you change the view method code then the corresponding API documentation is right there, and you don't need to hunt for it from some other file(s). If you remove the sensors collection path from the template file and load up the documentation, the GET method should still be documented, from this docstring. It will show up as /api/sensors/ however, because Flasgger takes the path directly from your routing. One slight inconvenience is that you can't define parameters on a resource level anymore, and have to include them in every operation instead. In other words this small part in the sensor resource's documentation
parameters:
  - $ref: '#/components/parameters/sensor'
has to be replicated in every method's docstring. References to the components can still be used, as long as those components are defined in the template file. For instance, documented PUT for sensor resource:
class SensorItem(Resource):
    def put(self, sensor):
        """
        ---
        description: Replace sensor's basic data with new values
        parameters:
          - $ref: '#/components/parameters/sensor'
        requestBody:
          description: JSON document that contains new basic data for the sensor
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Sensor'
              example:
                name: new-test-sensor-1
                model: uo-test-sensor-plus
        responses:
          '204':
            description: The sensor's attributes were updated successfully
          '400':
            description: The request body was not valid
          '404':
            description: The sensor was not found
          '409':
            description: A sensor with the same name already exists
          '415':
            description: Wrong media type was used
        """
        if not request.json:
            raise UnsupportedMediaType
        try:
            validate(request.json, Sensor.json_schema())
        except ValidationError as e:
            raise BadRequest(description=str(e))
        sensor.deserialize(request.json)
        try:
            db.session.add(sensor)
            db.session.commit()
        except IntegrityError:
            raise Conflict(
                "Sensor with name '{name}' already exists.".format(
                    **request.json
                )
            )
        return Response(status=204)
Another option is to use separate files for each view, and the swag_from decorator. In that case you would put each operation object into its own YAML file and place it somewhere like doc/sensorcollection/get.yml. Then you'd simply decorate the methods like this:
class SensorCollection(Resource):
    @swag_from("doc/sensorcollection/get.yml")
    def get(self):
        ...
Or if you follow the correct naming convention for your folder structure, you can also have Flasgger do all of this for you without explicitly using the swag_from decorator. Specifically, if your file paths follow this pattern:
/{resource_class_name}/{method}.yml
then you can add "doc_dir" to Flasgger's configuration, and it will look for these documentation files automatically. Note that your filenames must have the .yml extension for autodiscover to find them, .yaml doesn't work. One addition to the config is all you need.
app.config["SWAGGER"] = {
    "title": "Sensorhub API",
    "openapi": "3.0.3",
    "uiversion": 3,
    "doc_dir": "./doc",
}
Ultimately how you manage your documentation files is up to you. With this section you now have three options to choose from, and with further exploration you can find more. However it needs to be noted that currently Flasgger does not follow file references in YAML files, so you can't go and split your template file into smaller pieces. Still, managing all your reusable components in the template file and the view documentations elsewhere already provides some nice structure.
Hypermedia APIs¶
In the second part of this exercise material we will dive into hypermedia APIs. This exercise moves away from the SensorHub example to the MusicMeta example which provides a better base for what we are about to discuss.
Full Example¶
Below is the full MusicMeta API example that we will be discussing in this section. Furthermore, if you haven't done so already, read through the API design extra material in order to understand what this API is trying to achieve.
Enter Hypermedia¶
In order for client developers to know what to actually send - and what to expect in return - APIs need to be documented. We achieved some of this goal by using OpenAPI to document the API, but we can go further with hypermedia in responses given by the API. This way the API itself describes to the client the possible actions that can be taken. For this example we have chosen Mason as our hypermedia format because it has a very clear syntax for defining hypermedia elements and connecting them to the body.
Hypermedia Controls and You¶
You can consider the API as a map and each resource as a node. The resource that you most recently sent a GET request to is basically the node that says "you are here". Hypermedia controls describe the logical next actions: where to go next, or actions that can be performed with the particular node you're in. Together with the resources they actually form a client-side state diagram of how to navigate the API. Hypermedia controls are extra attributes attached to the data representation we showed in the API design example.
A hypermedia control is a combination of at least two things: link relation ("rel") and target URI ("href"). These answer two questions: what does this control do, and where to go to activate it. Note that link relation is a machine-readable keyword, not a description for humans. Many generally used relations are being standardized (full list) but APIs can define their own when needed as well - as long as each relation always means the same thing. When a client wants to do something, it uses the available link relations to discover what URI the next request should go to. This means that clients using our API should never need to have hardcoded URIs - they will find the URI by searching for the correct relation instead.
Mason also defines some additional attributes for hypermedia controls. Of these "method" is one that we will be using frequently, because it tells which HTTP method should be used to make the request (usually omitted for GET as it is assumed to be the default). There's also "title" which can be used in generic clients (or other generated clients) to help the client's human user figure out what the control does. Even beyond that we can also include a JSON schema representation that defines how to send data to the API.
In Mason hypermedia controls can be attached to any object by adding the "@controls" attribute. This in itself is an object where link relations are attribute names whose values are also objects that have at least one attribute: href. For example, here is a track item with controls to get back to the album it is on ("up") and to edit its information ("edit"):
{
    "title": "Wings of Lead Over Dormant Seas",
    "disc_number": 2,
    "track_number": 1,
    "length": "01:00:00",
    "@controls": {
        "up": {
            "href": "/api/artists/dirge/albums/Wings of Lead Over Dormant Seas/"
        },
        "edit": {
            "href": "/api/artists/dirge/albums/Wings of Lead Over Dormant Seas/2/1/",
            "method": "PUT"
        }
    }
}
Or if we want each item in a collection to actually have its own URI available to clients:
{
    "items": [
        {
            "artist": "Scandal",
            "title": "Hello World",
            "@controls": {
                "self": {
                    "href": "/api/artists/scandal/albums/Hello World/"
                }
            }
        },
        {
            "artist": "Scandal",
            "title": "Yellow",
            "@controls": {
                "self": {
                    "href": "/api/artists/scandal/albums/Yellow/"
                }
            }
        }
    ]
}
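To illustrate how a client can find a target URI by link relation instead of hardcoding it, here is a minimal Python sketch. The helper name get_href is our own invention (it is not part of Mason or any library), and the document is a copy of the collection example above.

```python
# Hypothetical helper, not part of Mason or any library: it only assumes
# the "@controls" layout shown in the examples above.
def get_href(mason_obj, rel):
    """Return the href of the control with the given link relation, or None."""
    control = mason_obj.get("@controls", {}).get(rel)
    if control is None:
        return None
    return control["href"]


collection = {
    "items": [
        {
            "artist": "Scandal",
            "title": "Hello World",
            "@controls": {
                "self": {"href": "/api/artists/scandal/albums/Hello World/"}
            },
        },
        {
            "artist": "Scandal",
            "title": "Yellow",
            "@controls": {
                "self": {"href": "/api/artists/scandal/albums/Yellow/"}
            },
        },
    ]
}

# Each item's address is discovered from its "self" control
item_uris = [get_href(item, "self") for item in collection["items"]]
```

The client code never builds an album URI itself. If the server changed its URI structure, the same lookup would keep working as long as the "self" relation is still present.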
Custom Link Relations¶
While it's good to use standards as much as possible, realistically each API will have a number of controls whose meaning cannot be explicitly conveyed with any of the standardized relations. For this reason Mason documents can use link relation namespaces to extend available link relations. A Mason namespace defines a prefix and its associated namespace (similar to XML namespace, see CURIEs). The prefix will be added to link relations that are not defined in the IANA list. When a relation is prefixed with a namespace prefix, it is interpreted by attaching the relation at the end of the namespace. This makes the relation unique - even if another API defined a relation with the same name, it would have a different namespace in front. For example if we want to have a relation called "albums-va" to indicate a control that leads to a collection of all VA albums, its full identifier could be http://wherever.this.server.is/musicmeta/link-relations/#albums-va. To make this look less unwieldy we can define a namespace prefix called "mumeta", and then include this control like so:
{
    "@namespaces": {
        "mumeta": {
            "name": "http://wherever.this.server.is/musicmeta/link-relations/#"
        }
    },
    "@controls": {
        "mumeta:albums-va": {
            "href": "/api/artists/VA/albums"
        }
    }
}
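The prefix expansion described above can be sketched in a few lines of Python. The function name expand_rel is hypothetical, and the sketch assumes only the "@namespaces" layout shown in the example: each prefix maps to an object with a "name" attribute.

```python
# Sketch only: expand_rel is a hypothetical helper name. It assumes the
# "@namespaces" layout shown above, where each prefix maps to {"name": uri}.
def expand_rel(document, rel):
    """Expand a prefixed link relation such as "mumeta:albums-va" to its full form."""
    prefix, sep, name = rel.partition(":")
    namespaces = document.get("@namespaces", {})
    if sep and prefix in namespaces:
        return namespaces[prefix]["name"] + name
    # Relations without a known prefix (e.g. IANA ones) are returned unchanged
    return rel


document = {
    "@namespaces": {
        "mumeta": {
            "name": "http://wherever.this.server.is/musicmeta/link-relations/#"
        }
    },
    "@controls": {
        "mumeta:albums-va": {"href": "/api/artists/VA/albums"}
    },
}

full_rel = expand_rel(document, "mumeta:albums-va")
plain_rel = expand_rel(document, "self")
```

Here full_rel is the namespace name with "albums-va" appended, while the unprefixed "self" is left as it is.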
Also if a client developer visits the full URL, they should find a description of the link relation. Note also that this is normally expected to be a full URL because the server part is what guarantees uniqueness. In later examples you will see we're using a relative URI - this way the link to the relation description itself works even if the server is running at a different address (i.e. most likely localhost:someport).
Information about the link relations must be stored somewhere. Note that this is intended for client developers i.e. humans. In our case a simple HTML document with anchors for each relation should be sufficient. This is why our namespace name ends with #. It makes it convenient to find each relation's description. Before moving on, here's the full list of custom link relations our API uses: add-album, add-artist, add-track, albums-all, albums-by, albums-va, artists-all, delete.
API Map¶
The last order of business in designing our API is to create a full map with all the resources and hypermedia controls visible. This is a kind of a state diagram where resources are states and controls are transitions. Generally speaking only GET methods are used for moving from one state to another because other methods don't return a resource representation. We have presented other methods as arrows that circle back to the same state. Here's the full map in all its glory.
NOTE: The box color codes are only included for educational purposes to show you how data from the database is connected to resources - you don't need to share implementation details like this in real life, or your course project for that matter.
NOTE 2: the link relation "item" does not appear in the API's responses, the control is actually "self". In this diagram "item" is used to indicate that this is a transition to an item from a collection through the item's "self" link.
A map like this is useful when designing the API and should be done before designing individual representations returned by the API. As all actions are visible in a single diagram, it's easier to see if something is missing. When making the diagram keep in mind that there must be a path from every state to every other state (connectedness principle). In our case we have three separate branches in the URI tree and therefore we have to make sure to include transitions between branches (e.g. AlbumCollection resource has "artists-all" and "albums-va").
Entry Point¶
A final note about mapping the API is the entry point concept. This should be at the root of the API (in our case: /api/). It's kind of like the API's index page. It's not a resource, and isn't generally returned to (which is why it isn't in the diagram). It just shows the reasonable starting options a client has when "entering" the API. In our case it should have controls to GET either the artists collection or the albums collection (potentially also the VA album collection).
Advanced Controls with Schema¶
Up to now we have defined possible actions by using hypermedia. Each action comes with a link relation that has an explicit meaning, the address of the associated resource, and the HTTP method to use. This information is sufficient for GET and DELETE requests, but not quite there for POST and PUT - we still don't know what to put in the request body. Mason supports adding JSON Schema to hypermedia controls.
A schema object can be attached to a Mason hypermedia control by assigning it to the "schema" attribute. If the schema is particularly large or you have another reason to not include it in the response body, you can alternatively provide the schema from a URL on your API server (e.g. /schema/album/) and assign the URL to the "schemaUrl" attribute so that clients can retrieve it.
The client can then use the schema to form a proper request when sending data to your API. Whether a machine client can figure out what to put into each attribute is a different story. One option is to use names that conform to a standard e.g. we could use the same attribute names as ID3v2 tags in MP3 files.
Schemas are particularly useful for (partially) generated clients that have human users. It's quite straightforward to write a piece of code that generates a form from a schema so that the human user can fill it. We'll show this in the last exercise of the course. Below is an example of a POST method control with schema:
{
    "@controls": {
        "mumeta:add-artist": {
            "href": "/api/artists/",
            "method": "POST",
            "encoding": "json",
            "title": "Add a new artist",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {
                        "description": "Artist name",
                        "type": "string"
                    },
                    "location": {
                        "description": "Artist's location",
                        "type": "string"
                    },
                    "formed": {
                        "description": "Formed",
                        "type": "string",
                        "format": "date"
                    },
                    "disbanded": {
                        "description": "Disbanded",
                        "type": "string",
                        "format": "date"
                    }
                },
                "required": [
                    "name",
                    "location"
                ]
            }
        }
    }
}
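As a rough illustration of how a client might use such a schema, the sketch below checks a payload against the "required" list and derives a list of form fields from "properties". Both helper names are hypothetical, and only these two schema keywords are considered. A real client would more likely hand the schema to a JSON Schema validation library.

```python
# Sketch only: the helper names are hypothetical, and only the "required"
# and "properties" keywords of the schema are looked at.
def missing_fields(schema, payload):
    """Return the required attribute names that the payload does not have."""
    return [name for name in schema.get("required", []) if name not in payload]


def form_fields(schema):
    """Return (name, label, is_required) tuples that a form generator could render."""
    required = set(schema.get("required", []))
    return [
        (name, props.get("description", name), name in required)
        for name, props in schema.get("properties", {}).items()
    ]


# The schema from the "mumeta:add-artist" control above
schema = {
    "type": "object",
    "properties": {
        "name": {"description": "Artist name", "type": "string"},
        "location": {"description": "Artist's location", "type": "string"},
        "formed": {"description": "Formed", "type": "string", "format": "date"},
        "disbanded": {"description": "Disbanded", "type": "string", "format": "date"},
    },
    "required": ["name", "location"],
}

missing = missing_fields(schema, {"name": "Scandal"})
fields = form_fields(schema)
```

With only a name in the payload, missing contains "location", so the client knows the request would be rejected before sending it.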
Schemas can also be used for resources that use query parameters. In this case they will describe the available parameters and the values that are accepted. As an example we can add a query parameter that affects how the all albums collection is sorted. Here's the "mumeta:albums-all" control with the schema added. Note also the addition of "isHrefTemplate", and that "type": "object" is omitted from the schema.
{
    "@controls": {
        "mumeta:albums-all": {
            "href": "/api/albums/?sortby={sortby}",
            "title": "All albums",
            "isHrefTemplate": true,
            "schema": {
                "properties": {
                    "sortby": {
                        "description": "Field to use for sorting",
                        "type": "string",
                        "default": "title",
                        "enum": ["artist", "title", "genre", "release"]
                    }
                },
                "required": []
            }
        }
    }
}
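A client could resolve such a templated control roughly as follows. This is a simplified sketch: it assumes the template is written as /api/albums/?sortby={sortby}, handles only plain {name} placeholders rather than the full URI Template syntax, and the function name resolve_href is our own.

```python
from urllib.parse import quote


# Simplified sketch: handles only plain "{name}" placeholders, not the full
# URI Template syntax. resolve_href is a hypothetical helper name.
def resolve_href(control, **params):
    """Fill in a templated href, falling back to defaults from the schema."""
    href = control["href"]
    if not control.get("isHrefTemplate"):
        return href
    properties = control.get("schema", {}).get("properties", {})
    values = {}
    for name, props in properties.items():
        value = params.get(name, props.get("default"))
        if value is None:
            raise ValueError("no value or default for {}".format(name))
        if "enum" in props and value not in props["enum"]:
            raise ValueError("{!r} is not accepted for {}".format(value, name))
        # Percent-encode the value so that it is safe inside a query string
        values[name] = quote(str(value), safe="")
    return href.format(**values)


control = {
    "href": "/api/albums/?sortby={sortby}",
    "title": "All albums",
    "isHrefTemplate": True,
    "schema": {
        "properties": {
            "sortby": {
                "description": "Field to use for sorting",
                "type": "string",
                "default": "title",
                "enum": ["artist", "title", "genre", "release"],
            }
        },
        "required": [],
    },
}

by_artist = resolve_href(control, sortby="artist")
by_default = resolve_href(control)
```

The schema does double duty here: it supplies the default when the caller gives no value, and the enum lets the client reject an unsupported sort field before making a request.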
Client Example¶
In order to give you some idea about why we're going through all this trouble and adding a bunch of bytes to our payloads, let's consider a small example from the client's perspective. Our client is a submission bot that browses its local music collection and sends metadata to the API for artists/albums that do not exist there yet. Let's say its local collection is grouped by artists, then albums. Let's say it's currently examining an artist folder ("Miaou") that contains one album folder ("All Around Us"). The goal is to see if this artist is in the collection, and whether it has this album.
- bot enters the API and finds the artist collection by looking for a hypermedia control named "mumeta:artists-all"
- bot sends a GET to the artist collection using the hypermedia control's href attribute
- bot looks for an artist named "Miaou" but doesn't find it
- bot looks for "mumeta:add-artist" hypermedia control
- bot compiles a POST request using the control's href attribute and the associated JSON schema
- after sending the POST request, the bot discovers the artist's address from the response's Location header
- bot sends a GET to the address it received
- from the artist representation the bot looks for the "mumeta:albums-by" hypermedia control
- bot sends a GET to the control's href attribute, receiving an empty album collection
- since the album is not there, bot looks for "mumeta:add-album" control
- bot compiles a POST request using the control's href attribute and the associated JSON schema
The important takeaway from this example is that the bot doesn't need to know any URIs besides /api/. For everything else it has been programmed to look for link relations. All the addresses it visits are parsed from the responses it gets. They could be completely arbitrary and the bot would still work. Depending on the bot's AI it can survive quite drastic API changes (for example when it GETs the artist representation and finds a bunch of controls, how exactly has it been programmed to follow "mumeta:albums-by"?)
One really cool thing about hypermedia APIs is that a generic client can usually be used to browse any API, as long as the API is valid. The client will generate a human-usable web site by using hypermedia controls to provide links from one view to another, and schemas to generate forms.
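The first steps of the bot walkthrough above can be sketched without a running server. In this sketch FAKE_API is a made-up in-memory stand-in for the API, and fetch() stands in for sending a GET request; the addresses inside FAKE_API are sample values and the function names are hypothetical. The only address the bot code itself knows is /api/.

```python
# Sketch under stated assumptions: FAKE_API stands in for a real server and
# fetch() for an HTTP GET. Only the entry point address is known in advance.
FAKE_API = {
    "/api/": {
        "@controls": {
            "mumeta:artists-all": {"href": "/api/artists/"},
        }
    },
    "/api/artists/": {
        "items": [
            {
                "name": "Scandal",
                "@controls": {"self": {"href": "/api/artists/scandal/"}},
            }
        ],
        "@controls": {
            "mumeta:add-artist": {"href": "/api/artists/", "method": "POST"},
        },
    },
}


def fetch(uri):
    """Stand-in for sending a GET request and parsing the JSON body."""
    return FAKE_API[uri]


def follow(document, rel):
    """GET the resource that the control with the given relation points to."""
    return fetch(document["@controls"][rel]["href"])


def find_artist(name):
    """Return the artist's URI, or the control to use for adding the artist."""
    entry = fetch("/api/")
    artists = follow(entry, "mumeta:artists-all")
    for item in artists["items"]:
        if item["name"] == name:
            return "found", item["@controls"]["self"]["href"]
    return "missing", artists["@controls"]["mumeta:add-artist"]


found = find_artist("Scandal")
missing = find_artist("Miaou")
```

For "Miaou" the bot ends up holding the "mumeta:add-artist" control, which tells it both where to send the new artist and which method to use.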
Hypermedia Profiles¶
By adding hypermedia we have managed to create APIs that machine clients can navigate once they have been taught the meaning of each link relation, and the meaning of each attribute in resource representations. But how exactly does the machine learn these things? This is an ongoing challenge for API development - for now one way is to educate the human developers by using resource profiles. Profiles describe the semantics of resources in human-readable format. This way human developers can transfer this knowledge to their client, or a human user of a client can use this knowledge when navigating the API.
What's in a Profile?¶
There's no universal consensus about what exactly should be in a profile, or how to write one. Regardless of how it's written, the profile should have semantic descriptors for attributes (of the resource representation) and protocol semantics for actions that can be taken (or a list of link relations associated with the resource). Collections don't necessarily have their own profiles, and in our example they don't - except for album, since it is both an item and a collection.
If your resource represents something that is relatively common, using attributes defined in a standard (or standard proposal) is recommended. If your entire resource representation can conform to a standard, all the better. You can look for standards in https://schema.org/. One important future step for our example API would be to use attributes from this schema for albums and tracks.
Distributing Profiles¶
Like link relations, information about your profiles should be accessible from somewhere. In our example we have chosen to distribute them as HTML pages from the server using the routing /profiles/{profile_name}/. Links to profiles can be inserted as hypermedia controls using the "profile" link relation. For example, to link the track profile from a track representation:
{
    "@controls": {
        "profile": {
            "href": "/profiles/track/"
        }
    }
}
Another possibility is to use the HTTP Link header in responses.
Link: <http://where.ever.the.server.is/profiles/track/>; rel="profile"
However this is somewhat more ambiguous. Our album resource is an example that actually should link to two profiles - album and track. For this reason we have included profiles as hypermedia controls, and for collection types we have included one with every item.
Implementing Hypermedia¶
Hypermedia is essentially a bunch of JSON that is added to GET responses. At its core this is a rather simple matter of adding more content to the dictionaries we get from serializing model instances. As the Mason syntax is quite verbose, simply hardcoding these additions to response body dictionaries is a fast lane to trouble town. In this section we will discuss how to be a bit more systematic when adding hypermedia to responses.
Subclass Solution¶
In Mason the root type of a hypermedia response is a JSON object which - as we have learned - is in most ways the equivalent of a Python dictionary. However if you go and define the entire response as a dictionary in each resource method separately, the likelihood of introducing inconsistencies is quite high. Furthermore the code becomes cumbersome to maintain. For any applications that produce JSON, a good development pattern is to create a dictionary subclass that includes a number of convenience methods that automatically manage integrity of the selected JSON format.
As stated before, our chosen hypermedia type for examples in this course is Mason. There are three special attributes in Mason JSON documents that we use commonly: "@controls", "@namespaces" and "@error". Just to give you an idea of what we're trying to avoid, here's how we would need to make a Mason document with one namespace and control with normal dictionaries:
body = artist.serialize()
body["@namespaces"] = {
    "mumeta": {
        "name": "/musicmeta/link-relations/#"
    }
}
body["@controls"] = {
    "mumeta:albums-by": {
        "href": api.url_for(AlbumCollection, artist=artist)
    }
}
Putting stuff like this - and usually in bigger numbers - into every resource method is incredibly messy. What we want to achieve is, instead, something like this:
body = MasonBuilder(**artist.serialize())
body.add_namespace("mumeta", "/musicmeta/link-relations/#")
body.add_control("mumeta:albums-by", api.url_for(AlbumCollection, artist=artist))
Without doubt this looks much cleaner. The MasonBuilder class would take care of details about how exactly to add the namespace and control into the resulting document. If something about that changed, making the change in the class would also apply the change to all resource methods. So, how does this look on the class itself? Something like this:
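class MasonBuilder(dict):

    def add_namespace(self, ns, uri):
        # Create the "@namespaces" attribute when the first namespace is added
        if "@namespaces" not in self:
            self["@namespaces"] = {}
        self["@namespaces"][ns] = {
            "name": uri
        }

    def add_control(self, ctrl_name, href, **kwargs):
        # Create the "@controls" attribute when the first control is added
        if "@controls" not in self:
            self["@controls"] = {}
        # Extra keyword arguments (e.g. method, title) become control attributes
        self["@controls"][ctrl_name] = kwargs
        self["@controls"][ctrl_name]["href"] = href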
Observe how MasonBuilder extends the Python dict class, so creating a MasonBuilder works exactly the same way as creating a dictionary with the dict class.
Implementation detail: if you have not seen **kwargs used before, this is a Python feature called packing/unpacking. It's a wildcard catch for keyword arguments given to the function/method when it's called: all such arguments will be packed into the kwargs dictionary. So when we call this method with method="POST" the kwargs will end up like this: {"method": "POST"}. This feature is also used in the dict __init__ method (which we inherit): you can give it keyword arguments to initialize it with a bunch of keys.
Because each object should have only one "@controls" and one "@namespaces" attribute, it makes sense to automatically create these when the first namespace/control is added. We can add a similar method for the "@error" attribute:
    def add_error(self, title, details):
        self["@error"] = {
            "@message": title,
            "@messages": [details],
        }
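To see what the builder produces, the following sketch repeats the three methods from above in one runnable piece and builds two small documents. The artist data and addresses are sample values borrowed from the other examples in this material, not output from a real server.

```python
import json


class MasonBuilder(dict):
    """Dictionary subclass with helpers for Mason's special attributes."""

    def add_namespace(self, ns, uri):
        if "@namespaces" not in self:
            self["@namespaces"] = {}
        self["@namespaces"][ns] = {"name": uri}

    def add_control(self, ctrl_name, href, **kwargs):
        if "@controls" not in self:
            self["@controls"] = {}
        self["@controls"][ctrl_name] = kwargs
        self["@controls"][ctrl_name]["href"] = href

    def add_error(self, title, details):
        self["@error"] = {
            "@message": title,
            "@messages": [details],
        }


# Keyword arguments initialize the builder just like they would a dict
body = MasonBuilder(name="Scandal", location="Osaka, JP")
body.add_namespace("mumeta", "/musicmeta/link-relations/#")
body.add_control("mumeta:albums-by", "/api/artists/scandal/albums/")
body.add_control("edit", "/api/artists/scandal/", method="PUT")

error = MasonBuilder(resource_url="/api/artists/miaou/")
error.add_error("Not found", "No artist was found with the given name")

# A MasonBuilder is still a dict, so json.dumps accepts it directly
serialized = json.dumps(body, indent=4)
```

Because the builder is a plain dict underneath, nothing else in the code needs to know about it: serialization and item access work unchanged.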
You can download the entire class with docstrings added from below. If you're using the more elaborate project structure, this class is something that definitely belongs in the utils.py file and should be imported to other modules with from sensorhub.utils import MasonBuilder. The file below also includes utility methods for adding PUT, POST, and DELETE controls.
Since items in a collection type resource can have their own controls, you should construct them as MasonBuilder instances instead of dictionaries. This way you can add controls to them just as effortlessly as you can to the root object. Let's take an example of how to add the very important "self" relation to each artist in the artist collection resource representation.
body = MasonBuilder(items=[])
for artist in Artist.query.all():
    item = MasonBuilder(artist.serialize(short_form=True))
    item.add_control("self", api.url_for(ArtistItem, artist=artist))
    body["items"].append(item)
Note: instead of passing the serialize method's result to the MasonBuilder constructor like we do in this example, you can change the serialize method itself to initialize a MasonBuilder instance instead of a normal dictionary.
API Specific Subclasses¶
While the builder class we gave you goes quite a long way, making an API specific subclass can reduce the amount of boilerplate code in view methods a little bit more. For instance, to create a control for creating artist resources with POST, a function call like this is required:
body.add_control_post(
    "mumeta:add-artist",
    "Add a new artist",
    api.url_for(ArtistCollection),
    Artist.json_schema()
)
If you have to put multiple such method calls into your view methods, they get very bulky. It would probably be better if we could simply do:
body.add_control_add_artist()
This of course requires that the gritty details of the add_control_post function call are hidden away somewhere else. A good way to hide these details is to subclass MasonBuilder, and put these convenience methods there.
class MusicMetaBuilder(MasonBuilder):

    def add_control_add_artist(self):
        self.add_control_post(
            "mumeta:add-artist",
            "Add a new artist",
            api.url_for(ArtistCollection),
            Artist.json_schema()
        )
This case is very simple because the control has no variables at all. But we can also take an example with variables, like the PUT method control for tracks:
    def add_control_edit_track(self, artist, album, disc, track):
        self.add_control_put(
            "Edit this track",
            api.url_for(
                TrackItem,
                artist=artist,
                album=album,
                disc=disc,
                track=track
            ),
            Track.json_schema()
        )
Now we can get a control for editing a track with a much simpler function call (all the variable values come from the view method's parameters).
body.add_control_edit_track(artist, album, disc, track)
With the gritty details carefully hidden away, the view method code will stay much more compact, making it much easier to see what the view method actually does. For instance, an artist item has quite a few controls, but the view method code stays quite neat:
class ArtistItem(Resource):

    def get(self, artist):
        body = MusicMetaBuilder(artist.serialize())
        body.add_namespace("mumeta", LINK_RELATIONS_URL)
        body.add_control("self", href=request.path)
        body.add_control("profile", href=ARTIST_PROFILE_URL)
        body.add_control("collection", href=api.url_for(ArtistCollection))
        body.add_control_albums_all()
        body.add_control_albums_by(artist)
        body.add_control_edit_artist(artist)
        body.add_control_delete_artist(artist)
        return Response(json.dumps(body), 200, mimetype=MASON)
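The add_control_post and add_control_put helpers called above come from the downloadable file, which is not reproduced here. The following is only a sketch of how they might look. The signatures are inferred from the calls shown in this section, the "edit" and "mumeta:delete" relations and the "encoding" attribute are taken from the example documents in this material, and add_control_delete is our guess at the third helper. The actual class may differ in details.

```python
# Assumption: signatures are inferred from the calls shown in this section;
# the downloadable class may differ in details.
class MasonBuilder(dict):

    def add_control(self, ctrl_name, href, **kwargs):
        if "@controls" not in self:
            self["@controls"] = {}
        self["@controls"][ctrl_name] = kwargs
        self["@controls"][ctrl_name]["href"] = href

    def add_control_post(self, ctrl_name, title, href, schema):
        # POST controls need a relation name because there can be several
        self.add_control(
            ctrl_name, href,
            method="POST", encoding="json", title=title, schema=schema
        )

    def add_control_put(self, title, href, schema):
        # PUT controls always use the standard "edit" relation
        self.add_control(
            "edit", href,
            method="PUT", encoding="json", title=title, schema=schema
        )

    def add_control_delete(self, title, href):
        # DELETE needs no schema since the request has no body
        self.add_control("mumeta:delete", href, method="DELETE", title=title)


# Trimmed-down sample schema, for illustration only
ARTIST_SCHEMA = {
    "type": "object",
    "properties": {"name": {"description": "Artist name", "type": "string"}},
    "required": ["name"],
}

body = MasonBuilder(name="Scandal")
body.add_control_put("Edit this artist", "/api/artists/scandal/", ARTIST_SCHEMA)
body.add_control_delete("Delete this artist", "/api/artists/scandal/")
```

Each helper is a thin wrapper around add_control, so the rules about how a control is stored still live in exactly one place.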
Responses and Errors¶
We briefly discussed Flask's response object in the Resource Locator task where we used it to set custom headers. We have now arrived at a stage where we should actually be using it for all responses. This is largely because we need to announce the content type of our responses, and this is done by using the mimetype keyword argument. Because we're using Mason, we need to set it to "application/vnd.mason+json". Since this will be repeated in every GET method, it'd be wise to make a constant of it (i.e. MASON = "application/vnd.mason+json"). From now on a typical 200 response would look like:
return Response(json.dumps(body), 200, mimetype=MASON)
We went back to using json.dumps because Response takes the response body as a string. We can also change all 201 and 204 responses accordingly. We already learned how to do the 201 response with Location header, and a 204 response is even simpler:
return Response(status=204)
We have now resolved issues regarding responses in the 200 range (i.e. successful operations). What about errors in the 400 range? Mason also defines what errors should look like. In fact, we actually already implemented the add_error method into our MasonBuilder dictionary subclass. However, even with that, returning an error becomes a multiline effort, and we don't really want that because most of it is boilerplate and resource methods typically return errors at multiple points of their execution. Let's make a convenience function for generating errors:
def create_error_response(status_code, title, message=None):
    resource_url = request.path
    body = MasonBuilder(resource_url=resource_url)
    body.add_error(title, message)
    body.add_control("profile", href=ERROR_PROFILE)
    return Response(json.dumps(body), status_code, mimetype=MASON)
This generates a Mason error message with a title, and one message with more description about the problem. It also puts the resource URL into the response body, just in case the client forgot what it was trying to do (sometimes actually relevant, e.g. asynchronous use cases). Now instead of writing all that whenever an error is encountered in a resource method, we can just write:
return create_error_response(404, "Not found", "No sensor was found with the given name")
Static Parts of Hypermedia¶
In addition to generating hypermedia representations for resources, a fully functional hypermedia API should also serve some static content. Namely: link relations and resource profiles. Also if you have particularly large schemas and would rather serve them separately from resource representations, these should also be served as static content. For profiles and link relations, you can send them out as static files.
As Static Files¶
If your project doesn't have a static folder yet, now's the time to create one. It's also recommended to create some subfolders to keep things organized, e.g.
static
├── profiles
└── schema
In order to use a static folder, it must be registered with Flask. This is done when initializing the app:
app = Flask(__name__, static_folder="static")
Static views are routed with @app.route. If you are storing profiles and such locally as HTML files, you can implement these views quite easily by using Flask's send_from_directory function which sends the contents of a file as the response body - you should add it to your growing from flask import line. With profiles you can use one route definition and view function for all the profiles, like this:
@app.route("/profiles/<resource>/")
def send_profile_html(resource):
    return send_from_directory(app.static_folder, "{}.html".format(resource))
The send_from_directory function is convenient enough that it will send a 404 response if the file is not found. Because link relation descriptions are not particularly lengthy they can be gathered into a single file, served in a similar manner:
@app.route("/sensorhub/link-relations/")
def send_link_relations_html():
    return send_from_directory(app.static_folder, "links-relations.html")
If you are using schema files, you can send them out in a similar manner. You will also need these URLs often in your code since almost all responses will include the namespace which requires the link relation URL, and likewise all resource representations have at least one profile link. Therefore you should probably at least introduce them as constants in your code, e.g.
SENSOR_PROFILE = "/profiles/sensor/"
MEASUREMENT_PROFILE = "/profiles/measurement/"
LINK_RELATIONS_URL = "/sensorhub/link-relations/"
If you are using the more elaborate project structure, consider putting these into their own file, e.g. constants.py.
Documenting Hypermedia¶
So we documented our API with Swagger. Then we self-documented the API with hypermedia. Now the two are no longer in sync, so the final step is to update the OpenAPI documentation to match the new and improved hypermedia API we have built. Frankly there isn't much to do, just need to update the examples in all GET method responses.
Snatching Bodies with Requests¶
So basically what we want to have for, e.g. the documentation of the GET method for a single artist in our API, looks like this:
parameters:
- $ref: '#/components/parameters/artist/'
responses:
  '200':
    content:
      application/vnd.mason+json:
        example:
          '@controls':
            collection:
              href: /api/artists/
            edit:
              encoding: json
              href: /api/artists/scandal/
              method: PUT
              schema:
                properties:
                  disbanded:
                    description: Disbanded
                    format: date
                    type: string
                  formed:
                    description: Formed
                    format: date
                    type: string
                  location:
                    description: Artist's location
                    type: string
                  name:
                    description: Artist name
                    type: string
                required:
                - name
                - location
                type: object
              title: Edit this artist
            mumeta:albums-all:
              href: /api/albums/?sortby={sortby}
              isHrefTemplate: true
              schema:
                properties:
                  sortby:
                    default: title
                    description: Field to use for sorting
                    enum:
                    - artist
                    - title
                    - genre
                    - release
                    type: string
                required: []
                type: object
              title: All albums
            mumeta:albums-by:
              href: /api/artists/scandal/albums/
            mumeta:delete:
              href: /api/artists/scandal/
              method: DELETE
              title: Delete this artist
            profile:
              href: /profiles/artist/
            self:
              href: /api/artists/scandal/
          '@namespaces':
            mumeta:
              name: /musicmeta/link-relations#
          disbanded: null
          formed: '2006-08-01'
          location: Osaka, JP
          name: Scandal
          unique_name: scandal
  '404':
    description: The artist was not found
Now that's a handful. There is no way we are writing anything like this manually into the documentation files. Even if we replace the schemas with references, there are so many details here that it's way too easy to forget something. Luckily, just like with schemas earlier, we can just pull the examples from our own code. Not quite as directly, but close enough.
Regardless of what your API is, the first step is to populate the database with enough data to have examples for everything. Then you can simply go through your routes with requests and use PyYAML to dump the responses into YAML format. Then you can just place the example wherever you need it. Here's the basic way to do it:
import os.path

import requests
import yaml

SERVER_ADDR = "http://localhost:5000/api"
DOC_ROOT = "./doc/"
DOC_TEMPLATE = {
    "responses": {
        "200": {
            "content": {
                "application/vnd.mason+json": {
                    "example": {}
                }
            }
        }
    }
}

# Fetch a real response from the running API and use it as the example
resp_json = requests.get(SERVER_ADDR + "/artists/scandal/").json()
DOC_TEMPLATE["responses"]["200"]["content"]["application/vnd.mason+json"]["example"] = resp_json
with open(os.path.join(DOC_ROOT, "artist/get.yml"), "w") as target:
    # Dump the whole template so that the example ends up under the right keys
    target.write(yaml.dump(DOC_TEMPLATE, default_flow_style=False))
A small note on folder naming in this example. The MusicMeta API defines endpoint names for resources because it uses the same resource class for multiple endpoints (to separate single artist and VA albums). So the doc folder naming is based on endpoint names instead of resource class names. Resource class name in lowercase is simply the default endpoint name assigned by Flask Restful if none is given.
After running this for all resources, you'd then simply add the remaining details like parameters and error codes. With this as a base operation you could quite easily build your own automation machinery that uses e.g. your API test database, and updates all examples in all documentation files. Or, now that you understand the basic process, you could look into libraries that provide this kind of automation.