Implementing Hypermedia Clients¶
In this final exercise we're visiting the other side of the API puzzle: the clients. We've been discussing the advantages hypermedia has for client developers for quite a bit. Now it's time to show how to actually implement these mythical clients. This exercise material has two parts: in the first one we make a fully automated machine client (Python script). In the second part we'll go over another means of communication between services using task queues with RabbitMQ.
API Clients with Python¶
Our first client is a submission script that manages its local MP3 files and compares their metadata against ones stored in the API. If local files have data that the API is missing, it automatically adds that data. If there's a conflict it just notifies a human user about it and asks their opinion - this is not an AI course after all.
Learning goals: Using Python's requests library to make HTTP requests.
Preparations¶
Another exercise, another set of Python modules. We're using Requests for making API calls. Another module that is used in the latter half of the exercise is Pika, a RabbitMQ library for Python. Installation into your virtual environment as usual.
pip install requests
pip install pika
In order to test the examples in this material, you will need to run the MusicMeta API. By now we expect you to know how to do that. The code is unchanged from the end of the previous exercise, but is provided below for convenience.
Using Requests¶
The basic use of Requests is quite simple and very similar to using Flask's test client. The biggest obvious difference is that now we're actually making an HTTP request. Like the test client, Requests also has a function for each HTTP method. These functions also take similar arguments: URL as a mandatory first argument, then keyword arguments like headers, params and data (for headers, query parameters and request body respectively). It also has a keyword argument json as a shortcut for sending a Python dictionary as JSON. For example, to get the artists collection:
In [1]: import requests
In [2]: SERVER_URL = "http://localhost:5000"
In [3]: resp = requests.get(SERVER_URL + "/api/artists/")
In [4]: body = resp.json()
For another example, here is how to send a POST request, and read the Location header afterward:
In [5]: import json
In [6]: data = {"name": "PassCode", "location": "Osaka, JP"}
In [7]: resp = requests.post(SERVER_URL + "/api/artists/", json=data)
In [8]: resp.headers["Location"]
Out[8]: '/api/artists/passcode/'
Often when making requests using hypermedia controls, the client should use the method included in the control element. When doing this, using the request function is more convenient than using the method-specific ones. Assuming we have the control as a dictionary called ctrl:
In [10]: resp = requests.request(ctrl["method"], SERVER_URL + ctrl["href"])
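This pattern can be wrapped in a small helper. The sketch below is our own illustration and not part of the submission script: follow_control is a hypothetical name, and falling back to GET when the control has no "method" attribute relies on Mason treating the method as optional with GET as the default. It accepts any object with a request method, such as a requests.Session.

```python
from urllib.parse import urljoin


def follow_control(s, base_url, ctrl, **kwargs):
    """Send the request described by a hypermedia control.

    s is a session-like object (e.g. requests.Session); extra keyword
    arguments such as json= or headers= are passed through unchanged.
    """
    # A control without "method" is treated as a plain GET link (assumption
    # based on Mason's default).
    method = ctrl.get("method", "GET")
    # urljoin resolves a relative href like "/api/artists/" against the base.
    url = urljoin(base_url, ctrl["href"])
    return s.request(method, url, **kwargs)
```

With a real session this would be called as follow_control(s, SERVER_URL, ctrl), or with json=data for controls that expect a request body.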
Using Requests Sessions¶
Our intended client is expected to call the API a lot. Requests offers sessions, which can help improve the performance of the client by reusing TCP connections. A session can also set persistent headers, which is helpful for sending the Accept header, as well as authentication tokens for APIs that use them. Sessions should be used as context managers with the with statement to ensure that the session is closed.
In [1]: import requests
In [2]: SERVER_URL = "http://private-xxxxx-yourapiname.apiary-mock.com"
In [3]: with requests.Session() as s:
   ...:     s.headers.update({"Accept": "application/vnd.mason+json"})
   ...:     resp = s.get(SERVER_URL + "/api/artists/")
With this setup, when using the session object to send HTTP requests, all the session headers are automatically included. Any headers defined in the request method call are added on top of the session headers (taking precedence in case of conflict).
Basic Client Actions¶
The client code we're about to see makes some relatively sane assumptions about the API. First of all, it works with the assumption that link relations that are promised in the API resource state diagram are present in the representations sent by the API. Furthermore it trusts that the API will not send broken hypermedia controls or JSON schema. It will also have issues if new mandatory fields are added for POST and PUT requests (but we'll make it easy to update in this regard).
We're not going to show the full code here, only the parts that actually interact with the API (but you can download the full code later). Furthermore, while the client was tested with actual MP3 files, it might be easier for you to simply fake the tag data by creating a data class with the necessary attributes, e.g. (only in Python 3.7 or newer):
from dataclasses import dataclass

@dataclass
class Tag:
    title: str
    album: str
    track: int
    year: str
    disc: int
    disc_total: int
In older Python versions you need to make a normal class and write the __init__ method yourself (data classes implement this kind of __init__ automatically).
class Tag:

    def __init__(self, title, album, track, year, disc=1, disc_total=1):
        self.title = title
        self.album = album
        self.track = track
        self.disc = disc
        self.disc_total = disc_total
        self.year = year
Learning goals: How to navigate an API with an automated client, and send requests. Taking advantage of hypermedia to implement dynamic clients.
Client Workflow¶
The submission script goes through the local collection in the following order:
- check first artist
- check first album by first artist
- check each track on first album
- check second album by first artist
and so on, creating artists, albums and tracks as needed. It also compares data and submits differences. It trusts the local curator more than the API, always submitting the local side as the correct version. However when it doesn't have data for some field, it uses the API side value. Since MP3 files don't have metadata about artists, it uses "TBA" for the location field (because it is mandatory).
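The processing order can be sketched in plain Python before any API calls are involved. This is only an illustration under our own assumptions: group_tags and processing_order are hypothetical names, each tag object is assumed to also have an artist attribute, and the sample tags are made-up stand-ins for real MP3 tags.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_tags(tags):
    """Group tag objects as {artist: {album: [tags sorted by disc and track]}}."""
    collection = defaultdict(lambda: defaultdict(list))
    for tag in tags:
        collection[tag.artist][tag.album].append(tag)
    for albums in collection.values():
        for tracks in albums.values():
            tracks.sort(key=lambda t: (t.disc, t.track))
    return collection


def processing_order(collection):
    """Yield (kind, name) pairs in the order the script would check them."""
    for artist, albums in collection.items():
        yield ("artist", artist)
        for album, tracks in albums.items():
            yield ("album", album)
            for tag in tracks:
                yield ("track", tag.title)


# Made-up tags standing in for a local MP3 collection.
tags = [
    SimpleNamespace(artist="Example Artist", album="Example Album",
                    title="Second Track", track=2, disc=1),
    SimpleNamespace(artist="Example Artist", album="Example Album",
                    title="First Track", track=1, disc=1),
]
order = list(processing_order(group_tags(tags)))
```

In the real script each step in this order would be a check against the API (and a POST or PUT when needed) rather than a yielded tuple.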
GETting What You Need¶
The key principles of navigating a hypermedia API are:
- start at the entry point
- follow the link relations that will lead to your goal
This way your client doesn't give two hoots even if the API changed its URIs arbitrarily on a daily basis, as long as the resource state diagram remains unchanged. Our submission script needs to start at the artist collection. However, instead of starting the script with a GET to /api/artists/, it should start digging at the entry point /api/ and find the correct URI for the collection it's looking for by looking at the "href" attribute of the "mumeta:artists-all" control.
With this in mind, this is how the client should start its interaction with the API:
with requests.Session() as s:
    s.headers.update({"Accept": "application/vnd.mason+json"})
    resp = s.get(API_URL + "/api/")
    if resp.status_code != 200:
        print("Unable to access API.")
    else:
        body = resp.json()
        artists_href = body["@controls"]["mumeta:artists-all"]["href"]
This is the only time the entry point is visited. From now on we'll be navigating with the link relations of resource representations (starting with the artist collection's representation). With the artist collection at hand, we can start to check artists from the local collection one by one.
def check_artist(s, name, artists_href):
    resp = s.get(API_URL + artists_href)
    body = resp.json()
    artist_href = find_artist_href(name, body["items"])
    if artist_href is None:
        artist_href = create_artist(s, name, body["@controls"]["mumeta:add-artist"])

    resp = s.get(API_URL + artist_href)
    body = resp.json()
    albums_href = body["@controls"]["mumeta:albums-by"]["href"]
We've chosen to fetch the artist collection anew for each artist, on the off chance that the artist we're checking was added by another client while we were processing the previous one. The first order of business is to go through the "items" attribute and see if the artist is there. Remembering the non-uniqueness issue with artist names, our script falls back to the human user to make a decision in the event of finding more than one artist with the same name. Doing comparisons in lowercase avoids capitalization inconsistencies.
def find_artist_href(name, collection):
    name = name.lower()
    hits = []
    for item in collection:
        if item["name"].lower() == name:
            hits.append(item)
    if len(hits) == 1:
        return hits[0]["@controls"]["self"]["href"]
    elif len(hits) >= 2:
        return prompt_artist_choice(hits)
    else:
        return None
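The prompt_artist_choice function is not shown in this material. A minimal sketch of what it might look like follows. It is our own guess rather than the actual implementation: the ask and show parameters are hypothetical additions (defaulting to input and print) that make the function easy to test, and displaying the "location" attribute assumes collection items include it.

```python
def prompt_artist_choice(hits, ask=input, show=print):
    """Ask the user to pick one of several same-named artists.

    Returns the href of the chosen item's "self" control.
    """
    for number, item in enumerate(hits, start=1):
        # Location helps a human tell same-named artists apart (if present).
        show(f"{number}: {item['name']} ({item.get('location', 'unknown location')})")
    while True:
        answer = ask("Select artist by number: ")
        try:
            index = int(answer) - 1
        except ValueError:
            index = -1
        if 0 <= index < len(hits):
            return hits[index]["@controls"]["self"]["href"]
        show("Invalid choice, try again.")
```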
Assuming we find the artist, we can now use the item's "self" link relation to proceed into the artist resource. This is only an intermediate step that is needed (according to the state diagram) in order to find the "mumeta:albums-by" control for this artist. This is the resource we need for checking the artist's albums. We have skipped exception handling because we trust the API to adhere to its own documentation (also for the sake of brevity).
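If you don't want to trust the API quite that much, a small lookup helper can at least turn a missing control into a readable error. This is a sketch of one possible approach, not something the submission script does: get_control_href and MissingControlError are hypothetical names.

```python
class MissingControlError(Exception):
    """Raised when a representation lacks a control the client depends on."""


def get_control_href(body, rel):
    """Return the href of the control named rel from a Mason representation."""
    try:
        return body["@controls"][rel]["href"]
    except KeyError:
        # List what the representation does offer to make debugging easier.
        available = ", ".join(sorted(body.get("@controls", {}))) or "none"
        raise MissingControlError(
            f"Control '{rel}' is missing or has no href (available: {available})"
        ) from None
```

A call like get_control_href(body, "mumeta:albums-by") could then replace the direct dictionary lookups, at the cost of one extra function.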
Schematic POSTing¶
When something doesn't exist, the submission script obviously needs to send it to the API. We're skipping ahead a bit to creating albums and tracks. For both, the data comes from MP3 tags (for albums we take the first track's tag as the source). The POST request body for both can also be composed in a similar manner thanks to the JSON schema included in the hypermedia control. The basic idea is to go through properties in the schema and for each property:
- find the corresponding local value (i.e. MP3 tag field)
- convert the value into the correct format using the property's "type" and related fields (like "pattern" and "format" for strings)
- add the value to the message body using the property name
In the event that a corresponding value is not found, the client can check whether that property is required. If it's not required, it can be safely skipped. Otherwise the client needs to figure out (or ask a human user) how to determine the correct value. We've chosen not to implement this part in the example though. It would be relevant only if the API added new attributes to its resources.
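Although the example client skips this part, the check itself is short. The sketch below is our own illustration: build_body is a hypothetical name, and local values are passed in as a plain dictionary keyed by property name instead of being read from a tag object.

```python
def build_body(schema, local_values):
    """Compose a request body from local values according to a JSON schema.

    Returns (body, missing), where missing lists the required properties
    for which no local value was found.
    """
    body = {}
    missing = []
    required = set(schema.get("required", []))
    for name in schema["properties"]:
        value = local_values.get(name)
        if value is not None:
            body[name] = value
        elif name in required:
            # The client must ask a human (or give up) for these.
            missing.append(name)
        # Optional properties without a local value are simply skipped.
    return body, missing
```

If missing comes back non-empty, the client knows exactly which values it has to ask the human user for before sending anything.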
As a reminder of what it looks like, here's the "mumeta:add-album" control from the albums collection resource:
"mumeta:add-album": {
    "href": "/api/artists/scandal/albums/",
    "title": "Add a new album for this artist",
    "encoding": "json",
    "method": "POST",
    "schema": {
        "type": "object",
        "properties": {
            "title": {
                "description": "Album title",
                "type": "string"
            },
            "release": {
                "description": "Release date",
                "type": "string",
                "pattern": "^[0-9]{4}-[01][0-9]-[0-3][0-9]$"
            },
            "genre": {
                "description": "Album's genre(s)",
                "type": "string"
            },
            "discs": {
                "description": "Number of discs",
                "type": "integer",
                "default": 1
            }
        },
        "required": ["title", "release"]
    }
}
As it turns out, we only need one function for constructing POST requests for both albums and tracks:
def create_with_mapping(s, tag, ctrl, mapping):
    body = {}
    schema = ctrl["schema"]
    for name, props in schema["properties"].items():
        local_name = mapping[name]
        value = getattr(tag, local_name)
        if value is not None:
            value = convert_value(value, props)
            body[name] = value

    resp = submit_data(s, ctrl, body)
    if resp.status_code == 201:
        return resp.headers["Location"]
    else:
        raise APIError(resp.status_code, resp.content)
In this function, tag is an object. In real use it's an instance of tinytag.TinyTag, but it can also be an instance of the class Tag we showed earlier. The ctrl parameter is a dictionary picked from the resource's controls (e.g. "mumeta:add-album"). The mapping parameter is a dictionary with API side resource attribute names as keys and the corresponding MP3 tag fields as values. The knowledge of what goes where comes from reading the API's resource profiles. As an implementation note, we're also using the getattr function, which is how an object's attributes can be accessed using strings in Python (as opposed to normally being accessed as e.g. tag.album).
The mapping dictionary for albums looks like this, where keys are API side names and values are names used in the tag objects.
API_TAG_ALBUM_MAPPING = {
    "title": "album",
    "discs": "disc_total",
    "genre": "genre",
    "release": "year",
}
Since not all values are stored in the same type or format as they are required to be in the request, the convert_value function (shown below) takes care of conversion:
def convert_value(value, schema_props):
    if schema_props["type"] == "integer":
        value = int(value)
    elif schema_props["type"] == "string":
        if schema_props.get("format") == "date":
            value = make_iso_format_date(value)
        elif schema_props.get("format") == "time":
            value = make_iso_format_time(value)
    return value
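The two make_iso_format helpers are not shown in this material. Below is a rough sketch of what they might do. Everything about the input is an assumption on our part: we assume the tag's year is a string such as "2013", "2013-10" or "2013-10-02", that a missing month or day can be padded with 01, and that track length is given in seconds.

```python
import re


def make_iso_format_date(value):
    """Turn a tag date such as '2013' or '2013-10' into 'YYYY-MM-DD'."""
    match = re.fullmatch(r"(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?", str(value).strip())
    if match is None:
        raise ValueError(f"Unrecognized date value: {value!r}")
    year, month, day = match.groups()
    # Padding missing parts with 01 is an assumption of this sketch.
    return f"{year}-{month or '01'}-{day or '01'}"


def make_iso_format_time(value):
    """Turn a duration in seconds into 'HH:MM:SS'."""
    minutes, seconds = divmod(int(round(float(value))), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"
```

Whether padding is acceptable depends on the API's profile for the resource. If the release date must be exact, the client should ask the human user instead of guessing.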
Finally, notice how we have put submit_data as its own function? What's great about this function is that it works for all POST and PUT requests in the client. It looks like this:
def submit_data(s, ctrl, data):
    resp = s.request(
        ctrl["method"],
        API_URL + ctrl["href"],
        data=json.dumps(data),
        headers = {"Content-type": "application/json"}
    )
    return resp
Overall this solution is very dynamic. The client makes almost every decision using information it obtained from the API. The only thing we had to hardcode was the mapping of resource attribute names to MP3 tag field names. Everything else regarding how to construct the request is derived from the hypermedia control: what values to send; in what type/format; where to send the request and which HTTP method to use. Not only is this code resistant to changes in the API, it is also very reusable.
Of course if the control has "schemaUrl" instead of "schema", the additional step of obtaining the schema from the provided URL is needed, but is very simple to add.
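That extra step could look something like the sketch below. It is not part of the example client: get_schema is a hypothetical name, and base_url is passed in explicitly so that both absolute and relative "schemaUrl" values can be resolved.

```python
from urllib.parse import urljoin


def get_schema(s, ctrl, base_url):
    """Return the control's schema, fetching it from "schemaUrl" if not embedded."""
    if "schema" in ctrl:
        return ctrl["schema"]
    # urljoin leaves absolute URLs untouched and resolves relative ones
    # against base_url.
    resp = s.get(urljoin(base_url, ctrl["schemaUrl"]))
    resp.raise_for_status()
    return resp.json()
```

In create_with_mapping the line schema = ctrl["schema"] would then become schema = get_schema(s, ctrl, API_URL).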
To PUT or Not¶
When using dynamic code like the above example, editing a resource with PUT is a staggeringly similar act to creating a new one with POST. The bigger part of editing is actually figuring out if it's needed. Once again the core of this operation is the schema. One reason to use the schema instead of the resource representation's attributes is that the attributes can contain derived attributes that should not be submitted in a PUT request (e.g. the album resource does have an "artist" attribute, but the value cannot be changed).
In order to decide whether it should send a PUT request, the client needs to compare its local data against the data it obtained from the API regarding an album or a track. For comparisons to make sense, we need to once again figure out what the corresponding local values are, and convert them into the same type/format. This process is very similar to what we did in the create_with_mapping function above, and in fact most of its code can be copied into a new function called compare_with_mapping:
def compare_with_mapping(s, tag, body, schema, mapping):
    edit = {}
    change = False
    for field, props in schema["properties"].items():
        api_value = body[field]
        local_name = mapping[field]
        tag_value = getattr(tag, local_name)
        if tag_value is not None:
            tag_value = convert_value(tag_value, props)
            if tag_value != api_value:
                change = True
                edit[field] = tag_value
                continue

        edit[field] = api_value

    if change:
        try:
            ctrl = body["@controls"]["edit"]
        except KeyError:
            resp = s.get(API_URL + body["@controls"]["self"]["href"])
            body = resp.json()
            ctrl = body["@controls"]["edit"]
        submit_data(s, ctrl, edit)
Overall this process looks very similar. There's just the added step of checking whether a field needs to be updated, and setting the change flag to True when a difference is discovered. Note also that for albums we're doing this comparison for the album resource, but for tracks we are actually doing it to track data that's in the album resource's "items" listing. This way we don't need to GET each individual track unless it needs to be updated. When this happens, we actually need to first GET the track and then find the edit control from there. This explains why we're not directly passing a control to this function, and also why finding the control at the end has the extra step if an edit control is not directly attached to the object we're comparing.
Fun fact: if at a later stage the API developer chooses to add the edit control to each track item in the album resource, this code would find that, making the extra step unnecessary. Sometimes clients can apply logic to find a control that's not immediately available. Following the self link relation of an item in a collection is a good guess about where to find additional controls related to that item.
A final reminder about PUT: remember that it must send the entire representation, not just the fields that have changed. The API should use the request body to replace the resource entirely. That is why we're always adding the API side value to fields when we don't have a new value for that field.
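To make the "entire representation" point concrete, here is a stripped-down illustration of how the PUT body ends up containing every schema property. It is a simplified sketch, not the client's actual code: build_put_body is a hypothetical name, local values are given as a plain dictionary, and the sample data is made up.

```python
def build_put_body(schema, api_body, local_values):
    """Return (edit, changed); edit always holds every property in the schema."""
    edit = {}
    changed = False
    for field in schema["properties"]:
        local = local_values.get(field)
        if local is not None and local != api_body[field]:
            # The local side wins when it has a differing value.
            edit[field] = local
            changed = True
        else:
            # No local value (or no difference): keep the API side value
            # so the PUT still carries the full representation.
            edit[field] = api_body[field]
    return edit, changed


schema = {"properties": {"title": {}, "release": {}, "genre": {}, "discs": {}}}
api_body = {
    "title": "Example Album",
    "release": "2000-01-01",
    "genre": "Rock",
    "discs": 1,
    "artist": "Example Artist",  # derived attribute, not in the schema
}
edit, changed = build_put_body(schema, api_body, {"title": "EXAMPLE ALBUM", "discs": 1})
```

Here only the title differs, yet edit also carries the release, genre and discs values from the API side, while the derived "artist" attribute is left out because it is not in the schema.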
Closing Remarks and Full Example¶
Although this was a specific example, it should give you a good idea about how to approach client development in general when accessing a hypermedia API: minimize assumptions and allow the API resource representations to guide your client. When you need to hardcode logic, always base it on information from profiles. Always avoid working around the API - workarounds often rely on features that are not officially supported by the API, and may stop working at any time when the API is updated. Having a client that adjusts itself to the API is also respectful towards the API developer, making the job of maintaining the API much easier when there aren't clients out there relying on ancient/unintended features.
Here's the full example. If you want to run it without modifications, you need to actually have MP3 files with tag data that matches your Apiary documentation's examples. The submission script doesn't currently support VA albums.
