Auxiliary API Example¶
This page shows you a short example of adding another API to the SensorHub system. This auxiliary API is used for retrieving measurement batches from the main API based on some criteria (in this example, a timestamp interval). This example combines ideas from Exercise 4 with some API server code from previous exercises.
The API¶
This is a very simple REST API with only two resources: sensor collection and measurement collection. The latter is a search type resource: its exact contents are defined by its two query parameters from and to. The API has only three routes:
/api/
/api/sensors/
/api/sensors/{sensor_name}/measurements/
This API does not have its own database; it simply gets its data from the main API. The sensor collection resource is almost an exact copy of the same collection in the main API. The only difference is in which hypermedia controls are included in the resource representation. Each sensor in the list has one control: the GET method for retrieving measurements between two timestamps. The sensors do not even have self, because they are not managed in any way through this API. In order to not flood the main API with needless API calls, this auxiliary API should do some caching. For the purposes of this example, we simply cache for a short while. This helps because if a client requests the same resource within one session, our API probably has it cached.
Implementation¶
In this section we'll go through the main points of the implementation. The full example can be downloaded below:
Sensors Resource¶
Let's start by looking at the simpler scenario: showing the list of available sensors. First, the resource class itself, to provide context. By this point there should not be anything particularly shocking about it.
class SensorCollection(Resource):

    @cache.cached(timeout=10)
    def get(self):
        api_data = get_sensors()
        resp_data = MasonBuilder(
            items=[]
        )
        resp_data.add_namespace("measag", LINK_RELATIONS_URL)
        resp_data.add_control("self", api.url_for(SensorCollection))
        for sensor in api_data["items"]:
            item = MasonBuilder(
                name=sensor["name"],
                model=sensor["model"],
                location=sensor["location"],
            )
            item.add_control(
                "measag:measurements",
                api.url_for(
                    MeasurementCollection,
                    sensor=sensor["name"]
                ) + "?from={from}&to={to}",
                isHrefTemplate=True,
                schema={
                    "properties": {
                        "from": {
                            "type": "string",
                            "format": "date-time",
                        },
                        "to": {
                            "type": "string",
                            "format": "date-time",
                        },
                    },
                    "required": ["from", "to"]
                }
            )
            resp_data["items"].append(item)
        return Response(json.dumps(resp_data), 200, mimetype=MASON)
As can be seen, we get the main API data and wrap it into a new resource representation with different hypermedia controls. We also use a different namespace, "measag" (measurement aggregate), as this is a different API after all. The cache timeout is set to 10 seconds. We don't want it to be too high because this API is not currently able to follow changes to sensors in real time. We still want some caching though, so the sensor list isn't fetched constantly.
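The cache object used by the decorator above comes from the Flask-Caching extension and is configured elsewhere in the application. As a rough, self-contained illustration of what view caching with a timeout does conceptually (this is a toy stand-in, not the extension's actual implementation, and it keys on the function rather than the request path):

```python
import functools
import time

def cached(timeout):
    """Toy stand-in for a timeout-based view cache: stores one
    result per function and reuses it until the entry expires."""
    def decorator(func):
        store = {}  # maps function name -> (expiry time, result)
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            entry = store.get(func.__name__)
            if entry and entry[0] > time.monotonic():
                return entry[1]  # still fresh, serve from cache
            result = func(*args, **kwargs)
            store[func.__name__] = (time.monotonic() + timeout, result)
            return result
        return wrapper
    return decorator

calls = []

@cached(timeout=10)
def expensive():
    calls.append(1)  # record how many times the body actually runs
    return "sensor list"

expensive()
expensive()  # second call within 10 s is served from the cache
```

With the real extension, `@cache.cached(timeout=10)` achieves the same effect for the view function, keyed on the request path.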
In order to get data from the API, we use several utility functions that wrap the API calls and their exception handling in a nice way. The base utility function is call_api, shown below:
def call_api(s, path):
    try:
        resp = s.get(
            app.config["SENSORHUB_API_SERVER"] + path,
            timeout=app.config["SENSORHUB_API_TIMEOUT"],
        )
        return resp.json()
    except Exception:
        raise ServiceUnavailable
Here s is a Python requests session object, and path is the URI being accessed (without the host part). We also want to use a timeout here - otherwise this API could be stuck forever waiting for a response from the main API. For exception handling we simply convert all exceptions into 503 Service Unavailable to indicate that the API is currently down. In a real project we should at least log the error properly so that API developers can check where the problem lies. We also have a second utility function that is quite similar to a function used for the same purpose in Exercise 4.
def follow_rel(s, doc, link_rel):
    try:
        path = doc["@controls"][link_rel]["href"]
    except KeyError:
        raise ServiceUnavailable
    else:
        data = call_api(s, path)
        return data
This function will follow a link relation (link_rel) from a given document (doc). If there is a KeyError while trying to find the "href" attribute for the named link relation, we once again respond with Service Unavailable. This happens when the hypermedia format of the main API does not conform to what we were told in its documentation, and we have no way to automatically address the issue. Again, this should be logged. If we find the address, we just use the previous function to retrieve its contents.
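The logging advice above could be as simple as the sketch below. The logger name and the config stand-ins are invented for illustration; only the logging lines are new compared to the utility functions shown earlier.

```python
import logging
import requests
from werkzeug.exceptions import ServiceUnavailable

logger = logging.getLogger("measag")  # hypothetical logger name

# Hypothetical stand-ins for the Flask app config values used in the text.
API_SERVER = "http://localhost:8000"
API_TIMEOUT = 5

def call_api(s, path):
    """call_api from the text, extended with the error logging the
    text recommends instead of silently discarding the exception."""
    try:
        resp = s.get(API_SERVER + path, timeout=API_TIMEOUT)
        return resp.json()
    except Exception:
        # logger.exception records the full traceback for developers
        logger.exception("Request to %s failed", path)
        raise ServiceUnavailable
```

The client still only sees a 503 response; the traceback ends up in the server logs where it is actually useful.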
Finally we have the get_sensors function itself.
def get_sensors():
    with requests.Session() as s:
        s.headers.update({"Accept": "application/vnd.mason+json"})
        entry = call_api(s, "/api/")
        return follow_rel(s, entry, "senhub:sensors-all")
This function does two API calls: first the entry point, and then whatever URI the "senhub:sensors-all" control is pointing to. The end result is that the contents of the main API's sensor collection resource are returned. We are letting the ServiceUnavailable exception go through so that it gets returned to the client in case anything goes wrong. After this it is up to the get method we showed earlier to grab the list of sensors from the data, add its own hypermedia controls, and serve it to the client.
Measurements Resource¶
Our second and final resource is the measurement collection. This is a filter type resource that takes two timestamps from the query parameters and fetches all measurements from the main API for the designated sensor between the designated timestamps. Last time we saw the main server it used pagination for measurements. This means this new API needs to fetch pages until it has all the measurements it wants. The resource code itself is quite straightforward:

class MeasurementCollection(Resource):
    def get(self, sensor):
        try:
            start = datetime.fromisoformat(request.args["from"])
            end = datetime.fromisoformat(request.args["to"])
        except (KeyError, ValueError):
            raise BadRequest
        resp_data = MasonBuilder()
        resp_data["items"] = get_measurements(
            sensor, start, end
        )
        resp_data.add_namespace("measag", LINK_RELATIONS_URL)
        resp_data.add_control("self", api.url_for(MeasurementCollection, sensor=sensor))
        resp_data.add_control(
            "measag:sensors-all",
            api.url_for(SensorCollection)
        )
        return Response(json.dumps(resp_data), 200, mimetype=MASON)
We grab the start and end timestamps from the query parameters and then let get_measurements do the majority of the work. This is another utility function that makes use of call_api and follow_rel.
@cache.memoize(120)
def get_measurements(sensor, from_stamp, to_stamp):
    measurements = []
    with requests.Session() as s:
        s.headers.update({
            "Accept": "application/vnd.mason+json"
        })
        entry = call_api(s, "/api/")
        collection = follow_rel(s, entry, "senhub:sensors-all")
        for item in collection["items"]:
            if item["name"] == sensor:
                break
        else:
            raise NotFound
        sensor = follow_rel(s, item, "self")
        page = follow_rel(s, sensor, "senhub:measurements-first")
        while True:
            for item in page["items"]:
                stamp = datetime.fromisoformat(item["time"])
                if stamp > to_stamp:
                    return measurements
                elif stamp >= from_stamp:
                    measurements.append((item["time"], item["value"]))
            if "next" not in page["@controls"]:
                return measurements
            page = follow_rel(s, page, "next")
Like previously, we start from the entry point, then find the sensor we're looking for (or raise NotFound if it doesn't exist). From the sensor resource we grab the first page of measurements with the "senhub:measurements-first" control. After that we keep following the "next" control until either a) it's not present (indicating there are no more measurements), or b) we pass the to_stamp datetime value. A small optimization would be to check the last measurement of each page first: pages that fall entirely before the interval could then be skipped without comparing every individual timestamp to the bounds.
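That per-page optimization could be sketched as a helper like the one below. The helper name and its signature are invented for the sketch; it processes one page of measurements and relies on the same assumption as the loop above, namely that measurements are ordered by timestamp.

```python
from datetime import datetime

def collect_page(page, from_stamp, to_stamp, measurements):
    """Append the page's in-interval measurements to the list.
    Returns True when the interval's end has been passed, i.e.
    paging can stop. Pages that end before the interval starts
    are skipped without inspecting every item."""
    items = page["items"]
    if not items:
        return False
    last = datetime.fromisoformat(items[-1]["time"])
    if last < from_stamp:
        return False  # whole page precedes the interval, skip it
    for item in items:
        stamp = datetime.fromisoformat(item["time"])
        if stamp > to_stamp:
            return True  # passed the interval, stop paging
        if stamp >= from_stamp:
            measurements.append((item["time"], item["value"]))
    return False

# usage sketch with a fabricated page document
page = {"items": [
    {"time": "2023-01-01T00:00:00", "value": 1.0},
    {"time": "2023-01-01T01:00:00", "value": 2.0},
]}
found = []
done = collect_page(
    page,
    datetime(2023, 1, 1, 0, 30),
    datetime(2023, 1, 1, 2, 0),
    found,
)
```

The paging loop in get_measurements would then call this helper for each page and stop as soon as it returns True.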
We talked about the memoize cache briefly in Exercise 2, and here we see a suitable use case for it. As the default view caching only uses the request path as its cache key, it would be actively harmful here. Imagine two clients wanting different sets of measurements from the same sensor within a few seconds of each other. If we cached the view function result, the second client would get the result requested by the first client, because the caching doesn't account for the difference in query parameters. Memoize caching on the other hand does account for function arguments when forming the cache key, and the requested interval is given to the get_measurements function. By caching the result of this utility function instead of the view we get the desired behavior without messing around with forming cache keys ourselves.
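The difference boils down to what goes into the cache key. A path-only key collides for requests that differ only in their query strings, while a memoize-style key includes the function arguments. A toy illustration (not Flask-Caching's actual implementation):

```python
import functools

def memoize(func):
    """Toy argument-based cache: the key includes the function name
    and its positional arguments, so different intervals produce
    different cache entries (unlike a path-only view cache)."""
    store = {}
    @functools.wraps(func)
    def wrapper(*args):
        key = (func.__name__, args)
        if key not in store:
            store[key] = func(*args)
        return store[key]
    return wrapper

calls = []

@memoize
def get_measurements(sensor, start, end):
    calls.append((sensor, start, end))  # count real invocations
    return f"{sensor}:{start}-{end}"

get_measurements("test-1", "t0", "t1")
get_measurements("test-1", "t0", "t2")  # new interval -> new entry
get_measurements("test-1", "t0", "t1")  # served from the cache
```

A path-only cache would have served the first result to all three calls; here the two distinct intervals each get their own entry.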
With this our very small auxiliary API is complete. You can view the rest from the source code file. There shouldn't be anything you haven't already seen.
Discussion¶
Before moving on, some discussion points are worth bringing up. First of all, this auxiliary API is by no means even remotely a good idea in real life. It would be way, way faster to do the time interval lookups as database queries on the main API server, and cache them there for short periods of time instead. For it to be truly worth making a separate API, there would have to be some heavy processing on the data. However, such processing would only take attention away from the technical aspects of proxying data from a second API, which was the purpose of this short example.
Another point is the necessity of always going through the entry point and all intermediate steps when fetching data instead of using the correct URI directly. This was done intentionally, even if it might seem a bit excessive. Taking advantage of hypermedia is certainly useful here, there is no contest about that. But does the API change so often that another API that's part of the same ecosystem would need to go through the entry point every single time? Probably not.
A reasonable middle ground could be holding an "API map" on the auxiliary server that stores the instructions for how to make each request. As a hypermedia API is easy to crawl through and all of the controls are already in serialized form, making and storing such an API map would be rather effortless. The only remaining question is when to update the map. One option is to simply have a command line function for doing that; it would be run as part of the setup, and then by the admin whenever an API change happens.
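Building such an API map could be as simple as collecting each control's href by link relation, starting from the entry point. A minimal sketch (a real crawler would also fetch and merge the controls of nested resources, which is omitted here):

```python
import json

def build_api_map(doc):
    """Collect {link relation: href} from one hypermedia document's
    controls so later requests can skip the crawl. Nested resources
    would be fetched and merged in the same way in a real crawler."""
    return {rel: ctrl["href"]
            for rel, ctrl in doc.get("@controls", {}).items()}

# hypothetical entry point document for illustration
entry = {
    "@controls": {
        "self": {"href": "/api/"},
        "senhub:sensors-all": {"href": "/api/sensors/"},
    }
}

api_map = build_api_map(entry)
# the map serializes trivially, so it can be persisted to disk
serialized = json.dumps(api_map)
```

With the map in place, follow_rel could look up hrefs locally and only fall back to crawling when a relation is missing.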
We could also check the API version whenever an error is encountered, and if the version has changed, initiate the API map update process (returning Service Unavailable to clients in the meantime). Of course this requires that the main API lets us know what its version is, which would be simple enough to do with e.g. a header. Going even further, we could use pub/sub between the different services to transmit API version change events.
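The version check could be a one-liner once the main API advertises its version. The header name below is invented for illustration; the main API in this material does not actually send one.

```python
def map_needs_update(resp_headers, known_version):
    """Sketch: after an error, compare the version advertised by the
    main API against the version the API map was built for.
    X-API-Version is a hypothetical header name."""
    return resp_headers.get("X-API-Version") != known_version

# usage sketch with plain dicts standing in for response headers
stale = map_needs_update({"X-API-Version": "1.1"}, "1.0")
fresh = map_needs_update({"X-API-Version": "1.0"}, "1.0")
```

If the check indicates a stale map, the error handler would trigger the rebuild instead of just returning 503 forever.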