
Deadline

Deadline for delivering this exercise is 2024-02-19 23:59

Learning outcomes and material

During this exercise students will learn how to implement a RESTful API using the Flask web framework. Students will also learn how to test the API by reading the testing tutorial. We expect you to follow the same process to complete Deliverable 3.
When following through this material, keep your local copy of the sensorhub app updated. You will need it for the last task. You can download what we had at the end of exercise 1 below. Filling in the other parts is considered part of the last task.
app.py
You can also use this simple script to generate a few sensors and locations for testing purposes. Run it after setting the FLASK_APP environment variable.
db_init.py
import importlib
import random
import os
flask_app = os.environ.get("FLASK_APP")
app = importlib.import_module(flask_app)


with app.app.app_context():

    app.db.create_all()

    for idx, letter in enumerate("ABC", start=1):
        loc = app.Location(
            name=f"Location-{letter}",
            latitude=round(random.random() * 100, 2),
            longitude=round(random.random() * 100, 2),
            altitude=round(random.random() * 100, 2),
        )
        sensor = app.Sensor(
            name=f"Sensor-{idx}",
            model="test-sensor",
        )
        sensor.location = loc
        app.db.session.add(sensor)
        
    app.db.session.commit()

Introduction Lecture

This is an optional introduction lecture that adds some depth and visuals to the material in this exercise. As usual, it's not necessary for completing the exercise, but it can be interesting if you want to know a little bit more.

Lay Your Database to REST

Exposing your data through a (REST) API is a good method to achieve decoupling between the database server itself and client applications that use the data. As long as the resources exposed through the API do not change in any way, the database server's internal workings can be altered without any risk of unwanted side effects across the larger ecosystem. Need to change your entire database engine? Not a problem. Change relationships between tables? Again, not a problem. Need to implement a reasonable backup behavior for the database being unavailable? Perfectly achievable.

From Models to Resources

In REST the basic unit of data is called a resource. A resource is a data representation that is deemed interesting enough to clients that it is exposed through the API. It's worth noting that there doesn't have to be a 1:1 correspondence between model classes and resources, although that is sometimes the case. In fact, the ratio between model classes and resources is more often closer to 1:2.
Resources can be roughly categorised into two types: collection type resources and item type resources. A collection type resource contains data from multiple database rows. Typically it's either a representation of all data in a specific table, or a specified subset of that data. If there's more than one interesting subset, we will often end up with more than one collection resource per database table. Item type resources on the other hand contain everything about one item of interest. Once again, while often this equals a single row of data in a table, it is entirely possible that the resource is a representation that contains data from multiple tables.
It is also worth reiterating that a resource can be anything that might be of interest to clients. One interesting example is a connector type resource which doesn't have any content, but its existence means there's a link between other resources. Clients can then manipulate these connector resources when they wish to link or unlink objects. They're useful for APIs that expose many-to-many relationships because inherently we do not have a link verb in the basic RESTful vocabulary.

Addressing Important Things

The first key REST principle is called addressability. This principle states that everything of interest needs to be addressable with a uniform resource identifier, URI. When a client needs something from your API, it will send a request to the URI that belongs to the thing it needs, and it will always get the same resource as the response. One key thing to remember in REST: addresses are only assigned to resources, and they are essentially nouns. Actions, i.e. verbs, do not have addresses. They are represented through the uniform interface, i.e. using different HTTP methods on the same address. In other words, the URI is always the target of an action.
URIs are often designed as a tree structure that follows the data's inherent hierarchy. For instance, in our sensor example, measurements would be underneath the sensor that produced them. The URI template would then take a form along the lines of
/api/sensors/<sensor>/measurements/<measurement>/
Not only does this make the hierarchy of things visible to a human viewer, it also has implementation benefits that we will get into later. An interesting note about this particular example is that while it's very unlikely that a client will ever be interested in fetching an individual measurement, there might be a data cleaning client that is interested in deleting or modifying measurements. In these cases the delete/modify action would be targeted at the URI of the individual measurement.
This way of structuring URIs also naturally adds filtering by ownership: through the URI we're indicating we only want measurements from one particular sensor. Of course there are a lot of domains where a single thing can have multiple "owners". Luckily there is no limitation on how many URIs a particular endpoint can have. For instance, it's quite common for video games to have a publisher that's separate from the developer. This would give us two perfectly reasonable ways to list games: by publisher, and by developer.
/api/publishers/{publisher}/games/
/api/developers/{developer}/games/
While not a rule, you will also notice that quite often the URIs are patterned as /collection/item/collection/item and so on. This just follows naturally from the way data is usually organized. The selection of URIs gets more complicated as relationships between models increase in amount and variety. Ultimately it's then up to the API designer to think which ways of accessing the data are relevant to potential client applications.
URI parts generally point to things that exist, and usually don't point to attribute values. If your API needs to offer filtering via attribute values, it is better to implement this as a search using query parameters. A suitable URI for such a search type could be a separate collection type, or just the master collection that by default includes every resource of the associated type. So in our sensor example we could just have /api/measurements/ where filtering can be applied.
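As a rough illustration of filtering through query parameters, the sketch below parses a query string with the standard library and applies it to an in-memory list. The data, the sensor and min_value parameter names, and the filter_measurements helper are assumptions made up for this example; they are not part of the sensorhub API.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical in-memory stand-in for the measurements table.
MEASUREMENTS = [
    {"sensor": "sensor-1", "value": 20.5},
    {"sensor": "sensor-1", "value": 31.0},
    {"sensor": "sensor-2", "value": 25.2},
]


def filter_measurements(uri):
    """Return the measurements that match the query parameters in the URI."""
    query = parse_qs(urlsplit(uri).query)
    results = MEASUREMENTS
    if "sensor" in query:
        # parse_qs returns a list of values per key; use the first one
        results = [m for m in results if m["sensor"] == query["sensor"][0]]
    if "min_value" in query:
        threshold = float(query["min_value"][0])
        results = [m for m in results if m["value"] >= threshold]
    return results


filtered = filter_measurements("/api/measurements/?sensor=sensor-1&min_value=25")
print(filtered)
```

Without any query parameters the same function returns the whole collection, which matches the idea of a master collection that includes everything by default.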

Handling Important Things

The second REST principle is called uniform interface. This principle states that actions are based on HTTP methods, and their implementation must follow the HTTP standard. There's a clear advantage here: as long as this principle is followed, actions on resources always work in the same way in every API. Of course there's a downside: it also limits us to a rather small set of verbs. The HTTP methods that are commonly used in REST APIs are: GET, PUT, POST, and DELETE. In addition, PATCH is also sometimes used but unlike the other four, it lacks a clear standard and thus its implementation will always be API specific.
GET fetches a document from the API. This document will be the resource description of whatever is pointed at by the URI. Generally speaking, if there haven't been any changes in the data, then GET will always return the same document. The client can specify what kind of a document it wishes to receive through headers. While we're not going to dive too deeply into this topic, one such commonly used header is Accept. This header lists the document types that the client is ready to accept. There are also condition headers that will tell the API server to not bother sending the response body if those conditions are not met. GET is always idempotent because it doesn't perform any changes.
PUT replaces a resource with a new one, i.e. it quite literally puts a resource into the addressed URI. It is used mostly for modify operations, even though it technically also enables the creation of new resources (see POST below). It's important to realize that PUT is a complete replace. The request body must contain all attributes of the resource regardless of whether they are changed or not, and if an attribute is omitted, its previous value will be overwritten with a suitable empty value. In other words, PUT is not supposed to implement partial updates. Due to its nature as a replace, PUT is idempotent - no matter how many times it's performed, the result will always be whichever content was put in last.
POST is not as standardized in the HTTP specification itself, but in the context of REST it has an explicit meaning: it will create a new child resource that belongs to the addressed resource. It is most often linked to collection type resources, and is the preferred method of creating new resources. Addressing create actions to the collection type has a clear advantage: the client does not need to know what the final address of the resource is going to be. This is particularly relevant in contexts where the API resolves name conflicts when new resources are created - the client would have no way of knowing whether it's about to cause a name conflict, or how to resolve it. It is typical for POST responses to include a Location header to inform the client about the final URI its newly created resource was placed into. POST is usually not idempotent - if a POST request is sent twice, it will create two identical resources.
DELETE is perhaps the simplest one of the bunch. It deletes the addressed resource. There really isn't much else to say about it. DELETE is idempotent because once deleted, the addressed resource simply isn't there anymore.
Finally we have PATCH which applies some kind of a modification to the addressed resource. However, as already stated, there is no one standard for this method. The request body is supposed to describe the operation, but the syntax is not standardized. If you use PATCH, you need to specify your API's PATCH syntax somewhere in the API documentation. It should not be used as a partial PUT. If all you are doing is replacing attribute values with new ones, PUT should always be used because there isn't anything vague about it.
What about actions that do not match any of the above HTTP methods? If at all possible, you should always consider a resource-based solution instead of implementing custom actions to your existing resources. Instead of coming up with a new verb, think whether you can come up with a noun that does what you want with the four basic verbs instead. If you are absolutely stumped, the last option is to overload POST. This means specifying your own POST format that allows performing multiple different actions on a resource using POST. However, doing so will mean your API is no longer strictly RESTful.
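The idempotency claims made above for PUT, POST and DELETE can be sketched with a toy in-memory store. Everything here is an assumption for illustration: the store dictionary, the put/post/delete helpers and the numeric identifiers handed out by post stand in for a real API and database.

```python
import itertools

# Hypothetical in-memory "API": keys are URIs, values are resource bodies.
store = {}
_ids = itertools.count(1)


def put(uri, body):
    """Replace whatever is at the URI with the given body."""
    store[uri] = dict(body)


def post(collection_uri, body):
    """Create a new child resource under the collection and return its URI."""
    uri = f"{collection_uri}{next(_ids)}/"
    store[uri] = dict(body)
    return uri


def delete(uri):
    """Remove the resource if it exists; do nothing otherwise."""
    store.pop(uri, None)


# PUT twice: the store ends up in the same state as after one PUT.
put("/api/sensors/a/", {"name": "a", "model": "x"})
put("/api/sensors/a/", {"name": "a", "model": "x"})
count_after_put = len(store)

# POST twice: two separate resources with identical content are created.
first = post("/api/sensors/", {"name": "b", "model": "x"})
second = post("/api/sensors/", {"name": "b", "model": "x"})
count_after_post = len(store)

# DELETE twice: the second call changes nothing.
delete("/api/sensors/a/")
delete("/api/sensors/a/")
count_after_delete = len(store)

print(count_after_put, count_after_post, count_after_delete)
```

Note that idempotency is about the resulting state, not the response: a real API would typically answer the second DELETE with 404, but the data is the same as after the first one.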

Implementing REST APIs with Flask

This exercise material covers how to implement REST APIs using Flask-RESTful, a Flask extension that exists specifically for this purpose (in case you didn't figure that out from the name). The material has examples for both single file applications, and applications that use the project layout we proposed. For exercise tasks you need to submit single file applications. However, for your course project we recommend following the more elaborate project structure.

Introduction to Flask-RESTful

In the first part of the exercise we'll cover how to use the RESTful extension. In the examples we are going back to the sensorhub example from the first material. As a very brief recap, we had four key concepts: measurements, sensors, sensor locations and deployment configurations. We'll implement some of these as resources as this example goes on.
Learning goals: Learn the basics of Flask-RESTful: how to define resource classes and implement the HTTP methods of resources. Learn how to define routes for resources, and about reverse routing for building URIs.

Installing

Some new modules are needed for this exercise. Fire up your virtual environment and cast the following spells (the last one is not needed for this section but it will be for the next one):
pip install flask-restful
pip install Flask-SQLAlchemy
pip install flask-caching
pip install jsonschema

A Resourceful Class

Flask-RESTful defines a class called Resource. Much like Model was the base class for all models in our database, Resource is the base class for all our resources. A resource class should have a method for each HTTP method it supports. These methods must be named the same as the corresponding HTTP method, in lowercase. For instance a collection type resource will usually have two methods: get and post. These methods are very similar in implementation to view functions. However they do not have a route decorator - their route is based on the resource's route instead. Let's say we want to have two resource classes for sensors: the list of sensors, and then an individual sensor where we can also see its measurements. The resource class skeletons would look like this:
from flask_restful import Resource

class SensorCollection(Resource):

    def get(self):
        pass

    def post(self):
        pass


class SensorItem(Resource):

    def get(self, sensor):
        pass

    def put(self, sensor):
        pass

    def delete(self, sensor):
        pass
We're using SensorItem for individual sensors instead of just Sensor, mostly because we already used Sensor for the model and this would cause conflicts if everything was placed in a single file. If you want to pursue that path, simply place these classes inside your application module that has the models. However, if you followed the project layout material, these classes should be placed in a new module (e.g. sensor.py) inside the resources subfolder (also make sure there's a file called __init__.py in the folder - it can be empty, but must exist for Python to recognize the folder as a package).
The methods themselves are just like views. For example, here's a post method for SensorCollection that looks very similar to the last version of the add_measurement view in exercise 1.
    def post(self):
        if not request.json:
            abort(415)
            
        try:
            sensor = Sensor(
                name=request.json["name"],
                model=request.json["model"],
            )
            db.session.add(sensor)
            db.session.commit()
        except KeyError:
            abort(400)
        except IntegrityError:
            abort(409)
        
        return "", 201
Do note that all methods must have the same parameters because they all are served from the same resource URI! You can, however, have different query parameters between these methods. For example, this would be typical for resources that have some filtering or sorting support in their get method using query parameters.

Resourceful Routing

In order for anything to work in Flask-RESTful we need to initialize an API object. This object will handle things like routing for us. To proceed with our example, we'll show you how to create this object, and how to use it to register routes to the two resource classes. In a single file app the process is very simple: import Api from flask_restful, and create an instance of it.
from flask import Flask, request
from flask_restful import Api, Resource
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config["SQLALCHEMY_DATABASE_URI"] = "sqlite:///test.db"
app.config["SQLALCHEMY_TRACK_MODIFICATIONS"] = False
db = SQLAlchemy(app)
api = Api(app)
Assuming your resource classes are in the same file, you can now add routes to them by dropping these two lines at the end of the file.
api.add_resource(SensorCollection, "/api/sensors/")
api.add_resource(SensorItem, "/api/sensors/<sensor>/")
Now you could send GET and POST to /api/sensors/, and likewise GET, PUT and DELETE to e.g. /api/sensors/uo-donkeysensor-4451/. Not that they'd do much (except for POST to sensors collection which we just implemented).

Resourceful Inventory Engineer

This task continues the inventory engineer saga where we develop a very small inventory management service. Previously we chose to do it in a rather unorganized manner. This time we're going to update part of it to the RESTful age.
Learning goals: Implement a simple collection resource with two methods using Flask-RESTful.

Before you begin:
You may want to retrieve your application code from this task's predecessor. In particular the POST method will be identical to the view function for the route "/products/add/".

Resource: ProductCollection
  • Route: "/api/products/"
  • Methods:
    • GET - get list of all products (return JSON array with objects as items)
    • POST - creates a new product
GET: this method retrieves all products from the database and forms a list of them, where each product is a dictionary with the same keys as the database column names. It is effectively a simpler version of "/storage/" route's view function. Example response (formatted for reader's sanity):
[
    {
        "handle": "donkey plushie",
        "weight": 1.20,
        "price": 20.0
    }
]
Note that Flask-RESTful automatically converts Python data structures to JSON if a method returns them. So, unlike last time, do not use json.dumps when returning the data structure!
POST: creates a product and returns 201 if successful, various error codes if not. Literally the same function as previously. Simply drop the contents of your "/products/add/" view function to the resource class' post method and you're golden.

In summary, your code should do: Flask-RESTful initialization, one resource class with two methods, and route registration for the resource with api.add_resource.
Extra note: When using a more elaborate project structure, resources should be routed in the api.py file which in turn imports the resources from their individual files. Here's the sample api.py file, which assumes the resource classes were saved to sensor.py in the resources folder.
from flask import Blueprint
from flask_restful import Resource, Api

api_bp = Blueprint("api", __name__, url_prefix="/api")
api = Api(api_bp)

# this import must be placed after we create api to avoid issues with
# circular imports
from sensorhub.resources.sensor import SensorCollection, SensorItem

api.add_resource(SensorCollection, "/sensors/")
api.add_resource(SensorItem, "/sensors/<sensor>/")

@api_bp.route("/")
def index():
    return ""

Even More Resourceful Routing

When it comes to addressability, it only states that each resource must be uniquely identifiable by its address. It doesn't say it can't have more than one address. Sometimes it makes sense that the same resource can be found in multiple locations in the URI hierarchy. Consider the video game example from earlier. In that case both of these URI templates would make equal sense:
/api/publishers/{publisher}/games/{game}/
/api/developers/{developer}/games/{game}/
Both are different ways to identify the same resource. Luckily Flask-RESTful allows the definition of multiple routes for each resource. These would be routed as follows:
api.add_resource(GameItem, 
    "/api/publishers/<publisher>/games/<game>/",
    "/api/developers/<developer>/games/<game>/"
)
Do note that if you route like this, the resource's methods must now take into account the fact that they do not always receive the same keyword arguments: they will receive either publisher or developer. In this scenario, the GameItem resource could have a get method that starts like this:
class GameItem(Resource):

    def get(self, game, publisher=None, developer=None):
        if publisher is not None:
            game_obj = Game.query.join(Publisher).filter(
                Game.title == game, Publisher.name == publisher
            ).first()
        elif developer is not None:
            game_obj = Game.query.join(Developer).filter(
                Game.title == game, Developer.name == developer
            ).first()
You can also utilize multiple routes to implement several similar resources using the same resource class.

Reverse Routing

One more feature that we will soon be using a lot is the ability to generate a URI from the routing. Hardcoding URIs into your API code is an update disaster waiting to happen. It's much better to do reverse routing with api.url_for. Going back to our sensorhub example, this is how you should always retrieve the URI of a) sensors collection and b) specific sensor item:
collection_uri = api.url_for(SensorCollection)
sensor_uri = api.url_for(SensorItem, sensor="uo-donkeysensor-4451")
The function finds the first route that matches the resource class and given variables, or raises BuildError if no matching route is found. If found, the URI is returned as a string. Our two examples would generate:
/api/sensors/
/api/sensors/uo-donkeysensor-4451/

Resource Locator

In this task we're bringing our inventory manager one step closer to being a well-designed API. When new resources are created with POST, the response should always contain a Location header. This header's value tells the client where it can find the resource that was just created.
Learning goals: Learn about returning response objects and setting custom headers.

Before You Begin:
Grab the code from the previous exercise. You will now also need to define at least a dummy for the Product resource, and a route for it. The following resource class is sufficient for this task:
class ProductItem(Resource):
    
    def get(self, handle):
        return Response(status=501)
We'll leave the routing to you. The route should be "/api/products/<handle>/".

Response Objects:
In order to implement the required change, you need to import one more thing from Flask: Response. After this your import from Flask should be:
from flask import Flask, Response, request
A response object can be returned from a view (or HTTP method in resources). Its documentation can be found in the Flask API reference. Its most important keyword arguments are:
  • status - status code
  • mimetype - content type of the response body
  • headers - a dictionary of HTTP headers
When creating a response object, its first argument is the response body, if any. Alternatively you can use the data keyword. In this exercise we're going to need the headers argument, which takes a dictionary of header-value pairs.
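The keyword arguments described above can be tried out without a running application. The sketch below is only an illustration: the URI is a made-up placeholder that a real application would obtain with api.url_for, and the plain text body is there just to show the first argument and mimetype.

```python
from flask import Response

# Hypothetical URI; in a real application this would come from api.url_for.
new_uri = "/api/sensors/uo-donkeysensor-4451/"

# An empty 201 response with one custom header.
created = Response(status=201, headers={"Location": new_uri})

# The first positional argument is the response body.
greeting = Response("hello", status=200, mimetype="text/plain")

print(created.status_code, created.headers["Location"])
print(greeting.get_data(as_text=True), greeting.mimetype)
```

Both objects can be returned as-is from a view function or from a resource's HTTP method.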

Modified Resource: Product Collection
In order to complete this task, modify the POST method from the previous exercise to return a Response object with the Location header for the newly created product. Using api.url_for is advised.
Remember to test your solution by running your Flask app.
Please bear in mind that if you separate resources into multiple modules, using Flask's basic routing function makes life much easier as explained here.

Even More Resourceful Routing

One thing that might get a little bit tiresome quite fast is having to fetch the corresponding model instances for resources that are present in the URI variables. I.e., we have to do something like this at the beginning of most methods:
class SensorItem(Resource):

    def get(self, sensor):
        db_sensor = Sensor.query.filter_by(name=sensor).first()
        if db_sensor is None:
            raise NotFound
While it's not the worst thing in the world, it's still a few lines of boilerplate that get repeated often. In larger projects this can also lead to inconsistencies between parts of the system. Back in exercise 1 we introduced a way to convert numbers in the URI to floats automatically so that by the time they arrive at the view function as arguments, their type is already float. This was done in routing by adding type specifiers:
@app.route("/add/<float:number_1>/<float:number_2>/")
def add(number_1, number_2):
    ...
The route makes use of converters. They are small classes that convert strings to Python types when resolving a route, and Python types to strings when constructing a URI. There are a few basic types available, but most importantly, Flask allows us to register custom ones. In other words we can make converters that turn unique identifiers used in URIs to the corresponding model instance - and the other way around too. Converters are relatively simple classes that inherit Werkzeug's BaseConverter class, and define two methods: to_python and to_url. The former is used in routing, and the latter is used in reverse routing. Here's an example for sensors:
from werkzeug.exceptions import NotFound
from werkzeug.routing import BaseConverter

class SensorConverter(BaseConverter):
    
    def to_python(self, sensor_name):
        db_sensor = Sensor.query.filter_by(name=sensor_name).first()
        if db_sensor is None:
            raise NotFound
        return db_sensor
        
    def to_url(self, db_sensor):
        return db_sensor.name
Once we have this converter, it needs to be registered, and then we can use it in routing. When registering we define its name that is used to indicate when to convert a string to a sensor instance. The registration is done to the app's URL map. After that the converter is ready to be used in routing, also shown below:
app.url_map.converters["sensor"] = SensorConverter
api.add_resource(SensorItem, "/api/sensors/<sensor:sensor>/")
With this all view methods in the SensorItem resource will have the corresponding sensor instance right from the start. Also when using api.url_for to get the URL of a freshly created sensor, you can simply pass a model instance as the sensor keyword argument, and it will automatically use its name in the URI.
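To see both directions of a converter without a database or a Flask app, the sketch below exercises one directly through Werkzeug's URL map. The FakeSensor class, the SENSORS dictionary and the "sensoritem" endpoint name are assumptions that stand in for the real model, database query and resource; the converter logic mirrors the SensorConverter above.

```python
from werkzeug.exceptions import NotFound
from werkzeug.routing import BaseConverter, Map, Rule


class FakeSensor:
    """Hypothetical stand-in for the Sensor model class."""

    def __init__(self, name, model):
        self.name = name
        self.model = model


# Hypothetical stand-in for the database table, keyed by sensor name.
SENSORS = {
    "uo-donkeysensor-4451": FakeSensor("uo-donkeysensor-4451", "donkeysensor2000"),
}


class SensorConverter(BaseConverter):

    def to_python(self, sensor_name):
        # Used in routing: URL variable -> model instance
        sensor = SENSORS.get(sensor_name)
        if sensor is None:
            raise NotFound
        return sensor

    def to_url(self, sensor):
        # Used in reverse routing: model instance -> URL variable
        return sensor.name


url_map = Map(
    [Rule("/api/sensors/<sensor:sensor>/", endpoint="sensoritem")],
    converters={"sensor": SensorConverter},
)
adapter = url_map.bind("localhost")

# Routing: the view argument is already a sensor object.
endpoint, args = adapter.match("/api/sensors/uo-donkeysensor-4451/")

# Reverse routing: the sensor object is turned back into its name.
uri = adapter.build("sensoritem", {"sensor": args["sensor"]})

# An unknown name results in NotFound before any view code would run.
try:
    adapter.match("/api/sensors/no-such-sensor/")
    missing_raised = False
except NotFound:
    missing_raised = True

print(endpoint, args["sensor"].model, uri, missing_raised)
```

In a Flask application the same map is available as app.url_map, which is why registering the converter there is enough.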
Although converters are very neat, there are some caveats to using them. An obvious one is overhead, especially with long URIs that have a lot of intermediate steps in the hierarchy. On the one hand, a model instance of every resource on the way might not be needed, but on the other hand it would be more consistent to use converters for all of them. So will you pass just the identifier string to some parameters, but an actual converted model instance to others? Or will you stay uniform and introduce some overhead from getting model instances that are not needed? Choose your own adventure, as there is no correct choice here.
Converters are also not very useful with resources that do not have a globally unique identifier (other than database ID which should generally not be used) of their own. The problem arises from lack of context: a converter does not see other parts of the URI - it simply converts one part of it. So even if /<owner_name>/pets/<pet_name>/ would identify a unique pet resource, it's very unlikely that the API would make pet names by themselves unique. In this case a converter would not know how to retrieve a pet if it only has the name.
This is another problem with no clear-cut solution. You could accept that not all parts of the URI are converted, with the tradeoff of making your implementation less consistent. Or you could decide that everything gets its own globally unique identifier that is derived from a locally unique identifier. For instance, by adding a running counter when trying to use an identifier that is already taken. This way the first model instance to get a name would simply get the name (e.g. "doge"), but the next would get a counter added to it (e.g. "doge-1").
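The running counter idea can be sketched in a few lines. The make_unique_name helper and the taken set are hypothetical; in a real application the check would be a database query, and a unique constraint would still be needed to guard against two simultaneous requests.

```python
def make_unique_name(name, taken):
    """Return name as-is if free, otherwise name with the first free counter."""
    if name not in taken:
        return name
    counter = 1
    while f"{name}-{counter}" in taken:
        counter += 1
    return f"{name}-{counter}"


# Three pets that all want to be called "doge".
taken = set()
for _ in range(3):
    taken.add(make_unique_name("doge", taken))

print(sorted(taken))
```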
Overall, converters are a good tool for increasing consistency in your code, but their use is not always free. As with many other things, whether to make use of them or not depends on what you want to achieve.

Inventory Converter

Converters are pretty easy to implement and can be extremely handy. Let's make one for practice!
Learning goals: How to implement a converter class for Flask/Werkzeug

Before you begin:
You are going to need an application file that contains the Product model class. Using your app from the previous task is fine.

Class to implement: ProductConverter
  • Parent class: BaseConverter (from werkzeug.routing)
  • Methods:
    • to_python - converts a string URL variable to the desired Python type (i.e. Product instance)
    • to_url - converts a Python value to its corresponding URL variable value (i.e. Product to its handle)
The URL variable for products is their handle. Implement the methods that will find the corresponding Product instance, and extract a product's handle for URL.

Register your converter to your app's URL map and change your routing to take advantage of it.

Serial Modeling

Actually serializing models, but close enough.
Serialization is the process of turning an arbitrary object in program memory into a format that can be stored onto the disk, or transmitted over a network - often as a string. The latter is of particular importance for APIs. A very simple example of serialization would be turning a Python dictionary into a JSON string using json.dumps. For APIs serialization is most often needed when model instances need to be sent to the client as (part of) a resource. Since model instances are Python objects they cannot be serialized directly with just json.dumps, and typically some code is needed to turn a model instance into a serializable dictionary.
As this is a very commonly occurring problem, there's a number of tools available to automate the process. One such tool is Marshmallow, which you can look into if you're interested. In this material we are going to implement serialization from scratch, as it is not that huge of an endeavor for our simple use cases. When going this route, the biggest question to start with is: where to put serialization code.
We have basically two reasonable options. A view method is where serialization is needed (to produce the response), while a model class method would put it closer to the data, and make it easier to serialize the same type of model in the same way every time. Which option to pick depends mostly on how many different ways of serialization are needed per model class. If it's one or two, then a model class method makes more sense. In a simple case that's usually how it is: we have the full details version for the individual resource, and then maybe a limited details version for the collection resource, for each model class in the database.
For a simple example, a serialize method for sensors could look like this:
    def serialize(self):
        return {
            "name": self.name,
            "model": self.model
        }
There are also some details that need to be considered when dealing with columns that do not have an obvious serialized equivalent. The two most common ones are datetime columns and foreign key columns. The former is a bit simpler because it's only one value, so let's talk about that first. Datetimes are generally stored with the database engine's own datetime type, and on the Python side they are accessible as Python's datetime type. This type cannot be serialized automatically, and json.dumps will raise TypeError for a data structure that contains datetime objects. At this point you need to decide how your API serializes datetimes. If you don't have any particular format in mind, a good default is the ISO 8601 format, which can be obtained with datetime's isoformat method (the reverse conversion, fromisoformat, requires Python 3.7 or newer). Serialize method for measurements:
    def serialize(self):
        return {
            "time": self.time.isoformat(),
            "value": self.value
        }
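The behaviour described above can be checked with a short standard library sketch. The timestamp and value are arbitrary example data, not something taken from the course database.

```python
import json
from datetime import datetime

measurement_time = datetime(2024, 2, 19, 23, 59, 0)

# Passing the datetime object directly to json.dumps fails with TypeError.
try:
    json.dumps({"time": measurement_time})
    raised = False
except TypeError:
    raised = True

# Converting to an ISO 8601 string first makes the document serializable.
serialized = json.dumps({"time": measurement_time.isoformat(), "value": 21.5})

# The string can be turned back into a datetime with fromisoformat.
restored = datetime.fromisoformat(json.loads(serialized)["time"])
```

Because isoformat and fromisoformat are inverses for values like this, the same format can be used in both directions of the API.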

Embedded Serial Modeling

When it comes to representing foreign key columns, it's often relevant to consider what details should be shown in which resource. For instance, the location model in our example contains several details about the location, like its geolocation coordinates. Obviously all of these should be shown when the location resource itself is requested, but how much should be shown in the sensor resource? As usual, it all depends on what you want from the API. For our example, let's assume the only thing we want to show about the sensor's location in the sensor resource is the location's name. The name is also the location's unique identifier, so clients can use that to fetch more information about the location if it's relevant. With this, the sensor's serialize method could become
    def serialize(self):
        return {
            "name": self.name,
            "model": self.model,
            "location": self.location and self.location.name
        }
Implementation detail: because location is not mandatory, it is possible it will be None instead of a Location model instance. Using the and operator here will get rid of potential errors that would arise from trying to access the name attribute of None. Instead, the location value will simply become None.
Another approach would be to embed the serialize results of the location model while including a new optional argument to the serialize method. With this argument views and other serialize methods can control whether they want the long or short form of the referenced resource. The location's serialize would then be:
    def serialize(self, short_form=False):
        doc = {
            "name": self.name
        }
        if not short_form:
            doc["longitude"] = self.longitude
            doc["latitude"] = self.latitude
            doc["altitude"] = self.altitude
            doc["description"] = self.description
        return doc
and sensor's serialize (with the same keyword argument added to keep the methods consistent with each other):
    def serialize(self, short_form=False):
        return {
            "name": self.name,
            "model": self.model,
            "location": self.location and self.location.serialize(short_form=True)
        }
This same keyword argument is handy when serializing relationships for collection type resources as well.
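As a rough illustration of how the two methods work together, the sketch below uses plain Python classes as stand-ins for the Location and Sensor models (an assumption for the sake of a runnable example; the real classes are database models with more columns). It also shows what the and operator does when the sensor has no location.

```python
class Location:
    """Plain stand-in for the location model (hypothetical)."""

    def __init__(self, name, latitude=None, longitude=None):
        self.name = name
        self.latitude = latitude
        self.longitude = longitude

    def serialize(self, short_form=False):
        doc = {"name": self.name}
        if not short_form:
            doc["latitude"] = self.latitude
            doc["longitude"] = self.longitude
        return doc


class Sensor:
    """Plain stand-in for the sensor model (hypothetical)."""

    def __init__(self, name, model, location=None):
        self.name = name
        self.model = model
        self.location = location

    def serialize(self, short_form=False):
        return {
            "name": self.name,
            "model": self.model,
            # Evaluates to None when location is None, otherwise to the short form.
            "location": self.location and self.location.serialize(short_form=True),
        }


site = Location("test-site-a", latitude=65.06, longitude=25.47)
with_location = Sensor("test-sensor-1", "uo-test-sensor", site).serialize()
without_location = Sensor("test-sensor-2", "uo-test-sensor").serialize()
```

The embedded location only carries its name, while calling site.serialize() directly still produces the long form with coordinates.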

The Serializer

One key point in serializing data for collections is choosing what to include from each individual item. Sometimes there is way too much data in the items to include. Such is the case in this example for storing Japanese vocabulary and characters. For each word we want to link the related characters, but only their meaning - if the client is interested in other things like the various possible readings, they should GET the character itself.
Learning goals: How to implement serializers that support two forms.

Before you begin:
We have prepared a new database module for this task. Please find it below in the resources section. Make your modifications to this file. You are not allowed to touch any of the existing code, simply add the requested methods for each model class.
The module contains a populate_db function. You can call this function after creating the database to have a minimal amount of data to test your code with.

1st method to implement: Kanji.serialize
  • Parameters:
    • short_form - a Boolean that indicates whether to use the short or long form when serializing (default value False)
  • Returns:
    • Dictionary containing the serializable data
When the method is called with the short_form argument set to True, it should only return a dictionary with two keys: kanji and meaning. If the argument is omitted, or set to False, then the other three fields should be included in the dictionary as well.
Example from the given test database contents, in short form:
{
    "kanji": "配",
    "meaning": "distribute; spouse; exile; rationing"
}
and in long form
{
    "kanji": "配",
    "meaning": "distribute; spouse; exile; rationing",
    "kunyomi": "くば.る",
    "onyomi": "ハイ",
    "strokes": 10
}

2nd method to implement: Word.serialize
  • Parameters:
    • short_form - a Boolean that indicates whether to use the short or long form when serializing (default value False)
  • Returns:
    • Dictionary containing the serializable data
When the method is called with the short_form argument set to True, it should only return a dictionary that contains the following keys: written, reading and meaning. If the argument is omitted, or set to False, then the dictionary must also contain a list with the key kanji_list. This list must contain all of the kanji related to that word, in the shortened form (i.e. only kanji and meaning for each).
Example from the given test database contents, in short form:
{
    "written": "配列",
    "reading": "はいれつ",
    "meaning": "1) arrangement; disposition; 2) array (programming)"
}
and in long form
{
    "written": "配列",
    "reading": "はいれつ",
    "meaning": "1) arrangement; disposition; 2) array (programming)",
    "kanji_list": [
        {
            "kanji": "配",
            "meaning": "distribute; spouse; exile; rationing"
        },
        {
            "kanji": "列",
            "meaning": "file; row; rank; tier; column"
        }
    ]
}

Resources:
You can download the database module from below
kanji_db.py
With serializers and converters in use, the get method for sensors will become very simple:
class SensorItem(Resource):
    
    def get(self, sensor):
        return sensor.serialize()

Deserial Modeling

On the other side of the equation, you may also want to make a method for creating a model instance from a JSON document. This is something that is essentially needed whenever processing POST and PUT requests. It makes sense to put it in the same place as the serialize method. A deserialize method would take a Python dictionary, and construct a model instance from it. For this purpose, it's good to keep in mind that model instances can be initialized as empty, and then filled in later - checking for required fields is done on commit. This means it's better to make our deserialize method for the update case with PUT, and then apply it to an empty model when creating new instances with POST. Here's an example for locations:
    def deserialize(self, doc):
        self.name = doc["name"]
        self.latitude = doc.get("latitude")
        self.longitude = doc.get("longitude")
        self.altitude = doc.get("altitude")
        self.description = doc.get("description")
Note the two different ways of reading from the dictionary. A key lookup is used for mandatory columns, and get is used for optional columns. This way the intended nature of PUT is implemented correctly, i.e. if an optional field is not set in the request body, its existing value should be replaced with the appropriate empty value (None for all of the columns here). Meanwhile if a required column is missing, this method will raise a KeyError. However, we do not intend for that to happen, since we are going to validate requests with schemas as described next.
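The difference between the two lookup styles can be seen in a small sketch. The Location class here is a plain Python stand-in with fewer fields than the real model (an assumption for illustration), so that the example runs without a database.

```python
class Location:
    """Plain stand-in for the location model (hypothetical, fewer fields)."""

    def __init__(self):
        self.name = None
        self.latitude = None
        self.description = None

    def deserialize(self, doc):
        self.name = doc["name"]                     # mandatory: key lookup
        self.latitude = doc.get("latitude")         # optional: get
        self.description = doc.get("description")   # optional: get


# POST-like use: start from an empty instance and fill it in.
location = Location()
location.deserialize({"name": "test-site-a", "description": "Main lab"})
description_after_post = location.description

# PUT-like use: the omitted optional field is replaced with None.
location.deserialize({"name": "test-site-a", "latitude": 65.06})

# A missing mandatory field raises KeyError.
try:
    Location().deserialize({"latitude": 1.0})
    missing_name_raised = False
except KeyError:
    missing_name_raised = True
```

After the second call the description set by the first call is gone, which is the replace-everything behaviour expected from PUT.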

Dynamic Schemas, Static Methods

JSON schema is a JSON document that's used for defining the valid structure of another particular JSON document. It defines what attributes the document can have, what are their types, and what kinds of values they can take. It can also define which attributes are required. For instance, the following schema defines what a valid sensor document looks like:
{
    "type": "object",
    "required": ["name", "model"],
    "properties": {
        "name": {
            "description": "Sensor's unique name",
            "type": "string"
        },
        "model": {
            "description": "Name of the sensor's model",
            "type": "string"
        }
    }
}
JSON schemas are useful for validating incoming POST and PUT requests. However, they do have a nasty drawback: they are awfully verbose. A schema that's easily over ten lines of code is definitely something that must be written in only one place. The same schema is often referenced at least twice (in corresponding POST and PUT methods) so it should not be hardcoded into any single resource method. It's also hard to attach to a resource class because the POST method to create a resource and the PUT method to modify it are in separate places (the collection and the individual item respectively).
One of the more logical places for a method that produces the schema is the model class. This way it will be physically close to the code that defines the corresponding database structure. Furthermore, considering that we do not have a model instance in hand when validating an incoming POST request, it would be best to be able to call this method without one. In other words, making a static method serves this purpose rather elegantly. This example shows the method that we're adding to the Sensor model class.
    @staticmethod
    def json_schema():
        schema = {
            "type": "object",
            "required": ["name", "model"]
        }
        props = schema["properties"] = {}
        props["name"] = {
            "description": "Sensor's unique name",
            "type": "string"
        }
        props["model"] = {
            "description": "Name of the sensor's model",
            "type": "string"
        }
        return schema
Implementation detail: a static method is a method that can be called without an instance of the class and it also doesn't usually refer to any of the class attributes (that's what class methods are for). In other words it's actually a function that's just been slapped on a class to keep things more organized. It can be called as self.json_schema() from normal methods within the same class. From the outside it's called as Sensor.json_schema().
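The two ways of calling a static method can be tried out with a stripped-down class. The schema is shortened, and required_fields is a hypothetical helper added only to show the call from inside a normal method.

```python
class Sensor:
    """Stripped-down stand-in for the Sensor model class (hypothetical)."""

    @staticmethod
    def json_schema():
        return {"type": "object", "required": ["name", "model"]}

    def required_fields(self):
        # Hypothetical helper: inside a normal method the static method
        # is reachable through self.
        return self.json_schema()["required"]


# From the outside no instance is needed.
schema_from_class = Sensor.json_schema()

# From a normal method the same function is used via the instance.
fields_from_instance = Sensor().required_fields()
```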
If you want to go even further, you can even generate the schema from the model class itself. Here's a starting point if you want to look into it, or you could just write your own.
On the view side these schemas can be used in validation. In order to do this, two names are imported from the jsonschema module: from jsonschema import validate, ValidationError. The former is the function that performs validation against a schema, and the latter is the exception it raises, carrying details of why validation failed. An example of use can be found under the next heading.

PUTting It All Together

Below is an example of what a PUT method for sensors could look like after we have added all these convenience methods to model classes.
class SensorItem(Resource):

    def put(self, sensor):
        if not request.json:
            raise UnsupportedMediaType

        try:
            validate(request.json, Sensor.json_schema())
        except ValidationError as e:
            raise BadRequest(description=str(e))

        sensor.deserialize(request.json)
        try:
            db.session.add(sensor)
            db.session.commit()
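        except IntegrityError:
            # Undo the failed transaction before reporting the conflict.
            db.session.rollback()
            # Conflict already implies status 409; only the description is passed.
            raise Conflict(
                description="Sensor with name '{name}' already exists.".format(
                    **request.json
                )
            )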
        
        return Response(status=204)

POSTing it All Together

This task will be your final "exam" on the subject of API basics with Flask. This time we're asking you to extend the sensorhub project by implementing the interface for adding new measurements.
Learning goals: Making a POST method that uses JSON schema validation.

Before you begin:
Compile together all the sensorhub code from the previous exercise, and from this exercise so far. You will also need to import the draft 7 format checker from jsonschema in order to validate ISO datetime format. You may also want to import a few more exceptions from Werkzeug.
from jsonschema import validate, ValidationError, draft7_format_checker
from werkzeug.exceptions import NotFound, Conflict, BadRequest, UnsupportedMediaType

Class to modify: Measurement
The first part of this task is to complete the Measurement model class by adding two methods: deserialize and json_schema. These are not tested separately, but should be useful for implementing the second part of this task.
Method: Measurement.deserialize
  • Parameters:
    • dictionary containing data for setting values for a new measurement. The sensor associated to the measurement is coming as URL variable, so it is not sent in the json body.
This method turns a Python dictionary into suitable values for the model instance's fields. In particular the timestamp needs to be converted from a string representation to a Python datetime object. This can be achieved with the fromisoformat method of the datetime class. Note that this method only sets the attributes. Actually creating the instance, and committing it are done in the view method.
Method: Measurement.json_schema
  • Static method!
  • Returns:
    • dictionary that defines the JSON schema for measurements
This method should form a JSON schema as a dictionary, and return it. The schema needs to ensure that both time and value are required properties, and that they are of the correct type and format:
  • time - string with "date-time" format
  • value - number

New class: MeasurementCollection
This class will serve as the resource for getting the list of measurements from a sensor (GET), and for adding new measurements (POST). In this task you only need to implement the latter. We'll show the GET method later in the exercise material.
Method: MeasurementCollection.post
  • Parameters / route variables
    • sensor - this will indicate which sensor is adding measurements
  • Returns:
    • a Flask Response object (or exception)
This method needs to do the following things
  1. Check the request's media type.
    • if not "application/json" -> 415 (unsupported media type)
  2. Validate the json document
    • if not valid -> 400 (bad request) with the validation exception text as description
  3. Save the measurement in the database, linked to the correct sensor
  4. Return a 201 response with the Location header added. The URL of a measurement is in the Routing section below.
When validating the json, you need to give the extra keyword argument format_checker to the validate function. Use draft7_format_checker as this keyword argument's value. Note that this validation will only work if you have rfc3339-validator module installed.

Placeholder class: MeasurementItem
You need this to be in the module for reverse routing to work properly.
class MeasurementItem(Resource):
    
    def delete(self, sensor, measurement):
        pass

Routing:
Add routing for the new resources. Measurement collection route is
/api/sensors/<sensor:sensor>/measurements/
Measurement item route is one of the following, depending on whether you want to use a converter or ID number directly.
/api/sensors/<sensor:sensor>/measurements/<measurement:measurement>/
/api/sensors/<sensor:sensor>/measurements/<int:measurement>/
Note: If you choose to use a converter for measurements, keep in mind that to_url must return a string.

Caching

Caching is present in computing on many levels, starting from your computer's processor where it caches the results of operations. In the web world, caching also happens on many levels. Browsers tend to cache web content - static content in particular - to avoid unnecessary network transfers. Generally content that is not expected to change often is cached. A good example would be CSS files and images that are part of a website's layout. This type of content can go unchanged for years so there is very little reason to fetch a fresh version every single time the page is loaded.
Servers can use cache control headers to instruct browsers on how long it would be appropriate to cache each type of content, and as long as the cache is considered valid, your browser will simply use the cached version that it has stored somewhere on your computer. In some cases this can cause sites to bug out temporarily if the server code has been updated but your browser is holding an older version of a script file that is no longer compatible with the backend. To work around this, browsers have a "force reload" mechanism that is harder than the normal refresh: triggering it will immediately invalidate all cached content and refetch it.

Server Side Caching

In this course we are primarily interested in server side caching. The primary purpose of server side caching is to reduce the frequency of hitting performance bottlenecks. In simple scenarios this usually translates to reducing database access as much as possible. Complex database queries with joins across multiple tables will quickly become very expensive performance wise, and if there is any possibility that the same result will be needed again, the benefits of caching that result should be rather obvious.
Lovelace content pages are a good real life example that you should be quite familiar with by now. A lot of things are embedded into the content page and the database structure underneath consists of a lot of tables - most of which are needed when rendering a content page. Most of this content is also quite static in nature: only very few things on the page depend on who is viewing them. Furthermore materials are not edited particularly frequently - a few times per year at most. In other words, unless edited, about 95% of what you see is the same HTML document every time you open it, and also the same document that everyone else sees.
Before server side caching support was implemented there was a notable loading time in the order of seconds every time a material page was loaded. After caching support, that 95% of the HTML document is now stored in cache indefinitely, and the loading time is barely noticeable when it's retrieved from there. Because the content is valid for a long period of time and edits are relatively infrequent, cache in Lovelace is only invalidated when something that affects the cached document is changed. Term data is also cached separately. Meanwhile, user specific data like answer pages and progress bars are not cached because they change frequently, and are not shared between multiple users.
What to cache, when to cache, and when to invalidate are all critical decisions in the light of server performance. Cache writes and deletes do add overhead, so performing them on data that is unlikely to be requested again as exactly the same should be avoided. Another consideration that is largely application specific is whether it's ok to return stale data, and for how long. Obviously users will want to see the results of changes they have committed immediately, but is it critical for others to see them immediately as well?
This also brings us to another use of caching: depending on how critical it is for your API to return fresh data, it's possible to use cache as a failsafe when parts of the API are down or too busy to respond. Dealing with temporary failure is a much larger concept and not something we'll tackle in this material, but it's worth mentioning as one of the advantages of caching.

Cache Implementation

Since caching is such a central component of web development, frameworks generally come with a built-in caching solution. They usually also allow you to choose between a number of different caching backends, each suitable for different use cases. Essentially all you need for caching is some way to store data, a simple lookup system, and some way to purge invalidated data - so it would not be a big deal to implement one yourself from scratch either.
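To show how little is strictly needed, here is a minimal in-memory sketch with the three ingredients mentioned above: storage (a dictionary), lookup (get), and purging of expired entries. TinyCache and its method names are made up for this illustration and are not part of any framework. Treating a timeout of 0 as "never expires" is a convention chosen for the sketch.

```python
import time


class TinyCache:
    """Minimal in-memory cache sketch (hypothetical, for illustration only)."""

    def __init__(self, default_timeout=300):
        self.default_timeout = default_timeout
        self._store = {}  # key -> (expiry time or None, value)

    def set(self, key, value, timeout=None):
        if timeout is None:
            timeout = self.default_timeout
        # In this sketch a timeout of 0 means the entry never expires.
        expires_at = None if timeout == 0 else time.monotonic() + timeout
        self._store[key] = (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if expires_at is not None and expires_at <= time.monotonic():
            # Purge the stale entry when it is looked up.
            del self._store[key]
            return None
        return value

    def delete(self, key):
        self._store.pop(key, None)


tiny = TinyCache()
tiny.set("/api/sensors/", "cached body")
hit = tiny.get("/api/sensors/")

tiny.set("/api/short-lived/", "soon gone", timeout=0.01)
time.sleep(0.05)
expired = tiny.get("/api/short-lived/")

tiny.delete("/api/sensors/")
after_delete = tiny.get("/api/sensors/")
```

A real backend adds things this sketch ignores, such as size limits, persistence, and safe access from multiple processes.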
For storage, one of the simpler solutions is to use the file system, and store cached documents as files. Not particularly elegant, and overall slowest of all the solutions even with solid state drives, but still a reasonable solution. Especially if there's a lot of content to cache. The opposite end is caching in the server's memory which is the fastest but has the obvious downside of being much more limited in storage space. As an in-between solution, document databases are quite ideal for storing cached results as well. While it's still essentially disk storage, database engines have better tools and optimizations than simple file system storage.
Flask has a caching extension available, called simply Flask-Caching. We will be using this extension in the upcoming examples. Similarly to Flask-SQLAlchemy, this extension gives us the freedom to choose our cache storage solution by configuring it to use one of the supported backends. Some of these backends work out of the box and are sufficient for learning purposes - others require installation of additional libraries, but are much more suitable for production deployments.
Flask-Caching lists four separate use cases:
  1. caching view function results
  2. caching other function results regardless of arguments
  3. caching function results for different sets of arguments (memoize)
  4. caching arbitrary data manually
The first and third use cases form the cache key automatically, with an option to add a prefix. In the other two cache keys are defined by the program code. In the first three cases, when the function is called, the cache framework will check whether there's a valid cached result available. If a result is available, it will be returned from cache instead of actually calling the function. If it's not available, then the function will be called, and the result will be cached before it is returned to the original caller. All three are available as decorators. The last use case has the least automation, but also is not limited in any way beyond normal rules for what can be cached.
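The check-then-call behaviour shared by the first three use cases can be sketched with a home-made decorator. This is not how Flask-Caching is implemented; cached_by, the dictionary used as storage, and load_sensor are all hypothetical names used only to illustrate the control flow.

```python
import functools

_store = {}   # stand-in for a cache backend
calls = []    # records which calls actually reached the function


def cached_by(key_func):
    """Hypothetical decorator: look in the cache first, call only on a miss."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            key = key_func(*args, **kwargs)
            if key in _store:
                # Cache hit: the wrapped function is never called.
                return _store[key]
            result = func(*args, **kwargs)
            # Cache miss: store the result before returning it.
            _store[key] = result
            return result

        return wrapper

    return decorator


@cached_by(lambda name: f"sensor/{name}")
def load_sensor(name):
    calls.append(name)
    return {"name": name}


first = load_sensor("sensor-0001")
second = load_sensor("sensor-0001")   # served from the cache
third = load_sensor("sensor-0002")    # different key, function is called
```

The calls list ends up with only two entries, because the second request for sensor-0001 never reaches the function body.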

Cache Configuration

Cache is configured similarly to the database backend, by setting specific keys in the configuration dictionary. The full list of configuration keys for Flask-Caching can be found in its documentation. A lot of the configuration keys are cache type specific. Since we don't want to set up a separate service for caching, we have three choices for the cache type:
  • NullCache
  • SimpleCache
  • FileSystemCache
The first two aren't really useful besides allowing your code to run when there is no actual cache backend available. SimpleCache would also be suitable for testing whether something goes into your cache the way it's supposed to. For now let's configure FileSystemCache because it allows full exploration of how caching works. In order to do so, at least two keys need to be set:
app.config["CACHE_TYPE"] = "FileSystemCache"
app.config["CACHE_DIR"] = "cache"
This would place cache files into a cache subfolder relative to the directory the application is run from. If using a more elaborate project structure, it would be better to put the cache folder into your application's instance folder, similarly to how the database file is placed there: os.path.join(app.instance_path, "cache"). Another key that can be considered here is "CACHE_DEFAULT_TIMEOUT". This determines how long a cached entry will be valid by default, expressed in seconds. This value can be overridden for each entry separately as well. Choosing this value is extremely context-dependent. If the key is left unset, Flask-Caching falls back to 300 seconds; setting it to 0 caches things forever unless manually cleared.
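The instance folder variant could be collected into one place along these lines. The cache_config helper, the example path, and the one hour timeout are assumptions made for this sketch; only the configuration key names come from Flask-Caching.

```python
import os


def cache_config(instance_path, default_timeout=3600):
    """Hypothetical helper that builds the cache related configuration keys."""
    return {
        "CACHE_TYPE": "FileSystemCache",
        # Keep cache files next to other instance specific data.
        "CACHE_DIR": os.path.join(instance_path, "cache"),
        # Seconds a cached entry stays valid unless overridden per entry.
        "CACHE_DEFAULT_TIMEOUT": default_timeout,
    }


# The path below is a made-up example of an instance folder.
config = cache_config("/srv/sensorhub/instance")
```

In an application the result would be applied with something like app.config.update(cache_config(app.instance_path)) before the cache is initialized.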
Overall, cache expiration is a very complex topic, and while an interesting problem, it's not something we're going to dig too deeply into. If cached entries never expire, you'll only pay the regeneration price when things actually change. Sometimes this is what you want, but then you have to be very careful to actually regenerate the cache on every change - otherwise you'll be offering stale data indefinitely. Using a long default timeout will guarantee that even when cache clearing is incomplete, your data will eventually update. On the other end, using a very short timeout can be beneficial for data that changes frequently so that you don't even need to worry about manual clearing. Ultimately everything depends on what you need from your caching plan.
After configuration the cache needs to be initialized. This is very much like the API initialization.
from flask import Flask
from flask_restful import Api
from flask_caching import Cache

app = Flask(__name__)
app.config["CACHE_TYPE"] = "FileSystemCache"
app.config["CACHE_DIR"] = "cache"

api = Api(app)
cache = Cache(app)
After this we can access the caching functionality through the cache object. In the full project structure version cache is initialized without app at first (just like the database), and then the app object is bound to it with cache.init_app(app) in the app constructor function.
As a final note, in some deployments "CACHE_KEY_PREFIX" will also be an important option to set in configuration. When sharing a cache backend between multiple apps, giving a key prefix for each app will make it possible to do app-wide cache clearing without incidentally deleting cached data that belongs to other apps.

Caching Views

The simplest use case is to cache views with the cached decorator. When using the decorator, the Response object returned from the view function is cached using the request.path value (e.g. "/api/sensors/sensor-0001/") as the cache key. Thanks to the addressability principle, this approach is a very good fit for REST APIs. When every URI matches exactly one resource, there is no ambiguity as to what should be cached. With this in mind, we can start adding the cached decorator to get methods for our resource classes. Like this:
class SensorItem(Resource):

    @cache.cached()
    def get(self, sensor):
        db_sensor = Sensor.query.filter_by(name=sensor).first()
        if db_sensor is None:
            raise NotFound
        body = {
            "name": db_sensor.name,
            "model": db_sensor.model,
            "location": db_sensor.location.description
        }
        return Response(json.dumps(body), 200, mimetype=JSON)
Every call to this view function is now cached so that each separate sensor item gets its own cache key. In other words, as long as the sensor parameter is different, a new response will be generated and then cached for that particular sensor. However, if the same parameter is requested twice, the decorator will simply return the cached response instead of ever actually calling the function.
Normally this is a perfect use case because every set of URI variables matches exactly one response. However, if the resource uses query parameters to modify the result, this is no longer the case. This approach uses the request's path as the cache key, and this path does not include query parameters. This means that if you were to cache the result with one set of query parameters, then all further responses would use that cached result instead of calling the view function. In short, you would get the first response with every request regardless of what query parameters are used later.
Now the first big question is whether you even want to cache such resources. Let's assume we want to paginate the measurements collection from a sensor because there are quite a few measurements. In order to do that, our measurement collection takes a query parameter, page, an integer that shows which chunk of the data the client is interested in. For purposes like this, the cached decorator has an optional keyword argument, make_cache_key. This argument takes a function that will construct and return the cache key.
A simple option would be a function that returns request.full_path instead of request.path. Like this:
def query_key(*args, **kwargs):
    return request.full_path
However, keys provided by this would not be very reliable because the query string is not limited in any way. It can have extra parameters that are simply ignored by the rest of our code, but this function would form a different key for every different set. Also, if the view supports more than one parameter, their order would also result in different keys. A better solution is to actually extract the page and use that together with the URI as the key.
def page_key(*args, **kwargs):
    page = request.args.get("page", 0)
    return request.path + f"[page_{page}]"
NOTE: although not mentioned in the documentation, the function for generating the cache key gets passed all of the arguments that were given to the view function. Since we don't have any use for those, all arguments and keyword arguments are simply made optional.
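The difference between the two key strategies can be demonstrated without Flask by parsing URLs with the standard library. The functions full_path_key and page_key_for are stand-ins written for this sketch: they take the URL as an argument instead of reading the request object, and the debug parameter is a made-up example of an extra parameter the API would ignore.

```python
from urllib.parse import parse_qs, urlsplit


def full_path_key(url):
    """Stand-in for a key based on request.full_path (path plus raw query string)."""
    parts = urlsplit(url)
    return f"{parts.path}?{parts.query}"


def page_key_for(url):
    """Stand-in for page_key: only the path and the page parameter matter."""
    parts = urlsplit(url)
    page = parse_qs(parts.query).get("page", ["0"])[0]
    return f"{parts.path}[page_{page}]"


urls = [
    "/api/sensors/sensor-0001/measurements/?page=2&debug=1",
    "/api/sensors/sensor-0001/measurements/?debug=1&page=2",
    "/api/sensors/sensor-0001/measurements/?page=2",
]

# Every variation of the query string produces its own key...
full_keys = {full_path_key(url) for url in urls}

# ...while extracting the page yields one shared key for the same content.
page_keys = {page_key_for(url) for url in urls}
```

All three URLs ask for the same page, so three distinct keys would mean the same response being generated and stored three times.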
Using it in the measurement collection resource:
class MeasurementCollection(Resource):
    
    PAGE_SIZE = 50
    
    # timeout=0 means the entry never expires; timeout=None would use CACHE_DEFAULT_TIMEOUT
    @cache.cached(timeout=0, make_cache_key=page_key)
    def get(self, sensor):
        db_sensor = Sensor.query.filter_by(name=sensor).first()
        if db_sensor is None:
            raise NotFound
        # Query parameters arrive as strings; convert so the offset math works.
        page = request.args.get("page", 0, type=int)
        remaining = Measurement.query.filter_by(
            sensor=db_sensor
        ).order_by("time").offset(page * self.PAGE_SIZE)
        body = {
            "sensor": db_sensor.name,
            "measurements": []
        }
        for meas in remaining.limit(self.PAGE_SIZE):
            body["measurements"].append(
                {
                    "value": meas.value,
                    "time": meas.time.isoformat()
                }
            )
        return Response(json.dumps(body), 200, mimetype=JSON)
As a side note we've made the decision to never expire this cache. The reasoning is simple: once a page is complete, it will never be changed as all new measurements are added to the latest page. There is another keyword argument that could be used here to prevent caching of incomplete pages: response_filter which is a function that gets one argument - the response object - and makes a decision whether to cache it or not. It's mostly ideal for bypassing caching when the response code is not 200. For this particular use case it's a bit cumbersome because we would need to parse our own response to see whether the page is full or not. Managing this might actually end up being easier with manual caching.
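For comparison, a response_filter for this case might look roughly like the sketch below. It assumes the response object offers status_code and get_data(as_text=True) the way Werkzeug responses do, and FakeResponse is a hypothetical stand-in so the example runs without Flask. The need to decode and parse the body again is the cumbersome part mentioned above.

```python
import json

PAGE_SIZE = 50


def is_full_page(response):
    """Sketch of a response_filter: cache only successful, complete pages."""
    if response.status_code != 200:
        return False
    # The filter only sees the finished response, so the body is parsed again.
    body = json.loads(response.get_data(as_text=True))
    return len(body["measurements"]) == PAGE_SIZE


class FakeResponse:
    """Hypothetical stand-in for a response object, used only for this demo."""

    def __init__(self, body, status_code=200):
        self._text = json.dumps(body)
        self.status_code = status_code

    def get_data(self, as_text=False):
        return self._text if as_text else self._text.encode("utf-8")


full = FakeResponse({"sensor": "sensor-0001", "measurements": [{"value": 1.0}] * PAGE_SIZE})
partial = FakeResponse({"sensor": "sensor-0001", "measurements": [{"value": 1.0}] * 3})
failed = FakeResponse({"sensor": "sensor-0001", "measurements": []}, status_code=404)

decisions = [is_full_page(full), is_full_page(partial), is_full_page(failed)]
```

With the decorator this function would be given as response_filter=is_full_page.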

Manual Caching

In the previous example we have introduced a new problem: we now need to know when a page is incomplete, and forgo caching if that is the case. While this can be done with view function caching, it will be a rather messy solution - mainly because the decorator cannot see inside the function, and needs to read the response in order to find out what was returned from the view. In this case caching manually inside the function code could be a better solution. This allows the view function itself to have finer control of what gets cached. The caching process itself is quite straightforward: just use the cache.set method to store a value into a key. Since we want to cache the whole response, all modifications go to the end of the code, changing the return statement essentially to:
        response = Response(json.dumps(body), 200, mimetype=JSON)
        if len(body["measurements"]) == self.PAGE_SIZE:
            cache.set(page_key(), response, timeout=None)
        return response
We can use the same page_key function to form the cache key. This is important for the next part since now we need to rethink how to read the cache. Previously the decorator took care of both checking the cache and saving into it. Now that we have implemented manual caching we want to get rid of the latter part. However, if possible, we would still like the decorator to do the cache checking. This can be achieved with the response_filter keyword argument that was mentioned in the previous section. If this argument is given a function that returns False, then the decorator will not perform any caching. The simplest way to do this is a lambda function. The end result is then:
class MeasurementCollection(Resource):
    
    PAGE_SIZE = 50
    
    @cache.cached(timeout=None, make_cache_key=page_key, response_filter=lambda r: False)
    def get(self, sensor):
        db_sensor = Sensor.query.filter_by(name=sensor).first()
        if db_sensor is None:
            raise NotFound
        page = request.args.get("page", 0, type=int)  # query values arrive as strings
        remaining = Measurement.query.filter_by(
            sensor=db_sensor
        ).order_by("time").offset(page * self.PAGE_SIZE)
        body = {
            "sensor": db_sensor.name,
            "measurements": []
        }
        for meas in remaining.limit(self.PAGE_SIZE):
            body["measurements"].append(
                {
                    "value": meas.value,
                    "time": meas.time.isoformat()
                }
            )
        response = Response(json.dumps(body), 200, mimetype=JSON)
        if len(body["measurements"]) == self.PAGE_SIZE:
            cache.set(page_key(), response, timeout=None)
        return response
Since we are using the same function to generate the key at both ends, this works out very nicely. If you want proof of it working, you can put a print statement at the beginning of the method while running Flask in development mode. If you see the print pop up, the cache was not hit, and the view method was called. If you do the same request again, this time you should not see a print as the result comes from the cache, and is picked up from there by the decorator. If you request another page of measurements, then you should see a print pop up again.
Note that this is extremely permanent caching. There is absolutely nothing that will clear the cache. Then again, there is no interface to modify or remove existing measurements either, so it's exactly what we wanted. The only caveat is that if something is changed in the implementation (or there are bugs), we would need a way to forcibly clear the cache, otherwise we're holding broken data indefinitely.

Cache Clearing

Unless you are only using extremely short cache lifetime (order of seconds), there will likely be times when you need to clear cache inside your API's view functions. This can be done one key at a time, or the entire cache can be nuked out of existence. The latter is something you might want to add as a command line tool for clearing the cache in situations where a bug is causing your API to offer stale data. To do this from Python console:
In [1]: from flask_caching import Cache
In [2]: from app import app
In [3]: cache = Cache(app)
In [4]: with app.app_context():
   ...:     cache.clear()
We'll turn this into a command line tool later. Note: if there are multiple apps using the same cache backend, this can nuke the entire cache if key prefixes are not used. While wiping everything is sometimes needed, the more common use case is clearing data that we know has gone stale. Implementation wise this is quite simple: cache.delete can be used to delete one key, or cache.delete_many can be used to delete a list of keys. Since we know how the keys are formed, we can also delete them quite easily. Keys for view caching can be obtained with api.url_for, whereas in manual caching we can simply use the same function to form the key when storing and deleting.
The bigger question lies in taking care to delete cache for all affected resources. For instance, when a sensor is created, we need to take care to refresh the cache for the sensor collection. Likewise when a sensor's details are changed, we may also need to refresh cache for the sensor collection, and also the sensor's location. Since cache refreshes can essentially be triggered from three different view methods (POST, PUT, DELETE), there are two places in our code base where they would make sense: as methods in model classes, or in resource classes.
For this example, we will implement cache clearing as resource class methods. After all, it is the resources themselves that we are caching - usually whatever comes out of its view functions. We're going to add an internal method _clear_cache to each resource. This internal method can then be called from view methods that modify the resource, and it will then take care of clearing everything that needs to be cleared. For example, to refresh both the sensor itself and its collection when a sensor is updated, the following method would be appropriate:
    def _clear_cache(self):
        collection_path = api.url_for(SensorCollection)
        cache.delete_many((
            collection_path,
            request.path,
        ))
Since we know that the keys are the same as the views' URIs, we can obtain the keys in a consistent manner by using api.url_for and request.path, where the latter will take care of refreshing the resource that is directly being affected, and the former can be used to refresh any connected resources - such as the collection this item belongs to.

API Authentication

Authentication is yet another huge topic in the web world. In this small section we'll take a very brief look at API authentication. Our primary focus here is how to include authentication into your API code. While we do provide some basic pointers about security implementation with Python, this is by no means a security tutorial. Likewise, the measures shown here are mere examples of how to include security in your API; we do not propose them as sufficient by any account.

API Keys vs Session Keys

Web sites typically use session keys to identify when a user is logged in. When you type in your username and password for a web site, it will send you a cookie that contains a newly generated session key. As long as this session key is valid, you are considered logged in from the browser that holds the cookie. This way your user credentials will only be sent with the initial login and are not exposed in later network traffic - or in the cookie. Session keys can have limited validity, and a multitude of other security mechanisms to prevent attackers from capturing your cookies etc.
However, from RESTful API point of view, sessions are out of consideration. REST APIs are expected to be stateless - no state is held on the server side - and sessions are essentially state that the server uses to track when a user is logged in. When authentication is needed, API keys are used instead. Conceptually they are similar to session keys: the key is a generated authentication token that is used instead of username and password to authenticate transactions. Like session keys, API keys are also sent along with every request, and similarly a registry of API keys is held somewhere in the server. Unlike session keys, API keys are generally permanent, and independent of the client application. In this sense they are more akin to your user credentials: you can access the server from any client as long as it has your API key.
When using an API key it is usually inserted into the client's configuration. When the client accesses the server, there is no login phase because it already holds the authentication token. Therefore there is also no server-side state, and every request sent by the client is entirely independent from other requests. It probably goes without saying that API keys should be treated equally to credentials when storing them. Unlike a stolen session key, a stolen API key can be used by anyone, at any time, from anywhere.

Implementing API Key Authentication

Implementing authentication with API keys has two parts to it. First of all we need to have some registry for existing API keys; second, we need to apply authentication where it's required. We will be showing both of these steps from scratch to give you a more transparent view into what is going on. However, for real life uses an existing, properly tested solution should be used.

Key Registry

Key registry in our simple example will just be another table in the database - or in other words, a new model class. What goes into this table is once again up to what you need. Generally though, it should have the API key, and some information about what privileges the key grants. If it's a key linked to user credentials, then the table would indicate which user it was generated for. Instead of individual users, your API could also have different groups that have access to different resources etc. Likewise if you want to control what kinds of rights the key grants, there can be a field for that.
For our sensorhub API we could have two different client groups: sensors that are allowed to post measurements, and admin clients that are allowed to change all information. In this case we should make three columns: key, sensor name (nullable), and a boolean column to indicate admin privileges. Also, for minimum basic security, we should not store API keys in plaintext. The key field will therefore contain a hash of the key. This way even if our database gets leaked, the keys still need to be cracked.
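class ApiKey(db.Model):
    
    # SQLAlchemy needs a primary key in order to map the class
    id = db.Column(db.Integer, primary_key=True)
    # A SHA-256 digest is 32 raw bytes, hence a binary column
    key = db.Column(db.LargeBinary(32), nullable=False, unique=True)
    sensor_id = db.Column(db.Integer, db.ForeignKey("sensor.id"), nullable=True)
    admin = db.Column(db.Boolean, default=False)
    
    # Expects a matching api_key relationship on the Sensor model
    sensor = db.relationship("Sensor", back_populates="api_key", uselist=False)
    
    @staticmethod
    def key_hash(key):
        # hashlib must be imported at the top of the module
        return hashlib.sha256(key.encode()).digest()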
The added helper method will take care of hashing for us when keys are created and compared. Once again we place the method close to where it is relevant. Having one place where the hashing is defined also makes it easier to move to a more secure solution - simply modify this method to fit your needs.
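As a standalone sketch of how such a helper is meant to be used, the following relies only on the standard library: a token is generated, only its hash is kept, and a later candidate key is checked by hashing it the same way. The names stored and is_valid are illustrative and not part of the sensorhub code.

```python
import hashlib
import secrets


def key_hash(key):
    # SHA-256 digest of the key, written here as a plain function.
    return hashlib.sha256(key.encode()).digest()


token = secrets.token_urlsafe()   # plain text key, shown to the client once
stored = key_hash(token)          # only this value would go into the database


def is_valid(candidate):
    # compare_digest is designed to avoid leaking information through timing.
    return secrets.compare_digest(key_hash(candidate), stored)
```

Calling is_valid(token) succeeds, while any other string fails the check.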
The admin key should probably be created only once, and locally on the API server during installation. Since we haven't talked about CLI commands yet, we'll just create it from the Python console.
In [1]: import secrets
In [2]: from app import ApiKey, app, db
In [3]: app.app_context().push()
In [4]: token = secrets.token_urlsafe()
In [5]: db_key = ApiKey(
   ...:     key=ApiKey.key_hash(token),
   ...:     admin=True
   ...: )
In [6]: db.session.add(db_key)
In [7]: db.session.commit()
In [8]: print(token)
When generating keys, Python's secrets module should be used to provide cryptographically strong randomness, which the random module does not do (because it is intended for different use cases). The print at the end is the only time you will be able to obtain the plain text token, and from here it should be copied to your admin client's configuration.
For sensor keys we need to think about the logistics a bit more. One relatively simple way would be to generate a key when a sensor is created and return the key to the client somehow, probably in the headers since responses to POST should not have a body. Another option would be to include the API key as a required field when creating a sensor. This way the client that registers sensors will be responsible for generating keys. We'll leave the logistics as an exercise for the reader at this point.

Validating Keys

Once we have our keys, we need some way to require them for certain views. Our friends, decorators, will be here to help us out. We have two levels of privilege, and a simple way to go about it is to make one decorator for each. The first decorator, require_admin, will block the view unless the request headers contain a valid admin token. For this example we're going to use a custom HTTP header to carry the API key. This is mostly because the standard Authorization header has a syntax that's more complex than what we need. The authentication key will be in the "Sensorhub-Api-Key" header.
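def require_admin(func):
    @functools.wraps(func)  # requires import functools
    def wrapper(*args, **kwargs):
        # A missing header is treated as an empty key instead of crashing on None
        key_hash = ApiKey.key_hash(request.headers.get("Sensorhub-Api-Key", "").strip())
        db_key = ApiKey.query.filter_by(admin=True).first()
        if db_key is not None and secrets.compare_digest(key_hash, db_key.key):
            return func(*args, **kwargs)
        raise Forbidden
    return wrapper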
Once we have this decorator, we can decorate any resource class method with @require_admin, and it can no longer be accessed unless the correct admin key is provided in the request headers. Now we need another decorator for authenticating individual sensors. This decorator will be somewhat similar. The only difference is that we do need the view method's sensor parameter inside the decorator too.
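def require_sensor_key(func):
    @functools.wraps(func)  # requires import functools
    def wrapper(self, sensor, *args, **kwargs):
        key_hash = ApiKey.key_hash(request.headers.get("Sensorhub-Api-Key", "").strip())
        # Assumes sensor is a Sensor model instance, e.g. provided by a URL converter
        db_key = ApiKey.query.filter_by(sensor=sensor).first()
        if db_key is not None and secrets.compare_digest(key_hash, db_key.key):
            # self and sensor were captured by wrapper, so pass them on explicitly
            return func(self, sensor, *args, **kwargs)
        raise Forbidden
    return wrapper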
If you put this decorator on a resource method that has sensor as a parameter, then that particular sensor's API key will be required to access the method. We didn't actually implement a way to distribute the sensor keys though, so doing that will make the method entirely inaccessible.
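From the client's point of view, using the key simply means attaching the header to every request. A minimal sketch with the standard library is shown below. The URL, the placeholder key value, the payload fields, and the build_request helper are all assumptions made for illustration, and the request is only constructed, not sent.

```python
import json
import urllib.request

API_KEY = "key-copied-from-client-configuration"  # placeholder value
SENSOR_URL = "https://api.example.com/api/sensors/test-sensor-1/"  # hypothetical


def build_request(url, payload, api_key):
    """Build a PUT request that carries the API key in the custom header."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Sensorhub-Api-Key": api_key,
        },
        method="PUT",
    )


# Hypothetical payload; the real fields depend on the API's sensor schema.
req = build_request(SENSOR_URL, {"name": "test-sensor-1", "model": "uo-test"}, API_KEY)
# urllib.request.urlopen(req) would send it; there is no separate login step.
```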

Security Caution

There's one security consideration that needs to be stated here: no matter how fancy the encryption you use for storing your keys is, they are only as safe as your transport layer security. At the end of the day, the key is included in request headers. If you are not using an encrypted protocol, these headers will be readable in plain text for any party that can intercept the message. Always, always remember to do any and all transmissions involving secret keys over a secure connection, i.e. HTTPS. However, setting up HTTPS for Flask falls outside this material's scope as it is almost entirely a deployment issue rather than an implementation one.
API Blueprint is a description language for REST APIs. Its primary categories are resources and their related actions (i.e. HTTP methods). It uses a relatively simple syntax. The advantage of using API Blueprint is the wide array of tools available. For example Apiary has a lot of features (interactive documentation, mockup server, test generation etc.) that can be utilized if the API is described in API Blueprint.
Another widely used alternative to API Blueprint is OpenAPI.
Addressability is one of the key REST principles. It means that in an API everything should be presented as resources with URIs so that every possible action can be given an address. On the flipside this also means that every single address should always result in the same resource being accessed, with the same parameters. From the perspective of addressability, query parameters are part of the address.
Ajax is a common web technique. It used to be known as AJAX, an acronym for Asynchronous JavaScript And XML, but with JSON largely replacing XML, it became just Ajax. Ajax is used in web pages to make requests to the server without a page reload being triggered. These requests are asynchronous - the page script doesn't stop to wait for the response. Instead a callback is set to handle the response when it is received. Ajax can be used to make a request with any HTTP method.
Anonymous functions are usually used as in-place functions to define a callback. They are named such because they are defined just like functions, but don't have a name. In JavaScript, a function definition returns the function as an object so that it can e.g. be passed as an argument to another function. Generally they are used as one-off callbacks when it makes the code more readable to have the function defined where the callback is needed rather than somewhere else. A typical example is the forEach method of arrays. It takes a callback as its argument and calls that function for each of the array's members. One downside of anonymous functions is that the function is defined anew every time, and this can cause significant overhead if performed constantly.
In Flask, the application context (app context for short) is an object that keeps track of application level data, e.g. configuration. You always need to have it when trying to manipulate the database etc. View functions will automatically have app context included, but if you want to manipulate the database or test functions from the interactive Python console, you need to obtain app context using a with statement.
Blueprint is a Flask feature, a way of grouping different parts of the web application in such a way that each part is registered as a blueprint with its own root URI. A typical example could be an admin blueprint for admin-related features, using the root URI /admin/. Inside a blueprint, all routes are defined relative to this root, i.e. the route /users/ inside the admin blueprint would have the full route of /admin/users/.
Business logic defines how data is processed in the application.
Cross Origin Resource Sharing (CORS) is a relaxation mechanism for Same Origin Policy (SOP). Through CORS headers, servers can allow requests from external origins, and define what can be requested and what headers can be included in those requests. If a server doesn't provide CORS headers, browsers will apply the SOP and refuse to make requests unless the origin is the same. Note that the primary purpose of CORS is to allow only certain trusted origins. Example scenario: a site with a dubious script cannot just steal a user's API credentials from another site's cookies and make requests using them because the API's CORS configuration doesn't allow requests from the site's origin. NOTE: this is not a mechanism to protect your API, it's to protect browser users from accessing your API unintentionally.
Callback is a function that is passed to another part of the program, usually as an argument, to be called when certain conditions are met. For instance in making Ajax requests, it's typical to register a callback for at least success and error situations. A typical feature of callbacks is that the function cannot decide its own parameters, and must instead make do with the arguments given by the part of the program that calls it. Callbacks are also called handlers. One-off callbacks are often defined as anonymous functions.
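A small Python sketch of the idea follows. The fetch function and the two lists are invented for this example. Note that the callbacks have no say over their arguments: fetch decides that success handlers get the result and error handlers get the exception.

```python
def fetch(data_source, on_success, on_error):
    """Call on_success with the result, or on_error with the exception."""
    try:
        result = data_source()
    except Exception as error:
        on_error(error)
    else:
        on_success(result)


received = []
errors = []

# The append methods of the lists serve as the registered callbacks.
fetch(lambda: {"status": "ok"}, received.append, errors.append)
fetch(lambda: 1 / 0, received.append, errors.append)
```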
A client is a piece of software that consumes or utilizes the functionality of a Web API. Some clients are controlled by humans, while others (e.g. crawlers, monitors, scripts, agents) have different degrees of autonomy.
In databases, columns define the attributes of objects stored in a table. A column has a type, and can have additional properties such as being unique. If a row doesn't conform with the column types and other restrictions, it cannot be inserted into the table.
In object relational mapping, column attributes are attributes in model classes that have been initialized as columns (e.g. in SQLAlchemy their initial value is obtained by initializing a Column). Each of these attributes corresponds to a column in the database table (that corresponds with the model class). A column attribute defines the column's type as well as additional properties (e.g. primary key).
In OpenAPI the components object is a storage for reusable components. Components inside this object can be referenced from other parts of the documentation. This makes it a good storage for any descriptions that pop up frequently, including path parameters, various schemas, and request body objects. This also includes security schemes if your API uses authentication.
Connectedness is a REST principle particularly related to hypermedia APIs. It states that for each resource in the API, there must exist a path from every other resource to get there by following hypermedia links. Connectedness is easiest to analyze by creating an API state diagram.
A hypermedia control is an attribute in a resource representation that describes a possible action to the client. It can be a link to follow, or an action that manipulates the resource in some way. Regardless of the used hypermedia format, controls include at least the URI to use when performing the action. In Mason controls also include the HTTP method to use (if it's not GET), and can also include a schema that describes what's considered valid for the request body.
A (URL) converter is a piece of code used in web framework routing to convert a part of the URL into an argument that will be used in the view function. Simple converters are usually included in frameworks by default. Simple converters include things like turning number strings into integers etc. Typically custom converters are also supported. A common example would be turning a model instance's identifier in the URL to the identified model instance. This removes the boilerplate of fetching model instances from view functions, and also moves the handling of Not Found errors into the converter.
The term credentials is used in authentication to indicate the information that identifies you as a specific user from the system's point of view. By far the most common credentials is the combination of username and password. One primary goal of system security is the protection of credentials.
Document Object Model (DOM) is an interface through which Javascript code can interact with the HTML document. It's a tree structure that follows the HTML's hierarchy, and each HTML tag has its own node. Through DOM manipulation, Javascript code can insert new HTML into anywhere, modify its contents or remove it. Any modifications to the DOM are updated into the web page in real time. Do note that since this is a rendering operation, it's very likely one of the most costly operations your code can do. Therefore changing the entire contents of an element at once is better than changing it e.g. one line at a time.
Database schema is the "blueprint" of the database. It defines what tables are contained in the database, and what columns are in each table, and what additional attributes they have. A database's schema can be dumped into an SQL file, and a database can also be created from a schema file. When using object relational mapping (ORM), the schema is constructed from model classes.
Decorator is a function wrapper. Whenever the decorated function is called, its decorator(s) will be called first. Likewise, when the decorated function returns values, they will be first returned to the decorator(s). In essence, the decorator is wrapped around the decorated function. Decorators are particularly useful in web development frameworks because they can be inserted between the framework's routing machinery, and the business logic implemented in a view function. Decorators can do filtering and conversion for arguments and/or return values. They can also add conditions to calling the view function, like authentication where the decorator raises an error instead of calling the view function if valid credentials are not presented.
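A minimal, framework-free sketch of the idea follows. The names require_non_negative and square_root are invented for this example. The decorator checks the argument before the wrapped function is called, and rounds the return value on the way out.

```python
import functools
import math


def require_non_negative(func):
    @functools.wraps(func)  # keeps the wrapped function's name and docstring
    def wrapper(value):
        if value < 0:
            # Condition failed: the decorated function is never called.
            raise ValueError("value must not be negative")
        result = func(value)
        # Return values pass through the decorator, which may convert them.
        return round(result, 2)
    return wrapper


@require_non_negative
def square_root(value):
    return math.sqrt(value)
```

An authentication decorator follows the same shape, with the condition being a credentials check instead of a sign check.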
In HTML element refers to a single tag - most of the time including a closing tag and everything in between. The element's properties are defined by the tag, and any of the properties can be used to select that element from the document object model (DOM). Elements can contain other elements, which forms the HTML document's hierarchy.
For APIs entry point is the "landing page" of the API. It's typically in the API root of the URL hierarchy and contains logical first steps for a client to take when interacting with the API. This means it typically has one or more hypermedia controls which usually point to relevant collections in the API or search functions.
In software testing, a fixture is a component that satisfies the preconditions required by tests. In web application testing the most common role for fixtures is to initialize the database into a state that makes testing possible. This generally involves creating a fresh database, and possibly populating it with some data. In this course fixtures are implemented using pytest's fixture architecture.
  1. Description
  2. Creating DB
  3. Starting the App
This term contains basic instructions about setting up and running Flask applications. See the term tabs "Creating DB" and "Starting the App". For all instructions to work you need to be in the folder that contains your app.
In database terminology, foreign key means a column that has its value range determined by the values of a column in another table. They are used to create relationships between tables. The referenced column in the target table must be unique.
For most hypermedia types, there exists a generic client. This is a client program that constructs a navigable user interface based on hypermedia controls in the API, and can usually also generate data input forms. The ability to use such clients for testing and prototyping is one of the big advantages of hypermedia.
HTTP method is the "type" of an HTTP request, indicating what kind of an action the sender is intending to do. In web applications by far the most common method is GET which is used for retrieving data (i.e. HTML pages) from the server. The other method used in web applications is POST, used in submitting forms. However, in REST API use cases, PUT and DELETE methods are also commonly used to modify and delete data.
HTTP request is the entirety of the request made by a client to a server using the HTTP protocol. It includes the request URL, request method (GET, POST etc.), headers and request body. In Python web frameworks the HTTP request is typically turned into a request object.
In computing a hash is a string that is calculated from another string or other data by an algorithm. Hashes have multiple uses ranging from encryption to encoding independent transmission. Hash algorithms can roughly be divided into one- and two-directional. One-directional hashing algorithms are not reversible - the original data cannot be calculated from the hash. They are commonly used to store passwords so that plain text passwords cannot be retrieved even if the database is compromised. Two-directional hashes can be reversed. A common example is the use of base64 to encode strings to use a limited set of characters from the ASCII range to ensure that different character encodings at various transmission nodes do not mess up the original data.
Headers are additional information fields included in HTTP requests and responses. Typical examples of headers are content-type and content-length which inform the receiver how the content should be interpreted, and how long it should be. In Flask headers are contained in the request.headers attribute that works like a dictionary.
Host part is the part of URL that indicates the server's address. For example, lovelace.oulu.fi is the host part. This part determines where (i.e. which IP address) in the world wide web the request is sent.
In API terminology hypermedia means additional information that is added on top of raw data in resource representations. It's derived from hypertext - the stuff that makes the world wide web tick. The purpose of the added hypermedia is to inform the client about actions that are available in relation to the resource they requested. When this information is conveyed in the representations sent by the API, the client doesn't need to know how to perform these actions beforehand - it only needs to parse them from the response.
An idempotent operation is an operation that, if applied multiple times with the same parameters, always has the same result regardless of how many times it's applied. If used properly, PUT is an idempotent operation: no matter how many times you replace the contents of a resource it will have the same contents as it would have if only one request had been made. On the other hand POST is usually not idempotent because it attempts to create a new resource with every request.
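The difference can be sketched with a plain dictionary standing in for the server's resources. The put and post functions below are simplified stand-ins invented for this example, not actual HTTP handlers.

```python
resources = {}


def put(uri, data):
    # Replaces whatever is at the URI: repeating the call changes nothing.
    resources[uri] = data


def post(collection_uri, data):
    # Creates a new resource on every call, so repeating it adds more.
    new_uri = f"{collection_uri}{len(resources) + 1}/"
    resources[new_uri] = data
    return new_uri


put("/sensors/a/", {"name": "a"})
put("/sensors/a/", {"name": "a"})
count_after_put = len(resources)      # still one resource

post("/sensors/", {"name": "b"})
post("/sensors/", {"name": "b"})
count_after_post = len(resources)     # two more resources were created
```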
The info object in OpenAPI gives basic information about your API. This basic information includes general description, API version number, and contact information. Even more importantly, it includes license information and link to your terms of service.
Instance folder is a Flask feature. It is intended for storing files that are needed when running the Flask application, but should not be in the project's code repository. A primary example of this is the production configuration file which differs from installation to installation, and generally should remain unchanged when the application code is updated from the repository. The instance path can be found from the application context: app.instance_path. Flask has a reasonable default for it, but it can also be set manually when calling the Flask constructor by adding the instance_path keyword argument. The path should be written as absolute in this case.
JavaScript Object Notation (JSON) is a popular document format in web development. It's a serialized representation of a data structure. Although the representation syntax originates from JavaScript, it's almost identical to Python dictionaries and lists in formatting and structure. A JSON document consists of key-value pairs (similar to Python dictionaries) and arrays (similar to Python lists). It's often used in APIs, and also in Ajax calls on web sites.
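In Python, moving between the two forms is handled by the standard json module. A short sketch with made-up data:

```python
import json

measurement = {"sensor": "test-sensor-1", "values": [21.5, 22.0], "active": True}

document = json.dumps(measurement)   # serialize: Python structure -> JSON string
parsed = json.loads(document)        # parse: JSON string -> Python structure
```

The serialized string uses JSON's own literals, so Python's True is written as true, and parsing turns it back into True.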
JSON schema is a JSON document that defines the validity criteria for JSON documents that fall under the schema. It defines the type of the root object, and types as well as additional constraints for attributes, and which attributes are required. JSON schemas serve two purposes in this course: clients can use them to generate requests to create/modify resources, and they can also be used on the API end to validate incoming requests.
MIME type is a standard used for indicating the type of a document. In web development context it is placed in the Content-Type header. Browsers and servers use the MIME type to determine how to process the request/response content. On this course the MIME type is in most cases application/json.
Database migration is a process where an existing database is updated with a new database schema. This is done in a way that does not lose data. Some changes can be migrated automatically. These include creation of new tables, removal of columns and adding nullable columns. Other changes often require a migration script that does the change in multiple steps so that old data can be transformed to fit the new schema. E.g. adding a non-nullable column usually involves adding it first as nullable, then using a piece of code to determine values for each row, and finally setting the column to non-nullable.
In ORM terminology, a model class is a program level class that represents a database table. Instances of the class represent rows in the table. Creation and modification operations are performed using the class and instances. Model classes typically share a common parent (e.g. db.Model) and table columns are defined as class attributes with special constructors (e.g. db.Column).
  1. Description
  2. Example
In API terminology, namespace is a prefix for names used by the API that makes them unique. The namespace should be a URI, but it doesn't have to be a real address. However, usually it is convenient to place a document that describes the names within the namespace into the namespace URI. For our purposes, namespace contains the custom link relations used by the API.
Object relational mapping is a way of abstracting database use. Database tables are mapped to programming language classes. These are usually called models. A model class declaration defines the table's structure. When rows from the database table are fetched, they are represented as instances of the model class with columns as attributes. Likewise new rows are created by making new instances of the model class and committing them to the database. This course uses SQLAlchemy's ORM engine.
OpenAPI (previously: Swagger) is a description language for API documentation. It can be written with either JSON or YAML. An OpenAPI document is a single nested data structure which makes it suitable to be used with various tools. For example, Swagger UI is a basic tool that renders an OpenAPI description into a browsable documentation page. Other kinds of tools include using schemas in OpenAPI description for validation, and generating OpenAPI specification from live code.
  1. Description
  2. Example
Operation object is one of the main parts of an OpenAPI specification. It describes one operation on a resource (e.g. GET). The operation object includes full details of how to perform the operation, and what kinds of responses can be expected from it. Two of its key parameters are requestBody which shows how to make the request, and responses, which is a mapping of potential responses.
With Flasgger, an operation object can be put into a view method's docstring, or a separate file, to document that particular view method.
Pagination divides a larger dataset into smaller subsets called pages. Search engine results would be the most common example. You usually get 10 or 20 first hits from your search, and then have to request the next page in order to get more. The purpose of pagination is to avoid transferring (and rendering) unnecessary data, and it is particularly useful in scenarios where the relevance of data declines rapidly (like search results where the accuracy drops the further you go). An API that offers paginated data will typically offer access to specific pages with both absolute (i.e. page number) and relative (e.g. "next", "prev", "first" etc.) URLs. These are usually implemented through query parameters.
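A rough sketch of how an API could slice a dataset into pages and build the absolute and relative URLs is shown below. The function name paginate, the page query parameter and the /api/measurements/ URL are hypothetical choices for this example.

```python
from urllib.parse import urlencode


def paginate(items, page, page_size, base_url):
    """Return one page of items together with links to related pages."""
    last_page = max(1, -(-len(items) // page_size))  # ceiling division
    page = min(max(page, 1), last_page)  # clamp to the valid range
    start = (page - 1) * page_size

    def url(number):
        # The page number is passed as a query parameter.
        return f"{base_url}?{urlencode({'page': number})}"

    links = {"self": url(page), "first": url(1), "last": url(last_page)}
    if page > 1:
        links["prev"] = url(page - 1)
    if page < last_page:
        links["next"] = url(page + 1)
    return {"items": items[start:start + page_size], "links": links}


# 25 items split into pages of 10; request the second page.
result = paginate(list(range(1, 26)), page=2, page_size=10,
                  base_url="/api/measurements/")
```

The relative links are only included when the corresponding page exists, so a client can follow "next" until it is no longer offered.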
In OpenAPI a path parameter is a variable placeholder in a path. It is the OpenAPI equivalent for URL parameters that we use in routing. Path parameter typically has a description and a schema that defines what is considered valid for its value. These parameter definitions are often placed into the components object as they will be used in multiple resources. In OpenAPI syntax path parameters in paths are marked with curly braces, e.g. /api/sensors/{sensor}/.
In database terminology primary key refers to the column in a table that's intended to be the primary way of identifying rows. Each table must have exactly one, and it needs to be unique. This is usually some kind of a unique identifier associated with objects represented by the table, or if such an identifier doesn't exist, simply a running ID number (which is incremented automatically).
Profile is metadata about a resource. It's a document intended for client developers. A profile gives meaning to each word used in the resource representation be it link relation or data attribute (also known as semantic descriptors). With the help of profiles, client developers can teach machine clients to understand resource representations sent by the API. Note that profiles are not part of the API and are usually served as static HTML documents. Resource representations should always contain a link to their profile.
In database terminology, query is a command sent to the database that can fetch or alter data in the database. Queries are written with a script-like language. Most common is the structured query language (SQL). In object relational mapping, queries are abstracted behind Python method calls.
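To illustrate both kinds of queries without an ORM, the sketch below sends one fetching and one altering SQL query through Python's built-in sqlite3 module. The measurement table and its contents are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurement (id INTEGER PRIMARY KEY, sensor TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO measurement (sensor, value) VALUES (?, ?)",
    [("alpha", 20.5), ("alpha", 23.0), ("beta", 19.0)],
)

# A query that fetches data: all measurements above a threshold.
hot = conn.execute(
    "SELECT sensor, value FROM measurement WHERE value > ? ORDER BY value",
    (20.0,),
).fetchall()

# A query that alters data: adjust the values of one sensor.
conn.execute("UPDATE measurement SET value = value + 1 WHERE sensor = ?", ("beta",))
beta = conn.execute(
    "SELECT value FROM measurement WHERE sensor = ?", ("beta",)
).fetchone()[0]
```

With an ORM the same operations would be written as method calls on model classes instead of SQL strings.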
  1. Description
  2. Example
Query parameters are additional parameters that are included in a URL. You can often see these in web searches. They are the primary mechanism of passing arbitrary parameters with an HTTP request. They are separated from the actual address by ?. Each parameter is written as a key=value pair, and they are separated from each other by &. In Flask applications they can be found from request.args which works like a dictionary.
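Outside of Flask, the same structure can be examined with the standard library. The sketch below parses and builds a query string with urllib.parse; the URL and parameter names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# A URL with two query parameters after the "?" separator.
url = "http://localhost:5000/api/sensors/?model=uo-test&page=2"

# Everything after "?" is the query string.
query = urlsplit(url).query

# parse_qs returns a dictionary where every value is a list of strings,
# because the same key can appear more than once in a URL.
params = parse_qs(query)

# urlencode goes the other way: from a dictionary to a query string.
built = urlencode({"model": "uo-test", "page": 2})
```

Note that values always arrive as strings, so numbers like the page above need to be converted by the application.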
  1. Description
  2. Examples
Regular expressions are used in computing to define matching patterns for strings. In this course they are primarily used in validation of route variables, and in JSON schemas. Typical features of regular expressions are that they look like a string of garbage letters and get easily out of hand if you need to match something complex. They are also widely used in Lovelace text field exercises to match correct (and incorrect) answers.
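As a small example of validation with regular expressions, the sketch below checks names against a pattern using Python's re module. The naming rule (lowercase letters and digits in groups separated by single hyphens) is an assumption made for this example, not a rule from the course API.

```python
import re

# One or more groups of lowercase letters/digits, separated by single hyphens.
SENSOR_NAME = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")


def is_valid_name(name):
    """Return True if the whole string matches the naming pattern."""
    return SENSOR_NAME.fullmatch(name) is not None
```

For instance is_valid_name("test-sensor-1") is accepted, while "Test Sensor" and "-bad" are rejected.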
In this course request refers to HTTP request. It's a request sent by a client to an HTTP server. It consists of the requested URL which identifies the resource the client wants to access, and a method describing what it wants to do with the resource. Requests also include headers which provide further context information, and possibly a request body that can contain e.g. a file to upload.
  1. Description
  2. Accessing
In an HTTP request, the request body is the actual content of the request. For example when uploading a file, the file's contents would be contained within the request body. When working with APIs, request body usually contains a JSON document. Request body is mostly used with POST, PUT and PATCH requests.
  1. Description
  2. Getting data
Request object is related to web development frameworks. It's a programming language object representation of the HTTP request made to the server. It has attributes that contain all the information contained within the request, e.g. method, url, headers, request body. In Flask the object can be imported from Flask to make it globally available.
In RESTful API terminology, a resource is anything that is interesting enough that a client might want to access it. A resource is a representation of data that is stored in the API. While they usually represent data from the database tables it is important to understand that they do not have a one-to-one mapping to database tables. A resource can combine data from multiple tables, and there can be multiple representations of a single table. Also things like searches are seen as resources (a search does, after all, return a filtered representation of data).
Resource classes are introduced in Flask-RESTful for implementing resources. They are inherited from flask_restful.Resource. A resource class has a view-like method for each HTTP method supported by the resource (method names are written in lowercase). Resources are routed through api.add_resource which routes all of the methods to the same URI (in accordance to REST principles). As a consequence, all methods must also have the same parameters.
In this course we use the term representation to emphasize that a resource is, in fact, a representation of something stored in the API server. In particular you can consider representation to mean the response sent by the API when it receives a GET request. This representation contains not only data but also hypermedia controls which describe the actions available to the client.
In this course response refers to HTTP response, the response given by an HTTP server when a request is made to it. Responses are made of a status code, headers and (optionally) response body. Status code describes the result of the transaction (success, error, something else). Headers provide context information, and response body contains the document (e.g. HTML document) returned by the server.
Response body is the part of HTTP response that contains the actual data sent by the server. The body will be either text or binary, and this information, along with additional type instructions (e.g. JSON), is defined by the response's Content-Type header. Only GET requests are expected to return a response body on a successful request.
Response object is the client side counterpart of request object. It is mainly used in testing: the Flask test client returns a response object when it makes a "request" to the server. The response object has various attributes that represent different parts of an actual HTTP response. Most important are usually status_code and data.
In database terminology, rollback is the cancellation of a database transaction by returning the database to a previous (stable) state. Rollbacks are generally needed if a transaction puts the database in an error state. On this course rollbacks are generally used in testing after deliberately causing errors.
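The sketch below shows a rollback after a deliberately caused error, using Python's built-in sqlite3 module instead of the ORM used on the course. The sensor table and the names in it are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensor (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)")
conn.execute("INSERT INTO sensor (name) VALUES ('alpha')")
conn.commit()  # this is the stable state we can return to

try:
    conn.execute("INSERT INTO sensor (name) VALUES ('beta')")
    # Deliberate error: 'alpha' already exists, which violates UNIQUE.
    conn.execute("INSERT INTO sensor (name) VALUES ('alpha')")
    conn.commit()
except sqlite3.IntegrityError:
    # Cancel the whole transaction, including the successful 'beta' insert.
    conn.rollback()

names = [row[0] for row in conn.execute("SELECT name FROM sensor ORDER BY name")]
```

After the rollback only the row that was committed before the failed transaction remains in the table.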
  1. Description
  2. Routing in Flask
  3. Reverse routing
  4. Flask-RESTful routing
URL routing in web frameworks is the process in which the framework transforms the URL from an HTTP request into a Python function call. When routing, a URL is matched against a sequence of URL templates defined by the web application. The request is routed to the function registered for the first matching URL template. Any variables defined in the template are passed to the function as parameters.
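The matching process described above can be sketched in a few lines of plain Python. This is a simplified illustration and not how Flask implements routing internally; the view functions, the ROUTES list and the helper names are hypothetical, and the template syntax only borrows the angle bracket notation.

```python
import re


def sensor_collection():
    return "all sensors"


def sensor_item(sensor):
    return f"sensor {sensor}"


# URL templates in registration order; <name> marks a variable.
ROUTES = [
    ("/api/sensors/", sensor_collection),
    ("/api/sensors/<sensor>/", sensor_item),
]


def template_to_regex(template):
    # Turn each <name> into a named group that matches one path segment.
    return re.sub(r"<(\w+)>", r"(?P<\1>[^/]+)", template)


def route(path):
    """Call the function registered for the first matching URL template."""
    for template, view in ROUTES:
        match = re.fullmatch(template_to_regex(template), path)
        if match:
            # Variables from the template become keyword arguments.
            return view(**match.groupdict())
    return "404 Not Found"
```

Calling route("/api/sensors/test-sensor-1/") ends up calling sensor_item(sensor="test-sensor-1"), while a path that matches no template falls through to the 404 result.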
In relational database terminology, row refers to a single member of a table, i.e. one object with properties that are defined by the table's columns. Rows must be uniquely identifiable by at least one column (the table's primary key).
SQL (structured query language) is a family of languages that are used for interacting with databases. Queries typically involve selecting a range of data from one or more tables, and defining an operation to perform to it (such as retrieve the contents).
Serialization is a common term in computer science. It's a process through which data structures from a program are turned into a format that can be saved on the hard drive or sent over the network. Serialization is a reversible process - it should be possible to restore the data structure from the representation. A very common serialization method in web development is JSON.
In web applications static content refers to content that is served from static files in the web server's hard drive (or in bigger installations from a separate media server). This includes images as well as javascript files. Also HTML files that are not generated from templates are static content.
Swagger is a set of tools for making API documentation easier. In this course we use it primarily to render easily browsable online documentation from OpenAPI description source files. Swagger open source tools also allow you to run mockup servers from your API description, and there is a Swagger editor where you can easily see the results of changes to your OpenAPI description in the live preview.
In this course we use Flasgger, a Swagger Flask extension, to render API documentation.
In database terminology, a table is a collection of similar items. The attributes of those items are defined by the table's columns that are declared when the table is created. Each item in a table is contained in a row.
In software testing, test setup is a procedure that is undertaken before each test case. It prepares preconditions for the test. On this course this is done with pytest's fixtures.
In software testing, test teardown is a process that is undertaken after each test case. Generally this involves clearing up the database (e.g. dropping all tables) and closing file descriptors, socket connections etc. On this course pytest fixtures are used for this purpose.
Uniform resource identifier (URI) is basically what the name says: it's a string that unambiguously identifies a resource, thereby making it addressable. In APIs everything that is interesting enough is given its own URI. URLs are URIs that specify the exact location where to find the resource which means including protocol (http) and server part (e.g. lovelace.oulu.fi) in addition to the part that identifies the resource within the server (e.g. /ohjelmoitava-web/programmable-web-project-spring-2019).
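The parts of a URL mentioned above can be separated with urllib.parse from the standard library. The full URL below is put together from the example parts in the text; the https scheme is an assumption.

```python
from urllib.parse import urlsplit

# Protocol, server part and path combined into one URL (scheme assumed).
url = "https://lovelace.oulu.fi/ohjelmoitava-web/programmable-web-project-spring-2019"

parts = urlsplit(url)
protocol = parts.scheme   # how to access the resource
server = parts.netloc     # where the server is
path = parts.path         # identifies the resource within the server
```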
  1. Description
  2. Type converters
  3. Custom converters
URL template defines a range of possible URLs that all lead to the same view function by defining variables. While it's possible for these variables to take arbitrary values, they are more commonly used to select one object from a group of similar objects, e.g. one user's profile from all the user profiles in the web service (in Flask: /profile/<username>). If a matching object doesn't exist, the default response would be 404 Not Found. When using a web framework, variables in the URL template are usually passed to the corresponding view function as arguments.
Uniform interface is a REST principle which states that all HTTP methods, which are the verbs of the API, should always behave in the same standardized way. In summary:
  • GET - should return a representation of the resource; does not modify anything
  • POST - should create a new instance that belongs to the target collection
  • PUT - should replace the target resource with a new representation (usually only if it exists)
  • DELETE - should delete the target resource
  • PATCH - should describe a change to the resource
In database terminology, unique constraint is what ensures the uniqueness of each row in a table. Primary key automatically creates a unique constraint, as do unique columns. A unique constraint can also be a combination of columns so that each combination of values between these columns is unique. For example, page numbers by themselves are hardly unique as each book has a first page, but a combination of book and page number is unique - you can only have one first page in a book.
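The book and page number example can be tried out with Python's built-in sqlite3 module. The page table and the book titles are invented for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page ("
    "id INTEGER PRIMARY KEY, "
    "book TEXT NOT NULL, "
    "number INTEGER NOT NULL, "
    "UNIQUE (book, number))"  # the combination must be unique
)

# Two different books can both have a first page.
conn.execute("INSERT INTO page (book, number) VALUES ('Book A', 1)")
conn.execute("INSERT INTO page (book, number) VALUES ('Book B', 1)")

# A second first page for the same book violates the constraint.
try:
    conn.execute("INSERT INTO page (book, number) VALUES ('Book A', 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

page_count = conn.execute("SELECT COUNT(*) FROM page").fetchone()[0]
```

Neither column is unique on its own here; only the pair of values is checked by the database.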
  1. Description
  2. Registering
View functions are Python functions (or methods) that are used for serving HTTP requests. In web applications that often means rendering a view (i.e. a web page). View functions are invoked from URLs by routing. A view function always has application context.
  1. Description
  2. Creation
  3. Activation
A Python virtual environment (virtualenv, venv) is a system for managing packages separately from the operating system's main Python installation. They help project dependency management in multiple ways. First of all, you can install specific versions of packages per project. Second, you can easily get a list of requirements for your project without any extra packages. Third, they can be placed in directories owned by non-admin users so that those users can install the packages they need without admin privileges. The venv module which is in charge of creating virtual environments comes with newer versions of Python.
Interface, implemented using web technologies, that exposes a functionality in a remote machine (server). By extension Web API is the exposed functionality itself.
  1. Description
  2. Example
YAML (YAML Ain't Markup Language) is a human-readable data serialization language that uses a similar object based notation as JSON but removes a lot of the "clutter" that makes JSON hard to read. Like Python, YAML uses indentation to distinguish blocks from each other, although it also supports using braces for this purpose (which, curiously enough, makes JSON valid YAML). It also removes the use of quotation characters where possible. It is one of the options for writing OpenAPI descriptions, and the one we are using on this course.