Implementing Python Checkers with PySenpai

As Lovelace is installed on a Linux server, it is recommended that you develop and test your checkers in a Linux environment. For Python checkers, developing on Windows should also be safe in almost all scenarios. You do not need a running instance of Lovelace for testing: checkers are stand-alone programs.

Checker Basics

In this section we'll go through the basics of implementing a Python checker, and some of the more common and simpler customizations. The file shown below is the skeleton of a minimal checker that tests a function from the student submission.
pychecker_skel.py
import random
import test_core as core

st_function = {
    "fi": "",
    "en": ""
}

msgs = core.TranslationDict()

def gen_vector():
    v = []
    return v

def ref_func():
    return None

if __name__ == "__main__":
    files, lang = core.parse_command()
    st_mname = files[0]
    st_module = core.load_module(st_mname, lang)
    if st_module:
        core.test_function(st_module, st_function, gen_vector, ref_func, lang)
        core.pylint_test(st_module, lang, info_only=True)
    
Starting from the top, the st_function dictionary contains the name of the function to be tested in each of the available languages. The keys are standard language codes. Even if only one language is supported, the function name must still be given as a dictionary.
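The skeleton also creates another dictionary for messages. This is a dictionary subclass that has methods for accessing the same key in multiple languages. The methods used in this guide are set_msg(key, lang, value) for storing a message and get_msg(key, lang) for reading one.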
This is the type of dictionary that is expected as the argument for custom_msgs in the various testing functions. Very basic checkers may not need to set any messages, in which case this can be ignored.
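As a rough mental model only, a dictionary with per-language access could be sketched as below. MiniTranslationDict is a hypothetical stand-in written for illustration, not PySenpai's actual class, and its storage layout is an assumption. Only the set_msg and get_msg call shapes are taken from the examples in this chapter.

```python
class MiniTranslationDict(dict):
    """Hypothetical stand-in for core.TranslationDict (not the real implementation)."""

    def set_msg(self, key, lang, value):
        # Store each message under a (language, key) pair.
        self[(lang, key)] = value

    def get_msg(self, key, lang, default=None):
        # Look up the message for the requested language.
        return self.get((lang, key), default)


msgs = MiniTranslationDict()
msgs.set_msg("CorrectResult", "en", "Your function printed the correct values.")
msgs.set_msg("CorrectResult", "fi", "Funktio tulosti oikeat arvot.")
print(msgs.get_msg("CorrectResult", "fi"))
```

The same handle ("CorrectResult") maps to one message per language, which is why the testing functions can be given a single dictionary regardless of the language the checker runs in.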
Next up are two mandatory functions. The first generates the test vector which in turn determines the number of test cases. The second is the reference function. For simple checkers this is a copy of the function from your reference solution. Details and tips for implementing both are given in later sections. There are a lot of other functions that can be defined here, but these are the mandatory ones.
The section under if __name__ == "__main__": contains the code that actually executes the test preparations and the tests themselves. The parsing of command line arguments is done by PySenpai's parse_command function, which returns the names of files ($RETURNABLES in the command written in the Lovelace file exercise admin interface) and the language argument. If only one file is returned, it is the one to check, which is why the skeleton uses files[0]. If the exercise requires multiple files, you will need some way of figuring out which is which (requiring specific names is the most common).
The load_module function turns the student submission from a file name to an imported Python module. It returns None if importing the module fails, in which case there is no point in trying to continue testing. The rest of the code is calling various test functions.
In the skeleton, two functions are called, both with minimal arguments. These are the most common functions for Python checkers to call. Note that test_function in particular has a lot of optional parameters for various purposes; we'll get to each of them eventually in this guide chapter. For most checkers, using pylint_test is recommended because it doesn't typically require any extra effort from you. Setting info_only=True means the linter messages are only shown to the student; if it is set to False, a score that is too low results in rejection.
If we fill in the function name dictionary and the two functions, we have ourselves a basic checker. Below is an example that checks a function that calculates kinetic energy.
kinetic_test_basic.py
import random
import test_core as core

st_function = {
    "fi": "laske_kineettinen_energia",
    "en": "calculate_kinetic_energy"
}

msgs = core.TranslationDict()

def gen_vector():
    v = []
    for i in range(10):
        v.append((round(random.random() * 100, 2), round(random.random() * 100, 2)))
    
    return v

def ref_func(v, m):
    return 0.5 * m * v * v

if __name__ == "__main__":
    files, lang = core.parse_command()
    st_mname = files[0]
    st_module = core.load_module(st_mname, lang)
    if st_module:
        core.test_function(st_module, st_function, gen_vector, ref_func, lang)
        core.pylint_test(st_module, lang, info_only=True)
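For trying the checker locally, a submission that it should accept might look like the sketch below. This file is a hypothetical example and not part of PySenpai; the only things taken from the checker are the English function name and the argument order of the reference (speed first, then mass).

```python
def calculate_kinetic_energy(v, m):
    """Return the kinetic energy of a mass m moving at speed v."""
    # Written differently from the reference on purpose: a submission only
    # has to produce the same values, not the same expression.
    return m * v ** 2 / 2


print(calculate_kinetic_energy(2.0, 3.0))
```

Because the student may order the arithmetic differently from the reference, results for random floats can differ in the last decimals, which is the motivation for changing the validator later in this chapter.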
    

About Test Vectors

A test vector should contain one list of arguments/inputs for each test case. The number of test cases is directly derived from the length of the test vector. Note that the test vector must always contain lists (or tuples) even if the function being tested only takes a single argument - the function is always called by unpacking the argument list. This is the line that calls the student function:
res = st_func(*args)
So the entire test vector has to always be a list of sequences.
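The effect of unpacking can be tried without PySenpai. In the sketch below, square and add are hypothetical functions standing in for student code; the point is only that a single-argument test case still has to be a one-item tuple.

```python
def square(x):
    return x * x


def add(a, b):
    return a + b


# Each test case is a sequence of arguments, even when there is only one.
single_arg_vector = [(2,), (3,), (10,)]
two_arg_vector = [(1, 2), (3, 4)]

single_results = [square(*args) for args in single_arg_vector]
two_results = [add(*args) for args in two_arg_vector]
print(single_results, two_results)

# A bare value instead of a one-item tuple cannot be unpacked.
try:
    square(*2)
except TypeError as error:
    print("TypeError:", error)
```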
A good test vector has multiple cases, some of which are entirely randomly generated and some of which cover the edge cases that need to be tested specifically. Randomness makes it impossible for students to make code that tries to pass the test instead of actually implementing what was required, while covering edge cases makes sure that partially functioning solutions don’t get accepted accidentally if they get a favorable random test vector.
There are no strict guidelines to how many test cases should be in a checker. For simple checkers, we've been using 10 cases. If the exercise is complex enough that running some or all cases takes a noticeable amount of time, it's best to keep the checking time to a minimum. The default timeout for checker runs is 5 seconds and while this can be changed, it should only be done if necessary.
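As one possible way to combine the two kinds of cases, the sketch below puts a couple of fixed edge cases into the vector first and fills the rest with random values. Picking zero speed and zero mass as the edge cases for the kinetic energy example is an assumption made for illustration.

```python
import random


def gen_vector(random_cases=8):
    # Fixed edge cases (assumed here): zero speed and zero mass.
    v = [(0.0, 10.0), (10.0, 0.0)]
    # Random cases make it pointless to hard-code expected answers.
    for _ in range(random_cases):
        v.append((round(random.random() * 100, 2), round(random.random() * 100, 2)))
    # Shuffle so that the edge cases are not always tested first.
    random.shuffle(v)
    return v


vector = gen_vector()
print(len(vector))
```

Keeping the total at 10 cases matches the count used for simple checkers above.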

Changing the Validator

The default validator of PySenpai, called result_validator, is adequate for most checkers where no fuzziness is required in validation (it does strip whitespaces from both ends of strings). However, there are times when changing the validator is needed (or recommended). Our example here has floating point numbers as results. This runs the risk of rounding errors if the student implementation is different. For this reason, it would be safer to use rounding_float_result_validator as our validator. To do so, we'll change the test_function call to:
core.test_function(st_module, st_function, gen_vector, ref_func, lang, validator=core.rounding_float_result_validator)
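There are a couple of other validators available, including parsed_result_validator, which is used later in this chapter for validating parsed output.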
If you need more elaborate validation, it's time to implement a custom validator. Details are provided in a later section.
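To illustrate why exact comparison is risky with floats, here is a minimal sketch of a rounding comparison. It is not PySenpai's rounding_float_result_validator; the function name, the two-decimal precision, and the decimals parameter are assumptions. The (ref, res, out) signature follows the custom validator shown later in this chapter.

```python
def rounding_validator_sketch(ref, res, out, decimals=2):
    # Hypothetical illustration; PySenpai's own validator may differ.
    assert round(ref, decimals) == round(res, decimals)


reference = 0.3
student = 0.1 + 0.2  # mathematically equal, but not bit-for-bit identical

# Exact comparison rejects a result that is correct for all practical purposes.
print(reference == student)

# The rounding comparison accepts it (no AssertionError is raised).
rounding_validator_sketch(reference, student, "")
```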

Using Input

In this section we'll go through how to test functions that read inputs from the user. Extending the previous example, let's assume we want to add another function to the exercise. This function will prompt the user for a number until they give a proper positive float value. The prompt text is given as an argument to the function.
On the checker side this means we need to add another set of basic function test components. This time two vectors are needed: one for arguments to the function and another for inputs. The argument vector should have a few different prompt messages. Even though these are not used in the test in any way, they're still shown in the output, so we should make a few sensible options. They should be put in a dictionary where keys are the available languages (shown in the example at the end). After we have the prompts, we can generate the argument vector:
def gen_prompt_vector():
    v = []
    for i in range(10):
        v.append((random.choice(prompts[lang]), ))
    return v
The input vector contains the values that will be fed to the student's input function calls through a faked stdin. The stdin for each test run will be formed from the input vector by joining it with newlines. This line in the test_function code does it:
sys.stdin = io.StringIO("\n".join([str(x) for x in inps]))
Since str will be called for each value in the input vector, the vector can contain any types of values. This makes implementing the test somewhat easier. To test, we want to generate vectors where the last item is a proper input, and it is preceded by up to several garbage inputs that are either text or negative numbers. The following example does randomization rather thoroughly:
def gen_input_vector():
    v = []
    v.append((random.choice(string.ascii_letters), round(random.random() * 100, 3)))
    v.append((round(random.random() * -100, 3), round(random.random() * 100, 3)))
    for i in range(8):
        case = []
        for j in range(random.randint(0, 2)):
            if random.randint(0, 1):
                case.append(random.choice(string.ascii_letters))
            else:
                case.append(round(random.random() * -100, random.randint(1, 4)))
            
        case.append(round(random.random() * 100, random.randint(1, 4)))
        v.append(case)
    
    random.shuffle(v)
    return v
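The faked stdin can be tried in isolation. In the sketch below, prompt_float is a hypothetical student solution and run_with_inputs is a hypothetical helper; only the sys.stdin line is taken from test_function as quoted above. Capturing stdout the same way is an assumption about how output might be collected.

```python
import io
import sys


def prompt_float(prompt):
    # Hypothetical student solution: ask until a positive float is given.
    while True:
        try:
            value = float(input(prompt))
        except ValueError:
            print("This is not a number")
            continue
        if value > 0:
            return value
        print("Value must be positive")


def run_with_inputs(func, args, inps):
    # Replace stdin and stdout for the duration of one call.
    saved_in, saved_out = sys.stdin, sys.stdout
    sys.stdin = io.StringIO("\n".join([str(x) for x in inps]))
    sys.stdout = io.StringIO()
    try:
        result = func(*args)
        output = sys.stdout.getvalue()
    finally:
        # Always restore the real streams, even if the call fails.
        sys.stdin, sys.stdout = saved_in, saved_out
    return result, output


result, output = run_with_inputs(prompt_float, ("Input speed: ",), ["x", -4.2, 12.5])
print(result)
```

The input case mixes a string and two floats; since every item goes through str before joining, the function under test sees three lines of text, just as if a user had typed them.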
What about the reference? By default it would only be given the arguments. However, this is not very useful because the information we actually want is in the input vector. In order to get the input vector passed to the reference, we need to set the ref_needs_inputs keyword argument to True when calling test_function. This change needs to be reflected in the reference function: it now gets two arguments, the list of arguments and the list of inputs. The reference in this case is actually very simple:
def ref_prompt(args, inputs):
    return inputs[-1]
This is because we know from implementing the test vector that the last item is always the proper input while everything before it is not. Therefore, when the student function is working correctly it should return the same value. All that's left is to call test_function with two new keyword arguments - one for providing the inputs, and another for telling the function that our reference expects to see both vectors:
core.test_function(st_module, st_prompt_function, gen_prompt_vector, ref_prompt, lang,
    inputs=gen_input_vector,
    ref_needs_inputs=True
)
The updated example is below:
kinetic_test_input.py
import random
import string
import test_core as core

st_function = {
    "fi": "laske_kineettinen_energia",
    "en": "calculate_kinetic_energy"
}

st_prompt_function = {
    "fi": "pyyda_liukuluku",
    "en": "prompt_float"
}

prompts = {
    "fi": ["Anna nopeus: ", "Syötä massa: ", "Anna positiivinen luku: "],
    "en": ["Input speed: ", "Define mass: ", "Give a positive number: "]
}

msgs = core.TranslationDict()

def gen_vector():
    v = []
    for i in range(10):
        v.append((round(random.random() * 100, 2), round(random.random() * 100, 2)))
    return v

def gen_prompt_vector():
    v = []
    for i in range(10):
        v.append((random.choice(prompts[lang]), ))
    return v

def gen_input_vector():
    v = []
    v.append((random.choice(string.ascii_letters), round(random.random() * 100, 3)))
    v.append((round(random.random() * -100, 3), round(random.random() * 100, 3)))
    for i in range(8):
        case = []
        for j in range(random.randint(0, 2)):
            if random.randint(0, 1):
                case.append(random.choice(string.ascii_letters))
            else:
                case.append(round(random.random() * -100, random.randint(1, 4)))
            
        case.append(round(random.random() * 100, random.randint(1, 4)))
        v.append(case)
    
    random.shuffle(v)
    return v

def ref_prompt(args, inputs):
    return inputs[-1]

def ref_func(v, m):
    return 0.5 * m * v * v

if __name__ == "__main__":
    files, lang = core.parse_command()
    st_mname = files[0]
    st_module = core.load_module(st_mname, lang)
    if st_module:
        core.test_function(st_module, st_prompt_function, gen_prompt_vector, ref_prompt, lang, inputs=gen_input_vector, ref_needs_inputs=True)
        core.test_function(st_module, st_function, gen_vector, ref_func, lang)
        core.pylint_test(st_module, lang, info_only=True)
        
    

Using Output

Another way to obtain results from a student function is to parse values from its output. Typically this involves implementing a parser function that obtains the relevant values, and changing the validator to one that uses these values instead of the function's return values. To show how to do this, we're going to make a checker for a function that prints all even or odd numbers from a list.
The parser is a function that receives the raw output produced by the student function and returns values that will be validated against the reference. In this example, it should be a function that finds all integers from the output and returns them as a list. Most of the time this is done using the findall method of regular expression objects.
int_pat = re.compile("-?[0-9]+")

def parse_ints(output):
    return [int(v) for v in int_pat.findall(output)]
Note that it is always better to be more lenient in parsing than in validating. For example, if your exercise demands floating point numbers to be printed with exactly 2 decimal precision, your parser should still return all floats from the output. Incorrect precision should be caught by the validator instead. Also it's best to make sure your parser doesn't miss values due to unexpected formatting - at least within reasonable limits. It's confusing to the students if their code prints values but the checker claims it didn't find them. In cases where you absolutely want to complain about the output being unparseable, you can raise OutputParseError from the parser function. This will abort the current test case and mark it incorrect.
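As an illustration of lenient parsing, the sketch below collects every number from the output regardless of how many decimals it was printed with. The pattern and the parse_floats name are assumptions made for this example; they are not part of PySenpai.

```python
import re

# Accept integers and any number of decimals; judging precision is left
# to the validator, not the parser.
float_pat = re.compile(r"-?[0-9]+(?:\.[0-9]+)?")


def parse_floats(output):
    return [float(v) for v in float_pat.findall(output)]


print(parse_floats("Energy: 12.5 J, rounded 12.50 J, offset -3"))
```

A parser like this still finds the values when a student prints 12.5 instead of 12.50, so the feedback can say that the precision was wrong rather than that no values were found.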
Implementing a reference for output validation is no different than implementing one for a function that returns the values instead. You can see the reference in the example at the end. A few adjustments are needed when validating output instead of return value. First, the validator needs to be changed to parsed_result_validator (or some other that validates outputs). Second, some messages need to be adjusted. By default the messages talk about return values. Here's a set of replacements for tests that validate outputs instead. In the future we may add an option to use these by setting a flag when calling test_function.
custom_msgs = core.TranslationDict()
custom_msgs.set_msg("CorrectResult", "fi", "Funktio tulosti oikeat arvot.")
custom_msgs.set_msg("CorrectResult", "en", "Your function printed the correct values.")
custom_msgs.set_msg("PrintStudentResult", "fi", "Funktion tulosteesta parsittiin arvot: {parsed}")
custom_msgs.set_msg("PrintStudentResult", "en", "Values parsed from function output: {parsed}")
custom_msgs.set_msg("PrintReference", "fi", "Olisi pitänyt saada: {ref}")
custom_msgs.set_msg("PrintReference", "en", "Should have been: {ref}")
With these in mind, we can call test_function. Because the return value of the function is always None, setting test_recurrence to False is called for. Doing so avoids complaining to the student that their function always returns the same value.
core.test_function(st_module, st_function, gen_vector, ref_func, lang, 
    custom_msgs=custom_msgs,
    output_parser=parse_ints,
    validator=core.parsed_result_validator,
    test_recurrence=False
)
Full example:
even_odd_test.py
import random
import re
import test_core as core

int_pat = re.compile("-?[0-9]+")

st_function = {
    "fi": "tulosta_parilliset_parittomat",
    "en": "print_even_odd"
}

custom_msgs = core.TranslationDict()
custom_msgs.set_msg("CorrectResult", "fi", "Funktio tulosti oikeat arvot.")
custom_msgs.set_msg("CorrectResult", "en", "Your function printed the correct values.")
custom_msgs.set_msg("PrintStudentResult", "fi", "Funktion tulosteesta parsittiin arvot: {parsed}")
custom_msgs.set_msg("PrintStudentResult", "en", "Values parsed from function output: {parsed}")
custom_msgs.set_msg("PrintReference", "fi", "Olisi pitänyt saada: {ref}")
custom_msgs.set_msg("PrintReference", "en", "Should have been: {ref}")

def gen_vector():
    v = []
    v.append(([2, 3, 5, 8, 10], True))
    v.append(([4, 3, 7, 9 ,11], False))
    
    for i in range(8):
        numbers = [random.randint(-100, 100) for j in range(random.randint(5, 10))]
        v.append((numbers, bool(random.randint(0, 1))))
        
    random.shuffle(v)
        
    return v

def ref_func(numbers, even):
    if even:
        return [n for n in numbers if n % 2 == 0]
    else:
        return [n for n in numbers if n % 2 != 0]

def parse_ints(output):
    return [int(v) for v in int_pat.findall(output)]

if __name__ == "__main__":
    files, lang = core.parse_command()
    st_mname = files[0]
    st_module = core.load_module(st_mname, lang)
    if st_module:
        core.test_function(st_module, st_function, gen_vector, ref_func, lang,
            custom_msgs=custom_msgs,
            output_parser=parse_ints,
            validator=core.parsed_result_validator,
            test_recurrence=False
        )
        core.pylint_test(st_module, lang, info_only=True)
    

Validating Messages

The previous section covered how to validate values that are parsed from output. What about validating things like error messages? While you can technically do it with a normal validator, using a separate message validator is recommended. When the two validations are separated the student has a better idea of where the problems in their code are. Message validators operate with knowledge of arguments and inputs given to the student function, and its full output. Just like a normal validator, a message validator should also make assert statements about the output. Because message validators often deal with natural language, maximum leniency is recommended.
Getting back to our kinetic energy example, let's assume we've instructed the students to give two different error messages when prompting input. Either "This is not a number" if the input cannot be converted to float, or "Value must be positive" if it's negative. Usually these kinds of validators either use regular expressions or dissect the string in some other way. When using regular expressions, one way is to put them in a TranslationDict object.
msg_patterns = core.TranslationDict()
msg_patterns.set_msg("NaN", "fi", re.compile("ei(?: ole)? numero"))
msg_patterns.set_msg("NaN", "en", re.compile("not ?a? number"))
msg_patterns.set_msg("negative", "fi", re.compile("positiivinen"))
msg_patterns.set_msg("negative", "en", re.compile("positive"))
The validator itself should go through the inputs and see that there's a proper error message for each improper input. However, before doing that it should also check that there's a sufficient number of prompts. Otherwise this validator will cause an uncaught exception (IndexError for lines[i]).
def error_msg_validator(output, args, inputs):
    lines = output.split("\n")
    
    assert len(lines) >= len(inputs), "fail_insufficient_prompts"
    for i, value in enumerate(inputs):
        if isinstance(value, str):
            assert msg_patterns.get_msg("NaN", lang).search(lines[i]), "fail_not_a_number"
        elif value < 0:
            assert msg_patterns.get_msg("negative", lang).search(lines[i]), "fail_negative"
This is also the first instance of using multiple validation failure messages within a single validator. Using "fail_" to prefix message names is a convention. These handles correspond to actual messages in the custom messages dictionary. Here you can also see how to add hints to a message: the value is now a dictionary instead of a string. More details about this can be found in a separate section.
msgs = core.TranslationDict()
msgs.set_msg("fail_insufficient_prompts", "fi", dict(
    content="Funktio ei kysynyt lukua tarpeeksi montaa kertaa.",
    hints=["Tarkista, että funktio hylkää virheelliset syötteet oikein."]
))
msgs.set_msg("fail_insufficient_prompts", "en", dict(
    content="The function didn't prompt input sufficient number of times.",
    hints=["Make sure the function rejects erroneous inputs properly."]
))
msgs.set_msg("fail_not_a_number", "fi", "Muuta kuin numeroita sisältävästä syötteestä kertova virheviesti oli väärä.")
msgs.set_msg("fail_not_a_number", "en", "The error message for non-number input was wrong.")
msgs.set_msg("fail_negative", "fi", "Negatiivisestä syötteestä kertova virheviesti oli väärä.")
msgs.set_msg("fail_negative", "en", "The error message for negative input was wrong.")
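The validator logic can be exercised without PySenpai by feeding it hand-written output. In this sketch a plain dict stands in for the TranslationDict of patterns, and failed_handle is a hypothetical helper that returns the handle of the first failed assert. The sample outputs assume that each prompt and the error message that follows it end up on the same line.

```python
import re

lang = "en"
# Plain dict used as a stand-in for the TranslationDict of compiled patterns.
patterns = {
    ("NaN", "en"): re.compile("not ?a? number"),
    ("negative", "en"): re.compile("positive"),
}


def error_msg_validator(output, args, inputs):
    lines = output.split("\n")
    assert len(lines) >= len(inputs), "fail_insufficient_prompts"
    for i, value in enumerate(inputs):
        if isinstance(value, str):
            assert patterns[("NaN", lang)].search(lines[i]), "fail_not_a_number"
        elif value < 0:
            assert patterns[("negative", lang)].search(lines[i]), "fail_negative"


def failed_handle(output, inputs):
    # Return the handle of the first failed assert, or None on success.
    try:
        error_msg_validator(output, ("Input speed: ",), inputs)
    except AssertionError as error:
        return str(error)
    return None


inputs = ["x", -4.2, 12.5]
good = "Input speed: This is not a number\nInput speed: Value must be positive\nInput speed: "
bad = "Input speed: Invalid input\nInput speed: Value must be positive\nInput speed: "
short = "Input speed: "

print(failed_handle(good, inputs))
print(failed_handle(bad, inputs))
print(failed_handle(short, inputs))
```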
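Again with all the pieces in place, we can call test_function. The messages dictionary is passed as custom_msgs so that the "fail_" handles used by the validator can be resolved:
core.test_function(st_module, st_prompt_function, gen_prompt_vector, ref_prompt, lang, 
    custom_msgs=msgs,
    inputs=gen_input_vector,
    ref_needs_inputs=True,
    message_validator=error_msg_validator
)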
Full example:
kinetic_test_messages.py
import random
import re
import string
import test_core as core

st_function = {
    "fi": "laske_kineettinen_energia",
    "en": "calculate_kinetic_energy"
}

st_prompt_function = {
    "fi": "pyyda_liukuluku",
    "en": "prompt_float"
}

prompts = {
    "fi": ["Anna nopeus: ", "Syötä massa: ", "Anna positiivinen luku: "],
    "en": ["Input speed: ", "Define mass: ", "Give a positive number: "]
}

msg_patterns = core.TranslationDict()
msg_patterns.set_msg("NaN", "fi", re.compile("ei(?: ole)? numero"))
msg_patterns.set_msg("NaN", "en", re.compile("not ?a? number"))
msg_patterns.set_msg("negative", "fi", re.compile("positiivinen"))
msg_patterns.set_msg("negative", "en", re.compile("positive"))

msgs = core.TranslationDict()
msgs.set_msg("fail_insufficient_prompts", "fi", dict(
    content="Funktio ei kysynyt lukua tarpeeksi montaa kertaa.",
    hints=["Tarkista, että funktio hylkää virheelliset syötteet oikein."]
))
msgs.set_msg("fail_insufficient_prompts", "en", dict(
    content="The function didn't prompt input sufficient number of times.",
    hints=["Make sure the function rejects erroneous inputs properly."]
))
msgs.set_msg("fail_not_a_number", "fi", "Muuta kuin numeroita sisältävästä syötteestä kertova virheviesti oli väärä.")
msgs.set_msg("fail_not_a_number", "en", "The error message for non-number input was wrong.")
msgs.set_msg("fail_negative", "fi", "Negatiivisestä syötteestä kertova virheviesti oli väärä.")
msgs.set_msg("fail_negative", "en", "The error message for negative input was wrong.")

def gen_vector():
    v = []
    for i in range(10):
        v.append((round(random.random() * 100, 2), round(random.random() * 100, 2)))
    return v

def gen_prompt_vector():
    v = []
    for i in range(10):
        v.append((random.choice(prompts[lang]), ))
    return v

def gen_input_vector():
    v = []
    v.append((random.choice(string.ascii_letters), round(random.random() * 100, 3)))
    v.append((round(random.random() * -100, 3), round(random.random() * 100, 3)))
    for i in range(8):
        case = []
        for j in range(random.randint(0, 2)):
            if random.randint(0, 1):
                case.append(random.choice(string.ascii_letters))
            else:
                case.append(round(random.random() * -100, random.randint(1, 4)))
            
        case.append(round(random.random() * 100, random.randint(1, 4)))
        v.append(case)
    
    random.shuffle(v)
    return v

def ref_prompt(args, inputs):
    return inputs[-1]

def ref_func(v, m):
    return 0.5 * m * v * v

def error_msg_validator(output, args, inputs):
    lines = output.split("\n")
    
    assert len(lines) >= len(inputs), "fail_insufficient_prompts"
    for i, value in enumerate(inputs):
        if isinstance(value, str):
            assert msg_patterns.get_msg("NaN", lang).search(lines[i]), "fail_not_a_number"
        elif value < 0:
            assert msg_patterns.get_msg("negative", lang).search(lines[i]), "fail_negative"

if __name__ == "__main__":
    files, lang = core.parse_command()
    st_mname = files[0]
    st_module = core.load_module(st_mname, lang)
    if st_module:
        core.test_function(st_module, st_prompt_function, gen_prompt_vector, ref_prompt, lang, 
            custom_msgs=msgs,
            inputs=gen_input_vector,
            ref_needs_inputs=True,
            message_validator=error_msg_validator
        )
        core.test_function(st_module, st_function, gen_vector, ref_func, lang)
        core.pylint_test(st_module, lang, info_only=True)
        
    

Using Objects

Checking functions that modify existing objects instead of returning anything can be done using a result object extractor. These are functions that modify the student result. The most commonly used modification is to replace the return value with one of the arguments from the test vector, namely the one that contains the object that was modified. When doing tests with objects, it should be noted that, by default, the same object is passed to the reference, the student function, and any other functions involved. Since this will lead to problems, we need another modification to the default behavior: an argument cloner. A cloner creates copies of mutable objects in the argument vector to prevent functions from affecting each other.
To show how to do these, we'll take a new example once more. This will be a simple filtering function that removes all values with an absolute value over a given threshold from a list of values. Students have been instructed to remove values from the original list; they should not make a copy of it. Most of the functions are only shown in the full example at the end, as there isn't anything new about them. In order to change the student result from the return value to one of the arguments, this simple extractor function will do:
def values_extractor(args, res, parsed):
    return args[0]
As can be seen from the def line, arguments, student return value and values parsed by an output parser are available for extraction. This extraction is not done for the reference - the reference should directly return the value(s) we want to validate the student result against. In this case it would be:
def ref_func(values, threshold):
    return [value for value in values if abs(value) <= threshold]
Unlike the function expected from students this one actually does create a copy - but we're allowed to cheat in reference functions as long as we get the desired result. Although our reference doesn't modify the original list of values, an argument cloner is still needed because the arguments list goes into several different function calls (like when printing the function call into the evaluation log). For a one-dimensional list, the argument cloner is very simple:
def values_cloner(args):
    return args[0][:], args[1]
Most objects can also be cloned by the deepcopy function from the copy module if you're feeling lazy. Again with all the pieces in place, we'll just call test_function:
core.test_function(st_module, st_function, gen_vector, ref_func, lang,
    result_object_extractor=values_extractor,
    argument_cloner=values_cloner
)
If you are content with just using deepcopy, you can import copy and pass argument_cloner=copy.deepcopy when calling the function, without implementing your own cloner. However, this might be slower, especially if there are parts in your argument vector that don't need cloning.
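The need for cloning is easy to demonstrate on its own. In the sketch below, filter_in_place is a hypothetical student solution; values_cloner is the cloner from this section.

```python
import copy


def filter_in_place(values, threshold):
    # Hypothetical student solution that edits the list it was given.
    for value in values[:]:
        if abs(value) > threshold:
            values.remove(value)


def values_cloner(args):
    return args[0][:], args[1]


# Without cloning, the test vector entry itself is changed by the call.
shared = ([10, 15, 15, 10], 12)
filter_in_place(*shared)
print(shared[0])

# With cloning, the original arguments stay intact for later use,
# such as calling the reference or printing the function call.
original = ([10, 15, 15, 10], 12)
cloned = values_cloner(original)
filter_in_place(*cloned)
print(original[0], cloned[0])

# copy.deepcopy gives the same protection without a custom cloner.
deep = copy.deepcopy(original)
print(deep[0] is original[0])
```

The slice copy in values_cloner is enough here because the list only holds numbers; nested mutable structures would need a deeper copy.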
Full example:
filter_test.py
import random
import test_core as core

st_function = {
    "fi": "suodata_virhearvot", 
    "en": "filter_noise"
}

msgs = core.TranslationDict()

def gen_vector(): 
    v = [([10, 15, 15, 10, 10, 10], 12)]
    for i in range(9):    
        test = []
        for j in range(random.randint(1, 10)):
            test.append(random.randint(1, 99) * (random.randint(0, 1) * 1000 + 1))
        v.append((test, random.randint(1, 1000)))
    return v

def ref_func(values, threshold):
    return [value for value in values if abs(value) <= threshold]

def values_extractor(args, res, parsed):
    return args[0]
    
def values_cloner(args):
    return args[0][:], args[1]

if __name__ == "__main__":
    files, lang = core.parse_command()
    st_mname = files[0]   
    st_module = core.load_module(st_mname, lang, inputs=[[], 1])
    if st_module:
        core.test_function(st_module, st_function, gen_vector, ref_func, lang,
            result_object_extractor=values_extractor,
            argument_cloner=values_cloner
        )
        core.pylint_test(st_module, lang)

Custom Validators

Default validators are adequate for tests where exact matching between reference and student result is reasonable, and informative enough. However when the correct result has multiple potential representations or it is simply very complex, custom validators might be needed. Likewise if the task itself is complex, a custom validator with more than one assert can give better information about how exactly the student's submission is wrong.
Validators are functions that use one or more assert statements to compare the student's result or parsed output (or both) against the reference. The first failed assert is reported as the test result. If the validator runs through without any failed asserts, the validation is considered successful. Each assert can be connected to a message that's been defined in the custom messages provided by the checker. To do so, the assert statement should use a handle string which corresponds to a message key in the dictionary.
For our example let's look at a checker that tests a function that converts a complex number to polar coordinates using degrees. While the default validator is capable of handling this, we want to give more information than just correct / incorrect. Also there might be some rounding errors which would cause the default validator to give false negatives.
A key thing to remember in implementing custom validators is that the only exception that is caught in this step is AssertionError. It's the validator's responsibility to make sure it's not trying to do impossible things. Since validating ends at the first failed assert, ordering assert statements properly can be used to prevent validation from proceeding to steps that may cause errors. In our case we need to start by making sure the student function returned two values, then make sure they are both floats (because we want to round them), and finally we can compare them to the reference.
def polar_validator(ref, res, out):
    assert isinstance(res, (tuple, list)), "fail_return_value_length"
    assert len(res) == 2, "fail_return_value_length"
    assert isinstance(res[0], float), "fail_not_float" 
    assert isinstance(res[1], float), "fail_not_float"
    assert round(res[0], 2) == round(ref[0], 2), "fail_radius"
    assert round(res[1], 2) == round(ref[1], 2), "fail_angle"
Compared to a default validator, there are now four different messages. We don't expect the first two to trigger very often, but providing messages for them doesn't cost us a whole lot, so we might as well. You can see the messages in the full example below. With the validator function and messages in place, it's time to call test_function:
core.test_function(st_module, st_function, gen_vector, ref_func, lang,
    custom_msgs=msgs,
    validator=polar_validator
)
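Because checkers are stand-alone programs, a validator can also be tried out without PySenpai. The sketch below is only an illustration of the behavior described above: try_validator is a hypothetical helper (not part of PySenpai) that calls the validator, catches nothing but AssertionError, and returns the handle of the first failed assert.

```python
def polar_validator(ref, res, out):
    assert isinstance(res, (tuple, list)), "fail_return_value_length"
    assert len(res) == 2, "fail_return_value_length"
    assert isinstance(res[0], float), "fail_not_float"
    assert isinstance(res[1], float), "fail_not_float"
    assert round(res[0], 2) == round(ref[0], 2), "fail_radius"
    assert round(res[1], 2) == round(ref[1], 2), "fail_angle"


def try_validator(validator, ref, res, out=""):
    """Hypothetical helper: return the handle of the first failed assert, or None."""
    try:
        validator(ref, res, out)
    except AssertionError as error:
        return str(error)
    return None


ref = (0.7, -90.0)
print(try_validator(polar_validator, ref, (0.7, -90.0)))    # correct answer
print(try_validator(polar_validator, ref, 0.7))             # single value returned
print(try_validator(polar_validator, ref, (1, -90.0)))      # int instead of float
print(try_validator(polar_validator, ref, (0.7, -1.5708)))  # angle left in radians
```

Note how the ordering of the asserts keeps round from ever being called on something that is not a float.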
Full example:
polar_test.py
import cmath
import random
import test_core as core

st_function = {
    "fi": "muunna_napakoordinaateiksi", 
    "en": "convert_to_polar"
}

msgs = core.TranslationDict()
msgs.set_msg("fail_return_value_length", "fi", "Funktio ei palauttanut kahta arvoa.")
msgs.set_msg("fail_return_value_length", "en", "The function didn't return two values.")
msgs.set_msg("fail_not_float", "fi", "Palautetut arvot eivät olleet liukulukuja.")
msgs.set_msg("fail_not_float", "en", "The returned values were not floats.")
msgs.set_msg("fail_radius", "fi", "Osoittimen pituus oli väärin.")
msgs.set_msg("fail_radius", "en", "The radius was wrong.")
msgs.set_msg("fail_angle", "fi", "Osoittimen vaihekulma oli väärin.")
msgs.set_msg("fail_angle", "en", "The polar angle was wrong.")

def gen_vector():
    v = [(-0.7j, )]
    
    for i in range(9):
        v.append((complex(round(random.random() * 10, 5), round(random.random() * 10 - 5, 5)), ))
    
    return v    
    
def ref_func(v):
    r, a = cmath.polar(v)
    return r, a * 180 / cmath.pi

def eref_radians(v):
    r, a = cmath.polar(v)
    return r, a    
    
def polar_validator(ref, res, out):
    assert isinstance(res, (tuple, list)), "fail_return_value_length"
    assert len(res) == 2, "fail_return_value_length"
    assert isinstance(res[0], float), "fail_not_float" 
    assert isinstance(res[1], float), "fail_not_float"
    assert round(res[0], 2) == round(ref[0], 2), "fail_radius"
    assert round(res[1], 2) == round(ref[1], 2), "fail_angle"
    
if __name__ == "__main__":
    files, lang = core.parse_command()
    st_mname = files[0]   
    st_module = core.load_module(st_mname, lang, allow_output=False, custom_msgs=msgs)
    if st_module:
        core.test_function(st_module, st_function, gen_vector, ref_func, lang,
            custom_msgs=msgs,
            validator=polar_validator
        )

Other Customization

Most customization was covered by the previous sections. There are two more keyword arguments to test_function that were not covered: new_test and repeat. The first is for miscellaneous preparations at the start of each test case and the latter is a mechanism for calling the student function multiple times with the same arguments and inputs instead of once. Both of these are mostly for enabling fringe checkers.
If you provide a function as the new_test callback, this function will be called as the very first thing in each test case. In most checkers it's not needed for anything at all (the default does nothing). However, if there are persistent objects involved in the checking process that are not re-initialized by the student code, this callback is the correct place to reset them. The callback receives two arguments: test case arguments and inputs. Any other objects that you need have to be accessed globally within the checker code.
Repeat is needed even more rarely. It is only useful for testing functions that would normally be called multiple times within a loop to achieve the desired result.
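As a minimal sketch of a new_test callback, assume the checker keeps a global list that collects data during a test case. The names call_log, record and reset_log are hypothetical; only the two-argument callback signature comes from the description above.

```python
# Persistent state that lives in the checker module (hypothetical example).
call_log = []


def record(value):
    """Stand-in for whatever fills the persistent object during a test case."""
    call_log.append(value)


def reset_log(args, inputs):
    """new_test callback: receives the test case arguments and inputs."""
    call_log.clear()


# Simulate two test cases: state from the first must not leak into the second.
record("case 1")
reset_log((1, 2), [])
record("case 2")
print(call_log)

# In a checker the callback would be passed as a keyword argument:
# core.test_function(..., new_test=reset_log)
```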

Diagnosis Features

Due to the default messages of PySenpai, even by following the basic checker instructions above most checkers give relatively useful feedback. However, to truly reduce TA work, PySenpai offers a few ways to create catches for typical mistakes, and provide additional hints when students make them. Creating these is often an iterative process as year after year you have more data about what kinds of mistakes students make. However, making some initial guesses can be helpful too.
Implementing a diagnostic typically involves creating a function that discovers the mistake, and one or more messages that are shown in the evaluation output when the mistake is encountered. Attaching hints and highlight triggers is also common as they can provide much more accurate information when your checker is pretty certain of the nature of the mistake.
There are three ways to implement diagnosis functions: error/false references, custom tests and information functions. There is some overlap, but usually it's pretty straightforward to choose the right one. As stated in the PySenpai overview, there is also a return value recurrence check built into PySenpai. This is enabled by default and should be disabled when testing functions that don't return or modify anything (i.e. functions that only print stuff).

False Reference Functions

False or error reference functions are one of the most convenient ways to identify commonly made mistakes by students. A false reference is a modified copy of the actual reference function that emulates a previously identified erroneous behavior. In tests, these functions are treated like reference functions. They are called in the diagnosis step, and the student’s result is compared with the false reference result using the same validator as the test itself. If it matches (i.e. there is no AssertionError), it is highly likely that the student has made the error that’s emulated by the false reference.
False references are usually very easy to implement. Attaching messages to them is also very simple. When PySenpai gets a match with the student result against the false reference, it looks up a message with the false reference function's name. Do note that this message must be provided by the checker - there is no default message. Having a default would not make sense because PySenpai cannot know what your false reference function wants to say. False references are passed to test_function as a list of functions, so you can have as many as you want.
Let's assume our students have a hard time remembering to multiply the result of m * v ** 2 by 0.5 when calculating kinetic energy. In this case the false reference would simply be:
def eref_2x_energy(v, m):
    return m * v * v
After creating the function we also need the corresponding message in our custom messages dictionary.
msgs = core.TranslationDict()
msgs.set_msg("eref_2x_energy", "fi", dict(
    content="Funktion palauttama tulos oli 2x liian suuri.",
    hints=["Tarkista kineettisen energian laskukaava."]
))
msgs.set_msg("eref_2x_energy", "en", dict(
    content="The function's return value was 2 times too big.",
    hints=["Check the formula for kinetic energy."]
))
Finally, modify the test_function call to include our new diagnosis function.
core.test_function(st_module, st_function, gen_vector, ref_func, lang,
    custom_msgs=msgs,
    error_refs=[eref_2x_energy]            
)
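The matching step described above can be modeled without PySenpai. The sketch below is an assumption about the general idea only, not PySenpai's actual code: matching_error_refs and simple_validator are hypothetical names, and the validator is a stand-in for whichever validator the test uses.

```python
def ref_func(v, m):
    return 0.5 * m * v * v


def eref_2x_energy(v, m):
    return m * v * v


def simple_validator(ref, res, out):
    # Stand-in for the validator used by the test itself.
    assert res == ref


def matching_error_refs(args, res, error_refs, validator):
    """Return the names of the false references whose result the student result matches."""
    matches = []
    for eref in error_refs:
        try:
            validator(eref(*args), res, "")
        except AssertionError:
            continue  # no match, try the next false reference
        matches.append(eref.__name__)
    return matches


args = (3.0, 2.0)
wrong_result = 2.0 * 3.0 * 3.0  # a submission that forgot the 0.5 factor
print(matching_error_refs(args, wrong_result, [eref_2x_energy], simple_validator))
print(matching_error_refs(args, ref_func(*args), [eref_2x_energy], simple_validator))
```

The returned name is what would be used as the message key, which is why the message above is stored under "eref_2x_energy".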
Full example, based on the minimal checker we created earlier.
kinetic_test_eref.py
import random
import test_core as core

st_function = {
    "fi": "laske_kineettinen_energia",
    "en": "calculate_kinetic_energy"
}

msgs = core.TranslationDict()
msgs.set_msg("eref_2x_energy", "fi", dict(
    content="Funktion palauttama tulos oli 2x liian suuri.",
    hints=["Tarkista kineettisen energian laskukaava."]
))
msgs.set_msg("eref_2x_energy", "en", dict(
    content="The function's return value was 2 times too big.",
    hints=["Check the formula for kinetic energy."]
))

def gen_vector():
    v = []
    for i in range(10):
        v.append((round(random.random() * 100, 2), round(random.random() * 100, 2)))
    
    return v

def ref_func(v, m):
    return 0.5 * m * v * v

def eref_2x_energy(v, m):
    return m * v * v

if __name__ == "__main__":
    files, lang = core.parse_command()
    st_mname = files[0]
    st_module = core.load_module(st_mname, lang)
    if st_module:
        core.test_function(st_module, st_function, gen_vector, ref_func, lang,
            custom_msgs=msgs,
            error_refs=[eref_2x_energy]
        )
        core.pylint_test(st_module, lang, info_only=True)
    

Custom Tests

Custom tests are additional validator functions, but instead of validating the result, they check the result for known mistakes. To this end they are given more arguments by PySenpai than normal validators. A custom test can make use of arguments, inputs and raw output of the student function in addition to what's available to normal validators (i.e. result, values parsed from output and reference). A custom test can make multiple asserts the same way a validator does. Likewise, each assert can be connected to a different message. If a corresponding message is not found, PySenpai uses the function's name to fetch a message (if this fails it raises a KeyError).
The overlap of custom tests and validators is largely due to historical reasons. In the past validators did not use assert statements - they simply returned True or False. This meant that custom tests were needed whenever a more accurate statement about the problem was called for. With assert statements, validators can do most of the work that was previously done by custom tests. However, there are still some valid reasons to use custom tests.
Since custom tests are only run if the initial validation fails, they can include tests that would occasionally trigger with a correctly behaving function. Another advantage is that they give information on top of the validation rejection message (remembering that only the first failed assert is reported). Also if you are otherwise content with the default validator or one of the built-ins, custom tests can provide the additional checking that is not necessary for validating but can be useful for the student to know.
The example shown here is pretty simple. It's from a test that uses a standard validator to check a prompt function that is expected to return an integer that is 2 or bigger. The custom test is added to draw more attention to situations where the student function returns a number that's smaller than 2 as this is something they may have missed in the exercise description.
custom_msgs.set_msg("fail_less_than_two", "fi", dict(
    content="Funktio palautti luvun joka on pienempi kuin kaksi.",
    hints=["Varmista, että kyselyfunktio tarkistaa myös onko luku suurempi kuin 1."]
))
custom_msgs.set_msg("fail_less_than_two", "en", dict(
    content="Your function returned a number that's smaller than two.",
    hints=["Make sure your input function also checks that the given number is greater than 1."]
))

def less_than_two(res, parsed, out, ref, args, inps):
    if isinstance(res, int):
        assert res > 1, "fail_less_than_two"
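A custom test can be exercised on its own before it is attached to a checker. In the sketch below run_custom_test is a hypothetical helper that calls the test with the six arguments listed above and falls back to the function's name when the failed assert carries no handle, mirroring the lookup order described earlier.

```python
def less_than_two(res, parsed, out, ref, args, inps):
    if isinstance(res, int):
        assert res > 1, "fail_less_than_two"


def run_custom_test(test, res, parsed=None, out="", ref=None, args=(), inps=()):
    """Hypothetical helper: return the message key a failed custom test would use."""
    try:
        test(res, parsed, out, ref, args, inps)
    except AssertionError as error:
        return str(error) or test.__name__
    return None


print(run_custom_test(less_than_two, 1))      # too small
print(run_custom_test(less_than_two, 5))      # fine
print(run_custom_test(less_than_two, "abc"))  # not an int, so nothing to report
```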

Information Functions

Information functions are in many ways similar to custom tests, but their results are reported differently. Where custom tests report the message for the failed assertion (if any), information functions report a message that contains value(s) returned by the function. The use case for information functions is when you want to show something specific to the student instead of giving a verbal statement about the issue. Information functions receive the same arguments as custom tests. However, unlike custom tests, information functions must return a value.
Information functions need to be accompanied by a message that uses the function's name as its dictionary key. This message is given the return value of the information function as a named key argument func_res. Information functions are not expected to find something every time. For the times they do not find anything worth reporting, they should raise NoAdditionalInfo. This will signal to test_function to not print the associated message at all.
An example use of this is from a checker that tests a function that finds out whether a given number is a prime. As the function return value is simply True or False, the report has no information which divisor the student function may have missed when it gives a false positive. To add this information, we can make an information function that returns the smallest divisor for a non-prime number. If the number is a prime, there is no divisor to show - therefore NoAdditionalInfo is raised.
custom_msgs.set_msg("show_divisor", "fi", "Luku on jaollinen (ainakin) luvulla {func_res}")
custom_msgs.set_msg("show_divisor", "en", "The number's (first) factor is {func_res}")

def show_divisor(res, parsed, output, ref, args, inputs):
    for i in range(2, int(args[0] ** 0.5) + 1):
        if args[0] % i == 0:
            return i
    raise core.NoAdditionalInfo
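To see how the return value ends up in the message, the sketch below runs the same information function outside PySenpai. The NoAdditionalInfo class here is a local stand-in for core.NoAdditionalInfo, and report is a hypothetical helper that formats the message with func_res as described above.

```python
class NoAdditionalInfo(Exception):
    """Local stand-in for core.NoAdditionalInfo so the sketch runs on its own."""


MESSAGE = "The number's (first) factor is {func_res}"


def show_divisor(res, parsed, output, ref, args, inputs):
    for i in range(2, int(args[0] ** 0.5) + 1):
        if args[0] % i == 0:
            return i
    raise NoAdditionalInfo


def report(info_func, args):
    """Hypothetical helper: return the formatted message, or None if there is nothing to show."""
    try:
        value = info_func(None, None, "", None, args, [])
    except NoAdditionalInfo:
        return None
    return MESSAGE.format(func_res=value)


print(report(show_divisor, (91,)))  # 91 = 7 * 13
print(report(show_divisor, (97,)))  # 97 is a prime
```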

Output Customization

An important aspect of checkers is that their output should accurately represent what is going on in the test. Misleading messages are worse than no messages at all. PySenpai's default set of messages is adequate for most normal situations, but when checkers do something slightly different, altering the messages appropriately is called for. As discussed earlier, checkers can also add their own messages for validation and diagnosis features. These should be more helpful than "Your function returned incorrect value(s)".
Another aspect of output is the representation of code and data. Again PySenpai provides reasonable defaults that represent things like simple function calls, simple values and even simple structures quite well. However, when checkers involve more complex code or structures, these default representations can become unwieldy. In these situations, checkers should provide presenter overrides - functions that format values more pleasantly for the evaluation report.
When Lovelace processes the evaluation log from PySenpai, it uses the Lovelace markup parser to render everything. This means messages can - and usually should - include markup to make the feedback easier to read. PySenpai's default messages do use Lovelace markup. Any message can also be accompanied with a list of hints and/or triggers by using a dictionary instead of a string as the message value. So the message value can be either a string, or a dictionary with the following keys: content (the message text), hints (a list of hints) and triggers (a list of highlight triggers).
If you want to add hints or triggers to a default message, simply omit the content key from your dictionary. For new messages content must always be provided.

Overriding Default Messages

Overriding in PySenpai works by calling the update method of the default message dictionary with the custom_msgs parameter as its argument. The update method of TranslationDict is slightly modified from a normal dictionary update. On the operational level, all messages in PySenpai are stored as dictionaries. However, when overriding or adding messages, the message can be provided as a string or a dictionary. When overriding an existing message with a string, the string replaces the value of the "content" key in the dictionary.
To create an override, simply create a message in your custom message dictionary with the same key as the one you want to replace. You can find the specifics of each default message in the PySenpai Message Reference, including what keyword arguments are available for format strings. If you use a dictionary without the "content" key as your override, the default message will be unchanged but any hints or triggers will be included in the evaluation log if the message is triggered.
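The override rules above can be summarized with a small model. This is only a sketch of the described behavior, not the actual TranslationDict code: apply_override and the message texts are hypothetical.

```python
def apply_override(default, override):
    """Hypothetical model of merging one override into a default message."""
    merged = dict(default)
    if isinstance(override, str):
        merged["content"] = override  # a string replaces only the content
    else:
        merged.update(override)       # a dictionary updates the keys it contains
    return merged


default = {"content": "Default text"}

replaced = apply_override(default, "Custom text")
extended = apply_override(default, {"hints": ["Check the return statement."]})

print(replaced)
print(extended)
```

In the second case the default text is kept and the hint is added on top of it.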

Adding Messages

Messages need to be added when a checker wants to report something that PySenpai doesn't report by default. Custom validators that have more than one assert typically have separate message keys for each assert. Messages corresponding to these keys must be found in the message dictionary. Most of these messages must be plain statements - they are not offered any formatting arguments. The sole exceptions are messages tied to information functions, which get the function's return value as a formatting argument. Adding them is very straightforward since the checker developer is in charge of choosing the message keys.
In addition to completely custom messages, you can also add messages for exceptions that might occur when importing the student code or calling the student function. When importing the student module, PySenpai has default messages for the following exceptions:
When calling the student function, the following exceptions have default messages:
To add a message for an exception, simply use the exception's name (with CamelCase) as the message key. When messages are printed into the evaluation log, they are given two format arguments: ename and emsg. Both are obtained from Python's stack trace. For legacy reasons the argument and input lists are also given (as args and inputs) when formatting but should not be used - both are also printed separately when an exception occurs using the PrintTestVector and PrintInputVector messages.
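As a rough illustration of the two format arguments, the sketch below formats a hypothetical message template with ename and emsg taken from a caught exception. The template text and the helper name are assumptions, and whether a given exception already has a default message is not assumed here.

```python
# Hypothetical message text; {ename} and {emsg} are the format arguments named above.
template = "Your function raised {ename}: {emsg}"


def format_exception_message(exc):
    """Fill the template the way the two format arguments are described."""
    return template.format(ename=type(exc).__name__, emsg=str(exc))


try:
    [][3]
except IndexError as exc:
    print(format_exception_message(exc))

# In a checker the template would be stored under the exception's name, e.g.
# msgs.set_msg("IndexError", "en", template)
```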

Overriding Presenters

Presenters are functions that prepare various test-related data to be shown in the evaluation log. Function tests support 6 presenters, although they only use 5 of those by default. The presenters are for arguments (arg), the function call (call), inputs (input), the reference result (ref), the student result (res) and values parsed from output (parsed).
Of these, the arguments presenter is not used by default because the arguments are shown in the function call representation. The latter is more useful because it shows exactly how the student's function was called, and they can copy the line into their own code for trying it out themselves. The default call presenter splits long function calls into multiple lines, and always encapsulates the call into a syntax-highlighted code block markup. The default presenter for data values mainly uses repr and cleans braces from lists, tuples and dictionaries.
Presenter overrides are given to test_function as a dictionary. You only need to provide key/value pairs for presenters you want to override. Below is the dictionary that contains the defaults. Just copy it, put in your replacements and cut the other lines. When calling test_function, this dictionary should be given as the presenter keyword argument.
default_presenters = {
    "arg": default_value_presenter,
    "call": default_call_presenter,
    "input": default_input_presenter,
    "ref": default_value_presenter,
    "res": default_value_presenter,
    "parsed": default_value_presenter
}
Presenters themselves are relatively straightforward functions: they receive a single argument, and should return a single representation. Usually this should be a fully formatted string. However, if you also override the message, you can return a structure and do the rest of the formatting in the message format string instead. Presenters are expected to handle their own exceptions. This is especially important for presenters that format student results. A good practice is to use a blanket exception and default to return repr(value).
The most common use for custom presenters is values where repr isn't sufficient to give a nice representation (e.g. two-dimensional lists) or a sensible representation at all (e.g. objects, files). Likewise the default call presenter has trouble when given long lists as arguments. The optimal representation of data structures is often dependent on the exact task at hand. For exercises where files are written, the following example can be used to show the file contents (in a separate box using a monospace font) when given a filename.
def file_presenter(value):
    try:
        with open(value) as source:
            return "{{{\n" + source.read() + "\n}}}"
    except Exception:
        return core.default_value_presenter(value)
There are a lot of cases where a modified representation of data is appropriate. For example in Elementary Programming we have a few exercises where a 2D map of minesweeper is involved. If presented as a Python data structure, it's an unreadable mess. However, using presenters it can be turned into a nice ASCII visualization of the same structure. For example, this presenter shows the arguments to a floodfill function in a more readable way:
fill_msgs = core.TranslationDict()
fill_msgs.set_msg("PrintTestVector", "fi", "Tutkimuksen aloituspiste:\n{args[0]}\nPlaneetta:\n{args[1]}\n")
fill_msgs.set_msg("PrintTestVector", "en", "Exploration starting point:\n{args[0]}\nPlanet:\n{args[1]}\n")

def planet_to_string_numbers(planet):
    view = ""
    cl = "  "
    for ci in range(len(planet[0])):
        cl += str(ci).rjust(2)
    view += cl + "\n"
    for ri, r in enumerate(planet):        
        view += str(ri).rjust(2) + " " + " ".join(r) + "\n"
    return view    
    
def arg_presenter(val):
    return "{1} {2}".format(*val), "{{{\n" + planet_to_string_numbers(val[0]) + "\n}}}"
In this example you can also see how to split the presenter output into two different placeholders inside the message format string. Just return two values from the presenter and use indexing in placeholders. This produces a nice output that is much more readable than what the default function call presenter would have rendered. This is what the data structure visualization looks like in the output:
  0 1 2 3 4
0     x
1     x
2 x x x x x
3     x
4     x
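A generic variant of the same idea, with the blanket exception handling recommended earlier, could look like the sketch below. grid_presenter is a hypothetical name; the only things taken from the text are the single-argument signature, the {{{ }}} box markup and the fallback to repr.

```python
def grid_presenter(value):
    """Show a two-dimensional list one row per line inside a monospace box."""
    try:
        rows = [" ".join(str(cell) for cell in row) for row in value]
        return "{{{\n" + "\n".join(rows) + "\n}}}"
    except Exception:
        # Student results can be anything, so fall back to a plain repr.
        return repr(value)


print(grid_presenter([[" ", "x", " "], ["x", "x", "x"], [" ", "x", " "]]))
print(grid_presenter(None))
```

Such a presenter would be passed to test_function in the presenter dictionary, for example under the "res" key.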

Program Tests

While function tests are by far the most important way of evaluating programs, sometimes testing full programs is desired. Program tests are mostly similar to function tests. This section only covers the differences between the two, and features that are unique to program tests.
Program tests have been implemented by using the reload function from importlib. This means that the student code is not being run as a standalone program, but imported. If the student program's main code is within the if __name__ == "__main__": conditional statement, the program test cannot run it. If your exercise needs to run program tests, students should be instructed to not use this conditional statement.
By far the biggest difference between function and program tests is the lack of arguments and return values. For program tests, the test vector is an input vector. Likewise the only way to get results is to parse them from the student code's output. Output parsers and message validators are therefore more commonly used in program tests. Their implementation doesn't differ from function tests.

Reference "Programs"

Because functions are generally easier to handle than modules, program test references are implemented as functions. The function should simply mimic the desired behavior of the student program and return values that should be found from the student program's output. When calling the reference, the input vector is unpacked just like the argument vector in function tests. This is useful for tests where the number of inputs is fixed as you can just pick up each input to a separate variable. However, there are just as many exercises where the number of inputs changes. In that case you should simply pack the arguments in the function definition:
def ref_func(*inputs):
This simply packs all received arguments into the inputs variable. Beyond this difference, program references are implemented in exactly the same way as function references.
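As a sketch of a reference with a varying number of inputs, assume a hypothetical task where the program keeps reading numbers until a blank line is entered and then prints their sum. The task and the shape of the return value are assumptions made for illustration.

```python
def ref_func(*inputs):
    # Hypothetical task: add up every number entered before the first blank input.
    numbers = []
    for value in inputs:
        if str(value).strip() == "":
            break
        numbers.append(float(value))
    return round(sum(numbers), 2)


print(ref_func(1, 2.5, ""))
print(ref_func("3", "4.25", "0.5", ""))
```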

Validating Programs

Validators for programs also work very similarly to function validators. The biggest difference is that while function validators have two different result objects - student result and parsed values - to examine, program validators only have one: parsed values. To make the signature of validators consistent, program tests call validators by passing None as the result argument. With this any function validator that only examines the values parsed from output works as a program validator without any changes. Indeed, the default validator for program tests is parsed_result_validator.
Message validators also have the same signature as their function test counterparts. This time the arguments argument is always None. Beyond this, just like validators, message validators work just like they do in function tests. The same goes for custom tests and information functions - they receive None for both arguments and the result.

Example Program Test

For this example we're turning the kinetic energy checker into a program test. The student main program is expected to prompt the values for mass and velocity, and then print the initial values and the result using 2 decimal precision within a result string (e.g. "The kinetic energy of an object with mass 1.00 kg moving at 1.00 m/s is 0.50 J").
In this case the test vector and the reference function are exactly the same. The only difference is that the test vector is now used for inputs. As we learned in tests that involve inputs, input vectors can have non-string values as they will be converted before writing to the student program's stdin. As the number of inputs is fixed, the reference works similarly. However since the task requires the student code to print the initial values as well as the result with a certain precision, all of them should be returned from the reference function:
def ref_func(v, m):
    return round(v, 2), round(m, 2), round(0.5 * m * v * v, 2)
Note that we're actually rounding here which doesn't produce 2 decimal precision. However, these are good values for validation because we only want to validate that the results are correct - whether the student program uses the correct precision should be determined by a message validator.
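The difference between rounding and formatting is easy to check in plain Python:

```python
energy = round(0.5 * 1 * 1 * 1, 2)

print(energy)                   # rounding keeps the value but drops trailing zeros
print("{:.2f}".format(energy))  # formatting produces the 2 decimal representation
print(float("0.50") == energy)  # the parsed value still equals the rounded reference
```

The rounded value is what the validator compares against, while the formatted string is what the student program is expected to print.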
A parser is needed to actually obtain the results. As stated earlier, when parsing values we don't want to be strict with formatting. So although the task description requires 2 decimal precision, the parser should just try to find floats with any precision. This can be done with a regular expression and its findall method:
float_pat = re.compile(r"(-?[0-9]+\.[0-9]+)")

def parse_floats(output):
    found = float_pat.findall(output)
    return [float(x) for x in found]
With this setup the default validator for parsed results will pass the student program as long as it calculates correctly. To enforce the 2 decimal precision requirement, a message validator is called for. We can do this by defining another regular expression, this time parsing floats with exactly two decimals, and using that in the validator.
precision_pat = re.compile(r"(-?[0-9]+\.[0-9]{2})[^0-9]")

def validate_precision(output, args, inputs):
    found = precision_pat.findall(output)
    assert len(found) == 3, "fail_precision"
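The division of labor between the two patterns can be checked on sample output. The two output strings below are made up for illustration; the patterns are the ones defined above.

```python
import re

float_pat = re.compile(r"(-?[0-9]+\.[0-9]+)")
precision_pat = re.compile(r"(-?[0-9]+\.[0-9]{2})[^0-9]")

good = "The kinetic energy of an object with mass 1.00 kg moving at 1.00 m/s is 0.50 J"
sloppy = "The kinetic energy of an object with mass 1.0 kg moving at 1.000 m/s is 0.5 J"

# The lenient parser finds the same values in both outputs.
print([float(x) for x in float_pat.findall(good)])
print([float(x) for x in float_pat.findall(sloppy)])

# The strict pattern only accepts numbers with exactly two decimals.
print(len(precision_pat.findall(good)))
print(len(precision_pat.findall(sloppy)))
```

Note that the strict pattern requires some non-digit character after the number, so a number at the very end of the output only matches if it is followed by something, such as the newline that print normally adds.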
Just to make sure we are testing with at least one set of inputs that results in trailing zeros, we'll actually modify the test vector slightly by generating one test case using integers instead of floats.
Finally to avoid confusion when printing the reference (remember, we used rounding which doesn't preserve trailing zeros), we use a presenter that shows the numbers with 2 decimal precision in the output:
def decimal_presenter(value):
    return " ".join("{:.2f}".format(v) for v in value)

presenters = {
    "ref": decimal_presenter
}
All that's left is to call test_program with all of this.
core.test_program(st_module, gen_vector, ref_func, lang,
    custom_msgs=msgs,
    presenter=presenters,
    output_parser=parse_floats,
    message_validator=validate_precision
)
Full example:
kinetic_program_test.py
import random
import re
import test_core as core

float_pat = re.compile(r"(-?[0-9]+\.[0-9]+)")
precision_pat = re.compile(r"(-?[0-9]+\.[0-9]{2})[^0-9]")

st_function = {
    "fi": "laske_kineettinen_energia",
    "en": "calculate_kinetic_energy"
}

msgs = core.TranslationDict()

def gen_vector():
    v = []
    v.append((random.randint(1, 100), random.randint(1, 100)))
    for i in range(9):
        v.append((round(random.random() * 100, 2), round(random.random() * 100, 2)))
    
    random.shuffle(v)
    return v

def ref_func(v, m):
    return round(v, 2), round(m, 2), round(0.5 * m * v * v, 2)

def parse_floats(output):
    found = float_pat.findall(output)
    return [float(x) for x in found]

def validate_precision(output, args, inputs):
    found = precision_pat.findall(output)
    assert len(found) == 3, "fail_precision"

def decimal_presenter(value):
    return " ".join("{:.2f}".format(v) for v in value)

presenters = {
    "ref": decimal_presenter
}


if __name__ == "__main__":
    files, lang = core.parse_command()
    st_mname = files[0]
    st_module = core.load_module(st_mname, lang, inputs=[0, 0])
    if st_module:
        core.test_program(st_module, gen_vector, ref_func, lang,
            custom_msgs=msgs,
            presenter=presenters,
            output_parser=parse_floats,
            message_validator=validate_precision
        )

Static Tests

Static testing means evaluating the student code without executing it, i.e. looking at the source code. Generally speaking, static testing should not be used as the primary means of validation. It's mostly meant for two purposes: enforcing restrictions given in the assignment and pointing out questionable solutions. An example of the former could be a task where you want students to use loops but they could theoretically use N if/elif statements to do the job. In this case a static test can examine the student code, count how many if/elif statements there are and reject the program if there are too many.
In general static testing should be used with care, especially when used as an evaluation criterion instead of just providing hints. This is particularly true now since pylint testing has been included in PySenpai as it should catch poor coding habits much more efficiently.
Static tests can examine individual functions or the entire module. If given a dictionary of function names similarly to test_function, static_test will only examine that function. If given None as the same argument, it will instead look at the whole module.

Static Validators

Like other validators, static validators are also functions that make assert statements. The arguments given to static validators are different however. PySenpai uses the inspect module's getsource, getdoc and getcomments functions, which give the source code, docstring and preceding comments separately. These three in this order are the arguments given to static validators. Each is given as a string - to examine the source code line by line you need to split it. Another difference is that if there are no assert-specific messages, the validator name is used to find the message. Validators to static tests are passed as lists.
Whether examining source code line by line or as one string, in most cases static validators should be implemented using the same considerations as textfield exercise answers. In other words, regular expressions or other means of increasing fuzziness in the validation are highly recommended. Also keep in mind that when looking at functions, all lines will be indented and therefore stripping whitespace might be called for. Ignoring inline comments is also something that may need to be done within the validator.

Example Static Tests

These are short, so here's a couple. The first one counts if/elif statements:
custom_msgs.set_msg("if_counter", "fi", dict(
    content="Koodissa on liikaa ehtolauseita.",
    hints=["Toteuta muunnos käyttämällä apuna tehtävänannossa annettuja listoja."]
))
custom_msgs.set_msg("if_counter", "en", dict(
    content="Your code contains too many if/elif statements.",
    hints=["You must implement the conversion using the given lists."]
))

def if_counter(source, doc, comments):
    assert source.count("if ") + source.count("elif ") <= 10
In this example we have set the message using the validator name. Running this test with static_test:
core.static_test(st_module, st_function, lang, validators=[if_counter], custom_msgs=custom_msgs)
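Plain substring counting is simple, but "if " also occurs inside "elif ", in comments and at the end of some identifiers. A stricter variant, following the advice about stripping whitespace and ignoring comments, could look like the sketch below. The helper name, the sample source and the limit of 2 are hypothetical, and the comment stripping is naive (it would also cut a line at a # inside a string literal).

```python
import re

# A line counts only if it starts with the keyword if or elif.
if_pat = re.compile(r"^(if|elif)\b")


def count_conditionals(source):
    count = 0
    for line in source.splitlines():
        code = line.split("#", 1)[0].strip()  # drop comments and indentation
        if if_pat.match(code):
            count += 1
    return count


def if_counter(source, doc, comments):
    assert count_conditionals(source) <= 2


sample = '''def convert(value):
    # if this comment mentions if, it is ignored
    if value == 1:
        return "one"
    elif value == 2:  # elif in a comment
        return "two"
    notif = value
    return str(notif)
'''

print(count_conditionals(sample))
```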
Another example with two validators: one for finding string literals within a function that should only use strings from its parameters and another for finding a dumb way of doing float checking for a string.
stupid_pat = re.compile(r"\s+int\([A-Za-z0-9_]+\)\s*")

custom_msgs = core.TranslationDict()
custom_msgs.set_msg("string_literal_check", "fi", dict(
    content="Funktion koodissa ei tule olla yhtään merkkijonoliteraalia!",
    hints=["Poista [!term=Literaaliarvo!]merkkijonoliteraalit[!term!] koodista ja käytä niiden tilalla funktion parametreja."],
))
custom_msgs.set_msg("string_literal_check", "en", dict(
    content="The function should not contain any string literals",
    hints=["Remove string literals from your code and replace them with the function parameters."],
))    
custom_msgs.set_msg("useless_line_check", "fi", dict(
    content="Koodista löytyi rivi, jossa muutetaan merkkijono kokonaisluvuksi, mutta ei talleteta sitä muuttujaan tai palauteta.",
    hints=["Muuta try:n sisällä olevaa riviä siten, että tallennat kokonaisluvun muuttujaan."]
))
custom_msgs.set_msg("useless_line_check", "en", dict(
    content="There's a line in the code that converts a string to an integer but doesn't actually do anything with the integer value.",
    hints=["Change the code line inside the try statement so that the result is stored in a variable."]
))

def string_literal_check(source, doc, comments):
    assert source.count("\"") + source.count("'") == 0
    
def useless_line_check(source, doc, comments):
    assert not stupid_pat.search(source)
Again the validators themselves are pretty simple and most of the code is just messages. This test is run for information purposes only; it doesn't reject the student submission:
core.static_test(st_module, st_function, lang, [string_literal_check, useless_line_check],
    info_only=True,
    custom_msgs=custom_msgs
)

Pylint tests

As the name suggests, Pylint tests use pylint to evaluate the student submission against the Python style guide. Having a linter as part of the evaluation is useful because students are not very likely to run linters themselves even though they would probably benefit from it. Like static tests, Pylint tests can be used either as just helpful information, or they can be used to enforce some level of code quality.

Configuration

Since the majority of the testing is done by an external tool, configuration is mostly provided by the pylintrc configuration file. The Lovelace server has its own version of this file, which uses the default configuration with a few extra disables. There are two ways to alter the configuration. You can pass modifications to pylint_test as a keyword argument (extra_options). The argument takes a list of strings that are valid Pylint command line options. Another way to alter the configuration is to include your own pylintrc file when uploading your exercise files to Lovelace.
The Lovelace pylintrc file can be downloaded below
pylintrc

Validators

If you want to use Pylint to enforce some level of coding standard, you can implement a validator that looks at the linter stats. The stats object is a dictionary with a lot of keys. For validation, the keys used in the example below are global_note (the overall score out of 10), and errors and warnings (the number of messages at each of those levels).
For instance, you can have a validator that rejects code that has a score below a certain threshold, or has any notifications of a given level or worse. The default validator simply checks whether the score is above 5.0.

Example Pylint Test

There isn't a whole lot to show here, but let's just write a custom validator that rejects low scores, and programs with warnings or errors. Here's the validator and associated messages:
msgs = core.TranslationDict()
msgs.set_msg("fail_low_score", "en", "Your code received too low quality score ({global_note:.1f} / 10.0).\nTarget: 5+ / 10.0.")
msgs.set_msg("fail_low_score", "fi", "Koodisi sai liian pienet laatupisteet ({global_note:.1f} / 10.0).\nTavoite: 5+ / 10.0.")
msgs.set_msg("fail_errors", "en", "Your code evaluation included errors.")
msgs.set_msg("fail_errors", "fi", "Koodisi arviointi sisälsi virheitä.")
msgs.set_msg("fail_warnings", "en", "Your code evaluation included warnings.")
msgs.set_msg("fail_warnings", "fi", "Koodisi arviointi sisälsi varoituksia.")

def quality_validator(stats):
    assert stats["errors"] == 0, "fail_errors"
    assert stats["warnings"] == 0, "fail_warnings"
    assert stats["global_note"] >= 5, "fail_low_score"
Let's also assume there's one particular warning that we don't want to reject the student submission for. We can add this warning to the ignore list by passing it to pylint_test in the extra_options keyword argument. The call becomes:
core.pylint_test(st_module, lang,
    custom_msgs=msgs,
    validator=quality_validator, 
    extra_options=["--disable=redefined-outer-name"]
)

Advanced Tips

This section covers miscellaneous tricks that we have used to implement fringe case checkers in the past. Some are small hacks while others required a large amount of work. Most of these examples may not be directly useful, but they should give you some ideas on how to proceed when there is no clear path to implementing a checker.

Faking Libraries

In modern programming there are often cases where the student code creates graphics in the form of a graphical user interface, an animation, a game interface etc. These are not easy to check because the graphical libraries don't really work server-side where the checking happens. We had this issue when we wanted to have small introductory exercises using Python's turtle library. The visual feedback of graphics appearing on the screen is more powerful feedback for beginners, but turtle doesn't provide anything that could be evaluated.
In this case, since turtle is a relatively small library, we just implemented a fake version of it with the exact same function definitions. Instead of drawing anything, it creates a log of actions that can be evaluated by the checker. The fake turtle writes the log into a JSON file when its done function is called. Abusing the fact that all imported modules refer to the same module in memory, we can call the module's done function after the student function is executed. This is done with a result object extractor. In this case it finds the corners of a square.
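import json
import turtle  # the fake turtle module described above

def corners_extractor(args, res, st_out):
    # Make the fake turtle write its action log to picturelog.json
    turtle.done()
    try:
        with open("picturelog.json") as logfile:
            corners = []
            log = json.load(logfile)
            # Keep only line operations; items 1-4 hold x0, y0, x1, y1
            lines = [op for op in log if op[0] == "line"]
            for line in lines:
                x0, y0, x1, y1 = line[1:5]
                corners.append((round(x0), round(y0)))
                corners.append((round(x1), round(y1)))
            return set(corners)
    except Exception:
        # Missing or malformed log: there is no result object
        return None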
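To give an idea of what the other side of this might look like, here is a minimal sketch of a fake turtle that only logs its movements. It is not the actual fake module used in Lovelace: the internal state, the optional path parameter of done, and the exact log entry layout (inferred from the extractor above as "line" followed by x0, y0, x1, y1) are assumptions made for illustration.

```python
import json
import math

# Hypothetical stand-in for the turtle module: same function names,
# but every movement is recorded instead of drawn.
_log = []
_state = {"x": 0.0, "y": 0.0, "heading": 0.0, "pen_down": True}

def forward(distance):
    """Move along the current heading and log the line if the pen is down."""
    x0, y0 = _state["x"], _state["y"]
    angle = math.radians(_state["heading"])
    x1 = x0 + distance * math.cos(angle)
    y1 = y0 + distance * math.sin(angle)
    if _state["pen_down"]:
        _log.append(["line", x0, y0, x1, y1])
    _state["x"], _state["y"] = x1, y1

def left(degrees):
    """Turn counterclockwise."""
    _state["heading"] = (_state["heading"] + degrees) % 360

def right(degrees):
    """Turn clockwise."""
    left(-degrees)

def done(path="picturelog.json"):
    """Write the recorded actions to a JSON file for the checker to read."""
    with open(path, "w") as logfile:
        json.dump(_log, logfile)
```

A student program that draws a square with four forward and left calls would leave four "line" entries in the log, which is what the extractor needs in order to collect the corners.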
However, when it comes to more complex libraries, faking the entire thing becomes too unwieldy. In Elementary Programming we "solved" this issue by creating a wrapper that served two purposes: make the library easier for students to use (by exposing only the subset that students actually needed) and also make us able to write a fake version of the wrapper that mimics the programmatic behavior of the UI elements but doesn't actually create a UI. In this case the exercise assignment should point out the limits of the checker so that students don't try to go around the wrapper and import modules that haven't been installed on the server.

Testing Command Line Arguments

Normally PySenpai cannot test functions or programs that parse command line arguments because it's not running the student program from the command line, and sys.argv contains the arguments used in running the checker. However, once you have imported sys into the checker, you can overwrite sys.argv with fake command line arguments (just remember to put the program name as the first item) and the student program will not know the difference. If you do this in the new_test callback of a test, you can change the arguments at the start of each run.
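A minimal sketch of the idea follows. The student function, the fake_argv helper and the callback name are hypothetical, and the (args, inputs) signature of the callback is an assumption about what new_test receives, so check it against the PySenpai version you are using.

```python
import sys

def student_main():
    """Hypothetical student function that reads numbers from the command line."""
    return sum(int(arg) for arg in sys.argv[1:])

def fake_argv(*arguments, program="student_program.py"):
    """Replace sys.argv with a program name followed by fake arguments."""
    sys.argv = [program, *[str(arg) for arg in arguments]]

def set_arguments(args, inputs):
    """new_test-style callback sketch (assumed signature).

    Installs fresh command line arguments before each run.
    """
    fake_argv(*args)
```

Since sys is the same module object for both the checker and the student code, the student program sees the replaced list when it reads sys.argv.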

Replacing Functions Within Modules

Since modules are loaded into memory the first time they are imported, it is also possible to replace existing functions with ones that are more suitable for the checker. This has been used for two different purposes so far. One scenario was an exercise where students needed to write a program with two functions. One of the functions read arguments from a text file and passed them to the other function (which drew things on the screen based on those arguments). In order to test the reader function, the checker replaced the drawing function with a mock version that only logged the arguments it received into a global list. This list was turned into the function result by a result object extractor.
call_list = []

def call_logger(*args):
    call_list.append(args)
    
def call_extractor(args, result, output):
    return call_list.pop()
With these in place, the call arguments given by the reader function have effectively become its result. Messages need some adjusting to avoid confusion. Putting the mock function into the student module can be done with setattr after the student module has been loaded:
setattr(st_module, st_draw_function[lang], call_logger)
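The whole trick can be tried out without PySenpai. In the sketch below the student module is simulated with types.ModuleType, and the function names draw and read_and_draw are hypothetical stand-ins for the student's drawing and reader functions.

```python
import types

call_list = []

def call_logger(*args):
    call_list.append(args)

def call_extractor(args, result, output):
    return call_list.pop()

# Stand-in for a loaded student module (hypothetical function names).
st_module = types.ModuleType("student")
exec(
    "def draw(x, y):\n"
    "    raise RuntimeError('no display on the server')\n"
    "def read_and_draw(values):\n"
    "    draw(*values)\n",
    st_module.__dict__,
)

# Replace the drawing function with the logger, then run the reader.
setattr(st_module, "draw", call_logger)
st_module.read_and_draw([3, 4])
```

This works because read_and_draw looks up draw in the module's namespace at call time, so it finds the logger instead of the original function.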
Another thing that has been done with module tampering involved the os module. We didn't want students to use os.chdir when reading files because it's a (really) bad practice. Instead of making a static test to see if it is used, we did this dynamically by replacing the chdir function in the os module with this:
class ChdirCalled(Exception):
    pass
    
def fake_chdir(*args):
    raise ChdirCalled
PySenpai catches the exception when trying to execute the student function, and uses the exception name to look up a message, so we can notify the student about this by setting a custom message with "ChdirCalled" as the handle.
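The replacement itself is a plain assignment. The sketch below shows it outside PySenpai, with a hypothetical student function and a manual try statement standing in for the exception handling that PySenpai does internally.

```python
import os

class ChdirCalled(Exception):
    pass

def fake_chdir(*args):
    raise ChdirCalled

# Keep a reference to the real function so it can be restored afterwards.
real_chdir = os.chdir
os.chdir = fake_chdir

def student_function():
    """Hypothetical student code that changes directory before reading a file."""
    os.chdir("data")

try:
    student_function()
except ChdirCalled as exc:
    # The exception class name doubles as the message handle.
    caught = type(exc).__name__
else:
    caught = None
finally:
    os.chdir = real_chdir
```

Restoring the real function is not needed if the checker process exits right after the test, but it keeps the sketch free of side effects.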
The checking daemon is a separate multi-threaded program that is invoked whenever Lovelace needs to execute code on the command line. The most common use case is to evaluate student programs by running checking programs. When a task is sent to the checking daemon, copies of all required files are put into a temporary directory where the test will then run. The daemon also performs the necessary security operations to prevent malicious code from doing any actual harm.
Content graphs are objects that connect content pages to a course instance's table of contents. Content graphs have several context attributes which define how the content is linked to this particular course instance. A content graph's ordinal number and parent node affect how it is displayed in the table of contents. You can also set a deadline which will be applied to all exercises contained within the linked content page. Content graphs also define which revision of the content to show - this is used when courses are archived.
In Lovelace, content page refers to learning objects that have text content written using a markup language. All types of content pages are treated similarly inside the system and they are interchangeable. Content pages include lecture pages, and all exercise types.
  1. Description
  2. Relations
In Lovelace, course refers to an abstract root course, not any specific instance of instruction. Courses are used for tying together actual instances of instruction (called course instances in Lovelace). In that sense they are like courses in the study guide, while course instances are like courses in WebOodi. The most important attributes of a course are its responsible teacher and its staff group - these define which users have access to edit content that is linked to the course.
  1. Description
  2. Relations
  3. Cloning and Archiving
In Lovelace, a course instance refers to an actual instance of instruction of a course. It's comparable to a course in WebOodi. Students can enroll in a course instance. Almost everything is managed by instance - student enrollments, learning objects, student answers, feedback etc. This way teachers can easily treat each instance of instruction separately. Course instances can also be archived through a process called freezing.
Course prefixes are recommended because content page and media names in Lovelace are unique across all courses. You should decide on a prefix for each course and use that for all learning objects that are not included in the course table of contents. The prefix will also make it easier to manage learning objects of multiple courses - especially for your friendly superuser who sees everything in the admin interface...
  1. Description
  2. Examples
Embedded content refers to learning objects that have been embedded to other learning objects through links written in the content of the parent object. Embedded content can be other content pages or media. When saving a content page, all embedded objects that are linked must exist. A link to embedded content is a reference that ties together course instance, embedded content and the parent content.
Enrollment is the method which connects students to course instances. All students taking a course should enroll to it. Enrollment is used for course scoring and (once implemented) access to course content. Enrollments are either automatically accepted, or need to be accepted through the enrollment management interface.
Lovelace has a built-in feedback system. You can attach any number of feedback questions to any content page, allowing you to get either targeted feedback about single exercises, or more general feedback about entire lecture pages. Unlike almost everything else, feedback questions are currently not owned by any particular course. However, feedback answers are always tied to the page the feedback is for, and also to the course instance where the feedback was given.
  1. Description
  2. Archiving
  3. Embedding
In Lovelace, file normally refers to a media file, managed under Files in the admin site. A file has a handle, actual file contents (in both languages) and a download name. The file handle is how the file is referenced throughout the system. If a media file is modified by uploading a new version of the file, all references will by default fetch the latest version. The download name is the name that is displayed as the file header when it's embedded, and also as the default name in the download dialog. Files are linked to content through reference objects - one reference per course instance.
Media files are currently stored in the public media folder along with images - they can be addressed directly via URL.
  1. Description
  2. Legacy Checkers
File upload exercises are at the heart of Lovelace. They are exercises where students return one or more code files that are then evaluated by a checking program. File upload exercises can be evaluated with anything that can be run from the Linux command line, but usually a bit more sophisticated tools should be used (e.g. PySenpai). File upload exercises have a JSON format for evaluations returned by checking programs. This evaluation can include messages, hints and highlight triggers - these will ideally help the student figure out problems with their code.
The front page of a course instance is shown at the instance's index page, below the course table of contents. The front page is linked to a course instance just like any other page, but it uses the special ordinal number of 0, which excludes it from the table of contents. Any page can act as the course front page.
Hints are messages that are displayed to students in various cases of answering incorrectly. Hints can be given upon making incorrect choices in choice-type exercises, and they can also be given after a certain number of attempts. In textfield exercises you can define any number of catches for incorrect answers, and attach hints to each. Hints are shown in a hint box in the exercise layout - this box will become visible if there is at least one hint to show.
  1. Description
  2. Archiving
  3. Embedding
Images in Lovelace are managed as media objects similar to files. They have a handle that is used for referencing, and the file itself separately. Images should be always included by using reference. This way if the image is updated, all references to it always show the latest version.
Images stored on disc are accessible directly through URL.
Lecture pages are content pages that do not have any exercise capabilities attached to them. A course instance's table of contents usually consists entirely of lecture pages. Other types of content pages (i.e. exercises) are usually embedded within lecture pages.
Legacy checker is a name for checkers that were used in previous versions of Lovelace and its predecessor Raippa. They test the student submission against a reference, comparing their outputs. If the outputs match (exactly), the submission passes. Otherwise differences in output are highlighted. It is possible to use wrapper programs to alter the outputs, or output different things (e.g. testing return values of individual functions). Legacy checkers should generally be avoided because they are very limiting and often frustrating for students. Legacy checking is still occasionally useful for comparing compiler outputs etc.
Lovelace uses its own wiki style markup for writing content. Beyond basic formatting features, the markup is also used to embed content pages and media, mark highlightable sections in text and create hover-activated term definition popups.
In Lovelace, media refers to embeddable files etc. These come in three categories: images, files and video links. Like content pages, media objects are managed by reference using handles. Unlike other types of files, media files are publicly accessible to anyone who can guess the URL.
PySenpai is a library/framework for creating file upload exercise checking programs. It uses a callback-based architecture to create a consistent and highly customizable testing process. On the one hand it provides reasonable defaults for basic checking programs making them relatively straightforward to implement. On the other hand it also supports much more complex checking programs. Currently PySenpai supports Python, C, Y86 Assembly and Matlab.
Regular expressions are a necessary evil in creating textfield and repeated template exercises. Lovelace uses Python regular expressions in single line mode.
A generator acts as a backend for repeated template exercises, and provides the random values and their corresponding answers to the frontend. Generators can be written in any programming language that can be executed on the Lovelace server. Generators need to return a JSON document by printing it to stdout.
Responsible teacher is the primary teacher in charge of a course. Certain actions are available only to responsible teachers. These actions include managing enrollments and course instances.
Lovelace uses Django Reversion to keep track of version history for all learning objects. This can be sometimes useful if you need to restore a previous version after mucking something up. However the primary purpose is to have access to historical copies of learning objects for archiving purposes. When a course instance is archived, it uses the revision attribute of all its references to set which historical version should be fetched when the learning object is shown. Student answers also include the revision number of the exercise that was active at the time of saving the answer.
Slug is the lingo word for names used in URLs. Slugs are automatically generated for courses, course instances and content pages. Slugs are all-lowercase with all non-alphanumeric characters replaced with dashes. A similar naming scheme is recommended for other types of learning objects as well, although they do not use generated slugs.
Staff members are basically your TAs. Staff members can see pages hidden from normal users and they can edit and create content (within the confines of the courses they have been assigned to). They can also view answer statistics and evaluate student answers in manually evaluated exercises. Staff members are assigned to courses via staff group.
Lovelace has answer statistics for all exercises. Statistics are collected per instance, and allow you to review how many times an exercise has been answered, what's the success rate etc. All of this can be helpful in identifying where students either have difficulties, or the exercise itself is badly designed. For some types of exercises, there's also more detailed information about answers that have been given. Statistics can be accessed from the left hand toolbox for each exercise.
The teacher toolbox is located on the left hand side of each exercise. It has options to view statistics, view feedback about the exercise and edit the exercise. For file upload exercises there is also an option to download all answers as a zip file. Do note that this takes some time.
  1. Description
  2. Examples
Terms are keywords that are linked to descriptions within your course. They will be collected into the course term bank, and the keyword can also be used to make term hint popups on any content page. Terms can include multiple tabs and links to pages that are relevant to the term. For instance, this term has a tab for examples, and a link to the page about terms.
Textfield exercises are exercises where the student gives their answer by writing into a text box. This answer is evaluated against predefined answers that can be either correct (accepting the exercise) or incorrect (giving a related hint). Almost always these answers are defined as regular expressions - exact matching is simply far too strict.
  1. Description
  2. Markup
  3. Triggering
Triggerable highlights can be used in content pages to mark passages that can be highlighted by triggers from file upload exercise evaluation responses. When a highlight is triggered the passage will be highlighted. This feature is useful for drawing student attention to things they may have missed. Exercises can trigger highlights in their own description, or in their parent page. It is usually a good idea to use exercise specific prefixes for highlight trigger names.