Implementing Python Checkers with PySenpai¶

As Lovelace is installed on a Linux server, it is recommended that you develop and test your checkers in a Linux environment. For Python checkers developing on Windows should be safe in almost all scenarios too though. You do not need a running instance of Lovelace for testing - checkers are stand-alone programs.

Checker Basics¶

In this section we'll go through the basics of implementing a Python checker, and some of the more common and simpler customizations. The file shown below is the skeleton of a minimal checker that tests a function from the student submission.

pychecker_skel.py

import random
import test_core as core

st_function = {
    "fi": "",
    "en": ""
}

msgs = core.TranslationDict()

def gen_vector():
    v = []
    return v

def ref_func():
    return None

if __name__ == "__main__":
    files, lang = core.parse_command()
    st_mname = files[0]
    st_module = core.load_module(st_mname, lang)
    if st_module:
        core.test_function(st_module, st_function, gen_vector, ref_func, lang)
        core.pylint_test(st_module, lang, info_only=True)

Starting from the top, the st_function dictionary contains the name of the function to be tested, in each of the available language. The keys are standard language codes. Even if only one language is supported, the function name still must be given as a dictionary.

The skeleton also creates another dictionary for messages. This is a subclass dictionary that has methods for accessing the same key with multiple languages. The methods are:

set_msg(key, lang, message_object)
get_msg(key, lang, default)

This is the type of dictionary that is expected as the argument for custom_msgs in the various testing functions. Very basic checkers may not need to set any messages, in which case this can be ignored.

Next up are two mandatory functions. The first generates the test vector which in turn determines the number of test cases. The second is the reference function. For simple checkers this is a copy of the function from your reference solution. Details and tips for implementing both are given in later sections. There are a lot of other functions that can be defined here, but these are the mandatory ones.

The section under if __name__ == "__main__": contains the code that actually executes the test preparations and tests themselves. The parsing of command line arguments is done by PySenpai's parse_command function which returns the names of files ($RETURNABLES in the command written in Lovelace file exercise admin interface) and the language argument. If only one file is returned, the first one is the one to check. If the exercise requires multiple files, then you will need some way of figuring out which is which (requiring specific names is the most common).

The load_module function turns the student submission from a file name to an imported Python module. It returns None if importing the module fails, in which case there is no point in trying to continue testing. The rest of the code is calling various test functions.

In the skeleton, two functions are called, both with minimal arguments. These are the most common functions for Python checkers to call. Note that test_function in particular has a lot of optional parameters for various purposes - we'll get to each of them eventually in this guide chapter. For most checkers using pylint_test is recommended because using it doesn't typically require any extra effort from you, and setting info_only=True only shows the student the linter messages (if set to False too low scores will result in rejection).

If we fill in the function name dictionary and the two functions, we have ourselves a basic checker. Below is an example that checks a function that calculates kinetic energy.

kinetic_test_basic.py

import random
import test_core as core

st_function = {
    "fi": "laske_kineettinen_energia",
    "en": "calculate_kinetic_energy"
}

msgs = core.TranslationDict()

def gen_vector():
    v = []
    for i in range(10):
        v.append((round(random.random() * 100, 2), round(random.random() * 100, 2)))
    
    return v

def ref_func(v, m):
    return 0.5 * m * v * v

if __name__ == "__main__":
    files, lang = core.parse_command()
    st_mname = files[0]
    st_module = core.load_module(st_mname, lang)
    if st_module:
        core.test_function(st_module, st_function, gen_vector, ref_func, lang)
        core.pylint_test(st_module, lang, info_only=True)

About Test Vectors¶

A test vector should contain one list of arguments/inputs for each test case. The number of test cases is directly derived from the length of the test vector. Note that the test vector must always contain lists (or tuples) even if the function being tested only takes a single argument - the function is always called by unpacking the argument list. This is the line that calls the student function:

res = st_func(*args)

So the entire test vector has to always be a list of sequences.

A good test vector has multiple cases, some of which are entirely randomly generated and some of which cover the edge cases that need to be tested specifically. Randomness makes it impossible for students to make code that tries to pass the test instead of actually implementing what was required, while covering edge cases makes sure that partially functioning solutions don’t get accepted accidentally if they get a favorable random test vector.

There are no strict guidelines to how many test cases should be in a checker. For simple checkers, we've been using 10 cases. If the exercise is complex enough that running some or all cases takes a noticeable amount of time, it's best to keep the checking time to a minimum. The default timeout for checker runs is 5 seconds and while this can be changed, it should only be done if necessary.

Changing the Validator¶

The default validator of PySenpai, called result_validator, is adequate for most checkers where no fuzziness is required in validation (it does strip whitespaces from both ends of strings). However, there are times when changing the validator is needed (or recommended). Our example here has floating point numbers as results. This runs the risk of rounding errors if the student implementation is different. For this reason, it would be safer to use rounding_float_result_validator as our validator. To do so, we'll change the test_function call to:

core.test_function(
    st_module, st_function, gen_vector, ref_func, lang,
    validator=core.rounding_float_result_validator
)

There are a couple of other validators available:

parsed_result_validator: validates values parsed from student function's output instead of its return value
parsed_list_validator: validates a list of values parsed from the output against a list returned by the reference function

If you need more elaborate validation, it's time to implement a custom validator. Details are provided in a later section.

Using Input¶

In this section we'll go through how to test functions that read inputs from the user. Extending the previous example, let's assume we want to add another function to the exercise. This function will prompt the user for a number until they give a proper positive float value. The prompt text is given as an argument to the function.

On the checker side this means we need to add another set of basics function test components. This time two vectors are needed: one for arguments to the function and another for inputs. The argument vector should have a few different prompt messages. Even though these are not used in the test in any way, they're still shown in the output so we should make a few sensible options. They should be put in a dictionary where keys are the available languages (shown in the example at the end). After we have the prompts, we can generate the argument vector:

def gen_prompt_vector():
    v = []
    for in range(10):
        v.append((random.choice(prompts[lang]), ))
    return v

The input vector contains the values that will be fed to the student's input function calls through a faked stdin. The stdin for each test run will be formed from the input vector by joining it with newlines. This line in the test_function code does it:

sys.stdin = io.StringIO("\n".join([str(x) for x in inps]))

Since str will be called for each value in the input vector, the vector can contain any types of values. This makes implementing the test somewhat easier. To test, we want to generate vectors where the last item is a proper input, and it is preceded by up to several garbage inputs that are either text or negative numbers. The following example does randomization rather thoroughly:

def gen_input_vector():
    v = []
    v.append((random.choice(string.ascii_letters), round(random.random() * 100, 3)))
    v.append((round(random.random() * -100, 3), round(random.random() * 100, 3)))
    for i in range(8):
        case = []
        for j in range(random.randint(0, 2)):
            if random.randint(0, 1):
                case.append(random.choice(string.ascii_letters))
            else:
                case.append(round(random.random() * -100, random.randint(1, 4)))
            
        case.append(round(random.random() * 100, random.randint(1, 4)))
        v.append(case)
    
    random.shuffle(v)
    return v

What about the reference? By default it would only be given the argument. However this is not very useful because the information we actually want is in the input vector. In order to get the input vector passed to the reference, we need to set the reference_needs_input keyword argument to True when calling test_function. This change needs to be reflected in the reference function - it now gets two arguments: the list of arguments and the list of inputs. The reference in this case is actually very simple:

def ref_prompt(args, inputs):
    return inputs[-1]

This is because we know from implementing the test vector that the last item is always the proper input while everything before it is not. Therefore, when the student function is working correctly it should return the same value. All that's left is to call test_function with two new keyword arguments - one for providing the inputs, and another for telling the function that our reference expects to see both vectors:

core.test_function(
    st_module, st_prompt_function, gen_prompt_vector, ref_prompt, lang,
    inputs=gen_input_vector,
    ref_needs_inputs=True
)

The updated example is below:

kinetic_test_input.py

import random
import string
import test_core as core

st_function = {
    "fi": "laske_kineettinen_energia",
    "en": "calculate_kinetic_energy"
}

st_prompt_function = {
    "fi": "pyyda_liukuluku",
    "en": "prompt_float"
}

prompts = {
    "fi": ["Anna nopeus: ", "Syötä massa: ", "Anna positiivinen luku: "],
    "en": ["Input speed: ", "Define mass: ", "Give a positive number: "]
}

msgs = core.TranslationDict()

def gen_vector():
    v = []
    for i in range(10):
        v.append((round(random.random() * 100, 2), round(random.random() * 100, 2)))
    return v

def gen_prompt_vector():
    v = []
    for i in range(10):
        v.append((random.choice(prompts[lang]), ))
    return v

def gen_input_vector():
    v = []
    v.append((random.choice(string.ascii_letters), round(random.random() * 100, 3)))
    v.append((round(random.random() * -100, 3), round(random.random() * 100, 3)))
    for i in range(8):
        case = []
        for j in range(random.randint(0, 2)):
            if random.randint(0, 1):
                case.append(random.choice(string.ascii_letters))
            else:
                case.append(round(random.random() * -100, random.randint(1, 4)))
            
        case.append(round(random.random() * 100, random.randint(1, 4)))
        v.append(case)
    
    random.shuffle(v)
    return v

def ref_prompt(args, inputs):
    return inputs[-1]

def ref_func(v, m):
    return 0.5 * m * v * v

if __name__ == "__main__":
    files, lang = core.parse_command()
    st_mname = files[0]
    st_module = core.load_module(st_mname, lang)
    if st_module:
        core.test_function(
            st_module, st_prompt_function, gen_prompt_vector, ref_prompt, lang,
            inputs=gen_input_vector,
            ref_needs_inputs=True
        )
        core.test_function(st_module, st_function, gen_vector, ref_func, lang)
        core.pylint_test(st_module, lang, info_only=True)

Using Output¶

Another way to obtain results from a student function is to parse values from its output. Typically this involves implementing a parser function that obtains the relevant values, and changing the validator to a one that uses these values instead of the function's return values. To show how to do this, we're going to make a checker for a function that prints all even or odd numbers from a list.

The parser is a function that receives the raw output produced by the student function and returns values that will be validated against the reference. In this example, it should be a function that finds all integers from the output and returns them as a list. Most of the time this is done using the findall method of regular expression objects.

int_pat = re.compile("-?[0-9]+")

def parse_ints(output):
    return [int(v) for v in int_pat.findall(output)]

Note that it is always better to be more lenient in parsing than in validating. For example, if your exercise demands floating point numbers to be printed with exactly 2 decimal precision, your parser should still return all floats from the output. Incorrect precision should be caught by the validator instead. Also it's best to make sure your parser doesn't miss values due to unexpected formatting - at least within reasonable limits. It's confusing to the students if their code prints values but the checker claims it didn't find them. In cases where you absolutely want to complain about the output being unparseable, you can raise OutputParseError from the parser function. This will abort the current test case and mark it incorrect.

Implementing a reference for output validation is no different than implementing one for a function that returns the values instead. You can see the reference in the example at the end. A few adjustments are needed when validating output instead of return value. First, the validator needs to be changed to parsed_result_validator (or some other that validates outputs). Second, some messages need to be adjusted. By default the messages talk about return values. Here's a set of replacements for tests that validate outputs instead. In the future we may add an option to use these by setting a flag when calling test_function.

custom_msgs = core.TranslationDict()
custom_msgs.set_msg("CorrectResult", "fi", "Funktio tulosti oikeat arvot.")
custom_msgs.set_msg("CorrectResult", "en", "Your function printed the correct values.")
custom_msgs.set_msg("PrintStudentResult", "fi", "Funktion tulosteesta parsittiin arvot: {parsed}")
custom_msgs.set_msg("PrintStudentResult", "en", "Values parsed from function output: {parsed}")
custom_msgs.set_msg("PrintReference", "fi", "Olisi pitänyt saada: {ref}")
custom_msgs.set_msg("PrintReference", "en", "Should have been: {ref}")

With these in mind, we can call test_function. Because the return value of the function is always None, setting test_recurrence to False is called for. Doing so avoids complaining to the student that their function always returns the same value.

core.test_function(
    st_module, st_function, gen_vector, ref_func, lang, 
    custom_msgs=custom_msgs,
    output_parser=parse_ints,
    validator=parsed_result_validator,
    test_recurrence=False
)

Full example:

even_odd_test.py

import random
import re
import test_core as core

int_pat = re.compile("-?[0-9]+")

st_function = {
    "fi": "tulosta_parilliset_parittomat",
    "en": "print_even_odd"
}

custom_msgs = core.TranslationDict()
custom_msgs.set_msg("CorrectResult", "fi", "Funktio tulosti oikeat arvot.")
custom_msgs.set_msg("CorrectResult", "en", "Your function printed the correct values.")
custom_msgs.set_msg("PrintStudentResult", "fi", "Funktion tulosteesta parsittiin arvot: {parsed}")
custom_msgs.set_msg("PrintStudentResult", "en", "Values parsed from function output: {parsed}")
custom_msgs.set_msg("PrintReference", "fi", "Olisi pitänyt saada: {ref}")
custom_msgs.set_msg("PrintReference", "en", "Should have been: {ref}")

def gen_vector():
    v = []
    v.append(([2, 3, 5, 8, 10], True))
    v.append(([4, 3, 7, 9 ,11], False))
    
    for i in range(8):
        numbers = [random.randint(-100, 100) for j in range(random.randint(5, 10))]
        v.append((numbers, bool(random.randint(0, 1))))
        
    random.shuffle(v)
        
    return v

def ref_func(numbers, even):
    if even:
        return [n for n in numbers if n % 2 == 0]
    else:
        return [n for n in numbers if n % 2 != 0]

def parse_ints(output):
    return [int(v) for v in int_pat.findall(output)]

if __name__ == "__main__":
    files, lang = core.parse_command()
    st_mname = files[0]
    st_module = core.load_module(st_mname, lang)
    if st_module:
        core.test_function(
            st_module, st_function, gen_vector, ref_func, lang,
            custom_msgs=custom_msgs,
            output_parser=parse_ints,
            validator=core.parsed_result_validator,
            test_recurrence=False
        )
        core.pylint_test(st_module, lang, info_only=True)

Validating Messages¶

The previous section covered how to validate values that are parsed from output. What about validating things like error messages? While you can technically do it with a normal validator, using a separate message validator is recommended. When the two validations are separated the student has a better idea of where the problems in their code are. Message validators operate with knowledge of arguments and inputs given to the student function, and its full output. Just like a normal validator, a message validator should also make assert statements about the output. Because message validators often deal with natural language, maximum leniency is recommended.

Getting back to our kinetic energy example, let's assume we've instructed the students to give two different error messages when prompting input. Either "This is not a number" if the input cannot be converted to float, or "Value must be positive" if it's negative. Usually these kinds of validators either use regular expressions or dissect the string in some other way. When using regular expressions, one way is to put them in a TranslationDict object.

msg_patterns = core.TranslationDict()
msgs_patterns.set_msg("NaN", "fi", re.compile("ei(?: ole)? numero"))
msgs_patterns.set_msg("NaN", "en", re.compile("not ?a? number"))
msgs_patterns.set_msg("negative", "fi", re.compile("positiivinen"))
msgs_patterns.set_msg("negative", "en", re.compile("positive"))

The validator itself should go through the inputs and see that there's a proper error message for each improper input. However before doing that it should also check that there's a sufficient number of prompts - otherwise this validator will cause an uncaught exception (IndexError for lines[i])

def error_msg_validator(output, args, inputs):
    lines = output.split("\n")
    
    assert len(lines) >= len(inputs), "fail_insufficient_prompts"
    for i, value in enumerate(inputs):
        if isinstance(value, str):
            assert msg_patterns.get_msg("NaN", lang).search(lines[i]), "fail_not_a_number"
        elif value < 0:
            assert msg_patterns.get_msg("negative", lang).search(lines[i]), "fail_negative"

This is also the first instance of using multiple validation failure messages within a single validator. Using "fail_" to prefix message names is a convention. These handles will correspond actual messages in the custom messages dictionary. Here you can also see how to add hints to a message - the value is now a dictionary instead of a string. More details about this can be found from a separate section.

msgs = core.TranslationDict()
msgs.set_msg("fail_insufficient_prompts", "fi", dict(
    content="Funktio ei kysynyt lukua tarpeeksi montaa kertaa.",
    hints=["Tarkista, että funktio hylkää virheelliset syötteet oikein."]
))
msgs.set_msg("fail_insufficient_prompts", "en", dict(
    content="The function didn't prompt input sufficient number of times.",
    hints=["Make sure the function rejects erroneous inputs properly."]
))
msgs.set_msg("fail_not_a_number", "fi", "Muuta kuin numeroita sisältävästä syötteestä kertova virheviesti oli väärä.")
msgs.set_msg("fail_not_a_number", "en", "The error message for non-number input was wrong.")
msgs.set_msg("fail_negative", "fi", "Negatiivisestä syötteestä kertova virheviesti oli väärä.")
msgs.set_msg("fail_negative", "en", "The error message for negative input was wrong.")

Again with all the pieces in places, we can call test_function:

core.test_function(
    st_module, st_prompt_function, gen_prompt_vector, ref_prompt, lang, 
    inputs=gen_input_vector,
    ref_needs_inputs=True,
    message_validator=error_msg_validator
)

Full example:

kinetic_test_messages.py

import random
import re
import string
import test_core as core

st_function = {
    "fi": "laske_kineettinen_energia",
    "en": "calculate_kinetic_energy"
}

st_prompt_function = {
    "fi": "pyyda_liukuluku",
    "en": "prompt_float"
}

prompts = {
    "fi": ["Anna nopeus: ", "Syötä massa: ", "Anna positiivinen luku: "],
    "en": ["Input speed: ", "Define mass: ", "Give a positive number: "]
}

msg_patterns = core.TranslationDict()
msg_patterns.set_msg("NaN", "fi", re.compile("ei(?: ole)? numero"))
msg_patterns.set_msg("NaN", "en", re.compile("not ?a? number"))
msg_patterns.set_msg("negative", "fi", re.compile("positiivinen"))
msg_patterns.set_msg("negative", "en", re.compile("positive"))

msgs = core.TranslationDict()
msgs.set_msg("fail_insufficient_prompts", "fi", dict(
    content="Funktio ei kysynyt lukua tarpeeksi montaa kertaa.",
    hints=["Tarkista, että funktio hylkää virheelliset syötteet oikein."]
))
msgs.set_msg("fail_insufficient_prompts", "en", dict(
    content="The function didn't prompt input sufficient number of times.",
    hints=["Make sure the function rejects erroneous inputs properly."]
))
msgs.set_msg("fail_not_a_number", "fi", "Muuta kuin numeroita sisältävästä syötteestä kertova virheviesti oli väärä.")
msgs.set_msg("fail_not_a_number", "en", "The error message for non-number input was wrong.")
msgs.set_msg("fail_negative", "fi", "Negatiivisestä syötteestä kertova virheviesti oli väärä.")
msgs.set_msg("fail_negative", "en", "The error message for negative input was wrong.")

def gen_vector():
    v = []
    for i in range(10):
        v.append((round(random.random() * 100, 2), round(random.random() * 100, 2)))
    return v

def gen_prompt_vector():
    v = []
    for i in range(10):
        v.append((random.choice(prompts[lang]), ))
    return v

def gen_input_vector():
    v = []
    v.append((random.choice(string.ascii_letters), round(random.random() * 100, 3)))
    v.append((round(random.random() * -100, 3), round(random.random() * 100, 3)))
    for i in range(8):
        case = []
        for j in range(random.randint(0, 2)):
            if random.randint(0, 1):
                case.append(random.choice(string.ascii_letters))
            else:
                case.append(round(random.random() * -100, random.randint(1, 4)))
            
        case.append(round(random.random() * 100, random.randint(1, 4)))
        v.append(case)
    
    random.shuffle(v)
    return v

def ref_prompt(args, inputs):
    return inputs[-1]

def ref_func(v, m):
    return 0.5 * m * v * v

def error_msg_validator(output, args, inputs):
    lines = output.split("\n")
    
    assert len(lines) >= len(inputs), "fail_insufficient_prompts"
    for i, value in enumerate(inputs):
        if isinstance(value, str):
            assert msg_patterns.get_msg("NaN", lang).search(lines[i]), "fail_not_a_number"
        elif value < 0:
            assert msg_patterns.get_msg("negative", lang).search(lines[i]), "fail_negative"

if __name__ == "__main__":
    files, lang = core.parse_command()
    st_mname = files[0]
    st_module = core.load_module(st_mname, lang)
    if st_module:
        core.test_function(
            st_module, st_prompt_function, gen_prompt_vector, ref_prompt, lang, 
            inputs=gen_input_vector,
            ref_needs_inputs=True,
            message_validator=error_msg_validator
        )
        core.test_function(st_module, st_function, gen_vector, ref_func, lang)
        core.pylint_test(st_module, lang, info_only=True)

Using Objects¶

Checking functions that modify existing objects instead of returning anything can be done using a result object extractor. These are functions that modify the student result. The most oftenly used mofication is to replace the return value with one of the arguments from the test vector - the one that contains the object that was modified. When doing tests with objects, it should be noted that - by default - the same object is passed to the reference and the student function, and other possible functions. Since this will lead to problems, we need another modification to the default behavior: an argument cloner. A cloner creates copies of mutable objects in the argument vector to prevent functions from affecting each other.

To show how to do these, we'll take a new example once more. This will be a simple filtering function that removes all values with an absolute value over a given threshold from a list of values. Students have been instructed to remove values from the original list - they should not make a copy of it. You can see most of the functions from the example as there isn't anything new about them. In order to change the student result from the return value to one of the arguments, this simple extractor function will do:

def values_extractor(args, res, parsed):
    return args[0]

As can be seen from the def line, arguments, student return value and values parsed by an output parser are available for extraction. This extraction is not done for the reference - the reference should directly return the value(s) we want to validate the student result against. In this case it would be:

def ref_func(values, threshold):
    return [value for value in values if value <= threshold]

Unlike the function expected from students this one actually does create a copy - but we're allowed to cheat in reference functions as long as we get the desired result. Although our reference doesn't modify the original list of values, an argument cloner is still needed because the arguments list goes into several different function calls (like when printing the function call into the evaluation log). For a one-dimensional list, the argument cloner is very simple:

def values_cloner(args):
    return args[0][:], args[1]

Most objects can also be cloned by the deepcopy function from the copy module if you're feeling lazy. Again with all the pieces in place, we'll just call test_function

core.test_function(
    st_module, st_function, gen_vector, ref_filter, lang,
    result_object_extractor=values_extractor,
    argument_cloner=values_cloner
)

If you are content with just using deepcopy, you can just use result_object_extractor=copy.deepcopy when calling the function and not bother with implementing your own function. However this might be slower, especially if there are parts in your argument vector that don't need cloning.

Full example:

filter_test.py

import random
import test_core as core

st_function = {
    "fi": "suodata_virhearvot", 
    "en": "filter_noise"
}

msgs = core.TranslationDict()

def gen_vector(): 
    v = [([10, 15, 15, 10, 10, 10], 12)]
    for i in range(9):    
        test = []
        for i in range(random.randint(1, 10)):
            test.append(random.randint(1, 99) * (random.randint(0, 1) * 1000 + 1))
        v.append((test, random.randint(1, 1000)))
    return v

def ref_func(values, threshold):
    return [value for value in values if value <= threshold]

def values_extractor(args, res, parsed):
    return args[0]
    
def values_cloner(args):
    return args[0][:], args[1]

if __name__ == "__main__":
    files, lang = core.parse_command()
    st_mname = files[0]   
    st_module = core.load_module(st_mname, lang, inputs=[[], 1])
    if st_module:
        core.test_function(
            st_module, st_function, gen_vector, ref_filter, lang
            result_object_extractor=values_extractor,
            argument_cloner=values_cloner
        )
        core.pylint_test(st_module, lang)

Custom Validators¶

Default validators are adequate for tests where exact matching between reference and student result is reasonable, and informative enough. However when the correct result has multiple potential representations or it is simply very complex, custom validators might be needed. Likewise if the task itself is complex, a custom validator with more than one assert can give better information about how exactly the student's submission is wrong.

Validators are functions that use one or more assert statements to compare the student's result or parsed output (or both) against the reference. The first failed assert is reported as the test result. If the validators runs through with failed asserts, the validation is considered successful. Each assert can be connected to a message that's been defined in the custom messages provided by the checker. To do so, the assert statement should use a handle string which corresponds to a message key in the dictionary.

For our example let's look at a checker that tests a function that converts a complex number to polar coordinates using degrees. While the default validator is capable of handling this, we want to give more information than just correct / incorrect. Also there might be some rounding errors which would cause the default validator to give false negatives.

A key thing to remember in implementing custom validators is that the only exception that is caught in this step is AssertionError - it's the validators responsibility to make sure it's not trying to do impossible things. Since validating ends at the first failed assert, ordering assert statements properly can be used to prevent validation from proceeding to steps that may cause errors. In our case we need to start by making sure the student function returned two values, then make sure they are both floats (because we want to round them) and finally we can compare them to the reference.

def polar_validator(ref, res, out):
    assert isinstance(res, (tuple, list)), "fail_return_value_length"
    assert len(res) == 2, "fail_return_value_length"
    assert isinstance(res[0], float), "fail_not_float" 
    assert isinstance(res[1], float), "fail_not_float"
    assert round(res[0], 2) == round(ref[0], 2), "fail_radius"
    assert round(res[1], 2) == round(ref[1], 2), "fail_angle"

Compared to a default validator, there's now 4 different messages. We don't expect the first two to trigger very often, but providing messages for them doesn't cost us a whole lot so might as well. You can see the messages from the full example below. With the validator function and messages in place, it's time to call test_function

core.test_function(
    st_module, st_function, gen_vector, ref_func, lang,
    custom_msgs=msgs,
    validator=polar_validator
)

Full example:

polar_test.py

import cmath
import random
import test_core as core

st_function = {
    "fi": "muunna_napakoordinaateiksi", 
    "en": "convert_to_polar"
}

msgs = core.TranslationDict()
msgs.set_msg("fail_return_value_length", "fi", "Funktio ei palauttanut kahta arvoa.")
msgs.set_msg("fail_return_value_length", "en", "The function didn't return two values.")
msgs.set_msg("fail_not_float", "fi", "Palautetut arvot eivät olleet liukulukuja.")
msgs.set_msg("fail_not_float", "en", "The returned values were not floats.")
msgs.set_msg("fail_radius", "fi", "Osoittimen pituus oli väärin.")
msgs.set_msg("fail_radius", "en", "The radius was wrong.")
msgs.set_msg("fail_angle", "fi", "Osoittimen vaihekulma oli väärin.")
msgs.set_msg("fail_angle", "en", "The polar angle was wrong.")

def gen_vector():
    v = [(-0.7j, )]
    
    for i in range(9):
        v.append((complex(round(random.random() * 10, 5), round(random.random() * 10 - 5, 5)), ))
    
    return v    
    
def ref_func(v):
    r, a = cmath.polar(v)
    return r, a * 180 / cmath.pi

def eref_radians(v):
    r, a = cmath.polar(v)
    return r, a    
    
def polar_validator(ref, res, out):
    assert isinstance(res, (tuple, list)), "fail_return_value_length"
    assert len(res) == 2, "fail_return_value_length"
    assert isinstance(res[0], float), "fail_not_float" 
    assert isinstance(res[1], float), "fail_not_float"
    assert round(res[0], 2) == round(ref[0], 2), "fail_radius"
    assert round(res[1], 2) == round(ref[1], 2), "fail_angle"
    
if __name__ == "__main__":
    files, lang = core.parse_command()
    st_mname = files[0]   
    st_module = core.load_module(st_mname, lang, allow_output=False, custom_msgs=load_msgs)
    if st_module:
        core.test_function(
            st_module, st_function, gen_vector, ref_func, lang,
            custom_msgs=msgs,
            validator=polar_validator
        )

Other Customization¶

Most customization was covered by the previous sections. There's two more keyword arguments to test_function that were not covered: new_test and repeat. The first is for miscellaneous preparations at the start of each test case and the latter is a mechanism for calling the student function multiple times with the same arguments and inputs instead of once. Both of these are mostly for enabling fringe checkers.

If you provide a function as the new_test callback, this function will be called as the very first thing in each test case. In most checkers it's not needed for anything at all (the default does nothing). However, if there are persistent objects involved in the checking process that are not re-initialized by the student code, this callback is the correct place to reset them. The callback receives two arguments: test case arguments and inputs. Other objects that you need to access, have to be accessed globally within the checker code.

Repeat is needed even more rarely. It only has use in testing functions that would normally be called within a loop mutliple times to achieve the desired result.

Diagnosis Features¶

Due to the default messages of PySenpai, even by following the basic checker instructions above most checkers give relatively useful feedback. However, to truly reduce TA work, PySenpai offers a few ways to create catches for typical mistakes, and provide additional hints when students make them. Creating these is often an iterative process as year after year you have more data about what kinds of mistakes students make. However, making some initial guesses can be helpful too.

Implementing a diagnostic typically involves creating a function that discovers the mistake, and one or more messages that are shown in the evaluation output when the mistake is encountered. Attaching hints and

highlight triggers

is also common as they can provide much more accurate information when your checker is pretty certain of the nature of the mistake.

There's three ways to implement diagnosis functions: error/false references, custom tests and information functions. There is some overlap, but usually it's pretty straightforward to choose the right one. As stated in the PySenpai overview, there is also a return value recurrence check built into PySenpai. This is enabled by default and should be disabled when testing functions that don't return or modify anything (i.e. functions that only print stuff).

False Reference Functions¶

False or error reference functions are one of the most convenient ways to identify commonly made mistakes by students. A false reference is a modified copy of the actual reference function that emulates a previously identified erroneous behavior. In tests, these functions are treated like reference functions. They are called in the diagnosis step, and the student’s result is compared with the false reference result using the same validator as the test itself. If it matches (i.e. there is no AssertionError), it is highly likely that the student has made the error that’s emulated by the false reference.

False references are usually very easy to implement. Attaching messages to them is also very simple. When PySenpai gets a match with the student result against the false reference, it looks up a message with the false reference function's name. Do note that this message must be provided by the checker - there is no default message. Having a default would not make sense because PySenpai cannot know what your false reference function wants to say. False references are passed to test_function as a list of functions, so you can have as many as you want.

Let's assume our students have a hard time remembering to multiply the result of m * v ** 2 by 0.5 when calculating kinetic energy. In this case the false reference would simply be:

def eref_2x_energy(v, m):
    return m * v * v

After creating the function we also need the corresponding message in our custom messages dictionary.

msgs = core.TranslationDict()
msgs.set_msg("eref_2x_energy", "fi", dict(
    content="Funktion palauttama tulos oli 2x liian suuri.",
    hints=["Tarkista kineettisen energian laskukaava."]
))
msgs.set_msg("eref_2x_energy", "en", dict(
    content="The function's return value was 2 times too big.",
    hints=["Check the formula for kinetic energy."]
))

Finally modify test_function call to include our new diagnosis function.

core.test_function(
    st_module, st_function, gen_vector, ref_func, lang
    custom_msgs=msgs,
    error_refs=[eref_2x_energy]            
)

Full example, based on the minimal checker we created earlier.

kinetic_test_eref.py

import random
import test_core as core

st_function = {
    "fi": "laske_kineettinen_energia",
    "en": "calculate_kinetic_energy"
}

msgs = core.TranslationDict()
msgs.set_msg("eref_2x_energy", "fi", dict(
    content="Funktion palauttama tulos oli 2x liian suuri.",
    hints=["Tarkista kineettisen energian laskukaava."]
)
msgs.set_msg("eref_2x_energy", "en", dict(
    content="The function's return value was 2 times too big.",
    hints=["Check the formula for kinetic energy."]
)

def gen_vector():
    v = []
    for i in range(10):
        v.append((round(random.random() * 100, 2), round(random.random() * 100, 2)))
    
    return v

def ref_func(v, m):
    return 0.5 * m * v * v

def eref_2x_energy(v, m):
    return m * v * v

if __name__ == "__main__":
    files, lang = core.parse_command()
    st_mname = files[0]
    st_module = core.load_module(st_mname, lang)
    if st_module:
        core.test_function(st_module, st_function, gen_vector, ref_func, lang)
        core.pylint_test(st_module, lang, info_only=True)

Custom Tests¶

Custom tests are additional validator functions, but instead of validating the result, they check the result for known mistakes. For this end they are given more arguments by PySenpai than normal validators. A custom test can make use of arguments, inputs and raw output of the student function in addition to what's available to normal validators (i.e. result, values parsed from output and reference). A custom test can make multiple asserts the same way a validator does. Likewise, each assert can be connected to a different message. If a corresponding message is not found, PySenpai uses the function's name to fetch a message (if this fails it raises a KeyError).

The overlap of custom tests and validators is largely due to historical reasons. In the past validators did not use assert statements - they simply returned True of False. This meant that custom tests were needed whenever a more accurate statement about the problem was called for. With assert statements, validators can do most of the work that was previously done by custom tests. However there are still some valid reasons to use custom tests.

Since custom tests are only run if the initial validation fails, they can include tests that would occasionally trigger with a correctly behaving function. Another advantage is that they give information on top of the validation rejection message (remembering that only the first failed assert is reported). Also if you are otherwise content with the default validator or one of the built-ins, custom tests can provide the additional checking that is not necessary for validating but can be useful for the student to know.

The example shown here is pretty simple. It's from a test that uses a standard validator to check a prompt function that is expected to return an integer that is 2 or bigger. The custom test is added to draw more attention to situations where the student function returns a number that's smaller than 2 as this is something they may have missed in the exercise description.

custom_msgs.set_msg("fail_less_than_two", "fi", dict(
    content="Funktio palautti luvun joka on pienempi kuin kaksi.",
    hints=["Varmista, että kyselyfunktio tarkistaa myös onko luku suurempi kuin 1."]
))
custom_msgs.set_msg("fail_less_than_two", "en", dict(
    content="Your function returned a number that's smaller than two.",
    hints=["Make sure your input function also checks that the given number is greater than 1."]
))

def less_than_two(res, parsed, out, ref, args, inps):
    if isinstance(res, int):
        assert res > 1, "fail_less_than_two"

Information Functions¶

Information functions are in many ways similar to custom tests, but their results are reported differently. Where custom tests report the message for the failed assertion (if any), information functions report a message that contains value(s) returned by the function. The use case for information functions is when you want to show something specific to the student instead of giving a verbal statement about the issue. Information functions receive the same arguments as custom tests. However, unlike custom tests, information functions must return a value.

Information functions need to be accompanied by a message that uses the function's name as its dictionary key. This message is given the return value of the information function as a named key argument func_res. Information functions are not expected to find something every time. For the times they do not find anything worth reporting, they should raise NoAdditionalInfo. This will signal to test_function to not print the associated message at all.

An example use of this is from a checker that tests a function that finds out whether a given number is a prime. As the function return value is simply True or False, the report has no information which divisor the student function may have missed when it gives a false positive. To add this information, we can make an information function that returns the smallest divisor for a non-prime number. If the number is a prime, there is no divisor to show - therefore NoAdditionalInfo is raised.

custom_msgs.set_msg("show_divisor", "fi", "Luku on jaollinen (ainakin) luvulla {func_res}")
custom_msgs.set_msg("show_divisor", "en", "The number's (first) factor is {func_res}")

def show_divisor(res, parsed, output, ref, args, inputs):
    for i in range(2, int(args[0] ** 0.5) + 1):
        if args[0] % i == 0:
            return i
    raise core.NoAdditionalInfo

Output Customization¶

An important aspect of checkers is that their output should accurately represent what is going on in the test. Misleading messages are worse than no messages at all. PySenpai's default set of messages is adequate for most normal situations, but when checkers do something slightly different, altering the messages appropriately is called for. As discussed earlier, checkers can also add their own messages for validation and diagnosis features. These should be more helpful than "Your function returned incorrect value(s)".

Another aspect of output is the representation of code and data. Again PySenpai provides reasonable defaults that represent things like simple function calls, simple values and even simple structures quite well. However, when checkers involve more complex code or structures, these default represntations can become unwieldy. In these situations, checkers should provide presenter overrides - functions that format values more pleasantly for the evaluation report.

When Lovelace processes the evaluation log from PySenpai, it uses the Lovelace markup parser to render everything. This means messages can - and usually should - include markup to make the feedback easier to read. PySenpai's default messages do use Lovelace markup. Any message can also be accompanied with a list of hints and/or triggers by using a dictionary instead of string as the message value. So the message value can be either a string, or a dictionary with the following keys:

content (string)
hints (list of strings)
triggers (list of strings)

If you want to add hints or triggers to a default message, simply omit the content key from your dictionary. For new messages content must always be provided.

Overriding Default Messages¶

Overriding in PySenpai works by calling the update method of the default message dictionary with the custom_msgs parameter as its argument. The update method of TranslationDict is slightly modified from a normal dictionary update. On the operational level, all messages in PySenpai are stored as dictionaries. However, when overriding or adding messages, the message can be provided as a string or a dictionary. When overriding an existing message with a string, the string replaces the value of the "content" key in the dictionary.

To create an override, simply create a message in your custom message dictionary with the same key as the one you want to replace. You can find the specifics of each default message from the PySenpai Message Reference including what keyword arguments are available for format strings. If you use a dictionary without the "content" key as your override, the default message will be unchanged but any hints or triggers will be included in the evaluation log if the message is triggered.

Adding Messages¶

Messages need to be added when a checker wants to report something that PySenpai doesn't report by default. Custom validators that have more than one assert typically have separate message keys for each assert. Messages corresponding to these keys must be found in the message dictionary. Most of these messages must be plain statements - they are not offered any formatting arguments. The sole exceptions are messages tied to information functions, which get the function's return value as a formatting argument. Adding them is very straightforward since the checker developer is in charge of choosing the message keys.

In addition to completely custom messages, you can also add messages for exceptions that might occur when importing the student code or calling the student function. When importing the student module, PySenpai has default messages for the following exceptions:

ImportError
EOFError
SystemExit
SyntaxError
IndentationError
NameError

When calling the student function, the following exceptions have default messages:

TypeError
AttributeError
EOFError
SystemExit

To add a message for an exception, simply use the exception's name (with CamelCase) as the message key. When messages are printed into the evaluation log, they are given two format arguments: ename and emsg. Both are obtained from Python's stack trace. For legacy reasons the argument and input lists are also given (as args and inputs) when formatting but should not be used - both are also printed separately when an exception occurs using the PrintTestVector and PrintInputVector messages.

Overriding Presenters¶

Presenters are functions that prepare various test-related data to be shown in the evaluation log. Function tests support 6 presenters, although they only use 5 of those by default. The presenters are for:

Arguments
Function call
Inputs
Student result
Reference result
Values parsed from output

Out of these arguments is not used by default because the arguments are shown in the function call representation. The latter is more useful because it shows exactly how the student's function was called, and they can copy the line into their own code for trying it out themselves. The default call presenter splits long function calls into multiple lines, and always encapsulates the call into a syntax-highlighted code block markup. The default presenter for data values mainly uses repr and cleans braces from lists, tuples and dictionaries.

Presenter overrides are given to test_function as a dictionary. You only need to provide key/value pairs for presenters you want to override. Below is the dictionary that contains the defaults. Just copy it, put in your replacements and cut the other lines. When calling test_function, this dictionary should be given as the presenter keyword argument.

default_presenters = {
    "arg": default_value_presenter,
    "call": default_call_presenter,
    "input": default_input_presenter, 
    "ref": default_value_presenter,
    "res": default_value_presenter,
    "parsed": default_value_presenter
}

Presenters themselves are relatively straightforward functions: they receive a single argument, and should return a single representation. Usually this should be a fully formatted string. However if you also override the message, you can return a structure and do rest of the formatting in the message format string instead. Presenters are expected to handle their own exceptions. This is especially important for presenters that format student results. A good practice is to use a blanket exception and default to return repr(value).

Most common case for custom presenters are values where repr isn't sufficient to give a nice representation (e.g. two-dimensional lists) or a sensible representation at all (e.g. objects, files). Likewise the default call presenter has trouble when given long lists as arguments. The optimal representation of data structures is often dependent on the exact task at hand. For exercises where files are written, the following example can be used to show the file contents (in a separate box using a monospace font) when given a filename.

def file_presenter(value):
    try:
        with open(value) as source:
            return "{{{\n" + source.read() + "\n}}}"
    except:
        return core.default_value_presenter(value)

There are a lot of cases where a modified representation of data is appropriate. For example in Elementary Programming we have a few exercises where a 2D map of minesweeper is involved. If presented as a Python data structure, it's an unreadable mess. However, using presenters it can be turned into a nice ASCII visualization of the same structure. For example, this presenter shows the arguments to a floodfill function in a more readable way:

fill_msgs = core.TranslationDict()
fill_msgs.set_msg(
    "PrintTestVector", "fi", 
    "Tutkimuksen aloituspiste:\n{args[0]}\nPlaneetta:\n{args[1]}\n"
)
fill_msgs.set_msg(
    "PrintTestVector", "en",
    "Exploration starting point:\n{args[0]}\nPlanet:\n[args[1]}\n"
)

def planet_to_string_numbers(planet):
    view = ""
    cl = "  "
    for ci in range(len(planet[0])):
        cl += str(ci).rjust(2)
    view += cl + "\n"
    for ri, r in enumerate(planet):        
        view += str(ri).rjust(2) + " " + " ".join(r) + "\n"
    return view    
    
def arg_presenter(val):
    return "{1} {2}".format(*val), "{{{\n" + planet_to_string_numbers(val[0]) + "\n}}}"

In this example you can also see how to split the presenter output into two different placeholders inside the message format string. Just return two values from the presenter and use indexing in placeholders. This produces a nice output that is much more readable than what the default function call presenter would have rendered. This is how the data structure visualization looks like in the output:

  0 1 2 3 4
0     x
1     x
2 x x x x x
3     x
4     x

Program Tests¶

While function tests are by far the most important way of evaluating programs, sometimes testing full programs is desired. Program tests are mostly similar to function tests. This section only covers the differences between the two, and features that are unique to program tests.

Program tests have been implemented by using the reload function from importlib. This means that the student code is not being run as a standalone program, but imported. If the student program's main code is within the if __name__ == "__main__": conditional statement, the program test cannot run it. If your exercise needs to run program tests, students should be instructed to not use this conditional statement.

By far the biggest difference between function and program tests is the lack of arguments and return values. For program tests, the test vector is an input vector. Likewise the only way to get results is to parse them from the student code's output. Output parsers and message validators are therefore more commonly used in program tests. Their implementation doesn't differ from function tests.

Reference "Programs"¶

Because functions are generally easier to handle than modules, program test references are implemented as functions. The function should simply mimic the desired behavior of the student program and return values that should be found from the student program's output. When calling the reference, the input vector is unpacked just like the argument vector in function tests. This is useful for tests where the number of inputs is fixed as you can just pick up each input to a separate variable. However, there are just as many exercises where the number of inputs changes. In that case you should simply pack the arguments in the function definition:

def ref_func(*inputs):

Which simply puts all received arguments into the inputs variable. Beyond this difference program references are implemented in exactly the same way as function references.

Validating Programs¶

Validators for programs also work very similarly with function validators. The biggest difference is that while function validators have two different result objects - student result and parsed values - to examine, program validators only have one: parsed values. To make the signature of validators consistent, program tests call validators by passing None as the result argument. With this any function validator that only examines the values parsed from output works as a program validator without any changes. Indeed, the default validator for program tests is parsed_result_validator.

Message validators also have the same signature as their function test counterparts. This time the arguments argument is always None. Beyond this, just like validators, message validators work just like they do in function tests. The same goes for custom tests and information functions - they receive None for both arguments and the result.

Example Program Test¶

For this example we're turning the kinetic energy checker into a program test. The student main program is expected to prompt the values for mass and velocity, and then print the initial values and the result using 2 decimal precision within a result string (e.g. "The kinetic energy of an object with mass 1.00 kg moving at 1.00 m/s is 0.50 J").

In this case the test vector and the reference function are exactly the same. The only difference is that the test vector is now used for inputs. As we learned in tests that involve inputs, input vectors can have non-string values as they will be converted before writing to the student program's stdin. As the number of inputs is fixed, the reference works similarly. However since the task requires the student code to print the initial values as well as the result with a certain precision, all of them should be returned from the reference function:

def ref_func(v, m):
    return round(v, 2), round(m, 2), round(0.5 * m * v * v, 2)

Note that we're actually rounding here which doesn't produce 2 decimal precision. However, these are good values for validation because we only want to validate that the results are correct - whether the student program uses the correct precision should be determined by a message validator.

A parser is needed to actually obtain the results. As stated earlier, when parsing values we don't want to be strict with formatting. So although the task descriptions requires 2 decimal precision, the parser should just try to find floats with any precision. This can be done with a regular expression and its findall method:

float_pat = re.compile("(-?[0-9]+\.[0-9]+)")

def parse_floats(output):
    found = float_pat.findall(output)
    return [float(x) for x in found]

With this setup the default validator for parsed results will pass the student program as long as it calculates correctly. To enforce the 2 decimal precision requirement, a message validator is called for. We can do this by defining another regular expression, this time parsing floats with exactly two decimals, and using that in the validator.

precision_pat = re.compile("(-?[0-9]+\.[0-9]{2})[^0-9]")

def validate_precision(output, args, inputs):
    found = precision_pat.findall(output)
    assert len(found) == 3, "fail_precision"

Just to make sure we are testing with at least one set of inputs that results in trailing zeros, we'll actually modify the test vector slightly by generating one test case using integers instead of floats.

Finally to avoid confusion when printing the reference (remember, we used rounding which doesn't preserve trailing zeros), we use a presenter that shows the numbers with 2 decimal precision in the output:

def decimal_presenter(value):
    return " ".join("{:.2f}".format(v) for v in value)

presenters = {
    "ref": decimal_presenter
}

All that's left is to call test_program with all of this.

core.test_program(st_module, gen_vector, ref_func, lang,
    custom_msgs=msgs,
    presenter=presenters,
    output_parser=parse_floats,
    message_validator=validate_precision
)

Full example:

kinetic_program_test.py

import random
import re
import test_core as core

float_pat = re.compile("(-?[0-9]+\.[0-9]+)")
precision_pat = re.compile("(-?[0-9]+\.[0-9]{2})[^0-9]")

st_function = {
    "fi": "laske_kineettinen_energia",
    "en": "calculate_kinetic_energy"
}

msgs = core.TranslationDict()

def gen_vector():
    v = []
    v.append((random.randint(1, 100), random.randint(1, 100)))
    for i in range(9):
        v.append((round(random.random() * 100, 2), round(random.random() * 100, 2)))
    
    random.shuffle(v)
    return v

def ref_func(v, m):
    return round(v, 2), round(m, 2), round(0.5 * m * v * v, 2)

def parse_floats(output):
    found = float_pat.findall(output)
    return [float(x) for x in found]

def validate_precision(output, args, inputs):
    found = precision_pat.findall(output)
    assert len(found) == 3, "fail_precision"

def decimal_presenter(value):
    return " ".join("{:.2f}".format(v) for v in value)

presenters = {
    "ref": decimal_presenter
}


if __name__ == "__main__":
    files, lang = core.parse_command()
    st_mname = files[0]
    st_module = core.load_module(st_mname, lang, inputs=[0, 0])
    if st_module:
        core.test_program(
            st_module, gen_vector, ref_func, lang,
            custom_msgs=msgs,
            presenter=presenters,
            output_parser=parse_floats,
            message_validator=validate_precision
        )

Static Tests¶

Static testing means evaluating the student code without executing it, i.e. looking at the source code. Generally speaking, static testing should not be used as the primary means of validation. It's mostly meant for two purposes: enforcing restrictions given in the assignment and pointing out questionable solutions. An example of the former could be a task where you want students to use loops but they could theoretically use N if/elif statements to do the job. In this case a static test can examine the student code, count how many if/elif statements are there and reject the program if there's too many.

In general static testing should be used with care, especially when used as an evaluation criteria instead of just providing hints. This is particularly true now since pylint testing has been included in PySenpai as it should catch poor coding habits much more efficiently.

Static tests can examine individual functions or the entire module. If given a dictionary of function names similarly to test_function, static_test will only examine that function. If given None as the same argument, it will instead look at the whole module.

Static Validators¶

Like other validators, static validators are also functions that make assert statements. The arguments given to static validators are different however. PySenpai uses the inpsect module's getsource, getdoc and getcomments functions which gives the source code, docstring and preceding comments separately. These three in this order are the arguments given to static validators. Each is given as a string - to examine the source code line by line you need to split it. Another difference is that if there are no assert-specific messages, the validator name is used to find the message. Validators to static tests are passed as lists.

Whether examining source code line by line or as one string, in most cases static validators should be implemented using the same considerations as

textfield exercise answers

. In other words, regular expressions or other means of increasing fuzziness in the validation are highly recommended. Also keep in mind that when looking at functions, all lines will be indented and therefore stripping whitespace might be called for. Ignoring inline comments is also something that may need to be done within the validator.

Example Static Tests¶

These are short, so here's a couple. The first one counts if/elif statements:

custom_msgs.set_msg("if_counter", "fi", dict(
    content="Koodissa on liikaa ehtolauseita.",
    hints=["Toteuta muunnos käyttämällä apuna tehtävänannossa annettuja listoja."]
))
custom_msgs.set_msg("if_counter", "en", dict(
    content="Your code contains too many if/elif statements.",
    hints=["You must implement the conversion using the given lists."]
))

def if_counter(source, doc, comments):
    assert source.count("if ") + source.count("elif ") <= 10

In this example we have set the message using the validator name. Running this test with static_test:

core.static_test(st_module, st_function, lang, validators=[if_counter], custom_msgs=custom_msgs)

Another example with two validators: one for finding string literals within a function that should only use strings from its parameters and another for finding a dumb way of doing float checking for a string.

stupid_pat = re.compile("\s+int\([A-Za-z0-9_]+\)\s*")

custom_msgs = core.TranslationDict()
custom_msgs.set_msg("string_literal_check", "fi", dict(
    content="Funktion koodissa ei tule olla yhtään merkkijonoliteraalia!",
    hints=["Poista [!term=Literaaliarvo!]merkkijonoliteraalit[!term!] koodista ja käytä niiden tilalla funktion parametreja."],
))
custom_msgs.set_msg("string_literal_check", "en", dict(
    content="The function should not contain any string literals",
    hints=["Remove string literals from your code and replace them with the function parameters."],
))    
custom_msgs.set_msg("useless_line_check", "fi", dict(
    content="Koodista löytyi rivi, jossa muutetaan merkkijono kokonaisluvuksi, mutta ei talleteta sitä muuttujaan tai palauteta.",
    hints=["Muuta try:n sisällä olevaa riviä siten, että tallennat kokonaisluvun muuttujaan."]
))
custom_msgs.set_msg("useless_line_check", "en", dict(
    content="There's a line in the code that converts a string to an integer but doesn't actually do anything with the integer value.",
    hints=["Change the code line inside the try statement so that the result is stored in a variable."]
))

def string_literal_check(source, doc, comments):
    assert source.count("\"") + source.count("'") == 0
    
def useless_line_check(source, doc, comments):
    assert not stupid_pat.search(source)

Again the validators themselves are pretty simple and most of the code is just messages. This test is run for information purposes only, it doesn't reject the student submission:

core.static_test(
    st_module, st_function, lang, [string_literal_check, useless_line_check],
    info_only=True,
    custom_msgs=custom_msgs
)

Pylint tests¶

As the name suggests, Pylint tests use pylint to evaluate the student submission against the Python style guide. Having a linter as part of the evaluation is useful because students are not very likely to run linters themselves even though they would probably benefit from it. Like static tests, Pylint tests can be used either as just helpful information, or they can used to enforce some level of code quality.

Configuration¶

Since majority of the testing is done by an external tool, configuration is mostly provided by the pylintrc configuration file. The Lovelace server has its own version of this file, which uses the default configuration with a few extra disables. There are two ways to alter the configuration. You can pass modifications to pylint_test as a keyword argument (extra_options). The argument takes a list of strings that are valid Pylint command line options. Another way to alter the configuration is to include your own pylintrc file when uploading your exercise files to Lovelace.

The Lovelace pylintrc file can be downloaded below

pylintrc

Validators¶

If you want to use Pylint to enforce some level of coding standard, you can implement a validator that looks at the linter stats. The stats object is a dictionary with a lot of keys. For validation the most important keys are

"global_note": the score given by Pylint
"convention": count of convention violations
"refactor": count of refactor needs
"warning": count of warnings
"error": count of errors
"fatal": count of fatal errors

For instance you can have a validator that rejects code that has a score below a certain threshold, or has any notifications of given level or worse. The default validator simply looks if the score is above 5.0.

Example Pylint Test¶

There isn't a whole lot to show here, but let's just write a custom validator that rejects low scores, and programs with warnings or errors. Here's the validator and associated messages:

msgs = core.TranslationDict()
msgs.set_msg("fail_low_score", "en", "Your code received too low quality score ({global_note:.1f} / 10.0).\nTarget: 5+ / 10.0.")
msgs.set_msg("fail_low_score", "fi", "Koodisi sai liian pienet laatupisteet ({global_note:.1f} / 10.0).\nTavoite: 5+ / 10.0.")
msgs.set_msg("fail_errors", "en", "Your code evaluation included errors.")
msgs.set_msg("fail_errors", "fi", "Koodisi arviointi sisälsi virheitä.")
msgs.set_msg("fail_warnings", "en", "Your code evaluation included warnings.")
msgs.set_msg("fail_warnings", "fi", "Koodisi arviointi sisälsi varoituksia.")

def quality_validator(stats):
    assert stats["errors"] == 0, "fail_errors"
    assert stats["warnings"] == 0, "fail_warnings"
    assert stats["global_note"] >= 5, "fail_low_score"

Let's also assume there's one particular warning that we don't want to reject the student submission for. We can add this warning to the ignore list by passing it to pylint_test in the extra_options keyword argument. The call becomes:

core.pylint_test(
    st_module, lang,
    custom_msgs=msgs,
    validator=quality_validator, 
    extra_options=["--disable=redefined-outer-name"]
)

Advanced Tips¶

This section covers miscellaneuous tricks that we have used to implement fringe case checkers in the past. Some are small hacks while others required a large amount of work. Most of these examples may not be directly useful, but they should give you some ideas how to proceed when there is no clear path as to how to implement a checker.

Faking Libraries¶

In modern programming there's often cases where the student code creates graphics in the form of a graphical user interface, or animation, or game interface etc. These are not easy to check because the graphical libraries don't really work server-side where the checking happens. We had this issue when we wanted to have small introductory exercises using Python's turtle library. The visual feedback of graphics appearing on the screen is more powerful feedback for beginners, but turtle doesn't provide anything that could be evaluated.

In this case since turtle is a relatively small library, we just implemented a fake version of it with the exact same function definitions. Insted of drawing anything, it creates a log of actions that can be evaluated by the checker. The fake turtle writes the log into json when its done function is called. Abusing the fact that all imported modules refer to the same module in memory, we can call the module's done function after the student function is executed. This is done with a result object extractor. In this case it finds the corners of a square.

def corners_extractor(args, res, st_out):
    turtle.done()
    try:
        with open("picturelog.json") as logfile:
            corners = []
            log = json.load(logfile)
            lines = [op for op in log if op[0] == "line"]
            for line in lines:
                x0, y0, x1, y1 = line[1:5]
                corners.append((round(x0), round(y0)))
                corners.append((round(x1), round(y1)))
            return set(corners)
    except:
        return None

However, when it comes to more complex libraries, faking the entire thing becomes too unwieldy. In Elementary Programming we "solved" this issue by creating a wrapper that served two purposes: make the library easier for students to use (by exposing only the subset that students actually needed) and also make us able to write a fake version of the wrapper that mimics the programmatic behavior of the UI elements but doesn't actually create an UI. In this case the exercise assignment should point out the limits of the checker so that students don't try to go around the wrapper and import modules that haven't been installed on the server.

Testing Command Line Arguments¶

Normally PySenpai cannot test functions or programs that parse command line arguments because it's not running the student program from the command line, and sys.argv contains the arguments used in running the checker. However, once you have imported sys into the checker, you can overwrite sys.argv with fake command line arguments (just remember to put the program name as the first item) and the student program will not know the difference. If you do this in the new_test callback of a test, you can change the arguments at the start of each run.

Replacing Functions Within Modules¶

Since modules are loaded into memory the first time they are imported, it is also possible to replace existing functions with ones that are more suitable for the checker. This has been used for two different purposes so far. One scenario was an exercise where students needed to write a program with two functions. One of the functions read arguments from a text file and passed them to the other function (which drew things on the screen based on those arguments). In order to test the reader function, the checker replaced the drawing function with a mock version that only logged the arguments it received into a global list. This list was turned into the function result by a result object extractor.

call_list = []

def call_logger(*args):
    call_list.append(args)
    
def call_extractor(args, result, output):
    return call_list.pop()

After which the call arguments given by the reader function have effectively become its result. Messages need some adjusting to avoid confusion. Putting the mock function into the student module can be done with setattr after the student module has been loaded:

setattr(st_module, st_draw_function[lang], call_logger)

Another thing that has been done with module tampering involved the os module. We didn't want students to use os.chdir when reading files because it's a (really) bad practice. Instead of making a static test to see if it is used, we did this dynamically by replacing the chdir function in os module with this:

class ChdirCalled(Exception):
    pass
    
def fake_chdir(*args):
    raise ChdirCalled

PySenpai catches the exception when trying to execute the student function, and uses the exception name to look up a message, so we can notify the student about this by setting a custom message with "ChdirCalled" as the handle.

Anna palautetta

Kommentteja materiaalista?