Introduction to PySenpai¶
PySenpai is a program checking framework, developed with pedagogical goals in mind. It was originally created to unify the behavior of checking programs in the Elementary Programming course and to make them easier to update and fix. Initially it supported only Python exercises, but it has since been extended to work with C, Y86 assembly and Matlab. The key principles behind PySenpai's design are:
- Provide a unified testing process for all checkers
- Make simple checkers easy to implement
- Make complex checkers possible to implement
- Allow customization of messages
- Provide reasonable default feedback even for the most minimal checkers
- Discourage cheating
To achieve these goals, PySenpai uses a callback-based architecture where the testing process itself runs within PySenpai test functions. Checker developers implement callback functions that are called at certain stages of the testing process and can influence the way the student program is tested. If you are unfamiliar with these kinds of architectures, the entire process can feel a bit like a black box. But don't worry, that's what this guide chapter is here for.
In many ways PySenpai operates like unit test frameworks - it's just a bit more specialized. At the most basic level, testing consists of preparation and running the tests. Preparation starts with loading the student code with PySenpai's loading function (which handles errors gracefully). If the module is successfully loaded, the checker can proceed to call one or more of PySenpai's test functions. For most test functions, checkers need to provide test vectors (or test vector generators) and reference implementations that match the expected behavior of the student submission. After there are no more test functions to call, PySenpai automatically outputs the test report upon exit.
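To make the callback-based flow less of a black box, here is a minimal sketch of the idea in plain Python. Note that this is an illustration of the architecture, not PySenpai's real API: the function and parameter names (`run_test`, `default_validator`, and so on) are made up for this example.

```python
# Hypothetical sketch of a callback-driven test loop (not PySenpai's real API):
# the framework owns the loop and calls checker-supplied callbacks at fixed stages.

def run_test(student_fn, reference_fn, test_vector, validator):
    """Run each test case, comparing the student result to the reference."""
    results = []
    for args in test_vector:
        expected = reference_fn(*args)   # reference result, computed first
        actual = student_fn(*args)       # student function under test
        try:
            validator(expected, actual)  # callback decides pass/fail
            results.append(True)
        except AssertionError:
            results.append(False)
    return results

# Example usage with a trivial exercise ("double a number"):
def reference(x):
    return 2 * x

def student(x):
    return x + x

def default_validator(expected, actual):
    assert expected == actual

print(run_test(student, reference, [[1], [2], [5]], default_validator))
# → [True, True, True]
```

The essential point is the inversion of control: the checker never writes the loop itself, it only supplies the callbacks that the framework invokes.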
Failures when interacting with student code are always caught and handled by PySenpai and logged properly in the evaluation report (there are, however, some fringe exceptions that cannot be caught). Failures in the checker code itself are let through to stderr. This will always result in Lovelace reporting only a checker failure to the student - no evaluation will be done.
Customizing Loading Behavior¶
Compared to the actual test functions, there isn't that much to customize in loading behavior. Most of the customization relates to how inputs and outputs are handled. You can provide a list of strings as inputs to the student program. These are written to stdin so that if the student program reads inputs when it's being loaded, it has them available. You can also set flags that control whether code output is shown in the evaluation report and whether output is allowed at all. You can also customize the output (see below). Customization of non-Python loaders is treated in each extension's chapter.
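The input-feeding mechanism can be illustrated with a small self-contained sketch. This is not PySenpai's actual loader; `load_with_inputs` is an invented name, and the real loader does considerably more (error handling, report logging), but the stdin redirection idea is the same.

```python
# Illustrative sketch (not PySenpai's actual loader): feeding a list of strings
# to a student module's input() calls by redirecting stdin.
import io
import sys

def load_with_inputs(code, inputs):
    """Execute student code with the given lines available on stdin."""
    old_stdin, old_stdout = sys.stdin, sys.stdout
    sys.stdin = io.StringIO("\n".join(inputs) + "\n")
    sys.stdout = captured = io.StringIO()
    try:
        namespace = {}
        exec(code, namespace)          # stand-in for the real module loader
    finally:
        sys.stdin, sys.stdout = old_stdin, old_stdout
    return namespace, captured.getvalue()

student_code = 'name = input()\nprint("Hello", name)'
ns, output = load_with_inputs(student_code, ["Ada"])
print(output, end="")  # → Hello Ada
```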
Test Functions¶
For Python programs, PySenpai offers five kinds of tests (although one of them has been implemented for an exercise type that is not yet available in Lovelace). These tests are:
- Function test. Calls a single student function and compares its behavior to a reference. This is usually the most important test, and is extremely flexible.
- Program test. Tests the student main program and compares its behavior to a reference (function). This test can be used to test whole programs, and is also quite flexible. However, it can only test code that is executed when the module is imported (i.e. not code under if __name__ == "__main__":).
- Code snippet test. Tests a code snippet provided as a string. The snippet is inserted into a temporary module by a constructor function and then executed. The namespace of the executed module is compared with a reference object. Currently not in use, but there are plans to make textfield exercises that use this functionality instead of regular expressions.
- Static test. This test is for custom source code validation, and can inspect either the code of a single function or the entire program. Mostly used in current checkers for rejecting submissions that use solutions that have been specifically forbidden in the exercise. Static tests can also be used as information only, in which case they do not affect the evaluation result.
- Lint test. This test uses PyLint to generate a code quality analysis of the submission. This analysis can be used as an evaluation criterion, or it can be provided just as extra information for the student. Sadly, PyLint itself doesn't support gettext, so these messages will always be in English.
Customizing Function Tests¶
The function test has a number of parameters that can be used to affect its behavior. There's a total of 17 different optional parameters, the majority of which are callback functions. While this may sound a bit intimidating, for most checkers the defaults are perfectly adequate. The behavior is of course also defined by the mandatory parameters. This section just outlines the options you have for customization - implementation details are provided in a separate guide chapter. Some of the customization options are related to output formatting - these are treated separately (see below). This section describes the options that affect the behavior of the test function, listed in rough categories.
Test Vector and Reference¶
Both the test vector and the reference implementation are mandatory, and for some checkers they are all that's needed. You can provide a list as the test vector, or a function that returns a list. The number of test cases is derived directly from the test vector length. For function tests, each case is a list of arguments to be used when calling the reference function and the student function. You can also provide an input vector (optional) - if provided, it must be the same length as the test vector, and each case is a list of strings to be written into stdin.
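The shapes described above can be sketched as plain Python data. The names here (`test_vector`, `input_vector`, `reference`) are illustrative, not PySenpai parameter names:

```python
# Each test case is a list of arguments for one call of the tested function.
test_vector = [
    [3, 4],     # case 1: call f(3, 4)
    [10, -2],   # case 2: call f(10, -2)
]

# Optional input vector: same length as the test vector, one list of
# stdin lines per case.
input_vector = [
    [],         # case 1 reads nothing from stdin
    ["yes"],    # case 2 gets one input line
]
assert len(input_vector) == len(test_vector)

def reference(a, b):
    """Reference implementation matching the expected student behavior."""
    return a + b

# The number of test cases follows directly from the vector length:
print(len(test_vector))  # → 2
```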
The reference function is a function that provides the desired result for each test case. In normal cases it will do exactly what the student submission is expected to do. However, there are certain scenarios where it needs to behave differently. The most common example is the way PySenpai deals with inputs: reference functions are not supposed to consume inputs. The reference can be given the inputs as a list, but it has to simulate input reading by taking values directly from the list, whereas the student submission actually reads from stdin. Essentially this just removes an extra step, but it does mean you cannot simply copy-paste an input-reading function from your reference program to serve as the checker's reference function.
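The asymmetry can be made concrete with a small sketch. The function names are invented for illustration; the point is that the reference never calls input() while the student code does:

```python
# Hedged sketch of the input-handling asymmetry described above.
import io
import sys

def student_read_two():
    a = input()                     # student: actually reads stdin
    b = input()
    return int(a) + int(b)

def reference_read_two(inputs):
    # No input() calls: the reference consumes the list directly.
    it = iter(inputs)
    return int(next(it)) + int(next(it))

inputs = ["3", "4"]
expected = reference_read_two(inputs)   # reference: list in, no stdin used

old_stdin = sys.stdin
sys.stdin = io.StringIO("\n".join(inputs) + "\n")
try:
    actual = student_read_two()         # student: reads the same data from stdin
finally:
    sys.stdin = old_stdin

print(expected, actual)  # → 7 7
```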
An important thing to bear in mind is that all reference results are generated in advance and stored, i.e. once PySenpai starts to interact with the student function, the reference is no longer interacted with. Evaluation is done against the stored results.
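The compute-first, evaluate-later ordering can be sketched like this (illustrative code, not PySenpai internals):

```python
# Sketch of the "generate references first, store, then test" order.

def reference(x):
    return x * x

test_vector = [[2], [3], [4]]

# Phase 1: all reference results are generated and stored up front.
stored_results = [reference(*args) for args in test_vector]

# Phase 2: the student function is evaluated against the stored results only;
# the reference function is never called again in this phase.
def student(x):
    return x ** 2

verdicts = [student(*args) == stored
            for args, stored in zip(test_vector, stored_results)]
print(verdicts)  # → [True, True, True]
```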
For function tests, the result consists of two parts: return values and output (the contents of stdout after running the student function). By default these are fed to validators as is. However, both can be modified prior to evaluation by using filtering callbacks. The more common use case is an output parser that converts the raw output into parsed values. When parsing is done separately, default validators can cover more ground - if parsing were done in the validator, a custom validator would be needed for every test that cares about output. It also makes the evaluation report better, because the parsed values can be shown along with the full output.
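An output parser can be as simple as a regular expression pass over the captured stdout. The output format below is made up for illustration:

```python
# Sketch of an output parser: it turns raw captured stdout into parsed values
# so that a default equality validator can compare them against the reference.
import re

def output_parser(raw_output):
    """Extract all integers printed by the student program."""
    return [int(m) for m in re.findall(r"-?\d+", raw_output)]

raw = "The sum is 7 and the product is 12\n"
print(output_parser(raw))  # → [7, 12]
```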
The other half, return value, can also be altered. This is especially meant for testing functions that do not return anything but rather modify an existing object. You can write a callback function that chooses the result object to use instead of the return value. It can be chosen/formed from test arguments, return value and the parsed output. For example if a function modifies a list it receives as an argument, you would simply write a function that returns the corresponding argument and it will be treated as the "return value" for the remaining test stages. These functions are called result object extractors, and the reasoning to use them is similar to that of output parsers.
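A result object extractor for an in-place function might look like the following sketch (the callback signature here is a simplified assumption, not PySenpai's exact one):

```python
# Sketch of a result object extractor: the student function returns None,
# so the extractor picks the mutated argument to act as the "return value".

def student_sort_in_place(items):
    items.sort()          # modifies the list in place, returns None

def extract_result(args, return_value, parsed_output):
    # Choose the first argument as the result object for later test stages.
    return args[0]

args = [[3, 1, 2]]
rv = student_sort_in_place(*args)      # rv is None
result = extract_result(args, rv, None)
print(result)  # → [1, 2, 3]
```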
The validator is responsible for deciding whether the test passes or not. By default, PySenpai validates student functions by comparing their return values with the reference. It also provides a few built-in replacements (e.g. validating output values instead of return values). However, implementing custom validators is the best way to provide more accurate feedback about what went wrong, especially in more complex assignments. Validators are functions that can do any number of assert statements, allowing the comparison to be done in several steps. Each assert statement in the validator can be accompanied by a different rejection message, which will be shown as the reason for failing the test in the evaluation log.
Custom validators are also sometimes necessary just because a checker needs to evaluate complex objects where simple equality testing is not reliable. On a similar note, checkers can have some leniency in their validation which can be very important in reducing student frustration. For instance, functions that perform multiple floating point operations can have rounding errors when the implementation is different from the reference but just as correct. In this scenario using a rounding validator is likely to result in a better experience.
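A custom validator with stepwise asserts and floating point leniency might look like this sketch. The message handles in the assert statements are invented examples, and the exact validator signature is an assumption:

```python
# Sketch of a lenient custom validator in the spirit described above.
import math

def rounding_validator(reference_result, student_result):
    assert student_result is not None, "fail_no_return"
    assert isinstance(student_result, float), "fail_wrong_type"
    # Tolerate rounding differences between equally correct implementations.
    assert math.isclose(reference_result, student_result,
                        rel_tol=1e-9), "fail_value"

# Passes even though 0.1 + 0.2 != 0.3 exactly in binary floating point:
rounding_validator(0.3, 0.1 + 0.2)
```

Each assert carries its own handle, so the evaluation log can name the first check that failed rather than reporting a generic mismatch.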
PySenpai also has a separate stage for validating messages in the student code. This helps students differentiate between functional issues in their submission and problems with its output messages. If you want to test that student code gives certain messages with certain arguments / inputs, it should be done with a message validator.
Analysis callbacks are functions that are called after validation if the student submission didn't pass. These can be used to pinpoint problems in the evaluation log and provide additional hints. There is one built-in check that is enabled by default: it lets the student know their function returned the same result regardless of arguments/inputs. Further analysis needs to be provided as callback functions. There are three categories that can be used:
- error references
- custom tests
- information functions
Error references are functions that simulate typical student mistakes in the assignment. The student result is validated against each error reference function, and if any of them match, a related message is added to the evaluation log. They are usually simple to implement because they're just modified copies of the real reference function. However, knowing what the typical mistakes are may take a few iterations of teaching the course.
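An error reference is just a deliberately broken copy of the real reference. The sketch below shows the idea for a common off-by-one mistake; the names and the `diagnose` helper are illustrative, not PySenpai's API:

```python
# Sketch of an error reference: a broken copy of the real reference that
# reproduces a typical student mistake (here, an off-by-one range error).

def reference_sum_to(n):
    return sum(range(1, n + 1))     # correct: 1 + 2 + ... + n

def error_ref_off_by_one(n):
    return sum(range(1, n))         # common mistake: stops at n - 1

def diagnose(student_result, args):
    """Return the handle of a hint message if a known mistake matches."""
    if student_result == error_ref_off_by_one(*args):
        return "off_by_one"
    return None

print(diagnose(10, [5]))  # → off_by_one  (student summed 1..4, not 1..5)
```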
Custom tests are additional validators that work with extra information. Just like validators, they can do a series of assert statements to find out what's wrong. However unlike validators, they have access to raw output, arguments and inputs in addition to what's available to normal validators. Information functions have access to the same data but instead of doing assert statements, they are expected to return something which will be formatted into a feedback message.
Messaging in PySenpai is based on Python dictionaries where each message is accessed via a key that consists of the message handle and language. PySenpai has default messages in Finnish and English. The language can be chosen when invoking a checker by using the --lang option. When implementing checkers, you can add your own messages by creating a similar dictionary (there's a convenience class for doing this) and passing it to PySenpai functions. At the beginning of each function, the default messages dictionary is updated with the messages from the dictionary provided by the checker. This can be used both to add new messages (for validators and analysis functions) and to override existing ones.
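The lookup and update scheme can be sketched with plain dictionaries. The key layout, handles, and message texts below are illustrative assumptions; the real framework wraps this in a convenience class:

```python
# Sketch of message lookup keyed by (handle, language), with checker-provided
# entries updating the defaults.

default_messages = {
    ("fail_return_value", "en"): "Your function returned the wrong value.",
    ("fail_return_value", "fi"): "Funktiosi palautti väärän arvon.",
}

checker_messages = {
    ("fail_return_value", "en"): "Wrong sum: check your loop bounds.",  # override
    ("fail_negative", "en"): "The result should never be negative.",    # new handle
}

# Defaults are updated with the checker's dictionary: existing messages are
# overridden, new handles are added.
messages = dict(default_messages)
messages.update(checker_messages)

lang = "en"  # in practice this would come from the --lang option
print(messages[("fail_negative", lang)])
# → The result should never be negative.
```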
Messages in PySenpai consist of the message content, list of hints and list of triggers (the latter two being optional). The message content can also contain certain named placeholders which can be used to show values of relevant variables. The available placeholder names for each message can be found from the full message specification.
In addition to customizable messages, PySenpai also uses presenters for certain values in the testing process, namely: argument vector, input vector, reference result, student result, parsed student result and function call. These allow you to show information in a way that makes sense. For instance, if the result you are validating in tests is an object, printing it without a presenter would show something like
<__main__.Result object at 0x7f984f5b24a8>, which is obviously not very useful in terms of feedback. In this case you'd implement a presenter that returns a nice representation of the relevant attributes of that class instead.
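A presenter for such an object can be a simple formatting function, as in the following sketch (the class and attribute names are invented for illustration):

```python
# Sketch of a presenter: it turns an otherwise opaque object into readable
# feedback text instead of the default <__main__.Result object at 0x...> form.

class Result:
    def __init__(self, score, passed):
        self.score = score
        self.passed = passed

def result_presenter(value):
    """Show the relevant attributes of a Result in the evaluation report."""
    return f"Result(score={value.score}, passed={value.passed})"

r = Result(42, True)
print(result_presenter(r))  # → Result(score=42, passed=True)
```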
When implementing custom validators and info functions, you need to add the corresponding messages. For validators, each assertion should raise a different message handle, and each handle should be found in the messages dictionary of your checker.