Adding new programming languages
TESTed is designed to (and does) support multiple programming languages. To make this easier, all programming-language-specific aspects of the judging process have been gathered into one class. Adding support for a new programming language then involves creating a subclass and implementing the various methods. We call this subclass and its supporting files a "language module".
This tutorial explains in detail how to implement such a class. We will use the C programming language as an example.
Some useful links:
- Implementations for all programming languages currently supported by TESTed, including C, are available at https://github.com/dodona-edu/universal-judge/tree/master/tested/languages.
- The class definition, which you need to subclass: https://github.com/dodona-edu/universal-judge/blob/master/tested/languages/config.py
- Test exercises: https://github.com/dodona-edu/universal-judge/tree/master/tests/exercises
Installing and running TESTed
It is very useful to be able to run TESTed and its tests when extending TESTed. We therefore recommend following the installation instructions from the README. Note that you only need to install the Python dependencies. Dependencies for other programming languages (e.g. ghc for Haskell) are optional.
To follow the parts of this tutorial that focus on the C programming language, you also need a local installation of the gcc compiler (version 8.1 or up).
Windows users
We recommend using the Windows Subsystem for Linux for development on Windows machines. While TESTed is itself written in Python and thus platform independent, dependencies for programming languages are not always available on Windows for each language.
Running TESTed
After setting up TESTed, your directory structure should look something like this (the exact files can vary):
universal-judge
├── tested/ # Sources
├── tests/ # Tests
├── workdir/ # Working directory when manually running TESTed
├── config.json
└── ...
In this tutorial, we assume you run the commands in the root directory of the repository. Test if you can run TESTed using the following command:
> python -m tested --help
usage: __main__.py [-h] [-c CONFIG] [-o OUTPUT] [-v]

The programming language agnostic educational test framework.

optional arguments:
  -h, --help            show this help message and exit
  -c CONFIG, --config CONFIG
                        Where to read the config from
  -o OUTPUT, --output OUTPUT
                        Where the judge output should be written to.
  -v, --verbose         Include verbose logs. It is recommended to also use -o in this case.
The above command is how you run TESTed in production. For development, however, the need to always specify an exercise configuration is somewhat cumbersome. For that purpose, you can run TESTed in development mode which uses a hard-coded configuration:
> python -m tested.manual
Running TESTed in development mode uses the hard-coded values defined in tested/manual.py to evaluate an exercise and puts all intermediate files that are generated into the workdir directory of TESTed. The latter can be very useful to, for example, inspect the test code that is generated by a language module. If the workdir directory does not exist yet, you might need to create it.
Implementation patterns
It is also useful to have an idea of how TESTed works internally. The evaluation of a submission for some programming language (say C) works like this:
- The programming-language-agnostic test suite is converted into actual C code. This code is called the test code and will run the submissions with the tests.
- The test code is executed.
- The results are collected.
- The results are compared against the expected values from the test suite.
- The final feedback is generated and outputted.
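The five steps above can be read as a small pipeline. The sketch below is a toy model in Python, not TESTed's actual code: the function `evaluate` and its `generate` and `run` parameters are hypothetical names introduced only for illustration.

```python
from typing import Callable, List


def evaluate(
    expressions: List[str],
    expected: List[str],
    generate: Callable[[str], str],
    run: Callable[[str], str],
) -> List[bool]:
    """Toy model of the evaluation steps; every name here is hypothetical."""
    # Step 1: convert the language-agnostic test suite into test code.
    test_code = [generate(expression) for expression in expressions]
    # Steps 2 and 3: execute the test code and collect the results.
    results = [run(code) for code in test_code]
    # Step 4: compare the results against the expected values.
    verdicts = [actual == wanted for actual, wanted in zip(results, expected)]
    # Step 5: generate and output the feedback.
    for expression, ok in zip(expressions, verdicts):
        print(f"{expression}: {'correct' if ok else 'wrong'}")
    return verdicts


# Usage with stand-in callables instead of a real compiler and process.
verdicts = evaluate(
    ["echo('input-1')"],
    ["input-1"],
    generate=lambda expression: f"char* result = {expression};",
    run=lambda code: "input-1",
)
```

In a real language module, only the first step is language-specific; the other steps are handled by TESTed itself.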
For example, assume we have an elementary exercise with the following problem statement:
Write a function echo that takes one string parameter and returns this argument.
A very small test suite with one test might look like this:
- tab: "Echo"
  testcases:
    - expression: "echo('input-1')"
      stdout: "input-1"
A correct submission for this exercise might look like this:
#include <stdio.h>

char* echo(char* what) {
    printf("%s", what);
    return what;
}
When TESTed evaluates this submission, it will convert the test suite to C code. In this case, the expression of the only test case looks like this:
char* result = echo("input-1");
It will also save the return value. Later, this code is combined with the code for other test cases into the test code (note that the actual generated code is more complex):
// Include the submission from the student
#include "submission.c"
// Include our output library, which converts values into TESTed's JSON format.
#include "values.h"

// The function that will execute the first test case.
void test_case_1() {
    // Generated from the test suite.
    char* result = echo("input-1");
    // Use the "values.h" file to report the return value.
    send_value(result);
}

int main(int argc, const char* argv[]) {
    test_case_1();
    // If there were more test cases, we would add them here:
    // test_case_2();
    // ...
}
The main responsibility of the language module is to generate the correct test code.
The C programming language
Not every programming language supports the same features or data structures. TESTed includes a way to indicate what is supported and what isn't.
From TESTed's basic data structures, the following are not supported in C:
- sequence: Arrays are special in C, and currently not supported.
- set: C does not have built-in sets.
- map: C does not have built-in maps. C has structs, but it is not possible to get access to their field names at runtime, so there is no way to serialise them.
Advanced types that are based on these unsupported basic types are also not supported. Additionally, the following advanced types are not supported:
- bigint: C does not have built-in arbitrary precision integers.
- fixed_precision: C does not have built-in fixed precision real numbers.
Finally, from the "language constructs", the following are not supported:
- objects
- exceptions
- heterogeneous collections
- default parameters
- named parameters
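To make the idea of declaring support concrete, the following sketch checks which data types required by a test suite a language cannot handle. It is an illustration under assumptions: the function `unsupported_types`, the table `C_SUPPORT` and the convention that a missing entry means "unsupported" are hypothetical, not TESTed's API.

```python
from typing import Dict, Set

# Hypothetical, partial support table for C (illustration only).
C_SUPPORT: Dict[str, str] = {
    "integer": "supported",
    "text": "supported",
    "int8": "reduced",
}


def unsupported_types(required: Set[str], support: Dict[str, str]) -> Set[str]:
    """Return the required types that the language cannot handle.

    Assumption: a type that is missing from the table is unsupported.
    """
    return {
        name
        for name in required
        if support.get(name, "unsupported") == "unsupported"
    }


# A test suite that needs a sequence cannot be run for C.
missing = unsupported_types({"integer", "sequence"}, C_SUPPORT)
```

A check of this kind makes it possible to skip a programming language for an exercise before any code is generated.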
The Language class
The first step is to create a directory for the C language module. Create a new directory tested/languages/c:
universal-judge/
├─ tested/
│ ├─ languages/
│ │ ├── c/ <- new folder
│ │ ├── haskell/
│ │ ├── java/
│ │ ├── python/
│ │ ├── config.py
│ │ ...
│ ...
...
To get started with implementing the class, create a Python module tested/languages/c/config.py and define a class that inherits from Language. See below for an annotated implementation. Various aspects are explained below it in more detail.
from pathlib import Path
from typing import Dict, List, Mapping, Set

from tested.languages.c import generators
from tested.languages.config import Language

# The other TESTed names used below (Construct, CallbackResult, Command,
# executable_name, ...) must be imported as well; see the source code of
# an existing language module for where they are defined.


class C(Language):
    # See the docs of the parent class for all available options.
    # Required methods are marked abstract, so you must implement those.

    # Additional files we need in the test code.
    # These files should go in the `tested/languages/c/templates` directory.
    def initial_dependencies(self) -> List[str]:
        return ["values.h", "values.c", "evaluation_result.h", "evaluation_result.c"]

    # Compiled languages need a selector to choose which test to execute.
    def needs_selector(self):
        return True

    # The extension of the source code files.
    def file_extension(self) -> str:
        return "c"

    # By default, TESTed uses snake case.
    # Here you can override this for certain features.
    def naming_conventions(self) -> Dict[Conventionable, NamingConventions]:
        return {"global_identifier": "macro_case"}

    # Above, we listed the constructs that are not supported.
    # Here we do the reverse: we list the supported constructs.
    def supported_constructs(self) -> Set[Construct]:
        return {
            Construct.FUNCTION_CALLS,
            Construct.ASSIGNMENTS,
            Construct.GLOBAL_VARIABLES,
        }

    # Above, we listed the data types that are not supported.
    # Here we do the reverse: we list the supported data types.
    def datatype_support(self) -> Mapping[AllTypes, TypeSupport]:
        return {
            "integer": "supported",
            "real": "supported",
            "char": "supported",
            "text": "supported",
            "boolean": "supported",
            "nothing": "supported",
            "undefined": "reduced",
            "int8": "reduced",
            "uint8": "reduced",
            "int16": "supported",
            "uint16": "supported",
            "int32": "supported",
            "uint32": "supported",
            "int64": "supported",
            "uint64": "supported",
            "single_precision": "supported",
            "double_precision": "supported",
            "double_extended": "supported",
        }

    # The compilation command used by TESTed.
    # See below for more information, or check the method documentation
    # for technical details.
    def compilation(self, files: List[str]) -> CallbackResult:
        main_file = files[-1]
        exec_file = Path(main_file).stem
        result = executable_name(exec_file)
        return (
            [
                "gcc",
                "-std=c11",
                "-Wall",
                "-O3" if self.config.options.compiler_optimizations else "-O0",
                "evaluation_result.c",
                "values.c",
                main_file,
                "-o",
                result,
            ],
            [result],
        )

    # Execution command used by TESTed.
    # This will execute the result of the compilation command.
    # See below for more information, or check the method documentation
    # for technical details.
    def execution(self, cwd: Path, file: str, arguments: List[str]) -> Command:
        local_file = cwd / executable_name(Path(file).stem)
        return [str(local_file.absolute()), *arguments]

    # The actual implementation has more methods, but look at the code for those.

    # The following four methods are responsible for generating C code based on
    # the test suite. To keep this class manageable, we have extracted the generation
    # code to its own file, `tested/languages/c/generators.py`.
    def generate_statement(self, statement: Statement) -> str:
        return generators.convert_statement(statement, full=True)

    def generate_execution_unit(self, execution_unit: PreparedExecutionUnit) -> str:
        return generators.convert_execution_unit(execution_unit)

    def generate_selector(self, contexts: List[str]) -> str:
        return generators.convert_selector(contexts)

    def generate_encoder(self, values: List[Value]) -> str:
        return generators.convert_encoder(values)
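The naming conventions map identifiers from the snake case used in test suites to the style of the target language. The sketch below shows what such a conversion could look like. It assumes that "macro_case" means upper-case snake case; the function names are hypothetical and are not taken from TESTed.

```python
def snake_to_macro_case(name: str) -> str:
    """Convert snake_case to MACRO_CASE (assumed to be upper-case snake case)."""
    return name.upper()


def snake_to_camel_case(name: str) -> str:
    """Convert snake_case to camelCase, as a contrast with macro case."""
    first, *rest = name.split("_")
    return first + "".join(part.capitalize() for part in rest)


# A global identifier from the test suite, converted for two conventions.
macro = snake_to_macro_case("max_value")
camel = snake_to_camel_case("max_value")
```

With the configuration above, only global identifiers would be converted to macro case in C; all other names keep the default snake case.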
Compilation
Interpreted languages
Whenever possible, we recommend running at least a syntax check during the compilation step, even if the programming language is not compiled. For example, the compilation steps for Python and JavaScript check that submissions have no syntax errors.
The only parameter of the compilation method is a list of files that TESTed deems useful to have during compilation. This includes the files from the initial_dependencies method, the submission itself and the generated code. By convention, the last file in the list contains the main function and should thus be the file being compiled. All files will be in the same directory as the main file.
The compilation method does not need to use all files. In case of the C programming language, for example, the submission and the generated contexts are pulled in through #include directives, so from this list we only pass the main file to gcc (alongside the fixed dependencies values.c and evaluation_result.c).
The return value of the compilation method must be a tuple with two values:
- The compilation command. TESTed uses the Python module subprocess to execute the compilation command.
- The resulting files or a file filter.
The resulting files are a list of files that will be available during the execution step. Alternatively, you can pass a "file filter", which allows dynamic filtering of the files.
For example, in C, only the resulting binary is relevant, and it has a predictable name. We can thus return a single file in the list. The last file in this list must also be the executable file.
However, it is not always possible to predict the list of executables. For example, in Java, compiling a .java file will result in one or more .class files, depending on the content of the .java file. In that case, you can return a filter function that determines what executable files must be copied. After compilation, TESTed will call the filter function for each file in the compilation directory.
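As an illustration of the file filter variant, here is a minimal sketch for a Java-like language. It is not the actual Java language module: the function name `java_compilation` is hypothetical, and the assumption that the filter receives a Path and returns a boolean should be checked against the method documentation.

```python
from pathlib import Path
from typing import Callable, List, Tuple


def java_compilation(files: List[str]) -> Tuple[List[str], Callable[[Path], bool]]:
    """Return a compilation command and a file filter (hypothetical sketch)."""

    def is_class_file(file: Path) -> bool:
        # Keep only the compiled output, whatever the class names are.
        return file.suffix == ".class"

    # Only pass the Java sources to the compiler.
    sources = [file for file in files if file.endswith(".java")]
    return ["javac", *sources], is_class_file


command, keep = java_compilation(["Submission.java", "Selector.java"])
```

The filter avoids having to predict the names of the generated class files, which depend on the classes defined in the submission.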
Here is an example of calling the compilation method with the arguments and return values for C (on Windows, with compiler optimizations disabled):
>>> compilation(['submission.c', 'evaluation_result.c', 'context_0_0.c', 'selector.c'])
(
    ['gcc', '-std=c11', '-Wall', '-O0', 'evaluation_result.c', 'values.c', 'selector.c',
     '-o', 'selector.exe'],
    ['selector.exe']
)
Return an empty list as the compilation command to skip compilation.
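For an interpreted language, the compilation step can run a syntax check instead of producing a binary. The sketch below uses Python's standard py_compile module as the checker. It is a hedged example, not the command TESTed uses for Python; the function name `python_syntax_check` is hypothetical.

```python
from typing import List, Tuple


def python_syntax_check(files: List[str]) -> Tuple[List[str], List[str]]:
    """Return a syntax-check command and the files needed for execution."""
    # Byte-compiling fails with a non-zero exit code on a syntax error.
    sources = [file for file in files if file.endswith(".py")]
    command = ["python", "-m", "py_compile", *sources]
    # Nothing new is needed at run time, so the input files are returned
    # unchanged; the last one remains the file to execute.
    return command, files


command, results = python_syntax_check(["submission.py", "context_0_0.py"])
```

The benefit of such a check is that syntax errors are reported once, as compilation errors, rather than separately for every test case.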
Execution
After compiling the submission and the test code, TESTed executes the test code to get the results. This is done by calling the execution method, which has three parameters:
- cwd: Path name of the directory in which execution is taking place.
- file: Name of the file that must be executed.
- arguments: Arguments passed to the execution process. These are used, for example, to select what context TESTed must execute.
The execution method must return the execution command. TESTed again uses the Python module subprocess to execute the execution command. For the C programming language, TESTed simply executes the executable file with the given arguments.
Here is an example of calling the execution method with the arguments and return values for C (on Windows):
>>> execution('/test/path', 'executable.exe', ['arg1', 'arg2'])
['/test/path/executable.exe', 'arg1', 'arg2']
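Both the compilation and the execution method rely on a helper called executable_name, whose implementation is not shown in this tutorial. The sketch below is a guess at its behaviour, assuming it only appends ".exe" on Windows. The `windows` parameter is added here purely to make the example testable on any platform.

```python
import os
from pathlib import Path
from typing import List

IS_WINDOWS = os.name == "nt"


def executable_name(basename: str, windows: bool = IS_WINDOWS) -> str:
    """Assumed behaviour: add the .exe extension on Windows only."""
    return f"{basename}.exe" if windows else basename


def execution(
    cwd: Path, file: str, arguments: List[str], windows: bool = IS_WINDOWS
) -> List[str]:
    """Standalone variant of the execution method shown earlier."""
    local_file = cwd / executable_name(Path(file).stem, windows)
    return [str(local_file.absolute()), *arguments]


command = execution(Path("workdir"), "selector.c", ["context_0_0"], windows=False)
```

Using the absolute path of the executable means the command does not depend on the directory from which the process is started.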
Code generation
The Language class has five methods that deal with code generation:
- generate_statement, which generates a statement.
- generate_execution_unit, which generates a complete executable file (with main function).
- generate_selector, which generates code for the selector (only needed if needs_selector returns True).
- generate_encoder, which is used in the tests to encode a single value.
- generate_check_function, which generates code for programming-language-agnostic custom evaluators.
The last method is not implemented in C, as C does not support this.
As the code for this is relatively long, we have extracted the implementation of these methods to a different file.
As a general strategy, you can look at the implementation of a similar programming language and use that as a starting point.
Generating statements
Generating code for statements consists of handling each possible case:
def convert_statement(statement: Statement, full=False) -> str:
    if isinstance(statement, Identifier):
        return statement
    elif isinstance(statement, FunctionCall):
        return convert_function_call(statement)
    elif isinstance(statement, get_args(Value)):
        return convert_value(statement)
    elif isinstance(statement, get_args(Assignment)):
        if full:
            prefix = convert_declaration(statement.type) + " "
        else:
            prefix = ""
        return (
            f"{prefix}{statement.variable} = "
            f"{convert_statement(statement.expression)};"
        )
    raise AssertionError(f"Unknown statement: {statement!r}")
One non-intuitive aspect is the parameter full that indicates whether a variable declaration is needed:
int variable = 5; // with declaration
variable = 6; // without declaration
See the source code for the implementation of the other parts.
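The effect of the full parameter can be tried out in isolation with a simplified model. In the sketch below, the `Assignment` dataclass and `convert_assignment` are hypothetical stand-ins: the real statement types in TESTed are richer, and the C type is derived from the statement rather than stored as a string.

```python
from dataclasses import dataclass


@dataclass
class Assignment:
    """Hypothetical, simplified stand-in for an assignment statement."""

    c_type: str
    variable: str
    expression: str


def convert_assignment(statement: Assignment, full: bool = False) -> str:
    # With full=True the variable is declared; otherwise it is only assigned.
    prefix = f"{statement.c_type} " if full else ""
    return f"{prefix}{statement.variable} = {statement.expression};"


with_declaration = convert_assignment(Assignment("int", "variable", "5"), full=True)
without_declaration = convert_assignment(Assignment("int", "variable", "6"))
```

The distinction matters in C because declaring the same variable twice in one scope is a compilation error.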
Generating execution units
The execution unit method contains the most complex case, as it is responsible for generating a full file.
def convert_execution_unit(pu: PreparedExecutionUnit) -> str:
    # STEP 1
    result = f"""
#include <stdio.h>
#include <math.h>

#include "values.h"
#include "{pu.submission_name}.c"
"""
    for name in pu.evaluator_names:
        result += f'#include "{name}.c"\n'

    # STEP 2
    result += f"""
static FILE* {pu.execution_name}_value_file = NULL;
static FILE* {pu.execution_name}_exception_file = NULL;

static void {pu.execution_name}_write_separator() {{
    fprintf({pu.execution_name}_value_file, "--{pu.testcase_separator_secret}-- SEP");
    fprintf({pu.execution_name}_exception_file, "--{pu.testcase_separator_secret}-- SEP");
    fprintf(stdout, "--{pu.testcase_separator_secret}-- SEP");
    fprintf(stderr, "--{pu.testcase_separator_secret}-- SEP");
}}

static void {pu.execution_name}_write_context_separator() {{
    fprintf({pu.execution_name}_value_file, "--{pu.context_separator_secret}-- SEP");
    fprintf({pu.execution_name}_exception_file, "--{pu.context_separator_secret}-- SEP");
    fprintf(stdout, "--{pu.context_separator_secret}-- SEP");
    fprintf(stderr, "--{pu.context_separator_secret}-- SEP");
}}

#undef send_value
#define send_value(value) write_value({pu.execution_name}_value_file, value)

#undef send_specific_value
#define send_specific_value(value) write_evaluated({pu.execution_name}_value_file, value)
"""

    # STEP 3: Generate code for each context.
    ctx: PreparedContext
    for i, ctx in enumerate(pu.contexts):
        result += f"""
int {pu.execution_name}_context_{i}(void) {{
    {_generate_internal_context(ctx, pu)}
}}
"""

    result += f"""
int {pu.execution_name}() {{
    {pu.execution_name}_value_file = fopen("{pu.value_file}", "w");
    {pu.execution_name}_exception_file = fopen("{pu.exception_file}", "w");
    int exit_code;
"""
    for i, ctx in enumerate(pu.contexts):
        result += " " * 4 + f"{pu.execution_name}_write_context_separator();\n"
        result += " " * 4 + f"exit_code = {pu.execution_name}_context_{i}();\n"

    result += f"""
    fclose({pu.execution_name}_value_file);
    fclose({pu.execution_name}_exception_file);
    return exit_code;
}}

#ifndef INCLUDED
int main() {{
    return {pu.execution_name}();
}}
#endif
"""
    return result
This consists of a few parts:
- First, we import various dependencies we will need.
- We define functions to convert return values and other data into the correct format for TESTed.
- We generate code for each context and test case.
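The generated code writes a separator containing a secret to every output channel before each context. The sketch below shows how such output could be split afterwards. This tutorial does not show how TESTed itself parses the output, so the function `split_on_separator` is a hypothetical illustration of the idea only.

```python
from typing import List


def split_on_separator(output: str, secret: str) -> List[str]:
    """Split collected output into one chunk per separator."""
    separator = f"--{secret}-- SEP"
    # A separator is written before every chunk, so everything before
    # the first separator is not part of any context and is dropped.
    return output.split(separator)[1:]


# Output of two contexts, as it could appear on stdout.
chunks = split_on_separator("--abc-- SEPfirst--abc-- SEPsecond", "abc")
```

Because the secret is not known to the submission, a student cannot easily print a fake separator to confuse the judge.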
If a function or method call returns a value (step 2 from above), TESTed must serialise the value before writing it to an output file. This serialisation converts the representation of the value in a programming language into the language-independent format used by TESTed. It is useful to implement data serialisation as a separate module, which is called "values" by convention.
There are four functions to collect results:
- send_value(value): Serialises and writes a value to the return output file.
- send_exception(exception): Serialises and writes an exception to the exception output file.
- send_specific_value(value): Serialises and writes the result of a check for a specific programming language to the return channel.
- send_specific_exception(exception): Serialises and writes the result of a check for a specific programming language to the exception channel.
As the C programming language does not support exceptions, we only implement the two value functions, send_value and send_specific_value (using macros).
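To give an impression of what a "values" module does, here is a minimal sketch of a serialiser written in Python. The type names come from the basic types used in this tutorial, but the JSON layout with the fields "type" and "data" is an assumption and should be verified against an existing values module.

```python
import io
import json
from typing import IO, Union


def write_value(out: IO[str], value: Union[None, bool, int, float, str]) -> None:
    """Serialise a basic value as JSON (field names are assumptions)."""
    if value is None:
        type_name = "nothing"
    elif isinstance(value, bool):
        # Must be checked before int, because bool is a subclass of int.
        type_name = "boolean"
    elif isinstance(value, int):
        type_name = "integer"
    elif isinstance(value, float):
        type_name = "real"
    else:
        type_name = "text"
    json.dump({"type": type_name, "data": value}, out)


# Serialise the return value of the echo example into an in-memory file.
buffer = io.StringIO()
write_value(buffer, "input-1")
```

In C, the same dispatch on the type of the value cannot be done at run time, which is why the C implementation relies on macros.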
Generating selectors
When using compiled programming languages, we don't recompile the code for each context we execute. TESTed supports a "selector", which chooses what will be executed:
def convert_selector(contexts: List[str]) -> str:
    result = """
#include <string.h>
#include <stdio.h>

#define INCLUDED true
"""
    for ctx in contexts:
        result += f'#include "{ctx}.c"\n'
    result += """
int main(int argc, const char* argv[]) {
    // argv[0] is the program name, so a selected context requires argc >= 2.
    if (argc < 2) {
        fprintf(stderr, "No context selected.");
        return -2;
    }

    const char* name = argv[1];
"""
    for ctx in contexts:
        result += f"""
    if (strcmp("{ctx}", name) == 0) {{
        return {ctx}();
    }}
"""
    result += """
    fprintf(stderr, "Non-existing context '%s' selected.", name);
    return -1;
}
"""
    return result
By using a selector, we can compile all tests into one executable.
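The behaviour of the generated selector can be modelled in a few lines of Python. This is only a sketch of the dispatch logic: the function `select_and_run` and the dictionary of callables are hypothetical, while the exit codes mirror those in the generated C code.

```python
import sys
from typing import Callable, Dict, List


def select_and_run(contexts: Dict[str, Callable[[], int]], argv: List[str]) -> int:
    """Run the context named in argv[1] and return its exit code."""
    # argv[0] is the program name, so a selection needs at least two entries.
    if len(argv) < 2:
        print("No context selected.", file=sys.stderr)
        return -2
    name = argv[1]
    if name in contexts:
        return contexts[name]()
    print(f"Non-existing context '{name}' selected.", file=sys.stderr)
    return -1


# Two stand-in contexts that only return an exit code.
contexts = {"context_0_0": lambda: 0, "context_0_1": lambda: 3}
exit_code = select_and_run(contexts, ["selector", "context_0_1"])
```

The name of the context is passed through the arguments parameter of the execution method.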
Registering the language
We also need to register the language module in TESTed. In the file tested/languages/__init__.py, add the language to the dictionary LANGUAGES:
LANGUAGES = {
    'c': C,  # This is what we added here, mapping a name to the configuration class.
    'haskell': Haskell,
    'java': Java,
    'javascript': JavaScript,
    'kotlin': Kotlin,
    'python': Python,
    'runhaskell': RunHaskell,
}
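A registry like this is typically used to look up the configuration class by the name of the programming language. The sketch below illustrates that lookup with stand-in classes. The function `get_language` and the error message are hypothetical, and the real classes take configuration arguments that are left out here.

```python
from typing import Dict, Type


class Language:
    """Stand-in for the real base class (illustration only)."""


class C(Language):
    """Stand-in for the C language module."""


LANGUAGES: Dict[str, Type[Language]] = {"c": C}


def get_language(name: str) -> Language:
    """Instantiate the language module registered under the given name."""
    try:
        return LANGUAGES[name]()
    except KeyError:
        raise ValueError(f"Unsupported programming language: {name!r}") from None


language = get_language("c")
```

The key in the dictionary is the name by which the programming language is selected, so it should be short and stable.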
Testing the language module
Now that we have configured and registered the language for TESTed, we can test if the language module works as expected. TESTed contains some predefined tests that can be used for that purpose. Before this can be done, you should also extend the tests to support testing the new programming language:
- Add solutions in the programming language to one or more of the test exercises (in the exercise directory). Take a look at the existing solutions to infer what your solutions should do.
- Modify tests/test_functionality.py and other test files to include the new programming language for testing.