Using Regex for Input/Output Tests

As you’re probably aware, you can check the functionality of student code with unit testing. In C++, for example, this is generally done with external packages such as GoogleTest, CppUnit, or Boost.Test. Coding Rooms offers unit test cases with GoogleTest.

There is an easier entry point to autograding student work: Input/Output Comparison Tests! This article provides some tips for getting started with Input/Output tests in the Coding Rooms IDE. What do Input/Output tests do? They feed predefined console input to the student’s program and compare the console output it produces with the output you expect.

You can create test cases in the Coding Rooms “Test Bench” in the instructor view of the IDE.

Once you click “Add Test Case”, you can name your test case, set the number of points it is worth, and select the test type: Input/Output Comparison, Unit Test, or Manual (no auto-grading). For this article, we will be using “Input/Output Comparison”.

When you click on the “Add” button, you will be able to provide details for your test. In part it will look something like this:

Whenever you are using the Input/Output comparison tests, you are invoking a regular expression (regex) to test whether the student’s answer is correct – or close enough.

The “Compare Method” drop-down menu contains a few basic built-in regexes:

Matches something exactly;
Matches something exactly except special characters and whitespaces;
Contains something.

Alternatively, you can write your own regex. As you do, here are a few tips:

Don’t enclose your regex in quotes or tick marks;
White spaces matter;
To match everything use this: (?s).*
Ask students to put a new line between lines that solicit input and lines that provide output, e.g.:
cout << "What is your favorite color?" << endl;
cin >> fav_color;
cout << endl; // add this line
cout << "So it's " << fav_color << "?" << endl;
… this adds some structure to the output and makes debugging your tests easier.

Testing is Atomic

Each assignment’s test outcome is regarded as “pass/fail”. Either all or none of the points will be awarded for a single test. This can create challenges when the goals include acknowledging partial success or getting a firm understanding of exactly which topics are resonating with the students.

Example - Basic IO with Regex:

The student is asked to write a script to produce a dialog that:

Asks a user for their favorite baseball team
Stores the team as a variable
Asks the user for an affirming message
Stores the message as a variable
Writes a blank line to the terminal
Writes the message then the team to the terminal

They are presented with a sample dialog:

What's your favorite team?  Cubs
What's your inspirational message?  GO

GO Cubs

We’ve left much up to the student here – we want some creativity, after all. They can pick any team and any inspirational message. We did impose the constraint that a blank line is placed before the final output. Here’s how we might set up the test:

This lets the system know we’ll be writing our own regex.

This specifies what input we will provide to the student’s program. Each line of the input corresponds to one of the program’s requests for information.

Finally, we provide a regex for each line of text output to the console during our session.

The first two lines are extremely generic and will match anything. (A match means success from the auto-grader’s perspective). We really don’t care what the user provides for prompts for either the team or the message, right?

The final line actually tests what the student’s program did with our input. Line 3 will match our specific inspirational message (GO), followed by zero or more characters (this makes sure that any student-provided punctuation will pass), followed by the team (Cubs), followed by zero or more characters.
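
To make the line-by-line idea concrete, here is a rough sketch of that kind of comparison. It is not how Coding Rooms implements grading: the function lines_match is hypothetical, the patterns (.* for the two prompts and GO.*Cubs.* for the last line) are reconstructed from the description above, and std::regex uses the ECMAScript grammar by default rather than RE2. The sketch also assumes that each pattern must match its whole line.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical checker: true when the number of lines equals the number of
// patterns and each output line fully matches the pattern in the same position.
bool lines_match(const std::vector<std::string>& output_lines,
                 const std::vector<std::string>& patterns) {
    if (output_lines.size() != patterns.size()) {
        return false;
    }
    for (std::size_t i = 0; i < output_lines.size(); ++i) {
        // regex_match requires the entire line to match the pattern.
        if (!std::regex_match(output_lines[i], std::regex(patterns[i]))) {
            return false;
        }
    }
    return true;
}
```

With the patterns .*, .*, and GO.*Cubs.*, a last line of “GO Cubs” or “GO, Cubs!” passes in this sketch, while “Cubs GO” does not.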

When you are ready to make sure your test works, you can run it against either the code in the Template or in your Model Solution by choosing one of the buttons at the top of the screen:

In either case, the system will run the code with your input then compare the console contents with the regular expressions you’ve provided. The pass/fail result then shows next to the test’s name. In this case the green “Passed” next to the name of the test (IO test) indicates that the test passed and awarded 1 point.

In the test result, you can see the details. The first part shows the input and output, so we can verify that the input is as expected. We can also see exactly which lines the student’s program wrote to the terminal and the regex against which each was evaluated.

Had the test failed, a clickable “EXPLAIN DIFFERENCES” text link would appear just over the output. It provides a complete explanation of where and why the failure occurred.

By adding additional test cases, it’s easy to check different permutations of input, insert corner conditions, etc. The tests are completely independent, and each awards its own points. Thus it’s possible to run the easy-to-pass tests first and the “gotcha” tests later.

Please note that there are some things we couldn’t test directly. For instance, did the student store the user-provided information in variables? We didn’t have a direct test for this, but we tested it indirectly. Otherwise, how else could the output reflect the team name and message?

There are some things we couldn’t test at all. Was the message affirming or snarky? Was the name of a baseball team provided? But that’s OK because these points are peripheral to the learning objective.

Example – Fixed-format Testing

Writing your own regular expressions offers significant flexibility around what students can provide as passing solutions. But it does take more work on the instructor’s part. Coding Rooms offers some built-in regex options when the output is well-defined or nearly so.

In this example, students were tested on some of the content in a module on the C++ stream manipulators used for formatting text output.

On November 27, 2021, The University of Michigan defeated Ohio State University in the most-watched football game of 2021. In an upset victory, the then-fifth-rated Wolverines prevailed over the then-second-rated Buckeyes at the Big House in Ann Arbor.

The final score was printed above using stream manipulators like left, right, and setw. The setw manipulator comes from the <iomanip> library; left, right, and cout are available through the <iostream> library.

Can you upgrade this to include the scores by quarter? Please display the team names as left justified text in 12-column fields. Please show the other information as right justified text in 3-column fields.
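
A minimal sketch of the formatting this asks for is shown below. The helper name format_row is an assumption, and any numbers passed to it in an example are placeholders rather than real scores.

```cpp
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: formats one scoreboard row. The team name is
// left-justified in a 12-column field; each number that follows is
// right-justified in a 3-column field.
std::string format_row(const std::string& team, const std::vector<int>& numbers) {
    std::ostringstream row;
    row << std::left << std::setw(12) << team;   // setw applies to the next item only
    for (int n : numbers) {
        row << std::right << std::setw(3) << n;  // so it is repeated for every number
    }
    return row.str();
}
```

Calling format_row("Home", {7, 0, 14, 3, 24}) with those placeholder values yields the team name padded to 12 columns followed by five 3-column numbers. Because the output has exactly one correct layout, it suits an exact-match comparison.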

Here, the project is carefully and precisely specified in terms of both content and aesthetics. It’s a bit tedious and exacting from the perspective of the student – but it gives an exact target. Thus we can use the built-in “Equals exactly” match option …

… and the properly-formatted output provided by the Model Solution:

We can provide feedback that will likely be well-targeted in the event of failure:

Providing Feedback to students on failure

Regular Expressions

Regular expressions (regexes) are a means of searching for, replacing, and extracting characters from text. In the context of auto-grading assignments, regexes are used to match the console output produced by students’ code with the instructors’ definition of what the students’ code should produce.

Regexes can get extremely complicated. We’ll go over some of the fundamentals here and demonstrate how to use an online tool as a coach when you need more.

Regex basics for Coding Rooms:

Regexes handle only strings and characters; “123” is a string, not a number;
There are many dialects of regex - Coding Rooms uses Golang’s version (RE2);
We don’t need to use quotes or tick marks to enclose the regexes;
White spaces matter;
Auto-grading uses only the match functionality …
… but matches can get as sophisticated as you need them to be.

Raw text regexes: The term “regex” refers to a bit of text that’s encoded to find content in a string. For instance, if our regex is Tex it would match the first three characters of the word “Texas”. If our regex is as it would match the last two characters in “Texas”.

Multiple choice: If we can accept different characters as a match, we can put them in square brackets. The regex [abc] encodes for either “a” or “b” or “c”. So the expression [TAB][eab][xab] would match the first three characters in “Texas”; it would also match the first three characters in “Aexas” or “Bbblahblah”. Ranges of contiguous characters also work. The regex [a-zA-Z] encodes for any letter.
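
The raw-text and multiple-choice examples above can be tried out with a short sketch like this one. The helper first_match is hypothetical, and std::regex uses the ECMAScript grammar by default rather than RE2; for literal text and square-bracket classes the two behave the same way.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first part of `text` that matches
// `pattern`, or an empty string when nothing matches.
std::string first_match(const std::string& text, const std::string& pattern) {
    std::smatch m;
    if (std::regex_search(text, m, std::regex(pattern))) {
        return m.str();
    }
    return "";
}
```

For example, first_match("Texas", "[TAB][eab][xab]") returns "Tex", and the same pattern applied to "Bbblahblah" returns "Bbb".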

Special codes: Whole classes of characters can be specified using special codes preceded by the escape character \. For instance the code \d means “any digit”. The code \s means “any whitespace character”, including tabs and new lines. The period is a wildcard for any character except a new line; the (?s) flag lets the period match new lines too. So this sequence matches anything at all: (?s).*

Multiple characters: The * and + are special codes that can be used to match repeated occurrences of the sought-after character. The code X* means “zero or more X”. X+ means “one or more”. To find exactly three, one might use X{3}.

Position: The character ^ at the beginning of the regex means “match at the start of the string”. The dollar sign $ means “match at the end of the string”.
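
The special codes, repetition codes, and position anchors described above can be checked with a small table-driven sketch. The names has_match, Example, kExamples, and all_examples_hold are hypothetical. Note that std::regex defaults to the ECMAScript grammar, not RE2, and does not accept the (?s) flag, so that sequence is left out here; the remaining codes shown behave the same in both dialects.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when `pattern` matches somewhere inside `text`.
bool has_match(const std::string& text, const std::string& pattern) {
    return std::regex_search(text, std::regex(pattern));
}

struct Example {
    const char* text;
    const char* pattern;
    bool expected;
};

// Each row pairs a sample string with a pattern and the expected result.
const Example kExamples[] = {
    {"Room 42", R"(\d)",   true},   // \d finds a digit
    {"Room",    R"(\d)",   false},  // no digit present
    {"a\tb",    R"(a\sb)", true},   // \s matches a tab
    {"cat",     "c.t",     true},   // . stands in for any one character
    {"",        "X*",      true},   // zero X's still satisfies X*
    {"",        "X+",      false},  // X+ needs at least one X
    {"XXX",     "X{3}",    true},   // three X's in a row
    {"XX",      "X{3}",    false},  // only two X's
    {"Texas",   "^Tex",    true},   // "Tex" is at the start
    {"Texas",   "^as",     false},  // "as" is not at the start
    {"Texas",   "as$",     true},   // "as" is at the end
    {"Texas",   "Tex$",    false},  // "Tex" is not at the end
};

// Returns true when every row above behaves as annotated.
bool all_examples_hold() {
    for (const Example& e : kExamples) {
        if (has_match(e.text, e.pattern) != e.expected) {
            return false;
        }
    }
    return true;
}
```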

As you can see, regexes are pretty arcane and there is a lot to remember – and we’ve just scratched the surface here. Thankfully, you don’t need to remember much – this is all stuff you can look up.

Online Regex Resource

There’s a good tutorial and cheat sheet here (one of many in the world).

Stack Overflow is your friend. Regexes have been around just about as long as computing itself, so you’ll find solutions to all your basic questions and many of your more complicated ones. Many of the solutions you’ll find will be in a different dialect than the one used in Coding Rooms. But don’t be daunted – the dialects are very close and have a lot of overlap, so the answer you find could well work for your use case.

I find the best tool to be regex101.com. It supports Golang regexes as well as other dialects – so you can use it as a sort of Rosetta Stone to test regexes you pick up from Google searches, make sure that they work as advertised, then translate them as needed.

Here we’ll provide an overview of how to use regex101 for testing your regexes. When you open the tool, you’ll see something like this:

It’s important to note that it does not default to the right dialect of regex. You want to set it so you won’t have surprises later.

Select Golang as the “Flavor”.
Select Match as the “Function”.
The regex you’re testing goes here.
The string you’re checking goes here. It’s easy and fast to switch these out so you can check a variety of alternative student answers against your regex.
This is the best part IMHO – it provides play-by-play commentary on how your regex matched, and why.
The result: what, if anything, matched.
The code generator likely won’t be of use here and now, but it will produce syntactically-correct code in any supported language.
Here, you’ll find a terrific cheat sheet. All of the 100+ regex tokens are categorized and searchable.

When you have the regex working you can simply copy the contents of the “Regular Expression” box …

… into the I/O testing form.

You’ll note that there are embedded tick marks at the beginning and end. These are used to contain literal strings (to make sure nothing is treated as an escape character). You likely won’t need these. The “gm” at the end holds the global modifier flags ‘g’ and ‘m’. You can read about these in the explanation section, but you won’t need them.