Each line is a problem expressed in JSON format, consisting of the following fields:
- id: Problem ID
- name: Problem name (cf. Appendix D)
- description: A short description of the problem
- category: Manually labeled problem category
- prompts: A list of template-enabled strings, one per step (turn) of the problem.
- inputs: A list of 5 test case inputs. Each test case is a key-value table mapping the variables used in the templated prompts to actual values.
- outputs: A list of 5 test case outputs. Each entry is the expected output value of the program for the corresponding test case.
- max_gen_length: Maximum number of tokens generated per turn for the problem. The value is mostly 128 because a single turn does not require many lines of code, but we set a higher value when longer generations are expected.
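As an illustration of how such a record might be consumed, the following minimal sketch parses one JSON line and instantiates the templated prompts for each test case. The record is made up for this example (and shortened to 2 test cases instead of 5), the helper names `load_problems` and `render_prompts` are hypothetical, and the `{variable}` placeholder syntax is an assumption, since the template syntax is not specified above.

```python
import json

# A made-up record for illustration only; it is not taken from the benchmark.
# It is shortened to 2 test cases, whereas real records contain 5.
sample_line = json.dumps({
    "id": 0,
    "name": "Hypothetical example",
    "description": "Add a constant to a number",
    "category": "math",
    "prompts": [
        "Assign {x} to a variable named n.",
        "Add {k} to n and print the result.",
    ],
    "inputs": [{"x": 1, "k": 2}, {"x": 5, "k": 7}],
    "outputs": [3, 12],
    "max_gen_length": 128,
})


def load_problems(lines):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def render_prompts(problem, case_index):
    """Fill the templated prompts with the values of one test case."""
    values = problem["inputs"][case_index]
    # Assumption: placeholders follow Python's str.format style, e.g. "{x}".
    return [template.format(**values) for template in problem["prompts"]]


problems = load_problems([sample_line])
problem = problems[0]
for i, expected in enumerate(problem["outputs"]):
    print(render_prompts(problem, i), "->", expected)
```

Under these assumptions, each test case yields one concrete prompt per turn, and the paired entry in outputs is the value the final program is expected to produce.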