Addressing the Abstraction and Reasoning Corpus via Procedural Example Generation

Michael Hodel
Institute of Neuroinformatics, University of Zurich and ETH Zurich

arXiv:2404.07353v1 [[Link]] 10 Apr 2024

Abstract
This work presents code to procedurally generate examples for the
ARC training tasks [1] [2]. For each of the 400 tasks, an example generator
following the transformation logic of the original examples was created.
In effect, the assumed underlying distribution of examples for any given
task was reverse-engineered by implementing a means to sample from it.
An attempt was made to cover as large a space of possible examples for
each task as is reasonable. That is, wherever the original examples of a
given task are limited in their diversity, e.g. by keeping the grid dimensions,
the set of symbols, or the number of objects constant or within tight
bounds even though the transformation does not require it, such
constraints were lifted. Having access not just to a few examples per task,
as is the case for ARC, but to very many, should enable a wide range
of experiments that may be important stepping stones towards making
leaps on the benchmark.¹

¹ The code is available at [Link]

1 Motivation
The Abstraction and Reasoning Corpus (ARC) is a dataset intended to
serve as a benchmark for general intelligence. It consists of 1000 unique
and diverse tasks asking the test-taker to, given a number of demon-
stration pairs of input and output grids, predict the output grid for test
input grids. The difficulty of ARC for machine learning approaches is
largely a consequence of the great diversity of tasks as well as its few-shot
nature. Even more than four years after its publication, ARC remains
unsolved. While many attempts have been made to solve ARC,
what seems generally lacking are fundamental scientific experiments. The
data generation process in this work attempts to enable such experiments
for addressing the latter of those two aspects, namely sample-efficient
learning. A simple such experiment would be comparing various model
architectures and learning algorithms in a setting with a separate
model trained for each task: For a given fixed number of examples, how
well can the model generalize to an unseen set of examples of the same
task, on average across all the training tasks? There are many oppor-
tunities in this realm of within-task generalization, e.g. addressing the
question of how the sample-efficiency of learning algorithms might be im-
proved by building a curriculum that increases example difficulty over
the course of training - as opposed to training on instances of the full
range of difficulties throughout the entire training - or by other means of
constructing a curriculum that is not fully random or uniform in terms
of some metric. While it seems implausible that strong statements about
the suitability of approaches to solving ARC could be made directly based
on the results of such experiments, they may still be a means of gathering
important insights into how progress on ARC could be made.

2 Task Generalization
Broad generalization of the tasks, and therefore large sample spaces,
was attempted. Generating new examples for an arbitrary ARC task via,
say, a set of fixed augmentation schemes would yield only minimal diversity
and, more importantly, cannot work in general in the first place, as the
augmentations may break examples. Hence, new examples are here
constructed from the ground up instead. Every relevant degree of freedom,
such as the grid dimensions or the number of objects, is, where applicable,
determined randomly and individually for each example, e.g. by sampling
the integers determining the grid dimensions or the number of symbols or
objects, by generating randomly shaped and colored objects, or by randomly
placing objects on a grid, etc. Generally, every value within the generation
process that need not be hardcoded was sampled instead. Further, any
two components that do not logically depend upon each other were sampled
separately: e.g. for most tasks, the grid height and width are independent
of each other. Lastly, an effort was made for each value of any degree of
freedom to be sampled from as large a set as reasonably possible. For
example, if a task features a notion of objectness and the number of objects
may vary freely, then the generator should allow for examples featuring
only a single object as well as very many objects.
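As a minimal illustration of this principle, the following sketch (not taken
from the released code; all names and bounds are hypothetical) samples a few
typical degrees of freedom independently and anew for each example:

    import random

    def sample_degrees_of_freedom():
        # Hypothetical sketch: every degree of freedom is drawn independently
        # and per example, rather than being hardcoded.
        height = random.randint(1, 30)       # grid height, independent of width
        width = random.randint(1, 30)        # grid width
        num_symbols = random.randint(1, 9)   # number of non-background colors
        num_objects = random.randint(1, 10)  # number of objects to place
        return height, width, num_symbols, num_objects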
There are some limitations to this notion of broadest possible
generalization, however, and the realized sample space does not always
cover the actual space of possible examples. These limitations are a direct
result of favouring simpler or faster code over covering certain edge cases
or fully exhausting other pockets of the theoretically ideal sample space,
but also of attempting to retain each ARC task's spirit. Still,
each generator is capable of generating at least 10'000 unique examples,
and the vast majority of the generators have a far greater sample
space.
Even though the generators could have been written to allow arbitrarily
large grids or more symbols, the number of symbols was kept at 10 and the
maximum grid height and width at 30 in order to stay at least somewhat
close to the original ARC format. Nevertheless, there are three obvious
ways in which the generated data fundamentally differ from the ARC data:
there are many more examples per task, each task is typically much more
general, i.e. the diversity of examples within a task is greater, and examples
may be either much simpler or significantly harder, e.g. because they may
require dealing with many more elements.

3 Example Generation
Each generator is a standalone Python function merely making use of the
DSL [3] and functions from the random module of the standard library.
The median generator consists of 40 lines of code and uses 22 DSL
primitive calls and 10 calls to the random module. Even though there is a
great degree of overlap in the utilized techniques across the generators, no
separate utilities were created, apart from a wrapper around the random
function for controlling example difficulty. The interface of every genera-
tor is simple, in that, when called, it returns a single example. Some of
the strategies used are briefly presented here, to give some insight into
how the generation processes were implemented.
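To illustrate the interface only (the grid format and the task are made up
and not taken from the released generators), a generator sketch for a
hypothetical "mirror the grid horizontally" task might look as follows:

    import random

    def generate_example():
        # Hypothetical generator sketch: a standalone function that, when
        # called, returns a single example, i.e. an input grid together with
        # the corresponding output grid. Real generators follow the
        # transformation logic of their specific ARC task.
        height = random.randint(1, 30)
        width = random.randint(1, 30)
        input_grid = tuple(
            tuple(random.randint(0, 9) for _ in range(width))
            for _ in range(height)
        )
        output_grid = tuple(row[::-1] for row in input_grid)  # mirror rows
        return {'input': input_grid, 'output': output_grid}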
As the notion of objects is crucial to many ARC tasks, generating and
placing objects was frequently required. Typically, to generate a random
object, first the size of the object in terms of its number of pixels was
sampled, then, conditional on some notion of adjacency and a maximum
height and width of the object, neighboring pixels were iteratively added to
the object, and finally the object was colored. To place objects on grids,
commonly one of two strategies was employed. The first is rejection sampling,
i.e. selecting a still unoccupied location and verifying that placing the
object there does not result in unintended occlusions. Alternatively, e.g.
in cases where rejection sampling was expected to result in too high a
failure rate, object locations were determined by sampling from a pre-filtered
set of candidate locations that are guaranteed to work.
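The following sketch illustrates these two steps under simplifying assumptions;
the helper names and details are not taken from the released code. An object
is grown pixel by pixel within a bounding box, and placement is attempted via
rejection sampling on a mutable grid (a list of lists of integers):

    import random

    def random_object(num_pixels, max_height, max_width):
        # Grow an object by iteratively adding 4-neighbors of already chosen
        # pixels, rejecting additions that would exceed the allowed bounding
        # box; assumes num_pixels <= max_height * max_width.
        cells = {(0, 0)}
        while len(cells) < num_pixels:
            i, j = random.choice(tuple(cells))
            di, dj = random.choice(((1, 0), (-1, 0), (0, 1), (0, -1)))
            candidate = cells | {(i + di, j + dj)}
            rows = [r for r, _ in candidate]
            cols = [c for _, c in candidate]
            if max(rows) - min(rows) < max_height and max(cols) - min(cols) < max_width:
                cells = candidate
        color = random.randint(1, 9)
        return {cell: color for cell in cells}

    def place_object(grid, obj, max_tries=100):
        # Rejection sampling: pick a random offset and accept it only if the
        # object lands entirely on unoccupied (zero) cells of the grid.
        h, w = len(grid), len(grid[0])
        for _ in range(max_tries):
            oi, oj = random.randint(0, h - 1), random.randint(0, w - 1)
            locations = [(i + oi, j + oj) for i, j in obj]
            if all(0 <= i < h and 0 <= j < w and grid[i][j] == 0 for i, j in locations):
                for (i, j), color in zip(locations, obj.values()):
                    grid[i][j] = color
                return True
        return False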
Oftentimes, where possible, the requirement for a specific orientation,
e.g. with respect to rotation or mirroring, was abstracted away. To keep
the generation process simple, variation in orientation was introduced by
applying the respective transformation functions to both the input and the
output grid at the end of the generation process, as opposed to accounting
for the possible orientations already while drawing onto the grids, which
would unnecessarily complicate the implementation without adding
functionality.
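A sketch of this idea (again with hypothetical names and a made-up grid
format of tuples of tuples) applies a random number of rotations and an
optional mirroring to both grids once the example is complete; this is only
valid for tasks whose transformation logic is orientation-agnostic:

    import random

    def with_random_orientation(example):
        # Apply the same random rotation/mirroring to both grids at the end
        # of the generation process, instead of handling every orientation
        # while drawing onto the grids.
        def rotate(grid):  # rotate 90 degrees clockwise
            return tuple(zip(*grid[::-1]))

        def mirror(grid):  # mirror horizontally
            return tuple(row[::-1] for row in grid)

        inp, out = example['input'], example['output']
        for _ in range(random.randint(0, 3)):   # 0 to 3 quarter turns
            inp, out = rotate(inp), rotate(out)
        if random.random() < 0.5:               # optional mirroring
            inp, out = mirror(inp), mirror(out)
        return {'input': inp, 'output': out}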
Furthermore, for some tasks the underlying transformation is an invertible
function, and for some of those the inverse is more straightforward to
implement. For such tasks, as opposed to sampling some input grid and then
applying some transformation to it, or, as was very frequently done,
constructing both the input and the output grid in tandem, the output
grid was usually generated first and the inverse transformation was then
applied to arrive at the input grid.
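The sketch below illustrates this inverse-construction strategy for a
hypothetical "crop to the non-zero bounding box" task (not one of the
released generators): the smaller output grid is generated first, and the
easier inverse transformation, padding with zeros, yields the input grid.

    import random

    def generate_example_via_inverse():
        # Generate the output grid first, then apply the (simpler) inverse
        # transformation, i.e. zero-padding, to obtain the input grid.
        out_h, out_w = random.randint(1, 10), random.randint(1, 10)
        output_grid = tuple(
            tuple(random.randint(1, 9) for _ in range(out_w))  # non-zero pixels
            for _ in range(out_h)
        )
        pad_top, pad_bottom = random.randint(0, 5), random.randint(0, 5)
        pad_left, pad_right = random.randint(0, 5), random.randint(0, 5)
        in_w = out_w + pad_left + pad_right
        input_grid = (
            ((0,) * in_w,) * pad_top
            + tuple((0,) * pad_left + row + (0,) * pad_right for row in output_grid)
            + ((0,) * in_w,) * pad_bottom
        )
        return {'input': input_grid, 'output': output_grid}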

4 Example Verification
It was deemed crucial to have some means of ensuring a high quality of
the generated examples. However, assessing the quality of the generated
examples manually is in general highly infeasible, due to their potentially
very large number as well as the human tendency to make mistakes,
especially careless mistakes for more involved examples. Therefore, each
generator is accompanied by a corresponding verifier. A verifier is simply
a task-specific function that transforms any input grid that is valid for
that task into its correct corresponding output grid. Hence, the verifiers
can be used to keep only those generated examples on which the verifier
works, i.e. which are valid. Each verifier also works on all the original
examples of the corresponding task.² The verifiers oftentimes directly build
on the solver programs for the original ARC tasks as provided with the DSL.
The verifiers are written entirely within the ARC-DSL. The median verifier
program consists of 18 DSL primitive calls.

² There are a few exceptions, however: for the tasks a8d7556c, 6cf79266,
469497ad, 9edfc990 and e5062a87, either the original examples contained bugs
or the task logic has been slightly adapted, and for the tasks 97a05b5b,
4290ef0e, 7e0986d6, 53b68214, 29ec7d0e and a64e4611 there is one example each
in the original task for which the verifier fails.
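A sketch of how a generator/verifier pair might be combined to collect only
valid, unique examples (the function is illustrative and not part of the
released code):

    def generate_verified_examples(generate, verify, n):
        # Keep only generated examples on which the verifier reproduces the
        # output grid, and discard duplicates. Assumes grids are hashable
        # (e.g. tuples of tuples) and that generate() returns a dict with
        # 'input' and 'output' keys.
        examples, seen = [], set()
        while len(examples) < n:
            example = generate()
            key = (example['input'], example['output'])
            if key in seen:
                continue
            try:
                valid = verify(example['input']) == example['output']
            except Exception:
                valid = False  # a failing verifier marks the example as invalid
            if valid:
                seen.add(key)
                examples.append(example)
        return examples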

5 Example Difficulty
While novelty is an important factor co-determining difficulty, it is not di-
rectly considered during generation, as it is context-dependent. Instead,
an emphasis is put on cardinalities, a more explicit proxy measure for dif-
ficulty: Arguably, examples involving more elements to deal with (rows,
pixels, symbols, objects, etc.) tend to be more difficult. Each gener-
ator allows passing parameters to control the example difficulty in this
way. Example difficulty here is defined by an interval within the range [0,
1], which controls the uniform random number generation used for sampling
cardinalities. For example, if an input grid may be of any height 1 through
30 and bounds [1/3, 1] are passed as parameters, then any height in the
range [10, 30] may be sampled; or if there may be between two and five
objects and bounds [0, 1/2] are used, then there will be either two or three
objects. The interval is used in any place where sampling larger values
generally corresponds to greater difficulty, which may concern anything from
the grid height and width, the number of symbols or objects, the sizes of
objects, and the number of symbols per object, to distances to travel, etc.
Hence, intervals closer to 1 and further from 0 tend to result in more
difficult examples. This sampling is achieved via a simple wrapper function
around the random function of the random module. Robustly controlling
example difficulty was often nontrivial. For instance, given a large upper
bound for the difficulty, setting the upper bound for the number of objects
too high may lead to extremely inefficient object sampling or placement.
There were also trade-offs, e.g. between the number of objects and their
sizes: both individually arguably increase example difficulty, but having
both at their respective maximum values is impossible.
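A minimal sketch of such a wrapper is given below. The exact formula used in
the released code may differ; this version is merely chosen to be consistent
with the two examples above (heights 1 through 30 with bounds [1/3, 1] yield
values in [10, 30], and two to five objects with bounds [0, 1/2] yield two or
three objects).

    import math
    import random

    def sample_cardinality(lower, upper, diff_lb, diff_ub):
        # Hypothetical difficulty wrapper: draw a uniform number from the
        # difficulty interval [diff_lb, diff_ub] and map it onto the integer
        # range [lower, upper], so that intervals closer to 1 tend to yield
        # larger cardinalities and hence more difficult examples.
        u = random.uniform(diff_lb, diff_ub)
        value = lower - 1 + math.ceil(u * (upper - lower + 1))
        return min(upper, max(lower, value))

    # E.g. sample_cardinality(1, 30, 1/3, 1) yields heights in [10, 30], and
    # sample_cardinality(2, 5, 0, 0.5) yields two or three objects.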
The main motivation for allowing this somewhat explicit control over
example difficulty bounds was to enable assessing the within-task
generalization capabilities of models: can a model also solve other (more
difficult) examples from the same task? This generalization capability is
not to be confused with the much more relevant notion of across-task
generalization: can a model also solve different (but related) tasks?
Controlling for example difficulty via these parameters could be achieved
by generating levels, in the simplest case two sets of examples, the easier
of which uses the interval [0, 0.5] and the harder of which uses the interval
[0.5, 1], as sketched below. However, creating too many levels of difficulty
may not always work out well for every task, since the ranges of possible
values may be too small, which would render the pruned range empty. A
further issue with creating many levels is that in many cases some degrees
of freedom are not independent: e.g. if the grid is smaller, then the upper
bound for the number of objects that can be placed is also smaller, so too
small a range per level would not allow generating certain instances in the
first place. The aim was to not over-engineer things while still retaining
a uniform interface and a large sample space for all tasks.
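For illustration, a hedged sketch of such level construction is given below;
it assumes a generator callable that accepts a pair of difficulty bounds,
which is a hypothetical signature and may differ from the released code.

    def generate_levels(generator, k, per_level):
        # Split the difficulty range [0, 1] into k consecutive intervals and
        # generate one set of examples per interval (easiest level first).
        levels = []
        for level in range(k):
            diff_lb, diff_ub = level / k, (level + 1) / k
            levels.append([generator(diff_lb, diff_ub) for _ in range(per_level)])
        return levels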
While this means of controlling example difficulty could allow for
tailored datasets or curriculum building during the generation process, it
is, for the reasons explained above, brittle to some degree. Hence, it is
suggested, depending on the experimental setup, to also or instead consider
the alternative of controlling for example difficulty in a post-hoc manner,
where examples are filtered or ordered after the generation based on some
other metrics. Two metrics of difficulty, each in the range [0, 1], are
provided. Firstly, the RNG-Difficulty, defined as the average of the numbers
sampled from the provided bounds in the process of generating an example, is
heavily flawed for a number of reasons: for example, sub-processes may fail,
and the numbers sampled there still influence the metric; also, the random
number influencing the number of objects has the same weight as, e.g., the
random number influencing the number of pixels of an individual object.
Secondly, the PSO-Difficulty is defined as (P/1800 + S/10 + O/P)/3, where P
is the number of pixels, S the number of symbols and O the number of objects,
and an object is defined as a set of pixels of the same symbol that are all
horizontally or vertically connected. Note that these metrics are mainly
intended to serve as inspiration; the metrics used for grouping or ordering
the data should be tailored to the questions that the experiments at hand
try to address.
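A sketch of how the PSO-Difficulty might be computed for an example is given
below. It assumes (this is an assumption, suggested by the normalizer 1800,
i.e. two 30x30 grids) that P, S and O are aggregated over both the input and
the output grid; the released code may compute the metric differently.

    def count_objects(grid):
        # Count maximal sets of equally colored pixels that are horizontally
        # or vertically connected (no special treatment of the background).
        h, w = len(grid), len(grid[0])
        seen, count = set(), 0
        for si in range(h):
            for sj in range(w):
                if (si, sj) in seen:
                    continue
                count += 1
                color = grid[si][sj]
                stack = [(si, sj)]
                seen.add((si, sj))
                while stack:
                    i, j = stack.pop()
                    for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                        if (0 <= ni < h and 0 <= nj < w
                                and (ni, nj) not in seen and grid[ni][nj] == color):
                            seen.add((ni, nj))
                            stack.append((ni, nj))
        return count

    def pso_difficulty(example):
        # PSO-Difficulty = (P/1800 + S/10 + O/P) / 3, with pixels P, symbols S
        # and objects O assumed to be counted over both grids of the example.
        grids = (example['input'], example['output'])
        P = sum(len(g) * len(g[0]) for g in grids)
        S = len({v for g in grids for row in g for v in row})
        O = sum(count_objects(g) for g in grids)
        return (P / 1800 + S / 10 + O / P) / 3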
For about a third of the tasks, the number of symbols that occur in
each example is fixed. Over half of the tasks, however, have at least 6
different values for the number of symbols an example may take on. Such
tasks could be used to conduct experiments investigating the capability of
models to generalize to examples containing n symbols after having been
trained on examples containing fewer than n symbols. Furthermore, almost all
of the tasks have at least 10 different values for the number of pixels an
example may take on, and the vast majority of tasks have at least 10
different values for the number of objects an example may take on (using
a simplistic uniform objectness notion of treating each horizontally or
vertically connected set of pixels of a single color as an object), hence
experiments assessing generalization capabilities along those degrees of
freedom may also be feasible for respectively adequate subsets of tasks.

6 Limitations
Note that the true underlying transformation logic, arguably the only or
at least by far the most crucial defining characteristic of a task, is not
well-defined. The overall aim was to have the widest possible interpretation
of the transformation logic for any given task, such that most individuals
only familiar with the original examples could solve any generated exam-
ple within three guesses. However, this is not guaranteed. The generalized
task is simply defined as the set of pairs of input and output grids pro-
ducible by the generator for which the verifier function, when applied to
the input grid, returns the output grid. Importantly, this does not mean
that the verifier would work for every conceivable example a human might
associate with that task, nor every example producible by the generator.
Moreover, it does not imply that every example that works with the veri-
fier can necessarily be produced by the generator, nor does it ensure that
the generated examples are solvable by humans. Still, providing verifiers
should add a lot of credibility to the data generation process.
While for most generators all or the vast majority of producible
examples are valid, there are some with a relatively low sample efficiency in
terms of the percentage of generated examples that are valid. Generation
of new verified examples for a given task may also be slow for various
other reasons, such as a slow generation process itself, a small sample
space resulting in more frequent duplicate examples, or slow verification.
Still, approximately 1’000 verified unique examples per second can be
generated for the median task.

References
[1] François Chollet. On the Measure of Intelligence. 2019. URL: [Link]

[2] François Chollet. The Abstraction and Reasoning Corpus. 2019. URL: [Link]

[3] Michael Hodel. Domain Specific Language for the Abstraction and Reasoning Corpus. 2023. URL: [Link]
