GraphText

ABSTRACT

arXiv:2310.01089v1  2 Oct 2023
Large Language Models (LLMs) have gained the ability to assimilate human
knowledge and facilitate natural language interactions with both humans and other
LLMs. However, despite their impressive achievements, LLMs have not made
significant advancements in the realm of graph machine learning. This limitation
arises because graphs encapsulate distinct relational data, making it challenging
to transform them into natural language that LLMs understand. In this paper,
we bridge this gap with a novel framework, GraphText, that translates graphs
to natural language. GraphText derives a graph-syntax tree for each graph that
encapsulates both the node attributes and inter-node relationships. Traversal of the
tree yields a graph text sequence, which is then processed by an LLM to treat graph
tasks as text generation tasks. Notably, GraphText offers multiple advantages.
It introduces training-free graph reasoning: even without training on graph data,
GraphText with ChatGPT can achieve performance on par with, or even surpassing,
that of supervised-trained graph neural networks through in-context learning
(ICL). Furthermore, GraphText paves the way for interactive graph reasoning,
allowing both humans and LLMs to communicate with the model seamlessly using
natural language. These capabilities underscore the vast, yet-to-be-explored
potential of LLMs in the domain of graph machine learning.
1 INTRODUCTION
Language stands as a cornerstone of human civilization, acting as the primary medium for knowl-
edge encoding, reasoning, and communication. Large language models (LLMs), pre-trained on
extensive text corpora, have showcased remarkable reasoning skills (Brown et al., 2020; Bubeck
et al., 2023). These LLMs can communicate via natural language both internally (Wei et al., 2022)
and externally with humans or other LLMs (Li et al., 2023), demonstrating exceptional skills such
as multi-step reasoning (Yao et al., 2023a), decision-making (Yao et al., 2023b; Liang et al., 2023),
tool use (Schick et al., 2023), and multi-agent collaboration (Park et al., 2023; Hong et al., 2023).
Motivation. Despite the remarkable success of LLMs in handling natural languages, their appli-
cation to other data modalities presents unique challenges, primarily because these data often lack
straightforward transformation into sequential text. These challenges are especially severe when
dealing with graph-structured data, as different graphs define structure and features in distinct ways.
Therefore, existing efforts within the graph machine learning field commonly require the training of
specific graph neural networks (GNNs) tailored to individual graphs (Kipf & Welling, 2017; Velick-
ovic et al., 2018; Xu et al., 2019). Often, models trained on one graph cannot generalize to the
unseen structure and feature representations of other graphs. Moreover, the gap between graphs and
human languages hinders the application of natural language reasoning to facilitate graph reasoning.
In light of these limitations, a question arises: can we derive a language for graphs in natural
language? In this paper, we give an affirmative answer by proposing to use a tree as an intermediary,
elegantly bridging structured data and one-dimensional sequential language. Essentially, a tree
exhibits a hierarchical structure, and traversing it yields a one-dimensional sequence. On top of that,
as shown in Figure 1 (c), we propose a novel framework, GraphText, which takes graph data to
build a graph-syntax tree. Traversing it results in a graph prompt expressed in natural language,
allowing an LLM to approach graph reasoning as a text-generation task.
Preprint
[Figure 1: (a) Graph Learning in Graph-specific Space; (b) Graph Learning in Shared Text Space; (c) GraphText, showing a graph prompt with center-node, 1st-hop, and 2nd-hop entries (e.g. "center-node: [0]", "1st-hop: [1, 2]", "2nd-hop: [3, 2]") and the question "What's the category of the node 0 (choose from [A, B])?" answered by the LLM.]
Figure 1: Comparison between (a) the GNN framework and (b) the proposed GraphText framework.
For different graphs G1 and G2, different GNNs θ1 and θ2 are trained to make graph-specific
output predictions in continuous form. In contrast, GraphText encodes the graph information into
text sequences Tin(1) and Tin(2), and generates text reasoning and predictions Tout(1) and Tout(2) with a
graph-shared LLM ϕ. GraphText leverages a pre-trained LLM to perform training-free graph
reasoning and enables human and AI interaction for graph reasoning in natural language. (c) An
example of the GraphText framework that classifies node 0: given a graph, GraphText constructs
a graph-syntax tree that contains both node attributes (e.g. feature and label) and relationships (e.g.
center-node, 1st-hop, and 2nd-hop). Then, GraphText traverses the graph-syntax tree to obtain a
sequential text, i.e. a graph prompt, and lets the LLM perform graph reasoning in text space.
Main contributions. First, GraphText serves as a flexible and general framework for graph reasoning.
It can incorporate common inductive biases of GNNs, such as feature propagation and feature
similarity-based propagation, simply by constructing different graph-syntax trees. It also serves as a
general framework for graph reasoning for both in-context learning and instruction tuning, on both
general graphs and text-attributed graphs. Second, we show that GraphText enables the possibility
of training-free graph reasoning. The training-free property enables us to deploy GraphText
not only with open-source LLMs, but also with powerful closed-source LLMs. Remarkably, even
without training on graph data, GraphText with ChatGPT can deliver performance on par with,
or even surpassing, supervised graph neural networks through in-context learning. This highlights
the vast potential of foundation models in the realm of graph machine learning. Third, GraphText
fosters interactive graph reasoning: with its capacity to generate and explain predictions in natural
language, humans can directly engage with GraphText. As shown in Figure 2 (b), through
interactions with humans and other LLMs, GraphText refines its graph reasoning capabilities.
2 METHODOLOGY
In this section, we introduce GraphText to perform graph reasoning in text space. Of the three
fundamental problems of graph ML (graph classification, node classification, and link prediction),
we take node classification as an example to introduce our idea. We note, however, that our discussion
applies to other graph tasks.
Let us be given an attributed graph G = (V, E, X) with nodes V and edges E, whose structure is
represented as the |V| × |V| adjacency matrix A and node attributes as the |V| × d feature matrix X.
Given a subset L ⊂ V of labeled nodes with labels YL, the goal of node classification is to predict
the labels YU of the unlabeled nodes U = V \ L. Graph Neural Networks (GNNs) are the standard
architecture for such problems. As shown in Figure 1 (a), a GNN directly learns a parametric map
ŷi = fGNN(G; θG)i    (1)
between the input graph G ∈ G and the output labels Ŷ ∈ Y, assigning to each node i its predicted
label ŷi. The training of a GNN attempts to find parameters θG such that ŷi ≈ yi on the training set.
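The graph-specific map of Eq. (1) can be sketched as a minimal GCN-style layer in plain NumPy. The toy graph, feature dimension, and random weights below are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

def gcn_layer(A, X, W):
    # Symmetrically normalize A with self-loops, then propagate and transform (ReLU).
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0.0)

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)       # 4-node toy graph
X = rng.normal(size=(4, 3))                     # node feature matrix
W1, W2 = rng.normal(size=(3, 8)), rng.normal(size=(8, 2))  # theta_G
logits = gcn_layer(A, X, W1) @ W2               # graph-specific parametric map
y_hat = logits.argmax(axis=1)                   # predicted label per node
print(y_hat.shape)  # (4,)
```

Because the weights are tied to this graph's feature dimensionality and label space, the trained map cannot be reused on a graph with different features, which is exactly the generalization problem discussed next.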
[Figure 2: (a) few-shot demonstration prompts with center-node pseudo labels, PPR pseudo-label sequences, and topics; (b) a human-interaction dialogue in which the LLM's prior ("center-nodes are vital") is updated toward ranked PPR labels; (c) LLM behaviors after demonstrations/interaction: an updated graph prior, confusion, or refusal.]
Figure 2: (a) Few-shot demonstrations (blue) and the target node #2188 (green) to predict on Cora.
(b) An illustration of how human interaction changes the graph reasoning of an LLM, where the
LLM previously held the prior that the center-node is vital. (c) Behaviors of LLMs after being given
demonstrations/human interaction: updating the graph prior to weight PPR (personalized pagerank)
more heavily, or lapsing into confusion or refusal. Details are discussed in Section 4.2.
Note that standard GNNs are graph-specific functions, i.e. fGNN(·; θG): G → Ŷ, which do not
generalize to other graphs, since other graphs G′ ∈ G define distinct distributions of Y′, A′, and
X′, or even different types of features such as continuous, categorical, or text features.
To solve the generalization problem mentioned above, this paper proposes to perform graph
reasoning as a text-to-text problem (Raffel et al., 2020), as shown in Figure 1 (b). Inspired by prompt
tuning (Brown et al., 2020; Liu et al., 2023), we construct two graph-specific maps to form the input
and output space of a text-to-text problem: a map g: G → Tin that maps the graph input to text
space, and a map h: Tout → Ỹ that maps the output of the LLM to label predictions Ỹ. In this way,
we can use a generative large language model fLLM to perform graph reasoning as
ỹi = h(fLLM(g(G)i; ϕ))    (2)
where g(G)i = Tin[i] denotes the text sequence representing node i. Different from GNNs,
fLLM(·; ϕ): T → T is a graph-shared function, where both input and output are in text space, i.e.
Tin, Tout ∈ T, which not only activates the parametric knowledge encoded in the model fLLM(·; ϕ),
but also enables interactions between human and AI agents to facilitate graph reasoning.
Specifically, as node classification, link prediction and graph classification are essentially
classification tasks, we can naturally formulate these graph reasoning tasks as a multi-choice QA
problem (Robinson & Wingate, 2023) and design h as the map from the predicted choice Tout ∈ T to
the corresponding prediction Ỹ. However, the design of g, which maps the structural graph
information into the text space of natural language, is still a non-trivial problem.
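The output map h for the multi-choice QA formulation can be sketched as a small parser that extracts the predicted choice letter from the LLM's answer text. The option names and the regex below are illustrative assumptions, not the paper's exact parser:

```python
import re

def h(llm_output: str, choices: dict):
    # Map T_out -> Y~: find the first standalone capital letter that is a valid choice.
    for tok in re.findall(r"\b([A-Z])\b", llm_output):
        if tok in choices:
            return choices[tok]
    return None  # confused/refused: no parsable choice in the answer

# Hypothetical label options for a citation-network node (assumed names).
choices = {"A": "Neural_Networks", "B": "Theory"}
print(h("Hence, the answer is A.", choices))              # Neural_Networks
print(h("I can't predict a reasonable answer.", choices)) # None
```

Returning None for unparsable answers matches the "confusion/refusal" behaviors tallied later in Table 2.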
The primary challenge in converting graph data to language lies in handling its relational structure,
which fundamentally deviates from the one-dimensional sequential nature of text data. Inspired
by linguistic syntax trees (Chiswell & Hodges, 2007), we introduce graph-syntax trees as a bridge
between relational and sequential data. The traversal of such a tree produces a sentence in natural
language, which is fed to an LLM for graph reasoning. Specifically, as shown in Figure 1 (c), we
compose a graph-syntax tree consisting of node text attributes and inter-node relationships. Next,
we describe how to compose the node text attributes and inter-node relationships in Section 2.2, and
how to build a graph-syntax tree in Section 2.3.
A graph-syntax tree is composed of both textual and relational information derived from the graph.
For textual information, GraphText constructs a text attribute set F ∈ T for an arbitrary graph
G ∈ G (with or without text attributes), composed of multiple types of attributes for each node,
e.g. feature and label, in natural language. Specifically, for each node vi and feature type m, we
We now describe how to build a graph prompt using a graph-syntax tree of the graph text attributes
and relationships F and R. By analogy to the syntax tree in linguistics, we define a graph-syntax tree
as an ordered tree: a directed acyclic graph (DAG) with nodes T̃ ∈ T and edges Ẽ. In a graph-syntax
tree, e.g. the one in Figure 1 (c), each node stores a text sequence in natural language, where the root
node is an empty node; the leaf nodes T̃L are text sequences in the graph text attributes, i.e. ∀ Ti ∈
T̃L, Ti ∈ F; the internal nodes T̃I are text sequences in natural language, i.e. ∀ Ti ∈ T̃I, Ti ∈ T. A
graph-syntax tree is constructed in three steps: (1) construct an ego-subgraph (Hamilton et al., 2017)
Gi for target node vi based on the relationships R; (2) select leaf nodes T̃L based on the relationships
R; (3) build up internal nodes T̃I and edges Ẽ based on the leaf nodes' types and their relationship with
the graph1. Notably, the leaf nodes are sorted according to their relationships with the center-node,
preserving the relative relationship in a one-dimensional order.
We illustrate this with the node classification example shown in Figure 1 (c). Before building
the graph-syntax tree, GraphText determines the text attributes, composed of raw features
and observed labels, i.e. F = {FX[i], FY[i] | ∀vi ∈ V}, and a relationship set
determined by shortest path distance (SPD): center-node, 1st-hop, and 2nd-hop,
i.e. R = {RSPD=0, RSPD=1, RSPD=2}. Then, for target node vi (0 in the example), an ego-subgraph
(Hamilton et al., 2017) (with nodes [0, 1, 2, 3, 4]) is sampled based on the relative relationships
between vi and the other nodes. Finally, a graph-syntax tree is constructed with leaf nodes
T̃L = {FX[0], FX[1], FX[2], FX[3], FX[4], FY[1], FY[3]}, internal nodes T̃I = {"center-node", "1st-
hop", "2nd-hop", "label", "feature"}, and the corresponding edges. The traversal of the resulting
graph-syntax tree leads to a text sequence in natural language.
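The construction and traversal above can be sketched with nested dictionaries: internal nodes are attribute/relation names, leaves hold the node text attributes, and a depth-first traversal emits the graph prompt of Figure 1 (c). The exact nesting and values are simplified assumptions:

```python
# Root is an empty node; top hierarchy is attribute type, bottom is relation type.
syntax_tree = {
    "feature": {
        "center-node": [0],       # F_X of the target node
        "1st-hop": [1, 2],        # F_X of 1st-hop neighbors
        "2nd-hop": [3, 2],        # F_X of 2nd-hop neighbors
    },
    "label": {
        "1st-hop": ["A"],         # observed labels F_Y (assumed values)
        "2nd-hop": ["B"],
    },
}

def traverse(tree, depth=0):
    # Depth-first traversal: internal nodes print their name, leaf lists print
    # as bracketed sequences, yielding a YAML-like graph prompt.
    lines = []
    for key, child in tree.items():
        if isinstance(child, dict):
            lines.append("  " * depth + f"{key}:")
            lines.extend(traverse(child, depth + 1))
        else:
            lines.append("  " * depth + f"{key}: {child}")
    return lines

prompt = "\n".join(traverse(syntax_tree))
print(prompt)
```

Because dictionaries preserve insertion order, the leaf ordering (center-node before 1st-hop before 2nd-hop) survives the traversal, which is how the tree preserves relative relationships in one dimension.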
Compared with the direct flattening of (sub)graphs (Wang et al., 2023a; Chen et al., 2023), using
a graph-syntax-tree-based prompt has many advantages. Above all, unlike a graph, which has no
topological order, a syntax tree is a DAG that can be topologically sorted, which gracefully converts
a relational structure into a sequence of nodes. Moreover, GraphText easily incorporates the
inductive biases of GNNs through the construction of node text attributes F and relationships R. For
example, we can easily encode the feature-propagation mechanism of GNNs by including a text
attribute derived from the propagated features A^k X (Zhang et al., 2022) in the node attributes F.
We can also incorporate feature similarity-based aggregation (Velickovic et al., 2018) by adding
XX⊤ to R. These graph-based inductive biases can significantly boost LLMs' graph reasoning
performance (further discussed in Section 4.1). Last but not least, a tree naturally defines a
hierarchical structure, which LLMs are proficient at reasoning over (Liang et al., 2023), owing to
training on code data (Chen et al., 2021) and web page data (Touvron et al., 2023).
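The two inductive biases named above can be sketched in NumPy under toy-data assumptions: propagated features A^k X as an extra text attribute, and a feature-similarity relation derived from XX⊤ (top-k most similar nodes per node):

```python
import numpy as np

def propagated_features(A, X, k=2):
    # Row-normalize A with self-loops, then propagate k times (SGC/SIGN-style).
    A_hat = A + np.eye(A.shape[0])
    P = A_hat / A_hat.sum(axis=1, keepdims=True)
    out = X
    for _ in range(k):
        out = P @ out
    return out

def similarity_relation(X, topk=2):
    # For each node, indices of its top-k most similar other nodes under XX^T.
    S = X @ X.T
    np.fill_diagonal(S, -np.inf)   # exclude self-similarity
    return np.argsort(-S, axis=1)[:, :topk]

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # toy path graph
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
print(propagated_features(A, X).shape)    # (3, 2)
print(similarity_relation(X, topk=1))     # most similar node per row
```

The propagated features would be discretized into text tokens before entering F, while the similarity indices define an additional relation added to R.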
1 The hierarchy of the tree can be defined flexibly, but we have empirically discovered that a simple
configuration, with attribute type at the top hierarchy and relation type at the bottom hierarchy for internal
nodes, as illustrated in Figure 1 (c), yields strong performance. Further details are available in Section 4.3.
3 RELATED WORK
Unlock Graph Space for Language Models. Large Language Models (LLMs) (Brown et al.,
2020; OpenAI, 2023; Anil et al., 2023; Bubeck et al., 2023) possess impressive reasoning capabili-
ties (Wei et al., 2022; Yao et al., 2023a; Fu et al., 2023). At the heart of LLMs’ reasoning prowess is
their ability to process and generate natural language inputs and outputs, enabling flexible interac-
tions (Dohan et al., 2022) with both humans and AI agents. This unique capability empowers them
with remarkable abilities such as complex reasoning (Fu et al., 2023) and decision-making (Yao
et al., 2023b; Liang et al., 2023). Despite their success, applying LLMs to relational graph data
remains challenging, primarily due to the absence of a natural language representation for graphs.
GraphText bridges this gap by providing a novel framework that enables LLMs to seamlessly
integrate and reason over relational graph data using the same natural language capabilities, thereby
unlocking their potential for a wide range of graph-based applications.
Training-free Graph Reasoning. Graph neural networks (GNNs) (Kipf & Welling, 2017; Xu
et al., 2019) excel in handling relational graph data, thanks to the message-passing mechanism for
aggregation and transformation of neighborhood representations. Their standout performance can
be attributed to their intrinsic capability to assimilate graph inductive biases. This incorporation
of biases is achieved by designing representations with the graph structure in perspective, such as
position embeddings (Dwivedi et al., 2022; Ying et al., 2021; Kreuzer et al., 2021) and propagated
features (Wu et al., 2019; Zhang et al., 2022). Furthermore, they can introduce diverse aggregation
methods, like feature similarity-based message passing (Velickovic et al., 2018; Zhao et al., 2021)
or high-order aggregation (Klicpera et al., 2019; Bojchevski et al., 2020; Chien et al., 2021). How-
ever, as highlighted in Section 2.1, due to the variance in both structure and feature, the majority of
GNNs are graph-specific. They are tailored for a particular graph type with consistent features and
structures, thus posing challenges for generalization to different graphs.
In a parallel vein, GraphText also taps into the potent ability to infuse graph inductive biases for
graph reasoning, achieved through designing both the textual and relational aspects of the graph-
syntax tree. Setting itself apart from GNNs, GraphText approaches graph reasoning in a graph-
shared domain, facilitating the broader applicability of a single LLM to diverse graphs and offering
training-free and interactive graph reasoning.
Connecting Both Worlds. Recent endeavors (Chien et al., 2022; Zhao et al., 2023; He et al., 2023)
have aimed to merge the language and graph domains. Most methods involve transitioning the prob-
lem into a graph-specific realm, utilizing a combination of a text-encoder (either pre-trained (Chien
et al., 2022) or learned (Li et al., 2021)) and a GNN predictor. This methodology still falls into a
graph-specific paradigm. Very recently, concurrent works (Guo et al., 2023; Ye et al., 2023;
Wang et al., 2023a; Chen et al., 2023) have explored leveraging LLMs for graph-related tasks. These
methods either directly flatten the nodes and edges (Guo et al., 2023; Wang et al., 2023a) or employ
rule-based prompts on text-attributed graphs (Chen et al., 2023; Ye et al., 2023).
Nevertheless, GraphText is fundamentally different from these works. Foremost, GraphText
proposes a language defined by a graph-syntax tree, offering a flexible and structured approach for
seamlessly integrating graph inductive biases. Moreover, it also serves as a general framework for
graph reasoning, which can be applied to scenarios encompassing in-context learning and instruction
tuning. It accommodates various types of graphs, including general graphs and text-attributed
graphs, and is adaptable to both closed-source Large Language Models (LLMs) (OpenAI, 2023;
Bubeck et al., 2023) and open-source LLMs (Touvron et al., 2023).
4 EXPERIMENTS
We conduct extensive experiments to demonstrate the effectiveness of GraphText. Firstly, in
Section 4.1, we delve into the remarkable capacity of GraphText for training-free graph reasoning.
Subsequently, Section 4.2 highlights the interactive graph reasoning capabilities of GraphText.
We further analyze various ablations of graph-syntax trees in Section 4.3. Concluding our
exploration, Section 4.4 illustrates how GraphText can seamlessly function as a versatile framework,
catering to both in-context learning and instruction tuning on both general graphs and text-
attributed graphs.
[Figure 3: accuracy vs. number of shots (1, 3, 5, 10, 15, 20) on Citeseer and Texas for GCN, GAT, GCNII, GATv2, and GraphText.]
Figure 3: Few-shot in-context learning node classification accuracy. We perform 1, 3, 5, 10, 15, and
20-shot node classification on Citeseer and Texas datasets.
One unique ability of GraphText is training-free graph reasoning via in-context learning (ICL).
In this section, we demonstrate this capability on node classification tasks. Specifically, we use
two citation datasets (Cora (McCallum et al., 2000) and Citeseer (Giles et al., 1998)) and three
webpage datasets (Texas, Wisconsin, and Cornell (Pei et al., 2020)). A detailed discussion of the
experimental settings and the dataset statistics can be found in Appendices A.1 and A.2.
We selected standard GNNs, including GCN (Kipf & Welling, 2017) and GAT (Velickovic et al.,
2018), along with their more recent variants GCNII (Chen et al., 2020) and GATv2 (Brody et al.,
2022), as our baselines. These GNN baselines are supervised and specific to individual graphs,
trained solely for inference on one dataset. In contrast, GraphText utilizes a single pre-trained
LLM (ChatGPT) for all datasets without any graph-specific training.
Given an input graph, GraphText constructs a graph-syntax tree by incorporating two types of
information: text attributes and relations. We utilize three types of text attributes for the graph:
observed labels (referred to as 'label'), features generated by K-means clustering of X (referred to
as 'feat'), and synthetic text (referred to as 'synth.') derived from feature and label propagation.
Additionally, we employ two types of relations: the original graph structure (A with self-loops)
and synthetic relations based on feature similarity, shortest-path distance, and personalized pager-
ank (Klicpera et al., 2019). The hyperparameter details are provided in Appendix A.3.
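The personalized-pagerank relation mentioned above can be sketched by power iteration for a single target node. The teleport probability and iteration count below are illustrative, not the paper's tuned hyperparameters:

```python
import numpy as np

def personalized_pagerank(A, target, alpha=0.15, iters=50):
    # Row-stochastic transition matrix over the adjacency.
    P = A / np.maximum(A.sum(axis=1, keepdims=True), 1e-12)
    e = np.zeros(A.shape[0]); e[target] = 1.0     # teleport vector at the target
    pi = e.copy()
    for _ in range(iters):
        pi = (1 - alpha) * (P.T @ pi) + alpha * e
    return pi

A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)        # toy 4-node graph
ppr = personalized_pagerank(A, target=0)
# Rank nodes by PPR importance, most crucial first (as in the PPR prompts).
order = np.argsort(-ppr)
print(order[0])  # the target node itself ranks first here
```

The ranked order is what determines leaf ordering for the 'ppr' relation, so the first PPR neighbor in the prompt is the most important one.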
Experimental results are depicted in Table 1 and Figure 3. We observe that directly flattening raw text
(labels) using relations, as proposed in (Chen et al., 2023), results in poor performance, occasionally
worse than random. Incorporating discretized features into text attributes improves the performance
slightly.
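The discretized 'feat' attribute can be sketched as K-means cluster assignments rendered as short text tokens. The tiny pure-NumPy Lloyd's iteration, the farthest-first initialization, and the toy data below are assumptions for illustration, not the paper's exact pipeline:

```python
import numpy as np

def kmeans_labels(X, k=2, iters=20):
    # Farthest-first initialization keeps this sketch deterministic.
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        # Assign each point to its nearest center, then recompute centers.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

X = np.array([[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 4.9]])
labels = kmeans_labels(X, k=2)
# Each node's continuous feature becomes a short discrete token for the prompt.
texts = [f"feature: [c{l}]" for l in labels]
print(texts)
```

The cluster IDs act as a small vocabulary of feature tokens that an LLM can compare across nodes, which is what makes discretized features usable in a text prompt.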
Integrating the graph inductive bias into both text attributes and relations enhances performance.
The addition of synthetic relationships significantly boosts performance across all datasets, indi-
cating that the raw graph lacks sufficient information and requires augmentation. This observa-
tion aligns with findings in graph structure learning literature (Franceschi et al., 2019; Zhao et al.,
2021). Furthermore, the inclusion of synthetic text attributes is beneficial in most cases. Ultimately,
Table 2: Interactive graph reasoning results (accuracy %) on Cora (node #2188). The table show-
cases the performance of GPT-4 and ChatGPT before and after human interaction, over 15
evaluation runs. The reasoning metrics include PPR, Center-node, and instances where the model
was Confused or Refused (Conf./Ref.) when making its reasoning/prediction. See Figure 2 (c)
for details.
Model     Interaction  Accuracy      PPR    Center-node  Conf./Ref.
GPT-4     Before       73.3          73.3   26.7         0
GPT-4     After        100 (+26.7)   100    0            0
ChatGPT   Before       26.7          26.7   53.3         20.0
ChatGPT   After        63.6 (+36.9)  72.7   18.2         9.1
the combination of synthetic text attributes and synthetic relations yields the highest accuracy for
GraphText in four out of five datasets.
Remarkably, even though it is not trained on graph data, GraphText surpasses several GNN base-
lines, particularly when the label rate is low (see Figure 3) and on heterophilic datasets. This is
due to the fact that, in contrast with standard GNNs, GraphText decouples depth and scope in
graph reasoning (Zeng et al., 2021). The strong performance of GraphText in training-free graph
reasoning highlights the substantial potential of leveraging LLMs in graph machine learning.
In this section, we illustrate that GraphText facilitates effective interactive graph reasoning:
through its ability to generate and clarify predictions in natural language, both humans and LLMs
can directly interact with GraphText.
To illustrate this concept, we will use Cora node #2188. Figure 2 (a) shows two types of text at-
tributes we use: the center-node pseudo labels and the PPR (Personalized PageRank) pseudo label
sequence, where the first PPR neighbor denotes the most important label prediction. Upon examin-
ing the demonstrations (marked in blue), it becomes apparent that the PPR pseudo-labels provide a
more robust mechanism for paper topic prediction. Utilizing either a count of PPR labels followed
by a majority vote, or merely referencing the foremost PPR label, consistently results in the correct
categorization in the given examples. Hence, based on these graph inductive biases derived from
the samples, we can reasonably infer that the correct topic of the target paper should be A, which
is not only the first entry but also the predominant label in the PPR pseudo-label sequence.
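The two readout heuristics described above for the PPR pseudo-label sequence can be sketched directly; the sequence is the example's, and both heuristics agree here:

```python
from collections import Counter

ppr_labels = ["A", "G", "A", "E"]   # target node's PPR pseudo-labels, most important first

# Heuristic 1: majority vote over the PPR pseudo-labels.
majority = Counter(ppr_labels).most_common(1)[0][0]
# Heuristic 2: simply take the foremost (most crucial) PPR label.
foremost = ppr_labels[0]
print(majority, foremost)  # A A
```

That both heuristics yield A is what makes the PPR attribute a more robust signal than the center-node pseudo label in this example.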
We leverage GraphText with ChatGPT and GPT-4 to perform graph reasoning on the provided
example. Their respective reasoning processes and outcomes are illustrated in Figure 2 and
summarized in Table 2, from which we draw several key insights:
1. LLMs inherently possess the knowledge and inductive bias toward graph reasoning. Specif-
ically, both ChatGPT and GPT-4 acknowledge the importance of center-nodes and sometimes make
predictions based on center-node labels. ChatGPT exhibits reasoning with the center-node bias
53.3% of the time, while GPT-4 does so at a rate of 26.7%.
2. LLMs can adjust their prior inductive bias based on demonstrations. Through in-context
learning, the LLMs behind GraphText can recalibrate their bias and make more accurate predictions.
Our observations indicate that GPT-4 significantly outperforms ChatGPT, achieving an accuracy of
73.3%, markedly superior to ChatGPT's 26.7%.
3. LLMs can adapt their prior inductive bias based on human feedback. Figure 2 (b) provides
an illustrative example, and the LLM's detailed reasoning can be found in Appendix C. Specifically,
after human interaction, GPT-4 shows remarkable adaptability, achieving a perfect accuracy of
100% and adhering to the PPR logic. Meanwhile, ChatGPT also enhances its performance notably
(gaining 36.9% in accuracy), but occasionally maintains its prior biases.
(The full example can be found in Figure 1.) (c-f) Text prompts of different tree designs.
Table 3: Ablations of GraphText on Cora, Citeseer, and Texas (Acc. % and ∆ per dataset).
In this section, we demonstrate that G RAPH T EXT is also applicable to text-attributed graphs. As
depicted in Table 4, we conducted training-free node classification on the Cora and Citeseer datasets
Table 4: Node classification results (accuracy %) on real-world text-attributed graphs. Experiments
are conducted using in-context learning with ChatGPT, as well as instruction tuning with Llama-2-
7B. Note that "text" refers to raw text attributes, while "feat" represents the continuous features on
the graph. The top results for each category are highlighted in bold.
Framework      Model                  Cora    Citeseer
GNNs           GCN                    89.13   74.92
               GAT                    89.68   75.39
G RAPH T EXT   ChatGPT-text           67.77   68.98
               ChatGPT-feat           10.68   16.14
               ChatGPT-feat+text      65.19   66.46
               Llama-2-7B-text        60.59   49.37
               Llama-2-7B-feat        87.11   74.77
               Llama-2-7B-feat+text   77.53   73.83
with both raw text attributes (Chen et al., 2023) and continuous features (Reimers & Gurevych,
2019). We observed that with closed-source LLMs, such as ChatGPT, the performance lags behind
the GNN baselines. Thus, we further explored the potential of instruction tuning on currently
available open-source LLMs, such as Llama-2 (Touvron et al., 2023). For natural-language prompt
construction, we adopted an approach almost identical to the in-context learning setting. Further-
more, we expanded the original vocabulary of Llama-2 by introducing the selected options as new
tokens and then fine-tuned the large language model with the widely used and efficient Low-Rank
Adaptation (LoRA) (Hu et al., 2022).
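The idea of adding the answer options as dedicated tokens can be illustrated schematically. The helper names and toy logits below are our own assumptions, not the paper's code; the sketch shows how a vocabulary is extended and how a prediction is read off as an argmax over the option-token logits only:

```python
def expand_vocab(vocab, options):
    """Append each class-option token (e.g. '<c0>', '<c1>', ...) to an
    existing token-to-id vocabulary, so the LLM can emit an answer as a
    single new token."""
    vocab = dict(vocab)
    for tok in options:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

def classify(logits, vocab, options):
    """Constrained decoding for classification: ignore the rest of the
    vocabulary and take the argmax over the option tokens only."""
    return max(options, key=lambda tok: logits[vocab[tok]])

base = {"the": 0, "paper": 1}
options = ["<c0>", "<c1>", "<c2>"]
vocab = expand_vocab(base, options)      # new tokens get ids 2, 3, 4
logits = [0.1, 0.2, 1.5, 0.3, 2.0]       # one logit per vocabulary entry
print(classify(logits, vocab, options))  # '<c2>' has the highest logit
```

In practice the same two steps would be done on a real tokenizer before applying LoRA, but the classification-by-option-token logic is unchanged.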
From the results in Table 4, it is evident that even with a relatively smaller open-source model,
Llama-2-7B, our best results from instruction tuning across various settings surpass those of Chat-
GPT and approach the GNN baselines. This validates that our method can be beneficial in an
instruction-tuning scenario. It also implies that using G RAPH T EXT, we can feasibly fine-tune
smaller open-source LLMs with reasonable computational costs, achieving performances that can
rival or even surpass those of much larger closed-source models, such as ChatGPT or GPT-4.
Another intriguing observation is the notably poor performance of ChatGPT in settings incorpo-
rating continuous features, which nears a random guess. This is attributable to an inherent limitation
of these closed-source LLMs: they are designed to process raw discrete text inputs and cannot di-
rectly handle continuous inputs. In contrast, open-source LLMs can map these
continuous embeddings into their own embedding space, facilitating improved performance.
Upon contrasting these two groups of models, we noticed a decline in the performance of open-
source models when processing raw text inputs. This decline can be ascribed to the constraints
imposed by the size of the LLM parameters and the volume of the pre-training corpora used. It suggests
that harnessing larger-scale open-source models, such as the 13B, 30B, and 70B Llama-2 variants,
could significantly bolster their modeling capacity for raw text. Concurrently, by leveraging
the ability to process continuous embeddings, these models could exhibit enhanced graph
reasoning capabilities, paving the way for more sophisticated graph-based applications.
5 C ONCLUSION
In this paper, we propose G RAPH T EXT, a framework that enables graph reasoning in text space. It
easily incorporates the inductive bias of GNNs by constructing a graph-syntax tree. The traversal
of a graph-syntax tree yields a graph prompt in natural language, which is fed to an LLM to perform
graph reasoning as text generation. G RAPH T EXT enables training-free graph reasoning, where a
G RAPH T EXT-LLM can deliver performance on par with, or even surpassing, supervised graph neu-
ral networks through in-context learning. Moreover, G RAPH T EXT fosters explainable and inter-
active graph reasoning: it performs graph reasoning in natural language, which enables
humans and LLMs to engage with graph learning using natural language. These abilities highlight
the immense and largely untapped potential of LLMs in the realm of graph machine learning.
E THICS S TATEMENT
Graphs are prevalent in the real world. On the bright side, G RAPH T EXT alleviates the computa-
tional load and carbon footprint associated with training numerous non-transferable, graph-specific
models. However, while the training-free graph reasoning capability of G RAPH T EXT introduces
minimal costs, there is potential for misuse in malicious recommendation systems and malware.
R EFERENCES
Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos,
Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, Eric Chu, Jonathan H. Clark,
Laurent El Shafey, Yanping Huang, Kathy Meier-Hellstern, Gaurav Mishra, Erica Moreira, Mark
Omernick, Kevin Robinson, Sebastian Ruder, Yi Tay, Kefan Xiao, Yuanzhong Xu, Yujing Zhang,
Gustavo Hernández Ábrego, Junwhan Ahn, Jacob Austin, Paul Barham, Jan A. Botha, James
Bradbury, Siddhartha Brahma, Kevin Brooks, Michele Catasta, Yong Cheng, Colin Cherry,
Christopher A. Choquette-Choo, Aakanksha Chowdhery, Clément Crepy, Shachi Dave, Mostafa
Dehghani, Sunipa Dev, Jacob Devlin, Mark Díaz, Nan Du, Ethan Dyer, Vladimir Feinberg, Fangx-
iaoyu Feng, Vlad Fienber, Markus Freitag, Xavier Garcia, Sebastian Gehrmann, Lucas Gonzalez,
and et al. Palm 2 technical report. CoRR, 2023.
Aleksandar Bojchevski, Johannes Klicpera, Bryan Perozzi, Amol Kapoor, Martin Blais, Benedek
Rózemberczki, Michal Lukasik, and Stephan Günnemann. Scaling graph neural networks with
approximate pagerank. In KDD, 2020.
Shaked Brody, Uri Alon, and Eran Yahav. How attentive are graph attention networks? In ICLR,
2022.
Tom B Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal,
Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are
few-shot learners. arXiv preprint arXiv:2005.14165, 2020.
Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Ka-
mar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott M. Lundberg, Harsha Nori, Hamid Palangi,
Marco Túlio Ribeiro, and Yi Zhang. Sparks of artificial general intelligence: Early experiments
with GPT-4. CoRR, 2023.
Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Pondé de Oliveira Pinto, Jared
Kaplan, Harrison Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri,
Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan,
Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian,
Clemens Winter, Philippe Tillet, Felipe Petroski Such, Dave Cummings, Matthias Plappert, Fo-
tios Chantzis, Elizabeth Barnes, Ariel Herbert-Voss, William Hebgen Guss, Alex Nichol, Alex
Paino, Nikolas Tezak, Jie Tang, Igor Babuschkin, Suchir Balaji, Shantanu Jain, William Saunders,
Christopher Hesse, Andrew N. Carr, Jan Leike, Joshua Achiam, Vedant Misra, Evan Morikawa,
Alec Radford, Matthew Knight, Miles Brundage, Mira Murati, Katie Mayer, Peter Welinder, Bob
McGrew, Dario Amodei, Sam McCandlish, Ilya Sutskever, and Wojciech Zaremba. Evaluating
large language models trained on code. CoRR, 2021.
Ming Chen, Zhewei Wei, Zengfeng Huang, Bolin Ding, and Yaliang Li. Simple and deep graph
convolutional networks. In ICML, 2020.
Zhikai Chen, Haitao Mao, Hang Li, Wei Jin, Hongzhi Wen, Xiaochi Wei, Shuaiqiang Wang, Dawei
Yin, Wenqi Fan, Hui Liu, and Jiliang Tang. Exploring the potential of large language models
(llms) in learning on graphs. CoRR, 2023.
Eli Chien, Jianhao Peng, Pan Li, and Olgica Milenkovic. Adaptive universal generalized pagerank
graph neural network. In ICLR, 2021.
Eli Chien, Wei-Cheng Chang, Cho-Jui Hsieh, Hsiang-Fu Yu, Jiong Zhang, Olgica Milenkovic, and
Inderjit S. Dhillon. Node feature extraction by self-supervised multi-scale neighborhood predic-
tion. In ICLR, 2022.
Ian Chiswell and Wilfrid Hodges. Mathematical logic, volume 3 of Oxford texts in logic. Clarendon
Press, 2007. ISBN 978-0-19-921562-1.
David Dohan, Winnie Xu, Aitor Lewkowycz, Jacob Austin, David Bieber, Raphael Gontijo Lopes,
Yuhuai Wu, Henryk Michalewski, Rif A. Saurous, Jascha Sohl-Dickstein, Kevin Murphy, and
Charles Sutton. Language model cascades. CoRR, 2022.
Vijay Prakash Dwivedi, Anh Tuan Luu, Thomas Laurent, Yoshua Bengio, and Xavier Bresson.
Graph neural networks with learnable structural and positional representations. In ICLR, 2022.
Luca Franceschi, Mathias Niepert, Massimiliano Pontil, and Xiao He. Learning discrete structures
for graph neural networks. In Kamalika Chaudhuri and Ruslan Salakhutdinov (eds.), Proceedings
of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long
Beach, California, USA, Proceedings of Machine Learning Research, 2019.
Yao Fu, Hao Peng, Ashish Sabharwal, Peter Clark, and Tushar Khot. Complexity-based prompting
for multi-step reasoning. In ICLR, 2023.
C Lee Giles, Kurt D Bollacker, and Steve Lawrence. Citeseer: An automatic citation indexing
system. In Proceedings of the third ACM conference on Digital libraries, pp. 89–98, 1998.
Jiayan Guo, Lun Du, Hengyu Liu, Mengyu Zhou, Xinyi He, and Shi Han. Gpt4graph: Can large
language models understand graph structured data? An empirical evaluation and benchmarking.
CoRR, abs/2305.15066, 2023. doi: 10.48550/arXiv.2305.15066. URL [Link]
William L. Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large
graphs. In NeurIPS, 2017.
Xiaoxin He, Xavier Bresson, Thomas Laurent, and Bryan Hooi. Explanations as features: Llm-
based features for text-attributed graphs. CoRR, abs/2305.19523, 2023. doi:
10.48550/arXiv.2305.19523. URL [Link]
Sirui Hong, Xiawu Zheng, Jonathan Chen, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang,
Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, Chenyu Ran, Lingfeng Xiao, and Chenglin Wu.
Metagpt: Meta programming for multi-agent collaborative framework, 2023.
Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang,
and Weizhu Chen. Lora: Low-rank adaptation of large language models. In ICLR, 2022.
Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional net-
works. In ICLR, 2017.
Johannes Klicpera, Aleksandar Bojchevski, and Stephan Günnemann. Predict then propagate:
Graph neural networks meet personalized pagerank. In ICLR, 2019.
Devin Kreuzer, Dominique Beaini, William L. Hamilton, Vincent Létourneau, and Prudencio
Tossou. Rethinking graph transformers with spectral attention. In NeurIPS, 2021.
Chaozhuo Li, Bochen Pang, Yuming Liu, Hao Sun, Zheng Liu, Xing Xie, Tianqi Yang, Yanling
Cui, Liangjie Zhang, and Qi Zhang. Adsgnn: Behavior-graph augmented relevance modeling in
sponsored search. In Fernando Diaz, Chirag Shah, Torsten Suel, Pablo Castells, Rosie Jones, and
Tetsuya Sakai (eds.), SIGIR, 2021.
Guohao Li, Hasan Abed Al Kader Hammoud, Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem.
CAMEL: communicative agents for "mind" exploration of large scale language model society.
CoRR, 2023.
Jacky Liang, Wenlong Huang, Fei Xia, Peng Xu, Karol Hausman, Brian Ichter, Pete Florence, and
Andy Zeng. Code as policies: Language model programs for embodied control. In ICRA, 2023.
Pengfei Liu, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi, and Graham Neubig. Pre-
train, prompt, and predict: A systematic survey of prompting methods in natural language pro-
cessing. ACM Comput. Surv., 2023.
Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In 7th International
Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019.
[Link], 2019. URL [Link]
Andrew Kachites McCallum, Kamal Nigam, Jason Rennie, and Kristie Seymore. Automating the
construction of internet portals with machine learning. Information Retrieval, 3:127–163, 2000.
OpenAI. Introducing chatgpt. 2023. URL [Link]
Joon Sung Park, Joseph C. O’Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, and
Michael S. Bernstein. Generative agents: Interactive simulacra of human behavior. CoRR, 2023.
Hongbin Pei, Bingzhe Wei, Kevin Chen-Chuan Chang, Yu Lei, and Bo Yang. Geom-gcn: Geometric
graph convolutional networks. In ICLR, 2020.
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi
Zhou, Wei Li, and Peter J. Liu. Exploring the limits of transfer learning with a unified text-to-text
transformer. J. Mach. Learn. Res., 2020.
Jeff Rasley, Samyam Rajbhandari, Olatunji Ruwase, and Yuxiong He. Deepspeed: System op-
timizations enable training deep learning models with over 100 billion parameters. In Ra-
jesh Gupta, Yan Liu, Jiliang Tang, and B. Aditya Prakash (eds.), KDD ’20: The 26th ACM
SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, CA, USA, Au-
gust 23-27, 2020, pp. 3505–3506. ACM, 2020. doi: 10.1145/3394486.3406703.
Nils Reimers and Iryna Gurevych. Sentence-bert: Sentence embeddings using siamese bert-
networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language
Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-
IJCNLP), pp. 3982–3992, 2019.
Joshua Robinson and David Wingate. Leveraging large language models for multiple choice ques-
tion answering. In The Eleventh International Conference on Learning Representations, ICLR
2023, Kigali, Rwanda, May 1-5, 2023. [Link], 2023.
Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer,
Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to
use tools. CoRR, 2023.
Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Niko-
lay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher,
Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy
Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn,
Saghar Hosseini, Rui Hou, Hakan Inan, Marcin Kardas, Viktor Kerkez, Madian Khabsa, Isabel
Kloumann, Artem Korenev, Punit Singh Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee,
Diana Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor Mihaylov, Pushkar Mishra,
Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi,
Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian, Xiaoqing Ellen Tan, Binh
Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen
Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurelien Rodriguez, Robert Stojnic,
Sergey Edunov, and Thomas Scialom. Llama 2: Open foundation and fine-tuned chat models.
2023.
Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua
Bengio. Graph attention networks. In ICLR, 2018.
Heng Wang, Shangbin Feng, Tianxing He, Zhaoxuan Tan, Xiaochuang Han, and Yulia Tsvetkov.
Can language models solve graph problems in natural language? CoRR, 2023a.
Wenhai Wang, Zhe Chen, Xiaokang Chen, Jiannan Wu, Xizhou Zhu, Gang Zeng, Ping Luo, Tong
Lu, Jie Zhou, Yu Qiao, and Jifeng Dai. Visionllm: Large language model is also an open-ended de-
coder for vision-centric tasks. CoRR, abs/2305.11175, 2023b. doi: 10.48550/arXiv.2305.11175.
URL [Link]
Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi,
Quoc V. Le, and Denny Zhou. Chain-of-thought prompting elicits reasoning in large language
models. In NeurIPS, 2022.
Felix Wu, Amauri H. Souza Jr., Tianyi Zhang, Christopher Fifty, Tao Yu, and Kilian Q. Weinberger.
Simplifying graph convolutional networks. In Kamalika Chaudhuri and Ruslan Salakhutdinov
(eds.), ICML, 2019.
Keyulu Xu, Chengtao Li, Yonglong Tian, Tomohiro Sonobe, Ken-ichi Kawarabayashi, and Stefanie
Jegelka. Representation learning on graphs with jumping knowledge networks. In ICML, 2018.
Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural
networks? In ICLR, 2019.
Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, and Karthik
Narasimhan. Tree of thoughts: Deliberate problem solving with large language models. CoRR,
2023a.
Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R. Narasimhan, and Yuan Cao.
React: Synergizing reasoning and acting in language models. In ICLR, 2023b.
Ruosong Ye, Caiqi Zhang, Runhui Wang, Shuyuan Xu, and Yongfeng Zhang. Natural language
is all a graph needs. CoRR, abs/2308.07134, 2023. doi: 10.48550/arXiv.2308.07134. URL
[Link]
Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, and
Tie-Yan Liu. Do transformers really perform bad for graph representation? In NeurIPS, 2021.
Hanqing Zeng, Muhan Zhang, Yinglong Xia, Ajitesh Srivastava, Andrey Malevich, Rajgopal Kan-
nan, Viktor K. Prasanna, Long Jin, and Ren Chen. Decoupling the depth and scope of graph
neural networks. In Marc’Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang,
and Jennifer Wortman Vaughan (eds.), Advances in Neural Information Processing Systems 34:
Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December
6-14, 2021, virtual, pp. 19665–19679, 2021.
Wentao Zhang, Ziqi Yin, Zeang Sheng, Yang Li, Wen Ouyang, Xiaosen Li, Yangyu Tao, Zhi Yang,
and Bin Cui. Graph attention multi-layer perceptron. In KDD, 2022.
Jianan Zhao, Meng Qu, Chaozhuo Li, Hao Yan, Qian Liu, Rui Li, Xing Xie, and Jian Tang. Learning
on large-scale text-attributed graphs via variational inference. 2023.
Tong Zhao, Yozen Liu, Leonardo Neves, Oliver J. Woodford, Meng Jiang, and Neil Shah. Data
augmentation for graph neural networks. In AAAI, 2021.
R EPRODUCIBILITY S TATEMENT
The code to reproduce our results will be available soon. Our experimental settings and implementa-
tion details are stated in Appendix A.1, and the important hyper-parameters are discussed in
Appendix A.3.
A E XPERIMENTAL S ETTINGS
A.2 DATASETS
In this section, we provide more details about the datasets used in our experiments. The
dataset statistics are provided in Table 5. The datasets can be categorized into citation networks
(i.e., Cora, Citeseer, and ogbn-arxiv) and web-page networks (i.e., Cornell, Texas, and Wis-
consin). Additionally, we use text-attributed-graph (TAG) versions of Cora and Citeseer, denoted
as Cora-TAG and Citeseer-TAG.
Citation graphs: Most GNN-related studies, as referenced in works like Kipf & Welling (2017)
and Velickovic et al. (2018), often employ citation networks as benchmarks. Within these net-
works, nodes represent papers from the computer science domain. The features of these nodes are
derived from bag-of-words vectors of the respective paper titles. Edges depict the citation links be-
tween these papers, while labels indicate each paper's specific category. The text attributes are the
title and abstract of the paper.
WebKB graphs (Pei et al., 2020): Sourced from Carnegie Mellon University, this collection aggregates
web pages from computer science departments of several universities. We employ three specific subsets
from this collection: Cornell, Texas, and Wisconsin. In these subsets, each node symbolizes a
web page, while edges denote hyperlinks connecting them. Nodes are characterized by a bag-of-
words representation derived from their respective web pages. These pages have been meticulously
categorized into five distinct classes: student, project, course, staff, and faculty.
The datasets mentioned above are publicly available: Cora, Citeseer, Cora-TAG, Citeseer-TAG,
Texas, Cornell, and Wisconsin.
A.3 H YPERPARAMETERS
In G RAPH T EXT, the selection of the text-attribute set F and relation set R is the most important choice.
Here we discuss the options and report the selected parameters in Table 6. For text attributes F,
there are several choices: propagated features A^k X, propagated labels A^k Y_L, raw features X, and
labels Y_L. For relations R, there are several choices: the k-hop shortest-path distance, denoted S_k;
propagated feature similarity, denoted sim(A^k X); and the personalized PageRank matrix (Klicpera et al., 2019),
with restart probability α = 0.25, denoted Π.
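As a concrete reference for the Π relation above, the personalized PageRank matrix Π = α (I − (1 − α) Â)^{-1} can be approximated by a fixed-point iteration. The pure-Python sketch below is our own illustrative code (a real implementation would use sparse matrices); it uses a row-normalized adjacency Â and the restart probability α = 0.25 from the paper:

```python
def row_normalize(A):
    """Row-normalize an adjacency matrix into a stochastic matrix A_hat."""
    out = []
    for row in A:
        s = sum(row)
        out.append([a / s if s else 0.0 for a in row])
    return out

def matmul(A, B):
    """Plain dense matrix product (for illustration only, O(n^3))."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def ppr_matrix(A, alpha=0.25, iters=100):
    """Approximate Pi = alpha * (I - (1 - alpha) * A_hat)^{-1} via the
    fixed point Pi <- alpha * I + (1 - alpha) * A_hat @ Pi, which is a
    contraction because A_hat is row-stochastic and (1 - alpha) < 1."""
    n = len(A)
    A_hat = row_normalize(A)
    Pi = [[alpha if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(iters):
        prod = matmul(A_hat, Pi)
        Pi = [[(alpha if i == j else 0.0) + (1 - alpha) * prod[i][j]
               for j in range(n)] for i in range(n)]
    return Pi

# Two mutually connected nodes: each row of Pi sums to 1, and each node
# keeps the largest share of importance for itself.
Pi = ppr_matrix([[0.0, 1.0], [1.0, 0.0]])
```

Ranking a node's row of Π then yields the importance-ordered neighbor sequence used for the PPR pseudo-labels.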
B P ROMPT E XAMPLES
B.1 F EW- SHOT I N - CONTEXT L EARNING
Example of Citeseer:
[Human]: You are a helpful assistant that classifies the topic of an academic paper
based on the labels of the cited papers. You are going to choose the correct answer from
several choices of paper categories: [A: Agents, B: Artificial Intelligence, C: Database, D:
Information Retrieval, E: Machine Learning, F: Human Computer Interaction]
Remaining examples . . .
3
[Link]
4
[Link]
5
[Link]
6
[Link]
7
[Link]
8
[Link]
9
[Link]
15
Preprint
What’s the topic of the paper given the information above? Valid choices are [A: Agents,
B: Artificial Intelligence, C: Database, D: Information Retrieval, E: Machine Learning, F:
Human Computer Interaction]. Remember, your answer should be in the form of the class
choice wrapped by <answer></answer>.
[Assistant]: <answer>C</answer>
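The `<answer></answer>` wrapper requested in the prompt makes the prediction machine-parseable. A minimal extraction sketch (our own assumed helper, not taken from the paper's code):

```python
import re

def extract_answer(response):
    """Pull the class choice out of the <answer></answer> wrapper that the
    prompt instructs the assistant to use; None if the wrapper is missing."""
    match = re.search(r"<answer>(.*?)</answer>", response)
    return match.group(1) if match else None

print(extract_answer("[Assistant]: <answer>C</answer>"))  # C
```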
Example of Cora:
[Human]: Your goal is to perform node classification. You are given the information of
each node in a xml format. Using the given information of a node, you need to classify
the node to several choices: [<c0>: Rule Learning, <c1>: Neural Networks, <c2>:
Case Based, <c3>: Genetic Algorithms, <c4>: Theory, <c5>: Reinforcement Learning,
<c6>: Probabilistic Methods]. Remember, your answer should be in the form of the class
label.
<information>
<feature>
<center node><x><x emb></x></center node>
<1st feature similarity graph><x><x emb></x></1st feature similarity graph>
</feature>
</information>
Note that "<x emb>" is the text token embedding for feature "x" generated by the MLP
projector discussed in Appendix A.1.
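A minimal stand-in for such a projector is shown below. This is a simplified sketch under the assumption of a single linear layer (the actual projector is an MLP); it only illustrates the shape of the mapping from a continuous node feature to the LLM embedding width:

```python
import random

def make_projector(d_in, d_out, seed=0):
    """Return a function mapping a continuous node feature (length d_in)
    to a pseudo token embedding (length d_out) via one linear layer with
    randomly initialized weights (stand-in for a trained MLP)."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(d_in)]
               for _ in range(d_out)]
    def project(x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return project

project = make_projector(d_in=3, d_out=5)
x_emb = project([0.2, -1.0, 0.5])  # plays the role of the <x emb> slot
print(len(x_emb))  # 5
```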
Node # 30
Graph information:
pseudo labels:
center-node:[’D’]
second-hop neighbor:[’D’, ’E’, ’D’, ’D’, ’D’]
Target class: D
Node # 158
Graph information:
pseudo labels:
center-node:[’A’]
second-hop neighbor:[’A’]
Target class: A
We can observe that for the best setting on the Texas dataset, with hyperparameters discussed in
Table 6, the center-node pseudo-labels mostly resemble those of the second-hop neighbors. Consequently,
removing the text information, i.e., removing the internal nodes in the graph-syntax tree in Section
4.3, does not hurt the performance.
This also shows the advantage of decoupling depth and scope (Zeng et al., 2021) in the graph-syntax
tree of G RAPH T EXT, which explains the performance gain of G RAPH T EXT over standard GNNs,
e.g., GCN and GAT. A similar observation is drawn in Figure 8 (i) of Chien et al. (2021), where
A^2 serves as the most important high-order aggregation scheme for the Texas dataset.
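The decoupling of depth and scope is easy to state in code: with the shortest-path-distance relation S_k one can address the nodes at exactly hop k (e.g. only the second hop on Texas) without also aggregating hops 1 to k-1, as a stacked GNN would. A BFS sketch, using our own illustrative helper:

```python
from collections import deque

def nodes_at_distance(adj, src, k):
    """Return the nodes at exact shortest-path distance k from src, i.e.
    the scope selected by the relation S_k, independent of model depth."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return sorted(v for v, d in dist.items() if d == k)

# Tiny path graph 0 - 1 - 2 - 3: only node 2 sits exactly two hops from 0.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(nodes_at_distance(adj, 0, 2))  # [2]
```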
Below is an example of graph reasoning on Cora node #2188 in the setting of standard zero-shot
chain-of-thought reasoning (Wei et al., 2022). The input prompt for Cora node #2188 is as follows:
Graph information:
pseudo labels:
center-node:[’B’]
ppr:[’B’, ’B’, ’B’, ’B’]
Topic of paper: B
Graph information:
pseudo labels:
center-node:[’C’]
ppr:[’C’, ’C’, ’C’, ’C’]
Topic of paper: C
Graph information:
(Note: ChatGPT and GPT-4 perform reasoning with explanation by default; therefore, there is no
need to add an additional "Let's think step-by-step" prompt to the text input.)
pseudo labels:
center-node:[’D’]
ppr:[’D’, ’D’, ’E’, ’D’]
Topic of paper: D
Graph information:
pseudo labels:
center-node:[’A’]
ppr:[’E’, ’E’, ’A’, ’A’]
Topic of paper: E
Graph information:
pseudo labels:
center-node:[’F’]
ppr:[’F’, ’F’, ’F’, ’F’]
Topic of paper: F
Graph information:
pseudo labels:
center-node:[’G’]
ppr:[’G’, ’G’, ’G’, ’G’]
Topic of paper: G
As discussed in Section 4.2, the demonstrations show that the PPR pseudo-labels provide a more
robust mechanism for paper-topic prediction. Utilizing either a count of the PPR labels followed by
a majority vote, or merely referencing the foremost PPR label, consistently yields the correct
categorization in the given examples. Hence, based on these graph inductive biases derived from the
samples, we can reasonably infer that the correct topic of the target paper should be A, which is not only
the first entry but also the predominant label in the PPR pseudo-label sequence. Now, let's use
ChatGPT and GPT-4 to answer the problem. We performed 15 evaluations; below we show
one sample each of a correct and an incorrect response:
It seems that while the center node is ’G’, the cited papers are mostly of category ’A’ and
’E’.
From the pattern observed in the examples, the category that has the majority or at least
equal emphasis with other categories in the ’ppr’ nodes tends to be the category of the paper.
Therefore, in this case, ’A’ or ’Theory’ has the highest count in the ’ppr’ nodes.
Thus, the topic of the paper given the information above is: A: Theory.
[GPT-4]: Given the graph information: pseudo labels: center-node:[’G’] ppr:[’A’, ’G’, ’A’,
’E’]
The center node label is ’G’, which corresponds to "Rule Learning". The papers referenced
(ppr) are mixed between "Theory", "Rule Learning", and "Probabilistic Method". However,
since the center node holds the primary topic label, the academic paper's topic is: G: Rule
Learning.
As highlighted in the preceding section, LLMs occasionally struggle with recalling prior knowledge
and remain anchored to their pre-existing beliefs. To address this, we engage with G RAPH T EXT by:
(1) providing a clear definition of PPR, and (2) explicitly instructing them to assess the reliability of
both center-node and PPR pseudo-labels. This leads to the following message:
Human Interaction Note that PPR pseudo labels represent a sequence of labels ranked by
importance, with the first one being the most crucial. Let’s evaluate the reliability of center
node pseudo labels against PPR pseudo labels and then make another prediction.
With the interaction above, the LLMs refined their answers in most cases; below are some examples:
’G,’ which means ’A’ is considered more crucial in the context of this paper. Now, let’s
make a prediction based on this evaluation: Since ’A’ is the most crucial label in the PPR
list, we will prioritize it for classifying the topic of the paper. Therefore, the paper’s primary
topic is "A: Theory" (Category A). The presence of ’G’ in the PPR list indicates some
relevance to "Rule Learning" (Category G), but because it comes after ’A,’ it is considered
less important in this context. So, the revised prediction is that the primary topic of the
paper is "A: Theory."
[GPT-4]: Given the new information that PPR pseudo labels represent a sequence of labels
ranked by importance, let’s re-evaluate the provided data:
The center node suggests the paper is about ’G’ (Rule Learning). However, considering the
prominence of ’A’ (Theory) in the ranked sequence, and since it’s the most crucial label, the
paper might be primarily focused on theoretical aspects related to Rule Learning. Consider-
ing both the center node and the importance sequence, the topic of the paper should be: A:
Theory
Refusal after Human Interaction
[ChatGPT]: Certainly, let’s evaluate the reliability of center node pseudo labels against PPR
pseudo labels.
In the provided graph information: The center-node label is ’G,’ which suggests that the
most important label for this paper is ’G.’ However, when we consider the PPR labels, we
see that ’A’ appears as the first and third label, which might indicate some importance as
well.
Let’s consider both the center-node and the PPR labels for our prediction. Given that ’G’ is
the center-node label and ’A’ appears in the PPR labels, we can prioritize the center-node
label but also consider the presence of ’A’ in the PPR labels.
So, the prediction for the topic of the paper can be: Topic of paper: G (with some influence
from A).
Confusion after Human Interaction
[ChatGPT]: Based on this information, we can classify the topic of the paper as a combi-
nation of the most frequent labels among the neighboring papers, which are ’A’ and ’G’.
Therefore, the topic of the paper is likely a combination of topics ’A’ and ’G’.
So, the answer is a combination of two topics, and the closest match among the given op-
tions is:
Option: F: Case Based # Note that F is not present in the pseudo-labels of the question
The provided examples, along with the consolidated findings in Table 4.1, compellingly show that
both ChatGPT and GPT-4 can adjust their pre-existing biases about graphs when given human feed-
back. Notably, after this interaction, GPT-4 delivers a flawless accuracy rate of 100%, consistently
following the PPR logic. Meanwhile, ChatGPT also sees a significant performance boost, with an
accuracy improvement of 36.9%. However, as evidenced in the examples, ChatGPT occasionally
refuses to update its predictions or becomes confused.
While the G RAPH T EXT framework offers notable advantages, there is ample room for enhancement
and exploration of new applications.
One primary concern not fully addressed in this paper is the question of how to discretize continuous
features. As evident from Table 6, most optimal settings are label-based. This makes G RAPH T EXT
resemble a label propagation model, with Citeseer being the only exception. We posit that the
observed trend might be attributed to two main factors: 1) The ineffectiveness of the discretization
process, and 2) The discord between feature and label spaces, making reasoning challenging for
LLMs.
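One simple baseline for the discretization question, included here only as an assumed illustration (not the method studied in the paper), is quantile binning: each continuous feature value becomes a discrete bin token that an LLM can read as text:

```python
def discretize(values, n_bins=4):
    """Map continuous feature values to discrete tokens 'bin-0'..'bin-(n-1)'
    using approximate quantile edges, so an LLM receives text, not floats."""
    ordered = sorted(values)
    edges = [ordered[int(len(ordered) * i / n_bins)] for i in range(1, n_bins)]
    return [f"bin-{sum(v >= e for e in edges)}" for v in values]

print(discretize([0.1, 0.9, 2.3, 3.7], n_bins=2))
# ['bin-0', 'bin-0', 'bin-1', 'bin-1']
```

Whether such bin tokens align well with the LLM's label space is exactly the open question raised above.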
Additionally, the design space of the graph-syntax tree is extensive and often requires either expert
knowledge or hyperparameter optimization. Although G RAPH T EXT boasts flexibility and general-
ity, crafting the text-attribute set F and relation set R, and determining their combinations, results
in a vast search space. However, given that G RAPH T EXT operates on a training-free paradigm,
hyperparameter optimization can be swift.
Notwithstanding its constraints, G RAPH T EXT introduces avenues for novel research. Chiefly, it
sets the stage for graph reasoning in natural language. Emerging advancements in the LLM realm
can potentially be integrated into the graph ML domain. This includes areas such as multi-step
reasoning (Yao et al., 2023a), decision-making (Yao et al., 2023b; Liang et al., 2023), tool utiliza-
tion (Schick et al., 2023), and multi-agent collaboration (Park et al., 2023; Hong et al., 2023). Fur-
thermore, the prospect of training-free graph learning streamlines the validation process for graph
model designs. If one assumes the optimal relation set R and feature set F to be transferable, signifi-
cant training time can be saved. Researchers can quickly identify suitable settings with G RAPH T EXT
and then apply these configurations to the hyperparameters of other GNNs/LLMs.