
Preprint

GraphText: Graph Reasoning in Text Space


Jianan Zhao^1,2, Le Zhuo^3, Yikang Shen^4, Meng Qu^1,2, Kai Liu^5,
Michael Bronstein^6, Zhaocheng Zhu^1,2, Jian Tang^1,7,8

^1 Mila - Québec AI Institute, ^2 Université de Montréal, ^3 Beihang University,
^4 MIT-IBM Watson AI Lab, ^5 Division of gRED Computational Science, Genentech Inc.,
^6 University of Oxford, ^7 HEC Montréal, ^8 Canadian Institute for Advanced Research (CIFAR)

arXiv:2310.01089v1, 2 Oct 2023

Abstract

Large Language Models (LLMs) have gained the ability to assimilate human knowledge and facilitate natural language interactions with both humans and other LLMs. However, despite their impressive achievements, LLMs have not made significant advancements in the realm of graph machine learning. This limitation arises because graphs encapsulate distinct relational data, making it challenging to transform them into natural language that LLMs understand. In this paper, we bridge this gap with a novel framework, GraphText, that translates graphs to natural language. GraphText derives a graph-syntax tree for each graph that encapsulates both the node attributes and inter-node relationships. Traversal of the tree yields a graph text sequence, which is then processed by an LLM to treat graph tasks as text generation tasks. Notably, GraphText offers multiple advantages. It introduces training-free graph reasoning: even without training on graph data, GraphText with ChatGPT can achieve performance on par with, or even surpassing, that of supervised-trained graph neural networks through in-context learning (ICL). Furthermore, GraphText paves the way for interactive graph reasoning, allowing both humans and LLMs to communicate with the model seamlessly using natural language. These capabilities underscore the vast, yet-to-be-explored potential of LLMs in the domain of graph machine learning.

1 Introduction

Language stands as a cornerstone of human civilization, acting as the primary medium for knowledge encoding, reasoning, and communication. Large language models (LLMs), pre-trained on extensive text corpora, have showcased remarkable reasoning skills (Brown et al., 2020; Bubeck et al., 2023). These LLMs can communicate via natural language both internally (Wei et al., 2022) and externally with humans or other LLMs (Li et al., 2023), demonstrating exceptional skills such as multi-step reasoning (Yao et al., 2023a), decision-making (Yao et al., 2023b; Liang et al., 2023), tool use (Schick et al., 2023), and multi-agent collaboration (Park et al., 2023; Hong et al., 2023).

Motivation. Despite the remarkable success of LLMs in handling natural languages, their application to other data modalities presents unique challenges, primarily because these data often lack a straightforward transformation into sequential text. These challenges are especially severe when dealing with graph-structured data, as different graphs define structure and features in distinct ways. Therefore, existing efforts within the graph machine learning field commonly require the training of specific graph neural networks (GNNs) tailored to individual graphs (Kipf & Welling, 2017; Velickovic et al., 2018; Xu et al., 2019). Often, models trained on one graph cannot generalize to the unseen structure and feature representations of other graphs. Moreover, the gap between graphs and human languages hinders the application of natural language reasoning to facilitate graph reasoning.

In light of these limitations, a question arises: can we derive a language for graphs in natural language? In this paper, we give an affirmative answer by proposing to use trees as an intermediary, elegantly bridging structured data and one-dimensional sequential language. Essentially, a tree exhibits a hierarchical structure, and traversing it yields a one-dimensional sequence. On top of that, as shown in Figure 1 (c), we propose a novel framework, GraphText, which takes graph data to build a graph-syntax tree. Traversing it results in a graph prompt expressed in natural language, allowing an LLM to approach graph reasoning as a text-generation task.


[Figure 1 here: (a) graph learning in graph-specific space; (b) graph learning in shared text space; (c) a GraphText example showing a graph-syntax tree, its traversal into a graph prompt, and the LLM's answer.]

Figure 1: Comparison between (a) the GNN framework and (b) the proposed GraphText framework. For different graphs G1 and G2, different GNNs θ1, θ2 are trained to make a graph-specific output prediction in continuous form. In contrast, GraphText encodes the graph information to text sequences T_in^(1) and T_in^(2), and generates text reasoning and predictions T_out^(1) and T_out^(2) with a graph-shared LLM ϕ. GraphText leverages a pre-trained LLM to perform training-free graph reasoning and enables human and AI interaction for graph reasoning in natural language. (c) An example of the GraphText framework that classifies node 0: given a graph, GraphText constructs a graph-syntax tree that contains both node attributes (e.g. feature and label) and relationships (e.g. center-node, 1st-hop, and 2nd-hop). Then, GraphText traverses the graph-syntax tree to obtain a sequential text, i.e. a graph prompt, and lets the LLM perform graph reasoning in text space.

Main contributions. First, GraphText serves as a flexible and general framework for graph reasoning. It can incorporate common inductive biases of GNNs, such as feature propagation and feature-similarity-based propagation, simply by constructing different graph-syntax trees. It also serves as a general framework for graph reasoning for both in-context learning and instruction tuning, on both general graphs and text-attributed graphs. Second, we show that GraphText enables the possibility of training-free graph reasoning. The training-free property enables us to deploy GraphText not only with open-source LLMs but also with powerful closed-source LLMs. Remarkably, even without training on graph data, GraphText with ChatGPT can deliver performance on par with, or even surpassing, supervised graph neural networks through in-context learning. This highlights the vast potential of foundation models in the realm of graph machine learning. Third, GraphText fosters interactive graph reasoning: with its capacity to generate and explain predictions in natural language, humans can directly engage with GraphText. As shown in Figure 2 (b), through interactions with humans and other LLMs, GraphText refines its graph reasoning capabilities.

2 Methodology

In this section, we introduce GraphText to perform graph reasoning in text space. Out of the three fundamental problems of graph ML (graph classification, node classification, and link prediction), we take node classification as an example to introduce our idea. We note, however, that our discussion applies to other graph tasks.

2.1 The GraphText Framework

Let us be given an attributed graph G = (V, E, X) with nodes V and edges E, whose structure is represented as the |V| × |V| adjacency matrix A and node attributes as the |V| × d feature matrix X. Given a subset L ⊂ V of labeled nodes with labels Y_L, the goal of node classification is to predict the labels Y_U of the unlabeled nodes U = V \ L. Graph Neural Networks (GNNs) are the standard architecture for such problems. As shown in Figure 1 (a), a GNN directly learns a parametric map

    ŷ_i = f_GNN(G; θ_G)_i    (1)

between the input graph G ∈ G and the output labels Ŷ ∈ Y, assigning to each node i its predicted label ŷ_i. The training of a GNN attempts to find parameters θ_G such that ŷ_i ≈ y_i on the training set.
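To make the graph-specific map of Eq. (1) concrete, here is a minimal numpy sketch (our own illustration, not the paper's code) of a one-layer GCN-style predictor, with a weight matrix W standing in for θ_G:

```python
import numpy as np

def gcn_predict(A, X, W):
    """One-layer GCN-style map: y_hat = softmax(D^{-1}(A + I) X W)."""
    A_hat = A + np.eye(A.shape[0])                # add self-loops
    P = A_hat / A_hat.sum(axis=1, keepdims=True)  # row-normalize
    logits = P @ X @ W                            # propagate, then transform
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)       # per-node class probabilities

# toy graph: 3 nodes on a path, 2 features, 2 classes
A = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
X = np.array([[1., 0.], [0.5, 0.5], [0., 1.]])
W = np.eye(2)                                     # untrained stand-in for theta_G
probs = gcn_predict(A, X, W)
assert probs.shape == (3, 2) and np.allclose(probs.sum(axis=1), 1.0)
```

Training would fit W so that each node's predicted class matches its label on the training set; the point is that A, X, and W are all tied to this one graph.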


[Figure 2 here: (a) few-shot demonstration prompts and the target node's prompt (pseudo labels, center-node, ppr, topic); (b) a human-interaction dialogue correcting the LLM's graph prior; (c) three observed LLM behaviors: updated graph prior, confusion, refusion.]

Figure 2: (a) Few-shot demonstrations (blue) and the target node #2188 (green) to predict on Cora. (b) An illustration of how human interaction changes the graph reasoning of an LLM, where the LLM previously has the prior that the center-node is vital. (c) Behaviors of LLMs after being given demonstrations/human interaction: updating the graph prior to rely more on PPR (personalized pagerank); or leading to confusion or refusion. Details are discussed in Section 4.2.

Note that standard GNNs are graph-specific functions, i.e. f_GNN(·; θ_G) : G ↦ Ŷ, which do not generalize to other graphs, since other graphs G′ ∈ G define distinct distributions of Y′, A′, and X′, or even different types of features such as continuous, categorical, or text features.

To solve the generalization problem mentioned above, this paper proposes to perform graph reasoning as a text-to-text problem (Raffel et al., 2020), as shown in Figure 1 (b). Inspired by prompt tuning (Brown et al., 2020; Liu et al., 2023), we construct two graph-specific maps to form the input and output space of a text-to-text problem: a map g : G ↦ T_in that maps the graph input to text space, and a map h : T_out ↦ Ỹ that maps the output of the LLM to label predictions Ỹ. In this way, we can use a generative large language model f_LLM to perform graph reasoning as

    ỹ_i = h(f_LLM(g(G)_i; ϕ))    (2)

where g(G)_i = T_in[i] denotes the text sequence representing node i. Different from GNNs, f_LLM(·; ϕ) : T → T is a graph-shared function, where both input and output are in text space, i.e. T_in, T_out ∈ T, which not only activates the parametric knowledge encoded in the model f_LLM(·; ϕ), but also enables interactions between human and AI agents to facilitate graph reasoning.
Specifically, as node classification, link prediction, and graph classification are essentially classification tasks, we can naturally formulate these graph reasoning tasks as multi-choice QA problems (Robinson & Wingate, 2023) and design h as the map from the predicted choice T_out ∈ T to the corresponding prediction Ỹ. However, the design of g, which maps the structural graph information into the text space of natural language, is still a non-trivial problem.
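The output map h for the multi-choice formulation is the easy half; a minimal sketch (the regex heuristic and names are our own assumptions, not the authors' code) scans the LLM's free-form answer for a choice letter and maps it back to a class label:

```python
import re

def make_h(choices):
    """Return h: LLM output text -> label, via the multiple-choice letter.
    choices: e.g. {"A": "Theory", "B": "Neural_Networks"} (illustrative)."""
    def h(t_out):
        m = re.search(r"\b([A-Z])\b", t_out)  # first standalone capital letter
        return choices.get(m.group(1)) if m else None
    return h

h = make_h({"A": "Theory", "B": "Neural_Networks"})
assert h("Therefore, the answer is A.") == "Theory"
assert h("no usable answer") is None   # e.g. the model refused to choose
```

A None result corresponds to the confusion/refusion behaviors discussed in Section 4.2.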
The primary challenge in converting graph data to language lies in handling its relational structure, which fundamentally deviates from the one-dimensional sequential nature of text data. Inspired by linguistic syntax trees (Chiswell & Hodges, 2007), we introduce graph-syntax trees as a bridge between relational and sequential data. The traversal of such a tree produces a sentence in natural language, which is fed to the LLM for graph reasoning. Specifically, as shown in Figure 1 (c), we compose a graph-syntax tree consisting of node text attributes and inter-node relationships. Next, we describe how to compose the node text attributes and inter-node relationships in Section 2.2, and how to build a graph-syntax tree in Section 2.3.

2.2 Textual and Relational Information for Syntax Trees

A graph-syntax tree is composed of both textual and relational information derived from the graph. For textual information, GraphText constructs a text attribute set F ∈ T for an arbitrary graph G ∈ G (with or without text attributes), composed of multiple types of attributes for each node, e.g. feature and label, in natural language. Specifically, for each node v_i and feature type m, we construct a text sequence F_m[i] in natural language:

    F_m[i] = {t_1, t_2, · · ·, t_{l_m}},  F_m[i] ∈ T,    (3)
where the sequence is of length l_m. Each text attribute F_m can be derived from either sequential text features or continuous features. Text features can be directly added to the text attributes F; for example, we can directly add the text sequences of "title" and "abstract" into F for citation graphs. For continuous features, e.g. the raw feature X or other graph embeddings, we propose to use discretization methods, e.g. clustering, to transform the continuous feature into a discrete space and then derive sequential data from it. For simplicity, we use the cluster index of K-means to generate a sequence of length 1 for all continuous features, as K-means is effective in our experiments.
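The K-means discretization step can be sketched as follows (a self-contained toy implementation in numpy; the `[cluster]` text encoding is our own illustrative choice, not necessarily the paper's exact format):

```python
import numpy as np

def kmeans_labels(X, k, iters=20, seed=0):
    """Tiny K-means: return one cluster index per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)                    # nearest center per point
        for j in range(k):                      # move centers to cluster means
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels

# two well-separated blobs of continuous node features
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.1]])
labels = kmeans_labels(X, k=2)
F_feat = [f"[{c}]" for c in labels]             # length-1 "text" attribute per node
assert labels[0] == labels[1] and labels[2] == labels[3] and labels[0] != labels[2]
```

Each node's continuous feature vector thus becomes a single discrete token that can be placed at a leaf of the syntax tree.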
For relational information, GraphText derives a set of matrices R, where each R_n ∈ R is a |V| × |V| matrix depicting one type of relationship between nodes. Choices of R_n include the original graph (Kipf & Welling, 2017), high-order connectedness (Xu et al., 2018), PageRank matrices (Klicpera et al., 2019), or any matrix that encodes node-pair information. These relationships play an important role in determining the nodes and structure of the graph-syntax tree, which further improves the graph text prompt.

2.3 Graph-Syntax Tree Composition

We now describe how to build a graph prompt using a graph-syntax tree of the graph text attributes and relationships F and R. By analogy to the syntax tree in linguistics, we define a graph-syntax tree as an ordered tree: a directed acyclic graph (DAG) with nodes T̃ ∈ T and edges Ẽ. In a graph-syntax tree, e.g. the one in Figure 1 (c), each node stores a text sequence in natural language: the root node is an empty node; the leaf nodes T̃_L are text sequences in the graph text attributes, i.e. ∀ T_i ∈ T̃_L, T_i ∈ F; the internal nodes T̃_I are text sequences in natural language, i.e. ∀ T_i ∈ T̃_I, T_i ∈ T. A graph-syntax tree is constructed in three steps: (1) construct an ego-subgraph (Hamilton et al., 2017) G_i for the target node v_i based on the relationships R; (2) select leaf nodes T̃_L based on the relationships R; (3) build up internal nodes T̃_I and edges Ẽ based on the leaf nodes' types and their relationship with the graph¹. Notably, the leaf nodes are sorted according to their relationships with the center-node, preserving the relative relationship in a one-dimensional order.
We illustrate this with a node classification example shown in Figure 1 (c). Before building the graph-syntax tree, GraphText determines the text attributes, composed of raw features and observed labels, i.e. F = {F_X[i], F_Y[i] | ∀ v_i ∈ V}, and a relationship set determined by shortest-path distance (SPD): center-node, 1st-hop, and 2nd-hop, i.e. R = {R_SPD=0, R_SPD=1, R_SPD=2}. Then, for the target node v_i (node 0 in the example), an ego-subgraph (Hamilton et al., 2017) (with nodes [0, 1, 2, 3, 4]) is sampled based on the relative relationship between v_i and the other nodes. Finally, a graph-syntax tree is constructed with leaf nodes T̃_L = {F_X[0], F_X[1], F_X[2], F_X[3], F_X[4], F_Y[1], F_Y[3]}, internal nodes T̃_I = {"center-node", "1st-hop", "2nd-hop", "label", "feature"}, and the corresponding edges. The traversal of the resulting graph-syntax tree leads to a text sequence in natural language.
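The tree construction and the final traversal can be sketched as follows (a toy rendering of the Figure 1 (c) example; function names and attribute encodings are ours, not the released implementation):

```python
def build_syntax_tree(attrs, relations):
    """Graph-syntax tree as nested dicts: attribute type at the top level,
    relation type below it, leaf text sequences at the bottom."""
    tree = {}
    for attr, per_node in attrs.items():
        tree[attr] = {rel: [per_node[v] for v in nodes if v in per_node]
                      for rel, nodes in relations.items()
                      if any(v in per_node for v in nodes)}
    return tree

def traverse(tree, depth=0):
    """Depth-first traversal of the tree -> lines of the graph prompt."""
    lines = []
    for key, child in tree.items():
        if isinstance(child, dict):
            lines.append("  " * depth + f"{key}:")
            lines.extend(traverse(child, depth + 1))
        else:                                    # leaf: list of text sequences
            lines.append("  " * depth + f"{key}: [{', '.join(child)}]")
    return lines

# toy ego-subgraph of center node 0 (attribute values are illustrative)
attrs = {"feature": {0: "0", 1: "1", 2: "2", 3: "3", 4: "2"},
         "label":   {1: "A", 3: "B"}}
relations = {"center-node": [0], "1st-hop": [1, 2], "2nd-hop": [3, 4]}
prompt = "\n".join(traverse(build_syntax_tree(attrs, relations)))
assert "1st-hop: [1, 2]" in prompt and "label:" in prompt
```

The resulting `prompt` string, prefixed with a task description and a few demonstrations, is what the LLM receives as its graph reasoning input.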
Compared with the direct flattening of (sub)graphs (Wang et al., 2023a; Chen et al., 2023), using a graph-syntax-tree-based prompt has many advantages. Above all, unlike a graph, which has no topological order, a syntax tree is a DAG that can be topologically sorted, which gracefully converts a relational structure into a sequence of nodes. Moreover, GraphText easily incorporates the inductive biases of GNNs through the construction of node text attributes F and relationships R. For example, we can easily encode the feature-propagation mechanism of GNNs by including a text attribute derived from the propagated feature A^k X (Zhang et al., 2022) in the node attributes F. We can also incorporate feature-similarity-based aggregation (Velickovic et al., 2018) by adding XX⊤ to R. These graph-based inductive biases can significantly boost LLMs' graph reasoning performance (further discussed in Section 4.1). Last but not least, a tree naturally defines a hierarchical structure, which LLMs are proficient in reasoning on (Liang et al., 2023), owing to training on code data (Chen et al., 2021) and web page data (Touvron et al., 2023).
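For instance, the propagated feature A^k X can be computed as follows before being discretized into a text attribute (a sketch under our own normalization choice; the paper does not fix a specific normalization here):

```python
import numpy as np

def propagated_feature(A, X, k=2):
    """Compute A_hat^k X, with A_hat the row-normalized self-looped adjacency."""
    A_hat = A + np.eye(A.shape[0])
    A_hat = A_hat / A_hat.sum(axis=1, keepdims=True)
    out = X
    for _ in range(k):                    # k rounds of neighborhood averaging
        out = A_hat @ out
    return out

A = np.array([[0., 1.], [1., 0.]])
X = np.array([[1., 0.], [0., 1.]])
# one round of averaging over {self, neighbor} mixes the two one-hot features
prop = propagated_feature(A, X, k=1)
assert np.allclose(prop, 0.5)
```

The resulting matrix can then go through the same K-means discretization used for raw features; similarly, XX⊤ can be added to R as a feature-similarity relation.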
¹ The hierarchy of the tree can be defined flexibly, but we have empirically discovered that a simple configuration, with attribute type at the top hierarchy and relation type at the bottom hierarchy for internal nodes, as illustrated in Figure 1 (c), yields strong performance. Further details are available in Section 4.3.


3 Related Work

Unlock Graph Space for Language Models. Large Language Models (LLMs) (Brown et al., 2020; OpenAI, 2023; Anil et al., 2023; Bubeck et al., 2023) possess impressive reasoning capabilities (Wei et al., 2022; Yao et al., 2023a; Fu et al., 2023). At the heart of LLMs' reasoning prowess is their ability to process and generate natural language inputs and outputs, enabling flexible interactions (Dohan et al., 2022) with both humans and AI agents. This unique capability empowers them with remarkable abilities such as complex reasoning (Fu et al., 2023) and decision-making (Yao et al., 2023b; Liang et al., 2023). Despite their success, applying LLMs to relational graph data remains challenging, primarily due to the absence of a natural language representation for graphs. GraphText bridges this gap by providing a novel framework that enables LLMs to seamlessly integrate and reason over relational graph data using the same natural language capabilities, thereby unlocking their potential for a wide range of graph-based applications.

Training-free Graph Reasoning. Graph neural networks (GNNs) (Kipf & Welling, 2017; Xu et al., 2019) excel in handling relational graph data, thanks to the message-passing mechanism for aggregation and transformation of neighborhood representations. Their standout performance can be attributed to their intrinsic capability to assimilate graph inductive biases. This incorporation of biases is achieved by designing representations with the graph structure in perspective, such as position embeddings (Dwivedi et al., 2022; Ying et al., 2021; Kreuzer et al., 2021) and propagated features (Wu et al., 2019; Zhang et al., 2022). Furthermore, they can introduce diverse aggregation methods, like feature-similarity-based message passing (Velickovic et al., 2018; Zhao et al., 2021) or high-order aggregation (Klicpera et al., 2019; Bojchevski et al., 2020; Chien et al., 2021). However, as highlighted in Section 2.1, due to the variance in both structure and features, the majority of GNNs are graph-specific. They are tailored for a particular graph type with consistent features and structures, thus posing challenges for generalization to different graphs.
In a parallel vein, GraphText also taps into the potent ability to infuse graph inductive biases for graph reasoning, achieved through designing both the textual and relational aspects of the graph-syntax tree. Setting itself apart from GNNs, GraphText approaches graph reasoning in a graph-shared domain, facilitating the broader applicability of a single LLM to diverse graphs and offering training-free and interactive graph reasoning.

Connecting Both Worlds. Recent endeavors (Chien et al., 2022; Zhao et al., 2023; He et al., 2023) have aimed to merge the language and graph domains. Most methods involve transitioning the problem into a graph-specific realm, utilizing a combination of a text encoder (either pre-trained (Chien et al., 2022) or learned (Li et al., 2021)) and a GNN predictor. This methodology still falls into the graph-specific paradigm. Very recently, concurrent works (Guo et al., 2023; Ye et al., 2023; Wang et al., 2023a; Chen et al., 2023) have explored leveraging LLMs for graph-related tasks. These methods either directly flatten the nodes and edges (Guo et al., 2023; Wang et al., 2023a) or employ rule-based prompts on text-attributed graphs (Chen et al., 2023; Ye et al., 2023).

Nevertheless, GraphText is fundamentally different from these works. Foremost, GraphText proposes a language defined by a graph-syntax tree, offering a flexible and structured approach for seamlessly integrating graph inductive biases. Moreover, it also serves as a general framework for graph reasoning, which can be applied to scenarios encompassing in-context learning and instruction tuning. It accommodates various types of graphs, including general graphs and text-attributed graphs, and is adaptable to both closed-source large language models (LLMs) (OpenAI, 2023; Bubeck et al., 2023) and open-source LLMs (Touvron et al., 2023).

4 Experiments

We conduct extensive experiments to demonstrate the effectiveness of GraphText. First, in Section 4.1, we delve into the remarkable capacity of GraphText for training-free graph reasoning. Subsequently, Section 4.2 highlights the interactive graph reasoning capabilities of GraphText. We further analyze various ablations of graph-syntax trees in Section 4.3. Concluding our exploration, Section 4.4 illustrates how GraphText can seamlessly function as a versatile framework, catering to both in-context learning and instruction tuning on both general graphs and text-attributed graphs.


Table 1: Node classification results (accuracy %).

Model         | Graph Text Attributes | Relations   | Cora | Citeseer | Texas | Wisconsin | Cornell
GCN           | NA                    | original    | 81.4 | 69.8     | 59.5  | 49.0      | 37.8
GAT           | NA                    | original    | 80.8 | 69.4     | 54.1  | 49.0      | 45.9
GCNII         | NA                    | original    | 81.2 | 69.8     | 56.8  | 51.0      | 40.5
GATv2         | NA                    | original    | 82.3 | 69.9     | 62.2  | 52.9      | 43.2
GraphText-ICL | label                 | original    | 26.3 | 13.7     | 5.4   | 9.8       | 21.6
GraphText-ICL | label                 | ori.+synth. | 53.0 | 37.2     | 70.3  | 64.7      | 51.4
GraphText-ICL | label+feat            | original    | 33.4 | 36.9     | 5.4   | 29.4      | 24.3
GraphText-ICL | label+feat            | ori.+synth. | 52.1 | 50.4     | 73.0  | 60.8      | 46.0
GraphText-ICL | label+feat+synth.     | original    | 64.5 | 51.0     | 73.0  | 35.3      | 48.7
GraphText-ICL | label+feat+synth.     | ori.+synth. | 68.3 | 58.6     | 75.7  | 54.9      | 51.4

[Figure 3 here: two line plots of accuracy vs. number of shots on Citeseer and Texas, comparing GCN, GAT, GCNII, GATv2, and GraphText.]

Figure 3: Few-shot in-context learning node classification accuracy. We perform 1, 3, 5, 10, 15, and
20-shot node classification on Citeseer and Texas datasets.

4.1 Training-free Graph Reasoning

One unique ability of GraphText is training-free graph reasoning via in-context learning (ICL). In this section, we demonstrate this capability on node classification tasks. Specifically, we use two citation datasets (Cora (McCallum et al., 2000) and Citeseer (Giles et al., 1998)) and three webpage datasets (Texas, Wisconsin, and Cornell (Pei et al., 2020)). A detailed discussion of the experimental settings and the dataset statistics can be found in Appendices A.1 and A.2.
We selected standard GNNs, including GCN (Kipf & Welling, 2017) and GAT (Velickovic et al., 2018), along with their more recent variants GCNII (Chen et al., 2020) and GATv2 (Brody et al., 2022), as our baselines. These GNN baselines are supervised and specific to individual graphs, trained solely for inference on one dataset. In contrast, GraphText utilizes a single pre-trained LLM (ChatGPT) for all datasets without any graph-specific training.
Given an input graph, GraphText constructs a graph-syntax tree from two types of information:
text attributes and relations. We utilize three types of text attributes for the graph: observed
labels (referred to as 'label'), features generated by K-means clustering of X (referred to as
'feat'), and synthetic text (referred to as 'synth.') derived from feature and label propagation.
Additionally, we employ two types of relations: the original graph structure (A with self-loops)
and synthetic relations based on feature similarity, shortest-path distance, and personalized
PageRank (Klicpera et al., 2019). The hyperparameter details are provided in Appendix A.3.
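To make the personalized-PageRank ingredient concrete, the sketch below computes a PPR vector by power iteration on a toy adjacency list. It is a minimal, illustrative stand-in (our own, not the paper's implementation), and the graph is invented:

```python
def personalized_pagerank(adj, seed, alpha=0.15, n_iter=50):
    """adj: dict node -> list of neighbours; seed: the restart node."""
    nodes = list(adj)
    p = {v: 0.0 for v in nodes}
    p[seed] = 1.0
    for _ in range(n_iter):
        nxt = {v: 0.0 for v in nodes}
        for v in nodes:
            out = adj[v] or [v]              # dangling node keeps its mass
            share = (1 - alpha) * p[v] / len(out)
            for u in out:
                nxt[u] += share
        nxt[seed] += alpha                   # teleport back to the seed node
        p = nxt
    return p

# Toy undirected graph as adjacency lists (illustrative only).
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
ppr = personalized_pagerank(graph, seed=0)
# Nodes ranked by relevance to the seed node.
ranked = sorted(graph, key=ppr.get, reverse=True)
```

Under such a PPR vector, the top-scoring nodes can define a synthetic relation: they become the seed node's context regardless of their hop distance.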
Experimental results are reported in Table 1 and Figure 3. We observe that directly flattening raw
text (labels) using relations, as proposed by Chen et al. (2023), yields poor performance, occasionally
worse than random. Incorporating discretized features into the text attributes improves performance
slightly.
Integrating the graph inductive bias into both text attributes and relations enhances performance.
Adding synthetic relations significantly boosts performance across all datasets, indicating that
the raw graph lacks sufficient information and benefits from augmentation. This observation aligns
with findings in the graph structure learning literature (Franceschi et al., 2019; Zhao et al.,
2021). Furthermore, including synthetic text attributes is beneficial in most cases. Ultimately,

Preprint

Table 2: Interactive graph reasoning results (accuracy %) on Cora (node #2188). The table show-
cases the performance of GPT-4 and ChatGPT before and after human interaction, over 15
evaluations. The reasoning metrics include PPR, Center-node, and instances where the model was
confused or refused (Conf./Ref.) to provide a reasoning/prediction. See Figure 2 (c) for details.
Model     Interaction   Accuracy      Reasoning
                                      PPR     Center-node   Conf./Ref.
GPT-4     Before        73.3          73.3    26.7          0
          After         100 (+26.7)   100     0             0
ChatGPT   Before        26.7          26.7    53.3          20.0
          After         63.6 (+36.9)  72.7    18.2          9.1

the combination of synthetic text attributes and synthetic relations yields the highest accuracy for
GraphText in four out of five datasets.
Remarkably, even though it is not trained on graph data, GraphText surpasses several GNN base-
lines, particularly when the label rate is low (see Figure 3) and on heterophilic datasets. This is
because, in contrast with standard GNNs, GraphText decouples the depth and scope of graph
reasoning (Zeng et al., 2021). The strong performance of GraphText in training-free graph
reasoning highlights the substantial potential of leveraging LLMs in graph machine learning.

4.2 INTERPRETABLE AND INTERACTIVE GRAPH REASONING

In this section, we illustrate that GraphText facilitates effective interactive graph reasoning:
because it generates and explains predictions in natural language, both humans and LLMs can
directly interact with it.
To illustrate this concept, we use Cora node #2188. Figure 2 (a) shows the two types of text at-
tributes we use: the center-node pseudo-labels and the PPR (Personalized PageRank) pseudo-label
sequence, where the first PPR neighbor denotes the most important label prediction. Upon examin-
ing the demonstrations (marked in blue), it becomes apparent that the PPR pseudo-labels provide a
more robust mechanism for paper topic prediction. Either counting the PPR labels and taking a
majority vote, or merely referencing the foremost PPR label, consistently yields the correct
categorization in the given examples. Hence, based on these graph inductive biases derived from
the samples, we can reasonably infer that the correct topic of the target paper is A, which is not
only the first entry but also the predominant label in the PPR pseudo-label sequence.
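Both heuristics fit in a few lines; the pseudo-label sequence below is hypothetical, chosen only to mirror the example above:

```python
from collections import Counter

# Hypothetical PPR pseudo-label sequence, ordered by descending PPR score.
ppr_labels = ["A", "A", "B", "A", "C"]

top_label = ppr_labels[0]                                   # foremost-PPR-label rule
majority_label = Counter(ppr_labels).most_common(1)[0][0]   # majority-vote rule

# In this example both heuristics agree on topic "A".
assert top_label == majority_label == "A"
```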
We leverage GraphText with ChatGPT and GPT-4 to perform graph reasoning on this example.
Their reasoning processes and outcomes are illustrated in Figure 2 and summarized in Table 2,
respectively, from which we draw several key insights:
1. LLMs inherently possess knowledge and inductive biases relevant to graph reasoning. Specif-
ically, both ChatGPT and GPT-4 acknowledge the importance of center nodes and sometimes make
predictions based on center-node labels. ChatGPT exhibits reasoning with the center-node bias
53.3% of the time, while GPT-4 does so at a rate of 26.7%.
2. LLMs can adjust their prior inductive bias based on demonstrations. Through in-context
learning, GraphText can recalibrate this bias and make more accurate predictions. Our obser-
vations indicate that GPT-4 significantly outperforms ChatGPT, achieving an accuracy of 73.3%,
markedly superior to ChatGPT's 26.7%.
3. LLMs can adapt their prior inductive bias based on human feedback. Figure 2 (b) provides
an illustrative example, with the LLM's detailed reasoning given in Appendix C. Specifically,
after human interaction, GPT-4 shows remarkable adaptability, achieving a perfect accuracy of
100% and adhering to the PPR logic. Meanwhile, ChatGPT also improves notably (gaining 36.9%
in accuracy), but occasionally retains its prior biases.


Figure 4: Ablations of graph-syntax trees. (a) An example graph. (b) GraphText text prompt
(the full example can be found in Figure 1). (c-f) Text prompts of different tree designs:
(c) rev. hierarchy, (d) w/o int. nodes, (e) sequence, (f) set.
Table 3: Ablations of GraphText on Cora, Citeseer and Texas.

                  Cora              Citeseer          Texas
Model             Acc. %   ∆        Acc. %   ∆        Acc. %   ∆
GraphText         76.5     -        58.6     -        75.7     -
rev. hierarchy    68.3     -10.7 %  57.6     -1.7 %   73.0     -3.6 %
w/o int. nodes    67.8     -11.4 %  56.3     -3.9 %   75.7     -0 %
sequence          67.0     -12.4 %  53.0     -9.6 %   70.3     -7.1 %
set               65.9     -13.9 %  56.4     -3.8 %   67.6     -10.6 %
In summary, through graph reasoning in natural language, GraphText can effectively leverage
its pre-trained knowledge to engage in graph reasoning and, crucially, adapt its existing knowledge
through demonstrations or external feedback.

4.3 ABLATION STUDIES ON GRAPH-SYNTAX TREES

The graph-syntax tree is the core design of GraphText, transforming a graph into a one-
dimensional natural language sequence. Within GraphText, the text and relational data of a graph
are first formulated, followed by the construction of a graph-syntax tree. This section delves
into ablations of various methods for building graph-syntax trees.
As shown in Figure 4, besides the proposed GraphText method for constructing a graph-syntax
tree, we present four ablation types: (1) Reverse hierarchy (denoted as rev. hierarchy in Figure 4
(c)): the tree hierarchy is inverted, positioning the relation type at the top and the text attribute
type at the bottom. (2) Without internal nodes (denoted as w/o int. nodes in Figure 4 (d)): the
internal nodes of the graph-syntax tree are removed, but the GraphText hierarchy remains intact
(the indents are kept, maintaining the hierarchical structure of the tree). (3) Sequential prompt
(denoted as sequence in Figure 4 (e)): the tree hierarchy is removed, yielding a sequence of text
attributes. (4) Set prompt (denoted as set in Figure 4 (f)): the sequence order is removed, yielding
a set.
From Table 3, several observations can be made: (1) The graph-syntax tree of GraphText con-
sistently outperforms the alternatives, underscoring the efficacy of our approach. (2) The hierar-
chical structure of the tree plays a crucial role in the design of the graph prompt. Specifically, we
observe a sharp performance drop when using a sequence or a set to represent the graph informa-
tion. Upon inspecting the LLM's reasoning, we found that it treats graph learning purely as label
counting without recognizing structure. (3) Variations in the tree hierarchy design can impact
performance; for instance, rev. hierarchy underperforms GraphText. (4) The comparison between
w/o int. nodes and GraphText reveals the importance of making LLMs aware of text attribute
types, suggesting that LLMs use these text descriptions to distinguish and understand different
attributes during graph reasoning. The only exception is the Texas dataset, where all types of
attributes are almost identical (discussed in detail in Appendix B.3).

4.4 EXPERIMENTS ON TEXT-ATTRIBUTED GRAPHS

In this section, we demonstrate that GraphText is also applicable to text-attributed graphs. As
depicted in Table 4, we conducted training-free node classification on the Cora and Citeseer datasets


Table 4: Node classification results (accuracy %) on real-world text attributed graphs. Experiments
are conducted using in-context learning with ChatGPT, as well as instruction tuning with Llama-2-
7B. Note that “text” refers to raw text attributes, while “feat” represents the continuous features on
the graph. The top results for each category are highlighted in bold.
Framework    Model                 Cora    Citeseer
GNNs         GCN                   89.13   74.92
             GAT                   89.68   75.39
GraphText    ChatGPT-text          67.77   68.98
             ChatGPT-feat          10.68   16.14
             ChatGPT-feat+text     65.19   66.46
             Llama-2-7B-text       60.59   49.37
             Llama-2-7B-feat       87.11   74.77
             Llama-2-7B-feat+text  77.53   73.83

with both raw text attributes (Chen et al., 2023) and continuous features (Reimers & Gurevych,
2019). We observed that with closed-source LLMs such as ChatGPT, performance lags behind the
GNN baselines. Thus, we further explored the potential of instruction tuning currently available
open-source LLMs, such as Llama-2 (Touvron et al., 2023). For natural language prompt
construction, we adopted an approach almost identical to the in-context learning setting. Further-
more, we expanded the original vocabulary of Llama-2 by introducing the selected options as new
tokens and then fine-tuned the model with the widely-used and efficient Low-Rank Adaptation
(LoRA) (Hu et al., 2022).
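A toy, stdlib-only stand-in for the vocabulary-expansion step: the vocabulary, token names, and dimensions here are invented, and the actual pipeline would call the Llama-2 tokenizer's token-adding routine, resize the model's token embeddings, and attach a LoRA adapter rather than use this mock:

```python
vocab = {"<s>": 0, "</s>": 1, "the": 2, "paper": 3}   # toy tokenizer vocab
options = ["<class_A>", "<class_B>", "<class_C>"]     # one new token per answer option

for tok in options:                 # mirrors adding tokens to the tokenizer
    vocab.setdefault(tok, len(vocab))

embedding_dim = 8
# Mirrors resizing the model's token-embedding matrix: fresh rows are
# appended for the new tokens and trained during instruction tuning.
embeddings = [[0.0] * embedding_dim for _ in range(len(vocab))]
```

Encoding each class as a single token lets the LLM emit an answer option in one generation step, which simplifies scoring the multiple-choice QA output.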
From the results in Table 4, it is evident that even with a relatively small open-source model,
Llama-2-7B, our best instruction-tuning results across various settings surpass those of Chat-
GPT and approach the GNN baselines. This validates that our method is beneficial in an
instruction-tuning scenario. It also implies that with GraphText, we can feasibly fine-tune
smaller open-source LLMs at reasonable computational cost, achieving performance that rivals
or even surpasses much larger closed-source models, such as ChatGPT or GPT-4.
Another intriguing observation is the notably poor performance of ChatGPT in settings incorpo-
rating continuous features, which approaches random guessing. This is attributable to an inherent
limitation of closed-source LLMs: they are designed to process raw discrete text inputs and cannot
directly handle continuous inputs. In contrast, open-source LLMs can map these continuous
embeddings into their token embedding space, facilitating improved performance.
Contrasting these two groups of models, we noticed a decline in the performance of open-source
models when processing raw text inputs. This decline can be ascribed to the constraints imposed
by the model's parameter count and the volume of its pre-training corpus. It suggests that
harnessing larger open-source models, such as the 13B, 30B, and 70B Llama-2 variants, would
substantially bolster their capacity to model raw text. Concurrently, by leveraging their ability to
process continuous embeddings, these models could exhibit enhanced graph reasoning capabilities,
paving the way for more sophisticated graph-based applications.

5 CONCLUSION

In this paper, we propose GraphText, a framework that enables graph reasoning in text space. It
easily incorporates the inductive bias of GNNs by constructing a graph-syntax tree; traversing this
tree yields a graph prompt in natural language, which is fed to an LLM to perform graph reasoning
as text generation. GraphText enables training-free graph reasoning, where a GraphText-LLM
can deliver performance on par with, or even surpassing, supervised graph neural networks through
in-context learning. Moreover, GraphText fosters explainable and interactive graph reasoning:
because it reasons in natural language, both humans and LLMs can engage with graph learning
using natural language. These abilities highlight the immense and largely untapped potential of
LLMs in the realm of graph machine learning.


ETHICS STATEMENT
Graphs are prevalent in the real world. On the bright side, GraphText alleviates the computa-
tional load and carbon footprint associated with training numerous non-transferable, graph-specific
models. However, while the training-free graph reasoning capability of GraphText introduces
minimal costs, there is potential for misuse in malicious recommendation systems and malware.

REFERENCES
Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos,
Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, Eric Chu, Jonathan H. Clark,
Laurent El Shafey, Yanping Huang, Kathy Meier-Hellstern, Gaurav Mishra, Erica Moreira, Mark
Omernick, Kevin Robinson, Sebastian Ruder, Yi Tay, Kefan Xiao, Yuanzhong Xu, Yujing Zhang,
Gustavo Hernández Ábrego, Junwhan Ahn, Jacob Austin, Paul Barham, Jan A. Botha, James
Bradbury, Siddhartha Brahma, Kevin Brooks, Michele Catasta, Yong Cheng, Colin Cherry,
Christopher A. Choquette-Choo, Aakanksha Chowdhery, Clément Crepy, Shachi Dave, Mostafa
Dehghani, Sunipa Dev, Jacob Devlin, Mark Díaz, Nan Du, Ethan Dyer, Vladimir Feinberg, Fangx-
iaoyu Feng, Vlad Fienber, Markus Freitag, Xavier Garcia, Sebastian Gehrmann, Lucas Gonzalez,
and et al. Palm 2 technical report. CoRR, 2023.
Aleksandar Bojchevski, Johannes Klicpera, Bryan Perozzi, Amol Kapoor, Martin Blais, Benedek
Rózemberczki, Michal Lukasik, and Stephan Günnemann. Scaling graph neural networks with
approximate pagerank. In KDD, 2020.
Shaked Brody, Uri Alon, and Eran Yahav. How attentive are graph attention networks? In ICLR,
2022.
Tom B Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal,
Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are
few-shot learners. arXiv preprint arXiv:2005.14165, 2020.
Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Ka-
mar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott M. Lundberg, Harsha Nori, Hamid Palangi,
Marco Túlio Ribeiro, and Yi Zhang. Sparks of artificial general intelligence: Early experiments
with GPT-4. CoRR, 2023.
Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Pondé de Oliveira Pinto, Jared
Kaplan, Harrison Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri,
Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan,
Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian,
Clemens Winter, Philippe Tillet, Felipe Petroski Such, Dave Cummings, Matthias Plappert, Fo-
tios Chantzis, Elizabeth Barnes, Ariel Herbert-Voss, William Hebgen Guss, Alex Nichol, Alex
Paino, Nikolas Tezak, Jie Tang, Igor Babuschkin, Suchir Balaji, Shantanu Jain, William Saunders,
Christopher Hesse, Andrew N. Carr, Jan Leike, Joshua Achiam, Vedant Misra, Evan Morikawa,
Alec Radford, Matthew Knight, Miles Brundage, Mira Murati, Katie Mayer, Peter Welinder, Bob
McGrew, Dario Amodei, Sam McCandlish, Ilya Sutskever, and Wojciech Zaremba. Evaluating
large language models trained on code. CoRR, 2021.
Ming Chen, Zhewei Wei, Zengfeng Huang, Bolin Ding, and Yaliang Li. Simple and deep graph
convolutional networks. In ICML, 2020.
Zhikai Chen, Haitao Mao, Hang Li, Wei Jin, Hongzhi Wen, Xiaochi Wei, Shuaiqiang Wang, Dawei
Yin, Wenqi Fan, Hui Liu, and Jiliang Tang. Exploring the potential of large language models
(llms) in learning on graphs. CoRR, 2023.
Eli Chien, Jianhao Peng, Pan Li, and Olgica Milenkovic. Adaptive universal generalized pagerank
graph neural network. In ICLR, 2021.
Eli Chien, Wei-Cheng Chang, Cho-Jui Hsieh, Hsiang-Fu Yu, Jiong Zhang, Olgica Milenkovic, and
Inderjit S. Dhillon. Node feature extraction by self-supervised multi-scale neighborhood predic-
tion. In ICLR, 2022.


Ian Chiswell and Wilfrid Hodges. Mathematical logic, volume 3 of Oxford texts in logic. Clarendon
Press, 2007. ISBN 978-0-19-921562-1.
David Dohan, Winnie Xu, Aitor Lewkowycz, Jacob Austin, David Bieber, Raphael Gontijo Lopes,
Yuhuai Wu, Henryk Michalewski, Rif A. Saurous, Jascha Sohl-Dickstein, Kevin Murphy, and
Charles Sutton. Language model cascades. CoRR, 2022.
Vijay Prakash Dwivedi, Anh Tuan Luu, Thomas Laurent, Yoshua Bengio, and Xavier Bresson.
Graph neural networks with learnable structural and positional representations. In ICLR, 2022.
Luca Franceschi, Mathias Niepert, Massimiliano Pontil, and Xiao He. Learning discrete structures
for graph neural networks. In Kamalika Chaudhuri and Ruslan Salakhutdinov (eds.), Proceedings
of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long
Beach, California, USA, Proceedings of Machine Learning Research, 2019.
Yao Fu, Hao Peng, Ashish Sabharwal, Peter Clark, and Tushar Khot. Complexity-based prompting
for multi-step reasoning. In ICLR, 2023.
C Lee Giles, Kurt D Bollacker, and Steve Lawrence. Citeseer: An automatic citation indexing
system. In Proceedings of the third ACM conference on Digital libraries, pp. 89–98, 1998.
Jiayan Guo, Lun Du, Hengyu Liu, Mengyu Zhou, Xinyi He, and Shi Han. Gpt4graph: Can large
language models understand graph structured data ? an empirical evaluation and benchmarking.
CoRR, abs/2305.15066, 2023. doi: 10.48550/arXiv.2305.15066. URL [Link].
William L. Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large
graphs. In NeurIPS, 2017.
Xiaoxin He, Xavier Bresson, Thomas Laurent, and Bryan Hooi. Explanations as features: Llm-
based features for text-attributed graphs. CoRR, abs/2305.19523, 2023. doi: 10.48550/arXiv.2305.19523.
URL [Link]
Sirui Hong, Xiawu Zheng, Jonathan Chen, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang,
Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, Chenyu Ran, Lingfeng Xiao, and Chenglin Wu.
Metagpt: Meta programming for multi-agent collaborative framework, 2023.
Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang,
and Weizhu Chen. Lora: Low-rank adaptation of large language models. In ICLR, 2022.
Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional net-
works. In ICLR, 2017.
Johannes Klicpera, Aleksandar Bojchevski, and Stephan Günnemann. Predict then propagate:
Graph neural networks meet personalized pagerank. In ICLR, 2019.
Devin Kreuzer, Dominique Beaini, William L. Hamilton, Vincent Létourneau, and Prudencio
Tossou. Rethinking graph transformers with spectral attention. In NeurIPS, 2021.
Chaozhuo Li, Bochen Pang, Yuming Liu, Hao Sun, Zheng Liu, Xing Xie, Tianqi Yang, Yanling
Cui, Liangjie Zhang, and Qi Zhang. Adsgnn: Behavior-graph augmented relevance modeling in
sponsored search. In Fernando Diaz, Chirag Shah, Torsten Suel, Pablo Castells, Rosie Jones, and
Tetsuya Sakai (eds.), SIGIR, 2021.
Guohao Li, Hasan Abed Al Kader Hammoud, Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem.
CAMEL: communicative agents for "mind" exploration of large scale language model society.
CoRR, 2023.
Jacky Liang, Wenlong Huang, Fei Xia, Peng Xu, Karol Hausman, Brian Ichter, Pete Florence, and
Andy Zeng. Code as policies: Language model programs for embodied control. In ICRA, 2023.
Pengfei Liu, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi, and Graham Neubig. Pre-
train, prompt, and predict: A systematic survey of prompting methods in natural language pro-
cessing. ACM Comput. Surv., 2023.


Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In 7th International
Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019.
[Link], 2019. URL [Link]
Andrew Kachites McCallum, Kamal Nigam, Jason Rennie, and Kristie Seymore. Automating the
construction of internet portals with machine learning. Information Retrieval, 3:127–163, 2000.
OpenAI. Introducing chatgpt. 2023. URL [Link]
Joon Sung Park, Joseph C. O’Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, and
Michael S. Bernstein. Generative agents: Interactive simulacra of human behavior. CoRR, 2023.
Hongbin Pei, Bingzhe Wei, Kevin Chen-Chuan Chang, Yu Lei, and Bo Yang. Geom-gcn: Geometric
graph convolutional networks. In ICLR, 2020.
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi
Zhou, Wei Li, and Peter J. Liu. Exploring the limits of transfer learning with a unified text-to-text
transformer. J. Mach. Learn. Res., 2020.
Jeff Rasley, Samyam Rajbhandari, Olatunji Ruwase, and Yuxiong He. Deepspeed: System op-
timizations enable training deep learning models with over 100 billion parameters. In Ra-
jesh Gupta, Yan Liu, Jiliang Tang, and B. Aditya Prakash (eds.), KDD ’20: The 26th ACM
SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, CA, USA, Au-
gust 23-27, 2020, pp. 3505–3506. ACM, 2020. doi: 10.1145/3394486.3406703. URL [Link].
Nils Reimers and Iryna Gurevych. Sentence-bert: Sentence embeddings using siamese bert-
networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language
Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-
IJCNLP), pp. 3982–3992, 2019.
Joshua Robinson and David Wingate. Leveraging large language models for multiple choice ques-
tion answering. In The Eleventh International Conference on Learning Representations, ICLR
2023, Kigali, Rwanda, May 1-5, 2023. [Link], 2023. URL [Link].
Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer,
Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to
use tools. CoRR, 2023.
Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Niko-
lay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher,
Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy
Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn,
Saghar Hosseini, Rui Hou, Hakan Inan, Marcin Kardas, Viktor Kerkez, Madian Khabsa, Isabel
Kloumann, Artem Korenev, Punit Singh Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee,
Diana Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor Mihaylov, Pushkar Mishra,
Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi,
Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian, Xiaoqing Ellen Tan, Binh
Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen
Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurelien Rodriguez, Robert Stojnic,
Sergey Edunov, and Thomas Scialom. Llama 2: Open foundation and fine-tuned chat models.
2023.
Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua
Bengio. Graph attention networks. In ICLR, 2018.
Heng Wang, Shangbin Feng, Tianxing He, Zhaoxuan Tan, Xiaochuang Han, and Yulia Tsvetkov.
Can language models solve graph problems in natural language? CoRR, 2023a.
Wenhai Wang, Zhe Chen, Xiaokang Chen, Jiannan Wu, Xizhou Zhu, Gang Zeng, Ping Luo, Tong
Lu, Jie Zhou, Yu Qiao, and Jifeng Dai. Visionllm: Large language model is also an open-ended de-
coder for vision-centric tasks. CoRR, abs/2305.11175, 2023b. doi: 10.48550/arXiv.2305.11175.
URL [Link]


Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi,
Quoc V. Le, and Denny Zhou. Chain-of-thought prompting elicits reasoning in large language
models. In NeurIPS, 2022.
Felix Wu, Amauri H. Souza Jr., Tianyi Zhang, Christopher Fifty, Tao Yu, and Kilian Q. Weinberger.
Simplifying graph convolutional networks. In Kamalika Chaudhuri and Ruslan Salakhutdinov
(eds.), ICML, 2019.
Keyulu Xu, Chengtao Li, Yonglong Tian, Tomohiro Sonobe, Ken-ichi Kawarabayashi, and Stefanie
Jegelka. Representation learning on graphs with jumping knowledge networks. In ICML, 2018.
Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural
networks? In ICLR, 2019.
Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, and Karthik
Narasimhan. Tree of thoughts: Deliberate problem solving with large language models. CoRR,
2023a.
Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R. Narasimhan, and Yuan Cao.
React: Synergizing reasoning and acting in language models. In ICLR, 2023b.
Ruosong Ye, Caiqi Zhang, Runhui Wang, Shuyuan Xu, and Yongfeng Zhang. Natural language
is all a graph needs. CoRR, abs/2308.07134, 2023. doi: 10.48550/arXiv.2308.07134. URL
[Link]
Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, and
Tie-Yan Liu. Do transformers really perform bad for graph representation? In NeurIPS, 2021.
Hanqing Zeng, Muhan Zhang, Yinglong Xia, Ajitesh Srivastava, Andrey Malevich, Rajgopal Kan-
nan, Viktor K. Prasanna, Long Jin, and Ren Chen. Decoupling the depth and scope of graph
neural networks. In Marc’Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang,
and Jennifer Wortman Vaughan (eds.), Advances in Neural Information Processing Systems 34:
Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December
6-14, 2021, virtual, pp. 19665–19679, 2021. URL [Link].
Wentao Zhang, Ziqi Yin, Zeang Sheng, Yang Li, Wen Ouyang, Xiaosen Li, Yangyu Tao, Zhi Yang,
and Bin Cui. Graph attention multi-layer perceptron. In KDD, 2022.
Jianan Zhao, Meng Qu, Chaozhuo Li, Hao Yan, Qian Liu, Rui Li, Xing Xie, and Jian Tang. Learning
on large-scale text-attributed graphs via variational inference. 2023.
Tong Zhao, Yozen Liu, Leonardo Neves, Oliver J. Woodford, Meng Jiang, and Neil Shah. Data
augmentation for graph neural networks. In AAAI, 2021.


REPRODUCIBILITY STATEMENT

The code to reproduce our results will be made available soon. Our experimental settings and
implementation details are stated in Appendix A.1, and the important hyperparameters are
discussed in Appendix A.3.

A EXPERIMENTAL SETTINGS

A.1 IMPLEMENTATION DETAILS

In our experiments, node classification is approached as a multiple-choice QA task, and we em-
ploy the prompt detailed in Appendix B. The raw text attributes can be directly leveraged to form
the textual information F. Additionally, we utilize the dataset metadata to extract raw text labels,
creating a textual feature for each node. Nodes without data are assigned the value "NA". Con-
sequently, every general graph can be perceived as a text-attributed graph with at least one
type of text attribute, namely the label. For continuous attributes, we use K-means to discretize
continuous features (with K set to the number of classes).
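A rough, stdlib-only sketch of this discretization step (our own illustration; the data, K, and K-means details are not the paper's):

```python
import random

def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's algorithm on tuples; returns a cluster id per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            groups[i].append(p)
        centers = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return [min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points]

# Toy continuous node features; K would be the number of classes.
feats = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.1, 4.9)]
ids = kmeans(feats, k=2)
# Nodes close in feature space share a discrete id (the 'feat' attribute).
```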
During the in-context learning experiments, to prevent meaningless demonstrations that lack neighboring labels, we choose the sample with the highest degree from each label set. For the instruction tuning experiments using open-source LLMs, we can leverage continuous attributes in a more flexible way. Specifically, inspired by multi-modality LLMs (Wang et al., 2023b), we use a Multi-Layer Perceptron (MLP) projector to map continuous features into the input text space, i.e., the token embedding space of LLaMA-2 (Touvron et al., 2023). We use AdamW (Loshchilov & Hutter, 2019) in conjunction with DeepSpeed (Rasley et al., 2020) to train the Hugging Face LLaMA-2-7B model 2 , with FP16 activated.
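A shape-level sketch of this projector is shown below. Only the mapping from the feature space to the LLaMA-2 token-embedding space is stated in the text; the two-layer depth and the hidden width of 256 are our illustrative assumptions, and in practice the weights are trained jointly during instruction tuning rather than drawn at random:

```python
import numpy as np

def mlp_projector(x, W1, b1, W2, b2):
    """Two-layer MLP mapping a continuous node feature vector to one
    pseudo-token embedding in the LLM's input space (depth/width are
    illustrative assumptions)."""
    h = np.maximum(0.0, x @ W1 + b1)  # ReLU hidden layer
    return h @ W2 + b2

rng = np.random.default_rng(0)
feat_dim, hidden_dim, emb_dim = 1433, 256, 4096  # Cora features -> LLaMA-2-7B embedding size
W1 = 0.02 * rng.standard_normal((feat_dim, hidden_dim)); b1 = np.zeros(hidden_dim)
W2 = 0.02 * rng.standard_normal((hidden_dim, emb_dim)); b2 = np.zeros(emb_dim)

x = rng.standard_normal(feat_dim)        # one node's continuous feature vector
emb = mlp_projector(x, W1, b1, W2, b2)
print(emb.shape)  # (4096,)
```

The resulting vector is spliced into the prompt where the “<x emb>” placeholder appears (see Appendix B.2).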

A.2 DATASETS

Table 5: The statistics of the datasets.


Benchmarks #Nodes #Edges #Classes #Features #Train #Validation #Test
Cora 2708 5278 7 1433 140 500 1000
Cora-TAG 2708 5278 7 1433 1624 541 543
Citeseer 3327 4552 6 3703 120 500 1000
Citeseer-TAG 3327 4552 6 3703 1911 637 638
Cornell 183 298 5 1703 87 59 37
Texas 183 325 5 1703 87 59 37
Wisconsin 251 515 5 1703 120 80 51

In this section, we provide more details about the datasets used in our experiments. The dataset statistics are provided in Table 5. The datasets can be categorized into citation networks (i.e., Cora, Citeseer, and ogbn-arxiv) and web-page networks (i.e., Cornell, Texas, and Wisconsin). Additionally, we use text-attributed-graph (TAG) versions of Cora and Citeseer, denoted as Cora-TAG and Citeseer-TAG.
Citation graphs: Most GNN-related studies, as referenced in works like Kipf & Welling (2017); Velickovic et al. (2018), often employ citation networks as benchmarks. Within these networks, nodes represent papers from the computer science domain. The features of these nodes are derived from bag-of-words vectors of the respective paper titles. Edges depict the citation links between these papers, while labels indicate the papers' specific categories. The text attributes are the title and abstract of each paper.
WebKB graphs (Pei et al., 2020): Sourced from Carnegie Mellon University, this collection aggregates web pages from computer science departments across several universities. We employ three specific subsets from this collection: Cornell, Texas, and Wisconsin. In these subsets, each node symbolizes a
2
[Link]


web page, while edges denote hyperlinks connecting them. Nodes are characterized by a bag-of-words representation derived from their respective web pages. These pages are categorized into five distinct classes: student, project, course, staff, and faculty.
The datasets mentioned above can be found in the following URLs: Cora 3 , Citeseer 4 , Cora-TAG 5 ,
Citeseer-TAG 6 , Texas 7 , Cornell 8 , Wisconsin 9 .

A.3 HYPERPARAMETERS

In GraphText, the selection of the text attributes F and relations R is the most important hyperparameter choice. Here we discuss these choices and report the selected settings in Table 6. For the text attributes F , there are several options: propagated features Ak X, propagated labels Ak YL , raw features X, and labels YL . For the relations R, there are several options: the k-hop shortest-path distance, denoted as Sk ; the propagated feature similarity, denoted as sim(Ak X); and the personalized PageRank (PPR) matrix (Klicpera et al., 2019) with restart probability α = 0.25, denoted as Π.
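For reference, the PPR relation Π admits a closed form following Klicpera et al. (2019). The sketch below uses the symmetrically normalized adjacency with self-loops, which is our assumption of the standard choice; only α = 0.25 is stated in the text:

```python
import numpy as np

def ppr_matrix(A, alpha=0.25):
    """Closed-form personalized PageRank: Pi = alpha * (I - (1 - alpha) * A_hat)^-1,
    where A_hat is the symmetrically normalized adjacency with self-loops."""
    n = A.shape[0]
    A_loop = A + np.eye(n)                       # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_loop.sum(axis=1))
    A_hat = A_loop * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return alpha * np.linalg.inv(np.eye(n) - (1.0 - alpha) * A_hat)

# Toy path graph 0-1-2-3: each row of Pi scores every node's relevance to that
# row's center node; sorting a row yields the PPR-ordered neighbor list used
# in the prompts.
A = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.]])
Pi = ppr_matrix(A)
print(Pi[0].argsort()[::-1])  # [0 1 2 3]: relevance decays with distance
```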

Table 6: GraphText in-context learning hyperparameters.


Text Attributes Relations
Cora A2 YL ,A3 YL S0 ,Π, sim(A2 X),sim(A3 X)
Citeseer X, A3 YL S0 ,S2 ,Π, sim(A2 X)
Texas A2 YL S0 ,S2
Wisconsin YL S0 ,sim(X), sim(AX)
Cornell A2 YL S0 ,sim(X), sim(AX)

B PROMPT EXAMPLES
B.1 FEW-SHOT IN-CONTEXT LEARNING

Example of Citeseer:
[Human]: You are a helpful assistant that classifies the topic of an academic paper
based on the labels of the cited papers. You are going to choose the correct answer from
several choices of paper categories: [A: Agents, B: Artificial Intelligence, C: Database, D:
Information Retrieval, E: Machine Learning, F: Human Computer Interaction]

Here are a few examples:


<information>
<third-order pseudo labels>
<center node>[’A’]</center node>
<1st feature similarity graph>[’A’, ’A’, ’A’]</1st feature similarity graph>
<ppr>[’A’, ’B’, ’A’]</ppr>
</third-order pseudo labels>
</information>
<question>What’s the topic of academic paper given the information above?</question>
<answer>A</answer>

Remaining examples . . .

3
[Link]
4
[Link]
5
[Link]
6
[Link]
7
[Link]
8
[Link]
9
[Link]


Now let’s answer the question below:


<information>
<third-order pseudo labels>
<1st feature similarity graph>[’C’, ’B’, ’B’]</1st feature similarity graph>
<ppr>[’C’]</ppr>
</third-order pseudo labels>
</information>

What’s the topic of the paper given the information above? Valid choices are [A: Agents,
B: Artificial Intelligence, C: Database, D: Information Retrieval, E: Machine Learning, F:
Human computer interaction]. Remember, your answer should be in the form of the class
choice wrapped by <answer></answer>.

[Assistant]: <answer>C</answer>

B.2 INSTRUCTION TUNING

Example of Cora:
[Human]: Your goal is to perform node classification. You are given the information of
each node in a xml format. Using the given information of a node, you need to classify
the node to several choices: [<c0>: Rule Learning, <c1>: Neural Networks, <c2>:
Case Based, <c3>: Genetic Algorithms, <c4>: Theory, <c5>: Reinforcement Learning,
<c6>: Probabilistic Methods]. Remember, your answer should be in the form of the class
label.
<information>
<feature>
<center node><x><x emb></x></center node>
<1st feature similarity graph><x><x emb></x></1st feature similarity graph>
</feature>
</information>

[Assistant]: The answer is: <c6>

Note that “<x emb>” is the text token embedding for the feature “x”, generated by the MLP projector discussed in Appendix A.1.

B.3 EXAMPLES OF TEXAS

Node # 132, 136, 143, . . .


Graph information:
pseudo labels:
center-node:[’D’]
second-hop neighbor:[’D’, ’D’, ’D’, ’D’, ’D’]
Target class: D

Node # 30
Graph information:
pseudo labels:
center-node:[’D’]
second-hop neighbor:[’D’, ’E’, ’D’, ’D’, ’D’]
Target class: D

Node # 158
Graph information:
pseudo labels:


center-node:[’A’]
second-hop neighbor:[’A’]
Target class: A

We can observe that for the best setting on the Texas dataset, with the hyperparameters listed in Table 6, the center-node pseudo labels mostly resemble those of the second-hop neighbors. Consequently, removing the text information, i.e., removing the internal nodes of the graph-syntax tree in Section 4.3, does not hurt performance.
This also shows the advantage of decoupling depth and scope (Zeng et al., 2021) in the graph-syntax tree of GraphText, which explains the performance gain of GraphText over standard GNNs, e.g., GCN and GAT. A similar observation is drawn in Figure 8 (i) of Chien et al. (2021), where A2 serves as the most important high-order aggregation scheme for the Texas dataset.
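The second-hop scope used above can be extracted with a plain BFS that keeps only nodes at shortest-path distance exactly k; this is a minimal reader-side sketch (the adjacency-list format is illustrative) of how decoupling depth and scope lets the graph-syntax tree include exactly the k-th ring rather than everything within k hops:

```python
from collections import deque

def k_hop_neighbors(adj, src, k):
    """Return nodes at shortest-path distance exactly k from src via BFS
    (the 'second-hop neighbor' scope above corresponds to k = 2)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:          # first visit gives the shortest distance
                dist[v] = dist[u] + 1
                queue.append(v)
    return sorted(n for n, d in dist.items() if d == k)

# Toy graph with edges 0-1, 1-2, 1-3, 3-4.
adj = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1, 4], 4: [3]}
print(k_hop_neighbors(adj, 0, 2))  # [2, 3]
```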

C INTERACTIVE GRAPH REASONING


Since GraphText facilitates graph learning within a textual domain, it allows direct interaction with both humans and AI agents. In this section, we highlight the interactive graph reasoning capabilities of GraphText using a practical example. First, we demonstrate how GraphText can engage in self-interaction via zero-shot chain-of-thought reasoning. Following that, we illustrate how human interactions can guide GraphText to refine its graph reasoning approach.

C.1 ZERO-SHOT CHAIN OF THOUGHT REASONING

Below is an example of graph reasoning on Cora node #2188 in the setting of standard zero-shot chain-of-thought reasoning (Wei et al., 2022).10 The input prompt for Cora node #2188 is as follows:

Input Prompt Cora node #2188:


[Human]: You are a helpful assistant that classifies the topic of an academic
paper based on the labels of the cited papers. You are going to choose the correct answer
from several choices of paper categories:[A: Theory, B: Reinforcement Learning, C:
Genetic Algorithm, D: Neural Network, E: Probabilistic Method, F: Case Based, G: Rule
Learning]
Here are a few examples:
Graph information:
pseudo labels:
center-node:[’F’]
ppr:[’A’, ’A’, ’A’, ’A’]
Topic of paper: A

Graph information:
pseudo labels:
center-node:[’B’]
ppr:[’B’, ’B’, ’B’, ’B’]
Topic of paper: B

Graph information:
pseudo labels:
center-node:[’C’]
ppr:[’C’, ’C’, ’C’, ’C’]
Topic of paper: C

Graph information:

10
ChatGPT and GPT-4 perform reasoning with explanation by default. Therefore, there is no necessity to
add the additional “Let’s think step-by-step” prompt for the text input.


pseudo labels:
center-node:[’D’]
ppr:[’D’, ’D’, ’E’, ’D’]
Topic of paper: D

Graph information:
pseudo labels:
center-node:[’A’]
ppr:[’E’, ’E’, ’A’, ’A’]
Topic of paper: E

Graph information:
pseudo labels:
center-node:[’F’]
ppr:[’F’, ’F’, ’F’, ’F’]
Topic of paper: F

Graph information:
pseudo labels:
center-node:[’G’]
ppr:[’G’, ’G’, ’G’, ’G’]
Topic of paper: G

Now let’s answer the question below:


Graph information:
pseudo labels:
center-node:[’G’]
ppr:[’A’, ’G’, ’A’, ’E’]
What’s the topic of the paper given the information above? Valid choices are [A: Theory,
B: Reinforcement Learning, C: Genetic Algorithm, D: Neural Network, E: Probabilistic
Method, F: Case Based, G: Rule Learning]

As discussed in Section 4.2, according to the demonstrations, the PPR pseudo labels provide a more robust mechanism for paper topic prediction: either counting the PPR labels and taking a majority vote, or merely referencing the foremost PPR label, consistently yields the correct categorization in the given examples. Hence, based on these graph inductive biases derived from the samples, we can reasonably infer that the correct topic of the target paper should be A, which is not only the first entry but also the predominant label in the PPR pseudo-label sequence. Now, let's use ChatGPT and GPT-4 to answer the question. We performed 15 evaluations, and below we show one sample each of correct and incorrect responses:

Correct Response without Human Interaction


[ChatGPT]: Among the PPR labels, ’A’ appears twice, ’G’ appears once, and ’E’ appears
once. ’A’ is the most frequent label, so we classify the topic of the paper as ’A.’
Therefore, the topic of the paper is A: Theory.

[GPT-4]: Given the examples provided:


1. Whenever the center-node label is consistent with the majority of the PPR pseudo labels,
the topic of the paper matches the center-node label.
2. However, when there’s a discrepancy between the center-node label and the majority of
the PPR labels, the topic of the paper is typically determined by the first (most crucial) label
in the PPR list.
Given the graph information provided:
- The center node is labeled ’G’ which corresponds to ’Rule Learning’.
- The ’ppr’ nodes are labeled ’A’, ’G’, ’A’, and ’E’, which corresponds to ’Theory’, ’Rule
Learning’, ’Theory’, and ’Probabilistic Method’ respectively.


It seems that while the center node is ’G’, the cited papers are mostly of category ’A’ and
’E’.
From the pattern observed in the examples, the category that has the majority or at least
equal emphasis with other categories in the ’ppr’ nodes tends to be the category of the paper.
Therefore, in this case, ’A’ or ’Theory’ has the highest count in the ’ppr’ nodes.
Thus, the topic of the paper given the information above is: A: Theory.

Wrong Response without Human Interaction


[ChatGPT]: Based on the labels in the ”ppr” list, we can see that ’A’ (Theory), ’G’ (Rule
Learning), and ’E’ (Probabilistic Method) are mentioned. However, the ’center-node’ label
is ’G’ (Rule Learning), which indicates a stronger association with ”Rule Learning.” So, the
topic of the paper is most likely ”Rule Learning (G).”

[GPT-4]: Given the graph information: pseudo labels: center-node:[’G’] ppr:[’A’, ’G’, ’A’,
’E’]
The center node label is ’G’, which corresponds to ”Rule Learning”. The papers referenced
(ppr) are mixed between ”Theory”, ”Rule Learning”, and ”Probabilistic Method”. However,
since the center node holds the primary topic label, the academic paper’s topic is: G: Rule
Learning.

From the analysis above, it is clear that by recasting the graph learning challenge into problems that both humans and LLMs can interpret and reason with, GraphText enables LLMs to perform graph reasoning. For instance, ChatGPT can deduce the right answer through label counting, as highlighted in green. Impressively, GPT-4 discerns the correct solution by employing the personalized PageRank (PPR) prior, as illustrated in blue, and identifies the first PPR neighbor as the most important factor for prediction.
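The two strategies the models verbalize, a majority vote with PPR-rank tie-breaking and trusting the foremost PPR label, can be made concrete in a few lines. This is a reader-side sketch of the inductive bias, not part of the GraphText pipeline, and the helper names are our own:

```python
from collections import Counter

def majority_vote(ppr_labels):
    """Majority vote over the PPR pseudo labels; ties are broken by PPR
    rank, i.e., the earliest (most important) label wins."""
    counts = Counter(ppr_labels)
    best = max(counts.values())
    return next(label for label in ppr_labels if counts[label] == best)

def first_ppr_label(ppr_labels):
    """Simply trust the foremost (highest-ranked) PPR label."""
    return ppr_labels[0]

# Target node (Cora #2188): both heuristics agree on 'A'.
print(majority_vote(['A', 'G', 'A', 'E']), first_ppr_label(['A', 'G', 'A', 'E']))  # A A
```

Note that the tie-breaking rule also reproduces the demonstration where ppr:['E', 'E', 'A', 'A'] is labeled E.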
However, the lucid explanations rendered in natural language allowed us to pinpoint certain inadequacies in their graph reasoning approach:
Failure to Retrieve Prior Knowledge: Table 4.1 shows that LLMs, particularly ChatGPT, do not always effectively recall their prior understanding of graphs; only 26.7% of ChatGPT's reasoning relies on PPR.
Persistence in Prior Belief: Despite the inherent reasoning prowess of LLMs, they can sometimes be hindered by preconceived notions about graphs. A detailed examination of the results in Table 4.1, together with the reasoning explanations above, indicates that a deep-seated bias, namely viewing the central node as the most important (denoted in red), is a primary cause of mistakes.
In the following section, we interact with GraphText to enhance its graph reasoning capabilities.

C.2 GRAPH REASONING WITH HUMAN INTERACTION

As highlighted in the preceding section, LLMs occasionally struggle to recall prior knowledge and remain anchored to their pre-existing beliefs. To address this, we interact with GraphText by (1) providing a clear definition of PPR, and (2) explicitly instructing the model to assess the reliability of both the center-node pseudo labels and the PPR pseudo labels. This leads to the following message:

Human Interaction Note that PPR pseudo labels represent a sequence of labels ranked by
importance, with the first one being the most crucial. Let’s evaluate the reliability of center
node pseudo labels against PPR pseudo labels and then make another prediction.

With the interaction above, LLMs refined their answers in most cases; below are some examples:

Correct Reasoning after Human Interaction


[ChatGPT]: In this PPR list, the labels ’A’ and ’G’ are present, with ’A’ appearing before


’G,’ which means ’A’ is considered more crucial in the context of this paper. Now, let’s
make a prediction based on this evaluation: Since ’A’ is the most crucial label in the PPR
list, we will prioritize it for classifying the topic of the paper. Therefore, the paper’s primary
topic is ”A: Theory” (Category A). The presence of ’G’ in the PPR list indicates some
relevance to ”Rule Learning” (Category G), but because it comes after ’A,’ it is considered
less important in this context. So, the revised prediction is that the primary topic of the
paper is ”A: Theory.”

[GPT-4]: Given the new information that PPR pseudo labels represent a sequence of labels
ranked by importance, let’s re-evaluate the provided data:
The center node suggests the paper is about ’G’ (Rule Learning). However, considering the
prominence of ’A’ (Theory) in the ranked sequence, and since it’s the most crucial label, the
paper might be primarily focused on theoretical aspects related to Rule Learning. Consider-
ing both the center node and the importance sequence, the topic of the paper should be: A:
Theory
Refusal after Human Interaction
[ChatGPT]: Certainly, let’s evaluate the reliability of center node pseudo labels against PPR
pseudo labels.
In the provided graph information: The center-node label is ’G,’ which suggests that the
most important label for this paper is ’G.’ However, when we consider the PPR labels, we
see that ’A’ appears as the first and third label, which might indicate some importance as
well.
Let’s consider both the center-node and the PPR labels for our prediction. Given that ’G’ is
the center-node label and ’A’ appears in the PPR labels, we can prioritize the center-node
label but also consider the presence of ’A’ in the PPR labels.
So, the prediction for the topic of the paper can be: Topic of paper: G (with some influence
from A).
Confusion after Human Interaction
[ChatGPT]: Based on this information, we can classify the topic of the paper as a combination of the most frequent labels among the neighboring papers, which are 'A' and 'G'. Therefore, the topic of the paper is likely a combination of topics 'A' and 'G'.
So, the answer is a combination of two topics, and the closest match among the given options is:
Option: F: Case Based # Note that F does not appear in the pseudo labels of the question

The provided examples, along with the consolidated findings in Table 4.1, compellingly show that both ChatGPT and GPT-4 can adjust their pre-existing biases about graphs when given human feedback. Notably, after this interaction, GPT-4 delivers a flawless accuracy rate of 100%, consistently following the PPR logic. Meanwhile, ChatGPT also sees a significant performance boost, with an accuracy improvement of 36.9%. However, as evidenced in the examples, ChatGPT occasionally refuses to update its predictions or becomes confused.

D LIMITATIONS AND FUTURE WORK

While the GraphText framework offers notable advantages, there is ample room for enhancement and for the exploration of new applications.
One primary concern not fully addressed in this paper is how to discretize continuous features. As evident from Table 6, most optimal settings are label-based, which makes GraphText resemble a label propagation model, with Citeseer being the only exception. We posit that this trend might be attributed to two main factors: (1) the ineffectiveness of the discretization process, and (2) the discord between the feature and label spaces, which makes reasoning challenging for LLMs.
Additionally, the design space of the graph-syntax tree is extensive and often requires either expert knowledge or hyperparameter optimization. Although GraphText boasts flexibility and generality, crafting the text-attribute set F and relation set R, and determining their combinations, result
in a vast search space. However, given that GraphText operates in a training-free paradigm, hyperparameter optimization can be swift.
Notwithstanding its constraints, GraphText opens avenues for novel research. Chiefly, it sets the stage for graph reasoning in natural language. Emerging advances in the LLM realm can potentially be integrated into the graph ML domain, including multi-step reasoning (Yao et al., 2023a), decision-making (Yao et al., 2023b; Liang et al., 2023), tool utilization (Schick et al., 2023), and multi-agent collaboration (Park et al., 2023; Hong et al., 2023). Furthermore, the prospect of training-free graph learning streamlines the validation process for graph model designs. If one assumes the optimal relation set R and feature set F to be transferable, significant training time can be saved: researchers can quickly identify suitable settings with GraphText and then apply these configurations to the hyperparameters of other GNNs/LLMs.
