GEO: Generative Engine Optimization

Pranjal Aggarwal∗, Indian Institute of Technology Delhi, New Delhi, India
Vishvak Murahari∗, Princeton University, Princeton, USA
Tanmay Rajpurohit, Independent, Seattle, USA
Ashwin Kalyan, Independent, Seattle, USA
Karthik Narasimhan, Princeton University, Princeton, USA
Ameet Deshpande, Princeton University, Princeton, USA
arXiv:2311.09735v3 [cs.LG] 28 Jun 2024

ABSTRACT

The advent of large language models (LLMs) has ushered in a new paradigm of search engines that use generative models to gather and summarize information to answer user queries. This emerging technology, which we formalize under the unified framework of generative engines (GEs), can generate accurate and personalized responses, rapidly replacing traditional search engines like Google and Bing. Generative Engines typically satisfy queries by synthesizing information from multiple sources and summarizing them using LLMs. While this shift significantly improves user utility and generative search engine traffic, it poses a huge challenge for the third stakeholder: website and content creators. Given the black-box and fast-moving nature of generative engines, content creators have little to no control over when and how their content is displayed. With generative engines here to stay, we must ensure the creator economy is not disadvantaged. To address this, we introduce Generative Engine Optimization (GEO), the first paradigm to aid content creators in improving their content visibility in generative engine responses through a flexible black-box optimization framework for optimizing and defining visibility metrics. We facilitate systematic evaluation by introducing GEO-bench, a large-scale benchmark of diverse user queries across multiple domains, along with relevant web sources to answer these queries. Through rigorous evaluation, we demonstrate that GEO can boost visibility by up to 40% in generative engine responses. Moreover, we show the efficacy of these strategies varies across domains, underscoring the need for domain-specific optimization methods. Our work opens a new frontier in information discovery systems, with profound implications for both developers of generative engines and content creators.¹

CCS CONCEPTS
• Computing methodologies → Natural language processing; Machine learning; • Information systems → Web searching and information discovery.

KEYWORDS
generative models, search engines, datasets and benchmarks

ACM Reference Format:
Pranjal Aggarwal, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, Karthik Narasimhan, and Ameet Deshpande. 2024. GEO: Generative Engine Optimization. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’24), August 25–29, 2024, Barcelona, Spain. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3637528.3671900

∗ Equal Contribution
¹ Code and data available at https://generative-engines.com/GEO/

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
KDD ’24, August 25–29, 2024, Barcelona, Spain
© 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 979-8-4007-0490-1/24/08
https://doi.org/10.1145/3637528.3671900

1 INTRODUCTION

The invention of traditional search engines three decades ago revolutionized information access and dissemination globally [4]. They were powerful and ushered in a host of applications like academic research and e-commerce. However, they were limited to providing a list of relevant websites for user queries. The recent success of large language models [5, 21] has paved the way for better systems like BingChat, Google’s SGE, and perplexity.ai that combine conventional search engines with generative models. We dub these systems generative engines (GE) because they search for information and generate multi-modal responses by using multiple sources. Technically, generative engines (Figure 2) retrieve relevant documents from a database (like the internet) and use large neural models to generate a response grounded on the sources, ensuring attribution and a way for the user to verify the information.

The usefulness of generative engines for developers and users is evident. Users access information faster and more accurately, while developers craft precise and personalized responses, improving user satisfaction and revenue. However, generative engines disadvantage the third stakeholder: website and content creators. Generative Engines, in contrast to traditional search engines, remove the need to navigate to websites by directly providing a precise and comprehensive response, potentially reducing organic traffic to websites and impacting their visibility [16]. With millions of small businesses and individuals relying on online traffic and visibility for their livelihood, generative engines will significantly disrupt the creator economy. Further, the black-box and proprietary nature of generative engines makes it difficult for content creators to control and understand how their content is ingested and portrayed.

Figure 1: Our proposed Generative Engine Optimization (GEO) method optimizes websites to boost their visibility in Generative Engine responses. GEO’s black-box optimization framework enables the owner of the pizza website, which originally lacked visibility, to optimize their website to increase visibility under Generative Engines. Further, GEO’s general framework allows content creators to define and optimize their custom visibility metrics, giving them greater control in this new emerging paradigm.

In this work, we propose the first general creator-centric framework to optimize content for generative engines, which we dub Generative Engine Optimization (GEO), to empower content creators to navigate this new search paradigm. GEO is a flexible black-box optimization framework for optimizing web content visibility for proprietary and closed-source generative engines (Figure 1). GEO ingests a source website and outputs an optimized version by tailoring and calibrating the presentation, text style, and content to increase visibility in generative engines.

Further, GEO introduces a flexible framework for defining visibility metrics tailor-made for generative engines, because the notion of visibility in generative engines is more nuanced and multi-faceted than in traditional search engines (Figure 3). Average ranking on the response page is a good measure of visibility in traditional search engines, which present a linear list of websites, but this does not apply to generative engines. Generative Engines provide rich, structured responses and embed websites as inline citations in the response, often with different lengths, at varying positions, and with diverse styles. This necessitates visibility metrics tailor-made for generative engines. Such metrics measure the visibility of attributed sources over multiple dimensions, such as the relevance and influence of a citation to the query, through both an objective and a subjective lens.

To facilitate faithful and extensive evaluation of GEO methods, we propose GEO-bench, a benchmark consisting of 10,000 queries from diverse domains and sources, adapted for generative engines. Through systematic evaluation, we demonstrate that our proposed Generative Engine Optimization methods can boost visibility by up to 40% on diverse queries, providing beneficial strategies for content creators. Among other things, we find that including citations, quotations from relevant sources, and statistics can significantly boost source visibility, with an increase of over 40% across various queries. We also demonstrate the efficacy of Generative Engine Optimization on Perplexity.ai, a real-world generative engine, with visibility improvements of up to 37%.

In summary, our contributions are three-fold:
(1) We propose Generative Engine Optimization, the first general optimization framework for website owners to optimize their websites for generative engines. Generative Engine Optimization can improve the visibility of websites by up to 40% on a wide range of queries, domains, and real-world black-box generative engines.
(2) Our framework proposes a comprehensive set of visibility metrics specifically designed for generative engines and enables content creators to flexibly optimize their content through customized visibility metrics.
(3) To foster faithful evaluation of GEO methods in generative engines, we propose the first large-scale benchmark consisting of diverse search queries from wide-ranging domains and datasets specially tailored for Generative Engines.
Figure 2: Overview of Generative Engines. Generative Engines primarily consist of a set of generative models and a search engine to retrieve relevant documents. Generative Engines take a user query as input and, through a series of steps, generate a final response that is grounded in the retrieved sources with inline attributions.

2 FORMULATION & METHODOLOGY

2.1 Formulation of Generative Engines
Despite the deployment of numerous generative engines to millions of users, there is currently no standard framework. We provide a formulation that accommodates the various modular components in their design. We describe a generative engine, which includes several backend generative models and a search engine for source retrieval. A Generative Engine (GE) takes a user query $q_u$ and returns a natural language response $r$, where $P_U$ represents personalized user information. The GE can be represented as a function:

$$f_{GE} := (q_u, P_U) \rightarrow r \qquad (1)$$

Generative Engines comprise two crucial components: a) a set of generative models $G = \{G_1, G_2, \ldots, G_n\}$, each serving a specific purpose like query reformulation or summarization, and b) a search engine $SE$ that returns a set of sources $S = \{s_1, s_2, \ldots, s_m\}$ given a query $q$. We present a representative workflow in Figure 2, which, at the time of writing, closely resembles the design of BingChat. This workflow breaks down the input query into a set of simpler queries that are easier for the search engine to consume. Given a query, a query-reformulating generative model, $G_1 = G_{qr}$, generates a set of queries $Q_1 = \{q_1, q_2, \ldots, q_n\}$, which are then passed to the search engine $SE$ to retrieve a set of ranked sources $S = \{s_1, s_2, \ldots, s_m\}$. The set of sources $S$ is passed to a summarizing model $G_2 = G_{sum}$, which generates a summary $Sum_j$ for each source in $S$, resulting in the summary set $Sum = \{Sum_1, Sum_2, \ldots, Sum_m\}$. The summary set is passed to a response-generating model $G_3 = G_{resp}$, which generates a cumulative response $r$ backed by the sources $S$. In this work, we focus on single-turn Generative Engines, but the formulation can be extended to multi-turn Conversational Generative Engines (Appendix A).

The response $r$ is typically structured text with embedded citations. Citations are important given the tendency of LLMs to hallucinate information [10]. Specifically, consider a response $r$ composed of sentences $\{l_1, l_2, \ldots, l_o\}$. Each sentence may be backed by a set of citations $C_i \subset S$ drawn from the retrieved documents. An ideal generative engine should ensure that all statements in the response are supported by relevant citations (high citation recall) and that all citations accurately support the statements they are associated with (high citation precision) [14]. We refer readers to Figure 3 for a representative generative engine response.

2.2 Generative Engine Optimization
The advent of search engines led to search engine optimization (SEO), a process to help website creators optimize their content to improve search engine rankings. Higher rankings correlate with increased visibility and website traffic. However, traditional SEO methods are not directly applicable to Generative Engines. Unlike in traditional search engines, the generative model in a generative engine is not limited to keyword matching. Language models ingest the source documents and generate the response, which yields a more nuanced understanding of the text documents and the user query. Because generative engines are rapidly emerging as the primary information delivery paradigm and SEO is not directly applicable, new techniques are needed. To this end, we propose Generative Engine Optimization, a new paradigm where content creators aim to increase their visibility (or impression) in generative engine responses. We define the visibility of a website (also referred to as a citation) $c_i$ in a cited response $r$ by the function $Imp(c_i, r)$, which the website creator wants to maximize. From the generative engine’s perspective, the goal is to maximize the visibility of the citations most relevant to the user query, i.e., to maximize $\sum_i f(Imp(c_i, r), Rel(c_i, q, r))$. Here $Rel(c_i, q, r)$ measures the relevance of citation $c_i$ to the query $q$ in the context of response $r$. The function $f$ is determined by the exact algorithmic design of the generative engine and is a black box to end-users. Further, both $Imp$ and $Rel$ are subjective and not yet well-defined for generative engines; we define them next.

2.2.1 Impressions for Generative Engines. In SEO, a website’s impression (or visibility) is determined by its average ranking over a range of queries. However, the nature of generative engines’ output necessitates different impression metrics. Unlike search engines, Generative Engines combine information from multiple sources in a single response. Factors such as the length, uniqueness, and presentation of the cited website determine the true visibility of a citation. Thus, as illustrated in Figure 3, a simple ranking on the response page is an effective metric for impression and visibility in conventional search engines, but such metrics are not applicable to generative engine responses.

In response to this challenge, we propose a suite of impression metrics designed with three key principles in mind: 1) the metrics should hold relevance for creators, 2) they should be explainable, and 3) they should be easily comprehensible by a broad spectrum of content creators. The first of these metrics, the “Word Count” metric, is the normalized word count of sentences related to a citation. Mathematically, this is defined as:

$$Imp_{wc}(c_i, r) = \frac{\sum_{s \in S_{c_i}} |s|}{\sum_{s \in S_r} |s|} \qquad (2)$$

Here $S_{c_i}$ is the set of sentences citing $c_i$, $S_r$ is the set of sentences in the response, and $|s|$ is the number of words in sentence $s$. When a sentence is cited by multiple sources, its word count is shared equally among all of its citations. Intuitively, a higher word count correlates with the source playing a more important part in the answer, and thus with the user getting more exposure to that source.
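The following Python sketch makes Eq. (2) concrete by computing $Imp_{wc}$ for every source cited in a single response. It makes two assumptions of its own. The response is already split into sentences, each paired with the set of source ids it cites. Words are counted by whitespace splitting. Both are illustrative choices, not the paper's released implementation, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical representation of a cited response: an ordered list of
# sentences, each paired with the set of source ids that the sentence cites.
Sentence = tuple[str, frozenset[str]]


def word_count(sentence: str) -> int:
    """|s|: number of whitespace-separated words (an assumed tokenization)."""
    return len(sentence.split())


def word_count_impression(sentences: list[Sentence]) -> dict[str, float]:
    """Imp_wc from Eq. (2) for every cited source in one response.

    A sentence cited by k sources contributes |s| / k words to each of them;
    uncited sentences only count towards the denominator.
    """
    total_words = sum(word_count(text) for text, _ in sentences)
    if total_words == 0:
        return {}
    credit: dict[str, float] = defaultdict(float)
    for text, cited in sentences:
        if not cited:
            continue
        share = word_count(text) / len(cited)  # equal split among citations
        for source in cited:
            credit[source] += share
    return {source: words / total_words for source, words in credit.items()}


if __name__ == "__main__":
    response = [
        ("Margherita pizza originated in Naples.", frozenset({"s1"})),
        ("It uses tomato, mozzarella and basil.", frozenset({"s1", "s2"})),
        ("Many shops still bake it in wood-fired ovens.", frozenset()),
    ]
    # 19 words in total: s1 gets 5 + 6/2 = 8 words, s2 gets 6/2 = 3 words.
    print(word_count_impression(response))
```

Because uncited sentences stay in the denominator, the impressions of all sources in a response can sum to less than 1.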
Figure 3: Ranking and visibility metrics are straightforward in traditional search engines, which list website sources in ranked order with verbatim content. However, Generative Engines generate rich, structured responses, often embedding citations in a single block, interleaved with each other. This makes ranking and visibility nuanced and multi-faceted. Further, significant research has been conducted on improving visibility in search engines, but how to optimize visibility in generative engine responses remains unclear. To address these challenges, our black-box optimization framework proposes a series of well-designed impression metrics that creators can use to gauge and optimize their website’s performance. It also allows creators to define their own impression metrics.
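Word Count ignores where in the response a source is cited. The paragraph that follows therefore defines a position-adjusted variant, Eq. (3), which discounts each citing sentence by $e^{-pos(s)/|S|}$. Below is a minimal Python sketch of that formula. Its assumptions are not fixed by the paper at this point. The response is represented as an ordered list of (sentence, cited source ids) pairs, which is hypothetical. $pos(s)$ is taken as the 0-based sentence index, and $|S|$ as the number of sentences in the response.

```python
import math

# Hypothetical representation: an ordered list of (sentence text, set of cited
# source ids); the list order defines pos(s).
Sentence = tuple[str, frozenset[str]]


def position_adjusted_impression(sentences: list[Sentence]) -> dict[str, float]:
    """Imp_pwc from Eq. (3) for every cited source in one response.

    Assumptions: pos(s) is the 0-based sentence index, |S| is the number of
    sentences in the response, and a sentence cited by k sources splits its
    weighted word count equally among them (as for Word Count).
    """
    n_sentences = len(sentences)
    total_words = sum(len(text.split()) for text, _ in sentences)
    if total_words == 0:
        return {}
    credit: dict[str, float] = {}
    for pos, (text, cited) in enumerate(sentences):
        if not cited:
            continue
        # Exponential position discount: the first sentence has weight 1.
        weight = math.exp(-pos / n_sentences)
        share = len(text.split()) * weight / len(cited)
        for source in cited:
            credit[source] = credit.get(source, 0.0) + share
    # The denominator stays the unweighted word count of the whole response.
    return {source: value / total_words for source, value in credit.items()}


if __name__ == "__main__":
    response = [
        ("Quotes lend credibility to a page.", frozenset({"a"})),
        ("Statistics make claims concrete and checkable.", frozenset({"b"})),
    ]
    # Both sentences have 6 words, so "a" outranks "b" only through position.
    print(position_adjusted_impression(response))
```

Only the numerator is discounted, as in Eq. (3). An earlier citation can therefore outweigh a later one with more words, matching the intuition that sentences near the top are more likely to be read.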

However, “Word Count” is not affected by the ranking of the citations (for example, whether a citation appears first). We therefore propose a position-adjusted count that reduces the weight by an exponentially decaying function of the citation position:

$$Imp_{pwc}(c_i, r) = \frac{\sum_{s \in S_{c_i}} |s| \cdot e^{-\frac{pos(s)}{|S|}}}{\sum_{s \in S_r} |s|} \qquad (3)$$

Intuitively, sentences that appear first in the response are more likely to be read, and the exponential term in the definition of $Imp_{pwc}$ gives higher weight to such citations. Thus, a website cited at the top may have a higher impression than a website cited in the middle or at the end of the response, despite having a lower word count. Further, the choice of an exponentially decaying function is motivated by several studies showing that click-through rates follow a power law as a function of ranking in search engines [7, 8]. The above impression metrics are objective and well-grounded, but they ignore the subjective aspects of how citations affect the user’s attention. To address this, we propose the "Subjective Impression" metric. It incorporates facets such as the relevance of the cited material to the user query, the influence of the citation, the uniqueness of the material presented by a citation, subjective position, subjective count, the probability of clicking the citation, and the diversity of the material presented. We use G-Eval [15], the current state-of-the-art for evaluation with LLMs, to measure each of these sub-metrics.

2.2.2 Generative Engine Optimization methods for website. To improve impression metrics, content creators must make changes to their website content. We present several generative engine-agnostic strategies, referred to as Generative Engine Optimization methods (GEO). Mathematically, every GEO method is a function $f: W \rightarrow W'$, where $W$ is the initial web content and $W'$ is the modified content after applying the GEO method. The modifications can range from simple stylistic alterations to incorporating new content in a structured format. A well-designed GEO method is equivalent to a black-box optimization method that can increase the website’s visibility without knowing the exact algorithmic design of generative engines. It implements textual modifications to $W$ independently of the exact queries.

For our experiments, we apply Generative Engine Optimization methods to website content using a large language model, prompted to perform specific stylistic and content changes to the website. Each GEO method defines a specific set of desired characteristics, and the source content is modified accordingly. We propose and evaluate several such methods:
1. Authoritative: Modifies the text style of the source content to be more persuasive and authoritative.
2. Statistics Addition: Modifies content to include quantitative statistics instead of qualitative discussion, wherever possible.
3. Keyword Stuffing: Modifies content to include more keywords from the query, as expected in classical SEO optimization.
4. Cite Sources and 5. Quotation Addition: Add relevant citations and quotations, respectively, from credible sources.
6. Easy-to-Understand: Simplifies the language of the website.
7. Fluency Optimization: Improves the fluency of the website text.
8. Unique Words and 9. Technical Terms: Add unique and technical terms, respectively, wherever possible.

These methods cover diverse general strategies that website owners can implement quickly and use regardless of the website content. Further, except for methods 3, 4, and 5, the methods enhance the presentation of existing content to increase its persuasiveness or appeal to the generative engine, without requiring extra content. On the other hand, methods 3, 4, and 5 may
require some form of additional content. To analyze the performance gain of our methods, for each input user query, we randomly select one source website to be optimized and apply each of the GEO methods separately to the same source. We refer readers to Appendix B.4 for more details on GEO methods.

3 EXPERIMENTAL SETUP

3.1 Evaluated Generative Engine
In accordance with previous works [14], we use a 2-step setup for Generative Engine design. The first step fetches relevant sources for the input query. In the second step, an LLM generates a response based on the fetched sources. Similar to previous works, we do not use summarization and provide the whole response for each source. Transformer models have limited context length, and their cost scales quadratically with context size. We therefore fetch only the top 5 sources from the Google search engine for every query. The setup closely mimics the workflow used in previous works and the general design adopted by commercial GEs such as you.com and perplexity.ai. The answer is then generated by the gpt-3.5-turbo model [20] using the same prompt as prior work [14]. We sample 5 different responses at temperature = 0.7 to reduce statistical deviations.

Further, in Section C.1, we evaluate the same Generative Engine Optimization methods on Perplexity.ai, a commercially deployed generative engine, highlighting the generalizability of our proposed Generative Engine Optimization methods.

3.2 Benchmark: GEO-bench
There is currently no publicly available dataset of Generative Engine related queries. We therefore curate GEO-bench, a benchmark of 10K queries drawn from multiple sources and repurposed for generative engines, along with synthetically generated queries. The benchmark includes queries from nine different sources, each further categorized by target domain, difficulty, query intent, and other dimensions.

Datasets: 1. MS MARCO, 2. ORCAS-I, and 3. Natural Questions [1, 6, 13]: These datasets contain real anonymized user queries from the Bing and Google search engines. Together, these three form the common set of datasets used in search engine research. However, Generative Engines will be posed with far more difficult and specific queries that aim to synthesize answers from multiple sources rather than search for them. To this end, we repurpose several other publicly available datasets. 4. AllSouls: This dataset contains essay questions from "All Souls College, Oxford University." The queries in this dataset require Generative Engines to perform appropriate reasoning to aggregate information from multiple sources. 5. LIMA [25]: Contains challenging questions that require Generative Engines not only to aggregate information but also to perform suitable reasoning to answer the question (e.g., writing a short poem or Python code). 6. Davinci-Debate [14]: Contains debate questions generated for testing Generative Engines. 7. Perplexity.ai Discover²: These queries are sourced from Perplexity.ai’s Discover section, which is an updated list of trending queries on the platform. 8. ELI-5³: This dataset contains questions from the ELI5 subreddit, where users ask complex questions and expect answers in simple, layman’s terms. 9. GPT-4 Generated Queries: To supplement diversity in the query distribution, we prompt GPT-4 [21] to generate queries from various domains (e.g., science, history). The generated queries also vary by query intent (e.g., navigational, transactional) and by the difficulty and scope of the expected response (e.g., open-ended, fact-based).

Our benchmark comprises 10K queries divided into 8K, 1K, and 1K for the train, validation, and test splits, respectively. We preserve the real-world query distribution: our benchmark contains 80% informational queries and 10% each of transactional and navigational queries. Each query is augmented with the cleaned text content of the top 5 search results from the Google search engine.

Tags. Optimizing website content often requires targeted changes based on the task’s domain. Additionally, a user of Generative Engine Optimization may need to identify an appropriate method for only a subset of queries, considering multiple factors such as domain, user intent, and query nature. To facilitate this, we tag each query with one of seven different categories. For tagging, we employ the GPT-4 model and manually verify high recall and precision on the test split.

Overall, GEO-bench consists of queries from 25 diverse domains such as Arts, Health, and Games. It features a range of query difficulties, from simple to multi-faceted, includes 9 different types of queries such as informational and transactional, and encompasses 7 different categorizations. Owing to its high diversity, its size, and its real-world nature, GEO-bench is a comprehensive benchmark for evaluating Generative Engines. It can serve as a standard testbed for assessing them for various purposes in this and future works. We provide more details about GEO-bench in Appendix B.2.

² https://www.perplexity.ai/discover
³ https://huggingface.co/datasets/eli5_category

3.3 GEO Methods
We evaluate the 9 proposed GEO methods described in Section 2.2.2. We compare them with a baseline, which measures the impression metric of unmodified website sources. We evaluate methods on the complete GEO-bench test split. Further, to reduce variance in results, we run our experiments with five different random seeds and report the average.

3.4 Evaluation Metrics
We use the impression metrics defined in Section 2.2.1. Specifically, we employ two impression metrics:
1. Position-Adjusted Word Count, which combines word count and position count. To analyze the effect of the individual components, we also report scores on the two sub-metrics separately.
2. Subjective Impression, a subjective metric encompassing seven different aspects: 1) relevance of the cited sentence to the user query, 2) influence of the citation, assessing the extent to which the generated response relies on the citation, 3) uniqueness of the material presented by a citation, 4) subjective position, gauging the prominence of the source’s positioning from the user’s viewpoint, 5) subjective count, measuring the amount of content presented from the
KDD ’24, August 25–29, 2024, Barcelona, Spain Pranjal Aggarwal et al.

Position-Adjusted Word Count Subjective Impression


Method Word Position Overall Rel. Infl. Unique Div. FollowUp Pos. Count Average
Performance without Generative Engine Optimization
No Optimization 19.5 19.3 19.3 19.3 19.3 19.3 19.3 19.3 19.3 19.3 19.3
Non-Performing Generative Engine Optimization methods
Keyword Stuffing 17.8 17.7 17.7 19.8 19.1 20.5 20.4 20.3 20.5 20.4 20.2
Unique Words 20.7 20.5 20.5 20.5 20.1 19.9 20.4 20.2 20.7 20.2 20.4
High-Performing Generative Engine Optimization methods
Easy-to-Understand 22.2 22.4 22.0 20.2 21.0 20.0 20.1 20.1 20.9 19.9 20.5
Authoritative 21.8 21.3 21.3 22.3 22.1 22.4 23.1 22.2 23.1 22.7 22.9
Technical Terms 23.1 22.7 22.7 20.9 21.7 20.5 21.2 20.8 21.9 20.8 21.4
Fluency Optimization 25.1 24.6 24.7 21.1 22.9 20.4 21.6 21.0 22.4 21.1 21.9
Cite Sources 24.9 24.5 24.6 21.4 22.5 21.0 21.6 21.2 22.2 20.7 21.9
Quotation Addition 27.8 27.3 27.2 23.8 25.4 23.9 24.4 22.9 24.9 23.2 24.7
Statistics Addition 25.9 25.4 25.2 22.5 24.5 23.0 23.3 21.6 24.2 23.0 23.7
Table 1: Absolute impression metrics of GEO methods on GEO-bench. Performance Measured on Two metrics and their
sub-metrics. Compared to baselines, simple methods like Keyword Stuffing traditionally used in SEO don’t perform well.
However, our proposed methods such as Statistics Addition and Quotation Addition show strong performance improvements
across all metrics. The best methods improve upon baseline by 41% and 28% on Position-Adjusted Word Count and Subjective
Impression respectively. For readability, Subjective Impression scores are normalized with respect to Position-Adjusted Word
Count resulting in similar baseline scores.

citation as perceived by the user, 6) likelihood of the user clicking 4 RESULTS


the citation, and 7) diversity of the material presented. These sub- We evaluate various Generative Engine Optimization methods
metrics assess diverse aspects that content creators can target to designed to optimize website content for better visibility in Gener-
improve one or more areas effectively. Each sub-metric is evaluated ative Engine responses, compared against a baseline with no opti-
using GPT-3.5, following a methodology akin to that described in mization. Our evaluation used GEO-bench, a diverse benchmark
G-Eval [15]. In G-Eval, a form-based evaluation template is pro- of user queries from multiple domains and settings. Performance
vided to the language model, along with a GE generated response was measured using two metrics: Position-Adjusted Word Count and
with citations. The model outputs a score (computed by sampling multiple times) for each citation. However, since G-Eval scores are poorly calibrated, we normalize them to have the same mean and variance as Position-Adjusted Word Count to enable a fair and meaningful comparison. We provide the exact templates used in Appendix B.3.

Furthermore, all impression metrics are normalized by multiplying them by a constant factor so that the sum of the impressions of all citations in a response equals 1. In our analysis, we compare methods by calculating the relative improvement in impression. For an initial generated response $r$ from sources $\{s_1, \ldots, s_m\}$ and a modified response $r'$, the relative improvement in impression for each source $s_i$ is measured as:

$$\text{Improvement}_{s_i} = \frac{\text{Imp}_{s_i}(r') - \text{Imp}_{s_i}(r)}{\text{Imp}_{s_i}(r)} \times 100 \qquad (4)$$

The modified response $r'$ is produced by applying the GEO method being evaluated to one of the sources $s_i$. The source $s_i$ selected for optimization is chosen randomly but remains constant for a particular query across all GEO methods.

Subjective Impression. The former considers word count and citation position in the GE's response, while the latter computes multiple subjective factors, giving an overall impression score.

Table 1 details the absolute impression metrics of the different methods. The results reveal that our GEO methods consistently outperform the baseline across all metrics on GEO-bench. This shows the robustness of these methods to varying queries, yielding significant improvements despite query diversity. Specifically, our top-performing methods, Cite Sources, Quotation Addition, and Statistics Addition, achieved a relative improvement of 30-40% on the Position-Adjusted Word Count metric and 15-30% on the Subjective Impression metric. These methods, which involve adding relevant statistics (Statistics Addition), incorporating credible quotes (Quotation Addition), and including citations from reliable sources (Cite Sources) in the website content, require minimal changes but significantly improve visibility in GE responses, enhancing both the credibility and richness of the content.

Interestingly, stylistic changes such as improving the fluency and readability of the source text (Fluency Optimization and Easy-to-Understand) also resulted in a significant visibility boost of 15-30%. This suggests that Generative Engines value not only content but also information presentation.
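As a rough illustration of the metric post-processing described earlier in this section (rescaling G-Eval scores, normalizing each response's impressions to sum to 1, and Equation (4)), the following is a minimal Python sketch under stated assumptions, not the authors' implementation. The function names are hypothetical, "variance" is assumed to mean population variance, and moment matching is assumed to run over a flat list of scores. Note that moment matching can in principle yield negative values, which a sum-to-one normalization would then need to handle separately.

```python
from statistics import mean, pstdev


def match_moments(scores, reference):
    """Affinely rescale `scores` so their mean and (population) variance
    equal those of `reference`, e.g. G-Eval scores onto the scale of
    Position-Adjusted Word Count. Assumes `scores` is not constant."""
    mu_s, sd_s = mean(scores), pstdev(scores)
    mu_r, sd_r = mean(reference), pstdev(reference)
    return [(x - mu_s) / sd_s * sd_r + mu_r for x in scores]


def normalize_response(impressions):
    """Multiply one response's per-citation impressions by a constant
    factor (1 / total) so that they sum to 1."""
    total = sum(impressions)
    if total == 0:
        raise ValueError("response has zero total impression")
    return [x / total for x in impressions]


def relative_improvement(imp_before, imp_after):
    """Equation (4): percentage change in a source's impression from r to r'."""
    return (imp_after - imp_before) / imp_before * 100


if __name__ == "__main__":
    # Hypothetical raw impressions for three cited sources; source index 1 is optimized.
    before = normalize_response([2.0, 1.0, 1.0])  # response r
    after = normalize_response([2.0, 2.0, 1.0])   # response r'
    print(round(relative_improvement(before[1], after[1]), 2))  # 60.0
```

Because impressions are normalized per response, a gain for the optimized source necessarily comes at the expense of the other cited sources in the same answer.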
GEO: Generative Engine Optimization KDD ’24, August 25–29, 2024, Barcelona, Spain

Relative Improvement (%) in Visibility
Method                 Rank-1   Rank-2   Rank-3   Rank-4   Rank-5
Authoritative            -6.0      4.1     -0.6     12.6      6.1
Fluency Opt.             -2.0      5.2      3.6     -4.4      2.2
Cite Sources            -30.3      2.5     20.4     15.5    115.1
Quotation Addition      -22.9     -7.0      3.5     25.1     99.7
Statistics Addition     -20.6     -3.9      8.1     10.0     97.9
Table 2: Visibility changes through GEO methods for sources with different rankings in the search engine. GEO is especially helpful for lower-ranked websites.

Top Performing Tags
Method                 Rank-1              Rank-2         Rank-3
Authoritative          Debate              History        Science
Fluency Opt.           Business            Science        Health
Cite Sources           Statement           Facts          Law & Gov.
Quotation Addition     People & Society    Explanation    History
Statistics Addition    Law & Gov.          Debate         Opinion
Table 3: Top-performing categories for each of the GEO methods. Website owners can choose a relevant GEO strategy based on their target domain.
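To make the rank-dependent pattern in Table 2 easier to inspect, the sketch below transcribes the table into a Python dictionary and reports, for each method, the SERP rank with the largest relative visibility change. `TABLE_2` and `best_rank` are illustrative names only, not part of any released code.

```python
# Values transcribed from Table 2: relative visibility change (%) for SERP ranks 1..5.
TABLE_2 = {
    "Authoritative":       [-6.0, 4.1, -0.6, 12.6, 6.1],
    "Fluency Opt.":        [-2.0, 5.2, 3.6, -4.4, 2.2],
    "Cite Sources":        [-30.3, 2.5, 20.4, 15.5, 115.1],
    "Quotation Addition":  [-22.9, -7.0, 3.5, 25.1, 99.7],
    "Statistics Addition": [-20.6, -3.9, 8.1, 10.0, 97.9],
}


def best_rank(changes):
    """Return the 1-based SERP rank whose source gains the most visibility."""
    return max(range(len(changes)), key=changes.__getitem__) + 1


if __name__ == "__main__":
    for method, changes in TABLE_2.items():
        print(f"{method:<20} best rank: {best_rank(changes)}  "
              f"rank-1 change: {changes[0]:+.1f}%")
```

On these values, Cite Sources, Quotation Addition, and Statistics Addition all peak at rank 5, and every method in the table lowers the visibility of the rank-1 source.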

Further, given that generative models are often designed to follow instructions, one would expect a more persuasive and authoritative tone in website content to boost visibility. However, we find no significant improvement, indicating that Generative Engines are already somewhat robust to such changes. This highlights the need for website owners to focus on improving content presentation and credibility.

Finally, we evaluate keyword stuffing, i.e., adding more relevant keywords to website content. While widely used for Search Engine Optimization, we find such methods offer little to no improvement on generative engine responses. This underscores the need for website owners to rethink optimization strategies for generative engines, as techniques effective in search engines may not translate to success in this new paradigm.

5 ANALYSIS

5.1 Domain-Specific Generative Engine Optimizations
In Section 4, we presented the improvements achieved by GEO across the entirety of the GEO-bench benchmark. However, in real-world SEO scenarios, domain-specific optimizations are often applied. With this in mind, and considering that we provide categories for every query in GEO-bench, we delve deeper into the performance of various GEO methods across these categories.

Relative improvement (%) in Position-Adjusted Word Count
               fluency   statistics   citation   quotes   Average Improvement
fluency         22.4%      35.8%        34.4%     33.0%         31.4%
statistics      35.8%      27.0%        30.3%     35.4%         32.1%
citation        34.4%      30.3%        19.1%     20.1%         26.0%
quotes          33.0%      35.4%        20.1%     30.3%         29.7%
Figure 4: Relative improvement from using combinations of GEO strategies. Using Fluency Optimization and Statistics Addition in conjunction results in maximum performance. The rightmost column shows that using Fluency Optimization with other strategies is most beneficial.
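The Average Improvement column of Figure 4 can be reproduced as the mean of each heatmap row. The following minimal sketch, with cell values transcribed from the figure, recomputes that column and locates the strongest pair of distinct strategies. `HEATMAP`, `row_averages`, and `best_pair` are hypothetical names used only for illustration.

```python
# Cell values transcribed from Figure 4 (relative improvement in %,
# Position-Adjusted Word Count). Rows and columns share the same order.
STRATEGIES = ["fluency", "statistics", "citation", "quotes"]
HEATMAP = [
    [22.4, 35.8, 34.4, 33.0],  # fluency
    [35.8, 27.0, 30.3, 35.4],  # statistics
    [34.4, 30.3, 19.1, 20.1],  # citation
    [33.0, 35.4, 20.1, 30.3],  # quotes
]


def row_averages(matrix):
    """Mean of each row, i.e. the 'Average Improvement' column."""
    return [sum(row) / len(row) for row in matrix]


def best_pair(matrix, names):
    """Highest off-diagonal cell: the strongest combination of two distinct strategies."""
    i, j = max(
        ((i, j) for i in range(len(names)) for j in range(i + 1, len(names))),
        key=lambda ij: matrix[ij[0]][ij[1]],
    )
    return names[i], names[j], matrix[i][j]


if __name__ == "__main__":
    for name, avg in zip(STRATEGIES, row_averages(HEATMAP)):
        print(f"{name:<10} {avg:.1f}%")
    print(best_pair(HEATMAP, STRATEGIES))  # ('fluency', 'statistics', 35.8)
```

Running it yields row means of 31.4%, 32.1%, 26.0%, and 29.7% for fluency, statistics, citation, and quotes, matching the rightmost column, and identifies fluency plus statistics (35.8%) as the best pair.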
Table 3 provides a detailed breakdown of the categories where our GEO methods have proven to be most effective. A careful analysis of these results reveals several intriguing observations. For instance, Authoritative significantly improves performance in debate-style questions and queries related to the "historical" domain. This aligns with our intuition, as a more persuasive form of writing is likely to hold more value in debates.

Similarly, the addition of citations through Cite Sources is particularly beneficial for factual questions, likely because citations provide a source of verification for the facts presented, thereby enhancing the credibility of the response. The effectiveness of different GEO methods varies across domains. For example, as shown in row 5 of Table 3, domains such as 'Law & Government' and question types like 'Opinion' benefit significantly from the addition of relevant statistics in the website content, as implemented by Statistics Addition. This suggests that data-driven evidence can enhance the visibility of a website in particular contexts. The method Quotation Addition is most effective in the 'People & Society,' 'Explanation,' and 'History' domains. This could be because these domains often involve personal narratives or historical events, where direct quotes can add authenticity and depth to the content. Overall, our analysis suggests that website owners should strive towards making domain-specific targeted adjustments to their websites for higher visibility.

5.2 Optimization of Multiple Websites
In the evolving landscape of Generative Engines, GEO methods are expected to become widely adopted, leading to a scenario where all source contents are optimized using GEO. To understand the implications, we conducted an evaluation of GEO methods by optimizing all source contents simultaneously, with results presented in Table 2. A key observation is the differential impact of GEO on websites based on their Search Engine Results Pages (SERP) ranking. Notably, lower-ranked websites, which typically struggle for visibility, benefit significantly more from GEO. This is because traditional search engines rely on multiple factors, such as the number of backlinks and domain presence, which are challenging for small creators to achieve. However, since Generative Engines utilize

generative models conditioned on website content, factors such as backlink building should not disadvantage small creators. This is evident from the relative improvements in visibility shown in Table 2. For example, the Cite Sources method led to a substantial 115.1% increase in visibility for websites ranked fifth in SERP, while on average, the visibility of the top-ranked website decreased by 30.3%.

This finding highlights GEO's potential as a tool to democratize the digital space. Many lower-ranked websites are created by small content creators or independent businesses, who traditionally struggle to compete with larger corporations in top search engine results. The advent of Generative Engines might initially seem disadvantageous to these smaller entities. However, the application of GEO methods presents an opportunity for these content creators to significantly improve their visibility in Generative Engine responses. By enhancing their content with GEO, they can reach a wider audience, leveling the playing field and allowing them to compete more effectively with larger corporations.

5.3 Combination of GEO Strategies
While individual GEO strategies show significant improvements across various domains, in practice, website owners are expected to employ multiple strategies in conjunction. To study the performance improvements achieved by combining GEO strategies, we consider all pairs of combinations of the top 4 performing GEO methods, namely Cite Sources, Fluency Optimization, Statistics Addition, and Quotation Addition. Figure 4 displays the heatmap of relative improvement in the Position-Adjusted Word Count visibility metric achieved by combining different GEO strategies. The analysis demonstrates that combining Generative Engine Optimization methods can enhance performance, with the best combination (Fluency Optimization and Statistics Addition) outperforming any single GEO strategy by more than 5.5%⁴. Furthermore, Cite Sources significantly boosts performance when used in conjunction with other methods (Average: 31.4%), despite being relatively less effective when used alone (8% lower than Quotation Addition). These findings underscore the importance of studying GEO methods in combination, as they are likely to be used together by content creators in the real world.

⁴ Due to cost constraints, the analysis was conducted on a subset of 200 examples from the test split, and therefore the numbers presented here differ from those in Table 1.

5.4 Qualitative Analysis
We present a qualitative analysis of GEO methods in Table 4, containing representative examples where GEO methods boost source visibility with minimal changes. Each method optimizes a source through suitable text additions and deletions. In the first example, we see that simply adding the source of a statement can significantly boost visibility in the final answer, requiring minimal effort from the content creator. The second example suggests that adding relevant statistics where possible can increase source visibility in the final Generative Engine response. Finally, the third example suggests that merely emphasizing parts of the text and using a persuasive text style can also lead to improvements in visibility.

Method: Cite Sources (Relative Improvement: 132.4%)
Query: What is the secret of Swiss chocolate
GEO Optimization: With per capita annual consumption averaging between 11 and 12 kilos, Swiss people rank among the top chocolate lovers in the world (According to a survey conducted by The International Chocolate Consumption Research Group [1])

Method: Statistics Addition (Relative Improvement: 65.5%)
Query: Should robots replace humans in the workforce?
GEO Optimization: Source: Not here, and not now — until recently. The big difference is that the robots have come not to destroy our lives, but to disrupt our work, with a staggering 70% increase in robotic involvement in the last decade .

Method: Authoritative (Relative Improvement: 89.1%)
Query: Did the jacksonville jaguars ever make it to the superbowl?
GEO Optimization: Source: It is important to note that The Jaguars have never appeared made an appearance in the Super Bowl. However, They have achieved an impressive feat by securing 4 divisional titles to their name. , a testament to their prowess and determination.

Table 4: Representative examples of GEO methods optimizing a source website. Additions are marked in green and deletions in red. Without adding any substantial new information, GEO methods significantly increase the visibility of the source content.

6 GEO IN THE WILD: EXPERIMENTS WITH DEPLOYED GENERATIVE ENGINE

Method                 Position-Adjusted Word Count   Subjective Impression
No Optimization        24.1                           24.7
Keyword Stuffing       21.9                           28.1
Quotation Addition     29.1                           32.1
Statistics Addition    26.2                           33.9
Table 5: Absolute impression metrics of GEO methods on GEO-bench with Perplexity.ai as the GE. While SEO methods such as Keyword Stuffing perform poorly, our proposed GEO methods generalize well to multiple generative engines and significantly improve content visibility.

To reinforce the efficacy of our proposed Generative Engine Optimization methods, we evaluate them on Perplexity.ai, a real deployed Generative Engine with a large user base. Results are

in Table 5. As with our generative engine, Quotation Addition performs best in Position-Adjusted Word Count, with a 22% improvement over the baseline. Methods that performed well on our generative engine, such as Cite Sources and Statistics Addition, show improvements of up to 9% and 37% on the two metrics. Our earlier observation that traditional SEO methods such as Keyword Stuffing are ineffective is also confirmed here: Keyword Stuffing performs 10% worse than the baseline on Position-Adjusted Word Count. The results are significant for three reasons: 1) they underscore the importance of developing dedicated Generative Engine Optimization methods to benefit content creators, 2) they highlight the generalizability of our proposed GEO methods across different generative engines, and 3) they demonstrate that content creators can use our easy-to-implement GEO methods directly, giving them high real-world impact. We refer readers to Appendix C.1 for more details.

7 RELATED WORK

Evidence-based Answer Generation: Previous works have used several techniques for answer generation backed by sources. Nakano et al. [19] trained GPT-3 to navigate web environments to generate source-backed answers. Similarly, other methods [17, 23, 24] fetch sources via search engines for answer generation. Our work unifies these approaches and provides a common benchmark for improving these systems in the future. In a recent working draft, Kumar and Lakkaraju [11] showed that strategic text sequences can manipulate LLM recommendations to enhance product visibility in generative engines. While their approach focuses on increasing product visibility through adversarial text, our method introduces non-adversarial strategies to optimize any website content for improved visibility in generative engine search results.

Retrieval-Augmented Language Models: Several recent works have tackled the limited memory of language models by fetching relevant sources from a knowledge base to complete a task [3, 9, 18]. However, a Generative Engine needs to generate an answer and provide attributions throughout the answer. Further, a Generative Engine is not limited to a single text modality for either input or output. Additionally, the framework of Generative Engines is not limited to fetching relevant sources but instead comprises multiple tasks such as query reformulation, source selection, and deciding how and when to perform them.

Search Engine Optimization: Over nearly the past 25 years, extensive research has optimized web content for search engines [2, 12, 22]. These methods fall into On-Page SEO, which improves content and user experience, and Off-Page SEO, which boosts website authority through link building. In contrast, GEO deals with a more complex environment involving multi-modality and conversational settings. Since GEO is optimized against a generative model that is not limited to simple keyword matching, traditional SEO strategies will not apply to Generative Engine settings, highlighting the need for GEO.

8 CONCLUSION
In this work, we formulate search engines augmented with generative models, which we dub generative engines. We propose Generative Engine Optimization (GEO) to empower content creators to optimize their content under generative engines. We define impression metrics for generative engines and propose and release GEO-bench: a benchmark encompassing diverse user queries from multiple domains and settings, along with the relevant sources needed to answer those queries. We propose several ways to optimize content for generative engines and demonstrate that these methods can boost source visibility by up to 40% in generative engine responses. Among other findings, we show that including citations, quotations from relevant sources, and statistics can significantly boost source visibility. Further, we discover that the effectiveness of GEO methods depends on the query domain, and we show the potential of combining multiple GEO strategies. We show promising results on a commercially deployed generative engine with millions of active users, showcasing the real-world impact of our work. In summary, our work is the first to formalize the important and timely GEO paradigm, releasing algorithms and infrastructure (benchmarks, datasets, and metrics) to facilitate rapid progress in generative engines by the community. This serves as a first step towards understanding the impact of generative engines on the digital space and the role of GEO in this new paradigm of search engines.

9 LIMITATIONS
While we rigorously test our proposed methods on two generative engines, including a publicly available one, the methods may need to adapt over time as GEs evolve, mirroring the evolution of SEO. Additionally, despite our efforts to ensure that the queries in GEO-bench closely resemble real-world queries, the nature of queries can change over time, necessitating continuous updates. Further, owing to the black-box nature of search engine algorithms, we did not evaluate how GEO methods affect search rankings. However, we note that the changes made by GEO methods are targeted changes in textual content, bearing some resemblance to SEO methods, while not affecting other metadata such as domain name, backlinks, etc., and thus they are less likely to affect search engine rankings. Further, as larger context lengths in language models become economical, future generative models are expected to ingest more sources, thus reducing the impact of search rankings. Lastly, while every query in our proposed GEO-bench is tagged and manually inspected, there may be discrepancies due to subjective interpretations or errors in labeling.

10 ACKNOWLEDGEMENTS
This material is based upon work supported by the National Science Foundation under Grant No. 2107048. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

REFERENCES
[1] Daria Alexander, Wojciech Kusa, and Arjen P. de Vries. 2022. ORCAS-I: Queries Annotated with Intent using Weak Supervision. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (2022). https://api.semanticscholar.org/CorpusID:248495926
[2] Prashant Ankalkoti. 2017. Survey on Search Engine Optimization Tools & Techniques. Imperial journal of interdisciplinary research 3 (2017). https://api.semanticscholar.org/CorpusID:116487363
[3] Akari Asai, Xinyan Velocity Yu, Jungo Kasai, and Hannaneh Hajishirzi. 2021. One Question Answering Model for Many Languages with Cross-lingual

Dense Passage Retrieval. In Neural Information Processing Systems. https://api.semanticscholar.org/CorpusID:236428949
[4] Sergey Brin and Lawrence Page. 1998. The Anatomy of a Large-Scale Hypertextual Web Search Engine. Comput. Networks 30 (1998), 107–117. https://api.semanticscholar.org/CorpusID:7587743
[5] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language Models are Few-Shot Learners. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin (Eds.), Vol. 33. Curran Associates, Inc., 1877–1901. https://proceedings.neurips.cc/paper_files/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf
[6] Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Fernando Campos, and Jimmy J. Lin. 2021. MS MARCO: Benchmarking Ranking Models in the Large-Data Regime. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (2021). https://api.semanticscholar.org/CorpusID:234336491
[7] Brian Dean. 2023. We Analyzed 4 Million Google Search Results. Here's What We Learned About Organic Click Through Rate. https://backlinko.com/google-ctr-stats Accessed: 2024-06-08.
[8] Danny Goodwin. 2011. Top Google Result Gets 36.4% of Clicks [Study]. https://www.searchenginewatch.com/2011/04/21/top-google-result-gets-36-4-of-clicks-study/
[9] Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Ming-Wei Chang. 2020. REALM: Retrieval-Augmented Language Model Pre-Training. ArXiv abs/2002.08909 (2020). https://api.semanticscholar.org/CorpusID:211204736
[10] Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Ye Jin Bang, Andrea Madotto, and Pascale Fung. 2023. Survey of hallucination in natural language generation. Comput. Surveys 55, 12 (2023), 1–38.
[11] Aounon Kumar and Himabindu Lakkaraju. 2024. Manipulating Large Language Models to Increase Product Visibility. arXiv:2404.07981 [cs.IR]
[12] R. Anil Kumar, Zaiduddin Shaik, and Mohammed Furqan. 2019. A Survey on Search Engine Optimization Techniques. International Journal of P2P Network Trends and Technology (2019). https://doi.org/10.14445/22492615/IJPTT-V9I1P402
[13] Tom Kwiatkowski, Jennimaria Palomaki, Olivia Redfield, Michael Collins, Ankur P. Parikh, Chris Alberti, Danielle Epstein, Illia Polosukhin, Jacob Devlin, Kenton Lee, Kristina Toutanova, Llion Jones, Matthew Kelcey, Ming-Wei Chang, Andrew M. Dai, Jakob Uszkoreit, Quoc V. Le, and Slav Petrov. 2019. Natural Questions: A Benchmark for Question Answering Research. Transactions of the Association for Computational Linguistics 7 (2019), 453–466. https://api.semanticscholar.org/CorpusID:86611921
[14] Nelson F. Liu, Tianyi Zhang, and Percy Liang. 2023. Evaluating Verifiability in Generative Search Engines. ArXiv abs/2304.09848 (2023). https://api.semanticscholar.org/CorpusID:258212854
[15] Yang Liu, Dan Iter, Yichong Xu, Shuo Wang, Ruochen Xu, and Chenguang Zhu. 2023. G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment. ArXiv abs/2303.16634 (2023). https://api.semanticscholar.org/CorpusID:257804696
[16] G. D. Maayan. 2023. How Google SGE will impact your traffic – and 3 SGE recovery case studies. Search Engine Land (5 Sep 2023). https://searchengineland.com/how-google-sge-will-impact-your-traffic-and-3-sge-recovery-case-studies-431430
[17] Jacob Menick, Maja Trebacz, Vladimir Mikulik, John Aslanides, Francis Song, Martin Chadwick, Mia Glaese, Susannah Young, Lucy Campbell-Gillingham, Geoffrey Irving, and Nathan McAleese. 2022. Teaching language models to support answers with verified quotes. ArXiv abs/2203.11147 (2022). https://api.semanticscholar.org/CorpusID:247594830
[18] Grégoire Mialon, Roberto Dessì, Maria Lomeli, Christoforos Nalmpantis, Ramakanth Pasunuru, Roberta Raileanu, Baptiste Rozière, Timo Schick, Jane Dwivedi-Yu, Asli Celikyilmaz, Edouard Grave, Yann LeCun, and Thomas Scialom. 2023. Augmented Language Models: a Survey. ArXiv abs/2302.07842 (2023). https://api.semanticscholar.org/CorpusID:256868474
[19] Reiichiro Nakano, Jacob Hilton, S. Arun Balaji, Jeff Wu, Ouyang Long, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, Xu Jiang, Karl Cobbe, Tyna Eloundou, Gretchen Krueger, Kevin Button, Matthew Knight, Benjamin Chess, and John Schulman. 2021. WebGPT: Browser-assisted question-answering with human feedback. ArXiv abs/2112.09332 (2021). https://api.semanticscholar.org/CorpusID:245329531
[20] OpenAI. 2022. Introducing ChatGPT. https://openai.com/index/chatgpt/
[21] OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, Red Avila, Igor Babuschkin, Suchir Balaji, Valerie Balcom, Paul Baltescu, Haiming Bao, Mohammad Bavarian, Jeff Belgum, Irwan Bello, Jake Berdine, Gabriel Bernadett-Shapiro, Christopher Berner, Lenny Bogdonoff, Oleg Boiko, Madelaine Boyd, Anna-Luisa Brakman, Greg Brockman, Tim Brooks, Miles Brundage, Kevin Button, Trevor Cai, Rosie Campbell, Andrew Cann, Brittany Carey, Chelsea Carlson, Rory Carmichael, Brooke Chan, Che Chang, Fotis Chantzis, Derek Chen, Sully Chen, Ruby Chen, Jason Chen, Mark Chen, Ben Chess, Chester Cho, Casey Chu, Hyung Won Chung, Dave Cummings, Jeremiah Currier, Yunxing Dai, Cory Decareaux, Thomas Degry, Noah Deutsch, Damien Deville, Arka Dhar, David Dohan, Steve Dowling, Sheila Dunning, Adrien Ecoffet, Atty Eleti, Tyna Eloundou, David Farhi, Liam Fedus, Niko Felix, Simón Posada Fishman, Juston Forte, Isabella Fulford, Leo Gao, Elie Georges, Christian Gibson, Vik Goel, Tarun Gogineni, Gabriel Goh, Rapha Gontijo-Lopes, Jonathan Gordon, Morgan Grafstein, Scott Gray, Ryan Greene, Joshua Gross, Shixiang Shane Gu, Yufei Guo, Chris Hallacy, Jesse Han, Jeff Harris, Yuchen He, Mike Heaton, Johannes Heidecke, Chris Hesse, Alan Hickey, Wade Hickey, Peter Hoeschele, Brandon Houghton, Kenny Hsu, Shengli Hu, Xin Hu, Joost Huizinga, Shantanu Jain, Shawn Jain, Joanne Jang, Angela Jiang, Roger Jiang, Haozhun Jin, Denny Jin, Shino Jomoto, Billie Jonn, Heewoo Jun, Tomer Kaftan, Łukasz Kaiser, Ali Kamali, Ingmar Kanitscheider, Nitish Shirish Keskar, Tabarak Khan, Logan Kilpatrick, Jong Wook Kim, Christina Kim, Yongjik Kim, Jan Hendrik Kirchner, Jamie Kiros, Matt Knight, Daniel Kokotajlo, Łukasz Kondraciuk, Andrew Kondrich, Aris Konstantinidis, Kyle Kosic, Gretchen Krueger, Vishal Kuo, Michael Lampe, Ikai Lan, Teddy Lee, Jan Leike, Jade Leung, Daniel Levy, Chak Ming Li, Rachel Lim, Molly Lin, Stephanie Lin, Mateusz Litwin, Theresa Lopez, Ryan Lowe, Patricia Lue, Anna Makanju, Kim Malfacini, Sam Manning, Todor Markov, Yaniv Markovski, Bianca Martin, Katie Mayer, Andrew Mayne, Bob McGrew, Scott Mayer McKinney, Christine McLeavey, Paul McMillan, Jake McNeil, David Medina, Aalok Mehta, Jacob Menick, Luke Metz, Andrey Mishchenko, Pamela Mishkin, Vinnie Monaco, Evan Morikawa, Daniel Mossing, Tong Mu, Mira Murati, Oleg Murk, David Mély, Ashvin Nair, Reiichiro Nakano, Rajeev Nayak, Arvind Neelakantan, Richard Ngo, Hyeonwoo Noh, Long Ouyang, Cullen O'Keefe, Jakub Pachocki, Alex Paino, Joe Palermo, Ashley Pantuliano, Giambattista Parascandolo, Joel Parish, Emy Parparita, Alex Passos, Mikhail Pavlov, Andrew Peng, Adam Perelman, Filipe de Avila Belbute Peres, Michael Petrov, Henrique Ponde de Oliveira Pinto, Michael, Pokorny, Michelle Pokrass, Vitchyr H. Pong, Tolly Powell, Alethea Power, Boris Power, Elizabeth Proehl, Raul Puri, Alec Radford, Jack Rae, Aditya Ramesh, Cameron Raymond, Francis Real, Kendra Rimbach, Carl Ross, Bob Rotsted, Henri Roussez, Nick Ryder, Mario Saltarelli, Ted Sanders, Shibani Santurkar, Girish Sastry, Heather Schmidt, David Schnurr, John Schulman, Daniel Selsam, Kyla Sheppard, Toki Sherbakov, Jessica Shieh, Sarah Shoker, Pranav Shyam, Szymon Sidor, Eric Sigler, Maddie Simens, Jordan Sitkin, Katarina Slama, Ian Sohl, Benjamin Sokolowsky, Yang Song, Natalie Staudacher, Felipe Petroski Such, Natalie Summers, Ilya Sutskever, Jie Tang, Nikolas Tezak, Madeleine B. Thompson, Phil Tillet, Amin Tootoonchian, Elizabeth Tseng, Preston Tuggle, Nick Turley, Jerry Tworek, Juan Felipe Cerón Uribe, Andrea Vallone, Arun Vijayvergiya, Chelsea Voss, Carroll Wainwright, Justin Jay Wang, Alvin Wang, Ben Wang, Jonathan Ward, Jason Wei, CJ Weinmann, Akila Welihinda, Peter Welinder, Jiayi Weng, Lilian Weng, Matt Wiethoff, Dave Willner, Clemens Winter, Samuel Wolrich, Hannah Wong, Lauren Workman, Sherwin Wu, Jeff Wu, Michael Wu, Kai Xiao, Tao Xu, Sarah Yoo, Kevin Yu, Qiming Yuan, Wojciech Zaremba, Rowan Zellers, Chong Zhang, Marvin Zhang, Shengjia Zhao, Tianhao Zheng, Juntang Zhuang, William Zhuk, and Barret Zoph. 2024. GPT-4 Technical Report. arXiv:2303.08774 [cs.CL]
[22] A. Shahzad, Deden Witarsyah Jacob, Nazri M. Nawi, Hairulnizam Bin Mahdin, and Marheni Eka Saputri. 2020. The new trend for search engine optimization, tools and techniques. Indonesian Journal of Electrical Engineering and Computer Science 18 (2020), 1568. https://api.semanticscholar.org/CorpusID:213123106
[23] Kurt Shuster, Jing Xu, Mojtaba Komeili, Da Ju, Eric Michael Smith, Stephen Roller, Megan Ung, Moya Chen, Kushal Arora, Joshua Lane, Morteza Behrooz, W.K.F. Ngan, Spencer Poff, Naman Goyal, Arthur Szlam, Y-Lan Boureau, Melanie Kambadur, and Jason Weston. 2022. BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage. ArXiv abs/2208.03188 (2022). https://api.semanticscholar.org/CorpusID:251371589
[24] Romal Thoppilan, Daniel De Freitas, Jamie Hall, Noam Shazeer, Apoorv Kulshreshtha, Heng-Tze Cheng, Alicia Jin, Taylor Bos, Leslie Baker, Yu Du, YaGuang Li, Hongrae Lee, Huaixiu Steven Zheng, Amin Ghafouri, Marcelo Menegali, Yanping Huang, Maxim Krikun, Dmitry Lepikhin, James Qin, Dehao Chen, Yuanzhong Xu, Zhifeng Chen, Adam Roberts, Maarten Bosma, Vincent Zhao, Yanqi Zhou, Chung-Ching Chang, Igor Krivokon, Will Rusch, Marc Pickett, Pranesh Srinivasan, Laichee Man, Kathleen Meier-Hellstern, Meredith Ringel Morris, Tulsee Doshi, Renelito Delos Santos, Toju Duke, Johnny Soraker, Ben Zevenbergen, Vinodkumar Prabhakaran, Mark Diaz, Ben Hutchinson, Kristen Olson, Alejandra Molina, Erin Hoffman-John, Josh Lee, Lora Aroyo, Ravi Rajakumar, Alena Butryna, Matthew Lamm, Viktoriya Kuzmina, Joe Fenton, Aaron Cohen, Rachel Bernstein, Ray Kurzweil, Blaise Aguera-Arcas, Claire Cui, Marian Croak, Ed Chi, and Quoc Le. 2022. LaMDA: Language Models for Dialog Applications. arXiv:2201.08239 [cs.CL]
[25] Chunting Zhou, Pengfei Liu, Puxin Xu, Srini Iyer, Jiao Sun, Yuning Mao, Xuezhe Ma, Avia Efrat, Ping Yu, L. Yu, Susan Zhang, Gargi Ghosh, Mike Lewis, Luke Zettlemoyer, and Omer Levy. 2023. LIMA: Less Is More for Alignment. ArXiv abs/2305.11206 (2023). https://api.semanticscholar.org/CorpusID:258822910

Listing 1: Prompt used for the Generative Engine. The GE takes the query and 5 sources as input and outputs a response to the query that is grounded in the sources.

Write an accurate and concise answer for the given user question, using _only_ the provided summarized web search results. The answer should be correct, high-quality, and written by an expert using an unbiased and journalistic tone. The user's language of choice such as English, Francais, Espamol, Deutsch, or should be used. The answer should be informative, interesting, and engaging. The answer's logic and reasoning should be rigorous and defensible. Every sentence in the answer should be _immediately followed_ by an in-line citation to the search result(s). The cited search result(s) should fully support _all_ the information in the sentence. Search results need to be cited using [index]. When citing several search results, use [1][2][3] format rather than [1, 2, 3]. You can use multiple search results to respond comprehensively while avoiding irrelevant search results.

Question: {query}

Search Results:
{source_text}

Listing 2: Representative queries from each of the 9 datasets in GEO-bench

### ORCAS
- what does globalization mean
- wine pairing list

### AllSouls
- Are open-access journals the future of academic publishing?
- Should the study of non-Western philosophy be a requirement for a philosophy degree in the UK?

### Davinci-Debate
- Should all citizens receive a basic income?
- Should governments promote atheism?

### ELI5
- Why does my cat kick its toys when playing with them?
- what does caffeine actually do your muscles, especially regarding exercising?

### GPT-4
- What are the benefits of a keto diet?
- What are the most profound impacts of the Renaissance period on modern society?

### LIMA
- What are the primary factors that influence consumer behavior?
- What would be a great twist for a murder mystery? I'm looking for something creative, not to rehash old tropes.

### MS-Marco
- what does monogamous
- what is the normal fbs range for children

### Natural Questions
- where does the phrase bee line come from
- what is the prince of persia in the bible

### Perplexity.ai
- how to gain more followers on LinkedIn
- why is blood sugar higher after a meal

A CONVERSATIONAL GENERATIVE ENGINE
In Section 2.1, we discussed a single-turn Generative Engine that outputs a single response given the user query. However, one of the strengths of upcoming Generative Engines will be their ability to engage in an active back-and-forth conversation with the user. The conversation allows users to clarify their queries or the Generative Engine's response and to ask follow-up questions. Specifically, in equation 1, instead of the input being a single query $q_u$, it is modeled as a conversation history $H$ of $(q_u^t, r^t)$ pairs. The response $r^{t+1}$ is then defined as:

$$GE := f_{LE}(H, P_U) \rightarrow r^{t+1} \qquad (5)$$

where $t$ is the turn number.

Further, to engage the user in a conversation, a separate LLM, $L_{\text{follow}}$ or $L_{\text{resp}}$, may generate suggested follow-up queries based on $H$, $P_U$, and $r^{t+1}$. The suggested follow-up queries are typically designed to maximize the likelihood of user engagement. This not only benefits Generative Engine providers by increasing user interaction but also benefits website owners by enhancing their visibility. Furthermore, these follow-up queries can help users obtain more detailed information.

B EXPERIMENTAL SETUP

B.1 Evaluated Generative Engine
The exact prompt used is shown in Listing 1.

B.2 Benchmark
GEO-bench contains queries from nine datasets. Representative queries from each of the datasets are shown in Figure 2. Further, we tag each of the queries based on a pool of 7 different categories. For tagging, we use the GPT-4 model and manually confirm high recall and precision in tagging. However, because tagging is automated, the tags can be noisy and should be interpreted with caution. Details about each of these categories are presented here:
• Difficulty Level: The complexity of the query, ranging from simple to complex.
• Nature of Query: The type of information sought by the query, such as factual, opinion, or comparison.
• Genre: The category or domain of the query, such as arts and entertainment, finance, or science.
• Specific Topics: The specific subject matter of the query, such as physics, economics, or computer science.
• Sensitivity: Whether the query involves sensitive topics or not.
• User Intent: The purpose behind the user's query, such as research, purchase, or entertainment.
• Answer Type: The format of the answer that the query is seeking, such as fact, opinion, or list.

B.3 Evaluation Metrics
We use 7 different subjective impression metrics, whose prompts are presented in our public repository: https://github.com/GEO-optim/GEO.

B.4 GEO Methods
We propose 9 different Generative Engine Optimization methods to optimize website content for generative engines. We evaluate these methods on the complete GEO-bench test split. Further, to
KDD ’24, August 25–29, 2024, Barcelona, Spain Pranjal Aggarwal et al.

Position-Adjusted Word Count: Word, Position, Overall. Subjective Impression: Rel., Infl., Unique, Div., FollowUp, Pos., Count, Average.

| Method | Word | Position | Overall | Rel. | Infl. | Unique | Div. | FollowUp | Pos. | Count | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Performance without Generative Engine Optimization | | | | | | | | | | | |
| No Optimization | 19.7 (±0.7) | 19.6 (±0.5) | 19.8 (±0.6) | 19.8 (±0.9) | 19.8 (±1.6) | 19.8 (±0.6) | 19.8 (±1.1) | 19.8 (±1.0) | 19.8 (±1.0) | 19.8 (±0.9) | 19.8 (±0.9) |
| Non-Performing Generative Engine Optimization methods | | | | | | | | | | | |
| Keyword Stuffing | 19.6 (±0.5) | 19.5 (±0.6) | 19.8 (±0.5) | 20.8 (±0.8) | 19.8 (±1.0) | 20.4 (±0.5) | 20.6 (±0.9) | 19.9 (±0.9) | 21.1 (±1.0) | 21.0 (±0.9) | 20.6 (±0.7) |
| Unique Words | 20.6 (±0.6) | 20.5 (±0.7) | 20.7 (±0.5) | 20.8 (±0.7) | 20.3 (±1.3) | 20.5 (±0.3) | 20.9 (±0.3) | 20.4 (±0.7) | 21.5 (±0.6) | 21.2 (±0.4) | 20.9 (±0.4) |
| High-Performing Generative Engine Optimization methods | | | | | | | | | | | |
| Easy-to-Understand | 21.5 (±0.7) | 22.0 (±0.8) | 21.5 (±0.6) | 21.0 (±1.1) | 21.1 (±1.8) | 21.2 (±0.9) | 20.9 (±1.1) | 20.6 (±1.0) | 21.9 (±1.1) | 21.4 (±0.9) | 21.3 (±1.0) |
| Authoritative | 21.3 (±0.7) | 21.2 (±0.9) | 21.1 (±0.8) | 22.3 (±0.8) | 22.9 (±0.8) | 22.1 (±0.9) | 23.2 (±0.7) | 21.9 (±0.4) | 23.9 (±1.2) | 23.0 (±1.1) | 23.1 (±0.7) |
| Technical Terms | 22.5 (±0.6) | 22.4 (±0.6) | 22.5 (±0.6) | 21.2 (±0.7) | 21.8 (±0.8) | 20.5 (±0.5) | 21.1 (±0.6) | 20.5 (±0.6) | 22.1 (±0.6) | 21.2 (±0.2) | 21.4 (±0.4) |
| Fluency Optimization | 24.4 (±0.8) | 24.4 (±0.6) | 24.4 (±0.8) | 21.3 (±0.9) | 23.2 (±1.5) | 21.2 (±1.0) | 21.4 (±1.4) | 20.8 (±1.3) | 23.2 (±1.8) | 21.5 (±1.3) | 22.1 (±1.2) |
| Cite Sources | 25.5 (±0.7) | 25.3 (±0.6) | 25.3 (±0.6) | 22.8 (±0.9) | 24.2 (±0.7) | 21.7 (±0.3) | 22.3 (±0.8) | 21.3 (±0.9) | 23.5 (±0.4) | 21.7 (±0.6) | 22.9 (±0.5) |
| Quotation Addition | 27.5 (±0.8) | 27.6 (±0.8) | 27.1 (±0.6) | 24.4 (±1.0) | 26.7 (±1.1) | 24.6 (±0.7) | 24.9 (±0.9) | 23.2 (±0.9) | 26.4 (±1.0) | 24.1 (±1.2) | 25.5 (±0.9) |
| Statistics Addition | 25.8 (±1.2) | 26.0 (±0.8) | 25.5 (±1.2) | 23.1 (±1.4) | 26.1 (±0.9) | 23.6 (±0.9) | 24.5 (±1.2) | 22.4 (±1.2) | 26.1 (±1.2) | 23.8 (±1.2) | 24.8 (±1.1) |
Table 6: Absolute impression metrics of GEO methods on GEO-bench. Compared to baselines, simple methods traditionally used in SEO, such as Keyword Stuffing, do not perform well. However, our proposed methods such as Statistics Addition and Quotation Addition show strong performance improvements across all metrics. The best methods improve upon the baseline by 41% and 28% on Position-Adjusted Word Count and Subjective Impression, respectively.
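Each Table 6 entry is a mean with a deviation in parentheses, taken over five random seeds (Section C). Below is a minimal sketch of how such a cell could be produced from per-seed scores. The paper does not say which deviation estimator it uses, so this sketch assumes the sample standard deviation. The function names `aggregate_over_seeds` and `format_cell` are hypothetical.

```python
from statistics import mean, stdev


def aggregate_over_seeds(scores: list[float]) -> tuple[float, float]:
    """Return (mean, deviation) of one metric measured across random seeds.

    Assumption: the deviation is the sample standard deviation
    (statistics.stdev). The paper does not state which estimator it uses.
    """
    if len(scores) < 2:
        raise ValueError("need at least two seeds to estimate a deviation")
    return mean(scores), stdev(scores)


def format_cell(scores: list[float]) -> str:
    """Render one table cell in the 'mean (±dev)' style used in Table 6."""
    m, s = aggregate_over_seeds(scores)
    return f"{m:.1f} (±{s:.1f})"
```

To use it, pass one metric's five per-seed scores for one method to `format_cell`. Swapping `stdev` for `statistics.pstdev` gives the population estimator if that is what a reproduction needs.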

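Position-Adjusted Word Count: Word, Position, Overall. Subjective Impression: Rel., Infl., Unique, Div., FollowUp, Pos., Count, Average.

| Method | Word | Position | Overall | Rel. | Infl. | Unique | Div. | FollowUp | Pos. | Count | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Performance without Generative Engine Optimization | | | | | | | | | | | |
| No Optimization | 24.0 | 24.4 | 24.1 | 24.7 | 24.7 | 24.7 | 24.7 | 24.7 | 24.7 | 24.7 | 24.7 |
| Non-Performing Generative Engine Optimization methods | | | | | | | | | | | |
| Keyword Stuffing | 21.9 | 21.4 | 21.9 | 26.3 | 27.2 | 27.2 | 30.2 | 27.9 | 28.2 | 26.9 | 28.1 |
| Unique Words | 24.0 | 23.7 | 23.6 | 24.9 | 25.1 | 24.7 | 24.4 | 23.0 | 23.6 | 23.9 | 24.1 |
| High-Performing Generative Engine Optimization methods | | | | | | | | | | | |
| Authoritative | 25.6 | 25.7 | 25.9 | 28.9 | 30.9 | 31.2 | 31.7 | 31.5 | 26.9 | 29.5 | 30.6 |
| Fluency Optimization | 25.8 | 26.2 | 26.0 | 28.9 | 29.4 | 29.8 | 30.6 | 30.1 | 29.6 | 29.6 | 30.0 |
| Cite Sources | 26.6 | 26.9 | 26.8 | 19.8 | 20.7 | 19.5 | 18.9 | 20.0 | 18.5 | 18.9 | 19.0 |
| Quotation Addition | 28.8 | 28.7 | 29.1 | 31.4 | 31.9 | 31.9 | 32.3 | 31.4 | 31.7 | 30.9 | 32.1 |
| Statistics Addition | 25.8 | 26.6 | 26.2 | 31.6 | 33.4 | 34.0 | 33.7 | 34.0 | 33.3 | 33.1 | 33.9 |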
Table 7: Performance improvement of GEO methods on GEO-bench with Perplexity.ai as the generative engine. Compared to the baselines, simple methods traditionally used in SEO, such as Keyword Stuffing, often perform worse. However, our proposed methods such as Statistics Addition and Quotation Addition show strong performance improvements across the board. The best-performing methods improve upon the baseline by 22% on Position-Adjusted Word Count and 37% on Subjective Impression.
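The percentages quoted in the Table 6 and Table 7 captions are relative gains over the No Optimization row. Below is a minimal sketch of that arithmetic. It assumes the usual definition (method - baseline) / baseline × 100, because the text does not spell out the formula. `relative_improvement` is a hypothetical helper, and its inputs are cells copied from the two tables.

```python
def relative_improvement(method: float, baseline: float) -> float:
    """Percentage gain of a method's score over the baseline score.

    Assumption: relative (not absolute) improvement,
    i.e. (method - baseline) / baseline * 100.
    """
    if baseline == 0:
        raise ValueError("baseline score must be non-zero")
    return (method - baseline) / baseline * 100.0


# (method score, No Optimization score), copied from the tables above.
checks = {
    "Table 6, Quotation Addition, Position": (27.6, 19.6),
    "Table 6, Quotation Addition, Subjective Average": (25.5, 19.8),
    "Table 7, Statistics Addition, Subjective Average": (33.9, 24.7),
}

for label, (method, baseline) in checks.items():
    print(f"{label}: {relative_improvement(method, baseline):+.1f}%")
```

The inputs are the rounded one-decimal table cells, so the computed gains can differ by about a point from the caption figures. The captions may have been computed from unrounded scores.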

reduce variance in results, we run our experiments on five different random seeds and report the average.

B.5 Prompts for GEO methods
We present all prompts in our public repository: https://github.com/GEO-optim/GEO. GPT-3.5 Turbo was used for all experiments.

C RESULTS
We perform experiments on 5 random seeds and present results with statistical deviations in Table 6.

C.1 GEO in the Wild: Experiments with Deployed Generative Engine
We also evaluate our proposed Generative Engine Optimization methods on a real-world deployed Generative Engine: Perplexity.ai. Perplexity.ai does not allow the user to specify source URLs. We therefore upload the source text as files and ensure that all answers are generated using only those file sources. We evaluate all our methods on a subset of 200 samples from our test set. Results using Perplexity.ai are shown in Table 7.
