Journal of Computational Design and Engineering, 2024, 11, 125–142
DOI: 10.1093/jcde/qwae077
Advance access publication date: 4 September 2024
Research Article
Generative AI-powered architectural exterior conceptual design based on the design intent
Mengnan Shi1, JoonOh Seo1,*, Seung Hyun Cha2, Bo Xiao3 and Hung-Lin Chi1
1 Department of Building and Real Estate, The Hong Kong Polytechnic University, Hung Hom, Kowloon, 999077, Hong Kong
2 Graduate School of Culture Technology, Korea Advanced Institute of Science and Technology, Daejeon, 34141, Republic of Korea
3 Department of Civil, Environmental, and Geospatial Engineering, Michigan Technological University, Houghton, MI 49931, USA
* Correspondence: [email protected]
Downloaded from https://academic.oup.com/jcde/article/11/5/125/7749580 by guest on 11 November 2024
Abstract
In the architectural exterior design domain, design intent is usually expressed by textual design intent [e.g., client needs, architectural
language (AL)] and non-verbal design intent (e.g., sketch). However, existing generative AI-based methods for automated architectural
exterior conceptual design can only use the general image description as the prompt. Thus, despite its potential, existing generative
image AI cannot produce appropriate design alternatives that meet various design requirements. Enabling automated architectural
exterior conceptual design requires solving two problems: teaching the AI model to understand textual design intent and allowing
generative AI to combine textual design intent with non-verbal design intent. The study aims to propose an automated architectural
exterior conceptual design approach by incorporating domain-specific prompting strategies and sketch-to-image synthesis into fine-
tuned generative image AI models. In the proposed approach, textual design intent annotations (including client needs and AL) are
added to architectural images and general image description annotations. A web crawler and ChatGPT automatically extract design
intent-related annotations from online sources for famous architectural works that are used as training images. The constructed
dataset is then used to fine-tune a generative AI model [i.e., Stable Diffusion (SD)] via the Lora algorithm, teaching the AI model to
understand textual design intent. Also, ControlNet is used to control the generation process of the SD model to enable the generative
AI to reflect the design intent expressed by the sketches. The proposed approach is validated by comparing generated images from our
approach with those from two existing models. The results show that the proposed method can successfully generate architectural
exterior conceptual design images that fulfil the requirements based on the architectural design intent. The proposed approach is
expected to streamline and facilitate time-consuming and demanding iterative processes during a conceptual design phase.
Keywords: architectural exterior conceptual design, design intent, generative AI, Stable Diffusion
Nomenclature
$x$: Input image
$\varepsilon$: Noise vector
$\alpha$: Coefficient determining the noise level
$L_{\text{adv}}(\theta)$: Adversarial loss
$L_{\text{stab}}(\theta)$: Stability loss
$\lambda$: Weighting coefficient that balances the two losses
$\mathrm{CNN}_{\text{pre-trained}}$: Pre-trained deep convolutional neural network
$\theta_{\text{large-scale}}$: Parameters of the pre-trained model on a large-scale dataset
$\theta_{\text{frozen}}$: Parameters of the frozen layers
$\mathrm{freeze}(\theta_{1 \ldots k})$: Function to freeze the parameters of the first $k$ layers
$W_{\text{new}}$: Weights of the new fully connected layer
$b_{\text{new}}$: Biases of the new fully connected layer
$\theta_{\text{updated}}$: Updated parameters of the model
$\eta$: Learning rate
$\nabla_\theta J(\theta; x, y)$: Gradient of the loss function $J$ with respect to the parameters $\theta$
$J_{\text{reg}}(\theta; x, y)$: Regularized loss function
$R(\theta)$: Regularization term
$\lambda_{\text{reg}}$: Regularization coefficient
$c$: Control vector
$f(\text{attributes})$: Encoding function that maps the sample’s attributes or features to the control vector
$G(z, c)$: Generative model with control vector $c$ and input noise $z$
$L$: Loss function for ControlNet training
$x_{\text{controlled}}$: Controlled sample generated by the model
$\theta_{\text{previous}}$: Previous parameters of the model
$\theta$: Model parameters
$J(\theta; x, y)$: Loss function
$L(G(z, c; \theta), \text{target})$: Loss function that measures the difference between generated samples and target
$\theta_{1 \ldots k}$: Parameters of the first $k$ layers
$\mathrm{Output}_{\text{new}}$: Output of the new fully connected layer

1. Introduction
Conceptual design is a creative process, usually in the early stages of architectural design. At this stage, the architect and client explore and develop essential ideas and concepts about the
Received: May 23, 2024. Revised: August 14, 2024. Accepted: August 14, 2024
© The Author(s) 2024. Published by Oxford University Press on behalf of the Society for Computational Design and Engineering. This is an Open Access article
distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits
non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact
[email protected]
126 | AI-Driven architectural exterior concept design
architectural project, including themes, design styles, and basic structures (Xia et al., 2008; Castro Pena et al., 2021). The conceptual design provides the basis for the project’s overall direction and helps ensure that subsequent design phases move in the right direction (Pourzolfaghar et al., 2014). In particular, one of the crucial goals during the early conceptual design stage is to determine the visual appearance of a building, such as the building’s shapes and external design, that meets both functional and aesthetic requirements (Castro Pena et al., 2021). Traditional architectural conceptual design is a communication process between clients and architects, where clients state their needs. Then, the architects propose various design solutions to meet clients’ needs and design intent based on their professional knowledge through a series of meetings and discussions with the client. During this iterative process, various forms of visual representation are commonly used for the client and the architect to agree on the outcomes, ranging from sketches to modelling. However, conceptual design generation is time-consuming and mentally demanding, and the design quality is subject to human intervention (Qiu et al., 2002).
To support this complex design process during the architectural exterior conceptual design, several automated architectural methods based on artificial intelligence (AI) have been proposed, including those based on fractals (Joye, 2011), swarm intelligence (Sengupta & Mishra, 2014), and meta-cellular automata (Coates et al., 1996). For example, Anzalone & Clarke (2003) developed an architectural exterior conceptual design tool called CAAD using the principles of meta-cellular automata. The tool automatically generates architectural solutions with meta-cellular automata morphology that are evaluated and selected according to the user’s needs and preferences. However, these approaches focus on design optimization given architectural design needs, and they have difficulty generating realistic architectural exterior conceptual design drawings. Generative image AI empowered by large-scale foundation models has recently been gaining attention in architectural exterior conceptual design. Generative AI is at the forefront of current research in AI. In contrast with traditional AI, which is typically focused on tasks like data analysis and prediction, generative image AI [e.g., DALL-E (Li et al., 2023), Stable Diffusion (SD; Dehouche & Dehouche, 2023)] has demonstrated its ability to create detail-rich new content by understanding and replicating patterns learned from data in the areas of natural language processing and image generation, and has developed applications in areas such as drug discovery and music composition. Recently, studies have explored the feasibility of using generative AI for architectural design (Chen et al., 2023). For example, in the commercial sector, Mnml.ai (Mnml.ai, 2024, July 24) can generate realistic renderings of buildings based on user-input image description prompts.
Despite the success of these approaches, they still present challenges in understanding design intent and handling complex design requirements. In the domain of architectural design, design intent is expressed through specific client needs and architectural design language that reflects an architect’s design concept (Krūgelis, 2018; Song et al., 2020; Chen & Kitagawa, 2023). However, it is still questionable whether existing generative AI tools can understand these architectural contexts to create new designs, as they are trained only on existing building images paired with general image description texts. Thus, they cannot understand the design intent implicitly represented in building design. Another challenge is that most generative image AI tools (i.e., text-to-image models) rely on prompts as a means of interaction between designers and AI models. Thus, an architect’s design intent, which is often difficult to formulate in words, cannot be fully reflected.
In response to the above challenges, this study aims to propose a framework for a generative AI-powered automated architectural exterior conceptual design approach by incorporating domain-specific prompting strategies and sketch-to-image synthesis into fine-tuned generative image AI models. The proposed framework is demonstrated using famous architectural works by well-known architects. First, this study creates domain-specific building image datasets from famous architectural works by adding textual design intent annotations (i.e., client needs and architectural design languages) to the traditional image description annotations. To streamline the time-consuming procedure of creating datasets, this study uses web crawlers to collect images of famous architectural works and a ChatGPT interface to automatically extract corresponding textual annotations representing each architectural work’s design intents from web-based articles. The constructed dataset is then used to fine-tune a generative AI model (i.e., SD) via the Low-Rank Adaptation (Lora) algorithm, teaching the generative AI to understand textual design intents from an architectural exterior conceptual design perspective. Also, to reflect the design intents represented in non-verbal forms such as sketches when creating new building design images, ControlNet is used to control the generation process of the SD model. The proposed framework is qualitatively validated by comparing generated design images from our approach with those from two existing models [i.e., Mnml.ai (Mnml.ai, 2024, July 24) and Architectural Schoolteacher (Technology, 2024, July 24)]. It is expected that the proposed approach could accelerate an iterative architectural exterior conceptual design process by quickly creating and visualizing various conceptual design alternatives based on both the client’s needs and an architect’s ideas and creative concepts.

2. Literature Review of AI-Powered Architectural Exterior Conceptual Design
Architectural exterior conceptual design methodology has evolved through stages of manual, automated, and intelligent development. The traditional manual design process is time-consuming and labour-intensive, and human factors affect the quality of the design (Pérez, 2017). Automation of architectural exterior conceptual design can help improve design efficiency and quality and is of great research significance (Dutta & Sarthak, 2011).
The continuous evolution and enhancement of computer modelling software have increasingly enabled studies to leverage these tools for the semi-automated conceptual design of architectural exteriors. Such software helps designers swiftly generate and revise models of architectural appearances, thus expediting the conceptual design phase and yielding superior outcomes with ease. For instance, designers can employ software like 3ds Max (Baltus & Žebrauskas, 2019) and Revit (Mora et al., 2008) for modelling and image rendering, creating lifelike visualizations of architectural exteriors. Nonetheless, it is essential to note that these methods still require considerable manual work from architectural designers.
Conceptual design methods for architectural exteriors, grounded in AI, represent a burgeoning approach that incorporates AI techniques like fractal geometry, swarm intelligence, and deep learning for devising exterior design solutions. These advanced methods help designers more effectively comprehend user needs and design limitations, enabling the creation of highly customized design solutions tailored to specific requirements. For example, Wen et al. (2010) developed an architectural exterior conceptual design tool called Fractal Architect using the
Journal of Computational Design and Engineering, 2024, 11(5), 125–142 | 127
principles of fractal geometry. The tool automatically generates architecture solutions with fractal forms, which are evaluated and selected according to the user’s needs and preferences. Sharafi et al. (2015) developed an ant colony algorithm-based architectural exterior conceptual design method for optimizing the energy consumption of an architecture. The method automatically generates various architectural scenarios evaluated and selected based on multiple design objectives, such as energy consumption and comfort. Rapone & Saro (2012) developed an architectural exterior conceptual design method using a particle swarm algorithm to optimize the design of the curtain wall façade of an office. Zhao (2021) combined parametric modelling, a building performance simulation engine and an optimization algorithm to propose a design optimization method based on building design objectives. The method can generate optimal window and wall proportions in less than 2 s. This compares to about two weeks architects spend using traditional simulation engine-based methods. Yi & Kim (2022) proposed a multi-objective optimization method for architectural design based on swarm intelligence algorithms to meet multiple design requirements simultaneously. Coates et al. (1996) developed a conceptual design tool for architecture called Cellular Automata Designer using the principles of multi-state meta-cellular automata. The tool automatically generates architectural solutions with multi-state cellular automata morphology. It evaluates and selects them based on multiple design goals, such as the architecture’s aesthetics, structure and function. Kakooee & Dillenburger (2024) exploit the potential of deep reinforcement learning algorithms to optimize the spatial design of buildings. Experiments show that the method can automatically explore a broader range of design options, thus facilitating the discovery of innovative solutions. Gan et al. (2024) proposed a building design method integrating Generative Adversarial Networks (GANs) and Multi-Objective Optimization algorithms, which can generate spatial design solutions for multiple buildings in less than 5 min. Chang et al. (2020) proposed a deep learning and Electroencephalography (EEG) signal-based approach for building design image preference recognition. The method can support selecting building appearance design solutions by considering user needs.
Recently, generative AI has been gaining significant attention. For example, GANs and variants of transformer models have demonstrated significant capabilities in creating realistic and coherent output (Goodfellow et al., 2014). DALL-E (Li et al., 2023) shows the ability to create novel, high-quality images based on textual descriptions. In addition, generative models have been used in drug discovery, music composition, and video game design, demonstrating various applications in innovation and automated creation processes (Elgammal et al., 2017). Generative AI models have been extended to create conceptual architectural designs. For example, in academia, Chen et al. (2023) proposed a method for generating architectural designs using AI that can batch-generate high-quality architectural designs based on prompts. This method can improve the efficiency and quality of architectural design and optimize the workflow of architectural design. Jo et al. (2024) applied generative AI to the design of building facades, generating various design solutions that resonate with the character of local buildings. In addition, several commercially available architectural design assistance programs claim to be based on generative AI. This software can assist in the generation of conceptual design plans for buildings to a certain extent. For example, in the commercial sector, Veras (EvolveLAB, 2024, July 24) can render images based on user-input sketches by selecting architectural styles; Architectural Schoolteacher (Technology, 2024, July 24) can generate high-quality architectural exterior conceptual designs based on prompts by rendering input images in specified styles.
Generative AI shows potential in architectural exterior conceptual design. However, existing methods still present challenges in understanding both textualized architectural design intent [client needs and architectural language (AL)] and non-textualized design intent (sketches) simultaneously.

3. Methodology
This study proposes a generative AI-powered automated architectural exterior conceptual design approach that can understand and reflect architectural domain-specific inputs, including textual and non-textual design intents for developing architectural exterior conceptual design. The overall framework of the proposed method is shown in Fig. 1. The framework consists of (i) a collection of datasets for training the generative AI model, (ii) fine-tuning the generative AI model, and (iii) generating conceptual designs with a controlled fine-tuned generative AI model. Specifically, first, this study creates a domain-specific architectural image dataset from famous architectural works through textual design intent annotations (i.e., client needs and architectural design language) and traditional image description annotations. In particular, this study uses a web crawler to collect images of famous architectural works to simplify the time-consuming dataset creation process. It uses ChatGPT to automatically extract corresponding textual annotations representing the design intent of each architectural work from the web articles. Then, using the constructed dataset, the generative AI model (i.e., the SD model) is fine-tuned by the Lora algorithm to teach the generative AI to understand textual design intent from a perspective of architectural exterior conceptual design. In addition, ControlNet is used to control the generation process of the SD model to reflect the design intentions for shape and form required in architectural mass modelling, expressed in non-verbal forms such as sketches. The proposed method uses textual descriptions and sketches as inputs. It is not constrained by parameters (e.g., specific heights and numbers of floors, which are more critical in the mid- and late-stage of architectural design) to ensure that a diverse range of creative exterior design solutions can be provided to the architect in the early architectural design stage (conceptual design stage).

3.1 Defining architectural design intent for model inputs
Architectural design intent refers to the goals and concepts pursued in the architectural design process, which usually involves several aspects of the architecture’s function, form, space, materials, structure, environment, and culture (Krūgelis, 2018). The expression of architectural design intent can take many forms, including text and non-verbal (e.g., images; Chen & Kitagawa, 2023). The process usually begins with the client’s design requirements during the conceptual design phase of an architecture, as shown in Fig. 2. The architect further uses this to clarify the design intent and communicates and confirms this with the client through sketches from the architecture client’s presentation. The final conceptual design proposal will synthesize and respond to the results of all the above information and communication. Therefore, in the conceptual design phase, the client’s requirements, the
Figure 1: Research framework.
architectural design language and the architectural design sketches are the primary means of expressing the architectural design intent.

3.1.1 Architectural client needs
Architectural client needs are the specific requirements and expectations of the client for the architectural design, functionality and construction of a building project. These needs typically include building functionality, budget, schedule, style and aesthetics, sustainability and environmental protection, and codes and standards (Wikberg et al., 2014). These requirements are central considerations in the architectural design and planning process, and they can influence the success of a project. During the conceptual design phase of an architectural project, the client needs to work with the architect and other stakeholders to clearly define these essential requirements and objectives (Thyssen et al., 2010). Table 1 shows the main items of architectural client requirements and corresponding descriptions for ‘Dancing House’ as examples.

3.1.2 AL
AL is the set of terms and principles to describe and understand architectural design and expression. This includes elements and concepts from various aspects, such as form, function, materials, technology, and culture (Eilouti, 2018). AL can help people better understand and evaluate architectural works and is an essential tool for architects and designers to communicate design ideas and concepts. AL can communicate design intent, explain the creation of architectural forms and spaces, and how architecture responds to cultural, social, and environmental needs. It is also important to note that different architects and designers may have unique AL that reflects their design philosophy and creative style. Table 2 shows the main items of AL and corresponding descriptions for ‘Dancing House’ as examples.

3.1.3 Architectural design sketches
Adopting appropriate and easy-to-understand visualization techniques can communicate the design intent more clearly. Architectural design sketches, as preliminary hand drawings or drawings that express the designer’s creativity and ideas, play an essential role in architectural exterior conceptual design (Chen et al., 2008). In terms of the form of the composition, it usually includes floor plans, elevations and sections. Regarding the level of compositional detail of sketches, they include rough and fine sketches. In addition, sketches are usually simple, quick, and free from strict proportions and specifications.
These sketches aid designers, clients, and other stakeholders in visualizing and comprehending the fundamental concepts and layouts of architectural appearance design. They are also helpful in designing the shape and form required for architectural mass modelling.

3.2 Automated training data collection by web crawlers, ChatGPT, and Dreambooth
The proposed data acquisition process is shown in Fig. 3. First, images of famous architectural works and related articles that describe these works are obtained from web pages through
Figure 2: Examples of architectural design intent for ‘Dancing House’ [Dancing House (Hartoonian, 2010), also known as Fred and Ginger, is a modern
building located in Prague, Czech Republic].
Table 1: Specific items and examples of construction client needs.

| Items | Description | Examples of ‘Dancing House’ |
|---|---|---|
| Functional needs | How the building should meet its users’ basic functional and activity needs. | As a mixed-use building containing office space, a restaurant and a gallery, its functional needs include the provision of suitable office environments, dining space, and art display space. |
| Aesthetic and style needs | The appearance, style, and artistic expression that the building should have. | The building, consisting of static and dynamic parts, became a cultural center, symbolizing the transition of Czechoslovakia from a communist regime to a parliamentary democracy. |
| Technological and innovation needs | Consider the innovation and application of building technology. | Many innovative building techniques and materials were used to realize its unique form and structure. |
| Sustainability and environmental needs | Building design should consider environmental impacts, including energy efficiency, material selection, and eco-friendliness. | While the ‘Dancing House’ does not explicitly emphasize sustainable design, new building designs often consider these factors. |
Table 2: Specific items of AL and examples.

| Items | Description | Examples of ‘Dancing House’ |
|---|---|---|
| Form and space | The basic shape, structure, and spatial layout of a building. | Shaped like two dancing men, deconstructionist architecture creates a dynamic and flowing space through curved and irregular forms. A vast twisted metal structure tops the building. |
| Materials and textures | Types, characteristics, and finishes of materials used for construction. | The façade uses mainly glass and concrete, creating a modern and industrial sense of materials. |
| Colour and tone | The effect of light and shadow created by a natural or artificial light source. | The window and façade design allow natural light to enter the interior space fully, creating a rich light and shadow effect. |
| Proportion and scale | Size, proportion, and relationship of architectural elements and spaces. | There are two main sections. The first is a glass tower, reduced in height by half and supported by curved columns; the second runs parallel to the river and is characterized by undulating forms and unaligned windows. |
Figure 3: Data collection flow.
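The annotation step of the data collection flow in Fig. 3 can be illustrated with a minimal Python sketch. It only shows how a general image description might be joined with work-level client needs and AL keywords so that every image of one architectural work shares the same design-intent keywords. The class `ArchitecturalWork`, the functions `build_annotation` and `build_training_pairs`, the file names, and the sample captions are all hypothetical and are not taken from the authors' implementation; crawling, ChatGPT extraction, and captioning are assumed to have already produced the inputs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ArchitecturalWork:
    """One training unit: a building with several images and shared keywords."""
    name: str
    image_paths: List[str]
    client_needs: List[str] = field(default_factory=list)
    architectural_language: List[str] = field(default_factory=list)


def build_annotation(caption: str, work: ArchitecturalWork) -> str:
    """Join a general image caption with the work-level design-intent keywords."""
    parts = [caption] + work.client_needs + work.architectural_language
    seen = set()
    cleaned = []
    for part in parts:
        text = part.strip()
        key = text.lower()
        # Skip empty entries and case-insensitive duplicates, keep original order.
        if text and key not in seen:
            seen.add(key)
            cleaned.append(text)
    return ", ".join(cleaned)


def build_training_pairs(
    work: ArchitecturalWork, captions: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Return one (image_path, annotation) pair per image of the work."""
    return [
        (path, build_annotation(captions.get(path, ""), work))
        for path in work.image_paths
    ]


# Hypothetical inputs, loosely paraphrasing the 'Dancing House' rows of Tables 1 and 2.
work = ArchitecturalWork(
    name="Dancing House",
    image_paths=["dancing_house_01.jpg", "dancing_house_02.jpg"],
    client_needs=["mixed-use office, restaurant and gallery"],
    architectural_language=["curved and irregular forms", "glass and concrete facade"],
)
captions = {
    "dancing_house_01.jpg": "a building on a street corner",
    "dancing_house_02.jpg": "a building beside a river at dusk",
}
pairs = build_training_pairs(work, captions)
for path, annotation in pairs:
    print(path, "->", annotation)
```

Keeping the keywords on the work rather than on each image mirrors the idea of treating an architectural work as the training unit; the per-image caption is an added assumption that lets views and backgrounds vary.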
keyword searches. Second, the prompt (“Extract the AL and client needs keywords from the article”) and the web text are entered into ChatGPT to obtain the above-defined client needs and AL keywords. After that, the images are input into Dreambooth (a method for automatically extracting text descriptions from images; Ruiz et al., 2023) to obtain the regular image descriptions of the images. Finally, the regular image description is combined with extracted keywords representing client needs and architectural design language as annotations for the images. To improve the efficiency of data collection, the above process uses a web crawler to automatically retrieve web pages and a ChatGPT interface to automate Q&A, thus realizing the automation of dataset construction. Given the variability of ChatGPT’s responses, the retrieved images and corresponding annotations (general image descriptions, client needs and AL) were manually checked before model training to ensure the quality of the dataset.
In general, existing methods for training an SD model require paired image-text data. Instead of using single paired data, the proposed approach utilizes each architectural work as a fundamental training unit. For example, multiple images of each building captured from various angles and against diverse backgrounds are compiled for each architectural work. As a result, these images from each architectural work collectively share the same text annotations, enabling them to reflect image variations according to diverse views and backgrounds.

3.3 Fine-tuning generative AI to learn textual design intent via Lora
In order to enable a generative AI to generate architectural images according to the textual design intent, SD (Ni et al., 2023), a deep learning model related to GANs (Hitaj et al., 2017), is adopted and further fine-tuned using the training data collected from the previous step. The SD model is among the most advanced generative AI models available today, is open-source, receives regular updates, and is user-friendly for researchers. The key idea of the SD model is to introduce stability enhancement mechanisms to improve the stability of GANs during training and the quality of generated samples. Traditional GANs can face problems such as mode collapse during training, resulting in a lack of diversity in the samples generated. SD helps to address this by introducing noise into the generation process and through a series of stability enhancement techniques.
Specifically, SD uses progressive noise injection to gradually increase the strength of the noise to help the generator better explore the sample space. The noise injection can be represented as

$$x = x + \alpha \cdot \varepsilon \qquad (1)$$

where $x$ is the input image, $\varepsilon$ is a noise vector, and $\alpha$ is a coefficient determining the noise level.
In addition, it employs a stability-enhancing loss function that helps improve the model’s training stability. This loss function can be formulated as

$$L(\theta) = L_{\text{adv}}(\theta) + \lambda \cdot L_{\text{stab}}(\theta) \qquad (2)$$

where $L_{\text{adv}}(\theta)$ is the adversarial loss, $L_{\text{stab}}(\theta)$ is the stability loss, and $\lambda$ is a weighting coefficient that balances the two.
These improvements allow SD to generate more diverse, high-quality images that are more stable and controllable compared with traditional GANs. To further fine-tune the SD model, Lora (Hu et al., 2021) is applied to enhance domain knowledge. The principle of fine-tuning SD models involves the following steps.

1) Loading a pre-trained model: An SD model trained on a large-scale dataset is loaded. This model includes a deep convolutional neural network and a generative network (Encoder and decoder structure in Fig. 4b) in SD.

$$\mathrm{CNN}_{\text{pre-trained}} = f(\theta_{\text{large-scale}}) \qquad (3)$$

2) Freezing Layers: As the first few layers of the model include low-level feature extractors, these layers (Encoder block in Fig. 4b) are frozen. This is because these layers have already learned generalized features, while we are mainly concerned with fine-tuning the model to fit task-specific high-level features.

$$\theta_{\text{frozen}} = \mathrm{freeze}(\theta_{1 \ldots k}) \qquad (4)$$

where $\theta_{1 \ldots k}$ represents the parameters of the first $k$ layers.

3) Replacing the top layers: Next, we replace the output layers (output layer of the decoder block connection in Fig. 4b) of the model to fit our specific task. We can replace the output layer with a fully connected layer with an appropriate number of neurons for classification tasks.

$$\mathrm{Output}_{\text{new}} = W_{\text{new}} \cdot x + b_{\text{new}} \qquad (5)$$

where $W_{\text{new}}$ and $b_{\text{new}}$ are the weights and biases of the new fully connected layer.

4) Fine-tuning Training: Now, we fine-tune the model using task-specific training data. The fine-tuning process connects the task-dependent loss function to the model’s output and then updates the model’s parameters through optimization algorithms such as backpropagation and gradient descent.

$$\theta_{\text{updated}} = \theta_{\text{previous}} - \eta \cdot \nabla_\theta J(\theta; x, y) \qquad (6)$$

where $\eta$ is the learning rate, $\nabla_\theta J(\theta; x, y)$ is the gradient of the loss function $J$ with respect to the parameters $\theta$, and $(x, y)$ is the input and target output pairs for the training data.

5) Regularization and tuning the learning rate: To avoid overfitting, regularization techniques such as weight decay or dropout are usually applied. In addition, the learning rate must be adjusted to ensure that the model converges to the appropriate weights. Regularization can be introduced as an additional term in the loss function:

$$J_{\text{reg}}(\theta; x, y) = J(\theta; x, y) + \lambda_{\text{reg}} \cdot R(\theta) \qquad (7)$$

where $R(\theta)$ represents the regularization term, and $\lambda_{\text{reg}}$ is the regularization coefficient.

3.4 Controlling generative AI models with sketches by ControlNet
To enable the proposed method to match design intents expressed in sketches during the process of generating conceptual design images, ControlNet (Zhang et al., 2023) is added to the pipeline of the fine-tuned SD model. ControlNet is a technique for controlling GANs, which can be used in conjunction with SD or other GAN models to control certain aspects of the generated samples, such as the attributes of the samples, the content or the style. The following is the basic principle of ControlNet for controlling SD, as shown in Fig. 4.

1) Introducing a Control Vector: The network structure of ControlNet is shown in Fig. 4a as an encoder-decoder structure. One of the critical concepts of ControlNet is the introduction of a Control Vector, which contains information about the features or attributes of the samples we want to control. We can design the Control Vector as an encoding that represents different expressions. The formula for the Control Vector can be expressed as

$$c = f(\text{attributes}) \qquad (8)$$
Figure 4: Schematic diagram of controlling generative AI models by ControlNet.
where $c$ represents the Control Vector, and $f$ is an encoding function that maps the sample’s attributes or features to the Control Vector. The sample described is the sketch and prompt in Fig. 4.

2) Combining with a generative model: ControlNet uses control vectors with a generative model, such as SD. This usually involves feeding the control vectors into some part of the generative model to influence the generation process. As shown in Fig. 4, the output information of the decoder block of ControlNet is input to the decoder block of SD. The process of combining the control vector with the generative model can be described by the formula

$$G(z, c) \tag{9}$$

where $G$ represents the generative model, $z$ is the input noise to the generative model, and $c$ is the control vector.

3) Training the ControlNet: In the training phase, we typically need to train the ControlNet to generate appropriate control vectors and integrate the control vectors with the generative model. The training process can be represented as the optimization of a loss function:

$$\min_{\theta} L(G(z, c; \theta), \text{target}) \tag{10}$$

where $\theta$ represents the model parameters, and $L$ is a loss function that measures the difference between the generated samples and the target.

4) Generating controlled samples: Once ControlNet is trained, it can create controlled samples. We can control the generative model by inputting different control vectors to generate samples with different features or attributes. This process can be represented by

$$x_{\text{controlled}} = G(z, c) \tag{11}$$

where $x_{\text{controlled}}$ represents the controlled sample (images of buildings in Fig. 4) generated by the model.

3.5 Validation method
In the domain of computer vision, existing validation methods for ‘text-to-image’ generative AI have primarily focused on appraising the quality of the generated images (Chen et al., 2023; Wang et al., 2023). However, relevant studies that applied ‘text-to-image’ generative AI remain relatively limited in architectural design. Given that the assessment of architectural design is inherently subjective and encompasses aesthetic considerations, Chen et al. (2023) employed a questionnaire survey to subjectively evaluate images produced by generative AI by using several assessment criteria related to design quality (e.g., overall impression, design details, architectural integrity, consistency in architectural style etc.) that professional architects scored. However, these criteria would not be enough to assess the capability of the proposed method to reflect the design intent.

In this regard, this study validates the proposed approach by assessing the performance of the proposed model in terms of whether specific design intent is well reflected in generated architectural exterior conceptual design images. In particular, the outputs (i.e., generated design images) from the proposed method are compared with those from two existing generative image AI models [Mnml.ai (Mnml.ai, 2024, July 24) and Architectural schoolteacher (Technology, 2024, July 24)], when given the same inputs (e.g., textual design intent as prompts and non-verbal design intent as sketches). Specifically, the assessment focuses on the extent to which each model effectively reflects (i) sketches, (ii) general image descriptions, (iii) client needs, and (iv) AL, in addition to (v) overall design quality. The assessment is based on a questionnaire survey conducted by invited professionals using these five criteria.
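As a minimal numerical sketch of the parameter update in Equation (6) combined with the regularized loss in Equation (7), the Python example below fits a small linear model by gradient descent. The mean-squared-error loss, the L2 form of R(θ), the toy data, and the names `loss_and_grad` and `gradient_step` are illustrative assumptions; the paper does not specify them, and the actual fine-tuning operates on the SD model rather than on this toy problem.

```python
import numpy as np

def loss_and_grad(theta, x, y, lam_reg):
    """Regularized loss J_reg = J + lam_reg * R (Equation 7), assuming
    J = mean squared error and R = ||theta||^2, together with its gradient."""
    residual = x @ theta - y
    j = np.mean(residual ** 2)
    r = np.sum(theta ** 2)
    grad = 2.0 * x.T @ residual / len(y) + 2.0 * lam_reg * theta
    return j + lam_reg * r, grad

def gradient_step(theta, grad, eta):
    """Equation (6): theta_updated = theta_previous - eta * gradient."""
    return theta - eta * grad

# Toy training data (assumed): targets generated by a known parameter vector
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 3))
true_theta = np.array([1.0, -2.0, 0.5])
y = x @ true_theta

theta = np.zeros(3)
losses = []
for _ in range(200):
    loss, grad = loss_and_grad(theta, x, y, lam_reg=1e-3)
    losses.append(loss)
    theta = gradient_step(theta, grad, eta=0.05)
```

With a small regularization coefficient the fitted parameters stay close to the generating ones, while a larger `lam_reg` would shrink them towards zero, which is the trade-off that the coefficient in Equation (7) controls.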
Figure 5: Dataset of famous architects.
4. Experimental Demonstration Using Famous Architectural Works
The experimental demonstration shows the overall procedures of the proposed method and qualitatively describes its effectiveness in realizing architectural exterior conceptual design based on design intent. Furthermore, to ascertain the benefits of the proposed method, sample images from the two existing methods mentioned above are presented along with those from the proposed method.

4.1 Datasets
For model demonstration, a dataset was created comprising 2021 images of 198 buildings designed by six famous architects (Frank Gehry, Louis Kahn, Ludwig Mies van der Rohe, Renzo Piano, Richard Meier, and Zaha Hadid), given that the works of renowned architects can be accessed online. Figure 5 shows examples of images from these buildings. The images were resized uniformly to meet the prerequisites of the training process (512 × 512-pixel resolution). Annotations for these images include (i) general image descriptions used in generative image AI model training, (ii) keywords that define client needs of architectural works, and (iii) keywords related to AL that describe architects’ design concepts. While general image descriptions were obtained using Dreambooth, specific keywords related to design intent (client needs and AL) were extracted from online articles using ChatGPT. Subsequently, the author reviewed the extracted textual annotations and had them confirmed by two experienced architectural experts (with more than 3 years of architectural design experience), considering that conventional image descriptions are intuitive. The architects primarily inspected the user requirements and AL implied by the images to eliminate any anomalous annotations. Figure 6 presents an analysis of the frequency of labelled annotations, encompassing two types of AL, client needs, and various general image descriptions.

4.2 Generative AI model fine-tuning
The SD model deployed in this research is built on the PyTorch framework. The SD model has been extensively pre-trained on a large dataset [the LAION-5B dataset (Schuhmann et al., 2022) used for SD training, containing 5.8 billion ‘image-text’ pairs collected from the internet], which provides it with the fundamental ability to generate images from text. To enhance its capabilities in architectural design, it needs to be fine-tuned. The fine-tuning was conducted on a Windows 10 platform with 64 GB of RAM and a GPU with 12 GB of video memory. We set the epoch limit to 30, with a batch size of one, and fine-tuned the model with the LoRA algorithm. To prevent overfitting, dynamic learning rates were used. The specific learning rates for the text encoder and the U-Net within the SD model are detailed in Fig. 7a and b, respectively. Figure 7c illustrates the loss curves throughout the fine-tuning phase, indicating a decreasing trend in the loss function, signalling convergence.

4.3 Inputs for experimental demonstration
To generate new architectural exterior conceptual design images from the fine-tuned model, four input prompts (as textual design intent) and two sketches (as non-verbal design intent) are chosen, as shown in Table 3 and Fig. 8, producing eight combinations as model inputs. The four prompts are all combinations of image descriptions, user requirements, and AL. Sketch 1 in Fig. 8a is a high-rise building, while Sketch 2 in Fig. 8b is a building consisting of three parts.

4.4 Examples of generated images
Using the inputs defined above, sample architectural exterior conceptual design images are generated from the proposed and two
Figure 6: Distribution of image annotations in the dataset.
Figure 7: Fine-tuning learning rate and loss: (a) plot of variation in text encoder learning rate, (b) graph of change in U-Net learning rate, and (c) loss
curves.
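The LoRA fine-tuning referred to in Section 4.2 can be illustrated with a small NumPy sketch of the low-rank update described by Hu et al. (2021): a frozen weight matrix W is augmented with a trainable product B·A of rank r, scaled by alpha/r. The layer sizes, rank, scaling value, and the function name `lora_forward` are assumptions chosen for illustration and are not the settings used in this study.

```python
import numpy as np

def lora_forward(x, w_frozen, a, b, alpha):
    """Linear layer with a LoRA adapter:
    h = x W^T + (alpha / r) * x (B A)^T, where W is kept frozen."""
    r = a.shape[0]
    return x @ w_frozen.T + (alpha / r) * (x @ a.T) @ b.T

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 4                   # hypothetical layer sizes and rank
w_frozen = rng.normal(size=(d_out, d_in))    # pre-trained weights, not updated
a = rng.normal(scale=0.01, size=(r, d_in))   # trainable, small random init
b = np.zeros((d_out, r))                     # trainable, zero init

x = rng.normal(size=(2, d_in))
h_initial = lora_forward(x, w_frozen, a, b, alpha=8.0)

# Trainable parameter counts: full fine-tuning versus the LoRA adapter
full_params = w_frozen.size                  # d_out * d_in
lora_params = a.size + b.size                # r * (d_in + d_out)
```

Because B starts at zero, the adapted layer initially reproduces the pre-trained layer exactly, and only the two small matrices are updated during fine-tuning, which is what makes the approach feasible on a single 12 GB GPU.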
Table 3: Prompts.
ID Prompt
1 “Design a building located beside a road, reflecting a modern style. The building should embrace deconstructivism.”
2 “Design a building with the function of an art showcase, featuring a flowing sense of space. There should be no humans in the generated images.”
3 “Design a modern style building. The building should embody a flowing sense of space. There should be some trees beside the building in the generated images.”
4 “Design a building with the function of an art showcase, embodying a modern style. The architecture should be designed deconstructively, featuring a flowing sense of space. Generate images of the building on grass under blue sky conditions.”
‘Modern style’ means minimalist design, functional spaces, use of new materials like steel and glass, and clean lines; ‘art showcase’ refers to a building designed as a work of art itself or a space optimized for displaying art, emphasizing creative aesthetics and functional design; ‘deconstructivism’ refers to a style characterized by fragmentation, non-linear shapes, and the appearance of controlled chaos, challenging traditional design conventions; ‘flowing sense of space’ describes design with interconnected areas, creating a continuous, dynamic spatial experience.
Figure 8: Sketches: (a) Sketch 1, and (b) Sketch 2.
existing methods [Method 1: Architectural schoolteacher (Technology, 2024, July 24), Method 2: Mnml.ai (Mnml.ai, 2024, July 24)] as shown in Fig. 9. The proposed method successfully generated images that align with the requirements set by sketches and textual descriptions. For instance, Sketch 1’s design intent of a two-part structure (with a shorter top and longer bottom) is depicted in the generated images, regardless of prompt variations. Similarly, Sketch 2’s three-part structure, including a pavilion and two buildings, is effectively realized. In comparison, Method 1 and Method 2 generate images consistent with the sketches. Regarding general description adherence, all methods satisfactorily incorporate the ‘building’ element. The proposed method notably succeeds in incorporating additional elements like roads (Prompt 1), no humans (Prompt 2), trees (Prompt 3), and grass with a blue sky (Prompt 4), outperforming Method 1 and Method 2. Furthermore, in meeting client needs, all three methods produce images with a ‘modern style’ and ‘art showcase’ functionality. However, the proposed method’s outputs are appealing and fitting for an art showcase landmark. Regarding architectural design language, the proposed method effectively generates images in a ‘deconstructivism’ style (Prompt 1 and Prompt 4) and with a ‘flowing sense of space’ (Prompt 2 and Prompt 3), a feat not fully achieved by Method 1 and Method 2.

5. Validation and Results
This study designed a questionnaire survey based on the evaluation criteria outlined in Section 3.5 to quantitatively assess the strengths and weaknesses of our proposed method, as illustrated in Fig. 10. Specifically, we devised eight questionnaires to align with the eight combinations of prompts and sketches. Each questionnaire initially presented images generated by the three methods. To ensure accurate assessment, we randomized the order of the three methods in each questionnaire to form three groups. Respondents were instructed to assess the degree to which the sketches, general image descriptions, client needs, and AL corresponded to the generated images through visual observation. The response options included ‘Not Matched-1’, ‘Poorly Matched-2’, ‘Matched-3’, ‘Well Matched-4’, and ‘Very Well Matched-5’. We manually extracted keywords from the input prompts after categorization to enhance accurate judgment, as detailed in Table 4. Furthermore, participants were requested to evaluate the overall design quality using five options: ‘Poor Quality-1’, ‘Low Quality-2’, ‘Moderate Quality-3’, ‘High Quality-4’, and ‘Exceptional Quality-5’. To ensure comprehensive and expert evaluation, we enlisted professionals in the architectural field (certified architects currently employed at architectural design firms) to complete these questionnaires through the crowdsourcing website Mturk (Amazon, 2024, July 24). We received a total of 50 questionnaires. After excluding responses from individuals with less than one year of architectural experience, we ended up with 39 valid questionnaires; among these respondents, 84.62% have been employed for more than three years, and 61.54% have been employed for more than five years.

Figure 11 displays the distribution of questionnaire scores for the three methods across five categories. The figure shows that the proposed method garners higher top-tier scores in general image description, client needs, and AL. The scores regarding design quality are relatively consistent across all methods. Notably, scores of 3 and above indicate successful alignment between images and their corresponding sketches and descriptions. In this context, Table 5 tabulates the matching rates of the three methods for these criteria. The data reveal that the proposed method excels in general image description, client needs, and architectural design language, registering a matching rate exceeding 80%. Although its performance in sketch matching is somewhat lower, the overall average matching rate for the proposed method is 80.69%. In comparison, the two existing methods fall short of this benchmark, with Method 1 achieving 75.64% and Method 2 reaching 73.00%.

Moreover, we evaluated the proposed method alongside the two other methods regarding specific scores. First, the results of descriptive statistics, including mean and standard deviation, are shown in Table 6. Second, we used analysis of variance (ANOVA; St & Wold, 1989), a statistical analysis method mainly used to test whether the difference in means between two or more samples or groups is statistically significant. A pairwise comparison shows the difference between two methods more intuitively. Therefore, we used IBM SPSS Statistics 25 software to conduct an ANOVA with pairwise comparisons between the methods on the questionnaire scores in each of the five aspects, applying Bonferroni correction (Napierala, 2012) to mitigate the risk of type I errors, and the results are shown in Table 7. Specifically, Method 2 is significantly better than Method 1 regarding matching sketches, with a mean difference of 0.375 (P < 0.001). Method 2 is significantly better than the proposed method, with a mean difference
Figure 9: Results of three methods.
of 0.808 (P < 0.001). Regarding matching general image description, the proposed method is significantly better than Method 1, with a mean difference of 0.455 (P < 0.001). The proposed method is significantly better than Method 2, with a mean difference of 0.391 (P < 0.001). The difference between Method 1 and Method 2 was insignificant (P = 0.496). The proposed method is significantly better than Method 1 in matching client needs with a mean difference of 0.385 (P < 0.001). The proposed method is significantly better than Method 2, with a mean difference of 0.484 (P < 0.001). The difference between Method 1 and Method
Figure 10: Questionnaire.
Table 4: Keywords of the prompts.
Prompt General image description Client needs AL
1 ‘Building, road’ ‘Modern style’ ‘Deconstructivism’
2 ‘Building, no humans’ ‘Art showcase’ ‘Flowing sense of space’
3 ‘Building, tree’ ‘Modern style’ ‘Flowing sense of space’
4 ‘Building, grass, blue sky’ ‘Art showcase, modern style’ ‘Deconstructivism, flowing sense of space’
2 was insignificant (P = 0.232). Regarding matching the AL, the proposed method is significantly better than Method 1, with a mean difference of 0.494 (P < 0.001). The proposed method is significantly better than Method 2, with a mean difference of 0.753 (P < 0.001). Method 1 was significantly better than Method 2, with a mean difference of 0.260 (P = 0.005). Regarding design quality, the differences between the three methods were insignificant, with P values greater than 0.05 for all pairwise comparisons.

Furthermore, we applied the paired t-test (Shi et al., 2024) using IBM SPSS Statistics 25 software to enhance the robustness of the comparison, with the results presented in Table 8. On the metrics of Sketch, Image description, Client needs, and AL, the proposed method differed significantly from the other two methods, scoring higher on Image description, Client needs, and AL and lower on Sketch; in terms of Design quality, the differences between the three were insignificant.

In summary, the proposed method is significantly better (P < 0.0167) in terms of matching general image description, client needs and AL, without significant differences (P > 0.0167) in design quality from the two existing methods.

6. Discussion
This study proposed an architectural exterior conceptual design method that generates images that align with the design intent. We designed a survey to assess the proposed method’s effectiveness and advantages and compared it with two existing approaches. The results indicate that the proposed method can generate architectural images according to design intent; the overall average matching rate for the proposed method is 80.69%. Furthermore, it is significantly better (P < 0.0167) than existing methods in matching general image descriptions, client needs, and AL.

The success of the proposed methodology in understanding textualized architectural design requirements stems from several aspects. The proposed method constructs an architectural design dataset. It adds client needs and AL to the dataset on top of regular image descriptions, thus creating a new dataset containing architectural domain knowledge. Based on this, we employ the LoRA technique to transfer the domain knowledge into generative AI. As a result, when we use prompts containing textualized design intent as input, the generative AI can understand them and generate matching images. Designing domain prompts and boosting generative AI with domain knowledge both fall under prompt engineering (Oppenlaender, 2022). Therefore, this research highlights the importance of prompt engineering in augmenting the capabilities of generative AI in architectural design.

In terms of understanding sketches, the success of the proposed method stems from controlling the image generation process of the generative AI with ControlNet. The proposed method is mainly used in the mass modelling stage of the conceptual design process, which is more concerned with the shape and form of the building. Therefore, as shown in Fig. 12a, the adopted rough sketch showing the pavilion and the two buildings, as well as their shape and spatial relationship, when used as input together with Prompt 4, successfully generates an image of the building that satisfies both
Figure 11: Histogram of multiple indicators of scoring ratio: (a) sketch-matching score, (b) general image description matching score, (c) client needs matching score, (d) AL matching score, and (e) design quality score.
Table 5: Match rates of the three methods in different aspects.
Methods Sketch General image description Client needs AL Overall mean
Method 1 80.77% 65.71% 84.29% 71.79% 75.64%
Method 2 91.03% 67.63% 75.00% 58.33% 73.00%
Proposed 67.95% 81.41% 91.67% 81.73% 80.69%
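The ‘Overall mean’ column of Table 5 can be reproduced as the unweighted average of the four matching rates, and each matching rate follows the rule stated in the text that a score of 3 or above counts as a match. The short Python sketch below shows both computations; the `match_rate` helper and the example score list are hypothetical, and only the percentages are taken from Table 5.

```python
def match_rate(scores, threshold=3):
    """Percentage of Likert scores (1-5) at or above the matching threshold."""
    return 100.0 * sum(s >= threshold for s in scores) / len(scores)

# Matching rates (%) from Table 5, in the order:
# sketch, general image description, client needs, AL
table5 = {
    "Method 1": [80.77, 65.71, 84.29, 71.79],
    "Method 2": [91.03, 67.63, 75.00, 58.33],
    "Proposed": [67.95, 81.41, 91.67, 81.73],
}

# Unweighted average of the four criteria for each method
overall_mean = {
    name: round(sum(rates) / len(rates), 2) for name, rates in table5.items()
}
print(overall_mean)

# Hypothetical ratings for one image: three of the four scores are >= 3
example_rate = match_rate([1, 3, 4, 5])
```

The averages come out as 75.64, 73.00, and 80.69, matching the last column of the table, which confirms that design quality is not part of the overall matching rate.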
Table 6: Results of descriptive statistics for three methods (mean ± standard deviation).
Methods Sketch General image description Client needs AL Design quality
Method 1 3.51 ± 1.102 3.12 ± 1.152 3.48 ± 1.011 3.16 ± 1.116 3.60 ± 0.934
Method 2 3.88 ± 1.020 3.18 ± 1.212 3.38 ± 1.134 2.90 ± 1.209 3.61 ± 0.908
Proposed 3.08 ± 1.151 3.57 ± 1.160 3.87 ± 0.958 3.65 ± 1.104 3.55 ± 1.035
Table 7: Results of pairwise comparisons in five aspects using ANOVA for three methods.
95% confidence interval
Different aspects (I) Method (J) Method Mean difference (I–J) Standard error P value Lower limit Upper limit
Sketch Method 1 Method 2 -0.375∗ 0.087 < 0.001 -0.55 -0.20
Method 1 Proposed 0.433∗ 0.087 < 0.001 0.26 0.60
Method 2 Proposed 0.808∗ 0.087 < 0.001 0.64 0.98
Image description Method 1 Method 2 -0.064 0.094 0.496 -0.25 0.12
Method 1 Proposed -0.455∗ 0.094 < 0.001 -0.64 -0.27
Method 2 Proposed -0.391∗ 0.094 < 0.001 -0.58 -0.21
Client needs Method 1 Method 2 0.099 0.083 0.232 -0.06 0.26
Method 1 Proposed -0.385∗ 0.083 < 0.001 -0.55 -0.22
Method 2 Proposed -0.484∗ 0.083 < 0.001 -0.65 -0.32
AL Method 1 Method 2 0.260∗ 0.092 0.005 0.08 0.44
Method 1 Proposed -0.494∗ 0.092 < 0.001 -0.67 -0.31
Method 2 Proposed -0.753∗ 0.092 < 0.001 -0.93 -0.57
Design quality Method 1 Method 2 -0.010 0.077 0.901 -0.16 0.14
Method 1 Proposed 0.045 0.077 0.560 -0.11 0.20
Method 2 Proposed 0.054 0.077 0.479 -0.10 0.21
∗Applying Bonferroni correction to reduce the risk of Type I errors. Since each metric was compared three times, the significance level is 0.05/3 = 0.0167. The name
of the significantly better method in a two-by-two comparison is bolded.
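The Bonferroni-adjusted threshold in the note to Table 7 and the reported 95% confidence intervals can be checked with a few lines of Python. The sketch assumes a normal approximation (critical value 1.96) for the interval, which is an assumption rather than the exact procedure SPSS applies; the mean differences, standard errors, and P values are copied from three rows of Table 7, with ‘< 0.001’ entered as 0.001.

```python
ALPHA = 0.05
N_COMPARISONS = 3                    # three pairwise comparisons per aspect
threshold = ALPHA / N_COMPARISONS    # Bonferroni-adjusted significance level

# (aspect, method I, method J, mean difference I-J, standard error, P value)
rows = [
    ("Sketch", "Method 1", "Method 2", -0.375, 0.087, 0.001),
    ("AL", "Method 1", "Method 2", 0.260, 0.092, 0.005),
    ("Design quality", "Method 1", "Method 2", -0.010, 0.077, 0.901),
]

def confidence_interval(mean_diff, std_err, z=1.96):
    """Normal-approximation 95% interval: mean_diff +/- z * std_err."""
    return (round(mean_diff - z * std_err, 2), round(mean_diff + z * std_err, 2))

results = []
for aspect, method_i, method_j, diff, se, p in rows:
    results.append((aspect, p < threshold, confidence_interval(diff, se)))

for aspect, significant, ci in results:
    print(aspect, significant, ci)
```

Under these assumptions the intervals agree with the lower and upper limits printed in Table 7 for the same rows, and only the design quality comparison fails the adjusted 0.0167 threshold.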
Table 8: Results of pairwise comparisons in five aspects using paired t-tests for three methods.
Different aspects (I) Method (J) Method Paired differences (I–J) t-value Sig. (two-tailed)
Sketch Method 1 Method 2 0.027∗ -13.660 < 0.001
Method 1 Proposed 0.028∗ 15.401 < 0.001
Method 2 Proposed 0.025∗ 31.834 < 0.001
Image description Method 1 Method 2 0.017∗ -3.863 < 0.001
Method 1 Proposed 0.028∗ -16.118 < 0.001
Method 2 Proposed 0.027∗ -14.131 < 0.001
Client needs Method 1 Method 2 0.022∗ 4.471 < 0.001
Method 1 Proposed 0.028∗ -13.942 < 0.001
Method 2 Proposed 0.028∗ -17.079 < 0.001
AL Method 1 Method 2 0.025∗ 10.443 < 0.001
Method 1 Proposed 0.028∗ -17.411 < 0.001
Method 2 Proposed 0.026∗ -28.453 < 0.001
Design quality Method 1 Method 2 0.060 -0.159 0.873
Method 1 Proposed 0.065 0.686 0.493
Method 2 Proposed 0.067 0.804 0.422
∗Applying Bonferroni correction to reduce the risk of Type I errors. Since each metric was compared three times, the significance level is 0.05/3 = 0.0167. The name
of the significantly better method in a two-by-two comparison is bolded.
Figure 12: Comparison results of rough and more detailed sketches: (a) rough sketch, (b) generated image by rough sketch, (c) more detailed sketch,
and (d) generated image by more detailed sketch.
the textualized architectural design language and the sketches of the buildings, as shown in Fig. 12b. The architectural exterior conceptual design process is an iterative process where the design of a building is continuously refined. Therefore, inspired by the generated image, the architect absorbed the design of the flow-sensitive eaves and further added the need for a door design, resulting in a refined sketch, as shown in Fig. 12c. When this refined sketch was entered together with Prompt 4, an architectural design matching the refined sketch was successfully generated. This indicates that, on the one hand, the proposed method can meet the design requirements of sketches with different levels of refinement; on the other hand, the proposed method can be used in a continuous iterative process in the conceptual design phase. Architects can cooperate with generative AI to generate conceptual designs that meet the design intent and are creative.

Despite the proposed method’s success in this study’s experiments, it still faces limitations. (i) The dataset’s limited sources, comprising works of only six famous architects, restrict architectural style diversity. Capturing works from more architects is acknowledged as necessary to enrich the dataset and support diverse architectural designs. (ii) The types of text annotation in the dataset are constrained, relying on a restricted number of images for specific keywords. Future improvements involve acquiring more architectural images and adding richer textual annotations to address broader architectural design needs. (iii) Textual descriptions corresponding to architectural images are often incomplete or even missing. Considering how to construct a dataset when textual descriptions are insufficiently available or comprehensive can help to increase the generality of the proposed method.

The use of generative AI in architectural exterior conceptual design has far-reaching implications. Generative AI can generate photorealistic images comparable to those produced by professional architects. This technology not only improves the efficiency and speed of design plan generation and shortens the initial drafting time for architects but also provides innovative design options and enhances creativity. In the future, developing a practical web interface or app based on the proposed method will benefit architects and accelerate the design process.

7. Conclusions
This study proposes an approach to automate the architectural exterior conceptual design based on architectural design intent. The contributions of this study are as follows: teaching generative AI to understand textual design intent and allowing generative AI to combine textual and non-textual design intent. For this purpose, we constructed an architectural image dataset and added general image descriptions, client needs and AL. The SD model is fine-tuned using LoRA to enable the generative AI to understand the textualized design intent. In addition, we used ControlNet to control the SD generation process so that the generated architectural conceptual images also conform to the sketches.

We have designed comparative experiments and verified the effectiveness of the proposed method using a questionnaire. The results indicate that the proposed method can generate
architectural images by design intent; the overall average match- sign thinking. Sustainability, 15, 10573. https://doi.org/10.3390/su
ing rate for the proposed method is 80.69%. It is significantly better 151310573.
(P < 0.0167) than the existing methods in terms of understanding Coates, P., Healy, N., Lamb, C., & Voon, W. (1996). The use of Cellular
general image descriptions, client needs and architectural design Automata to explore bottom up architectonic rules. In Proceedings
language (Mnml.ai and Architectural schoolteacher). This study of the Eurographics UK Chapter 14th Annual Conference. Eurographics
demonstrates the potential of prompt engineering in enhancing Association UK.
the performance of generative AI in architectural design. Dehouche, N., & Dehouche, K. (2023). What’s in a text-to-image
Further prompt engineering efforts are essential to support a prompt? The potential of stable diffusion in visual arts education.
broader range of complex and diverse architectural design intents. Heliyon, 9, e16757. https://doi.org/10.1016/j.heliyon.2023.e16757.
These include expanding the dataset to encompass a greater va- Dutta, K., & Sarthak, S. (2011). Architectural space planning using
riety of data, which will aid the method in generating more di- evolutionary computing approaches: a review. Artificial Intelligence
verse architectural concept sketches. Enriching the dataset with Review, 36, 311–321. https://doi.org/10.1007/s10462- 011- 9217- y.
a broader array of keyword tags, including client needs and AL, Eilouti, B. (2018). Concept evolution in architectural design: an oc-
will further enhance the generative AI’s understanding of com- tonary framework. Frontiers of Architectural Research, 7, 180–196.
plex and nuanced architectural design intentions. https://doi.org/10.1016/j.foar.2018.01.003.
Elgammal, A., Liu, B., Elhoseiny, M., & Mazzone, M. (2017). Can:
Creative adversarial networks, generating “art” by learning
Conflict of interest statement about styles and deviating from style norms. arXiv preprint
arXiv:1706.07068. https://doi.org/10.48550/arXiv.1706.07068.
None declared.
EvolveLAB. (2024). Veras. https://www.evolvelab.io/veras Accessed 24
July 2024.
Gan, W., Zhao, Z., Wang, Y., Zou, Y., Zhou, S., & Wu, Z. (2024).
Author Contributions UDGAN: a new urban design inspiration approach driven by using
M.S.: Methodology, software, and writing (original draft). H.-L.: Generative Adversarial Networks. Journal of Computational Design
Writing (review and editing). J.S.: Conceptualization and supervi- and Engineering, 11, 305–324. https://doi.org/10.1093/jcde/qwae0
sion. S.H.C.: Writing (review and editing) and validation. B.X.: Writ- 14.
ing (review and editing). Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D.,
Ozair, S., Courville, A., & Bengio, Y., (2014). Generative adversarial
nets. Advances in Neural Information Processing Systems, 27, 1–9. ht
Data availability tps://doi.org/10.48550/arXiv.1406.2661.
Hartoonian, G. (2010). Frank Gehry: roofing, wrapping, and wrapping
Data will be available upon request.
the roof. The Journal of Architecture, 7, 1–31. https://doi.org/10.108
0/13602360110114759.
Hitaj, B., Ateniese, G., & Perez-Cruz, F. (2017). Deep models under the
References GAN: Information leakage from collaborative deep learning. In
Proceedings of the ACM SIGSAC Conference on Computer and Commu-
Amazon. (2024). Amazon Mechanical Turk (Mturk). https://www.mturk. nications Security.
com Accessed 24 July 2024. Hu, E. J., Shen, Y., Wallis, P, Allen-Zhu, Z, Li, Y., Wang, S., Wang, L.,
Anzalone, P., & Clarke, C. (2003). Architectural applications of com- & Chen, W. (2021). Lora: low-rank adaptation of large language
plex adaptive systems. In Proceedings of the 2003 Annual Conference models. arXiv preprint arXiv:2106.09685. 10, 1–26. https://doi.org/
of the Association for Computer Aided Design in Architecture. Ball State 10.48550/arXiv.2106.09685.
University. Jo, H., Lee, J.-K., Lee, Y.-C., & Choo, S. (2024). Generative artificial in-
Baltus, V., & Žebrauskas, T. (2019). Parametric design concept in ar- telligence and building design: early photorealistic render visu-
chitectural studies. Architecture and Urban Planning, 15, 96–100. alization of façades using local identity-trained models. Journal of
https://doi.org/10.2478/aup- 2019- 0013. Computational Design and Engineering, 11, 85–105. https://doi.org/
Castro Pena, M. L., Carballal, A., Rodríguez-Fernández, N., Santos, I., & 10.1093/jcde/qwae017.
Romero, J. (2021). Artificial intelligence applied to conceptual de- Joye, Y. (2011). A review of the presence and use of fractal geometry
sign. A review of its use in architecture. Automation in Construction, in architectural design. Environment and Planning B: Planning and
124, 103550. https://doi.org/10.1016/j.autcon.2021.103550. Design, 38, 814–828. https://doi.org/10.1068/b36032.
Chang, S., Dong, W., & Jun, H. (2020). Use of electroencephalogram Kakooee, R., & Dillenburger, B. (2024). Reimagining space layout de-
and long short-term memory networks to recognize design pref- sign through deep reinforcement learning. Journal of Computational
erences of users toward architectural design alternatives. Journal Design and Engineering, 11, 43–55. https://doi.org/10.1093/jcde/q
of Computational Design and Engineering, 7, 551–562. https://doi.or wae025.
g/10.1093/jcde/qwaa045. Krūgelis, L. (2018). 3D printing technology as a method for discover-
Chen, J., Wang, D., Shao, Z., Zhang, X., Ruan, M., Li, H., & Li, J. (2023). ing new creative opportunities for architecture and design. Land-
Using artificial intelligence to generate master-quality architec- scape Archit. Art, 13, 87–94. https://doi.org/10.22616/j.landarchart
tural designs from text descriptions. Buildings, 13, 2285. https: .2018.13.10.
//doi.org/10.3390/buildings13092285. Li, H., Gu, J., Koner, R., Sharifzadeh, S., & Tresp, V. (2023). Do DALL-E
Chen, X., Kang, S. B., Xu, Y.-Q., Dorsey, J., & Shum, H.-Y. (2008). Sketch- and Flamingo understand each other. Proceedings of the IEEE/CVF
ing reality: realistic interpretation of architectural designs. ACM International Conference on Computer Vision.
Transactions on Graphics (TOG), 27, 1–15. https://doi.org/10.1145/13 Mnml.ai. (2024). Architecture AI Design Assistant(mnml.ai). https://mn
56682.1356684. ml.ai/explore Accessed 24 July 2024.
Chen, Y., & Kitagawa, K. (2023). Locally based architectural construc- Mora, R., Bédard, C., & Rivard, H. (2008). A geometric modelling frame-
tion strategies in rural China: textual analysis of architects’ De- work for conceptual structural design from early digital architec-
142 | AI-Driven architectural exterior concept design
tural models. Advanced Engineering Informatics, 22, 254–270. https: Shi, Z., Jin, N., Chen, D., & Ai, D. (2024). A comparison study of se-
//doi.org/10.1016/j.aei.2007.03.003. mantic segmentation networks for crack detection in construc-
Napierala, M. A. (2024). What is the Bonferroni correction? In Aaos tion materials. Construction and Building Materials, 414, 134950.
Now (pp. 40–41). https://docs.ufpr.br/∼giolo/LivroADC/Material/S https://doi.org/10.1016/j.conbuildmat.2024.134950.
3_Bonferroni%20Correction.pdf Accessed 24 July 2014. Song, J., Lee, J.-K., Choi, J., & Kim, I. (2020). Deep learning-based ex-
Ni, Z., Wei, L., Li, J., Tang, S., Zhuang, Y., & Tian, Q. (2023). traction of predicate-argument structure (PAS) in building design
Degeneration-tuning: Using scrambled grid shield unwanted rule sentencesI. Journal of Computational Design and Engineering, 7,
concepts from stable diffusion. Proceedings of the 31st ACM Inter- 563–576. https://doi.org/10.1093/jcde/qwaa046.
national Conference on Multimedia. St, L., & Wold, S. (1989). Analysis of variance (ANOVA). Chemometrics
Oppenlaender, J. (2022). Prompt engineering for text-based genera- and Intelligent Laboratory Systems, 6, 259–272. https://doi.org/10.1
tive art. arXiv preprint arXiv:2204.13988. 6, 1–18. https://doi.org/ 016/0169- 7439(89)80095- 4.
10.48550/arXiv.2204.13988. Technology, S. C. N. (2024). Architectural Schoolteacher. https://jianzhux
Pérez, R. I. P. (2024). Blurring the Boundaries between Real and Artificial uezhang.com Accessed 24 July 2024.
in Architecture and Urban Design through the Use of Artificial Intelli- Thyssen, M. H., Emmitt, S., Bonke, S., & Kirk-Christoffersen, A. (2010).
Downloaded from https://academic.oup.com/jcde/article/11/5/125/7749580 by guest on 11 November 2024
gence. Universidade da Coruña. http://hdl.handle.net/2183/19688 Facilitating client value creation in the conceptual design phase
Accessed 24 July 2024. of construction projects: A workshop approach. Architectural Engi-
Pourzolfaghar, Z., Ibrahim, R., Abdullah, R., & Adam, N. M. (2014). A neering and Design Management, 6, 18–30. https://doi.org/10.3763/
technique to capture multi-disciplinary tacit knowledge during aedm.2008.0095.
the conceptual design phase of a building project. Journal of Infor- Wang, B., Zhu, Y., Chen, L., Liu, J., Sun, L., & Childs, P. (2023). A study of
mation & Knowledge Management, 13, 1450013. https://doi.org/10.1 the evaluation metrics for generative images containing combi-
142/S0219649214500130. national creativity. Artificial Intelligence for Engineering Design, Anal-
Qiu, S., Fok, S., Chen, C., & Xu, S. (2002). Conceptual design using evo- ysis and Manufacturing, 37, e11. https://doi.org/10.1017/S0890060
lution strategy. The International Journal of Advanced Manufacturing 423000069.
Technology, 20, 683–691. https://doi.org/10.1007/s001700200207. Wen, W., Hong, L., & Xueqiang, M. (2010). Application of fractals in
Rapone, G., & Saro, O. (2012). Optimisation of curtain wall façades for architectural shape design. In Proceedings of the 2010 IEEE 2nd Sym-
office buildings by means of PSO algorithm. Energy and Buildings, posium on Web Society. IEEE.
45, 189–196. https://doi.org/10.1016/j.enbuild.2011.11.003. Wikberg, F., Olofsson, T., & Ekholm, A. (2014). Design configuration
Ruiz, N., Li, Y., Jampani, V., Pritch, Y., Rubinstein, M., & Aberman, K. with architectural objects: Linking customer requirements with
(2023). Dreambooth: Fine tuning text-to-image diffusion models system capabilities in industrialized house-building platforms.
for subject-driven generation. Proceedings of the IEEE/CVF Confer- Construction Management and Economics, 32, 196–207. https://doi.
ence on Computer Vision and Pattern Recognition. IEEE. org/10.1080/01446193.2013.864780.
Schuhmann, C., Beaumont, R., Vencu, R., Gordon, C., Wightman, Xia, C., Zhu, Y., & Lin, B. (2008). Building simulation as assistance in
R., Cherti, M., Coombes, T., Katta, A., Mullis, C., Wortsman, M., the conceptual design. Building Simulation, 1, 46–52. https://doi.or
Schramowski, P., Kundurthy, S., Crowson, K., Schmidt, L. Kaczmar- g/10.1007/s12273- 008- 8107- y.
czyk, R., & Jitsev, J. (2022). Laion-5b: An open large-scale dataset Yi, H., & Kim, I. (2022). Differential evolutionary cuckoo-search-
for training next generation image-text models. Advances in Neu- integrated tabu-adaptive pattern search (DECS-TAPS): A novel
ral Information Processing Systems, 35, 25278–25294. https://doi.or multihybrid variant of swarm intelligence and evolutionary algo-
g/10.48550/arXiv.2210.08402. rithm in architectural design optimization and automation. Jour-
Sengupta, A., & Mishra, V. K. (2014). Integrated particle swarm op- nal of Computational Design and Engineering, 9, 2103–2133. https:
timization (i-PSO): An adaptive design space exploration frame- //doi.org/10.1093/jcde/qwac100.
work for power-performance tradeoff in architectural synthesis. Zhang, L., Rao, A., & Agrawala, M. (2023). Adding conditional control
In Proceedings of the Fifteenth International Symposium on Quality Elec- to text-to-image diffusion models. In Proceedings of the IEEE/CVF
tronic Design. IEEE. International Conference on Computer Vision. IEEE.
Sharafi, P., Teh, L. H., & Hadi, M. N. (2015). Conceptual design opti- Zhao, S. (2021). Using artificial neural network and WebGL to algo-
mization of rectilinear building frames: A knapsack problem ap- rithmically optimize window wall ratios of high-rise office build-
proach. Engineering Optimization, 47, 1303–1323. https://doi.org/10 ings. Journal of Computational Design and Engineering, 8, 638–653.
.1080/0305215X.2014.963068. https://doi.org/10.1093/jcde/qwab005.
Received: May 23, 2024. Revised: August 14, 2024. Accepted: August 14, 2024
© The Author(s) 2024. Published by Oxford University Press on behalf of the Society for Computational Design and Engineering. This is an Open Access article distributed
under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use,
distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected]