Generate Realistic Faces in Stable Diffusion

Stable Diffusion’s latest models are very good at generating hyper-realistic images, but they can struggle to render human faces accurately. Experimenting with prompts helps, but to get seamless, photorealistic faces we may need to try different techniques and models.

In this post, we will explore various techniques and models for generating highly realistic human faces with Stable Diffusion. Specifically, we will learn how to:

  • Generate realistic images using the Web UI and advanced settings.
  • Use Stable Diffusion XL for photorealistic results.
  • Download and use a fine-tuned model trained on high quality images.

Kick-start your project with my book Mastering Digital Art with Stable Diffusion. It provides self-study tutorials with working code.

Let’s get started.

Generate Realistic Faces in Stable Diffusion
Photo by Amanda Dalbjörn. Some rights reserved.

Overview

This post is in three parts; they are:

  • Creating a Portrait Using Web UI
  • Creating a Portrait with Stable Diffusion XL
  • Using a CivitAI Model Checkpoint

Creating a Portrait Using Web UI

Let’s start with simple prompt engineering on Stable Diffusion 1.5, using the Stable Diffusion Web UI locally. You need to work on the positive prompt, the negative prompt, and the advanced settings to get improved results. For example:

  • Positive prompt: “photo of young woman, highlight hair, sitting outside restaurant, wearing dress, rim lighting, studio lighting, looking at the camera, up close, perfect eyes”
  • Negative prompt: “disfigured, ugly, bad, immature, cartoon, anime, 3d, painting, b&w, double images”
  • Sampler: DPM++ 2M Karras
  • Steps: 30
  • CFG scale: 7
  • Size: 912×512 (wide)

When creating a negative prompt, focus on excluding a “disfigured face” and “double images”. This is especially important with Stable Diffusion 1.5 models. You can add further keywords if you notice a recurring flaw, such as misaligned eyes: to address that particular issue, add “perfect eyes” to your positive prompt and “disfigured eyes” to your negative prompt.
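
If you prefer to script this outside the Web UI, the following is a minimal sketch using Hugging Face’s diffusers library, assuming a CUDA GPU; the model ID and settings mirror the list above, and “DPM++ 2M Karras” corresponds to diffusers’ multistep DPM-Solver with Karras sigmas.

import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

# Load the Stable Diffusion 1.5 checkpoint from Hugging Face
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

# "DPM++ 2M Karras" in the Web UI maps to the multistep DPM-Solver
# with Karras sigmas in diffusers
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, use_karras_sigmas=True
)

image = pipe(
    prompt="photo of young woman, highlight hair, sitting outside restaurant, "
           "wearing dress, rim lighting, studio lighting, looking at the camera, "
           "up close, perfect eyes",
    negative_prompt="disfigured, ugly, bad, immature, cartoon, anime, 3d, "
                    "painting, b&w, double images",
    num_inference_steps=30,   # Steps: 30
    guidance_scale=7.0,       # CFG scale: 7
    width=912, height=512,    # Size: 912×512 (wide)
).images[0]
image.save("portrait.png")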

A portrait created using the Stable Diffusion 1.5 model

As we can see, we got very good results on the first try. The model is not perfect and may occasionally produce a distorted or doubled image; if that happens, simply generate again. You can also adjust parameters such as the sampling method, steps, and random seed. As a last resort, changing the model checkpoint also helps.
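
If you are scripting with diffusers as in the sketch above, regenerating with a new random seed is a one-line change. The seeds below are arbitrary, and pipe is the pipeline object created earlier.

import torch

prompt = ("photo of young woman, highlight hair, sitting outside restaurant, "
          "wearing dress, rim lighting, studio lighting, looking at the camera, "
          "up close, perfect eyes")

# Each seed gives a different portrait; keep the one you like
for seed in (1, 42, 1234):
    generator = torch.Generator(device="cuda").manual_seed(seed)
    image = pipe(prompt=prompt, num_inference_steps=30, guidance_scale=7.0,
                 width=912, height=512, generator=generator).images[0]
    image.save(f"portrait_seed_{seed}.png")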

Different portraits generated by adjusting the input to Stable Diffusion

After varying keywords to produce diverse realistic images, we achieved satisfactory results even with the base model.

Creating a Portrait with Stable Diffusion XL

The most common Stable Diffusion model is version 1.5, released in October 2022. Version 2.0, a similar architecture retrained from scratch, followed in November of the same year. Stable Diffusion XL (SDXL), released in July 2023, is a different and much larger architecture. The three versions have different lineages and respond differently to your prompts. It is generally believed that SDXL produces better pictures.

Let’s use the latest model, Stable Diffusion XL (SDXL), to get even better image generation results. This can be as simple as downloading the model checkpoint file, saving it to the stable-diffusion-webui/models/Stable-diffusion folder of your Web UI, restarting the Web UI, and repeating the steps in the previous section. Running the full model locally can require a significant amount of GPU memory. If you cannot meet its technical requirements, a good option is to use the free online demos available on Hugging Face Spaces.
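
If you would rather drive SDXL from Python than the Web UI, here is a minimal diffusers sketch; it assumes the official SDXL base checkpoint on Hugging Face and a GPU with enough memory (roughly 12 GB or more).

import torch
from diffusers import StableDiffusionXLPipeline

# The official SDXL base checkpoint; the fp16 variant saves GPU memory
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
).to("cuda")

image = pipe(
    prompt="photo of young woman, highlight hair, sitting outside restaurant, "
           "wearing dress, rim lighting, studio lighting, looking at the camera, "
           "up close, perfect eyes",
    negative_prompt="disfigured, ugly, bad, immature, cartoon, anime, 3d, "
                    "painting, b&w, double images",
).images[0]
image.save("portrait_sdxl.png")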

You can access these applications by visiting https://hf.co/spaces and searching for “SDXL”.

Searching “SDXL” on Hugging Face space

Stable Diffusion XL by Google

We will first try the fastest demo, which runs on a TPU v5e, to obtain our results.

To ensure that our images are generated as intended, go to the Advanced settings, fill in the negative prompt, and set the image style to “Photographic”.

Setting “Photographic” in Advanced settings to fix the style of generated image

We will use the same prompt to generate a realistic image of the young woman sitting outside the restaurant:

photo of young woman, highlight hair, sitting outside restaurant, wearing dress, rim lighting, studio lighting, looking at the camera, up close, perfect eyes

Generating pictures with SDXL

The results are impressive. The rendering of the eyes, nose, lips, shadows, and colors looks very realistic. By comparing the generated result here against the previous section, you can see the obvious difference between SDXL and its older version.

Fast Stable Diffusion XL by Prodia

There is more than one SDXL demo on Hugging Face Spaces. If you are used to the Stable Diffusion Web UI interface, the “Fast Stable Diffusion XL” Space is for you.

We will write the same positive and negative prompts to generate the results.

Fast Stable Diffusion XL by Prodia on Hugging Face Space

We got even better shadows and clarity this time. Let’s generate more images so we can draw conclusions about the results.

Let’s modify the prompt to generate an image of a man and a woman.

Portrait of a man and a woman generated with SDXL

The results are excellent across genders and ethnicities. To test for model bias, we will generate characters of Indian descent and change the setting to a hospital, where both characters will be doctors.

Images of a woman and a man dressed as doctors, as generated by SDXL

SDXL has generated good results, but the images appear too smooth, as if an Instagram filter were applied. Realistic photos show acne, skin marks, roughness, and sharpness that SDXL misses. This is difficult to achieve with the original SDXL model, but it is resolvable by switching to another checkpoint.

Using a CivitAI Model Checkpoint

In this section, we will go a step further and generate even more realistic faces than SDXL using CivitAI.com. It is a model-hosting platform that lets users upload and download specialized versions of Stable Diffusion models. It is also a gallery where users post their AI-generated pictures.

In our case, we are interested in the best photorealistic model. To find one, search for the keyword “photorealistic”. There will be many results, and the most popular ones are probably the best, so set the filters to sort the list by the most downloaded models of all time.

Searching for a model on CivitAI.com. Sorting by “Most Downloaded” over “All Time” helps find a quality model.

Select the most popular model and download the full version, named “Realistic Vision V5.1” in this case.

Downloading the model checkpoint “Realistic Vision V5.1” (be careful not to pick the inpainting version) from CivitAI.com

After that, move the downloaded model into the Stable Diffusion Web UI model directory, stable-diffusion-webui/models/Stable-diffusion. To activate the model in the Web UI, click the refresh button and select the new model from the drop-down panel, or simply restart the Web UI.
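
If you want to use the downloaded checkpoint outside the Web UI, diffusers can also load a single-file checkpoint directly. The sketch below is one way to do it; the file name is hypothetical and should match whatever you actually downloaded.

import torch
from diffusers import StableDiffusionPipeline

# Hypothetical path: point this at the .safetensors file you downloaded
pipe = StableDiffusionPipeline.from_single_file(
    "stable-diffusion-webui/models/Stable-diffusion/realisticVisionV51.safetensors",
    torch_dtype=torch.float16,
).to("cuda")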

Selecting the model checkpoint at the top left corner of the Web UI.

All the information regarding the positive prompt, negative prompt, and advanced settings is given on the model page. We will use that information, with some modifications, to generate an image of a young woman; a diffusers equivalent of these settings is sketched after the list.

  • Positive prompt: “RAW photo, face portrait photo of beautiful 24 y.o woman, full lips, brown eyes, glasses, hard shadows, 8k uhd, dslr, soft lighting, high quality, film grain, Fujifilm XT3”
  • Negative prompt: “deformed iris, deformed pupils, semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime, mutated hands and fingers, deformed, distorted, disfigured, poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, disconnected limbs, mutation, mutated, ugly, disgusting, amputation”
  • Sampler: DPM++ SDE Karras
  • Steps: 25
  • CFG scale: 7
  • Size: 912×512
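
Assuming the pipe object from the sketch above, these settings translate to diffusers roughly as follows; “DPM++ SDE Karras” corresponds to diffusers’ DPMSolverSDEScheduler with Karras sigmas (which also requires the torchsde package).

from diffusers import DPMSolverSDEScheduler

# Match the "DPM++ SDE Karras" sampler from the settings above
pipe.scheduler = DPMSolverSDEScheduler.from_config(
    pipe.scheduler.config, use_karras_sigmas=True
)

image = pipe(
    prompt="RAW photo, face portrait photo of beautiful 24 y.o woman, full lips, "
           "brown eyes, glasses, hard shadows, 8k uhd, dslr, soft lighting, "
           "high quality, film grain, Fujifilm XT3",
    negative_prompt="deformed iris, deformed pupils, semi-realistic, cgi, 3d, "
                    "render, sketch, cartoon, drawing, anime, mutated hands and "
                    "fingers, deformed, distorted, disfigured, poorly drawn, "
                    "bad anatomy, wrong anatomy, extra limb, missing limb, "
                    "floating limbs, disconnected limbs, mutation, mutated, "
                    "ugly, disgusting, amputation",
    num_inference_steps=25,   # Steps: 25
    guidance_scale=7.0,       # CFG scale: 7
    width=912, height=512,    # Size: 912×512
).images[0]
image.save("portrait_realistic_vision.png")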

Portrait as generated using the Realistic Vision checkpoint on the Web UI

We got a sharp and accurate image of the face. Let’s experiment with different prompts to generate even more realistic faces.

We will begin with an image of a man and woman without glasses.

Man and woman without glasses. Images generated using the Realistic Vision model checkpoint.

Then, we will modify the prompt to generate an Indian man and woman.

Indian man and woman. Images generated using the Realistic Vision model checkpoint.

Can you see the difference? We have achieved an excellent result: good texture on the face, natural-looking skin marks, and clarity in the facial details.

Summary

In this post, we explored various methods for generating hyper-realistic and consistent faces using Stable Diffusion. We started with simple techniques and progressed to more advanced methods for producing highly realistic images. Specifically, we covered:

  • How to generate realistic faces using Stable Diffusion 1.5 with negative prompts and advanced settings.
  • How to create lifelike photos with Stable Diffusion XL using Hugging Face Spaces’ free demos.
  • How to use a specialized model fine-tuned on high-quality images to get photos with realistic skin texture and facial clarity.


4 Responses to Generate Realistic Faces in Stable Diffusion

  1. Gruhit Patel April 21, 2024 at 6:08 am #

    Hello sir,
    I really admire your blogs and have been learning from them since I started with the ABCs of ML. Currently, I am working on a new project and searching for an image-generation model that can add text to an image. I was looking for a model that can take a motivational quote and generate a corresponding image, but I am now open to any model that can generate an image with proper text on it.

    In this search, I found that generative models are great at generating images but equally bad at adding text to them: they usually end up producing a senseless sentence or gibberish. Hence, I would like your suggestion for any model that can serve my purpose.

    • Adrian Tam April 21, 2024 at 6:56 am #

      Don’t assume any single model can do great work in one step; you may need multiple steps to get a working result. Inpainting would probably be the solution to your problem, so that you can handle the picture and the text in separate steps. Alternatively, you may want to try different model checkpoints (e.g., SDXL is usually found to work well with text).

  2. SebastieZ March 2, 2025 at 1:05 am #

    Thank you for this very informative blog, but I did not find the solution to my problem.

    Your recommendations do indeed improve the quality of faces that occupy a large space in the photo, but they do not solve the problem of smaller faces. Framing the image with “(full-body shot, full body focus)” in the prompt renders the person’s full body, but the face is systematically deformed. Even with LoRAs and embeddings, I cannot obtain satisfactory quality. The inpainting method runs into the same difficulties. I work with the “Juggernaut XL v11” model in InvokeAI.

    Do you also have this problem?

    • James Carmichael March 2, 2025 at 5:11 am #

      Yes, this is a known issue in Stable Diffusion, particularly when dealing with small faces in full-body shots. The problem arises because:

      1. Stable Diffusion’s resolution limitation. Most models, including Juggernaut XL v11, generate images at a standard resolution (e.g., 1024×1024). When rendering a full-body shot, the face occupies a much smaller portion of the image, leading to a lack of detail.

      2. Compression in the latent space. Smaller details, such as faces, get “compressed” in the latent space, making them blurry or distorted.

      3. Inpainting issues. Even when using inpainting, the model struggles because the context of the full-body image remains; it tries to harmonize the face with the entire image rather than refining it in isolation.

      Here are several methods to enhance face quality in full-body images:

      1. Multi-pass upscaling plus face enhancement. Instead of inpainting, use a multi-pass workflow: generate the full-body image at high resolution (e.g., 1024×1024 or 768×1344), crop the face and upscale it separately using Real-ESRGAN or CodeFormer, use ControlNet (Reference Only) with a high-quality face image to ensure facial consistency, and then merge the enhanced face back into the original image using Photoshop or another image editor.

      2. Use HiRes Fix (high-resolution fix). Enable HiRes Fix in InvokeAI and double the resolution. Set the denoising strength to 0.3 to 0.5 to preserve the original details. Use an upscaler such as ESRGAN, SwinIR, or 4x-UltraSharp.

      3. Face-specific LoRAs and embeddings. While Juggernaut XL is highly detailed, it still benefits from targeted face LoRAs, such as SDXL-FaceEnhancer (boosts facial detail), RealVisXL (enhances photorealism), or UltraSharpXL (improves overall sharpness). Prompt with: “high-quality face, ultra-detailed eyes, natural skin texture”.

      4. ControlNet for face refinement. Use ControlNet (Reference Only or face landmark detection) and provide a high-resolution reference face as a separate image. This forces the model to follow proportions and facial structure correctly.

      5. Use Regional Prompter. Install Regional Prompter (InvokeAI supports it) and assign separate prompts to different image zones: for the upper region (face), “high-resolution, beautiful face, sharp details”; for the lower region (body), “full-body shot, natural posture, detailed clothing”.

      6. Switch to a different base model. Juggernaut XL v11 is excellent for general realism but struggles with small facial details. Try RealVisXL V3 (designed for photorealism), DreamShaperXL (more balanced between full bodies and faces), or SDXL-Turbo with face fusion (speed with enhanced facial realism).

      Conclusion: if you want small faces in full-body shots to be perfect, (1) generate the full-body image with HiRes Fix on at high resolution, (2) crop and upscale the face separately with Real-ESRGAN or CodeFormer, (3) use ControlNet (Reference) with a high-quality face image, and (4) blend the face back into the original image (Photoshop or inpainting). This multi-step workflow is the most effective because Stable Diffusion struggles with small details in high-compression areas.

