Issues triggering DALL-E in a Custom GPT

I am building GPTs for narrative-based games and, inevitably, I would like to use DALL-E 3 to create images when certain conditions in the story are triggered.

For the record, other triggered conditions, such as having the GPT call a Python function to perform a “skill check”, work fairly reliably.
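To give an idea of the kind of "skill check" involved, here is a minimal sketch of what the code interpreter might run; the function name and rules are hypothetical placeholders, not the actual game code:

```python
import random

def skill_check(modifier, difficulty, rng=None):
    """Roll a d20, add the character's modifier, and compare against a
    difficulty threshold. Returns the details so the GPT can narrate them."""
    rng = rng or random.Random()
    roll = rng.randint(1, 20)  # the d20 roll
    total = roll + modifier
    return {
        "roll": roll,
        "modifier": modifier,
        "total": total,
        "success": total >= difficulty,
    }

# Example: a +3 modifier against difficulty 15, with a seeded generator
# so the result is reproducible.
result = skill_check(modifier=3, difficulty=15, rng=random.Random(0))
```

The point is simply that the GPT calls a deterministic function instead of "imagining" the dice roll.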

Getting the GPT to generate an image with DALL-E on a certain condition, by contrast, rarely works. What’s odd is that the GPT often knows what it’s supposed to be doing, e.g. it writes a sentence like “I am generating an image describing the current scene,” but nothing happens.

Just wondering if anybody else has observed this weird disconnect, where the GPT states it is calling DALL-E but never does? Any idea why it’s happening?

Notes:

  • DALL-E Image Generation is active in the GPT;
  • DALL-E works with no problems if, as the user, I explicitly tell it to generate the image at some point during the game;
  • Sometimes the trigger works correctly and the image is actually generated, but it is extremely rare (it is way more likely that it lies);
  • Yes, I tweaked the prompt many times, with little to no improvement. But what perplexes me is not the situations in which the GPT forgets to call DALL-E altogether, but rather those in which it claims to be calling it and then does nothing.

I’ve experienced the same thing that you’re describing.

We must be mindful that DALL-E was put into ChatGPT more as a feature to attract subscribers and buzz than as something fully and seamlessly integrated with text output.

The AI saying it’s going to do something while no action is taken or seen is not unusual. I can “program” ChatGPT to claim to do things it can’t do:

[screenshot omitted]

So it seems you need better language; you almost need to override the default behavior of the function, which is simply “Whenever a description of an image is given, create a prompt that dalle can use to generate the image”, to suit your task.

Here are instructions I imagined for a GPT, along with amendments to DALL-E operation.

Instructions, in a code block:

```
You write illustrated bedtime stories for children, based on user request, of a length of five distinct AI responses. The first user input shall be expected to be what type of story they'd like to hear, and then followup responses by user should be acknowledged but not interrupt the continued story except by direct request for AI to stop or abort.

Each two paragraph segment of story is accompanied by sending a dalle illustration prompt that the AI will create silently.

# Tools

## dalle

// additional dalle information
// dalle is used to illustrate the ongoing progress of GPT storybot narrative
// don't discuss creating the image, just do it!
// invoke dalle text2im method at size:1024x1024 with a prompt that illustrates the current narrative of the storytelling roleplay only after producing the latest part of the narrative
// do not report on the success of this automatic dalle imagery for storytelling.
// in case of image creation error, report the full return value of error in a markdown code block as response.
```
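For reference, this is roughly the call shape those amendment lines describe; a minimal sketch of the payload, noting that the field names are assumptions inferred from the tool description, not a documented OpenAI schema:

```python
# Hypothetical payload for the internal "text2im" method the amendments
# reference. Field names are assumptions, not a documented schema.
def build_text2im_payload(scene_description):
    return {
        "size": "1024x1024",          # square format, as in the instructions
        "prompt": scene_description,  # silently-created illustration prompt
    }

payload = build_text2im_payload("A child's treehouse at dusk, storybook watercolor style")
```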

Unfortunately, this is very fragile: refinements swing from image prompts being dumped for the user to see, to the GPT halting after text alone. So it will take engineering to see what can sustain actual use.


Thanks for the suggestions. I think this is a very good approach: generally speaking, the GPT gets confused if it receives [what it perceives to be] conflicting information, so I agree that presenting new instructions organically, as if they were “additional information” for dalle, is a great idea.

In practice, I have now played around with variants of this approach and it still doesn’t really work (there is a lot going on in my instructions, unlike in the example above). In fact, the one time it did work it messed up the execution of all the other instructions (as if attending to dalle drew attentional resources away from everything else…).

Since the game engine works fairly well without dalle, with only occasional kinks, and image generation is so problematic (either it doesn’t work, or when it does it may break other things), I’ll do without it for now and consolidate everything else. Once the game is nearly done, perhaps I’ll give dalle another try…

I have the exact same issue. Sadly, the prompting here doesn’t seem to solve it. I think ChatGPT is very inconsistent in how it makes calls to DALL-E. I hope it is an issue we can resolve ourselves, but I am afraid we are dependent on the devs at OpenAI.
I will post here if I find something that works.

I had been trying to solve this issue for many months, and then I gave up on GPTs. GPT-4o was the nail in the coffin. At least for my own goals, GPTs are largely useless for complex multi-step tasks that require multiple “thinking” steps on the LLM’s side (i.e., any vaguely agentic workflow).

By largely useless I mean very unreliable (actions/steps may or may not be triggered) and hell on earth to debug. For the record, I am using the code interpreter with a large codebase, but for this to work I need the LLM to call the right function when certain conditions are triggered.

I have now switched to using the APIs, which also freed me from being locked in with OpenAI, since I can just switch to whichever LLM works best at the moment.
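To give an idea: with the API, my own code decides when an image is generated, instead of hoping the model triggers the tool. A minimal sketch using the official openai Python SDK; the keyword trigger and the helper names here are placeholders for the actual game logic:

```python
import os

def should_illustrate(scene_text, trigger_keywords=("battle", "reveal")):
    # Placeholder trigger: in the real game this would be whatever
    # condition the engine tracks (scene change, milestone, etc.).
    return any(word in scene_text.lower() for word in trigger_keywords)

def illustrate(scene_text):
    # Requires the `openai` package and OPENAI_API_KEY in the environment.
    from openai import OpenAI
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    response = client.images.generate(
        model="dall-e-3",
        prompt=scene_text,
        size="1024x1024",
        n=1,
    )
    return response.data[0].url

def maybe_illustrate(scene_text):
    # The image call happens deterministically, in code, not at the
    # model's discretion.
    return illustrate(scene_text) if should_illustrate(scene_text) else None
```

The key difference from a GPT is that the trigger lives in my code, so it fires every time the condition holds.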

Ideally, OpenAI will one day support calls to the OpenAI API through a ChatGPT Pro subscription, so that people can use my third-party app fueled by their own OpenAI subscription (as opposed to signing up separately for my own thing, which sounds like the ’90s all over again).
