API for image generation with the GPT-4o model

How do I generate an image with the new GPT-4o model? I am unable to set modalities to image; it gives an error. Setting modalities to text works.

Developers will soon be able to generate images with GPT‑4o via the API, with access rolling out in the next few weeks.

https://openai.com/index/introducing-4o-image-generation/

The real question is whether the API will support image editing or only image generation.

Currently you can’t even use DALL·E 3 for image editing in the API; only DALL·E 2 is supported.

Or rather, whether they give you real multimodal GPT-4o, or just an images endpoint loaded up with rules as its prompt.

Thanks, all. What’s the best way to keep up with all the announcements?

This usually comes a bit after Twitter:

This is what OpenAI is doing right now with GPT-4o image gen in ChatGPT. It’s a totally crippled and frustrating implementation. The images don’t truly stay in context, and it’s just passing a text string prompt to another endpoint with GPT-4o doing the generative work.

I was expecting truly in-context capabilities. Hopefully, the API allows this.

Quick update: it turns out I was wrong here. After experimenting more, the text2im schema ChatGPT uses (below) seems to be mostly for UI facilitation. The actual prompt string is basically irrelevant; GPT-4o image gen clearly draws directly from the chat conversation context. Reference IDs (referenced_image_ids) appear to have no real functional impact. One uncertainty is exactly how much of the chat stays in visual context, but it’s definitely more limited than the main GPT-4 chat context.

Here’s the schema for clarity:

```typescript
type text2im = (_: {
  prompt?: string,
  size?: string,
  n?: number,
  transparent_background?: boolean,
  referenced_image_ids?: string[],
}) => any;
```
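A minimal TypeScript sketch of building an argument object that matches the schema above. Only the field names come from the schema; the helper name buildText2ImArgs, the default values ("1024x1024", n = 1), the "WIDTHxHEIGHT" size format, and the validation rules are hypothetical assumptions, not anything ChatGPT or OpenAI documents.

```typescript
// Field names copied from the text2im schema observed in ChatGPT.
// The meanings in the comments are guesses based only on the names.
type Text2ImArgs = {
  prompt?: string;
  size?: string; // ASSUMED to be a "WIDTHxHEIGHT" string such as "1024x1024"
  n?: number; // ASSUMED: number of images to generate
  transparent_background?: boolean;
  referenced_image_ids?: string[]; // IDs of earlier images in the conversation
};

// Hypothetical helper: fills in ASSUMED defaults and rejects obviously bad values.
function buildText2ImArgs(args: Text2ImArgs): Required<Text2ImArgs> {
  const filled: Required<Text2ImArgs> = {
    prompt: args.prompt ?? "",
    size: args.size ?? "1024x1024",
    n: args.n ?? 1,
    transparent_background: args.transparent_background ?? false,
    referenced_image_ids: args.referenced_image_ids ?? [],
  };
  // Reject sizes that don't look like "WIDTHxHEIGHT".
  if (!/^\d+x\d+$/.test(filled.size)) {
    throw new Error(`size must look like "WIDTHxHEIGHT", got "${filled.size}"`);
  }
  // Reject zero, negative, or fractional image counts.
  if (!Number.isInteger(filled.n) || filled.n < 1) {
    throw new Error(`n must be a positive integer, got ${filled.n}`);
  }
  return filled;
}

const call = buildText2ImArgs({
  prompt: "Same cat, now wearing a red scarf",
  referenced_image_ids: ["img_123"], // hypothetical image ID
});
console.log(JSON.stringify(call));
```

Per the observation above, the prompt and referenced_image_ids may matter far less than the conversation context itself, so treat this purely as an illustration of the argument shape.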

What I would like to know is whether the price is going to be calculated by tokens, like text, and whether it will be cheaper than DALL·E 3 in the API. Given the render times, I think it will be drastically more expensive, making it even less affordable for a tool that would be very powerful in an automated API call sequence, unless they provide something like a turbo model for images.

Yep. I’d expect image generation pricing to be similar to audio tokens: far more expensive per token, given how much denser and more information-rich image data is than plain text. Brace yourself!

I think it will be able to, because it works from natural-language text, unlike DALL·E, where you need to inpaint…
But I’m not sure whether it will be able to retain memory of an image via a unique ID, or whether we will need to provide the input image every time we want to manipulate it.
Also, will the input need to be base64, or will it be able to process a public URL?
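For image input (not generation), the existing Chat Completions API already accepts both forms for GPT-4o: an image_url content part whose url is either a public URL or a base64 data URL. Whether the upcoming image-generation API will accept images the same way is unknown. This sketch only builds the two content-part shapes and makes no API call; the helper names, the example.com address, and the four sample bytes are placeholders chosen for illustration.

```typescript
// Content part in the shape Chat Completions uses for image input to GPT-4o.
type ImageUrlPart = { type: "image_url"; image_url: { url: string } };

// Hypothetical helper: wrap a URL (public or data URL) in an image_url part.
function imagePart(url: string): ImageUrlPart {
  return { type: "image_url", image_url: { url } };
}

// Hypothetical helper: encode raw image bytes as a base64 data URL (Node Buffer).
function toDataUrl(bytes: Uint8Array, mimeType: string): string {
  return `data:${mimeType};base64,${Buffer.from(bytes).toString("base64")}`;
}

// Option 1: a publicly reachable URL (placeholder address).
const fromUrl = imagePart("https://example.com/cat.png");

// Option 2: local bytes inlined as base64.
// These are just the first four bytes of a PNG signature, for illustration;
// in practice you would read the whole file, e.g. with fs.readFileSync.
const sampleBytes = new Uint8Array([0x89, 0x50, 0x4e, 0x47]);
const fromBase64 = imagePart(toDataUrl(sampleBytes, "image/png"));

console.log(JSON.stringify([fromUrl, fromBase64]));
```

The public URL keeps requests small but requires the image to be reachable by OpenAI's servers; the data URL works for local or private files at the cost of a larger request body.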

Does this have an API for image generation or not? Where are the API docs? Get your act together and stop wasting my time.

Any update on the timeframe for this release?
