Highlights:

  • DALL-E, a transformer language model that lets users generate and alter creative pictures using natural language prompts, joins GPT-3, Embeddings, and Codex on OpenAI’s API platform.

Developers may now incorporate DALL-E directly into their applications and products thanks to OpenAI’s public beta release of the DALL-E API.

Cala, a platform for fashion design, and Mixtiles, a company that prints internet photographs on lightweight decorative tiles, have already deployed and tested the API for their particular use cases.

Meanwhile, Microsoft is integrating DALL-E into its new graphic design app, Designer, and into Bing and Microsoft Edge through Image Creator, which lets users generate images when web search results do not turn up what they want. Shutterstock also announced last week that it would use the API to offer customers DALL-E-generated images.

OpenAI will continue to iterate on the DALL-E API

The API will be accessible to everyone on the OpenAI platform, according to Luke Miller, product manager at OpenAI.

With the API in beta, “we’ll continue to iterate and improve through the end of the year,” Luke Miller said. “We’re really excited for all the ways that developers can take this technology and customize it for specific needs, specific applications, and specific communities, to scale further than we ever could.”

DALL-E’s fast-paced journey to cultural touchstone

The DALL-E API is yet another significant step for the text-to-image generator, which, since the release of DALL-E 2 just six months ago, has become a part of the mainstream pop culture zeitgeist.

At the same time, there have been outcries and heated arguments about the possibility of legal battles over copyright ownership of DALL-E images, how DALL-E’s training data may reflect bias, and DALL-E’s accuracy and capacity.

However, OpenAI asserts that three million people already use DALL-E to stimulate creativity and speed up their workflows, creating over four million images daily. The company says developers can now start building with DALL-E within minutes.

From side projects to startups

Miller noted that this includes making it as simple as possible to get up and running: signing up, obtaining an API key, and starting development.

Rowan Curran, an AI and ML analyst at Forrester Research, feels the DALL-E API would be “very valuable” for developers if it permits picture modification and enhancement.

API pricing will be per image

The DALL-E API is priced per generated image, according to resolution: 1024 x 1024 costs USD 0.02 per image, while 512 x 512 and 256 x 256 cost USD 0.018 and USD 0.016 per image, respectively.
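To make the per-image pricing concrete, here is a minimal Python sketch that estimates the cost of a batch of images from the prices quoted above. The function name `estimate_cost` and the size strings used as keys are illustrative assumptions, not part of any OpenAI library, and actual billing may differ from this simple multiplication.

```python
from decimal import Decimal

# Per-image prices in USD, taken from the figures quoted in this article.
# The "WIDTHxHEIGHT" key format is an assumption made for this sketch.
PRICE_PER_IMAGE = {
    "1024x1024": Decimal("0.020"),
    "512x512": Decimal("0.018"),
    "256x256": Decimal("0.016"),
}


def estimate_cost(size: str, n_images: int) -> Decimal:
    """Return the estimated USD cost of generating n_images at the given size."""
    if size not in PRICE_PER_IMAGE:
        raise ValueError(f"unknown image size: {size!r}")
    if n_images < 0:
        raise ValueError("n_images must be non-negative")
    # Decimal avoids the rounding drift of binary floats on currency values.
    return PRICE_PER_IMAGE[size] * n_images


if __name__ == "__main__":
    for size in PRICE_PER_IMAGE:
        print(size, estimate_cost(size, 1000))
```

At these rates, a thousand images at the largest size comes to USD 20, and the same batch at the smallest size comes to USD 16.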

Miller said the API has three capabilities: users can generate an image from a prompt, edit a portion of an existing image, and generate multiple variations of an image.
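As a rough illustration of how the three capabilities map onto HTTP calls, the sketch below builds (but does not send) a generation request using only the Python standard library. The endpoint paths and the `prompt`, `n`, and `size` parameter names are not stated in this article; they are assumptions based on OpenAI's public API reference at the time and should be checked against the current documentation. The helper name `build_generation_request` is hypothetical.

```python
import json
import urllib.request

# Assumed endpoint paths for the three capabilities (not given in the article).
API_BASE = "https://api.openai.com/v1/images"
ENDPOINTS = {
    "generate": f"{API_BASE}/generations",
    "edit": f"{API_BASE}/edits",
    "variation": f"{API_BASE}/variations",
}


def build_generation_request(prompt: str, api_key: str, n: int = 1,
                             size: str = "1024x1024") -> urllib.request.Request:
    """Build a POST request asking for n images of the given size from a text prompt."""
    # Parameter names in the JSON body are assumptions; see the lead-in.
    body = json.dumps({"prompt": prompt, "n": n, "size": size}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINTS["generate"],
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_generation_request("a watercolor of a lighthouse", "YOUR_API_KEY")
    print(req.get_method(), req.full_url)
```

Sending the request would mean passing it to `urllib.request.urlopen` with a real API key. The edit and variation endpoints take uploaded image files rather than a plain JSON body, so they are listed here but not built.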

According to Curran, one of the historical limitations of large language models has been the cost of running them. If the pricing of the DALL-E API is reasonable, he added, it will “open up a wide host of use cases, particularly for businesses and individuals receiving initial financing.”

However, he emphasized that major organizations, particularly innovation teams, would likely also choose to utilize the DALL-E API.

Rowan Curran said, “In addition to that, I expect to see that drive more enterprise-level research and usage in terms of adopting and fine-tuning their large language models for various use cases. Because I think that ability to take the large language models, add this fine-tuning layer on top for some of these really specific industries is where it’s going to really start to be very game-changing.”

Questions about trust and safety

Critics continue to raise concerns over the trustworthiness and safety of generative AI in general, and DALL-E in particular, saying that fake images could be used to bully and harass, for example, or to spread disinformation and spur violence. In May, researchers said that the tool could reinforce negative stereotypes about women and people of color.

The news that images produced with the API will not require a watermark, which was introduced during the DALL-E 2 beta but is optional with the API, may not please those with ethical and legal concerns about DALL-E.

However, in a press statement, OpenAI asserted that the DALL-E API is “incorporating the trust and safety lessons we’ve learned while deploying DALL-E to 3 million artists and users worldwide.”

With the API, “developers can ship with confidence knowing that built-in mitigations – like filters for hate symbols and gore – will handle the challenging aspects of moderation,” the press release continued. “As a part of OpenAI’s commitment to responsible deployment, we will continue to make trust and safety a top priority so that developers can focus on building.”

Mixtiles uses DALL-E API to make memories

Eytan Levit, the co-founder of Tel Aviv-based Mixtiles, stated that the firm quickly saw DALL-E 2’s potential and signed up for early access.

Levit said that first-time DALL-E users face a learning curve. “For example, you need to know which styles you can use, such as an oil painting, digital art, pencil sketch, or watercolor,” he said. “We’ve learned that referencing the time of day materially affects your results, while color palettes also help with getting great pictures.”
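The prompt ingredients Levit mentions (a style, a time of day, a color palette) can be combined programmatically. The following is a minimal sketch of one way to do that. The `build_prompt` helper and its phrasing are hypothetical illustrations, not Mixtiles’ actual implementation, and which wording works best with DALL-E would need to be found by experiment.

```python
from typing import Optional


def build_prompt(subject: str,
                 style: Optional[str] = None,
                 time_of_day: Optional[str] = None,
                 palette: Optional[str] = None) -> str:
    """Compose a text-to-image prompt from a subject and optional modifiers."""
    subject = subject.strip()
    if not subject:
        raise ValueError("subject must not be empty")
    parts = [subject]
    if style:
        parts.append(style.strip())                       # e.g. "oil painting"
    if time_of_day:
        parts.append(f"at {time_of_day.strip()}")         # e.g. "at sunset"
    if palette:
        parts.append(f"{palette.strip()} color palette")  # e.g. "warm pastel color palette"
    # Modifiers are joined with commas; omitted modifiers are skipped.
    return ", ".join(parts)


if __name__ == "__main__":
    print(build_prompt("a family picnic in a park",
                       style="oil painting",
                       time_of_day="sunset",
                       palette="warm pastel"))
```

A guided flow could collect each modifier in its own step and call a helper like this once at the end, so users never have to write the full prompt themselves.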

Using the API, Mixtiles’ approach has been to guide the user through several steps, each bringing them closer to creating emotionally resonant artwork.

Ultimately, he said, Mixtiles is betting that generative AI and DALL-E constitute a technical breakthrough “equivalent to the invention of paper, the picture frame, canvas print or the invention of computer graphics — we think it’s going to fuel an explosion of new use cases, of human creativity and emotional connection.”

For Mixtiles, this entails allowing clients to upload family photos and portraits and then personalize these images.