ChatGPT’s highly anticipated vision capabilities might be coming soon, according to some eagle-eyed sleuths.
Android Authority spotted lines of code in the Advanced Voice Mode section of the latest ChatGPT beta build (v1.2024.317) that point to a feature called “Live camera.” One string appears to warn users not to use Live camera “for live navigation or decisions that may impact your health or safety.”
Another line appears to be onboarding text for the vision feature: “Tap the camera icon to let ChatGPT view and chat about your surroundings.”
ChatGPT’s evolving capabilities: Vision, voice, and beyond
ChatGPT’s ability to process visual information was a major feature debuted at OpenAI’s event last May, where the company launched GPT-4o. Demos from the event showed how GPT-4o could use a mobile or desktop camera to identify subjects and remember details about what it saw. One demo in particular featured GPT-4o identifying a dog playing with a tennis ball and remembering that its name is “Bowser.”
Since the event and early access for a few lucky alpha testers, not much has been said about GPT-4o’s vision capabilities. In the meantime, OpenAI shipped Advanced Voice Mode to ChatGPT Plus and Team users in September.
If ChatGPT’s vision mode is as imminent as the code suggests, users will soon be able to try out both components of the GPT-4o features teased last spring.
OpenAI has been busy lately, despite reports of diminishing returns from future models. Last month, it launched ChatGPT Search, which connects the AI model to the web for real-time information. The company is also rumored to be working on an agent capable of performing multi-step tasks on the user’s behalf, such as writing code and browsing the web, possibly slated for a January release.