On March 14 (local time), Figure, a high-profile startup, released the first demo of its humanoid robot, Figure 01, powered by OpenAI’s advanced model. This marks one of the first results of Figure’s collaboration with OpenAI to enhance humanoid robotic capabilities.
Although it uses only a single neural network, Figure 01 can interact with humans, understand and carry out commands, and perform tasks with remarkable fluidity, as a series of official videos shows.
Last month, Figure secured a massive $675 million investment from OpenAI, Microsoft, NVIDIA, and others to develop humanoid robots designed to supplement human labor in repetitive and hazardous warehouse and retail jobs. This funding has pushed the company’s valuation to $2.6 billion.
Additionally, Figure has entered into a collaboration agreement with OpenAI to extend the capabilities of its multimodal large models (vision-language models, or VLMs) into robotic perception, reasoning, and interaction, paving the way for so-called ‘embodied intelligence.’
The official unveiling of Figure 01 comes just 13 days after the company closed its Series B funding round.
Figure 01 Video Demonstration
According to videos released by Figure, Figure 01 can smoothly perform tasks such as handing over an apple, clearing trash into a bin, and placing dishes on a drying rack.
More importantly, many of Figure 01’s actions and responses are prompted by open-ended questions and requests from users. It reasons about each request to work out a response, which suggests it can converse, reason, and learn, bringing it closer to human-like intelligence than traditional robots.
At the beginning of the video, Figure emphasizes that the robot’s actions are driven entirely by reasoning over spoken input, using an end-to-end neural network. The footage was captured in a single take, with no speed-up or edits.
Figure’s founder, Brett Adcock, reinforced this in a social media post, stating that all of Figure 01’s behaviors were learned rather than remotely controlled. He also noted that the robot’s speed has improved significantly and is now approaching human speed.
The ‘Brain’ Behind Figure 01
Figure describes Figure 01 as the world’s first commercially viable general-purpose humanoid robot. It stands at 5 feet 6 inches (about 1.68 meters), weighs 132 pounds (60 kg), has a payload capacity of 44 pounds (20 kg), operates for up to 5 hours on a single charge, and moves at a speed of 3.9 feet per second (1.2 meters per second).
In Figure 01, OpenAI’s model powers advanced vision and language intelligence, while Figure’s neural network enables rapid, low-level, and dexterous robotic movements.
Earlier this month, Figure announced plans to develop the next generation of humanoid AI models based on OpenAI’s latest GPT technology. The company is training its robots using proprietary motion data, enabling them to engage in conversations, perceive their surroundings, and execute complex tasks.
Following the release of the demonstration video, Figure’s senior AI engineer, Corey Lynch, detailed the robot’s technical principles on social media. According to Lynch, Figure 01 can describe its visual experiences, plan future actions, reflect on past memories, and verbally explain its reasoning process.
Specifically, Figure feeds images captured by the robot’s cameras, together with a text transcription of the spoken audio, into OpenAI’s multimodal model, which understands both images and text. The model then generates a language response from these inputs, and the robot speaks that response aloud through a large ‘text-to-speech’ model.
During execution, the same model decides which learned behavior should fulfill a given command. The robot then loads the corresponding neural network weights onto its GPU and executes that policy.
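The flow described above can be sketched in a few lines of Python. This is only an illustrative sketch, not Figure's or OpenAI's actual code: every name in it (`Turn`, `ModelOutput`, `stub_multimodal_model`, `make_policy`, `POLICIES`, `Robot`) is hypothetical, the "model" is a keyword-matching stub rather than a real multimodal model, and recording the chosen behavior's name stands in for loading its weights onto the GPU.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Turn:
    """One conversation step: a camera frame plus the transcribed speech."""
    image: bytes
    transcript: str


@dataclass
class ModelOutput:
    """What the stubbed multimodal model returns for the latest turn."""
    reply_text: str  # language response, to be spoken via text-to-speech
    behavior: str    # name of the learned behavior chosen for the command


def stub_multimodal_model(history: List[Turn]) -> ModelOutput:
    """Hypothetical stand-in for the multimodal model.

    A real model would reason over the images and text; this stub only
    keyword-matches the latest transcript so the example stays runnable.
    """
    latest = history[-1].transcript.lower()
    if "eat" in latest:
        return ModelOutput("Sure thing.", "hand_over_apple")
    if "trash" in latest:
        return ModelOutput("On it.", "clear_trash")
    return ModelOutput("I am not sure how to help with that.", "idle")


def make_policy(name: str) -> Callable[[], str]:
    """Hypothetical policy factory; a real policy would drive the motors."""
    def run() -> str:
        return f"executed {name}"
    return run


# Hypothetical table of learned behaviors, keyed by name.
POLICIES: Dict[str, Callable[[], str]] = {
    name: make_policy(name)
    for name in ("hand_over_apple", "clear_trash", "idle")
}


class Robot:
    """Ties the pieces together: perceive, decide, select a policy, act."""

    def __init__(self, model, policies):
        self.model = model
        self.policies = policies
        self.history: List[Turn] = []
        self.loaded_policy: Optional[str] = None

    def step(self, image: bytes, transcript: str) -> Tuple[str, str]:
        # 1. Add the new camera frame and transcript to the conversation.
        self.history.append(Turn(image, transcript))
        # 2. The model produces a reply and picks a behavior.
        out = self.model(self.history)
        # 3. Stands in for loading the chosen policy's weights onto the GPU.
        self.loaded_policy = out.behavior
        # 4. Run the selected policy and return the reply to be spoken.
        action_log = self.policies[out.behavior]()
        return out.reply_text, action_log
```

Calling `Robot(stub_multimodal_model, POLICIES).step(b"", "Can I have something to eat?")` returns the reply text along with a log line for the selected behavior. The point of the sketch is the separation of roles: one model handles both language and the choice of behavior, while each behavior is a separately learned policy.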
Brett Adcock also highlighted that Figure’s own engineers designed all of Figure 01’s key components in-house, including motors, middleware operating systems, sensors, and mechanical structures.
The Future Is Here: The Rise of Embodied Intelligence
NVIDIA’s founder and CEO, Jensen Huang, has predicted that ‘embodied intelligence’ will drive the next wave of AI innovation.
Figure was founded in 2022 and had already made significant AI advancements before partnering with OpenAI. At the time, Brett Adcock revealed that the company planned to focus on developing humanoid robots with AI systems, low-level controls, and other core functionalities over the next one to two years.
In January 2024, Figure 01 successfully implemented an end-to-end neural network, allowing it to self-correct errors. After just 10 hours of training, it learned how to make coffee. By February, Figure 01 was already operating in warehouses, handling transportation tasks with autonomous navigation, object recognition, and task prioritization. However, its speed was only 16.7% of a human’s.
Figure is also actively working on real-world applications. Recently, the company signed a commercial agreement with BMW to deploy general-purpose robots in automotive manufacturing. Figure 01 has already begun testing at a factory in South Carolina, USA.
While many AI researchers believe general-purpose humanoid robots will take decades to become mainstream, robotics expert Eric Jang cautioned: ‘Don’t forget, ChatGPT’s rise happened almost overnight.’
Powered by OpenAI’s latest models, Figure 01 may come at a premium price, though Figure has yet to disclose any pricing details. Brett Adcock has nonetheless expressed optimism about making Figure 01 more affordable in the future.