Content from: @DrJimFan
We are excited to announce Eureka, an open-ended agent that designs reward functions for robot dexterity at a superhuman level. It's like the Voyager of the physics-simulator API space!
Eureka bridges the gap between high-level reasoning (coding) and low-level motor control. It is a "hybrid-gradient architecture": a black-box, inference-only LLM instructs a white-box, learnable neural network. The outer loop runs GPT-4 to refine the reward function (gradient-free), while the inner loop runs reinforcement learning to train a robot controller (gradient-based).
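The outer/inner loop can be sketched in a few lines. This is a hypothetical illustration, not the real Eureka API: `ask_llm_for_reward` and `train_policy` are stand-in stubs for the GPT-4 call and the RL training run.

```python
# Hypothetical sketch of a hybrid-gradient loop: a gradient-free outer loop
# (an LLM proposes reward-function code) wrapping a gradient-based inner
# loop (RL trains a controller against that reward).

def ask_llm_for_reward(env_source: str, feedback: str) -> str:
    """Stub for a GPT-4 call that returns reward-function source code."""
    return "def reward(state): return -abs(state)"

def train_policy(reward_src: str) -> float:
    """Stub for a full RL training run; returns a task fitness score."""
    namespace = {}
    exec(reward_src, namespace)      # compile the proposed reward code
    reward = namespace["reward"]
    return reward(0.1)               # pretend rollout: evaluate one state

def eureka_loop(env_source: str, iterations: int = 3):
    best_src, best_score, feedback = None, float("-inf"), ""
    for _ in range(iterations):
        candidate = ask_llm_for_reward(env_source, feedback)  # gradient-free
        score = train_policy(candidate)                       # gradient-based
        if score > best_score:
            best_src, best_score = candidate, score
        feedback = f"last score: {score:.3f}"  # fed back into the next prompt
    return best_src, best_score
```

The key point is that the LLM never sees gradients: it only reads text feedback from the inner RL loop and emits new reward code.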
Eureka is made possible by IsaacGym, a GPU-accelerated physics simulator that speeds up simulation 1000x faster than real time. On a benchmark suite of 29 tasks across 10 robots, Eureka's rewards outperform expert human-written ones on 83% of tasks, with an average improvement of 52%. We were surprised that Eureka was able to learn pen-spinning tricks; even for CGI artists, animating those frame by frame is very difficult!
Eureka also enables a new form of in-context RLHF, which incorporates a human operator's natural-language feedback to steer and align the reward functions. It can serve as a powerful copilot for robot engineers designing complex motor behaviors.
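One way to picture in-context RLHF is that the human's plain-English note is simply appended to the next reward-design prompt. A minimal sketch, assuming a hypothetical `build_prompt` helper (not the real Eureka code):

```python
# Hypothetical sketch: human operator feedback in natural language is
# folded into the LLM prompt alongside automated training statistics,
# steering the next reward proposal.

def build_prompt(env_code: str, auto_stats: str, human_feedback: str = "") -> str:
    parts = [
        "Write a reward function for this environment:",
        env_code,
        f"Training statistics from the last run:\n{auto_stats}",
    ]
    if human_feedback:  # e.g. "the robot should keep its torso more upright"
        parts.append(f"Human operator feedback:\n{human_feedback}")
    return "\n\n".join(parts)
```

Because the feedback is just more context, no retraining of the LLM is needed; the operator can adjust the behavior between evolution steps.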
As always, we open-source everything!
In robot learning, LLMs are good at generating high-level plans and mid-level actions such as pick-and-place (VIMA, RT-1, etc.), but fall short of complex, high-frequency motor control.
Our Eureka! moment (pun intended) is that reward design via coding is the key gateway through which LLMs can venture into dexterous skills.
Eureka achieves human-level reward design by evolving reward functions in context. There are 3 key components:
- The simulator environment code serves as context to jump-start the initial "seed" reward function.
- Massively parallel reinforcement learning on GPUs enables rapid evaluation of many candidate rewards.
- Reward reflection produces targeted reward mutations in context.
First, taking the raw IsaacGym environment code as context, Eureka can already generate plausible reward programs zero-shot, without any task-specific prompt engineering.
This makes Eureka a general, open-ended reward designer that requires minimal hand-tuning.
Second, Eureka samples many candidate rewards at each evolution step and then evaluates them with a full RL training loop. Normally this is extremely slow and can take days or even weeks.
Thanks to NVIDIA's GPU-native robot training platform IsaacGym (https://developer.nvidia.com/isaac-gym), we were able to scale up: the platform speeds up simulation 1000x faster than real time. The inner RL loop can now finish in minutes!
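The pattern of scoring many reward candidates at once can be sketched abstractly. IsaacGym parallelizes thousands of simulated environments on a single GPU; the thread pool below is only a stand-in for that concurrency pattern, and `evaluate` is a toy stub, not a real training run.

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate(candidate_id: int) -> float:
    """Stand-in for a full RL training run scoring one reward candidate."""
    return 1.0 / (1 + candidate_id)  # toy fitness: candidate 0 scores best

def evaluate_batch(num_candidates: int) -> list:
    # Score every candidate reward concurrently; in Eureka the analogous
    # step is massively parallel GPU simulation, not host threads.
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(evaluate, range(num_candidates)))
```

The winner of each batch seeds the next evolution step, so faster evaluation directly translates into more evolution steps per hour.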
Finally, Eureka relies on reward reflection, an automated text summary of the RL training. This enables Eureka to perform targeted reward mutations, thanks to GPT-4's excellent in-context code-fixing skills.
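A minimal sketch of what such a reflection string might look like, assuming per-component reward statistics are logged during training (the component names and format here are hypothetical, not Eureka's actual output):

```python
# Hypothetical "reward reflection": summarize per-component reward
# statistics from an RL run into text the LLM can read, so it can see
# which reward terms helped or stagnated and mutate them accordingly.

def reward_reflection(stats: dict) -> str:
    lines = []
    for name, values in stats.items():  # values: metric over training steps
        lines.append(
            f"{name}: start={values[0]:.2f}, end={values[-1]:.2f}, "
            f"max={max(values):.2f}"
        )
    return "Reward component statistics over training:\n" + "\n".join(lines)
```

Feeding this summary back as context is what makes the mutations targeted: the LLM can rewrite exactly the reward term whose curve looks wrong, rather than guessing blindly.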
If you want to learn more, click the links below the video.
Thank you for watching. If you enjoyed it, please like and subscribe. Thanks!
Project page: http://eureka-research.github.io
Paper: http://arxiv.org/abs/2310.12931
Code: http://github.com/eureka-research/Eureka
YouTube: