Eureka : A human-level reward design algorithm implemented by encoding large language models.

Eureka

AI Model Training and Inference AI Development Assistant #Reward Design #Reinforcement Learning #Language Models Standard Picks Open Source

Overview :

Eureka is a human-level reward design algorithm implemented by encoding large language models. It leverages the zero-shot generation, code writing, and context improvement capabilities of state-of-the-art language models (such as GPT-4) to evolve and optimize reward code. The generated rewards can be used to acquire complex skills through reinforcement learning. Eureka-generated reward functions outperform human-expert-designed reward functions in 29 open-source reinforcement learning environments, including 10 diverse robot morphologies. Eureka is also capable of flexibly improving reward functions to enhance the quality and safety of generated rewards. By combining with curriculum learning, using Eureka reward functions, we first demonstrate a simulated Shadow Hand performing a pen-twirling trick, skillfully manipulating the pen at a fast speed within a circle.

Target Users :

Suitable for tasks requiring reward design and reinforcement learning.

Total Visits： 3.0K

Top Region： US(93.71%)

Website Views ： 66.0K

Features

Utilizes large language models for reward design

Generates complex reward functions through evolutionary optimization

Uses generated reward functions for reinforcement learning