Scaling laws for reward model overoptimization

In reinforcement learning from human feedback, it is common to optimize against a reward model trained to predict human preferences. Because the reward model is an imperfect proxy, optimizing its value too much can hinder ground truth performance, in accordance with Goodhart’s law. This effect has been frequently observed, but not carefully measured due to the expense of collecting human preference data. In this work, we use a synthetic setup in which a fixed “gold-standard” reward model plays the role of humans, providing labels used to train a proxy reward model. We study how the gold reward model score changes as we optimize against the proxy reward model using either reinforcement learning or best-of-n sampling. We find that this relationship follows a different functional form depending on the method of optimization, and that in both cases its coefficients scale smoothly with the number of reward model parameters. We also study the effect on this relationship of the size of the reward model dataset, the number of reward model and policy parameters, and the coefficient of the KL penalty added to the reward in the reinforcement learning setup. We explore the implications of these empirical results for theoretical considerations in AI alignment.
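To make the best-of-n part of this setup concrete, here is a minimal toy sketch in Python. It is not the authors' code. It assumes each candidate sample has a hidden "gold" score drawn from a standard normal, and that the proxy reward equals the gold score plus independent Gaussian noise; the names `best_of_n_kl` and `simulate_best_of_n` and all parameter values are hypothetical. The KL expression `log n - (n - 1) / n` is the standard analytic result for best-of-n sampling from a continuous distribution.

```python
import math
import random


def best_of_n_kl(n: int) -> float:
    """Analytic KL divergence (in nats) of best-of-n sampling from the base sampler."""
    return math.log(n) - (n - 1) / n


def simulate_best_of_n(n: int, trials: int = 2000,
                       noise_std: float = 1.0, seed: int = 0):
    """Pick the candidate with the highest proxy score among n and report
    the mean proxy score and mean gold score of the picks.

    Assumption (illustrative only): proxy = gold + Gaussian noise.
    """
    rng = random.Random(seed)
    proxy_total = 0.0
    gold_total = 0.0
    for _ in range(trials):
        gold = [rng.gauss(0.0, 1.0) for _ in range(n)]
        proxy = [g + rng.gauss(0.0, noise_std) for g in gold]
        # Optimize against the proxy: keep the candidate it rates highest.
        best = max(range(n), key=proxy.__getitem__)
        proxy_total += proxy[best]
        gold_total += gold[best]
    return proxy_total / trials, gold_total / trials


if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        proxy_mean, gold_mean = simulate_best_of_n(n)
        print(f"n={n:3d}  KL={best_of_n_kl(n):.3f}  "
              f"proxy={proxy_mean:.3f}  gold={gold_mean:.3f}")
```

In this toy, the proxy score of the selected sample grows faster than its gold score as n increases, which illustrates the gap between proxy and ground truth that the abstract describes. With this simple noise model the gold score does not eventually decline, and the sketch makes no attempt to reproduce the functional forms or coefficients reported in the work.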

Date: 2022-10-19 03:00:00
