Robust Alignment
Level: Advanced

Definition
Maintaining alignment under new conditions.

Keywords
distribution shift

Domains
AI Safety & Alignment

Related Terms
Corrigibility: Willingness of system to accept correction or shutdown.
Scalable Oversight: Using limited human feedback to guide large models.
Existential Risk: Risk threatening humanity’s survival.
x-Risk: Existential risk from AI systems.
Outer Alignment: Correctly specifying goals.
Inner Alignment: Ensuring learned behavior matches intended objective.
Deceptive Alignment: Model behaves well during training but not deployment.
Mesa-Optimizer: Learned subsystem that optimizes its own objective.