Search Results
4 results for “ai safety”
AI Red Teaming
A structured adversarial evaluation practice in which testers attempt to elicit harmful, unsafe, or policy-violating behaviour from AI systems in order to surface risks before deployment.
AI Safety
AI safety is a field of research and practice concerned with the development of artificial intelligence systems that behave reliably, avoid harmful outputs, and remain aligned with human values, especially as systems become more capable.
Anthropic
Anthropic is an American AI safety company and large language model developer founded in 2021 by former OpenAI researchers, best known for developing the Claude family of AI assistants and the Constitutional AI alignment technique.
Reinforcement Learning from Human Feedback
A machine learning technique that trains a reward model from human preference data and uses it to align large language models with human values, safety requirements, and intended behaviour through reinforcement learning.