Search Results
7 results for “alignment”
AI Alignment
AI alignment is the field of research dedicated to ensuring that artificial intelligence systems pursue goals and exhibit values and behaviours that are consistent with human intentions.
AI Red Teaming
A structured adversarial evaluation practice in which testers attempt to elicit harmful, unsafe, or policy-violating behaviour from AI systems in order to surface risks before deployment.
AI Safety
AI safety is a field of research and practice concerned with the development of artificial intelligence systems that behave reliably, avoid harmful outputs, and remain aligned with human values, especially as systems become more capable.
Anthropic
Anthropic is an American AI safety company and large language model developer founded in 2021 by former OpenAI researchers, best known for developing the Claude family of AI assistants and the Constitutional AI alignment technique.
Constitutional AI
Constitutional AI is an alignment method developed by Anthropic that trains language models to follow a set of written ethical principles by using the model itself to critique and revise its own outputs, reducing dependence on human feedback for harmlessness.
DALL-E
DALL-E is a series of text-to-image generative AI models developed by OpenAI that create photorealistic and artistic images from natural language prompts, with later versions using diffusion and language-vision alignment techniques.
Reinforcement Learning from Human Feedback
A machine learning technique that trains a reward model from human preference data and uses it to align large language models with human values, safety requirements, and intended behaviour through reinforcement learning.