Job Description
Shape the Future of Safe Artificial Intelligence.
At Nebula Horizon Labs, we are building the foundational architecture for the next generation of sentient systems. As a Senior AI Safety & Alignment Engineer, you will be at the forefront of ensuring that advanced AI models remain beneficial, transparent, and aligned with human values. If you are passionate about the ethical implications of deep learning and possess the technical prowess to build robust guardrails, we want to hear from you.
Join a team of world-class researchers and engineers dedicated to solving the most complex challenges in AI alignment before 2026.
Responsibilities
- Develop and implement rigorous Red-Teaming protocols to identify and mitigate vulnerabilities in large language models and neural networks.
- Design and deploy Reinforcement Learning from Human Feedback (RLHF) pipelines to fine-tune model behaviors in complex, ambiguous scenarios.
- Create scalable monitoring tools to detect and prevent Adversarial Attacks and model manipulation in real time.
- Collaborate with cross-functional teams of AI Researchers, Policy Experts, and UX Designers to define safety guidelines for emerging AGI technologies.
- Conduct extensive research on Interpretability techniques to understand the decision-making processes of black-box models.
- Document safety assessments and contribute to open-source libraries focused on AI governance.
Qualifications
- Ph.D. or Master's degree in Computer Science, Cognitive Science, or a related field with a focus on AI/ML.
- 5+ years of professional experience in Machine Learning, Natural Language Processing (NLP), or Reinforcement Learning.
- Deep understanding of Transformer Architectures and modern Large Language Models (LLMs).
- Proven track record of conducting adversarial analysis or safety audits on production models.
- Strong proficiency in Python and in PyTorch or TensorFlow.
- Excellent written and verbal communication skills for translating complex technical concepts for diverse stakeholders.
- Experience with formal verification methods or game-theoretic approaches to AI safety is a major plus.