Information Technology · Full Time · Verified

Senior AI Safety & Alignment Engineer

Nebula Horizon Labs
San Francisco
Estimated Salary
USD 180,000 – USD 260,000
Live Update
12 May 2026
Deadline
12 May 2027

Job Description

Shape the Future of Safe Artificial Intelligence.

At Nebula Horizon Labs, we are building the foundational architecture for the next generation of sentient systems. As a Senior AI Safety & Alignment Engineer, you will be at the forefront of ensuring that advanced AI models remain beneficial, transparent, and aligned with human values. If you are passionate about the ethical implications of deep learning and possess the technical prowess to build robust guardrails, we want to hear from you.

Join a team of world-class researchers and engineers dedicated to solving the most complex challenges in AI alignment before 2026.

Responsibilities

  • Develop and implement rigorous red-teaming protocols to identify and mitigate vulnerabilities in large language models and other neural networks.
  • Design and deploy Reinforcement Learning from Human Feedback (RLHF) pipelines to fine-tune model behaviors in complex, ambiguous scenarios.
  • Create scalable monitoring tools to detect and prevent adversarial attacks and model manipulation in real time.
  • Collaborate with cross-functional teams of AI Researchers, Policy Experts, and UX Designers to define safety guidelines for emerging AGI technologies.
  • Conduct extensive research on Interpretability techniques to understand the decision-making processes of black-box models.
  • Document safety assessments and contribute to open-source libraries focused on AI governance.

Qualifications

  • Ph.D. or Master’s degree in Computer Science, Cognitive Science, or a related field with a focus on AI/ML.
  • 5+ years of professional experience in Machine Learning, Natural Language Processing (NLP), or Reinforcement Learning.
  • Deep understanding of Transformer Architectures and modern Large Language Models (LLMs).
  • Proven track record of conducting adversarial analysis or safety audits on production models.
  • Strong proficiency in Python and in PyTorch or TensorFlow.
  • Excellent written and verbal communication skills for translating complex technical concepts for diverse stakeholders.
  • Experience with formal verification methods or game-theoretic approaches to AI safety is a major plus.

Required Skills

Artificial Intelligence, Machine Learning, Python, PyTorch, TensorFlow, NLP, RLHF, Red-Teaming, Adversarial Attacks, AI Ethics, Deep Learning

Ready to Take This Challenge?

Make sure your resume is ready. Submit your application now before the deadline.
