Job Description
We are building the foundational architecture for the next generation of intelligence. Nexus Horizon Labs is seeking a visionary Senior AI Infrastructure Architect to lead the deployment of scalable, high-performance AI systems built for the technological landscape of 2026 and beyond.
In this role, you will bridge the gap between theoretical AI research and production-grade engineering. You will be responsible for designing robust cloud-native ecosystems, optimizing Large Language Models (LLMs), and ensuring our infrastructure can handle petabyte-scale data processing and low-latency inference.
If you are passionate about the future of Artificial General Intelligence and want to shape our roadmap for 2026 and beyond, we want to meet you.
Responsibilities
- Architect Enterprise-Grade AI Pipelines: Design and implement scalable data pipelines and model serving architectures capable of handling high-throughput inference loads.
- Optimize LLM Performance: Fine-tune and optimize Large Language Models for specific verticals, focusing on cost-efficiency and speed.
- Cloud & Infrastructure Strategy: Lead the migration and management of AI workloads on AWS or Azure, leveraging serverless services and containerized workloads orchestrated with Kubernetes.
- System Reliability: Implement advanced monitoring and observability tools to ensure 99.99% uptime (roughly 53 minutes of downtime per year) for critical AI services.
- Mentorship: Guide a team of junior data scientists and ML engineers, fostering a culture of technical excellence and innovation.
Qualifications
- Education: Bachelor’s or Master’s degree in Computer Science, Machine Learning, or a related technical field.
- Experience: 5+ years in software engineering, including at least 3 years in AI/ML infrastructure or MLOps.
- Technical Stack: Deep proficiency in Python, PyTorch/TensorFlow, and SQL. Experience with Docker, Kubernetes, and Terraform is required.
- Cloud Mastery: Expert-level experience with cloud platforms (AWS, GCP, or Azure) and their native AI services.
- Problem Solving: Proven track record of solving complex distributed systems problems and optimizing deep learning model inference times.