AI Infrastructure Engineer (DevOps/MLOps)
Techvantage.ai is a next-generation technology and product engineering company at the forefront of innovation in Generative AI, Agentic AI, and autonomous intelligent systems. We build intelligent, cutting-edge solutions designed to scale and evolve with the future of artificial intelligence.
Role Overview:
We are looking for a skilled and versatile AI Infrastructure Engineer (DevOps/MLOps) to build and manage the cloud infrastructure, deployment pipelines, and machine learning operations behind our AI-powered products. You will work at the intersection of software engineering, ML, and cloud architecture to ensure that our models and systems are scalable, reliable, and production-ready.
Key Responsibilities:
- Design and manage CI/CD pipelines for both software applications and machine learning workflows.
- Deploy and monitor ML models in production using tools like MLflow, SageMaker, Vertex AI, or similar.
- Automate the provisioning and configuration of infrastructure using IaC tools (Terraform, Pulumi, etc.).
- Build robust monitoring, logging, and alerting systems for AI applications.
- Manage containerized services with Docker and orchestration platforms like Kubernetes.
- Collaborate with data scientists and ML engineers to streamline model experimentation, versioning, and deployment.
- Optimize compute resources and storage costs across cloud environments (AWS, GCP, or Azure).
- Ensure system reliability, scalability, and security across all environments.
Requirements:
- 5+ years of experience in DevOps, MLOps, or infrastructure engineering roles.
- Hands-on experience with cloud platforms (AWS, GCP, or Azure) and services related to ML workloads.
- Strong knowledge of CI/CD tools (e.g., GitHub Actions, Jenkins, GitLab CI).
- Proficiency in Docker, Kubernetes, and infrastructure-as-code frameworks.
- Experience with ML pipelines, model versioning, and ML monitoring tools.
- Scripting skills in Python, Bash, or similar for automation tasks.
- Familiarity with monitoring/logging tools (Prometheus, Grafana, ELK, CloudWatch, etc.).
- Understanding of ML lifecycle management and reproducibility.
Preferred Qualifications:
- Experience with Kubeflow, MLflow, DVC, or Triton Inference Server.
- Exposure to data versioning, feature stores, and model registries.
- Certification in AWS/GCP DevOps or Machine Learning Engineering is a plus.
- Background in software engineering, data engineering, or ML research is a bonus.
What We Offer:
- Work on cutting-edge AI platforms and infrastructure.
- Cross-functional collaboration with top ML, research, and product teams.
- Competitive compensation package, with no constraints for the right candidate.