Platforms Engineer (MLOps) for AI Singapore

Date: 19 Apr 2024

Location: UNIV ADMIN, Kent Ridge Campus, SG

Company: National University of Singapore

Job Description

AI Singapore (AISG) is a national AI programme launched by the National Research Foundation (NRF) to anchor deep national capabilities in Artificial Intelligence (AI).

 

The programme office is hosted by the National University of Singapore (NUS) and brings together all Singapore-based research institutions and the vibrant ecosystem of AI start-ups and companies developing AI products to perform use-inspired research, grow the knowledge, create the tools, and develop the talent to power Singapore's AI efforts.

 

Since our inception in 2017, we have established a culture of respect, continuous learning, experimentation and curiosity, with all our work centered around innovation. The candidate will join a talented team of AI engineers, data scientists, consultants, data engineers and platforms engineers, who are all inspired by the opportunity to work on emerging technologies and lead Singapore into an AI-powered future.

 

Candidates, especially seniors, will be expected to provide technical leadership, engage stakeholders independently, mentor junior engineers and apprentices, and contribute ideas to improve the system.

 

As a Platforms Engineer (MLOps) under the Platforms Engineering group, you will help build and operate modern infrastructure and systems to run large-scale machine learning and deep learning workloads. You will also design, develop, and maintain the AISG platform and tooling stack to enable AIAP, 100E project teams, and Partners to build better and faster products.

 

Duties & Responsibilities

  • Evaluate, architect, deploy, and maintain the platform and tooling stack that enables AISG engineers to carry out their roles and responsibilities.
  • Mentor AISG apprentices and, where needed, help them develop end-to-end MLOps workflows that support the AI lifecycle and ensure solutions are delivered efficiently and sustainably.
  • Act as an intermediary between the Platforms Engineering and AI project teams.
  • Assist the InfraOps, DataOps and Experiences teams in building and maintaining production infrastructure to be resilient, secure, and high-performing.
  • Implement infrastructure as code (IaC) processes to automate our systems' configuration, provisioning, deployment, and monitoring.
  • Collaborate with AISG’s partners to design, implement, and deploy new systems and improvements to existing systems.
  • Document and troubleshoot issues arising from our systems when they occur.
  • Develop tools and software that improve and automate infrastructure provisioning.
  • Propose and drive technical decisions to completion for the aforementioned responsibilities, including documentation.

Qualifications

  • Excellent communication skills, including thoughtful listening skills and the ability to express complex ideas clearly and succinctly.
  • The ability to reason about engineering issues holistically using engineering fundamentals and knowledge about architecture.
  • A systematic approach to development and engineering, including debugging, DevOps and MLOps practices, and agile software development.
  • At least 2 years of experience in an engineering or infrastructure position.
  • Proficiency in at least one commonly used programming language, such as Python, Ruby, Go, Rust, JavaScript, Java, or C#. Python is preferred.
  • Proficiency in administering Linux systems.
  • Basic proficiency in, and understanding of, machine learning concepts, including data analysis, predictive modelling and model evaluation.
  • Proficiency in at least one automation tool (Ansible, Chef, Puppet, Bash, PowerShell, etc.).
  • Familiarity with virtualisation technologies (KVM, VMware, etc.).
  • Familiarity with container and container orchestration technologies (Docker, rkt, Singularity, Kubernetes, Docker Swarm, Helm, etc.).
  • Familiarity with public cloud providers such as AWS, Microsoft Azure, and Google Cloud Platform.
  • Experience with deployment of applications on cloud or distributed systems.
  • An aptitude for automated system design and implementation (automated deployments and automated testing).

 

We will also consider candidates with AICE Associate certification and above who show the aptitude and potential for the above skills, despite a lack of experience.

More Information

Location: Kent Ridge Campus

Organization: Office of the Deputy President (Res&Tech)

Department: AI Singapore

Employee Referral Eligible: No

Job requisition ID: 23796