About NUS IT
NUS Information Technology is the cornerstone of reliable, high-performance and secure IT solutions and effective IT governance for the campus. Here at NUS IT, we aim to transform NUS into a borderless computing community providing knowledge at its fingertips by enhancing the use of effective applications and services for teaching and learning.
We drive a culture that is forward-looking. With a strong passion for IT, our people are always striving to improve, push boundaries and innovate with a "can-do" attitude. We embrace collaboration, open communication and knowledge sharing. If you see yourself thriving in a dynamic environment and breaking new ground with innovative ideas, you will find yourself at home in NUS IT.
As part of our team, you can look forward to an empowered work environment that allows you to take charge of your own career path. We provide competitive remuneration as well as flexible work arrangements to enable your growth and development. We pride ourselves on our diverse workforce and are committed to transforming NUS into a leading global University shaping the future.
https://nusit.nus.edu.sg/
Job Description
This role requires an experienced HPC (High-Performance Computing) Architect or Engineer to lead our operations team in designing, deploying, enhancing, and managing our HPC infrastructure.
Duties and Responsibilities
• Lead the administration and operation of our HPC infrastructure (on-premises and/or cloud), including hardware, software, and networking components.
• Lead the development of HPC infrastructure together with the lead architect and the management team.
• Perform project management activities and vendor management on strategic HPC infrastructure projects.
• Develop, implement, and document standard operating procedures and best practices for HPC operations, including system monitoring, performance tuning, and security.
• Ensure that operational processes and practices adhere to prevailing institutional policies and governance principles.
• Ensure that the HPC infrastructure’s availability and performance meet the predefined SLA and/or standards. Take corrective action if the metrics are not met.
Qualifications
• Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
• 5+ years of experience in HPC operations and administration, preferably with 3+ years of experience managing an operational team and/or HPC projects.
• Strong knowledge of HPC technologies, including high-performance computing hardware, parallel file systems, job schedulers, and cluster management software.
• Proficient in scripting and programming languages such as Python, Bash, and Perl.
• Experience with Linux operating systems and system administration.
• Experience in administering an HPC workload scheduler, e.g. PBS Pro, Slurm, or SGE.
• Experience in administering a parallel file system, e.g. GPFS or Lustre.
• Excellent analytical and problem-solving skills.
• Strong written and verbal communication skills.
• Ability to work in a team environment and lead other engineers.
More Information
Location: Kent Ridge Campus
Organization: NUS Information Technology
Department: Infrastructure - Research Computing
Employee Referral Eligible: Yes
Job requisition ID: 27632