COMPSs runtime and infrastructure manager (RE2)

Context And Mission

The Computer Sciences (CS) department of the Barcelona Supercomputing Center (BSC) conducts research and development to influence how computing machines are built, programmed and used. The Workflows and Distributed Computing (WDC) group at BSC carries out research on programming models for distributed computing.

COMPSs is a task-based, parallel programming model offered to application developers. In COMPSs, applications are written as sequential programs, and annotations are added to identify tasks. Tasks are functions of the application that can run in parallel with one another. At execution time, the COMPSs runtime identifies the tasks and manages their execution on a distributed computing infrastructure, such as supercomputers, clouds, or edge-to-cloud platforms (see compss.bsc.es for more details).

The group is looking for an engineer with at least five years of experience in a similar position to maintain and further extend the COMPSs runtime. The candidate must excel in Java programming for parallel systems. The group also maintains a CI/CD infrastructure for the COMPSs project and other software projects. The candidate will manage this infrastructure and should have previous experience in these or similar tasks (e.g., infrastructure management with Jenkins). Finally, the candidate should have user-level experience with HPC systems and expertise in interacting with job schedulers (especially SLURM).

Other valued expertise includes containers and Kubernetes, parallel programming, distributed computing, and scheduling for task-based environments in distributed systems. In addition to Java, COMPSs supports applications written in Python and C/C++, so expertise in these programming languages will be valued.

The job also includes active participation in the projects in which the group is involved, including attending project meetings, collaborating with partners and writing deliverables.


Key Duties

- Maintenance and development of the COMPSs runtime
- Maintenance and management of the group's CI/CD infrastructure
- Contribution to and maintenance of the COMPSs documentation
- Close collaboration with other researchers in the Workflows and Distributed Computing group
- Management of COMPSs releases (twice a year)

Requirements

Education
- Degree in computer science; a master's degree in computer science will be valued

Essential Knowledge and Professional Experience
- Proficiency in Java programming
- Previous experience in HPC systems at the user level
- Knowledge of HPC job schedulers
- Experience in CI/CD infrastructure management

Additional Knowledge and Professional Experience
- Previous experience with PyCOMPSs/COMPSs and its runtime, or with similar task-based environments
- Knowledge of distributed computing
- Programming skills in Python, C/C++ and R

Competences
- Fluency in spoken and written English; fluency in other European languages will also be valued

Conditions

- The position will be located at BSC within the Computer Sciences Department
- We offer a full-time contract (37.5h/week), a good and highly stimulating working environment with state-of-the-art infrastructure, flexible working hours, an extensive training plan, restaurant tickets, private health insurance, and support with relocation procedures
- Duration: Open-ended contract due to technical and scientific activities linked to the project and budget duration
- Holidays: 23 paid vacation days plus the 24th and 31st of December, per our collective agreement
- Salary: we offer a competitive salary commensurate with the qualifications and experience of the candidate and according to the cost of living in Barcelona
- Starting date: 01-09-2024