Research Data Engineers (RE1-RE2) – Biomedical data integration and FAIR data management

About BSC

The Barcelona Supercomputing Center - Centro Nacional de Supercomputación (BSC-CNS) is the leading supercomputing center in Spain. It houses MareNostrum, one of the most powerful supercomputers in Europe. It was a founding and hosting member of PRACE (Partnership for Advanced Computing in Europe), the former European HPC infrastructure, and is now a hosting entity for the EuroHPC JU, the Joint Undertaking that leads large-scale investments and HPC provision in Europe. The mission of BSC is to research, develop and manage information technologies in order to facilitate scientific progress. BSC combines HPC service provision with R&D in both computer science and computational science (life, earth and engineering sciences) under one roof, and currently has over 900 staff from 55 countries.

For this role, we are particularly interested in the strengths and lived experiences of women and underrepresented groups, to help us avoid perpetuating biases and oversights in science and IT research. In cases of equal merit, candidates of the under-represented sex will be favoured.

Context and Mission

The Life Sciences Department of the Barcelona Supercomputing Center is establishing a new Health Data Research team that will collaborate closely with the INB/ELIXIR-ES unit. The Spanish National Bioinformatics Institute (INB) is the major bioinformatics platform in Spain and the Spanish representative in ELIXIR, the European bioinformatics infrastructure. INB is fully committed to achieving solid integration with international bioinformatics and health research data infrastructures, including the ELIXIR consortium.

The new team focuses on the massive amount of health data that has become available for research in recent years, and on the recent national and European efforts to tap into those data to benefit patients. Using such data helps scientists to better understand diseases and health conditions and provides new ways of identifying patients at higher risk. It also helps to prevent illness and to provide better treatments. This, in turn, improves the efficiency of health services by directing their resources where they matter most.

The selected candidate will work on the acquisition, normalisation and standardisation of health research data. Working in a highly sophisticated HPC environment, they will collaborate with the rest of the Health Research Data team to analyse and understand data sources, participate in design work, and provide insights and guidance on database technologies, data modelling, and the publication of relevant biomedical research datasets following FAIR best practices. The candidate will have access to state-of-the-art systems and computational infrastructures and will establish collaborations with experts in different areas at both international and local levels.

Female candidates are especially encouraged to apply.

Key Duties

- Develop data processing pipelines.
- Perform data model transformations and standardisation (OMOP, etc.).
- Develop components of a multi-modal distributed database.
- Query, model and deploy with PostgreSQL, Redis and RabbitMQ.
- Implement ETL processes with SQL and Python.
- Design, implement and test microservices.
- Integrate our developments with external APIs.
- Develop and test our core products and frameworks.
- Test, log, measure and set up alerting for the deployed services.
- Monitor existing metrics, analyse data and collaborate with other teams to identify and implement system and process improvements.
- Ensure that the collected data is of high quality and optimal for use across the team and the wider research community.
- Oversee the activities of junior team members, ensuring proper execution of their duties and alignment with business vision and objectives.
- Document and publish data and software according to FAIR standards.


Requirements

Education
- MSc in computer science, data science or bioinformatics.
- Alternatively, an engineering degree with a strong computer science background and demonstrated experience in data engineering applications.
Essential Knowledge and Professional Experience
- Experience in data engineering methodologies.
- Programming: SQL and Python will be your main tools of the trade.
- Comfortable with container development using Docker and/or Kubernetes.
- Work experience in a collaborative software development environment (git, CI/CD, documentation, etc.).
- Interest in computer science applications for biomedical sciences.
Additional Knowledge and Professional Experience
- Knowledge and experience in biomedical research.
- Experience with good data management practices (FAIR data, data quality, etc.).
- Fluency in spoken and written English.
Competences
- Capacity to explore new technologies.
- Good communication and presentation skills.
- Ability to work both independently and within a team.


Conditions

- The position will be located at BSC, within the Life Sciences Department
- We offer a full-time contract (37.5 h/week), a good and highly stimulating working environment with state-of-the-art infrastructure, flexible working hours, an extensive training plan, restaurant tickets, private health insurance, and support with relocation procedures
- Duration: open-ended contract for technical and scientific activities, linked to the duration of the project and its budget
- Holidays: 23 paid vacation days, plus 24 and 31 December, as per our collective agreement
- Salary: we offer a competitive salary commensurate with the candidate's qualifications and experience and in line with the cost of living in Barcelona
- Starting date: as soon as possible