Job Description
We are seeking a skilled and motivated Research Associate to join our environmental informatics team. In this role, the candidate will build and maintain the data infrastructure that underpins our environmental monitoring and early-warning systems. The work involves integrating diverse, high-volume data streams (including rainfall records, temperature sensor readings, radar imagery, and computer vision outputs) to deliver a unified, queryable, and secure data platform that drives research, operational decision-making, and stakeholder dashboards.
Key Responsibilities
1. Environmental Data Platform
• Design, build, and maintain a unified database to ingest and store diverse environmental data streams: rain gauge records, gridded temperature data, rainfall radar (e.g. OPERA, NEXRAD), satellite imagery, and computer vision model outputs.
• Define and enforce common data schemas and ontologies across heterogeneous source formats (NetCDF, HDF5, GeoTIFF, CSV, JSON, REST/API feeds).
• Implement scalable ingestion pipelines supporting real-time streaming and batch historical loads.
• Ensure data traceability with robust metadata, provenance tracking, and versioning.
2. Data Processing & Quality Assurance
• Develop and maintain automated pipelines for data cleaning, outlier detection, and quality flagging.
• Implement missing-data imputation methods appropriate to environmental time-series and spatial fields (e.g. interpolation, climatological fill, ML-based gap-filling).
• Apply noise-removal algorithms (e.g. signal filtering, radar clutter suppression, spike detection) across sensor and remote-sensing data types.
• Document processing logic and maintain reproducible workflow configurations.
3. Visualisation & Dashboards
• Design and develop interactive dashboards for operational and research users, displaying spatial maps, time-series plots, and aggregated statistics.
• Integrate visualisation tools (e.g. Grafana, Superset, Plotly Dash, or custom web front-ends) with the data backend.
• Collaborate with domain scientists to translate monitoring requirements into effective visual analytics.
• Ensure dashboards remain performant and responsive under live data load.
4. Data Security & Governance
• Implement and maintain role-based access control (RBAC) for all data assets.
• Enforce data encryption at rest and in transit; manage secrets and credentials securely.
• Support compliance with relevant data governance policies and institutional data-sharing agreements.
• Maintain audit trails and access logs; respond to security reviews and risk assessments.
5. Infrastructure & Operations
• Manage cloud or on-premises database services (e.g. PostgreSQL/PostGIS, TimescaleDB, InfluxDB, or equivalent); tune for time-series and geospatial query performance.
• Maintain CI/CD pipelines for data pipeline code; apply version control best practices.
• Monitor pipeline health, set up alerting for failures, and respond to incidents.
• Contribute to infrastructure-as-code practices (Docker, Kubernetes, Terraform, or equivalent).
Job Requirements
• At least a Master’s degree in Computer Science, Data Engineering, Environmental Informatics, or a closely related field.
• 3–6 years of professional experience in data engineering, with demonstrable work on time-series or geospatial data.
• Proficiency in Python (pandas, NumPy, xarray, or similar) and SQL; experience with at least one workflow orchestration tool (Airflow, Prefect, Luigi, etc.).
• Hands-on experience with geospatial or scientific data formats: NetCDF, HDF5, GeoTIFF, GeoJSON, or similar.
• Working knowledge of relational and time-series databases, with practical experience in data modelling.
• Familiarity with cloud platforms (AWS, GCP, or Azure) and containerisation (Docker).
• Solid understanding of data security principles: encryption, RBAC, secrets management.
• Open to a fixed-term contract.
Experience
• Experience with environmental, meteorological, or hydrological datasets (e.g. radar QPE, NWP outputs, IoT sensor networks).
• Familiarity with PostGIS or other spatially enabled database extensions.
• Exposure to machine learning pipelines or MLOps practices for model output ingestion.
• Experience building dashboards with Grafana, Apache Superset, Plotly Dash, or equivalent.
• Contributions to open-source scientific data tooling (e.g. xarray, GDAL ecosystem).
Core Competencies
• Python, SQL, and Bash (preferred)
• Time-series & geospatial DBs
• Data pipeline orchestration
• Cloud & containerisation
• Dashboard & BI tools
• Data security & RBAC