As a growing deep-tech startup in the semiconductor industry, we are seeking a Data Platform Engineer to develop and maintain an in-house data ingestion and analytics platform.
Responsibilities
Custom SOP-driven Data Ingestion (Upstream):
Support internal and external data producers with UI tools and co-development of purposeful data contracts that enable the ingestion and canonical storage of unstructured data sources such as scientific instrument data, physical measurements, and manual logistics.
Develop and maintain data validation procedures to ensure robustness and quality for downstream data pipelines.
Custom S3 Data Lake Management (Infra/Platform):
Own the architecture, IAM, ETL design, and development of this (relatively small-scale) data lake infrastructure to ensure availability, integrity, and security. Manage serving of curated datasets to data consumers and insight subscribers.
Analytics / ML (Downstream):
Support the organization with traceability and performance tracking of our core-business semiconductor process flow in both R&D and production phases. Help the company answer scientific and product-oriented questions with data. Develop your own models and derive actionable insights via automated reports, dashboards, and statistics.
Some expected tasks:
Provide organization-wide support for onboarding new SOPs that require data contracts.
Develop workflows and test strategies to ensure end-to-end data quality.
Support data producers with troubleshooting and upload guidance.
Develop custom UI and data tools.
Monitor and debug ingestion failures, ensuring high data quality and consistency.
Perform exploratory data analysis, statistical reporting, and machine learning modeling on curated datasets.
Document SOP onboarding processes, data validation rules, and platform workflows.
Collaborate with scientific and engineering teams to plan future platform improvements.