Senior Linux Platform Engineer

Job Locations: IE-Dublin
Requisition ID: 2026-10603
Experience Level: Experienced Professionals
Job Categories: Technology - Infrastructure, Support + Engineering

Overview

We are seeking a highly technical Senior Platform Engineer with deep expertise in Linux Engineering, OpenStack development, Kubernetes, and GPU-enabled infrastructure to design, build, and operate SIG’s next-generation infrastructure platforms supporting trading and core technology environments.

This is a hands-on engineering role focused on building and tuning scalable, resilient, and high-performance infrastructure systems across CPU and GPU workloads. The ideal candidate will have strong Linux internals knowledge, experience developing and operating cloud-native platforms, and a deep understanding of distributed systems architecture, including the efficient provisioning, isolation, and performance tuning of accelerator-based compute resources.

What we’re looking for

Linux Systems Engineering

  • Deep troubleshooting across kernel, networking stack, storage, and performance layers.
  • Performance tuning for low-latency systems (CPU pinning, NUMA, IRQ balancing, kernel tuning).
  • Develop automation using Python, Go, or similar languages.
  • Build and maintain infrastructure tooling and internal platform services.
  • Implement high-availability solutions and disaster recovery strategies.
  • Perform root cause analysis for production incidents affecting distributed systems.
  • Design, deploy, and operate GPU-enabled infrastructure, and optimize GPU utilization (memory bandwidth, PCIe throughput, Multi-Process Service (MPS), MIG partitioning where applicable).
  • Tune workloads to efficiently leverage NVIDIA GPUs (or equivalent accelerators) for compute-intensive applications.
  • Troubleshoot GPU driver, CUDA, kernel module, and firmware-related issues in production environments.

 

OpenStack Development & Cloud Infrastructure

  • Develop and extend OpenStack services (Nova, Neutron, Cinder, Keystone, etc.).
  • Build custom integrations and automation around OpenStack APIs.
  • Optimize compute, networking, and storage performance for high-performance workloads.
  • Design multi-tenant OpenStack architectures with strong isolation and security.
  • Contribute to infrastructure-as-code frameworks managing OpenStack environments.
  • Debug and resolve deep issues across hypervisors (KVM), networking layers, and control plane services.
  • Integrate OpenStack environments with Kubernetes platforms (hybrid cloud architectures).

 

Kubernetes Platform Engineering

  • Design, build, and operate highly available, production-grade Kubernetes clusters.
  • Develop and maintain Kubernetes operators, controllers, and custom resource definitions (CRDs).
  • Implement advanced scheduling, multi-tenancy, and workload isolation strategies.
  • Optimize cluster performance for low-latency and high-throughput workloads.
  • Integrate Kubernetes with CI/CD pipelines and GitOps workflows.
  • Implement cluster observability using Prometheus, Grafana, OpenTelemetry, etc.
  • Design and enforce network policies (CNI) and ingress architecture.
  • Implement secure cluster design including RBAC, OPA/Gatekeeper, secrets management, and runtime security.

Automation & Infrastructure as Code

  • Design and maintain infrastructure using Terraform, Ansible, Helm, or similar tools.
  • Build CI/CD pipelines for infrastructure and platform deployments.
  • Implement immutable infrastructure and GitOps methodologies.
  • Create automated validation, testing, and deployment frameworks for platform services.

 

Required Technical Skills

  • Advanced Linux systems knowledge (kernel, networking, storage)
  • Experience deploying and operating GPU-enabled Linux servers
  • Understanding of CUDA drivers and GPU kernel modules
  • Performance profiling and workload tuning for compute-intensive applications
  • Hands-on OpenStack development and operations experience
  • Strong experience administering and engineering production Kubernetes clusters
  • Strong understanding of distributed systems principles:
    • Consensus
    • Replication
    • Fault tolerance
    • CAP theorem tradeoffs
  • Experience with:
    • Python or similar programming languages
    • Infrastructure as Code (Terraform, Ansible)
    • Container runtimes (containerd, CRI-O)
    • Observability stacks (Prometheus, Grafana, ELK)

Desirable Experience

  • Experience in low-latency or high-performance trading environments
  • High-performance networking (DPDK, SR-IOV, CNI tuning)
  • Storage systems (Ceph, distributed storage, NVMe optimization)
  • Contribution to open-source projects (Kubernetes, OpenStack)
  • Experience designing multi-region or hybrid cloud architectures
  • Experience tuning AI/ML, quantitative, or high-performance compute workloads on GPUs
  • Experience with NVIDIA DCGM, MIG (Multi-Instance GPU), or vGPU configurations
  • Familiarity with RDMA, GPUDirect, or high-throughput interconnects
  • Experience optimizing containerized ML or compute pipelines

 

Key Attributes

  • Strong systems thinking and deep technical curiosity
  • Ability to diagnose complex cross-layer failures
  • Passion for building reliable, scalable distributed systems
  • Comfortable operating in high-availability, high-performance production environments
  • Strong documentation and knowledge-sharing mindset
