Popsink is a cutting-edge data transfer solution revolutionizing how organizations handle and move their data. Our mission is to provide seamless, secure, and efficient data transfer capabilities for businesses of all sizes. As a fast-growing startup, we are seeking a passionate and experienced Site Reliability Engineer (SRE) to join our fully remote team and help us build a highly reliable, scalable, and efficient infrastructure.
As an SRE at Popsink, you will play a critical role in ensuring the reliability, scalability, and security of our infrastructure. You will collaborate with developers, product teams, and other engineers to design and implement robust systems and processes that power our stack, which includes Google Cloud Platform (GCP), Kubernetes, ArgoCD, and Terraform. Additionally, you will drive our monitoring and tracing strategies to ensure deep visibility into system health and performance.
Infrastructure Management:
Design, build, and manage cloud infrastructure on Google Cloud Platform (GCP).
Automate infrastructure provisioning and deployments using Terraform.
Orchestration & Automation:
Manage and optimize Kubernetes clusters for containerized application deployment and scaling.
Implement GitOps workflows using ArgoCD to ensure seamless application updates.
Monitoring, Tracing, & Performance:
Develop and maintain comprehensive monitoring and tracing solutions to track system health and performance.
Configure and use observability tools such as Prometheus, Grafana, Jaeger, or similar systems.
Proactively identify bottlenecks and optimize system performance based on metrics and logs.
Reliability Engineering:
Define and maintain SLOs, SLAs, and SLIs to ensure system reliability.
Lead post-incident reviews and implement preventive measures to enhance system resilience.
Collaboration:
Partner with development teams to implement CI/CD pipelines and enforce best practices.
Foster a culture of operational excellence, automation, and continuous improvement across the team.
Technical Expertise:
Hands-on experience with Google Cloud Platform (GCP) and its services (e.g., Compute Engine, GKE, Cloud Storage).
Proficiency in managing Kubernetes clusters for orchestration and scaling.
Strong knowledge of Terraform for infrastructure as code.
Familiarity with GitOps tools like ArgoCD.
Monitoring & Observability:
Experience implementing and managing monitoring and tracing systems (e.g., Prometheus, Grafana, Jaeger, or OpenTelemetry).
Deep understanding of observability principles and best practices.
Problem Solving:
Proven ability to troubleshoot complex distributed systems in production environments.
Experience with incident management and root cause analysis processes.
Programming & Automation:
Soft Skills:
Strong communication and collaboration skills, with a proactive mindset.
Comfort working in a fast-paced startup environment.
Certification in GCP or Kubernetes (e.g., Google Cloud Professional DevOps Engineer, CKA).
Experience with service meshes like Istio or Linkerd.
Familiarity with CI/CD tools like GitLab CI, Jenkins, or equivalent.
Knowledge of database systems and caching technologies (e.g., PostgreSQL, Redis).
Impact: Be part of a startup revolutionizing data transfer solutions.
Growth: Join a fast-paced environment with ample opportunities for career development.
Culture: Work with a collaborative, innovative, and supportive team.
Flexibility: Enjoy a fully remote work environment that supports work-life balance.
15-minute phone call
1-hour technical interview