Site Reliability Engineer, AI Platform

Permanent contract
Paris
Salary: Not specified
No remote work

Algolia

The role

Job description

Algolia was built to help users deliver an intuitive search-as-you-type experience on their websites and mobile apps. We provide a search API used by thousands of customers in more than 100 countries. Billions of search queries are answered every month thanks to the code we push into production every day.

Join the AI Platform: Building core components to speed up AI delivery

The AI Platform is dedicated to enabling AI product delivery by providing other teams with turnkey tools, frameworks, and features, so that they can focus on their core business instead of redundant work that falls outside their expertise. The AI Platform covers two areas: allowing teams to quickly design new models (AI development), and generating and serving predictions in production (AI productionization).

We’re looking for problem solvers with an entrepreneurial mindset—people who focus on outcomes and use data to drive decisions. If you're passionate about reliability, scalability, and automation, and want to contribute to a platform that powers AI at scale, we’d love to hear from you!

The team spans a variety of roles, from Site Reliability Engineers to Machine Learning specialists, with a strong focus on Data Engineering. Most team members are fully remote, and they bring different skill sets and backgrounds. Your experience, knowledge, and perspective will add to this diversity and help the team deliver products that make a difference.

Day to day you will:

  • Implement, maintain, and improve the infrastructure that powers the AI Platform
  • Ensure the reliability and performance of Kubernetes-based deployments across cloud providers (GCP, AWS, Azure)
  • Develop and maintain infrastructure as code
  • Optimize CI/CD pipelines and deployment processes
  • Enhance monitoring, observability, and alerting systems
  • Contribute to incident response and post-mortem analysis

You might be a fit if you have:

  • Hands-on experience with Kubernetes and container orchestration in production environments
  • Experience with cloud providers (GCP, AWS, or Azure)
  • Experience with automation and infrastructure as code (e.g., Terraform)
  • Solid knowledge of CI/CD pipelines and deployment automation
  • Familiarity with monitoring and observability tools (e.g., Datadog)
  • A problem-solving mindset and a proactive approach to improving system reliability
  • Excellent spoken and written English skills

Ideally, you would also have:

  • Programming skills in Go and/or Python
  • Exposure to incident response and on-call best practices

We’re looking for someone who can live our values:

  • GRIT - Problem-solving and perseverance capability in an ever-changing and growing environment.
  • TRUST - Willingness to trust our co-workers and to take ownership.
  • CANDOR - Ability to receive and give constructive feedback.
  • CARE - Genuine care about other team members, our clients and the decisions we make in the company.
  • HUMILITY - Aptitude for learning from others, putting ego aside.

