At Heuritech, we develop crawlers for various data sources (mainly social
media, e.g. Instagram, Weibo, TikTok) to gather millions of lines of data
(posts, authors, images) on a recurring basis. This data then goes through our
data pipeline: it is analyzed by our computer vision modules, and the resulting
predictions are aggregated into time series used to build ongoing and
forecasted metrics that feed our product.
The Data Engineering team is responsible for building, maintaining and
monitoring this pipeline, which processes millions of posts from thousands of
authors, from their crawling on social media to their analysis by computer
vision modules and their aggregation into the time series behind the metrics
and forecasts that feed our product.
The data we crawl is inserted into our data warehouse and transformed along
the way into relevant metrics that either feed our product or are accessible
through our APIs.
As part of the Data Engineering team, your role will be to build and maintain
robust, scalable components for the data pipeline, from the gathering of
online content to its processing and transformation into product insights.
This includes the following tasks:
Develop crawlers for new data sources and integrate them into data pipelines
Maintain the current crawling codebase
Expand our geographical and segmented coverage
Monitor data flows through relevant metrics
Optimize data processing and transformations
Develop tools that non-tech teams can use to access information about the crawled data
You have several years of experience in Software Engineering, ideally at least
three to five. You are familiar with large-scale data crawling.
You have hands-on experience handling large streams of data and a view on how
to optimize them, at both the software and infrastructure levels.
You have experience with Cloud and Big Data solutions.
You are able to take a project from prototype to industrialized solution. You
know how to structure, test and document your code. You follow good coding
practices and thrive working as part of a team.
Python: proficiency in programming and library development
Writing and implementing unit and integration tests
Hands-on experience working with Celery task queue
Familiar with containerization and Docker
Knowledge of version control with Git
Familiar with web scraping tools and libraries like Selenium, BeautifulSoup…
Experience in working with proxies
Experience in Linux and/or macOS environments
Ability to write and execute bash commands and scripts
Knowledge of SQL
Familiar with web frameworks such as FastAPI, Flask, Django or other frameworks
Experience with Kubernetes (k8s)
Understanding of agile project management methodologies
Ability to manage environment and dependencies with tools such as Poetry
Knowledge of setting up and using pre-commit hooks
Experience with Kafka
Experience with Snowflake
As a seasoned developer, you should be able to work autonomously, taking ownership of the projects you are responsible for and leading them to completion, with a sense of initiative to provide relevant solutions to the challenges that arise.
As we are an international company, you must be comfortable communicating in English, both spoken and written.
First call to get to know each other and talk about the open position
Technical test (at home, with a small project to set up)
Meeting with the Data Engineering team to debrief the technical test and meet your future colleagues
Meeting with the CTO