Exploring Collaborative Filtering on Implicit Data for Job Recommendations
Feb 01, 2019
15 mins
Data scientist @ WTTJ
At Welcome to the Jungle (WTTJ), we help both candidates and companies to find their way through the jungle of recruitment. We achieve that with content creation, events, and of course search engines for jobs and employers. Companies subscribe to a yearly contract, which entitles them to have their profile on our website as well as a location for publishing their job offers.
Leveraging data for Data Science purposes is brand new here at WTTJ. At the time of writing, all visitors have access to the same experience on the website. Basically, you search for your content, such as jobs or articles, by using dedicated search engines. On the homepage, the same highlighted companies and videos are suggested to everyone who visits the site.
Why use recommender systems?
For our clients and us, the goal is to increase the number of qualified applications that are made and reduce the amount of time it takes to apply for a job. For applicants, an intelligently personalized experience reduces the amount of irrelevant content that is presented. It also offers further possibilities, such as grouping users according to their job tastes and personalizing newsletters.
Therefore, we recently started to investigate recommender systems (RS). This class of algorithm could bring a whole new dimension to our users’ experience by proposing relevant content to them at strategic locations on the website. For this article, we focus specifically on RS for jobs, covering collaborative filtering (CF) techniques for job recommendations and using data from our analytics.
There are many possible solutions to the problem, so intuition is key; therefore, we have deliberately made choices and guesses based on our knowledge of our business. We wanted quick results from this investigation to see if it’s worth going further. Spoiler: it is!
Testing collaborative filtering instead of content-based
As mentioned, we wanted to get answers quickly, but we also wanted to capitalize on our analytics data, so we chose to test a CF algorithm rather than a content-based (CB) one, which uses predefined item features such as contract type or semantic attributes.
CF algorithms are not trained on item features but on user behaviors, that is, interactions between users and items, in our case jobs. Take two users, Emma and Jules, who are both looking for Data Science internships in Bordeaux, France. They interact with the same two job offers, but in addition, Emma interacts with a third job she found. Based on these interactions, we infer that they have similar tastes, so we would recommend that third job to Jules. This is how similar users and similar jobs are found: by analyzing these kinds of interactions.
Illustration of Emma and Jules's interactions (icons made by Freepik & Vectors Market)
However, it's important to bear in mind that CF algorithms suffer from the cold-start problem. When new jobs are pushed online, we do not have any historical data on them, so the algorithm cannot recommend them until enough user/job interactions have been gathered. Moreover, CF algorithms and their library implementations are less standardized, whereas CB approaches can often be reduced to well-known classification problems.
Using a holistic approach, we would seek to harness the pros of both classes of algorithms and use hybrid recommender systems, which mix CF and CB techniques in various ways that we will not be covering here.
Choosing relevant data for collaborative filtering
Our choice of a CF algorithm is conditioned by the kind of data we have at hand. The most important criterion is finding the data that best reflects how appealing a job is to a user. In CF, two types of data can be used for recommendations: explicit and implicit feedback.
We used implicit feedback as input data: a metric that indirectly suggests how suitable an item is for a user. Explicit feedback, on the other hand, consists of direct reviews of a job, such as ratings on a scale, and we do not have that kind of data on our website.
From our analytics data, several candidate metrics fit our needs. We chose application-form openings, a strong indicator of our users' engagement: curiosity-driven openings get diluted as more data accumulates, while genuine engagement shows up more strongly when the same form is opened several times, for instance when a potential candidate is preparing their application. Other candidate metrics had drawbacks: job pageviews may be too broad and too dependent on external factors, such as company reputation and users' browsing patterns; the time a user spends on a page depends heavily on the length of the job description and the writer's tone; and we are not able to gather enough data from application-form submissions.
The right algorithm?
Now that we have our data, it’s time to choose the algorithm we are going to use. Out of the methods that can deal with implicit data, we focused on a matrix factorization approach.
Below, we represent our data as a sparse user/job matrix W (of shape m × n), where users are the rows, jobs are the columns, and the values are the counts of application-form openings for each user/job pair. In matrix factorization, the idea is to approximate W as the product of two matrices, U and V, so that W ≈ UV. U is of size m × k and V of size k × n, where k ≪ min(m, n). This means that users and jobs are each represented by a small number of features (k of them), called hidden or latent features. These factors are not interpretable individually, but as vectors they describe users' tastes and jobs' similarities.
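To make the shapes concrete, here is a minimal NumPy sketch with made-up, scaled-down dimensions. Note that the full matrix is never materialized; scoring a user/job pair only takes a dot product between two latent vectors:

```python
import numpy as np

# Hypothetical, scaled-down dimensions for illustration
m, n, k = 1000, 800, 60   # users, jobs, latent factors (k << min(m, n))

U = np.random.rand(m, k)  # one k-dimensional taste vector per user
V = np.random.rand(k, n)  # one k-dimensional profile vector per job

# W is approximated by U @ V, but we never need to build the full
# (m x n) matrix: the predicted affinity of user u for job j is
# simply the dot product of their latent vectors.
u, j = 42, 137
score = U[u] @ V[:, j]
print(score)
```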
It is important to understand that implicit data requires a fundamentally different approach from explicit data. With explicit data, every observed entry provides either positive or negative feedback, and missing entries are treated as unknown. With implicit data, a missing entry may truly be missing, but it may also reflect the deliberate behavior of a user who does not want to apply for a certain job. We cannot simply treat missing entries as unknown, so the whole matrix is useful.
One particularly promising method is alternating least squares (ALS), which is explained brilliantly in the research paper Collaborative Filtering for Implicit Feedback Datasets by Hu, Koren, and Volinsky. We wanted to take all feedback into account, so every observation was converted into an indicator of preference with a certain level of confidence. An absence of feedback was taken as an indicator of low preference with low confidence; conversely, the more application-form openings we had for a user/job pair, the more confident we were about the user's high preference.
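Concretely, in the notation of that paper, each raw count $r_{ui}$ (application-form openings for user $u$ and job $i$) is converted into a binary preference $p_{ui}$ with a confidence weight $c_{ui}$, and ALS minimizes a weighted, regularized least-squares cost over every user/job pair:

$$
p_{ui} = \begin{cases} 1 & \text{if } r_{ui} > 0 \\ 0 & \text{if } r_{ui} = 0 \end{cases}
\qquad
c_{ui} = 1 + \alpha\, r_{ui}
$$

$$
\min_{x_\ast,\, y_\ast} \sum_{u,i} c_{ui} \left( p_{ui} - x_u^\top y_i \right)^2
+ \lambda \left( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \right)
$$

Here $x_u$ and $y_i$ are the latent user and job vectors (their dimension is the number of factors), $\alpha$ scales the confidence we put in observed interactions, and $\lambda$ is the regularization strength; these map directly onto the parameters used in the training code further below.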
Enough with the explanation, let’s dive in!
Exploring the results
We used the Python library Implicit, written in Cython by Ben Frederickson, which is much faster than a standard Python implementation of the algorithm that uses loops and a solver.
The dataset
Here is what the dataset looks like once loaded and prepared (IDs in this article have been deliberately re-indexed).
```python
data[['user_id', 'job_id', 'name', 'events']].sample(5)
```
| user_id | job_id | name                              | events |
|---------|--------|-----------------------------------|--------|
| 1       | 10     | Responsable Acquisition Digitale  | 1      |
| 2       | 9      | Customer Success                  | 1      |
| 3       | 8      | Chief Marketing Officer (CMO)     | 2      |
| 4       | 7      | Stage - Contrôle de gestion (F/H) | 1      |
| 5       | 6      | Data scientist                    | 2      |
Each row represents a user, a job, and the number of application-form openings. We chose to use about three months of data, assuming that users' job tastes do not change over that period.
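As an illustration of that preparation step, here is a minimal pandas sketch; the raw_events DataFrame and its column names are hypothetical stand-ins for our analytics export:

```python
import pandas as pd

# Hypothetical raw analytics export: one row per application-form opening
raw_events = pd.DataFrame({
    'user_id':   [1, 1, 2, 1],
    'job_id':    [10, 10, 9, 12],
    'name':      ['Data scientist', 'Data scientist',
                  'Customer Success', 'CMO'],
    'opened_at': pd.to_datetime(['2018-11-05', '2018-11-06',
                                 '2018-12-01', '2019-01-15']),
})

# Keep roughly three months of history, then count openings per user/job pair
cutoff = raw_events['opened_at'].max() - pd.Timedelta(days=90)
data = (raw_events[raw_events['opened_at'] >= cutoff]
        .groupby(['user_id', 'job_id', 'name'])
        .size()
        .reset_index(name='events'))
```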
The matrix
From that we were able to create the user/job matrix. This needed to be a sparse matrix, as the vast majority of values were unknown.
```python
from scipy import sparse

sparse_user_item = sparse.csr_matrix((data['events'].astype(float),
                                      (data['user_id'], data['job_id'])))
print(sparse_user_item[0:3])
```
```
(0, 0)    2.0
(1, 1)    1.0
(2, 2)    1.0
(2, 3)    2.0
(2, 4)    1.0
```
Our matrix had about 1.5 billion possible pairs, so let’s see how sparse it was.
```python
# Number of possible interactions in the matrix
matrix_size = sparse_user_item.shape[0] * sparse_user_item.shape[1]

# Count of interactions
count_interactions = sparse_user_item.size

# Compute matrix sparsity
sparsity = 100 * (1 - (float(count_interactions) / float(matrix_size)))
print(sparsity)
```
99.9759687598
The resulting 99.98% seems a bit high for CF (Jesse Steinweg-Woods recommends a maximum of about 99.5%), but there is no fixed threshold. We kept this in mind when exploring the results.
It is also worth asking whether parts of the dataset are irrelevant for training. If a user has a single interaction, with a job whose application form was opened only by that user, then this pair contributes nothing to finding job similarities. Fortunately for us, this only applied to a few hundred users, so we concluded it was an edge case and that most of the data was relevant to this exercise.
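For reference, here is a quick pandas sketch of how such isolated pairs can be counted, reusing the data DataFrame from above:

```python
# Jobs whose application form was opened by a single user only
job_user_counts = data.groupby('job_id')['user_id'].nunique()
lonely_jobs = set(job_user_counts[job_user_counts == 1].index)

# Users with exactly one interaction overall
user_job_counts = data.groupby('user_id')['job_id'].nunique()
lonely_users = set(user_job_counts[user_job_counts == 1].index)

# Pairs that cannot contribute to any user/job similarity
isolated = data[data['user_id'].isin(lonely_users)
                & data['job_id'].isin(lonely_jobs)]
print(len(isolated))
```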
The parameters
In order to run the ALS training, we needed to set four parameters:
- A confidence coefficient for weighting the observed interactions in the sparse matrix
- The number of latent factors, which sets the size of each user and job vector
- A regularization parameter for controlling overfitting
- The number of iterations for the ALS optimization
We trained the algorithm with parameters lying within the commonly recommended ranges. For the purposes of this article, we have not covered performance evaluation, so we used the whole dataset for training.
```python
import implicit

# Set parameters
confidence_coef = 15
factors = 60
regularization = 0.1
iterations = 100

# Initialize model
model = implicit.als.AlternatingLeastSquares(factors=factors,
                                             regularization=regularization,
                                             iterations=iterations)

# Fit model (implicit expects an item/user matrix here, hence the transpose)
model.fit((sparse_user_item.T * confidence_coef).astype('double'))

# Get user and item vectors from our trained model
user_vecs = model.user_factors
item_vecs = model.item_factors
```
100%|██████████| 100.0/100 [00:24<00:00, 4.11it/s]
It took only about twenty seconds to train the algorithm on the sparse matrix and its roughly 1.5 billion possible entries.
The results
Using a recommend method, following this article by Victor, we were able to get the top job recommendations for a given user, as well as insights into what works and what does not.
```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Get recommendations results (data is the DataFrame prepared earlier)
def recommend(user_id, sparse_user_item, user_vecs, item_vecs, num_items=10):
    # Build a mask that zeroes out jobs the user has already interacted with
    user_interactions = sparse_user_item[user_id, :].toarray()
    user_interactions = user_interactions.reshape(-1) + 1
    user_interactions[user_interactions > 1] = 0

    # Score every job for this user and rescale the scores to [0, 1]
    rec_vector = user_vecs[user_id, :].dot(item_vecs.T)
    min_max = MinMaxScaler()
    rec_vector_scaled = min_max.fit_transform(rec_vector.reshape(-1, 1))[:, 0]
    recommend_vector = user_interactions * rec_vector_scaled

    # Keep the indices of the num_items best-scored jobs
    item_idx = np.argsort(recommend_vector)[::-1][:num_items]

    jobs = []
    scores = []
    for idx in item_idx:
        jobs.append(data.name.loc[data.job_id == idx].iloc[0])
        scores.append(recommend_vector[idx])

    recommendations = pd.DataFrame({'name': jobs, 'score': scores})
    return recommendations
```
Let's now explore the recommendations for random users and try to detect interesting patterns of behavior captured by the algorithm.
```python
recommendations = recommend(user_id, sparse_user_item, user_vecs, item_vecs)

print('APPLICATIONS HISTORY FOR USER : ' + str(user_id))
print(data[data['user_id'] == user_id][['name', 'events']])
print('RECOMMEND FOLLOWING JOBS')
print(recommendations[recommendations['score'] > 0.70])
```
APPLICATIONS HISTORY FOR USER : 5

| name                          | events |
|-------------------------------|--------|
| Marketing Director            | 1      |
| CMO                           | 2      |
| CMO                           | 1      |
| Chief Marketing Officer (CMO) | 3      |

RECOMMEND FOLLOWING JOBS

|   | name                                              | score    |
|---|---------------------------------------------------|----------|
| 0 | Directeur Marketing - Chief Marketing Officer ... | 1.000000 |
| 1 | Chief Marketing Officer (CMO) (F/H)               | 0.918008 |
| 2 | CMO International                                 | 0.902379 |
| 3 | Head of Marketing                                 | 0.885167 |
| 4 | CMO / Head of Growth                              | 0.844978 |
APPLICATIONS HISTORY FOR USER : 6

| name                            | events |
|---------------------------------|--------|
| STAGE BUSINESS DEVELOPER - LYON | 2      |

RECOMMEND FOLLOWING JOBS

|   | name                                   | score    |
|---|----------------------------------------|----------|
| 0 | Stage - Business Developer (Lyon)      | 1.000000 |
| 1 | Stage Chargé de Com' / Account Manager | 0.959485 |
| 2 | Brand content                          | 0.932273 |
| 3 | Business Developer Lyon                | 0.887909 |
| 4 | STAGIAIRE MARKETING                    | 0.887474 |
APPLICATIONS HISTORY FOR USER : 7

| name                                           | events |
|------------------------------------------------|--------|
| Communication & Relations Presse (H/F) - Stage | 1      |
| Graphic designer junior - Alternance (H/F)     | 1      |
| Graphiste en alternance                        | 2      |
| Graphiste de rêve - stage / alternance         | 2      |

RECOMMEND FOLLOWING JOBS

|   | name                                            | score    |
|---|-------------------------------------------------|----------|
| 0 | Graphiste/Webdesigner/UI Designer en alternance | 0.829903 |
| 1 | Web Designer / Graphiste (Stage ou alternance)  | 0.786822 |
| 2 | Assistant Direction Artistique DA H/F           | 0.765362 |
| 3 | Designer Print - alternance                     | 0.758083 |
| 4 | Stagiaire graphiste                             | 0.743899 |
What worked well
Exploring these top recommendations, we can clearly see some very promising signals from the algorithm.
The same jobs with completely different wording were recommended, for instance "CMO" and "Head of Marketing", meaning that users' knowledge of job titles is captured during training.
Even more interestingly, the algorithm seems to have identified several job characteristics in the recommendations:
- Geographic: One user receives recommendations for business-development jobs in and around Nantes, while another receives fitting recommendations for sales jobs in Dakar or Abidjan.
- Contract types: If a user is looking mainly for internships, then the algorithm will recommend mainly internships. The same goes for apprenticeships or, indeed, any other contract type.
These characteristics are explicit in the job titles; we still need to investigate which other job characteristics are captured, such as company sector or size, or the technology stacks of developer jobs. It all depends on which attributes users consider most important in a job.
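One way to keep digging is to inspect nearest neighbors directly in the learned item space. Here is a quick sketch using cosine similarity on the trained job vectors (the implicit library also provides a similar_items method for this):

```python
import numpy as np

def similar_jobs(job_id, item_vecs, top_n=5):
    """Return the indices of the jobs closest to job_id in latent space."""
    # Cosine similarity between this job's vector and every job vector
    norms = np.linalg.norm(item_vecs, axis=1)
    sims = item_vecs @ item_vecs[job_id] / (norms * norms[job_id] + 1e-9)
    # Exclude the job itself, then keep the top_n most similar jobs
    sims[job_id] = -np.inf
    return np.argsort(sims)[::-1][:top_n]

for idx in similar_jobs(8, item_vecs):
    print(data.name.loc[data.job_id == idx].iloc[0])
```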
A spotted drawback
APPLICATIONS HISTORY FOR USER : 8

| name                                              | events |
|---------------------------------------------------|--------|
| Influencer Manager [STAGE]                        | 1      |
| Vendeur (H/F)                                     | 1      |
| URGENT / CDD Assistant RETAIL                     | 1      |
| Junior Account Manager - H/F - CDD 8 mois         | 1      |
| CDD Noël - Vendeur 35h (H/F) - Boutique de Bor... | 1      |

RECOMMEND FOLLOWING JOBS

|   | name                                              | score    |
|---|---------------------------------------------------|----------|
| 0 | Country Manager - Spain                           | 0.958362 |
| 1 | Responsable de Secteur ouverture d'Agence Bord... | 0.945686 |
| 2 | Commercial - Chargé de Développement (H/F)        | 0.941281 |
| 3 | Business Developer Bordeaux                       | 0.913694 |
| 4 | Vendeur 28h (H/F) - Boutique de Bordeaux          | 0.911692 |
Some job offers can be very popular on social networks, and the algorithm may then overweight the interest they attract, linking them to many other jobs they have little in common with.
Designing our own users
How about getting recommendations for a brand-new user with a designed behavior? The implicit library allows this because, with this matrix factorization technique, a user vector can be recalculated on the fly. We tried this feature on both a fairly obvious user behavior and a totally random one (the latter is called a gray sheep: a user whose behavior does not fit any clear pattern).
```python
# new_data is a DataFrame containing our new user and their interactions only
new_user_items = sparse.csr_matrix((new_data['events'].astype(float),
                                    (new_data['user_id'], new_data['job_id'])))

recs = model.recommend(userid=new_user_id, user_items=new_user_items,
                       recalculate_user=True)

pd.DataFrame([data.name.loc[data.job_id == r].iloc[0] for r, s in recs])
```
- Example #1: We designed a new user who opened application forms for software-engineering positions in Lyon and Paris, and the recommendations seemed to hold! The first recommended job, however, was in Nice.
```
0    Java Developer/Back Office #1
1    Développeur PHP Symfony
2    Senior Software Engineer Backend Paris
3    Senior Platform Engineer Lyon
4    Backend Developer (NodeJS / Microservices)
```
- Example #2: This is our gray sheep. They opened an application form once for a data-analyst internship and once for a telemarketer position in Abidjan. In the first output below, we can see that the telemarketer role sends a stronger signal to the algorithm than the data-analyst position. If the gray sheep opens the data-analyst application form more than once, the analyst positions gain weight in the recommendations, as shown in the second output.
```
0    Chef Magasinier (H/F) - Abidjan
1    Chargés clientèles Chat (H/F)
2    Commercial chargé de la relation fournisseurs ...
3    Approvisionneur stagiaire H/F - Abidjan
4    Approvisionneur Senior H/F - Abidjan
```
```
0    Chef Magasinier (H/F) - Abidjan
1    Data analyst
2    Data & Marketing Manager (stage)
3    Chargés clientèles Chat (H/F)
4    Data Analyst Junior (H/F) - Stage de fin d'études
```
To conclude the exploration, there is one last aspect you should consider when implementing a recommender system: catalog coverage. We want to propose relevant job recommendations to visitors, but we also want to give fair visibility to jobs that are not necessarily in the spotlight. To measure this, we can calculate the percentage of jobs that are recommended to at least one user in their top k, out of all the jobs that have user interactions.
For recommendation lists of at most 10 jobs, our catalog coverage is 21%. We will need to keep that metric in mind when we tune the algorithm's performance. Several actions could improve catalog coverage, such as using job characteristics (profession, company sector and size, contract type, experience level) to swap a recommended job for an equivalent one with low visibility.
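For reference, here is a sketch of how that coverage can be computed with the recommend function defined earlier. It uses job names as keys, since that is what recommend returns, so colliding titles make it a slight approximation:

```python
# Catalog coverage: share of jobs appearing in at least one user's top 10
recommended_jobs = set()
for user_id in data['user_id'].unique():
    recs = recommend(user_id, sparse_user_item, user_vecs, item_vecs,
                     num_items=10)
    recommended_jobs.update(recs['name'])

coverage = 100.0 * len(recommended_jobs) / data['name'].nunique()
print(coverage)  # about 21% on our dataset
```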
Conclusion
For this article we used collaborative filtering for job recommendations, employing our analytics data as implicit feedback.
From our first results, this method looks very promising, even if other parameters need to be taken into account if we want to use RS on the website, such as:
- Are the jobs still published?
- What should the default recommendations be when the RS has none for a user?
- How do we include the application-form openings from a user's current session?
- How do we measure the performance of our algorithm?
We still have exciting challenges to overcome before going live. The next problems to be addressed are hyperparameter tuning and performance evaluation (offline and online), which will provide enough material to fill a future paper on collaborative filtering. Stay tuned!