A Guide to Surprise – Python Tool for Recommendation Systems
Building a recommendation system from scratch is a tedious task, as it involves many preprocessing steps and requires sophisticated coding skills. Many open source toolkits are available that provide strong performance for a variety of recommender systems. Rather than maintaining a large custom codebase, in this article we will see how to build a recommendation system using Surprise, a Python scikit package. Before we jump to the Surprise implementation, we will first cover the background of recommender systems and the packages that can be used to build them. The main points covered in this article are listed below.
- Recommendation systems
- Approaches to the recommendation system
- Open source packages for recommendation systems
- The surprise package
- Implementation of recommendation systems with Surprise
First of all, we will quickly look at what a recommendation system is, along with its advantages and approaches.
Recommendation systems are computer programs that make recommendations to users based on various parameters. These systems predict the products that users are most likely to buy or be interested in. Netflix, Amazon, and other companies use recommendation systems to help their users find the right product or movie.
A recommendation system filters a large volume of data, focusing on the most important information based on what the user provides as well as other factors such as user preferences and interests. To offer recommendations, it determines the compatibility of the user and the item, as well as the similarities between users and between products.
Common applications include playlist builders for video and music services, product recommenders for online stores, content recommenders for social media platforms, and open web content recommenders. These systems can operate within a single platform or across platforms, with a single type of input, such as music, or multiple types, such as news, books, and search queries.
Recommendation systems have the following advantages:
- Users can find items of interest to them.
- Item providers can get their products in front of the right people.
- Users can identify the products most relevant to them.
- Content is tailored to the individual.
- Websites can increase user engagement.
Approaches to the recommendation system
Collaborative filtering is a popular method for building recommendation systems. It is based on the premise that people who agreed in the past will agree in the future, and that they will prefer the same kinds of products they preferred in the past. The technique generates suggestions based solely on the rating profiles of different users or items. It searches for peer users or items whose rating histories are similar to those of the current user or item, and generates recommendations from this neighborhood.
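The neighborhood idea described above can be sketched in a few lines of plain Python. This is a minimal illustration, not how any particular library implements it: the toy ratings, the choice of cosine similarity over co-rated items, and the helper names `cosine_sim` and `predict` are all assumptions made for the example.

```python
from math import sqrt

# Toy rating profiles (hypothetical data): user -> {item: rating on a 1-5 scale}
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob":   {"A": 4, "B": 2, "C": 5, "D": 4},
    "carol": {"A": 1, "B": 5, "D": 2},
}

def cosine_sim(u, v):
    """Cosine similarity computed over the items both users have rated."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[v][i] for i in common)
    norm_u = sqrt(sum(ratings[u][i] ** 2 for i in common))
    norm_v = sqrt(sum(ratings[v][i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict(user, item):
    """Similarity-weighted average of the ratings that neighbors gave to `item`."""
    neighbors = [
        (cosine_sim(user, other), profile[item])
        for other, profile in ratings.items()
        if other != user and item in profile
    ]
    total_sim = sum(sim for sim, _ in neighbors)
    if total_sim == 0:
        return None  # no similar user has rated this item
    return sum(sim * rating for sim, rating in neighbors) / total_sim

# Alice has not rated item D; her estimate leans toward Bob's rating,
# because Bob's rating history is closer to hers than Carol's is.
print(predict("alice", "D"))
```

Only the rating profiles are used here; no item descriptions or user attributes are needed, which is the defining property of collaborative filtering.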
Another widely used method for building recommendation systems is content-based filtering. Content-based filtering systems use the description of an item and a profile of the user's preferences. These methods work well when there is known data about an item (name, location, description, etc.) but not about the user. Content-based recommenders treat recommendation as a user-specific classification problem, learning a classifier for a user's likes and dislikes based on the properties of an item.
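As a rough sketch of the content-based idea, the example below builds a user profile from item features and ranks unseen items against it. Note that it swaps the classifier mentioned above for a simpler profile-similarity ranking; the genre features, item names, and the helpers `build_profile` and `recommend` are hypothetical and chosen only for illustration.

```python
from math import sqrt

# Hypothetical item descriptions encoded as genre indicators: [action, comedy, romance]
item_features = {
    "Movie A": [1, 0, 0],
    "Movie B": [1, 1, 0],
    "Movie C": [0, 1, 1],
    "Movie D": [0, 0, 1],
}

def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def build_profile(liked_items):
    """User profile = average feature vector of the items the user liked."""
    vectors = [item_features[name] for name in liked_items]
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def recommend(liked_items, k=2):
    """Rank items the user has not seen by similarity to the user's profile."""
    profile = build_profile(liked_items)
    candidates = [name for name in item_features if name not in liked_items]
    candidates.sort(key=lambda name: cosine(profile, item_features[name]), reverse=True)
    return candidates[:k]

print(recommend(["Movie A", "Movie B"]))
```

Because the ranking depends only on item properties and one user's own likes, this approach needs no data from other users.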
Session-based recommendation system
These recommender systems generate recommendations from a user’s interactions during a session. YouTube and Amazon both use session-based recommendation systems. They are especially valuable when a user’s history (such as previous clicks or transactions) is not available or not relevant in the current session. Video, e-commerce, travel, and music are all examples of areas where session-based suggestions are useful. Most session-based recommendation systems rely on the sequence of recent interactions within a session without requiring additional user information (history, demographics).
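One simple way to use the sequence of interactions within a session is to count which item tends to follow which. The sketch below is an assumption-laden illustration of that idea (a first-order transition count), not a description of what YouTube or Amazon actually run; the session data and the helper `recommend_next` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical click sessions; no user identity or long-term history is used.
sessions = [
    ["phone", "case", "charger"],
    ["phone", "charger", "earbuds"],
    ["laptop", "mouse"],
    ["phone", "case"],
]

# Count how often one item was clicked immediately after another.
transitions = defaultdict(Counter)
for session in sessions:
    for current_item, next_item in zip(session, session[1:]):
        transitions[current_item][next_item] += 1

def recommend_next(current_session, k=2):
    """Suggest the items that most often followed the last item in the session."""
    last_item = current_session[-1]
    seen = set(current_session)
    ranked = transitions.get(last_item, Counter()).most_common()
    return [item for item, _ in ranked if item not in seen][:k]

print(recommend_next(["phone"]))
```

The recommendation depends only on the current session, so it still works for anonymous or first-time visitors.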
Multi-criteria recommendation system
Multi-criteria recommendation systems (MCRS) are recommendation systems that take many factors into account when formulating recommendations. Rather than developing recommendation techniques based on a single criterion value, such as the overall preference of user u for an item i, these systems attempt to predict a score for items that u has not yet explored by leveraging preference information on the several criteria that influence this overall preference value. Several researchers see MCRS as a multi-criteria decision-making (MCDM) problem and build MCRS systems using MCDM approaches and techniques.
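A very small sketch of the multi-criteria idea is to combine per-criterion scores into one overall score with an aggregation function. The weighted sum used here is just one possible aggregation, and the hotel data, criteria names, and weights are hypothetical values assumed for the example.

```python
# Hypothetical per-criterion scores predicted for items user u has not explored yet.
criteria_scores = {
    "Hotel X": {"cleanliness": 4.5, "location": 3.0, "price": 4.0},
    "Hotel Y": {"cleanliness": 3.5, "location": 5.0, "price": 2.5},
}

# Assumed importance of each criterion for user u (weights sum to 1).
weights = {"cleanliness": 0.5, "location": 0.2, "price": 0.3}

def overall_score(scores, weights):
    """Aggregate criterion-level scores into a single overall preference value."""
    return sum(weights[criterion] * scores[criterion] for criterion in weights)

# Rank the unexplored items by their aggregated score, best first.
ranked = sorted(
    criteria_scores,
    key=lambda item: overall_score(criteria_scores[item], weights),
    reverse=True,
)
print(ranked)
```

In practice the weights would be learned or elicited per user rather than fixed by hand as they are here.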
Open Source Packages for Recommendation Systems
Let’s take a look at some popular Python packages that the community and researchers use to build recommendation systems.
LensKit is a free and open source framework for developing, investigating and learning recommender systems. It supports the development, execution and evaluation of recommendation algorithms in a flexible way suitable for research and teaching. LensKit for Python (LKPY) is the Python-based successor to the Java-based LensKit toolkit and a component of the LensKit project. LKPY enables the creation of robust, adaptable and repeatable experiments that take advantage of the large and developing PyData and Scientific Python ecosystems, such as scikit-learn and TensorFlow.
Crab is a Python recommendation engine that integrates classic information-filtering recommendation algorithms with scientific Python libraries, including NumPy, SciPy, and Matplotlib. It is also known as scikits.recommender, and it aims to provide a comprehensive set of components from which one can build a custom recommendation system out of algorithms suited to various situations. User-based filtering, item-based filtering, and other features are available in Crab.
TensorRec is a Python recommendation system that allows you to quickly create and customize recommender systems using TensorFlow. User features, item features, and interactions are the three types of data that a TensorRec system consumes. It learns how to produce and rank recommendations from this data. TensorRec learns by comparing the scores it generates to actual interactions between users and items, such as likes and dislikes.
For more details on similar packages, you can follow this post.
The surprise package
Surprise is a Python module that allows you to build and test rating prediction systems. It was designed to closely follow the scikit-learn API, which should be familiar to users of the Python machine learning ecosystem. Surprise includes a set of estimators (or prediction algorithms) for rating prediction. Classical techniques are implemented, such as the main similarity-based (neighborhood) algorithms and matrix factorization algorithms like SVD and NMF.
It also includes model evaluation tools, such as cross-validation iterators and built-in metrics in the style of scikit-learn, as well as grid search and randomized search for model selection and automatic hyper-parameter search. Users can develop their own recommendation technique with less code thanks to basic primitives and a lightweight API.
Classic datasets, such as the MovieLens datasets, are available directly in the package, and user-defined datasets can be loaded from CSV files or from pandas dataframes. Surprise is mostly written in Python, with Cython used to optimize the computationally heavy parts. Internally, Surprise uses NumPy arrays and built-in Python data structures (mostly dictionaries).
Surprise was created to help researchers quickly test new recommendation ideas by allowing them to create bespoke prediction algorithms, but it can also be used as a learning resource for students and less experienced users thanks to its thorough documentation.
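To make the matrix factorization family mentioned above more concrete, here is a minimal pure-Python sketch of a biased factorization model trained with stochastic gradient descent. It is not Surprise's implementation; the toy ratings, the number of factors, the learning rate, and the regularization value are all assumptions picked for illustration.

```python
import random
from math import sqrt

# Hypothetical (user, item, rating) triples on a 1-5 scale.
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
n_users, n_items, n_factors = 3, 3, 2

random.seed(0)
mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
bu = [0.0] * n_users  # user biases
bi = [0.0] * n_items  # item biases
# Small random latent factors for users (p) and items (q).
p = [[random.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
q = [[random.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_items)]

def predict(u, i):
    """Estimated rating: global mean + user bias + item bias + factor dot product."""
    return mu + bu[u] + bi[i] + sum(pf * qf for pf, qf in zip(p[u], q[i]))

def train_rmse():
    """Root mean squared error on the training ratings."""
    return sqrt(sum((r - predict(u, i)) ** 2 for u, i, r in ratings) / len(ratings))

initial_rmse = train_rmse()
lr, reg = 0.01, 0.02  # assumed learning rate and regularization strength
for _ in range(200):
    for u, i, r in ratings:
        err = r - predict(u, i)
        bu[u] += lr * (err - reg * bu[u])
        bi[i] += lr * (err - reg * bi[i])
        for f in range(n_factors):
            pu_f, qi_f = p[u][f], q[i][f]  # keep old values for a simultaneous update
            p[u][f] += lr * (err * qi_f - reg * pu_f)
            q[i][f] += lr * (err * pu_f - reg * qi_f)

final_rmse = train_rmse()
print(f"training RMSE before: {initial_rmse:.3f}, after: {final_rmse:.3f}")
```

With a library such as Surprise, all of this bookkeeping sits behind a single estimator object, which is what keeps user code short.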
Implementation of recommendation systems with Surprise
Here, we’ll look at a quick example of how to load a dataset, split it into five folds for cross-validation, and compute the mean absolute error (MAE) and root mean squared error (RMSE) of the SVD algorithm.
!pip install surprise

from surprise import SVD
from surprise import Dataset
from surprise.model_selection import cross_validate

# load the data
data = Dataset.load_builtin('ml-100k')

# load algorithm
algo = SVD()

# train and validate
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
If the MovieLens 100k dataset has not already been downloaded, the load_builtin() method will offer to download it and save it in the .surprise_data folder in your home directory (you can also choose to save it elsewhere).
Here we are using the well-known SVD algorithm, although many alternatives are available. For more information, see Using Prediction Algorithms in the Surprise documentation. The cross_validate() function runs a cross-validation procedure according to the cv argument and computes several accuracy measures. Here we use a classical 5-fold cross-validation procedure, although other iterators can be used.
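To show what a 5-fold cross-validation with MAE and RMSE computes in principle, here is a small pure-Python sketch. It does not reproduce Surprise's internals: the toy ratings, the `k_fold` helper, and the stand-in "algorithm" (predicting the training mean) are assumptions made for the example.

```python
import random
from math import sqrt

# Hypothetical ratings; in Surprise these would come from a Dataset object.
ratings = [3, 4, 5, 2, 4, 1, 5, 3, 4, 2]

def k_fold(data, k, seed=0):
    """Yield (train, test) splits; every rating lands in exactly one test fold."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train, test

def mae(errors):
    """Mean absolute error."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Root mean squared error."""
    return sqrt(sum(e * e for e in errors) / len(errors))

results = []
for train, test in k_fold(ratings, k=5):
    prediction = sum(train) / len(train)  # stand-in algorithm: predict the training mean
    errors = [r - prediction for r in test]
    results.append((mae(errors), rmse(errors)))

for fold, (fold_mae, fold_rmse) in enumerate(results, start=1):
    print(f"Fold {fold}: MAE={fold_mae:.3f}  RMSE={fold_rmse:.3f}")
```

RMSE penalizes large errors more heavily than MAE, so it is never smaller than MAE on the same fold; reporting both gives a sense of how uneven the errors are.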
In this article we have seen what a recommender system is and the different approaches that are taken depending on the type of system needed. Besides the approaches, we looked at common and widely used Python toolkits for building state-of-the-art systems that let developers keep a small code base. Finally, we looked at one such toolkit, Surprise, whose API closely resembles that of scikit-learn, which gives us a straightforward way to build and evaluate recommendation systems with familiar conventions.