A Hybridized Recommendation System On Movie Data Using Content-Based And Collaborative Filtering

ABSTRACT

In recent times, the rapid growth of information available on the internet has produced large volumes of data and a growing number of online users. Recommendation systems have been employed to help users make informed and accurate decisions amid this abundance of information. In this research, we propose a hybrid recommender engine that combines content-based and collaborative filtering recommendations, and we explore how prediction accuracy can be enhanced in existing collaborative filtering frameworks. Specifically, we investigate whether a recommendation system that combines content-based and collaborative filtering, implemented with the Mahout framework and built on Hadoop, improves recommendation accuracy and alleviates the scalability issues currently experienced when processing large volumes of data to recommend items to users. We employed the feature augmentation hybrid technique, in which the output of the content-based recommendation is used as an input to collaborative filtering. The well-known MovieLens data set was matched with the Internet Movie Database (IMDB) in order to extract user and item content features. The input files generated from the integration of both databases were converted to text files, which serve as input to the collaborative filtering framework in Mahout. Through various experiments, the best parameter settings for the Mahout components were determined for our model. We further examined these models by comparing the Root Mean Square Error (RMSE) of our model against that of the state-of-the-art model. The proposed model showed significant improvement when compared with the pure collaborative model. Our analysis demonstrated that the extracted user and item content features can, in some cases, lead to better prediction accuracy.
More precisely, we found that the user feature gender has no incremental impact on our underlying model, whereas an item feature such as country is more beneficial than genre, contrary to the findings of some other research work.
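
To make the feature augmentation idea and the RMSE criterion concrete, the following is a minimal Python sketch on toy data. It is an illustration only and not the Mahout and Hadoop pipeline used in this work: the function names, the toy ratings, the use of a single country label per item, and the choice of cosine similarity are all assumptions introduced for the example.

```python
import math

# Toy data (hypothetical): user -> {item: rating}, and one content feature per item.
ratings = {
    "u1": {"m1": 5, "m2": 4},
    "u2": {"m1": 4, "m3": 2},
    "u3": {"m2": 5, "m3": 1},
}
item_features = {"m1": "US", "m2": "US", "m3": "FR"}


def rmse(predicted, actual):
    """Root mean square error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))


def content_based_fill(ratings, item_features):
    """Content-based step (assumed form): estimate a user's rating for an
    unrated item as the user's mean rating over rated items that share the
    same feature value. Returns an augmented copy; the input is not mutated."""
    augmented = {}
    for user, rated in ratings.items():
        by_feature = {}
        for item, r in rated.items():
            by_feature.setdefault(item_features[item], []).append(r)
        filled = dict(rated)
        for item, feature in item_features.items():
            if item not in filled and feature in by_feature:
                values = by_feature[feature]
                filled[item] = sum(values) / len(values)
        augmented[user] = filled
    return augmented


def cosine(u, v):
    """Cosine similarity over the items rated in both profiles."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict(ratings, user, item):
    """User-based collaborative step: similarity-weighted average of the
    ratings other users gave to `item`. Returns None if no neighbour rated it."""
    # Exclude the target item from the user's own profile when comparing.
    profile = {i: r for i, r in ratings[user].items() if i != item}
    num = den = 0.0
    for other, rated in ratings.items():
        if other == user or item not in rated:
            continue
        sim = cosine(profile, rated)
        num += sim * rated[item]
        den += abs(sim)
    return num / den if den else None


if __name__ == "__main__":
    # Feature augmentation: content-based output becomes collaborative input.
    augmented = content_based_fill(ratings, item_features)
    print(predict(augmented, "u1", "m3"))
```

In this sketch the content-based estimates densify the rating data before the collaborative step runs, so neighbours share more co-rated items when similarities are computed. Comparing `rmse` on held-out ratings for predictions made from `ratings` and from `augmented` mirrors, in miniature, the comparison between the pure collaborative model and the hybrid model.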