Ontology for Inter-Research Paper Similarity Measures (COReS) and its Applications


A large number of research papers are published and indexed regularly by systems such as search engines, citation indexers, and digital libraries, enabling researchers to explore them. Many users are frustrated by the large number of results returned for similar research papers, many of which are not similar at all. A careful analysis of these systems reveals a major problem: research paper similarity measuring techniques have not been conceptually modelled in a way that finds similarity measures with reasonable accuracy. To solve this problem, an ontology that models the domain of research paper similarity measures is required. While surveying content based similarity measuring techniques, we found that these techniques were not integrated with each other to form a hybrid technique free of overlaps and redundancies in methods and features. We also surveyed ontologies relevant to the research paper similarity measures domain and found that none of them models this domain. In this thesis, content based similarity measuring techniques are modelled as an ontology named COReS (Content based Ontology for Research paper Similarity), which has been evaluated using automated evaluation tools and user study based evaluation techniques. An important application of COReS, demonstrated through four use cases, is computing research paper similarity measures comprehensively by using knowledge about the relationships between different similarity measuring techniques. An experiment was also performed on a gold standard data set of research papers to compute comprehensive similarity measures using COReS. The Fractional Regression Coefficient (Percentage Difference) between the user study based similarity measure (used as a benchmark) and the comprehensive similarity measure was then computed.
The comprehensive similarity measure was found to be more correlated with the user study based similarity measure, with a Fractional Regression Coefficient of 47%, than the vector space based and InText citation based similarity measuring techniques and their combinations. COReS models only content based similarity measuring techniques; the model can be extended to other similarity measuring techniques, for example Collaborative Filtering and Item Centric approaches. COReS can also be aligned with other relevant ontologies, such as SPAR, to encourage its adoption by the community.
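The abstract does not specify how the comprehensive measure combines individual techniques or exactly how the Fractional Regression Coefficient is computed, so the following is only a minimal illustrative sketch of that kind of comparison under stated assumptions. It assumes vector space similarity is the cosine of raw term-frequency vectors, treats the InText citation based score as a precomputed number in [0, 1], combines the two with a hypothetical linear weighting (`combined_similarity`, with an assumed weight of 0.5), and uses the common percentage-difference formula |a - b| / ((a + b) / 2) x 100 as a stand-in for the Fractional Regression Coefficient. None of these function names or formulas are taken from COReS itself.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Vector space similarity: cosine of raw term-frequency vectors (assumed tokenization: lowercase + whitespace split)."""
    tf_a = Counter(text_a.lower().split())
    tf_b = Counter(text_b.lower().split())
    # Dot product over terms that occur in both documents.
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def combined_similarity(vector_sim: float, citation_sim: float, weight: float = 0.5) -> float:
    """Hypothetical linear combination of two similarity scores; the weight is an assumption."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return weight * vector_sim + (1.0 - weight) * citation_sim


def percentage_difference(benchmark: float, measured: float) -> float:
    """|a - b| / mean(a, b) * 100, assuming non-negative scores; smaller means closer to the benchmark."""
    mean = (benchmark + measured) / 2.0
    if mean == 0:
        return 0.0  # both scores are zero, so they agree exactly
    return abs(benchmark - measured) / mean * 100.0


if __name__ == "__main__":
    # Hypothetical example: two short "papers", an assumed citation score, and an assumed user-study score.
    vs = cosine_similarity("ontology for research paper similarity",
                           "research paper similarity using citations")
    combined = combined_similarity(vs, citation_sim=0.4)
    user_study = 0.7
    print(f"vector space: {vs:.2f}, combined: {combined:.2f}")
    print(f"percentage difference vs. user study: {percentage_difference(user_study, combined):.1f}%")
```

In this sketch, a technique whose scores sit closer to the user study benchmark yields a smaller percentage difference; swapping in a different combination rule only requires replacing `combined_similarity`.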
