CONCEPTUAL KNOWLEDGE MODEL FOR IMPROVING TERM SIMILARITY IN RETRIEVAL OF WEB DOCUMENTS

Abstract

Term Similarity (TS) in retrieval systems is based on lexical matching, which determines whether query terms are useful and reflect the users' information need in related domains. Existing work on TS uses Term Frequency-Inverse Document Frequency (TF-IDF) to determine the occurrence of terms in web documents (snippets), an approach that is incapable of capturing semantic language mismatch. This study was designed to develop a conceptual knowledge model to solve the problem of TS in web document retrieval by amplifying a structured semantic network in Multiple Document Sources (MDSs) to reduce mismatch in retrieval results. Four hundred and forty-two IS-A hierarchy concepts were extracted from the Internet using a web ontology language. These hierarchies were structured in MDSs to determine similarities. The concepts were used to formulate queries with the addition of terms from the knowledge domain. Suffix Tree Clustering (STC) was adapted to cluster and structure the web and to reduce the dimensionality of features. The IS-A hierarchy concept of parent and child relationships was incorporated into the STC to select the best cluster, consisting of 100 snippets, four web page counts and WordNet as MDSs. Similarity was estimated using the Cosine, Euclidean and Radial Basis Function (RBF) measures on the TF-IDF. Based on STC, TF-IDF was modified to develop Concept Weighting (CW) estimation on snippets and web page counts. Similarity was compared between TF-IDF and the developed Concept Weighting: Cosine and CW-Cosine, Euclidean and CW-Euclidean, and RBF and CW-RBF. The semantic network (WordNetSimilarity) LIn measure was extended with the PAth length of the taxonomy concept to develop LIPA. LIPA was compared with other WordNetSimilarity distance measures, Jiang and Conrath (JCN) and Wu and Palmer (WUP), as well as with LIn and PAth length separately.
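
The baseline described above (TF-IDF vectors compared with Cosine, Euclidean and RBF measures) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the sample snippets, the function names, the 1/(1 + d) mapping from Euclidean distance to a similarity, and the RBF parameter `gamma` are all assumptions made for illustration.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Return (vocabulary, vectors): one TF-IDF vector per tokenised document."""
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # Relative term frequency times log inverse document frequency;
        # an empty document yields an all-zero vector.
        vec = [
            (counts[t] / len(doc)) * math.log(n_docs / df[t]) if doc else 0.0
            for t in vocab
        ]
        vectors.append(vec)
    return vocab, vectors

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def euclidean_similarity(a, b):
    """Euclidean distance mapped to (0, 1] via 1 / (1 + d) (an assumed mapping)."""
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + dist)

def rbf_similarity(a, b, gamma=1.0):
    """RBF kernel exp(-gamma * ||a - b||^2); gamma is an assumed parameter."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)

# Hypothetical tokenised snippets, used only to exercise the functions.
snippets = [
    ["jaguar", "car", "engine"],
    ["jaguar", "cat", "animal"],
    ["car", "engine", "vehicle"],
]
vocab, vectors = tf_idf_vectors(snippets)
```

In this toy data the first and third snippets share two terms while the second and third share none, so `cosine_similarity(vectors[0], vectors[2])` is larger than `cosine_similarity(vectors[1], vectors[2])`. An RBF kernel returns 1.0 for any pair of identical vectors, including two all-zero vectors.
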
Concept Weighting and WordNetSimilarity scores were combined using machine learning techniques to obtain a robust semantic similarity score, with accuracy measured by Mean Absolute Error (MAE). The RBF and CW-RBF generated inconsistent values (x in the range 0.9 to 1) for null and zero snippets. The similarity estimates obtained with Cosine, Euclidean, CW-Cosine and CW-Euclidean were 0.881, 0.446, 0.950 and 0.964, respectively. The retrieved snippets removed irrelevant features and enhanced precision. The WordNetSimilarity JCN, WUP, LIn, PAth and LIPA values were 0.868, 0.953, 0.995, 0.955 and 0.998, respectively.
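
For reference, Mean Absolute Error is the average absolute difference between predicted and reference scores. The sketch below shows the standard definition only; the function name and the sample values are hypothetical, and the abstract does not say which reference scores the thesis used.

```python
def mean_absolute_error(predicted, reference):
    """Average of |predicted_i - reference_i| over paired scores."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)

# Hypothetical similarity scores and reference judgements, for illustration only.
example_mae = mean_absolute_error([0.9, 0.5, 0.2], [1.0, 0.5, 0.0])
```

A lower MAE means the estimated similarity scores lie closer to the reference scores.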

The WordNetSimilarity improved the semantic similarity of concepts. The Concept Weighting and WordNetSimilarity measures (CW-Cosine, CW-Euclidean, JCN, WUP, LIn, PAth and LIPA) were combined to generate similarity coefficient scores of 0.941, 0.944, 0.661, 0.928, 0.996, 0.924 and 0.998, respectively. The MAE values for Cosine, Euclidean, CW-Cosine and CW-Euclidean were 0.058, 0.011, 0.014 and 0.009, respectively, while those for JCN, WUP, LIn, PAth and LIPA were 0.022, 0.004, 0.022, 0.019 and 0.020, respectively. The accuracy values of the combined similarity for JCN, WUP, LIn, PAth, CW-Cosine, CW-Euclidean and LIPA were 0.023, 0.050, 0.008, 0.011, 0.024, 0.015 and 0.009, respectively. The developed conceptual knowledge model improved retrieval of web documents with structured multiple document sources. This improved the precision of the information retrieval system and solved the problem of semantic language mismatch with robust similarity between the terms.
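
The PAth and LIn measures named in the abstract can be sketched on a toy IS-A hierarchy using their standard definitions: path similarity as 1 / (1 + number of edges on the shortest path), and Lin similarity as 2 * IC(lcs) / (IC(a) + IC(b)), where IC is information content and lcs is the lowest common subsumer. The hierarchy, the concept counts and all function names below are hypothetical. The abstract does not give the formula that combines the two into LIPA, so no combination is attempted here.

```python
import math

# Hypothetical IS-A hierarchy, stored as child -> parent.
PARENT = {
    "animal": "entity", "vehicle": "entity",
    "cat": "animal", "dog": "animal", "car": "vehicle",
}
# Hypothetical corpus counts for each concept's own occurrences.
COUNTS = {"entity": 0, "animal": 2, "vehicle": 2, "cat": 4, "dog": 4, "car": 4}

def ancestors(concept):
    """Chain from the concept up to the root, the concept itself first."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def lowest_common_subsumer(a, b):
    """Deepest concept that is an ancestor of (or equal to) both a and b."""
    ancestors_b = set(ancestors(b))
    for node in ancestors(a):
        if node in ancestors_b:
            return node
    raise ValueError("concepts share no ancestor")

def path_similarity(a, b):
    """1 / (1 + edges on the shortest IS-A path between a and b)."""
    lcs = lowest_common_subsumer(a, b)
    edges = ancestors(a).index(lcs) + ancestors(b).index(lcs)
    return 1.0 / (1.0 + edges)

def information_content(concept):
    """-log p(concept), where p counts the concept and all its descendants."""
    total = sum(COUNTS.values())
    subsumed = sum(n for node, n in COUNTS.items() if concept in ancestors(node))
    return math.log(total / subsumed)

def lin_similarity(a, b):
    """Lin measure: 2 * IC(lcs) / (IC(a) + IC(b)); 0.0 if both ICs are zero."""
    denominator = information_content(a) + information_content(b)
    if denominator == 0.0:
        return 0.0
    lcs = lowest_common_subsumer(a, b)
    return 2.0 * information_content(lcs) / denominator
```

With this toy data, "cat" and "dog" meet at "animal" and score higher on both measures than "cat" and "car", which meet only at the root.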

APA

ABDULLAHI, K (2021). CONCEPTUAL KNOWLEDGE MODEL FOR IMPROVING TERM SIMILARITY IN RETRIEVAL OF WEB DOCUMENTS. Afribary. Retrieved from https://tracking.afribary.com/works/conceptual-knowledge-model-for-improving-term-similarity-in-retrieval-of-web-documents

MLA 8th

ABDULLAHI, KHADIJHA-KUBURAT "CONCEPTUAL KNOWLEDGE MODEL FOR IMPROVING TERM SIMILARITY IN RETRIEVAL OF WEB DOCUMENTS" Afribary. Afribary, 20 Mar. 2021, https://tracking.afribary.com/works/conceptual-knowledge-model-for-improving-term-similarity-in-retrieval-of-web-documents. Accessed 13 Nov. 2024.

MLA7

ABDULLAHI, KHADIJHA-KUBURAT. "CONCEPTUAL KNOWLEDGE MODEL FOR IMPROVING TERM SIMILARITY IN RETRIEVAL OF WEB DOCUMENTS". Afribary, Afribary, 20 Mar. 2021. Web. 13 Nov. 2024. <https://tracking.afribary.com/works/conceptual-knowledge-model-for-improving-term-similarity-in-retrieval-of-web-documents>.

Chicago

ABDULLAHI, KHADIJHA-KUBURAT. "CONCEPTUAL KNOWLEDGE MODEL FOR IMPROVING TERM SIMILARITY IN RETRIEVAL OF WEB DOCUMENTS" Afribary (2021). Accessed November 13, 2024. https://tracking.afribary.com/works/conceptual-knowledge-model-for-improving-term-similarity-in-retrieval-of-web-documents

Document Details
Author: KHADIJHA-KUBURAT ADEBISI ABDULLAHI
Field: Computer Science
Type: Thesis
Length: 235 pages (57435 words), PDF