Review of the Implications of Uploading Unverified Datasets to a Data Banking Site (A Case Study of Kaggle)

This review paper comprehensively details the methodologies involved in data analysis and the evaluation steps. It shows that steps and phases are the two main methodological parameters to be considered during data assessment if high-quality data are to be obtained. The review finds that poor data quality is typically caused by incompleteness, inconsistency, integrity issues and time-related dimensions, and that the four major factors that cause errors in a dataset are duplication, commutative entries, incorrect values and blank entries, all of which can lead to catastrophic outcomes. The paper also reviews the types of datasets, the techniques adopted to ensure good data quality, and the types of data measurement and their classifications. Furthermore, the Kaggle site is used as a case study to show the trend of data growth and its consequences for the world and for data bankers. It is deduced that low data quality, which results from errors during primary data mining and data entry, leads to wrong results and therefore to wrong conclusions. It is advised that data bankers such as Kaggle adopt rigorous data quality measures before uploading data to their sites, to avoid catastrophe and harm to humans. Finally, the solutions outlined in this review will serve as a guide for data bankers and miners seeking data that are of high quality, fit for use and free of defects.
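As a minimal illustration of the kind of pre-upload screening the review recommends, the sketch below checks a tabular dataset for the four error factors named above: duplication, blank entries, incorrect values and time-related inconsistencies. It is only a sketch under assumed conditions; the file name, the column names ("age", "recorded_at") and the value ranges are hypothetical, and pandas is assumed as the tooling rather than anything prescribed by the paper.

```python
# Minimal pre-upload data-quality screen (illustrative sketch only).
# Assumes a tabular CSV and pandas; file and column names are hypothetical.
import pandas as pd

df = pd.read_csv("candidate_dataset.csv")  # hypothetical dataset file

report = {}

# 1. Duplication: fully repeated rows inflate counts and bias statistics.
report["duplicate_rows"] = int(df.duplicated().sum())

# 2. Blank entries: missing values per column (incompleteness dimension).
report["blank_entries_per_column"] = df.isna().sum().to_dict()

# 3. Incorrect values: a simple range check on a hypothetical numeric column.
if "age" in df.columns:
    report["out_of_range_age"] = int(((df["age"] < 0) | (df["age"] > 120)).sum())

# 4. Time-related dimension: timestamps that fail to parse or lie in the future.
if "recorded_at" in df.columns:
    ts = pd.to_datetime(df["recorded_at"], errors="coerce")
    report["unparseable_timestamps"] = int(ts.isna().sum())
    report["future_timestamps"] = int((ts > pd.Timestamp.now()).sum())

for check, result in report.items():
    print(check, "->", result)
```

A data banker could run a screen of this sort against each submission and hold back, or flag, any dataset whose report shows non-zero counts, which is one concrete way to operationalise the quality gate the paper advises.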