Abstract




 
   

IJE TRANSACTIONS B: Applications Vol. 31, No. 2 (February 2018) 229-241    Article in Press

PDF URL: http://www.ije.ir/Vol31/No2/B/6.pdf  
downloaded Downloaded: 0   viewed Viewed: 91

  AN EMPIRICAL COMPARISON OF DISTANCE MEASURES FOR MULTIVARIATE TIME SERIES CLUSTERING
 
A. Salarpour and H. Khotanlou
 
( Received: October 15, 2017 – Accepted: November 30, 2017 )
 
 

Abstract    Multivariate time series (MTS) data are ubiquitous in science and daily life, and how to measure their similarity is a core part of MTS analyzing process. Many of the research efforts in this context have focused on proposing novel similarity measures for the underlying data. However, with the countless techniques to estimate similarity between MTS, this field suffers from a lack of comparative studies using quantitative and large scale evaluations. In order to provide a comprehensive validation, an extensive evaluation of similarity measures for MTS clustering were conducted. The 14 well-known similarity measures with their variants and testing their effectiveness on 23 MTS datasets coming from a wide variety of application domains were re-implemented. In this paper, an overview of these different techniques is given and the empirical comparison regarding their effectiveness based on agglomerative clustering task is presented. Furthermore, the statistical significance tests were used to derive meaningful conclusions. It has been found that all of similarity measures are equivalent, in terms of clustering F-measure, and there is no significant difference between similarity measures based on our datasets. The results provide a comparative background between similarity measures to find the most proper method in terms of performance and computation time in this field.

 

Keywords    Multivariate time series, similarity measures, clustering, evaluation

 

چکیده    داده­ های سری­های زمانی چند متغیره در زندگی روزمره و زمینه­ های متفاوتی از علوم به وفور یافت می­ شود و چگونگی اندازه­ گیری شباهت بین این نوع سری­های زمانی یکی از بخش­های اصلی روند آنالیز سری­های زمانی است. حجم زیادی از تحقیقات انجام شده در این زمینه بر روی ارایه معیار های شباهت جدید متمرکز شده است. با وجود ارائه روش­های بسیار برای محاسبه شباهت بین سری­های زمانی، این زمینه همچنان از فقدان یک مطالعه مقایسه ­ای با استفاده از ارزیابی کمی در مقیاس مناسب رنج می­برد. بدین منظور و برای فراهم کردن یک سنجش مقایسه ای، ارزیابی گسترده ای بر روی معیار های شباهت برای خوشه بندی سری­های زمانی انجام داده­ ایم. بدین منظور، 14 معیار شباهت معتبر (به همراه تنوعات آنها) و بررسی اثرگذاری آنها بر روی 23 دیتاست سری زمانی چند متغیره (مربوط به کاربردهای مختلف) انجام شده است. در این مقاله، مقایسه تجربی در خصوص اثر بخشی معیارهای شباهت مبتنی بر خوشه­ بندی سلسه مراتبی ارائه شده است. علاوه بر این، آزمون آماری معناداری نیز برای سنجش معناداری بین بازدهی معیارهای شباهت، بکار گرفته شد. نتایج بدست آمده یک دید مقایسه ای بین معیارهای شباهت و برای یافتن روش مناسب بر اساس بازدهی و پیچیدگی زمانی فراهم می کند. همچنین نتایج نشان داد که تمامی معیارهای شباهت از لحاظ آماری و مبتنی بر آزمون غیرپارامتری اختلاف معناداری ندارند.

References    1. Keogh, E., Xi, X., Wei, L., and Ratanamahatana, C.A., 'The Ucr Time Series Classification/Clustering Homepage', URL= http://www. cs. ucr. edu/eamonn/time series data, 2006. 2. Jeong, Y.-S., Jeong, M.K., and Omitaomu, O.A., 'Weighted Dynamic Time Warping for Time Series Classification', Pattern recognition, 2011, 44, (9), pp. 2231-2240. 3. Kantz, H. and Schreiber, T., Nonlinear Time Series Analysis, (Cambridge university press, 2004) 4. Fu, Z., Hu, W., and Tan, T., 'Similarity Based Vehicle Trajectory Clustering and Anomaly Detection', in, Image Processing, 2005. ICIP 2005. IEEE International Conference on, (IEEE, 2005) 5. Liao, T.W., 'Clustering of Time Series Data—a Survey', Pattern recognition, 2005, 38, (11), pp. 1857-1874. 6. Aghabozorgi, S., Shirkhorshidi, A.S., and Wah, T.Y., 'Time-Series Clustering–a Decade Review', Information Systems, 2015, 53, pp. 16-38. 7. Kleist, C.: 'Time Series Data Mining Methods: A Review', Humboldt-Universität zu Berlin, 2015 8. Faloutsos, C., Ranganathan, M., and Manolopoulos, Y., Fast Subsequence Matching in Time-Series Databases, (ACM, 1994) 9. Rabiner, L.R. and Juang, B.-H., 'Fundamentals of Speech Recognition', 1993. 10. Berndt, D.J. and Clifford, J., 'Using Dynamic Time Warping to Find Patterns in Time Series', in, KDD workshop, (Seattle, WA, 1994) 11. Vlachos, M., Kollios, G., and Gunopulos, D., 'Discovering Similar Multidimensional Trajectories', in, Data Engineering, 2002. Proceedings. 18th International Conference on, (IEEE, 2002) 12. Chen, L. and Ng, R., 'On the Marriage of Lp-Norms and Edit Distance', in, Proceedings of the Thirtieth international conference on Very large data bases-Volume 30, (VLDB Endowment, 2004) 13. Chen, L., Özsu, M.T., and Oria, V., 'Robust and Fast Similarity Search for Moving Object Trajectories', in, Proceedings of the 2005 ACM SIGMOD international conference on Management of data, (ACM, 2005) 14. Morse, M.D. and Patel, J.M., 'An Efficient and Accurate Method for Evaluating Time Series Similarity', in, Proceedings of the 2007 ACM SIGMOD international conference on Management of data, (ACM, 2007) 15. Marteau, P.F., 'Time Warp Edit Distance with Stiffness Adjustment for Time Series Matching', IEEE Trans Pattern Anal Mach Intell, 2009, 31, (2), pp. 306-318. 16. Stefan, A., Athitsos, V., and Das, G., 'The Move-Split-Merge Metric for Time Series', IEEE transactions on Knowledge and Data Engineering, 2013, 25, (6), pp. 1425-1438. 17. Eiter, T. and Mannila, H., Computing Discrete Fréchet Distance', (Citeseer, 1994) 18. Alt, H. and Godau, M., 'Computing the Fréchet Distance between Two Polygonal Curves', International Journal of Computational Geometry & Applications, 1995, 5, (01n02), pp. 75-91. 19. Besse, P.C., Guillouet, B., Loubes, J.-M., and Royer, F., 'Review and Perspective for Distance-Based Clustering of Vehicle Trajectories', IEEE Transactions on Intelligent Transportation Systems, 2016, 17, (11), pp. 3306-3317. 20. Górecki, T. and Łuczak, M., 'Using Derivatives in Time Series Classification', Data Mining and Knowledge Discovery, 2013, pp. 1-22. 21. Batista, G.E., Keogh, E.J., Tataw, O.M., and De Souza, V.M., 'Cid: An Efficient Complexity-Invariant Distance for Time Series', Data Mining and Knowledge Discovery, 2014, 28, (3), pp. 634-669. 22. Ding, H., Trajcevski, G., Scheuermann, P., Wang, X., and Keogh, E., 'Querying and Mining of Time Series Data: Experimental Comparison of Representations and Distance Measures', Proceedings of the VLDB Endowment, 2008, 1, (2), pp. 1542-1552. 23. Wang, X., Mueen, A., Ding, H., Trajcevski, G., Scheuermann, P., and Keogh, E., 'Experimental Comparison of Representation Methods and Distance Measures for Time Series Data', Data Mining and Knowledge Discovery, 2013, pp. 1-35. 24. Serra, J. and Arcos, J.L., 'An Empirical Evaluation of Similarity Measures for Time Series Classification', Knowledge-Based Systems, 2014, 67, pp. 305-314. 25. Lines, J. and Bagnall, A., 'Time Series Classification with Ensembles of Elastic Distance Measures', Data Mining and Knowledge Discovery, 2015, 29, (3), pp. 565-592. 26. Bagnall, A., Bostrom, A., Large, J., and Lines, J., 'The Great Time Series Classification Bake Off: An Experimental Evaluation of Recently Proposed Algorithms', Extended Version. CoRR, abs/1602.01711, 2016. 27. Deza, M.M. and Deza, E., Encyclopedia of Distances', Encyclopedia of Distances, (Springer, 2009) 28. Wei, W.W.-S., Time Series Analysis, (Addison-Wesley publ Reading, 1994) 29. Levenshtein, V.I., 'Binary Codes Capable of Correcting Deletions, Insertions, and Reversals', in, Soviet physics doklady, (1966) 30. Bellman, R., 'Dynamic Programming and Lagrange Multipliers', Proc Natl Acad Sci U S A, 1956, 42, (10), pp. 767-769. 31. Sakoe, H. and Chiba, S., 'Dynamic Programming Algorithm Optimization for Spoken Word Recognition', IEEE transactions on acoustics, speech, and signal processing, 1978, 26, (1), pp. 43-49. 32. Hausdorff, F., Mengenlehre, (Walter de Gruyter Berlin, 1927) 33. Mori, U., Mendiburu, A., and Lozano, J.A., 'Similarity Measure Selection for Clustering Time Series Databases', IEEE transactions on Knowledge and Data Engineering, 2016, 28, (1), pp. 181-195. 34. Yuan, G., Sun, P., Zhao, J., Li, D., and Wang, C., 'A Review of Moving Object Trajectory Clustering Algorithms', Artificial Intelligence Review, 2017, 47, (1), pp. 123-144. 35. Wagner, S. and Wagner, D., Comparing Clusterings: An Overview, (Universität Karlsruhe, Fakultät für Informatik Karlsruhe, 2007) 36. Zhang, Z., Huang, K., and Tan, T., 'Comparison of Similarity Measures for Trajectory Clustering in Outdoor Surveillance Scenes', in, Pattern Recognition, 2006. ICPR 2006. 18th International Conference on, (IEEE, 2006) 37. Kadous, W., 'Grasp: Recognition of Australian Sign Language Using Instrumented Gloves', 1995. 38. Hu, W., Li, X., Tian, G., Maybank, S., and Zhang, Z., 'An Incremental Dpmm-Based Method for Trajectory Clustering, Modeling, and Retrieval', IEEE Trans Pattern Anal Mach Intell, 2013,35, (5), pp. 1051-1065. 39. Morris, B. and Trivedi, M., 'Learning Trajectory Patterns by Clustering: Experimental Studies and Comparative Evaluation', in, Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, (IEEE, 2009) 40. Beyan, C. and Fisher, R.B., 'Detection of Abnormal Fish Trajectories Using a Clustering Based Hierarchical Classifier', in, BMVC, (2013) 41. Ramos-Garijo, R., Martín, S., Marzal, A., Prat, F., Vilar, J.M., and Llorens, D., 'An Input Panel and Recognition Engine for on-Line Handwritten Text Recognition', Frontiers in Artificial Intelligence and Applications, 2007, 163, p. 223. 42. Calderara, S., Prati, A., and Cucchiara, R., 'Mixtures of Von Mises Distributions for People Trajectory Shape Analysis', IEEE Transactions on Circuits and Systems for Video Technology, 2011, 21, (4), pp. 457-471. 43. Yeung, D.-Y., Chang, H., Xiong, Y., George, S., Kashi, R., Matsumoto, T., and Rigoll, G., Svc2004: First International Signature Verification Competition', Biometric Authentication, (Springer, 2004) 44. Demšar, J., 'Statistical Comparisons of Classifiers over Multiple Data Sets', Journal of Machine learning research, 2006, 7, (Jan), pp. 1-30. 45. García, S., Fernández, A., Luengo, J., and Herrera, F., 'Advanced Nonparametric Tests for Multiple Comparisons in the Design of Experiments in Computational Intelligence and Data Mining: Experimental Analysis of Power', Information Sciences, 2010, 180, (10), pp. 2044-2064. 46. Friedman, M., 'A Comparison of Alternative Tests of Significance for the Problem of M Rankings', The Annals of Mathematical Statistics, 1940, 11, (1), pp. 86-92. 47. Holland, B.S. and Copenhaver, M.D., 'An Improved Sequentially Rejective Bonferroni Test Procedure', Biometrics, 1987, pp. 417-423.


Download PDF 



International Journal of Engineering
E-mail: office@ije.ir
Web Site: http://www.ije.ir