Abstract




 
   

IJE TRANSACTIONS B: Applications Vol. 31, No. 2 (February 2018) 250-262   

PDF URL: http://www.ije.ir/Vol31/No2/B/8-2686.pdf  
downloaded Downloaded: 61   viewed Viewed: 548

  AN EMPIRICAL COMPARISON OF DISTANCE MEASURES FOR MULTIVARIATE TIME SERIES CLUSTERING
 
A. Salarpour and H. Khotanlou
 
( Received: October 15, 2017 – Accepted in Revised Form: November 30, 2017 )
 
 

Abstract    Multivariate time series (MTS) data are ubiquitous in science and daily life, and how to measure their similarity is a core part of MTS analyzing process. Many of the research efforts in this context have focused on proposing novel similarity measures for the underlying data. However, with the countless techniques to estimate similarity between MTS, this field suffers from a lack of comparative studies using quantitative and large scale evaluations. In order to provide a comprehensive validation, an extensive evaluation of similarity measures for MTS clustering were conducted. The 14 well-known similarity measures with their variants and testing their effectiveness on 23 MTS datasets coming from a wide variety of application domains were re-implemented. In this paper, an overview of these different techniques is given and the empirical comparison regarding their effectiveness based on agglomerative clustering task is presented. Furthermore, the statistical significance tests were used to derive meaningful conclusions. It has been found that all of similarity measures are equivalent, in terms of clustering F-measure, and there is no significant difference between similarity measures based on our datasets. The results provide a comparative background between similarity measures to find the most proper method in terms of performance and computation time in this field.

 

Keywords    Multivariate time series, similarity measures, clustering, evaluation

 

چکیده    داده­ های سری­های زمانی چند متغیره در زندگی روزمره و زمینه­ های متفاوتی از علوم به وفور یافت می­ شود و چگونگی اندازه­ گیری شباهت بین این نوع سری­های زمانی یکی از بخش­های اصلی روند آنالیز سری­های زمانی است. حجم زیادی از تحقیقات انجام شده در این زمینه بر روی ارایه معیار های شباهت جدید متمرکز شده است. با وجود ارائه روش­های بسیار برای محاسبه شباهت بین سری­های زمانی، این زمینه همچنان از فقدان یک مطالعه مقایسه ­ای با استفاده از ارزیابی کمی در مقیاس مناسب رنج می­برد. بدین منظور و برای فراهم کردن یک سنجش مقایسه ای، ارزیابی گسترده ای بر روی معیار های شباهت برای خوشه بندی سری­های زمانی انجام داده­ ایم. بدین منظور، 14 معیار شباهت معتبر (به همراه تنوعات آنها) و بررسی اثرگذاری آنها بر روی 23 دیتاست سری زمانی چند متغیره (مربوط به کاربردهای مختلف) انجام شده است. در این مقاله، مقایسه تجربی در خصوص اثر بخشی معیارهای شباهت مبتنی بر خوشه­ بندی سلسه مراتبی ارائه شده است. علاوه بر این، آزمون آماری معناداری نیز برای سنجش معناداری بین بازدهی معیارهای شباهت، بکار گرفته شد. نتایج بدست آمده یک دید مقایسه ای بین معیارهای شباهت و برای یافتن روش مناسب بر اساس بازدهی و پیچیدگی زمانی فراهم می کند. همچنین نتایج نشان داد که تمامی معیارهای شباهت از لحاظ آماری و مبتنی بر آزمون غیرپارامتری اختلاف معناداری ندارند.

References   

1.      Keogh, E., Xi, X., Wei, L. and Ratanamahatana, C. A., "The UCR time series classification/clustering homepage", URL= http://www. cs. ucr. edu/eamonn/time series data Vol., (2006)

2.      Jeong, Y.-S., Jeong, M. K. and Omitaomu, O. A., "Weighted dynamic time warping for time series classification", Pattern Recognition Vol. 9, (2011), 2231-2240.

3.      Kantz, H. and Schreiber, T., "Nonlinear time series analysis", Cambridge university press,  (2004).

4.      Fu, Z., Hu, W. and Tan, T., "Similarity based vehicle trajectory clustering and anomaly detection", Image Processing, 2005. ICIP 2005. IEEE International Conference on, IEEE, (2005); Liao, T. W., "Clustering of time series data—a survey", Pattern Recognition Vol. 11, (2005), 1857-1874.

5.      Doroudyan, M., Owlia, M., Sadeghi, H. and Amiri, A., "Monitoring Financial Processes with ARMA-GARCH Model Based on Shewhart Control Chart (Case Study: Tehran Stock Exchange)", International Journal of Engineering-Transactions B: Applications Vol. 2, (2017), 270-278.

6.      Shafiee-Chafi, M. and Gholizade-Narm, H., "A novel fuzzy based method for heart rate variability prediction", International Journal of Engineering-Transactions A: Basics Vol. 7, (2014), 1041-1049.

7.      Aghabozorgi, S., Shirkhorshidi, A. S. and Wah, T. Y., "Time-series clustering–A decade review", Information Systems Vol., (2015), 16-38.

8.      Kleist, C., "Time Series Data Mining Methods: A Review", Humboldt-Universität zu Berlin,  (2015).

9.      Darvishi, A. and Hassanpour, H., "A geometric view of similarity measures in data mining", International Journal of Engineering-Transactions C: Aspects Vol. 12, (2015), 1728-1734.

10.    Faloutsos, C., Ranganathan, M. and Manolopoulos, Y., "Fast subsequence matching in time-series databases", ACM,  (1994).

11.    Rabiner, L. R. and Juang, B.-H., "Fundamentals of speech recognition",  Vol., (1993)

12.    Berndt, D. J. and Clifford, J., "Using dynamic time warping to find patterns in time series", KDD workshop, Seattle, WA, (1994).

13.    Vlachos, M., Kollios, G. and Gunopulos, D., "Discovering similar multidimensional trajectories", Data Engineering, 2002. Proceedings. 18th International Conference on, IEEE, (2002).

14.    Chen, L. and Ng, R., "On the marriage of lp-norms and edit distance", Proceedings of the Thirtieth international conference on Very large data bases-Volume 30, VLDB Endowment, (2004).

15.    Chen, L., Özsu, M. T. and Oria, V., "Robust and fast similarity search for moving object trajectories", Proceedings of the 2005 ACM SIGMOD international conference on Management of data, ACM, (2005).

16.    Morse, M. D. and Patel, J. M., "An efficient and accurate method for evaluating time series similarity", Proceedings of the 2007 ACM SIGMOD international conference on Management of data, ACM, (2007).

17.    Marteau, P. F., "Time warp edit distance with stiffness adjustment for time series matching", IEEE Trans Pattern Anal Mach Intell Vol. 2, (2009), 306-318.

18.    Stefan, A., Athitsos, V. and Das, G., "The move-split-merge metric for time series", IEEE transactions on Knowledge and Data Engineering Vol. 6, (2013), 1425-1438.

19.    Eiter, T. and Mannila, H., "Computing discrete Fréchet distance"  (1994).

20.    Alt, H. and Godau, M., "Computing the Fréchet distance between two polygonal curves", International Journal of Computational Geometry & Applications Vol. 01n02, (1995), 75-91.

21.    Besse, P. C., Guillouet, B., Loubes, J.-M. and Royer, F., "Review and Perspective for Distance-Based Clustering of Vehicle Trajectories", IEEE Transactions on Intelligent Transportation Systems Vol. 11, (2016), 3306-3317.

22.    Górecki, T. and Łuczak, M., "Using derivatives in time series classification", Data Mining and Knowledge Discovery Vol., (2013), 1-22.

23.    Batista, G. E., Keogh, E. J., Tataw, O. M. and De Souza, V. M., "CID: an efficient complexity-invariant distance for time series", Data Mining and Knowledge Discovery Vol. 3, (2014), 634-669.

24.    Ding, H., Trajcevski, G., Scheuermann, P., Wang, X. and Keogh, E., "Querying and mining of time series data: experimental comparison of representations and distance measures", Proceedings of the VLDB Endowment Vol. 2, (2008), 1542-1552.

25.    Wang, X., Mueen, A., Ding, H., Trajcevski, G., Scheuermann, P. and Keogh, E., "Experimental comparison of representation methods and distance measures for time series data", Data Mining and Knowledge Discovery Vol., (2013), 1-35; Serra, J. and Arcos, J. L., "An empirical evaluation of similarity measures for time series classification", Knowledge-Based Systems Vol., (2014), 305-314; Lines, J. and Bagnall, A., "Time series classification with ensembles of elastic distance measures", Data Mining and Knowledge Discovery Vol. 3, (2015), 565-592; Bagnall, A., Bostrom, A., Large, J. and Lines, J., "The great time series classification bake off: an experimental evaluation of recently proposed algorithms", Extended Version. CoRR, abs/1602.01711 Vol., (2016)

26.    Deza, M. M. and Deza, E., "Encyclopedia of distances", Springer,  (2009).

27.    Wei, W. W.-S., "Time series analysis", Addison-Wesley publ Reading,  (1994).

28.    Levenshtein, V. I., "Binary codes capable of correcting deletions, insertions, and reversals", Soviet physics doklady, (1966).

29.    Sakoe, H. and Chiba, S., "Dynamic programming algorithm optimization for spoken word recognition", IEEE transactions on acoustics, speech, and signal processing Vol. 1, (1978), 43-49.

30.    Hausdorff, F., "Mengenlehre", Walter de Gruyter Berlin,  (1927).


Download PDF 



International Journal of Engineering
E-mail: office@ije.ir
Web Site: http://www.ije.ir