Clustering algorithms usually rely on an initial estimate of the cores, their performance depends on the number of clusters and on the dimension of the data, and they are run offline. Therefore, when an industrial plant is treated as a highly coupled sensor network, all of these issues must be addressed. This article presents an improvement of TEDA-Cloud, a method based on Typicity and Eccentricity Data Analytics (TEDA). The proposed method (modified TEDA-Cloud) reduces the amount of data stored for merging cores and speeds up the classification of the incoming data.
Keywords: Clustering, eccentricity, typicality, sensor networks, stream data, industrial process
In general, the results of a clustering process depend on the problem at hand. If the number of cores representing the data, or the tolerance required to create a new core, is known in advance, a good classification can be obtained. However, this information is not always available. In an industrial process, for example, identifying patterns through clustering is hampered by the fact that the known methods run offline. K-Means is an example of a clustering algorithm that requires the number of cores in advance. In addition, the procedure is computationally costly, because all data are relabeled every time the cores change position. This makes the algorithm unsuitable for real-time processes.
To overcome the limitations of existing clustering methods, a new clustering method known as TEDA-Cloud has been proposed. It aims to solve these limitations by applying the concepts of eccentricity and typicity (TEDA), which allow statistical quantities of the data to be computed recursively. Even with this improvement, TEDA-Cloud still needs to store which points were assigned to each pattern, as well as the number of points that fall under the influence of two cores at once. TEDA-Cloud classifies the data using statistical measures that indicate how likely a point is to belong to each core. If the eccentricity of a point is high enough to conclude that it does not fit any existing pattern, a new core is formed. However, if the data represented by a core are widely scattered, they can reach into the region covered by another core, in which case the two cores may be merged. Whether two cores should be merged is decided by the number of points they share, and this bookkeeping gives TEDA-Cloud a high computational cost.
The modified TEDA-Cloud improves the way the merging (fusion) of cores is defined. The method avoids the lab [...]
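The recursive computation and the "new core" rule described above can be illustrated with a minimal Python sketch. It assumes the TEDA recursions as they are usually written in the TEDA literature, for a cloud holding k samples: mu_k = ((k-1)/k) mu_{k-1} + x_k/k, sigma2_k = ((k-1)/k) sigma2_{k-1} + ||x_k - mu_k||^2/(k-1), eccentricity xi_k = 1/k + ||mu_k - x_k||^2/(k sigma2_k), normalized eccentricity zeta_k = xi_k/2, and a point is treated as an outlier of a cloud when zeta_k > (m^2 + 1)/(2k). The names DataCloud and teda_stream, the defaults m = 3 and r0 = 0.01, the r0 check on a cloud's second sample and the absence of any merging step are assumptions of this illustration. The sketch is not the TEDA-Cloud algorithm or the modified method presented in this article.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


@dataclass
class DataCloud:
    """Hypothetical cloud (core): only count, mean and scalar variance are kept."""
    mean: List[float]
    k: int = 1
    var: float = 0.0

    def _updated(self, x: Sequence[float]) -> Tuple[int, List[float], float]:
        """Return (k, mean, var) as they would be after absorbing x (recursive update)."""
        k = self.k + 1
        mean = [((k - 1) * mu + xi) / k for mu, xi in zip(self.mean, x)]
        var = (k - 1) / k * self.var + sq_dist(x, mean) / (k - 1)
        return k, mean, var

    def normalized_eccentricity(self, x: Sequence[float]) -> float:
        """zeta = xi / 2, with xi = 1/k + ||mu - x||^2 / (k * var)."""
        k, mean, var = self._updated(x)
        if var == 0.0:  # x equals every sample seen so far
            return 1.0 / (2 * k)
        return (1.0 / k + sq_dist(mean, x) / (k * var)) / 2.0

    def accepts(self, x: Sequence[float], m: float = 3.0, r0: float = 0.01) -> bool:
        """True if x is NOT eccentric with respect to this cloud."""
        k, _, var = self._updated(x)
        if k == 2:
            # Assumed rule: the second sample must keep the cloud variance below r0.
            return var < r0
        # Chebyshev-style test: x is an outlier when zeta > (m^2 + 1) / (2k).
        return self.normalized_eccentricity(x) <= (m * m + 1) / (2 * k)

    def absorb(self, x: Sequence[float]) -> None:
        """Fold x into the cloud statistics; the point itself is not stored."""
        self.k, self.mean, self.var = self._updated(x)


def teda_stream(points, m: float = 3.0, r0: float = 0.01) -> List[DataCloud]:
    """Process a stream one point at a time (simplified: clouds are never merged)."""
    clouds: List[DataCloud] = []
    for x in points:
        fitting = [c for c in clouds if c.accepts(x, m, r0)]
        if not fitting:
            # Eccentric with respect to every existing cloud: start a new core.
            clouds.append(DataCloud(mean=list(x)))
        for c in fitting:
            # A point may fall under several overlapping clouds.
            c.absorb(x)
    return clouds


# Two well-separated 5x5 grids of points arriving one after the other.
grid = [(0.1 * (i % 5), 0.1 * (i // 5)) for i in range(25)]
stream = grid + [(10 + px, 10 + py) for px, py in grid]
clouds = teda_stream(stream)
print(len(clouds), [c.k for c in clouds])  # 2 [25, 25]
```

Each cloud keeps only its count, mean and variance, so no per-point labels are stored. Substituting the formulas shows that the test accepts x exactly when ||x - mu_k||^2 <= m^2 sigma2_k. Since a single sample always satisfies ||x - mu_k||^2 <= (k-1) sigma2_k, a cloud cannot reject any point until it would hold more than m^2 + 1 = 10 samples, which is why this sketch needs an extra rule for very small clouds.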
Angelov P., Gu X., Gutierrez G., Iglesias J. A., Guedes L. A., Sanchis A.: Autonomous data density based clustering method, IEEE International Joint Conference on Neural Networks (IJCNN), Vancouver, BC, Canada, 24-29 July, 2016.
Bezerra C. G., Costa B. S. J., Guedes L. A., Angelov P. A.: A new evolving clustering algorithm for online data streams, IEEE Conference on Evolving and Adaptive Intelligent Systems (EAIS), pp. 162-168, Natal, 2016.
Costa B. S. J., Bezerra C. G., Guedes L. A., Angelov P.: Online fault detection based on typicality and eccentricity data analytics, International Joint Conference on Neural Networks (IJCNN), Killarney, Ireland, 12-17 July, 2015.
MacQueen J.: Some methods for classification and analysis of multivariate observations, Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley, Calif., 1, pp. 281-297, 1967.
Khanmohammadi S., Adibeig N., Shanehbandy S.: An improved overlapping k-means clustering method for medical applications, Expert Systems with Applications, 67, pp. 12-18, Jan., 2017.
Kamath R. S., Kamat R. K.: K-means clustering for analyzing productivity in light of R&D spillover, International Journal of Information Technology, Modeling and Computing (IJITMC), 4, May, 2016.
Punj G., Stewart D. W.: Cluster analysis in marketing research: review and suggestions for application, Journal of Marketing Research, 20, pp. 134-148, May, 1983.
Alkilany A., Ahmed A., Said H., Bakar A. A.: Application of the k-means clustering algorithm to predict load shedding of the Southern Electrical Grid of Libya, Fourth edition of the International Conference on the Innovative Computing Technology (INTECH 2014), Luton, UK, 13-15 Aug., 2014.
Zakharov K.: Application of k-means clustering in psychological studies, Tutorials in Quantitative Methods for Psychology, 12, pp. 87-100, 2016.
Hartigan J. A.: Clustering algorithms, John Wiley & Sons, Inc., New York, NY, USA, 1975. ISBN 047135645X.
Pietrzykowski M., Plucinski M.: Mini-model method based on k-means clustering, Przegląd Elektrotechniczny, pp. 73-76, Jan., 2017.
Kangin D., Angelov P., Iglesias J. A.: Autonomously evolving classifier TEDAClass, Information Sciences, 366, pp. 1-11, 2016.
Angelov P.: Anomaly detection based on eccentricity analysis, IEEE Symposium on Evolving and Autonomous Learning Systems (EALS), Orlando, FL, USA, 9-12 Dec., 366, pp. 32-58, 2016.
Bezerra C. G., Costa B. S. J., Guedes L. A., Angelov P.: An evolving approach to unsupervised and real-time fault detection in industrial processes, Expert Systems with Applications, 63, pp. 134-144, 2016.
Angelov P.: Outside the box: An alternative data analytics framework, Journal of Automation, Mobile Robotics and Intelligent Systems, 8, pp. 53-59, 2014.
Saw J. G., Yang M. C. K., Mo T. C.: Chebyshev inequality with estimated mean and variance, The American Statistician, 38, pp. 130-132, 1984.
Downs J. J., Vogel E. F.: A plant-wide industrial process control problem, Computers & Chemical Engineering, 17, pp. 142-149, June, 1993.
Tyréus B. D., Vogel E. F.: Dominant variables for partial control. 2. Application to the Tennessee Eastman challenge process, Industrial & Engineering Chemistry Research, 38, pp. 1444-1455, 1999.
Fränti P., Sieranoja S.: Clustering datasets, [web page] http://cs.joensuu.fi/sipu/datasets/. [Accessed on 10 July 2018.]