A fast outlier detection strategy for distributed high-dimensional data sets with mixed attributes

Anna Koufakou; Michael Georgiopoulos

doi:10.1007/s10618-009-0148-z

Journal article

Peer reviewed

Data mining and knowledge discovery, Vol.20(2), pp.259-289

11-11-2009

Abstract

Article

Artificial Intelligence

Chemistry and Earth Sciences

Computer Science

Data Mining and Knowledge Discovery

Information Storage and Retrieval

Physics

Statistics for Engineering

Outlier detection has attracted substantial attention in many applications and research areas; some of the most prominent applications are network intrusion detection and credit card fraud detection. Many of the existing approaches are based on calculating distances among the points in the dataset. These approaches cannot easily adapt to current datasets, which usually contain a mix of categorical and continuous attributes and may be distributed among different geographical locations. In addition, current datasets usually have a large number of dimensions. These datasets tend to be sparse, and traditional concepts such as Euclidean distance or nearest neighbor become unsuitable. We propose a fast distributed outlier detection strategy intended for datasets containing mixed attributes. The proposed method takes into consideration the sparseness of the dataset, and is experimentally shown to be highly scalable with the number of points and the number of attributes in the dataset. Experimental results show that the proposed outlier detection method compares very favorably with other state-of-the-art outlier detection strategies proposed in the literature and that the speedup achieved by its distributed version is very close to linear.

Metrics

19 Record Views

78 Times Cited - Web of Science

114 Times Cited - Scopus

Details

Title: A fast outlier detection strategy for distributed high-dimensional data sets with mixed attributes
Creators: Anna Koufakou (Florida Gulf Coast University); Michael Georgiopoulos (University of Central Florida)
Publication Details: Data mining and knowledge discovery, Vol.20(2), pp.259-289
Publisher: Springer US; DORDRECHT
Number of pages: 31
Grant note: Direct For Education and Human Resources; Division Of Undergraduate Education: 0837307 Division Of Undergraduate Education; Direct For Education and Human Resources: 0806931, 0837332
Identifiers: 99383409441006570
Academic Unit: Department of Computing and Software Engineering
Language: English
Resource Type: Journal article