Scalable Data Systems Lab
Scalable Machine Learning
High Dimensional Data Management
Vehicular Sensor Network Analytics
Time Series Data Analysis
Time Series Classification and Clustering
Scalable Algorithms for Search and Indexing
Concept Drift Detection
Automatic Speech Recognition
Raheem Sarwar: Stylometric Analytical Query Processing
Nattapol Trijakwanich: Scalable Data Mining
Krissanee Kamthawee: Extreme Multi-class Multi-label Classification
Sasikarn Khwanmuang: Machine Learning Systems
Bundit Boonyarit: Data-Intensive Scientific Discovery (Molecular Dynamics Simulations)
Benchakarn Leelakittisin: Healthcare Data Management
Multi-Author Authorship Attribution
Traditional authorship attribution aims to identify the true author of an anonymous document from a set of candidate authors
Single-label (author) classification problem
Applicable to single-author documents
Authorship Identification for Multi-Author Documents (AIMD)
Given a corpus of multi-author documents labeled with their authors, identify the authors of an anonymous multi-author document from the set of authors appearing in that corpus.
Applicable to both single-author and multi-author documents
Multi-label classification problem
Ref: Raheem Sarwar, Chenyun Yu, Sarana Nutanong, Norawit Urailertprasert, Nattapol Vannaboot, Thanawin Rakthanmanon: A Scalable Framework for Stylometric Analysis of Multi-author Documents. DASFAA (1) 2018: 813-829
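The difference between the single-label and multi-label settings can be illustrated with a minimal sketch. It is not the framework from the referenced paper: the character trigram features, the cosine similarity, the nearest-neighbour voting, and the names `char_ngrams`, `predict_authors`, and `keep_ratio` are all assumptions chosen for illustration. The point is only that each training document carries a set of authors and the prediction is also a set.

```python
import math
from collections import Counter

def char_ngrams(text, n=3):
    """Character n-gram counts, used here as a stand-in stylometric feature."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def predict_authors(corpus, query, k=2, keep_ratio=0.5):
    """corpus: list of (text, set_of_authors). Returns a set of predicted authors.

    Each of the k most similar documents votes for all of its authors,
    weighted by similarity. Authors whose vote reaches keep_ratio times
    the best vote are returned (hypothetical decision rule).
    """
    q = char_ngrams(query)
    neighbours = sorted(
        ((cosine(q, char_ngrams(text)), authors) for text, authors in corpus),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = Counter()
    for sim, authors in neighbours:
        for author in authors:
            votes[author] += sim
    if not votes or max(votes.values()) == 0:
        return set()
    top = max(votes.values())
    return {author for author, v in votes.items() if v >= keep_ratio * top}

# Toy corpus: the first document has two authors, the others have one.
corpus = [
    ("aaaa bbbb aaaa bbbb", {"alice", "bob"}),
    ("cccc dddd cccc dddd", {"carol"}),
    ("aaaa aaaa aaaa", {"alice"}),
]
predicted = predict_authors(corpus, "aaaa bbbb aaaa", k=2)
```

With `k=1` and a threshold that keeps only the top-voted author, the same code reduces to the single-label setting described earlier.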
C2Net: A Network-Efficient Approach to Collision Counting LSH Similarity Join
Approximate similarity join based on locality-sensitive hashing (LSH) provides a good solution for reducing the processing cost with a predictable loss of accuracy.
The network cost is the bottleneck in a distributed processing environment.
Focusing on collision counting LSH-based similarity join on MapReduce, we propose a network-efficient solution called C2Net, which improves the utilization of MapReduce combiners.
Hangyu Li, Sarana Nutanong, Hong Xu, Chenyun Yu, Foryu Ha: C2Net: A Network-Efficient Approach to Collision Counting LSH Similarity Join. IEEE Transactions on Knowledge and Data Engineering, 2018 (Early Access)
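To see why combiners matter for collision counting, the following sketch simulates the map, combine, and reduce steps in plain Python. It is not the C2Net algorithm, whose details are not given here; it only shows the baseline idea that a combiner can pre-aggregate per-pair collision counts on each mapper so that fewer records cross the network. The hash family, the parameters `m`, `width`, and `threshold`, and all function names are assumptions.

```python
import math
import random
from collections import Counter, defaultdict
from itertools import combinations

def make_hashes(dim, m, width, seed=0):
    """Draw m hash functions of the form h(x) = floor((a.x + b) / width)."""
    rng = random.Random(seed)
    return [([rng.gauss(0, 1) for _ in range(dim)], rng.uniform(0, width))
            for _ in range(m)]

def bucket(point, h, width):
    a, b = h
    return math.floor((sum(x * y for x, y in zip(a, point)) + b) / width)

def map_collisions(points, hashes, width):
    """Mapper: emit ((i, j), 1) each time points i and j share a bucket."""
    out = []
    for h in hashes:
        buckets = defaultdict(list)
        for idx, p in points.items():
            buckets[bucket(p, h, width)].append(idx)
        for ids in buckets.values():
            out.extend((pair, 1) for pair in combinations(sorted(ids), 2))
    return out

def combine(records):
    """Combiner: sum counts per pair locally, before the shuffle."""
    local = Counter()
    for pair, n in records:
        local[pair] += n
    return list(local.items())

def reduce_counts(shuffled, threshold):
    """Reducer: keep pairs that collide under at least `threshold` hashes."""
    total = Counter()
    for pair, n in shuffled:
        total[pair] += n
    return {pair for pair, n in total.items() if n >= threshold}

points = {0: [0.0, 0.0], 1: [0.0, 0.0], 2: [0.1, 0.0], 3: [50.0, 50.0]}
hashes = make_hashes(dim=2, m=8, width=4.0)
mappers = [hashes[:4], hashes[4:]]  # each simulated mapper owns 4 hash functions

raw = [r for hs in mappers for r in map_collisions(points, hs, 4.0)]
combined = [r for hs in mappers for r in combine(map_collisions(points, hs, 4.0))]
```

Both `raw` and `combined` produce the same join result after reduction, but `combined` contains at most one record per pair per mapper, which is the quantity a network-efficient design tries to minimize.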
A Hardware-Accelerated Solution for Join Operations
The join query is one of the most fundamental database query types for relational database management systems and has a high cost in comparison to other query types.
We propose a novel solution to accelerate processing of sort-merge join queries with a low match rate.
Zimeng Zhou, Chenyun Yu, Sarana Nutanong, Yufei Cui, Chenchen Fu, Chun Jason Xue: A Hardware-Accelerated Solution for Hierarchical Index-Based Merge-Join. IEEE Transactions on Knowledge and Data Engineering, 2018 (Early Access)
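The benefit of skipping in a low-match-rate merge join can be sketched in software. The example below is not the hardware design from the paper; it uses a binary search (`bisect_left`) as a stand-in for an index lookup, and assumes both inputs are sorted lists of unique keys. The function name `skipping_merge_join` is hypothetical.

```python
from bisect import bisect_left

def skipping_merge_join(left, right):
    """Join two sorted lists of unique keys and return the matching keys.

    A plain merge join advances one element at a time. When few keys
    match, most of those steps are wasted, so this version jumps
    directly to the first position that could match the other side.
    """
    i = j = 0
    matches = []
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            matches.append(left[i])
            i += 1
            j += 1
        elif left[i] < right[j]:
            # Skip every left key smaller than the current right key.
            i = bisect_left(left, right[j], i + 1)
        else:
            # Skip every right key smaller than the current left key.
            j = bisect_left(right, left[i], j + 1)
    return matches

evens = list(range(0, 1000, 2))
probes = [3, 500, 998, 1001]
joined = skipping_merge_join(evens, probes)
```

Here only two of the 500 keys in `evens` match, and the loop body runs a handful of times instead of scanning the whole list.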
A Quality-oriented Data Collection Scheme in Vehicular Sensor Networks
The communication overhead of collecting data from all vehicles at a high frequency could be prohibitively expensive.
We propose a Quality-oriented Data Collection (QDC) scheme that supports the accuracy and real-time requirements stipulated by intelligent transportation system (ITS) applications while reducing the communication overhead caused by the huge number of update packets.
Wendi Nie, Victor C. S. Lee, Dusit Niyato, Yaoxin Duan, Kai Liu, Sarana Nutanong: A Quality-oriented Data Collection Scheme in Vehicular Sensor Networks. IEEE Transactions on Vehicular Technology, 2018 (Early Access)
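The trade-off between update frequency and data quality can be illustrated with a generic tolerance-based filter. This is not the QDC scheme itself, whose mechanism is not described here; it is a common technique assumed for illustration, in which a vehicle sends an update only when its reading drifts from the last reported value by more than a tolerance. The names `collect_updates` and `tolerance` are hypothetical.

```python
def collect_updates(readings, tolerance):
    """Return the (time, value) updates a vehicle would transmit.

    An update is sent for the first reading and whenever the current
    reading differs from the last transmitted value by more than
    `tolerance`, so the server's view is never off by more than that.
    """
    sent = []
    last = None
    for t, value in enumerate(readings):
        if last is None or abs(value - last) > tolerance:
            sent.append((t, value))
            last = value
    return sent

speeds = [50, 51, 52, 60, 61, 40]  # made-up speed readings, one per time step
updates = collect_updates(speeds, tolerance=5)
```

In this toy trace three packets are sent instead of six, while the error at the server stays within the chosen tolerance.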
Multivariate Time Series Data Management
Identifying similar time series is a core subroutine for many data mining and data analysis problems.
Existing efficient solutions fail to scale as the number of dimensions increases.
We propose an efficient approximation method via locality-sensitive hashing.
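One way to picture hashing a multivariate time series is the sign-random-projection sketch below. It is an assumption made for illustration, not necessarily the method proposed by the lab: the series is flattened into one vector, and each bit of the signature records which side of a random hyperplane the vector falls on. Series pointing in similar directions tend to share most bits, so candidates can be found by comparing short signatures instead of full series. The names `make_planes`, `signature`, and `hamming` are hypothetical.

```python
import random

def make_planes(length, dims, bits, seed=0):
    """Draw `bits` random hyperplanes over the flattened series space."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(length * dims)] for _ in range(bits)]

def signature(series, planes):
    """Bit signature of a series given as a list of time steps, each a list of dims."""
    flat = [x for step in series for x in step]
    return tuple(
        int(sum(p * x for p, x in zip(plane, flat)) >= 0) for plane in planes
    )

def hamming(a, b):
    """Number of differing bits; a small distance suggests similar series."""
    return sum(x != y for x, y in zip(a, b))

planes = make_planes(length=3, dims=2, bits=16)
series = [[1.0, 2.0], [1.5, 2.5], [2.0, 3.0]]
scaled = [[2 * x for x in step] for step in series]     # same shape, larger amplitude
flipped = [[-x for x in step] for step in series]       # opposite direction

sig = signature(series, planes)
```

Because the signature depends only on direction, the scaled series gets an identical signature while the flipped one differs in every bit. The cost of computing a signature grows linearly with the number of dimensions.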