Data Mining Applications

I have been involved in several research projects, including software development for early cancer detection, retinal image analysis for diabetic retinopathy diagnosis, Batik cloth impression determination, news summarisation, and economic/business forecasting. To achieve the research goals, I implemented several machine learning methods: k-means and hierarchical clustering; neural network variants, nearest neighbour, SVM, and other classification methods; t-test and ANOVA feature selection; PCA and CCA for feature reduction; and Genetic Algorithm and Particle Swarm Optimisation for parameter optimisation.


To give a general perspective, this section describes only two prototypes out of all the implemented applications. Not all of the implemented machine learning methods are used in these prototypes. The other applications follow a similar flow, differing only in the methods chosen for each specific problem.

Biomedical Application: Breast Cancer Prediction

This prototype aims to predict whether microarray data from a patient indicates breast cancer or not. The available classification methods are k-NN, fuzzy k-NN, weighted k-NN, and a back-propagation neural network. The feature selection methods are the t-test and ANOVA, and PCA and CCA are available for feature reduction. The training data in this dataset is close to ideal: positive and negative samples have clearly different feature values, so the classification performance is very high. The prototype can be challenged with more complicated data in the future.
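The select-then-classify flow described above can be sketched in a few lines. This is only a minimal illustration using the Python standard library, not the prototype's actual code: the function names (`welch_t`, `select_features`, `knn_predict`), the choice of Welch's variant of the t-test, and the toy expression values are all assumptions made for the example.

```python
import math
from collections import Counter
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two samples of one feature."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def select_features(X, y, n_keep):
    """Return the indices of the n_keep features with the largest |t|."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append((abs(welch_t(pos, neg)), j))
    return sorted(j for _, j in sorted(scores, reverse=True)[:n_keep])


def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k nearest training samples (Euclidean)."""
    nearest = sorted(zip(X_train, y_train), key=lambda p: math.dist(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical toy data: feature 0 separates the classes strongly,
# feature 1 is noise, feature 2 separates them weakly.
X = [[5.1, 0.3, 2.0], [4.9, 0.9, 2.2], [5.3, 0.5, 2.1],
     [1.0, 0.4, 1.7], [1.2, 0.8, 1.9], [0.8, 0.6, 1.6]]
y = [1, 1, 1, 0, 0, 0]

keep = select_features(X, y, n_keep=2)
X_sel = [[row[j] for j in keep] for row in X]

sample = [5.0, 0.7, 2.1]  # hypothetical new patient
prediction = knn_predict(X_sel, y, [sample[j] for j in keep], k=3)
print(keep, prediction)
```

On well-separated data like this toy set, the noisy feature is dropped and the new sample falls squarely among the positive samples, which mirrors why an "ideal" training set yields very high accuracy.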

Economic/Business Application: Credit Approval Application

In this prototype, the prediction model is trained on a credit approval dataset from Germany. The application predicts whether a debtor candidate will be a good debtor (one who will repay the debt) or not. The model's features are the debtor candidate's personal information. The application can use the t-test or ANOVA to select the more discriminative features, and PCA or CCA to reduce the number of features. Several classification methods are also available for training and prediction.
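As a rough illustration of ANOVA-based feature selection, the sketch below computes the one-way ANOVA F-statistic for each feature and ranks features by it. The records, the feature names (`duration`, `age`), and the helper `anova_f` are hypothetical and chosen only for the example; they are not taken from the actual dataset or application.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F-statistic for one feature split into class groups."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    if ss_within == 0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical applicants: (credit duration in months, age); label 1 = good debtor.
records = [((12, 35), 1), ((10, 52), 1), ((14, 29), 1),
           ((36, 31), 0), ((40, 45), 0), ((38, 37), 0)]

f_scores = []
for j, name in enumerate(["duration", "age"]):
    good = [x[j] for x, label in records if label == 1]
    bad = [x[j] for x, label in records if label == 0]
    f_scores.append((name, anova_f([good, bad])))

# Features with a larger F separate the classes better and are kept first.
ranked = sorted(f_scores, key=lambda item: item[1], reverse=True)
print(ranked)
```

A feature whose class means differ by much more than its within-class spread gets a large F and is ranked as more discriminative; the top-ranked features would then be passed to the chosen classifier.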


Book Recommendation (Affinity Analysis)

This is an example of affinity analysis, performed on the book review dataset from http://www2.informatik.uni-freiburg.de/~cziegler/BX/. The aim is to find book recommendation rules using the Apriori algorithm. From the generated rules (premise and conclusion), we expect to gain an understanding of user (reviewer) behaviour. In the end, when a user buys the books on the premise side, we can recommend the books on the conclusion side. The example of this analysis is provided in this link
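The idea of premise and conclusion rules can be sketched with a compact Apriori implementation. This is a minimal standard-library sketch, not the analysis linked above: the transactions, the book identifiers, and the support and confidence thresholds are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    size = 1
    while current:
        level = {s: support(s) for s in current}
        level = {s: v for s, v in level.items() if v >= min_support}
        frequent.update(level)
        size += 1
        # Join frequent itemsets into candidates one item larger, then prune
        # any candidate that has an infrequent subset (downward closure).
        current = {a | b for a in level for b in level if len(a | b) == size}
        current = {c for c in current
                   if all(frozenset(sub) in level
                          for sub in combinations(c, size - 1))}
    return frequent


def rules(frequent, min_confidence):
    """List (premise, conclusion, confidence) derived from frequent itemsets."""
    out = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for premise in map(frozenset, combinations(itemset, r)):
                conf = sup / frequent[premise]
                if conf >= min_confidence:
                    out.append((set(premise), set(itemset - premise), conf))
    return out


# Hypothetical data: each set holds the books one reviewer rated favourably.
transactions = [{"book_a", "book_b"}, {"book_a", "book_b", "book_c"},
                {"book_a", "book_b"}, {"book_c"}, {"book_a", "book_c"}]

frequent = apriori(transactions, min_support=0.4)
found = rules(frequent, min_confidence=0.8)
print(found)
```

Here the single surviving rule says that reviewers who liked `book_b` also liked `book_a`, so `book_a` would be recommended to a user who picks `book_b`.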

News Classification

With the rapid growth in the number of documents, it is helpful to be able to classify new documents automatically. In this example, I built a classification model whose purpose is to assign a new news document to one of several classes (for example, business news). I use TF-IDF weights as features. Because the news is in Bahasa Indonesia, an Indonesian stemmer and stopword removal are applied. The dataset sample was crawled from an Indonesian news portal (kompas.com). The classification performance is very good. The example of the model is available in here
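To make the TF-IDF step concrete, here is a small standard-library sketch. Several things in it are assumptions for illustration only: the four training sentences are invented, the stopword list is a tiny sample, stemming is skipped, a smoothed IDF variant is used, and a nearest-neighbour rule on cosine similarity stands in for whatever classifier the actual model uses.

```python
import math
from collections import Counter

STOPWORDS = {"yang", "di", "dan", "ke", "dari"}  # tiny illustrative list


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords (no stemming here)."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def fit_idf(docs):
    """Smoothed inverse document frequency for each term in the corpus."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(doc, idf):
    """L2-normalised TF-IDF vector stored as a {term: weight} dict."""
    tf = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def cosine(u, v):
    """Cosine similarity of two already-normalised sparse vectors."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


# Hypothetical labelled headlines (not taken from the crawled dataset).
train = [("harga saham naik di bursa", "business"),
         ("bank menaikkan suku bunga kredit", "business"),
         ("tim sepak bola menang di final", "sport"),
         ("pemain bola cetak gol dan menang", "sport")]

docs = [tokenize(text) for text, _ in train]
idf = fit_idf(docs)
vectors = [tfidf(d, idf) for d in docs]

query = tfidf(tokenize("saham bank naik"), idf)
best = max(range(len(train)), key=lambda i: cosine(query, vectors[i]))
predicted = train[best][1]
print(predicted)
```

Terms that appear in few documents receive a higher IDF, so words such as "saham" carry more weight in the comparison than words shared across many documents.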

Image Clustering

The growth of social media produces a huge number of image files. Users love to share their beautiful moments through pictures. The availability of this data may provide insight into user behaviour. For example, from a user's accommodation photo posts, we can find the cluster of similar photos. We can then recommend accommodation of a similar type from the same cluster as the user's photo post. The simple example of this image clustering model is available in this link
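The cluster-then-recommend idea can be sketched with k-means on per-photo feature vectors. Everything specific here is an assumption for illustration: the photo names, the use of mean RGB colour as the image feature, the fixed initial centroids, and the helper names are hypothetical and not taken from the linked model.

```python
import math


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]


def kmeans(points, initial, iterations=10):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = [list(c) for c in initial]
    for _ in range(iterations):
        labels = assign(points, centroids)
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assign(points, centroids), centroids


# Hypothetical features: mean (R, G, B) of each photo, scaled to [0, 1].
photos = {"beach_1": (0.20, 0.60, 0.90), "beach_2": (0.25, 0.65, 0.85),
          "cabin_1": (0.30, 0.50, 0.20), "cabin_2": (0.35, 0.45, 0.15),
          "user_post": (0.22, 0.62, 0.88)}

names = list(photos)
points = [photos[name] for name in names]
labels, _ = kmeans(points, initial=[points[0], points[2]])

# Recommend the other photos that share a cluster with the user's post.
user_cluster = labels[names.index("user_post")]
recommended = [name for name, label in zip(names, labels)
               if label == user_cluster and name != "user_post"]
print(recommended)
```

In practice, richer image features would replace the mean colour, but the recommendation step stays the same: look up the cluster of the user's photo and return the other members of that cluster.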