Machine Learning (III)


Last week, we analyzed the mobile touch data using Tableau. This week, we will do a simple exercise that uses machine learning to analyze the same data. The classification task is to predict whether a touch event is left or right, using sensor measurements as features.

The dataset you will use is:

raw_touch_data.csv
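
If you want to sanity-check the file before diving in, a few lines of Matlab will print its basic shape. This is only a sketch; the column meanings (touch event ID in column 28, left/right label in column 29) are inferred from the Checkpoint 1 script below, so verify them against your own file.

% Minimal sanity check of the dataset.
data = csvread('raw_touch_data.csv', 1, 0);   % skip the header row
fprintf('%d rows, %d columns\n', size(data,1), size(data,2));
fprintf('%d touch events\n', data(end,28));   % column 28 = touch event ID
disp(unique(data(:,29))');                    % column 29 = left/right label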

Checkpoints

Checkpoint 1

We have developed a simple Matlab script that trains several classifiers on a configurable set of sensor-reading features.

clear all; close all;
%%%%%%%%%%%%%% Change path %%%%%%%%%%%%%%%%
filename_path = '/change/path/to/file/raw_touch_data.csv';
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%% Choose feature source %%%%%%%%%%%
add_accelerometer = false; % Adds accelerometer features
add_gyroscope = false; % Adds gyroscope features
add_magneticField = true; % Adds magnetic field features
add_gravity = false; % Adds gravity features
add_linearAcceleration = false; % Adds linear acceleration features
add_orientation = false; % Adds azimuth, pitch and roll features
add_light = true; % Adds light value
add_proximity = false; % Adds proximity value
add_studentID = false; % Adds student ID
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%% Choose machine learning classifier parameters %%%%%%%%%%%%%%%
numTrees = 1; % Try different number of trees for the Random Forest classifier
sigma = 1; % Try different values of sigma for the Support Vector Machine classifier
dist = 'normal'; % Try different distributions = {'normal', 'kernel', 'mvmn', 'mn'} for Naive Bayes classifier
K = 20; % Try different values of K for the K-nearest Neighbor classifier
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Main code starts here (you don't need to change anything here)
data = csvread(filename_path,1,0);
numTouch = data(end,28);
feat = []; feat_acc = []; feat_gyr = []; feat_mg = []; feat_grav = []; feat_lacc = []; feat_or = []; feat_light = []; feat_prox = []; feat_sID = [];
% Build one feature row per touch event
for i = 1: numTouch
    temp = find(data(:,28) == i); % rows belonging to touch event i
    if add_accelerometer == true
        feat_acc = compute3axialFeat(data(temp, [2 3 4]));
    end
    if add_gyroscope == true
        feat_gyr = compute3axialFeat(data(temp, [5 6 7]));
    end
    if add_magneticField == true
        feat_mg = compute3axialFeat(data(temp, [8 9 10]));
    end
    if add_gravity == true
        feat_grav = compute3axialFeat(data(temp, [11 12 13]));
    end
    if add_linearAcceleration == true
        feat_lacc = compute3axialFeat(data(temp, [14 15 16]));
    end
    if add_orientation == true
        feat_or = compute3axialFeat(data(temp, [21 22 23]));
    end
    if add_light == true
        feat_light = mean(data(temp,24));
    end
    if add_proximity == true
        feat_prox = mean(data(temp,25));
    end
    if add_studentID == true
        feat_sID = data(temp(1),26);
    end
    feat = [feat ; feat_acc feat_gyr feat_mg feat_grav feat_lacc feat_or feat_light feat_prox feat_sID];
    label(i) = data(temp(1),29);
end
% Divide data into training and test sets
temp = 1: numTouch;
temp = temp(randperm(length(temp))); % Shuffle data points
train = temp(1:round(0.50*length(temp))); % train samples (50% of data samples)
test = temp(round(0.50*length(temp))+1:end); % test samples (remaining 50% of data samples)
% Classify using K-nearest neighbor
prediction = knnclassify(feat(test,:), feat(train, :), label(train), K);
accuracyKNN = numel(find(prediction == label(test)'))/length(test)*100
% Classify using Naive Bayes
NB = NaiveBayes.fit(feat(train,:),label(train),'Distribution', dist);
prediction = NB.predict(feat(test,:));
accuracyNB = numel(find(prediction == label(test)'))/length(test)*100
% Classify using SVM
SVMstruct = svmtrain(feat(train,:), label(train), 'kernel_function','rbf','rbf_sigma',sigma);
prediction = svmclassify(SVMstruct, feat(test,:));
accuracySVM = numel(find(prediction == label(test)'))/length(test)*100
% Classify using Random Forest
b = TreeBagger(numTrees,feat(train,:),label(train)');
prediction_cell = predict(b,feat(test,:));
for i = 1: length(prediction_cell)
    prediction(i) = str2num(prediction_cell{i}); % TreeBagger returns labels as strings
end
accuracyRF = numel(find(prediction == label(test)'))/length(test)*100

The script above depends on a utility function, below.

function feat = compute3axialFeat(data)
    X = data(:,1); % X axis
    Y = data(:,2); % Y axis
    Z = data(:,3); % Z axis
    XYZ = sqrt(X.^2 + Y.^2 + Z.^2); % magnitude of the 3-axis signal
    % Compute mean, standard deviation, max and min features
    feat = [mean(X) mean(Y) mean(Z) mean(XYZ) std(X) std(Y) std(Z) std(XYZ) max(X) max(Y) max(Z) max(XYZ) min(X) min(Y) min(Z) min(XYZ)];
end
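
As a point of reference, applying this function to the three accelerometer columns of a single touch event yields a 1-by-16 feature vector (mean, standard deviation, max, and min of X, Y, Z, and the combined magnitude). A small usage sketch, assuming the data has already been loaded as in the main script:

% Features for the accelerometer readings of touch event 1.
rows = find(data(:,28) == 1);                      % rows of touch event 1
feat_acc = compute3axialFeat(data(rows, [2 3 4])); % columns 2-4 = accelerometer X/Y/Z
size(feat_acc)                                     % 1 16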

Download these two Matlab files and store them in the same folder. Also, copy the dataset file (i.e., raw_touch_data.csv) into the same folder.

Get the Matlab script to run on a computer. The expected output in the Matlab Command Window should look something like the screenshot below. Take a screenshot of your own output and submit it.

[Screenshot: Matlab Command Window showing the four accuracy values (accuracyKNN, accuracyNB, accuracySVM, accuracyRF)]

The four classifiers are: (1) K-nearest neighbor (KNN), (2) Naive Bayes (NB), (3) Support Vector Machine (SVM), and (4) Random Forest (RF). As you can see, the performance with the default parameters is above chance (50%), but we can definitely improve.
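
To see what "chance" means concretely, one simple baseline is to always predict the majority class of the training set; with roughly balanced left/right touches this lands near 50%. A minimal sketch, reusing label, train, and test from the main script:

% Majority-class baseline: any useful classifier should beat this.
majorityLabel = mode(label(train));
accuracyBaseline = numel(find(label(test) == majorityLabel))/length(test)*100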

Checkpoint 2

Let’s try changing some parameters and see what happens to the accuracy. Change the parameter for the K-nearest neighbor classifier (i.e., K) from 20 to 10, and make sure the “light” measurement is included as a feature by setting add_light to true. Run the script again. Has the accuracy improved? Take a screenshot of the new performance numbers.
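
Concretely, the two lines to edit near the top of the script are:

K = 10;            % K-nearest neighbor parameter (was 20)
add_light = true;  % include the mean light value as a feature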

Challenges

Test different combinations of features and training parameters. For each of the four classification algorithms, see if you can find a combination of features and parameter values that achieves the best accuracy you can; a sketch of a parameter sweep that can help appears after the list below.

1. K-NN

Report the highest accuracy number you’ve managed to achieve. Report the features and parameters you used.

2. Naive Bayes

Report the highest accuracy number you’ve managed to achieve. Report the features and parameters you used.

3. Support Vector Machine

Report the highest accuracy number you’ve managed to achieve. Report the features and parameters you used.

4. Random Forest

Report the highest accuracy number you’ve managed to achieve. Report the features and parameters you used.
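
One way to search systematically is to sweep a single parameter in a loop while holding the features fixed. The sketch below does this for the K-nearest neighbor classifier, reusing feat, label, train, and test from the Checkpoint 1 script (the K values are illustrative only); the same pattern applies to sigma for the SVM, dist for Naive Bayes, and numTrees for the Random Forest.

% Sweep K and report the test accuracy for each value.
for K = [1 5 10 20 50]
    prediction = knnclassify(feat(test,:), feat(train,:), label(train), K);
    accuracy = numel(find(prediction == label(test)'))/length(test)*100;
    fprintf('K = %2d -> accuracy = %.1f%%\n', K, accuracy);
end

Note that accuracy also varies with the random train/test split, so it is worth re-running a sweep a few times before settling on a value.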