Installation

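I could not get this working in R, so I will use Python.
The OS is Ubuntu 12.04.
First, download the latest version of the scripts from the FRaC site and extract the archive. This should create a folder named frac (I think); everything below is done inside that folder.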
 
Install LIBSVM, a software package for SVM computation.
On Ubuntu, run this from a terminal:

sudo apt-get install python-libsvm*
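
To confirm that the install worked, a small check like the following may help. It is only a sketch: the module names `svmutil` and `libsvm.svmutil` are my assumptions about how the Python bindings are exposed, and the check for an `svm-train` executable is an inference from the log further down, where detect prints `svm-train ...` lines.

```python
import importlib
import shutil


def find_libsvm_module():
    """Return the name of the first importable LIBSVM Python module, or None."""
    # Assumption: the bindings are importable as `svmutil` (older distro
    # packages) or as `libsvm.svmutil` (newer packages).
    for name in ("svmutil", "libsvm.svmutil"):
        try:
            importlib.import_module(name)
            return name
        except Exception:
            # Not installed, or installed but the shared library failed to load.
            continue
    return None


def find_svm_train():
    """Return the full path of the svm-train executable, or None if absent."""
    return shutil.which("svm-train")


if __name__ == "__main__":
    print("LIBSVM Python module:", find_libsvm_module())
    print("svm-train executable:", find_svm_train())
```

If either line prints `None`, that part of LIBSVM is probably missing from this machine.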

 
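Download Weka, a software package that implements a wide range of predictive models, and extract it somewhere convenient.
The important file is weka.jar. Make a note of the path to the directory that contains it.
FRaC apparently borrows Weka's decision tree implementations.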
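
If you lose track of where weka.jar ended up, a short search like this can find it. This is a minimal sketch; `find_weka_jar` is a helper name I made up, and `~/Desktop` is just where Weka is extracted in this post.

```python
import os


def find_weka_jar(root):
    """Walk `root` and return the path of the first weka.jar found, or None."""
    root = os.path.expanduser(root)
    for dirpath, _dirnames, filenames in os.walk(root):
        if "weka.jar" in filenames:
            return os.path.join(dirpath, "weka.jar")
    return None


if __name__ == "__main__":
    # Change the starting directory to wherever you extracted Weka.
    print(find_weka_jar("~/Desktop"))
```

The printed path is what gets passed to detect with the -w option.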
 
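After that, run the script named detect in the frac folder.
Its arguments and options are listed in the help output at the end of this section.
By default, detect seems to assume that all of the training data come from the distribution of normal (non-anomalous) examples, i.e. semi-supervised learning. If the same file is given as both the training data and the test data, it will probably perform outlier detection as unsupervised learning, or at least I hope so.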
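
The "same file for training and test" idea can be sketched as a small command builder. The helper names are hypothetical; the flags (-X, -Q, -T, -R, -N, -S, -w, -o) are the ones in detect's help output. Whether this really amounts to unsupervised outlier detection is still my guess, not something I have verified.

```python
import os


def build_detect_command(train_path, test_path, weka_jar=None, output_path=None):
    """Build the argv list for FRaC's detect script with all four model types."""
    cmd = ["python", "detect", "-X", train_path, "-Q", test_path,
           "-T", "-R", "-N", "-S"]
    if weka_jar is not None:
        # No shell is involved when running an argv list, so expand "~" here.
        cmd += ["-w", os.path.expanduser(weka_jar)]
    if output_path is not None:
        cmd += ["-o", output_path]
    return cmd


def build_unsupervised_command(data_path, weka_jar=None, output_path=None):
    """Use the same file as both training set (-X) and test set (-Q)."""
    return build_detect_command(data_path, data_path, weka_jar, output_path)


if __name__ == "__main__":
    cmd = build_unsupervised_command("test.data/acute/testset",
                                     weka_jar="~/Desktop/weka-3-6-9/weka.jar")
    print(" ".join(cmd))
    # To actually run it from inside the frac folder:
    #   import subprocess; subprocess.check_call(cmd)
```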
Let's try the data set in the acute folder of the sample data.

# The current directory is frac
python detect -X test.data/acute/trainset -Q test.data/acute/testset -T -R -N -S -w ~/Desktop/weka-3-6-9/weka.jar
# Training set examples: 45
# Test set examples:     75
# Infer feature types from test.data/acute/trainset and test.data/acute/testset
# 6 feature types (column, type, values):
# 1	continuous	35.5,41.5
# 2	nominal   	no,yes
# 3	nominal   	no,yes
# 4	nominal   	no,yes
# 5	nominal   	no,yes
# 6	nominal   	no,yes
# @0.01 seconds:  Feature #1, Decision Stump ...
# @0.02 seconds:  Feature #1, svm-train -s 3 -t 0 ...
# @0.24 seconds:  Feature #1, svm-train -s 3 -t 2 ...
# @0.49 seconds:  Feature #1, weka.classifiers.trees.REPTree ...
# @4.10 seconds:  Feature #2, Decision Stump ...
# @4.11 seconds:  Feature #2, svm-train -s 0 -t 0 -b 1 ...
# @4.33 seconds:  Feature #2, svm-train -s 0 -t 2 -b 1 ...
# @4.58 seconds:  Feature #2, weka.classifiers.trees.J48 -R ...
# @8.24 seconds:  Feature #3, Decision Stump ...
# @8.24 seconds:  Feature #3, svm-train -s 0 -t 0 -b 1 ...
# @8.47 seconds:  Feature #3, svm-train -s 0 -t 2 -b 1 ...
# @8.70 seconds:  Feature #3, weka.classifiers.trees.J48 -R ...
# @12.29 seconds:  Feature #4, Decision Stump ...
# @12.30 seconds:  Feature #4, svm-train -s 0 -t 0 -b 1 ...
# @12.53 seconds:  Feature #4, svm-train -s 0 -t 2 -b 1 ...
# @12.75 seconds:  Feature #4, weka.classifiers.trees.J48 -R ...
# @16.82 seconds:  Feature #5, Decision Stump ...
# @16.82 seconds:  Feature #5, svm-train -s 0 -t 0 -b 1 ...
# @17.05 seconds:  Feature #5, svm-train -s 0 -t 2 -b 1 ...
# @17.30 seconds:  Feature #5, weka.classifiers.trees.J48 -R ...
# @20.94 seconds:  Feature #6, Decision Stump ...
# @20.95 seconds:  Feature #6, svm-train -s 0 -t 0 -b 1 ...
# @21.16 seconds:  Feature #6, svm-train -s 0 -t 2 -b 1 ...
# @21.37 seconds:  Feature #6, weka.classifiers.trees.J48 -R ...
# @25.24 seconds:  Write normalized surprisal ...
# The anomaly scores for the test data follow from here on...
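
The `# @... seconds:` lines in the log above show when each model starts training, so the gap between consecutive lines is roughly how long each step took. Here is a minimal sketch for pulling those gaps out; the regular expression is my own guess at the line format, based only on the output shown here.

```python
import re

# Matches lines such as "# @0.49 seconds:  Feature #1, weka... ..."
STEP_RE = re.compile(r"^# @(\d+(?:\.\d+)?) seconds:\s+(.*?)\s*\.\.\.\s*$")


def parse_steps(log_text):
    """Return a list of (start_time, description) tuples from detect's log."""
    steps = []
    for line in log_text.splitlines():
        match = STEP_RE.match(line.strip())
        if match:
            steps.append((float(match.group(1)), match.group(2)))
    return steps


def step_durations(steps):
    """Pair each step with the time elapsed until the next step starts."""
    return [(desc, round(next_start - start, 2))
            for (start, desc), (next_start, _) in zip(steps, steps[1:])]


if __name__ == "__main__":
    sample = """\
# @0.01 seconds:  Feature #1, Decision Stump ...
# @0.02 seconds:  Feature #1, svm-train -s 3 -t 0 ...
# @0.24 seconds:  Feature #1, svm-train -s 3 -t 2 ...
# @0.49 seconds:  Feature #1, weka.classifiers.trees.REPTree ...
# @4.10 seconds:  Feature #2, Decision Stump ...
"""
    for desc, seconds in step_durations(parse_steps(sample)):
        print("%6.2f s  %s" % (seconds, desc))
```

In this run the Weka tree models account for most of the time: about 3.6 seconds per feature, against roughly 0.2 seconds for each SVM.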

Options accepted by detect (from its help output):
  --version    show program's version number and exit
  -h, --help   show this help message and exit
  -X Filename  Training set file.  Format is tabular; each line is an example,
               each column is a feature
  -Q Filename  Test set file (same format as training set file above)
  -d String    Field delimiter.  E.g., if your input file(s) are lines of
               comma-separated values, set this to ','.   Default field
               separator is any white space
  -m Filename  Optional meta-data file.  If not supplied, column types are
               inferred automatically from the provided values.  The format
               for the meta-data file is as follows:  Each line has three
               parts: column number, column type, possible values.  Column
               numbers start at 1.  Valid column types are 'nominal',
               'continuous' and 'ignore'.  For nominal features, possible
               values should be a comma-separated list.  For continuous
               features, this list should have only a minimum and a maximum
               allowed value.  This option is useful if you need to specify
               that certain features are not to be considered, or that an
               enumerated nominal feature uses integers as values
  -T           Use pruned regression/decision tree models.  Uses WEKA
               (http://www.cs.waikato.ac.nz/ml/weka).
  -R           Use RBF-kernel SVM models.  Uses LIBSVM
               (http://www.csie.ntu.edu.tw/~cjlin/libsvm).
  -N           Use linear-kernel SVM models.  Uses LIBSVM.
  -S           Use decision stump models.  These simply predict the mean of
               the feature distribution without regard to the other features
               (mostly, this is just here because it doesn't require
               additional software)
  -w Path      Path to weka.jar (for supervised learners implemented in WEKA).
               Default is "./weka.jar"  See:
               http://www.cs.waikato.ac.nz/ml/weka
  -f Integer   Learn predictor models C_i for these features only (thus output
               anomaly scores will be sums of surprisal for these features
               only).  This option may be invoked multiple times.  If this
               option is not invoked, learn a predictor for all features
  -o Filename  Write anomaly detection scores to this location
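
The -m meta-data file described above could be generated with something like the following. Treat it as a sketch: the help text does not say how the three parts of a line are separated, so the tab separator is an assumption taken from how detect prints the inferred feature types in its log. The helper names and the sample rows are made up for illustration.

```python
def infer_column(values, force_nominal=False):
    """Return (type, value_list) for one column of string values."""
    if not force_nominal:
        try:
            numbers = [float(v) for v in values]
        except ValueError:
            pass  # at least one non-numeric value: treat the column as nominal
        else:
            return "continuous", [repr(min(numbers)), repr(max(numbers))]
    return "nominal", sorted(set(values))


def metadata_lines(rows, nominal_columns=()):
    """Build meta-data lines (column number, type, values) from parsed rows."""
    lines = []
    # Column numbers start at 1, as stated in the help text for -m.
    for index, column in enumerate(zip(*rows), start=1):
        kind, values = infer_column(column, force_nominal=index in nominal_columns)
        # Assumption: the three parts are tab-separated, as in detect's log.
        lines.append("%d\t%s\t%s" % (index, kind, ",".join(values)))
    return lines


if __name__ == "__main__":
    # Made-up rows for illustration only, not taken from the acute data set.
    raw = ["36.0 no 1", "40.2 yes 2", "37.5 no 1"]
    rows = [line.split() for line in raw]
    # Column 3 holds integer codes, so force it to be nominal.
    print("\n".join(metadata_lines(rows, nominal_columns=[3])))
```

Forcing a column to nominal covers the case the help text mentions, where an enumerated nominal feature uses integers as values and would otherwise be inferred as continuous.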