piaip's Using (lib)SVM Tutorial
piaip at csie dot ntu dot edu dot tw

Why this tutorial is here
I've always considered SVM an interesting and useful tool, but I couldn't attend the "Data mining and SVM" course by Prof. Chih-Jen Lin (cjlin), mostly due to scheduling conflicts. After reading some materials on the internet and hearing kcwu explain how to use libsvm, I wanted to put my notes together as a tutorial for those who want to use libsvm without knowing the complete theory behind SVM.

The original README and FAQ files that come with libsvm are good documents too, but you may need some basic knowledge of SVM and its workflow to follow them (that's how I felt when I was reading them). This tutorial is written specifically for those starting from zero.

I must thank the people who provided feedback and helped me improve this tutorial: kcwu, biboshen, puffer, somi.

Remember that some of the statements below may not be strictly correct. But for those who just wish to USE SVM, I think the explanation below is easier to understand.

This tutorial is basically for people who already know how to program. It's also a memo to myself. Neither much mathematics nor prior SVM knowledge is required.

If you still can't understand this tutorial, there are three possibilities: 1. I didn't explain clearly enough, 2. you lack sufficient common knowledge, 3. you don't use your brain properly ^^;

I started writing this with no understanding of the subject myself, and this document has been read by many people who also didn't understand SVM but gained a certain level of understanding after reading it, so possibility 1 can be ruled out. Thus if you can't understand it you must belong to the latter two categories :P so even if you have questions after reading this, don't ask me.

SVM: What is it and what can it do for me?
SVM, Support Vector Machine, is something whose roots are similar to those of neural networks, but nowadays it is most widely used for classification. That means: if I have some sets of things that are already classified (but you know nothing about HOW I classified them, i.e. the classification rules are unknown), then when new data comes in, SVM can PREDICT which set it should belong to.

It sounds marvelous (if it doesn't, think again about what "the classification rules are unknown" means; if it still doesn't, try writing a program to solve this problem yourself) and would seem to require advanced techniques like AI search or some time-consuming complex computation. But SVM, based on Statistical Learning Theory, solves this problem nicely in reasonable time.

Let's explain with a graphical example (made with SVMToy). Suppose I mark lots of points with different colors on a plane; the color of each point is its "class" and its location is its data. SVM can then find equations that separate these points, and with these equations we get colored regions. When a new point (data) comes in, we can find (predict) what color (class) it should be just by checking which region its location falls in.

Figure: Original Data
Figure: SVM Regions

Of course SVM is not really just about painting and marking regions, but with the example above you should be able to get some idea of what SVM does.

Thus we can treat SVM as a black box: just push data into it and use the output.

How do I get SVM?
Chih-Jen Lin's libsvm is of course the best tool you can ever find.

Download libsvm
Download location: .zip or .tar.gz

The contents of the .zip and .tar.gz archives are the same; pick whichever suits your OS. Windows users usually prefer .zip files because they have WinZIP (which I always replace with WinRAR); UNIX users mostly prefer .tar.gz.

Build libsvm
After extracting the archive, just type make if you are using UNIX. If it doesn't build, read the documentation carefully and apply common sense; this is a tutorial, so I won't spend time on the details, and build failures are really rare: usually something is wrong with your system, or you are being too dumb. You may ignore the other subdirectories. We only need these executables: svm-train, svm-scale, and svm-predict.

Windows users may rebuild from source if they want, but there are already prebuilt binaries in the archive: check the "windows" subdirectory and you should find svmtrain.exe, svmscale.exe, svmpredict.exe, and svmtoy.exe.

Using SVM
libsvm has lots of functions. This tutorial will only explain the easier parts (mostly classification with the default model).

The programs
I'm going to describe the most important executables here. The filenames differ slightly between UNIX and Windows (svm-train vs. svmtrain.exe, and so on); apply common sense to see which one I'm referring to.

svmtrain
Trains on your data. Running SVM is jokingly called "driving trains" because of this program's name. svmtrain accepts input in a specific format (explained below) and generates a "model" file. You may think of the model as the internal data of SVM: svmpredict needs a model to predict and cannot consume the raw data directly. This is reasonable after some thought: training is a time-consuming process, so we train once and store the result, and later predictions just load the stored model, which is much faster.

svmpredict
Outputs the predicted class of new input data according to a trained model.

svmscale
Rescales data. The original data may have too large or too small a range, so we rescale it to a proper range, which makes training and predicting faster.

The input file format needs to be explained first. You may also refer to the file "heart_scale" bundled with the official libsvm source archive.

[label] [index1]:[value1] [index2]:[value2] ...

One record per line, for example:

+1 1:0.708 2:1 3:1 4:-0.320 5:-0.105 6:-1

label
Sometimes referred to as "class": the category you want to classify into. Usually an integer.

index
Ordered indexes, usually consecutive integers.

value
The data used for training, usually real (floating point) numbers.

Each line has the structure described above. It means: I have an array (vector) of numbers value1, value2, ..., valueN (their order is given by the respective indexes), and the class (the result) of this array is label.

You can think of it this way (I'm not saying this is correct): the name is Support "Vector" Machine, so the training input is a vector, a row x1, x2, x3, ...; these are the values, and the n in x[n] is given by the index. These are also called "attributes".

This file format has the advantage that it can represent a sparse matrix, i.e. some attributes of a record can be omitted.

To Run libsvm
Now I'll show you how to use libsvm. You may use the heart_scale file in the libsvm source archive as input, as I'll do in this example.

By now you should have a sense that using libsvm basically goes like this:

1. Prepare the data in the specified format, and svmscale it if necessary.
2. Train on the data with svmtrain to create a model.
3. Predict the class of new input data with svmpredict and get the result.

svmtrain
The syntax of svmtrain is basically:

svmtrain [options] training_set_file [model_file]

The format of training_set_file is described above. If model_file is not specified, it will be [training_set_file].model by default. Options can be ignored at first.

The following command generates the heart_scale.model file. The screen output may be ignored as long as there are no errors.

./svm-train heart_scale

svmpredict
The syntax of svmpredict is:

svmpredict test_file model_file output_file

test_file is the data we are going to predict. Its format is exactly the same as the training_set_file we fed to svmtrain, except that the leading label may be omitted (predicting the label is the whole point). If test_file does have labels, svmpredict compares each predicted label with the label written in test_file after predicting. In that case test_file holds the real (correct) classification, so the comparison tells us whether the predictions are correct.

So we can take the original training_set_file, feed it to svmpredict as test_file (the file format is the same), and see how high the accuracy is, which helps when tuning the arguments later.

The other arguments should be easy to figure out now: model_file is the model produced by svmtrain, and output_file is where the output is stored.

The output format is simple: each line contains one label, corresponding to the same line of your test_file.

The following command generates heart_scale.out:

./svm-predict heart_scale heart_scale.model heart_scale.out

As you can see, after we predicted the original input, the Accuracy line reports "Accuracy = 86.6667%" as the accuracy of prediction. If the input has no labels, the result is a real prediction.

Now you can use SVM to do whatever you want! Just write a program that outputs your data in the correct format, feed the data to SVM for training, then predict and read the output.

Advanced Topics
These topics are a little more advanced, and I may not explain them very clearly, because I mainly want to help you get familiar with some of the terminology and ideas that you'll encounter when you read other (lib)SVM documents.

Scaling
svm-scale is not easy to use right now, but it is important: proper scaling helps both the choice of arguments (described below) and the speed of solving SVM.

Arguments
As mentioned, we can pass some arguments when training (running svm-train without any input file or arguments prints the complete list of arguments and a syntax summary). These arguments correspond to parameters in the original SVM equations, so they affect the accuracy of prediction.

Cross Validation
When choosing arguments, people mostly use SVM with this workflow:

1. Prepare lots of pre-classified (correct) data.
2. Split it randomly into several training sets.
3. Train with some arguments and predict the other sets to calculate the accuracy.
4. If the accuracy is not good enough, change the arguments and repeat train/predict.

Once we have found a good set of arguments, we use them to train the model and use that model for the final prediction on unknown data. This whole process is called cross validation.

What arguments rules?
Generally speaking, you will only modify two important arguments when training: gamma (-g) and cost (-c). Cross validation (-v) is usually set to 5.

cost is 1 by default, and gamma defaults to 1/k, where k is the number of attributes (features) in the input data. Then how do we know what values to choose as arguments?

T R Y

When experimenting with arguments, the values are usually increased and decreased exponentially, i.e. as powers of two, 2^n.

Because we have two important arguments, trying n values of each means n*n = n^2 runs. The search is discrete and can be thought of as visiting the grid points in a specified region (range) of the X-Y plane (think of graph paper, or of marking every point with integer coordinates). Convert each grid point's X and Y coordinates to exponential values (like 2^x, 2^y) and use them as the cost and gamma values for cross validation.

So now you should understand what 'grid.py' in the 'python' subdirectory of the libsvm archive does: it automates the procedure above, calling svm-train to try all argument values within the region you specify. Python is a programming language which I'm not going to explain here, because I already know it :P (just a joke; the real reason is that this is a libsvm tutorial). grid.py also plots the results to help you look for good arguments. Many parts of libsvm are powered by python, such as logging into several hosts and running grids in parallel, which shows how powerful and convenient python is. Keep in mind that libsvm can be used entirely without python; python just helps us do things quickly.

When running grids (using grid.py is of course the most convenient, but if you don't know python and find it troublesome, generating the arguments yourself is fine too), a good range is usually [c,g]=[2^-10,2^10]*[2^-10,2^10]. In practice, exponents in [-8,8] are usually enough.

Regression
The other thing worth mentioning is regression.

To explain briefly, so far we have only used SVM for classification, where the labels are always discrete data (i.e. known fixed values). Regression in this context means predicting labels with continuous values (or previously unseen values). You can think of classification as prediction with a fixed set of outcomes (typically binary), and regression as prediction that outputs a real (floating point) number.

For example, suppose I know that a stock market index is affected by certain factors and I want to predict it. The index is our label, and the factors, once quantified, become the attributes. Later we collect those attributes, give them to SVM, and it predicts the index (possibly a number that never appeared before); that requires regression. What about lottery numbers? Since they are always fixed, known numbers, we should obviously use ordinary SVM classification to predict them. (Note: this is a true story; llwang once wrote such a thing.)

The labels must also be scaled when you use regression, with svm-scale -y lower upper.

However, grid.py does not support regression, and cross validation often does not work well with regression.

In short, regression is interesting but also a more advanced usage. We won't go into detail here; if you are interested, please refer to other SVM and libsvm documents.

Epilogue
We have now briefly explained how to use libsvm. For complete usage guides please refer to the documents inside the libsvm archive, cjlin's website, SVM-related documents, or go take cjlin's course if you are a student at National Taiwan University :)

Take a look at libsvmtools, especially "SVM for dummies" there. These are good tools for SVM newbies that help in observing the libsvm workflow.

Copyright
All rights reserved by Hung-Te Lin (林弘德, piaip). All HTML/text typed within VIM on Solaris.

Hung-Te Lin
Fri Apr 18 15:04:53 CST 2003
$Id: svm_tutorial.html,v 1.12 2005/10/26 06:12:40 piaip Exp piaip $
Original author: Hung-Te Lin (林弘德); please keep the original source when reposting.
To get more familiar with SVM, you may refer to the slides cjlin used in his Data Mining course (pdf or ps). Below I'll try to explain and use libsvm without relying on those slides.
Why value1, value2, ...? Usually the input data of the problem you are trying to solve involves lots of "features", or "attributes", so each input is a set (a vector/array). Take the example of marking points and finding regions described above: each point has coordinates X and Y, so it has two attributes (X and Y). To describe two points (0,3) and (5,8) with labels (classes) 1 and 2, we write them as:

1 1:0 2:3
2 1:5 2:8

Likewise, 3-dimensional points have 3 attributes.
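The record format above can be produced and read back with a few lines of Python. This is only a minimal sketch: the helper names format_record and parse_record are made up for illustration and are not part of libsvm, and the sparse option assumes that an omitted attribute is read as zero.

```python
def format_record(label, values, skip_zeros=False):
    """Format one record as '[label] [index1]:[value1] ...' with 1-based indexes."""
    parts = [str(label)]
    for index, value in enumerate(values, start=1):
        if skip_zeros and value == 0:
            continue  # sparse form: leave the attribute out entirely
        parts.append(f"{index}:{value:g}")
    return " ".join(parts)


def parse_record(line):
    """Parse one line back into (label, {index: value})."""
    fields = line.split()
    label = float(fields[0])
    attributes = {}
    for field in fields[1:]:
        index, value = field.split(":")
        attributes[int(index)] = float(value)
    return label, attributes


# The two points from the text: (0,3) in class 1 and (5,8) in class 2.
points = [(1, (0, 3)), (2, (5, 8))]
for label, coords in points:
    print(format_record(label, coords))
```

Passing skip_zeros=True writes the first point as "1 2:3", which is the kind of omission the sparse format allows.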
Screen output of the svm-train and svm-predict commands on heart_scale:

optimization finished, #iter = 219
nu = 0.431030
obj = -100.877286, rho = 0.424632
nSV = 132, nBSV = 107
Total nSV = 132
Accuracy = 86.6667% (234/270) (classification)
Mean squared error = 0.533333 (regression)
Squared correlation coefficient = 0.532639 (regression)
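The Accuracy figure is simply the number of predicted labels that match the labels in test_file, divided by the number of records. A rough sketch of that comparison follows; the function name accuracy and the label lists are made up for illustration, and this is not svm-predict's own code.

```python
def accuracy(true_labels, predicted_labels):
    """Return (correct, total, percent) for two equally long label lists."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    total = len(true_labels)
    return correct, total, 100.0 * correct / total


# Made-up labels with the same counts as the run above: 234 of 270 correct.
true_labels = [1] * 270
predicted_labels = [1] * 234 + [-1] * 36
correct, total, percent = accuracy(true_labels, predicted_labels)
print(f"Accuracy = {percent:g}% ({correct}/{total})")
```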
svmscale rescales every attribute to the range specified by -l and -u, usually [0,1] or [-1,1]. The result is written to stdout.

Please keep in mind (this is often forgotten) that testing data and training data MUST BE SCALED TOGETHER, WITH THE SAME RANGE. Don't forget to scale your testing data before you predict.

The hardest part of svm-scale is that we can't give it the testing and training data as separate files and scale them together in one command; that's why svm-scale is not so easy to use right now. (Later libsvm releases let svm-scale save the scaling parameters of the training data with -s and re-apply them to the testing data with -r.)
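To make "scaled with the same range" concrete, here is a small sketch of linear rescaling in which the per-attribute minimum and maximum are taken from the training data only and then re-used for the testing data. The function names and the sample numbers are assumptions for illustration; this is not svm-scale's implementation, and it assumes dense rows of equal length.

```python
def fit_ranges(rows):
    """Per-attribute (min, max), computed from the training rows only."""
    return [(min(column), max(column)) for column in zip(*rows)]


def scale_rows(rows, ranges, lower=-1.0, upper=1.0):
    """Map each attribute linearly so that its training range becomes [lower, upper]."""
    scaled = []
    for row in rows:
        new_row = []
        for value, (lo, hi) in zip(row, ranges):
            if hi == lo:
                new_row.append(lower)  # constant attribute: arbitrary choice in this sketch
            else:
                new_row.append(lower + (upper - lower) * (value - lo) / (hi - lo))
        scaled.append(new_row)
    return scaled


train = [[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]]
test = [[2.5, 25.0]]
ranges = fit_ranges(train)          # learned once, from the training data
print(scale_rows(train, ranges))
print(scale_rows(test, ranges))     # the same ranges are re-used here
```

Computing the ranges separately for the testing data would map the same raw value to different scaled values in the two files, which is exactly the mistake the warning above is about.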
Let's use c=10 as an example:

./svm-train -c 10 heart_scale

If you predict again now, the accuracy becomes 92.2% (249/270).
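If you drive svm-train from your own program, it helps to build the command line in one place. The sketch below only assembles the argument list using the options mentioned in this tutorial (-c, -g, -v); the function name svm_train_command and the default executable path are assumptions, and nothing is executed here.

```python
def svm_train_command(training_file, cost=None, gamma=None, folds=None,
                      executable="./svm-train"):
    """Build the argument list for an svm-train call (options first, file last)."""
    command = [executable]
    if folds is not None:
        command += ["-v", str(folds)]   # n-fold cross validation
    if cost is not None:
        command += ["-c", str(cost)]
    if gamma is not None:
        command += ["-g", str(gamma)]
    command.append(training_file)
    return command


print(" ".join(svm_train_command("heart_scale", cost=10)))
print(" ".join(svm_train_command("heart_scale", cost=10, folds=5)))
```

The returned list can be handed to subprocess.run from the Python standard library once the svm-train binary has been built.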
While experimenting with the arguments, we can use svmtrain's built-in support for cross validation:

-v n: n-fold cross validation

n is the number of sets to split your input data into. Specifying n=3 splits the data into 3 sets: train the model with sets 1 and 2 and predict set 3 to get the accuracy, then train with sets 2 and 3 and predict set 1, and finally train with sets 1 and 3 and predict set 2. Other values of n work the same way.
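The splitting described above can be sketched as follows. This is a plain random split written for illustration, with a made-up function name; svm-train's internal splitting may differ in its details.

```python
import random


def k_fold_indices(n_records, n_folds, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)          # random assignment to folds
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for held_out in range(n_folds):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test


# With 6 records and n=3, each round trains on 4 records and predicts the other 2.
for train, test in k_fold_indices(6, 3):
    print(sorted(train), sorted(test))
```

Every record is predicted exactly once, by a model that never saw it during training, and the reported cross validation accuracy is the share of those predictions that were right.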
If we don't use cross validation, we may be fooled by arguments that are only good for some particular input. In the example above, c=10 gave 92.2%. Now try it with -v 5:

./svm-train -v 5 -c 10 heart_scale
...
Cross Validation Accuracy = 80.3704%

After averaging over the cross validation folds we have only 80.37% accuracy, even worse than with the original arguments (86%).
Yes, don't doubt it: just try arguments until you find better values, by trial and error.
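Trial and error over a grid of exponents can be sketched like this. The scoring function toy_score is a made-up stand-in for a real cross validation run (it is not libsvm output), and the helper names are hypothetical; grid.py automates the real thing by calling svm-train.

```python
import itertools
import math


def grid_points(c_exponents, g_exponents):
    """Yield (cost, gamma) = (2^x, 2^y) for every grid point (x, y)."""
    for x, y in itertools.product(c_exponents, g_exponents):
        yield 2.0 ** x, 2.0 ** y


def grid_search(evaluate, c_exponents, g_exponents):
    """Try every grid point and return (best_score, cost, gamma)."""
    best = None
    for cost, gamma in grid_points(c_exponents, g_exponents):
        score = evaluate(cost, gamma)
        if best is None or score > best[0]:
            best = (score, cost, gamma)
    return best


def toy_score(cost, gamma):
    """Made-up stand-in for a cross validation accuracy; peaks at cost=4, gamma=0.25."""
    return -abs(math.log2(cost) - 2) - abs(math.log2(gamma) + 2)


exponents = range(-8, 9)   # exponents -8..8, i.e. 17 values per argument
score, cost, gamma = grid_search(toy_score, exponents, exponents)
print(cost, gamma)
```

In real use, evaluate would run something like svm-train -v 5 -c cost -g gamma on your data and return the reported cross validation accuracy. With 17 exponents per argument that is 17*17 = 289 runs.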
Website: piaip at ntu csie, 2003.