libsvm
Status Report
Work on pylibsvm is picking up speed. I've figured out SVM regression (Smola's tutorial was very helpful), so I'm ready to add support for this.
I've also been looking at multi-label classification (it seems the text classification folks use this). I ran into an interesting problem here: libsvm 2.82's svm-train.c doesn't seem to be able to parse the multi-label examples provided by the libsvm authors. In fact, when reading these files, svm-train goes into an infinite loop. This would explain the problems a libsvm user reported when trying to use the rcv1v2 dataset. I've contacted Chih-Jen Lin about this issue. The problem could have been avoided if the svm-train code checked the return value from fscanf. Building with a recent version of GCC actually warns about the unchecked return value (very cool!).
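To make the "check every parse step" point concrete, here is a minimal Python sketch of a reader for one multi-label line. It is not the svm-train.c code, and it is not the pylibsvm API. The function name parse_multilabel_line is made up for illustration, and I'm assuming each line looks like "1,3 2:0.5 7:1.25" (comma-separated integer labels, then index:value pairs) with at least one label present. The idea is simply that a field that fails to parse raises an error instead of being silently retried forever.

```python
def parse_multilabel_line(line):
    """Parse one line such as '1,3 2:0.5 7:1.25' into (labels, features).

    Hypothetical helper: assumes comma-separated integer labels followed by
    index:value pairs, and at least one label on every line.
    """
    fields = line.split()
    if not fields:
        raise ValueError("empty line")

    # The first field holds the labels. Fail loudly if it does not parse,
    # which is the Python analogue of checking fscanf's return value.
    try:
        labels = [int(token) for token in fields[0].split(",")]
    except ValueError:
        raise ValueError("bad label field: %r" % fields[0])

    features = {}
    for field in fields[1:]:
        index, sep, value = field.partition(":")
        if not sep:
            raise ValueError("expected index:value, got %r" % field)
        try:
            features[int(index)] = float(value)
        except ValueError:
            raise ValueError("bad feature field: %r" % field)
    return labels, features
```

For example, parse_multilabel_line("1,3 2:0.5 7:1.25") returns the labels [1, 3] and the feature dictionary {2: 0.5, 7: 1.25}, while a line with a stray token raises ValueError rather than hanging.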
Implementing support for multi-label classification might have implications for how precomputed kernels work (before SoC started I concentrated on binary classification), so this requires some more investigation.
I've also been tinkering with the API a bit. My original code provided a thin wrapper around libsvm's data structures (svm_node, svm_problem and svm_model). I think I'm going to modify the API slightly to look more like the stuff already in the SciPy sandbox for General Linear Models. Properly decoupling the pylibsvm objects from the libsvm data structures (wrapped with ctypes) will allow someone to implement a pure-Python backend (mostly model training stuff) at some stage and will allow me to get rid of a few of the messier ctypes bits in the current code.
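For anyone wondering what the thin ctypes layer looks like, here is a rough sketch. The field layouts of svm_node and svm_problem follow my reading of libsvm's svm.h (svm_model is left out), and the helpers make_nodes and make_problem are hypothetical names for this post, not the actual pylibsvm code. Nothing here loads the libsvm shared library; it only builds the data structures.

```python
import ctypes


class svm_node(ctypes.Structure):
    # Mirrors: struct svm_node { int index; double value; };
    _fields_ = [("index", ctypes.c_int),
                ("value", ctypes.c_double)]


class svm_problem(ctypes.Structure):
    # Mirrors: struct svm_problem { int l; double *y; struct svm_node **x; };
    _fields_ = [("l", ctypes.c_int),
                ("y", ctypes.POINTER(ctypes.c_double)),
                ("x", ctypes.POINTER(ctypes.POINTER(svm_node)))]


def make_nodes(features):
    """Turn {index: value} into an svm_node array ending with index -1."""
    items = sorted(features.items())
    nodes = (svm_node * (len(items) + 1))()
    for i, (index, value) in enumerate(items):
        nodes[i].index = index
        nodes[i].value = value
    nodes[len(items)].index = -1  # libsvm's end-of-vector marker
    return nodes


def make_problem(labels, rows):
    """Build an svm_problem from a list of labels and a list of feature dicts.

    Returns the problem plus the buffers it points to; the caller must keep
    those buffers alive for as long as the problem is in use.
    """
    if len(labels) != len(rows):
        raise ValueError("labels and rows must have the same length")
    node_arrays = [make_nodes(row) for row in rows]
    y = (ctypes.c_double * len(labels))(*labels)
    x = (ctypes.POINTER(svm_node) * len(rows))()
    for i, nodes in enumerate(node_arrays):
        x[i] = ctypes.cast(nodes, ctypes.POINTER(svm_node))

    problem = svm_problem()
    problem.l = len(labels)
    problem.y = ctypes.cast(y, ctypes.POINTER(ctypes.c_double))
    problem.x = ctypes.cast(x, ctypes.POINTER(ctypes.POINTER(svm_node)))
    return problem, (y, x, node_arrays)
```

Having to hand back the buffers so they don't get garbage collected is a good example of the messier ctypes bits: the user ends up managing C memory lifetimes from Python. Hiding this behind plain Python objects is what would make a pure-Python backend possible later.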
Status Report
I spent Friday working on a super-duper top secret ultra-spiffy project related to NumPy with Stefan van der Walt (hopefully we'll have something very cool to show for it in a week or two). On the pylibsvm front, I've been doing some more research into SVM regression. I've added some details on the libsvm internals and heap fragmentation issues to the overview page.
Hello, World
My name is Albert Strasheim and I'll be working on adding Support Vector Machines to SciPy this summer (actually, winter here in South Africa).
I'm pretty new to the field of SVMs, but I've been using them for speaker verification for about six months. I recently took part in the NIST Speaker Recognition Evaluation, during which I started developing the code that will be added to SciPy.
