Article list
Communication between a front-end app (PHP) and a Python back end
Python side
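import socket
import subprocess
# Bind the listening socket to its own name so the socket module is not shadowed.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('127.0.0.1', 10000))
server.listen(10)
while True:
    connection, address = server.accept()
    # Read up to 1024 bytes sent by the PHP client.
    data = connection.recv(1024)
    if data:
        print(data)
        connection.send( ...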
#!/usr/bin/env python
# coding=utf-8
import sys
import jieba
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.svm import LinearSVC
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing impo ...
Often the hardest part of solving a machine learning problem can be finding the right estimator for the job. Different estimators are better suited for different types of data and different problems. The flowchart below is designed to give users a bit of a rough guide on how to approach problems with ...
$terms = array();
$prefix = 'Z';
for ($termi = $doc->termlist_begin(); !$termi->equals($doc->termlist_end()); $termi->next()) {
    $term = array(
        'wdf' => $termi->get_wdf(),
        'freq' => $termi->get_termfreq(),
        'name' => ...
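How to compute TF-IDF
- Blog category:
- Data mining
Example 1
There are many different mathematical formulas for computing TF-IDF. The example here uses the formula given above. Term frequency (TF) is the number of times a term appears in a document divided by the total number of terms in that document. If a document contains 100 terms in total and the term "cow" appears 3 times, then the term "cow" in that docu ...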
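The term-frequency arithmetic in this example can be checked with a few lines of Python. This is only a sketch: the function name `term_frequency` and the 100-token document are made up for illustration, and it covers the TF part only, since the excerpt is cut off before it reaches IDF.

```python
from collections import Counter


def term_frequency(tokens):
    """Return {term: occurrences / total number of tokens} for one document."""
    total = len(tokens)
    if total == 0:
        return {}
    counts = Counter(tokens)
    return {term: count / total for term, count in counts.items()}


# Hypothetical 100-token document in which "cow" appears 3 times.
tokens = ["cow"] * 3 + ["filler"] * 97
tf = term_frequency(tokens)
print(tf["cow"])  # 3 occurrences out of 100 tokens
```

With real text, the token list would come from a tokenizer (for Chinese, a word segmenter such as jieba) rather than from a hand-built list.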
<?php
class ContentSim {
    var $analysis;
    var $word_limit = 20;
    public function __construct(){
        $this->analysis = scws_new();
        $this->analysis->set_charset('utf8');
    }
    public function get_tfidf($txt){
        $word_ret = array();
        $this->analysis->send_text( ...
Crab is an open-source recommender package written in Python; it implements item-based and user-based collaborative filtering.
Features:
Recommender Algorithms: User-Based Filtering and Item-Based Filtering
Work in progress: Slope One, SVD, Evaluation of Recommenders.
Planned: Sparse Matrices, REST APIs.
http://muricoca.github.io/crab/
http://www.ryanzhang.info/archives/2594
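The core idea of a content-based recommender system is to recommend to user x items that are similar to the items x has rated highly.
The method is:
Build a "profile" for each item (item profiles)
Build a "profile" for each user from the user's ratings of items (user profiles)
Recommend ...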
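The first two steps can be sketched in plain Python. Everything here is an assumption made for illustration: the item names, the three-dimensional feature vectors, the rating-weighted average used for the user profile, and in particular the use of cosine similarity for the final ranking, since the excerpt is cut off before it describes the recommendation step.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_user_profile(item_profiles, ratings):
    """User profile = rating-weighted average of the profiles of rated items."""
    dim = len(next(iter(item_profiles.values())))
    profile = [0.0] * dim
    total = sum(ratings.values())
    for item, rating in ratings.items():
        for i, value in enumerate(item_profiles[item]):
            profile[i] += rating * value
    return [p / total for p in profile] if total else profile


def recommend(item_profiles, ratings, top_n=1):
    """Rank unrated items by similarity to the user profile (assumed ranking rule)."""
    user = build_user_profile(item_profiles, ratings)
    candidates = [item for item in item_profiles if item not in ratings]
    candidates.sort(key=lambda item: cosine(user, item_profiles[item]), reverse=True)
    return candidates[:top_n]


# Hypothetical item profiles: each vector scores an item on three made-up features.
item_profiles = {
    "item_a": [1.0, 0.0, 1.0],
    "item_b": [0.9, 0.1, 0.8],
    "item_c": [0.0, 1.0, 0.0],
}
ratings = {"item_a": 5}  # user x rated item_a highly
print(recommend(item_profiles, ratings))
```

In a text setting, the item profiles would typically be TF-IDF vectors of each item's description rather than hand-written numbers.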
# coding=utf-8
import re
import pandas as pd
import string
import MySQLdb
import jieba
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.feature_extraction.text import CountVectorizer
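# sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split now lives in model_selection.
from sklearn.model_selection import train_test_split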
from sklearn.metrics impor ...
pickle.dumps parameters
- Blog category:
- python
pickle.dumps(obj[, protocol])
Return the pickled representation of the object as a string, instead of writing it to a file.
If the protocol parameter is omitted, protocol 0 is used. If protocol is specified as a negative value or HIGHEST_PROTOCOL, the highest protocol version will be used.
Changed in ...
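A short sketch of how the protocol argument behaves. The sample object is made up for illustration. Note that the documentation quoted here is the Python 2 version: in Python 3, pickle.dumps returns bytes rather than a string, and the default protocol is no longer 0, so the example passes the protocol explicitly.

```python
import pickle

# Hypothetical object to serialize.
obj = {"name": "example", "values": [1, 2, 3]}

as_protocol_0 = pickle.dumps(obj, 0)                     # oldest, text-oriented protocol
as_highest = pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)  # newest protocol available
as_negative = pickle.dumps(obj, -1)                      # negative value selects the highest protocol

# Every variant round-trips back to an equal object.
for payload in (as_protocol_0, as_highest, as_negative):
    assert pickle.loads(payload) == obj

# A negative protocol produces the same bytes as HIGHEST_PROTOCOL.
print(as_negative == as_highest)
```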
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import edu.udo.cs.wvtool.main.WVTWordVector;
public class KNN {
    private int k = 15;
    private int cLimit = 1;
    private double[] ClassSim = null;
...
package text_category;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.StringReader;
import com.xjt.nlp.word.ICTCLAS;
import edu.udo.cs.wvtool.generic.tokenizer.WVTTokenizer;
impo ...
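The main cause is that vim.tiny is installed instead of the full version of vim.
There are two fixes:
Method 1:
1. In a terminal, run sudo apt-get remove vim-common
2. In a terminal, run sudo apt-get install vim and follow the prompts until vim is installed.
Method 2:
2. Edit the file /etc/vim/vimrc.tiny
Change set compatible to set nocompatible,
and add set backspace=2
This is needed because the system makes vim vi-compatible by default, so it behaves like vi.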
Install Apache
sudo apt-get install apache2
Install MySQL
sudo apt-get install mysql-server
sudo apt-get install python-mysqldb
Install WSGI
sudo apt-get install libapache2-mod-wsgi
Install Django
sudo apt-get install python-setuptools
sudo easy_install django
Create a project under /var/www
cd /var/www
django-admin.py startproject mysite
...
Optimizing random row selection in MySQL
- Blog category:
- Database
SELECT id
FROM dede_archives t1
JOIN (
    SELECT ROUND( RAND( ) * (
        SELECT MAX( id )
        FROM dede_archives ) ) AS nid
    FROM dede_archives
    LIMIT 1
) t2 ON t1.id = t2.nid
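The query draws one random value between 0 and MAX(id) and joins on equality, so it returns no row whenever that value does not match an existing id (for example after deletions). The sketch below shows a common variant that instead takes the first row whose id is greater than or equal to the random value. It is not the author's query: it uses an in-memory SQLite table in place of MySQL, draws the random number in Python rather than with RAND(), and the sample ids and the helper name `random_id` are made up.

```python
import random
import sqlite3

# In-memory stand-in for the dede_archives table, with gaps in the id sequence.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dede_archives (id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO dede_archives (id) VALUES (?)",
    [(i,) for i in (1, 2, 5, 9, 10)],
)


def random_id(conn):
    """Pick a random point in 1..MAX(id) and return the first existing id at or after it."""
    (max_id,) = conn.execute("SELECT MAX(id) FROM dede_archives").fetchone()
    if max_id is None:  # empty table
        return None
    nid = random.randint(1, max_id)
    row = conn.execute(
        "SELECT id FROM dede_archives WHERE id >= ? ORDER BY id LIMIT 1",
        (nid,),
    ).fetchone()
    return row[0]


print(random_id(conn))
```

Both versions avoid sorting the whole table with ORDER BY RAND(), since the lookup goes through the primary key. The trade-off of the `>=` variant is that ids following a gap are picked more often than others, so the selection is only approximately uniform.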