First, create the database and the corresponding table in MySQL:

    mysql> create database mahout;
    Query OK, 1 row affected (0.00 sec)

    mysql> use mahout;
    Database changed

    mysql> create table intro(
        -> uid varchar(20) not null,
        -> iid varchar(50) not null,
        -> val varchar(50) not null,
        -> time varchar(50) default null
        -> );

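Note: the computation consumes a lot of resources, so it is advisable to add indexes and set the various tuning parameters in my.ini. (Here the goal is only to get things working.)

Insert the data. This uses the data from the first recommender example in Mahout in Action. Remove the blank lines from the file first, otherwise MySQL complains that a column cannot be null.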

    mysql> load data local infile 'D:/intro.csv' replace into table intro fields terminated by ',' lines terminated by '\n' (@col1,@col2,@col3) set uid=@col1,iid=@col2,val=@col3;
    Query OK, 21 rows affected (0.19 sec)
    Records: 21 Deleted: 0 Skipped: 0 Warnings: 0

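The blank-line cleanup mentioned before the import can be done by hand, but a few lines of plain Java also work. The following is only a minimal sketch: the class name `CsvBlankLineStripper` and the default output file name are made up for illustration, and it assumes the CSV is small enough to read into memory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Mahout or MySQL.
class CsvBlankLineStripper {

    // Keeps only the lines that contain something other than whitespace.
    static List<String> stripBlankLines(List<String> lines) {
        return lines.stream()
                .filter(line -> !line.trim().isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            // No file given: run on a small in-memory sample instead.
            List<String> sample = Arrays.asList("1,101,5.0", "", "1,102,3.0", "   ");
            System.out.println(stripBlankLines(sample));
            return;
        }
        // args[0] is the input CSV; args[1] (optional) is the output path.
        Path in = Paths.get(args[0]);
        Path out = Paths.get(args.length > 1 ? args[1] : "intro-clean.csv");
        List<String> cleaned = stripBlankLines(Files.readAllLines(in, StandardCharsets.UTF_8));
        Files.write(out, cleaned, StandardCharsets.UTF_8);
        System.out.println("Wrote " + cleaned.size() + " lines to " + out);
    }
}
```

Running it with the original file as the first argument writes a cleaned copy, which can then be given to `load data local infile`.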
Check the data:

    mysql> select * from intro;
    +-----+-----+-----+------+
    | uid | iid | val | time |
    +-----+-----+-----+------+
    | 1   | 101 | 5.0 | NULL |
    | 1   | 102 | 3.0 | NULL |
    | 1   | 103 | 2.5 | NULL |
    | 2   | 101 | 2.0 | NULL |
    | 2   | 102 | 2.5 | NULL |
    | 2   | 103 | 5.0 | NULL |
    | 2   | 104 | 2.0 | NULL |
    | 3   | 101 | 2.5 | NULL |
    | 3   | 104 | 4.0 | NULL |
    | 3   | 105 | 4.5 | NULL |
    | 3   | 107 | 5.0 | NULL |
    | 4   | 101 | 5.0 | NULL |
    | 4   | 103 | 3.0 | NULL |
    | 4   | 104 | 4.5 | NULL |
    | 4   | 106 | 4.0 | NULL |
    | 5   | 101 | 4.0 | NULL |
    | 5   | 102 | 3.0 | NULL |
    | 5   | 103 | 2.0 | NULL |
    | 5   | 104 | 4.0 | NULL |
    | 5   | 105 | 3.5 | NULL |
    | 5   | 106 | 4.0 | NULL |
    +-----+-----+-----+------+
    21 rows in set (0.00 sec)

Next comes the actual program. It is kept simple, since the main goal is just to get it working:

    import java.util.List;

    import org.apache.mahout.cf.taste.impl.model.jdbc.MySQLJDBCDataModel;
    import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
    import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
    import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.model.JDBCDataModel;
    import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
    import org.apache.mahout.cf.taste.recommender.RecommendedItem;
    import org.apache.mahout.cf.taste.recommender.Recommender;
    import org.apache.mahout.cf.taste.similarity.UserSimilarity;

    import com.mysql.jdbc.jdbc2.optional.MysqlDataSource;

    public class MysqlJDBCRecommender {
        public static void main(String[] args) throws Exception {
            // Connection settings for the local MySQL instance
            MysqlDataSource dataSource = new MysqlDataSource();
            dataSource.setServerName("localhost");
            dataSource.setUser("root");
            dataSource.setPassword("toor");
            dataSource.setDatabaseName("mahout");

            // Table name, then the user, item, preference and timestamp columns
            JDBCDataModel dataModel = new MySQLJDBCDataModel(dataSource, "intro", "uid", "iid", "val", "time");
            DataModel model = dataModel;

            // User-based recommender: Pearson similarity, 2 nearest neighbours
            UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
            UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);
            Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);

            // Ask for up to 3 recommendations for user 1
            List<RecommendedItem> recommendations = recommender.recommend(1, 3);
            for (RecommendedItem recommendation : recommendations) {
                System.out.println(recommendation);
            }
        }
    }

Result:

    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/D:/java%e9%a9%b1%e5%8a%a8/mahout0.7/slf4j-jcl-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/D:/java%e9%a9%b1%e5%8a%a8/mahout0.7/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/D:/java%e9%a9%b1%e5%8a%a8/mahout0.7/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    13/12/07 13:56:41 WARN jdbc.AbstractJDBCDataModel: You are not using ConnectionPoolDataSource. Make sure your DataSource pools connections to the database itself, or database performance will be severely reduced.
    RecommendedItem[item:104, value:4.257081]
    RecommendedItem[item:106, value:4.0]

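To see where the two values come from, the calculation can be reproduced in plain Java without Mahout or MySQL. This is only a sketch of the general technique (Pearson correlation over co-rated items, the 2 most similar users, a similarity-weighted average of their ratings), not Mahout's actual implementation. The class `RecommendationByHand` and its methods are made up for illustration, and the ratings are copied from the `intro` table. Item 105 is rated by only one of the two neighbours and does not appear in the output above, so the sketch leaves it out rather than guessing how the library treats that case.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical class for illustration; this is not Mahout code.
class RecommendationByHand {

    // user id -> (item id -> rating), copied from the intro table.
    static final Map<Long, Map<Long, Double>> PREFS = new LinkedHashMap<>();

    // Helper: the arguments after the user id are (item, rating) pairs.
    static void put(long user, double... pairs) {
        Map<Long, Double> ratings = new LinkedHashMap<>();
        for (int i = 0; i < pairs.length; i += 2) {
            ratings.put((long) pairs[i], pairs[i + 1]);
        }
        PREFS.put(user, ratings);
    }

    static {
        put(1, 101, 5.0, 102, 3.0, 103, 2.5);
        put(2, 101, 2.0, 102, 2.5, 103, 5.0, 104, 2.0);
        put(3, 101, 2.5, 104, 4.0, 105, 4.5, 107, 5.0);
        put(4, 101, 5.0, 103, 3.0, 104, 4.5, 106, 4.0);
        put(5, 101, 4.0, 102, 3.0, 103, 2.0, 104, 4.0, 105, 3.5, 106, 4.0);
    }

    // Pearson correlation over the items both users rated; NaN if undefined.
    static double pearson(long a, long b) {
        Map<Long, Double> ra = PREFS.get(a);
        Map<Long, Double> rb = PREFS.get(b);
        List<double[]> common = new ArrayList<>();
        for (Map.Entry<Long, Double> e : ra.entrySet()) {
            Double other = rb.get(e.getKey());
            if (other != null) {
                common.add(new double[] {e.getValue(), other});
            }
        }
        int n = common.size();
        if (n < 2) {
            return Double.NaN;
        }
        double meanA = 0, meanB = 0;
        for (double[] p : common) {
            meanA += p[0] / n;
            meanB += p[1] / n;
        }
        double sab = 0, saa = 0, sbb = 0;
        for (double[] p : common) {
            double da = p[0] - meanA;
            double db = p[1] - meanB;
            sab += da * db;
            saa += da * da;
            sbb += db * db;
        }
        double denom = Math.sqrt(saa * sbb);
        return denom == 0 ? Double.NaN : sab / denom;
    }

    // The n users most similar to the given user, most similar first.
    static List<Long> neighbors(long user, int n) {
        List<Long> candidates = new ArrayList<>();
        for (long other : PREFS.keySet()) {
            if (other != user && !Double.isNaN(pearson(user, other))) {
                candidates.add(other);
            }
        }
        candidates.sort((x, y) -> Double.compare(pearson(user, y), pearson(user, x)));
        return new ArrayList<>(candidates.subList(0, Math.min(n, candidates.size())));
    }

    // Similarity-weighted average of the neighbours' ratings for the item.
    static double estimate(long user, long item, int n) {
        double weighted = 0, total = 0;
        for (long neighbor : neighbors(user, n)) {
            Double rating = PREFS.get(neighbor).get(item);
            if (rating != null) {
                double sim = pearson(user, neighbor);
                weighted += sim * rating;
                total += sim;
            }
        }
        return total == 0 ? Double.NaN : weighted / total;
    }

    public static void main(String[] args) {
        System.out.println("neighbours of user 1: " + neighbors(1, 2));
        for (long item : new long[] {104, 106}) {
            System.out.println(String.format(Locale.ROOT, "item %d -> %.4f", item, estimate(1, item, 2)));
        }
    }
}
```

With this data the two nearest neighbours of user 1 are users 4 and 5 (similarities 1.0 and about 0.945), and the estimates for items 104 and 106 come out at about 4.2571 and 4.0, in line with the program output.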
Recommendations from the MySQLJDBCDataModel API documentation
A JDBCDataModel backed by a MySQL database and accessed via JDBC. It may work with other JDBC databases. By default, this class assumes that there is a DataSource available under the JNDI name "jdbc/taste", which gives access to a database with a "taste_preferences" table with the following schema:
user_id | item_id | preference |
---|---|---|
987 | 123 | 0.9 |
987 | 456 | 0.1 |
654 | 123 | 0.2 |
654 | 789 | 0.3 |
preference must have a type compatible with the Java float type. user_id and item_id should be compatible with long type (BIGINT). For example, the following command sets up a suitable table in MySQL, complete with primary key and indexes:

    CREATE TABLE taste_preferences (
      user_id BIGINT NOT NULL,
      item_id BIGINT NOT NULL,
      preference FLOAT NOT NULL,
      PRIMARY KEY (user_id, item_id),
      INDEX (user_id),
      INDEX (item_id)
    )

The table may optionally have a timestamp column whose type is compatible with Java long.
Performance Notes
See the notes in AbstractJDBCDataModel regarding using connection pooling. It's pretty vital to performance.
Some experimentation suggests that MySQL's InnoDB engine is faster than MyISAM for these kinds of applications. While MyISAM is the default and, I believe, generally considered the lighter-weight and faster of the two engines, my guess is the row-level locking of InnoDB helps here. Your mileage may vary.
Here are some key settings that can be tuned for MySQL, and suggested size for a data set of around 1 million elements:
- innodb_buffer_pool_size=64M
- myisam_sort_buffer_size=64M
- query_cache_limit=64M
- query_cache_min_res_unit=512K
- query_cache_type=1
- query_cache_size=64M
Also consider setting some parameters on the MySQL Connector/J driver:
- cachePreparedStatements = true
- cachePrepStmts = true
- cacheResultSetMetadata = true
- alwaysSendSetIsolation = false
- elideSetAutoCommits = true
Thanks to Amila Jayasooriya for contributing MySQL notes above as part of Google Summer of Code 2007.
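One common way to pass the Connector/J parameters listed above is as query parameters on the JDBC URL. The sketch below only builds such a URL string with the JDK. The class `ConnectorJUrlSketch` and its `buildUrl` method are made up for illustration, and the host and database name are the ones used in this post. `cachePreparedStatements` is left out because `cachePrepStmts` appears to be the name Connector/J itself documents. That is an assumption worth checking against the driver version in use.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for illustration; not part of Mahout or Connector/J.
class ConnectorJUrlSketch {

    // Builds jdbc:mysql://host/database?key=value&key=value ...
    static String buildUrl(String host, String database, Map<String, String> params) {
        String base = "jdbc:mysql://" + host + "/" + database;
        if (params.isEmpty()) {
            return base;
        }
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(e.getKey() + "=" + e.getValue());
        }
        return base + "?" + query;
    }

    // The driver settings from the list above, in the same order.
    static Map<String, String> suggestedParams() {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("cachePrepStmts", "true");
        params.put("cacheResultSetMetadata", "true");
        params.put("alwaysSendSetIsolation", "false");
        params.put("elideSetAutoCommits", "true");
        return params;
    }

    public static void main(String[] args) {
        // The resulting string could be handed to
        // java.sql.DriverManager.getConnection(url, user, password).
        System.out.println(buildUrl("localhost", "mahout", suggestedParams()));
    }
}
```

A `LinkedHashMap` is used so the parameters appear in the URL in the order they were added, which makes the result easy to compare against the list.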