Using JDBCDataModel with Mahout 0.7


First, create the database and the corresponding table in MySQL:

mysql> create database mahout;
Query OK, 1 row affected (0.00 sec)
mysql> use mahout;
Database changed
mysql> create table intro(
    -> uid varchar(20) not null,
    -> iid varchar(50) not null,
    -> val varchar(50) not null,
    -> time varchar(50) default null
    -> );

Note: the computation consumes a lot of resources, so it is advisable to add indexes and to set tuning parameters in my.ini (suggested values are listed in the API notes further down).

(Here the goal is only to get the feature working.)

Insert the data. (We use the data from the first recommendation example in Mahout in Action. Note that blank lines in the file must be removed, otherwise MySQL will complain that the columns cannot be null.)


mysql> load data local infile 'D:/intro.csv' replace into table intro fields terminated by ',' lines terminated by '\n' (@col1,@col2,@col3) set uid=@col1,iid=@col2,val=@col3;
Query OK, 21 rows affected (0.19 sec)
Records: 21  Deleted: 0  Skipped: 0  Warnings: 0

Check the data:

mysql> select * from intro;
+-----+-----+-----+------+
| uid | iid | val | time |
+-----+-----+-----+------+
| 1   | 101 | 5.0 | NULL |
| 1   | 102 | 3.0 | NULL |
| 1   | 103 | 2.5 | NULL |
| 2   | 101 | 2.0 | NULL |
| 2   | 102 | 2.5 | NULL |
| 2   | 103 | 5.0 | NULL |
| 2   | 104 | 2.0 | NULL |
| 3   | 101 | 2.5 | NULL |
| 3   | 104 | 4.0 | NULL |
| 3   | 105 | 4.5 | NULL |
| 3   | 107 | 5.0 | NULL |
| 4   | 101 | 5.0 | NULL |
| 4   | 103 | 3.0 | NULL |
| 4   | 104 | 4.5 | NULL |
| 4   | 106 | 4.0 | NULL |
| 5   | 101 | 4.0 | NULL |
| 5   | 102 | 3.0 | NULL |
| 5   | 103 | 2.0 | NULL |
| 5   | 104 | 4.0 | NULL |
| 5   | 105 | 3.5 | NULL |
| 5   | 106 | 4.0 | NULL |
+-----+-----+-----+------+
21 rows in set (0.00 sec)


Next comes the actual program. It is kept simple, mainly to demonstrate the functionality:

import java.util.List;

import org.apache.mahout.cf.taste.impl.model.jdbc.MySQLJDBCDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.JDBCDataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

import com.mysql.jdbc.jdbc2.optional.MysqlDataSource;

public class MysqlJDBCRecommender {
    public static void main(String[] args) throws Exception {
        // Plain (non-pooling) MySQL DataSource pointing at the mahout database
        MysqlDataSource dataSource = new MysqlDataSource();
        dataSource.setServerName("localhost");
        dataSource.setUser("root");
        dataSource.setPassword("toor");
        dataSource.setDatabaseName("mahout");

        // Map the intro table's columns to user ID, item ID, preference value and timestamp
        JDBCDataModel dataModel = new MySQLJDBCDataModel(dataSource, "intro", "uid", "iid", "val", "time");

        DataModel model = dataModel;
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);

        Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);

        // Recommend up to 3 items for user 1
        List<RecommendedItem> recommendations = recommender.recommend(1, 3);
        for (RecommendedItem recommendation : recommendations) {
            System.out.println(recommendation);
        }
    }
}

Output:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/D:/java%e9%a9%b1%e5%8a%a8/mahout0.7/slf4j-jcl-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/D:/java%e9%a9%b1%e5%8a%a8/mahout0.7/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/D:/java%e9%a9%b1%e5%8a%a8/mahout0.7/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
13/12/07 13:56:41 WARN jdbc.AbstractJDBCDataModel: You are not using ConnectionPoolDataSource. Make sure your DataSource pools connections to the database itself, or database performance will be severely reduced.
RecommendedItem[item:104, value:4.257081]
RecommendedItem[item:106, value:4.0]
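
The WARN line appears because MysqlDataSource is not a javax.sql.ConnectionPoolDataSource, which (if I read the warning correctly) is what AbstractJDBCDataModel checks for. A minimal sketch of one way to silence it, assuming Connector/J's MysqlConnectionPoolDataSource is on the classpath (it lives in the same com.mysql.jdbc.jdbc2.optional package as MysqlDataSource); note that this class mainly satisfies the interface check, so for real workloads you would still want an actual connection pool (DBCP, c3p0, or your container's pool) in front of MySQL:

import org.apache.mahout.cf.taste.impl.model.jdbc.MySQLJDBCDataModel;
import org.apache.mahout.cf.taste.model.JDBCDataModel;

import com.mysql.jdbc.jdbc2.optional.MysqlConnectionPoolDataSource;

public class PooledMysqlRecommenderSetup {
    public static void main(String[] args) throws Exception {
        // MysqlConnectionPoolDataSource implements javax.sql.ConnectionPoolDataSource,
        // so AbstractJDBCDataModel should no longer log the pooling warning.
        MysqlConnectionPoolDataSource dataSource = new MysqlConnectionPoolDataSource();
        dataSource.setServerName("localhost");
        dataSource.setUser("root");
        dataSource.setPassword("toor");
        dataSource.setDatabaseName("mahout");

        // Same table/column mapping as in the program above
        JDBCDataModel dataModel =
                new MySQLJDBCDataModel(dataSource, "intro", "uid", "iid", "val", "time");
        System.out.println("Users in the model: " + dataModel.getNumUsers());
    }
}

The rest of the recommender code (similarity, neighborhood, GenericUserBasedRecommender) stays exactly the same; only the DataSource construction changes.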


Suggestions from the MySQLJDBCDataModel API documentation

JDBCDataModel backed by a MySQL database and accessed via JDBC. It may work with other JDBC databases. By default, this class assumes that there is a DataSource available under the JNDI name "jdbc/taste", which gives access to a database with a "taste_preferences" table with the following schema:

user_id  item_id  preference
987      123      0.9
987      456      0.1
654      123      0.2
654      789      0.3

preference must have a type compatible with the Java float type. user_id and item_id should be compatible with long type (BIGINT). For example, the following command sets up a suitable table in MySQL, complete with primary key and indexes:

 CREATE TABLE taste_preferences (
   user_id BIGINT NOT NULL,
   item_id BIGINT NOT NULL,
   preference FLOAT NOT NULL,
   PRIMARY KEY (user_id, item_id),
   INDEX (user_id),
   INDEX (item_id)
 )
 

The table may optionally have a timestamp column whose type is compatible with Java long.
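
To tie this back to the program above: the same six-argument constructor can simply be pointed at this default schema. A minimal sketch, assuming the taste_preferences table was created as shown but with the optional timestamp column added; the column name "timestamp" is my assumption, so match it to whatever your table actually uses:

import org.apache.mahout.cf.taste.impl.model.jdbc.MySQLJDBCDataModel;
import org.apache.mahout.cf.taste.model.JDBCDataModel;

import com.mysql.jdbc.jdbc2.optional.MysqlDataSource;

public class TastePreferencesModel {
    public static void main(String[] args) throws Exception {
        MysqlDataSource dataSource = new MysqlDataSource();
        dataSource.setServerName("localhost");
        dataSource.setUser("root");
        dataSource.setPassword("toor");
        dataSource.setDatabaseName("mahout");

        // Default schema from the javadoc: taste_preferences(user_id, item_id, preference).
        // "timestamp" is an assumed name for the optional timestamp column.
        JDBCDataModel model = new MySQLJDBCDataModel(dataSource,
                "taste_preferences", "user_id", "item_id", "preference", "timestamp");
        System.out.println("Items in the model: " + model.getNumItems());
    }
}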

Performance Notes

See the notes in AbstractJDBCDataModel regarding using connection pooling. It's pretty vital to performance.

Some experimentation suggests that MySQL's InnoDB engine is faster than MyISAM for these kinds of applications. While MyISAM is the default and, I believe, generally considered the lighter-weight and faster of the two engines, my guess is the row-level locking of InnoDB helps here. Your mileage may vary.

Here are some key settings that can be tuned for MySQL, and suggested size for a data set of around 1 million elements:

  • innodb_buffer_pool_size=64M

  • myisam_sort_buffer_size=64M

  • query_cache_limit=64M

  • query_cache_min_res_unit=512K

  • query_cache_type=1

  • query_cache_size=64M

Also consider setting some parameters on the MySQL Connector/J driver:

 cachePreparedStatements = true
 cachePrepStmts = true
 cacheResultSetMetadata = true
 alwaysSendSetIsolation = false
 elideSetAutoCommits = true
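
These are Connector/J connection properties, so one common way to set them is as query parameters on the JDBC URL. A rough sketch, assuming your Connector/J version exposes setURL on the DataSource and uses the cachePrepStmts spelling (the list above shows both the old and new names for that property):

import com.mysql.jdbc.jdbc2.optional.MysqlConnectionPoolDataSource;

public class TunedConnectorJDataSource {
    public static MysqlConnectionPoolDataSource create() {
        MysqlConnectionPoolDataSource dataSource = new MysqlConnectionPoolDataSource();
        // All connection settings are carried in the URL here: host, database,
        // and the Connector/J tuning properties suggested by the javadoc above.
        dataSource.setURL("jdbc:mysql://localhost:3306/mahout"
                + "?cachePrepStmts=true"
                + "&cacheResultSetMetadata=true"
                + "&alwaysSendSetIsolation=false"
                + "&elideSetAutoCommits=true");
        dataSource.setUser("root");
        dataSource.setPassword("toor");
        return dataSource;
    }
}

The resulting DataSource can then be passed to MySQLJDBCDataModel exactly as in the earlier examples.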
 

Thanks to Amila Jayasooriya for contributing MySQL notes above as part of Google Summer of Code 2007.



This article was reposted from the 51CTO blog of 拖鞋崽. Original link: http://blog.51cto.com/1992mrwang/1337759
