背景
PolarDB 的云原生存算分离架构, 具备低廉的数据存储、高效扩展弹性、高速多机并行计算能力、高速数据搜索和处理; PolarDB与计算算法结合, 将实现双剑合璧, 推动业务数据的价值产出, 将数据变成生产力.
本文将介绍PolarDB结合madlib, 让PolarDB具备机器学习功能.
madlib库无疑是大而全的数据库机器学习库,
- Deep Learning
- Graph
- Model Selection
- Sampling
- Statistics
- Supervised Learning
- Time Series Analysis
- Unsupervised Learning
将madlib安装到PolarDB, 让PolarDB具备机器学习功能
这个例子直接在PolarDB容器中部署pgcat.
PolarDB部署请参考:
进入PolarDB环境
docker exec -it 67e1eed1b4b6 bash
下载madlib rpm
https://cwiki.apache.org/confluence/display/MADLIB/Installation+Guide
wget https://dist.apache.org/repos/dist/release/madlib/1.20.0/apache-madlib-1.20.0-CentOS7.rpm
安装madlib
sudo rpm -ivh apache-madlib-1.20.0-CentOS7.rpm
加载madlib到PolarDB数据库对应DB中(非extension管理)
/usr/local/madlib/bin/madpack -s madlib -p postgres -c [user[/password]@][host][:port][/database] install
or
/usr/local/madlib/bin/madpack -s madlib -p postgres install
测试madlib安装正确性
/usr/local/madlib/bin/madpack -s madlib -p postgres install-check
使用madlib
[postgres@67e1eed1b4b6 ~]$ psql -h 127.0.0.1
psql (11.9)
Type "help" for help.
postgres=# set search_path =madlib, "$user", public;
SET
postgres=# \dT+
List of data types
Schema | Name | Internal name | Size | Elements | Owner | Access privileges | Description
--------+-----------------------------------+-----------------------------------+-------+----------+----------+-------------------+-------------
madlib | args_and_value_double | args_and_value_double | tuple | | postgres | |
madlib | __arima_lm_result | __arima_lm_result | tuple | | postgres | |
madlib | __arima_lm_stat_result | __arima_lm_stat_result | tuple | | postgres | |
madlib | __arima_lm_sum_result | __arima_lm_sum_result | tuple | | postgres | |
madlib | assoc_rules_results | assoc_rules_results | tuple | | postgres | |
madlib | bytea8 | bytea8 | var | | postgres | |
madlib | _cat_levels_type | _cat_levels_type | tuple | | postgres | |
madlib | chi2_test_result | chi2_test_result | tuple | | postgres | |
madlib | closest_column_result | closest_column_result | tuple | | postgres | |
madlib | closest_columns_result | closest_columns_result | tuple | | postgres | |
madlib | __clustered_agg_result | __clustered_agg_result | tuple | | postgres | |
madlib | __clustered_lin_result | __clustered_lin_result | tuple | | postgres | |
madlib | __clustered_log_result | __clustered_log_result | tuple | | postgres | |
madlib | __clustered_mlog_result | __clustered_mlog_result | tuple | | postgres | |
madlib | complex | complex | tuple | | postgres | |
madlib | __coxph_a_b_result | __coxph_a_b_result | tuple | | postgres | |
madlib | __coxph_cl_var_result | __coxph_cl_var_result | tuple | | postgres | |
madlib | coxph_result | coxph_result | tuple | | postgres | |
madlib | coxph_step_result | coxph_step_result | tuple | | postgres | |
madlib | cox_prop_hazards_result | cox_prop_hazards_result | tuple | | postgres | |
madlib | __cox_resid_stat_result | __cox_resid_stat_result | tuple | | postgres | |
madlib | __dbscan_edge | __dbscan_edge | tuple | | postgres | |
madlib | __dbscan_losses | __dbscan_losses | tuple | | postgres | |
madlib | __dbscan_record | __dbscan_record | tuple | | postgres | |
madlib | dense_linear_solver_result | dense_linear_solver_result | tuple | | postgres | |
madlib | __elastic_net_result | __elastic_net_result | tuple | | postgres | |
madlib | _flattened_tree | _flattened_tree | tuple | | postgres | |
madlib | f_test_result | f_test_result | tuple | | postgres | |
madlib | __glm_result_type | __glm_result_type | tuple | | postgres | |
madlib | _grp_state_type | _grp_state_type | tuple | | postgres | |
madlib | heteroskedasticity_test_result | heteroskedasticity_test_result | tuple | | postgres | |
madlib | kmeans_result | kmeans_result | tuple | | postgres | |
madlib | kmeans_state | kmeans_state | tuple | | postgres | |
madlib | ks_test_result | ks_test_result | tuple | | postgres | |
madlib | lda_result | lda_result | tuple | | postgres | |
madlib | lincrf_result | lincrf_result | tuple | | postgres | |
madlib | linear_svm_result | linear_svm_result | tuple | | postgres | |
madlib | linregr_result | linregr_result | tuple | | postgres | |
madlib | lmf_result | lmf_result | tuple | | postgres | |
madlib | __logregr_result | __logregr_result | tuple | | postgres | |
madlib | marginal_logregr_result | marginal_logregr_result | tuple | | postgres | |
madlib | marginal_mlogregr_result | marginal_mlogregr_result | tuple | | postgres | |
madlib | margins_result | margins_result | tuple | | postgres | |
madlib | matrix_result | matrix_result | tuple | | postgres | |
madlib | __mlogregr_cat_coef | __mlogregr_cat_coef | tuple | | postgres | |
madlib | mlogregr_result | mlogregr_result | tuple | | postgres | |
madlib | mlogregr_summary_result | mlogregr_summary_result | tuple | | postgres | |
madlib | mlp_result | mlp_result | tuple | | postgres | |
madlib | __multinom_result_type | __multinom_result_type | tuple | | postgres | |
madlib | mw_test_result | mw_test_result | tuple | | postgres | |
madlib | one_way_anova_result | one_way_anova_result | tuple | | postgres | |
madlib | __ordinal_result_type | __ordinal_result_type | tuple | | postgres | |
madlib | path_match_result | path_match_result | tuple | | postgres | |
madlib | _pivotalr_lda_model | _pivotalr_lda_model | tuple | | postgres | |
madlib | _prune_result_type | _prune_result_type | tuple | | postgres | |
madlib | __rb_coxph_hs_result | __rb_coxph_hs_result | tuple | | postgres | |
madlib | __rb_coxph_result | __rb_coxph_result | tuple | | postgres | |
madlib | residual_norm_result | residual_norm_result | tuple | | postgres | |
madlib | robust_linregr_result | robust_linregr_result | tuple | | postgres | |
madlib | robust_logregr_result | robust_logregr_result | tuple | | postgres | |
madlib | robust_mlogregr_result | robust_mlogregr_result | tuple | | postgres | |
madlib | sparse_linear_solver_result | sparse_linear_solver_result | tuple | | postgres | |
madlib | summary_result | summary_result | tuple | | postgres | |
madlib | __svd_bidiagonal_matrix_result | __svd_bidiagonal_matrix_result | tuple | | postgres | |
madlib | __svd_lanczos_result | __svd_lanczos_result | tuple | | postgres | |
madlib | __svd_vec_mat_mult_result | __svd_vec_mat_mult_result | tuple | | postgres | |
madlib | svec | svec | var | | postgres | |
madlib | _tree_result_type | _tree_result_type | tuple | | postgres | |
madlib | t_test_result | t_test_result | tuple | | postgres | |
madlib | __utils_scales | __utils_scales | tuple | | postgres | |
madlib | wsr_test_result | wsr_test_result | tuple | | postgres | |
madlib | xgb_gridsearch_train_results_type | xgb_gridsearch_train_results_type | tuple | | postgres | |
public | vector | vector | var | | postgres | |
(73 rows)
postgres=# \do+
List of operators
Schema | Name | Left arg type | Right arg type | Result type | Function | Description
--------+------+--------------------+--------------------+------------------+-------------------------------+-------------
madlib | %*% | double precision[] | double precision[] | double precision | madlib.svec_dot |
madlib | %*% | double precision[] | svec | double precision | madlib.svec_dot |
madlib | %*% | svec | double precision[] | double precision | madlib.svec_dot |
madlib | %*% | svec | svec | double precision | madlib.svec_dot |
madlib | * | double precision[] | double precision[] | svec | float8arr_mult_float8arr |
madlib | * | double precision[] | svec | svec | float8arr_mult_svec |
madlib | * | svec | double precision[] | svec | svec_mult_float8arr |
madlib | * | svec | svec | svec | svec_mult |
madlib | *|| | integer | svec | svec | svec_concat_replicate |
madlib | + | double precision[] | double precision[] | svec | float8arr_plus_float8arr |
madlib | + | double precision[] | svec | svec | float8arr_plus_svec |
madlib | + | svec | double precision[] | svec | svec_plus_float8arr |
madlib | + | svec | svec | svec | svec_plus |
madlib | - | double precision[] | double precision[] | svec | float8arr_minus_float8arr |
madlib | - | double precision[] | svec | svec | float8arr_minus_svec |
madlib | - | svec | double precision[] | svec | svec_minus_float8arr |
madlib | - | svec | svec | svec | svec_minus |
madlib | / | double precision[] | double precision[] | svec | float8arr_div_float8arr |
madlib | / | double precision[] | svec | svec | float8arr_div_svec |
madlib | / | svec | double precision[] | svec | svec_div_float8arr |
madlib | / | svec | svec | svec | svec_div |
madlib | < | svec | svec | boolean | svec_lt |
madlib | <= | svec | svec | boolean | svec_le |
madlib | <> | svec | svec | boolean | svec_ne |
madlib | = | svec | svec | boolean | svec_eq |
madlib | == | svec | svec | boolean | svec_eq |
madlib | > | svec | svec | boolean | svec_gt |
madlib | >= | svec | svec | boolean | svec_ge |
madlib | ^ | svec | svec | svec | svec_pow |
madlib | || | svec | svec | svec | svec_concat |
public | + | vector | vector | vector | vector_add |
public | - | vector | vector | vector | vector_sub |
public | < | vector | vector | boolean | vector_lt |
public | <#> | vector | vector | double precision | vector_negative_inner_product |
public | <-> | vector | vector | double precision | l2_distance |
public | <= | vector | vector | boolean | vector_le |
public | <=> | vector | vector | double precision | cosine_distance |
public | <> | vector | vector | boolean | vector_ne |
public | = | vector | vector | boolean | vector_eq |
public | > | vector | vector | boolean | vector_gt |
public | >= | vector | vector | boolean | vector_ge |
(41 rows)
参考
https://madlib.apache.org/docs/latest/index.html
https://cwiki.apache.org/confluence/display/MADLIB/Installation+Guide