Spark Reads HBase Data for Analysis and Stores the Results in an HBase Results Table
I. Preparation
1. Prepare two working HBase clusters (new cluster: 222 | old cluster: 226).
2. The hosts file on each cluster must contain the other cluster's host addresses.
3. ZooKeeper can be deployed as two separate clusters, or a single ZooKeeper cluster can manage both HBase clusters; the one thing you cannot do is manage them with HBase's bundled ZooKeeper.
4. Keep the versions of Hadoop, HBase, and the other components identical on both clusters.
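Point 3 can be expressed in hbase-site.xml. A minimal sketch, assuming an external three-node ensemble (the zk*.example.com hostnames are placeholders); the same settings go on both clusters:

```xml
<!-- Run in distributed mode against an external ZooKeeper ensemble,
     not the one bundled with HBase. -->
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
</property>
```

Setting HBASE_MANAGES_ZK=false in hbase-env.sh additionally keeps start-hbase.sh from launching the bundled ZooKeeper.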
II. Configuration
1. Edit hbase-site.xml on the 222 cluster and add the following:
<property>
    <name>hbase.replication</name>
    <value>true</value>
</property>
2. Add a peer on the 222 cluster:
./hbase shell
> add_peer '1', '172.16.205.226:2181:/hbase'
The command reports an error, but the error does not affect the result. If you do not want the error message, go into ZooKeeper, delete the peer id, and then rerun the command.
3. Start replication:
> start_replication
This command also reports an error; ignore it. To check whether replication is actually enabled, likewise inspect the state node in ZooKeeper.
4. Create the tables. Create the same table on both clusters (the structures must be identical).
5. On 226, add the replication attribute to the column family and refresh the table schema:
disable 'your_table'
alter 'your_table', {NAME => 'family_name', REPLICATION_SCOPE => '1'}
enable 'your_table'
6. Test data synchronization:
Put a row into HBase on 226.
It should then show up in a scan on 222.
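A sketch of that smoke test in the hbase shell (the table and family names are the placeholders used above; the rowkey and value are illustrative):

```
# on 226 (the source cluster)
> put 'your_table', 'row1', 'family_name:c1', 'hello'

# on 222 (the destination cluster), a few moments later
> scan 'your_table'
ROW      COLUMN+CELL
 row1    column=family_name:c1, value=hello
```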
Step 1: Maven configuration
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>1.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.11</artifactId>
    <version>1.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>1.6.0</version>
</dependency>
<dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.11.0</version>
</dependency>
<dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-compiler</artifactId>
    <version>2.11.0</version>
</dependency>
<dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-reflect</artifactId>
    <version>2.11.0</version>
</dependency>
第二步:Spring配置
class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer">
classpath:jdbc.properties
classpath:spark.properties
Step 3: Add a properties file, spark.properties
spark.master=local
spark.url=jdbc:mysql://192.168.0.202:3306/spark?useUnicode=true&characterEncoding=UTF-8
spark.table=testtable
spark.username=root
spark.password=mysql
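Before wiring these keys through Spring, it helps to see what the service below builds from them. A minimal, self-contained sketch (the properties are inlined here instead of read from the classpath; connectionProps mirrors the getConnectionProperties method in Step 4):

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class SparkProps {

    // Parse a properties-format string (stands in for loading
    // spark.properties from the classpath).
    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return p;
    }

    // Mirrors getConnectionProperties() in the service below: maps the
    // spark.* keys onto the property names the JDBC reader expects.
    static Properties connectionProps(Properties cfg) {
        Properties p = new Properties();
        p.setProperty("dbtable", cfg.getProperty("spark.table"));
        p.setProperty("user", cfg.getProperty("spark.username"));
        p.setProperty("password", cfg.getProperty("spark.password"));
        return p;
    }

    public static void main(String[] args) {
        Properties cfg = parse(
                "spark.master=local\n"
              + "spark.table=testtable\n"
              + "spark.username=root\n"
              + "spark.password=mysql\n");
        System.out.println(connectionProps(cfg).getProperty("dbtable")); // testtable
    }
}
```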
Step 4: Write the code
/**
*
*/
package com.harleycorp.service.impl;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import javax.annotation.Resource;
import org.apache.log4j.Logger;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.SaveMode;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;
import com.harleycorp.pojo.SparkUser;
import com.harleycorp.service.ISparkUpperService;
/**
* @author kevin
*
*/
@Service
public class SparkUpperServiceImpl implements ISparkUpperService {
private Logger logger = Logger.getLogger(SparkUpperServiceImpl.class);
@Value("${spark.master}")
public String master ; // = "local"
@Value("${spark.url}")
public String url ;//= "jdbc:mysql://192.168.0.202:3306/spark?useUnicode=true&characterEncoding=UTF-8";
@Value("${spark.table}")
public String table ; //= "testtable"
@Value("${spark.username}")
public String username ;// = "root";
//@Value("${spark.password}")
public String password = "mysql";
@Resource
public SQLContext sqlContext;
@Resource
public JavaSparkContext sc;
public Properties getConnectionProperties(){
Properties connectionProperties = new Properties();
connectionProperties.setProperty("dbtable",table);
connectionProperties.setProperty("user",username); // database user
connectionProperties.setProperty("password",password); // database user's password
return connectionProperties;
}
public String query() {
logger.info("=======================this url:"+this.url);
logger.info("=======================this table:"+this.table);
logger.info("=======================this master:"+this.master);
logger.info("=======================this username:"+this.username);
logger.info("=======================this password:"+this.password);
DataFrame df = null;
//replace the connection settings below with your actual configuration
df = sqlContext.read().jdbc(url,table, getConnectionProperties());
df.registerTempTable(table);
String result = sqlContext.sql("select * from " + table).javaRDD().collect().toString();
logger.info("=====================spark mysql:"+result);
return result;
}
public String queryByCon(){
logger.info("=======================this url:"+this.url);
logger.info("=======================this table:"+this.table);
logger.info("=======================this master:"+this.master);
logger.info("=======================this username:"+this.username);
logger.info("=======================this password:"+this.password);
DataFrame df = sqlContext.read().jdbc(url, table, new String[]{"password=000000"}, getConnectionProperties());
String result = df.collectAsList().toString();
logger.info("=====================spark mysql:"+result);
return result;
}
public void add(){
List<SparkUser> list = new ArrayList<SparkUser>();
SparkUser us = new SparkUser();
us.setUsername("kevin");
us.setPassword("000000");
list.add(us);
SparkUser us2 = new SparkUser();
us2.setUsername("Lisa");
us2.setPassword("666666");
list.add(us2);
JavaRDD<SparkUser> personsRDD = sc.parallelize(list);
DataFrame userDf = sqlContext.createDataFrame(personsRDD, SparkUser.class);
userDf.write().mode(SaveMode.Append).jdbc(url, table, getConnectionProperties());
}
}
Step 5: Call it from JUnit
package com.harleycorp.testmybatis;
import javax.annotation.Resource;
import org.apache.log4j.Logger;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;
import com.harleycorp.service.ISparkUpperService;
@RunWith(SpringJUnit4ClassRunner.class) // run the tests with the Spring-aware JUnit runner
@ContextConfiguration(locations = {"classpath:spring-mybatis.xml"})
public class TestSpark {
private static Logger logger = Logger.getLogger(TestSpark.class);
@Resource
private ISparkUpperService sparkUpperService = null;
@Test
public void test1(){
sparkUpperService.query();
}
@Test
public void test2(){
sparkUpperService.add();
}
@Test
public void test3(){
sparkUpperService.queryByCon();
}
}
Step 6: Run
The Cloud HBase team provides a GitHub project showing how to use the three approaches to develop Spark programs that analyze HBase; the project is at https://github.com/lw309637554/alicloud-hbase-spark-examples?spm=a2c4e.11153940.blogcont573569.14.320377b4U14MDa
Dependencies: download the Cloud HBase and Cloud Phoenix client packages.
Analyzing HFiles:
First enable HDFS access for Cloud HBase (see the documentation).
In the hbase shell, take a snapshot of the source table: snapshot 'sourceTable', 'snapshotName'
Configure your own hdfs-site.xml in the project, then analyze the snapshot table by reading HDFS directly.
The concrete examples:
RDD API: org.apache.spark.hbase.NativeRDDAnalyze
SQL API: org.apache.spark.sql.execution.datasources.hbase.SqlAnalyze
HFile analysis: org.apache.spark.hfile.SparkAnalyzeHFILE
I. Steps for Porting the cURL Tool to Android
1. Modify the .mk file in the cURL source tree. The Android.mk in the source tree builds the static library libcurl.a by default; make the following changes so that it builds a shared library instead:
LOCAL_PRELINK_MODULE := false
LOCAL_MODULE:= libcurl
LOCAL_MODULE_TAGS := optional
ALL_PREBUILT += $(LOCAL_PATH)/NOTICE
$(LOCAL_PATH)/NOTICE: $(LOCAL_PATH)/COPYING | $(ACP)
$(copy-file-to-target)
include $(BUILD_SHARED_LIBRARY)
2. Set up the build environment (cd into the same directory as Android.mk and paste the following into the console, or save it as a shell script and run it). Adjust the ANDROID_HOME and NDK_HOME paths to match your own source tree.
ANDROID_HOME=/home/zhoulc/android/ && \
NDK_HOME=/home/zhoulc/android/ndk && \
PATH="$ANDROID_HOME/prebuilt/linux-x86/toolchain/arm-eabi-4.4.0/bin:$PATH" \
./configure --host=arm-linux CC=arm-eabi-gcc --with-random=/dev/urandom \
CPPFLAGS="-I$NDK_HOME/platforms/android-8/arch-arm/usr/include \
-I $ANDROID_HOME/external/curl/include/ \
-I $ANDROID_HOME/external/curl/3rd/include \
-I $ANDROID_HOME/external/curl \
-I $ANDROID_HOME/out/target/product/generic/obj/STATIC_LIBRARIES/libcurl_intermediates \
-I $ANDROID_HOME/dalvik/libnativehelper/include/nativehelper \
-I $ANDROID_HOME/system/core/include \
-I $ANDROID_HOME/hardware/libhardware/include \
-I $ANDROID_HOME/hardware/libhardware_legacy/include \
-I $ANDROID_HOME/hardware/ril/include \
-I $ANDROID_HOME/dalvik/libnativehelper/include \
-I $ANDROID_HOME/frameworks/base/include \
-I $ANDROID_HOME/frameworks/base/opengl/include \
-I $ANDROID_HOME/frameworks/base/native/include \
-I $ANDROID_HOME/external/skia/include \
-I $ANDROID_HOME/out/target/product/generic/obj/include \
-I $ANDROID_HOME/bionic/libc/arch-arm/include \
-I $ANDROID_HOME/bionic/libc/include \
-I $ANDROID_HOME/bionic/libstdc++/include \
-I $ANDROID_HOME/bionic/libc/kernel/common \
-I $ANDROID_HOME/bionic/libc/kernel/arch-arm \
-I $ANDROID_HOME/bionic/libm/include \
-I $ANDROID_HOME/bionic/libm/include/arch/arm \
-I $ANDROID_HOME/bionic/libthread_db/include \
-include $ANDROID_HOME/system/core/include/arch/linux-arm/AndroidConfig.h \
-I $ANDROID_HOME/system/core/include/arch/linux-arm/ \
-D__ARM_ARCH_5__ -D__ARM_ARCH_5T__ -D__ARM_ARCH_5E__ -D__ARM_ARCH_5TE__ -DANDROID -DNDEBUG -DNDEBUG -DHAVE_CONFIG_H" \
CFLAGS="-fno-exceptions -Wno-multichar -msoft-float -fpic -ffunction-sections \
-funwind-tables -fstack-protector -Wa,--noexecstack -Werror=format-security \
-fno-short-enums -march=armv5te -mtune=xscale -Wno-psabi -mthumb-interwork \
-fmessage-length=0 -W -Wall -Wno-unused -Winit-self -Wpointer-arith \
-Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point \
-g -Wstrict-aliasing=2 -finline-functions -fno-inline-functions-called-once \
-fgcse-after-reload -frerun-cse-after-loop -frename-registers -UDEBUG \
-mthumb -Os -fomit-frame-pointer -fno-strict-aliasing -finline-limit=64 \
-Wpointer-arith -Wwrite-strings -Wunused -Winline -Wnested-externs \
-Wmissing-declarations -Wmissing-prototypes -Wno-long-long -Wfloat-equal \
-Wno-multichar -Wsign-compare -Wno-format-nonliteral -Wendif-labels \
-Wstrict-prototypes -Wdeclaration-after-statement -Wno-system-headers" \
LIBS="-nostdlib -Bdynamic -Wl,-T,$ANDROID_HOME/build/core/armelf.x \
-Wl,-dynamic-linker,/system/bin/linker -Wl,--gc-sections -Wl,-z,nocopyreloc \
-L$ANDROID_HOME/out/target/product/generic/obj/lib -Wl,-z,noexecstack \
-Wl,-rpath-link=$ANDROID_HOME/out/target/product/generic/obj/lib \
-lc -llog -lcutils -lstdc++ \
-Wl,--no-undefined $ANDROID_HOME/prebuilt/linux-x86/toolchain/arm-eabi-4.4.0/lib/gcc/arm-eabi/4.4.0/libgcc.a \
$ANDROID_HOME/out/target/product/generic/obj/lib/crtend_android.o \
-lm $ANDROID_HOME/out/target/product/generic/obj/lib/crtbegin_dynamic.o \
-L$ANDROID_HOME/external/curl/3rd/libs"
3. Build the libcurl.so library
cd into the android/external/curl source directory and run mm, which builds libcurl.so.
4. Write a test case plus an Android.mk, and build an executable
Create a test case, curl_test.cpp:
#include <stdio.h>
#include <curl/curl.h>

int main() {
    CURL *curl;
    CURLcode res;
    curl_global_init(CURL_GLOBAL_ALL);
    curl = curl_easy_init();
    if (curl) {
        curl_easy_setopt(curl, CURLOPT_URL, "http://www.baidu.com/");
        res = curl_easy_perform(curl);
        if (res != CURLE_OK) {
            printf("curl error: %d\n", res);
        }
        curl_easy_cleanup(curl);
    }
    curl_global_cleanup();
    return 0;
}
In the same directory, write an Android.mk that builds the curl_test executable:
LOCAL_PATH := $(call my-dir)
include $(CLEAR_VARS)
LOCAL_C_INCLUDES += \
    $(TOP)/external/curl/include/
LOCAL_SRC_FILES := curl_test.cpp
LOCAL_SHARED_LIBRARIES := libcurl
LOCAL_MODULE_TAGS := optional
LOCAL_MODULE := curl_test
include $(BUILD_EXECUTABLE)
This produces the curl_test executable.
5. Run and check the test result
Run the test case curl_test.
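Running it on a device can be sketched with adb (a hedged sketch; the output paths depend on your lunch target and are illustrative):

```
# push the library and the executable to the device, then run it
adb remount
adb push $ANDROID_PRODUCT_OUT/system/lib/libcurl.so /system/lib/
adb push $ANDROID_PRODUCT_OUT/system/bin/curl_test /data/local/tmp/
adb shell chmod 755 /data/local/tmp/curl_test
adb shell /data/local/tmp/curl_test
```

On success the fetched page body is printed to stdout; a `curl error: N` line indicates the transfer failed.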
6. (Supplement) Porting libcurl to Android 4.0 requires two changes.
1) Change the output path. It defaults to out/target/product/generic, but the actual path varies with the system (it depends on the lunch selection), so switch to the build system's global variable:
replace $ANDROID_HOME/out/target/product/generic with $ANDROID_PRODUCT_OUT.
ANDROID_HOME_CURL=../../ && \
NDK_HOME_CURL=../../prebuilt/ndk && \
PATH="$ANDROID_HOME_CURL/prebuilt/linux-x86/toolchain/arm-eabi-4.4.0/bin:$PATH" \
./configure --host=arm-linux CC=gcc --with-random=/dev/urandom \
CPPFLAGS="-I$NDK_HOME_CURL/platforms/android-8/arch-arm/usr/include \
-I $ANDROID_HOME_CURL/external/curl/include/ \
-I $ANDROID_HOME_CURL/external/curl/3rd/include \
-I $ANDROID_HOME_CURL/external/curl \
-I $ANDROID_HOME_CURL/out/target/product/generic/obj/STATIC_LIBRARIES/libcurl_intermediates \
-I $ANDROID_HOME_CURL/dalvik/libnativehelper/include/nativehelper \
-I $ANDROID_HOME_CURL/system/core/include \
-I $ANDROID_HOME_CURL/hardware/libhardware/include \
-I $ANDROID_HOME_CURL/hardware/libhardware_legacy/include \
-I $ANDROID_HOME_CURL/hardware/ril/include \
-I $ANDROID_HOME_CURL/dalvik/libnativehelper/include \
-I $ANDROID_HOME_CURL/frameworks/base/include \
-I $ANDROID_HOME_CURL/frameworks/base/opengl/include \
-I $ANDROID_HOME_CURL/frameworks/base/native/include \
-I $ANDROID_HOME_CURL/external/skia/include \
-I $ANDROID_HOME_CURL/out/target/product/generic/obj/include \
-I $ANDROID_HOME_CURL/bionic/libc/arch-arm/include \
-I $ANDROID_HOME_CURL/bionic/libc/include \
-I $ANDROID_HOME_CURL/bionic/libstdc++/include \
-I $ANDROID_HOME_CURL/bionic/libc/kernel/common \
-I $ANDROID_HOME_CURL/bionic/libc/kernel/arch-arm \
-I $ANDROID_HOME_CURL/bionic/libm/include \
-I $ANDROID_HOME_CURL/bionic/libm/include/arch/arm \
-I $ANDROID_HOME_CURL/bionic/libthread_db/include \
-include $ANDROID_HOME_CURL/system/core/include/arch/linux-arm/AndroidConfig.h \
-I $ANDROID_HOME_CURL/system/core/include/arch/linux-arm/ \
-D__ARM_ARCH_5__ -D__ARM_ARCH_5T__ -D__ARM_ARCH_5E__ -D__ARM_ARCH_5TE__ -DANDROID -DNDEBUG -DNDEBUG -DHAVE_CONFIG_H" \
CFLAGS="-fno-exceptions -Wno-multichar -msoft-float -fpic -ffunction-sections \
-funwind-tables -fstack-protector -Wa,--noexecstack -Werror=format-security \
-fno-short-enums -march=armv5te -mtune=xscale -Wno-psabi -mthumb-interwork \
-fmessage-length=0 -W -Wall -Wno-unused -Winit-self -Wpointer-arith \
-Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point \
-g -Wstrict-aliasing=2 -finline-functions -fno-inline-functions-called-once \
-fgcse-after-reload -frerun-cse-after-loop -frename-registers -UDEBUG \
-mthumb -Os -fomit-frame-pointer -fno-strict-aliasing -finline-limit=64 \
-Wpointer-arith -Wwrite-strings -Wunused -Winline -Wnested-externs \
-Wmissing-declarations -Wmissing-prototypes -Wno-long-long -Wfloat-equal \
-Wno-multichar -Wsign-compare -Wno-format-nonliteral -Wendif-labels \
-Wstrict-prototypes -Wdeclaration-after-statement -Wno-system-headers" \
LIBS="-nostdlib -Bdynamic -Wl,-T,$ANDROID_HOME_CURL/build/core/armelf.x \
-Wl,-dynamic-linker,/system/bin/linker -Wl,--gc-sections -Wl,-z,nocopyreloc \
-L$ANDROID_PRODUCT_OUT/obj/lib -Wl,-z,noexecstack \
-Wl,-rpath-link=$ANDROID_PRODUCT_OUT/obj/lib \
-lc -llog -lcutils -lstdc++ \
-Wl,--no-undefined $ANDROID_HOME_CURL/prebuilt/linux-x86/toolchain/arm-eabi-4.4.0/lib/gcc/arm-eabi/4.4.0/libgcc.a \
$ANDROID_PRODUCT_OUT/obj/lib/crtend_android.o \
-lm $ANDROID_PRODUCT_OUT/obj/lib/crtbegin_dynamic.o \
-L$ANDROID_HOME_CURL/external/curl/3rd/libs"
2) Modify Android.mk:
comment out everything related to the ALL_PREBUILT module.
1) The Servlet interface: provides the most basic servlet lifecycle management, chiefly init (construction), service (performing the service), and destroy (destruction). The Servlet interface is therefore the life and soul of every servlet class.
2) The ServletConfig interface: provides the most basic access to a servlet's configuration, chiefly getServletName (gets the servlet's registered name), getInitParameter (gets a servlet init parameter), and getServletContext (gets a handle to the servlet environment). It is only an interface, so it contains no ServletConfig object itself; it merely specifies the functionality an implementation must provide.
3) The GenericServlet class: a real class that implements both the Servlet and ServletConfig interfaces. It is the framework class for concrete servlets: all servlet classes, including HttpServlet, inherit from it. It holds the actual ServletConfig data object, and the information in that object (the servlet's configuration) can be read through the ServletConfig interface methods.
The lifecycle runs as follows:
1) The web container reads the web.xml configuration and creates a ServletConfig object to hold that configuration.
2) The container then calls the servlet class's no-argument constructor, instantiating an empty servlet object (nothing initialized yet).
3) Only then comes initialization: the container calls the Servlet interface's init(config: ServletConfig) method, passing in the ServletConfig object saved earlier. init contains just two statements:
public void init(ServletConfig config) throws ServletException {
this.config = config;
this.init();
}
i. The first statement simply stores the config object.
ii. The second statement is not a call to a base-class init: the current method is itself the one inherited from the Servlet interface, while the no-argument init is GenericServlet's own method, defined specifically for initializing custom data members. For example, if you extend HttpServlet with a MyServlet class that defines a String myName member, that member's initialization belongs in the no-argument init that GenericServlet defines.
4) The container accepts a request, creates the request and response objects, and passes them into service to be handled.
5) Finally, under certain conditions (the server shuts down, or the servlet is actively closed), destroy is called to shut the servlet down and reclaim its space.
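The two-step init described above can be sketched without the servlet API jar; the Config and MiniGenericServlet types below are hypothetical stand-ins for javax.servlet.ServletConfig and GenericServlet:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for javax.servlet.ServletConfig (illustration only).
interface Config {
    String getInitParameter(String name);
}

// Hypothetical stand-in for GenericServlet, showing the two-step init.
abstract class MiniGenericServlet {
    private Config config;

    // Mirrors Servlet.init(ServletConfig).
    public void init(Config config) {
        this.config = config; // step 1: keep the container-built config
        this.init();          // step 2: hook for subclass data members
    }

    // The no-argument init GenericServlet defines for subclasses to override.
    public void init() { }

    public Config getConfig() { return config; }
}

class MyServlet extends MiniGenericServlet {
    String myName; // custom member, initialized in the no-argument init

    @Override
    public void init() {
        myName = getConfig().getInitParameter("name");
    }
}

public class InitDemo {
    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("name", "kevin");
        MyServlet s = new MyServlet();
        s.init(params::get); // the container passes the config built from web.xml
        System.out.println(s.myName); // kevin
    }
}
```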