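1. Introduction
flume-ng is a distributed, highly reliable, and efficient log collection system. The name refers to the new version of Flume: "ng" stands for "next generation". At the time of writing, flume-ng 1.4 is the latest release. flume-ng differs substantially from the original Flume. I had been running Flume 0.9 and had never upgraded to flume-ng. A recent project required the upgrade, and I ran into a few problems along the way, which I am recording here to share.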
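2. Version Notes
3. Installation Steps
Downloading, unpacking, installing the JDK, and setting environment variables are already covered by plenty of introductory articles, so I will not repeat them here. One point is worth calling out: flume-ng does not depend on ZooKeeper, so there is nothing to configure for it.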
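4. The flume-ng Bug
After installation, running flume-ng printed error messages. The cause was the flume-ng shell script itself. The complete modified script is posted below; the line following each #zhangzl marker is a line that had to be changed. The full script: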
#!/bin/bash
#
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#

################################
# constants
################################

FLUME_AGENT_CLASS="org.apache.flume.node.Application"
FLUME_AVRO_CLIENT_CLASS="org.apache.flume.client.avro.AvroCLIClient"
FLUME_VERSION_CLASS="org.apache.flume.tools.VersionInfo"
FLUME_TOOLS_CLASS="org.apache.flume.tools.FlumeToolsMain"

CLEAN_FLAG=1
################################
# functions
################################

info() {
  if [ ${CLEAN_FLAG} -ne 0 ]; then
    local msg=$1
    echo "Info: $msg" >&2
  fi
}

warn() {
  if [ ${CLEAN_FLAG} -ne 0 ]; then
    local msg=$1
    echo "Warning: $msg" >&2
  fi
}

error() {
  local msg=$1
  local exit_code=$2

  echo "Error: $msg" >&2

  if [ -n "$exit_code" ] ; then
    exit $exit_code
  fi
}

# If avail, add Hadoop paths to the FLUME_CLASSPATH and to the
# FLUME_JAVA_LIBRARY_PATH env vars.
# Requires Flume jars to already be on FLUME_CLASSPATH.
add_hadoop_paths() {
  local HADOOP_IN_PATH=$(PATH="${HADOOP_HOME:-${HADOOP_PREFIX}}/bin:$PATH" \
      which hadoop 2>/dev/null)

  if [ -f "${HADOOP_IN_PATH}" ]; then
    info "Including Hadoop libraries found via ($HADOOP_IN_PATH) for HDFS access"

    # determine hadoop java.library.path and use that for flume
    local HADOOP_CLASSPATH=""
    local HADOOP_JAVA_LIBRARY_PATH=$(HADOOP_CLASSPATH="$FLUME_CLASSPATH" \
        ${HADOOP_IN_PATH} org.apache.flume.tools.GetJavaProperty \
        java.library.path)

    # look for the line that has the desired property value
    # (considering extraneous output from some GC options that write to stdout)
    # IFS = InternalFieldSeparator (set to recognize only newline char as delimiter)
    IFS=$'\n'
    for line in $HADOOP_JAVA_LIBRARY_PATH; do
      #if [[ $line =~ ^java\.library\.path=(.*)$ ]]; then
      if [[ "$line" =~ "^java\.library\.path=(.*)$" ]]; then
        HADOOP_JAVA_LIBRARY_PATH=${BASH_REMATCH[1]}
        break
      fi
    done
    unset IFS

    if [ -n "${HADOOP_JAVA_LIBRARY_PATH}" ]; then
      FLUME_JAVA_LIBRARY_PATH="$FLUME_JAVA_LIBRARY_PATH:$HADOOP_JAVA_LIBRARY_PATH"
    fi

    # determine hadoop classpath
    HADOOP_CLASSPATH=$($HADOOP_IN_PATH classpath)

    # hack up and filter hadoop classpath
    local ELEMENTS=$(sed -e 's/:/ /g' <<<${HADOOP_CLASSPATH})
    local ELEMENT
    for ELEMENT in $ELEMENTS; do
      local PIECE
      for PIECE in $(echo $ELEMENT); do
        #zhangzl
        if [[ $PIECE =~ "slf4j-(api|log4j12).*\.jar" ]]; then
          info "Excluding $PIECE from classpath"
          continue
        else
          FLUME_CLASSPATH="$FLUME_CLASSPATH:$PIECE"
        fi
      done
    done

  fi
}
add_HBASE_paths() {
  local HBASE_IN_PATH=$(PATH="${HBASE_HOME}/bin:$PATH" \
      which hbase 2>/dev/null)

  if [ -f "${HBASE_IN_PATH}" ]; then
    info "Including HBASE libraries found via ($HBASE_IN_PATH) for HBASE access"

    # determine HBASE java.library.path and use that for flume
    local HBASE_CLASSPATH=""
    local HBASE_JAVA_LIBRARY_PATH=$(HBASE_CLASSPATH="$FLUME_CLASSPATH" \
        ${HBASE_IN_PATH} org.apache.flume.tools.GetJavaProperty \
        java.library.path)

    # look for the line that has the desired property value
    # (considering extraneous output from some GC options that write to stdout)
    # IFS = InternalFieldSeparator (set to recognize only newline char as delimiter)
    IFS=$'\n'
    for line in $HBASE_JAVA_LIBRARY_PATH; do
      #zhangzl
      if [[ $line =~ "^java\.library\.path=(.*)$" ]]; then
        HBASE_JAVA_LIBRARY_PATH=${BASH_REMATCH[1]}
        break
      fi
    done
    unset IFS

    if [ -n "${HBASE_JAVA_LIBRARY_PATH}" ]; then
      FLUME_JAVA_LIBRARY_PATH="$FLUME_JAVA_LIBRARY_PATH:$HBASE_JAVA_LIBRARY_PATH"
    fi

    # determine HBASE classpath
    HBASE_CLASSPATH=$($HBASE_IN_PATH classpath)

    # hack up and filter HBASE classpath
    local ELEMENTS=$(sed -e 's/:/ /g' <<<${HBASE_CLASSPATH})
    local ELEMENT
    for ELEMENT in $ELEMENTS; do
      local PIECE
      for PIECE in $(echo $ELEMENT); do
        #zhangzl
        if [[ $PIECE =~ "slf4j-(api|log4j12).*\.jar" ]]; then
          info "Excluding $PIECE from classpath"
          continue
        else
          FLUME_CLASSPATH="$FLUME_CLASSPATH:$PIECE"
        fi
      done
    done
    FLUME_CLASSPATH="$FLUME_CLASSPATH:$HBASE_HOME/conf"

  fi
}

set_LD_LIBRARY_PATH(){
#Append the FLUME_JAVA_LIBRARY_PATH to whatever the user may have specified in
#flume-env.sh
  if [ -n "${FLUME_JAVA_LIBRARY_PATH}" ]; then
    export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:${FLUME_JAVA_LIBRARY_PATH}"
  fi
}

display_help() {
  cat <<EOF
Usage: $0 <command> [options]...

commands:
  help                  display this help text
  agent                 run a Flume agent
  avro-client           run an avro Flume client
  version               show Flume version info

global options:
  --conf,-c <conf>      use configs in <conf> directory
  --classpath,-C <cp>   append to the classpath
  --dryrun,-d           do not actually start Flume, just print the command
  --plugins-path <dirs> colon-separated list of plugins.d directories. See the
                        plugins.d section in the user guide for more details.
                        Default: \$FLUME_HOME/plugins.d
  -Dproperty=value      sets a Java system property value
  -Xproperty=value      sets a Java -X option

agent options:
  --conf-file,-f <file> specify a config file (required)
  --name,-n <name>      the name of this agent (required)
  --help,-h             display help text

avro-client options:
  --rpcProps,-P <file>   RPC client properties file with server connection params
  --host,-H <host>       hostname to which events will be sent
  --port,-p <port>       port of the avro source
  --dirname <dir>        directory to stream to avro source
  --filename,-F <file>   text file to stream to avro source (default: std input)
  --headerFile,-R <file> File containing event headers as key/value pairs on each new line
  --help,-h              display help text

  Either --rpcProps or both --host and --port must be specified.

Note that if <conf> directory is specified, then it is always included first
in the classpath.

EOF
}

run_flume() {
  local FLUME_APPLICATION_CLASS

  if [ "$#" -gt 0 ]; then
    FLUME_APPLICATION_CLASS=$1
    shift
  else
    error "Must specify flume application class" 1
  fi

  if [ ${CLEAN_FLAG} -ne 0 ]; then
    set -x
  fi
  $EXEC $JAVA_HOME/bin/java $JAVA_OPTS -cp "$FLUME_CLASSPATH" \
      -Djava.library.path=$FLUME_JAVA_LIBRARY_PATH "$FLUME_APPLICATION_CLASS" $*
}

################################
# main
################################

# set default params
FLUME_CLASSPATH=""
FLUME_JAVA_LIBRARY_PATH=""
JAVA_OPTS="-Xmx20m"
LD_LIBRARY_PATH=""

opt_conf=""
opt_classpath=""
opt_plugins_dirs=""
opt_java_props=""
opt_dryrun=""

mode=$1
shift

case "$mode" in
  help)
    display_help
    exit 0
    ;;
  agent)
    opt_agent=1
    ;;
  node)
    opt_agent=1
    warn "The \"node\" command is deprecated. Please use \"agent\" instead."
    ;;
  avro-client)
    opt_avro_client=1
    ;;
  tool)
    opt_tool=1
    ;;
  version)
    opt_version=1
    CLEAN_FLAG=0
    ;;
  *)
    error "Unknown or unspecified command '$mode'"
    echo
    display_help
    exit 1
    ;;
esac

args=""
while [ -n "$*" ] ; do
  arg=$1
  shift

  case "$arg" in
    --conf|-c)
      [ -n "$1" ] || error "Option --conf requires an argument" 1
      opt_conf=$1
      shift
      ;;
    --classpath|-C)
      [ -n "$1" ] || error "Option --classpath requires an argument" 1
      opt_classpath=$1
      shift
      ;;
    --dryrun|-d)
      opt_dryrun="1"
      ;;
    --plugins-path)
      opt_plugins_dirs=$1
      shift
      ;;
    -D*)
      opt_java_props="$opt_java_props $arg"
      ;;
    -X*)
      opt_java_props="$opt_java_props $arg"
      ;;
    *)
      args="$args $arg"
      ;;
  esac
done

# make opt_conf absolute
if [[ -n "$opt_conf" && -d "$opt_conf" ]]; then
  opt_conf=$(cd $opt_conf; pwd)
fi

# allow users to override the default env vars via conf/flume-env.sh
if [ -z "$opt_conf" ]; then
  warn "No configuration directory set! Use --conf <dir> to override."
elif [ -f "$opt_conf/flume-env.sh" ]; then
  info "Sourcing environment configuration script $opt_conf/flume-env.sh"
  source "$opt_conf/flume-env.sh"
fi

# append command-line java options to stock or env script JAVA_OPTS
if [ -n "${opt_java_props}" ]; then
  JAVA_OPTS="${JAVA_OPTS} ${opt_java_props}"
fi

# prepend command-line classpath to env script classpath
if [ -n "${opt_classpath}" ]; then
  if [ -n "${FLUME_CLASSPATH}" ]; then
    FLUME_CLASSPATH="${opt_classpath}:${FLUME_CLASSPATH}"
  else
    FLUME_CLASSPATH="${opt_classpath}"
  fi
fi

if [ -z "${FLUME_HOME}" ]; then
  FLUME_HOME=$(cd $(dirname $0)/..; pwd)
fi

# prepend $FLUME_HOME/lib jars to the specified classpath (if any)
if [ -n "${FLUME_CLASSPATH}" ] ; then
  FLUME_CLASSPATH="${FLUME_HOME}/lib/*:$FLUME_CLASSPATH"
else
  FLUME_CLASSPATH="${FLUME_HOME}/lib/*"
fi

# load plugins.d directories
PLUGINS_DIRS=""
if [ -n "${opt_plugins_dirs}" ]; then
  PLUGINS_DIRS=$(sed -e 's/:/ /g' <<<${opt_plugins_dirs})
else
  PLUGINS_DIRS="${FLUME_HOME}/plugins.d"
fi

unset plugin_lib plugin_libext plugin_native
for PLUGINS_DIR in $PLUGINS_DIRS; do
  if [[ -d ${PLUGINS_DIR} ]]; then
    for plugin in ${PLUGINS_DIR}/*; do
      if [[ -d "$plugin/lib" ]]; then
        plugin_lib="${plugin_lib}${plugin_lib+:}${plugin}/lib/*"
      fi
      if [[ -d "$plugin/libext" ]]; then
        plugin_libext="${plugin_libext}${plugin_libext+:}${plugin}/libext/*"
      fi
      if [[ -d "$plugin/native" ]]; then
        plugin_native="${plugin_native}${plugin_native+:}${plugin}/native"
      fi
    done
  fi
done

if [[ -n "${plugin_lib}" ]]
then
  FLUME_CLASSPATH="${FLUME_CLASSPATH}:${plugin_lib}"
fi

if [[ -n "${plugin_libext}" ]]
then
  FLUME_CLASSPATH="${FLUME_CLASSPATH}:${plugin_libext}"
fi

if [[ -n "${plugin_native}" ]]
then
  if [[ -n "${FLUME_JAVA_LIBRARY_PATH}" ]]
  then
    FLUME_JAVA_LIBRARY_PATH="${FLUME_JAVA_LIBRARY_PATH}:${plugin_native}"
  else
    FLUME_JAVA_LIBRARY_PATH="${plugin_native}"
  fi
fi

# find java
if [ -z "${JAVA_HOME}" ] ; then
  warn "JAVA_HOME is not set!"
  # Try to use Bigtop to autodetect JAVA_HOME if it's available
  if [ -e /usr/libexec/bigtop-detect-javahome ] ; then
    . /usr/libexec/bigtop-detect-javahome
  elif [ -e /usr/lib/bigtop-utils/bigtop-detect-javahome ] ; then
    . /usr/lib/bigtop-utils/bigtop-detect-javahome
  fi

  # Using java from path if bigtop is not installed or couldn't find it
  if [ -z "${JAVA_HOME}" ] ; then
    JAVA_DEFAULT=$(type -p java)
    [ -n "$JAVA_DEFAULT" ] || error "Unable to find java executable. Is it in your PATH?" 1
    JAVA_HOME=$(cd $(dirname $JAVA_DEFAULT)/..; pwd)
  fi
fi

# look for hadoop libs
add_hadoop_paths
add_HBASE_paths

# prepend conf dir to classpath
if [ -n "$opt_conf" ]; then
  FLUME_CLASSPATH="$opt_conf:$FLUME_CLASSPATH"
fi

set_LD_LIBRARY_PATH
# allow dryrun
EXEC="exec"
if [ -n "${opt_dryrun}" ]; then
  warn "Dryrun mode enabled (will not actually initiate startup)"
  EXEC="echo"
fi

# finally, invoke the appropriate command
if [ -n "$opt_agent" ] ; then
  run_flume $FLUME_AGENT_CLASS $args
elif [ -n "$opt_avro_client" ] ; then
  run_flume $FLUME_AVRO_CLIENT_CLASS $args
elif [ -n "${opt_version}" ] ; then
  run_flume $FLUME_VERSION_CLASS $args
elif [ -n "${opt_tool}" ] ; then
  run_flume $FLUME_TOOLS_CLASS $args
else
  error "This message should never appear" 1
fi

exit 0
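A note on the quoted patterns in the modified lines: how Bash treats the right-hand side of =~ depends on the Bash version. From Bash 3.2 onward, a quoted pattern is matched as a literal string rather than as a regular expression, so the quoted form may stop matching on newer systems. A commonly recommended way to sidestep the difference is to keep the pattern in a variable and expand it unquoted. The sketch below illustrates that idea only. The names sample_line, re, jar, and jar_re, and the sample paths, are made up for the illustration and are not part of the flume-ng script.

```shell
#!/usr/bin/env bash
# Sketch: hold the regular expression in a variable and expand it unquoted.
# All variable names and sample values here are illustrative assumptions.

sample_line='java.library.path=/usr/lib/hadoop/lib/native'
re='^java\.library\.path=(.*)$'

lib_path=""
# Unquoted $re is interpreted as a regular expression.
if [[ $sample_line =~ $re ]]; then
  lib_path=${BASH_REMATCH[1]}
fi
echo "extracted: ${lib_path}"

# The same approach for the slf4j jar filter.
jar='/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar'
jar_re='slf4j-(api|log4j12).*\.jar'
excluded="no"
if [[ $jar =~ $jar_re ]]; then
  excluded="yes"
fi
echo "excluded: ${excluded}"

# For comparison: quoting the pattern. On Bash 3.2 and later this is a
# literal string comparison, so the result depends on the Bash in use.
quoted_result="no match"
if [[ $sample_line =~ "$re" ]]; then
  quoted_result="match"
fi
echo "quoted pattern: ${quoted_result} (bash ${BASH_VERSION})"
```

If the last line reports "no match" on your system, the quoted patterns in the script above would need the variable form instead.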
5. Test Configuration File
Create a file named example-conf.properties in the conf directory with the following properties:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
# Write the data to the log
a1.sinks.k1.type = logger


# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
6. Run Commands
6.1 Start the agent
[hadoop@hadoop1 conf]$ flume-ng agent -n a1 -f example-conf.properties
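The command above omits --conf, so the script prints its "No configuration directory set!" warning. A fuller invocation might look like the sketch below. It uses the long option names from the script's help text and assumes the command is run from inside the conf directory (hence "--conf ."). The -Dflume.root.logger=INFO,console setting is the one commonly used with Flume's stock log4j configuration to send log output to the terminal, and whether it is needed depends on your logging setup. The sketch only assembles and prints the command, and the array name agent_cmd is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: assemble the agent command with an explicit conf directory.
# Assumption: the current directory is the Flume conf directory.
agent_cmd=(
  flume-ng agent
  --conf .
  --conf-file example-conf.properties
  --name a1
  -Dflume.root.logger=INFO,console
)

# Print the command for review.
echo "${agent_cmd[*]}"

# To actually start the agent, run the array as a command:
# "${agent_cmd[@]}"
```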
6.2 Start the avro-client to send data to the agent (run this in a separate terminal window)
[hadoop@hadoop1 conf]$ flume-ng avro-client -H localhost -p 44444 -F file01
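The contents of file01 are not shown in this post. Judging by the event body in the results, it was presumably a one-line text file containing "hello world". Under that assumption, a test file could be prepared as in the sketch below. The temporary location and the names sample_file and client_cmd are illustrative, and the client command is only printed, not run.

```shell
#!/usr/bin/env bash
# Sketch: create a small input file for the avro-client.
# Assumption: the original file01 held the single line "hello world".
sample_file="${TMPDIR:-/tmp}/file01"
printf 'hello world\n' > "$sample_file"

# The client command from this section, pointed at the sample file.
client_cmd=(flume-ng avro-client -H localhost -p 44444 -F "$sample_file")
echo "Would run: ${client_cmd[*]}"
```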
7. Viewing the Results
14/01/16 22:26:34 INFO ipc.NettyServer: [id: 0x0100c7e4, /127.0.0.1:54289 => /127.0.0.1:44444] OPEN
14/01/16 22:26:34 INFO ipc.NettyServer: [id: 0x0100c7e4, /127.0.0.1:54289 => /127.0.0.1:44444] BOUND: /127.0.0.1:44444
14/01/16 22:26:34 INFO ipc.NettyServer: [id: 0x0100c7e4, /127.0.0.1:54289 => /127.0.0.1:44444] CONNECTED: /127.0.0.1:54289
14/01/16 22:26:36 INFO ipc.NettyServer: [id: 0x0100c7e4, /127.0.0.1:54289 :> /127.0.0.1:44444] DISCONNECTED
14/01/16 22:26:36 INFO ipc.NettyServer: [id: 0x0100c7e4, /127.0.0.1:54289 :> /127.0.0.1:44444] UNBOUND
14/01/16 22:26:36 INFO ipc.NettyServer: [id: 0x0100c7e4, /127.0.0.1:54289 :> /127.0.0.1:44444] CLOSED
14/01/16 22:26:36 INFO ipc.NettyServer: Connection to /127.0.0.1:54289 disconnected.
14/01/16 22:26:38 INFO sink.LoggerSink: Event: { headers:{} body: 68 65 6C 6C 6F 20 77 6F 72 6C 64 hello world }
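The logger sink prints the event body as hexadecimal bytes followed by a text rendering. As a quick sanity check, the hex bytes from the last log line can be decoded by hand. The sketch below relies on Bash's printf builtin, which understands \xHH escapes, and the variable names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: decode the hex bytes shown in the LoggerSink event body.
hex_body="68 65 6C 6C 6F 20 77 6F 72 6C 64"

# Turn "68 65 ..." into "\x68\x65..." (the format is reused per argument).
escaped=$(printf '\\x%s' $hex_body)

# %b makes printf interpret the backslash escapes in its argument.
decoded=$(printf '%b' "$escaped")
echo "$decoded"
```

The decoded text agrees with the "hello world" rendering at the end of the log line.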
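Author: Zhang Ziliang (张子良)
Source: http://www.cnblogs.com/hadoopdev
The copyright of this article belongs to the author. Reposting is welcome, but unless the author agrees otherwise, this notice must be retained and a link to the original article must be placed prominently on the page. Otherwise the author reserves the right to pursue legal liability.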