ganglia metric extended by gmetric command line tool-阿里云开发者社区

上一篇文章简单的介绍了一下metric的相关知识 :

http://blog.163.com/digoal@126/blog/static/163877040201481835916946/

ganglia gmond实际上是将metric collection模块化来处理的, 包括自带的metric, 也是模块化来加载的, 所以非常适合扩展.

例如 :

[root@db-172-16-3-221 cpu]# cd /opt/ganglia-core-3.6.0/
[root@db-172-16-3-221 ganglia-core-3.6.0]# ll
total 24
drwxr-xr-x 2 root root 4096 Sep  9 11:03 bin
drwxr-xr-x 3 root root 4096 Sep 15 09:31 etc
drwxr-xr-x 2 root root 4096 Sep  9 11:03 include
drwxr-xr-x 3 root root 4096 Sep  9 11:03 lib64
drwxr-xr-x 2 root root 4096 Sep  9 11:03 sbin
drwxr-xr-x 3 root root 4096 Sep  9 11:03 share
[root@db-172-16-3-221 ganglia-core-3.6.0]# cd lib64/
[root@db-172-16-3-221 lib64]# ll
total 660
drwxr-xr-x 2 root root   4096 Sep  9 11:03 ganglia
lrwxrwxrwx 1 root root     25 Sep  9 11:03 libganglia-3.6.0.so.0 -> libganglia-3.6.0.so.0.0.0
-rwxr-xr-x 1 root root 257554 Sep  9 11:03 libganglia-3.6.0.so.0.0.0
-rw-r--r-- 1 root root 408276 Sep  9 11:03 libganglia.a
-rwxr-xr-x 1 root root   1270 Sep  9 11:03 libganglia.la
lrwxrwxrwx 1 root root     25 Sep  9 11:03 libganglia.so -> libganglia-3.6.0.so.0.0.0
[root@db-172-16-3-221 lib64]# cd ganglia/
[root@db-172-16-3-221 ganglia]# ll
total 700
-rwxr-xr-x 1 root root 86896 Sep  9 11:03 modcpu.so
-rwxr-xr-x 1 root root 84118 Sep  9 11:03 moddisk.so
-rwxr-xr-x 1 root root 17896 Sep  9 11:03 modgstatus.so
-rwxr-xr-x 1 root root 84078 Sep  9 11:03 modload.so
-rwxr-xr-x 1 root root 85832 Sep  9 11:03 modmem.so
-rwxr-xr-x 1 root root 31655 Sep  9 11:03 modmulticpu.so
-rwxr-xr-x 1 root root 84480 Sep  9 11:03 modnet.so
-rwxr-xr-x 1 root root 83862 Sep  9 11:03 modproc.so
-rwxr-xr-x 1 root root 53954 Sep  9 11:03 modpython.so
-rwxr-xr-x 1 root root 85136 Sep  9 11:03 modsys.so

在gmond的配置文件中加载模块 :

# less /opt/ganglia-core-3.6.0/etc/gmond.conf
/* Each metrics module that is referenced by gmond must be specified and
   loaded. If the module has been statically linked with gmond, it does
   not require a load path. However all dynamically loadable modules must
   include a load path. */
modules {
  module {
    name = "core_metrics"
  }
  module {
    name = "cpu_module"
    path = "modcpu.so"
  }
  module {
    name = "disk_module"
    path = "moddisk.so"
  }
  module {
    name = "load_module"
    path = "modload.so"
  }
  module {
    name = "mem_module"
    path = "modmem.so"
  }
  module {
    name = "net_module"
    path = "modnet.so"
  }
  module {
    name = "proc_module"
    path = "modproc.so"
  }
  module {
    name = "sys_module"
    path = "modsys.so"
  }
}

当gmond自带的metric模块不能满足你的监控需求时, 你需要扩展metric collection.

扩展方法比较多, 例如直接修改gmond, 添加模块并重新编译gmond. 然后在gmond.conf中添加新增的模块, 并配置metric采集项.

或者使用c/c++, python , php, perl遵循gmond的标准来写扩展模块. 然后在gmond.conf中添加新增的模块, 并配置metric采集项.

(编译时需要开启php, perl的功能, 具体见源码里的 configure --help)

最后, 还可以使用gmetric命令直接向集群中的多播地址或单播地址发送metric信息.

显然使用gmetric来得最快, 但是使用gmetric有一个弊端, 必须用户自己调度gmetric的执行,

而如果使用模块的话, 可以统一在gmond.conf中配置, gmond handler会根据gmond的配置来调用模块来完成metric的采集, 不需要用户去调度.

接下来要使用gmetric来测试一下自定义metric收集.

[root@db-172-16-3-221 ganglia-3.6.0]# gmetric --help
gmetric 3.6.0

The Ganglia Metric Client (gmetric) announces a metric
on the list of defined send channels defined in a configuration file

Usage: gmetric [OPTIONS]...

  -h, --help            Print help and exit
  -V, --version         Print version and exit
  -c, --conf=STRING     The configuration file to use for finding send channels 
                           (default=`/opt/ganglia-core-3.6.0/etc/gmond.conf')
  -n, --name=STRING     Name of the metric
  -v, --value=STRING    Value of the metric
  -t, --type=STRING     Either 
                          string|int8|uint8|int16|uint16|int32|uint32|float|double
  -u, --units=STRING    Unit of measure for the value e.g. Kilobytes, Celcius  
                          (default=`')
  -s, --slope=STRING    Either zero|positive|negative|both  (default=`both')
  -x, --tmax=INT        The maximum time in seconds between gmetric calls  
                          (default=`60')
  -d, --dmax=INT        The lifetime in seconds of this metric  (default=`0')
  -g, --group=STRING    Group(s) of the metric (comma-separated)
  -C, --cluster=STRING  Cluster of the metric
  -D, --desc=STRING     Description of the metric
  -T, --title=STRING    Title of the metric
  -S, --spoof=STRING    IP address and name of host/device (colon separated) we 
                          are spoofing  (default=`')
  -H, --heartbeat       spoof a heartbeat message (use with spoof option)

首先在集群中要有gmond监听来自网络中的metric消息.

例如我们网络中只有一个gmond服务, 并在多播地址上起了一个监听来接收网络中发送过来的metric信息.

[root@db-172-16-3-221 ganglia-3.6.0]# ps -ewf|grep gmond
nobody    4271     1  0 15:51 ?        00:00:00 gmond
root      4930 26259  0 16:54 pts/2    00:00:00 grep gmond
[root@db-172-16-3-221 ganglia-3.6.0]# netstat -anp|grep gmon
tcp        0      0 0.0.0.0:8649                0.0.0.0:*                   LISTEN      4271/gmond          
udp        0      0 172.16.3.221:63894          239.2.11.71:8649            ESTABLISHED 4271/gmond          
udp        0      0 239.2.11.71:8649            0.0.0.0:*                               4271/gmond

接下来我在多播域中的另一台机器上配置一下gmond.conf

主要配置如下, 务必配置正确的udp发送通道.

[root@150 etc]# vi /opt/ganglia-core-3.6.0/etc/gmond.conf
/* This configuration is as close to 2.5.x default behavior as possible
   The values closely match ./gmond/metric.h definitions in 2.5.x */
globals {
  daemonize = yes
  setuid = yes
  user = nobody
  debug_level = 0
  max_udp_msg_len = 1472
  mute = no
  deaf = no
  allow_extra_data = yes
  host_dmax = 86400 /*secs. Expires (removes from web interface) hosts in 1 day */
  host_tmax = 20 /*secs */
  cleanup_threshold = 300 /*secs */
  gexec = no
  # By default gmond will use reverse DNS resolution when displaying your hostname
  # Uncommeting following value will override that value.
  # override_hostname = "mywebserver.domain.com"
  # If you are not using multicast this value should be set to something other than 0.
  # Otherwise if you restart aggregator gmond you will get empty graphs. 60 seconds is reasonable
  send_metadata_interval = 0 /*secs */

}

/*
 * The cluster attributes specified will be used as part of the <CLUSTER>
 * tag that will wrap all hosts collected by this instance.
 */
cluster {
  name = "test"
  owner = "digoal"
  latlong = "111 122"
  url = "http://dba.sky-mobi.com"
}

/* The host section describes attributes of the host, like the location */
host {
  location = "1,2,4"
}

/* Feel free to specify as many udp_send_channels as you like.  Gmond
   used to only support having a single channel */
udp_send_channel {
  #bind_hostname = yes # Highly recommended, soon to be default.
                       # This option tells gmond to use a source address
                       # that resolves to the machine's hostname.  Without
                       # this, the metrics may appear to come from any
                       # interface and the DNS names associated with
                       # those IPs will be used to create the RRDs.
  mcast_join = 239.2.11.71
  port = 8649
  ttl = 10
}

主机名和IP如下 :

[root@150 etc]# hostname
150.sky-mobi.com
[root@150 etc]# ifconfig
em1       Link encap:Ethernet  HWaddr 00:22:19:60:77:8F  
          inet6 addr: fe80::222:19ff:fe60:778f/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:599238829 errors:0 dropped:0 overruns:0 frame:0
          TX packets:67014382 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:882536084904 (821.9 GiB)  TX bytes:8841108757 (8.2 GiB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:81579301 errors:0 dropped:0 overruns:0 frame:0
          TX packets:81579301 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:15164677095 (14.1 GiB)  TX bytes:15164677095 (14.1 GiB)

ovirtmgmt Link encap:Ethernet  HWaddr 00:22:19:60:77:8F  
          inet addr:172.16.3.150  Bcast:172.16.3.255  Mask:255.255.255.0
          inet6 addr: fe80::222:19ff:fe60:778f/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:93718326 errors:0 dropped:0 overruns:0 frame:0
          TX packets:86213477 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:1098952054073 (1023.4 GiB)  TX bytes:11330242206 (10.5 GiB)

ovirtmgmt:1 Link encap:Ethernet  HWaddr 00:22:19:60:77:8F  
          inet addr:172.16.13.150  Bcast:172.16.13.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

首先要启动gmond, (当然, 不启动也可以, 启动gmond除了可以发心跳包, 还可以将location等基础信息发过去)

Usage: gmetric [OPTIONS]...

  -h, --help            Print help and exit
  -V, --version         Print version and exit
  -c, --conf=STRING     The configuration file to use for finding send channels 
                           (default=`/opt/ganglia-core-3.6.0/etc/gmond.conf')
  -n, --name=STRING     Name of the metric
  -v, --value=STRING    Value of the metric
  -t, --type=STRING     Either 
                          string|int8|uint8|int16|uint16|int32|uint32|float|double
  -u, --units=STRING    Unit of measure for the value e.g. Kilobytes, Celcius  
                          (default=`')
  -s, --slope=STRING    Either zero|positive|negative|both  (default=`both')
  -x, --tmax=INT        The maximum time in seconds between gmetric calls  
                          (default=`60')
  -d, --dmax=INT        The lifetime in seconds of this metric  (default=`0')
  -g, --group=STRING    Group(s) of the metric (comma-separated)
  -C, --cluster=STRING  Cluster of the metric
  -D, --desc=STRING     Description of the metric
  -T, --title=STRING    Title of the metric
  -S, --spoof=STRING    IP address and name of host/device (colon separated) we 
                          are spoofing  (default=`')
  -H, --heartbeat       spoof a heartbeat message (use with spoof option)
[root@150 etc]# gmetric -c /opt/ganglia-core-3.6.0/etc/gmond.conf -n load_one -v 1.0 -t float -s both -x 20 -g load -C test -D "load per min" -T "load one"

例如在172.16.3.150模拟发送172.16.3.151的metric信息 :

[root@150 etc]# gmetric -c /opt/ganglia-core-3.6.0/etc/gmond.conf -n load_one -v 1.0 -t float -s both -x 20 -g load -C test -D "load per min" -T "load one" -S "172.16.3.151:digoal_host"

不指定 -S "172.16.3.151:digoal_host"的话, 就是以当前主机的metric来发送.

如果不启动gmond的话, 那么需要发心跳包过去.

[root@150 etc]# gmetric -c /opt/ganglia-core-3.6.0/etc/gmond.conf -n load_one -v 1.0 -t float -s both -x 20 -g load -C test -D "load per min" -T "load one" -S "172.16.3.151:digoal_host" -H

发完心跳包后 :

ganglia metric extended by gmetric command line tool - 德哥@Digoal - PostgreSQL research

[注意]

1. gmond 负责监听的节点, 必须开放防火墙的访问, 例如 :

[root@db-172-16-3-221 kernel-doc-2.6.32]# netstat -anp|grep gmond
tcp        0      0 0.0.0.0:8649                0.0.0.0:*                   LISTEN      4271/gmond          
udp        0      0 172.16.3.221:63894          239.2.11.71:8649            ESTABLISHED 4271/gmond          
udp        0      0 239.2.11.71:8649            0.0.0.0:*                               4271/gmond

务必允许其他主机访问这个多播地址.

[参考]

1. https://github.com/ganglia/ganglia_contrib/tree/master/gmond-debug

2. http://blog.163.com/digoal@126/blog/static/163877040201481835916946/

ganglia metric extended by gmetric command line tool

热门文章

最新文章

相关电子书

热门

活动广场

任务中心

开发者评测

高校计划

乘风者计划

训练营

阿里云MVP

话题

直播

下载

镜像站

技术资料

插件

ganglia metric extended by gmetric command line tool

热门文章

最新文章

相关电子书