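Introduction
Infrastructure as code (IaC) tools manage infrastructure through configuration files rather than a graphical user interface. The idea behind infrastructure as code is to treat cloud hosts, networks, storage, and other infrastructure as software. Infrastructure can then be programmed, versioned, reused, and debugged, and even rolled back when something goes wrong. Practices from the software development lifecycle, such as code review, automated testing, and CI/CD, can also be applied to infrastructure management.
Terraform is an open-source tool created by HashiCorp for automating and orchestrating IT infrastructure. Its slogan is "Write, Plan, and Create Infrastructure as Code". The Terraform command-line interface (CLI) provides a simple mechanism for deploying configuration files to Alibaba Cloud or any other supported cloud and for keeping them under version control.
Terraform standardizes the deployment workflow. Cloud providers define each unit of infrastructure, such as a compute instance or a private network, as a resource. Users can combine multiple independent resources into reusable Terraform configurations and manage them with a consistent language and workflow. Terraform's configuration language is declarative: it describes the desired end state of the infrastructure. Terraform automatically works out the dependencies between resources so that they are created or destroyed in the correct order.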
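As a small illustration of the declarative style and of implicit dependencies, the sketch below declares an SLS project and a Logstore. The resource labels and values are placeholders chosen for this example; the attribute names follow the ones used later in this article and may differ in other provider versions. Because the Logstore's project argument references the project resource, Terraform can infer that the project has to be created first and destroyed last.

```hcl
# Hypothetical names, for illustration only.
resource "alicloud_log_project" "example" {
  name        = "example-project"
  description = "declared, not scripted"
}

resource "alicloud_log_store" "example" {
  # Referencing the project resource creates an implicit dependency:
  # Terraform creates the project before the Logstore.
  project          = alicloud_log_project.example.name
  name             = "example-logstore"
  retention_period = 7
}
```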
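Logtail is the log collection agent provided by Alibaba Cloud Log Service (SLS). It collects logs from servers on Alibaba Cloud ECS, Alibaba Cloud ACK, self-managed IDCs, and other cloud vendors. Alibaba Cloud is the third-largest cloud service provider, and terraform-provider-alicloud already supports many of its cloud products, including SLS, OSS, SLB, and RDS. This article describes best practices for automating Logtail log collection with Terraform.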
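Installation
For installation instructions, see https://help.aliyun.com/document_detail/95825.html
Building SLS resources from scratch with Terraform
Create the Terraform configuration template
Create a Terraform working directory and define the SLS Logtail collection configuration.
The configuration consists of two parts:
- provider
  - Basic settings for the Alibaba Cloud provider go here. See the provider documentation for details.
  - Authentication: several methods are supported, including static credentials and ECS role. This article uses static credentials.
  - Region: the region in which the resources below are created.
- resource (see each resource's documentation for further options)
  - alicloud_log_project: creates the project.
  - alicloud_log_store: creates the Logstore.
  - alicloud_log_store_index: enables the index.
  - alicloud_log_machine_group: creates the machine group.
  - alicloud_logtail_config: creates the collection configuration.
  - alicloud_logtail_attachment: associates the machine group with the collection configuration.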
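For the static-credential method, one possible shape of the provider section is sketched below. The variable names access_key and secret_key are assumptions made for this example, and the sketch assumes Terraform 0.14 or later (for required_providers with a source and for sensitive variables). Alternatively, the credentials can be left out of the file entirely and supplied through the ALICLOUD_ACCESS_KEY and ALICLOUD_SECRET_KEY environment variables, which keeps secrets out of version control.

```hcl
terraform {
  required_providers {
    alicloud = {
      source = "aliyun/alicloud"
    }
  }
}

# Hypothetical variable names; supply the values via a .tfvars file
# or TF_VAR_access_key / TF_VAR_secret_key, never hard-coded here.
variable "access_key" {
  type      = string
  sensitive = true
}

variable "secret_key" {
  type      = string
  sensitive = true
}

provider "alicloud" {
  access_key = var.access_key
  secret_key = var.secret_key
  region     = "cn-zhangjiakou"
}
```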
The following example shows how to use an IP-based machine group to collect files from a specific directory on the matching machine.
$ mkdir learn-terraform-sls
$ cd learn-terraform-sls
$ vi terraform.tf
provider "alicloud" {
  region = "cn-zhangjiakou"
}

resource "alicloud_log_project" "test" {
  name        = "tf-test-project-zhangjiakou"
  description = "create by terraform"
}

resource "alicloud_log_store" "test" {
  project               = alicloud_log_project.test.name
  name                  = "tf-test-logstore"
  retention_period      = 7
  shard_count           = 3
  auto_split            = true
  max_split_shard_count = 60
  append_meta           = true
}

resource "alicloud_log_store_index" "test" {
  project  = alicloud_log_project.test.name
  logstore = alicloud_log_store.test.name
  full_text {
    case_sensitive = true
    token          = " #$%^*\r\n "
  }
}

resource "alicloud_log_machine_group" "test" {
  project       = alicloud_log_project.test.name
  name          = "tf-log-machine-group"
  topic         = "terraform"
  identify_type = "ip"
  identify_list = ["172.26.51.68"]
}

resource "alicloud_logtail_config" "test" {
  project      = alicloud_log_project.test.name
  logstore     = alicloud_log_store.test.name
  input_type   = "file"
  log_sample   = "test"
  name         = "tf-log-config"
  output_type  = "LogService"
  input_detail = <<DEFINITION
{
  "logPath": "/root/tmp",
  "filePattern": "access.log",
  "logType": "json_log",
  "topicFormat": "default",
  "discardUnmatch": false,
  "enableRawLog": false,
  "fileEncoding": "gbk",
  "maxDepth": 10
}
DEFINITION
}

resource "alicloud_logtail_attachment" "test" {
  project             = alicloud_log_project.test.name
  logtail_config_name = alicloud_logtail_config.test.name
  machine_group_name  = alicloud_log_machine_group.test.name
}
Initialize the working directory
Run terraform init to initialize the working directory and install the alicloud provider.
$ terraform init
Initializing the backend...
Initializing provider plugins...
Terraform has been successfully initialized!
You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.
If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
Run terraform fmt to format the terraform.tf file.
$ terraform fmt
terraform.tf
Run terraform validate to check that the configuration template is valid.
$ terraform validate
Success! The configuration is valid.
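Create the infrastructure
Run terraform apply to create the resources declared in the resource blocks. Before any change is actually made, Terraform prints the execution plan, which describes the changes that will occur. In the run below, the plan adds six resources.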
$ terraform apply
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
+ create
Terraform will perform the following actions:
# alicloud_log_machine_group.test will be created
+ resource "alicloud_log_machine_group" "test" {
+ id = (known after apply)
+ identify_list = [
+ "172.26.51.68",
]
+ identify_type = "ip"
+ name = "tf-log-machine-group"
+ project = "tf-test-project-zhangjiakou"
+ topic = "terraform"
}
# alicloud_log_project.test will be created
+ resource "alicloud_log_project" "test" {
+ description = "create by terraform"
+ id = (known after apply)
+ name = "tf-test-project-zhangjiakou"
}
# alicloud_log_store.test will be created
+ resource "alicloud_log_store" "test" {
+ append_meta = true
+ auto_split = true
+ enable_web_tracking = false
+ id = (known after apply)
+ max_split_shard_count = 60
+ name = "tf-test-logstore"
+ project = "tf-test-project-zhangjiakou"
+ retention_period = 7
+ shard_count = 3
+ shards = (known after apply)
}
# alicloud_log_store_index.test will be created
+ resource "alicloud_log_store_index" "test" {
+ id = (known after apply)
+ logstore = "tf-test-logstore"
+ project = "tf-test-project-zhangjiakou"
+ full_text {
+ case_sensitive = true
+ include_chinese = false
+ token = " #$%^*\r\n "
}
}
# alicloud_logtail_attachment.test will be created
+ resource "alicloud_logtail_attachment" "test" {
+ id = (known after apply)
+ logtail_config_name = "tf-log-config"
+ machine_group_name = "tf-log-machine-group"
+ project = "tf-test-project-zhangjiakou"
}
# alicloud_logtail_config.test will be created
+ resource "alicloud_logtail_config" "test" {
+ id = (known after apply)
+ input_detail = jsonencode(
{
+ discardUnmatch = false
+ enableRawLog = false
+ fileEncoding = "gbk"
+ filePattern = "access.log"
+ logPath = "/root/tmp"
+ logType = "json_log"
+ maxDepth = 10
+ topicFormat = "default"
}
)
+ input_type = "file"
+ log_sample = "test"
+ logstore = "tf-test-logstore"
+ name = "tf-log-config"
+ output_type = "LogService"
+ project = "tf-test-project-zhangjiakou"
}
Plan: 6 to add, 0 to change, 0 to destroy.
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value:
After checking the plan, enter yes to carry it out.
alicloud_log_project.test: Creating...
alicloud_log_project.test: Creation complete after 1s [id=tf-test-project-zhangjiakou]
alicloud_log_machine_group.test: Creating...
alicloud_log_store.test: Creating...
alicloud_log_machine_group.test: Creation complete after 0s [id=tf-test-project-zhangjiakou:tf-log-machine-group]
alicloud_log_store.test: Still creating... [10s elapsed]
alicloud_log_store.test: Still creating... [20s elapsed]
alicloud_log_store.test: Still creating... [30s elapsed]
alicloud_log_store.test: Still creating... [40s elapsed]
alicloud_log_store.test: Still creating... [50s elapsed]
alicloud_log_store.test: Still creating... [1m0s elapsed]
alicloud_log_store.test: Creation complete after 1m0s [id=tf-test-project-zhangjiakou:tf-test-logstore]
alicloud_logtail_config.test: Creating...
alicloud_log_store_index.test: Creating...
alicloud_logtail_config.test: Creation complete after 1s [id=tf-test-project-zhangjiakou:tf-test-logstore:tf-log-config]
alicloud_logtail_attachment.test: Creating...
alicloud_logtail_attachment.test: Creation complete after 0s [id=tf-test-project-zhangjiakou:tf-log-config:tf-log-machine-group]
alicloud_log_store_index.test: Creation complete after 1s [id=tf-test-project-zhangjiakou:tf-test-logstore]
Use terraform show to inspect the resulting state.
$ terraform show
# alicloud_log_machine_group.test:
resource "alicloud_log_machine_group" "test" {
id = "tf-test-project-zhangjiakou:tf-log-machine-group"
identify_list = [
"172.26.51.68",
]
identify_type = "ip"
name = "tf-log-machine-group"
project = "tf-test-project-zhangjiakou"
topic = "terraform"
}
# alicloud_log_project.test:
resource "alicloud_log_project" "test" {
description = "create by terraform"
id = "tf-test-project-zhangjiakou"
name = "tf-test-project-zhangjiakou"
}
# alicloud_log_store.test:
resource "alicloud_log_store" "test" {
append_meta = true
auto_split = true
enable_web_tracking = false
id = "tf-test-project-zhangjiakou:tf-test-logstore"
max_split_shard_count = 60
name = "tf-test-logstore"
project = "tf-test-project-zhangjiakou"
retention_period = 7
shard_count = 3
shards = [
{
begin_key = "00000000000000000000000000000000"
end_key = "55000000000000000000000000000000"
id = 0
status = "readwrite"
},
{
begin_key = "55000000000000000000000000000000"
end_key = "aa000000000000000000000000000000"
id = 1
status = "readwrite"
},
{
begin_key = "aa000000000000000000000000000000"
end_key = "ffffffffffffffffffffffffffffffff"
id = 2
status = "readwrite"
},
]
}
# alicloud_log_store_index.test:
resource "alicloud_log_store_index" "test" {
id = "tf-test-project-zhangjiakou:tf-test-logstore"
logstore = "tf-test-logstore"
project = "tf-test-project-zhangjiakou"
full_text {
case_sensitive = true
include_chinese = false
token = " #$%^*\r\n "
}
}
# alicloud_logtail_attachment.test:
resource "alicloud_logtail_attachment" "test" {
id = "tf-test-project-zhangjiakou:tf-log-config:tf-log-machine-group"
logtail_config_name = "tf-log-config"
machine_group_name = "tf-log-machine-group"
project = "tf-test-project-zhangjiakou"
}
# alicloud_logtail_config.test:
resource "alicloud_logtail_config" "test" {
id = "tf-test-project-zhangjiakou:tf-test-logstore:tf-log-config"
input_detail = jsonencode(
{
discardUnmatch = false
enableRawLog = false
fileEncoding = "gbk"
filePattern = "access.log"
logPath = "/root/tmp"
logType = "json_log"
maxDepth = 10
topicFormat = "default"
}
)
input_type = "file"
log_sample = "test"
logstore = "tf-test-logstore"
name = "tf-log-config"
output_type = "LogService"
project = "tf-test-project-zhangjiakou"
}
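At this point, logs are being collected normally.
Changing SLS resources with Terraform
Infrastructure is rarely static; it changes as the business changes. Terraform can manage such changes: edit the Terraform configuration template, and Terraform builds an execution plan that modifies only the affected parts to reach the desired state.
For example, as the business grows, the way SLS is used changes too:
1. The Logstore's 7-day TTL is too short and needs to be raised to 30 days.
2. The IP-based machine group is hard to scale and needs to become a user-defined identifier machine group.
First, update the terraform.tf configuration template: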
provider "alicloud" {
  region = "cn-zhangjiakou"
}

resource "alicloud_log_project" "test" {
  name        = "tf-test-project-zhangjiakou"
  description = "create by terraform"
}

resource "alicloud_log_store" "test" {
  project               = alicloud_log_project.test.name
  name                  = "tf-test-logstore"
  retention_period      = 30
  shard_count           = 3
  auto_split            = true
  max_split_shard_count = 60
  append_meta           = true
}

resource "alicloud_log_store_index" "test" {
  project  = alicloud_log_project.test.name
  logstore = alicloud_log_store.test.name
  full_text {
    case_sensitive = true
    token          = " #$%^*\r\n "
  }
}

resource "alicloud_log_machine_group" "test" {
  project       = alicloud_log_project.test.name
  name          = "tf-log-machine-group"
  topic         = "terraform"
  identify_type = "userdefined"
  identify_list = ["user_defined_id_zhangjiakou_test"]
}

resource "alicloud_logtail_config" "test" {
  project      = alicloud_log_project.test.name
  logstore     = alicloud_log_store.test.name
  input_type   = "file"
  log_sample   = "test"
  name         = "tf-log-config"
  output_type  = "LogService"
  input_detail = <<DEFINITION
{
  "logPath": "/root/tmp",
  "filePattern": "access.log",
  "logType": "json_log",
  "topicFormat": "default",
  "discardUnmatch": false,
  "enableRawLog": false,
  "fileEncoding": "gbk",
  "maxDepth": 10
}
DEFINITION
}

resource "alicloud_logtail_attachment" "test" {
  project             = alicloud_log_project.test.name
  logtail_config_name = alicloud_logtail_config.test.name
  machine_group_name  = alicloud_log_machine_group.test.name
}
Run terraform apply to put the new configuration into effect. Terraform detects the configuration changes and generates an execution plan with two in-place updates.
$ terraform apply
alicloud_log_project.test: Refreshing state... [id=tf-test-project-zhangjiakou]
alicloud_log_machine_group.test: Refreshing state... [id=tf-test-project-zhangjiakou:tf-log-machine-group]
alicloud_log_store.test: Refreshing state... [id=tf-test-project-zhangjiakou:tf-test-logstore]
alicloud_logtail_config.test: Refreshing state... [id=tf-test-project-zhangjiakou:tf-test-logstore:tf-log-config]
alicloud_log_store_index.test: Refreshing state... [id=tf-test-project-zhangjiakou:tf-test-logstore]
alicloud_logtail_attachment.test: Refreshing state... [id=tf-test-project-zhangjiakou:tf-log-config:tf-log-machine-group]
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
~ update in-place
Terraform will perform the following actions:
# alicloud_log_machine_group.test will be updated in-place
~ resource "alicloud_log_machine_group" "test" {
id = "tf-test-project-zhangjiakou:tf-log-machine-group"
~ identify_list = [
- "172.26.51.68",
+ "user_defined_id_zhangjiakou_test",
]
~ identify_type = "ip" -> "userdefined"
name = "tf-log-machine-group"
project = "tf-test-project-zhangjiakou"
topic = "terraform"
}
# alicloud_log_store.test will be updated in-place
~ resource "alicloud_log_store" "test" {
append_meta = true
auto_split = true
enable_web_tracking = false
id = "tf-test-project-zhangjiakou:tf-test-logstore"
max_split_shard_count = 60
name = "tf-test-logstore"
project = "tf-test-project-zhangjiakou"
~ retention_period = 7 -> 30
shard_count = 3
shards = [
{
begin_key = "00000000000000000000000000000000"
end_key = "55000000000000000000000000000000"
id = 0
status = "readwrite"
},
{
begin_key = "55000000000000000000000000000000"
end_key = "aa000000000000000000000000000000"
id = 1
status = "readwrite"
},
{
begin_key = "aa000000000000000000000000000000"
end_key = "ffffffffffffffffffffffffffffffff"
id = 2
status = "readwrite"
},
]
}
Plan: 0 to add, 2 to change, 0 to destroy.
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
alicloud_log_machine_group.test: Modifying... [id=tf-test-project-zhangjiakou:tf-log-machine-group]
alicloud_log_store.test: Modifying... [id=tf-test-project-zhangjiakou:tf-test-logstore]
alicloud_log_machine_group.test: Modifications complete after 1s [id=tf-test-project-zhangjiakou:tf-log-machine-group]
alicloud_log_store.test: Modifications complete after 1s [id=tf-test-project-zhangjiakou:tf-test-logstore]
Apply complete! Resources: 0 added, 2 changed, 0 destroyed.
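Values that are expected to change again, such as the retention period, could also be lifted into input variables so that later adjustments do not require editing the resource blocks. This is only a sketch: the variable names are invented for this example, and only the two resources changed above are shown.

```hcl
# Hypothetical variable names, for illustration only.
variable "logstore_retention_days" {
  description = "Number of days the Logstore keeps data"
  type        = number
  default     = 30
}

variable "machine_group_identifiers" {
  description = "User-defined identifiers matched by the machine group"
  type        = list(string)
  default     = ["user_defined_id_zhangjiakou_test"]
}

resource "alicloud_log_store" "test" {
  project               = alicloud_log_project.test.name
  name                  = "tf-test-logstore"
  retention_period      = var.logstore_retention_days
  shard_count           = 3
  auto_split            = true
  max_split_shard_count = 60
  append_meta           = true
}

resource "alicloud_log_machine_group" "test" {
  project       = alicloud_log_project.test.name
  name          = "tf-log-machine-group"
  topic         = "terraform"
  identify_type = "userdefined"
  identify_list = var.machine_group_identifiers
}
```

A different value can then be supplied at apply time, for example with terraform apply -var="logstore_retention_days=90".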
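Handling console changes to Terraform-managed resources
Background
Even when SLS resources are managed with Terraform over the long term, configuration changes are occasionally made through the console (or other channels such as the API). Terraform is not automatically aware of such changes. There are two cases:
1. The console or API changed an attribute of a resource that Terraform already manages.
Scenario: the TTL of the tf-test-logstore Logstore created by Terraform above is 30. After it is changed to 10 in the console, running terraform apply resets the TTL to 30.
2. The console or API changed a resource that Terraform does not manage.
Scenario: a new collection configuration, console_manual, was created under tf-test-logstore in the console. Because it is outside the scope of the Terraform configuration template, terraform apply has no effect on it.
How can Terraform's view be kept consistent in these two cases? This is where the terraform import command comes in. Although there are mechanisms for synchronizing the configuration, the overall process, especially identifying what changed, is fairly tedious. It is therefore strongly recommended that, once resources are managed by Terraform, changes be made through Terraform alone, to avoid unnecessary management overhead.
The following shows how to synchronize the two kinds of changes described above.
Solution
First, identify the resources affected by the changes.
- Change to an existing resource: the TTL of Logstore tf-test-logstore was changed to 10.
  - resource: alicloud_log_store.test
- New resources: a collection configuration console_manual was created under Logstore tf-test-logstore and associated with the machine group.
  - new resource: alicloud_logtail_config.console_manual_config
  - new resource: alicloud_logtail_attachment.console_manual_attachment
For the change to an existing resource, simply edit the corresponding place in terraform.tf. For the new resources, the configuration is too extensive to write by hand, so first just declare them in terraform.tf. The changes to terraform.tf are as follows: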
resource "alicloud_log_store" "test" {
- retention_period = 30
+ retention_period = 10
+ resource "alicloud_logtail_config" "console_manual_config" {
+ project = alicloud_log_project.test.name
+ logstore = alicloud_log_store.test.name
+ name = "console_manual"
+ }
+ resource "alicloud_logtail_attachment" "console_manual_attachment" {
+ project = alicloud_log_project.test.name
+ logtail_config_name = alicloud_logtail_config.console_manual_config.name
+ machine_group_name = alicloud_log_machine_group.test.name
+ }
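Run the terraform import command for the newly managed resources.
Command format: terraform import <resource type>.<resource name> <resource ID>
# Import the new collection configuration console_manual
## The resource ID has the form project:logstore:name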
terraform import alicloud_logtail_config.console_manual_config tf-test-project-zhangjiakou:tf-test-logstore:console_manual
# Import the binding between the new collection configuration console_manual and the machine group
## The resource ID has the form project:logtail_config_name:machine_group_name
terraform import alicloud_logtail_attachment.console_manual_attachment tf-test-project-zhangjiakou:console_manual:tf-log-machine-group
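Inspect terraform.tfstate to find the latest state of the imported resources, then fill in the terraform.tf template from that state. Use terraform plan to verify that the template is complete; this may take several iterations.
# Resource state added to terraform.tfstate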
{
"mode": "managed",
"type": "alicloud_logtail_config",
"name": "console_manual_config",
"provider": "provider.alicloud",
"instances": [
{
"schema_version": 0,
"attributes": {
"id": "tf-test-project-zhangjiakou:tf-test-logstore:console_manual",
"input_detail": "{\"adjustTimezone\":false,\"advanced\":{\"force_multiconfig\":false,\"tail_size_kb\":1024},\"delayAlarmBytes\":0,\"delaySkipBytes\":0,\"discardNonUtf8\":false,\"discardUnmatch\":false,\"dockerExcludeEnv\":{},\"dockerExcludeLabel\":{},\"dockerFile\":false,\"dockerIncludeEnv\":{},\"dockerIncludeLabel\":{},\"enableRawLog\":false,\"enableTag\":false,\"fileEncoding\":\"utf8\",\"filePattern\":\"test.log\",\"filterKey\":[],\"filterRegex\":[],\"key\":[\"content\"],\"localStorage\":true,\"logBeginRegex\":\".*\",\"logPath\":\"/root/tmp\",\"logTimezone\":\"\",\"logType\":\"common_reg_log\",\"maxDepth\":10,\"maxSendRate\":-1,\"mergeType\":\"topic\",\"preserve\":true,\"preserveDepth\":1,\"priority\":0,\"regex\":\"(.*)\",\"sendRateExpire\":0,\"sensitive_keys\":[],\"shardHashKey\":[],\"tailExisted\":false,\"timeFormat\":\"\",\"topicFormat\":\"none\"}",
"input_type": "file",
"log_sample": "",
"logstore": "tf-test-logstore",
"name": "console_manual",
"output_type": "LogService",
"project": "tf-test-project-zhangjiakou"
}
}
]
},
{
"mode": "managed",
"type": "alicloud_logtail_attachment",
"name": "console_manual_attachment",
"provider": "provider.alicloud",
"instances": [
{
"schema_version": 0,
"attributes": {
"id": "tf-test-project-zhangjiakou:console_manual:tf-log-machine-group",
"logtail_config_name": "console_manual",
"machine_group_name": "tf-log-machine-group",
"project": "tf-test-project-zhangjiakou"
}
}
]
}
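Based on the state shown above, the completed declaration of the imported collection configuration in terraform.tf might look roughly like the sketch below. The values are copied from the input_detail string in the state; writing them with jsonencode instead of a heredoc is a choice made for this example. Whether every key has to be listed for terraform plan to report no changes depends on how the provider compares the JSON, so the plan output remains the final check.

```hcl
resource "alicloud_logtail_config" "console_manual_config" {
  project     = alicloud_log_project.test.name
  logstore    = alicloud_log_store.test.name
  name        = "console_manual"
  input_type  = "file"
  output_type = "LogService"

  # Keys and values taken from the imported state.
  input_detail = jsonencode({
    adjustTimezone = false
    advanced = {
      force_multiconfig = false
      tail_size_kb      = 1024
    }
    delayAlarmBytes    = 0
    delaySkipBytes     = 0
    discardNonUtf8     = false
    discardUnmatch     = false
    dockerExcludeEnv   = {}
    dockerExcludeLabel = {}
    dockerFile         = false
    dockerIncludeEnv   = {}
    dockerIncludeLabel = {}
    enableRawLog       = false
    enableTag          = false
    fileEncoding       = "utf8"
    filePattern        = "test.log"
    filterKey          = []
    filterRegex        = []
    key                = ["content"]
    localStorage       = true
    logBeginRegex      = ".*"
    logPath            = "/root/tmp"
    logTimezone        = ""
    logType            = "common_reg_log"
    maxDepth           = 10
    maxSendRate        = -1
    mergeType          = "topic"
    preserve           = true
    preserveDepth      = 1
    priority           = 0
    regex              = "(.*)"
    sendRateExpire     = 0
    sensitive_keys     = []
    shardHashKey       = []
    tailExisted        = false
    timeFormat         = ""
    topicFormat        = "none"
  })
}
```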
Once terraform plan reports no changes, Terraform's configuration is in sync with the actual resources.
$ terraform plan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.
alicloud_log_project.test: Refreshing state... [id=tf-test-project-zhangjiakou]
alicloud_log_machine_group.test: Refreshing state... [id=tf-test-project-zhangjiakou:tf-log-machine-group]
alicloud_log_store.test: Refreshing state... [id=tf-test-project-zhangjiakou:tf-test-logstore]
alicloud_log_store_index.test: Refreshing state... [id=tf-test-project-zhangjiakou:tf-test-logstore]
alicloud_logtail_config.test: Refreshing state... [id=tf-test-project-zhangjiakou:tf-test-logstore:tf-log-config]
alicloud_logtail_config.console_manual_config: Refreshing state... [id=tf-test-project-zhangjiakou:tf-test-logstore:console_manual]
alicloud_logtail_attachment.console_manual_attachment: Refreshing state... [id=tf-test-project-zhangjiakou:console_manual:tf-log-machine-group]
alicloud_logtail_attachment.test: Refreshing state... [id=tf-test-project-zhangjiakou:tf-log-config:tf-log-machine-group]
------------------------------------------------------------------------
No changes. Infrastructure is up-to-date.
This means that Terraform did not detect any differences between your
configuration and real physical resources that exist. As a result, no
actions need to be performed.
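Bringing existing SLS resources under Terraform management
Users who have long created and managed SLS resources through the console, the Alibaba Cloud CLI, Resource Orchestration Service, or direct API calls face the same problem of importing existing resources when they first adopt Terraform. The general steps are:
- Identify the project that Terraform should take over.
- List all resources in the project, including the Logstores, Logstore indexes, Logstore collection configurations, machine groups, and the associations between machine groups and collection configurations.
- Import the existing resources with terraform import, then manage them uniformly with Terraform.
The procedure is similar to the one in the preceding section on console changes, so it is not repeated here.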
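On Terraform 1.5 or later, the same takeover can alternatively be described declaratively with import blocks instead of running terraform import once per resource. The sketch below assumes such a version; the project, Logstore, and resource names are placeholders, and the ID formats follow the ones visible in the terraform show output earlier in this article. Each import block needs a matching resource block, which can be written by hand or, where it is still missing, drafted with terraform plan -generate-config-out=generated.tf and then reviewed.

```hcl
# Placeholder names; replace them with the resources of the existing project.
import {
  # ID format: project
  to = alicloud_log_project.existing
  id = "my-existing-project"
}

import {
  # ID format: project:logstore
  to = alicloud_log_store.existing
  id = "my-existing-project:my-logstore"
}

import {
  # ID format: project:logstore
  to = alicloud_log_store_index.existing
  id = "my-existing-project:my-logstore"
}

import {
  # ID format: project:machine_group_name
  to = alicloud_log_machine_group.existing
  id = "my-existing-project:my-machine-group"
}

import {
  # ID format: project:logstore:logtail_config_name
  to = alicloud_logtail_config.existing
  id = "my-existing-project:my-logstore:my-config"
}

import {
  # ID format: project:logtail_config_name:machine_group_name
  to = alicloud_logtail_attachment.existing
  id = "my-existing-project:my-config:my-machine-group"
}
```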