すきま風

Notes on things I've studied, and so on

Shipping logs to Google BigQuery and CloudWatch Logs with AWS Fargate × FireLens (Fluentd)

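Key points of this article

  • Ship application logs from containers running on Fargate to both BigQuery and CloudWatch Logs
  • Use Fluentd instead of Fluent Bit, because I want the BigQuery tables to be created automatically
  • Implement a Dockerfile and a custom config file (extra.conf) for Fluentd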

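Implementation

I implement the data shipping to BigQuery and CloudWatch Logs with Fluentd. Shipping alone could also be done with Fluent Bit, but as of this writing (2020/05/30) the Fluent Bit BigQuery plugin does not seem to be able to create tables, so I use Fluentd.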

Fluentd

Dockerfile

FROM fluent/fluentd:v1.10.4-1.0

USER root

# file copy
COPY conf/extra.conf     /fluentd/etc/extra.conf
COPY conf/schema.json    /fluentd/etc/schema.json
COPY conf/conf.out       /fluentd/etc/conf.out
COPY extra_entrypoint.sh /bin/

# install the plugins used in this article (BigQuery, record-reformer, CloudWatch Logs)
# build tools are installed as a virtual package and removed again at the end
RUN apk add --no-cache --update --virtual .build-deps \
        sudo build-base ruby-dev \
    && chmod +x /bin/extra_entrypoint.sh \
    && sudo gem install fluent-plugin-bigquery -v "~> 2.2.0" \
    && sudo gem install fluent-plugin-record-reformer -v "~> 0.9.1" \
    && sudo gem install fluent-plugin-cloudwatch-logs -v "~> 0.9.4" \
    && sudo gem sources --clear-all \
    && apk del .build-deps \
    && rm -rf /tmp/* /var/tmp/* /usr/lib/ruby/gems/*/cache/*.gem \
    && mkdir -p /fluentd/etc/.keys \
    && chown -R fluent:fluent /fluentd/etc

USER fluent

# HACK: redefine entrypoint
ENTRYPOINT ["/bin/extra_entrypoint.sh"]
CMD ["fluentd"]

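To make the json_key used for BigQuery authentication available inside the container, the image wraps the default entrypoint.sh with a small script. The reason for this detour is that, for some reason, the BigQuery plugin would not read the environment variables when I referenced them directly in the config. What I originally tried is shown at the end of the article. (If you know why, please tell me 🤗)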

extra_entrypoint.sh

#!/bin/sh

###### start extra #####
# create key file
cat << EOS > /fluentd/etc/.keys/my-jsonkey.json
{
  "client_email": "${BQ_CLIENT_EMAIL}",
  "private_key": "${BQ_PRIVATE_KEY}"
}
EOS
###### end extra #####

# start default entrypoint
# https://github.com/fluent/fluentd-docker-image/blob/master/v1.10/alpine/entrypoint.sh
# exec replaces this shell, so signals reach the default entrypoint directly
exec /bin/entrypoint.sh "$@"
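One thing to watch with the heredoc above: the values are pasted into the JSON template as-is, so the result is only valid JSON if the secret already holds the private key in its escaped form. The following is a minimal Ruby sketch of that behaviour, not part of the image. The helper names, the e-mail address and the shortened key body are all made up for illustration, and I am assuming the SSM parameter stores the key exactly as it appears in the downloaded service account file (with a literal backslash + n rather than real line breaks).

```ruby
require "json"

# Mimics the heredoc in extra_entrypoint.sh: both values are pasted into the
# template verbatim, with no JSON escaping applied.
def render_key_file(client_email, private_key)
  <<~TEMPLATE
    {
      "client_email": "#{client_email}",
      "private_key": "#{private_key}"
    }
  TEMPLATE
end

# Returns true when the rendered text parses as JSON.
def valid_json?(text)
  JSON.parse(text)
  true
rescue JSON::ParserError
  false
end

# Hypothetical values; a real key body is much longer.
EMAIL = "logger@my-project.iam.gserviceaccount.com"
# Single quotes: the two characters backslash + n, as in a downloaded key file.
ESCAPED_KEY = '-----BEGIN PRIVATE KEY-----\nAAAA\n-----END PRIVATE KEY-----\n'
# Double quotes: real newline characters.
RAW_KEY = "-----BEGIN PRIVATE KEY-----\nAAAA\n-----END PRIVATE KEY-----\n"

puts "escaped form parses: #{valid_json?(render_key_file(EMAIL, ESCAPED_KEY))}"
puts "raw-newline form parses: #{valid_json?(render_key_file(EMAIL, RAW_KEY))}"
```

If the secret were stored with real line breaks, the generated file would have to be built with a JSON encoder rather than a plain template.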

/fluentd/etc/extra.conf

# optional Fluentd config file that is passed to FireLens
# the source directive is already defined in FireLens's default fluent.conf, so it is not written here

# fluentTagDockerFormat is the format for the log tag, which is "containerName-firelens-taskID"
# https://github.com/aws/amazon-ecs-agent/blob/master/agent/engine/docker_task_engine.go#L91
<match "#{ENV['CONTAINER_NAME']}-firelens-**">
  @type  relabel
  @label @firelens_log
  @id    forward_all
</match>

# copy record for bigQuery and cloudwatch logs
<label @firelens_log>
  <match **>
    @type copy
    <store>
      @type relabel
      @label @out_bigquery
    </store>
    <store>
      @type relabel
      @label @out_cloudwatch_logs
    </store>
  </match>
</label>

# include bigquery, cloudwatch config file
@include conf.out/*.conf

The source directive is written in the fluent.conf that AWS provides, so it is not configured here. Also, because the name fluent.conf is reserved by AWS, I named this file extra.conf.
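The routing in extra.conf can be pictured roughly as follows. This is only an illustrative Ruby sketch, not how Fluentd is implemented: the function names are made up, the task ID is a placeholder, and I am approximating the match pattern with a simple prefix check.

```ruby
# Tag format noted in extra.conf: "containerName-firelens-taskID".
def firelens_tag(container_name, task_id)
  "#{container_name}-firelens-#{task_id}"
end

# Approximates <match "#{ENV['CONTAINER_NAME']}-firelens-**"> as a prefix check.
def app_log?(tag, container_name)
  tag.start_with?("#{container_name}-firelens-")
end

# Approximates the copy + relabel stores: every matching record is handed
# to both labels, anything else is not routed by extra.conf.
def route(tag, container_name)
  app_log?(tag, container_name) ? ["@out_bigquery", "@out_cloudwatch_logs"] : []
end

tag = firelens_tag("myapp", "0123456789abcdef")
p route(tag, "myapp")
p route(firelens_tag("sidecar", "0123456789abcdef"), "myapp")
```

The point of the copy step is that both destinations receive the same record, so the filters under @out_bigquery do not affect what ends up in CloudWatch Logs.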

conf.out/bigquery.conf

<label @out_bigquery>
  <filter **>
    @type record_transformer
    <record>
      tag ${tag}
    </record>
  </filter>

  # grep log
  <filter **>
    @type grep
    <regexp>
      key log
      pattern /^\[(?:[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{6})\] \[API\] INFO \- log \- (?:.*) \- \[\]/
    </regexp>
  </filter>

  # extract message, timestamp
  <filter **>
    @type parser
    key_name log
    <parse>
      @type regexp
      expression /^\[(?<timestamp>[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{6})\] \[API\] INFO \- log \- (?<message>.*) \- \[\]/
    </parse>
  </filter>

  # tag rewrite - add ymd
  <match **>
    @type  record_reformer
    @label @insert_bigquery

    enable_ruby true

    # HACK: take the ymd for the BigQuery table suffix from the application log's timestamp (timezone: Asia/Tokyo), because Fluentd itself runs in UTC
    tag #{ENV['ENV']}.${record["timestamp"].slice(0, 10).gsub('-', '')}
  </match>
</label>

<label @insert_bigquery>
  <match **>
    @type bigquery_insert

    auth_method json_key
    json_key /fluentd/etc/.keys/my-jsonkey.json

    # buffer - set chunk keys tag
    # https://docs.fluentd.org/configuration/buffer-section#chunk-keys
    <buffer tag>
      flush_interval 1
    </buffer>

    project my-project
    dataset mytable_${tag[0]}  # mytable_dev / mytable_prod
    table log${tag[1]}         # log20200101
    auto_create_table true
    schema_path /fluentd/etc/schema.json
  </match>
</label>
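To make the flow in bigquery.conf easier to follow, here is a small Ruby sketch that walks one log line through the same steps: parse, rewrite the tag, then derive the dataset and table names. The sample log line and the function names are hypothetical, the pattern is the one from the parser filter with the dot before the microseconds escaped, and splitting the tag on "." is my approximation of how the ${tag[0]} and ${tag[1]} placeholders are filled.

```ruby
# Pattern from the parser filter, with the dot before the microseconds escaped.
LOG_PATTERN = /^\[(?<timestamp>[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{6})\] \[API\] INFO - log - (?<message>.*) - \[\]/

# Returns the extracted fields, or nil for lines the grep filter would drop.
def parse_log(line)
  m = LOG_PATTERN.match(line)
  return nil unless m

  { "timestamp" => m[:timestamp], "message" => m[:message] }
end

# Same expression as the record_reformer tag: "<env>.<yyyymmdd>".
def rewrite_tag(env, record)
  ymd = record["timestamp"].slice(0, 10).gsub("-", "")
  "#{env}.#{ymd}"
end

# Approximates the ${tag[0]} / ${tag[1]} placeholders in bigquery_insert.
def bigquery_target(tag)
  parts = tag.split(".")
  { dataset: "mytable_#{parts[0]}", table: "log#{parts[1]}" }
end

# Hypothetical application log line.
line = "[2020-05-30 23:59:59.123456] [API] INFO - log - user signed in - []"
record = parse_log(line)
tag = rewrite_tag("dev", record)
p record
p tag
p bigquery_target(tag)
```

Because the date is cut out of the log's own timestamp string, the table suffix follows the application's local date rather than the UTC date at which Fluentd processes the record.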

schema.json

[
  {
    "name": "timestamp",
    "type": "TIMESTAMP",
    "mode": "REQUIRED"
  },
  {
    "name": "message",
    "type": "STRING"
  }
]

conf.out/cloudwatch.conf

<label @out_cloudwatch_logs>
  <match **>
    @type cloudwatch_logs
    log_group_name "#{ENV['LOG_GROUP_NAME']}"
    region "#{ENV['AWS_REGION']}"
    use_tag_as_stream  true
    auto_create_stream true
  </match>
</label>

Fargate

task_definition

Only the relevant parts are excerpted.

{
  "name": "${local.myapp_name}",
  "image": "${local.myapp_image}",
  "logConfiguration": {
    "logDriver": "awsfirelens",
    "options": {
      "region": "${data.aws_region.current.name}",
      "auto_create_stream": "true",
      "log_group_name": "${aws_cloudwatch_log_group.myapp.name}",
      "use_tag_as_stream": "true",
      "@type": "cloudwatch_logs"
    }
  }
},
{
  "name": "${local.log_router_name}",
  "image": "${local.log_router_image}",
  "essential": true,
  "firelensConfiguration": {
    "type": "fluentd",
    "options": {
      "config-file-type": "file",
      "config-file-value": "/fluentd/etc/extra.conf"
    }
  },
  "linuxParameters": {
    "initProcessEnabled": true
  },
  "environment" : [
    { "name" : "ENV", "value" : "${local.environment}" },
    { "name" : "CONTAINER_NAME", "value" : "${local.myapp_name}" },
    { "name" : "LOG_GROUP_NAME", "value" : "${aws_cloudwatch_log_group.myapp.name}" },
    { "name" : "AWS_REGION", "value" : "${data.aws_region.current.name}" }
  ],
  "secrets": [
    {
      "name": "BQ_CLIENT_EMAIL",
      "valueFrom": "${aws_ssm_parameter.big_query_client_email.name}"
    },
    {
      "name": "BQ_PRIVATE_KEY",
      "valueFrom": "${aws_ssm_parameter.big_query_private_key.name}"
    }
  ],
  "logConfiguration": {
    "logDriver": "awslogs",
    "options": {
      "awslogs-region": "${data.aws_region.current.name}",
      "awslogs-group": "${aws_cloudwatch_log_group.log_router.name}",
      "awslogs-stream-prefix": "firelens"
    }
  }
}

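Key points

  • config-file-type accepts file or s3, but s3 cannot be selected on Fargate (as of platform version 1.4.0)
  • The environment variables used for json_key are pulled from SSM through secrets
  • The official Fluentd Docker image's entrypoint runs under tini, but since I overrode the ENTRYPOINT, I set initProcessEnabled instead
  • extra.conf matches all of the logs, but logConfiguration#options is still required so that the fluent.conf on the AWS side gets initialized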

Bonus

This is how I really wanted to specify json_key (but it did not work) 😔

<match dummy>
  @type bigquery_insert

  auth_method json_key
  json_key {"private_key": "#{ENV['BQ_PRIVATE_KEY']}", "client_email": "#{ENV['BQ_CLIENT_EMAIL']}"}

</match>

References

GitHub - fluent-plugins-nursery/fluent-plugin-bigquery

BigQuery - Fluent Bit: Official Manual

Custom log routing - Amazon ECS

GitHub - fluent/fluentd-docker-image: Docker image for Fluentd

Running a self-built Fluentd with the FireLens log driver on Fargate - Qiita