Installing a Monitoring System with Grafana + Telegraf + InfluxDB

Ops/Monitoring

by 크리두 2019. 11. 19. 10:05

Server environment: Ubuntu 18.04

 

Installing InfluxDB

Add the InfluxData repository to the apt sources list

$ echo "deb https://repos.influxdata.com/ubuntu bionic stable" | sudo tee /etc/apt/sources.list.d/influxdb.list

 

Import the apt key (curl itself does not need sudo; only apt-key does)

$ curl -sL https://repos.influxdata.com/influxdb.key | sudo apt-key add -



Update the apt index and install InfluxDB

$ sudo apt-get update

$ sudo apt-get install -y influxdb

 

Start and enable the service to start on boot up.

 

$ sudo systemctl enable --now influxdb

$ sudo systemctl is-enabled influxdb
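
Once the service is enabled, it can help to confirm that InfluxDB actually answers HTTP requests before moving on. The following is a minimal sketch, assuming InfluxDB 1.x on its default port 8086 on the same host; the INFLUX_URL variable and the ping_status_message helper are names made up for this example. The /ping endpoint of InfluxDB 1.x replies with HTTP 204 when the server is healthy.

```shell
#!/bin/sh
# Assumption: InfluxDB 1.x listens on the default HTTP port 8086 on this host.
INFLUX_URL="${INFLUX_URL:-http://127.0.0.1:8086}"

# Hypothetical helper: turn the HTTP status of /ping into a readable message.
ping_status_message() {
  case "$1" in
    204) echo "InfluxDB is up" ;;
    000) echo "InfluxDB is unreachable" ;;
    *)   echo "unexpected HTTP status: $1" ;;
  esac
}

# -o /dev/null discards the body, -w prints only the status code.
# curl prints 000 when it cannot connect at all.
code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$INFLUX_URL/ping" 2>/dev/null || true)
ping_status_message "${code:-000}"
```

If the message says the server is unreachable, `sudo systemctl status influxdb` is the next place to look.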



Installing Grafana

 

Reference

https://grafana.com/docs/installation/debian/ 



$ sudo apt-get install -y software-properties-common



$ sudo add-apt-repository "deb https://packages.grafana.com/oss/deb stable main"

 

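Error note

If apt-get update fails with a NO_PUBKEY error, import the missing key, replacing <NO_PUBKEY value> with the key ID printed in the error message:

# apt-key adv --keyserver keyserver.ubuntu.com --recv-keys <NO_PUBKEY value>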

 

$ sudo apt-get install -y apt-transport-https
$ wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
$ sudo apt-get update
$ sudo apt-get install -y grafana

 

# systemctl daemon-reload
# systemctl start grafana-server
# systemctl status grafana-server

 

$ sudo systemctl enable grafana-server.service
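
After the service starts, a quick health check shows whether Grafana is reachable. This is a rough sketch, assuming the default Grafana HTTP port 3000 on the same host; GRAFANA_URL and grafana_db_ok are names chosen for this example. Grafana's /api/health endpoint returns a small JSON document whose "database" field is "ok" when the server is healthy.

```shell
#!/bin/sh
# Assumption: Grafana listens on its default port 3000 on this host.
GRAFANA_URL="${GRAFANA_URL:-http://127.0.0.1:3000}"

# Hypothetical helper: read a /api/health response on stdin and succeed
# only if it reports "database": "ok".
grafana_db_ok() {
  grep -Eq '"database"[[:space:]]*:[[:space:]]*"ok"'
}

body=$(curl -s --max-time 5 "$GRAFANA_URL/api/health" 2>/dev/null || true)
if printf '%s' "$body" | grafana_db_ok; then
  echo "Grafana is healthy"
else
  echo "Grafana did not report a healthy state; check: systemctl status grafana-server"
fi
```

When the check passes, the web UI should be available in a browser at the same address.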

 

Installing Telegraf

 

Reference

https://docs.influxdata.com/telegraf/v1.12/introduction/installation/


Add the repository

# wget -qO- https://repos.influxdata.com/influxdb.key | apt-key add -

# source /etc/lsb-release

# echo "deb https://repos.influxdata.com/${DISTRIB_ID,,} ${DISTRIB_CODENAME} stable" | tee /etc/apt/sources.list.d/influxdb.list
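
The `${DISTRIB_ID,,}` expansion in the command above is a bash feature that lowercases the value, so it will not work in plain sh. As an illustration, here is a portable sketch that builds the same line with tr; the repo_line function is a name invented for this example.

```shell
#!/bin/sh
# Build the apt source line for the InfluxData repository.
# $1 = distribution ID as found in /etc/lsb-release (for example "Ubuntu")
# $2 = release codename (for example "bionic")
repo_line() {
  # tr lowercases the ID, which is what ${DISTRIB_ID,,} does in bash.
  id=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  printf 'deb https://repos.influxdata.com/%s %s stable\n' "$id" "$2"
}

# Values that /etc/lsb-release provides on Ubuntu 18.04.
repo_line "Ubuntu" "bionic"
```

On Ubuntu 18.04 this prints `deb https://repos.influxdata.com/ubuntu bionic stable`, which is the same entry written to /etc/apt/sources.list.d/influxdb.list during the InfluxDB installation, so the repository step only needs to be done once per host.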

Error note

gpg: no valid OpenPGP data found.

 

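If this error appears, check that the server can reach repos.influxdata.com.

(The repository is sometimes unreachable, in which case the key download returns nothing usable, so verify the network connection.)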
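
One way to narrow this error down is to download the key to a temporary file first and look at what actually arrived. The sketch below assumes the key URL used in the steps above; looks_like_pgp_key is a helper name made up for this example. It only checks for the ASCII-armor header that a PGP public key file starts with.

```shell
#!/bin/sh
# Key URL taken from the installation steps above.
KEY_URL="https://repos.influxdata.com/influxdb.key"

# Hypothetical helper: succeed if the file contains a PGP public key header.
looks_like_pgp_key() {
  grep -q -- '-----BEGIN PGP PUBLIC KEY BLOCK-----' "$1"
}

tmp=$(mktemp)
# -f makes curl fail on HTTP errors instead of saving an error page.
if curl -fsSL --max-time 10 "$KEY_URL" -o "$tmp" 2>/dev/null && looks_like_pgp_key "$tmp"; then
  echo "key downloaded; it can be passed to apt-key add"
else
  echo "download failed or the response is not a PGP key; check DNS, proxy, and firewall"
fi
rm -f "$tmp"
```

An empty file or an HTML error page from a proxy both produce the "no valid OpenPGP data found" message when piped straight into apt-key.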

 

Install apt-transport-https and update apt

$ sudo apt-get install apt-transport-https -y

$ sudo apt-get update


On Debian, add the InfluxData key and the repository that matches the release instead:

$ wget -qO- https://repos.influxdata.com/influxdb.key | sudo apt-key add -
$ source /etc/os-release
$ test "$VERSION_ID" = "7" && echo "deb https://repos.influxdata.com/debian wheezy stable" | sudo tee /etc/apt/sources.list.d/influxdb.list
$ test "$VERSION_ID" = "8" && echo "deb https://repos.influxdata.com/debian jessie stable" | sudo tee /etc/apt/sources.list.d/influxdb.list
$ test "$VERSION_ID" = "9" && echo "deb https://repos.influxdata.com/debian stretch stable" | sudo tee /etc/apt/sources.list.d/influxdb.list
$ test "$VERSION_ID" = "10" && echo "deb https://repos.influxdata.com/debian buster stable" | sudo tee /etc/apt/sources.list.d/influxdb.list

Install Telegraf

$ sudo apt-get update && sudo apt-get install telegraf -y
$ sudo service telegraf start

 

Configuring telegraf.conf (/etc/telegraf/telegraf.conf)

# Global tags can be specified here in key="value" format.
[global_tags]
  # dc = "us-east-1" # will tag all metrics with dc=us-east-1
  # rack = "1a"
  ## Environment variables can be used as tags, and throughout the config file
  # user = "$USER"


# Configuration for telegraf agent
[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = ""
  debug = false
  quiet = false
  hostname = "hostname" # change to this host's name
  omit_hostname = false


### OUTPUT

# Configuration for influxdb server to send metrics to
[[outputs.influxdb]]
  urls = ["http://127.0.0.1:8086"] # change to the InfluxDB address
  database = "xdn_telegraf"

  ## Retention policy to write to. Empty string writes to the default rp.
  retention_policy = ""
  ## Write consistency (clusters only), can be: "any", "one", "quorum", "all"
  write_consistency = "any"

  ## Write timeout (for the InfluxDB client), formatted as a string.
  ## If not provided, will default to 5s. 0s means no timeout (not recommended).
  timeout = "5s"
  # username = "telegraf"
  # password = "2bmpiIeSWd63a7ew"
  ## Set the user agent for HTTP POSTs (can be useful for log differentiation)
  # user_agent = "telegraf"
  ## Set UDP payload size, defaults to InfluxDB UDP Client default (512 bytes)
  # udp_payload = 512


# Read metrics about cpu usage
[[inputs.cpu]]
  ## Whether to report per-cpu stats or not
  percpu = true
  ## Whether to report total system cpu stats or not
  totalcpu = true
  ## Comment this line if you want the raw CPU time metrics
  fielddrop = ["time_*"]


# Read metrics about disk usage by mount point
[[inputs.disk]]
  ## By default, telegraf gather stats for all mountpoints.
  ## Setting mountpoints will restrict the stats to the specified mountpoints.
  # mount_points = ["/"]

  ## Ignore some mountpoints by filesystem type. For example (dev)tmpfs (usually
  ## present on /run, /var/run, /dev/shm or /dev).
  ignore_fs = ["tmpfs", "devtmpfs"]


# Read metrics about disk IO by device
[[inputs.diskio]]
  ## By default, telegraf will gather stats for all devices including
  ## disk partitions.
  ## Setting devices will restrict the stats to the specified devices.
  # devices = ["sda", "sdb"]
  ## Uncomment the following line if you need disk serial numbers.
  # skip_serial_number = false


# Get kernel statistics from /proc/stat
[[inputs.kernel]]
  # no configuration


# Read metrics about memory usage
[[inputs.mem]]
  # no configuration


# Get the number of processes and group them by status
[[inputs.processes]]
  # no configuration


# Read metrics about swap memory usage
[[inputs.swap]]
  # no configuration


# Read metrics about system load & uptime
[[inputs.system]]
  # no configuration

# Read metrics about network interface usage
[[inputs.net]]
  # collect data only about specific interfaces
  # interfaces = ["eth0"]


[[inputs.netstat]]
  # no configuration

[[inputs.interrupts]]
  # no configuration

[[inputs.linux_sysctl_fs]]
  # no configuration

[[inputs.ping]]
  ## List of urls to ping
  urls = ["IP"] # add the addresses to ping

  ## Number of pings to send per collection (ping -c )
  # count = 1

  ## Interval, in s, at which to ping. 0 == default (ping -i )
  ## Not available in Windows.
  # ping_interval = 1.0

  ## Per-ping timeout, in s. 0 == no timeout (ping -W )
  # timeout = 1.0

  ## Total-ping deadline, in s. 0 == no deadline (ping -w )
  # deadline = 10

  ## Interface or source address to send ping from (ping -I <INTERFACE/SRC_ADDR>)
  ## on Darwin and Freebsd only source address possible: (ping -S )
  # interface = ""

  ## Specify the ping executable binary, default is "ping"
  # binary = "ping"

  ## Arguments for ping command
  ## when arguments is not empty, other options (ping_interval, timeout, etc) will be ignored
  # arguments = ["-c", "3"]
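
Before restarting the service with the edited file, the configuration can be checked with Telegraf's --test flag, which gathers the inputs once and prints the metrics to stdout without sending them to InfluxDB. The following is a minimal sketch, assuming the package default path /etc/telegraf/telegraf.conf; CONF and count_metrics are names invented for this example. Metric lines in --test output start with "> ".

```shell
#!/bin/sh
# Assumption: the config lives at the package default location.
CONF="${CONF:-/etc/telegraf/telegraf.conf}"

# Hypothetical helper: count metric lines ("> measurement,tags fields ts") on stdin.
count_metrics() {
  grep -c '^> ' || true
}

if command -v telegraf >/dev/null 2>&1; then
  # Gather once and count what came back; log lines go to stderr and are dropped.
  n=$(telegraf --config "$CONF" --test 2>/dev/null | count_metrics)
  echo "telegraf --test produced $n metric lines"
  # Restart only when the config produced at least one metric.
  if [ "$n" -gt 0 ]; then
    sudo systemctl restart telegraf
  fi
else
  echo "telegraf is not installed on this host"
fi
```

If the count is zero, running `telegraf --config /etc/telegraf/telegraf.conf --test` by hand shows the parse or plugin errors on the terminal.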

 

* Error reference

[inputs.ping] did not complete within its interval

--> https://github.com/influxdata/telegraf/issues/5796 (input did not complete within its interval · Issue #5796 · influxdata/telegraf)

--> https://github.com/influxdata/telegraf/issues/1654 (Ping logs a parse error on timeout · Issue #1654 · influxdata/telegraf)

 

Setting Up Slack Alerts

 

https://grafana.com/docs/v4.1/alerting/notifications/ (Alerting Notifications)

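In the Slack section of that page, follow the https://api.slack.com/messaging/webhooks link and create a Slack app to obtain an incoming webhook URL.

Then add the target channel in your own Slack workspace.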
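
Before entering the webhook URL into Grafana, it can be tested directly with curl. This is a sketch under assumptions: SLACK_WEBHOOK_URL is an environment variable you set to the URL issued by your Slack app, and slack_payload is a helper name made up for this example. Slack incoming webhooks accept a JSON body with a "text" field.

```shell
#!/bin/sh
# Hypothetical helper: wrap a one-line message in the JSON body Slack expects.
# Backslashes and double quotes are escaped; newlines are not handled.
slack_payload() {
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"text":"%s"}' "$escaped"
}

# SLACK_WEBHOOK_URL is an assumption: export it with the URL from your Slack app.
if [ -n "${SLACK_WEBHOOK_URL:-}" ]; then
  curl -sS -X POST -H 'Content-type: application/json' \
    --data "$(slack_payload 'Grafana alert channel test')" \
    "$SLACK_WEBHOOK_URL"
else
  echo "set SLACK_WEBHOOK_URL to the webhook URL from your Slack app" >&2
fi
```

If the message shows up in the channel, the same URL can be used for the Slack notification channel in Grafana.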
