7.4. hdfs: Storing messages on the Hadoop Distributed File System (HDFS)

Starting with version 5.3, syslog-ng PE can send plain-text log files to the Hadoop Distributed File System (HDFS), allowing you to store your log data on a distributed, scalable file system. This is especially useful if you have a huge amount of log messages that would be difficult to store otherwise, or if you want to process your messages using Hadoop tools (for example, Apache Pig).

Note

To use this destination, syslog-ng Premium Edition must run in server mode. Typically, only the central syslog-ng Premium Edition server uses this destination. For details on the server mode, see Section 2.3.3, Server mode.

Note the following limitations when using the syslog-ng PE hdfs destination:

  • This destination is supported only on Linux platforms that use the Linux glibc2.11 installer, including Debian 7 (wheezy), Red Hat ES 7, Ubuntu 12.04 (Precise Pangolin), and Ubuntu 14.04 (Trusty Tahr).

  • Since syslog-ng PE uses the official Java HDFS client, the hdfs destination uses a significant amount of memory (about 400 MB).

  • You cannot control when log messages are flushed. Hadoop flushes them automatically, depending on its configured block size and the amount of data received, and syslog-ng PE has no way to influence when the messages are actually written to disk. As a result, syslog-ng PE cannot guarantee that a message sent to HDFS is actually written to disk. When using flow-control, syslog-ng PE acknowledges a message as written to disk as soon as it passes the message to the HDFS client. This method is only as reliable as your HDFS environment.

  • The log messages of the underlying client libraries are available in the internal() source of syslog-ng PE.
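The messages of the HDFS client libraries arrive through the internal() source, so they appear only if a log path reads that source. The following is a minimal sketch that writes these internal messages to a separate file. The names s_internal and d_internal_log, and the file path, are illustrative assumptions and are not required by the hdfs destination. If your configuration already processes the internal() source (for example, through an existing local log path), you do not need a second one.

```
# Hypothetical names: s_internal, d_internal_log, and the log file path are illustrative only.
source s_internal {
    internal();    # messages of syslog-ng PE itself, including the HDFS client libraries
};

destination d_internal_log {
    file("/var/log/syslog-ng-internal.log");
};

log {
    source(s_internal);
    destination(d_internal_log);
};
```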

Note

The hdfs destination has been tested with Hortonworks Data Platform.

Declaration: 

@module mod-java
@include "scl.conf"

hdfs(
    client-lib-dir("/opt/syslog-ng/lib/syslog-ng/java-modules/:<path-to-preinstalled-hadoop-libraries>")
    hdfs-uri("hdfs://NameNode:8020")
    hdfs-file("<path-to-logfile>")
);

Example 7.10. Storing logfiles on HDFS

The following example defines an hdfs destination using only the required parameters.

@module mod-java
@include "scl.conf"

destination d_hdfs {
    hdfs(
        client-lib-dir("/opt/syslog-ng/lib/syslog-ng/java-modules/:/opt/hadoop/libs")
        hdfs-uri("hdfs://10.140.32.80:8020")
        hdfs-file("/user/log/logfile.txt")
    );
};
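Defining the destination alone does not route any messages to it, because a log path must reference d_hdfs. The following is a minimal sketch that sends messages received over the network to the destination above, with flow-control enabled. As noted in the limitations, syslog-ng PE treats a message as written once it is passed to the HDFS client. The source name s_net and the TCP port 514 are assumptions for illustration; use the sources of your own configuration.

```
# Hypothetical source: s_net and port 514 are illustrative assumptions.
source s_net {
    network(
        transport("tcp")
        port(514)
    );
};

# Route the received messages to the d_hdfs destination defined in Example 7.10.
log {
    source(s_net);
    destination(d_hdfs);
    flags(flow-control);    # acknowledge messages once they are passed to the HDFS client
};
```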