8.6.1. CSV parsers

The syslog-ng application can separate parts of log messages (i.e., the contents of the $MSG macro) into named fields (columns). These fields act as user-defined macros that can be referenced in message templates, file and table names, etc.

To create a parser, define the columns of the message, the delimiter or separator characters, and optionally the characters that are used to escape the delimiter characters (quote-pairs).

Declaration:
    parser parser_name {
        csv-parser(
            columns(column1, column2, ...)
            delimiters()
            quote-pairs()
        );
    };

Column names work like macros. Always use a prefix to identify the columns of the parsers, e.g., MYPARSER1.COLUMN1, MYPARSER2.COLUMN2, etc. Column names starting with a dot (e.g., .HOST) are reserved for use by syslog-ng.
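The following sketch shows how prefixed column names can be referenced in a destination template after the parser has run. The parser, destination, and file names (p_myparser1, d_columns, /var/log/columns.log) are hypothetical, and the source s_local is assumed to be defined elsewhere in the configuration.

```
# Hypothetical parser: splits a comma-separated message into two prefixed columns
parser p_myparser1 {
    csv-parser(columns("MYPARSER1.COLUMN1", "MYPARSER1.COLUMN2")
        delimiters(","));
};

# The prefixed column names are referenced like any other macro in the template
destination d_columns {
    file("/var/log/columns.log"
        template("first=${MYPARSER1.COLUMN1} second=${MYPARSER1.COLUMN2}\n"));
};

# s_local is assumed to be defined elsewhere
log { source(s_local); parser(p_myparser1); destination(d_columns); };
```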

Name: csv-parser
Synopsis: csv-parser(columns("PARSER.COLUMN1", "PARSER.COLUMN2", ...))
Description: Specifies the type of parser to use and the names of the columns to separate messages into. Currently only the csv-parser is implemented, which can separate columns based on delimiter characters and strings.

Name: delimiters
Synopsis: delimiters("<delimiter_characters>")
Description: The characters that separate the columns in the message. If the string contains more than one character, each of them acts as a delimiter; for example, delimiters("/: ") splits at slashes, colons, and spaces.

Name: flags
Synopsis: flags(drop-invalid, escape-none, escape-backslash, escape-double-char, greedy, strip-whitespace)
Description:
    drop-invalid: The parser does not process messages that have fewer columns than defined in the parser. In practice, this option turns the parser into a special filter that matches only messages that have the predefined number of columns (using the specified delimiters).
    escape-none, escape-backslash, escape-double-char: These flags set the escaping rules used by the parser.
    greedy: The parser assigns the remainder of the message to the last column, regardless of the delimiter characters set. Use this option to process messages where the number of columns varies.
    strip-whitespace: The parser removes leading and trailing whitespace from the columns.

Name: quote-pairs
Synopsis: quote-pairs('<quote_pairs>')
Description: Lists the quote-pairs between single quotes. Delimiter characters enclosed between quote characters are ignored. The opening and closing quote characters do not have to be identical; for example, [} can also be a quote-pair.

Name: template
Synopsis: template("${<macroname>}")
Description: The macro that contains the part of the message that the parser processes. It can also be a macro created by a previous parser in the log path. By default, the parser processes the entire message part, that is, the contents of the $MSG macro.

Table 8.21. Parser parameters
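As an illustration of the flags parameter, the following sketch combines drop-invalid with strip-whitespace. Because drop-invalid turns the parser into a special filter, only messages that have the three expected comma-separated columns are expected to reach the destination. All object names and the file path are hypothetical, and s_local is assumed to be defined elsewhere.

```
# Hypothetical parser: messages with fewer than three columns are not processed
parser p_three_columns {
    csv-parser(columns("CSV.FIRST", "CSV.SECOND", "CSV.THIRD")
        delimiters(",")
        flags(drop-invalid, strip-whitespace));
};

destination d_valid_csv { file("/var/log/valid-csv.log"); };

# Only messages accepted by p_three_columns continue to d_valid_csv
log { source(s_local); parser(p_three_columns); destination(d_valid_csv); };
```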


Example 8.27. Segmenting hostnames separated with a dash

The following example separates hostnames like example-1 and example-2 into two parts.

parser p_hostname_segmentation {
    csv-parser(columns("HOSTNAME.NAME", "HOSTNAME.ID")
    delimiters("-")
    flags(escape-none)
    template("${HOST}"));
};
destination d_file { file("/var/log/messages-${HOSTNAME.NAME:-examplehost}"); };
log { source(s_local); parser(p_hostname_segmentation); destination(d_file);};
Example 8.28. Parsing Apache log files

The following parser processes the access log of Apache web servers and separates the messages into different fields. Apache log messages can use a format like the following:

"%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %T %v"

Here is a sample message:

192.168.1.1 - - [31/Dec/2007:00:17:10 +0100] "GET /cgi-bin/example.cgi HTTP/1.1" 200 2708 "-" "curl/7.15.5 (i486-pc-linux-gnu) libcurl/7.15.5 OpenSSL/0.9.8c zlib/1.2.3 libidn/0.6.5" 2 example.balabit

To parse such logs, the delimiter character is set to a single space (delimiters(" ")). Spaces enclosed in double quotes or square brackets are not treated as delimiters (quote-pairs('""[]')).

parser p_apache {
    csv-parser(columns("APACHE.CLIENT_IP", "APACHE.IDENT_NAME", "APACHE.USER_NAME",
            "APACHE.TIMESTAMP", "APACHE.REQUEST_URL", "APACHE.REQUEST_STATUS",
            "APACHE.CONTENT_LENGTH", "APACHE.REFERER", "APACHE.USER_AGENT",
            "APACHE.PROCESS_TIME", "APACHE.SERVER_NAME")
        flags(escape-double-char, strip-whitespace)
        delimiters(" ")
        quote-pairs('""[]')
    );
};

The results can be used, for example, to separate log messages into different files based on the APACHE.USER_NAME field. If the field is empty, the name nouser is used.

destination d_file { file("/var/log/messages-${APACHE.USER_NAME:-nouser}"); };
log { source(s_local); parser(p_apache); destination(d_file); };
Example 8.29. Segmenting a part of a message

The following example splits the timestamp of a parsed Apache log message into separate fields.

parser p_apache_timestamp {
    csv-parser(columns("APACHE.TIMESTAMP.DAY", "APACHE.TIMESTAMP.MONTH", "APACHE.TIMESTAMP.YEAR",
            "APACHE.TIMESTAMP.HOUR", "APACHE.TIMESTAMP.MIN", "APACHE.TIMESTAMP.SEC",
            "APACHE.TIMESTAMP.ZONE")
        delimiters("/: ")
        flags(escape-none)
        template("${APACHE.TIMESTAMP}"));
};
log { source(s_local); parser(p_apache); parser(p_apache_timestamp); destination(d_file); };
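The segmented timestamp fields can be used like any other macro. The following sketch stores the Apache messages in one directory per year and month; the destination name and the /var/log/apache path are hypothetical, and create_dirs(yes) lets syslog-ng create the missing directories. For the sample message shown in Example 8.28, this would be expected to write to /var/log/apache/2007/Dec/access.log.

```
# Hypothetical destination: one directory per year and month of the Apache timestamp
destination d_apache_by_date {
    file("/var/log/apache/${APACHE.TIMESTAMP.YEAR}/${APACHE.TIMESTAMP.MONTH}/access.log"
        create_dirs(yes));
};

# p_apache must run first, because p_apache_timestamp parses APACHE.TIMESTAMP
log { source(s_local); parser(p_apache); parser(p_apache_timestamp); destination(d_apache_by_date); };
```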
Example 8.30. Adding the end of the message to the last column

If the greedy option is enabled, the syslog-ng application adds the not-yet-parsed part of the message to the last column, ignoring any delimiter characters that may appear in this part of the message.

For example, suppose you receive the following comma-separated message: example1, example2, example3, and you segment it with the following parser:

csv-parser(columns("COLUMN1", "COLUMN2", "COLUMN3") delimiters(","));

The COLUMN1, COLUMN2, and COLUMN3 variables will contain the strings example1, example2, and example3, respectively. If the message looks like example1, example2, example3, some more information, then any text appearing after the third comma (i.e., some more information) is not parsed, and is possibly lost if you use only the variables to reconstruct the message (for example, to send it to different columns of an SQL table).

Using the greedy flag assigns the remainder of the message to the last column, so the COLUMN1, COLUMN2, and COLUMN3 variables contain the strings example1, example2, and example3, some more information, respectively.

csv-parser(columns("COLUMN1", "COLUMN2", "COLUMN3") delimiters(",") flags(greedy));
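Because the sample message contains a space after each comma, columns split with delimiters(",") will likely begin with a space. The following sketch, assuming that the greedy and strip-whitespace flags can be combined in a single flags() option, also trims the values; the parser name p_greedy is hypothetical.

```
parser p_greedy {
    # greedy: the last column receives the rest of the message, commas included
    # strip-whitespace: leading and trailing whitespace is removed from each column
    csv-parser(columns("COLUMN1", "COLUMN2", "COLUMN3")
        delimiters(",")
        flags(greedy, strip-whitespace));
};
```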

© 2007-2010 BalaBit IT Security
Please send your comments or documentation bugs to: documentation@balabit.com