8.6.2. Pattern databases

8.6.2.1. Using pattern parsers

Pattern parsers attempt to parse a part of the message using rules specific to the type of the parser. Parsers are enclosed between @ characters. The syntax of parsers is the following:

  • a beginning @ character;

  • the type of the parser written in capitals;

  • optionally a name;

  • parameters of the parser, if any;

  • a closing @ character.

[Example] Example 8.31. Pattern parser syntax

A simple parser:

@STRING@

A named parser:

@STRING:myparser_name@

A named parser with a parameter:

@STRING:myparser_name:*@

A parser with a parameter, but without a name:

@STRING::*@
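
The notation can be illustrated with a short Python sketch that splits a parser into its type, name, and parameter. This only demonstrates the syntax described above and is not how syslog-ng itself processes patterns; split_parser is a hypothetical helper name introduced for this illustration.

```python
def split_parser(token):
    """Split '@TYPE:name:param@' into a (type, name, param) tuple.

    Hypothetical helper for illustration only; missing parts are None.
    """
    if len(token) < 3 or not (token.startswith("@") and token.endswith("@")):
        raise ValueError("a parser must be enclosed between @ characters")
    body = token[1:-1]
    # Split on the first two colons only, so the parameter itself may be ':'.
    parts = body.split(":", 2)
    parser_type = parts[0]
    name = parts[1] if len(parts) > 1 and parts[1] else None
    param = parts[2] if len(parts) > 2 and parts[2] else None
    return parser_type, name, param


for example in ("@STRING@", "@STRING:myparser_name@",
                "@STRING:myparser_name:*@", "@STRING::*@"):
    print(example, "->", split_parser(example))
```

The empty name in @STRING::*@ is what allows a parameter to be given without naming the parser: the second field is simply left blank.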

The following parsers are available:

  • @ANYSTRING@: Parses everything to the end of the message; you can use it to collect everything that is not parsed specifically to a single macro. In that sense its behavior is similar to the greedy() option of the CSV parser.

  • @DOUBLE@: An obsolete alias of the @FLOAT@ parser.

  • @ESTRING@: This parser has a required parameter that acts as the stopcharacter: the parser parses everything until it finds the stopcharacter. For example, to stop at the next " (double quote) character, use @ESTRING::"@. As of syslog-ng 3.1, it is possible to specify a stopstring instead of a single character, e.g., @ESTRING::stop_here.@.

  • @FLOAT@: A floating-point number that may contain a dot (.) character. (Up to syslog-ng 3.1, the name of this parser was @DOUBLE@.)

  • @IPv4@: Parses an IPv4 IP address (numbers separated with a maximum of 3 dots).

  • @IPv6@: Parses any valid IPv6 IP address.

  • @IPvANY@: Parses any IP address.

  • @NUMBER@: A sequence of decimal digits (0-9), e.g., 1, 0687. Note that if the number starts with the 0x characters, it is parsed as a hexadecimal number, but only if at least one valid character follows 0x.

  • @QSTRING@: Parses a string between the quote characters specified as a parameter. Note that the quote character can be different at the beginning and the end of the quote, e.g., @QSTRING::"@ parses everything between two quotation marks ("), while @QSTRING::<>@ parses from an opening bracket to the closing bracket.

  • @STRING@: A sequence of alphanumeric characters (0-9, A-Z, a-z), not including any whitespace. Optionally, other accepted characters can be listed as parameters (e.g., to parse a complete sentence, add the space character as a parameter, like: @STRING:: @). Note that the @ character cannot be a parameter, nor can line-breaks or tabs.

Patterns and literals can be mixed together. For example, to parse a message that begins with the Host: string followed by an IP address (e.g., Host: 192.168.1.1), the following pattern can be used: Host: @IPv4@.

[Note] Note

Note that using parsers is a CPU-intensive operation. Use the ESTRING and QSTRING parsers whenever possible, as these can be processed much faster than the other parsers.

[Example] Example 8.32. Using the STRING and ESTRING parsers

For example, if the message is user=joe96 group=somegroup, @STRING:mytext:@ parses only to the first non-alphanumeric character (=), parsing only user. @STRING:mytext:=@ parses the equals sign as well, and proceeds to the next non-alphanumeric character (the whitespace), resulting in user=joe96 being parsed. @STRING:mytext:= @ parses the whitespace as well, and proceeds to the next character that is not alphanumeric, an equals sign, or a whitespace, resulting in user=joe96 group=somegroup.

Of course, usually it is better to parse the different values separately, like this: "user=@STRING:user@ group=@STRING:group@".

If the username or the group may contain non-alphanumeric characters, you can either include these in the parameter of the parser (as shown at the beginning of this example), or use an ESTRING parser to parse the message until the next whitespace: "user=@ESTRING:user: @group=@ESTRING:group: @". Note that the second ESTRING parser in this pattern matches only if a whitespace follows the group value as well.
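
The behavior described in this example can be reproduced with a rough Python emulation. It is a sketch of the documented semantics only, not the syslog-ng implementation: parse_string and parse_estring are hypothetical helpers, and the assumption that ESTRING consumes the stopcharacter without including it in the parsed value is inferred from the pattern shown above.

```python
import string

ALNUM = set(string.ascii_letters + string.digits)


def parse_string(text, extra=""):
    """Emulate @STRING::extra@: take alphanumerics plus the extra characters.

    Returns (parsed value, remaining text).
    """
    allowed = ALNUM | set(extra)
    end = 0
    while end < len(text) and text[end] in allowed:
        end += 1
    return text[:end], text[end:]


def parse_estring(text, stop):
    """Emulate @ESTRING::stop@: take everything before the stop string.

    Returns (parsed value, remaining text), or None if stop never occurs.
    """
    index = text.find(stop)
    if index < 0:
        return None  # no stop string found, so the parser does not match
    # Assumption: the stop string is consumed but not part of the value.
    return text[:index], text[index + len(stop):]


message = "user=joe96 group=somegroup"
print(parse_string(message))        # ('user', '=joe96 group=somegroup')
print(parse_string(message, "="))   # ('user=joe96', ' group=somegroup')
print(parse_string(message, "= "))  # ('user=joe96 group=somegroup', '')
print(parse_estring("joe96 group=somegroup", " "))  # ('joe96', 'group=somegroup')
```

The ESTRING variant only has to search for one stop string, whereas the STRING variant checks every character against a set of accepted characters.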

8.6.2.2. Filtering messages based on classification

The results of message classification and parsing can be used in custom filters and file and database templates as well. There are two built-in macros in syslog-ng that allow you to use the results of the classification: the .classifier.class macro contains the class assigned to the message (e.g., violation, security, or unknown), while the .classifier.rule_id macro contains the identifier of the message pattern that matched the message.

[Example] Example 8.33. Using classification results for filtering messages

To filter on a specific message class, create a filter that checks the .classifier.class macro, and use this filter in a log statement.

filter fi_class_violation {
    match("violation"
        value(".classifier.class")
        type("string")
    );
};
log {
    source(s_all);
    parser(pattern_db);
    filter(fi_class_violation);
    destination(di_class_violation);
};

Filtering on the unknown class selects messages that did not match any rule of the pattern database. Routing these messages into a separate file allows you to periodically review new or unknown messages.

To filter on messages matching a specific classification rule, create a filter that checks the .classifier.rule_id macro. The unique identifier of the rule (e.g., e1e9c0d8-13bb-11de-8293-000c2922ed0a) is the id attribute of the rule in the XML database.

filter fi_class_rule {
    match("e1e9c0d8-13bb-11de-8293-000c2922ed0a"
        value(".classifier.rule_id")
        type("string")
    );
};

The message segments parsed by the pattern parsers can also be used as macros. To accomplish this, add a name to the parser, and then use this name as a macro that refers to the parsed value of the message.

[Example] Example 8.34. Using pattern parsers as macros

For example, you want to parse messages of an application that look like "Transaction: <type>.", where <type> is a string that has different values (e.g., refused, accepted, incomplete, etc.). To parse these messages, you can use the following pattern:

'Transaction: @ESTRING::.@'

Here the @ESTRING@ parser parses the message until the next full stop character. To use the results in a filter or a filename template, include a name in the parser of the pattern, e.g.:

'Transaction: @ESTRING:TRANSACTIONTYPE:.@'

After that, add a custom filter or template to the log path that uses this macro. For example, to select every accepted transaction, use the following custom filter in the log path:

match("accepted" value("TRANSACTIONTYPE"));

[Note] Note

The above macros can be used in database columns and filename templates as well, if you create custom templates for the destination or logspace.

Use a consistent naming scheme for your macros, for example, APPLICATIONNAME_MACRONAME.

8.6.2.3. Creating pattern databases

Pattern databases are XML files that contain rules describing the message patterns.

The XML schema of the V1 pattern database used in syslog-ng OSE and PE 3.0.X is the following:

[Warning] Warning

This is an experimental database format that will change in future releases of syslog-ng. When the new format is released, an upgrade script will be available to convert the existing databases to the new format. Note that the sample pattern databases available at the BalaBit website already use the new format (dubbed V2).

  • <patterndb>: The container element of the pattern database. For example:

    <patterndb version='1' pub_date='2008-08-25'>

    • version: The schema version of the pattern database. The current version is 2.

    • pub_date: The publication date of the XML file.

  • <program>: A container element to group log patterns for an application or program. For example:

    <program name='su' id='480de478-d4a6-4a7f-bea4-0c0245d361e1'>

    A <patterndb> element may contain any number of <program> elements.

    • name: The name of the application. Note that the function of this attribute is to make the database more readable; syslog-ng uses the <pattern> element to identify the applications sending log messages.

    • id: A unique ID of the application, for example, the md5 sum of the name attribute.

    • pattern: The name of the application — syslog-ng matches this value to the $PROGRAM header of the syslog message to find the rulesets applicable to the syslog message. This element is also called program pattern. E.g.,

      <pattern>su</pattern>

    • description: OPTIONAL — A description of the ruleset or the application.

    • url: OPTIONAL — A URL referring to further information about the ruleset or the application.

    • <rules>: A container element for the rules of the ruleset.

      • <rule>: An element containing message patterns and how a message that matches these patterns is classified. For example:

        <rule provider='balabit' id='f57196aa-75fd-11dd-9bba-001e6806451b' class='violation'>

        [Note] Note

        If the following characters appear in the message, they must be escaped in the rule as follows:

        • @: Use @@, e.g., user@@example.com

        • <: Use &lt;

        • >: Use &gt;

        • &: Use &amp;

        The <rules> element may contain any number of <rule> elements.

      • provider: The provider of the rule. This is used to distinguish who supplied the rule, i.e., whether it was created by BalaBit or added to the XML by a local user.

      • id: The globally unique ID of the rule.

      • class: The class of the rule — syslog-ng assigns this class to the messages matching a pattern of this rule.

      • <pattern>: A pattern describing a log message. This element is also called message pattern. For example:

        <pattern>+ ??? root-</pattern>

[Example] Example 8.35. A V1 pattern database containing a single rule

The following pattern database contains a single rule that matches log messages of the PF packet-filtering application. A sample log message looks like:

PF: DROP filter/INPUT IN=eth0 OUT= MAC=00:1A:4B:80:90:C9:00:1A:4B:80:90:C6 SRC=192.168.155.11 DST=192.168.155.1 LEN=60 TOS=0x10 PREC=0x00 TTL=64 ID=51939 DF PROTO=TCP SPT=34407 DPT=80 WINDOW=32792 RES=0x00 SYN URGP=0

The following is a simple pattern database containing a matching rule.

<patterndb version='1' pub_date='2009-04-17'>
    <program name='PF'>
        <pattern>PF</pattern>
            <rule id='1' class='pf'>
                <pattern>@STRING:PF.VERDICT@ @STRING:PF.CHAIN:/@ IN=@STRING:PF.IN_IFACE@ OUT= MAC=@STRING:PF.MAC::@ SRC=@IPv4:PF.SRC_IP@ DST=@IPv4:PF.DST_IP@ LEN=@NUMBER:PF.PKT_LEN@ TOS=@STRING:PF.TOS@ PREC=@STRING:PF.PREC@ TTL=@NUMBER:PF.TTL@ ID=@NUMBER:PF.ID@ DF PROTO=@STRING:PF.PROTO@ SPT=@NUMBER:PF.SRC_PORT@ DPT=@NUMBER:PF.DST_PORT@ WINDOW=@NUMBER:PF.TCP_WINDOW@ RES=@STRING:PF.RES@ SYN URGP=@NUMBER:PF.TCP_URGP@</pattern>
            </rule>
    </program>
</patterndb>

Note that the rule uses macros that refer to parts of the message. For example, you can use the $PF.DST_IP macro to refer to the destination IP address of the logged connection.
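
To check which macros a database makes available, the named parsers can be listed per rule. The following Python sketch does this for a shortened copy of the example above, using only the standard library. The macros_by_rule function and the regular expression are illustrative assumptions, not part of syslog-ng, and the expression covers only the @TYPE@, @TYPE:name@, and @TYPE:name:param@ forms described earlier.

```python
import re
import xml.etree.ElementTree as ET

# Shortened copy of the V1 example; only the first fields are kept.
PATTERNDB = """\
<patterndb version='1' pub_date='2009-04-17'>
    <program name='PF'>
        <pattern>PF</pattern>
            <rule id='1' class='pf'>
                <pattern>@STRING:PF.VERDICT@ @STRING:PF.CHAIN:/@ IN=@STRING:PF.IN_IFACE@ OUT= MAC=@STRING:PF.MAC::@ SRC=@IPv4:PF.SRC_IP@ DST=@IPv4:PF.DST_IP@</pattern>
            </rule>
    </program>
</patterndb>
"""

# Groups: parser type, optional name (may be empty), optional parameter.
PARSER_RE = re.compile(r"@([A-Za-z0-9]+)(?::([^:@]*))?(?::([^@]*))?@")


def macros_by_rule(xml_text):
    """Map each rule id to the parser names (macro names) its patterns define."""
    root = ET.fromstring(xml_text)
    result = {}
    for rule in root.iter("rule"):
        names = []
        # Only <pattern> elements below a <rule> are message patterns.
        for pattern in rule.iter("pattern"):
            # Drop escaped literal @ characters (@@) before scanning.
            text = (pattern.text or "").replace("@@", "")
            names += [m.group(2) for m in PARSER_RE.finditer(text) if m.group(2)]
        result[rule.get("id")] = names
    return result


print(macros_by_rule(PATTERNDB))
```

Because the sketch looks for <pattern> elements anywhere below a <rule>, it does not depend on whether the patterns are wrapped in a container element.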

The following schema describes the V2 format of the pattern database. This format is used by the syslog-ng Store Box (SSB) appliance version 1.0.x (see http://www.balabit.com/network-security/syslog-ng/log-server-appliance/ for details).

For a sample database containing only a single pattern, see Example 8.36, “A V2 pattern database containing a single rule”.

  • <patterndb>: The container element of the pattern database. For example:

    <patterndb version='2' pub_date='2008-08-25'>

    • version: The schema version of the pattern database. The current version is 2.

    • pub_date: The publication date of the XML file.

  • <ruleset>: A container element to group log patterns for an application or program. For example:

    <ruleset name='su' id='480de478-d4a6-4a7f-bea4-0c0245d361e1'>

    A <patterndb> element may contain any number of <ruleset> elements.

    • name: The name of the application. Note that the function of this attribute is to make the database more readable; syslog-ng uses the <pattern> element to identify the applications sending log messages.

    • id: A unique ID of the application, for example, the md5 sum of the name attribute.

    • description: OPTIONAL — A description of the ruleset or the application.

    • url: OPTIONAL — A URL referring to further information about the ruleset or the application.

    • pattern: The name of the application — syslog-ng matches this value to the $PROGRAM header of the syslog message to find the rulesets applicable to the syslog message. This element is also called program pattern. E.g.,

      <pattern>su</pattern>

      [Note] Note

      If the <pattern> element of a ruleset is not specified, syslog-ng will use this ruleset as a fallback ruleset: it will apply the ruleset to messages that have an empty PROGRAM header, or if none of the program patterns matched the PROGRAM header of the incoming message.

    • <rules>: A container element for the rules of the ruleset.

      • <rule>: An element containing message patterns and how a message that matches these patterns is classified. For example:

        <rule provider='balabit' id='f57196aa-75fd-11dd-9bba-001e6806451b' class='violation'>

        [Note] Note

        If the following characters appear in the message, they must be escaped in the rule as follows:

        • @: Use @@, e.g., user@@example.com

        • <: Use &lt;

        • >: Use &gt;

        • &: Use &amp;

        The <rules> element may contain any number of <rule> elements.

      • provider: The provider of the rule. This is used to distinguish who supplied the rule, i.e., whether it was created by BalaBit or added to the XML by a local user.

      • id: The globally unique ID of the rule.

      • class: The class of the rule — syslog-ng assigns this class to the messages matching a pattern of this rule.

      • <patterns>: An element containing the patterns of the rule. If a <patterns> element contains multiple <pattern> elements, the class of the <rule> is assigned to every syslog message matching any of the patterns.

        • <pattern>: A pattern describing a log message. This element is also called message pattern. For example:

          <pattern>+ ??? root-</pattern>

        • description: OPTIONAL — A description of the pattern or the log message matching the pattern.

        • urls: OPTIONAL — An element containing one or more URLs referring to further information about the patterns or the matching log messages.

          • url: OPTIONAL — A URL referring to further information about the patterns or the matching log messages.

        • tags: OPTIONAL — An element containing custom keywords (tags) about the rules. The tags can be used to label specific events (e.g., user logons).

          • tag: OPTIONAL — A keyword (tag) applied to messages matching the rule. For example:

            <tags><tag>UserLogin</tag></tags>
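
The escaping rules listed in the note above can be applied mechanically when literal message text is copied into a <pattern> element. The following Python sketch shows one way to do this with the standard library. The escape_literal function is a hypothetical helper, and it should be applied only to the literal parts of a pattern, never to the parsers themselves.

```python
from xml.sax.saxutils import escape


def escape_literal(text):
    """Escape literal message text for use inside a <pattern> element."""
    # Double every @ first so that it is not read as the start of a parser,
    # then let the standard library replace &, < and > with XML entities.
    return escape(text.replace("@", "@@"))


print(escape_literal("user@example.com"))  # user@@example.com
print(escape_literal("a < b & c > d"))     # a &lt; b &amp; c &gt; d
```
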
[Example] Example 8.36. A V2 pattern database containing a single rule

The following pattern database contains a single rule that matches a log message of the ssh application. A sample log message looks like:

Accepted password for sampleuser from 10.50.0.247 port 42156 ssh2

The following is a simple pattern database containing a matching rule.

<patterndb version='2' pub_date='2009-04-17'>
    <ruleset name='ssh' id='123456678'>
        <pattern>ssh</pattern>
            <rules>
                <rule provider='me' id='182437592347598' class='system'>
                    <patterns>
                        <pattern>Accepted @ESTRING:SSH.AUTH_METHOD: @for @ESTRING:SSH_USERNAME: @from @ESTRING:SSH_CLIENT_ADDRESS: @port @NUMBER:SSH_PORT_NUMBER@ ssh2</pattern>
                    </patterns>
                </rule>
            </rules>
    </ruleset>
</patterndb>

Note that the rule uses macros that refer to parts of the message. For example, you can use the $SSH_USERNAME macro to refer to the username used in the connection.
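
A database with this structure can also be generated by a script instead of being written by hand. The following Python sketch builds a V2 database containing one ruleset and one rule with the standard xml.etree.ElementTree module. The build_patterndb function, its arguments, the 'local' provider value, and the message pattern in the usage lines are assumptions made for illustration. The ruleset id is derived as the md5 sum of the name, as suggested in the description of the id attribute.

```python
import hashlib
import xml.etree.ElementTree as ET


def build_patterndb(program, rule_id, rule_class, message_patterns, pub_date):
    """Return a V2 pattern database (as text) with one ruleset and one rule."""
    root = ET.Element("patterndb", version="2", pub_date=pub_date)
    ruleset_id = hashlib.md5(program.encode("utf-8")).hexdigest()
    ruleset = ET.SubElement(root, "ruleset", name=program, id=ruleset_id)
    # Program pattern: matched against the PROGRAM field of the message.
    ET.SubElement(ruleset, "pattern").text = program
    rules = ET.SubElement(ruleset, "rules")
    # 'class' is a Python keyword, so the attributes are passed as a dict.
    rule = ET.SubElement(
        rules, "rule", {"provider": "local", "id": rule_id, "class": rule_class}
    )
    patterns = ET.SubElement(rule, "patterns")
    for text in message_patterns:
        # ElementTree escapes &, < and > on output; literal @ characters
        # must already be written as @@ by the caller.
        ET.SubElement(patterns, "pattern").text = text
    return ET.tostring(root, encoding="unicode")


# One possible pattern for the sample ssh message (an assumption).
print(build_patterndb(
    "ssh", "182437592347598", "system",
    ["Accepted @ESTRING:SSH_AUTH_METHOD: @for @ESTRING:SSH_USERNAME: @"
     "from @IPv4:SSH_CLIENT_ADDRESS@ port @NUMBER:SSH_PORT_NUMBER@ ssh2"],
    "2009-04-17",
))
```

Generating the file this way leaves the XML escaping to the library, so only the @@ escaping has to be handled separately.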


© 2007-2010 BalaBit IT Security
Please send your comments or documentation bugs to: documentation@balabit.com