Classification
The syslog-ng application can compare the contents of the log messages to a database of predefined message patterns. This can be used for many different tasks:
- real-time log message classification (identifying the type of a message)
- extracting important information from messages (for example, usernames or IP addresses that are not part of the message header)
- real-time event correlation (creating a single message from multiple related log messages)
Using patterndb™ requires patterns. These can be created with the help of:
- the guidelines in The syslog-ng Administrator Guide
- various blog posts
- samples from the patterndb™ project
Real-time log message classification
By comparing log messages to known patterns, syslog-ng can identify the exact type of each message and sort messages into message classes. A message class describes the type of event that the log message reports. The message classes can be customized, for example, to label messages as user login, application crash, or file transfer events.
In addition to classifying messages, you can also add tags, which can be used later to filter messages: for example, to collect messages tagged as user_login into a separate file, or to perform conditional post-processing on the tagged messages.
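As a rough illustration of tag-based post-processing, the Python sketch below groups already-classified messages by tag, much like writing each tag to its own file. The message dictionaries (with "text" and "tags" keys) and the "untagged" bucket are assumptions made for this example; in syslog-ng itself, this kind of routing is done with filters and destinations in the configuration, not with Python.

```python
from collections import defaultdict


def group_by_tag(messages):
    """Group message texts by tag, like sending each tag to a separate file.

    Each message is a hypothetical dict with a "text" string and a "tags" set,
    standing in for a message that patterndb has already classified.
    A message with several tags lands in several groups; a message without
    tags lands in the "untagged" group.
    """
    groups = defaultdict(list)
    for msg in messages:
        for tag in sorted(msg["tags"]) or ["untagged"]:
            groups[tag].append(msg["text"])
    return dict(groups)


messages = [
    {"text": "Accepted password for bazsi from 10.50.0.247 port 42156 ssh2",
     "tags": {"user_login"}},
    {"text": "kernel: eth0 link up", "tags": set()},
]
for tag, texts in group_by_tag(messages).items():
    print(tag, texts)
```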
The classification functionality of the pattern database was originally inspired by the logcheck project, but the syslog-ng approach has the following advantages:
- The syslog-ng patterns are much easier to write and maintain than the regular expressions used by logcheck.
- syslog-ng patterns are also much easier to read and understand than regular expressions.
- Pattern matching based on regular expressions is computationally very intensive, and the cost grows with the number of patterns, because each expression may have to be tried in turn. The syslog-ng solution works in real time and is independent of the number of patterns, so it scales much better.
To find the pattern that matches a particular message, syslog-ng performs a longest prefix match on a radix tree. syslog-ng builds a tree from the available patterns, where the characters that can appear at a given position in the patterns form the branches of the tree.
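The idea can be sketched in Python with a plain character trie (a radix tree without edge compression). This is a simplified illustration, not syslog-ng's implementation: it stores only literal prefixes (no parsers such as @NUMBER@), and the class names and message class labels are hypothetical. It does show why lookup cost depends on the length of the message rather than on how many patterns are loaded: the message is walked once, one branch per character.

```python
class TrieNode:
    """One node of a character trie; each outgoing edge is one character."""

    def __init__(self):
        self.children = {}  # next character -> TrieNode
        self.label = None   # message class if a stored prefix ends here


class PatternTrie:
    """Toy longest-prefix-match classifier over literal message prefixes."""

    def __init__(self):
        self.root = TrieNode()

    def add(self, prefix, label):
        """Store a literal prefix and the message class it identifies."""
        node = self.root
        for ch in prefix:
            node = node.children.setdefault(ch, TrieNode())
        node.label = label

    def classify(self, message):
        """Return the class of the longest stored prefix of message, or None."""
        node, best = self.root, self.root.label
        for ch in message:
            node = node.children.get(ch)
            if node is None:
                break  # no stored prefix continues with this character
            if node.label is not None:
                best = node.label  # deepest (longest) match seen so far
        return best


trie = PatternTrie()
trie.add("Accepted ", "ssh_login")
trie.add("Accepted password for ", "ssh_login_password")
trie.add("Failed password for ", "ssh_login_failed")

print(trie.classify("Accepted password for bazsi from 10.50.0.247 port 42156 ssh2"))
# → ssh_login_password
print(trie.classify("Accepted publickey for bazsi from 10.50.0.247 port 42156 ssh2"))
# → ssh_login
print(trie.classify("kernel: eth0 link up"))
# → None
```

The "publickey" message shares only the "Accepted " prefix with the stored patterns, so it falls back to the shorter, more general class.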
Examples
Both patterns below describe the following message:
Accepted password for bazsi from 10.50.0.247 port 42156 ssh2
A regular expression from the logcheck project that matches this message:
Accepted (gssapi(-with-mic|-keyex)?|rsa|dsa|password|publickey|keyboard-interactive/pam) for [^[:space:]]+ from [^[:space:]]+ port [0-9]+( (ssh|ssh2))?
A syslog-ng database pattern for this message:
Accepted @ESTRING:auth_method: @for @ESTRING:username: @from @ESTRING:client_addr: @port @NUMBER:port:@ ssh2
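The Python sketch below imitates, in a much simplified form, how a pattern like the one above turns the message into name-value pairs. It understands only two parser forms modelled on syslog-ng's @ESTRING:name:stopchars@ (everything up to the stop characters, which are consumed) and @NUMBER:name:@ (here: decimal digits only). The function name, the tokenizer, and the rule that an empty name discards the value are assumptions of this sketch, not syslog-ng's code.

```python
import re

# Simplified tokenizer: recognizes only @ESTRING:name:stop@ and @NUMBER:name:@.
PARSER_RE = re.compile(r"@(ESTRING|NUMBER):([^:@]*):([^@]*)@")
DIGITS_RE = re.compile(r"\d+")


def match_pattern(pattern, message):
    """Return a dict of name-value pairs if message matches pattern, else None."""
    values, pos, last = {}, 0, 0
    for m in PARSER_RE.finditer(pattern):
        # Literal text between the previous parser and this one must match exactly.
        literal = pattern[last:m.start()]
        if not message.startswith(literal, pos):
            return None
        pos += len(literal)
        kind, name, stop = m.groups()
        if kind == "ESTRING":
            end = message.find(stop, pos)
            if end == -1:  # stop characters never appear
                return None
            value, pos = message[pos:end], end + len(stop)  # stop is consumed
        else:  # NUMBER: one or more decimal digits
            digits = DIGITS_RE.match(message, pos)
            if digits is None:
                return None
            value, pos = digits.group(), digits.end()
        if name:  # empty name: parse but discard the value
            values[name] = value
        last = m.end()
    # Whatever follows the last parser must match the rest of the message.
    return values if message[pos:] == pattern[last:] else None


pattern = ("Accepted @ESTRING:auth_method: @for @ESTRING:username: @"
           "from @ESTRING:client_addr: @port @NUMBER:port:@ ssh2")
print(match_pattern(pattern, "Accepted password for bazsi from 10.50.0.247 port 42156 ssh2"))
# → {'auth_method': 'password', 'username': 'bazsi', 'client_addr': '10.50.0.247', 'port': '42156'}
```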
Extracting important information from messages
Patterns can also extract important information from log messages and turn it into name-value pairs. These can be used for many different tasks: removing sensitive information from log files, creating files or database tables dynamically, and so on.
Name-value pairs can also help to standardize log information. BalaBit's patterndb™ project is a step in this direction: it extracts important information from logs using patterns and presents it in standardized fields and tags.
There is a much larger, vendor-neutral effort to standardize log events, called CEE. BalaBit is a board member and plans to use CEE in patterns once the new standard is available. Until then, we continue to use our own schema for patterns. For more information about CEE, see http://cee.mitre.org/
Examples
In the previous example, “username” is the name that receives the value of the authenticated username of the given ssh session. This value can be used to create a separate log file for each user, rewritten to anonymize logs, used to tag administrative users differently, and so on.
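Two of these uses can be sketched in Python, assuming the name-value pairs are already available as a dict (for example, {"username": "bazsi"}). The base directory, the function names, and the keyed-hash pseudonym scheme are all hypothetical choices for this example; in syslog-ng itself, such tasks are handled in the configuration (templates, rewrite rules), not in Python.

```python
import hashlib
import hmac


def per_user_path(values, base_dir="/var/log/users"):
    """Build a per-user log file path from the extracted "username" value."""
    # Replace "/" so a crafted username cannot point outside base_dir.
    safe_name = values["username"].replace("/", "_")
    return f"{base_dir}/{safe_name}.log"


def anonymize(message, values, secret=b"change-me"):
    """Replace the extracted username with a stable, keyed pseudonym.

    The same username always maps to the same pseudonym for a given secret,
    so related messages stay linkable without revealing who logged in.
    Naive on purpose: every occurrence of the username string is replaced.
    """
    username = values["username"]
    digest = hmac.new(secret, username.encode(), hashlib.sha256).hexdigest()
    return message.replace(username, "user-" + digest[:8])


values = {"username": "bazsi"}
message = "Accepted password for bazsi from 10.50.0.247 port 42156 ssh2"
print(per_user_path(values))  # → /var/log/users/bazsi.log
print(anonymize(message, values))
```

A keyed hash (HMAC) is used rather than a plain hash so that someone without the secret cannot simply hash a list of known usernames and reverse the pseudonyms.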
Real-time event correlation
Recent versions of syslog-ng also make real-time event correlation possible. This is useful in many situations. For example, the important data for a single event is often scattered across multiple syslog messages. Similarly, login and logout events are often logged far apart from each other, even in different log files, which makes log analysis difficult. Using correlation, these can be collected into a single new message.
For details, see the in-depth article at http://lwn.net/Articles/424459/ or the documentation.
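To make the login/logout case concrete, the Python sketch below pairs each login event with its logout by a shared session key and emits one summary message per completed session. The event fields ("time", "type", "session", "user") and the summary format are hypothetical, and this is only a loose analogy to syslog-ng's own correlation, which is configured in the pattern database rather than written in Python.

```python
def correlate_sessions(events):
    """Merge each login event with its matching logout into one summary message.

    events: iterable of dicts with hypothetical keys "time" (seconds),
    "type" ("login" or "logout"), "session" and "user".
    Returns (summaries, still_open), where still_open maps session keys to
    login events whose logout has not been seen yet.
    """
    open_sessions = {}
    summaries = []
    for ev in sorted(events, key=lambda e: e["time"]):
        if ev["type"] == "login":
            open_sessions[ev["session"]] = ev
        elif ev["type"] == "logout" and ev["session"] in open_sessions:
            start = open_sessions.pop(ev["session"])
            duration = ev["time"] - start["time"]
            summaries.append(
                f"user={start['user']} session={ev['session']} "
                f"login={start['time']} logout={ev['time']} duration={duration}s"
            )
        # A logout without a known login is ignored in this sketch.
    return summaries, open_sessions


events = [
    {"time": 100, "type": "login", "session": "s1", "user": "bazsi"},
    {"time": 130, "type": "login", "session": "s2", "user": "alice"},
    {"time": 400, "type": "logout", "session": "s1", "user": "bazsi"},
]
summaries, still_open = correlate_sessions(events)
print(summaries)  # → ['user=bazsi session=s1 login=100 logout=400 duration=300s']
print(list(still_open))  # → ['s2']
```

A real correlator would also need a timeout to discard sessions whose logout never arrives; that is left out here to keep the example short.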
Patterndb™ docs and blogs
Many documents and blog posts help with writing patterns for syslog-ng. First of all, there is the detailed documentation:
http://www.balabit.com/sites/default/files/documents/syslog-ng-ose-v3.2-guide-admin-en.html/chapter-patterndb.html
There are also many related blog posts:
- http://bazsi.blogs.balabit.com/2011/02/article-on-message-correllation/
- http://bazsi.blogs.balabit.com/2010/11/patterndb-goes-cee/
- http://bazsi.blogs.balabit.com/2010/11/collecting-log-samples/
- http://czanik.blogs.balabit.com/2010/11/cee/
- http://czanik.blogs.balabit.com/2010/11/log-sample-collecting-project/
- http://czanik.blogs.balabit.com/2010/10/pattern-writing-tips-and-tricks/
- http://czanik.blogs.balabit.com/2010/10/pattern-writing-tips-and-tricks-ii/
Patterndb™ project
A good way to start writing patterns is to look at the patterns in the patterndb™ project at http://www.balabit.com/wiki/patterndb. It is currently based on BalaBit's own schema, but there are plans to convert it to the CEE standard once a stable version is released. The current version is available at http://git.balabit.hu/?p=bazsi/syslog-ng-patterndb.git
How to contribute?
Please send new patterns, or fixes for existing ones, to the syslog-ng mailing list (https://lists.balabit.hu/mailman/listinfo/syslog-ng). Once we move to CEE, these patterns will be converted as part of that effort.