15.4. HTTP indexer configuration format

This section describes the configuration format and options of the HTTP indexer (that is, how and which fields of the HTTP audit trails are indexed). For details on how to customize HTTP indexing, see Procedure 15.2.7, Customizing the indexing of HTTP traffic.

Note

If you want to index HTTP POST messages, include the "application/x-www-form-urlencoded" Content-Type in the General > Whitelist list. The indexer decodes URL encoding (percent-encoding) and creates key=value pairs from the form fields and their values. Note that in the values, the indexer replaces whitespace with the underscore (_) character. To avoid indexing sensitive information (for example, passwords from login forms), use the Form > Blacklist option.

HTTP indexer configuration options

Option  Type  Description
General  Top level item

Determines which HTTP Content-Types are indexed. An HTTP message is indexed only if its Content-Type matches an entry in Whitelist and does not match any entry in Blacklist. For example:

"General": {
    "Whitelist": ["text/.*", ".*json.*", "multipart/.*", "application/x-www-form-urlencoded"],
    "Blacklist": ["text/css", "application/javascript", "text/xslt", ".*xml.*"]
},
  Whitelist list

The list of HTTP Content-Types to index. Every entry of the list is treated as a regular expression. For example:

"Whitelist": ["text/.*", ".*json.*", "multipart/.*", "application/x-www-form-urlencoded"],
  Blacklist list

The list of HTTP Content-Types that are not indexed. Every entry of the list is treated as a regular expression. For example:

"Blacklist": ["text/css", "application/javascript", "text/xslt", ".*xml.*"]
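
The Whitelist and Blacklist rules described above can be illustrated with a short Python sketch. It is not the indexer's implementation: the function name is_indexed is hypothetical, and the sketch assumes that each entry must match the entire media type and that parameters such as "; charset=UTF-8" are dropped before matching. The documentation states neither of these details.

```python
import re

# Values copied from the example configuration above.
GENERAL = {
    "Whitelist": ["text/.*", ".*json.*", "multipart/.*", "application/x-www-form-urlencoded"],
    "Blacklist": ["text/css", "application/javascript", "text/xslt", ".*xml.*"],
}


def is_indexed(content_type, general=GENERAL):
    """Return True if the Content-Type matches Whitelist and not Blacklist.

    Assumptions (not confirmed by the documentation): every entry is matched
    against the whole media type, and parameters after ';' are ignored.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    whitelisted = any(re.fullmatch(p, media_type) for p in general["Whitelist"])
    blacklisted = any(re.fullmatch(p, media_type) for p in general["Blacklist"])
    return whitelisted and not blacklisted
```

Under these assumptions, text/html and application/json are indexed, while text/css is not. A text/xml message matches text/.* in Whitelist but is still excluded, because it also matches .*xml.* in Blacklist.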
Form  Top level item

Determines which fields are indexed in HTTP POST messages. For example:

"Form": {
    "Blacklist": ["password", "pass"]
},
  Blacklist list

The list of fields that are not indexed in HTTP POST messages (for example, when submitting forms, such as login forms). Every entry of the list is treated as a regular expression. For example:

"Blacklist": ["password", "pass"]
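
The form handling described above (decoding URL encoding, building key=value pairs, replacing whitespace in values, and skipping blacklisted fields) can be sketched in Python as follows. This is only an illustration: the function name index_form is hypothetical, and the sketch assumes that a Blacklist entry must match the whole field name and that the resulting pairs are separated by spaces.

```python
import re
from urllib.parse import parse_qsl

# Values copied from the example configuration above.
FORM_BLACKLIST = ["password", "pass"]


def index_form(body, blacklist=FORM_BLACKLIST):
    """Turn a URL-encoded form body into space-separated key=value pairs."""
    pairs = []
    # parse_qsl decodes percent-encoding and '+' (space) in names and values.
    for name, value in parse_qsl(body, keep_blank_values=True):
        # Assumption: a field is skipped when an entry matches its whole name.
        if any(re.fullmatch(pattern, name) for pattern in blacklist):
            continue
        # Replace every whitespace character in the value with an underscore.
        value = re.sub(r"\s", "_", value)
        pairs.append(f"{name}={value}")
    return " ".join(pairs)
```

For example, index_form("user=John+Doe&password=s3cret&comment=hello%20world") returns "user=John_Doe comment=hello_world": the password field is dropped, and the spaces in the remaining values become underscores.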
Html  Top level item

Include this section in the configuration to process text/html messages. HTML tags are stripped from the text, and only their content is indexed (for example, <html><title>Title</title></html> becomes Title). For example:

"Html": {
    "Attributes": ["href", "name", "value", "title", "id", "src"],
    "StrippedTags": ["script", "object", "style", "noscript", "embed", "video", "audio", "canvas", "svg"]
}
  Attributes list

The list of HTML attributes that are extracted as key=value pairs and indexed. Note that in the values, the indexer replaces whitespace with the underscore (_) character and decodes URL encoding. For example:

"Attributes": ["href", "name", "value", "title", "id", "src"],

Note that the content attribute of the meta name="description", meta name="keywords", meta name="author", and meta name="application-name" tags is always indexed.

For example, if an audit trail contains the following HTML:

<head>
<meta name="description" content="Web page description">
<meta name="keywords" content="HTML,CSS,XML,JavaScript">
<meta name="author" content="Balabit SA">
<meta charset="UTF-8">
</head>

Then the index will contain the following text:

description=Web_page_description keywords=HTML,CSS,XML,JavaScript author=Balabit_SA
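
The meta tag example above can be reproduced with a minimal Python sketch based on the standard html.parser module. The class name MetaIndexer and the function name index_meta are hypothetical, and joining the pairs with spaces simply mirrors the index text shown above. The real indexer may work differently.

```python
import re
from html.parser import HTMLParser

# The meta names whose content attribute is always indexed (from the text above).
ALWAYS_INDEXED_META = {"description", "keywords", "author", "application-name"}


class MetaIndexer(HTMLParser):
    """Collect name=content pairs from the always-indexed meta tags."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name in ALWAYS_INDEXED_META and content is not None:
            # Replace whitespace in the value with underscores.
            content = re.sub(r"\s", "_", content)
            self.pairs.append(f"{name}={content}")


def index_meta(html):
    parser = MetaIndexer()
    parser.feed(html)
    parser.close()
    return " ".join(parser.pairs)
```

Feeding the HTML fragment above to index_meta yields the same three pairs; the meta charset tag is ignored because it has no name attribute.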
  StrippedTags list

The list of HTML tags that are not indexed. For example:

"StrippedTags": ["script", "object", "style", "noscript", "embed", "video", "audio", "canvas", "svg"]
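
How Attributes and StrippedTags could work together is sketched below in Python. This is a hedged illustration, not the indexer's code: the names HtmlIndexer and index_html are hypothetical, and the sketch assumes that everything inside a stripped tag (including its own attributes) is skipped and that the index entries are separated by spaces.

```python
import re
from html.parser import HTMLParser
from urllib.parse import unquote

# Values copied from the example configuration above.
HTML_CONFIG = {
    "Attributes": ["href", "name", "value", "title", "id", "src"],
    "StrippedTags": ["script", "object", "style", "noscript", "embed",
                     "video", "audio", "canvas", "svg"],
}
VOID_TAGS = {"embed"}  # stripped tags that never have an end tag


class HtmlIndexer(HTMLParser):
    def __init__(self, config=HTML_CONFIG):
        super().__init__()
        self.attributes = set(config["Attributes"])
        self.stripped = set(config["StrippedTags"])
        self.depth = 0    # greater than 0 while inside a stripped tag
        self.tokens = []  # text and key=value pairs, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.stripped:
            if tag not in VOID_TAGS:
                self.depth += 1
            return
        if self.depth:
            return
        for name, value in attrs:
            if name in self.attributes and value:
                # Decode URL encoding, then replace whitespace with '_'.
                value = re.sub(r"\s", "_", unquote(value))
                self.tokens.append(f"{name}={value}")

    def handle_endtag(self, tag):
        if tag in self.stripped and tag not in VOID_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Keep only text that is outside stripped tags.
        if not self.depth and data.strip():
            self.tokens.append(data.strip())


def index_html(html, config=HTML_CONFIG):
    parser = HtmlIndexer(config)
    parser.feed(html)
    parser.close()
    return " ".join(parser.tokens)
```

With this sketch, index_html('<html><title>Title</title></html>') returns "Title", and a page containing a link with href="/my%20page" followed by a script block contributes href=/my_page and the link text, while the script content is dropped.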