Open Source Solutions for Cybersecurity Log Management


In network forensics and cybersecurity monitoring, centralized event log management is crucial: events from many nodes are aggregated onto a few server instances, where they can be analyzed together.

Security incidents can be detected in real time by applying event correlation and other advanced monitoring techniques to incoming events. They can also be detected offline through forensic analysis of past events. Without central event collection, both monitoring and forensic activities would have to be performed on each network node individually.

If, for example, a security incident affects hundreds of network nodes, the event logs of all of them have to be analyzed. In addition, an attacker who wants to remove traces of their malicious activity can delete events from the local event log. Logging all events both locally and centrally ensures that the central server always keeps a copy of the original log, so forensic investigation remains possible even if the local log is lost.

These factors have brought a variety of commercial and open source solutions for collecting and analyzing events centrally to the market. Some of these solutions are intended only for log management, while others are complete SIEM frameworks.

A typical security information and event management (SIEM) system architecture

The nodes of the IT system use IETF syslog to send events to the log collector on the central log server. The collector filters and normalizes the events using various techniques and stores the pre-processed events on a storage medium such as a database or flat files. Security personnel access the data through a GUI to search it, create reports, and perform other analyses.
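
The collector stage described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular product works. It assumes incoming syslog lines start with a <PRI> field, as both BSD and IETF syslog messages do. It uses an in-memory SQLite table as the storage medium and treats "filtering" as dropping debug-level events. The function names normalize and collect, the max_severity threshold, and the table layout are all hypothetical.

```python
import re
import sqlite3

# Hypothetical collector sketch: a syslog line starts with "<PRI>";
# everything after it is kept unparsed as the message.
PRI_RE = re.compile(r"^<(\d{1,3})>(.*)$")


def normalize(raw: str, source_host: str):
    """Turn one raw syslog line into a normalized record, or None if it has no <PRI>."""
    m = PRI_RE.match(raw)
    if m is None:
        return None
    facility, severity = divmod(int(m.group(1)), 8)
    return {"host": source_host, "facility": facility,
            "severity": severity, "message": m.group(2)}


def collect(events, db, max_severity=6):
    """Filter, normalize and store (source_host, raw_line) pairs.

    Records less important than max_severity (numerically greater, e.g. 7 = debug)
    are dropped. Returns the number of stored records.
    """
    stored = 0
    for source_host, raw in events:
        record = normalize(raw, source_host)
        if record is None or record["severity"] > max_severity:
            continue
        db.execute("INSERT INTO events VALUES (:host, :facility, :severity, :message)",
                   record)
        stored += 1
    db.commit()
    return stored


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (host TEXT, facility INTEGER, severity INTEGER, message TEXT)")
incoming = [
    ("10.1.1.2", "<28>Nov 17 21:33:59 myhost2 ids[1299]: port scan from 192.168.1.102"),
    ("10.1.1.3", "<31>Nov 17 21:34:02 myhost3 app[42]: cache statistics"),
]
print(collect(incoming, db))  # 1: the severity-7 (debug) line is filtered out
print(db.execute("SELECT host, severity FROM events").fetchall())  # [('10.1.1.2', 4)]
```

A real collector would receive these lines from the network and would usually parse far more of each message. Keeping normalization and storage in separate functions mirrors the modular architecture described in this post.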

Independent organizations such as Gartner regularly evaluate and compare commercial log management and SIEM solutions.

A scatter plot evaluating and comparing commercial log management solutions available on the market.

For open source tools, especially newly developed ones, such comparisons are often hard to find. The advantage of open source tools for monitoring and forensics is that they make it possible to build incident detection and analysis frameworks cost-effectively.

Recently introduced open source solutions have contributed to a new architectural trend in which a log management system is divided into independent modules that can be interchanged through well-defined interfaces and protocols.

This post gives an overview of open source log management solutions for cybersecurity, including new technologies and solutions that have emerged over the last 2-3 years.

BSD syslog protocol

Until the 1980s, events were commonly logged to a local file system or some other local storage medium. In the 1980s, Eric Allman developed the BSD syslog protocol for collecting event logs. It was initially used by Sendmail but was gradually adopted by most vendors.

In the BSD syslog protocol, messages are encapsulated in UDP packets (sent to port 514 by default) and transmitted over the network. Each message is classified by facility and severity. The facility indicates what type of program or subsystem sent the message, and the severity indicates how important the message is.

The BSD protocol defines 24 facility values (0 to 23) and 8 severity values (0 to 7). For example, facility values 0, 2, and 3 denote the operating system kernel, the mail system, and system daemons, respectively. Severity values 7 and 4 denote debug and warning messages. Textual keywords are commonly used instead of numbers: for example, mail stands for facility 2 and warning for severity 4.

BSD syslog message payloads should have the format <Priority>Timestamp Hostname MSG, where Priority is a single number computed as (8 * facility_value) + severity_value. The following example is a warning (severity 4) from the daemon facility (3), so its priority is 8 * 3 + 4 = 28. It was issued by the program ids, with process ID 1299, on node myhost2 at 21:33:59 on November 17:

<28>Nov 17 21:33:59 myhost2 ids[1299]: port scan from 192.168.1.102
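
The priority arithmetic can be checked with a small Python sketch. The numeric values follow the BSD syslog numbering described above. The dictionaries list only a few facilities and use the keyword spellings common in syslog configuration files. The helper names encode_pri and decode_pri are illustrative, not part of any library.

```python
# Illustrative subset of facility and severity keywords (BSD syslog numbering).
FACILITIES = {"kern": 0, "user": 1, "mail": 2, "daemon": 3, "auth": 4}
SEVERITIES = {"emerg": 0, "alert": 1, "crit": 2, "err": 3,
              "warning": 4, "notice": 5, "info": 6, "debug": 7}


def encode_pri(facility: int, severity: int) -> int:
    """Priority = 8 * facility + severity."""
    if not (0 <= facility <= 23 and 0 <= severity <= 7):
        raise ValueError("facility must be 0-23 and severity 0-7")
    return 8 * facility + severity


def decode_pri(pri: int) -> tuple:
    """Split a priority value (0-191) back into (facility, severity)."""
    if not 0 <= pri <= 191:
        raise ValueError("priority must be 0-191")
    return divmod(pri, 8)


pri = encode_pri(FACILITIES["daemon"], SEVERITIES["warning"])
print(pri)              # 28
print(decode_pri(pri))  # (3, 4)
```

Because the severity occupies the low three bits, a receiver can recover both values with a single divmod by 8.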

By convention, the alphanumeric characters at the beginning of the MSG field form the tag subfield, which names the sending program ("ids" in the example above). The rest of the message is the content ("[1299]: port scan from 192.168.1.102" in the example above). The tag is often followed by a number in square brackets, which is the process ID of the sending program (1299 in the example above). Some syslog servers treat the process ID, together with the brackets and the colon, as part of the tag field.
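
A receiver might split such a message into its fields with a regular expression. The Python sketch below is a permissive approximation of the format, not a complete parser. The tag character set is an assumption, and the optional [pid] and colon are separated from the content, following the server behavior just described. The function name parse_bsd is hypothetical.

```python
import re

# Permissive approximation of <PRI>TIMESTAMP HOSTNAME TAG[PID]: CONTENT.
# The timestamp is "Mmm dd hh:mm:ss", with the day padded by a space when < 10.
BSD_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<hostname>\S+) "
    r"(?P<tag>[A-Za-z0-9_.\-/]+)"   # program name (assumed character set)
    r"(?:\[(?P<pid>\d+)\])?:? ?"     # optional [pid], colon and space
    r"(?P<content>.*)$"
)


def parse_bsd(line: str) -> dict:
    """Split a BSD syslog line into named fields; raise ValueError if it does not match."""
    m = BSD_RE.match(line)
    if m is None:
        raise ValueError(f"not a BSD syslog message: {line!r}")
    fields = m.groupdict()
    fields["facility"], fields["severity"] = divmod(int(fields.pop("pri")), 8)
    if fields["pid"] is not None:
        fields["pid"] = int(fields["pid"])
    return fields


fields = parse_bsd("<28>Nov 17 21:33:59 myhost2 ids[1299]: port scan from 192.168.1.102")
print(fields["tag"], fields["pid"], fields["severity"])  # ids 1299 4
```

In practice, messages from misconfigured senders often deviate from this layout, which is one reason structured formats were proposed later.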

BSD syslog is a lightweight and efficient protocol: its UDP-based design requires very little network bandwidth and few system resources. However, the protocol has a number of drawbacks:

  • Reliable message transmission over TCP is not supported.
  • Encryption and authentication are not supported.
  • Timestamps are imprecise because they lack the year, the time zone, and fractions of a second.
  • The MSG field has no structure beyond the sender's program name and process ID.
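
The timestamp drawback means the receiver has to guess the missing year. The Python sketch below shows one common heuristic, which is an assumption here rather than part of the protocol. It assumes the current year unless that would place the event more than a day in the future, in which case it uses the previous year. The function name infer_timestamp is hypothetical.

```python
from datetime import datetime, timedelta


def infer_timestamp(bsd_ts: str, now: datetime) -> datetime:
    """Attach a guessed year to a BSD syslog timestamp such as "Nov 17 21:33:59".

    Heuristic (an assumption): use now.year unless the result lies more than one
    day in the future, then use the previous year. The result is naive (no tzinfo)
    because the sender's time zone is unknown. Leap-day edge cases are not handled.
    """
    ts = " ".join(bsd_ts.split())  # collapse the space-padded day, e.g. "Nov  7"
    candidate = datetime.strptime(f"{now.year} {ts}", "%Y %b %d %H:%M:%S")
    if candidate - now > timedelta(days=1):
        candidate = candidate.replace(year=now.year - 1)
    return candidate


print(infer_timestamp("Nov 17 21:33:59", now=datetime(2021, 11, 18, 9, 0)))
# 2021-11-17 21:33:59
```

A message received on January 1 that carries a December 31 timestamp is a typical case where the guess has to fall back to the previous year. Fractions of a second and time zone information cannot be recovered at all.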

A solution to the first problem was proposed in the 2000s: a sequence of messages in BSD syslog format is transmitted over a TCP connection, with a newline character (ASCII 10) as the separator. This approach can also be combined with utilities such as Stunnel to set up secure tunnels for log transmission, which addresses the second problem.
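
TCP delivers a byte stream rather than separate messages. A receiver using newline framing must therefore buffer partial data until the separator arrives. The Python sketch below shows this on both sides without opening a real socket. The class and function names are hypothetical.

```python
class NewlineDeframer:
    """Split a TCP byte stream into syslog messages separated by LF (ASCII 10).

    One recv() may return several messages or only part of one, so any trailing
    incomplete data is kept in a buffer until the next newline arrives.
    """

    def __init__(self) -> None:
        self._buffer = b""

    def feed(self, chunk: bytes) -> list:
        self._buffer += chunk
        *complete, self._buffer = self._buffer.split(b"\n")
        return [line.decode("utf-8", errors="replace") for line in complete if line]


def frame(messages: list) -> bytes:
    """Sender side: terminate every message with a newline."""
    return b"".join(m.encode("utf-8") + b"\n" for m in messages)


stream = frame(["<28>Nov 17 21:33:59 myhost2 ids[1299]: port scan from 192.168.1.102",
                "<13>Nov 17 21:34:00 myhost2 app: hello"])
deframer = NewlineDeframer()
print(deframer.feed(stream[:30]))  # [] because no complete message has arrived yet
print(deframer.feed(stream[30:]))  # both messages
```

A message that itself contains a newline would be split in two. This weakness of newline framing is why RFC 6587 also describes an octet-counting method.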

IETF syslog protocol

The drawbacks of BSD syslog led to the IETF syslog protocol, proposed in 2009 (RFC 5424). Syslog messages can be transmitted securely over TLS (RFC 5425), but unencrypted transmission over UDP (RFC 5426) is also possible. In addition, the new message format uses RFC 3339 timestamps and structured data blocks. This is an example of the new message format:

<28>1 2021-11-17T12:33:59.223+02:00 myhost2 ids 1299 - [timeQuality tzKnown="1" isSynced="1"][origin ip="10.1.1.2"] port scan from 192.168.1.102

<28> is the priority and is immediately followed by the version number 1 (the current version). The next header fields give the timestamp, hostname (myhost2), application name (ids), and process ID (1299), and the final "-" marks an empty message ID. The sender also passes two structured data blocks to the receiver: [timeQuality tzKnown="1" isSynced="1"] and [origin ip="10.1.1.2"]. The first indicates that the sender knows its time zone and that its clock is synchronized to a trusted external time source. The second gives the sender's IP address.
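
The Python sketch below assembles the example message from its parts. It shows the RFC 3339 timestamp, the "-" placeholder (NILVALUE) for an empty field, and the escaping that structured data values need. timeQuality and origin are SD-IDs defined in RFC 5424. The helper names sd_element and build_rfc5424 are hypothetical, and the optional UTF-8 byte order mark before MSG is omitted.

```python
from datetime import datetime, timedelta, timezone


def sd_element(sd_id: str, params: dict) -> str:
    """Format one structured data element; backslash, '"' and ']' in values are escaped."""
    def escape(value: str) -> str:
        return value.replace("\\", "\\\\").replace('"', '\\"').replace("]", "\\]")
    body = "".join(f' {name}="{escape(value)}"' for name, value in params.items())
    return f"[{sd_id}{body}]"


def build_rfc5424(pri, timestamp, hostname, app_name, procid, msgid, sd_elements, msg):
    """Join header, structured data and MSG with spaces; "-" stands for an empty field."""
    header = f"<{pri}>1 {timestamp} {hostname} {app_name} {procid} {msgid or '-'}"
    structured_data = "".join(sd_elements) or "-"
    return f"{header} {structured_data} {msg}"


# RFC 3339 timestamp with milliseconds and a UTC offset of +02:00.
ts = datetime(2021, 11, 17, 12, 33, 59, 223000,
              tzinfo=timezone(timedelta(hours=2))).isoformat(timespec="milliseconds")
line = build_rfc5424(
    28, ts, "myhost2", "ids", "1299", None,
    [sd_element("timeQuality", {"tzKnown": "1", "isSynced": "1"}),
     sd_element("origin", {"ip": "10.1.1.2"})],
    "port scan from 192.168.1.102",
)
print(line)
```

Unlike the BSD format, every header field here has a fixed position, so a receiver can split the header on spaces without guessing.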

CEE syslog format

The Common Event Expression (CEE) initiative is also working on a structured format for log messages. Besides JSON and XML representations of events, CEE proposes that JSON-formatted events be transported over the BSD and IETF syslog protocols. The following is a sample JSON event encapsulated in a BSD syslog message:

<28>Nov 17 21:33:59 myhost2 ids[1299]: @cee:{"pname":"ids","pid":1299,"msg":"port scan from 192.168.1.102","action":"portscan","dst":"192.168.1.102"}

When the sender and receiver agree on such a key-value structure, the receiver can evaluate the message much more easily. JSON events also remain compatible with the BSD and IETF syslog protocols because they are carried in the MSG field, which has no predefined structure.
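
On the receiving side, extracting the key-value pairs amounts to locating the "@cee:" cookie and decoding the JSON that follows it. The Python sketch below does this with the standard json module. It assumes the cookie appears only once in the line. The function name extract_cee is hypothetical.

```python
import json

CEE_COOKIE = "@cee:"


def extract_cee(syslog_line: str):
    """Return the JSON object following the "@cee:" cookie, or None if there is none.

    Raises json.JSONDecodeError if the text after the cookie is not valid JSON.
    """
    start = syslog_line.find(CEE_COOKIE)
    if start == -1:
        return None
    return json.loads(syslog_line[start + len(CEE_COOKIE):])


line = ('<28>Nov 17 21:33:59 myhost2 ids[1299]: @cee:{"pname":"ids","pid":1299,'
        '"msg":"port scan from 192.168.1.102","action":"portscan","dst":"192.168.1.102"}')
event = extract_cee(line)
print(event["action"], event["dst"])  # portscan 192.168.1.102
```

Lines without the cookie return None and can fall back to plain-text parsing, so both kinds of message can share one syslog channel.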

Other log recording protocols

Other protocols besides the ones mentioned above may also be used for event logging. SNMP, introduced in the 1980s, is a well-established UDP-based monitoring and management protocol. Although SNMP is primarily designed for fault and performance management of large networks, security-related information is sometimes transmitted in SNMP messages (traps or notifications). Net-SNMP and SNMPTT are open source projects that can receive SNMP notifications and store them in files.

Log data is often stored in Elasticsearch, which has been one of the most popular document-oriented database engines of the past couple of years. The Java-based Elasticsearch serves as the backend of several popular log management packages, including Graylog2, Logstash, and Kibana.
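
To give an idea of how parsed events end up in Elasticsearch, the Python sketch below builds a request body for the Elasticsearch _bulk API. That body is newline-delimited JSON with an action line before each document and a mandatory trailing newline. The sketch only builds the body and does not send it. It would normally be POSTed to the /_bulk endpoint with the Content-Type application/x-ndjson. The index name "syslog-events", the document field names, and the helper bulk_body are hypothetical.

```python
import json


def bulk_body(index: str, events: list) -> str:
    """Build an NDJSON request body for the Elasticsearch _bulk API (not sent here)."""
    lines = []
    for event in events:
        lines.append(json.dumps({"index": {"_index": index}}))  # action/metadata line
        lines.append(json.dumps(event))                          # the document itself
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline


# Hypothetical index name and document fields, for illustration only.
body = bulk_body("syslog-events", [
    {"host": "myhost2", "program": "ids", "pid": 1299, "severity": 4,
     "message": "port scan from 192.168.1.102"},
])
print(body)
```

Batching many events into one bulk request is usually much cheaper than indexing them one at a time, which matters at log-collection volumes.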