Log analysis is a process that gives visibility into the performance and health of IT infrastructure and application stacks through the review and interpretation of logs generated by networks, operating systems, applications, servers, and other hardware and software components.
Logs typically contain time-series data that is either streamed in real time by log collectors or stored for later review. Log analysis offers insight into system performance and can indicate possible problems such as security breaches or imminent hardware failure.
Since logs offer visibility into application performance and health, log analysis lets operations and development teams understand and remedy any performance issues that arise during the course of business operations.
Log analysis serves many important functions, including regulatory compliance, security, efficiency, high availability, capacity planning, and sales and marketing insight.
Some regulatory bodies require organizations to perform log file analysis in order to be certified as compliant with their regulations, and any organization that wants to improve its cybersecurity posture needs log analysis expertise to uncover and remediate cyber threats of all kinds. Requirements that log analysis helps meet include ISO/IEC 27002:2013, the code of practice for information security controls; PCI DSS v3.1, which covers the protection of credit card and other payment data; and NIST SP 800-137, which covers continuous monitoring for federal information systems and organizations.
Logs are time-series records of actions and activities generated by applications, networks, devices (including programmable and IoT devices), and operating systems. They are typically stored in a file or database, or sent to a dedicated application called a log collector for real-time analysis.
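To make the idea of logs as time-series records concrete, here is a minimal Python sketch that parses raw lines into structured, time-stamped records and orders them by time. The line layout, the `LINE_RE` pattern, the `LogRecord` type, and the `parse_line` helper are all hypothetical; real formats (syslog, JSON, vendor-specific layouts) differ and usually need their own parsers.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical line layout: "<ISO-8601 timestamp> <host> <source>[<pid>]: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<host>\S+)\s+(?P<source>[\w.-]+)(?:\[(?P<pid>\d+)\])?:\s+(?P<msg>.*)$"
)


@dataclass
class LogRecord:
    timestamp: datetime
    host: str
    source: str
    pid: Optional[int]
    message: str


def parse_line(line: str) -> Optional[LogRecord]:
    """Turn one raw line into a structured, time-stamped record; None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    try:
        ts = datetime.fromisoformat(m["ts"])
    except ValueError:
        return None  # first field is not a valid timestamp
    pid = int(m["pid"]) if m["pid"] else None
    return LogRecord(ts, m["host"], m["source"], pid, m["msg"])


raw = [
    "2024-05-01T10:15:02+00:00 web01 sshd[4121]: Failed password for root from 203.0.113.7",
    "2024-05-01T10:15:00+00:00 web01 cron[3977]: job backup started",
    "garbled line",
]
# Drop unparseable lines, then order the remaining records by time.
records = sorted(filter(None, map(parse_line, raw)), key=lambda r: r.timestamp)
for r in records:
    print(r.timestamp.isoformat(), r.host, r.source, r.message)
```

Lines that do not match are simply dropped here; a real pipeline would more likely route them to a separate queue for review rather than discard them.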
A log analyst's task is to interpret the full range of log data and messages in context, which requires normalizing the log data so that every source uses a common set of terms. This prevents the confusion that arises when one function reports ‘normal’ and another reports ‘green’ even though both mean that no action is required.
Generally, log data is collected, cleansed, and structured or normalized, and is then made available to experts who analyze it to detect patterns or uncover anomalies such as a cyber-attack or data exfiltration. Log file analysis delivers benefits in the following areas:
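A minimal sketch of that kind of vocabulary normalization, assuming two hypothetical sources (`app_a` and `app_b`) whose status words are known in advance. The mapping table and the `normalize_status` function are illustrative and not part of any particular product.

```python
# Hypothetical per-source vocabularies; real mappings depend on the products in use.
STATUS_MAP = {
    "app_a": {"normal": "OK", "warning": "WARN", "critical": "ERROR"},
    "app_b": {"green": "OK", "amber": "WARN", "red": "ERROR"},
}


def normalize_status(source: str, raw_status: str) -> str:
    """Map a source-specific status word onto the shared vocabulary (OK/WARN/ERROR)."""
    vocab = STATUS_MAP.get(source, {})
    # Anything not in the table is flagged rather than guessed.
    return vocab.get(raw_status.strip().lower(), "UNKNOWN")


events = [("app_a", "Normal"), ("app_b", "green"), ("app_b", "RED"), ("app_c", "ok")]
print([normalize_status(src, status) for src, status in events])
# ['OK', 'OK', 'ERROR', 'UNKNOWN']
```

Mapping unrecognized values to `UNKNOWN` instead of guessing makes gaps in the mapping table visible to the analyst.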
Compliance. Many governmental and regulatory bodies require organizations to demonstrate compliance with the many regulations that affect nearly every entity. Log file analysis can demonstrate that the organization is in fact meeting the mandates of HIPAA, PCI DSS, GDPR, or other regulations.
Security enhancements. As cybercrime becomes increasingly organized, the need for stronger countermeasures grows. Event log analysis supports proactive defensive measures and enables forensic examination after the fact if a breach or data loss does occur. Log analysis can use network monitoring data to uncover unauthorized access attempts and to verify that security operations and firewalls are optimally configured.
Efficiency. A log analysis framework helps improve efficiency across the organization. IT teams in every department can share a single log repository, and analyzing the organization’s log data can help spot errors or trends in every business unit and department, enabling rapid remediation.
High availability. Timely action based on information uncovered by log analysis can prevent an issue from causing downtime. This in turn helps the organization meet its business goals and helps the IT organization meet its commitments to provide services with a given uptime guarantee.
Avoiding over- or under-provisioning. Organizations must plan to meet peak demand, and log analysis can help project whether there is sufficient CPU, memory, disk, and network bandwidth to meet both current demand and projected trends. Overprovisioning wastes scarce IT budget, while under-provisioning can lead to service outages as organizations scramble to purchase additional resources or turn to cloud resources to absorb spikes in demand.
Sales and marketing effectiveness. By tracking metrics such as traffic volume and the pages that customers visit, log analysis can help sales and marketing professionals understand which programs are effective and what should be changed. Traffic patterns can also guide a redesign of the organization’s website so that users can more easily reach the most frequently accessed information.
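For capacity planning, one simple way to turn logged utilization into a projection is to fit a straight line to daily peak usage and estimate when it crosses a threshold. The sketch below uses ordinary least squares; the sample data, the 90 percent threshold, and the `days_until_threshold` helper are assumptions for illustration, and a linear fit will miss seasonality and sudden demand spikes.

```python
from typing import Optional, Sequence


def days_until_threshold(daily_peaks: Sequence[float], threshold: float) -> Optional[float]:
    """Fit usage = a + b*day by ordinary least squares and return the number of days
    after the last sample until the fitted line reaches `threshold`.
    Returns None if the trend is flat or falling."""
    n = len(daily_peaks)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = range(n)  # day index 0, 1, ..., n-1
    mean_x = (n - 1) / 2
    mean_y = sum(daily_peaks) / n
    # b = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_peaks))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing_day = (threshold - intercept) / slope
    # Already at or above the threshold -> 0 days remaining.
    return max(0.0, crossing_day - (n - 1))


# Hypothetical daily peak disk usage (percent) extracted from logs
peaks = [60.0, 62.0, 64.0, 66.0, 68.0]
print(days_until_threshold(peaks, 90.0))  # 11.0
```

With these samples the fitted slope is 2 percentage points per day and the intercept is 60, so the line reaches 90 percent on day 15, eleven days after the last sample.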
Here are some components of an effective log analysis system.
Normalization: Converting log data from different sources into a consistent format helps ensure that ‘apples to apples’ comparisons can be made, and that data can be centrally stored and indexed regardless of the log source.
Pattern recognition: Modern machine learning (ML) tools can be applied to uncover patterns in the log data that could point to anomalies, for instance by comparing messages against an external list to help determine whether a pattern conceals a threat. Pattern recognition can also filter out routine log entries so that analysis can focus on those that might indicate abnormalities of some kind.
Tagging and classification: Tagging entries with keywords and classifying them by type lets analysts apply filters that speed up the search for useful data. For example, all entries of class “LINUX” could be discarded when tracking a virus that attacks Windows servers.
Correlation: Analysts can combine logs from multiple sources to help decode an event that is not readily visible in any single log. This is particularly useful during and after cyber-attacks, when correlating logs from network devices, servers, firewalls, and storage systems can surface data relevant to the attack and reveal patterns that no single log shows.
Artificial intelligence: Artificial intelligence and machine learning (AI/ML) tools incorporated into modern log analysis systems can automatically recognize and discard or ignore log entries that do not help uncover anomalies or security breaches. Because this function, sometimes called “artificial ignorance”, knows which routine entries to expect, it also enables log analysis to send an alert when a scheduled routine event did not occur when it should have.
Structured: To give the most value, all log data should be kept in a central repository and structured so that it is understandable to both humans and machines. Thanks to advances in log analysis tools, much of the heavy lifting can be done automatically. Organizations should therefore practice full-stack logging across all system components to get the most complete view of activities and anomalies.
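The “artificial ignorance” idea described above can be sketched without any ML: discard entries that match known-routine patterns, and separately check that expected scheduled events actually appeared. In this minimal Python sketch the pattern list, the six-hourly backup schedule, the tolerance, and the `filter_routine` and `missing_runs` helpers are hypothetical; an ML-based system would learn what counts as routine rather than rely on a hand-written list.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, List

# Hypothetical patterns for entries judged routine; in practice such a list is built
# up iteratively by reviewing what remains after each filtering pass.
ROUTINE_PATTERNS = [
    re.compile(r"session opened for user \w+"),
    re.compile(r"health check OK"),
]


def filter_routine(messages: Iterable[str]) -> List[str]:
    """Drop messages matching any known-routine pattern; keep everything else for review."""
    return [m for m in messages if not any(p.search(m) for p in ROUTINE_PATTERNS)]


def missing_runs(seen: Iterable[datetime], start: datetime, end: datetime,
                 interval: timedelta, tolerance: timedelta) -> List[datetime]:
    """Return each expected run time in [start, end] with no logged run within `tolerance`."""
    seen = sorted(seen)
    missing = []
    expected = start
    while expected <= end:
        if not any(abs(t - expected) <= tolerance for t in seen):
            missing.append(expected)
        expected += interval
    return missing


msgs = [
    "sshd: session opened for user alice",
    "lb: health check OK",
    "kernel: disk I/O error on sda",
]
print(filter_routine(msgs))  # ['kernel: disk I/O error on sda']

day = datetime(2024, 5, 1)
# Hypothetical backup job logged at 00:02, 06:02 and 18:02; the 12:00 run never happened.
backups = [day + timedelta(hours=h, minutes=2) for h in (0, 6, 18)]
print(missing_runs(backups, day, day + timedelta(hours=18),
                   timedelta(hours=6), timedelta(minutes=10)))
# [datetime.datetime(2024, 5, 1, 12, 0)]
```

Keeping the two checks separate reflects the two halves of the idea: filtering shrinks what analysts must read, while the schedule check turns the silence of a routine event into an alert.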