2021, Applied Sciences
Logs record valuable data from different software and systems. Execution logs are widely available and helpful for the monitoring, examination, and understanding of complex applications. However, log files usually contain too many lines for a human to handle, so it is important to develop methods for processing logs by computer. Logs are usually unstructured, which is not conducive to automatic analysis; how to categorize logs and turn them into structured data automatically is therefore of great practical significance. In this paper, the LTmatch algorithm is proposed, which implements a log pattern extraction algorithm based on a weighted word matching rate. Compared with our previous work, this algorithm not only classifies logs according to the longest common subsequence (LCS) but also obtains and updates log templates in real time. In addition, the algorithm's pattern warehouse uses a fixed-depth tree to store log patterns, which optimizes the matching efficiency of log ...
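The abstract above describes classifying logs by longest common subsequence and refining templates as new lines arrive. A minimal sketch of that idea, assuming illustrative function names and a `<*>` wildcard convention (not the paper's actual implementation):

```python
# Hedged sketch: match a log line against a template via the longest common
# subsequence (LCS) of their tokens, marking variable positions with '<*>'.
# Names and the wildcard convention are illustrative assumptions.

def lcs(a, b):
    """Token-level LCS via the standard DP table, then backtracking."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = dp[i][j] + 1 if a[i] == b[j] else max(dp[i][j + 1], dp[i + 1][j])
    out, i, j = [], m, n
    while i and j:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1]); i -= 1; j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]

def merge_template(line_tokens, template_tokens):
    """Keep tokens shared with the template; generalize the rest to '<*>'."""
    common = set(lcs(line_tokens, template_tokens))
    return [t if t in common else "<*>" for t in line_tokens]

line = "Connection from 10.0.0.7 closed".split()
template = "Connection from <*> closed".split()
print(merge_template(line, template))  # ['Connection', 'from', '<*>', 'closed']
```

In a real parser, a new line would first be routed to the candidate template with the highest (here, weighted) match rate before merging, so templates are updated online rather than rebuilt from scratch.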
2021 IEEE/ACM 43rd International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER), 2021
Software log analysis helps to maintain the health of software solutions and ensure compliance and security. Existing software systems consist of heterogeneous components emitting logs in various formats. A typical solution is to unify the logs using manually built parsers, which is laborious. Instead, we explore the possibility of automating the parsing task by employing machine translation (MT). We create a tool that generates synthetic Apache log records, which we use to train recurrent-neural-network-based MT models. Evaluating the models on real-world logs shows that they can learn the Apache log format and parse individual log records. The median relative edit distance between an actual real-world log record and the MT prediction is less than or equal to 28%. Thus, we show that log parsing using an MT approach is promising.
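The evaluation metric above, "relative edit distance" between a real log record and the model's prediction, can be sketched as Levenshtein distance normalized by the reference length. The exact normalization is an assumption here, not necessarily the paper's definition:

```python
# Hedged sketch of a relative edit-distance metric for comparing a predicted
# parse against the actual log record. Normalizing by the reference length
# is an assumption; the paper may define the denominator differently.

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with a single rolling row."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

def relative_edit_distance(actual: str, predicted: str) -> float:
    """Edit distance scaled to [0, 1]-ish by the length of the actual record."""
    return edit_distance(actual, predicted) / max(len(actual), 1)

print(relative_edit_distance("GET /index.html 200", "GET /index.htm 200"))
```

A median of this value at or below 0.28 over a corpus would correspond to the "less than or equal to 28%" figure reported above.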
2019
Understanding a computer system’s or network’s behavior is essential for various tasks such as fault diagnosis, intrusion detection or performance analysis. A key source of information describing a system’s current state is log data. However, accessing this information for further analysis is often complicated. Usually, log data is available in the form of unstructured text lines, and there exists no common standard for the appearance of logs. Hence, log parsers are required to pre-process log lines and structure their information for further analysis. State-of-the-art log parsers still apply pre-defined lists of regular expressions, which are processed linearly and thus render online log analysis infeasible. Furthermore, defining log parsers manually is a cumbersome and time-consuming task. Therefore, in this paper we propose AECID-PG, a novel log parser generator. AECID-PG implements a density-based approach to automatically generate a tree-like parser, which reduces the complexity of ...
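The tree-like parser idea above can be illustrated by a prefix tree over tokens in which a position with too many distinct values collapses into a wildcard branch. The branching threshold and node layout here are illustrative assumptions, not AECID-PG's actual density criterion:

```python
# Hedged sketch of a density-style parser tree: fixed tokens become tree
# nodes; a token position whose branching factor exceeds a threshold is
# generalized into a single '<*>' wildcard branch. Thresholding details
# are illustrative assumptions.

def build_parser_tree(token_lists, depth=0, max_branching=2):
    if not any(len(t) > depth for t in token_lists):
        return None  # no line extends past this depth: leaf
    children = {}
    for toks in token_lists:
        if len(toks) > depth:
            children.setdefault(toks[depth], []).append(toks)
    if len(children) > max_branching:  # too "dense": generalize this position
        remaining = [t for t in token_lists if len(t) > depth]
        return {"<*>": build_parser_tree(remaining, depth + 1, max_branching)}
    return {tok: build_parser_tree(group, depth + 1, max_branching)
            for tok, group in children.items()}

logs = ["user alice logged in", "user bob logged in", "user carol logged in"]
tree = build_parser_tree([l.split() for l in logs])
print(tree)  # {'user': {'<*>': {'logged': {'in': None}}}}
```

Matching a new line then walks the tree once, which avoids the linear scan over regular-expression lists the abstract criticizes.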
UHD Journal of Science and Technology
In the past few years, software monitoring and log analysis have become very active topics because they support developers during software development, help identify problems with software systems, and address some security issues. A log file is a computer-generated data file that provides information on use patterns, activities, and processes occurring within an operating system, application, server, or other device. Traditional manual log inspection and analysis have become impractical, almost impossible, due to the unstructured nature of logs; to address this challenge, Machine Learning (ML) is regarded as a reliable solution for analyzing log files automatically. This survey explores the existing ML approaches and techniques used to analyze different log file types. It retrieves and presents the relevant studies from different scholarly databases, then delivers a detailed comparison among them. It also thoroughly reviews the ML techniques used in inspecting log files a...
Proceedings of the 26th Conference on Program Comprehension, 2018
Many software engineering activities process the events contained in log files. However, before performing any processing activity, it is necessary to parse the entries in a log file, to retrieve the actual events recorded in the log. Each event is denoted by a log message, which is composed of a fixed part, called the (event) template, that is the same for all occurrences of the same event type, and a variable part, which may vary with each event occurrence. The formats of log messages, in complex and evolving systems, have numerous variations, are typically not entirely known, and change on a frequent basis; therefore, they need to be identified automatically. The log message format identification problem deals with the identification of the different templates used in the messages of a log. Any solution to this problem has to generate templates that meet two main goals: generating templates that are not too general, so as to distinguish different events, but also not too specific, so as not to consider different occurrences of the same event as following different templates; however, these goals are conflicting. In this paper, we present the MoLFI approach, which recasts the log message identification problem as a multi-objective problem. MoLFI uses an evolutionary approach to solve this problem, by tailoring the NSGA-II algorithm to search the space of solutions for a Pareto optimal set of message templates. We have implemented MoLFI in a tool, which we have evaluated on six real-world datasets, containing log files with a number of entries ranging from 2K to 300K. The experimental results show that MoLFI extracts by far the highest number of correct log message templates, significantly outperforming two state-of-the-art approaches on all datasets.
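The two conflicting goals described above, templates that match many messages yet keep enough fixed tokens to distinguish events, can be expressed as a pair of objectives. The definitions below are illustrative assumptions, not MoLFI's exact fitness functions:

```python
# Hedged sketch of the two conflicting objectives for a log template:
# frequency (share of messages the template matches) versus specificity
# (share of fixed, non-wildcard tokens). Both definitions are illustrative
# assumptions, not MoLFI's actual fitness functions.

def matches(template, tokens):
    """A template matches a message of equal length, '<*>' matching anything."""
    return len(template) == len(tokens) and all(
        a == "<*>" or a == b for a, b in zip(template, tokens))

def objectives(template, log_tokens):
    freq = sum(matches(template, t) for t in log_tokens) / len(log_tokens)
    spec = sum(tok != "<*>" for tok in template) / len(template)
    return freq, spec

logs = [l.split() for l in
        ["open file a.txt", "open file b.txt", "close file a.txt"]]
print(objectives("open file <*>".split(), logs))  # (0.666..., 0.666...)
```

A multi-objective search such as NSGA-II then looks for a Pareto front over these two scores instead of collapsing them into a single weighted sum.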
International Journal of Computer Applications, 2014
The past decade saw an exponential rise in the amount of information available on the World Wide Web. Almost every business organization today uses web-based technology to serve its huge client base. Consequently, managing the large volume of data and mining pertinent content has become the need of the hour. This is where the field of big data analytics sows its seeds. The linchpin for this is the process of knowledge discovery. Analyzing server logs and other data footprints aggregated from clients can facilitate the building of a concrete knowledge base. Querying the knowledge base can help supplement business and other managerial decisions. The approach herein proposes a real-time, generalized alternative to log file management and analysis. It incorporates the development of a sustainable platform which would enable analysts to understand the essence of the available data.
2009
Enterprise systems implementations are often accompanied by changes in the business processes of the organizations in which they take place. However, not all the changes are desirable. In “vanilla” implementations it is possible that the newly operational business process requires many additional steps as “workarounds” of the system limitations, and is hence performed in an inefficient manner. Such inefficiencies are reflected in the event log of the system as recurring patterns of log entries.
2009
Supercomputers are prone to frequent faults that adversely affect their performance, reliability and functionality. System logs collected on these systems are a valuable resource of information about their operational status and health. However, their massive size, complexity, and lack of a standard format make it difficult to automatically extract information that can be used to improve system management. In this work we propose a novel method to succinctly represent the contents of supercomputing logs, by using textual clustering to automatically find the syntactic structures of log messages. This information is used to automatically classify messages into semantic groups via an online clustering algorithm. Further, we describe a methodology for using the temporal proximity between groups of log messages to identify correlated events in the system. We apply our proposed methods to two large, publicly available supercomputing logs and show that our technique features nearly perfect accuracy for online log classification and extracts meaningful structural and temporal message patterns that can be used to improve the accuracy of other log analysis techniques.
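The online clustering step described above can be sketched as a single pass in which each incoming message joins the first cluster whose representative shares enough same-position tokens, or else starts a new cluster. The similarity measure and threshold are illustrative assumptions:

```python
# Hedged sketch of online log clustering: each message joins the first
# cluster whose representative shares enough same-position tokens, otherwise
# it founds a new cluster. Similarity definition and threshold are
# illustrative assumptions, not the paper's algorithm.

def similarity(a, b):
    """Fraction of positions with identical tokens; 0 for unequal lengths."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)

def online_cluster(messages, threshold=0.5):
    clusters = []  # each cluster: (representative tokens, member messages)
    for msg in messages:
        toks = msg.split()
        for rep, members in clusters:
            if similarity(rep, toks) >= threshold:
                members.append(msg)
                break
        else:
            clusters.append((toks, [msg]))
    return clusters

logs = ["user alice logged in", "user bob logged in", "disk error on node 7"]
print([members for _, members in online_cluster(logs)])
```

A single pass like this keeps the method usable online, since each new message is compared only against cluster representatives rather than the full history.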
2001
The paper provides an overview of the current state of technology in the field of log file analysis and forms the basis of an ongoing PhD thesis. The first part covers some fundamental theory and summarizes the basic goals and techniques of log file analysis. It reveals that log file analysis is a neglected field of computer science: most available papers describe specific log analyzers, and only a few contain any general methodology. The second part contains three case studies to illustrate different applications of log file analysis. The examples were selected to show quite different approaches and goals of analysis, and thus they set up different requirements. An analysis of requirements follows in the next part, which discusses various criteria for a general analysis tool and also proposes some design suggestions. Finally, the last part outlines the design and implementation of a universal analyzer. Some features are presented in more detail, while others remain intentions or suggestions.
Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining - KDD '05, 2005
Traditional approaches to system management have largely been based on domain experts, through a knowledge acquisition process that translates domain knowledge into operating rules and policies. This is widely known and experienced to be a cumbersome, labor-intensive, and error-prone process; in addition, it is difficult to keep up with rapidly changing environments. In this paper, we describe our research efforts on establishing an integrated framework for mining system log files for automatic management. In particular, we apply text mining techniques to categorize messages in log files into common situations, improve categorization accuracy by considering the temporal characteristics of log messages, develop temporal mining techniques to discover the relationships between different events, and utilize visualization tools to evaluate and validate the interesting temporal patterns for system management.
This research paper provides an overview of the current state of log analysis in IT systems. The initial part covers some fundamental theory and summarizes the basic goals and techniques of system logging. Software systems have been evolving drastically, growing in scale and complexity, which leads to a flood of logs. Traditional manual log inspection and analysis have become impractical, almost impossible. Because logs are unstructured in nature, the first important step is to parse the text log messages into structured, meaningful data for further processing and analysis. Correlating diverse data and uncovering patterns and relationships in the data is the backbone of the Artificial Intelligence for IT Operations (AIOps) field. In this research paper, we present a comprehensive evaluation study on log events and on discovering the best association rules in logs, to better understand and gain more insight into log events. More specifically, we evaluate more than a hundred log events spanning distributed IT systems, hosts, customized services, and application servers. We report the pattern discovery results as association rules, which are of practical importance when investigating and troubleshooting system issues.
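Association-rule discovery over log events, as described above, can be sketched by counting co-occurrences of events within time windows and reporting rules that clear support and confidence thresholds. The window construction, event names, and thresholds below are illustrative assumptions:

```python
# Hedged sketch of association-rule discovery over log events: given sets of
# events observed in the same time window, report rules A -> B whose support
# (fraction of windows with both) and confidence (fraction of A-windows that
# also contain B) clear the thresholds. Event names and thresholds are
# illustrative assumptions.
from itertools import permutations

def rules(windows, min_support=0.5, min_conf=0.8):
    n = len(windows)
    events = set().union(*windows)
    out = []
    for a, b in permutations(events, 2):
        both = sum(a in w and b in w for w in windows)
        has_a = sum(a in w for w in windows)
        if has_a and both / n >= min_support and both / has_a >= min_conf:
            out.append((a, b, both / n, both / has_a))  # (A, B, support, conf)
    return out

windows = [{"disk_warn", "disk_fail"}, {"disk_warn", "disk_fail"},
           {"disk_warn"}, {"net_drop"}]
print(rules(windows, min_support=0.4, min_conf=0.6))
```

A rule such as `disk_fail -> disk_warn` with confidence 1.0 would be exactly the kind of pattern the abstract suggests using when troubleshooting system issues.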