2009, Theoretical Computer Science
…
12 pages
In this work we study the size of Boyer-Moore automata introduced in Knuth, Morris & Pratt's famous paper on pattern matching. We experimentally show that a finite class of binary patterns produces very large Boyer-Moore automata, and we find one particular case which, we conjecture, generates automata of size Ω(m^6). Further experimental results suggest that the maximal size could be a polynomial O(m^7), or even an exponential O(2^{0.4m}), where m is the length of the pattern.
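To make "size" concrete, here is a minimal Python sketch that counts the reachable states of a Boyer-Moore automaton with full memory. It assumes a particular model: a state is the set of window positions whose text symbols are already known, the automaton always inspects the rightmost position not yet known, and after a mismatch or a complete match it shifts by the smallest amount consistent with everything it has read. The function name `bm_automaton_size` is hypothetical, and the scan order is an assumption; the automata studied in the paper may differ in these details.

```python
from collections import deque


def bm_automaton_size(p: str, alphabet: str) -> int:
    """Count reachable states of a full-memory Boyer-Moore automaton for p (p non-empty).

    Assumed model: a state is the set of window positions whose text symbols are
    already known. Every known symbol agrees with the current alignment of the
    pattern, so the set of positions alone determines the symbols.
    """
    m = len(p)
    full = frozenset(range(m))

    def shift(known: dict) -> frozenset:
        # Smallest s >= 1 such that the pattern, moved s positions to the right,
        # agrees with every text symbol observed so far in the window.
        for s in range(1, m + 1):
            if all(k - s < 0 or p[k - s] == c for k, c in known.items()):
                return frozenset(k - s for k in known if k - s >= 0)
        raise AssertionError("unreachable: s = m is always consistent")

    start = frozenset()
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        j = max(full - state)  # inspect the rightmost position not yet known
        for c in alphabet:
            observed = {k: p[k] for k in state}
            observed[j] = c
            if c != p[j] or len(observed) == m:
                # Mismatch, or a complete occurrence was found: slide the pattern.
                nxt = shift(observed)
            else:
                nxt = frozenset(observed)  # one more position is now known
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen)
```

Enumerating all binary patterns of a given length m and keeping the maximum of `bm_automaton_size(p, "ab")` gives the kind of experimental data the abstract describes, although the counts depend on the assumed scan strategy.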
1996
We study the size of Boyer-Moore automata introduced in Knuth, Morris & Pratt's famous paper on pattern matching. We experimentally exhibit a finite class of binary patterns that produce large Boyer-Moore automata. The best approximation curve for their sizes is a polynomial O(m^7), or even an exponential O(2^{0.4m}), in the length m of the patterns. All the previously known maximal sizes were at most cubic in m. Our results suggest studying two particular infinite classes of patterns, for which we conjecture that the generated automata have size Ω(m^5).
Lecture Notes in Computer Science, 2013
proved that every language L that is (m, n)-recognizable by a deterministic frequency automaton such that m > n/2 can be recognized by a deterministic finite automaton as well. First, the sizes of deterministic frequency automata and of deterministic finite automata recognizing the same language are compared. Then approximations of a language are considered, where a language L′ is called an approximation of a language L if L′ differs from L in only a finite number of strings. We prove that if a deterministic frequency automaton has k states and (m, n)-recognizes a language L, where m > n/2, then there is a language L′ approximating L such that L′ can be recognized by a deterministic finite automaton with no more than k states. Austinat et al. [2] also proved that every language L over a single-letter alphabet that is (1, n)-recognizable by a deterministic frequency automaton can be recognized by a deterministic finite automaton. For languages over a single-letter alphabet, we show that if a deterministic frequency automaton has k states and (1, n)-recognizes a language L, then there is a language L′ approximating L such that L′ can be recognized by a deterministic finite automaton with no more than k states. However, there are approximations for which our bound is much higher, namely k!.
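Under this definition, L′ approximates L exactly when their symmetric difference is finite. When both languages are regular and given by complete DFAs, this can be decided with a standard product construction: the symmetric difference is infinite iff the reachable part of the product automaton has a cycle through a state that can still reach a state where the two DFAs disagree. The Python sketch below assumes DFAs encoded as `(start, delta, accepting)` triples; the encoding and the name `approximates` are hypothetical and not taken from the paper.

```python
from collections import deque


def approximates(d1, d2, alphabet) -> bool:
    """True iff L(d1) and L(d2) differ on only finitely many words.

    Each DFA is a triple (start, delta, accepting) with delta[(state, symbol)]
    defined for every state and symbol (complete DFAs are assumed).
    """
    (s1, t1, f1), (s2, t2, f2) = d1, d2
    # 1. Build the reachable part of the product automaton.
    start = (s1, s2)
    succ, seen, queue = {}, {start}, deque([start])
    while queue:
        p, q = queue.popleft()
        succ[(p, q)] = [(t1[(p, a)], t2[(q, a)]) for a in alphabet]
        for nxt in succ[(p, q)]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    # 2. States where exactly one DFA accepts witness the symmetric difference.
    diff = {(p, q) for (p, q) in seen if (p in f1) != (q in f2)}
    # 3. Keep the states that can still reach such a witness (reverse search).
    pred = {s: [] for s in seen}
    for s, outs in succ.items():
        for o in outs:
            pred[o].append(s)
    useful, queue = set(diff), deque(diff)
    while queue:
        s = queue.popleft()
        for r in pred[s]:
            if r not in useful:
                useful.add(r)
                queue.append(r)
    # 4. The difference is infinite iff the useful states contain a cycle.
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {s: WHITE for s in useful}

    def has_cycle_from(s) -> bool:
        colour[s] = GREY
        for o in succ[s]:
            if o in useful and (colour[o] == GREY or (colour[o] == WHITE and has_cycle_from(o))):
                return True
        colour[s] = BLACK
        return False

    return not any(colour[s] == WHITE and has_cycle_from(s) for s in useful)
```

For example, over the unary alphabet {a}, the language a* and the language of words of length at least 2 differ only on ε and a, so each approximates the other, whereas a* and the even-length words differ on infinitely many words.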
Bulletin of The European Association for Theoretical Computer Science, 2015
Because of their succinctness and clear syntax, regular expressions are the common choice to represent regular languages. Deterministic finite automata are an excellent representation for testing equivalence, containment or membership, as these problems are easily solved for this model. However, minimal deterministic finite automata can be exponentially larger than the associated regular expression, while the corresponding nondeterministic finite automata can be linearly larger. The worst-case complexity of the conversion algorithms and the worst-case size of the resulting automata are both well studied. For practical purposes, however, estimates for the average case can provide much more useful information. In this paper we review recent results on the average size of automata resulting from several constructions and suggest several directions for research. Most results were obtained within the framework of analytic combinatorics.
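The exponential gap between regular expressions and minimal DFAs has a classic witness: the words over {a, b} whose n-th symbol from the end is a, described by (a+b)*a(a+b)^{n-1}, an expression whose length grows linearly in n. The Python sketch below builds the usual (n+1)-state NFA and runs the subset construction; it illustrates the worst case rather than the average-case analyses the paper reviews. The function names are illustrative. It is a standard textbook fact that the minimal DFA for this language also has 2^n states, so the blow-up is not an artifact of the construction.

```python
def nth_from_end_nfa(n: int) -> dict:
    """NFA with n + 1 states for (a+b)*a(a+b)^(n-1); start state 0, accepting state n."""
    delta = {(0, "a"): {0, 1}, (0, "b"): {0}}  # state 0 guesses where the final a is
    for i in range(1, n):
        delta[(i, "a")] = {i + 1}
        delta[(i, "b")] = {i + 1}
    return delta


def subset_construction(delta: dict, start: int, alphabet: str) -> set:
    """Return the reachable states (sets of NFA states) of the determinized automaton."""
    initial = frozenset({start})
    seen, stack = {initial}, [initial]
    while stack:
        subset = stack.pop()
        for a in alphabet:
            target = frozenset(q for p in subset for q in delta.get((p, a), ()))
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


if __name__ == "__main__":
    for n in range(1, 9):
        dfa_states = subset_construction(nth_from_end_nfa(n), 0, "ab")
        print(n, len(dfa_states))  # prints n and 2**n: the DFA doubles with each n
```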
In this paper we implement regular expression matching using finite-state Aleshin-type automata, including nondeterministic finite-state Aleshin-type automata (NAAs) and deterministic finite-state Aleshin-type automata (DAAs). The storage space of an automaton is jointly determined by its number of states and the transitions between them. A key issue is that the automaton obtained from a regular expression is large, where size is defined as the number of states plus the number of transition arcs between states. The size of an automaton is crucial for the efficiency of pattern-matching algorithms based on regular expressions, because it directly affects both time and space efficiency. NAAs and DAAs each have their own advantages and disadvantages for regular expression matching.

Keywords: deterministic finite-state Aleshin-type automata, nondeterministic finite-state Aleshin-type automata (NAAs), partial derivative automata

I. INTRODUCTION

NFAs can provide an exponentially more succinct description than DFAs, but equivalence, inclusion, and universality are computationally hard for NFAs, while many of these problems can be solved in polynomial time for DFAs. The processing complexity for each input character is O(1) in a DFA, but O(n^2) in an NFA if all n states are active at the same time. The key feature of a DFA is that only one state is active at any time; however, converting an NFA into a DFA may generate O(Σ^n) states. The size of a DFA obtained from a regular expression can increase exponentially: the DFA for a regular expression with thousands of patterns yields tens of thousands of states, which means memory consumption of thousands of megabytes. Another problem is that a minimal NFA is hard to compute [6]. How to combine the matching efficiency of a DFA with the storage efficiency of an NFA has long been a goal in the field of regular expression matching. The regular expression is an important notation for describing specific patterns.
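The per-character costs quoted in the introduction come from how the two models are simulated. The Python sketch below assumes small hand-written automata for Σ*abΣ* over {a, b} (ordinary NFA/DFA, not the Aleshin-type automata of the paper; all names are illustrative). The NFA simulation keeps a set of active states and unions the successors of every active state for each input symbol, while the DFA performs a single table lookup.

```python
# Σ*abΣ* over {a, b}. NFA: state 0 loops and guesses where "ab" starts.
NFA_DELTA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2},
             (2, "a"): {2}, (2, "b"): {2}}
NFA_START, NFA_ACCEPT = 0, {2}

# Equivalent DFA: state = how much of "ab" has been matched (2 = already seen).
DFA_DELTA = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 2,
             (2, "a"): 2, (2, "b"): 2}
DFA_START, DFA_ACCEPT = 0, {2}


def nfa_accepts(word: str) -> bool:
    active = {NFA_START}
    for ch in word:
        # Union the successors of every active state: with n active states and
        # up to n successors each, one symbol can cost O(n^2) work.
        active = {r for q in active for r in NFA_DELTA.get((q, ch), ())}
    return bool(active & NFA_ACCEPT)


def dfa_accepts(word: str) -> bool:
    q = DFA_START
    for ch in word:
        q = DFA_DELTA[(q, ch)]  # exactly one lookup per symbol: O(1)
    return q in DFA_ACCEPT
```

The DFA here is no larger than the NFA, but for languages like "the n-th symbol from the end is a" the DFA needs exponentially many states, which is the trade-off the paper is trying to balance.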
Owing to its expressive power and flexibility in describing useful patterns [1], regular expression matching technology based on finite automata is widely used in networking and information processing, including real-time network processing, protocol analysis, intrusion detection systems, intrusion prevention systems, deep packet inspection systems, and virus detection systems such as Snort [2], Linux L7-filter [3], and Bro [4]. Regular expressions are replacing explicit string patterns as the method of choice for describing patterns. However, with the increasing scale and number of regular expressions in practical systems, it is challenging to achieve good performance for pattern matching based on regular expressions. For example, the number of signatures in Snort grew from 3,166 in 2003 to 15,047 in 2009, and the pattern matching routines in Snort account for up to 70% of the total execution time and 80% of the instructions executed on real traces [5].

II. REGULAR EXPRESSION

The term alphabet denotes any finite set of symbols. A string over an alphabet is a finite sequence of symbols drawn from that alphabet; the term word is often used as a synonym for string. Let Σ be an alphabet and Σ* the set of all words over Σ, i.e., Σ* denotes the set of all finite strings of symbols in Σ. If Σ is an alphabet, then any subset of Σ* is a language over Σ. The length of a word w ∈ Σ*, usually written |w|, is the number of occurrences of symbols in w, with ε denoting the empty word, whose length is 0. Below, ∅ is the empty set, a ∈ Σ is an input symbol, and r and s are regular expressions. A regular expression describes a set of strings without enumerating them explicitly. Regular expressions over Σ are defined recursively as follows:
(1) ∅ and ε are regular expressions, denoting ∅ and {ε}, respectively.
(2) If a is a symbol in Σ, then a is a regular expression that denotes {a}.
(3) Suppose r and s are regular expressions denoting the languages L(r) and L(s). Then (r) + (s), (r)·(s), (r)*, and (r) are also regular expressions, denoting L(r) ∪ L(s), L(r)L(s), (L(r))*, and L(r), respectively.
(4) All regular expressions can be obtained by applying rules (1), (2), and (3) a finite number of times.
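Since the keywords mention partial derivative automata, the recursive definition above can be turned into a matcher with Antimirov's partial derivatives. This is a swapped-in, standard technique and not necessarily the paper's Aleshin-type construction; the class and function names are hypothetical. Each rule (1) to (3) becomes a node type, and the set of derivatives reachable from an expression forms the states of its partial derivative automaton, which Antimirov showed has at most one state more than the number of symbol occurrences.

```python
from __future__ import annotations

from dataclasses import dataclass


class Re:
    """Base class for regular expressions built by rules (1)-(3)."""

@dataclass(frozen=True)
class Empty(Re):  # ∅
    pass

@dataclass(frozen=True)
class Eps(Re):  # ε
    pass

@dataclass(frozen=True)
class Sym(Re):  # a ∈ Σ
    a: str

@dataclass(frozen=True)
class Alt(Re):  # (r) + (s)
    r: Re
    s: Re

@dataclass(frozen=True)
class Cat(Re):  # (r)·(s)
    r: Re
    s: Re

@dataclass(frozen=True)
class Star(Re):  # (r)*
    r: Re


def nullable(e: Re) -> bool:
    """Does L(e) contain the empty word ε?"""
    if isinstance(e, (Eps, Star)):
        return True
    if isinstance(e, Alt):
        return nullable(e.r) or nullable(e.s)
    if isinstance(e, Cat):
        return nullable(e.r) and nullable(e.s)
    return False  # Empty, Sym


def cat(r: Re, s: Re) -> Re:
    # Simplify ε·s to s so that derivative sets stay small.
    return s if isinstance(r, Eps) else Cat(r, s)


def pd(e: Re, a: str) -> frozenset:
    """Antimirov partial derivatives of e with respect to the symbol a."""
    if isinstance(e, Sym):
        return frozenset({Eps()}) if e.a == a else frozenset()
    if isinstance(e, Alt):
        return pd(e.r, a) | pd(e.s, a)
    if isinstance(e, Cat):
        left = frozenset(cat(d, e.s) for d in pd(e.r, a))
        return (left | pd(e.s, a)) if nullable(e.r) else left
    if isinstance(e, Star):
        return frozenset(cat(d, e) for d in pd(e.r, a))
    return frozenset()  # Empty, Eps


def matches(e: Re, word: str) -> bool:
    states = frozenset({e})
    for a in word:
        states = frozenset(d for s in states for d in pd(s, a))
    return any(nullable(s) for s in states)


def pd_states(e: Re, alphabet: str) -> set:
    """States of the partial derivative automaton: all derivatives reachable from e."""
    seen, todo = {e}, [e]
    while todo:
        s = todo.pop()
        for a in alphabet:
            for d in pd(s, a):
                if d not in seen:
                    seen.add(d)
                    todo.append(d)
    return seen
```

For instance, `matches(Star(Cat(Sym("a"), Sym("b"))), "abab")` is true, and the partial derivative automaton of (ab)* has two states, within the bound of three for its two symbol occurrences.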
Information and Computation, 2011
Finite automata are probably best known for being equivalent to right-linear context-free grammars and, thus, for capturing the lowest level of the Chomsky hierarchy, the family of regular languages. Over the last half century, a vast literature documenting the importance of deterministic, nondeterministic, and alternating finite automata as an enormously valuable concept has been developed. In the present paper, we tour a fragment of this literature. Mostly, we discuss developments relevant to finite automata related problems like, for example, (i) simulation of and by several types of finite automata, (ii) standard automata problems such as fixed and general membership, emptiness, universality, equivalence, and related problems, and (iii) minimization and approximation. We thus come across descriptional and computational complexity issues of finite automata. We do not prove these results; instead, we draw attention to the big picture and to some of the main ideas involved.
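Of the problems listed, minimization has a particularly compact algorithm for DFAs. The Python sketch below implements Moore's partition refinement, assuming a complete DFA given as a transition dictionary; the survey does not prescribe this algorithm, and Hopcroft's O(|Σ| n log n) refinement is the usual faster alternative. The function name `minimize` is illustrative.

```python
def minimize(alphabet, delta, start, accepting):
    """Moore-style partition refinement.

    Returns a map from each reachable state to a block id; the blocks are the
    states of the minimal DFA. delta[(state, symbol)] must be defined everywhere.
    """
    # Keep only states reachable from the start state.
    reachable, stack = {start}, [start]
    while stack:
        q = stack.pop()
        for a in alphabet:
            r = delta[(q, a)]
            if r not in reachable:
                reachable.add(r)
                stack.append(r)
    # Round 0: separate accepting from non-accepting states.
    block = {q: int(q in accepting) for q in reachable}
    while True:
        # Two states stay together only if they are in the same block and
        # every symbol sends them into the same block.
        signature = {q: (block[q],) + tuple(block[delta[(q, a)]] for a in alphabet)
                     for q in reachable}
        ids = {}
        refined = {q: ids.setdefault(signature[q], len(ids)) for q in reachable}
        if len(ids) == len(set(block.values())):
            return refined  # no block was split: the partition is stable
        block = refined
```

For example, a four-state counter modulo 4 over {a} that accepts states 0 and 2 collapses to two blocks (even and odd length), and an unreachable extra state is simply dropped.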
Journal of Computer and System Sciences, 1981
It is known that for every restricted regular expression of length n there exists a nondeterministic finite automaton with n + 1 states, giving rise to the upper bound of 2^n + 1 on the number of states of the corresponding reduced automaton. In this note we show that this bound can be attained for all n ≥ 2, i.e., the upper bound 2^n + 1 is optimal. An observation is then made about the synthesis problem for nondeterministic finite automata.
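One construction that yields the (n + 1)-state automaton is the Glushkov (position) construction, which we assume here; the note itself may rely on a different one. The sketch also assumes that the length of a restricted regular expression counts its symbol occurrences, and it encodes expressions as nested tuples (a hypothetical format). Because the initial state 0 has no incoming transitions, every subset reached by the subset construction is either {0} or a subset of the n positions, which is where the 2^n + 1 bound comes from.

```python
def glushkov(e):
    """Position automaton of e: states 0..n, where n = number of symbol occurrences.

    Expressions are nested tuples: ("empty",), ("eps",), ("sym", a),
    ("alt", r, s), ("cat", r, s), ("star", r). State 0 is initial; state p >= 1
    means "the last symbol read was the one at position p".
    """
    symbols = []  # symbols[p - 1] is the letter at position p
    follow = {}   # follow[p] = positions that may be read right after p

    def walk(e):
        """Return (nullable, first positions, last positions) of e."""
        kind = e[0]
        if kind in ("empty", "eps"):
            return kind == "eps", set(), set()
        if kind == "sym":
            symbols.append(e[1])
            p = len(symbols)
            follow[p] = set()
            return False, {p}, {p}
        if kind == "alt":
            n1, f1, l1 = walk(e[1])
            n2, f2, l2 = walk(e[2])
            return n1 or n2, f1 | f2, l1 | l2
        if kind == "cat":
            n1, f1, l1 = walk(e[1])
            n2, f2, l2 = walk(e[2])
            for p in l1:
                follow[p] |= f2
            return n1 and n2, (f1 | f2) if n1 else f1, (l1 | l2) if n2 else l2
        if kind == "star":
            _, f1, l1 = walk(e[1])
            for p in l1:
                follow[p] |= f1
            return True, f1, l1
        raise ValueError(f"unknown node {kind!r}")

    nullable, first, last = walk(e)
    n = len(symbols)
    delta = {}
    for p, targets in [(0, first)] + [(p, follow[p]) for p in range(1, n + 1)]:
        for q in targets:
            delta.setdefault((p, symbols[q - 1]), set()).add(q)
    accepting = set(last) | ({0} if nullable else set())
    return n, delta, accepting


def accepts(delta, accepting, word):
    active = {0}
    for a in word:
        active = {q for p in active for q in delta.get((p, a), ())}
    return bool(active & accepting)


def reachable_subsets(delta, alphabet):
    """Subset construction from {0}; returns every reachable set of states."""
    initial = frozenset({0})
    seen, stack = {initial}, [initial]
    while stack:
        subset = stack.pop()
        for a in alphabet:
            target = frozenset(q for p in subset for q in delta.get((p, a), ()))
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen
```

For (a+b)*a(a+b), encoded as `("cat", ("cat", ("star", ("alt", a, b)), a), ("alt", a, b))` with `a, b = ("sym", "a"), ("sym", "b")`, there are n = 5 positions, so the NFA has 6 states and no determinization can exceed 2^5 + 1 = 33 subsets.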
Theoretical Computer Science, 2003
For any q > 1, let MOD_q be a quantum gate that determines whether the number of 1's in the input is divisible by q. We show that for any q, t > 1, MOD_q is equivalent to MOD_t (up to constant depth). Based on the case q = 2, Moore [8] has shown that quantum analogs of AC^(0), ACC[q], and ACC, denoted QAC^(0)_wf, QACC[2], and QACC, respectively, define the same class of operators, leaving q > 2 as an open question. Our result resolves this question, proving that QAC
Proceedings of the ACM on Programming Languages, 2020
We propose a solution to the problem of efficiently matching regular expressions (regexes) with bounded repetition, such as (ab){1,100}, using deterministic automata. For this, we introduce novel counting-set automata (CsAs), automata with registers that can hold sets of bounded integers and can be manipulated by a limited portfolio of constant-time operations. We present an algorithm that compiles a large subclass of regexes to deterministic CsAs. This includes (1) a novel Antimirov-style translation of regexes with counting to counting automata (CAs), nondeterministic automata with bounded counters, and (2) our main technical contribution, a determinization of CAs that outputs CsAs. The main advantage of this workflow is that the size of the produced CsAs does not depend on the repetition bounds used in the regex (while the size of the DFA is exponential in them). Our experimental results confirm that deterministic CsAs produced from practical regexes with repetition are indeed v...
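The core idea of a counting set can be illustrated on the regex `.*a.{k}` (the (k+1)-th symbol from the end is a), whose minimal DFA needs 2^(k+1) states. The Python sketch below hand-codes a single counting-set register for this one regex; it is not the paper's compiler. The offset-plus-deque representation, which makes "increment every counter" a constant-time operation, is our assumption and may differ from the paper's data structure; all names are hypothetical.

```python
from collections import deque


class CountingSet:
    """A set of counter values in [0, bound] with O(1) insert-0 and increment-all.

    Each value is stored as the global offset at insertion time, so a stored key t
    has value offset - t; the oldest key (leftmost) holds the largest value.
    """

    def __init__(self, bound: int):
        self.bound = bound
        self.offset = 0
        self.keys = deque()

    def insert_zero(self) -> None:
        if not self.keys or self.keys[-1] != self.offset:
            self.keys.append(self.offset)

    def increment(self) -> None:
        self.offset += 1
        # Values are distinct, so at most the oldest one can now exceed the bound.
        if self.keys and self.offset - self.keys[0] > self.bound:
            self.keys.popleft()

    def max_value(self):
        return self.offset - self.keys[0] if self.keys else None


def matches_a_then_k(word: str, k: int) -> bool:
    """Does word match .*a.{k}, i.e. is its (k+1)-th symbol from the end an 'a'?"""
    counter = CountingSet(bound=k)
    for ch in word:
        counter.increment()        # every pending 'a' is now one symbol further back
        if ch == "a":
            counter.insert_zero()  # this 'a' starts a fresh count at 0
    return counter.max_value() == k
```

The register holds at most k + 1 values, so memory grows linearly in the repetition bound, whereas the DFA grows exponentially; this mirrors the advantage the abstract claims for CsAs.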
Theoretical Computer Science, 2009
Watson-Crick automata are finite state automata working on double-stranded tapes, introduced to investigate the potential of DNA molecules for computing. In this paper, we continue the investigation of the descriptional complexity of Watson-Crick automata initiated by Păun et al. [A. Păun, M. Păun, State and transition complexity of Watson-Crick finite automata, in: G. Ciobanu, G. Paun (Eds.), Fundamentals of Computation Theory, FCT'99, in: LNCS, vol. 1684]. In particular, we show that any finite language, as well as any unary regular language, can be recognized by a Watson-Crick automaton with only two and three states, respectively. We also formally define the notion of determinism for these systems. In contrast to the case of nondeterministic Watson-Crick automata, we show that, for deterministic ones, the complementarity relation plays a major role in the acceptance power of these systems.
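To make the model concrete, the Python sketch below simulates a nondeterministic Watson-Crick automaton with one head per strand. It assumes the identity complementarity relation (so both strands carry the same word) and a rule format `(state, x, y, next_state)` meaning "read x on the upper strand and y on the lower strand"; the format and all names are hypothetical. The example rules accept {a^n b^n : n ≥ 1}, illustrating how two heads can compare counts, which no one-head finite automaton can do.

```python
def wk_accepts(upper, lower, rules, start, final):
    """Search all runs of a nondeterministic Watson-Crick automaton.

    rules: iterable of (state, x, y, next_state). A run accepts if it reaches a
    final state with both heads at the end of their strands.
    """
    todo = [(start, 0, 0)]
    seen = set(todo)
    while todo:
        q, i, j = todo.pop()
        if q in final and i == len(upper) and j == len(lower):
            return True
        for p, x, y, r in rules:
            if p == q and upper.startswith(x, i) and lower.startswith(y, j):
                nxt = (r, i + len(x), j + len(y))
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
    return False


ANBN_RULES = [
    ("q0", "a", "", "q0"),   # the upper head reads the a's alone
    ("q0", "b", "a", "q1"),  # then each upper b is paired with a lower a
    ("q1", "b", "a", "q1"),
    ("q1", "", "b", "q2"),   # finally the lower head reads the b's alone
    ("q2", "", "b", "q2"),
]


def accepts_anbn(w: str) -> bool:
    # Identity complementarity: the lower strand equals the upper strand.
    return wk_accepts(w, w, ANBN_RULES, "q0", {"q2"})
```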
Computing Research Repository, 2009
Lecture Notes in Computer Science
RAIRO - Theoretical Informatics and Applications, 2008
RAIRO - Theoretical Informatics and Applications, 2006
Developments in Language Theory
Fundamenta Informaticae, 2009