Hi,
I have about 200GB of data that I need to go through and extract the
common first part of a line. Something like this.
>>> a = "abcdefghijklmn opqrstuvwxyz"
>>> b = "abcdefghijklmn opBHLHT"
>>> c = extract(a, b)
>>> print c
"abcdefghijklmn op"
Here I want to extract the common string "abcdefghijklmn op". Basically I
need a fast way to do that for any two given strings. For my situation,
the common string will always be at the beginning of both strings. I could
use regular expressions to do this, but from what I understand they carry
a lot of overhead. New data is being generated at the rate of about 1GB
per hour, so this needs to be reasonably fast while leaving CPU time for
other processes.
Thanks
Ravi
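
For reference, one possible way to write the extract() function described above is sketched here. It assumes plain in-memory strings and leans on os.path.commonprefix from the standard library, which compares character by character rather than by path component and so works on arbitrary strings. The name extract_loop is hypothetical and only shows the same logic spelled out by hand. Neither version has been timed against the 1GB/hour workload, so treat this as a starting point rather than a measured solution.

```python
import os.path


def extract(a, b):
    """Return the longest common leading substring of a and b."""
    # commonprefix works character by character, not per path component,
    # so it is usable on any pair of strings.
    return os.path.commonprefix([a, b])


def extract_loop(a, b):
    """Same result, written out explicitly (hypothetical helper name)."""
    # Only compare up to the length of the shorter string.
    n = min(len(a), len(b))
    i = 0
    # Advance until the first position where the strings differ.
    while i < n and a[i] == b[i]:
        i += 1
    return a[:i]


a = "abcdefghijklmn opqrstuvwxyz"
b = "abcdefghijklmn opBHLHT"
print(extract(a, b))
```

Both functions avoid regular expressions entirely. Which one is faster on real data would need to be checked with a quick benchmark on a sample of the actual lines.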