Serious problem with Shelve

  • Rami A. Kishek

    Serious problem with Shelve

    Hi - this mysterious behavior with shelve is just about to kill me. I
    hope someone here can shed some light. First of all, I have this piece
    of code which uses shelve to save instances of some class I define. It
    works perfectly on an old machine (PII-400) running Python 2.2.1 under
    RedHat Linux 8.0. When I try to run it under Python for Windows ME on a
    P-4 1.4 GHz, however, it keeps crashing on reading from the shelved file
    the second time I try to access it. The Windows machine was originally
    running python 1.5.2, so I upgraded to 2.2.3, thinking that would solve
    the problem, but it didn't!

    This is what the error looks like:
    tmprec = myrecs[key]
    File "D:\PROGRAMS\PYTHON22\lib\shelve.py", line 70, in __getitem__
    f = StringIO(self.dict[key])
    KeyError: A_G_08631616188
    ^

    Notes:
    Here's what my program does (it is too much code to include here).
    I have 4 related modules: one containing the class definitions (in all
    other modules I use from classfile import ___); the second module builds
    the shelve file by parsing a large text file containing the data,
    building classes; the third re-opens the file later to do reading and
    writing operations; and the 4th module is a GUI controller that simply
    calls the appropriate functions from the other 2 modules.

    The main breakdown occurs in module 3. Significantly, I initially had
    this module set up as a script in which everything was done on the
    module level, and it was working fine (apparently). The problems
    started appearing when I wrapped code inside functions (I need to do
    that since I want to call it from other modules, and I have about 4000
    lines of code altogether!). I spent painstaking hours trying to isolate
    the problem - I pass the open shelve file as a parameter to all the
    functions that need it, and I close it properly using try: finally
    statements after every use. I also make sure all the keys that go in
    there are unique.

    What module 3 does is a series of short reads and writes to the shelve
    file. First I test if a particular key is in there - if it is not, I
    add an item, if it is, I read the existing item, update it, then write
    it back like this:

    tmprec = myrecs[key]   # I read a particular instance from the shelve file
    tmprec.field = 1       # I update one field
    #del myrecs[key]       # Commented lines are things I tried while debugging
    #myrecs.sync()
    myrecs[key] = tmprec   # Then I write it back to the shelve file
    #myrecs.sync()
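
The read, update, write-back cycle above can be sketched against the current standard library as follows. This is a minimal sketch and not the poster's code: it assumes Python 3's `shelve`, uses a plain dict in place of the poster's class instance, and `mark_record`, `run_demo`, and the file name `records` are hypothetical names chosen for illustration.

```python
import os
import shelve
import tempfile


def mark_record(db, key):
    """Read-modify-write one record; create it if the key is absent."""
    if key in db:
        rec = db[key]        # unpickles a copy of the stored record
        rec["field"] = 1     # changes only the in-memory copy
    else:
        rec = {"field": 0}   # stand-in for a new class instance
    db[key] = rec            # reassign so the record is pickled back
    return rec


def run_demo():
    path = os.path.join(tempfile.mkdtemp(), "records")  # hypothetical file name
    db = shelve.open(path)
    try:
        first = dict(mark_record(db, "A_G_0863161618"))   # key absent: added
        second = dict(mark_record(db, "A_G_0863161618"))  # key present: updated
    finally:
        db.close()
    # Reopen the file to confirm the update reached disk.
    db = shelve.open(path)
    try:
        stored = db["A_G_0863161618"]
        keys = sorted(db.keys())
    finally:
        db.close()
    return first, second, stored, keys
```

With the default `writeback=False`, the assignment `db[key] = rec` is what persists the change; mutating the object returned by `db[key]` on its own leaves the file untouched.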

    This one function appears to be the guilty party. When I comment it
    out the crash stops. However it is a vital function for my program and
    I need to do it. Note that deleting the original item before rewriting
    it helped reduce the frequency of crashes, but didn't eliminate it
    completely. The other possibility (which is why I unsuccessfully tried
    the .sync() lines) is that it has to do with the timing of writing to
    disk. The library reference is vague about this, saying that shelve is
    incapable of simultaneous reads and writes, so the file shouldn't be
    opened twice for write. However it does not say whether this implies we
    cannot read and write like this in quick succession.
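
On the timing of writes: from Python 2.3 on, `shelve.open` accepts `writeback=True`, which keeps accessed entries in an in-memory cache and writes them back on `sync()` or `close()`. A hedged sketch of that alternative, assuming Python 3 and a plain dict as the stored record (`update_in_place`, `read_back`, and the temporary path are hypothetical):

```python
import os
import shelve
import tempfile


def update_in_place(path, key):
    """Mutate a stored record through the writeback cache, then flush it."""
    db = shelve.open(path, writeback=True)
    try:
        if key not in db:
            db[key] = {"field": 0}   # stand-in for a real record
        db[key]["field"] = 1         # edits the cached object in memory
        db.sync()                    # pickles cached entries back to the file
    finally:
        db.close()


def read_back(path, key):
    """Reopen the shelf and return the stored record."""
    db = shelve.open(path)
    try:
        return db[key]
    finally:
        db.close()


demo_path = os.path.join(tempfile.mkdtemp(), "writeback_demo")  # hypothetical name
update_in_place(demo_path, "A_G_0863161618")
result = read_back(demo_path, "A_G_0863161618")
```

The cache costs memory and makes `close()` slower on large shelves, so it is a trade-off rather than a fix for a broken database library.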

    More details:
    * The first run of module 3 after creating the shelve file doesn't
    crash, although I suspect it is doing something funny.
    * The second time I get that error above, keeping in mind I am supposed
    to have a key in there called "A_G_0863161618" (without the extra '8' at
    the end), so the database is already corrupted. So the key
    'A_G_08631616188' is in myshelvefile.keys(), the original is no more,
    yet NEITHER can be accessed using myshelvefile[key]!
    * After creation, the shelve file size is only 71 kB. After running
    module 3 - which is supposed to mostly read and not really change the
    file much - the size jumps to 110 kB!
    * If I open the file in a text editor, I notice all sorts of things that
    are not supposed to be there (like directory paths, etc), indicating it
    is corrupted. I do not see those things when I open the file on the
    good (Linux) machine.
    * I did a scandisk to ensure the disk is OK and it is.
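
The symptom in the second bullet, a key that shows up in `keys()` but cannot be fetched, can be checked for systematically. A minimal sketch, assuming Python 3; `unreadable_keys` is a hypothetical helper that works on any mapping, including an open shelf, and `FakeBrokenShelf` is a made-up stand-in that only imitates the reported behaviour:

```python
def unreadable_keys(db):
    """Return the keys that keys() lists but that cannot be read back."""
    bad = []
    for key in list(db.keys()):
        try:
            db[key]
        except Exception:  # KeyError, unpickling errors, backend errors
            bad.append(key)
    return bad


class FakeBrokenShelf(dict):
    """Stand-in for a damaged shelf: lists one key that always fails to load."""

    def keys(self):
        return list(super().keys()) + ["A_G_08631616188"]

    def __getitem__(self, key):
        if key == "A_G_08631616188":
            raise KeyError(key)
        return super().__getitem__(key)
```

A scan like this only catches failures that surface as Python exceptions; a crash inside the underlying C library cannot be trapped this way.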
  • Tim Churches

    #2
    Re: Serious problem with Shelve

    On Mon, 2003-08-18 at 03:04, Rami A. Kishek wrote:
    > Hi - this mysterious behavior with shelve is just about to kill me. I
    > hope someone here can shed some light. First of all, I have this piece
    > of code which uses shelve to save instances of some class I define. It
    > works perfectly on an old machine (PII-400) running Python 2.2.1 under
    > RedHat Linux 8.0. When I try to run it under Python for windows ME on a
    > P-4 1.4 GHz, however, it keeps crashing on reading from the shelved file
    > the second time I try to access it. The Windows machine was originally
    > running python 1.5.2, so I upgraded to 2.2.3, thinking that would solve
    > the problem, but it didn't!

    In Python 2.2 or earlier, by default, shelve uses the Berkeley database
    1.8 libraries, which we have found to be seriously broken on all
    platforms we have tried them on. Upgrading to a later version of the
    Berkeley libraries and using the pybsddb module fixed the mysterious,
    inconsistent crashes and segfaults we were seeing with shelve (and which
    were also driving us crazy). The easiest way to upgrade is to move to
    Python 2.3, which includes these later versions, but you can also
    easily install them under earlier versions of Python (at least under
    2.2).
    --

    Tim C



    • Rami A. Kishek

      #3
      Re: Serious problem with Shelve

      Well - I installed Python 2.3, but it still doesn't work. My program now
      crashes on the first pass. After deleting the old databases and
      creating new ones, I opened them for read and this is what I get:

      self.revs = shelve.open(os.path.join(tgtdir, dbfn))
      File "D:\PROGRAMS\PYTHON23\lib\shelve.py", line 231, in open
      return DbfilenameShelf(filename, flag, protocol, writeback, binary)
      File "D:\PROGRAMS\PYTHON23\lib\shelve.py", line 212, in __init__
      Shelf.__init__(self, anydbm.open(filename, flag), protocol,
      writeback, binary)
      File "D:\PROGRAMS\PYTHON23\lib\anydbm.py", line 82, in open
      mod = __import__(result)
      ImportError: No module named bsddb185


      I will try enclosing that import bsddb185 in anydbm.py in try: except:,
      though I hate messing around with source files, and there may be many
      more such problems. Python developers, be aware of this glitch.


      Tim Churches wrote:
      >
      > > of code which uses shelve to save instances of some class I define.
      > > it keeps crashing on reading from the shelved file
      > > the second time I try to access it.
      >
      > In Python 2.2 or earlier, by default, shelve uses the Berkeley database
      > 1.8 libraries, which we have found to be seriously broken on all
      > platforms we have tried them on. Upgrading to a later version of the
      > Berkeley libraries and using the pybsddb module fixed the mysterious,
      > inconsistent crashes and segfaults we were seeing with shelve (and which
      > were also driving us crazy). The easiest way to upgrade is to move to
      > Python 2.3, which includes these later versions, but you can also
      > easily install them under earlier version of Python (at least under
      > 2.2).
      > --


      • Andrew MacIntyre

        #4
        Re: Serious problem with Shelve

        On Tue, 19 Aug 2003, Rami A. Kishek wrote:
        > File "D:\PROGRAMS\PYTHON23\lib\shelve.py", line 231, in open
        > return DbfilenameShelf(filename, flag, protocol, writeback, binary)
        > File "D:\PROGRAMS\PYTHON23\lib\shelve.py", line 212, in __init__
        > Shelf.__init__(self, anydbm.open(filename, flag), protocol,
        > writeback, binary)
        > File "D:\PROGRAMS\PYTHON23\lib\anydbm.py", line 80, in open
        > raise error, "db type could not be determined"
        > error: db type could not be determined
        >
        > Incidentally, on the other machine I mentioned (the one on which shelve
        > worked perfectly with 2.2.3) shelve still works perfectly after
        > upgrading to 2.3. Since that is a Linux 2 machine, I figure perhaps it
        > is using a different db like gdbm or something ...

        Your shelve file is in DB v1.85 format. Commenting out the lines in
        whichdb.py didn't do anything except deny the shelve module information
        about what the format actually _is_.

        You'll need to find/build a v1.85 compatible module to read the shelve
        then write it out in a later format.

        --
        Andrew I MacIntyre "These thoughts are mine alone..."
        E-mail: andymac@bullseye.apana.org.au (pref) | Snail: PO Box 370
        [email protected] g.au (alt) | Belconnen ACT 2616
        Web: http://www.andymac.org/ | Australia


        • Skip Montanaro

          #5
          Re: Serious problem with Shelve


          Rami> Well - I installed Python 2.3, but it still doesn't. My program
          Rami> now crashes on the first pass. After deleting the old databases
          Rami> and creating new ones, I opened them for read and this is what I
          Rami> get:

          How did you create those new databases, using an older version of Python
          perhaps? What's happening is that whichdb.whichdb() determined that the
          file you passed into anydbm.open() was an old hash style database, which can
          only be opened in Python 2.3 by the old v 1.85 library, which is only
          exposed through the bsddb185 module.

          Rami> I will try enclosing that import bsddb185 in anydbm.py in try:
          Rami> except:, though I hate messing around with source files, and there
          Rami> may be many more such problems. Python developers, be aware of
          Rami> this glitch.

          That won't work. What's anydbm.open() going to use to open the file?

          Can you explain how the files were created? (Sorry if you explained
          already. I'm just coming to this thread.)

          If you have Python 2.1 or 2.2 lying around with a bsddb module which can
          read the file in question, use Tools/scripts/db2pickle.py to convert the
          file to a pickle, then with Python 2.3, run Tools/scripts/pickle2db.py to
          convert the pickle back to a db file, using the new bsddb. Those two
          scripts are in the Python 2.3 distribution, but not the Python 2.2
          distribution. They should work with Python 2.1 or 2.2, however. This
          problem is exactly why I wrote them.

          Synopsis:

          python2.2 db2pickle.py olddbfile pickle.pck
          python2.3 pickle2db.py newdbfile pickle.pck
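
The idea behind that two-step conversion, dumping every key/value pair to a neutral pickle file with the old library and reloading it with the new one, can be sketched as below. This is an illustration in Python 3 and not the code of `db2pickle.py` or `pickle2db.py`; `dump_db`, `load_db`, `convert`, and `run_conversion_demo` are hypothetical names, and `dbm.dumb` stands in for both the old and the new database module so that the example runs anywhere.

```python
import dbm.dumb
import os
import pickle
import tempfile


def dump_db(db, pickle_path):
    """Write every (key, value) pair of an open dbm-style database to a pickle file."""
    with open(pickle_path, "wb") as fh:
        for key in db.keys():
            pickle.dump((key, db[key]), fh)


def load_db(pickle_path, db):
    """Read (key, value) pairs back from the pickle file into an open database."""
    with open(pickle_path, "rb") as fh:
        while True:
            try:
                key, value = pickle.load(fh)
            except EOFError:  # end of the pickle stream
                break
            db[key] = value


def convert(old_path, new_path, pickle_path):
    """Dump the old database to a pickle file, then rebuild it at new_path."""
    old = dbm.dumb.open(old_path, "r")
    try:
        dump_db(old, pickle_path)
    finally:
        old.close()
    new = dbm.dumb.open(new_path, "c")
    try:
        load_db(pickle_path, new)
    finally:
        new.close()


def run_conversion_demo():
    """Create a tiny database, convert it, and return the converted contents."""
    workdir = tempfile.mkdtemp()
    old_path = os.path.join(workdir, "olddbfile")
    new_path = os.path.join(workdir, "newdbfile")
    src = dbm.dumb.open(old_path, "c")
    src[b"A_G_0863161618"] = b"payload"
    src.close()
    convert(old_path, new_path, os.path.join(workdir, "pickle.pck"))
    dst = dbm.dumb.open(new_path, "r")
    try:
        return {key: dst[key] for key in dst.keys()}
    finally:
        dst.close()
```

Because the intermediate file holds only pickled key/value pairs, the reading and writing halves never need the same database library installed, which is the point of splitting the job across two interpreters.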

          Skip


          • Skip Montanaro

            #6
            Re: Serious problem with Shelve


            Rami> Incidentally, on the other machine I mentioned (the one on which
            Rami> shelve worked perfectly with 2.2.3) shelve still works perfectly
            Rami> after upgrading to 2.3. Since that is a Linux 2 machine, I figure
            Rami> perhaps it is using a different db like gdbm or something ...

            Try this using python 2.2.3 and python 2.3:

            import os
            import whichdb
            print(whichdb.whichdb(os.path.join(tgtdir, dbfn)))

            and see what it prints. That will keep you from guessing about the nature
            of the file.
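
In Python 3 the same check lives at `dbm.whichdb`. A small sketch, assuming Python 3; `describe_db` is a hypothetical wrapper, the demo file name is made up, and the return-value handling follows the documented contract (`None` if the file is missing or unreadable, an empty string if the format is not recognised, otherwise a module name):

```python
import dbm
import dbm.dumb
import os
import tempfile


def describe_db(path):
    """Translate dbm.whichdb()'s result into a readable message."""
    kind = dbm.whichdb(path)
    if kind is None:
        return "missing or unreadable"
    if kind == "":
        return "unrecognised format"
    return "open with " + kind


# Build a small throwaway database so there is something to inspect.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "demo")  # hypothetical file name
db = dbm.dumb.open(demo_path, "c")
db[b"key"] = b"value"
db.close()

print(describe_db(demo_path))
```

Running the check against each database file a program touches shows at a glance whether one of them was written by a different backend than the rest.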

            Skip


            • Rami A. Kishek

              #7
              Re: Serious problem with Shelve

              Thanks. With your help, I figured out one of the databases accessed WAS
              created with an older Python, so I simply cleaned up that one and now
              everything works!
