Skip to content

AOF inconsistency: DELETE persists correctly but LPOP/ZREM/HDEL are not durable (data reappears after restart) #1675

Description

@LmingXie

Describe the bug

Garnet shows inconsistent persistence behavior between key-level deletion and data-structure mutation operations when AOF is enabled.

Specifically:

  • DELETE works correctly and is persisted across restarts
  • However, mutation operations such as:
    • LPOP (list)
    • ZREM (sorted set)
    • HDEL (hash)

are NOT persisted correctly, and deleted data reappears after restart.

This issue is reproducible on multiple versions and affects multiple data structures, suggesting a broader persistence inconsistency rather than a single command bug.

Steps to reproduce the bug

garnet.conf

{
  "Port": 6379,
  "Address": "0.0.0.0",

  "MemorySize": "4g",
  "IndexSize": "256m",
  "IndexMaxSize": "2g",
  "MutablePercent": 80,

  "AuthenticationMode": "Password",
  "Password": "123456",

  "EnableAOF": true,
  "CommitFrequencyMs": 50,
  "WaitForCommit": false,
  "CompactionFrequencySecs": 0,
  
  "AofMemorySize": "256m",
  "EnableFastCommit": true,
  "FastCommitThrottleFreq": 500,
  "EnableStorageTier": true,
  "UseStorage": true,
  "LogDir": "./data/log",
  "CheckpointDir": "./data/checkpoints",
  "Recover": true,

  "CompactionType": "Scan",
  "CompactionFrequencySecs": 300,
  "CompactionForceDelete": true,

  "LatencyMonitor": true,
  "MetricsSamplingFrequency": 5,

  "ThreadPoolMinThreads": 16,
  "ThreadPoolMaxThreads": 256,

  "NetworkSendThrottleMax": 8192,

  "EnableLua": true,
  "LuaTransactionMode": false,

  "LogLevel": "Information",
  "FileLogger": "./logs/garnet.log"
}

Run the following Python script:

class RedisHelper:

    _client: redis.asyncio.Redis = None

    HOST = REDIS_HOST
    PORT = REDIS_PORT
    PASSWORD = REDIS_PASS
    DB = 3
    _lock = asyncio.Lock()


    @classmethod
    async def _init_client(cls):
        try:
            pool = redis.asyncio.BlockingConnectionPool(
                host=cls.HOST,
                port=cls.PORT,
                password=cls.PASSWORD,
                db=cls.DB,
                max_connections=100,
                timeout=30, 
            )

            cls._client = redis.asyncio.Redis(
                connection_pool=pool,
                socket_timeout=5,
                socket_connect_timeout=5,
                retry_on_timeout=True,
                health_check_interval=30,
                decode_responses=True,
            )

            
            await cls._client.ping()

            logger.info("Redis connected")

        except Exception as e:
            logger.error(f"Redis connect failed: {e}")
            raise

    @classmethod
    async def client(cls) -> redis.asyncio.Redis:
        if cls._client:
            return cls._client

        async with cls._lock:
            if cls._client:
                return cls._client

            await cls._init_client()

        return cls._client

async def pipe_del_test():
    redis = await RedisHelper.client()

    # test string set/delete
    pipe = redis.pipeline()
    await pipe.set("test", "1")
    await pipe.set("test2", "2")
    await pipe.set("test3", "3")
    await pipe.execute()

    pipe = redis.pipeline()
    await pipe.delete("test")
    await pipe.delete("test2")
    await pipe.execute()

    await redis.delete("test3")

    # test list
    await redis.lpush("List", "value1")
    await redis.lpop("List")

    # test sorted set
    pipe2 = redis.pipeline()
    await pipe2.zadd("DEDUP_KEY", {"top1": 50, "top2": 60}, gt=True)
    await pipe2.execute()

    pipe3 = redis.pipeline()
    await pipe3.zrem("DEDUP_KEY", "top1")
    await pipe3.execute()

    # test hash
    await redis.hset("tempKey", mapping={
        "hkey1": "v1",
        "hkey2": "v2"
    })

    await redis.hdel("tempKey", "hkey1", "hkey2")

    # ensure AOF flush
    await asyncio.sleep(0.1)
garnet>GarnetServer.exe --storage-tier --config-import-path garnet.conf
    _________
   /_||___||_\      Garnet 1.1.1 64 bit; standalone mode
   '. \   / .'      Listening on: 0.0.0.0:6379
     '.\ /.'        https://aka.ms/GetGarnet
       '.'

05::13::47 info: GarnetServer[0] Garnet 1.1.1 64 bit; standalone mode; Endpoint: [0.0.0.0:6379]
05::13::47 info: GarnetServer[0] Environment .NET 9.0.7; Win32NT; X64
05::13::47 info: ArgParser[0] Configuration import from embedded resource succeeded. Path: defaults.conf.
05::13::47 info: ArgParser[0] Configuration import from local machine succeeded. Path: garnet.conf.
05::13::47 info: ArgParser[0] Found 25 non-default configuration options:
05::13::47 info: ArgParser[0] "Address": "0.0.0.0",
05::13::47 info: ArgParser[0] "MemorySize": "4g",
05::13::47 info: ArgParser[0] "IndexSize": "256m",
05::13::47 info: ArgParser[0] "IndexMaxSize": "2g",
05::13::47 info: ArgParser[0] "MutablePercent": 80,
05::13::47 info: ArgParser[0] "EnableStorageTier": true,
05::13::47 info: ArgParser[0] "LogDir": "./data/log",
05::13::47 info: ArgParser[0] "CheckpointDir": "./data/checkpoints",
05::13::47 info: ArgParser[0] "Recover": true,
05::13::47 info: ArgParser[0] "AuthenticationMode": "Password",
05::13::47 info: ArgParser[0] "EnableAOF": true,
05::13::47 info: ArgParser[0] "AofMemorySize": "256m",
05::13::47 info: ArgParser[0] "CommitFrequencyMs": 50,
05::13::47 info: ArgParser[0] "CompactionFrequencySecs": 300,
05::13::47 info: ArgParser[0] "CompactionType": "Scan",
05::13::47 info: ArgParser[0] "CompactionForceDelete": true,
05::13::47 info: ArgParser[0] "EnableLua": true,
05::13::47 info: ArgParser[0] "LatencyMonitor": true,
05::13::47 info: ArgParser[0] "MetricsSamplingFrequency": 5,
05::13::47 info: ArgParser[0] "LogLevel": "Information",
05::13::47 info: ArgParser[0] "FileLogger": "./logs/garnet.log",
05::13::47 info: ArgParser[0] "ThreadPoolMinThreads": 16,
05::13::47 info: ArgParser[0] "ThreadPoolMaxThreads": 256,
05::13::47 info: ArgParser[0] "FastCommitThrottleFreq": 500,
05::13::47 info: ArgParser[0] "NetworkSendThrottleMax": 8192
05::13::47 info: Options[0] [Store] Using page size of 32m
05::13::47 info: Options[0] [Store] Using log memory size of 4g
05::13::47 info: Options[0] [Store] There are 128 log pages in memory
05::13::47 info: Options[0] [Store] Using disk segment size of 1g
05::13::47 info: Options[0] [Store] Using hash index size of 256m (4m cache lines)
05::13::47 info: Options[0] [Store] Hash index size is optimized for up to ~16m distinct keys
05::13::47 info: Options[0] [Store] Using hash index max size of 2g, (32m cache lines)
05::13::47 info: Options[0] [Store] Hash index max size is optimized for up to ~128m distinct keys
05::13::47 info: Options[0] [Store] Using log mutable percentage of 80%
05::13::47 info: Options[0] Using device type Native
05::13::47 info: Options[0] [Store] Not using Revivification
05::13::47 info: Options[0] [Object Store] Using page size of 4k
05::13::47 info: Options[0] [Object Store] Each page can hold ~170 key-value pairs of objects
05::13::47 info: Options[0] [Object Store] Using log memory size of 32m
05::13::47 info: Options[0] [Object Store] This can hold ~1398101 key-value pairs of objects in memory total
05::13::47 info: Options[0] [Object Store] There are 8k log pages in memory
05::13::47 info: Options[0] [Object Store] Using disk segment size of 32m
05::13::47 info: Options[0] [Object Store] Using hash index size of 16m (256k cache lines)
05::13::47 info: Options[0] [Object Store] Hash index size is optimized for up to ~1m distinct keys
05::13::47 info: Options[0] [Object Store] Using log mutable percentage of 90%
05::13::47 info: Options[0] [Object Store] Heap memory size is unlimited
05::13::47 info: Options[0] [Object Store] Not using Revivification
05::13::47 info: VectorManager:0:d9ddda09-57ab-4c22-8c29-cdb69e806930[0] Created VectorManager
05::13::47 info: Options[0] [Store] Using page size of 32m
05::13::47 info: Options[0] [Store] Using log memory size of 4g
05::13::47 info: Options[0] [Store] There are 128 log pages in memory
05::13::47 info: Options[0] [Store] Using disk segment size of 1g
05::13::47 info: Options[0] [Store] Using hash index size of 256m (4m cache lines)
05::13::47 info: Options[0] [Store] Hash index size is optimized for up to ~16m distinct keys
05::13::47 info: Options[0] [Store] Using hash index max size of 2g, (32m cache lines)
05::13::47 info: Options[0] [Store] Hash index max size is optimized for up to ~128m distinct keys
05::13::47 info: Options[0] [Store] Using log mutable percentage of 80%
05::13::47 info: Options[0] Using device type Native
05::13::47 info: Options[0] [Store] Not using Revivification
05::13::47 info: Options[0] [Object Store] Using page size of 4k
05::13::47 info: Options[0] [Object Store] Each page can hold ~170 key-value pairs of objects
05::13::47 info: Options[0] [Object Store] Using log memory size of 32m
05::13::47 info: Options[0] [Object Store] This can hold ~1398101 key-value pairs of objects in memory total
05::13::47 info: Options[0] [Object Store] There are 8k log pages in memory
05::13::47 info: Options[0] [Object Store] Using disk segment size of 32m
05::13::47 info: Options[0] [Object Store] Using hash index size of 16m (256k cache lines)
05::13::47 info: Options[0] [Object Store] Hash index size is optimized for up to ~1m distinct keys
05::13::47 info: Options[0] [Object Store] Using log mutable percentage of 90%
05::13::47 info: Options[0] [Object Store] Heap memory size is unlimited
05::13::47 info: Options[0] [Object Store] Not using Revivification
05::13::47 info: VectorManager:0:5cb8ce9e-d0ef-421b-a4a3-7cac03f5fef2[0] Created VectorManager
05::13::47 info: GarnetServer[0] Total configured memory limit: 4882169856
05::13::47 info: GarnetServer[0] Max number of logical databases allowed on server: 16
05::13::47 info: TsavoriteLog [aof][0] ******* Recovered HybridLog Stats *******
05::13::47 info: TsavoriteLog [aof][0] Head Address: 64
05::13::47 info: TsavoriteLog [aof][0] Safe Head Address: 64
05::13::47 info: TsavoriteLog [aof][0] ReadOnly Address: 832
05::13::47 info: TsavoriteLog [aof][0] Safe ReadOnly Address: 832
05::13::47 info: TsavoriteLog [aof][0] Tail Address: 832
05::13::47 info: MultiDatabaseManager[0] Recovered AOF: begin address = 64, tail address = 832, DB ID: 0
05::13::47 info: Options[0] [Store] Using page size of 32m
05::13::47 info: Options[0] [Store] Using log memory size of 4g
05::13::47 info: Options[0] [Store] There are 128 log pages in memory
05::13::47 info: Options[0] [Store] Using disk segment size of 1g
05::13::47 info: Options[0] [Store] Using hash index size of 256m (4m cache lines)
05::13::47 info: Options[0] [Store] Hash index size is optimized for up to ~16m distinct keys
05::13::47 info: Options[0] [Store] Using hash index max size of 2g, (32m cache lines)
05::13::47 info: Options[0] [Store] Hash index max size is optimized for up to ~128m distinct keys
05::13::47 info: Options[0] [Store] Using log mutable percentage of 80%
05::13::47 info: Options[0] Using device type Native
05::13::47 info: Options[0] [Store] Not using Revivification
05::13::47 info: Options[0] [Object Store] Using page size of 4k
05::13::47 info: Options[0] [Object Store] Each page can hold ~170 key-value pairs of objects
05::13::47 info: Options[0] [Object Store] Using log memory size of 32m
05::13::47 info: Options[0] [Object Store] This can hold ~1398101 key-value pairs of objects in memory total
05::13::47 info: Options[0] [Object Store] There are 8k log pages in memory
05::13::47 info: Options[0] [Object Store] Using disk segment size of 32m
05::13::47 info: Options[0] [Object Store] Using hash index size of 16m (256k cache lines)
05::13::47 info: Options[0] [Object Store] Hash index size is optimized for up to ~1m distinct keys
05::13::47 info: Options[0] [Object Store] Using log mutable percentage of 90%
05::13::47 info: Options[0] [Object Store] Heap memory size is unlimited
05::13::47 info: Options[0] [Object Store] Not using Revivification
05::13::47 info: VectorManager:3:9814da85-8efa-45fe-aa5f-6014860f4dc3[0] Created VectorManager
05::13::47 info: TsavoriteLog [aof][0] ******* Recovered HybridLog Stats *******
05::13::47 info: TsavoriteLog [aof][0] Head Address: 64
05::13::47 info: TsavoriteLog [aof][0] Safe Head Address: 64
05::13::47 info: TsavoriteLog [aof][0] ReadOnly Address: 11344
05::13::47 info: TsavoriteLog [aof][0] Safe ReadOnly Address: 11344
05::13::47 info: TsavoriteLog [aof][0] Tail Address: 11344
05::13::47 info: MultiDatabaseManager[0] Recovered AOF: begin address = 64, tail address = 11344, DB ID: 3
05::13::47 info: MultiDatabaseManager[0] Begin AOF recovery for DB ID: 0
05::13::47 info: MultiDatabaseManager[0] Begin AOF replay for DB ID: 0
05::13::47 info: MultiDatabaseManager[0] Completed full AOF sublog replay of 0 records (DB ID: 0)
05::13::47 info: MultiDatabaseManager[0] AOF Recovery in 0.005562 secs
05::13::47 info: MultiDatabaseManager[0] Total number of replayed records 0 bytes
05::13::47 info: MultiDatabaseManager[0] Throughput 0.00 records/sec
05::13::47 info: MultiDatabaseManager[0] AOF Recovery size 768
05::13::47 info: MultiDatabaseManager[0] AOF Recovery throughput 0.00 GiB/secs
05::13::47 info: MultiDatabaseManager[0] Begin AOF recovery for DB ID: 3
05::13::47 info: MultiDatabaseManager[0] Begin AOF replay for DB ID: 3
05::13::47 info: MultiDatabaseManager[0] Completed full AOF sublog replay of 142 records (DB ID: 3)
05::13::47 info: MultiDatabaseManager[0] AOF Recovery in 0.0307165 secs
05::13::47 info: MultiDatabaseManager[0] Total number of replayed records 142 bytes
05::13::47 info: MultiDatabaseManager[0] Throughput 4,622.92 records/sec
05::13::47 info: MultiDatabaseManager[0] AOF Recovery size 11,280
05::13::47 info: MultiDatabaseManager[0] AOF Recovery throughput 0.00 GiB/secs
05::13::47 fail: MultiDatabaseManager[0] Error during recovery of AofProcessor System.NullReferenceException: Object reference not set to an instance of an object.    at Garnet.server.GarnetLatencyMetricsSession.Return() in /_/libs/server/Metrics/Latency/GarnetLatencyMetricsSession.cs:line 33    at Garnet.server.GarnetServerMonitor.AddMetricsHistorySessionDispose(GarnetSessionMetrics currSessionMetrics, GarnetLatencyMetricsSession currLatencyMetrics) in /_/libs/server/Metrics/GarnetServerMonitor.cs:line 87    at Garnet.server.RespServerSession.Dispose() in /_/libs/server/Resp/RespServerSession.cs:line 387    at Garnet.server.AofProcessor.<Recover>g__RecoverReplay|13_0(GarnetDatabase db, Int64 untilAddress, <>c__DisplayClass13_0&) in /_/libs/server/AOF/AofProcessor.cs:line 183    at Garnet.server.AofProcessor.<Recover>g__RecoverReplay|13_0(GarnetDatabase db, Int64 untilAddress, <>c__DisplayClass13_0&)    at Garnet.server.AofProcessor.Recover(GarnetDatabase db, Int64 untilAddress) in /_/libs/server/AOF/AofProcessor.cs:line 109    at Garnet.server.DatabaseManagerBase.ReplayDatabaseAOF(AofProcessor aofProcessor, GarnetDatabase db, Int64 untilAddress) in /_/libs/server/Databases/DatabaseManagerBase.cs:line 295
05::13::47 info: StoreWrapper[0] NOTE: Compaction will delete files, make sure checkpoint/recovery is not being used
* Ready to accept connections

Expected behavior


3. Expected behavior

All deletion operations should persist correctly after restart:

- List remains empty after `LPOP`
- Sorted set should not contain removed members (`ZREM`)
- Hash fields deleted via `HDEL` should remain deleted
- Behavior should be consistent with `DELETE`

Screenshots

Not available, but can be provided if needed.

Release version

  • Garnet v1.1.1 (win-x64-based-readytorun)
  • Garnet PREVIEW v2.0.0-beta.3 (win-x64-based-readytorun)

IDE

  • Python 3.10
  • redis-py ~= 7.3.0
  • OS: Windows 11

OS version

Windows 11 23H2

Additional context

This issue is consistently reproducible and affects multiple data structures, indicating a potential problem in AOF logging or replay of mutation operations.

Metadata

Metadata

Labels

No labels
No labels

Type

No type

Fields

No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions