Bug: IsBinaryFile condition always evaluates to true #702

@ziagham

What version of FlowSynx?

1.2.3

Describe the bug

The current implementation of IsBinaryFile incorrectly flags most files as binary because the condition inside the .Count() expression evaluates to true for many byte values.

File: plugins/FlowSynx.Plugins.LocalFileSystem/Extensions/ConverterExtensions.cs
Method: IsBinaryFile

Current Implementation

private static bool IsBinaryFile(byte[]? data, int sampleSize = 1024)
{
    if (data == null || data.Length == 0)
        return false;

    var checkLength = Math.Min(sampleSize, data.Length);
    var nonPrintableCount = data.Take(checkLength)
        .Count(b => (b < 8 || (b > 13 && b < 32)) && b != 9 && b != 10 && b != 13);

    var threshold = 0.1; // 10% threshold of non-printable characters
    return (double)nonPrintableCount / checkLength > threshold;
}

The condition combines two ranges with exclusions for 9, 10, and 13 that those ranges already rule out, so it behaves inconsistently and often returns true.
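
To make the behavior of the predicate concrete, the following minimal sketch enumerates all 256 byte values and lists the ones the current condition counts as non-printable. It is a standalone console program written with top-level statements, not code from the FlowSynx repository; the names `current`, `withoutExclusions`, `allBytes`, and `counted` are introduced here for illustration only.

```csharp
using System;
using System.Linq;

// Predicate copied from the current IsBinaryFile implementation.
Func<byte, bool> current =
    b => (b < 8 || (b > 13 && b < 32)) && b != 9 && b != 10 && b != 13;

// The same predicate with the three exclusions dropped (illustrative only).
Func<byte, bool> withoutExclusions =
    b => b < 8 || (b > 13 && b < 32);

// Every possible byte value, 0 through 255.
byte[] allBytes = Enumerable.Range(0, 256).Select(i => (byte)i).ToArray();

// Values the current predicate counts as non-printable.
byte[] counted = allBytes.Where(current).ToArray();

// True when both predicates agree on every byte value.
bool exclusionsRedundant = allBytes.All(b => current(b) == withoutExclusions(b));

Console.WriteLine($"Counted values: {string.Join(",", counted)}");
Console.WriteLine($"Total counted: {counted.Length} of 256");
Console.WriteLine($"Exclusions redundant: {exclusionsRedundant}");
```

The counted set is 0 through 7 and 14 through 31 (26 values), and the two predicates agree on every byte, so the `b != 9 && b != 10 && b != 13` exclusions have no effect. Bytes 8, 11, 12, 127, and everything from 128 upward are never counted.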

Expected Behavior

IsBinaryFile should accurately detect whether a byte array represents a binary file, using printable ASCII and common whitespace rules.

Suggested Fix

private static bool IsBinaryFile(byte[]? data, int sampleSize = 1024)
{
    if (data == null || data.Length == 0)
        return false;

    int checkLength = Math.Min(sampleSize, data.Length);

    // Count bytes that are neither printable ASCII nor common whitespace
    // (tab, LF, CR); DEL (127) and above are treated as likely binary data.
    int nonPrintableCount = data.Take(checkLength)
        .Count(b => (b < 32 && b != 9 && b != 10 && b != 13) || b >= 127);

    double threshold = 0.1; // 10% threshold
    return (double)nonPrintableCount / checkLength > threshold;
}

Acceptance Criteria

  • The IsBinaryFile method correctly distinguishes between text and binary files.
  • Text files (e.g., .txt, .csv, .json, .xml, .cs) should return false.
  • Binary files (e.g., .exe, .png, .jpg, .zip, .dll) should return true.
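
The criteria above can be spot-checked with a small console sketch. It assumes the counting rule from the suggested fix, reproduced here as a local function so the example is self-contained. The sample inputs (`asciiCsv`, `pngHeader`, `utf8Text`) are hypothetical and are not taken from the FlowSynx test suite.

```csharp
using System;
using System.Linq;
using System.Text;

// Same counting rule as the suggested fix, written as a local function
// so this sketch can run on its own.
static bool IsBinaryFile(byte[] data, int sampleSize = 1024)
{
    if (data == null || data.Length == 0)
        return false;

    int checkLength = Math.Min(sampleSize, data.Length);

    // Control bytes other than tab/LF/CR, plus DEL (127) and above.
    int nonPrintableCount = data.Take(checkLength)
        .Count(b => (b < 32 && b != 9 && b != 10 && b != 13) || b >= 127);

    return (double)nonPrintableCount / checkLength > 0.1;
}

// Hypothetical sample: plain ASCII CSV content.
byte[] asciiCsv = Encoding.ASCII.GetBytes("id,name\n1,Alice\n2,Bob\n");

// Hypothetical sample: the PNG signature followed by the start of an IHDR chunk.
byte[] pngHeader =
{
    0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A,
    0x00, 0x00, 0x00, 0x0D, 0x49, 0x48, 0x44, 0x52
};

// Hypothetical sample: UTF-8 text made mostly of non-ASCII characters.
byte[] utf8Text = Encoding.UTF8.GetBytes("Привет, мир");

Console.WriteLine($"ASCII CSV:  {IsBinaryFile(asciiCsv)}");
Console.WriteLine($"PNG header: {IsBinaryFile(pngHeader)}");
Console.WriteLine($"UTF-8 text: {IsBinaryFile(utf8Text)}");
```

Under these assumptions the ASCII sample returns false and the PNG header returns true. The UTF-8 sample also returns true, because every byte of a multi-byte UTF-8 sequence is 128 or above. Whether text files with heavy non-ASCII content must return false is something the criteria may need to state explicitly.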
