Description
Background and Motivation
Currently, hard links to the same file are duplicated in the archive: each link is stored as a separate copy of the file.
Instead, when additional hard links to the same file are encountered, it should be possible to store them as hard links to the first entry.
When hard link entries are present in an archive, the current implementation already extracts them as hard links.
Hard links in tar archives can create difficulties when extracting to file systems that do not support them.
The following API proposal enables a user to:
- Extract hard links as independent file copies instead of creating actual hard links
- Create archives where hard-linked files are stored as separate files
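To make the terminology concrete, the following is a minimal sketch of what "storing a hard link as a link to the first entry" looks like at the entry level. It uses only System.Formats.Tar APIs that already exist (.NET 7 and later) and builds the entries by hand. The class name, method name, and entry names are hypothetical and chosen for illustration; this is not the implementation proposed here.

```csharp
using System.Formats.Tar;
using System.IO;
using System.Text;

// Hypothetical helper, for illustration only.
public static class HardLinkEntrySketch
{
    // Writes one regular file entry plus a second entry that is a hard link to it.
    public static void WriteArchiveWithHardLink(Stream destination)
    {
        using var writer = new TarWriter(destination, TarEntryFormat.Pax, leaveOpen: true);

        // First occurrence: a regular file entry that carries the content.
        var file = new PaxTarEntry(TarEntryType.RegularFile, "data/a.txt")
        {
            DataStream = new MemoryStream(Encoding.UTF8.GetBytes("hello"))
        };
        writer.WriteEntry(file);

        // Additional link: no content, only a LinkName that points at the first entry.
        var link = new PaxTarEntry(TarEntryType.HardLink, "data/b.txt")
        {
            LinkName = "data/a.txt"
        };
        writer.WriteEntry(link);
    }
}
```

With DereferenceHardLinks = true, the second entry would instead be written as another regular file entry with its own copy of the content.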
API Proposal
namespace System.Formats.Tar;

public class TarWriter
{
    // New overload accepting options
    public TarWriter(Stream stream, TarWriterOptions options, bool leaveOpen = false);
}

public static class TarFile
{
    // New overloads accepting options for creation
    public static void CreateFromDirectory(string sourceDirectoryName, Stream destination, TarCreateOptions options);
    public static void CreateFromDirectory(string sourceDirectoryName, string destinationFileName, TarCreateOptions options);
    public static Task CreateFromDirectoryAsync(string sourceDirectoryName, Stream destination, TarCreateOptions options, CancellationToken cancellationToken = default);
    public static Task CreateFromDirectoryAsync(string sourceDirectoryName, string destinationFileName, TarCreateOptions options, CancellationToken cancellationToken = default);

    // New overloads accepting options for extraction
    public static void ExtractToDirectory(Stream source, string destinationDirectoryName, TarExtractOptions options);
    public static void ExtractToDirectory(string sourceFileName, string destinationDirectoryName, TarExtractOptions options);
    public static Task ExtractToDirectoryAsync(Stream source, string destinationDirectoryName, TarExtractOptions options, CancellationToken cancellationToken = default);
    public static Task ExtractToDirectoryAsync(string sourceFileName, string destinationDirectoryName, TarExtractOptions options, CancellationToken cancellationToken = default);
}

// New class
public sealed class TarCreateOptions
{
    // Corresponds to the arg being added in https://github.com/dotnet/runtime/pull/123407
    public TarEntryFormat Format { get; set; } = System.Formats.Tar.TarEntryFormat.Pax;

    // Corresponds to existing CreateFromDirectory argument.
    public bool IncludeBaseDirectory { get; set; } = false;

    /// This value is passed to TarWriterOptions.DereferenceHardLinks.
    public bool DereferenceHardLinks { get; set; } = false;
}

// New class
public sealed class TarWriterOptions
{
    // Corresponds to existing constructor argument.
    public TarEntryFormat Format { get; set; } = System.Formats.Tar.TarEntryFormat.Pax;

    /// When set to true, TarWriter.WriteEntry(string fileName, string? entryName) and TarWriter.WriteEntryAsync(string fileName, string? entryName, CancellationToken)
    /// will store hard-linked files as separate entries in the archive.
    public bool DereferenceHardLinks { get; set; } = false;
}

// New class
public sealed class TarExtractOptions
{
    // Corresponds to existing ExtractToDirectoryAsync argument.
    public bool OverwriteFiles { get; set; } = false;

    /// When set to true, TarEntryType.HardLink entries will be restored as a copy of the linked file instead of creating a hard link.
    public bool DereferenceHardLinks { get; set; } = false;
}
API Usage
Extracting with Dereferenced Hard Links
// Extract tar archive with hard links converted to independent file copies
TarFile.ExtractToDirectory(
    "archive.tar",
    "/path/to/output",
    new TarExtractOptions
    {
        OverwriteFiles = true,
        DereferenceHardLinks = true // Hard links become independent file copies
    }
);
Creating Archives with Dereferenced Hard Links
// Create tar archive storing hard-linked files as separate entries
await TarFile.CreateFromDirectoryAsync(
    "/path/to/source",
    "output.tar",
    new TarCreateOptions
    {
        Format = TarEntryFormat.Pax,
        DereferenceHardLinks = true // Store hard links as separate files
    }
);
Using TarWriter with Options
// File.Create truncates an existing file; File.OpenWrite would leave stale trailing bytes
using var stream = File.Create("archive.tar");
using var writer = new TarWriter(
    stream,
    new TarWriterOptions
    {
        Format = TarEntryFormat.Pax,
        DereferenceHardLinks = true
    }
);
writer.WriteEntry("/path/to/file", "entry-name");
Alternative Designs
- An alternative would be to add a bool dereferenceHardLinks parameter directly to the existing methods. The proposed options classes provide extensibility without requiring additional overloads for future arguments.
- The proposed default is DereferenceHardLinks = false. This is a change from earlier .NET versions, which duplicate the files.
- Instead of adding separate types for TarCreateOptions and TarWriterOptions, a single type could be used, and IncludeBaseDirectory could be "ignored" by the TarWriter constructor.
- Instead of TarCreateOptions duplicating all TarWriterOptions properties, it could reference that type:
// New class
public sealed class TarCreateOptions
{
    // Corresponds to existing CreateFromDirectory argument.
    public bool IncludeBaseDirectory { get; set; } = false;
    public TarWriterOptions TarWriterOptions { get; set; } = new(); // possibly lazy-init during get
}
Alternatively, TarCreateOptions could be replaced by a (bool includeBaseDirectory, TarWriterOptions writerOptions) parameter pair in the TarFile.CreateFromDirectory methods.
Notes
- This proposal does not include TarReaderOptions.DereferenceHardLinks because, when a TarEntry is read via TarReader, the TarEntry.ExtractToFile(string destinationFileName) method has no base directory context to resolve the LinkName and locate the file that should be copied. When using TarFile.ExtractToDirectory, the base directory is known.
- DereferenceHardLinks only applies to hard links. The behavior of symlinks can be made configurable by adding another property to the options classes.
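The first note can be illustrated with a sketch of what a caller of TarReader can already do today when it knows the destination root itself: resolve LinkName against that root and copy the file instead of linking it. This uses only existing APIs (.NET 7 and later). The class and method names are hypothetical, the sketch assumes the linked file appears earlier in the archive than the link, and it deliberately skips symbolic links and other special entry types. It is not the proposed implementation.

```csharp
using System;
using System.Formats.Tar;
using System.IO;

// Hypothetical helper, for illustration only.
public static class ManualDereferenceSketch
{
    public static void ExtractCopyingHardLinks(Stream archive, string destinationDirectory)
    {
        // The caller supplies the base directory that TarEntry.ExtractToFile alone does not know.
        string root = Path.TrimEndingDirectorySeparator(Path.GetFullPath(destinationDirectory));
        Directory.CreateDirectory(root);

        using var reader = new TarReader(archive, leaveOpen: true);
        while (reader.GetNextEntry() is TarEntry entry)
        {
            string target = ResolveInside(root, entry.Name);

            if (entry.EntryType == TarEntryType.Directory)
            {
                Directory.CreateDirectory(target);
                continue;
            }

            Directory.CreateDirectory(Path.GetDirectoryName(target)!);

            if (entry.EntryType == TarEntryType.HardLink)
            {
                // LinkName names another entry in the archive, so resolve it against the same root.
                string source = ResolveInside(root, entry.LinkName);
                File.Copy(source, target, overwrite: true);
            }
            else if (entry.EntryType is TarEntryType.RegularFile or TarEntryType.V7RegularFile)
            {
                entry.ExtractToFile(target, overwrite: true);
            }
            // Symbolic links and other entry types are skipped in this sketch.
        }
    }

    // Rejects paths that would escape the destination root (for example via "..").
    private static string ResolveInside(string root, string relativePath)
    {
        string full = Path.GetFullPath(Path.Combine(root, relativePath));
        if (!full.StartsWith(root + Path.DirectorySeparatorChar, StringComparison.Ordinal))
        {
            throw new IOException($"Entry path '{relativePath}' resolves outside the destination directory.");
        }
        return full;
    }
}
```

Because the root has to be supplied by the caller, the equivalent option fits naturally on TarFile.ExtractToDirectory, where that directory is already a parameter.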