This task focuses on building the complete Input/Output (io) subsystem for dreid-forge. The architecture will follow a highly modular, Facade-based design, where each file format is handled by its own dedicated submodule, further broken down into reader and writer components. A key requirement is the complete encapsulation of the bio-forge library, which will be used internally to handle complex biological file formats (PDB, mmCIF) and their preparation pipeline (repair, protonation, topology generation). For standard chemical formats (SDF, MOL2), the readers will perform direct parsing of topology. The entire subsystem will be exposed through a clean, high-level API in src/io/mod.rs, providing a unified interface for all data ingestion and serialization tasks.
Tasks:
Phase 1: Establish I/O Module Architecture
Create the full directory structure: src/io/, src/io/error.rs, src/io/util.rs, and subdirectories for pdb, mmcif, sdf, mol2, lammps, and bgf, each with reader.rs and/or writer.rs.
In src/io/error.rs: Define the io::Error enum using thiserror to handle file parsing, I/O operations, missing metadata, and errors propagated from the internal bio-forge library.
In src/io/mod.rs:
Define the public-facing configuration structs: BioReadConfig and ProtonationConfig.
Implement the top-level API functions: read_structure, write_structure, read_template, and write_lammps_package.
Define the WritableStructure trait to allow write_structure to accept both System and ForgedSystem.
Re-export public types and functions for a clean user-facing module.
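The Phase 1 pieces could fit together roughly as sketched here. This is only an illustration under stated assumptions: the variant names, the `Format` enum, the `has_bio_metadata` flag, and the `system()` accessor are hypothetical, and the `Display` and `From` impls are written by hand (roughly what `thiserror` would derive) so the block compiles with the standard library alone.

```rust
use std::fmt;
use std::io::{self, Write};

/// Hypothetical error type covering the four categories named above.
#[derive(Debug)]
pub enum Error {
    Io(io::Error),
    Parse { format: &'static str, line: usize, message: String },
    MissingMetadata(&'static str),
    /// Stands in for errors propagated from the internal preparation library.
    Preparation(String),
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::Io(e) => write!(f, "I/O error: {}", e),
            Error::Parse { format, line, message } => {
                write!(f, "failed to parse {} at line {}: {}", format, line, message)
            }
            Error::MissingMetadata(what) => {
                write!(f, "missing metadata required by {} writer", what)
            }
            Error::Preparation(msg) => write!(f, "structure preparation failed: {}", msg),
        }
    }
}

impl std::error::Error for Error {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            Error::Io(e) => Some(e),
            _ => None,
        }
    }
}

impl From<io::Error> for Error {
    fn from(e: io::Error) -> Self {
        Error::Io(e)
    }
}

/// Simplified stand-ins for the real model types.
pub struct System {
    pub atoms: Vec<String>,
    pub has_bio_metadata: bool,
}
pub struct ForgedSystem {
    pub system: System,
}

/// Anything that can hand out a `System` can be written.
pub trait WritableStructure {
    fn system(&self) -> &System;
}
impl WritableStructure for System {
    fn system(&self) -> &System {
        self
    }
}
impl WritableStructure for ForgedSystem {
    fn system(&self) -> &System {
        &self.system
    }
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Format {
    Pdb,
    Sdf,
}

/// Facade entry point: one function accepts both structure kinds.
pub fn write_structure<S: WritableStructure, W: Write>(
    mut out: W,
    structure: &S,
    format: Format,
) -> Result<(), Error> {
    let system = structure.system();
    match format {
        // Biological formats need residue/chain metadata to be present.
        Format::Pdb if !system.has_bio_metadata => Err(Error::MissingMetadata("PDB")),
        _ => {
            // Placeholder body; a real writer would dispatch to the format submodule.
            writeln!(out, "{} atoms", system.atoms.len())?;
            Ok(())
        }
    }
}
```

Keeping the trait to a single accessor means format writers only ever see a `System`, which is one way to avoid duplicating writer logic for `ForgedSystem`.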
Phase 2: Implement Core Conversion Layer
In src/io/util.rs:
Implement the from_bio_topology function to convert a fully processed bio_forge::Topology into our model::System. This is the primary bridge from bio-forge.
Implement the to_bio_topology function to convert our model::System (with BioMetadata) back into a bio_forge::Topology. This is the primary bridge to bio-forge for writing.
Implement necessary helper functions for converting enums (Element, BondOrder) between the two crates to ensure type safety.
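One possible shape for the enum helpers is a pair of `From`/`TryFrom` impls with exhaustive matches, so that a variant added on either side becomes a compile error rather than a silent mismatch. In this sketch the `external` module is a stand-in for the other crate's enums, not bio-forge's real definitions, and the variant sets and the `UnsupportedElement` type are assumptions.

```rust
use std::convert::TryFrom;

/// Stand-in for the enums exposed by the other crate (hypothetical variants).
pub mod external {
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum BondOrder { Single, Double, Triple, Aromatic }

    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Element { H, C, N, O, Dummy }
}

/// Stand-in for our own model enums.
pub mod model {
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum BondOrder { Single, Double, Triple, Aromatic }

    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Element { H, C, N, O }
}

/// Returned when the other crate has a value our model cannot represent.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct UnsupportedElement(pub external::Element);

// Total conversion: every variant has a counterpart, so `From` is enough.
impl From<external::BondOrder> for model::BondOrder {
    fn from(order: external::BondOrder) -> Self {
        match order {
            external::BondOrder::Single => model::BondOrder::Single,
            external::BondOrder::Double => model::BondOrder::Double,
            external::BondOrder::Triple => model::BondOrder::Triple,
            external::BondOrder::Aromatic => model::BondOrder::Aromatic,
        }
    }
}

impl From<model::BondOrder> for external::BondOrder {
    fn from(order: model::BondOrder) -> Self {
        match order {
            model::BondOrder::Single => external::BondOrder::Single,
            model::BondOrder::Double => external::BondOrder::Double,
            model::BondOrder::Triple => external::BondOrder::Triple,
            model::BondOrder::Aromatic => external::BondOrder::Aromatic,
        }
    }
}

// Partial conversion: the external enum has a variant the model lacks,
// so the mismatch is surfaced as an error instead of being hidden.
impl TryFrom<external::Element> for model::Element {
    type Error = UnsupportedElement;

    fn try_from(element: external::Element) -> Result<Self, Self::Error> {
        match element {
            external::Element::H => Ok(model::Element::H),
            external::Element::C => Ok(model::Element::C),
            external::Element::N => Ok(model::Element::N),
            external::Element::O => Ok(model::Element::O),
            other => Err(UnsupportedElement(other)),
        }
    }
}
```

The bond-order matches deliberately avoid a wildcard arm; the element conversion uses one only because the example assumes the two variant sets differ.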
Phase 3: Implement Biological Format Readers (PDB & mmCIF)
In src/pdb/reader.rs:
Implement the read function that orchestrates the full bio-forge pipeline:
Call bio_forge::io::read_pdb_structure.
Conditionally apply repair and protonation based on BioReadConfig.
Build the topology using bio_forge::ops::TopologyBuilder (a mandatory step to get bonds).
Convert the final bio_forge::Topology to model::System using the util layer.
In src/mmcif/reader.rs:
Implement the read function following the same pipeline as the PDB reader.
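The ordering of the reader pipeline can be sketched with stubbed stages. Nothing below calls bio-forge: `parse`, `repair`, `protonate`, and `build_topology` are local placeholders that only record which stage ran, and the fields of `BioReadConfig` and `ProtonationConfig` are assumed for illustration.

```rust
/// Hypothetical configuration shapes; the real fields may differ.
#[derive(Debug, Clone)]
pub struct ProtonationConfig {
    pub ph: f64,
}

#[derive(Debug, Clone)]
pub struct BioReadConfig {
    pub repair: bool,
    pub protonation: Option<ProtonationConfig>,
}

/// Placeholder intermediate types that just log the stages applied.
#[derive(Debug, Default)]
pub struct Structure {
    pub stages: Vec<&'static str>,
}

#[derive(Debug)]
pub struct Topology {
    pub stages: Vec<&'static str>,
}

#[derive(Debug)]
pub struct System {
    pub stages: Vec<&'static str>,
    pub has_bonds: bool,
}

fn parse(_text: &str) -> Structure {
    Structure { stages: vec!["parse"] }
}

fn repair(structure: &mut Structure) {
    structure.stages.push("repair");
}

fn protonate(structure: &mut Structure, _config: &ProtonationConfig) {
    structure.stages.push("protonate");
}

fn build_topology(mut structure: Structure) -> Topology {
    structure.stages.push("topology");
    Topology { stages: structure.stages }
}

fn from_bio_topology(topology: Topology) -> System {
    // Bonds only exist once the topology stage has run.
    let has_bonds = topology.stages.contains(&"topology");
    System { stages: topology.stages, has_bonds }
}

/// Reader pipeline: optional repair and protonation, then the
/// unconditional topology build, then conversion to our model.
pub fn read(text: &str, config: &BioReadConfig) -> System {
    let mut structure = parse(text);
    if config.repair {
        repair(&mut structure);
    }
    if let Some(protonation) = &config.protonation {
        protonate(&mut structure, protonation);
    }
    let topology = build_topology(structure);
    from_bio_topology(topology)
}
```

Because the PDB and mmCIF readers share every step after parsing, a shared helper taking the parsed structure would be one way to avoid duplicating this sequence.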
Phase 4: Implement Chemical Format Readers (SDF & MOL2)
In src/sdf/reader.rs:
Implement a direct-to-System parser for SDF/MOL format. It should read atom elements, coordinates, and the connection table (CT block) to populate System.atoms and System.bonds. BioMetadata will be None.
In src/mol2/reader.rs:
Implement a direct parser for MOL2 format molecules.
In src/io/mod.rs:
Implement the read_template wrapper around bio_forge::io::read_mol2_template as specified.
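For the SDF reader, a minimal direct parser for a single V2000 molfile record might look like the following. It reads the counts line, the atom block, and the bond block by fixed column positions, as the V2000 layout prescribes. The `Atom`, `Bond`, `System`, and `ParseError` types are simplified stand-ins, and charges, properties, data fields, and multi-record files are deliberately left out.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, PartialEq)]
pub struct Atom { pub element: String, pub position: [f64; 3] }

#[derive(Debug, Clone, PartialEq)]
pub struct Bond { pub i: usize, pub j: usize, pub order: u8 }

#[derive(Debug, Default, PartialEq)]
pub struct System { pub atoms: Vec<Atom>, pub bonds: Vec<Bond> }

#[derive(Debug, PartialEq)]
pub struct ParseError { pub line: usize, pub message: String }

fn fail(line: usize, message: &str) -> ParseError {
    ParseError { line, message: message.to_string() }
}

/// Parse the fixed-width column range `start..end` of `text`.
/// The end is clamped so lines with trimmed trailing blanks still work.
fn field<T: FromStr>(text: &str, start: usize, end: usize, line: usize, what: &str)
    -> Result<T, ParseError>
{
    text.get(start..end.min(text.len()))
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .and_then(|s| s.parse().ok())
        .ok_or_else(|| fail(line, what))
}

pub fn parse_molfile(text: &str) -> Result<System, ParseError> {
    let lines: Vec<&str> = text.lines().collect();
    // Lines 1-3 are the header; line 4 holds the atom and bond counts.
    let counts = lines.get(3).ok_or_else(|| fail(4, "missing counts line"))?;
    let n_atoms: usize = field(counts, 0, 3, 4, "bad atom count")?;
    let n_bonds: usize = field(counts, 3, 6, 4, "bad bond count")?;

    let mut system = System::default();
    for k in 0..n_atoms {
        let no = 5 + k; // 1-based line number for error reporting
        let l = lines.get(4 + k).ok_or_else(|| fail(no, "missing atom line"))?;
        system.atoms.push(Atom {
            position: [
                field(l, 0, 10, no, "bad x coordinate")?,
                field(l, 10, 20, no, "bad y coordinate")?,
                field(l, 20, 30, no, "bad z coordinate")?,
            ],
            element: field(l, 31, 34, no, "bad element symbol")?,
        });
    }
    for k in 0..n_bonds {
        let no = 5 + n_atoms + k;
        let l = lines.get(4 + n_atoms + k).ok_or_else(|| fail(no, "missing bond line"))?;
        let i: usize = field(l, 0, 3, no, "bad first atom index")?;
        let j: usize = field(l, 3, 6, no, "bad second atom index")?;
        if i == 0 || j == 0 || i > n_atoms || j > n_atoms {
            return Err(fail(no, "bond references unknown atom"));
        }
        // The file uses 1-based atom indices; the model here uses 0-based.
        system.bonds.push(Bond { i: i - 1, j: j - 1, order: field(l, 6, 9, no, "bad bond type")? });
    }
    Ok(system)
}
```

Slicing by column rather than splitting on whitespace matters once a molecule has 100 or more atoms, because the three-character count and index fields then run together without separators.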
Phase 5: Implement All Writers
In src/pdb/writer.rs and src/mmcif/writer.rs:
Implement write functions that check for BioMetadata, convert the System to bio_forge::Topology, and call the corresponding bio-forge writer.
In src/bgf/writer.rs:
Implement a write function for the BGF format, leveraging the bio-forge writer.
In src/sdf/writer.rs and src/mol2/writer.rs:
Implement writers for standard chemical formats.
In src/lammps/writer.rs:
Implement the write function to generate the *.data and *.in.settings file pair.
The settings writer must implement the "smart" if/else logic to adapt to user-defined boundary conditions.
The data writer must correctly map ForgedSystem to all required LAMMPS sections, including Masses, Atoms (with molecule IDs), and topology sections with type IDs.
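The mapping onto LAMMPS data sections can be sketched as follows, assuming `atom_style full` (whose atom lines read atom-ID, molecule-ID, atom-type, charge, x, y, z). The `ForgedSystem` layout here is hypothetical, and only the Masses, Atoms, and Bonds sections are shown; angles, dihedrals, and impropers would follow the same pattern of a count in the header plus a section of type-ID lines.

```rust
use std::fmt::{self, Write};

/// Hypothetical, simplified view of a parameterized system.
pub struct ForgedAtom {
    pub type_id: usize,     // 1-based LAMMPS atom type
    pub molecule_id: usize, // 1-based molecule ID
    pub charge: f64,
    pub position: [f64; 3],
}

pub struct ForgedBond {
    pub type_id: usize,    // 1-based LAMMPS bond type
    pub atoms: [usize; 2], // 0-based indices into `ForgedSystem::atoms`
}

pub struct ForgedSystem {
    pub masses: Vec<f64>, // masses[t] belongs to atom type t + 1
    pub bond_type_count: usize,
    pub atoms: Vec<ForgedAtom>,
    pub bonds: Vec<ForgedBond>,
    pub box_lo: [f64; 3],
    pub box_hi: [f64; 3],
}

pub fn write_data(sys: &ForgedSystem) -> Result<String, fmt::Error> {
    let mut out = String::new();

    // The first line of a data file is a free-form title.
    writeln!(out, "LAMMPS data file (illustrative sketch)\n")?;
    writeln!(out, "{} atoms", sys.atoms.len())?;
    writeln!(out, "{} bonds", sys.bonds.len())?;
    writeln!(out, "{} atom types", sys.masses.len())?;
    writeln!(out, "{} bond types\n", sys.bond_type_count)?;

    for (axis, name) in ["x", "y", "z"].iter().enumerate() {
        writeln!(out, "{:.6} {:.6} {}lo {}hi", sys.box_lo[axis], sys.box_hi[axis], name, name)?;
    }

    writeln!(out, "\nMasses\n")?;
    for (t, mass) in sys.masses.iter().enumerate() {
        writeln!(out, "{} {:.4}", t + 1, mass)?;
    }

    writeln!(out, "\nAtoms # full\n")?;
    for (i, atom) in sys.atoms.iter().enumerate() {
        let [x, y, z] = atom.position;
        writeln!(
            out,
            "{} {} {} {:.6} {:.6} {:.6} {:.6}",
            i + 1, atom.molecule_id, atom.type_id, atom.charge, x, y, z
        )?;
    }

    writeln!(out, "\nBonds\n")?;
    for (i, bond) in sys.bonds.iter().enumerate() {
        // Convert 0-based model indices to 1-based LAMMPS atom IDs.
        writeln!(out, "{} {} {} {}", i + 1, bond.type_id, bond.atoms[0] + 1, bond.atoms[1] + 1)?;
    }

    Ok(out)
}
```

Building the text in a `String` first keeps the section logic testable without touching the filesystem; the caller can then write it next to the settings file.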
Phase 6: Verification
Add unit tests for each reader and writer to ensure format correctness.
Create integration tests that read a file, process it through a mock forge pipeline, and write it out, ensuring data integrity.
Verify that LAMMPS can run the water molecule test case directly from the generated output files, without manual modification.