feat(io): Implement Modular I/O Subsystem and Bio-Forge Integration #4

Merged
TKanX merged 56 commits into main from
feature/3-implement-modular-io-subsystem-with-encapsulated-bio-forge-pipeline
Dec 8, 2025
@TKanX TKanX commented Dec 8, 2025

Summary:

Implemented a comprehensive Input/Output subsystem that serves as the interface between external file formats and the internal System model. It introduces a unified API for handling both macromolecular data (via an encapsulated bio-forge pipeline) and small-molecule data. It includes native parsers for chemical formats and writers for simulation engines, so the project can ingest raw structure data and export simulation-ready configurations.

Changes:

  • Designed Unified I/O Architecture:

    • Established ChemReader/ChemWriter for small molecule formats and BioReader/BioWriter for macromolecular structures.
    • Implemented a centralized Error handling system covering I/O, parsing, and conversion failures.
    • Created src/io/mod.rs as the public facade, exporting configuration structs for topology, cleaning, protonation, and solvation.
  • Integrated bio-forge Pipeline:

    • Implemented src/io/util.rs to adapt bio-forge models (Structure, Topology) to local System models.
    • Added support for PDB and mmCIF formats (Read/Write) with automated structure preparation steps (repair, hydrogen addition, solvation).
    • Mapped biological configuration options (pH, salt concentration, histidine strategies) to the underlying engine.
  • Refactored Metadata System:

    • Updated AtomResidueInfo in src/model/metadata.rs to use a Builder Pattern, improving ergonomics for complex residue data.
    • Added StandardResidue, ResidueCategory, and ResiduePosition enums to preserve richer biological context.
  • Implemented Native Chemical Format Support:

    • SDF: Developed a V2000-compliant reader and writer, including coordinate parsing and bond order inference.
    • Mol2: Implemented a parser for TRIPOS Mol2 files, handling @<TRIPOS>ATOM, BOND, and MOLECULE sections.
  • Developed Simulation Output Formats:

    • BGF: Created a writer for the BioDesign/Dreiding BGF format, featuring atom sorting by chain/residue and CONECT record generation.
    • LAMMPS: Implemented a sophisticated writer capable of generating both Data files and Settings files.
      • Supports periodic and non-periodic boundary conditions.
      • Handles complex forcefield parameter mapping for Bonds, Angles, Dihedrals, Impropers, and Non-bonded interactions (LJ/Buckingham).
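
For readers unfamiliar with the fixed-width V2000 layout mentioned under the SDF item, here is a minimal sketch of reading the counts line and one atom line. The column positions follow the V2000 molfile layout; the type `SdfAtom` and the functions `parse_sdf_counts` and `parse_sdf_atom` are hypothetical names for illustration and are not taken from this PR's code.

```rust
/// Hypothetical atom record for illustration; not the project's actual type.
#[derive(Debug, PartialEq)]
struct SdfAtom {
    x: f64,
    y: f64,
    z: f64,
    symbol: String,
}

/// Read the V2000 counts line: atom count in columns 1-3, bond count in columns 4-6.
fn parse_sdf_counts(line: &str) -> Option<(usize, usize)> {
    let atoms = line.get(0..3)?.trim().parse().ok()?;
    let bonds = line.get(3..6)?.trim().parse().ok()?;
    Some((atoms, bonds))
}

/// Read one V2000 atom line: x, y, z occupy three 10-character fields,
/// followed by one space and a 3-character element symbol.
fn parse_sdf_atom(line: &str) -> Option<SdfAtom> {
    let x = line.get(0..10)?.trim().parse().ok()?;
    let y = line.get(10..20)?.trim().parse().ok()?;
    let z = line.get(20..30)?.trim().parse().ok()?;
    let symbol = line.get(31..34)?.trim().to_string();
    if symbol.is_empty() {
        // A blank symbol field means the line is not a usable atom record.
        return None;
    }
    Some(SdfAtom { x, y, z, symbol })
}
```

Returning `Option` keeps the sketch short; a real reader would more likely report the line number and the offending field through the subsystem's error type.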

TKanX added 30 commits December 7, 2025 13:59
…ions for Clean, Protonation, Solvate, and Topology
TKanX added 21 commits December 7, 2025 20:23
@TKanX TKanX self-assigned this Dec 8, 2025
Copilot AI review requested due to automatic review settings December 8, 2025 10:50
@TKanX TKanX added the enhancement ✨ New feature or request label Dec 8, 2025
@TKanX TKanX linked an issue Dec 8, 2025 that may be closed by this pull request
44 tasks

Copilot AI left a comment

Pull request overview

This PR implements a comprehensive I/O subsystem that establishes the project's capability to read and write molecular structure data across multiple file formats. The implementation introduces a clean separation between chemical formats (small molecules) and biological formats (macromolecules), with a unified error handling system and robust bio-forge integration for structure preparation workflows.
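
As a rough illustration of what a unified error type covering I/O, parsing, and conversion failures might look like, here is a sketch using only the standard library. The file summary says the project uses thiserror; this sketch writes the trait impls by hand to stay self-contained. The name `IoError`, its variants, and the helper `parse_atom_count` are assumptions, not the PR's actual definitions.

```rust
use std::fmt;
use std::io;

/// Hypothetical unified error type; variant names and fields are assumed.
#[derive(Debug)]
enum IoError {
    /// Underlying file-system or stream failure.
    Io(io::Error),
    /// Malformed input, with the format name and line number as context.
    Parse { format: &'static str, line: usize, message: String },
    /// Failure while converting between external and internal models.
    Conversion(String),
}

impl fmt::Display for IoError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            IoError::Io(e) => write!(f, "I/O error: {e}"),
            IoError::Parse { format, line, message } => {
                write!(f, "{format} parse error at line {line}: {message}")
            }
            IoError::Conversion(msg) => write!(f, "conversion error: {msg}"),
        }
    }
}

impl std::error::Error for IoError {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            IoError::Io(e) => Some(e),
            _ => None,
        }
    }
}

/// Lets `?` convert `std::io::Error` into the unified type.
impl From<io::Error> for IoError {
    fn from(e: io::Error) -> Self {
        IoError::Io(e)
    }
}

/// Example parser step that attaches format and line context to a failure.
fn parse_atom_count(text: &str, line: usize) -> Result<usize, IoError> {
    text.trim().parse().map_err(|_| IoError::Parse {
        format: "sdf",
        line,
        message: format!("invalid atom count: {:?}", text.trim()),
    })
}
```

With a single error type, every reader and writer can return the same `Result` alias, and callers handle all I/O-layer failures in one `match`.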

Key Changes:

  • Introduced builder pattern for AtomResidueInfo with new biological metadata fields (StandardResidue, ResidueCategory, ResiduePosition enums)
  • Implemented ChemReader/ChemWriter for SDF and Mol2 formats with native parsers
  • Implemented BioReader/BioWriter for PDB and mmCIF with configurable preparation pipelines (cleaning, protonation, solvation)
  • Created sophisticated LAMMPS and BGF writers supporting complex forcefield parameters and system topologies
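
The builder pattern mentioned for AtomResidueInfo could look roughly like the sketch below. Only the type names AtomResidueInfo, ResidueCategory, and ResiduePosition come from the PR; every field, enum variant, default, and method name here is an assumption made for illustration.

```rust
/// Variants are assumed; the PR only names the enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ResidueCategory {
    Standard,
    Hetero,
    Ion,
}

/// Variants are assumed; the PR only names the enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ResiduePosition {
    Internal,
    NTerminal,
    CTerminal,
}

/// Hypothetical field set for illustration.
#[derive(Debug, Clone, PartialEq)]
struct AtomResidueInfo {
    atom_name: String,
    residue_name: String,
    residue_id: i32,
    chain_id: char,
    category: ResidueCategory,
    position: ResiduePosition,
}

/// Builder that starts from required names and fills in defaults.
struct AtomResidueInfoBuilder {
    info: AtomResidueInfo,
}

impl AtomResidueInfo {
    /// Required fields go here; optional ones are set through the builder.
    fn builder(atom_name: &str, residue_name: &str) -> AtomResidueInfoBuilder {
        AtomResidueInfoBuilder {
            info: AtomResidueInfo {
                atom_name: atom_name.to_string(),
                residue_name: residue_name.to_string(),
                residue_id: 1,
                chain_id: 'A',
                category: ResidueCategory::Standard,
                position: ResiduePosition::Internal,
            },
        }
    }
}

impl AtomResidueInfoBuilder {
    fn residue_id(mut self, id: i32) -> Self {
        self.info.residue_id = id;
        self
    }

    fn chain_id(mut self, chain: char) -> Self {
        self.info.chain_id = chain;
        self
    }

    fn category(mut self, category: ResidueCategory) -> Self {
        self.info.category = category;
        self
    }

    fn position(mut self, position: ResiduePosition) -> Self {
        self.info.position = position;
        self
    }

    /// Consume the builder and return the finished value.
    fn build(self) -> AtomResidueInfo {
        self.info
    }
}
```

A call such as `AtomResidueInfo::builder("CA", "ALA").residue_id(12).chain_id('B').build()` then sets only the fields that differ from the defaults, which is the ergonomic gain a builder usually provides over a long positional constructor.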

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/lib.rs | Exports new io module |
| src/model/metadata.rs | Refactored to builder pattern, added biological metadata enums |
| src/io/mod.rs | Public API facade with reader/writer structs and configuration types |
| src/io/util.rs | Bio-forge integration with bidirectional model conversions |
| src/io/error.rs | Centralized error handling with thiserror and contextual information |
| src/io/sdf/* | V2000-compliant SDF reader/writer with element inference |
| src/io/mol2/* | TRIPOS Mol2 parser with section-based parsing |
| src/io/pdb/* | PDB reader/writer with structure preparation pipeline |
| src/io/mmcif/* | mmCIF reader/writer mirroring PDB functionality |
| src/io/bgf/* | BGF writer with chain/residue sorting and CONECT generation |
| src/io/lammps/* | LAMMPS Data/Settings file writer with hybrid style support |
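
The section-based parsing noted for src/io/mol2/* can be sketched as follows: split the file on `@<TRIPOS>` record type indicators, then interpret each section's lines separately. This is an illustrative outline, not the PR's implementation; `split_mol2_sections` and `parse_mol2_atom` are hypothetical names, and the atom field order (id, name, x, y, z, type) follows the TRIPOS Mol2 ATOM record.

```rust
use std::collections::HashMap;

/// Group non-empty, non-comment lines under the `@<TRIPOS>` header that precedes them.
/// Repeated headers are merged, so this sketch only suits single-molecule files.
fn split_mol2_sections(text: &str) -> HashMap<String, Vec<String>> {
    let mut sections: HashMap<String, Vec<String>> = HashMap::new();
    let mut current: Option<String> = None;
    for raw in text.lines() {
        let line = raw.trim_end();
        if let Some(name) = line.strip_prefix("@<TRIPOS>") {
            // A new header starts a section, even if it turns out to be empty.
            let name = name.trim().to_string();
            sections.entry(name.clone()).or_default();
            current = Some(name);
        } else if let Some(name) = &current {
            if !line.trim().is_empty() && !line.starts_with('#') {
                if let Some(lines) = sections.get_mut(name) {
                    lines.push(line.to_string());
                }
            }
        }
    }
    sections
}

/// Parse the leading fields of an ATOM record: id, name, x, y, z, atom type.
/// Returns (name, coordinates, atom type); trailing optional fields are ignored.
fn parse_mol2_atom(line: &str) -> Option<(String, [f64; 3], String)> {
    let mut fields = line.split_whitespace();
    let _id = fields.next()?;
    let name = fields.next()?.to_string();
    let x = fields.next()?.parse().ok()?;
    let y = fields.next()?.parse().ok()?;
    let z = fields.next()?.parse().ok()?;
    let atom_type = fields.next()?.to_string();
    Some((name, [x, y, z], atom_type))
}
```

Separating the section split from per-record parsing keeps each record parser small and lets unknown sections be skipped without special cases.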

Comment thread src/io/lammps/writer.rs Outdated
@TKanX TKanX merged commit 6ebf34e into main Dec 8, 2025
2 checks passed
@TKanX TKanX deleted the feature/3-implement-modular-io-subsystem-with-encapsulated-bio-forge-pipeline branch December 8, 2025 19:39
Development

Successfully merging this pull request may close these issues.

Implement Modular I/O Subsystem with Encapsulated bio-forge Pipeline

2 participants