Architect Core Data Models and Unified System Representation for dreid-forge #1

@TKanX

Description:

This foundational task establishes the high-performance data architecture for dreid-forge, the orchestration engine for DREIDING force field parameterization. The goal is a unified, type-safe data model that can represent both biological macromolecules (imported with rich metadata from bio-forge) and arbitrary chemical systems, from small molecules to crystals. The design strictly separates the pure chemical structure (input/intermediate state) from the parameterized force field topology (output state), so data flows in one direction only. The task covers the core enums, the unified System container, and the ForgedSystem structure, which holds the final atom types, partial charges, and all DREIDING potential terms (bonds, angles, torsions, inversions, van der Waals, and hydrogen bonds).

Tasks:

  • Phase 1: Project Initialization

    • Initialize a new Rust library project named dreid-forge.
    • Configure Cargo.toml with crate metadata (description, keywords: force-field, dreiding, molecular-dynamics, parameterization), authors, and license.
    • Create the primary module layout: the src/model, src/io, and src/forge directories plus the src/constants.rs and src/error.rs files.
    • Add lightweight dependencies: thiserror (derive-based error types) and serde (for optional serialization). Note: Keep the model free of heavy math crates; use primitive arrays for coordinates.
  • Phase 2: Implement Fundamental Chemical Types

    • In src/model/types.rs:
      • Define the Element enum (#[repr(u8)], covering Z = 1 through 118 plus Unknown). Implement FromStr and Display.
      • Implement an atomic_mass() method for Element.
      • Define the BondOrder enum (Single, Double, Triple, Aromatic, Amide).
  • Phase 3: Define Unified Input System

    • In src/model/atom.rs:
      • Define the Atom struct containing element and position ([f64; 3]).
    • In src/model/metadata.rs:
      • Define AtomResidueInfo to store bio-specific data (atom name, residue name, residue ID, chain ID, insertion code).
      • Define BioMetadata container holding a vector of AtomResidueInfo.
    • In src/model/system.rs:
      • Define the System struct:
        • atoms: Vec<Atom>
        • bonds: Vec<Bond> (referencing atom indices).
        • box_vectors: Option<[[f64; 3]; 3]> for PBC support.
        • bio_metadata: Option<BioMetadata> (present only for biological workflows).
  • Phase 4: Implement Parameterized Output Models

    • In src/model/topology.rs:
      • Define the AtomParam struct (per-atom properties: charge, mass, type_index).
      • Define the ForgedSystem struct (the final artifact):
        • system: The underlying System.
        • atom_types: Vec<String> (Unique list of DREIDING types).
        • atom_properties: Vec<AtomParam>.
        • potentials: Potentials container.
      • Define the Potentials container holding vectors for all interaction terms.
      • Implement the interaction enums (strictly following the DREIDING equations):
        • Bonded (Index-based):
          • BondPotential: Harmonic and Morse variants.
          • AnglePotential: Cosine-Harmonic (Eq 10a) and Theta-Harmonic (Eq 11).
          • DihedralPotential: Cosine form with barrier, periodicity, and phase (Eq 13).
          • ImproperPotential: Planar and Umbrella/Inversion forms (Eq 28/29).
        • Non-Bonded (Type-based):
          • VdwPairPotential: Stores pre-combined parameters for atom-type pairs (Lennard-Jones 12-6 and X6 exponential-6 variants).
        • H-Bond (Index-based):
          • HBondPotential: Explicit Donor-Hydrogen-Acceptor triplets with well depth and equilibrium distance (Eq 38).
  • Phase 5: API Exposure

    • In src/model/mod.rs:
      • Re-export all submodules to form a clean public API surface.
    • Tests:
      • Write unit tests to verify System construction and BioMetadata optionality.
      • Verify that the memory layouts of Atom and ForgedSystem contain no unnecessary padding or bloat.
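
The Phase 2 types could look roughly like the following sketch. It is abbreviated: only six of the planned 119 Element variants are shown, and the discriminant doubles as the atomic number Z. The use of 0 and the symbol "X" for Unknown, the ParseElementError type, and the symbol() helper are assumptions, not part of the plan. The error type implements Display and std::error::Error by hand so the block compiles without dependencies; with thiserror, #[derive(thiserror::Error)] plus an #[error("...")] attribute would generate equivalent impls.

```rust
use std::fmt;
use std::str::FromStr;

/// Chemical element; the discriminant equals the atomic number Z.
/// Abbreviated sketch: the planned enum covers Z = 1 through 118.
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Element {
    /// Fallback for unrecognized atoms (discriminant 0 is an assumption).
    Unknown = 0,
    H = 1,
    C = 6,
    N = 7,
    O = 8,
    S = 16,
}

impl Element {
    /// Symbol as written in structure files ("X" for Unknown is hypothetical).
    pub fn symbol(self) -> &'static str {
        match self {
            Element::Unknown => "X",
            Element::H => "H",
            Element::C => "C",
            Element::N => "N",
            Element::O => "O",
            Element::S => "S",
        }
    }

    /// Conventional standard atomic weight in g/mol; 0.0 for Unknown.
    pub fn atomic_mass(self) -> f64 {
        match self {
            Element::Unknown => 0.0,
            Element::H => 1.008,
            Element::C => 12.011,
            Element::N => 14.007,
            Element::O => 15.999,
            Element::S => 32.06,
        }
    }
}

/// Hypothetical error returned when a symbol is not recognized.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ParseElementError(pub String);

impl fmt::Display for ParseElementError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unknown element symbol: {}", self.0)
    }
}

impl std::error::Error for ParseElementError {}

impl FromStr for Element {
    type Err = ParseElementError;

    /// Exact, case-sensitive match after trimming whitespace.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim() {
            "X" => Ok(Element::Unknown),
            "H" => Ok(Element::H),
            "C" => Ok(Element::C),
            "N" => Ok(Element::N),
            "O" => Ok(Element::O),
            "S" => Ok(Element::S),
            other => Err(ParseElementError(other.to_string())),
        }
    }
}

impl fmt::Display for Element {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.symbol())
    }
}

/// Bond classification as listed in the plan.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BondOrder {
    Single,
    Double,
    Triple,
    Aromatic,
    Amide,
}
```

Because the enum is #[repr(u8)], each Element occupies a single byte, which keeps the Atom struct compact.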
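
A minimal sketch of the Phase 3 input model follows. The plan fixes only the System field names; the Bond layout (two atom indices plus a BondOrder), the field types chosen for AtomResidueInfo, and the validate() helper are hypothetical. Element is a trimmed stand-in for the Phase 2 enum so the block compiles on its own.

```rust
/// Trimmed stand-in for the Phase 2 `Element` enum (discriminant = Z).
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Element {
    Unknown = 0,
    H = 1,
    C = 6,
    O = 8,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BondOrder {
    Single,
    Double,
    Triple,
    Aromatic,
    Amide,
}

/// One atom: element plus Cartesian position as a primitive array.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Atom {
    pub element: Element,
    pub position: [f64; 3],
}

/// Hypothetical bond layout: indices into `System::atoms` plus an order.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Bond {
    pub i: usize,
    pub j: usize,
    pub order: BondOrder,
}

/// Bio-specific per-atom data; the field types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct AtomResidueInfo {
    pub atom_name: String,
    pub residue_name: String,
    pub residue_id: i32,
    pub chain_id: char,
    pub insertion_code: Option<char>,
}

/// One `AtomResidueInfo` per atom, in the same order as `System::atoms`.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct BioMetadata {
    pub atom_info: Vec<AtomResidueInfo>,
}

#[derive(Debug, Clone, Default, PartialEq)]
pub struct System {
    pub atoms: Vec<Atom>,
    pub bonds: Vec<Bond>,
    pub box_vectors: Option<[[f64; 3]; 3]>,
    pub bio_metadata: Option<BioMetadata>,
}

impl System {
    /// Hypothetical consistency check: bonds reference two distinct, existing
    /// atoms, and bio metadata (when present) has one entry per atom.
    pub fn validate(&self) -> Result<(), String> {
        let n = self.atoms.len();
        for (k, b) in self.bonds.iter().enumerate() {
            if b.i >= n || b.j >= n || b.i == b.j {
                return Err(format!("bond {k} references invalid atoms ({}, {})", b.i, b.j));
            }
        }
        if let Some(meta) = &self.bio_metadata {
            if meta.atom_info.len() != n {
                return Err(format!("bio metadata has {} entries for {n} atoms", meta.atom_info.len()));
            }
        }
        Ok(())
    }
}
```

On typical 64-bit targets Atom occupies 32 bytes (24 for the coordinates, 1 for the element, 7 of alignment padding), the kind of figure a Phase 5 layout test could pin down with std::mem::size_of.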
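
The bonded Phase 4 terms might be sketched as index-based enums, each with an energy() method. The formulas in the comments are the generic harmonic, Morse, cosine-harmonic, theta-harmonic, and cosine-torsion forms; prefactor conventions and the mapping onto the DREIDING equation numbers cited in the plan should be checked against the original paper before use. All field names (k, r0, de, alpha, c, k_theta, theta0, v, n, phi0, k_inv, psi0) are assumptions, and ImproperPotential is shown as data only.

```rust
/// Bond-stretch terms; `i`, `j` index into `System::atoms`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum BondPotential {
    /// E = 1/2 * k * (r - r0)^2
    Harmonic { i: usize, j: usize, k: f64, r0: f64 },
    /// E = de * (1 - exp(-alpha * (r - r0)))^2 (textbook Morse form)
    Morse { i: usize, j: usize, de: f64, r0: f64, alpha: f64 },
}

impl BondPotential {
    /// Energy at bond length `r` (same length unit as `r0`).
    pub fn energy(&self, r: f64) -> f64 {
        match *self {
            BondPotential::Harmonic { k, r0, .. } => 0.5 * k * (r - r0).powi(2),
            BondPotential::Morse { de, r0, alpha, .. } => {
                de * (1.0 - (-alpha * (r - r0)).exp()).powi(2)
            }
        }
    }
}

/// Angle-bend terms for the triplet i-j-k (j is the vertex); angles in radians.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum AnglePotential {
    /// E = 1/2 * c * (cos(theta) - cos(theta0))^2
    CosineHarmonic { i: usize, j: usize, k: usize, c: f64, theta0: f64 },
    /// E = 1/2 * k_theta * (theta - theta0)^2
    ThetaHarmonic { i: usize, j: usize, k: usize, k_theta: f64, theta0: f64 },
}

impl AnglePotential {
    pub fn energy(&self, theta: f64) -> f64 {
        match *self {
            AnglePotential::CosineHarmonic { c, theta0, .. } => {
                0.5 * c * (theta.cos() - theta0.cos()).powi(2)
            }
            AnglePotential::ThetaHarmonic { k_theta, theta0, .. } => {
                0.5 * k_theta * (theta - theta0).powi(2)
            }
        }
    }
}

/// Torsion about the j-k bond: E = 1/2 * v * (1 - cos(n * (phi - phi0))).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct DihedralPotential {
    pub i: usize,
    pub j: usize,
    pub k: usize,
    pub l: usize,
    pub v: f64,    // barrier height
    pub n: u8,     // periodicity
    pub phi0: f64, // phase, radians
}

impl DihedralPotential {
    pub fn energy(&self, phi: f64) -> f64 {
        0.5 * self.v * (1.0 - (f64::from(self.n) * (phi - self.phi0)).cos())
    }
}

/// Inversion terms with central atom `i`; energy evaluation omitted here.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ImproperPotential {
    Planar { i: usize, j: usize, k: usize, l: usize, k_inv: f64 },
    Umbrella { i: usize, j: usize, k: usize, l: usize, k_inv: f64, psi0: f64 },
}
```

Storing atom indices directly in each variant keeps every term self-describing, so a writer for an MD engine can iterate one vector per term type without extra lookups.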
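
The type-based van der Waals terms and the index-based hydrogen bond term could be sketched the same way. This assumes the D0/R0 parameterization, where the 12-6 form is D0[(R0/r)^12 - 2(R0/r)^6] and the exponential-6 form adds a shape parameter zeta, and it assumes the 12-10 hydrogen bond form with a cos^4 angular factor. Both assumptions, and the field names, should be verified against Eq 38 and the van der Waals section of the DREIDING paper.

```rust
/// Pre-combined parameters for a pair of atom types
/// (`type_i`, `type_j` index into `ForgedSystem::atom_types`).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum VdwPairPotential {
    /// E = d0 * [(r0/r)^12 - 2 * (r0/r)^6]
    LennardJones { type_i: usize, type_j: usize, d0: f64, r0: f64 },
    /// E = d0 * [6/(zeta-6) * exp(zeta * (1 - r/r0)) - zeta/(zeta-6) * (r0/r)^6]
    Exp6 { type_i: usize, type_j: usize, d0: f64, r0: f64, zeta: f64 },
}

impl VdwPairPotential {
    /// Pair energy at separation `r` (same length unit as `r0`).
    pub fn energy(&self, r: f64) -> f64 {
        match *self {
            VdwPairPotential::LennardJones { d0, r0, .. } => {
                let s6 = (r0 / r).powi(6);
                d0 * (s6 * s6 - 2.0 * s6)
            }
            VdwPairPotential::Exp6 { d0, r0, zeta, .. } => {
                let rho = r / r0;
                d0 * (6.0 / (zeta - 6.0) * (zeta * (1.0 - rho)).exp()
                    - zeta / (zeta - 6.0) * rho.powi(-6))
            }
        }
    }
}

/// Explicit donor-hydrogen-acceptor triplet (indices into `System::atoms`).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct HBondPotential {
    pub donor: usize,
    pub hydrogen: usize,
    pub acceptor: usize,
    pub d0: f64, // well depth
    pub r0: f64, // equilibrium donor-acceptor distance
}

impl HBondPotential {
    /// Assumed form: d0 * [5 (r0/r)^12 - 6 (r0/r)^10] * cos^4(theta), where `r` is
    /// the donor-acceptor distance and `theta` the D-H...A angle in radians.
    pub fn energy(&self, r: f64, theta: f64) -> f64 {
        let s = self.r0 / r;
        self.d0 * (5.0 * s.powi(12) - 6.0 * s.powi(10)) * theta.cos().powi(4)
    }
}
```

Both van der Waals forms reach their minimum of -D0 at r = R0, and the hydrogen bond term reaches -D0 at r = R0 with a linear D-H...A geometry, which gives cheap sanity checks for unit tests.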
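
The ForgedSystem container itself can be sketched as a struct that takes ownership of the input System, so a parameterized output can only be built by consuming the pure chemical structure. In this sketch the constructor ForgedSystem::new and its checks are hypothetical, System is reduced to positions, and Potentials keeps a single trimmed bond list (BondTerm) in place of the full set of Phase 4 enums.

```rust
/// Trimmed stand-in for the Phase 3 `System` (positions only).
#[derive(Debug, Clone, Default, PartialEq)]
pub struct System {
    pub positions: Vec<[f64; 3]>,
}

/// Per-atom parameters produced by typing and charge assignment.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct AtomParam {
    pub charge: f64,
    pub mass: f64,
    pub type_index: usize, // index into ForgedSystem::atom_types
}

/// Trimmed stand-in for the Phase 4 bond enum (indices only).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BondTerm {
    pub i: usize,
    pub j: usize,
}

#[derive(Debug, Clone, Default, PartialEq)]
pub struct Potentials {
    pub bonds: Vec<BondTerm>,
    // angles, dihedrals, impropers, vdw_pairs and h_bonds would follow the same pattern.
}

#[derive(Debug, Clone, PartialEq)]
pub struct ForgedSystem {
    pub system: System,
    pub atom_types: Vec<String>,
    pub atom_properties: Vec<AtomParam>,
    pub potentials: Potentials,
}

impl ForgedSystem {
    /// Hypothetical constructor: consumes the input `System` and checks that every
    /// atom has one `AtomParam` with a valid type and every bond hits real atoms.
    pub fn new(
        system: System,
        atom_types: Vec<String>,
        atom_properties: Vec<AtomParam>,
        potentials: Potentials,
    ) -> Result<Self, String> {
        let n = system.positions.len();
        if atom_properties.len() != n {
            return Err(format!("{} atom params for {n} atoms", atom_properties.len()));
        }
        if let Some(p) = atom_properties.iter().find(|p| p.type_index >= atom_types.len()) {
            return Err(format!("type_index {} out of range", p.type_index));
        }
        if let Some(b) = potentials.bonds.iter().find(|b| b.i >= n || b.j >= n) {
            return Err(format!("bond ({}, {}) references a missing atom", b.i, b.j));
        }
        Ok(Self { system, atom_types, atom_properties, potentials })
    }
}
```

Taking System by value rather than by reference is one way to enforce the unidirectional flow described in the task summary; a caller who still needs the input afterwards has to clone it explicitly.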

Metadata

Status: Done

Relationships: None yet

Development: No branches or pull requests