Evolve API to Infer Formal Charges and Lone Pairs from Molecular Graph #9

@TKanX

Description:

This task refactors the core perception pipeline so that users no longer supply formal_charge as an input. The public API will accept a MolecularGraph containing only element identities and bond connectivity. A new perception stage will infer formal charges and lone pair counts from established chemical principles (octet rule, valence, and common bonding patterns). The library will then be able to interpret ionic species such as zwitterionic amino acids, carboxylates, and ammonium groups directly from their topology. This shifts the burden of chemical correctness from the user to the library and is a major step towards a fully automated parameterization tool.

Tasks:

  • Phase 1: Update Core Data Structures and Public API

    • In src/core/graph.rs, remove the formal_charge: i8 field from the AtomNode struct.
    • Update the MolecularGraph::add_atom method signature to pub fn add_atom(&mut self, element: Element) -> usize.
    • In src/processor/graph.rs, keep the formal_charge field on the AtomView struct; it is now a perceived property rather than a user input.
    • Update ProcessingGraph::new to initialize formal_charge to a default value (e.g., 0), to be correctly populated during the perception stage.
    • Update all public-facing examples and documentation (README.md, lib.rs) to reflect the new, simpler add_atom API.
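The Phase 1 changes could look roughly like the sketch below. Only AtomNode, AtomView, MolecularGraph::add_atom, ProcessingGraph::new, and the formal_charge field come from this issue; the Element variants, the id and lone_pairs fields, and the exact signature of ProcessingGraph::new are assumptions made for illustration and may not match the real code.

```rust
// Hypothetical sketch of the Phase 1 data structures; the real definitions
// in src/core/graph.rs and src/processor/graph.rs may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Element {
    H,
    C,
    N,
    O,
}

// Input-side atom: no formal_charge field any more.
#[derive(Debug, Clone)]
pub struct AtomNode {
    pub id: usize,
    pub element: Element,
}

#[derive(Debug, Default)]
pub struct MolecularGraph {
    pub atoms: Vec<AtomNode>,
}

impl MolecularGraph {
    pub fn new() -> Self {
        Self::default()
    }

    // New, simpler signature: the caller passes only the element.
    pub fn add_atom(&mut self, element: Element) -> usize {
        let id = self.atoms.len();
        self.atoms.push(AtomNode { id, element });
        id
    }
}

// Processor-side view: formal_charge stays, but as a perceived property.
#[derive(Debug, Clone)]
pub struct AtomView {
    pub id: usize,
    pub element: Element,
    pub formal_charge: i8,
    pub lone_pairs: u8, // assumed field, filled in during perception
}

#[derive(Debug)]
pub struct ProcessingGraph {
    pub atoms: Vec<AtomView>,
}

impl ProcessingGraph {
    // Assumed signature. Charges start at 0 and are populated later
    // by the perception stage.
    pub fn new(graph: &MolecularGraph) -> Self {
        let atoms = graph
            .atoms
            .iter()
            .map(|a| AtomView {
                id: a.id,
                element: a.element,
                formal_charge: 0,
                lone_pairs: 0,
            })
            .collect();
        Self { atoms }
    }
}
```

With this shape, a caller builds a graph with calls such as `graph.add_atom(Element::N)` and never mentions a charge.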
  • Phase 2: Implement Charge and Lone Pair Perception Logic

    • In src/processor/perception.rs, create a new private function perceive_charge_and_lone_pairs(atom: &AtomView, graph: &ProcessingGraph) -> (i8, u8).
    • Algorithm:
      • Implement the base case logic using valence, bonding electrons, and the octet rule to determine the most likely neutral state.
      • Implement a set of heuristic rules to identify common ionic patterns:
        • Quaternary Nitrogen: A 4-coordinate nitrogen should be assigned a +1 charge.
        • Carboxylate Oxygen: A 1-coordinate oxygen bonded to a carbon that is also double-bonded to another oxygen should be assigned a -1 charge.
        • Nitro Oxygen: A 1-coordinate oxygen bonded to a nitrogen that is also double-bonded to another oxygen should be assigned a -1 charge.
        • Phosphate Oxygen: Implement similar logic for singly-bonded oxygens attached to phosphorus in a phosphate-like environment.
      • Ensure the function correctly calculates and returns both the inferred formal_charge and the corresponding lone_pairs count.
    • Refactor perceive_electron_counts to be a two-pass process: first calculate valence and bonding electrons for all atoms, then in a second pass, call perceive_charge_and_lone_pairs for each atom. This ensures all bonding information is available for context-dependent charge inference.
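A minimal sketch of the two-pass perception described in Phase 2, under stated assumptions: atoms carry an adjacency list of (neighbor index, bond order) pairs, hydrogens are explicit, and only H, C, N, and O are covered. The bonds, valence_electrons, and bonding_electrons fields, the valence_electrons helper, and the add_atom/add_bond methods on ProcessingGraph are hypothetical. Phosphate and hypervalent sulfur handling is deliberately left out.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Element { H, C, N, O }

#[derive(Debug, Clone)]
pub struct AtomView {
    pub id: usize,
    pub element: Element,
    pub bonds: Vec<(usize, u8)>, // (neighbor index, bond order), assumed layout
    pub valence_electrons: u8,
    pub bonding_electrons: u8,   // electrons in bonds to this atom (2 per bond order)
    pub formal_charge: i8,
    pub lone_pairs: u8,
}

#[derive(Debug, Default)]
pub struct ProcessingGraph { pub atoms: Vec<AtomView> }

impl ProcessingGraph {
    pub fn add_atom(&mut self, element: Element) -> usize {
        let id = self.atoms.len();
        self.atoms.push(AtomView {
            id, element, bonds: Vec::new(),
            valence_electrons: 0, bonding_electrons: 0,
            formal_charge: 0, lone_pairs: 0,
        });
        id
    }
    pub fn add_bond(&mut self, a: usize, b: usize, order: u8) {
        self.atoms[a].bonds.push((b, order));
        self.atoms[b].bonds.push((a, order));
    }
}

fn valence_electrons(element: Element) -> u8 {
    match element { Element::H => 1, Element::C => 4, Element::N => 5, Element::O => 6 }
}

fn perceive_charge_and_lone_pairs(atom: &AtomView, graph: &ProcessingGraph) -> (i8, u8) {
    // Heuristic: a 1-coordinate, singly bonded O on a C or N that also
    // carries a double-bonded O (carboxylate or nitro oxygen).
    if atom.element == Element::O && atom.bonds.len() == 1 && atom.bonds[0].1 == 1 {
        let center = &graph.atoms[atom.bonds[0].0];
        let has_oxo = center.bonds.iter().any(|&(j, order)| {
            order == 2 && j != atom.id && graph.atoms[j].element == Element::O
        });
        if matches!(center.element, Element::C | Element::N) && has_oxo {
            return (-1, 3);
        }
    }
    // Heuristic: 4-coordinate nitrogen.
    if atom.element == Element::N && atom.bonds.len() == 4 {
        return (1, 0);
    }
    // Base case: pair up the electrons not used in bonds, then apply
    // formal charge = valence - nonbonding - (bonding / 2).
    let shared = atom.bonding_electrons / 2;
    let lone_pairs = atom.valence_electrons.saturating_sub(shared) / 2;
    let charge = atom.valence_electrons as i8 - 2 * lone_pairs as i8 - shared as i8;
    (charge, lone_pairs)
}

pub fn perceive_electron_counts(graph: &mut ProcessingGraph) {
    // Pass 1: valence and bonding electrons for every atom.
    for atom in graph.atoms.iter_mut() {
        atom.valence_electrons = valence_electrons(atom.element);
        atom.bonding_electrons = 2 * atom.bonds.iter().map(|&(_, o)| o).sum::<u8>();
    }
    // Pass 2: context-dependent inference over a read-only view of the graph.
    let view: &ProcessingGraph = &*graph;
    let results: Vec<(i8, u8)> = view
        .atoms
        .iter()
        .map(|a| perceive_charge_and_lone_pairs(a, view))
        .collect();
    for (atom, (charge, lone_pairs)) in graph.atoms.iter_mut().zip(results) {
        atom.formal_charge = charge;
        atom.lone_pairs = lone_pairs;
    }
}
```

In this sketch the base case also assigns +1 to a nitro nitrogen (bond-order sum 4, no lone pair), which the heuristic list above does not call out separately. Whether the real implementation handles that in the base case or through a dedicated rule is an open design choice.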
  • Phase 3: Update Downstream Logic and Templates

    • In src/processor/templates.rs, review all FunctionalGroupTemplate definitions.
    • Refactor Predicates: Modify any template matching predicates that previously relied on a user-input formal_charge. For example, the Carboxylate template should now match based on the newly perceived lone_pairs count (e.g., lone_pairs == 3 for the O⁻ atom) instead of formal_charge == -1.
    • Correctness: Ensure that this refactoring does not alter the template's specificity and correctly identifies the intended functional groups.
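As an illustration of the Phase 3 predicate change, a carboxylate O⁻ check keyed on perceived lone pairs might look like the sketch below. The AtomView fields and the function name matches_anionic_oxygen are hypothetical; the actual FunctionalGroupTemplate predicate API is not shown in this issue.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Element {
    C,
    O,
}

// Hypothetical, trimmed-down view of a perceived atom.
#[derive(Debug, Clone)]
pub struct AtomView {
    pub element: Element,
    pub degree: u8,     // number of bonded neighbors
    pub lone_pairs: u8, // perceived, not user-supplied
}

// Previously the predicate would have tested formal_charge == -1.
// A 1-coordinate oxygen with three lone pairs has formal charge
// 6 - 6 - 1 = -1, so this check selects the same atoms.
pub fn matches_anionic_oxygen(atom: &AtomView) -> bool {
    atom.element == Element::O && atom.degree == 1 && atom.lone_pairs == 3
}
```

Keeping the degree check alongside lone_pairs == 3 is one way to preserve the template's specificity: a carbonyl oxygen (two lone pairs) and a hydroxyl oxygen (2-coordinate) both fail the predicate.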
  • Phase 4: Update and Expand Integration Tests

    • In tests/harness.rs, remove the charge field from AtomBlueprint and update build_from_blueprint to use the new add_atom signature.
    • Update All Tests: Modify every existing test case in tests/cases/ to remove the now-obsolete charge information.
    • Verification: Run the entire test suite. All tests, especially those for zwitterionic amino acids (GLYCINE_ZWITTERION, etc.), nucleic acid backbones (DINUCLEOTIDE_BACKBONE), and nitro compounds (TRINITROBENZENE), must pass without modification to their expected atom types. This validates that the new perception logic is correct.
    • New Test Cases:
      • Add a test case for a simple quaternary ammonium salt (e.g., tetramethylammonium) to explicitly validate the N⁺ perception.
      • Add a test case for a simple sulfoxide or sulfone to check the behavior with hypervalent main-group elements.
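For Phase 4, a charge-free blueprint for the tetramethylammonium case could be sketched as follows. AtomBlueprint and build_from_blueprint are named in this issue, but their real fields and signatures are not shown here; the label field, the MoleculeBlueprint wrapper, the bond tuple layout, and the add_bond and degree helpers are assumptions.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Element {
    H,
    C,
    N,
}

// Stand-in for the library graph, reduced to what the harness needs.
#[derive(Debug, Default)]
pub struct MolecularGraph {
    pub atoms: Vec<Element>,
    pub bonds: Vec<(usize, usize, u8)>, // (atom a, atom b, bond order)
}

impl MolecularGraph {
    pub fn add_atom(&mut self, element: Element) -> usize {
        self.atoms.push(element);
        self.atoms.len() - 1
    }
    pub fn add_bond(&mut self, a: usize, b: usize, order: u8) {
        self.bonds.push((a, b, order));
    }
    pub fn degree(&self, atom: usize) -> usize {
        self.bonds
            .iter()
            .filter(|&&(a, b, _)| a == atom || b == atom)
            .count()
    }
}

// No charge field: the blueprint carries only a label and an element.
pub struct AtomBlueprint {
    pub label: &'static str,
    pub element: Element,
}

pub struct MoleculeBlueprint {
    pub atoms: Vec<AtomBlueprint>,
    pub bonds: Vec<(usize, usize, u8)>,
}

pub fn build_from_blueprint(bp: &MoleculeBlueprint) -> MolecularGraph {
    let mut graph = MolecularGraph::default();
    for atom in &bp.atoms {
        graph.add_atom(atom.element);
    }
    for &(a, b, order) in &bp.bonds {
        graph.add_bond(a, b, order);
    }
    graph
}

// N(CH3)4: one nitrogen, four methyl carbons, twelve hydrogens.
pub fn tetramethylammonium() -> MoleculeBlueprint {
    let mut atoms = vec![AtomBlueprint { label: "N", element: Element::N }];
    let mut bonds = Vec::new();
    for _ in 0..4 {
        let c = atoms.len();
        atoms.push(AtomBlueprint { label: "C", element: Element::C });
        bonds.push((0, c, 1));
        for _ in 0..3 {
            let h = atoms.len();
            atoms.push(AtomBlueprint { label: "H", element: Element::H });
            bonds.push((c, h, 1));
        }
    }
    MoleculeBlueprint { atoms, bonds }
}
```

A real test would then run the perception pipeline on the built graph and assert that the nitrogen is perceived as N⁺; that step depends on library internals not shown in this issue.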


Status: Done
