Skip to content

Fix: safe defaults for BaseModel dict fields#24098

Merged
crazywoola merged 4 commits intolanggenius:mainfrom
hyongtao-code:fix-basemodel
Aug 21, 2025
Merged

Fix: safe defaults for BaseModel dict fields#24098
crazywoola merged 4 commits intolanggenius:mainfrom
hyongtao-code:fix-basemodel

Conversation

@hyongtao-code
Copy link
Copy Markdown
Contributor

@hyongtao-code hyongtao-code commented Aug 18, 2025

Fixes #24063

Co-authored-by: asukaminato0721[email protected]

The Python script automatically generated using ChatGPT is as follows:

python script
"""
Fix BaseModel class attributes that use '{}' as default, turning them into
`Field(default_factory=dict)` safely.

Rules:
- Rewrite ONLY inside classes that subclass BaseModel (including qualified base like pydantic.BaseModel).
- Rewrite ONLY at class body top-level (not inside methods/functions).
- Rewrite when the annotation is:
    * absent, OR
    * simple dict: `dict` / `Dict`, OR
    * dict with Any value: `dict[str, Any]` / `Dict[str, Any]`
- SKIP complex generics like `dict[str, list[GraphEdge]] = {}` and EMIT SUGGESTION to use defaultdict(list) or setdefault.
- Auto ensure `from pydantic import Field` exists.

Usage:
  python fix_pydantic_mutable_dicts.py --path src --diff        # dry-run + color diff
  python fix_pydantic_mutable_dicts.py --path src --write       # in-place modify
"""

import argparse
import os
import sys
import difflib
from typing import List, Optional, Tuple

import libcst as cst
import libcst.matchers as m
from libcst.metadata import PositionProvider, ParentNodeProvider, MetadataWrapper


# ---------- Color helpers ----------
def supports_color(stream) -> bool:
    return hasattr(stream, "isatty") and stream.isatty()

COLOR_RED = "\033[91m"
COLOR_GREEN = "\033[92m"
COLOR_RESET = "\033[0m"

def colorize(line: str, enable_color: bool) -> str:
    if not enable_color:
        return line
    if line.startswith("+") and not line.startswith("+++"):
        return f"{COLOR_GREEN}{line}{COLOR_RESET}"
    if line.startswith("-") and not line.startswith("---"):
        return f"{COLOR_RED}{line}{COLOR_RESET}"
    return line

def make_color_diff(src: str, dst: str, filename: str, enable_color: bool) -> str:
    diff = difflib.unified_diff(
        src.splitlines(True),
        dst.splitlines(True),
        fromfile=f"a/{filename}",
        tofile=f"b/{filename}",
    )
    return "".join(colorize(line, enable_color) for line in diff)


# ---------- BaseModel detection ----------
def is_basemodel_base(arg: cst.Arg) -> bool:
    """Detect BaseModel (Name or Attribute chain ending with .BaseModel)."""
    value = arg.value
    if isinstance(value, cst.Name):
        return value.value == "BaseModel"
    if isinstance(value, cst.Attribute):
        attr = value
        while isinstance(attr, cst.Attribute):
            if isinstance(attr.attr, cst.Name) and attr.attr.value == "BaseModel":
                return True
            attr = attr.value
    return False


# ---------- Annotation checks ----------
def _is_name(n: cst.CSTNode, *names: str) -> bool:
    return isinstance(n, cst.Name) and n.value in names

def _rightmost_name_is(node: cst.CSTNode, *names: str) -> bool:
    cur = node
    while isinstance(cur, cst.Attribute):
        if isinstance(cur.attr, cst.Name) and cur.attr.value in names:
            return True
        cur = cur.value
    return isinstance(cur, cst.Name) and cur.value in names

def is_simple_dict_annotation(ann: Optional[cst.Annotation]) -> bool:
    """
    True for:
      - no annotation
      - `dict` / `Dict`
      - `dict[str, Any]` / `Dict[str, Any]`
    """
    if ann is None:
        return True
    node = ann.annotation

    # dict / Dict (not subscripted)
    if isinstance(node, cst.Name) and node.value in ("dict", "Dict"):
        return True

    # dict[...] / Dict[...]
    if isinstance(node, cst.Subscript):
        # base must be dict/Dict or typing.Dict
        base_ok = False
        if isinstance(node.value, cst.Name) and node.value.value in ("dict", "Dict"):
            base_ok = True
        elif isinstance(node.value, cst.Attribute) and _rightmost_name_is(node.value, "Dict"):
            base_ok = True
        if not base_ok:
            return False

        # Expect two slices: key, value
        if not node.slice or len(node.slice) < 2:
            return False

        val_slice = node.slice[1]
        if not isinstance(val_slice, cst.SubscriptElement):
            return False
        val_value = val_slice.slice.value

        # value is Any (Name Any or typing.Any)
        if isinstance(val_value, cst.Name) and val_value.value == "Any":
            return True
        if isinstance(val_value, cst.Attribute) and _rightmost_name_is(val_value, "Any"):
            return True

    return False


def is_dict_subscript_with_list_value(ann: Optional[cst.Annotation]) -> bool:
    """
    Detect dict[*, list[*]] / Dict[*, List[*]] to emit suggestions (not auto-fix).
    """
    if ann is None:
        return False
    node = ann.annotation
    if not isinstance(node, cst.Subscript):
        return False

    base_ok = False
    if isinstance(node.value, cst.Name) and node.value.value in ("dict", "Dict"):
        base_ok = True
    elif isinstance(node.value, cst.Attribute) and _rightmost_name_is(node.value, "Dict"):
        base_ok = True
    if not base_ok:
        return False

    if not node.slice or len(node.slice) < 2:
        return False
    val_slice = node.slice[1]
    if not isinstance(val_slice, cst.SubscriptElement):
        return False
    val_value = val_slice.slice.value

    # list[...] or List[...]
    if isinstance(val_value, cst.Subscript):
        if isinstance(val_value.value, cst.Name) and val_value.value.value in ("list", "List"):
            return True
        if isinstance(val_value.value, cst.Attribute) and _rightmost_name_is(val_value.value, "List"):
            return True
    elif isinstance(val_value, cst.Name) and val_value.value in ("list", "List"):
        return True
    elif isinstance(val_value, cst.Attribute) and _rightmost_name_is(val_value, "List"):
        return True

    return False


# ---------- Transformer ----------
class FixTransformer(cst.CSTTransformer):
    METADATA_DEPENDENCIES = (PositionProvider, ParentNodeProvider,)

    def __init__(self):
        super().__init__()
        self.in_basemodel_stack: List[bool] = []
        self.function_depth: int = 0  # >0 means inside a method/function
        self.did_change: bool = False

        # import management
        self.seen_pydantic_from_imports: List[cst.ImportFrom] = []
        self.field_already_imported: bool = False
        self.must_insert_field_import: bool = False
        self.import_append_target: Optional[cst.ImportFrom] = None

        # suggestions (line, varname, message)
        self.suggestions: List[Tuple[int, str, str]] = []

    # ----- import handling -----
    def visit_ImportFrom(self, node: cst.ImportFrom) -> Optional[bool]:
        if node.module and isinstance(node.module, cst.Name) and node.module.value == "pydantic":
            self.seen_pydantic_from_imports.append(node)
            names = node.names if isinstance(node.names, list) else []
            for n in names:
                if isinstance(n, cst.ImportAlias) and isinstance(n.name, cst.Name) and n.name.value == "Field":
                    self.field_already_imported = True
        return True

    def leave_ImportFrom(self, original_node: cst.ImportFrom, updated_node: cst.ImportFrom) -> cst.ImportFrom:
        if self.import_append_target is original_node:
            names = list(updated_node.names) if isinstance(updated_node.names, list) else []
            if not any(isinstance(n.name, cst.Name) and n.name.value == "Field" for n in names):
                names.append(cst.ImportAlias(name=cst.Name("Field")))
                self.field_already_imported = True
                self.did_change = True
                return updated_node.with_changes(names=names)
        return updated_node

    def leave_Module(self, original_node: cst.Module, updated_node: cst.Module) -> cst.Module:
        if self.must_insert_field_import and not self.field_already_imported and self.import_append_target is None:
            new_import = cst.SimpleStatementLine(
                body=[cst.ImportFrom(module=cst.Name("pydantic"),
                                     names=[cst.ImportAlias(name=cst.Name("Field"))])]
            )
            body = list(updated_node.body)
            insert_at = 0
            # skip module docstring
            if body and isinstance(body[0], cst.SimpleStatementLine):
                first = body[0]
                if first.body and isinstance(first.body[0], cst.Expr) and isinstance(first.body[0].value, cst.SimpleString):
                    insert_at = 1
            body.insert(insert_at, new_import)
            self.did_change = True
            return updated_node.with_changes(body=body)
        return updated_node

    # ----- scope tracking -----
    def visit_ClassDef(self, node: cst.ClassDef) -> Optional[bool]:
        is_bm = any(is_basemodel_base(b) for b in node.bases)
        self.in_basemodel_stack.append(is_bm)
        return True

    def leave_ClassDef(self, original_node: cst.ClassDef, updated_node: cst.ClassDef) -> cst.ClassDef:
        self.in_basemodel_stack.pop()
        return updated_node

    def visit_FunctionDef(self, node: cst.FunctionDef) -> Optional[bool]:
        self.function_depth += 1
        return True

    def leave_FunctionDef(self, original_node: cst.FunctionDef, updated_node: cst.FunctionDef) -> cst.FunctionDef:
        self.function_depth -= 1
        return updated_node

    # ----- helpers -----
    def _pos_line(self, node: cst.CSTNode) -> int:
        pos = self.get_metadata(PositionProvider, node, None)
        return pos.start.line if pos else -1

    def _ensure_field_import(self):
        if not self.field_already_imported:
            if self.seen_pydantic_from_imports:
                self.import_append_target = self.seen_pydantic_from_imports[0]
            else:
                self.must_insert_field_import = True

    def _in_bm_class_body(self) -> bool:
        return bool(self.in_basemodel_stack and self.in_basemodel_stack[-1]) and self.function_depth == 0

    def _rewrite_value_to_field_dict(self, value_node: cst.BaseExpression) -> cst.BaseExpression:
        self._ensure_field_import()
        self.did_change = True
        return cst.Call(
            func=cst.Name("Field"),
            args=[cst.Arg(keyword=cst.Name("default_factory"), value=cst.Name("dict"))],
        )

    # ----- field rewriting -----
    def leave_AnnAssign(self, original_node: cst.AnnAssign, updated_node: cst.AnnAssign) -> cst.AnnAssign:
        # Only class attributes in BaseModel class body
        if not self._in_bm_class_body():
            return updated_node

        if updated_node.value and m.matches(updated_node.value, m.Dict(elements=[])):
            # complex dict with list value? -> don't rewrite; suggest usage
            if not is_simple_dict_annotation(updated_node.annotation):
                if is_dict_subscript_with_list_value(updated_node.annotation):
                    varname = original_node.target.value if isinstance(original_node.target, cst.Name) else ""
                    msg = "Consider `defaultdict(list)` or `setdefault` instead of bare dict for list values."
                    self.suggestions.append((self._pos_line(updated_node), varname, msg))
                return updated_node

            # simple dict or dict[*, Any] -> safe rewrite
            new_value = self._rewrite_value_to_field_dict(updated_node.value)
            return updated_node.with_changes(value=new_value)

        return updated_node

    def leave_Assign(self, original_node: cst.Assign, updated_node: cst.Assign) -> cst.Assign:
        # Only class attributes in BaseModel class body (avoid method bodies)
        if not self._in_bm_class_body():
            return updated_node
        if len(updated_node.targets) != 1:
            return updated_node

        # Example: outputs = {}
        if m.matches(updated_node.value, m.Dict(elements=[])):
            new_value = self._rewrite_value_to_field_dict(updated_node.value)
            return updated_node.with_changes(value=new_value)

        return updated_node


# ---------- FS / runner ----------
def find_python_files(root: str) -> List[str]:
    out: List[str] = []
    if os.path.isfile(root) and root.endswith(".py"):
        return [root]
    for dirpath, dirnames, filenames in os.walk(root):
        base = os.path.basename(dirpath)
        if base in (".git", ".venv", "venv", "__pycache__", ".mypy_cache", ".pytest_cache", "node_modules"):
            continue
        for fn in filenames:
            if fn.endswith(".py"):
                out.append(os.path.join(dirpath, fn))
    return out


def process_file(path: str, do_write: bool, show_diff: bool, enable_color: bool) -> Tuple[bool, str, List[Tuple[int, str, str]]]:
    """
    Return (changed, diff_text, suggestions)
    """
    try:
        src = open(path, "r", encoding="utf-8").read()
    except (OSError, UnicodeDecodeError):
        return False, "", []

    try:
        mod = cst.parse_module(src)
    except cst.ParserSyntaxError:
        return False, "", []

    wrapper = MetadataWrapper(mod)
    transformer = FixTransformer()
    new_mod = wrapper.visit(transformer)

    if not transformer.did_change and not transformer.suggestions:
        return False, "", []

    dst = new_mod.code
    if do_write and transformer.did_change:
        with open(path, "w", encoding="utf-8") as f:
            f.write(dst)

    diff_text = ""
    if show_diff and transformer.did_change:
        diff_text = make_color_diff(src, dst, path, enable_color)

    return transformer.did_change, diff_text, transformer.suggestions


def main() -> None:
    ap = argparse.ArgumentParser(description="Fix BaseModel fields with '{}' default to Field(default_factory=dict) safely.")
    ap.add_argument("--path", required=True, help="Path to a .py file or a directory to scan.")
    ap.add_argument("--write", action="store_true", help="Apply changes in place. Default is dry-run.")
    ap.add_argument("--diff", action="store_true", help="Show unified diff for changed files.")
    ap.add_argument("--no-color", action="store_true", help="Disable ANSI color in diff output.")
    args = ap.parse_args()

    if not os.path.exists(args.path):
        print(f"[!] Path not found: {args.path}", file=sys.stderr)
        sys.exit(2)

    files = find_python_files(args.path)
    if not files:
        print("[i] No Python files found.")
        return

    enable_color = (not args.no_color) and supports_color(sys.stdout)

    changed_files = 0
    for fp in files:
        changed, diff_text, suggestions = process_file(fp, args.write, args.diff, enable_color)
        if changed:
            changed_files += 1
            mode = "WRITE" if args.write else "DRY  "
            print(f"[{mode}] {fp}")
            if diff_text:
                sys.stdout.write(diff_text)
        for (line, varname, msg) in suggestions:
            loc = f"{fp}:{line}" if line > 0 else fp
            var = f" `{varname}`" if varname else ""
            print(f"[SUGGEST]{var} at {loc}: {msg}")

    if args.write:
        print(f"\nDone. Files modified: {changed_files}")
    else:
        print(f"\nDry-run complete. Files that would change: {changed_files}")


if __name__ == "__main__":
    main()

Important

  1. Make sure you have read our contribution guidelines
  2. Ensure there is an associated issue and you have been assigned to it
  3. Use the correct syntax to link this PR: Fixes #<issue number>.

Summary

Screenshots

Before After
... ...

Checklist

  • This change requires a documentation update, included: Dify Document
  • I understand that this PR may be closed in case there was no previous discussion or issues. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.
  • I ran dev/reformat(backend) and cd web && npx lint-staged(frontend) to appease the lint gods

@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Aug 18, 2025
@hyongtao-code hyongtao-code marked this pull request as draft August 18, 2025 07:44
@hyongtao-code hyongtao-code marked this pull request as ready for review August 18, 2025 08:07
Copy link
Copy Markdown
Member

@crazywoola crazywoola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Aug 19, 2025
@hyongtao-code
Copy link
Copy Markdown
Contributor Author

Thank you.

@crazywoola crazywoola merged commit 106ab7f into langgenius:main Aug 21, 2025
7 checks passed
Eric-Guo pushed a commit to Eric-Guo/dify that referenced this pull request Aug 22, 2025
Co-authored-by: Yongtao Huang <[email protected]>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Eric-Guo pushed a commit to Eric-Guo/dify that referenced this pull request Aug 22, 2025
Co-authored-by: Yongtao Huang <[email protected]>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Gnomeek added a commit to Gnomeek/dify that referenced this pull request Aug 22, 2025
ci

add assertion

docs: format all md files (langgenius#24195)

Signed-off-by: yihong0618 <[email protected]>

hotfix: fix multiple case match syntax (langgenius#24204)

feat: notice of the expire of education verify (langgenius#24210)

Co-authored-by: Copilot <[email protected]>

Feat: Education (langgenius#24208)

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

chore: translate i18n files (langgenius#24211)

Co-authored-by: iamjoel <[email protected]>

Httpx example (langgenius#24151)

add tyck tool (currently ignore the error) (langgenius#22592)

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

Fix the bug of automatically appending basepath to image resource. (langgenius#24201)

fix: keep idempotent when init AnalyticdbVectorBySql (langgenius#24239)

Co-authored-by: xiaozeyu <[email protected]>

fix: dataset doc-form compatible (langgenius#24177)

Co-authored-by: huangzhuo <[email protected]>

feature: add test containers base tests for saved message service (langgenius#24259)

Mcp support resource discovery (langgenius#24223)

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

fix: loop exit condition accepts variables from nodes inside the loop  langgenius#24183: (langgenius#24257)

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

feat: show the start time with seconds of the app logs (langgenius#24267)

fix(api):Fix the issue of empty and not empty operations failing in k… (langgenius#24276)

Co-authored-by: liumin <[email protected]>

fix: rollback when AnalyticDB create zhparser failed (langgenius#24260)

Co-authored-by: xiaozeyu <[email protected]>

fix: value_type check failed when updating variables (langgenius#24274)

Co-authored-by: me <[email protected]>

[Test] add unit tests for ProviderConfigEncrypter encrypt/mask/decrypt (langgenius#24280)

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

refactor: replace try-except blocks with contextlib.suppress for cleaner exception handling (langgenius#24284)

style: replace `h-[1px]` with `h-px` to unify the writing format of Tailwind CSS (langgenius#24146)

Fix: safe defaults for BaseModel dict fields (langgenius#24098)

Co-authored-by: Yongtao Huang <[email protected]>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

[CHORE]: x: T = None to x: Optional[T] = None (langgenius#24217)

feature: add test containers base tests for tag service (langgenius#24313)

[Test] add unit tests for web_reader_tool.py (langgenius#24309)

Co-authored-by: Yongtao Huang <[email protected]>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

Annotations example (langgenius#24304)

an example of sessionmaker (langgenius#24246)

feat: implement TooltipManager for managing tooltip lifecycle (langgenius#24236)

Co-authored-by: crazywoola <[email protected]>

Flask 3.1.2 upgrade fix by Avoids using current_user in background thread  (langgenius#24290)

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

fix: Optimize scrolling experience on plugin page (langgenius#24314) (langgenius#24322)

auto format md files (langgenius#24242)

Feat/chat message image first for agent and advanced_chat APP (langgenius#23796)

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

Update knowledge_retrieval_node.py (langgenius#24111)

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

example of next(, None) (langgenius#24345)

feat: Add default value support for all workflow start node variable types (langgenius#24129)

Co-authored-by: crazywoola <[email protected]>
Co-authored-by: Copilot <[email protected]>

refactor: simplify repository factory with Django-style import_string (langgenius#24354)

Fix missing database commit in provider update handler (langgenius#24357)

Node search supports model and name search (langgenius#24331)

[Chore/Refactor] except StopIteration -> next( , None)
qiaofenlin pushed a commit to qiaofenlin/dify that referenced this pull request Aug 24, 2025
Co-authored-by: Yongtao Huang <[email protected]>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
ZeroZ-lab pushed a commit to ZeroZ-lab/dify that referenced this pull request Aug 25, 2025
Co-authored-by: Yongtao Huang <[email protected]>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
@hyongtao-code hyongtao-code deleted the fix-basemodel branch August 27, 2025 05:35
qiqizjl pushed a commit to qiqizjl/dify that referenced this pull request Aug 27, 2025
Co-authored-by: Yongtao Huang <[email protected]>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
HarryReidx pushed a commit to HarryReidx/dify that referenced this pull request Sep 1, 2025
Co-authored-by: Yongtao Huang <[email protected]>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
(cherry picked from commit 106ab7f)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

lgtm This PR has been approved by a maintainer size:M This PR changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Chore/Refactor] unify use Field for dict field

3 participants