Skip to content

Added #34277 -- Add Conditional WHERE Clause to bulk_create for SQLite and PostgreSQL#17515

Open
HamaBarhamou wants to merge 8 commits intodjango:mainfrom
HamaBarhamou:feature/bulk-create-where-clause
Open

Added #34277 -- Add Conditional WHERE Clause to bulk_create for SQLite and PostgreSQL#17515
HamaBarhamou wants to merge 8 commits intodjango:mainfrom
HamaBarhamou:feature/bulk-create-where-clause

Conversation

@HamaBarhamou
Copy link
Copy Markdown

@HamaBarhamou HamaBarhamou commented Nov 23, 2023

#34227
This pull request introduces a significant enhancement to Django's ORM by adding a conditional WHERE clause capability to the bulk_create method for SQLite and PostgreSQL backends. This feature allows users to specify conditions under which updates should occur in case of conflicts during bulk insert operations, offering more control and flexibility in data handling.

Key Changes and Implementation:

  • Added a where_clause parameter to the bulk_create method.
  • Integrated this feature with SQLite and PostgreSQL backends by handling the where_clause in the respective database operations.
  • Created comprehensive tests to ensure functionality and reliability across the supported databases.

Future Vision and Rationale for Current Scope:

  • The current implementation accepts raw SQL strings for the where_clause. The goal was to establish a foundation upon which further enhancements can be built.
  • Future work could involve extending support to other databases and allowing Django's Q and F expressions for the where_clause, making it more "Djangonic". However, this would require a more intricate handling and understanding of each database's capabilities and syntax.
  • I decided to limit the scope of this initial implementation to ensure reliability and ease of integration. Tackling the complexity of database-specific syntax and Django expressions in a single go could introduce risks and prolong the review and integration process.
  • A dedicated ticket and subsequent contributions will focus on these advanced features, ensuring thorough testing and consideration for each step of the enhancement.

This contribution represents a starting point for a broader feature addition to Django's ORM. I'm looking forward to feedback and discussions on how we can further evolve this functionality in future iterations.

Copy link
Copy Markdown

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello! Thank you for your contribution 💪

As it's your first contribution be sure to check out the patch review checklist.

If you're fixing a ticket from Trac make sure to set the "Has patch" flag and include a link to this PR in the ticket!

If you have any design or process questions then you can ask in the Django forum.

Welcome aboard ⛵️!

@shangxiao
Copy link
Copy Markdown
Contributor

@HamaBarhamou Thanks 🏆 I left a few cursory comments

@HamaBarhamou HamaBarhamou changed the title Fixed #34277 -- Add Conditional WHERE Clause to bulk_create for SQLite and PostgreSQL Added #34277 -- Add Conditional WHERE Clause to bulk_create for SQLite and PostgreSQL Nov 23, 2023
@HamaBarhamou HamaBarhamou force-pushed the feature/bulk-create-where-clause branch from 2e12bd8 to d1d09d3 Compare November 23, 2023 15:19
Copy link
Copy Markdown
Member

@ngnpope ngnpope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the initial work on this - it'll be a nice feature to have.

I've added a bunch of comments with regards to making things more consistent, but I think the main blocker to progress will be properly supporting passing expressions to where_clause rather than backend-specific SQL which will leave this open to increased risk of SQL injection.

@HamaBarhamou
Copy link
Copy Markdown
Author

Thanks for the initial work on this - it'll be a nice feature to have.

I've added a bunch of comments with regards to making things more consistent, but I think the main blocker to progress will be properly supporting passing expressions to where_clause rather than backend-specific SQL which will leave this open to increased risk of SQL injection.

Thank you so much, ngnpope, for your constructive and detailed feedback. I am delighted to hear that this feature will be a welcomed addition and I take note of your suggestions to improve my pull request. I will work on the modifications you have recommended and incorporate them into the code. Your assistance is highly valuable and greatly contributes to my understanding of the Django project.

@HamaBarhamou HamaBarhamou force-pushed the feature/bulk-create-where-clause branch from d1d09d3 to be3f6bc Compare November 24, 2023 16:53
@HamaBarhamou HamaBarhamou force-pushed the feature/bulk-create-where-clause branch 2 times, most recently from dd057fd to 485cd47 Compare December 8, 2023 12:20
@HamaBarhamou
Copy link
Copy Markdown
Author

Dear colleagues, @ngnpope , @shangxiao , I believe I have completed the modifications for my recent pull request. I eagerly await your feedback and suggestions for any necessary improvements. Thank you for your support and collaboration

Copy link
Copy Markdown
Member

@ngnpope ngnpope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates @HamaBarhamou 🙂

I think you missed some of my previous comments. Unfortunately GitHub collapses stuff and it's easy to miss - look for the "XX hidden conversations" bits and click "Load more...".

I've added some more comments. The big things of interest going forward are:

  • Where do we compile the condition expression - can we push down, etc. ?
  • Can we make the condition expression more general than hardcoding to F and Q?
  • How do we handle PostgreSQL's EXCLUDED?
    • I think this could be a subclass of F with some special handling perhaps
    • We could also probably use the name EXCLUDED on MySQL for the table alias
      • Currently we have AS new, but that can be anything and new is a bit generic

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's flatten this down a bit. And also make sure we're using the correct database and connection...

Suggested change
model = self.query.model
query = Query(model)
query.add_q(expression)
compiler = query.get_compiler(connection=connection)
node = query.where
sql, params = compiler.compile(node)
query = Query(self.query.model)
query.add_q(expression)
compiler = query.get_compiler(self.using, self.connection)
sql, params = compiler.compile(query.where)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably worth making sure that params is a tuple here as per other methods:

Suggested change
return sql, params
return sql, tuple(params)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the returned parameters are not tuples but arrays. they will be used here: param_rows += [params_on_conflict] in the compiler.py file.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking that this name feels a little bit generic...

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made a proposal for a name

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return "", []
return "", ()

@HamaBarhamou
Copy link
Copy Markdown
Author

HamaBarhamou commented Dec 11, 2023

Thanks for the updates @HamaBarhamou 🙂

I think you missed some of my previous comments. Unfortunately GitHub collapses stuff and it's easy to miss - look for the "XX hidden conversations" bits and click "Load more...".

I've added some more comments. The big things of interest going forward are:

  • Where do we compile the condition expression - can we push down, etc. ?

  • Can we make the condition expression more general than hardcoding to F and Q?

  • How do we handle PostgreSQL's EXCLUDED?

    • I think this could be a subclass of F with some special handling perhaps

    • We could also probably use the name EXCLUDED on MySQL for the table alias

      • Currently we have AS new, but that can be anything and new is a bit generic
  • Comment gérons-nous les PostgreSQL EXCLUDED?

Hello @ngnpope and the team 🙂,

Thank you for your insightful questions. Here are my thoughts:

Compiling the condition expression: I will further reflect on the possibility of pushing the condition expression lower in the execution stack. It's an interesting point that I wish to explore more.

Generalizing the condition expression: My approach did not involve specific hardcoding for F and Q. I have utilized Django's existing mechanisms, which, to my understanding, are already suited to handle these expressions in a general manner. If my understanding diverges from what you expect, I am open to clarifications.

Handling EXCLUDED in PostgreSQL: My work is a continuation of PR #13065 where EXCLUDED was incorporated. My goal has been to add a WHERE clause to the existing query. If you think there are aspects I might have missed or misinterpreted regarding EXCLUDED, I am open to your guidance to ensure my contribution aligns perfectly with the project's expectations.

I am still in a learning phase and appreciate your patience and guidance in this process.

@HamaBarhamou HamaBarhamou force-pushed the feature/bulk-create-where-clause branch from d7e9041 to fa900e1 Compare December 18, 2023 11:57
@HamaBarhamou
Copy link
Copy Markdown
Author

HamaBarhamou commented Dec 18, 2023

Dear Team @ngnpope @shangxiao ,

I am writing to inform you that I have completed the necessary adjustments on the project and am ready for another review.

However, I would like to seek your opinion on a specific point: is the current state of my work, which does not yet include an implementation for the MySQL backends, sufficient for acceptance? In other words, is it imperative to also integrate MySQL for my contribution to be considered complete, or are the current improvements adequate for acceptance?

Thank you for your feedback and guidance.

Copy link
Copy Markdown
Member

@felixxm felixxm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HamaBarhamou Thanks 👍 I left initial comments.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use versionchanged, not versionadded. Also version* annotations are for small notes, not for the entire documentation of new features. You should change a signature in docs and document the new argument directly in the bulk_create() docs.

.. versionchanged:: 5.1

    The `condition` parameter was added.

Moreover docs should be wrapped at 79 chars.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@felixxm is that all I have to sum up in 79 characters?

.. versionadded:: 5.1

   Added the ``condition`` parameter and the ``Excluded`` class.

   The ``condition`` parameter allows specifying a conditional SQL WHERE clause 
   for bulk insert operations involving conflict resolution. This feature is 
   currently supported only by PostgreSQL and SQLite backends (starting from 
   SQLite version 3.42.0) and enables conditional updates based on specified 
   criteria during these operations.

   ``condition`` accepts Django's ``F``, ``Q``, and now ``Excluded`` expressions, 
   providing a secure and consistent way to define conditional logic while ensuring 
   protection against SQL injection and simplifying the construction of complex conditional 
   logic.

   The ``Excluded`` class is a special extension of Django's ``F`` expressions designed for handling
   conflict resolution in bulk insert operations. While F expressions reference fields in the existing
   database, ``Excluded`` refers to values in the proposed data for insertion, particularly in scenarios
   where these values are excluded due to a conflict during an ON CONFLICT clause in SQL.

   Unlike ``F`` expressions, ``Excluded`` cannot be used in general query contexts but is specifically
   tailored for use in ``bulk_create(..., update_conflicts=True)`` scenarios.

   Example using ``F``, ``Q``, and ``Excluded`` expressions:

    .. code-block:: pycon

        >>> from datetime import timedelta
        >>> from django.db.models import F, Q, Excluded
        >>> from django.utils import timezone
        >>> from your_app.models import Item  # Replace 'your_app' with the name of your application
        # Initial insertion with last update date
        >>> items = [
        ...     Item(id=1, name="Item 1", last_updated=timezone.now() - timedelta(days=1)),
        ...     Item(id=2, name="Item 2", last_updated=timezone.now()),
        ... ]
        >>> Item.objects.bulk_create(items)

        # Attempt to update with conditions
        >>> updated_items = [
        ...     Item(id=1, name="New Item 1", last_updated=timezone.now() - timedelta(days=2)),
        ...     Item(id=2, name="New Item 2", last_updated=timezone.now() - timedelta(days=3)),
        ... ]

        >>> Item.objects.bulk_create(
        ...     updated_items,
        ...     update_conflicts=True,
        ...     update_fields=["name", "last_updated"],
        ...     unique_fields=["id"],
        ...     condition=(
        ...         Q(last_updated__gt=F("last_updated") + timedelta(days=1))
        ...         | Q(last_updated__lt=Excluded("last_updated") - timedelta(hours=12))
        ...     ),
        ... )

    In this example, the rows are updated if ``last_updated`` is more recent than the recorded date plus one day,
    or if ``last_updated`` is earlier than the proposed one minus 12 hours ``(Excluded("last_updated") - timedelta(hours=12))``.

how do you explain all that addition in just 79 characters? or do you mean 79 characters per line?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"wrapped at 79 chars" not "limited to the 79 chars", so 79 chars per line.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot hard-code vendor names in expressions. Please add a new feature flag.

Copy link
Copy Markdown
Author

@HamaBarhamou HamaBarhamou Dec 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I @felixxm understand your concern about hard-coding vendor names. My primary goal is not to check if a backend supports Excluded, but to generate the correct SQL code depending on the backend for use cases like condition=Q(rank__gt=Excluded("number") + 1) & ~Q(name__startswith="A") | Q(number__gt=3).

I am very open to suggestions for improving this implementation. Perhaps we could consider a more abstract method in BaseDatabaseOperations, which can be adapted by each backend to handle Excluded appropriately, while maintaining a uniform and 'Django-esque' interpretation of Excluded across different backends. Your input on the best way forward would be highly valuable

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a new feature flag.

Have you seen my comment? ☝️

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a new feature flag.

Have you seen my comment? ☝️

I really don't know if we understand each other. I'll suggest something in the next revision and you'll let me know.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excluded makes sense only for backend that support supports_update_conflicts_with_condition, so there is not need to check vendor names you should check supports_update_conflicts_with_condition flag in the Excluded.as_sql(), e.g.

def as_sql(self, compiler, connection):
    if not connection.features.supports_update_conflicts_with_condition:
        raise NotSupportedError(...)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excluded makes sense only for backend that support supports_update_conflicts_with_condition, so there is not need to check vendor names you should check supports_update_conflicts_with_condition flag in the Excluded.as_sql(), e.g.

def as_sql(self, compiler, connection):
    if not connection.features.supports_update_conflicts_with_condition:
        raise NotSupportedError(...)

yes indeed. you're right. i hadn't thought of that.
However, our discussions have made me rethink my approach to the question. But I'll make a note of it in case I need to keep that approach.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compilation should be handled in the SQLInsertCompiler.as_sql(). There is no need to add a backend-specific hook for this, there is nothing backend-specific in the compilation, and there is no need to pass expression/query/sql/params back and forth. Moreover, moving it back to the SQLInsertCompiler.as_sql() will allow avoiding all unnecessary changes in DatabaseOperations.

Copy link
Copy Markdown
Author

@HamaBarhamou HamaBarhamou Dec 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compilation should be handled in the SQLInsertCompiler.as_sql(). There is no need to add a backend-specific hook for this, there is nothing backend-specific in the compilation, and there is no need to pass expression/query/sql/params back and forth. Moreover, moving it back to the SQLInsertCompiler.as_sql() will allow avoiding all unnecessary changes in DatabaseOperations.

@ngnpope @felixxm I must have misinterpreted it then:
#17515 (comment)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why it's not a subclass of F?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @felixxm, initially, Excluded was indeed a subclass of F. However, we encountered some limitations, particularly with the SQLite backend. When using operations like Excluded('rank') + 1, we found errors, for example, the absence of the _output_field_or_none attribute in Excluded objects. To solve these problems, Excluded was redesigned to inherit directly from Expression. This approach enabled us to handle the specifics of excluded values in conflicting update operations, while avoiding F inheritance problems."

Copy link
Copy Markdown
Member

@felixxm felixxm Dec 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And why you need output_field? Excluded doesn't depend on the output field or uses it.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You also documented Excluded as an extension of F which is currently not True.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And why you need output_field? Excluded doesn't depend on the output field or uses it.

here's my class as I originally defined it

class Excluded(F):
    def __init__(self, field_name):
        super().__init__(field_name)
        if isinstance(field_name, str):
            self.lhs = F(field_name)
        else:
            self.lhs = field_name

    def resolve_expression(
        self, query=None, allow_joins=True, reuse=None, summarize=False, for_save=False
    ):
        c = self.copy()
        c.lhs = self.lhs.resolve_expression(
            query, allow_joins, reuse, summarize, for_save
        )
        return c

    def as_sql(self, compiler, connection):
        lhs_sql, lhs_params = compiler.compile(self.lhs)
        db_vendor = connection.vendor

        if db_vendor in ["postgresql", "sqlite"]:
            sql = f"EXCLUDED.{lhs_sql.split('.')[-1]}"
        elif db_vendor == "mysql":
            sql = f"VALUES({lhs_sql.split('.')[-1]})"
        else:
            raise NotImplementedError(
                f"Backend '{db_vendor}' is not supported for the Excluded operation."
            )

        return sql, lhs_params

    def copy(self):
        copy = super().copy()
        copy.lhs = self.lhs
        return copy

it works for this kind of operation

UpsertConflict.objects.bulk_create(
        updated_items,
        update_conflicts=True,
        update_fields=["name", "rank"],
        unique_fields=["number"],
        condition=Q(rank__gt=Excluded("number") ) & ~Q(name__startswith="A") | Q(number__gt=3)
    )

but not for

UpsertConflict.objects.bulk_create(
        updated_items,
        update_conflicts=True,
        update_fields=["name", "rank"],
        unique_fields=["number"],
        condition=Q(rank__gt=Excluded("number") +1) & ~Q(name__startswith="A") | Q(number__gt=3)
    )

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remind you that it's on the sqlite backend that the problem lies.

Why Excluded is not required on PostgreSQL?

Why do we need Excluded to be a subclass of F?

Because it's natural, the only difference between F and Excluded is that you want to use EXCLUDED instead of a table name.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remind you that it's on the sqlite backend that the problem lies.

Why Excluded is not required on PostgreSQL?

Why do we need Excluded to be a subclass of F?

Because it's natural, the only difference between F and Excluded is that you want to use EXCLUDED instead of a table name.

allow me to submit my current work for further review and discussion. i feel there is a lot to be said for the keyword "Excluded".
Besides, I think there must be a serious problem under the hood of F. We'll talk about it later.

Copy link
Copy Markdown
Contributor

@sarahboyce sarahboyce Jan 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a suggestion, you might want to reduce the scope of this PR to getting condition into bulk_create which supports F and Q expressions. Then once that's landed, you can create a new PR with adding Excluded support. - was not a good suggestion sorry

@HamaBarhamou I would advise to pursue Mariusz suggestions and subclass F. I hear you're saying there are some test failures but we can update the solution as we go and figure those out.
Mariusz has years and years of Django ORM experience. I've always ended with something that has a much cleaner design after applying his feedback. We're aiming for something intuitive and maintainable as well as well functioning. So although you might feel like this is a backwards step, it might be one step back to go two steps forward.

Trying something else doesn't mean this doesn't work - we can always come back to this. ❤️

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a suggestion, you might want to reduce the scope of this PR to getting condition into bulk_create which supports F and Q expressions. Then once that's landed, you can create a new PR with adding Excluded support. - was not a good suggestion sorry

@HamaBarhamou I would advise to pursue Mariusz suggestions and subclass F. I hear you're saying there are some test failures but we can update the solution as we go and figure those out. Mariusz has years and years of Django ORM experience. I've always ended with something that has a much cleaner design after applying his feedback. We're aiming for something intuitive and maintainable as well as well functioning. So although you might feel like this is a backwards step, it might be one step back to go two steps forward.

Trying something else doesn't mean this doesn't work - we can always come back to this. ❤️

I agree with you @sarahboyce . But my problem is that I need a solution that works, which is not the case with @felixxm's proposal, and I had already implemented this approach without success. Normally the simple fact of inheriting from F should allow me to do operations like Ecluded('rank')+1. Perhaps this is a deeper problem that requires a deeper understanding of F. I'll try again to solve the problem with Ecluded as a subclass of F.
But if it doesn't work, I can't see the point of leaving a solution that does work to take a solution that has proven limits.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@felixxm , @sarahboyce this approach seems to work.

class Excluded(F):
    def resolve_expression(self, *args, **kwargs):
        super().resolve_expression(*args, **kwargs)
        self.output_field = None
        return self

    def as_sql(self, compiler, connection):
        if not connection.features.supports_excluded_expression:
            raise NotSupportedError(
                f"{connection.vendor} doesn't support partial updating rows on "
                "conflicts during INSERT"
            )
        quoted_field_name = compiler.quote_name_unless_alias(self.name)
        sql = f"EXCLUDED.{quoted_field_name}"
        return sql, []

    def __repr__(self):
        return f"{self.__class__.__name__}({self.name})"

    @cached_property
    def output_field(self):
        """Return the output type of this expressions."""
        output_field = self._resolve_output_field()
        if output_field is None:
            self._output_field_resolved_to_none = True
            raise FieldError("Cannot resolve expression type, unknown output_field")
        return output_field

    @cached_property
    def _output_field_or_none(self):
        """
        Return the output field of this expression, or None if
        _resolve_output_field() didn't return an output type.
        """
        try:
            return self.output_field
        except FieldError:
            if not self._output_field_resolved_to_none:
                raise

I only copied the functionsdef output_field(self)and def _output_field_or_none(self) from the BaseExpression class into Excluded.
knowing that Excluded inherits from F and F from Combinable and Combinable from nothing, and that BaseExpression doesn't intervene in the inheritance sequence, I wonder why Excluded needs output_field to work when it shouldn't.
But hey, it works, so I'm fine with it

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
raise NotImplementedError(
raise NotSupportedError(

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this is untested.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this is untested.

I don't understand. What hasn't been tested?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how to say it clearer 🤔 This line is untested. Raised error is untested.

Copy link
Copy Markdown
Author

@HamaBarhamou HamaBarhamou Dec 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i've rethought this approach. i'll update within 24 hours. hope you enjoy.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is nothing to reconsider. Tests are required for new lines.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is nothing to reconsider. Tests are required for new lines.

I'll make a note.

Copy link
Copy Markdown
Author

@HamaBarhamou HamaBarhamou Dec 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @felixxm i'm looking for a way to reproduce exactly the production environment on my machine for test execution. would you have a suggestion.

@HamaBarhamou HamaBarhamou force-pushed the feature/bulk-create-where-clause branch from 4eeaba2 to 43dece9 Compare December 21, 2023 11:10
@HamaBarhamou
Copy link
Copy Markdown
Author

HamaBarhamou commented Dec 21, 2023

Hello team @ngnpope, @felixxm, and @shangxiao 🙌,

I'm ready for another review. 🚀

I would like to bring to your attention the keyword EXCLUDED introduced in Django. I'm wondering if we should consider renaming EXCLUDED to avoid confusion with the EXCLUDED keyword specific to SQLite and PostgreSQL. From Django's perspective, EXCLUDED refers to the fields of new rows being inserted into the database, which implies using backend-specific keywords (like EXCLUDED, VALUES, NEW...). Hence, it seems necessary to clearly distinguish EXCLUDED in the Django context from EXCLUDED in the backend context.

What are your thoughts on this? Your insights and suggestions would be greatly appreciated. 😊👍

Thanks for your time and continued collaboration! 🌟

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move it directly to the Excluded.as_sql() as I requested:

    def as_sql(self, compiler, connection):
        if not connection.features.supports_update_conflicts_with_condition:
            raise NotSupportedError(
                f"{connection.vendor} doesn't support partial updating rows on "
                "conflicts during INSERT"
            ) 
        ...

@HamaBarhamou HamaBarhamou force-pushed the feature/bulk-create-where-clause branch from 46361a4 to 9d9e606 Compare December 22, 2023 19:36
@HamaBarhamou
Copy link
Copy Markdown
Author

I'm ready for a new revision

@HamaBarhamou
Copy link
Copy Markdown
Author

Hi @ngnpope @kezabelle , don't forget to check the "Patch needs improvement:" box if you've finished the revision.

@HamaBarhamou HamaBarhamou force-pushed the feature/bulk-create-where-clause branch from 1df4eb4 to b8b83c8 Compare March 12, 2024 15:14
@HamaBarhamou
Copy link
Copy Markdown
Author

Hi @ngnpope ready for another revision

@nessita
Copy link
Copy Markdown
Contributor

nessita commented Mar 12, 2024

Hi @ngnpope ready for another revision

Hello @HamaBarhamou! Thank you for your continued work in this branch. I wanted to mention two things:

  1. I believe that Nick asked you to please squash all commits into one. This does not seem to have done. If you have any doubts, this post is a good resource to learn.
  2. There is no need to mention people by nickname when a PR is ready for a re-review, reviewers will get to this when they have some availability. The proper procedure to get a re-review is described in these docs. You just need to unset the relevant ticket flags so this PR gets added back to the "patches needing review" section of the Django Developer Dahsboard.

You already did (2) which is great, but you'd still need to do the first request (rebase and squash). I'll set the ticket to "patch needs improvement" again just so we reflect the latest status following the reviewers comments.

@HamaBarhamou
Copy link
Copy Markdown
Author

  1. I believe that Nick asked you to please squash all commits into one. This does not seem to have done. If you have any doubts, this post is a good resource to learn.

Thank you for the resource

@HamaBarhamou HamaBarhamou force-pushed the feature/bulk-create-where-clause branch from b8b83c8 to 3284392 Compare March 13, 2024 09:42
@HamaBarhamou
Copy link
Copy Markdown
Author

Ready for another revision

Comment on lines +1743 to +1747
if not getattr(self.query.condition, "conditional", False):
raise ValueError(
"The 'condition' parameter of bulk_create must be a conditional "
"expression, an instance of Q, F, or Excluded."
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is untested

Comment on lines +1750 to +1752
compiler = query.get_compiler(connection=self.connection)
node = query.where
sql, params = compiler.compile(node)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
compiler = query.get_compiler(connection=self.connection)
node = query.where
sql, params = compiler.compile(node)
compiler = query.get_compiler(self.using, self.connection)
sql, params = compiler.compile(query.where)

This is still required from this comment.

Comment on lines +2481 to +2487
The ``Excluded`` class, extends Django's ``F``
class and is specifically designed to handle conflict resolution in bulk insert
operations. Unlike ``F`` expressions, which reference fields in the existing
database, ``Excluded`` refers to values in the proposed data for insertion.
Moreover, similar to ``F`` expressions, ``Excluded`` supports arithmetic and logical
operations, making it particularly useful in ``bulk_create(..., update_conflicts=True)``
scenarios where complex conditions are required.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should probably document Excluded separately in docs/ref/models/expressions.txt
Then we can have people link to the Excluded docs specifically
This would have a versionadded note

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should probably document Excluded separately in docs/ref/models/expressions.txt Then we can have people link to the Excluded docs specifically This would have a versionadded note

who are the people who need to create this link?
do you have any resources to share with me for this task?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation is written in reStructuredText and if you read Django's guide on writing documentation this also tells you how to reference and link things
One of the things I do is check how something similar is documented, find the source code for those docs and use that as a template. 👍

Comment on lines +302 to +308
# Does the backend support partial updating rows on conflicts during
# INSERT?
supports_update_conflicts_with_condition = False

# Does the backend support the special "excluded" table used to reference
# the values originally proposed for insertion in ON CONFLICT DO UPDATE?
supports_excluded_expression = False
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need two flags or can these be the same flag?
In other words, can we just use supports_update_conflicts_with_condition everywhere?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better separate as there is no guarantee that there will be this special table in other implementations. For example, if MySQL/MariaDB had supported conditional updates, we could have made the following change:

diff --git a/django/db/backends/mysql/operations.py b/django/db/backends/mysql/operations.py
index 7fabb0e05e..5bbed93a15 100644
--- a/django/db/backends/mysql/operations.py
+++ b/django/db/backends/mysql/operations.py
@@ -435,18 +435,18 @@ class DatabaseOperations(BaseDatabaseOperations):
     def on_conflict_suffix_sql(self, fields, on_conflict, update_fields, unique_fields):
         if on_conflict == OnConflict.UPDATE:
             conflict_suffix_sql = "ON DUPLICATE KEY UPDATE %(fields)s"
             # The use of VALUES() is deprecated in MySQL 8.0.20+. Instead, use
             # aliases for the new row and its columns available in MySQL
             # 8.0.19+.
             if not self.connection.mysql_is_mariadb:
                 if self.connection.mysql_version >= (8, 0, 19):
-                    conflict_suffix_sql = f"AS new {conflict_suffix_sql}"
-                    field_sql = "%(field)s = new.%(field)s"
+                    conflict_suffix_sql = f"AS EXCLUDED {conflict_suffix_sql}"
+                    field_sql = "%(field)s = EXCLUDED.%(field)s"
                 else:
                     field_sql = "%(field)s = VALUES(%(field)s)"
             # Use VALUE() on MariaDB.
             else:
                 field_sql = "%(field)s = VALUE(%(field)s)"
 
             fields = ", ".join(
                 [

This would have effectively allowed the excluded expression to work, but only for MySQL 8.0.19+. I realise this is hypothetical...

Comment on lines +992 to +995
@skipUnlessDBFeature(
"supports_update_conflicts", "supports_update_conflicts_with_condition"
)
@skipIfDBFeature("supports_excluded_expression")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@skipUnlessDBFeature(
"supports_update_conflicts", "supports_update_conflicts_with_condition"
)
@skipIfDBFeature("supports_excluded_expression")
@skipUnlessDBFeature(
"supports_update_conflicts_with_condition",
"supports_excluded_expression",
)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping these decorators is intentional: this test asserts that Excluded is not supported, so it only runs on backends that support conditional upserts but do not support Excluded; backends that do support Excluded must skip it

@HamaBarhamou
Copy link
Copy Markdown
Author

Hello,

I apologize for the recent silence, I'm currently busy. I just wanted to know how to proceed with the docs/releases/5.1.txt file, since Django 5.1 is out. Should I transfer my work to docs/releases/5.2.txt ?

@sarahboyce
Copy link
Copy Markdown
Contributor

I apologize for the recent silence, I'm currently busy. I just wanted to know how to proceed with the docs/releases/5.1.txt file, since Django 5.1 is out. Should I transfer my work to docs/releases/5.2.txt ?

Yes that's right ✅️ and no need to apologize

@rustybrooks
Copy link
Copy Markdown

Hello,

I was considering adding this feature myself, but looked to see if there was a PR in progress, and I'm glad to see that there is

Is there any update on the progress of this PR being accepted? It would be a great improvement!

@HamaBarhamou
Copy link
Copy Markdown
Author

HamaBarhamou commented Apr 17, 2025

Bonjour,

J'envisageais d'ajouter cette fonctionnalité moi-même, mais j'ai regardé s'il y avait une PR en cours, et je suis heureux de voir qu'il y en a une

Avez-vous des nouvelles de l'avancement de cette demande de participation ? Ce serait une grande amélioration !

thank you for your interest in this branch. i didn't have enough time, i admit, but i'm planning to start working on it again soon

@HamaBarhamou HamaBarhamou force-pushed the feature/bulk-create-where-clause branch from 3284392 to ade87ab Compare September 24, 2025 23:49
@HamaBarhamou
Copy link
Copy Markdown
Author

HamaBarhamou commented Sep 26, 2025

docs/releases/5.1.txt

hi @sarahboyce.
docs/releases/6.0.txt or docs/releases/6.1.txt ?

@ngnpope
Copy link
Copy Markdown
Member

ngnpope commented Sep 26, 2025

Hi. Sarah is away on maternity leave.

We've now hit feature freeze for 6.0 and the first alpha has been released, so this should now target 6.1 🙂

@HamaBarhamou HamaBarhamou force-pushed the feature/bulk-create-where-clause branch from 4a769d9 to 40e354d Compare September 26, 2025 14:03
Introduced a new `condition=` parameter for QuerySet.bulk_create() and
abulk_create() that appends a WHERE clause to ON CONFLICT DO UPDATE on
backends that support it (PostgreSQL and SQLite ≥ 3.42).

Also added the Excluded(name) expression (a subclass of F) to reference
the proposed row values (SQL EXCLUDED.*) within conflict conditions.

Backend feature flags:
- BaseDatabaseFeatures.supports_update_conflicts_with_condition
- BaseDatabaseFeatures.supports_excluded_expression

Updated docs for bulk_create (condition=) and added a new section for the
Excluded expression. Added synchronous and asynchronous tests covering Q,
F, and Excluded conditions.

When `condition` is None, behavior is unchanged; existing semantics for
update_conflicts/unique_fields are preserved.
…ion; adjusted tests.

- Added Excluded() docs in ref/models/expressions.
- Documented condition= for bulk_create in querysets and release notes (5.1 & 6.0).
- Added/updated tests for conditional updates and Excluded in bulk_create.
@HamaBarhamou HamaBarhamou force-pushed the feature/bulk-create-where-clause branch from 40e354d to f742709 Compare September 26, 2025 15:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants