Precompiled queries #25009

@roji

This tracks the precompiled query feature, where EF generates interceptors at publish time to intercept static LINQ query operators and execute the query directly, without going through query compilation at runtime. The benefits:

  • NativeAOT support: precompilation is (currently) a prerequisite, since we're not (yet) going to make query compilation NativeAOT-compatible (this may be done in EF 10 to support dynamic queries).
  • Faster query execution: no more parameter extraction, cache lookup, etc. This is similar to compiled queries but goes even further, and doesn't require the user to use any special APIs.
  • Faster startup: no more EF query compilation.

Note that although this is a prerequisite to NativeAOT support, precompiled queries can be used in non-NativeAOT applications to get the above performance benefits (faster execution and startup time).

The main subtasks are tracked on the general NativeAOT epic.

  • Integrate precompilation into dotnet publish, and possibly also into dotnet ef for manual precompilation.
    • Note that we'll need to be able to raise warnings from this step (e.g. some query failed compilation, or a dynamic query was detected), and to cause the publish to fail (warnings as errors).

** PREVIOUS DESIGN THOUGHTS **

General design

When a LINQ query is first encountered, EF "compiles" it, producing a code-generated shaper, SQL (for relational databases), etc. This process is both fairly slow (increasing startup times) and incompatible with AOT environments (since code generation is used at runtime). While several approaches have been discussed in the past to improve this (e.g. #16496), source generators open up some new possibilities. I've done some work on a proof-of-concept source generator which identifies EF queries and precompiles them; the work is far from complete but indicates that the approach is feasible.

In a nutshell, we would:

  1. Identify a query in user source code
    • A first implementation would identify invocations of EF's compiled query API (EF.CompileQuery); this is a trivial and low-risk way to immediately identify EF queries in the user's code.
    • We could later also attempt to precompile regular queries which don't use EF.CompileQuery. This would be an additional step in which we identify DbSets (as member accesses on a DbContext-typed identifier), and then walk up the syntax tree, progressively including methods as long as they accept IQueryable. Once we reach a method which doesn't accept IQueryable (e.g. ToList), we've reached the end of the query to be compiled.
    • Dynamically-constructed queries wouldn't be supported.
  2. Transform the query to a LINQ expression tree
    • Once we have a Roslyn syntax tree representing a query (either from EF.CompileQuery or from a regular query), it needs to be transformed into a LINQ expression tree, which is what EF's query pipeline requires.
    • Unlike the Roslyn structures, LINQ expression trees refer to actual .NET types, MemberInfos, etc. We would therefore need to load the user's assembly (from the input compilation given to the source generator), and use reflection to load actual types from it (e.g. entity CLR types). See note on AssemblyLoadContext below.
  3. Compile the query with EF Core
    • Once we have a LINQ expression tree, we need to pass it to EF's query compiler. To do this:
      • We instantiate the user's DbContext type, using the parameterless constructor
      • Extract the IQueryCompiler service from it
      • Invoke the compiler, passing it the LINQ expression tree.
    • The output of this compilation is another LINQ expression tree, which instantiates e.g. a SingleQueryingEnumerable given a QueryContext. This output tree must not contain any compiled elements, e.g. the shaper must be present in non-compiled form. This would require some refactoring of the last parts of the query pipeline.
  4. Generate C# out of the compilation output
    • In the normal flow, the output LINQ expression tree is now compiled to produce a lambda (returning e.g. an enumerable given a QueryContext).
    • In the AOT flow, the expression tree would instead be emitted as C# code into a file produced by the source generator. This generated code would be invoked by EF as part of startup, and would pre-populate its query cache.
    • This would require writing a component to convert a LINQ expression tree to C# code, possibly passing through a Roslyn syntax tree for maximum flexibility.
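
To make steps 2 and 4 more concrete, here is a minimal sketch that uses only System.Linq.Expressions and LINQ to Objects, not EF's query pipeline. It builds a tree for a Where query by resolving members and methods through reflection, and includes a toy converter that renders the tree as C# text. The names Blog, PrecompilationSketch, BuildQuery and ToCSharp are hypothetical, and the converter handles only the handful of node kinds this example needs and emits text directly rather than going through a Roslyn syntax tree.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical entity; in the real flow this type would be loaded from the user's assembly.
public class Blog
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

public static class PrecompilationSketch
{
    // Step 2: build the tree for source.Where(b => b.Id > minId), resolving
    // members and methods through reflection rather than compile-time references.
    public static LambdaExpression BuildQuery(Type entityType, int minId)
    {
        Type queryableType = typeof(IQueryable<>).MakeGenericType(entityType);
        ParameterExpression source = Expression.Parameter(queryableType, "source");
        ParameterExpression b = Expression.Parameter(entityType, "b");
        PropertyInfo id = entityType.GetProperty("Id")
            ?? throw new ArgumentException("Entity type needs an Id property.");
        LambdaExpression predicate = Expression.Lambda(
            Expression.GreaterThan(Expression.Property(b, id), Expression.Constant(minId)), b);

        // Pick Where(IQueryable<T>, Expression<Func<T, bool>>), not the indexed overload.
        MethodInfo where = typeof(Queryable).GetMethods()
            .Single(m => m.Name == nameof(Queryable.Where)
                && m.GetParameters()[1].ParameterType.GetGenericArguments()[0]
                    .GetGenericArguments().Length == 2)
            .MakeGenericMethod(entityType);

        return Expression.Lambda(
            Expression.Call(where, source, Expression.Quote(predicate)), source);
    }

    // Step 4 (AOT flow): render a tree as C# text. Unsupported nodes throw
    // instead of silently producing wrong code.
    public static string ToCSharp(Expression e) => e switch
    {
        LambdaExpression l =>
            $"({string.Join(", ", l.Parameters.Select(p => p.Name))}) => {ToCSharp(l.Body)}",
        ParameterExpression p => p.Name ?? throw new NotSupportedException("Unnamed parameter."),
        MemberExpression { Expression: not null } m => $"{ToCSharp(m.Expression!)}.{m.Member.Name}",
        ConstantExpression { Value: int i } => i.ToString(CultureInfo.InvariantCulture),
        ConstantExpression { Value: string s } =>
            "\"" + s.Replace("\\", "\\\\").Replace("\"", "\\\"") + "\"",
        UnaryExpression { NodeType: ExpressionType.Quote } q => ToCSharp(q.Operand),
        MethodCallExpression { Object: null } c =>
            $"{c.Method.DeclaringType!.Name}.{c.Method.Name}" +
            $"({string.Join(", ", c.Arguments.Select(a => ToCSharp(a)))})",
        BinaryExpression bin =>
            $"({ToCSharp(bin.Left)} {Operator(bin.NodeType)} {ToCSharp(bin.Right)})",
        _ => throw new NotSupportedException($"Node type {e.NodeType} is not handled by this sketch.")
    };

    private static string Operator(ExpressionType type) => type switch
    {
        ExpressionType.GreaterThan => ">",
        ExpressionType.LessThan => "<",
        ExpressionType.Equal => "==",
        ExpressionType.AndAlso => "&&",
        _ => throw new NotSupportedException($"Operator {type} is not handled by this sketch.")
    };
}
```

In the normal flow the tree is compiled at runtime, e.g. `(Func<IQueryable<Blog>, IQueryable<Blog>>)PrecompilationSketch.BuildQuery(typeof(Blog), 1).Compile()`. In the AOT flow the same tree would be passed to ToCSharp, which for this query yields `(source) => Queryable.Where(source, (b) => (b.Id > 1))`. EF's real output tree is far richer (shapers, command caches), so a production converter would need to cover many more node kinds.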

The final code added by the source generator would look something like the following:

var selectExpression = ...;

var readColumns = ...;

var relationalCommandCache = new RelationalCommandCache(
    memoryCache,
    querySqlGeneratorFactory,
    relationalParameterBasedSqlProcessorFactory,
    selectExpression,
    readColumns,
    useRelationalNulls: false
);

var shaper = ...;

var enumerable = new SingleQueryingEnumerable<Blog>(
    (RelationalQueryContext)QueryCompilationContext.QueryContextParameter,
    relationalCommandCache,
    shaper,
    typeof(Blog),
    standAloneStateManager: false,
    detailedErrorsEnabled: false,
    threadSafetyChecksEnabled: true);

// Pre-populate EF Core's cache with the above enumerable

Additional notes:

  • The above does not cover relational command caching (including SQL), which depends on parameter nullability. This means that some query compilation still remains at runtime (but no code generation).
  • We may be able to reuse previously-precompiled queries if their source file hasn't changed (e.g. by storing file hashes). This would also make the feature suitable for speeding up the developer inner loop.
  • Query precompilation isn't necessarily dependent on using compiled models (#1906, "Reduce EF Core application startup time via compiled models"), though using them would speed the process up.
  • This could be helpful (thanks @bricelam)
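
The file-hash idea in the notes above could look roughly like the following sketch. SourceHashCache and NeedsPrecompilation are hypothetical names, the cache lives only in memory (a real tool would have to persist it between builds), and it assumes that a query's precompiled output depends only on its own source file, which presumably would not hold when the model or referenced entity types change.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper: remembers a content hash per source file so that
// unchanged files can skip query precompilation.
public sealed class SourceHashCache
{
    private readonly Dictionary<string, string> _hashes = new(StringComparer.Ordinal);

    // Returns true when the file is new or its content changed since the
    // previous call, and records the new hash in either case.
    public bool NeedsPrecompilation(string path)
    {
        string hash = Convert.ToHexString(SHA256.HashData(File.ReadAllBytes(path)));
        if (_hashes.TryGetValue(path, out string? previous) && previous == hash)
        {
            return false;
        }

        _hashes[path] = hash;
        return true;
    }
}
```

Hashing file contents rather than comparing timestamps means a file that is touched or re-saved without changes would not trigger precompilation again.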

EDIT: Following internal discussion it has become clear that doing this as a source generator isn't practical (see #25009 (comment) below). Instead, this would be a design-time CLI command or similar.

  • This would most likely be opt-in-only (via a csproj property), and probably makes most sense in Release builds.
  • When loading user assemblies (and their dependencies), we probably want to isolate them in their own AssemblyLoadContext. This isn't trivial: we need to take the Roslyn-provided syntax trees and semantic models (which live in the default assembly loader), transform them into an expression tree, and pass that into the query pipeline isolated inside the special AssemblyLoadContext. My prototype uses the default AssemblyLoadContext to avoid these issues.
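
For reference, an isolated load context for user assemblies might be sketched as below. UserAssemblyLoadContext is a hypothetical name; the sketch only shows the loading side and does not address the harder problem described above, namely that a Type loaded inside the isolated context is a different Type object from the same type loaded in the default context.

```csharp
using System;
using System.Reflection;
using System.Runtime.Loader;

// Hypothetical helper: loads a user assembly and its dependencies into a
// separate, collectible AssemblyLoadContext.
public sealed class UserAssemblyLoadContext : AssemblyLoadContext
{
    private readonly AssemblyDependencyResolver _resolver;

    public UserAssemblyLoadContext(string mainAssemblyPath)
        : base(name: "UserAssemblies", isCollectible: true)
    {
        // Resolves dependencies relative to the user's main assembly.
        _resolver = new AssemblyDependencyResolver(mainAssemblyPath);
    }

    protected override Assembly? Load(AssemblyName assemblyName)
    {
        // Returning null lets the runtime fall back to the default context,
        // which is how assemblies not found next to the user's assembly end up shared.
        string? path = _resolver.ResolveAssemblyToPath(assemblyName);
        return path is null ? null : LoadFromAssemblyPath(path);
    }
}
```

A caller would create the context with the path of the user's compiled assembly and call LoadFromAssemblyPath on it. Which assemblies to share with the default context (for example EF Core itself) and which to isolate is a design decision this sketch leaves open.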
