Supporting _include:iterate and _revinclude:iterate #1314
Supporting _include:iterate and _revinclude:iterate #1314limorl merged 24 commits intomicrosoft:masterfrom
Conversation
…ression_WhenSearchedWithPost_ThenCorrectBundleShouldBeReturned() due to count summary
|
FYI @eladiw |
|
This pull request introduces 1 alert when merging 98de3ca into 2e8c1b6 - view on LGTM.com new alerts:
|
eladiw
left a comment
There was a problem hiding this comment.
Great work.
I had some questions and comments.
src/Microsoft.Health.Fhir.Shared.Core/Features/Search/SearchOptionsFactory.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Health.Fhir.Shared.Core/Features/Search/SearchOptionsFactory.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Health.Fhir.Shared.Core/Features/Search/SearchOptionsFactory.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Health.Fhir.Shared.Core/Features/Search/SearchOptionsFactory.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Health.Fhir.Shared.Core/Features/Search/SearchOptionsFactory.cs
Outdated
Show resolved
Hide resolved
...lth.Fhir.SqlServer/Features/Search/Expressions/Visitors/QueryGenerators/SqlQueryGenerator.cs
Outdated
Show resolved
Hide resolved
...lth.Fhir.SqlServer/Features/Search/Expressions/Visitors/QueryGenerators/SqlQueryGenerator.cs
Show resolved
Hide resolved
...lth.Fhir.SqlServer/Features/Search/Expressions/Visitors/QueryGenerators/SqlQueryGenerator.cs
Show resolved
Hide resolved
...lth.Fhir.SqlServer/Features/Search/Expressions/Visitors/QueryGenerators/SqlQueryGenerator.cs
Show resolved
Hide resolved
...lth.Fhir.SqlServer/Features/Search/Expressions/Visitors/QueryGenerators/SqlQueryGenerator.cs
Outdated
Show resolved
Hide resolved
| <ItemGroup> | ||
| <ProjectReference Include="..\Microsoft.Health.Fhir.SqlServer\Microsoft.Health.Fhir.SqlServer.csproj" /> | ||
| <ProjectReference Include="..\Microsoft.Health.Fhir.Core\Microsoft.Health.Fhir.Core.csproj" /> | ||
| <ProjectReference Include="..\Microsoft.Health.Fhir.Stu3.Core\Microsoft.Health.Fhir.Stu3.Core.csproj" /> |
There was a problem hiding this comment.
Is it possible to avoid the version specific reference here?
There was a problem hiding this comment.
That was the only way I could get it working. Had to add it when added the new IncludeRewriterTests.cs file.
Do you have any suggestion? I'd be happy to give it a try.
There was a problem hiding this comment.
Suggestion above. I'm assuming this is to fulfill the requirement for ModelInfoProvider?
|
This pull request introduces 2 alerts when merging c38223b into 6468d57 - view on LGTM.com new alerts:
|
src/Microsoft.Health.Fhir.Core/Features/Search/Expressions/Parsers/ExpressionParser.cs
Outdated
Show resolved
Hide resolved
| return new IncludeExpression(resourceType, refSearchParameter, originalType.ToString(), targetType, wildCard, isReversed); | ||
| referencedTypes = new List<string>(); | ||
| _searchParameterDefinitionManager.GetSearchParameters(resourceType) | ||
| .Where(p => p.Type == ValueSets.SearchParamType.Reference).ToList() |
There was a problem hiding this comment.
This looks pretty allocation-heavy. Also, it seems like something we could cache (and populate lazily).
There was a problem hiding this comment.
Agree. The challenge was to propagate the searchParameterDefinitionManager to the IncludeExpression, so that it will be used upon need. That didn't make much sense to me since IncludeExpression represents a parsed expression, so ideally searchParameterDefinitionManager shouldn't be used in it.
Maybe we can use a delegate that is applied once when ReferencedTypes is used and populates it. WDYT?
There was a problem hiding this comment.
Actually, after giving it another thought, It’s worth noting that it will always be used when the expression is _include wildcard since it adds the potential reference types to the cte dictionary to be used later. .
So I suggest to at least populate it for wildcard paramegers only.
WDYT?
src/Microsoft.Health.Fhir.Core/Features/Search/Expressions/IncludeExpression.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Health.Fhir.Core/Features/Search/Expressions/Parsers/ExpressionParser.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Health.Fhir.Shared.Core/Features/Search/SearchOptionsFactory.cs
Show resolved
Hide resolved
src/Microsoft.Health.Fhir.Shared.Core/Features/Search/SearchOptionsFactory.cs
Outdated
Show resolved
Hide resolved
| // Order the following query: | ||
| // [base]/MedicationDispense?_include:iterate=Patient:general-practitioner&_include:iterate=MedicationRequest:patient&_include=MedicationDispense:prescription&_id=smart-MedicationDispense-567 | ||
|
|
||
| var refSearchParameter = _searchParameterDefinitionManager.GetSearchParameter("Patient", "general-practitioner"); |
There was a problem hiding this comment.
One suggestion I have for avoiding references to SPDM is to just create the Search Parameter you need (e.g. new SearchParameterInfo("_id", SearchParamType.Token, new Uri(ResourceId))). If this doesn't work let me know, I'll create a task to follow up
There was a problem hiding this comment.
It's more challenging to create SearchParameterInfo since I also need to populate the URI and the target types, etc. I think it's better to use SPDM, since this is how it is being used in production code and it populates all the fields correctly. Otherwise, we may miss populating one of the fields and we won't be testing with the right entities.
WDYT?
There was a problem hiding this comment.
I do agree in principle, however, by only linking to Stu3 it misses R4 and R5 Search parameter definitions which may be different. Keeping the DataLayer projects free of version specific references has been very helpful with maintaining multiple versions of FHIR. If we want the full coverage in all versions of FHIR would it make sense to move these to https://github.com/microsoft/fhir-server/tree/master/test/Microsoft.Health.Fhir.Shared.Tests.Integration?
I know its a SQL Specific rewriter, but there isn't a SQL unit test project that offers this coverage, thoughts? //cc @johnstairs
There was a problem hiding this comment.
The way I see it, the R4 and R5 parameter definitions are not needed for this uni tests since we generated the relevant test cases for the unit tests using Stu3 parameters. We are not checking all the possible parameters and their version fit, we are checking a sorting algorithm... we need a few parameters (not all of them)
| <ItemGroup> | ||
| <ProjectReference Include="..\Microsoft.Health.Fhir.SqlServer\Microsoft.Health.Fhir.SqlServer.csproj" /> | ||
| <ProjectReference Include="..\Microsoft.Health.Fhir.Core\Microsoft.Health.Fhir.Core.csproj" /> | ||
| <ProjectReference Include="..\Microsoft.Health.Fhir.Stu3.Core\Microsoft.Health.Fhir.Stu3.Core.csproj" /> |
There was a problem hiding this comment.
Suggestion above. I'm assuming this is to fulfill the requirement for ModelInfoProvider?
...lth.Fhir.SqlServer/Features/Search/Expressions/Visitors/QueryGenerators/SqlQueryGenerator.cs
Show resolved
Hide resolved
...lth.Fhir.SqlServer/Features/Search/Expressions/Visitors/QueryGenerators/SqlQueryGenerator.cs
Show resolved
Hide resolved
|
It looks like the link url in the bundle is leaving out the GET https://localhost:44348/MedicationRequest?_include=MedicationRequest:patient&_include:iterate=Patient:general-practitioner&_id=123{
"resourceType": "Bundle",
"id": "949805aa-9bc2-4195-b581-98df15005424",
"meta": {
"lastUpdated": "2020-10-09T20:29:01.8167293+00:00"
},
"type": "searchset",
"link": [
{
"relation": "self",
"url": "https://localhost:44348/MedicationRequest?_include=MedicationRequest%3Apatient&_id=123"
}
]
} |
src/Microsoft.Health.Fhir.Shared.Core/Features/Search/SearchOptionsFactory.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Health.Fhir.SqlServer/Features/Search/Expressions/Visitors/IncludeRewriter.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Health.Fhir.SqlServer/Features/Search/Expressions/Visitors/IncludeRewriter.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Health.Fhir.SqlServer/Features/Search/Expressions/Visitors/IncludeRewriter.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Health.Fhir.SqlServer/Features/Search/Expressions/Visitors/IncludeRewriter.cs
Outdated
Show resolved
Hide resolved
@johnstairs good catch! I removed them from the unsupported parameters list and added validation tests. |
…erBy ascending instead of descending
…t showing up in the bundle url
|
Parenthesis indentation is inconsistent (let's follow the style of cte7 AS
(
SELECT DISTINCT Sid1, IsMatch, 0 AS IsPartial
FROM cte6
),
cte8 AS
(
SELECT DISTINCT refTarget.ResourceSurrogateId AS Sid1, 0 AS IsMatch
FROM dbo.ReferenceSearchParam refSource
INNER JOIN dbo.Resource refTarget
ON refSource.ReferenceResourceTypeId = refTarget.ResourceTypeId
AND refSource.ReferenceResourceId = refTarget.ResourceId
WHERE refSource.SearchParamId = @p11
AND refTarget.IsHistory = 0
AND refSource.IsHistory = 0
AND refSource.ResourceTypeId = @p12
AND refSource.ResourceSurrogateId IN (SELECT TOP(@p13) Sid1 FROM cte3)
),
cte9 AS
(
SELECT DISTINCT TOP (@p14) Sid1, IsMatch, CASE WHEN count(*) over() > @p15 THEN 1 ELSE 0 END AS IsPartial
FROM cte8
),
cte10 AS
(
SELECT DISTINCT refTarget.ResourceSurrogateId AS Sid1, 0 AS IsMatch
FROM dbo.ReferenceSearchParam refSource
INNER JOIN dbo.Resource refTarget
ON refSource.ReferenceResourceTypeId = refTarget.ResourceTypeId
AND refSource.ReferenceResourceId = refTarget.ResourceId
WHERE refSource.SearchParamId = @p11
AND refTarget.IsHistory = 0
AND refSource.IsHistory = 0
AND refSource.ResourceTypeId = @p12
AND refSource.ResourceSurrogateId IN (SELECT TOP(@p16) Sid1 FROM cte7)
), |
...lth.Fhir.SqlServer/Features/Search/Expressions/Visitors/QueryGenerators/SqlQueryGenerator.cs
Show resolved
Hide resolved
...lth.Fhir.SqlServer/Features/Search/Expressions/Visitors/QueryGenerators/SqlQueryGenerator.cs
Show resolved
Hide resolved
@johnstairs Yes, this is because SQLQueryGenerator uses Microsoft.Health.SqlServer.IndentedStringBuilder which has no option to un-indent (removing chars from the end doesn;t work and is quite ugly). Note that when calling recursively to AcceptVisitor, in the case where we are iterating over multiple results sets, that is done within an already indented expression sue to the main call: |
True, but I thought we agreed it will be in a separated PR. We even have an open issue for it: #1309 |
|
Regarding the indentation, let's revisit later. We can expose the indentation level as a property on the writer, but I think I would look at an approach that does not require recursion. But later... |
|
It looks like we still are applying a TOP clause on the intersection for non-iterate includes: DECLARE @p0 SmallInt = 77;
DECLARE @p1 VarChar(64) = '1';
DECLARE @p2 Int = 11;
DECLARE @p3 SmallInt = 828;
DECLARE @p4 Int = 10;
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
WITH cte0 AS
(
SELECT ResourceSurrogateId AS Sid1
FROM dbo.Resource
WHERE IsHistory = 0
AND IsDeleted = 0
AND ResourceTypeId = @p0
AND ResourceId = @p1
),
cte1 AS
(
SELECT DISTINCT TOP (@p2) Sid1, 1 AS IsMatch, 0 AS IsPartial
FROM cte0
ORDER BY Sid1 ASC
),
cte2 AS
(
SELECT DISTINCT refTarget.ResourceSurrogateId AS Sid1, 0 AS IsMatch
FROM dbo.ReferenceSearchParam refSource
INNER JOIN dbo.Resource refTarget
ON refSource.ReferenceResourceTypeId = refTarget.ResourceTypeId
AND refSource.ReferenceResourceId = refTarget.ResourceId
WHERE refSource.SearchParamId = @p3
AND refTarget.IsHistory = 0
AND refSource.IsHistory = 0
AND refSource.ResourceTypeId = @p0
AND refSource.ResourceSurrogateId IN (SELECT TOP(@p4) Sid1 FROM cte1)
),
cte3 AS
(
SELECT DISTINCT Sid1, IsMatch, 0 AS IsPartial
FROM cte2
),
...I don't know if that's new from this PR... |
It wasn't introduced by this PR. I only removed the TOP for iterate expressions since they select from an IncludeLimit cte. I think we should change the non-iterate Top in a separate PR. The include limit feature is not tested AFAIK and there's a bit of a challenge to test it by providing a test configuration. It deserves attention. |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
@johnstairs The build seems to fail on issues not related to the PR changes, but rather environment setup. |
|
@limorl Sorry, your PR environment got stuck in a bad state due to a provisioning issue we were having last week. I'm cleaning up the environment and will kick off a new build. |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
| private class IncludeExpressionTopologicalSort | ||
| { | ||
| // Based on Khan's algorithm. See https://en.wikipedia.org/wiki/Topological_sorting. | ||
| // The search queries are acyclic. |
There was a problem hiding this comment.
How do we know that they are acyclic?
There was a problem hiding this comment.
Perhaps a better way to say it was that we are not supporting cyclic queries (cyclic queries are actually recursive queries). According to the Fhir Team, it's up to the server implementor to decide how deep.
We agreed that recursive queries are out of scope.
If it is a cyclic query (recursive query), we can throw an error. It's really a product decision.
There was a problem hiding this comment.
Yes, I think we should throw an error if we encounter a cycle.
| { | ||
| // Based on Khan's algorithm. See https://en.wikipedia.org/wiki/Topological_sorting. | ||
| // The search queries are acyclic. | ||
| public static IList<TableExpression> Sort(IEnumerable<TableExpression> includeExpressions) |
There was a problem hiding this comment.
The complexity of this implementation is pretty high. My thought would have been to think of it in this way:
Suppose you have a query like
/MedicationDispense?_include=MedicationDispense:context&_include:iterate=Encounter:appointment&_include:iterate=Encounter:based-on&_include:iterate=Appointment:patient&_include:iterate=ServiceRequest:specimen
I would model it in this way:
The edges arise from Requires to Produces, indicating a dependency. I would then do a DFS to create a topological sort. You would need a dictionary mapping a type name to the nodes that "produce it".
What am I missing?
There was a problem hiding this comment.
That's pretty much what I did. Only that I was looking at all combinations of _include and _revinclude pairs.
Note that for _revinclude , the Requires and Produces are reversed. For example _revinclude=MedicationDispense:prescription requires MedicationRequest and produces MedicationDispense.
So basically, the direction of the edge is determined by the IsDependencyEdge() function.
The runtime of the topological sort (Khan's algorithm) is O(V+E) and the implementation is simple IMO.
And building the graph was O(V^2).
There was a problem hiding this comment.
What we could do to make it cleaner and more intuitive is basically add IReadOnlyCollection Requires and IReadOnlyCollection Produces properties to IncludeExpression. This way IsDependencyEdge(x,y) is replaced with asking if x.Requires.Intersection(y.Produces).Any().
You can see how it looks like on my private branch: limorl@4e9a31d
The sort itself is simple enough and of the same complexity of DFS O(V+E), after the graph is built.
Are you ok with my suggestion above or should we keep as is?
There was a problem hiding this comment.
That is a lot clearer. I've expanded on it to show you what I had in mind: ca25c8b. (Quickly implemented - would not claim it's properly tested. Also would need to add a proper error message for the cycle detection and a test)
There was a problem hiding this comment.
Nice commit! I like the a AssertAppearsBefore, I was thinking about implementing it, but it wasn't needed in most tests after all
I'll adopt some of your changes, but continue with Khan's implementation if that's ok.
I will add the error.
…lude iterate queries
| } | ||
| } | ||
| } | ||
| throw new InvalidSearchOperationException("Cyclic include iterate queries are not allowed"); |
There was a problem hiding this comment.
"not supported". And please put the string in a resx file.
There was a problem hiding this comment.
Oops.. Saw that after I merged.
The reason I didn;t add the string to resx is that I wanted to be consistent with how all the Rewriters throw InvalidSearchOperationException.
Will fix in a new PR
Description
The :iterate modifier (:recurse in STU3) allows to query iteratively for resources.
For example, returning MedicationRequest ‘mr123’, and the Patient resources it references and those patient's Practitioner:
/MedicationRequest?_include=MedicationRequest:patient&_include:iterate=Patient:general-practitioner&_id=mr123The solution is based mainly on updating SqlQueryGenerator to store a dictionary of result resource type and the cte names associated with them. This way, when iterating over results, the selection is done from the relevant cte (or ctes in case of multiple results sets).
For this to work, the IncludeRewriter should have been updated to sort the search parameters so that an _include/_revinclude parameter will appear after all the expressions yielding the result sets it selects from.
The feature doesn't fully implement the spec and some capabilities were left out following a discussion with the PG.
In scope of this PR:
Supporting _include:iterate expressions, with single/multiple iterations and with wildcards.
For example:
/MedicationRequest?_include=MedicationRequest:patient&_include:iterate=Patient:general-practitioner&_id=mr123MedicationRequest?_include:MedicationRequest:*&_include:iterate=Patient:general-practitionerMedicationRequest&_include:MedicationRequest:patient&_include:iterate=Patient:* MedicationRequest?_include:MedicationRequest:*&_include:iterate=Patient:*Supporting _revinclude:iterate expressions with single/multiple iterations and with wildcard.
Practitioner?_revinclude=Patient:*&_revinclude:iterate=MedicationRequest:patientPractitioner?_revinclude:iterate=MedicationRequest:*&_revinclude=Patient:general-practitionerPatient?_revinclude:iterate=MedicationDispense:*&_revinclude=MedicationRequest:*Out of scope of this PR:
Support recursive _include:iterate and _revinclude:iterate queries #1310
Recursive iterate expressions are expression where the Source and the Target are of the same type. For example:
Organization?_include:iterate=Organization:partof Patient&_include=Patient:organization&_include:iterate=Organization:partofSupporting _revinclude:iterate expressions where there is more than a single target type and the target type is not specified:
Unsupported:
Patient?_revinclude=MedicationRequest:patient&_revinclude:iterate=MedicationDispense:performerSupported:
Patient?_revinclude=MedicationRequest:patient&_revinclude:iterate=MedicationDispense:performer:patientSupported:
Patient?_revinclude=MedicationRequest:patient&_revinclude:iterate=MedicationDispense:prescription (since prescription can be only of type MedicationRequest)_revinclude:iterate with WildCard. If the wildcard is associated with the _revinclude:iterate parameter, it is not supported. However, _revinclude:iterate can be used to iteratively query a results set of a wildcard expression:
Unsupported:
Practitioner?_revinclude=Patient:general-practitioner&_revinclude:iterate=MedicationRequest:*Supported: Practitioner?_revinclude=Patient:*&_revinclude:iterate=MedicationRequest:patient
Limit the number of included results for _include, and not only for _revinclude #1309
The current implementation limits the item count of each included result set. In case where _include:iterate (or _revinclude:iterate) is applied on more than one result set, each application is limited.
For example:
MedicationDispense?_include=MedicationDispense:patient&_include=MedicationDispense:prescription&_include:iterate=MedicationRequest:patient&_include:iterate=Patient:general-practitionerHere, '_include:iterate=Patient:general-practitioner' refers to two included result sets (ctes):
MedicationDispense:patient' andMedicationRequest:patient`. So each selection will be limited.Related issues
Addresses #1313
Testing