-
Notifications
You must be signed in to change notification settings - Fork 257
feat: variable length pattern matching #685
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
JFYI, we have a configured pre-commit hook |
|
@SemyonSinchenko Thanks for letting me know. It seems that it failed due to the formatting. I pushed the formatted result by the |
We have quite a strict rules for scalafix... Just add something like case _ => throw new GraphFramesUnreachableException(), should fix it. |
SemyonSinchenko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks @goungoun !
|
Thanks! @SemyonSinchenko. |
rjurney
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add one more unit test.
|
There are two ways to handle quantified path patterns. Everyone wants to use the pattern such as |
This is excellent. Do you think you know how to implement variable length patterns in another PR? That is a highly desirable, requested feature! |
|
@rjurney I think |
|
I didn't implement the parser side, still tricky, but it is better to see the code to validate what it looks like. |
|
The variable pattern relies on the dataframe |
|
I've added more tests on the edge cases and checked the error message. Not supported: |
|
@rjurney @SemyonSinchenko, There were several commits to cover more cases and edge cases. It is not easy to track changes from the original branch. How can I make it easier for your review? Is it better to open a new PR with a new branch? |
At least if we merge the first PR from you, I won't need anymore to approve each CI run :) |
|
@goungoun I would like to mention it again, that we have a |
|
@SemyonSinchenko Thanks for your help. By the way, the named edge pattern from the Issue #539 is not supported. Not Supported: |
|
@goungoun Tbh Im fine with it. There will be another and more Cypher-like API built on top of the PropertyGraphFrame that will cover that. I would like to leave the GraphFrames pattern matching more for motiffs finding instead of MATCH-like queries. |
|
@rjurney what do you think? |
|
@SemyonSinchenko I referenced the Issue #539 and support named edge. If the edge is named e, it will generate _e1, _e2, _e3. Supported: Not Supported: |
rjurney
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
| withClue("Failed to catch parse error with unsupported variable length pattern") { | ||
| intercept[InvalidParseException] { | ||
| Pattern.parse("(u)-[*2..]->(v)") | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahhh thank you for this clarification! After we merge this PR, want to take on variable lengths as well? :)
|
@goungoun please let us know if you need any support for more contributions! Thanks! |
|
@rjurney @SemyonSinchenko It is merged to the master! Thank you so much! |
Thanks for the contribution! |
What changes were proposed in this pull request?
This PR implements variable-length pattern matching. e.g.
(u)-[*3]->(v),(u)-[*1..3]->(v)https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf (page 15)
Why are the changes needed?
Users must manually write a long pattern without the feature. After the change, the parser can handle the pattern by creating intermediate vertices and edges. For example
(u)-[*3]->(v)is converted to(u)-[]->(_v1);(_v1)-[]->(_v2);(_v2)-[]->(v);internally. It also supports named edge(u)-[e*3]->(v)creating _e1, _e2, _e3.The variable length pattern is implemented on top of the fixed-length pattern. It is split into fixed-length pattern. For example
(u)-[*2..5]->(v)is(u)-[*2]->(v)or(u)-[*3]->(v)or(u)-[*4]->(v)or(u)-[*5]->(v). Then union all the results.