feat: initial top-level structure of PropertyGraph #613
@james-willis It looks like I have addressed all the comments, thanks a lot for the feedback. Could you take another look?
Added:
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #613      +/-   ##
==========================================
- Coverage   87.82%   80.83%   -6.99%
==========================================
  Files          22       30       +8
  Lines        1092     1320     +228
  Branches      124      166      +42
==========================================
+ Hits          959     1067     +108
- Misses        133      253     +120
☔ View full report in Codecov by Sentry.
Changes from the latest review round
cc: @SauronShepherd, @Kimahriman
For context: I wrote a human-readable description, with a diagram, of what a PropertyGraph is in my recent blog post. This PR provides an implementation that follows that description of the concept exactly.
What changes were proposed in this pull request?
Initial top-level structures for the future PropertyGraphFrame module.
Let's imagine we have a property graph that represents a graph of legal entities for anti-money-laundering (AML) purposes.
In that case our vertices can be:
And edges can be:
The core idea relies on three classes:
PropertyGraphFrame
EdgePropertyGroup
VertexPropertyGroup
Each property group contains its own data as a DataFrame, plus metadata.
VertexPropertyGroup contains data related to a group of vertices. For example, for entities it can be metadata such as name, assets, legal form, history of AML violations, etc. For persons it may be something else, such as name and surname.
EdgePropertyGroup does the same for a group of edges. For example, the entity-pay-entity edge group is a weighted group with directed edges, while the case where two entities share one board member (a person) is an undirected and unweighted edge group.
PropertyGraphFrame is just a sequence of edge and vertex groups, with the ability to get a GraphFrame object and call algorithms such as clustering, shortest paths, etc.
Why are the changes needed?
The GraphFrame API is very low-level. It is hard to tell which edges are directed and which are undirected, how to construct the graph, etc. That is fine for library developers, but it can be really hard to work with for end users such as analysts.
With the PropertyGraph abstraction, we are open to adding support for OpenCypher / Gremlin.
Overall: #602, #565 + some others.
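To make the three-class idea and the directed/undirected point more concrete, here is a minimal sketch in plain Scala. It is not the API from this PR: every name in it (VertexGroupSketch, EdgeGroupSketch, PropertyGraphSketch, flatVertices, flatEdges) is hypothetical, plain collections stand in for the DataFrame each real group holds, and both the id prefixing and the mirroring of undirected edges are assumptions made for illustration.

```scala
// Hypothetical, simplified stand-ins for the classes named in this PR.
// The real groups hold Spark DataFrames; plain Scala collections are used
// here so the sketch runs without a Spark session.
final case class VertexGroupSketch(name: String, ids: Seq[String])

final case class EdgeGroupSketch(
    name: String,
    srcGroup: String,
    dstGroup: String,
    isDirected: Boolean,
    // (src, dst, weight); unweighted groups simply use 1.0 in this sketch
    edges: Seq[(String, String, Double)])

final case class PropertyGraphSketch(
    vertexGroups: Seq[VertexGroupSketch],
    edgeGroups: Seq[EdgeGroupSketch]) {

  // Assumption: prefix ids with the group name so that ids coming from
  // different vertex groups cannot collide in the flattened graph.
  private def globalId(group: String, id: String): String = s"$group:$id"

  def flatVertices: Seq[String] =
    vertexGroups.flatMap(g => g.ids.map(globalId(g.name, _)))

  // Flatten all groups into one directed edge list, the shape a low-level
  // src/dst edge table expects. Assumption: undirected groups are mirrored.
  def flatEdges: Seq[(String, String, Double)] =
    edgeGroups.flatMap { g =>
      val forward = g.edges.map { case (s, d, w) =>
        (globalId(g.srcGroup, s), globalId(g.dstGroup, d), w)
      }
      if (g.isDirected) forward
      else forward ++ forward.map { case (s, d, w) => (d, s, w) }
    }
}

object PropertyGraphSketchDemo {
  // The AML example from the description: entities, persons, a directed
  // weighted "pays" group and an undirected unweighted board-member group.
  val graph: PropertyGraphSketch = PropertyGraphSketch(
    vertexGroups = Seq(
      VertexGroupSketch("entity", Seq("e1", "e2")),
      VertexGroupSketch("person", Seq("p1"))),
    edgeGroups = Seq(
      EdgeGroupSketch("pays", "entity", "entity", isDirected = true,
        Seq(("e1", "e2", 100.0))),
      EdgeGroupSketch("shares_board_member", "entity", "entity",
        isDirected = false, Seq(("e1", "e2", 1.0)))))
}
```

In this sketch the caller only declares whether a group is directed; the conversion to a flat edge list is where that metadata gets applied, which is the kind of bookkeeping an analyst would otherwise have to do by hand against the low-level API.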
IMPORTANT
Just as Spark structures are based on Parquet, I mostly base the proposed structure on Apache GraphAr (incubating), the only "open-table" format for property graphs known to me!
DISCLAIMER
At the same time, I am a PPMC member of GraphAr, so there is a potential conflict of interest!
I do not want to write more code before discussing the overall concept and idea. When we are done, I will continue the work by providing additional methods to the API.