[#551] csv graph collection import and export#976
[#551] csv graph collection import and export#976ChrizZz110 merged 10 commits intodbs-leipzig:developfrom
Conversation
There was a problem hiding this comment.
Some comments about the code.
I think we should leave the "old" CSV format in the io package, maybe in a deprecated subpackage, but a utils.converter.deprecated.sources package doesn't seem fitting, since it's not actually a converter.
I would suggest:
- Keep the old CSV format in the
iopackage, make clear that it should not be used. - Add a converter from the old CSV format to the new.
- Change the name from
OldCSVDataSourceto something more fitting, likeCSVDataSourceWithoutGraphHeads(name is up for debate, but calling it "old" seems wrong, since it is currently the recommended format) - After this PR: Update Gradoop version to 0.6 since this is a major change.
- Maybe create an Issue for all those CSV-related changes to summarize all changes to this format.
| * | ||
| * The string needs to be encoded in the following format: | ||
| * | ||
| * edge-id;source-id;target-id;edge-label;value_1|value_2|...|value_n |
There was a problem hiding this comment.
Please put comments like this in a {@code ...} block
| * @param limit resulting array length | ||
| * @return tokens | ||
| */ | ||
| public String[] split(String s, int limit) { |
There was a problem hiding this comment.
This method can be protected, it should not be part of the API.
| * | ||
| * The string needs to be encoded in the following format: | ||
| * | ||
| * vertex-id;vertex-label;value_1|value_2|...|value_n |
| import org.gradoop.flink.io.impl.csv.metadata.MetaData; | ||
|
|
||
| /** | ||
| * /** |
There was a problem hiding this comment.
Please remove this line (duplicate block-comment start).
| * The datasource expects files separated by vertices and edges, e.g. in the following directory | ||
| * structure: | ||
| * | ||
| * csvRoot |
There was a problem hiding this comment.
Put those lines in a <pre><code> ... </code></pre> block to avoid unwanted newlines in Javadoc.
| mapWriter(record, label); | ||
| String label = ((CSVElement) record).getLabel(); | ||
| if (label.equals("")) { | ||
| throw new RuntimeException("IndexedCSVDataSink requires a label for every element."); |
There was a problem hiding this comment.
Maybe use a different Exception here, a RuntimeException seems too severe.
| return f4; | ||
| } | ||
|
|
||
| public void setLabel(String label) { |
There was a problem hiding this comment.
Please use @Override here, since you declared that method in a new interface.
(Also not part of this PR, but please add short Javadoc comments for the other methods of this class.)
| public interface CSVElement { | ||
| /** | ||
| * Returns the elements label. | ||
| * @return label |
There was a problem hiding this comment.
Missing Javadoc comment for label.
| f0 = id; | ||
| } | ||
|
|
||
| public String getLabel() { |
| @Test | ||
| public void testPageRankExecution() throws Exception { | ||
| LogicalGraph input = getSocialNetworkLoader().getLogicalGraphByVariable("g0"); | ||
| input.print(); |
p-f
left a comment
There was a problem hiding this comment.
When writing a LogicalGraph with a graph head having Properties, the Properties are written to CSV, but discarded in CSVDataSource.getLogicalGraph, because the ReduceCombination creates a new graph head.
| */ | ||
| @Override | ||
| public LogicalGraph getLogicalGraph() { | ||
| return getGraphCollection().reduce(new ReduceCombination()); |
There was a problem hiding this comment.
Using a ReduceCombination will create a new graph head and therefore discard graph head properties.
There was a problem hiding this comment.
The reduce combination used when reading a single logical graph is the way most data sources work.
I opened Issue #974 to discuss improvements. Regarding this pull request, I would keep it this way and change this behavior for all the affected data sources.
There was a problem hiding this comment.
Okay, I forgot about that issue. So yes, leave it like this.
|
Please update from develop and fix the conflict in |
…mat and minor fixes
019e261 to
b75fd03
Compare
…for complex property types and clean indexed CSV paths of illegal characters
|
…al graph import
…doop into #551_support_graph_collection
| import org.apache.flink.api.java.tuple.Tuple3; | ||
|
|
||
| /** | ||
| * Tuple representing an edge in a CSV file. |
There was a problem hiding this comment.
'an graphhead in a csv file'
|
Good job @2start and @timo95, after the docu fix i'll approve. |
ChrizZz110
left a comment
There was a problem hiding this comment.
Some small changes.
I do not like the pattern to move deprecated classes in a package called "deprecated". I think the usual way is to mark the classes with the deprecated annotation and let them in the package thy belong to. But if you agree with this pattern, we can leave it like this.
| /** | ||
| * Returns the path to the vertex file containing only vertices with the specified label. | ||
| * | ||
| * @param label vertex label |
There was a problem hiding this comment.
It's the graphHead label not the vertex label. Same for description 2 lines above.
| /** | ||
| * {@inheritDoc} | ||
| * | ||
| * graph heads will be disposed at the moment. The following issue attempts to provide |
There was a problem hiding this comment.
(1) Upper case letter at the beginning of a sentence.
(2) Update the Class-JavaDoc with the additional graphs.csv file
| } | ||
|
|
||
| /** | ||
| * {@inheritDoc} |
There was a problem hiding this comment.
No @inheritdoc if no additional description is provided
| import org.gradoop.flink.io.impl.csv.metadata.MetaData; | ||
|
|
||
| /** | ||
| * /** |
There was a problem hiding this comment.
In the current online branch, it is not fixed.
| * The string needs to be encoded in the following format: | ||
| * | ||
| * vertex-id;vertex-label;value_1|value_2|...|value_n | ||
| * {@code vertex-id;[graph-ids];vertex-label;value_1|value_2|...|value_n} |
There was a problem hiding this comment.
Line 25 -> Remove line with wrong comment definition
| import org.junit.Test; | ||
| import org.junit.rules.TemporaryFolder; | ||
|
|
||
| public class LogicalGraphCSVToCSVTest extends GradoopFlinkTestBase { |
There was a problem hiding this comment.
JavaDoc for class and functions missing
| */ | ||
| package org.gradoop.flink.io.impl.csv.conversion; | ||
|
|
||
| import org.gradoop.flink.io.api.DataSink; |
There was a problem hiding this comment.
Wrong indentation at import statements
| import org.junit.Test; | ||
| import org.junit.rules.TemporaryFolder; | ||
|
|
||
| public class LogicalGraphIndexedCSVToCSVTest extends GradoopFlinkTestBase { |
There was a problem hiding this comment.
JavaDoc for class and functions missing
| } | ||
|
|
||
| @Test | ||
| public void testWriteLogicalGraph() throws Exception { |
| } | ||
|
|
||
| @Test | ||
| public void testReadSingleGraph() throws Exception { |
This pull request adds support to read and write graph collections using the csv data source and sinks.