Skip to content

[#551] csv graph collection import and export#976

Merged
ChrizZz110 merged 10 commits intodbs-leipzig:developfrom
klaudworks:#551_support_graph_collection
Oct 4, 2018
Merged

[#551] csv graph collection import and export#976
ChrizZz110 merged 10 commits intodbs-leipzig:developfrom
klaudworks:#551_support_graph_collection

Conversation

@klaudworks
Copy link
Copy Markdown
Contributor

This pull request adds support to read and write graph collections using the csv data source and sinks.

Copy link
Copy Markdown
Collaborator

@p-f p-f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments about the code.

I think we should leave the "old" CSV format in the io package, maybe in a deprecated subpackage, but a utils.converter.deprecated.sources package doesn't seem fitting, since it's not actually a converter.

I would suggest:

  • Keep the old CSV format in the io package, make clear that it should not be used.
  • Add a converter from the old CSV format to the new.
  • Change the name from OldCSVDataSource to something more fitting, like CSVDataSourceWithoutGraphHeads (name is up for debate, but calling it "old" seems wrong, since it is currently the recommended format)
  • After this PR: Update Gradoop version to 0.6 since this is a major change.
  • Maybe create an Issue for all those CSV-related changes to summarize all changes to this format.

*
* The string needs to be encoded in the following format:
*
* edge-id;source-id;target-id;edge-label;value_1|value_2|...|value_n
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please put comments like this in a {@code ...} block

* @param limit resulting array length
* @return tokens
*/
public String[] split(String s, int limit) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method can be protected, it should not be part of the API.

*
* The string needs to be encoded in the following format:
*
* vertex-id;vertex-label;value_1|value_2|...|value_n
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{@code ...} block here too

import org.gradoop.flink.io.impl.csv.metadata.MetaData;

/**
* /**
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this line (duplicate block-comment start).

* The datasource expects files separated by vertices and edges, e.g. in the following directory
* structure:
*
* csvRoot
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put those lines in a <pre><code> ... </code></pre> block to avoid unwanted newlines in Javadoc.

mapWriter(record, label);
String label = ((CSVElement) record).getLabel();
if (label.equals("")) {
throw new RuntimeException("IndexedCSVDataSink requires a label for every element.");
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use a different Exception here, a RuntimeException seems too severe.

return f4;
}

public void setLabel(String label) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use @Override here, since you declared that method in a new interface.
(Also not part of this PR, but please add short Javadoc comments for the other methods of this class.)

public interface CSVElement {
/**
* Returns the elements label.
* @return label
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing Javadoc comment for label.

f0 = id;
}

public String getLabel() {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use @Override here too.

@Test
public void testPageRankExecution() throws Exception {
LogicalGraph input = getSocialNetworkLoader().getLogicalGraphByVariable("g0");
input.print();
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I missed that, thanks 👍

Copy link
Copy Markdown
Collaborator

@p-f p-f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When writing a LogicalGraph with a graph head having Properties, the Properties are written to CSV, but discarded in CSVDataSource.getLogicalGraph, because the ReduceCombination creates a new graph head.

*/
@Override
public LogicalGraph getLogicalGraph() {
return getGraphCollection().reduce(new ReduceCombination());
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a ReduceCombination will create a new graph head and therefore discard graph head properties.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reduce combination used when reading a single logical graph is the way most data sources work.
I opened Issue #974 to discuss improvements. Regarding this pull request, I would keep it this way and change this behavior for all the affected data sources.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I forgot about that issue. So yes, leave it like this.

@p-f
Copy link
Copy Markdown
Collaborator

p-f commented Sep 26, 2018

Please update from develop and fix the conflict in ReducePropertyMetaData

@klaudworks klaudworks force-pushed the #551_support_graph_collection branch from 019e261 to b75fd03 Compare September 27, 2018 12:31
…for complex property types and clean indexed CSV paths of illegal characters
@timo95
Copy link
Copy Markdown
Contributor

timo95 commented Sep 27, 2018

import org.apache.flink.api.java.tuple.Tuple3;

/**
* Tuple representing an edge in a CSV file.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'an graphhead in a csv file'

@galpha
Copy link
Copy Markdown
Contributor

galpha commented Sep 28, 2018

Good job @2start and @timo95, after the docu fix i'll approve.

Copy link
Copy Markdown
Contributor

@ChrizZz110 ChrizZz110 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small changes.
I do not like the pattern to move deprecated classes in a package called "deprecated". I think the usual way is to mark the classes with the deprecated annotation and let them in the package thy belong to. But if you agree with this pattern, we can leave it like this.

/**
* Returns the path to the vertex file containing only vertices with the specified label.
*
* @param label vertex label
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the graphHead label not the vertex label. Same for description 2 lines above.

/**
* {@inheritDoc}
*
* graph heads will be disposed at the moment. The following issue attempts to provide
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(1) Upper case letter at the beginning of a sentence.
(2) Update the Class-JavaDoc with the additional graphs.csv file

}

/**
* {@inheritDoc}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No @inheritdoc if no additional description is provided

import org.gradoop.flink.io.impl.csv.metadata.MetaData;

/**
* /**
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current online branch, it is not fixed.

* The string needs to be encoded in the following format:
*
* vertex-id;vertex-label;value_1|value_2|...|value_n
* {@code vertex-id;[graph-ids];vertex-label;value_1|value_2|...|value_n}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 25 -> Remove line with wrong comment definition

import org.junit.Test;
import org.junit.rules.TemporaryFolder;

public class LogicalGraphCSVToCSVTest extends GradoopFlinkTestBase {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JavaDoc for class and functions missing

*/
package org.gradoop.flink.io.impl.csv.conversion;

import org.gradoop.flink.io.api.DataSink;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong indentation at import statements

import org.junit.Test;
import org.junit.rules.TemporaryFolder;

public class LogicalGraphIndexedCSVToCSVTest extends GradoopFlinkTestBase {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JavaDoc for class and functions missing

}

@Test
public void testWriteLogicalGraph() throws Exception {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JavaDoc missing

}

@Test
public void testReadSingleGraph() throws Exception {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JavaDoc missing

Copy link
Copy Markdown
Contributor

@ChrizZz110 ChrizZz110 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect

@ChrizZz110 ChrizZz110 merged commit bf4dac0 into dbs-leipzig:develop Oct 4, 2018
galpha pushed a commit to galpha/gradoop that referenced this pull request Oct 9, 2018
0x002A pushed a commit to ChrizZz110/gradoop that referenced this pull request Feb 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants