Skip to content

(WIP) Add option to import json dump file (#291)#293

Closed
hbruch wants to merge 1 commit intokomoot:masterfrom
mfdz:issue_291
Closed

(WIP) Add option to import json dump file (#291)#293
hbruch wants to merge 1 commit intokomoot:masterfrom
mfdz:issue_291

Conversation

@hbruch
Copy link
Copy Markdown
Collaborator

@hbruch hbruch commented Jan 12, 2018

This is a first attempt to provide a json dump import (see #291). Importing a json dump on my machine was about 10x faster than importing from nominatim.

However, there are some points I'd like to receive feedback on before doing a PR for merge:

  • currently, no sanity checking is done on import, neither of the action line, nor the source line, should it?
  • the JsonDumper does not export the PhotonDoc.uid. It should either be exported or the PhotonDoc should be restored from json so it can be recalculated from source
  • JsonDumpConnector does not convert a source line into a PhotonDoc but bypasses the public Importer interface to add the source string. This is ugly, but avoids (unnecessary?) parsing and formatting of the source (but would require uid handling, see above)
  • the elasticsearcg.Importer uses BulkRequestBuilder. Using BulkProcessor would provide concurrent updates, as well as retries with exponential backup in case of failures (this should be another PR, I think)

@hbruch hbruch changed the title (WIP) Add option to import json dump file (WIP) Add option to import json dump file (#291) Jan 12, 2018
try {
this.reader = new BufferedReader(new FileReader(filename));
} catch (FileNotFoundException e) {
log.error("Json dump file '%1' not found", filename);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we should throw a RuntimeException to avoid usage of this connector.

}
}

public void setImporter(de.komoot.photon.Importer importer) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't it be provided on construction and then importer and reader could be final?

@karussell
Copy link
Copy Markdown
Collaborator

currently, no sanity checking is done on import, neither of the action line, nor the source line, should it?

Sanity of which form like empty content?

It should either be exported or the PhotonDoc should be restored from json so it can be recalculated from source

Which option do you prefer?

This is ugly, but avoids (unnecessary?) parsing and formatting of the source (but would require uid handling, see above)

Can't judge about it. Both methods seems ok to me.

this should be another PR, I think

Yes, speed is not important for the first implementation IMO

@karussell
Copy link
Copy Markdown
Collaborator

Have included some improvements of this in #438

@karussell karussell closed this Oct 12, 2019
@hbruch hbruch deleted the issue_291 branch May 22, 2020 20:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants