In chatting with @cguedes, we decided this field isn't really necessary. For now, we can treat the source_filename as the unique identifier of the paper, since the filesystem requires it to be unique anyway.
Longer term, we should probably do something like a UUID or a database primary key. @shauryr mentioned that S2 uses a SHA1 of the PDF as an identifier, so we could also adopt something similar.
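
If we go the hash route, a minimal sketch could look like the one below. It assumes the identifier is the SHA-1 of the raw PDF bytes; I don't know the exact scheme S2 uses, and `paper_id_from_pdf` is a hypothetical helper name, not something in the codebase.

```python
import hashlib
from pathlib import Path


def paper_id_from_pdf(pdf_path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file's bytes (hypothetical helper).

    Assumption: the ID is a hash of the raw file contents, so two
    byte-identical PDFs get the same ID regardless of their filenames.
    """
    digest = hashlib.sha1()
    with Path(pdf_path).open("rb") as f:
        # Read in chunks so large PDFs are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

One consequence of this choice: unlike source_filename, a content hash survives renames, but it changes if the PDF bytes change at all (for example, after re-saving the file).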