Currently at Harvard, we're doing our EAD uploads via an ingest script that talks to the API, largely so we can get per-file rather than per-process failure on errors. This has worked out well, but it's fairly slow across 6000+ finding aids, so our script runs multiple uploads at once in hopes of turning a 4-day process into an overnight one.
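For context, a minimal sketch of the shape of such a script. The endpoint path, payload handling, session token, and thread count here are illustrative assumptions, not our actual script (in particular, the real flow converts EAD to JSONModel before posting):

```ruby
require 'net/http'
require 'json'

BACKEND = URI('http://localhost:8089')        # assumed local backend address
SESSION = ENV.fetch('ASPACE_SESSION')         # token from a prior login call
IMPORT  = '/repositories/2/batch_imports'     # illustrative endpoint/repo id
THREADS = 4

queue = Queue.new
Dir.glob('ead/*.xml').sort.each { |path| queue << path }
failures = Queue.new

workers = Array.new(THREADS) do
  Thread.new do
    loop do
      path = begin
        queue.pop(true)
      rescue ThreadError
        break # queue drained
      end
      # One POST per finding aid, so a bad file fails alone rather
      # than sinking the whole run.
      req = Net::HTTP::Post.new(IMPORT)
      req['X-ArchivesSpace-Session'] = SESSION
      req.body = File.read(path) # assumes payload already converted upstream
      res = Net::HTTP.start(BACKEND.host, BACKEND.port) { |http| http.request(req) }
      failures << path unless res.is_a?(Net::HTTPSuccess)
    end
  end
end
workers.each(&:join)
puts "#{failures.size} files failed"
```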
When testing this, I noticed that when we do uploads in parallel, we get a class of errors that we don't see when uploading serially:
Couldn't create version of: #<AgentCorporateEntity:0x69c234c6>
Couldn't create version of: #<AgentCorporateEntity:0x13619867>
Couldn't create version of: #<AgentCorporateEntity:0x3a539471>
Couldn't create version of: #<AgentCorporateEntity:0x35e9e1e7>
Couldn't create version of: #<Subject:0x62ea4d3c>
Couldn't create version of: #<Subject:0x2354d32d>
Couldn't create version of: #<Subject:0x262c6aa6>
Couldn't create version of: #<Subject:0x2bfaa80c>
This appears to happen when records with identical subject terms or corpnames are uploaded in the same batch. My guess is that the "create or fetch existing" logic for those models is neither serialized nor isolated: two parallel requests each decide the record doesn't exist yet, both try to create it with the same identifier, and one or the other (or both?) gets kicked out.
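A sketch of that suspected check-then-act race, not the actual ArchivesSpace code; the model and method names are illustrative:

```ruby
# Two parallel imports can interleave between the check and the act.
def create_or_fetch_subject(term)
  existing = Subject.first(title: term) # check: neither request sees a row yet
  return existing if existing
  Subject.create(title: term)           # act: both requests reach this line,
end                                     # and the second insert (or its
                                        # version record) gets kicked out
```

Serializing that step, e.g. relying on a unique index and rescuing the duplicate-key error with a re-fetch, or wrapping the check-and-create in a lock or suitably isolated transaction, would presumably make parallel batches safe.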
I don't currently have time to fix this and have resigned myself to slow imports, but I suspect it could also surface, if infrequently, in normal use, and it would be VERY nice to be able to parallelize import scripts.
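In the meantime, one client-side workaround sketch (hypothetical helper names): run the parallel pass, then retry anything that failed one file at a time, which sidesteps the race at the cost of a slower tail.

```ruby
# Hypothetical helpers: parallel_upload runs the threaded pass above and
# returns the paths that failed; upload_file posts a single finding aid
# and returns true on success.
failed = parallel_upload(paths)
# Retry losers serially: with one request in flight at a time, the
# get-vs-create race can't fire, so only genuinely bad files remain.
still_failed = failed.reject { |path| upload_file(path) }
warn "gave up on: #{still_failed.join(', ')}" unless still_failed.empty?
```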