Can't insert all my data into sqlite database. #56

@kellytsorb

Hello,
I used GROBID to convert 2657 PDF files to XML and then ran this command: #!python -m paperetl.file /Users/kellytsorb/paperetl/file/XML_files /Users/kellytsorb/paperetl/SQLite
The command creates the database and inserts the XML files into it, but only 549 of them are inserted and I don't know why. Some of the papers that are missing now were inserted fine when I tried them earlier in a smaller batch. Is there a limit on the number of articles that can be inserted into the database?

Process Process-2:
Traceback (most recent call last):
  File "/Users/kellytsorb/anaconda3/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/Users/kellytsorb/anaconda3/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/kellytsorb/anaconda3/lib/python3.11/site-packages/paperetl/file/execute.py", line 94, in process
    for result in Execute.parse(*params):
  File "/Users/kellytsorb/anaconda3/lib/python3.11/site-packages/paperetl/file/execute.py", line 74, in parse
    yield TEI.parse(stream, source)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kellytsorb/anaconda3/lib/python3.11/site-packages/paperetl/file/tei.py", line 37, in parse
    title = soup.title.text
            ^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'text'
Total articles inserted: 549
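
Judging from the traceback, the run appears to stop when a TEI file has no `<title>` element (`soup.title` is `None`), which would suggest a problem file rather than a limit on the article count. A minimal sketch for listing such files is below. It uses only the standard library instead of paperetl's own parser, and the helper names `has_title` and `files_without_title` are hypothetical, not part of paperetl.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def has_title(path):
    """Return True if the XML file at `path` contains any <title> element."""
    try:
        root = ET.parse(path).getroot()
    except ET.ParseError:
        # Empty or malformed XML: report it as having no title
        return False
    # Strip any namespace prefix ("{uri}title" -> "title") before comparing
    return any(el.tag.rsplit("}", 1)[-1] == "title" for el in root.iter())


def files_without_title(directory):
    """List names of .xml files in `directory` that lack a <title> element."""
    return sorted(p.name for p in Path(directory).glob("*.xml") if not has_title(p))
```

Calling `files_without_title("/Users/kellytsorb/paperetl/file/XML_files")` should return the files worth inspecting or regenerating with GROBID. Files that are empty or not well-formed XML are included in the result as well.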

Thank you in advance!

Labels: bug (Something isn't working)