chore(backend): added logging to chunking#336

Merged
gjreda merged 2 commits into main from add-logging-chunking on Aug 1, 2023

@shauryr shauryr commented Jul 29, 2023

Collaborator

Sometimes the backend cannot chunk the full text from the PDFs because of file-not-found errors.
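
For illustration only, here is a minimal sketch of the kind of logging this describes: check each PDF path before chunking and log a warning for any that is missing. The function name `collect_readable_pdfs` and the log message are hypothetical and are not taken from `python/sidecar/shared.py`.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def collect_readable_pdfs(paths):
    """Return the paths that exist on disk, logging a warning for each missing one.

    Hypothetical helper: the real chunking code may be structured differently.
    """
    found = []
    for raw in paths:
        path = Path(raw)
        if not path.is_file():
            # Record the full path so a wrong base directory is easy to spot.
            logger.warning("Skipping chunking, file not found: %s", path)
            continue
        found.append(path)
    return found
```

Logging the full path, rather than silently returning no chunks, makes it visible when the backend is looking in a different directory than expected.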

codecov Bot commented Jul 29, 2023

Codecov Report

Merging #336 (5bb3486) into main (adc02a8) will decrease coverage by 0.01%.
The diff coverage is 66.66%.

@@            Coverage Diff             @@
##             main     #336      +/-   ##
==========================================
- Coverage   86.35%   86.34%   -0.01%     
==========================================
  Files         150      150              
  Lines        8487     8490       +3     
  Branches      957      957              
==========================================
+ Hits         7329     7331       +2     
- Misses       1147     1148       +1     
  Partials       11       11              
Files Changed | Coverage Δ
python/sidecar/shared.py | 93.47% <66.66%> (-0.91%) ⬇️

@shauryr shauryr requested review from gjreda and hammer July 29, 2023 02:43

gjreda commented Aug 1, 2023

Collaborator

@shauryr This LGTM, but do you have a sense of why this is happening? Was it just something that came up in your debugging?

I don't think it should ever happen unless there is a bug on the front end, since the frontend knows all the file paths and passes them to the backend.

@gjreda gjreda merged commit 83d5b0b into main Aug 1, 2023
@gjreda gjreda deleted the add-logging-chunking branch August 1, 2023 16:46

shauryr commented Aug 1, 2023

Collaborator Author

> @shauryr This LGTM, but do you have a sense of why this is happening? Was it just something that came up in your debugging?
>
> I don't think it should ever happen unless there is a bug on the front end, since the frontend knows all the file paths and passes them to the backend.

@gjreda When I ran the ingest command, the chunks variable was empty in my references.json file. On closer inspection, I found that, for the chunking step, the code could not find the PDFs in the directory it was looking in.
Then I found out that Python is writing .staging, .storage, and .grobid to a different folder than the UI. Python was writing these folders to a directory outside the refstudio directory in my dev work folder, while the UI was writing them to /Users/shaurya/Library/Application Support/studio.ref.desktop/project-x.

This may be because I have configured the project incorrectly. Am I doing something wrong here?

gjreda commented Aug 1, 2023

Collaborator

> Then I found out that Python is writing .staging, .storage, and .grobid to a different folder than the UI. Python was writing these folders to a directory outside the refstudio directory in my dev work folder, while the UI was writing them to /Users/shaurya/Library/Application Support/studio.ref.desktop/project-x.

This sounds like it is related to your environment variables. The application passes the environment variables on each call to the sidecar, whereas if you run just the Python CLI app, it uses whatever environment variables you have set.
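
As a rough sketch of why the two cases can diverge, consider a helper that resolves the project directory from an environment variable and falls back to the current working directory when the variable is unset. The variable name `EXAMPLE_PROJECT_DIR` and the function `resolve_project_dir` are hypothetical; the actual names used by the sidecar may differ.

```python
import os
from pathlib import Path


def resolve_project_dir(env=None):
    """Pick the directory where project folders would be written.

    Hypothetical: EXAMPLE_PROJECT_DIR stands in for whatever variable
    the application actually passes to the sidecar.
    """
    env = os.environ if env is None else env
    configured = env.get("EXAMPLE_PROJECT_DIR")
    if configured:
        # The desktop app would supply this value on each sidecar call.
        return Path(configured).expanduser()
    # A bare CLI run without the variable falls back to the current directory.
    return Path.cwd()
```

Under this assumption, a CLI run from a dev work folder with the variable unset would write its folders there, while the app would use the directory it passes in.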
