Fix/variable not read in custom loader#5558
Conversation
Summary of ChangesHello @Nek-11, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves a critical issue affecting custom document loaders where environment variables (e.g., API keys) were not being correctly resolved during the 'Process' operation. The problem stemmed from the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
The pull request successfully addresses the reported issue by correctly propagating the workspaceId to the document loader's init() options, which resolves the $vars being empty during the "Process" step. The changes to _splitIntoChunks() and previewChunks() effectively ensure that the workspaceId is available where needed. Additionally, the removal of the unused docId parameter in deleteVectorStoreFromStore is a good cleanup.
However, several informative comment blocks and inline comments were removed, which could impact the long-term maintainability and understanding of the codebase. While the code remains functional, the absence of these explanations might make it harder for future developers to grasp the intent and logic behind certain operations. It is recommended to restore or rephrase these comments to ensure code clarity.
|
@gemini-code-assist check again now I merged from main |
|
Thanks for the update, @Nek-11! I understand you've merged from As an AI, I don't automatically re-evaluate pull requests after new commits. If you'd like me to perform a fresh review on the updated code, please invoke me again using the |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request correctly addresses the issue of $vars not being available in custom loaders by passing the workspaceId through the _splitIntoChunks function. This resolves the authentication failures during the 'Process' step.
However, the PR also includes several refactorings that seem unrelated to the primary fix. I've identified a few potential issues in these changes:
- A breaking API change in the
deleteVectorStoreFromStoreendpoint, which no longer supports deletion bydocId. - A redundant database write operation within
_insertIntoVectorStoreWorkerThread. - The potential loss of
docIdin the metadata of documents being upserted to the vector store, which could impact filtering and deletion capabilities.
Please review the detailed comments for suggestions on how to address these points.
I am having trouble creating individual review comments. Click here to see my feedback.
packages/server/src/controllers/documentstore/index.ts (517-521)
This change removes the ability to delete vectors for a specific docId from this endpoint, as the docId query parameter is no longer being processed and passed to the service layer. This constitutes a breaking API change. If this was unintentional, the previous behavior should be restored. If it was intentional, this change in functionality should be clearly documented in the pull request description.
packages/server/src/services/documentstore/index.ts (1357-1360)
By changing how the document metadata is constructed, you are losing the docId. This ID is a top-level property on the DocumentStoreFileChunk entity and is not part of the metadata JSON string. It is essential for many vector store operations, such as filtering or targeted deletions. Please restore the previous logic to include it in the final document metadata.
return new Document({
pageContent: chunk.pageContent,
metadata: {
...JSON.parse(chunk.metadata),
docId: chunk.docId
}
})packages/server/src/services/documentstore/index.ts (1322)
The parent function insertIntoVectorStore already calls saveVectorStoreConfig before invoking this worker thread. Calling it again here is redundant and results in an unnecessary database write. The previous implementation correctly fetched the already-updated entity. It's recommended to revert to that logic to avoid the extra database operation.
// Configuration already saved by insertIntoVectorStore, just retrieve the entity
const entity = await appDataSource.getRepository(DocumentStore).findOneBy({
id: data.storeId,
workspaceId: workspaceId
})
if (!entity) {
throw new InternalFlowiseError(StatusCodes.NOT_FOUND, `Document store ${data.storeId} not found`)
}|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request effectively resolves a critical bug where variables were not being read in custom document loaders due to a missing workspaceId. The fix is well-implemented by propagating the workspaceId through the previewChunks and _splitIntoChunks functions, ensuring it's available during the loader's initialization. The changes are clean, targeted, and correctly address the issue described. The detailed problem description and testing notes are also very helpful.
|
thank you! |
|
Thank you so much for the fix. I was not updating to >=3.0.5 because of this issue. I can't wait to get the new version released. |
Problem
After PR #4891 ("Bugfix/update workspaceId to vars"), the
getVars()function inpackages/components/src/utils.tswas updated to requireworkspaceId- returning an empty array if not provided:However,
_splitIntoChunks()inpackages/server/src/services/documentstore/index.tswas not updated to passworkspaceIdto the document loader'sinit()options. This caused$varsto be empty when using Custom Document Loader during the "Process" step, resulting in undefined API keys and authentication failures.The "Execute" button worked because it uses a different code path (
executeCustomNodeFunction) that correctly passesworkspaceId.Solution
workspaceIdparameter to_splitIntoChunks()functionpreviewChunks()to destructureworkspaceIdfromIExecutePreviewLoaderparamsworkspaceIdto_splitIntoChunks()callworkspaceIdin the options object passed to document loader'sinit()methodTesting
Tested on production deployed Flowise app:
$varsfor API keys (FireCrawl, Mistral, Azure OpenAI)