Added kwargs to text preprocessing functions: filter_dictionary, create_doc_term_matrix, and create_tfidf_matrix #386

Merged
dvdjlaw merged 2 commits into master from add_text_kwargs
Nov 9, 2020

@truongc2 (Collaborator) commented Nov 9, 2020

Resolves #381
Added **kwargs to filter_dictionary, create_doc_term_matrix, and create_tfidf_matrix in text_preprocessing.py

@truongc2 requested a review from @dvdjlaw November 9, 2020 16:04

@codecov (bot) commented Nov 9, 2020

Codecov Report

Merging #386 (5d4fc1f) into master (9f5ba2b) will not change coverage.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #386   +/-   ##
=======================================
  Coverage   90.58%   90.58%           
=======================================
  Files          41       41           
  Lines        1826     1826           
=======================================
  Hits         1654     1654           
  Misses        172      172           
Impacted Files Coverage Δ
data_describe/text/text_preprocessing.py 96.11% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 3dbc9c5...5d4fc1f.

@dvdjlaw (Member) left a comment

In addition to addressing the kwargs for filter_extremes, please also:

  • Update the title as it will be used in release notes (past tense, more descriptive of what changed)
  • Add the proper label (bug) as it will be used in release notes

Default is 10.
no_above: Keep tokens which are contained in no more than no_above portion of
documents (fraction of total corpus size). Default is 0.2.
**kwargs: Other arguments to be passed to gensim.corpora.Dictionary.filter_extremes

Forgot to use the kwargs in this function: they are documented here but never passed to filter_extremes.
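
The review comment above describes a common slip: a function accepts `**kwargs` and documents them, but never forwards them to the call they are meant for. A minimal sketch of the intended forwarding is shown here. It is not the merged code. `TinyDictionary` is a hypothetical, simplified stand-in for `gensim.corpora.Dictionary` so that the example runs without gensim installed, and the `filter_dictionary` signature is an assumption based on the docstring defaults (10 and 0.2). The `keep_n` and `keep_tokens` parameters mirror the ones gensim's `filter_extremes` accepts.

```python
from collections import Counter


class TinyDictionary:
    """Hypothetical stand-in for gensim.corpora.Dictionary (not the real class)."""

    def __init__(self, docs):
        self.num_docs = len(docs)
        # Document frequency: in how many documents each token appears.
        self.dfs = Counter(tok for doc in docs for tok in set(doc))

    def filter_extremes(self, no_below=5, no_above=0.5, keep_n=100000, keep_tokens=None):
        """Simplified version of the filtering rules, for illustration only."""
        protected = set(keep_tokens or [])
        max_df = no_above * self.num_docs
        kept = {
            tok: df
            for tok, df in self.dfs.items()
            if tok in protected or no_below <= df <= max_df
        }
        # Keep at most keep_n tokens, most frequent first (ties broken by name).
        top = sorted(kept.items(), key=lambda kv: (-kv[1], kv[0]))[:keep_n]
        self.dfs = Counter(dict(top))


def filter_dictionary(docs, no_below=10, no_above=0.2, **kwargs):
    """Assumed signature; builds a dictionary and filters out extreme tokens."""
    dictionary = TinyDictionary(docs)
    # The step the review asks for: pass **kwargs through instead of dropping them.
    dictionary.filter_extremes(no_below=no_below, no_above=no_above, **kwargs)
    return dictionary
```

With forwarding in place, a caller could write `filter_dictionary(docs, no_below=2, no_above=0.75, keep_tokens=["d"])` to protect a rare token from removal. Without the pass-through, that argument would be accepted and silently ignored.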

@truongc2 added the bug label Nov 9, 2020
@truongc2 changed the title from "Add kwargs to text_preprocessing.py" to "Added kwargs to text_preprocessing functions: filter_dictionary, create_doc_term_matrix, and create_tfidf_matrix" Nov 9, 2020
@truongc2 changed the title from "Added kwargs to text_preprocessing functions: filter_dictionary, create_doc_term_matrix, and create_tfidf_matrix" to "Added kwargs to text preprocessing functions: filter_dictionary, create_doc_term_matrix, and create_tfidf_matrix" Nov 9, 2020
@dvdjlaw merged commit 9d297dd into master Nov 9, 2020
@dvdjlaw deleted the add_text_kwargs branch November 9, 2020 21:12
Labels: bug
Linked issue: Add kwargs to text_preprocessing
2 participants