This document gives a high-level view of the next main enhancement steps. It will be kept up to date and improved frequently. The work will be conducted mainly by @mk2510 and @henrifroese as part of their SummerOfCode project.
Most of Texthero's data structures are lists of lists ([["a", "document"], ["another", "document"]]). Can we leverage parallelization? We can learn from spaCy. Mandatory read: 100-times-faster-nlp; look at this for parallelization.
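One way to explore the parallelization question is to split the list of documents into chunks and process each chunk in a separate worker. The sketch below is only an illustration using the Python standard library: `lowercase_tokens`, `split_into_chunks`, and `parallel_apply` are hypothetical names, not part of Texthero or spaCy, and this chunked approach is not necessarily what the linked reads propose.

```python
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import Callable, List, Optional

Document = List[str]  # one tokenized document, e.g. ["a", "document"]


def lowercase_tokens(docs: List[Document]) -> List[Document]:
    """Example per-chunk function: lowercase every token of every document."""
    return [[token.lower() for token in doc] for doc in docs]


def split_into_chunks(docs: List[Document], n_chunks: int) -> List[List[Document]]:
    """Split docs into at most n_chunks contiguous chunks of near-equal size."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    size = max(1, -(-len(docs) // n_chunks))  # ceiling division
    return [docs[i:i + size] for i in range(0, len(docs), size)]


def parallel_apply(
    docs: List[Document],
    func: Callable[[List[Document]], List[Document]],
    n_chunks: int = 4,
    executor: Optional[Executor] = None,
) -> List[Document]:
    """Apply func to each chunk in parallel and return the documents in order.

    If no executor is given, a ProcessPoolExecutor is created for this call.
    func must then be picklable, i.e. defined at module level.
    """
    chunks = split_into_chunks(docs, n_chunks)
    if executor is not None:
        results = list(executor.map(func, chunks))
    else:
        with ProcessPoolExecutor() as pool:
            results = list(pool.map(func, chunks))
    # Executor.map yields results in input order, so flattening keeps
    # the original document order.
    return [doc for chunk in results for doc in chunk]
```

A call such as `parallel_apply(docs, lowercase_tokens, n_chunks=4)` should be made from under an `if __name__ == "__main__":` guard on platforms that start worker processes by spawning. Sending chunks to other processes has a serialization cost, so a speed-up is only plausible when the per-document work is expensive enough to outweigh it; this would need to be measured.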
Version 1.10
TokenSeries: TokenSeries as input to every representation function #44
term_frequency to count() + add function term_frequency: count(s) and term_frequency(s) #61
HeroSeries: VectorSeries/TokenSeries?
representation functions to deal with HeroSeries + (DocumentTermDF): Support "Pandas Series Representation" #43
Performance: speed-up the library
spaCy: tokenize with Spacy #131
Software development:
Support Embeddings through Flair
Add Topic Modeling
This also includes "topic modeling visualization" to get insights out of it
Extra