pynto is a Python package that lets you manipulate a data frame as a stack of columns, using the expressiveness of the concatenative/stack-oriented paradigm.
With pynto you chain together functions called words to formally specify how to calculate each column of your data frame. The composed words can be lazily evaluated over any range of rows to create your data frame.
Words add, remove or modify columns. They can operate on the entire stack or be limited to certain columns using a column indexer. Composed words will operate in left-to-right order, with operators following their operands in postfix (Reverse Polish Notation) style. More complex operations can be specified using quotations, anonymous blocks of words that do not operate immediately, and combinators, higher-order words that control the execution of quotations.
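The left-to-right, postfix composition described above can be illustrated with a toy stack machine. This is a minimal sketch and not pynto's implementation: the Word class and the run method are hypothetical, and each "column" is modelled as a single number to keep the example short.

```python
class Word:
    """A function from stack to stack; `+` concatenates words left to right."""

    def __init__(self, func):
        self.func = func

    def __add__(self, other):
        # Composition: run self first, then other (postfix order).
        return Word(lambda stack: other.func(self.func(stack)))

    def run(self):
        # Evaluate the composed words against an empty stack.
        return self.func([])


def c(value):
    """Push a constant column (modelled here as a single number)."""
    return Word(lambda stack: stack + [value])


# Duplicate the top column.
dup = Word(lambda stack: stack + [stack[-1]])
# Pop the top two columns and push their product.
mul = Word(lambda stack: stack[:-2] + [stack[-2] * stack[-1]])
# Pop the top two columns and push their difference (second minus top).
sub = Word(lambda stack: stack[:-2] + [stack[-2] - stack[-1]])

ten_squared = c(10) + dup + mul
print(ten_squared.run())          # [100]
print((c(7) + c(2) + sub).run())  # [5]
```

Operands come first and the operator last, so c(7) + c(2) + sub reads as "7, 2, subtract".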
Here's a program that calculates each column's deviation from its moving average, using the combinator/quotation pattern:
>>> import pynto as pt
>>> ma_dev = (                              # create a pynto expression by concatenating words
...     pt.load('stock_prices')             # append columns to stack from the built-in database
...     + ~(pt.dup + pt.ravg(20) + pt.sub)  # quotation: copy column, calc moving avg, subtract
...     + pt.map                            # use the map combinator to apply the quotation
... )                                       # to each column in the stack
>>>
>>> df = ma_dev.rows['2021-06-01':] # evaluate over a range of rows to get a DataFrame
>>> pt.db['stocks_ma_dev'] = df # save the results back to the database
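For comparison, here is roughly what the quotation computes, written as plain Python over lists. It is a sketch under assumptions: the prices are made up, the window is 2 instead of 20, and leaving the first incomplete windows empty is a choice made for this example, not a statement about how pynto treats them.

```python
def moving_average(values, window):
    """Trailing mean; None until `window` observations are available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


def ma_deviation(values, window):
    """Value minus its trailing moving average (the dup / ravg / sub steps)."""
    averages = moving_average(values, window)
    return [None if avg is None else value - avg
            for value, avg in zip(values, averages)]


# Hypothetical prices for two tickers; the loop over columns plays the role of map.
prices = {"AAA": [10.0, 11.0, 12.0, 13.0], "BBB": [20.0, 18.0, 19.0, 21.0]}
deviations = {name: ma_deviation(col, window=2) for name, col in prices.items()}
print(deviations["AAA"])  # [None, 0.5, 0.5, 0.5]
print(deviations["BBB"])  # [None, -1.0, 0.5, 1.0]
```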
- Expressive: Pythonic syntax; Combinatory logic for modular, reusable code
- Performant: Memoization to eliminate duplicate operations
- Batteries included: Built-in time series database
- Interoperable: Seamless integration with Pandas/numpy
pip install pynto
Words are composed using the + operator. Each word resolves from the pt namespace.
>>> # Compose words that add a column of 10s to the stack, duplicate the column,
>>> # and then multiply the columns together
>>> ten_squared = pt.c(10) + pt.dup + pt.mul
Add constant-value columns to the stack using pt.c(value). You can also add constants directly with + using numeric literals. pt.r(n) adds whole number-value constant columns from 0 to n - 1.
>>> ten_squared = pt.c(10) + pt.dup + pt.mul # using pt.c()
>>> ten_squared = 10 + pt.dup + pt.mul # using numeric literal with +
To evaluate an expression, use a row indexer. Specify rows by date range using the .rows[start:stop:periodicity] syntax, where stop is exclusive. Slicing arguments left as None default to the widest range available. int indices also work with the .rows indexer. .first and .last are included for convenience.
>>> ten_squared.rows['2021-06-01':'2021-06-03','B'] # evaluate over a two business day date range
c
2021-06-01 100.0
2021-06-02 100.0
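The exclusive stop explains why the range above yields two rows rather than three. A small standard-library sketch of the same date logic (the business_days helper is hypothetical and only mimics the 'B' periodicity as "Monday to Friday"):

```python
from datetime import date, timedelta


def business_days(start, stop):
    """Weekdays from start (inclusive) to stop (exclusive)."""
    days = []
    current = start
    while current < stop:
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            days.append(current)
        current += timedelta(days=1)
    return days


rows = business_days(date(2021, 6, 1), date(2021, 6, 3))
print([d.isoformat() for d in rows])  # ['2021-06-01', '2021-06-02']
```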
Combinators are higher-order functions that allow pynto to do more complicated things like branching and looping. Combinators operate on quotations, expressions that are pushed to the stack instead of operating on the stack. Use the ~ operator to create a quotation from a word or expression. The map combinator applies a quotation at the top of the stack over each column below in the stack.
>>> (pt.c(9) + pt.c(10) + ~(pt.dup + pt.mul) + pt.map).last
c c
2021-06-02 81.0 100.0
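To make the quotation/combinator relationship concrete, here is a toy model in plain Python. It is a sketch, not pynto's code: quote and map_ are hypothetical stand-ins for ~ and pt.map, and each column is a single number.

```python
def run(words, stack):
    """Apply each word to the stack in left-to-right order."""
    for word in words:
        stack = word(stack)
    return stack


def c(value):
    """Word that pushes a constant."""
    return lambda stack: stack + [value]


def dup(stack):
    """Duplicate the top item."""
    return stack + [stack[-1]]


def mul(stack):
    """Replace the top two items with their product."""
    return stack[:-2] + [stack[-2] * stack[-1]]


def quote(*words):
    """Word that pushes `words` onto the stack without running them."""
    return lambda stack: stack + [words]


def map_(stack):
    """Pop the quotation on top and run it on each column below, one at a time."""
    *columns, quotation = stack
    result = []
    for column in columns:
        result += run(quotation, [column])
    return result


program = [c(9), c(10), quote(dup, mul), map_]
print(run(program, []))  # [81, 100]
```

The quotation sits on the stack as data until map_ consumes it, which is what separates it from an ordinary word.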
~ supports single words, expressions, and nesting:
>>> ~pt.neg # quote a single word
>>> ~(pt.dup + pt.mul) # quote an expression
>>> ~(~(pt.c(100) + pt.sub) + pt.call) + pt.map # nested quotation
>>> ~~quoted_expr # unquote (convert quotation back to word)
Each column has a string header. hset sets the header to a new value. Headers are useful for filtering or arranging columns.
>>> (pt.c(9) + pt.c(10) + ~(pt.dup + pt.mul) + pt.map + pt.hset('a','b')).last
a b
2021-06-02 81.0 100.0
Column indexers specify the columns on which a word operates, overriding the word's default. Use .cols[indexer] to set the column indexer. Positive int indices start from the bottom (left) of the stack and negative indices start from the top.
By default, add has a column indexer of [-2:]:
>>> (pt.r(5) + pt.add).last
c c c c
2021-06-02 0.0 1.0 2.0 7.0
Change the column indexer of add to [:] to sum all columns:
>>> (pt.r(5) + pt.add.cols[:]).last
c
2025-06-02 10.0
You can also index columns by header, using regular expressions:
>>> (pt.r(3) + pt.hset('a,b,c') + pt.add.cols['(a|c)']).last
b a
2025-06-02 1.0 2.0
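The three results above can be reproduced with a small model of column selection. This is a sketch under assumptions: the add function below is hypothetical, it matches headers with re.fullmatch (pynto's matching rule may differ), and it gives the result the header of the first selected column because that agrees with the output shown.

```python
import re


def add(stack, selector=slice(-2, None)):
    """Sum the selected columns and push the result on top of the rest.

    `stack` is a list of (header, value) pairs, bottom first. `selector` is a
    slice or a regular expression matched against headers.
    """
    if isinstance(selector, slice):
        chosen = set(range(len(stack))[selector])
    else:
        chosen = {i for i, (header, _) in enumerate(stack)
                  if re.fullmatch(selector, header)}
    selected = [stack[i] for i in sorted(chosen)]
    rest = [col for i, col in enumerate(stack) if i not in chosen]
    # Assumption: the result keeps the header of the first selected column.
    total = (selected[0][0], sum(value for _, value in selected))
    return rest + [total]


r5 = [("c", float(n)) for n in range(5)]
print(add(r5))               # [('c', 0.0), ('c', 1.0), ('c', 2.0), ('c', 7.0)]
print(add(r5, slice(None)))  # [('c', 10.0)]
abc = [("a", 0.0), ("b", 1.0), ("c", 2.0)]
print(add(abc, "(a|c)"))     # [('b', 1.0), ('a', 2.0)]
```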
Use .copy to keep the original columns and .discard to remove non-selected columns:
>>> (pt.r(5) + pt.pull.cols[2:4].copy).values[0] # copy selected, keep originals
>>> (pt.r(5) + pt.pull.cols[2:4].discard).values[0] # move selected, discard rest
>>> (pt.r(5) + pt.pull.copy).values[0] # copy with default selector
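One way to read the comments above is as three modes of the same selection step. The sketch below models that reading with a hypothetical pull function over a plain list; the resulting column order is an assumption based on the description of pull ("brings selected columns to the top"), not verified pynto output.

```python
def pull(stack, selector, mode="move"):
    """Bring the selected columns to the top of the stack.

    mode="move":    selected columns leave their old positions.
    mode="copy":    selected columns also stay where they were.
    mode="discard": non-selected columns are dropped.
    """
    chosen = set(range(len(stack))[selector])
    selected = [stack[i] for i in sorted(chosen)]
    rest = [col for i, col in enumerate(stack) if i not in chosen]
    if mode == "copy":
        return list(stack) + selected
    if mode == "discard":
        return selected
    return rest + selected


r5 = [0.0, 1.0, 2.0, 3.0, 4.0]
print(pull(r5, slice(2, 4)))             # [0.0, 1.0, 4.0, 2.0, 3.0]
print(pull(r5, slice(2, 4), "copy"))     # [0.0, 1.0, 2.0, 3.0, 4.0, 2.0, 3.0]
print(pull(r5, slice(2, 4), "discard"))  # [2.0, 3.0]
```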
Because expressions are ordinary Python objects, you can assign a composition to a variable and reuse it in other expressions.
>>> squared = pt.dup + pt.mul
>>> ten_squared2 = pt.c(10) + squared # same thing
Words can also be defined globally in the pynto vocabulary.
>>> pt.define['squared'] = pt.dup + pt.mul
pynto has built-in database functionality that lets you save DataFrames and Series to a Redis database. The database saves the underlying numpy data in native byte format for zero-copy retrieval. Each DataFrame column is saved as an independent key and can be retrieved or updated on its own. The database also supports three-dimensional frames that have a two-level MultiIndex.
>>> pt.db['my_df'] = expr.rows['2021-06-01':'2021-06-03']
>>> pt.load('my_df').rows[:]
constant constant
2021-06-01 81.0 100.0
2021-06-02 81.0 100.0
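The storage idea (raw bytes per column, one key per column) can be sketched with the standard library alone. Everything here is illustrative: a dict stands in for Redis, the "name:header" key scheme is hypothetical, and array/memoryview stand in for numpy buffers.

```python
from array import array

store = {}  # stand-in for the key-value database


def save_frame(name, columns):
    """Store each column under its own key as raw float64 bytes."""
    for header, values in columns.items():
        store[f"{name}:{header}"] = array("d", values).tobytes()


def load_column(name, header):
    """Reinterpret the stored bytes as doubles without copying them."""
    return memoryview(store[f"{name}:{header}"]).cast("d")


save_frame("my_df", {"a": [81.0, 81.0], "b": [100.0, 100.0]})
store["my_df:b"] = array("d", [1.0, 2.0]).tobytes()  # update one column only
print(list(load_column("my_df", "a")))  # [81.0, 81.0]
print(list(load_column("my_df", "b")))  # [1.0, 2.0]
```

Because the bytes are already in the machine's native layout, reading a column is a reinterpretation of the buffer and not a parse, and a single column can be replaced without touching the others.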
| Word | Default Selector | Parameters | Description |
|---|---|---|---|
| c | [-1:] | values | Pushes constant columns for each of values |
| day_count | [-1:] | | Pushes a column with the number of days in the period |
| from_pandas | [:] | pandas, round_ | Pushes columns from Pandas DataFrame or Series pandas |
| load | [-1:] | | Pushes columns of a DataFrame saved to internal DB as key |
| nan | [-1:] | values | Pushes a constant nan-valued column |
| period_ordinal | [-1:] | | Pushes a column with the period ordinal |
| r | [-1:] | n | Pushes constant columns for each whole number from 0 to n - 1 |
| randn | [-1:] | | Pushes a column with values from a random normal distribution |
| timestamp | [-1:] | | Pushes a column with the timestamp of the end of the period |
| Word | Default Selector | Parameters | Description |
|---|---|---|---|
| drop | [-1:] | | Removes selected columns |
| dup | [-1:] | | Duplicates columns |
| hsort | [:] | | Sorts columns by header |
| id | [:] | | Identity/no-op |
| interleave | [:] | parts | Divides columns into parts groups and interleaves the groups |
| keep | [:] | | Removes non-selected columns |
| nip | [-1:] | | Removes non-selected columns, defaulting selection to top |
| pull | [:] | | Brings selected columns to the top |
| rev | [:] | | Reverses the order of selected columns |
| roll | [:] | | Permutes selected columns |
| swap | [-2:] | | Swaps top and bottom selected columns |
Use the ~ operator to create quotations:
| Expression | Description |
|---|---|
| ~pt.word | Wraps a single word as a quotation |
| ~(pt.word1 + pt.word2) | Wraps an expression as a quotation |
| ~~quoted | Unwraps a quotation back into a standard word |
| Word | Default Selector | Parameters | Description |
|---|---|---|---|
| halpha | [:] | | Set headers to alphabetical values |
| happly | [:] | header_func | Apply header_func to headers |
| hformat | [:] | format_spec | Apply format_spec to headers |
| hreplace | [:] | old, new | Replace old with new in headers |
| hset | [:] | headers | Set headers to *headers |
| hsetall | [:] | headers | Set headers to *headers repeating, if necessary |
| Word | Default Selector | Parameters | Description |
|---|---|---|---|
| call | [:] | | Applies quotation |
| cleave | [:] | num_quotations | Applies all preceding quotations |
| compose | [:] | num_quotations | Combines quotations |
| hmap | [:] | | Applies quotation to stacks created by grouping columns by header |
| ifexists | [:] | count | Applies quotation if stack has at least count columns |
| ifexistselse | [:] | count | Applies top quotation if stack has at least count columns, otherwise applies second quotation |
| ifheaders | [:] | predicate | Applies top quotation if list of column headers fulfills predicate |
| ifheaderselse | [:] | predicate | Applies top quotation if list of column headers fulfills predicate, otherwise applies second quotation |
| map | [:] | every | Applies quotation in groups of every |
| partial | [-1:] | quoted, this | Pushes stack columns to the front of quotation |
| repeat | [:] | times | Applies quotation times times |
| Word | Default Selector | Parameters | Description |
|---|---|---|---|
| ffill | [:] | lookback, leave_end | Fills nans with previous values, looking back lookback before range and leaving trailing nans if leave_end is set |
| fill | [:] | | Fills nans with value |
| fillfirst | [-1:] | lookback | Fills first row with previous non-nan value, looking back lookback before range |
| join | [-2:] | date | Joins two columns at date |
| sync | [:] | | Aligns available data by setting all values to NaN when any value is NaN |
| zero_first | [-1:] | | Changes first value to zero |
| zero_to_na | [-1:] | | Changes zeros to nans |
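
As an illustration of the sync description above, here is a plain-Python sketch that blanks out an entire row whenever any column is NaN in that row. The function is a hypothetical model over lists, not pynto's implementation.

```python
import math


def sync(columns):
    """Set a whole row to NaN when any column is NaN in that row."""
    rows = zip(*columns)
    synced = [
        [math.nan] * len(row) if any(math.isnan(v) for v in row) else list(row)
        for row in rows
    ]
    # Transpose back to column-major order.
    return [list(col) for col in zip(*synced)]


a = [1.0, math.nan, 3.0]
b = [4.0, 5.0, 6.0]
synced_a, synced_b = sync([a, b])
print(synced_a)  # [1.0, nan, 3.0]
print(synced_b)  # [4.0, nan, 6.0]
```
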
| Word | Default Selector | Parameters | Description |
|---|---|---|---|
| resample_avg | [:] | | Sets periodicity resampling method to avg |
| resample_first | [:] | | Sets periodicity resampling method to first |
| resample_firstnofill | [:] | | Sets periodicity resampling method to first with no fill |
| resample_last | [:] | | Sets periodicity resampling method to last |
| resample_lastnofill | [:] | | Sets periodicity resampling method to last with no fill |
| resample_max | [:] | | Sets periodicity resampling method to max |
| resample_min | [:] | | Sets periodicity resampling method to min |
| resample_sum | [:] | | Sets periodicity resampling method to sum |
| set_periodicity | [-1:] | periodicity | Changes column periodicity to periodicity, then resamples |
| set_start | [-1:] | start | Changes period start to start, then resamples |
| Word | Default Selector | Parameters | Description |
|---|---|---|---|
| add | [-2:] | ignore_nans | Addition |
| avg | [-2:] | ignore_nans | Arithmetic average |
| div | [-2:] | ignore_nans | Division |
| max | [-2:] | ignore_nans | Maximum |
| med | [-2:] | ignore_nans | Median |
| min | [-2:] | ignore_nans | Minimum |
| mod | [-2:] | ignore_nans | Modulo |
| mul | [-2:] | ignore_nans | Multiplication |
| pow | [-2:] | ignore_nans | Power |
| std | [-2:] | ignore_nans | Standard deviation |
| sub | [-2:] | ignore_nans | Subtraction |
| var | [-2:] | ignore_nans | Variance |
| Word | Default Selector | Parameters | Description |
|---|---|---|---|
| nadd | [-2:] | ignore_nans | Addition |
| navg | [-2:] | ignore_nans | Arithmetic average |
| ndiv | [-2:] | ignore_nans | Division |
| nmax | [-2:] | ignore_nans | Maximum |
| nmed | [-2:] | ignore_nans | Median |
| nmin | [-2:] | ignore_nans | Minimum |
| nmod | [-2:] | ignore_nans | Modulo |
| nmul | [-2:] | ignore_nans | Multiplication |
| npow | [-2:] | ignore_nans | Power |
| nstd | [-2:] | ignore_nans | Standard deviation |
| nsub | [-2:] | ignore_nans | Subtraction |
| nvar | [-2:] | ignore_nans | Variance |
| Word | Default Selector | Parameters | Description |
|---|---|---|---|
| ewm_mean | [-1:] | window | Exponentially-weighted moving average |
| ewm_std | [-1:] | window | Exponentially-weighted standard deviation |
| ewm_var | [-1:] | window | Exponentially-weighted variance |
| radd | [-1:] | window | Addition |
| ravg | [-1:] | window | Arithmetic average |
| rcor | [-2:] | window | Correlation |
| rcov | [-2:] | window | Covariance |
| rdif | [-1:] | window | Lagged difference |
| rlag | [-1:] | window | Lag |
| rmax | [-1:] | window | Maximum |
| rmed | [-1:] | window | Median |
| rmin | [-1:] | window | Minimum |
| rret | [-1:] | window | Lagged return |
| rstd | [-1:] | window | Standard deviation |
| rvar | [-1:] | window | Variance |
| rzsc | [-1:] | window | Z-score |
| Word | Default Selector | Parameters | Description |
|---|---|---|---|
| cadd | [-1:] | | Addition |
| cavg | [-1:] | | Arithmetic average |
| cdif | [-1:] | | Lagged difference |
| clag | [-1:] | | Lag |
| cmax | [-1:] | | Maximum |
| cmin | [-1:] | | Minimum |
| cmul | [-1:] | | Multiplication |
| cret | [-1:] | | Lagged return |
| cstd | [-1:] | | Standard deviation |
| csub | [-1:] | | Subtraction |
| cvar | [-1:] | | Variance |
| Word | Default Selector | Parameters | Description |
|---|---|---|---|
| rcadd | [-1:] | | Addition |
| rcavg | [-1:] | | Arithmetic average |
| rcdif | [-1:] | | Lagged difference |
| rclag | [-1:] | | Lag |
| rcmax | [-1:] | | Maximum |
| rcmin | [-1:] | | Minimum |
| rcmul | [-1:] | | Multiplication |
| rcret | [-1:] | | Lagged return |
| rcstd | [-1:] | | Standard deviation |
| rcsub | [-1:] | | Subtraction |
| rcvar | [-1:] | | Variance |
| Word | Default Selector | Parameters | Description |
|---|---|---|---|
| abs | [-1:] | | Absolute value |
| dec | [-1:] | | Decrement |
| exp | [-1:] | | Exponential |
| expm1 | [-1:] | | Exponential minus one |
| inc | [-1:] | | Increment |
| inv | [-1:] | | Multiplicative inverse |
| lnot | [-1:] | | Logical not |
| log | [-1:] | | Natural log |
| log1p | [-1:] | | Natural log of increment |
| neg | [-1:] | | Additive inverse |
| rank | [:] | | Row-wise rank |
| sign | [-1:] | | Sign |
| sqrt | [-1:] | | Square root |
