GitHub - punkbrwstr/pynto: Time series analysis in Python using the concatenative paradigm

pynto: Data analysis in Python using stack-based programming

pynto is a Python package that lets you manipulate a data frame as a stack of columns, using the expressiveness of the concatenative/stack-oriented paradigm.

How does it work?

With pynto you chain together functions called words to formally specify how to calculate each column of your data frame. The composed words can be lazily evaluated over any range of rows to create your data frame.

Words add, remove or modify columns. They can operate on the entire stack or be limited to certain columns using a column indexer. Composed words will operate in left-to-right order, with operators following their operands in postfix (Reverse Polish Notation) style. More complex operations can be specified using quotations, anonymous blocks of words that do not operate immediately, and combinators, higher-order words that control the execution of quotations.
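The stack-of-columns idea can be illustrated without pynto itself. The following is a minimal sketch in plain Python, not pynto's implementation: a stack is modeled as a list of columns, a word as a function from stack to stack, and the helper names (c, dup, mul, compose) and the fixed n_rows argument are assumptions made for illustration only.

```python
from functools import reduce

# A "stack" is a list of columns; each column is a list of row values.
# A "word" is a function that takes a stack and returns a new stack.

def c(value, n_rows=3):
    """Return a word that pushes a constant column (hypothetical helper)."""
    return lambda stack: stack + [[float(value)] * n_rows]

def dup(stack):
    """Word that duplicates the top column."""
    return stack + [list(stack[-1])]

def mul(stack):
    """Word that replaces the top two columns with their row-wise product."""
    *rest, a, b = stack
    return rest + [[x * y for x, y in zip(a, b)]]

def compose(*words):
    """Apply words left to right, so operators follow their operands."""
    return lambda stack: reduce(lambda s, word: word(s), words, stack)

ten_squared = compose(c(10), dup, mul)
result = ten_squared([])
print(result)  # [[100.0, 100.0, 100.0]]
```

Nothing is computed until the composed function is called on a stack, which loosely mirrors how a composed expression is only evaluated once rows are requested.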

What does it look like?

Here's a program that calculates each column's deviation from its moving average, using the combinator/quotation pattern.

>>> import pynto as pt
>>> ma_dev = (                             # create a pynto expression by concatenating words
>>>     pt.load('stock_prices')            # append columns to stack from the built-in database
>>>     + ~(pt.dup + pt.ravg(20) + pt.sub) # quotation: copy column, calc moving avg, subtract
>>>     + pt.map                           # use the map combinator to apply the quotation
>>> )                                      # to each column in the stack
>>>
>>> df = ma_dev.rows['2021-06-01':]        # evaluate over a range of rows to get a DataFrame
>>> pt.db['stocks_ma_dev'] = df            # save the results back to the database
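For readers who want to check the arithmetic, here is a rough plain-Python sketch of what the quotation computes for a single column. It assumes that a rolling average yields NaN until a full window is available; how pynto treats incomplete windows is not stated here, and the function names are hypothetical.

```python
import math

def rolling_mean(values, window):
    """Trailing mean over `window` rows; NaN until the window is full (assumption)."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(math.nan)
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out

def ma_deviation(values, window):
    """Copy the column, compute its moving average, and subtract (dup, ravg, sub)."""
    return [v - m for v, m in zip(values, rolling_mean(values, window))]

prices = [10.0, 11.0, 12.0, 13.0, 14.0]
dev = ma_deviation(prices, window=3)
print(dev)  # [nan, nan, 1.0, 1.0, 1.0]
```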

Why pynto?

Expressive: Pythonic syntax; Combinatory logic for modular, reusable code
Performant: Memoization to eliminate duplicate operations
Batteries included: Built-in time series database
Interoperable: Seamless integration with Pandas/numpy

Get pynto

pip install pynto

Reference

The Basics

Composing words

Words are composed using the + operator. Each word is accessed as an attribute of the pt namespace.

>>> # Compose words that add a column of 10s to the stack, duplicate the column,
>>> # and then multiply the columns together
>>> ten_squared = pt.c(10) + pt.dup + pt.mul

Constant literals

Add constant-value columns to the stack using pt.c(value). You can also add constants directly with + using numeric literals. pt.r(n) adds n constant columns holding the whole numbers from 0 to n - 1.

>>> ten_squared = pt.c(10) + pt.dup + pt.mul    # using pt.c()
>>> ten_squared = 10 + pt.dup + pt.mul          # using numeric literal with +

Row indexers

To evaluate an expression, use a row indexer. Specify rows by date range with the .rows[start:stop:periodicity] syntax, where stop is exclusive. Slice arguments left as None default to the widest range available. int indices also work with the .rows indexer. .first and .last are included for convenience.

>>> ten_squared.rows['2021-06-01':'2021-06-03','B']                   # evaluate over a two business day date range
                 c
2021-06-01     100.0
2021-06-02     100.0
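The exclusive stop explains why the example above returns two rows rather than three. The sketch below reproduces that row index with the standard library only. The business_days helper is hypothetical and treats every weekday as a business day, which ignores holidays and is not necessarily how pynto's 'B' periodicity is defined.

```python
from datetime import date, timedelta

def business_days(start, stop):
    """Return the weekdays in the half-open range [start, stop)."""
    days = []
    current = start
    while current < stop:
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            days.append(current)
        current += timedelta(days=1)
    return days

rows = business_days(date(2021, 6, 1), date(2021, 6, 3))
print([d.isoformat() for d in rows])  # ['2021-06-01', '2021-06-02']
```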

Quotations and Combinators

Combinators are higher-order functions that allow pynto to do more complicated things like branching and looping. Combinators operate on quotations, expressions that are pushed to the stack instead of operating on the stack. Use the ~ operator to create a quotation from a word or expression. The map combinator applies a quotation at the top of the stack over each column below in the stack.

>>> (pt.c(9) + pt.c(10) + ~(pt.dup + pt.mul) + pt.map).last
                 c         c
2021-06-02      81.0     100.0

~ supports single words, expressions, and nesting:

>>> ~pt.neg                                        # quote a single word
>>> ~(pt.dup + pt.mul)                             # quote an expression
>>> ~(~(pt.c(100) + pt.sub) + pt.call) + pt.map   # nested quotation
>>> ~~quoted_expr                                  # unquote (convert quotation back to word)
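To make the quotation/combinator relationship concrete, here is a small plain-Python model. It is a sketch under stated assumptions rather than pynto's code: quote and map_ are hypothetical names, a quotation is modeled as a callable sitting on top of the stack, and the every parameter of map is left out.

```python
def dup(stack):
    """Word that duplicates the top column."""
    return stack + [list(stack[-1])]

def mul(stack):
    """Word that replaces the top two columns with their row-wise product."""
    *rest, a, b = stack
    return rest + [[x * y for x, y in zip(a, b)]]

def quote(*words):
    """Return a word that pushes the composed words instead of running them."""
    def run(stack):
        for word in words:
            stack = word(stack)
        return stack
    return lambda stack: stack + [run]

def map_(stack):
    """Pop the quotation from the top and apply it to each column below it."""
    *columns, quotation = stack
    out = []
    for col in columns:
        out.extend(quotation([col]))  # each column gets its own one-column stack
    return out

stack = [[9.0], [10.0]]          # two constant columns, one row each
stack = quote(dup, mul)(stack)   # push the quotation; nothing is multiplied yet
stack = map_(stack)              # now the quotation runs once per column
print(stack)  # [[81.0], [100.0]]
```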

Headers

Each column has a string header. hset sets the header to a new value. Headers are useful for filtering or arranging columns.

>>> (pt.c(9) + pt.c(10) + ~(pt.dup + pt.mul) + pt.map + pt.hset('a','b')).last
                 a         b
2021-06-02      81.0     100.0

Column indexers

Column indexers specify the columns on which a word operates, overriding the word's default. Use .cols[indexer] to set the column indexer. Positive int indices start from the bottom (left) of the stack and negative indices start from the top.

By default, add has a column indexer of [-2:], so it sums the top two columns:

>>> (pt.r5 + pt.add).last
              c    c    c    c
2021-06-02  0.0  1.0  2.0  7.0

Change the column indexer of add to [:] to sum all columns:

>>> (pt.r5 + pt.add.cols[:]).last
               c
2025-06-02  10.0

You can also index columns by header, using regular expressions:

>>> (pt.r3 + pt.hset('a,b,c') + pt.add.cols['(a|c)']).last
              b    a
2025-06-02  1.0  2.0
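The three results above can be reproduced with ordinary slices and regular expressions. This sketch assumes that selected columns are removed from the stack, passed to the word, and the result pushed on top. It also assumes that a header pattern must match the whole header and that the result keeps the first selected header; the last assumption agrees with the output shown above but is not documented here. All helper names are hypothetical.

```python
import re

def add(columns):
    """Row-wise sum of the selected columns, returned as one column."""
    return [sum(row) for row in zip(*columns)]

def apply_to_slice(stack, word, selector):
    """Remove the columns chosen by `selector`, apply `word`, push the result."""
    indices = range(len(stack))[selector]
    selected = [stack[i] for i in indices]
    rest = [col for i, col in enumerate(stack) if i not in indices]
    return rest + [word(selected)]

def apply_to_headers(named_stack, word, pattern):
    """Same idea, but select (header, column) pairs whose header matches a regex."""
    regex = re.compile(pattern)
    selected = [(h, col) for h, col in named_stack if regex.fullmatch(h)]
    rest = [(h, col) for h, col in named_stack if not regex.fullmatch(h)]
    # Assumption: the result takes the header of the first selected column.
    return rest + [(selected[0][0], word([col for _, col in selected]))]

stack = [[float(i)] for i in range(5)]                       # columns 0.0 .. 4.0
default_add = apply_to_slice(stack, add, slice(-2, None))    # like [-2:]
add_all = apply_to_slice(stack, add, slice(None))            # like [:]
named = [('a', [0.0]), ('b', [1.0]), ('c', [2.0])]
by_header = apply_to_headers(named, add, '(a|c)')

print(default_add)  # [[0.0], [1.0], [2.0], [7.0]]
print(add_all)      # [[10.0]]
print(by_header)    # [('b', [1.0]), ('a', [2.0])]
```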

Use .copy to keep the original columns and .discard to remove non-selected columns:

>>> (pt.r5 + pt.pull.cols[2:4].copy).values[0]     # copy selected, keep originals
>>> (pt.r5 + pt.pull.cols[2:4].discard).values[0]   # move selected, discard rest
>>> (pt.r5 + pt.pull.copy).values[0]                # copy with default selector

Defining words

Because composing words with + returns a new word, a composed expression can be assigned to a Python variable and reused.

>>> squared = pt.dup + pt.mul
>>> ten_squared2 = pt.c(10) + squared    # same thing

Words can also be defined globally in the pynto vocabulary.

>>> pt.define['squared'] = pt.dup + pt.mul

The Database

pynto has built-in database functionality that lets you save DataFrames and Series to a Redis database. The database saves the underlying numpy data in native byte format for zero-copy retrieval. Each DataFrame column is saved as an independent key and can be retrieved or updated on its own. The database also supports three-dimensional frames that have a two-level MultiIndex.

>>> pt.db['my_df'] = expr.rows['2021-06-01':'2021-06-03']
>>> pt.load('my_df').rows[:]
              constant  constant
2021-06-01      81.0     100.0
2021-06-02      81.0     100.0
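The per-column, native-bytes storage idea can be sketched with the standard library alone. Here a dict stands in for Redis, the 'frame:column' key scheme is invented for illustration, and array/memoryview stand in for numpy buffers; none of this reflects pynto's actual key layout or serialization format.

```python
from array import array

store = {}  # stand-in for a key-value database such as Redis

def save_column(key, values):
    """Store one column under its own key as raw 8-byte floats."""
    store[key] = array('d', values).tobytes()

def load_column(key):
    """Reinterpret the stored bytes as doubles without copying them."""
    return memoryview(store[key]).cast('d')

# Hypothetical key scheme: one key per column of the frame 'my_df'.
save_column('my_df:a', [81.0, 81.0])
save_column('my_df:b', [100.0, 100.0])

col = load_column('my_df:a')
print(list(col))  # [81.0, 81.0]
```

Because each column lives under its own key, one column can be read or overwritten without touching the others.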

pynto built-in vocabulary

Column Creation

Word	Default Selector	Parameters	Description
c	[-1:]	values	Pushes constant columns for each of values
day_count	[-1:]		Pushes a column with the number of days in the period
from_pandas	[:]	pandas, round_	Pushes columns from Pandas DataFrame or Series pandas
load	[-1:]		Pushes columns of a DataFrame saved to internal DB as key
nan	[-1:]	values	Pushes a constant nan-valued column
period_ordinal	[-1:]		Pushes a column with the period ordinal
r	[-1:]	n	Pushes constant columns for each whole number from 0 to n - 1
randn	[-1:]		Pushes a column with values from a random normal distribution
timestamp	[-1:]		Pushes a column with the timestamp of the end of the period

Stack Manipulation

Word	Default Selector	Parameters	Description
drop	[-1:]		Removes selected columns
dup	[-1:]		Duplicates columns
hsort	[:]		Sorts columns by header
id	[:]		Identity/no-op
interleave	[:]	parts	Divides columns in parts groups and interleaves the groups
keep	[:]		Removes non-selected columns
nip	[-1:]		Removes non-selected columns, defaulting selection to top
pull	[:]		Brings selected columns to the top
rev	[:]		Reverses the order of selected columns
roll	[:]		Permutes selected columns
swap	[-2:]		Swaps top and bottom selected columns

Quotation

Use the ~ operator to create quotations:

Expression	Description
`~pt.word`	Wraps a single word as a quotation
`~(pt.word1 + pt.word2)`	Wraps an expression as a quotation
`~~quoted`	Unwraps a quotation back into a standard word

Header manipulation

Word	Default Selector	Parameters	Description
halpha	[:]		Set headers to alphabetical values
happly	[:]	header_func	Apply header_func to headers
hformat	[:]	format_spec	Apply format_spec to headers
hreplace	[:]	old, new	Replace old with new in headers
hset	[:]	headers	Set headers to *headers
hsetall	[:]	headers	Set headers to *headers repeating, if necessary

Combinators

Word	Default Selector	Parameters	Description
call	[:]		Applies quotation
cleave	[:]	num_quotations	Applies all preceding quotations
compose	[:]	num_quotations	Combines quotations
hmap	[:]		Applies quotation to stacks created grouping columns by header
ifexists	[:]	count	Applies quotation if stack has at least count columns
ifexistselse	[:]	count	Applies top quotation if stack has at least count columns, otherwise applies second quotation
ifheaders	[:]	predicate	Applies top quotation if list of column headers fulfills predicate
ifheaderselse	[:]	predicate	Applies quotation if list of column headers fulfills predicate, otherwise applies second quotation
map	[:]	every	Applies quotation to columns in groups of every
partial	[-1:]	quoted, this	Pushes stack columns to the front of quotation
repeat	[:]	times	Applies quotation times times

Data cleanup

Word	Default Selector	Parameters	Description
ffill	[:]	lookback, leave_end	Fills nans with previous values, looking back lookback before the range; trailing nans are left in place only if leave_end is set
fill	[:]		Fills nans with value
fillfirst	[-1:]	lookback	Fills first row with previous non-nan value, looking back lookback before range
join	[-2:]	date	Joins two columns at date
sync	[:]		Aligns available data by setting all values to NaN when any value is NaN
zero_first	[-1:]		Changes first value to zero
zero_to_na	[-1:]		Changes zeros to nans
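As a rough illustration of two of the cleanup words, the sketch below shows a forward fill and a row-wise sync in plain Python. It is simplified: the lookback and leave_end parameters are omitted, and sync is assumed to work row by row. The function bodies are illustrative, not pynto's implementation.

```python
import math

def ffill(values):
    """Replace each NaN with the most recent non-NaN value, if there is one."""
    out, last = [], math.nan
    for v in values:
        if not math.isnan(v):
            last = v
        out.append(last)
    return out

def sync(columns):
    """Set a whole row to NaN when any column has NaN in that row (assumption)."""
    synced_rows = [
        [math.nan] * len(row) if any(math.isnan(v) for v in row) else list(row)
        for row in zip(*columns)
    ]
    return [list(col) for col in zip(*synced_rows)]

filled = ffill([math.nan, 1.0, math.nan, 3.0])
print(filled)  # [nan, 1.0, 1.0, 3.0]

synced = sync([[1.0, math.nan, 3.0], [4.0, 5.0, 6.0]])
print(synced)  # [[1.0, nan, 3.0], [4.0, nan, 6.0]]
```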

Resample methods

Word	Default Selector	Parameters	Description
resample_avg	[:]		Sets periodicity resampling method to avg
resample_first	[:]		Sets periodicity resampling method to first
resample_firstnofill	[:]		Sets periodicity resampling method to first with no fill
resample_last	[:]		Sets periodicity resampling method to last
resample_lastnofill	[:]		Sets periodicity resampling method to last with no fill
resample_max	[:]		Sets periodicity resampling method to max
resample_min	[:]		Sets periodicity resampling method to min
resample_sum	[:]		Sets periodicity resampling method to sum
set_periodicity	[-1:]	periodicity	Changes column periodicity to periodicity, then resamples
set_start	[-1:]	start	Changes period start to start, then resamples

Row-wise Reduction

Word	Default Selector	Parameters	Description
add	[-2:]	ignore_nans	Addition
avg	[-2:]	ignore_nans	Arithmetic average
div	[-2:]	ignore_nans	Division
max	[-2:]	ignore_nans	Maximum
med	[-2:]	ignore_nans	Median
min	[-2:]	ignore_nans	Minimum
mod	[-2:]	ignore_nans	Modulo
mul	[-2:]	ignore_nans	Multiplication
pow	[-2:]	ignore_nans	Power
std	[-2:]	ignore_nans	Standard deviation
sub	[-2:]	ignore_nans	Subtraction
var	[-2:]	ignore_nans	Variance

Row-wise Reduction Ignoring NaNs

Word	Default Selector	Parameters	Description
nadd	[-2:]	ignore_nans	Addition
navg	[-2:]	ignore_nans	Arithmetic average
ndiv	[-2:]	ignore_nans	Division
nmax	[-2:]	ignore_nans	Maximum
nmed	[-2:]	ignore_nans	Median
nmin	[-2:]	ignore_nans	Minimum
nmod	[-2:]	ignore_nans	Modulo
nmul	[-2:]	ignore_nans	Multiplication
npow	[-2:]	ignore_nans	Power
nstd	[-2:]	ignore_nans	Standard deviation
nsub	[-2:]	ignore_nans	Subtraction
nvar	[-2:]	ignore_nans	Variance
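The difference between the two reduction families comes down to how a NaN in a row is handled. Here is a minimal sketch for addition, with plain functions standing in for the add and nadd words. What the NaN-ignoring version returns when every value in a row is NaN is not specified above; this sketch returns 0 in that case.

```python
import math

def add(row):
    """Plain row-wise sum: a single NaN makes the result NaN."""
    return sum(row)

def nadd(row):
    """Row-wise sum that skips NaN values."""
    return sum(v for v in row if not math.isnan(v))

row = [1.0, math.nan, 2.0]
print(add(row))   # nan
print(nadd(row))  # 3.0
```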

Rolling Window

Word	Default Selector	Parameters	Description
ewm_mean	[-1:]	window	Exponentially-weighted moving average
ewm_std	[-1:]	window	Exponentially-weighted standard deviation
ewm_var	[-1:]	window	Exponentially-weighted variance
radd	[-1:]	window	Addition
ravg	[-1:]	window	Arithmetic average
rcor	[-2:]	window	Correlation
rcov	[-2:]	window	Covariance
rdif	[-1:]	window	Lagged difference
rlag	[-1:]	window	Lag
rmax	[-1:]	window	Maximum
rmed	[-1:]	window	Median
rmin	[-1:]	window	Minimum
rret	[-1:]	window	Lagged return
rstd	[-1:]	window	Standard deviation
rvar	[-1:]	window	Variance
rzsc	[-1:]	window	Z-score
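The lag-based words can be read as follows, assuming that window is the number of rows to look back and that rows without enough history are NaN. These are the conventional definitions of lag, lagged difference, and lagged return, written in plain Python; they are not confirmed against pynto's source.

```python
import math

def rlag(values, window):
    """Value from `window` rows earlier; NaN where no such row exists."""
    return [values[i - window] if i >= window else math.nan
            for i in range(len(values))]

def rdif(values, window):
    """Lagged difference: current value minus the lagged value."""
    return [v - lagged for v, lagged in zip(values, rlag(values, window))]

def rret(values, window):
    """Lagged return: current value divided by the lagged value, minus one."""
    return [v / lagged - 1 for v, lagged in zip(values, rlag(values, window))]

prices = [100.0, 110.0, 121.0]
lagged = rlag(prices, 1)
diffs = rdif(prices, 1)
returns = rret(prices, 1)
print(lagged)  # [nan, 100.0, 110.0]
print(diffs)   # [nan, 10.0, 11.0]
print(returns) # first value is nan, the others are approximately 0.1
```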

Cumulative

Word	Default Selector	Description
cadd	[-1:]	Addition
cavg	[-1:]	Arithmetic average
cdif	[-1:]	Lagged difference
clag	[-1:]	Lag
cmax	[-1:]	Maximum
cmin	[-1:]	Minimum
cmul	[-1:]	Multiplication
cret	[-1:]	Lagged return
cstd	[-1:]	Standard deviation
csub	[-1:]	Subtraction
cvar	[-1:]	Variance

Reverse Cumulative

Word	Default Selector	Description
rcadd	[-1:]	Addition
rcavg	[-1:]	Arithmetic average
rcdif	[-1:]	Lagged difference
rclag	[-1:]	Lag
rcmax	[-1:]	Maximum
rcmin	[-1:]	Minimum
rcmul	[-1:]	Multiplication
rcret	[-1:]	Lagged return
rcstd	[-1:]	Standard deviation
rcsub	[-1:]	Subtraction
rcvar	[-1:]	Variance
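One plausible reading of the cumulative and reverse cumulative families is a running reduction from the first row forward versus from the last row backward. The sketch below shows that reading for addition and maximum using itertools.accumulate. The direction of the reverse variants is an assumption, and the functions are illustrative stand-ins for the words of the same name.

```python
from itertools import accumulate

def cadd(values):
    """Running sum from the first row forward."""
    return list(accumulate(values))

def cmax(values):
    """Running maximum from the first row forward."""
    return list(accumulate(values, max))

def rcadd(values):
    """Running sum from the last row backward (assumed meaning of 'reverse')."""
    return list(accumulate(reversed(values)))[::-1]

data = [1.0, 3.0, 2.0]
print(cadd(data))   # [1.0, 4.0, 6.0]
print(cmax(data))   # [1.0, 3.0, 3.0]
print(rcadd(data))  # [6.0, 5.0, 2.0]
```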

One-for-one functions

Word	Default Selector	Description
abs	[-1:]	Absolute value
dec	[-1:]	Decrement
exp	[-1:]	Exponential
expm1	[-1:]	Exponential minus one
inc	[-1:]	Increment
inv	[-1:]	Multiplicative inverse
lnot	[-1:]	Logical not
log	[-1:]	Natural log
log1p	[-1:]	Natural log of increment
neg	[-1:]	Additive inverse
rank	[:]	Row-wise rank
sign	[-1:]	Sign
sqrt	[-1:]	Square root
