me nugget: function

Showing posts with label function. Show all posts

Thursday, April 2, 2015

Add text using relative plot region coordinates

Here is a simple but useful function for adding text to a plot device using the relative x, y coordinates (i.e. between 0 and 1). I found myself programming these few lines repeatedly and decided to finally bundle it into a function, called reltext(). The function can be found below, and is also available in the sinkr package (https://github.com/marchtaylor/sinkr).

"sinkr" - a collection of functions featured on "me nugget"

The R package sinkr (version 1.0) has now been released: https://github.com/marchtaylor/sinkr

I have finally gotten around to learning how to create an R package and decided to start by bundling functions that I have featured on the blog. Thanks to the R Studio team for making this so easy (in combination with the R packages roxygen2 and devtools). In addition to the great tips on the R Studio website, I found the following Youtube videos helpful along the way:

Create an R Package in RStudio (Basic package creation)
Creating RStudio projects from GitHub Repositories (especially the tip to use the SSH clone URL link rather than HTTPS)

Being new to the world of R packages (and due to the eclectic nature of the functions in sinkr), I'm not confident to upload this to CRAN. But, one can easily install the package using devtools and the following code (see https://github.com/hadley/devtools for OS-specific package updating tips):

library(devtools)
install_github("marchtaylor/sinkr")

For those using functions that were featured on this site in the past, the present versions may differ slightly, especially in the case of function names. For example, function names like plot.stacked were getting associated with the general plot function during the package build and thus "." characters have been removed from all function names (e.g. plotStacked).

sinkr functions include:

Data mountains and streams - stacked area plots in R

Below are two functions for producing stacked area plots. The first is the more typical approach where sequential series are stacked on top of another (function: plot.stacked), while the second approach is the more aesthetically-oriented version called a "stream plot" (function: plot.stream), which alternates series on either side of a meandering baseline (see here for the motivation, and here for the inspiration).

Arguments are similar for both functions regarding the input of x and y series and polygon attributes (fill color, border color, border line width). The stream plot also requires that the degree of meandering for the baseline be defined by the arguments frac.rand and spar; frac.rand, controls the meander amplitude (uniform random numbers added to baseline as a fraction of the total y range) and spar controls the amount of smoothing (as fit by the function smooth.spline).

The plot above colors the series with a color gradient of when the first appear in the series, while the plot below colors series by their maximum value. The order of the plotting of the series can also affect the the emphasis on the plot. By default, plotting order is sequential by column, although two ordering options are built-in to the functions: order by maximum value, and order by first appearance.

The plot.stacked function:

New version of image.scale function

(Note: the most recent version, imageScale can be found in the sinkr package: https://github.com/marchtaylor/sinkr)

Below is an updated version of the image.scale function. In the old version, one had to constantly use additional arguments to suppress axes and their labels. The new version contains the additional arguments axis.pos (1, 2, 3, or 4) for defining the side of the axis, and add.axis (TRUE or FALSE), for defining whether the axis is plotted. Based on the position of the axis, the scale color levels are automatically drawn in a horizontal (axis.pos = 1[bottom] or 3[top]) or vertical (axis.pos = 2[left] or 4[right]) orientation. For the right plot above, the argument add.axis=FALSE so that additional control over axis ticks and labels could be added in an additional step with axis(). The function mtext() can be used to add additional labels to the scale.

The image.scale function:

Lomb-Scargle periodogram for unevenly sampled time series

In the natural sciences, it is common to have incomplete or unevenly sampled time series for a given variable. Determining cycles in such series is not directly possible with methods such as Fast Fourier Transform (FFT) and may require some degree of interpolation to fill in gaps. An alternative is the Lomb-Scargle method (or least-squares spectral analysis, LSSA), which estimates a frequency spectrum based on a least squares fit of sinusoid.

The above figure shows a Lomb-Scargle periodogram of a time series of sunspot activity (1749-1997) with 50% of monthly values missing. As expected (link1, link2), the periodogram displays a a highly significant maximum peak at a frequency of ~11 years.

The function comes from a nice set of functions that I found here: http://research.stowers-institute.org/efg/2005/LombScargle/R/index.htm. An accompanying paper focusing on its application to time series of gene expression can be found here.

Below is a comparison to an FFT of the full time series. For another great resource on spectral analysis, and time series-related R methods in general, see the following website: http://zoonek2.free.fr/UNIX/48_R/15.html.

To reproduce the example:

Spirograph with R

Just had to figure out how to replicate this old toy of mine with R! I had no idea how long it's been around:

DINEOF (Data Interpolating Empirical Orthogonal Functions)

I finally got around to reproducing the DINEOF method (Beckers and Rixon, 2003) for optimizing EOF analysis on gappy data fields - it is especially useful for remote sensing data where cloud cover can result in large gaps in data. Their paper gives a nice overview of some of the various methods that have been used for such data sets. One of these approaches, which I have written about before, involves deriving EOFs from a covariance matrix as calculated from available data. Unfortunately, as the author's point out, such covariance matrices are no longer positive-definite, which can lead to several problems. The DINEOF method seems to overcome several of these issues.

Maximum Covariance Analysis (MCA)

Maximum Covariance Analysis (MCA) (Mode 1; scaled) of Sea Level Pressure (SLP) and Sea Surface Temperature (SST) monthly anomalies for the region between -180 °W to -70 °W and +30 °N to -30 °S. MCA coefficients (scaled) are below. The mode represents 94% of the squared covariance fraction (SCF).

Maximum Correlation Analysis (MCA) is similar to Empirical Orthogonal Function Analysis (EOF) in that they both deal with the decomposition of a covariance matrix. In EOF, this is a covariance matrix based on a single spatio-temporal field, while MCA is based on the decomposition of a "cross-covariance" matrix derived from two fields.

Define intermediate color steps for colorRampPalette

The following function, color.palette(), is a wrapper for colorRampPalette() and allows some increased flexibility in defining the spacing between main color levels. One defines both the main color levels (as with colorRampPalette) and an optional vector containing the number of color levels that should be put in between at equal distances.

The above figure shows the effect on a color scale (see image.scale) containing 5 main colors (blue, cyan, white, yellow, and red). The result of colorRampPalette (upper) produces an equal number of levels between the main colors. By increasing the number of intermediate colors between blue-cyan and yellow-red (lower), the number of color levels in the near white range is reduced. The resulting palette, for example, was better in highlighting the positive and negative values of an Emprical Orthogonal Function (EOF) mode.

[Updates]: The following approach has serious shortcomings, which I have recently become aware of. In a comparison of gappy EOF approaches Taylor et al. (2013) [pdf] show that this traditional approach is not as accurate as others. Specifically, the approach of DINEOF (Data Interpolating Empirical Orthogonal Functions) proved to be the most accurate. I have outlined the DINEOF algorithm in another post [link]. and show a comparison of gappoy EOF methods here: http://menugget.blogspot.de/2014/09/pca-eof-for-data-with-missing-values.html. The R package "sinkr" now contains a version of the function ("eof") for easy installation: https://github.com/menugget/sinkr

-----------------

The following is a function for the calculation of Empirical Orthogonal Functions (EOF). For those coming from a more biologically-oriented background and are familiar with Principal Component Analysis (PCA), the methods are similar. In the climate sciences the method is usually used for the decomposition of a data field into dominant spatial-temporal modes.

Converting values to color levels

Adding color to a plot is helpful in many situations for visualizing an additional dimension of the data. Again, I wrote the below function "val2col" in R after having coded this manually over and over in the past. It uses similar arguments as the image function in that one defines the colors to be used as well as optional break points or z-limits by which the data is binned into those colors. The typical situation where I use the function is with the mapping of climate data, yet the addition of color to an XY plot can often be easier on the eyes than adding an additional z-axis spatial dimension. In combination with the image.scale function, that I previously posted, the data can be quickly interpretted.
As an example, gridded sea level pressure is plotted above as projected polygons using the packages maps and mapproj. Values were converted to colors with the val2col function and the image.scale function plotted a corresponding scale. For those interested in using netcdf files, the example also uses the ncdf package for reading the data files into R.

Adding a scale to an image plot

[NOTE: new version of the image.scale function can be found here: http://menugget.blogspot.de/2013/12/new-version-of-imagescale-function.html.]

Here's a function that allows you to add a color scale legend to an image plot (or probably any plot needing a z-level scale). I found myself having to program this over and over again, and just decided to make a plotting function for future use. While I really like the look of levelplot(), the modular aspect of image() makes it much more handy to combine with other plotting commands or overlays.
For example, as far as I can tell, the simple addition of the triangle symbol to mark the highest point in the above map of Maunga Whau volcano is not possible with levelplot.
After adding this symbol, the function below - image.scale() - was used to add the accompanying color scale to another area of the device.

The function...

Clarke and Ainsworth's BIOENV and BVSTEP (and BIO-BIO etc...)

Nonmetric Multidimensional Scaling (NMDS) plot of vegetation sample dissimilarities with best correlating environmental variables (left) and species (right) plotted as vectors (datasets "varespec" and "varechem" from the package vegan)

The R package "vegan" contains a version of Clarke and Ainsworth's (1993) BIOENV analysis allowing for the comparison of distance/similarity matrices between two sets of data having either samples or variables in common. The typical setup is in the exploration of environmental variables that best correlate to sample similarities of the biological community (e.g. species biomass or abundance). In this case, the similarity matrix of the community is fixed, while subsets of the environmental variables are used in the calculation of the environmental similarity matrix. A correlation coefficient (typically Spearman rank correlation coefficient) is then calculated between the two matrices and the best subset of environmental variables can then be identified and further subjected to a permutation test to determine significance.

This can be a very helpful analysis in the exploration of the often highly dimensional space of community samples. The method is also widely accepted by the scientific community due to its flexibility across a wide variety of data and is completely non-parametric - Clarke and Ainsworth's (1993) paper describing the method has 674 citations on Google Scholar at the time of this posting.

The R package "vegan" incorporates this routine in the function bioenv(). An example of a BIOENV exploration between vegetation community data (dataset "varespec" in the vegan package) and the environmental data (dataset "varechem" in the vegan package) :

Image color palette replacement

Here is an example of a function I wrote to change the color palette used in an image. The above example comes from a black and white original, although color images can also be used. The function first converts the image to grayscale in order to have levels of color intensity between 0-255. Using a new color palette with 256 color levels, the gray levels are replaced with a rgb (red, blue, green) vector from the new palette. The results can be very strange...
The package biOps is required for reading and writing the .jpeg files.

the function...

map.xyz(): interpolation of XYZ data and projection onto a map

I am still struggling to get a grasp of R's mapping capabilities. Part of my frustration lies in the fact that I often work on areas near the poles, which complicates interpolation across the 180 degree line. For smaller areas, interpolation can be done using the interp() function in the package akima. I have taken the results from interp and projected the image onto a map. You will need the akima, maps, and mapproj packages and the functions new.lon.lat(), earth.dist(), and pos2coord().

As an example I have mapped the distance from Mecca, Saudi Arabia:

The function...

#A function that is sometimes useful in determining the 
#coordinate(i.e. row and column number) of a matrix position
#(and vice-versa). 
#Either a vector of positions ("pos") 
#OR a 2 column matrix of matrix coordinates, ("coord", i.e. cbind(row,col)), 
#AND the matrix dimentions must be supplied (dim.mat, i.e. c(nrow,ncol)).
pos2coord<-function(pos=NULL, coord=NULL, dim.mat=NULL){
 if(is.null(pos) & is.null(coord) | is.null(dim.mat)){
  stop("must supply either 'pos' or 'coord', and 'dim.mat'")
 }
 if(is.null(pos) & !is.null(coord) & !is.null(dim.mat)){
  pos <- ((coord[,2]-1)*dim.mat[1])+coord[,1] 
  return(pos)
 }
 if(!is.null(pos) & is.null(coord) & !is.null(dim.mat)){
  coord <- matrix(NA, nrow=length(pos), ncol=2)
  coord[,1] <- ((pos-1) %% dim.mat[1]) +1
  coord[,2] <- ((pos-1) %/% dim.mat[1]) +1
  return(coord)
 }
}

Created by Pretty R at inside-R.org

Sunday, May 29, 2011

R functions for Earth geographic coordinate calculations

Here are some functions that I regularly use for geographic data (e.g. binning, filtering, calculation of new positions etc.).

#distance in kilometers between two long/lat positions (from "fossil" package)
earth.dist <- function (long1, lat1, long2, lat2) 
{
    rad <- pi/180
    a1 <- lat1 * rad
    a2 <- long1 * rad
    b1 <- lat2 * rad
    b2 <- long2 * rad
    dlon <- b2 - a2
    dlat <- b1 - a1
    a <- (sin(dlat/2))^2 + cos(a1) * cos(b1) * (sin(dlon/2))^2
    c <- 2 * atan2(sqrt(a), sqrt(1 - a))
    R <- 6378.145
    d <- R * c
    return(d)
}

Created by Pretty R at inside-R.org

#degree bearing between two long/lat positions (from "fossil" package)
earth.bear <- function (long1, lat1, long2, lat2) 
{
    rad <- pi/180
    a1 <- lat1 * rad
    a2 <- long1 * rad
    b1 <- lat2 * rad
    b2 <- long2 * rad
    dlon <- b2 - a2
    bear <- atan2(sin(dlon) * cos(b1), cos(a1) * sin(b1) - sin(a1) * 
        cos(b1) * cos(dlon))
    deg <- (bear%%(2 * pi)) * (180/pi)
    return(deg)
}

Created by Pretty R at inside-R.org

new.lon.lat <-
function (lon, lat, bearing, distance) 
{
    rad <- pi/180
    a1 <- lat * rad
    a2 <- lon * rad
    tc <- bearing * rad
    d <- distance/6378.145
    nlat <- asin(sin(a1) * cos(d) + cos(a1) * sin(d) * cos(tc))
    dlon <- atan2(sin(tc) * sin(d) * cos(a1), cos(d) - sin(a1) * 
        sin(nlat))
    nlon <- ((a2 + dlon + pi)%%(2 * pi)) - pi
    npts <- cbind(nlon/rad, nlat/rad)
    return(npts)
}

Created by Pretty R at inside-R.org

#tells which lon lat positions are within the defined limits to the west, east, north, and south
lon.lat.filter <-
function (lon_vector, lat_vector, west, east, north, south) 
{
 if(west>east) {
  lon_vector_new=replace(lon_vector, which(lon_vector<0), lon_vector[which(lon_vector<0)]+360)
  east_new=east+360
 } else {
  lon_vector_new=lon_vector
  east_new=east
 }
  hits=which(lon_vector_new < east_new & lon_vector_new > west & lat_vector < north & lat_vector > south)
 return(hits)
}

Created by Pretty R at inside-R.org

me nugget

Thursday, April 2, 2015

Add text using relative plot region coordinates

Tuesday, September 2, 2014

"sinkr" - a collection of functions featured on "me nugget"

Monday, December 9, 2013

Data mountains and streams - stacked area plots in R

Thursday, December 5, 2013

New version of image.scale function

Thursday, January 10, 2013

Lomb-Scargle periodogram for unevenly sampled time series

Sunday, December 30, 2012

Spirograph with R

Tuesday, October 30, 2012

DINEOF (Data Interpolating Empirical Orthogonal Functions)

Tuesday, December 13, 2011

Maximum Covariance Analysis (MCA)

Thursday, November 24, 2011

Define intermediate color steps for colorRampPalette

Empirical Orthogonal Function (EOF) Analysis for gappy data

Monday, September 12, 2011

Converting values to color levels

Wednesday, August 31, 2011

Adding a scale to an image plot

Thursday, June 30, 2011

Clarke and Ainsworth's BIOENV and BVSTEP (and BIO-BIO etc...)

Friday, June 10, 2011

Image color palette replacement

Monday, May 30, 2011

map.xyz(): interpolation of XYZ data and projection onto a map

Array position to matrix coordinates conversion

Sunday, May 29, 2011

R functions for Earth geographic coordinate calculations

NPR Topics: Science

WWF - Environmental News