Excel on the Command Line
Jon Allen
[Link] - jj@[Link]
Perl ‘one-liners’
• Perl’s command-line options can be used to
manipulate data without writing a full script
perl -le ‘print 2+2’
perl -nle ‘/pattern/ and print’ [Link]
• Improves on standard UNIX tools (grep, sed, awk)
• Can use all features of Perl (special variables,
regular expressions, etc.)
Excel on the Command Line [Link]
Perl command-line options
• Common options used for one-liners:
-e = execute Perl code
-n = non-printing input loop
-p = printing input loop
-l = automatic line-end processing
-a = autosplit fields to @F
-F = field separator for -a
-i = in-place file editing
• See ‘perldoc perlrun’ for the complete list
• Read ‘Minimal Perl’ (by Tim Maher, published by
Manning) for an in-depth tutorial
Excel on the Command Line [Link]
Not all data is text
• However…
• These tools are limited to processing text data - it
is not possible to ‘grep’ a Word document or a PDF
file
• Not always a problem - log files, config data, are
often stored as plain text
• For non-techies, predominant file format is
Microsoft Excel (.xls)
Excel on the Command Line [Link]
Excel - Perl for Windows?
• No, not really! :-)
• But there are similarities
– Ubiquitous
– Versatile
– Low barrier to entry - simple learning curve
– DWIM
• Perl = Swiss Army Chainsaw of the Internet
• Excel = Swiss Army Chainsaw of Business
Excel on the Command Line [Link]
Introducing the Excel modules
• Always search the CPAN - the hard work may
already have been done!
• Reading:
– Spreadsheet::ParseExcel
– [Link]
• Writing:
– Spreadsheet::WriteExcel
– [Link]
Excel on the Command Line [Link]
Spreadsheet::ParseExcel
• Read the value of cell A1 in the first worksheet
#! /Users/jj/bin/perl
use strict;
use warnings;
use Spreadsheet::ParseExcel;
my $file = shift @ARGV;
my $excel = Spreadsheet::ParseExcel::Workbook->Parse($file);
my $worksheet = $excel->Worksheet(0);
my $cell = $worksheet->Cell(0,0);
print $cell->Value,"\n";
Excel on the Command Line [Link]
Spreadsheet::WriteExcel
• Create a new Excel file with a message in cell A1
#! /Users/jj/bin/perl
use strict;
use warnings;
use Spreadsheet::WriteExcel;
my $file = shift @ARGV;
my $excel = Spreadsheet::WriteExcel->new($file);
my $worksheet = $excel->add_worksheet();
$worksheet->write('A1','Hello, World!');
• Huge amount of Excel features supported
Excel on the Command Line [Link]
Back to the command line
• The CPAN Excel modules work well, but they are
not really suitable for writing one-liners:
perl -MSpreadsheet::ParseExcel -e ‘$excel =
Spreadsheet::ParseExcel::Workbook->Parse(“[Link]
• We’ve run out of space before even loading a file!
• Need new tools…
Excel on the Command Line [Link]
XLSgrep
• XLSgrep - proof of concept, simple replacement for
UNIX ‘grep’ command
XLSgrep ‘<pattern>’ [Link]
• Iterates over each cell in the spreadsheet, printing
those that match <pattern>
• See [Link]
• Works, but very limited
Excel on the Command Line [Link]
XLSgrep - source extract
use open IN => ":bytes";
my $excel = Spreadsheet::ParseExcel->new(
NotSetCell => 1,
CellHandler => sub {
my ($workbook,$sheet_index,$row,$col,$cell) = @_;
if (my $value = $cell->Value) {
chomp $value;
print "$value\n" if ($value =~ /$pattern/);
}
}
);
undef $/;
while (<>) {
$excel->Parse(\$_);
}
Excel on the Command Line [Link]
XLSperl
• XLSperl - aims to allow any Perl one-liner to run
against an Excel file with little or no modification
• Standard ‘grep’ command:
perl -nle ‘/pattern/ and print’ [Link]
XLSperl -nle ‘/pattern/ and print’ [Link]
• See [Link]
Excel on the Command Line [Link]
XLSperl - options
• Supports the following command-line options:
-e ‘code’ = Perl code to execute (required!)
-n = Assumes an input loop iterating
over each cell (default)
-a = Changes input loop to iterate over
each row, splitting cells to @F and %F
-F = Set input record separator (when
processing text data)
-p = Prints $_ at the end of each loop
-l = Automatic line-end processing
-w = Enable warnings
Excel on the Command Line [Link]
XLSperl - variables
• Adds new special variables:
$WS = Worksheet name
$ROW = Current row (1 .. x)
$COL = Current column name (A .. x)
$COLNUM = Current column number (1 .. x)
$CELL = Current cell (A1 .. ZZx)
@F = Array of cell values (in autosplit mode)
%F = Hash of cell values (in autosplit mode)
• @F and %F are tied (modifying $F[0] changes $F{A})
• $COL and $CELL are unavailable in autosplit mode
Excel on the Command Line [Link]
XLSperl - creating Excel files
• Adds new command - XLSprint
– Write a single row:
XLSprint “Hello”,“World”;
– Write to multiple filehandles (but don’t print plain text to
the same files!):
XLSprint STDOUT @cells;
XLSprint STDERR @errors;
• XLSperl can also process standard text documents,
enabling easy conversion to Excel format
Excel on the Command Line [Link]
XLSperl - examples
• Search multiple files and print matching cell
locations:
XLSperl -nle ‘/pattern/ and print
“$ARGV:$WS:$CELL - $_”’ *.xls
• Convert Excel file to CSV format:
XLSperl -lane ‘print join “,”,@F’ [Link]
• Extract specific column as text:
XLSperl -lane ‘print $F{B}’ [Link]
Excel on the Command Line [Link]
XLSperl - more examples
• Convert UNIX password file to Excel format:
XLSperl -F: -lane ‘next if /^#/; XLSprint @F’
/etc/passwd >[Link]
• Concatenate multiple Excel files:
XLSperl -lane ‘XLSprint @F’ *.xls >[Link]
• Extract rows from an Excel file:
XLSperl -lane ‘XLSprint @F if($ROW <= 10)’
[Link] >[Link]
Excel on the Command Line [Link]
Future enhancements
• Run as command interpreter (#!/usr/bin/XLSperl)
• Implement ‘-M’ option to load additional Perl
modules
• In-place editing of Excel files (use the ‘-i’ option),
that will keep formatting of original document
XLSperl -[Link] -pe ‘s/old/new/’ [Link]
• Any ideas?
Excel on the Command Line [Link]
Fin!
Thank you for listening!
Source code is available from
[Link]
Excel on the Command Line [Link]