0% found this document useful (0 votes)
197 views19 pages

Excel On The Command Line

This document discusses using Perl to process Excel files from the command line. It introduces the Spreadsheet::ParseExcel and Spreadsheet::WriteExcel modules for reading and writing Excel files in Perl. It then describes tools like XLSgrep and XLSperl that were created to enable using common Perl one-liners and options like grep, sed, and awk against Excel files directly from the command line. XLSperl in particular aims to allow any Perl one-liner to run against an Excel file with minimal modification by supporting special variables and options. Examples of uses like searching cells, converting formats, and extracting data are provided.

Uploaded by

openid_9V8YnhpL
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
197 views19 pages

Excel On The Command Line

This document discusses using Perl to process Excel files from the command line. It introduces the Spreadsheet::ParseExcel and Spreadsheet::WriteExcel modules for reading and writing Excel files in Perl. It then describes tools like XLSgrep and XLSperl that were created to enable using common Perl one-liners and options like grep, sed, and awk against Excel files directly from the command line. XLSperl in particular aims to allow any Perl one-liner to run against an Excel file with minimal modification by supporting special variables and options. Examples of uses like searching cells, converting formats, and extracting data are provided.

Uploaded by

openid_9V8YnhpL
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Excel on the Command Line

Jon Allen
[Link] - jj@[Link]
Perl ‘one-liners’
• Perl’s command-line options can be used to
manipulate data without writing a full script
perl -le ‘print 2+2’
perl -nle ‘/pattern/ and print’ [Link]

• Improves on standard UNIX tools (grep, sed, awk)


• Can use all features of Perl (special variables,
regular expressions, etc.)

Excel on the Command Line [Link]


Perl command-line options
• Common options used for one-liners:
-e = execute Perl code
-n = non-printing input loop
-p = printing input loop
-l = automatic line-end processing
-a = autosplit fields to @F
-F = field separator for -a
-i = in-place file editing

• See ‘perldoc perlrun’ for the complete list


• Read ‘Minimal Perl’ (by Tim Maher, published by
Manning) for an in-depth tutorial

Excel on the Command Line [Link]


Not all data is text
• However…
• These tools are limited to processing text data - it
is not possible to ‘grep’ a Word document or a PDF
file
• Not always a problem - log files, config data, are
often stored as plain text

• For non-techies, predominant file format is


Microsoft Excel (.xls)

Excel on the Command Line [Link]


Excel - Perl for Windows?
• No, not really! :-)
• But there are similarities
– Ubiquitous
– Versatile
– Low barrier to entry - simple learning curve
– DWIM

• Perl = Swiss Army Chainsaw of the Internet


• Excel = Swiss Army Chainsaw of Business

Excel on the Command Line [Link]


Introducing the Excel modules
• Always search the CPAN - the hard work may
already have been done!

• Reading:
– Spreadsheet::ParseExcel
– [Link]
• Writing:
– Spreadsheet::WriteExcel
– [Link]

Excel on the Command Line [Link]


Spreadsheet::ParseExcel
• Read the value of cell A1 in the first worksheet
#! /Users/jj/bin/perl

use strict;
use warnings;
use Spreadsheet::ParseExcel;

my $file = shift @ARGV;


my $excel = Spreadsheet::ParseExcel::Workbook->Parse($file);
my $worksheet = $excel->Worksheet(0);
my $cell = $worksheet->Cell(0,0);

print $cell->Value,"\n";

Excel on the Command Line [Link]


Spreadsheet::WriteExcel
• Create a new Excel file with a message in cell A1
#! /Users/jj/bin/perl

use strict;
use warnings;
use Spreadsheet::WriteExcel;

my $file = shift @ARGV;


my $excel = Spreadsheet::WriteExcel->new($file);
my $worksheet = $excel->add_worksheet();

$worksheet->write('A1','Hello, World!');

• Huge amount of Excel features supported

Excel on the Command Line [Link]


Back to the command line
• The CPAN Excel modules work well, but they are
not really suitable for writing one-liners:

perl -MSpreadsheet::ParseExcel -e ‘$excel =


Spreadsheet::ParseExcel::Workbook->Parse(“[Link]

• We’ve run out of space before even loading a file!


• Need new tools…

Excel on the Command Line [Link]


XLSgrep
• XLSgrep - proof of concept, simple replacement for
UNIX ‘grep’ command
XLSgrep ‘<pattern>’ [Link]

• Iterates over each cell in the spreadsheet, printing


those that match <pattern>
• See [Link]
• Works, but very limited

Excel on the Command Line [Link]


XLSgrep - source extract
use open IN => ":bytes";

my $excel = Spreadsheet::ParseExcel->new(
NotSetCell => 1,
CellHandler => sub {
my ($workbook,$sheet_index,$row,$col,$cell) = @_;
if (my $value = $cell->Value) {
chomp $value;
print "$value\n" if ($value =~ /$pattern/);
}
}
);

undef $/;
while (<>) {
$excel->Parse(\$_);
}

Excel on the Command Line [Link]


XLSperl
• XLSperl - aims to allow any Perl one-liner to run
against an Excel file with little or no modification

• Standard ‘grep’ command:

perl -nle ‘/pattern/ and print’ [Link]


XLSperl -nle ‘/pattern/ and print’ [Link]

• See [Link]

Excel on the Command Line [Link]


XLSperl - options
• Supports the following command-line options:
-e ‘code’ = Perl code to execute (required!)
-n = Assumes an input loop iterating
over each cell (default)
-a = Changes input loop to iterate over
each row, splitting cells to @F and %F
-F = Set input record separator (when
processing text data)
-p = Prints $_ at the end of each loop
-l = Automatic line-end processing
-w = Enable warnings

Excel on the Command Line [Link]


XLSperl - variables
• Adds new special variables:
$WS = Worksheet name
$ROW = Current row (1 .. x)
$COL = Current column name (A .. x)
$COLNUM = Current column number (1 .. x)
$CELL = Current cell (A1 .. ZZx)
@F = Array of cell values (in autosplit mode)
%F = Hash of cell values (in autosplit mode)

• @F and %F are tied (modifying $F[0] changes $F{A})


• $COL and $CELL are unavailable in autosplit mode

Excel on the Command Line [Link]


XLSperl - creating Excel files
• Adds new command - XLSprint
– Write a single row:
XLSprint “Hello”,“World”;
– Write to multiple filehandles (but don’t print plain text to
the same files!):
XLSprint STDOUT @cells;
XLSprint STDERR @errors;

• XLSperl can also process standard text documents,


enabling easy conversion to Excel format

Excel on the Command Line [Link]


XLSperl - examples
• Search multiple files and print matching cell
locations:
XLSperl -nle ‘/pattern/ and print
“$ARGV:$WS:$CELL - $_”’ *.xls

• Convert Excel file to CSV format:


XLSperl -lane ‘print join “,”,@F’ [Link]

• Extract specific column as text:


XLSperl -lane ‘print $F{B}’ [Link]

Excel on the Command Line [Link]


XLSperl - more examples
• Convert UNIX password file to Excel format:
XLSperl -F: -lane ‘next if /^#/; XLSprint @F’
/etc/passwd >[Link]

• Concatenate multiple Excel files:


XLSperl -lane ‘XLSprint @F’ *.xls >[Link]

• Extract rows from an Excel file:


XLSperl -lane ‘XLSprint @F if($ROW <= 10)’
[Link] >[Link]

Excel on the Command Line [Link]


Future enhancements
• Run as command interpreter (#!/usr/bin/XLSperl)
• Implement ‘-M’ option to load additional Perl
modules
• In-place editing of Excel files (use the ‘-i’ option),
that will keep formatting of original document

XLSperl -[Link] -pe ‘s/old/new/’ [Link]

• Any ideas?

Excel on the Command Line [Link]


Fin!

Thank you for listening!

Source code is available from


[Link]

Excel on the Command Line [Link]

You might also like