Inspired by Actual Events: Database

Showing posts with label Database. Show all posts

Friday, November 24, 2017

Too Many PreparedStatement Placeholders in Oracle JDBC

There are multiple causes of the ORA-01745 ("invalid host/bind variable name error") error when using an Oracle database. The Oracle 9i documentation on errors ORA-01500 through ORA-02098 provides more details regarding ORA-01745. It states that the "Cause" is "A colon in a bind variable or INTO specification was followed by an inappropriate name, perhaps a reserved word." It also states that the "Action" is "Change the variable name and retry the operation." In the same Oracle 12g documentation, however, there is no description of "cause" or "action" for ORA-01745, presumably because there are multiple causes and multiple corresponding actions associated with this message. In this post, I will focus on one of the perhaps less obvious causes and the corresponding action for that cause.

Some of the common causes for ORA-01745 that I will NOT be focusing on in this post include using an Oracle database reserved name (reserved word) as an identifier, extraneous or missing colon or comma, or attempting to bind structure names (rather than variables) to the placeholders.

In addition to the causes just listed and likely in addition to other potential causes of ORA-01745, another situation that can cause the ORA-01745 error is using too many ? placeholders in a JDBC PreparedStatement with the Oracle database. I will demonstrate in this post that the number of ? placeholders in a PreparedStatement that cause this ORA-01745 is 65536 (2¹⁶).

I have blogged previously on the ORA-01795 error that occurs when one attempts to include more than 1000 values in an Oracle SQL IN condition. There are multiple ways to deal with this limitation and one of the alternative approaches might be to use multiple ORs to "OR" together more than 1000 values. This will typically be implemented with a PreparedStatement and with a ? placeholder placed in the SQL statement for each value being OR-ed. This PreparedStatement-based alternate approach employing ? placeholders will only work as long as the number of vales being OR-ed together is smaller than 65536.

The code listing that follows demonstrates how a SQL query against the Oracle HR schema can be generated to make it easy to reproduce the ORA-01745 error with too many ? placeholders (full code listing is available on GitHub).

Building Up Prepared Statement with Specified Number of ? Placeholders

/**
 * Constructs a query using '?' for placeholders and using
 * as many of these as specified with the int parameter.
 *
 * @param numberPlaceholders Number of placeholders ('?')
 *    to include in WHERE clause of constructed query.
 * @return SQL Query that has provided number of '?" placeholders.
 */
private String buildQuery(final int numberPlaceholders)
{
   final StringBuilder builder = new StringBuilder();
   builder.append("SELECT region_id FROM countries WHERE ");
   for (int count=0; count < numberPlaceholders-1; count++)
   {
      builder.append("region_id = ? OR ");
   }
   builder.append("region_id = ?");
   return builder.toString();
}

The next code listing demonstrates building a PreparedStatement based on the query constructed in the last code listing and setting its placeholders with a number of consecutive integers that match the number of ? placeholders.

Configuring PreparedStatement's ? Placeholders

/**
 * Execute the provided query and populate a PreparedStatement
 * wrapping this query with the number of integers provided
 * as the second method argument.
 * 
 * @param query Query to be executed.
 * @param numberValues Number of placeholders to be set in the
 *    instance of {@code PreparedStatement} used to execute the
 *    provided query.
 */
private void executeQuery(final String query, final int numberValues)
{
   try (final Connection connection = getDatabaseConnection();
        final PreparedStatement statement = connection.prepareStatement(query))
   {
      for (int count = 0; count < numberValues; count++)
      {
         statement.setInt(count+1, count+1);
      }
      final ResultSet rs = statement.executeQuery();
      while (rs.next())
      {
         out.println("Region ID: " + rs.getLong(1));
      }
   }
   catch (SQLException sqlException)
   {
      out.println("ERROR: Unable to execute query - " + sqlException);
   }
}

The next screen snapshot shows the ORA-01745 error occurring when the number of ? placeholders applied is 65536.

This example shows that there is a maximum number of ? placeholders that can be used in an Oracle SQL statement. Fortunately, there are other ways to accomplish this type of functionality that do not have this ORA-01475 limit of 65536 ? placeholders or the 1000 IN elements limit that causes an ORA-01795 error

Monday, November 7, 2016

Fixed-Point and Floating-Point: Two Things That Don't Go Well Together

One of the more challenging aspects of software development can be dealing with floating-point numbers. David Goldberg's 1991 Computing Surveys paper What Every Computer Scientist Should Know About Floating-Point Arithmetic is a recognized classic treatise on this subject. This paper not only provides an in-depth look at how floating-point arithmetic is implemented in most programming languages and computer systems, but also, through its length and detail, provides evidence of the nuances and difficulties of this subject. The nuances of dealing with floating-point numbers in Java and tactics to overcome these challenges are well documented in sources such as JavaWorld's Floating-point Arithmetic, IBM DeveloperWorks's Java's new math, Part 2: Floating-point numbers and Java theory and practice: Where's your point?, Dr. Dobb's Java's Floating-Point (Im)Precision and Fixed, Floating, and Exact Computation with Java's Bigdecimal, Java Glossary's Floating Point, Java Tutorial's Primitive Data Types, and NUM04-J. Do not use floating-point numbers if precise computation is required.

Most of the issues encountered and discussed in Java related to floating-point representation and arithmetic are caused by the inability to precisely represent (usually) decimal (base ten) floating point numbers with an underlying binary (base two) representation. In this post, I focus on similar consequences that can result from mixing fixed-point numbers (as stored in a database) with floating-point numbers (as represented in Java).

The Oracle database allows numeric columns of the NUMBER data type to be expressed with two integers that represent "precision" and "scale". The PostgreSQL implementation of the numeric data type is very similar. Both Oracle's NUMBER(p,s) and PostgreSQL's numeric(p,s) allow the same datatype to represent essentially an integral value (precision specified but scale not specified), a fixed-point number (precision and scale specified), or a floating-point number (neither precision nor scale specified). Simple Java/JDBC-based examples in this post will demonstrate this.

For the examples in this post, a simple table named DOUBLES in Oracle and doubles in PostgreSQL will be created. The DDL statements for defining these simple tables in the two database are shown next.

createOracleTable.sql

CREATE TABLE doubles
(
   int NUMBER(5),
   fixed NUMBER(3,2),
   floating NUMBER
);

createPgTable.sql

CREATE TABLE doubles
(
   int numeric(5),
   fixed numeric(3,2),
   floating numeric
);

With the DOUBLES table created in Oracle database and PostgreSQL database, I'll next use a simple JDBC PreparedStatement to insert the value of java.lang.Math.PI into each table for all three columns. The following Java code snippet demonstrates this insertion.

Inserting Math.PI into DOUBLES Columns

/** SQL syntax for insertion statement with placeholders. */
private static final String INSERT_STRING =
   "INSERT INTO doubles (int, floating, fixed) VALUES (?, ?, ?)";


final Connection connection = getDatabaseConnection(databaseVendor);
try (final PreparedStatement insert = connection.prepareStatement(INSERT_STRING))
{
   insert.setDouble(1, Math.PI);
   insert.setDouble(2, Math.PI);
   insert.setDouble(3, Math.PI);
   insert.execute();
}
catch (SQLException sqlEx)
{
   err.println("Unable to insert data - " + sqlEx);
}

Querying DOUBLES Columns

/** SQL syntax for querying statement. */
private static final String QUERY_STRING =
   "SELECT int, fixed, floating FROM doubles";

final Connection connection = getDatabaseConnection(databaseVendor);
try (final Statement query = connection.createStatement();
     final ResultSet rs = query.executeQuery(QUERY_STRING))
{
   out.println("\n\nResults for Database " + databaseVendor + ":\n");
   out.println("Math.PI :        " + Math.PI);
   while (rs.next())
   {
      final double integer = rs.getDouble(1);
      final double fixed = rs.getDouble(2);
      final double floating = rs.getDouble(3);
      out.println("Integer NUMBER:  " + integer);
      out.println("Fixed NUMBER:    " + fixed);
      out.println("Floating NUMBER: " + floating);
   }
   out.println("\n");
}
catch (SQLException sqlEx)
{
   err.println("Unable to query data - " + sqlEx);
}

The output of running the above Java insertion and querying code against the Oracle and PostgreSQL databases respectively is shown in the next two screen snapshots.

Comparing Math.PI to Oracle's NUMBER Columns

Comparing Math.PI to PostgreSQL's numeric Columns

The simple examples using Java and Oracle and PostgreSQL demonstrate issues that might arise when specifying precision and scale on the Oracle NUMBER and PostgreSQL numeric column types. Although there are situations when fixed-point numbers are desirable, it is important to recognize that Java does not have a fixed-point primitive data type and use BigDecimal or a fixed-point Java library (such as decimal4j or Java Math Fixed Point Library) to appropriately deal with the fixed-point numbers retrieved from database columns expressed as fixed points. In the examples demonstrated in this post, nothing is really "wrong", but it is important to recognize the distinction between fixed-point numbers in the database and floating-point numbers in Java because arithmetic that brings the two together may not have the results one would expect.

In Java and other programming languages, one needs to not only be concerned about the effect of arithmetic operations and available precision on the "correctness" of floating-point numbers. The developer also needs to be aware of how these numbers are stored in relational database columns in the Oracle and PostgreSQL databases to understand how precision and scale designations on those columns can affect the representation of the stored floating-point number. This is especially applicable if the representations queried from the database are to be used in floating-point calculations. This is another (of many) examples where it is important for the Java developer to understand the database schema being used.

Friday, March 4, 2016

SQL: Counting Groups of Rows Sharing Common Column Values

In this post, I focus on using simple SQL SELECT statements to count the number of rows in a table meeting a particular condition with the results grouped by a certain column of the table. These are all basic SQL concepts, but mixing them allows for different and useful representations of data stored in a relational database. The specific aspects of a SQL query covered in this post and illustrated with simple examples are the aggregate function count(), WHERE, GROUP BY, and HAVING. These will be used to build together a simple single SQL query that indicates the number of rows in a table that match different values for a given column in that table.

I'll need some simple SQL data to demonstrate. The following SQL code demonstrates creation of a table called ALBUMS in a PostgreSQL database followed by use of INSERT statements to populate that table.

createAndPopulateAlbums.sql

CREATE TABLE albums
(
   title text,
   artist text,
   year integer
);

INSERT INTO albums (title, artist, year)
   VALUES ('Back in Black', 'AC/DC', 1980);
INSERT INTO albums (title, artist, year)
   VALUES ('Slippery When Wet', 'Bon Jovi', 1986);
INSERT INTO albums (title, artist, year)
   VALUES ('Third Stage', 'Boston', 1986);
INSERT INTO albums (title, artist, year)
   VALUES ('Hysteria', 'Def Leppard', 1987);
INSERT INTO albums (title, artist, year)
   VALUES ('Some Great Reward', 'Depeche Mode', 1984);
INSERT INTO albums (title, artist, year)
   VALUES ('Violator', 'Depeche Mode', 1990);
INSERT INTO albums (title, artist, year)
   VALUES ('Brothers in Arms', 'Dire Straits', 1985);
INSERT INTO albums (title, artist, year)
   VALUES ('Rio', 'Duran Duran', 1982);
INSERT INTO albums (title, artist, year)
   VALUES ('Hotel California', 'Eagles', 1976);
INSERT INTO albums (title, artist, year)
   VALUES ('Rumours', 'Fleetwood Mac', 1977);
INSERT INTO albums (title, artist, year)
   VALUES ('Kick', 'INXS', 1987);
INSERT INTO albums (title, artist, year)
   VALUES ('Appetite for Destruction', 'Guns N'' Roses', 1987);
INSERT INTO albums (title, artist, year)
   VALUES ('Thriller', 'Michael Jackson', 1982);
INSERT INTO albums (title, artist, year)
   VALUES ('Welcome to the Real World', 'Mr. Mister', 1985);
INSERT INTO albums (title, artist, year)
   VALUES ('Never Mind', 'Nirvana', 1991);
INSERT INTO albums (title, artist, year)
   VALUES ('Please', 'Pet Shop Boys', 1986);
INSERT INTO albums (title, artist, year)
   VALUES ('The Dark Side of the Moon', 'Pink Floyd', 1973);
INSERT INTO albums (title, artist, year)
   VALUES ('Look Sharp!', 'Roxette', 1988);
INSERT INTO albums (title, artist, year)
   VALUES ('Songs from the Big Chair', 'Tears for Fears', 1985);
INSERT INTO albums (title, artist, year)
   VALUES ('Synchronicity', 'The Police', 1983);
INSERT INTO albums (title, artist, year)
   VALUES ('Into the Gap', 'Thompson Twins', 1984);
INSERT INTO albums (title, artist, year)
   VALUES ('The Joshua Tree', 'U2', 1987);
INSERT INTO albums (title, artist, year)
   VALUES ('1984', 'Van Halen', 1984);

The next two screen snapshots show the results of running this script in psql:

At this point, if I want to see how many albums were released in each year, I could use several individual SQL query statements like these:

SELECT count(1) FROM albums where year = 1985;
SELECT count(1) FROM albums where year = 1987;

It might be desirable to see how many albums were released in each year without needing an individual query for each year. This is where using an aggregate function like count() with a GROUP BY clause comes in handy. The next query is simple, but takes advantage of GROUP BY to display the count of each "group" of rows grouped by the albums' release years.

SELECT year, count(1)
  FROM albums
 GROUP BY year;

The WHERE clause can be used as normal to narrow the number of returned rows by specifying a narrowing condition. For example, the following query returns the albums that were released in a year after 1988.

SELECT year, count(1)
  FROM albums
 WHERE year > 1988
 GROUP BY year;

We might want to only return the years for which multiple albums (more than one) are in our table. A first naive approach might be as shown next (doesn't work as shown in the screen snapshot that follows):

-- Bad Code!: Don't do this.
SELECT year, count(1)
  FROM albums
 WHERE count(1) > 1
 GROUP BY year;

The last screen snapshot demonstrates that "aggregate functions are not allowed in WHERE." In other words, we cannot use the count() in the WHERE clause. This is where the HAVING clause is useful because HAVING narrows results in a similar manner as WHERE does, but is used with aggregate functions and GROUP BY.

The next SQL listing demonstrates using the HAVING clause to accomplish the earlier attempted task (listing years for which multiple album rows exist in the table):

SELECT year, count(1)
  FROM albums
 GROUP BY year
HAVING count(1) > 1;

Finally, I may want to order the results so that they are listed in increasing (later) years. Two of the SQL queries demonstrated earlier are shown here with ORDER BY added.

SELECT year, count(1)
  FROM albums
 GROUP BY year
 ORDER BY year;

SELECT year, count(1)
  FROM albums
 GROUP BY year
HAVING count(1) > 1
 ORDER BY year;

SQL has become a much richer language than when I first began working with it, but the basic SQL that has been available for numerous years remains effective and useful. Although the examples in this post have been demonstrated using PostgreSQL, these examples should work on most relational databases that implement ANSI SQL.

Tuesday, October 6, 2015

Single Quotes in Oracle Database Index Column Specification

In my previous post, I mentioned that a downside of using double quotes to explicitly specify case in Oracle identifiers is the potential for being confused with the use of single quotes for string literals. Although I don't personally think this is sufficient reason to avoid use of double quotes for identifiers in Oracle, it is worth being aware of this potential confusion. When to use single quotes versus when to use double quotes has been a source of confusion for users new to databases that distinguish between the two for some time. In this post, I look at an example of how accidental misuse of single quote where no quote is more appropriate can lead to the creation of an unnecessary index.

The SQL in the simple script createPersonTable.sql generates a table called PEOPLE and an index will be implicitly created for this table's primary key ID column. However, the script also contains an explicit index creation statement that, at first sight, might appear to also create an index on this primary key column.

createPersonTable.sql

CREATE TABLE people
(
   id number PRIMARY KEY,
   last_name varchar2(100),
   first_name varchar2(100)
);

CREATE INDEX people_pk_index ON people('id');

We might expect the statement that appears to explicitly create the primary key column index to fail because that column is already indexed. As the output below shows, it does not fail.

When a query is run against the indexes, it becomes apparent why the explicit index creation did not fail. It did not fail because it was not creating another index on the same column. The single quotes around what appears to be the "id" column name actually make that 'id' a string literal rather than a column name and the index that is created is a function-based index rather than a column index. This is shown in the query contained in the next screen snapshot.

The index with name PEOPLE_PK_INDEX was the one explicitly created in the script and is a function-based index. The implicitly created primary key column index has a system-generated name. In this example, the function-based index is a useless index that provides no value.

It's interesting to see what happens when I attempt to explicitly create the index on the column by using double quotes with "id" and "ID". The first, "id", fails ("invalid identifier") because Oracle case folds the name 'id' in the table creation to uppercase 'ID' implicitly. The second, "ID", fails ("such column list already indexed") because, in this attempt, I finally am trying to create an index on the same column for which an index was already implicitly created.

In my original example, the passing of a literal string as the "column" to the index creation statement resulted in it being created as a useless function-based index. It could have been worse if my intended primary key column index hadn't already been implicitly created because then I might not have the index I thought I had. This, of course, could happen when creating an index for a column or list of columns that won't have indexes created for them implicitly. There is no error message to warn us that the single-quoted string is being treated as a string literal rather than as a column name.

Conclusion

The general rule of thumb to remember when working with quotation marks in Oracle database is that double quotes are for identifiers (such as column names and table names) and single quotes are for string literals. As this post has demonstrated, there are times when one may be misused in place of the other and lead to unexpected results without necessarily displaying an error message.

Monday, October 5, 2015

Downsides of Mixed Identifiers When Porting Between Oracle and PostgreSQL Databases

Both the Oracle database and the PostgreSQL database use the presence or absence of double quotes to indicate case sensitive or case insensitive identifiers. Each of these databases allows identifiers to be named without quotes (generally case insensitive) or with double quotes (case sensitive). This blog post discusses some of the potential negative consequences of mixing quoted (or delimited) identifiers and case-insenstive identifiers in an Oracle or PostgreSQL database and then trying to port SQL to the other database.

Advantages of Case-Sensitive Quoted/Delimiter Identifiers

There are multiple advantages of case sensitive identifiers. Some of the advertised (real and perceived) benefits of case sensitive database identifiers include:

Ability to use reserved words, key words, and special symbols not available to identifiers without quotes.
- PostgreSQL's keywords:
  - reserved ("only real key words" that "are never allowed as identifiers")
  - unreserved ("special meaning in particular contexts," but "can be used as identifiers in other contexts").
  - "Quoted identifiers can contain any character, except the character with code zero. (To include a double quote, write two double quotes.) This allows constructing table or column names that would otherwise not be possible, such as ones containing spaces or ampersands."
- Oracle reserved words and keywords:
  - Oracle SQL Reserved Words that can only be used as "quoted identifiers, although this is not recommended."
  - Oracle SQL Keywords "are not reserved," but using these keywords as names can lead to "SQL statements [that] may be more difficult to read and may lead to unpredictable results."
  - "Nonquoted identifiers must begin with an alphabetic character from your database character set. Quoted identifiers can begin with any character."
  - "Quoted identifiers can contain any characters and punctuations marks as well as spaces."
Ability to use the same characters for two different identifiers with case being the differentiation feature.
Avoid dependency on a database's implementation's case assumptions and provide "one universal version."
Explicit case specification avoids issues with case assumptions that might be changeable in some databases such as SQL Server.
Consistency with most programming languages and operating systems' file systems.
Specified in SQL specification and explicitly spells out case of identifiers rather than relying on specific implementation details (case folding) of particular database.
Additional protection in cases where external users are allowed to specify SQL that is to be interpreted as identifiers.

Advantages of Case-Insensitive Identifiers

There are also advantages associated with use of case-insensitive identifiers. It can be argued that case-insensitive identifiers are the "default" in Oracle database and PostgreSQL database because one must use quotes to specify when this default case-insensitivity is not the case.

Case-insensitivity is the "default" in Oracle and PostgreSQL databases.
The best case for readability can be used in any particular context. For example, allows DML and DDL statements to be written to a particular coding convention and then be automatically mapped to the appropriate case folding for various databases.
Avoids errors introduced by developers who are unaware of or unwilling to follow case conventions.
Double quotes (" ") are very different from single quotes (' ') in at least some contexts in both the Oracle and PostgreSQL databases and not using case-sensitive identifier double quotes eliminates need to remember the difference or worry about the next developer not remembering the difference.
- Oracle database single quotes: "Text, character, and string literals are always surrounded by single quotation marks."
- Oracle database double quotes: "A quoted identifier begins and ends with double quotation marks (")."
- PostgreSQL single quotes: Use single quotes in PostgreSQL for "literals"/"string values".
- PostgreSQL double quotes: "delimited identifier or quoted identifier ... is formed by enclosing an arbitrary sequence of characters in double-quotes (")."
Many of the above listed "advantages" may not really be good practices:
- Using reserved words and keywords as identifiers is probably not good for readability anyway.
- Using symbols allowed in quoted identifiers that are not allowed in unquoted identifiers may not be necessary or even desirable.
- Having two different variables of the same name with just different characters cases is probably not a good idea.

Default Case-Insensitive or Quoted Case-Sensitive Identifiers?

In Don’t use double quotes in PostgreSQL, Reuven Lerner makes a case for using PostgreSQL's "default" (no double quotes) case-insensitive identifiers. Lerner also points out that pgAdmin implicitly creates double-quoted case-sensitive identifiers. From an Oracle DBA perspective, @MBigglesworth79 calls quoted identifiers in Oracle an Oracle Gotcha and concludes, "My personal recommendation would be against the use of quoted identifiers as they appear to cause more problems and confusion than they are worth."

A key trade-off to be considered when debating quoted case-sensitive identifiers versus default case-insensitive identifiers is one of being able to (but also required to) explicitly specify identifiers' case versus not being able to (but not having to) specify case of characters used in the identifiers.

Choose One or the Other: Don't Mix Them!

It has been my experience that the worst choice one can make when designing database constructs is to mix case-sensitive and case-insensitive identifiers. Mixing of these make it difficult for developers to know when case matters and when it doesn't, but developers must be aware of the differences in order to use them appropriately. Mixing identifiers with implicit case and explicit case definitely violates the Principle of Least Surprise and will almost certainly result in a frustrating runtime bug.

Another factor to consider in this discussion is case folding choices implemented in Oracle database and PostgreSQL database. This case folding can cause unintentional consequences, especially when porting between two databases with different case folding assumptions. The PostgreSQL database folds to lowercase characters (non-standard) while the Oracle database folds to uppercase characters. This significance of this difference is exemplified in one of the first PostgreSQL Wiki "Oracle Compatibility Tasks": "Quoted identifiers, upper vs. lower case folding." Indeed, while I have found PostgreSQL to be heavily focused on being standards-compliant, this case folding behavior is one place that is very non-standard and cannot be easily changed.

About the only "safe" strategy to mix case-sensitive and case-insensitive identifiers in the same database is to know that particular database's default case folding strategy and to name even explicitly named (double quoted) identifiers with exactly the same case as the database will case fold non-quoted identifiers. For example, in PostgreSQL, one could name all identifiers in quotes with completely lowercase characters because PostgreSQL will default unquoted identifiers to all lowercase characters. However, when using Oracle, the opposite approach would be needed: all quoted identifiers should be all uppercase to allow case-sensitive and case-insensitive identifiers to be intermixed. Problems will arise, of course, when one attempts to port from one of these databases to the other because the assumption of lowercase or uppercase changes. The better approach, then, for database portability between Oracle and PostgreSQL databases is to commit either to using quoted case-sensitive identifiers everywhere (they are then explicitly named the same for both databases) or to use default case-insensitive identifiers everywhere (and each database will appropriately case fold appropriately in its own approach).

Conclusion

There are advantages to both identifiers with implicit case (case insensitive) and to identifiers with explicit (quoted and case sensitive) case in both Oracle database and PostgreSQL database with room for personal preferences and tastes to influence any decision on which approach to use. Although I prefer (at least at the time of this writing) to use the implicit (default) case-insensitive approach, I would rather use the explicitly spelled-out (with double quotes) identifier cases in all cases than mix the approach and use explicit case specification for identifiers in some cases and implicit specification of case of identifiers in other cases. Mixing the approaches makes it difficult to know which is being used in each table and column in the database and makes it more difficult to port the SQL code between databases such as PostgreSQL and Oracle that make different assumptions regarding case folding.

Additional Reading

Database identifiers, quoting and case sensitivity
Oracle Database 11g (11.1.1) Database SQL Language Reference: Schema Object Names and Qualifiers
PostgreSQL 9.4.4 Documentation: Identifiers and Keywords
Are there benefits to a case sensitive database?
Reason why oracle is case sensitive?

Monday, May 19, 2014

Hello Cassandra

A colleague recently told me about several benefits of Cassandra and I decided to try it out. Apache Cassandra is described in A Quick Introduction to Apache Cassandra as "one of today’s most popular NoSQL-databases." The main page for Apache Cassandra states that the "Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance." Cassandra is being used by companies such as eBay, Netflix, Adobe, Reddit, Instagram, and Twitter. This post is a summary of steps for getting started with Cassandra.

Apache Cassandra can be downloaded from the main Apache Cassandra web page. The Download page states that "the latest stable release of Apache Cassandra is 2.0.7 (released on 2014-04-18)" and this is the version I will discuss and use in this post.

For this post, I downloaded and installed the DataStax Community Edition from Planet Cassandra Downloads. The DataStax Community 2.0.7 edition includes "The Most Stable and Recommended Version of Apache Cassandra for Production (2.0.7)." There are DataStax Community Edition downloads available for Mac OS X, Microsoft Windows, and several flavors of Linux.

The next screen snapshot shows the directory listing for the "bin" directory of the Apache Cassandra included with the DataStax Community Edition installation.

From that "bin" directory, the Cassandra server can be started simply by running the appropriate executable. In the case of this single Windows machine, that command is cassandra.bat and this step is illustrated in the next screen snapshot.

The interactive command-line tool cqlsh is also located in the Apache Cassandra "bin" subdirectory. This tool is similar to SQL*Plus for Oracle databases, mysql for MySQL databases, and psql for PostgreSQL. It allows one to enter various CQL (Cassandra Query Language) statements such as inserting new data and querying data. Starting cqlsh from the command line on a Windows machine is shown in the next screen snapshot.

There are several useful observations that can be made from the previous image. As the output of from starting cqlsh shows, this version of Apache Cassandra is 2.0.7, this version of cqlsh is 4.1.1, and the relevant CQL specification is 3.1.1. The immediately previous screen snapshot also demonstrates help provided by running the "HELP" command. We can see that there are several "documented shell commands" as well as even more "CQL help topics."

The previous screen snapshot demonstrated that the "help" command in cqlsh lists individual topics on which the help command can be specifically run. For example, the next screen snapshot demonstrates the output from running "help types" in cqlsh.

In this screen snapshot, we see CQL data types that are supported in cqlsh such as ascii, text/varchar, decimal, int, double, timestamp, list, set, and map.

Keyspaces in Cassandra

Keyspaces are significant in Cassandra. Although this post covers Cassandra 2.0, the Cassandra 1.0 documentation nicely explains keyspaces in Cassandra: "In Cassandra, the keyspace is the container for your application data, similar to a schema in a relational database. Keyspaces are used to group column families together. Typically, a cluster has one keyspace per application." This documentation goes on to explain that keyspaces are typically used to group column families by replication strategy. The next screenshot demonstrates creation of a keyspace in cqlsh and listing the available keyspaces.

The last screen snapshot included an example of using the command SELECT * FROM system.schema_keyspaces; to see the available keyspaces. When one just wants a list of the names of the available keyspaces without all of the other details, it is easy to use desc keyspaces as shown in the next screens snapshot.

Creating a Column Family ("Table")

With a keyspace created, a column family (or table) can be created. The next screen snapshot demonstrates using the newly created movies_keyspace with the use movies_keyspace; statement and then shows using the cqlsh command SOURCE (similar to using @ in SQL*Plus) to run an external file to create a table (column family). The screen snapshot demonstrates listing available tables with the desc tables command and listing specific details of a given table (MOVIES in this case) with the desc table movies command.

The above screen snapshot demonstrated running an external file called createMovie.cql using the SOURCE command. The code listing for the createMovie.cql file is shown next.

CREATE TABLE movies
(
   title varchar,
   year int,
   description varchar,
   PRIMARY KEY (title, year)
);

Inserting Data into and Querying from the Column Family

The next screen snapshot demonstrates how to insert data into the newly created column family/table [insert into movies_keyspace.movies (title, year, description) values ('Raiders of the Lost Ark', 1981, 'First of Indiana Jones movies.');]. The image also shows how to query the column family/table to see its contents [select * from movies].

Cassandra is NOT a Relational Database

Looking at the Cassandra Query Language (CQL) statements just shown might lead someone to believe that Cassandra is a relational database. However, CQL is a relational-like feature added to Cassandra 2.0 intended to help people with SQL expertise more readily adopt Cassandra. Similarly, triggers are being added in 2.0/2.1. Despite the presence of these Cassandra features intended to make it easier for relational database users to adopt Cassandra, there are significant differences between Cassandra and a relational database.

The Cassandra Data Model is a page in the Apache Cassandra 1.0 Documentation that describes some key differences between Cassandra and relational databases. These include:

"Cassandra does not enforce relationships between column families the way that relational databases do between tables"
- There are no foreign keys in Cassandra and there is no "joining" in Cassandra.
- Denormalization is not a shameful thing in Cassandra and is actually welcomed to a certain degree.
Cassandra "table" (column family) modeling should be done based on expected queries to be used.

Conclusion

I've just begun to get my feet wet with Cassandra but look forward to learning more about it. This post has focused on some basics of acquiring and starting to use Cassandra. There is much to learn about Cassandra and some "deeper" topics that really need to be understood to truly appreciate Cassandra include Cassandra architecture (and here), Cassandra Data Modeling (and here), and Cassandra's strengths and weaknesses.

Tuesday, December 31, 2013

Significant Software Development Developments of 2013

At the end of each calendar year, I like to summarize some of the most significant developments in the software development industry that happened during the year that is ending. The choice of these is entirely subjective and obviously colored by my own experience, background, perceptions, and preferences. Not worrying about the opinionated content of such a post, I now present the developments in software development that I consider most significant in 2013.

10. Gradle

Gradle appeared to me to enter the mainstream consciousness of software developers in a big way in 2013. I have been watching Gradle's development and playing with it a bit for some time now, but I have noticed that numerous open source projects now mention it prominently, it's referenced in many recently published Java books that aren't even about Gradle specifically, and Google selected Gradle to be delivered with its Android Studio product announced at Google I/O 2013. It took a while for Maven to breakthrough and compete with Ant and I think 2013 is seeing the beginning of Gradle's breakthrough to challenge Maven and Ant for predominance in Java-based build systems. Three books devoted to Gradle (the short Gradle: Beyond the Basics, the more comprehensive Gradle in Action, and the German Gradle: Modernes Build-Management - Grundlagen und Praxiseinsatz) have listed 2013 publication dates.

Gradle's rapidly rising popularity is nearly matched by its torrid rate of new releases. Gradle 1.4 ("faster builds that use less heap space"), Gradle 1.5 ("optimizations to dependency resolution"), Gradle 1.6 (improved Task ordering, "JaCoCo plugin for test coverage," and support for JUnit test categories), Gradle 1.7 ("fastest Gradle ever"), Gradle 1.8 (performance improvements and more native languages support), Gradle 1.9 (bug fixes and HTML dependency report), and Gradle 1.10 ("command-line usability features") were all released in 2013.

Gradle's success does not surprise me. It's Groovy foundation alone offers numerous advantages: Groovy scripts are easier to write procedural build-style code than is XML, Groovy has numerous syntactic shortcuts, Groovy is easily learned and applied by Java developers, Groovy has full JVM access, and Groovy includes built-in Ant support. On top of its inherent advantages from being built on Groovy, Gradle builds many useful and attractive features of its own on top of that Groovy foundation. Gradle adheres to several Ant and Maven conventions and supports Ivy and Maven repositories, making it straightforward to move from Maven or Ant+Ivy to Gradle.

Ruby on Rails helped bring Ruby into mainstream international prominence and, to a lesser degree, Grails helped do the same thing for Groovy. Gradle has the potential to pick up where Grails left off and push Groovy even further into the spotlight.

9. Trend Toward Single Language Development

For several years now, there has been a definite trend toward polyglot programming (polyglot persistence was even on last year's version of this post). Although this trend is likely to continue because some languages are better for scripting than others, some languages are better suited for web development than others, some languages are better suited for desktop development than others, some languages are better suited for realtime and embedded device development than others, some languages are better suited for scientific computing than others, and so on. However, I have seen indications of the pendulum swinging back at least a bit recently.

One of the arguments in favor of Node.js is the ability for JavaScript developers to use the same language on the "front-end" of a web application as on the "back-end." In the post Top Things I learned about development in 2013, Antonin Januska writes, "Working back to front in the same language is awesome." This is, of course, the reason that Java developers have in some cases been resistant to using JavaScript, Flex, or JavaFX Script (now deprecated) for front-ends of their Java applications (and why tools like Google Web Toolkit have been so popular). Java developers who write desktop applications (yes Virginia, desktop applications do exist) often experience the advantages of using the same language front-to-back as well.

One of Ceylon's most promising possibilities is the ability to write code in Ceylon that works on both Java Virtual Machines and JavaScript virtual machines and so could be used front-to-back in a Ceylon-based application. Indeed, the Ceylon page advertises, "[Ceylon] runs on both Java and JavaScript virtual machines, bridging the gap between client and server." Languages commonly associated with the Java Virtual Machine such as Scala and Kotlin also are offering the ability to compile to JavaScript.

A final example of the trend back to a single language in many environments is the use of Python in scientific computing as covered in the post The homogenization of scientific computing, or why Python is steadily eating other languages’ lunch.

I'm not arguing that this trend means that there will only be one main programming language in the future. However, I do believe it is generally human nature to want to use the same language or approach as much as possible because it's what we're familiar with and using the same language helps us to leverage our experience more broadly in the same application. The trend seems to be for each of the major languages (C/C++, Java, JavaScript, Python, .NET languages, etc.) to continue building up their own "stack" to support end-to-end functionality within that language and its ecosystem of frameworks, libraries, and tools. It's no coincidence that once a new language starts to see significant adoption, it quickly begins to see development of a variety of frameworks, libraries, and toolkits that extend the reach of that language.

I also don't want to imply that this is the end of polyglot programming. I don't see any programming language that is the best fit in all cases and there is no doubt that the most valuable developers will be those familiar with more than one programming language.

8. Internet of Things

I first heard about the concept of the Internet of Things at JavaOne 2012 and it got even more attention at JavaOne 2013. Oracle is not the only company touting the Internet of Things. IBM is into the Internet of Things as are many others.

Some believe that 2014 will be the year of the Internet of Things. Tori Wieldt has called "Java and the Internet of Things" one of the "Top Java Stories of 2013." In 2013, two series in Hinkmond Wong's Weblog have focused on the Internet of Things with featured posts on a Thanksgiving Turkey Tweeter and on a Christmas Santa Detector.

Other useful articles with differing opinions on the Internet of Things include The Internet of Things: a tangled web, Here's the Scariest Part of the Internet of Things, The Internet Of Things Will Be Huge—Just Not As Huge As The Hype, Five Challenges For The Internet of Things Ecosystem, The Internet of things will not arrive in 2014, CES 2013: The Break-Out Year For The Internet Of Things, and Here's Why 'The Internet Of Things' Will Be Huge, And Drive Tremendous Value For People And Businesses.

On a lighter (or more dire, depending on your perspective) note related to The Internet of Things, Aaron Pressman writes, "The whole crazy 'Internet of Things' movement to put everything under network control seems tailor made for Hal" (2001: A Space Odyssey).

7. Mobile Development

If someone not familiar with our industry was to start reading our software development social media sites and forums, that person would likely come to the conclusion that the vast majority of software development today is development of mobile applications. I have long argued that blog posts and articles often skew toward more leading-edge topics than established topics for a variety of reasons (perception/reality that established topics are already well-covered, resume building, fun to play with and write about new things, etc.). That being stated, there is no question that mobile development is popular in reality and not just in perception. There is no question that a big part of HTML5's popularity and rapid adoption is the opportunity to write applications in one set of languages (HTML/JavaScript/CSS) that will run on multiple mobile devices. Numerous projects and tools are being released to allow for writing applications in one language and compiling them to native formats for various mobile devices.

6. Responsive Design

At the end of 2012, Pete Cashmore predicted that 2013 would be the "Year of Responsive Web Design" because of its "obvious benefits": "You build a website once, and it works seamlessly across thousands of different screens." I like Jordan Larkin's explanation of responsive web design:

The term "responsive web design" (or responsive mobile design) refers to websites that change and adapt their appearance for optimum viewing on all screen sizes, tablets, smartphones, ipods, kindles along with desktop and laptop computer screens. Occasionally, in the digital arts industry, it is called "fluid design", "adaptive website design" or "RWD". Unresponsive websites do not change to fit different screen sizes, which means they can be difficult to navigate and look at on smaller devices.

As a person who is using a smartphone for an increasing percentage of my daily online activities, but still uses the laptop frequently and the desktop occasionally, I am appreciating firsthand the web site authors whose web sites and pages work well on all of these devices. It's often satisfactory to have two different web sites (one for mobile devices and one for everything else) from a consumer point of view, but this obviously means essentially duplicate code for the site developers. Even from a consumer point of view, there are times when I find the mobile version of a site lacking in features and in those cases it'd be preferable to have the regular site on all devices as long as that regular site appeared nicely on all devices.

The highly informative web site A List Apart has a nice set of articles related to responsive web design.

5. Node.js

JavaScript, despite its flaws, has dominated the web browser for years. Although JavaScript on the server has been available for some time (such as with Rhino and more recently Nashorn in Java), Node.js seems to be doing for JavaScript on the server what Ruby on Rails did for Ruby: the framework is popularizing the language (or in this case, popularizing the language specifically on the server).

2013 has been a big year for Node.js. There are numerous blogs and articles written on it on seemingly a daily basis. Some of these articles include What You Need To Know About Node.js and Node.js keeps stealing Rails' thunder.

Several books on Node.js have been published in 2013. These include Node.js in Action, Learning Node.js: A Hands-On Guide to Building Web Applications in JavaScript, Node.js the Right Way: Practical, Server-Side JavaScript That Scales, Pro Node.js for Developers, Node.js Recipes: A Problem-Solution Approach, Mastering Node.js, Using Node.js for UI Testing, JavaScript on the Server Using Node.js and Express, and the final version of The Node Beginner Book.

4. Big Data

Big Data holds the same #4 spot on my list as it did last year. Apache Hadoop and the R Project are just two examples of popular products/languages riding the Big Data wave. Python too, is increasingly being chosen as the programming language of choice for working with big data sets.

Readers of java.net recently answered a survey regarding Big Data in which the closest thing to a consensus seemed to be that "Big Data Is Probably Significant, but not too Surprising."

3. HTML5

HTML5 saw initial hype, disappointed for a while, and seems to be back on its rapid rise in popularity. I don't call out JavaScript individually in this post, but group it with HTML and CSS as part of HTML5 (and its also grouped with Node.js in this post). Given that HTML5, for purposes of this blog post, represents all of these things, it easily makes my top three significant software development developments in 2013. As mentioned previously with regard to mobile development, HTML5 is a popular approach for generating applications once that can theoretically run on any mobile device.

HTML5 features are seeing increasing standardization in terms of implementations in major browser. JavaScript/HTML libraries such as Angular.js and Ember.js are building on the momentum that jQuery has brought to HTML5 development in recent years.

HTML5's success is even evident in languages not considered part of HTML5 themselves. For example, one of the most heavily advertised new features of Java EE 7 is its HTML5 support. Recent versions of NetBeans IDE (considered primarily a Java IDE despite its multiple language support) have also seemed to emphasize HTML5 among their most important new features in 2013.

2. Security

As more information is online and we depend increasingly on availability of our data online, security continues to be an important issue for software developers. The trend of highly visibility security incidents continued in 2013. These incidents affected Ruby on Rails, Java, and other languages. The increasing frequency of security patches led Oracle to change how it labels the versions of Java SE.

An early 2013 article, Safeguard your code: 17 security tips for developers, outlines tips developers can take to improve the security of their applications. An earlier article in 2013 spelled out the increasing security concerns companies face. The book Java Coding Guidelines: 75 Recommendations for Reliable and Secure Programs has also been published in 2013. The 10 Biggest Security Stories Of 2013 outlines some of the biggest security-related stories of 2013.

1. Technical Dysfunction

Sadly, from a software development perspective, 2013 may be most remembered for the high profile technical glitches that occurred. Words like "debacle," "disaster," and "meltdown" have been associated with these issues and, rightly or wrongly, have reflected poorly on our industry. The most high profile dysfunction has been the embarrassing United States healthcare site healthcare.org. However, the issues that affect reputations and customer confidence have not been limited to government. Wal-Mart and Target, two major retailers in the United States, have had notable web site issues in the latter part of 2013 as well. Cloud-impacting technical dysfunction has occurred in 2013 in several notable cases including Amazon Web Services (AWS) and Google (including the search engine).

There seems to be plenty of blame to go around, but it seems difficult to get a good read on exactly what has caused these high profile technical failures. With healthcare.org, for example, I've seen people blame all types of different culprits including not allotting enough time to the effort, not being agile enough, being too agile, failing for despite agile approaches, failing to estimate user load correctly, getting government involved, etc. Although the real reasons are probably multiple and varied in nature and probably interest software developers more than others, the perception of our industry has gotten dinged up in 2013.

Honorable Mention

Although the developments in software development listed below did not make my top ten, they are significant enough to me to make this "Honorable Mention" category (in no particular or implied order).

JSON

One of the benefits of XML many years now has been the ubiquity of XML support across different languages, tools, frameworks, and libraries. For example, I recently wrote about how easy it is to use Groovy to search Subversion logs because Subversion makes its log output available in XML format and Groovy knows XML well.

JSON has been very popular with developers for some time now, but there have been many cases where standard libraries and tools that supported XML did not support JSON, meaning that developers had to write custom writing/reading code for JSON when using those libraries and tools. I'm beginning to see a lot more JSON support with tools and libraries now. Programming languages are also providing nice JSON parsing/writing capabilities. For example, Groovy has had JSON support for some time and Java EE 7 (JAX-RS 2.0) includes JSON support via the Java API for JSON.

JSON has been prominent enough in 2013 to warrant being included in titles of two Packt Publishing books published in 2013: JavaScript and JSON Essentials and Developing RESTful Services with JAX-RS 2.0, WebSockets, and JSON.

Java EE 7 Released

Java EE 7 was officially released in 2013. In a testament to Java EE's current widespread use and expected potential use of Java EE 7, book publishers have already published several books on Java EE 7 including Java EE 7 First Look, Java EE 7 Essentials, Beginning Java EE 7, Java EE 7 Recipes: A Problem-Solution Approach, Introducing Java EE 7: A Look at What's New, and Java EE 7 Developer Handbook.

Although I've never embraced JavaServer Faces (JSF), the feature of Java EE 7 that has been most interesting to me is the support for Faces Flows. I first read about this feature when reviewing Java EE 7 First Look and Neil Griffin's post Three Cheers for JSF 2.2 Faces Flows have reinforced my interest in this feature. In the post A Realistic JSF 2.2 Faces Flows Example, Reza Rahman supports my opinion that this is a key feature in Java EE 7 to watch with the quote, "Faces Flows are one of the best hidden gems in Java EE 7." Michael and Faces Flows might persuade me to give JavaServer Faces another look.

A recent blog post shows integration of AngularJS with Java EE 7.

Single Page Applications

The advantage of web applications over desktop applications has always been significant easier deployment of web applications than of desktop applications. The cost, however, has been a less fluid experience and sometimes less performing application than could be provided on the desktop. The concept of single-page applications is to make web (and by extension mobile applications that use traditional web technologies) feel and behave more like a "single-page" desktop application. Newer JavaScript libraries such as Meteor are being designed for the "thicker client" style of single-page applications.

The Wikipedia page on Single Page Application lists some technical approaches to implementing this concept. The Manning book Single Page Web Applications was also released in 2013. It's subtitle is "JavaScript end-to-end" (another piece of evidence of the general movement toward a single language).

See the description of Meteor below for another nice explanation of how web development is moving toward what is essentially this concept of single-page applications.

AngularJS

It seems like one cannot read any software development social media sites without running across mention of AngularJS. Although its Google roots are undoubtedly part of its success, AngularJS enjoys success from cleanly addressing significant needs in HTML/JavaScript development (shifting appropriate dynamic functionality from pure JavaScript to HTML with clever binding). In his post 10 Reasons Why You Should Use AngularJS, Dmitri Lau states that "Angular is the only framework that doesn’t make MVC seem like putting lipstick on a pig." Jesus Rodriguez, in his post Why Does Angular.js Rock?, writes that AngularJS "excels in the creation of single-page-applications or even for adding some 'magic' to our classic web applications." K. Scott Allen writes in his post Why Use AngularJS?, "I like Angular because I have fun writing code on top of the framework, and the framework doesn't get in the way."

Ember.js

Ember.js is another JavaScript library seeing significant online coverage in 2013. Ember 1.0 was released on 31 August 2013 and Ember 1.2.0 was released on 4 December 2013.

Like AngularJS and Knockout, Ember.js's approach is to embrace HTML and CSS rather than trying to abstract them away.

Famo.us

The famo.us web page currently requires one to "sign up for the beta" before being able to "experience famo.us." It's subtitle is "a JavaScript engine and framework that solve HTML5 performance." Another famo.us page states, "famo.us is a front end framework that solves performance for HTML5 apps" and "works for phones, tablets, computers and television."

Famo.us is discussed in two late 2013 InfoWorld posts: Did these guys just reinvent the Web? and Fast and flashy: Famo.us JavaScript framework revealed.

At this point, famo.us is still in beta, but it could be big in 2014 if it is able to deliver on what is advertised in 2013.

Meteor

Meteor is described on its main page as "an open source platform" for writing "an entire app in pure JavaScript" and using the "same APIs ... on the client and the server." In the Paul Krill interview Meteor aims to make JavaScript programming fun again, Matt DeBergalis stated that Meteor was created to address the changing web development paradigm often referred to as single-page application:

There is a shift in the Web application platform, and specifically, people are starting to write a new kind of application, what we call a thick client, where most of the code is actually running inside the Web browser itself rather than in a data center. This is an architectural change from running the software in the data center and sending HTML on the wire to a model where we have data on the wire and the actual rendering, the displaying of the user interface, is happening on the client. ... That's why it feels more interactive. It's not page-based like the old Web. It's much more engaging."

MEAN Stack

Having a witty acronym helped advertise and communicate the LAMP stack (Linux, Apache HTTP Server, MySQL/MariaDB, PHP/Perl/Python) and arguably contributed to the adoption of this combination of technologies. With this in mind, I found Valeri Karpov's post The MEAN Stack: MongoDB, ExpressJS, AngularJS and Node.js interesting. The post's author is another who points out the productivity advantages that can be gained from using a single language throughout an application. There is already a book underway: Getting MEAN with Mongo, Express, Angular, and Node. It will be interesting to watch this newly minted terminology and see if the stack and its name come close to the success that the LAMP stack and its name have enjoyed.

Commercial Support Dropped for GlassFish 4

Although it took longer to happen than most people probably anticipated, Oracle's dropping of commercial support for GlassFish 4 was formally announced in 2013 and is what most of us expected when we heard of Oracle purchasing Sun. The future of GlassFish is certainly cloudier now and expectations for GlassFish's future range from it being essentially dead to it thriving as the reference implementation.

Java IDEs

The major Java IDEs continued to add features to their already impressive feature sets in 2013. NetBeans had two major releases in 2013 with 7.3 released in February and 7.4 released in October. These two NetBeans releases added features such as Java EE 7 support, Project Easel for HTML5 development, Groovy 2.0 integration, JSON support, support for new JavaScript libraries (including Angular.js), native Java application packaging, Cordova integration, and improved support for non-JVM languages C/C++ and PHP.

IntelliJ IDEA 13 was released earlier this month. The release announcement highlights support for Java EE 7, improved Spring Framework integration, improved Android support thanks to IntelliJ IDEA Community Edition being used as the basis for Android Studio, improved database handling, and "refined Gradle support." Eclipse is often the IDE chosen for building a more specialized IDE such as Spring IDE (Spring Tool Suite), Scala IDE, or the new Ceylon IDE, so it's a particularly big deal that Google chose IntelliJ IDEA as the basis of its Android Studio.

Speaking of Eclipse, the seemingly most used Java-based IDE (especially when you consider the IDEs derived from it or based on it) also saw new releases in 2013. Eclipse 4.3 (Kepler) was released in 2013. There were also numerous popular plugins for Eclipse released in 2013.

Visual Studio 2013

Sun Microsystems was not the only company that saw desirable advantages and benefits from a single language that could be used at all layers of an application. Microsoft has implemented various efforts (Silverlight) for years to do the same thing. In 2013, Visual Studio 2013 was released with significant enhancements. These enhancements included improved support for languages not specific to Microsoft's .NET framework. Many of these better supported languages are on my list in this post: JavaScript, HTML, CSS, and Python.

Groovy

Groovy's 2.0 release (including static compilation support) made 2012 a big year for Groovy. Although 2013 did not see as significant of enhancements in the Groovy language, the year did start out with the announcement of Groovy 2.1. Perhaps the biggest part of that 2.1 release was Groovy's full incorporation of Java SE 7's invokedynamic, a major Java SE enhancement intended for non-Java languages like Groovy.

Groovy 2.2's release was announced toward the end of 2013. This release improved Groovy's invokedynamic support by adding OSGi manifests to the Groovy's invokedynamic-based JARs.

In The Groovy Conundrum, Andrew Binstock writes that "with the performance issues behind it, Groovy is a language primed for widespread use," but warns that Groovy is a language that is "easy to learn, but hard to master."

As is mentioned more than once in this post, Groovy has had a lot of additional exposure in 2013 thanks to Gradle's rapidly rising popularity. I believe that Gradle will continue to introduce Groovy to developers not familiar with Groovy or will motivate developers who have not looked at Groovy for some time to look at it again.

Scala

It seems to me that Scala continues to gain popularity among Java developers. I continue to see Scala enthusiasts gushing about Scala on various Java and JVM blog comments and forums. One piece of evidence of Scala's continuing and increasing popularity is the number of new books published in 2013 with Scala in their title. These include Scala in Action, Scala Cookbook: Recipes for Object-Oriented and Functional Programming, Functional Programming Patterns in Scala and Clojure: Write Lean Programs for the JVM, Scala Design Patterns: Patterns for Practical Reuse and Design, Play for Scala, Scala Object-Oriented Programming, and Getting Started with SBT for Scala.

For a much better background on what made 2013 a big year for Scala, see Jan Machacek's This Year in Scala (2013).

Ceylon

November 2013 saw "the first production release of the Ceylon language specification, compiler, and IDE." This announcement, available online at Ceylon 1.0.0 is now available, also states, "Ceylon 1.0 is a modern, modular, statically typed programming language for the Java and JavaScript virtual machines." Ceylon offers an Elipse-based IDE and has a formal specification. One of the factors favoring a successful future for Ceylon is its Red Hat sponsorship.

Kotlin

Kotlin is another language that compiles to the Java Virtual Machine or to a JavaScript virtual machine. It also has a strong sponsor in the world of Java in JetBrains, the company behind IntelliJ IDEA. 2013 saw several new releases of Kotlin: Kotlin M5.1, Kotlin M6, Kotlin M6.1, and Kotlin M6.2. I found the blog post Programming Android with Kotlin interesting because it demonstrates use of Kotlin and Gradle to build an Android application.

The Go programming language has had strong backing from Google and continues to receive significant online coverage. Go 1.1 and Go 1.2 (with apparently improved performance) were both released in 2013. Of special interest to me is Go's advertised source code backwards compatibility for all versions 1.x.

Camel

2013 was a big year for Apache Camel, the tool that "empowers you to define routing and mediation rules in a variety of domain-specific languages." Camel-related developments in 2013 included the release of 2.11.0, release of 2.12.0, and release of 2.12.2. These frequent releases and the addition of a new committer and PMC member are among the signs of a healthy open source project.

The release of the Camel Essential Components (DZone Refcardz #170) kicked off 2013 for Camel. Camel got increased attention on software development social media sites in 2013. Zemian Deng's Getting started with Apache Camel using Java was syndicated on Java Code Geeks (as was his Apache Camel using Groovy Introduction) and Niraj Singh's Introduction to Apache Camel was also syndicated on Java Code Geeks. AndrejV's entire blog Just Did Some Code has so far (5 posts in 2013) been devoted to coverage of Camel!

Spring Boot

It's still early to tell because Spring Boot is currently only at version 0.5, but Spring Boot has potential to be be widely adopted and used in the future. It looks like Spring Boot is inspired by and takes advantage of some of the best ideas in Ruby on Rails and Groovy and applies them to easy generation of Spring Framework-based applications.

Python

As stated earlier, Big Data is big and Python is getting a share of that Big Data action. The Continuum Analytics post Python for Big Data states, "Python is a powerful, flexible, open-source language that is easy to learn, easy to use, and has powerful libraries for data manipulation and analysis. ... Python has a unique combination of being both a capable general-purpose programming language as well as being easy to use for analytical and quantitative computing." Tal Yarkoni echoes this statement and observes that his "scientific computing toolbox been steadily homogenizing" on Python.

Python 3.3.1, Python 3.3.2, and Python 3.3.3 were all released in 2013. Cython has joined Pyrex as an alternative for easily writing C extensions with Python syntax and there is even a book on Learning Cython Programming.

The article Python 3.4.0 goes to beta with slew of new modules talks about some of the new features coming with Python 3.4.0 (beta) such as a standard enumeration construct. The article also points out that one of the biggest frustrations with Python remains: the two versions of the language (2.x and 3.x) and no easy route from 2.x to 3.x. From a Java development perspective, I find this interesting because there was a time when arguments like Bruce Eckel's ("People who don't want to deal with these changes don't upgrade, and those people tend not to upgrade anyway") seemed logical and sensible. However, it's not quite as easy as it sounds, particularly when one starts to realize the impact of this on the entire set of products, libraries, and frameworks written for a language that can be heavily impacted and perhaps not usable for some time if ever with the new language.

PHP and HHVM

2013 saw the formal release of the PHP 5.5.x versions: PHP 5.5.0, PHP 5.5.1, PHP 5.5.2, PHP 5.5.3, PHP 5.5.4, PHP 5.5.5, PHP 5.5.6, and PHP 5.5.7.

At the beginning of 2013, Gabriel Manricks outlined reasons Why 2013 is the Year of PHP. Specifically Manricks described tools such as Laravel (including Eloquent ORM), Composer (dependency manager, including Packagist), and PHPUnit (test-driven development in PHP).

The Facebook project HHVM (HipHop Virtual Machine for PHP) was initially released in 2010, but seemed to see a lot more attention in 2013. The original HPHPc compiler compiled PHP into C++ and was another manifestation of the drive to use a single language for authoring an application even if its compiled form was different. The availability of the open source HipHop Virtual Machine (HHVM) for PHP should help address performance issues with PHP; that is seemingly Facebook's primary reason for developing it.

Android Studio

Android Studio was announced at Google I/O 2013 as "a new IDE that’s built with the needs of Android developers in mind" that is "based on the powerful, extensible IntelliJ IDEA Community Edition."

Cloud Computing

Interest in cloud computing remained strong and continued to grow rapidly in 2013. Many of the other items discussed in this post (Big Data, security, technical dysfunctions, etc.) have strong relationships to cloud computing. For more on the biggest cloud stories on 2013, see The 10 Biggest Cloud Stories Of 2013.

Internet Explorer 11

I have not used Internet Explorer except when forced to for a number of years. For a long time, I used Firefox almost exclusively and in recent years I've used Google Chrome almost exclusively on PCs and Firefox on Linux. When I have been forced by a particular web site to use Internet Explorer, I have done reluctantly and am reminded of the much slower performance of the browser than I'm used to in terms of startup and even navigation. I have noticed over this Christmas break, however, when I had to install Internet Explorer 11 manually because the automatic process kept failing, that it's a lot faster than even Internet Explorer 10 was. I still won't make it my primary browser, but it's nice that it performs much better when I do need to use it (such as to play Atari Arcade games without advertisements).

Internet Explorer 11 offers advantages for developers as well as users of the browser. Advertised benefits for developers (and by extension for users of these developers' applications) are improved standards compatibility, new F12 developer tools,

It's not all positive for Internet Explorer 11. Some people seem to want to downgrade to Explorer 10 and reportedly Internet Explorer 11 is presenting some problems for users of Microsoft's own CRM application (after earlier reportedly breaking Google and Outlook access).

It surprises me a bit that the main Microsoft IE URL (http://windows.microsoft.com/en-us/internet-explorer/download-ie) referenced by the Internet Explorer 11 Guide for Developers still advertises downloading of Internet Explorer 9, a version of that browser that Google has already stated they will no longer support.

Windows 8 Not As Successful

Windows 8 seems to be experiencing similar disappointment after Windows 7 that Windows Vista experienced after Windows XP. In fact, The 10 Biggest Software Stories Of 2013 states, "So it looks like Windows 7 will become the new Windows XP -- better get those downgrade rights ready."

Raspberry Pi

The Raspberry Pi continues to catch a lot of interest (2 million had been sold as of October of this year). There were seemingly endless posts on how to do a wide variety of things with the Raspberry Pi. Some of these that stood out most to me are Premium Mathematica software free on budget Raspberry Pi, GertDuino: An Arduino for Your Raspberry Pi, How an open-source computer kit for kids based on Raspberry Pi is taking over Kickstarter, Running OpenJFX on Raspberry Pi, and Simon Ritter: Do You Like Coffee with Your dessert? Java and the Raspberry Pi.

DevOps

The 10 Biggest Software Stories Of 2013 points out that "Cisco, Google and VMware last year invested in Puppet Labs" and that "another DevOps player, Opscode, raised $32 million and changed its name to Chef, the name of its flagship product."

Twitter Bootstrap

Bootstrap (alternatively known as Twitter Bootstrap and Twitter Blueprint) has become so popular and prevalent that there is now a popular (one of DZone's top posts in 2013)

post stating Please stop using Twitter Bootstrap. In August 2013, two years after the public release of Bootstrap, Bootstrap 3 was released (with 3.0.3 released in December). Everybody Should Code

The conversation of whether everybody should or code write code and develop software continued in 2013. Jessica Gross's writes about 10 places where anyone can learn to code and Megan O'Neil's article A Start-Up Aims to Teach Anyone to Write Computer Code features one of these places (Codecademy). Kevin Lindquist writes that software development isn’t just for coders anymore. Katie Elizabeth lists the reasons why everyone should learn to code while Chase Felker wonders if maybe not everybody should learn to code.

In 2013, President Obama (United States) asked America to learn computer science and it may be one of the few things Republicans and Democrats agree on, but Philip Bump argues, No, Mr. President, Not Everyone Needs to Learn How to Code.

SQL Strikes Back

NoSQL implementations have been all the rage for years, but 2013 seemed to present some resurgence in interest in SQL. Some NoSQL implementations added SQL or SQL-like syntax. SQLstream announces Return of the King: The Structured Query Language Is Back! and Jason Levitt and Sean Gallagher write "The hot new technology in Big Data is decades old: SQL."

The truth is that SQL never really went away and is heavily entrenched in software systems, but what is really interesting 2013 is the new thought and effort put into SQL development. word of Google's F1 SQL database ("a hybrid database that combines high availability, the scalability of NoSQL systems like Bigtable, and the consistency and usability of traditional SQL databases") is an example of this. Seth Proctor asks Do All Roads Lead Back to SQL? and writes of "NewSQL." In Software Development Trends for 2014, Werner Schuster states that "Many NoSQL DBs come with SQL or SQL-like languages" and asks, "Are SQL skills back in fashion?"

Conclusion

2013 may be remembered in popular culture as the year of selfies, but the developments in the software development world were far more substantive. In particular, it is interesting to see how developers in general are beginning to (or are remembering to) appreciate the ability to write an application front-to-back in a single language and to write responsive designs so that an application need be written only once for multiple platforms. The advantages of these approaches are not new to JVM (Java/Java EE) or CLR (.NET) developers, but seem to now be better appreciated by the more general developer community.

Previous Years' Significant Software Development Developments

The following are my posts similar to this one on items that I thought were of special import to software development in those respective years. I could definitely go back and add to some of these with things I've learned about since then for each of these years.

Inspired by Actual Events

Dustin's Pages

Friday, November 24, 2017

Too Many PreparedStatement Placeholders in Oracle JDBC

Monday, November 7, 2016

Fixed-Point and Floating-Point: Two Things That Don't Go Well Together

Friday, March 4, 2016

SQL: Counting Groups of Rows Sharing Common Column Values

Tuesday, October 6, 2015

Single Quotes in Oracle Database Index Column Specification

Monday, October 5, 2015

Downsides of Mixed Identifiers When Porting Between Oracle and PostgreSQL Databases

Monday, May 19, 2014

Hello Cassandra

Tuesday, December 31, 2013

Significant Software Development Developments of 2013

My Non-Technical Blog

Software Development

Affiliations

Affiliations

Support Wikipedia