Dorward

Java Excel API

08 December 2004

Java Excel API is an open source Java API which allows, among other things, Java developers to read Excel spreadsheets. At my previous job I needed to convert Excel files to CSV. A number of different techniques were trailed for this, and I ended up using the Java Excel API for its robustness (it was the only one that didn't throw a wobbly over some of the files I feed into it). Unfortunately, a bug was highlighted in the demo program distributed with the API for CSV output.

I haven't seen an official standard for CSV formatted data, but Creativyst has a clear pseudo-specification. The demo program does not cope with at least a couple of the rules:

  • Fields with embedded commas must be delimited with double-quote characters.
  • Fields that contain double quote characters must be surrounded by double-quotes, and the embedded double-quotes must each be represented by a pair of consecutive double quotes.

This caused us practical problems (such as part numbers with commas in them being treated as a part number followed by a quantity), so I modified the program to always quote fields (quotes are optional in some cases, required in others, but never forbidden) after performing a substitution on any quote characters within the field.

I prepared a CSV patch against version 2.4.3 (see below), which has been rejected from the main code tree as it uses features introduced in the current (at time of writing) stable version of Java. I may write a version for older versions of Java in the future, but as installing Java 1.4 is easy, I'm not in any hurry to do so.

diff -r -b -u jexcelapi-2.4.3/src/jxl/demo/CSV.java jexcelapi-dorward/src/jxl/demo/CSV.java
--- jexcelapi-2.4.3/src/jxl/demo/CSV.java   Fri Jun 18 12:00:07 2004
+++ jexcelapi-dorward/src/jxl/demo/CSV.java Fri Jun 18 11:52:24 2004
@@ -79,7 +79,7 @@
             {
               if (!(hide && row[0].isHidden()))
               {
-                bw.write(row[0].getContents());
+       bw.write("\"" + row[0].getContents().replaceAll("\"","\"\"") + "\"");
               }
               
               for (int j = 1; j < row.length; j++)
@@ -87,7 +87,7 @@
                 bw.write(',');
                 if (!(hide && row[j].isHidden()))
                 {
-                  bw.write(row[j].getContents());
+                  bw.write("\"" + row[j].getContents().replaceAll("\"","\"\"") + "\"");
                 }
               }
             }