Yenya's World

Tue, 18 Jul 2006

Comma-Separated Values?

While migrating IS MU to UTF-8, I rewrote the code for exporting tabular data to CSV file for MS Excel, factoring it out to a separate module. When I was at it, I have also added the Content-Disposition header, so that the exported file is saved under a sane filename, instead of the default of some_application.pl. So now the Excel exports are saved as files ending with the .csv suffix. Which is, interestingly enough, the source of problems and incompatibilities with MS Excel.

As I have verified, when I save the CSV file as file.pl, excel reads it correctly - it asks whether the TAB character is the field separator (indeed it is), whether Windows-1250 is the file encoding (it is), and happily imports the file. When the same file is named file.csv, Excel opens it without any question, but somehow does not recognize the TAB character as the field separator. So all the fields are merged to the first column, and the TAB characters are displayed as those ugly rectangles.

When I try to separate the fileds with semicolons, Excel happily opens the file (when named as *.csv), but with another file name, it is necessary to explicitly choose the semicolon as a separator. Just another example of MS stupidity - why the separator cannot be the same regardless of the file name? And by the way, what does CSV stand for? Comma-separated values? Colon-separated values? It does not work for commas nor colons. Just semicolons are detected correctly. Maybe it is some kind of newspeak invented by Microsoft.

I guess I keep the exports TAB-delimited, and just change the file name in the Content-Disposition header to use the .txt extension instead (altough something like .its_csv_you_stupid_excel would probably be more appropriate).

Section: /computers (RSS feed) | Permanent link | 5 writebacks

5 replies for this story:

Michal Schwarz wrote: Excel

Yes, Excel is funny program -- programmers in Microsoft really think that "CSV" means "Semicolon Separated Values" :(. BTW: First cell in CSV file cannot contain text "ID", otherwise Ecel thinks it is SYLK file, even when filename ends with ".csv".

Adelton wrote: Use Spreadsheet::WriteExcel

I had great experience with Spreadsheet::WriteExcel -- that way you know there won't be (nearly ;-) any problems on Excel front, and you can have CSV really CSV for non-Excel use.

Yenya wrote:

Michal Schwarz: are you the one I have studied with first year or two of my university study? Adelton: I have added another output format, using Text::CSV_PP (it works for UTF-8 char data as well, unlike Text::CSV_XS).

Adelton wrote: PP/XS/QQ

Hey, we both know that it's not the software you use, it's the data the software produces. So what does Text::CSV_PP do differently from Test::CSV_XS that it works with UTF-8? IIRC, it was only a matter of correct HTTP headers plus some games with Encode (but I used RayApp which complicated things a bit more).

Yenya wrote: Re: PP/XS

Try to encode and print the utf-8 data with Text::CSV_XS and Text::CSV_PP. The later one produces chars, the former bytes.

Reply to this story:

 
Name:
URL/Email: [http://... or mailto:you@wherever] (optional)
Title: (optional)
Comments:
Key image: key image (valid for an hour only)
Key value: (to verify you are not a bot)

About:

Yenya's World: Linux and beyond - Yenya's blog.

Links:

RSS feed

Jan "Yenya" Kasprzak

The main page of this blog

Categories:

Archive:

Blog roll:

alphabetically :-)