PCPYTOCSV Command for IBM ISeries AS/400 AS400

Introduction

PCPYTOCSV converts regular ISeries (AS/400, AS400) database files (PFs, LFs etc) to their CSV (comma-separated value) equivalents. The generated CSV records are variable-length. This is one of the commands in our ISeries utilities library (click this URL for full contact and pricing details).

You can choose any single character to separate field values e.g. a comma, a bar character '|', a tab character or even no character at all. We use the term 'CSV' to mean any of these.

Conversion of ISeries data to CSV typically occurs where data must be exported to a non-DB/2 database. For example via PC download then imported into an external database such as Microsoft Access or SQL Server. Some PC-based file extract tools will do conversion to CSV, but this tends to be very slow.

Using PCPYTOCSV, you preconvert to CSV on the ISeries then download as ‘plain text’ with just a simple EBCDIC->ASCII conversion. So the actual conversion to CSV is done on the ISeries, which compared to a PC has much better background (batch) job facilities and faster I/O with blocked database reads. The PC job itself will also run much more quickly if the conversion work is done by the ISeries.

See also the PCPYFRMCSV command for the reverse conversion from CSV.


Examples

PCPYTOCSV FROMFILE(GLMAST) TOFILE(GLMASTCSV)

The above command converts all records in the file GLMAST in *LIBL to the file GLMASTCSV also in *LIBL. This assumes that GLMASTCSV has already been created with a suitable record length e.g. CRTPF FILE(QGPL/GLMASTCSV) RCDLEN(500). The PCPYTOCSV command does not update any records in GLMAST. PCPYTOCSV command defaults mean that records in GLMASTCSV are created as follows:

  • records are added rather than replaced
  • alphanumeric fields are delimited by quotes e.g. a field with value of ABC appears as "ABC"
  • fields are separated by a comma e.g. "ABC",1.23,"DEF"
  • CR/LF (CarriageReturn/Linefeed) characters are appended at the end of every record

PCPYTOCSV FROMFILE(GLDATA/GLMAST) TOFILE(GLDATA/GLMASTCSV) FLDDLM('|') FLDDLMRPL(*DROP) CHRDLM('') MBROPT(*REPLACE) EOR(*CHAR) EORCHAR(X'0B')

The above command converts all records in the file GLMAST in GLDATA to the file GLMASTCSV also in GLDATA. GLMASTCSV records are created as follows:

  • records are replaced rather than appended
  • alphanumeric fields are not delimited e.g. a field with value of ABC appears verbatim as ABC
  • fields are separated by a bar character e.g. ABC|1.23|DEF
  • if the bar character appears in any source data field, it is dropped in the CSV data
  • the hexadecimal character X'0B' is added at the end of every record

Restrictions

There is a hard-coded limitation of 4096 characters (4KB) on the size of any CSV record that PCPYTOCSV processes. Could be increased easily if it represents a problem.


Processing

Database file fields are converted to CSV using the following rules:

1) Decimal numeric fields: Full stop '.' for decimal place. Leading '-' for negative values (+ve values unsigned). No other characters except 0-9.
2) Dates: always CCYYMMDD (8 digits numeric - no separators)
3) Times: always HHMMSS (6 digits numeric - no separators)
4) Timestamp: always CCYYMMDDHHMMSSnnnnnn where nnnnnn is milli-seconds (20 digits numeric - no separators).
5) The CHRDLM keyword on the command determines what character - if any - is used to delimit alphanumeric fields. All other field types 1) to 4) above are never delimited. The default character used for delimitation is the quote charcter e.g. CHRDLM('"'). You may specify no delimiter e.g. CHRDLM(''). You can use hexadecimal values for non-visible characters e.g. CHRDLM(X'07'). You can control what happens if the selected character occurs in any source alphanumeric field. CHRDLM('"') CHRDLMRPL(*DROP) uses the quote character and also removes the quote character from source data. CHRDLM('"') CHRDLMRPL(*CHAR) CHRDLMRPLC(' ') uses the quote character and replaces any occurrence of the quote character from the source data with a blank character.
6) The FLDDLM keyword on the command determines what character - if any - is used to separate fields. The separator character does not appear before the first field or after the last field on each output record - only ever between fields. The default character used for separation is the comma character e.g. FLDDLM(','). You may specify no separator e.g. FLDDLM(''). You can use hexadecimal values for non-visible characters e.g. FLDDLM(X'07'). You can control what happens if the selected character occurs in any source alphanumeric field. FLDDLM(',') FLDDLMRPL(*DROP) uses the comma character and also removes any comma characters from source data. FLDDLM(',') FLDDLMRPL(*CHAR) FLDDLMRPLC('-') uses the comma character and replaces any occurrence of the comma character from the source data with a dash character.
7) The EOR keyword on the command determines what character(s) are used to indicate end of record. This defaults to *CRLF for the CarriageReturn/LineFeed (hexadecimal X'0D0A') character combination - typical on Windows-based systems. If you specify EOR(*CHAR), then you can use the EORCHAR keyword to specify any single character to use for end of record e.g. EORCHAR('$') or EORCHAR(X'FF').
8) The number of digits output for numeric fractional values (digits after the decimal place for non-whole numbers) is controlled using the DPDIG command keyword. This allows 3 values as follows: *MIN1: always the full stop character for the decimal place followed by at least one fractional digit - zero if necessary; *USED: only non-zero fractional digits - with preceding full stop for decimal place if necessary; *ALL: all digits for fractional value whether zero or non-zero.


IBM rebranded the AS/400 - some users called it AS400 - to ISeries several years ago. All 3 names - ISeries, AS/400, AS400 - refer to exactly the same system.