Pivotal Greenplum (all versions)
When data is loaded from external sources with gpfdist, gpload, or COPY, the load might fail with the error 'invalid byte sequence for encoding "UTF8": 0xc942':
msong=# copy source_address from '/data/msong_env/cases/SR59883770/source_address.dat.0001' WITH DELIMITER '|' ;
ERROR: invalid byte sequence for encoding "UTF8": 0xc942
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
CONTEXT: COPY source_address, line 39097
Files from external sources might use an encoding other than UTF8. However, the target Greenplum Database (GPDB) is configured to accept UTF8-encoded input from the client:
msong=# show client_encoding;
In that case, the database cannot interpret the bytes in the external files as valid UTF8, and the load fails with the error shown above.
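The following sketch illustrates why the byte pair 0xc942 from the error is rejected. In UTF-8, 0xC9 is the lead byte of a two-byte sequence and must be followed by a continuation byte in the range 0x80-0xBF; 0x42 (the letter "B") is not one. The helper name is_valid_utf8 and the sample data are hypothetical, and the guess that the bytes were meant as ISO-8859-1 is an assumption for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the file decodes cleanly as UTF-8.
# iconv exits non-zero when it meets a byte sequence it cannot decode.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

sample=$(mktemp)
# \311 is octal for 0xC9; it is followed by "B" (0x42), as in the error.
printf 'A\311B\n' > "$sample"

if is_valid_utf8 "$sample"; then
    echo "sample is valid UTF-8"
else
    echo "sample is NOT valid UTF-8"
fi

# Assumption: the bytes were written as ISO-8859-1, where 0xC9 is "É".
# Dump the converted bytes in hex to see the two-byte UTF-8 form of "É".
iconv -f ISO-8859-1 -t UTF-8 "$sample" | od -An -tx1

rm -f "$sample"
```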
As a workaround, convert the external source files to UTF8 with iconv:
iconv -f original_charset -t utf-8 originalfile > newfile
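A minimal sketch of wrapping the iconv call so that a failed conversion never leaves a partial output file behind. The function name convert_to_utf8 and the file names in the usage comment are hypothetical, and ISO-8859-1 is only an assumed source charset.

```shell
#!/bin/sh
# convert_to_utf8 SRC_CHARSET INFILE OUTFILE
# Hypothetical helper: converts INFILE to UTF-8 and keeps OUTFILE only if
# iconv decoded every byte. The original file is never modified.
convert_to_utf8() {
    src_charset=$1
    infile=$2
    outfile=$3
    if iconv -f "$src_charset" -t UTF-8 "$infile" > "$outfile.tmp"; then
        mv "$outfile.tmp" "$outfile"
    else
        rm -f "$outfile.tmp"
        echo "conversion from $src_charset failed for $infile" >&2
        return 1
    fi
}

# Usage (assumed charset and file names, adjust to your environment):
#   convert_to_utf8 ISO-8859-1 source_address.dat.0001 source_address.utf8.dat
```

Note that single-byte charsets such as ISO-8859-1 can decode any byte, so iconv does not report an error when the wrong source charset is given; the output is valid UTF-8 but may contain the wrong characters. Confirm the real source charset with whoever produces the file.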
As a permanent solution, always encode the external source files with the same encoding that the target database is configured to use. Check that the following three outputs match:
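1. Check the client encoding of the target database, for example with "show client_encoding;" in psql.
2. Check the server encoding of the target database, for example with "show server_encoding;" in psql.
3. Check the encoding of the external files, for example with the file utility on Linux ("file -i <datafile>").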
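The three checks can be compared in a script. The sketch below is one possible approach, not an official Greenplum tool. The helpers normalize_encoding and encodings_match are hypothetical names; they exist because PostgreSQL-style output such as "UTF8" and the file utility's "utf-8" spell the same encoding differently. The wiring at the bottom is commented out because it assumes a reachable database ($DB) and a data file ($DATAFILE).

```shell
#!/bin/sh
# Normalize an encoding name so "UTF8", "utf-8" and "UTF_8" compare equal.
normalize_encoding() {
    printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -d ' _-'
}

# Hypothetical helper: succeeds only if all three encoding names agree.
encodings_match() {
    client=$(normalize_encoding "$1")
    server=$(normalize_encoding "$2")
    file_enc=$(normalize_encoding "$3")
    [ "$client" = "$server" ] && [ "$server" = "$file_enc" ]
}

# Example wiring (assumes psql can connect to $DB and $DATAFILE exists):
#   client=$(psql -At -d "$DB" -c 'SHOW client_encoding;')
#   server=$(psql -At -d "$DB" -c 'SHOW server_encoding;')
#   file_enc=$(file -b --mime-encoding "$DATAFILE")
#   if encodings_match "$client" "$server" "$file_enc"; then
#       echo "encodings match"
#   else
#       echo "mismatch: client=$client server=$server file=$file_enc" >&2
#   fi
```

The file utility guesses the encoding heuristically, and it reports pure-ASCII files as us-ascii, which is a subset of UTF-8 and loads without this error. Treat its answer as a hint rather than proof.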