On a secure HDFS cluster, HAWQ initialization may fail with the error message "[WARN]:-Failed to create dfs filespace".
On further review, the HAWQ database logs under $MASTER_DATA_DIRECTORY/pg_log may show an error like the following:
2014-01-03 21:16:36.293756 EST,,,p480707,th1440433952,,,,0,,,seg-1,,,,,"LOG","00000","3rd party error log:
E0103 21:16:36.293704 480943 Hdfs.cpp:27] authentication failed: GSSAPI error in client while negotiating security context in gss_init_sec_context() in SASL library. This is most likely due insufficient credentials or malicious interactions.
@ 0x7f7e55692145 Hdfs::Internal::SaslClient::connect()
@ 0x7f7e55681b22 Hdfs::Internal::RpcChannel::connect()
@ 0x7f7e556866d5 Hdfs::Internal::RpcChannel::invoke()
@ 0x7f7e5568be7e Hdfs::Internal::RpcClient::call()
@ 0x7f7e5568d2b6 Hdfs::Internal::Invoker::CallMethod()
@ 0x7f7e5569505a Hdfs::Internal::ClientNamenodeProtocolTranslator::getFileInfo()
Kerberos clients can do DNS lookups to canonicalize service principal names. This can cause difficulties when setting up Kerberos application servers, especially when the client’s name for the service differs from what the service thinks its name is. By default, the Kerberos client performs a reverse DNS lookup, and if the hostname it retrieves differs from the hostname with which the principals were set up, authentication failures like the one above occur.
In the log snippet below, note the hostnames hdw1.gphd.local and hdw2.gphd.local, which are the local hostnames of the servers. However, the principals were created at the KDC using the external hostnames.
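The mismatch described above can be checked by hand before touching any Kerberos configuration. The following is a minimal sketch, not part of HAWQ or Kerberos: the function names reverse_name and rdns_mismatch are hypothetical helpers, and the check only approximates what the Kerberos client does (resolve the name to an address, then look the address back up).

```python
import socket
from typing import Callable, Optional


def reverse_name(host: str) -> str:
    """Resolve host to an IPv4 address, then return the name from the reverse lookup."""
    addr = socket.gethostbyname(host)
    return socket.gethostbyaddr(addr)[0]


def rdns_mismatch(
    principal_host: str,
    resolve_reverse: Callable[[str], str] = reverse_name,
) -> Optional[str]:
    """Return the reverse-resolved name if it differs from principal_host, else None.

    principal_host is the hostname used in the service principal at the KDC.
    The comparison ignores case, since DNS names are case-insensitive.
    """
    resolved = resolve_reverse(principal_host)
    if resolved.lower() != principal_host.lower():
        return resolved
    return None
```

Run on the HAWQ master with the NameNode hostname used in the principal, a non-None result (for example, a local name where an external name was expected) suggests that reverse DNS lookup would rewrite the service principal name.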
 1390440596.652217: Requesting tickets for host/hdw2.gphd.local@PHD.DEV.VSA.COM, referrals on
 1390440596.652263: Generated subkey for TGS request: aes256-cts/1F40
 1390440596.652360: etypes requested in TGS request: aes256-cts, aes128-cts, des3-cbc-sha1, rc4-hmac, camellia128-cts, camellia256-cts
 1390440596.652557: Encoding request body and padata into FAST request
 1390440596.666230: Received answer from dgram 10.181.22.129:88
 1390440596.666282: Response was not from master KDC
 1390440596.666312: Decoding FAST response
 1390440596.666431: TGS request result: -1765328377/Server krbtgt/GPHD.LOCAL@PHD.DEV.VSA.COM not found in Kerberos database
 1390440596.667000: Convert service host (service with host as instance) on host hdw1 to principal
 1390440596.668733: Remote host after forward canonicalization: hdw1.gphd.local
 1390440596.669084: Remote host after reverse DNS processing: hdw1.gphd.local
 1390440596.669090: Get host realm for hdw1.gphd.local
 1390440596.669094: Use local host hdw1.gphd.local to get host realm
 1390440596.669097: Look up hdw1.gphd.local in the domain_realm map
 1390440596.669101: Look up .gphd.local in the domain_realm map
 1390440596.669104: Look up gphd.local in the domain_realm map
 1390440596.669108: Look up .local in the domain_realm map
 1390440596.669111: Look up local in the domain_realm map
 1390440596.669115: Got realm for host hdw1.gphd.local
 1390440596.669120: Got service principal host/hdw1.gphd.local@
 1390440596.669277: ccselect can't find appropriate cache for server principal host/hdw1.gphd.local@
Note: To enable debugging, refer to the article "Enable kerberos debugging logs with HAWQ". The log snippets in this article are taken from Kerberos client debugging messages.
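When the debugging output is long, the few lines that matter for this issue can be pulled out automatically. The sketch below is only an illustration: summarize_trace is a hypothetical helper, and the patterns assume the message wording seen in the snippets in this article.

```python
import re

# Message fragments taken from the debugging output shown in this article.
TRACE_PATTERNS = {
    "forward": re.compile(r"Remote host after forward canonicalization: (\S+)"),
    "reverse": re.compile(r"Remote host after reverse DNS processing: (\S+)"),
    "principal": re.compile(r"Got service principal (\S+)"),
}


def summarize_trace(text: str) -> dict:
    """Collect the distinct hostnames and principals reported in a trace log."""
    found = {key: [] for key in TRACE_PATTERNS}
    for line in text.splitlines():
        for key, pattern in TRACE_PATTERNS.items():
            match = pattern.search(line)
            if match and match.group(1) not in found[key]:
                found[key].append(match.group(1))
    return found
```

Comparing the "principal" entries with the principals actually created at the KDC shows quickly whether the client is asking for a hostname the KDC does not know.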
In /etc/krb5.conf, add the setting below to the [libdefaults] section to turn off reverse DNS lookup, and place the updated file on the master and segment nodes. Then initialize the database again; it should succeed.
rdns = false
Note: Make sure you have deleted the HAWQ master and segment data directories created during the last failed initialization attempt; otherwise initialization will fail.
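Because the same krb5.conf has to reach the master and every segment node, it can help to verify the rdns setting on each host. The following is a rough sketch under stated assumptions: libdefaults_rdns is a hypothetical helper, the sample file content (including default_realm) is illustrative only, and the parser handles only simple "key = value" lines, not include directives or nested braces.

```python
from typing import Optional

# Illustrative krb5.conf content; only the rdns line comes from this article.
SAMPLE_KRB5_CONF = """\
[libdefaults]
    default_realm = PHD.DEV.VSA.COM
    rdns = false

[domain_realm]
    .domain.com = PHD.DEV.VSA.COM
"""


def libdefaults_rdns(conf_text: str) -> Optional[str]:
    """Return the rdns value from [libdefaults], or None if it is not set there."""
    section = None
    value = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", ";")):
            continue
        # Track which section the following lines belong to.
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
            continue
        if section == "libdefaults" and "=" in line:
            key, _, val = line.partition("=")
            if key.strip() == "rdns":
                value = val.strip().lower()
    return value


if __name__ == "__main__":
    with open("/etc/krb5.conf") as handle:
        print("rdns in [libdefaults]:", libdefaults_rdns(handle.read()))
```

A result of None means the setting is absent, in which case MIT Kerberos falls back to its default of performing the reverse lookup.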
The log snippet below indicates that the external hostname (for example, sl73vsainthdm01q.domain.com) can now be retrieved.
 1390449858.962265: Convert service hdfs (service with host as instance) on host sl73vsainthdm01q.domain.com to principal
 1390449858.963968: Remote host after forward canonicalization: sl73vsainthdm01q.domain.com
 1390449858.964029: Remote host after reverse DNS processing: sl73vsainthdm01q.domain.com
 1390449858.964048: Get host realm for sl73vsainthdm01q.domain.com
 1390449858.964068: Use local host sl73vsainthdm01q.domain.com to get host realm
 1390449858.964077: Look up sl73vsainthdm01q.domain.com in the domain_realm map
 1390449858.964087: Look up .domain.com in the domain_realm map
 1390449858.964096: Look up domain.com in the domain_realm map
 1390449858.964105: Look up .com in the domain_realm map
 1390449858.964143: Look up com in the domain_realm map
 1390449858.964155: Got realm for host sl73vsainthdm01q.domain.com
 1390449858.964186: Got service principal hdfs/sl73vsainthdm01q.domain.com@
 1390449858.964950: ccselect can't find appropriate cache for server principal hdfs/sl73vsainthdm01q.domain.com@
 1390449858.965170: Getting credentials postgres@PHD.DEV.VSA.COM -> hdfs/sl73vsainthdm01q.domain.com@ using ccache FILE:/tmp/postgres.ccname
- Pivotal internal employee reference JIRA: GPSQL-1486
- Reading Material: http://web.mit.edu/kerberos/krb5-current/doc/admin/princ_dns.html