
Sunday, December 18, 2016

Tips for populating Big SQL and Hive Hadoop tables with DATE types

When creating external Hive tables defined with DATE columns, ensure that the values in the data files on HDFS (Hadoop) are all DATE values and not a mix of DATE and TIMESTAMP values. The same applies when creating Hive tables and using the Hive INSERT or INSERT…SELECT commands to add data to them. When Hive expects a DATE type but finds a TIMESTAMP value in the data file, it inserts a NULL value into the table. NULL values can hurt query performance, especially for queries against partitioned tables whose partitioning keys are NULL, because Hive puts all NULL values into one partition.
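One way to avoid the mixed-type problem is to normalize the date column before the files land on HDFS. The following is a minimal Python sketch, not taken from the original article: the function names `to_date_string` and `normalize_rows`, the comma delimiter, and the three accepted input layouts are all assumptions chosen for illustration. It rewrites TIMESTAMP-style values as plain `yyyy-MM-dd` dates and sets aside rows whose date field cannot be parsed, so they can be inspected rather than silently becoming NULL.

```python
from datetime import datetime

# Assumed input layouts (hypothetical); adjust to what the data files contain.
_FORMATS = (
    "%Y-%m-%d",               # already a plain date
    "%Y-%m-%d %H:%M:%S",      # timestamp without fractional seconds
    "%Y-%m-%d %H:%M:%S.%f",   # timestamp with fractional seconds
)


def to_date_string(value):
    """Return value as a yyyy-MM-dd string, or None if no layout matches."""
    text = value.strip()
    for fmt in _FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None


def normalize_rows(lines, date_col, sep=","):
    """Split delimited lines into (clean, rejected) lists.

    clean    -- lines whose date_col field was rewritten as a plain date
    rejected -- original lines whose date_col field could not be parsed
    """
    clean, rejected = [], []
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        date_value = to_date_string(fields[date_col]) if date_col < len(fields) else None
        if date_value is None:
            rejected.append(line.rstrip("\n"))
            continue
        fields[date_col] = date_value
        clean.append(sep.join(fields))
    return clean, rejected


if __name__ == "__main__":
    sample = [
        "1,2016-12-18,widget",
        "2,2016-12-18 10:30:00,gadget",
        "3,not-a-date,gizmo",
    ]
    good, bad = normalize_rows(sample, date_col=1)
    print(good)
    print(bad)
```

Rows in the rejected list are the ones that would otherwise load as NULL, so reviewing them before the load keeps unexpected values out of the NULL partition.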

https://developer.ibm.com/hadoop/2016/12/16/tips-for-populating-big-sql-and-hive-hadoop-tables-with-date-types/
