UNION vs UNION ALL in HIVE

SIMILARITY :
Both UNION and UNION ALL concatenate the result of two different SQLs. 

DIFFERENCES : 
  • UNION removes duplicates, whereas UNION ALL does not. UNION operation eliminates the duplicated rows from the result set but UNION ALL returns all rows after joining.
  • UNION ALL is faster than UNION. Additional work is needed to remove duplicates. In order to remove duplicates the result set must be sorted, and this may have an impact on the performance of the UNION, depending on the volume of data being sorted.
  • UNION result set is sorted in ascending order whereas UNION ALL Result set is not sorted
EXCEPTION : 
UNION ALL can be slower than UNION in real-world cases where the network such as the internet, is a bottleneck. The cost of transferring many duplicate rows can exceed the query execution time benefit. This has to be analyzed on a case-by-case basis.

NOTE : While using these commands all selected columns need to be of the same data type. 

REFERENCE : 
https://stackoverflow.com/questions/49925/what-is-the-difference-between-union-and-union-all

Post a Comment

0 Comments