Impala – Create Table AS Select * FROM Table – is SLOW

Below query seems like the simplest way to create a replica of table. But simplicity comes with some cost as well.

Create table TABLE_NAME
AS
Select * from TABLE_NAME_2;

Above query will :

  1. NOT create partitions if there are any on TABLE_NAME_2
  2. run very slow

Instead of above we should follow following 2 way approach : 

  1. CREATE TABLE TABLE_NAME    Like TABLE_NAME_2;  — This will create partitions as well if any in TABLE_NAME_2.
  2. Insert Into Table TABLE_NAME Select * from TABLE_NAME_2; — This DML will run faster. 

STATS:

CREATE TABLE test.table_1 AS SELECT * FROM main.table_2;
// 72seconds , 9337252records, 875MB used as peak memory

CREATE TABLE test.device1 LIKE main.table_2;
INSERT INTO TABLE test.table_1 SELECT * FROM main.table_2;
// 20seconds, 9337252records, 1.4GB used as peak memory.