Impala Archives - DataForGeeks

Impala – Create Table AS Select * FROM Table – is SLOW

March 24, 2022 | July 22, 2016 by Nikhil Aggarwal

The query below seems like the simplest way to create a replica of a table, but that simplicity comes at a cost. The query will: NOT create partitions, if there are any on TABLE_NAME_2; and run very slowly. Instead, we should follow a two-step approach: CREATE TABLE TABLE_NAME LIKE TABLE_NAME_2; -- … Read more

Impala – Use Incremental stats instead of Full Table stats

March 24, 2022 | July 21, 2016 by Nikhil Aggarwal

If you have a table that is partitioned on a column, then running COMPUTE STATS TABLE_NAME will execute on all partitions. Internally, COMPUTE STATS runs the NDV function on each column to get the numbers. Although the NDV function is faster than count(COLUMN), it still runs for every partition, which may be unnecessary when you are working on/updating/modifying values … Read more

Impala – Optimise query when using to_utc_timestamp() function

May 29, 2025 | July 21, 2016 by Nikhil Aggarwal

From 40 minutes to just 4 minutes. Impala's to_utc_timestamp() function converts a date/timestamp from a given time zone to UTC, but it is very slow. Even with little data in the table, the slow performance is easy to notice. I faced a similar issue and noticed that it alone was taking around 40 minutes to complete … Read more

Real Numbers representation in Impala

March 24, 2022 | June 30, 2016 by Nikhil Aggarwal

Many times we face a challenge in keeping the precision and scale of real numbers in a database after applying complex mathematical functions. When the dataset is small, a small variation in the actual number may not matter much. But when the dataset is huge, as in Big Data, a small variation in one number can lead … Read more

Impala – Optimisation at partition level

March 27, 2022 | June 21, 2016 by Nikhil Aggarwal

We all know that these three strategies are the most common ways to optimise our queries: partitioned tables, bucketing, and collecting stats. But sometimes a simple query will run on ALL partitions instead of one. You may expect your query to work on a single partition, yet it runs on all of them. Let me show you an … Read more