Sqoop – Optimise Import

Importing data with Sqoop is one of the most time-consuming tasks in a Big Data environment. Sqoop is a powerful yet simple tool for importing data from different RDBMSs into HDFS. When importing data, the following two points deserve the highest priority for reducing import time: Number of Mappers: mappers provide parallelism while importing …
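
To make the idea of mapper parallelism concrete, here is a minimal sketch of how a numeric key range could be divided into one contiguous slice per mapper. It is a simplified illustration, not Sqoop's actual splitting code; the function name `split_ranges` and the sample bounds are assumptions made for this example.

```python
def split_ranges(lo, hi, num_mappers):
    """Divide the inclusive key range [lo, hi] into contiguous slices.

    Hypothetical helper: each slice stands for the rows one mapper would
    import. Any remainder is spread over the first slices.
    """
    total = hi - lo + 1
    base, extra = divmod(total, num_mappers)
    ranges = []
    start = lo
    for i in range(num_mappers):
        size = base + (1 if i < extra else 0)
        if size == 0:
            break  # more mappers than keys: nothing left to hand out
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


# Example: keys 1..1000 shared between 4 mappers
for low, high in split_ranges(1, 1000, 4):
    print(f"mapper imports keys {low}..{high}")
```

With evenly distributed keys, each mapper gets a similar amount of work. If the keys are skewed, some slices hold far more rows than others and the slowest mapper determines the total import time.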

Real Numbers representation in Impala

We often face the challenge of preserving the precision and scale of real numbers in a database after applying complex mathematical functions. When the dataset is small, a small deviation from the actual number may not matter much. But when the dataset is huge, as it is in Big Data, a small variation in one number can lead …
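
The general effect can be reproduced outside any database. The sketch below uses plain Python rather than Impala, and the sample values are assumptions chosen for illustration: binary floating-point addition drifts slightly, while a decimal type keeps the exact value. Impala offers comparable FLOAT/DOUBLE and DECIMAL types, but this snippet does not claim to show the approach taken in the post.

```python
from decimal import Decimal

# Ten readings of 0.1, kept as strings so both types parse the same input.
readings = ["0.1"] * 10

# Accumulate with binary floating point (similar in spirit to DOUBLE).
float_total = 0.0
for r in readings:
    float_total += float(r)

# Accumulate with exact decimal arithmetic (similar in spirit to DECIMAL).
decimal_total = Decimal("0")
for r in readings:
    decimal_total += Decimal(r)

print("float total is exactly 1.0:  ", float_total == 1.0)
print("decimal total is exactly 1.0:", decimal_total == Decimal("1.0"))
```

The error in a single float addition is tiny, but it accumulates over many rows, which is why the drift becomes visible on large datasets.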

Impala – Optimisation at partition level

We all know that these three strategies are the most common ways to optimise queries: partitioned tables, bucketing, and collecting stats. But sometimes a simple query will run on ALL partitions instead of one: you expect it to touch a single partition, yet it scans every partition. Let me show you an …