Impala – Optimise query when using to_utc_timestamp() function

From 40 minutes to just 4 minutes. Impala's to_utc_timestamp() function converts a date/timestamp from a given time zone to UTC, but it is very slow. Even with little data in the table, the slowdown is easy to notice. I faced this issue and noticed it was taking around 40 minutes alone to complete …

Sqoop – Optimise Import

Importing data with Sqoop is one of the most time-consuming tasks in a Big Data environment. Sqoop is a powerful yet simple tool for importing data from different RDBMSs into HDFS. To reduce import time, the following two points deserve priority. Number of mappers: mappers provide parallelism while importing …

Real Numbers representation in Impala

We often struggle to preserve the precision and scale of real numbers in a database after applying complex mathematical functions. When the dataset is small, a small deviation from the actual number may not matter much. But when the dataset is huge, as in Big Data, a small variation in one number can lead …
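As a rough illustration of how such small variations arise (a Python sketch, not Impala itself; the value 0.1 and the count of ten are made up for the example), compare a binary floating-point sum with an exact decimal one:

```python
from decimal import Decimal

# Hypothetical data: the value 0.1 added ten times.
N = 10

# Binary floating point cannot store 0.1 exactly, so the error accumulates.
float_total = 0.0
for _ in range(N):
    float_total += 0.1

# A decimal type stores 0.1 exactly, so the total is exact.
decimal_total = Decimal("0.1") * N

print(float_total == 1.0)             # False
print(decimal_total == Decimal("1"))  # True
```

The same trade-off applies, in spirit, when choosing between floating-point and fixed-precision decimal column types in a database.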

Impala – Optimisation at partition level

We all know that the three most common strategies for optimising queries are: partitioned tables, bucketing, and collecting stats. But sometimes a simple query runs on ALL partitions instead of one: you expect it to touch a single partition, yet it scans every partition. Let me show you an …

Sqoop – How to import data in HDFS

Sqoop is a tool for transferring data between Hadoop and relational databases. It uses MapReduce to import and export the data, which provides parallel operation as well as fault tolerance. The basic syntax to run Sqoop is: sqoop tool-name [Generic-Args] [Tool-arguments]. All generic arguments must precede any tool arguments. All generic Hadoop arguments are preceded by a …

Sqoop – Handle NULL values

By default Sqoop imports NULL as null. To change this default, use the following arguments. While importing data: --null-string, --null-non-string. While exporting data: --input-null-string, --input-null-non-string. Check this example for more clarification. In the above example: the --null-string argument represents what should be written in HDFS whenever a NULL is identified in …

Understand the FOR loop

This post was originally published on my first blog, learntheprogramming.blogspot.com. A “for” loop allows code to be executed repeatedly and is classified as an iteration statement. Unlike many other kinds of loops, such as the while loop, the for loop is often distinguished by an explicit loop counter or loop variable. This allows the body …
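To make the contrast with the while loop concrete, here is a minimal Python sketch (the variable names and the range 1 to 5 are chosen only for illustration) that computes the same sum both ways:

```python
# for loop: the loop variable i is introduced and advanced by the loop itself.
total_for = 0
for i in range(1, 6):
    total_for += i

# Equivalent while loop: the counter is initialised and incremented by hand.
total_while = 0
i = 1
while i <= 5:
    total_while += i
    i += 1

print(total_for, total_while)  # 15 15
```

In the for version the counter cannot be forgotten or mis-incremented, which is the usual reason to prefer it when the number of iterations is known in advance.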

First C Program

The above program produces the following output: My First Program running Perfectly. Lines 1 and 2: “#include” is a preprocessor directive. “stdio.h” (Standard Input Output) and “conio.h” (Console Input Output) are the header files that contain the declarations of the functions clrscr(), printf() and getch(). The “.h” extension indicates that this is a header file. Line 3: void main() “void” …

First JAVA Program

The above program gives the following output: My First Program Running Perfectly. Note: Java is a case-sensitive language (System and system are different), so type carefully. The following explanation will help you. Line 1: import java.lang.*; This line is optional. If you do not write this line in your code, the compiler will …

Google Operating System

On 19 November 2009, Google announced its Google Chrome OS Open Source Project. Google's OS had been generating buzz since July, when Google announced that it was working on Google Chrome OS. Google Chrome OS is developed especially for people who spend most of their time on the internet. It’s been figured out that if you don’t …