Sqoop – Optimise Import

Importing data with Sqoop is one of the most time-consuming tasks in a Big Data environment. Sqoop is a powerful yet simple tool for importing data from different RDBMSs into HDFS. To reduce import time, two points deserve the highest priority. Number of mappers: mappers provide parallelism while importing …
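
As a rough illustration of the mapper setting, the sketch below prints an import command with a chosen mapper count instead of running it, so it can be reviewed first. The JDBC URL, user name, table, split column and target directory are hypothetical placeholders; `--num-mappers` and `--split-by` are standard Sqoop import options, and Sqoop uses 4 mappers when none is given.

```shell
#!/bin/sh
# Dry-run sketch: print a Sqoop import command for a given number of mappers.
# All connection details below are hypothetical placeholders, not values from the post.
import_cmd() {
  printf 'sqoop import'
  printf ' --connect %s' 'jdbc:mysql://db.example.com/sales'
  printf ' --username %s -P' 'etl_user'       # -P prompts for the password
  printf ' --table %s' 'orders'
  printf ' --split-by %s' 'order_id'          # column used to divide rows among mappers
  printf ' --num-mappers %s' "$1"             # degree of parallelism
  printf ' --target-dir %s\n' '/data/sales/orders'
}

# Print the command for 8 parallel mappers.
import_cmd 8
```

More mappers is not always faster: each mapper opens its own database connection, so a sensible count depends on what the source database can tolerate.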

Sqoop – How to import data in HDFS

Sqoop is a tool for transferring data between Hadoop and relational databases. It uses MapReduce to import and export data, which provides parallel operation as well as fault tolerance. The basic syntax for running Sqoop is: sqoop tool-name [Generic-Args] [Tool-arguments]. All generic arguments must precede any tool arguments. All generic Hadoop arguments are preceded by a …
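
The argument ordering can be sketched with a small helper that assembles the command line in the order tool name, generic arguments, tool arguments, and prints it rather than executing it. The helper name `sqoop_cmd`, the job name and the connection details are assumptions made for illustration only.

```shell
#!/bin/sh
# Dry-run sketch: assemble a Sqoop command line in the required order.
# $1 = tool name, $2 = generic Hadoop arguments, $3 = tool-specific arguments
sqoop_cmd() {
  printf 'sqoop %s %s %s\n' "$1" "$2" "$3"
}

# Generic argument (-D property=value) comes before the tool arguments.
# The job name, JDBC URL, table and target directory are hypothetical.
sqoop_cmd import \
  '-D mapreduce.job.name=orders_import' \
  '--connect jdbc:mysql://db.example.com/sales --table orders --target-dir /data/sales/orders'
```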

Sqoop – Handle NULL values

By default, Sqoop imports NULL as the string null. To change this default, use the following arguments. While importing data: --null-string, --null-non-string. While exporting data: --input-null-string, --input-null-non-string. Check this example for more clarification: In the above example, the --null-string argument represents what should be written in HDFS whenever a NULL is identified in …
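
A minimal sketch of how these four arguments pair up, assuming the common choice of `\N` as the NULL marker (written `'\\N'` on the command line, as in the Sqoop user guide). The helpers only print the argument strings to append to an import or export command; their names are hypothetical and this is not the example from the original post.

```shell
#!/bin/sh
# Dry-run sketch: print the NULL-handling arguments for import and export.
# $1 = the text that should stand for NULL in the HDFS files

# Import: what to write in HDFS for NULL in string and non-string columns.
import_null_args() {
  printf "%s '%s' %s '%s'\n" "--null-string" "$1" "--null-non-string" "$1"
}

# Export: which text in the HDFS files should be read back as NULL.
export_null_args() {
  printf "%s '%s' %s '%s'\n" "--input-null-string" "$1" "--input-null-non-string" "$1"
}

import_null_args '\\N'
export_null_args '\\N'
```

Using the same marker on import and export keeps a round trip through HDFS from turning NULLs into literal text.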