Sqoop is a tool to transfer data between Hadoop and relational databases. It uses MapReduce to import and export the data, which provides parallel operation as well as fault tolerance.
The basic syntax for running Sqoop is:
sqoop tool-name [generic-args] [tool-arguments]
- All generic arguments must precede any tool-specific arguments.
- Generic Hadoop arguments are preceded by a single dash (-), whereas tool-specific arguments start with two dashes (--), unless they are single-character arguments such as -P.
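For instance, a generic -D property (handled by Hadoop's generic option parser) goes right after the tool name, before any tool arguments. A minimal sketch, assuming a placeholder MySQL connection and table:

sqoop import -D mapreduce.job.name=orders_import \
--connect "jdbc:mysql://DB_HOST/shop" \
--username USERNAME -P \
--table orders

Here -D mapreduce.job.name=orders_import is a generic Hadoop argument, while --connect, --username and --table are tool-specific import arguments.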
Example:
sqoop import --driver com.teradata.jdbc.TeraDriver \
--connect "jdbc:teradata://SERVER_IP_ADDRESS/DATABASE=ttmp" \
--username USERNAME \
--password PASSWORD \
--table TABLE_NAME \
--delete-target-dir \
--split-by transaction_id \
-m 8
- import : the Sqoop tool that imports data from a relational database into HDFS.
- --driver : the JDBC driver class used to connect to a specific relational database (the driver JAR is installed in /usr/lib/sqoop/lib).
- --connect : the JDBC URL specifying the database server and the database you want to connect to.
- --username : the database user to connect as for the import.
- --password : the password for that user (passing it on the command line is insecure; for a secure alternative use -P, as shown in the sketch after this list).
- --table : the table you want to import.
- --delete-target-dir : by default Sqoop imports data into an HDFS directory named after the table; if that directory already exists, the import fails. Use this option to delete the existing directory so fresh data can be imported every time.
- --split-by : the column on which to split the work, so that multiple mappers can import data in parallel.
- -m or --num-mappers : the number of mappers to use.
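Passing the password with --password leaves it visible in the shell history and process list. The same import can instead be run with -P so Sqoop prompts for the password interactively (the server, user and table names below are placeholders, as in the example above); --password-file, which reads the password from a file on HDFS, is another option:

sqoop import --driver com.teradata.jdbc.TeraDriver \
--connect "jdbc:teradata://SERVER_IP_ADDRESS/DATABASE=ttmp" \
--username USERNAME -P \
--table TABLE_NAME \
--delete-target-dir \
--split-by transaction_id \
-m 8

After the job completes, the imported files can be checked with hdfs dfs -ls TABLE_NAME (by default the target directory is named after the table, under the user's HDFS home directory).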
This is a very basic command to import data from a relational database into HDFS using Sqoop.