Product Sales Analysis – 1

Write an SQL query that reports all product names of products in the “sales” table along with their selling year and price.   Input Table : sales sale_id product_id sale_year quantity price 20 80 2017 21 300 30 80 2018 23 400 40 13 2019 56 600   Input Table: product product_id product_name 80 Asus … Read more

Shortest distance in line on x-axis

Points of x coordinates are stored in table “points“. All entries of these points in table are integers and unique. Write a query to find shortest distance between two points on x-axis.   Input Table: points x-axis -3 -6 0 10 Output Table: shortest 3 Explanation: All possible combinations of all points in input table … Read more

Sales Analysis – 1

Tables – product and sales are provided. Being area manager, you are looking for best seller by total sales price to reward and motivate them for best results. If there is a tie, all sellers would be qualified for same reward. Write SQL query to fetch all seller_id by total sale price. Input Table: sales … Read more

Sales Analysis – 2

Tables – product and sales are provided. product table holds data of all products and sales table represent sale transaction that was completed. Write SQL query to identify buyer_id who bought S10 but not iPhone. Input Table: sales seller_id product_id buyer_id sale_date quantity price 11 10 71 2019-02-22 3 6000 11 20 72 2019-03-27 1 600 … Read more

Tutor taking atleast 3 classes for a student

Tutor data is provided tutor_students table below. Table contains data of a month with tutor_id and student_id along with date of month on which tutor took class.Write a SQL query that provides the pairs (tutor_id, student_id) where the tutor has taken at least 3 classes for a student in single month.   Input Table: tutor_students … Read more

Gamers Analysis – 1

Write an SQL query and identify first login date for each gamer.   Input Table: gamers_activity player_id device_id event_date games_played 7 22 2017-04-02 4 7 22 2017-06-05 3 9 33 2018-02-26 0 6 11 2017-05-07 7 6 44 2019-08-06 3   Output Table: player_id first_login 7 2017-04-02 9 2017-04-02 6 2017-04-02 Explanation: “gamers_activity” table shows the activity … Read more

Gamers Analysis – 2

Write a SQL query that identify the device that was first used to log in for each gamer.   Input Table: gamers_activity player_id device_id event_date games_played 7 22 2017-04-02 4 7 22 2017-06-05 3 9 33 2018-02-26 0 6 11 2017-05-07 7 6 44 2019-08-06 3   Output Table: player_id device_id 7 22 9 33 … Read more

Impala – Create Table AS Select * FROM Table – is SLOW

Below query seems like the simplest way to create a replica of table. But simplicity comes with some cost as well. Above query will : NOT create partitions if there are any on TABLE_NAME_2 run very slow Instead of above we should follow following 2 way approach :  CREATE TABLE TABLE_NAME    Like TABLE_NAME_2;  — … Read more

HDFS – Data Movement across clusters

You can move data in HDFS cluster using distcp command. distcp uses 10 mappers by default to bring data from source system. While doing data movement I encountered a problem in which data movement was failing because of checksum mismatch. If any block mismatch in the checksum then the complete data block was getting discarded.  Checksum is … Read more

Impala – Use Incremental stats instead of Full Table stats

If you have a table which is partitioned on a column then doingCompute stats TABLE_NAMEwill execute on all partitions. Internally compute stats run NDV function on each column to get numbers. However NDV function works faster then other count(COLUMN), but it will run for each partition which may be irrelevant when you are working/updating/modifying values … Read more