Snowflake – Performance Tuning and Best Practices

Note: This article is a compilation effort of multiple performance tuning methodologies in Snowflake. Some Text/Images in the following article has been referred from various interesting articles and book, details of which are captured under “References”. Introduction to Snowflake Snowflake is a SaaS-based Data Warehouse platform built over AWS (and other clouds) infrastructure. One of the … Read more

HDFS – Data Movement across clusters

You can move data in HDFS cluster using distcp command. distcp uses 10 mappers by default to bring data from source system. While doing data movement I encountered a problem in which data movement was failing because of checksum mismatch. If any block mismatch in the checksum then the complete data block was getting discarded.  Checksum is … Read more