Snowflake – Performance Tuning and Best Practices

Note: This article compiles multiple performance tuning methodologies for Snowflake. Some text and images in it are drawn from various articles and books, details of which are listed under “References”. Introduction to Snowflake: Snowflake is a SaaS-based data warehouse platform built on AWS (and other cloud) infrastructure. One of the …

Apache Spark – Performance Tuning and Best Practices

Note: This article compiles multiple performance tuning methodologies for Apache Spark. Text and images in it are drawn from various articles and books, details of which are listed under “References”. Tweak Configurations: Viewing and Setting Apache Spark Configurations. There are 4 ways of doing it. Way 1: Using the $SPARK_HOME directory (Configuration changes in …

Data Serialisation – Avro vs Protocol Buffers

Background: File Formats Evolution. Why not use CSV/XML/JSON? Repeated or no meta information. Files are not splittable, so they cannot be used in a map-reduce environment. Missing or limited schema definition and evolution support. “JsonSchema” can be leveraged to maintain the schema separately for JSON, but it may still require transformation based on a schema, so why not consider Avro/Proto? …
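To make the "repeated meta information" point concrete, here is a minimal Python sketch. It is not Avro or Protocol Buffers: it uses the standard `struct` module as a stand-in for any schema-based binary encoding, and the record fields (`id`, `score`) are hypothetical.

```python
import json
import struct

# Hypothetical sample records; the field names are illustrative only.
records = [{"id": i, "score": i * 1.5} for i in range(1000)]

# JSON repeats the field names ("id", "score") inside every single record.
json_bytes = json.dumps(records).encode("utf-8")

# With a schema agreed up front (one int32, one float64, little-endian),
# only the values need to be written; the field names live in the schema.
SCHEMA = struct.Struct("<id")
FIELDS = ("id", "score")
binary_bytes = b"".join(SCHEMA.pack(r["id"], r["score"]) for r in records)

# Decoding relies on the same schema to map raw bytes back to named fields.
decoded = [
    dict(zip(FIELDS, SCHEMA.unpack_from(binary_bytes, k * SCHEMA.size)))
    for k in range(len(records))
]

print(len(json_bytes), len(binary_bytes))
```

The binary form is smaller because the per-record field names are gone, but the bytes can no longer be read without the schema. That trade-off is what schema-aware formats are designed around.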

Count(*) – Explaining different behaviour in Joins

Observations: Count(1) or Count(*): this is never expanded to each column individually, so it works perfectly fine on the complete data. Count(1) is more optimized than Count(*). Count(source.*), where source represents the “left table” of a “Left Outer Join”: this is evaluated as Count(source.col1, source.col2, …, source.colN). So, if any column has NULL, then the complete row …
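The difference between the two counts can be sketched in plain Python (not SQL). This assumes that a multi-column count only counts rows in which every listed column is non-NULL, which is how Spark SQL documents `count(expr[, expr...])`. The helper `count_non_null` and the sample rows are hypothetical.

```python
def count_non_null(rows, columns):
    """Count rows in which every listed column is non-NULL (None)."""
    return sum(1 for row in rows if all(row[c] is not None for c in columns))


# Hypothetical rows from the left ("source") table of a left outer join.
source_rows = [
    {"col1": 1, "col2": "a"},
    {"col1": 2, "col2": None},
    {"col1": None, "col2": None},
]

# COUNT(*) / COUNT(1): every row counts, NULLs or not.
count_star = len(source_rows)

# COUNT(source.*) emulated as COUNT(source.col1, source.col2).
count_source_star = count_non_null(source_rows, ["col1", "col2"])

# COUNT(source.col1): only rows where col1 is non-NULL.
count_col1 = count_non_null(source_rows, ["col1"])

print(count_star, count_source_star, count_col1)
```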

Cost and Performance Analysis : CSV and Parquet Format

I compared the cost of querying CSV files versus Parquet files. Interestingly, with the Parquet format, the data scanned for similar queries cost 99% less than with the CSV format. Table columns: Queries (shown only for Parquet) | CSV (11.32 GB) run time (sec) | CSV (11.32 GB) data scanned (GB) | Parquet (4.1 GB) run time (sec) | Parquet (4.1 GB) data scanned (GB) …

Understand the FOR loop

This post was originally published on my first blog, learntheprogramming.blogspot.com. A “for” loop allows code to be executed repeatedly and is classified as an iteration statement. Unlike many other kinds of loops, such as the while loop, the for loop is often distinguished by an explicit loop counter or loop variable. This allows the body …
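As a small illustration of the loop-variable point, here is a sketch in Python. The language is chosen only for brevity and may differ from the one used in the original post; the variable names are hypothetical.

```python
# Sum the integers 0..4 with a for loop: the loop variable `i` is declared
# and advanced by the loop construct itself.
total_for = 0
for i in range(5):
    total_for += i

# The same computation with a while loop: the counter `j` must be
# initialised and incremented by hand.
j = 0
total_while = 0
while j < 5:
    total_while += j
    j += 1

print(total_for, total_while)
```

Both loops compute the same sum. The for loop keeps the counter bookkeeping in one place, which is what makes it easy to recognise.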

First C Program

The above program produces the following output: My First Program running Perfectly. Lines 1 and 2: “#include” is a preprocessor directive. “stdio.h” (Standard Input Output) and “conio.h” (Console Input Output) are the header files that contain the declarations of the functions clrscr(), printf() and getch(). The “.h” extension indicates that this is a header file. Line 3: void main() “void” …

First JAVA Program

The above program gives the following output: My First Program Running Perfectly. Note: Java is a case-sensitive language (System and system are different), so type carefully. The following explanation will help you. Line 1: import java.lang.*; This line is optional. If you do not write this line in your code, the compiler will …

Google Operating System

On 19 Nov 2009, Google announced its Google Chrome OS Open Source Project. Google's OS had been generating buzz since July, when Google announced that it was working on Google Chrome OS. Google Chrome OS is developed especially for people who spend most of their time on the internet. It’s been figured out that if you don’t …

Google Wave

>> It is a web-based service, computing platform, and communications protocol designed to merge e-mail, instant messaging, wikis, and social networking.
>> It has a strong collaborative and real-time focus supported by extensions that can provide, for example, spelling/grammar checking, automated translation among 40 languages, and numerous other extensions.
>> A wave can be both a conversation and a document …