Real Numbers representation in Impala

Many a times we face challenge in keeping the precision scale of real numbers in database after applying complex mathematical functions. When dataset is small a small variation in actual number may not worry a lot. But when dataset is huge and when dealing in BigData then a small variation in one number can lead to drastic change in overall computed result.

Here I am discussing issue with Floating datatype of Impala. Calling it as issue may not be correct so we can say this is as a limitation in these data types (Float or Double).

Consider you are importing data from Oracle database into Hdfs. After importing data and querying on it using Impala you may notice some slight changes in values like :    

  • In Oracle – 0.35   
  • In Impala – 0.3499999940395355

So, workaround to this problem is to use DECIMAL data type. After changing the data type to DECIMAL you will notice these small changes can be avoided and will not deviate from the actual result.

By definition on the Cloudera website, it itself says that using floating data types can lead to varied results.

For perfect consistency DECIMAL data type should be used whenever dealing with REAL numbers in Impala. Usage of FLOAT or REAL or DOUBLE  data type should be strictly avoided.