Temperature Data Processing Using Hive



Author: Bharat (Sree Ram)
Contact: 04042026071
--------------------------------------------------------------------------------------------------



Input file: temperature.txt

xxxxx1950xxx20xx
xxxxx1950xxx24xx
xxxxx1950xxx21xx
xxxxx1951xxx22xx
xxxxx1951xxx19xx


Year starting position in each line: 6th column
Length of the year: 4
Temperature starting position in each line: 13th column
Length of the temperature: 2

Based on these clues, we first need to separate the year and the temperature from each line of the raw input table.
Then we can run the required analytics on the structured data.
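
As a quick check of these positions, the sketch below applies substr() to one sample line written as a literal. Hive's substr(string, start, length) counts positions from 1, so the 6th and 13th columns map directly to the start arguments. It assumes a Hive version that allows a select without a from clause (0.13 or later); on older versions, the same expressions can be run against any one-row table. The aliases year_part and temp_part are only illustrative.

```sql
-- Sample line taken from temperature.txt; substr() positions are 1-based.
select substr('xxxxx1950xxx20xx', 6, 4)  as year_part,   -- 1950
       substr('xxxxx1950xxx20xx', 13, 2) as temp_part;   -- 20
```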

Step 1. Load the raw file into a single-column table.

hive> use halitics;
hive> create table raw(str string);
hive> load data local inpath 'temperature.txt' into table raw;
hive> select * from raw;

Output:
xxxxx1950xxx20xx
xxxxx1950xxx24xx
xxxxx1950xxx21xx
xxxxx1951xxx22xx
xxxxx1951xxx19xx

Step 2. Extract the year and the temperature into a structured table.

hive> create table tempr(y int, t int);
hive> insert overwrite table tempr
           select substr(str,6,4), substr(str,13,2) from raw;
hive> select * from tempr;

Output:

1950     20
1950     24
1950     21
1951     22
1951     19

Now your data is structured.
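
The insert in step 2 relies on Hive converting the substr() strings to int on its own. A more defensive variant, sketched here under the assumption that a real input file may contain short or malformed lines, casts explicitly and keeps only lines whose year and temperature fields are all digits. The rlike patterns are illustrative and not part of the original steps; they would also reject negative temperatures, which do not occur in the sample file.

```sql
-- Keep only lines with a 4-digit year at column 6 and a 2-digit temperature at column 13.
insert overwrite table tempr
select cast(substr(str, 6, 4)  as int) as y,
       cast(substr(str, 13, 2) as int) as t
from raw
where substr(str, 6, 4)  rlike '^[0-9]{4}$'
  and substr(str, 13, 2) rlike '^[0-9]{2}$';
```

For the five sample lines, this loads the same rows as the original insert.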


Step 3. Find the maximum temperature for each year.

hive> create table result(y int, max int);
hive> insert overwrite table result
              select y, max(t) from tempr group by y;
hive> select * from result;
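
For the sample data, the per-year maxima can be checked by hand: 1950 has 20, 24 and 21, and 1951 has 22 and 19. The query below is a sketch that runs the same aggregation directly against tempr, with an order by added so that the rows come back in a fixed order. The alias max_t is only illustrative.

```sql
select y, max(t) as max_t
from tempr
group by y
order by y;
-- Expected for the sample data:
-- 1950    24
-- 1951    22
```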



Step 4. Find the maximum and minimum temperature for each year.

hive> create table results(y int, max int, min int);
hive> insert overwrite table results
              select y, max(t), min(t) from tempr group by y;
hive> select * from results;
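
Steps 3 and 4 each create a table and then fill it with a separate insert. Hive also supports create table ... as select (CTAS), which does both in one statement and takes the column types from the query. A minimal sketch follows; the table name results_ctas and the aliases max_t and min_t are chosen here for illustration and are not part of the original steps.

```sql
-- Creates the table with columns y, max_t, min_t inferred from the query.
create table results_ctas as
select y, max(t) as max_t, min(t) as min_t
from tempr
group by y;

select * from results_ctas order by y;
-- Expected for the sample data:
-- 1950    24    20
-- 1951    22    19
```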







