Temperature Data Processing Using Hive
We also offer online and classroom training.
We support POCs.
Author: Bharat (Sree Ram)
Contact: 04042026071
--------------------------------------------------------------------------------------------------
Input file: temperature.txt
xxxxx1950xxx20xx
xxxxx1950xxx24xx
xxxxx1950xxx21xx
xxxxx1951xxx22xx
xxxxx1951xxx19xx
Year starting position in each line: column 6
Length of the year: 4
Temperature starting position in each line: column 13
Length of the temperature: 2
Based on these clues, we first need to separate the year and the temperature from the raw input table.
Then we can perform the required analytics.
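As a quick check of these positions, substr can be tried on one sample line before any table is created. This is only a sketch: it assumes a Hive version that allows a select without a from clause (0.13 or later). Note that Hive's substr counts positions starting from 1, which is why the year starts at 6 and the temperature at 13.
hive> select substr('xxxxx1950xxx20xx',6,4), substr('xxxxx1950xxx20xx',13,2);
o/p --->
1950 20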
step1. Load the raw input file into a Hive table.
hive> use halitics;
hive> create table raw(str string);
hive> load data local inpath 'temperature.txt' into table raw;
hive> select * from raw;
o/p --->
xxxxx1950xxx20xx
xxxxx1950xxx24xx
xxxxx1950xxx21xx
xxxxx1951xxx22xx
xxxxx1951xxx19xx
step2. Extract the year and the temperature into a structured table.
hive> create table tempr(y int, t int);
hive> insert overwrite table tempr
select substr(str,6,4), substr(str,13,2) from raw;
hive> select * from tempr;
o/p --->
1950 20
1950 24
1950 21
1951 22
1951 19
Now your data is structured.
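substr returns a string, while the columns of tempr are declared as int, so the insert above relies on Hive converting the values for you. If you prefer to make that conversion visible, a variant of the same insert with explicit casts could look like the following. It is a sketch that uses the same raw and tempr tables and should load the same five rows.
hive> insert overwrite table tempr
select cast(substr(str,6,4) as int), cast(substr(str,13,2) as int) from raw;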
step3. Find the maximum temperature for each year.
hive> create table result(y int, max int);
hive> insert overwrite table result
select y, max(t) from tempr group by y;
hive> select * from result;
o/p --->
1950 24
1951 22
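The create and insert pair in this step can also be written as a single create table as select statement. The sketch below assumes a new table name, result_ctas, and a column alias, max_t; both are made up for illustration and are not part of the original steps. Hive derives the column types from the query.
hive> create table result_ctas as
select y, max(t) as max_t from tempr group by y;
hive> select * from result_ctas;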
step4. Find both the maximum and the minimum temperature for each year.
hive> create table results(y int, max int, min int);
hive> insert overwrite table results
select y, max(t), min(t) from tempr group by y;
hive> select * from results;
o/p --->
1950 24 20
1951 22 19
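If the intermediate table is not needed, the extraction and the aggregation can be combined in one query that reads straight from raw. This is a sketch rather than the method used above: the subquery alias parsed is a name chosen here, and the explicit casts are optional. For the sample file it should report a maximum of 24 and a minimum of 20 for 1950, and 22 and 19 for 1951.
hive> select y, max(t), min(t) from
(select cast(substr(str,6,4) as int) as y, cast(substr(str,13,2) as int) as t from raw) parsed
group by y;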