Temperature data (with positive and negative air temperatures) processing with Hive

we also offer online and classroom trainings
we support POCs
author: Bharat (Sree Ram)
contact: 04042026071
_____________________________________________________________________



input file : tempr.txt

xxxxx1950xxx-20xx
xxxxx1950xxx24xx
xxxxx1950xxx-21xx
xxxxx1950xxx10xx
xxxxx1951xxx-22xx
xxxxx1951xxx19xx
xxxxx1951xxx-24xx
xxxxx1951xxx22xx

I want to find, for each year, the maximum temperature and the minimum temperature.

o/p required:

1950    24   -21
1951    22   -24

step1)
hive> use halitics;
hive> create table raw(str string);
hive> load data local inpath 'tempr.txt' into table raw;
step2)
hive> create table positives(y int, t int);
hive> create table negatives like positives;
hive> insert overwrite table positives
           select substr(str,6,4), substr(str,13,2) from raw where substr(str,13,1) != '-';
hive> insert overwrite table negatives
           select substr(str,6,4), substr(str,13,3) from raw where substr(str,13,1) = '-';
step3)
 hive> create table target(y int, t int);
 hive> insert overwrite table target
             select * from (
                  select * from positives
                       union all
                  select * from negatives ) tab;
 hive>  create table result(y int, max int, min int);
 hive> insert overwrite table result
              select  y, max(t), min(t) from target group by y;
 hive> select * from result;
 o/p -->
     
1950    24   -21
1951    22   -24
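
For reference, the same result can also be produced without the separate positives/negatives tables by handling the sign in a single expression (a sketch, assuming the same fixed-width layout as above):

hive> select substr(str,6,4) as y,
             cast(case when substr(str,13,1) = '-'
                       then substr(str,13,3)
                       else substr(str,13,2) end as int) as t
      from raw;

Grouping this query by y and taking max(t) and min(t) gives the same result table.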



Temperature data processing using Hive



--------------------------------------------------------------------------------------------------



input file :   temperature.txt

xxxxx1950xxx20xx
xxxxx1950xxx24xx
xxxxx1950xxx21xx
xxxxx1951xxx22xx
xxxxx1951xxx19xx


year starting position in each line  :  6th column
length of the year  :  4
temperature starting position in each line  :  13th column
length of the temperature  :  2

based on these clues, first we need to separate the year and the temperature from the raw input table.
then we can perform the required analytics.

step1.

hive> use halitics;
hive> create table raw(str string);
hive> load data local inpath 'temperature.txt' into table raw;
hive> select * from  raw;

o/p --->
xxxxx1950xxx20xx
xxxxx1950xxx24xx
xxxxx1950xxx21xx
xxxxx1951xxx22xx
xxxxx1951xxx19xx

step2.

hive> create table tempr(y int, t int);
hive> insert overwrite table tempr
           select substr(str,6,4), substr(str,13,2) from raw;
hive> select * from tempr;

o/p--->

1950     20
1950     24
1950     21
1951     22
1951     19

now the data has become structured.


step3.

hive> create table result(y int, max int);
hive> insert overwrite table result
              select y, max(t) from tempr  group by y;
hive>  select * from result;
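
o/p ---> (expected, from the sample rows above)

1950     24
1951     22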



step4.

hive> create table results(y int, max int, min int);
hive> insert overwrite table results
              select y, max(t) , min(t) from tempr  group by y;
hive>  select * from results;
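
o/p ---> (expected, from the sample rows above)

1950     24     20
1951     22     19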









Appending rows of one table to another table.





---------------------------------------------------------------------------------------------------


tables:  emp1 and emp2

if you directly use the following statement, the emp1 table's data will be overwritten.

hive>  insert  overwrite table emp1
           select * from emp2;

(we used * because emp1 and emp2 have the same schema).

But I want to append rows of emp2  to emp1.

hive> insert overwrite table emp1
            select  * from (
                select  * from emp1
                        union all
                select * from emp2) e;

note: in Hive, a union must be used inside a subquery.

but the above process is a bad approach.

I want to append only females of emp2 to emp1.

hive> insert overwrite table emp1
            select  * from (
                select  * from emp1
                        union all
                select * from emp2  where sex='f' ) e;


but the above two ways are inefficient,

because, for example, emp1 has 10 lakh rows and emp2 has 100 rows.
  to append the 100 rows of emp2, the union query has to read the 10 lakh rows of emp1 plus the 100 rows of emp2.

I want to append the 100 rows without reading the 10 lakh rows of emp1....

emp1's backend hdfs directory is...
     /user/hive/warehouse/halitics.db/emp1
        suppose the file loaded into directory is  emp.txt

emp2's backend  hdfs directory is..
      /user/hive/warehouse/halitics.db/emp2
        suppose the file loaded into directory is  emp2.txt

now   I am going to append emp2 data to emp1.

from the command prompt,

$ hadoop fs -cp /user/hive/warehouse/halitics.db/emp2/emp2.txt     /user/hive/warehouse/halitics.db/emp1

now,    /halitics.db/emp1 directory has  emp.txt and emp2.txt.

   This is an indirect way of loading and appending data into a table, but it is a better approach than the Hive union queries above.
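
note: if your Hive version is 0.8.0 or later, insert into also appends rows without rewriting emp1 and without touching HDFS directly; for example, to append only the females of emp2:

hive> insert into table emp1
            select * from emp2 where sex='f';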





 
  


Hive table to table copy


__________________________________________________________




1)  hive>  insert overwrite table  tab1
                  select  *  from tab2;

     all  rows  of tab2 will be loaded to tab1.

2)  hive>  insert overwrite table tab1
                 select * from tab2 where   a>100;

  only the rows matching the criteria will be loaded into tab1.

3) tabx has ---->   a int, b int, c int, d int columns
   taby has ---->   a int, d int columns

 now, how do we load only the a and d columns of taby into tabx?

   hive> insert overwrite  table tabx
               select  a, 0 as b, 0 as c, d from   taby;
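
   The opposite direction (loading only the common columns into the smaller table) is just a projection; a sketch, assuming the same schemas as above:

   hive> insert overwrite table taby
               select a, d from tabx;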


   


Loading data into Hive tables

_______________________________________________________________



input file :  emp.txt

_____________________
101,amar,m,20000,hyd
102,amala,f,30000,pune
103,siva,m,40000,hyd
104,sivani,f,50000,hyd
105,hari,m,40000,pune
________________________

loading data  from local file to  hive table.


hive> create table  halitics.emp(ecode string, ename string, sex string, esal int, city string)
      > row format  delimited  fields terminated by ',';

hive> Load data local inpath 'emp.txt' into table halitics.emp;

note: in the above example, halitics is the database.
    if you have already selected the database, there is no need to use the database name with the table name:

hive>use halitics;
hive>Load  data  local  inpath  'emp.txt' into table  emp;

if you load again....

hive > Load  data  local  inpath  'emp2.txt' into table  emp;

now    the  rows of emp2.txt will be appended to emp table.

 in hdfs, 

     /user/hive/warehouse/halitics.db/emp   directory will have  2 files.. emp.txt   and emp2.txt
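
     you can confirm this from the command prompt (a sketch, using the warehouse path shown above):

     $ hadoop fs -ls /user/hive/warehouse/halitics.db/emp

     the listing should show both emp.txt and emp2.txt.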

_____________________________________
Loading data from an hdfs file into a hive table:
______________________________________

hdfs file :    /user/training/staff.txt

hive> load data  inpath  'staff.txt' into table emp;

note: each load statement appends rows to the existing table. (A load from HDFS moves the file into the table's warehouse directory; a load with LOCAL copies it.)

_________________________________
Overwriting table data with the load statement
___________________________________

hive>  load  data local inpath 'emp.txt'   overwrite  into table  emp;

now emp table will contain  only  emp.txt data.







Performing Transformations Using MapReduce (Map Only)



__________________________________________________________________
input file :  emp.txt

_____________________
101,amar,m,20000,hyd
102,amala,f,30000,pune
103,siva,m,40000,hyd
104,sivani,f,50000,hyd
105,hari,m,40000,pune
____________________

In the input file, the 3rd field is sex and the 4th field is salary.
I want to transform "m" into "Male" (and "f" into "Female"),
and salary into grades A, B, C with the following criteria:

if salary is < 30000                 ---> C
if salary is >= 30000 and < 50000    ---> B
if salary is >= 50000                ---> A

In this case also we don't need a Reducer. Output expected:

 101,amar,Male,C,hyd
102,amala,Female,B,pune
103,siva,Male,B,hyd
104,sivani,Female,A,hyd
105,hari,Male,B,pune



package my.map.red;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class  Transformation{
  public static class Map1 extends Mapper<LongWritable,Text,Text,Text>  {
         public void map(LongWritable k, Text v, Context con)
            throws IOException, InterruptedException{
                           String line=v.toString();
                           String[] words=line.split(",");
                           String sex=words[2];
                           int sal=Integer.parseInt(words[3]);
                           String grade=new String();
                          if (sal>=50000)
                                grade="A";
                          else if(sal>=30000)
                                    grade="B";
                                else
                                    grade="C";
                         if(sex.matches("f"))
                               sex="Female";
                         else
                                sex="Male";
              String newline=words[0]+","+words[1]+","+sex+","+grade+","+words[4];

                           con.write(new Text(newline), new Text());
              }
  }
  public static void main(String[] args) throws Exception  {
              Configuration c=new Configuration();
              String[] files=new GenericOptionsParser(c,args).getRemainingArgs();
              Path p1=new Path(files[0]);
              Path p2=new Path(files[1]);
              Job j = new Job(c,"trans");
              j.setJarByClass(Transformation.class);
              j.setMapperClass(Map1.class);
              j.setNumReduceTasks(0);
              j.setOutputKeyClass(Text.class);
              j.setOutputValueClass(Text.class);
              FileInputFormat.addInputPath(j,p1);
               FileOutputFormat.setOutputPath(j, p2);
                System.exit(j.waitForCompletion(true) ? 0:1);
             
  }

}
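
To run the job (a sketch; the jar name and the HDFS paths are assumptions, not from the original post):

$ hadoop jar trans.jar my.map.red.Transformation emp.txt transout

GenericOptionsParser in main() treats the first remaining argument as the input path and the second as the output directory.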


Generating new Columns Using Map Reduce (Map Only)


___________________________________________________________________________

input file :  emp.txt

_____________________
101,amar,m,20000,hyd
102,amala,f,30000,pune
103,siva,m,40000,hyd
104,sivani,f,50000,hyd
105,hari,m,40000,pune
____________________

output expected:

From the input file, the 4th field is salary. I want to generate tax and netsal fields.
tax is applied at 10%.
netsal = sal - tax.

101,amar,m,20000,2000,18000,hyd
102,amala,f,30000,3000,27000,pune
103,siva,m,40000,4000,36000,hyd
104,sivani,f,50000,5000,45000,hyd
105,hari,m,40000,4000,36000,pune



package my.map.red;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class  GenerateColumns{
  public static class Map1 extends Mapper<LongWritable,Text,Text,Text>  {
         public void map(LongWritable k, Text v, Context con)
            throws IOException, InterruptedException{
                           String line=v.toString();
                           String[] words=line.split(",");
                           int sal=Integer.parseInt(words[3]);
                           int tax = sal*10/100;
                           int netsal = sal - tax;
              String newline=words[0]+","+words[1]+","+words[2]+","+
                                 words[3]+","+tax+","+netsal+","+words[4];

                           con.write(new Text(newline), new Text());
              }
  }
  public static void main(String[] args) throws Exception  {
              Configuration c=new Configuration();
              String[] files=new GenericOptionsParser(c,args).getRemainingArgs();
              Path p1=new Path(files[0]);
              Path p2=new Path(files[1]);
              Job j = new Job(c,"GeneraterColumns");
              j.setJarByClass(GenerateColumns.class);
              j.setMapperClass(Map1.class);
              j.setNumReduceTasks(0);
              j.setOutputKeyClass(Text.class);
              j.setOutputValueClass(Text.class);
              FileInputFormat.addInputPath(j,p1);
               FileOutputFormat.setOutputPath(j, p2);
                System.exit(j.waitForCompletion(true) ? 0:1);
             
  }

}


Filtering Columns Using MapReduce (Map Only)



_____________________________________________________________________________

input file :  emp.txt

_____________________
101,amar,m,20000,hyd
102,amala,f,30000,pune
103,siva,m,40000,hyd
104,sivani,f,50000,hyd
105,hari,m,40000,pune
____________________

output expected:
only the name, salary, and city fields are to be written to the output file.

 amar,20000,hyd
amala,30000,pune
siva,40000,hyd
sivani,50000,hyd
hari,40000,pune



package my.map.red;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class  FilterColumns{
  public static class Map1 extends Mapper<LongWritable,Text,Text,Text>  {
              public void map(LongWritable k, Text v, Context con) throws IOException, InterruptedException{
                           String line=v.toString();
                           String[] words=line.split(",");
                           String newline = words[1]+","+ words[3]+","+words[4];
                           con.write(new Text(newline), new Text());
              }
  }
  public static void main(String[] args) throws Exception  {
              Configuration c=new Configuration();
              String[] files=new GenericOptionsParser(c,args).getRemainingArgs();
              Path p1=new Path(files[0]);
              Path p2=new Path(files[1]);
              Job j = new Job(c,"FilterColumns");
              j.setJarByClass(FilterColumns.class);
              j.setMapperClass(Map1.class);
              j.setNumReduceTasks(0);
              j.setOutputKeyClass(Text.class);
              j.setOutputValueClass(Text.class);
              FileInputFormat.addInputPath(j,p1);
               FileOutputFormat.setOutputPath(j, p2);
                System.exit(j.waitForCompletion(true) ? 0:1);
             
  }

}


Filtering rows using MapReduce (Map only)



_________________________________________________________________________________
input file :  emp.txt

_____________________
101,amar,m,20000,hyd
102,amala,f,30000,pune
103,siva,m,40000,hyd
104,sivani,f,50000,hyd
105,hari,m,40000,pune
____________________

output expected:
only female rows to be written into output file.
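
from the sample input above, that means only these two rows:

102,amala,f,30000,pune
104,sivani,f,50000,hyd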



package my.map.red;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class RowFilter{
  public static class Map1 extends Mapper<LongWritable,Text,Text,Text>  {
              public void map(LongWritable k, Text v, Context con) throws IOException, InterruptedException{
                           String line=v.toString();
                           String[] words=line.split(",");
                           String sex=words[2];   // the 3rd field (index 2) is sex
                           if(sex.matches("f"))
                                   con.write(v, new Text());
              }
  }
  public static void main(String[] args) throws Exception  {
              Configuration c=new Configuration();
              String[] files=new GenericOptionsParser(c,args).getRemainingArgs();
              Path p1=new Path(files[0]);
              Path p2=new Path(files[1]);
              Job j = new Job(c,"RowFilter");
              j.setJarByClass(RowFilter.class);
              j.setMapperClass(Map1.class);
              j.setNumReduceTasks(0);
              j.setOutputKeyClass(Text.class);
              j.setOutputValueClass(Text.class);
              FileInputFormat.addInputPath(j,p1);
               FileOutputFormat.setOutputPath(j, p2);
                System.exit(j.waitForCompletion(true) ? 0:1);
             
  }

}


Multiple Input Files - Example Program




________________________________________________________________________

File1:emp1.txt
................
101,f,3000
102,m,4000
103,f,5000
104,m,5000
105,m,9000
................
File2:emp2.txt
................
201,aaaa,m,10000,11
202,b,m,30000,11
203,c,f,6000,14
204,dd,f,90000,14
205,ee,m,10000,13
206,ff,f,10000,13
207,mm,m,30000,15
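
Goal: read both files (note that they have different layouts: sex is the 2nd field in emp1.txt and the 3rd field in emp2.txt), extract sex and salary from each record, and total the salary per sex.

expected o/p (computed from the sample rows above):

f       114000
m       98000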

.............................................................................
MapReduce program:
.............................................................................
package my.map.red;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
//import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class MultipleFiles
{
  public static class Map1 extends Mapper<LongWritable,Text,Text,IntWritable>
  {
              public void map(LongWritable k, Text v, Context con) throws IOException, InterruptedException
              {
                           String line=v.toString();
                           String[] words=line.split(",");
                           String sex=words[1];
                           int sal=Integer.parseInt(words[2]);
                           con.write(new Text(sex), new IntWritable(sal));
              }
  }
  public static class Map2 extends Mapper<LongWritable,Text,Text,IntWritable>
  {
              public void map(LongWritable k, Text v, Context con) throws IOException, InterruptedException
              {
                           String line=v.toString();
                           String[] words=line.split(",");
                           String sex=words[2];
                           int sal=Integer.parseInt(words[3]);
                           con.write(new Text(sex), new IntWritable(sal));
              }
  }
  public static class Red extends Reducer<Text,IntWritable,Text,IntWritable>
  {
               public void reduce(Text sex, Iterable<IntWritable> salaries, Context con)
                throws IOException , InterruptedException
                {
                            int tot=0;
                            for(IntWritable sal:salaries)
                            {
                                    tot+=sal.get();
                            }
                            con.write(sex, new IntWritable(tot));
                       
                }
   }
  public static void main(String[] args) throws Exception
  {
              Configuration c=new Configuration();
              String[] files=new GenericOptionsParser(c,args).getRemainingArgs();
              Path p1=new Path(files[0]);
              Path p2=new Path(files[1]);
              Path p3=new Path(files[2]);
              Job j = new Job(c,"multiple");
              j.setJarByClass(MultipleFiles.class);
              // no j.setMapperClass() here: MultipleInputs assigns a mapper per input path below
              j.setReducerClass(Red.class);
              j.setOutputKeyClass(Text.class);
              j.setOutputValueClass(IntWritable.class);
              MultipleInputs.addInputPath(j, p1, TextInputFormat.class, Map1.class);
              MultipleInputs.addInputPath(j,p2, TextInputFormat.class, Map2.class);
      FileOutputFormat.setOutputPath(j, p3);
      System.exit(j.waitForCompletion(true) ? 0:1);
             
  }

}
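
To run the job with both inputs (a sketch; the jar name and paths are assumptions):

$ hadoop jar multi.jar my.map.red.MultipleFiles emp1.txt emp2.txt multiout

The first two arguments become p1 and p2 (one mapper each via MultipleInputs) and the third becomes the output directory p3.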


MapReduce example: number of words per line


_______________________________________________________________________

This program finds the number of words in each line.

input hdfs file:

    mydir/file1.txt
_______________________________

hadoop execution model is mapreduce
mapreduce is a backend business process logic
advantage of mapreduce is it will not sort raw data
but output of the mapper will be sorted

________________________________________________

o/p by this program:

  line1  5
  line2  7
  line3  10
  line4  8

__________________________________________________

 package my.map.red.app;

 import java.io.IOException;
 import java.util.StringTokenizer;

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.io.IntWritable;
 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.mapreduce.Mapper;
 import org.apache.hadoop.mapreduce.Reducer;
 import org.apache.hadoop.mapreduce.Job;
 import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
 import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
 import org.apache.hadoop.util.GenericOptionsParser;

 public class LineWordsCount
 {
    public static class MapForLineWordsCount extends Mapper<LongWritable, Text, Text, IntWritable>
    {
           int lineno = 1;   // record counter within this map task
           public void map(LongWritable key, Text value, Context con) throws IOException, InterruptedException
           {
               String line = value.toString();

               StringTokenizer token = new StringTokenizer(line);
               while(token.hasMoreTokens())
               {
                  String word = token.nextToken();
                  String l = "line"+lineno;
                  Text outputKey = new Text(l);
                  IntWritable outputValue = new IntWritable(1);
                  con.write(outputKey, outputValue);
               }

               lineno++;

            } // end of map()
    } //end of Mapper Class

  /*
     output of the mapper phase :
   
      <line1, <1,1,1,1,1>>
      <line2, <1,1,1,1,1,1,1>>
      <line3, <1,1,1,1,1,1,1,1,1,1>>
      <line4, <1,1,1,1,1,1,1,1>>

  */

   public static class ReduceForLineWordsCount extends Reducer<Text, IntWritable, Text, IntWritable>
   {
          public void reduce(Text line, Iterable<IntWritable> values, Context con) throws IOException, InterruptedException
          {

               int sum = 0;

               for(IntWritable value : values)
               {

                   sum += value.get();

               }

               con.write(line , new IntWritable(sum));

          } // end of reduce()
   } // end of Reducer class
/*

 output of the reducer

   line1 5
   line2 7
   line3 10
   line4 8

*/

 // job definition

   public static void  main(String[] args) throws Exception
   {

           Configuration c = new Configuration();

           String[] files = new GenericOptionsParser(c, args).getRemainingArgs();

           Path input = new Path(files[0]);

           Path output = new Path(files[1]);

           Job j = new Job(c, "Linewordscount");

           j.setJarByClass(LineWordsCount.class);

           j.setMapperClass(MapForLineWordsCount.class);

           j.setCombinerClass(ReduceForLineWordsCount.class);

           j.setReducerClass(ReduceForLineWordsCount.class);

           j.setOutputKeyClass(Text.class);

           j.setOutputValueClass(IntWritable.class);

           FileInputFormat.addInputPath(j, input);

           FileOutputFormat.setOutputPath(j, output);

           System.exit(j.waitForCompletion(true) ? 0:1);

   } // end of main()

} // end of main class
