Temperature data (with positive and negative air temperatures) processing with Hive

we also offer online and classroom trainings
we support POCs
author: Bharat (Sree Ram)
contact: 04042026071
_____________________________________________________________________



input file : tempr.txt

xxxxx1950xxx-20xx
xxxxx1950xxx24xx
xxxxx1950xxx-21xx
xxxxx1950xxx10xx
xxxxx1951xxx-22xx
xxxxx1951xxx19xx
xxxxx1951xxx-24xx
xxxxx1951xxx22xx

I want to find, for each year, the maximum temperature and the minimum temperature.

o/p required:

1950    24   -21
1951    22   -24

step1)
hive> use halitics;
hive> create table raw(str string);
hive> load data local inpath 'tempr.txt' into table raw;
step2)
hive> create table positives(y int, t int);
hive> create table negatives like positives;
hive> insert overwrite table positives
           select substr(str,6,4), substr(str,13,2) from raw where substr(str,13,1) != '-';
hive> insert overwrite table negatives
           select substr(str,6,4), substr(str,13,3) from raw where substr(str,13,1) = '-';
step3)
 hive> create table target(y int, t int);
 hive> insert overwrite table target
             select * from (
                  select * from positives
                       union all
                  select * from negatives ) tab;
 hive>  create table result(y int, max int, min int);
 hive> insert overwrite table result
              select  y, max(t), min(t) from target group by y;
 hive> select * from result;
 o/p -->
     
1950    24   -21
1951    22   -24
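
For reference, the same result can also be produced without the separate positives/negatives tables by handling the sign in a single expression (a sketch, assuming the same fixed-width layout as above):

hive> select substr(str,6,4) as y,
             cast(case when substr(str,13,1) = '-'
                       then substr(str,13,3)
                       else substr(str,13,2) end as int) as t
      from raw;

Grouping this query by y and taking max(t) and min(t) gives the same result table.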



Temperature data processing using Hive



--------------------------------------------------------------------------------------------------



input file :   temperature.txt

xxxxx1950xxx20xx
xxxxx1950xxx24xx
xxxxx1950xxx21xx
xxxxx1951xxx22xx
xxxxx1951xxx19xx


year starting position in each line  :  6th column
length of the year  :  4
temperature starting position in each line  :  13th column
length of the temperature  :  2

based on these clues, first we need to separate the year and the temperature from the raw input table.
then we can perform the required analytics.

step1.

hive> use halitics;
hive> create table raw(str string);
hive> load data local inpath 'temperature.txt' into table raw;
hive> select * from  raw;

o/p --->
xxxxx1950xxx20xx
xxxxx1950xxx24xx
xxxxx1950xxx21xx
xxxxx1951xxx22xx
xxxxx1951xxx19xx

step2.

hive> create table tempr(y int, t int);
hive> insert overwrite table tempr
           select substr(str,6,4), substr(str,13,2) from raw;
hive> select * from tempr;

o/p--->

1950     20
1950     24
1950     21
1951     22
1951     19

now the data has become structured.


step3.

hive> create table result(y int, max int);
hive> insert overwrite table result
              select y, max(t) from tempr  group by y;
hive>  select * from result;
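
o/p ---> (expected, from the sample rows above)

1950     24
1951     22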



step4.

hive> create table results(y int, max int, min int);
hive> insert overwrite table results
              select y, max(t) , min(t) from tempr  group by y;
hive>  select * from results;
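
o/p ---> (expected, from the sample rows above)

1950     24     20
1951     22     19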









Appending rows of one table to another table.





---------------------------------------------------------------------------------------------------


tables:  emp1 and emp2

if you directly use the following statement, the emp1 table's data will be overwritten.

hive>  insert  overwrite table emp1
           select * from emp2;

(we used * because emp1 and emp2 have the same schema).

But I want to append rows of emp2  to emp1.

hive> insert overwrite table emp1
            select  * from (
                select  * from emp1
                        union all
                select * from emp2) e;

note: in Hive, a union must be used inside a subquery.

but the above process is a bad approach.

I want to append only females of emp2 to emp1.

hive> insert overwrite table emp1
            select  * from (
                select  * from emp1
                        union all
                select * from emp2  where sex='f' ) e;


but the above two ways are inefficient,

because, for example, emp1 has 10 lakh rows and emp2 has 100 rows.
  to append the 100 rows of emp2, the union query has to read the 10 lakh rows of emp1 plus the 100 rows of emp2.

I want to append the 100 rows without reading the 10 lakh rows of emp1....

emp1's backend hdfs directory is...
     /user/hive/warehouse/halitics.db/emp1
        suppose the file loaded into directory is  emp.txt

emp2's backend  hdfs directory is..
      /user/hive/warehouse/halitics.db/emp2
        suppose the file loaded into directory is  emp2.txt

now   I am going to append emp2 data to emp1.

from the command prompt,

$ hadoop fs -cp /user/hive/warehouse/halitics.db/emp2/emp2.txt     /user/hive/warehouse/halitics.db/emp1

now,    /halitics.db/emp1 directory has  emp.txt and emp2.txt.

   This is an indirect way of loading and appending data into a table, but it is a better approach than the Hive union queries above.
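
note: if your Hive version is 0.8.0 or later, insert into also appends rows without rewriting emp1 and without touching HDFS directly; for example, to append only the females of emp2:

hive> insert into table emp1
            select * from emp2 where sex='f';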





 
  


Hive table to table copy


__________________________________________________________




1)  hive>  insert overwrite table  tab1
                  select  *  from tab2;

     all  rows  of tab2 will be loaded to tab1.

2)  hive>  insert overwrite table tab1
                 select * from tab2 where   a>100;

  only the rows matching the criteria will be loaded into tab1.

3) tabx has ---->   a int, b int, c int, d int columns
   taby has ---->   a int, d int columns

 now, how do we load only the a and d columns of taby into tabx?

   hive> insert overwrite  table tabx
               select  a, 0 as b, 0 as c, d from   taby;
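
   The opposite direction (loading only the common columns into the smaller table) is just a projection; a sketch, assuming the same schemas as above:

   hive> insert overwrite table taby
               select a, d from tabx;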


   


Loading data into Hive tables

_______________________________________________________________



input file :  emp.txt

_____________________
101,amar,m,20000,hyd
102,amala,f,30000,pune
103,siva,m,40000,hyd
104,sivani,f,50000,hyd
105,hari,m,40000,pune
________________________

loading data  from local file to  hive table.


hive> create table  halitics.emp(ecode string, ename string, sex string, esal int, city string)
      > row format  delimited  fields terminated by ',';

hive> Load data local inpath 'emp.txt' into table halitics.emp;

note: in the above example, halitics is the database.
    if you have already selected the database, there is no need to use the database name with the table name:

hive>use halitics;
hive>Load  data  local  inpath  'emp.txt' into table  emp;

if you load again....

hive > Load  data  local  inpath  'emp2.txt' into table  emp;

now    the  rows of emp2.txt will be appended to emp table.

 in hdfs, 

     /user/hive/warehouse/halitics.db/emp   directory will have  2 files.. emp.txt   and emp2.txt
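
     you can confirm this from the command prompt (a sketch, using the warehouse path shown above):

     $ hadoop fs -ls /user/hive/warehouse/halitics.db/emp

     the listing should show both emp.txt and emp2.txt.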

_____________________________________
Loading data from an hdfs file into a hive table:
______________________________________

hdfs file :    /user/training/staff.txt

hive> load data  inpath  'staff.txt' into table emp;

note: each load statement appends rows to the existing table. (A load from HDFS moves the file into the table's warehouse directory; a load with LOCAL copies it.)

_________________________________
Overwriting table data with the load statement
___________________________________

hive>  load  data local inpath 'emp.txt'   overwrite  into table  emp;

now emp table will contain  only  emp.txt data.







Performing Transformations Using MapReduce (Map Only)



__________________________________________________________________
input file :  emp.txt

_____________________
101,amar,m,20000,hyd
102,amala,f,30000,pune
103,siva,m,40000,hyd
104,sivani,f,50000,hyd
105,hari,m,40000,pune
____________________

In the input file, the 3rd field is sex and the 4th field is salary.
I want to transform "m" into "Male" (and "f" into "Female"),
and salary into grades A, B, C with the following criteria:

if salary is < 30000                 ---> C
if salary is >= 30000 and < 50000    ---> B
if salary is >= 50000                ---> A

In this case also we don't need a Reducer. Output expected:

 101,amar,Male,C,hyd
102,amala,Female,B,pune
103,siva,Male,B,hyd
104,sivani,Female,A,hyd
105,hari,Male,B,pune



package my.map.red;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class  Transformation{
  public static class Map1 extends Mapper<LongWritable,Text,Text,Text>  {
         public void map(LongWritable k, Text v, Context con)
            throws IOException, InterruptedException{
                           String line=v.toString();
                           String[] words=line.split(",");
                           String sex=words[2];
                           int sal=Integer.parseInt(words[3]);
                           String grade=new String();
                          if (sal>=50000)
                                grade="A";
                          else if(sal>=30000)
                                    grade="B";
                                else
                                    grade="C";
                         if(sex.matches("f"))
                               sex="Female";
                         else
                                sex="Male";
              String newline=words[0]+","+words[1]+","+sex+","+grade+","+words[4];

                           con.write(new Text(newline), new Text());
              }
  }
  public static void main(String[] args) throws Exception  {
              Configuration c=new Configuration();
              String[] files=new GenericOptionsParser(c,args).getRemainingArgs();
              Path p1=new Path(files[0]);
              Path p2=new Path(files[1]);
              Job j = new Job(c,"trans");
              j.setJarByClass(Transformation.class);
              j.setMapperClass(Map1.class);
              j.setNumReduceTasks(0);
              j.setOutputKeyClass(Text.class);
              j.setOutputValueClass(Text.class);
              FileInputFormat.addInputPath(j,p1);
               FileOutputFormat.setOutputPath(j, p2);
                System.exit(j.waitForCompletion(true) ? 0:1);
             
  }

}
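
To run the job (a sketch; the jar name and the HDFS paths are assumptions, not from the original post):

$ hadoop jar trans.jar my.map.red.Transformation emp.txt transout

GenericOptionsParser in main() treats the first remaining argument as the input path and the second as the output directory.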


Generating new Columns Using Map Reduce (Map Only)


___________________________________________________________________________

input file :  emp.txt

_____________________
101,amar,m,20000,hyd
102,amala,f,30000,pune
103,siva,m,40000,hyd
104,sivani,f,50000,hyd
105,hari,m,40000,pune
____________________

output expected:

From the input file, the 4th field is salary. I want to generate tax and netsal fields.
tax is applied at 10%.
netsal = sal - tax.

101,amar,m,20000,2000,18000,hyd
102,amala,f,30000,3000,27000,pune
103,siva,m,40000,4000,36000,hyd
104,sivani,f,50000,5000,45000,hyd
105,hari,m,40000,4000,36000,pune



package my.map.red;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class  GenerateColumns{
  public static class Map1 extends Mapper<LongWritable,Text,Text,Text>  {
         public void map(LongWritable k, Text v, Context con)
            throws IOException, InterruptedException{
                           String line=v.toString();
                           String[] words=line.split(",");
                           int sal=Integer.parseInt(words[3]);
                           int tax = sal*10/100;
                           int netsal = sal - tax;
              String newline=words[0]+","+words[1]+","+words[2]+","+
                                 words[3]+","+tax+","+netsal+","+words[4];

                           con.write(new Text(newline), new Text());
              }
  }
  public static void main(String[] args) throws Exception  {
              Configuration c=new Configuration();
              String[] files=new GenericOptionsParser(c,args).getRemainingArgs();
              Path p1=new Path(files[0]);
              Path p2=new Path(files[1]);
              Job j = new Job(c,"GeneraterColumns");
              j.setJarByClass(GenerateColumns.class);
              j.setMapperClass(Map1.class);
              j.setNumReduceTasks(0);
              j.setOutputKeyClass(Text.class);
              j.setOutputValueClass(Text.class);
              FileInputFormat.addInputPath(j,p1);
               FileOutputFormat.setOutputPath(j, p2);
                System.exit(j.waitForCompletion(true) ? 0:1);
             
  }

}


Filtering Columns Using MapReduce (Map Only)



_____________________________________________________________________________

input file :  emp.txt

_____________________
101,amar,m,20000,hyd
102,amala,f,30000,pune
103,siva,m,40000,hyd
104,sivani,f,50000,hyd
105,hari,m,40000,pune
____________________

output expected:
only the name, salary, and city fields are to be written to the output file.

 amar,20000,hyd
amala,30000,pune
siva,40000,hyd
sivani,50000,hyd
hari,40000,pune



package my.map.red;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class  FilterColumns{
  public static class Map1 extends Mapper<LongWritable,Text,Text,Text>  {
              public void map(LongWritable k, Text v, Context con) throws IOException, InterruptedException{
                           String line=v.toString();
                           String[] words=line.split(",");
                           String newline = words[1]+","+ words[3]+","+words[4];
                           con.write(new Text(newline), new Text());
              }
  }
  public static void main(String[] args) throws Exception  {
              Configuration c=new Configuration();
              String[] files=new GenericOptionsParser(c,args).getRemainingArgs();
              Path p1=new Path(files[0]);
              Path p2=new Path(files[1]);
              Job j = new Job(c,"FilterColumns");
              j.setJarByClass(FilterColumns.class);
              j.setMapperClass(Map1.class);
              j.setNumReduceTasks(0);
              j.setOutputKeyClass(Text.class);
              j.setOutputValueClass(Text.class);
              FileInputFormat.addInputPath(j,p1);
               FileOutputFormat.setOutputPath(j, p2);
                System.exit(j.waitForCompletion(true) ? 0:1);
             
  }

}


Filtering rows using MapReduce (Map only)



_________________________________________________________________________________
input file :  emp.txt

_____________________
101,amar,m,20000,hyd
102,amala,f,30000,pune
103,siva,m,40000,hyd
104,sivani,f,50000,hyd
105,hari,m,40000,pune
____________________

output expected:
only female rows to be written into output file.
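
from the sample input above, that means only these two rows:

102,amala,f,30000,pune
104,sivani,f,50000,hyd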



package my.map.red;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class RowFilter{
  public static class Map1 extends Mapper<LongWritable,Text,Text,Text>  {
              public void map(LongWritable k, Text v, Context con) throws IOException, InterruptedException{
                           String line=v.toString();
                           String[] words=line.split(",");
                           String sex=words[2];   // the 3rd field (index 2) is sex
                           if(sex.matches("f"))
                                   con.write(v, new Text());
              }
  }
  public static void main(String[] args) throws Exception  {
              Configuration c=new Configuration();
              String[] files=new GenericOptionsParser(c,args).getRemainingArgs();
              Path p1=new Path(files[0]);
              Path p2=new Path(files[1]);
              Job j = new Job(c,"RowFilter");
              j.setJarByClass(RowFilter.class);
              j.setMapperClass(Map1.class);
              j.setNumReduceTasks(0);
              j.setOutputKeyClass(Text.class);
              j.setOutputValueClass(Text.class);
              FileInputFormat.addInputPath(j,p1);
               FileOutputFormat.setOutputPath(j, p2);
                System.exit(j.waitForCompletion(true) ? 0:1);
             
  }

}


Multiple Input Files - Example Program




________________________________________________________________________

File1:emp1.txt
................
101,f,3000
102,m,4000
103,f,5000
104,m,5000
105,m,9000
................
File2:emp2.txt
................
201,aaaa,m,10000,11
202,b,m,30000,11
203,c,f,6000,14
204,dd,f,90000,14
205,ee,m,10000,13
206,ff,f,10000,13
207,mm,m,30000,15
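
Goal: read both files (note that they have different layouts: sex is the 2nd field in emp1.txt and the 3rd field in emp2.txt), extract sex and salary from each record, and total the salary per sex.

expected o/p (computed from the sample rows above):

f       114000
m       98000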

.............................................................................
MapReduce program:
.............................................................................
package my.map.red;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
//import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class MultipleFiles
{
  public static class Map1 extends Mapper<LongWritable,Text,Text,IntWritable>
  {
              public void map(LongWritable k, Text v, Context con) throws IOException, InterruptedException
              {
                           String line=v.toString();
                           String[] words=line.split(",");
                           String sex=words[1];
                           int sal=Integer.parseInt(words[2]);
                           con.write(new Text(sex), new IntWritable(sal));
              }
  }
  public static class Map2 extends Mapper<LongWritable,Text,Text,IntWritable>
  {
              public void map(LongWritable k, Text v, Context con) throws IOException, InterruptedException
              {
                           String line=v.toString();
                           String[] words=line.split(",");
                           String sex=words[2];
                           int sal=Integer.parseInt(words[3]);
                           con.write(new Text(sex), new IntWritable(sal));
              }
  }
  public static class Red extends Reducer<Text,IntWritable,Text,IntWritable>
  {
               public void reduce(Text sex, Iterable<IntWritable> salaries, Context con)
                throws IOException , InterruptedException
                {
                            int tot=0;
                            for(IntWritable sal:salaries)
                            {
                                    tot+=sal.get();
                            }
                            con.write(sex, new IntWritable(tot));
                       
                }
   }
  public static void main(String[] args) throws Exception
  {
              Configuration c=new Configuration();
              String[] files=new GenericOptionsParser(c,args).getRemainingArgs();
              Path p1=new Path(files[0]);
              Path p2=new Path(files[1]);
              Path p3=new Path(files[2]);
              Job j = new Job(c,"multiple");
              j.setJarByClass(MultipleFiles.class);
              // no j.setMapperClass() here: MultipleInputs assigns a mapper per input path below
              j.setReducerClass(Red.class);
              j.setOutputKeyClass(Text.class);
              j.setOutputValueClass(IntWritable.class);
              MultipleInputs.addInputPath(j, p1, TextInputFormat.class, Map1.class);
              MultipleInputs.addInputPath(j,p2, TextInputFormat.class, Map2.class);
      FileOutputFormat.setOutputPath(j, p3);
      System.exit(j.waitForCompletion(true) ? 0:1);
             
  }

}
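
To run the job with both inputs (a sketch; the jar name and paths are assumptions):

$ hadoop jar multi.jar my.map.red.MultipleFiles emp1.txt emp2.txt multiout

The first two arguments become p1 and p2 (one mapper each via MultipleInputs) and the third becomes the output directory p3.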


MapReduce example: number of words per line


_______________________________________________________________________

This program finds the number of words in each line.

input hdfs file:

    mydir/file1.txt
_______________________________

hadoop execution model is mapreduce
mapreduce is a backend business process logic
advantage of mapreduce is it will not sort raw data
but output of the mapper will be sorted

________________________________________________

o/p by this program:

  line1  5
  line2  7
  line3  10
  line4  8

__________________________________________________

 package my.map.red.app;

 import java.io.IOException;
 import java.util.StringTokenizer;

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.io.IntWritable;
 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.mapreduce.Mapper;
 import org.apache.hadoop.mapreduce.Reducer;
 import org.apache.hadoop.mapreduce.Job;
 import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
 import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
 import org.apache.hadoop.util.GenericOptionsParser;

 public class LineWordsCount
 {
    public static class MapForLineWordsCount extends Mapper<LongWritable, Text, Text, IntWritable>
    {
           int lineno = 1;   // record counter within this map task
           public void map(LongWritable key, Text value, Context con) throws IOException, InterruptedException
           {
               String line = value.toString();

               StringTokenizer token = new StringTokenizer(line);
               while(token.hasMoreTokens())
               {
                  String word = token.nextToken();
                  String l = "line"+lineno;
                  Text outputKey = new Text(l);
                  IntWritable outputValue = new IntWritable(1);
                  con.write(outputKey, outputValue);
               }

               lineno++;

            } // end of map()
    } //end of Mapper Class

  /*
     output of the mapper phase :
   
      <line1, <1,1,1,1,1>>
      <line2, <1,1,1,1,1,1,1>>
      <line3, <1,1,1,1,1,1,1,1,1,1>>
      <line4, <1,1,1,1,1,1,1,1>>

  */

   public static class ReduceForLineWordsCount extends Reducer<Text, IntWritable, Text, IntWritable>
   {
          public void reduce(Text line, Iterable<IntWritable> values, Context con) throws IOException, InterruptedException
          {

               int sum = 0;

               for(IntWritable value : values)
               {

                   sum += value.get();

               }

               con.write(line , new IntWritable(sum));

          } // end of reduce()
   } // end of Reducer class
/*

 output of the reducer

   line1 5
   line2 7
   line3 10
   line4 8

*/

 // job definition

   public static void  main(String[] args) throws Exception
   {

           Configuration c = new Configuration();

           String[] files = new GenericOptionsParser(c, args).getRemainingArgs();

           Path input = new Path(files[0]);

           Path output = new Path(files[1]);

           Job j = new Job(c, "Linewordscount");

           j.setJarByClass(LineWordsCount.class);

           j.setMapperClass(MapForLineWordsCount.class);

           j.setCombinerClass(ReduceForLineWordsCount.class);

           j.setReducerClass(ReduceForLineWordsCount.class);

           j.setOutputKeyClass(Text.class);

           j.setOutputValueClass(IntWritable.class);

           FileInputFormat.addInputPath(j, input);

           FileOutputFormat.setOutputPath(j, output);

           System.exit(j.waitForCompletion(true) ? 0:1);

   } // end of main()

} // end of main class
