Appending rows of one table to another table in Hive.





author: Bharat (Sree Ram)
---------------------------------------------------------------------------------------------------


Tables:   emp1 and emp2
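
(For illustration only: a minimal sketch of what the two tables might look like. The column names, types and delimiter here are assumptions; the only thing that actually matters for the examples below is that emp1 and emp2 share the same schema.)

hive> create table emp1 (id int, name string, salary double, sex string, dno int)
          row format delimited fields terminated by ',';
hive> create table emp2 (id int, name string, salary double, sex string, dno int)
          row format delimited fields terminated by ',';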

If you directly use the following statement, the existing data of table emp1 will be overwritten.

hive>  insert  overwrite table emp1
           select * from emp2;

(We used * because emp1 and emp2 have the same schema.)

But I want to append rows of emp2  to emp1.

hive> insert overwrite table emp1
            select  * from (
                select  * from emp1
                        union all
                select * from emp2) e;

Note: in Hive, UNION ALL must be used inside a subquery, as above.
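
(Side note: if your Hive version is 0.8 or later, INSERT INTO TABLE appends rows instead of overwriting, so a direct append is also possible; the union workaround here assumes you are working only with INSERT OVERWRITE.)

hive> insert into table emp1
          select * from emp2;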
 
But the above process is a bad way of doing it.

Now suppose I want to append only the female employees of emp2 to emp1.

hive> insert overwrite table emp1
            select  * from (
                select  * from emp1
                        union all
                select * from emp2  where sex='f' ) e;


But the above two ways are bad...

because, for example, if emp1 has 10 lakh (1,000,000) rows and emp2 has only 100 rows,
  then to append the 100 rows of emp2, the union query has to read all 10 lakh rows of emp1 plus the 100 rows of emp2.

I want to append those 100 rows without reading the 10 lakh rows of emp1....

emp1's backend HDFS directory is
     /user/hive/warehouse/halitics.db/emp1
        suppose the file loaded into this directory is  emp.txt

emp2's backend HDFS directory is
      /user/hive/warehouse/halitics.db/emp2
        suppose the file loaded into this directory is  emp2.txt
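
(Before copying, you can list both warehouse directories to confirm the actual file names; the paths are the ones mentioned above, and the file names depend on how the data was originally loaded.)

$ hadoop fs -ls /user/hive/warehouse/halitics.db/emp1
$ hadoop fs -ls /user/hive/warehouse/halitics.db/emp2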

Now I am going to append the emp2 data to emp1.

From the command prompt:

$ hadoop fs -cp /user/hive/warehouse/halitics.db/emp2/emp2.txt     /user/hive/warehouse/halitics.db/emp1

Now the /halitics.db/emp1 directory has both emp.txt and emp2.txt.
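
(A quick sanity check after the copy: list the emp1 directory again and recount the rows from Hive. Under the 10 lakh + 100 example above, the count should grow by 100.)

$ hadoop fs -ls /user/hive/warehouse/halitics.db/emp1
hive> select count(*) from emp1;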

   This is an indirect way of loading and appending data into a table, but it is a good way when compared with the Hive unions above.
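
(One caveat: this file-level copy behaves like a proper append only when both tables store their data in the same file format with the same delimiters and column order, as in the plain text sketch above. For ORC/Parquet or partitioned tables the directory layouts differ, so this shortcut should not be used there without extra care. The storage format of each table can be checked with describe formatted.)

hive> describe formatted emp1;
hive> describe formatted emp2;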





 
  
