Appending rows of one table to another table.
we also offer , online and classroom trainings
we support in POC
author: Bharat (sree ram)
contact : 04042026071
---------------------------------------------------------------------------------------------------
table s : emp1 and emp2
if you directly use the following statement, table emp1 data will be overriden.
hive> insert overwrite table emp1
select * from emp2;
(we used * means,, the schema is same of emp1 and emp2 tables).
But I want to append rows of emp2 to emp1.
hive> insert overwrite table emp1
select * from (
select * from emp1
union all
select * from emp2) e;
note: in hive union should be used as sub query.
but the above process is bad way.
I want to append only females of emp2 to emp1.
hive> insert overwrite table emp1
select * from (
select * from emp1
union all
select * from emp2 where sex='f' ) e;
but the above two ways are bad ...
because,, for example , emp1 has 10lakh rows, and emp2 has 100 rows.
to append 100 rows of emp2 table the union query has to read 10lakh of emp1 and 100 emp2 rows.
I want to append 100 rows, with out reading 10 lakh rows of emp1....
emp1's backend hdfs directory is...
/user/hive/warehouse/halitics.db/emp1
suppose the file loaded into directory is emp.txt
emp2's backend hdfs directory is..
/user/hive/warehouse/halitics.db/emp2
suppose the file loaded into directory is emp2.txt
now I am going to append emp2 data to emp1.
from the command prompt,
$ hadoop fs -cp /user/hive/warehouse/halitics.db/emp2/emp2.txt /user/hive/warehouse/halitics.db/emp1
now, /halitics.db/emp1 directory has emp.txt and emp2.txt.
this , indirect way of loading and appending into table. But good way when compared with hive unions.
because,, for example , emp1 has 10lakh rows, and emp2 has 100 rows.
to append 100 rows of emp2 table the union query has to read 10lakh of emp1 and 100 emp2 rows.
I want to append 100 rows, with out reading 10 lakh rows of emp1....
emp1's backend hdfs directory is...
/user/hive/warehouse/halitics.db/emp1
suppose the file loaded into directory is emp.txt
emp2's backend hdfs directory is..
/user/hive/warehouse/halitics.db/emp2
suppose the file loaded into directory is emp2.txt
now I am going to append emp2 data to emp1.
from the command prompt,
$ hadoop fs -cp /user/hive/warehouse/halitics.db/emp2/emp2.txt /user/hive/warehouse/halitics.db/emp1
now, /halitics.db/emp1 directory has emp.txt and emp2.txt.
this , indirect way of loading and appending into table. But good way when compared with hive unions.
0 comments: