Hive Tables



we also offer , online and classroom trainings
we support in POC
author: Bharat (sree ram)
contact : 04042026071
___________________________________________________________________________
Hive Tables:
                Hive tables are classified into two types:
•  Inner (managed) tables
•  External tables
Inner Tables:
                When a Hive inner table is created, a directory for the table is created under the default warehouse location, /user/hive/warehouse. If a file is loaded into the table, the file is copied into the table directory.
Ex: hive> create table samp(str string);
      Now the following directory is created in HDFS:
     /user/hive/warehouse/samp
If you load a file:
                Ex: hive> load data local inpath 'file1.txt' into table samp;
In HDFS:
                /user/hive/warehouse/samp/file1.txt
                In the above path, samp is the table directory and file1.txt is the file loaded into the table. When you select rows from the table, Hive reads data from all files in the table directory.
             Ex: hive> select * from samp;
Now Hive reads data from the /user/hive/warehouse/samp/file1.txt file.
                If you drop an inner table, the table directory is also deleted from HDFS.
              That means if the table is dropped, you lose both the metadata and the data.
                                Ex: hive> drop table samp;

                                The /user/hive/warehouse/samp/ directory is deleted from HDFS.
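To make the inner-table lifecycle above concrete, here is a minimal Python sketch. It assumes a local temporary directory stands in for HDFS, and ManagedTable with its methods is a hypothetical name used only for illustration, not a Hive API. It shows the four steps described: create makes the table directory, load copies the file in, select reads every file in the directory, and drop deletes the directory together with its data.

```python
import shutil
import tempfile
from pathlib import Path


class ManagedTable:
    """Toy model of a Hive inner (managed) table.

    A local directory stands in for HDFS; this is NOT Hive's API.
    """

    def __init__(self, warehouse: Path, name: str):
        self.dir = warehouse / name   # e.g. <warehouse>/samp
        self.dir.mkdir(parents=True)  # CREATE TABLE makes the table directory

    def load(self, local_file: Path) -> None:
        # LOAD DATA LOCAL INPATH copies the file into the table directory
        shutil.copy(local_file, self.dir / local_file.name)

    def select_all(self) -> list[str]:
        # SELECT * reads the rows of every file in the table directory
        rows = []
        for f in sorted(self.dir.iterdir()):
            rows.extend(f.read_text().splitlines())
        return rows

    def drop(self) -> None:
        # DROP TABLE on an inner table deletes the directory AND the data
        shutil.rmtree(self.dir)


root = Path(tempfile.mkdtemp())
warehouse = root / "user" / "hive" / "warehouse"  # stand-in for /user/hive/warehouse
src = root / "file1.txt"
src.write_text("hello\nworld\n")

samp = ManagedTable(warehouse, "samp")
samp.load(src)
print(samp.select_all())              # ['hello', 'world']
samp.drop()
print((warehouse / "samp").exists())  # False
```

The local source file is untouched after the drop because LOAD DATA LOCAL made a copy; only the table's own directory is removed.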
External Table:
•  The table uses a custom HDFS location (given with LOCATION).
•  When the table is dropped, the backend HDFS table directory is not deleted.
That means if the table is dropped, only the metadata (the table definition in Hive) is dropped, but the data remains safely available in the HDFS directory.
Ex:
     hive> create external table sample(str string) location '/user/mydir';
Now /user/mydir is created in HDFS for the table.

If you load a file into the table: hive> load data local inpath 'samp.txt' into table sample;

In HDFS: /user/mydir/samp.txt

If you select rows from the sample table:
                hive> select * from sample;
Now Hive reads data from all files in the /user/mydir directory, here:
                /user/mydir/samp.txt
If the sample table is dropped:
                hive> drop table sample;
the sample table is deleted from Hive,
but the backend HDFS directory and its file are still available:
                /user/mydir/samp.txt
So the data can still be reused by Hive or by other Hadoop ecosystem tools.
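For contrast with the inner table, here is a minimal Python sketch of the external-table behavior described above. It assumes a local temporary directory stands in for HDFS and a plain dict stands in for the Hive metastore. ExternalTable and metastore are hypothetical names, not Hive APIs. Dropping removes only the metadata entry, and re-creating the table over the same location makes the old data visible again.

```python
import shutil
import tempfile
from pathlib import Path


class ExternalTable:
    """Toy model of a Hive external table; a local directory stands in for HDFS."""

    def __init__(self, metastore: dict, name: str, location: Path):
        self.metastore = metastore
        self.name = name
        self.location = location
        location.mkdir(parents=True, exist_ok=True)  # LOCATION may already hold data
        metastore[name] = str(location)              # table metadata only

    def load(self, local_file: Path) -> None:
        # LOAD DATA LOCAL INPATH copies the file into the custom location
        shutil.copy(local_file, self.location / local_file.name)

    def drop(self) -> None:
        # Only the metadata entry is removed; files under LOCATION are left alone
        del self.metastore[self.name]


root = Path(tempfile.mkdtemp())
metastore: dict[str, str] = {}
src = root / "samp.txt"
src.write_text("a\nb\n")

sample = ExternalTable(metastore, "sample", root / "user" / "mydir")
sample.load(src)
sample.drop()
print("sample" in metastore)                            # False
print((root / "user" / "mydir" / "samp.txt").exists())  # True

# Re-creating the table over the same location makes the data visible again
again = ExternalTable(metastore, "sample", root / "user" / "mydir")
print((again.location / "samp.txt").read_text().splitlines())  # ['a', 'b']
```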

Summary
Inner tables:    1. Use the default warehouse location.
                 2. If the table is dropped, both data and metadata are lost.

External tables: 1. Use a custom HDFS location.
                 2. If the table is dropped, only the metadata is lost.


Hive and HQL




In Hive, the query language is HQL (Hive Query Language).
Hive DDL Statements
                Create
                Alter
                Drop
Hive DML Statements
                Insert
                Select
Create:
                Used to create databases, tables and views.
Alter:
                Used to modify the structure of Hive tables, such as changing data types or adding new columns.
Drop:
                Used to drop Hive databases, tables and views.
Insert: (only bulk insertion is available in classic Hive, before version 0.14)
                Used to copy data from one table into another table.
Select:
                Used to select rows from Hive tables or views.
Other Statements of HQL
                UNION ALL
                Joins, etc.
Hive features:
•  Hive has string parsers to handle strings and unstructured text data.
•  Hive has XML parsers to handle XML data.
•  Hive has JSON parsers to process JSON data.
•  Hive has URL parsers to turn URL data into structured data.
•  Hive has the SerDe feature; SerDe is short for serialization and deserialization.
Ex: HiveJsonserde
•  Hive has the UDF (user-defined function) feature for developing custom functionality.
Native Hive UDFs are written in Java; custom logic written in Python, C++, Ruby or R can be plugged in through Hive's streaming (TRANSFORM) feature.
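A SerDe sits between the raw file and the table's columns. On read it deserializes each stored record into column values, and on write it serializes column values back into the stored format. The minimal Python model below illustrates that idea for JSON lines. ToyJsonSerDe is a hypothetical class, not Hive's JSON SerDe, and treating a missing key as NULL (None) is an assumption of this sketch.

```python
import json


class ToyJsonSerDe:
    """Hypothetical, simplified JSON SerDe: maps one JSON line to a row in column order."""

    def __init__(self, columns: list[str]):
        self.columns = columns  # table column order, e.g. ["id", "str"]

    def deserialize(self, line: str) -> list:
        # Read path: JSON text -> row values in column order;
        # a key missing from the record becomes None (standing in for NULL)
        record = json.loads(line)
        return [record.get(col) for col in self.columns]

    def serialize(self, row: list) -> str:
        # Write path: row values -> one JSON line keyed by column name
        return json.dumps(dict(zip(self.columns, row)))


serde = ToyJsonSerDe(["id", "str"])
print(serde.deserialize('{"str": "hello", "id": 1}'))  # [1, 'hello']
print(serde.deserialize('{"id": 2}'))                  # [2, None]
print(serde.serialize([3, "world"]))                   # {"id": 3, "str": "world"}
```

Key order in the JSON text does not matter on read, because values are picked by column name; that is what lets a table schema be laid over semi-structured files.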



                 


What Is Hive?





Hive is one of the important ecosystem components of the Hadoop framework,
with which you can process and analyze data stored in HDFS files.
Hive is also called the data warehouse environment of the Hadoop framework.
The language used in Hive is HQL (Hive Query Language), which is similar to the SQL of an RDBMS,
but there are many differences between Hive and an RDBMS.
Hive supports only batch processing (bulk data processing) and does not support row-level operations such as reading a single row randomly (ex: select * from sales where prid='909' still scans the table's files rather than jumping to one row) or inserting a single row (ex: insert into sales values(......)), etc.
Classic HQL does not have DML statements to delete and update rows (UPDATE and DELETE arrived only in Hive 0.14, for ACID tables), but data in Hive tables can still be updated or deleted by indirect methods.
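One common indirect method is to rewrite the whole table, for example with INSERT OVERWRITE TABLE sales SELECT ... FROM sales, using a CASE expression to change values and a WHERE clause to leave rows out. The Python sketch below models only that idea on an in-memory list. The (prid, price) schema, the sample rows and the overwrite helper are hypothetical.

```python
from typing import Callable

Row = tuple[str, int]  # (prid, price): hypothetical sales schema


def overwrite(rows: list[Row],
              transform: Callable[[Row], Row] = lambda r: r,
              keep: Callable[[Row], bool] = lambda r: True) -> list[Row]:
    """Produce the complete new contents of the table.

    Nothing is changed in place: every row is read, optionally transformed
    (like a CASE expression) or filtered out (like a WHERE clause), and the
    result replaces the old table as a whole.
    """
    return [transform(r) for r in rows if keep(r)]


sales = [("101", 50), ("909", 70), ("555", 20)]

# "UPDATE": raise the price of product 909 by rewriting every row
sales = overwrite(sales, transform=lambda r: (r[0], r[1] + 10) if r[0] == "909" else r)
print(sales)  # [('101', 50), ('909', 80), ('555', 20)]

# "DELETE": remove product 555 by writing back only the rows to keep
sales = overwrite(sales, keep=lambda r: r[0] != "555")
print(sales)  # [('101', 50), ('909', 80)]
```

The cost of this pattern is that even a one-row change rereads and rewrites the entire table, which is why Hive suits bulk processing rather than row-level edits.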
Hive runs on top of HDFS and MapReduce.
Hive storage is HDFS:
 This means that when you create a table in Hive, a table directory is created in HDFS.
 If you load any file into a Hive table, the file is copied into the table's backend HDFS directory.
Hive execution model is MapReduce:
This means that when you submit an HQL statement, it is converted into MapReduce jobs, and those jobs are submitted to the Hadoop cluster, so Hadoop executes the HQL statement in MapReduce style. (Newer Hive versions can also run on Tez or Spark.)
So developers and analysts can easily process or analyze data using HQL statements without writing complex Java programs.
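To see what "MapReduce style" means, consider a query such as select prid, count(*) from sales group by prid. The Python sketch below models the three phases such a query is split into: map, shuffle and reduce. It is a simplified single-process model, not the code Hive actually generates, and the comma-separated (prid, qty) input rows are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line: str):
    # Mapper: parse one row and emit (key, 1), key = the GROUP BY column
    prid, _qty = line.split(",")
    yield prid, 1


def shuffle(pairs):
    # Shuffle/sort: bring all values with the same key together, sorted by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    # Reducer: aggregate one group, like count(*)
    return key, sum(values)


rows = ["909,2", "101,1", "909,5", "555,3"]
pairs = chain.from_iterable(map_phase(r) for r in rows)
result = [reduce_phase(k, v) for k, v in shuffle(pairs).items()]
print(result)  # [('101', 1), ('555', 1), ('909', 2)]
```

On a real cluster the mappers and reducers run in parallel on different nodes, and the shuffle moves data over the network. That parallelism is what the HQL-to-MapReduce conversion buys.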
Hive is especially good for ad hoc reporting and analytics.
But SQL or HQL is not the solution for every analytics situation, because some analyses need custom functionality that is not available in Hive's built-in functions.
Such custom functionality can be developed as Hive UDFs (user-defined functions).
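Custom functionality can be developed in the following languages (native Hive UDFs are written in Java; the other languages are plugged in as streaming scripts through TRANSFORM):
    --> Java
    --> Python
    --> C++
    --> Ruby
    --> R (statistical programming)
These UDFs must be registered in Hive (for example, a Java UDF with CREATE FUNCTION) and can then be called any number of times.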
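As a hedged illustration of the streaming route, here is a minimal Python script of the kind Hive can call through TRANSFORM, for example ADD FILE upper.py; followed by SELECT TRANSFORM(str) USING 'python upper.py' AS (str_upper) FROM samp;. The file name upper.py, the alias str_upper and the upper-casing logic are hypothetical placeholders. The only assumption about Hive is that it sends each row to the script's stdin as a tab-separated line and reads tab-separated lines back from stdout.

```python
import io
import sys
from typing import Iterable, Iterator, TextIO


def transform(lines: Iterable[str]) -> Iterator[str]:
    # Custom per-row logic: upper-case every column (placeholder for real business logic)
    for line in lines:
        cols = line.rstrip("\n").split("\t")  # columns arrive tab-separated
        yield "\t".join(c.upper() for c in cols)


def main(stream: TextIO = sys.stdin, out: TextIO = sys.stdout) -> None:
    # Read rows from Hive (stdin) and write result rows back (stdout)
    for row in transform(stream):
        out.write(row + "\n")


# Local check with an in-memory stream instead of Hive's stdin.
# In the script actually shipped to Hive, the last line would simply be: main()
main(io.StringIO("hello\tworld\nhive\tudf\n"))
```

Keeping the logic in transform() separate from the stdin/stdout plumbing in main() makes the custom function testable without a Hive cluster.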
   Author:
   Bharat Ram  

