Hive Tables

We also offer online and classroom training.

We also support POC work.

author: Bharat (sree ram)

contact : 04042026071

___________________________________________________________________________

Hive Tables:

Hive tables are classified into two types.

Ø Inner tables (called managed or internal tables in the Hive documentation)

Ø External tables

Inner Tables:

Whenever an inner table is created, a directory for the table is created under the default warehouse location. If a file is loaded into the table, the file is placed in the table directory (a local file is copied; a file that is already in HDFS is moved).

Ex: hive> create table samp(str string);

Now the following directory will be created in HDFS.

/user/hive/warehouse/samp

If you load a file,

Ex: hive> load data local inpath 'file1.txt' into table samp;

In HDFS

/user/hive/warehouse/samp/file1.txt

In the above path, samp is the table directory and file1.txt is the file that was loaded into the table. When you select rows from the table, Hive reads data from all files in the table directory.

Ex: hive> select * from samp;

Now Hive will read data from the /user/hive/warehouse/samp/file1.txt file.

If you drop an inner table, the table directory is also deleted from HDFS.

That means if the table is dropped, you lose both the metadata and the data.

Ex: hive> drop table samp;

From HDFS, the /user/hive/warehouse/samp/ directory will be deleted.

External Table:

Ø The table uses a custom location.

Ø When the table is dropped, the backend HDFS table directory is not deleted.

That means if the table is dropped, only the metadata (the table definition in Hive) is dropped. The data remains safely available in the HDFS directory.

Ex:

hive> create external table sample(str string) location '/user/mydir';

Now in HDFS, /user/mydir will be used as the directory for the table.

If you load a file into the table,

hive> load data local inpath 'samp.txt' into table sample;

In HDFS, /user/mydir/samp.txt

If you select rows from the sample table,

hive> select * from sample;

Now Hive will read data from all files in the following directory

/user/mydir

If the sample table is dropped,

hive> drop table sample;

Now the sample table will be deleted from Hive.

But the backend HDFS directory and its file are still available.

/user/mydir/samp.txt

So the data can still be reused by Hive or by other components of the Hadoop ecosystem.

Summary

Inner tables:

1. Use the default warehouse location.

2. If the table is dropped, both data and metadata are lost.

External tables:

1. Use a custom HDFS location.

2. If the table is dropped, only the metadata is lost.
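The drop behaviour summarized above can be sketched with a small toy model. This is not Hive code: a local temporary directory stands in for HDFS, and the names ToyMetastore and demo are hypothetical. As a simplification, passing a location marks the table as external here, whereas in real Hive the external keyword decides that.

```python
import shutil
import tempfile
from pathlib import Path


class ToyMetastore:
    """Toy stand-in for Hive's metastore plus an HDFS-like directory tree."""

    def __init__(self, root):
        self.warehouse = Path(root) / "user" / "hive" / "warehouse"
        self.tables = {}  # table name -> (table directory, is_external)

    def create_table(self, name, location=None):
        # Simplification: a given location means "external table".
        external = location is not None
        directory = Path(location) if external else self.warehouse / name
        directory.mkdir(parents=True, exist_ok=True)
        self.tables[name] = (directory, external)
        return directory

    def load_local(self, name, local_file):
        # Like "load data local inpath": copy the file into the table directory.
        directory, _ = self.tables[name]
        shutil.copy(local_file, directory / Path(local_file).name)

    def select_all(self, name):
        # Read every file in the table directory, one row per line.
        directory, _ = self.tables[name]
        rows = []
        for data_file in sorted(directory.iterdir()):
            rows.extend(data_file.read_text().splitlines())
        return rows

    def drop_table(self, name):
        # Metadata is always removed; data is removed only for inner tables.
        directory, external = self.tables.pop(name)
        if not external:
            shutil.rmtree(directory)


def demo():
    root = Path(tempfile.mkdtemp())
    try:
        local = root / "file1.txt"
        local.write_text("a\nb\n")
        metastore = ToyMetastore(root)
        inner_dir = metastore.create_table("samp")
        external_dir = metastore.create_table(
            "sample", location=root / "user" / "mydir"
        )
        metastore.load_local("samp", local)
        metastore.load_local("sample", local)
        rows = metastore.select_all("samp")
        metastore.drop_table("samp")
        metastore.drop_table("sample")
        # Inner directory is gone; the external file is still there.
        return rows, inner_dir.exists(), (external_dir / "file1.txt").exists()
    finally:
        shutil.rmtree(root)  # clean up the temporary tree
```

Calling demo() returns the rows read from the inner table, followed by two flags: whether the inner table directory still exists after the drop, and whether the external table's file still exists.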

Hive and Hql

Hive & Hql

In Hive, the query language is HQL (Hive Query Language).

Hive DDL Statements

Create

Alter

Drop

Hive DML Statements

Insert

Select

Create:

Used to create databases, tables and views.

Alter:

Used to modify the structure of Hive tables, such as changing data types and adding new columns.

Drop:

Used to drop Hive databases, tables and views.

Insert: (bulk insertion; single-row insert ... values is available only from Hive 0.14 onwards)

Used to copy data from one table to another table.

Select:

Used to select rows from Hive tables or views.

Other Statements of HQL

Union all

Joins etc.

Hive features:

Ø Hive has string functions to handle string and unstructured text data.

Ø Hive has XML parsers to handle XML data (ex: the xpath functions).

Ø Hive has JSON parsers to process JSON data (ex: get_json_object, json_tuple).

Ø Hive has URL parsers to make URL data structured (ex: parse_url).

Ø Hive has the SerDe feature; SerDe is short for serializer/deserializer.

Ex: a JSON SerDe

Ø Hive has a UDF (user defined function) feature to develop custom functionality.

Native Hive UDFs are written in Java. Logic written in other languages such as Python, C++, Ruby and R can be plugged in as streaming scripts through the TRANSFORM clause.

What Is Hive?

WHAT IS HIVE?

Hive is one of the important components of the Hadoop ecosystem,

by which you can process and analyze data stored in HDFS files.

Hive is also called the data warehouse environment of the Hadoop framework.

The language used in Hive is HQL (Hive Query Language), which is similar to the SQL of an RDBMS,

but there are many differences between Hive and an RDBMS.

Hive is designed for batch processing (bulk data processing) and is not suited to row-level operations. A lookup such as select * from sales where prid='909' is answered by scanning the table's files rather than by a quick random read, and inserting a single row (ex: insert into sales values(......)) is not available in older versions (it was added in Hive 0.14).

Older versions of HQL do not have DML statements to delete and update rows (update and delete were added in Hive 0.14, for transactional tables only), but by using indirect methods, such as rewriting a table with insert overwrite, we can update or delete data in Hive tables.

Hive runs on top of HDFS and MapReduce.

Hive storage is HDFS:

This means that when you create a table in Hive, a table directory is created in HDFS.

If you load a file into a Hive table, the file is copied into its backend HDFS directory.

Hive execution model is MapReduce:

This means that when you submit an HQL statement, it is converted into one or more MapReduce jobs, and those jobs are submitted to the Hadoop cluster, so Hadoop executes the HQL statement in MapReduce style.
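To make the MapReduce style concrete, here is a minimal single-process sketch of how a statement like select prid, count(*) from sales group by prid; breaks down into map, shuffle and reduce steps. It is only a conceptual illustration, not the code Hive generates; the function names are hypothetical, and the sales table and prid column are borrowed from the example earlier in this post.

```python
from collections import defaultdict


def map_phase(rows):
    # Mapper: emit a (key, 1) pair for every input row.
    for row in rows:
        yield row["prid"], 1


def shuffle(pairs):
    # Shuffle: bring together all values that share the same key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    # Reducer: sum the values of each key to get the count per group.
    return {key: sum(values) for key, values in grouped.items()}


def count_by_prid(rows):
    # Full pipeline: map -> shuffle -> reduce.
    return reduce_phase(shuffle(map_phase(rows)))
```

On a real cluster the mappers and reducers run in parallel on different nodes and the shuffle moves data across the network; here everything runs in one process so the three steps are easy to follow.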

So a developer or analyst can easily process or analyze the data using HQL statements, without writing complex Java programs.

Hive is especially good for ad hoc reporting and analytics.

But SQL or HQL is not the solution for every analytics situation, because some analytics require custom functionality that is not available in Hive's built-in functions.

This custom functionality can be developed as Hive UDFs (user defined functions).

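Native Hive UDFs are developed in Java. Custom logic in other languages can be plugged in as streaming scripts through the TRANSFORM clause, for example:

--> Python

--> C++

--> Ruby

--> R (statistical programming)

Java UDFs have to be registered in Hive (streaming scripts are added to the session), and then they can be called any number of times.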
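As an illustration of custom logic written in Python, below is a minimal sketch of a streaming script of the kind Hive's TRANSFORM clause can call. Hive passes each row to the script as one line of tab-separated columns on standard input and reads the result rows back from standard output. The upper-casing logic and the function names are hypothetical, chosen only for the example.

```python
import sys


def transform_line(line):
    # Hive sends each row as one line with tab-separated columns.
    columns = line.rstrip("\n").split("\t")
    # Hypothetical custom logic: upper-case the first column.
    columns[0] = columns[0].upper()
    return "\t".join(columns)


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Read rows from Hive, write transformed rows back, one per line.
    for line in stdin:
        stdout.write(transform_line(line) + "\n")
```

Saved as a file (say upper.py, a hypothetical name) with a final main() call at the bottom, the script could be used roughly like this:

hive> add file upper.py;

hive> select transform(str) using 'python upper.py' as (str_upper) from samp;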

Author:

Bharat Ram

halitics.blogspot.in
