MapReduce Lesson 1
MapReduce:
1) MapReduce is an execution model in the Hadoop framework.
2) MapReduce is a batch process
that is subdivided into two separate phases:
i) Mapper Phase
ii) Reducer Phase
i) Mapper Phase:-
From the raw input file, it extracts the required output key and output value.
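As a rough illustration of the mapper phase, the sketch below uses plain Python rather than the Hadoop API. The function name map_line and the comma-separated record layout (city,product,amount) are assumptions made only for this example.

```python
def map_line(line):
    """Extract the required output key and output value from one raw input line.

    Assumed (hypothetical) record layout: city,product,amount
    """
    fields = line.strip().split(",")
    city = fields[0]          # output key
    amount = int(fields[2])   # output value
    return (city, amount)


# Stand-in for the lines of a raw input file
raw_input = [
    "Hyderabad,pen,10",
    "Chennai,book,50",
    "Hyderabad,bag,200",
]

mapper_output = [map_line(line) for line in raw_input]
print(mapper_output)
```

Only the fields needed later (the key and the value) survive the mapper; the product column is dropped.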
ii) Reducer Phase:-
The mapper output is sent as input to the reducer.
The reducer has two responsibilities:
a) grouping data based on key
b) aggregating (summarization).
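The two reducer responsibilities can be sketched in plain Python, again without the Hadoop API. The helper names group_by_key and aggregate, the sample pairs, and the choice of sum as the summarization are all assumptions for illustration.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Responsibility (a): collect all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def aggregate(groups):
    """Responsibility (b): summarize each group, here by summing its values."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical mapper output: (key, value) pairs
mapper_output = [("Hyderabad", 10), ("Chennai", 50), ("Hyderabad", 200)]

grouped = group_by_key(mapper_output)
reducer_output = aggregate(grouped)
print(grouped)
print(reducer_output)
```

Any other summarization (count, max, average) could replace sum in aggregate without changing the grouping step.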
In a distributed system (cluster), mapper and reducer tasks are executed on the slave nodes, typically on separate machines.
HDFS input ---- mapper ---- o/p ---- reducer ---- HDFS o/p
The mapper output is called intermediate data or shuffled data.
The process of sending the mapper output to the reducer is called shuffling.
Once the reducer output is produced, the mapper output is deleted.
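The whole flow, including shuffling, can be simulated in a single Python process. This is only a sketch: the lists stand in for HDFS files, NUM_REDUCERS and the function names are hypothetical, and the crc32-based partition function is an assumed stand-in for whatever partitioner the cluster actually uses.

```python
import zlib
from collections import defaultdict

NUM_REDUCERS = 2  # assumed number of reducer tasks


def mapper(line):
    """Mapper phase: emit (word, 1) for every word in a raw input line."""
    return [(word, 1) for word in line.split()]


def partition(key, num_reducers):
    """Pick the reducer for a key; a stable hash keeps equal keys together."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(intermediate, num_reducers):
    """Shuffling: send each mapper output pair to the reducer that owns its key."""
    reducer_inputs = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in intermediate:
        reducer_inputs[partition(key, num_reducers)][key].append(value)
    return reducer_inputs


def reducer(grouped):
    """Reducer phase: aggregate the grouped values for each key."""
    return {key: sum(values) for key, values in grouped.items()}


hdfs_input = ["hadoop map reduce", "map reduce map"]  # stand-in for an HDFS file

# Mapper output = intermediate data
intermediate = [pair for line in hdfs_input for pair in mapper(line)]

reducer_inputs = shuffle(intermediate, NUM_REDUCERS)
reducer_outputs = [reducer(grouped) for grouped in reducer_inputs]

# Once the reducer output exists, the intermediate data is no longer needed.
intermediate.clear()

# Stand-in for the HDFS output: merge the per-reducer results.
hdfs_output = {}
for part in reducer_outputs:
    hdfs_output.update(part)
print(hdfs_output)
```

Because every occurrence of a key is routed to the same reducer, each reducer can aggregate its keys independently of the others.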