MapReduce Lesson 1



MapReduce:-
 
    1) MapReduce is the execution (processing) model of the Hadoop framework.

    2) MapReduce is a batch-processing model
       that is divided into two separate phases:
 
       i) Mapper Phase
       ii) Reducer Phase


    i) Mapper Phase:-

          From the raw input file, the mapper extracts the required output key and output value (a key-value pair).
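As an illustration of the mapper phase, here is a minimal sketch using the classic word-count example, which is an assumption and not part of this lesson. It is written in plain Java rather than against Hadoop's org.apache.hadoop.mapreduce.Mapper class, and the class name WordCountMapper is hypothetical. Each raw input line is turned into (word, 1) pairs: the word is the output key and 1 is the output value.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical word-count mapper in plain Java (no Hadoop classes).
// It turns one raw input line into (word, 1) key-value pairs.
class WordCountMapper {

    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> output = new ArrayList<>();
        // Split the raw line on whitespace; each word becomes an output key.
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                // The output value is always 1: "this word was seen once".
                output.add(new SimpleEntry<>(word.toLowerCase(), 1));
            }
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(map("Deer Bear River Deer"));
        // prints [deer=1, bear=1, river=1, deer=1]
    }
}
```

Note that the mapper does no counting itself; repeated words such as "deer" simply produce repeated pairs, and combining them is left to the reducer.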

    ii) Reducer Phase:-

       The mapper output is sent as input to the reducer.

       The reducer has two responsibilities:

        a) grouping the data by key

        b) aggregating (summarizing) the values in each group.
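The two reducer responsibilities can be sketched as two separate steps. This is a plain-Java illustration, not Hadoop's org.apache.hadoop.mapreduce.Reducer API; the class name WordCountReducer and the word-count data are hypothetical. Step (a) collects all values that share a key, and step (b) summarizes each group, here by summing it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical word-count reducer in plain Java (not the Hadoop Reducer API).
class WordCountReducer {

    // Responsibility (a): group the mapper's (key, value) pairs by key.
    static Map<String, List<Integer>> group(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>(); // TreeMap keeps keys sorted
        for (Map.Entry<String, Integer> pair : pairs) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return groups;
    }

    // Responsibility (b): aggregate (summarize) each group, here by summing its values.
    static Map<String, Integer> aggregate(Map<String, List<Integer>> groups) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> g : groups.entrySet()) {
            int sum = 0;
            for (int v : g.getValue()) {
                sum += v;
            }
            totals.put(g.getKey(), sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapperOutput = List.of(
                Map.entry("deer", 1), Map.entry("bear", 1),
                Map.entry("river", 1), Map.entry("deer", 1));
        Map<String, List<Integer>> grouped = group(mapperOutput);
        System.out.println(grouped);            // prints {bear=[1], deer=[1, 1], river=[1]}
        System.out.println(aggregate(grouped)); // prints {bear=1, deer=2, river=1}
    }
}
```

Summing is only one possible aggregation; the same grouped data could just as well be reduced to a maximum, an average, or a distinct count.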


    In a distributed system (a cluster), mapper and reducer tasks are executed on the slave nodes, which are separate machines.

  HDFS input ---> mapper ---> intermediate output ---> reducer ---> HDFS output
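The whole flow can be simulated in a single process to see how data moves from one stage to the next. In this sketch, which assumes the word-count example, an in-memory list of lines stands in for the HDFS input, a local list stands in for the intermediate output, and the returned map stands in for the HDFS output. The class MiniMapReduce is hypothetical, and nothing in it is Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process simulation of: input -> mapper -> intermediate output -> reducer -> output.
// Plain Java collections stand in for HDFS and the cluster.
class MiniMapReduce {

    static Map<String, Integer> run(List<String> inputLines) {
        // Map phase: every input line produces (word, 1) pairs.
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (String line : inputLines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    intermediate.add(Map.entry(word.toLowerCase(), 1));
                }
            }
        }
        // Reduce phase: group the intermediate pairs by key and sum each group.
        Map<String, Integer> output = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : intermediate) {
            output.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> input = List.of("Deer Bear River", "Car Car River", "Deer Car Bear");
        System.out.println(run(input)); // prints {bear=2, car=3, deer=2, river=2}
    }
}
```

In a real cluster the two loops would run as separate tasks on different slave nodes, which is exactly why the intermediate output has to be moved between them.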


    The mapper output is called intermediate data (or shuffled data).

  The process of sending the mapper output to the reducers is called shuffling.
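When there are several reducers, shuffling must decide which reducer receives each mapper output pair. The sketch below uses the formula of Hadoop's default HashPartitioner, (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks; the class ShuffleSketch, its methods, and the sample data are hypothetical, and the routing is done in memory rather than over the network.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of shuffling: route every mapper output pair to one of N reducers.
// The partition formula follows Hadoop's default HashPartitioner;
// the class and method names here are illustrative only.
class ShuffleSketch {

    static int partition(String key, int numReducers) {
        // Mask off the sign bit so the result is never negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    static List<List<Map.Entry<String, Integer>>> shuffle(
            List<Map.Entry<String, Integer>> mapperOutput, int numReducers) {
        List<List<Map.Entry<String, Integer>>> reducerInputs = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            reducerInputs.add(new ArrayList<>());
        }
        for (Map.Entry<String, Integer> pair : mapperOutput) {
            // Same key -> same partition, so one reducer sees every value of that key.
            reducerInputs.get(partition(pair.getKey(), numReducers)).add(pair);
        }
        return reducerInputs;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapperOutput = List.of(
                Map.entry("deer", 1), Map.entry("bear", 1),
                Map.entry("river", 1), Map.entry("deer", 1));
        List<List<Map.Entry<String, Integer>>> inputs = shuffle(mapperOutput, 2);
        for (int i = 0; i < inputs.size(); i++) {
            System.out.println("reducer " + i + " receives " + inputs.get(i));
        }
    }
}
```

The key property is that the partition depends only on the key, so both "deer" pairs always land on the same reducer, which is what makes grouping by key possible.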

  Once the reducer output has been produced, the intermediate mapper output is deleted.
