What is data locality in Hadoop?

Data locality refers to moving the computation close to the node where the data actually resides, instead of moving large amounts of data to the computation. This reduces network congestion, which improves throughput and shortens execution time.

Three categories of data locality in Hadoop:
Data-local data locality: The map task runs on the same node where the data block resides.
Intra-rack data locality: The map task runs on a different node from the one holding the data block, but both nodes are on the same rack.
Inter-rack data locality: The map task runs on a different node from the one holding the data block, and the two nodes are on different racks.
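
The three categories can be sketched as a small classification function. This is only an illustration and not Hadoop's actual scheduler code: the `Node` class, the `locality_level` function and the node and rack names are all hypothetical. It assumes each node is known by a name and a rack, and that a block may have several replicas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical description of a cluster node: its name and its rack."""
    name: str
    rack: str


def locality_level(task_node, replica_nodes):
    """Classify where a map task runs relative to the replicas of its block."""
    # Best case: a replica of the block is stored on the task's own node.
    if any(r.name == task_node.name for r in replica_nodes):
        return "data-local"
    # Next best: a replica is on another node of the same rack.
    if any(r.rack == task_node.rack for r in replica_nodes):
        return "intra-rack"
    # Otherwise the block has to be read from a different rack.
    return "inter-rack"


# Illustrative cluster: two racks, block replicated on n1 and n3.
n1 = Node("n1", "rack-a")
n2 = Node("n2", "rack-a")
n3 = Node("n3", "rack-b")
n4 = Node("n4", "rack-c")
replicas = [n1, n3]

for node in (n1, n2, n4):
    print(node.name, locality_level(node, replicas))
```

Under these assumptions, a scheduler would prefer a data-local placement, fall back to intra-rack, and use inter-rack only when nothing closer is available.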


Reference:
https://www.quora.com/What-does-the-term-data-locality-mean-in-Hadoop
