MapReduce (Part 1)

  • Created by: smrc
  • Created on: 17-05-19 18:43
What is MapReduce?
MapReduce is a programming model for data-parallel programming.
What is MapReduce used for?
Often used for the distributed processing of Big Data stored on a distributed file system.
Who pioneered MapReduce?
MapReduce was pioneered by Google, and popularised by the Hadoop implementation from Apache.
Give 2 examples of the uses of MapReduce and state what type of algorithm is used in each case.
1. Ranking of webpages by importance (matrix algorithm) 2. Search in "friends" networks on social media sites (graph algorithm)
Explain the basic idea of MapReduce.
Instead of using a single powerful supercomputer for the processing, we use clusters of commodity processors (compute nodes) arranged in racks. Each rack holds one or more nodes. The nodes within a rack are connected by Ethernet, and racks are connected to each other by inexpensive switches.
What is MapReduce designed to be tolerant of?
Hardware failures.
Describe the properties of a Distributed File System.
1. Computations on data may take a long time. 2. We don't want to have to restart from the beginning when a component fails. So files are stored redundantly, by keeping copies at several different storage nodes.
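A minimal Python sketch of the redundancy idea, assuming a replication factor of 3 and a simple round-robin placement policy (both are assumptions for illustration; `place_chunks` and `survives_failure` are hypothetical names, not part of any real DFS API). It checks that every chunk is still readable after storage nodes fail.

```python
def place_chunks(num_chunks, nodes, replicas=3):
    """Assign each chunk to `replicas` distinct storage nodes, round-robin (assumed policy)."""
    if replicas > len(nodes):
        raise ValueError("need at least as many storage nodes as replicas")
    return {
        chunk: [nodes[(chunk + i) % len(nodes)] for i in range(replicas)]
        for chunk in range(num_chunks)
    }


def survives_failure(placement, failed_nodes):
    """True if every chunk still has at least one copy on a node that has not failed."""
    return all(
        any(node not in failed_nodes for node in copies)
        for copies in placement.values()
    )


nodes = ["n0", "n1", "n2", "n3", "n4"]
placement = place_chunks(4, nodes)
print(placement[0])                         # ['n0', 'n1', 'n2']
print(survives_failure(placement, {"n1"}))  # True
```

With three copies on distinct nodes, any two nodes can fail and every chunk is still available, so a long computation does not have to restart from the beginning.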
In what situation does a Distributed File System work best?
DFS works best for large files that are rarely changed.
Describe how a programmer uses MapReduce.
The programmer has to write only 2 functions, called Map and Reduce.
What does a system do when using MapReduce?
System: 1. Manages parallel execution 2. Coordinates the tasks that execute Map and Reduce 3. Deals with task failures in a transparent way
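To make the division of labour concrete, here is a minimal single-process Python sketch: the programmer supplies only `map_fn` and `reduce_fn`, and a hypothetical `run_mapreduce` function plays the role of the system, running the tasks, grouping by key, and re-running a task that raises an exception. It assumes tasks are deterministic (so re-running one is safe) and runs everything in one process; it is an illustration, not how Hadoop or Google's implementation is structured.

```python
from collections import defaultdict


def run_task(task, max_attempts=3):
    """Run a task, re-running it if it fails (safe because tasks are assumed deterministic)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise  # give up after max_attempts failures


def run_mapreduce(chunks, map_fn, reduce_fn):
    """Toy 'system': the programmer only supplies map_fn and reduce_fn."""
    # Map phase: one Map task per chunk; each element yields zero or more (key, value) pairs.
    intermediate = []
    for chunk in chunks:
        intermediate.extend(
            run_task(lambda c=chunk: [kv for element in c for kv in map_fn(element)])
        )

    # Group by key: key -> list of all values produced for it.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce phase: one call of reduce_fn per key; outputs are merged into one list.
    output = []
    for key in sorted(groups):
        output.extend(run_task(lambda k=key: list(reduce_fn(k, groups[k]))))
    return output


# User code: the two functions the programmer writes (Word Count).
def wc_map(document):
    for word in document.split():
        yield (word, 1)


def wc_reduce(word, ones):
    yield (word, sum(ones))


print(run_mapreduce([["a b a"], ["b c"]], wc_map, wc_reduce))
# [('a', 2), ('b', 2), ('c', 1)]
```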
Explain the steps involved in the MapReduce Paradigm: Map.
1. Some number of Map tasks are each given one or more chunks of data from a DFS. 2. These Map tasks turn each chunk into a sequence of key-value pairs, according to the code written by the user for the Map function. 3. These key-value pairs are collected by a...
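The first two Map steps can be sketched in Python as below. The round-robin assignment of chunks to Map tasks, and the names `assign_chunks`, `run_map_task` and `split_words`, are illustrative assumptions; in a real system the master controller decides which worker runs which task.

```python
def assign_chunks(chunks, num_map_tasks):
    """Give the chunks to the Map tasks round-robin (assumes num_map_tasks <= len(chunks),
    so every Map task gets one or more chunks)."""
    tasks = [[] for _ in range(num_map_tasks)]
    for i, chunk in enumerate(chunks):
        tasks[i % num_map_tasks].append(chunk)
    return tasks


def run_map_task(task_chunks, map_fn):
    """Turn every element of every assigned chunk into key-value pairs via the user's Map function."""
    pairs = []
    for chunk in task_chunks:
        for element in chunk:
            pairs.extend(map_fn(element))
    return pairs


def split_words(doc):
    """Example Map function: one (word, 1) pair per word."""
    return [(w, 1) for w in doc.split()]


chunks = [["x y"], ["y z"], ["x"]]
tasks = assign_chunks(chunks, 2)
print(tasks)  # [[['x y'], ['x']], [['y z']]]
print([run_map_task(t, split_words) for t in tasks])
# [[('x', 1), ('y', 1), ('x', 1)], [('y', 1), ('z', 1)]]
```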
Explain the steps involved in the MapReduce Paradigm: Reduce.
1. The keys are divided among all the Reduce tasks so that all key-value pairs with the same key go to the same Reduce task. 2. The Reduce tasks work on one key at a time, and combine all the values associated with that key in some way. 3. The way that ...
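Step 2 of Reduce can be illustrated from the point of view of a single Reduce task. The sketch assumes the task has already received all of its (key, value) pairs and that keys are sortable; `run_reduce_task` is a hypothetical name. Sorting first lets `itertools.groupby` hand the task one key, with all of its values, at a time.

```python
from itertools import groupby
from operator import itemgetter


def run_reduce_task(pairs, reduce_fn):
    """Handle one key at a time: sort this task's pairs by key, then combine each key's values."""
    results = []
    for key, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        values = [value for _, value in group]
        results.append(reduce_fn(key, values))
    return results


pairs = [("y", 1), ("x", 1), ("y", 1)]
print(run_reduce_task(pairs, lambda k, vs: (k, sum(vs))))
# [('x', 1), ('y', 2)]
```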
What do input files for a Map task consist of?
Elements.
What is a chunk in Map Tasks?
A collection of elements.
What is significant about the way elements are stored in a Map task?
No element is stored across 2 chunks.
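The rule that no element is stored across two chunks can be sketched as a packer that closes the current chunk before an element would overflow it. The size limit `max_chunk_bytes` and the name `make_chunks` are assumptions for illustration; real DFS chunks are much larger than the tiny limit used here.

```python
def make_chunks(elements, max_chunk_bytes):
    """Pack whole elements into chunks; an element is never split across two chunks.
    An element bigger than the limit gets a chunk of its own rather than being split."""
    chunks, current, size = [], [], 0
    for element in elements:
        n = len(element.encode("utf-8"))
        if current and size + n > max_chunk_bytes:
            chunks.append(current)  # close the current chunk before it would overflow
            current, size = [], 0
        current.append(element)
        size += n
    if current:
        chunks.append(current)
    return chunks


docs = ["aaaa", "bb", "cccccc", "d"]
print(make_chunks(docs, 6))
# [['aaaa', 'bb'], ['cccccc'], ['d']]
```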
State the input and output of the Map function.
The Map function takes an input element as its argument and produces zero or more key-value pairs as output.
Give an example of a MapReduce Algorithm. (Hint: Word Count)
Given a repository of documents we want to count the number of occurrences of each word over all the documents. Each document is an element. The Map task reads a document and breaks it into words, w1, w2, w3, etc. Each word is a key, and the ...
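A possible Word Count Map function in Python, assuming words are separated by whitespace (punctuation and letter case are not normalised; `word_count_map` is a hypothetical name). Written as a generator, it produces zero or more key-value pairs per element: an empty document produces none.

```python
from typing import Iterator, Tuple


def word_count_map(document: str) -> Iterator[Tuple[str, int]]:
    """Map: one element (a document) -> a (word, 1) pair for every word occurrence."""
    for word in document.split():
        yield (word, 1)


print(list(word_count_map("Hello World Hello")))  # [('Hello', 1), ('World', 1), ('Hello', 1)]
print(list(word_count_map("")))                   # []
```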
Explain the Master Controller's involvement in the process of Grouping By Key.
The master controller groups key-value pairs by key. It knows how many Reduce tasks there will be (this value, r, is usually set by the user). The master controller picks a hash function that maps each key to an integer between 0 and r-1. This determines which...
In the process of Grouping By Key what happens to the key-value pair?
Each key-value pair is put into 1 of r files. Each file is input for 1 Reduce task.
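The grouping step can be sketched with a hash function taken modulo r. `zlib.crc32` is used here as an assumption (any deterministic hash would do), because Python's built-in `hash()` on strings is randomised per process, so two worker processes could disagree about which Reduce task a key belongs to. The names `partition` and `split_into_files` are hypothetical, and in-memory lists stand in for the r files.

```python
import zlib


def partition(key: str, r: int) -> int:
    """Map a key to a Reduce task number between 0 and r-1."""
    return zlib.crc32(key.encode("utf-8")) % r


def split_into_files(pairs, r):
    """Put each (key, value) pair into 1 of r 'files' (lists); file i is the input for Reduce task i."""
    files = [[] for _ in range(r)]
    for key, value in pairs:
        files[partition(key, r)].append((key, value))
    return files


pairs = [("Hello", 1), ("World", 1), ("Goodbye", 1), ("World", 1)]
files = split_into_files(pairs, r=3)
# Both ("World", 1) pairs land in the same file, whichever one that is.
print([len(f) for f in files])
```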
What argument is passed to each Reduce function in the Reduce tasks? Thus, what is the input to the Reduce task handling key k?
Each Reduce task gets a sequence of key-list-of-values pairs, and each call of the Reduce function takes one such pair as its argument. Thus the input for key k is (k, [v1, v2, ..., vn]), where v1, ..., vn are all the values paired with k by the Map tasks.
In general how many keys can a Reduce task handle?
Several.
What happens to the outputs from all the Reduce tasks?
They are merged into a single file.
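Merging the Reduce outputs into a single file can be as simple as concatenation. The sketch below makes one further assumption: each Reduce task's output is already sorted by key (as it would be if the task processes its keys in sorted order), so `heapq.merge` yields one globally sorted result. `merge_reduce_outputs` is a hypothetical name and a list stands in for the output file.

```python
import heapq


def merge_reduce_outputs(task_outputs):
    """Merge the (key, value) outputs of all Reduce tasks into one list (the single 'file').
    Assumes each task's output is sorted by key, so the merged result is sorted too."""
    return list(heapq.merge(*task_outputs, key=lambda kv: kv[0]))


outputs = [[("Cardiff", 2), ("World", 2)], [("Goodbye", 2), ("Hello", 2)]]
print(merge_reduce_outputs(outputs))
# [('Cardiff', 2), ('Goodbye', 2), ('Hello', 2), ('World', 2)]
```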
In the Word Count example what does the Reduce function need to do?
It needs to count the number of items in each key's list of values.
What does a Reduce task output, before the outputs of all the Reduce tasks are merged into a single file?
For each of its keys, it outputs a key-value pair: the key (in the Word Count example, a word) and the result for that key (the count).
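A matching Word Count Reduce function might look like this (a sketch; `word_count_reduce` is a hypothetical name). Because every value emitted by the basic Map function is 1, counting the items in the list gives the number of occurrences of the word.

```python
def word_count_reduce(word, values):
    """Reduce: (word, [1, 1, ...]) -> (word, number of occurrences)."""
    # Every value is 1, so the length of the list is the count.
    return (word, len(values))


print(word_count_reduce("World", [1, 1]))     # ('World', 2)
print(word_count_reduce("Hello", [1, 1, 1]))  # ('Hello', 3)
```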
Explain why the MapReduce algorithm can be refined. Using the Word Count example, explain how this can be done.
If the Reduce function is associative and commutative (so it doesn't matter in what order the values are combined), then some of what the Reduce task does can be done in the Map task. In the Word Count example the Map task could count the number of occurrences of each word in its input and output a ...
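The refinement can be sketched as a Map task that counts words locally before emitting pairs; this is valid because addition is associative and commutative. Note that the Reduce function must then sum the partial counts rather than count list items, because a value may now be larger than 1. The names `map_task_with_local_counts` and `reduce_sum` are hypothetical.

```python
from collections import Counter


def map_task_with_local_counts(documents):
    """Map task over its whole input: emit (word, count within this input) once per distinct word."""
    counts = Counter()
    for document in documents:
        counts.update(document.split())
    return list(counts.items())


def reduce_sum(word, partial_counts):
    """Reduce: add up the partial counts from all Map tasks."""
    return (word, sum(partial_counts))


print(map_task_with_local_counts(["Hello World", "Goodbye World"]))
# [('Hello', 1), ('World', 2), ('Goodbye', 1)]
print(reduce_sum("World", [2, 1]))  # ('World', 3)
```

The Map task now sends one pair per distinct word instead of one per occurrence, which reduces the amount of data passed to the Reduce tasks.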
Suppose we have 2 input files. file1 contains the lines "Hello World" and "Goodbye World". file2 contains the lines "Hello Cardiff" and "Goodbye Cardiff". How would a MapReduce implementation process these files in the Word Count example? Explain the Map walk-through.
Each file will be processed by a different Map task. Output for file1 will be: <Hello,1> <World,1> <Goodbye,1> <World,1>. Output for file2 will be: <Hello,1> <Cardiff,1> <Goodbye,1> <Cardiff,1>. These <key,value> pairs are passed to the Master Controller (MC).
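The Map walk-through can be reproduced in a few lines of Python, assuming each line of a file is one element, words are separated by whitespace, and one Map task handles each file (the dictionary `files` and the function `word_count_map` are illustrative).

```python
files = {
    "file1": ["Hello World", "Goodbye World"],
    "file2": ["Hello Cardiff", "Goodbye Cardiff"],
}


def word_count_map(line):
    """Map: one line -> a (word, 1) pair per word."""
    return [(word, 1) for word in line.split()]


# One Map task per file: apply the Map function to every line of that file.
map_outputs = {name: [kv for line in lines for kv in word_count_map(line)]
               for name, lines in files.items()}

print(map_outputs["file1"])  # [('Hello', 1), ('World', 1), ('Goodbye', 1), ('World', 1)]
print(map_outputs["file2"])  # [('Hello', 1), ('Cardiff', 1), ('Goodbye', 1), ('Cardiff', 1)]
```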
Suppose we have 2 input files. file1 contains the lines "Hello World" and "Goodbye World". file2 contains the lines "Hello Cardiff" and "Goodbye Cardiff". How would a MapReduce implementation process these files in the Word Count example? Explain the MC walk-through.
The Master Controller will form a list of the values associated with each key, and sort by key. Output will be: <Cardiff, [1,1]> <Goodbye, [1,1]> <Hello, [1,1]> <World, [1,1]>. These <key, list-of-values> pairs are passed to the Reduce function.
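The MC walk-through can be checked the same way: starting from the two Map outputs above, collect the values for each key and sort by key. This is a single-process sketch of the grouping only; a real master controller coordinates this across machines.

```python
from collections import defaultdict

# Output of the two Map tasks (file1, file2) from the Map walk-through.
map_outputs = [
    [("Hello", 1), ("World", 1), ("Goodbye", 1), ("World", 1)],
    [("Hello", 1), ("Cardiff", 1), ("Goodbye", 1), ("Cardiff", 1)],
]

# Form the list of values associated with each key.
groups = defaultdict(list)
for task_output in map_outputs:
    for key, value in task_output:
        groups[key].append(value)

# Sort by key to get the <key, list-of-values> pairs passed to Reduce.
grouped = sorted(groups.items())
print(grouped)
# [('Cardiff', [1, 1]), ('Goodbye', [1, 1]), ('Hello', [1, 1]), ('World', [1, 1])]
```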
