@@ -522,9 +522,9 @@ common ones are as follows.
 <td> <b>reduceByKey</b>(<i>func</i>, [<i>numTasks</i>]) </td>
 <td> When called on a DStream of (K, V) pairs, return a new DStream of (K, V) pairs where the
 values for each key are aggregated using the given reduce function. <b>Note:</b> By default,
- this uses Spark's default number of parallel tasks (2 for local machine, 8 for a cluster) to
- do the grouping. You can pass an optional <code>numTasks</code> argument to set a different
- number of tasks.</td>
+ this uses Spark's default number of parallel tasks (2 in local mode; in cluster mode it is
+ determined by the config property <code>spark.default.parallelism</code>) to do the grouping.
+ You can pass an optional <code>numTasks</code> argument to set a different number of tasks.</td>
 </tr>
 <tr>
 <td> <b>join</b>(<i>otherStream</i>, [<i>numTasks</i>]) </td>
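
As a rough illustration of the `numTasks` behavior described in the hunk above, here is a minimal Scala sketch. The `StreamingContext` setup and the socket source (`localhost:9999`) are assumptions for the example, not part of this change:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ReduceByKeyExample {
  def main(args: Array[String]): Unit = {
    // Local mode with two threads; the default parallelism here is 2.
    val conf = new SparkConf().setMaster("local[2]").setAppName("ReduceByKeyExample")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Hypothetical source: lines of text read from a socket.
    val lines = ssc.socketTextStream("localhost", 9999)
    val pairs = lines.flatMap(_.split(" ")).map(word => (word, 1))

    // Aggregate counts per key; the optional second argument overrides the
    // default number of parallel tasks with 4 for this operation only.
    val counts = pairs.reduceByKey(_ + _, 4)
    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```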
@@ -743,8 +743,9 @@ said two parameters - <i>windowLength</i> and <i>slideInterval</i>.
 <td> When called on a DStream of (K, V) pairs, returns a new DStream of (K, V)
 pairs where the values for each key are aggregated using the given reduce function <i>func</i>
 over batches in a sliding window. <b>Note:</b> By default, this uses Spark's default number of
- parallel tasks (2 for local machine, 8 for a cluster) to do the grouping. You can pass an optional
- <code>numTasks</code> argument to set a different number of tasks.
+ parallel tasks (2 in local mode; in cluster mode it is determined by the config property
+ <code>spark.default.parallelism</code>) to do the grouping. You can pass an optional
+ <code>numTasks</code> argument to set a different number of tasks.
 </td>
 </tr>
 <tr>
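
In the same spirit, a hedged fragment for the windowed variant; it assumes the `pairs` DStream from the previous sketch and that `Seconds` is already imported:

```scala
// Reduce over a 30-second window, sliding every 10 seconds. The final
// argument overrides the default number of parallel tasks with 8.
val windowedCounts = pairs.reduceByKeyAndWindow(
  (a: Int, b: Int) => a + b, // reduce function applied within each window
  Seconds(30),               // windowLength
  Seconds(10),               // slideInterval
  8                          // numTasks
)
windowedCounts.print()
```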
@@ -956,9 +957,10 @@ before further processing.
 ### Level of Parallelism in Data Processing
 Cluster resources may be under-utilized if the number of parallel tasks used in any stage of the
 computation is not high enough. For example, for distributed reduce operations like `reduceByKey`
-and `reduceByKeyAndWindow`, the default number of parallel tasks is 8. You can pass the level of
-parallelism as an argument (see the
-[`PairDStreamFunctions`](api/scala/index.html#org.apache.spark.streaming.dstream.PairDStreamFunctions)
+and `reduceByKeyAndWindow`, the default number of parallel tasks is determined by the
+[config property](configuration.html#spark-properties) `spark.default.parallelism`. You can pass
+the level of parallelism as an argument (see the
+[`PairDStreamFunctions`](api/scala/index.html#org.apache.spark.streaming.dstream.PairDStreamFunctions)
 documentation), or set the [config property](configuration.html#spark-properties)
 `spark.default.parallelism` to change the default.
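
To make the two knobs described above concrete, a minimal sketch in the style of the earlier example; the application name, parallelism values, and source are illustrative assumptions:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Option 1: raise the default for all reduce-like operations up front.
val conf = new SparkConf()
  .setAppName("ParallelismExample")
  .set("spark.default.parallelism", "16") // hypothetical cluster-wide default

val ssc = new StreamingContext(conf, Seconds(10))

// Option 2: override the default per operation via the numTasks argument.
val events = ssc.socketTextStream("localhost", 9999).map(line => (line, 1))
val aggregated = events.reduceByKey(_ + _, 32) // 32 tasks for this reduce only
```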