@@ -50,7 +50,7 @@ Key differences are
 * Changes to stored objects may not be immediately visible, both in directory listings and actual data access.
 * The means by which directories are emulated may make working with them slow.
 * Rename operations may be very slow and, on failure, leave the store in an unknown state.
-* Seeking within a file may require new REST calls, hurting performance.
+* Seeking within a file may require new HTTP calls, hurting performance.
 
 How does affect Spark?
 
@@ -66,7 +66,7 @@ connector to determine which uses are considered safe.
 
 ### Installation
 
-With the relevant libraries on the classpath and Spark configured with the credentials,
+With the relevant libraries on the classpath and Spark configured with valid credentials,
 objects can be can be read or written by using their URLs as the path to data.
 For example `sparkContext.textFile("s3a://landsat-pds/scene_list.gz")` will create
 an RDD of the file `scene_list.gz` stored in S3, using the s3a connector.
@@ -127,9 +127,9 @@ spark.hadoop.mapreduce.fileoutputcommitter.cleanup-failures.ignored true
 
 This uses the "version 2" algorithm for committing files, which does less
 renaming than the "version 1" algorithm, though as it still uses `rename()`
-to commit files, it is still unsafe to use in some environments.
+to commit files, it may be unsafe to use.
 
-Bear in mind that storing temporary files can run up charges; delete
+As storing temporary files can run up charges; delete
 directories called `"_temporary"` on a regular basis to avoid this.
 
 
@@ -144,6 +144,8 @@ spark.sql.parquet.filterPushdown true
 spark.sql.hive.metastorePartitionPruning true
 ```
 
+These minimise the amount of data read during queries.
+
 ### ORC I/O Settings
 
 For best performance when working with ORC data, use these settings:
@@ -155,7 +157,9 @@ spark.sql.orc.cache.stripe.details.size 10000
 spark.sql.hive.metastorePartitionPruning true
 ```
 
-#### <a name="checkpointing"></a>Spark Streaming and Object Storage
+Again, these minimise the amount of data read during queries.
+
+## Spark Streaming and Object Storage
 
 Spark Streaming can monitor files added to object stores, by
 creating a `FileInputDStream` to monitor a path in the store through a call to