Question
When using this library in Spark to run a SQL query and load the data into a Spark DataFrame, if the session is accidentally killed from the SQL database side, no connection-reset error is thrown as expected, but the count result indicates that the data was not read completely.
Steps to reproduce:
(1) Use this library in Spark to run the SQL query and load the data into a Spark DataFrame, with the code shown below:
%scala
import java.util.Properties

val token = "<token>"
val jdbcHostname = "xinrandatabseserver.database.windows.net"
val jdbcDatabase = "xinranSQLDatabase"
val jdbcPort = 1433
val jdbcUrl = "jdbc:sqlserver://%s:%s;databaseName=%s;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net".format(jdbcHostname, jdbcPort, jdbcDatabase)

val connectionProperties = new Properties()
val driverClass = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
connectionProperties.setProperty("Driver", driverClass)
// The AAD access token is passed via the "accessToken" connection property.
connectionProperties.setProperty("accessToken", token)

val sql_pushdown = "(select UNITS from payment_balance_new) emp_alias"
val df_stripe_dispute = spark.read
  .option("connectRetryCount", 200)
  .option("numPartitions", 1)
  .jdbc(url = jdbcUrl, table = sql_pushdown, properties = connectionProperties)
df_stripe_dispute.count()
(2) Kill the session from the SQL database side with "KILL [SPID]".
(3) Check the count() result: it is lower than the actual row count, but no exception is thrown and the command completes successfully.
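Until the driver behavior is clarified, one way to surface the truncation is to cross-check the Spark-side count against an independently obtained server-side COUNT(*) on the same table. A minimal sketch of just the comparison step (the helper name is hypothetical; in practice sparkCount would come from df_stripe_dispute.count() and serverCount from a direct JDBC "SELECT COUNT(*) FROM payment_balance_new"):

```scala
// Hypothetical helper: fail loudly when the Spark-side row count falls
// short of a server-side count obtained through a separate connection.
def assertCompleteRead(sparkCount: Long, serverCount: Long): Unit = {
  if (sparkCount < serverCount) {
    throw new IllegalStateException(
      s"Incomplete read: Spark saw $sparkCount rows, server reports $serverCount")
  }
}
```

With the killed-session repro above, the two counts diverge and the check throws instead of silently succeeding.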
Need help on
In general, Spark connects to SQL Server and reads the data using the code below, where rs is a com.microsoft.sqlserver.jdbc.SQLServerResultSet. When the session is killed, rs.next() returns false but does not throw an exception.
override protected def getNext(): InternalRow = {
  if (rs.next()) {
    inputMetrics.incRecordsRead(1)
    var i = 0
    while (i < getters.length) {
      getters(i).apply(rs, mutableRow, i)
      if (rs.wasNull) mutableRow.setNullAt(i)
      i = i + 1
    }
    mutableRow
  } else {
    finished = true
    null.asInstanceOf[InternalRow]
  }
}
Based on a code analysis of com.microsoft.sqlserver.jdbc.SQLServerResultSet.next(), there is a possibility that it returns false when the connection is broken. See the code here: https://github.com/microsoft/mssql-jdbc/blob/main/src/main/java/com/microsoft/sqlserver/jdbc/SQLServerResultSet.java#L1011-L1107
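If next() can indeed return false on a broken connection, one conceivable mitigation on the reading side is to treat false as end-of-data only after confirming the connection is still alive (Connection.isValid is standard JDBC). A sketch of that logic, with the iteration and validity checks abstracted as functions so the idea stands on its own; whether such a check belongs in the Spark or driver code is exactly the question for the maintainers:

```scala
import java.sql.SQLException

// Sketch: accept next() == false as genuine end-of-data only while the
// underlying connection still validates; otherwise surface the broken
// session instead of silently finishing the read.
def safeNext(next: () => Boolean, connectionValid: () => Boolean): Boolean = {
  if (next()) {
    true // another row is available
  } else if (!connectionValid()) {
    // The session was killed mid-read: fail rather than under-count.
    throw new SQLException("Connection lost before the result set was fully read")
  } else {
    false // genuine end of data
  }
}
```

In real code the two parameters would be something like `() => rs.next()` and `() => conn.isValid(5)`.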
Could you help check why the exception is not thrown as expected? Thanks!