0.12.5 (2024-12-03)
Improvements
- Use sipHash64 instead of md5 in Clickhouse for reading data with {"partitioning_mode": "hash"}, as it is 5 times faster.
- Use hashtext instead of md5 in Postgres for reading data with {"partitioning_mode": "hash"}, as it is 3-5 times faster.
- Use BINARY_CHECKSUM instead of HASHBYTES in MSSQL for reading data with {"partitioning_mode": "hash"}, as it is 5 times faster.
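As a rough illustration of where each hash function fits, the hypothetical helper below builds a hash-partitioning expression as SQL text for each database. The dictionary, the function name, the `::text` cast for Postgres and the exact expression shape are assumptions made for this sketch, not onETL's actual generated SQL. It also applies the ABS(...) wrapping mentioned in this release's fixes, using the `%` operator, which all three databases accept.

```python
# Hypothetical helper: shows the *shape* of a hash-partitioning expression
# per database. This is NOT onETL's implementation; real SQL may differ.

HASH_FUNCTIONS = {
    # sipHash64 accepts any type and returns UInt64 (never negative)
    "clickhouse": "sipHash64({column})",
    # hashtext takes text, so the column is cast first; result is a signed int4
    "postgres": "hashtext({column}::text)",
    # BINARY_CHECKSUM accepts integer columns too; result is a signed int
    "mssql": "BINARY_CHECKSUM({column})",
}


def hash_partition_expression(dialect: str, column: str, num_partitions: int) -> str:
    """Return SQL text mapping each row to a bucket in [0, num_partitions)."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be positive")
    hashed = HASH_FUNCTIONS[dialect].format(column=column)
    # ABS(...) keeps buckets non-negative even when the hash itself is signed
    return f"ABS({hashed} % {num_partitions})"
```

For example, `hash_partition_expression("postgres", "id", 10)` returns `ABS(hashtext(id::text) % 10)`.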
Bug fixes
- In JDBC sources, wrap MOD(partitionColumn, numPartitions) with ABS(...) to make all returned values non-negative. This prevents data skew.
- Fix reading table data from MSSQL using {"partitioning_mode": "hash"} with partitionColumn of integer type.
- Fix reading table data from Postgres using {"partitioning_mode": "hash"}, which led to data skew (all the data was read into one Spark partition).
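To see why a missing ABS(...) can cause skew, here is a minimal Python simulation. It assumes the partition expression is handed to Spark as the partition column with lowerBound=0 and upperBound=numPartitions (an assumption about the setup, not stated in this changelog). Under that assumption, Spark's first partition predicate, `bucket < 1 OR bucket IS NULL`, also catches every negative bucket that SQL MOD produces for negative hashes. A BLAKE2b-based signed 32-bit hash stands in for signed database hashes such as hashtext or BINARY_CHECKSUM. `signed_int32_hash`, `sql_mod`, `spark_partition` and `distribution` are hypothetical helpers.

```python
import hashlib
from collections import Counter


def signed_int32_hash(value: str) -> int:
    """Stand-in for a signed 32-bit DB hash (assumption: BLAKE2b, not hashtext)."""
    digest = hashlib.blake2b(value.encode("utf-8"), digest_size=4).digest()
    return int.from_bytes(digest, "big", signed=True)


def sql_mod(a: int, n: int) -> int:
    """MOD with SQL semantics: the result takes the sign of the dividend."""
    r = abs(a) % n
    return -r if a < 0 else r


def spark_partition(bucket: int, num_partitions: int) -> int:
    """Spark partition reading a row, assuming lowerBound=0, upperBound=num_partitions.

    Predicates: first `bucket < 1 OR bucket IS NULL`, last
    `bucket >= num_partitions - 1`, others `bucket >= i AND bucket < i + 1`.
    """
    if bucket < 1:
        return 0
    if bucket >= num_partitions - 1:
        return num_partitions - 1
    return bucket


def distribution(keys, num_partitions: int, use_abs: bool) -> Counter:
    """Count rows per Spark partition, with or without the ABS(...) wrapper."""
    counts = Counter()
    for key in keys:
        bucket = sql_mod(signed_int32_hash(key), num_partitions)
        if use_abs:
            bucket = abs(bucket)
        counts[spark_partition(bucket, num_partitions)] += 1
    return counts


keys = [str(i) for i in range(10_000)]
print("without ABS:", sorted(distribution(keys, 8, use_abs=False).items()))
print("with ABS:   ", sorted(distribution(keys, 8, use_abs=True).items()))
```

Without ABS(...), roughly half of the rows (every negative hash) plus the zero bucket land in partition 0; with it, the eight partitions come out roughly even.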