0.12.0 (2024-09-03)
Breaking Changes
- Change connection URL used for generating HWM names of S3 and Samba sources:
  - `smb://host:port` -> `smb://host:port/share`
  - `s3://host:port` -> `s3://host:port/bucket` (#304)
- Update DB connectors/drivers to latest versions:
  - Clickhouse 0.6.0-patch5 → 0.6.5
  - MongoDB 10.3.0 → 10.4.0
  - MSSQL 12.6.2 → 12.8.1
  - MySQL 8.4.0 → 9.0.0
  - Oracle 23.4.0.24.05 → 23.5.0.24.07
  - Postgres 42.7.3 → 42.7.4
- Update `Excel` package from 0.20.3 to 0.20.4, to include Spark 3.5.1 support. (#306)
Features
- Add support for specifying file formats (`ORC`, `Parquet`, `CSV`, etc.) in `HiveWriteOptions.format` (#292):
  `Hive.WriteOptions(format=ORC(compression="snappy"))`
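  A fuller sketch of using this option with `DBWriter` (cluster and table names are placeholders; an existing SparkSession `spark` and dataframe `df` are assumed):

  ```python
  from onetl.connection import Hive
  from onetl.db import DBWriter
  from onetl.file.format import ORC

  # hypothetical cluster name; an existing SparkSession is assumed
  hive = Hive(cluster="rnd-dwh", spark=spark)

  writer = DBWriter(
      connection=hive,
      target="schema.table",
      # store table data as ORC files compressed with Snappy
      options=Hive.WriteOptions(format=ORC(compression="snappy")),
  )
  writer.run(df)
  ```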
- Collect Spark execution metrics in the following methods, and log them in DEBUG mode:
  - `DBWriter.run()`
  - `FileDFWriter.run()`
  - `Hive.sql()`
  - `Hive.execute()`

  This is implemented using a custom `SparkListener` which wraps the entire method call and then reports the collected metrics. These metrics may sometimes be missing due to Spark's architecture, so they are not a reliable source of information. That is why they are logged only in DEBUG mode and are not returned as the method call result. (#303)
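  A sketch of enabling these logs (assuming onetl's `setup_logging` helper accepts a standard logging level; plain stdlib `logging` configuration works as well):

  ```python
  import logging

  from onetl.log import setup_logging

  # execution metrics are emitted only at DEBUG level
  setup_logging(level=logging.DEBUG)
  ```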
- Generate default `jobDescription` based on currently executed method. Examples:
  - `DBWriter.run(schema.table) -> Postgres[host:5432/database]`
  - `MongoDB[localhost:27017/admin] -> DBReader.has_data(mycollection)`
  - `Hive[cluster].execute()`

  If user already set a custom `jobDescription`, it will be left intact. (#304)
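  A custom description can be set through the standard PySpark API (a sketch, assuming an existing SparkSession `spark`):

  ```python
  # onetl will not override a jobDescription set explicitly by the user
  spark.sparkContext.setJobDescription("nightly load of schema.table")
  ```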
- Add log.info about JDBC dialect usage (#305):
  `|MySQL| Detected dialect: 'org.apache.spark.sql.jdbc.MySQLDialect'`
- Log estimated size of in-memory dataframe created by `JDBC.fetch` and `JDBC.execute` methods. (#303)
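  A sketch of where both log lines appear (assuming an already configured onetl `Postgres` connection; `fetch` runs a query on the Spark driver and returns a small dataframe):

  ```python
  # the detected JDBC dialect and the estimated size of the resulting
  # in-memory dataframe are logged during this call
  df = postgres.fetch("SELECT id, name FROM schema.table WHERE id < 100")
  ```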
Bug Fixes
- Fix passing `Greenplum(extra={"options": ...})` during read/write operations (see the first sketch below). (#308)
- Do not raise exception if a yield-based hook has something past its first (and only) `yield` (see the second sketch below).
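
As an illustration of the Greenplum fix, a minimal sketch (connection parameters and the libpq-style `options` value are placeholders, not from the changelog; an existing SparkSession `spark` is assumed):

```python
from onetl.connection import Greenplum

# extra "options" are now passed through to read/write operations
greenplum = Greenplum(
    host="greenplum.domain.com",
    user="someuser",
    password="***",
    database="mydb",
    spark=spark,
    extra={"options": "-c search_path=public"},  # hypothetical value
)
```

And a sketch of the hook case that no longer raises (the class and slot are hypothetical; onetl's `support_hooks`/`slot`/`hook` decorators are assumed to behave as documented, with the method result sent into the generator at the `yield` point):

```python
from onetl.hooks import support_hooks, slot, hook

@support_hooks
class MyClass:  # hypothetical class with a hookable method
    @slot
    def run(self):
        return "done"

@MyClass.run.bind
@hook
def on_run(self):
    print("before run()")
    result = yield  # the wrapped method executes here
    # previously, any statement after the single yield raised an exception;
    # now this code simply runs after the method returns
    print(f"after run(): {result}")

MyClass().run()
```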