0.9.0 (2023-08-17)

Breaking Changes

  • Rename methods:
  • DBConnection.read_df → DBConnection.read_source_as_df
  • DBConnection.write_df → DBConnection.write_df_to_target (#66)
  • Rename classes:
  • HDFS.slots → HDFS.Slots
  • Hive.slots → Hive.Slots

Old names are left intact, but will be removed in v1.0.0 (#103)

  • Rename options to make them self-explanatory:
  • Hive.WriteOptions(mode="append") → Hive.WriteOptions(if_exists="append")
  • Hive.WriteOptions(mode="overwrite_table") → Hive.WriteOptions(if_exists="replace_entire_table")
  • Hive.WriteOptions(mode="overwrite_partitions") → Hive.WriteOptions(if_exists="replace_overlapping_partitions")
  • JDBC.WriteOptions(mode="append") → JDBC.WriteOptions(if_exists="append")
  • JDBC.WriteOptions(mode="overwrite") → JDBC.WriteOptions(if_exists="replace_entire_table")
  • Greenplum.WriteOptions(mode="append") → Greenplum.WriteOptions(if_exists="append")
  • Greenplum.WriteOptions(mode="overwrite") → Greenplum.WriteOptions(if_exists="replace_entire_table")
  • MongoDB.WriteOptions(mode="append") → MongoDB.WriteOptions(if_exists="append")
  • MongoDB.WriteOptions(mode="overwrite") → MongoDB.WriteOptions(if_exists="replace_entire_collection")
  • FileDownloader.Options(mode="error") → FileDownloader.Options(if_exists="error")
  • FileDownloader.Options(mode="ignore") → FileDownloader.Options(if_exists="ignore")
  • FileDownloader.Options(mode="overwrite") → FileDownloader.Options(if_exists="replace_file")
  • FileDownloader.Options(mode="delete_all") → FileDownloader.Options(if_exists="replace_entire_directory")
  • FileUploader.Options(mode="error") → FileUploader.Options(if_exists="error")
  • FileUploader.Options(mode="ignore") → FileUploader.Options(if_exists="ignore")
  • FileUploader.Options(mode="overwrite") → FileUploader.Options(if_exists="replace_file")
  • FileUploader.Options(mode="delete_all") → FileUploader.Options(if_exists="replace_entire_directory")
  • FileMover.Options(mode="error") → FileMover.Options(if_exists="error")
  • FileMover.Options(mode="ignore") → FileMover.Options(if_exists="ignore")
  • FileMover.Options(mode="overwrite") → FileMover.Options(if_exists="replace_file")
  • FileMover.Options(mode="delete_all") → FileMover.Options(if_exists="replace_entire_directory")

Old names are left intact, but will be removed in v1.0.0 (#108)

  • Rename onetl.log.disable_clients_logging() to onetl.log.setup_clients_logging(). (#120)
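The mode → if_exists renames above follow a fixed mapping per class. As a minimal sketch of migrating old keyword arguments, the helper below encodes the Hive mapping; the function and constant names are hypothetical, not part of onETL:

```python
# Hypothetical migration helper: rewrites the deprecated Hive.WriteOptions
# mode=... keyword to its new if_exists=... equivalent (mapping per the
# 0.9.0 changelog).
HIVE_MODE_TO_IF_EXISTS = {
    "append": "append",
    "overwrite_table": "replace_entire_table",
    "overwrite_partitions": "replace_overlapping_partitions",
}

def migrate_hive_write_options(options: dict) -> dict:
    """Return a copy of the options dict with mode=... rewritten to if_exists=...."""
    migrated = dict(options)
    if "mode" in migrated:
        migrated["if_exists"] = HIVE_MODE_TO_IF_EXISTS[migrated.pop("mode")]
    return migrated
```

The same pattern applies to the JDBC, Greenplum, MongoDB, and file-class options, each with its own value mapping from the list above.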

Features

  • Add new methods returning Maven packages for specific connection class:
  • Clickhouse.get_packages()
  • MySQL.get_packages()
  • Postgres.get_packages()
  • Teradata.get_packages()
  • MSSQL.get_packages(java_version="8")
  • Oracle.get_packages(java_version="8")
  • Greenplum.get_packages(scala_version="2.12")
  • MongoDB.get_packages(scala_version="2.12")
  • Kafka.get_packages(spark_version="3.4.1", scala_version="2.12")
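Lists of Maven coordinates like those returned by the get_packages() methods above are typically joined into Spark's comma-separated spark.jars.packages config value. A small sketch of that step, using placeholder coordinates rather than onETL's actual output:

```python
# Sketch: merge several Maven package lists (e.g. from get_packages() calls)
# into the comma-separated string expected by spark.jars.packages,
# deduplicating while preserving order.
def build_spark_packages(*package_lists: list) -> str:
    seen: set = set()
    ordered: list = []
    for packages in package_lists:
        for package in packages:
            if package not in seen:
                seen.add(package)
                ordered.append(package)
    return ",".join(ordered)
```

The result would then be passed along the lines of SparkSession.builder.config("spark.jars.packages", build_spark_packages(...)).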

Deprecate old syntax:

  • Clickhouse.package
  • MySQL.package
  • Postgres.package
  • Teradata.package
  • MSSQL.package
  • Oracle.package
  • Greenplum.package_spark_2_3
  • Greenplum.package_spark_2_4
  • Greenplum.package_spark_3_2
  • MongoDB.package_spark_3_2
  • MongoDB.package_spark_3_3
  • MongoDB.package_spark_3_4 (#87)

  • Allow setting the log level of client modules in onetl.log.setup_clients_logging().

Allow enabling logging of underlying client modules in onetl.log.setup_logging() by passing the additional argument enable_clients=True. This is useful for debugging. (#120)

  • Added support for reading data from and writing data to Kafka topics.

For these operations, new classes were added:

  • Kafka (#54, #60, #72, #84, #87, #89, #93, #96, #102, #104)
  • Kafka.PlaintextProtocol (#79)
  • Kafka.SSLProtocol (#118)
  • Kafka.BasicAuth (#63, #77)
  • Kafka.KerberosAuth (#63, #77, #110)
  • Kafka.ScramAuth (#115)
  • Kafka.Slots (#109)
  • Kafka.ReadOptions (#68)
  • Kafka.WriteOptions (#68)

Currently, Kafka does not support incremental read strategies; these will be implemented in future releases.

  • Added support for reading files as a Spark DataFrame and saving a DataFrame as files.

For these operations, new classes were added.

FileDFConnections:

  • SparkHDFS (#98)
  • SparkS3 (#94, #100, #124)
  • SparkLocalFS (#67)

High-level classes:

  • FileDFReader (#73)
  • FileDFWriter (#81)

File formats:

  • Avro (#69)
  • CSV (#92)
  • JSON (#83)
  • JSONLine (#83)
  • ORC (#86)
  • Parquet (#88)
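Of the formats above, JSON and JSONLine differ only in file layout: JSON treats the whole file as a single document (often an array of rows), while JSONLine treats each line as an independent document. A stdlib-only sketch of the distinction (the parsing functions are illustrative, not onETL's implementation):

```python
import json

def parse_json(text: str) -> list:
    # JSON format: the entire file is one document, e.g. an array of rows.
    return json.loads(text)

def parse_jsonline(text: str) -> list:
    # JSONLine format: every non-empty line is a standalone JSON document.
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

JSONLine's per-line layout is what makes it splittable for parallel reads, which is why it is usually preferred over plain JSON for large datasets.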

Improvements

  • Remove redundant checks for driver availability in Greenplum and MongoDB connections. (#67)
  • Java class availability checks were moved from the .check() method to the connection constructor. (#97)