
0.8.0 (2023-05-31)

Breaking Changes

  • Rename methods of FileConnection classes:
      • get_directory → resolve_dir
      • get_file → resolve_file
      • listdir → list_dir
      • mkdir → create_dir
      • rmdir → remove_dir

The new names should be more consistent.
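For code that still calls the old names, a thin adapter can keep call sites working during migration while emitting a warning. This is a minimal sketch, not part of onETL: LegacyNamesAdapter and FakeConnection are hypothetical, and only the old → new name pairs come from the list above.

```python
import warnings

# Old -> new method names, taken from the list in this changelog entry.
RENAMED_METHODS = {
    "get_directory": "resolve_dir",
    "get_file": "resolve_file",
    "listdir": "list_dir",
    "mkdir": "create_dir",
    "rmdir": "remove_dir",
}


class LegacyNamesAdapter:
    """Hypothetical wrapper: forwards pre-0.8.0 method names to the new ones."""

    def __init__(self, connection):
        self._connection = connection

    def __getattr__(self, name):
        # Called only when normal attribute lookup on the adapter fails,
        # so every method is looked up on the wrapped connection.
        new_name = RENAMED_METHODS.get(name)
        if new_name is not None:
            warnings.warn(
                f"{name}() was renamed to {new_name}() in onETL 0.8.0",
                DeprecationWarning,
                stacklevel=2,
            )
            name = new_name
        return getattr(self._connection, name)


class FakeConnection:
    """Hypothetical stand-in for a FileConnection, exposing only list_dir."""

    def list_dir(self, path):
        return [f"{path}/a.txt"]


conn = LegacyNamesAdapter(FakeConnection())
print(conn.listdir("/data"))   # old name: works, emits DeprecationWarning
print(conn.list_dir("/data"))  # new name: no warning
```

The adapter is meant as a temporary bridge; renaming the call sites themselves is the durable fix.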

They were undocumented in previous versions, but someone could have used these methods, so this is a breaking change. (#36)

  • Deprecate onetl.core.FileFilter class, replace it with new classes:
      • onetl.file.filter.Glob
      • onetl.file.filter.Regexp
      • onetl.file.filter.ExcludeDir

The old class will be removed in v1.0.0. (#43)

  • Deprecate onetl.core.FileLimit class, replace it with new class onetl.file.limit.MaxFilesCount.

The old class will be removed in v1.0.0. (#44)

  • Change behavior of BaseFileLimit.reset method.

This method should now return self instead of None. The return value could be the same limit object or a copy; this is an implementation detail. (#44)

  • Replaced FileDownloader.filter and .limit with new options .filters and .limits:

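# Before 0.8.0 (deprecated, will be removed in v1.0.0)
from onetl.core import FileDownloader, FileFilter, FileLimit

FileDownloader(
    ...,
    filter=FileFilter(glob="*.txt", exclude_dir="/path"),
    limit=FileLimit(count_limit=10),
)

# Since 0.8.0
from onetl.file import FileDownloader
from onetl.file.filter import ExcludeDir, Glob
from onetl.file.limit import MaxFilesCount

FileDownloader(
    ...,
    filters=[Glob("*.txt"), ExcludeDir("/path")],
    limits=[MaxFilesCount(10)],
)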

This allows developers to implement their own filter and limit classes and combine them with existing ones.
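A minimal sketch of a user-defined filter and limit combined through lists, in the spirit of the filters/limits options. It does not import onETL: FileFilter and FileLimit are stand-ins modelled on onetl.base.BaseFileFilter and onetl.base.BaseFileLimit, the method names match, stops_at, reset and is_reached are assumptions about that interface, and SuffixFilter, MaxCount and select_files are hypothetical.

```python
from abc import ABC, abstractmethod
from pathlib import PurePosixPath


class FileFilter(ABC):
    """Stand-in modelled on onetl.base.BaseFileFilter (assumed interface)."""

    @abstractmethod
    def match(self, path: PurePosixPath) -> bool:
        """Return True if the file should be handled."""


class FileLimit(ABC):
    """Stand-in modelled on onetl.base.BaseFileLimit (assumed interface)."""

    @abstractmethod
    def reset(self) -> "FileLimit":
        """Clear internal state and return a limit object (self or a copy)."""

    @abstractmethod
    def stops_at(self, path: PurePosixPath) -> bool:
        """Register a file; return True if handling must stop before it."""

    @property
    @abstractmethod
    def is_reached(self) -> bool:
        """Whether the limit has already been hit."""


class SuffixFilter(FileFilter):
    """Hypothetical user-defined filter: keep files with a given suffix."""

    def __init__(self, suffix: str):
        self.suffix = suffix

    def match(self, path: PurePosixPath) -> bool:
        return path.suffix == self.suffix


class MaxCount(FileLimit):
    """Hypothetical limit, similar in spirit to MaxFilesCount."""

    def __init__(self, count: int):
        self.count = count
        self._handled = 0

    def reset(self) -> "MaxCount":
        self._handled = 0
        return self  # returns a limit object instead of None

    def stops_at(self, path: PurePosixPath) -> bool:
        if self.is_reached:
            return True
        self._handled += 1
        return False

    @property
    def is_reached(self) -> bool:
        return self._handled >= self.count


def select_files(paths, filters, limits):
    """Keep files matching every filter; stop once any limit is reached."""
    limits = [limit.reset() for limit in limits]  # use returned objects, may be copies
    selected = []
    for path in map(PurePosixPath, paths):
        if not all(f.match(path) for f in filters):
            continue
        if any(limit.stops_at(path) for limit in limits):
            break
        selected.append(path)
    return selected


files = ["/in/a.txt", "/in/b.csv", "/in/c.txt", "/in/d.txt"]
print(select_files(files, filters=[SuffixFilter(".txt")], limits=[MaxCount(2)]))
# [PurePosixPath('/in/a.txt'), PurePosixPath('/in/c.txt')]
```

Rebinding limits to the objects returned by reset() keeps the loop correct whether an implementation resets itself in place or returns a fresh copy, which is the freedom the BaseFileLimit.reset change allows.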

The old behavior is still supported, but it will be removed in v1.0.0. (#45)

  • Removed default value for FileDownloader.limits; users should pass the limits list explicitly. (#45)
  • Move classes from module onetl.core:

from onetl.core import DBReader
from onetl.core import DBWriter
from onetl.core import FileDownloader
from onetl.core import FileUploader

to new modules onetl.db and onetl.file:

from onetl.db import DBReader
from onetl.db import DBWriter

from onetl.file import FileDownloader
from onetl.file import FileUploader

Imports from the old module onetl.core can still be used, but are marked as deprecated. The module will be removed in v1.0.0. (#46)
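One common way to keep an old import path working while marking it deprecated is a module-level __getattr__ (PEP 562). The sketch below shows that mechanism with two throwaway modules; legacy_core, new_db and the placeholder DBReader are hypothetical, and onETL's actual onetl.core shim may be implemented differently.

```python
import sys
import types
import warnings

# Hypothetical new module that now owns the class.
new_db = types.ModuleType("new_db")


class DBReader:
    """Placeholder standing in for a relocated class."""


new_db.DBReader = DBReader

# Hypothetical old module: keeps the old import path working, with a warning.
legacy_core = types.ModuleType("legacy_core")
MOVED_TO = {"DBReader": new_db}


def _legacy_getattr(name):
    # PEP 562: a module-level __getattr__ runs only for names
    # that are not found in the module namespace.
    if name in MOVED_TO:
        target = MOVED_TO[name]
        warnings.warn(
            f"Importing {name} from legacy_core is deprecated, "
            f"import it from {target.__name__} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(target, name)
    raise AttributeError(f"module 'legacy_core' has no attribute {name!r}")


legacy_core.__getattr__ = _legacy_getattr
sys.modules["new_db"] = new_db
sys.modules["legacy_core"] = legacy_core

from legacy_core import DBReader as OldDBReader  # works, but warns

print(OldDBReader is DBReader)  # True: both paths give the same class
```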

Features

  • Add rename_dir method.

The method was added to the following connections:
      • FTP
      • FTPS
      • HDFS
      • SFTP
      • WebDAV

It renames (moves) a directory to a new path, together with all its content.
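rename_dir runs against a remote server, but its intended semantics can be illustrated on the local file system with pathlib. In this sketch the local DirectoryExistsError class only borrows its name from the onETL exception, and refusing to overwrite an existing target is an assumption based on how this release describes DirectoryExistsError, not a documented guarantee of every connection's rename_dir.

```python
import tempfile
from pathlib import Path


class DirectoryExistsError(OSError):
    """Local stand-in named after onETL's DirectoryExistsError."""


def rename_dir(source: Path, target: Path) -> Path:
    """Move a directory with all its content to a new path (local sketch)."""
    if not source.is_dir():
        raise NotADirectoryError(f"Not a directory: {source}")
    if target.exists():
        raise DirectoryExistsError(f"Target directory already exists: {target}")
    target.parent.mkdir(parents=True, exist_ok=True)
    source.rename(target)  # a single rename moves the whole subtree
    return target


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "incoming")
    (src / "nested").mkdir(parents=True)
    (src / "nested" / "data.csv").write_text("1,2\n")
    moved = rename_dir(src, Path(tmp, "archive", "2023-05-31"))
    print(sorted(p.relative_to(moved).as_posix() for p in moved.rglob("*")))
    # ['nested', 'nested/data.csv']
```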

S3 does not have directories, so the S3 connection class has no such method. (#40)

  • Add onetl.file.FileMover class.

It moves files between directories of a remote file system. Its signature is almost the same as that of FileDownloader, but without HWM support. (#42)
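FileMover works on a remote file system through a connection; a local analogue using shutil.move shows the core idea of moving matching files between directories while keeping their relative layout. move_files and MoveReport are hypothetical and do not reproduce FileMover's real parameters or result type; like FileMover, the sketch keeps no HWM state.

```python
import shutil
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class MoveReport:
    """Hypothetical result container, loosely analogous to DownloadResult."""

    successful: list = field(default_factory=list)
    failed: list = field(default_factory=list)


def move_files(source_dir: Path, target_dir: Path, pattern: str = "*") -> MoveReport:
    """Move matching files from source_dir to target_dir, keeping relative paths."""
    report = MoveReport()
    # sorted() materializes the listing before any file is moved
    for src in sorted(source_dir.rglob(pattern)):
        if not src.is_file():
            continue
        dst = target_dir / src.relative_to(source_dir)
        try:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dst))
            report.successful.append(dst)
        except OSError as exc:
            report.failed.append((src, exc))
    return report


with tempfile.TemporaryDirectory() as tmp:
    src_dir, dst_dir = Path(tmp, "in"), Path(tmp, "out")
    (src_dir / "2023").mkdir(parents=True)
    (src_dir / "2023" / "a.txt").write_text("a")
    (src_dir / "b.csv").write_text("b")
    report = move_files(src_dir, dst_dir, pattern="*.txt")
    print([p.relative_to(dst_dir).as_posix() for p in report.successful])
    # ['2023/a.txt']
```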

Improvements

  • Document all public methods in FileConnection classes:
      • download_file
      • resolve_dir
      • resolve_file
      • get_stat
      • is_dir
      • is_file
      • list_dir
      • create_dir
      • path_exists
      • remove_file
      • rename_file
      • remove_dir
      • upload_file
      • walk (#39)
  • Update documentation of the check method of all connections: add a usage example and document the result type. (#39)
  • Add new exception type FileSizeMismatchError.

Methods connection.download_file and connection.upload_file now raise FileSizeMismatchError instead of RuntimeError if the target file has a different size than the source after download/upload. (#39)

  • Add new exception type DirectoryExistsError, raised if the target directory already exists. (#40)
  • Improved FileDownloader / FileUploader exception logging.

If DEBUG logging is enabled, the exception is printed with its stack trace instead of only the exception message. (#42)

  • Updated documentation of FileUploader:
      • The class does not support read strategies; added a note to the documentation.
      • Added examples of using the run method with an explicitly passed list of files, with both absolute and relative paths.
      • Fixed outdated imports and class names in examples. (#42)
  • Updated documentation of the DownloadResult class: fixed outdated imports and class names. (#42)
  • Improved file filters documentation section.

Document interface class onetl.base.BaseFileFilter and function match_all_filters. (#43)

  • Improved file limits documentation section.

Document interface class onetl.base.BaseFileLimit and functions limits_stop_at / limits_reached / reset_limits. (#44)

  • Added changelog.

The changelog is generated from separate news files using towncrier. (#47)

Misc

  • Improved CI workflow for tests:
      • If a developer hasn't changed the source code of a specific connector or its dependencies, run tests only against the maximum supported versions of Spark, Python, Java and the db/file server.
      • If a developer made changes in a specific connector, in core classes, or in dependencies, run tests against both minimal and maximum versions.
      • Once a week, run all tests against both minimal and latest versions to detect breaking changes in dependencies.
      • Minimal tested Spark version is 2.3.1 instead of 2.4.8. (#32)
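The version-selection rules above can be sketched as a small function a CI script might call once per connector. The paths in CORE_PATHS and connector_paths are hypothetical, and onETL's actual workflow may express these rules quite differently, for example directly in the CI configuration.

```python
from typing import Sequence, Tuple

# Hypothetical repository layout; onETL's real paths and CI setup may differ.
CORE_PATHS = ("onetl/base/", "onetl/impl/", "requirements/core.txt")


def connector_paths(connector: str) -> Tuple[str, ...]:
    """Paths whose change should trigger the full matrix for one connector."""
    return (f"onetl/connection/{connector}", f"requirements/{connector}.txt")


def plan_for(connector: str, changed_files: Sequence[str], weekly: bool) -> Tuple[str, ...]:
    """Return which version sets ("min", "max") to test a connector against."""
    if weekly:
        # scheduled run: full matrix, to catch breaking changes in dependencies
        return ("min", "max")
    watched = CORE_PATHS + connector_paths(connector)
    if any(path.startswith(watched) for path in changed_files):
        return ("min", "max")
    # nothing relevant changed: maximum supported versions only
    return ("max",)


print(plan_for("postgres", ["docs/index.rst"], weekly=False))                # ('max',)
print(plan_for("postgres", ["onetl/connection/postgres.py"], weekly=False))  # ('min', 'max')
print(plan_for("postgres", [], weekly=True))                                 # ('min', 'max')
```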