0.8.0 (2023-05-31)
Breaking Changes
- Rename methods of
FileConnection
classes: get_directory
→resolve_dir
get_file
→resolve_file
listdir
→list_dir
mkdir
→create_dir
rmdir
→remove_dir
New naming should be more consistent.
They were undocumented in previous versions, but someone could use these methods, so this is a breaking change. (#36)
- Deprecate onetl.core.FileFilter
class, replace it with new classes:
* onetl.file.filter.Glob
* onetl.file.filter.Regexp
* onetl.file.filter.ExcludeDir
Old class will be removed in v1.0.0. (#43)
- Deprecate onetl.core.FileLimit
class, replace it with new class onetl.file.limit.MaxFilesCount
.
Old class will be removed in v1.0.0. (#44)
- Change behavior of BaseFileLimit.reset
method.
This method should now return self
instead of None
.
Return value could be the same limit object or a copy, this is an implementation detail. (#44)
- Replaced FileDownloader.filter
and .limit
with new options .filters
and .limits
:
FileDownloader(
...,
filter=FileFilter(glob="*.txt", exclude_dir="/path"),
limit=FileLimit(count_limit=10),
)
FileDownloader(
...,
filters=[Glob("*.txt"), ExcludeDir("/path")],
limits=[MaxFilesCount(10)],
)
This allows to developers to implement their own filter and limit classes, and combine them with existing ones.
Old behavior still supported, but it will be removed in v1.0.0. (#45)
- Removed default value for FileDownloader.limits
, user should pass limits list explicitly. (#45)
- Move classes from module onetl.core
:
from onetl.core import DBReader
from onetl.core import DBWriter
from onetl.core import FileDownloader
from onetl.core import FileUploader
with new modules onetl.db
and onetl.file
:
from onetl.db import DBReader
from onetl.db import DBWriter
from onetl.file import FileDownloader
from onetl.file import FileUploader
Imports from old module onetl.core
still can be used, but marked as deprecated. Module will be removed in v1.0.0. (#46)
Features
- Add
rename_dir
method.
Method was added to following connections:
* FTP
* FTPS
* HDFS
* SFTP
* WebDAV
It allows to rename/move directory to new path with all its content.
S3
does not have directories, so there is no such method in that class. (#40)
- Add onetl.file.FileMover
class.
It allows to move files between directories of remote file system.
Signature is almost the same as in FileDownloader
, but without HWM support. (#42)
Improvements
- Document all public methods in
FileConnection
classes: download_file
resolve_dir
resolve_file
get_stat
is_dir
is_file
list_dir
create_dir
path_exists
remove_file
rename_file
remove_dir
upload_file
walk
(#39)- Update documentation of
check
method of all connections - add usage example and document result type. (#39) - Add new exception type
FileSizeMismatchError
.
Methods connection.download_file
and connection.upload_file
now raise new exception type instead of RuntimeError
,
if target file after download/upload has different size than source. (#39)
- Add new exception type DirectoryExistsError
- it is raised if target directory already exists. (#40)
- Improved FileDownloader
/ FileUploader
exception logging.
If DEBUG
logging is enabled, print exception with stacktrace instead of
printing only exception message. (#42)
- Updated documentation of FileUploader
.
* Class does not support read strategies, added note to documentation.
* Added examples of using run
method with explicit files list passing, both absolute and relative paths.
* Fix outdated imports and class names in examples. (#42)
- Updated documentation of DownloadResult
class - fix outdated imports and class names. (#42)
- Improved file filters documentation section.
Document interface class onetl.base.BaseFileFilter
and function match_all_filters
. (#43)
- Improved file limits documentation section.
Document interface class onetl.base.BaseFileLimit
and functions limits_stop_at
/ limits_reached
/ reset_limits
. (#44)
- Added changelog.
Changelog is generated from separated news files using towncrier. (#47)
Misc
- Improved CI workflow for tests.
- If developer haven’t changed source core of a specific connector or its dependencies, run tests only against maximum supported versions of Spark, Python, Java and db/file server.
- If developed made some changes in a specific connector, or in core classes, or in dependencies, run tests for both minimal and maximum versions.
- Once a week run all aganst for minimal and latest versions to detect breaking changes in dependencies
- Minimal tested Spark version is 2.3.1 instead on 2.4.8. (#32)