Large folders of small files example

This example project shows how to track large folders of small files with DVC, either as pipeline outputs or as raw version-controlled inputs.

The input folder is a large dataset consisting of many small files. It was added to the project with:

calkit add input

Calkit automatically determined that the --to option should be dvc-zip, since input is a large folder containing many small files.

The pipeline in calkit.yaml also creates an output folder, results, which is large in total size but consists of many small files. That output therefore also uses the dvc-zip storage type to remain efficient. The tradeoff is that the folder is tracked as a single unit, so if even one file inside it changes, the entire folder must be synced with the DVC remote again.
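
The single-unit tradeoff can be illustrated with a minimal Python sketch. This is not Calkit's or DVC's actual implementation: the helper names `zip_folder_bytes` and `digest`, the fixed timestamp, and the use of SHA-256 are all assumptions made for illustration. It only shows that once a folder is packed into one archive, editing a single small file changes the hash of the whole archive, so the whole archive would need to be transferred again.

```python
import hashlib
import io
import tempfile
import zipfile
from pathlib import Path


def zip_folder_bytes(folder: Path) -> bytes:
    """Pack every file under ``folder`` into one in-memory zip archive.

    Hypothetical helper for illustration only.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # Sort paths so the archive layout does not depend on listing order
        for path in sorted(folder.rglob("*")):
            if path.is_file():
                # Use a fixed timestamp so identical contents always
                # produce identical archive bytes
                info = zipfile.ZipInfo(
                    path.relative_to(folder).as_posix(),
                    date_time=(1980, 1, 1, 0, 0, 0),
                )
                zf.writestr(info, path.read_bytes())
    return buf.getvalue()


def digest(data: bytes) -> str:
    """Return a content hash (SHA-256 chosen here as an assumption)."""
    return hashlib.sha256(data).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for a folder like "results": many small files
    folder = Path(tmp) / "results"
    folder.mkdir()
    for i in range(1000):
        (folder / f"file-{i}.txt").write_text(f"value {i}\n")
    n_files = sum(1 for p in folder.rglob("*") if p.is_file())

    hash_before = digest(zip_folder_bytes(folder))
    # Zipping unchanged contents again gives the same hash
    hash_repeat = digest(zip_folder_bytes(folder))

    # Change just one of the small files
    (folder / "file-0.txt").write_text("changed\n")
    hash_after = digest(zip_folder_bytes(folder))

print("files in folder:", n_files)
print("unchanged folder gives same hash:", hash_before == hash_repeat)
print("one edited file changes archive hash:", hash_before != hash_after)
```

In this sketch the remote would see one object instead of 1000, which is what keeps syncing efficient, at the cost of re-uploading that one object whenever any file inside it changes.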

