
pixolution / PixolutionImageDownloader
Lightweight bulk downloader for lists of image URLs, written in Python 3 by pixolution.org
It provides the following features:
- RateLimiter that throttles the maximum number of downloads per interval, using a simple token bucket algorithm without a queue
- Multithreaded downloads
- Preserves the context path of the images (http://foo.bar/imgs/abs/img.jpg is stored into imgs/abs/img.jpg)
- Creates a file img_list_name.txt_errors.log containing failed images
- Can store images into download folder tree or directly into a tar file
- Low memory usage even with huge URL lists, thanks to a BoundedExecutor that creates threads in chunks
- Download progress bar showing downloads/second (uses tqdm, at the cost of a larger memory footprint)
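The context-path feature in the list above can be illustrated with a minimal sketch. The function name `url_to_relative_path` and the example URL are hypothetical and are not taken from the project's source; this only shows one plausible way to derive a relative storage path from a URL.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def url_to_relative_path(url: str) -> str:
    """Map an image URL to a relative path that keeps the URL's directories.

    Hypothetical helper, not the project's actual implementation.
    """
    path = urlsplit(url).path
    # Drop empty, "." and ".." segments so the result stays inside
    # the download folder.
    parts = [p for p in path.split("/") if p not in ("", ".", "..")]
    if not parts:
        raise ValueError(f"URL has no path component: {url!r}")
    return str(PurePosixPath(*parts))


# Assumed example URL; the host name is discarded, the directories are kept.
relative = url_to_relative_path("http://example.org/photos/2020/cat.jpg")
print(relative)
```

Joining the returned value onto the download folder (or using it as a tar member name) keeps images from different directories from overwriting each other.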
Develop
Install the project into your local system as symlinked source:
Virtual environment
You should use venv when working on the project.
Run once in the project folder:
python3 -m venv .
source bin/activate
pip3 install -r requirements.txt
Run before working:
source bin/activate
[ DO YOUR DEVELOPMENT WORK ]
deactivate
Install
Install requirements:
sudo apt install python3-setuptools python3-pip
Install the project into your local system:
cd PixolutionImageDownloader/
python3 setup.py install
After installation it is available as pxl_downloader in your system's CLI. Use it like this:
pxl_downloader --threads=8 download --tarfile --ratelimit-interval=2 --ratelimit-downloads=50 samples.csv downloads/
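The --tarfile flag in the command above stores images directly in a tar archive rather than a folder tree. As a rough sketch of how downloaded bytes can be written into a tar file without a temporary file on disk, the snippet below uses Python's standard tarfile module. The helper name `add_bytes_to_tar`, the member name, and the payload are assumptions for illustration, not the project's code.

```python
import io
import tarfile
import time


def add_bytes_to_tar(tar: tarfile.TarFile, member_name: str, data: bytes) -> None:
    """Append in-memory bytes to an open tar archive (hypothetical helper)."""
    info = tarfile.TarInfo(name=member_name)
    info.size = len(data)  # must match the number of bytes supplied
    info.mtime = int(time.time())
    tar.addfile(info, io.BytesIO(data))


# An in-memory buffer stands in for a real archive file on disk.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    add_bytes_to_tar(tar, "imgs/abs/img.jpg", b"fake image bytes")

# Read the archive back to confirm the member was stored.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    names = tar.getnames()
    payload = tar.extractfile("imgs/abs/img.jpg").read()
```

When several download threads share one archive, the call to `addfile` would need to be guarded by a lock, since a `TarFile` object is not safe for concurrent writes.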
Uninstall it with:
python3 setup.py uninstall
Tests
To run a single test use:
python3 -m unittest tests/test_download_filetree.py
To run all available tests use:
python3 -m unittest discover tests/
Run the tool via the run.sh script in the project root, or with the pxl_downloader command after installation:
user@pixolution:~$ pxl_downloader --help
usage: pxl_downloader [-h] [--threads THREADS] [--verbose]
                      {download,status} ... image_list_file download_folder

Lightweight mass image downloader written in Python3.

positional arguments:
  {download,status}  available commands
    download         Download a list of images
    status           Check the download folder and the given image list
                     file and print some stats about that
  image_list_file    A file with urls defered by newlines
  download_folder    A folder to download the images to.

optional arguments:
  -h, --help         show this help message and exit
  --threads THREADS  Number of threads to download or status check in
                     parallel
  --verbose          Show each image url to download in stdout instead of
                     default progress bar

♥ Crafted with love in Berlin by pixolution.org ♥
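The --threads option and the BoundedExecutor mentioned in the feature list both concern bounded parallelism. The sketch below shows one common way to keep memory flat with a huge URL list: a thread pool whose submit call blocks once a fixed number of tasks are pending. This is an assumption about the general technique (a semaphore around ThreadPoolExecutor), not a copy of the project's BoundedExecutor, and `fake_download` is a hypothetical stand-in for a real download function.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class BoundedExecutor:
    """Thread pool whose submit() blocks once too many tasks are pending."""

    def __init__(self, max_workers: int, bound: int):
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        # One slot per worker plus `bound` tasks allowed to wait.
        self._slots = threading.BoundedSemaphore(max_workers + bound)

    def submit(self, fn, *args, **kwargs):
        self._slots.acquire()  # blocks the producer while the pool is full
        try:
            future = self._executor.submit(fn, *args, **kwargs)
        except BaseException:
            self._slots.release()
            raise
        # Free the slot when the task finishes, whether it succeeded or failed.
        future.add_done_callback(lambda _f: self._slots.release())
        return future

    def shutdown(self, wait: bool = True):
        self._executor.shutdown(wait=wait)


done = []


def fake_download(url):
    # Hypothetical stand-in for an HTTP download.
    done.append(url)


executor = BoundedExecutor(max_workers=4, bound=8)
# A generator keeps the URL list lazy; at most 12 tasks exist at any time.
for url in (f"http://example.org/img/{i}.jpg" for i in range(100)):
    executor.submit(fake_download, url)
executor.shutdown(wait=True)
```

Because the producer loop is throttled by the semaphore, the URL file can be read line by line instead of being loaded into memory up front.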
Download options:
user@pixolution:~$ pxl_downloader download --help
usage: pxl_downloader download [-h] [--tarfile] [--progressbar]
                               [--ratelimit-interval RATELIMIT_INTERVAL]
                               [--ratelimit-downloads RATELIMIT_DOWNLOADS]

optional arguments:
  -h, --help            show this help message and exit
  --tarfile             Store downloaded images directly into tarfile instead
                        of file structure
  --progressbar         Show a tqdm progress bar. This needs more RAM because
                        we need to put the image file list into RAM before we
                        can start.
  --ratelimit-interval RATELIMIT_INTERVAL
                        Interval in seconds (minimum 1.0) for the rate
                        limiter. Default is 1.0 seconds.
  --ratelimit-downloads RATELIMIT_DOWNLOADS
                        Number of downloads per interval (default interval 1
                        second). If negative no rate limit is applied. Default
                        is -1
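The two rate limit options describe a budget of downloads per interval. A minimal sketch of such a limiter is shown below, assuming a simple token bucket variant that refills in full once per interval and makes callers sleep and retry rather than wait in a queue. The class name `TokenBucket` and its parameters are illustrative and may differ from the project's RateLimiter; the injectable `clock` and `sleep` arguments exist only to make the sketch testable.

```python
import threading
import time


class TokenBucket:
    """Allow at most `max_downloads` acquisitions per `interval` seconds.

    A negative `max_downloads` disables the limit, mirroring the help text.
    """

    def __init__(self, max_downloads, interval=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self._max = max_downloads
        self._interval = interval
        self._clock = clock
        self._sleep = sleep
        self._tokens = max_downloads
        self._window_start = clock()
        self._lock = threading.Lock()

    def acquire(self):
        if self._max < 0:
            return  # no rate limit applied
        while True:
            with self._lock:
                now = self._clock()
                # Refill the whole bucket once the interval has elapsed.
                if now - self._window_start >= self._interval:
                    self._tokens = self._max
                    self._window_start = now
                if self._tokens > 0:
                    self._tokens -= 1
                    return
                wait = self._interval - (now - self._window_start)
            # No queue: sleep outside the lock, then try again.
            self._sleep(max(wait, 0.0))


# Corresponds to --ratelimit-interval=2 --ratelimit-downloads=50.
limiter = TokenBucket(max_downloads=50, interval=2.0)
for _ in range(3):
    limiter.acquire()  # each worker would call this before a download
```

Each download thread would call `acquire()` before issuing its request. Once the 50 tokens of a window are used up, further callers sleep until the next window begins.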