Search the Polona.pl website of the Polish National Library and download all images from publications
PyPolona is a free and open-source GUI (graphical) app that lets you search in and download images from the Polona.pl digital library. It also works as a ppolona CLI (command-line) tool, and it's a Python package available from PyPI. The source is on GitHub.
Polona.pl provides digitized books, magazines, graphics, maps, music, fliers and manuscripts from collections of the National Library of Poland and co-operating institutions.
With PyPolona, you can:
The PyPolona GUI version is built from the command-line version and uses the same settings as the ppolona tool.
Remember: to run the GUI on macOS for the first time, Ctrl+click the DMG, choose Open, then Open again, and drag the app to the Applications folder. There, Ctrl+click the app icon, choose Open, then Open again.
Current version is 1.6.2.
Unless you specify --no-text-pdf, PyPolona now downloads an additional PDF that has searchable text, if it's available.
--skip is now --no-overwrite.
-i saves each downloaded document as a subfolder with images; otherwise, each document is saved as one PDF (with metadata).
The --skip flag was introduced instead of the --overwrite flag.
On macOS, Ctrl+click the downloaded DMG, choose Open, then Open again, then drag the icon to the /Applications folder.
When you run the app for the first time, Ctrl+click PyPolona.app, choose Open, then click Open again. Later, you can just double-click the icon. If the app does not start, double-click it again.
On Windows, unzip the downloaded ZIP and double-click the setup_pypolona.exe icon to install the app. You need 64-bit Windows.
If you have Python 3.8+, you can install the Python version with python3 -m pip install pypolona.
On macOS, Ctrl+click /Applications/PyPolona.app and choose Open, then choose Open again. The next time, you can just double-click it to run it.
On Windows, run PyPolona from your Start menu.
If you installed the Python version, run ppolona or python3 -m pypolona.
In the Input tab:
In query, you can paste one or more URLs from Polona.pl (space-separated).
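In Choose One, you can change what the query field means. By default, the query is one or more Polona.pl URLs. Choose search to enter a search query, e.g. adam mickiewicz, and go to the Options tab to customize the search. Choose advanced to enter an advanced search query, or ids to enter space-separated Polona IDs.
In the Options tab: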
In Space-separated languages, you can enter a space-separated list of languages as Polona names them, e.g. polski niemiecki angielski. Use the sidebar on the Polona website to find them.
In Sort search, you can sort the results by score (descending), date (descending or ascending), title (ascending) or creator (ascending).
In Output search results in format, you can choose the format of the search results: ids, urls, yaml or json. If you choose ids, you can click Restart and paste the IDs back into the query field. Choose urls to get clickable links.
In Save search results, you can optionally save the search results to a file.
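The search settings above correspond to the -S, -l, -s, -f and -o flags in the CLI help. Here is a minimal sketch for scripting searches from Python. The helper build_search_args is hypothetical and not part of pypolona. It only assembles the argument list and does not contact Polona. It also assumes that ppolona treats separate query words as one search query.

```python
import shlex

# Choices copied from the ppolona help output (--sort and --format)
SORT_CHOICES = {"score desc", "date desc", "date asc", "title asc", "creator asc"}
FORMAT_CHOICES = {"ids", "urls", "yaml", "json"}


def build_search_args(query, languages=(), sort="score desc", fmt="urls", output=None):
    """Hypothetical helper: assemble a ppolona command line for a plain search."""
    if sort not in SORT_CHOICES:
        raise ValueError(f"unsupported sort: {sort!r}")
    if fmt not in FORMAT_CHOICES:
        raise ValueError(f"unsupported format: {fmt!r}")
    args = ["ppolona", "-S"]
    if languages:
        # -l takes any number of values, so keep it before other flags;
        # otherwise the trailing query words would be read as languages
        args += ["-l", *languages]
    args += ["-s", sort, "-f", fmt]
    if output:
        args += ["-o", output]
    # query words go last, as the positional "query [query ...]"
    return args + query.split()


args = build_search_args("adam mickiewicz", languages=["polski"], sort="date asc")
print(shlex.join(args))
# ppolona -S -l polski -s 'date asc' -f urls adam mickiewicz
```

Once pypolona is installed, you could pass the resulting list to subprocess.run.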
In the Input tab, turn on Download found docs to download the documents that your query finds.
Turn on Download JPEGs into subfolders to download each document's content as a series of JPEGs. In the download folder, one subfolder will be created per document. The subfolder name starts with the publication year, then part of the title, then the ID. If you are also downloading searchable PDFs, an additional PDF with the _text suffix will be saved in the subfolder. A YAML file with some metadata will also be saved in the subfolder.
Turn off Download JPEGs into subfolders to download each document's content as one PDF. The app will not create subfolders. The PDF name starts with the publication year, then part of the title, then the ID. If you are also downloading searchable PDFs, an additional PDF with the _text suffix will also be saved.
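The naming scheme above can be sketched as follows. The helpers doc_basename and output_paths are hypothetical illustrations, and several details are assumptions rather than PyPolona's exact rules: the 40-character title limit, the hyphen separators, the character filtering, and the name of the _text PDF. The ID "abc123" is a placeholder.

```python
import re
from pathlib import Path


def doc_basename(year, title, doc_id, title_chars=40):
    """Hypothetical: build '<year>-<part of title>-<id>' as described above."""
    # keep only the first few characters of the title (assumed limit)
    part = title[:title_chars].strip()
    # replace runs of characters that are awkward in file names with "_";
    # \w is Unicode-aware, so Polish letters such as "ę" are kept
    part = re.sub(r"[^\w\-]+", "_", part).strip("_")
    return f"{year}-{part}-{doc_id}"


def output_paths(folder, year, title, doc_id, images):
    """Hypothetical: return (main output path, searchable '_text' PDF path)."""
    base = doc_basename(year, title, doc_id)
    if images:
        # JPEG mode: one subfolder per document; the _text PDF goes inside it
        sub = Path(folder) / base
        return sub, sub / f"{base}_text.pdf"
    # PDF mode: no subfolder; the _text PDF sits next to the main PDF
    return Path(folder) / f"{base}.pdf", Path(folder) / f"{base}_text.pdf"


print(doc_basename(1834, "Pan Tadeusz", "abc123"))  # 1834-Pan_Tadeusz-abc123
```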
In the Options tab:
In Save downloaded docs in this folder, you can choose the folder into which the app will download the documents. By default, it uses the polona subfolder on your desktop.
In Download max pages, you can limit the maximum number of pages that the app downloads for each document. Set it to 0 to download all pages. This is useful for test downloads, since some documents may have hundreds of pages.
For some documents, Polona has an extra lower-resolution searchable PDF. By default, that PDF is also downloaded and saved with a _text suffix. Turn on Skip downloading searchable PDFs to skip these additional PDFs.
By default, the app will re-download and overwrite previously downloaded documents. Turn on Skip existing subfolders/PDFs to skip them.
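The download settings map onto the -D, -i, -d, -M, -T and -O flags listed in the CLI help. This minimal sketch assumes pypolona is installed, so that ppolona is on your PATH. build_download_args and run_download are hypothetical helpers. Only run_download actually starts a download, which needs network access.

```python
import subprocess


def build_download_args(urls, folder=None, images=False, max_pages=0,
                        skip_text_pdf=False, no_overwrite=False):
    """Hypothetical helper: ppolona argv that downloads the given Polona.pl URLs."""
    args = ["ppolona", "-D"]               # download found docs
    if images:
        args.append("-i")                  # JPEGs in subfolders instead of one PDF
    if folder is not None:
        args += ["-d", str(folder)]        # otherwise: polona subfolder on the desktop
    if max_pages:
        args += ["-M", str(max_pages)]     # 0 means all pages, so omit it
    if skip_text_pdf:
        args.append("-T")                  # skip the extra searchable _text PDF
    if no_overwrite:
        args.append("-O")                  # skip existing subfolders/PDFs
    # URLs are the default query type, so no -S/-A/-I flag is needed
    return args + list(urls)


def run_download(urls, **options):
    """Run ppolona; raises CalledProcessError if it exits with an error."""
    return subprocess.run(build_download_args(urls, **options), check=True)
```

For a quick test download, you might call build_download_args with max_pages=3 and no_overwrite=True, passing one or more URLs copied from Polona.pl.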
Note: the CLI is ppolona, not pypolona.
To see the CLI help on macOS, run:
/Applications/PyPolona.app/Contents/MacOS/ppolona -h
If you installed the Python version, run:
ppolona -h
or python3 -m pypolona -h
usage: ppolona [-h] [-S | -A | -I] [-D] [-i] [-l [language [language ...]]] [-s {score desc,date desc,date asc,title asc,creator asc}]
[-f {ids,urls,yaml,json}] [-o results_file] [-d download_folder] [-M num_pages] [-T] [-O]
query [query ...]
Search in and download from Polona.pl. GUI: Help › PyPolona 1.6.0 Help. CLI: ppolona -h
optional arguments:
-h, --help show this help message and exit
Input:
query query is a Polona.pl URL unless you choose search, advanced or ids
-S, --search Query is search query, see Options
-A, --advanced Query is advanced search query, see Documentation
-I, --ids Query is space-separated IDs
-D, --download Download found docs, see Options
-i, --images Download JPEGs into subfolders instead of PDF
Options:
-l [language [language ...]], --lang [language [language ...]]
Space-separated languages: polski angielski niemiecki...
-s {score desc,date desc,date asc,title asc,creator asc}, --sort {score desc,date desc,date asc,title asc,creator asc}
Sort search results by score, date, title or creator (descending or ascending)
-f {ids,urls,yaml,json}, --format {ids,urls,yaml,json}
Output search results in format
-o results_file, --output results_file
Save search results to this file
-d download_folder, --download-dir download_folder
Save downloaded docs in this folder
-M num_pages, --max-pages num_pages
Download max pages per doc (0: all)
-T, --no-text-pdf Skip downloading searchable PDFs
-O, --no-overwrite Skip existing subfolders/PDFs
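The help output follows standard argparse conventions. [-S | -A | -I] marks a mutually exclusive group, which the GUI shows as Choose One. The Input and Options sections are argument groups that appear to match the GUI tabs. This minimal sketch rebuilds part of the interface with the standard argparse module. It is reconstructed from the help text above, not taken from pypolona's source. The missing defaults for --sort and --format, and the default of 0 for --max-pages, are assumptions.

```python
import argparse

SORTS = ["score desc", "date desc", "date asc", "title asc", "creator asc"]
FORMATS = ["ids", "urls", "yaml", "json"]


def make_parser():
    """Rebuild a subset of the ppolona interface from its -h output (sketch)."""
    parser = argparse.ArgumentParser(
        prog="ppolona", description="Search in and download from Polona.pl.")

    # "Input" group: the query, how to interpret it, and whether to download
    inp = parser.add_argument_group("Input")
    inp.add_argument("query", nargs="+",
                     help="Polona.pl URL unless you choose search, advanced or ids")
    mode = inp.add_mutually_exclusive_group()  # at most one of -S, -A, -I
    mode.add_argument("-S", "--search", action="store_true",
                      help="Query is search query")
    mode.add_argument("-A", "--advanced", action="store_true",
                      help="Query is advanced search query")
    mode.add_argument("-I", "--ids", action="store_true",
                      help="Query is space-separated IDs")
    inp.add_argument("-D", "--download", action="store_true",
                     help="Download found docs")
    inp.add_argument("-i", "--images", action="store_true",
                     help="Download JPEGs into subfolders instead of PDF")

    # "Options" group: search and download settings
    opt = parser.add_argument_group("Options")
    opt.add_argument("-l", "--lang", nargs="*", metavar="language",
                     help="Space-separated languages")
    opt.add_argument("-s", "--sort", choices=SORTS)
    opt.add_argument("-f", "--format", choices=FORMATS)
    opt.add_argument("-o", "--output", metavar="results_file")
    opt.add_argument("-d", "--download-dir", metavar="download_folder")
    opt.add_argument("-M", "--max-pages", type=int, default=0, metavar="num_pages",
                     help="Download max pages per doc (0: all)")
    opt.add_argument("-T", "--no-text-pdf", action="store_true")
    opt.add_argument("-O", "--no-overwrite", action="store_true")
    return parser


ns = make_parser().parse_args(
    ["-S", "-l", "polski", "niemiecki", "-s", "date asc", "adam", "mickiewicz"])
print(ns.search, ns.lang, ns.sort, ns.query)
# True ['polski', 'niemiecki'] date asc ['adam', 'mickiewicz']
```

Passing both -S and -I makes argparse print an error and exit. In the GUI, that is the situation Choose One prevents.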
The Polona website is a bit overcomplicated to use, but fortunately, Polona publishes a JSON API, which the pypolona package uses.
The GUI is generated from the ppolona CLI, turning its argparse module definitions into a simple GUI app. This project serves as a good example of how this can be done.
To build the Python package, the Mac DMG and the Windows EXE (via Wine):
./macdeploy prep && ./macdeploy build
pip3 install --user --upgrade .[dev]
python -m PyInstaller --distpath="app/build/dist-win" --workpath="app/build" -y "app/pyinstaller-win.spec"
"C:\Program Files (x86)\Inno Setup 6\ISCC.exe" /dMyAppVersion="1.1.7" app/pypolona.iss /Q
Copyright © 2020 Adam Twardoch. Licensed under the terms of the MIT license. This project is not affiliated with and not endorsed by Polona.