| Field | Value | Date |
|---|---|---|
| author | Cody Logan <clpo13@gmail.com> | 2023-10-03 09:51:58 -0700 |
| committer | Cody Logan <clpo13@gmail.com> | 2023-10-03 09:51:58 -0700 |
| commit | 485df31f095a9b629a1dcc04af13956325856d8c | |
| tree | 04caf1ccaa5b584da84ae903ed864c97ea2a887c | |
| parent | b6fac1b7c0962e48a8f708efc9f535bb8552c9c6 | |
Update README and do some code cleanup
| Mode | File | Lines changed |
|---|---|---|
| -rw-r--r-- | README.md | 102 |
| -rw-r--r-- | src/wikiget/__init__.py | 2 |
| -rw-r--r-- | src/wikiget/dl.py | 17 |
| -rw-r--r-- | src/wikiget/validations.py | 2 |
| -rw-r--r-- | src/wikiget/wikiget.py | 47 |
| -rw-r--r-- | wikiget.1 | 8 |
| -rw-r--r-- | wikiget.1.md | 49 |
7 files changed, 99 insertions, 128 deletions
@@ -3,49 +3,39 @@
 [](https://github.com/clpo13/wikiget/actions/workflows/python.yml)
 [](https://badge.fury.io/py/wikiget)

-Something like wget for downloading a file from MediaWiki sites (like Wikipedia
-or Wikimedia Commons) using only the file name or the URL of its description
-page.
+Something like wget for downloading a file from MediaWiki sites (like Wikipedia or Wikimedia Commons) using only the
+file name or the URL of its description page.

 Requires Python 3.7+. Get it with `pip install --user wikiget` or `pipx install wikiget`.

 ## Usage

-`wikiget [-h] [-V] [-q | -v] [-f] [-s SITE] [-p PATH] [--username USERNAME]
-[--password PASSWORD] [-o OUTPUT | -a] [-l LOGFILE] FILE`
-
-If `FILE` is in the form `File:Example.jpg` or `Image:Example.jpg`, it will be
-fetched from the default site, which is "commons.wikimedia.org". If it's the
-fully-qualified URL of a file description page, like
-`https://en.wikipedia.org/wiki/File:Example.jpg`, the file is fetched from the
-specified site, in this case "en.wikipedia.org". Full URLs may contain
-characters your shell interprets differently, so you can either escape those
-characters with a backslash `\` or surround the entire URL with single `'` or
-double `"` quotes. Use of a fully-qualified URL like this may require setting
-the `--path` flag (see next paragraph).
-
-The site can also be specified with the `--site` flag, though this will not have
-any effect if the full URL is given. Non-Wikimedia sites should work, but you
-may need to specify the wiki's script path with `--path` (where `index.php` and
-`api.php` live; on Wikimedia sites it's `/w/`, but other sites may use `/` or
-something else entirely). Private wikis (those requiring login even for read
-access) are also supported with the use of the `--username` and `--password`
-flags.
-
-More detailed information, such as the site used and full URL of the file, can
-be displayed with `-v` or `--verbose`. Use `-vv` to display even more detail,
-mainly debugging information or API messages. `-q` can be used to silence warnings.
-A logfile can be specified with `-l` or `--logfile`. If this option is present, the
-logfile will contain the same information as `-v` along with timestamps. New log
-entries will be appended to an existing logfile.
-
-By default, the program won't overwrite existing files with the same name as the
-target, but this can be forced with `-f` or `--force`. Additionally, the file
-can be downloaded to a different name with `-o`.
-
-Files can be batch downloaded with the `-a` or `--batch` flag. In this mode,
-`FILE` will be treated as an input file containing multiple files to download,
-one filename or URL per line. If an error is encountered, execution stops
+`wikiget [-h] [-V] [-q | -v] [-f] [-s SITE] [-p PATH] [--username USERNAME] [--password PASSWORD] [-o OUTPUT | -a] [-l LOGFILE] FILE`
+
+The only required parameter is `FILE`, which is the file you want to download. It can either be the name of the file on
+the wiki, including the namespace prefix, or a link to the file description page. If `FILE` is in the form
+`File:Example.jpg` or `Image:Example.jpg`, it will be fetched from the default site, which is "commons.wikimedia.org".
+If it's the fully-qualified URL of a file description page, like `https://en.wikipedia.org/wiki/File:Example.jpg`, the
+file is fetched from the site in the URL, in this case "en.wikipedia.org". Note: full URLs may contain characters your
+shell interprets differently, so you can either escape those characters with a backslash `\` or surround the entire URL
+with single `'` or double `"` quotes. Use of a fully-qualified URL like this may require setting the `--path` flag (see
+next paragraph).
+
+The site can also be specified with the `--site` flag, though this will not have any effect if the full URL is given.
+Non-Wikimedia sites should work, but you may need to specify the wiki's script path with `--path` (where `index.php` and
+`api.php` live; on Wikimedia sites it's `/w/`, but other sites may use `/` or something else entirely). Private wikis
+(those requiring login even for read access) are also supported with the use of the `--username` and `--password` flags.
+
+More detailed information, such as the site used and full URL of the file, can be displayed with `-v` or `--verbose`.
+Use `-vv` to display even more detail, mainly debugging information or API messages. `-q` can be used to silence
+warnings. A logfile can be specified with `-l` or `--logfile`. If this option is present, the logfile will contain the
+same information as `-v` along with timestamps. New log entries will be appended to an existing logfile.
+
+By default, the program won't overwrite existing files with the same name as the target, but this can be forced with
+`-f` or `--force`. Additionally, the file can be downloaded to a different name with `-o`.
+
+Files can be batch downloaded with the `-a` or `--batch` flag. In this mode, `FILE` will be treated as an input file
+containing multiple files to download, one filename or URL per line. If an error is encountered, execution stops
 immediately and the offending filename is printed.

 ### Example usage
@@ -70,13 +60,11 @@ Pull requests, bug reports, or feature requests are more than welcome.

 It's recommended that you use a
 [virtual environment manager](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/)
-like venv or [virtualenv](https://virtualenv.pypa.io/en/latest/) to create an
-isolated environment in which to install this package's dependencies so as not
-to clutter your system Python environment:
+like venv or [virtualenv](https://virtualenv.pypa.io/en/latest/) to create an isolated environment in which to install
+this package's dependencies so as not to clutter your system Python environment:

 ```bash
-# if you plan on submitting pull requests, fork the repo on GitHub
-# and clone that instead
+# if you plan on submitting pull requests, fork the repo on GitHub and clone that instead
 git clone https://github.com/clpo13/wikiget
 cd wikiget

@@ -97,28 +85,22 @@ source venv/bin/activate

 Then run `pip install -e .` to invoke an
 ["editable" install](https://pip.pypa.io/en/stable/topics/local-project-installs/#editable-installs)
-meaning any changes made to the source will be reflected immediately in the executable
-script. Unit tests can be run with `pytest` (make sure to run `pip install pytest`
-in the virtual environment first.)
+meaning any changes made to the source will be reflected immediately in the executable script. Unit tests can be run
+with `pytest` (make sure to run `pip install pytest` in the virtual environment first.)

-Alternatively, using [Hatch](https://hatch.pypa.io/latest/), simply clone the repository
-and run `hatch run test` to create the environment and run pytest. Also try `hatch shell`
-or `hatch run wikiget --help`.
+Alternatively, using [Hatch](https://hatch.pypa.io/latest/), simply clone the repository and run `hatch run test` to
+create the environment and run pytest. Also try `hatch shell` or `hatch run wikiget --help`.

 ## License

 Copyright (C) 2018-2023 Cody Logan and contributors

-This program is free software: you can redistribute it and/or modify
-it under the terms of the GNU General Public License as published by
-the Free Software Foundation, either version 3 of the License, or
-(at your option) any later version.
+This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public
+License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later
+version.

-This program is distributed in the hope that it will be useful,
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
-GNU General Public License for more details.
+This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied
+warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

-You should have received a copy of the GNU General Public License
-along with this program (see [LICENSE](LICENSE)). If not, see
-<https://www.gnu.org/licenses/>.
+You should have received a copy of the GNU General Public License along with this program (see [LICENSE](LICENSE)).
+If not, see <https://www.gnu.org/licenses/>.
diff --git a/src/wikiget/__init__.py b/src/wikiget/__init__.py
index 5b917cf..3946868 100644
--- a/src/wikiget/__init__.py
+++ b/src/wikiget/__init__.py
@@ -1,5 +1,5 @@
 # wikiget - CLI tool for downloading files from Wikimedia sites
-# Copyright (C) 2018-2021 Cody Logan and contributors
+# Copyright (C) 2018-2023 Cody Logan and contributors
 # SPDX-License-Identifier: GPL-3.0-or-later
 #
 # Wikiget is free software: you can redistribute it and/or modify
diff --git a/src/wikiget/dl.py b/src/wikiget/dl.py
index 791db61..d32736f 100644
--- a/src/wikiget/dl.py
+++ b/src/wikiget/dl.py
@@ -1,5 +1,5 @@
 # wikiget - CLI tool for downloading files from Wikimedia sites
-# Copyright (C) 2018-2021 Cody Logan and contributors
+# Copyright (C) 2018-2023 Cody Logan and contributors
 # SPDX-License-Identifier: GPL-3.0-or-later
 #
 # Wikiget is free software: you can redistribute it and/or modify
@@ -65,8 +65,8 @@ def download(dl, args):
         if args.username and args.password:
             site.login(args.username, args.password)
     except ConnectionError as e:
-        # usually this means there is no such site, or there's no network
-        # connection, though it could be a certificate problem
+        # usually this means there is no such site, or there's no network connection,
+        # though it could be a certificate problem
         logging.error("Couldn't connect to specified site.")
         logging.debug("Full error message:")
         logging.debug(e)
@@ -80,8 +80,8 @@ def download(dl, args):
         logging.debug(e)
         sys.exit(1)
     except (InvalidResponse, LoginError) as e:
-        # InvalidResponse: site exists, but we couldn't communicate with the
-        # API endpoint for some reason other than an HTTP error.
+        # InvalidResponse: site exists, but we couldn't communicate with the API
+        # endpoint for some reason other than an HTTP error.
         # LoginError: missing or invalid credentials
         logging.error(e)
         sys.exit(1)
@@ -90,8 +90,8 @@ def download(dl, args):
     try:
         file = site.images[filename]
     except APIError as e:
-        # an API error at this point likely means access is denied,
-        # which could happen with a private wiki
+        # an API error at this point likely means access is denied, which could happen
+        # with a private wiki
         logging.error(
             "Access denied. Try providing credentials with "
             "--username and --password."
@@ -102,8 +102,7 @@ def download(dl, args):
         sys.exit(1)

     if file.imageinfo != {}:
-        # file exists either locally or at a common repository,
-        # like Wikimedia Commons
+        # file exists either locally or at a common repository, like Wikimedia Commons
         file_url = file.imageinfo["url"]
         file_size = file.imageinfo["size"]
         file_sha1 = file.imageinfo["sha1"]
diff --git a/src/wikiget/validations.py b/src/wikiget/validations.py
index dc70df4..8ebd996 100644
--- a/src/wikiget/validations.py
+++ b/src/wikiget/validations.py
@@ -1,5 +1,5 @@
 # wikiget - CLI tool for downloading files from Wikimedia sites
-# Copyright (C) 2018, 2019, 2020 Cody Logan
+# Copyright (C) 2018-2020 Cody Logan
 # SPDX-License-Identifier: GPL-3.0-or-later
 #
 # Wikiget is free software: you can redistribute it and/or modify
diff --git a/src/wikiget/wikiget.py b/src/wikiget/wikiget.py
index bc6de38..934107e 100644
--- a/src/wikiget/wikiget.py
+++ b/src/wikiget/wikiget.py
@@ -1,5 +1,5 @@
 # wikiget - CLI tool for downloading files from Wikimedia sites
-# Copyright (C) 2018-2021 Cody Logan and contributors
+# Copyright (C) 2018-2023 Cody Logan and contributors
 # SPDX-License-Identifier: GPL-3.0-or-later
 #
 # Wikiget is free software: you can redistribute it and/or modify
@@ -25,32 +25,27 @@ from wikiget.dl import download

 def main():
     """
-    Main entry point for console script. Automatically compiled by setuptools
-    when installed with `pip install` or `python setup.py install`.
+    Main entry point for console script. Automatically compiled by setuptools when
+    installed with `pip install` or `python setup.py install`.
     """

     parser = argparse.ArgumentParser(
         description="""
-        A tool for downloading files from
-        MediaWiki sites using the file name or
+        A tool for downloading files from MediaWiki sites using the file name or
         description page URL
         """,
         epilog="""
-        Copyright (C) 2018-2023 Cody Logan
-        and contributors.
-        License GPLv3+: GNU GPL version 3 or later
-        <http://www.gnu.org/licenses/gpl.html>.
-        This is free software; you are free to
-        change and redistribute it under certain
-        conditions. There is NO WARRANTY, to the
-        extent permitted by law.
+        Copyright (C) 2018-2023 Cody Logan and contributors. License GPLv3+: GNU GPL
+        version 3 or later <http://www.gnu.org/licenses/gpl.html>. This is free
+        software; you are free to change and redistribute it under certain conditions.
+        There is NO WARRANTY, to the extent permitted by law.
         """,
     )
     parser.add_argument(
         "FILE",
         help="""
-        name of the file to download with the File:
-        prefix, or the URL of its file description page
+        name of the file to download with the File: prefix, or the URL of its file
+        description page
         """,
     )
     parser.add_argument(
@@ -96,9 +91,8 @@ def main():
     output_options.add_argument(
         "-a",
         "--batch",
-        help="treat FILE as a textfile containing "
-        "multiple files to download, one URL or "
-        "filename per line",
+        help="treat FILE as a textfile containing multiple files to download, one URL "
+        "or filename per line",
         action="store_true",
     )
     parser.add_argument(
@@ -117,7 +111,7 @@ def main():
         loglevel = logging.ERROR

     # configure logging:
-    # console log level is set via -v, -vv, and -q options
+    # console log level is set via -v, -vv, and -q options;
     # file log level is always info (TODO: add debug option)
     if args.logfile:
         # log to console and file
@@ -128,8 +122,8 @@ def main():
         )

         console = logging.StreamHandler()
-        # TODO: even when loglevel is set to logging.DEBUG,
-        # debug messages aren't printing to console
+        # TODO: even when loglevel is set to logging.DEBUG, debug messages aren't
+        # printing to console
         console.setLevel(loglevel)
         console.setFormatter(logging.Formatter("[%(levelname)s] %(message)s"))
         logging.getLogger("").addHandler(console)
@@ -137,8 +131,8 @@ def main():
         # log only to console
         logging.basicConfig(level=loglevel, format="[%(levelname)s] %(message)s")

-    # log events are appended to the file if it already exists,
-    # so note the start of a new download session
+    # log events are appended to the file if it already exists, so note the start of a
+    # new download session
     logging.info(f"Starting download session using wikiget {wikiget.wikiget_version}")
     # logging.info(f"Log level is set to {loglevel}")

@@ -159,16 +153,15 @@ def main():
             sys.exit(1)
         else:
             with fd:
-                # store file contents in memory in case something
-                # happens to the file while we're downloading
+                # store file contents in memory in case something happens to the file
+                # while we're downloading
                 for _, line in enumerate(fd):
                     dl_list.append(line)

         # TODO: validate file contents before download process starts
         for line_num, url in enumerate(dl_list, start=1):
             s_url = url.strip()
-            # keep track of batch file line numbers for
-            # debugging/logging purposes
+            # keep track of batch file line numbers for debugging/logging purposes
             logging.info(f"Downloading '{s_url}' at line {line_num}:")
             download(s_url, args)
     else:
@@ -1,6 +1,6 @@
 .\" Automatically generated by Pandoc 3.1.8
 .\"
-.TH "WIKIGET" "1" "October 2, 2023" "Version 0.5.1" "Wikiget User Manual"
+.TH "WIKIGET" "1" "October 3, 2023" "Version 0.5.1" "Wikiget User Manual"
 .SH NAME
 wikiget - download files from MediaWiki sites
 .SH SYNOPSIS
@@ -70,7 +70,7 @@ download process.
 If the logfile already exists, new log information is appended to it.
 .TP
 -\f[B]f\f[R], --\f[B]force\f[R]
-Force overwritng of existing files.
+Force existing files to be overwritten.
 .TP
 -\f[B]a\f[R], --\f[B]batch\f[R]
 If this flag is set, \f[B]wikiget\f[R] will run in batch download mode
@@ -94,7 +94,7 @@ wikiget --site en.wikipedia.org File:Example.jpg
 wikiget https://en.wikipedia.org/wiki/File:Example.jpg -o test.jpg
 .EE
 .SH BUG REPORTS
-<https://github.com/clpo13/wikiget/issues>
+https://github.com/clpo13/wikiget/issues
 .SH LICENSE
 Copyright (C) 2018-2023 Cody Logan and contributors
 .PP
@@ -110,6 +110,6 @@ See the GNU General Public License for more details.
 .PP
 You should have received a copy of the GNU General Public License along
 with this program.
-If not, see <https://www.gnu.org/licenses/>.
+If not, see https://www.gnu.org/licenses/.
 .SH AUTHORS
 Cody Logan <clpo13@gmail.com>.
diff --git a/wikiget.1.md b/wikiget.1.md
index 11ab708..66227dc 100644
--- a/wikiget.1.md
+++ b/wikiget.1.md
@@ -1,6 +1,6 @@
 % WIKIGET(1) Version 0.5.1 | Wikiget User Manual
 % Cody Logan <clpo13@gmail.com>
-% October 2, 2023
+% October 3, 2023

 # NAME

@@ -15,23 +15,23 @@ wikiget - download files from MediaWiki sites

 # DESCRIPTION

-Something like **wget**(1) for downloading a file from MediaWiki sites (like Wikipedia or Wikimedia Commons)
-using only the file name or the URL of its description page.
+Something like **wget**(1) for downloading a file from MediaWiki sites (like Wikipedia or Wikimedia Commons) using only
+the file name or the URL of its description page.

 # OPTIONS

 *FILE*

-: The file to be downloaded. If *FILE* is in the form *File:Example.jpg* or *Image:Example.jpg*, it will be
-  fetched from the default site, which is "commons.wikimedia.org". If it's the fully-qualified URL of a file
-  description page, like *https://en.wikipedia.org/wiki/File:Example.jpg*, the file is fetched from the site
-  in the URL, in this case "en.wikipedia.org".
+: The file to be downloaded. If *FILE* is in the form *File:Example.jpg* or *Image:Example.jpg*, it will be fetched
+  from the default site, which is "commons.wikimedia.org". If it's the fully-qualified URL of a file description page,
+  like *https://en.wikipedia.org/wiki/File:Example.jpg*, the file is fetched from the site in the URL, in this case
+  "en.wikipedia.org".

 *BATCHFILE*

-: In batch download mode (activated with \-**a** or \-\-**batch**), this is a text file containing multiple
-  file names or URLs to be downloaded, one per line. If an error is encountered during download, execution
-  stops immediately and the offending filename is printed.
+: In batch download mode (activated with \-**a** or \-\-**batch**), this is a text file containing multiple file names
+  or URLs to be downloaded, one per line. If an error is encountered during download, execution stops immediately and
+  the offending filename is printed.

 \-**s**, \-\-**site** *SITE*

@@ -39,8 +39,8 @@ using only the file name or the URL of its description page.

 \-**p**, \-\-**path** *PATH*

-: Script path for the wiki, where "index.php" and "api.php" live. On Wikimedia sites, it's "/w/", the default,
-  but other sites may use "/" or something else entirely.
+: Script path for the wiki, where "index.php" and "api.php" live. On Wikimedia sites, it's "/w/", the default, but
+  other sites may use "/" or something else entirely.

 \-\-**username** *USERNAME*

@@ -52,8 +52,8 @@ using only the file name or the URL of its description page.

 \-**o**, \-\-**output** *OUTPUT*

-: By default, the output filename is the same as the remote filename (without the File: or Image: prefix),
-  but this can be changed with this option.
+: By default, the output filename is the same as the remote filename (without the File: or Image: prefix), but this
+  can be changed with this option.

 \-**l**, \-\-**logfile** *LOGFILE*

@@ -62,7 +62,7 @@ using only the file name or the URL of its description page.

 \-**f**, \-\-**force**

-: Force overwritng of existing files.
+: Force existing files to be overwritten.

 \-**a**, \-\-**batch**

@@ -91,21 +91,18 @@ wikiget https://en.wikipedia.org/wiki/File:Example.jpg -o test.jpg

 # BUG REPORTS

-<https://github.com/clpo13/wikiget/issues>
+https://github.com/clpo13/wikiget/issues

 # LICENSE

 Copyright (C) 2018-2023 Cody Logan and contributors

-This program is free software: you can redistribute it and/or modify
-it under the terms of the GNU General Public License as published by
-the Free Software Foundation, either version 3 of the License, or
-(at your option) any later version.
+This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public
+License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later
+version.

-This program is distributed in the hope that it will be useful,
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
-GNU General Public License for more details.
+This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied
+warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

-You should have received a copy of the GNU General Public License
-along with this program. If not, see <https://www.gnu.org/licenses/>.
+You should have received a copy of the GNU General Public License along with this program. If not, see
+https://www.gnu.org/licenses/.
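
Note on the logging TODO in `src/wikiget/wikiget.py`: the comment says debug messages do not reach the console even when the console handler is set to `logging.DEBUG`. One plausible cause, assuming the `logging.basicConfig` call whose arguments this diff does not show sets the root logger to `INFO`, is that the logger discards `DEBUG` records before any handler sees them. The sketch below is not the project's code: `build_logger`, the logger name `wikiget_sketch`, and the in-memory streams are hypothetical stand-ins. It only illustrates setting the logger itself to `DEBUG` and letting each handler filter at its own level.

```python
import io
import logging


def build_logger(console_level, console_stream=None, file_stream=None):
    """Return a logger whose handlers, not the logger itself, decide what is shown.

    Hypothetical helper for illustration; wikiget configures the root logger instead.
    """
    logger = logging.getLogger("wikiget_sketch")  # hypothetical logger name
    logger.handlers.clear()       # start clean if called more than once
    logger.propagate = False      # keep records away from the root logger
    logger.setLevel(logging.DEBUG)  # let every record through to the handlers

    # Console handler: level follows the -v / -vv / -q choice.
    console = logging.StreamHandler(console_stream)  # None means sys.stderr
    console.setLevel(console_level)
    console.setFormatter(logging.Formatter("[%(levelname)s] %(message)s"))
    logger.addHandler(console)

    # Optional second handler standing in for the logfile: always INFO.
    if file_stream is not None:
        logfile = logging.StreamHandler(file_stream)
        logfile.setLevel(logging.INFO)
        logfile.setFormatter(
            logging.Formatter("%(asctime)s [%(levelname)s] %(message)s")
        )
        logger.addHandler(logfile)

    return logger


if __name__ == "__main__":
    console_out, file_out = io.StringIO(), io.StringIO()
    log = build_logger(logging.DEBUG, console_out, file_out)
    log.debug("API response received")      # console only
    log.info("Starting download session")   # console and logfile stand-in
    print(console_out.getvalue(), end="")
    print(file_out.getvalue(), end="")
```

With this arrangement the console can show `DEBUG` output while the file handler stays at `INFO`; whether the same change resolves the TODO in wikiget depends on how the hidden `basicConfig` call is written.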
