Using grep and wget to download all hyperlinks to .pdf files
Fri. January 28, 2011
Categories: Information, Linux
Tags: bash script, download, grep, Linux, pdf, wget
I had a site that contained a load of PDFs that I wanted to download. To save myself from clicking on each one, I did some googling and found how to download every linked file ending in .pdf from a saved copy of the page (index.html):
grep -o -e 'http://[^[:space:]"]*\.pdf' index.html | xargs wget
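To see what the grep stage hands to wget, here is a small offline sketch. The sample page and its example.com links are made up for illustration, and the pattern is quoted so the shell passes it to grep unchanged, with the dot before pdf escaped so it only matches a literal dot.

```shell
# Write a tiny made-up page to a temporary file to test the pattern on.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<a href="http://example.com/docs/a.pdf">A</a>
<a href="http://example.com/page.html">Page</a>
<a href="http://example.com/docs/b.pdf">B</a>
EOF

# -o prints only the matched text, one match per line.
links=$(grep -o -e 'http://[^[:space:]"]*\.pdf' "$sample")
echo "$links"
rm -f "$sample"
```

Only the two .pdf URLs should be printed; the .html link is skipped. Note that the pattern only finds absolute http:// links, so relative links such as href="docs/a.pdf" would be missed.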
For an even better approach, you can make a little shell script that takes the URL as a parameter:
#!/bin/sh
# Fetch the page given as the first argument and download every linked .pdf.
url=$1
curl -s "$url" | grep -o -e 'http://[^[:space:]"]*\.pdf' | xargs wget
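If the page mixes http and https links or repeats the same link, a slightly extended variant may help. This is only a sketch: the function name extract_pdf_links is hypothetical, and the https matching and de-duplication are additions that are not part of the original one-liner.

```shell
# extract_pdf_links: hypothetical helper. Reads HTML on stdin and prints
# each distinct absolute .pdf URL (http or https) on its own line.
extract_pdf_links() {
    grep -E -o 'https?://[^[:space:]"]*\.pdf' | sort -u
}

# Offline demo on made-up input; the duplicated link is listed only once.
demo=$(printf '%s\n' \
    '<a href="https://example.com/x.pdf">X</a>' \
    '<a href="http://example.com/y.pdf">Y</a>' \
    '<a href="https://example.com/x.pdf">X again</a>' | extract_pdf_links)
echo "$demo"
```

With a real page, the pipeline could then look like: curl -s "$url" | extract_pdf_links | xargs wget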
Thanks to Ubuntu Forums and Google.
