
Tuesday, September 06, 2011

Quick'n'dirty MediaWiki file crawler

URL='http://10.0.0.1' MIME='image/jpeg' \
  bash -c 'wget -q -O - "$URL/wiki/index.php?title=Special:MIMESearch&mime=$MIME&limit=500&offset=0" \
  | grep -Po "/wiki/images[^\"]+" \
  | xargs -I {} wget "$URL{}"'

What it does: it queries the wiki's Special:MIMESearch page for files of the given MIME type, pulls the /wiki/images/... links out of the returned HTML with grep, and then downloads each file with xargs + wget.
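If you want to poke at the extraction step on its own, here's a minimal sketch of it as a bash function. Like the one-liner, it assumes uploaded files are linked under /wiki/images/; the name extract_file_urls is made up for illustration. It also drops duplicate links with sort -u and uses grep -Eo, which is enough for this pattern and doesn't need GNU grep's -P.

```bash
#!/usr/bin/env bash
# Minimal sketch of the extraction step: read MIMESearch HTML on stdin,
# print one absolute file URL per line. Assumes files are linked under
# /wiki/images/ (as in the one-liner); the function name is hypothetical.
extract_file_urls() {  # usage: extract_file_urls BASE_URL < page.html
  local base="$1" path
  # -o prints only the matched part; sort -u removes duplicate links
  grep -Eo '/wiki/images/[^"]+' | sort -u | while IFS= read -r path; do
    printf '%s%s\n' "$base" "$path"
  done
}
```

Piping the output of the first wget in the one-liner into `extract_file_urls "$URL"` gives you the list of URLs without downloading anything, which is handy for checking what would be fetched.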

Limitations:

  • At most 500 files are downloaded, since only the first page of results (limit=500, offset=0) is fetched
  • Downloads run one at a time, so they are slower than they could be
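Both limitations can probably be worked around. The rough sketch below pages through the results by bumping offset until a page comes back empty (or repeats), and the usage line hands the URLs to xargs -P for parallel downloads. It assumes the same /wiki/ layout as the one-liner; PAGE_SIZE, JOBS and the function names are arbitrary, hypothetical choices.

```bash
#!/usr/bin/env bash
# Rough sketch: page through Special:MIMESearch and list every file URL.
# Assumptions (not from the original post): files are linked under
# /wiki/images/, and an empty or repeated page means we're done.
# PAGE_SIZE, JOBS and the function names are hypothetical choices.
PAGE_SIZE=500
JOBS=4

# Print the MIMESearch URL for one page: mime_search_url BASE MIME OFFSET
mime_search_url() {
  printf '%s/wiki/index.php?title=Special:MIMESearch&mime=%s&limit=%s&offset=%s\n' \
    "$1" "$2" "$PAGE_SIZE" "$3"
}

# Print all file URLs, one per line: list_all_files BASE MIME
list_all_files() {
  local base="$1" mime="$2" offset=0 page prev=""
  while :; do
    page=$(wget -q -O - "$(mime_search_url "$base" "$mime" "$offset")" \
      | grep -Eo '/wiki/images/[^"]+' | sort -u)
    # Stop on an empty page, or if the wiki ignored offset and repeated itself
    [ -z "$page" ] && break
    [ "$page" = "$prev" ] && break
    prev=$page
    printf '%s\n' "$page" | while IFS= read -r p; do
      printf '%s%s\n' "$base" "$p"
    done
    offset=$((offset + PAGE_SIZE))
  done
}

# Usage: download JOBS files at a time
#   list_all_files http://10.0.0.1 image/jpeg | xargs -P "$JOBS" -n 1 wget -q
```

The repeated-page check is a cheap safety net: without it, a wiki that ignored offset would keep returning the first page and the loop would never end.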
