Wednesday, May 9, 2012

Mirroring a Web Site Directory with wget

I recently decided to copy some course web sites from my university's CS department so that I can use them this summer, when I'll be on an unreliable Internet connection. httrack is another option, but I couldn't get it to copy everything correctly.

Solution: wget

wget -mk -w 0.25 --no-parent <URL>
  • -m mirror
  • -k convert links
  • --no-parent don't ascend to the parent directory; just grab the stuff in the specified directory.
  • -w wait time (seconds) between page grabs (be nice, otherwise you might DoS their servers)
And voilà, you now have an offline copy of the site directory!
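
For reference, here is a minimal sketch of the same command spelled out with wget's long options and wrapped in a small shell function. The function name mirror_dir, the WGET override variable, and the example URL are all made up for illustration; substitute the directory you actually want. One thing to watch: with --no-parent, the trailing slash on the URL matters, because that is how wget knows the last path component is a directory.

```shell
#!/bin/sh
# Hypothetical helper: mirror one directory of a site for offline reading.
# mirror_dir and WGET are illustrative names, not part of wget itself.
mirror_dir() {
    url=$1
    # --mirror         same as -m: recursion plus timestamping
    # --convert-links  same as -k: rewrite links so they work locally
    # --wait=0.25      same as -w 0.25: pause between requests (be nice)
    # --no-parent      never ascend above the starting directory
    # Set WGET=echo to preview the command instead of running it.
    "${WGET:-wget}" --mirror --convert-links --wait=0.25 --no-parent "$url"
}

# Usage (placeholder URL, note the trailing slash):
# mirror_dir "https://cs.example.edu/courses/cs101/"
```

Running it with WGET=echo first is a cheap way to double-check the options before hitting someone's server.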

