Wget

From DreamHost
The instructions provided in this article or section require shell access unless otherwise stated.

You can use the PuTTY client on Windows, or SSH on UNIX and UNIX-like systems such as Linux or Mac OS X.
Your account must be configured for shell access in the Control Panel.
More information may be available on the article's talk page.


Mainlining your server with wget, or How do I avoid the painful, slow download/upload process?

Using the wget program over SSH at the UNIX shell prompt is a great shortcut for getting software or other files from a remote server onto your DreamHost server. Instead of downloading a file to your own computer and then uploading it again, which is sometimes painful and slow, you mainline the download straight to DreamHost's server over their big, fast pipes.

Note: rsync may be a better (faster, less complicated) option for users migrating between two rsync-enabled servers (such as moving from DH to DH PS).

Wget is a powerful tool, with lots of options, but even the basics are useful.

Prerequisites: You need an SSH or Telnet client, and you need to know how to log on to the server and change directory (cd command) to where you want to "inject" your files.

Basic Usage

wget http://www.your-favorite-download-site.org/directory/file-for-you.tar.gz

or

wget ftp://ftp-site.org/directory/file-for-you.tar.gz

It should be possible to copy/paste the URL from your browser to the shell command line, but you're on your own to find the right combination of menu, keyboard or mouse-click motions (e.g. Edit Copy, Ctrl-C, right-click; Edit Paste, Ctrl-V, right-click or center-click).

When the file is on the server, you may need to use gunzip, unzip and/or tar to expand and unpack the download. For the example above, tar xvzf file-for-you.tar.gz will do the trick. For a zip file: unzip file.zip. For a plain gzip file: gunzip file.gz.
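As a minimal sketch of picking the right tool, the helper below prints the unpack command that matches a file's extension. The function name unpack_cmd is hypothetical and only covers the three cases named above; it prints the command rather than running it.

```shell
# unpack_cmd is a hypothetical helper: it prints (does not run) the
# unpack command that matches the file extension.
unpack_cmd() {
    case "$1" in
        *.tar.gz|*.tgz) echo "tar xvzf $1" ;;   # gzip-compressed tar archive
        *.zip)          echo "unzip $1" ;;      # zip archive
        *.gz)           echo "gunzip $1" ;;     # plain gzip file
        *)              echo "unknown archive type: $1" >&2; return 1 ;;
    esac
}

unpack_cmd file-for-you.tar.gz
unpack_cmd file.zip
unpack_cmd file.gz
```

The .tar.gz pattern has to come before the plain .gz pattern, otherwise a tarball would be matched as an ordinary gzip file.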

If you need to pass variables to a script, enclose the URL in single quotes. This prevents the shell from treating the ampersand character as its own operator, which would cut the command short at that point and run it in the background.

wget 'http://www.your-favorite-download-site.org/myscript.php?var1=foo&var2=bar'
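To see what the quotes change without downloading anything, the sketch below uses echo as a stand-in for wget. The unquoted form is run in a child shell so that the cut at the ampersand becomes visible.

```shell
# echo stands in for wget here so the effect is visible without a download.

# Quoted: the whole URL arrives as a single argument.
quoted=$(echo 'http://www.your-favorite-download-site.org/myscript.php?var1=foo&var2=bar')

# Unquoted (inside a child shell): the line is cut at '&'. The part before it
# runs as a background job and 'var2=bar' becomes a separate variable assignment.
unquoted=$(sh -c 'echo http://www.your-favorite-download-site.org/myscript.php?var1=foo&var2=bar; wait')

echo "quoted:   $quoted"
echo "unquoted: $unquoted"
```

The second line of output stops at var1=foo, which is all that wget would have received.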

Advanced Usage

To create a mirror image of a folder on a different server (with the same structure as the original has) you could simply ftp into the server and transfer it:

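wget -r -nH 'ftp://username:password@yourdomain.com/folder/*'

This downloads 'folder/' and everything within it, keeping its directory structure. The -nH option stops wget from creating an extra top-level directory named after the host, and the single quotes keep the shell from trying to expand the asterisk itself. This can save you a lot of time compared with running wget on each file individually.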

Now you could simply zip the folder using:

zip -r  folder.zip folder

and then clean up by deleting the copy:

rm -rf folder


It's a great way to back up your whole website at once, and of course it's very helpful when moving large sites between hosts.
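The zip and clean-up steps can be chained so that the downloaded copy is only deleted when the archive was actually written. A minimal sketch, assuming a hypothetical helper named archive_and_clean:

```shell
# archive_and_clean is a hypothetical helper name.
# Zip a mirrored folder and delete the unpacked copy only if zip succeeded,
# so a failed archive never costs you the download.
archive_and_clean() {
    folder=$1
    [ -d "$folder" ] || { echo "no such directory: $folder" >&2; return 1; }
    zip -r "$folder.zip" "$folder" && rm -rf "$folder"
}

# Usage, after the wget -r transfer has finished:
#   archive_and_clean folder
```

Joining the two commands with && is the whole point: rm only runs when zip exits successfully.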

Download the entire contents of example.com

wget -r -l 0 http://www.example.com/


Taken from: GNU Wget Manual - Examples - Advanced Usage
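When mirroring a whole site it is often worth adding a few options that keep the transfer contained and gentle on the remote server. The wrapper below is only a sketch: the name mirror_site and the values 1 second and 200k are arbitrary illustrations, not recommendations from the manual.

```shell
# mirror_site is a hypothetical wrapper; pass it the URL of a site you own.
mirror_site() {
    # -r -l 0        recurse with no depth limit
    # -np            never climb to the parent directory
    # -k             convert links in downloaded HTML for offline viewing
    # --wait=1       pause one second between requests
    # --limit-rate   cap the download bandwidth
    wget -r -l 0 -np -k --wait=1 --limit-rate=200k "$1"
}

# Usage:
#   mirror_site http://www.example.com/
```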

Man Page Info

Run man wget in the shell for more options; the following is an excerpt:

NAME
       Wget - The non-interactive network downloader.

SYNOPSIS
       wget [option]... [URL]...

DESCRIPTION

GNU Wget is a free utility for non-interactive download of files from the Web.  
It supports HTTP, HTTPS, and FTP protocols, as well as retrieval through HTTP proxies.

Wget is non-interactive, meaning that it can work in the background, while the user is not logged on.  
This allows you to start a retrieval and disconnect from the system, letting Wget finish the work.  
By contrast, most of the Web browsers require constant user's presence, which can be a great 
hindrance when transferring a lot of data.

Wget can follow links in HTML and XHTML pages and create local versions of remote web sites, 
fully recreating the directory structure of the original site.  This is sometimes referred to as 
"recursive downloading."  While doing that, Wget respects the Robot Exclusion Standard 
(/robots.txt).  Wget can be instructed to convert the links in downloaded HTML files to the 
local files for offline viewing.

Wget has been designed for robustness over slow or unstable network connections; if a download 
fails due to a network problem, it will keep retrying until the whole file has been retrieved.  
If the server supports regetting, it will instruct the server to continue the download from 
where it left off.
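The unattended, resumable behaviour described in the excerpt maps onto a handful of options. A minimal sketch, assuming a hypothetical wrapper named fetch_unattended:

```shell
# fetch_unattended is a hypothetical wrapper name.
fetch_unattended() {
    # -b    go to the background right after startup; with no -o given,
    #       progress is written to a file named wget-log
    # -c    continue a partially downloaded file instead of starting over
    # -t 0  keep retrying without a limit
    wget -b -c -t 0 "$1"
}

# Usage: start the transfer, then log out and let it finish on its own.
#   fetch_unattended http://www.your-favorite-download-site.org/directory/file-for-you.tar.gz
```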

Custom Installation

The instructions provided in this article or section are considered advanced.

You are expected to be knowledgeable in the UNIX shell.
Support for these instructions is not available from DreamHost tech support.
Server changes may cause this to break. Be prepared to troubleshoot this yourself if this happens.
We seriously aren't kidding about this.

For those of us who'd like to take advantage of the latest version of Wget, specifically versions with 'large file support', the information below will get you started.

Please keep in mind this article was designed for ADVANCED USERS who already have some *nix shell experience.
Custom Wget installations will NOT be supported by DH Staff.

TODO:

  • Create an 'uninstall' feature.
  • Custom OpenSSL option.


Create and run the following shell install script in your home directory:

wget_install.sh

#!/bin/sh
set -e

# Version 1.0.2, 2007-09-19
#
# - Initial Release 2007-09-19 by Chris Shymanik (chris@chipsncheese.com)
#   - Custom OpenSSL support still in development.
#   - Optional locale/man/info file wipe option added (1.0.2)


## USER CONFIGURATION OPTIONS

# Where do you want all this stuff built?
# ***Don't pick a directory that already exists!***
# Note: Directories that don't exist will be created for you!
SRCDIR=${HOME}/source
# Set DISTDIR to somewhere persistent.
DISTDIR=${HOME}/dist
# Delete contents of DISTDIR after installation? (Default: Yes)
DISTDEL="Yes"
# Wipe "unneeded" contents (info, man, and locale directories)?
# (Default: Yes)
MINIMALINSTALL="Yes"
# Where to install everything to? (Default: ${HOME})
# Note: Best to leave this AS IS for now. You've been warned.
INSTALLDIR=${HOME}
# Set BINDIR to wherever you keep your binaries.
# The default is, ${HOME}/bin (/home/username/bin).
BINDIR=${HOME}/bin
# Set CONFIGDIR to wherever you keep your config (etc) files.
# The default is, ${HOME}/etc (/home/username/etc).
CONFIGDIR=${HOME}/etc
# Enable Custom OpenSSL Installation? (!!This feature is NOT currently functional!!)
## *DISABLED*
## CUSTOMSSL="No"
# Path to your OpenSSL install. (Default: /usr/local/ssl)
LIBSSL=/usr/local/ssl
# Set whatever nice value you wish here.
# Higher values indicate lower priority,
# Lower values indicate higher priority.
# Range: -20 to 19
NICE=19
# Name of the WGET install package
# (without any extension, ie: .tar.bz2)
WGT="wget-1.10.2"

# What features of WGET do you wish to enable or disable?
# ***Probably best not to change anything here!***
WGETFEATURES="--prefix=${SRCDIR}/installtmp \
--with-libssl=${LIBSSL} \
--disable-debug"
### Note: Debugging isn't really necessary so it's currently disabled.

########## DO NOT MODIFY BELOW ##########
sleep 1s

# Push the bin directory into the path.
export PATH=${BINDIR}:$PATH

## Pre-download clean-up and checking.
# Clear and/or create the source directory.
if [ -d ${SRCDIR} ]; then
		  echo "Source directory already exists! Cleaning it..."
		  rm -rf ${SRCDIR}/*
else
		  echo "Creating source directory..."
		  mkdir -p ${SRCDIR}
fi
# Create the installtmp directory (needed for custom install locations).
if [ -d ${SRCDIR}/installtmp ]; then
		  echo "Something in the script is broken. Aborting..."
		  exit 1
else
		  echo "Creating the temporary install directory..."
		  mkdir -p ${SRCDIR}/installtmp
fi

# Check for existing wget install and remove it if exists. Else create it.
if [ -d ${BINDIR} ]; then
		  echo "Deleting wget binary if it exists..."
	if [ -f ${BINDIR}/wget ]; then
		  rm ${BINDIR}/wget >/dev/null 2>&1
	else
		  echo "    Wget binary does not exist."
	fi
else
		  echo "Creating BINDIR..."
		  mkdir -p ${BINDIR} >/dev/null 2>&1
fi
# Check for existing wget config directory and create it if it doesn't exist.
if [ -d ${CONFIGDIR} ]; then
		  echo "Config directory exists! Doing nothing..."
else
		  echo "Creating Config directory..."
		  mkdir -p ${CONFIGDIR}
fi

## Grab the required source archives.
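# Make sure the download directory exists before changing into it.
mkdir -p ${DISTDIR}
cd ${DISTDIR}
# Download failures are checked by hand below, so stop aborting on errors.
set +e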
# Wget options
WGETOPT="-t1 -T10 -w5 -q -c"

# Do a bit of error checking while grabbing the sources.
if [ -f ${DISTDIR}/${WGT}.tar.gz ]; then
	echo "Skipping wget of ${WGT}.tar.gz"
else
	wget $WGETOPT ftp://ftp.ucsb.edu/pub/mirrors/linux/gentoo/distfiles/${WGT}.tar.gz
	# If primary mirror fails, use the alternative mirror.
	if [ -f ${DISTDIR}/${WGT}.tar.gz ]; then
		echo "Got ${WGT}.tar.gz"
	else
		wget $WGETOPT http://ftp.gnu.org/gnu/wget/${WGT}.tar.gz
		# Check to make sure the alternative mirror worked.
		if [ -f ${DISTDIR}/${WGT}.tar.gz ]; then
			echo "Got ${WGT}.tar.gz"
		else
			echo "Failed to get ${WGT}.tar.gz. Aborting install!"
			# Exit with a non-zero status so callers can tell the install failed.
			exit 1
		fi
	fi
fi

## Unpack the source archives.
set -e
cd ${SRCDIR}
echo "Extracting ${WGT}..."
tar zxf ${DISTDIR}/${WGT}.tar.gz
echo "Done."

## Compile and install the package(s).
cd ${SRCDIR}/${WGT}
./configure ${WGETFEATURES}
# make clean
nice -n ${NICE} make
make install

## Post-install configuration.
sleep 2s
cd ${HOME} && clear

mv ${SRCDIR}/installtmp/bin/wget ${BINDIR}/wget
mv ${SRCDIR}/installtmp/etc/wgetrc ${CONFIGDIR}/wgetrc
# Minimal Install check
if [ "${MINIMALINSTALL}" = "Yes" ]; then
	echo "Minimal Install selected. Wiping additional content."
	# Content is wiped during the post-install clean-up phase.
elif [ "${MINIMALINSTALL}" = "No" ]; then
	echo "Full Install selected. Installing additional content."
	mv ${SRCDIR}/installtmp/share ${INSTALLDIR}/share
	mv ${SRCDIR}/installtmp/man ${INSTALLDIR}/man
	mv ${SRCDIR}/installtmp/info ${INSTALLDIR}/info
else
	echo "Unknown MINIMALINSTALL option! Keeping all content."
	mv ${SRCDIR}/installtmp/share ${INSTALLDIR}/share
	mv ${SRCDIR}/installtmp/man ${INSTALLDIR}/man
	mv ${SRCDIR}/installtmp/info ${INSTALLDIR}/info
fi


## Post-install clean-up.
# Kill some Lemmings...
rm -rf ${SRCDIR}/*

if [ "${DISTDEL}" = "Yes" ]; then
	rm -rf ${DISTDIR}
elif [ "${DISTDEL}" = "No" ]; then
	echo "Your DISTDIR will not be cleaned."
else
	# Unknown value: do what the message says and leave DISTDIR alone.
	echo "Unknown DISTDEL option! Keeping the contents of your DISTDIR by default."
	sleep 1s
fi

## Post-Install Notes
echo ""
echo "   Post-Install Notes:"
echo " ======================="
echo "Please be sure to modify the .bash_profile file to reflect your binary directory's path."
echo "See the wiki article for an example."
echo ""

## End of install
echo "Installation completed!" `date +%r`

#EOF
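Before running a script that deletes and rebuilds directories, it can be worth letting the shell parse it first. The helper below is a sketch under that assumption; run_installer is a hypothetical name and is not part of the install script itself.

```shell
# run_installer is a hypothetical helper: check the script for syntax
# errors first, then run it.
run_installer() {
    script=$1
    [ -f "$script" ] || { echo "not found: $script" >&2; return 1; }
    sh -n "$script" || return 1   # -n parses the script without executing it
    sh "$script"
}

# Usage, from your home directory:
#   run_installer wget_install.sh
```

A syntax check only catches malformed shell code; it does not validate the configuration values at the top of the script.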

Now modify your .bash_profile so that your binary directory (e.g. /home/username/bin) comes first in your PATH
and your custom Wget install is used by default:

umask 002
PS1='[\h]$ '
export PATH=/home/username/bin:$PATH
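To confirm that the shell now picks up the custom build, you can ask it which wget it resolves. A minimal sketch, assuming the binary was installed to $HOME/bin as in the script defaults:

```shell
# Prepend the custom bin directory, as the .bash_profile line does.
PATH="$HOME/bin:$PATH"

# Show which wget the shell now resolves; a custom install should print
# a path under $HOME/bin rather than a system directory.
command -v wget || echo "wget not found in PATH"
```

In a fresh login shell the first line is unnecessary, since .bash_profile has already set the PATH.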

Done!