Wget
From DreamHost
The instructions provided in this article or section require shell access unless otherwise stated. You can use the PuTTY client on Windows, or SSH on UNIX and UNIX-like systems such as Linux or Mac OS X.
Mainlining your server with wget, or How do I avoid the painful, slow download/upload process?
Using the wget program over SSH at the UNIX shell command line prompt is a great shortcut for getting software or other files from a remote server onto your DreamHost server. Instead of downloading to your own computer and then uploading again, which can be painful and slow, you mainline the download straight to DreamHost's server over its big, fast pipes.
Note: rsync may be a better (faster, less complicated) option for users migrating between two rsync-enabled servers (such as moving from DH to DH PS).
Wget is a powerful tool, with lots of options, but even the basics are useful.
Prerequisites: You need to have an SSH or Telnet client, and know how to log on to the server and change directory (cd command) to where you want to "inject" your files.
Basic Usage
wget http://www.your-favorite-download-site.org/directory/file-for-you.tar.gz
or
wget ftp://ftp-site.org/directory/file-for-you.tar.gz
It should be possible to copy/paste the URL from your browser to the shell command line, but you're on your own to find the right combination of menu, keyboard or mouse-click motions (e.g. Edit Copy, Ctrl-C, right-click; Edit Paste, Ctrl-V, right-click or center-click).
When the file is on the server, you may need to use gunzip, unzip and/or tar to expand and unpack the download. For the example above, tar xvzf file-for-you.tar.gz will do the trick. For a zip file: unzip file.zip. For a plain gzip file: gunzip file.gz.
If you need to pass variables to a script, you may need to enclose the URL in single quotes. This prevents the shell from treating the ampersand character as its own control operator, which would run the command in the background and cut the URL short.
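The choice of unpacking tool can be sketched as a small shell function. This is only an illustration: the name unpack is hypothetical (not a standard command), and it covers just the archive types mentioned above.

```shell
# Hypothetical helper: pick the unpacking tool from the file extension.
unpack() {
    case "$1" in
        *.tar.gz|*.tgz) tar xzf "$1" ;;    # gzip-compressed tarball
        *.tar)          tar xf "$1" ;;     # plain tarball
        *.zip)          unzip -q "$1" ;;   # zip archive
        *.gz)           gunzip "$1" ;;     # single gzip-compressed file
        *)
            echo "unpack: unrecognised archive type: $1" >&2
            return 1
            ;;
    esac
}
```

With the function defined in your shell, unpack file-for-you.tar.gz would extract the download from the example above into the current directory.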
wget 'http://www.your-favorite-download-site.org/myscript.php?var1=foo&var2=bar'
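To see what the single quotes do without touching the network, a small sketch can count the arguments the shell hands to a command. count_args is a hypothetical helper used only for this demonstration; it is not part of wget.

```shell
# Hypothetical helper: print how many arguments arrived, then each one.
count_args() {
    printf '%d\n' "$#"
    for arg in "$@"; do
        printf '%s\n' "$arg"
    done
}

# Quoted: the whole URL, ampersand included, arrives as a single argument.
count_args 'http://www.your-favorite-download-site.org/myscript.php?var1=foo&var2=bar'
```

Without the quotes, the shell would split the line at the ampersand, so the command would only ever see the URL up to var1=foo.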
Advanced Usage
To create a mirror image of a folder that lives on a different server (with the same structure as the original), you can have wget fetch it recursively over FTP:
wget -r ftp://username:password@yourdomain.com/folder/*
This downloads 'folder/' and everything within it, keeping its directory structure. It can save you a lot of time compared with running wget on each file individually.
Now you could simply zip the folder using:
zip -r folder.zip folder
and then clean up by deleting the copy:
rm -rf folder
It's a great way to back up your whole website at once, and of course it's very helpful when moving large sites across hosts.
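Because rm -rf is unforgiving, it may be worth deleting the downloaded copy only after the archive has been written and can be read back. The sketch below is one way to do that under stated assumptions: archive_then_remove is a hypothetical name, and tar is swapped in for zip purely because it is present on practically every shell account.

```shell
# Hypothetical helper: archive a directory, verify the archive lists cleanly,
# and only then delete the original directory.
archive_then_remove() {
    dir=$1
    archive=$dir.tar.gz
    tar czf "$archive" "$dir" || return 1        # stop if archiving fails
    tar tzf "$archive" >/dev/null || return 1    # stop if the archive is unreadable
    rm -rf "$dir"
}
```

Running archive_then_remove folder after the recursive download would leave folder.tar.gz behind and remove folder only on success.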
Download the entire contents of example.com
wget -r -l 0 http://www.example.com/
Taken from: GNU Wget Manual - Examples - Advanced Usage
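A full-site download is often combined with a few more of wget's standard options. The wrapper below is a sketch, not a recommendation from the manual: mirror_site and the DRY_RUN switch are hypothetical names, while --no-parent, --convert-links and --wait are regular wget options.

```shell
# Hypothetical wrapper around the recursive download shown above.
# Set DRY_RUN=1 to print the command instead of running it.
mirror_site() {
    # --no-parent      never ascend above the starting directory
    # --convert-links  rewrite links in saved pages for offline viewing
    # --wait=1         pause one second between requests
    ${DRY_RUN:+echo} wget -r -l 0 --no-parent --convert-links --wait=1 "$1"
}
```

For example, DRY_RUN=1 mirror_site http://www.example.com/ prints the command that would be run, which is a cheap way to check it before starting a long transfer.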
Man Page Info
Run man wget in the shell for more options; the following is an excerpt:
NAME
Wget - The non-interactive network downloader.
SYNOPSIS
wget [option]... [URL]...
DESCRIPTION
GNU Wget is a free utility for non-interactive download of files from the Web.
It supports HTTP, HTTPS, and FTP protocols, as well as retrieval through HTTP proxies.
Wget is non-interactive, meaning that it can work in the background, while the user is not logged on.
This allows you to start a retrieval and disconnect from the system, letting Wget finish the work.
By contrast, most of the Web browsers require constant user's presence, which can be a great
hindrance when transferring a lot of data.
Wget can follow links in HTML and XHTML pages and create local versions of remote web sites,
fully recreating the directory structure of the original site. This is sometimes referred to as
"recursive downloading." While doing that, Wget respects the Robot Exclusion Standard
(/robots.txt). Wget can be instructed to convert the links in downloaded HTML files to the
local files for offline viewing.
Wget has been designed for robustness over slow or unstable network connections; if a download
fails due to a network problem, it will keep retrying until the whole file has been retrieved.
If the server supports regetting, it will instruct the server to continue the download from
where it left off.
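The background and resume behaviour described in the excerpt corresponds to the -b and -c options. A minimal sketch follows; background_fetch, DRY_RUN and the log file name wget-download.log are assumptions made for the example.

```shell
# Hypothetical wrapper: start a resumable download in the background.
# Set DRY_RUN=1 to print the command instead of running it.
background_fetch() {
    # -b  go to the background right after startup
    # -c  continue a partially downloaded file
    # -t  number of tries; -o writes wget's messages to a log file
    ${DRY_RUN:+echo} wget -b -c -t 5 -o wget-download.log "$1"
}
```

After starting a transfer this way you can log out, and later check progress with tail wget-download.log.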
Custom Installation
The instructions provided in this article or section are considered advanced. You are expected to be knowledgeable in the UNIX shell.
For those of us who'd like to take advantage of the latest version of Wget, specifically versions with 'large file support', the information below will get you started.
Please keep in mind this article was designed for ADVANCED USERS who already have some *nix shell experience.
Custom Wget installations will NOT be supported by DH Staff.
TODO:
- Create an 'uninstall' feature.
- Custom OpenSSL option.
Create and run the following shell install script in your home directory:
wget_install.sh
#!/bin/sh
set -e
# Version 1.0.2, 2007-09-19
#
# - Initial Release 2007-09-19 by Chris Shymanik (chris@chipsncheese.com)
# - Custom OpenSSL support still in development.
# - Optional locale/man/info file wipe option added (1.0.2)
## USER CONFIGURATION OPTIONS
# Where do you want all this stuff built?
# ***Don't pick a directory that already exists!***
# Note: Directories that don't exist will be created for you!
SRCDIR=${HOME}/source
# Set DISTDIR to somewhere persistent.
DISTDIR=${HOME}/dist
# Delete contents of DISTDIR after installation? (Default: Yes)
DISTDEL="Yes"
# Wipe "unneeded" contents (info, man, and locale directories)?
# (Default: Yes)
MINIMALINSTALL="Yes"
# Where to install everything to? (Default: ${HOME})
# Note: Best to leave this AS IS for now. You've been warned.
INSTALLDIR=${HOME}
# Set BINDIR to wherever you keep your binaries.
# The default is, ${HOME}/bin (/home/username/bin).
BINDIR=${HOME}/bin
# Set CONFIGDIR to wherever you keep your config (etc) files.
# The default is ${HOME}/etc (/home/username/etc).
CONFIGDIR=${HOME}/etc
# Enable Custom OpenSSL Installation? (!!This feature is NOT currently functional!!)
## *DISABLED*
## CUSTOMSSL="No"
# Path to your OpenSSL install. (Default: /usr/local/ssl)
LIBSSL=/usr/local/ssl
# Set whatever nice value you wish here.
# Higher values indicate lower priority,
# Lower values indicate higher priority.
# Range: -20 to 19
NICE=19
# Name of the WGET install package
# (without any extension, ie: .tar.bz2)
WGT="wget-1.10.2"
# What features of WGET do you wish to enable or disable?
# ***Probably best not to change anything here!***
WGETFEATURES="--prefix=${SRCDIR}/installtmp \
--with-libssl=${LIBSSL} \
--disable-debug"
### Note: Debugging isn't really necessary so it's currently removed.
########## DO NOT MODIFY BELOW ##########
sleep 1s
# Push the bin directory into the path.
export PATH=${BINDIR}:$PATH
## Pre-download clean-up and checking.
# Clear and/or create the source directory.
if [ -d ${SRCDIR} ]; then
echo "Source directory already exists! Cleaning it..."
rm -rf ${SRCDIR}/*
else
echo "Creating source directory..."
mkdir -p ${SRCDIR}
fi
# Create the installtmp directory (needed for custom install locations).
if [ -d ${SRCDIR}/installtmp ]; then
echo "Something in the script is broken. Aborting..."
exit 1
else
echo "Creating the temporary install directory..."
mkdir -p ${SRCDIR}/installtmp
fi
# Check for existing wget install and remove it if exists. Else create it.
if [ -d ${BINDIR} ]; then
echo "Deleting wget binary if it exists..."
if [ -e ${BINDIR}/wget ]; then
rm ${BINDIR}/wget >/dev/null 2>&1
else
echo " Wget binary does not exist."
fi
else
echo "Creating BINDIR..."
mkdir -p ${BINDIR} >/dev/null 2>&1
fi
# Check for existing wget config directory and create it if it doesn't exist.
if [ -d ${CONFIGDIR} ]; then
echo "Config directory exists! Doing nothing..."
else
echo "Creating Config directory..."
mkdir -p ${CONFIGDIR}
fi
## Grab the required source archives.
set +e
# Make sure DISTDIR exists before downloading into it.
mkdir -p ${DISTDIR}
cd ${DISTDIR}
# Wget options
WGETOPT="-t1 -T10 -w5 -q -c"
# Do a bit of error checking while grabbing the sources.
if [ -e ${DISTDIR}/${WGT}.tar.gz ]; then
echo "Skipping wget of ${WGT}.tar.gz"
else
wget $WGETOPT ftp://ftp.ucsb.edu/pub/mirrors/linux/gentoo/distfiles/${WGT}.tar.gz
# If primary mirror fails, use the alternative mirror.
if [ -e ${DISTDIR}/${WGT}.tar.gz ]; then
echo "Got ${WGT}.tar.gz"
else
wget $WGETOPT http://ftp.gnu.org/gnu/wget/${WGT}.tar.gz
# Check to make sure the alternative mirror worked.
if [ -e ${DISTDIR}/${WGT}.tar.gz ]; then
echo "Got ${WGT}.tar.gz"
else
echo "Failed to get ${WGT}.tar.gz. Aborting install!"
# Exit with a non-zero status so callers can see the failure.
exit 1
fi
fi
fi
## Unpack the source archives.
set -e
cd ${SRCDIR}
echo "Extracting ${WGT}..."
tar zxf ${DISTDIR}/${WGT}.tar.gz
echo "Done."
## Compile and install the package(s).
cd ${SRCDIR}/${WGT}
./configure ${WGETFEATURES}
# make clean
nice -n ${NICE} make
make install
## Post-install configuration.
sleep 2s
cd ${HOME} && clear
mv ${SRCDIR}/installtmp/bin/wget ${BINDIR}/wget
mv ${SRCDIR}/installtmp/etc/wgetrc ${CONFIGDIR}/wgetrc
# Minimal Install check
if [ "${MINIMALINSTALL}" = "Yes" ]; then
echo "Minimal Install selected. Wiping additional content."
# Content is wiped during the post-install clean-up phase.
elif [ "${MINIMALINSTALL}" = "No" ]; then
echo "Full Install selected. Installing additional content."
mv ${SRCDIR}/installtmp/share ${INSTALLDIR}/share
mv ${SRCDIR}/installtmp/man ${INSTALLDIR}/man
mv ${SRCDIR}/installtmp/info ${INSTALLDIR}/info
else
echo "Unknown MINIMALINSTALL option! Keeping all content."
mv ${SRCDIR}/installtmp/share ${INSTALLDIR}/share
mv ${SRCDIR}/installtmp/man ${INSTALLDIR}/man
mv ${SRCDIR}/installtmp/info ${INSTALLDIR}/info
fi
## Post-install clean-up.
# Kill some Lemmings...
rm -rf ${SRCDIR}/*
if [ "${DISTDEL}" = "Yes" ]; then
rm -rf ${DISTDIR}
elif [ "${DISTDEL}" = "No" ]; then
echo "Your DISTDIR will not be cleaned."
else
# Unknown value: leave DISTDIR alone, as the message says.
echo "Unknown DISTDEL option! Keeping the contents of your DISTDIR by default."
sleep 1s
fi
## Post-Install Notes
echo ""
echo " Post-Install Notes:"
echo " ======================="
echo "Please be sure to modify the .bash_profile file to reflect your binary directory's path."
echo "See the wiki article for an example."
echo ""
## End of install
echo "Installation completed!" `date +%r`
#EOF
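The mirror fallback in the script (try the primary mirror, then the alternative, then give up) can also be written as a loop over any number of URLs. The following is a sketch of that idea, not part of the install script: fetch_first is a hypothetical name, and FETCH is an assumed override hook, defaulting to wget, that lets the logic be tried without network access.

```shell
# Hypothetical helper: try each URL in turn and stop at the first that works.
# Usage: fetch_first DEST URL [URL...]
fetch_first() {
    dest=$1
    shift
    for url in "$@"; do
        # -O writes the download to DEST; -t1 -T10 match the script's options.
        if ${FETCH:-wget -q -t1 -T10 -O} "$dest" "$url" && [ -s "$dest" ]; then
            echo "Got $dest from $url"
            return 0
        fi
        rm -f "$dest"    # discard any partial file before the next mirror
    done
    echo "Failed to get $dest" >&2
    return 1
}
```

Called with the archive path followed by the two mirror URLs from the script, it returns a non-zero status only when every mirror has failed.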
Now modify your .bash_profile so that your binary directory (i.e. /home/username/bin) comes first in your PATH and your custom Wget install is used by default:
umask 002
PS1='[\h]$ '
PATH=/home/username/bin:$PATH
Done!
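If .bash_profile is sourced more than once, a plain PATH=...:$PATH line adds the same directory again each time. One possible refinement, offered as a sketch with the hypothetical helper name prepend_path, only adds the directory when it is missing:

```shell
# Hypothetical helper for .bash_profile: put a directory at the front of PATH
# unless it is already listed somewhere in PATH.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                  # already present, leave PATH alone
        *) PATH="$1:$PATH" ;;
    esac
}

prepend_path "$HOME/bin"
export PATH
```

After logging in again, command -v wget should report the copy in your own bin directory, assuming the install script completed and BINDIR was left at its default.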

