3.1.3. Remote Server Access

A server is a computer (or a program running on one) that provides data or services to other machines, called clients, over a network. Even your personal desktop or laptop can act as a server! You can read more about it here.

A remote server is a computer that you access over a network, usually the internet, rather than one you sit in front of. It’s typically a cloud-based resource that abstracts away the need to manage physical hardware, and it is often more powerful than a personal machine. You can interact with it in various ways to perform tasks like hosting websites, processing data, streaming content, and more.

Large datasets are often hosted on remote servers to provide users with convenient and scalable access. These datasets may not be accessible via browser downloads or APIs due to their size, sensitivity, or access constraints.

In such cases, tools like SSH, SCP, SFTP, and rsync are used to securely connect to and interact with remote systems. These tools provide secure, reliable, and efficient ways to access and transfer data.

3.1.3.1. SSH

SSH (Secure Shell) is one of the most commonly used protocols for interacting with remote servers. It provides command-line access to the remote system. You can use it to log in and execute commands on the remote machine. Most operating systems come with tools or software that allow you to SSH into a remote server (yes, it’s often used as a verb!).

On the remote server, the SSH server (the sshd daemon in OpenSSH) is a continuously running service that handles incoming SSH connections and provides access to the machine for authorized clients. SSH clients are programs that can connect to any SSH server. Linux and macOS come with the built-in ssh utility. Recent versions of Windows 10 and Windows 11 also include an OpenSSH client, and tools like PuTTY or Git Bash remain common alternatives.
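A quick way to check whether an SSH client is already available on your machine is sketched below. It only inspects the local system and assumes a POSIX-style shell; the variable name ssh_status is purely illustrative.

```shell
# Check whether an ssh client is on the PATH (local check only, no connection is made).
if command -v ssh >/dev/null 2>&1; then
    ssh_status="installed"
    # Print the client version; OpenSSH writes this to stderr.
    ssh -V
else
    ssh_status="missing"
fi

echo "ssh client: $ssh_status"
```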

You can read more about SSH here.

SSH Components

To connect to a remote server, you typically need the following details:

IP address of the remote server: This identifies where the server is located on the network.
Port: The network port on the remote machine that accepts SSH connections. Port 22 is the default, but it may vary depending on the server’s configuration.
Username: Just like logging into any computer, you need a username to access the remote server.
Password (optional): Some servers use password-based authentication to restrict access to authorized users.
Key (optional): Instead of a password, it’s best practice to use a cryptographic key pair (public and private key). The public key is stored on the server, while the private key stays on the user’s machine and is never sent to the server. When logging in, the client uses the private key to sign a challenge, and the server checks that signature against the stored public key, granting access if it verifies.
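As a rough illustration of the key pair described above, the following sketch generates one with ssh-keygen. The scratch directory, the file name demo_ed25519, and the comment text are assumptions made for the example, and the empty passphrase is used only so the sketch runs without prompting; a real key should normally be protected with a passphrase.

```shell
# Scratch directory so this sketch never touches a real ~/.ssh folder.
key_dir="$(mktemp -d)"
key_path="$key_dir/demo_ed25519"   # hypothetical file name

if command -v ssh-keygen >/dev/null 2>&1; then
    # -t: key type, -f: output file, -C: comment, -q: quiet.
    # -N "": empty passphrase, only so the sketch is non-interactive.
    ssh-keygen -q -t ed25519 -f "$key_path" -C "demo key" -N ""

    # Two files are created: the private key (keep it on your machine)
    # and the matching .pub file (this is what the server stores).
    ls "$key_dir"
fi

# Installing the public key needs a reachable server, so it is left commented out:
# ssh-copy-id -i "$key_path.pub" -p <PORT> <USERNAME>@<REMOTE_SERVER_IP>
```

Only the .pub file is ever shared with the server; the private key file should stay readable by you alone.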

Let’s look at an example of using SSH to log in to a remote server and execute commands securely.

The basic SSH connection command on Linux and macOS looks like this:

ssh -p <PORT> <USERNAME>@<REMOTE_SERVER_IP>

If the authentication type is password-based, you’ll be prompted to enter the password after running the command.

If the authentication type is key-based, you can specify the private key using the -i flag:

ssh -i <PATH_TO_PRIVATE_KEY> -p <PORT> <USERNAME>@<REMOTE_SERVER_IP>

After successful authentication, you’ll get command-line access to the remote server!
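Typing the key path, port, username, and address every time gets tedious, so many users store them under an alias in the SSH client configuration file. The sketch below writes such a stanza to a scratch file rather than to ~/.ssh/config; the alias dataserver, the address 203.0.113.10 (a documentation-only address), the port, the username, and the key path are all placeholder assumptions.

```shell
# Write the alias to a scratch file; in practice this stanza would live in ~/.ssh/config.
config_file="$(mktemp)"
cat > "$config_file" <<'EOF'
Host dataserver
    HostName 203.0.113.10
    Port 2222
    User alice
    IdentityFile ~/.ssh/id_ed25519
EOF

# Show what was written.
cat "$config_file"

# With the stanza in ~/.ssh/config, the long command shrinks to:
#   ssh dataserver
# To try the scratch file directly, point ssh at it with -F:
#   ssh -F "$config_file" dataserver
```

The same alias is also picked up by scp, sftp, and rsync when they connect over SSH, which keeps later commands short.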

3.1.3.2. SCP

Secure Copy Protocol (SCP) is used to securely upload or download files between local and remote systems. It provides an interface similar to the cp (copy) command on Linux and macOS, but with the ability to transfer files across different machines. Behind the scenes, it leverages the SSH protocol for secure data transfer.

If you know the exact path of the file you want, you can use the scp command to transfer it from a remote server to your local machine, or vice versa!

SCP Components

SCP requires the same details as those listed under SSH Components, plus two additional parameters:

Source (file/folder): The path to the file or folder that needs to be transferred.
Destination: The path where the file or folder should be copied to - either on the local machine or on the remote server.

Let’s look at some examples!

Upload a local file to a remote server

scp -i <PATH_TO_PRIVATE_KEY> -P <PORT> path/to/local_file.csv <USERNAME>@<REMOTE_SERVER_IP>:/remote/path/

Download a file from a remote server to your local system

scp -i <PATH_TO_PRIVATE_KEY> -P <PORT> <USERNAME>@<REMOTE_SERVER_IP>:/remote/path/file.csv ./local_folder/

Upload multiple files to a remote server

scp -i <PATH_TO_PRIVATE_KEY> -P <PORT> file1.csv file2.csv <USERNAME>@<REMOTE_SERVER_IP>:/remote/path/

Upload an entire folder to a remote server

scp -i <PATH_TO_PRIVATE_KEY> -P <PORT> -r path/to/local_folder/ <USERNAME>@<REMOTE_SERVER_IP>:/remote/path/

Download an entire folder from a remote server to your local machine

scp -i <PATH_TO_PRIVATE_KEY> -P <PORT> -r <USERNAME>@<REMOTE_SERVER_IP>:/remote/path/folder_name/ ./local_destination/

You can read more about scp here.
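When the same server is used repeatedly, the connection details can be kept in variables and wrapped in a small helper. This is a minimal sketch under stated assumptions: the function name download_file, the DRY_RUN switch, and every connection value are hypothetical. With DRY_RUN=1 (the default here) the command is only printed, so nothing is transferred until you opt in.

```shell
# Hypothetical connection details; replace with your own.
REMOTE_USER="alice"
REMOTE_HOST="203.0.113.10"
REMOTE_PORT="2222"
KEY_PATH="$HOME/.ssh/id_ed25519"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command, 0 = actually run it

# Usage: download_file REMOTE_PATH LOCAL_DIR
download_file() {
    # Note the capital -P: scp uses -P for the port, whereas ssh uses lowercase -p.
    set -- scp -i "$KEY_PATH" -P "$REMOTE_PORT" "$REMOTE_USER@$REMOTE_HOST:$1" "$2"
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"      # show the command that would run
    else
        "$@"           # run the assembled scp command
    fi
}

download_file /remote/path/file.csv ./local_folder/
```

Setting DRY_RUN=0 before calling the helper runs the real transfer.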

3.1.3.3. FTP and SFTP

The File Transfer Protocol (FTP) is a standard protocol used to transfer files between a client and a server. However, it transmits data - including usernames and passwords - in plain text, making it insecure, especially over public networks. While FTP supports authentication, the lack of encryption poses serious security risks.

An extension of FTP is FTPS (FTP Secure), which adds SSL/TLS encryption to secure the data in transit.

Read more about FTP here. You can learn more about FTPS here.

A different secure alternative is the SSH File Transfer Protocol (SFTP), often expanded as Secure File Transfer Protocol. Despite the name, SFTP is not an extension of FTP: it is a separate protocol that runs over SSH and provides a secure, encrypted channel for file transfers. It offers an FTP-like interface but with much stronger security guarantees.

SFTP allows users to run a variety of file-management commands (such as ls, cd, get, and put) and transfer files or folders between local and remote systems. It is widely adopted as a secure replacement for traditional FTP.
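SFTP is usually used interactively, but the sftp client can also read its commands from a batch file. The sketch below writes such a file to a scratch location and prints it; the remote path and file names are illustrative assumptions, and the final sftp call is left commented out because it needs a reachable server with key-based authentication.

```shell
# Write a list of SFTP commands to a scratch batch file.
batch_file="$(mktemp)"
cat > "$batch_file" <<'EOF'
cd /remote/path
ls -l
get file.csv
put local_file.csv
bye
EOF

# Show the commands that would be sent.
cat "$batch_file"

# Batch mode cannot prompt for a password, so it relies on key-based login.
# Like scp, sftp takes the port with a capital -P:
# sftp -b "$batch_file" -i <PATH_TO_PRIVATE_KEY> -P <PORT> <USERNAME>@<REMOTE_SERVER_IP>
```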

3.1.3.4. Rsync

Remote sync (rsync) is a utility used to synchronize files and folders between a remote server and a local machine. It transfers only the parts of files that have changed, making it highly efficient for large or incremental updates.

rsync typically connects to remote servers over SSH, but it uses its own algorithm to detect the differences between the source and destination copies of a file (the delta). Only the modified portions are sent, which reduces bandwidth usage and speeds up the transfer.

It’s especially useful for datasets that grow or change over time.

Learn more about rsync here.

Let’s look at some examples of rsync:

Sync a local file to a remote server (custom SSH key and port)

rsync -e "ssh -i <PATH_TO_PRIVATE_KEY> -p <PORT>" ./local_file.csv <USERNAME>@<REMOTE_SERVER_IP>:/remote/path/

Sync a remote file to your local machine

rsync <USERNAME>@<REMOTE_SERVER_IP>:/remote/path/file.csv ./local_folder/

Sync a local folder to a remote server

rsync -avz ./local_folder/ <USERNAME>@<REMOTE_SERVER_IP>:/remote/path/

Sync a folder from a remote server to your local machine

rsync -avz <USERNAME>@<REMOTE_SERVER_IP>:/remote/path/folder_name/ ./local_destination/

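The flags used above:

-a : archive mode (recurses into folders and preserves permissions, timestamps, symbolic links, etc.)
-v : verbose output
-z : compress file data during transfer

A trailing slash on the source folder (./local_folder/) copies the folder’s contents into the destination; without it, rsync creates local_folder itself inside the destination.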
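To get a feel for rsync before pointing it at a real server, it can be tried between two local folders. The sketch below is an illustration only: it uses temporary directories and a made-up a.csv file, and adds -n (--dry-run) to preview a transfer without copying anything. The same flags apply when the source or destination is a remote path.

```shell
# Two scratch folders standing in for "source" and "destination".
src="$(mktemp -d)"
dst="$(mktemp -d)"
printf 'id,value\n1,10\n' > "$src/a.csv"

if command -v rsync >/dev/null 2>&1; then
    # Preview: -n (--dry-run) lists what would be transferred, copies nothing.
    rsync -avn "$src/" "$dst/"

    # Real run: the trailing slash on "$src/" copies the folder's contents.
    rsync -av "$src/" "$dst/"

    # Second run: the file is unchanged, so it is skipped.
    rsync -av "$src/" "$dst/"

    ls "$dst"
fi
```

Running a dry run first is a cheap safeguard before syncing large folders, since it shows exactly which files would be touched.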