Copy data from one postgres instance to another. remote copy options explored: Copy, CSV and STDIN
Its very common use case to copy data from one database instance to another be it from local to staging, staging to production etc.
For copying data from one PostgreSQL instance to another, there are a couple of techniques.
You can take a data dump as an SQL file or as a CSV from your PostgreSQL instance and do a restore.
But what happens when you are dealing with Gigs of data?
It might not be a good idea to do a dump of 100s of Gigs to your local system and then copy to the remote host and restore.
Well, PostgreSQL also has a \copy
statement which can be quite handy in this case. What if I say that you can even do a remote copy of data from one database to another over ssh. Interesting isn't it.
Let's explore various options for copying data to from local/remote servers.
I am not going to cover the pg_dump
and pg_restore
methods here as there are a lot of resources out there explaining those in detail.
Understanding COPY
and \copy
statements
COPY
command is to input and output data between database and file only in the database server. If we connect to the database remotely (E.g. from another computer), we cannot use COPY
command from the remote computer. In other words, input file or output file has to be in the database server. This is because SQL statements that are entered at the psql
prompt are sent to the server before they are executed. This means that any file paths included in SQL statements are interpreted by the server. Since the server doesn't know what directory a user is in when they execute a statement, this means that all file paths have to be absolute.
On the other hand, \COPY
, the meta command provided by PostgreSQL, is to input or output file in the client computer. If we connect to database server remotely by utilizing psql
command, we can input or use a file on the client computer. This meta command initiates copies from the client (which is the psql
process in this case), and this allows it to interpret paths that are relative to the user's current directory.
Copy data from a CSV file to local database.
The simplest solution to copy data from one database to another is to save to a local file and then restore it
\COPY users TO 'users.csv' CSV HEADER
The above command selects the data that you want and then writes it to a CSV file using the \copy
command.
Now its time to restore the CSV file to the second database.
Loading the data is also pretty straightforward with the same \copy
command.
\COPY users FROM 'users.csv' WITH CSV HEADER;
Copy data using STDIN to a local database.
While it's useful to save the data to a local CSV file, it's not always needed. You can even copy the data from one database or table using the STDOUT-> STDIN technique.
psql
-h localhost \
-d your_primary_db \
-U postgres -c \
"\copy users (id, email, first_name, last_name) from STDIN
with delimiter as ','" \ < /tmp/users.csv
Note that the above statement also leverages SQL statements inside the \copy
statement thereby you can choose what data you need to copy.
Copy data from a CSV file to remote database.
The following command copies data from a local CSV file to a remote PostgreSQL database
psql \
-h remotehost \
-d your_primary_db \
-U postgres \
-c "\copy users (id, email, first_name, last_name) from '/tmp/users.csv' with delimiter as ','"
Copy data using STDIN to a remote database.
Now, lets use STDIN for reading the CSV file and copying data to a remote host.
psql
-h remotehost \
-d your_primary_db \
-U postgres -c \
"\copy users (id, email, first_name, last_name) from STDIN
with delimiter as ','" \ < /tmp/users.csv
Copy data from one server to another server
psql \
-U user_name \
-h production_server \
-d database_name \
-c "\\copy users to stdout" | \
psql -U user_name \
-h staging_server \
-d database_name \
-c "\\copy users from stdin"
The above command STDOUTs the data from the production server and copies the same data over STDIN on the staging server.
Other techniques
There are also a few other techniques that can be used to copy data from one server to another like setting up replication between databases or doing a full snapshot replication of the db.
Reference:
https://www.postgresql.org/docs/current/static/sql-copy.html
Hope that someone finds it useful. 🖖🏻