13 Easy Steps for Syncing Data from On-Premises To AWS S3 Using DataSync

Akash Kumar • Published on August 2, 2026

Migrating massive on-premises datasets to the cloud is a complex engineering task that often poses challenges regarding transfer speed, data integrity, and network reliability. Amazon Web Services provides AWS DataSync, a fully managed online data transfer service that simplifies, automates, and accelerates data replication to AWS storage services. This step-by-step tutorial walks you through the practical process of configuring an AWS DataSync agent and setting up a secure, automated data synchronization pipeline from an on-premises NFS server directly to an Amazon S3 bucket.

⚡ Key Takeaways

According to AWS, DataSync can transfer data up to 10 times faster than open-source tools such as rsync or rclone, because it uses a purpose-built network protocol.
An on-premises DataSync Agent must be deployed on a virtual machine (or EC2 instance for simulation) with a minimum of 4 vCPUs and 16 GiB RAM.
DataSync preserves file metadata, directory structures, and system permissions seamlessly between source and destination endpoints.
All transferred data is encrypted in transit using TLS, and integrity checks are executed automatically during and after the transfer process.

Why AWS DataSync is the Optimal Choice for Cloud Migration

General-purpose transfer commands (such as aws s3 sync) can hit performance bottlenecks when they handle millions of small files or run over high-latency links. AWS DataSync works around these limits with a multi-threaded, proprietary transfer protocol designed to make full use of the available network bandwidth. It retries failed connections automatically, verifies data integrity during the transfer, and integrates with Amazon CloudWatch for monitoring. This reduces the amount of custom scripting required, so engineers can focus on data strategy rather than on babysitting connections.

Understanding the DataSync Architecture

Before launching the configuration process, it is important to understand the four primary components of an AWS DataSync deployment:

DataSync Agent: A virtual appliance deployed in your local datacenter environment that reads from your source file systems and securely streams the data to AWS.
Source Location: The configuration endpoint representing the source system, which can include NFS shares, SMB shares, HDFS clusters, or self-managed object storage.
Destination Location: The AWS storage endpoint representing your target service, such as Amazon S3, Amazon EFS, or Amazon FSx.
Task: The logical execution job that binds the source location, destination location, and transfer parameters (e.g., bandwidth limits, scheduling, and validation criteria) together.

13 Steps to Sync On-Premises Data to Amazon S3

To simulate an on-premises datacenter inside AWS, we will deploy an Amazon EC2 instance to act as our local NFS server, alongside an EC2 instance hosting our DataSync Agent.

Step 1: Retrieve the Latest DataSync AMI ID

To begin, retrieve the latest, officially validated AWS DataSync Amazon Machine Image (AMI) ID for your target AWS Region. You can query the AWS Systems Manager (SSM) Parameter Store via the AWS Command Line Interface (CLI) by running the following command:

aws ssm get-parameter --name /aws/service/datasync/ami --region us-east-1
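
The command above returns a full JSON document. When scripting the later steps, it is usually more convenient to capture only the AMI ID. The following is a minimal sketch, assuming the AWS CLI is installed and configured with credentials; the helper name datasync_ami_id is illustrative and not part of the AWS CLI.

```shell
# Print only the DataSync AMI ID for a Region (defaults to us-east-1).
# datasync_ami_id is a hypothetical helper name chosen for this example.
datasync_ami_id() {
  local region="${1:-us-east-1}"
  aws ssm get-parameter \
    --name /aws/service/datasync/ami \
    --region "$region" \
    --query 'Parameter.Value' \
    --output text
}

# Usage: AMI_ID="$(datasync_ami_id us-east-1)"
```

The --query and --output options are standard AWS CLI features that filter the response on the client side, so no extra JSON tooling is needed.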

Step 2: Launch the DataSync Agent Instance

Launch an EC2 instance using the AMI ID retrieved in the previous step. For production environments, ensure you provision a host with at least 4 vCPUs and 16 GiB of RAM (a t2.xlarge instance is ideal for simulation). Assign a public IP address to the instance, and ensure the security group permits inbound traffic on port 80 (HTTP) specifically from your administrative workstation to retrieve the activation key safely.
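
If you prefer the CLI to the console for this step, the launch can be sketched as follows. This assumes an existing subnet, security group, and key pair; all four arguments are placeholders you supply, and launch_datasync_agent is an illustrative helper name.

```shell
# Launch the agent instance from the DataSync AMI and print its instance ID.
# Arguments (all placeholders): AMI ID, subnet ID, security group ID, key pair name.
launch_datasync_agent() {
  local ami_id="$1" subnet_id="$2" sg_id="$3" key_name="$4"
  aws ec2 run-instances \
    --image-id "$ami_id" \
    --instance-type t2.xlarge \
    --subnet-id "$subnet_id" \
    --security-group-ids "$sg_id" \
    --key-name "$key_name" \
    --associate-public-ip-address \
    --query 'Instances[0].InstanceId' \
    --output text
}

# Usage: launch_datasync_agent "$AMI_ID" subnet-... sg-... my-key-pair
```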

Step 3: Access the AWS DataSync Console

Sign in to the AWS Management Console, navigate to the search bar, type DataSync, and click on the service to open the landing page. In the left navigation menu, click on Agents, and then click on Create agent to initiate the connection wizard.

Step 4: Retrieve the Agent Activation Key

Under the Hypervisor section, choose Amazon EC2. For the Service Endpoint, select Public endpoints. In the Agent Address field, enter the public IP address of the DataSync Agent instance you launched in Step 2. Click the Get key button; your browser then sends an HTTP request to the agent on port 80 to fetch the activation key, which is why the Step 2 security group must allow port 80 from your workstation.
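
The same key can also be fetched from a terminal on any machine that can reach the agent on port 80. The sketch below uses the activation URL format for public endpoints; the helper names activation_url and get_activation_key are illustrative, and the IP address in the usage line is a documentation placeholder.

```shell
# Build the activation URL that the agent serves on port 80.
activation_url() {
  local agent_ip="$1" region="$2"
  printf 'http://%s/?gatewayType=SYNC&activationRegion=%s&no_redirect' \
    "$agent_ip" "$region"
}

# Request the activation key; fails fast if the agent is unreachable.
get_activation_key() {
  curl --silent --fail --max-time 30 "$(activation_url "$1" "$2")"
}

# Usage: ACTIVATION_KEY="$(get_activation_key 203.0.113.10 us-east-1)"
```

If the request times out, the usual cause is a security group that does not allow port 80 from the machine running curl.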

Step 5: Finalize Agent Creation

Once the activation key retrieval is successful, provide a descriptive name for your agent (e.g., on-premise-nfs-agent). Click the Create agent button. Within seconds, your agent will establish a connection to AWS and display a status of Online, indicating it is ready to execute data transfer tasks.
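
For a scripted setup, agent creation can be sketched with the AWS CLI. This assumes you already hold an unused activation key from Step 4; create_datasync_agent is an illustrative helper name.

```shell
# Register the agent with DataSync and print its ARN.
# Arguments: activation key, agent name, Region (defaults to us-east-1).
create_datasync_agent() {
  local activation_key="$1" agent_name="$2" region="${3:-us-east-1}"
  aws datasync create-agent \
    --activation-key "$activation_key" \
    --agent-name "$agent_name" \
    --region "$region" \
    --query 'AgentArn' \
    --output text
}

# Usage: AGENT_ARN="$(create_datasync_agent "$ACTIVATION_KEY" on-premise-nfs-agent)"
# Check status afterwards with: aws datasync list-agents
```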

Step 6: Deploy the Simulated On-Premises Server

To simulate your local datacenter file system, deploy a standard Linux EC2 instance (a cost-effective t2.micro is sufficient). This host will act as your local file server and store the data payload that you wish to replicate to Amazon S3.

Step 7: Configure Network Security Group Rules

Ensure the security group attached to your simulated on-premises server allows inbound traffic on port 2049 (NFS) specifically from the security group of your DataSync Agent. This allows the agent to read file systems without exposing ports to the public internet.
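
The rule described above can be sketched with the AWS CLI as follows. Both security group IDs are placeholders, and allow_nfs_from_agent is an illustrative helper name.

```shell
# Allow NFS (TCP 2049) into the file server's security group,
# but only from members of the agent's security group.
allow_nfs_from_agent() {
  local nfs_sg="$1" agent_sg="$2"
  aws ec2 authorize-security-group-ingress \
    --group-id "$nfs_sg" \
    --protocol tcp \
    --port 2049 \
    --source-group "$agent_sg"
}

# Usage: allow_nfs_from_agent sg-<nfs-server> sg-<agent>
```

Referencing the agent's security group rather than an IP range keeps the rule valid even if the agent instance is replaced and receives a new private address.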

Step 8: Install and Configure the NFS Daemon

Establish an SSH connection to your simulated on-premises server. Install the NFS utility package and start the NFS server daemon by running the following commands in your terminal:

sudo yum update -y
sudo yum install nfs-utils -y
sudo systemctl enable --now nfs-server

Step 9: Create and Share the Source Directory

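Create a directory named /test and, inside it, a sample text file named sampletext.txt. Then add the directory to the NFS exports file and reload the export table. The export entry in these commands uses * (any client), which is tolerable in this lab only because the security group from Step 7 already limits port 2049 to the agent; to tighten it further, replace * with the DataSync Agent's private IP address.

# Create the export directory and a sample file to replicate
sudo mkdir -p /test
echo "This is a sample file for AWS DataSync replication." | sudo tee /test/sampletext.txt
# Hand ownership to the unprivileged NFS user once the file exists
sudo chown -R nobody:nobody /test
# Export /test over NFS, then reload the export table
echo "/test *(rw,sync,no_subtree_check,no_root_squash)" | sudo tee -a /etc/exports
sudo exportfs -arv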
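
To restrict the export to a single client instead of every host, the exports entry can be generated from the agent's address. The sketch below keeps the same export options as the commands above; nfs_export_line is an illustrative helper name and 10.0.1.25 is a placeholder for the agent's private IP.

```shell
# Print an /etc/exports entry that shares one directory with one client.
nfs_export_line() {
  local dir="$1" client="$2"
  printf '%s %s(rw,sync,no_subtree_check,no_root_squash)\n' "$dir" "$client"
}

# Usage on the NFS server:
#   nfs_export_line /test 10.0.1.25 | sudo tee -a /etc/exports
#   sudo exportfs -arv
#   showmount -e localhost   # lists the active exports for a quick check
```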

Step 10: Configure Source and Destination Endpoints

Return to the AWS DataSync Console. Click on Tasks, and then click Create task. Configure the source location with the following parameters:

Location Type: Network File System (NFS)
Agent: Select the agent you activated in Step 5
NFS Server: Enter the private IP address of your simulated NFS server
Mount Path: Enter /test

Click Next, and configure your destination location by selecting Amazon S3, choosing your target S3 bucket, and defining your desired S3 folder prefix.
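
The two locations can also be created from the CLI. This is a sketch under a few assumptions: the agent ARN comes from Step 5, the bucket already exists, and you supply the ARN of an existing IAM role that lets DataSync write to the bucket (the CLI has no Autogenerate option). The helper names are illustrative.

```shell
# Create the NFS source location and print its ARN.
# Arguments: NFS server private IP or hostname, export path, agent ARN.
create_nfs_location() {
  local nfs_host="$1" export_path="$2" agent_arn="$3"
  aws datasync create-location-nfs \
    --server-hostname "$nfs_host" \
    --subdirectory "$export_path" \
    --on-prem-config "AgentArns=$agent_arn" \
    --query 'LocationArn' --output text
}

# Create the S3 destination location and print its ARN.
# Arguments: bucket name, folder prefix, IAM role ARN (all placeholders).
create_s3_location() {
  local bucket_name="$1" prefix="$2" role_arn="$3"
  aws datasync create-location-s3 \
    --s3-bucket-arn "arn:aws:s3:::$bucket_name" \
    --subdirectory "$prefix" \
    --s3-config "BucketAccessRoleArn=$role_arn" \
    --query 'LocationArn' --output text
}

# Usage:
#   SRC_ARN="$(create_nfs_location 10.0.1.50 /test "$AGENT_ARN")"
#   DST_ARN="$(create_s3_location my-bucket /datasync "$ROLE_ARN")"
```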

Step 11: Configure Task Logging and Permissions

Provide a clear task name (e.g., nfs-to-s3-sync). In the task settings, leave the data verification and transfer options at their default values. For the IAM role, choose the Autogenerate option, which creates a role that grants DataSync permission to write files into your destination S3 bucket.

Step 12: Wait for Task Deployment to Complete

Click Next, review your configuration parameters on the summary page, and then click Create task. Monitor the task dashboard and wait a moment until the task status changes from Creating to Available.
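
Task creation and the status check can be sketched with the AWS CLI as well. This assumes you have the source and destination location ARNs from Step 10; create_sync_task and task_status are illustrative helper names.

```shell
# Create a task that links the two locations and print its ARN.
# Arguments: source location ARN, destination location ARN, optional task name.
create_sync_task() {
  local src_arn="$1" dst_arn="$2" task_name="${3:-nfs-to-s3-sync}"
  aws datasync create-task \
    --source-location-arn "$src_arn" \
    --destination-location-arn "$dst_arn" \
    --name "$task_name" \
    --query 'TaskArn' --output text
}

# Print the task's current status (for example CREATING or AVAILABLE).
task_status() {
  aws datasync describe-task --task-arn "$1" --query 'Status' --output text
}

# Usage:
#   TASK_ARN="$(create_sync_task "$SRC_ARN" "$DST_ARN")"
#   task_status "$TASK_ARN"
```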

Step 13: Execute and Verify the DataSync Task

Click the Start button on the task details page and select Start with defaults. AWS DataSync scans the source directory, compares it with the destination, and transfers the files that differ. Once the execution status shows Success, open your target bucket in the Amazon S3 console and confirm that sampletext.txt has been copied under the prefix you chose, with its directory structure intact.
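
For unattended runs, starting the task and waiting for the result can be sketched as a small polling loop. This assumes a task ARN from Step 12; start_task and wait_for_execution are illustrative helper names, and the 15-second default interval is an arbitrary choice.

```shell
# Start one execution of the task and print the execution ARN.
start_task() {
  aws datasync start-task-execution --task-arn "$1" \
    --query 'TaskExecutionArn' --output text
}

# Poll an execution until it finishes.
# Returns 0 on SUCCESS, 1 on ERROR, 2 if the status call itself fails.
wait_for_execution() {
  local exec_arn="$1" interval="${2:-15}" status
  while true; do
    status="$(aws datasync describe-task-execution \
      --task-execution-arn "$exec_arn" \
      --query 'Status' --output text)" || return 2
    echo "status: $status"
    case "$status" in
      SUCCESS) return 0 ;;
      ERROR)   return 1 ;;
    esac
    sleep "$interval"
  done
}

# Usage:
#   EXEC_ARN="$(start_task "$TASK_ARN")"
#   wait_for_execution "$EXEC_ARN" && aws s3 ls "s3://my-bucket/datasync/" --recursive
```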

Quick Comparison: AWS Online Data Transfer Options

Feature	AWS DataSync	AWS Storage Gateway	AWS Transfer Family
Primary Use Case	One-time or recurring bulk migrations and synchronization.	Hybrid storage caching; seamless local file access.	FTP, SFTP, and FTPS client integrations.
Performance Optimization	Proprietary network acceleration protocol.	Local cache disk read/write optimization.	Standard file transfer protocols over SSH/SSL.
Protocol Support	NFS, SMB, HDFS, Object Storage (S3 API).	NFS, SMB, iSCSI volume mappings.	SFTP, FTPS, FTP, AS2.
Integrity Checking	Automatic, end-to-end checksum verification.	Implicit storage layer consistency validations.	Client-managed verification models.

❓ Frequently Asked Questions

Can AWS DataSync sync data between other public clouds and AWS S3?

Yes. AWS DataSync can copy data from alternative public cloud storage providers, such as Google Cloud Storage (using the S3 compatible API) or Microsoft Azure Files (using the SMB protocol). You can deploy the DataSync agent on an Amazon EC2 instance or within your alternative cloud environment to orchestrate the migration.

Does AWS DataSync encrypt data during the synchronization process?

Absolutely. AWS DataSync ensures that all data transferred between the local datacenter agent and AWS storage services is encrypted in transit using Transport Layer Security (TLS). Furthermore, the data written to Amazon S3 is encrypted at rest using default S3 managed keys (SSE-S3) or your custom KMS keys.

How does AWS DataSync handle data validation?

AWS DataSync calculates and records a checksum for every file at the source location and compares it to the checksum of the copied file at the destination. You can configure tasks to verify only transferred data, verify the entire dataset upon task completion, or disable verification entirely to speed up transfers of non-critical data.
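
In the DataSync API these three choices correspond to values of the VerifyMode task option. The sketch below maps plain-language choices to those values and applies one to an existing task; the helper names and the short keywords (transferred, all, none) are invented for this example.

```shell
# Map a plain-language choice to the DataSync VerifyMode value.
verify_mode_for() {
  case "$1" in
    transferred) echo ONLY_FILES_TRANSFERRED ;;   # verify only what was copied
    all)         echo POINT_IN_TIME_CONSISTENT ;; # verify the whole dataset at the end
    none)        echo NONE ;;                     # skip post-transfer verification
    *) echo "unknown verification choice: $1" >&2; return 1 ;;
  esac
}

# Apply the chosen verification mode to an existing task.
set_verify_mode() {
  local task_arn="$1" mode
  mode="$(verify_mode_for "$2")" || return 1
  aws datasync update-task --task-arn "$task_arn" --options "VerifyMode=$mode"
}

# Usage: set_verify_mode "$TASK_ARN" transferred
```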

How does AWS DataSync charge for data transfers?

AWS DataSync is priced using a simple usage-based billing model. You are charged a flat rate per gigabyte (GB) of data transferred from your source system to AWS. There are no licensing fees, and you only pay for what you actually transfer. Standard AWS data transfer and storage costs apply separately.

🎯 Conclusion

AWS DataSync removes much of the manual work from moving data to the cloud. With an agent deployed, source and destination locations defined, and a task tying them together, you can sync large on-premises file systems to Amazon S3 with encryption in transit and built-in integrity verification. If you currently rely on custom transfer scripts, the pattern shown here with a simulated NFS server carries over to a real datacenter: deploy the agent close to your file server, point it at the NFS export, and run or schedule the task.

Written By Akash Kumar

Senior Software Developer

Akash Kumar is a Senior Software Developer with 6+ years of experience as a full stack developer. He specializes in designing and building scalable web applications, optimizing cloud infrastructure, and implementing modern DevOps workflows.
