How to sync large amounts of local documents with Office 365

With the Layer2 Cloud Connector, you can make documents and files stored on a local file server or NAS available in the Microsoft Cloud, such as Office 365, OneDrive for Business, Office Groups, or Microsoft Teams. But what should you do when the number of files you want to use reaches 50,000 or more?

 

Generally, the Layer2 Cloud Connector has no hard limit on the number of files it can keep in sync between local file servers and the Microsoft Cloud. However, depending on your requirements, synchronizing all files in one big chunk into a single SharePoint library can be a bad idea. The initial migration of all these files into the cloud can take quite a long time. Even if it succeeds, other issues can arise from having too much content in a single library. Furthermore, keeping several large libraries in sync can tie up a lot of system resources. To overcome these issues, here are our best practices.

 

Example of connection definition for filesystem to cloud replication

 

Fig.: Layer2 Cloud Connector Connection Manager to set up local data and file sync with Office 365.

 

Best practices to overcome Office 365 large file sync issues

 

Break your document sets up into smaller logical units

 

Try to break up the content into units that fit your company's organizational structure. For example, the data belonging to HR would migrate into a corresponding HR site. Even a single department can be split into smaller units if necessary, for example to host specific project groups. Instead of using different folders in a single library, use a separate library for each unit, or even libraries on different sites. SharePoint is designed to hold different units of content in different libraries, so splitting the content this way helps both SharePoint and the Layer2 Cloud Connector perform better. Maintaining one very big library in SharePoint can be a pain for both administrators and users, not to mention the performance issues that can be caused by accessing large libraries during peak usage hours.
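
As a planning aid, the following minimal Python sketch counts the files below each top-level folder of a file share and lists the folders that may be worth splitting further. It is not part of the Layer2 Cloud Connector. The function names and the idea of treating each top-level folder as one unit are assumptions made for illustration, and the limit passed as max_files is a planning value you choose yourself.

```python
import os
import tempfile
from pathlib import Path


def count_files_per_unit(root: Path) -> dict[str, int]:
    """Count all files (recursively) below each top-level folder of root."""
    counts: dict[str, int] = {}
    for entry in sorted(root.iterdir()):
        if entry.is_dir():
            counts[entry.name] = sum(len(files) for _, _, files in os.walk(entry))
    return counts


def units_to_split(counts: dict[str, int], max_files: int) -> list[str]:
    """Return the units that hold more files than the chosen planning limit."""
    return [name for name, total in counts.items() if total > max_files]


if __name__ == "__main__":
    # Small self-contained demo on a temporary folder tree (hypothetical names).
    with tempfile.TemporaryDirectory() as tmp:
        share = Path(tmp)
        (share / "HR").mkdir()
        (share / "Projects" / "Alpha").mkdir(parents=True)
        for i in range(4):
            (share / "Projects" / "Alpha" / f"doc{i}.txt").write_text("demo")
        (share / "HR" / "policy.txt").write_text("demo")
        counts = count_files_per_unit(share)
        print(counts)
        print(units_to_split(counts, max_files=3))
```

Folders reported by units_to_split are candidates for being divided into several libraries or sites before any connection is created.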

Manage access rights when migrating documents

Additionally, using different libraries for different organizational units makes it as easy as possible to apply appropriate access rights to documents. Note that you cannot sync NTFS access rights to SharePoint, because permissions work very differently on the two sides. It is best practice to assign access rights to SharePoint libraries only, not to individual files. Documents inherit their access rights from the SharePoint library when they are created (by online users or by the Layer2 Cloud Connector sync).

Be sure to create a migration plan before you start with file sync

It is better to split the content into different libraries and use search or managed metadata to find documents in one result list. Once you have planned how to split the content in SharePoint, you can create a Layer2 Cloud Connector connection for each library, either manually or automatically with PowerShell scripting. The connector can help you filter the local data with the SQL-like filtering options of the File System Provider to make sure the right content goes to the right place. You can also include or exclude specific folders and files by name, type, date, or size.
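
To check a migration plan before configuring any connection, the selection logic can be prototyped outside the connector. The sketch below is plain Python and does not use the connector's SQL-like filter syntax, which is not reproduced here. The FileInfo and FilterRule types and all field names are hypothetical. It only shows how criteria on type, size, date, and folder name combine into one include/exclude decision.

```python
from dataclasses import dataclass
from datetime import datetime
from pathlib import PurePosixPath


@dataclass(frozen=True)
class FileInfo:
    path: str            # path relative to the share, with forward slashes
    size_bytes: int
    modified: datetime


@dataclass(frozen=True)
class FilterRule:
    extensions: frozenset[str]        # allowed types, lower case, with leading dot
    max_size_bytes: int               # largest file to include
    modified_after: datetime          # only files changed on or after this date
    excluded_folders: frozenset[str]  # folder names to skip at any depth


def matches(info: FileInfo, rule: FilterRule) -> bool:
    """Return True if the file passes every criterion of the rule."""
    path = PurePosixPath(info.path)
    if path.suffix.lower() not in rule.extensions:
        return False
    if info.size_bytes > rule.max_size_bytes:
        return False
    if info.modified < rule.modified_after:
        return False
    # Reject the file if any parent folder is on the exclusion list.
    return not any(part in rule.excluded_folders for part in path.parts[:-1])


def select(files: list[FileInfo], rule: FilterRule) -> list[str]:
    """Return the paths of all files that the rule would include."""
    return [f.path for f in files if matches(f, rule)]
```

Running select over a file inventory for each planned library gives a quick preview of which content would go where, and shows files that no rule picks up.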

Scale up your environment in case of many connections

The Layer2 Cloud Connector can handle approximately 150 connections configured on one machine, and after version 7.8 it can handle even more. If you need more connections, you can scale up your Layer2 Cloud Connector environment by installing it on other machines (note, each installation needs to be licensed separately).
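
As a rough sizing aid, the number of installations can be estimated with a ceiling division. This is only a sketch: the function name is hypothetical, and the default of 150 connections per machine is taken from the approximate figure above, so it is a conservative planning value for newer versions.

```python
import math


def machines_needed(connection_count: int, per_machine: int = 150) -> int:
    """Estimate how many installations are needed for the planned connections."""
    if connection_count < 0 or per_machine <= 0:
        raise ValueError("connection_count must be >= 0 and per_machine > 0")
    # Round up: a partially filled machine still counts as a full installation.
    return math.ceil(connection_count / per_machine)
```

For example, machines_needed(400) evaluates to 3, which also means three separate licenses.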

The SharePoint List View Threshold

The SharePoint List View Threshold is a long-known problem for users working with large lists. SharePoint can store millions of records or files in one library, but it has problems displaying them in views. The Layer2 Cloud Connector can overcome this limit during synchronization, but users working with the library will still be affected by issues caused by the threshold. Do not use filtered or sorted views in the connection settings. Instead, target a flat view in the URL parameter of the Layer2 Cloud Connector connection string, or apply a specific view using the VIEW parameter. If you use filtered views in SharePoint, make sure to index the filtered fields and do not exceed the limit of 5,000 items per view.

See here for guidance from Microsoft about how to deal with large lists and libraries:
https://support.office.com/en-us/article/Manage-large-lists-and-libraries-in-SharePoint-B8588DAE-9387-48C2-9248-C24122F07C59

See here for more details about the SharePoint limits and boundaries:
https://technet.microsoft.com/en-us/library/cc262787(v=office.16).aspx#Boundaries

See here for the List View Threshold explained in more technical detail:
https://technet.microsoft.com/en-us/library/cc262813.aspx#Throttling

More best-practice advice for Office 365 document migration, backup, and sync

For the initial content migration, it is recommended that you run each connection separately as a manually triggered sync. Once all of them have completed successfully, you can schedule them to run automatically. It is best practice to schedule the connections so that they run one at a time, in series, with each one finishing before the next one starts. You can do this by staggering the start times in the “First Synchronization” setting of the connections. Also make sure the interval is appropriate: it should be at least the average time a sync takes to run when it has content to update.
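
The staggering described above can be worked out with a few lines of Python before the values are entered in the connection settings. This is a sketch under assumptions: the function names, the connection names, and the safety buffer between runs are hypothetical, and the average durations have to come from your own manual test runs.

```python
from datetime import datetime, timedelta


def stagger_start_times(
    first_start: datetime,
    avg_durations: dict[str, timedelta],
    buffer: timedelta,
) -> dict[str, datetime]:
    """Give each connection a start time after the previous one should be done."""
    starts: dict[str, datetime] = {}
    current = first_start
    for name, duration in avg_durations.items():
        starts[name] = current
        # The next connection starts after this one's average run time plus a buffer.
        current += duration + buffer
    return starts


def full_cycle_length(
    avg_durations: dict[str, timedelta], buffer: timedelta
) -> timedelta:
    """Time needed for one serial pass over all connections, buffers included."""
    return sum(avg_durations.values(), timedelta()) + buffer * len(avg_durations)


if __name__ == "__main__":
    durations = {"HR": timedelta(minutes=11), "Sales": timedelta(minutes=32)}
    gap = timedelta(minutes=5)
    for name, start in stagger_start_times(datetime(2024, 1, 1, 22, 0), durations, gap).items():
        print(name, start.isoformat())
    print("full cycle:", full_cycle_length(durations, gap))
```

If all connections share the same interval, choosing it no shorter than the full cycle length keeps a connection's next run from overlapping with the others. That is a stricter condition than the per-connection minimum named above, and it is an assumption of this sketch rather than a documented rule.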
 
If you have content that you need to sync more often, then see if it is possible to break that out into another connection that can run more frequently, and then the other connections can be run during the evening or other non-peak times. This will allow you to wisely use the resources of the machine to get the best performance and timely updates for the more important content.

Make use of SharePoint’s robust search capabilities. SharePoint provides a great search engine with nice web parts to manage and find your content. A strong content management plan that takes advantage of SharePoint’s search features will greatly improve the usability and findability of your synced content. Define your faceted search features, or switch to a Managed Metadata search with the Term Store feature of SharePoint. If you are on an on-premises system, the Layer2 Auto Tagger can greatly help you with tagging your content with Managed Metadata. With SharePoint Online, you can apply Managed Metadata to documents directly in C# using the "Dynamic Columns" feature of the Layer2 Cloud Connector.

Office 365 Document Synchronization Performance Test Results

The following example gives an estimate of the effort involved in Office 365 document migration, backup, and sync using the Layer2 Cloud Connector:

Hardware and Software Specifications:

The test server used to sync files with SharePoint Online was a Microsoft Azure machine A3 Standard.

 

  • Quad-core 2.10 GHz processor
  • 7 GB RAM
  • 8x500 max IOPS
  • Load Balancing
  • Windows Server 2012 R2
  • Layer2 Cloud Connector Professional Edition 64-bit (Version 7.6.2)

 

Connection Settings:


  • Data Entity 1 (Source) – Folder on C, no sub folders. The folder includes files with sizes between 4 KB and 4 MB. File formats are PNG, TXT, PDF, DOCX.
  • Data Entity 2 (Target) – Office 365 SharePoint library inside an E3 plan. Empty standard document library with no additional settings.
  • One-way synchronization, e.g. for migration (Source -> Target)
  • Data Provider: Layer2 Data Provider for Office365 Fast File Sync / Layer2 Data Provider for File System
  • Auto Mapping enabled
  • Ignore changes within target: TRUE (because it is a one-way sync)

 

Amount of Files and Data Volume:


  • 1,000 files in one folder (490 MB), 0 in SP list
  • 5,000 files in one folder (2.38 GB), 0 in SP list
  • 10,000 files in one folder (4.78 GB), 0 in SP list 

 

Performance Test Procedure: 

Each connection was run separately with manual start.

Performance Test Results for initial migration, and for sync after a few files are changed:


These are the results for each test: the run time of the initial sync (migration from file share to Office 365) in minutes [min], the RAM used during the initial sync in megabytes [MB], and the run time of an update sync (after some data changes) in minutes [min]. The values are averages over several runs per test. Additional notes about the results follow the table.

 

 

Office 365 Fast File Synchronization and Migration Performance Results

| Amount of Files | 1,000 Files | 5,000 Files | 10,000 Files |
|---|---|---|---|
| Data Volume in MB | 490 | 2,380 | 4,780 |
| Duration of Initial Sync in Min | 11 | 32 | 67 |
| Number of Files per Minute* | 90.9 | 156.3 | 149.3 |
| Sync Duration per File* in sec | 0.7 | 0.4 | 0.4 |
| Data Volume MB per Minute* | 44.5 | 74.4 | 71.3 |
| Duration of a Library Update (no data changes) | < 1 min | < 1 min | < 1 min |
| Duration of a Library Update (100 - 200 files changed) | < 1 min | 4 min | 4 min |

 

Details about the RAM consumption are available in the Cloud Connector User Documentation.

 

* Rates are all related to the initial sync.
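
The rates in the table follow directly from the file count, the data volume, and the duration of the initial sync. The short Python sketch below reproduces that arithmetic and adds a naive linear extrapolation to larger file counts. The function names are hypothetical. The extrapolation assumes that throughput stays constant beyond 10,000 files, which these tests do not cover, so treat it as a rough planning figure only.

```python
def derive_rates(files: int, volume_mb: float, minutes: float) -> dict[str, float]:
    """Derive the per-minute and per-file rates from one initial sync run."""
    return {
        "files_per_minute": files / minutes,
        "seconds_per_file": minutes * 60 / files,
        "mb_per_minute": volume_mb / minutes,
    }


def estimate_minutes(file_count: int, files_per_minute: float) -> float:
    """Linear estimate of the initial sync duration (assumes constant throughput)."""
    return file_count / files_per_minute


if __name__ == "__main__":
    # Inputs taken from the table: (files, data volume in MB, initial sync in min).
    for run in [(1_000, 490, 11), (5_000, 2_380, 32), (10_000, 4_780, 67)]:
        print(run, derive_rates(*run))
    # Rough estimate for 50,000 files at the rate measured for 10,000 files.
    print(estimate_minutes(50_000, 149.3))
```

The derived values agree with the published rates to within rounding.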

 

 

 

Performance Test Results for initial backup, and for sync after a few files are changed:

 

The backup test includes an initial sync from a document library in SharePoint Online (SPO) to an empty local folder. For this test we extended the test environment by one client machine (Surface 3 Pro, Windows 10 Anniversary Update, 8 GB RAM).
In this case we used the Layer2 Data Provider for SharePoint (CSOM), not the Layer2 Data Provider for Office365 Fast File Sync.

 

Office 365 Backup Synchronization and Migration Performance Results

| Amount of Files | 1,000 Files | 1,000 Files | 1,000 Files |
|---|---|---|---|
| Backup Target | Azure | Local Disk | Net Share |
| Data Volume in MB | 490 | 490 | 490 |
| Duration of Initial Sync in Min | 8 | 12 | 16 |
| Used RAM* | 300-500 MB | 300-500 MB | 300-500 MB |
| CPU Average Utilization | 17% | 12% | 12% |
| Number of Files per Minute** | 128 | 81 | 60 |
| Sync Duration per File** in sec | 0.5 | 0.7 | 1 |
| Data Volume MB per Minute** | 63 | 40 | 30 |
| Duration of a Library Update (no data changes) | < 1 min | < 1 min | < 1 min |
| Duration of a Library Update (10 files changed) | < 1 min | < 1 min | < 1 min |

 

* RAM usage fluctuated during the run, depending on load. It did not increase steadily over the course of the run.

 

** Rates are all related to the initial sync.

 

Office 365 Document Synchronization - Next Steps

 

Learn more about the features and benefits of Office 365 document synchronization with the Layer2 Cloud Connector. You can register to download and evaluate the Layer2 Cloud Connector.
