We’ve all been there: staring at the progress bar, wondering why Task Manager continues to say our 32 cores are sitting nearly idle while we run a geoprocessing tool on a huge dataset. OK, that huge dataset isn’t so large anymore - it’s the new normal. But why is ArcMap/Pro toying with us? We did the usual tweaks: data is local, fast drives, 64-bit background geoprocessing (BGP, for ArcMap), lots of free RAM. Still, the abnormally powerful “GIS workstation” (the one we had to beg for and write a justification document to get) is sitting there, barely spinning its fans. It could be that we aren’t taking advantage of the Parallel Processing Factor in ArcGIS.
Parallel processing is a technique that splits a task into many smaller chunks. The chunks are then assigned to multiple CPUs, cores, or processes, which work on parts of the job at the same time. This can often result in faster processing times for larger datasets. For the past few releases, Esri has been increasing the number of tools that can distribute their work across multiple cores this way. Very handy if you have a good computer and want to take advantage of the extra power.
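As a rough illustration of the split, work, and combine idea (not how ArcGIS implements it internally), here is a minimal Python sketch. The function names `split_into_chunks` and `total_length` are made up for this example, and a thread pool stands in for separate processes to keep things short. Python threads won't actually speed up CPU-heavy work, so treat this as a picture of the pattern rather than a benchmark.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split a list into at most n_chunks contiguous, roughly equal pieces."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        if end > start:
            chunks.append(items[start:end])
        start = end
    return chunks


def total_length(segment_lengths, workers=4):
    """Sum the values chunk by chunk, then combine the partial sums."""
    chunks = split_into_chunks(segment_lengths, workers)
    # Assumption: threads stand in for the processes a real tool would use.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, chunks))
    return sum(partial_sums)
```

Each worker only ever sees its own chunk, which is why this pattern suits jobs where the pieces can be processed independently and merged at the end.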
There are some limitations, of course. First, not all tools currently take advantage of this. The number of tools has been expanding over time, so it is worth reviewing your favourite tools after an upgrade. Second, you might not get the performance boost you’d expect, depending on the data or the analysis you are trying to perform. If using ArcMap, make sure the tool is also running in 64-bit mode via Background Geoprocessing (BGP).
64-bit and multithreading
Parallel processing is related to 64-bit and multithreading, but there are differences. For most geoprocessing tools, performance is roughly the same between ArcGIS Pro and ArcMap (if using BGP), though some tools in Pro are being updated over time to run faster. 64-bit geoprocessing does not automatically make tools faster - but it does allow more memory to be allocated during processing. 32-bit is limited to 2 GB of memory, while 64-bit doesn’t have this limitation.
64-bit geoprocessing is more robust, results will be more accurate, and processes that previously hung, crashed, or ran out of memory may be able to complete successfully ~ Esri
Likewise, multithreading doesn’t increase the performance of geoprocessing directly. However, with a decent computer, multithreading allows geoprocessing to take place in its own thread. Because that thread runs independently, other functions can continue on the main thread, such as panning and zooming on a map or working with symbology. So multithreading is still very useful if you want to work on other items in your map while the analysis works away on its own thread. Multithreading is used in both Pro and ArcMap (if using background processing in ArcMap).
ArcGIS Parallel processing
So how can we speed up the process? One possible way is to take advantage of the Parallel Processing Factor in ArcGIS. Again, not all tools are set up to take advantage of parallel processing, but Esri has mentioned that more tools will gain this feature with every release. ArcGIS Pro 2.1 has 70 geoprocessing tools that support it, and ArcMap 10.6 has approximately 30.
ArcGIS parallel processing is managed by the Parallel Processing Factor environment setting. Some tools now default to parallel processing, while most tools still require you to set the Parallel Processing Factor yourself. The documentation does a good job of letting you know which ones, but you can always set the value just in case - or to override the default. In ArcGIS, the tool's environment section will list this setting if the option is available. You can set it either as a number of processes or as a percentage of the cores on your machine.
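To make the accepted values concrete, here is a small sketch of how the setting's forms map to a process count. Per Esri's documentation, a blank value lets each tool decide, 0 means the work is not spread across processes, a plain number is a process count, and a percentage is applied to the number of cores. The helper name `resolve_process_count` is hypothetical, and the rounding rule is my assumption, since the documentation only gives the formula cores * n / 100.

```python
import os


def resolve_process_count(factor, cores=None):
    """Turn a Parallel Processing Factor string into a process count.

    Returns None for a blank factor, meaning the tool picks its own default.
    """
    cores = cores or os.cpu_count() or 1
    text = "" if factor is None else str(factor).strip()
    if text == "":
        return None
    if text.endswith("%"):
        percent = float(text[:-1])
        # Assumption: round to the nearest whole process.
        processes = round(cores * percent / 100)
    else:
        processes = int(text)
    # A result of 0 is treated here as "run in a single process".
    return max(processes, 1)
```

On a 32-core machine, `resolve_process_count("80%", 32)` works out to 26 under this rounding assumption, while `"200%"` on a 4-core machine asks for 8 processes, which is more than the machine has cores.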
The Parallel Processing Factor is also available when using the ArcPy module with Python. Like other environment settings, it can be set at the top of your script if you want every tool that accepts it to use it:
arcpy.env.parallelProcessingFactor = "80%"
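If you would rather change the factor for a single tool instead of the whole script, one option is to set it temporarily and put the old value back afterwards. The sketch below is a hypothetical helper, `scoped_factor`, that does this for any object with a `parallelProcessingFactor` attribute. A `SimpleNamespace` stands in for `arcpy.env` here so the example runs without ArcGIS installed.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def scoped_factor(env, factor):
    """Temporarily set env.parallelProcessingFactor, then restore it."""
    previous = env.parallelProcessingFactor
    env.parallelProcessingFactor = factor
    try:
        yield env
    finally:
        # Restore the earlier value even if the tool raised an error.
        env.parallelProcessingFactor = previous


# Stand-in for arcpy.env (assumption: the real object behaves the same way).
demo_env = SimpleNamespace(parallelProcessingFactor="")
with scoped_factor(demo_env, "50%"):
    inside = demo_env.parallelProcessingFactor  # value seen by tools run here

# Intended use in an ArcGIS Python environment (not run in this sketch):
#   import arcpy
#   with scoped_factor(arcpy.env, "50%"):
#       ...  # call a geoprocessing tool that honours the setting
```

Recent ArcGIS Pro releases also include `arcpy.EnvManager`, which serves a similar purpose for environment settings in general, so check whether your version has it before rolling your own.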
Following the usual best practices and leveraging the Parallel Processing Factor should reduce the amount of time you spend watching the progress bar. However, don’t go overboard. If you specify more processes than you have cores available, you could negatively impact performance. One exception to that rule is when the analysis is I/O heavy or is processing directly against an enterprise database connection:
The Add Rasters to Mosaic Dataset tool is I/O bound when the mosaic dataset is stored in an enterprise database. Also, the Build Overviews tool is primarily I/O bound to the disk. You can use more processes than your machine has cores by either specifying a percent value greater than 100% or a number of processes greater than the number of cores on your machine. ~Esri