This post builds on the previous Extract a Map Service Layer to Shapefile using Python post. It seems as though many are interested in getting past the pesky limits ArcGIS Server has on querying/extracting data to a shapefile. The initial post does mention this limitation, which is typically 1000 or 5000 features by default depending on the version (and configurable by admins). Part 2 will focus on getting past that limitation to all the juicy goodness… errr, all the data.
Note 2016-10-31: Sample code updated to sort objectIDs before creating groups.
Before We Begin
It might be a good idea to refresh on the details of part 1 as we aren’t covering the same pieces again. And before we go too far, the same disclaimers apply: Make sure you have permission/rights; use this code for good; Don’t bombard servers with your requests; Donate to SpatialTimes (worth a try).
Map Service to Shapefile with Python, No Limits
The first thing to check is the maximum query results allowed by ArcGIS Server. Since this can be configured by the Admin, it might be worth performing a manual query to see what the limits are. For this example, I’m assuming the typical Max Results = 1000 (most Admins don’t bother changing the default).
Get All ObjectIDs, Yes All of Them
Once the maximum is set, we can extract all of the ObjectIDs from the layer and begin to divide this into chunks of 1000 (or other value). Although the maximum results are set, the &returnIdsOnly=true parameter doesn’t have this same restriction. You can return all IDs regardless of how many there are. This result sets up the rest of our queries since you can’t assume ObjectIDs are sequential with no gaps. Update 2016-10-31: The IDsJSON variable originally assumed the IDs were in order which isn’t always the case (added the sorted function).
IDsRequest = serviceURL + serviceMap + "/" + str(serviceLayerID) + "/query?where=1%3D1&returnIdsOnly=true&f=pjson" IDsResponse = urllib2.urlopen(IDsRequest) IDsJSON = json.loads(IDsResponse.read()) IDsSorted = sorted(IDsJSON['objectIds'])
Divide and Conquer
Now that we have all the IDs, we need to turn these results into usable chunks, then setup our extraction queries - each chunk will run another query. A list will be created for the first 1000, then the next 1000, and the next. The last list will be any remainder (probably <1000).
args = [iter(AllOIDs)] * maxRequests ListOfLists = ([e for e in t if e != None] for t in itertools.izip_longest(*args))
Putting it all Together
We are now ready to take the Map Service to Shapefile with Python. We have the Map Service, all IDs, and divided those IDs into n lists. To query each list, we just use a python loop and make the same request as in the original post.
for SingleList in GroupOfLists: Query the service with ObjectID query Make a shapefile
This will create a unique shapefile for each chunk/iteration. If you want to merge/append into a single output, you can add a few lines of code to the end, or just use the Merge Tool or Append tool.
Below is the full code with the only changes required to the URL, Service, and Layer ID variables.
|#Name: Export ArcGIS Server Map Service Layer to Shapefile with Iterate|
|#Author: Bryan McIntosh|
|#Description: Python script that connects to an ArcGIS Server Map Service and downloads a single vector layer|
|# to shapefiles. If there are more features than AGS max allowed, it will iterate to extract all features.|
|ws = os.getcwd() + os.sep|
|#Set connection to ArcGIS Server, map service, layer ID, and server max requests (1000 is AGS default if not known).|
|serviceURL = "https://domain/arcgis/rest/services"|
|serviceMap = "/Public/Directory/MapServer"|
|serviceLayerID = 0|
|serviceMaxRequest = 1000|
|dataOutputName = "MyShapefile"|
|IDsRequest = serviceURL + serviceMap + "/" + str(serviceLayerID) + "/query?where=1%3D1&text=&objectIds=&time=&geometry=&geometryType=esriGeometryEnvelope&inSR=&spatialRel=esriSpatialRelIntersects&relationParam=&outFields=&returnGeometry=true&returnTrueCurves=false&maxAllowableOffset=&geometryPrecision=&outSR=&returnIdsOnly=true&returnCountOnly=false&orderByFields=&groupByFieldsForStatistics=&outStatistics=&returnZ=false&returnM=false&gdbVersion=&returnDistinctValues=false&resultOffset=&resultRecordCount=&f=pjson"|
|IDsResponse = urllib2.urlopen(IDsRequest)|
|IDsJSON = json.loads(IDsResponse.read())|
|IDsSorted = sorted(IDsJSON['objectIds'])|
|def defGroupList(n, iterable):|
|args = [iter(iterable)] * n|
|return ([e for e in t if e != None] for t in itertools.izip_longest(*args))|
|def defQueryExtractRequests(idMin, idMax):|
|myQuery = "&where=objectid+>%3D+" + idMin + "+and+objectid+<%3D+" + idMax|
|myParams = "query?geometryType=esriGeometryEnvelope&spatialRel=esriSpatialRelIntersects&relationParam=&outFields=*&returnGeometry=true&geometryPrecision=&outSR=&returnIdsOnly=false&returnCountOnly=false&orderByFields=&groupByFieldsForStatistics=&returnZ=false&returnM=false&returnDistinctValues=false&returnTrueCurves=false&f=pjson"|
|myRequest = serviceURL + serviceMap + "/" + str(serviceLayerID) + "/" + myParams + myQuery|
|response = urllib2.urlopen(myRequest)|
|myJSON = response.read()|
|# Write response to json text file|
|foo = open(dataOutputName + idMin + ".json", "w+")|
|# Create Feature Class|
|arcpy.JSONToFeatures_conversion(dataOutputName + idMin + ".json", ws + dataOutputName + idMin + ".shp")|
|#Get all objectIDs (OIDs) for the layer (there is no server limit for this request)|
|AllObjectIDs = defServiceGetIDs()|
|#Divide the OIDs into chunks since there is a limit to map queries (assumed limit stored in serviceMaxRequest variable)|
|ObjectID_Groups = list(defGroupList(serviceMaxRequest, AllObjectIDs))|
|#Create a shapefile for each chunk|
|for ObjectID_Group in ObjectID_Groups:|
|idMin = str(ObjectID_Group)|
|idMax = str(ObjectID_Group[-1])|
|#Append all shapefiles if desired|