If we are using Python with ArcGIS, there are existing tools to append data. However, calling a Geoprocessing tool from Python doesn’t feel very Pythonic. It also doesn’t give us access to the data before loading. Instead of calling the Append Tool, we can use cursors to append data and open the door to other data manipulation options. More specifically, we will use a SearchCursor on the source input data and an InsertCursor on the target dataset to show how we can replicate, or improve on, the Append Tool in our Python script. Both the Append Tool and the cursor approach have their pros and cons; it is just nice to have some options.
A Cursory Overview
Cursors are an extremely fast way to read/write features while giving you lots of control over the features, attributes, and filters. Cursors can include a SQL clause or deconstruct features into their individual points or vertices.
A cursor is a data access object that can be used either to iterate through the set of rows in a table or to insert new rows into a table. In our case, we are going to read and write features. If using ArcMap, there are two different classes of cursors: old and new. There is no need to discuss the old ones; disregard them and use the newer cursor classes in the arcpy.da module.
First, we need to read the input features from a Feature Class that we want to append somewhere. Since we just need to read these features we can use a SearchCursor. This SearchCursor requires a minimum of two parameters, the input features (Feature Class, Layer, Table, Table View), and a list of field names. Use an asterisk (*) instead of a list of fields if you want to access all fields from the input table. There are some limitations with the asterisk which we will get into later.
```python
import arcpy

# Set up a search cursor on our new data and iterate the rows
with arcpy.da.SearchCursor("C:/input.gdb/newPoints", "*") as cursor:
    for row in cursor:
        # Do stuff with each row
        pass
```
Once we have access to every row of input data, we can insert the records into another layer, or even manipulate the attributes before loading. The issue with the wildcard attribute approach (asterisk) is knowing how to access the correct attribute of interest. If the attributes are specified, the SearchCursor will maintain the attribute order as listed so we know how to access them.
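One way to avoid guessing positions is to build a name-to-index lookup from the same field list that is passed to the cursor. The sketch below uses a plain tuple in place of a cursor row so it runs without ArcGIS; the field names and the `index_fields` helper are illustrative assumptions, not part of arcpy.

```python
def index_fields(fields):
    """Map each field name to its position in the cursor's field list."""
    return {name: i for i, name in enumerate(fields)}


# Hypothetical field list, in the order it would be passed to the SearchCursor
fields = ["NAME", "STATUS", "SHAPE@XY"]
pos = index_fields(fields)

# Stand-in for one row from the cursor (rows come back as tuples)
row = ("Main St", "open", (10.0, 20.0))

# Look up values by field name instead of by a hard-coded position
name = row[pos["NAME"]]
xy = row[pos["SHAPE@XY"]]
```

Because the lookup is derived from the field list itself, reordering the list does not require touching the code that reads the row.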
Append data with the InsertCursor
The great thing about the Append Tool is the ability to auto-match fields with the schema_type parameter. You can either specify the matching fields or if the field names/types are the same, allow the tool to take care of it for you. When working with SearchCursors and InsertCursors, we don’t have the same luxury. If you know your input and target layers have the same schema and field order you can use the wildcard approach. If not, you will need to define the fields.
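When the schemas differ, one possible way to define the fields is to keep only the names both layers share and pass that list to both cursors. This is a minimal sketch, not a replacement for the Append Tool's matching: `common_fields` is a hypothetical helper, the field names are made up, and in a real script the two name lists would likely come from `arcpy.ListFields` on each layer. It compares names only and does not check field types.

```python
def common_fields(source_names, target_names):
    """Return the source field names that also exist in the target.

    Names are compared case-insensitively and returned in source order.
    """
    target_lookup = {name.lower() for name in target_names}
    return [name for name in source_names if name.lower() in target_lookup]


# Hypothetical field names for a source and a target layer
source_names = ["OBJECTID", "Name", "Speed", "Surface"]
target_names = ["OBJECTID", "NAME", "SPEED", "Lanes"]

shared = common_fields(source_names, target_names)
```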
Let’s assume our schemas are identical for now. How would we load into our target layer? Let’s add the InsertCursor to our working sample. We will set up our new InsertCursor at the beginning, and with one more line of code inside the SearchCursor loop, we can add the data to our target dataset.
```python
import arcpy

# Create an InsertCursor on our target dataset
targetCursor = arcpy.da.InsertCursor("C:/out.gdb/Points", "*")

# Set up a search cursor on our new data and iterate the rows
with arcpy.da.SearchCursor("C:/input.gdb/newPoints", "*") as cursor:
    for row in cursor:
        # Insert into the target dataset (only points or tabular data if using an asterisk)
        targetCursor.insertRow(row)

# Release the insert cursor once loading is finished
del targetCursor
```
If we are loading point data or table data, this will work without issue as long as the field list and field order match. However, there is a small issue with the default SHAPE field when loading multipoint, line, or polygon data.
A SHAPE issue?
This is technically documented, but it can be confusing why our previous sample would work on tables and point layers, then suddenly fail (silently) on a line or polygon layer. It will finish running with other attributes populated, but previewing the map will not return any spatial data.
Why does this happen? Performance defaults. Accessing geometry can be an expensive operation, so the default Geometry token is set to SHAPE (SHAPE@XY) to save precious time. This means if we want to load point data, the performance will be much better than for other geometry types. Yes, even faster than the Append Tool itself.
When using * (field list), geometry values will be returned in a tuple of the x,y-coordinates (equivalent to the SHAPE@XY token).
Again, this default is fine if inserting points with an InsertCursor (and irrelevant for tables, which have no geometry). But polygon, polyline, or multipoint features can only be created using the SHAPE@ token, which will cost us some performance and extra lines of code.
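The choice of token can be written as a small rule. The sketch below assumes the shape type arrives as a string such as the one reported by `arcpy.Describe(fc).shapeType`; `geometry_token` is a hypothetical helper and is kept free of arcpy so it can be run anywhere.

```python
def geometry_token(shape_type):
    """Pick the cheaper SHAPE@XY token for points, the full SHAPE@ otherwise."""
    return "SHAPE@XY" if shape_type == "Point" else "SHAPE@"


# Shape type strings as a feature class would report them
point_token = geometry_token("Point")
line_token = geometry_token("Polyline")
```

Tables have no geometry field at all, so a helper like this would only be called for feature classes.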
SHAPE it up
We can deal with this SHAPE issue in many ways. The overall goal is to replace the SHAPE (SHAPE@XY) token reference with the full geometry SHAPE@ token. We can grab the attributes with arcpy.Describe or the arcpy.ListFields function. ListFields is faster, while Describe provides multiple properties, such as data type, fields, indexes, and many others. Once we have the fields, we pass them into the cursors. Again, we are still assuming the fields are the same. If there is a chance the order is different, just sort the list.
```python
import arcpy

inFC = "C:/input.gdb/newRoads"

# Collect all fields except the Geometry field
lstFields = [field.name for field in arcpy.ListFields(inFC) if field.type not in ['Geometry']]
lstFields.append("SHAPE@")  # add the full Geometry object

# Similar to the previous sample, with the fields specified
targetCursor = arcpy.da.InsertCursor("C:/out.gdb/Roads", lstFields)
with arcpy.da.SearchCursor(inFC, lstFields) as cursor:
    for row in cursor:
        targetCursor.insertRow(row)

# Release the insert cursor once loading is finished
del targetCursor
```
We now have some raw code to append data using ArcPy cursors. From here we could start refining the code:
- Check the datatype in advance to see if we need to change the SHAPE geometry
- Verify the fields match between the inputs and target
- Append multiple datasets
- Manipulate attributes before loading
- Optionally remove other read-only fields if you aren’t using them (ObjectID, shape.length, shape.area)
  - These are ignored if you try to update anyway
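As a rough sketch of the last refinement, the field list can be filtered before it is handed to the cursors. The `Field` namedtuple below is only a stand-in that mimics the `name`, `type`, and `editable` properties of the field objects returned by `arcpy.ListFields`; `loadable_fields` and the sample field names are assumptions for illustration.

```python
from collections import namedtuple

# Stand-in for an arcpy field object (only the properties used here)
Field = namedtuple("Field", ["name", "type", "editable"])


def loadable_fields(fields, geometry_token="SHAPE@"):
    """Keep editable, non-OID, non-geometry field names, then add a geometry token."""
    names = [
        f.name for f in fields
        if f.editable and f.type not in ("OID", "Geometry")
    ]
    names.append(geometry_token)
    return names


# Hypothetical schema for a line feature class
sample = [
    Field("OBJECTID", "OID", False),
    Field("Shape", "Geometry", True),
    Field("NAME", "String", True),
    Field("Shape_Length", "Double", False),
]

field_list = loadable_fields(sample)
```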
Example: Append multiple datasets
Here is one additional example where we add multiple datasets to the append list. This example still assumes all the layers have an identical schema. Since we are loading lines, we will update the Geometry object.
```python
import arcpy

lstFCs = ["C:/input.gdb/Road_North", "C:/input.gdb/Road_South",
          "C:/input.gdb/Road_East", "C:/input.gdb/Road_West"]

# List all the fields (from the first input, since the schemas are identical)
# and add the full Geometry object
lstFields = [field.name for field in arcpy.ListFields(lstFCs[0]) if field.type not in ['Geometry']]
lstFields.append("SHAPE@")

# Set the target and loop over all input layers to insert
targetCursor = arcpy.da.InsertCursor("C:/out.gdb/Roads", lstFields)
for inFC in lstFCs:
    with arcpy.da.SearchCursor(inFC, lstFields) as cursor:
        for row in cursor:
            targetCursor.insertRow(row)
del targetCursor
```
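To manipulate attributes before loading, the inner loop could be pulled into a small helper that applies a function to each row on its way to the target. This is only a sketch: `append_rows` is a hypothetical helper, and `FakeInsertCursor` is a stand-in that borrows the `insertRow` method name from `arcpy.da.InsertCursor` so the example runs without ArcGIS.

```python
def append_rows(rows, insert_cursor, transform=None):
    """Insert each row into insert_cursor, optionally transforming it first.

    Returns the number of rows inserted.
    """
    count = 0
    for row in rows:
        if transform is not None:
            row = transform(row)
        insert_cursor.insertRow(row)
        count += 1
    return count


class FakeInsertCursor:
    """Stand-in that records rows instead of writing to a dataset."""

    def __init__(self):
        self.rows = []

    def insertRow(self, row):
        self.rows.append(row)


# Hypothetical rows laid out as (NAME, STATUS)
source_rows = [("main st", "open"), ("oak ave", "closed")]
target = FakeInsertCursor()

# Title-case the name before loading; leave the status untouched
loaded = append_rows(source_rows, target, transform=lambda r: (r[0].title(), r[1]))
```

In a real script, `rows` would be the SearchCursor and `insert_cursor` the InsertCursor from the samples above.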
Cover image: CC-BY Yohanes Sanjaya (Flickr) Jan 03, 2021. Modified from original.