Stay Tuned

I will be posting more Force.com content here soon.  I promise.  I would give that to you in writing but you’re not standing in front of me and I can’t seem to find a pencil.  I’ve been tied up with learning a wonderful new application – SCM – at my wonderful new job – FinancialForce.com, and also getting things together for my Dreamforce 14 session.  Speaking of Dreamforce, if you will be out at the conference and have not signed up for the session yet, please do so.  Details here:  https://success.salesforce.com/Ev_Sessions#/session/a2q30000000guoCAAQ .  And feel free to stop by and say hello after.  Once the conference is done, I’ll get back to writing some technical articles on here.  I have a few good articles planned out – I just need to find the time to write them now.  Until then – cheers, and happy coding.

Ch ch ch ch changes

I apologize for not having posted here in two weeks – I have recently switched jobs and the resulting churn has unfortunately eaten up much of my free time.  This week I started with FinancialForce.com as a Technical Architect on their SCM product.  I was out in San Francisco most of the week for orientation and new hire process type stuff.  I always love visiting out there but this time was extra special since my new employer is headquartered out there.  The rest of the team was really great and I am very much looking forward to becoming an integral part of the dev team there, hopefully for many years.  Stay tuned though – I will have a new post on how to grow great Force.com developers coming up shortly, probably within the next 3-5 days.  Happy coding!

Scripting Data Loads for Salesforce, Part 2

In Part 1 of this article, I talked about how you can set up your objects in Salesforce to be able to export them and keep relationships intact. In this follow-up post we will take a more in-depth look at the ANT build scripts, how to invoke the Salesforce Data Loader from ANT, the property file, how you authenticate against Salesforce and how to create shell scripts to make things as easy as possible. Finally, we’ll talk about how you can work this approach into your continuous integration or continuous deployment strategy.

First though, let’s quickly talk about our directory structure.

Directory Structure

We want to use a consistent directory structure when we set this up.  Why?  Because consistency is good. Lack of consistency leads to chaos and I can’t stand chaos.  You shouldn’t either.  You’re better than that.

Our root directory is where we will keep our ANT build script – the build.xml file.  Everything else is kept in subdirectories under the root.  My recommendation is to use the following structure:

datascripts/
lib/
properties/

Note – if you clone my github repository for these samples, you’ll notice that it is set up and structured exactly as I just described.  You can access this repository here:  SFDataloadScripting github repository.

In datascripts/ we will keep our SDL mapping files and also use this as a place to store pre-canned data for repetitive loads like you might do as part of a CI cycle.  The lib/ directory holds all our necessary JAR files and the properties/ directory holds our org-specific property information.  Now that you have your directories organized, we can dig into the ANT setup.
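Before we do, here is roughly what the fully populated root directory ends up looking like by the end of this post (the orgs/ directory is created automatically when you run the export and upsert targets, so you do not need to create it yourself):

build.xml
datascripts/
    dataloader_encryption_key.txt
    template-export-process-conf.xml
    template-upsert-process-conf.xml
    datafiles/
        AccountInsert.csv
        ContactInsert.csv
    mappingfiles/
        AccountExportMap.sdl
        AccountInsertMap.sdl
        ContactExportMap.sdl
        ContactInsertMap.sdl
lib/
    ant.jar
    dataloader-29.0.0-uber.jar
properties/
    sandbox.properties
orgs/
    qa/
        data/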

ANT Setup and Configuration

If you’ve never used ANT, you are missing out.  ANT is an open source build tool from the Apache project.  You can learn more about it at the Apache ANT Project home page.  A full discussion of ANT is out of scope here but basically you define build steps via a set of XML instructions – you can link in external build tasks, chain dependencies, and even create your own external build tasks if you’re feeling especially enterprising and know your way around a Java IDE.  It is a way better replacement for the old make utility.  I know some of you older, err – “seasoned”, engineers out there just like myself remember make and the headaches that came with it.  ANT rules.  Get it, install it, and let’s look at how we’re going to use it.

Importing the Salesforce DataLoader JAR file

The key to this whole thing working is the Salesforce Data Loader.  Not your grandfather’s data loader though, no sir.  We are going to leverage the JAR file that contains all of the working parts of the data loader, pull that into our ANT script, and leave that pesky old GUI desktop thing on the curb for next week’s trash pickup.

So how do we get to that JAR file?  There are two ways.  First, you can link to my github repository which contains all of these examples and pull it out of there.  I will try to keep this current but depending on how far in the future you are reading this post (hi, future Scott) it may be slightly out of date.  Therefore the second option is to download the Data Loader right from your Salesforce org and grab the JAR file from there.  You can find directions to get the Data Loader right here:  Installing Data Loader.  Once you’ve downloaded it and installed it, you’ll want to sniff around in the install directories and try to locate the file named dataloader-xx.x.x-uber.jar or something along those lines – again, this is what it’s named now but I have no control over what Salesforce names things two years from now.  The xx.x.x is the version number – the first two x’s should match your org’s API version (e.g. 29).

Ok, got your JAR file?  No?  Fine, I’ll wait – but make it snappy.  We have a lot more to cover here.

Ok, now that you’ve got the JAR file, we’re going to put it in the lib/ directory you created above.

ANT Property Files

In order to connect to Salesforce orgs, we need to define our login credentials somewhere.  The best place to do that is using ANT property files.  ANT will read these property files in as directed and use all of the settings within the file to do its thing.  So we will keep our username, password, security token, etc. inside property files and version those in our git repository.

Note – I do not recommend storing your production org credentials this way.  I know some of you will ignore this warning, but a better, albeit more tedious, process is to keep a production property file with the credentials left blank, fill them in only when you need them, and never commit the filled-in version.  Chances are you are not going to be running imports and exports from production too often, although your use cases for scripted data loads may obviously vary.
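For example, you might commit a properties/prod.properties that looks like the skeleton below, with everything sensitive left blank, and only fill in the blanks locally when you actually need to run against production:

sf.username=
sf.password=
sf.passwordencrypted=
sf.orgserver=
sf.server=https://login.salesforce.com
orgname=prod

If you want a little extra insurance, keep the filled-in copy under a separate, git-ignored name (prod.local.properties, say) and point the -propertyfile argument at that file instead.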

So create a new file in the properties/ directory and call it sandbox.properties.  You can change this to whatever you want – the name is not important.  In the past I’ve named my properties files to match the sandbox name (prod.properties, qa.properties, etc.).  Open this new file in your favorite text editor and add the following lines:

sf.username=
sf.password=
sf.passwordencrypted=
sf.orgserver=
orgname=
sf.server=

These will get set as follows:

  • sf.username: This is set to the user that will be connecting to Salesforce
  • sf.password: This is set to the password that will be used for the user connecting to Salesforce. Note that it must include the user’s security token appended to the end – it will not work without it.
  • sf.passwordencrypted:  This is created once for a given password + security token combination – we will cover how to do this in a moment.
  • sf.orgserver:  The Salesforce pod indicator (na8, cs12, etc.) for the org server.
  • sf.server:  https://test.salesforce.com for sandboxes, https://login.salesforce.com for production or developer orgs
  • orgname:  The logical name of the org.  Think “qa”, “prod”, etc.  The export targets will create a directory named this when you export so that you can keep exports from different orgs separate.

Once you have this configured, you should have a sandbox.properties file that looks something like the following:

sf.username=my.user@mydomain.com.qa
sf.password=fluffybunnyKkubu6yfhDpf326j9GFAf4Jh
sf.passwordencrypted=
sf.orgserver=cs9
sf.server=https://test.salesforce.com
orgname=qa

Encrypted Passwords

So now we will create our encrypted password and then our property file will be complete.  To handle creating our encrypted passwords, there is a utility in the bin/ directory of the Data Loader – two versions, one for *nix / Mac OSX, and one for Windows.

First we need an encryption key, which is basically the salt value we provide to the key generator.  Open a command prompt or terminal window, and change directory to your Data Loader install directory, then bin/ underneath.  Run the following command (Windows users will run the encrypt.bat version obviously):

./encrypt.sh -g <seedtext>

<seedtext> is any String value you provide.  It will spit out a value – copy this value and paste it into a file called dataloader_encryption_key.txt and put this file under datascripts/ in your root directory.  You will need this file to be there when you run your import/export tasks from ANT.

Change directory back to your Data Loader install directory, then bin/ underneath, and run the following command to generate the actual encrypted password.  Note that [password+security token] is the value of your sf.password property (fluffybunnyKkubu6yfhDpf326j9GFAf4Jh in our example).

./encrypt.sh -e [password+security token] [path to your root]/datascripts/dataloader_encryption_key.txt

This will output a line similar to the following:

2014-05-28 17:28:05,382 INFO [main] security.EncryptionUtil main (EncryptionUtil.java:365) - 155fb2ed0bf65cc155fb2ed0bf655fb2ed0bf65cc1

Copy everything after the hyphen (155fb2ed0bf65cc155fb2ed0bf655fb2ed0bf65cc1 in the example above) into your sf.passwordencrypted field in the properties file.  Your final sandbox.properties file now looks like the following:

sf.username=my.user@mydomain.com.qa
sf.password=fluffybunnyKkubu6yfhDpf326j9GFAf4Jh
sf.passwordencrypted=155fb2ed0bf65cc155fb2ed0bf655fb2ed0bf65cc1
sf.orgserver=cs9
sf.server=https://test.salesforce.com
orgname=qa

Data Loader Process Template Files

Template files provide a tokenized skeleton from which the Data Loader process XML files are created for a given org and object.  Since a single script may export or import several objects, it is important to templatize this process XML file so that the tokens can be swapped out on each invocation.  This replacement is handled by the build targets we’ll define below.  The tokens are easy to spot – they are the all-caps words bookended by underscores.  For example:  _ORGSERVER_.

These files should be placed in the datascripts/ directory under your root to match the naming conventions we’ve specified in the build targets.  Create each of the files below in that directory.

template-export-process-conf.xml

<!DOCTYPE beans PUBLIC "-//SPRING//DTD BEAN//EN" "http://www.springframework.org/dtd/spring-beans.dtd">
<beans>
   <bean id="exportObject" class="com.salesforce.dataloader.process.ProcessRunner" singleton="false">
      <description>Exports object records to a CSV file.</description>
      <property name="name" value="_OBJECT_Export"/>
      <property name="configOverrideMap">
         <map>
            <entry key="sfdc.endpoint" value="https://_ORGSERVER_.salesforce.com"/>
            <entry key="sfdc.username" value="_USERNAME_"/>
            <entry key="sfdc.password" value="_ENCRYPTED_PASSWORD_"/>
            <entry key="process.encryptionKeyFile" value="_ENCRYPTION_KEY_FILE_"/>
            <entry key="sfdc.debugMessages" value="true"/>
            <entry key="sfdc.debugMessagesFile" value="_DEBUG_MSG_FILE_"/>
            <entry key="sfdc.timeoutSecs" value="600"/>
            <entry key="sfdc.loadBatchSize" value="200"/>
            <entry key="sfdc.entity" value="_OBJECT_"/>
            <entry key="sfdc.extractionRequestSize" value="500"/>
            <entry key="sfdc.extractionSOQL" value="_EXTRACTION_SOQL_"/>
            <entry key="process.operation" value="extract"/>
            <entry key="process.mappingFile" value="_MAPPING_FILE_"/>
            <entry key="dataAccess.type" value="csvWrite"/>
            <entry key="dataAccess.name" value="_DATA_FILENAME_"/>
            <entry key="process.outputSuccess" value="_OUTPUT_SUCCESS_FILE_"/>
            <entry key="process.outputError" value="_OUTPUT_ERROR_FILE_"/>
         </map>
      </property>
   </bean>
</beans>

template-upsert-process-conf.xml

<!DOCTYPE beans PUBLIC "-//SPRING//DTD BEAN//EN" "http://www.springframework.org/dtd/spring-beans.dtd">
<beans>
   <bean id="upsertObject" class="com.salesforce.dataloader.process.ProcessRunner" singleton="false">
      <description>Upserts object records from a CSV file.</description>
      <property name="name" value="_OBJECT_Upsert"/>
      <property name="configOverrideMap">
         <map>
            <entry key="sfdc.endpoint" value="https://_ORGSERVER_.salesforce.com"/>
            <entry key="sfdc.username" value="_USERNAME_"/>
            <entry key="sfdc.password" value="_ENCRYPTED_PASSWORD_"/>
            <entry key="process.encryptionKeyFile" value="_ENCRYPTION_KEY_FILE_"/>
            <entry key="sfdc.debugMessages" value="true"/>
            <entry key="sfdc.debugMessagesFile" value="_DEBUG_MSG_FILE_"/>
            <entry key="sfdc.timeoutSecs" value="600"/>
            <entry key="sfdc.loadBatchSize" value="200"/>
            <entry key="sfdc.externalIdField" value="_EXTERNAL_ID_FIELD_"/>
            <entry key="process.operation" value="upsert"/>
            <entry key="dataAccess.type" value="csvRead"/>
            <entry key="sfdc.entity" value="_OBJECT_"/>
            <entry key="dataAccess.name" value="_DATA_FILENAME_"/>
            <entry key="process.mappingFile" value="_MAPPING_FILE_"/>
            <entry key="process.outputSuccess" value="_OUTPUT_SUCCESS_FILE_"/>
            <entry key="process.outputError" value="_OUTPUT_ERROR_FILE_"/>
         </map>
      </property>
   </bean>
</beans>

Now let’s take a look at the ANT build file.

A Simple ANT Build File

Now that we have our properties defined, we can set up our build file.  A full discussion of ANT targets and build file setup is again out of scope here but covered in great detail on the Apache ANT Project page.  Our build.xml file can stay fairly simple.  Open your text editor again and add the following lines to it.  We will add our build targets later – this is just setting up the task definitions for the external JAR files and specifying our environment.  Save this file as build.xml in your root directory.

<?xml version="1.0" encoding="UTF-8"?>
<project name="Tasks for import and export of Salesforce data" default="" basedir="." xmlns:sf="antlib:com.salesforce">

   <!-- Define required jars-->
   <path id="taskDependencies">
      <pathelement location="./lib/ant.jar"/>
   </path>

   <property environment="env"/>
</project>

ANT Build Targets for Data Loading

We’ll begin by taking the skeleton ANT file we built in the previous step and blowing it out to include everything we need for data importing and exporting.  The final file looks like the example below.  Note, the build.xml file in my github repository for this example is already set up like the following.

<?xml version="1.0" encoding="UTF-8"?>
<project name="Tasks for import and export of Salesforce data" basedir="." xmlns:sf="antlib:com.salesforce">
   
   <!-- this can be overwritten in your property file -->
   <property name="dataLoaderAPIVersion" value="29.0.0" />
   
   <!-- Define required jars-->
   <path id="taskDependencies">
      <pathelement location="./lib/ant.jar"/>
   </path>

   <property environment="env"/>

   <macrodef name="sfUpsert">
      <attribute name="object"/>
      <attribute name="basedir"/>
      <attribute name="orgname"/>
      <attribute name="username"/>
      <attribute name="password"/>
      <attribute name="orgserver"/>
      <attribute name="externalIdField"/>
      <attribute name="dataLoaderAPIVersion"/>
     
      <sequential>
         <echo message="Upserting @{object}"/>
         <mkdir dir="@{basedir}/orgs/@{orgname}/data"/>
         <copy file="@{basedir}/datascripts/template-upsert-process-conf.xml" tofile="@{basedir}/orgs/@{orgname}/data/process-conf.xml" overwrite="true" failonerror="true"/>
         <replace file="@{basedir}/orgs/@{orgname}/data/process-conf.xml">
            <replacefilter token="_ORGSERVER_" value="@{orgserver}"/>
            <replacefilter token="_USERNAME_" value="@{username}"/>
            <replacefilter token="_ENCRYPTED_PASSWORD_" value="@{password}"/>
            <replacefilter token="_ENCRYPTION_KEY_FILE_" value="@{basedir}/datascripts/dataloader_encryption_key.txt"/>
            <replacefilter token="_DEBUG_MSG_FILE_" value="@{basedir}/orgs/@{orgname}/data/debug.log"/>
            <replacefilter token="_OBJECT_" value="@{object}"/>
            <replacefilter token="_DATA_FILENAME_" value="@{basedir}/datascripts/datafiles/@{object}Insert.csv"/>
            <replacefilter token="_MAPPING_FILE_" value="@{basedir}/datascripts/mappingfiles/@{object}InsertMap.sdl"/>
            <replacefilter token="_OUTPUT_SUCCESS_FILE_" value="@{basedir}/orgs/@{orgname}/data/@{object}Ins_success.csv"/>
            <replacefilter token="_OUTPUT_ERROR_FILE_" value="@{basedir}/orgs/@{orgname}/data/@{object}Ins_error.csv"/>
            <replacefilter token="_EXTERNAL_ID_FIELD_" value="@{externalIdField}"/>
            <replacefilter token="_PROCESS_NAME_" value="@{object}UpsertProcess"/>
         </replace>
 
         <java classname="com.salesforce.dataloader.process.ProcessRunner" classpath="@{basedir}/lib/dataloader-@{dataLoaderAPIVersion}-uber.jar" failonerror="true">
            <sysproperty key="salesforce.config.dir" value="@{basedir}/orgs/@{orgname}/data"/>
            <arg line="process.name=upsertObject"/>
         </java>
      </sequential>
   </macrodef>

   <macrodef name="sfExport">
      <attribute name="object"/>
      <attribute name="basedir"/>
      <attribute name="orgname"/>
      <attribute name="username"/>
      <attribute name="password"/>
      <attribute name="orgserver"/>
      <attribute name="soql"/>

      <sequential>
         <echo message="Exporting @{object}"/>
         <mkdir dir="@{basedir}/orgs/@{orgname}/data"/>
         <copy file="@{basedir}/datascripts/template-export-process-conf.xml" tofile="@{basedir}/orgs/@{orgname}/data/process-conf.xml" overwrite="true" failonerror="true"/>
         <replace file="@{basedir}/orgs/@{orgname}/data/process-conf.xml">
            <replacefilter token="_ORGSERVER_" value="@{orgserver}"/>
            <replacefilter token="_USERNAME_" value="@{username}"/>
            <replacefilter token="_ENCRYPTED_PASSWORD_" value="@{password}"/>
            <replacefilter token="_ENCRYPTION_KEY_FILE_" value="@{basedir}/datascripts/dataloader_encryption_key.txt"/>
            <replacefilter token="_DEBUG_MSG_FILE_" value="@{basedir}/orgs/@{orgname}/data/debug.log"/>
            <replacefilter token="_OBJECT_" value="@{object}"/>
            <replacefilter token="_DATA_FILENAME_" value="@{basedir}/orgs/@{orgname}/data/@{object}Export.csv"/>
            <replacefilter token="_MAPPING_FILE_" value="@{basedir}/datascripts/mappingfiles/@{object}ExportMap.sdl"/>
            <replacefilter token="_OUTPUT_SUCCESS_FILE_" value="@{basedir}/orgs/@{orgname}/data/@{object}Export_success.csv"/>
            <replacefilter token="_OUTPUT_ERROR_FILE_" value="@{basedir}/orgs/@{orgname}/data/@{object}Export_error.csv"/>
            <replacefilter token="_EXTRACTION_SOQL_" value="@{soql}"/>
         </replace>

         <java classname="com.salesforce.dataloader.process.ProcessRunner" classpath="@{basedir}/lib/dataloader-@{dataLoaderAPIVersion}-uber.jar" failonerror="true">
            <sysproperty key="salesforce.config.dir" value="@{basedir}/orgs/@{orgname}/data"/>
            <arg line="process.name=exportObject"/> 
         </java>
      </sequential>
   </macrodef> 
   
   <target name="initDataDirectory" description="Removes and re-creates a blank data directory in orgs/${orgname}"> 
      <delete dir="${basedir}/orgs/${orgname}/data"/>
   </target> 

   <target name="exportData" description="Export Salesforce objects"> 
      <sfExport object="${object}" orgserver="${sf.orgserver}" basedir="${basedir}" orgname="${orgname}" dataLoaderAPIVersion="${dataLoaderAPIVersion}" username="${sf.username}" password="${sf.passwordencrypted}" soql="${soql}" /> 
   </target> 

   <target name="upsertData" description="Upsert Salesforce objects">
      <sfUpsert object="${object}" orgserver="${sf.orgserver}" basedir="${basedir}" orgname="${orgname}" dataLoaderAPIVersion="${dataLoaderAPIVersion}" username="${sf.username}" password="${sf.passwordencrypted}" externalIdField="Load_ID__c" />
   </target>

   <target name="updateDataFiles" description="Refreshes the data files found in datascripts/datafiles from a specified org's data directory. This expects the exportData target to have been run first to generate output CSV data load files. Invoke with -Dorg=[org name, ex. qa] to specify the org to copy files from." >
      <copy todir="${basedir}/datascripts/datafiles"> 
         <fileset dir="${basedir}/orgs/${org}/data" includes="**/*Export.csv"/>
         <mapper type="glob" from="*Export.csv" to="*Insert.csv"/> 
      </copy> 
   </target> 
</project>

There are a few targets and two macros we’ve defined.  Let’s quickly go through each and review what they do.

Macros

Macros are pre-defined steps that invoke the Data Loader ProcessRunner command, which is a Java program.  Each macro creates a copy of the appropriate process template file (see previous section) and does token replacement on all the values you see there, using the values passed into the macro.  Most of these values come from the ANT property file; the rest are passed in at run time from the command line.  The Java program is then invoked with the process file and it handles the rest from there.  Two flavors of macro are needed because each specifies different tokens which must be replaced in its respective process file.

  • sfExport - Used for exporting data.
  • sfUpsert - Used for upserting data.  Note that the upsert macro expects data CSV files to live in the datascripts/datafiles directory under your root.  This is so that you can store canned CSV data files in your git repository and load them over and over again.  If you are exporting from another org and want to use those exported files, you need to copy them from the orgs/[orgname]/data directory up to datascripts/datafiles.  Luckily there is a target for this too.

Targets

The other targets are standard ANT targets – they chain together one or more steps.

  • initDataDirectory - deletes and re-creates a clean org-specific data directory under the orgs directory.  Running this before other commands ensures a fresh place to put the output of the commands you run.
  • exportData - invokes the sfExport macro, passing in all relevant properties.  Note the soql property here – we use this to allow the user running the script to specify custom SOQL for exporting the data from a given object.  This can allow for custom filtering.
  • upsertData – invokes the sfUpsert macro, passing in all relevant properties.
  • updateDataFiles - as mentioned above, upsert expects CSV data files to be in datascripts/datafiles.  This target will copy all *Export.csv files from orgs/[orgname]/data to datascripts/datafiles and rename them to *Insert.csv so you’re all set.
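If you find yourself running the same exports together over and over, you can also do the chaining at the ANT level instead of from the command line.  Here is a hypothetical convenience target (it is not in the repository) that reuses the macros to wipe the data directory and export both of our example objects in one shot:

   <target name="exportAccountsAndContacts" depends="initDataDirectory" description="Export Account and Contact together">
      <sfExport object="Account" orgserver="${sf.orgserver}" basedir="${basedir}" orgname="${orgname}" dataLoaderAPIVersion="${dataLoaderAPIVersion}" username="${sf.username}" password="${sf.passwordencrypted}" soql="SELECT Id, Name FROM Account"/>
      <sfExport object="Contact" orgserver="${sf.orgserver}" basedir="${basedir}" orgname="${orgname}" dataLoaderAPIVersion="${dataLoaderAPIVersion}" username="${sf.username}" password="${sf.passwordencrypted}" soql="SELECT Id, FirstName, LastName, AccountId FROM Contact"/>
   </target>

Drop that into build.xml and the whole export runs with a single command:  ant exportAccountsAndContacts -propertyfile properties/sandbox.properties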

One more note – those of you who are fluent and dangerous with ANT can adjust the directory structure and storage / naming conventions however you like.  I have found this approach works for me, as it allows me to have canned CSV files in datascripts/datafiles that I version off in my git repository, but also allows me to export data on the fly from one org, then run updateDataFiles to get those files in the correct location to be able to upsert.  Your mileage may vary.  Feel free to change this around to suit your individual needs.

Running The Export and Import

Alright, now we come to the payoff.  Hang in there, we’re almost done.  Hey, you – in the back, you’re not asleep are you?  Good.  Let’s put everything we’ve done so far together and get some data out of Salesforce.  We are going to use the very simple account and contact example that we set up in Part 1.  Before we do, double check to make sure everything is ready:

Quick Checklist

  • Add Load ID fields to all objects you want to import in your target org?
  • Create Export and Insert SDL mapping files for all objects you want to export and import?
  • Set up your properties file and created an encryption key and your encrypted password?
  • Have your export / upsert process template files ready to go?
  • Have the ANT build.xml file ready to go?
  • Have some data in your source org?

Ok.  Now all we need to do to run the import is to open a command or terminal window, and run the following three ANT commands:

ant initDataDirectory -propertyfile properties/sandbox.properties
ant exportData -propertyfile properties/sandbox.properties -Dobject=Account -Dsoql="SELECT Id, Name FROM Account"
ant exportData -propertyfile properties/sandbox.properties -Dobject=Contact -Dsoql="SELECT Id, FirstName, LastName, AccountId FROM Contact"

Note that you should replace the properties file name with your actual property file name here.  But you knew that already, right?

When these commands run, you will see them spit out a whole bunch of verbose information about the execution of the export.  Most of it you can ignore unless you didn’t get any data exporting out.  What you should see is a line similar to the following towards the end of each exportData run:

2014-08-20 10:53:47,458 INFO [ContactExport] progress.NihilistProgressAdapter setSubTask (NihilistProgressAdapter.java:78) - Processed 22 of 22 total records. Rate: 26,400,000 records per hour. Estimated time to complete: 0 minutes and 0 seconds. There are 22 successes and 0 errors.
2014-08-20 10:53:47,459 INFO [ContactExport] progress.NihilistProgressAdapter doneSuccess (NihilistProgressAdapter.java:63) - The operation has fully completed. There were 22 successful extractions and 0 errors.

You want to see successful extractions and zero errors.  If there were errors, you can scroll up a little and it will tell you what went wrong.

You should now have two files in the orgs/qa/data directory (remember, our example property file set orgname to qa).  Well, actually more than two.  A whole lot more than two.  But you can ignore most of them - the two you're concerned with are the CSV export files.  You'll see AccountExport.csv and ContactExport.csv.  Open these up, check them out and you should see your data in there.  Sweet!  The other files in that directory are things like the object-specific process XML file that the data loader uses, success and error reports, etc.  If something goes wrong, check the error files here to get more information.

A quick note about the -Dsoql property.  You can specify whatever SOQL you want here to extract your data - the data loader will fire that SOQL query, so if you have a WHERE clause in there to get a very specific subset of data, you can do that.  You just need to ensure that you list ALL fields that you want to export and that the list of fields matches your export SDL mapping file.  If they do not match, you will get an error.
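For example, to pull only accounts created in the last 30 days you could run something like this (the WHERE clause is just an illustration; the selected fields still line up with AccountExportMap.sdl):

ant exportData -propertyfile properties/sandbox.properties -Dobject=Account -Dsoql="SELECT Id, Name FROM Account WHERE CreatedDate = LAST_N_DAYS:30"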

Alright, so now we have data.  We can now take these files and either manually move them up to datascripts/datafiles and rename to AccountInsert.csv and ContactInsert.csv or use the updateDataFiles target to do it automatically for us.  I'm lazy so I'll use ANT.  Just like this:

ant updateDataFiles -Dorg=qa -propertyfile properties/sandbox.properties

This should copy both files up and if you look under datascripts/datafiles you should now see AccountInsert.csv and ContactInsert.csv.  Awesome.  We're ready to import this data into another org.

Since I don't want you to have to go to the trouble of setting up a new org and creating Load ID fields on everything and all that, we're going to take a small shortcut here (you can of course create a brand new dev org if you want and set up a new property file - have at it).  We can edit the two files we just created and modify their data so that we are actually creating new records.  Go into the AccountInsert.csv file and delete all but one entry.  Change the first character of the ID field value to something else, and change the name of the account so that it doesn't match any of your existing accounts.  Now go into the ContactInsert.csv file and delete all but two of the entries.  Change the names to something new and then update the AccountId fields to match the modified account ID that you just saved in AccountInsert.csv.  Once you are done with this, run the import as follows:

ant upsertData -propertyfile properties/sandbox.properties -Dobject=Account
ant upsertData -propertyfile properties/sandbox.properties -Dobject=Contact

This will run and spit out a whole bunch of stuff and at the end you should see something like this:

2014-08-20 11:07:57,250 INFO [ContactUpsert] progress.NihilistProgressAdapter setSubTask (NihilistProgressAdapter.java:78) - Processed 22 of 22 total records. Rate: 26,400,000 records per hour. Estimated time to complete: 0 minutes and 0 seconds. There are 22 successes and 0 errors.
2014-08-20 11:07:57,252 INFO [ContactUpsert] progress.NihilistProgressAdapter doneSuccess (NihilistProgressAdapter.java:63) - The operation has fully completed. There were 22 successful upserts and 0 errors.

Go check your org and you should see the new Account you created, and both new Contacts in its related list.  You did it!

One more note about imports - you need to run the imports in an order that respects relationship dependencies.  By this I mean that you can't upsert Contacts without having upserted Accounts first.  Don't believe me?  Try it out.  You will get an error because your Contact CSV file is referring to Accounts which do not exist yet.  So you need to be careful about what order you run the upserts in.  Exports do not matter, but ordering of upserts most definitely does.  Now, once that Account record is in there, then you could run the Contact upsert without first running the Account upsert, because the record is already there.

Errors

Errors sometimes occur when exporting or importing.  A lot of times errors are caused by one of the following:

  • The field you are trying to export does not exist.  Are you exporting from the correct org?  Are you sure?
  • The name of the field in your SOQL is incorrect.
  • The name of the field in your mapping file is incorrect.
  • You forgot to add the Load ID field to your target object.
  • Triggers are firing on import and causing governor limit exceptions.  This can happen with especially large data sets - disable your triggers before importing to be safe.

Creating Export and Import Scripts to Speed Things Up

I'm not going to spend a ton of time here, but suffice it to say that you can create shell scripts to chain all of your ANT commands together rather than running them one by one.  You can define SOQL for each export and fine tune the WHERE clauses to filter data however you'd like.  Then running a full data model export becomes as simple as invoking a single shell script.
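As a very rough sketch (the file name and the SOQL filters are just illustrative), an export script for our two example objects might look like this:

#!/bin/bash
# exportAll.sh - hypothetical wrapper around the ANT targets from this post
# Usage: ./exportAll.sh [path to property file]
set -e

PROPFILE=${1:-properties/sandbox.properties}

# Start from a clean org data directory
ant initDataDirectory -propertyfile "$PROPFILE"

# Export order does not matter
ant exportData -propertyfile "$PROPFILE" -Dobject=Account -Dsoql="SELECT Id, Name FROM Account"
ant exportData -propertyfile "$PROPFILE" -Dobject=Contact -Dsoql="SELECT Id, FirstName, LastName, AccountId FROM Contact"

A matching import script is just a series of ant upsertData calls, with parents listed before children (Account before Contact in our example).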

Integrating Data Import / Export into CI / CD Strategy

You can probably imagine how useful this can be in a continuous integration / continuous deployment strategy for pre-loading your orgs with canned data.  Your unit tests should never run against canned data of course, but let's say you have a CI or CD build set up for the dual purpose of keeping an environment up to date with the latest metadata from your git repository and stocked with a set of data for you to test with.  The possibilities are really endless, and I leave it as an exercise to you, gentle reader, to experiment and let me know the different ways you are using it.  One very rough sketch to get you started follows below.
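A nightly CI job, say, could refresh a sandbox with your normal metadata deployment process and then seed it from the canned files in datascripts/datafiles.  The ci.properties file and the placement after the deploy step are assumptions on my part; substitute whatever your pipeline actually uses:

# ...after your usual metadata deploy step completes...
ant initDataDirectory -propertyfile properties/ci.properties
ant upsertData -propertyfile properties/ci.properties -Dobject=Account
ant upsertData -propertyfile properties/ci.properties -Dobject=Contact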

Summary

Thank you for bearing with me - this is not a simple topic by any means.  I hope you have found this approach useful and are able to turn loading data into your development and test sandboxes into a relatively trivial effort after some initial setup.  I welcome comments, questions and suggestions on improvements to the process.  Happy coding!

 

Scripting Data Loads for Salesforce, Part 1

For my first official post I figured I’d cover a topic near and dear to my heart – scripting data loads for the Force.com platform. As those of you who’ve worked with Salesforce know, the Data Loader is available for you to run exports and imports of data against production orgs and sandboxes, and of course you have the Import Wizard for your more standard type objects – leads, contacts, etc. But in my experience, the most powerful way to do data import/export from a development standpoint is to script those tasks using ANT and the Salesforce Data Loader. Because this is a fairly involved topic, I’m going to split this into two posts. We’ll spend the first post talking about setting up our objects for data loading and creating our mapping files, and then the follow up post will go over the ANT targets and running everything. Alright, enough yapping – let’s get started.

Before you can begin scripting the data loader, you should be familiar with the concepts of upserts and Salesforce external IDs. Why are these important? Because for all but the most simplistic flat data model with no relationships between objects, you will have object relationships that you want to preserve, both when you are exporting the data, as well as when you import that data back into another org. Data loading is very easy and straightforward without relationships. Relationships add a wrinkle that we need to do some creative magic to get around. But fear not – it is all possible.

Upserting allows us to re-run a data import multiple times with no fear that we will be creating duplicate records on successive runs. It also allows us to match on an external ID for this duplicate check, which is what we want for our relationship resolution. We’ll get to that in a moment.

External IDs are fields that Salesforce allows you to use as an ID match when upserting records. External IDs are critical to preserving our data relationships when exporting. We will see exactly how very shortly. The thing to remember now is that you will need to make modifications to any of your objects that you wish to script to allow for an external ID which points to the Salesforce ID.

Setting Up Your Objects For Data Loading

Aside from a full-copy sandbox, Salesforce object IDs will not match between production and a sandbox. Therefore we need a way to ensure that our object relationships are preserved when exporting from one org and then importing into a different org.

As an example, assume we have a simple two object hierarchy: Accounts and Contacts. Accounts can have many contacts, defined via a Lookup field on the Contact object which points to Account. We have two accounts and four contacts in the org, with the following information:

Account 1: ABC Computers, Salesforce ID = 001F000000yHO3ZIAW
Account 2: XYZ Consulting, Salesforce ID = 001F000000yHO3ZIAX
Contact 1: John Jones, account = ABC Computers, Salesforce ID = 003a000000xJP2QIAW
Contact 2: Jane Peters, account = ABC Computers, Salesforce ID = 003a000000xJP3QIAW
Contact 3: Jim Smith, account = XYZ Consulting, Salesforce ID = 003a000000xJP4QIAW
Contact 4: Maria Brown, account = XYZ Consulting, Salesforce ID = 003a000000xJP5QIAW

If we export these records, we end up with the following CSV files – keep in mind that lookup fields are really ID references and as such will export the Salesforce ID with that record.

Account Extract CSV

ID,Name
001F000000yHO3ZIAW,"ABC Computers"
001F000000yHO3ZIAX,"XYZ Consulting"

Contact Extract CSV

ID,FirstName,LastName,AccountId
003a000000xJP2QIAW,John,Jones,001F000000yHO3ZIAW (account ID for ABC Computers)
003a000000xJP3QIAW,Jane,Peters,001F000000yHO3ZIAW (account ID for ABC Computers)
003a000000xJP4QIAW,Jim,Smith,001F000000yHO3ZIAX (account ID for XYZ Consulting)
003a000000xJP5QIAW,Maria,Brown,001F000000yHO3ZIAX (account ID for XYZ Consulting)

Now if we import the accounts, which has to happen first since the contacts refer to the accounts, they import into the sandbox with different IDs as such:

Account 1: ABC Computers, Salesforce ID = 001g000000yCW4xIAN
Account 2: XYZ Consulting, Salesforce ID = 001g000000yCW4yIAN

But this will cause problems when we go to import the contacts, as none of the contacts has an Account ID value matching either of these two new IDs.  That’s because record IDs are generated per org and do not carry over from one org instance to another.  Therefore we must make a provision to use the IDs from the original org as Salesforce External ID fields.  We do this by adding a special field named Load_ID__c to every object we want to import and export.  The field definition for this field is simple:

Data Type: Text (18)
External ID: Yes

This now allows us to specify the ID of the original org in this new field and then use that to link the items together on import. Again, an example will help to clarify this.
When we set up the mapping files, which we will do below, we will indicate to map the ID of the object into the Load_ID__c field. Doing so produces extract files that look as follows:

Account Extract CSV – After Adding Load_ID__c

ID,Name,Load_ID__c
001F000000yHO3ZIAW,"ABC Computers",001F000000yHO3ZIAW
001F000000yHO3ZIAX,"XYZ Consulting",001F000000yHO3ZIAX

Contact Extract CSV – After Adding Load_ID__c

ID,FirstName,LastName,AccountId,Load_ID__c
003a000000xJP2QIAW,John,Jones,001F000000yHO3ZIAW,003a000000xJP2QIAW
003a000000xJP3QIAW,Jane,Peters,001F000000yHO3ZIAW,003a000000xJP3QIAW
003a000000xJP4QIAW,Jim,Smith,001F000000yHO3ZIAX,003a000000xJP4QIAW
003a000000xJP5QIAW,Maria,Brown,001F000000yHO3ZIAX,003a000000xJP5QIAW

Now when we go to do the import, we import the accounts first and they import in as follows:

Account 1: ABC Computers, Salesforce ID = 001g000000yCW4xIAN, Load_ID__c = 001F000000yHO3ZIAW
Account 2: XYZ Consulting, Salesforce ID = 001g000000yCW4yIAN, Load_ID__c = 001F000000yHO3ZIAX

And then we set the contact mapping up to map the Account ID field to the Load_ID__c field of the parent.  This will be demonstrated in the mapping files section below, but when we do this, the Account ID value in the extract file is matched against the Load_ID__c on the imported Account record, everything lines up correctly, and our records import.  Woohoo!

Creating Mapping Files

So now that we know how it works and how to get things matching up nicely, we need to set up mapping files.  Mapping files are the means by which the DataLoader maps a Salesforce field to a CSV column for exports and vice versa for imports.  These are actually fairly simple to set up, but if you have a large object model this exercise can become tedious.  However this only needs to be done once for the full object.  After the initial mapping is set up, you just need to add additional fields as your objects are updated.

In practice, we need two different mapping files for each object – this is because the way you map the relationship fields for importing data is a little bit different.  Let’s go back to our example and see what import/export files would look like for both the Account and Contact objects.

Before we do so however, it’s important to note that the ANT build targets we will be defining for importing and exporting are going to expect a particular naming convention.  This is so that they can function generically against any object.  Sure you could name them whatever you want but then you lose the flexibility of the ANT targets.

So to keep things working generically, we will name the import files as follows: [object]InsertMap.sdl and we will name the export files like this:  [object]ExportMap.sdl.  For our two example objects we have been working with, this will translate to AccountInsertMap.sdl, AccountExportMap.sdl, etc.  If your mapping files do not follow this naming convention, the data loader will fail unless you modify the ANT targets I show you.

Enough about that, on to the examples.

AccountExportMap.sdl

Id=LOAD_ID__C
Name=Name

AccountInsertMap.sdl

LOAD_ID__C=Load_ID__c
Name=Name

ContactExportMap.sdl

Id=LOAD_ID__C
FirstName=FirstName
LastName=LastName
AccountId=AccountId

Note there is a small peculiarity here – for Salesforce standard objects a lot of times the relationship fields are named with [object name]Id such as AccountId, ContactId.  This is slightly counterintuitive as custom objects just refer to the actual field name.  For example if you had a custom object Widget__c and it had a lookup to an account, you’d typically name that field Account__c.  Account__c is then what you’d put in your mapping file.  In the case of standard objects, the API name of the related object is not always a direct match or obvious – check your metadata explorer in the workbench to get the exact name if you’re unsure.

ContactInsertMap.sdl

LOAD_ID__C=Load_ID__c
FirstName=FirstName
LastName=LastName
AccountId=Account:\Load_ID__c

The insert map is important to look at because of the syntax for representing relationships.  Whenever a relationship is needed, you need to specify the parent object followed by a colon, a backslash, and then Load_ID__c (remember the external ID mapping).  Standard object fields can simply be referred to as is (note Account:\Load_ID__c above) but custom objects will have to be referred to with a __r extension instead of __c, since you are referring to the parent object (think SOQL syntax here). An example of a custom object relationship mapping would be:

Custom_Object__c=Custom_Object__r:\Load_ID__c

In all cases, the field names on the left must match your column names in the CSV data file.

That gives us enough to get started with. Your homework is to set up your objects and get those mapping files created for all of your objects. Next post we will dive into the ANT scripts and properties files and then pull everything together. Happy coding till then!

Welcome to my corner of the cloud

Hi everyone.  Welcome to my corner of the cloud.  I will be posting some meaningful content up here very soon but in the meantime I wanted to welcome you to my blog.  I hope that this will become a useful place for you to find technical ramblings and tidbits of advice about all facets of software architecture and engineering on the Force.com platform.