Friday, April 26, 2024
No menu items!
HomeData IntegrationHow to automate importing data from CSV files

How to automate importing data from CSV files

Read Time: 8 minutes

When it comes to moving data, CSV files are inescapable. Critical business processes like order management, lead management, payroll processing, customer or partner onboarding, data pipelines, and more can require importing data from CSV files into one business app or another.  

While business applications do provide tools to import CSVs, the process of manually downloading CSV files and importing them is tedious, and limited with what you can do.

For one thing, manually downloading and importing files is slow and inconsistent. You’re relying on a person to be aware whenever a new file is ready, and to be available to do the work, so it’s difficult to consistently meet standards for timeliness. With time-sensitive data like leads, a day or two of lag time can mean the difference between a viable opportunity and even more wasted time chasing a prospect who’s already moved on with a competitor who got there faster.

For another, most CSV importers are “dumb”. They don’t know how to deal with data that’s missing, incorrectly formatted, or irrelevant, so manually importing CSV data without quality control can clog up your apps with inaccurate, duplicate, or unactionable data.

To work successfully with CSVs you need to build a smart, customized importer to pick up files from any source, extract the raw data, and transform it into a useful format. To show you how to do this, we’ll work through an example.

Lead ingestion – moving leads from CSV files to Salesforce

Lead data comes from a wide range of sources. Cloud apps like Marketo, Hubspot and Unbounce output leads from emails and landing pages that can be captured, enriched and routed in real time. However, leads might also come from less manageable channels. For example, if you’ve ever scanned badges at a tech conference, you’ve probably received a huge CSV file full of lead data after an event. Similarly, many virtual event platforms also output lists of attendees as CSV files.

While leads that arrive by CSV can’t be dealt with in real-time, it is possible to streamline your process to automatically pick up new files, parse out the relevant data, and put your event leads through the same enrichment and routing pipeline as leads from real-time sources. Let’s look at how to turn a raw CSV file into clean, usable data.

Start with a sample dataset 

To develop and test our automation, we need some test data to work with. You probably already have a few sample files you can work with. If not, you can use a data-mocking tool like Mockaroo. Just make sure the format of your file closely matches the production data that you want to process. For this example, we have a CSV file with five columns:

ID – a row number
Email
Name
Company
Date

For a production use case, each file may have thousands, or tens of thousands of rows, but it’s okay for your sample data set to be smaller while you build and test your recipe.

Automate file uploads

To automate uploading your CSV files, you need to consider where they will come from. Files can arrive via a number of routes:

Via an FTP or SFTP server
Attached to an email via the Gmail or Outlook apps
Attached to a Slack message
Via a cloud drive like Google Drive, Box, or Dropbox
Cloud storage apps like Amazon S3, Google Cloud Storage, or Azure Blob Storage,

For this example, we’ll pick up CSV files from the Google Drive app. You can make a free Google Drive account to follow along exactly with this tutorial, but the same basic steps will apply, whatever the source of your CSV files.

Create a new recipe

Set up your new recipe, selecting Trigger from app.

Select the Google Drive app.

A corporate drive might have hundreds of files added a day, so we’ll choose the New file or folder in folder hierarchy trigger. This way we can narrow our focus only to files in a single folder.

If you don’t already have an active connection to Google Drive, you’ll need to sign in to access your drive. Once you’re connected, select the folder your CSV files will be uploaded to.

Get the right file at the right time 

A key advantage to automating the upload process is being able to get the right file every time. Over time, a Google Drive folder may have hundreds or thousands of almost identically named files. It’s easy for a person to accidentally grab the wrong one. Our trigger explicitly captures files only as they are added to a particular folder, so you always get the right file straight away.

You can also set a trigger filter to check for the correct file naming convention. For example, you can filter for only files where the filename contains event_leads_. This way, the recipe will pick up a file named event_leads_2022-02-01.csv but not purchase_orders.csv.

Test

Let’s test that our trigger can successfully pick up a CSV file from Google Drive. First, set the recipe to test mode.


Upload the CSV file of mock data to the correct folder in Google Drive. If necessary, make sure to rename the file to match the naming convention you set up in the trigger filter.

Switch back to Workato and click Check now

You should see a successful test. Check the filename to make sure the trigger has picked up the correct file.

If your test doesn’t work, check the following:

You’re connected to the right Google Drive account
You uploaded the file to the directory selected in the trigger setup
The name of the file matches the naming convention you set up in the trigger filter

Extract useful data

Access the CSV app

The CSV by Workato app lets you process a CSV file from any source, or create a new CSV file from any data. We’ll use it to extract usable data from our raw CSV file.

Create a new step and select Action in app

Select the CSV by Workato app.

Choose the Parse CSV action.

We need to provide the raw contents of the CSV file to be parsed. This part will look a little different depending on where your CSV file originates. In this case, select File Contents from the Google Drive trigger.

Configure the CSV parser

Now that we’ve got our CSV file, we need to extract the data from it and make it actionable. To do this, we need to be able to describe how the data is structured. There are three important things we need to know about our CSV data:

What character separates each column? This is usually a comma, but can sometimes be another character, such as tab, semicolon (;) or pipe (|)

How can we know what type of data is in each column? There are two ways CSV data can be structured:
By column order – this method relies on you knowing that column 2 contains the email address, column 3 contains the name, etc. This method relies on the column order of each file you process being the same.
By column name – this method relies on the CSV file having a header row which contains the names of each column. This method relies on having consistent column names in each file you process.

What quote character is used? Sometimes a value in a CSV file will contain a comma, which is also used to separate columns. To keep the file valid, the value is enclosed in single or double quotes (” or “”).

Let’s have a look at the raw CSV file used as a sample.

We can see that:

Columns are separated by a comma
We have a header row specifying the column names
Double quotes are used to enclose values that contain commas

Use this information to configure the Parse CSV action:

Test

To test our parser configuration, switch back to the Test tab. No need to upload a new file, we can just repeat the previous test to use the file we’ve already captured.

You should see a successful test. Check the output of the Parse CSV step. There should be a list of lines, with 1000 entries. Each entry has properties that match the header line of the CSV:

id
email
name
company
date

Use the data in your recipe

We’ve turned our raw CSV file into a list of 1000 individual records, each representing a single lead. Now, we can use that data as we would any other list. A couple of examples of what we might do from here:

Upload leads in bulk to a CRM

If we’re confident in the quality of the data in the CSV file, we can upload the data directly to our CRM using a batch action.

Cleanse, validate and transform data

However, we may not be completely confident in the quality of the source data in our CSV file. In this case, we can use a Repeat action to cleanse and validate the data line by line.

Learn more: other ways to access CSV data

We’ve looked at how to use the CSV Parser app to extract data from a CSV, since this will work no matter where your CSV file comes from. 

Some apps also have convenience methods to make it easier to work with CSV data. For example, the Dropbox connector has specialized triggers for capturing new/updated CSV files or even new lines in an existing file.

Remember that to use specialized triggers like these, you still need to provide information about the CSV file, like header row, column separator, etc. So as in the example above, make sure you check an example of the raw CSV file so that you can correctly set up the trigger.

 

The post How to automate importing data from CSV files appeared first on Workato Product Hub.

Read MoreWorkato Product Hub

RELATED ARTICLES

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Most Popular

Recent Comments