Automating tasks with Capybara and Poltergeist (on Heroku)

 The Problem

As the solo dev at a company that uses a lot of GUI tools to manage its work, I've been tasked with building tools to help automate its workflows. They must be easy to use, since they may be used by contractors. They should be web-based, since they need to support multiple users who may be on Windows or Mac devices.

One of our biggest problems is that product data needs to be exported into multiple applications, a very labor-intensive process. Our product data exists in CSV format, and since our listings must be added to WordPress through its GUI, publishing them is incredibly time-consuming. Additionally, each product must also be added to the CMS separately. My task was to build an app that a user could upload a CSV to, which would then automatically create the listing in both WordPress and the CMS. The CMS was reasonably straightforward, as it provided an API that I could submit POST requests to; the challenge was WordPress.

 Attempt #1: Mechanize

I was familiar with using Capybara to write my acceptance tests and initially thought using it with a driver that could visit remote hosts would be the ideal option. However, after looking at the options, I decided to go with Mechanize, since it provided a very nice-looking DSL and great documentation.

I got Mechanize running and was able to fill out most of the form elements, but ran into an issue when trying to fill in the description of a product. The field was not a form input but rather an iframe that let you input text that altered the contents of a p tag. I spent some time looking but didn't find a good way to work with iframes or execute JavaScript with Mechanize.

[Image: An example of the description field]

[Image: An example of the iframe's code]

 Attempt #2: Capybara with Selenium

I decided that Mechanize wasn't the right tool for the job and that a Capybara + JS driver solution would be best, since I knew Capybara made it easy to step into iframes and there were a number of JS drivers I could use.

I decided to use Selenium as my driver, since I thought it would be useful to watch the script run and it wasn't important that it run very quickly. At this point we discovered a bug in our site where the login screen didn't render correctly in Firefox. After a quick call to the contractor maintaining the site to get a fix in place, I decided to try Poltergeist on the recommendation of a colleague.

 Attempt #3: Capybara and Poltergeist

I decided to first build a simple command-line script before porting the application to Sinatra. This is what I came up with:

require 'csv'
require 'json'
require 'capybara/poltergeist'

puts 'Listing bot waking up...'
puts 'What is the filename you would like to draft'
filename = gets.chomp
puts 'What is your username'
username = gets.chomp
puts 'What is your password'
password = gets.chomp

data = {}
symbols = [:title, :price, :location, :city, :room,
           :subroom, :description, :owner_name, :hot_deals,
           :featured, :seller]

puts 'Reading from CSV'
# headers: true yields each row as a CSV::Row keyed by the header line.
# The file is expected to hold one product; a later row would overwrite it.
CSV.foreach(filename, headers: true) do |row|
  row = row.to_h
  symbols.each do |sym|
    data[sym] = row.delete(sym.to_s)
  end
  # Any column that isn't a known field (or the "attributes:" marker) is an attribute
  row.delete("attributes:")
  data[:attributes] = {}
  row.each do |k, v|
    data[:attributes][k] = v
  end
end

puts 'Reading CSV complete'

Capybara.configure do |config|
  config.run_server = false
  config.default_driver = :poltergeist
  config.app_host = 'https://yourhost.com' # change url
end

Capybara.register_driver :poltergeist do |app|
  Capybara::Poltergeist::Driver.new(app, { window_size: [1600, 3500] })
end

s = Capybara::Session.new(:poltergeist)

puts 'Logging in'
s.visit '/login'
s.within('form.login') do
  s.fill_in "username", with: username
  s.fill_in "password", with: password
  s.click_on "Login"
end

puts 'Visiting new product page'
s.visit '/new_product_page'

puts "Filling in title with #{data[:title]}"
s.fill_in 'post_title', with: data[:title]

puts "Filling in price with #{data[:price]}"
s.fill_in '_regular_price', with: data[:price]

puts "Filling in description with #{data[:description]}"
s.within_frame 'content_ifr' do
  # to_json escapes quotes and newlines so the description is a valid JS string
  s.execute_script("document.getElementsByTagName('p')[0].innerHTML = #{data[:description].to_json};")
end

puts "Filling in seller name with #{data[:owner_name]}"
s.fill_in 'pods_meta_seller_name', with: data[:owner_name]

puts "Selecting #{data[:seller]} from available sellers"
s.select data[:seller], from: 'pods_meta_seller'

# Any value in these cells counts as "yes"; an empty cell is read as nil
if data[:featured]
  puts "Checking off Featured"
  s.find('a.edit-catalog-visibility', text: "Edit").click
  s.find('input#_featured').set(true)
  s.find('a.save-post-visibility', text: "OK").click
end

if data[:hot_deals]
  puts "Checking off Hot Deals"
  s.check 'pods_meta_hotdeal'
end

puts "Filling in additional item attributes"
s.find('li.attribute_tab').click
data[:attributes].each do |k, v|
  s.find('button.add_attribute').click
  puts "  Setting item's #{k} to be #{v}"
  # set the name on the newly added attribute row
  s.all('input.attribute_name').last.set(k)
  # set the value
  s.all('.product_attributes textarea').last.set(v)
  # tick the "visible" checkbox
  s.all('.product_attributes input.checkbox').last.set(true)
end

# Save Draft
s.find('#save-post').click

puts "Draft saved. Listing Bot self destruct in:"
3.downto(1) do |n|
  sleep 1
  puts n
end

Disclaimer: Since the tool was for internal use, there's not much in the way of validation. The CSV is expected to be in the correct format.
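If some validation were wanted, a cheap first step could be checking the header row before the browser is ever started. This is a sketch using Ruby's standard csv library; it assumes the required columns are the ones the script fills in directly, and REQUIRED_HEADERS and missing_headers are hypothetical names.

```ruby
require 'csv'

# Columns the script fills in directly (assumed to be the required ones).
REQUIRED_HEADERS = %w[title price description owner_name seller].freeze

# Returns the required headers missing from the first line of the CSV text.
def missing_headers(csv_text)
  headers = CSV.parse_line(csv_text) || [] # nil for empty input
  REQUIRED_HEADERS - headers.map { |h| h.to_s.strip }
end
```

The script could then abort with a message listing missing_headers(File.read(filename)) whenever the result isn't empty.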

 Part 1: The setup

require 'csv'
require 'capybara/poltergeist'

puts 'Listing bot waking up...'
puts 'What is the filename you would like to draft'
filename = gets.chomp
puts 'What is your username'
username = gets.chomp
puts 'What is your password'
password = gets.chomp

data = {}
symbols = [:title, :price, :location, :city, :room,
           :subroom, :description, :owner_name, :hot_deals,
           :featured, :seller]

puts 'Reading from CSV'
CSV.foreach(filename, headers: true) do |row|
  row = row.to_h
  symbols.each do |sym|
    data[sym] = row.delete(sym.to_s)
  end
  row.delete("attributes:")
  data[:attributes] = {}
  row.each do |k, v|
    data[:attributes][k] = v
  end
end

In the section above, we're just loading the libraries, getting the CSV filename, username, and password from the user, and converting the CSV into a data structure that's easier to work with.
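To make the expected file layout concrete, here is a sketch of the same parsing run against an inline sample, using only Ruby's standard csv library. It assumes a header row, a single product row, and a marker column named attributes:, with every column that isn't a known field treated as an attribute. The sample data and the parse_listing name are hypothetical.

```ruby
require 'csv'

FIELDS = %i[title price location city room subroom description
            owner_name hot_deals featured seller].freeze

# Parses one listing from CSV text. Known columns become symbol keys;
# every other column (except the "attributes:" marker) is collected
# under :attributes. Only the first data row is read.
def parse_listing(csv_text)
  row = CSV.parse(csv_text, headers: true).first.to_h
  data = {}
  FIELDS.each { |field| data[field] = row.delete(field.to_s) }
  row.delete('attributes:')
  data[:attributes] = row
  data
end

sample = <<~TEXT
  title,price,description,owner_name,seller,attributes:,Color,Material
  Oak Desk,120,"A sturdy ""vintage"" desk",Jane,Acme,,Brown,Oak
TEXT

listing = parse_listing(sample)
p listing[:title] # => "Oak Desk"
p listing[:attributes]
```

Columns missing from the file, such as location here, simply come back as nil.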

Capybara.configure do |config|
  config.run_server = false
  config.default_driver = :poltergeist
  config.app_host = 'https://yourhost.com' # change url
end

Capybara.register_driver :poltergeist do |app|
  Capybara::Poltergeist::Driver.new(app, { window_size: [1600, 3500] })
end

s = Capybara::Session.new(:poltergeist)

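We set config.run_server = false since we're driving a remote site rather than a local Rack-based app, so there is no server for Capybara to boot.
We set config.default_driver = :poltergeist so that Poltergeist is used by default, not only for JavaScript (which is what config.javascript_driver controls). It's important to note this is not the same as current_driver.
We set config.app_host equal to our site's URL, so relative paths such as '/login' are resolved against it.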

There is a very useful Capybara::Session method, save_screenshot, that saves a screenshot of the current state of the page you're visiting. This is exceptionally useful for debugging. I used Capybara::Poltergeist::Driver.new(app, { window_size: [1600, 3500] }) to make the window very large so the screenshot would capture the whole page.

s = Capybara::Session.new(:poltergeist) creates a new Capybara Session using Poltergeist that we can use to navigate and alter the page.
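As a sketch of how screenshot debugging could be wired in, the helper below wraps one step of the bot and, if the step raises, saves a screenshot before re-raising. The name debug_step and the file-naming scheme are hypothetical; save_screenshot is Capybara::Session's screenshot method, and full: true is Poltergeist's option for rendering the entire page instead of just the viewport, an alternative to the oversized window.

```ruby
# Runs one named step of the bot. If the step raises, a screenshot of the
# page is saved before the error is re-raised. `session` is expected to
# respond to save_screenshot(path, full: true), as a Capybara::Session
# backed by Poltergeist does.
def debug_step(session, name)
  puts name
  yield
rescue StandardError
  # e.g. "Filling in price" -> "failed-filling-in-price.png"
  path = "failed-#{name.downcase.gsub(/[^a-z0-9]+/, '-')}.png"
  session.save_screenshot(path, full: true)
  raise
end
```

Each step of the script could then be wrapped, e.g. debug_step(s, 'Filling in price') { s.fill_in '_regular_price', with: data[:price] }.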

 Part 2: Filling out forms

The rest of the script is reasonably standard if you've used Capybara for acceptance testing: you identify a form field and fill it in, check it, select an option, etc. using Capybara's DSL for form inputs. The notable exception is the iframe we saw earlier, which we handle with the following bit of code:

puts "Filling in description with #{data[:description]}"
s.within_frame 'content_ifr' do
  # to_json escapes quotes and newlines so the description is a valid JS string
  s.execute_script("document.getElementsByTagName('p')[0].innerHTML = #{data[:description].to_json};")
end

We first navigate within the iframe using Capybara::Session’s within_frame method. We then execute a line of JavaScript that finds the p tag being used as a form field and sets it to our description.
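Interpolating the description directly into a double-quoted JavaScript string breaks as soon as the text contains a double quote, a backslash, or a newline. Here is a sketch of one fix using Ruby's standard json library: to_json produces an escaped string literal that JavaScript can parse. The description_script helper name is hypothetical.

```ruby
require 'json'

# Builds the JavaScript that sets the first <p> inside the editor iframe.
# to_json turns the Ruby string into an escaped, double-quoted literal.
def description_script(description)
  "document.getElementsByTagName('p')[0].innerHTML = #{description.to_json};"
end

puts description_script(%(A "vintage" desk\nwith drawers))
# prints: document.getElementsByTagName('p')[0].innerHTML = "A \"vintage\" desk\nwith drawers";
```

Inside the frame, the call would then be s.execute_script(description_script(data[:description])).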

 Porting to Sinatra

I had to move the logic to Sinatra, and did so by moving the script into its own model (called Bot) that takes the username, password, and file as arguments rather than console inputs. Other than that, it was not altered.

For the client, I created a form with inputs for the username and password, plus a field for file upload:

<form class="add-product" action='/create' method='post' enctype='multipart/form-data'>
  Username:<br>
  <input type="text" name="username">
  <br>
  Password:<br>
  <input type="password" name="password">
  <br>
  Upload CSV:<br>
  <input type='file' name='my_file'>
  <br><br>

  <input type='submit' value='Submit'>
</form>

Then on the server I created this:

require 'sinatra'
require 'capybara/poltergeist'
require 'csv'
require 'json'
require 'rest_client'

configure do
  set :views, 'app/views'
end
configure :production do
  require 'newrelic_rpm'
end

Dir[File.join(File.dirname(__FILE__), 'app', '**', '*.rb')].each do |file|
  require file
end

get '/' do
  erb :index
end

post '/create' do
  username = params['username']
  password = params['password']
  file = params['my_file'][:tempfile]
  Bot.new(username, password, file).populate_wordpress_and_pipedeals
  File.delete(file.path)
  redirect '/'
end

The thing to note is that the uploaded file is available in params: Sinatra stores it as a hash whose :tempfile entry holds the file on disk. I passed the entire file to my script, making sure to delete it after I was done.
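Sinatra hands an upload over as a hash with entries like :filename, :type, and :tempfile, where :tempfile is a Tempfile on disk. One way to guarantee the cleanup happens even when the bot raises is a small wrapper with an ensure clause. This is a sketch with Ruby's standard tempfile library; with_uploaded_file is a hypothetical name.

```ruby
require 'tempfile'

# Yields the uploaded Tempfile and deletes it afterwards, even if the
# block raises. `upload` is a hash shaped like Sinatra's file params.
def with_uploaded_file(upload)
  tempfile = upload[:tempfile]
  yield tempfile
ensure
  tempfile.close! if tempfile # closes and unlinks the file
end
```

In the route, this would look like with_uploaded_file(params['my_file']) { |file| Bot.new(username, password, file).populate_wordpress_and_pipedeals }.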

 What about PhantomJS?

Poltergeist runs on top of PhantomJS. Unfortunately, since I wanted to deploy to Heroku, I couldn't install PhantomJS the way I would on my own machine. After some searching on Stack Overflow, I found that if I just placed the PhantomJS binary in a bin directory at the root of the application, it should work, presumably because Heroku puts the app's bin directory on the PATH, where Poltergeist looks for phantomjs. Surprisingly, it did!
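If relying on the PATH feels too implicit, Poltergeist also accepts a phantomjs option with an explicit path to the binary. A small sketch, assuming the binary lives at bin/phantomjs relative to the app root; bundled_phantomjs is a hypothetical helper name.

```ruby
# Returns the path to a phantomjs binary checked into the app's bin/
# directory if it exists and is executable; otherwise nil, in which case
# Poltergeist's default lookup on the PATH applies.
def bundled_phantomjs(app_root)
  candidate = File.join(app_root, 'bin', 'phantomjs')
  File.executable?(candidate) ? candidate : nil
end
```

Its result could then be handed to the driver, for example Capybara::Poltergeist::Driver.new(app, window_size: [1600, 3500], phantomjs: path) when path isn't nil.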

 Memory leak

Excited, I pushed the working version to Heroku and had all my colleagues give it a try. On the fourth attempt, it crashed. Looking at the logs, I saw an R14 (Memory Quota Exceeded) error. My first step was to switch from WEBrick to Unicorn. After consulting with @heroiceric, I inspected the running processes and realized the PhantomJS process was not being killed between requests. I initially added a pkill phantom command at the end of each request. Then @richardbeastmaster pointed me to the docs, and I realized I was creating a session with s = Capybara::Session.new(:poltergeist) but never closing it. A quick update to call s.driver.quit solved the problem.
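The fix boils down to making sure every session's driver is quit. A minimal sketch, assuming you also want the PhantomJS process shut down when a step fails partway through, puts the quit in an ensure clause. The with_browser_session helper is hypothetical; in the app, build_session would be something like -> { Capybara::Session.new(:poltergeist) }.

```ruby
# Builds a session, yields it, and always quits its driver afterwards,
# even when a step raises. `build_session` is any callable returning an
# object whose driver responds to quit.
def with_browser_session(build_session)
  session = build_session.call
  yield session
ensure
  session.driver.quit if session
end
```

The bot's steps would then run inside with_browser_session(-> { Capybara::Session.new(:poltergeist) }) { |s| ... }, so no PhantomJS process outlives its request.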

 What’s next?

The tool is in a good state for now: my team is able to use it to publish items to multiple locations in a fraction of the time. The main limitation is that, on rare occasions, Heroku times out the request (its router gives a request 30 seconds to respond), since all the work happens inside the request rather than in a background job.

 