6
votes

I'm noticing a strange issue that I haven't been able to solve for a few days.

I have a Rails 5 API server with system tests using RSpec and Capybara + Selenium-webdriver driving headless Chrome.

I'm using Capybara.app_host = 'http://localhost:4200' to make the tests hit a separate development server which is running an Ember front-end. The Ember front-end looks at the user agent to know to then send requests to the Rails API test database.

All the tests run fine except for ones which use RSpec file fixtures.

Here's one spec that is failing:

describe 'the affiliate program', :vcr, type: :system do
  fixtures :all

  before do
    Capybara.session_name = :affiliate
    visit('/')
    signup_and_verify_email(signup_intent: :seller)
    visit_affiliate_settings
  end

  it 'can use the affiliate page' do
    affiliate_token = page.text[/Your affiliate token is \b(.+?)\b/i, 1]
    expect(affiliate_token).to be_present

    # When a referral signs up.
    Capybara.session_name = :referral
    visit("?client=#{affiliate_token}")
    signup_and_verify_email(signup_intent: :member)

    refresh

    # It can track the referral.
    Capybara.session_name = :affiliate
    refresh
    expect(page).to have_selector('.referral-row', count: 1)

    # When a referral makes a purchase.
    Capybara.session_name = :referral
    find('[href="/videos"]').click
    find('.price-area .coin-usd-amount', match: :first).click
    find('.cart-dropdown-body .checkout-button').click
    find('.checkout-button').click
    wait_for { find('.countdown-timer') }
    order = Order.last
    order.force_complete_payment!
    Rake::Task['affiliate_referral:update_amounts_earned'].invoke

    # It can track the earnings.
    Capybara.session_name = :affiliate
    refresh
    amount = (order.price * AffiliateReferral::COMMISSION_PERCENTAGE).floor.to_f
    amount_in_dom = find('.referral-amount-earned', match: :first).text.gsub(/[^\d\.]/, '').to_f * 100
    expect(amount).to equal(amount_in_dom)
  end
end

This will fail maybe 99% of the time. There is the odd case where it passes. I can get my test suite to eventually pass by running it on a loop for a day.

I ended up upgrading all versions to the latest (Node 10, latest Ember, latest Rails) but the issue persists.

I can post a sample repo that reproduces the issue later. I just wanted to get this posted in case anyone has encountered the issue.

Here's a typical stack trace when the timeout happens:

 1.1) Failure/Error: page.evaluate_script('window.location.reload()')

      Net::ReadTimeout:
        Net::ReadTimeout



      # /home/mhluska/.rvm/gems/ruby-2.5.1/gems/webmock-3.3.0/lib/webmock/http_lib_adapters/net_http.rb:97:in `block in request'
      # /home/mhluska/.rvm/gems/ruby-2.5.1/gems/webmock-3.3.0/lib/webmock/http_lib_adapters/net_http.rb:110:in `block in request'
      # /home/mhluska/.rvm/gems/ruby-2.5.1/gems/webmock-3.3.0/lib/webmock/http_lib_adapters/net_http.rb:109:in `request'
      # /home/mhluska/.rvm/gems/ruby-2.5.1/gems/selenium-webdriver-3.14.0/lib/selenium/webdriver/remote/http/default.rb:121:in `response_for'
      # /home/mhluska/.rvm/gems/ruby-2.5.1/gems/selenium-webdriver-3.14.0/lib/selenium/webdriver/remote/http/default.rb:76:in `request'
      # /home/mhluska/.rvm/gems/ruby-2.5.1/gems/selenium-webdriver-3.14.0/lib/selenium/webdriver/remote/http/common.rb:62:in `call'
      # /home/mhluska/.rvm/gems/ruby-2.5.1/gems/selenium-webdriver-3.14.0/lib/selenium/webdriver/remote/bridge.rb:164:in `execute'
      # /home/mhluska/.rvm/gems/ruby-2.5.1/gems/selenium-webdriver-3.14.0/lib/selenium/webdriver/remote/oss/bridge.rb:584:in `execute'
      # /home/mhluska/.rvm/gems/ruby-2.5.1/gems/selenium-webdriver-3.14.0/lib/selenium/webdriver/remote/oss/bridge.rb:267:in `execute_script'
      # /home/mhluska/.rvm/gems/ruby-2.5.1/gems/selenium-webdriver-3.14.0/lib/selenium/webdriver/common/driver.rb:211:in `execute_script'
      # /home/mhluska/.rvm/gems/ruby-2.5.1/gems/capybara-3.8.2/lib/capybara/selenium/driver.rb:84:in `execute_script'
      # /home/mhluska/.rvm/gems/ruby-2.5.1/gems/capybara-3.8.2/lib/capybara/selenium/driver.rb:88:in `evaluate_script'
      # /home/mhluska/.rvm/gems/ruby-2.5.1/gems/capybara-3.8.2/lib/capybara/session.rb:575:in `evaluate_script'
      # ./spec/support/selenium.rb:48:in `refresh'
      # ./spec/support/pages.rb:70:in `signup_and_verify_email'
      # ./spec/system/payment_spec.rb:43:in `block (3 levels) in <top (required)>'

I should point out it doesn't always happen with page.evaluate_script('window.location.reload()'). It can happen with something benign like visit('/').


Edit: I tried disabling Ember FastBoot (server-side rendering) using the DISABLE_FASTBOOT env variable and suddenly all tests pass. I'm thinking that somehow the RSpec fixtures are causing Ember FastBoot to not finish rendering in some cases. This certainly lines up with dropped connections I've occasionally seen in production logs.

I've been experimenting with the client code and it may be due to my use of FastBoot's deferRendering call.


Edit: I'm using the following versions:

  • ember-cli: 3.1.3
  • ember-data: 3.0.2
  • rails: 5.2.1
  • rspec: 3.8.0
  • capybara: 3.8.2
  • selenium-webdriver: 3.14.0
  • google chrome: 69.0.3497.100 (Official Build) (64-bit)

Edit: I'm using this somewhat flaky Node/Express library fastboot-app-server to do server-side rendering. I've discovered that it sometimes strips important response headers (Content-Type and Content-Encoding). I'm wondering if this is contributing to the issue.


Edit: I added a strict Content Security Policy to make sure there are no external requests running during the test suite that could be causing the Net::ReadTimeout.

I inspect the Chrome network tab at the point when it locks up and it seems to be loading nothing. Manually refreshing the browser allows the tests to pick up and continue running. How strange.

I've spent a couple weeks on this now and it may be time to give up on Selenium tests.

I upgraded to Chrome 70 and chromedriver 2.43. It didn't seem to make a difference.

I tried using the rspec-retry gem to force a refresh when the timeout occurs but the gem seems to fail to catch the timeout exception.

I've inspected the raw request to chromedriver where things hang. It looks like it's always POST http://127.0.0.1/session/<session id>/refresh. I tried refreshing in an alternate way: visit(page.current_path) which seems to fix things!

1
This can occur if you haven't exempted selenium requests from webmock/VCR or if your app is taking too long to start up (would usually be the first test failing then) - Do your test logs show the relevant apps starting up and receiving requests?Thomas Walpole
I don't think it's either of those things because VCR is not complaining about blocked requests and my other specs without fixtures pass just fine.Maros
What version of Rails are you using and have you managed to produce a sample repo that reproduces the issue?Thomas Walpole
@ThomasWalpole I'm using Rails 5.2.1. No sample repo yet but I'm working on it. I'm slowly reducing my production app to an example that contains the issue. It's quite time consuming.Maros
@MarosHluska If you are not getting time for the sample repo then I think you can just share the VCR recorded yml for this endpoint of your app and this rspec file.Jaswinder

1 Answers

0
votes

I finally got my test suite to pass by switching page.driver.browser.navigate.refresh to visit(page.current_path).

I know it's an ugly hack but it's the only thing I could find to get things working (see my various attempts in the question edits).

I looked at the request to chromedriver that was causing the timeouts each time: POST http://127.0.0.1/session/<session id>/refresh. I can only guess that it's some kind of issue with chromedriver. Perhaps incidentally, it only hangs when multiple chromedriver instances are active (which happens when multiple Capybara sessions are being used).

Edit: I needed to account for query params as well:

def refresh   
  query = URI.parse(page.current_url).query
  path = page.current_path
  path += "?#{query}" if query.present?

  visit(path)
end

I tried just doing visit(page.current_url) but that was giving timeouts as well.