Using TS with a remote sphinx service #1131

Closed
jdelStrother opened this issue Apr 10, 2019 · 14 comments
jdelStrother (Contributor) commented Apr 10, 2019

Hi there,
I'm currently trying to get thinking-sphinx working with searchd in a Docker container, though I think many of the same issues apply if you run searchd on a separate server from your Rails servers. I was hoping to discuss either workarounds that people are using for these cases, or work we could do on thinking-sphinx to improve that workflow.

There are two main pain points I've been hitting:

  • Config-generation seems pretty insistent on calling mkdir_p for various directories, which isn't very useful if you're trying to generate configuration for a remote machine.

  • It seems like we ought to be able to call rake ts:index from a local Rails server and have it populate our realtime indexes on a remote server. However, TS also tries to check that searchd is running (via the pid file) and tries to rotate the index after it's done populating.

In my hacky experimentation I've been working around these with this rake file:

namespace :ts do
  task docker_configure: :environment do
    config = ThinkingSphinx::Configuration.instance
    # force the configuration to load the "docker" key out of thinking_sphinx.yml
    config.framework = ThinkingSphinx::Frameworks::Plain.new.tap { |f| f.environment = "docker"; f.root = Rails.root }
    # sphinx configuration is going to attempt to create some dirs that don't exist locally.  Ignore them.
    def FileUtils.mkdir_p(dir)
      puts "Ignoring request to mkdir_p('#{dir}')"
    end

    interface.configure
  end

  task docker_index: :environment do
    # hack to allow running ts:index against a remote sphinx service.
    ThinkingSphinx::Commander.registry[:running] = proc { puts "fake-sphinx running!" }
    interface.rt.index
  end
end

with this docker-compose:

version: '3'
services:
  sphinx:
    image: macbre/sphinxsearch:3.0.1
    ports:
      - "9306:9306"
    volumes:
      - ./config/docker.sphinx.conf:/opt/sphinx/conf/sphinx.conf
      - ./lib/dict:/opt/sphinx/lib/dict
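
For reference, the "docker" key that the docker_configure task forces the framework to load would live in config/thinking_sphinx.yml. A sketch with hypothetical values (the host name, port, and paths below are illustrative assumptions, not taken from this thread):

```yaml
# config/thinking_sphinx.yml (hypothetical values for illustration)
docker:
  # Talk to the searchd container over its published MySQL-protocol port:
  mysql41: 9306
  address: sphinx
  # Paths as they exist *inside* the container, not on the Rails host:
  indices_location: /opt/sphinx/indexes
  configuration_file: config/docker.sphinx.conf
```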

Any thoughts/plans on separating out some of the TS code that only works if you're running Rails & Sphinx side-by-side? Or am I doing it all wrong?

(Previous docker discussions at #1010)
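
As an aside (an editor's suggestion, not from the thread): redefining FileUtils.mkdir_p directly, as the rake file above does, permanently clobbers the method. Prepending a module to its singleton class achieves the same no-op while keeping the original reachable via super if it's ever needed again:

```ruby
require "fileutils"

# No-op directory creation, so config generation can run on a machine
# where the target directories only exist on the remote Sphinx server.
module RemoteConfigNoMkdir
  def mkdir_p(dir, **options)
    puts "Ignoring request to mkdir_p(#{dir.inspect})"
    dir
  end
end

FileUtils.singleton_class.prepend(RemoteConfigNoMkdir)

# This now neither raises nor creates anything locally:
FileUtils.mkdir_p("/opt/sphinx/indexes")
```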

pat (Owner) commented Apr 10, 2019

Second issue first: definitely sounds like something that should be fixed, probably via an environment variable. I'll look into it soon :)

As for the first: you should only be running the configure task on the Sphinx server - there's no value having it occur on the client servers. However, if this is only a problem due to the index task running configure automatically, you can use INDEX_ONLY=true. That said, would it make sense to just run all the TS tasks only on your Sphinx server?

jdelStrother (Contributor, Author) commented, quoting pat:

> you should only be running the configure task on the Sphinx server - there's no value having it occur on the client servers. [....] would it make sense to just run all the TS tasks only on your Sphinx server?

The thing I'm trying to get away from is that we have a big monolithic Rails app with a lot of dependencies (both gems, and compiled libraries like ImageMagick). So right now our Sphinx server needs to have all those irrelevant dependencies installed just so that we can generate a sphinx config file.
(Admittedly, this approach of generating the config file on a Rails server and then shipping it over to the Sphinx server has plenty of drawbacks of its own.)

pat added a commit that referenced this issue Apr 14, 2019
This is for situations where you're rebuilding a real-time index remotely, and so don't have access to the PID file on the Sphinx server. See #1131.
pat added a commit that referenced this issue Apr 14, 2019
This is useful in Docker environments where you may want to generate the configuration file on your Rails server machine and immediately transfer it over to a dedicated Sphinx machine (which has no Rails context). See #1131.
pat (Owner) commented Apr 14, 2019

Just pushed some commits to the develop branch which add two boolean settings (which can be turned on per-environment in config/thinking_sphinx.yml): skip_directory_creation and skip_running_check. This should remove the need for your monkey patches, but I'd appreciate confirmation once you've tested it! :)
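
For reference, enabling those two settings per environment would look something like this in config/thinking_sphinx.yml (the "docker" environment name here is illustrative):

```yaml
# config/thinking_sphinx.yml
docker:
  # Don't mkdir_p index/log directories that only exist on the Sphinx machine:
  skip_directory_creation: true
  # Don't consult the (remote, inaccessible) searchd PID file before indexing:
  skip_running_check: true
```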

jdelStrother (Contributor, Author):

Yep, both work great thanks 🎉

pat (Owner) commented Apr 16, 2019

Excellent! And that means all the Docker stuff's working well, without your Rails app on the Sphinx server?

jdelStrother (Contributor, Author):

Yep, my docker-searchd container seems to be working fine. It's basically just using the macbre/sphinxsearch image with my config file mounted into it.

xtrasimplicity:

Is there a release scheduled for this feature, @pat? This looks like it could solve some of the issues that have been making me put off de-monolithifying a project I've been working on. Thanks!

pat (Owner) commented May 14, 2019

There's no release just yet - there's a couple of outstanding issues I want to tackle first - but it's on my radar. With a bit of luck I'll have something out early next week 🤞

pat (Owner) commented May 18, 2019

These settings are now part of the newly released v4.3.0 🎉

pat closed this as completed May 18, 2019

alexanderadam:

@jdelStrother / @xtrasimplicity, did either of you experience significant performance issues on index creation?
We're trying the setup mentioned above with @macbre's Sphinx container and the referenced Thinking Sphinx settings skip_running_check and skip_directory_creation, but rake ts:rebuild does indeed take ages.

Does anyone have an idea what the reason could be, or are there any tweaks or other suggestions?

Thank you in advance!

xtrasimplicity:

@alexanderadam, we haven't had any major performance issues, but our database is quite small, and a few minutes at startup isn't a huge issue for us, as searching is only a tiny part of our application's functionality.

You could try increasing the size of /dev/shm from 64MB to something a bit higher, but I'm not sure if that will have any performance benefits for TS.

jdelStrother (Contributor, Author) commented Jun 22, 2020

@alexanderadam Our rebuilds are pretty slow by default, on a database with something like 5 million documents. We've monkeypatched it with an alternative approach:

class ThinkingSphinx::RealTime::Populator
  def populate(&block)
    instrument "start_populating"

    # Optionally stop early after RT_BATCH_LIMIT batches
    limit = ENV["RT_BATCH_LIMIT"]
    count = 0
    scope.find_in_batches(batch_size: batch_size) do |instances|
      break if limit && (count += 1) > limit.to_i
      transcriber.copy(*instances)
      instrument "populated", instances: instances
    end

    instrument "finish_populating"
  end
end

When we need to rebuild a Sphinx index, we'll run, e.g.:

bin/rake ts:rebuild INDEX_FILTER=posts_rt_core RT_BATCH_LIMIT=1000

just to get sphinx back up-and-running with a few documents, and then incrementally populate it with something like this:

Post.find_each do |post|
  ThinkingSphinx::RealTime.callback_for(:post).after_save(post)
end
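
The RT_BATCH_LIMIT early-exit in the Populator patch above can be sketched in isolation. This is a plain-Ruby stand-in (an editor's illustration, not ThinkingSphinx code) that uses each_slice in place of ActiveRecord's find_in_batches, just to show the control flow:

```ruby
# Stand-in for the batch-limited populate loop: process at most
# `limit` batches of `batch_size` records, then stop early.
def populate_batches(records, batch_size:, limit: nil)
  copied = []
  count = 0
  records.each_slice(batch_size) do |batch|
    break if limit && (count += 1) > limit
    copied.concat(batch) # stands in for transcriber.copy(*batch)
  end
  copied
end

populate_batches((1..10).to_a, batch_size: 3, limit: 2)
# => [1, 2, 3, 4, 5, 6] -- two batches of three, then the loop stops
```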
