Using TS with a remote sphinx service #1131

Closed
jdelStrother opened this issue Apr 10, 2019 · 14 comments
jdelStrother (Contributor) commented Apr 10, 2019

Hi there,
I'm currently trying to get thinking-sphinx working with searchd in a Docker container, though I think many of the same issues apply if you run searchd on a separate server from your Rails servers. I was hoping to discuss either workarounds that people are using for these cases, or work we could do on thinking-sphinx to improve that workflow.

There are two main pain points I've been hitting:

  • Config-generation seems pretty insistent on calling mkdir_p for various directories, which isn't very useful if you're trying to generate configuration for a remote machine.

  • It seems like we ought to be able to call rake ts:index from a local Rails server and have it populate our realtime indexes on a remote server. However, TS also tries to check that searchd is running (via the pid file) and tries to rotate the index after it's done populating.

In my hacky experimentation I've been working around these with this rake file:

namespace :ts do
  task docker_configure: :environment do
    config = ThinkingSphinx::Configuration.instance
    # force the configuration to load the "docker" key out of thinking_sphinx.yml
    config.framework = ThinkingSphinx::Frameworks::Plain.new.tap { |f| f.environment = "docker"; f.root = Rails.root }
    # sphinx configuration is going to attempt to create some dirs that don't exist locally.  Ignore them.
    def FileUtils.mkdir_p(dir)
      puts "Ignoring request to mkdir_p('#{dir}')"
    end

    interface.configure
  end

  task docker_index: :environment do
    # hack to allow running ts:index against a remote sphinx service.
    ThinkingSphinx::Commander.registry[:running] = proc { puts "fake-sphinx running!" }
    interface.rt.index
  end
end

with this docker-compose:

version: '3'
services:
  sphinx:
    image: macbre/sphinxsearch:3.0.1
    ports:
      - "9306:9306"
    volumes:
      - ./config/docker.sphinx.conf:/opt/sphinx/conf/sphinx.conf
      - ./lib/dict:/opt/sphinx/lib/dict
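
For reference, the "docker" key that the docker_configure task forces the framework to load would live in config/thinking_sphinx.yml. A sketch with hypothetical values (the host name, port, and paths below are illustrative assumptions, not taken from this thread):

```yaml
# config/thinking_sphinx.yml (hypothetical values for illustration)
docker:
  # Talk to the searchd container over its published MySQL-protocol port:
  mysql41: 9306
  address: sphinx
  # Paths as they exist *inside* the container, not on the Rails host:
  indices_location: /opt/sphinx/indexes
  configuration_file: config/docker.sphinx.conf
```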

Any thoughts/plans on separating out some of the TS code that only works if you're running Rails & Sphinx side-by-side? Or am I doing it all wrong?

(Previous docker discussions at #1010)
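
As an aside (an editor's suggestion, not from the thread): redefining FileUtils.mkdir_p directly, as the rake file above does, permanently clobbers the method. Prepending a module to its singleton class achieves the same no-op while keeping the original reachable via super if it's ever needed again:

```ruby
require "fileutils"

# No-op directory creation, so config generation can run on a machine
# where the target directories only exist on the remote Sphinx server.
module RemoteConfigNoMkdir
  def mkdir_p(dir, **options)
    puts "Ignoring request to mkdir_p(#{dir.inspect})"
    dir
  end
end

FileUtils.singleton_class.prepend(RemoteConfigNoMkdir)

# This now neither raises nor creates anything locally:
FileUtils.mkdir_p("/opt/sphinx/indexes")
```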

pat (Owner) commented Apr 10, 2019

Second issue first: definitely sounds like something that should be fixed, probably via an environment variable. I'll look into it soon :)

As for the first: you should only be running the configure task on the Sphinx server - there's no value having it occur on the client servers. However, if this is only a problem due to the index task running configure automatically, you can use INDEX_ONLY=true. That said, would it make sense to just run all the TS tasks only on your Sphinx server?

jdelStrother (Contributor, Author) commented, quoting pat:

> you should only be running the configure task on the Sphinx server - there's no value having it occur on the client servers. [....] would it make sense to just run all the TS tasks only on your Sphinx server?

The thing I'm trying to get away from is that we have a big monolithic Rails app with a lot of dependencies (both gems, and compiled libraries like ImageMagick). So right now our Sphinx server needs to have all those irrelevant dependencies installed just so that we can generate a sphinx config file.
(Admittedly, this approach of generating the config file on a Rails server and then shipping it over to the Sphinx server has plenty of drawbacks of its own.)

pat added a commit that referenced this issue Apr 14, 2019
This is for situations where you're rebuilding a real-time index remotely, and so don't have access to the PID file on the Sphinx server. See #1131.
pat added a commit that referenced this issue Apr 14, 2019
This is useful in Docker environments where you may want to generate the configuration file on your Rails server machine and immediately transfer it over to a dedicated Sphinx machine (which has no Rails context). See #1131.
pat (Owner) commented Apr 14, 2019

Just pushed some commits to the develop branch which add two boolean settings (which can be turned on per-environment in config/thinking_sphinx.yml): skip_directory_creation and skip_running_check. This should remove the need for your monkey patches, but I'd appreciate confirmation once you've tested it! :)
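
For reference, enabling those two settings per environment would look something like this in config/thinking_sphinx.yml (the "docker" environment name here is illustrative):

```yaml
# config/thinking_sphinx.yml
docker:
  # Don't mkdir_p index/log directories that only exist on the Sphinx machine:
  skip_directory_creation: true
  # Don't consult the (remote, inaccessible) searchd PID file before indexing:
  skip_running_check: true
```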

jdelStrother (Contributor, Author):

Yep, both work great thanks 🎉

pat (Owner) commented Apr 16, 2019

Excellent! And that means all the Docker stuff's working well, without your Rails app on the Sphinx server?

jdelStrother (Contributor, Author):

Yep, my docker-searchd container seems to be working fine. It's basically just using the macbre/sphinxsearch image with my config file mounted into it.

xtrasimplicity:

Is there a release scheduled for this feature, @pat? This looks like it could solve some of the issues that have been making me put off de-monolithifying a project I've been working on. Thanks!

pat (Owner) commented May 14, 2019

There's no release just yet - there's a couple of outstanding issues I want to tackle first - but it's on my radar. With a bit of luck I'll have something out early next week 🤞

pat (Owner) commented May 18, 2019

These settings are now part of the newly released v4.3.0 🎉

pat closed this as completed May 18, 2019

alexanderadam:

@jdelStrother / @xtrasimplicity, did either of you experience significant performance issues on index creation?
We're trying the setup mentioned above with @macbre's Sphinx container and the referenced Thinking Sphinx settings skip_running_check and skip_directory_creation, but rake ts:rebuild does indeed take ages.

Does anyone have an idea what the reason could be, or are there any tweaks or other suggestions?

Thank you in advance!

xtrasimplicity:

@alexanderadam, we haven't had any major performance issues, but our database is quite small, and a few minutes at startup isn't a huge issue for us, as searching is only a tiny part of our application's functionality.

You could try increasing the size of /dev/shm from 64MB to something a bit higher, but I'm not sure if that will have any performance benefits for TS.

jdelStrother (Contributor, Author) commented Jun 22, 2020

@alexanderadam Our rebuilds are pretty slow by default, on a database with something like 5 million documents. We've monkeypatched it with an alternative approach:

class ThinkingSphinx::RealTime::Populator
  def populate(&block)
    instrument "start_populating"

    # Optionally stop early after RT_BATCH_LIMIT batches
    limit = ENV["RT_BATCH_LIMIT"]
    count = 0
    scope.find_in_batches(batch_size: batch_size) do |instances|
      break if limit && (count += 1) > limit.to_i
      transcriber.copy(*instances)
      instrument "populated", instances: instances
    end

    instrument "finish_populating"
  end
end

When we need to rebuild a Sphinx index, we'll run, e.g.:

bin/rake ts:rebuild INDEX_FILTER=posts_rt_core RT_BATCH_LIMIT=1000

just to get sphinx back up-and-running with a few documents, and then incrementally populate it with something like this:

Post.find_each do |post|
  ThinkingSphinx::RealTime.callback_for(:post).after_save(post)
end
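
The RT_BATCH_LIMIT early-exit in the Populator patch above can be sketched in isolation. This is a plain-Ruby stand-in (an editor's illustration, not ThinkingSphinx code) that uses each_slice in place of ActiveRecord's find_in_batches, just to show the control flow:

```ruby
# Stand-in for the batch-limited populate loop: process at most
# `limit` batches of `batch_size` records, then stop early.
def populate_batches(records, batch_size:, limit: nil)
  copied = []
  count = 0
  records.each_slice(batch_size) do |batch|
    break if limit && (count += 1) > limit
    copied.concat(batch) # stands in for transcriber.copy(*batch)
  end
  copied
end

populate_batches((1..10).to_a, batch_size: 3, limit: 2)
# => [1, 2, 3, 4, 5, 6] -- two batches of three, then the loop stops
```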
