2015年6月12日星期五

Searching Serialized Fields in Rails Using Postgres Arrays

In Rails, ActiveRecord allows you store and retrieve data structures like arrays and hashes in a single string or text-type database column through application-level serialization.



It is a common pattern with models having fields like tags or nicknames, where you want to be able to store an arbitrary list of strings:




class Post < ActiveRecord::Base serialize :tags end


This is all fine n' dandy until you're like, "I want to find all posts tagged with either 'hobbits' or 'gandalf'!" Well..



simply


The database only knows it has a string, not a comma-separated list of strings. It is possible to search application-level serialized fields, but it's pretty hacky and brittle:




class Post scope :including_all_tags, -> (tags) { where(matching_tag_query(tags, 'AND')) } scope :including_any_tags, -> (tags) { where(matching_tag_query(tags, 'OR')) } private def matching_tag_query(tags, condition_separator = 'OR') tags.map { |tag| "(tags LIKE '%#{tag}%')" }.join(" #{condition_separator} ") end end # In use Post.including_all_tags(['hobbits', 'gandalf']) Post.including_any_tags(['hobbits', 'gandalf'])


This approach builds a WHERE condition and matches against the given tags, which grows and becomes increasingly inefficient as the number of searched tags increases. Also, if you have tags like gandalf and gandalf-the-grey, you'd get false positives with the above example. While passable for simple cases, there's got to be a better way! Fortunately for us, along comes Postgres.



types



PostgreSQL Arrays



Postgres has built-in array-type columns, which we can take advantage of in our Rails 4.X apps with the postgres_ext gem. What does this mean for our Rails apps? If we can store arrays in a single database column, then our database and application have the same understanding of the structure of the data. This isn't possible with the out-of-the-box serialization model. Most importantly, we can use SQL to effectively search inside our array-type columns.



With PostgresExt installed, we can add an array: true flag to string/text columns:




class PostgresHaveMyBabiesMigration < Migration def change create_table :posts do |t| t.text :tags, array: true end end end


Now, the model is no longer concerned with serializing the :tags field:




class Post < ActiveRecord::Base # crickets end


Searching



As mentioned, one of the primary benefits of Postgres's array-type column is the enhanced searchability through array-specific SQL. The Postgres Extensions gem unlocks a number of array operators and functions for usage in SQL strings. Let's take a look at operators in both their SQL and Arel forms. If you're unfamiliar with Arel, check out this post where I expand on its usefulness.



Overlap Operator



The overlap operator (&&) is useful when you're comparing two arrays to see if they share one or more elements in common.



In SQL:




SELECT * FROM posts WHERE posts.tags && '{hobbits,gandalf}'


Utilizing Arel:




Post.where(Post.arel_table[:tags].overlap(['hobbits', 'gandalf'])) # or Post.where.overlap(tags: ['hobbits', 'gandalf'])


These queries would return any posts having either 'hobbit' or 'gandalf' as a tag.



Contains Operator



The contains operator (@>) is useful when you're looking for arrays that contain all the elements of another.



In SQL:




SELECT * FROM posts WHERE posts.tags @> '{hobbits,gandalf}'


Utilizing Arel:




Post.where(Post.arel_table[:tags].contains(['hobbits', 'gandalf'])) # or Post.where.contains(tags: ['hobbits', 'gandalf'])


These queries would return any posts having both the 'hobbits' and 'gandalf' tags.



mmm



Wrapping Up



Postgres is awesome. The array extensions are magical. Instead of application-level serialization, let Postgres handle array persistance for you. It unifies the understanding of array-type data between the database and application and simplifies searching.

没有评论:

发表评论