
Data scraping in Ruby on Rails using Nokogiri and Mechanize Gem

What is data scraping?
Web scraping (or data scraping) is a technique for extracting large amounts of data from websites. The extracted data can be displayed on your own site or stored in a file or database. It is mainly used when a website does not provide an API.
Some applications do not provide an API for collecting records; in those cases, data scraping is used instead.
The data can be scraped using the Nokogiri gem.
The required steps are:
  • Add the gem to the Gemfile: gem 'nokogiri', '~> 1.8', '>= 1.8.1'
  • Then run bundle install.
  • Add require 'nokogiri' and require 'open-uri' to the file where you will write the scraping code.
The controller for the page will look like the following:
The view for the page will look like:
The result in our application will look like:

Mechanize Gem in Rails
The Mechanize library is used for automating interaction with websites. Mechanize automatically stores and sends cookies, follows redirects, and can follow links and submit forms. Form fields can be populated and submitted. Mechanize also keeps track of the sites that you have visited as a history.
For the site above, I used the Mechanize gem to scrape the data and search for records.
We have the following sample application running locally:
The steps required are:
  • Add the gem to the Gemfile: gem 'mechanize', '~> 2.7', '>= 2.7.5'
  • Then run bundle install.
  • Add require 'mechanize' in the controller.
The controller code to scrape the data using the Mechanize gem for a search:
The output of the above scraping, as seen on the console:
With the Mechanize gem we can select a radio button as shown below:
staff_data.page.forms[0].radiobutton_with(:id => "First_drop").check
We can list all the input fields in the form:
staff_data.page.forms[0].fields
We can also set the value of a drop-down on the page:
staff_data.page.forms[0].field_with(:id => "country").value = "Single"
We can find a specific form on the page:
form = staff_data.page.form_with(:id => "search_form")
We can find a button in that form:
button = form.button_with(:value => "Search")
Mechanize pages provide the link_with method, which makes it simple to find a link by its text, here the "Random article" link:
link = staff_data.page.link_with(text: 'Random article')
The click method instructs Mechanize to follow the link:
page = link.click
We can also read the page title of the site:
staff_data.page.title
 
Source: Data Scraping in Ruby on Rails 
