Content Curation World
What a Content Curator Needs To Know: How, Tools, Issues and Strategy
Curated by Robin Good
Author: Robin Good

Scrape Structured Data Easily from Any Web Page with TheWebMiner



Robin Good's insight:



TheWebMiner is an experimental web-based scraping tool capable of rapidly capturing any amount of structured data from any web page (for example, a Google search results page). Once captured, the data can be saved in SQL, CSV, XML, JSON and Excel file formats.

TheWebMiner is very easy to use, as you only need to provide the URL of the reference page and text samples of the structured data units that you would like to capture.
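
The "give it a sample, get the whole list" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how TheWebMiner works internally: it assumes that every value of a field sits in an element with the same tag and class as the sample you supplied. The page content, the field names and the helper names (TextCollector, extract_by_samples, to_csv, to_json) are all made up for the example.

```python
import csv
import io
import json
from html.parser import HTMLParser

# Elements that never have a closing tag; they are kept off the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class TextCollector(HTMLParser):
    """Record (tag, class, text) for every non-empty text node in a page."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open elements as (tag, class) pairs
        self.nodes = []   # (tag, class, text) triples in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append((tag, dict(attrs).get("class") or ""))

    def handle_endtag(self, tag):
        # Close the nearest matching open element and anything nested in it.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            tag, cls = self.stack[-1]
            self.nodes.append((tag, cls, text))


def extract_by_samples(html, samples):
    """samples maps a field name to one example value visible on the page."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    columns = {}
    for field, sample in samples.items():
        # Assumption: the sample's (tag, class) pair identifies the whole field.
        signature = next(
            ((t, c) for t, c, text in collector.nodes if text == sample), None
        )
        if signature is None:
            raise ValueError(f"sample {sample!r} not found on the page")
        columns[field] = [
            text for t, c, text in collector.nodes if (t, c) == signature
        ]
    # Pair the columns up position by position to form one record per item.
    return [dict(zip(columns, values)) for values in zip(*columns.values())]


def to_csv(rows):
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    return json.dumps(rows, indent=2)


# Made-up sample page standing in for a real product listing.
PAGE = """
<ul>
  <li><span class="name">Espresso machine</span> <span class="price">$120</span></li>
  <li><span class="name">Coffee grinder</span> <span class="price">$45</span></li>
  <li><span class="name">Milk frother</span> <span class="price">$19</span></li>
</ul>
"""

rows = extract_by_samples(PAGE, {"name": "Coffee grinder", "price": "$45"})
print(to_csv(rows))
print(to_json(rows))
```

One sample per field is enough here because the made-up page is perfectly regular. Real pages are messier, which is presumably where a dedicated tool earns its keep.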


The service is free until January 5, 2013.


Pricing thereafter: http://thewebminer.com/priceing


Read more about it: http://extract-web-data.com/thewebminer-a-cloud-scraping-tool/



Try it out now here: http://thewebminer.com/demo (watch the video first)


More info: http://thewebminer.com/









Therese Torris's curator insight, December 28, 2012 6:55 AM

Structured data scraping/crawling/extraction is a major step up from the simpler text and image extraction now familiar from generalist content curation tools like Scoop.it. It's about recognizing and extracting specific types of structured data such as product names, product prices, taxes, brand names, etc.

 

Structured data extraction used to be the privilege of a few high-tech companies like Google.

 

It looks like it will soon be everybody's tool!


Capturing Information From Online Sources: Scraping for Journalists by Paul Bradshaw [eBook]


Robin Good: Here's a new book for journalists interested in learning how to capture information from any web page or online resource, even when whoever publishes it has not made that content available to everyone.


The subtitle of the book is: "How to grab data from hundreds of sources, put it in a form you can interrogate - and still hit deadlines"


From the official book site: "Scraping - getting a computer to capture information from online sources - is one of the most powerful techniques for data-savvy journalists who want to get to the story first, or find exclusives that no one else has spotted.


Faster than FOI and more detailed than advanced search techniques, scraping also allows you to grab data that organisations would rather you didn’t have - and put it into a form that allows you to get answers.


Scraping for Journalists introduces you to a range of scraping techniques - from very simple scraping techniques which are no more complicated than a spreadsheet formula, to more complex challenges such as scraping databases or hundreds of documents.


At every stage you'll see results - but you'll also be building towards more ambitious and powerful tools."
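
To give a feel for the "no more complicated than a spreadsheet formula" end of that range, here is a rough Python sketch of a one-call table import, in the spirit of the import-a-table formulas some spreadsheets offer. It is not taken from the book: the function name import_table, its 1-based index argument and the sample page are assumptions made for illustration, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect every <table> in a page as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per table found so far
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def import_table(html, index):
    """Return table number `index` (counting from 1) as a list of rows."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.tables[index - 1]


# Made-up sample page; a real script would download the HTML first.
PAGE = """
<table>
  <tr><th>Council</th><th>Spend (GBP)</th></tr>
  <tr><td>Northtown</td><td>1,200</td></tr>
  <tr><td>Southville</td><td>950</td></tr>
</table>
"""

for row in import_table(PAGE, 1):
    print(row)
```

The sample HTML is inline so the sketch runs without a network connection; in practice the page could be fetched with the standard library's urllib.request.urlopen before being passed to import_table.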


Paul Bradshaw runs the MA in Online Journalism at Birmingham City University, and is a Visiting Professor at City University’s School of Journalism in London. He publishes the Online Journalism Blog, and is the founder of investigative journalism website HelpMeInvestigate.


Price: $9.99


Buy the book or find out more: https://leanpub.com/scrapingforjournalists


