KIAM Preprint № 20, Moscow, 2019
Authors: Kitaev E. L., Skornyakova R. Y.
Scraping on the fly of external web resources, driven by HTML page markup
Abstract:
The paper presents an approach to displaying data from cross-origin resources on web pages using a REST API, and describes a tool based on this approach. The tool extracts and displays on a web page the metadata of HTML documents, PDF files, and Word documents posted on the Internet, as well as microdata and data in JSON-LD format. It comprises a REST API hosted on the IIS web server and a set of JavaScript scripts. Examples of its use are given. The REST API enables cross-origin resource sharing (CORS) and can be called from web pages of any origin.
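To illustrate the kind of extraction the abstract describes, the following is a minimal sketch in Python, not the authors' implementation (their tool is a REST API on IIS plus JavaScript scripts). It assumes the HTML has already been fetched and pulls out the page title, `<meta>` name/property pairs, and JSON-LD blocks. The names `MetadataExtractor`, `extract_metadata`, `cors_headers`, and `SAMPLE_HTML` are hypothetical; the wildcard `Access-Control-Allow-Origin` value is an assumption consistent with the statement that the API can be called from any origin.

```python
import json
from html.parser import HTMLParser


class MetadataExtractor(HTMLParser):
    """Collects the <title>, <meta> name/property pairs, and JSON-LD blocks."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.json_ld = []
        self._in_title = False
        self._in_json_ld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Plain metadata uses "name"; Open Graph style markup uses "property".
            key = attrs.get("name") or attrs.get("property")
            if key and attrs.get("content") is not None:
                self.meta[key] = attrs["content"]
        elif tag == "script":
            if (attrs.get("type") or "").lower() == "application/ld+json":
                self._in_json_ld = True
                self._buffer = []

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.json_ld.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Skip malformed JSON-LD blocks instead of failing.


def extract_metadata(html):
    """Return the title, meta tags, and JSON-LD objects found in an HTML string."""
    parser = MetadataExtractor()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "meta": parser.meta,
        "json_ld": parser.json_ld,
    }


def cors_headers():
    """Response headers a service could send to allow requests from any origin.

    Assumption: a wildcard origin; the source only says any origin may call the API.
    """
    return {
        "Access-Control-Allow-Origin": "*",
        "Content-Type": "application/json; charset=utf-8",
    }


# Hypothetical sample document used only for this illustration.
SAMPLE_HTML = """<html><head>
<title>Example page</title>
<meta name="description" content="A sample document">
<meta property="og:type" content="article">
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article", "headline": "Example"}
</script>
</head><body></body></html>"""

result = extract_metadata(SAMPLE_HTML)
print(json.dumps(result, indent=2))
```

In a service along these lines, the dictionary returned by `extract_metadata` would be serialized as the JSON response body and sent together with the headers from `cors_headers`, so that a script on a page of another origin can read it. Microdata, PDF, and Word metadata are outside the scope of this sketch.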
Keywords:
web scraping, semantic markup, microdata, JSON-LD, REST API, CORS
Publication language: Russian, pages: 31
Research direction:
Programming, parallel computing, multimedia
About authors:
  • Kitaev Evgeny L'vovich, kitaev@keldysh.ru, orcid.org/0000-0002-0938-2610, KIAM RAS
  • Skornyakova Rimma Yuryevna, rimmaskorn@gmail.com, orcid.org/0000-0001-7372-3574, KIAM RAS