Research on Tibetan News Sites’ Web Crawler and Search Engine

Zhiqiang Han; Guixian Xu; Wei Sun

doi:10.2991/lemcs-15.2015.116


Corresponding Author

Zhiqiang Han

Available Online July 2015.

DOI: 10.2991/lemcs-15.2015.116
Keywords: Tibetan; News sites; Web crawler; Solr; Search engine.
Abstract: This paper describes the features of the Tibetan language and the technologies we use to process Tibetan news web pages. To obtain the news content, which is the basis of this project, we download Tibetan news pages with a web crawler. We use the open source crawler Scrapy and rewrote its crawl component to make crawling more accurate and efficient. To support search over the Tibetan content, we define and compute the statistics that help improve search engine performance, and we use Solr, another open source package, as the user interface of the system. The crawler and the search engine are connected through the downloaded web pages to provide a data retrieval service. Compared with other work, our system adopts a safe and sufficiently stable framework to improve the user experience of Tibetan search. The work contributes to the spread of Tibetan culture and promotes the development of Tibetan-language news search.
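
The abstract describes a pipeline in which crawled pages feed a Solr index. The following is a minimal sketch of the hand-off between the two, not the authors' implementation. It uses only the Python standard library in place of Scrapy. The field names (`id`, `title`, `content`, `syllable_count`), the helper names, and the choice of syllable count as the statistic are all assumptions made for illustration. Syllables are counted by splitting on the Tibetan tsheg (U+0F0B) and shad (U+0F0D) marks.

```python
import json
from html.parser import HTMLParser

TSHEG = "\u0f0b"  # Tibetan intersyllabic mark
SHAD = "\u0f0d"   # Tibetan clause-ending mark


class NewsPageParser(HTMLParser):
    """Collect the <title> text and the text of every <p> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._in_p = False
        self.title = ""
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "p":
            self._in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_p:
            self.paragraphs[-1] += data


def count_syllables(text):
    """Count syllables by treating tsheg and shad as delimiters."""
    normalized = text.replace(SHAD, TSHEG)
    return sum(1 for part in normalized.split(TSHEG) if part.strip())


def build_solr_doc(url, html):
    """Turn one crawled page into a dict shaped like a Solr document.

    The schema used here is hypothetical; a real core defines its own fields.
    """
    parser = NewsPageParser()
    parser.feed(html)
    parser.close()
    body = "\n".join(p.strip() for p in parser.paragraphs if p.strip())
    return {
        "id": url,
        "title": parser.title.strip(),
        "content": body,
        "syllable_count": count_syllables(body),
    }


def to_solr_update_payload(docs):
    """Serialize documents as a JSON array for Solr's JSON update handler."""
    return json.dumps(docs, ensure_ascii=False)
```

In a real deployment the payload would be sent by HTTP POST to the update endpoint of a Solr core, and the parsing step would live in a Scrapy spider callback. Both are left out here to keep the sketch self-contained.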
Copyright: © 2015, the Authors. Published by Atlantis Press.
Open Access: This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).


Volume Title: Proceedings of the International Conference on Logistics, Engineering, Management and Computer Science
Series: Advances in Intelligent Systems Research
Publication Date: July 2015
ISBN: 978-94-6252-102-5
ISSN: 1951-6851

Cite this article

TY  - CONF
AU  - Zhiqiang Han
AU  - Guixian Xu
AU  - Wei Sun
PY  - 2015/07
DA  - 2015/07
TI  - Research on Tibetan News Sites’ Web Crawler and Search Engine
BT  - Proceedings of the International Conference on Logistics, Engineering, Management and Computer Science
PB  - Atlantis Press
SP  - 607
EP  - 611
SN  - 1951-6851
UR  - https://doi.org/10.2991/lemcs-15.2015.116
DO  - 10.2991/lemcs-15.2015.116
ID  - Han2015/07
ER  -