Data in the form of documents is growing exponentially. This is evident in the era of big data, where data is voluminous, varied, and streamed continuously. Document streaming refers to the setting in which documents arrive at a server continuously from different sources. Querying such streams helps obtain the latest information with high coverage. Social networking websites such as Twitter generate text documents continuously. Processing such continuously streaming data is challenging, but it offers many benefits. To understand the dynamics of document streaming, we reviewed the literature, which revealed several existing methods. Various filtering algorithms are explored in [1-3], and top-k queries are studied in [5, 6, 7, 15, 17].
Top-k queries on document streams are addressed in [22], where reverse ordering techniques such as RIO and MRIO are explored. In contrast, we treat the continuous monitoring of top-k queries on document streams as an optimization problem and propose a method known as Adaptive Identifier Ordering (AIO), which is adaptive in nature and suited to continuous monitoring of document streams. Our contributions in this paper are as follows.
1. We propose a new algorithm, Adaptive Identifier Ordering (AIO), for continuous monitoring of top-k queries on document streams. The algorithm is adaptive in nature and was found to be effective for answering top-k queries on document streams.
2. We built a prototype application to demonstrate proof of concept. The application is a web client with an intuitive interface from which users issue queries, while the business logic runs on the server, which responds to those queries.
3. We evaluated the proposed algorithm and found that it performs better than other state-of-the-art algorithms.
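To make the problem setting concrete, the following is a minimal sketch of a continuous top-k query over a document stream. It is a naive baseline for illustration only and is not AIO, RIO, or MRIO. The class name `ContinuousTopK`, the term-count scoring function, and the sample stream are all assumptions introduced here; the sketch also ignores document expiry (for example, a sliding window) and assumes lowercase query terms.

```python
import heapq
from collections import Counter


class ContinuousTopK:
    """Naive continuous top-k monitor for one keyword query (illustration only)."""

    def __init__(self, query_terms, k):
        self.query_terms = set(query_terms)
        self.k = k
        # Min-heap of (score, doc_id); the root is the weakest current result.
        self._heap = []

    def score(self, text):
        # Hypothetical relevance: number of query-term occurrences in the text.
        counts = Counter(text.lower().split())
        return sum(counts[t] for t in self.query_terms)

    def process(self, doc_id, text):
        """Consume one arriving document; return True if the result changed."""
        s = self.score(text)
        if s == 0:
            return False  # irrelevant documents never enter the result
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, (s, doc_id))
            return True
        if s > self._heap[0][0]:
            # The new document beats the weakest result, so it replaces it.
            heapq.heapreplace(self._heap, (s, doc_id))
            return True
        return False

    def results(self):
        # Best result first.
        return sorted(self._heap, reverse=True)


# Made-up stream of (doc_id, text) pairs arriving in order.
stream = [
    (1, "big data streams"),
    (2, "top-k queries on document streams"),
    (3, "weather report"),
    (4, "document streams and document queries"),
]
monitor = ContinuousTopK(["document", "streams"], k=2)
for doc_id, text in stream:
    monitor.process(doc_id, text)
top = monitor.results()
```

The sketch scores every arriving document against the query, which is the per-document cost that ordering-based methods aim to reduce when many queries are registered at once.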
The remainder of the paper is structured as follows. Section 2 reviews the literature. Section 3 presents the proposed system. Section 4 presents implementation details, while Section 5 covers the experimental results. Section 6 concludes the paper and provides directions for future work.