The approach used in this project mines emerging news from a collection of news items. The process relies on online RSS news feeds to gather emerging news from leading media platforms such as CNN and Reuters. RSS is of significant importance in forming the dataset on which the data mining process will run. Because RSS uses an XML-based format, news items can be stored in a semi-structured form from which emerging news can be derived. Despite the availability of online aggregators, sorting emerging news out of this dataset is not easy. Forming the dataset is also complex because the feeds publish information at intervals, so the mining process must cope with data that arrives over time and still lead to the desired results.
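To illustrate how an XML-based feed becomes a semi-structured dataset, the following is a minimal sketch using Python's standard library. The sample feed, its URLs and dates, and the function name parse_rss_items are assumptions made for illustration; real feeds from CNN or Reuters carry more fields and would be fetched over the network.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed used only for illustration.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example World News</title>
    <item>
      <title>The ongoing Syrian conflict has displaced millions</title>
      <link>https://example.com/news/1</link>
      <pubDate>Mon, 06 Jan 2014 09:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Markets rally on trade news</title>
      <link>https://example.com/news/2</link>
      <pubDate>Mon, 06 Jan 2014 10:30:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Return one dict per <item> with its title, link, and publication date."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            "published": item.findtext("pubDate", default="").strip(),
        })
    return items


records = parse_rss_items(SAMPLE_RSS)
for record in records:
    print(record["published"], "|", record["title"])
```

Each record keeps only the fields needed for later mining, which is one plausible reading of the semi-structured format described above.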
Data input is also of paramount importance in our approach. To subscribe to an online RSS news feed, we need a news aggregator, also called a feed reader. A feed reader makes it possible to subscribe to and view many news feeds at once, which makes the experiment worthwhile. The reader retrieves news updates automatically, so each update is delivered soon after it is published. To make the process more effective, we will use a web-based feed reader that is compatible with our browser, which should improve the efficiency of extracting emerging news. The feed reader also lets us receive only the news we require, and only in a structured format.
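Because a feed reader polls the same feed repeatedly, each retrieval usually returns items that were already seen. The sketch below shows one way to keep only the new ones. It assumes each item is a dictionary with a unique "link" field; the function name merge_new_items and the sample items are hypothetical.

```python
def merge_new_items(store, fetched_items):
    """Add unseen items to store (keyed by link) and return only the new ones."""
    new_items = []
    for item in fetched_items:
        key = item["link"]
        if key not in store:
            store[key] = item
            new_items.append(item)
    return new_items


# Two simulated polls of the same feed; the second repeats the first story.
store = {}
first_poll = [{"link": "https://example.com/news/1", "title": "Story A"}]
second_poll = [
    {"link": "https://example.com/news/1", "title": "Story A"},
    {"link": "https://example.com/news/2", "title": "Story B"},
]

added_first = merge_new_items(store, first_poll)
added_second = merge_new_items(store, second_poll)
print([item["title"] for item in added_second])
```

Keying on the link is an assumption; a feed that supplies a guid element for each item could use that instead.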
All news received by the news reader is stored in a semi-structured database. The RSS feed helps sort the items into different sets for the purpose of pre-processing. This is an experimental process that uses keywords to derive the ID of each news item. Sorting is driven by the hottest news headline received by our browser; in this case, the process extracts news regarding the current situation in Syria. To get at the essential content, low-information words are dropped. For example, removing "the", "ongoing", and "has" from the headline "the ongoing Syrian conflict has displaced millions" leaves the words "Syrian", "conflict", "displaced", and "millions". This makes it easier to extract the news item with the text mining engine, which determines the frequency of news items.
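The keyword step can be sketched as a simple stop-word filter. The stop-word list here is an assumption chosen so that the Syrian headline reduces to the four keywords given above; a real system would use a much larger list, and the function name extract_keywords is hypothetical.

```python
import re

# Assumed stop-word list; "ongoing" is included only to match the example above.
STOP_WORDS = {"the", "on", "has", "ongoing", "a", "an", "of", "in", "to", "and", "is"}


def extract_keywords(headline):
    """Split a headline into words and drop those found in STOP_WORDS."""
    tokens = re.findall(r"[A-Za-z]+", headline)
    return [token for token in tokens if token.lower() not in STOP_WORDS]


keywords = extract_keywords("The ongoing Syrian conflict has displaced millions")
print(keywords)  # ['Syrian', 'conflict', 'displaced', 'millions']
```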
The text mining engine is also a significant tool in the analysis, and it determines the parameter settings of our approach. After sentence splitting, the next step in the experiment is tokenization. This is the stage that generates the hot news items targeted by the mining process. Once hot news items have been generated, we set the period over which the frequency of a news item is measured, for example three hours. With a fixed period of three hours, one can determine which algorithms apply to a given frequent pattern. On this basis, it is possible to sort out the frequent items generated by the text mining engine. In essence, the approach extracts emerging news from an XML-based RSS news feed through a text mining engine.
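The idea of a frequency measured over a three-hour period can be sketched as follows. This is plain support counting with a threshold, not a full frequent-pattern algorithm, and it is not taken from the project itself. The function name hot_terms, the min_count parameter, and the sample items and timestamps are all assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta


def hot_terms(items, window_end, window=timedelta(hours=3), min_count=2):
    """Return terms that appear in at least min_count items inside the window."""
    window_start = window_end - window
    counts = Counter()
    for published, keywords in items:
        if window_start < published <= window_end:
            # Count each term once per item, regardless of repeats within it.
            counts.update({keyword.lower() for keyword in keywords})
    return {term: n for term, n in counts.items() if n >= min_count}


# Hypothetical items: (publication time, keywords after stop-word removal).
now = datetime(2014, 1, 6, 12, 0)
items = [
    (datetime(2014, 1, 6, 11, 30), ["Syrian", "conflict", "displaced", "millions"]),
    (datetime(2014, 1, 6, 10, 15), ["Syrian", "refugees", "border"]),
    (datetime(2014, 1, 6, 9, 30), ["conflict", "talks", "Geneva"]),
    (datetime(2014, 1, 6, 7, 0), ["Syrian", "conflict"]),  # older than three hours
]

result = hot_terms(items, now)
print(result)
```

With these sample items, only "syrian" and "conflict" reach the threshold of two items within the three-hour period; the item from 07:00 is ignored because it falls outside the window.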
Conclusion
The objective of this research paper is to give an in-depth analysis of text mining. As mentioned earlier, the modern world has experienced advances in technology that make more and more data available in digital form. Increased globalization has created demand for emerging news from all parts of the world. With most of this news in unstructured textual form, it is imperative that we design better techniques for extracting emerging and interesting news from bulk textual data. This calls for extensive data pre-processing and post-processing so that the emerging news can be used in the best interest of the community. The data mining process is nevertheless not easy and presents a significant number of challenges. However, with a good approach, a comprehensive text mining process can be completed. To sum up, it is possible to identify hot items that occur more often than a given frequency threshold in dynamic datasets.