Crowdsourced Dataset to Study the Generation and Impact of Text Highlighting in Classification Tasks


Text classification is one of the most common applications of machine learning, and it is a frequent task on crowdsourcing platforms. Studies suggest that combining machine learning with crowdsourcing yields better results while reducing the overall cost of a project.

One important way to combine machine and crowd effort is to design algorithms that process the text first and then pass their output to crowd workers, who can use it to classify documents faster. This article describes how text highlights are generated and how they affect document classification.

Introduction to the crowdsourced dataset:

A group of researchers recently ran experiments to evaluate the generation and impact of text highlighting. The work involved two main tasks. In the first, workers classified documents according to their relevance to a question and highlighted the parts of the text that supported their decision. In the second, the relevance of the documents was assessed under six machine-generated and six human-generated text highlighting conditions. The dataset came from two application domains: product reviews and systematic literature reviews. It covers three document sizes and three relevance questions of differing difficulty. In total it contains 27711 individual judgments from 1851 workers, which can help not only with these specific questions but also with the larger class of classification problems where datasets of individual crowdsourced judgments are scarce.

Approach for text highlighting in classification tasks:

The crowdsourcing experiments were built around a two-step pipeline: the first step highlights the most relevant text, and the second step performs document classification based on the highlighted text. To improve classification accuracy while reducing the overall cost of the process, the researchers combined crowd effort with machine learning algorithms. For the crowd-generated highlights, workers were asked to classify documents and to justify their decisions by highlighting specific text or passages in the document. The machine-generated highlights, on the other hand, came from state-of-the-art question-answering and extractive summarization models. Two field experts judged the quality of the highlights obtained from both processes.
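The two-step pipeline can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names `highlight` and `classify`, the small stopword list, and the word-overlap scoring are hypothetical stand-ins, not the question-answering or summarization models used in the experiments.

```python
import re

# Hypothetical stopword list, kept tiny on purpose for the example.
STOPWORDS = {"the", "a", "an", "is", "does", "and", "of", "to"}

def tokenize(text):
    """Lowercase a string and return its set of non-stopword tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower())) - STOPWORDS

def highlight(document, question, top_k=1):
    """Step 1: pick the top_k sentences sharing the most words with the question."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    q_tokens = tokenize(question)
    ranked = sorted(sentences, key=lambda s: len(tokenize(s) & q_tokens), reverse=True)
    return ranked[:top_k]

def classify(highlights, question, min_overlap=2):
    """Step 2: call the document relevant if any highlight overlaps enough with the question."""
    q_tokens = tokenize(question)
    return any(len(tokenize(h) & q_tokens) >= min_overlap for h in highlights)

review = ("The phone arrived quickly. "
          "The battery life is excellent and lasts two days. "
          "The box was blue.")
question = "Does the review discuss battery life?"

highlights = highlight(review, question)
print(highlights)
print(classify(highlights, question))
```

In the actual study the second step was carried out by crowd workers looking at the highlighted text; the `classify` function above only marks where that decision sits in the pipeline.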

The results and impact of the technique were analyzed in three experiments. In the first experiment, workers were asked to classify documents and to support their decisions by highlighting text. The resulting crowdsourced highlights were grouped into six experimental conditions. The baseline condition shows no highlighted text, whereas the 100%, 66%, 33%, and 0% conditions present highlights of varying quality. The researchers then aggregated the conditions to obtain votes for the respective crowdsourcing tasks.
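The article does not say which aggregation rule was used, so the following is only a sketch of one common choice, majority voting over individual worker judgments. The tuple layout `(doc_id, worker_id, label)` and the label strings are assumptions made for the example.

```python
from collections import Counter, defaultdict

def aggregate_votes(judgments):
    """Return the majority label per document; ties resolve to None."""
    by_doc = defaultdict(list)
    for doc_id, _worker_id, label in judgments:
        by_doc[doc_id].append(label)

    result = {}
    for doc_id, labels in by_doc.items():
        counts = Counter(labels).most_common()
        # A tie between the two most frequent labels is left undecided.
        if len(counts) > 1 and counts[0][1] == counts[1][1]:
            result[doc_id] = None
        else:
            result[doc_id] = counts[0][0]
    return result

# Hypothetical judgments: (doc_id, worker_id, label)
judgments = [
    ("d1", "w1", "relevant"),
    ("d1", "w2", "relevant"),
    ("d1", "w3", "not_relevant"),
    ("d2", "w1", "relevant"),
    ("d2", "w4", "not_relevant"),
]
print(aggregate_votes(judgments))
```

Leaving ties as `None` keeps undecided documents visible, so they can be sent back for more judgments rather than silently assigned a label.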

The second experiment focused on longer pages and documents, using 3×12 and 6×6 layouts along with crowd-generated highlights. It had two conditions: the baseline and the 83% quality condition. In the third experiment, the researchers turned to a machine learning-based approach in which a 3×6 layout was evaluated under six experimental conditions: baseline, 100% ML, AggrML, Bert-QA, Refresh, and BertSum. Refresh and BertSum are extractive summarization techniques, whereas Bert-QA is a question-answering model. AggrML aggregates the outcomes of all three algorithms, whereas 100% ML uses only the machine-generated highlights that were of high quality.
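How AggrML combines the three models is not described here, so the sketch below shows just one plausible strategy: take the union of the character spans each model proposes and merge the ones that overlap. The span format `(start, end)` and the function names are assumptions for illustration, not the researchers' implementation.

```python
def merge_spans(spans):
    """Merge overlapping or touching (start, end) character spans."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            # Extend the previous span instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def aggregate_highlights(model_spans):
    """Union the spans proposed by every model, then merge overlaps."""
    all_spans = [span for spans in model_spans.values() for span in spans]
    return merge_spans(all_spans)

# Hypothetical spans proposed by each model for one document.
model_spans = {
    "Bert-QA": [(10, 25)],
    "Refresh": [(20, 40), (60, 70)],
    "BertSum": [(65, 80)],
}
print(aggregate_highlights(model_spans))
```

A union favors coverage over precision; a stricter variant could keep only the text that at least two models agree on.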

Future scope and possibilities:

It is important to note that the dataset used in these experiments covers only a limited set of dimensions, so it cannot be considered comprehensive. It was limited to two main types of classification tasks and to state-of-the-art algorithms. However, the approach could be extended to a variety of datasets, including crowdsourced pricing data, product reviews, and customer experiences with a certain brand.

Researchers are also working on additional hybrid approaches to get the best results from crowdsourced datasets. The main goal is to access and process multi-dimensional datasets without restrictions on their parameters. In addition, the initial relevance of the text must be decided carefully depending on the type of dataset, and the relevance questions may vary from dataset to dataset.

