PROM05 My Research

The aim of this project is to design, develop, launch, test, and evaluate a web-based search tool capable of analysing and filtering online recipes according to the fourteen legally recognised allergen groups. By employing an automated system based on ethical web scraping techniques, the proposed tool will enable users to effortlessly locate recipes that align with their dietary restrictions. The final design will focus on being user-friendly, intuitive, and visually clear, ensuring an accessible experience, free from unnecessary distractions or interface clutter.

The subject:

The focus of the study is on the Information Age and Big Data, specifically examining Data Automation and Information Overload, with particular emphasis on how Information Overload affects user experience in terms of search efficiency, perceived usability, trust and transparency. In relation to this, the online search process and faceted search will be investigated as an effective means of tackling Information Overload.

In addition to the above, the study will also address allergen awareness and provide a basic overview of online recipe search tools already in use, as well as their practices for providing allergen information and labelling.

For the search tool to be effective, automation techniques will be employed during development, including web scraping, data collection methods, data clustering and automated ingredient detection, as well as dynamic search filtering. Research into these methods will therefore be carried out, alongside research into web interface design for users with dietary restrictions.
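
To make "ethical" concrete, the minimal Python sketch below checks a site's robots.txt and rate-limits requests before parsing a page with requests and BeautifulSoup (both listed under Resources below). The URL, user-agent string and CSS selector are hypothetical placeholders rather than a real target site.

```python
# Minimal sketch of a robots.txt-aware fetch; the URL, user-agent and CSS
# selector are placeholders, not a real target site.
import time
from urllib import robotparser
from urllib.parse import urlparse

import requests
from bs4 import BeautifulSoup

USER_AGENT = "AllergenRecipeResearchBot/0.1 (academic prototype)"

def allowed_by_robots(url: str) -> bool:
    """Check the site's robots.txt before requesting the page."""
    parsed = urlparse(url)
    rp = robotparser.RobotFileParser()
    rp.set_url(f"{parsed.scheme}://{parsed.netloc}/robots.txt")
    rp.read()
    return rp.can_fetch(USER_AGENT, url)

def fetch_recipe_page(url: str):
    """Politely fetch and parse a single recipe page."""
    if not allowed_by_robots(url):
        return None                       # respect the site's crawl rules
    response = requests.get(url, headers={"User-Agent": USER_AGENT}, timeout=10)
    response.raise_for_status()
    time.sleep(2)                         # rate limit between requests
    return BeautifulSoup(response.text, "html.parser")

# Usage (hypothetical URL and selector):
# soup = fetch_recipe_page("https://example.com/recipes/lemon-cake")
# if soup is not None:
#     ingredients = [li.get_text(strip=True) for li in soup.select("li.ingredient")]
```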

Research questions: 

1. Is there a need for an enhanced search tool to detect allergens in recipes available online?

2. Why is searching for recipes online often frustrating for users with specific dietary restrictions?

3. What is web scraping, and under what circumstances can it be considered unethical?

4. How can the ethical implementation of web scraping techniques within faceted search systems enhance user experience while ensuring data privacy and compliance with digital ethics standards?

To achieve the above aim and answer the research questions, the following objectives and methodological stages will be employed in the form of a mixed-methods study combining a prototype implementation and user evaluation:

  1. Literature review – exploring the context of food allergy awareness and labelling standards, big data and information overload, the functionality and limitations of existing recipe filtering and faceted search systems, and relevant web scraping techniques, data collection methods, text analysis processes, and the design of web-based search systems and user interfaces.
  2. System Design and Architecture – designing and developing a three-layer architecture consisting of a Data Layer, an Application Layer and a Presentation Layer.
  3. Data collection and web scraping from ethical sources (see the scraping sketch above).
  4. Data pre-processing and cleaning, including tokenisation, normalisation and ingredient mapping (sketched after this list).
  5. Allergen detection – developing a classification and filtering component using keyword matching and text analysis techniques (sketched after this list).
  6. Database development – creating a structured database using PostgreSQL (a schema sketch follows the list).
  7. Web Application development – developing an interactive and user-friendly web tool focusing on faceted search capabilities (a filtering sketch follows the list).
  8. Testing and evaluating – employing techniques such as a baseline comparison between traditional search and the web-scraping-enhanced tool, usability and satisfaction testing through participant interviews and surveys, quantitative measures such as the System Usability Scale and task success rate, and statistical analysis such as a paired t-test or Wilcoxon signed-rank test (sketched after this list).
  9. Ethical, Legal and Technical considerations – identifying potential technical challenges and evaluating the legal, professional, and ethical implications of using web scraping. Emphasis will be placed on compliance with website terms and conditions, data protection legislation, and ethical safeguards ensuring responsible data collection and system deployment.
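
As an illustration of objective 4, the sketch below tokenises and normalises raw ingredient strings and maps them to canonical names. The synonym table, unit list and example ingredients are illustrative assumptions only; the real lexicon would be far larger.

```python
# Minimal sketch of pre-processing (objective 4): tokenisation, normalisation
# and ingredient mapping. The synonym table is illustrative only.
import re

# Hypothetical mapping from raw ingredient variants to canonical names.
INGREDIENT_MAP = {
    "plain flour": "wheat flour",
    "self-raising flour": "wheat flour",
    "caster sugar": "sugar",
    "free-range eggs": "egg",
}

def normalise(text: str) -> str:
    """Lowercase, strip quantities and common units, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"\d+(\.\d+)?\s*(g|kg|ml|l|tbsp|tsp|cups?)?\b", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def map_ingredient(raw: str) -> str:
    """Map a normalised ingredient string to its canonical form."""
    cleaned = normalise(raw)
    return INGREDIENT_MAP.get(cleaned, cleaned)

ingredients = ["200g Plain Flour", "2 Free-range Eggs", "100 ml milk"]
print([map_ingredient(i) for i in ingredients])
# ['wheat flour', 'egg', 'milk']
```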
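The allergen-detection component in objective 5 can then work from the canonical ingredient names. The sketch below uses simple keyword matching against the fourteen legally recognised allergen groups; only a few example keywords per group are shown, and the keyword lists are assumptions rather than an authoritative lexicon.

```python
# Minimal sketch of allergen detection (objective 5): keyword matching of
# canonical ingredients against the 14 allergen groups. Keywords are examples.
ALLERGEN_KEYWORDS = {
    "cereals containing gluten": ["wheat", "barley", "rye", "oats", "spelt"],
    "eggs": ["egg"],
    "milk": ["milk", "butter", "cream", "cheese", "yoghurt"],
    "peanuts": ["peanut", "groundnut"],
    "tree nuts": ["almond", "hazelnut", "walnut", "cashew", "pistachio"],
    "soybeans": ["soy", "soya", "tofu"],
    "fish": ["cod", "salmon", "anchovy", "tuna"],
    "crustaceans": ["prawn", "shrimp", "crab", "lobster"],
    "molluscs": ["mussel", "oyster", "squid"],
    "celery": ["celery", "celeriac"],
    "mustard": ["mustard"],
    "sesame": ["sesame", "tahini"],
    "sulphur dioxide and sulphites": ["sulphite", "sulphur dioxide"],
    "lupin": ["lupin"],
}

def detect_allergens(ingredients):
    """Return the allergen groups whose keywords appear in the ingredient list."""
    found = set()
    for ingredient in ingredients:
        for group, keywords in ALLERGEN_KEYWORDS.items():
            if any(keyword in ingredient for keyword in keywords):
                found.add(group)
    return found

print(detect_allergens(["wheat flour", "egg", "milk"]))
# {'cereals containing gluten', 'eggs', 'milk'}  (set order may vary)
```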
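For objective 6, a deliberately simplified schema sketch is shown below using psycopg2; the connection details are placeholders and the final schema will be refined during design.

```python
# Minimal sketch of the PostgreSQL schema (objective 6); connection details
# are placeholders and the schema is deliberately simplified.
import psycopg2

conn = psycopg2.connect(dbname="recipes", user="app", password="***", host="localhost")
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS recipe (
            id          SERIAL PRIMARY KEY,
            title       TEXT NOT NULL,
            source_url  TEXT UNIQUE NOT NULL,
            ingredients TEXT[] NOT NULL,      -- canonical ingredient names
            allergens   TEXT[] NOT NULL       -- detected allergen groups
        );
        CREATE INDEX IF NOT EXISTS idx_recipe_allergens
            ON recipe USING GIN (allergens);  -- supports allergen facet queries
    """)
conn.close()
```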
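The faceted filtering in objective 7 could then be exposed through a simple endpoint. The Flask sketch below holds a few recipes in memory purely for illustration; in the prototype the same filter would run as a query against the PostgreSQL table above, and the recipe data shown here is invented.

```python
# Minimal sketch of faceted allergen filtering (objective 7) as a Flask route.
# Recipes are in-memory placeholders; the prototype would query PostgreSQL.
from flask import Flask, jsonify, request

app = Flask(__name__)

RECIPES = [
    {"title": "Lemon cake", "allergens": ["cereals containing gluten", "eggs", "milk"]},
    {"title": "Tomato soup", "allergens": ["celery"]},
    {"title": "Fruit salad", "allergens": []},
]

@app.route("/recipes")
def search_recipes():
    # Facets arrive as repeated query parameters, e.g. /recipes?exclude=eggs&exclude=milk
    excluded = set(request.args.getlist("exclude"))
    matches = [r for r in RECIPES if not excluded & set(r["allergens"])]
    return jsonify(matches)

if __name__ == "__main__":
    app.run(debug=True)
```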
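Finally, for the quantitative part of objective 8, the sketch below shows how paired SUS scores for the baseline and the prototype could be compared with a paired t-test and a Wilcoxon signed-rank test using SciPy. The scores are invented placeholders for illustration only, not study results.

```python
# Minimal sketch of the evaluation analysis (objective 8): paired comparison of
# SUS scores for baseline search vs. prototype. Scores are invented placeholders.
from scipy.stats import ttest_rel, wilcoxon

# One SUS score per participant and condition (same participant order in both lists).
sus_baseline  = [55.0, 62.5, 47.5, 70.0, 60.0, 52.5, 65.0, 57.5]
sus_prototype = [72.5, 80.0, 65.0, 77.5, 70.0, 75.0, 82.5, 68.0]

t_stat, t_p = ttest_rel(sus_prototype, sus_baseline)   # paired t-test
w_stat, w_p = wilcoxon(sus_prototype, sus_baseline)    # non-parametric alternative

print(f"Paired t-test:        t = {t_stat:.2f}, p = {t_p:.4f}")
print(f"Wilcoxon signed-rank: W = {w_stat:.2f}, p = {w_p:.4f}")
```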

Resources and constraints:

Hardware – a standard PC capable of running Python-based applications and Visual Studio, a minimum of 100 GB of storage for the web-scraped data, and a cloud-based platform to host the prototype (such as Render or PythonAnywhere).

Software – Python, with tools and libraries such as BeautifulSoup and Scrapy for web scraping, Pandas for data collection and cleaning, Django or Flask for back-end development, HTML, CSS and JavaScript for front-end development, PostgreSQL for data storage, as well as PyTest for functional testing and GitHub for version control.

Data Sources – all data will be collected from open-source data sets, publicly available sources and websites that permit web scraping.

Human resources – for usability testing and evaluation, a group of 15–20 participants from a range of backgrounds will be recruited.

Constraints – the following limitations and constraints will be addressed in this study:

  • time limitations,
  • scope of available resources,
  • variability of the data,
  • inconsistent allergen labelling,
  • storage limitations,
  • legal and ethical standards.

