Introduction
Despite the fast-paced evolution of technology, email remains one of the most prominent communication tools, especially for work [1]. According to a survey-based report by Gated, 82% of people stated that email is their primary source of internal communication at work, with each work email account receiving an average of 71 emails per workday [2]. Statistics show that an employee spends an average of 28% of their workweek managing their email inbox [3], a strikingly large share of working hours for a task that may not be directly related to their job. This highlights the importance of streamlining email management so that users can focus on more important tasks.
Project Objective
- Minimize Digital Distraction
Digital noise, including email and social media, is one of the major sources of distraction that may hinder productivity, engagement at home or work, and even well-being [4]. According to the Gated report, 62% of people expressed that digital distractions significantly lower their focus [5]. With email being the default communication medium in most workplaces and academic environments, its effect on people prone to digital distraction is only magnified. By filtering out spam and categorizing the remaining emails by importance, our project aims to enhance users’ focus and productivity by minimizing the digital distractions that come with using email.
- Avoid Missing Important Information
According to the same report, 67% of interviewees admitted to feeling overwhelmed by their inboxes due to spam and sales emails [5], and 30% have even purged or abandoned their inboxes [2]. As a result, 82% of people are reported to have missed important emails due to the clutter in their inboxes [5], which evidently affects their productivity. With fewer or no spam emails, users will be encouraged to check their inboxes more frequently and will be much less likely to miss important information.
- Prevent Email Fatigue
It is reported that “not having the bandwidth to read or respond to every email leaves 74% of workers feeling guilty or stressed” [2]. Even though email fatigue may sound insignificant, it is in fact a main contributor to burnout and job dissatisfaction [3]. On a larger scale, this may lower the productivity and morale of a company while affecting the overall health and well-being of its employees, which demonstrates how severe the impact of an inbox overloaded with spam can be. As our project aims to reduce inbox overload and prevent users from missing important information, it may also help prevent email fatigue, benefiting the performance of employees as well as the company.
- Security and Privacy
Cybersecurity has always been a major concern when it comes to the usage of online resources, with over 75% of targeted cyberattacks starting with an email or, more specifically, with phishing [6]. Phishing is when scammers send fake emails that appear to be from a legitimate online bank, auction, or payment site, leading users to a fake website that is designed to look like a real login page. It usually either prompts victims to provide account passwords or tricks them into downloading malware that gives scammers access to sensitive information on their devices [7]. About 39% of people have opened a phishing email at work [3], which underlines the urgency of taking preventive measures. By filtering out phishing emails, our project may help protect users’ security and privacy.
Project Background
- Built-in Email Filtering System
Default filtering systems of mainstream email clients identify emails to be filtered primarily by examining the email header to extract information about the sender, recipient, subject, or other metadata. As a result, these systems may fail to catch relevant emails whose header information does not match a rule exactly. For example, if a user sets up a filter such that all incoming mail from cs@hku.hk is funnelled to the ‘CS’ folder, the filter will not catch mail from cs2@hku.hk. Conversely, emails from filtered senders may contain useful information, yet broad-sweeping filters would categorize them as spam and keep them from reaching the user. Hence, default filtering systems are inefficient and unreliable.
- External Extensions
Gmail and Outlook allow users to install external extensions for additional features. For instance, Clean Email is an extension for Gmail and Outlook that incorporates Google Workspace and Outlook APIs to provide more comprehensive filters to users. Smart Views, one of the core features of Clean Email, offers several pre-set filters such as the age of the email, the size of the attachment, and Cc’d emails, with a “Read Later” folder as the destination of filtered emails [8]. However, these pre-sets do not effectively filter spam emails, as the metrics they rely on say little about whether an email is spam.
- Paid Email Clients in the Market
There are numerous paid email clients available on the market. One of the more popular options is Superhuman, which provides a variety of features that aim to boost users’ productivity. Superhuman categorizes emails by searching through email content for domain names in shared links [9]. For example, if an email contains “github.com”, it will be identified as a GitHub-related email. Once again, although this may be an efficient way to boost productivity by labelling received emails, it does not filter spam emails, as the presence of a domain name says little about whether an email is spam.
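To make the two rule-based approaches in this section concrete, the following is a minimal Python sketch. It is an illustration only and not the actual implementation of any email client or of Superhuman: the function names, the `DOMAIN_LABELS` table, and the URL pattern are all assumptions introduced here.

```python
import re
from urllib.parse import urlparse


def exact_sender_filter(sender, rule_address):
    """Header-based rule: matches only when the sender equals the rule address."""
    return sender.strip().lower() == rule_address.strip().lower()


# Hypothetical table mapping link domains to labels.
DOMAIN_LABELS = {"github.com": "GitHub"}

# Simplified URL pattern, good enough for an illustration.
URL_PATTERN = re.compile(r"https?://[^\s<>]+")


def label_by_link_domain(body):
    """Content-based rule: label an email by the domains of the links it contains."""
    labels = set()
    for url in URL_PATTERN.findall(body):
        host = (urlparse(url.rstrip(".,;)")).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        if host in DOMAIN_LABELS:
            labels.add(DOMAIN_LABELS[host])
    return labels
```

With the rule address `cs@hku.hk`, `exact_sender_filter` accepts `cs@hku.hk` but rejects `cs2@hku.hk`, which is the gap described above. `label_by_link_domain` tags an email that links to `github.com` as GitHub-related, yet neither rule says anything about whether the email is spam.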
Product Features
- Core Concept
We aim to deliver an email client that specializes in filtering emails efficiently, effectively and, more importantly, automatically. Besides conventional email authentication mechanisms such as DMARC, SPF, and DKIM [10], which we may include in our product, our model will automatically generate filters through machine learning. The email client will also sort received emails with colour tags that indicate different levels of importance, allowing users to prioritize important and time-sensitive information.
- Time Data Collection
By collecting the amount of time a user spends reading each email, the system will be able to deduce, with the assistance of GPT-3.5, the groups of emails that the user tends to be more interested in.
- Whitelist
A whitelist function allows users to manually enter email addresses whose emails must not be neglected. This function ensures that important content will not be missed or miscategorized as spam, even if a user does not usually spend much time reading emails from the listed senders.
- Natural Language Input
Users will be able to customize the filtering system to better suit their needs. To make the experience smoother, users may enter their preferences in natural language, again with the help of GPT-3.5.
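As a rough illustration of how reading-time data and the whitelist could interact, the sketch below averages reading time per sender and lets the whitelist override the result. It deliberately swaps the planned GPT-3.5 grouping for a simple threshold, and the function names and the 30-second threshold are assumptions made for this example, not design decisions.

```python
from collections import defaultdict
from statistics import mean


def average_reading_time(events):
    """Average seconds spent reading per sender.

    `events` is an iterable of (sender, seconds_spent_reading) pairs.
    """
    per_sender = defaultdict(list)
    for sender, seconds in events:
        per_sender[sender.lower()].append(seconds)
    return {sender: mean(times) for sender, times in per_sender.items()}


def is_important(sender, averages, whitelist, threshold_seconds=30.0):
    """Whitelisted senders are always important; others need enough reading time.

    The threshold is a hypothetical stand-in for the learned model.
    """
    sender = sender.lower()
    if sender in whitelist:
        return True
    return averages.get(sender, 0.0) >= threshold_seconds
```

In this sketch a sender whose emails are only skimmed would normally fall below the threshold, but adding that address to the whitelist keeps its emails marked as important.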
Project Methodology
Frontend
- Wireframing and User Interface (UI) Design: Figma
An email client that seeks to differentiate itself from default email clients requires a superior UI design. We will use Figma to produce and present our frontend design. Figma is selected because it is the wireframing tool the team is most comfortable with and experienced in, and it is one of the most popular and advanced interface design tools on the market.
- Frontend Development: React.js
As our email client will primarily function as a web application, a frontend framework is needed to develop the user interface. React.js is chosen not only because of our prior experience with it but, more importantly, because of its large supporting community, which provides a variety of UI component libraries that can be incorporated into our product.
Backend
- Backend Development: Flask
Every web application requires a comprehensive backend. For this project, Flask is the preferred tool, as it is the quickest framework to get started with and has the gentlest learning curve of the frameworks we considered. Moreover, since Flask is a Python framework, it works with Python’s third-party libraries, which eases backend integration.
- AI Development: TensorFlow
TensorFlow is a powerful and broadly used open-source machine learning framework. It has a vibrant and active ecosystem of libraries and tools that integrate with the framework, enhancing the efficiency of tasks such as data pre-processing and deployment. Furthermore, it is highly scalable and supports various architectures, ranging from traditional web servers to mobile or embedded devices. However, as the team is relatively new to AI development, this framework might be changed in the future after experimenting.
Performance Measurement
- Average Time Spent
As the core objective of the project is to enhance user productivity by minimizing the time spent going through emails, the average time spent in the email client per user is a direct indicator of the performance of our product: if the average time spent reading emails drops significantly for most users, our model can be said to have delivered the expected outcome. The primary method of collecting this data would be live experiments with a sufficiently large number of participants (typically around 30). However, this approach introduces privacy and logistical problems. For instance, participants exposing their inboxes to our experimental application might pose significant privacy concerns, which could be countered by providing them with dummy email accounts. In terms of logistics, carrying out human experiments requires approval and a considerable amount of paperwork, which could heavily slow down progress.
- User Activities
User activities and interactions with the mail client may also be tracked to estimate the performance of our model. For instance, if users constantly have to move emails from the inbox to the spam folder, or vice versa, it clearly indicates that our model is inaccurate.
- Feedback Form
To ensure that our product meets, if not exceeds, users’ expectations, a feedback form may be provided so that users can contribute to the improvement of our system. This way, we can amend and enhance our model to cater to most users’ needs. Of all the potential performance measurements mentioned, the feedback form is the most straightforward. However, the logistical problem of gathering a sufficiently large number of participants persists.
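The first two measurements can be expressed as simple formulas. The sketch below is one possible way to compute them, under the assumption that per-user reading times and manual corrections are logged; the function names and inputs are hypothetical.

```python
def relative_time_reduction(before_minutes, after_minutes):
    """Fractional drop in average time per user (positive means less time after)."""
    avg_before = sum(before_minutes) / len(before_minutes)
    avg_after = sum(after_minutes) / len(after_minutes)
    return (avg_before - avg_after) / avg_before


def correction_rate(total_emails, moved_to_spam, moved_to_inbox):
    """Share of emails the user had to re-file manually; lower is better."""
    if total_emails == 0:
        return 0.0
    return (moved_to_spam + moved_to_inbox) / total_emails
```

For example, if two users averaged 60 and 40 minutes a day before and 30 and 20 minutes after, the relative reduction is 0.5. If users re-filed 10 of 200 emails, the correction rate is 0.05.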
Potential Problems
- Syncing Email Platforms
As we aim to build a new email client that retrieves emails from existing platforms, we expect networking problems, such as how the server should be built and how new emails can be retrieved from email platforms continuously and reliably. To reduce the risk of network-related failures, research on email-related networking will be conducted before the project starts.
- Email Platforms’ Priority
Our client will allow email syncing from different services. However, as each email service develops its own API, it would be challenging for our product to cover the entire industry at the current stage. Due to this limitation, the project will cover Gmail, Outlook, and Yahoo Mail for now.
- Privacy and Security
Privacy and security are key elements of this project: any breach of users’ private information is unacceptable. Mainstream email clients have been established for decades and may rely on outdated infrastructure, which poses a risk of information leakage when emails are retrieved. In addition, OpenAI’s GPT models may be integrated into the email client. OpenAI’s privacy policy states that it may use user inputs and responses to train its GPT models [11], which again raises privacy concerns, as we have no insight into the technological infrastructure and data retention on their end. Hence, there is always a risk of a data leak on OpenAI’s side.
- Unknown Knowledge
GPT and other AI models are black boxes, meaning that their internal operations cannot be readily inspected. Full dependency on such programs may cause errors in categorizing emails and false predictions of users’ preferences. The whitelist function is proposed as one safety net against such errors.
- Limitations of OpenAI
As each incoming email is forwarded to OpenAI for classification, platform restrictions allow a maximum of 4097 tokens per prompt, where each token corresponds to approximately 4 English characters [12]. This limits the length of email that can be scanned by OpenAI. Additionally, the imposed rate limit of 60 requests per minute [13] may make it difficult for the email client to handle a growing number of users and an increasing frequency of incoming emails.
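The two limits above translate into simple client-side guards. The following sketch assumes the figures quoted in this section (4097 tokens per prompt, roughly 4 characters per token, 60 requests per minute); the character-based estimate is only an approximation of real tokenization, and `reserved_tokens`, `truncate_to_token_budget`, and `SlidingWindowLimiter` are hypothetical names introduced for illustration.

```python
import time
from collections import deque

MAX_TOKENS_PER_PROMPT = 4097  # limit quoted in this proposal
CHARS_PER_TOKEN = 4           # rough approximation, not exact tokenization
REQUESTS_PER_MINUTE = 60      # rate limit quoted in this proposal


def truncate_to_token_budget(email_body, reserved_tokens=0):
    """Cut the email so its estimated token count fits the prompt limit.

    `reserved_tokens` leaves room for instructions placed around the email.
    """
    budget_chars = max(0, (MAX_TOKENS_PER_PROMPT - reserved_tokens) * CHARS_PER_TOKEN)
    return email_body[:budget_chars]


class SlidingWindowLimiter:
    """Allows at most `max_requests` calls within any `window_seconds` span."""

    def __init__(self, max_requests=REQUESTS_PER_MINUTE, window_seconds=60.0,
                 clock=time.monotonic):
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock
        self._sent = deque()  # timestamps of recent requests

    def try_acquire(self):
        """Return True and record the request if it may be sent now."""
        now = self._clock()
        # Drop timestamps that have left the window.
        while self._sent and now - self._sent[0] >= self._window:
            self._sent.popleft()
        if len(self._sent) < self._max:
            self._sent.append(now)
            return True
        return False
```

Under these assumptions an email is capped at about 4097 × 4 = 16,388 characters, and at most 60 emails per minute can be classified. Once the number of incoming emails across all users exceeds that rate, the backlog grows without bound unless requests are batched or queued.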
Future Prospect
- Highlight Key Content
As most emails tend to be long and tedious, highlighting the potential key content may significantly reduce the time users spend going through emails. Users may swiftly identify major components in the content, leading to greater efficiency in prioritizing tasks. Visually accentuating the sender’s requests, focus points, and time-sensitive information (e.g., deadlines) allows users to effectively grasp the essence of the email and respond promptly with an action plan. This streamlined process minimizes reading time and promotes efficient workflow management, resulting in enhanced productivity. Besides, as mentioned under “Prevent Email Fatigue”, employees’ feelings of stress and pressure may be attributed to the overwhelming volume of emails. By automatically highlighting vital points, users can communicate more effectively, ensuring that major information is not overlooked. This fosters clearer and more efficient exchanges in the workplace, reducing the potential for missing details and promoting mental well-being.
- Summarize Long Emails
An automated summarization tool breaks emails down into digestible chunks, letting users quickly scan the summaries to determine the relevance and importance of each message while mitigating the risk of misinterpretation. Readers can organize their mailboxes effectively by category, making email retrieval more efficient. A study found that under email overload, users respond to a smaller fraction of their emails, less than 5% [14]. Concise automated summaries help alleviate this burden by allowing users to filter out less important emails and focus on those requiring immediate attention. This reduction in cognitive load keeps urgent messages from being lost in the flood of information.
- Domain Authentication
Identifying authentic domains may effectively combat the problem of phishing. Scams via the Uniform Resource Locators (URLs) enclosed in emails are a severe form of cybercrime that caused losses of USD 52,000,000 in the United States in 2022 [15]. Since phishing commonly involves displaying text that appears to be an official link but leads to a different URL, directing recipients to spoofed websites that mimic official sites, verifying the URLs in emails may help ensure the legitimacy of the websites referenced.
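One simple check in this direction is to compare the domain shown in a link’s visible text with the domain the link actually points to. The sketch below uses only the Python standard library; it is a hedged illustration rather than a complete phishing detector, and the `LinkCollector` and `find_mismatched_links` names are assumptions made for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects (href, visible text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def _host(url):
    """Lower-cased host name of a URL or bare domain, without a leading 'www.'."""
    if "://" not in url:
        url = "http://" + url
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def find_mismatched_links(html_body):
    """Return (visible text, href) pairs whose displayed domain differs from the target."""
    parser = LinkCollector()
    parser.feed(html_body)
    mismatches = []
    for href, text in parser.links:
        # Only compare when the visible text itself looks like a URL or domain.
        looks_like_url = " " not in text and "." in text
        if looks_like_url and _host(text) != _host(href):
            mismatches.append((text, href))
    return mismatches
```

A link displayed as `www.mybank.com` that points to another host would be flagged, while a link whose text and target share the same domain would pass. Links with plain wording such as “click here” are skipped by this check and would need other signals.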
Expected Timeline
- Email Client
– Figma: 15 October 2023
– Authentication on Outlook: 1 November 2023
– Empty email client (no email categorizations): 15 November 2023
– UI for user input: 1 December 2023
– Complete email client (with email categorizations): 1 February 2024
- Machine Learning Model
– Data collection and research: 15 November 2023
– GPT prompting (user input filters): 1 December 2023
– Machine learning model (automatically generate filters): 1 February 2024
– Categorize emails: 1 February 2024
- Final Deliverables
– [Potential] Experiments for performance measurement: March 2024
– Final project report: April 2024
– Live demo: April 2024
– Presentation: April 2024
References
- L. A. Dabbish, R. E. Kraut, S. Fussell, and S. Kiesler, Understanding Email Use: Predicting Action on a Message. New York, NY: Association for Computing Machinery, 2005.
- G. Christ, “Workers overwhelmed, distracted by email,” HR Dive, https://www.hrdive.com/news/workers-overwhelmed-distracted-by-email/637786/ (accessed Sep. 26, 2023).
- “The Latest Work Email Statistics 2023 You Shouldn’t Ignore,” Gitnux, https://blog.gitnux.com/work-email-statistics/#:~:text=11%20hours%20is%20the%20average,spends%20on%20email%20per%20week. (accessed Sep. 26, 2023).
- L. Rosen and A. Samuel, “Conquering Digital Distraction,” Harvard Business Review, https://hbr.org/2015/06/conquering-digital-distraction (accessed Sep. 26, 2023).
- “67 Percent of People Feel Overwhelmed by Their Email Inbox According to New Inbox Intelligence Report from Email Management Solution Gated,” PR Newswire, https://www.prnewswire.com/news-releases/67-percent-of-people-feel-overwhelmed-by-their-email-inbox-according-to-new-inbox-intelligence-report-from-email-management-solution-gated-301659242.html (accessed Sep. 26, 2023).
- “Top 10 Cybersecurity Threats in 2023,” Embroker, https://www.embroker.com/blog/top-cybersecurity-threats/#:~:text=Over%2075%25%20of%20targeted%20cyberattacks,of%20%20stolen%20%20credentials%20and%20%20ransomware (accessed Sep. 26, 2023).
- “Email habits in the age of information overload | IDM Magazine,” idm.net.au. https://idm.net.au/article/0010435-email-habits-age-information-overload (accessed Sep. 25, 2023).
- “Clean your Inbox of emails you don’t need. Then keep it clean.,” Clean Email, https://clean.email/?_gl=1%2A1lp92xk%2A_up%2AMQ..%2A_ga%2AMTIzNzc0MTMyMi4xNjk1NjQ3MDU1%2A_ga_W864N1WMEZ%2AMTY5NTY0NzA1NC4xLjAuMTY5NTY0NzA1NC4wLjAuMA..&_ga=2.179578273.488318465.1695647058-1237741322.1695647055 (accessed Sep. 26, 2023).
- “Blazingly fast email for teams and individuals,” The Fastest Email Experience Ever Made, https://superhuman.com/ (accessed Sep. 26, 2023).
- S. Nightingale, Email Authentication Mechanisms: DMARC, SPF and DKIM. US Department of Commerce, National Institute of Standards and Technology, 2017.
- OpenAI, “Privacy policy,” https://openai.com/policies/privacy-policy (accessed Sep. 28, 2023).
- OpenAI, “What are tokens and how to count them?,” help.openai.com. https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them (accessed Sep. 30, 2023).
- OpenAI, “OpenAI Platform,” platform.openai.com. https://platform.openai.com/docs/guides/rate-limits/overview (accessed Sep. 30, 2023).
- FBI Springfield, “Internet Crime Complaint Center Releases 2022 Statistics,” Mar. 22, 2023. https://www.fbi.gov/contact-us/field-offices/springfield/news/internet-crime-complaint-center-releases-2022-statistics (accessed Sep. 30, 2023).
- J. Swearingen, “Just How Safe Does That HTTPS Green Padlock Keep You?,” Intelligencer, https://nymag.com/intelligencer/2017/03/phishing-and-malware-sites-can-use-https-and-ssl-against-you.html (accessed Sep. 30, 2023).