About the FYP – HKU CS FYP (23085)

Programming Languages / Libraries / Frameworks

Frontend

There are various frontend programming languages to choose from when building a full-stack mobile app, such as JavaScript, Dart (with Flutter), Java, Kotlin, Objective-C, and Swift. After comparing their pros and cons, we will use the JavaScript framework React Native to build the front end of the app, for the following benefits.

Based on previous coursework and internships, we are already familiar with JavaScript and React. This existing knowledge carries over directly to React Native, a closely related framework, so we can be productive from the start. React Native also lets us write code once and deploy it on both iOS and Android. This cross-platform capability saves development time and effort compared with building a separate app for each platform, for example one in Java and another in Objective-C. In addition, React Native offers features such as Live Reload and Hot Reloading, which let developers see changes in the running app without recompiling the entire application. This can significantly speed up development and testing, making it faster to iterate on and refine the app.

However, platform-specific code may still be required for certain functionality, and React Native apps may not match the performance of apps written natively for each platform. We nevertheless expect React Native to offer good performance for our asset management app use case.

Backend

Apart from the front-end language, a programming language must be chosen for the back end. Since the app includes machine learning with NLP, Python is a natural choice for the backend language. Two major Python libraries will be employed: FastAPI and a machine learning library. FastAPI is a Python web framework for building high-performance web APIs, and it offers several benefits.

Based on previous coursework and internships, we are already familiar with Python. With this existing knowledge and FastAPI’s gentle learning curve, we expect to learn and adopt it quickly. FastAPI is built on top of Starlette, an asynchronous web framework, and takes advantage of Python’s asynchronous capabilities to handle high loads and deliver excellent performance. This makes it suitable for building responsive and scalable backend systems for mobile apps. Just as JavaScript code can await the promise returned by the fetch API, FastAPI route handlers can use the async and await syntax for asynchronous tasks such as API calls and I/O operations.
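To illustrate why async and await help with I/O-bound work, here is a minimal sketch that uses only Python's standard asyncio module rather than FastAPI itself. The function names and ticker symbols are hypothetical, and `asyncio.sleep` stands in for a slow call to a database or an upstream API. In FastAPI, a route handler declared with `async def` can await coroutines in the same way.

```python
import asyncio
import time


async def fetch_quote(symbol: str, delay: float) -> dict:
    """Stand-in for a slow I/O call (e.g. a database or upstream API request)."""
    await asyncio.sleep(delay)  # yields control to the event loop while waiting
    return {"symbol": symbol, "delay": delay}


async def fetch_all(symbols: list[str], delay: float = 0.1) -> list[dict]:
    """Run all lookups concurrently; results come back in input order."""
    return await asyncio.gather(*(fetch_quote(s, delay) for s in symbols))


def timed_fetch(symbols: list[str], delay: float = 0.1) -> tuple[list[dict], float]:
    """Run the concurrent fetch and report how long it took in seconds."""
    start = time.perf_counter()
    results = asyncio.run(fetch_all(symbols, delay))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = timed_fetch(["AAPL", "MSFT", "TSLA"])
    print(results, f"{elapsed:.2f}s")
```

Because the three waits overlap, the total time is close to one delay rather than the sum of all three, which is the effect we rely on when the backend serves many requests at once.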

However, connecting the front end to the back end is not a trivial task. First, the two ends will be developed separately by different teammates. Next, the project directory structures and the API tutorials should be carefully studied before combining both ends into the full-stack app.

Python ML Libraries

There are a few competing machine learning libraries in Python, such as PyTorch and Keras (with TensorFlow and NumPy). PyTorch will be chosen as the machine learning library for the app, in consideration of some major benefits.

PyTorch has a Pythonic and user-friendly interface, with syntax that is often considered more concise and readable than TensorFlow’s. This simplicity can accelerate the development process. PyTorch is also strongly dynamic: it builds its computational graph on the fly at runtime, so the structure of a network can change depending on the input data. This flexibility allows for easier debugging and more intuitive model development, and it suits models that require adaptability and varying architectures, such as recurrent neural networks (RNNs).

Another major benefit is that PyTorch has gained significant popularity in the commercial field and the research community, particularly in fields like natural language processing (NLP). Many recent papers and software developments adopt PyTorch implementations, making it easier to reproduce and build upon existing work.

Financial data collection

This section covers how our application will obtain the stock prices and financial news used to train our NLP model. Past stock prices are necessary to train and fine-tune the model. By collecting historical stock data, including price movements, trading volumes, and other relevant financial indicators, the model can learn patterns, correlations, and relationships between different stocks and market conditions. Financial news provides the context NLP models need to understand the language used in the financial domain. By training on stock-related textual data, such as news articles, financial reports, and social media posts, the model can learn the specific vocabulary, jargon, and sentiment associated with stock markets. Getting live financial news is also crucial for our application: the model has to be kept up to date with the latest news to perform the most accurate analysis and provide the best recommendations to users. Multiple tools can help us obtain stock data and financial news, both live and historical.

REST API

REST APIs enable communication between systems over the internet; in our case, between our application and the information suppliers. APIs provide a simple and flexible way to retrieve stock prices and financial news, allowing us to integrate this information seamlessly into our application and model training. The stateless nature of REST APIs also aids scalability: multiple clients can access the API simultaneously, which is necessary because many users will be checking this information at the same time. The lightweight JSON format also makes data transmission efficient, allowing us to approximate real-time updates through frequent requests. The following are some APIs we would consider:

OpenAI API: The OpenAI API grants us access to language models like GPT-3.5, enabling us to integrate natural language processing capabilities into our stock management application. By utilizing the API, we can generate text and perform language translation, summarization, question answering, and more. Use of this API is charged.
Hugging Face API: The Hugging Face API provides us with access to a wide range of pre-trained models for natural language processing tasks, similar to the OpenAI API. Many of these models are open-source.
Bloomberg stock API: Provides access to real-time and historical stock data, technical indicators, and financial news. The API itself is free to use, although access to Bloomberg data generally requires a Bloomberg subscription.
Yahoo Finance API: Offers a wide range of financial data, including stock prices, historical data, company profiles, and news articles. There is a premium for exceeding the request limit.
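
As a rough illustration of how our backend might consume a JSON-based REST API such as those listed above, the sketch below builds a stateless request URL and parses a response body. The endpoint `https://api.example.com/v1/quote`, the query parameters, and the response schema are all assumptions made for illustration; they do not correspond to any of the providers named here.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration; not a real provider URL.
BASE_URL = "https://api.example.com/v1/quote"


def build_quote_url(symbol: str, interval: str = "1d") -> str:
    """Each request carries everything the server needs (stateless)."""
    return f"{BASE_URL}?{urlencode({'symbol': symbol, 'interval': interval})}"


def parse_quote(body: str) -> dict:
    """Turn a JSON response body into the fields the app would use."""
    payload = json.loads(body)
    return {
        "symbol": payload["symbol"],
        "price": float(payload["price"]),
        "currency": payload.get("currency", "USD"),  # assumed default
    }


# Made-up response body in the assumed schema.
sample_body = '{"symbol": "AAPL", "price": "123.45"}'
quote = parse_quote(sample_body)

if __name__ == "__main__":
    print(build_quote_url("AAPL"))
    print(quote)
```

Keeping URL construction and response parsing in separate functions would let us swap providers by changing only these two pieces.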

Web Scraping

We would need a large amount of financial news and stock data to train our NLP model. However, API access often comes at a cost. To avoid this, we propose using web scraping for textual data collection: automated crawling across multiple information providers, a task that would be tedious to do by hand. Several tools are needed for web scraping.

Selenium – a powerful automation tool primarily used for web testing but also widely employed for web scraping. Selenium allows us to automate browser interactions, including clicking buttons, filling out forms, navigating through pages, and extracting data from dynamically generated content. It is especially useful when dealing with websites that heavily rely on JavaScript or require user interactions to load or display data. By controlling a web browser through Selenium, we can scrape historical data from websites like Yahoo Finance that would be costly to obtain through APIs.
Beautiful Soup – a Python library for parsing HTML and XML documents. It provides a convenient way to navigate and extract data from the page source rendered by Selenium. Beautiful Soup helps locate specific HTML elements and extract text, attributes, and other relevant data from web pages. It simplifies parsing and extracting structured data from the HTML source code, making it easier to scrape and manipulate the stock data and financial news.
IP Rotators – tools or services that provide a pool of IP addresses that can be rotated or switched during web scraping. They help overcome limitations imposed by websites that may block or restrict access from a single IP address or impose rate limits. By rotating IP addresses, we can distribute our web scraping requests across multiple IPs, making it harder for websites to detect and block our scraping activities. IP rotators enable anonymous and distributed scraping, improving the success rate and reliability of our web scraping tasks.
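
The extraction step described above can be sketched as follows. To keep the example free of third-party dependencies, it uses Python's standard `html.parser` module as a stand-in for Beautiful Soup, and a hard-coded string as a stand-in for the page source a Selenium-controlled browser would return. The `h3` tag and the `headline` class are assumptions about the page layout, and the headlines themselves are made up.

```python
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collects the text of every <h3> element whose class list includes 'headline'."""

    def __init__(self) -> None:
        super().__init__()
        self.headlines: list[str] = []
        self._in_headline = False
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h3" and "headline" in classes:
            self._in_headline = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_headline:
            self._buffer.append(data)  # also captures text inside nested tags

    def handle_endtag(self, tag):
        if tag == "h3" and self._in_headline:
            self.headlines.append("".join(self._buffer).strip())
            self._in_headline = False


def extract_headlines(html: str) -> list[str]:
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return parser.headlines


# Made-up page source, standing in for what a browser driver would return.
page_source = """
<div><h3 class="headline">Shares rise after <b>earnings</b> beat</h3>
<h3 class="ad">Sponsored</h3>
<h3 class="headline top">Central bank holds rates</h3></div>
"""

if __name__ == "__main__":
    print(extract_headlines(page_source))
```

With Beautiful Soup the same selection would likely be a single CSS-selector call such as `soup.select("h3.headline")`, which is one reason we prefer it over hand-written parsing.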

NLP Model Training

To make our NLP model more accurate and able to adjust to users’ needs, we will train it in several ways.

Supervised Learning

This approach involves training the NLP model using labeled data, where each input (e.g., news article, social media post) is associated with a known output (e.g., sentiment label, event type). We can manually annotate a dataset specific to stock management tasks and use it to train the model using supervised learning algorithms such as logistic regression, random forests, or deep neural networks.
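
A minimal sketch of this idea, assuming a sentiment task with binary labels: logistic regression over bag-of-words features, trained with plain stochastic gradient descent in pure Python. The six labeled sentences are made up for illustration, and the hyperparameters (`epochs`, `lr`) are arbitrary; a real model would use a much larger annotated dataset and a library implementation.

```python
import math
from collections import Counter

# Tiny hand-labeled toy dataset (made up): 1 = positive, 0 = negative sentiment.
TRAIN = [
    ("profit beats expectations shares surge", 1),
    ("strong growth record revenue", 1),
    ("company raises outlook shares rally", 1),
    ("profit warning shares plunge", 0),
    ("weak demand revenue falls", 0),
    ("company cuts outlook shares slump", 0),
]


def featurize(text: str) -> Counter:
    """Bag-of-words features: word -> count."""
    return Counter(text.lower().split())


def predict_proba(weights: dict, bias: float, text: str) -> float:
    """Probability that the text is positive under the current model."""
    z = bias + sum(weights.get(w, 0.0) * c for w, c in featurize(text).items())
    return 1.0 / (1.0 + math.exp(-z))


def train(data, epochs: int = 200, lr: float = 0.5):
    """Fit logistic regression with stochastic gradient descent on the log loss."""
    weights: dict = {}
    bias = 0.0
    for _ in range(epochs):
        for text, label in data:
            # Derivative of the log loss with respect to the linear score z.
            error = predict_proba(weights, bias, text) - label
            for word, count in featurize(text).items():
                weights[word] = weights.get(word, 0.0) - lr * error * count
            bias -= lr * error
    return weights, bias


weights, bias = train(TRAIN)

if __name__ == "__main__":
    for text, label in TRAIN:
        print(label, round(predict_proba(weights, bias, text), 3), text)
```

Words that appear only in positive examples end up with positive weights and vice versa, which is the pattern a larger model would learn from annotated financial news.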

Transfer Learning

Transfer learning leverages pre-trained language models that have been trained on large-scale general text data. Transformer-based models like BERT and GPT have learned rich language representations, with up to billions of parameters, and can be fine-tuned for our stock management application using a smaller labeled dataset. By starting from a pre-trained model, we can benefit from the knowledge and language understanding these models have acquired. We can access these models by downloading the model weights or by calling their APIs.

Few-shot Learning

Few-shot learning can greatly benefit NLP in a stock management app by addressing the challenges of limited labeled data and the need for rapid adaptation to new tasks. In our context of stock management, where data scarcity and evolving market conditions are prevalent, few-shot learning enables our NLP model to effectively learn from a small number of labeled examples per task. This capability allows the model to quickly adapt and generalize to new, unseen tasks, saving time and effort in collecting and labeling extensive data for each specific task. By efficiently utilizing the available labeled data and leveraging the learned knowledge from meta-training, few-shot learning empowers the NLP model to provide accurate and timely insights, adapt to emerging trends in the stock market, and offer valuable support to users in their decision-making processes. This can be further expanded to quickly train our model to cater to the user’s personal needs.
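
One common way to realize few-shot classification is nearest-prototype matching, the idea behind prototypical networks: each class is represented by the mean embedding of its few labeled examples, and a new input is assigned to the closest prototype. The sketch below assumes this approach, which is only one of several options and not necessarily the one we will adopt. The event-type labels and the 3-dimensional vectors are made up; in practice the embeddings would come from a pre-trained encoder.

```python
import math

# Made-up 3-d "embeddings" for two labeled examples per class (a 3-way, 2-shot task).
SUPPORT = {
    "merger": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "earnings": [[0.1, 0.9, 0.1], [0.2, 0.8, 0.0]],
    "lawsuit": [[0.0, 0.1, 0.9], [0.1, 0.2, 0.8]],
}


def prototype(vectors):
    """Mean of the support embeddings for one class."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def classify(query, support):
    """Assign the query to the class whose prototype is nearest (Euclidean distance)."""
    prototypes = {label: prototype(vecs) for label, vecs in support.items()}
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


if __name__ == "__main__":
    print(classify([0.85, 0.1, 0.05], SUPPORT))
```

Adding a new task or class only requires a handful of labeled examples to compute a new prototype, with no retraining of the encoder, which is what makes this attractive when labeled data is scarce.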

Domain Adaptation

Our stock management tasks require domain-specific knowledge. Domain adaptation techniques can be employed to transfer knowledge from a source domain (general financial text) to a target domain (stock-specific news articles, earnings reports). This helps the model better understand the target domain and improves its performance on stock management tasks.