Methodologies
Frontend
Technologies:
-
- React: Modern web framework with declarative and component-based approach
-
- Next.js: React meta-framework to enable the application to be maintainable, responsive, performant, and scalable.
-
- TypeScript: Syntactic superset of JavaScript that offers high-level type safety.
Design and Implementation:
-
- Material UI: Component library of React that provides a set of prebuilt and customizable UI components
-
- Figma: Collaborative and popular design tool for building various applications.
-
- Responsive Web Design: Adjusts the size, position, and visibility of webpage elements based on the device viewport to ensure the website looks intuitive on different devices.
Backend
Solution:
-
- Spring Boot: Main Web Server for the web application
-
- Spring Security: Authentication with Json Web Token(JWT) and Role Level Security for API access
-
- MySQL: Relational Database used to store all information
-
- Amazon S3 Bucket: BLOB Object Storage to store images, including user icon and game images
-
- RabbitMQ: Interprocess communication broker between Spring Boot and NLP solutions
DevOps:
-
- Jenkins: Support Deployment to cloud environment
-
- Docker: Container Tool used in Jenkins for easy deployment
Natural Language Processing (NLP) & Machine Learning (ML)
Sentiment Analysis:
-
- Train a deep neural network to classify positive and negative sentiments from an existing dataset with over 6M game reviews scrapped from Steam.
-
- Compare the performance of three different architectures on this task: TFIDF + RF, GloVe + CNN, BERT. The best performant model will be deployed to the platform
Topic Modelling:
-
- Train topic models based on reviews from games with the same genre instead of per-game reviews to extract latent topics within the same genre.
-
- Compare the performance of three different architectures: LDA, Contextualized Topic Model (CTM), BERTopic. Models from the best performant architecture will be deployed to the platform.
-
- Explore the possibility of using topic models pre-trained on genre-based datasets for analyzing reviews from a single game with the same genre as the dataset.
Keyword Extraction:
-
- Use LLM to name the topics extracted from the topic models in task Topic Modelling.
-
- Extract keywords of different game aspects, such as Gameplay, Visuals, Bugs and Suggestions, from a single review using LLM.
-
- Use LLM to generate aggregated summary of a game by considering all reviews of the game.
Results
Frontend
The web application was developed using the technologies, framework, and design approach specified in the proposed methodology. Some of the pages and important user interface elements that have been developed are the toolbar, register and login popup modal, forget password and reset password pages, landing page, search result page, game page, game reviews page, review page, game analytics page, and profile page.
Backend
Solution:
-
- Spring Boot: High performance and scalable API server that can support hundreds of concurrent users while maintaining low latency on processing API requests.
-
- Spring Security: JWT has been implemented with security testing done. Authentication/Registration with email and password is supported by our own implementation using Access and Refresh Token to support multiple device sign-in and “Remember Me” extended signin session.
-
- MySQL: 12 different tables have been added with the relationships defined by Spring Boot. All database access and connection must go through Spring Boot to ensure ACID properties is upheld and the data modified can be easily traced. Normalisation and indexing have been tested to optimise storage efficiency and fetching performance.
-
- Amazon S3 Bucket: All of the user images, game images and game review images are stored in a Bucket with a Content Delivery Network(CDN) enabled to improve the loading time of the images regardless of the user’s geolocation.
-
- RabbitMQ: A high throughput exchange is used so that all messages will be going through one centralized exchange for better monitoring and debugging. The messages sent and received are using “Routing Keys” to distinguish between different queues in the broker. This provides high scalability and reliability in the interprocess communication.,
DevOps:
-
- Jenkins: 2 Custom pipelines using Jenkinsfile were written to support the deployment of the Spring Boot application along with its monitoring systems. The other pipeline handled the deployment of all NLP related containers since they were deployed on 2 different virtual machines.
-
- Docker: Multiple docker images are built using the Jenkins pipeline and the containers started are depending on the current situation of the virtual machines running the services.
NLP & ML
Sentiment Analysis:
-
- Fine-tuned BERT outperformed the two other architectures, achieving 100% accuracy on a 1M testing dataset.
-
- Further optimization on BERT was performed to further reduce the inference time of a single review by 45%, to 47.13ms, on a CPU only machine, using ONNX.
Topic Modeling:
-
- BERTopic outperformed the two other architectures in both quantitative and qualitative evaluation. Coherent and diverse topics were generated.
-
- Custom models were trained from reviews from all games, as well as only action or indie games. They were deployed to our system
-
- Successful transfer learning by applying the trained BERTopic model to reviews from a single game, then generating a new set of topic keywords and representative reviews. Resulting topics can be named using Keyword Extraction module coherently with semantic relationships to the pre-trained topics.
Keyword Extraction:
-
- Use Mixtral8x7B, Langchain and ChromaDB to build a tightly integrated and scalable pipeline for naming the extracted topics and creating new topics names when BERTopic model faced reviews from a single game (i.e. transfer learning).
-
- Different prompt engineering techniques, such as Retrieval Augmented Generation, Generated Knowledge, one-shot prompting, Role Prompting, and use of delimiters, were applied to confine the behaviour of the LLM to produce coherence, concise and in-context results, as well as reducing the chance of hallucination.
-
- A pipeline was built to determine whether the review is a possible spam, extract keywords of different game aspects from a single review, classify the sentiment of different game aspects, and generate a short summary when the review was long.
-
- A pipeline was built to produce a per-game summary by first, considering all reviews from the game, aggregating results from sentiment analysis and topic modelling, then generating a summary regarding different game aspects and analytic results from the two modules.