Redesigning the experience of Pulze.ai's Playground to help users discover which LLMs work best for their businesses.
With the emergence of popular large language models (LLMs) like ChatGPT, more companies are looking to build similar AI-based, customer-facing applications to help fulfill their customers’ needs, while boosting efficiency and productivity. Pulze.ai allows users to build and manage custom applications that utilize the capabilities of these industry-leading LLMs.
The Playground tool allows users to compare the capabilities of these LLMs and gain early insight into how their application would perform while also configuring limitations on performance-based properties like quality, cost, token usage and speed.
•
Test the capabilities of LLMs with generative AI.
•
Adjust settings to find the best configuration for your business model.
•
Compare LLMs based on parameters and request results.
•
Use Playground configurations to create applications.
Users struggled to understand the value of the Playground, as the tool was regularly perceived as a 'ChatGPT clone'. This led us to believe that the tool wasn't very intuitive, negatively affecting user engagement, stickiness and session duration on both the Playground and the platform as a whole. Additionally, users tended to express confusion and feel underwhelmed once they performed a request.
•
Completing user research and compiling my findings to share user insights with the team.
•
UI lacks visibility on the parameters of each LLM and doesn’t allow for any immediate comparison between LLMs.
•
Each request was individually evaluated and answered. This made it impossible for users to test realistic, conversational situations for their customers.
•
Users have to re-enter all of the info that they entered into the playground during the ‘Create App’ stage.
Users of the Playground are struggling to evaluate and understand which LLMs are best for their use case, leading to low engagement and few apps being created.
As a part of redesigning the user experience of the Playground, I was tasked with completing the following in one week:
•
Complete user interviews and other user research methods to understand the pain points and needs of users.
•
Work with our lead front-end engineer to implement a complete redesign of the Playground interface.
•
Renovate the style guide to give everything a fresh new look and feel that appeals more to tech enthusiasts.
•
Resolve obvious issues to improve user experience while minimizing development resources.
•
Ship first iteration of designs, based on standard UX heuristics and internal feedback.
•
Perform in-depth user research and testing to gather impactful user insights.
•
Update style and components of the platform.
•
Ship final iteration of designs, based on user feedback.
The scope of the project seemed like a lot for a 1-week timeline if we wanted to uncover quality insights and implement meaningful changes, especially since we didn't have any user interviews lined up yet. Understanding the company's urgency to make these improvements, I pushed back to voice my thoughts and proposed a more agile, 2-phased approach. This would allow us to quickly turn around impactful changes on glaring issues, while also allowing more time to interview users and evaluate their feedback in depth.
I had daily meetings with the CEO to communicate my research findings and validate any assumptions that I had about the redesign. It was mentioned that one of the major characteristics of the platform is that it should be easy enough for “anyone” to use. This referred to a spectrum of users ranging from tech professionals and enthusiasts to business owners and executives.
•
Possesses tremendous buying power and influence throughout the company.
•
Responsible for viewing and reporting benchmarks on their company performance.
•
Less experience with LLMs.
•
Essential role in the organization and can have strong influence over product selection.
•
Responsible for implementing and managing Pulze API to their company's product.
•
More experience with LLMs.
I began my research by checking into some of the competitors in the market like Huggingface, Perplexity and Anyscale, as well as similar LLM products in the same space like GPT-4, Claude and Cohere.
By evaluating these relatively mature and successful platforms, I was able to glean insights on industry-approved elements such as layout, workflows and terminology that the experienced users we were targeting would expect. Now, I would just have to digest these elements and simplify them for the opposite end of our user spectrum.
Since this was a redesign project, I wanted to conduct a heuristic analysis of the existing Playground. I then discussed my findings with several internal stakeholders to validate my initial assumptions. I was able to confirm the following issues that we’d focus on for Phase 1:
•
Inability to share configurations.
•
Parameters are hidden by default.
•
A lot of wasted space.
•
No way to reset the space to the default state.
•
Lacks visual hierarchy.
•
No conversion to the app creation process.
•
Difficult to compare prompt results.
•
Large screen distance between input elements.
I interviewed several new and existing users to better understand their experience using the Playground, making sure to include users with varying levels of expertise in AI-based tools. Most existing users had simply been thrown into the tool without any real understanding of what Pulze was trying to accomplish.
Almost all users had trouble understanding the purpose of the tool and how to make use of the data from the Playground, highlighting issues of intuitiveness. However, once I explained the concept of the tool and the product, they were quite intrigued by the idea and saw its value.
•
Users have to re-enter all of the info that they entered into the Playground.
•
UI doesn’t allow for any immediate comparison between LLMs other than prompt results.
•
Users couldn’t have a conversation with the AI. Each prompt was individually evaluated.
•
Users are just thrown into the playground with no real instruction or hints as to how to operate it.
•
Connect the configurations of the Playground to the main platform/product.
•
Display LLM parameters (latency, tokens, cost) for each prompt result to allow for more detailed comparison.
•
Update the AI to be conversational. The team was already in the process of updating this.
•
Users struggled to understand the value of the Playground.
•
The Playground was not intuitive and left users feeling “lost” throughout using the tool.
•
Users were hesitant to trust the results of their requests.
•
A lack of a CTA or interaction to further utilize the Playground data left users confused about how to proceed.
How might we help users easily find the best LLM for their needs and business?
•
Much easier to use than before. Confirmed by 8/8 users.
•
UI is a bit cramped and busy.
•
Still not quite that easy to understand where to start.
•
The “Top Choice” created feelings of discomfort, as it gave users the sense of being sold something.
•
A lot of users were initially interested in trying out a specific LLM.
With the results from user testing, I wanted to apply my new learnings to the next iteration of the Playground. Here are some of the major updates that I made for this final iteration:
To really give users an intuitive starting point, the new start screen shares a layout similar to ChatGPT, an interface most users were already highly familiar with.
Also, by restructuring the layout of the dashboard and moving the primary navigation to the top, I was able to recapture ~20% of screen real estate, allowing the design to breathe a bit more.
In testing with Phase 1 designs, I realized most users already had some idea of which LLMs they wanted to test and compare upon entering the Playground. With this understanding, I wanted to give users the ability to manually select which models they wanted Pulze to evaluate.
Comparing LLMs is now made easier by eliminating other UI distractions. Recommended options are highlighted to help users easily identify which LLMs are best for them.
Feedback from Phase 1 designs revealed that the look and feel of the "Top Choice" made some users feel like they were "being sold something". This created discomfort and a hesitance to proceed with app creation. By adjusting this approach a bit, I was still able to highlight and promote the top recommendation in a more subtle manner.
The team also wanted to freshen up the look of the entire company brand with a dark theme that their engineer users would find more attractive and modern. I began by constructing a new style guide and components to test out in the Phase 2 designs of the Playground.
With the release of the redesign, we saw a big bump in user engagement and the number of apps created on the platform. Average session duration increased from under 5 minutes to about 10 minutes, while app creation increased by over 300%.
With these metrics, I would say the redesign was successful in addressing the problems at hand. I would've loved to test with more users to optimize the Playground and platform even further.
•
300% increase in conversion rate to app creation from the Playground.
•
2x increase in average session duration on the Playground.
If I had the opportunity, it would’ve been great to further track metrics like click-through rate and session duration to help optimize the effectiveness of the design on a larger scale.
Designing for this product gave me a peek into the world of designing for accessibility, something I hadn't had much experience with before. The impact of accessibility in design is tremendous and it will be top of mind for me at the beginning of every project that I work on from now on. It's always great to learn and grow as a designer.