pulze.ai playground

UX/UI | web | generative AI

Redesigning the experience of Pulze.ai's Playground to help users discover which LLMs work best for their businesses.

overview

role | UX/UI Designer

tools | Figma, FigJam, ClickUp

duration | 3 weeks

type | Redesign

With the emergence of popular large language models (LLMs) like ChatGPT, more companies are looking to build similar AI-based, customer-facing applications to help fulfill their customers’ needs, while boosting efficiency and productivity. Pulze.ai allows users to build and manage custom applications that utilize the capabilities of these industry-leading LLMs.

The Playground tool allows users to compare the capabilities of these LLMs and gain early insight into how their application would perform while also configuring limitations on performance-based properties like quality, cost, token usage and speed.
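The performance limits described above can be imagined as a simple filter over a model's results. The sketch below is purely illustrative: the class and field names are hypothetical and not Pulze's actual API, but it shows how limits on cost, token usage, speed, and quality could gate a result.

```python
from dataclasses import dataclass

@dataclass
class PlaygroundConfig:
    # Hypothetical field names -- illustrative only, not Pulze's actual API
    max_cost_per_request: float  # USD ceiling per request
    max_tokens: int              # token-usage cap
    max_latency_ms: int          # speed requirement
    min_quality: float           # quality threshold on a 0-1 scale

def passes(config: PlaygroundConfig, result: dict) -> bool:
    """Return True if a model's result satisfies every configured limit."""
    return (result["cost"] <= config.max_cost_per_request
            and result["tokens"] <= config.max_tokens
            and result["latency_ms"] <= config.max_latency_ms
            and result["quality"] >= config.min_quality)
```

A result that exceeds any one limit (say, cost) would simply be filtered out of consideration.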

Discovery

product user path

Test in Playground

Test the capabilities of LLMs with generative AI.

Adjust settings to find the best configuration for your business model.

Compare LLMs based on parameters and request results.

Use Playground configurations to create an application.

Create app
Connect app via API

what's the problem?

Users struggled to understand the value of the Playground, which was regularly perceived as a 'ChatGPT clone'. This led us to believe the tool wasn't intuitive, negatively affecting user engagement, stickiness, and session duration on both the Playground and the platform as a whole. Additionally, users often expressed confusion and felt underwhelmed after performing a request.

UI is not intuitive

Users are dropped into the Playground with no instruction or hints as to how to operate it.

Difficult to compare LLMs

UI lacks visibility on the parameters of each LLM and doesn’t allow for any immediate comparison between LLMs.

Can’t test realistic situations

Each request was individually evaluated and answered. This made it impossible for users to test realistic, conversational situations for their customers.

Forces high user information retention

Users have to re-enter all of the info that they entered into the playground during the ‘Create App’ stage.

Problem

Users of the Playground are struggling to evaluate and understand which LLMs are best for their use case, leading to low engagement and creation of apps.

pushing back

As a part of redesigning the user experience of the Playground, I was tasked with completing the following in one week:

Project scope

Complete user interviews and other user research methods to understand the pain points and needs of users.

Work with our lead front end engineer to implement a complete redesign of the Playground interface.

Renovate the style guide to give everything a fresh new look and feel that appeals more to tech enthusiasts.

Phase 1: Quickly address low-hanging fruit

Improve obvious issues that positively impact user experience while minimizing development resources.

Ship first iteration of designs, based on standard UX heuristics and internal feedback.

Phase 2: Incorporate design decisions based on user feedback

Perform in-depth user research and testing to gather impactful user insights.

Update style and components of the platform.

Ship final iteration of designs, based on user feedback.

The project scope seemed like too much for a one-week timeline if we wanted to uncover quality insights and implement meaningful changes, especially since we didn't have any user interviews lined up yet. Understanding the company's urgency to make these improvements, I pushed back and proposed a more agile, two-phased approach. This would allow us to quickly turn around impactful changes on glaring issues, while reserving more time to interview users and evaluate their feedback in depth.

research

archetypes

I had daily meetings with the CEO to communicate my research findings and validate any assumptions I had about the redesign. One major characteristic of the platform, it was mentioned, is that it should be easy enough for "anyone" to use. This referred to a spectrum of users ranging from tech professionals and enthusiasts to business owners and executives.

Executives

Possesses tremendous buying power and influence throughout the company.

Responsible for viewing and reporting benchmarks on their company performance.

Less experience with LLMs.

Software developers

Essential role in the organization and can have strong influence over product selection.

Responsible for integrating and managing the Pulze API in their company's product.

More experience with LLMs.

Competitive & comparative analysis

I began my research by looking into competitors in the market like Hugging Face, Perplexity and Anyscale, as well as similar LLM products in the same space like GPT-4, Claude and Cohere.

By evaluating these relatively mature and successful platforms, I was able to glean insights on industry-approved elements such as layout, workflows and terminology that would be expected by the experienced users we were targeting. Now, I would just have to digest these elements and simplify them for the opposite end of our user spectrum.

heuristic analysis

Since this was a redesign project, I wanted to conduct a heuristic analysis of the existing Playground. I then discussed my findings with several internal stakeholders to validate my initial assumptions. Together, we confirmed the following issues to focus on for Phase 1:

Start screen - Original
User made a request - Original
Issues highlighted from heuristic analysis

Inability to share configurations.

Parameters are hidden by default.

A lot of wasted space.

No way to reset the space to the default state.

Lacks visual hierarchy.

No conversion to the app creation process.

Difficult to compare prompt results.

Large screen distance between input elements.

customer interviews & affinity mapping

I interviewed a few new and existing users to understand a bit more about their experience in using the Playground. I made sure to interview users with various levels of expertise in using AI-based tools. Most existing users were just thrown into the tool without any real understanding of what Pulze was trying to accomplish.

Almost all users had issues understanding the purpose of the tool and how to make use of the data from the Playground, highlighting issues of intuitiveness. However, once I explained the concept of the tool and the product, they were intrigued by the idea and saw its value.

Affinity mapping in Figjam
User pain points

Users have to re-enter all of the info that they entered into the Playground.

UI doesn’t allow for any immediate comparison between LLMs other than prompt results.

Users couldn’t have a conversation with the AI. Each prompt was individually evaluated.

Users are just thrown into the playground with no real instruction or hints as to how to operate it.

Opportunities

Connect the configurations of the Playground to the main platform/product.

Display LLM parameters (latency, tokens, cost) for each prompt result to allow for more detailed comparison.

Update the AI to be conversational. The team was already in the process of updating this.
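The second opportunity above, surfacing each model's parameters for side-by-side comparison, can be sketched as a simple ranking over per-prompt results. The model names and metric values below are illustrative placeholders, not real benchmarks, and the scoring weights are an assumption for the sake of the example.

```python
# Hypothetical prompt results -- values are illustrative, not real benchmarks
results = [
    {"model": "gpt-4",  "latency_ms": 1200, "tokens": 310, "cost": 0.012},
    {"model": "claude", "latency_ms": 900,  "tokens": 280, "cost": 0.008},
    {"model": "cohere", "latency_ms": 600,  "tokens": 350, "cost": 0.004},
]

def score(r: dict, w_latency: float = 1.0, w_cost: float = 1.0) -> float:
    # Lower is better for both metrics; scale cost up to a magnitude
    # comparable with latency so neither dominates by default
    return w_latency * r["latency_ms"] + w_cost * r["cost"] * 100_000

ranked = sorted(results, key=score)
print([r["model"] for r in ranked])  # → ['cohere', 'claude', 'gpt-4']
```

Exposing the underlying numbers this way is what lets a user see *why* one model ranks above another, rather than just being told a "top choice".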

Insights from research

Users struggled to understand the value of the Playground.

The Playground was not intuitive and left users feeling “lost” throughout using the tool.

Users were hesitant to trust the results of their requests.

A lack of a CTA or interaction to further utilize the Playground data left users confused about how to proceed.

How-Might-We

How might we help users easily find the best LLM for their needs and business?

Solution & key design decisions

Design process framework in Figjam

designing phase 1

Start screen - Phase 1
User enters prompt into input field  - Phase 1
Request made - Chat
Request made - Compare
Figma artboard - Phase 1
User feedback from Phase 1

Much easier to use than before. Confirmed by 8/8 users.

UI is a bit cramped and busy.

Still not quite easy to understand where to start.

The "Top Choice" label made users uncomfortable, giving them the feeling of being sold something.

A lot of users were initially interested in trying out a specific LLM.

designing phase 2 - final iteration

Start screen - Phase 2
Request made - Chat
Request made - Conversational
Request made - Compare
Figma artboard - Phase 2

designing phase 2 - key design decisions

With the results from user testing, I wanted to apply my new learnings to the next iteration of the Playground. Here are some of the major updates I made for this final iteration:

Getting started

To give users an intuitive starting point, the new start screen shares a layout similar to ChatGPT's, an interface most users were already highly familiar with.

Also, by restructuring the layout of the dashboard and moving the primary navigation to the top, I was able to recapture ~20% of the screen real estate, allowing the design to breathe a bit more.

Model selection

In testing with Phase 1 designs, I realized most users already had some idea of which LLMs they wanted to test and compare upon entering the Playground. With this understanding, I wanted to give users the ability to manually select which models they wanted Pulze to evaluate.

Comparing models

Comparing LLMs is now made easier by eliminating other UI distractions. Recommended options are highlighted to help users easily identify which LLMs are best for them.

Interacting with results

Feedback from the Phase 1 designs revealed that the look and feel of the "Top Choice" made some users feel like they were "being sold something". This created discomfort and a hesitance to proceed with app creation. By adjusting this approach, I was still able to highlight and promote the top recommendation, but in a more subtle manner.

reimagining the style

The team wanted to also freshen up the look of the entire company brand with a dark theme that would be considered more attractive and updated by their engineer users. I began by constructing a new style guide and components to test out in the Phase 2 designs of the Playground.

success metrics

With the release of the redesign, we saw a big bump in user engagement and in the number of apps created on the platform. Average session duration increased from less than 5 minutes to about 10 minutes, while app creation increased by over 300%.

With these metrics, I would say the redesign was successful in addressing the problems at hand. I would've loved to test with more users to optimize the playground and platform even further.

Key success metrics

300% increase in conversion rate to app creation from the Playground.

2x increase in average session duration on the Playground.

looking back

if i had more time

If I had the opportunity, it would've been great to keep tracking metrics like click-through rate and session duration to optimize the effectiveness of the design on a larger scale.

the importance of accessibility design

Designing for this product gave me a peek into the world of designing for accessibility, something I hadn't had much experience with before. The impact of accessibility in design is tremendous and it will be top of mind for me at the beginning of every project that I work on from now on. It's always great to learn and grow as a designer.