Redesigning the experience of Pulze.ai's Playground to help users discover which LLMs work best for their businesses.
With the emergence of popular large language models (LLMs) like ChatGPT, more companies are looking to build similar AI-based, customer-facing applications to help fulfill their customers’ needs, while boosting efficiency and productivity. Pulze.ai allows users to build and manage custom applications that utilize the capabilities of these industry-leading LLMs.
The Playground tool allows users to compare the capabilities of these LLMs and gain early insight into how their application would perform while also configuring limitations on performance-based properties like quality, cost, token usage and speed.
•
Test the capabilities of LLMs with generative AI.
•
Adjust settings to find the best configuration for your business model.
•
Compare LLMs based on parameters and request results.
•
Use Playground configurations to create applications.
Users struggled to understand the value of the Playground, as the tool was regularly perceived as a 'ChatGPT clone'. This led us to believe that the tool wasn't very intuitive, negatively affecting user engagement, stickiness and session duration on both the Playground and the platform as a whole. Additionally, users tended to express confusion and feel underwhelmed once they performed a request.
•
Completing user research and compiling my findings to share user insights with the team.
•
UI lacks visibility on the parameters of each LLM and doesn’t allow for any immediate comparison between LLMs.
•
Each request was individually evaluated and answered. This made it impossible for users to test realistic, conversational situations for their customers.
•
Users have to re-enter all of the info that they entered into the playground during the ‘Create App’ stage.
Users of the Playground are struggling to evaluate and understand which LLMs are best for their use case, leading to low engagement and few apps being created.
As a part of redesigning the user experience of the Playground, I was tasked with completing the following in one week:
•
Complete user interviews and other user research methods to understand the pain points and needs of users.
•
Work with our lead front-end engineer to implement a complete redesign of the Playground interface.
•
Renovate the style guide to give everything a fresh new look and feel that appeals more to tech enthusiasts.
•
Resolve obvious issues to improve user experience while minimizing development resources.
•
Ship first iteration of designs, based on standard UX heuristics and internal feedback.
•
Perform in-depth user research and testing to gather impactful user insights.
•
Update style and components of the platform.
•
Ship final iteration of designs, based on user feedback.
The scope of the project seemed like a lot for a 1-week timeline if we wanted to uncover quality insights and implement meaningful changes, especially since we didn't have any user interviews lined up yet. Understanding the company's urgency to make these improvements, I pushed back to voice my thoughts and proposed a more agile, 2-phased approach. This would allow us to quickly turn around impactful changes on glaring issues, while also allowing more time to interview users and evaluate their feedback in depth.
I had daily meetings with the CEO to communicate my research findings and validate any assumptions that I had about the redesign. It was mentioned that one of the major characteristics of the platform is that it should be easy enough for “anyone” to use. This referred to a spectrum of users ranging from tech professionals and enthusiasts to business owners and executives.
•
Possesses tremendous buying power and influence throughout the company.
•
Responsible for viewing and reporting benchmarks on their company performance.
•
Less experience with LLMs.
•
Essential role in the organization and can have strong influence over product selection.
•
Responsible for implementing and managing Pulze API to their company's product.
•
More experience with LLMs.
I began my research by checking into some of the competitors in the market like Huggingface, Perplexity and Anyscale, as well as similar LLM products in the same space like GPT-4, Claude and Cohere.
By evaluating these relatively mature and successful platforms, I was able to glean insights on industry-approved elements such as layout, workflows and terminology that the experienced users we were targeting would expect. Now, I would just have to digest these elements and simplify them for the opposite end of our user spectrum.
Since this was a redesign project, I wanted to conduct a heuristic analysis of the existing Playground. I then discussed my findings with several internal stakeholders to validate my initial assumptions. I was able to confirm the following issues that we’d focus on for Phase 1:
•
Inability to share configurations.
•
Parameters are hidden by default.
•
A lot of wasted space.
•
No way to reset the space to the default state.
•
Lacks visual hierarchy.
•
No conversion to the app creation process.
•
Difficult to compare prompt results.
•
Large screen distance between input elements.
I interviewed several new and existing users to better understand their experience using the Playground, making sure to include users with varying levels of expertise in AI-based tools. Most existing users had simply been thrown into the tool without any real understanding of what Pulze was trying to accomplish.
Almost all users had trouble understanding the purpose of the tool and how to make use of the data from the Playground, highlighting issues of intuitiveness. However, once I explained the concept of the tool and the product, they were quite intrigued by the idea and saw its value.
•
Users have to re-enter all of the info that they entered into the Playground.
•
UI doesn’t allow for any immediate comparison between LLMs other than prompt results.
•
Users couldn’t have a conversation with the AI. Each prompt was individually evaluated.
•
Users are just thrown into the playground with no real instruction or hints as to how to operate it.
•
Connect the configurations of the Playground to the main platform/product.
•
Display LLM parameters (latency, tokens, cost) for each prompt result to allow for more detailed comparison.
•
Update the AI to be conversational. The team was already in the process of updating this.
•
Users struggled to understand the value of the Playground.
•
The Playground was not intuitive and left users feeling “lost” throughout using the tool.
•
Users were hesitant to trust the results of their requests.
•
A lack of a CTA or interaction to further utilize the Playground data left users confused about how to proceed.
How might we help users easily find the best LLM for their needs and business?
•
Much easier to use than before. Confirmed by 8/8 users.
•
UI is a bit cramped and busy.
•
Still not quite that easy to understand where to start.
•
The “Top Choice” created feelings of discomfort, as it gave users the sense of being sold something.
•
A lot of users were initially interested in trying out a specific LLM.
With the results from user testing, I wanted to apply my new learnings to the next iteration of the Playground. Here are some of the major updates that I made for this final iteration:
To really give users an intuitive starting point, the new start screen shares a layout similar to ChatGPT, an interface most users were already highly familiar with.
Also, by restructuring the layout of the dashboard and moving the primary navigation to the top, I was able to recapture ~20% of screen real estate, allowing the design to breathe a bit more.
In testing with Phase 1 designs, I realized most users already had some idea of which LLMs they wanted to test and compare upon entering the Playground. With this understanding, I wanted to give users the ability to manually select which models they wanted Pulze to evaluate.
Comparing LLMs is now made easier by eliminating other UI distractions. Recommended options are highlighted to help users easily identify which LLMs are best for them.
Feedback from Phase 1 designs revealed that the look and feel of the "Top Choice" made some users feel like they were "being sold something". This created discomfort and a hesitance to proceed with app creation. By adjusting this approach a bit, I was still able to highlight and promote the top recommendation in a more subtle manner.
The team also wanted to freshen up the look of the entire company brand with a dark theme that their engineer users would find more attractive and modern. I began by constructing a new style guide and components to test out in the Phase 2 designs of the Playground.
With the release of the redesign, we saw a big bump in user engagement and the number of apps created on the platform. Average session duration increased from under 5 minutes to about 10 minutes, while app creation increased by over 300%.
With these metrics, I would say the redesign was successful in addressing the problems at hand. I would've loved to test with more users to optimize the Playground and platform even further.
•
300% increase in conversion rate to app creation from the Playground.
•
2x increase in average session duration on the Playground.
If I had the opportunity, it would’ve been great to further track metrics like click-through rate and session duration to help optimize the effectiveness of the design on a larger scale.
Designing for this product gave me a peek into the world of designing for accessibility, something I hadn't had much experience with before. The impact of accessibility in design is tremendous and it will be top of mind for me at the beginning of every project that I work on from now on. It's always great to learn and grow as a designer.