SCISPOT (YC S21) INTERNSHIP

Building a smart data import tool for scientists

CONTEXT

Scispot (YC S21) is a platform where wet-lab scientists can automate their laboratory operations.

They’ve built a tool called “Labsheets” (think spreadsheets) where users can manage their sample, inventory, and instrument data in one place.

Role

Product Manager/Designer (Me)

Time

3 weeks

So, why is CSV import so important to scientists?

In lab workflows, scientists use many different tools, such as LIMS (laboratory information management systems), instruments, and analysis tools (Python, R). Their data becomes scattered across these sources.

CSV import is the common denominator between these tools. Scientists want to be able to update, merge, and consolidate their sample data through bulk CSV imports.

CURRENT PROCESS

THE PROBLEM

Scientists struggle to bulk-import their CSV data into existing labsheets due to data mismatches and incompatibilities on Scispot.

The current experience lacks proper error handling and user feedback, resulting in scientists completing imports with missing data.

MY IMPACT

As the sole designer on this project, I led product scoping and the redesign of a brand-new data ingestion tool.

V1 shipped to 100+ labs on Scispot's platform

Cut import time by 30%

The current import flow lacked core functionalities that scientists needed.

I went through the CSV import process, audited the flow, and discovered three core issues.

CURRENT PROCESS

USER FLOW

I simplified the flow of importing a CSV into an existing labsheet, while adding in the missing steps for creating new columns.

I changed the entry point for importing a CSV from the home page to the selected labsheet, reducing friction in the user flow. In addition, I worked with the engineers to scope a data quality check step that verifies CSV column compatibility.

BEFORE

AFTER
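The data quality check we scoped could work roughly like the sketch below. The schema, column names, and helper functions here are hypothetical illustrations, not Scispot's actual implementation:

```python
# Hypothetical labsheet schema: column name -> expected Python type.
# The real Scispot schema and type system may differ.
LABSHEET_SCHEMA = {"Sample ID": str, "Volume (mL)": float, "Passage": int}

def parses_as(value, typ):
    """Check whether a raw CSV string can be read as the expected type."""
    try:
        typ(value)
        return True
    except ValueError:
        return False

def check_compatibility(rows, schema):
    """Return, per column, the values that don't match the expected type."""
    issues = {}
    for col, typ in schema.items():
        if typ is str:
            continue  # any string is a valid string
        bad = [row[col] for row in rows
               if col in row and not parses_as(row[col], typ)]
        if bad:
            issues[col] = bad
    return issues

rows = [
    {"Sample ID": "S-001", "Volume (mL)": "1.5", "Passage": "3"},
    {"Sample ID": "S-002", "Volume (mL)": "n/a", "Passage": "4"},
]
print(check_compatibility(rows, LABSHEET_SCHEMA))
# {'Volume (mL)': ['n/a']}
```

A check like this lets the import surface problems per column before anything is written to the labsheet.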

CHALLENGE #1

How might scientists review and map their CSV columns?

As scientists import dozens of CSV columns at a time, my engineers and I explored using AI to automatically map each CSV column to an existing labsheet column based on name and data type similarity. I first explored letting scientists review and accept each AI-recommended mapping. But, as this became tedious at a larger scale, I pivoted to displaying all the mappings in a table.
VERSION 1
Reviewing and accepting mapped pairs
This became time-consuming at a larger scale, when scientists deal with dozens of columns.
CHOSEN
Mapping column pairs in a table
Easily scannable for scientists to review each pair quickly at a large scale.
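To make the idea concrete, here's a minimal sketch of name-based column matching using fuzzy string similarity. The real feature also weighed data type similarity and may use a different model entirely; the function names, threshold, and column names below are assumptions for illustration:

```python
from difflib import SequenceMatcher

def name_similarity(a, b):
    """Fuzzy string similarity in [0, 1], ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def suggest_mappings(csv_columns, labsheet_columns, threshold=0.6):
    """Greedily pair each CSV column with its best-matching labsheet column."""
    mappings = {}
    available = set(labsheet_columns)
    for col in csv_columns:
        best = max(available, key=lambda lab: name_similarity(col, lab),
                   default=None)
        if best is not None and name_similarity(col, best) >= threshold:
            mappings[col] = best
            available.remove(best)  # each labsheet column is used once
        else:
            mappings[col] = None  # surfaced in the UI as "Don't Import"
    return mappings

print(suggest_mappings(
    ["sample_id", "volume_ml", "notes"],
    ["Sample ID", "Volume (mL)", "Notes"],
))
# {'sample_id': 'Sample ID', 'volume_ml': 'Volume (mL)', 'notes': 'Notes'}
```

Pre-computed suggestions like these are what the mapping table displays, so scientists only scan and correct rather than map every column by hand.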
Since there's a use case where scientists might not want to import a column, I created an action button that lets scientists leave columns out of the import. Initially, in Version 1, I explored letting scientists check off columns. However, it was not clear that the deselected columns would not be imported. Thus, in Version 2, I added a dropdown menu with a "Don't Import" option.

VERSION 1

Selecting imported columns

CHOSEN

Dropdown to not import columns

CHALLENGE #2: ERROR HANDLING

How might we inform scientists of data errors during the import process?

Structured, clean data is incredibly important for scientists to minimize errors and misinterpretation during the analysis phase. Thus, I scoped out handling for the two main types of errors scientists experience.

TYPE #1 ERROR

Blocking error when the ID column is not mapped.

TYPE #2 ERROR

Warning that data may be lost due to row errors.

Type 1: Global blocking error

For this type of error, scientists can't proceed until they map a CSV column to the ID column. Thus, I chose to add a red banner at the top of the modal to catch scientists' attention and reflect the error's severity.
CHOSEN
Error banner

Type 2: Warning scientists that incompatible data won't be imported

As this is a non-blocking error, I chose to use a warning icon with a tooltip signaling which rows will be lost. The scientist can either ignore the warning or go back into their CSV and fix the data.
VERSION 1
Flagging the percentage of valid data
CHOSEN
Specifying rows that cannot be imported
Initially, I showed only the count of rows that would be lost. Following feedback, I added an example row to provide context, helping scientists see why those rows are invalid and excluded from import.
VERSION 1
CHOSEN
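The row-level check behind this warning could look something like the sketch below: split rows into importable and invalid, then build the warning copy from the count plus one example row. The schema format and validation rule are simplifying assumptions, not Scispot's real logic:

```python
def split_rows(rows, schema):
    """Split CSV rows into importable and invalid, given expected column types."""
    valid, invalid = [], []
    for row in rows:
        ok = True
        for col, typ in schema.items():
            if typ is str:
                continue  # any string is acceptable
            try:
                typ(row.get(col, ""))
            except (TypeError, ValueError):
                ok = False
                break
        (valid if ok else invalid).append(row)
    return valid, invalid

def build_warning(invalid):
    """Warning copy: the row count plus one concrete example for context."""
    if not invalid:
        return None
    return (f"{len(invalid)} row(s) cannot be imported. "
            f"Example: {invalid[0]}")

schema = {"Sample ID": str, "Volume (mL)": float}
rows = [
    {"Sample ID": "S-001", "Volume (mL)": "1.5"},
    {"Sample ID": "S-002", "Volume (mL)": "n/a"},
]
valid, invalid = split_rows(rows, schema)
print(build_warning(invalid))
# 1 row(s) cannot be imported. Example: {'Sample ID': 'S-002', 'Volume (mL)': 'n/a'}
```

Showing a concrete failing row, rather than just a count, is what gives scientists the context to decide whether to fix their CSV or proceed anyway.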

CHALLENGE #3

How will users be notified when their import is done?

Depending on the size of the CSV, the import process can take anywhere from 10 seconds to 1 hour. Currently, scientists are not notified when their CSV import is completed.
VERSION 1
Notifying through notifications
Notifications are better suited to persistent reminders than to transient updates.
CHOSEN
Notifying through a toast message
Provides immediate, lightweight confirmation.

FINAL DESIGNS

Re-imagining how scientists import & map their CSV data

Here is the final redesign of the CSV import flow from start to end. This was shipped to 100+ laboratories!
BEFORE
AFTER

REFLECTIONS

Grateful for the journey! Here's what I learned...

Quality means shipping fast
This was my first time working at a fast-paced, high-growth startup! I felt uncomfortable at first making so many product and design decisions at an accelerated pace. But, gradually, I learned that shipping fast and getting the design out into the world allowed us to learn what customers really think, get feedback, and iterate on the product, turning it into something people love using.
Being full-stack in product and engineering
Knowing how data is structured, stored, and queried allowed me to communicate effectively with my engineers and understand the limitations of earlier design concepts. Furthermore, in my PRD, I was able to note all the edge and error cases I'd have to consider and design for.
Let's create something great together! ♡