Monday, May 20, 2024

Applying Generative AI to product design with BigQuery DataFrames

For any company, naming a product or service is complex and time-consuming. This process is particularly challenging in the pharmaceutical industry. Typically, companies start by brainstorming and researching thousands of names. They must ensure that the names are unique, compliant with regulations, and easy to pronounce and remember. With so many factors to consider, multiplied across an entire product catalog, the process must be designed to scale.   

In this blog post, we will show how the power of data analytics and generative AI can help unleash the creative process, and accelerate testing. We will provide a step-by-step guide on how to generate potential drug names using BigQuery DataFrames. Please note that this blog post simply illustrates the concepts and does not address any regulatory requirements.

Background

Our goal in this demonstration is to generate a set of 10 brand names that can be reviewed by a panel of experts for an imaginary generic drug called “Entropofloxacin”. Drugs with the suffix -floxacin belong to the fluoroquinolones class of antibiotics.

We’ll use the text-bison model, a large language model that has been trained on a massive dataset of text and code. It can generate text, translate languages, write different kinds of creative content, and answer all kinds of questions.

We will also provide these indications & usage to the model: “Entropofloxacin is a fluoroquinolone antibiotic that is used to treat a variety of bacterial infections, including: pneumonia, streptococcus infections, salmonella infections, escherichia coli infections, and pseudomonas aeruginosa infections. It is taken by mouth or by injection. The dosage and frequency of administration will vary depending on the type of infection being treated. It should be taken for the full course of treatment, even if symptoms improve after a few days. Stopping the medication early may increase the risk of the infection coming back.”

Getting started

In case you want to follow along, we will use code from this Drug Name Generation notebook in this blog post. We will highlight key steps here, leaving some details in the notebook. 

We will be using BigQuery DataFrames to perform generative AI operations. It’s a brand new way to access BigQuery, providing a DataFrame interface that Python developers and data scientists are familiar with. It brings compute capabilities directly to your data in the Cloud, enabling you to process massive datasets. BigQuery DataFrames directly supports a wide variety of ML use cases, which we will showcase here.

Zero-shot learning

Let’s start with a base case, where we simply ask the model a question, through a prompt. No examples, no chains, just a simple request and response scenario.

First, we will need to create a prompt template. You will notice that the prompt guides the model toward the precise outcomes we’re looking for. Also, it is parameterized, so that we can easily update the parameters to try out different scenarios and settings.

```python
zero_shot_prompt = f"""Provide {NUM_NAMES} unique and modern brand names in Markdown bullet point format. Do not provide any additional explanation.

Be creative with the brand names. Don't use English words directly; use variants or invented words.

The generic name is: {GENERIC_NAME}

The indications and usage are: {USAGE}."""

print(zero_shot_prompt)
```
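One way to keep such a template easy to reuse across scenarios is to wrap it in a small function that takes each parameter explicitly. The sketch below is only illustrative: the function name `build_zero_shot_prompt` and the shortened sample values are assumptions, not code from the notebook.

```python
def build_zero_shot_prompt(num_names: int, generic_name: str, usage: str) -> str:
    """Fill the zero-shot template with one set of parameters."""
    return (
        f"Provide {num_names} unique and modern brand names in Markdown "
        "bullet point format. Do not provide any additional explanation.\n\n"
        "Be creative with the brand names. Don't use English words directly; "
        "use variants or invented words.\n\n"
        f"The generic name is: {generic_name}\n\n"
        f"The indications and usage are: {usage}."
    )


# Hypothetical sample values, shortened for illustration.
prompt = build_zero_shot_prompt(
    num_names=10,
    generic_name="Entropofloxacin",
    usage="Entropofloxacin is a fluoroquinolone antibiotic",
)
print(prompt)
```

Passing the parameters as arguments, rather than reading module-level constants, makes it simpler to generate several prompt variants side by side.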

We can submit our prompt to the model using the `model.predict()` function, which takes a dataframe as input. Because our simple scenario has a single string input and a single string output, we've created a helper function that builds a dataframe from the input string and extracts the string value from the returned dataframe. The helper includes an optional temperature parameter to control the degree of randomness, which can be helpful in a creative context.

```python
def predict(prompt: str, temperature: float = TEMPERATURE) -> str:
    # Create dataframe
    input = bigframes.pandas.DataFrame(
        {
            "prompt": [prompt],
        }
    )

    # Return response
    return model.predict(input, temperature).ml_generate_text_llm_result.iloc[0]
```
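To build some intuition for what the temperature setting does, the sketch below applies temperature scaling to a toy set of scores. This is a general illustration of how sampling temperature is commonly defined, not a description of how text-bison works internally; the function name and the scores are made up for the example.

```python
import math


def softmax_with_temperature(scores: list[float], temperature: float) -> list[float]:
    """Convert raw scores to probabilities, scaled by a positive temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [s / temperature for s in scores]
    # Subtract the maximum for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_scores = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
low = softmax_with_temperature(toy_scores, temperature=0.2)
high = softmax_with_temperature(toy_scores, temperature=1.0)

# A lower temperature concentrates probability on the top candidate;
# a higher temperature spreads it out, which yields more varied output.
print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

In a naming task, a higher temperature tends to trade consistency for variety, which is why it is worth exposing as a parameter.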

To get a response, we first need to create a model reference using a BigQuery connection. Then we can pass the prompt to our helper method.

```python
# Imports used in this step (connection_name is defined in the notebook setup)
import bigframes.pandas
from bigframes.ml.llm import PaLM2TextGenerator
from IPython.display import Markdown

# Get BigFrames session
session = bigframes.pandas.get_global_session()

# Define the model
model = PaLM2TextGenerator(session=session, connection_name=connection_name)

# Invoke LLM with prompt
response = predict(zero_shot_prompt)

# Print results as Markdown
Markdown(response)
```

And now, the exciting part. Here are several responses we get:

```
Xylocin
Zervox
Zarox
Zeroxy
Xerozid
```

These names might work! You might notice that the names are very similar, but that might not actually be a problem. According to “The art and science of naming drugs”: “The letters ‘X,’ ‘Y’ and ‘Z’ often appear in brand names because they give a drug a high-tech, sciency sounding name (Xanax, Xyrem, Zosyn). Conversely, ‘H,’ ‘J’ and ‘W’ are sometimes avoided because they are difficult to pronounce in some languages.”
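Since the prompt asks for Markdown bullets, a response can be turned into a list of candidates and screened against the letters mentioned in the quote. The following is a minimal sketch: the helper names are hypothetical, and the sample text simply reuses the names listed earlier in bullet form.

```python
def parse_bullets(response: str) -> list[str]:
    """Extract names from lines that start with a Markdown bullet marker."""
    names = []
    for line in response.splitlines():
        stripped = line.strip()
        if stripped.startswith(("- ", "* ", "+ ")):
            names.append(stripped[2:].strip())
    return names


def has_hard_letter(name: str, avoided: str = "HJW") -> bool:
    """Report whether a name contains one of the letters the quote says are avoided."""
    return any(ch in avoided for ch in name.upper())


# Sample response text, reusing the names shown earlier as Markdown bullets.
sample_response = "* Xylocin\n* Zervox\n* Zarox\n* Zeroxy\n* Xerozid"
candidates = parse_bullets(sample_response)
flagged = [name for name in candidates if has_hard_letter(name)]
print(candidates)
print(flagged)
```

A check like this is only a first-pass filter; the expert panel still makes the real decision.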

Few-shot learning

Next, let’s try expanding on this base case by providing a few examples. This is referred to as few-shot learning, in which the examples provide a little more context to help shape the answer. It’s like providing some training data without retraining the whole model.

Fortunately, there is a public BigQuery FDA dataset available at `bigquery-public-data.fda_drug` that can help us with this task!

We can easily extract a few useful columns from the dataset into a dataframe using BigFrames:

```python
df = bpd.read_gbq("bigquery-public-data.fda_drug.drug_label",
                  col_order=["openfda_generic_name",
                             "openfda_brand_name",
                             "indications_and_usage"])
```

And it’s straightforward to sample the dataset for a few useful examples. Let’s run this code and peek at what we want to include in our prompt.

```python
# Take a sample and convert to a Pandas dataframe for local usage.
df_examples = df.sample(NUM_EXAMPLES).to_pandas()

df_examples
```

We can create a more sophisticated prompt with 3 components:

1. General instructions (e.g. generate n brand names)
2. Multiple examples, taken from the sample of the FDA dataset above
3. Information about the drug we’d like to generate a name for (entropofloxacin)

Our prompt will now look like this, truncating some sections for readability:

Provide 10 unique and modern brand names in Markdown bullet point format, related to the drug at the bottom of this prompt.

Be creative with the brand names. Don’t use English words directly; use variants or invented words.

First, we will provide 3 examples to help with your thought process.

Then, we will provide the generic name and usage for the drug we’d like you to generate brand names for.
Generic name: BUPRENORPHINE HYDROCHLORIDE
Usage: 1 INDICATIONS AND USAGE BELBUCA is indicated for the management of pain…
Brand name: Belbuca

Generic name: DROSPIRENONE/ETHINYL ESTRADIOL/LEVOMEFOLATE CALCIUM AND LEVOMEFOLATE CALCIUM
Usage: 1 INDICATIONS AND USAGE Safyral is an estrogen/progestin COC containing a folate…
Brand name: Safyral

Generic name: FLUOCINOLONE ACETONIDE
Usage: INDICATIONS AND USAGE SYNALAR® Solution is indicated for the relief of the inflammatory and pruritic manifestations of corticosteroid-responsive dermatoses.
Brand name: Synalar

Generic name: Entropofloxacin
Usage: Entropofloxacin is a fluoroquinolone antibiotic that is used to treat a variety of bacterial…
Brand names:
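A prompt of this shape could be assembled from the sampled rows with a small helper. The sketch below is an assumption about how one might do it, not the notebook's code: the function name and the dictionary keys (`generic_name`, `usage`, `brand_name`) are hypothetical, and the truncated usage strings are copied as shown above.

```python
def build_few_shot_prompt(
    num_names: int,
    examples: list[dict[str, str]],
    generic_name: str,
    usage: str,
) -> str:
    """Assemble instructions, worked examples, and the target drug into one prompt."""
    instructions = (
        f"Provide {num_names} unique and modern brand names in Markdown bullet "
        "point format, related to the drug at the bottom of this prompt.\n\n"
        "Be creative with the brand names. Don't use English words directly; "
        "use variants or invented words.\n\n"
        f"First, we will provide {len(examples)} examples to help with your "
        "thought process.\n\n"
        "Then, we will provide the generic name and usage for the drug we'd "
        "like you to generate brand names for."
    )
    example_blocks = [
        f"Generic name: {ex['generic_name']}\n"
        f"Usage: {ex['usage']}\n"
        f"Brand name: {ex['brand_name']}"
        for ex in examples
    ]
    # The trailing "Brand names:" line invites the model to complete the list.
    target = f"Generic name: {generic_name}\nUsage: {usage}\nBrand names:"
    return "\n\n".join([instructions, *example_blocks, target])


# Two of the examples shown above, with their usage text left truncated.
sample_examples = [
    {
        "generic_name": "BUPRENORPHINE HYDROCHLORIDE",
        "usage": "1 INDICATIONS AND USAGE BELBUCA is indicated for the management of pain…",
        "brand_name": "Belbuca",
    },
    {
        "generic_name": "FLUOCINOLONE ACETONIDE",
        "usage": "INDICATIONS AND USAGE SYNALAR® Solution is indicated for the relief of…",
        "brand_name": "Synalar",
    },
]
few_shot_prompt = build_few_shot_prompt(
    num_names=10,
    examples=sample_examples,
    generic_name="Entropofloxacin",
    usage="Entropofloxacin is a fluoroquinolone antibiotic that is used to treat a variety of bacterial…",
)
print(few_shot_prompt)
```

Keeping every example in the same three-line layout as the final block gives the model a consistent pattern to continue.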

With this prompt, we see a much different set of brand names generated. With the examples included, we see that the model is anchored on the generic name.

```
Entrol
Entromycin
Entrozol
Entroflox
Entroxil
Entrosyn
```

Bulk generation

Now that we’ve learned the fundamentals of prompts & responses with BigQuery DataFrames, let’s explore generating names at scale. How can you generate candidate names when you have thousands of products? We can perform multiple operations in the Cloud without bringing the data into local memory within the notebook.

Let’s start with querying for drugs that don’t have a brand name in the FDA dataset. Technically, we are querying for drugs where the brand name and generic name match.

```python
# Query 3 columns of interest from drug label dataset
df_missing = bpd.read_gbq("bigquery-public-data.fda_drug.drug_label",
                          col_order=["openfda_generic_name",
                                     "openfda_brand_name",
                                     "indications_and_usage"])

# Exclude any rows with missing data
df_missing = df_missing.dropna()

# Include rows in which openfda_brand_name equals openfda_generic_name
df_missing = df_missing[
    df_missing["openfda_generic_name"] == df_missing["openfda_brand_name"]]
```

We’ll pass a whole dataframe column of prompts to BigFrames instead of a single string prompt. Let’s look at how we could construct this column.

```python
df_missing["prompt"] = (
    # Note the trailing space so the two sentences don't run together.
    "Provide a unique and modern brand name related to this pharmaceutical drug. "
    + "Don't use English words directly; use variants or invented words. The generic name is: "
    + df_missing["openfda_generic_name"]
    + ". The indications and usage are: "
    + df_missing["indications_and_usage"]
    + "."
)
```

Next, let’s create a new helper function for batch prediction. We’ll use the column as-is without any transformation from/to strings.

```python
def batch_predict(
    input: bigframes.pandas.DataFrame, temperature: float = TEMPERATURE
) -> bigframes.pandas.DataFrame:
    return model.predict(input, temperature).ml_generate_text_llm_result


response = batch_predict(df_missing["prompt"])
```

After the operation completes, let’s take a look at one of the generated brand names for “alcohol free hand sanitizer”:

**Sani-Tize**

This is a modern and unique brand name for an alcohol-free hand sanitizer. It is derived from the words “sanitize” and “tize”, which give it a scientific and technical feel. The name is also easy to spell and pronounce, making it memorable and easy to market.
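Because the bulk prompt does not ask the model to suppress explanations, responses like this one mix the name with commentary. If only the name is needed downstream, a small post-processing step could pull it out. The sketch below assumes the name appears in bold, as in this particular response; other responses may not follow that format, and the helper name is hypothetical.

```python
import re
from typing import Optional


def extract_bold_name(response: str) -> Optional[str]:
    """Return the first **bold** span in a response, or None if there is none."""
    match = re.search(r"\*\*(.+?)\*\*", response)
    return match.group(1).strip() if match else None


sample = (
    "**Sani-Tize**\n\n"
    "This is a modern and unique brand name for an alcohol-free hand sanitizer."
)
print(extract_bold_name(sample))
print(extract_bold_name("No bold text here"))
```

Returning `None` for unmatched responses makes it easy to spot rows that need a different parsing rule or a tighter prompt.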

In this scenario, we saw that Generative AI is a powerful tool for accelerating the branding process. While we walked through a pharmaceutical drug name scenario, these concepts could be applied to any industry. We also saw that BigQuery puts all of the tools in one place for multiple prompting styles, all with an intuitive DataFrame interface.

Enjoy applying these creative tools to your next project! For more information, feel free to check out the quickstart documentation.
