Sep 30, 2016

Google Sheets Explore: a work in progress

Spreadsheet __software that will understand natural-language questions about your data is certainly a cool concept. Microsoft Power BI's cloud service - its analytics platform, not Excel - has offered this capability for awhile, but having it percolate down to spreadsheets will put it before many more potential users.

I've been taking Google Sheets Explore for a test drive this afternoon. First thoughts? It's got a lot of potential, but at the moment I'd caution you not to depend on it to understand exactly what you mean in the same way that, say, Google search can sometimes figure out what you want even if you've misspelled a word.

Here's why. One of the sample problems Google uses in its blog post about Explore is "what are the top three items by sales price?" That's a pretty clear question and a fairly simple response: Return the three items with the largest sales price.

But what if your spreadsheet has catgories as well, and you want the three priciest items in each category? I tried to find out something similar, using a spreadsheet with a month's worth of commercial airline flights arriving in Boston. When I typed in a question asking for the top three delays by airline, I got the three most-delayed flights altogether (along with which airline was responsible for each). Typing "What were the top three ARR_DELAY grouped by Airline" didn't work. Neither did "What were the top three ARR_DELAY for each Airline?"

When I asked "What was the Mean ARR_DELAY by Airline" I received the appropriate response. "Average ARR_DELAY" worked as well. However, "What was the Median ARR_DELAY by Airline" returned the median for all flights. That's curiously inconsistent behavior, especially since Google Sheets pivot tables can calculate medians by group. For those who were hoping Explore would be a simple way to type out natural-language instructions whenever you want a pivot table, the platform isn't quite there, at least yet.

Next up: counts by group. "What is the COUNT of each Airline" just gave me the total number of rows in my spreadsheet. So did "What is the count by Airline." If I typed in "Airline," I got a list of unique Airlines in the data.

Was this an impossible request? No, it turns out that "Distribution of Airline" is the correct phrase to get a count grouped by Airline ("Airline Distribution," however, couldn't be understood.). "Count of [some other column] by Airline" also ended up giving me a count of records for each Airline.

I'll be the first to admit that "Distribution of [variable of my choice]" is a lot easier to remember than creating a new pivot table. But if Explore is truly designed to be "natural language" as opposed to a simplified SQL requiring specific syntax, it needs some better understanding.

To be fair, Power BI has similar problems understanding my queries, although it sometimes seems to provide a bit more guidance while developing questions. Being a BI tool, it has some other advantages, such as natural-language requests for visualizations -- you can ask it to generate a graph of your results. Google Sheets Explore offers some automated visualizations of your data, and you can easily add those to your spreadsheet with a single click. But as far as I can tell, if a visualiation isn't generated in Explore automatically, there isn't a way to ask it to make you one.

But back to Explore's natural-language capabilities. One of the best tips for using this new functionality is to look carefully at the Answer text which displays just under your question and above your results. It will tell you how it interpreted your question, such as "Average of DEP_DELAY by Airline" if you asked it "What is the mean of DEP_DELAY by airline?" That's probably the best way to ensure what you've requested is also what Sheets has understood.

No comments:

Post a Comment