Bryan Helmig on building AI at Zapier
AI lets us build automations which we could never have built through traditional programming. Because AI is able to process unstructured information in a common sense way, a whole new class of tasks can now be automated for the first time. AI is also empowering; because the main interface to AI is natural language, anyone can build automations without any programming experience.
Bryan Helmig, co-founder and CTO of Zapier, joins Anton to discuss how Zapier is using AI to automate workflows for businesses. Bryan shares how Zapier is making AI accessible to non-technical users and how they are building tools that help users iterate on their prompts and automations. They also discuss the challenges of building AI tools for non-technical users and how to collect and evaluate feedback on AI features.
Released Aug 20, 2024, Recorded July 23, 2024
Transcript

Bryan on X (Twitter): @bryanhelmig

Anton on X (Twitter): @atroyn


Bryan

It's easy with this... At least with this tech, I don't know how you feel. I can sit down and just come up with a million things that I want to try because there's so much interesting stuff that these models can do. A lot of it we just haven't really discovered.

Anton

We have to build this iteration into our interfaces somehow. I think that's really fundamental. I think it's something that's getting missed a lot, right now. This iterative interface is everywhere in AI. Every organization probably has dozens to hundreds of things that could be automated today that just aren't being because nobody's really tried.

Bryan

We just want to give people a little spark, a bunch of sparks, and then it's so cheap to generate all these different things that if you can put them in front of a user, then they can jump off on that. They can say, oh, that's interesting. It's maybe not 100% what I want, but that's pretty close and I can kind of play with that.

Anton

It's been the reality in the last few years that to work with AI, you practically needed a PhD in it.

Bryan

Right.

Anton

Right?

Bryan

Right.

Anton

Or at the very least, you needed to understand how to do training runs and how to run all this [inaudible 00:00:54]. And now we live in a world where everybody has access to leading-edge models over an API. But there's that intimidation and trepidation.

Bryan

Totally, yeah.

Anton

So one thing we're doing with Chroma is helping software developers on board into building applications with AI and getting rid of some of those barriers to entry, including the product that we're building, but also through developer education, which is what this is really about. A great place to start again is feel free to introduce yourself and we can get into it.

Bryan

Sure. Cool. So, hey, I'm Bryan, co-founder and CTO of Zapier, a workflow automation platform for SMBs up to enterprises. So anything you need to automate, yeah, we help you do that without needing to write any code. You can if you want to, for a nerd like myself.

Anton

Look, and this again, despite being a founder of an AI company, I have probably one of the most boring opinions about AI in San Francisco or Silicon Valley, which is that I think AI is great for business process automation.

Bryan

Sure.

Anton

Which is why I'm very excited to chat with you because I would love to learn, first of all, what is Zapier doing in AI for business process automation? What sort of things are you guys building?

Bryan

Yeah, so a lot of it goes to a lot of the common stuff that you'll see. So summarize, extract data, etc. But the cool thing about Zapier is people come up with all kinds of wacky things that you have a hard time predicting. So people often kind of customize it and tweak it. So that's a big thing that we see people doing. But it runs the gamut of use cases like extracting, like asking questions. There's some lightweight RAG stuff people do, respond to this, rewrite this sort of a thing, write a different title for this email or for this blog post, give feedback on this, categorize this into one of these five categories with descriptors and examples. So people do all kinds of things like that. I think most people are doing a ChatGPT conversation sort of action with it, and I forget exactly how many, but it's like tens of millions of tasks each month going through that. So it's pretty popular and people are doing all kinds of interesting stuff with it.

Anton

Yeah. And again, one of the great things about it is the models are flexible. It's one model to perform many tasks as opposed to a specialized model per task, which we had before, where you have one model doing OCR, one model doing sentiment analysis. No, it's everything. Just feed it over to the LLM.

Bryan

The feedback loop is actually way more amenable to... Because you kind of see the output and you're like, that's not what I wanted. So you just go back to the prompt and add a little rule and say, nah, but when you see X, don't do Y, do Z.

Anton

Yeah, I really want to touch on that in the later part of this conversation. Because one of the things I really think about for application developers in AI is that you can build much more iterative interfaces for your users than you could before.

Bryan

Totally.

Anton

So stepping back just a little bit. So obviously Zapier is about giving you low-code ability to build these types of automations, right? How does Zapier present AI as a building block of those automations? What does it look like for the user, first of all?

Bryan

Yeah. So I mean to even step back further, Zapier at its most simple form is a trigger and an action, right? So when something happens over here, go do something over here. But you can string these out, you can do many things, you can do paths and you can do all kinds of crazy stuff. But what we see people do with AI, with a [inaudible 00:04:08] like ChatGPT, create conversation or engage conversation sort of action. It's kind of what you would do in the ChatGPT app, except you just drop it into a Zap. So you might get a new GitHub commit or a PR or something come in, and then you would have your second step, take some of that data and then prompt it and then output some other data and then use it to send a Slack message, you know what I mean? So we have people doing things like that. We have people combining stuff. Like if this contains intent to purchase, return true. And then another step will filter it and say, if it's true, send it to Slack. If it's false, send it to the normal support channel. So you have people combining and chaining these things together. And then I do a lot of stuff where I'm writing little code steps to go into it, right? So it's fun to kind of combine that. You get the AI doing all this kind of wacky stuff and then, oh, I'm just going to write a little bit of Python or JavaScript to do something within that. So that's kind of what I would see is the kind of common ways people use it. They just slap it into the Zap, into the workflow.
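
To make this concrete for anyone who wants to build the same chain outside of Zapier, here is a rough sketch in Python of the "classify, filter, route" pattern Bryan describes. It assumes the OpenAI Python SDK and a couple of Slack incoming webhooks; the URLs, model name, and prompt wording are illustrative, not Zapier internals.

```python
# Rough sketch of the "classify, filter, route" chain described above.
# Assumes the OpenAI Python SDK (>= 1.0) with OPENAI_API_KEY set; webhook URLs,
# model name, and prompt wording are illustrative assumptions.
import requests
from openai import OpenAI

client = OpenAI()
SALES_WEBHOOK = "https://hooks.slack.com/services/T000/B000/sales"      # hypothetical
SUPPORT_WEBHOOK = "https://hooks.slack.com/services/T000/B000/support"  # hypothetical


def contains_purchase_intent(message: str) -> bool:
    """Second step: ask the model a true/false question and coerce the answer."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer only 'true' or 'false'."},
            {"role": "user", "content": f"Does this message contain intent to purchase?\n\n{message}"},
        ],
        temperature=0,
    )
    return response.choices[0].message.content.strip().lower().startswith("true")


def handle_incoming(message: str) -> None:
    """Filter step: route based on the model's answer, like the Zap described above."""
    webhook = SALES_WEBHOOK if contains_purchase_intent(message) else SUPPORT_WEBHOOK
    requests.post(webhook, json={"text": message})
```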

Anton

What does that look like for the user? Okay, so I want to build a Zap with... And I want part of the processing pipeline to do something with an LLM. What do you present to the user who wants to do that? How does Zapier support that?

Bryan

Yeah, so what you would see is very similar, like if you've ever looked at some of the API docs for ChatGPT where you're allowed to-

Anton

Extensively.

Bryan

Right? Yeah. They have this list of messages. We do something kind of similar to that. We kind of shrink it down a bit where a lot of times you just get a system prompt and then a user message, a message you want to send in. And another thing that we offer is a memory key, right? So you can type in some... It can be a global identifier or you can pass in a user's email address. And then as you continue to have that kind of conversation, we remember the past responses and we slide them in, inject them in so that it has this context that's building up, right? So that right there is 80% of the utility people get. But then of course we expose the temperature setting, so you can say more or less deterministic. We offer top-p, top-k. So you can kind of mess around with some of these things, but a lot of times what users are doing is just like, hey, here's a system prompt. Here's the user message I want to send across. And then we get that response back. You can do what you need to do with it in the following step. So it's like, with a Zap, you have trigger, action, action, action, and then you can just add a ChatGPT action in the middle, or... I forget how many, we got a dozen different AI providers. So Anthropic, we use Claude. I've been in love with Claude 3.5. I've been using that a ton. So I use that a bunch.
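
A minimal sketch of how a memory key like this could work if you built it yourself: keep per-key message history and slide it into each call. The in-memory dict, the 20-message window, and the model name are illustrative choices, not Zapier's implementation.

```python
# Sketch of the memory-key pattern: per-key conversation history injected on each call.
# The in-memory store and trimming policy are illustrative assumptions.
from collections import defaultdict
from openai import OpenAI

client = OpenAI()
history = defaultdict(list)  # memory_key -> list of prior messages


def run_step(memory_key: str, system_prompt: str, user_message: str, temperature: float = 0.7) -> str:
    messages = [{"role": "system", "content": system_prompt}]
    messages += history[memory_key][-20:]  # slide in recent context, oldest dropped
    messages.append({"role": "user", "content": user_message})

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        temperature=temperature,  # exposed much like the temperature setting mentioned above
    )
    reply = response.choices[0].message.content

    # Remember both sides of the exchange so the next run has the building context.
    history[memory_key] += [
        {"role": "user", "content": user_message},
        {"role": "assistant", "content": reply},
    ]
    return reply
```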

Anton

Super powerful.

Bryan

Super powerful. So there's lots of ways you can kind of swap these in and out. We have a couple more actions, and we're trying to do more work on some actions that are a little less low-level like that, a little more kind of, describe your task, get a little feedback on the prompt. You can kind of give ratings and things of that nature. And then we just kind of help train it over time to-

Anton

So that's early.

Bryan

To just make it a little bit easy. Yeah, it's a little bit earlier, but we're hoping to get more of these tools. Because a lot of times, even though using an LLM is dead simple, it still is kind of... It trips people up, right?

Anton

Oh, 100%.

Bryan

It kind of puts them off. They think it's more complicated than it is, or even that really simple feedback loop of, oh, that was a little confusing. I wish it wouldn't do that. I'm going to update the prompt. And just going back and forth, that little ping pong, until you kind of get that under your belt, it kind of feels a little foreign.

Anton

That's right.

Bryan

So offering a few more primitives of like, Hey, here's some feedback on your prompt. Maybe you should include some examples, be more specific, break it down, and then kind of ratings that would help kind of train and say, Hey, this was great. This could have used some work, and here's what. And then that in context learning can happen. And then it just gets better. And it wasn't like you had to do anything really crazy to spin up a training cluster.
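
One plausible way to wire those ratings back into the prompt so in-context learning can kick in, sketched below; the storage and the "last five thumbs-ups" policy are assumptions for illustration, not a description of Zapier's feature.

```python
# Sketch: turn positively rated runs into few-shot examples for later prompts.
# The rating store and the "take the five most recent" policy are illustrative assumptions.
rated_examples: list[dict] = []  # filled in whenever a user gives a thumbs up


def record_rating(user_input: str, model_output: str, thumbs_up: bool) -> None:
    if thumbs_up:
        rated_examples.append({"input": user_input, "output": model_output})


def build_messages(system_prompt: str, user_message: str) -> list[dict]:
    """Prepend approved examples so the model can pick up the pattern in context."""
    messages = [{"role": "system", "content": system_prompt}]
    for example in rated_examples[-5:]:
        messages.append({"role": "user", "content": example["input"]})
        messages.append({"role": "assistant", "content": example["output"]})
    messages.append({"role": "user", "content": user_message})
    return messages
```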

Anton

Yeah, of course.

Bryan

Any of that stuff, it's like I'm just like, you can be a non-technical or semi-technical person and just kind of-

Anton

It's kind of the beauty of this stuff, right? I always think about how many more people are now empowered to automate parts of their workflow than could ever have been before. Let's get down to a few practical questions, right? For this type of application. So first of all, who's paying for the model calls? Are they giving you an API key? What's going on there?

Bryan

So it's kind of changing. They used to be a lot more expensive, and now GPT-4o mini launched last week, and then I think as we were taping, all the news was dropping about Llama 3.1.

Anton

That's right. It's huge.

Bryan

So these costs are getting crushed. So I think we're going to be taking on more of that and you'll see more of us just like, Hey, it is just included.

Anton

But right now.

Bryan

Right now, a lot of it is you provide your own API key for OpenAI or Anthropic or Google or whatever.

Anton

How do you manage that? Because I think a lot of people would like to build applications that ask their users for that API key, but managing and storing them securely is always an issue when you have some third-party thing. Of course, Zapier has pretty much the most experience with this.

Bryan

Yeah, we do. We have a lot of security practices around that where it's sort of turnkey, so you add your... A lot of it's OAuth. I don't think a lot of these providers, the model providers, offer OAuth yet. It's still very API key oriented.

Anton

It's really, frankly, quite primitive still.

Bryan

Yeah, still pretty straightforward copy-paste in there, which can still trip people up. But yeah, we store that and then we just reuse that. So as we need to sign requests that have to go out the door to the API, we sign that. We have some kind of new beta products like our AI Actions, and this link sort of product that we're working on where we can offer more of these things out of the box to developers. But again, that's kind of early-stage sort of stuff where, oh, I don't want to manage these API keys, can someone else that deals with APIs just manage that so I can do all the other stuff with my users. So that's kind of forthcoming work that we hope to get in front of folks. But yeah, you got to accept those API keys right now. That's just kind of the game.

Anton

It's funny that you say, oh, we're trying this. It's so early. Sometimes I wonder if we're doing this series a little too early because literally everyone's like, yeah, I don't know. We're figuring it out.

Bryan

I know, it's so true. I mean, we were a launch partner with the... I forget exactly what OpenAI called them, but they were the GPT actions, the precursor to the GPTs, and we were offering a bunch of our actions there. So we've been very early, but things changed so fast. I think it was like, wow, this is really cool and people are using it, and then we're doing GPTs now, and they kind of just jumped off to this other thing. So a lot of this stuff is churning really quickly as people explore and try new things. So a lot of this is, hey, we put it in front of customers, they tell us, oh, this really works, or, not as exciting as we thought, we're doing this now. And we just have to kind of change just like everyone does, I think.

Anton

Yeah, and I guess that's why... Hearing, hey, we're figuring this out, from people who really have been working at it for a while, it should be inspiring to people. It should be like, hey, literally you can be the person who figures out how to do this because nobody really knows [inaudible 00:10:55].

Bryan

Totally. Oh, yeah. Yeah, absolutely.

Anton

So we talked about some practicalities. We talked about storing and managing API keys for people. Zapier is obviously really great at that because you plug into so many different services. And again, as we briefly touched on, the models are so general purpose and that's what makes them very valuable. But at the same time, we're building tooling for people who are not necessarily very code-oriented, for whom even getting an API key might be something that they've never done before, right? How do you steer them into building something useful with this? Is it just a matter of providing them with a lot of examples? How do you do that? How do you show them that, hey, this makes your life easier?

Bryan

Yeah, that's a really good question. A lot of it requires a little bit of a spark from the user side, and that I think is a bit challenging. But there are a bunch of use cases that we've published, a bunch of examples that we've published, a lot of content we've published, and then there's a bunch out there. I mean, even if you were to go look at Anthropic's or OpenAI's guides, they have some really good stuff where they talk through the different use cases, different prompt examples, that'll be useful to get you started. But it's difficult to give global advice because everybody that uses it is going to need to kind of shimmy it. They can start with a generic thing, but the beauty is, like we were talking about, you can get your own custom AI model by just tweaking this, tweaking that, and it's just text, right? So even though it may seem a little intimidating, at the end of the day, it's just a blob of text. And then when it doesn't do exactly what you want, you kind of say, not that, more like this. And once you get that kind of loop under your belt, you're really kind of off to the races. So that's, I think, maybe the key thing we try to get our users onto.

Anton

Is to iterate.

Bryan

Is to iterate, to find just a tiny little wedge where it's like, oh, I wonder if I could do that. They get their first response and it's not great, but they're like, okay, I'm going to update this and change it to where I add a rule that says never do this, always do this. And then they do it again and oh, it's a little bit better. And then they just do that over and over and over again. And then before you know it, you've got a prompt that you've engineered, right? And it's kind of doing what you want it to do. It's not perfect. Maybe it's 80, 90% there, but you're off to the races. So I think that's quite a core loop of just getting the user into just the-

Anton

That iterative mindset.

Bryan

The iterative mindset is more important than-

Anton

You don't need to know what in context learning is, right? This is the thing, because of our history in AI of being in these research labs, we're very jargon-laden.

Bryan

Yeah, that's true. Yeah. It's just like examples, right?

Anton

Just show it examples-

Bryan

Just show it a bunch of examples, right? And then it's like, okay, I kind of get what you're going for here, and it can kind of pick up on that. So that's the beauty of it and why it makes it awesome for non-technical folks and fun to build with. It does get frustrating whenever it doesn't quite do what you want no matter how many times you ask it. But that stuff will get resolved in time too.

Anton

I mean, I really think about this, right? About why it's frustrating to use, and when I think about software in general, because my favorite way... Because I really get worried anytime we anthropomorphize the model. It's not human, but it's really easy to project a theory of mind onto it as if it was human. And that's what makes it frustrating. Because it's like you can do all this stuff, why can't you understand what I'm asking you, right? Because you can do all this other stuff and you understand it perfectly well. It's like you get frustrated teaching maybe a child math or something and they just don't get it. And I think thinking about it as like, if this is a general-purpose information processing element that can do unstructured things in a common-sense way, and you're trying to push it into your version of common sense. And compared to a software bug, a software bug is 100% your fault as a programmer and maybe the fault of the person who developed the API and made it bad, but ultimately it's on you. You have the wrong mental model of how the system works. [inaudible 00:14:35], nobody has a complete mental model of how these systems work, and we're really exploring them still. To tie that back into the user-facing piece of what Zapier is doing, how do you guide people into that iterative loop? How do you point out that's what they should be doing? And you talked a little bit about it before with some of these new tools, but what is there now?

Bryan

I mean, that's something we got to get better at, right? People who use something like ChatGPT and they ask something and they edit it and add a little more detail and they get a little more detail. People who've done that kind of intuitively get it and they know, oh, when I test it, let me go back and update it. Actually presenting that as a core UX primitive is something we got to work on still.

Anton

Got you.

Bryan

We haven't cracked that. That's something we got to improve, but we have a couple of things that we're putting in front of users that I think are helping. But again, nobody knows all the answers to this. So anyone that's watching this can be figuring this stuff out right alongside of us, right? I mean, there's some best practices that are showing up, but the shelf life is so short for them because they're so contextual, and you try them and maybe they work for you, maybe they don't. But I mean, we've definitely found some best practices in how to develop these, but it's more like how to manage the frustration, right? Than it is to solve it, right?

Anton

Let's talk about that. What are some of those things that you guys have discovered here that works, right? That you think would work for your users as well?

Bryan

Yeah, so a big one is we're big users of evals. We're starting to do-

Anton

Everyone talks about evals. I would love to hear about your evals.

Bryan

So we're trying to do more and more of this, but there is a learning curve. I mean, I guess I would say, I'm kind of curious about your thoughts, but the closest parallel that I've had for devs is it's kind of like a unit test. Except don't think of it as pass, fail. Think of it as degrees of good and bad, right? Where it's like you're trying to have a score. And even scoring it sometimes is-

Anton

Yeah. How do you evaluate? Evaluation is relatively expensive and you can say, well, we'll use the model to evaluate, but now you've got the model's ability to evaluate in question as well.

Bryan

Yeah, it gets a little tricky, but we've gotten a lot of value out of evals. One of the great things about evals and that pattern, unlike unit tests, is you can involve more of a squad, right? To work on these. You can get PMs and designers involved in this because at the end of the day, here's a text input, here's a great text output I expect. And then what does the model actually give you? Maybe some gibberish, maybe it gave you the wrong thing, maybe it gave you the wrong kind of scores. And if you can get them into something discrete like, hey, here's a number, right? And how good is this number? Those are always really good. But then measuring how similar is this text to this other text? That gets a little bit harder in that evaluation thing. But the great thing is other folks can get involved.

Anton

How do you get other members of the team involved? Because I actually completely agree with you, and this is, again, the surface area of this technology is so much larger than just software engineering, even though software engineering is driving what we can build with it. So what you're doing to get other people involved in building those evals is really important.

Bryan

Yeah. Some of the simplest things that you can do is wire up some feedback signals, right? So whenever you have in your product, you do some AI kind of generation, maybe you show one, maybe you show three, you can have... We kind of talked about this before, explicit signals, where people's thumbs up, this is great. Thumbs down, this is not good. And they tell you why. That's an explicit signal. But we're also tracking a lot of implicit signals. And maybe there's a term of art for how people like using this, but that's what we've kind of called it internally. And implicit is just, here's three things, did you click on one of them? And then go use that, or did you ignore them and say, try again, and then that's an implicit signal, like none of these were good, right? And then a lot of your time as a PM or as a data [inaudible 00:18:15], AI engineer I guess is the term now.

Anton

I think that's where we're going probably-

Bryan

I guess that's where we're going.

Anton

AI engineer, I don't know.

Bryan

But anyone could be an AI engineer. They're just prompting these things, and you can kind of just start looking through the data and saying, oh, why does it do that? That's so weird. And then you pull that in and copy that as an example, run whatever little harness you have. We use Braintrust. What we're doing is having a human pick them and select, oh, that's weird. That's great. And we're just trying to mine for those bad examples and then bring them in and then start adjusting the prompts, or sometimes what we're feeding the LLM from an upstream step. And then if we can get that score kind of up, it's just running these things as a CI job, just like a test sort of thing.

Anton

So basically once you've collected bad examples, you're doing basically LLM based eval, right?

Bryan

Yeah. Well, it kind of depends. So there are a bunch of different ways you can do the scoring. And you can really add a bunch. One thing that we've found to be successful is adding six or seven different ways to score. You'll get six or seven scores for each thing, and then as you learn more over time, you can start to tweak them and adjust them or remove them or add new ones. So it's not even just what are the inputs and outputs that you want your system to give you, which you can again have PMs or designers or anyone kind of watch, hey, here's a weird one. Can we get this into the test suite or the eval suite? That's one side of it, but the other side is often, well, how do you even evaluate these? And you can do stuff like LLM as a judge, which we've done, where you ask a stronger model, a more expensive model, to judge, and then you give it some criteria and then it gives you a score, right? And then that score is what you see and you try to make that score go up. We've done other things as simple as Levenshtein distance, how different are these strings? Kind of works. We've done custom stuff where it's, here's two blobs of JSON, these keys must be exact matches. These keys are kind of more semantic. You could do embeddings on that. How far apart are these? And so you can kind of build your own scoring mechanism depending on what you need, but you don't have to start there. You can just use some of the off-the-shelf ones. I just do a distance score on this, and if it's very similar, then it's very good. If it's not... And then kind of iterate from there as you start to get more experience.
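
A sketch of what a small stack of scorers like that might look like; the judge prompt, the 0-10 scale, and the specific scorers are assumptions, with difflib standing in for a Levenshtein-style string distance.

```python
# Sketch of scoring one eval case several different ways, as described above.
# The judge prompt, the 0-10 scale, and the choice of scorers are illustrative assumptions.
import difflib
import json
from openai import OpenAI

client = OpenAI()


def distance_score(expected: str, actual: str) -> float:
    """Cheap string-similarity score in [0, 1]; a stand-in for Levenshtein distance."""
    return difflib.SequenceMatcher(None, expected, actual).ratio()


def exact_keys_score(expected: dict, actual: dict, keys: list[str]) -> float:
    """Fraction of the must-match-exactly keys that actually match."""
    return sum(expected.get(k) == actual.get(k) for k in keys) / len(keys)


def llm_judge_score(task: str, expected: str, actual: str) -> float:
    """Ask a stronger, more expensive model to grade the output on a 0-10 scale."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": (
                f"Task: {task}\nExpected answer:\n{expected}\nActual answer:\n{actual}\n"
                "Grade the actual answer from 0 (useless) to 10 (perfect). Reply with the number only."
            ),
        }],
        temperature=0,
    )
    return float(response.choices[0].message.content.strip()) / 10


def score_case(task: str, expected_json: str, actual_json: str) -> dict:
    """Return several scores per case, so you can tweak or drop scorers over time."""
    expected, actual = json.loads(expected_json), json.loads(actual_json)
    return {
        "distance": distance_score(expected_json, actual_json),
        "exact_keys": exact_keys_score(expected, actual, keys=["category"]),
        "judge": llm_judge_score(task, expected_json, actual_json),
    }
```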

Anton

It sounds like there's pretty much no one best way, right? For any particular user.

Bryan

Not that I've found.

Anton

Yeah, makes sense. That makes sense. Let's talk about actually collecting the feedback in the first place, right? Surfacing those cases. So obviously looking at the... The very first part is either explicit or implicit signals. What do you store?

Bryan

You have to store enough to reproduce the input, right? So you got to store that. So any of that stuff that gets put into your prompt that you're going to send across these APIs, you definitely got to store that. I think for a lot of our stuff we store both the raw inputs and the final rendered prompt that we send across. That way you can kind of break it apart and use it in pieces, and you can also get kind of a quick understanding of what we actually send. Because sometimes there's just bugs in how you render the prompt, right? You might render HTML and, ah, I forgot a tag here. It broke all my stuff. You can do the same thing when you're rendering prompts in a piece of software or a feature. So we'll often track that, and then when we get that output back, we'll tag that, and usually that'll have some sort of ID, like a correlation ID that we just generate. And then whenever you click a rating button or these signals for, oh, this was good, or this wasn't, we also send that correlation-

Anton

And you're collecting a variety of signals per input?

Bryan

Variety of signals. And then after the fact, after you've kind of collected that data, you can start to visualize and play with it and pull it out. So that's the path that we've gone down. And there's tools. I mean, I think we've done our own custom event framework that we have that we plugged in. I think we use Datadog pretty extensively for some of the stuff. And I think Braintrust also we're using quite a bit. They have some logging capabilities, so you can export that as a data set and stuff. So there's a bunch of different options there. I think the main thing is just try to collect those signals, the inputs, the prompts, the responses, what you extract, some IDs, and then basically the signals, did the users use this? Did they like it? Did they tweak it? Did they ask for another iteration? Did they thumbs up or thumbs down it? And then from there you can start to dive in, get a better understanding.
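
A minimal sketch of that kind of logging, using SQLite and a generated correlation ID; the schema and field names are illustrative, not Zapier's event framework.

```python
# Sketch of logging enough to reproduce a run and tie later feedback back to it.
# The SQLite schema and field names are illustrative assumptions.
import json
import sqlite3
import uuid

db = sqlite3.connect("llm_logs.db")
db.execute("""CREATE TABLE IF NOT EXISTS runs (
    correlation_id TEXT PRIMARY KEY,
    raw_inputs TEXT,        -- the pieces before templating
    rendered_prompt TEXT,   -- exactly what went over the wire
    response TEXT
)""")
db.execute("""CREATE TABLE IF NOT EXISTS signals (
    correlation_id TEXT,
    kind TEXT,              -- 'thumbs_up', 'thumbs_down', 'clicked', 'retried', ...
    detail TEXT
)""")


def log_run(raw_inputs: dict, rendered_prompt: str, response: str) -> str:
    """Store both the raw inputs and the final rendered prompt, keyed by a correlation ID."""
    correlation_id = str(uuid.uuid4())
    db.execute(
        "INSERT INTO runs VALUES (?, ?, ?, ?)",
        (correlation_id, json.dumps(raw_inputs), rendered_prompt, response),
    )
    db.commit()
    return correlation_id  # attach this to every rating button in the UI


def log_signal(correlation_id: str, kind: str, detail: str = "") -> None:
    """Record explicit or implicit feedback against the run it belongs to."""
    db.execute("INSERT INTO signals VALUES (?, ?, ?)", (correlation_id, kind, detail))
    db.commit()
```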

Anton

There's a few tools for doing things like this out there in the ecosystem. Have you tried any, evaluated them? What did you think?

Bryan

Yeah, we've tried a few. We were early partners with Braintrust, which I mentioned a few times. We like them. It's not too difficult to put together your own. If you're already doing pytest or any of these different libraries, you could probably hack together your own and then just kind of report with a basic sort of thing. The nice thing about some of the solutions is they give you kind of a UX. Again, you can start to bring in some of the PMs and stuff-

Anton

So you can bring in other people.

Bryan

You can bring in other people, right? I mean, in the early days, we even had just people with spreadsheets, hey, here's an input, here's what would be great. And then we pulled that spreadsheet in as a part of the test thing and just ran a bunch of them and checked the scores and then did kind of a pass, fail from that. And honestly, that gets you a lot, and then you can upgrade to some of the other more bespoke tools past that.
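
A bare-bones version of that spreadsheet-driven approach, assuming a CSV exported from the spreadsheet with `input` and `expected` columns, a hypothetical `run_prompt()` helper for the step under test, and an arbitrary 0.8 score threshold.

```python
# Sketch: a spreadsheet (exported to CSV) driving a pytest-style eval run.
# The CSV filename, columns, run_prompt() helper, and 0.8 threshold are illustrative assumptions.
import csv
import difflib
import pytest

from my_app import run_prompt  # hypothetical: calls the LLM step under test

with open("eval_cases.csv") as f:
    CASES = list(csv.DictReader(f))  # columns: input, expected


@pytest.mark.parametrize("case", CASES, ids=lambda c: c["input"][:40])
def test_case_scores_well(case):
    actual = run_prompt(case["input"])
    score = difflib.SequenceMatcher(None, case["expected"], actual).ratio()
    assert score >= 0.8, f"score {score:.2f} below threshold for: {case['input'][:80]}"
```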

Anton

It's amazing how much this stuff resembles very classical labeling in ML, right? It's just the person looking at the data and being like, yeah, I as a human think this is reasonable or not, right? Or based on my domain knowledge and expertise, this is correct or not. And the thing that I wonder about, what's going to happen, is are we all going to sort of land on, say, one suite of open-source tooling around this, or is it going to be a really fragmented set of tools, or are people just going to build their own for a while? My bet is people are going to build their own for a while. I think it's probably a little early for anyone to standardize because, again, the ecosystem's so tiny and small and everyone who's working in it is like... It's the equivalent of being right up against the metal in software engineering, right? We're really touching the model directly, and so we really need to build these custom bespoke things. But maybe in the future... I'm actually surprised none of the LLM providers, the labs, have released anything here. They must have stuff like this internally.

Bryan

Yeah. It seems like with a lot of that stuff, we're left just reading the tea leaves because there's not a lot going on. I mean, I think OpenAI released their eval harness with a couple of public eval things, but I don't think it's meant to be used as a product thing. I think it was more like, hey-

Anton

It was more, Hey, you can contribute evals.

Bryan

Yeah. You can contribute evals.

Anton

That's really what it's for.

Bryan

But yeah.

Anton

Just by the way, I found it really hard to use. I'm somebody who's been doing this for a while. I like to think that I understand AI pretty well. Found it very hard to come to grips with open-source eval harness.

Bryan

Yeah.

Anton

It's very clearly something that they've extracted from their way of doing things.

Bryan

Seems like it. Yeah, it seems like it. Yeah. I mean, the key ingredients for doing an eval are just: what are the inputs, what are the expected outputs, and score the expected against the actual.

Anton

And give me a nice UI so that I don't bleed from my eyes from looking at it.

Bryan

Yeah. I mean, the table-stakes stuff is not that complicated, right? Anyone who's familiar with unit testing is going to fit in. It's when you want to go down that path of, I need a UI, I want to visualize this changing over time, I want to build reports off of this. That's where you start to get into, it's like, yeah, yeah, I could build that stuff, but isn't there something out there that's kind of a standard that we can use? And that's why we reach for things like Braintrust. There's others, but yeah.

Anton

Makes sense. You mentioned that you package or wrap the user's prompt. Can you speak to that a little bit? What goes on?

Bryan

Yeah, so a lot of what we're trying to lean towards is having sort of encapsulations of the logic that we want to perform. So oftentimes there's input information that goes in, and it works like this idea of, hey, here's a bunch of inputs. They might be datetimes, they might be big long strings, they might be images now, right? It might be audio that needs to be transcribed. All these different inputs that need to come in, they got to be translated to the right kind of format and then put, in this case, into a prompt. So for us, that means that we have little templating things that we use. A lot of times it's just Mustache or general templates or Jinja or whatever your favorite is. And you kind of pass that in. Though in a lot of cases we started just passing JSON right into the model and it works pretty darn good. That or YAML tends to work pretty well. So that's generally what we're looking at when it comes to encapsulating. We do this a lot when building products, but users are doing this as well, because when you have a step and it's going to be using prior data to make some sort of an output, how do you map that in? How do you kind of manage that? We do something very similar inside of the Zap. Whenever you put that, I want to summarize my blog post, and then here's the blog post body, it's kind of mapped in there. That way whenever the model actually runs, it kind of all compiles into one big old string and that gets sent across. Then we get that response.
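
A small sketch of that templating step with Jinja2, plus the "just pass JSON straight in" variant; the template text and field names are made up for illustration.

```python
# Sketch: rendering trigger data into a prompt before the model call.
# The template text and field names are illustrative, not Zapier's templates.
import json
from jinja2 import Template

PROMPT_TEMPLATE = Template(
    "Summarize the blog post below in three bullet points.\n\n"
    "Title: {{ title }}\n"
    "Published: {{ published_at }}\n\n"
    "{{ body }}"
)


def render_prompt(trigger_data: dict) -> str:
    """Map prior-step fields into the prompt, then send one big string across."""
    return PROMPT_TEMPLATE.render(**trigger_data)


def render_prompt_as_json(trigger_data: dict) -> str:
    """The simpler variant: pass the structured inputs through as JSON."""
    return "Summarize this blog post in three bullet points:\n" + json.dumps(trigger_data, indent=2)
```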

Anton

So it's a mixture of both, right? You on your end are encapsulating certain types of inputs that you know about in a structured way, but the rest you almost leave up to the user and their prompt.

Bryan

We try to, because again, there's all kinds of... The more kind of guardrails we put around this, the less optionality; maybe it gets a little bit easier. But all the kind of interesting stuff that you might want to do, you sort of start to lose out on it because you start to box people in. So most of the power is just, you imagine having a text box. I'm going to write in some hard-coded text, and then I'm just going to map some content in from my prior steps, or from another bit of code if you're working on the code side. Whether that's string interpolation, you're just doing the standard sort of thing, or you're doing something fancier. That's like... I don't know, 80% of prompting is just doing that, running it, checking it, seeing if you're kind of going the right direction, re-running the evals. Just kind of get a feel for where it's going.

Anton

Do you inject anything to constrain the model's output on the other end? Because if you're chaining Zaps, for example, it can't just break. So what's happening there, or is that up to the user again as well?

Bryan

That has been up to the user. Again, we're working on ways to make this a little bit more robust, especially nowadays we have tool use.

Anton

I was about to say that tool use API, does that help you at all?

Bryan

It helps a ton. So that is... Then the tricky thing is creating the schema, right?

Anton

Right? Because you have these custom actions, right? So now you have to generate the schema on the fly to feed the model, to get it to do tool use.

Bryan

Right.

Anton

But yeah.

Bryan

Yeah. Or you have to have the user kind of build it, or you have to think of it yourself like, okay, I'd like... One thing that we do a lot of is... We've seen customers even do this, where you do a thinking step. You say, give me your rationale of what you want to do and what you're going to do, right? And that's the first key you want it to give you. And then you'll ask it for different levels of detail because maybe you want that. We've had people build some pretty intricate translation and summarization pipelines where they have, hey, give me your chain of thought, right? It's what people have been kind of referring to as just, tell me what you're thinking, model. Give me that first, then give me a high detail, about 100 words, then give me a medium detail, like 50 words, and then give me one sentence, right? 10 words. And that ordering actually matters a lot. So we're trying to root out all of those use cases and just kind of understand what people are trying to do. And then as we build out these sorts of capabilities, I think we incorporate that more so that you can be assured that, oh, I'm going to have these four output values, right? And they're going to roughly be what I asked them to be. And that's something that I suspect users are going to get a lot of value out of.
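
A sketch of that rationale-first, decreasing-detail output shape using function calling; the tool name, schema, and model are assumptions, but the key ordering mirrors the pattern described above, with the rationale generated before the summaries.

```python
# Sketch: tool/function calling to force a rationale-first, decreasing-detail output.
# The tool name, schema, and model choice are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI()

SUMMARY_TOOL = {
    "type": "function",
    "function": {
        "name": "emit_summary",
        "description": "Return the summary at several levels of detail.",
        "parameters": {
            "type": "object",
            "properties": {
                # Ordering matters: the rationale comes first, so later fields condition on it.
                "rationale": {"type": "string", "description": "What you noticed and plan to emphasize."},
                "detailed": {"type": "string", "description": "About 100 words."},
                "medium": {"type": "string", "description": "About 50 words."},
                "one_liner": {"type": "string", "description": "About 10 words."},
            },
            "required": ["rationale", "detailed", "medium", "one_liner"],
        },
    },
}


def summarize(text: str) -> dict:
    """Force the model to answer through the tool, then parse the JSON arguments."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Summarize this:\n\n{text}"}],
        tools=[SUMMARY_TOOL],
        tool_choice={"type": "function", "function": {"name": "emit_summary"}},
    )
    arguments = response.choices[0].message.tool_calls[0].function.arguments
    return json.loads(arguments)  # four output values you can map into later steps
```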

Anton

Yeah. I think tool use is really... Again, you and I have considerable experience with this, but when you speak to people who are just starting out, they're not aware that this capability exists in the model at all. And a lot of times that's because it's buried in, frankly, fairly obfuscated API documentation from the model providers.

Bryan

Yeah. There really should be a kind of, and maybe OpenAI or Anthropic could do this, more of an, I'm a ChatGPT user, I've used it, help me get into the API side of things.

Anton

You're going to love what we're coming out with in a couple of weeks. Yeah, we're working on what's called the AI engineering explainer, and it does a lot of that. Our mission with it is to onboard people who have used ChatGPT and understand how to build software into building stuff.

Bryan

Fill that gap. I think that makes a lot of sense. Yeah, because there are a lot of similarities there. We already talked about how if you go in there and you edit, that's just refining and getting into more of the eval sort of iteration loop. And then the function call thing is just... It's basically, hey, you're going to give me a response, but instead of just giving me a blob of text, I want you to give me some JSON and it should have this shape, right? And that alone is really powerful. And some of the early versions of it would still glitch out and do weird stuff, but the new ones seem pretty much rock solid. So you can just kind of pass that right into JSON parse, and boom, you've got exactly what you want, right? There's lots of tools that help make that easier now too.

Anton

So right now with these that people are building, they're mostly, like you said, trigger action. The action is usually the model does something, then it gets taken away. Have you been experimenting at all with multi-turn things where the model might need to do a couple of things before it outputs?

Bryan

I mean, you can kind of simulate that a little bit with that kind of thinking thing where it's like, Hey, before you just respond to me, give me some reason. Break down what you're doing so you can kind of get into the sort of thing where you might go back and forth. You can kind of preempt that by baking those instructions up front.

Anton

How do you surface that to the user? Because obviously in a traditional Zap with a little script or whatever, there's no intermediate steps that need to be surfaced. How does that work?

Bryan

No, not really. I mean, there's a couple other products that we have. So [inaudible 00:31:57] is another one that we've launched. It's kind of like an assistant, agent-style chat kind of program. You can add actions to it, you can add different triggers to it, you can add different data sources to it. And that is a much more interactive kind of thing. More in that vein of, hey, this is ChatGPT, but for where I want to do my work with my SaaS apps. I want to hook in HubSpot, I want to hook in... Hey, send the 5, 10 team members customized messages on Slack, right? For what their work is today. And you can kind of build up an agent to do that and then do that repeatedly. And that has more of that vibe of, okay, multi-turn thing, I'm going to interact with it. It's going to help me do stuff. And then you can turn it into something that automates, runs this more regularly, and then doesn't need your human input. And that's really powerful. A big surprise to... It shouldn't have been a surprise, but we had this in our Interfaces product, which is, hey, I want to build a web page, a no-code web page. We just had this throwaway, hey, let's do a chatbot in there, right? And that blew up. That got really popular. People wanted to train it and add content to it, so we pulled that out as a separate product. So now users are building their own little multi-turn things, right? With all kinds of cool things where they're able to pull in information, gather information. You can tell the agent, their little bot, hey, I want you to get all this first name, last name. And it can work to gather that. So that's one thing that people are doing a ton of. But-

Anton

How do you surface those limits to users, right? Because those aren't Zapier's limits, those are the model's limits. How do you surface that through your product?

Bryan

Yeah, we try to have good error messages when that happens. And we also try to have some configurable options so that we can do trimming for you, right? So if you send us a giant book and the model that you want to invoke is much smaller than that, you can kind of configure it and say like, Hey, throw an error if you get that, or you can kind of chop it off, right? We do some of that, and that's the same thing we do for this memory key thing where you have this history, we'll try to pull in as much context as we can before we blow up the token limit. And we do that with some token awareness counting tokens and things like that.
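
A sketch of that kind of token-aware trimming with tiktoken; the encoding choice, the budget number, and the drop-oldest-first policy are illustrative assumptions.

```python
# Sketch: token-aware trimming of conversation history before it hits the context limit.
# The encoding, budget, and "drop oldest first" policy are illustrative assumptions.
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")


def count_tokens(message: dict) -> int:
    return len(encoding.encode(message["content"]))


def trim_history(messages: list[dict], budget: int = 6000) -> list[dict]:
    """Keep the most recent messages that fit within the token budget."""
    kept, used = [], 0
    for message in reversed(messages):  # walk newest to oldest
        cost = count_tokens(message)
        if used + cost > budget:
            break  # everything older gets chopped off
        kept.append(message)
        used += cost
    return list(reversed(kept))
```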

Anton

So obviously stuff like that tends to change pretty frequently and we get new model releases. How are you keeping up, given that a large part of this job is to wrap those APIs, how are you keeping up with those changes and how do you communicate those to users? It's hard.

Bryan

Yeah. We've started using some of these translating proxies like LiteLLM. There's a handful out there. I think the AI [inaudible 00:34:11] SDK is pretty good, so they normalize some of this stuff, but a lot of that kind of deeper, what are the-

Anton

Token limits.

Bryan

The tokenization.

Anton

Slightly different meanings of different API parameters.

Bryan

Yeah, I think that's right now... That's still very much the wild west. So a lot of our stuff is just kind of putting guards in and just making sure to pop error messages that people can read and understand, like, Hey, you requested too many tokens here, click here to learn more about OpenAI. Stuff like that where we just try to get you to the right information-
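
A sketch of what a translating layer like LiteLLM buys you: one call shape across providers, plus a readable error message when something goes wrong. The model identifiers and the catch-all error handling are illustrative assumptions about how you might use it, not Zapier's setup.

```python
# Sketch: one call shape across providers via a translating library like LiteLLM.
# Model identifiers and the generic error handling are illustrative assumptions.
from litellm import completion


def ask(model: str, prompt: str) -> str:
    try:
        response = completion(model=model, messages=[{"role": "user", "content": prompt}])
        return response.choices[0].message.content
    except Exception as exc:  # surface a readable message instead of a raw provider error
        return f"The AI step failed: {exc}. You may have requested too many tokens for {model}."


# The same function works against different providers:
# ask("gpt-4o-mini", "Summarize this ticket...")
# ask("anthropic/claude-3-5-sonnet-20240620", "Summarize this ticket...")
```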

Anton

Be friendly with the user.

Bryan

Yeah, yeah, yeah.

Anton

Yeah. I think that's right.

Bryan

Because we can't control a lot of those, but where we can, we try to do some of that trimming. We try to help people out in terms of just making sure you get to the pit of success as best we can, but when you don't hit it, we give you some off-ramps and try to give you some better information.

Anton

Nice. What do you think are the top one or two biggest misconceptions that users have when they try to do something for the first time? What do you see frequently, commonly anti-patterns?

Bryan

Probably the first one is they're not omniscient, right? So you got to feed them information. Part of that frustration loop is, have you fed it enough of that information so that it knows what you want? And usually nine times out of 10 when people get a bad response, it's because they didn't provide enough context or they weren't clear enough, right? And this is not hard to remediate because you can hit up those prompting guides and they have some good things. When you read those, it's like, oh, they're being a lot more specific. They're providing examples, and if you just mimic some of that, you'll get better results. So that's something that I often see people running into. I'm trying to think what might be the other thing that I see users really get hung up on. I mean, the other one is kind of related to... It's this Achilles' heel of these general models: they can do anything, so what would I do with them, right? There's a little bit of that, which I think people are taken aback by. But it's not too hard to get started with a little harebrained idea. Almost anything that you want them to do, they can probably kind of do it, right? And then how well they can do it is really kind of the question. So if you've got a little idea of, I wonder if they could do this, it's worth just trying.

Anton

That's what I always encourage people-

Bryan

Just go.

Anton

Just try it, right?

Bryan

Just try it.

Anton

It's so inexpensive and it's quick. And also the failures are kind of more fun than regular software.

Bryan

They're kind of fun. Yeah.

Anton

It does something weird, but it's not weird like segmentation fault. No, it's saying something bizarre. And why is it doing that?

Bryan

Why is it doing that?

Anton

One of my perspectives on this whole thing right now is I really think the speed limit is not what the models can do, it's just the extent to which we've explored them. I think actually they could be doing a lot for us that we don't even know about yet, right? Even with the ones we have today, we don't need them to be super smart. Every organization probably has dozens to hundreds of things that could be automated today that just aren't being, because nobody's really tried.

Bryan

That's absolutely right. Yeah. No, without a doubt. That is totally true. If you and I could just sit down with every person who might have a use for LLMs, we could probably-

Anton

Just build something. Yeah.

Bryan

We'd probably come up with a dozen things where, at the end of the day, they're like, this is great.

Anton

And the better part of it is we could then hand it back to them and be like, if it does something weird, just tell it not to do that.

Bryan

Yeah, just tell it not to do that. Yeah, the maintenance kind of story is a lot more approachable, because it's not this inscrutable code. It's like English, right?

Anton

Text. Yeah.

Bryan

Yeah. So I do broadly agree with that. I think that's very true. I think a lot of this is just familiarity. The idea is, if you get some value out of ChatGPT, just like going to it or [inaudible 00:37:38] or any of these and just asking questions, asking things like, hey, I'm going to paste this CSV and you tell me this, you tell me that. That can be just lifted into either a product you're building or kind of a workflow you're building in your organization. You just have to kind of find the right shape for it. How do I get the data that I pasted in? That's my trigger. How do I define what I want on the output? That's what I got to put into my prompt. And then that's how I got to map it all together. Where do I want to send it? So once you have those kinds of building blocks down, you can really do a lot of really cool stuff. We have people who are... I saw a user the other day, they had, I think, some keywords from a Google Alert and Twitter and a bunch of different social stuff feeding into a digest. And then at the end of the day, they feed it all back in and they ask the LLM to give me the top three things I need to be aware of. And then they post it to Slack. We have people doing this for internal comms, right? If you've got an internal blog, people will do this. And you can even kind of customize it to you. It's like, I'm not really interested in team processes that are X, Y, Z. I'm really interested in anything related to the design system in our board, right? So if you see anything like that, make sure to mention it. Otherwise, just return an empty string or whatever it is. So people find all kinds of novel use cases and then make them their own. And I think that's maybe where I see people kind of getting that iteration from: oh, I use it for this, now I'm going to try it for this, and I'm going to try it for this. So once you kind of open the floodgates, people really kind of find some cool stuff.

Anton

I think of it as developing intuition for what the models are and can do. And then combining that with domain expertise, right? I think that's undervalued right now as well. And also the fact that literally anybody can develop intuition with the models. You don't have to be a brain genius to do this. It's like they're right there. Just talk to it. I often prototype things literally by just chatting with ChatGPT, and then I'll extract like, okay, this is the prompts that work. I'm going to put that in my code and make that run.

Bryan

Totally.

Anton

I'm curious, obviously we've talked about people using Zapier to build these automations. What's Zapier automating internally, what are you using it for?

Bryan

So we're using it for all kinds of stuff. My recent obsession has been the Claude AI Artifacts. Basically you just ask Claude on their... I think it's Claude.ai, right? You ask it like, hey, build me a little game that does whatever, or a little widget that does whatever, or whatever little idea you have for something visual, interactive. And it pops it over to the side and it builds it for you. And then you can play with it and then say, actually, can you make it more bright? Or can you actually provide some defaults in there? And you can start, it gives you another one, it gives you another one. So that I think is a peek into the future, right? As to how a lot of software is going to be built. So that's something that I'm really interested in, and that is related to a lot of the work that we're doing now where we want to generate a lot of these actions and triggers that we have for users.

Anton

Right. So you're using the models to start generating some of those.

Bryan

Start generating a lot of this, because a lot of this has been written by hand. That's one of the big things that Zapier's done. We got 60,000 different actions now, and those are all written by hand, by either us or by our partners.

Anton

But now you can have omni-Zapier.

Bryan

Yeah, you can have all these different API docs feeding in and you get actions and stuff. So that's something that we're spending a lot of time on. We're building-

Anton

Things you've never seen before.

Bryan

Yeah, you've never seen, or even little permutations, right? Everybody wants a little flavor. Hey, I want to create a card in Trello, but I want you to also do X and Y and Z in there. And it's more esoteric options that not everyone really wants, which we opted not to include in there because nobody really uses them.

Anton

But then there's a long tail of users and we can address all of that.

Bryan

Yeah. So that's a big thing that we're using it for. We're using it... The problem that we were talking about, hey, what do I even do? Zapier's had this problem of, it can do a lot of stuff, right? You can configure workflows to do all kinds of... 60,000 actions, oh, my gosh, what do I do with it, right? That's a big question we get. Well, LLMs are great at coming up with a million different ideas. So that's a big thing that we're doing: hey, how do we personalize this stuff? How do we make... If you look at something that's very popular, [inaudible 00:41:31], go send it to a spreadsheet or send it to Slack or something. Why you would want to do that is incredibly, incredibly wide, right? Maybe you're in a sales org, maybe you're in a support org. Maybe you're a small business, maybe you're an engineering department, maybe you're a manager. You could use a form with an update alerting you for a thousand different reasons. But why you might want to use that is so personal to you. So how can we use an LLM to take those kind of raw ingredients and then bring them-

Anton

[inaudible 00:41:59] intent.

Bryan

Yeah.

Anton

It's using it to capture intent and then doing something productive with it.

Bryan

Yeah, it's kind of like an intent. It's kind of like brainstorming. It's kind of like a lot of these things [inaudible 00:42:07] together where we just want to take... We just want to give people a little spark, a bunch of sparks. And then it's so cheap to generate all these different things that if you can put them in front of a user, then they can jump off on that. They can say, oh, that's interesting. It's maybe not 100% what I want, but that's pretty close and I can kind of play with that.

Anton

We have to build this iteration into our interfaces somehow. I think that's really fundamental. I think it's something that's getting missed a lot right now. This iterative interface is everywhere in AI. If you use something like Midjourney or another image generator tool, the real power of it is the fact that I don't have to tell you exactly what I want. I can get you to get closer to what I want by either making you generate more versions of it and then going from there or by observing what you generate and then changing slightly my input, right?

Bryan

Exactly.

Anton

We haven't had that in software before. Programs are big bang, binary, pass-fail, even when we're building them iteratively, right? Very different. But this is very, very different. Let me ask you a couple final questions. The first one is, the perspective that most people have is the models are inevitably going to get smarter. How do you think that changes things over time? Because right now I do understand that we're doing a lot to mitigate when they're not good, and we're figuring out the use cases based on where they're at now, because it's very difficult to see what the capabilities will be like in the future. What's your thinking about what we should be doing as the models get smarter? What should we be building? How should we be thinking about that? What's your perspective here?

Bryan

Boy, I wish I had the answer here. I don't think anyone does. I mean, the only things that I can really probably take to the bank are that they're going to get more affordable, the economics are going to get crushed, brought down. They're probably going to get a lot more capable on just the kind of things we all do, right? Because that training data is more readily available than, I don't know, way sci-fi, out-there scenarios. Things like data entry, yeah, are going to get really dominated by these sorts of things. So I just think of capabilities, like you're going to get more or less superhuman knowledge work of the kind everybody's doing, right? You're going to get it very, very cheap, right? And then beyond that, I think anything else... I can't say exactly what it's going to be, but if you can figure out the second-order effects from those two, you're probably kind of on the right path. But I think there's lots of paths forward from there.

Anton

Yeah, I think, again, you have to look a little bit at the history of software. And again, the speed limit turns out to be people almost every single time, right? It's like how... Do we have organizations that can actually adopt these technologies? That's ultimately the limiting factor a lot of the time.

Bryan

Yeah, yeah, and I know you all are spending a lot of time trying to teach devs. I mean, we're trying to hire devs that want to work in AI.

Anton

Exactly. I mean, we are too, because we have so much stuff that is AI-enableable, right? Almost every day in the course of writing code, which I still do. I think about, man, I wish I just fed this GitHub action output into the model. So I'm not scrolling through 2000 lines of logs. Just tell me what the error is.

Bryan

Yeah, give me straight to the... Yeah. And maybe one thing that is probably already happening is you're upskilling people. They're having this tool, so if you-

Anton

Everyone in the org can build an automation now.

Bryan

Totally.

Anton

Yeah.

Bryan

Yeah. So I think the number of people that are able to build things is going to increase. So I mean, hopefully the number of AI engineers that exist in the world are going to go up, not just because people have-

Anton

Just the programmers in orgs.

Bryan

Right. But just this idea of, well, I use LLMs, generative AI, whatever, to build things is going to be really common. And that's really exciting. And I suspect a lot of these jobs that we're worried we're going to lose, we're going to gain by people creating all this really cool stuff. At least that's my hope. And I mean, these are also the kinds of folks we're trying to hire too, that are like... It's like, oh yeah, right now you still got to code a little bit. But that's evaporating fast. I mean, with a lot of the stuff that you're seeing with these models, the most obvious economic output they can do is code, right? So they're probably going to crush that. So I don't know if it's going to be highly differentiated for just common software development tasks, just because a lot of people will be able to do it. But that's also exciting because that's more people that we can hire, more people that are creating cool things.

Anton

It also means we can do more complex tasks.

Bryan

Of course.

Anton

That's the other forgotten piece to this. If we don't really have to do boilerplate much anymore, and if we don't have to spend a lot of time coding, figuring out how do I even use this other interface, I can actually think about the job I'm trying to get done. All right, last question. What were some of the misconceptions that you had going into this, that you really had to eliminate or get around before you started making progress? You personally, when you first encountered this technology, what were some of the biggest misconceptions?

Bryan

Biggest misconceptions? I think a lot of it boiled down to just getting my hands dirty. It's easy with this... At least with this tech, I don't know how you feel. I can sit down and just come up with a million things that I want to try, right? Because there's so much interesting stuff that these models can do. A lot of it we just haven't really discovered, but the ideas are not worth much. You just got to sit down and try to bang them out. And it doesn't matter if you build it in a Zap, or I've seen people do the spreadsheets with the GPT-like action. We got that in our Tables product and people go build crazy cool stuff. Just those prototyping things, so you can... Or even just ChatGPT, just go in there and play with it. And I think maybe that is it: just ask it to try. That's maybe step zero. And if you can do that, you'll gain a lot of insight into how these models work. To your point, you just kind of have to understand broadly where they land, what they're good at. And once you kind of have a feel for that, then you can start to really hone how you do your work. And then there's really advanced, crazy stuff with logprobs and you can do some really cool-

Anton

Yeah, you can start getting crazy with it for sure.

Bryan

It's not necessary.

Anton

No.

Bryan

It's not necessary.

Anton

And again, people are very tempted to sort of over-invest in tooling early. People are tempted to learn all the terminology, and they think they really need to get into the guts of it and know what an attention head is and how models work. Not really, right?

Bryan

No, not really.

Anton

Yeah. Look, again, the thing that we want to do here is encourage devs. And I think this conversation is really valuable for that.

Bryan

Yeah. And I think the definition of devs is like-

Anton

It's everybody.

Bryan

It's broadening. It's way more inclusive now. And to me that's really... I mean, we built Zapier. It's like no code. We want more people. It's like a substitute for software. You're like, oh, this is happening. And it's like, whoa, this is kind of our dream come true. Way more people are getting access to this sort of stuff. So it's really fun.

Anton

And again, the second-order effects, you start thinking about it. Well, anyone can build all kinds of process automation. And what's more, it's not just that more people can automate the same things we always have been able to; we can automate new things. Which means, okay, now that we have all that automation, isn't that exciting? We get to build other stuff on top of that. And this is, again, I have a really boring opinion, which is not science fiction at all, which is we're going to be able to automate away tons of boring stuff that is knowledge work nonetheless. So we can free up people to actually get their job done. That's what's exciting for me.

Bryan

I agree.

Anton

Cool. Thanks very much.

Bryan

Yeah.

Anton

That was great.

Bryan

Yeah.

Anton

Yeah.
