
The Sources And Destinations Podcast Episode #6 with Ahmed Elsamadisi

We know, we know. We went a little long this time, but you are going to love every minute with this week’s guest. This week we are so excited to host Ahmed Elsamadisi from Narrator.ai. Ahmed is the co-founder and CEO of Narrator.ai, a startup that standardizes all data into a single format. He previously built out WeWork’s data infrastructure and has also worked as an AI engineer at Raytheon. Not only that, he was also selected as one of Forbes’ 30 under 30 this year. Ahmed chats with us about what it takes to be truly data driven, shares his learnings from WeWork in a high-scale, competitive market, and explains what led him to create his own company, Narrator.ai.

Introducing Ahmed Elsamadisi

Ahmed founded Narrator.ai with one big idea: what if all data could be standardized into a single structure? And could that single structure answer any question that anyone has, quickly and accurately?

This year Ahmed landed on Forbes 30 under 30, listed under the enterprise technology category. That is no small feat, but with the many amazing milestones he’s accomplished in his career, it’s not a surprise either.

In the beginning of his career, Ahmed focused on human-robot interaction, Bayesian data fusion, and building algorithms for autonomous cars at Cornell’s Autonomous Systems Laboratory. Soon after, he joined Raytheon to develop tactical AI algorithms and human exoskeletons. Ahmed referred to this project as “the Iron Man suit but made of rubber (because it’s way more energy efficient and not a fictional concept ignoring proper scientific practices) and algorithms for adaptive decision making”. In 2015, he joined WeWork, where he built WeWork’s standard data infrastructure and grew its data team from one to forty data engineers and data analysts.

With the many issues and hurdles he faced at WeWork while growing a team and organization, Ahmed came up with his big idea and shifted his focus to creating Narrator.ai, so that any data request can be answered within minutes and with sound reasoning.

Introductions

Sean 0:53  

Hello, hello, and welcome to another exciting episode of the Sources and Destinations Podcast. And this week, we have a very interesting guest to introduce you guys. It’s gonna be a really lively, informative conversation. But before we get kick-started, I did want to mention some great news here at the podcast. We were recently featured on Feedspot’s top 10 data engineering podcasts. So a big thank you to all the listeners that have helped us grow this small little podcast. And it’s allowing us to talk to a whole bunch of very talented data engineers from around the world. So you guys are really fueling this fire, and we appreciate every single one of you. And that’s about enough of the housekeeping items for this week, I’d say, since we have such a great topic and a great guest today. Let’s just go ahead and roll into introducing that person. So I’m going to hand it over to my co-host, Dash Desai, to tell us who’s joining us today.

Dash Desai 1:54  

Thanks, Sean. So this week, we’re so excited to host Ahmed from Narrator.ai. He’s the co-founder and CEO of Narrator.ai, a startup that standardizes all data into a single format. He’ll talk about that a little bit more later on. He previously built out WeWork’s data infrastructure, as well as the data team. And he’s also worked as an AI engineer at Raytheon. Not only that, he was also selected as one of Forbes’ 30 under 30 this year. That is one impressive resume, Ahmed. Welcome to our podcast, and thank you for joining us today. To kick it off, would you like to say a few words about your background?

Ahmed 2:37  

Well, thank you for having me here, Dash and Sean. I’m excited to share more about data engineering and a little bit about my learnings. As Dash mentioned, I started my career in AI, really working on how humans make decisions with machines. And you’ll see that theme throughout a lot of my answers; it’s how I like to think about data engineering in terms of its applications and usage. I eventually joined WeWork, so I have experienced all sorts of data engineering fun, as you would call it, and have used a lot of the common tools, probably every tool, starting with Luigi from Spotify, to Airflow, to dbt, to building our own systems at WeWork, and every BI tool that exists. And this all led me to found Narrator.ai. I realized that data engineering needed something a little bit different, and the problem to solve was going to be the framework of how we approach data modeling. And that’s where Narrator.ai came to be, so I’m excited to share more about that journey and our approach.

Dash Desai 3:45  

Yeah, that’s awesome. So just to follow up on that, can you talk a little bit about your passion for data engineering in general?

The Art of Data Engineering

Ahmed 3:52  

I look at data engineering as the most fundamental piece of decision-making. Why is that the case? Well, data comes in, and especially in today’s age, we’re capturing data from so many different systems. There’s Zendesk, your Salesforce, your website; data is just captured all over. We’ve gotten really good at capturing data, we capture everything, and we store it. And now, thanks to tools like Snowflake and Fivetran, we’re easily able to process it, and we’re easily able to dump it in one place. Then comes the world of stakeholders. They’re asking questions, and your team wants to make decisions. Do people who call us increase or decrease their likelihood to buy the product? Those simple questions require a lot of work to go from the raw data to data that’s usable, and that’s the most important piece. The way a data engineer structures data, the way a data engineer ensures quality, changes the culture of how people consume data, changes the kind of questions people are asking, and changes the kind of questions people are answering. And that’s the essence of what makes a company data-driven. I think we often forget all these cascading effects from something as simple as what you enable people to answer, and focus a little too much on the pipelining, but data engineering is so much broader than that, and it continues to really, really change the world. We often see in sales that people want to buy a data analytics solution when they actually have a data engineering problem. So I look at that as the essence of everything we do.
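
(To make that concrete: once the raw-to-usable work is done, the stakeholder question itself is only a few lines of SQL. The sketch below is ours, not Ahmed’s, and every table and column name in it is hypothetical; producing a per-customer table like customer_touchpoints from raw phone-system and order data is exactly the data engineering work he’s describing.)

```sql
-- Minimal sketch (hypothetical names): the stakeholder question, once a usable
-- per-customer table exists. Building customer_touchpoints from raw call and
-- order data is the data engineering work described above.
SELECT
    CASE WHEN first_call_at IS NOT NULL THEN 'called us' ELSE 'never called' END AS segment,
    COUNT(*) AS customers,
    AVG(CASE WHEN first_purchase_at IS NOT NULL THEN 1.0 ELSE 0.0 END) AS purchase_rate
FROM customer_touchpoints
GROUP BY 1;
```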

Sean 5:33  

Ahmed, I really like how you position that. And obviously, you know, we’re here at the Sources and Destinations podcast; we’re talking about data engineering and really trying to highlight those everyday struggles. And you kind of say, you know, it’s a foundation, it’s really critical. One thing that I’ve observed over the years is that, generally like you said, companies are hungry for more and more sources of data, right. So they want that Zendesk data, they want the clickstream data from their website, they want the financial data flowing in in real-time. And at the same time, they’re ramping up their data science capabilities. They’re hiring people with experience in machine learning. Some companies are hiring quants and different data scientists. But, I think one thing that we observe is that they don’t always scale the data engineering function, you know, to the same degree that they’re scaling the analytics. Do you think that that is, fundamentally, a bad approach? Or do you see the data engineers being able to kind of act as a force multiplier?

Ahmed 6:40  

So there are two sides to that question. One is, why do people do that? And two, how does data engineering sustain those natural behaviors in the industry? The first question is, to an untrained eye, data engineering is a confusing field. So when you talk to people, they say, “Okay, we captured the data and it comes in tables; we’re going to query it with SQL, and then we’re going to do data science.” And then you’re like, “What does data engineering do?” Well, they take the data and use SQL to create another table that you then query with SQL again to do the data science. So a lot of people who are stakeholders or business people don’t really understand why this middle step is needed. It often gets characterized as cleaning—like, oh, we have to clean the data—which is really, really under-representing what that step is. People often think there are some columns that are going to be null that you have to massage. That’s never the problem. It’s really stitching the data together and bringing it into a more usable format. So I think that it is key to really educate the stakeholders that data engineering is about making your data usable; it doesn’t get captured in usable formats, and you have to work really hard to make it usable.
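
(As a rough illustration of that middle step, here’s what stitching two raw sources into one usable table might look like. This is our sketch, not anything from the episode, and all table and column names are hypothetical.)

```sql
-- A hedged illustration of the "middle step": stitching two raw sources into
-- one usable table. Table and column names are hypothetical.
CREATE TABLE analytics.customer_support_sales AS
SELECT
    LOWER(t.requester_email)    AS customer_email,
    MIN(t.created_at)           AS first_ticket_at,
    COUNT(DISTINCT t.ticket_id) AS tickets,
    MIN(o.close_date)           AS first_won_deal_at
FROM zendesk.tickets AS t
LEFT JOIN salesforce.opportunities AS o
       ON LOWER(o.contact_email) = LOWER(t.requester_email)
      AND o.stage = 'Closed Won'
GROUP BY 1;
```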

The thing I’d say about data engineering, though, is that it is a multiplier. And I have a very strong opinion on data engineering because our company’s entire premise is that a small amount of upfront work to structure your data in a new way, which we call the activity schema, makes any table you need instantly available. So we see 20 to 30x speed and time savings on your data engineering work. That kind of multiplier, I think, is achievable in data engineering, but it requires a lot of planning and tooling. So good data engineering will be a huge force multiplier. Bad data engineering is actually a slowdown, because then you have a lot of middle-layer tables, numbers aren’t matching, people are constantly confused—they’re saying, “I got a total number of sales from this table and it gave me 30, but this table gave me 32, so what’s the difference? Why is there a difference?” And then you end up living in what most engineers call maintenance hell. You’re just stuck in it. There’s that trade-off between how much data you give people so they’re able to make more decisions, and how much maintenance that requires when people misuse the data or don’t understand the nuances of those tables. Narrator.ai does solve that problem, so I have a very optimistic view of the world. But I can see very, very much that that’s the dynamic you’re often put in when you are a data engineer.

Dash Desai 9:34  

Hey, that’s cool. So I have a follow-up question on that. You mentioned bad data engineering. Do you think that’s because of a lack of knowledge, a lack of tools, or a combination of the two?

Bad Data Engineering

Ahmed 9:45  

Bad data engineering, to me, is very similar to bad data science, where marketing and press make people think they have to do things in a very specific way, even when we don’t understand why we do it.

Let’s look at the best practice for data engineering today. It really hasn’t changed since Microsoft stored procedures. You’re taking your data in the raw form; you’re writing these crazy SQL queries to bridge the data sources and deal with missing foreign keys and all the complexity that is the nature of data. You’re going to fit your web data to your sales data—you’ve got cookies and time and all these different things that make that really hard. So you end up writing a 1,000-line query. And then you have a table that has the columns that you think someone’s going to need. And you do it, and someone goes, “Oh, well, actually I want to know if they bought within 30 minutes, not ever.” And now you have to go back and add more layers to that table. So this is kind of the best-practice approach, the fact table approach I mentioned. And people keep telling you that’s going to work, and you just build more tables and manage tables. And then you need dependency managers. And you need data dictionaries to tell people which table to use. So you’re constantly falling into this trap of buying tools that make the system more and more complicated. And I think that leads to a lot of bad data engineering.
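
(Here’s a hedged sketch of the rework loop Ahmed describes, with hypothetical names throughout: a wide table built around the questions you guessed people would ask, which has to be rebuilt the moment someone asks a slightly different one.)

```sql
-- A sketch of the wide "build the table you think they'll need" pattern.
-- v1: one row per web session, with a did_buy flag someone guessed was enough.
CREATE TABLE analytics.web_sessions AS
SELECT
    s.session_id,
    s.customer_id,
    s.started_at,
    MAX(CASE WHEN p.purchased_at IS NOT NULL THEN 1 ELSE 0 END) AS did_buy
FROM raw.sessions AS s
LEFT JOIN raw.purchases AS p
       ON p.customer_id = s.customer_id
GROUP BY 1, 2, 3;

-- Then the stakeholder asks "did they buy within 30 minutes, not ever?" and
-- the whole table has to be rebuilt with another layer, e.g.:
--   MAX(CASE WHEN p.purchased_at BETWEEN s.started_at
--            AND s.started_at + INTERVAL '30 minutes' THEN 1 ELSE 0 END)
--     AS bought_within_30_min
```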

I always make the joke: if you Google Airflow and look at their prime example, it’s kind of complicated. There are 7,000 lines moving on top of each other. And you’re like, “I don’t want my system to look like that. That’s really complicated.” And even though they provide tools to manage it, we haven’t solved that complexity; we have only managed the complexity. The alternative view is people say, “Oh, don’t worry about it, it’s very common.” And I love these tools. I’m not really bashing any of these tools. But another approach is like Looker, where somebody will tell you, “Hey, use Looker, put Looker on top of your raw data, and Looker will do all the joins for you.” And you might think that’s a good idea and go down that path. And just so you know, Looker itself doesn’t do that in its own Looker implementation. And then again, you end up with messy, messy systems, it’s really hard to do, and you just kind of fall flat. So there’s really this key part of how you structure your data, which is one of the most elementary and critical parts of data engineering. That’s up to you. You figure that out. You figure out what tables should look like. And no one is helping you deal with that situation. Even when you get to templated dbt packages, which are super helpful, they’re still single-source. They don’t help you bridge different sources to answer questions that are realistic. So you’re kind of left stranded. I was, too, at WeWork. I followed the best practices; I talked to people, and I mentioned the star schema approach and looked into all the different data approaches. But it was still up to me to figure out how I wanted to structure that table and what kind of questions I anticipated people were going to answer. And as long as we’re still in this pathway of trying to guess what someone’s going to ask, data engineering is always going to be falling a little bit behind, because people are always going to ask questions that you did not anticipate. So I feel like that’s often what happens in data engineering, and it’s not anyone’s fault. It’s just why you end up having a lot of bad data engineering. And that’s the cultural shift that you have to change: we need more standard, more validated approaches to help, so we don’t have this huge black hole around your decisions of how you want to structure that data.

Sean 13:32  

Now, Ahmed, you bring up a really good point. I was actually moderating a think tank symposium yesterday, and we had a lot of really great people working on data teams from different companies. And I think that pressure is really real. I think vendors are putting a lot of information out there, which is creating confusion in the marketplace. So I think there is some ownership on vendors to ensure that we’re helpful in the conversation, that we’re looking across the ecosystem to balance our message with everything out there and allow data acceleration to happen naturally. I attend a lot of conferences with keynotes from companies like Google, Facebook, and PayPal—companies with really sophisticated data engineering practices. But then if you talk to the majority of smaller companies, companies in the middle of America who are maybe just starting out on their analytics journey, they feel a great deal of pressure because they’re not as far along. So I think a lot of those companies get pulled into waters that are unfamiliar, and that maybe they aren’t really ready for, from either a data engineering or data science capacity. And, you know, my advice to them was: don’t let any big company or any vendor dictate how far along you are in the analytics journey. So, any thoughts on that, Ahmed?

An Industry of Confusion

Ahmed 15:05  

I really love this question because I talk about this all the time. And my go-to example is always Spark. Whenever someone’s like, “I’m gonna use Spark,” I say, “Don’t use Spark.” We have such power in our warehouses that no one really needs to use Spark unless you’re Google, Facebook, or a very specific niche company that’s dealing with big data. The same thing goes with “I have a lot of data.” It’s like, “How much data do you have? 100 billion rows?” “No, we only have, like, maybe 100 million.” You don’t have big data. You have 100 billion? You still don’t have big data. So don’t worry about solving big data problems when you’re trying to answer simple questions. And I think that those big companies are solving different problems than what we are trying to solve. Most people are trying to optimize customer behavior and understand what’s happening. And you look at these ginormous companies, and they’re trying to do something very different. So different problems, different solutions, different tools. Don’t compare. Yet, when you do go talk to some of these companies—I went on a tour when I was at WeWork, talking to Spotify, Netflix, Airbnb—to see, let’s forget your data science work, let’s look at your data engineering. And it’s really interesting, because what you’ll find is the problems are still there. No matter how sophisticated you get, the core problems of people constantly asking you questions, having to deal with new tools, new tables you have to build, constantly updating your data models, people complaining, your numbers not matching—those things are there.

The biggest difference that you’re going to see in solving these problems is that those companies have a lot more money, are paying for people with a lot more experience, and have a lot more things in their tool chest to really help beat their competition. So, if you’re a company, don’t try to do what Facebook or Google or these companies are doing. You don’t need it, and if somehow you prove me wrong, then you probably do need it. But 99% of people will not need it. The second thing is to realize that they get to take a lot of things to their advantage, because they have a lot more regulation, a lot more structured data, a lot more people, and a lot more seniority to solve some of the little tiny problems that you’re going to run into. And if you don’t have the same things to solve with, you need to come up with more clever ways to answer those questions.

And this really, really brings it back to what we often say at Narrator.ai: if we succeed at Narrator.ai, then answering questions is no longer the important part; answering questions becomes a commodity instead of a democracy. So we talk about commoditizing data instead of democratizing it, because we’re really about making it so easy to answer questions that any question you can come up with, you can answer in under 10 minutes. And if you can do that, then you can actually have a similar playing field between Facebook and Amazon and your small Shopify startup, because then you get to compete on who can ask the better question. And I think that’s where we eventually want to shift the industry. Eventually, we want to solve data engineering so people can start thinking and focusing on asking better questions and making better decisions, because that is how you’re going to compete.

Dash Desai 18:39  

Hey, so just to follow up on that, why do you think these small companies try to model what other companies have done, for example, Google, Facebook, Netflix, and so on, in terms of data engineering?

Ahmed 18:50  

I think that’s the nature of information. When companies do not know what to do, they’re always going to look at the companies that are role models. And they keep hearing that these companies are doing such advanced data science, and that data is so important to every single decision, and they go, “Oh, I need to use data science, I need to follow what they did, or what they do,” which is really misleading in general. I still think most big decisions are made with simple, really guided analysis, so that pressure is kind of just misplaced.

But, I have worked with enough executives to hear, “We need to be more data-driven, we need to follow Netflix’s pattern in making better decisions. We don’t want to be a Blockbuster. We want to be Netflix. Let’s use data.” And you’re like, “Yeah, but you’re not Netflix; your value prop is not a recommendation engine. So don’t start building something that’s designed for a recommendation engine when you’re solving problem A.” And people often respond to that: “Well, we want to get to a world where we do recommendation engines.” And I say, “When you get there, you should use those tools. The challenge is you’re not there. And you’re never going to get there if you can’t answer your simple questions today, let alone your complicated questions, to make better decisions. Then you can worry about building a pipeline for machine learning.” So it’s just the nature of everyone trying to jump forward and imagine a future world, because they think it’s gonna really help with their decision making. But, way more often than not, it’s just a bit of a distraction.

So, I really strongly believe, as a person who comes from an AI background and has used AI, that if everyone’s able to ask and answer questions reliably and make slightly better decisions, you can optimize way more of your business, which will lead to a much greater impact than trying to build humongous machine learning black boxes and shove things into them. At WeWork, we fell into that trap, too. Some executive came down and was like, “Hey, data team, we’re gonna use Watson.” And I was like, “Ugh.” And they were like, “No, you have to use Watson; it’s, like, the most advanced AI ever built, and it’s going to answer all of our problems. It’s going to solve churn.” And we’re like, “Really?” And we ended up going through the whole motion—seeing the presentation and what Watson concluded—and the thing it said was so ridiculous that the executive walked out of the room. And we were like, “Okay, no more Watson; let’s go back to answering simple questions to improve sales, instead of trying to find this magic bullet.”

Sean 21:36  

Yeah, that reminds me of a couple of years back. We were consulting with people on whether they needed to use Spark Streaming or Flink. And the whole discussion was, you know, do you need processing in less than five milliseconds? And so we had all these people coming in saying, “Well, micro-batch is not okay. We need real-time streaming.” And then we said, “Okay, so you need to process data in less than five milliseconds?” And they go, “Well, no, we don’t really have a use case for that.” So sometimes people just get sold on the speeds and feeds of what they’re looking at. So I’m glad you mentioned WeWork there; I think you bring a lot of context and knowledge to the discussion. I do want to talk a little bit later about what you’re doing at Narrator.ai, because I think you guys are providing much-needed optimizations specifically for data engineers. But if we can roll it back for a second: what does it take to build out data infrastructure at a company like WeWork? I mean, obviously, at that scale, and you were essentially leading that effort. Do you mind sharing, just at a high level, the lessons you learned?

Data Engineering at WeWork

Ahmed 22:46  

Building data engineering at WeWork was truly something. To give you an idea, because most people don’t realize it, WeWork had 85 systems used in-house and about 6,000 raw data tables, plus migrations, just a lot of complexity. The business was evolving so quickly that the engineering team was changing the data just as quickly. Data engineering was needed to cascade those changes and provide everyone with the information they needed to make better decisions. And it was quite tricky. I think that WeWork was the epitome of a traditional startup. We used all the tools, we followed the best practices, we hired very, very talented engineers to help build it. And we still struggled. The challenges were partly maintaining and building the system, and partly just dealing with executives. You mentioned five milliseconds. Every executive in the world wants real-time data. I actually tell them that if data is changing every minute and you would make different decisions based on the minute, you should not be an executive. Take a break, step back, and let the data decisions marinate. No one should be that volatile.

So WeWork data engineering was two parts: building a system, following the tools, and ensuring that everyone had tables and access to the data they needed; and then education, where we tried to really teach people how to ask good questions and answer good questions. It’s very, very rare that you see people that know how to ask good questions. And even better, even harder, is finding people who know what decision they want to make. So initially it was just building the system, getting our transformation layer running, dependency management, controls, and everything in place. But toward the end of my time, WeWork was really about trying to change the culture of how people use data. We used to play this game whenever an executive ran down and said, “There’s an emergency; I need to know the top five referral partners.” And I would say, “Okay, here they are.” And he would say, “How do you know?” I’m like, “I’m just guessing randomly; what are you going to do with that information?” And more often than not, there was no action for it. They were just like, “Oh, um, I need to know the conversion rate.” I’m like, “30%.” So we pushed people to really figure out what they wanted, instead of what we called spelunking at WeWork, where people would look at dashboards and try to come up with something. That’s very dangerous and very misleading, because you’ll create a lot of associations and things in your head that are not real. So really getting into the habit of asking quality questions and answering them was always the key. Hopefully, that is helpful.

What Does it Mean to be Data Driven?

Dash Desai 25:42  

No, that’s great. So I have a follow-up question on that. So you mentioned, asking the right questions and things like that. And a lot of companies these days claim to be data-driven. Based on your extensive experience, what do you think it really means to be data-driven?

Ahmed 25:59  

I would say most companies are data-informed. What often happens is that they have a decision that they want to make, and then they go and look for dashboards that represent a story they can piece together to allow them to confidently make that decision. Data-driven is when people actually have hypotheses, data is used to analyze those hypotheses, and then a decision is made given the conclusion of the data.

Most companies, I would say, are not data driven. And there are two things I usually ask people to figure that out—I do this in sales to understand how much they value decision making. I usually ask, “How many things have you undone? How many ideas did you have that you went and built, tested, and decided to roll back?” Most people won’t have any. Very, very few product people, very few people, have actually seen companies undo things. It’s really rare. On top of that, there’s ignoring the data: data tells you something is bad, and you just say, “Okay, well, I’m going to keep tuning it until I get some sort of validation that it’s good enough.” So I think that’s the first thing that is important to realize: companies are not data driven. The second thing that I often see is, when I ask data teams what their goal is, it’s often to build more dashboards or provide more tables; it’s really never to help somebody make a better decision, or increase revenue, or increase conversion rate, or really impact the bottom line of the business. So often the data team is really not aligned with the business goals of the company. And I think that causes another big separation. So we have 5,000 dashboards, and you’re like, cool, we are data driven, but no one’s really making decisions off of it. So being data driven is really, really hard in today’s ecosystem. That is the thing that made me leave WeWork and start Narrator.ai: after we had built the systems, we were spending like $3-$4 million on tools, and we had bought all these dashboards and these self-serve tools, and we had all this documentation and all these tables, and people weren’t making better decisions. When we quantified the impact of people who saw a dashboard and made a decision, it was very, very little. And I really wondered why.

And the answer is, it’s actually a really, really, really hard thing to be data driven. It is much easier to spelunk, to just slice and dice one table in Tableau a million different ways until something sticks out and you go, “I found an insight.” Every single BI tool is promising you insights when they’re just visualization tools. And if you could find an insight by spelunking, then an AI, like a machine learning algorithm, would find it faster. They’re so cheap to run nowadays that you can literally drop one in and find any insight that’s conditioned on that table. This kind of approach is something I used to call data theater; you look like you’re using data, but you’re not actually making better decisions, and you’re not actually improving. And I look at companies all the time and I say, you’ve been doing this work for years, okay, let’s look at your conversion rate from lead to sale over time. And it’s flat. And you’re like, great, you haven’t really done much work in optimization or making better decisions with data, because you haven’t improved your system at all. So, I see that a lot of companies love dashboards, love tools, love real-time things. And it’s often used for data theater.

To be data driven means you’re making decisions with data. And I still think the most prevalent example of data-driven decisions right now is finance data and Excel. Finance teams are among the few actually using data to make decisions right now, with a lot of optimization around customer behavior. I see a lot more data theater than I see data driven. But we’re hoping to change that at Narrator.ai, and that’s kind of what we’re doing; customers that start using Narrator.ai start changing. There are a lot of things that I blame the vendors and the tools and the marketing for. I blame somebody who promises you that you can find insights because you have a visualization tool, when it doesn’t do anything of the sort. I think it’s on the tool and the vendor to provide structures that you can rely on: the ability to answer questions, a guarantee that you’re answering questions correctly, help asking and answering more questions, an innate feeling that you can iterate. If we can create that experience and help you make better decisions, that is the job of the vendor, and that’s how we help people become data driven. Because right now, if you’re doing it on your own, it is really hard and really expensive.

Sean 30:47  

I really love when you talked about having to either trash an assumption or trash a project because the data is telling you something different. That takes humility. As someone who’s had to do that a couple of different times, you know, it’s often really hard. And depending on the level of politics inside your organization, it could be near impossible. So I think that’s a really great one. And I think you kind of led into my next question: really, to talk about why you founded Narrator.ai. Is the core hypothesis to really help people become more data-driven by focusing on those decisions, versus the world of data technologies and data infrastructure?

Narrator.ai

Ahmed 31:33  

I founded Narrator.ai—and since we have a technical audience, I’ll be a little more technical about the moment I felt I needed to build a company—because when people asked questions that were unique, we had to go back and redo our data modeling layer, which added maintenance and a higher risk of numbers mismatching. And it was often because people asked about a unique kind of relationship that was a little trickier because of how we join and relate data.

And often, when you’re bridging systems, you don’t have foreign keys. If you bridge your email data to your web data, you’re hoping that the UTM parameters are maintained across the journey and you can join based on that unique parameter, which assumes the email team actually put it in all the links, and breaks when the user views the email on their phone and then just logs in on their browser, which is one of the most common things in eCommerce. So those kinds of questions, when people ask us, are really hard. Questions like how many people who opened my email came to our site, or how many never opened the email but ended up buying, are really hard to answer correctly. And it is a data engineering problem to do it. So I felt like, we can all talk about these concepts and we all have an idea of how to do it, yet because our data is unique, we can’t really share that work. You read a blog about Netflix, and you can’t use their algorithm, because every company has unique data. And 80% of using an algorithm is getting the data in the format that it needs.
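
(A hedged sketch of that fragile bridge, with made-up names: the email and web systems share no foreign key, so the join leans entirely on a UTM parameter surviving the click.)

```sql
-- Email and web data share no foreign key, so the join leans on a UTM
-- parameter (here a hypothetical utm_email_id tag) surviving the click.
-- All table and column names are made up for illustration.
SELECT
    e.email_id,
    e.recipient,
    w.session_id,
    w.viewed_at
FROM email.sends AS e
JOIN web.page_views AS w
  ON w.utm_email_id = e.email_id   -- only works if every link was tagged
 AND w.viewed_at >= e.sent_at;
-- Silently misses anyone who read the email on their phone and later typed
-- the site into a desktop browser, dropping the UTM parameters entirely.
```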

So I founded Narrator.ai with one idea: what if I can standardize all of the data into a single structure? And what if that single structure can answer any question that anyone has? And I remember trying to raise money with this story, and people were like, “This sounds ridiculous and impossible. How are you going to do it?” And I was like, “Well, that’s what I need money for, because I need to start figuring out how to solve this problem. I have a couple of hypotheses and hunches. But this is the goal. I don’t know how we’re going to do it.” And it took us about two years before we had our first version working, where we can take any piece of data, and I really mean all your data—so your Zendesk data, your Salesforce data—all this data gets transformed into a single table. We call it the activity schema. It’s 11 columns, there are no JSON blobs; it’s just a table that follows your core entity, often your customer, in time, and has actions that that customer did and a couple of metadata columns. And then what we’ve done is reinvent how you can ask and answer questions and how you relate data. By removing all relationships from the original data, you don’t have to depend on foreign keys, which means you can bridge systems instantly. And what Narrator.ai provides is a way you can actually take this data and reassemble it. So asking a question like, everyone who opened an email, how likely are they to buy? That’s two clicks. Because it’s everyone who opened the email, and we use this thing called relationships, where you say “first in between,” depending on customer, time, and occurrence, to really stitch that data together. And this is something that’s going to make very little sense to a lot of the people I’m talking to. But it’s the outcome that really matters.
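
(For the curious, here’s a rough illustration of the shape of such a single-table activity stream. The column names below are illustrative only; Narrator’s published activity schema spec is the authoritative definition.)

```sql
-- A rough illustration of a single-table activity stream in the spirit of
-- what Ahmed describes: one row per action a customer took, no JSON blobs,
-- no joins back to source systems. Columns are illustrative, not the spec.
CREATE TABLE analytics.activity_stream (
    activity_id          VARCHAR,    -- unique row id
    ts                   TIMESTAMP,  -- when the activity happened
    customer             VARCHAR,    -- the core entity, e.g. an email address
    activity             VARCHAR,    -- 'called_us', 'opened_email', 'visited_site', ...
    feature_1            VARCHAR,    -- a few generic metadata slots
    feature_2            VARCHAR,
    feature_3            VARCHAR,
    revenue_impact       FLOAT,
    link                 VARCHAR,    -- pointer back to the source record
    activity_occurrence  INT,        -- 1st, 2nd, 3rd time this customer did this
    activity_repeated_at TIMESTAMP   -- when the customer next did it again
);
```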

Today, every company we talk to starts by saying, “There’s no way you can get our data into a single table.” And they always say, “There’s no way you can answer any of my questions.” And I do it in a demo. I say: come up with any question you have and let me see if I can answer it live, in front of you, in under 10 minutes. I have been doing this now for four years and still haven’t been beaten, so fingers crossed that this is going to keep going. But that is the key there: by enabling anyone to ask any question and create a table, you’re able to really start solving that data engineering problem. There’s low maintenance, it’s very clean, it’s very fast, it’s very efficient. But what it also does is change how we ask and answer questions. To answer a question in Narrator.ai, you’re stitching things based on a customer or core entity, so by definition, they’re going to relate. The query is always correct. You don’t have to worry about duplications, dropped rows, or any nuances; you have transparency; you don’t need a data dictionary, because you have a language. We’re using these building blocks that we call activities: you can say called us, visited our site, submitted a ticket, submitted a rating. Because the ability to ask a question is so easy through this activity schema approach, we just see people being able to ask and answer way more questions. Last week, 800 new questions were answered in Narrator.ai. People are able to ask and answer new questions really easily. They’re able to iterate and do follow-up questions naturally. They never have to go back to the engineering team to build a new table. And they’re confident that the data is correct, accurate, and reliable. And this shifts the world, because it’s now so cheap to test your hypothesis. People now just test it, because the cost of doing it is so low that you think, I might as well see what the data says before I start trying to create a story. And I think that moves us in the right direction of having people start using data. And that’s why I keep talking about this idea of commoditizing it. If the cost of using data goes to zero, people will use data, and people will make better decisions. That’s the thing we see in our customers, and that’s what I hope to see in the world.
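
(Continuing the illustration above: a hand-written sketch of the “everyone who opened an email, how likely are they to buy?” question over that hypothetical activity stream. Narrator generates this kind of query for you; the point of the sketch is that the join needs nothing but customer, time, and occurrence.)

```sql
-- For each email open, check whether a purchase happened "in between":
-- after this open and before the customer's next open.
WITH opens AS (
    SELECT
        customer,
        ts,
        COALESCE(activity_repeated_at, TIMESTAMP '2999-01-01') AS next_open_at
    FROM analytics.activity_stream
    WHERE activity = 'opened_email'
)
SELECT
    COUNT(*) AS email_opens,
    AVG(CASE WHEN EXISTS (
            SELECT 1
            FROM analytics.activity_stream AS o
            WHERE o.customer = opens.customer
              AND o.activity = 'completed_order'
              AND o.ts BETWEEN opens.ts AND opens.next_open_at
        ) THEN 1.0 ELSE 0.0 END) AS share_that_bought
FROM opens;
```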

That One Weird Thing

Dash Desai 37:00  

That brings us to the last segment in our podcast; we call it That One Weird Thing. I’m sure you can relate to this as a data engineer. So please put your data engineering hat on and share something that’s weird, something that stands out every single time you start working on a project. A very simple example I can give is working with different data formats, time zones, and things like that, across platforms, databases, and what have you. So anything that you’d like to share, that’d be awesome.

Ahmed 37:37  

Awesome. Um, so I was thinking a lot about this question. And I thought about maybe daylight saving time, but I’m sure people always answer that, or getting the median, which is also annoying. And I decided that what I want to talk about is getting the first value. So, when you’re going to answer a question—for example, the one I’ve been using, somebody came to the site and called us—you often want to know the first time they called us, or you want one unique row, like give me the first product they viewed. And when you go to get the first value, especially in an aggregation, it’s a very annoying problem in SQL and in data engineering in general. You’re often doing a window function, and then aggregating by it, and then removing it, or trying to do a self join with the minimum timestamp to get the first value based on that time. It is often a SQL nightmare. Everyone here who has done it knows: when you’re trying to get the first or last value, the SQL just explodes in complexity, dealing with the nesting of the window functions just to get the first value of, like, three columns. So that’s my weird thing, and I remembered it because I literally wrote an entire blog post about this specific problem of getting the first value in a GROUP BY. If you’re interested, I give a one-line solution, without a window function, to get it. It’s a little nifty. But I think this is one of the most annoying problems in all of SQL, so I hope the people watching can resonate with the pain that this thing has caused. I’m sure you’ve googled it. If you get a chance, read the blog post, because I think the approach is kind of sneaky, but kind of clever. And it allows you to get multiple first values without any window functions.
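
(Ahmed’s exact one-liner is in his blog post; the sketch below shows the general shape of the problem and a commonly used version of the trick, in Postgres-flavored SQL with hypothetical names.)

```sql
-- The usual pain: a window function plus a dedup step just to get
-- "the first product each customer viewed".
SELECT DISTINCT
       customer_id,
       FIRST_VALUE(product) OVER (PARTITION BY customer_id ORDER BY viewed_at) AS first_product
FROM product_views;

-- The sneaky alternative: prefix each value with a sortable timestamp, let a
-- plain MIN() inside GROUP BY pick the earliest, then strip the prefix off.
SELECT customer_id,
       SUBSTRING(MIN(TO_CHAR(viewed_at, 'YYYYMMDDHH24MISSMS') || product), 18) AS first_product
FROM product_views
GROUP BY customer_id;
```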

Find Out More

Sean 39:33  

Ahmed, thanks for sharing that. And that brings us to the end of today’s conversation. I want to thank Ahmed for being a great guest. I think your talk here has been illuminating. I’ve learned a whole bunch in this conversation. And we’ve really been able to take it down to the level of what data engineers experience every day. So, for that, I definitely thank you. Just real quick, Ahmed. Where can people go to find out a little bit more about you and Narrator.ai?

Ahmed 39:59  

Yes. Our website is a great place—Narrator.ai. You can also just find me on LinkedIn; I’m Ahmed Elsamadisi. And this is a secret that I don’t know if a lot of people know, but usually, if you book a demo from the activity schema page on the Narrator.ai website, you’ll book it straight with me, because I love talking about this stuff, and we can dive into the details of how this approach works. So, check it out.

Sean 40:28

Excellent. So, speed dating with Ahmed at the Narrator.ai website. I want to thank my co-host, Dash, and our guest today for such a great conversation. We will be back in two weeks with another great guest and more on data and data engineering here on the Sources and Destinations Podcast. Thanks, everyone, for listening, and have a great week.

 
