
iMerit Unveils: Reporting, Analytics and Insights for Scaling your ML Data Pipeline

iMerit’s VP of Product, Glen Ford, shares the challenges companies face when moving from proof-of-concept to production-ready ML deployments. During this phase, workflows within the data pipeline can quickly move from cumbersome to unmanageable. With a single point of management for reporting, analytics, and insights, you can scale your ML data pipeline more efficiently and effectively, allowing you to reach your goals faster.

Video Transcript

- And now, iMerit Unveils: Reporting, Analytics and Insights for Scaling Your ML Data Pipeline with Glen Ford from iMerit.

GLEN FORD: Welcome. I'm Glen Ford, VP of Product at iMerit. You've heard the stat that 80% of data science time goes into data preparation. iMerit takes the friction out of this process for our customers.

We hear all the time that clients' data is growing faster than their ability to get value out of it. And that happens in autonomous vehicles, medtech, commerce, and any number of other industries. iMerit's solution to this combines a tool-inclusive approach with a large and experienced workforce. But we also accelerate clients' time to value using our own proprietary technology.

You'll see several parts of our iMerit stack later in the show with my colleagues Sidibe and Brett. But now I'm very excited to introduce Ground Control, a single source of truth for all your annotation projects, regardless of the type and regardless of which tool is being used. Here, I've drilled into a computer vision project that's been running a while. And you can see some near real-time analytics around throughput.

This throughput data is gathered by the annotator's browser regardless of the tool being used. And that's unique to us. It's not just a no code integration. It's a no integration integration, meaning you gain visibility into performance without effort, making it easier to do better planning. Since this is near real time you don't have to wait till the project is finished to understand and react to how things are going.

So we have some filters here at the top, naturally. The contributors filter is interesting. It allows you to filter the entire page down to select individuals. So let's imagine that you've got 100 annotators working on a project and you're happy with the quality. But you want to move even faster so you add 20 new ones to the mix. This filter allows you to segment those 20, to evaluate their performance and determine whether the additional horsepower will be enough to finish early.

Tasks performed per day is important because it gives you real time visibility to spot events but also the ability to forecast when a project will be complete. When something deviates from expectations, you want to react as fast as possible. Average time for tasks per day is aggregated across the people working on that day.
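The forecasting idea described here can be sketched in a few lines. This is a minimal illustration, not iMerit's actual implementation: it assumes a simple run-rate model where recent daily task counts predict the days remaining.

```python
import math

def days_to_finish(remaining_tasks, recent_daily_counts):
    """Estimate working days left using the mean of recent daily throughput.

    A hypothetical run-rate forecast: divide remaining work by the
    average tasks completed per day, rounding up to whole days.
    """
    rate = sum(recent_daily_counts) / len(recent_daily_counts)
    return math.ceil(remaining_tasks / rate)

# 1,000 tasks left, recent days averaged 100 tasks/day -> 10 days
print(days_to_finish(1000, [90, 110, 100]))
```

A real dashboard would likely weight recent days more heavily and account for contributor headcount changes, but the core forecast is this division.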

There are a couple of spikes here. But we know that these spikes in this case were due to denser images being sent, meaning there are more objects to annotate. So you would certainly expect it to take longer to do the annotation. But if that wasn't the case, you'd want to dig in and see whether there are possibly more edge cases being found.

Maybe the annotation is more subjective, or complex relative to the norm. Or maybe the instructions are more ambiguous relative to these images that were sent on that day. That's how actionable this is.

Now, the histogram below bucketizes users by their time on task, their overall performance. As you can see, this is a pretty healthy distribution, showing most of the annotators are doing quite well and being very efficient. At iMerit, we continually look at metrics like these to see when to improve instructions, or to intervene with individuals to improve that performance. So we'd expect the histogram to get more and more weighted towards the left over time.
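Bucketizing users by time on task is a standard histogram operation. As a hedged sketch (the bucket edges here are made up for illustration), it might look like this:

```python
import bisect

def bucketize(times_on_task, edges):
    """Count users falling into histogram buckets defined by sorted edges.

    With edges [10, 20], values land in three buckets:
    [0, 10), [10, 20), and [20, inf).
    """
    counts = [0] * (len(edges) + 1)
    for t in times_on_task:
        counts[bisect.bisect_right(edges, t)] += 1
    return counts

# Three annotators with average minutes per task of 5, 15, and 25
print(bucketize([5, 15, 25], [10, 20]))  # one user per bucket
```

A left-weighted result (most counts in the early buckets) is the healthy pattern described above.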

The filters I showed you earlier are very helpful for digging into those details. And there is a drill-down view available here. So those are the exciting features in the analytics portion of iMerit Ground Control today. But we're not done yet. Let me give you a look at upcoming analytics now.

Intersection over Union is a really critical measure for computer vision scientists. It's the key measure of accuracy in many types of localization. So here's a typical and healthy pattern of improvement in annotator performance on a new project. You can see that IoU improves continuously. So that's nice.
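For readers unfamiliar with the metric, IoU for two axis-aligned bounding boxes is the area of their overlap divided by the area of their union. A minimal sketch, assuming boxes are given as (x1, y1, x2, y2) corner coordinates:

```python
def iou(box_a, box_b):
    """Intersection over Union for two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle; width/height clamp to 0 when boxes don't intersect
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1)
             - inter)
    return inter / union if union else 0.0

# Identical boxes score 1.0; disjoint boxes score 0.0
print(iou((0, 0, 1, 1), (0, 0, 1, 1)))  # 1.0
```

An IoU near 1.0 means an annotator's box closely matches the reference, which is why a rising IoU trend signals improving annotation quality.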

If you were to spot a negative trend, you can immediately react and get greater precision going. So how about class distribution? Models need to be trained on lots of instances of everything that they have to identify. And you can immediately spot here that we're just not getting enough pedestrians and pets. We're also not getting enough mailboxes for some reason.

So you can react by changing the data set that is sent to us, raising the likelihood of more of those classes coming through. You can also react by saying, I've already got enough passenger cars, so let's stop annotating those. That kind of immediate actionability is what we're looking for in these charts.
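Spotting under-represented classes like this is a simple frequency check. A hedged sketch (class names and the threshold are illustrative, not from iMerit's product):

```python
from collections import Counter

def under_represented(labels, min_count):
    """Return classes whose annotation count falls below a target threshold."""
    counts = Counter(labels)
    return sorted(cls for cls, n in counts.items() if n < min_count)

# Hypothetical annotations: plenty of cars, too few pedestrians and pets
labels = ["car"] * 50 + ["pedestrian"] * 3 + ["pet"] * 1
print(under_represented(labels, min_count=10))  # ['pedestrian', 'pet']
```

Flagged classes are exactly the ones you'd feed back into dataset selection, as described above.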

So now, that's just one slice of Ground Control. Make sure to tune in to Sidibe's presentation later, where he'll cover edge cases. I promise it's super fascinating. If you'd like to know more about Ground Control at any time, please visit iMerit.net, and have a great rest of the summit.