By John P. Desmond, AI Trends Editor
The AI Infrastructure Alliance is taking shape, adding partners who have signed on to the effort to define a “canonical stack for AI and Machine Learning Operations (MLOps).” In programming, “canonical means according to the rules,” according to a definition from Webopedia.
The mission of the organization also includes, according to its website: develop best practices and architectures for doing AI/ML at scale in enterprise organizations; foster openness for algorithms, tooling, libraries, frameworks, models and datasets in AI/ML; advocate for technologies, such as differential privacy, that help anonymize data sets and protect privacy; and work toward universal standards to share data between AI/ML applications.
Core members listed on the organization’s website include Determined AI, an early stage company focused on improving developer productivity around machine learning and AI applications, improving resource utilization, and reducing risk.
The determined.ai team encompasses machine learning and distributed systems experts, including key contributors to Spark MLlib, Apache Mesos, and PostgreSQL; PhDs from UC Berkeley and University of Chicago; and faculty at Carnegie Mellon University. Investors include GV (formerly Google Ventures), Amplify Partners, CRV, Haystack, SV Angel, The House, and Specialized Types. Founded in 2017, the company has raised a total of $13.6 million so far, according to Crunchbase.
Determined CEO Sparks Says AI Stack “Needs to be Defined”
“At Determined, we have always been focused on democratizing AI, and our team remains incredibly optimistic about the future of bringing AI-native software infrastructure to the broader market,” said Determined Cofounder and CEO Evan Sparks, in an email response to a query from AI Trends on why the company joined the alliance. “This same mindset led us to open source our software last year in order to reach more teams across industries. As software becomes increasingly powered by AI, we think that the infrastructure stack to support developing and running software needs to be defined.”
He felt the challenge was too big for one company. “It’s going to take multiple companies solving different problems on the way as AI applications move from R&D into production, working together to define interfaces and standards to benefit data scientists and machine learning engineers. The AI Infrastructure Alliance is poised to be a powerful force in making this a reality.”
Asked why the mission of the AI Infrastructure Alliance is important, Sparks said, “In order to see the true potential of AI, AI development needs to be as accessible as software development, with little to no barriers to adoption. At Determined, we view collaboration as critical to achieving this. Joining the AI Infrastructure Alliance has provided us the opportunity to work with more like-minded companies in our own space and bring together the essential building blocks to create the future of AI, while creating a long-term framework for what AI success looks like.”
Superb AI Focused on Quality of Datasets for Training
Another core member is Superb AI, a company focused on helping with training datasets for AI applications. The company offers labeling tools, quality control for training data, pre-trained model predictions, advanced auto-labeling and the ability to filter and search datasets.
Hyunsoo Kim, CEO and cofounder, launched the company in 2018 with three other cofounders. He got the idea for the company while working on a PhD in robotics and AI at Duke University, where he found that labeling data to train AI algorithms was expensive, laborious and error-prone. “This is partly because building a deep learning system requires extreme amounts of labeled data that involve labor-intensive manual work and because a standalone AI system is not accurate enough to be fully trusted in most situations,” stated Kim in an account in Forbes.
So far, the company has raised $2.3 million, according to Crunchbase. It has attracted support from Y Combinator, a Silicon Valley startup accelerator, Duke University and VC firms in Silicon Valley, Seoul and Dubai.
Pachyderm’s Platform Targets Data Scientists
Another core member is Pachyderm, described as an open source data science platform to support development of explainable, repeatable, and scalable ML/AI applications. The platform combines version control with tools to build scalable end-to-end ML/AI pipelines, while allowing developers to use the language and framework of their choice.
Among the company’s customers is LogMeIn, the Boston-based supplier of cloud-based SaaS services for unified communication and collaboration. At LogMeIn’s AI Center of Excellence in Israel, the company’s team deals with text, audio, and video that must be processed and labeled quickly so its data scientists can get to work delivering machine learning capabilities across its product lines.
“Our job at the AI hub is to bring the best-in-class ML models of, in our case, Speech Recognition and NLP,” stated Eyal Heldenberg, Voice AI Product Manager, in a case study posted on the Pachyderm website. “It became clearer that the ML cycle was not only training but also included lots of data preparation steps and iterations.” For example, one step to process audio would take up to seven weeks on the biggest machine Amazon Web Services has to offer. “That means lots of unproductive time for the research team,” stated Moshe Abramovitch, LogMeIn Data Science Engineer.
Pachyderm’s technology was chosen for a proof-of-concept test because its parallelism allowed nearly unlimited scaling. As a result, instead of taking seven to eight weeks to transform the data, Pachyderm’s products could perform the work in seven to 10 hours. The tech also had other benefits.
“Our models are more accurate, and they are getting to production and to the customer’s hands much faster,” stated Heldenberg. “Once you remove time-wasting, building block-like data preparation, the whole chain is affected by that. If we can go from weeks to hours processing data, it greatly affects everyone. This way we can focus on the fun stuff: the research, manipulating the models and making greater models and better models.”
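For readers who want a feel for why parallelism can shrink a data preparation job from weeks to hours, here is a minimal sketch of the general split, process, and merge pattern. It is not Pachyderm’s API or LogMeIn’s pipeline; the function names (transform, split, run_parallel) and the toy workload are assumptions made purely for illustration, and the example uses Python threads on one machine where a real system would spread the chunks across many machines.

```python
from concurrent.futures import ThreadPoolExecutor


def transform(chunk):
    # Hypothetical stand-in for one data preparation step
    # (for example, processing a batch of audio files).
    return [value * 2 for value in chunk]


def split(items, n_chunks):
    # Cut the input into at most n_chunks contiguous pieces of near-equal size.
    size = max(1, -(-len(items) // n_chunks))  # ceiling division, at least 1
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_parallel(items, workers=4):
    # Each chunk is independent, so the chunks can be processed concurrently.
    # Threads only illustrate the structure here; a production pipeline
    # would typically hand each chunk to a separate worker process or machine.
    chunks = split(items, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(transform, chunks)  # results come back in input order
    return [value for part in parts for value in part]


if __name__ == "__main__":
    print(run_parallel(list(range(10)), workers=3))
```

Because no chunk depends on any other, adding workers shortens the elapsed time roughly in proportion, until coordination overhead or the slowest chunk becomes the limit.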
Founded in 2014, Pachyderm has raised $28.1 million to date, according to Crunchbase.