Mmlspark Versions

Simple and Distributed Machine Learning

v1.0.4-spark3.5

3 weeks ago

v1.0.4-spark3.3

3 weeks ago

v1.0.4

3 weeks ago

v1.0.4

Bug Fixes 🐞

  • companionModelClassName no longer returns generic type variable (#2195)
  • Fix tag for pyCodeGenImpl (#2194)

Build 🏭

  • bump azure/login from 1 to 2 (#2176)
  • bump azure/CLI from 1 to 2 (#2178)

Maintenance 🔧

  • Bump version to 1.0.4 (#2200)
  • fix flaky HyperOpt NB (#2198)
  • Bump python version to 3.11 (#2193)
  • exclude non-executable docs from automated tests (#2197)
  • add retry logic to build steps (#2192)
  • update openai api version to 2024 (#2190)
  • fix 1.0.1 version tags (#2189)

Acknowledgements

We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of SynapseML.

v1.0.3-spark3.3

1 month ago

v1.0.3-spark3.5

1 month ago

v1.0.3

1 month ago

v1.0.3

Bug Fixes 🐞

  • repair failing speech tests (#2179)
  • update openai completion (#2142)

Build 🏭

  • bump github/codeql-action from 2 to 3 (#2148)
  • bump actions/setup-python from 4 to 5 (#2146)
  • bump peter-evans/create-or-update-comment from 3 to 4 (#2162)
  • bump actions/upload-artifact from 3.1.3 to 4.3.1 (#2165)

Features 🌈

  • OpenAI embeddings with GPU based KNN (#2157)
  • Synthetic difference in differences (#2095)

Maintenance 🔧

  • Update to version 1.0.3 (#2183)
  • update build system service principals (#2181)
  • raise error with documentation link - find_secret (#2180)
  • check Fabric Tenant (#2175)
  • rotate outdated SAS URL in speech tests (#2173)
  • Support Token Provider Mode (#2160)
  • Bump isolation forest to 3.0.4 (#2168)
  • Add script to generate pypi mfa qr (#2150)
  • Add Unified Logging Base Class for Python (#2159)
  • fix failing tests (#2153)

v1.0.2-spark3.3

4 months ago

v1.0.2

5 months ago

SynapseML v1.0.2

Bug Fixes 🐞

  • Add the error handling for Langchain transformer (#2137)
  • use java class loader (#2135)
  • Support Bool input for ONNX models (#2130)

Build 🏭

  • bump amannn/action-semantic-pull-request from 5.3.0 to 5.4.0 (#2125)

Documentation 📘

  • update find_secret on Fabric and doc (#2132)
  • update CONTRIBUTING.md (#2138)
  • fix install instructions (#2136)
  • fix readme install
  • add audiobook paper to README
  • add analyze text document (#2127)
  • use the new AnalyzeText API in docs (#2126)
  • removing spark 3.2 instructions

Maintenance 🔧

  • bump to v1.0.2 (#2140)
  • change udf vec2array to pyspark.ml.functions.vector_to_array (#2131)
  • fix failing notebooks (#2134)

Acknowledgements

We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of SynapseML.

Changes:

  • 522661ae3834f3a54c0ad746350b225663c51d41 chore: bump to v1.0.2 (#2140)
  • 2a01c8e68ae58f281eab2afc5a3f69aa28dd2fc7 doc: update find_secret on Fabric and doc (#2132)
  • 23222c08403bcc067c402b95f36e9da89e62b94a fix: Add the error handling for Langchain transformer (#2137)
  • f3ae1465f5564afe69cf6697ac4e98937a9e0ed4 fix: use java class loader (#2135)
  • fc3a9992675ff42e5d2a45566abce692ed3fd9b9 docs: update CONTRIBUTING.md (#2138)
  • 9b20829010ff2818b623e4fb06aa7481f82ab2f9 docs: fix install instructions (#2136)
  • c10f46ea3d7ede110d219806428932b486a8bbcc docs: fix readme install
  • 28cd6db0f02c85a20dabf461e3f7a333de900b8a chore: change udf vec2array to pyspark.ml.functions.vector_to_array (#2131)
  • 46a1ef816aa12292ad101ef16296bdd5aded557a docs: add audiobook paper to README
  • 5e9bae1c442d5f9ea78274b219e18d690f5fe12f build: bump amannn/action-semantic-pull-request from 5.3.0 to 5.4.0 (#2125)
  • 241062fac15ea96815d597f38acbc03984ef185c docs: add analyze text document (#2127)
  • 4623219956d4629b74bf76f7b382252d30dbd187 review docs (#2128)
  • 9195deef8b3c260983934010bc7f60efa93e6817 fix: Support to Bool input for Onnx models (#2130)
  • 4c4fc8aa5d9e080ee13b1026a000edb15a4d6485 chore: fix failing notebooks (#2134)
  • 90ded807fc28f8b6b6cdf25250b908543432c61e docs: use the new AnalyzeText API in docs(#2126)
  • 5cd78c9f610bc14a429f49e8bfe32ca72e5cfe37 Improve LightGBM Network logs (#2124)
  • a187cd063e4c7e6ac7f913187d41c373d66ab5f2 docs: removing spark 3.2 instructions

This list of changes was auto generated.

v1.0.1

6 months ago

v1.0.1

Documentation 📘

  • pointing cognitive apis to azure ai (#2119)
  • bump readme to spark 3.4

Maintenance 🔧

  • bump to v1.0.1 (#2123)
  • add back in exclusions (#2122)

Acknowledgements

We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of SynapseML.

Changes:

  • cb4fd82835e6193ac4c4283f21faa1ed4e69660c chore: bump to v1.0.1 (#2123)
  • 91e8c8525df06110345ea774fc4417812af4ec49 chore: add back in exclusions (#2122)
  • d240cbb1f4a6916fb46a622d3a33089e9001dae1 docs: pointing cognitive apis to azure ai (#2119)
  • 77be64100870889b563c759a079d86c6bca23ce1 docs: bump readme to spark 3.4
  • ef435a2917bc383a251e574c6d88cf909b1336e3 chore: bump to v1.0.0 (#2120)
  • c2fdb05f44d6c705c954dc80e2c7c0f33b96a71b chore: Adding Spark34 support (#2052) (#2116)
  • 903dc6b94e5ae617b995d94490dfafc8ff2ca4aa docs: move cognitive namespace to services namespace (#2118)
  • fd00b8700441ef47205950b72c7bcbe84b0f5b36 chore: refactor cognitive package to services (#2117)
  • b0caf2e5ff920094f2d7f80bc2dd8145009c4863 build: bump @babel/traverse from 7.18.9 to 7.23.2 in /website (#2098)
  • c12afc51b0b68c8a3aa7188955b3795dcfc0a1c8 chore: bump speech sdk version (#2107)
  • 1af71ed4d40ca52e14774623651c6fc2c784615f docs: update anomaly detector docs (#2103)
  • 377df2f57d485f91bdef14139c658c14003c3576 build: bump ossf/scorecard-action from 2.3.0 to 2.3.1 (#2108)
  • cd43ee7c73268a0545c612599fb598398923d0d7 fix: unit test break in TranslatorSuite (#2111)
  • cc77eda925ceeda0de4354daa7b3624a6a26f84a chore: removing gpt-review (#2113)
  • 70dc523114768eea12ff0648c03fcfc3785f69de fix: gpt-review action (#2112)

This list of changes was auto generated.

v1.0.0

6 months ago

SynapseML: Simple and distributed machine learning
We are excited to announce the release and general availability of SynapseML v1.0 following seven years of continuous development. SynapseML is an open-source library that aims to streamline the development of massively scalable machine learning pipelines. It unifies several existing ML frameworks and new Microsoft algorithms in a single, scalable API that is usable across Python, R, Scala, and Java. SynapseML is usable from any Apache Spark platform and is now generally available with enterprise support on Microsoft Fabric.

Highlights

  • Distributed Langchain: Deploy your LLM apps on millions of documents
  • Vector Search Indices: Quickly create semantic and multi-modal search engines
  • Semantic Link: Work with PowerBI datasets natively from Microsoft Fabric
  • Keyless AI Services: Use built-in AI services without keys in Microsoft Fabric
  • Orthogonal Forests: Discover and measure heterogeneous causal effects

New Features

General ✨

  • Add support for spark 3.4.1 (#2052) (#2116)
  • Enterprise support on Microsoft Fabric

Open AI and Langchain 🦜

  • Add the LangchainTransformer for orchestrating LLMs at scale (#1925, #2036)
  • Add ChatGPT through the OpenAIChatCompletion transformer (#1887)
  • Add Langchain notebook (#2002, #2013)
  • Add OpenAI document Q+A notebook (#2029, #2033)
  • Add custom chatbot creation to form recognition demo (#1888)

Azure AI Services 🧠

  • Add Support for Azure Cognitive Search Vector Indices (#2041)
  • Add keyless Azure AI services on Microsoft Fabric (#2070, #1859)
  • Support new form recognizer APIs (#1882)
  • Support streaming multivariate anomaly detection (#1893)
  • Add prerequisites page for setting up OpenAI and Azure AI services (#2008)

Deep Learning 🕸

Causal Learning 📈

  • Add OrthogonalForestDML for causal learning with heterogeneous effects (#1873)
  • Add Heterogeneous Effect Quickstart
  • Support custom reference distribution in DistributionBalanceMeasures to detect data drift (#1885)
  • Add statistical significance reporting for causal learners using getPValue (#1863)

LightGBM 🌳

Additional Updates

Bug Fixes 🐞

  • Improve LGBM exception and logging (#2037)
  • AI Services and other HTTP Clients no longer retry 4XX codes other than 429 (#2005)
  • Make geospatial services robust to 404s thrown by the service (#2007)
  • Fix bug #1869, where DoubleML .setFitIntercept should default to true (#1876)
  • Fix Multivariate Anomaly error handling (#1991)
  • Fix import error when using AI services on Azure Machine Learning clusters (#1951)
  • Fix default values of aadToken & url on Fabric (#1918)
  • Fix ONNX model shape inference on batches with shape [-1] (#1906)
  • Add getPValue to python API of DoubleML (#1909)
  • Add diagnosticsInfo in Multivariate Anomaly detection response (#1892)
  • Fix Double ML timeout on large datasets (#1903)
  • Retry OnnxHub calls to improve test reliability (#1889)
  • Remove case matching for erased generic types (#1880)
  • Remove extraneous Foo type from Python codegen (#1867)
  • Update OpenAIEmbedding Schema to account for internalServiceType
  • Update Maven package to include correct GitHub path (#2073)
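
The retry change in #2005 can be pictured with a small sketch. This is not SynapseML's implementation: the names `should_retry` and `send_with_retries`, the treatment of 5XX codes as retryable, and the backoff delays are all assumptions made for illustration. Only the rule itself (4XX responses other than 429 are not retried) comes from the note above.

```python
import time
from typing import Callable, Sequence


def should_retry(status_code: int) -> bool:
    """Return True when a response with this status is worth retrying.

    From the release note: 4XX codes other than 429 are not retried.
    Assumption (not from the note): 5XX codes are treated as transient.
    """
    if status_code == 429:  # throttled: the same request may succeed later
        return True
    if 400 <= status_code < 500:  # other client errors will not fix themselves
        return False
    return status_code >= 500  # assumption: server errors are retryable


def send_with_retries(
    send: Callable[[], int],
    backoffs_s: Sequence[float] = (0.1, 0.5, 1.0),  # hypothetical delays
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `send` until it returns a non-retryable status or delays run out."""
    status = send()
    for delay in backoffs_s:
        if not should_retry(status):
            break
        sleep(delay)
        status = send()
    return status
```

Here `send` stands in for any callable that performs one request and returns its HTTP status code. Passing `sleep` as a parameter keeps the sketch easy to exercise without real delays.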

Documentation 📘

Maintenance 🔧

Contributor Spotlight

We are excited to highlight the contributions of the following SynapseML contributors:

Aydan Aksoylar
Aydan is a Senior Applied AI Engineer and a first-time contributor to SynapseML. Aydan recently joined Azure Data but quickly led the efforts to add the new integration with Azure Cognitive Search's Vector Indices. This feature allows users to quickly create flexible semantic search engines powered by rich models like GPT-4. Aydan went above and beyond on this project and also contributed a Document Question and Answering with PDFs quickstart to showcase how to use these new features.

Sheryl Zhao
Sheryl is a Principal Applied Scientist on the SynapseML team and a first-time contributor to SynapseML. Sheryl worked hard to devise an elegant connection between LangChain and SynapseML to enable deploying chains on large datasets. She also designed and built a lovely quickstart to showcase how to build a distributed arXiv reader with only a few lines of code.

Markus Cozowicz
Markus is a Principal Applied Scientist on the SynapseML team and a SynapseML veteran developer. Markus has contributed algorithms running the gamut from reinforcement learning and LLMs to anomaly detectors. This release, Markus contributed an ambitious and full-featured integration between SparkSQL and PowerBI data models. This allows users to explore their existing PowerBI datasets and measures with the full generality of PySpark or (Scala) Spark. This dramatically expands the automation possibilities within Microsoft Fabric. Markus never ceases to outdo his prior contributions and we are excited to see what he has in store next.

Amir Jafari
Amir Jafari is a Senior Product Manager on the SynapseML team and has recently taken over the role of the official SynapseML PM. Amir's passion to advance the library was instrumental in driving us to v1.0. He is fiercely productive and has a knack for simplifying and improving the SynapseML user experience. Additionally, Amir isn’t afraid to roll up his sleeves and contribute notebooks and blogs. He drove several efforts to create new quickstarts and documentation for a variety of SynapseML features.

Aadharsh Kannan
Aadharsh is a Vice President and Head of Economics and Data Science at Western Digital. Aadharsh is also a new SynapseML contributor whose first contribution significantly generalized our causal inference stack to support fast estimation of heterogeneous causal treatment effects with Orthogonal Random Forests. This was a nontrivial and mathematically intensive contribution, and we are grateful for Aadharsh's expertise and persistence in getting this through our build system.

Brendan Walsh
Brendan is a Senior Engineer on the SynapseML team and a talented developer. Brendan's contributions range from core improvements to the SynapseML build and documentation generation system, to spearheading customer engagements and onboarding AI services. Most recently, Brendan used SynapseML to create and donate thousands of audiobooks to the open-source community in partnership with Project Gutenberg. This effort was considered one of TIME's top 200 inventions of 2023. You can learn more about Brendan’s awesome technical philanthropy efforts at https://aka.ms/audiobook.

Jessica Wang
Jessica is a Software Engineer who recently joined the SynapseML team. Already, Jessica has grown into the role of the SynapseML benevolent “doc”tator. This release, Jessica has worked hard to ensure that the SynapseML notebooks work across a wide variety of Spark platforms and are easy and simple to get started with. This work requires knowledge of the entire library’s surface area, and we are thankful Jessica has worked so hard to learn this breadth of content. Furthermore, Jessica was also instrumental in building our Azure Doc auto-generation system to ensure all docs are tested as part of our CI build.

Serena Ruan
Serena is a Software Engineer at Databricks, an MLflow maintainer, and a prolific SynapseML contributor. Serena's impact can be felt throughout almost every aspect of the library, and she is personally responsible for the new Form Recognizer V3 update, new streaming anomaly detection APIs, distributed deep network training, and many more features. Additionally, Serena laid the foundations of keyless authentication on Fabric, and pioneered our integration with MLflow.

Cruise Li
Cruise is a Software Engineer II on the SynapseML team in Beijing. Cruise has been instrumental in building and testing the keyless Azure AI services on Microsoft Fabric. With this contribution, Fabric users can configure their workspaces to use OpenAI, Langchain, and a variety of other AI services without the hassle of managing keys or authentication. Cruise has also worked hard to ensure AAD authentication works with Azure AI services and has helped the effort to standardize logging and telemetry across SynapseML and its sister projects.

Acknowledgements

We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of SynapseML:

Markus Weimer @markusweimer, Eric Dettinger @sandshadow, Scott Votaw @svotaw, Mark Niehaus @niehaus59, Aydan Aksoylar @aydan-at-microsoft, Sheryl Zhao @sherylZhaoCode, Markus Cozowicz @eisber, Brendan Walsh @BrendanWalsh, Jessica Wang @JessicaXYWang, Tom Finley @TomFinley, Sailesh Baidya @saileshbaidya, Keerthi Yanda @KeerthiYandaOS, Kyle Rush @k-rush, Aadharsh Kannan @AKannanMSFT, Serena Ruan @serena-ruan, Cruise Li @mslhrotk @lhrotk, Jason Wang @memoryz, Haizhou (Dylan) Wang @dylanw-oss, Sarah Shy @sarahshy, Kashyap Patel @ms-kashyap, Puneet Pruthi @ppruthi, Ilya Matiach @imatiach-msft, Amir Jafari @amhjf, Nellie Gustafsson, Bogdan Crivat, Justyna Lucznik @juluczni, Richard Wydrowski @richwyd, Tania Arya @taniaarya, Adithya Mukund @adithyamukund, Roman Batoukov @RomanBat, Alexandra Savelieva @alsavelv, Jessica Wolk @msplants, Luis França @luisffranca, Paul Koch @paulbkoch, Rich Caruana, Avrilia Floratou, Martha Laguna @martthalch @marthalc, Jeff Zheng, Sciong Yang, Peixian Gong, Ruixin Xu, Chris Hoder, Derek Legenzoff, Misha Desai, Eren Orbey, Beverly Kodhek, Louise Han @jr-MS, Raj Rikhy, Brice Chung, Marcos Campos, Mike Estee, Kim Manis, Mitrabhanu Mohanty, Anand Raman, Sudarshan Raghunathan @drdarshan, William T. Freeman, John Moyer, Vidip Acharya, Ashit Gosalia, Miguel Fierro @miguelgfierro, Ismaël Mejía @iemejia, Kartavya Neema @kartavyaneema, Daniel Ciborowski @dciborow, Mark Tabladillo @marktab, Guilherme Beltramini @gcbeltramini, Akshaya Annavajhala (AK), James Verbus @jverbus, Mopé Akande @msakande, Frank Solomon @fbsolo-ms1, ONNX Team, Azure Global, Vowpal Wabbit Team, LightGBM Team, MSFT Garage Team, MSR Outreach Team, Speech SDK Team, MLflow Team, Azure Docs Team

Learn More

  • Visit our website for the latest docs, demos, and examples
  • Learn about our effort to create thousands of free audiobooks
  • Learn the basics of SynapseML
  • Read our full list of SynapseML Ignite Announcements
  • Apply OpenAI language models to your large datasets
  • Read our paper on Custom Voice Audiobook Creation