.NET for Apache Spark is a free, open-source, and cross-platform big data analytics framework.
In the Microsoft.Spark.CSharp.Examples folder, we provide C# samples that help you get started with .NET for Apache Spark and demonstrate how to infuse big data analytics into existing and new .NET apps.
There are three main types of samples/apps in the repo:
- SQL/Batch: .NET for Apache Spark apps that analyze batch data, or data that has already been produced/stored.
- SQL/Streaming: .NET for Apache Spark apps that analyze structured streaming data, or data that is currently being produced live.
- Machine Learning: .NET for Apache Spark apps infused with Machine Learning models based on ML.NET, an open source and cross-platform machine learning framework.
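
To make the SQL/Batch category concrete, here is a minimal sketch of a batch app. It is not one of the samples in this folder. The class name `BatchSketch`, the view name `people`, and the `name`/`age` columns are assumptions for illustration, and running it requires the Microsoft.Spark NuGet package plus a working Spark installation.

```csharp
using System;
using Microsoft.Spark.Sql;

// Hypothetical batch app: reads a JSON file that already exists on disk
// and queries it with Spark SQL.
internal static class BatchSketch
{
    public static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.Error.WriteLine("Usage: BatchSketch <path to JSON file>");
            Environment.Exit(1);
        }

        // Create (or reuse) the Spark session for this app.
        SparkSession spark = SparkSession
            .Builder()
            .AppName("Batch sketch")
            .GetOrCreate();

        // Batch read: the whole input is available before the query runs.
        DataFrame df = spark.Read().Json(args[0]);
        df.PrintSchema();
        df.Show();

        // Assumption: the input has "name" and "age" columns.
        df.CreateOrReplaceTempView("people");
        DataFrame adults = spark.Sql("SELECT name, age FROM people WHERE age >= 18");
        adults.Show();

        spark.Stop();
    }
}
```

An app like this is typically launched through `spark-submit` rather than run directly, so that the .NET process can talk to the Spark runtime.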

| Batch Processing | |
| --- | --- |
| Basic.cs | A simple example demonstrating basic Spark SQL features. |
| Datasource.cs | Example demonstrating reading from various data sources. |
| GitHubProjects.cs | Example analyzing GitHub projects data. |
| Logging.cs | Example demonstrating log processing. |
| VectorUdfs.cs | Example using vectorized UDFs to improve query performance. |
| VectorDataFrameUdfs.cs | Example using vectorized UDFs and convenience APIs from Microsoft.Data.Analysis to improve query performance. |

| Structured Streaming | |
| --- | --- |
| StructuredNetworkWordCount.cs | Simple word count app that connects to and analyzes a live data stream (like netcat). |
| StructuredNetworkWordCountWindowed.cs | Windowed word count app. |
| StructuredKafkaWordCount.cs | Word count on data from Kafka. |
| StructuredNetworkCharacterCount.cs | Counts the number of characters in each string read from a stream, demonstrating the power of UDFs + stream processing. |

| Machine Learning | |
| --- | --- |
| Batch Sentiment Analysis | Determine if a batch of online reviews are positive or negative, using ML.NET. |
| Streaming Sentiment Analysis | Determine if statements being produced live are positive or negative, using ML.NET. |

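
For a sense of what a Structured Streaming sample involves, the following is a rough sketch of a network word count. It is written for illustration and is not a copy of StructuredNetworkWordCount.cs. The class name `StreamingSketch` and the host/port values are assumptions, and it needs the Microsoft.Spark NuGet package and a Spark installation to run.

```csharp
using Microsoft.Spark.Sql;
using Microsoft.Spark.Sql.Streaming;
using static Microsoft.Spark.Sql.Functions;

// Hypothetical streaming app: counts words arriving on a TCP socket.
internal static class StreamingSketch
{
    public static void Main(string[] args)
    {
        // Assumed endpoint; a tool such as netcat can serve text here.
        string host = "localhost";
        int port = 9999;

        SparkSession spark = SparkSession
            .Builder()
            .AppName("Streaming sketch")
            .GetOrCreate();

        // Streaming read: each incoming line becomes a row in a "value" column.
        DataFrame lines = spark
            .ReadStream()
            .Format("socket")
            .Option("host", host)
            .Option("port", port)
            .Load();

        // Split each line on spaces, emit one row per word, then count per word.
        DataFrame wordCounts = lines
            .Select(Explode(Split(lines["value"], " ")).Alias("word"))
            .GroupBy("word")
            .Count();

        // Print the full running counts to the console after every update.
        StreamingQuery query = wordCounts
            .WriteStream()
            .OutputMode("complete")
            .Format("console")
            .Start();

        query.AwaitTermination();
    }
}
```

The main difference from a batch app is that the query keeps running and updating its results until it is stopped, because the input never "finishes".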
Beyond the sample apps, there are a few other files in the Microsoft.Spark.CSharp.Examples folder:
- IExample.cs: A common interface that each sample implements, which keeps the sample apps consistent to create and run.

  Note: When you create and run sample apps outside this repository's project, you do not need to use IExample.cs. It only provides consistency for the apps included in this repo.
- Microsoft.Spark.CSharp.Examples.csproj: The C# project file necessary for building/running all sample apps. It includes target frameworks, assembly information, and references to other C# project files referenced in the sample apps.
- Program.cs: A common entry point when running our sample apps (it contains the Main method). It also prints error messages in cases such as a sample being run without the necessary arguments.
- README.md: The doc you are currently reading.
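
The relationship between a common interface and a shared entry point can be sketched as follows. This is a simplified illustration, not the code in IExample.cs or Program.cs: the `Run(string[] args)` signature, the `HelloExample` class, the `TryRun` helper, and the dictionary-based lookup are all assumptions chosen to keep the sketch free of Spark dependencies.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a common sample interface.
internal interface IExample
{
    void Run(string[] args);
}

// Hypothetical sample that just echoes its arguments.
internal sealed class HelloExample : IExample
{
    public void Run(string[] args) =>
        Console.WriteLine($"Hello, {string.Join(" ", args)}");
}

internal static class Program
{
    // Maps a sample name (case-insensitive) to a factory for that sample.
    private static readonly Dictionary<string, Func<IExample>> s_examples =
        new Dictionary<string, Func<IExample>>(StringComparer.OrdinalIgnoreCase)
        {
            ["Hello"] = () => new HelloExample(),
        };

    // Runs the sample named by args[0], passing the remaining arguments on.
    // Returns false and prints usage if the name is missing or unknown.
    public static bool TryRun(string[] args)
    {
        if (args.Length == 0 ||
            !s_examples.TryGetValue(args[0], out Func<IExample> create))
        {
            Console.Error.WriteLine("Usage: <example name> [example args]");
            Console.Error.WriteLine(
                "Known examples: " + string.Join(", ", s_examples.Keys));
            return false;
        }

        string[] rest = new string[args.Length - 1];
        Array.Copy(args, 1, rest, 0, rest.Length);
        create().Run(rest);
        return true;
    }

    public static int Main(string[] args) => TryRun(args) ? 0 : 1;
}
```

Because every sample exposes the same method, the entry point only has to resolve a name and forward the remaining arguments, which is also the natural place to report missing or invalid arguments.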