CDOT Wiki β

GPU621/Apache Spark Fall 2022

3 bytes removed, 13:52, 6 December 2022
Spark Ecosystem
===2. Spark SQL===
Spark SQL adds a data abstraction called SchemaRDD (renamed DataFrame in Spark 1.3) on top of Spark Core to support structured and semi-structured data. It provides a domain-specific language for manipulating SchemaRDDs from Scala, Java, or Python, and it also accepts SQL queries through a command-line interface and an ODBC/JDBC server.
 
===3. Spark Streaming===
Spark Streaming uses Spark Core's fast scheduling to perform streaming analytics. It ingests data in small batches (micro-batches) and applies RDD transformations to each batch. This design lets application code written for batch analysis be reused for streaming analysis on the same engine.
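The micro-batch idea can be sketched without Spark at all. The plain-Python example below is a conceptual model, not Spark's implementation: it cuts a stream into small batches and applies the same `word_count` function that would be used for a batch job. All function names are hypothetical. It also assumes batches are cut by record count, whereas Spark Streaming cuts them by a time interval.

```python
from collections import Counter
from itertools import islice


def word_count(lines):
    """Batch logic: count the words in an iterable of text lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def micro_batches(stream, batch_size):
    """Cut a stream into consecutive lists of at most batch_size records."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def streaming_word_count(stream, batch_size):
    """Apply the unchanged batch logic to every micro-batch of the stream."""
    return [word_count(batch) for batch in micro_batches(stream, batch_size)]


lines = ["spark streaming", "spark core", "spark sql", "mllib"]
per_batch = streaming_word_count(lines, batch_size=2)

# Merging the per-batch results gives the same answer as one batch run.
total = sum(per_batch, Counter())
```

Because `word_count` is shared by both paths, the merged streaming result matches `word_count(lines)` computed over the whole input at once, which is the code reuse the paragraph above describes.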
 
 
===4. MLlib===