Changes

GPU621/Apache Spark Fall 2022

3 bytes removed, 13:52, 6 December 2022

→‎Spark Ecosystem

===2. Spark SQL===

Spark SQL is a component on top of Spark Core that introduces a data abstraction called SchemaRDD (renamed DataFrame as of Spark 1.3), which supports structured and semi-structured data. It provides a domain-specific language (DSL) for manipulating SchemaRDDs in Scala, Java, or Python, and it also supports SQL queries through a command-line interface and an ODBC/JDBC server.
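The two access styles described above (host-language code versus SQL text over the same schema-bearing rows) can be illustrated with a small analogy. This is not Spark code: Python's built-in sqlite3 module stands in for the SQL engine, a generator expression stands in for the language-level DSL, and the people table and its columns are hypothetical sample data.

```python
import sqlite3

# Structured data: every row follows the same (name, age) schema.
# The rows are hypothetical sample data for illustration only.
people = [("Alice", 34), ("Bob", 19), ("Carol", 45)]

# Programmatic style: filter and project in host-language code,
# loosely analogous to using a DSL from Scala, Java, or Python.
adults_programmatic = sorted(name for name, age in people if age > 30)

# SQL style: load the same rows into a table and query them with SQL text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?)", people)
rows = conn.execute(
    "SELECT name FROM people WHERE age > 30 ORDER BY name"
).fetchall()
adults_sql = [name for (name,) in rows]
conn.close()

# Both styles describe the same query, so the results should match.
print(adults_programmatic == adults_sql)
```

In Spark SQL itself both styles run on the same distributed engine. Here they are two unrelated code paths that merely produce the same answer, so treat this only as a picture of the programming model.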

===3. Spark Streaming===

Spark Streaming uses Spark Core's fast scheduling capability to perform streaming analytics. It ingests data in mini-batches and performs RDD transformations on each mini-batch. Because of this design, application code written for batch analysis can be reused for streaming analysis on the same engine.
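The reuse of batch code for streaming can be sketched in plain Python. This is a conceptual illustration under stated assumptions, not the Spark Streaming API: count_words and micro_batches are hypothetical helpers, and batches are cut by record count here, whereas Spark Streaming groups incoming data by time interval.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def count_words(lines: Iterable[str]) -> Counter:
    """Batch logic: count words in any collection of text lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def micro_batches(stream: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Cut an incoming stream of lines into small fixed-size batches."""
    batch = []
    for line in stream:
        batch.append(line)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


lines = ["spark streaming", "spark core", "spark sql", "mllib", "spark"]

# Batch analysis: one pass over the full dataset.
batch_result = count_words(lines)

# Streaming analysis: the same function applied to each mini-batch,
# with the per-batch results merged into a running total.
stream_result = Counter()
for batch in micro_batches(iter(lines), batch_size=2):
    stream_result += count_words(batch)

print(batch_result == stream_result)
```

The point of the sketch is that count_words is written once and never needs to know whether it is handed the whole dataset or a single mini-batch.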

===4. MLlib===

RobinYu

