
Embracing Open Source, and We Mean It: NetEase EasyData (网易易数) 2020 Apache Spark Contribution Summary

 3 years ago
source link: https://my.oschina.net/u/4565392/blog/4839806

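Third, comprehensive support for enterprise-grade features. Building on its own architecture, Kyuubi provides authentication and authorization services to keep data secure; a robust high-availability service to keep the service available; multi-tenant resource isolation, giving end-to-end isolation of compute resources and data; and two-level elastic resource management, which keeps costs under control while improving resource utilization and meets the performance and response-time requirements of interactive queries, batch processing, point lookups, full-table scans, and other scenarios.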

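Fourth, rich ecosystem support and development. A good open source product depends on a good open source ecosystem. While embracing top-tier open source ecosystems such as Spark, Kyuubi takes advantage of the openness of those projects so that it can quickly extend to their existing ecosystems as well as to new features and new ecosystems, such as cloud-native support and data lakes (Data Lake/Lake House). Kyuubi is also actively building and improving its own ecosystem to fill gaps along the pipeline: for example, the https://github.com/netease/spark-ranger project closes the access-control gap in the big data pipeline, and the https://github.com/netease/spark-greenplum project addresses the performance of data exchange between Spark and the traditional database PostgreSQL and the MPP database Greenplum.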

Kyuubi source repository: https://github.com/netease/kyuubi

2020 was an extraordinary year. The threat from nature made us deeply aware of how important open cooperation is for all of humanity.

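At its core, an open source community is its developers. Embracing open source and building an open source ecosystem fits NetEase's mission and vision: bringing people's strengths together through the internet, and creating a better life through technological innovation.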

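Of course, beyond serving the company's own interests as mentioned above, we also take part in open source out of passion: giving our all to what we love.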

Appendix: main contributions by NetEase engineers to Apache Spark as of the end of 2020

* ae1d05927a [SPARK-33892][SQL] Display char/varchar in DESC and SHOW CREATE TABLE
* 2287f56a3e [SPARK-33879][SQL] Char Varchar values fails w/ match error as partition columns
* a3dd8dacee [SPARK-33877][SQL] SQL reference documents for INSERT w/ a column list
* 6da5cdf1db [SPARK-33876][SQL] Add length-check for reading char/varchar from tables w/ a external location
* f5fd10b1bc [SPARK-33834][SQL] Verify ALTER TABLE CHANGE COLUMN with Char and Varchar
* dd44ba5460 [SPARK-32976][SQL][FOLLOWUP] SET and RESTORE hive.exec.dynamic.partition.mode for HiveSQLInsertTestSuite to avoid flakiness
* c17c76dd16 [SPARK-33599][SQL][FOLLOWUP] FIX Github Action with unidoc
* 728a1298af [SPARK-33806][SQL] limit partition num to 1 when distributing by foldable expressions
* 205d8e40bc [SPARK-32991][SQL] [FOLLOWUP] Reset command relies on session initials first
* 4d47ac4b4b [SPARK-33705][SQL][TEST] Fix HiveThriftHttpServerSuite flakiness
* 31e0baca30 [SPARK-33740][SQL] hadoop configs in hive-site.xml can overrides pre-existing hadoop ones
* c88eddac3b [SPARK-33641][SQL][DOC][FOLLOW-UP] Add migration guide for CHAR VARCHAR types
* da72b87374 [SPARK-33641][SQL] Invalidate new char/varchar types in public APIs that produce incorrect results
* 2da72593c1 [SPARK-32976][SQL] Support column list in INSERT statement
* cdd8e51742 [SPARK-33419][SQL] Unexpected behavior when using SET commands before a query in SparkSession.sql
* 4335af075a [MINOR][DOC] spark.executor.memoryOverhead is not cluster-mode only
* 036c11b0d4 [SPARK-33397][YARN][DOC] Fix generating md to html for available-patterns-for-shs-custom-executor-log-url
* 82d500a05c [SPARK-33193][SQL][TEST] Hive ThriftServer JDBC Database MetaData API Behavior Auditing
* e21bb710e5 [SPARK-32991][SQL] Use conf in shared state as the original configuraion for RESET
* dcb0820433 [SPARK-32785][SQL][DOCS][FOLLOWUP] Update migaration guide for incomplete interval literals
* 2507301705 [SPARK-33159][SQL] Use hive-service-rpc as dependency instead of inlining the generated code
* 17d309dfac [SPARK-32963][SQL] empty string should be consistent for schema name in SparkGetSchemasOperation
* e2a740147c [SPARK-32874][SQL][FOLLOWUP][TEST-HIVE1.2][TEST-HADOOP2.7] Fix spark-master-test-sbt-hadoop-2.7-hive-1.2
* 9e9d4b6994 [SPARK-32905][CORE][YARN] ApplicationMaster fails to receive UpdateDelegationTokens message
* 316242b768 [SPARK-32874][SQL][TEST] Enhance result set meta data check for execute statement operation with thrift server
* 5669b212ec [SPARK-32840][SQL] Invalid interval value can happen to be just adhesive with the unit
* 9ab8a2c36d [SPARK-32826][SQL] Set the right column size for the null type in SparkGetColumnsOperation
* de44e9cfa0 [SPARK-32785][SQL] Interval with dangling parts should not results null
* 1fba286407 [SPARK-32781][SQL] Non-ASCII characters are mistakenly omitted in the middle of intervals
* 6dacba7fa0 [SPARK-32733][SQL] Add extended information - arguments/examples/since/notes of expressions to the remarks field of GetFunctionsOperation
* 0626901bcb [SPARK-32729][SQL][DOCS] Add missing since version for math functions
* f14f3742e0 [SPARK-32696][SQL][TEST-HIVE1.2][TEST-HADOOP2.7] Get columns operation should handle interval column properly
* 1f3bb51757 [SPARK-32683][DOCS][SQL] Fix doc error and add migration guide for datetime pattern F
* c26a97637f Revert "[SPARK-32412][SQL] Unify error handling for spark thrift serv…
* 1b6f482adb [SPARK-32492][SQL][FOLLOWUP][TEST-MAVEN] Fix jenkins maven jobs
* 7f5326c082 [SPARK-32492][SQL] Fulfill missing column meta information COLUMN_SIZE /DECIMAL_DIGITS/NUM_PREC_RADIX/ORDINAL_POSITION for thriftserver client tools
* 3deb59d5c2 [SPARK-31709][SQL] Proper base path for database/table location when it is a relative path
* f4800406a4 [SPARK-32406][SQL][FOLLOWUP] Make RESET fail against static and core configs
* 510a1656e6 [SPARK-32412][SQL] Unify error handling for spark thrift server operations
* d315ebf3a7 [SPARK-32424][SQL] Fix silent data change for timestamp parsing if overflow happens
* d3596c04b0 [SPARK-32406][SQL] Make RESET syntax support single configuration reset
* b151194299 [SPARK-32392][SQL] Reduce duplicate error log for executing sql statement operation in thrift server
* 29b7eaa438 [MINOR][SQL] Fix warning message for ThriftCLIService.GetCrossReference and GetPrimaryKeys
* efa70b8755 [SPARK-32145][SQL][FOLLOWUP] Fix type in the error log of SparkOperation
* bdeb626c5a [SPARK-32272][SQL] Add SQL standard command SET TIME ZONE
* 4609f1fdab [SPARK-32207][SQL] Support 'F'-suffixed Float Literals
* 59a70879c0 [SPARK-32145][SQL][TEST-HIVE1.2][TEST-HADOOP2.7] ThriftCLIService.GetOperationStatus should include exception's stack trace to the error message
* 9f8e15bb2e [SPARK-32034][SQL] Port HIVE-14817: Shutdown the SessionManager timeoutChecker thread properly upon shutdown
* 93529a8536 [SPARK-31957][SQL] Cleanup hive scratch dir for the developer api startWithContext
* abc8ccc37b [SPARK-31926][SQL][TESTS][FOLLOWUP][TEST-HIVE1.2][TEST-MAVEN] Fix concurrency issue for ThriftCLIService to getPortNumber
* a0187cd6b5 [SPARK-31926][SQL][TEST-HIVE1.2][TEST-MAVEN] Fix concurrency issue for ThriftCLIService to getPortNumber
* 22dda6e18e [SPARK-31939][SQL][TEST-JAVA11] Fix Parsing day of year when year field pattern is missing
* 6a424b93e5 [SPARK-31830][SQL] Consistent error handling for datetime formatting and parsing functions
* 02f32cfae4 [SPARK-31926][SQL][TEST-HIVE1.2] Fix concurrency issue for ThriftCLIService to getPortNumber
* fc6af9d900 [SPARK-31867][SQL][FOLLOWUP] Check result differences for datetime formatting
* 9d5b5d0a58 [SPARK-31879][SQL][TEST-JAVA11] Make week-based pattern invalid for formatting too
* afcc14c6d2 [SPARK-31896][SQL] Handle am-pm timestamp parsing when hour is missing
* afe95bd9ad [SPARK-31892][SQL] Disable week-based date filed for parsing
* c59f51bcc2 [SPARK-31879][SQL] Using GB as default Locale for datetime formatters
* 547c5bf552 [SPARK-31867][SQL] Disable year type datetime patterns which are longer than 10
* fe1da296da [SPARK-31833][SQL][TEST-HIVE1.2] Set HiveThriftServer2 with actual port while configured 0
* 311fe6a880 [SPARK-31835][SQL][TESTS] Add zoneId to codegen related tests in DateExpressionsSuite
* 695cb617d4 [SPARK-31771][SQL] Disable Narrow TextStyle for datetime pattern 'G/M/L/E/u/Q/q'
* 0df8dd6073 [SPARK-30352][SQL] DataSourceV2: Add CURRENT_CATALOG function
* 7e2ed40d58 [SPARK-31759][DEPLOY] Support configurable max number of rotate logs for spark daemons
* 1f29f1ba58 [SPARK-31684][SQL] Overwrite partition failed with 'WRONG FS' when the target partition is not belong to the filesystem as same as the table
* 1d66085a93 [SPARK-31289][TEST][TEST-HIVE1.2] Eliminate org.apache.spark.sql.hive.thriftserver.CliSuite flakiness
* 503faa24d3 [SPARK-31715][SQL][TEST] Fix flaky SparkSQLEnvSuite that sometimes varies single derby instance standard
* ce714d8189 [SPARK-31678][SQL] Print error stack trace for Spark SQL CLI when error occurs
* b31ae7bb0b [SPARK-31615][SQL] Pretty string output for sql method of RuntimeReplaceable expressions
* bd6b53cc0b [SPARK-31631][TESTS] Fix test flakiness caused by MiniKdc which throws 'address in use' BindException with retry
* 9241f8282f [SPARK-31586][SQL][FOLLOWUP] Restore SQL string for datetime - interval operations
* ea525fe8c0 [SPARK-31597][SQL] extracting day from intervals should be interval.days + days in interval.microsecond
* 295d866969 [SPARK-31596][SQL][DOCS] Generate SQL Configurations from hive module to configuration doc
* 54996be4d2 [SPARK-31527][SQL][TESTS][FOLLOWUP] Add a benchmark test for datetime add/subtract interval operations
* beec8d535f [SPARK-31586][SQL] Replace expression TimeSub(l, r) with TimeAdd(l -r)
* 5ba467ca1d [SPARK-31550][SQL][DOCS] Set nondeterministic configurations with general meanings in sql configuration doc
* ebc8fa50d0 [SPARK-31527][SQL] date add/subtract interval only allow those day precision in ansi mode
* 7959808e96 [SPARK-31564][TESTS] Fix flaky AllExecutionsPageSuite for checking 1970
* f92652d0b5 [SPARK-31528][SQL] Remove millennium, century, decade from trunc/date_trunc fucntions
* caf3ab8411 [SPARK-31552][SQL] Fix ClassCastException in ScalaReflection arrayClassFor
* 8424f55229 [SPARK-31532][SQL] Builder should not propagate static sql configs to the existing active or default SparkSession
* 8dc2c0247b [SPARK-31522][SQL] Hive metastore client initialization related configurations should be static
* 3b5792114a [SPARK-31474][SQL][FOLLOWUP] Replace _FUNC_ placeholder with functionname in the note field of expression info
* 37d2e037ed [SPARK-31507][SQL] Remove uncommon fields support and update some fields with meaningful names for extract function
* 2c2062ea7c [SPARK-31498][SQL][DOCS] Dump public static sql configurations through doc generation
* 1985437110 [SPARK-31474][SQL] Consistency between dayofweek/dow in extract exprsession and dayofweek function
* 77cb7cde0d [SPARK-31469][SQL][TESTS][FOLLOWUP] Remove unsupported fields from ExtractBenchmark
* 697083c051 [SPARK-31469][SQL] Make extract interval field ANSI compliance
* 31b907748d [SPARK-31414][SQL][DOCS][FOLLOWUP] Update default datetime pattern for json/csv APIs documentations
* d65f534c5a [SPARK-31414][SQL] Fix performance regression with new TimestampFormatter for json and csv time parsing
* a454510917 [SPARK-31392][SQL] Support CalendarInterval to be reflect to CalendarntervalType
* 3c94a7c8f5 [SPARK-29311][SQL][FOLLOWUP] Add migration guide for extracting second from datetimes
* 1ce584f6b7 [SPARK-31321][SQL] Remove SaveMode check in v2 FileWriteBuilder
* f376d24ea1 [SPARK-31280][SQL] Perform propagating empty relation after RewritePredicateSubquery
* 5945d46c11 [SPARK-31225][SQL] Override sql method of OuterReference
* 8be16907c2 [SPARK-31170][SQL] Spark SQL Cli should respect hive-site.xml and spark.sql.warehouse.dir
* 44bd36ad7b [SPARK-31234][SQL] ResetCommand should reset config to sc.conf only
* b024a8a69e [MINOR][DOCS] Fix some links for python api doc
* 336621e277 [SPARK-31258][BUILD] Pin the avro version in SBT
* f81f11822c [SPARK-31189][R][DOCS][FOLLOWUP] Replace Datetime pattern links in R doc
* 88ae6c4481 [SPARK-31189][SQL][DOCS] Fix errors and missing parts for datetime pattern document
* 3d695954e5 [SPARK-31150][SQL][FOLLOWUP] handle ' as escape for text
* 57fcc49306 [SPARK-31176][SQL] Remove support for 'e'/'c' as datetime pattern charactar
* f1d27cdd91 [SPARK-31119][SQL] Add interval value support for extract expression as extract source
* 5bc0d76591 [SPARK-31170][SQL] Spark SQL Cli should respect hive-site.xml and spark.sql.warehouse.dir
* 0946a9514f [SPARK-31150][SQL] Parsing seconds fraction with variable length for timestamp
* fbc9dc7e9d [SPARK-31129][SQL][TESTS] Fix IntervalBenchmark and DateTimeBenchmark
* 7b4b29e8d9 [SPARK-31131][SQL] Remove the unnecessary config spark.sql.legacy.timeParser.enabled
* 18f2730874 [SPARK-31066][SQL][TEST-HIVE1.2] Disable useless and uncleaned hive SessionState initialization parts
* 2b46662bd0 [SPARK-31111][SQL][TESTS] Fix interval output issue in ExtractBenchmark
* 3bd6ebff81 [SPARK-30189][SQL] Interval from year-month/date-time string should handle whitespaces
* f45ae7f2c5 [SPARK-31038][SQL] Add checkValue for spark.sql.session.timeZone
* 3edab6cc1d [MINOR][CORE] Expose the alias -c flag of --conf for spark-submit
* 1fac06c430 Revert "[SPARK-30808][SQL] Enable Java 8 time API in Thrift server"
* 1383bd459a [SPARK-30970][K8S][CORE] Fix NPE while resolving k8s master url
* 2d2706cb86 [SPARK-30956][SQL][TESTS] Use intercept instead of try-catch to assert failures in IntervalUtilsSuite
* a6026c830a [MINOR][BUILD] Fix make-distribution.sh to show usage without 'echo' cmd
* 761209c1f2 [SPARK-30919][SQL] Make interval multiply and divide's overflow behavior consistent with other operations
* 46019b6e6c [MINOR][DOCS] Fix fabric8 version in documentation
* 0353cbf092 [MINOR][DOC] Fix 2 style issues in running-on-kubernetes doc
* 58b9ca1e6f [SPARK-30592][SQL][FOLLOWUP] Add some round-trip test cases
* 3228d723a4 [SPARK-30603][SQL] Move RESERVED_PROPERTIES from SupportsNamespaces and TableCatalog to CatalogV2Util
* 8e280cebf2 [SPARK-30592][SQL] Interval support for csv and json funtions
* f2d71f5838 [SPARK-30591][SQL] Remove the nonstandard SET OWNER syntax for namespaces
* af705421db [SPARK-30593][SQL] Revert interval ISO/ANSI SQL Standard output since we decide not to follow ANSI and no round trip
* 730388b369 [SPARK-30547][SQL][FOLLOWUP] Update since anotation for CalendarInterval class
* 0388b7a3ec [SPARK-30568][SQL] Invalidate interval type as a field table schema
* 24efa43826 [SPARK-30019][SQL] Add the owner property to v2 table
* 4806cc5bd1 [SPARK-30547][SQL] Add unstable annotation to the CalendarInterval class
* 17857f9b8b [SPARK-30551][SQL] Disable comparison for interval type
* 82f25f5855 [SPARK-30507][SQL] TableCalalog reserved properties shoudn't be changed via options or tblpropeties
* bcf07cbf5f [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax
* c37312342e [SPARK-30183][SQL] Disallow to specify reserved properties in CREATE/ALTER NAMESPACE syntax
* 8c121b0827 [SPARK-30431][SQL] Update SqlBase.g4 to create commentSpec pattern like locationSpec
* c49388a484 [SPARK-30214][SQL] A new framework to resolve v2 commands
* e04309cb1f [SPARK-30341][SQL] Overflow check for interval arithmetic operations
* f0bf2eb006 [SPARK-30356][SQL] Codegen support for the function str_to_map
* da65a955ed [SPARK-30266][SQL] Avoid match error and int overflow in ApproximatePercentile and Percentile
* 12249fcdc7 [SPARK-30301][SQL] Fix wrong results when datetimes as fields of complex types
* d38f816748 [MINOR][SQL][DOC] Fix some format issues in Dataset API Doc
* cc7f1eb874 [SPARK-29774][SQL][FOLLOWUP] Add a migration guide for date_add and date_sub
* bf7215c510 [SPARK-30066][SQL][FOLLOWUP] Remove size field for interval column cache
* d3ec8b1735 [SPARK-30066][SQL] Support columnar execution on interval types
* 8f0eb7dc86 [SPARK-29587][SQL] Support SQL Standard type real as float(4) numeric as decimal
* 24c4ce1e64 [SPARK-28351][SQL][FOLLOWUP] Remove 'DELETE FROM' from unsupportedHiveNativeCommands
* e88d74052b [SPARK-30147][SQL] Trim the string when cast string type to booleans
* 35bab33984 [SPARK-30121][BUILD] Fix memory usage in sbt build script
* b9cae37750 [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
* 332e252a14 [SPARK-29425][SQL] The ownership of a database should be respected
* 65552a81d1 [SPARK-30083][SQL] visitArithmeticUnary should wrap PLUS case with UnaryPositive for type checking
* 39291cff95 [SPARK-30048][SQL] Enable aggregates with interval type values for RelationalGroupedDataset
* 4e073f3c50 [SPARK-30047][SQL] Support interval types in UnsafeRow
* 4fd585d2c5 [SPARK-30008][SQL] The dataType of collect_list/collect_set aggs should be ArrayType(_, false)
* ed0c33fdd4 [SPARK-30026][SQL] Whitespaces can be identified as delimiters in interval string
* 8b0121bea8 [MINOR][DOC] Fix the CalendarIntervalType description
* de21f28f8a [SPARK-29986][SQL] casting string to date/timestamp/interval should trim all whitespaces
* 5cf475d288 [SPARK-30000][SQL] Trim the string when cast string type to decimals
* 2dd6807e42 [SPARK-28023][SQL] Add trim logic in UTF8String's toInt/toLong to make it consistent with other string-numeric casting
* d555f8fcc9 [SPARK-29961][SQL][FOLLOWUP] Remove useless test for VectorUDT
* 7a70670345 [SPARK-29961][SQL] Implement builtin function - typeof
* 79ed4ae2db [SPARK-29926][SQL] Fix weird interval string whose value is only a dangling decimal point
* ea010a2bc2 [SPARK-29873][SQL][TEST][FOLLOWUP] set operations should not escape when regen golden file with --SET --import both specified
* ae6b711b26 [SPARK-29941][SQL] Add ansi type aliases for char and decimal
* 50f6d930da [SPARK-29870][SQL] Unify the logic of multi-units interval string to CalendarInterval
* 5cebe587c7 [SPARK-29783][SQL] Support SQL Standard/ISO_8601 output style for interval type
* 0c68578fa9 [SPARK-29888][SQL] new interval string parser shall handle numeric with only fractional part
* 15a72f3755 [SPARK-29287][CORE] Add LaunchedExecutor message to tell driver which executor is ready for making offers
* f926809a1f [SPARK-29390][SQL] Add the justify_days(), justify_hours() and justif_interval() functions
* d99398e9f5 [SPARK-29855][SQL] typed literals with negative sign with proper result or exception
* d06a9cc4bd [SPARK-29822][SQL] Fix cast error when there are white spaces between signs and values
* e026412d9c [SPARK-29679][SQL] Make interval type comparable and orderable
* e7f7990bc3 [SPARK-29688][SQL] Support average for interval type values
* 0a03839366 [SPARK-29787][SQL] Move methods add/subtract/negate from CalendarInterval to IntervalUtils
* 9562b26914 [SPARK-29757][SQL] Move calendar interval constants together
* 3437862975 [SPARK-29387][SQL][FOLLOWUP] Fix issues of the multiply and divide for intervals
* 4615769736 [SPARK-29603][YARN] Support application priority for YARN priority scheduling
* 44b8fbcc58 [SPARK-29663][SQL] Support sum with interval type values
* 8cf76f8d61 [SPARK-29285][SHUFFLE] Temporary shuffle files should be able to handle disk failures
* 5ba17d09ac [SPARK-29722][SQL] Non reversed keywords should be able to be used in high order functions
* dc987f0c8b [SPARK-29653][SQL] Fix MICROS_PER_MONTH in IntervalUtils
* 8e667db5d8 [SPARK-29629][SQL] Support typed integer literal expression
* 9a46702791 [SPARK-29554][SQL] Add `version` SQL function
* 0cf4f07c66 [SPARK-29545][SQL] Add support for bit_xor aggregate function
* 5b4d9170ed [SPARK-27879][SQL] Add support for bit_and and bit_or aggregates
* ef4c298cc9 [SPARK-29405][SQL] Alter table / Insert statements should not change a table's ownership
* 4b902d3b45 [SPARK-29491][SQL] Add bit_count function support
* 6d4cc7b855 [SPARK-27880][SQL] Add bool_and for every and bool_or for any as function aliases
* 02c5b4f763 [SPARK-28947][K8S] Status logging not happens at an interval for liveness
* f4c73b7c68 [SPARK-27301][DSTREAM] Shorten the FileSystem cached life cycle to the cleanup method inner scope
* ac9c0536bc [SPARK-26794][SQL] SparkSession enableHiveSupport does not point to hive but in-memory while the SparkContext exists
* f8346d2fc0 [SPARK-25174][YARN] Limit the size of diagnostic message for am to unregister itself from rm
* 4a2b15f0af [SPARK-24241][SUBMIT] Do not fail fast when dynamic resource allocation enabled with 0 executor
* a7755fd8ce [SPARK-23639][SQL] Obtain token before init metastore client in SparkSQL CLI
* 189f56f3dc [SPARK-23383][BUILD][MINOR] Make a distribution should exit with usage while detecting wrong options
* eefec93d19 [SPARK-23295][BUILD][MINOR] Exclude Waring message when generating versions in make-distribution.sh
* dd52681bf5 [SPARK-23253][CORE][SHUFFLE] Only write shuffle temporary index file when there is not an existing one
* 793841c6b8 [SPARK-21771][SQL] remove useless hive client in SparkSQLEnv
* 9fa703e893 [SPARK-22950][SQL] Handle ChildFirstURLClassLoader's parent
* 28ab5bf597 [SPARK-22487][SQL][HIVE] Remove the unused HIVE_EXECUTION_VERSION property
* c755b0d910 [SPARK-22463][YARN][SQL][HIVE] add hadoop/hive/hbase/etc configuration files in SPARK_CONF_DIR to distribute archive
* ee571d79e5 [SPARK-22466][SPARK SUBMIT] export SPARK_CONF_DIR while conf is default
* 99e32f8ba5 [SPARK-22224][SQL] Override toString of KeyValue/Relational-GroupedDataset
* 581200af71 [SPARK-21428][SQL][FOLLOWUP] CliSessionState should point to the actual metastore not a dummy one
* b83b502c41 [SPARK-21428] Turn IsolatedClientLoader off while using builtin Hive jars for reusing CliSessionState
* 2387f1e316 [SPARK-21675][WEBUI] Add a navigation bar at the bottom of the Details for Stage Page
* e9d268f63e [SPARK-20096][SPARK SUBMIT][MINOR] Expose the right queue name not null if set by --conf or configure file
* 7363dde634 [SPARK-19626][YARN] Using the correct config to set credentials update time
* e33053ee00 [SPARK-11583] [CORE] MapStatus Using RoaringBitmap More Properly
* 7466031632 [SPARK-32106][SQL] Implement script transform in sql/core
* 0603913c66 [SPARK-33593][SQL] Vector reader got incorrect data with binary partition value
* 25c6cc25f7 [SPARK-26341][WEBUI] Expose executor memory metrics at the stage level, in the Stages tab
* 5f9a7fea06 [SPARK-33428][SQL] Conv UDF use BigInt to avoid Long value overflow
* d7f4b2ad50 [SPARK-28704][SQL][TEST] Add back Skiped HiveExternalCatalogVersionsSuite in HiveSparkSubmitSuite at JDK9+
* 47326ac1c6 [SPARK-28704][SQL][TEST] Add back Skiped HiveExternalCatalogVersionsSuite in HiveSparkSubmitSuite at JDK9+
* dd32f45d20 [SPARK-31069][CORE] Avoid repeat compute `chunksBeingTransferred` cause hight cpu cost in external shuffle service when `maxChunksBeingTransferred` use default value
* 34f5e7ce77 [SPARK-33302][SQL] Push down filters through Expand
* 0c943cd2fb [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then schema size
* e43cd8ccef [SPARK-32388][SQL] TRANSFORM with schema-less mode should keep the same with hive
* a1629b4a57 [SPARK-32852][SQL] spark.sql.hive.metastore.jars support HDFS location
* f8277d3aa3 [SPARK-32069][CORE][SQL] Improve error message on reading unexpected directory
* ddc7012b3d [SPARK-32243][SQL] HiveSessionCatalog call super.makeFunctionExpression should throw earlier when got Spark UDAF Invalid arguments number error
* 0b5a379c1f [SPARK-33023][CORE] Judge path of Windows need add condition `Utils.isWindows`
* c336ddfdb8 [SPARK-32867][SQL] When explain, HiveTableRelation show limited message
* 5e6173ebef [SPARK-31670][SQL] Trim unnecessary Struct field alias in Aggregate/GroupingSets
* 55ce49ed28 [SPARK-32400][SQL][TEST][FOLLOWUP][TEST-MAVEN] Fix resource loading error in HiveScripTransformationSuite
* 9808c15eec [SPARK-32608][SQL][FOLLOW-UP][TEST-HADOOP2.7][TEST-HIVE1.2] Script Transform ROW FORMAT DELIMIT value should format value
* c75a82794f [SPARK-32667][SQL] Script transform 'default-serde' mode should pad null value to filling column
* 6dae11d034 [SPARK-32607][SQL] Script Transformation ROW FORMAT DELIMITED `TOK_TABLEROWFORMATLINES` only support '\n'
* 03e2de99ab [SPARK-32608][SQL] Script Transform ROW FORMAT DELIMIT value should format value
* 643cd876e4 [SPARK-32352][SQL] Partially push down support data filter if it mixed in partition filters
* 4cf8c1d07d [SPARK-32400][SQL] Improve test coverage of HiveScriptTransformationExec
* d251443a02 [SPARK-32403][SQL] Refactor current ScriptTransformationExec
* 5521afbd22 [SPARK-32220][SQL][FOLLOW-UP] SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result
* 6d499647b3 [SPARK-32105][SQL] Refactor current ScriptTransformationExec code
* 09789ff725 [SPARK-31226][CORE][TESTS] SizeBasedCoalesce logic will lose partition
* 560fe1f54c [SPARK-32220][SQL] SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result
* 15fb5d7677 [SPARK-28169][SQL] Convert scan predicate condition to CNF
* 0d9faf602e [SPARK-31655][BUILD] Upgrade snappy-java to 1.1.7.5
* 6bc8d84130 [SPARK-29492][SQL] Reset HiveSession's SessionState conf's ClassLoader when sync mode
* 246c398d59 [SPARK-30435][DOC] Update doc of Supported Hive Features
* 3eade744f8 [SPARK-29800][SQL] Rewrite non-correlated EXISTS subquery use ScalaSubquery to optimize perf
* da27f91560 [SPARK-29957][TEST] Reset MiniKDC's default enctypes to fit jdk8/jdk11
* 6146dc4562 [SPARK-29874][SQL] Optimize Dataset.isEmpty()
* eb79af8dae [SPARK-29145][SQL][FOLLOW-UP] Move tests from `SubquerySuite` to `subquery/in-subquery/in-joins.sql`
* e524a3a223 [SPARK-29742][BUILD] Update checkstyle plugin's check dir scope
* d6e33dc377 [SPARK-29599][WEBUI] Support pagination for session table in JDBC/ODBC Tab
* 67cf0433ee [SPARK-29145][SQL] Support sub-queries in join conditions
* 484f93e255 [SPARK-29530][SQL] Make SQLConf in SQL parse process thread safe
* 9a3dccae72 [SPARK-29379][SQL] SHOW FUNCTIONS show '!=', '<>' , 'between', 'case'
* ef81525a1a [SPARK-29308][BUILD] Update deps in dev/deps/spark-deps-hadoop-3.2 for hadoop-3.2
* 178a1f3558 [SPARK-29305][BUILD] Update LICENSE and NOTICE for Hadoop 3.2
* 0cf2f48dfe [SPARK-29022][SQL] Fix SparkSQLCLI can not add jars by AddJarCommand
* 1d4b2f010b [SPARK-29247][SQL] Redact sensitive information in when construct HiveClientHive.state
* cc852d4eec [SPARK-29015][SQL][TEST-HADOOP3.2] Reset class loader after initializing SessionState for built-in Hive 2.3
* d22768a6be [SPARK-29036][SQL] SparkThriftServer cancel job after execute() thread interrupted
* fe4bee8fd8 [SPARK-29162][SQL] Simplify NOT(IsNull(x)) and NOT(IsNotNull(x))
* 54d3f6e7ec [SPARK-28982][SQL] Implementation Spark's own GetTypeInfoOperation
* 9f478a6832 [SPARK-28901][SQL] SparkThriftServer's Cancel SQL Operation show it in JDBC Tab UI
* 036fd3903f [SPARK-27637][SHUFFLE][FOLLOW-UP] For nettyBlockTransferService, if IOException occurred while create client, check whether relative executor is alive before retry #24533
* e853f068f6 [SPARK-33526][SQL][FOLLOWUP] Fix flaky test due to timeout and fix docs
* 1dd63dccd8 [SPARK-33860][SQL] Make CatalystTypeConverters.convertToCatalyst match special Array value
* bc46d273e0 [SPARK-33840][DOCS] Add spark.sql.files.minPartitionNum to performence tuning doc
* 839d6899ad [SPARK-33733][SQL] PullOutNondeterministic should check and collect deterministic field
* 5bab27e00b [SPARK-33526][SQL] Add config to control if cancel invoke interrupt task on thriftserver
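
Each entry in the list above has the same shape: an abbreviated commit hash, then a leading run of bracketed tags (usually a SPARK JIRA id followed by component tags such as SQL, CORE or YARN), then the commit title. As a rough illustration only, and not something taken from the original article, the sketch below shows one way such lines could be parsed and tallied by component. The names `parse_commit` and `count_components` are hypothetical helpers, and the three sample lines are copied from the list.

```python
import re
from collections import Counter

# Matches lines such as "* ae1d05927a [SPARK-33892][SQL] Display ...".
# A parenthesised ref decoration right after the hash, if any, is skipped.
COMMIT_RE = re.compile(
    r"^\*\s+(?P<sha>[0-9a-f]{7,40})\s+(?:\([^)]*\)\s+)?(?P<title>.+)$"
)
# The leading run of "[...]" groups at the start of a title.
LEADING_TAGS_RE = re.compile(r"(?:\s*\[[^\]]+\])+")
TAG_RE = re.compile(r"\[([^\]]+)\]")


def parse_commit(line):
    """Return (sha, jira_id or None, component_tags, title), or None if no match."""
    m = COMMIT_RE.match(line.strip())
    if m is None:
        return None
    title = m.group("title")
    lead = LEADING_TAGS_RE.match(title)
    tags = TAG_RE.findall(lead.group(0)) if lead else []
    # Assumption: a tag starting with "SPARK-" is the JIRA id, the rest are components.
    jira = next((t for t in tags if t.startswith("SPARK-")), None)
    components = [t for t in tags if not t.startswith("SPARK-")]
    return m.group("sha"), jira, components, title


def count_components(lines):
    """Count how often each component tag appears across the given lines."""
    counts = Counter()
    for line in lines:
        parsed = parse_commit(line)
        if parsed is not None:
            counts.update(parsed[2])
    return counts


sample = [
    "* ae1d05927a [SPARK-33892][SQL] Display char/varchar in DESC and SHOW CREATE TABLE",
    "* 9e9d4b6994 [SPARK-32905][CORE][YARN] ApplicationMaster fails to receive UpdateDelegationTokens message",
    "* 4335af075a [MINOR][DOC] spark.executor.memoryOverhead is not cluster-mode only",
]

for tag, n in sorted(count_components(sample).items()):
    print(tag, n)
```

Entries whose title does not start with a tag, such as the `Revert "..."` commits, are parsed with no JIRA id and no components under this sketch; counting the tags inside the quoted title would be a separate design choice.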

Author: NetEase EasyData (网易易数) Spark development team

