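1. Overview
Before writing a UDF, you should understand a few prerequisites of the Spark and Presto engines so that you can use UDFs correctly and efficiently.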
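2. Background
When the Spark engine executes a UDF, it pulls the UDF Jar from LAS Resource to the Spark Driver. The Driver then distributes the UDF code to the Executor nodes that run the corresponding Tasks, and the code runs locally on each Executor node.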
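Because the UDF Jar is shipped to every Executor and runs locally there, each Executor JVM loads its own copy of the UDF class. The sketch below assumes a Hive-style UDF, that is, a class exposing an `evaluate` method. This page does not specify the interface LAS expects, so the class name `MaskPhoneUdf` and the shape of `evaluate` are illustrative assumptions. The class keeps no mutable static state, since such state would not be shared across Executors.

```java
// Hypothetical example: a stateless, Hive-style UDF body written in plain Java.
// Assumption: in a real Hive-style UDF this class would typically extend
// org.apache.hadoop.hive.ql.exec.UDF from hive-exec, which both engines list as built in.
class MaskPhoneUdf {
    // Replaces everything between the first 3 and the last 4 characters with "****".
    // Returns null for null input and the input unchanged when it has fewer than 8 characters.
    public String evaluate(String phone) {
        if (phone == null) {
            return null;
        }
        if (phone.length() < 8) {
            return phone;
        }
        String head = phone.substring(0, 3);
        String tail = phone.substring(phone.length() - 4);
        return head + "****" + tail;
    }

    public static void main(String[] args) {
        MaskPhoneUdf udf = new MaskPhoneUdf();
        System.out.println(udf.evaluate("13812345678")); // prints 138****5678
    }
}
```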
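When the Presto engine executes a UDF, it runs the UDF remotely on FaaS for security and stability reasons. FaaS (Function as a Service) scales automatically, which removes the operational cost of scaling by hand. Note two points about how UDFs use FaaS:
When you first create a function (by executing a Create Function SQL statement), FaaS initialization is triggered. This usually takes about 1 min and takes longer as the UDF Jar grows.
Because FaaS scales automatically, the number of FaaS instances may shrink to 0 if you have not executed the UDF for some time. The next UDF execution then triggers a FaaS cold start, which normally completes within 2 to 3 s and likewise takes longer as the UDF Jar grows. Calls made for a period after that incur no cold-start overhead.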
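When benchmarking a Presto UDF, it can help to time the first call separately from later ones so that a one-off cold start does not distort the steady-state numbers. The following is a minimal sketch of that idea. `ColdStartTimer` and the fake query are hypothetical stand-ins; in practice the `Runnable` would submit a real query that calls the UDF.

```java
// Hypothetical helper: times the first call separately from later calls.
class ColdStartTimer {
    // Runs the call once and returns the elapsed wall-clock time in milliseconds.
    static long timeMillis(Runnable call) {
        long start = System.nanoTime();
        call.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Returns {firstCallMillis, averageMillisOfTheFollowingCalls}.
    static long[] measure(Runnable call, int warmCalls) {
        long first = timeMillis(call);
        long total = 0;
        for (int i = 0; i < warmCalls; i++) {
            total += timeMillis(call);
        }
        long warmAverage = warmCalls == 0 ? 0 : total / warmCalls;
        return new long[] { first, warmAverage };
    }

    public static void main(String[] args) {
        // Stand-in for "run a query that calls the UDF": it sleeps only on the first call.
        final boolean[] cold = { true };
        Runnable fakeUdfQuery = () -> {
            if (cold[0]) {
                cold[0] = false;
                try {
                    Thread.sleep(300);
                } catch (InterruptedException e) {
                    throw new RuntimeException(e);
                }
            }
        };
        long[] result = measure(fakeUdfQuery, 5);
        System.out.println("first=" + result[0] + " ms, warm avg=" + result[1] + " ms");
    }
}
```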
3. Creating a UDF
LAS supports creating UDFs through the UI or through DDL. For details, see Data Management.
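4. JAR packaging guide
When you use a Maven plugin to package third-party dependencies into the Jar, refer to the table below and include only the Jars that the engine does not already provide.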
Engine | Built-in Jars |
---|---|
Spark | JLargeArrays-1.5.jar
JTransforms-3.1.jar
RoaringBitmap-0.9.0.jar
ST4-4.0.4.jar
activation-1.1.1.jar
aircompressor-0.10.jar
algebra_2.12-2.0.0-M2.jar
antlr-2.7.7.jar
antlr-runtime-3.4.jar
antlr4-runtime-4.7.1.jar
aopalliance-1.0.jar
aopalliance-repackaged-2.6.1.jar
apache-log4j-extras-1.2.17.jar
apacheds-i18n-2.0.0-M15.jar
apacheds-kerberos-codec-2.0.0-M15.jar
api-asn1-api-1.0.0-M20.jar
api-util-1.0.0-M20.jar
arpack_combined_all-0.1.jar
arrow-format-0.15.1.jar
arrow-memory-0.15.1.jar
arrow-vector-0.15.1.jar
audience-annotations-0.5.0.jar
automaton-1.11-8.jar
avro-1.8.2.jar
avro-ipc-1.8.2.jar
avro-mapred-1.8.2-hadoop2.jar
aws-java-sdk-1.7.4.jar
bcprov-jdk16-1.46.jar
bec.jar
bonecp-0.8.0.RELEASE.jar
breeze-macros_2.12-1.0.jar
breeze_2.12-1.0.jar
btrace-1.0.3.jar
bytedance-data_2.12-2.0.3-SNAPSHOT.jar
caffeine-2.6.2.jar
cats-kernel_2.12-2.0.0-M4.jar
chill-java-0.9.5.jar
chill_2.12-0.9.5.jar
commons-0.0.14.jar
commons-beanutils-1.9.4.jar
commons-cli-1.2.jar
commons-codec-1.10.jar
commons-collections-3.2.2.jar
commons-compiler-3.0.16.jar
commons-compress-1.8.1.jar
commons-configuration-1.6.jar
commons-crypto-1.0.0.jar
commons-dbcp-1.4.jar
commons-digester-1.8.jar
commons-httpclient-3.1.jar
commons-io-2.4.jar
commons-lang-2.6.jar
commons-lang3-3.9.jar
commons-logging-1.1.3.jar
commons-math3-3.4.1.jar
commons-net-3.1.jar
commons-pool-1.5.4.jar
commons-pool2-2.6.2.jar
commons-text-1.6.jar
compress-lzf-1.0.3.jar
core-1.1.2.jar
curator-client-2.7.1.jar
curator-framework-2.7.1.jar
curator-recipes-2.7.1.jar
databus4j-1.2.0-SNAPSHOT.jar
datanucleus-api-jdo-3.2.6.jar
datanucleus-core-3.2.10.jar
datanucleus-rdbms-3.2.9.jar
datasketches-java-1.3.0-incubating.jar
datasketches-memory-1.2.0-incubating.jar
derby-10.12.1.1.jar
dnsjava-2.1.7.jar
dps-2.0.5.jar
druid-spark-bundle-0.18-202112.10-bd1.jar
dwarch_rto-1.3.0-SNAPSHOT.jar
flatbuffers-java-1.9.0.jar
generex-1.0.2.jar
gson-2.8.2.jar
guava-14.0.1.jar
guice-3.0.jar
guice-servlet-3.0.jar
hadoop-4mc-2.0.0-bd2-SNAPSHOT.jar
hadoop-4mc.jar
hadoop-annotations-2.6.0-cdh5.4.4-bd211.jar
hadoop-auth-2.6.0-cdh5.4.4-bd211.jar
hadoop-aws-2.6.0-cdh5.4.4-bd211.jar
hadoop-brotli-0.0.1-SNAPSHOT.jar
hadoop-client-2.6.0-cdh5.4.4-bd211.jar
hadoop-common-2.6.0-cdh5.4.4-bd211.jar
hadoop-hdfs-2.6.0-cdh5.4.4-bd211.jar
hadoop-lzo-0.4.20-SNAPSHOT.jar
hadoop-mapreduce-client-app-2.6.0-cdh5.4.4-bd211.jar
hadoop-mapreduce-client-common-2.6.0-cdh5.4.4-bd211.jar
hadoop-mapreduce-client-core-2.6.0-cdh5.4.4-bd211.jar
hadoop-mapreduce-client-jobclient-2.6.0-cdh5.4.4-bd211.jar
hadoop-mapreduce-client-shuffle-2.6.0-cdh5.4.4-bd211.jar
hadoop-nfs-2.6.0-cdh5.4.4.jar
hadoop-qlimiter-client-2.6.0-cdh5.4.4-bd211.jar
hadoop-xz-1.5-byted.jar
hadoop-yarn-api-2.6.0-cdh5.4.4-bd211.jar
hadoop-yarn-client-2.6.0-cdh5.4.4-bd211.jar
hadoop-yarn-common-2.6.0-cdh5.4.4-bd211.jar
hadoop-yarn-server-common-2.6.0-cdh5.4.4-bd211.jar
hadoop-yarn-server-web-proxy-2.6.0-cdh5.4.4-bd211.jar
hadoop-zstd-1.0.0.jar
hadoop-zstd-4mc-1.0.0.jar
hive-beeline-1.2.2-bd97.jar
hive-cli-1.2.2-bd97.jar
hive-exec-1.2.2-bd97.jar
hive-hcatalog-core.jar
hive-jdbc-1.2.2-bd97.jar
hive-metastore-1.2.2-bd97.jar
hk2-api-2.6.1.jar
hk2-locator-2.6.1.jar
hk2-utils-2.6.1.jar
htrace-core-3.0.4.jar
httpclient-4.5.6.jar
httpcore-4.4.12.jar
hudi-bytelake-bundle_2.12-0.7.0-bd23.jar
infsecclient-1.4.1.jar
istack-commons-runtime-3.0.8.jar
ivy-2.4.0.jar
jackson-annotations-2.10.0.jar
jackson-core-2.10.0.jar
jackson-core-asl-1.9.13.jar
jackson-databind-2.10.0.jar
jackson-dataformat-yaml-2.10.0.jar
jackson-datatype-jsr310-2.10.3.jar
jackson-jaxrs-1.9.13.jar
jackson-mapper-asl-1.9.13.jar
jackson-module-jaxb-annotations-2.10.0.jar
jackson-module-paranamer-2.10.0.jar
jackson-module-scala_2.12-2.10.0.jar
jackson-xc-1.9.13.jar
jakarta.activation-api-1.2.1.jar
jakarta.annotation-api-1.3.5.jar
jakarta.inject-2.6.1.jar
jakarta.validation-api-2.0.2.jar
jakarta.ws.rs-api-2.1.6.jar
jakarta.xml.bind-api-2.3.2.jar
janino-3.0.16.jar
java-jwt-3.11.0.jar
java-redis-client-2.1.6-HADOOP-SNAPSHOT.jar
javassist-3.25.0-GA.jar
javax.annotation-api-1.3.jar
javax.inject-1.jar
javax.servlet-api-3.1.0.jar
jaxb-api-2.2.2.jar
jaxb-impl-2.2.3-1.jar
jaxb-runtime-2.3.2.jar
jcl-over-slf4j-1.7.30.jar
jdo-api-3.0.1.jar
jedis-2.9.5-bd1.jar
jersey-client-2.30.jar
jersey-common-2.30.jar
jersey-container-servlet-2.30.jar
jersey-container-servlet-core-2.30.jar
jersey-hk2-2.30.jar
jersey-media-jaxb-2.30.jar
jersey-server-2.30.jar
jetty-6.1.26.cloudera.4.jar
jetty-util-6.1.26.cloudera.4.jar
jline-2.14.6.jar
joda-time-2.10.5.jar
jodd-core-3.5.2.jar
jpam-1.1.jar
json-serde-1.3-jar-with-dependencies.jar
json4s-ast_2.12-3.6.6.jar
json4s-core_2.12-3.6.6.jar
json4s-jackson_2.12-3.6.6.jar
json4s-scalap_2.12-3.6.6.jar
jsoniter-0.9.23.jar
jsr305-3.0.0.jar
jta-1.1.jar
jul-to-slf4j-1.7.30.jar
kotlin-stdlib-1.7.0.jar
kotlin-stdlib-common-1.7.0.jar
kryo-shaded-4.0.2.jar
kubernetes-client-4.9.2.jar
kubernetes-model-4.9.2.jar
kubernetes-model-common-4.9.2.jar
leveldbjni-all-1.8.jar
libfb303-0.9.3.jar
libthrift-0.12.0.jar
log4j-1.2.17.jar
logging-interceptor-3.12.6.jar
lz4-java-1.7.1.jar
machinist_2.12-0.6.8.jar
macro-compat_2.12-1.1.1.jar
metrics-core-4.1.1.jar
metrics-graphite-4.1.1.jar
metrics-jmx-4.1.1.jar
metrics-json-4.1.1.jar
metrics-jvm-4.1.1.jar
metrics4j-1.0.27-simple-SNAPSHOT.jar
mina-core-2.0.10.jar
minlog-1.3.0.jar
msgpack-core-0.7.0-p9.jar
mysql-1.2-RELEASE.jar
mysql-connector-java-5.1.38.jar
netty-3.9.9.Final.jar
netty-all-4.1.47.Final.jar
objenesis-2.5.1.jar
okhttp-3.8.1.jar
okio-1.13.0.jar
opencsv-2.3.jar
orc-core-1.5.10-nohive.jar
orc-mapreduce-1.5.10-nohive.jar
orc-shims-1.5.10.jar
oro-2.0.8.jar
osgi-resource-locator-1.0.3.jar
paranamer-2.8.jar
parquet-column-1.10.1-bd1.0.10.jar
parquet-columnfamily-1.10.1-bd1.0.10.jar
parquet-common-1.10.1-bd1.0.10.jar
parquet-encoding-1.10.1-bd1.0.10.jar
parquet-format-2.4.0-bd1.0-SNAPSHOT.jar
parquet-hadoop-1.10.1-bd1.0.10.jar
parquet-hadoop-bundle-1.6.0.jar
parquet-jackson-1.10.1-bd1.0.10.jar
protobuf-java-2.5.0.jar
py4j-0.10.9.jar
pyrolite-4.30.jar
rhino-1.7.11.jar
scala-collection-compat_2.12-2.1.1.jar
scala-compiler-2.12.10.jar
scala-library-2.12.10.jar
scala-parser-combinators_2.12-1.1.2.jar
scala-reflect-2.12.10.jar
scala-xml_2.12-1.2.0.jar
shapeless_2.12-2.3.3.jar
shims-0.9.0.jar
slf4j-api-1.7.30.jar
slf4j-log4j12-1.7.30.jar
snakeyaml-1.24.jar
snappy-java-1.1.7.5.jar
spark-avro_2.12-3.0.1-bd1-SNAPSHOT.jar
spark-catalyst_2.12-3.0.1-bd1-SNAPSHOT.jar
spark-core_2.12-3.0.1-bd1-SNAPSHOT.jar
spark-graphx_2.12-3.0.1-bd1-SNAPSHOT.jar
spark-hive-thriftserver_2.12-3.0.1-bd1-SNAPSHOT.jar
spark-hive_2.12-3.0.1-bd1-SNAPSHOT.jar
spark-kubernetes_2.12-3.0.1-bd1-SNAPSHOT.jar
spark-kvstore_2.12-3.0.1-bd1-SNAPSHOT.jar
spark-launcher_2.12-3.0.1-bd1-SNAPSHOT.jar
spark-mllib-local_2.12-3.0.1-bd1-SNAPSHOT.jar
spark-mllib_2.12-3.0.1-bd1-SNAPSHOT.jar
spark-network-common_2.12-3.0.1-bd1-SNAPSHOT.jar
spark-network-shuffle_2.12-3.0.1-bd1-SNAPSHOT.jar
spark-repl_2.12-3.0.1-bd1-SNAPSHOT.jar
spark-sketch_2.12-3.0.1-bd1-SNAPSHOT.jar
spark-sql_2.12-3.0.1-bd1-SNAPSHOT.jar
spark-streaming_2.12-3.0.1-bd1-SNAPSHOT.jar
spark-tags_2.12-3.0.1-bd1-SNAPSHOT-tests.jar
spark-tags_2.12-3.0.1-bd1-SNAPSHOT.jar
spark-unsafe_2.12-3.0.1-bd1-SNAPSHOT.jar
spark-yarn_2.12-3.0.1-bd1-SNAPSHOT.jar
spire-macros_2.12-0.17.0-M1.jar
spire-platform_2.12-0.17.0-M1.jar
spire-util_2.12-0.17.0-M1.jar
spire_2.12-0.17.0-M1.jar
stax-api-1.0-2.jar
stax-api-1.0.1.jar
stream-2.9.6.jar
stringtemplate-3.2.1.jar
super-csv-2.2.0.jar
threeten-extra-1.5.0.jar
thrift-client-pool-java-1.3.1.jar
uba-sdk-1.0.7-SNAPSHOT.jar
univocity-parsers-2.8.3.jar
xbean-asm7-shaded-4.15.jar
xmlenc-0.52.jar
xz-1.5.jar
zjsonpatch-0.3.0.jar
zookeeper-3.4.14.jar
zstd-jni-1.4.4-3.jar
zti-issuer-helper-java-1.0.12.jar
zti-jwt-helper-java-1.0.9.jar
zti-jwt-java-1.0.19.jar |
Presto | commons-collections-3.2.2.jar
grpc-core-1.38.1.jar
grpc-netty-shaded-1.38.1.jar
grpc-protobuf-1.38.1.jar
grpc-stub-1.38.1.jar
guava-26.0-jre.jar
hadoop-client-2.6.0-cdh5.4.4-bd1.jar
hadoop-common-2.6.5.jar
hive-exec-1.2.1.jar
hive-metastore-1.2.2-bd82.jar
slf4j-log4j12-1.7.5.jar
perfmark-api-0.23.0.jar
slf4j-api-1.7.5.jar
slf4j-simple-1.7.5.jar |
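The rule above (bundle only what the engine does not ship) can be checked mechanically. The sketch below compares a UDF project's dependency file names against a built-in list. `BundleCheck`, the dependency list, and `my-helper-1.0.jar` are hypothetical; only the three built-in names are copied from the Spark column of the table. It matches exact file names, so a different version of a built-in library counts as not built in. How LAS resolves such version differences is not covered on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: decides which dependency jars a UDF needs to bundle.
class BundleCheck {
    // Returns the dependencies whose exact file name is not among the built-in jars.
    static List<String> jarsToBundle(List<String> dependencies, Set<String> builtIn) {
        List<String> result = new ArrayList<>();
        for (String jar : dependencies) {
            if (!builtIn.contains(jar)) {
                result.add(jar);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // A few entries copied from the Spark column of the table.
        Set<String> sparkBuiltIn = new HashSet<>(Arrays.asList(
                "commons-lang3-3.9.jar", "guava-14.0.1.jar", "gson-2.8.2.jar"));
        // Illustrative dependency list of a UDF project.
        List<String> deps = Arrays.asList(
                "commons-lang3-3.9.jar", "guava-26.0-jre.jar", "my-helper-1.0.jar");
        System.out.println(jarsToBundle(deps, sparkBuiltIn));
        // prints [guava-26.0-jre.jar, my-helper-1.0.jar]
    }
}
```

In a Maven build, one common way to leave a built-in dependency out of the packaged Jar is to declare it with `provided` scope, which packaging plugins such as maven-shade-plugin do not include.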
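5. FAQ
Q: Why does executing Create Function SQL usually fail?
First check that the function name follows the Schema.FunctionName convention. Function names are also case-insensitive, so check beforehand whether a function with the same name already exists.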
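Both checks can be run before submitting the statement. The sketch below is one way to do so. `FunctionNameCheck` is a hypothetical helper, and the character rule for each name part (a letter or underscore followed by letters, digits, or underscores) is an assumption, since this page only specifies the two-part Schema.FunctionName form.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical pre-check for a function name before running Create Function SQL.
class FunctionNameCheck {
    // Assumption: each part is a letter or underscore followed by letters, digits, or underscores.
    private static final String PART = "[A-Za-z_][A-Za-z0-9_]*";

    // True when the name has exactly two parts in the form schema.function_name.
    static boolean isQualified(String name) {
        return name != null && name.matches(PART + "\\." + PART);
    }

    // True when an existing function has the same name, ignoring case.
    static boolean alreadyExists(String name, List<String> existing) {
        String key = name.toLowerCase(Locale.ROOT);
        for (String other : existing) {
            if (other.toLowerCase(Locale.ROOT).equals(key)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> existing = Arrays.asList("sales.mask_phone");
        System.out.println(isQualified("mask_phone"));                     // prints false
        System.out.println(isQualified("sales.mask_phone"));               // prints true
        System.out.println(alreadyExists("Sales.Mask_Phone", existing));   // prints true
    }
}
```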
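Q: Can multiple UDFs be created from the same Resource?
Yes.
Q: Are UDFs deleted when the Schema or resource is deleted?
After a Schema or resource is deleted, the corresponding UDF, including its FaaS resources, is not cleaned up. However, SQL submitted through the LAS console fails if it contains that UDF. To resolve the error, upload the same resource again under that Schema. This problem does not occur when a BI tool connects directly through Presto JDBC.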