Data Solution 2019(6) MySQL Data Source
Make sure the database accepts our connection. On the MySQL server, grant access to the host that will connect:
> grant all privileges on database.* to root@'142.xxx.xxx.xxx' identified by 'xxxxxx';
> flush privileges;
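In my Zeppelin notebook, I can load the MySQL JDBC driver as a dependency:
%spark.dep
// Run this paragraph before any %spark paragraph starts the Spark interpreter
// (restart the Spark interpreter first if it is already running)
z.load("mysql:mysql-connector-java:5.1.47")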
Connect to the database and read the table
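// Read the MySQL table over JDBC into a DataFrame
val homeAdvisorCompanysRawDF = sqlContext.read
  .format("jdbc")
  .option("driver", "com.mysql.jdbc.Driver")   // Connector/J 5.1.x driver class
  .option("url", "jdbc:mysql://45.55.xx.xx:3306/sillycat_services")
  .option("user", "root")
  .option("password", "xxxxxx")
  .option("dbtable", "copy_home_companys")
  .load()
// Check the column types Spark derived from the table definition
homeAdvisorCompanysRawDF.printSchema()
// Expose the DataFrame to %sql paragraphs (registerTempTable is deprecated since Spark 2.0)
homeAdvisorCompanysRawDF.createOrReplaceTempView("homeadvisorcompanys")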
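If several notebooks read from the same MySQL server, the reader options can be kept together in one `Map[String, String]`, because `DataFrameReader` also has an `.options(...)` method that takes a whole map. The sketch below only builds that map in plain Scala. The object `JdbcConfig`, the helper `mysqlJdbcOptions`, and its parameter names are hypothetical; the option keys are the same ones used above.

```scala
// Hypothetical helper: collects the JDBC reader options into one Map so they can be
// passed as sqlContext.read.format("jdbc").options(JdbcConfig.mysqlJdbcOptions(...)).load()
object JdbcConfig {
  def mysqlJdbcOptions(host: String, port: Int, database: String,
                       user: String, password: String, table: String): Map[String, String] =
    Map(
      "driver"   -> "com.mysql.jdbc.Driver",              // Connector/J 5.1.x driver class
      "url"      -> s"jdbc:mysql://$host:$port/$database", // same URL shape as above
      "user"     -> user,
      "password" -> password,
      "dbtable"  -> table
    )
}
```

Keeping the password out of notebook text (for example, reading it from an environment variable before calling the helper) is left to the reader's setup.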
Using a function with one parameter as a UDF
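import org.apache.spark.sql.functions.udf

// Score a phone number: 20 if it looks like a US number (optional leading "1-",
// dashes optional), otherwise 10. A null phone also scores 10 instead of failing.
val checkPhone : (String => Int) = (phone: String) => {
  val regexStr = "^(1\\-)?[0-9]{3}\\-?[0-9]{3}\\-?[0-9]{4}$"
  if (phone != null && phone.matches(regexStr)) {
    20
  } else {
    10
  }
}
// Wrap the Scala function so it can be applied to a DataFrame column
val checkPhoneColumn = udf(checkPhone)
val phoneDF = homeAdvisorCompanysRawDF.withColumn("phoneScore", checkPhoneColumn(homeAdvisorCompanysRawDF("phone")))
phoneDF.select("phone", "phoneScore").show(2)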
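The scoring rule itself is plain Scala, so it can be checked without a Spark session before it is wrapped in `udf`. The sketch below assumes a hypothetical object name `PhoneScore` and reuses the regex from the notebook. `Option(phone)` turns a null phone into `None`, so a missing value scores 10 rather than throwing a `NullPointerException` inside the UDF.

```scala
// Plain-Scala version of the phone scoring rule, testable without Spark.
object PhoneScore {
  // Same pattern as the notebook: optional "1-", then 3-3-4 digits with optional dashes
  private val PhoneRegex = "^(1\\-)?[0-9]{3}\\-?[0-9]{3}\\-?[0-9]{4}$"

  // 20 for a matching phone, 10 for anything else, including null
  def checkPhone(phone: String): Int =
    if (Option(phone).exists(_.matches(PhoneRegex))) 20 else 10
}
```

Passing `PhoneScore.checkPhone _` to `udf` should give the same column function as `checkPhoneColumn`.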
Using a function with multiple parameters
// Score an address: 20 when both location and postal_code are present, otherwise 10.
// street_address, address_locality and address_region are passed in but not used yet.
val checkAddress = (location: String, street_address: String, address_locality: String, address_region: String, postal_code: String) => {
  if (location != null && !location.isEmpty() && postal_code != null && !postal_code.isEmpty()) {
    20
  } else {
    10
  }
}
// udf(...) accepts Scala functions with up to 10 arguments
val checkAddressColumn = udf(checkAddress)
val addressDF = phoneDF.withColumn("addressScore", checkAddressColumn(phoneDF("location"), phoneDF("street_address"), phoneDF("address_locality"), phoneDF("address_region"), phoneDF("postal_code")))
addressDF.select("phone", "phoneScore", "location", "postal_code", "addressScore").show(2)
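The multi-parameter rule can be exercised the same way. In this sketch the object name `AddressScore` and the helper `hasText` are hypothetical. Only `location` and `postalCode` affect the result, which mirrors the notebook function, and null and empty strings are treated the same.

```scala
// Plain-Scala version of the address scoring rule, testable without Spark.
object AddressScore {
  // True only for a non-null, non-empty string
  private def hasText(s: String): Boolean = s != null && !s.isEmpty

  // The three middle fields are accepted but ignored, as in the notebook UDF
  def checkAddress(location: String, streetAddress: String, addressLocality: String,
                   addressRegion: String, postalCode: String): Int =
    if (hasText(location) && hasText(postalCode)) 20 else 10
}
```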
Sum up all the score columns to get a total score
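import org.apache.spark.sql.functions.col

// Add the score columns together; reduce(_ + _) builds the expression phoneScore + addressScore
val columnsToSum = List(col("phoneScore"), col("addressScore"))
val resultDF = addressDF.withColumn("totalScore", columnsToSum.reduce(_ + _))
resultDF.select("phone", "phoneScore", "location", "postal_code", "addressScore", "totalScore").show(2)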
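The `reduce(_ + _)` over columns is the same pattern as reducing a list of numbers. The sketch below models one company row as a hypothetical case class `Company` and keeps the scoring rules in a list, so adding a new score means adding one more function. The names `TotalScore`, `Company`, `scorers` and `totalScore` are all illustrative assumptions.

```scala
// Models one row and a list of scoring rules; the total is the sum of every rule's
// score, combined with reduce(_ + _) like columnsToSum.reduce(_ + _) in the notebook.
object TotalScore {
  final case class Company(phone: String, location: String, postalCode: String)

  private val PhoneRegex = "^(1\\-)?[0-9]{3}\\-?[0-9]{3}\\-?[0-9]{4}$"

  val scorers: List[Company => Int] = List(
    // phone rule: 20 for a US-style number, else 10
    (c: Company) => if (c.phone != null && c.phone.matches(PhoneRegex)) 20 else 10,
    // address rule: 20 when location and postal code are both present, else 10
    (c: Company) =>
      if (c.location != null && c.location.nonEmpty &&
          c.postalCode != null && c.postalCode.nonEmpty) 20 else 10
  )

  // Apply every rule to the row and add the results
  def totalScore(c: Company): Int = scorers.map(score => score(c)).reduce(_ + _)
}
```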
References:
https://mvnrepository.com/artifact/mysql/mysql-connector-java/5.1.47
https://zeppelin.apache.org/docs/latest/interpreter/spark.html