Recently, one of my friends asked me to download some pictures from a website. Instead of doing it manually (there were 90 images to download), I used the opportunity to automate it with Kotlin.
Let's start by creating an empty project:
$ mkdir test-jsoup
$ cd test-jsoup
$ gradle init --dsl kotlin \
    --project-name test-jsoup \
    --type kotlin-application \
    --package be.yellowduck.testjsoup

> Task :init
Get more help with your project: https://docs.gradle.org/7.0.2/samples/sample_building_kotlin_applications.html

BUILD SUCCESSFUL in 716ms
2 actionable tasks: 2 executed
We now have an empty project which we can build and run.
$ ./gradlew run

> Task :app:run
Hello World!

BUILD SUCCESSFUL in 5s
2 actionable tasks: 2 executed
Next, let's add the dependencies we need. In the app/build.gradle.kts file, update the dependencies to:
dependencies {
    implementation(platform("org.jetbrains.kotlin:kotlin-bom"))
    implementation("org.jetbrains.kotlin:kotlin-stdlib-jdk8")
    implementation("org.jsoup:jsoup:1.13.1")
    implementation("com.squareup.okhttp3:okhttp:4.9.1")
    implementation("org.slf4j:slf4j-api:1.7.30")
    implementation("ch.qos.logback:logback-classic:1.2.3")
    implementation("ch.qos.logback:logback-core:1.2.3")
    testImplementation("org.jetbrains.kotlin:kotlin-test")
    testImplementation("org.jetbrains.kotlin:kotlin-test-junit")
}
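We'll be using the following libraries:
- Jsoup to download and parse the HTML
- OkHttp to download the images
- SLF4J and Logback for logging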
After adding the dependencies, the first thing I do is configure logging. For that, I change the app/src/main/kotlin/be/yellowduck/testjsoup/App.kt file to:
package be.yellowduck.testjsoup

import ch.qos.logback.classic.Level
import ch.qos.logback.classic.Logger
import org.slf4j.LoggerFactory

object App {

    init {
        val rootLogger = LoggerFactory.getLogger(Logger.ROOT_LOGGER_NAME) as Logger
        rootLogger.level = Level.INFO
    }

    val log = LoggerFactory.getLogger(App::class.java)

    @JvmStatic
    fun main(args: Array<String>) {
        log.info("Hello world")
    }

}
This does a couple of things:
- It creates a singleton App containing a main function which will be the entry point of our app.
- It configures the root logger so that info, warning and error messages are shown.
- It configures a logger for the App class.
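To see what the root logger level does in practice, here is a small sketch. It assumes the same SLF4J and Logback dependencies as above and no custom Logback configuration file; the helper setRootLevel and the logger name "demo" are made up for this illustration.

```kotlin
import ch.qos.logback.classic.Level
import ch.qos.logback.classic.Logger
import org.slf4j.LoggerFactory

// Hypothetical helper: changes the level of Logback's root logger
fun setRootLevel(level: Level) {
    val rootLogger = LoggerFactory.getLogger(org.slf4j.Logger.ROOT_LOGGER_NAME) as Logger
    rootLogger.level = level
}

fun main() {
    // "demo" is an arbitrary logger name; it inherits its level from the root logger
    val log = LoggerFactory.getLogger("demo")

    setRootLevel(Level.INFO)
    log.debug("dropped, because DEBUG is below INFO")
    log.info("shown")
    println(log.isDebugEnabled) // false

    setRootLevel(Level.DEBUG)
    log.debug("shown now")
    println(log.isDebugEnabled) // true
}
```

Loggers without their own level inherit the effective level from the root logger, which is why setting it once in the init block is enough for the whole app.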
Don't forget to update the main class name in app/build.gradle.kts before you run it:
application {
    mainClass.set("be.yellowduck.testjsoup.App")
}
When you now run the app, you'll get:
$ ./gradlew run

> Task :app:run
14:46:21.612 [main] INFO be.yellowduck.testjsoup.App - Hello world

BUILD SUCCESSFUL in 1s
2 actionable tasks: 1 executed, 1 up-to-date
Next, we'll use Jsoup to download the HTML, parse it, and get a list of all images that have the class image. Let's change the main function to:
@JvmStatic
fun main(args: Array<String>) {

    val sourceUrl = "https://www.yellowduck.be/documents/2/001.html"

    log.info("Parsing: ${sourceUrl}")
    val doc = Jsoup.connect(sourceUrl).get()

    val urls = mutableSetOf<String>()
    doc.select("img.image").forEach {
        val url = it.attr("src").replace("thumbnail", "preview")
        urls.add(url)
    }

    if (urls.size == 0) {
        return
    }

    log.info("Downloading ${urls.size} image(s)")

}
The select function on the Jsoup document lets you use CSS selectors to find elements. In our case, we take the src attribute of every matching image, replace thumbnail with preview in the URL, and store the result in a set, so duplicate URLs are kept only once.
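The selection step can be tried without touching the network by parsing an inline HTML string. This is only a sketch: the markup in sampleHtml and the function name extractPreviewUrls are invented for the example and are not taken from the real page.

```kotlin
import org.jsoup.Jsoup

// Hypothetical markup standing in for the real page
val sampleHtml = """
    <html><body>
      <img class="image" src="/thumbnail/a.jpg">
      <img class="image" src="/thumbnail/b.jpg">
      <img class="image" src="/thumbnail/a.jpg">
      <img class="logo" src="/logo.png">
    </body></html>
""".trimIndent()

// Hypothetical helper: the same select/attr/replace steps as in main, on a string
fun extractPreviewUrls(html: String): Set<String> =
    Jsoup.parse(html)
        .select("img.image") // only <img> elements with the class "image"
        .map { it.attr("src").replace("thumbnail", "preview") }
        .toSet()             // drops the duplicate a.jpg entry

fun main() {
    println(extractPreviewUrls(sampleHtml)) // [/preview/a.jpg, /preview/b.jpg]
}
```

The image with the class logo is ignored because it doesn't match the selector, and the repeated a.jpg appears only once because the result is a set.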
The next step is to create a function that downloads a URL to a file. For that, I'll add the downloadFile function to the App object:
val client = OkHttpClient.Builder().build()

fun downloadFile(url: String, toDir: String) {

    val request = Request.Builder().url(URL(url)).get().build()

    val response = client.newCall(request).execute()
    if (response.code == HttpURLConnection.HTTP_OK) {

        val body = response.body?.bytes()

        val outDir = File(toDir)
        outDir.mkdirs()

        val outPath = File(outDir, File(URL(url).path).name)

        if (body != null) {
            log.info("Saving: ${outPath}")
            outPath.writeBytes(body)
        }

    }

}
Note that I'm adding a property to the App object containing the HTTP client, as well as a new function. This function uses OkHttp to download and save the file. It takes the URL and the directory where the image should be saved as arguments. If the directory doesn't exist, it is created automatically. Responses with a status other than 200 are silently skipped.
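The output file name comes from the expression File(URL(url).path).name. Here is that expression in isolation, as a sketch; the function name fileNameFromUrl and the example.com URL are made up for the illustration.

```kotlin
import java.io.File
import java.net.URL

// Hypothetical helper: the same expression downloadFile uses to name the output file
fun fileNameFromUrl(url: String): String = File(URL(url).path).name

fun main() {
    // URL.path excludes the query string, so "?size=large" does not end up in the name
    println(fileNameFromUrl("https://example.com/images/preview/photo_01.JPG?size=large"))
    // prints photo_01.JPG
}
```

Because only the last path segment is used, two URLs that end in the same file name would overwrite each other in the output directory.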
The last step is to download the images and save them:
val outPath = "/Users/me/Desktop/out"

urls.forEach {
    downloadFile(it, outPath)
}
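One thing to keep in mind with this loop: OkHttp's execute() throws an IOException when the request fails at the network level, so a single failing image stops the whole run. If that matters, a wrapper along these lines could collect the failures instead. This is a sketch, not part of the app above; downloadAll and the fake download function are hypothetical names.

```kotlin
import java.io.IOException

// Hypothetical helper: calls `download` for every URL and returns the URLs that failed
fun downloadAll(urls: Collection<String>, download: (String) -> Unit): List<String> {
    val failed = mutableListOf<String>()
    for (url in urls) {
        try {
            download(url)
        } catch (e: IOException) {
            // A network error on one image should not stop the others
            failed.add(url)
        }
    }
    return failed
}

fun main() {
    // Stand-in for downloadFile: fails for one made-up URL
    val fakeDownload: (String) -> Unit = { url ->
        if (url.endsWith("broken.jpg")) throw IOException("simulated failure")
    }
    val failed = downloadAll(
        listOf("https://example.com/a.jpg", "https://example.com/broken.jpg"),
        fakeDownload
    )
    println(failed) // [https://example.com/broken.jpg]
}
```

In the app itself, the call would look like downloadAll(urls) { downloadFile(it, outPath) }.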
All done. When you run it, it will save all images:
$ ./gradlew run

> Task :app:run
14:59:17.413 [main] INFO be.yellowduck.testjsoup.App - Parsing: https://www.yellowduck.be/documents/2/001.html
14:59:17.799 [main] INFO be.yellowduck.testjsoup.App - Downloading 30 image(s)
14:59:18.039 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJOLS1_01218.JPG
14:59:18.093 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJO1A_01146.JPG
14:59:18.145 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJO1A_01149.JPG
14:59:18.205 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJO1A_01144.JPG
14:59:18.253 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJO1A_01151.JPG
14:59:18.321 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJO1A_01147.JPG
14:59:18.376 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJO1A_01145.JPG
14:59:18.432 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJO1A_01148.JPG
14:59:18.488 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJO1A_01150.JPG
14:59:18.542 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJO1A_01161.JPG
14:59:18.600 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJOLS1_01220.JPG
14:59:18.657 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS1_00437.JPG
14:59:18.719 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS1_00441.JPG
14:59:18.778 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS1_00440.JPG
14:59:18.832 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS7_00469.JPG
14:59:18.892 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS7_00468.JPG
14:59:18.952 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS7_00472.JPG
14:59:19.020 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS7_00473.JPG
14:59:19.076 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS5_00422.JPG
14:59:19.129 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS5_00425.JPG
14:59:19.175 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS5_00424.JPG
14:59:19.223 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS5_00426.JPG
14:59:19.272 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS3_00446.JPG
14:59:19.321 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS3_00445.JPG
14:59:19.373 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS3_00449.JPG
14:59:19.424 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS3_00450.JPG
14:59:19.476 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALN1_01334.JPG
14:59:19.535 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALN1_01340.JPG
14:59:19.583 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALN1_01339.JPG
14:59:19.633 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALN1_01343.JPG

BUILD SUCCESSFUL in 3s
2 actionable tasks: 1 executed, 1 up-to-date
If you followed along, your app/src/main/kotlin/be/yellowduck/testjsoup/App.kt should now look like this:
package be.yellowduck.testjsoup

import ch.qos.logback.classic.Level
import ch.qos.logback.classic.Logger
import okhttp3.OkHttpClient
import okhttp3.Request
import org.jsoup.Jsoup
import org.slf4j.LoggerFactory
import java.io.File
import java.net.HttpURLConnection
import java.net.URL

object App {

    init {
        val rootLogger = LoggerFactory.getLogger(Logger.ROOT_LOGGER_NAME) as Logger
        rootLogger.level = Level.INFO
    }

    val log = LoggerFactory.getLogger(App::class.java)

    val client = OkHttpClient.Builder().build()

    fun downloadFile(url: String, toDir: String) {

        val request = Request.Builder().url(URL(url)).get().build()

        val response = client.newCall(request).execute()
        if (response.code == HttpURLConnection.HTTP_OK) {

            val body = response.body?.bytes()

            val outDir = File(toDir)
            outDir.mkdirs()

            val outPath = File(outDir, File(URL(url).path).name)

            if (body != null) {
                log.info("Saving: ${outPath}")
                outPath.writeBytes(body)
            }

        }

    }

    @JvmStatic
    fun main(args: Array<String>) {

        val sourceUrl = "https://www.yellowduck.be/documents/2/001.html"

        log.info("Parsing: ${sourceUrl}")
        val doc = Jsoup.connect(sourceUrl).get()

        val urls = mutableSetOf<String>()
        doc.select("img.image").forEach {
            val url = it.attr("src").replace("thumbnail", "preview")
            urls.add(url)
        }

        if (urls.size == 0) {
            return
        }

        log.info("Downloading ${urls.size} image(s)")

        val outPath = "/Users/me/Desktop/out"

        urls.forEach {
            downloadFile(it, outPath)
        }

    }

}
In the next blog post, we'll add coroutines to speed things up.