Recently, one of my friends asked me to download some pictures from a website. Instead of doing it manually (there were 90 images to download), I used the opportunity to automate it with Kotlin.

First, let's create an empty project:

$ mkdir test-jsoup
$ cd test-jsoup
$ gradle init --dsl kotlin \
              --project-name test-jsoup \
              --type kotlin-application \
              --package be.yellowduck.testjsoup

> Task :init
Get more help with your project: https://docs.gradle.org/7.0.2/samples/sample_building_kotlin_applications.html

BUILD SUCCESSFUL in 716ms
2 actionable tasks: 2 executed

We now have an empty project which we can build and run.

$ ./gradlew run

> Task :app:run
Hello World!

BUILD SUCCESSFUL in 5s
2 actionable tasks: 2 executed

Let's start by adding the dependencies we need. In the app/build.gradle.kts file, update the dependencies block to:

dependencies {
    implementation(platform("org.jetbrains.kotlin:kotlin-bom"))
    implementation("org.jetbrains.kotlin:kotlin-stdlib-jdk8")
    implementation("org.jsoup:jsoup:1.13.1")
    implementation("com.squareup.okhttp3:okhttp:4.9.1")
    implementation("org.slf4j:slf4j-api:1.7.30")
    implementation("ch.qos.logback:logback-classic:1.2.3")
    implementation("ch.qos.logback:logback-core:1.2.3")
    testImplementation("org.jetbrains.kotlin:kotlin-test")
    testImplementation("org.jetbrains.kotlin:kotlin-test-junit")
}

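We'll be using the following libraries:

  • jsoup to download and parse the HTML
  • OkHttp to download the images
  • SLF4J and Logback for logging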

After adding the dependencies, the first thing I do is configure logging. For that, I change the app/src/main/kotlin/be/yellowduck/testjsoup/App.kt file to:

package be.yellowduck.testjsoup

import ch.qos.logback.classic.Level
import ch.qos.logback.classic.Logger
import org.slf4j.LoggerFactory

object App {

    init {
        val rootLogger = LoggerFactory.getLogger(Logger.ROOT_LOGGER_NAME) as Logger
        rootLogger.level = Level.INFO
    }

    val log = LoggerFactory.getLogger(App::class.java)

    @JvmStatic
    fun main(args: Array<String>) {
        log.info("Hello world")
    }

}

This does a couple of things:

  • It creates a singleton App containing a main function which will be the entry point of our app.
  • It configures the root logger so that info, warning and error messages are shown
  • It configures a logger for the App class
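
To make the effect of the root logger level concrete, here is a small sketch that assumes the same SLF4J and Logback dependencies as above. The configureRootLevel helper and the "demo" logger name are made up for this illustration; with the root level set to INFO, the debug message is dropped while the info and warning messages are printed.

```kotlin
import ch.qos.logback.classic.Level
import ch.qos.logback.classic.Logger
import org.slf4j.LoggerFactory

// Hypothetical helper: sets the level on Logback's root logger.
// Other loggers inherit this level unless they define their own.
fun configureRootLevel(level: Level) {
    val rootLogger = LoggerFactory.getLogger(Logger.ROOT_LOGGER_NAME) as Logger
    rootLogger.level = level
}

fun main() {
    configureRootLevel(Level.INFO)

    // "demo" is an arbitrary logger name used only for this sketch
    val log = LoggerFactory.getLogger("demo")

    log.debug("dropped, because DEBUG is below INFO")
    log.info("printed")
    log.warn("printed as well")
}
```

Setting the level in code keeps the example short; a logback.xml file on the classpath would be the usual alternative once the configuration grows.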

Don't forget to update the main class name in app/build.gradle.kts before you run it:

application {
    mainClass.set("be.yellowduck.testjsoup.App")
}

When you now run the app, you'll get:

$ ./gradlew run

> Task :app:run
14:46:21.612 [main] INFO be.yellowduck.testjsoup.App - Hello world

BUILD SUCCESSFUL in 1s
2 actionable tasks: 1 executed, 1 up-to-date

Next, we'll use Jsoup to download the HTML, parse it, and collect all img elements that have the class image. Let's change the main function to the following (it needs import org.jsoup.Jsoup at the top of the file):

@JvmStatic
fun main(args: Array<String>) {

    val sourceUrl = "https://www.yellowduck.be/documents/2/001.html"

    log.info("Parsing: ${sourceUrl}")
    val doc = Jsoup.connect(sourceUrl).get()

    val urls = mutableSetOf<String>()
    doc.select("img.image").forEach {
        val url = it.attr("src").replace("thumbnail", "preview")
        urls.add(url)
    }

    if (urls.size == 0) {
        return
    }

    log.info("Downloading ${urls.size} image(s)")

}

The select function on the Jsoup document lets you use CSS queries to find elements. In our case, we take the src attribute of every matching image, replace "thumbnail" with "preview" in the URL, and store the result in a set so that duplicate URLs are only downloaded once.
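
To try the CSS query without hitting a real site, here is a minimal sketch that runs the same selection against an inline HTML snippet. The markup, the example.com base URI, and the extractImageUrls helper are all made up for this illustration. It also uses absUrl, which resolves relative src values against the base URI; that is an addition you only need when the page uses relative image paths.

```kotlin
import org.jsoup.Jsoup

// Hypothetical markup standing in for the real page:
// one duplicate image and one image with a different class.
val sampleHtml = """
    <html><body>
      <img class="image" src="/media/thumbnail/a.jpg">
      <img class="image" src="/media/thumbnail/a.jpg">
      <img class="logo" src="/media/logo.png">
      <img class="image" src="/media/thumbnail/b.jpg">
    </body></html>
""".trimIndent()

// Hypothetical helper: returns the preview URLs of all img.image elements.
fun extractImageUrls(html: String, baseUri: String): Set<String> {
    val doc = Jsoup.parse(html, baseUri)
    return doc.select("img.image")
        // absUrl turns "/media/..." into "https://example.com/media/..."
        .map { it.absUrl("src").replace("thumbnail", "preview") }
        // a set drops the duplicate entry
        .toSet()
}

fun main() {
    extractImageUrls(sampleHtml, "https://example.com/").forEach(::println)
    // prints:
    // https://example.com/media/preview/a.jpg
    // https://example.com/media/preview/b.jpg
}
```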

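The next step is to create a function that downloads a URL to a file. For that, I'll add the downloadFile function to the App object (together with the imports for okhttp3.OkHttpClient, okhttp3.Request, java.io.File, java.net.HttpURLConnection and java.net.URL):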

val client = OkHttpClient.Builder().build()

fun downloadFile(url: String, toDir: String) {

    val request = Request.Builder().url(URL(url)).get().build()

    // use {} closes the response, also when the status is not 200
    client.newCall(request).execute().use { response ->
        if (response.code == HttpURLConnection.HTTP_OK) {

            val body = response.body?.bytes()

            val outDir = File(toDir)
            outDir.mkdirs()

            // The file name is the last segment of the URL path
            val outPath = File(outDir, File(URL(url).path).name)

            if (body != null) {
                log.info("Saving: ${outPath}")
                outPath.writeBytes(body)
            }

        }
    }

}

Note that I'm adding a property to the App object containing the HTTP client as well as a new function. This function uses OkHttp to download and save the file. It takes the URL as the argument as well as the path to where the image should be saved. If the directory doesn't exist, it will be created automatically.
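
The target file name comes from the expression File(URL(url).path).name, which is easy to read past. As a small sketch using only the standard library, the hypothetical targetFile helper below isolates that step; the example.com URLs are placeholders. Because only the path part of the URL is used, a query string does not end up in the file name.

```kotlin
import java.io.File
import java.net.URL

// Hypothetical helper mirroring the expression used in downloadFile:
// take the path of the URL and keep only its last segment as the file name.
fun targetFile(url: String, toDir: String): File {
    val fileName = File(URL(url).path).name
    return File(toDir, fileName)
}

fun main() {
    val target = targetFile("https://example.com/photos/preview/IMG_0001.JPG?size=large", "out")
    println(target.name) // prints "IMG_0001.JPG"
}
```

One consequence of this approach is that two URLs ending in the same file name would overwrite each other in the output directory.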

The last step is to download the images and save them:

val outPath = "/Users/me/Desktop/out"

urls.forEach {
    downloadFile(it, outPath)
}

That's it. When you run the app now, it downloads and saves all the images:

$ ./gradlew run

> Task :app:run
14:59:17.413 [main] INFO be.yellowduck.testjsoup.App - Parsing: https://www.yellowduck.be/documents/2/001.html
14:59:17.799 [main] INFO be.yellowduck.testjsoup.App - Downloading 30 image(s)
14:59:18.039 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJOLS1_01218.JPG
14:59:18.093 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJO1A_01146.JPG
14:59:18.145 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJO1A_01149.JPG
14:59:18.205 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJO1A_01144.JPG
14:59:18.253 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJO1A_01151.JPG
14:59:18.321 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJO1A_01147.JPG
14:59:18.376 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJO1A_01145.JPG
14:59:18.432 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJO1A_01148.JPG
14:59:18.488 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJO1A_01150.JPG
14:59:18.542 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJO1A_01161.JPG
14:59:18.600 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJOLS1_01220.JPG
14:59:18.657 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS1_00437.JPG
14:59:18.719 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS1_00441.JPG
14:59:18.778 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS1_00440.JPG
14:59:18.832 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS7_00469.JPG
14:59:18.892 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS7_00468.JPG
14:59:18.952 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS7_00472.JPG
14:59:19.020 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS7_00473.JPG
14:59:19.076 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS5_00422.JPG
14:59:19.129 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS5_00425.JPG
14:59:19.175 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS5_00424.JPG
14:59:19.223 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS5_00426.JPG
14:59:19.272 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS3_00446.JPG
14:59:19.321 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS3_00445.JPG
14:59:19.373 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS3_00449.JPG
14:59:19.424 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS3_00450.JPG
14:59:19.476 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALN1_01334.JPG
14:59:19.535 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALN1_01340.JPG
14:59:19.583 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALN1_01339.JPG
14:59:19.633 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALN1_01343.JPG

BUILD SUCCESSFUL in 3s
2 actionable tasks: 1 executed, 1 up-to-date

If you followed along, your app/src/main/kotlin/be/yellowduck/testjsoup/App.kt should now look like this:

package be.yellowduck.testjsoup

import ch.qos.logback.classic.Level
import ch.qos.logback.classic.Logger
import okhttp3.OkHttpClient
import okhttp3.Request
import org.jsoup.Jsoup
import org.slf4j.LoggerFactory
import java.io.File
import java.net.HttpURLConnection
import java.net.URL

object App {

    init {
        val rootLogger = LoggerFactory.getLogger(Logger.ROOT_LOGGER_NAME) as Logger
        rootLogger.level = Level.INFO
    }

    val log = LoggerFactory.getLogger(App::class.java)

    val client = OkHttpClient.Builder().build()

    fun downloadFile(url: String, toDir: String) {

        val request = Request.Builder().url(URL(url)).get().build()

        // use {} closes the response, also when the status is not 200
        client.newCall(request).execute().use { response ->
            if (response.code == HttpURLConnection.HTTP_OK) {

                val body = response.body?.bytes()

                val outDir = File(toDir)
                outDir.mkdirs()

                // The file name is the last segment of the URL path
                val outPath = File(outDir, File(URL(url).path).name)

                if (body != null) {
                    log.info("Saving: ${outPath}")
                    outPath.writeBytes(body)
                }

            }
        }

    }

    @JvmStatic
    fun main(args: Array<String>) {

        val sourceUrl = "https://www.yellowduck.be/documents/2/001.html"

        log.info("Parsing: ${sourceUrl}")
        val doc = Jsoup.connect(sourceUrl).get()

        val urls = mutableSetOf<String>()
        doc.select("img.image").forEach {
            val url = it.attr("src").replace("thumbnail", "preview")
            urls.add(url)
        }

        if (urls.size == 0) {
            return
        }

        log.info("Downloading ${urls.size} image(s)")

        val outPath = "/Users/me/Desktop/out"

        urls.forEach {
            downloadFile(it, outPath)
        }

    }

}

In the next blog post, we'll add coroutines to speed things up.