Recently, one of my friends asked me to download some pictures from a website. Instead of doing it manually (there were 90 images to download), I used the opportunity to automate it with Kotlin.
First, let's start with creating an empty project:
$ mkdir test-jsoup
$ cd test-jsoup
$ gradle init --dsl kotlin \
--project-name test-jsoup \
--type kotlin-application \
--package be.yellowduck.testjsoup
> Task :init
Get more help with your project: https://docs.gradle.org/7.0.2/samples/sample_building_kotlin_applications.html
BUILD SUCCESSFUL in 716ms
2 actionable tasks: 2 executed
We now have an empty project which we can build and run.
$ ./gradlew run
> Task :app:run
Hello World!
BUILD SUCCESSFUL in 5s
2 actionable tasks: 2 executed
Now, let's add the needed dependencies. In the app/build.gradle.kts file, update the dependencies to:
dependencies {
    implementation(platform("org.jetbrains.kotlin:kotlin-bom"))
    implementation("org.jetbrains.kotlin:kotlin-stdlib-jdk8")
    implementation("org.jsoup:jsoup:1.13.1")
    implementation("com.squareup.okhttp3:okhttp:4.9.1")
    implementation("org.slf4j:slf4j-api:1.7.30")
    implementation("ch.qos.logback:logback-classic:1.2.3")
    implementation("ch.qos.logback:logback-core:1.2.3")
    testImplementation("org.jetbrains.kotlin:kotlin-test")
    testImplementation("org.jetbrains.kotlin:kotlin-test-junit")
}
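We'll be using the following libraries:
- Jsoup to download and parse the HTML
- OkHttp to download the images
- SLF4J and Logback for logging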
After adding the dependencies, the first thing I do is configure logging. For that, I change the app/src/main/kotlin/be/yellowduck/testjsoup/App.kt file to:
package be.yellowduck.testjsoup

import ch.qos.logback.classic.Level
import ch.qos.logback.classic.Logger
import org.slf4j.LoggerFactory

object App {

    init {
        val rootLogger = LoggerFactory.getLogger(Logger.ROOT_LOGGER_NAME) as Logger
        rootLogger.level = Level.INFO
    }

    val log = LoggerFactory.getLogger(App::class.java)

    @JvmStatic
    fun main(args: Array<String>) {
        log.info("Hello world")
    }
}
This does a couple of things:
- It creates a singleton App containing a main function which will be the entry point of our app.
- It configures the root logger so that info, warning and error messages are shown.
- It configures a logger for the App class.
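To see what the root logger level does in practice, here is a minimal sketch, assuming the Logback and SLF4J versions from the dependencies above. The setRootLevel helper and the logger name "demo" are made up for this illustration. With the level at INFO, debug messages are filtered out while info messages go through.

```kotlin
import ch.qos.logback.classic.Level
import ch.qos.logback.classic.Logger
import org.slf4j.LoggerFactory

// Hypothetical helper: sets the threshold on the root logger.
// Loggers without a level of their own inherit it.
fun setRootLevel(level: Level) {
    val rootLogger = LoggerFactory.getLogger(Logger.ROOT_LOGGER_NAME) as Logger
    rootLogger.level = level
}

fun main() {
    setRootLevel(Level.INFO)

    // "demo" is an arbitrary logger name used only for this illustration.
    val log = LoggerFactory.getLogger("demo")

    println("debug enabled: ${log.isDebugEnabled}") // false at INFO
    println("info enabled: ${log.isInfoEnabled}")   // true at INFO

    log.debug("This message is filtered out")
    log.info("This message is shown")
}
```

Changing the argument to Level.DEBUG would let the debug message through as well.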
Don't forget to update the main class name in app/build.gradle.kts before you run it:
application {
    mainClass.set("be.yellowduck.testjsoup.App")
}
When you now run the app, you'll get:
$ ./gradlew run
> Task :app:run
14:46:21.612 [main] INFO be.yellowduck.testjsoup.App - Hello world
BUILD SUCCESSFUL in 1s
2 actionable tasks: 1 executed, 1 up-to-date
Next up is to use Jsoup to download the HTML, parse it and get a list of all images which have the class image. Let's add import org.jsoup.Jsoup to the imports and change the main function to:
@JvmStatic
fun main(args: Array<String>) {
    val sourceUrl = "https://www.yellowduck.be/documents/2/001.html"
    log.info("Parsing: $sourceUrl")

    val doc = Jsoup.connect(sourceUrl).get()

    val urls = mutableSetOf<String>()
    doc.select("img.image").forEach {
        val url = it.attr("src").replace("thumbnail", "preview")
        urls.add(url)
    }

    if (urls.isEmpty()) {
        return
    }

    log.info("Downloading ${urls.size} image(s)")
}
The select function on the Jsoup document allows you to use CSS queries to get the elements. In our case, we take the src attribute value of each matching image, replace thumbnail with preview in the URL and save the result in a set, so duplicate URLs are only kept once.
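To try the selector without hitting the network, the same steps can be run against an inline HTML string. This is only a sketch: the markup, the example.com URLs and the extractPreviewUrls helper are made up for illustration and are not taken from the real page.

```kotlin
import org.jsoup.Jsoup

// Hypothetical markup standing in for the real page.
val sampleHtml = """
    <html><body>
      <img class="image" src="https://example.com/thumbnail/a.jpg">
      <img class="image" src="https://example.com/thumbnail/a.jpg">
      <img class="logo" src="https://example.com/logo.png">
      <img class="image" src="https://example.com/thumbnail/b.jpg">
    </body></html>
""".trimIndent()

// Selects every <img> with the class "image", rewrites the src
// and collects the results in a set to drop duplicates.
fun extractPreviewUrls(html: String): Set<String> =
    Jsoup.parse(html)
        .select("img.image")
        .map { it.attr("src").replace("thumbnail", "preview") }
        .toSet()

fun main() {
    // Prints [https://example.com/preview/a.jpg, https://example.com/preview/b.jpg]
    println(extractPreviewUrls(sampleHtml))
}
```

The image with the class logo is skipped by the selector, and the duplicate a.jpg entry collapses into a single element of the set.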
The next step is to create a function which downloads a URL to a file. For that, I'll add the downloadFile function to the App object:
val client = OkHttpClient.Builder().build()

fun downloadFile(url: String, toDir: String) {
    val request = Request.Builder().url(URL(url)).get().build()
    // use {} closes the response, also when the status is not 200
    client.newCall(request).execute().use { response ->
        if (response.code == HttpURLConnection.HTTP_OK) {
            val body = response.body?.bytes()
            val outDir = File(toDir)
            outDir.mkdirs()
            // Name the file after the last segment of the URL path
            val outPath = File(outDir, File(URL(url).path).name)
            if (body != null) {
                log.info("Saving: $outPath")
                outPath.writeBytes(body)
            }
        }
    }
}
Note that I'm adding a property holding the HTTP client to the App object, as well as a new function. The function uses OkHttp to download the file and save it. It takes the URL and the directory in which the image should be saved as arguments. If the directory doesn't exist, it is created automatically. Responses with a status other than 200 are skipped silently.
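The name of the saved file comes from the last segment of the URL path. A small sketch of just that step, using only the standard library: the targetFile helper, the example.com URL and the query string are assumptions made for this illustration.

```kotlin
import java.io.File
import java.net.URL

// Hypothetical helper: builds the output file the same way downloadFile does,
// by taking the last segment of the URL path as the file name.
fun targetFile(url: String, toDir: String): File =
    File(File(toDir), File(URL(url).path).name)

fun main() {
    val target = targetFile("https://example.com/preview/7351_ALN1_01334.JPG?size=large", "out")

    // URL.path excludes the query string, so this prints 7351_ALN1_01334.JPG
    println(target.name)
}
```

One consequence of this choice is that two URLs ending in the same file name end up in the same file, with the second download overwriting the first.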
The last step is to download the images and save them:
val outPath = "/Users/me/Desktop/out"
urls.forEach {
    downloadFile(it, outPath)
}
That's it. When you run it, it will save all the images:
$ ./gradlew run
> Task :app:run
14:59:17.413 [main] INFO be.yellowduck.testjsoup.App - Parsing: https://www.yellowduck.be/documents/2/001.html
14:59:17.799 [main] INFO be.yellowduck.testjsoup.App - Downloading 30 image(s)
14:59:18.039 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJOLS1_01218.JPG
14:59:18.093 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJO1A_01146.JPG
14:59:18.145 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJO1A_01149.JPG
14:59:18.205 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJO1A_01144.JPG
14:59:18.253 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJO1A_01151.JPG
14:59:18.321 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJO1A_01147.JPG
14:59:18.376 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJO1A_01145.JPG
14:59:18.432 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJO1A_01148.JPG
14:59:18.488 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJO1A_01150.JPG
14:59:18.542 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJO1A_01161.JPG
14:59:18.600 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_TJOLS1_01220.JPG
14:59:18.657 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS1_00437.JPG
14:59:18.719 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS1_00441.JPG
14:59:18.778 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS1_00440.JPG
14:59:18.832 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS7_00469.JPG
14:59:18.892 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS7_00468.JPG
14:59:18.952 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS7_00472.JPG
14:59:19.020 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS7_00473.JPG
14:59:19.076 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS5_00422.JPG
14:59:19.129 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS5_00425.JPG
14:59:19.175 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS5_00424.JPG
14:59:19.223 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS5_00426.JPG
14:59:19.272 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS3_00446.JPG
14:59:19.321 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS3_00445.JPG
14:59:19.373 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS3_00449.JPG
14:59:19.424 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALNLS3_00450.JPG
14:59:19.476 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALN1_01334.JPG
14:59:19.535 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALN1_01340.JPG
14:59:19.583 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALN1_01339.JPG
14:59:19.633 [main] INFO be.yellowduck.testjsoup.App - Saving: /Users/me/Desktop/out/7351_ALN1_01343.JPG
BUILD SUCCESSFUL in 3s
2 actionable tasks: 1 executed, 1 up-to-date
If you followed along, your app/src/main/kotlin/be/yellowduck/testjsoup/App.kt file should now look like this:
package be.yellowduck.testjsoup

import ch.qos.logback.classic.Level
import ch.qos.logback.classic.Logger
import okhttp3.OkHttpClient
import okhttp3.Request
import org.jsoup.Jsoup
import org.slf4j.LoggerFactory
import java.io.File
import java.net.HttpURLConnection
import java.net.URL

object App {

    init {
        val rootLogger = LoggerFactory.getLogger(Logger.ROOT_LOGGER_NAME) as Logger
        rootLogger.level = Level.INFO
    }

    val log = LoggerFactory.getLogger(App::class.java)

    val client = OkHttpClient.Builder().build()

    fun downloadFile(url: String, toDir: String) {
        val request = Request.Builder().url(URL(url)).get().build()
        // use {} closes the response, also when the status is not 200
        client.newCall(request).execute().use { response ->
            if (response.code == HttpURLConnection.HTTP_OK) {
                val body = response.body?.bytes()
                val outDir = File(toDir)
                outDir.mkdirs()
                // Name the file after the last segment of the URL path
                val outPath = File(outDir, File(URL(url).path).name)
                if (body != null) {
                    log.info("Saving: $outPath")
                    outPath.writeBytes(body)
                }
            }
        }
    }

    @JvmStatic
    fun main(args: Array<String>) {
        val sourceUrl = "https://www.yellowduck.be/documents/2/001.html"
        log.info("Parsing: $sourceUrl")

        val doc = Jsoup.connect(sourceUrl).get()

        val urls = mutableSetOf<String>()
        doc.select("img.image").forEach {
            val url = it.attr("src").replace("thumbnail", "preview")
            urls.add(url)
        }

        if (urls.isEmpty()) {
            return
        }

        log.info("Downloading ${urls.size} image(s)")

        val outPath = "/Users/me/Desktop/out"
        urls.forEach {
            downloadFile(it, outPath)
        }
    }
}
In the next blog post, we'll add coroutines to speed things up.
If this post was enjoyable or useful for you, please share it! If you have comments, questions, or feedback, you can email me. To get new posts, subscribe using the RSS feed.