thisago's blog


Converting APK internal DB into YAML

DBs, the gold of offline APKs


Story

The Cepher is a Bible translation project that has some interesting advantages, including:

  • Replacing the modern Latin names of characters with transliterations of their Hebrew versions.
  • Some translation fixes. (A couple of issues remain, but a restoration process is neither simple nor fast.)
  • Inclusion of some books considered apocrypha, including:
    • Yashar, which is mentioned in Yahusha 10:13
    • Chanok
    • 1, 2, 3 and 4 Maccabees

This weekend I played around with The Cepher's free Android app again to update its version in my (currently stalled) Ozzuu Bible application. I plan to restart this project, host it again, and include interesting new features:

  • Social features: personal notes, discussion threads.
  • Word-level interlinear: Strong's, lexicon, and more.
  • Exporting for offline usage: first-class self-hosted usage. Online is opt-in.
  • Plain text: static data contained in a Git repo. Open for contributions, built for cloning.

Extracting

I went through the same process 3 years ago with the paid version, provided courtesy of a friend. This time it was easier for me: the Cepher app now has a free demo version. Some features and apocryphal books are soft-locked in it, but the DB seems to be complete! [1]

So, following a reverse engineering process similar to one I used on a real estate app, I used make to run some decompilation tools. The tool usage is pretty easy (and simplistic):

  1. Unpack the XAPK with jadx.
  2. Unpack the APK contained in the XAPK.
  3. Dump the SQLite database into JSON to ease data manipulation and Git tracking of the data. (git gc does a great job!)

    jq '.[0]' ~/Documents/repos/com.cepher.abridged/dbs/abridged.sqlite/thecepher.json
    
    {
      "TextId": 154431,
      "Book": "BERE'SHIYTH",
      "Chapter": 1,
      "Verse": 1,
      "CepherText": "In the beginning <strong>Elohiym</strong> created את the heavens and את the earth.",
      "CommonBook": "GENESIS",
      "Section": "Torah",
      "CommonSection": "Instruction",
      "SortOrder": 1,
      "Lexicon": "Elohiym"
    }
    
  4. Convert the flat JSON into hierarchical YAML using a Nim script.

    yq '.["1"] | .verses = [.verses[0]]' ~/Documents/repos/com.cepher.abridged/thecepher.yaml
    echo '  # ...'
    
    commonName: GENESIS
    transliteratedName: "BERE'SHIYTH"
    commonSectionName: Instruction
    sectionName: Torah
    sortOrder: 1
    verses:
      - textId: 154431
        chapterNumber: 1
        verseNumber: 1
        content: >-
          In the beginning <strong>Elohiym</strong> created את the heavens and את the earth.
        lexicon: [Elohiym]
      # ...
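
For readers who want to try step 3 without the repo's tooling, here is a minimal Python sketch of dumping one SQLite table into a JSON array of objects. It is not the code used in the repo. The function name dump_table_to_json is made up for illustration, and the table name thecepher in the usage note is only a guess based on the JSON file name shown above.

```python
import json
import sqlite3


def dump_table_to_json(db_path: str, table: str) -> str:
    """Return every row of `table` as a JSON array of objects (hypothetical helper)."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row  # lets each row be converted to a dict by column name
    try:
        # Table names cannot be bound as SQL parameters, so quote the identifier.
        quoted = '"' + table.replace('"', '""') + '"'
        rows = conn.execute(f"SELECT * FROM {quoted}").fetchall()
    finally:
        conn.close()
    # ensure_ascii=False keeps Hebrew text such as את readable in Git diffs.
    return json.dumps([dict(row) for row in rows], ensure_ascii=False, indent=2)
```

Calling dump_table_to_json("abridged.sqlite", "thecepher") and writing the result to a file should give something shaped like the jq output above, assuming the table really has that name. Indented, one-field-per-line JSON is a deliberate choice here: it keeps Git diffs small when only a few verses change.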
    

The process is (mostly [2]) reproducible. Just run make in the repo root! https://codeberg.org/thisago/com.cepher.abridged
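
The flat-to-hierarchical conversion in step 4 is done by a Nim script in the repo. As an illustration only, the same idea can be sketched in Python using the field names visible in the two samples above. Several things here are assumptions: that the top-level key is the book's SortOrder as a string, that SortOrder is the same for every verse of a book, and that multiple Lexicon entries are comma-separated. The function name group_rows_by_book is hypothetical.

```python
from typing import Any


def group_rows_by_book(rows: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Nest flat verse rows under their book (a sketch, not the repo's Nim script)."""
    books: dict[str, dict[str, Any]] = {}
    # Sort first so books and verses come out in reading order.
    ordered = sorted(rows, key=lambda r: (r["SortOrder"], r["Chapter"], r["Verse"]))
    for row in ordered:
        key = str(row["SortOrder"])  # assumption: one SortOrder value per book
        book = books.setdefault(key, {
            "commonName": row["CommonBook"],
            "transliteratedName": row["Book"],
            "commonSectionName": row["CommonSection"],
            "sectionName": row["Section"],
            "sortOrder": row["SortOrder"],
            "verses": [],
        })
        # Assumption: several lexicon entries would be separated by commas.
        raw_lexicon = row.get("Lexicon") or ""
        lexicon = [word.strip() for word in raw_lexicon.split(",") if word.strip()]
        book["verses"].append({
            "textId": row["TextId"],
            "chapterNumber": row["Chapter"],
            "verseNumber": row["Verse"],
            "content": row["CepherText"],
            "lexicon": lexicon,
        })
    return books
```

Book-level fields are stored once instead of being repeated on every verse, which is what makes the YAML shorter and easier to review than the flat JSON. The resulting dict can then be handed to any YAML serializer, for example PyYAML's yaml.safe_dump with allow_unicode=True and sort_keys=False (a third-party package, not part of the sketch).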

Outro

Kinda an announcement, kinda a lengthy devlog. A story describing another step.

I hope it helps break down the resistance to analyzing APKs. jadx does a great job and makes the process extremely simple. (But not when it's React Native, LOL.)

Footnotes:

[1] Not really a big deal: the app is not that expensive, and when I invest in using this data, I'll consider buying it and comparing the two DBs.

[2] The fetched APK is the "latest" version rather than a pinned one. Not a big deal, because updates to the data are welcome.