Bible Semantic Search

lalaland1125 · on June 14, 2022

SBERT isn't trained on this type of archaic English and you can see it failing. It needs to be fine tuned or you should use a modern Bible translation.

A clear example is the query: "homosexuality"

This returns:

> James 2:3 - And ye have respect to him that weareth the gay clothing, and say unto him, Sit thou here in a good place; and say to the poor, Stand thou there, or sit here under my footstool

It's clearly seeing "gay" but is unaware that the meaning has changed.

A classic issue when applying a ML model out of domain.

chrislee973 · on June 14, 2022

Hey, creator of the Bible Semantic Search app here. I 100% agree with you. I hacked together this prototype for the purposes of learning the Pinecone API rather than fine-tuning a language model, but I'm still pleasantly surprised by the quality of the search results despite all the shortcomings you mentioned. The results aren't great, but they're decent. I'm surprised how well SBERT works out of the box considering I haven't done any fine-tuning whatsoever, and like you said, it doesn't fully understand the KJV's archaic writing style. Switching to a version that uses more modern English like the NIV is trivial so maybe I'll do that.

chrismorgan · on June 14, 2022

I recommend the WEB as a public domain translation. The likes of the NIV and ESV will be troublesome for licensing reasons. The WEB isn’t my favourite translation (I grew up with the RSV and prefer the RSV or ESV), a bit clumsier than the ASV from which it mostly springs (with changes both missing and extraneous), but it’s public domain.

mikeytown2 · on June 14, 2022

Go with the ESV or NASB as these are word-for-word translations similar to the KJV. There's also the NKJV, which uses modern English. You can use modern translations on the backend and still display the KJV verse.

jordanmoconnor · on June 14, 2022

Most of these newer translations have licensing restrictions that prevent you from simply using them.

dotancohen · on June 14, 2022

Do those licenses apply to using them on the backend, though? Assuming that the site displays the classic verse, and not the copyrighted verse, as GP prescribes.

bentley · on June 14, 2022

I suppose that would be the same question as whether GitHub Copilot violates copyright in its code suggestions.

hushpuppy · on June 14, 2022

What makes the Bible unique in this context is that there is a huge amount of resources that all reference the same basic structure.

One of the things that makes the KJV useful for scholars is that, since it's been the "standard" for hundreds of years, a great deal of other work references its structure. People use it because of this. It has less to do with it being a fabulously accurate translation or whatever. It's just that newer Bibles don't have this wealth of history and documentation referencing them, and many of them are pretty expensive to license, while the KJV is public domain.

For example, "Strong's Exhaustive Concordance". If you get a version of the KJV with "Strong's Numbers" you can cross-reference words and phrases in their original languages (Greek/Hebrew/etc.). This way students can understand some of the original meanings that go into difficult or disputed passages.

Also there is a large number of commentaries of all sorts of different types that reference specific passages.

Besides that, the KJV is mostly valued in Protestant branches of Christianity. Other Christian traditions, such as the various Eastern Orthodox churches, have different numbers of books or arrange things in different orders. There have been different attempts by past scholars to arrange things in more chronological order, too.

This makes the Bible fairly unique when it comes to literature. Each verse of text can have dozens of different "back links" and "references".

So if somebody searches for the subject "Homosexuality" it will get hits in various commentaries. Those commentaries all directly reference verses in the Bible.

So you could show the verse found and why it was selected. That way a reader would be shown "These authors think this verse has to do with homosexuality" and they could click through and find out the justification for this, different translations, what those translations are likely based on, what other Christian sects feel this verse means, and so on and so forth.

I don't know if it would be useful for you, but there is a "Sword Project" that collects and cross references different Bibles and bible resources as well as tools.

https://www.crosswire.org/sword/index.jsp

Syonyk · on June 14, 2022

> Switching to a version that uses more modern English like the NIV is trivial so maybe I'll do that.

ESV or CSB, please...

How hard would it be to train it on the range of common modern translations? It would be an interesting stress test of the models to see how close searches in different translations are - they're theoretically all communicating the same thing, with different styles and emphasis, but I'd expect a lot of that to fall out in the semantic search (at least, if it were working properly).

You could grab a range of English translations ranging from "very literal" to "thought for thought" (The Message applies here, and I'm not even sure it's thought for thought), do various searches, and see what the overlap in results is.

In any case, very neat project... concept. :/ It appears to have fallen over, all I get is "Please wait..." when I try to access it. Even without my usual web filters interfering. I think.
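The cross-translation comparison suggested above could be scored with a simple set-overlap measure. This is a minimal sketch, assuming each search returns a list of verse references; the two result lists below are hypothetical placeholders, not output from the app, and Jaccard overlap is only one possible choice of metric.

```python
def jaccard_overlap(results_a, results_b):
    """Return |A & B| / |A | B| for two lists of verse references."""
    set_a, set_b = set(results_a), set(results_b)
    if not set_a and not set_b:
        return 1.0  # two empty result lists are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical top-3 results for the same query run against two translations.
top_k_literal = ["Genesis 6:5", "Jeremiah 17:9", "Romans 3:10"]
top_k_paraphrase = ["Genesis 6:5", "Romans 3:10", "Psalm 14:3"]

overlap = jaccard_overlap(top_k_literal, top_k_paraphrase)
print(f"Jaccard overlap: {overlap:.2f}")
```

An overlap near 1 would suggest the translations land on the same verses for that query; a low overlap would flag queries where style and emphasis change what the model retrieves.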

xhrpost · on June 14, 2022

The range[1] you speak of could definitely be interesting to add. There can already be quite a difference between translations on this spectrum which could help this tool pick up even more possibilities for particular meanings (particularly on the extreme end of functional/paraphrasing like The Message you mentioned or The Passion Translation). It's quite fascinating looking at a verse across this spectrum and seeing what the different translators chose to focus on or "pull out" of the underlying Greek/Hebrew.

I haven't personally read much of ESV or CSB, just curious, why the preference for those?

[1]: https://en.wikipedia.org/wiki/Dynamic_and_formal_equivalence

eckza · on June 14, 2022

Semantic search on several versions and show results for chapter and verse in my preferred version = best of both worlds.

This is awesome. Thank you for building it.

xhrpost · on June 14, 2022

Some modern translations might have copyright rules preventing this without express permission (which may or may not be difficult to acquire, obviously other sites/apps have done it). They all have different rules and limitations but I remember reading that the NET Bible was specifically built for more permissive copyright requirements in the Internet age.

yjftsjthsd-h · on June 14, 2022

https://ebible.org/t4t/ might work? It's available in ex. https://andbible.github.io/ which bodes well for its licensing. It's also in conventional English (minimal jargon), which is probably a positive.

Edit: Actually, better yet: https://crosswire.org/sword/modules/ModDisp.jsp?modType=Bibl... has licensing info for each listed item

gibspaulding · on June 14, 2022

I just looked up NET because I was thinking the same thing, but it seems their free license is strictly for print versions and <1000 copies. Also, if OP cares, it's noncommercial. Their site has an email address posted for licensing inquiries though, so you could ask!

As someone else pointed out, the WEB translation is public domain so it might be a better option.

swasheck · on June 14, 2022

sblgnt is a decent basis for literary analysis but it’s NT only. the non-trivial part of your work is to navigate the oddly byzantine bible licensing considerations

kadoban · on June 14, 2022

Couldn't they search in a modern translation and just refer back to the ancient one? Seems like the way bible verses are split up should remove the ability of translation to move anything around much.

chrislee973 · on June 14, 2022

Ah, that's a good idea. I was considering just fully replacing the KJV with the NIV, but for people who want the KJV, having a mapping between the two versions would work. Do the verses in the two versions map to each other exactly? Like they cover the same exact thing, just written in a different style?

chrismorgan · on June 14, 2022

To do this properly, you have to map versifications, because some systems use different conventions. As the most significant and pervasive example, some systems treat the title of psalms as verse one (French generally does this), while others treat it as separate (verse zero, effectively; English generally does this), so if you just reuse numbers across such transitions you’ll get consistent off-by-one errors. There are quite a few other similar off-by-one errors, and occasionally more, throughout.

https://wiki.crosswire.org/Alternate_Versification is a decent starting point for looking into the topic, with SWORD’s canon_.h files fairly tolerable for showing the number of verses in each chapter. Unfortunately, SWORD has never gone as far as doing proper mapping between versifications for some reason; they have some basic mapping somewhere or other, but I can’t remember offhand where or what it is, as I only briefly looked into it five years or so ago.

The most significant differences occur when you switch languages (there are quite a few differences if you switch from English to French or to many Indic languages), but there may be some differences between translations within a language too, e.g. SWORD’s NRSV versification has an extra verse in 3 John and Revelation 12 compared to its KJV versification.
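The psalm-title off-by-one described above can be sketched as a per-chapter offset table. This is a minimal sketch under stated assumptions: `TITLE_OFFSETS` and both function names are hypothetical, the two entries are illustrative only, and a real mapping would need a complete table from a proper versification source (plus handling for the other differences mentioned).

```python
# Illustrative offsets only: how many verse numbers the psalm title occupies
# in a scheme that numbers the title as part of the text.
TITLE_OFFSETS = {
    ("Psalms", 3): 1,   # title counted as verse 1
    ("Psalms", 51): 2,  # longer title counted as verses 1-2
}


def to_title_numbered(book, chapter, verse):
    """Map a reference from a title-as-verse-zero scheme to a title-numbered scheme."""
    offset = TITLE_OFFSETS.get((book, chapter), 0)
    return (book, chapter, verse + offset)


def from_title_numbered(book, chapter, verse):
    """Inverse mapping; verses that fall inside the title collapse to verse 0."""
    offset = TITLE_OFFSETS.get((book, chapter), 0)
    return (book, chapter, max(verse - offset, 0))


print(to_title_numbered("Psalms", 3, 1))
print(from_title_numbered("Psalms", 51, 3))
print(to_title_numbered("Genesis", 6, 5))
```

Chapters without an entry pass through unchanged, which is the case for most of the text; reusing numbers without such a table is what produces the consistent off-by-one errors.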

lrvick · on June 14, 2022

Just remember the NIV translation of The Bible is property of Harper Collins, who is also amusingly the publisher of many tabloids and The Satanic Bible.

I used to distribute a BibleGateway-ripped copy of NIV as a plugin for the open source GnomeSword project and managed to upset both Harper Collins and the open source scholar community. It was absurd and I refused to stop. I dared them to sue me for sharing the Bible as open source... the headlines would write themselves.

FWIW they never took the bait, but expect empty threat Cease and Desist letters from the Harper Collins legal team.

pbhjpbhj · on June 14, 2022

I thought NIV was Zondervan, bought (as you intimate) by News International (Rupert Murdoch). What I hadn't realised is Harper Collins owns Zondervan and Murdoch controls that.

JasonFruit · on June 14, 2022

Since a few verses are in the KJV but not the NIV, some would never appear in searches.

Yeroc · on June 14, 2022

Yes, for the majority of the English translations verses map to each other exactly with a few minor exceptions. NIV would map virtually exactly.

JasonFruit · on June 14, 2022

This is pretty good! I entered, "Every human thought is only evil," and got, among others, the verse I hoped for, Genesis 6:5: "And God saw that the wickedness of man was great in the earth, and that every imagination of the thoughts of his heart was only evil continually." "David cried" got me some unexpected results for "the Son of David", but also the verse about David and Jonathan I expected, and a couple others that were a very good fit.

…and it's gone.

phgn · on June 14, 2022

FYI it's failing for me with:

    urllib3.exceptions.ProtocolError: This app has encountered an error. The original error message is redacted to prevent data leaks. Full error details have been recorded in the logs (if you're on Streamlit Cloud, click on 'Manage app' in the lower right of your app).
    Traceback:
    File "/home/appuser/venv/lib/python3.8/site-packages/streamlit/scriptrunner/script_runner.py", line 475, in _run_script
        exec(code, module.__dict__)
    File "/app/bible-semantic-search/app.py", line 50, in <module>
        query_results = controller.query(
    File "/app/bible-semantic-search/controller.py", line 127, in query
        results = self.index.query(query_emb, top_k, namespace)
    File "/app/bible-semantic-search/pinecone_index.py", line 63, in query
        return self.index.query(

giancarlostoro · on June 14, 2022

I'm having the same issue. For reference, my search phrase was just two simple words: "Judge not". I'll check back later; I'm curious how this looks and behaves overall.

mfer · on June 14, 2022

It worked yesterday. Something broke in the meantime.

steve_gh · on June 14, 2022

Hmm. I entered "kingdom of heaven" - and it returned a ProtocolError

Seems kind of appropriate. Obviously it is easier for a camel to thread the eye of a needle than to get a semantic search on the Kingdom of Heaven.

gk1 · on June 14, 2022

Credit goes to Chris Lee: https://www.chrisleeportfolio.com

Explanation when you click “Wat this?”:

> This is a Streamlit app I prototyped for performing semantic search on the King James Bible. It conducts full text search as well as semantic search, which is useful for surfacing passages that are similar in meaning to the query, even if the passages don't explicitly contain the query keyword(s). Suppose you wanted to bring up all verses that reference the infamous snake that tempted Eve. In a traditional keyword search system, searching for 'snake' wouldn't yield any results because the KJV uses the term 'serpent'. A semantic search system would take that 'snake' query and retrieve the relevant verses that contain 'serpent' as well as similar verses like ones about reptiles.

> Under the hood, I've generated vector embeddings of every verse in the Bible using SBERT (https://www.sbert.net/), and stored those embeddings in a vector database called Pinecone (https://www.pinecone.io). Every time you submit a query, it's converted to its vector representation using SBERT. That query vector is then sent to Pinecone, which performs an Approximate Nearest Neighbor (https://www.pinecone.io/learn/what-is-similarity-search/) search, retrieving the top n verses that are the most semantically similar to our query. The verses returned are ranked in order of most to least similar.

(Full disclosure: I work for Pinecone, but I have no connection to this demo.)
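The query flow in the quoted explanation (embed the query, compare it with stored verse vectors, return the top n) can be illustrated with a brute-force search. This is a minimal sketch, not the app's code: the 3-dimensional vectors are made up to stand in for SBERT embeddings, and an exact cosine-similarity scan replaces Pinecone's approximate nearest neighbor index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n(query_vec, verse_vecs, n):
    """Return the n verse references most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), ref)
              for ref, vec in verse_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ref for _, ref in scored[:n]]


# Made-up vectors; a real system would store one embedding per verse.
VERSE_VECTORS = {
    "Genesis 3:1": [0.9, 0.1, 0.0],
    "Genesis 3:13": [0.8, 0.2, 0.1],
    "Luke 9:13": [0.0, 0.1, 0.9],
}

query_vector = [1.0, 0.0, 0.0]  # pretend embedding of the query "snake"
print(top_n(query_vector, VERSE_VECTORS, 2))
```

An exact scan like this is fine for a few tens of thousands of verses; approximate indexes trade a little accuracy for speed on much larger collections.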

grahameb · on June 14, 2022

How does pinecone compare to pgvector?

https://github.com/pgvector/pgvector

chrislee973 · on June 14, 2022

In terms of performance, I'm not sure, although I'm willing to bet on Pinecone as they have their own proprietary indexing algorithm. I can speak in terms of developer experience though, as I did use Pinecone to build this app. One obvious but important difference is that with pgvector, you need to spin up and self-host a Postgres server on your own. Pinecone provides a batteries-included, fully managed experience. Also, it seems that a lot of important operations in pgvector are conducted with SQL statements. With Pinecone, you don't need to deal with SQL or with any ORMs.

In terms of this project, I would not have chosen pgvector. I don't want to deal with the PITA that comes with manually setting up and self hosting a vector database. When I'm building a demo or a prototype, I care about speed of development, which lets me more effectively explore the possibility space of whatever I'm building. I'm not a database admin, so when I deploy the project, I don't want to do database admin tasks. The ease of use of Pinecone's API lets me move fast, and it was very intuitive to learn. One downside of Pinecone is that although they have a generous free tier, their next highest tier is $50/month (and that's the low end of that tier). This is unfortunate for solo devs like myself who are likely to graduate from the free tier and would be willing to pay a bit more for a higher tier, but find the $50/month plan to be overkill. I think they're trying to target startups with that $50/month plan.

michaelmior · on June 14, 2022

Milvus[0] would be another interesting comparison.

[0] https://milvus.io/

gk1 · on June 14, 2022

Same general difference: Pinecone is SaaS that's ready to go with a few API calls, while Milvus is an open-source tool that requires work to set up, scale, and maintain.

michaelmior · on June 14, 2022

I meant in terms of performance, but all those points are certainly true.

sircastor · on June 14, 2022

While serving a mission for my church, a fellow member bought me a copy of “Strong’s Exhaustive Concordance of the Bible”. I found it really fascinating to flip through correlated terms and be able to draw conceptual lines between them. It was the first time I’d ever come across something like that. It strikes me as something that is relatively rare simply because of what the Bible is, in terms of popularity and commonality across the western world.

jmorse2 · on June 14, 2022

I've got a copy and still use it sometimes -- being able to immediately see nearby words (like "glad", "gladly", "gladness") is quite useful, which you'd miss just searching for "glad". Being able to see all uses on a hardcopy in a fixed position that doesn't scroll is helpful too, I find it harder to rationalise big lists when they're scrollable.

(I also use it to explain type checking to friends, given the number of times I've looked up a Hebrew word in the Greek dictionary, or vice versa).

elihu · on June 14, 2022

I'd like to think it's called "Strong's Concordance" because that's what you'd be if you carried it around all the time.

michaelsbradley · on June 14, 2022

https://www.blueletterbible.org/kjv/gen/1/1/ss1/s_1001

pbhjpbhj · on June 14, 2022

BLB is great, their app works well too. I've been using it for what feels like ~2 decades.

brink · on June 14, 2022

Very cool! I could see myself really using this. I just wish it wasn't so dependent on SaaS - it tends to make things unreliable, and can disappear at any time.

beltsazar · on June 14, 2022

A query "what is the purpose of life?" returns decent results with:

- Precision@2 = 50%

- Precision@5 = 20%

- Precision@10 = 30%

Relevant results:

2) Philippians 1:21 - For to me to live is Christ, and to die is gain.

6) Ecclesiastes 3:13 - And also that every man should eat and drink, and enjoy the good of all his labour, it is the gift of God.

10) Ecclesiastes 2:17 - Therefore I hated life; because the work that is wrought under the sun is grievous unto me: for all is vanity and vexation of spirit.
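Precision@k is the fraction of the top k results that are relevant, so it can be recomputed from the ranks of the relevant results listed above (2, 6, and 10). A minimal sketch; the function name is my own:

```python
def precision_at_k(relevant_ranks, k):
    """Fraction of the top-k results that are relevant (ranks are 1-based)."""
    hits = sum(1 for rank in relevant_ranks if rank <= k)
    return hits / k


relevant_ranks = [2, 6, 10]  # ranks judged relevant in the listing above
for k in (2, 5, 10):
    print(f"Precision@{k} = {precision_at_k(relevant_ranks, k):.0%}")
```

With relevant results at ranks 2, 6, and 10, this gives 1/2, 1/5, and 3/10.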

yosito · on June 14, 2022

Would have loved this back in my religious days, or in the brief period of time I was interested in criticizing religion. These days I'm not really interested at all in the Bible, but this is still a pretty cool idea. A little bit of an archaic choice to search the KJV though. Maybe it was chosen due to copyright issues.

the_only_law · on June 14, 2022

Isn't the KJV popular among evangelical sects as well? I remember seeing more of those as a kid than any modernized version.

overthemoon · on June 14, 2022

At least when I was growing up it was the New International Version. You were a fundie if you insisted on the KJV or the NKJV among people I knew.

hnal943 · on June 14, 2022

The NIV changed their interpretive philosophy around the turn of the century to favor gender neutral language. Most churches switched to ESV at that point.

overthemoon · on June 16, 2022

I didn't know that! Interesting.

Glide · on June 14, 2022

That sounds like a much older congregation. If the worship has a guitar, it’s probably using ESV.

wincy · on June 14, 2022

Must be for Mormons. They only read the King James Version because otherwise the Book of Mormon would sound super weird seeing as how it’s the world’s most popular Bible fanfic. Source: technically a Mormon.

dbrueck · on June 14, 2022

Not really. It's true that KJV is the default for English speakers, but less than half of the church's members are native English speakers. Even among those that speak English, many study the NIV and ESV, for example. It's not even that uncommon to hear non-KJV quotes in GenCon for that matter.

wincy · on June 14, 2022

Hey today I learned, thanks! I catch maybe five minutes of General Conference, my wife’s more likely to watch it than I am. I was also very confused about what a tabletop gaming convention founded by Gary Gygax (of Dungeons and Dragons fame) had to do with Mormons at first hahah.

eru · on June 14, 2022

Well, I guess the Bible is also its own 'fanfic', as not every part was written at the same time and some later parts clearly reference earlier parts.

hnal943 · on June 14, 2022

Fanfic would imply that the work is non-canon, wouldn't it? The bible represents some of the earliest arguments about what is canon.

eru · on June 15, 2022

Well, as you suggest, opinions on what is canon differ.

ARandomerDude · on June 14, 2022

This seems to struggle with phrases.

For example I expected Luke 9:13 as a result for the phrase "you feed them" or "you give them something to eat." The King James reads "give ye them to eat" but the app never found it.

shrimpx · on June 14, 2022

Is the source available? Curious if the streamlit app makes requests to an external service or if it’s self-contained.

punk_ihaq · on June 14, 2022

It makes a request to the Pinecone API to search through vector embeddings. [1][2]

[1] https://github.com/chrislee973/bible-semantic-search/blob/ab...

[2] https://www.pinecone.io/

punk_ihaq · on June 14, 2022

It is. Click the hamburger menu to the top-right, followed by 'View app source': https://github.com/chrislee973/bible-semantic-search/blob/ma...

shrimpx · on June 14, 2022

Thanks!

wvlia5 · on June 14, 2022

funny story comes up first when you search "bears"

badrabbit · on June 14, 2022

Is this site hijacking my history, or is it just me? I can't get back to HN after opening it because the history is spammed with their own domain. Poor-quality site; this behavior should be discouraged.

uean · on June 14, 2022

Hmm. Searching “baby” returns nothing but references to Babylon. Not sure why this was my first and only search, less clear why it is doing partial text matching.

pvg · on June 14, 2022

It's so that you can learn about the Tower of Babies and the great confusion it sowed among humankind.

Yenrabbit · on June 14, 2022

I'd guess Babylon is not in the model's vocabulary and gets tokenized as <baby><lon> or similar
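To illustrate the guess: BERT-style models typically split out-of-vocabulary words with greedy longest-match WordPiece tokenization. The sketch below assumes a tiny hypothetical vocabulary; whether the model behind the app actually splits "Babylon" this way depends on its real vocabulary, which I haven't checked.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first sub-word split, WordPiece style."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a word-internal piece
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matched: the whole word is unknown
        tokens.append(piece)
        start = end
    return tokens


# Hypothetical toy vocabulary, not taken from any real model.
TOY_VOCAB = {"baby", "##lon", "serpent"}
print(wordpiece_tokenize("babylon", TOY_VOCAB))
```

If the split did happen this way, the "baby" piece would be shared between the two words, which could pull their embeddings closer together.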

chrislee973 · on June 14, 2022

The app performs full text search as well as semantic search. The full text search results are presented first, so if you scroll down to the bottom of the page you should see the semantically related results under the header "Semantic Search Results"

blondin · on June 14, 2022

this is cool. streamlit is super cool.

totally missed when they made a sharing platform.

bibelo · on June 14, 2022

Why with a 400 year-old translation though?

chrislee973 · on June 14, 2022

I figured it was the quintessential version of the Bible ¯\_(ツ)_/¯. I'm not religious so cut me some slack haha. Looking back, I should've used a version using modern English.

Amezarak · on June 14, 2022

KJV is still wildly popular and some sects even regard it as an inspired translation. Never heard that about anything else.

NKJV is more readable for people who grew up reading blog posts instead of old books and generally retains the poetry of the KJV, but it's under copyright. I think it would be allowed for a use like this though, because they explicitly allow (as do most translations) quoting up to a certain number of verses.

EDIT: Maybe not, here's the rules: https://www.harpercollinschristian.com/sales-and-rights/perm...

Seems like you'd fall afoul for the % of total text requirement.

telotortium · on June 14, 2022

My understanding of the Eastern Orthodox Church is that they see the Septuagint as an inspired translation of the Old Testament (from Hebrew to Koine Greek).

elihu · on June 14, 2022

...another reason is that KJV is not currently under copyright, unlike (as far as I know) all modern English translations.

bentley · on June 14, 2022

Fun fact: in the United Kingdom, the King James Version is still copyrighted—perpetually copyrighted, in fact—by (who else?) the British monarchy.

Since the author of this app is based in the United States, he should be safe… we have a bit of precedent for ignoring royal prerogatives here :)

elihu · on June 14, 2022

Interesting.

pbhjpbhj · on June 14, 2022

I think there are a couple of free and open translations that are reasonable - the WEB (World English Bible) maybe.

https://www.biblegateway.com/versions/World-English-Bible-WE...

hopsworks · on June 14, 2022

Is the source code available for this?

saltdoo · on June 15, 2022

Need this for the Koran

enw · on June 14, 2022

> jesus being a gigachad

>

> Mark 1:25 - And Jesus rebuked him, saying, Hold thy peace, and come out of him.

yosito · on June 14, 2022

Is just two hours on HN enough to kill a Streamlit app? Or is this running into limits of a certain plan level? Or is the problem with the code itself?

slater · on June 14, 2022

[flagged]

justinhj · on June 14, 2022

6:7: “Do not be deceived, God is not mocked; for whatever a man sows, this he will also reap.”

irrational · on June 14, 2022

> Republicans

Isaiah 1:15 - And when ye spread forth your hands, I will hide mine eyes from you: yea, when ye make many prayers, I will not hear: your hands are full of blood.

1-6 · on June 14, 2022

> Joe Biden

Matthew 22:8 - Then saith he to his servants, The wedding is ready, but they which were bidden were not worthy.

Luke 14:24 - For I say unto you, That none of those men which were bidden shall taste of my supper.

heh heh.

blueberrychpstx · on June 14, 2022

[flagged]

eatonphil · on June 14, 2022

Tried to use it and got:

    AttributeError: This app has encountered an error. The original error message is redacted to prevent data leaks. Full error details have been recorded in the logs (if you're on Streamlit Cloud, click on 'Manage app' in the lower right of your app).
    Traceback:
    File "/home/appuser/venv/lib/python3.8/site-packages/streamlit/scriptrunner/script_runner.py", line 475, in _run_script
        exec(code, module.__dict__)
    File "/app/bible-semantic-search/app.py", line 11, in <module>
        index = PineconeIndex("qa-index")
    File "/app/bible-semantic-search/pinecone_index.py", line 16, in __init__
        self.index = self.connect_to_index(index_name)
    File "/app/bible-semantic-search/pinecone_index.py", line 30, in connect_to_index
        index = pinecone.Index(index_name)
    File "/home/appuser/venv/lib/python3.8/site-packages/pinecone/index.py", line 34, in __init__
        openapi_client_config.api_key = openapi_client_config.api_key or {}

gk1 · on June 14, 2022

Worked for me. Streamlit might be overloaded by HN traffic.

slater · on June 14, 2022

Yep, looks like it's been hugged.