Say I have a large TXT or CSV file with data I want to search, or several such files.

What is the best way to index this data and make it searchable? I’ve been using grep, but it is not ideal.

Is there a self-hostable Docker container for indexing and searching this? Or should I use SQL instead?

  • Anna@lemmy.ml
    4 months ago

    It depends on the size of the data and on the use case: will you be updating it constantly, or just reading it? You mentioned several files, so do you need joins? If so, roughly what is the maximum number of joins you’ll be doing per query? Since you said CSV, I’m assuming the data is structured, not semi-structured or unstructured.

    A few more questions: do you need fast indexing without any complex operations? Are you going to do a lot of OLTP operations where you need ACID, or are you going the OLAP route? Are you planning on a distributed database, and if so, which two of the CAP guarantees do you want? Do you want batch processing or stream processing?

    I have a few dozen other questions, too.
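
If the answers turn out to be the simple case (structured CSV, mostly reads, one machine, no joins), plain SQLite may well be enough to start with. Below is a rough sketch of that route using only the Python standard library. The table name `people`, the columns, the sample rows, and the helper names `load_csv` and `search` are all made up for illustration; a real file would be opened with `open(path, newline="")` instead of the in-memory sample.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, csv_file):
    """Load a CSV file object (with a header row) into a new SQLite table."""
    reader = csv.reader(csv_file)
    header = next(reader)
    # Every column is stored as TEXT; this sketch does no type inference.
    cols = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()
    return header


def search(conn, table, column, term):
    """Return rows whose column contains term (substring match via LIKE)."""
    sql = f'SELECT * FROM "{table}" WHERE "{column}" LIKE ?'
    return conn.execute(sql, (f"%{term}%",)).fetchall()


# Hypothetical sample data standing in for one of the real files.
sample = io.StringIO("id,name,city\n1,Alice,Berlin\n2,Bob,Paris\n3,Carol,Berlin\n")

conn = sqlite3.connect(":memory:")  # use a file path to keep the database on disk
load_csv(conn, "people", sample)

# An index speeds up exact-match lookups on this column.
conn.execute('CREATE INDEX idx_people_city ON people ("city")')

rows = search(conn, "people", "city", "berlin")
exact = conn.execute("SELECT name FROM people WHERE city = ?", ("Paris",)).fetchall()
print(rows)
print(exact)
```

Note that the index only helps the exact-match query; a `LIKE '%term%'` search with a leading wildcard still scans the whole table. If substring or keyword search over large text is the main workload, a full-text index (for example SQLite's FTS5 extension, where the build includes it) or a dedicated search engine would be the next thing to look at.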