Open-Source weekly

Hi and welcome to Star History Weekly #24!

Hello to the 61 of you who joined us this week!

If you are new, it's Mila here. Each week, we curate some open-source news and explore one open-source project in our Starlets series. Meanwhile, we share fascinating GitHub repos daily over at @StarHistoryHQ, so make sure to follow us if you haven't already.

If you like this newsletter, we ask you to subscribe and share!

In this Issue #24:

🗞️ News & Links
💫 Starlet of the week: Khoj

Enjoy!

🗞️ News & Links

Databricks to acquire Tabular, creators of Apache Iceberg.
Neosync: data anonymization and synthetic data orchestration.
Flyimg: a Dockerized application that resizes and crops images on the fly.
Dot: Text-To-Speech, RAG, and LLMs. All local!
An oldie but goodie: Does your Domain Name Pass the Radio Test?

💫 Starlet of the week - Khoj

ICYMI: If you wish to promote your open-source project on Star History, check out our announcement.

TL;DR

Khoj is your open-source, personal AI companion for instant answers. Dive into knowledge effortlessly as Khoj simplifies complex info, integrates your personal context, and tailors responses to your unique needs.

What is Khoj?

With AI, there's now a huge opportunity to improve the way people work, think, and engage. Now that we can scale friendly, usable interfaces, we can increase the overall capability of each of us individually.

With Khoj, we're reducing how much time people spend on research, looking through their documents, and repetitive information look-ups. Simplifying the way we interact with knowledge helps boost our capability, productivity, and well-being in one go. While we're deeply excited about all the ways this can help change things for the better, we're also cautiously aware that there's inherent risk in how unexplainable, novel technology reaches new people.

To that end, Khoj is a thinking tool that helps you reason, aggregate information, and create content in a transparent way.

Core Capabilities

Retrieval Augmented Generation: RAG with your personal notes and documents. You can manually share your PDFs and plaintext files, or hook it up to Obsidian to directly talk to your knowledge base.
AI Search Engine: Khoj is connected to the internet, which means that you can build on top of and retrieve information straight from online sources. Summarize articles and blog posts, or just get real-time information.
Automations: Create smart, contextual notifications using our automations service. Use it for writing prompts, news summaries, mindful moments, or weekly summaries of trending songs. The limit is your imagination.
Personalized Artwork: Create rich, personalized images. Our image generation infrastructure helps ensure that you're creating beautiful, personalized images whenever you tell Khoj to create a picture for you.

You can see some of these capabilities highlighted here.

Open-Source

On a personal basis, we're strong believers that products should be transparent, accessible, and self-hostable. Though Khoj is designed as a production-ready, multi-user personal AI application, it can also be self-hosted and run for a single user on a home server or laptop.

As AI becomes a mainstay in people's lives, it's important that ownership is retained in the hands of the people using our services. We definitely need an open-source alternative to ChatGPT, which is what Khoj is.

We've also made it easy for self-hosted users to integrate with open-source, local LLMs so that anyone can work completely offline. You can either hook the application up to Ollama or use any GGUF model from Hugging Face.

Architecture

Lifecycle of a chat message

When a query lands on the Khoj server, it goes through a series of subprocesses to deliver an appropriate response. This can take a minute, which is why we've integrated a WebSocket into our web UI to provide real-time updates on Khoj's thought process.
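
To make the idea concrete, here is a minimal Python sketch of a server-side handler that emits a status update for each stage before the final answer. This is not Khoj's actual code: the stage names, the event shape, and the `handle_query` function are all hypothetical, and the WebSocket transport is left out so the example runs on its own.

```python
import asyncio
from typing import AsyncIterator

# Hypothetical stage names; the real Khoj pipeline may use different steps.
STAGES = ["understanding query", "searching notes", "drafting response"]


async def handle_query(query: str) -> AsyncIterator[dict]:
    """Yield one status event per stage, then a final response event."""
    for stage in STAGES:
        yield {"type": "status", "message": stage}
        await asyncio.sleep(0)  # stand-in for the real work of this stage
    yield {"type": "response", "message": f"Answer to: {query}"}


async def collect(query: str) -> list[dict]:
    # A real server would send each event over the socket as it arrives;
    # here we simply gather them into a list.
    return [event async for event in handle_query(query)]


events = asyncio.run(collect("What did I write about Iceberg?"))
for event in events:
    print(event["type"], "-", event["message"])
```

Sending each event as soon as it is produced is what lets a UI show progress while a slow query is still running.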

How do we find the correct context

When you have data indexed with Khoj, we dynamically determine what information is most relevant to each of your queries. We use a sophisticated RAG pipeline to optimize the quality of matches returned to the LLM.
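
As a rough illustration of the retrieval step, the toy Python sketch below ranks notes by word-overlap cosine similarity with the query and keeps the best match as context for the LLM. It is an assumption-laden stand-in, not Khoj's pipeline: a production RAG system would typically use learned embeddings rather than word counts, and the `vectorize`, `cosine`, and `top_matches` helpers are made up for this example.

```python
import math
from collections import Counter


def vectorize(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_matches(query: str, notes: list[str], k: int = 2) -> list[str]:
    """Return the k notes most similar to the query."""
    q = vectorize(query)
    ranked = sorted(notes, key=lambda n: cosine(q, vectorize(n)), reverse=True)
    return ranked[:k]


notes = [
    "Grocery list: eggs, milk, bread",
    "Meeting notes: migrate tables to Apache Iceberg",
    "Trip plan: flights and hotel for Lisbon",
]
context = top_matches("iceberg migration meeting", notes, k=1)
print(context)
```

The selected notes would then be placed in the prompt alongside the question, so the model answers from your own documents.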

How you can upload files

There are a number of ways to upload files from our different clients. The easiest way is to drag and drop your file into the chat window. Otherwise, you can use any of the below clients to get started. See the docs for more details.

How do we help with recurring tasks

There's some information out there we're repeatedly looking up. Rather than Googling and investigating yourself, you can put Khoj on the job to help you with smart aggregation of information and reminders. You can go to our automations page to try it yourself.

Our team

Debanjum and Saba met at Microsoft while building products for Cortana AI. Within a massive organization, we'd managed to find a team that was building user-facing AI products for enterprise customers, and it was loads of fun. It was exciting watching it grow from zero to tens of millions of users over the course of a year. It quickly became apparent how useful personal AI productivity tools could be for everyone, not just enterprise users.

That was five years ago. Technology has advanced by leaps and bounds since then, inspiring many more imaginations. Last summer, we got backed by Y Combinator to work on Khoj full-time. We're really excited to be part of the process of making open, personal AI available for everyone.
