Perhaps it could benefit from something like a "dissimilarity" filter, which ranks the current result set by returning the most unique hits first. You wouldn't always want this, because sometimes you're searching for how something is typically used, and with many duplicative results you can confirm that's the preferred pattern. But other times you're looking for more esoteric usage of a certain function, and it would be nice to filter the "standard usage" from the results (although you can already do this with carefully chosen negated keywords).

Personally, I'm happy with the new code search so far. I stopped using Sourcegraph because I could never get the deeper results I wanted - it would just return the top five repositories including some common code snippet and I couldn't explore further than that. GitHub Code Search doesn't seem to have this problem to such a degree, since I can use negation more naturally, and since my query is not limited to some shallow subset of the corpus before refining it.

Sure! I will try it next time I'm using code search. I can't remember the specific queries I've tried, but I only ever used it when looking for "how have people used this function," so I remember the limited depth of results being particularly annoying.

I didn't realize Steve Yegge had joined your team - congratulations, that's quite an endorsement! I've always liked Sourcegraph just because our company names are so similar (founder of Splitgraph here). We might have even gotten a few candidates because of that initial confusion.

It's always unfortunate when a big company introduces a product so similar to the core product of a startup, but I'm sure there is a silver lining there, especially when you have a talented team and a mature codebase (for example, Fly.io has been able to carve out a niche for itself despite Cloudflare moving to compete in the same areas). How has GitHub Code Search impacted your product direction? Do you see it as an opportunity to focus more on the internal use case, or do you have plans for some other differentiation?

Either way, best of luck to you from a fellow S-grapher!

I have no idea how you're going to rank 28 million repos in a way that matches my perception of relevance. I am talking about searching in a single repository; who would expect to get useful results otherwise?

To be specific, I was looking for the definition of one method in Highcharts so I could understand what it does and override it. GitHub gave 6 pages of results. I was able to find the function immediately in my IDE once I checked out the 100 MB+ repository and it indexed it. If I'd been able to do the same with the search on GitHub it would have saved me considerable time and hassle.

This search could be implemented by something that compiles and indexes like the IDE (Sourcegraph) or maybe some kind of shallower parsing. Highcharts is in TypeScript, which I don't know well, but in JavaScript the latter might be a little tricky because there are so many ways to define a function (one hell of a regex). I'd contrast that with Java, where it would be very easy to write a rule that would turn up a class definition if not a method definition; in my case finding the class would have solved my problem.

This is exciting! I see a lot of familiar pieces here that propagated from Google's Code Search, and I know a few people from Code Search went to GitHub, probably specifically to work on this. I always wondered why GitHub didn't invest in decent code search features, but I'm happy it is finally getting to the state of the art one step at a time. Some of the folks going to GitHub to work on this I know are just incredible and I have no doubt GitHub's code search will be amazing.

I also worked on something similar to the search engine that is described here for the purposes of making auto-complete fast for C++ in Clangd. That was my intern project back in 2018 and it was very successful in reducing the delays and latencies in the auto-complete pipeline. That project was a lot of fun and was also based on Russ Cox's original Google Code Search trigram index. My implementation of the index is still largely untouched and is a hot path of Clangd. There is a very long design document about how exactly this works, so if you're interested in understanding the internals of a code search engine, you can check it out: I made a huge effort to document it as much as I can and the code is, I believe, very readable (although I'm obviously very biased because I spent a lot of time with it).
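
For anyone curious what the "dissimilarity" filter idea from this thread (rank the current result set so the most unique hits come first) might look like, here is a rough sketch. Everything in it is an assumption made for illustration: the function names are made up, hits are treated as plain snippet strings, and similarity is measured with Jaccard overlap of identifier tokens. It is not how GitHub or Sourcegraph rank results.

```python
from __future__ import annotations

import re


def tokens(snippet: str) -> frozenset[str]:
    """Reduce a code snippet to its set of identifier-like tokens."""
    return frozenset(re.findall(r"[A-Za-z_]\w*", snippet))


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard similarity of two token sets (1.0 means identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def rank_by_dissimilarity(hits: list[str]) -> list[str]:
    """Hypothetical re-ranker: keep the top hit, then greedily pick the hit
    that is least similar to anything already picked."""
    if not hits:
        return []
    token_sets = [tokens(h) for h in hits]
    picked = [0]  # assume the engine's first hit stays first
    remaining = list(range(1, len(hits)))
    while remaining:
        # Score each candidate by its closest already-picked neighbour;
        # the lowest score is the most "unique" remaining hit.
        best = min(
            remaining,
            key=lambda i: max(jaccard(token_sets[i], token_sets[j]) for j in picked),
        )
        picked.append(best)
        remaining.remove(best)
    return [hits[i] for i in picked]


hits = [
    "foo(a, b)",
    "foo(a, b)  # same",
    "result = foo(a, b)",
    "bar.map(lambda x: foo(x, retries=3))",
]
ranked = rank_by_dissimilarity(hits)
```

With these made-up hits, the unusual `bar.map(...)` call moves up to second place and the near-duplicate "standard usage" lines sink to the bottom. The greedy loop is quadratic in the number of hits, so it would only make sense on a page or a few pages of results, not on the whole corpus.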