TL;DR: it looks like HBase (like BigTable), uses as a structure similar to a B+ tree to do the lookup. So the row-key is the primary index (and the only index of any sort in HBase by default.)
Long answer: From this Cloudera blog post about HBase write path, it looks like HBase operates the following way:
Each HBase table is hosted and managed by sets of servers which fall
into three categories:
- One active master server
- One or more backup master servers
- Many region servers
Region servers contribute to handling the HBase tables. Because HBase
tables can be large, they are broken up into partitions called
regions. Each region server handles one or more of these regions.
There's some more detail in the next paragraph:
Since the row key is sorted, it is easy to determine which region
server manages which key. ... Each row key belongs to a specific
region which is served by a region server. So based on the put or
delete’s key, an HBase client can locate a proper region server. At
first, it locates the address of the region server hosting the -ROOT-
region from the ZooKeeper quorum. From the root region server, the
client finds out the location of the region server hosting the -META-
region. From the meta region server, then we finally locate the
actual region server which serves the requested region. This is a
three-step process, so the region location is cached to avoid this
expensive series of operations.
From another Cloudera blog post, it looks like the exact format used to save the HBase on the file system keeps changing, but the above mechanism for row key lookups should be more or less consistent.
This mechanism is very, very similar to Google BigTable's lookup (you will find the details in Section 5.1 starting at the end of page 4 on the PDF), which uses a three-level heirarchy to query the row-key location: Chubby -> Root tablet -> METADATA tablets -> actual tablet
UPDATE: to answer the question about lookups inside a Region Server itself: I don't know for sure, but since the row keys are sorted, and HBase knows the start and end keys, I suspect it uses a binary search or interpolation search, both of which are really fast - log(n) and log(log(n)) respectively. I don't think HBase would ever need to scan rows from the start row key to the one that it needs to find, since searches on sorted keys is well-known problem that has several efficient solutions.