feat: implement row and cell model classes#753
Conversation
| Returns cells sorted in Bigtable native order: | ||
| - Family lexicographically ascending | ||
| - Qualifier lexicographically ascending | ||
| - Qualifier ascending |
There was a problem hiding this comment.
I had in my notes that Qualifier should be sorted lexicographicaly, but I assume this is a mistake, since qualifier is bytes?
I implemented this by directly comparing the bytes values instead
There was a problem hiding this comment.
There was a problem hiding this comment.
Ok, so yeah it looks like it is comparing them as unsigned byte arrays, which I believe is the same operation as Python's byte comparison. I'll have to add some tests to make sure they are consistent though
| Returns cells sorted in Bigtable native order: | ||
| - Family lexicographically ascending | ||
| - Qualifier lexicographically ascending | ||
| - Qualifier ascending |
There was a problem hiding this comment.
| def keys(self): | ||
| """ | ||
| raise NotImplementedError | ||
| Returns a list of (family, qualifier) pairs associated with the cells |
There was a problem hiding this comment.
I would expect keys() to return the row key of the response, maybe there's a better name for this? Or maybe I misunderstand something?
There was a problem hiding this comment.
Yeah, so for convenience, I made RowResponse conform to the Sequence base type, so it can be treated as a standard sequence:
len(row_response)for cell in row_response:,first = row_response[0]sample = row_response[0:5]- etc
I also wanted to make it partially compatible with the Mapping base type, to make indexing convenient. In this context, the row_response can be treated as a key/value dictionary, where the keys are families or family/qualifier pairs, and the values are the associated cells:
cells = row_response["my_family"]cells = row_response["my-family, "my-qualifier"]
I added also implementations for Mapping's keys() values() and items() functions to make iteration easier:
for family,qualifier in row_response.keys():
cells = row_response[family,qualifier]
for cells in row_response.values():
print(cells)
for (family,qualifier), cells in row_response.items():
print(cells)
There was a problem hiding this comment.
So TL;DR: keys here refers to dictionary.keys(), to make iteration over stored data easier. But I can see how that may be confusing, and I'm ok with making changes here
There was a problem hiding this comment.
gotcha, thanks for the explanation! maybe we can leave the implementations for Mapping's keys() values() and items() functions part out for now? I can see how it's convenient, but maybe we can keep it simple at the first iteration?
There was a problem hiding this comment.
Yeah, sounds good. I changed keys() to get_column_components, and removed the others
| for cell in sorted(cells): | ||
| if cell.row_key != self.row_key: | ||
| raise ValueError( | ||
| f"CellResponse row_key ({cell.row_key!r}) does not match RowResponse key ({self.row_key!r})" |
There was a problem hiding this comment.
Our backend should never create rows that hit this, but end users could attempt to
And it doesn't hurt to include it as a sanity check either way
There was a problem hiding this comment.
as discussed, endusers should never construct these
Co-authored-by: Mattie Fu <mattiefu@google.com>
| cells: list[CellResponse] | ||
| | dict[tuple[family_id, qualifier], list[dict[str, Any]]], |
There was a problem hiding this comment.
I think there should only be one way to represent data in a response
There was a problem hiding this comment.
maybe make this internal as well
| self._cells_map: dict[ | ||
| family_id, dict[qualifier, list[CellResponse]] |
There was a problem hiding this comment.
I think a better data structure would be: OrderedDict[family, list[Cell]) and use bsearch to find a qualifier in the cell list
There was a problem hiding this comment.
Can you remind me the motivation for this?
| for cell in sorted(cells): | ||
| if cell.row_key != self.row_key: | ||
| raise ValueError( | ||
| f"CellResponse row_key ({cell.row_key!r}) does not match RowResponse key ({self.row_key!r})" |
There was a problem hiding this comment.
as discussed, endusers should never construct these
| this_ordering = ( | ||
| self.family, | ||
| self.column_qualifier, | ||
| -self.timestamp_micros, |
There was a problem hiding this comment.
please dont fight native ordering
There was a problem hiding this comment.
My understanding was that newer cells should come first? By default, older timestamps would come first.
That said, comparison is less relevant if we just use cells as they come from the backend though. If we don't have a way to determine proper ordering from the client side, maybe we should remove these comparison implementations entirely?
This PR implements the RowResponse and CellResponse classes, used to represent the data returned by read_rows queries
RowResponse implements Sequence, allowing it to be treated like a list of Cells (len, iteration, indexing, etc)
It also implements some properties of Mapping (allowing it to be indexed by
familyand(family, qualifier)keys, retrieving the list of keys, values, items, etc)Note: This is merging into a new v3 branch, not main. I plan on making all PRs related to this rewrite to v3 and then merging v3 into main when development is complete