Featured post
optimization - C++ String Interview Question -
i in c++ technical interview, given bit of simple string manipulation code, intended take string , return string comprised of first , last n-characters, , proceed correct bugs , make function efficient possible, came solution below, interviewer claimed there faster more optimal way:
original code:
std::string first_last_n(int n, std::string s) { std::string first_n = s.substr(0,n); std::string last_n = s.substr(s.size()-n-1,n); return first_n + last_n; }
my code:
bool first_last_n(const std::size_t& n, const std::string& s, std::string& r) { if (s.size() < n) return false; r.reserve(2 * n); r.resize(0); r.append(s.data(),s.data() + n); r.append(s.data() + (s.size() - n), s.data() + s.size()); return true; }
summary of changes:
changed interface take return string reference (assuming rvo , rvalues not available yet)
removed temporary strings being constructed via substr
passed input string const reference inorder around temporary instantiation of input
fixed off-by-1 error in last_n string
reduced number of times each character touched down once, or twice (in event of overlapping scenario)
placed check in event string s's size less n, returning false failure.
assuming native c++ allowed, there other way of doing above more efficiently or optimally?
note 1: original input string instance not modified.
note 2: solutions must pass following test case, otherwise not valid.
void test() { { std::string s = "0123456789"; std::string r = first_last_n(10,s); assert(r == "01234567890123456789"); } { std::string s = "0123456789abc0123456789"; std::string r = first_last_n(10,s); assert(r == "01234567890123456789"); } { std::string s = "1234321"; std::string r = first_last_n(5,s); assert(r == "1234334321"); } }
this implementation should fast:
inline std::string first_last_n(std::string::size_type n, const std::string& s) { n = std::min(n, s.size()); std::string ret; ret.reserve(2*n); ret.append(s.begin(), s.begin() + n); ret.append(s.end() - n, s.end()); return ret; }
when using gnu libstdc++, line declares & initializes ret
extremely fast because libstdc++ uses global "empty string" variable. thus, it's pointer copy. calls begin
, end
on s
fast because resolve const versions of begin
, end
, begin() const
, end() const
, internal representation of s
not "leaked". libstdc++, std::string::const_iterator
const char*
, pointer type , random access iterator. thus, when std::string::append<const char*>(const char*, const char*)
calls std::distance
obtain length of input range, pointer difference operation. also, std::string::append<const char*>(const char*, const char*)
results in memmove
. finally, reserve
operation ensures enough memory available return value.
edit: curious, here initialization of ret
in assembly output of mingw g++ 4.5.0:
movl $__znss4_rep20_s_empty_rep_storagee+12, (%ebx)
it's copying pointer global "empty representation".
edit2: okay. have tested 4 variants g++ 4.5.0 , visual c++ 16.00.30319.01:
variant 1 (the "c_str variant"):
inline std::string first_last_n(std::string::size_type n, const std::string& s) { std::string::size_type s_size = s.size(); n = std::min(n, s_size); std::string ret; ret.reserve(2*n); const char *s_cstr = s.c_str(), *s_cstr_end = s_cstr + s_size; ret.append(s_cstr, s_cstr + n); ret.append(s_cstr_end - n, s_cstr_end); return ret; }
variant 2 (the "data string" variant):
inline std::string first_last_n(std::string::size_type n, const std::string& s) { std::string::size_type s_size = s.size(); n = std::min(n, s_size); std::string ret; ret.reserve(2*n); const char *s_data = s.data(), *s_data_end = s_data + s_size; ret.append(s_data, s_data + n); ret.append(s_data_end - n, s_data_end); return ret; }
variant 3:
inline std::string first_last_n(std::string::size_type n, const std::string& s) { std::string::size_type s_size = s.size(); n = std::min(n, s_size); std::string ret(s); std::string::size_type d = s_size - n; return ret.replace(n, d, s, d, n); }
variant 4 (my original code):
inline std::string first_last_n(std::string::size_type n, const std::string& s) { n = std::min(n, s.size()); std::string ret; ret.reserve(2*n); ret.append(s.begin(), s.begin() + n); ret.append(s.end() - n, s.end()); return ret; }
the results g++ 4.5.0 are:
- variant 4 fastest
- variant 3 second (5% slower variant 4)
- variant 1 third (2% slower variant 3)
- variant 2 fourth (0.2% slower variant 1)
the results vc++ 16.00.30319.01 are:
- variant 1 fastest
- variant 2 second (3% slower variant 1)
- variant 4 third (4% slower variant 2)
- variant 3 fourth (17% slower variant 4)
unsurprisingly, variant fastest depends on compiler. however, not knowing compiler used think variant best because familiar style of c++, fastest when using g++, , not slower variants 1 or 2 when using vc++.
one thing interesting vc++ results using c_str
rather data
faster. perhaps why interviewer said there faster way implementation.
edit3:
actually, thought variant:
variant 5:
inline std::string first_last_n(std::string::size_type n, const std::string& s) { n = std::min(n, s.size()); std::string ret; ret.reserve(2*n); std::string::const_iterator s_begin = s.begin(), s_end = s.end(); ret.append(s_begin, s_begin + n); ret.append(s_end - n, s_end); return ret; }
it's variant 4 except begin , end iterators s
saved.
when variant 5 tested, beats out variant 2 (the data string variant) when using vc++:
- variant 1 fastest
- variant 5 second (1.6% slower variant 1)
- variant 2 third (1.4% slower variant 5)
- variant 4 third (4% slower variant 2)
- variant 3 fourth (17% slower variant 4)
- Get link
- X
- Other Apps
Comments
Post a Comment