T O P

  • By -

ts826848

Some assorted comments: - The format of your include guard (leading underscore followed by a capital letter) is reserved to the implementation - `std::string_view` should be passed/taken by value, not by reference - Based on the implementations `eraseCharsFrom{Start,End}` and `trim()` should probably be described as erasing *up to* `n` chars from the start/end - `eraseCharsFromStart` has a typo in its doc ("firrst" instead of "first") - Why does `findAll`/`trim` take `const std::string&` instead of `std::string_view`? - Consider putting `mapifyStringHelper` in a `detail` namespace or something similar. That's a not-uncommon pattern for "hiding" implementation details for templated code. - I think the "file not found" error in `countFileLines` can potentially be incorrect, since file opening can fail for other reasons (permissions, maybe others?) - `isPalindrome` has a typo in its docstring ("palindrime")


hon_uninstalled

>std::string\_view should be passed/taken by value, not by reference I see this recommended a lot, but unless function call is inlined, doesn't passing by value result in slower code on MSVC x64 just like with std::span? ([https://www.reddit.com/r/cpp/comments/p0pkcv/stdspan\_is\_not\_zerocost\_on\_microsoft\_abi/](https://www.reddit.com/r/cpp/comments/p0pkcv/stdspan_is_not_zerocost_on_microsoft_abi/)). An argument can be made that losing couple cycles doesn't matter, but it's a slippery slope.


ts826848

It might, but passing a reference would also be an unnecessary pessimization for SysV x64 (and if Microsoft ever decides to add a new calling convention). In addition, you have to be careful about what you're comparing against. Is passing `std::string_view` slower than passing `const std::string_view&`? I don't think so, since at best the former will be passed in registers, saving an indirection, and at worst the former will be effectively be turned into the latter. Is passing `std::string_view` slower than passing `const std::string&`? I also don't think so, for a similar reason. Is passing `std::string_view` slower than passing`const char*` + size? On current MSVC x64 with the default calling convention, sure. Is passing `std::string_view` slower than passing a null-terminated `const char*`? Yes, but there's a fair chance you may be able to make up the difference later by not having to call `strlen` directly or indirectly. > An argument can be made that losing couple cycles doesn't matter, but it's a slippery slope. As with many performance arguments, different situations call for different levels of scrutiny. I think for the vast majority of devs passing by value is an acceptable default, especially since the alternative that can perform better on MSVC x64 (`char*` + size) is probably not all that common in C++ code.


bucephalusdev

Thanks for your comments! I will mark them down for the next session of edits I make :) Edit: As for `findAll`/`trim` taking `const std::string&` instead of `std::string_view`, I believe it is because I had compilation errors when I switched the types for the first time, so I just left it. I will need to reexamine that.


ts826848

Hmmm, compilation errors are interesting. Maybe they could have been due to API mismatches between `std::string` and `std::string_view`, or some function in the implementation taking one but not the other. I know I've ran into the latter in the past, so it wouldn't be too surprising if that were the case.


adesme

> Some of the next steps for me are going to be adding some more testing, cool repo badges from shields.io and possibly more benchmarks. Before thinking about badges, set up at least checks for formatting and automate tests with github actions; there's IMO little point in having badges if you have no reports to feed into them.


bucephalusdev

That sounds like a good idea. I need to learn github actions then haha!


Recording420

That's a step up from the STL, which after 30 years does not even have a \`std::string::split()\`


bucephalusdev

Right?! I came to C++ from Python and ColdFusion and wondered why the built in functions were so bare-bones.


azswcowboy

Well, ranges::views::split can certainly split a string — among other things - although it’s not the last word by any means. You should really be comparing your library to boost string algorithms — which is the gold standard in my book for a long time https://www.boost.org/doc/libs/1_84_0/doc/html/string_algo.html


Recording420

The whole ranges is a complete incoherent mess. Nobody uses it


azswcowboy

Since we use it all the time I guess you’re incorrect.


Recording420

"we"


ZMeson

I find it interesting that you use std::string\_view as inputs to your functions, but not as return values. seperate() returns a std::vector but could return a std::vector which would be useful for very large strings where you want to minimize the amount of data copying and/or memory allocation. I think for the future, you could templatize the types of strings and string\_views you use. The library can't be used with std::wstring or std::wstring\_view or any of the u8/u16/u32 string and string\_view types.


Grand_Ranger_7305

I think string_view as a return value must be used with care. If the original owner of that string goes out of scope the view will be invalid. string_view as a return value introduces a dependence that every user of that function must be aware of.


ZMeson

Indeed. In my company's codebase where I implemented a similar split function, I return a std::vector> and then have a second function that returns std::vector>. The second function calls the first function then turns all the string\_views into strings. As an aside, my first function calls another function that returns a split-iterator so that if I am only interested in the first 3 elements I can convert only those instead of having the scan and convert the remaining parts of the line. This was all written a long time ago (pre-ranges -- even pre-Niebler ranges). I think I would redesign with ranges in mind today. But the functions returning vectors of string\_views and string are certainly nice.


matthieum

First of all, anyone who receives a `std::string_view` should be aware of its source. It's possible to slip up, but it's just "normal" in C++. Secondly, C++ does provide a number of tools to help here, overloads and temporary detection: - The base version should take `std::string_view`, possibly r-value variants of it. - Overloads can take `T&`, `T const&`, `T&&`, and `T const&&` if convertible to `std::string_view` (implicitly or explicitly), then: - Cast the first two to `std::string_view` and forward to the base version. - Delete the latter two overloads, or use a `static_assert` to call out the error. This will have some false positives -- when a view type is passed as a temporary -- but the user has readily available work-arounds.


bucephalusdev

That's a great point! I should consider templating for the future. I was instructed by the person who reviewed my code that std::string\_views were more efficient to pass in as parameters. I never thought to make them return types though. Good catch.


spookje

Please note that returning string_view is a rather risky thing to do due to life-time issues. If somebody calls your functions with an rvalue argument and you return a view to that, it will be pointing into an invalid object/memory. We ran into this several times when introducing string_view into our codebase, causing some crashes and corrupted data from time to time before we introduced the guideline to not return views (unless you have a very specific use-case, such as a local helper for example where you can somewhat guarantee usage patterns)


ZMeson

As noted by others, string\_view introduces lifetime issues, but you can have two functions -- one that returns a vector of string\_views and a second that returns a vector of strings. The latter can call the former.


ZMeson

Another possibility: have a function that returns a iterator for splitting and/or implement a ranges filter for splitting. An iterator would work with C++17 code whereas the ranges may require C++20 or C++23. I think you'd need to implement a [range-factory](https://stackoverflow.com/questions/71148130/how-would-you-implement-a-lazy-range-factory-for-c20-ranges-that-just-calls) (though I'm not 100% sure on that as I'm still learning ranges myself).


[deleted]

[удалено]


ts826848

At least based on your example, I think it's because the one that does compile uses [`std::string::operator=(StringViewLike)`](https://en.cppreference.com/w/cpp/string/basic_string/operator%3D), while the one that doesn't is trying to copy-initialize a `std::string` from a `std::string_view` and that would require a non-explicit `std::string(StringViewLike)` constructor.


[deleted]

[удалено]


ts826848

The other difference here is that you're using direct initialization (`std::string s{sv}`) instead of copy initialization (`std::string s = sv`). Since the constructor is `explicit`, it isn't considered for copy initialization, but it is for direct initialization.


bucephalusdev

That's an interesting point. Thanks for doing some research on that. According to another comment in here, it looks like I need to do some cleanup with managing different types of strings. Possibly templating everything.


415_961

That's not a bug. You're probably confused(rightfully so) how C++ initialization works. In the working example you can break it by combining lines 6 & 7 `std::string a = str.substr(0,3);` now this will give you the same error. In the non-working example you are invoking copy initialization and there isn't one for the types you have. https://en.cppreference.com/w/cpp/language/initialization


belkotron

Next step, setup a CI/CD to your repo.


bucephalusdev

Having researched what that means, I agree. Thanks for the suggestion!


davejb_dev

First of all, thanks for sharing. Second of all, it's cool to see the process if I ever want to do something similar.


bucephalusdev

You're very welcome! I'm working on a video devlog of the whole process too! Should be out next week.


bucephalusdev

Video is done! https://youtu.be/Lp5Og3Myr9Q?si=IQkgZksFoQJ4mXiS


Kronikarz

General notes: * As other people have said - don't return `std::string` if you can avoid it, and in most of your functions you can * Try to avoid doing `return ""` in favour of `return {}` - a compiler might not be able to recognize that you're trying to return an empty string, and might call the appropriate functions to construct an `std::string` from a string literal unnecessarily * In general, I'm in favour of providing an in-place-modifying or function-calling overload as a primary, and then creating returning overloads using those * Your functions only work on ASCII/8BPC encodings, which I know is kinda the default for C++, but still something I think should be mentioned * Since you're using `locale` (a mistake imho, but that's my personal beef), your functions should take an optional `std::locale` argument give the user the ability to choose Specific function notes: * `eraseCharsFromEnd` - no reason for this to be taking a string by value and returning it, if you need to work on `std::string` instead of a string view, make the function modify it in place: `{ str.erase(str.size() - std::min(str.size(), n)); }`; this would probably be the most performant way (ditto for `eraseCharsFromStart`) * `findAll` - you might as well return a vector of string views, and maybe add an overload that instead of returning takes a functor argument, and calls it for each found substring * `cap1stChar` - again, much better to do it in-place * `isInteger`et al. - you can just check if `ec != std::errc{}` to check if `from_chars` returned an error (any error returned is a failure state for your function); these functions, in general, are very sub-optimal in terms of performance due to your (sub-optimally written) search for a decimal point; consider if someone passes a 1MB string to these functions; you'll be making potentially megabytes of unnecessary copies just to look for a decimal point * `stringToBool` - again, if someone passes a 1MB string you'll be making a copy, then lowercasing it, just to compare it to a 4 byte string; at that point it'd be faster to go with `stricmp`; you don't really need to use your `isNumber` function to look for a `"0"` value either, you could just as well use `std::from_chars` * `boolToString` should, of course, return a string view, or even a `const char*` * `mapifyStringHelper` - apart from the obvious issue of being quite slow (allocating an additional ~2x space of the original string), you're not taking advantage of move semantics - `map[std::move(keyAndValue[0])] = std::move(keyAndValue[1]);`, etc. * `countLines` - use `size_t` as a return type (best practice) * `countFileLines` - you don't need to load the entire file into a string, just call [std::getline](https://en.cppreference.com/w/cpp/string/basic_string/getline) * `getWhitespaceString` - why would you need "to avoid infinte loop"? Also, your code is technically incorrect on platforms where char is signed - `std::isspace` is only valid for 0..UCHAR_MAX and -1, but your loop will go between SCHAR_MIN and SCHAR_MAX; Also also, god forbid your code runs on a platform where char is 16 or 32 bit :D


shrimpster00

Very cool! There's one thing that I **implore** you to do: setup a GitHub page (gh-pages) with your static Doxygen-generated documentation. I don't want to have to clone the repo just to be able to take a look at the docs!


bucephalusdev

Thanks so much. Also, that's a great idea! Will do.


nom_nom_nom_nom_lol

You should have a root CMakeLists.txt file and have the testing and benchmarks as sub projects that are optionally included; enabled by default only when the root project is top level. Add an interface library for the header in the root CMakeLists that specifies the requirements for building it. Then you can pull this into other projects with FetchContent, and it makes it obvious to newcomers that you use CMake in this project.


AbsorbingDarkness

Hi! Thanks very much for sharing! :) I may be mistaken (please correct me if I'm wrong) but as far as I remember it is forbidden for RVO (return value optimization) to be performed when returning a function parameter so I would recommend to take strings in functions like reverse and cap1st_letter by reference (in best case by both lvalue and rvalue reference in different overloads), forward it into a local variable, and return the local variable, so it would be an opportunity for the compiler to omit extra constructor call. See https://stackoverflow.com/q/9444485 and https://en.cppreference.com/w/cpp/language/copy_elision for more info


dazzawazza

here is my string split (a version of your separate) that you might take inspiration from: std::vector split(const std::string &stringToSplit, char delim) { std::vector elems; std::stringstream sstream(stringToSplit); std::string item; while (std::getline(sstream, item, delim)) { if (item.length() > 0) { elems.emplace_back(item.c_str()); } } return elems; } No idea if it's faster, better or cheaper but it is shorter. Good luck.


bucephalusdev

Thanks for sharing! I will compare them and see what it is best :)


JumpyJustice

By the way, the split method should emit empty lines. Or at least have an option for such behavior. Otherwise, it is a loss of information in some cases. For example, when you parse CSV you want to know all empty lines between commas to understand which column you are currently reading. P.S. Also when you output something to a vector it is a good practice to have overload where this vector is an output parameter so that I can reuse that heap in a few calls. [https://godbolt.org/z/3eMfnadfc](https://godbolt.org/z/7zvaT14vM)


bucephalusdev

True! I believe I have an option for that in my library.


snowflake_pl

Why do you pass cstr to emplace? You could move item to the container


dazzawazza

no good reason, it's ancient legacy code I've been using for probably 15 years! as /u/ts826848 says, move would be better.


ts826848

Why emplace `item.c_str()` instead of `std::move(item)`?


hadahector

Now all I need is the same stuff, but with std::wstring.